The Mortar platform offers full support for Luigi. You can build pipelines on your computer with the Mortar Development Framework, and then deploy and run them in the cloud with one command.
Mortar Projects have a standardized structure that keeps the various elements of a project (Pigscripts, UDFs, Luigiscripts) organized. Luigi scripts, unsurprisingly, reside in the luigiscripts directory of a Mortar project.
In order to work with Luigi and Mortar Projects, be sure to first install the Mortar Development Framework.
When you're developing a Luigi script, it's convenient to work locally on your computer with your favorite code editor. You can do so with the mortar local commands provided by the Mortar Development Framework.
To run any Luigi script in your project's luigiscripts directory locally, use:
mortar local:luigi luigiscripts/my_luigi_script.py
If your Luigi script takes parameters (as described in the Luigi documentation), you can pass them to Luigi exactly as you ordinarily would:
mortar local:luigi luigiscripts/my_luigi_script.py \
    --my-parameter "the-value-for-my-parameter"
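If you launch local runs from a Python helper script, assembling the command might look like the minimal sketch below. It assumes, as in the example above, that Luigi expects each parameter as a flag built from the parameter's name with underscores replaced by hyphens (my_parameter becomes --my-parameter). The helpers to_luigi_flags and build_local_luigi_command are hypothetical and not part of the Mortar Development Framework.

```python
import shlex


def to_luigi_flags(params):
    """Turn {"my_parameter": "value"} into ["--my-parameter", "value"].

    Assumption: Luigi maps a parameter named my_parameter to the
    command-line flag --my-parameter (underscores become hyphens).
    """
    args = []
    for name, value in params.items():
        args.append("--" + name.replace("_", "-"))
        args.append(str(value))
    return args


def build_local_luigi_command(script, params):
    """Hypothetical helper: argv for a local Luigi run via Mortar."""
    return ["mortar", "local:luigi", script, *to_luigi_flags(params)]


if __name__ == "__main__":
    cmd = build_local_luigi_command(
        "luigiscripts/my_luigi_script.py",
        {"my_parameter": "the-value-for-my-parameter"},
    )
    # Print a shell-safe version of the command.
    print(shlex.join(cmd))
    # To actually launch the run (needs `import subprocess` and the
    # mortar CLI on your PATH):
    # subprocess.run(cmd, check=True)
```

Building an argument list rather than a single string avoids shell-quoting problems when parameter values contain spaces.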
The first time you run mortar local:luigi, the Mortar Development Framework will automatically install everything you need to work on a Luigi script (Python, Luigi, Python Virtual Environments, Configuration Templating, etc.) into the .mortar-local directory underneath your project root.
One thing to note about local Luigi runs: although Luigi itself runs locally, any Pigscript Tasks you use will run in the cloud. To speed these up, we recommend setting the cluster_size parameter to 0, which runs your Pigscript directly on Mortar's Pig Servers without launching a cluster.
Also, because Pigscripts run in the cloud, the Mortar Development Framework syncs a snapshot of your code to the cloud each time you run mortar local:luigi. This ensures that any local changes you've made to Pigscripts are automatically accounted for when you run Luigi!
While it's convenient to develop a Luigi pipeline locally, you'll quickly find that you don't want to keep your laptop running for hours while pipelines complete!
Fortunately, it's easy to run a pipeline in the cloud on Mortar. Just substitute mortar luigi for mortar local:luigi in your command:
mortar luigi luigiscripts/my_luigi_script.py --my-parameter "the-value-for-my-parameter"
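Because only the subcommand differs between a local run and a cloud run, a helper script can switch between the two with a single flag. This is a hedged sketch: mortar_luigi_command and its cloud argument are hypothetical names, and any extra arguments (such as Luigi parameters) are passed through unchanged.

```python
def mortar_luigi_command(script, extra_args=(), cloud=False):
    """Hypothetical helper: argv for running a Luigi script with Mortar.

    cloud=False -> "mortar local:luigi" (Luigi runs on your machine)
    cloud=True  -> "mortar luigi"       (Luigi runs on Mortar)
    """
    subcommand = "luigi" if cloud else "local:luigi"
    return ["mortar", subcommand, script, *extra_args]


params = ["--my-parameter", "the-value-for-my-parameter"]
local_cmd = mortar_luigi_command("luigiscripts/my_luigi_script.py", params)
cloud_cmd = mortar_luigi_command(
    "luigiscripts/my_luigi_script.py", params, cloud=True
)

# Everything after the subcommand is identical for both modes.
assert local_cmd[2:] == cloud_cmd[2:]
```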
When you run the command, you should see the following:
Taking code snapshot... done
Sending code snapshot to Mortar... done
Requesting job execution... done
job_id: some_job_id
Job status can be viewed on the web at:
https://app.mortardata.com/jobs/pipeline_job_detail?job_id=some_job_id
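If you submit jobs from a script, you may want to capture the job ID and monitoring URL from this output. The sketch below assumes the output keeps the format shown above; parse_submission_output is a hypothetical helper, and this text output may not be a stable interface, so treat the parsing as best-effort.

```python
import re

# "job_id: <id>" on its own line; the URL's "job_id=<id>" uses "=" and
# therefore does not match this pattern.
JOB_ID_RE = re.compile(r"job_id:\s*(\S+)")
# First http(s) URL in the output; \S+ stops at whitespace or newline.
URL_RE = re.compile(r"https?://\S+")


def parse_submission_output(text):
    """Return (job_id, status_url); either is None if not found."""
    job_id_match = JOB_ID_RE.search(text)
    url_match = URL_RE.search(text)
    return (
        job_id_match.group(1) if job_id_match else None,
        url_match.group(0) if url_match else None,
    )


sample = """Taking code snapshot... done
Sending code snapshot to Mortar... done
Requesting job execution... done
job_id: some_job_id
Job status can be viewed on the web at:
https://app.mortardata.com/jobs/pipeline_job_detail?job_id=some_job_id
"""

job_id, url = parse_submission_output(sample)
```

Returning None instead of raising lets a caller decide how to handle output that doesn't match the expected format.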
This tells you that your job has started successfully and gives you the URL to monitor its progress. If you open that URL, you should see your Luigi job running and its logs streaming into the Logs console.
To guarantee production stability, Mortar runs Luigi from an integration-tested fork at github.com/mortardata/luigi. We regularly pull in and release updates as they pass unit and integration tests.
Mortar has also open-sourced the mortar-luigi project, a collection of extensions for using Mortar from within Luigi. It contains a large collection of useful Tasks for things like running Pigscripts, managing clusters, running shell scripts, and interacting with databases like DynamoDB.
There are a number of features of Luigi that aren’t used by Mortar, but which you can read about in the Luigi docs.
In addition to running pipelines, Luigi was originally designed to run Hadoop jobs directly. It has native Python MapReduce support built into its luigi.hadoop.JobTask class. Mortar does not use Luigi’s Hadoop support; instead, we use the MortarProjectPigscriptTask Task to run and manage Pig jobs on Hadoop. Mortar then handles all of the heavy lifting on an Elastic MapReduce cluster for you.