
Running Your Pipeline

Now that you have defined a data pipeline with a Luigi script, it's time to run your pipeline.

Running Locally

First we're going to try running the Luigi script on our local machine rather than offloading the full orchestration of the pipeline to Mortar.

In the terminal, just run:

cd YOUR_PROJECT_ROOT_DIRECTORY

mortar local:luigi luigiscripts/word-luigi.py --output-path s3://mortar-example-output-data/<your-handle-here>/q-words

where <your-handle-here> should be an identifier unique to you (such as your first initial and last name).

The --output-path argument passes the required output_path parameter to your Luigi script. Note that although parameters are defined with underscores inside the Luigi script, they must be passed on the command line using hyphens. You can pass similar arguments to override other defaults in your script (for example, cluster_size).
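For reference, here's a rough sketch of how such parameters might be declared inside a Luigi script; the task name and default value are illustrative, not copied from word-luigi.py:

import luigi

class QWords(luigi.Task):
    # Parameters are defined with underscores in Python...
    output_path = luigi.Parameter()
    cluster_size = luigi.IntParameter(default=2)
    # ...but overridden on the command line with hyphens:
    #   --output-path s3://my-bucket/my-output --cluster-size 10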

You should see status messages streaming through the terminal window, like this:

[Screenshot: Mortar Local Luigi terminal output]

(If Luigi returns an error that you can't solve, try looking at the word-luigi-solution.py script in the luigiscripts directory for a completed version of the script.)

When your script runs to completion, you will be able to find the output data in the S3 location specified by your output-path parameter. (You can access files in S3 using a tool such as s3cmd or via the AWS console.) Or you can run your pipeline in the cloud, after which Mortar will serve up a convenient download link for your results.
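If you have s3cmd installed and configured with your AWS credentials, for example, you could list and fetch the output with something like the following (the exact file names will depend on the job):

s3cmd ls s3://mortar-example-output-data/<your-handle-here>/q-words/
s3cmd get --recursive s3://mortar-example-output-data/<your-handle-here>/q-words/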

Running in the Cloud

That pipeline runs just fine locally, but eventually you'll build more complex pipelines that you'll want to run in the cloud. Also, you won't want to keep your laptop running for hours while pipelines complete!

It's incredibly easy to run in the cloud. Just substitute mortar luigi for mortar local:luigi in your run command:

mortar luigi luigiscripts/word-luigi.py --output-path s3://mortar-example-output-data/<your-handle-here>/q-words2

Note that we also added a 2 to the end of the output-path parameter. Otherwise, Luigi would have detected that our results already exist at the original location and would not have rerun the job. This behavior is called idempotency, and it's one of Luigi's most useful features for complex pipelines.
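Under the hood, each Luigi task declares its output as a target, and before running a task Luigi checks whether that target already exists; if it does, the task is considered complete and is skipped. Here's a minimal sketch of that pattern (the task name and output contents are made up, and it uses a local file target rather than the S3 target our script writes to):

import luigi

class CountQWords(luigi.Task):
    output_path = luigi.Parameter()

    def output(self):
        # Before running, Luigi checks whether this target exists;
        # if it does, the task is considered complete and is skipped.
        return luigi.LocalTarget(self.output_path)

    def run(self):
        # Once this file exists, reruns with the same output_path are no-ops.
        with self.output().open('w') as out:
            out.write('quail\t42\n')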

When you submit the mortar luigi command, your code will sync to the cloud, and your job will start running. Instead of seeing a stream of log messages, you'll get one link to check the status of your job in the Mortar application.

If you follow the link, you'll see the progress of both your Luigi job and any Pig jobs run as part of your pipeline:

[Screenshot: Mortar Luigi progress view]

Running in the cloud makes it easier to retrieve the output of any of your Pig scripts, should you need it. Once your Luigi job completes, click the "Results" button next to your Pig job on the Job History page. You'll see a link to download your results directly from S3, like this:

[Screenshot: Mortar Luigi results download link]

Go ahead and download the file and open it up. If you’ve ever wondered which words beginning with “q” are the most common, now you have an answer to your … wait for it … question.

For examples of more complex Luigi pipelines, including pipelines that string together numerous Pig scripts, check out our Recommendation Engine and Redshift Data Warehouse data apps.