
Start a New Luigi Script

Luigi Tasks for Sample Project

In this example, we will chain two Tasks together. First we'll run a Pig script using a Task called WordRank. Once that job completes, a second Task called SanityTest will check that the results look sensible. (Don't worry, we'll explain how these Tasks work shortly.)

Each Luigi Task declares the Tasks it depends on, and those must be complete before it can start. Therefore, if you run the last Task in a dependency chain, Luigi will work back through all the Tasks it depends on, and all the dependencies of those Tasks, and so on, all the way down until it finds a Task whose dependencies are fully satisfied. It then runs the Tasks in dependency order. Below is a diagram of the dependency tree for the Tasks in the simple Luigi script we'll be creating.

Luigi Dependency Tree
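The dependency resolution described above can be sketched in plain Python. This is a toy model only, not Luigi's actual scheduler or API: the Task base class and the build function are hypothetical stand-ins written for illustration, and only the names WordRank and SanityTest come from this tutorial.

```python
# Toy model of dependency resolution; this is NOT Luigi's real scheduler.


class Task:
    """Minimal stand-in for a Luigi-style Task (hypothetical, for illustration)."""

    def __init__(self):
        self.done = False

    def requires(self):
        # Tasks that must be complete before this one may run.
        return []

    def complete(self):
        return self.done

    def run(self):
        self.done = True


class WordRank(Task):
    """Stands in for the Task that runs the Pig script."""


class SanityTest(Task):
    """Stands in for the Task that checks WordRank's results."""

    def __init__(self, upstream):
        super().__init__()
        self.upstream = upstream

    def requires(self):
        return [self.upstream]


def build(task, log):
    """Run every incomplete dependency of `task` depth-first, then `task` itself."""
    if task.complete():
        return
    for dependency in task.requires():
        build(dependency, log)
    task.run()
    log.append(type(task).__name__)


execution_order = []
build(SanityTest(WordRank()), execution_order)
print(execution_order)  # ['WordRank', 'SanityTest']
```

Asking for SanityTest alone is enough to trigger WordRank first, which mirrors the behavior of running only the last Task in a chain.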

Start the Sample Project

For the purposes of this tutorial we'll be working within an example project developed by Mortar that contains several ready-to-run Pig scripts. We'll use Luigi to run the google_books_words.pig script, which analyzes the Google Books corpus to determine the frequency with which individual words appear. By default the Pig script analyzes only a portion of the data (words beginning with “q”), which makes development and testing much faster, but the Pig script also contains instructions for how to analyze the full corpus.

Once you've installed Mortar, run the following command (prepending your handle to generate a unique name) in the terminal to get a copy of Mortar's sample project:

mortar projects:fork git@github.com:mortardata/mortar-examples.git <your-handle>-mortar-examples

Mortar projects have a standardized structure that keeps the various elements of a project (Pig scripts, UDFs, Luigi scripts) organized. Luigi scripts, unsurprisingly, reside in the luigiscripts directory of a Mortar project.

The luigiscripts directory also contains client.cfg.template, a Luigi configuration file. Each time you run Luigi, Mortar expands the variables in this file (e.g. ${MORTAR_EMAIL}) to their actual values and stores the result in luigiscripts/client.cfg. Because client.cfg is regenerated on every run, any changes made to it directly will be overwritten. To avoid losing changes, edit only client.cfg.template.
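To illustrate what this kind of variable expansion looks like, here is a minimal sketch using Python's standard string.Template, which happens to use the same ${NAME} placeholder syntax. It is not Mortar's actual mechanism. The section name, the key, and the email value are all assumptions made up for the example, and the real client.cfg.template may look different.

```python
from string import Template

# Hypothetical template contents; the real client.cfg.template may differ.
template_text = "[mortar]\nemail: ${MORTAR_EMAIL}\n"

# Hypothetical value standing in for whatever Mortar supplies at run time.
values = {"MORTAR_EMAIL": "user@example.com"}

# substitute() raises KeyError if a placeholder has no matching value.
rendered = Template(template_text).substitute(values)
print(rendered)
```

Because the rendered output is rebuilt from the template each time, any edit made to the output alone would disappear on the next expansion.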

Create a New Luigi Script

Now that you have a sample project to work with, you're ready to start writing your own Luigi script.

ACTION: In your favorite code editor, create a new blank file called word-luigi.py in the luigiscripts directory of your new project.

When you create a new Mortar project, its luigiscripts directory includes a template Luigi script, but for this tutorial we're going to build a pipeline from scratch. With the blank file in place, you can start writing the script.