Mortar has joined Datadog, the leading SaaS-based monitoring service for cloud applications.

Running Your Code Outside of Mortar

This article provides tips for how to run your Mortar Luigi and Pig code outside of the Mortar platform.

1. Export Your Code

The first step in running outside of the Mortar platform is to export a copy of your code. To export your Mortar Project code, follow the steps at Exporting Your Mortar Projects. To export your Web Project code, follow the steps at Exporting Your Web Projects.


2. Run an Elastic MapReduce (EMR) Cluster

To run your Pig scripts, you'll need to start a Hadoop cluster. The easiest way to do that is to use Amazon's Elastic MapReduce service.

You can start an EMR cluster in a number of ways, including from the AWS Management Console, the AWS CLI, or any of the AWS SDKs. See the AWS EMR documentation for more information.

To mimic the EMR clusters that Mortar launches, you can use the following settings:

  • AMI Version: 2.4.7
  • Hadoop distribution: Amazon 1.0.3
  • Hardware
    • Master: m1.xlarge
    • Core: m1.xlarge
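
If you launch clusters from a script, the settings above can be expressed as a request to the EMR RunJobFlow API. The sketch below assumes boto3 (the AWS SDK for Python) and uses placeholder values for the cluster name, core node count, log bucket, and IAM role names; none of these come from Mortar. AWS may no longer offer AMI 2.4.7 or m1 instance types in every region, so treat this as a starting point rather than a guaranteed configuration.

```python
"""Sketch: an EMR RunJobFlow request mirroring Mortar's cluster settings.

Assumptions (not from Mortar): the cluster name, core node count, log bucket,
and IAM role names are placeholders to replace with your own values.
"""


def build_emr_request(name="mortar-style-cluster", core_count=2,
                      log_uri="s3://my-bucket/emr-logs/"):
    """Return keyword arguments for boto3's EMR run_job_flow call."""
    return {
        "Name": name,
        "LogUri": log_uri,
        # AMI 2.x clusters are selected with AmiVersion, not ReleaseLabel.
        "AmiVersion": "2.4.7",
        "Instances": {
            "MasterInstanceType": "m1.xlarge",
            "SlaveInstanceType": "m1.xlarge",
            # InstanceCount includes the master node.
            "InstanceCount": core_count + 1,
            "HadoopVersion": "1.0.3",
            # Keep the cluster running so the Luigi/Pig server can submit jobs.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Placeholder role names; use the EMR roles that exist in your account.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
        "VisibleToAllUsers": True,
    }


def launch_cluster(**kwargs):
    """Submit the request and return the cluster ID (needs boto3 and AWS credentials)."""
    import boto3  # imported here so build_emr_request works without boto3

    emr = boto3.client("emr")
    return emr.run_job_flow(**build_emr_request(**kwargs))["JobFlowId"]
```

Calling build_emr_request() on its own only assembles the arguments, which lets you inspect them before launch_cluster() makes the actual AWS call.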

Mortar does not install any applications (Pig, Luigi, etc.) on its EMR clusters; instead, it runs them from a separate Luigi/Pig server (see the next section).

Mortar also runs EMR bootstrap actions to set up Python properly on EMR cluster instances. We have open-sourced these at https://github.com/mortardata/mortar-luigi-example/tree/master/emr-bootstrap-actions. The bootstrap scripts are:

  1. setup_python_2_7.sh: Configures python, easy_install, and pip to use Python 2.7
  2. install_python_packages.sh: Installs the standard Mortar Python packages (setuptools, pip, nltk, scikit-learn). You can add your own packages to this script as well.
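
When launching clusters through an AWS SDK, the two scripts can be attached as bootstrap actions. This sketch assumes you have copied them from the repository above into an S3 location you control; the s3://my-bucket/... prefix is a hypothetical placeholder. The returned list is meant for the BootstrapActions argument of EMR's RunJobFlow API (for example, boto3's run_job_flow).

```python
"""Sketch: BootstrapActions entries for Mortar's open-sourced EMR scripts.

Assumption: the scripts have been copied to an S3 prefix you own; the
default prefix below is a hypothetical placeholder.
"""

# Listed in the same order as the documentation above.
BOOTSTRAP_SCRIPTS = ["setup_python_2_7.sh", "install_python_packages.sh"]


def bootstrap_actions(s3_prefix="s3://my-bucket/emr-bootstrap-actions"):
    """Return a BootstrapActions list for EMR's RunJobFlow API."""
    prefix = s3_prefix.rstrip("/")
    return [
        {
            # Use the script name without its .sh extension as the action name.
            "Name": script.rsplit(".", 1)[0],
            "ScriptBootstrapAction": {"Path": f"{prefix}/{script}", "Args": []},
        }
        for script in BOOTSTRAP_SCRIPTS
    ]
```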

3. Set Up a Luigi/Pig Server

To run your code, you'll need to set up a server with Luigi and Pig installed. The recommended packages and versions are:

Other optional system packages you may want to install (if you use them) are:

  • postgresql client: required by psycopg2
  • mysql client: required to extract data from MySQL
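
A quick way to confirm these clients are available on your Luigi/Pig server is to look for their command-line binaries on PATH. This sketch assumes the binaries are named psql and mysql, which is typical but not guaranteed across Linux distributions.

```python
"""Sketch: report which optional database clients are missing from PATH.

Assumption: the client binaries are named psql and mysql.
"""
import shutil

# Description from the list above -> assumed binary name.
OPTIONAL_CLIENTS = {"postgresql client": "psql", "mysql client": "mysql"}


def missing_clients(clients=OPTIONAL_CLIENTS):
    """Return the descriptions of clients whose binary is not found on PATH."""
    return [name for name, binary in clients.items()
            if shutil.which(binary) is None]


if __name__ == "__main__":
    for name in missing_clients():
        print("not found on PATH:", name)
```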

Optional Python packages are:

  • numpy: 1.7.1
  • scipy: 0.12.0
  • scikit-learn: 0.15.0
  • nltk: 2.0.4
  • psycopg2: latest
  • mysql-connector-python: latest
  • mortar-luigi: mortardata/mortar-luigi master
  • stillson: 0.1.1
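
The optional packages above can be captured in a requirements file for pip. This sketch assumes "latest" means an unpinned requirement, and it installs mortar-luigi from the master branch of the mortardata/mortar-luigi GitHub repository using pip's git+https URL syntax.

```python
"""Sketch: render the optional Python packages as requirements.txt lines.

Assumptions: "latest" becomes an unpinned requirement, and mortar-luigi is
installed from GitHub's master branch via pip's VCS URL syntax.
"""

OPTIONAL_PACKAGES = {
    "numpy": "1.7.1",
    "scipy": "0.12.0",
    "scikit-learn": "0.15.0",
    "nltk": "2.0.4",
    "psycopg2": "latest",
    "mysql-connector-python": "latest",
    "stillson": "0.1.1",
}

MORTAR_LUIGI = ("git+https://github.com/mortardata/mortar-luigi.git"
                "@master#egg=mortar-luigi")


def requirements_lines(packages=OPTIONAL_PACKAGES):
    """Pin each package to its listed version, leaving "latest" unpinned."""
    lines = [name if version == "latest" else f"{name}=={version}"
             for name, version in packages.items()]
    lines.append(MORTAR_LUIGI)
    return lines


if __name__ == "__main__":
    print("\n".join(requirements_lines()))
```

Redirect the output to requirements.txt and install it on the Luigi/Pig server with pip install -r requirements.txt.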

4. Convert Your Mortar Luigi Tasks to Use the Standard Luigi PigTask

If your Luigi scripts use the MortarProjectPigscriptTask task, you will need to switch them to the standard Luigi PigTask.

While MortarProjectPigscriptTask ran your Pig job through the Mortar service, PigTask runs your Pig job on your Luigi/Pig server. To ensure your Pig jobs continue to work as they did on Mortar, we have created a version of PigTask that runs your Pig job with the same libraries and settings Mortar used.

To convert your Luigi tasks, start by checking out mortar-luigi-example from GitHub. This repository contains a script, run-mortar-project-luigi.py, which includes two Luigi Task classes.

The first class, MortarStylePigTask, is a common base class you can extend for each of your specific PigTasks. It sets the Pig properties, parameters, and libraries that Mortar used.

The second class, ExcitePigTask, is an example task that shows how to run a Pig script from your Mortar project.

You can copy the MortarStylePigTask class directly into your existing Luigi script and then replace each MortarProjectPigscriptTask with a class similar to ExcitePigTask. For each of these tasks you will need to:

  • Set the path to your Pig script, relative to the root of your Mortar project
  • Set any Pig parameters specific to your script
  • Update the requires method to match your job's dependencies
  • Update the expected output file of your job
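
The first two steps mostly come down to two methods that Luigi's Pig task class (luigi.contrib.pig.PigJobTask) lets you override: pig_script_path() and pig_parameters(). The sketch below is a hypothetical mixin, not the MortarStylePigTask from mortar-luigi-example: it resolves a script path against the project root and lets each task's parameters override shared ones. MyScriptTask, its script path, and its OUTPUT_PATH parameter are invented for illustration, and the plain constructor stands in for a luigi.Parameter named mortar_project_root so the logic can run without Luigi installed.

```python
"""Sketch: path and parameter logic for a Mortar-style Pig task.

Hypothetical helper; it is NOT mortar-luigi-example's MortarStylePigTask.
"""
import os


class MortarStyleMixin:
    # Subclasses set the script path relative to the Mortar project root.
    script_relative_path = None
    # Parameters common to all of your tasks (empty here; Mortar's real
    # defaults are defined in MortarStylePigTask).
    shared_parameters = {}
    # Subclasses set parameters specific to their own script.
    script_parameters = {}

    def pig_script_path(self):
        if self.script_relative_path is None:
            raise NotImplementedError("set script_relative_path on the task")
        # Resolve against the absolute project root checked out on this server.
        return os.path.join(self.mortar_project_root, self.script_relative_path)

    def pig_parameters(self):
        # Start from the shared parameters, then let the task override them.
        params = dict(self.shared_parameters)
        params.update(self.script_parameters)
        return params


class MyScriptTask(MortarStyleMixin):
    """Hypothetical task; in Luigi it would also extend PigJobTask."""

    script_relative_path = "pigscripts/my_script.pig"
    script_parameters = {"OUTPUT_PATH": "s3://my-bucket/my-script-output"}

    def __init__(self, mortar_project_root):
        # Stands in for mortar_project_root = luigi.Parameter().
        self.mortar_project_root = mortar_project_root
```

In a real script you would declare the task as class MyScriptTask(MortarStyleMixin, PigJobTask), replace the constructor with a luigi.Parameter, and still implement requires() and output() for the last two steps.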

When running your Luigi script, you will also need to set the Luigi parameter "mortar-project-root" to the absolute path of your Mortar project's root directory. The project must be checked out somewhere on your Luigi/Pig server.
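
The parameter can be passed on the command line when you start a task. This sketch assumes you launch tasks with Luigi's luigi --module launcher; the module name my_pipeline and task name MyScriptTask are hypothetical. It rejects relative paths up front, since the parameter must be an absolute path.

```python
"""Sketch: build a Luigi command line that sets --mortar-project-root.

Assumptions: tasks are launched with "luigi --module"; the module and
task names are hypothetical placeholders.
"""
import os
import shlex


def luigi_command(project_root, module="my_pipeline", task="MyScriptTask"):
    """Return an argv list for running one task with --mortar-project-root."""
    # The Mortar project root must be an absolute path on the Luigi/Pig server.
    if not os.path.isabs(project_root):
        raise ValueError("mortar-project-root must be absolute: %r" % project_root)
    # Task-level parameters follow the task name on Luigi's command line.
    return ["luigi", "--module", module, task,
            "--mortar-project-root", project_root]


if __name__ == "__main__":
    cmd = luigi_command("/home/ubuntu/my-mortar-project")
    print(" ".join(shlex.quote(part) for part in cmd))
```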