Developing a Mortar Project

While examples are a great way to get started, you will ultimately want to create your own project for your data app. In this section, we'll create a new Mortar Project from scratch.

How Mortar Projects Work

Mortar Projects organize your code into a consistent project structure. You can develop them on your local computer and run them locally or in the cloud.

For local development, the Mortar Development Framework installs and configures all of the Pig and Hadoop libraries you need. When you're ready to run at scale, it snapshots your code, syncs it to the cloud, and runs it on an on-demand, private Hadoop cluster.

For a deeper discussion, see How Mortar Projects Work.


Set Up Your Code Editor

To get the most out of Mortar Projects, make sure your favorite code editor is set up with Pig syntax highlighting. See our code editor plugin instructions for recommended editors and help installing syntax plugins.


Create a New Project

Mortar Projects come in two flavors: Public and Private. Public Mortar Projects can be viewed and forked by anyone. Private Mortar Projects are only accessible to users in your Mortar account.

To start a new Private Mortar Project from scratch, use the projects:create command:

mortar projects:create my-sample-private-project

This creates a new project skeleton and registers it with Mortar as a Private Mortar Project. The skeleton includes folders for commonly used items, such as pigscripts, macros, and UDFs.

If you want to create a new Public Mortar Project, add the --public switch:

mortar projects:create my-sample-public-project --public

Recall that Mortar project names share one global namespace, so you'll need to pick a unique name.


Add Code

Put any Pig code that you want to run with Mortar in the pigscripts directory of your project. That directory should already contain an example pigscript, my-sample-project.pig, which was generated when your project was created.

UDFs can be stored anywhere in your project, but for your convenience there is a udfs folder with subfolders for each language.

The most important thing to know about UDFs in Mortar projects is that your pigscript must REGISTER every UDF file with the correct relative path. Your my-sample-project.pig shows an example of registering my-sample-project.py:

-- register a UDF with a relative path
REGISTER '../udfs/python/my-sample-project.py' USING streaming_python AS mysampleproject;
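
Because a wrong relative path is an easy mistake to make, it can help to check REGISTER paths before running anything. The following is a minimal sketch, not part of the Mortar CLI: check_registered_udfs is a hypothetical helper, and it assumes that quoted relative paths are resolved from the directory that contains the pigscript, as in the example above.

```shell
# Hypothetical helper: list the quoted REGISTER paths in a pigscript and
# check that each referenced file exists.
# Assumption: relative paths are resolved from the pigscript's own directory.
check_registered_udfs() (
    script=$1
    [ -f "$script" ] || { echo "no such pigscript: $script" >&2; exit 2; }
    script_dir=$(dirname "$script")

    # Extract the path from lines such as:
    #   REGISTER '../udfs/python/my-sample-project.py' USING ...
    sed -n "s/^[[:space:]]*[Rr][Ee][Gg][Ii][Ss][Tt][Ee][Rr][[:space:]]\{1,\}'\([^']*\)'.*/\1/p" "$script" |
    {
        status=0
        while IFS= read -r path; do
            case $path in
                *://*) echo "skipped (remote): $path"; continue ;;
                /*)    target=$path ;;
                *)     target=$script_dir/$path ;;
            esac
            if [ -f "$target" ]; then
                echo "ok: $path"
            else
                echo "missing: $path (looked for $target)" >&2
                status=1
            fi
        done
        exit "$status"
    }
)
```

Calling check_registered_udfs pigscripts/my-sample-project.pig from the project root prints one line per registered file and returns a non-zero status if any file is missing.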

Set Local AWS Keys

If your scripts use data in Amazon S3, you'll need to set your AWS keys to run mortar:local commands. Once you are logged in, mortar:local commands automatically synchronize with the AWS keys paired with your Mortar account. Make sure your keys are set on the settings page, then log in with mortar login so that they are synchronized automatically.

Alternatively, you can manually set your AWS keys. These can be set on the command line or in your ~/.bashrc file with:

# set AWS keys for mortar:local
export AWS_ACCESS_KEY="MY_AWS_ACCESS_ID_HERE"
export AWS_SECRET_KEY="MY_AWS_SECRET_ACCESS_KEY_HERE"

These keys are only used for running mortar:local commands with Pig; they are not sent or used for remote commands.

NOTE: If you don't need S3 access, you can just provide blank values for these environment variables.
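
To confirm that both variables are defined in the current shell before running a local command, a small check like the one below may be useful. This is a sketch under stated assumptions: check_mortar_aws_keys is a hypothetical helper name, and it treats a blank value as acceptable, following the note above. It never prints the key values themselves.

```shell
# Hypothetical helper: report whether the AWS variables used by
# mortar:local commands are defined in the current shell.
# A blank value counts as defined (acceptable when S3 access is not needed).
check_mortar_aws_keys() (
    status=0
    for name in AWS_ACCESS_KEY AWS_SECRET_KEY; do
        # ${VAR+set} expands to "set" when VAR is defined, even if it is empty
        if eval "[ -z \"\${$name+set}\" ]"; then
            echo "missing: $name is not defined" >&2
            status=1
        elif eval "[ -z \"\${$name}\" ]"; then
            echo "blank: $name (fine if no S3 access is needed)"
        else
            echo "ok: $name"
        fi
    done
    exit "$status"
)
```

Running check_mortar_aws_keys returns a non-zero status only when one of the variables is not defined at all.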


Development Workflow

Generally, pigscript development follows a Write → Illustrate → Run pattern.

Write

It's easiest to write a Pig script incrementally, adding only one or two new aliases at a time. This makes debugging simpler.

Illustrate

Illustrate is the best tool to check what you've written so far. Illustrate will check your Pig syntax, and then show a small subset of data flowing through each alias in your pigscript.

To get the fastest results, use the local:illustrate command.

mortar local:illustrate pigscripts/my-sample-project.pig

Once the illustrate result is ready, a web browser tab will open to show the results:

[Image: Illustrate Results]

Run

When the script is finished and everything looks good in illustrate, it's time to run your job. The job will run on a new private AWS Elastic MapReduce Hadoop cluster, with the number of nodes that you specify.

mortar jobs:run pigscripts/my-sample-project.pig --clustersize 3

Use illustrate to be as certain of your results as you can be before doing a run. Illustrate is fast and free; runs will process your entire data set but will take much longer.
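
The illustrate-before-run habit can be captured in a small wrapper. The sketch below uses only the two commands shown in this section; illustrate_then_run and the DRY_RUN variable are hypothetical names, the default cluster size of 3 is simply taken from the example above, and the wrapper assumes the mortar command exits with a non-zero status when illustrate fails.

```shell
# Hypothetical wrapper: illustrate a pigscript locally, then start a
# cluster run only if illustrate succeeds.
# Assumption: the mortar CLI exits non-zero when illustrate fails.
# Set DRY_RUN=1 to print the commands instead of executing them.
illustrate_then_run() (
    script=$1
    cluster_size=${2:-3}

    run_cmd() {
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "$*"
        else
            "$@"
        fi
    }

    run_cmd mortar local:illustrate "$script" || {
        echo "illustrate failed; skipping cluster run" >&2
        exit 1
    }
    run_cmd mortar jobs:run "$script" --clustersize "$cluster_size"
)
```

For example, DRY_RUN=1 illustrate_then_run pigscripts/my-sample-project.pig 3 prints the two commands without contacting Mortar, which is a cheap way to check the arguments first.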


AT THIS POINT, YOU SHOULD BE ABLE TO:

  • Register a Mortar Project
  • Illustrate a pigscript in a Mortar Project
  • Run a job on a cluster using a Mortar Project
  • Create a new Mortar Project
  • REGISTER a UDF in a pigscript
  • Develop iteratively in a Mortar Project

Now that we've got the basics, let's delve into running your Mortar project.