Mortar Project Example

Mortar Projects are the best way to develop a data app using Mortar. You develop code locally in your own dev environment and deploy it to Mortar with a single command.

Why Mortar Projects?

Mortar Projects make it easy to develop a large Hadoop project with a team. Mortar Projects give you:

  • Pig and Hadoop on Your Computer: When you create a Mortar Project, you get a local installation of Pig and Hadoop ready to use, without needing to install anything yourself. That means faster development, and better testing.

  • Version Control and Code Sharing: Mortar Projects are backed by source control, either through Mortar or your own system, so you can collaborate with team members on a project.

  • 1-Button Deployment: When you're ready to run your project on a Hadoop cluster, a single command is all that's needed to deploy and run in the cloud.


Public vs Private

Mortar Projects come in two flavors: Public and Private. Public Mortar Projects can be viewed and forked by anyone. Private Mortar Projects are accessible only to users in your Mortar account. When you create a new project, decide whether its code is something you want to share with the world or keep within your organization.


Set Up Your Project

Mortar has an example project that contains sample code for different styles of analysis. We'll use Mortar to create your own copy of this project.

Mortar project names share one global namespace, so you'll need to pick a unique name. For this tutorial, you can prepend your Mortar handle to "mortar-examples" to generate a unique name.

mortar projects:fork git@github.com:mortardata/mortar-examples.git <your-handle>-mortar-examples --public
cd <your-handle>-mortar-examples
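
If you prefer to fill in the placeholder once, the naming rule above can be sketched in a few lines of shell. The handle "alice" and the variable names are illustrative assumptions, and the sketch only prints the fork command rather than running it.

```shell
# Hypothetical handle; replace it with your own Mortar handle.
MORTAR_HANDLE="alice"

# Prepend the handle to "mortar-examples" to get a unique project name.
PROJECT_NAME="${MORTAR_HANDLE}-mortar-examples"

# Print the fork command from this tutorial with the name filled in.
echo "mortar projects:fork git@github.com:mortardata/mortar-examples.git ${PROJECT_NAME} --public"
```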

Open Twitter Example

In order to take a quick look at how to run Mortar Projects, let's use the coffee_tweets example.

Open your project in your favorite dev environment and look at the coffee_tweets.pig script. This script calculates the percentage of tweets in each state that indicate "coffee snobbery."


Use Local Illustrate

The best way to understand what this script is doing is to run an illustrate on it. This will let us see the data flowing through every alias to figure out what is changing at each step.

Rather than send our code out to the cloud and wait for the response to come back, we can get a much faster result by using Mortar's local mode. This uses Pig and Hadoop installed locally to perform an illustrate, and thus can run very quickly.

Run a local illustrate.

# Uses read-only example AWS keys - use your own keys for your data
export AWS_ACCESS_KEY="AKIAJ54D5RAJFAYAEFZQ"
export AWS_SECRET_KEY="frNw2FM1UqE1VmTRe8TZ7AloIpLeugdRCBW74pJX"
mortar local:illustrate pigscripts/coffee_tweets.pig

The first time you run a local command, Mortar will download and install dependencies. Once that's done for a project, you should see very fast illustrate results. Run the command again to see a speedy illustrate.
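
Because the commands above export AWS keys before the illustrate, a small guard can catch a missing key before you wait on a run. This is a minimal sketch: the function name aws_keys_present is hypothetical, and it only checks that the two variables are non-empty, not that the keys are valid.

```shell
# Succeeds only if both AWS variables used in this tutorial are set and non-empty.
aws_keys_present() {
  [ -n "${AWS_ACCESS_KEY:-}" ] && [ -n "${AWS_SECRET_KEY:-}" ]
}

if aws_keys_present; then
  echo "AWS keys found; ready to run: mortar local:illustrate pigscripts/coffee_tweets.pig"
else
  echo "Set AWS_ACCESS_KEY and AWS_SECRET_KEY before running a local illustrate." >&2
fi
```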


Use Local Run

Sometimes when developing, illustrate doesn't give you enough feedback about how your script is working. In these cases, you can run your code against a small subset of your data to see how it behaves. To avoid the time and cost of running your job on a remote Hadoop cluster, you can use Mortar's local mode.

Run coffee_tweets.pig locally.

mortar local:run pigscripts/coffee_tweets.pig -f params/coffee_tweets/local.small.params

Using the -f option, we pass in a parameter file that loads and stores the tweet data on our machine.
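
A Pig parameter file is a plain-text list of name=value lines, with # starting a comment. The sketch below writes a stand-in file to show the shape. The parameter names and paths are hypothetical, not the actual contents of params/coffee_tweets/local.small.params, so check that file in your project for the real names.

```shell
# Write a stand-in parameter file to a temporary location.
PARAMS_FILE="$(mktemp)"

cat > "$PARAMS_FILE" <<'EOF'
# Hypothetical parameter names and local paths, for illustration only.
INPUT_PATH=data/tweets_small.json
OUTPUT_PATH=data/output/coffee_tweets
EOF

# Show the file; a Pig script would refer to these values as $INPUT_PATH and $OUTPUT_PATH.
cat "$PARAMS_FILE"
```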


Run Job on a Hadoop Cluster

Once we are happy with our script, we can deploy it to run on a cluster.

By default the Mortar example project uses AWS spot instances to save money. Running this example on a 2-node spot instance cluster for 1 hour should cost you approximately $0.28 in pass-through AWS costs. Before running this job you will need to add your credit card to your account. You can do that on our Billing Page.
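
As rough arithmetic, the figure above works out to about $0.14 per node per hour. The sketch below scales that number to other cluster sizes, assuming cost grows linearly with node count and hours. That linearity is an assumption, spot prices fluctuate, and estimate_cost is a hypothetical helper, so treat the result as a ballpark only.

```shell
# Estimate pass-through cost in dollars from the tutorial's figure:
# about $0.28 for a 2-node cluster running for 1 hour.
# Usage: estimate_cost NODES HOURS
estimate_cost() {
  awk -v nodes="$1" -v hours="$2" 'BEGIN { printf "%.2f\n", nodes * hours * 0.28 / 2 }'
}

# The tutorial's own case (2 nodes, 1 hour) prints 0.28.
estimate_cost 2 1
```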

Once your credit card has been added, all you need to do is use the jobs:run command.

mortar jobs:run pigscripts/coffee_tweets.pig

The output of this command includes a jobs:status command that you can run to follow the job's progress.

mortar jobs:status YOUR_JOB_ID_HERE --poll

You can also check on your job status by logging into the Mortar website and viewing the Jobs Page.

Once your job is done, you can visit the Job Details page to download a list of the states with the most coffee snobs per tweet-capita.


AT THIS POINT, YOU SHOULD BE ABLE TO:

  • Fork the example project to create your own Mortar Project
  • Illustrate a script in a Mortar Project
  • Run a job on a cluster using a Mortar Project
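
The commands covered in this tutorial can be collected into one script, sketched below under a few assumptions. The run_step helper and the DRY_RUN variable are hypothetical conveniences, not part of the Mortar CLI. By default the script only prints each command; set DRY_RUN=0 to actually run them from inside your project directory.

```shell
# DRY_RUN=1 (the default here) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"
SCRIPT="pigscripts/coffee_tweets.pig"
PARAMS="params/coffee_tweets/local.small.params"

# Print the command in dry-run mode; otherwise run it and stop on failure.
run_step() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@" || exit 1
  fi
}

run_step mortar local:illustrate "$SCRIPT"        # inspect data at every alias
run_step mortar local:run "$SCRIPT" -f "$PARAMS"  # run locally on a small subset
run_step mortar jobs:run "$SCRIPT"                # deploy and run on a cluster
```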

Next, let's see how to start a new Mortar Project from scratch.