While Web Projects are a convenient way to get started using Mortar, Mortar Projects make it much easier to develop a large Hadoop project with a team. Mortar Projects give you:
Pig and Hadoop on Your Computer: When you create a Mortar Project, you get a local installation of Pig and Hadoop ready to use, without needing to install anything yourself. That means faster development, and better testing.
Version Control and Code Sharing: Mortar Projects are backed by source control, either through Mortar or your own system, so you can collaborate with team members on a project.
1-Button Deployment: When you're ready to run your project on a Hadoop cluster, a single command is all that's needed to deploy and run in the cloud.
First, install the Mortar Development Framework on your local computer and log in.
Mortar has an example project that contains sample code for different styles of analysis. Clone this example project using git.
git clone email@example.com:mortardata/mortar-examples.git
cd mortar-examples
If your SSH key has a passphrase, or you have multiple SSH keys, use ssh-agent to add your keys. This avoids being asked for a password, or having to specify which key to use, each time you run a mortar command. The example below uses the common key path and name, ~/.ssh/id_rsa. Adjust it to the key and path you set up when following the GitHub instructions above.
ssh-agent /bin/bash
ssh-add ~/.ssh/id_rsa
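To confirm that the agent actually holds a key before running mortar commands, you can list its identities with ssh-add -l. The helper below is a minimal sketch; the function name agent_status is hypothetical and not part of Mortar. It relies on the exit status of ssh-add -l: 0 when keys are loaded, 1 when the agent has none, and anything else when no agent can be reached.

```shell
# Hypothetical helper (not part of Mortar): report whether ssh-agent holds keys.
agent_status() {
  # Capture the exit status without aborting under `set -e`.
  ssh-add -l >/dev/null 2>&1 && rc=0 || rc=$?
  case "$rc" in
    0) echo "keys loaded" ;;
    1) echo "agent running, no keys" ;;
    *) echo "no agent reachable" ;;
  esac
}

agent_status
```

If this reports that no keys are loaded, run ssh-add again with the path to your key.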
To take a quick look at how to run Mortar Projects, open the mortar-examples project in your favorite dev environment and look at the coffee_tweets.pig script. This script calculates the percentage of tweets in each state that indicate "coffee snobbery."
The best way to understand what this script is doing is to run an illustrate on it. This will let us see the data flowing through every alias to figure out what is changing at each step.
Rather than send our code out to the cloud and wait for the response to come back, we can get a much faster result by using Mortar's local mode. This uses Pig and Hadoop installed locally to perform an illustrate, and thus can run very quickly.
Run a local illustrate.
# Uses read-only example AWS keys - use your own keys for your data
export AWS_ACCESS_KEY="AKIAJ54D5RAJFAYAEFZQ"
export AWS_SECRET_KEY="frNw2FM1UqE1VmTRe8TZ7AloIpLeugdRCBW74pJX"
mortar local:illustrate pigscripts/coffee_tweets.pig
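When you switch to your own keys, it is easy to forget to export one of them in a new shell. The following sketch checks that both variables are set before running the illustrate. The function name require_aws_keys is hypothetical; only the two variable names come from the example above.

```shell
# Hypothetical helper (not part of Mortar): fail early if either AWS
# credential variable used by the example above is unset or empty.
require_aws_keys() {
  missing=0
  for name in AWS_ACCESS_KEY AWS_SECRET_KEY; do
    # Portable indirect expansion of the variable named in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (commented out because it needs the mortar CLI installed):
# require_aws_keys && mortar local:illustrate pigscripts/coffee_tweets.pig
```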
The first time you run a local command, Mortar will download and install dependencies. Once that's done for a project, you should see very fast illustrate results. Run the command again to see a speedy illustrate.
Sometimes when developing, illustrate doesn't give you enough feedback about how your script is working. In these cases, you can try running your code against a small subset of your data to see how it works. To avoid the time and cost of running your job on a remote Hadoop cluster, you can use Mortar's local mode:
mortar local:run pigscripts/coffee_tweets.pig -f params/coffee_tweets/local.small.params
With the -f option, we pass in a parameter file that loads and stores the tweet data on our machine.
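A Pig parameter file is a plain text file with one name=value pair per line, where lines starting with # are comments. The sketch below writes a hypothetical parameter file of your own. The file name, parameter names, and paths are illustrative assumptions, not the contents of local.small.params.

```shell
# Write a hypothetical parameter file. The names and paths are illustrative
# assumptions, not the contents of local.small.params.
mkdir -p params/coffee_tweets
cat > params/coffee_tweets/my.local.params <<'EOF'
# One name=value pair per line; a Pig script refers to each as $NAME.
INPUT_PATH=data/tweets_sample
OUTPUT_PATH=out/coffee_tweets
EOF

# Then pass it with -f (commented out because it needs the mortar CLI):
# mortar local:run pigscripts/coffee_tweets.pig -f params/coffee_tweets/my.local.params
```

These values only take effect if the Pig script reads parameters with the same names, so check the script before relying on them.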
Now that the example code exists locally, you can register it as a project with Mortar. This will allow you to run the code on a Hadoop cluster.
To register a project with the Mortar service, you'll first need to sign up for a Mortar account.
For this example, we'll use a Public Mortar Project. We'll discuss Public and Private Mortar Projects in more depth in the next tutorial.
Mortar project names share one global namespace, so you'll need to pick a unique name. For this tutorial, you can prepend your Mortar handle to "mortar-examples" to generate a unique name.
Register your first Public Mortar Project:
mortar projects:register <your-handle>-mortar-examples --public
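If you script the registration, the naming convention above can be captured in a small helper. This is a minimal sketch; the function name project_name and the handle "alice" are hypothetical.

```shell
# Hypothetical helper (not part of Mortar): prefix a Mortar handle to
# "mortar-examples" to build a name that is likely to be unique.
project_name() {
  printf '%s-mortar-examples\n' "$1"
}

project_name alice

# Usage (commented out because it needs the mortar CLI and an account):
# mortar projects:register "$(project_name alice)" --public
```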
Once we are happy with our script, we can deploy it to run on a cluster. To do this, all we need is the jobs:run command:
mortar jobs:run pigscripts/coffee_tweets.pig
If we specify no parameters, this launches a 2-node cluster by default to run the script. The output of this command should include a jobs:status command that you can run to see the job's progress:
mortar jobs:status YOUR_JOB_ID_HERE --poll
You can also check on your job status by logging into the Mortar website and viewing the Jobs Page.
Once your job is done, you can visit the Job Details page to download a list of the states with the most coffee snobs per tweet-capita.
Next, let's see how to start a Mortar Project from scratch with Developing a Mortar Project.