Mortar Projects organize your code into a consistent project structure. You can develop them on your local computer and run them locally or in the cloud.
For local development, the Mortar Development Framework installs and configures all of the Pig and Hadoop libraries you need. When you're ready to run at scale, it snapshots your code, syncs it to the cloud, and runs it on an on-demand, private Hadoop cluster.
For a deeper discussion, see How Mortar Projects Work.
To get the most out of Mortar Projects, make sure your favorite code editor is set up with Pig syntax highlighting. See our code editor plugin instructions for recommended editors and help installing syntax plugins.
Mortar Projects come in two flavors: Public and Private. Public Mortar Projects can be viewed and forked by anyone. Private Mortar Projects are only accessible to users in your Mortar account.
To start a new Private Mortar Project from scratch, use the projects:create command:

mortar projects:create my-sample-private-project
This command creates a new project skeleton and registers it with Mortar as a Private Mortar Project. The skeleton includes folders for commonly used items, such as pigscripts, macros, and UDFs.
If you want to create a new Public Mortar Project instead, add the --public flag:

mortar projects:create my-sample-public-project --public
Recall that Mortar project names share one global namespace, so you'll need to pick a unique name.
Any Pig code that you want to run with Mortar should be put in the pigscripts directory in your project. You should already have an example pigscript in that directory called my-sample-project.pig that was generated when your project was created.
UDFs can be stored anywhere in your project, but for your convenience there is a udfs folder with subfolders for each language.
The most important thing to know about UDFs in Mortar projects is that your pigscript must REGISTER all UDF files with the proper relative path. Your my-sample-project.pig shows an example of registering a Python UDF:

-- register a UDF with a relative path
REGISTER '../udfs/python/my-sample-project.py' USING streaming_python AS mysampleproject;
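For reference, a streaming_python UDF file is ordinary Python. The sketch below is hypothetical (the function name and schema are illustrative, not from your generated project); in a Mortar project the outputSchema decorator is imported from pig_util, and a no-op stand-in is included here so the sketch also runs outside Pig.

```python
# Hypothetical sketch of a streaming_python UDF file (e.g. under udfs/python/).
# The function name and output schema below are illustrative assumptions.
try:
    # pig_util is provided by the Mortar framework when Pig runs the UDF
    from pig_util import outputSchema
except ImportError:
    # no-op stand-in so this sketch can be exercised outside Pig
    def outputSchema(schema):
        def wrap(func):
            return func
        return wrap

@outputSchema('word:chararray')
def to_upper(word):
    # the decorator declares the field type this function returns to Pig
    return word.upper() if word is not None else None
```

A pigscript that had REGISTERed this file could then call it as mysampleproject.to_upper(field).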
If your scripts use data in Amazon S3, you'll need to set your AWS keys to run mortar:local commands. Once you're logged in, mortar:local commands will automatically use the AWS keys paired with your Mortar account. Ensure that your keys are set on the settings page, then log in with mortar login to have your AWS keys synchronized automatically when using mortar:local commands.
Alternatively, you can set your AWS keys manually, either on the command line or in your ~/.bashrc file:

# set AWS keys for mortar:local
export AWS_ACCESS_KEY="MY_AWS_ACCESS_ID_HERE"
export AWS_SECRET_KEY="MY_AWS_SECRET_ACCESS_KEY_HERE"
These keys are only used for running mortar:local commands with Pig; they are not sent or used for remote commands.
NOTE: If you don't need S3 access, you can just provide blank values for these environment variables.
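As a quick sanity check before running mortar:local commands, you could verify that both environment variables are at least defined. This is a hypothetical helper, not part of the Mortar tooling; the variable names match the export lines above, and blank values count as set (fine when you don't need S3).

```python
import os

# Hypothetical check: report which AWS key variables are undefined.
# Blank values are acceptable, so only missing names are flagged.
def aws_keys_status(env=os.environ):
    missing = [name for name in ("AWS_ACCESS_KEY", "AWS_SECRET_KEY")
               if name not in env]
    return "ok" if not missing else "missing: " + ", ".join(missing)
```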
Generally the pattern of pigscript development follows a Write → Illustrate → Run paradigm.
It's easiest to write a Pig script incrementally, adding only one or two new aliases at a time. This makes debugging simpler.
Illustrate is the best tool to check what you've written so far. Illustrate will check your Pig syntax, and then show a small subset of data flowing through each alias in your pigscript.
To get the fastest results, use the local:illustrate command:

mortar local:illustrate pigscripts/my-sample-project.pig
Once the illustrate result is ready, a web browser tab will open to show the results:
When the script is finished and everything looks good in illustrate, it's time to run your job. The job will run on a new private AWS Elastic MapReduce Hadoop cluster, with the number of nodes that you specify.
mortar jobs:run pigscripts/my-sample-project.pig --clustersize 3
Use illustrate to be as certain of your results as you can be before doing a run. Illustrate is fast and free; runs will process your entire data set but will take much longer.
Now that we've got the basics, let's delve into running your Mortar project.