Most Mortar users load and store their data in Amazon S3. S3 provides secure, durable, and inexpensive storage and is accessible through many different tools and programming languages. Additionally, Mortar's Elastic MapReduce (EMR) Hadoop clusters run in the same datacenters as S3, so moving data into and out of S3 is fast.
Mortar connects to S3 via Amazon Web Services (AWS) Access Keys. These keys allow your Mortar Hadoop clusters to load input data from S3 and push output data back to S3.
New Mortar accounts are assigned Example Data AWS Access Keys. The Example Data keys can only load data from Mortar's example S3 buckets (mortar-example-data, twitter-gardenhose-mortar, etc.) and store it back to Mortar's example output bucket (mortar-example-data-output).
If you want to load and store your own S3 data, you'll need to add your AWS Access Keys to your Mortar account.
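As a quick illustration of the restriction above, here is a minimal Python sketch that checks whether an S3 URL falls inside the example buckets named in the text. It is only a local sanity check, not something Mortar itself runs; the function names and the sample object paths are hypothetical, and the input bucket set is incomplete because the text ends its list with "etc."

```python
from urllib.parse import urlparse

# Buckets named in the text above; the real list of example buckets is longer.
EXAMPLE_INPUT_BUCKETS = {"mortar-example-data", "twitter-gardenhose-mortar"}
EXAMPLE_OUTPUT_BUCKET = "mortar-example-data-output"


def bucket_of(s3_url):
    """Return the bucket name from a URL such as s3://bucket/key."""
    return urlparse(s3_url).netloc


def allowed_with_example_keys(s3_url, writing=False):
    """Return True if the Example Data keys would plausibly cover this URL."""
    bucket = bucket_of(s3_url)
    if writing:
        return bucket == EXAMPLE_OUTPUT_BUCKET
    return bucket in EXAMPLE_INPUT_BUCKETS


# Hypothetical paths, used only to exercise the check.
print(allowed_with_example_keys("s3://mortar-example-data/some/input.csv"))
print(allowed_with_example_keys("s3://my-own-bucket/results", writing=True))
```

If the check returns False for a path you need, that is a sign you should add your own AWS Access Keys as described in the following steps.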
To connect Mortar to your own S3 data, you'll need to create AWS Access Keys with S3 permissions. You can do this in the AWS Identity and Access Management (IAM) console.
Log in to your AWS IAM Console:
Choose the "Users" option from the left-hand sidebar, and click the "Create New Users" button:
Provide a descriptive user name, such as "mortar-iam-user", leave "Generate an access key for each User" checked, and click "Create":
Open the "Show User Security Credentials" section, and write down the "Access Key ID" and "Secret Access Key" for your newly created user:
While still on the Users tab in the IAM Console, click the checkbox next to the name of the user you created:
Choose the "Permissions" tab at the bottom of the screen, and click "Attach User Policy":
Scroll down inside "Select Policy Template" until you reach "Amazon S3 Full Access", and click "Select":
Click "Apply Policy":
Add the keys you copied down to your Mortar account on the Mortar AWS Settings page:
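The console steps above can also be scripted. The following is a minimal Python sketch, assuming you pass in an IAM client such as the one returned by boto3's `boto3.client("iam")`; the function name `create_mortar_user` is hypothetical, and the sketch attaches the AWS managed policy `AmazonS3FullAccess` as a stand-in for the "Amazon S3 Full Access" template chosen in the console.

```python
# AWS managed policy assumed to correspond to the "Amazon S3 Full Access" template.
S3_FULL_ACCESS_ARN = "arn:aws:iam::aws:policy/AmazonS3FullAccess"


def create_mortar_user(iam, user_name="mortar-iam-user"):
    """Create an IAM user, generate access keys, and grant S3 full access.

    `iam` is expected to behave like a boto3 IAM client.
    Returns (access_key_id, secret_access_key).
    """
    iam.create_user(UserName=user_name)
    access_key = iam.create_access_key(UserName=user_name)["AccessKey"]
    iam.attach_user_policy(UserName=user_name, PolicyArn=S3_FULL_ACCESS_ARN)
    return access_key["AccessKeyId"], access_key["SecretAccessKey"]


# Example usage (requires boto3 and AWS credentials allowed to manage IAM):
#   import boto3
#   key_id, secret = create_mortar_user(boto3.client("iam"))
```

The secret access key is only returned once, at creation time, so store it somewhere safe before entering both values on the Mortar AWS Settings page.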
If you want more fine-grained control over your key permissions, you can attach a custom IAM policy instead of the "Amazon S3 Full Access" template.
Next, you need to write the Policy Document. It is written in Amazon's JSON IAM policy format.
You can start from this policy document, which gives read access to the Mortar example datasets and write access to the Mortar example output buckets:
First, make sure that you keep the s3:ListAllMyBuckets stanza in your policy document. It provides a top-level list of the buckets available to the user, without allowing access to any of them. Most S3 libraries (and Mortar) require this permission to test whether your credentials are valid.
Add your input data buckets to the s3:ListBucket, s3:GetBucketLocation and s3:GetObject stanzas of the example document. These permissions allow your Mortar Hadoop clusters to find your bucket, list objects in the bucket, and read data from those objects.
Add your output data buckets to the s3:ListBucket, s3:GetBucketAcl, s3:GetBucketLocation stanza and the s3:GetObject, s3:GetObjectAcl, s3:DeleteObject, s3:PutObject stanza of the example document. These permissions allow your Mortar Hadoop clusters to find your bucket, list its contents, get objects from it, delete objects from it, and put objects into it.
The s3:GetBucketAcl and s3:GetObjectAcl output bucket permissions allow you to create S3-authenticated links to your output data, visible on the Mortar Job Details page. You can remove these permissions if you like, and those links will not be created.
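The stanzas described above can be assembled programmatically. The following is a minimal Python sketch, not Mortar's original policy document: the function name `build_mortar_policy`, the `include_acl` flag, and the buckets `my-input-bucket` and `my-output-bucket` are hypothetical, and the statement layout is an assumption based on the permissions listed in the text and the standard IAM JSON policy format.

```python
import json


def build_mortar_policy(input_buckets, output_buckets, include_acl=True):
    """Build an IAM policy dict from the stanzas described in the text."""

    def bucket_arns(names):
        return ["arn:aws:s3:::%s" % name for name in names]

    def object_arns(names):
        return ["arn:aws:s3:::%s/*" % name for name in names]

    output_bucket_actions = ["s3:ListBucket", "s3:GetBucketLocation"]
    output_object_actions = ["s3:GetObject", "s3:DeleteObject", "s3:PutObject"]
    if include_acl:
        # Optional: only needed for S3-authenticated links to output data.
        output_bucket_actions.append("s3:GetBucketAcl")
        output_object_actions.append("s3:GetObjectAcl")

    statements = [
        # Top-level bucket listing, used to check that credentials are valid.
        {"Effect": "Allow", "Action": ["s3:ListAllMyBuckets"],
         "Resource": "arn:aws:s3:::*"},
        # Input buckets: find and list the bucket, then read its objects.
        {"Effect": "Allow", "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
         "Resource": bucket_arns(input_buckets)},
        {"Effect": "Allow", "Action": ["s3:GetObject"],
         "Resource": object_arns(input_buckets)},
        # Output buckets: list the bucket, then read, write, and delete objects.
        {"Effect": "Allow", "Action": output_bucket_actions,
         "Resource": bucket_arns(output_buckets)},
        {"Effect": "Allow", "Action": output_object_actions,
         "Resource": object_arns(output_buckets)},
    ]
    return {"Version": "2012-10-17", "Statement": statements}


policy = build_mortar_policy(
    ["mortar-example-data", "twitter-gardenhose-mortar", "my-input-bucket"],
    ["mortar-example-data-output", "my-output-bucket"],
)
print(json.dumps(policy, indent=2))
```

Passing `include_acl=False` drops the two ACL permissions, at the cost of losing the links on the Mortar Job Details page.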
One gotcha to be aware of: the IAM Policy editor will reject your JSON if it has any whitespace before the first curly brace, so remove any leading whitespace if you see an error.
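To avoid that error, you can clean the text before pasting it. This is a minimal Python sketch using only the standard library; the helper name `prepare_policy_text` is hypothetical, and it checks only that the text is a well-formed JSON object, not that it is a valid IAM policy.

```python
import json


def prepare_policy_text(raw_text):
    """Strip surrounding whitespace and confirm the text is a JSON object."""
    text = raw_text.strip()
    parsed = json.loads(text)  # raises ValueError if the JSON is malformed
    if not isinstance(parsed, dict):
        raise ValueError("policy document must be a JSON object")
    return text


cleaned = prepare_policy_text('\n   {"Version": "2012-10-17", "Statement": []}\n')
print(cleaned[0])  # first character is now the opening curly brace
```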
Once you've added a policy to your user, you can add that user's AWS Access Keys to Mortar in the same way as described in "Adding IAM AWS Access Keys to Mortar" above.