Connecting MongoLab with Mortar

Choosing a Connection Strategy

If your MongoLab database is hosted in AWS us-east-1, use the Direct Mongo Connection strategy below. Mortar's clusters run in the AWS us-east-1 region, so they can load and store data from your Mongo database quickly.

If your database is hosted outside of AWS us-east-1 (in a different region or different cloud), use the Mongodump Data in S3 strategy below to connect to your data.

Direct Mongo Connection Strategy

For this strategy, you will connect directly to your database in MongoLab. If you have a replica set, be sure to connect to the secondary nodes to keep traffic off the primary. Your Mongo URI connection string will look like:


You can create the appropriate users and find your connection string in the MongoLab console. For a full walk-through of both tasks, see our blog post.
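
For orientation, the sketch below assembles a URI in the standard MongoDB connection string format and asks the driver to prefer secondary nodes through the readPreference option. The user, password, host names, database, and replica set name are hypothetical placeholders, and the build_mongo_uri helper is illustrative only; the values for your deployment come from the MongoLab console, and whether a given loader honors readPreference is an assumption to verify.

```python
from urllib.parse import quote_plus, urlencode


def build_mongo_uri(user, password, hosts, database, replica_set=None):
    """Assemble a standard MongoDB URI that prefers secondary nodes."""
    # Percent-encode credentials so characters like '@' or ':' stay unambiguous.
    credentials = f"{quote_plus(user)}:{quote_plus(password)}"
    host_list = ",".join(hosts)
    options = {"readPreference": "secondary"}
    if replica_set:
        options["replicaSet"] = replica_set
    return f"mongodb://{credentials}@{host_list}/{database}?{urlencode(options)}"


# Hypothetical placeholder values; replace them with those from the MongoLab console.
uri = build_mongo_uri(
    user="analyst",
    password="p@ss:word",
    hosts=["host-a.example.com:27017", "host-b.example.com:27017"],
    database="mydb",
    replica_set="rs-example",
)
print(uri)
```

Encoding the credentials matters because an unescaped "@" or ":" in a password would otherwise be read as a URI delimiter.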

Mongodump Data in S3 Strategy

This strategy lets you point Hadoop and Pig directly at MongoLab backups stored in Amazon S3.

MongoLab Backups

MongoLab produces recurring and one-time mongodump backups, suitable for use with Mortar. You can choose to store these backups in your own S3 bucket or in a bucket owned by MongoLab.

Produce and Download a Backup

MongoLab backups are stored in S3 as gzipped tar archives. Before Hadoop can process them, they must be decompressed and extracted.

Here are the steps to produce and download a MongoLab backup:

  1. Using MongoLab's automated backup process, confirm that you have a recent recurring backup or schedule a new one-time backup.
  2. After your backup has succeeded, open the Backups tab for your cluster and click the Download icon to download the most recent backup.
  3. On your local computer, un-tar and un-zip the backup file you downloaded: tar -zxvf mybackup.tgz
  4. Proceed to the Uploading Data to S3 step below.
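
If you would rather script the extraction in step 3 than run tar by hand, a minimal Python sketch using the standard tarfile module could look like this. The extract_backup helper and the directory names are hypothetical; the sketch assumes the backup is an ordinary gzipped tar archive, as described above.

```python
import tarfile
from pathlib import Path


def extract_backup(archive_path, dest_dir):
    """Extract a gzipped tar archive and return the .bson files it contained."""
    dest = Path(dest_dir).resolve()
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(str(archive_path), "r:gz") as archive:
        for member in archive.getmembers():
            # Refuse entries that would be written outside the destination.
            target = (dest / member.name).resolve()
            if dest != target and dest not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        archive.extractall(str(dest))
    # Return the extracted BSON files in a stable order.
    return sorted(dest.rglob("*.bson"))


# Example call (hypothetical paths):
# bson_files = extract_backup("mybackup.tgz", "backup")
```

The returned list makes it easy to pick out only the collections you plan to upload.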

Uploading Data to S3

Pig's Mongo BSON Loader will pick up input data from Amazon S3: a simple, inexpensive, and near-infinitely-large storage system at AWS. S3 stores data in “buckets,” which are similar to directories. Buckets contain files, which are called “objects” in S3. To learn more about S3, check out AWS’s S3 Details page.

You'll want to upload the BSON files you got from mongodump to an S3 bucket in your AWS account. That bucket must be in the US Standard region (AWS us-east-1) for Mortar's Hadoop clusters to process it efficiently. You only need to upload the files for the collections you want to analyze; you can start with a single collection. There are three steps to get your BSON files uploaded to S3:

  1. Find or create your AWS account
  2. Get your AWS access keys
  3. Upload your data to a new S3 bucket in your account

We’ll explore each of these in order.

1. Find or Create an AWS Account

If you already have an Amazon Web Services (AWS) account and a login to the AWS Management Console, you can skip this portion and move to the next step. Otherwise, you’ll need to create an AWS account where you can upload your data.

Creating an account at AWS is easy. To do so, visit the AWS homepage, click “Sign Up,” provide your information, and create your account. If AWS asks which products you intend to use, be sure to select Amazon S3. You’ll need to provide a credit card to AWS to cover any costs you incur, but note that AWS has a generous free usage tier to get you started, and that S3 is inexpensive.

2. Get your AWS Access Keys

Next, you’ll need to get your AWS Access Keys. These keys will allow you to create a new S3 bucket and upload your data to it.

There are two types of AWS Access Keys: account-level keys that provide full account access and fine-grained (IAM) keys that provide access only to specific AWS resources. This tutorial will use account-level keys, but if you prefer IAM keys (more complex), you can follow these alternate setup steps for IAM.

To get your account-level AWS Access Keys:

  1. Go to the AWS Security Credentials page.
  2. Open the “Access Keys” section and push the “Create New Access Key” button.
  3. Expand the “Show Access Key” link, and write down your Access Key ID and Secret Access Key in a secure location.

Note that AWS only allows two pairs of access keys to be active at a time. If you already have two active pairs of keys, you’ll need to look up the Secret Access Key for one of them from the Legacy Security Credentials page, or talk to your IT department to get them.
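
One way to keep the keys out of scripts once you have them is to read them from environment variables. The sketch below is illustrative only: load_aws_keys and mask are hypothetical helpers, and it assumes you have exported the keys under the names AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, which are the names the AWS command-line tools and SDKs conventionally read.

```python
import os


def load_aws_keys(environ=os.environ):
    """Return (access_key_id, secret_access_key) read from environment variables."""
    try:
        return environ["AWS_ACCESS_KEY_ID"], environ["AWS_SECRET_ACCESS_KEY"]
    except KeyError as missing:
        raise RuntimeError(f"environment variable {missing} is not set") from None


def mask(secret, visible=4):
    """Hide all but the last few characters so a key can be logged safely."""
    return "*" * max(len(secret) - visible, 0) + secret[-visible:]
```

Logging mask(secret) rather than the secret itself lets you confirm which key is in use without exposing it.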

3. Upload Your Data to a New S3 Bucket in US Standard Region

Now, we’re ready to upload our input data to a newly created S3 bucket. We’ll use the AWS Management Console to do this quickly and easily. (Check here for other upload options.)

First, create a new S3 bucket:

  1. Go to the S3 Management Console page in the AWS Management Console. If prompted, log in with your AWS username and password.
  2. Press the “Create Bucket” button to create a new bucket.
  3. Name your bucket, using dashes to separate words (e.g. mycompany-mortar-recs-data). Keep your bucket in the US Standard Region, where Mortar’s Hadoop clusters run, to ensure fast and free data transfer between Hadoop and S3.
  4. Press “Create Bucket” to make your new bucket.
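
Before you settle on a name, you can check it against a conservative reading of the S3 naming rules. This sketch is illustrative: is_safe_bucket_name is a hypothetical helper, and its pattern is deliberately stricter than S3 itself (for example, it rejects dots, which S3 permits).

```python
import re

# Conservative pattern: 3-63 characters, lowercase letters, digits, and dashes,
# beginning and ending with a letter or digit. (S3 also permits dots; omitted here.)
_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9-]{1,61}[a-z0-9]")


def is_safe_bucket_name(name):
    """Return True if the name fits the conservative pattern above."""
    return _BUCKET_NAME.fullmatch(name) is not None


print(is_safe_bucket_name("mycompany-mortar-recs-data"))
print(is_safe_bucket_name("My_Bucket"))
```

The first call accepts the example name from step 3; the second rejects a name containing uppercase letters and an underscore.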

Next, upload your extracted data files into the bucket:

  1. Click on the name of your newly created bucket in the S3 Management Console.
  2. Click the “Create Folder” button to create a new folder, and name it “input”.
  3. Click the “input” folder to open it up.
  4. Click the “Upload” button, select your files, and press “Start Upload” to upload them.
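
The same upload can be scripted instead of done through the console. The sketch below is one possible approach, not part of the steps above: it assumes the third-party boto3 library is installed and that your access keys are available to it (for example through environment variables). The object_key and upload_bson_files helpers are hypothetical names, and the bucket name is whatever you created earlier.

```python
from pathlib import Path


def object_key(local_path, prefix="input"):
    """Return the S3 object key for a local file placed under the given folder."""
    return f"{prefix}/{Path(local_path).name}"


def upload_bson_files(local_dir, bucket, prefix="input"):
    """Upload every .bson file in local_dir to the bucket under prefix/."""
    import boto3  # third-party AWS SDK; an assumption, not part of the original steps

    s3 = boto3.client("s3")  # picks up credentials from the environment or AWS config
    uploaded = []
    for path in sorted(Path(local_dir).glob("*.bson")):
        key = object_key(path, prefix)
        s3.upload_file(str(path), bucket, key)
        uploaded.append(key)
    return uploaded


# Example call (hypothetical directory and bucket):
# upload_bson_files("backup/dump", "mycompany-mortar-recs-data")
```

Keeping the files under the "input" prefix mirrors the folder created in the console steps.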

4. Set your AWS Access Keys in Mortar

While you are waiting for your data to upload, add your AWS Access Keys to your Mortar account on the Mortar AWS Settings page.

These keys are stored encrypted at Mortar and allow you to access your data in S3 from Mortar.

When the upload finishes, your input data will be stored in Amazon S3 and ready to load into Pig.