
Connecting a Self-Hosted MongoDB to Mortar


Choosing a Connection Strategy

If your database lives in AWS us-east-1, you can use the Direct Mongo Connection strategy below to connect to it. Mortar's clusters run in the AWS us-east-1 region, so they can quickly load data from (and store data back to) your Mongo database.

If your database is hosted outside of AWS us-east-1 (in a different region, different cloud, or on-premise), use the Mongodump Data in S3 strategy below to connect to your data.


Direct Mongo Connection Strategy

For this strategy, you will connect directly to your Mongo database. If you have a replica set, be sure to connect to the secondary nodes to keep traffic off the primary. Your Mongo URI connection string will look like:

mongodb://username:password@host:port/database.collection?readPreference=secondary
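
For example, a Pig script on Mortar could read from this URI using the mongo-hadoop connector's MongoLoader. This is a minimal sketch, assuming the connector is available on your cluster; called with no schema, MongoLoader loads each document as a single map field, and every part of the URI is a placeholder:

-- A sketch of loading documents through a secondary node.
-- Substitute your own credentials, host, database, and collection.
documents = LOAD 'mongodb://username:password@host:port/database.collection?readPreference=secondary'
            USING com.mongodb.hadoop.pig.MongoLoader();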

Mongodump Data in S3 Strategy

If your database is hosted outside AWS's us-east-1 region, there's still an easy way to process it with Mortar: copy a recent mongodump backup to an Amazon S3 bucket, and then point Hadoop and Pig directly at it. Here are the steps to do so:

Produce and Download a Backup

First, locate a recent mongodump backup or take a new one.

Once you locate or take a backup, extract it so that you have an untarred, decompressed copy of the BSON files ready to work with.
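
If you need to take a fresh backup, a typical mongodump invocation looks like the sketch below; every host, credential, and name is a placeholder, and pointing it at a secondary keeps the dump traffic off your primary:

# Sketch: dump a single collection to the local dump/ directory.
# Host, credentials, database, and collection are placeholders.
mongodump --host secondary.example.com --port 27017 \
          --username username --password password \
          --db database --collection collection --out dump/

# If your existing backup is a tarball, extract it to reach the .bson files:
tar -xzf mongodump-backup.tar.gz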

Uploading Data to S3

Pig's Mongo BSON Loader will pick up input data from Amazon S3: a simple, inexpensive, and near-infinitely-large storage system at AWS. S3 stores data in “buckets,” which are similar to directories. Buckets contain files, which are called “objects” in S3. To learn more about S3, check out AWS’s S3 Details page.

You'll want to upload the BSON files you got from mongodump to an S3 bucket in your AWS account. That bucket must be in the US Standard region (us-east-1) for Mortar's Hadoop clusters to process it efficiently. You only need to upload files for the collections you want to analyze; you can start with a single collection. There are four steps to get your BSON files uploaded to S3 and accessible to Mortar:

  1. Find or create your AWS account
  2. Get your AWS access keys
  3. Upload your data to a new S3 bucket in your account
  4. Set your AWS access keys in Mortar

We’ll explore each of these in order.

1. Find or Create an AWS Account

If you already have an Amazon Web Services (AWS) account and a login to the AWS Management Console, you can skip this portion and move to the next step. Otherwise, you'll need to create an AWS account where you can upload your input data.

Creating an account at AWS is easy. To do so, visit the AWS homepage, click “Sign Up,” provide your information, and create your account. If AWS asks which products you intend to use, be sure to select Amazon S3. You’ll need to provide a credit card to AWS to cover any costs you incur, but note that AWS has a very generous free usage tier to get you started, and that S3 pricing is very low.

2. Get your AWS Access Keys

Next, you’ll need to get your AWS Access Keys. These keys will allow you to create a new S3 bucket and upload your data to it.

There are two types of AWS Access Keys: account-level keys, which provide full account access, and fine-grained IAM keys, which provide access only to specific AWS resources. This tutorial uses account-level keys; if you prefer IAM keys (which take more setup), you can follow these alternate setup steps for IAM.

To get your account-level AWS Access Keys:

  1. Go to the AWS Security Credentials page.
  2. Open the “Access Keys” section and click the “Create New Access Key” button.
  3. Click the “Show Access Key” link, and record your Access Key ID and Secret Access Key in a secure location.

Note that AWS only allows two pairs of access keys to be active at a time. If you already have two active pairs, you’ll need to look up the Secret Access Key for one of them on the Legacy Security Credentials page, or ask your IT department for it.

3. Upload Your Data to a New S3 Bucket in US Standard Region

Now, we’re ready to upload our input data to a newly created S3 bucket. We’ll use the AWS Management Console to do this quickly and easily. (Check here for other upload options.)

First, create a new S3 bucket:

  1. Go to the S3 Management Console page in the AWS Management Console. If prompted, log in with your AWS username and password.
  2. Press the “Create Bucket” button to create a new bucket.
  3. Name your bucket, using dashes to separate words (e.g. mycompany-mortar-mongo-data). Keep your bucket in the US Standard Region, where Mortar’s Hadoop clusters run, to ensure fast and free data transfer between Hadoop and S3.
  4. Press “Create Bucket” to make your new bucket.

Next, upload your extracted data files into the bucket:

  1. Click on the name of your newly created bucket in the S3 Management Console.
  2. Click the “Create Folder” button to create a new folder, and name it “input”.
  3. Click the “input” folder to open it up.
  4. Click the “Upload” button, select your files, and press “Start Upload” to upload them.
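
If you prefer the command line to the console, both of these steps can also be scripted with the AWS CLI. This is a sketch, assuming you have the CLI installed and configured with the access keys from step 2; the bucket name and paths are placeholders:

# Sketch: create a bucket in US Standard (us-east-1); the name is a placeholder.
aws s3 mb s3://mycompany-mortar-mongo-data --region us-east-1

# Recursively upload the extracted BSON files into an "input" folder.
aws s3 cp dump/database/ s3://mycompany-mortar-mongo-data/input/ --recursive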

4. Set your AWS Access Keys in Mortar

While you are waiting for your data to upload, add your AWS Access Keys to your Mortar account on the Mortar AWS Settings page.

These keys will be stored encrypted at Mortar, allowing your Mortar jobs to access your data in S3.

When the upload finishes, your input data will be stored in Amazon S3 and ready to load into Pig.
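
As a preview of that loading step, a Pig script can read the uploaded BSON files with the mongo-hadoop connector's BSONLoader. This is a minimal sketch, assuming the connector is available on your cluster; the S3 path is a placeholder built from the bucket and folder created above:

-- Sketch: point Pig at a BSON file uploaded to S3.
-- Replace the bucket, folder, and file name with your own.
documents = LOAD 's3n://mycompany-mortar-mongo-data/input/collection.bson'
            USING com.mongodb.hadoop.pig.BSONLoader();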