
Get Recommendations

This section continues from the previous section about generating signals. Now that you have your data in the right format, the Mortar recommendation engine will take care of the rest!

Storing your Recommendations

The last thing you need to do before you are ready to run your recommendations is to uncomment the remaining portion of your script.

/******* Use Mortar recommendation engine to convert signals to recommendations **********/
-- Call the default Mortar recommender algorithm on your user-item data.
-- The user_signals alias needs to have the following fields: (user, item, weight:float)
item_item_recs = recsys__GetItemItemRecommendations(user_signals);
user_item_recs = recsys__GetUserItemRecommendations(user_signals, item_item_recs);

-- Store recommendations
rmf $OUTPUT_PATH/item_item_recs;
rmf $OUTPUT_PATH/user_item_recs;

store item_item_recs into '$OUTPUT_PATH/item_item_recs' using PigStorage();
store user_item_recs into '$OUTPUT_PATH/user_item_recs' using PigStorage();

This code takes your weighted user-item signals and generates recommendations for individual users and items. For more information about how the Mortar recommendation engine works, please read Recommendation Engine Basics. If you are only interested in generating recommendations for your items (and not your users), you can delete the user_item_recs line and its corresponding store statement.
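For a rough intuition of what the engine does with your signals, here is a simplified Python sketch of co-occurrence-based item-item recommendation. This is an illustration only, not Mortar's actual algorithm (which also normalizes for item popularity and traverses indirect links in the item graph): two items become linked when the same user has signals for both, and link weights accumulate across users.

```python
from collections import defaultdict
from itertools import permutations

def item_item_recs(signals):
    """Toy co-occurrence recommender. signals is a list of
    (user, item, weight) tuples; two items become linked when one
    user touches both, and the link weight sums, over users, the
    smaller of the two signal weights."""
    by_user = defaultdict(dict)
    for user, item, weight in signals:
        by_user[user][item] = by_user[user].get(item, 0.0) + weight

    links = defaultdict(float)
    for items in by_user.values():
        for a, b in permutations(items, 2):
            links[(a, b)] += min(items[a], items[b])

    # For each item, list linked items in descending weight order.
    recs = defaultdict(list)
    for (a, b), w in sorted(links.items(), key=lambda kv: -kv[1]):
        recs[a].append((b, w))
    return recs

signals = [
    ("alice", "film1", 1.0), ("alice", "film2", 0.5),
    ("bob",   "film1", 1.0), ("bob",   "film2", 1.0),
]
print(item_item_recs(signals)["film1"])  # [('film2', 1.5)]
```

The real engine additionally discounts very popular items (hence the separate weight and raw_weight fields in the output) so that blockbusters don't dominate every recommendation list.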


If you would like to store your recommendations directly into MongoDB (instead of to S3) replace the section of the script below "-- Store recommendations" with the following code:

-- Store recommendations
%default DB '<your_database>'
%default II_COLLECTION 'item_item_recs'
%default UI_COLLECTION 'user_item_recs'

store item_item_recs into
    '$CONN/$DB.$II_COLLECTION' using com.mongodb.hadoop.pig.MongoInsertStorage('','');

store user_item_recs into
    '$CONN/$DB.$UI_COLLECTION' using com.mongodb.hadoop.pig.MongoInsertStorage('','');


If you would like to store your recommendations directly into a SQL database (instead of to S3) replace the section of the script below "-- Store recommendations" with the following code:

-- Store recommendations

-- Definitions for PostgreSQL example; change for other providers
%default DATABASE_TYPE 'postgresql'
%default DATABASE_DRIVER 'org.postgresql.Driver'
%default DATABASE_HOST '<host>:<port>'
%default DATABASE_NAME '<dbname>'
%default DATABASE_USER '<username>'
%default II_TABLE '<ii-table-name>'
%default UI_TABLE '<ui-table-name>'

store item_item_recs into 'hdfs:///unused-ignore' using org.apache.pig.piggybank.storage.DBStorage(
   '$DATABASE_DRIVER',
   'jdbc:$DATABASE_TYPE://$DATABASE_HOST/$DATABASE_NAME',
   '$DATABASE_USER',
   '$DATABASE_PASS',
   'INSERT INTO $II_TABLE(from_id,to_id,weight,raw_weight,rank) VALUES (?,?,?,?,?)');

store user_item_recs into 'hdfs:///unused-ignore' using org.apache.pig.piggybank.storage.DBStorage(
   '$DATABASE_DRIVER',
   'jdbc:$DATABASE_TYPE://$DATABASE_HOST/$DATABASE_NAME',
   '$DATABASE_USER',
   '$DATABASE_PASS',
   'INSERT INTO $UI_TABLE(from_id,to_id,weight,reason_item,user_reason_item_weight,item_reason_item_weight,rank) VALUES (?,?,?,?,?,?,?)');

You must first create the target tables in your database:

    CREATE TABLE <ii-table-name>
        (from_id CHARACTER VARYING NOT NULL, to_id CHARACTER VARYING NOT NULL,
         weight NUMERIC, raw_weight NUMERIC, rank INTEGER NOT NULL, PRIMARY KEY (from_id, rank));

    CREATE TABLE <ui-table-name>
        (from_id CHARACTER VARYING NOT NULL, to_id CHARACTER VARYING NOT NULL,
         weight NUMERIC, reason_item CHARACTER VARYING, user_reason_item_weight NUMERIC,
         item_reason_item_weight NUMERIC, rank INTEGER NOT NULL, PRIMARY KEY (from_id, rank));

Setting reasonable primary keys and indexes is generally necessary to get acceptable query times on the large amount of resulting data.

After creating those tables, you will need to set your database password. You can securely store this value for your project by doing:

mortar config:set DATABASE_PASS=<password>

Running in the Cloud

Now that your script is ready, it's time to run this job in the cloud. Before doing this, you need to pick the number of nodes you want to use for running your job. For your first run, use a cluster size of 10. This is only an initial value; you should adjust your cluster size based on how long your job takes and your individual requirements for acceptable run time.

The Mortar recommender project defaults to using AWS spot instances. While spot instance prices aren’t guaranteed, they’re typically around $0.14 per hour per node. On very rare occasions when the spot price goes above $0.44 per hour per node, your cluster will be automatically terminated and the job will fail. In most cases the significant cost savings of spot instances will be worth the very small chance of your job failing, but read Setting Project Defaults if you would like to change the defaults of your project.

It's a good idea to shut down your cluster when you are done with it. This can be done by passing the '--singlejobcluster' option when running your job or by going to the Clusters Page. If you forget, Mortar will automatically shut down your cluster after it has been idle for 1 hour.

Ok, now you’re ready to run!

mortar run pigscripts/my-recommender.pig -f params/my_recommender.params --clustersize 10

After starting your job you will see output similar to:

Taking code snapshot... done
Sending code snapshot to Mortar... done
Requesting job execution... done
job_id: some_job_id

Job status can be viewed on the web at:

Or by running:

    mortar jobs:status some_job_id --poll

This tells you that your job has started successfully and gives you two common ways to monitor the progress of your job.

Monitoring Your Job's Progress

The example recommender tutorial covered how to monitor your job's progress, but let's recap what you will do.

Open up your job status page at the url displayed after you started your job.

Job Status Page

The top of the page shows your job's overall progress. Remember that there are three main stages to how your job runs:

  1. Validation - Mortar checks your script for some simple error conditions.
  2. Cluster Starting - Mortar starts a Hadoop cluster for your job. This can take 5-15 minutes.
  3. Running - Your job is running on your cluster.

Once your job starts running on your cluster, you can use the visualization tab to see how your job is broken up into Hadoop Map/Reduce jobs and to see various metrics about how your job is doing.

Illustrate Results

On the details tab you will see various metadata about this job. Mortar keeps track of the exact code and parameters you used to run this job. As you are developing your recommendation engine it can be useful to go back to one of your previous jobs to get a sense of what you changed and how that affected your results.

If your job fails the details tab will show you the error that occurred. To diagnose errors you can use the log tabs on the right to get more detailed information.

Viewing Your Recommendations


To view recommendations stored in MongoDB you can connect normally and query the collections you used to store your results. Skip down to "Evaluating Your Results" below.


To view recommendations stored in a SQL database you can connect normally and query the tables you used to store your results. Skip down to "Evaluating Your Results" below.
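As a sketch of the kind of query you might run against those tables, the snippet below uses SQLite purely so the example is self-contained; for PostgreSQL or another provider, substitute your own driver and connection details. The table mirrors the item-item schema created earlier:

```python
import sqlite3

# In-memory stand-in for the <ii-table-name> table defined above.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE item_item_recs
    (from_id TEXT NOT NULL, to_id TEXT NOT NULL,
     weight NUMERIC, raw_weight NUMERIC, rank INTEGER NOT NULL,
     PRIMARY KEY (from_id, rank))""")
conn.executemany(
    "INSERT INTO item_item_recs VALUES (?,?,?,?,?)",
    [("film1", "film2", 0.9, 1.5, 1),
     ("film1", "film3", 0.4, 0.7, 2)])

# Top recommendations for one item, best first, using the rank column.
rows = conn.execute(
    "SELECT to_id, weight FROM item_item_recs "
    "WHERE from_id = ? ORDER BY rank LIMIT 5", ("film1",)).fetchall()
print(rows)  # [('film2', 0.9), ('film3', 0.4)]
```

Ordering by the precomputed rank column (rather than re-sorting by weight) is what the (from_id, rank) primary key is designed to make fast.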

Once your job has finished, it's time to take a look at your results. Your data will be organized in the following format in the S3 bucket you used for your OUTPUT_PATH parameter.

* item_item_recs
  * part-r-00000
  * part-r-00001
  * ...
  * part-r-NNNNN
* user_item_recs
  * part-r-00000
  * part-r-00001
  * ...
  * part-r-NNNNN

Your recommendations are broken up into multiple "part" files based on how Hadoop distributes the processing of your job.
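Each part file is plain tab-delimited text, so combining them after download is just concatenation. A small Python sketch (it fabricates a local item_item_recs directory in the layout shown above so the example is self-contained):

```python
import glob
import os
import tempfile

# Simulate a downloaded item_item_recs directory with two part files.
out = tempfile.mkdtemp()
recs_dir = os.path.join(out, "item_item_recs")
os.mkdir(recs_dir)
with open(os.path.join(recs_dir, "part-r-00000"), "w") as f:
    f.write("film1\tfilm2\t0.9\t1.5\t1\n")
with open(os.path.join(recs_dir, "part-r-00001"), "w") as f:
    f.write("film1\tfilm3\t0.4\t0.7\t2\n")

# Read every part file, in order, into one list of rows.
rows = []
for path in sorted(glob.glob(os.path.join(recs_dir, "part-r-*"))):
    with open(path) as f:
        rows.extend(line.rstrip("\n").split("\t") for line in f)

print(len(rows))  # 2
```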

Open up the web url displayed after you started your job and go to the details tab.

Download Results

From this tab you can download some or all of your result files. Depending on the size of your data you may want to use a tool like the AWS Command Line Interface, Transmit, or s3cmd to download your results, providing the AWS keys you wrote down previously to your tool of choice.

Once you have your results, open them and take a look.

Evaluating Your Results

The schema for the item-item recommendation output is:

  • item_A: the item recommendations are generated for
  • item_B: the item being recommended
  • weight: weight of the link after normalizing for popularity
  • raw_weight: original weight of the link. Note that if the link came from traversing the graph (i.e., the link is not direct), this value will be null.
  • rank: order of the recommendation (e.g., 2 indicates that the row is the second recommendation for item_A)

The schema for the user-item recommendation output is:

  • user: the user the recommendations are generated for
  • item: item being recommended
  • weight: weight of link between user and recommended item
  • reason_item: the item linking this user to the recommended item
  • user_reason_item_weight: weight of link between user and reason_item
  • item_reason_item_weight: weight of the link between the reason_item and the recommended item
  • rank: order of the recommendation (e.g., 2 indicates that the row is the second recommendation for the user)
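Both outputs are tab-delimited with one recommendation per line, so each row maps directly onto the schemas above. A hedged Python sketch of parsing the item-item rows (PigStorage writes a null raw_weight as an empty field, which is how this sketch detects indirect links):

```python
from collections import namedtuple

ItemItemRec = namedtuple(
    "ItemItemRec", "item_A item_B weight raw_weight rank")
UserItemRec = namedtuple(
    "UserItemRec",
    "user item weight reason_item user_reason_item_weight "
    "item_reason_item_weight rank")

def parse_item_item(line):
    """Map one tab-delimited output row onto the item-item schema.
    An empty raw_weight field means the link came from traversing
    the graph rather than a direct signal."""
    a, b, weight, raw_weight, rank = line.rstrip("\n").split("\t")
    return ItemItemRec(a, b, float(weight),
                       float(raw_weight) if raw_weight else None,
                       int(rank))

rec = parse_item_item("film1\tfilm2\t0.9\t\t3\n")
print(rec.raw_weight, rec.rank)  # None 3
```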

An important note on the weight fields: while a larger weight does indicate a stronger relationship, the absolute weight doesn't mean anything on its own. It will vary with the number of signals, the weights assigned to them, and the total number of items available. Moreover, you cannot conclude that a link with twice the weight is twice as strong; there is no guarantee of a linear relationship.

Judging the initial quality of your recommendations requires some knowledge of your domain and your data. A good approach to take is to pick a few items that are very different (a romance and a horror movie, a pop artist and an instrumentalist, an outdoor and a kitchen good, etc.) and see that the recommendations for each of those make sense.

Congratulations, you've just generated your first recommendations!