
Interpret Results

Once you've done a successful run of the recommendation engine, the next step is to decide whether the recommendations you have created are likely to drive sales and engagement.

Data Dive

Deciding whether recommendations are "good" without doing A/B testing is not an exact science. The process generally combines a lot of common sense with domain expertise. There is no one true way, but here are some suggestions.

Get Detailed Information

There is an alternative macro to recsys__GetItemItemRecommendations called recsys__GetItemItemRecommendationsDetailed that provides more detail about how links are generated. It takes longer to run than the default macro, but it can be very helpful for understanding the results.

import 'recommender_alternatives.pig';

-- raw_input is assumed to have been loaded earlier in your script.
-- The detailed macro needs a fourth field, signal_type, recording
-- where each signal came from.
user_signals = foreach raw_input generate
             user as user,
             item as item,
             1.0  as weight,
             'PURCHASE' as signal_type;

item_item_recs = recsys__GetItemItemRecommendationsDetailed(user_signals);

The detailed macro requires a fourth field in user_signals, called signal_type, that describes where the signal came from.

The output then includes two additional data fields:

  • link_data: the count of each type of signal that generated the recommendation. This is the sum of all signals to either item among the people who interacted with both; the number of those people is reported as NUM_USERS. If the link is not direct, this value will be null.
  • linking_item: if the link is not direct, this shows the item that came between item_A and item_B on the graph (i.e. if item_A -> item_C -> item_B, this value would be item_C). If the link is direct, this value will be null.

On the example retail recommender, the output looks like this (note that the fourth row is an indirect link, so its last field is a linking_item rather than link_data):

fargo   the godfather   6.8140154   2.6100628   1   [WISHLIST#4,NUM_USERS#5,PURCHASE#6]
fargo   butch cassidy and the sundance kid  6.49299 1.9853055   2   [WISHLIST#1,NUM_USERS#3,PURCHASE#5]
fargo   48 hours    5.010393    1.9853055   3   [WISHLIST#1,NUM_USERS#3,PURCHASE#5]
fargo   the untouchables    3.1157842       4       butch cassidy and the sundance kid
fargo   m.  2.971754    1.3863515   5   [WISHLIST#4,NUM_USERS#3,PURCHASE#3]

Focus on Item-Item Recommendations

Even if your goal is to personalize results for users, the intermediate item-item results give a very good indication of how good the recommendations will be. It's nearly impossible to decide whether the recommendations for an unknown user are good or not: it's much easier to know if Toy Story is a good recommendation for Antz than if Toy Story is a good recommendation for Susie G. in Chicago. Even if you have internal users to look at, they may get odd results if they have signals generated during testing or development.

Choose a Small Number of Sentinel Items

Make a short list of items that have different characteristics that make them likely to have disparate audiences. For the LastFM example, we might decide we want: heavy metal, classical, pop, jazz, and something obscure. Then we'd pick good examples of each of those categories and view their results every time we evaluate the recommender.
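
To make this inspection routine, you can pull just the sentinel items out of the item-item results each time you run the engine. This is a minimal sketch: the field name item_A matches the detailed output above, and the titles stand in for your own sentinel list.

-- Sketch: keep only recommendations whose source item is a sentinel,
-- then dump them for manual review
sentinel_recs = filter item_item_recs by item_A == 'fargo'
                                      or item_A == 'the godfather'
                                      or item_A == '48 hours';
dump sentinel_recs;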

Be Specific

In looking at your sentinel items, take note of any problems you see with the recommendations. A few things to look for:

  • Do you have too little variation in your recommendations?
  • Do everyone's recommendations include the same really popular items?
  • Are some items missing recommendations? (A quick coverage check is sketched after this list.)
  • Do some recommendations seem totally unrelated to the source item?
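
As a rough coverage check (a sketch, reusing the item_item_recs and item_A names from above), you can count how many distinct items actually received recommendations and compare that number against the size of your catalog:

-- Count the distinct source items that have at least one recommendation
rec_items = distinct(foreach item_item_recs generate item_A as item);
rec_item_count = foreach (group rec_items all) generate COUNT(rec_items);
dump rec_item_count;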

Once you have a list of issues to solve, divide them into two sets:

  1. Signal problems: These are issues with your data that you can solve when extracting your data or in the signal generation section of your script. You can address these issues by doing things like removing duplicated items from your data (a one-line example follows this list), gathering more data, removing invalid data, making sure all of your data is from the same time period, etc.
  2. Algorithm problems: These are issues with how the recommendations are generated, which can be addressed by modifying the recommendation algorithm.
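
For example, the simplest signal-level fix, deduplication, is a single statement (assuming the user_signals relation shown earlier):

-- Drop exact duplicate signals so a repeated event for the same
-- user/item pair doesn't count more than once
deduped_signals = distinct user_signals;
item_item_recs = recsys__GetItemItemRecommendationsDetailed(deduped_signals);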

Create Readable Output

If you are working with item ids, it's very hard to interpret the results without first converting them to readable strings. There are helper functions located in recsys_helper.pig that take the output of the standard recommendation engine macros and add translations of the ids.

import 'recsys_helper.pig';

-- PigStorage takes a field delimiter, not field names; declare the
-- schema with AS instead (tab-delimited input assumed)
signals = load '$INPUT_PATH' using PigStorage()
          as (user_id: chararray, user_name: chararray,
              item_id: chararray, item_name: chararray);

item_names = distinct(foreach signals generate item_id as id, item_name as name);
user_names = distinct(foreach signals generate user_id as id, user_name as name);

item_item_recs_names = Recsys__ItemRecNamesFromIds(item_item_recs, item_names);
user_item_recs_names = Recsys__UserRecNamesFromIds(user_item_recs, user_names, item_names);
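
You can then write the readable results somewhere convenient for review; the output path here is just an example:

-- Store the human-readable recommendations for inspection
store item_item_recs_names into '$OUTPUT_PATH/item_item_recs_names' using PigStorage();
store user_item_recs_names into '$OUTPUT_PATH/user_item_recs_names' using PigStorage();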

Signal Techniques

Described in more detail in their own section, these techniques focus on cleaning up and improving data before it enters the recommendation engine.
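
One common example is weighting signal types differently before they reach the engine. This is a sketch only: it assumes your raw input carries a signal_type field, and the 1.0/0.5 weights are placeholders for values you would tune.

-- Sketch of a signal technique: purchases count for more than
-- wishlist adds (the weights here are arbitrary placeholders)
user_signals = foreach raw_input generate
             user as user,
             item as item,
             (signal_type == 'PURCHASE' ? 1.0 : 0.5) as weight,
             signal_type as signal_type;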

Algorithm Techniques

As discussed in their own section, these are modifications to the engine algorithm that either use additional types of data or alter the way signals are processed.

  • Adjust the Prior: Tweak the parameter that handles items with small sample size.
  • Add Item-Item Links: Include relationships directly between items. Requires metadata about items to generate the relationships.
  • Modify Item-Item Links: Use relationships directly between items to modify the results of the user-item signals. Requires metadata about items.
  • User-Item Diversity: Give users recommendations based on a more diverse set of items.
  • Item-Item Diversity: Add variety to item-item recommendations using metadata.
  • Popularity Boost: Increase the prevalence of popular items in results.
  • In-Stock Items: Don't return items that aren't currently available to users. (A sketch of this idea follows the list.)
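
As one illustration, the In-Stock Items idea can be approximated with a join against a relation of currently available item ids. This is a minimal sketch: the input path, the item_B and weight field names, and the schema are all assumptions.

-- Sketch of the In-Stock Items technique: keep only recommendations
-- whose recommended item appears in the list of available ids
in_stock = load '$IN_STOCK_PATH' using PigStorage()
           as (item: chararray);
joined = join item_item_recs by item_B, in_stock by item;
available_recs = foreach joined generate
             item_item_recs::item_A as item_A,
             item_item_recs::item_B as item_B,
             item_item_recs::weight as weight;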

Where to Start

Here are some suggested techniques, depending on which issues you've identified in your results. Within each section the techniques are listed in order of suggested consideration, so scan through the headings to identify your primary issue, then look at each suggested technique in order to see if it is appropriate for your data.

Results Are Too Weird/Obscure

Results Have a Lot of "Wrong" Items

Results Are Too Similar to Each Other

Results Include Unavailable Items

Many Items Don't Have Recommendations

Results Are Just OK

Cold Start Problem

When new users come to a web site or application, there will initially be no data about their preferences. While this can make it hard to generate personalized recommendations for newcomers, there are techniques and tricks for handling this issue. For further discussion, see Handle Cold Start under Integration Best Practices.
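
One common fallback, sketched below with the user_signals fields used throughout this page, is to serve the overall most popular items to users who have no history yet. This is not a macro from the engine itself, just an illustration.

-- Sketch of a cold-start fallback: rank items by total signal weight
-- and serve the top N to brand-new users
by_item = group user_signals by item;
popularity = foreach by_item generate
             group as item,
             SUM(user_signals.weight) as total_weight;
popularity_ordered = order popularity by total_weight desc;
top_items = limit popularity_ordered 10;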