Algorithm Techniques

These techniques modify and extend the standard Mortar recommendation engine to better work with your data.

Examples of Techniques

For most of the techniques described, the mortar-recsys project comes with example scripts. Each of these examples builds on the retail example previously described. Some of the examples use an additional data set, inventory.json (found in data/retail/inventory.json), that represents the current inventory.


Adjust the Prior

For:

  • Results include too many weird and unpopular items
  • Results feel generally "not great"

Where:

  • my_recommender.params

The BAYESIAN_PRIOR parameter has an effect on recommendations that can be hard to predict. It guards against item outliers that have very small sample sizes. When only a few users have interacted with an item, it’s hard to draw inferences about it. If both people who have read Uncle Bob’s self-published novella also read Pride and Prejudice, we shouldn’t conclude that Bob’s novella is a good recommendation for readers of Pride and Prejudice. The prior controls for that by pretending that there were other people who read Bob’s novella but didn’t read Pride and Prejudice.

Generally, if the prior needs to be adjusted, it should be larger. In particular, if your results include a lot of oddball, esoteric items (and that’s not what you want), increasing the prior can help. How much to increase it by is hard to say, but try doubling it and see what happens.
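To make the effect of the prior concrete, here is a minimal Python sketch. It is not the formula the recommendation engine uses (this page does not give one). It assumes a simple smoothed score, co-readers divided by (readers + prior), and the function name and all counts are hypothetical.

```python
def link_score(co_readers, item_readers, prior=0.0):
    """Share of an item's readers who also read the target item,
    smoothed by pretending `prior` extra readers did NOT read the target.
    Illustrative assumption only; not the engine's actual formula."""
    return co_readers / (item_readers + prior)


# Bob's novella: 2 readers, both of whom also read Pride and Prejudice.
novella_raw = link_score(2, 2)                 # no prior
novella_smoothed = link_score(2, 2, prior=4)   # 2 / 6
novella_doubled = link_score(2, 2, prior=8)    # 2 / 10 after doubling the prior

# A well-read item: 1000 readers, 400 of whom also read Pride and Prejudice.
popular_raw = link_score(400, 1000)
popular_smoothed = link_score(400, 1000, prior=4)

print(novella_raw, novella_smoothed, novella_doubled)
print(popular_raw, popular_smoothed)
```

Under these assumptions the novella outranks the well-read item when there is no prior, and drops below it once a prior is applied. Doubling the prior pushes the novella down further while barely moving the score of the item with a large sample.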


Add Item-Item Links

For:

  • When there is data not captured by the user-item signals
  • You have metadata that relates items to one another

Where:

  • my-recommender.pig

It may be that you have data that links items directly to other items: photos where two pieces of jewelry appear together, or items that appear on the same wishlist. It might also be that you have metadata that links items together: books and articles written by the same author, or clothing from the same designer. That additional data can be incorporated into the recommendation engine.

How:

The first step is to generate item-item links. This is done either by loading a data set that directly contains these logical connections, or by generating them from existing data (e.g., items on the same wishlist). The item-item signals should have the following schema:

item_signals : { (
  item_A: chararray,
  item_B: chararray,
  weight:  float
) }

Choose the weight based on the strength of the relationship between the two items, just as you did when generating your signals. For example, if two books have the same author, you might link them with a weight of five times the buy signal. Once generated, the item-item links need to be passed as an argument into the recsys__GetItemItemRecommendations_AddItemItem macro. This replaces the standard recsys__GetItemItemRecommendations macro that would normally be used. Note that the new macro takes an additional argument of your item_signals.

-- generate user_signals
  --
  --
-- generate your item-item links
item_signals = load '$ITEM_SIGNALS' as (
                    item_A: chararray,
                    item_B: chararray,
                    weight: float
                    );

item_item_recs = recsys__GetItemItemRecommendations_AddItemItem(user_signals, item_signals);
--item_item_recs = recsys__GetItemItemRecommendations(user_signals); replace this macro.
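As an illustration of generating links from existing data, the following Python sketch derives (item_A, item_B, weight) rows from wishlists. It is a sketch under assumptions, not part of mortar-recsys: the function name, the sample wishlists, and the weight per co-occurrence are hypothetical, and whether the macro expects each link in both directions is not stated here, so emitting both is an assumption.

```python
from collections import Counter
from itertools import combinations


def links_from_wishlists(wishlists, weight_per_cooccurrence=1.0):
    """Return (item_A, item_B, weight) rows for every pair of items that
    share at least one wishlist. Each link is emitted in both directions
    (an assumption; adjust if only one direction is needed)."""
    pair_counts = Counter()
    for items in wishlists.values():
        # De-duplicate and sort so each unordered pair is counted once per list.
        for a, b in combinations(sorted(set(items)), 2):
            pair_counts[(a, b)] += 1

    rows = []
    for (a, b), count in sorted(pair_counts.items()):
        weight = count * weight_per_cooccurrence
        rows.append((a, b, weight))
        rows.append((b, a, weight))
    return rows


# Hypothetical wishlist data.
wishlists = {
    "user_1": ["ring", "necklace", "bracelet"],
    "user_2": ["ring", "necklace"],
}
item_signals = links_from_wishlists(wishlists, weight_per_cooccurrence=0.5)

for row in item_signals:
    print(row)
```

Items that share more wishlists receive a proportionally larger weight, which mirrors the advice to choose the weight based on the strength of the relationship.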

Example:

Under pigscripts/techniques/add-item-item-technique.pig, you can see the inventory data set loaded. This inventory data is then used to generate item-item links by creating a link between movies with the same genre. These item-item links are then passed on to recsys__GetItemItemRecommendations_AddItemItem.

mortar local:run pigscripts/techniques/add-item-item-technique.pig -f params/technique.params

Modify Item-Item Links

For:

  • You have metadata that relates items to one another
  • You want to add a negative signal between items
  • You want to diversify your results

Where:

  • my-recommender.pig

Adding new item-item links is one way of incorporating metadata about items into the recommendations. Another way is to modify the links that are already there, upgrading or downgrading them based on meta-factors. This is a great way to diminish signal you don’t want, such as overlap between classical and heavy metal music.

This differs from adding item-item links in a couple of ways. First, it’s usually best to modify weights by a multiplicative factor (e.g., 0.7), so that the adjustment is proportionally the same for large affinities as for small ones. Second, modifying links does not add new ones, so you can’t add any affinities that aren’t present in the user data.
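The difference between a multiplicative and an additive adjustment can be seen in a short Python sketch. The link names and weights are made up for illustration; only the 0.7 factor comes from the text above.

```python
def adjust_multiplicative(weight, factor=0.7):
    """Scale a link weight by a constant factor."""
    return weight * factor


def adjust_additive(weight, delta=-0.3):
    """Shift a link weight by a constant amount."""
    return weight + delta


# Hypothetical link weights: one strong affinity and one weak affinity.
weights = {"strong_link": 10.0, "weak_link": 0.2}

multiplied = {name: adjust_multiplicative(w) for name, w in weights.items()}
shifted = {name: adjust_additive(w) for name, w in weights.items()}

print(multiplied)  # both links keep 70% of their original weight
print(shifted)     # the strong link barely changes; the weak link goes negative
```

The multiplicative version reduces both links by the same proportion and keeps them positive. The additive version has almost no effect on the strong link and drives the weak link below zero.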

How:

The first step is to find or generate the metadata that will provide some sort of classification for a particular item. The generated metadata should have the following schema:

  metadata : { (
      item: chararray,
      metadata_field: chararray
  ) }

The metadata_field should describe some sort of classification of the item (e.g., the genre of a movie). Then replace recsys__GetItemItemRecommendations with recsys__GetItemItemRecommendations_ModifyCustom, and add the desired weight adjustment in the Custom Code section.

/******** Custom GetItemItemRecommendations *********/
define recsys__GetItemItemRecommendations_ModifyCustom(user_item_signals, metadata) returns item_item_recs {

    -- Convert user_item_signals to an item_item_graph
    ii_links_raw, item_weights   =   recsys__BuildItemItemGraph(
                                       $user_item_signals,
                                       $LOGISTIC_PARAM,
                                       $MIN_LINK_WEIGHT,
                                       $MAX_LINKS_PER_USER
                                     );
    -- NOTE: this macro is added in order to combine metadata with item-item links.
    -- See the macro for a more detailed explanation.
    ii_links_metadata           =   recsys__AddMetadataToItemItemLinks(
                                        ii_links_raw, 
                                        $metadata
                                    ); 


    --The code here should adjust the weights based on an item-item link and the equality of metadata.
    -- In this case, if the metadata is the same, the weight is reduced.  Otherwise the weight is left alone.
    ii_links_adjusted           =  foreach ii_links_metadata generate item_A, item_B,

    /********* Custom Code starts here ********/

    -- The size of the weight adjustment depends on the data domain and on the results you expect.
    -- It is always best to adjust the weight by multiplying it by a factor rather than adding a constant.
    (metadata_B == metadata_A ? (weight * 0.5): weight) as weight;

    /******** Custom Code stops here *********/

    -- remove negative weights, just in case
    ii_links_adjusted_filt = foreach ii_links_adjusted generate item_A, item_B,
                                      (weight <= 0 ? 0: weight) as weight; 
    -- Adjust the weights of the graph to improve recommendations.
    ii_links                    =   recsys__AdjustItemItemGraphWeight(
                                        ii_links_adjusted_filt,
                                        item_weights,
                                        $BAYESIAN_PRIOR
                                    );

    -- Use the item-item graph to create item-item recommendations.
    $item_item_recs =  recsys__BuildItemItemRecommendationsFromGraph(
                           ii_links,
                           $NUM_RECS_PER_ITEM, 
                           $NUM_RECS_PER_ITEM
                       );
};

Copy the code above (or from the example file) and change only the section between the "Custom Code starts here" and "Custom Code stops here" comments. Set the weight adjustment based on the relationship between the metadata of the two items. Then use this customized macro in place of recsys__GetItemItemRecommendations.

-- generate user_signals
  --
  --

-- generate metadata
metadata = load '$METADATA_SOURCE' as (
                    item: chararray,
                    metadata_field:chararray
                  );

item_item_recs = recsys__GetItemItemRecommendations_ModifyCustom(user_signals, metadata);
--item_item_recs = recsys__GetItemItemRecommendations(user_signals); replace this macro

Example:

In pigscripts/techniques/modify-item-item-technique.pig we continue the analysis of the movie retail store. This example weakens the connection between two items when the genre is the same. The genre acts as the metadata_field and the movie title is the item. The customized macro is placed at the top of the file. When the metadata_field (the genre) of two movies is the same, the weight is halved; otherwise the weight is left alone. This weakens the links generated from building the item-item graph when the movies are the same genre, which gives more diverse results. Alternatively, the weights could have been strengthened to create recommendations of more similar movies.

mortar local:run pigscripts/techniques/modify-item-item-technique.pig -f params/technique.params

User-Item Diversity

For:

  • Items recommended to a user are all too similar

Where:

  • my_recommender.params

If a user purchases a book, a lightbulb, and a snowboard, we probably shouldn’t show them only lighting-related recommendations, or only snow-sport-related recommendations. We want to show the user a mix of items that relate to what they’ve already seen or purchased. This might happen naturally, but it might not if some categories of item have very strong affinities and others don’t. To encourage diversity, we can add increasing penalties for having more than one recommendation tied to a single user-item affinity.
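The idea of increasing penalties can be sketched in Python as follows. This is an illustrative assumption about how such a penalty might work, not the engine's implementation: the function name, the geometric penalty of 0.5, and the sample scores are all hypothetical.

```python
def diversify_by_source(candidates, num_recs, penalty=0.5):
    """Greedily pick recommendations for one user.

    candidates: list of (source_item, recommended_item, score), where
    source_item is the item the user already interacted with.
    Each further pick tied to the same source item has its score
    multiplied by `penalty` once more (penalty ** picks_so_far)."""
    remaining = list(candidates)
    picks_per_source = {}
    results = []
    while remaining and len(results) < num_recs:
        best = max(
            remaining,
            key=lambda c: c[2] * penalty ** picks_per_source.get(c[0], 0),
        )
        remaining.remove(best)
        results.append(best[1])
        picks_per_source[best[0]] = picks_per_source.get(best[0], 0) + 1
    return results


# Hypothetical candidates: the snowboard has much stronger affinities.
candidates = [
    ("snowboard", "goggles", 0.9),
    ("snowboard", "ski wax", 0.8),
    ("snowboard", "boots", 0.7),
    ("book", "sequel", 0.5),
    ("lightbulb", "lamp", 0.3),
]

without_penalty = diversify_by_source(candidates, 3, penalty=1.0)
with_penalty = diversify_by_source(candidates, 3, penalty=0.5)
print(without_penalty)
print(with_penalty)
```

Without a penalty, all three recommendations come from the snowboard. With the penalty, the second slot goes to a recommendation tied to the book.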

How:

To use this technique, change the ADD_DIVERSITY_FACTOR in your params file to true. Run your pigscript with the modified param file.

Example:

Run the retail-recsys example with diversify_user_item.params instead of the retail.params file.

mortar local:run pigscripts/retail-recsys.pig -f params/diversify_user_item.params

Item-Item Diversity

For:

  • Item-item recommendations are very similar to each other

Where:

  • my-recommender.pig

If a user is interested in a Red Sox game, it might make sense to recommend another Red Sox game. But it probably doesn’t make sense for the first twenty recommendations to all be Red Sox games. That’s not a very interesting set, and the user isn’t seeing anything new or unexpected.

To avoid these pools of similarity, we’ll need to find a piece of metadata we want to diversify across. It might be sports team, music genre, or color; it just needs to be collected as a single piece of information available for all items. (Note: it could be randomly generated for items missing the relevant information.) Then the algorithm penalizes successive recommendations that have the same metadata as previous ones.
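A minimal Python sketch of penalizing successive recommendations that share metadata is shown below. It is a simplified illustration rather than the algorithm used by the macro: the function name, the 0.7 penalty, and the sample games and scores are assumptions.

```python
from collections import defaultdict


def diversify_recs(recs, metadata, penalty=0.7):
    """Re-rank one item's recommendations.

    recs: list of (item, score), sorted by score descending.
    metadata: dict mapping item -> metadata value (e.g. team).
    The n-th recommendation (0-based) with a given metadata value has
    its score multiplied by penalty ** n, then the list is re-sorted."""
    seen = defaultdict(int)
    rescored = []
    for item, score in recs:
        group = metadata[item]
        rescored.append((item, score * penalty ** seen[group]))
        seen[group] += 1
    return sorted(rescored, key=lambda r: r[1], reverse=True)


# Hypothetical recommendations for a Red Sox game.
recs = [
    ("Red Sox vs Yankees", 1.0),
    ("Red Sox vs Orioles", 0.9),
    ("Red Sox vs Rays", 0.8),
    ("Celtics vs Knicks", 0.7),
    ("Bruins vs Rangers", 0.6),
]
metadata = {
    "Red Sox vs Yankees": "Red Sox",
    "Red Sox vs Orioles": "Red Sox",
    "Red Sox vs Rays": "Red Sox",
    "Celtics vs Knicks": "Celtics",
    "Bruins vs Rangers": "Bruins",
}

diversified = [item for item, _ in diversify_recs(recs, metadata)]
print(diversified)
```

The top Red Sox game keeps its place, but the second and third are pushed down so that games from other teams appear earlier in the list.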

How:

The first step is to generate metadata that will provide the classification of each item. The generated metadata should have the following schema:

metadata : { (
    item: chararray,
    metadata_field: chararray
) }

The metadata_field should describe some sort of classification of the item (e.g.: the genre of a movie).

Next, replace recsys__GetItemItemRecommendations in your pigscript with recsys__GetItemItemRecommendations_DiversifyItemItem. Note that recsys__GetItemItemRecommendations_DiversifyItemItem takes a second argument, the metadata relation.

-- generate user_signals
  --
  --

-- generate metadata
metadata = load '$METADATA_SOURCE' as (
                    item: chararray,
                    metadata_field:chararray
                  );

item_item_recs = recsys__GetItemItemRecommendations_DiversifyItemItem(user_signals, metadata);
-- item_item_recs = recsys__GetItemItemRecommendations(user_signals); replace this macro

Example:

In pigscripts/techniques/diversify-items-technique.pig, genre acts as the metadata_field and the movie title is the item. The script then calls recsys__GetItemItemRecommendations_DiversifyItemItem to generate item-item recommendations diversified over genre.

mortar local:run pigscripts/techniques/diversify-items-technique.pig -f params/technique.params

Popularity Boost

For:

  • Results include too many weird and unpopular items
  • Results feel generally “not great”

Where:

  • my-recommender.pig

It turns out that just recommending popular items can be an extremely successful strategy. The Mortar recommendation engine acts to prevent popular items from always showing up, but if that produces a lot of bizarre recommendations, one option is to boost the signal so that popular items show up more. This may reduce the “interesting” quotient of the results, but it can also make them look better overall.

How:

To use the popularity boost, replace recsys__GetItemItemRecommendations with recsys__GetItemItemRecommendations_PopularityBoost. The macro applies the popularity boost automatically.

-- generate user signals
  --
  --

item_item_recs = recsys__GetItemItemRecommendations_PopularityBoost(user_signals);
--item_item_recs = recsys__GetItemItemRecommendations(user_signals); replace this macro

Example:

The pigscripts/techniques/popularity-boost-technique.pig script shows the use of the popularity boost macro.

mortar local:run pigscripts/techniques/popularity-boost-technique.pig -f params/technique.params

In-Stock Items

For:

  • You have a lot of items currently unavailable to users

Where:

  • my-recommender.pig

For some styles of business, a lot of items become unavailable after a period of time has passed. This happens with rotating inventory, but also with videos and even social media. If a noticeable number of the recommendations turn out to be unavailable, you can get more successful results by using a version of the algorithm that removes out-of-stock items from final consideration. Those items still provide valuable linking signals, so we don’t remove them entirely from the signals phase; we just make sure they don’t appear in the results.

How:

You'll want to figure out which items should have recommendations generated for them, and which items can be recommended.

--items for which recommendations should be generated
source_items : { (
    item: chararray
) }

--items that can be recommended
dest_items : { (
    item: chararray
) }

This could also be the same set of items:

available_items : { (
    item: chararray
) }

Generally, if you are generating user recommendations, you'll want source_items to be every possible item, because you'll want to preserve those useful links between users and items.

For this technique, use recsys__GetItemItemRecommendations_WithAvailableItems in place of the standard recsys__GetItemItemRecommendations. Note that this macro takes three arguments where the second and third are source_items and dest_items.

-- generate user signals
  --
  --

-- generate available items 
items = load '$AVAILABLE_ITEMS_PATH' as (
          item: chararray,
          is_available: int -- any boolean-like flag indicating whether the item is available (1) or not (0)
        );

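-- items that can be recommended: keep only the available ones
available_items   =   foreach (filter items by is_available == 1) generate item;

-- items to generate recommendations for: every item, projected to the single-field schema shown above
source_items      =   foreach items generate item;

item_item_recs = recsys__GetItemItemRecommendations_WithAvailableItems(user_signals, source_items, available_items);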
--item_item_recs =  recsys__GetItemItemRecommendations(user_signals); replace this macro
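The roles of source_items and dest_items can be illustrated with a small Python sketch. It only shows the intended effect on the final recommendations; the stage at which the macro applies the filter is not described here, and the function name and sample data are hypothetical.

```python
def filter_recs(item_item_recs, source_items, dest_items):
    """Keep only recommendations whose source item should receive
    recommendations and whose recommended item is allowed to appear."""
    return [
        (item_a, item_b, weight)
        for item_a, item_b, weight in item_item_recs
        if item_a in source_items and item_b in dest_items
    ]


# Hypothetical recommendations as (item_A, item_B, weight).
item_item_recs = [
    ("sled", "snowboard", 0.9),
    ("sled", "toboggan", 0.8),   # toboggan is out of stock
    ("toboggan", "sled", 0.7),   # out-of-stock item as the source
]

all_items = {"sled", "snowboard", "toboggan"}
in_stock = {"sled", "snowboard"}

# Source is every item; destination is only what is in stock.
filtered = filter_recs(item_item_recs, all_items, in_stock)
print(filtered)
```

The out-of-stock toboggan never appears as a recommendation, but it still receives recommendations of its own because it remains in the source set.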

Example:

The pigscripts/techniques/in-stock-technique.pig script uses the inventory count to determine which items are in stock and which are not. Items that have an inventory count of 0 don't appear in the results.

mortar local:run pigscripts/techniques/in-stock-technique.pig -f params/technique.params