Signal Techniques

These techniques involve changes to the user-item signals before they enter the recommendation engine. Sometimes big wins can come from improvements to the original data.

Adjust Signal Weights

For:

  • Results feel generally "not great"

In the beginning we encouraged you to pick some sensible weights for your incoming user-item signals. One possible source of poor recommender results is that what seemed like sensible weights turn out not to be.

There are no golden rules, and these values don’t need to be perfect, but trying a couple of runs using different signal weights can point to where a problem might be. Maybe your customers use their wishlist as a place to store gift ideas, so those items don’t represent their preferences. Maybe your users are sharing media ironically (“look at how bad this is!”). Adjusting the signal weights can help fix these kinds of problems.
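
If your signal-generation step is written in Pig, experimenting with weights can be as simple as changing the constants applied to each signal type. Here is a minimal sketch; the relation and field names (raw_events, signal_type, user, item) and the weights themselves are placeholders for whatever your own pipeline uses:

-- split hypothetical raw events by signal type
purchases = FILTER raw_events BY signal_type == 'purchase';
wishlists = FILTER raw_events BY signal_type == 'wishlist';

-- purchases keep full weight; wishlist items turned out to be gift ideas,
-- so give them less influence than before
purchase_signals = FOREACH purchases GENERATE user, item, 1.0 AS weight;
wishlist_signals = FOREACH wishlists GENERATE user, item, 0.25 AS weight;

user_signals = UNION purchase_signals, wishlist_signals;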

If you choose to create detailed data, you can look at the link_data field in your output to get a sense of which signals are producing better results. If poor results tend to be high in a particular signal type, that signal may need a smaller weight.

You can also dig deeper into the signals for a particular item pair; pigscripts/techniques/debug-item-item-recs.pig shows an example of finding the source signals behind a recommendation using the recsys__SelectSignals macro.

 mortar local:run pigscripts/techniques/debug-item-item-recs.pig

Verify Original Data

For:

  • Items appear multiple times in the results with slightly different names
  • Results feel generally "not great"

It may be that some part of your data-entry process is manual, or that you’ve pulled in data from various sources. If you see recommendations for both “old man’s child” and “old mans child,” that means that the signal is getting split between two different items when it should be going entirely to one. Resolving that in your original data set may be helpful to more than just the recommendation engine.
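
If cleaning up the source data isn't practical right away, another option is to normalize item names as part of generating your signals. A rough sketch, assuming your user_signals relation has user, item, and weight fields and that lowercasing and stripping punctuation is a safe merge rule for your catalog:

-- collapse near-duplicate item names before they reach the recommender;
-- LOWER and REPLACE are Pig built-ins
normalized_signals = FOREACH user_signals GENERATE
    user,
    LOWER(REPLACE(item, '[^a-zA-Z0-9 ]', '')) AS item,
    weight;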


Look for Negative Signals

For:

  • Results feel generally "not great" or are actively bad

Can your users take items off a wishlist without purchasing them? Can they return items? Can they unfavorite or unfollow? It’s important to make sure that those negative signals are cancelling out the positive signals; a user who returns an item shouldn’t get the full “buy” affinity.

If your data contains a rating scale, ratings below the halfway point should not contribute any positive affinity. If a user rates something as a 1 out of 10, that user does not like the item with a strength of 0.1; that user actively dislikes it.

You can include signals with negative values (i.e., a negative sign on the weight) to reduce or eliminate positive signals.
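
As a sketch, assuming a hypothetical ratings relation on a 1-to-10 scale and a hypothetical returns relation, signed weights might look like this:

-- center the rating scale so below-midpoint ratings become negative affinity:
-- a 10 maps to +1.0, a 5 maps to 0.0, a 1 maps to -0.8
rating_signals = FOREACH ratings GENERATE
    user,
    item,
    (rating - 5) / 5.0 AS weight;

-- a return cancels out the full "buy" affinity the purchase generated
return_signals = FOREACH returns GENERATE user, item, -1.0 AS weight;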


Look for Additional Signals

For:

  • Results feel generally "not great"

When you gathered your signals, was there something that was hard to get that wasn't included? Maybe you've thought of something else that could be added into the mix. Now is the time to integrate anything that didn't get included in the first pass.
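
New signal types can usually be folded in alongside the existing ones. For example, if search data becomes available later, a sketch might look like the following (the relation names and the 0.2 weight are placeholders):

-- give the new, weaker signal a modest weight and add it to the mix
search_signals = FOREACH search_events GENERATE user, item, 0.2 AS weight;
all_signals    = UNION user_signals, search_signals;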


Keep More Data

For:

  • Many items are missing recommendations
  • Many items don't have enough recommendations

The default parameters in my_recommender.params conservatively limit how much data is kept and how many recommendations are generated, for performance reasons. If your data is sparse or you find that a lot of items don't have any recommendations, these numbers can be adjusted to return more results.

MAX_LINKS_PER_USER

This parameter indicates how many signals from a single user will be included. Bots and users with extreme usage patterns can cause performance issues, so they are proactively screened out. If you think your users will often have more than 100 signals, increase this number.

MIN_LINK_WEIGHT

After item-item connections have been calculated, any links below this threshold will be discarded on the premise that the relationship is weak. If there aren't enough recommendations being generated, lowering this number will help retain some of those weaker connections.

NUM_RECS_PER_ITEM

This defines the number of recommendations to be generated for each item. The larger the number, the more possible recommendations. Because user recommendations are built from these item-item links, if there are users without recommendations, increasing this number may help.

NUM_RECS_PER_USER

This defines the number of recommendations to be generated for each user. The larger the number, the more recommendations available for each user.
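
As an example, loosening the cutoffs in my_recommender.params might look like the following; the values shown are illustrative, not the shipped defaults, so start from whatever your param file currently contains:

# keep more signals from heavy (but real) users
MAX_LINKS_PER_USER=200
# retain weaker item-item links
MIN_LINK_WEIGHT=0.5
# generate more candidate recommendations per item and per user
NUM_RECS_PER_ITEM=30
NUM_RECS_PER_USER=30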


Remove Bots

For:

  • You have data that might be from bots instead of humans

Where:

  • my_recommender.params
  • my-recommender.pig

Bots can destroy your recommendations, particularly if you are using view data. The bot will view 10,000 web pages, creating false affinities, and it will do it again and again. Bots add noise to the data, and as an added bonus, they also make the algorithms take longer.

How:

The first step is to add a THRESHOLD parameter to your param file. The THRESHOLD parameter represents the maximum number of user-item signals a single user can have before being considered a bot. Choose a value based on your estimate of the largest number of signals a real person could actually generate. It's ok if this drops out a couple of human users; getting rid of the noise is more important than keeping all of the signal.
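
For example, if you're confident that no real user would ever generate more than a few hundred signals, the entry in your param file might look like this (the value is a placeholder; base it on your own traffic):

# treat anyone with more than 500 user-item signals as a bot
THRESHOLD=500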

The second step is to use the recsys__RemoveBots macro (found in recsys_util.pig) in your pigscript. This macro takes two arguments: your user-item signals and the threshold parameter. The macro removes the bot signals and returns the remaining user-item signals. Use the macro after you have generated the user-item signals and before applying recsys__GetItemItemRecommendations.

-- generate user_signals
-- (your existing signal-generation logic goes here)

-- remove bots, passing the threshold from your param file
user_signals_no_bots = recsys__RemoveBots(user_signals, $THRESHOLD);

-- use the bot-free signals for the rest of the pipeline
item_item_recs = recsys__GetItemItemRecommendations(user_signals_no_bots);

Example:

In pigscripts/techniques/remove-bots-technique.pig, bots are removed by running the recsys__RemoveBots macro. The THRESHOLD parameter is set in params/technique.params. To run this example:

mortar local:run pigscripts/techniques/remove-bots-technique.pig -f params/technique.params