The Mortar recommendation engine generates recommendations based on user interactions with items. These interactions, called signals, are assigned a weight indicating the strength of the interaction.
If you completed the Running an Example Recommender tutorial, you will recall that it used two signals (a user purchasing a movie and a user wishlisting a movie) and that it treated a purchase as twice as meaningful an interaction as a wishlist.
The Mortar recommendation engine requires each signal to have three fields: a user id, an item id, and a weight indicating the relative strength of the interaction.
Uncomment the signal generation section of the script:
/******* Convert Data to Signals **********/
-- Create (user, item, weight) tuples for your data to be used as input to the Mortar recommender.
user_signals = foreach raw_input generate
    user as user,
    item as item,
    1.0  as weight; -- Arbitrarily choose 1 as weight for purchasing an item.
In this example code, signal generation is straightforward: every interaction between a user and an item becomes a signal with the same weight (1.0).
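To make the shape of the output concrete, here is a plain-Python sketch of what that foreach statement does to each input row. It is only an illustration, not Mortar code: the `Signal` type, the `to_uniform_signals` helper, and the dict-based rows with `user` and `item` keys are hypothetical stand-ins for your `raw_input` relation.

```python
from typing import Iterable, List, NamedTuple


class Signal(NamedTuple):
    """One (user, item, weight) tuple, mirroring the fields of user_signals."""
    user: str
    item: str
    weight: float


def to_uniform_signals(raw_rows: Iterable[dict], weight: float = 1.0) -> List[Signal]:
    """Emit one signal per raw row, all with the same weight.

    Plays the role of `foreach raw_input generate user, item, 1.0 as weight`.
    Rows are assumed (hypothetically) to be dicts with 'user' and 'item' keys.
    """
    return [Signal(row["user"], row["item"], weight) for row in raw_rows]


# Hypothetical sample rows standing in for raw_input.
raw_input = [
    {"user": "alice", "item": "movie_1"},
    {"user": "alice", "item": "movie_2"},
    {"user": "bob", "item": "movie_1"},
]
user_signals = to_uniform_signals(raw_input)
for signal in user_signals:
    print(signal)
```

Each row yields exactly one signal, so every user-item interaction counts equally. Distinguishing between kinds of actions means giving each one its own weight.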
Figuring out your signals requires some thinking about the data you have, the actions your users can take, and the relative importance of those actions. Take a look at pigscripts/lastfm-recsys.pig for some examples of extracting signals from data.
Don’t worry too much about coming up with the best possible set of signals. It is often easier to get something working first and then add more signals or adjust your signal weights as you see where your recommendations need improvement.
The simplest approach is to modify the user_signals statement to work with your data: pull the user and item ids from your data and pick a weight for the action that each data row represents. The relative weighting of signals matters much more than their absolute values. So if you’re having trouble choosing weights, assign your strongest signal a weight of 1.0 and then assign every other signal a weight relative to that, using your best guess.
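If your first guesses at signal strength are on some arbitrary scale, one way to follow the "strongest signal gets 1.0" advice is to divide every guess by the largest one, which keeps the ratios between them intact. The plain-Python sketch below does this; the action names and guessed values are hypothetical, and the resulting 2:1 purchase-to-wishlist ratio simply echoes the example tutorial.

```python
from typing import Dict


def normalize_weights(raw_weights: Dict[str, float]) -> Dict[str, float]:
    """Rescale weights so the strongest signal gets 1.0 and the rest keep their ratios."""
    strongest = max(raw_weights.values())
    if strongest <= 0:
        raise ValueError("at least one weight must be positive")
    return {action: w / strongest for action, w in raw_weights.items()}


# Hypothetical actions with guessed strengths on an arbitrary 1-10 scale.
guesses = {"purchase": 10.0, "wishlist": 5.0, "view": 1.0}
weights = normalize_weights(guesses)
print(weights)  # {'purchase': 1.0, 'wishlist': 0.5, 'view': 0.1}
```

Because only the ratios survive normalization, guesses of 10/5/1 and 100/50/10 produce the same weights.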
If you have multiple load statements and you want to extract signals from each one, you can convert each data set to the "user, item, weight" format and union the individual results at the end (similar to how purchase and wishlist data is combined in
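The union approach can be illustrated in plain Python: each data set is converted to (user, item, weight) tuples using its own weight, and the results are concatenated, much as Pig's UNION operator combines relations without removing duplicates. The data sets, weights, and `to_signals` helper below are hypothetical.

```python
from typing import Iterable, List, Tuple

Signal = Tuple[str, str, float]  # (user, item, weight)


def to_signals(rows: Iterable[Tuple[str, str]], weight: float) -> List[Signal]:
    """Convert (user, item) rows from one data set into (user, item, weight) signals."""
    return [(user, item, weight) for user, item in rows]


# Hypothetical data sets, one per load statement.
purchases = [("alice", "movie_1"), ("bob", "movie_2")]
wishlists = [("alice", "movie_3"), ("bob", "movie_2")]

# Like Pig's UNION: concatenate the converted results; nothing is deduplicated,
# so bob/movie_2 appears once as a purchase and once as a wishlist.
user_signals = to_signals(purchases, 1.0) + to_signals(wishlists, 0.5)
print(len(user_signals))  # 4
```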
Remember that you have the full power of Pig available here to get your data into the proper format. In more complex cases, you may want to write UDFs (user-defined functions) to implement custom signal extraction logic.
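As a sketch of what such a UDF might look like, the Python function below maps an action string to a weight. It assumes Mortar's Python UDF support, where output schemas are declared with an `outputSchema` decorator imported from `pig_util`; check the Mortar UDF documentation for that import and for how to register the file in your script. The fallback decorator exists only so the sketch also runs as plain Python, and the action names and weights are hypothetical.

```python
try:
    # Assumed to be available when the UDF runs under Mortar/Pig.
    from pig_util import outputSchema
except ImportError:
    def outputSchema(schema):
        """No-op stand-in so this sketch can run outside Pig."""
        def decorator(func):
            return func
        return decorator

# Hypothetical action names and weights; replace with your own.
ACTION_WEIGHTS = {"purchase": 1.0, "wishlist": 0.5, "view": 0.1}


@outputSchema("weight:double")
def action_weight(action):
    """Return the signal weight for an action string, or None for unknown actions."""
    if action is None:
        return None
    return ACTION_WEIGHTS.get(action.strip().lower())
```

Returning None for unrecognized actions yields a null weight in Pig, so rows with unknown actions could be dropped with a filter on `weight is not null` before building user_signals.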
Once you have written your Pig code for extracting signals, use illustrate to verify that it works:
mortar local:illustrate pigscripts/my-recommender.pig -f params/my_recommender.params
When illustrate completes successfully, it shows the aliases in your script with some example data for each alias. Make sure that the very end of the illustrate output contains a single relation called user_signals with the fields user, item, and weight.
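If you also store a sample of user_signals to a file, a quick script can confirm that every row has the expected three fields. The sketch below assumes the output was written as tab-delimited text (the default delimiter of Pig's PigStorage); the `parse_signal_line` helper is hypothetical and only checks the row's shape and that the weight parses as a finite number.

```python
import math
from typing import Tuple


def parse_signal_line(line: str) -> Tuple[str, str, float]:
    """Parse one tab-delimited 'user<TAB>item<TAB>weight' line and sanity-check it."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields (user, item, weight), got {len(fields)}: {line!r}")
    user, item, raw_weight = fields
    if not user or not item:
        raise ValueError(f"empty user or item id: {line!r}")
    weight = float(raw_weight)  # raises ValueError if the weight is not numeric
    if not math.isfinite(weight):
        raise ValueError(f"weight must be finite: {line!r}")
    return user, item, weight


print(parse_signal_line("alice\tmovie_1\t1.0"))  # ('alice', 'movie_1', 1.0)
```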
If there is an error in your Pig code, you should get an error message explaining what went wrong. Once you are successfully generating your signals, it's time for the next step: running your recommendation engine.