Deciding whether recommendations are "good" without doing A/B testing is not an exact science. The process generally combines a lot of common sense with domain expertise. There is no one true way, but here are some suggestions.
There is an alternative macro,
recsys__GetItemItemRecommendationsDetailed, that provides more detail about how links are generated.
It takes longer to run than the default macro, but it can be very helpful for understanding the results.
import 'recommender_alternatives.pig';

user_signals = foreach raw_input generate
                   user       as user,
                   item       as item,
                   1.0        as weight,
                   'PURCHASE' as signal_type;

item_item_recs = recsys__GetItemItemRecommendationsDetailed(user_signals);
The detailed macro requires a fourth field,
signal_type, that describes where each signal came from.
The output then includes two additional data fields. On the example retail recommender, the output looks like this:

fargo    the godfather                        6.8140154    2.6100628    1    [WISHLIST#4,NUM_USERS#5,PURCHASE#6]
fargo    butch cassidy and the sundance kid   6.49299      1.9853055    2    [WISHLIST#1,NUM_USERS#3,PURCHASE#5]
fargo    48 hours                             5.010393     1.9853055    3    [WISHLIST#1,NUM_USERS#3,PURCHASE#5]
fargo    the untouchables                     3.1157842                 4
fargo    m.                                   2.971754     1.3863515    5    [WISHLIST#4,NUM_USERS#3,PURCHASE#3]
Even if your goal is to personalize results for users, the intermediate item-item results give a strong indication of how good the final recommendations will be. It's nearly impossible to decide whether the recommendations for an unknown user are good or not: it's much easier to know whether Toy Story is a good recommendation for Antz than whether Toy Story is a good recommendation for Susie G. in Chicago. Even if you have internal users to look at, they may get odd results if they have signals generated during testing or development.
Make a short list of items that have different characteristics that make them likely to have disparate audiences. For the LastFM example, we might decide we want: heavy metal, classical, pop, jazz, and something obscure. Then we'd pick good examples of each of those categories and view their results every time we evaluate the recommender.
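As a rough sketch of how such a spot check might look in Pig — assuming the item-item output from the earlier example, with fields named item and rank, and using 'fargo' and 'm.' as hypothetical sentinel items:

```pig
-- Hypothetical sentinel list; substitute your own items.
-- Assumes item_item_recs has fields named 'item' and 'rank'.
sentinel_recs = filter item_item_recs by item == 'fargo' or item == 'm.';
sentinel_recs_sorted = order sentinel_recs by item asc, rank asc;
dump sentinel_recs_sorted;
```

Re-running a filter like this after each change gives you a consistent, low-effort view of how the recommender behaves across your chosen categories.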
In looking at your sentinel items, take note of any problems you see with the recommendations. A few things to look for:
Once you have a list of issues to solve, divide them into two sets:
If you are working with item ids, it's very hard to interpret the results without first converting them to readable strings.
There are helper functions located in
recsys_helper.pig that take the output of the standard recommendation engine macros and add translations of the ids.
import 'recsys_helper.pig';

-- PigStorage takes a field delimiter, not field names;
-- the field names and types belong in the load schema.
signals = load '$INPUT_PATH' using PigStorage()
          as (user_id:chararray, user_name:chararray,
              item_id:chararray, item_name:chararray);

item_name_pairs = foreach signals generate item_id as id, item_name as name;
item_names      = distinct item_name_pairs;

user_name_pairs = foreach signals generate user_id as id, user_name as name;
user_names      = distinct user_name_pairs;

item_item_recs_names = Recsys__ItemRecNamesFromIds(item_item_recs, item_names);
user_item_recs_names = Recsys__UserRecNamesFromIds(user_item_recs, user_names, item_names);
The first set comprises techniques that focus on cleaning up and improving data before it enters the recommendation engine.
The second set comprises modifications to the engine algorithm that either use additional types of data or alter the way signals are processed.
Here are some suggested techniques, depending on which issues you've identified in your results. Within each section, the techniques are listed in the order we suggest considering them, so scan the headings to find your primary issue, then work through each suggested technique in order to see whether it is appropriate for your data.
When new users come to a web site or application, there will initially be no data about their preferences. While this can make it hard to generate personalized recommendations for newcomers, there are techniques and tricks for handling this issue. For further discussion, see Handle Cold Start under Integration Best Practices.
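One common fallback, sketched below under assumed names (the user_signals relation and its item field from the earlier example), is to serve the globally most popular items to users who have no signals yet:

```pig
-- Hedged sketch: default recommendations for brand-new users.
-- 'user_signals' and its 'item' field are assumptions carried over
-- from the earlier example; adapt to your own schema.
by_item       = group user_signals by item;
signal_counts = foreach by_item generate
                    group as item,
                    COUNT(user_signals) as num_signals;
ranked        = order signal_counts by num_signals desc;
default_recs  = limit ranked 10;
```

Once a new user generates a few signals of their own, the personalized results can take over from this default list.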