Integrate Into Your Application

After running the Luigi pipeline, you should have your results stored and ready to go. To serve recommendations, all that’s left to do is display them in your application.

MongoDB

If your recommendation engine pipeline writes results to MongoDB instead of DynamoDB, go directly to Where to Put Recommendations.


DBMS

If your recommendation engine pipeline writes results to a SQL database instead of DynamoDB, go directly to Where to Put Recommendations.


DynamoDB Schema

The provided scripts for writing to DynamoDB do so with these schemas.

For item-item recommendations:

  • from_id (Primary Hash Key): the item recommendations are generated for
  • rank (Primary Range Key): order of the recommendation (e.g., 2 indicates that the row is the second recommendation for the from_id)
  • to_id: item being recommended
  • weight: weight of the link after normalizing for popularity
  • raw_weight: original weight of the link

weight and raw_weight exist primarily for debugging purposes and don’t need to be used in the queries.

For user-item recommendations:

  • from_id (Primary Hash Key): the user the recommendations are generated for
  • rank (Primary Range Key): order of the recommendation (e.g., 2 indicates that the row is the second recommendation for the from_id)
  • reason_item: the item linking this user to the recommended item
  • to_id: item being recommended
  • user_reason_item_weight: weight of link between user and reason_item
  • weight: weight of link between user and recommended item
  • item_reason_item_weight: weight of the link between the reason_item and the recommended item

user_reason_item_weight, weight, and item_reason_item_weight exist primarily for debugging and don’t need to be used in the queries.
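
As a rough illustration of the two schemas above, the rows can be modeled as plain Python records. Only the field names come from the schemas; the class names, the key types (strings), the helper function, and the sample values in the usage note are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ItemItemRow:
    from_id: str              # hash key: item the recommendations are for
    rank: int                 # range key: 1 is the first recommendation
    to_id: str                # item being recommended
    weight: float = 0.0       # debugging only
    raw_weight: float = 0.0   # debugging only


@dataclass
class UserItemRow:
    from_id: str              # hash key: user the recommendations are for
    rank: int                 # range key: 1 is the first recommendation
    to_id: str                # item being recommended
    reason_item: Optional[str] = None        # item linking user to to_id
    user_reason_item_weight: float = 0.0     # debugging only
    weight: float = 0.0                      # debugging only
    item_reason_item_weight: float = 0.0     # debugging only


def recommended_ids(rows) -> List[str]:
    """Return the recommended item ids ordered by rank, ignoring debug fields."""
    return [row.to_id for row in sorted(rows, key=lambda r: r.rank)]
```

For example, `recommended_ids([ItemItemRow("item-a", 2, "item-c"), ItemItemRow("item-a", 1, "item-b")])` yields the ids in rank order, which is all an application normally needs from a row.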

Query Language

There are libraries to interact with DynamoDB in most major programming languages.

Query Example

Generally all you’ll need to do is query on the HashKey (from_id), order by the RangeKey (rank), and maybe limit the results. In Python that would look like:

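# `dynamodb` is the module or connection object your application already
# uses to reach DynamoDB; it is not defined in this snippet.
TABLE_CACHE = {}

# creating a new table object on every query can be slow, so cache them
def get_table(table_name):
    if table_name not in TABLE_CACHE:
        TABLE_CACHE[table_name] = dynamodb.make_table(table_name)
    return TABLE_CACHE[table_name]

def query_dynamo(id, limit, table_name):
    # match on the hash key (from_id); DynamoDB sorts results by the range key (rank)
    kwargs = {'from_id__eq': id, 'limit': limit, 'reverse': True}
    table = get_table(table_name)
    results = table.query(**kwargs)
    return results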
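
For readers on a newer client, the same query can be sketched against a boto3 Table resource. This is an assumption-laden alternative, not the library used in the example above: the function name is hypothetical, and `table` is assumed to be an object obtained from something like `boto3.resource("dynamodb").Table(table_name)`.

```python
def query_recommendations(table, from_id, limit):
    """Return up to `limit` rows for `from_id`, lowest rank first.

    `table` is assumed to be a boto3 DynamoDB Table resource (or any
    object with a compatible `query` method).
    """
    response = table.query(
        # match on the hash key only
        KeyConditionExpression="from_id = :id",
        ExpressionAttributeValues={":id": from_id},
        # ascending order of the range key (rank), so rank 1 comes first
        ScanIndexForward=True,
        Limit=limit,
    )
    return response["Items"]
```

Because the function only relies on `table.query`, it can be exercised in unit tests with a small stand-in object instead of a live DynamoDB table.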

Where to Put Recommendations

Here are some of the places that recommendations are commonly displayed:

Item-Item Recommendations

  • individual item pages
  • "View Cart" page
  • checkout page
  • "watch next" slate after a video

User-Item Recommendations

  • landing page
  • user’s home page
  • email campaign

Best Practices

Red/Black

During the time that data is being written into DynamoDB (or whatever data store you have chosen), you still need to be able to serve recommendations based on the previous run. The easiest way to accomplish this is to implement a red/black scheme. For red/black, you load new data into a different table than the one your production application is using to serve results. After the load finishes, you tell your application to "flip" to the new table. This allows you to avoid computing an update diff or locking rows while handling new data. If you were running the recommendation engine every day, you'd alternate your "Red" database and your "Black" database daily.

Day 1: Red
Day 2: Black
Day 3: Red
Day 4: Black
...

One way to accomplish this, if you run no more than once a day, is to incorporate the date into the table name. This guarantees that a load never writes to a table that is serving production traffic. Your app will need a configuration parameter indicating which table is "active"; your Luigi script can update this at the end of its run. With this scheme you will need to clean up old tables on a regular basis, or else pay to keep out-of-date data around.
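
A minimal sketch of the date-based naming scheme, assuming a dict-like configuration and a shared table-name prefix. The function names, the `recommendations` prefix, the `active_table` key, and the choice to keep one previous table for rollback are all hypothetical.

```python
from datetime import date


def table_name_for(run_date, prefix="recommendations"):
    """Build a per-run table name, e.g. recommendations_2015_03_01."""
    return "{}_{}".format(prefix, run_date.strftime("%Y_%m_%d"))


def flip_active_table(config, run_date):
    """Point the app at the table for run_date once loading has finished."""
    config["active_table"] = table_name_for(run_date)
    return config["active_table"]


def stale_tables(table_names, active_table, keep=1):
    """Return tables that are safe to delete.

    Assumes every name shares the same prefix, so zero-padded dates sort
    correctly as strings. The `keep` most recent tables older than the
    active one are retained in case a rollback is needed.
    """
    older = sorted(name for name in table_names if name < active_table)
    return older[:-keep] if keep else older
```

The pipeline would call `flip_active_table` only after the load has completed, and a periodic job could pass the list of existing tables to `stale_tables` to decide what to drop.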

Verify against Primary Database

Because recommendations are generated in a batch process, there will inevitably be a gap between data extraction and display. Before displaying a recommended item to the user, make sure it is still available and valid in your primary database.
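
One way to apply this check is to filter the ranked rows just before rendering. The sketch below assumes each recommendation is a dict with a `to_id` field, already sorted by rank; `is_available` is a hypothetical callable standing in for a lookup against your primary database.

```python
def filter_valid(recommendations, is_available, limit):
    """Keep the first `limit` recommendations whose item is still valid.

    `recommendations` is assumed to be sorted by rank. `is_available`
    takes an item id and returns True if the item can still be shown.
    """
    valid = []
    for rec in recommendations:
        if len(valid) >= limit:
            break
        if is_available(rec["to_id"]):
            valid.append(rec)
    return valid
```

Since some rows may be filtered out, it can help to query for more rows than you plan to display.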

Show a Picture

If plausible in your domain, showing recommendations as pictures is much more effective than showing them as text.

Handle Cold Start

Because user-item recommendations are based on a user's existing behavior, new users will not have any recommendations with this method. There are two strategies for getting around this, which can be used separately or in combination.

The first strategy is to simply make a table of the most popular items, and use those for recommendations for any user who doesn't have enough data for personalization. Popularity is often a reasonable proxy for personalization when the data isn't available.
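
The popularity fallback can be sketched as a small wrapper around whatever returns a user's personalized list. The function name and the `min_recs` threshold for "enough data" are assumptions; pick a threshold that suits your application.

```python
def recommend_with_fallback(user_recs, popular_items, min_recs=3, limit=10):
    """Return personalized recommendations, or popular items as a fallback.

    `user_recs` and `popular_items` are lists of item ids, best first.
    If the user has fewer than `min_recs` personalized results, the most
    popular items are shown instead.
    """
    if len(user_recs) >= min_recs:
        return user_recs[:limit]
    return popular_items[:limit]
```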

The second strategy is to use item-item recommendations to build a picture of the user as soon as they start interacting with the site. Instead of showing them user-item recommendations, show them a mix of item-item recommendations from all the things they've viewed in this session; this way the recommendations feel as though they are being continuously updated. This strategy can work for any user, not just brand new ones.
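
A rough sketch of the session-based mix: take the item-item lists for everything viewed in the session and interleave them round-robin, skipping duplicates and items the user has already seen. The function name and the round-robin ordering are assumptions; other blending rules would work as well.

```python
def session_recommendations(viewed_items, item_item_recs, limit=10):
    """Interleave item-item recommendations for items viewed this session.

    `item_item_recs` maps an item id to its recommended item ids, sorted
    by rank. Already-viewed items and duplicates are skipped.
    """
    lists = [item_item_recs.get(item, []) for item in viewed_items]
    seen = set(viewed_items)
    result = []
    depth = max((len(recs) for recs in lists), default=0)
    for i in range(depth):          # rank position within each list
        for recs in lists:          # one pick per viewed item per round
            if len(result) >= limit:
                return result
            if i < len(recs) and recs[i] not in seen:
                seen.add(recs[i])
                result.append(recs[i])
    return result
```

Calling this again after each page view keeps the list in step with the session.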

Don’t Make Them Scroll

A sidebar or a bottom bar viewable without needing to scroll is going to garner a lot more clicks than one that’s not visible in the default view.

A/B Tests

The only way to know if your recommendations are working to generate revenue or engagement is to run an A/B test. There are a number of tools that will handle this for you, or you can create user cohorts yourself.
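
If you create the cohorts yourself, one common approach is to hash the user id so that each user lands in the same group on every visit. This is only a sketch: the function name, the experiment label, and the 50/50 split are hypothetical.

```python
import hashlib


def assign_cohort(user_id, experiment="recs_v1", treatment_share=0.5):
    """Deterministically assign a user to 'treatment' or 'control'.

    The experiment name is mixed into the hash so that different
    experiments split users independently of one another.
    """
    key = "{}:{}".format(experiment, user_id).encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    # map the first 32 bits of the hash to a number in [0, 1)
    bucket = int(digest[:8], 16) / float(0x100000000)
    return "treatment" if bucket < treatment_share else "control"
```

Users in the treatment group would see recommendations and users in the control group would not, and you would then compare revenue or engagement between the two groups.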

The last element of a fully operational recommendation engine is to automate your pipeline.