Split Scripts

So far we’ve had all of the recommendation engine code running in one Pig script, but it will ultimately be more convenient to break the work up over several scripts. While this adds a small amount of overhead in storing and loading data between scripts, it adds considerable flexibility for debugging future problems and for restarting in the event of a cluster failure.

Code Sections

The recommended script breakdown is:

  • 01_Generate_Signals
  • 02_Generate_Item_Item_Recs
  • 03_Generate_User_Item_Recs

This breakdown persists useful debug data at each stage and lets you restart the process from intermediate points.

You can either replace the example code in the Pig script files with your own, or create new files to split your code into.

01_Generate_Signals

Anything prior to recsys__GetItemItemRecommendations should go in here. This script will go from loading your raw data to storing your final (user, item, weight) signals out to S3.
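A minimal sketch of what this script might look like is below. The S3 paths, input schema, and signal weights are placeholders — substitute your own data source and whatever weighting logic you built in the earlier steps.

```pig
-- 01_Generate_Signals: load raw data and produce (user, item, weight) signals.
-- Paths, field names, and weights below are example placeholders.
raw_events = LOAD 's3://your-bucket/input/events' USING PigStorage('\t')
             AS (user:chararray, item:chararray, event_type:chararray);

-- Turn raw events into weighted signals; adjust weights to match your logic.
user_signals = FOREACH raw_events GENERATE
                   user,
                   item,
                   (event_type == 'purchase' ? 1.0F : 0.5F) AS weight:float;

STORE user_signals INTO 's3://your-bucket/intermediate/user_signals'
    USING PigStorage('\t');
```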

02_Generate_Item_Item_Recs

The recsys__GetItemItemRecommendations macro and any modifications you’ve made to it should go in here. This script will load the signals stored out in the previous script, run the item-item recommendations, and store the results to S3. This may seem like a small amount of code for one script, but we’re breaking up the job by run time and intermediate data rather than lines of code.
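As a sketch, the whole script can be as short as a LOAD, the macro call, and a STORE. The import path and S3 locations here are assumptions — point them at your own macro file and the signals location used in the previous script.

```pig
-- 02_Generate_Item_Item_Recs: load stored signals, run the item-item macro.
IMPORT '../macros/recommenders.pig';  -- assumed path; adjust for your project layout

-- Load the signals stored by 01_Generate_Signals.
user_signals = LOAD 's3://your-bucket/intermediate/user_signals' USING PigStorage('\t')
               AS (user:chararray, item:chararray, weight:float);

item_item_recs = recsys__GetItemItemRecommendations(user_signals);

STORE item_item_recs INTO 's3://your-bucket/intermediate/item_item_recs'
    USING PigStorage('\t');
```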

03_Generate_User_Item_Recs

The recsys__GetUserItemRecommendations macro and any modifications you’ve made to it should go here. This script will load both the signals and the item-item recommendations previously stored, generate the user-item recommendations, and store the results to S3. If you aren’t generating recommendations directly for users, you can skip this script entirely.
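A sketch of this script follows. As before, the import path and S3 locations are placeholders; the schema given for the loaded item-item recommendations is an assumption, so match it to whatever your previous script actually stored.

```pig
-- 03_Generate_User_Item_Recs: load signals and item-item recs, produce
-- user-item recommendations.
IMPORT '../macros/recommenders.pig';  -- assumed path; adjust for your project layout

-- Load the outputs of the two previous scripts.
user_signals   = LOAD 's3://your-bucket/intermediate/user_signals' USING PigStorage('\t')
                 AS (user:chararray, item:chararray, weight:float);

-- Assumed schema: check it against what 02_Generate_Item_Item_Recs stored.
item_item_recs = LOAD 's3://your-bucket/intermediate/item_item_recs' USING PigStorage('\t')
                 AS (item_A:chararray, item_B:chararray, weight:float);

user_item_recs = recsys__GetUserItemRecommendations(user_signals, item_item_recs);

STORE user_item_recs INTO 's3://your-bucket/output/user_item_recs'
    USING PigStorage('\t');
```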

Once your scripts are split up, the next step is to run your pipeline.