The recommended script breakdown is:

1. Signal generation: everything prior to the recsys__GetItemItemRecommendations call. This script goes from loading your raw data to storing your final (user, item, weight) signals out to S3.

2. Item-item recommendations: the recsys__GetItemItemRecommendations macro and any modifications you've made to it. This script loads the signals stored by the previous script, runs the item-item recommendations, and stores the results to S3. This may seem like a small amount of code for one script, but we're breaking up the job by run time and intermediate data rather than by lines of code.

3. User-item recommendations: the recsys__GetUserItemRecommendations macro and any modifications you've made to it. This script loads both the signals and the item-item recommendations stored previously, generates the user-item recommendations, and stores the results to S3. If you aren't generating recommendations directly for users, you can skip this script entirely.

This breakdown provides good, persistent debug data and allows you to restart the process from intermediate points. You can either replace the example code in the Pig script files with your own, or create new files to split your code into.
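As a rough sketch, the three scripts might look like the following. All file names, S3 paths, schemas, and the macro import path here are placeholders for your own; only the two recsys macros come from the tutorial itself.

```pig
-- 01-generate-signals.pig (hypothetical name)
-- Load raw data, build (user, item, weight) signals, persist them to S3.
raw_events = LOAD 's3n://my-bucket/input/events' USING PigStorage('\t')
             AS (user:chararray, item:chararray, action:chararray);
-- ... your signal-generation logic goes here, producing user_item_signals ...
STORE user_item_signals INTO 's3n://my-bucket/intermediate/signals';

-- 02-item-item-recs.pig (hypothetical name)
IMPORT 'recommenders.pig';  -- adjust to wherever the recsys macros live in your project
user_item_signals = LOAD 's3n://my-bucket/intermediate/signals' USING PigStorage('\t')
                    AS (user:chararray, item:chararray, weight:float);
item_item_recs = recsys__GetItemItemRecommendations(user_item_signals);
STORE item_item_recs INTO 's3n://my-bucket/intermediate/item-item-recs';

-- 03-user-item-recs.pig (hypothetical name; skip if you don't need per-user recs)
IMPORT 'recommenders.pig';
user_item_signals = LOAD 's3n://my-bucket/intermediate/signals' USING PigStorage('\t')
                    AS (user:chararray, item:chararray, weight:float);
item_item_recs = LOAD 's3n://my-bucket/intermediate/item-item-recs';
user_item_recs = recsys__GetUserItemRecommendations(user_item_signals, item_item_recs);
STORE user_item_recs INTO 's3n://my-bucket/output/user-item-recs';
```

Because each script reads only what the previous one stored, a failure in step 2 or 3 can be rerun without repeating the upstream work.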
Once your scripts are split up, the next step is to run your pipeline.