Of course, what's actually exciting about integrating with DataHero is being able to visualize your own data. You should now have a little familiarity with Pig, so here are the steps for writing your own Pig script to store to Google Drive and use with DataHero.
The first step in the Pig script is loading data in from its source. This can be Amazon S3 or MongoDB. If your data lives elsewhere, you can likely export it to Amazon S3, which provides inexpensive and reliable storage for large quantities of data.
Once you know the location and format of your data, try the Load Statement Generator to generate the Pig LOAD statement for you.
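As a rough sketch, a generated LOAD statement for tab-delimited files on S3 might look something like this (the bucket path and field names here are placeholders — yours will come from your own data):

songs = LOAD 's3n://my-bucket/song-plays/*' USING PigStorage('\t')
        AS (user_id:chararray, song_id:chararray, play_date:chararray);

The AS clause declares a schema so later statements can refer to fields by name.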
As you work on your Pig script, Illustrate will be your friend, helping you see errors as you work without needing to start up a cluster:
mortar local:illustrate pigscripts/<my-pigscript-name>.pig
When you are ready to run your script and store out your data, use GoogleDriveStorage:
STORE output INTO '$OUTPUT_PATH' USING com.mortardata.pig.GoogleDriveStorage('<my-file-name>', 'YES_CONVERT', 'YES_OVERWRITE');
This will store data out both to Amazon S3 and your Google Drive.
Google Drive limits the size of a file you can upload and convert into a Google Sheet, so your Pig script should reduce your big data down to a much smaller result set. This usually means computing aggregates — over users or days, for example — and storing out only those aggregate rows. Chances are you can make your data a lot smaller without losing the dimension you want to analyze.
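For instance, a per-day aggregation might look like the following sketch (assuming a relation called songs with a play_date field, as in the hypothetical load example above):

by_day       = GROUP songs BY play_date;
daily_counts = FOREACH by_day GENERATE group AS play_date, COUNT(songs) AS plays;

This collapses millions of raw rows into one row per day — small enough to convert to a Google Sheet while keeping the time dimension intact for analysis.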
Once your Google Drive is connected to DataHero, all you need to do is click the Import button on the home page to bring in a new data file.
If you want to see the same analysis on a daily or weekly basis, you can set up a Schedule to run the Mortar job regularly.
DataHero can automatically update dashboards when the source file changes; all you need to do is write to the same file name and use the YES_OVERWRITE option in GoogleDriveStorage.