
Using GoogleDriveStorage

GoogleDriveStorage is an extension of PigStorage that first writes your data as a CSV to the specified file system location and then uploads it to Google Drive.

Connect to Google Drive

To use GoogleDriveStorage, you must first authorize Mortar to access your Google account.

To let Mortar write files from Hadoop into your Google Drive, go to My Settings->Google Auth and click the Link Google Account button to open Google's authorization window.

Select the account you want Mortar to write to (only one account can be linked at a time, for now), and that's it! You can change the connected account at any time by returning to this page and selecting a different account.

Store Output to Google Drive

To store data to Google Drive, use the GoogleDriveStorage function:

STORE output INTO '$OUTPUT_PATH'
    USING com.mortardata.pig.GoogleDriveStorage('google_file_name');

The data is first stored as a CSV at the location indicated by $OUTPUT_PATH and then uploaded to Google Drive as a file named google_file_name.
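
To show where the STORE statement sits in a complete script, here is a minimal sketch. The $INPUT_PATH parameter, the two-column schema, the relation names, and the file name user_totals are all hypothetical and chosen only for illustration; only the GoogleDriveStorage call itself comes from this page.

-- Hypothetical input path and schema, for illustration only.
raw = LOAD '$INPUT_PATH' USING PigStorage(',')
    AS (user_id:chararray, amount:double);

-- Aggregate per user so the result is compact enough to read as a spreadsheet.
by_user = GROUP raw BY user_id;
totals = FOREACH by_user GENERATE
    group AS user_id,
    SUM(raw.amount) AS total_amount;

-- Writes a CSV to $OUTPUT_PATH, then uploads it to Google Drive as 'user_totals'.
STORE totals INTO '$OUTPUT_PATH'
    USING com.mortardata.pig.GoogleDriveStorage('user_totals');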

By default, GoogleDriveStorage uploads your data as a Google Spreadsheet and overwrites any existing file of the same name. To set these parameters explicitly, use:

STORE output INTO '$OUTPUT_PATH'
    USING com.mortardata.pig.GoogleDriveStorage('google_file_name', 'YES_CONVERT', 'YES_OVERWRITE');

YES_CONVERT specifies that the file should be converted to a Google Spreadsheet, and YES_OVERWRITE specifies that an existing file of the same name should be overwritten. Overwriting is useful when integrating with other applications that pull updated data from a Google file.

Note: GoogleDriveStorage only works with Pig 0.12. To specify the Pig version, use --pigversion 0.12.

Limitations

There is currently a limit on the size of files that can be uploaded to Google Drive using GoogleDriveStorage. If an upload fails, the output data is still available at the output path location.
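
One possible way to work around the limit, sketched here as an assumption rather than a documented recommendation, is to keep the complete result on the file system with plain PigStorage and send only a reduced relation to Google Drive. The exact size limit is not stated on this page, so the row cap of 1000 below is arbitrary; the input path, schema, output subdirectories, and file name are likewise hypothetical.

-- Hypothetical input path and schema, for illustration only.
raw = LOAD '$INPUT_PATH' USING PigStorage(',')
    AS (user_id:chararray, amount:double);

-- Keep the complete result on the file system with plain PigStorage.
STORE raw INTO '$OUTPUT_PATH/full' USING PigStorage(',');

-- Send only the largest rows to Google Drive; 1000 is an arbitrary cap.
ranked = ORDER raw BY amount DESC;
top_rows = LIMIT ranked 1000;
STORE top_rows INTO '$OUTPUT_PATH/drive'
    USING com.mortardata.pig.GoogleDriveStorage('top_rows_sample');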