Export From SQL Database

If your data is stored in a SQL database like MySQL, PostgreSQL, Microsoft SQL Server, or Oracle, you will often want to extract it to S3 for analysis and processing. Pig does not load data directly from databases, but Mortar offers several options for extracting your data.

MySQL

The most efficient way to extract data from a MySQL database is via the MySQL Command Line Tool. It offers a great deal of flexibility and works well with large volumes of data.

Mortar has created a Luigi Task called ExtractMySQLData that you can drop into any Luigi pipeline to extract data from MySQL. It automatically scripts the MySQL command-line tool to set up a connection to your database, extract the data you want, and then sync that data to S3.

For an example of using the ExtractMySQLData Task to export Wikipedia data from MySQL, see the wikipedia-luigi-mysql.py script and the ExtractMySQLData documentation.
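
To give a feel for what scripting the MySQL command-line tool involves, here is a minimal Python sketch of a tab-delimited export. It is not the ExtractMySQLData implementation. The function names and the host, user, database, and query values used with them are hypothetical. The sketch assumes the mysql client is on your PATH and that credentials come from an option file such as ~/.my.cnf.

```python
import subprocess
from typing import List


def build_mysql_extract_command(host: str, user: str, database: str, query: str) -> List[str]:
    """Return an argument list that runs `query` and prints tab-delimited rows.

    Hypothetical helper for illustration; not part of Mortar or Luigi.
    """
    return [
        "mysql",
        f"--host={host}",
        f"--user={user}",
        f"--database={database}",
        "--batch",              # tab-separated columns, one row per line
        "--quick",              # stream rows rather than buffering the whole result
        "--skip-column-names",  # leave out the header row
        f"--execute={query}",
    ]


def extract_to_file(command: List[str], output_path: str) -> None:
    """Run the client and write its standard output to a local file."""
    with open(output_path, "wb") as out:
        # check=True raises CalledProcessError if the client exits non-zero.
        subprocess.run(command, stdout=out, check=True)
```

Calling extract_to_file(build_mysql_extract_command("db.example.com", "reader", "wikipedia", "SELECT * FROM page"), "page.tsv") would write a tab-delimited file locally. Copying that file to S3 is a separate step, which the ExtractMySQLData Task handles for you.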

Other Databases

Built-In Extraction Tool

For other databases, we recommend extracting flat files using your database's built-in bulk extraction tool and uploading them to S3 in tab-delimited or CSV format. By database, the recommended solutions are:

Sqoop

Alternatively, Mortar has experimental support for extracting data from JDBC-compliant databases with Apache Sqoop. It currently works only when running Luigi from your computer with mortar local:luigi, not on the Mortar service with mortar luigi. As such, we recommend using it only for one-time extracts and for development, not for production pipelines.

Please see the Sqoop Tasks section of the Mortar Sqoop documentation for instructions on how to use Sqoop.
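
For orientation only, the sketch below builds a plain sqoop import command in Python. It illustrates the underlying Sqoop command-line tool, not Mortar's Sqoop Tasks, whose parameters are described in the Mortar Sqoop documentation. The helper name and the JDBC URL, username, table, and target directory passed to it are hypothetical.

```python
from typing import List


def build_sqoop_import_command(jdbc_url: str, username: str, table: str, target_dir: str) -> List[str]:
    """Return an argument list for a tab-delimited `sqoop import` of one table.

    Hypothetical helper for illustration; not part of Mortar's Sqoop Tasks.
    """
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "-P",                             # prompt for the password on the console
        "--table", table,
        "--target-dir", target_dir,
        "--fields-terminated-by", "\\t",  # Sqoop reads the two-character escape \t as a tab
        "--num-mappers", "1",             # a single map task avoids needing a split column
    ]
```

For example, build_sqoop_import_command("jdbc:postgresql://db.example.com/sales", "reader", "orders", "/tmp/orders") returns a list that could be passed to subprocess.run on a machine where Sqoop and the matching JDBC driver are installed.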