If your data is stored in a SQL database such as MySQL, PostgreSQL, Microsoft SQL Server, or Oracle, you will often want to extract it to S3 for analysis and processing. Pig does not load data directly from databases, but Mortar offers several other options for extracting your data.
The most efficient way to extract data from a MySQL database is with the MySQL command-line tool (mysql). It is flexible and performs well on large volumes of data.
Mortar has created a Luigi Task called ExtractMySQLData that you can drop into any Luigi pipeline to extract data from MySQL. It automatically scripts the MySQL command-line tool to set up a connection to your database, extract the data you want, and then sync that data to S3.
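To illustrate what "scripting the MySQL command-line tool" involves, here is a minimal sketch that builds a mysql invocation producing tab-delimited output on stdout and writes it to a local file. The helper names build_mysql_extract_command and extract_to_file are hypothetical and are not part of ExtractMySQLData's interface; consult that task's documentation for its actual parameters. The mysql flags used (--batch, --quick, --skip-column-names, --execute) are standard client options.

```python
import subprocess
from typing import List


def build_mysql_extract_command(host: str, user: str, password: str,
                                database: str, query: str) -> List[str]:
    """Build argv for the mysql client so it prints query results as TSV.

    Hypothetical helper for illustration only; not Mortar's ExtractMySQLData API.
    """
    return [
        "mysql",
        "--host=" + host,
        "--user=" + user,
        # Passing the password on the command line exposes it in the process
        # list; an option file is safer in real use.
        "--password=" + password,
        "--batch",              # tab-separated columns, one row per line
        "--quick",              # stream rows instead of buffering the full result
        "--skip-column-names",  # omit the header row
        "--execute=" + query,
        database,
    ]


def extract_to_file(cmd: List[str], out_path: str) -> None:
    """Run the mysql client and write its TSV output to out_path."""
    with open(out_path, "wb") as out:
        subprocess.run(cmd, stdout=out, check=True)
```

The resulting file can then be uploaded to S3 with whatever tool you already use. Note that in batch mode mysql escapes special characters such as embedded tabs and newlines, so each output line corresponds to one row.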
For other databases, we recommend extracting flat files using your database's built-in bulk extraction tool and uploading them to S3 in tab-delimited or CSV format. By database, the recommended solutions are:
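Whichever bulk extraction tool you use, the goal is a flat file with one record per line and a consistent delimiter. The minimal sketch below shows that target format by streaming query results into a tab-delimited file with Python's csv module. It uses sqlite3 purely as a self-contained stand-in for your database's DB-API driver, and the function name export_query_to_tsv is hypothetical; for large tables your database's native bulk tool will usually be faster.

```python
import csv
import sqlite3


def export_query_to_tsv(conn: sqlite3.Connection, query: str, out_path: str) -> int:
    """Write every row returned by `query` to a tab-delimited file.

    No header row is written, so every line is one data record.
    Returns the number of rows written.
    """
    cursor = conn.execute(query)
    count = 0
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        # csv.writer quotes any field that itself contains a tab or newline.
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        for row in cursor:  # iterate lazily rather than calling fetchall()
            writer.writerow(row)
            count += 1
    return count
```

For example, exporting a two-row users table with export_query_to_tsv(conn, "SELECT id, name FROM users ORDER BY id", "users.tsv") yields a file whose lines are "1\tada" and "2\tgrace", ready to upload to S3.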
Alternatively, Mortar has experimental support for extracting data from JDBC-compliant databases with Apache Sqoop. It currently works only when running Luigi from your own computer with mortar local:luigi, not on the Mortar service with mortar luigi. For that reason, we recommend using it only for one-time extracts and during development, not for production pipelines.
Please see the Sqoop Tasks section of the Mortar Sqoop documentation for instructions on how to use Sqoop.
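For orientation, the sketch below builds a plain Sqoop 1 command line for importing one table as tab-delimited files. It shows the stock sqoop import CLI, not Mortar's Sqoop tasks, whose parameters are described in the documentation referenced above; the helper name build_sqoop_import_command and the example JDBC URL are hypothetical.

```python
from typing import List


def build_sqoop_import_command(jdbc_url: str, username: str, table: str,
                               target_dir: str, num_mappers: int = 1) -> List[str]:
    """Build argv for a stock `sqoop import` of a single table.

    Hypothetical helper for illustration only; not Mortar's Sqoop task API.
    """
    return [
        "sqoop", "import",
        "--connect", jdbc_url,        # e.g. jdbc:postgresql://host/dbname
        "--username", username,
        "-P",                         # prompt for the password on the console
        "--table", table,
        "--target-dir", target_dir,   # output directory for the imported files
        # Literal backslash-t; Sqoop interprets the escape as a tab character.
        "--fields-terminated-by", "\\t",
        "--num-mappers", str(num_mappers),
    ]
```

Using a single mapper keeps the load on the source database low, which suits the one-time and development extracts recommended above.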