
The ETL Pipeline

We use an ETL pipeline to populate your data warehouse. ETL stands for the three main steps involved: Extracting your source data, Transforming it into a standard, easy-to-query format, and Loading it into the data warehouse.
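To make the flow concrete, here is a minimal sketch in Python of the three stages chained together. The function names and the tiny in-memory stand-ins for the source systems and warehouse are illustrative assumptions, not part of the Mortar API.

```python
def extract(sources):
    """Step 1: pull raw records from each source system."""
    return [record for source in sources for record in source]

def transform(records):
    """Step 2: normalize each record into a standard, easy-to-query form."""
    return [{k.lower(): str(v).strip() for k, v in record.items()} for record in records]

def load(rows, warehouse):
    """Step 3: append the transformed rows to the warehouse."""
    warehouse.extend(rows)

# Tiny usage example with in-memory stand-ins for the real systems.
mongo_docs = [{"Name": " Ada ", "City": "London"}]
sql_rows = [{"Name": "Grace", "City": " DC "}]
warehouse = []
load(transform(extract([mongo_docs, sql_rows])), warehouse)
print(warehouse)
```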

Extract

Extraction is the first step in the pipeline and involves moving your data from one or more source systems into a common intermediate storage location in S3. Mortar currently supports extraction from S3, MongoDB, and SQL databases.
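As a hedged sketch of this step, the snippet below dumps one source (here, a SQL table read with sqlite3) to newline-delimited JSON and stages it in S3 with boto3. The database path, table name, bucket, and key are placeholders, not values used by Mortar.

```python
import json
import sqlite3

import boto3

def extract_sql_to_s3(db_path, table, bucket, key):
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row
    rows = conn.execute(f"SELECT * FROM {table}")

    # Write each row as one JSON object per line -- a common intermediate format.
    local_path = f"/tmp/{table}.json"
    with open(local_path, "w") as f:
        for row in rows:
            f.write(json.dumps(dict(row)) + "\n")

    # Stage the extracted file in the shared S3 location for the transform step.
    boto3.client("s3").upload_file(local_path, bucket, key)

# extract_sql_to_s3("app.db", "events", "my-etl-bucket", "raw/events/part-00000.json")
```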


Transform

Transform is the second step in the pipeline and involves using Apache Pig, Python, Java, or other languages to transform all of your data into a standard, easy-to-query form. Common transformations include data cleansing, joining data from different sources, and splitting single data elements into separate data points.
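The Python sketch below illustrates those three kinds of transformation on toy data: cleansing a field, joining two extracted datasets on a shared key, and splitting a single element (a full name) into separate data points. The field names and records are illustrative only.

```python
def transform(users, orders):
    # Index one dataset by its join key so the join is a simple lookup.
    orders_by_user = {}
    for order in orders:
        orders_by_user.setdefault(order["user_id"], []).append(order)

    for user in users:
        # Cleansing: normalize whitespace and casing.
        email = user["email"].strip().lower()
        # Splitting: break one element into separate data points.
        first, _, last = user["full_name"].strip().partition(" ")
        # Joining: attach information from the other source.
        yield {
            "email": email,
            "first_name": first,
            "last_name": last,
            "order_count": len(orders_by_user.get(user["id"], [])),
        }

users = [{"id": 1, "email": " Ada@Example.com ", "full_name": "Ada Lovelace"}]
orders = [{"user_id": 1, "total": 42.0}]
print(list(transform(users, orders)))
```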


Load

The last step in the pipeline is to load your data into Amazon Redshift. The Mortar ETL pipeline uses Hadoop to load your data in a massively parallel way, ensuring that your pipeline can grow with your data far into the future.
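For a sense of what a load into Redshift looks like, here is a generic sketch that bulk-loads staged S3 data with Redshift's COPY command, issued via psycopg2. This is a common pattern for loading Redshift, not Mortar's Hadoop-based loader, and the connection details, table name, S3 path, and IAM role are all placeholders.

```python
import psycopg2

COPY_SQL = """
    COPY analytics.events
    FROM 's3://my-etl-bucket/transformed/events/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
    FORMAT AS JSON 'auto';
"""

def load_into_redshift():
    conn = psycopg2.connect(
        host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
        port=5439,
        dbname="warehouse",
        user="etl_user",
        password="********",
    )
    # COPY pulls the staged files from S3 directly into the target table.
    with conn, conn.cursor() as cur:
        cur.execute(COPY_SQL)

# load_into_redshift()
```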