Extraction is the first step in the pipeline and involves moving your data from one or more source systems into a common intermediate storage location in S3. Mortar currently supports extraction from S3, MongoDB, and SQL databases.
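As a minimal sketch of this step, the function below dumps one table from a SQL source into a CSV file in a staging area. The function name, the SQLite source, and the local staging directory are illustrative stand-ins; in a real Mortar pipeline the staging area would be an S3 prefix.

```python
import csv
import sqlite3
from pathlib import Path

def extract_to_staging(db_path: str, table: str, staging_dir: str) -> Path:
    """Dump one source table to a CSV file in the staging area.

    A local directory stands in for the S3 intermediate storage here;
    the table and path names are hypothetical.
    """
    out_path = Path(staging_dir) / f"{table}.csv"
    with sqlite3.connect(db_path) as conn:
        cur = conn.execute(f"SELECT * FROM {table}")
        headers = [col[0] for col in cur.description]
        with out_path.open("w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(headers)   # one header row, then the data
            writer.writerows(cur)
    return out_path
```

Writing every source to a common format in one location is what lets the later steps treat heterogeneous systems uniformly.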
Transform is the second step in the pipeline and involves using Apache Pig, Python, Java, or other languages to transform all of your data into a standard, easy-to-query form. Common transformations include cleansing data, joining data from different sources, and splitting single data elements into separate data points.
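A small Python sketch of two of these transformations, cleansing and joining. The record shapes (`users` with `id` and `email`, `orders` with `user_id` and `amount`) are assumptions for illustration, not a Mortar schema.

```python
def transform(users, orders):
    """Join order records to their users and cleanse fields.

    Cleansing here means trimming whitespace and lowercasing emails;
    orders that reference an unknown user are dropped.
    """
    by_id = {u["id"]: u for u in users}
    out = []
    for o in orders:
        u = by_id.get(o["user_id"])
        if u is None:
            continue  # drop orphaned orders from the joined output
        out.append({
            "order_id": o["order_id"],
            "email": u["email"].strip().lower(),
            "amount": o["amount"],
        })
    return out
```

In practice this logic would run as a Pig or Hadoop job over the staged S3 data rather than in-memory Python, but the shape of the work is the same.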
The last step in the pipeline is to load your data into Amazon Redshift. The Mortar ETL pipeline uses Hadoop to load your data in a massively parallel way, ensuring that your pipeline can grow with your data far into the future.
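One common way to express such a load is Redshift's `COPY` command, which reads all files under an S3 prefix in parallel across the cluster's slices. The helper below only assembles the SQL; the table name, S3 prefix, and IAM role are placeholders, and this is a sketch of the general Redshift mechanism rather than Mortar's exact loader.

```python
def build_copy_statement(table: str, s3_prefix: str, iam_role: str) -> str:
    """Assemble a Redshift COPY command that bulk-loads CSV files
    from an S3 prefix. All three arguments are placeholders."""
    return (
        f"COPY {table}\n"
        f"FROM '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        "FORMAT AS CSV\n"
        "IGNOREHEADER 1;"   # skip the header row written during extraction
    )
```

Pointing `COPY` at a prefix rather than a single file is what makes the load parallel: Redshift distributes the files under that prefix across its slices.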