If you’ve already done the Build an Example Pipeline section of this tutorial, you should be familiar with the various steps we're going to cover now. If you haven’t done that section, take a minute to at least read it over to become familiar with the different steps of building a Redshift data warehouse.
There are three things you will need to do to build your custom Redshift data warehouse.
In the following tutorial articles we're going to cover each of these in detail. Before doing that, there are a couple of things you'll need to set up.
As in the Wikipedia example, you'll want an S3 bucket for intermediate data and AWS Access Keys that can reach your data in S3 and write to Redshift. You can use the same bucket and keys you created for the example; for a refresher, see the section about creating a bucket and AWS keys in the Wikipedia tutorial.
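If you'd like a quick sanity check that your keys can actually reach that bucket before running a pipeline, a short Python sketch like the one below will do it. It assumes boto3 is installed, and the bucket name and keys are placeholders you'll need to replace with your own.

```python
import boto3
from botocore.exceptions import ClientError

# Placeholder values -- substitute your own bucket name and AWS keys.
BUCKET = "my-mortar-etl-bucket"
AWS_ACCESS_KEY_ID = "YOUR_ACCESS_KEY_ID"
AWS_SECRET_ACCESS_KEY = "YOUR_SECRET_ACCESS_KEY"

s3 = boto3.client(
    "s3",
    aws_access_key_id=AWS_ACCESS_KEY_ID,
    aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
)

try:
    # head_bucket succeeds only if the bucket exists and these keys can reach it.
    s3.head_bucket(Bucket=BUCKET)
    print("Keys can reach s3://%s" % BUCKET)
except ClientError as e:
    print("Cannot access bucket: %s" % e)
```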
You will need to have a running Redshift cluster to complete this tutorial. If you already have a running cluster you would like to use, you can skip the remaining steps in this article.
AWS charges by the hour for Redshift (see pricing). If you're unsure how large a cluster you will need, start with the smallest Redshift cluster (node type of dw2.large and cluster type of Single Node) at $0.25/hour. It is easy to resize your Redshift cluster later if you need better performance. To avoid incurring extra charges, be sure to shut down your cluster when you are done with it.
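If you'd rather script cluster management than click through the console, here is a rough sketch of launching (and later shutting down) that smallest configuration with boto3. The cluster identifier, username, and password are placeholders, and the console walkthrough in the next paragraph works just as well.

```python
import boto3

# Launch the smallest single-node cluster described above.
# All identifiers and credentials here are placeholders.
redshift = boto3.client("redshift", region_name="us-east-1")

redshift.create_cluster(
    ClusterIdentifier="mortar-example-cluster",   # hypothetical name
    NodeType="dw2.large",                         # smallest node type
    ClusterType="single-node",                    # smallest cluster type
    MasterUsername="master",
    MasterUserPassword="ChooseAStrongPassword1",
    DBName="dev",
)

# When you're done with the cluster, shut it down to stop the hourly charge:
# redshift.delete_cluster(
#     ClusterIdentifier="mortar-example-cluster",
#     SkipFinalClusterSnapshot=True,
# )
```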
To start a Redshift cluster, follow the official AWS documentation, which walks through the setup step by step. You will need to complete steps 1-3 and the first part of step 4. Be sure to place your cluster in the US East region for fast, free data transfer to Mortar's Hadoop clusters. You do not need to worry about creating tables or running queries; the Mortar ETL pipeline takes care of that for you.
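Once the cluster is provisioning, you'll eventually need its endpoint (host and port) to point your SQL client and pipeline at it. Assuming boto3 and the placeholder cluster identifier from the sketch above, one way to wait for the cluster and read its endpoint is:

```python
import boto3

# Assumes the placeholder cluster identifier used earlier; adjust to match yours.
redshift = boto3.client("redshift", region_name="us-east-1")

# Block until the cluster finishes provisioning.
redshift.get_waiter("cluster_available").wait(
    ClusterIdentifier="mortar-example-cluster"
)

cluster = redshift.describe_clusters(
    ClusterIdentifier="mortar-example-cluster"
)["Clusters"][0]

endpoint = cluster["Endpoint"]
print("Host: %s  Port: %s" % (endpoint["Address"], endpoint["Port"]))
```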
The AWS documentation recommends using SQL Workbench/J to connect to and query Redshift, but most other SQL clients will work as well.
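If you'd like to confirm connectivity from Python instead of (or in addition to) a desktop SQL client, a minimal psycopg2 sketch looks like the following; the endpoint, database name, and credentials are placeholders for your own cluster's values.

```python
import psycopg2

# Placeholder connection details -- use your cluster's endpoint, database name,
# master username, and password. Redshift listens on port 5439 by default.
conn = psycopg2.connect(
    host="mortar-example-cluster.xxxxxxxx.us-east-1.redshift.amazonaws.com",
    port=5439,
    dbname="dev",
    user="master",
    password="ChooseAStrongPassword1",
)

with conn.cursor() as cur:
    # Simple smoke test: confirm which database we're connected to.
    cur.execute("SELECT current_database(), version();")
    print(cur.fetchone())

conn.close()
```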