Run an Example R Script

This quick tutorial will walk you through the basics of running an R script inside a data pipeline with Mortar.

For this example, we will build a simple data pipeline that pulls some data down from S3, runs an R script to graph the data, and uploads the graph back to S3. The pipeline will also generate a URL from which you can download the graph.

The general process is to get an R script working locally, and then wire it up to a data pipeline. Once it is wired up, you can run it automatically as often as you like.

Install Mortar

First, make sure that you have a Mortar account (a free trial is available on request) and that you've downloaded and installed the Mortar Development Framework, which is the fastest way to build and run data pipelines.

Start the Example Project

For this tutorial, we'll work within mortar-examples, a project developed by Mortar that contains several ready-to-run examples. We'll use Luigi to run a data pipeline containing an R script.

Once you've installed Mortar, run the following command (prepending your handle to generate a unique name) in the terminal to get a copy of Mortar's example project:

mortar projects:fork git@github.com:mortardata/mortar-examples.git <your-handle>-mortar-examples

Mortar projects have a standardized structure that keeps the various elements of a project (Pig scripts, UDFs, Luigi scripts) organized. R scripts live in the rscripts directory, and Luigi scripts, unsurprisingly, live in the luigiscripts directory.
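
If you want to confirm that your fork has the layout described above, a quick check along the following lines may help. This is only a sketch: the helper name missing_project_dirs is hypothetical, and it looks only for the two directories named in this tutorial.

```python
import tempfile
from pathlib import Path

# Directories this tutorial relies on; a real project contains others too.
EXPECTED_DIRS = ("rscripts", "luigiscripts")


def missing_project_dirs(project_root):
    """Return the expected directory names that are absent under project_root."""
    root = Path(project_root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


# Demonstration on a throwaway directory that only has rscripts/.
with tempfile.TemporaryDirectory() as demo_root:
    (Path(demo_root) / "rscripts").mkdir()
    missing = missing_project_dirs(demo_root)

print(missing)
```

Run against your own project root, an empty list means both directories are present.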

Test the R Script Locally

The R script we'll be working with is olympics.R, a script that graphs the medals won by each country in the Olympics.

Open up olympics.R in your favorite code editor and take a look. The script takes in two parameters, input_path and output_path. Let's run it once quickly to make sure it works. Make sure you have R installed on your computer, and run:

cd <your_mortar_project_root>
Rscript rscripts/olympics.R \
    data/OlympicAthletes.csv \
    /tmp/medals_graph.png

When the script completes, you should have an Olympic medal histogram in /tmp/medals_graph.png. Open it up and take a look!
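
The same command can be launched from Python, which is roughly what a pipeline step has to do when it runs an R script. The sketch below is an illustration, not the code in the example project: the function names build_rscript_command and run_rscript are hypothetical, and it assumes Rscript is on your PATH.

```python
import subprocess


def build_rscript_command(script, input_path, output_path):
    """Return the argument list for: Rscript <script> <input_path> <output_path>."""
    return ["Rscript", script, input_path, output_path]


def run_rscript(script, input_path, output_path):
    """Run the R script; raises CalledProcessError if Rscript exits non-zero."""
    subprocess.run(
        build_rscript_command(script, input_path, output_path),
        check=True,
    )


# Build (but do not run) the command used in this tutorial.
command = build_rscript_command(
    "rscripts/olympics.R",
    "data/OlympicAthletes.csv",
    "/tmp/medals_graph.png",
)
print(" ".join(command))
```

Passing the arguments as a list rather than a single string avoids shell quoting problems if a path contains spaces. Calling run_rscript with the same three arguments from the project root would execute the script.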

Run in a Data Pipeline

Now we've got a locally-tested R script. But what if your company needs this graph updated every day? We don't want to get stuck manually pulling down the data, building the graph, and uploading it every day. Computers should do this for us!

So let's build a data pipeline to automate the process. We'll have the pipeline pull data from S3, run the R script on it, upload the graph back to S3, and then give us a link to the graph.

Open up luigiscripts/olympics-luigi.py in your code editor. This is a Luigi data pipeline to automate this process.

There are four steps in the pipeline, each represented by a Python class that extends luigi.Task:

  1. CopyOlympicsDataFromS3: copies the raw data from S3 to the local disk
  2. PlotOlympicsData: runs the R script to generate a graph
  3. CopyOlympicMedalsGraphToS3: uploads the graph from local disk to S3
  4. PrintS3GraphLink: generates an HTTPS link to the graph and prints it to the log file
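
To see how four steps like these end up running in order, here is a plain-Python stand-in. It is not Luigi's API and not the contents of olympics-luigi.py: the Task base class and the build function are hypothetical, and the assumption that each step depends only on the one before it is ours. It borrows just the idea, familiar from Luigi, that a task names its upstream tasks in a requires() method.

```python
class Task:
    """Hypothetical minimal task; subclasses declare what they depend on."""

    def requires(self):
        return []

    def run(self, log):
        # A real step would do work here; this sketch only records its name.
        log.append(type(self).__name__)


class CopyOlympicsDataFromS3(Task):
    pass


class PlotOlympicsData(Task):
    def requires(self):
        return [CopyOlympicsDataFromS3()]


class CopyOlympicMedalsGraphToS3(Task):
    def requires(self):
        return [PlotOlympicsData()]


class PrintS3GraphLink(Task):
    def requires(self):
        return [CopyOlympicMedalsGraphToS3()]


def build(task, log):
    """Run every upstream task first, then the task itself."""
    for upstream in task.requires():
        build(upstream, log)
    task.run(log)
    return log


# Asking for the last step pulls in the three steps it depends on.
order = build(PrintS3GraphLink(), [])
print(order)
```

Only the final task is requested; the others run because something downstream requires them. That is why a single command can drive the whole pipeline.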

Let's run the data pipeline now. You can either run it locally or in the cloud. We'll run in the cloud.

In the terminal, just run:

cd <your_mortar_project_root>
mortar luigi luigiscripts/olympics-luigi.py \
    --s3-path "s3://mortar-example-output-data/<your-handle>/medals_graph.png"

where <your-handle> should be an identifier unique to you (such as first-initial-last-name).

When you run the command, you should see the following:

Taking code snapshot... done
Sending code snapshot to Mortar... done
Requesting job execution... done
job_id: some_job_id

Job status can be viewed on the web at:

 https://app.mortardata.com/jobs/pipeline_job_detail?job_id=some_job_id

This tells you that your job has started successfully and gives you the URL to monitor its progress. If you open that URL, you should see your pipeline running in the logs: first moving data from S3, then running your R script, then uploading data to S3, and finally printing out the URL for the graph.

The URL in the logs should look something like:

2014-10-12 05:42:01,811 (olympics-luigi.py:160)   DOWNLOAD GRAPH AT: https://mortar-example-output-data.s3.amazonaws.com/...

Open up your URL and pull down the graph!
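
The link in the log uses the same bucket name as the --s3-path argument. As a rough illustration of how an s3:// path relates to an https:// address on s3.amazonaws.com, consider the sketch below. The function name s3_path_to_https_url and the handle jdoe are hypothetical, and the link your pipeline prints may contain additional components (the log line above is truncated), so treat this as the general shape only.

```python
from urllib.parse import urlparse


def s3_path_to_https_url(s3_path):
    """Map s3://<bucket>/<key> to https://<bucket>.s3.amazonaws.com/<key>."""
    parsed = urlparse(s3_path)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError("expected an s3://bucket/key path, got %r" % s3_path)
    key = parsed.path.lstrip("/")
    return "https://%s.s3.amazonaws.com/%s" % (parsed.netloc, key)


# "jdoe" stands in for <your-handle>.
url = s3_path_to_https_url("s3://mortar-example-output-data/jdoe/medals_graph.png")
print(url)
```

A plain address like this only works for objects that are publicly readable; links to private objects need extra signing information, which this sketch does not produce.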

[Figure: Olympic Medals Graph]

Next Steps

Proceed to the next article to start working with your own R scripts. Or, learn a bit more about Luigi data pipelines.