Mortar has joined Datadog, the leading SaaS-based monitoring service for cloud applications.

Load Logs from Papertrail

You can easily process Papertrail log archives in Hadoop using Mortar's PapertrailLoader for Pig.

Setup Papertrail to Save Logs to S3

If you haven't already, set up Papertrail log archival to S3. It's useful not only for analyzing your logs, but also for peace of mind. Each night, Papertrail will upload a gzip-compressed, tab-separated text file containing the previous day's logs.


Loading Papertrail Logs

Once those logs are arriving in S3, you can use Mortar's PapertrailLoader to read them.

You'll need to register the jar for the loader first, and then use a load statement. The PapertrailLoader takes no schema information and can be used as follows:

REGISTER s3n://mhc-software-mirror/papertrail/papertrail-loader-0.2.jar;

log_data = LOAD 's3n://my-s3-bucket/papertrail/files'
           USING com.mortardata.pig.PapertrailLoader();

This assumes that your data is tab-separated, and loads each record in your data with the following schema:

(id:long, generated_at:chararray, received_at:chararray, source_id:chararray, source_name:chararray, source_ip:chararray, facility_name:chararray, severity_name:chararray, message:chararray)

All text that occurs within a line after the severity_name field is considered to be part of the message field, and tabs are preserved within that field.
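Once loaded, these fields can be used like any other Pig fields. As a minimal sketch, the following filters the logs down to error-level lines and projects just the columns of interest (the 'Error' value for severity_name is an assumption based on standard syslog severity naming; check your own archives for the exact strings Papertrail emits):

-- Keep only error-level lines; 'Error' is an assumed severity_name value.
errors = FILTER log_data BY severity_name == 'Error';

-- Project the timestamp, source, and message for the matching lines.
error_messages = FOREACH errors GENERATE received_at, source_name, message;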


Unpacking the Message Field

Generally, the bulk of your useful data will be in the message field. This field may require further parsing to be usable, which can mean writing your own UDF or using an existing one. For example:

DEFINE FromJson org.apache.pig.piggybank.evaluation.FromJsonWithSchema('f1: int, f2: chararray');

log_data = LOAD 's3n://my-s3-bucket/papertrail/files'
           USING com.mortardata.pig.PapertrailLoader();

message = FOREACH log_data GENERATE FLATTEN(FromJson(message));

This takes a message field that is a JSON object that looks like this:

{
    "f1": 432523,
    "f2": "tacos"
}

It then produces a new relation named message whose fields are f1 and f2, yielding the tuple (432523, "tacos"). For more on the JSON UDFs, see the JSON page.
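The unpacked fields can then feed any downstream Pig operation. A sketch continuing the example above, which groups on the f2 field from the hypothetical JSON schema and counts how many log lines share each value (the output path is an assumption):

-- Count log lines per distinct f2 value.
by_f2 = GROUP message BY f2;
f2_counts = FOREACH by_f2 GENERATE group AS f2, COUNT(message) AS num_lines;

-- Write the counts back to S3 (assumed output location).
STORE f2_counts INTO 's3n://my-s3-bucket/output/f2_counts';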