
Using Apache Log Data

You can process Apache logs (both in Common Log Format and Combined Log Format) with Mortar by using the Piggybank functions CommonLogLoader and CombinedLogLoader.

Loading Common Log Format

Use CommonLogLoader as follows:

data = LOAD 's3n://path/to/input'
            USING org.apache.pig.piggybank.storage.apachelog.CommonLogLoader()
            AS (addr: chararray, logname: chararray, user: chararray, time: chararray,
                method: chararray, uri: chararray, proto: chararray,
                status: int, bytes: int);

For example, the log line

81.19.151.110 - - [17/Jan/2013:13:08:02 +0000] "GET /api/function HTTP/1.1" 200 156

will be loaded as:

(81.19.151.110,-,-,17/Jan/2013:13:08:02 +0000,GET,/api/function,HTTP/1.1,200,156)
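
Once loaded, the fields can be used like those of any other Pig relation. The following is a minimal sketch, not part of Piggybank itself, that counts requests and totals bytes per HTTP status code. The aliases by_status and status_counts and the output path are placeholders chosen for illustration, and Piggybank is assumed to be available without an explicit REGISTER, as in the statement above.

```pig
-- Load Common Log Format lines (same LOAD statement as above).
data = LOAD 's3n://path/to/input'
            USING org.apache.pig.piggybank.storage.apachelog.CommonLogLoader()
            AS (addr: chararray, logname: chararray, user: chararray, time: chararray,
                method: chararray, uri: chararray, proto: chararray,
                status: int, bytes: int);

-- Group requests by HTTP status code.
by_status = GROUP data BY status;

-- For each status code, count the requests and total the bytes sent.
status_counts = FOREACH by_status GENERATE
                    group AS status,
                    COUNT(data) AS requests,
                    SUM(data.bytes) AS total_bytes;

-- Placeholder output location; replace with your own path.
STORE status_counts INTO 's3n://path/to/output';
```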

Loading Combined Log Format

Use CombinedLogLoader as follows:

data = LOAD 's3n://path/to/input'
            USING org.apache.pig.piggybank.storage.apachelog.CombinedLogLoader()
            AS (addr: chararray, logname: chararray, user: chararray, time: chararray,
                method: chararray, uri: chararray, proto: chararray,
                status: int, bytes: int, referer: chararray, userAgent: chararray);

For example, the log line

81.19.151.110 - - [17/Jan/2013:13:08:02 +0000] "GET /api/function HTTP/1.1" 200 156 "-" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_4; en-us) AppleWebKit/525.18 (KHTML, like Gecko) Version/3.1.2 Safari/525.20.1"

will be loaded as:

(81.19.151.110,-,-,17/Jan/2013:13:08:02 +0000,GET,/api/function,HTTP/1.1,200,156,-,Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_4; en-us) AppleWebKit/525.18 (KHTML, like Gecko) Version/3.1.2 Safari/525.20.1)
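
The two extra fields make referer and user-agent analysis possible. As a rough sketch, assuming you want the ten most frequent referers and that a missing referer is loaded as "-" (as in the example output above), something like the following could work. The aliases with_referer, by_referer, referer_counts, ordered and top_referers are illustrative names only.

```pig
-- Load Combined Log Format lines (same LOAD statement as above).
data = LOAD 's3n://path/to/input'
            USING org.apache.pig.piggybank.storage.apachelog.CombinedLogLoader()
            AS (addr: chararray, logname: chararray, user: chararray, time: chararray,
                method: chararray, uri: chararray, proto: chararray,
                status: int, bytes: int, referer: chararray, userAgent: chararray);

-- Keep only requests that carry a referer.
with_referer = FILTER data BY referer != '-';

-- Count hits per referer.
by_referer = GROUP with_referer BY referer;
referer_counts = FOREACH by_referer GENERATE
                     group AS referer,
                     COUNT(with_referer) AS hits;

-- Sort by hit count and keep the ten most frequent referers.
ordered = ORDER referer_counts BY hits DESC;
top_referers = LIMIT ordered 10;

DUMP top_referers;
```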

Loading Custom Log Formats

You can load any format that can be parsed with a regular expression by using the Piggybank function MyRegExLoader. Pass it a pattern, and each capture group (a section of the pattern enclosed in parentheses) that matches is returned as a chararray field.

For example, you could load data in the format "01234: string - string" as follows:

data = LOAD 's3n://path/to/input'
            USING org.apache.pig.piggybank.storage.MyRegExLoader(
                '(\\d{5}):\\s*([a-zA-Z]*)\\s-\\s([a-zA-Z]*)'
                ) AS (id: chararray, string1: chararray, string2: chararray);

Note that regex escape sequences must be double-escaped inside the Pig string literal: write \\d and \\s (two backslashes) rather than \d and \s.
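
Because every captured group is returned as a chararray, numeric fields need an explicit cast before they can be compared or aggregated as numbers. A minimal sketch, assuming the same pattern as above; the aliases typed and filtered and the threshold 1000 are arbitrary illustrations:

```pig
-- Load lines of the form "01234: string - string" (same LOAD statement as above).
data = LOAD 's3n://path/to/input'
            USING org.apache.pig.piggybank.storage.MyRegExLoader(
                '(\\d{5}):\\s*([a-zA-Z]*)\\s-\\s([a-zA-Z]*)'
                ) AS (id: chararray, string1: chararray, string2: chararray);

-- Cast the captured id from chararray to int; the other fields stay as text.
typed = FOREACH data GENERATE (int) id AS id, string1, string2;

-- With an int id, the comparison is numeric rather than lexical.
filtered = FILTER typed BY id >= 1000;

DUMP filtered;
```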


Mortar Project Example

For a full example in a Mortar project, clone the mortar-examples repository and look at the nasa_logs pigscript.