
Pig Help and Resources

Apache Pig is a simple, open-source data flow language for Hadoop. It makes it easy to write scripts for Hadoop that perform well against massive data sets.

Pig on Mortar

Mortar supports Apache Pig version 0.9 and Apache Pig version 0.12, with patches applied for:

  • Better Amazon S3 Integration
  • Pure-Python and Jython UDFs
  • Working Illustrate command
  • Script Validation
  • Improved error handling and error messages
  • Additional and improved loaders for data formats such as JSON, XML, and CSV

By default, Mortar runs Apache Pig version 0.9. To learn more about the features offered only in Apache Pig version 0.12, check out the Apache Pig Release Notes.


Pig Tutorial

We've created an interactive Pig Tutorial to walk new users through the basic concepts and keywords.


Pig Cheat Sheet

For ongoing reference or a faster ramp-up, we've made a handy Pig Cheat Sheet. We've also compiled a shorter SQL->Pig Cheat Sheet that translates a number of key SQL concepts into their Pig equivalents.


Programming Pig Book

An excellent resource, available for free online, is the O'Reilly book Programming Pig.


References

The most complete references for Pig are the Apache Pig 0.9 and Apache Pig 0.12 documentation.


Video

Jonathan Coveney of Twitter has also posted a great introductory Pig video tutorial. If you don't want to watch the full video, you can just read the slides.


Mailing List

The Pig User Mailing List is also a good source for Pig questions and answers.