Pig Help and Resources
Apache Pig is a simple, open-source data flow language for Hadoop. It makes it easy to write Hadoop scripts that perform well against massive data sets.
Pig on Mortar
Mortar supports Apache Pig version 0.9 and Apache Pig version 0.12, with patches applied for:
- Better Amazon S3 Integration
- Pure-Python and Jython UDFs
- Working Illustrate command
- Script Validation
- Improved error handling and error messages
- Additional and improved loaders for data formats such as JSON, XML, and CSV
By default, Mortar runs with Apache Pig version 0.9. To learn more about features offered only in Apache Pig version 0.12, check out the Apache Pig Release Notes.
We've created an interactive Pig Tutorial to walk new users through the basic concepts and keywords.
Pig Cheat Sheet
For ongoing reference or a faster ramp-up, we've made a handy Pig Cheat Sheet. We've also compiled a shorter SQL->Pig Cheat Sheet that translates a number of key SQL concepts into their Pig equivalents.
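To give a flavor of the kind of translation involved, here is a minimal sketch of one SQL query rewritten in Pig Latin. It is an illustration only, not taken from the cheat sheet: the file name `products.csv`, its schema, and the relation names are all hypothetical.

```pig
-- SQL being translated (hypothetical table and columns):
--   SELECT category, COUNT(*) AS n
--   FROM products
--   WHERE price > 10
--   GROUP BY category;

-- Load a comma-separated file; the path and schema are assumptions for this sketch.
products = LOAD 'products.csv' USING PigStorage(',')
           AS (name:chararray, category:chararray, price:double);

-- WHERE becomes FILTER.
expensive = FILTER products BY price > 10.0;

-- GROUP BY becomes GROUP; each output tuple holds the key ("group")
-- and a bag of the matching rows (named after the input relation).
by_category = GROUP expensive BY category;

-- The SELECT list becomes FOREACH ... GENERATE.
-- COUNT_STAR counts every row in the bag, like SQL's COUNT(*).
counts = FOREACH by_category GENERATE group AS category, COUNT_STAR(expensive) AS n;

DUMP counts;
```

Each SQL clause maps to its own named step, which is what makes Pig a data flow language: the script reads top to bottom in the order the data is processed.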
Programming Pig Book
An excellent resource, available for free online, is the O'Reilly book Programming Pig.
The most complete references for Pig are the Apache Pig 0.9 and Apache Pig 0.12 documentation.
Jonathan Coveney of Twitter has also posted a great introductory Pig video tutorial. If you don't want to watch the full video, you can read the slides instead.
Additionally, the Pig User Mailing List is a good source for Pig questions and answers.