Apache Pig is an open-source library in two parts: a high-level data flow language called PigLatin, plus an engine that parses, optimizes, and automatically executes PigLatin scripts as a series of MapReduce jobs on a Hadoop cluster. It’s much easier and faster to write than MapReduce, and it’s easier to decipher, too: PigLatin scripts read like a series of steps rather than spaghetti code. On top of that, Pig code is future-proofed and ready to take advantage of coming Hadoop improvements.
Pig is easy to learn, especially for anyone with a SQL background, which makes it accessible to data scientists who may not be software engineers by background. It also allows users to integrate code written in Java, Python, Jython, and other languages, providing extensive flexibility and use of specialized libraries.
Pig is open source and is actively supported by an impressive community of developers who are constantly contributing new code. It also has stability and staying power by virtue of its big-time institutional users, such as Netflix, LinkedIn, Twitter, Salesforce, and Stanford University.