Apache Pig originated within the Apache Hadoop project as the module in charge of providing a way to analyze the data that Hadoop processes and stores.
Pig uses a custom query language called "Pig Latin", which is designed to be easy to learn and supports both relational and functional styles.
This means you can use it much like SQL, with data joins and filters, or you can use its MapReduce features, the data mappers and reducers.
Apache Pig was originally meant to be used inside Hadoop installations, but newer versions can also run on their own in a separate JVM.
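To make the two styles mentioned above concrete, here is a minimal sketch in plain Python. It is not Pig Latin and does not use any Pig API. The sample records and the function names (`relational_style`, `map_reduce_style`) are hypothetical, invented only to show what "filter and join" versus "map and reduce" look like on the same data.

```python
from collections import defaultdict

# Hypothetical sample data, invented for illustration only.
users = [
    {"id": 1, "name": "ana"},
    {"id": 2, "name": "bo"},
    {"id": 3, "name": "cy"},
]
visits = [
    {"user_id": 1, "seconds": 30},
    {"user_id": 1, "seconds": 5},
    {"user_id": 2, "seconds": 12},
    {"user_id": 3, "seconds": 2},
]


def relational_style(users, visits, min_seconds):
    """Filter the visits, then inner-join them to users on the id column."""
    kept = [v for v in visits if v["seconds"] >= min_seconds]  # filter step
    names = {u["id"]: u["name"] for u in users}  # lookup table for the join
    # Inner join: keep only visits whose user_id matches a known user.
    return [(names[v["user_id"]], v["seconds"]) for v in kept if v["user_id"] in names]


def map_reduce_style(visits):
    """Total seconds per user, written as explicit map, shuffle, reduce steps."""
    mapped = [(v["user_id"], v["seconds"]) for v in visits]  # map: emit (key, value)
    groups = defaultdict(list)
    for key, value in mapped:  # shuffle: group values by key
        groups[key].append(value)
    return {key: sum(values) for key, values in groups.items()}  # reduce: sum per key


joined = relational_style(users, visits, min_seconds=10)
totals = map_reduce_style(visits)
```

In Pig itself the relational form would be written with Pig Latin operators, and Pig would translate the script into jobs for the underlying execution engine. The sketch only mirrors the shape of the two approaches.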
What is new in version 0.14.0:
- Pluggable execution engines (to allow Pig to run on non-MapReduce engines in the future)
- Auto-local mode (allows jobs with small input data to run in-process)
- Fetch optimization (to improve the interactivity of the Grunt shell)
- Fixed counters for local mode
- Support for a user-level jar cache
- Support for blacklisting and whitelisting Pig commands
- Several performance fixes and debuggability features
- A few non-backwards-compatible interface modifications have been introduced in this release to make Pig work with non-MapReduce engines
What is new in version 0.11.0:
- This release includes the DateTime datatype, the RANK, CUBE and ROLLUP operators, Groovy UDFs, custom reducer estimation, schema-based tuples and HCatalog DDL integration.
What is new in version 0.9.1:
- This release works with Hadoop 0.20.
What is new in version 0.6:
- Added Zebra as a contrib project. See http://wiki.apache.org/pig/zebra
- Added UDFContext, which gives UDFs a way to pass information from the front end to the back end and gives UDFs access to JobConf in the back end.
- Added left outer join for fragment replicate join.
- Added ability to set job priority from Pig Latin.
- Enhanced multi-query to work with joins in some cases.
- Reworked the memory manager to significantly reduce GC overhead and out-of-heap failures.
- Added Accumulator interface for UDFs.
- Over 100 bug fixes and improvements.
Requirements:
- Java 1.6.x or higher
- Apache Hadoop 0.20.x or higher