DataFu

Software Details:
Version: 1.2.0 / 1.3.0-rc1 updated
Upload Date: 10 Feb 16
Developer: LinkedIn
Distribution Type: Freeware
Downloads: 79

Rating: 5.0/5 (Total Votes: 1)

DataFu was developed at LinkedIn and is written entirely in Java.

DataFu includes functions/libraries for working with:

- Statistics
- Estimation
- Sampling
- Sessions
- Link Analysis
- Set operations
- Bags

DataFu is well suited to data mining and statistical applications that run on Hadoop, particularly through Pig. Neither Hadoop nor Pig is a database: Hadoop stores and processes large data sets across a cluster, and Pig is a scripting layer that runs on top of it.

These functions let developers analyze the data stored in a Hadoop cluster from within Pig, without having to write and maintain the equivalent operations themselves.

What is new in version 1.2.0:

  • Pair of UDFs for simple random sampling with replacement.
  • More dependencies now packaged in DataFu so fewer JAR dependencies required.
  • SetDifference UDF for computing set difference (e.g. A-B or A-B-C).
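The two ideas in the list above, sampling with replacement and chained set difference, can be illustrated in plain Java. This is only a sketch of the concepts and not DataFu's UDF code: the class name ReleaseSketch and both method names are hypothetical, and the real UDFs operate on Pig bags rather than on Java collections.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Illustrative only: plain-Java analogues of the two ideas, not DataFu's implementation.
final class ReleaseSketch {

    private ReleaseSketch() {}

    // Returns the elements of `first` that appear in none of `others`
    // (A - B, or A - B - C when two other sets are given).
    @SafeVarargs
    static <T> Set<T> setDifference(Set<T> first, Set<T>... others) {
        Set<T> result = new LinkedHashSet<>(first);
        for (Set<T> other : others) {
            result.removeAll(other);
        }
        return result;
    }

    // Draws `sampleSize` items uniformly at random with replacement,
    // so the same item may be picked more than once.
    static <T> List<T> sampleWithReplacement(List<T> items, int sampleSize, Random rng) {
        if (items.isEmpty() && sampleSize > 0) {
            throw new IllegalArgumentException("cannot sample from an empty list");
        }
        List<T> sample = new ArrayList<>(sampleSize);
        for (int i = 0; i < sampleSize; i++) {
            sample.add(items.get(rng.nextInt(items.size())));
        }
        return sample;
    }
}
```

For example, setDifference applied to {1, 2, 3, 4}, {2} and {4} yields {1, 3}. Because the sampling is done with replacement, sampleWithReplacement may return a sample larger than the input list.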

What is new in version 1.1.0:

  • Added SHA hash UDF.
  • Added InUDF and AssertUDF for Pig 0.12 compatibility. They behave the same as In and Assert.
  • Added SimpleRandomSample, which implements a scalable simple random sampling algorithm.
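As a rough illustration of hashing and of sampling without replacement, here is a small Java sketch. It rests on several assumptions: the class and method names are hypothetical, SHA-256 is assumed as the hash variant, and classic reservoir sampling stands in for the scalable algorithm that SimpleRandomSample implements, which this listing does not describe.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Illustrative only: not DataFu's UDF code.
final class HashAndSampleSketch {

    private HashAndSampleSketch() {}

    // Hex-encoded SHA-256 digest of a UTF-8 string (SHA-256 is an assumption here).
    static String sha256Hex(String input) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException("SHA-256 is not available", e);
        }
    }

    // Reservoir sampling: keeps up to k items from a single pass over the input,
    // each input item ending up in the sample with equal probability.
    // Assumes the input holds fewer than Integer.MAX_VALUE items.
    static <T> List<T> reservoirSample(Iterable<T> items, int k, Random rng) {
        List<T> reservoir = new ArrayList<>(k);
        int seen = 0;
        for (T item : items) {
            seen++;
            if (reservoir.size() < k) {
                reservoir.add(item);          // fill the reservoir first
            } else {
                int j = rng.nextInt(seen);    // uniform in [0, seen)
                if (j < k) {
                    reservoir.set(j, item);   // replace an earlier pick
                }
            }
        }
        return reservoir;
    }
}
```

Reservoir sampling was chosen for the sketch because it needs only one pass and a fixed amount of memory. A distributed job on Hadoop would need a different strategy to combine samples across tasks.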
