Prerequisites
At a minimum, you will need the following:
- Java 6 or Java 7 (Java 7 is recommended)
Set up passphraseless ssh
These instructions are taken from the Hadoop Quick Start Guide.
Now check that you can ssh to the localhost without a passphrase:
ssh localhost
If you cannot ssh to localhost without a passphrase, execute the following commands:
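ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
OpenSSH 7.0 and later disable DSA keys by default, which is why an RSA key is generated here.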
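Running the append command more than once adds the same key again each time. As a minimal sketch (add_authorized_key is a hypothetical helper, not part of Hadoop or Blur), the append can be made idempotent and the permissions tightened in one step:

```bash
#!/usr/bin/env bash
# Hypothetical helper: add a public key to authorized_keys at most once.
add_authorized_key() {
  local pubkey="$1"                              # public key file to authorize
  local auth="${2:-$HOME/.ssh/authorized_keys}"  # defaults to the usual location
  local dir
  dir="$(dirname "$auth")"
  # sshd's StrictModes check (on by default) rejects key files and directories
  # that other users can write to, so keep both private.
  mkdir -p "$dir" && chmod 700 "$dir" || return 1
  touch "$auth" || return 1
  # -x: whole-line match, -F: fixed strings, -f: read the key lines as patterns.
  if ! grep -qxF -f "$pubkey" "$auth"; then
    cat "$pubkey" >> "$auth" || return 1
  fi
  chmod 600 "$auth"
}
```

For example, run add_authorized_key with the public key you generated (such as ~/.ssh/id_dsa.pub or ~/.ssh/id_rsa.pub), then run ssh localhost again to confirm that no passphrase is requested.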
Heads Up!
You will also need to know the location of your JAVA_HOME directory.
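If you are not sure where Java is installed, one way to find a candidate JAVA_HOME is to resolve the java command on your PATH and strip the trailing bin/java. The sketch below uses hypothetical function names and assumes a readlink that supports -f (as in GNU coreutils). On macOS, /usr/libexec/java_home prints the JDK location directly.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn the path of a java binary into a JAVA_HOME candidate.
java_home_from_bin() {
  local home="${1%/bin/java}"   # drop the trailing /bin/java
  # A JDK often contains a private JRE whose java sits under jre/bin; prefer the
  # enclosing JDK directory when it has its own bin/java.
  if [ "${home##*/}" = "jre" ] && [ -x "${home%/jre}/bin/java" ]; then
    home="${home%/jre}"
  fi
  printf '%s\n' "$home"
}

# Hypothetical helper: resolve the java found on PATH through any symlinks
# (for example /usr/bin/java -> /etc/alternatives/java -> ...), then derive JAVA_HOME.
find_java_home() {
  local java_bin
  java_bin="$(command -v java)" || return 1
  java_bin="$(readlink -f "$java_bin")" || return 1
  java_home_from_bin "$java_bin"
}
```

For example, echo "$(find_java_home)" prints the directory to use as JAVA_HOME, provided java is on your PATH.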
Download
Download Source and Binary Artifacts
Both the source and binary artifacts are provided via mirrors here:
Apache Blur 0.2.4 Source
Apache Blur 0.2.4 Hadoop1 Binary
Apache Blur 0.2.4 Hadoop2 Binary
Compile Blur
If you are building from source, the distribution needs to be compiled before use.
Clone the master branch:
git clone https://git-wip-us.apache.org/repos/asf/incubator-blur.git
Hadoop 1
Build the artifacts for Hadoop 1 (to run the tests, remove "-DskipTests"):
cd incubator-blur/
mvn install -DskipTests -Dhadoop1
The binary artifact is located at:
distribution/target/apache-blur-0.2.4-incubating-SNAPSHOT-hadoop1-bin.tar.gz
Hadoop 2
Heads Up!
While all the tests pass on Hadoop 2, Blur has not been tested at scale on Hadoop 2, and the bin/blur-config.sh script will likely require modification to include the correct Hadoop 2 libraries.
Build the artifacts for Hadoop 2 (to run the tests, remove "-DskipTests"):
cd incubator-blur/
mvn install -DskipTests -Dhadoop2
The binary artifact is located at:
distribution/target/apache-blur-0.2.4-incubating-SNAPSHOT-hadoop2-bin.tar.gz
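Because the Hadoop 1 and Hadoop 2 builds differ only in the profile flag, a small wrapper can select it. This is a sketch: build_blur and the DRY_RUN variable are hypothetical names, and it assumes you run it from the incubator-blur directory.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the two Maven commands above.
# Usage: build_blur <1|2> [--with-tests]
build_blur() {
  local hadoop="$1" skip="-DskipTests"
  case "$hadoop" in
    1|2) ;;
    *) echo "usage: build_blur <1|2> [--with-tests]" >&2; return 2 ;;
  esac
  # Running the tests means leaving out -DskipTests.
  [ "${2:-}" = "--with-tests" ] && skip=""
  if [ -n "${DRY_RUN:-}" ]; then
    # Print the command instead of running it.
    echo "mvn install${skip:+ $skip} -Dhadoop$hadoop"
  else
    # $skip is deliberately unquoted so that an empty value disappears.
    mvn install $skip "-Dhadoop$hadoop" &&
      ls distribution/target/apache-blur-*-hadoop"$hadoop"-bin.tar.gz
  fi
}
```

For example, DRY_RUN=1 build_blur 2 prints mvn install -DskipTests -Dhadoop2 without building, and build_blur 1 --with-tests runs the Hadoop 1 build with its tests. The glob in the final ls avoids hard-coding the version in the artifact name.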
Install
Once a binary distribution is available, follow these steps to install it.
Extract the contents of the distribution:
tar -xzvf apache-blur-*-bin.tar.gz
If you use bash, edit ~/.bash_profile and add:
export BLUR_HOME=<directory where Blur was extracted>
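The extraction and the BLUR_HOME setting can be combined. The sketch below (install_blur is a hypothetical helper) assumes that every entry in the tarball sits under a single top-level directory, which it treats as BLUR_HOME.

```bash
#!/usr/bin/env bash
# Hypothetical helper: extract a Blur binary tarball and print its BLUR_HOME.
# Assumes every archive entry lives under one top-level directory.
install_blur() {
  local tarball="$1" dest="${2:-$PWD}" top
  # The first entry names the top-level directory, e.g. "apache-blur-.../".
  top="$(tar -tzf "$tarball" | head -n 1 | cut -d/ -f1)"
  [ -n "$top" ] || return 1
  mkdir -p "$dest" && tar -xzf "$tarball" -C "$dest" || return 1
  # Print the absolute path of the extracted directory.
  (cd "$dest/$top" && pwd)
}
```

For example, echo "export BLUR_HOME=$(install_blur apache-blur-0.2.4-incubating-hadoop1-bin.tar.gz "$HOME")" >> ~/.bash_profile extracts into your home directory and records the location; the tarball name here is only an illustration, so use the file you downloaded or built.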
Minimum Configuration
At a minimum, a few settings need to be configured before you can start Apache Blur.
Edit $BLUR_HOME/conf/blur-env.sh and set JAVA_HOME:
export JAVA_HOME=<Java Home Directory>
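If you prefer to script this step rather than edit the file by hand, the sketch below (set_java_home is a hypothetical helper) rewrites an existing, possibly commented-out, export JAVA_HOME line or appends one if none is present. It quotes the value so that paths containing spaces still work.

```bash
#!/usr/bin/env bash
# Hypothetical helper: set JAVA_HOME in a blur-env.sh style file non-interactively.
# The first "export JAVA_HOME=" line (commented out or not) is replaced, later
# ones are dropped, and the line is appended if none exists.
set_java_home() {
  local conf="$1" java_home="$2" tmp rc
  tmp="$(mktemp)" || return 1
  # Pass the new line through the environment so awk does not reinterpret
  # backslashes in the path (awk -v would).
  JH_LINE="export JAVA_HOME=\"$java_home\"" awk '
    /^[ \t]*#?[ \t]*export[ \t]+JAVA_HOME=/ {
      if (!done) print ENVIRON["JH_LINE"]
      done = 1
      next
    }
    { print }
    END { if (!done) print ENVIRON["JH_LINE"] }
  ' "$conf" > "$tmp" && cat "$tmp" > "$conf"   # cat keeps the file's permissions
  rc=$?
  rm -f "$tmp"
  return "$rc"
}
```

For example, set_java_home "$BLUR_HOME/conf/blur-env.sh" /usr/lib/jvm/java-7-openjdk-amd64, where the JDK path is only an illustration.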
Caution
If this variable is not set, the script will attempt to locate JAVA_HOME from the location of the "java" command.
Starting Apache Blur
Starting Apache Blur takes a single command:
$BLUR_HOME/bin/start-all.sh
This starts a single Controller server and a single Shard server on your local machine, along with ZooKeeper.
You should see:
blur@blurvm:~$ apache-blur-0.2.4-incubating-SNAPSHOT/bin/start-all.sh
localhost: ZooKeeper starting as process 6650.
localhost: Shard [0] starting as process 6783.
localhost: Controller [0] starting as process 6933.
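Because start-all.sh reports a process ID for each server, you can check that those processes are still alive a few seconds later. This is a minimal sketch: check_started_pids is a hypothetical helper, it assumes the messages shown above are written to standard output, and kill -0 only sees processes on the local machine started by the same user, so it fits the single-machine setup described here.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read start-all.sh output on stdin and check that every
# "starting as process N" PID still exists. Returns 1 if any is gone.
check_started_pids() {
  local line pid status=0
  while IFS= read -r line; do
    pid="$(printf '%s\n' "$line" | sed -n 's/.*starting as process \([0-9][0-9]*\).*/\1/p')"
    [ -n "$pid" ] || continue          # skip lines without a PID
    if kill -0 "$pid" 2>/dev/null; then
      echo "running: $line"
    else
      echo "NOT running: $line" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, run $BLUR_HOME/bin/start-all.sh | tee /tmp/blur-start.log, wait a few seconds, then run check_started_pids < /tmp/blur-start.log (the log path is only an illustration).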
If you run the start command again, the servers that are already running should not be started a second time. If you see the servers starting again, then there is likely an issue with startup; look in the $BLUR_HOME/logs directory for the log and out files.
To stop Apache Blur, run $BLUR_HOME/bin/stop-all.sh. You should see:
blur@blurvm:~$ apache-blur-0.2.4-incubating-SNAPSHOT/bin/stop-all.sh
localhost: Stopping Controller [0] server with pid [6933].
localhost: Stopping Shard [0] server with pid [6783].
localhost: Stopping ZooKeeper with pid [6650].
Shell
Once the servers have been started, you can use the shell to interact with Blur.
The shell command is located in the $BLUR_HOME/bin directory.
To auto-detect the controller servers from the $BLUR_HOME/conf/controllers file, run:
$BLUR_HOME/bin/blur shell
You can also specify the controller servers explicitly:
$BLUR_HOME/bin/blur shell controller1:40010,controller2:40010
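The explicit form is each controller host followed by its port, separated by commas. A sketch that builds that string from a controllers file, assuming one host name per line and the 40010 port used in the example above (controller_connect_string is a hypothetical helper):

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn a controllers file into "host1:port,host2:port".
# Assumes one host per line; "#" comments and blank lines are ignored.
controller_connect_string() {
  local file="${1:-$BLUR_HOME/conf/controllers}" port="${2:-40010}"
  awk -v port="$port" '
    { sub(/#.*/, ""); gsub(/^[ \t]+|[ \t]+$/, "") }            # strip comments and spaces
    $0 != "" { printf "%s%s:%s", (n++ ? "," : ""), $0, port }  # comma-join hosts
    END { if (n) printf "\n" }
  ' "$file"
}
```

For example, $BLUR_HOME/bin/blur shell "$(controller_connect_string)" passes the same controllers explicitly.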
Once in the shell, you can create, enable, disable, and remove tables. Type "help" to get a list of the commands.
Shell Example
The examples below create a table called testtable. Storing a table in a local directory such as /data/testtable only works when you run Blur as a single instance. On a Hadoop cluster, the table location is normally an HDFS URI, for example hdfs://host:port/blur/tables/testtable.
Create Table
blur> #Creates a table called testtable in the HDFS directory /data/testtable with 11 shards
blur> create -t testtable -c 11 -l hdfs://namenode/data/testtable
Note
A local directory can be used; however, the integrity of the data may be compromised.
blur> #Creates a table called testtable in the local directory /data/testtable with 11 shards
blur> create -t testtable -c 11 -l file:///data/testtable
Mutate
blur> #Adds a row to testtable
blur> mutate testtable rowid1 recordid1 fam0 col1:value1
Query
blur> #Runs a query on testtable
blur> query testtable fam0.col1:value1
- Results Summary -
total : 1
time : 7.874 ms
-----------------------------------------------------------------------------------------------------
hit : 0
score : 1.4142135381698608
id : rowid1
recordId : recordid1
family : fam0
col1 : value1
-----------------------------------------------------------------------------------------------------
- Results Summary -
total : 1
time : 7.874 ms
Enable Highlighting
blur> #Turns highlighting on
blur> highlight
highlight of query command is now on
Query with Highlights
blur> #Runs a query on testtable with highlighting on; note that <<<value1>>> is highlighted
blur> query testtable fam0.col1:value1
- Results Summary -
total : 1
time : 13.395 ms
-----------------------------------------------------------------------------------------------------
hit : 0
score : 1.4142135381698608
id : rowid1
recordId : recordid1
family : fam0
col1 : <<<value1>>>
-----------------------------------------------------------------------------------------------------
- Results Summary -
total : 1
time : 13.395 ms
blur>