Jacob Nelson edited this page Jun 1, 2015 · 5 revisions

Follow the instructions for our Docker image here: https://registry.hub.docker.com/u/uwsampa/giraph-docker/

What's below here is old and broken.

Old instructions

These instructions will help build a recent release of Giraph. This builds jar files with their dependencies included, which the release tarball from the Giraph website does not provide. You should be able to use a release tarball too, but the jars with dependencies were easier for our simple experiments.

Prerequisites

You must have the following installed to build and use Giraph:

  • Java 1.6 or later
  • Maven 3 or later
  • A supported version of Hadoop, as described here. This tutorial assumes Hadoop 2.6.0, with Yarn.
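Before building, it may help to confirm these tools are actually on the PATH. The following is a minimal POSIX shell sketch; it assumes the commands are named `java`, `mvn`, and `hadoop`, and the helper `have_tool` is a hypothetical name chosen for illustration. It only reports presence and version strings; comparing them against the minimums above is left to the reader.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the named command is on the PATH.
have_tool() {
  command -v "$1" >/dev/null 2>&1
}

# Report presence of each prerequisite (presence only, not minimum versions).
for tool in java mvn hadoop; do
  if have_tool "$tool"; then
    echo "found:   $tool -> $(command -v "$tool")"
  else
    echo "missing: $tool"
  fi
done

# Print the first line of each version banner for manual comparison.
# Note: java writes its version to stderr, hence the 2>&1.
if have_tool java; then java -version 2>&1 | head -n 1; fi
if have_tool mvn; then mvn -v | head -n 1; fi
if have_tool hadoop; then hadoop version | head -n 1; fi
```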

Step 1: Get the Giraph source

git clone https://git-wip-us.apache.org/repos/asf/giraph.git

Step 2: Check out the most recent release

cd giraph
git checkout -t origin/release-1.1

Step 3: Build Giraph for our Hadoop version

First, remove the problematic symbol described in this mailing-list post: [[http://mail-archives.apache.org/mod_mbox/giraph-user/201501.mbox/%[email protected]%3E]]

Then, change into the giraph directory and run this command:

mvn -Phadoop_yarn -Dhadoop.version=2.6.0 -DskipTests package
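After the build, the jars with dependencies need to be located for the `hadoop jar` and `-yj` arguments used later. A minimal sketch, assuming it is run against the top of the giraph checkout and that the jars land under each module's `target/` directory (as the `giraph-examples/target/` paths in the demo commands on this page suggest); `find_giraph_jars` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: list the "jar-with-dependencies" files produced by
# `mvn ... package`, searching under the given checkout (default: current dir).
# Assumption: Maven places them in each module's target/ directory.
find_giraph_jars() {
  root="${1:-.}"
  # In find's -path, '*' also matches '/', so this catches any .../target/... path.
  find "$root" -path '*/target/*' -name '*-jar-with-dependencies.jar' -print | sort
}
```

For example, `find_giraph_jars ~/giraph` lists every matching jar under that checkout, one path per line.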

Step 4: Prepare library dependencies

Giraph doesn't seem to play well with the combination of Hadoop 2.x, jar dependencies, and HDFS permissions. I couldn't find a way to use the --yarnjars argument to get the jars into the right place, and I'm not yet sure how to get Giraph running on our cluster.
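In the demo commands on this page, `-yj` (the short form of `--yarnjars`) receives a comma-separated list of bare jar file names, not paths, and those names have to match the jars that were actually built. A hypothetical helper that derives that list from the built jar paths, assuming the bare-name convention used in those commands is what Giraph expects:

```shell
#!/bin/sh
# Hypothetical helper: build the comma-separated value for -yj/--yarnjars.
# It strips directories and joins the file names, matching the bare-name
# style used in the demo commands on this page.
#   yarn_jars_arg a/x.jar b/y.jar   prints: x.jar,y.jar
yarn_jars_arg() {
  result=""
  for jar in "$@"; do
    name=$(basename "$jar")
    if [ -z "$result" ]; then
      result="$name"
    else
      result="$result,$name"
    fi
  done
  printf '%s\n' "$result"
}
```

Deriving the list from the real build output avoids hand-typed names drifting out of sync with the jar passed to `hadoop jar` (for example, a `-SNAPSHOT` suffix on one but not the other).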

Older instructions

These instructions will help you set up a version of Giraph that works with recent versions of Hadoop. At the time this was written, the most recent Giraph release (1.0.0) doesn't support the most recent release of Hadoop (2.4.1), so we use the Giraph development trunk.

Prerequisites

You must have the following installed to build and use Giraph:

  • Java 1.6 or later
  • Maven 3 or later
  • A supported version of Hadoop, as described here. This tutorial assumes Hadoop 2.4.1, with Yarn.

Step 1: Get the Giraph source

git clone https://git-wip-us.apache.org/repos/asf/giraph.git

Step 2: Build Giraph for our Hadoop version

Change into the giraph directory and run this command:

mvn -Phadoop_yarn -Dhadoop.version=2.4.1 package -DskipTests

Step 3: ZooKeeper

Step 4: Run a demo
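The demos read a small graph from `tiny-graph.txt` (the first command spells it `tiny_graph.txt`). Each line is one vertex as a JSON array in the layout `JsonLongDoubleFloatDoubleVertexInputFormat` reads: `[vertex id, vertex value, [[target id, edge weight], ...]]`. A sketch that writes a five-vertex example, modeled on the sample graph in the Apache Giraph quick-start guide; the specific vertices and weights are illustrative:

```shell
#!/bin/sh
# Write a small undirected, weighted test graph for SimpleShortestPathsComputation.
# One vertex per line: [id, value, [[target id, edge weight], ...]].
# Illustrative data only, modeled on the Giraph quick-start sample graph.
cat > tiny-graph.txt <<'EOF'
[0,0,[[1,1],[3,3]]]
[1,0,[[0,1],[2,2],[3,1]]]
[2,0,[[1,2],[4,4]]]
[3,0,[[0,3],[1,1],[4,4]]]
[4,0,[[3,4],[2,4]]]
EOF
```

The file can then be copied into HDFS with `hdfs dfs -mkdir -p input` followed by `hdfs dfs -put tiny-graph.txt input`, matching the `/user/<name>/input/tiny-graph.txt` paths used in the commands that follow.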

hadoop jar /shared/hadoop/giraph/giraph-examples/target/giraph-examples-1.1.0-for-hadoop-2.6.0-jar-with-dependencies.jar \
  org.apache.giraph.GiraphRunner org.apache.giraph.examples.SimpleShortestPathsComputation \
  -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat -vip tiny_graph.txt \
  -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op giraph-output-$(date +%s) \
  -yj giraph-examples-1.1.0-SNAPSHOT-for-hadoop-2.6.0-jar-with-dependencies.jar,giraph-1.1.0-SNAPSHOT-for-hadoop-2.6.0-jar-with-dependencies.jar \
  -w 1

$HADOOP_HOME/bin/hdfs dfs -mkdir -p input
$HADOOP_HOME/bin/hdfs dfs -put /scratch/nelson/giraph-1.0.0-x/tiny-graph.txt input

$HADOOP_HOME/bin/hadoop jar $GIRAPH_HOME/giraph-examples/target/giraph-examples-1.1.0-SNAPSHOT-for-hadoop-2.4.1-jar-with-dependencies.jar \
  org.apache.giraph.GiraphRunner org.apache.giraph.examples.SimpleShortestPathsComputation \
  -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat -vip /user/root/input/tiny-graph.txt \
  -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op /user/root/output-$(date +%s) \
  -yj giraph-examples-1.1.0-SNAPSHOT-for-hadoop-2.4.1-jar-with-dependencies.jar,giraph-1.1.0-SNAPSHOT-for-hadoop-2.4.1-jar-with-dependencies.jar \
  -w 1

$HADOOP_HOME/bin/hadoop jar /scratch/nelson/giraph/giraph-examples/target/giraph-examples-1.1.0-SNAPSHOT-for-hadoop-2.4.1-jar-with-dependencies.jar \
  org.apache.giraph.GiraphRunner org.apache.giraph.examples.SimpleShortestPathsComputation \
  -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat -vip /user/nelson/input/tiny-graph.txt \
  -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op /user/nelson/output \
  -yj giraph-examples-1.1.0-SNAPSHOT-for-hadoop-2.4.1-jar-with-dependencies.jar,giraph-1.1.0-SNAPSHOT-for-hadoop-2.4.1-jar-with-dependencies.jar \
  -w 1

$GIRAPH_PREFIX/bin/giraph ./giraph-examples-1.1.0-SNAPSHOT.jar org.apache.giraph.examples.SimpleShortestPathsComputation \
  -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat -vip /user/root/input/tiny-graph.txt \
  -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op /user/root/output \
  -yj giraph-core-1.1.0-SNAPSHOT.jar,giraph-examples-1.1.0-SNAPSHOT.jar \
  -w 1

The giraph script above runs:

$HADOOP_PREFIX/bin/hadoop jar /usr/local/giraph/giraph-examples-1.1.0-SNAPSHOT-for-hadoop-2.4.1-jar-with-dependencies.jar \
  org.apache.giraph.GiraphRunner org.apache.giraph.examples.SimpleShortestPathsComputation \
  -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat -vip /user/root/input/tiny-graph.txt \
  -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op /user/root/output-$(date +%s) \
  -yj giraph-examples-1.1.0-SNAPSHOT-for-hadoop-2.4.1-jar-with-dependencies.jar,giraph-1.1.0-SNAPSHOT-for-hadoop-2.4.1-jar-with-dependencies.jar \
  -w 1
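Several commands above append `$(date +%s)` to the `-op` path. Our assumption is that this is because Hadoop-style file output refuses to write into a directory that already exists, so a fixed path such as `/user/nelson/output` works only once unless it is deleted first (for example with `hdfs dfs -rm -r /user/nelson/output`). A hypothetical helper that builds such a per-run path:

```shell
#!/bin/sh
# Hypothetical helper: print "<prefix>-<Unix time in seconds>" so each run
# writes to a fresh output directory (assumes existing dirs are rejected).
unique_output_dir() {
  prefix="${1:-giraph-output}"
  printf '%s-%s\n' "$prefix" "$(date +%s)"
}
```

For example, `-op "$(unique_output_dir /user/root/output)"` yields `/user/root/output-` followed by the current Unix time. Two runs started within the same second would still collide.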