Install giraph in hadoop node

Set up Hadoop on a single CentOS node as explained at Install hadoop in a single Cent-OS node.
Create a directory for temporary files, such as /opt/hadoop/tmp, and add the following inside the <configuration> element of the 'conf/core-site.xml' file:
```
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/hadoop/tmp</value>
</property>
```
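Hadoop only reads settings from <property> elements nested inside the file's <configuration> root. If you prefer to script this edit, the following is a minimal sketch using Python's standard xml.etree.ElementTree module. set_property is a hypothetical helper, not part of Hadoop: it adds the property, or updates its value if the property is already present.

```python
import xml.etree.ElementTree as ET


def set_property(config_xml: str, name: str, value: str) -> str:
    """Add or update one <property> in a Hadoop-style <configuration> document."""
    root = ET.fromstring(config_xml)
    if root.tag != "configuration":
        raise ValueError("expected a <configuration> root element")
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == name:
            # The property already exists: overwrite its value in place.
            value_elem = prop.find("value")
            if value_elem is None:
                value_elem = ET.SubElement(prop, "value")
            value_elem.text = value
            break
    else:
        # Not found: append <property><name>...</name><value>...</value></property>.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    core_site = "<configuration></configuration>"
    print(set_property(core_site, "hadoop.tmp.dir", "/opt/hadoop/tmp"))
```

ElementTree drops comments and indentation when it rewrites the document, so keep a backup of conf/core-site.xml before replacing it with the result.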

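Edit the conf/mapred-site.xml file and add the following configuration to allow up to 4 map tasks to run in parallel (by default, Hadoop runs at most 2 map tasks at once on a TaskTracker). The second property sets the default number of map tasks per job:
```
<property>
<name>mapred.tasktracker.map.tasks.maximum</name>
<value>4</value>
</property>
<property>
<name>mapred.map.tasks</name>
<value>4</value>
</property>
```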
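As we understand Giraph's default setup, a job launched with '-w N' runs as a map-only Hadoop job that uses N map tasks for the workers plus one for the master, and all of them must be running at the same time. Under that assumption, the slot arithmetic is simple; the function names below are hypothetical helpers, not part of Giraph.

```python
def required_map_slots(workers: int) -> int:
    """Concurrent map slots needed by a job started with `-w workers`.

    Assumption: the master runs in its own map task next to one map task per
    worker, and all of these tasks must be running at the same time.
    """
    if workers < 1:
        raise ValueError("a Giraph job needs at least one worker")
    return workers + 1


def max_workers(map_slots: int) -> int:
    """Largest `-w` value that fits into `map_slots` concurrent map slots (same assumption)."""
    return max(map_slots - 1, 0)


if __name__ == "__main__":
    slots = 4  # value of mapred.tasktracker.map.tasks.maximum configured above
    print(required_map_slots(1))  # 2
    print(max_workers(slots))  # 3
```

With the 4 slots configured above this leaves room for up to 3 workers, whereas Hadoop's default of 2 slots would allow only 1.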

Edit conf/hdfs-site.xml and add the following so that HDFS keeps only one copy of each data block, effectively disabling replication (on a single node there is nowhere else to place a replica anyway):
```
<property>
<name>dfs.replication</name>
<value>1</value>
<description></description>
</property>
```
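To double-check what a configuration file actually sets, a small reader can list its properties. This is a sketch assuming the usual <configuration>/<property>/<name>/<value> layout; read_properties is a hypothetical helper, and the embedded HDFS_SITE string simply mirrors the snippet above.

```python
import xml.etree.ElementTree as ET


def read_properties(config_xml: str) -> dict:
    """Return {name: value} for every <property> in a Hadoop-style config document."""
    root = ET.fromstring(config_xml)
    props = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name:
            props[name] = (prop.findtext("value") or "").strip()
    return props


# Mirrors the hdfs-site.xml snippet above, wrapped in its <configuration> root.
HDFS_SITE = """<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description></description>
  </property>
</configuration>"""

if __name__ == "__main__":
    print(read_properties(HDFS_SITE).get("dfs.replication"))
```

To inspect the real file, pass it the contents of conf/hdfs-site.xml instead of the embedded string.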
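If the NameNode has not been formatted yet, format it using './bin/hadoop namenode -format'. Formatting erases any existing HDFS data, so skip this step on a node that is already in use.
Start all services using './bin/start-all.sh'.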
Install Maven using:
```
sudo yum -y install maven
```
Verify that the installed version is 3.0 or later using 'mvn --version'.
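If you want to check the Maven version in a setup script rather than by eye, the sketch below parses the 'Apache Maven X.Y.Z' line that 'mvn --version' prints first. The helper names and the sample string are illustrative assumptions.

```python
import re


def maven_version(version_output: str) -> tuple:
    """Extract (major, minor) from `mvn --version` output."""
    match = re.search(r"Apache Maven (\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("could not find an 'Apache Maven X.Y' line")
    return int(match.group(1)), int(match.group(2))


def maven_is_recent_enough(version_output: str, minimum: tuple = (3, 0)) -> bool:
    """True if the reported Maven version is at least `minimum` (tuple comparison)."""
    return maven_version(version_output) >= minimum


if __name__ == "__main__":
    sample = "Apache Maven 3.0.5"  # illustrative first line of `mvn --version`
    print(maven_is_recent_enough(sample))
```

On a real machine, pass it the output of subprocess.run(["mvn", "--version"], capture_output=True, text=True).stdout.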
Download the latest stable Giraph release from https://www.apache.org/dyn/closer.cgi/giraph/
Extract the Giraph source into the /opt/hadoop/giraph folder.
Make sure the Giraph files are owned by hadoop:hadoop, for example with 'sudo chown -R hadoop:hadoop /opt/hadoop/giraph'.
Edit ~/.bash_profile of the hadoop user and add:
```
export GIRAPH_HOME=/opt/hadoop/giraph
```
Log out of the hadoop user account and log in again, then verify that the variable is set using:
```
set | grep GIRAPH
```
Build Giraph using:
```
cd $GIRAPH_HOME
mvn package
```
To skip running the tests during the build, use:
```
mvn package -DskipTests
```
If the build succeeds, the folder 'giraph-core/target' should contain a file named 'giraph-<ver>-for-hadoop-<ver>-jar-with-dependencies.jar'. The folder 'giraph-examples/target/' should likewise contain a jar file for the examples with similar naming.
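The sketch below checks for that build output by globbing both target folders for the jar-with-dependencies naming pattern. It assumes GIRAPH_HOME points at the extracted source tree, as configured earlier; find_giraph_jars is a hypothetical helper.

```python
import glob
import os


def find_giraph_jars(giraph_home: str) -> dict:
    """Locate the jar-with-dependencies files produced by `mvn package`.

    Returns {"core": [...], "examples": [...]} with sorted paths; an empty
    list means the expected jar was not found for that module.
    """
    pattern = "giraph-*-for-hadoop-*-jar-with-dependencies.jar"
    return {
        module: sorted(glob.glob(os.path.join(giraph_home, f"giraph-{module}", "target", pattern)))
        for module in ("core", "examples")
    }


if __name__ == "__main__":
    home = os.environ.get("GIRAPH_HOME", "/opt/hadoop/giraph")
    for module, jars in find_giraph_jars(home).items():
        print(module, jars if jars else "MISSING")
```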

Steps adapted from https://giraph.apache.org/quick_start.html

Testing giraph by running a simple giraph job

We will run a simple single-source shortest-paths Giraph job to verify the Giraph installation:

Create an input file named tiny_graph.txt with the following data in the '/opt/hadoop/data' folder:
```
[0,0,[[1,1],[3,3]]]
[1,0,[[0,1],[2,2],[3,1]]]
[2,0,[[1,2],[4,4]]]
[3,0,[[0,3],[1,1],[4,4]]]
[4,0,[[3,4],[2,4]]]
```
Each line above has the format '[source_id,source_value,[[dest_id,edge_value],...]]'. This graph has 5 vertices and 12 directed edges. From the '/opt/hadoop/data' folder, copy the input file to HDFS and list the input directory to confirm the copy:
```
../hadoop/bin/hadoop dfs -copyFromLocal tiny_graph.txt /user/hduser/input/tiny_graph.txt
../hadoop/bin/hadoop dfs -ls /user/hduser/input
```
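Before running the job, you may want to confirm locally that every line of tiny_graph.txt parses in the format described above and that the counts match. This sketch treats each line as a JSON array, which is what the format looks like; mapping ids to Python int and values to float is our assumption based on the input format's name (long ids, double vertex values, float edge values). The function names are hypothetical.

```python
import json


def parse_giraph_json_line(line: str):
    """Parse one '[id,value,[[dest,weight],...]]' line into (id, value, edges)."""
    vertex_id, vertex_value, edges = json.loads(line)
    return int(vertex_id), float(vertex_value), [(int(d), float(w)) for d, w in edges]


def summarize(lines):
    """Return (vertex count, directed edge count); reject edges to undefined vertices."""
    vertices = [parse_giraph_json_line(line) for line in lines if line.strip()]
    ids = {vid for vid, _, _ in vertices}
    missing = {d for _, _, edges in vertices for d, _ in edges} - ids
    if missing:
        raise ValueError(f"edges point to undefined vertices: {sorted(missing)}")
    return len(vertices), sum(len(edges) for _, _, edges in vertices)


# Same contents as tiny_graph.txt above.
TINY_GRAPH = """[0,0,[[1,1],[3,3]]]
[1,0,[[0,1],[2,2],[3,1]]]
[2,0,[[1,2],[4,4]]]
[3,0,[[0,3],[1,1],[4,4]]]
[4,0,[[3,4],[2,4]]]"""

if __name__ == "__main__":
    print(summarize(TINY_GRAPH.splitlines()))  # (5, 12)
```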

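Run the job using the command below. '-vif' and '-vip' give the vertex input format and input path, '-of' and '-op' give the output format and output path, and '-w 1' runs the job with a single worker. Adjust the jar file name to match the Giraph and Hadoop versions you built:
```
../hadoop/bin/hadoop jar /opt/hadoop/giraph/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-0.20.203.0-jar-with-dependencies.jar \
  org.apache.giraph.GiraphRunner org.apache.giraph.examples.SimpleShortestPathsVertex \
  -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
  -vip /user/hduser/input/tiny_graph.txt \
  -of org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
  -op /user/hduser/output/shortestpaths \
  -w 1
```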

Check the output using:
```
../hadoop/bin/hadoop dfs -cat /user/hduser/output/shortestpaths/p* | less
```
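To know what to expect in the output, you can compute the same distances locally. The sketch below uses Dijkstra's algorithm, which is not what Giraph runs (the Giraph example propagates candidate distances as messages across supersteps), but for non-negative edge values both should arrive at the same shortest distances. It assumes the source vertex is id 1, which we believe is the default for SimpleShortestPathsVertex; change the argument if your source differs.

```python
import heapq


def shortest_paths(adjacency: dict, source: int) -> dict:
    """Dijkstra over {vertex: [(neighbor, weight), ...]}; unreachable vertices are omitted."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for v, w in adjacency.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# tiny_graph.txt as {vertex id: [(target id, edge value), ...]}
TINY_GRAPH = {
    0: [(1, 1.0), (3, 3.0)],
    1: [(0, 1.0), (2, 2.0), (3, 1.0)],
    2: [(1, 2.0), (4, 4.0)],
    3: [(0, 3.0), (1, 1.0), (4, 4.0)],
    4: [(3, 4.0), (2, 4.0)],
}

if __name__ == "__main__":
    # Source vertex 1 is an assumption about the example's default source id.
    for vertex, distance in sorted(shortest_paths(TINY_GRAPH, 1).items()):
        print(f"{vertex}\t{distance}")
```

Under that assumption, the Giraph output should contain the same id/distance pairs (vertex 4 at distance 5.0, for instance), possibly in a different order.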

Steps adapted from https://giraph.apache.org/quick_start.html
