Install Hadoop on a single CentOS node

Install Oracle Java first. The steps are described on the page "Install oracle java in Cent-OS".

Create a user account and password for the hadoop user using:

sudo /sbin/useradd hadoop
sudo /usr/bin/passwd hadoop

Configure key-based SSH login from the hadoop user to itself (the Hadoop start scripts use SSH to launch the daemons, even on localhost) using:

sudo su - hadoop
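# Accept the default key path and leave the passphrase empty
ssh-keygen
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
# Test the configuration; this should print "hadoop".
# Single quotes make $USER expand on the remote side, not in the local shell.
ssh hadoop@localhost 'echo $USER'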
exit
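
If the test login still prompts for a password, file permissions are a common cause. The following is a minimal sketch of a check, assuming GNU stat (as shipped with CentOS) and the conventional modes 700 for the .ssh directory and 600 for authorized_keys. The function name check_ssh_perms is hypothetical and not part of OpenSSH or Hadoop.

```shell
# Hypothetical helper: check that an .ssh directory and its authorized_keys
# file carry the conventional modes (700 and 600). Assumes GNU stat.
check_ssh_perms() {
    ssh_dir=$1
    dir_mode=$(stat -c '%a' "$ssh_dir") || return 1
    key_mode=$(stat -c '%a' "$ssh_dir/authorized_keys") || return 1
    if [ "$dir_mode" = "700" ] && [ "$key_mode" = "600" ]; then
        echo "ok"
    else
        echo "fix: $ssh_dir is $dir_mode (want 700), authorized_keys is $key_mode (want 600)"
        return 1
    fi
}
```

As the hadoop user, it could be called as: check_ssh_perms ~/.ssh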

Download the Hadoop release tarball from one of the mirrors linked at https://www.apache.org/dyn/closer.cgi/hadoop/common/ and pick the latest stable .tar.gz release from the stable folder (e.g. hadoop-1.2.1.tar.gz).

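Extract the Hadoop tarball under /opt/hadoop, rename the extracted directory to hadoop (giving /opt/hadoop/hadoop), and make hadoop:hadoop its owner using:

sudo mkdir -p /opt/hadoop
cd /opt/hadoop/
sudo tar xzf <path-to-hadoop-source>
# Rename the versioned directory; adjust 1.2.1 to the version you downloaded
sudo mv hadoop-1.2.1 hadoop
sudo chown -R hadoop:hadoop .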
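
As an alternative to extracting and then renaming, the tarball's top-level directory can be dropped during extraction. This is only a sketch: the function name unpack_release is hypothetical, and it assumes a tar that supports --strip-components (GNU tar does) and a tarball with a single top-level directory such as hadoop-1.2.1.

```shell
# Hypothetical helper: unpack a release tarball directly into a destination
# directory, dropping the versioned top-level directory from every path.
unpack_release() {
    tarball=$1
    dest=$2
    mkdir -p "$dest" && tar xzf "$tarball" -C "$dest" --strip-components=1
}
```

From a shell with write access to /opt, it could be called as: unpack_release <path-to-hadoop-source> /opt/hadoop/hadoop. The ownership change to hadoop:hadoop is still needed afterwards.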

Configure Hadoop for a single-node setup as follows:

Log in as user hadoop and change the working directory to /opt/hadoop/hadoop using:
```
sudo su - hadoop
cd /opt/hadoop/hadoop
```

Edit conf/core-site.xml and insert the following within the configuration tag:

<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000/</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>

Edit conf/hdfs-site.xml and insert the following within the configuration tag:

<property>
<name>dfs.data.dir</name>
<value>/opt/hadoop/hadoop/dfs/name/data</value>
<final>true</final>
</property>
<property>
<name>dfs.name.dir</name>
<value>/opt/hadoop/hadoop/dfs/name</value>
<final>true</final>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>

Edit conf/mapred-site.xml and insert the following within the configuration tag:

<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
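
All three configuration files use the same property layout, so the blocks can also be generated rather than typed by hand. The sketch below assumes plain values that need no XML escaping (no &, < or >). The function name hadoop_property is hypothetical and not a Hadoop tool.

```shell
# Hypothetical helper: print one <property> element in the layout used above.
# Values are inserted verbatim, so they must not contain &, < or >.
hadoop_property() {
    printf '<property>\n<name>%s</name>\n<value>%s</value>\n</property>\n' "$1" "$2"
}
```

For example, hadoop_property mapred.job.tracker localhost:9001 prints the block shown above for conf/mapred-site.xml. The output still has to be pasted inside the configuration tag.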

Edit conf/hadoop-env.sh and make the following changes:
- Uncomment the JAVA_HOME line and set it to the Java installation directory, e.g. export JAVA_HOME=/opt/jdk1.7.0_40 (adjust the path to match the installed Java)
- Uncomment the HADOOP_OPTS line and set it to
```
export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
```
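
If the Java installation directory is not known, it can usually be derived from the path of the java binary. The sketch below only does the string manipulation. The function name java_home_from_binary is hypothetical, and it assumes the binary lives at <JAVA_HOME>/bin/java.

```shell
# Hypothetical helper: derive a JAVA_HOME candidate from the full path of a
# java binary by stripping the trailing /bin/java.
java_home_from_binary() {
    echo "${1%/bin/java}"
}

# Possible use on a machine where java is on the PATH (readlink -f resolves
# the symlinks that the alternatives system may have created):
#   java_home_from_binary "$(readlink -f "$(command -v java)")"
```

If the resolved binary sits under a jre subdirectory of a JDK, the result points at that jre directory, so check the printed path before copying it into conf/hadoop-env.sh.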

Format the namenode (only on first setup, since formatting erases existing HDFS metadata) using:
```
./bin/hadoop namenode -format
```
Start all services using:
```
./bin/start-all.sh
```
Verify that all services started using the 'jps' command, whose output should be similar to:
```
26049 SecondaryNameNode
25929 DataNode
26399 Jps
26129 JobTracker
26249 TaskTracker
25807 NameNode
```
with different process-ids.
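
Checking the list by eye is easy to get wrong, because NameNode and SecondaryNameNode look alike. The following sketch reads jps output on standard input and prints the expected daemons that are absent. The function name missing_daemons is hypothetical, and the daemon list is taken from the sample output above.

```shell
# Hypothetical helper: read `jps` output on stdin and print each expected
# Hadoop 1.x daemon that does not appear in it. Prints nothing if all run.
missing_daemons() {
    jps_output=$(cat)
    for daemon in NameNode SecondaryNameNode DataNode JobTracker TaskTracker; do
        # Compare the second column exactly, so that "NameNode" is not
        # satisfied by a "SecondaryNameNode" line.
        printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }' ||
            echo "$daemon"
    done
}
```

It could be used as: jps | missing_daemons
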
The web interfaces of the services should now be reachable at:
- http://localhost:50030/ for the JobTracker
- http://localhost:50070/ for the NameNode
- http://localhost:50060/ for the TaskTracker
To stop all services use:
```
./bin/stop-all.sh
```

Steps learned from http://tecadmin.net/steps-to-install-hadoop-on-centosrhel-6/

Verify the Hadoop installation by running a MapReduce job

We will run the Hadoop wordcount example on a large text file to verify that Hadoop is functioning properly:

As user hadoop, download a large text file into /opt/hadoop/data using:

mkdir -p /opt/hadoop/data && cd /opt/hadoop/data
wget http://www.gutenberg.org/cache/epub/132/pg132.txt

Verify that all Hadoop services are running using jps. If they are not, start them with ../hadoop/bin/start-all.sh (the relative paths from here on assume the working directory is /opt/hadoop/data).

Copy file from local filesystem to hdfs using:

../hadoop/bin/hadoop dfs -copyFromLocal pg132.txt /user/hduser/input/pg132.txt

Verify file got copied using:

../hadoop/bin/hadoop dfs -ls /user/hduser/input

Note: you can list more dfs commands using:

../hadoop/bin/hadoop dfs -help

Open the Hadoop web UIs in different browser tabs:
- http://localhost:50030/ shows the number of map / reduce processes
- http://localhost:50060/ shows the number of Hadoop tasks being executed
- http://localhost:50070/ shows the number of live nodes and also provides a file browser to browse HDFS

Start hadoop job using:

../hadoop/bin/hadoop jar ../hadoop/hadoop-examples-1.2.1.jar wordcount /user/hduser/input/pg132.txt /user/hduser/output/wordcount

The version in the examples jar name may differ from 1.2.1, depending on the installed version of Hadoop.

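Check the output using (the quotes keep the local shell from expanding the glob, so that Hadoop matches it against HDFS):

../hadoop/bin/hadoop dfs -cat '/user/hduser/output/wordcount/p*' | less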
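
The wordcount output has one word per line, followed by a tab and its count, in word order rather than count order. To see the most frequent words instead of paging through everything, a small filter can help. This is a sketch: the function name top_words is hypothetical, and it assumes that words contain no whitespace.

```shell
# Hypothetical helper: read "word<TAB>count" lines on stdin and print the
# N lines with the highest counts (N defaults to 10).
top_words() {
    # Sort numerically on the second column, largest first
    sort -k2,2nr | head -n "${1:-10}"
}
```

It could be used in place of less in the command above, for example by piping the -cat output into: top_words 20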

Steps learned from https://giraph.apache.org/quick_start.html
