Installing Hadoop 2.2.0 on single Ubuntu 12.04 x86_64 Desktop

From Notes_Wiki
Revision as of 05:43, 15 February 2023 by Saurabh (talk | contribs)

Home > Ubuntu > Hadoop cluster setup > Installing Hadoop 2.2.0 on single Ubuntu 12.04 x86_64 Desktop

Installation steps

  1. Install java as mentioned at Installing Java on Ubuntu 12.04 x86_64 Desktop
  2. Create user account and group for hadoop using:
    sudo groupadd hadoop
    sudo useradd hadoop -b /home -g hadoop -mkU -s /bin/bash
    cd /home/hadoop
    sudo cp -rp /etc/skel/.[^.]* .
    sudo chown -R hadoop:hadoop .
    sudo chmod -R o-rwx .
    Note -m in useradd should be specified before -k.
  3. Install openssh server using:
    sudo apt-get -y install openssh-server
  4. Setup password-less ssh for the hadoop user (a key must be generated first; the ssh-keygen line was missing here):
    sudo su - hadoop
    ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    chmod 0600 ~/.ssh/authorized_keys
    #To test configuration, should echo hadoop
    ssh hadoop@localhost "echo $USER"
  5. Download hadoop from one of the Apache download mirrors. Take the latest stable .tar.gz release from the stable folder. (Ex hadoop-2.2.0.tar.gz)
  6. Extract hadoop sources in /opt/hadoop and make hadoop:hadoop its owner:
    sudo mkdir /opt/hadoop
    cd /opt/hadoop/
    sudo tar xzf <path-to-hadoop-source>
    sudo mv hadoop-2.2.0 hadoop
    sudo chown -R hadoop:hadoop .
  7. Configure hadoop single-node setup using:
    1. Login as user hadoop:
      sudo su - hadoop
    2. Edit '~/.bashrc' and append
      export JAVA_HOME=/opt/jdk1.7.0_40
      export HADOOP_INSTALL=/opt/hadoop/hadoop
      export HADOOP_PREFIX=/opt/hadoop/hadoop
      export HADOOP_HOME=/opt/hadoop/hadoop
      export PATH=$PATH:$HADOOP_INSTALL/bin
      export PATH=$PATH:$HADOOP_INSTALL/sbin
      export HADOOP_OPTS="-Djava.library.path=$HADOOP_PREFIX/lib"
      export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
      export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
    3. Change folder to /opt/hadoop/hadoop/etc/hadoop
    4. Edit 'hadoop-env.sh' and set a proper value for JAVA_HOME such as '/opt/jdk1.7.0_40'. Do not leave it as ${JAVA_HOME} as that does not work.
    5. Edit '/opt/hadoop/hadoop/libexec/hadoop-config.sh' and prepend the following line at the start of the script:
      export JAVA_HOME=/opt/jdk1.7.0_40
    6. Exit from hadoop user and relogin using 'sudo su - hadoop'. Check hadoop version using 'hadoop version' command.
    7. Again change folder to /opt/hadoop/hadoop/etc/hadoop
    8. Use 'mkdir /opt/hadoop/tmp'
    9. Edit 'core-site.xml' and add following between <configuration> and </configuration>:
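      The XML body is a typical single-node value set; the hadoop.tmp.dir value assumes the /opt/hadoop/tmp folder created above, and hdfs://localhost:9000 is the commonly used default filesystem URI:
        <property>
          <name>fs.default.name</name>
          <value>hdfs://localhost:9000</value>
        </property>
        <property>
          <name>hadoop.tmp.dir</name>
          <value>/opt/hadoop/tmp</value>
        </property>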
    10. Edit 'yarn-site.xml' and add following between <configuration> and </configuration>:
      Note: in hadoop 2.2.0 the aux-service was renamed from mapreduce.shuffle to mapreduce_shuffle, so the class property key is yarn.nodemanager.aux-services.mapreduce_shuffle.class, not yarn.nodemanager.aux-services.mapreduce.shuffle.class.
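      A typical single-node yarn-site.xml fragment for 2.2.0, using the renamed mapreduce_shuffle aux-service, is:
        <property>
          <name>yarn.nodemanager.aux-services</name>
          <value>mapreduce_shuffle</value>
        </property>
        <property>
          <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
          <value>org.apache.hadoop.mapred.ShuffleHandler</value>
        </property>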
    11. Use 'cp mapred-site.xml.template mapred-site.xml'
    12. Edit 'mapred-site.xml' and add following between <configuration> and </configuration> tags:
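      The usual single-node value here, which tells map-reduce jobs to run on yarn, is:
        <property>
          <name>mapreduce.framework.name</name>
          <value>yarn</value>
        </property>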
    13. Setup folders for HDFS using:
      cd ~
      mkdir -p mydata/hdfs/namenode
      mkdir -p mydata/hdfs/datanode
      cd /opt/hadoop/hadoop/etc/hadoop
    14. Edit 'hdfs-site.xml' and put following values between <configuration> and </configuration>:
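      A typical single-node hdfs-site.xml body, assuming the mydata/hdfs folders created in the previous step under /home/hadoop, is:
        <property>
          <name>dfs.replication</name>
          <value>1</value>
        </property>
        <property>
          <name>dfs.namenode.name.dir</name>
          <value>file:/home/hadoop/mydata/hdfs/namenode</value>
        </property>
        <property>
          <name>dfs.datanode.data.dir</name>
          <value>file:/home/hadoop/mydata/hdfs/datanode</value>
        </property>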
  8. Format namenode using 'hdfs namenode -format'
  9. Start dfs and yarn using 'start-dfs.sh' and 'start-yarn.sh'.
  10. Test using 'jps', you should see following services running:
    18098 Jps
    17813 NodeManager
    17189 DataNode
    16950 NameNode
    17462 SecondaryNameNode
    17599 ResourceManager
  11. Access NameNode at http://localhost:50070 and ResourceManager at http://localhost:8088
  12. Run sample map reduce job using:
    cd /opt/hadoop/hadoop
    hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 2 5
    Verify by using ResourceManager UI that task finishes successfully.


Running a word count example on single-node hadoop installation

Use following steps:

      #Assuming all services are running, use jps to verify
      #pg132.txt here is any sample text file to count words in
      hdfs dfs -mkdir -p /user/hadoop/input
      hdfs dfs -copyFromLocal pg132.txt /user/hadoop/input/pg132.txt
      cd /opt/hadoop/hadoop
      hadoop jar \
      ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar \
      wordcount /user/hadoop/input/pg132.txt /user/hadoop/output/wordcount
      hdfs dfs -cat /user/hadoop/output/wordcount/p* | less

To stop running hadoop daemons use:

      stop-yarn.sh
      stop-dfs.sh

