Hadoop Pseudo-Distributed Mode – Setup – Ubuntu – old post. Do not use



Here is a summary of the Hadoop pseudo-distributed mode installation. This is my second post on the environment setup.

System Specs

  • OS: Ubuntu 16.04.1 (32-bit) in a VirtualBox VM
  • RAM: 4 GB
  • CPU: 1
  • Java: 1.8 (1.8.0_101)
  • Hadoop: 2.6 (2.6.4)

Update Ubuntu

Let’s update Ubuntu before starting. Depending on how long it has been since your last update, this can take a long time.

The following command refreshes the package lists.

pandian@kunthi:~$ sudo apt-get update
Fetched 1,646 kB in 8s (204 kB/s)
AppStream cache update completed, but some metadata was ignored due to errors.
Reading package lists... Done

The following command upgrades the installed packages (dist-upgrade also installs or removes packages when dependencies change).

pandian@kunthi:~$ sudo apt-get dist-upgrade
355 upgraded, 5 newly installed, 0 to remove and 0 not upgraded.
Need to get 295 MB/465 MB of archives.
After this operation, 279 MB of additional disk space will be used.
Do you want to continue? [Y/n] Y

<It is time consuming. Take a break.>

Installing JDK

Following http://askubuntu.com/questions/521145/how-to-install-oracle-java-on-ubuntu-14-04, install the Oracle JDK with the commands below.

pandian@kunthi:~$ sudo apt-add-repository ppa:webupd8team/java
pandian@kunthi:~$ sudo apt-get update
pandian@kunthi:~$ sudo apt-get install oracle-java8-installer
pandian@kunthi:~$ java -version
java version "1.8.0_101"
Java(TM) SE Runtime Environment (build 1.8.0_101-b13)
Java HotSpot(TM) Client VM (build 25.101-b13, mixed mode)
pandian@kunthi:~$ whereis java
java: /usr/bin/java /usr/share/java /usr/share/man/man1/java.1.gz
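
whereis only shows /usr/bin/java, which is a symlink managed by the alternatives system. The JAVA_HOME used later in this post (/usr/lib/jvm/java-8-oracle/) can be derived by resolving that link and stripping the trailing bin/java. Here is a minimal sketch, assuming GNU readlink. java_home_from is a hypothetical helper name, and the directory it reports depends on which JDK package provides java.

```shell
# java_home_from: hypothetical helper that turns the path of a java binary
# into a JAVA_HOME directory. Defaults to the java found on PATH.
java_home_from() {
    local bin="${1:-$(command -v java)}"
    [ -n "$bin" ] || { echo "java not found on PATH" >&2; return 1; }
    # Follow /usr/bin/java -> /etc/alternatives/java -> the real binary
    bin="$(readlink -f "$bin")" || return 1
    local home="${bin%/bin/java}"
    # A JDK 8 binary usually sits under the nested jre/; step out to the JDK root
    home="${home%/jre}"
    printf '%s\n' "$home"
}
```

For example, export JAVA_HOME="$(java_home_from)" sets the variable from whatever java is on PATH.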

Create User and User Group

Let’s run Hadoop with its own user and user group.

pandian@kunthi:~$ sudo groupadd -g 599 hadoop
pandian@kunthi:~$ sudo useradd -u 599 -g 599 hadoop
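
Later steps hand files to hadoop:hadoop, so it’s worth confirming the account really got the IDs passed to groupadd and useradd. A minimal sketch, assuming GNU id. check_account_ids is a hypothetical helper:

```shell
# check_account_ids: hypothetical helper; succeeds only if USER exists
# with the expected numeric UID and primary GID.
check_account_ids() {
    local user="$1" want_uid="$2" want_gid="$3"
    local uid gid
    uid="$(id -u "$user" 2>/dev/null)" || { echo "no such user: $user" >&2; return 1; }
    gid="$(id -g "$user")"
    if [ "$uid" != "$want_uid" ] || [ "$gid" != "$want_gid" ]; then
        echo "$user has uid=$uid gid=$gid, expected uid=$want_uid gid=$want_gid" >&2
        return 1
    fi
}
```

For this setup the call would be check_account_ids hadoop 599 599.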

Directory structure

Let’s create the directories Hadoop will use.

pandian@kunthi:~$ sudo mkdir -p /opt/hadoop
pandian@kunthi:~$ sudo chown hadoop:hadoop -R /opt/hadoop
pandian@kunthi:~$ sudo mkdir -p /var/lib/hadoop/journaldata
pandian@kunthi:~$ sudo chown hadoop:hadoop -R /var/lib/hadoop/journaldata
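
The mkdir -p / chown -R pairs above, and the dfs/name and dfs/data directories created later, all follow one pattern that a small loop can capture. This is a sketch that assumes it runs as root (for example from sudo -s). make_owned_dirs is a hypothetical helper:

```shell
# make_owned_dirs: hypothetical helper; creates each directory (with parents)
# and recursively hands it to OWNER, like the mkdir -p / chown -R pairs above.
# Only the listed directories are chowned; parents created by mkdir -p keep
# their default owner.
make_owned_dirs() {
    local owner="$1"; shift
    local dir
    for dir in "$@"; do
        mkdir -p "$dir" || return 1
        chown -R "$owner" "$dir" || return 1
    done
}
```

For example: make_owned_dirs hadoop:hadoop /opt/hadoop /var/lib/hadoop/journaldata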

User access and sudo privilege

These are still Linux tasks; we haven’t touched Hadoop itself yet.

pandian@kunthi:~$ sudo passwd hadoop
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
pandian@kunthi:~$ sudo usermod -d /opt/hadoop hadoop
pandian@kunthi:/opt/software/hadoop$ sudo su
root@kunthi:/home/pandian# cp /etc/sudoers /etc/sudoers.20160820
root@kunthi:~# vi /etc/sudoers

I added the hadoop line below the user privilege specification:

# User privilege specification
hadoop ALL=(ALL:ALL) ALL
root@kunthi:~# exit
pandian@kunthi:~$ su - hadoop
$ pwd
$ bash
hadoop@kunthi:~$ id
uid=1001(hadoop) gid=599(hadoop) groups=599(hadoop)

Hadoop package download

I copied the download link from http://hadoop.apache.org/releases.html. Here is how to download it:

hadoop@kunthi:~$ wget http://download.nus.edu.sg/mirror/apache/hadoop/common/hadoop-2.6.4/hadoop-2.6.4.tar.gz

The file is saved in the hadoop user’s home directory, /opt/hadoop.
hadoop@kunthi:~$ ls -alt
total 24
-rw-rw-r-- 1 hadoop hadoop 15339 Aug 20 07:43 hadoop-2.6.4.tar.gz
hadoop@kunthi:~$ gunzip hadoop-2.6.4.tar.gz
hadoop@kunthi:~$ tar -xvf hadoop-2.6.4.tar
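
If a mirror link serves an HTML error page instead of the archive, wget saves that page under the .tar.gz name and extraction fails. Here is a minimal sketch of a check to run before extracting, assuming GNU gzip and tar. check_tarball is a hypothetical helper:

```shell
# check_tarball: hypothetical helper; succeeds only if FILE is a readable
# gzip-compressed tar archive (an HTML page saved by wget fails the first test).
check_tarball() {
    local file="$1"
    gzip -t "$file" 2>/dev/null || { echo "$file is not valid gzip" >&2; return 1; }
    tar -tzf "$file" >/dev/null 2>&1 || { echo "$file is not a tar archive" >&2; return 1; }
}
```

For example: check_tarball hadoop-2.6.4.tar.gz && tar -xzf hadoop-2.6.4.tar.gz. Because tar -xzf decompresses and extracts in one step, the separate gunzip is optional.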

This extracts the archive into a new directory, /opt/hadoop/hadoop-2.6.4. Here are its contents:
hadoop@kunthi:~$ ls -alt hadoop-2.6.4
total 60
drwxr-xr-x 3 hadoop hadoop 4096 Aug 20 07:53 ..
drwxr-xr-x 9 hadoop hadoop 4096 Feb 12 2016 .
drwxr-xr-x 2 hadoop hadoop 4096 Feb 12 2016 bin
drwxr-xr-x 3 hadoop hadoop 4096 Feb 12 2016 etc
drwxr-xr-x 2 hadoop hadoop 4096 Feb 12 2016 include
drwxr-xr-x 3 hadoop hadoop 4096 Feb 12 2016 lib
drwxr-xr-x 2 hadoop hadoop 4096 Feb 12 2016 libexec
-rw-r--r-- 1 hadoop hadoop 15429 Feb 12 2016 LICENSE.txt
-rw-r--r-- 1 hadoop hadoop 101 Feb 12 2016 NOTICE.txt
-rw-r--r-- 1 hadoop hadoop 1366 Feb 12 2016 README.txt
drwxr-xr-x 2 hadoop hadoop 4096 Feb 12 2016 sbin
drwxr-xr-x 4 hadoop hadoop 4096 Feb 12 2016 share

Let’s create the configuration directory for Hadoop.
hadoop@kunthi:~$ sudo mkdir -p /etc/hadoop/conf
Create symlinks in /etc/hadoop/conf to the bundled configuration files, plus a version-independent hadoop link in the home directory:
hadoop@kunthi:~$ sudo ln -s /opt/hadoop/hadoop-2.6.4/etc/hadoop/* /etc/hadoop/conf/
hadoop@kunthi:~$ ln -s hadoop-2.6.4 hadoop
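
Because /etc/hadoop/conf holds only symlinks, and ln -s does not complain about a missing target, a wrong source path leaves dangling links. Here is a quick check, assuming GNU find (its -xtype l test matches broken symlinks). broken_links is a hypothetical helper:

```shell
# broken_links: hypothetical helper; prints every symlink directly inside
# DIR whose target does not exist.
broken_links() {
    find "$1" -maxdepth 1 -xtype l
}
```

Running broken_links /etc/hadoop/conf should print nothing.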

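SSH key creation

Hadoop’s start scripts log in to the node over SSH, so the hadoop user needs key-based SSH login. First, prepare the ~/.ssh directory and test SSH to localhost: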
hadoop@kunthi:~$ mkdir ~/.ssh
hadoop@kunthi:~$ cd ~/.ssh/
hadoop@kunthi:~/.ssh$ touch authorized_keys
hadoop@kunthi:~/.ssh$ touch known_hosts
hadoop@kunthi:~/.ssh$ chmod 700 ~/.ssh && chmod 600 ~/.ssh/*
hadoop@kunthi:~/.ssh$ ssh localhost
The authenticity of host 'localhost (' can't be established.
ECDSA key fingerprint is SHA256:Fj6op9qzbfodhsQTmpQJ17G/mcAvu541bTMTb3huhPg.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
hadoop@localhost's password:
Welcome to Ubuntu 16.04.1 LTS (GNU/Linux 4.4.0-31-generic i686)

Bash profile – environment variables

Because I created the hadoop user’s home folder manually, it has no bash profile. I’ll copy the profile files of another user whose setup works:
hadoop@kunthi:~$ cp /home/pandian/.bash* ~/
Then I added the following environment variables to ~/.bashrc:
export JAVA_HOME=/usr/lib/jvm/java-8-oracle/
export HADOOP_HOME=/opt/hadoop/hadoop
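
The later steps call hadoop, start-dfs.sh and start-yarn.sh without a path. That only works if Hadoop’s bin and sbin directories are on PATH. The listing above doesn’t include that line, so the PATH entry below is an assumption about what else ~/.bashrc needs:

```shell
# ~/.bashrc addition (assumed): keep HADOOP_HOME as above and put the hadoop
# command and the start-*/stop-* scripts on PATH.
export HADOOP_HOME="${HADOOP_HOME:-/opt/hadoop/hadoop}"
export PATH="$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin"
```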

Apply the changes to the current session:
hadoop@kunthi:~$ source ~/.bashrc

Hadoop env config

Next, set JAVA_HOME for the Hadoop daemons. First, back up hadoop-env.sh:
hadoop@kunthi:~/hadoop/etc/hadoop$ cd $HADOOP_HOME/etc/hadoop/
hadoop@kunthi:~/hadoop/etc/hadoop$ cp hadoop-env.sh hadoop-env.sh.20160820

I made the following change to hadoop-env.sh:

export JAVA_HOME=/usr/lib/jvm/java-8-oracle/
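
On Ubuntu, ~/.bashrc returns early for non-interactive shells. As a result, daemons that start-dfs.sh launches over SSH typically don’t see the JAVA_HOME set there, and hard-coding it in hadoop-env.sh avoids that. The stock Hadoop 2.x file contains export JAVA_HOME=${JAVA_HOME}. Here is a minimal sketch that rewrites that line, assuming GNU sed -i. set_env_java_home is a hypothetical helper:

```shell
# set_env_java_home: hypothetical helper; copies FILE to FILE.YYYYMMDD, then
# replaces the `export JAVA_HOME=` line with the given path, or appends one
# if the file has none. The path must not contain '|' or '&' (sed syntax).
set_env_java_home() {
    local file="$1" jh="$2"
    cp "$file" "$file.$(date +%Y%m%d)" || return 1
    if grep -q '^export JAVA_HOME=' "$file"; then
        sed -i "s|^export JAVA_HOME=.*|export JAVA_HOME=$jh|" "$file"
    else
        printf 'export JAVA_HOME=%s\n' "$jh" >> "$file"
    fi
}
```

For example: set_env_java_home $HADOOP_HOME/etc/hadoop/hadoop-env.sh /usr/lib/jvm/java-8-oracle/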

Set up passwordless SSH login

hadoop@kunthi:~/hadoop$ ssh-keygen -t rsa -P ""
Generating public/private rsa key pair.
Enter file in which to save the key (/opt/hadoop/.ssh/id_rsa):
Your identification has been saved in /opt/hadoop/.ssh/id_rsa.
Your public key has been saved in /opt/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:UXGO3tnfK9K8DayD0/jc+T/WgZetCHOuBAcssUw3gBo hadoop@kunthi
The key's randomart image is:
+---[RSA 2048]----+
| .+.o o.. |
| E .o = o + |
| o + + . . |
| . . + . o |
| S o o o o|
| oo o. =o|
| ==o+..=|
| =.+=+=oo|
| +=o+=++|
hadoop@kunthi:~/hadoop$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
hadoop@kunthi:~/hadoop$ sudo /etc/init.d/ssh restart
[ ok ] Restarting ssh (via systemctl): ssh.service.
hadoop@kunthi:~/hadoop$ ssh hadoop@kunthi
Welcome to Ubuntu 16.04.1 LTS (GNU/Linux 4.4.0-31-generic i686)
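
That login succeeds without a password prompt only if sshd accepts authorized_keys. With StrictModes (on by default), sshd ignores the file when it, or a directory above it, is writable by group or others. Here is a sketch that checks for the 700/600 modes set earlier, assuming GNU stat. ssh_perms_ok is a hypothetical helper:

```shell
# ssh_perms_ok: hypothetical helper; checks that the .ssh directory is mode
# 700 and its authorized_keys is mode 600, as set up above.
ssh_perms_ok() {
    local dir="${1:-$HOME/.ssh}"
    [ "$(stat -c %a "$dir")" = 700 ] || { echo "$dir should be mode 700" >&2; return 1; }
    [ "$(stat -c %a "$dir/authorized_keys")" = 600 ] || { echo "$dir/authorized_keys should be mode 600" >&2; return 1; }
}
```

Run it with no argument to check ~/.ssh.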

Define the slave name

Add the slave hostname to the slaves file. After the change, the file contains the slave name, which matches my hostname:
hadoop@kunthi:~/hadoop$ cat /etc/hadoop/conf/slaves
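
The slaves file lists one hostname per line, and in pseudo-distributed mode the only slave is this machine. Here is a sketch that writes the file, assuming this machine’s node name (uname -n) is the right entry. write_slaves is a hypothetical helper:

```shell
# write_slaves: hypothetical helper; overwrites FILE with one hostname per
# line. With no hosts given, it writes this machine's node name.
write_slaves() {
    local file="$1"; shift
    if [ "$#" -eq 0 ]; then
        set -- "$(uname -n)"
    fi
    printf '%s\n' "$@" > "$file"
}
```

For example: write_slaves "$HADOOP_HOME/etc/hadoop/slaves". The /etc/hadoop/conf/slaves symlink points at that file.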


Make the appropriate changes to core-site.xml:
hadoop@kunthi:~/hadoop$ cat etc/hadoop/core-site.xml
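
The contents of core-site.xml aren’t listed here. For reference, the Apache single-node setup guide for Hadoop 2.6 uses a minimal pseudo-distributed core-site.xml that points fs.defaultFS at hdfs://localhost:9000. The sketch below writes that file. Treat the value as the guide’s default, not necessarily what this VM used. write_core_site is a hypothetical helper:

```shell
# write_core_site: hypothetical helper; writes a minimal pseudo-distributed
# core-site.xml to the given path. The quoted heredoc stops the shell from
# expanding anything inside the XML.
write_core_site() {
    cat > "$1" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <!-- URI of the NameNode; value from the Apache single-node guide -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
}
```

For example: write_core_site $HADOOP_HOME/etc/hadoop/core-site.xml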


hadoop@kunthi:~$ cd $HADOOP_HOME
hadoop@kunthi:~/hadoop$ hadoop version
Hadoop 2.6.4
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r 5082c73637530b0b7e115f9625ed7fac69f937e6
Compiled by jenkins on 2016-02-12T09:45Z
Compiled with protoc 2.5.0
From source with checksum 8dee2286ecdbbbc930a6c87b65cbc010
This command was run using /opt/hadoop/hadoop-2.6.4/share/hadoop/common/hadoop-common-2.6.4.jar
hadoop@kunthi:~/hadoop$ sudo mkdir -p /var/lib/hadoop/cache/hadoop/dfs/name
hadoop@kunthi:~/hadoop$ sudo chown hadoop:hadoop /var/lib/hadoop/cache/hadoop/dfs/name
hadoop@kunthi:~/hadoop$ sudo mkdir -p /var/lib/hadoop/cache/hadoop/dfs/data
hadoop@kunthi:~/hadoop$ sudo chown hadoop:hadoop /var/lib/hadoop/cache/hadoop/dfs/data
hadoop@kunthi:~/hadoop$ cp etc/hadoop/hdfs-site.xml etc/hadoop/hdfs-site.xml.20160820

I made the following changes to hdfs-site.xml:
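
Here is a minimal hdfs-site.xml consistent with the name and data directories created above. The values are assumptions chosen to match those directories and the storage directory reported by the format step. dfs.replication is 1 because there is a single DataNode. write_hdfs_site is a hypothetical helper:

```shell
# write_hdfs_site: hypothetical helper; writes an hdfs-site.xml that keeps one
# replica per block and stores NameNode/DataNode data in the directories
# created under /var/lib/hadoop/cache/hadoop/dfs.
write_hdfs_site() {
    cat > "$1" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///var/lib/hadoop/cache/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///var/lib/hadoop/cache/hadoop/dfs/data</value>
  </property>
</configuration>
EOF
}
```

For example: write_hdfs_site etc/hadoop/hdfs-site.xml (run from $HADOOP_HOME, after the backup above).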

Formatting and starting the namenode

hadoop@kunthi:~/hadoop$ hadoop namenode -format
16/08/20 09:15:09 INFO namenode.NameNode: STARTUP_MSG:
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = kunthi/
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 2.6.4
16/08/20 09:15:10 INFO common.Storage: Storage directory /var/lib/hadoop/cache/hadoop/dfs/name has been successfully formatted.
16/08/20 09:15:10 INFO namenode.NameNode: SHUTDOWN_MSG:
SHUTDOWN_MSG: Shutting down NameNode at kunthi/
hadoop@kunthi:~/hadoop/sbin$ sudo mkdir /logs
hadoop@kunthi:~/hadoop/sbin$ sudo chown hadoop:hadoop /logs/
hadoop@kunthi:~/hadoop/sbin$ start-dfs.sh
hadoop@kunthi:~/hadoop/sbin$ start-yarn.sh
hadoop@kunthi:~/hadoop/sbin$ jps
6290 DataNode
6707 NodeManager
6599 ResourceManager
6459 SecondaryNameNode
6155 NameNode
7003 Jps
hadoop@kunthi:~/hadoop/sbin$ ./mr-jobhistory-daemon.sh start historyserver
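
The jps listing above shows the five daemons a pseudo-distributed HDFS and YARN setup should have. The history server started on the last line adds a JobHistoryServer entry. Here is a sketch that reports which of the five are missing. missing_daemons is a hypothetical helper that reads jps output:

```shell
# missing_daemons: hypothetical helper; reads `jps` output on stdin and prints
# each expected pseudo-distributed daemon that is not listed.
missing_daemons() {
    local out daemon
    out="$(cat)"
    for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # jps prints "<pid> <MainClass>", so compare the whole second field
        printf '%s\n' "$out" | awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }' \
            || printf '%s\n' "$daemon"
    done
}
```

For example, jps | missing_daemons prints nothing when all five daemons are up.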

Access the job history server, NameNode and DataNode web interfaces in your browser, as shown below.

[Screenshots: hadoop001 - job history; hadoop002 - NameNode information; hadoop003 - DataNode information]
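
Those pages are served by the daemons’ built-in web UIs. Assuming the stock Hadoop 2.x ports (nothing in this post changes them), a small helper prints the addresses to open. hadoop_ui_urls is a hypothetical name:

```shell
# hadoop_ui_urls: hypothetical helper; prints the default Hadoop 2.x web UI
# addresses for HOST (defaults to localhost).
hadoop_ui_urls() {
    local host="${1:-localhost}"
    printf 'NameNode:        http://%s:50070/\n' "$host"
    printf 'DataNode:        http://%s:50075/\n' "$host"
    printf 'ResourceManager: http://%s:8088/\n' "$host"
    printf 'JobHistory:      http://%s:19888/\n' "$host"
}
```

For example, hadoop_ui_urls kunthi lists the four pages for this VM.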


