Monday, 11 March 2013

Hadoop Installation on OE Linux: Cluster Nodes

My requirement was to install and configure Hadoop on Oracle Linux. I followed the procedure below.

 Prerequisites :

1.     Java 1.6 or later must be installed on all machines (nodes).
2.     ssh must be installed on all machines. Also check that the sshd service is running on all machines (nodes).
3.     The clocks on all machines must be in sync with each other.
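
The prerequisite checks above can be scripted so that they run the same way on every node. The following is only a sketch: the function names java_version_ok and check_prereqs are my own, and it assumes that pgrep is available and that "java -version" prints the version in double quotes on its first line. It does not check clock sync.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a Java version string is 1.6 or later.
# Accepts the old scheme ("1.7.0_03") as well as the newer one ("11.0.2").
java_version_ok() {
    major=${1%%.*}          # text before the first dot
    rest=${1#*.}            # text after the first dot
    minor=${rest%%.*}       # text before the next dot
    case "$major" in ''|*[!0-9]*) return 1 ;; esac
    if [ "$major" -gt 1 ]; then return 0; fi
    case "$minor" in ''|*[!0-9]*) return 1 ;; esac
    [ "$major" -eq 1 ] && [ "$minor" -ge 6 ]
}

# Hypothetical helper: reports missing prerequisites on the local node.
check_prereqs() {
    rc=0
    if command -v java >/dev/null 2>&1; then
        # "java -version" writes to stderr, e.g.: java version "1.7.0_03"
        ver=$(java -version 2>&1 | sed -n '1s/.*"\(.*\)".*/\1/p')
        java_version_ok "$ver" || { echo "Java '$ver' is older than 1.6 or unreadable"; rc=1; }
    else
        echo "java not found"; rc=1
    fi
    command -v ssh >/dev/null 2>&1 || { echo "ssh not found"; rc=1; }
    pgrep -x sshd >/dev/null 2>&1 || { echo "sshd is not running"; rc=1; }
    return $rc
}
```

Running check_prereqs on each node prints one line per missing prerequisite and returns non-zero if anything is missing.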

Installation & Configuration :

1.     CREATE USER : Create a common user on all Linux nodes. The steps are:
                                           I.         Execute command: useradd hadoop
                                         II.        Execute command: passwd hadoop. It will prompt for a password; set the desired password and confirm it. Alternatively, use Administration --> Users and Groups.
2.       CONFIGURE SSH FOR PASSWORDLESS ENTRY :
SSH b/w Cluster Nodes

3.       DOWNLOAD :
Link To Download Software


4.        INSTALL : You will now see hadoop-1.0.4.tar.gz in your current directory. We need to unpack this file. [Do this on one node only; the directory is copied to all the other nodes later, as described further down in this document.]
                                            i.          Execute command: tar xzf hadoop-1.0.4.tar.gz -C /home/hadoop/
                                                        This creates a directory named "hadoop-1.0.4" under /home/hadoop.
                                           ii.          Now set the owner and group of this directory to "hadoop". Execute command: chown -R hadoop:hadoop /home/hadoop/hadoop-1.0.4/
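
The unpack step can be wrapped in a small function that fails early when the archive is missing. This is a sketch only: unpack_hadoop is a name I made up, and it assumes the archive contains a single top-level directory, as hadoop-1.0.4.tar.gz does. The chown still has to be run separately, as root.

```shell
#!/bin/sh
# unpack_hadoop TARBALL DEST_DIR
# Extracts the archive into DEST_DIR and prints the directory it created.
unpack_hadoop() {
    tarball=$1
    dest=$2
    [ -f "$tarball" ] || { echo "missing archive: $tarball" >&2; return 1; }
    mkdir -p "$dest" || return 1
    tar xzf "$tarball" -C "$dest" || return 1
    # First path component of the first archive entry, e.g. "hadoop-1.0.4"
    top=$(tar tzf "$tarball" | head -n 1 | cut -d/ -f1)
    echo "$dest/$top"
}
```

For the layout used in this post the call would be: unpack_hadoop hadoop-1.0.4.tar.gz /home/hadoop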
5.       CREATE DIRECTORIES : [Do this on all nodes.]

i.    Execute command: sudo mkdir /var/log/hadoop/

ii.   Execute command: sudo chown hadoop:hadoop /var/log/hadoop/

iii.  Execute command: sudo mkdir /usr/local/hadoopstorage

iv.   Execute command: sudo chown hadoop:hadoop /usr/local/hadoopstorage/

v.    Execute command: chmod -R 755 /usr/local/hadoopstorage

vi.   Execute command: cd /usr/local/hadoopstorage/

vii.  Execute command: mkdir datanode

viii. Execute command: mkdir namenode
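
Since the directory steps have to be repeated on every node, it can help to put the storage part into a function that is safe to run more than once. A minimal sketch, with make_storage_dirs being a hypothetical name; ownership is deliberately left out because chown needs root.

```shell
#!/bin/sh
# make_storage_dirs BASE
# Creates BASE/namenode and BASE/datanode and sets mode 755 on the whole tree.
# Ownership (chown hadoop:hadoop BASE) is left to the caller, since it needs root.
make_storage_dirs() {
    base=$1
    [ -n "$base" ] || { echo "usage: make_storage_dirs BASE" >&2; return 1; }
    # -p creates missing parents and does not fail if the directories exist
    mkdir -p "$base/namenode" "$base/datanode" || return 1
    chmod -R 755 "$base"
}
```

With the paths from this post the call would be: make_storage_dirs /usr/local/hadoopstorage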
6.       CONFIGURATIONS : Now we need to edit various configuration files to match our environment.

i.      cd  /home/hadoop/hadoop-1.0.4/conf/

ii.     There will be a file named "masters". Open it and write the <IP> of the master server (Namenode): execute vi masters, write the IP and save the file. In my case it was 192.16.12.248

iii.    Now add the IPs of the slaves (Datanodes) to the "slaves" file: execute vi slaves and write the IPs of all slave machines, one per line. In my case it was:
         192.16.12.249
iv.     Now make changes to the hadoop-env.sh script: execute vi hadoop-env.sh and change a few lines as below:
a.       Set the JAVA_HOME variable. In my case it was:
export JAVA_HOME=/usr/java/jdk1.7.0_03/
b.      Set the heap size (in MB):
export HADOOP_HEAPSIZE=2000
c.       Set the log file path:
export HADOOP_LOG_DIR=/var/log/hadoop
v.     Now set up the core-site.xml file. Here we specify how the namenode will be accessed, i.e. the IP/hostname of the namenode. Add the following block to your core-site.xml file inside the "configuration" tag:
<property>
<name>fs.default.name</name>
<value>hdfs://<IP_OF_NAMENODE>/</value>
<final>true</final>
</property>
       My core-site.xml file looks like –
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://paxmsql102/</value>
<final>true</final>
</property>
</configuration>
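
If the same file has to be produced for several clusters, the core-site.xml above can be generated from the namenode host name instead of edited by hand. This is a sketch under my own naming (write_core_site is hypothetical); it overwrites the target file, so any other properties already in it would be lost.

```shell
#!/bin/sh
# write_core_site NAMENODE_HOST OUTPUT_FILE
# Writes a core-site.xml that points fs.default.name at the given namenode.
write_core_site() {
    namenode=$1
    out=$2
    [ -n "$namenode" ] && [ -n "$out" ] || return 1
    cat > "$out" <<EOF
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://$namenode/</value>
<final>true</final>
</property>
</configuration>
EOF
}
```

For the cluster in this post the call would be: write_core_site paxmsql102 /home/hadoop/hadoop-1.0.4/conf/core-site.xml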

vi.    Now set up the hdfs-site.xml file as below:
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoopstorage/</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>dfs.name.dir</name>
<value>/usr/local/hadoopstorage/namenode</value>
<final>true</final>
</property>
<property>
<name>dfs.data.dir</name>
<value>/usr/local/hadoopstorage/datanode</value>
<final>true</final>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
<description> Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time.
</description>
</property>
</configuration>
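
One thing worth checking at this point is whether dfs.replication fits the number of datanodes in the "slaves" file: with fewer datanodes than the replication factor HDFS still runs, but blocks are reported as under-replicated. The sketch below reads the value back out of hdfs-site.xml. Both function names are my own, and the parsing assumes each <name> and <value> tag sits on a single line, as in the listing above; it is not a real XML parser.

```shell
#!/bin/sh
# get_property FILE NAME: print the <value> that follows <name>NAME</name>.
get_property() {
    awk -v want="$2" '
        /<name>/  { n = $0; sub(/.*<name>/, "", n);  sub(/<\/name>.*/, "", n) }
        /<value>/ { v = $0; sub(/.*<value>/, "", v); sub(/<\/value>.*/, "", v)
                    if (n == want) { print v; exit } }
    ' "$1"
}

# replication_fits SITE_FILE SLAVES_FILE
# Fails with a message when fewer datanodes are listed than dfs.replication.
replication_fits() {
    want=$(get_property "$1" dfs.replication)
    [ -n "$want" ] || want=3            # Hadoop default when the property is absent
    have=$(grep -c '[^[:space:]]' "$2" 2>/dev/null)   # count non-blank lines
    [ -n "$have" ] || have=0
    if [ "$have" -lt "$want" ]; then
        echo "dfs.replication=$want but only $have datanode(s) listed"
        return 1
    fi
}
```

It would be called from the conf directory as: replication_fits hdfs-site.xml slaves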
vii.   All configuration changes are now done, and they need to reach the other nodes too. We can either repeat all the steps above on every node, or simply copy the hadoop-1.0.4 directory to all the nodes. For the latter:

a.     scp -r /home/hadoop/hadoop-1.0.4  hadoop@<IP>:/home/hadoop/
b.     Go to the node you just copied to and check that the "/home/hadoop/hadoop-1.0.4" directory has the same user and group as on the first node.
c.     Repeat a and b for all other nodes.
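
With more than a couple of nodes, the copy step can be driven from the "slaves" file. The sketch below only prints the scp commands (a dry run) so they can be reviewed first. print_copy_commands is a hypothetical name, and it assumes the destination path on each node is the same as on the first node.

```shell
#!/bin/sh
# print_copy_commands SLAVES_FILE SRC_DIR [USER]
# Prints one "scp -r" command per host listed in SLAVES_FILE.
print_copy_commands() {
    slaves=$1
    src=$2
    user=${3:-hadoop}
    dest=$(dirname "$src")/          # copy into the same parent directory
    # The "|| [ -n ... ]" part keeps a last line that has no trailing newline.
    while IFS= read -r host || [ -n "$host" ]; do
        case "$host" in ''|\#*) continue ;; esac   # skip blank and comment lines
        echo "scp -r $src $user@$host:$dest"
    done < "$slaves"
}
```

Once the printed commands look right, they can be executed by piping the output to sh, for example: print_copy_commands conf/slaves /home/hadoop/hadoop-1.0.4 | sh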

Format and Manage cluster :

i.     FORMAT : The setup is now complete; the next step is to format the cluster. Execute the following commands on the Master (Namenode):
       a.       cd /home/hadoop/hadoop-1.0.4/
       b.      ./bin/hadoop namenode -format

ii.     START CLUSTER : HDFS has a single point of start, i.e. one command on the master node starts the whole cluster (the namenode and all the datanodes). On the Master (Namenode) execute:
        ./bin/start-dfs.sh

iii.     STOP CLUSTER : On the Master (Namenode) execute:
       ./bin/stop-dfs.sh
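
Formatting the namenode wipes the HDFS metadata, so it should happen only once. A small guard like the one below can prevent running it a second time by accident. This is a sketch: both function names are hypothetical, and it rests on my assumption that a formatted Hadoop 1.x name directory contains a current/VERSION file.

```shell
#!/bin/sh
# needs_format NAME_DIR: succeeds when NAME_DIR holds no formatted namenode yet.
# Assumption: a formatted Hadoop 1.x name directory contains current/VERSION.
needs_format() {
    [ ! -f "$1/current/VERSION" ]
}

# format_once HADOOP_HOME NAME_DIR: format only if nothing is there yet.
format_once() {
    if needs_format "$2"; then
        "$1/bin/hadoop" namenode -format
    else
        echo "namenode at $2 is already formatted; skipping" >&2
    fi
}
```

With the paths from this post the call would be: format_once /home/hadoop/hadoop-1.0.4 /usr/local/hadoopstorage/namenode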

Setup Verification :

a.     Execute: cd /home/hadoop/hadoop-1.0.4/
b.     Execute: ./bin/hadoop fs -ls /
c.     Execute: ./bin/hadoop fs -mkdir /test
d.     Execute: ./bin/hadoop fs -chown -R hadoop:hadoop /test
e.     Execute: ./bin/hadoop fs -ls /
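
Besides the file system commands above, it can be useful to confirm that the Java daemons are actually running on each node. The jps tool that ships with the JDK lists running Java processes as "PID Name" lines. The sketch below reads such a listing on standard input and prints the daemons that are missing. missing_daemons is a hypothetical name, and which daemons to expect on which node (for example NameNode on the master, DataNode on the slaves) is an assumption to adjust to your own layout.

```shell
#!/bin/sh
# missing_daemons NAME...
# Reads "PID Name" lines (jps output) on stdin and prints every NAME not found.
missing_daemons() {
    listing=$(cat)
    for d in "$@"; do
        # awk exits 0 only if some line has NAME as its second field
        echo "$listing" | awk -v name="$d" '$2 == name { found = 1 } END { exit !found }' \
            || echo "$d"
    done
}
```

On the master this might be run as: jps | missing_daemons NameNode
and on a slave as: jps | missing_daemons DataNode
No output means every expected daemon was found.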

Common Problems : to be added later

 







