My requirement is to install and configure Hadoop on Oracle Linux. I followed the procedure below.
Prerequisites :
1. Java 1.6 or later must be installed on all machines (nodes).
2. ssh must be installed on all machines. Also check that the sshd service is running on all machines (nodes).
3. Clocks on all machines must be in sync with each other. (A quick check is sketched below.)
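A quick way to verify these prerequisites on each node (a minimal sketch; the exact service/NTP commands vary by Oracle Linux version):
# Java 1.6 or later must be available on the PATH (or via JAVA_HOME)
java -version
# Check that the sshd service is running (SysV-style; systemd-based releases use systemctl status sshd)
service sshd status
# Compare clocks across nodes; keeping them in sync via NTP is environment-specific
date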
Installation & Configuration :
1. CREATE USER : Create a common user on all Linux nodes. Here are the steps to do that :
I. Execute command : useradd hadoop
II. Execute command : passwd hadoop . It will prompt for a password; set the desired password and confirm it. OR use Administration --> Users and Groups.
2. CONFIGURE SSH FOR PASSWORDLESS ENTRY :
SSH b/w Cluster Nodes
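The linked page has the detailed steps; a minimal sketch of the usual approach, run as the hadoop user on the master (the slave IP below is from my environment):
# Generate a key pair (accept the defaults, empty passphrase)
ssh-keygen -t rsa
# Copy the public key to every node in the cluster (repeat per node)
ssh-copy-id hadoop@192.16.12.249
# Verify that login no longer prompts for a password
ssh hadoop@192.16.12.249 hostname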
3. DOWNLOAD :
Link To Download Software
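For reference, the tarball can also be fetched directly on the node; the mirror path below is an assumption and may differ from the link above:
cd /home/hadoop
wget http://archive.apache.org/dist/hadoop/core/hadoop-1.0.4/hadoop-1.0.4.tar.gz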
4. INSTALL : You will now see hadoop-1.0.4.tar.gz in your current directory. We need to unpack this file. [We will do it on one node only; later we will copy the directory to all other nodes. Those steps are mentioned later in the document.]
i. Use command – tar xzf hadoop-1.0.4.tar.gz -C /home/hadoop/ . Now a directory named “hadoop-1.0.4” will be created at /home/hadoop.
ii. Now we need to set the owner and group of this directory to “hadoop”. Execute command – chown -R hadoop:hadoop /home/hadoop/hadoop-1.0.4/
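To confirm that the unpack and ownership change worked, a quick check (the output should list hadoop:hadoop as owner and group):
ls -ld /home/hadoop/hadoop-1.0.4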
5. CREATE DIRECTORIES : [Do it on all nodes; an equivalent one-pass version is shown after these steps.]
i. Execute command sudo mkdir /var/log/hadoop/
ii. Execute command – sudo chown hadoop:hadoop /var/log/hadoop/
iii. Execute command – sudo mkdir /usr/local/hadoopstorage
iv. Execute command – sudo chown hadoop:hadoop /usr/local/hadoopstorage/
v. Execute command – chmod -R 755 /usr/local/hadoopstorage
vi. Execute command – cd /usr/local/hadoopstorage/
vii. Execute command – mkdir datanode
viii. Execute command – mkdir namenode
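The same setup can be done in one pass; a sketch equivalent to commands i-viii above (run as a user with sudo rights):
# Create the log and storage directories, including datanode/namenode subdirectories
sudo mkdir -p /var/log/hadoop /usr/local/hadoopstorage/datanode /usr/local/hadoopstorage/namenode
# Hand ownership to the hadoop user and open up permissions
sudo chown -R hadoop:hadoop /var/log/hadoop /usr/local/hadoopstorage
chmod -R 755 /usr/local/hadoopstorage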
6. CONFIGURATIONS : Now we need to set various configuration files according to our environment.
i. cd /home/hadoop/hadoop-1.0.4/conf/
ii. There will be one file named “masters”. Open that file and write the <IP> of the master server (Namenode). Execute command - vi masters , then write the IP and save the file. In my case it was 192.16.12.248
iii. Now we need to add the IPs of the slaves (Datanodes) to the “slaves” file. So execute – vi slaves , and write all the IPs of the slave machines. In my case it was –
192.16.12.249
iv. Now we need to make changes in the hadoop-env.sh script. So execute vi hadoop-env.sh and change a few lines as mentioned below (the edited lines are shown together after this list) :
a. We need to set JAVA_HOME variable. In my case it was like –
export JAVA_HOME=/usr/java/jdk1.7.0_03/
b. Set heap size as –
export HADOOP_HEAPSIZE=2000
c. Set log file path as –
export HADOOP_LOG_DIR=/var/log/hadoop
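Put together, the edited portion of conf/hadoop-env.sh looks like this (the JAVA_HOME path is from my machine; yours may differ):
# Java installation used by all Hadoop daemons
export JAVA_HOME=/usr/java/jdk1.7.0_03/
# Heap size in MB for the Hadoop daemons
export HADOOP_HEAPSIZE=2000
# Directory where daemon logs are written
export HADOOP_LOG_DIR=/var/log/hadoop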
v. Now we need to set up the core-site.xml file. Here we specify how the namenode will be accessed, i.e. we need to use the IP/hostname of the namenode. So add the following block into your core-site.xml file under the “configuration” tag :
<property>
<name>fs.default.name</name>
<value>hdfs://<IP_OF_NAMENODE>/</value>
<final>true</final>
</property>
My core-site.xml file looks like –
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://paxmsql102/</value>
<final>true</final>
</property>
</configuration>
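Note: if you use a hostname in fs.default.name (as I did above) rather than an IP, it must resolve to the master on every node. A minimal /etc/hosts sketch, using the master hostname and IP from this example:
192.16.12.248   paxmsql102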
vi. Now we need to set the hdfs-site.xml file as below :
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoopstorage/</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>dfs.name.dir</name>
<value>/usr/local/hadoopstorage/namenode</value>
<final>true</final>
</property>
<property>
<name>dfs.data.dir</name>
<value>/usr/local/hadoopstorage/datanode</value>
<final>true</final>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
<description>
Default block replication. The actual number of replications can be
specified when the file is created. The default is used if replication
is not specified in create time.
</description>
</property>
</configuration>
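Note: dfs.replication is left at 3 above; if the cluster has fewer datanodes than that (my example lists only one slave), HDFS will report blocks as under-replicated. For a small test cluster you might lower it, for example:
<property>
<name>dfs.replication</name>
<value>1</value>
</property>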
vii. Now we are done with all the configuration changes, so we need to copy these changes to the other nodes too. We can either follow all of the above-mentioned steps on every node, or we can just copy this hadoop-1.0.4 directory to all the nodes. For that we need to do –
a. scp -r /home/hadoop/hadoop-1.0.4 hadoop@<IP>:/home/hadoop/
b. Now go to the node where you just copied this and check that the “/home/hadoop/hadoop-1.0.4” directory has the same user and group as on the first node.
c. Repeat a and b for all the other nodes (a filled-in example is shown below).
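A filled-in example of steps a and b, using the slave IP from my environment:
# Copy the configured Hadoop directory to the slave
scp -r /home/hadoop/hadoop-1.0.4 hadoop@192.16.12.249:/home/hadoop/
# On the slave, confirm the user and group match the first node
ssh hadoop@192.16.12.249 "ls -ld /home/hadoop/hadoop-1.0.4"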
Format and Manage cluster :
i. FORMAT NAMENODE : We are now done with the setup; the next step is to format this cluster. For that we need to execute the following commands from the Master (Namenode) :
a. cd /home/hadoop/hadoop-1.0.4/
b. ./bin/hadoop namenode -format
ii. START CLUSTER : HDFS has a single start point, i.e. we can start the HDFS cluster with just one command on the master node. It will start the whole cluster (i.e. the namenode and all the datanodes). On the Master (Namenode) execute the following command :
./bin/start-dfs.sh
iii. STOP CLUSTER : On the Master (Namenode) execute the following command :
./bin/stop-dfs.sh
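After starting the cluster (step ii), you can check which daemons actually came up with the jps tool shipped with the JDK (the slave IP is from my environment):
# On the master, expect NameNode and SecondaryNameNode after start-dfs.sh
jps
# On each slave, expect DataNode
ssh hadoop@192.16.12.249 jps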
Setup Verification :
a. Execute – cd /home/hadoop/hadoop-1.0.4/
b. Execute – ./bin/hadoop fs -ls /
c. Execute – ./bin/hadoop fs -mkdir /test
d. Execute – ./bin/hadoop fs -chown -R hadoop:hadoop /test
e. Execute – ./bin/hadoop fs -ls /
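Optionally, a cluster-wide summary (number of live datanodes, configured and used capacity) can be pulled from the master as well:
./bin/hadoop dfsadmin -report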