Wednesday, January 25, 2012

Setting up a Pseudo-Distributed Cluster.

With the single-node setup working, I now want to get a 'cluster' up and running again, using the same selection of tutorials as before.


Note that the account was set up and given privileges in the last post, so that step is skipped here.

1. Create temp working directory

:~$ sudo mkdir -p /app/hadoop/tmp
:~$ sudo chmod -R 777 /app/hadoop

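Since hduser was set up as a sudo user earlier, the commands above work in my case. The directory path must match the hadoop.tmp.dir value set in core-site.xml below (/app/hadoop/tmp, an absolute path, not one under the home directory).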
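To make this step repeatable, here is a small sketch of my own (not from the tutorials). make_hadoop_tmp is a hypothetical shell function: it creates the directory with mkdir -p, falls back to sudo only when the current user cannot create or chmod it, and then checks that the directory really is writable. It applies the same permissive 777 mode as above, but only to the tmp directory itself.

```shell
# Hypothetical helper (not part of Hadoop): create the Hadoop temp directory
# and make sure the current user can write to it.
# The default path matches hadoop.tmp.dir in core-site.xml.
make_hadoop_tmp() {
  local dir="${1:-/app/hadoop/tmp}"
  # mkdir -p creates every missing parent in one go; retry with sudo when
  # the parent (e.g. /) is not writable by the current user.
  if ! mkdir -p "$dir" 2>/dev/null; then
    sudo mkdir -p "$dir" || return 1
  fi
  # Same permissive mode as the manual steps; retry with sudo if root owns it.
  if ! chmod 777 "$dir" 2>/dev/null; then
    sudo chmod 777 "$dir" || return 1
  fi
  # Final check: Hadoop will fail later if this directory is read-only.
  if [ -w "$dir" ]; then
    echo "ok: $dir is writable by $(id -un)"
  else
    echo "error: $dir is not writable by $(id -un)" >&2
    return 1
  fi
}
```

Usage, as hduser: :~$ make_hadoop_tmp. A tighter alternative to 777 would be to give the directory to hduser (e.g. sudo chown hduser /app/hadoop/tmp) and keep a 755 mode.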

2. Set up the config files.

Note: gedit only saves the files if sudo is typed before it, presumably because hduser does not have write permission on /usr/local/hadoop/conf.

2.1 core-site.xml
:~$ cd /usr/local/hadoop/conf
:~$ sudo gedit core-site.xml

The following properties need to be inserted between the <configuration> and </configuration> tags:

<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp</value>
</property>
<!-- hadoop.tmp.dir is the base directory for Hadoop's temporary files; it points at the directory created in step 1 -->

<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
</property>
<!-- default file system: HDFS, with the NameNode listening on localhost port 54310 -->
This is where I was getting an error before, so we shall see shortly whether it recurs. Hopefully creating a new virtual disk image and starting from scratch will prevent it, as it did with the word count example.
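Since gedit needs sudo here anyway, writing the file non-interactively is one alternative. This is my own sketch, not part of the tutorial. core_site_xml is a hypothetical shell function that prints a complete core-site.xml containing the two properties above. Piping it through sudo tee writes the file with root rights while the function itself runs as hduser.

```shell
# Hypothetical helper: print a complete core-site.xml with the two
# properties from step 2.1. Both arguments are optional overrides.
core_site_xml() {
  local tmp_dir="${1:-/app/hadoop/tmp}"        # must match the step 1 directory
  local fs_uri="${2:-hdfs://localhost:54310}"  # NameNode address
  cat <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>${tmp_dir}</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>${fs_uri}</value>
  </property>
</configuration>
EOF
}
```

This replaces the whole file rather than editing it, so back up the original first (from /usr/local/hadoop/conf):
:~$ sudo cp core-site.xml core-site.xml.bak
:~$ core_site_xml | sudo tee core-site.xml > /dev/null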

2.2 mapred-site.xml
:~$ sudo gedit mapred-site.xml

The following property needs to be inserted between the <configuration> and </configuration> tags:

<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
</property>
<!-- host and port of the MapReduce JobTracker: localhost, port 54311 -->
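The same non-interactive approach works for mapred-site.xml. mapred_site_xml is another hypothetical helper of my own. It prints the whole file with the JobTracker address above, and you can pass a different host:port as the first argument.

```shell
# Hypothetical helper: print a complete mapred-site.xml with the
# JobTracker address from step 2.2 (override via the first argument).
mapred_site_xml() {
  local tracker="${1:-localhost:54311}"
  cat <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>${tracker}</value>
  </property>
</configuration>
EOF
}
```

Usage, from /usr/local/hadoop/conf after backing up the original:
:~$ mapred_site_xml | sudo tee mapred-site.xml > /dev/null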

2.3 hdfs-site.xml
:~$ sudo gedit hdfs-site.xml

The following property needs to be inserted between the <configuration> and </configuration> tags:

<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<!-- default number of replicas kept for each HDFS block; 1, because a pseudo-distributed cluster has only one DataNode -->
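For completeness, the same sketch for hdfs-site.xml follows. hdfs_site_xml is again a hypothetical helper. The replication factor is a parameter because the Hadoop default is 3, and on a real multi-node cluster you would normally raise it from 1 again.

```shell
# Hypothetical helper: print a complete hdfs-site.xml with the
# replication factor from step 2.3 (override via the first argument).
hdfs_site_xml() {
  local replication="${1:-1}"   # one DataNode, so one copy of each block
  cat <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>${replication}</value>
  </property>
</configuration>
EOF
}
```

Usage, from /usr/local/hadoop/conf after backing up the original:
:~$ hdfs_site_xml | sudo tee hdfs-site.xml > /dev/null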

3. Format the NameNode
Prepare HDFS by formatting the NameNode before starting the cluster for the first time (formatting wipes anything already stored in HDFS):

:~$ hadoop namenode -format
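In Hadoop 1.x, which these property names belong to, dfs.name.dir defaults to ${hadoop.tmp.dir}/dfs/name, so the format command writes into the directory from step 1. check_tmp_dir is a hypothetical pre-flight check of my own that you can run before formatting. It reads hadoop.tmp.dir back out of core-site.xml and confirms that the directory exists and is writable. It is a grep/sed sketch, not an XML parser, and it assumes <name> and <value> sit on consecutive lines, as they do in 2.1.

```shell
# Hypothetical pre-flight check: read hadoop.tmp.dir from core-site.xml
# (assumes <name> and <value> are on consecutive lines, as in step 2.1)
# and confirm the directory exists and is writable.
check_tmp_dir() {
  local conf="${1:-/usr/local/hadoop/conf/core-site.xml}"
  local dir
  # grep -A1 prints the <name> line plus the line after it; sed keeps
  # only the text between <value> and </value>.
  dir="$(grep -A1 '<name>hadoop.tmp.dir</name>' "$conf" \
        | sed -n 's:.*<value>\(.*\)</value>.*:\1:p')"
  if [ -z "$dir" ]; then
    echo "error: hadoop.tmp.dir not found in $conf" >&2
    return 1
  fi
  if [ -d "$dir" ] && [ -w "$dir" ]; then
    echo "ok: $dir"
  else
    echo "error: $dir is missing or not writable" >&2
    return 1
  fi
}
```

Usage: :~$ check_tmp_dir && hadoop namenode -format (the format command only runs if the check passes).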

Running :~$ ssh localhost tests whether the SSH server is running. In my case I had set a passphrase, so it asked for it every single time it connected. Hadoop's start-up scripts connect to localhost over SSH, so I re-generated the key without a passphrase:

:~$ ssh-keygen -t dsa -P "" -f ~/.ssh/id_dsa
# generates a DSA key pair with an empty passphrase
:~$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
# appends the new public key to the authorized_keys file

Typing :~$ ssh localhost now logs in without asking for a password, printing the usual update readout and a last-login timestamp.
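Re-running the cat command appends a duplicate line every time the key is re-generated. Also, with its default StrictModes setting, sshd ignores an authorized_keys file that other users can write to. add_authorized_key is a hypothetical helper of my own. It appends the key only when it is missing and then tightens the permissions. The default keeps id_dsa.pub to match this post. On newer OpenSSH releases (7.0 onwards) DSA keys are disabled by default, so an RSA key (ssh-keygen -t rsa, id_rsa.pub) would be needed there instead.

```shell
# Hypothetical helper: add a public key to authorized_keys only once,
# then set the permissions sshd expects (700 on the dir, 600 on the file).
add_authorized_key() {
  local pub="${1:-$HOME/.ssh/id_dsa.pub}"
  local auth="${2:-$HOME/.ssh/authorized_keys}"
  [ -r "$pub" ] || { echo "error: cannot read $pub" >&2; return 1; }
  mkdir -p "$(dirname "$auth")" || return 1
  chmod 700 "$(dirname "$auth")"
  touch "$auth" || return 1
  # -x: whole-line match, -F: treat the key text as a fixed string
  if ! grep -qxF "$(cat "$pub")" "$auth"; then
    cat "$pub" >> "$auth"
  fi
  chmod 600 "$auth"
}
```

Usage: :~$ add_authorized_key, then :~$ ssh localhost as before.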
