Not able to deploy workers on Spark-1.2.0 - hadoop

I am new to Spark and am using spark-1.2.0 with hadoop 2.4.1. I have set up a master and four slave nodes, but two of the slave nodes are not starting.
I have defined the IP addresses of the nodes in the slaves file in the spark-1.2.0/conf/ directory.
But when I try to run ./sbin/start-all.sh, the error is as follows:
failed to launch org.apache.spark.deploy.worker.Worker
could not find or load main class org.apache.spark.deploy.worker.Worker
This is happening for two nodes; the other two are working fine.
I've also set up spark-env.sh on the master as well as the slaves, and the master has passwordless SSH connectivity to the slaves.
I've also tried running ./bin/spark-class org.apache.spark.deploy.worker.Worker spark://IP:PORT
It gives the same error as before. Can someone help me with this? Where am I making a mistake?

So I figured out the solution. For anyone starting out with Spark, please check all the jar files in the lib folder. In my case, the spark-assembly-1.2.0-hadoop2.4.0.jar file was missing on my slave.
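If you want to check this quickly across all slaves, here is a rough sketch; it assumes Spark is installed at /opt/spark-1.2.0 on every node and that conf/slaves lists the worker hosts, so adjust the paths to your layout:
# List the assembly jar on every host in conf/slaves; prints "missing" where it is absent
for host in $(cat /opt/spark-1.2.0/conf/slaves); do
  echo "== $host =="
  ssh "$host" 'ls /opt/spark-1.2.0/lib/spark-assembly-*.jar 2>/dev/null || echo "assembly jar missing"'
done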

I also encountered the same issue. If this is a local-mode cluster setup (everything on one machine), you can instead run:
./sbin/start-master.sh
./sbin/start-slave.sh spark://localhost:7077
Then run:
MASTER=spark://localhost:7077 ./bin/pyspark
I was able to execute my jobs from the shell this way.
Do remember to set up conf/slaves and conf/spark-env.sh as described here:
http://pulasthisupun.blogspot.com/2013/11/how-to-set-up-apache-spark-cluster-in.html
Also change localhost to your hostname.
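Roughly, those two files might look like this (the hostnames and values are made up; adapt them to your machines):
# conf/slaves -- one worker hostname per line
worker1
worker2

# conf/spark-env.sh -- a minimal example
export SPARK_MASTER_IP=master        # hostname the master binds to
export SPARK_WORKER_CORES=2          # cores each worker may use
export SPARK_WORKER_MEMORY=2g        # memory each worker may use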

Related

Spark - Add Worker from Local Machine (standalone spark cluster manager)?

When running Spark 1.4.0 on a single machine, I can add a worker by using the command "./bin/spark-class org.apache.spark.deploy.worker.Worker myhostname:7077". The official documentation points out another way: add "myhostname:7077" to the "conf/slaves" file and then execute "sbin/start-all.sh", which invokes the master and all workers listed in the conf/slaves file. However, the latter method doesn't work for me (it fails with a time-out error). Can anyone help me with this?
Here is my conf/slaves file (assume the master URL is myhostname:700):
myhostname:700
The conf/slaves file should just be a list of hostnames; you don't need to include the port that Spark runs on. (I think if you do, it will try to SSH on that port, which is probably where the timeout comes from.)
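So, with a hypothetical host called myhostname, the whole file would just be:
# conf/slaves -- hostnames only, no port
myhostname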

CDH4 : Add new node to existing cluster

I have successfully created a hadoop cluster with CDH4 on Ubuntu, with one master (master) and one slave (slave1). Now I want to add one more node. For this I just created slave2 as a clone and updated the hosts and SSH settings accordingly. Then I updated the conf/slaves file with all the datanode DNS names on all nodes and restarted everything. But it's not detecting the new datanode; it only shows the old one (slave1), not slave2. Can anyone please help me with this?
I have used cdh4-repository_1.0_all.deb
@user2009755, you need to create the masters and slaves files only on the master node. In the configuration files in $HADOOP_HOME/etc/hadoop, make the necessary changes to the URIs pointing to the master node. NOTE: Try formatting the namenode and deleting the temporary files (usually under /tmp/*); if you changed hadoop.tmp.dir in core-site.xml, clear that directory on all nodes instead. Then start all the daemons. That worked for me.
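As a rough sketch of that cleanup (assuming the default hadoop.tmp.dir under /tmp; if you changed it in core-site.xml, clear that directory instead, and only do this on a test cluster since it wipes HDFS):
# On every node: remove the old temporary/HDFS data
rm -rf /tmp/hadoop-*
# On the master only: reformat HDFS, then start the daemons again
hdfs namenode -format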
There are many possible reasons:
Have you changed the dfs.replication value to 3 in conf/hdfs-site.xml?
Check from the master with the command hduser@master:~$ ssh slave; it should open a shell on the slave. If it does not, run hduser@master:~$ ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@slave
To understand this fully, see this link:
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/
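For reference, the dfs.replication property mentioned above goes in conf/hdfs-site.xml and looks roughly like this:
<!-- conf/hdfs-site.xml: how many copies HDFS keeps of each block -->
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>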

hadoop 2 node cluster setup

Can anyone please tell me how to create a Hadoop 2-node cluster? I have 2 virtual machines, both with a Hadoop single-node setup installed properly. Please tell me how to form a 2-node cluster from them, and mention the proper network setup for that. Thanks in advance. I am using hadoop-1.0.3.
Hi, first sort out the network configuration of the VMs:
Make sure the virtual machines' network setting is host-only.
Add both hostnames and their IPs to the /etc/hosts file.
Add both hostnames to the $HADOOP_HOME/conf/slaves file.
Make sure the secondary namenode's hostname is in the $HADOOP_HOME/conf/masters file.
Then make sure the same configuration is on both machines (example files are sketched below), and run:
hadoop namenode -format
start-all.sh
Best of luck.
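For illustration, with made-up hostnames (master, slave) and made-up IPs, those files might look like this on both machines:
# /etc/hosts -- map both hostnames to their IPs
192.168.56.101   master
192.168.56.102   slave

# $HADOOP_HOME/conf/masters -- host that runs the secondary namenode
master

# $HADOOP_HOME/conf/slaves -- hosts that run DataNode/TaskTracker
master
slave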

add hadoop servers to same cluster(Master-Slave )

I have 2 Hadoop pseudo-distributed (standalone) servers, which I created for testing purposes.
Now I want to combine the two servers into one cluster with a master-slave configuration.
Is there any way to achieve this?
Thanks in advance.
Step 1
Decide which pseudo-distributed node you want to be the master. Once decided, add the master machine's name to the file
$hadoop_home/conf/masters
Step 2
If you want the other pseudo node, as well as the master machine, to act as DataNodes, then add those machine names to the file
$hadoop_home/conf/slaves
Step 3
Set up a passwordless SSH connection between the master and the slave, and edit the
/etc/hosts
file if necessary so that the machines can reach each other.
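A minimal sketch of that passwordless SSH setup, assuming a user named hduser on both machines (the user and host names are illustrative):
# On the master, as hduser: generate a key pair (skip if one already exists)
ssh-keygen -t rsa -P ""
# Copy the public key to the slave so the master can log in without a password
ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@slave
# Verify: this should open a shell on the slave without prompting for a password
ssh hduser@slave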
Step 4
Now prepare the multi-node Hadoop cluster to start.
Format the namenode
$hadoop_home/bin/hadoop namenode -format
Start the cluster
$hadoop_home/bin/start-all.sh
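Once start-all.sh finishes, a quick sanity check is to run jps on each machine and confirm the expected daemons are up:
# On the master, expect NameNode, SecondaryNameNode and JobTracker
# (plus DataNode and TaskTracker if the master is also listed in conf/slaves);
# on the slave, expect DataNode and TaskTracker.
jps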
Sure you can, just follow Running Hadoop on Ubuntu Linux (Multi-Node Cluster).

Multi Node Cluster Hadoop Setup

Pseudo-Distributed single node cluster implementation
I am using Windows 7 with Cygwin and installed hadoop-1.0.3 successfully. I can start the JobTracker, TaskTracker and NameNode services (localhost:50030, localhost:50060 and localhost:50070). I have completed the single-node implementation.
Now I want to implement a multi-node cluster. I don't understand how to divide the systems into master and slave through network IPs.
For your SSH problem, just follow the single-node cluster guide:
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
And yes, you need to specify the IPs of the master and slave nodes in the conf files;
for that you can refer to this URL:
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/
I hope this helps.
Create the number of VMs you want to add to your cluster, and make sure those VMs have the same Hadoop version.
Figure out the IP of each VM.
You will find files named masters and slaves in $HADOOP_HOME/conf. Add the IP of the VM you want to treat as the master to conf/masters, and do the same with conf/slaves for the slave nodes' IPs.
Make sure these nodes have a passwordless SSH connection between them.
Format your namenode and then run start-all.sh.
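A sketch of what those two files might contain when you list raw IPs (the addresses below are made up):
# $HADOOP_HOME/conf/masters
192.168.1.10

# $HADOOP_HOME/conf/slaves
192.168.1.11
192.168.1.12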
Thanks,
