I'm trying to get this question solved.
To get a Mesos slave, do I have to install Mesos and then start the mesos-slave setup?
I also have a problem with the Mesos master. I ran this command:
./bin/mesos-master.sh --ip=*** --work_dir=/var/lib/mesos
It did not keep running, so I stopped it. When I ran the same command again, I got the error shown below:
Failed to initialize, bind: Address already in use [98]
Which part did I do wrong?
You have to run mesos-master first, and then you can connect a mesos-slave running on a different node to the master. You can refer to the Mesos getting started guide. Only one slave can connect to the master on the same port, so if you get "bind: Address already in use", you can try running the slave on another port by passing the --port=VALUE parameter, replacing VALUE with a port number.
To start the mesos master on localhost:
./bin/mesos-master.sh --ip=127.0.0.1 --work_dir=/var/lib/mesos
To start a slave and connect it to the master:
./bin/mesos-slave.sh --master=127.0.0.1:5050
To start and connect another slave to the same master, you have to use another port, since the default port 5051 is already taken by the first connected slave. Use the --port=VALUE argument to start the slave on another port:
./bin/mesos-slave.sh --master=127.0.0.1:5050 --port=5053
You may get a permission denied error. If so, use sudo to access the given port:
sudo ./bin/mesos-slave.sh --master=127.0.0.1:5050 --port=5053
You can run one more slave, but you have to specify an IP and a different work_dir:
./mesos-slave.sh --master=<ipaddr>:<port> --ip=<ip of slave> --work_dir=<work_dir other than that of a running slave> --port=<another_port>
Edit your /etc/hosts and add more local IPs with the following entries:
127.0.0.2 slave2
127.0.0.3 slave3
Then you can replace --ip=<ip of slave> with --ip=slave2 or --ip=slave3.
You may have to replace <another_port> with a port like 5052 or 5053, or any other available port, if you already have a running slave occupying the default port.
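Putting it together, a minimal sketch of one master and two local slaves, each started in its own terminal (the work_dir paths here are just illustrative choices):
./bin/mesos-master.sh --ip=127.0.0.1 --work_dir=/var/lib/mesos
./bin/mesos-slave.sh --master=127.0.0.1:5050 --ip=slave2 --work_dir=/tmp/mesos-slave2
./bin/mesos-slave.sh --master=127.0.0.1:5050 --ip=slave3 --work_dir=/tmp/mesos-slave3 --port=5052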
Running only a mesos-slave on a host is simple: install the Mesos package and run only the mesos-slave process with the correct flags. It's not a problem if the master is also installed, but be careful to run only as many masters as your quorum requires.
Something is already running on the port where you are trying to start the mesos-master (which also serves its web interface).
Check what program is running on the Mesos default port, or use another port. More information about the command-line flags is available here: Mesos configuration
To see what's using port 5050 or 5051, use either of these commands:
sudo fuser -v 5050/tcp
sudo lsof -i | grep 5050
Both commands will give you the PID of the process holding the port. Either kill it or specify a new port for Mesos by starting it with the port option:
./bin/mesos-master.sh --ip=*** --work_dir=/var/lib/mesos --port=FREE_PORT
Where do you specify the ZooKeepers for the mesos master and slaves? The following flags are required to start mesos-master (see the link I gave you):
--advertise_ip, --advertise_port, --quorum, --work_dir, --zk
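For example, a single-node test master registered with ZooKeeper might be started like this (the ZooKeeper address, path, and quorum of 1 are placeholder values for a one-master setup):
./bin/mesos-master.sh --zk=zk://127.0.0.1:2181/mesos --quorum=1 --work_dir=/var/lib/mesos --ip=127.0.0.1
Slaves would then use --master=zk://127.0.0.1:2181/mesos instead of a raw host:port.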
What is your current full configuration for the mesos master? You can find the related files at /etc/mesos/, /etc/mesos-master/, /etc/mesos-slave/, /etc/default/mesos, /etc/default/mesos-master, and /etc/default/mesos-slave. If you paste the lines from them and the Mesos log here, we might be able to give you more help.
Also, please describe the cluster you would like to set up (number of hosts, masters, and slaves) and we can help with that too.
Execute the command below:
sudo netstat -peanut
Then check which processes are using ports 5050 and 5051.
Kill those processes using their PIDs.
Start the mesos master and slave again.
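For example (a sketch; replace PID with the process ID from the netstat output):
sudo netstat -peanut | grep -E ':(5050|5051) '
sudo kill PID
./bin/mesos-master.sh --ip=127.0.0.1 --work_dir=/var/lib/mesos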
This happened to me when I killed the mesos slave accidentally and then restarted it; it failed with the address-bind issue.
From what I know, I am able to set up a Mesos master, slave, ZooKeeper, and Marathon on a single node.
But once I execute the command to start the mesos-master, it occupies my shell, so I have no way to run other commands, including starting the mesos-slave. If I stop it to free the shell, the mesos-master is no longer running.
Don't execute the commands directly from your shell; you want to start all of those components (zookeeper, mesos-master, mesos-slave, and marathon) as services:
/etc/init.d/zookeeper start
start mesos-master
start mesos-slave
start marathon
I forget whether zookeeper creates the init script for you as part of the install; you may have to find that in the Hadoop docs.
As for the other three, they all use Upstart, and you can find the configuration files in /etc/init/.
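If you do want to experiment straight from a shell instead, a quick workaround (a sketch, not the recommended setup) is to background each process and capture its log, so the master keeps running while you start the slave:
nohup ./bin/mesos-master.sh --ip=127.0.0.1 --work_dir=/var/lib/mesos > master.log 2>&1 &
nohup ./bin/mesos-slave.sh --master=127.0.0.1:5050 > slave.log 2>&1 &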
I am new to Spark and am using spark-1.2.0 with Hadoop 2.4.1. I have set up a master and four slave nodes, but two of my nodes are not starting.
I have defined the IP addresses of the nodes in the slaves file in the spark-1.2.0/conf/ directory.
But when I try to run ./sbin/start-all.sh, the error is as follows:
failed to launch org.apache.spark.deploy.worker.Worker
could not find or load main class org.apache.spark.deploy.worker.Worker
This is happening on two nodes; the other two are working fine.
I've also set up spark-env.sh on the master as well as on the slaves. The master also has passwordless SSH connectivity to the slaves.
I've also tried running ./bin/spark-class org.apache.spark.deploy.worker.Worker spark://IP:PORT directly.
It gives the same error as before. Can someone help me with this? Where am I making a mistake?
So I figured out the solution. For all those who are starting out with Spark, please check all the jar files in the lib folder. The spark-assembly-1.2.0-hadoop2.4.0.jar file was missing on my slave.
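A quick way to compare the lib folder across nodes from the master (a sketch; the Spark path and hostnames are placeholders for your setup):
for host in slave1 slave2 slave3 slave4; do
  echo "== $host =="
  ssh "$host" 'ls /path/to/spark-1.2.0/lib/'
done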
I also encountered the same issue. If this is a local-mode cluster setup, then you can run this instead:
./sbin/start-master.sh
./sbin/start-slave.sh spark://localhost:7077
Then run:
MASTER=spark://localhost:7077 ./bin/pyspark
I was able to execute my jobs in the shell.
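To sanity-check the cluster from the command line, you can also run the bundled SparkPi example (a sketch; the run-example script in Spark 1.x honors the MASTER variable, but confirm for your version):
MASTER=spark://localhost:7077 ./bin/run-example SparkPi 10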
Do remember to set up conf/slaves and conf/spark-env.sh as described here:
http://pulasthisupun.blogspot.com/2013/11/how-to-set-up-apache-spark-cluster-in.html
Also change localhost to your hostname.
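A minimal conf/spark-env.sh for a small standalone cluster might look like this (a sketch; the values are placeholders to tune for your machines):
export SPARK_MASTER_IP=master-hostname
export SPARK_WORKER_CORES=2
export SPARK_WORKER_MEMORY=2g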
I am setting up a Hadoop YARN cluster, and I am using one machine as both a master and a slave. When I start YARN using the following command, it starts the NodeManager on the slaves but not on the master node.
sbin/yarn-daemons.sh start nodemanager
I have a master which is also a slave, and another two slaves within the cluster; the NodeManagers on those slaves are starting properly.
The error I get:
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.BindException: Problem binding to [0.0.0.0:8040] java.net.BindException: Address already in use; For more details see: http://wiki.apache.org/hadoop/BindException
Output of some of the commands:
cat /etc/services | grep 8040
ampify 8040/tcp # Ampify Messaging Protocol
ampify 8040/udp # Ampify Messaging Protocol
lsof -i tcp:8040
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
java 28021 df 195u IPv6 3580602 0t0 TCP server1.mydomain.com:ampify (LISTEN)
Under the default configuration that Hadoop ships, port 8040 is the port that the NodeManager uses for the localizer. This is basically a server endpoint responsible for bringing the files required to run a container onto the local node. (For example, this can be a MapReduce job's jar file or distributed cache files.)
Assuming that there is another server on the machine (here shown as Ampify) legitimately bound to port 8040, and you don't want to stop that service, it is possible to reconfigure the port used by the NodeManager for the localizer. Set the property yarn.nodemanager.localizer.address in your yarn-site.xml file. This is documented here:
http://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
Pulling that from the XML source in the Hadoop tree, here is the documentation for the property:
<property>
  <description>Address where the localizer IPC is.</description>
  <name>yarn.nodemanager.localizer.address</name>
  <value>${yarn.nodemanager.hostname}:8040</value>
</property>
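For example, to move the localizer off port 8040 (8041 here is just an illustrative free port), add an override to yarn-site.xml and restart the NodeManager:
<property>
  <name>yarn.nodemanager.localizer.address</name>
  <value>${yarn.nodemanager.hostname}:8041</value>
</property>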
The above error means you are trying to start a process on port 8040, which is already occupied by another instance.
To get rid of this error, you need to kill the process that is currently listening on port 8040. Your lsof output says the PID is 28021. Kill the process using the following command and start again:
kill -9 28021
master host: ip 192.168.10.10, user hadoopm
slave host: ip 192.168.10.11, user slaver1
slave host: ip 192.168.10.12, user slaver2
Hadoop version: 1.0
How do I configure this?
Because the namenode's start-all.sh script will SSH to the datanodes using the same username (hadoopm), but the datanodes (slaves) have no user named hadoopm, the startup will fail.
And I must use different usernames on the master and the slaves.
So how do I configure it?
You can just create the same user on each node; then all your slaves will share the same user.
On Debian, add the users with adduser.
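A minimal sketch, assuming you create a hadoopm account on each slave and push the master's SSH key (the IPs are the ones from the question):
# on each slave, as root
adduser hadoopm
# on the master, as hadoopm, enable passwordless login
ssh-copy-id hadoopm@192.168.10.11
ssh-copy-id hadoopm@192.168.10.12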
Pseudo-Distributed single node cluster implementation
I am using Windows 7 with Cygwin and have installed hadoop-1.0.3 successfully. I can still start the JobTracker, TaskTracker, and NameNode services on their ports (localhost:50030, localhost:50060, and localhost:50070). I have completed the single-node implementation.
Now I want to implement a multiple-node cluster. I don't understand how to divide the systems into master and slaves through their network IPs.
For your SSH problem, just follow this single-node cluster guide:
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
And yes, you need to specify the IPs of the master and slaves in the conf files.
For that you can refer to this URL:
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/
I hope this helps.
Create the number of VMs you want to add to your cluster. Make sure those VMs all have the same Hadoop version.
Figure out the IP of each VM.
You will find files named masters and slaves in $HADOOP_HOME/conf. Add the IP of the VM you want to treat as the master to conf/masters, and do the same with conf/slaves using the slave nodes' IPs (see the sketch after these steps).
Make sure these nodes have passwordless SSH connections set up.
Format your namenode and then run start-all.sh.
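For example (a sketch with placeholder IPs), the conf files and the final commands might look like:
# $HADOOP_HOME/conf/masters
192.168.10.10
# $HADOOP_HOME/conf/slaves
192.168.10.11
192.168.10.12
Then, on the master:
bin/hadoop namenode -format
bin/start-all.sh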
Thanks,