Adding new Spark workers on AWS EC2 - access error - amazon-ec2

I have an existing, operating Spark cluster that was launched with the spark-ec2 script. I'm trying to add a new slave by following these instructions:
Stop the cluster
On AWS console "launch more like this" on one of the slaves
Start the cluster
Although the new instance is added to the same security group and I can successfully SSH to it with the same private key, the spark-ec2 ... start call can't access this machine for some reason:
Running setup-slave on all cluster nodes to mount filesystems, etc...
[1] 00:59:59 [FAILURE] xxx.compute.amazonaws.com
Exited with error code 255 Stderr: Permission denied (publickey).
This is, obviously, followed by tons of other errors while trying to deploy Spark on this instance.
The reason is that the Spark master machine doesn't have rsync access to this new slave, even though port 22 is open...
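A quick way to confirm the symptom, assuming the cluster key still works from the workstation (hostnames and the key file name below are placeholders):
# from the workstation, SSH to the new slave with the EC2 key works:
ssh -i mykey.pem root@xxx.compute.amazonaws.com echo ok
# but the same connection attempted from the Spark master fails,
# because the master's own key is not in the slave's authorized_keys:
ssh -i mykey.pem root@master-host ssh xxx.compute.amazonaws.com echo ok
#   -> Permission denied (publickey).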

The issue was that the SSH key generated on the Spark master was not transferred to this new slave. The spark-ec2 script's start command omits this step. The solution is to use the launch command with the --resume option; then the SSH key is transferred to the new slave and everything goes smoothly.
Another solution is to add the master's public key (~/.ssh/id_rsa.pub) to the newly added slave's ~/.ssh/authorized_keys. (I got this advice on the Spark mailing list.)
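A minimal sketch of both fixes; the key pair name, key file, region, cluster name, and hostnames below are placeholders, and the exact spark-ec2 flags may differ between Spark versions:
# Option 1: re-run launch with --resume so spark-ec2 redistributes the cluster SSH key
./spark-ec2 -k mykeypair -i mykey.pem --region=us-east-1 launch my-cluster --resume
# Option 2: append the master's public key to the new slave's authorized_keys by hand,
# using the EC2 key pair that already works for both machines (spark-ec2 AMIs log in as root)
ssh -i mykey.pem root@master-host 'cat ~/.ssh/id_rsa.pub' | ssh -i mykey.pem root@new-slave-host 'cat >> ~/.ssh/authorized_keys'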

Related

Hadoop on Google Compute Engine

I am trying to set up a Hadoop cluster in Google Compute Engine through the "Launch click-to-deploy software" feature. I have created 1 master and 1 slave node and tried to start the cluster using the start-all.sh script from the master node, and I got the error "Permission denied (publickey)".
I have generated public and private keys on both the slave and master nodes.
Currently I am logged into the master with my own username. Is it mandatory to log into the master as the "hadoop" user? If so, what is the password for that user ID?
Please let me know how to overcome this problem.
The deployment creates a user hadoop that owns the Hadoop-specific SSH keys, which were generated dynamically at deployment time; since start-all.sh uses SSH under the hood, this means you must do the following:
sudo su hadoop
/home/hadoop/hadoop-install/bin/start-all.sh
Otherwise, your "normal" username doesn't have SSH keys properly set up so you won't be able to launch the Hadoop daemons, as you saw.
Another thing to note is that the deployment should have already started all the Hadoop daemons automatically, so you shouldn't need to manually run start-all.sh unless you're rebooting the daemons after some manual configuration updates. If the daemons weren't running after the deployment ran, you may have encountered some unexpected error during initialization.
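A quick way to check whether the daemons are already up, assuming the JDK's jps tool is available on the instance:
# list the Java processes owned by the hadoop user; NameNode, JobTracker,
# DataNode, TaskTracker, etc. should appear if the daemons are running
sudo -u hadoop jps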

Hadoop slave tasktracker error "retrying to connect"

I have a simple setup on EC2
One Master
One Slave
When I run jps, I see everything except the TaskTracker process. When I execute start-all.sh/stop-all.sh, I see the TaskTracker messages for starting, logging, stopping, etc. I am also able to run my Pig commands in MapReduce mode on the slave. But:
I cannot access http://ec2machineSlave.com:50060/tasktracker.jsp
jps does not show the TaskTracker
On the slave I get the error: Retrying to connect to server ec2machineMaster/ipaddress:54300. Already tried x time(s)
I can SSH from the master to the slave. I created an id_dsa key on the master, copied id_dsa.pub to the slave, and set up all the permissions.
Can anyone suggest what could be wrong or whether I need to try something else?
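For reference, the key setup described above is a minimal sketch along these lines, assuming default ~/.ssh paths and with "slavehost" standing in for the slave's hostname:
# on the master: generate a passphrase-less DSA key
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
# append the public key to the slave's authorized_keys
cat ~/.ssh/id_dsa.pub | ssh slavehost 'cat >> ~/.ssh/authorized_keys'
# tighten permissions on the slave so sshd will accept the file
ssh slavehost 'chmod 700 ~/.ssh && chmod 600 ~/.ssh/authorized_keys'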

Permission denied (publickey) on EC2 while starting Hadoop

My manager has provided me with an Amazon instance along with a ppk. I am able to log in and am trying to install Hadoop. I made the needed config changes: edited the masters and slaves files to use the EC2 instance name instead of localhost, added the needed properties to the mapred-site.xml/hdfs-site.xml/core-site.xml files, and formatted the namenode for HDFS.
Now, when I run the start-dfs.sh script, I get the following errors.
starting namenode, logging to /home/ubuntu/hadoop/libexec/../logs/hadoop-ubuntu-namenode-domU-12-31-39-07-60-A9.out
The authenticity of host 'XXX.amazonaws.com (some IP)' can't be established.
Are you sure you want to continue connecting (yes/no)? yes
XXX.amazonaws.com: Warning: Permanently added 'XXX.amazonaws.com,' (ECDSA) to the list of known hosts.
XXX.amazonaws.com: Permission denied (publickey).
XXX.amazonaws.com: Permission denied (publickey).
As of now, the master and slave nodes are the same machine.
XXX is the instance name and "some IP" is its IP address; I am masking them for security reasons.
I have absolutely no idea about using an EC2 instance, SSH, etc. I only need to run a simple MapReduce program on it.
Kindly suggest.
Hadoop uses SSH to transfer information from the master to the slaves. It looks like your nodes are trying to talk to each other via SSH but haven't been configured to do so. In order to communicate, the Hadoop master node needs passwordless SSH access to the slave nodes. Passwordless access is useful so that you don't have to enter your password for each of the slave nodes every time you run a job; that would be quite tedious. It looks like you'll have to set this up between the nodes before you can continue.
I would suggest you check this guide and find the section called "Configuring SSH". It lays out how to accomplish this.
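In the single-machine case described here (master and slave on the same instance), a minimal sketch of that setup, assuming the key does not already exist, looks roughly like this:
# generate a passphrase-less key for the user that starts the daemons
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
# authorize it for logins back into this same machine
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
# the start scripts connect to the name in the masters/slaves files,
# so accept that host key once (XXX.amazonaws.com is the masked name from above)
ssh XXX.amazonaws.com exit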

hadoop cluster clarification

I am a newbie to Hadoop, and I am trying to run a Hadoop jar on Amazon EC2. I started my Amazon EC2 instance through the console, uploaded my files to the DFS, and was then able to successfully run the job jar and generate output on the instance.
But I am still confused about one part: I am not sure whether the job ran on a single machine in Amazon EC2 or on a cluster. How do I find the number of worker nodes involved in my jar run?
In some reference links I see that we have to use the launch-cluster command, for example "bin/hadoop-ec2 launch-cluster test-cluster 2". What is the difference between starting the instance from the console and using a command like launch-cluster?
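One quick way to see how many worker nodes a cluster actually has, assuming a Hadoop 1.x-style installation with the hadoop command on the path:
# lists the active TaskTracker nodes known to the JobTracker;
# a single-instance setup started from the console will show at most one
hadoop job -list-active-trackers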

Hbase on EC2 using Whirr. How?

I'm trying to create a test cluster on EC2 with Hadoop and HBase using Whirr.
I am following the instructions from these sites:
http://whirr.apache.org/docs/0.7.0/whirr-in-5-minutes.html
http://www.bigfastblog.com/run-the-latest-whirr-and-deploy-hbase-in-minutes
http://dal-cloudcomputing.blogspot.com/2011/06/how-to-set-up-hadoop-and-hbase-together.html
Steps that I completed without problems (or so I think...):
Generate ssh keypair with ssh-keygen -t rsa
Modify the hbase recipe with keys, AMIs, zones, and the number of nodes (1 master, 1 datanode); see the sketch after this list
Launch cluster
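As an illustration, the relevant lines of such a recipe typically look roughly like this; the AMI, hardware, and location values are placeholders, and the property names follow the Whirr hbase recipe:
whirr.cluster-name=hbase
whirr.instance-templates=1 zookeeper+hadoop-namenode+hadoop-jobtracker+hbase-master,1 hadoop-datanode+hadoop-tasktracker+hbase-regionserver
whirr.provider=aws-ec2
whirr.identity=${env:AWS_ACCESS_KEY_ID}
whirr.credential=${env:AWS_SECRET_ACCESS_KEY}
whirr.private-key-file=${sys:user.home}/.ssh/id_rsa
whirr.public-key-file=${whirr.private-key-file}.pub
whirr.image-id=us-east-1/ami-xxxxxxxx
whirr.hardware-id=m1.large
whirr.location-id=us-east-1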
At this point I see 2 new EC2 instances in my AWS panel and I can connect through SSH to both, but I only see this in the home directory:
configure-zookeeper_hadoop-namenode_hadoop-jobstracker_hbase-master
and
configure_hadoop-datanode_hadoop-tasktracker_hbase-regionserver
I have tried executing (in another terminal):
sh ~/.whirr/hbase/hbase-proxy.sh
But at this point, I can't follow any of the guides (to run hadoop, hbase, or anything else).
