I'm trying to create a test cluster on EC2 with Hadoop and HBase using Whirr.
I'm following the instructions from these sites:
http://whirr.apache.org/docs/0.7.0/whirr-in-5-minutes.html
http://www.bigfastblog.com/run-the-latest-whirr-and-deploy-hbase-in-minutes
http://dal-cloudcomputing.blogspot.com/2011/06/how-to-set-up-hadoop-and-hbase-together.html
Steps I completed without problems (or so I think); a rough sketch of what these steps looked like follows the list:
Generate an SSH key pair with ssh-keygen -t rsa
Modify the HBase recipe with my keys, AMIs, zones, and number of nodes (1 master, 1 datanode)
Launch the cluster
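For reference, my recipe edits and launch command were roughly as follows. The IDs, key paths, and cluster name below are placeholders, and the property names are the ones from the standard Whirr 0.7 hbase-ec2 recipe, so your actual file may differ slightly:
# recipes/hbase-ec2.properties (placeholder values)
whirr.cluster-name=testhbase
whirr.instance-templates=1 zookeeper+hadoop-namenode+hadoop-jobtracker+hbase-master,1 hadoop-datanode+hadoop-tasktracker+hbase-regionserver
whirr.provider=aws-ec2
whirr.identity=${env:AWS_ACCESS_KEY_ID}
whirr.credential=${env:AWS_SECRET_ACCESS_KEY}
whirr.private-key-file=${sys:user.home}/.ssh/id_rsa
whirr.public-key-file=${whirr.private-key-file}.pub
whirr.hardware-id=m1.large
whirr.image-id=us-east-1/ami-xxxxxxxx
whirr.location-id=us-east-1
# launch the cluster from the Whirr install directory
bin/whirr launch-cluster --config recipes/hbase-ec2.properties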
At this point, I see 2 new EC2 instances in my AWS panel and I can connect through SSH to both, but I only see this in the home directory:
configure-zookeeper_hadoop-namenode_hadoop-jobstracker_hbase-master
and
configure_hadoop-datanode_hadoop-tasktracker_hbase-regionserver
I have tried executing (in another terminal):
sh ~/.whirr/hbase/hbase-proxy.sh
But at this point, I can't follow any of the guides any further (running hadoop, hbase, or anything else).
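For context, what the guides seem to expect next on the local machine is something like the following (assuming the cluster is named hbase, as the proxy path above suggests, and that a compatible Hadoop client is installed locally):
# keep the proxy script from ~/.whirr/hbase running in one terminal, then in another:
export HADOOP_CONF_DIR=~/.whirr/hbase
hadoop fs -ls /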
I downloaded the scripts from https://github.com/mapr/gce to run the MapR script that creates a MapR Hadoop cluster on GCP.
I have already authenticated my Google account with GCP; gcloud auth list is OK.
Then I ran the MapR script:
./launch-admin-training-cluster.sh --project stone-cathode-10xxxx --cluster MaprBank10 --config-file 4node_yarn.lst --image centos-6 --machine-type n1-standard-2 --persistent-disks 1x256
These are the messages from the Cygwin command line:
CHECK: -----
project-id stone-cathode-10xxxx
cluster MaprBank10
config-file 4node_yarn.lst
image centos-6 machine n1-standard-2
zone us-central1-b
OPTIONAL: -----
node-name none
persistent-disks 1x256
----- Proceed {y/N} ? y Launch node1
Creating persistent data volumes first (1x256) seq: not found
Launch node2
Creating persistent data volumes first (1x256) seq: not found
Launch node3
Creating persistent data volumes first (1x256) seq: not found
Launch node4
Creating persistent data volumes first (1x256) seq: not found
NAME ZONE MACHINE_TYPE PREEMPTIBLE INTERNAL_IP EXTERNAL_IP STATUS
How can I investigate and solve this issue? Thank you very much.
The ADM-201 course has particular requirements that you must follow.
The config file that you have chosen, 4node_yarn.lst, expects 3 x 50 persistent disks for each node in your 4-node MapR cluster. Since you specify only one disk (1 x 256) in your command, the script cannot meet its requirements.
Also carefully follow the "Set Up a Virtual Cluster" guide provided by MapR. (The original answer included a screenshot from that guide.)
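Assuming 4node_yarn.lst really does expect three 50 GB data disks per node as described above, the invocation would look more like this (project and cluster names copied from the question):
./launch-admin-training-cluster.sh --project stone-cathode-10xxxx --cluster MaprBank10 --config-file 4node_yarn.lst --image centos-6 --machine-type n1-standard-2 --persistent-disks 3x50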
I have an existing operating Spark cluster that was launched with the spark-ec2 script. I'm trying to add a new slave by following these instructions:
Stop the cluster
On the AWS console, "launch more like this" on one of the slaves
Start the cluster
Although the new instance is added to the same security group and I can successfully SSH to it with the same private key, the spark-ec2 ... start call can't access this machine for some reason:
Running setup-slave on all cluster nodes to mount filesystems, etc...
[1] 00:59:59 [FAILURE] xxx.compute.amazonaws.com
Exited with error code 255 Stderr: Permission denied (publickey).
This is, obviously, followed by tons of other errors while trying to deploy Spark on this instance.
The reason is that the Spark master machine doesn't have rsync access to this new slave, even though port 22 is open...
The issue was that the SSH key generated on the Spark master was not transferred to this new slave. The spark-ec2 script's start command omits this step. The solution is to use the launch command with the --resume option. Then the SSH key is transferred to the new slave and everything goes smoothly.
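A sketch of the launch-with-resume invocation; the key pair name, identity file, and cluster name here are placeholders:
./spark-ec2 -k my-keypair -i ~/.ssh/my-keypair.pem --resume launch my-spark-cluster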
Yet another solution is to add the master's public key (~/.ssh/id_rsa.pub) to the newly added slave's ~/.ssh/authorized_keys. (I got this advice on the Spark mailing list.)
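A sketch of that manual alternative, run from the Spark master; the slave hostname, user, and key path are placeholders:
cat ~/.ssh/id_rsa.pub | ssh -i ~/.ssh/my-keypair.pem root@new-slave.compute.amazonaws.com 'cat >> ~/.ssh/authorized_keys'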
How do I create an Amazon EMR cluster from the command line on Ubuntu? I have the private key, access key, and the .pem file. Can anyone guide me on how to run the word count example from the command line?
You can use the AWS Command Line Interface (CLI) for this: http://docs.aws.amazon.com/cli/latest/userguide/installing.html
Once it is installed, you have to configure the tool using the 'aws configure' command and enter your access key ID and secret access key.
http://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-started.html
You will also need to enter the region where your EMR cluster (and other resources) will be launched.
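The configuration step is interactive and looks roughly like this (the values shown are placeholders):
$ aws configure
AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE
AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Default region name [None]: us-east-1
Default output format [None]: json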
To create a cluster, the 'create-cluster' command needs to be used:
http://docs.aws.amazon.com/cli/latest/reference/emr/create-cluster.html
You don't need the .pem file for these steps.
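A minimal sketch of a create-cluster call; the cluster name, key pair, instance type, and release label below are placeholders, so check the reference above for the options that match your account and region:
aws emr create-cluster --name "WordCountCluster" --release-label emr-4.2.0 --applications Name=Hadoop --instance-type m3.xlarge --instance-count 3 --ec2-attributes KeyName=my-key-pair --use-default-roles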
Once the cluster is launched, you can run the word count demo as a 'step'.
Starting a cluster and running a Hadoop job (a script in this case):
http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-hadoop-script.html
Some examples of add-steps for an already running cluster are in this section:
http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/UsingEMR_s3distcp.html
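A sketch of adding the classic streaming word count as a step on a running cluster; the cluster ID and output bucket are placeholders, and the sample S3 paths are the ones AWS has historically used for this demo:
aws emr add-steps --cluster-id j-XXXXXXXXXXXXX --steps Type=STREAMING,Name="WordCount",ActionOnFailure=CONTINUE,Args=[-files,s3://elasticmapreduce/samples/wordcount/wordSplitter.py,-mapper,wordSplitter.py,-reducer,aggregate,-input,s3://elasticmapreduce/samples/wordcount/input,-output,s3://my-bucket/wordcount/output]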
I am trying to set up a Hadoop cluster in Google Compute Engine through the "Launch click-to-deploy software" feature. I have created 1 master and 1 slave node and tried to start the cluster using the start-all.sh script from the master node, and I got the error "Permission denied (publickey)".
I have generated public and private keys on both the master and slave nodes.
Currently I am logged into the master with my own username. Is it mandatory to log into the master as the "hadoop" user? If so, what is the password for that user ID?
Please let me know how to overcome this problem.
The deployment creates a hadoop user which owns Hadoop-specific SSH keys that were generated dynamically at deployment time. Since start-all.sh uses SSH under the hood, this means you must do the following:
sudo su hadoop
/home/hadoop/hadoop-install/bin/start-all.sh
Otherwise, your "normal" username doesn't have SSH keys properly set up so you won't be able to launch the Hadoop daemons, as you saw.
Another thing to note is that the deployment should have already started all the Hadoop daemons automatically, so you shouldn't need to manually run start-all.sh unless you're rebooting the daemons after some manual configuration updates. If the daemons weren't running after the deployment ran, you may have encountered some unexpected error during initialization.
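A quick way to check whether the daemons are already up, assuming jps is on the image's PATH, is to list the Java processes as the hadoop user:
sudo su hadoop -c 'jps'
# look for the NameNode/DataNode and MapReduce daemons in the output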
I have CDH running in a cluster and I have SSH access to the machines. I need to connect my Mac to the cluster, so that if I do hadoop fs -ls, it shows me the contents of the cluster.
I have configured HADOOP_CONF to point to the configuration of the cluster. I am running CDH4 in my cluster. Am I missing something here? Is it possible to connect?
Is there some SSH key setup that I need to do?
There are a few things you will need to ensure to do this (a minimal sketch follows the list):
You need to set your HADOOP_CONF_DIR environment variable to point to a directory that carries config XMLs that point to your cluster.
Your Mac should be able to directly access the hosts that form your cluster (all of them). This can be done via a VPN, for example, if the cluster is secured from external networks.
Your Mac should carry the same version of Hadoop that the cluster runs.
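A minimal sketch of the client side on the Mac, assuming the cluster's core-site.xml, hdfs-site.xml, and mapred-site.xml have been copied into a local directory:
export HADOOP_CONF_DIR=$HOME/cluster-conf   # directory holding the cluster's *-site.xml files
hadoop fs -ls /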