Running the MapR script (Google Cloud SDK) to create a MapR Hadoop cluster on GCP does not work

I downloaded the scripts from https://github.com/mapr/gce to run the MapR script that creates a MapR Hadoop cluster on GCP.
My Google account is already authenticated with GCP; gcloud auth list is OK.
I run the MapR script:
./launch-admin-training-cluster.sh --project stone-cathode-10xxxx --cluster MaprBank10 --config-file 4node_yarn.lst --image centos-6 --machine-type n1-standard-2 --persistent-disks 1x256
These are the messages from the Cygwin command line.
CHECK: -----
project-id stone-cathode-10xxxx
cluster MaprBank10
config-file 4node_yarn.lst
image centos-6 machine n1-standard-2
zone us-central1-b
OPTIONAL: -----
node-name none
persistent-disks 1x256
----- Proceed {y/N} ? y
Launch node1
Creating persistent data volumes first (1x256) seq: not found
Launch node2
Creating persistent data volumes first (1x256) seq: not found
Launch node3
Creating persistent data volumes first (1x256) seq: not found
Launch node4
Creating persistent data volumes first (1x256) seq: not found
NAME ZONE MACHINE_TYPE PREEMPTIBLE INTERNAL_IP EXTERNAL_IP STATUS
How can I investigate and solve this issue? Thank you very much.

The ADM-201 course has particular requirements that you must follow.
The config file you have chosen, 4node_yarn.lst, expects 3 x 50 persistent disks for each node in your 4-node MapR cluster. Since you specify only one disk (1x256) in your command, the script cannot meet those requirements (a corrected command is sketched below).
Also carefully follow the "Set Up a Virtual Cluster" guide provided by MapR (the original answer included a screenshot from that guide).
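For example, re-running the launch with the disk layout the config file expects would look something like the following; the 3x50 value comes from the answer above, and the remaining flags are unchanged from the original command:
./launch-admin-training-cluster.sh --project stone-cathode-10xxxx --cluster MaprBank10 --config-file 4node_yarn.lst --image centos-6 --machine-type n1-standard-2 --persistent-disks 3x50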

Related

How to change the container log location in a Dataproc cluster?

What is the correct way to change the container log location in a Dataproc cluster during cluster creation?
The default path is /var/log/hadoop-yarn/userlogs and I want to change it to a local SSD mount such as /mnt/1/hadoop/yarn/userlogs. I tried adding
--properties=yarn:yarn.nodemanager.log-dirs
to the gcloud dataproc clusters create command but got the error -
bash: --properties=yarn:yarn.nodemanager.log-dirs=/mnt/1/hadoop/yarn: No such file or directory
This is most likely because the local SSD gets mounted after the cluster is created. Can someone please help?
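For reference, a hedged sketch of how the flag is usually passed on the create command; the cluster name here is a placeholder, and the quotes and line continuation keep the shell from treating the flag as a separate command. Whether the property can point at a local SSD that is only mounted after cluster creation is a separate question.
gcloud dataproc clusters create my-cluster \
    --properties='yarn:yarn.nodemanager.log-dirs=/mnt/1/hadoop/yarn/userlogs'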

Hadoop client node installation

I have a 12-node cluster. Its hardware information is:
NameNode : CPU Core i3 2.7 Ghz | 8GB RAM | 500 GB HDD
DataNode : CPU Core i3 2.7 Ghz | 2GB RAM | 500 GB HDD
I have installed Hadoop 2.7.2 using the normal Hadoop installation process on Ubuntu, and it works fine. But I want to add a client machine, and I have no clue how to do that.
Questions:
What is the installation process for the client machine?
How do I run a Pig/Hive script on that client machine?
The client should have the same copy of the Hadoop distribution and configuration that is present on the NameNode. Only then will the client know on which node the JobTracker/ResourceManager is running, and the IP of the NameNode for accessing HDFS data.
You also need to update /etc/hosts on the client machine with the IP addresses and hostnames of the NameNode and DataNodes.
Note that you shouldn't start any Hadoop services on the client machine.
Steps to follow on client machine:
Create a user account on the cluster, say user1.
Create an account on the client machine with the same name: user1.
Configure the client machine to access the cluster machines (SSH without passphrase, i.e. passwordless login).
Copy/get the same Hadoop distribution as the cluster onto the client machine and extract it to /home/user1/hadoop-2.x.x.
Copy (or edit) the Hadoop configuration files (*-site.xml) from the NameNode of the cluster; from these the client will know where the NameNode/ResourceManager is running.
Set environment variables: JAVA_HOME, HADOOP_HOME (/home/user1/hadoop-2.x.x)
Set hadoop bin to your path: export PATH=$HADOOP_HOME/bin:$PATH
Test it out: hadoop fs -ls / should list the root directory of the cluster HDFS.
You may face some issues such as privileges, or may need to set JAVA_HOME in places like conf/hadoop-env.sh on the client machine; post an update/comment with any error you get. A consolidated sketch of the environment setup follows these steps.
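A minimal sketch of the environment setup from the steps above, assuming the hadoop-2.x.x extraction location used there; the JDK path is an assumption and should be adjusted for your system:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64   # assumption: point this at your actual JDK
export HADOOP_HOME=/home/user1/hadoop-2.x.x
export PATH=$HADOOP_HOME/bin:$PATH
hadoop fs -ls /   # should list the root of the cluster HDFS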
Answers to more questions from comments:
How to load data from the client node to HDFS? Just run hadoop fs commands from the client machine: hadoop fs -put /home/user1/data/* /user/user1/data. You can also write shell scripts that run these commands if you need to run them many times (a tiny example follows).
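A minimal wrapper script for repeated uploads, using the same example paths as above:
#!/bin/sh
# Upload everything under the local data directory to the user's HDFS directory.
hadoop fs -put /home/user1/data/* /user/user1/data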
Why am I installing Hadoop on the client if we only use SSH to connect remotely to the master node?
Because the client needs to communicate with the cluster and needs to know where the cluster nodes are. The client will be running Hadoop jobs: hadoop fs commands, Hive queries, hadoop jar commands, Spark jobs, MapReduce development, and so on, and for these the client needs the Hadoop binaries on the client node. Basically you are not just using SSH to connect; you are performing operations on the Hadoop cluster from the client node, so you need the Hadoop binaries there. SSH is used by the Hadoop binaries on the client node when you run operations such as hadoop fs -ls / from the client node against the cluster (remember adding $HADOOP_HOME/bin to PATH as part of the installation process above).
When you say "we only use ssh", that sounds like you connect to the cluster nodes over SSH when you want to change or access Hadoop configuration files on the cluster; you do that as part of administrative work. But when you need to run Hadoop commands/jobs against the cluster from the client node, you don't need to SSH manually; the Hadoop installation on the client node takes care of it. Without a Hadoop installation, how could you run Hadoop commands/jobs/queries from the client node against the cluster?
Should the user name 'user1' be the same? What if it is different? It will still work. You can install Hadoop on the client node under a group user, say qa or dev, and give all users on the client node sudo access under that group. Then, when user1 on the client node needs to run a Hadoop job on the cluster, user1 should be able to sudo -i -u qa and run the Hadoop command from there.
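For instance, with qa as the shared account described above:
sudo -i -u qa     # switch to the shared Hadoop account on the client node
hadoop fs -ls /   # run Hadoop commands as that account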

H2O: unable to connect to h2o cluster through python

I have a 5-node Hadoop cluster running HDP 2.3.0. I set up an H2O cluster on YARN as described here.
On running the following command
hadoop jar h2odriver_hdp2.2.jar water.hadoop.h2odriver -libjars ../h2o.jar -mapperXmx 512m -nodes 3 -output /user/hdfs/H2OTestClusterOutput
I get the following output
H2O cluster (3 nodes) is up
(Note: Use the -disown option to exit the driver after cluster formation)
(Press Ctrl-C to kill the cluster)
Blocking until the H2O cluster shuts down...
When I try to execute the command
h2o.init(ip="10.113.57.98", port=54321)
The process remains stuck at this stage. On trying to connect to the web UI using ip:54321, the browser endlessly loads the H2O admin page but nothing ever displays.
On forcefully terminating the init process I get the following error
No instance found at ip and port: 10.113.57.98:54321. Trying to start local jar...
However, if I use H2O with Python without setting up an H2O cluster, everything runs fine.
I executed all commands as the root user. The root user has permission to read and write the /user/hdfs HDFS directory.
I'm not sure whether this is a permissions error or whether the port is not accessible.
Any help would be greatly appreciated.
It looks like you are using H2O2 (H2O Classic). I recommend upgrading your H2O to the latest (H2O 3). There is a build specifically for HDP2.3 here: http://www.h2o.ai/download/h2o/hadoop
Running H2O3 is a little cleaner too:
hadoop jar h2odriver.jar -nodes 1 -mapperXmx 6g -output hdfsOutputDirName
Also, 512 MB per node is tiny; what is your use case? I would give the nodes more memory.
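For the 3-node setup in the question, a sketch with a roomier heap might look like the following; the memory value and output directory are only illustrative:
hadoop jar h2odriver.jar -nodes 3 -mapperXmx 4g -output /user/hdfs/H2OTestClusterOutput2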

Add Hadoop servers to the same cluster (master-slave)

I have 2 Hadoop pseudo-distributed (standalone) servers, which I created for testing purposes.
Now I want to combine the two servers into one cluster and make this a master-slave configuration.
Is there any way to achieve this?
Thanks in advance.
Step 1
Determine which pseudo node you want to be the master. Once decided, add the master machine name to the file
$hadoop_home/conf/master
Step 2
If you want the other pseudo node as well as the master machine to act as DataNodes, then add these machine names to the file
$hadoop_home/conf/slaves
Step 3
Set up passwordless SSH between the master and slave. Also edit the
/etc/hosts
file if necessary, i.e. if they cannot reach each other by hostname (an example entry is shown after Step 4).
Step 4
Now prepare the Hadoop multi-node cluster to start.
Format the NameNode:
$hadoop_home/bin/hadoop namenode -format
Start the cluster:
$hadoop_home/bin/start-all.sh
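For Step 3, the /etc/hosts file on both machines might contain entries like the following; the IP addresses and hostnames are placeholders for your two pseudo nodes:
192.168.1.10   hadoop-master
192.168.1.11   hadoop-slave1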
Sure you can; just follow Running Hadoop on Ubuntu Linux (Multi-Node Cluster).

How to connect a Mac to a Hadoop/HDFS cluster

I have CDH running in a cluster and I have SSH access to the machines. I need to connect my Mac to the cluster, so that if I do hadoop fs -ls, it shows me the contents of the cluster.
I have configured HADOOP_CONF to point to the configuration of the cluster. I am running CDH4 in my cluster. Am I missing something here? Is it possible to connect?
Is there some SSH key setup that I need to do?
There are a few things you will need to ensure to do this:
You need to set your HADOOP_CONF_DIR environment variable to point to a directory that carries config XMLs that point to your cluster (a short sketch follows this list).
Your Mac should be able to directly access the hosts that form your cluster (all of them). This can be done via VPN, for example - if the cluster is secured from external networks.
Your Mac should carry the same version of Hadoop that the cluster runs.
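A minimal sketch of the first point on the Mac side; the directory path is just an example of where the cluster's config XMLs might be copied:
export HADOOP_CONF_DIR=$HOME/cdh4-conf   # holds core-site.xml, hdfs-site.xml, etc. copied from the cluster
hadoop fs -ls /                          # should now list the cluster's HDFS root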
