I am running a 16 GB MacBook Pro with OS X El Capitan. I installed the Cloudera Docker image using
docker pull cloudera/quickstart:latest
docker run --privileged=true --hostname=quickstart.cloudera -t -i 9f3ab06c7554 /usr/bin/docker-quickstart
The image boots fine, and I can see most services starting up:
Started Hadoop historyserver: [ OK ]
starting nodemanager, logging to /var/log/hadoop-yarn/yarn-yarn-nodemanager-quickstart.cloudera.out
Started Hadoop nodemanager: [ OK ]
starting resourcemanager, logging to /var/log/hadoop-yarn/yarn-yarn-resourcemanager-quickstart.cloudera.out
Started Hadoop resourcemanager: [ OK ]
starting master, logging to /var/log/hbase/hbase-hbase-master-quickstart.cloudera.out
Started HBase master daemon (hbase-master): [ OK ]
starting rest, logging to /var/log/hbase/hbase-hbase-rest-quickstart.cloudera.out
Started HBase rest daemon (hbase-rest): [ OK ]
starting thrift, logging to /var/log/hbase/hbase-hbase-thrift-quickstart.cloudera.out
Started HBase thrift daemon (hbase-thrift): [ OK ]
Starting Hive Metastore (hive-metastore): [ OK ]
Started Hive Server2 (hive-server2): [ OK ]
Starting Sqoop Server: [ OK ]
Sqoop home directory: /usr/lib/sqoop2
There are some failures as well:
Failure to start Spark history-server (spark-history-server): [FAILED], return value: 1
Starting Hadoop HBase regionserver daemon: starting regionserver, logging to /var/log/hbase/hbase-hbase-regionserver-quickstart.cloudera.out
hbase-regionserver.
Starting hue: [FAILED]
But once boot-up is complete, anything I try to run fails.
For example, trying to run spark-shell:
[root@quickstart /]# spark-shell
Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000000b0000000, 357892096, 0) failed; error='Cannot allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 357892096 bytes for committing reserved memory.
# An error report file with more information is saved as:
# //hs_err_pid3113.log
Or trying to run the Hive shell:
[root@quickstart /]# hive
Unable to determine Hadoop version information.
'hadoop version' returned:
Hadoop 2.6.0-cdh5.5.0
Subversion http://github.com/cloudera/hadoop -r fd21232cef7b8c1f536965897ce20f50b83ee7b2
Compiled by jenkins on 2015-11-09T20:37Z
Compiled with protoc 2.5.0
From source with checksum 98e07176d1787150a6a9c087627562c
This command was run using /usr/jars/hadoop-common-2.6.0-cdh5.5.0.jar
[root@quickstart /]#
My question is: what can I do so that I can run spark-shell and the Hive shell successfully?
Since you are running Docker on a Mac, Docker actually runs inside a VirtualBox VM (via docker-machine), so containers do not get direct access to the Mac's memory. (The same thing would happen on Windows.)
You probably wouldn't get these errors on a Linux host, since Docker isn't virtualized there.
The Cloudera QuickStart VM recommends 8 GB of memory to run all the services, and the docker-machine VM only has 512 MB, I think.
The solution would be to stop the docker-machine instance, open VirtualBox, and increase the memory size of the "default" VM to the necessary amount.
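As a sketch, the same change can be made from the command line with the VirtualBox CLI (assuming your machine is named "default" and 8 GB is the target; adjust both as needed):
docker-machine stop default
# give the VM 8 GB of memory
VBoxManage modifyvm default --memory 8192
docker-machine start default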
I have a 5-node Hadoop cluster running HDP 2.3.0. I set up an H2O cluster on YARN as described here.
On running the following command
hadoop jar h2odriver_hdp2.2.jar water.hadoop.h2odriver -libjars ../h2o.jar -mapperXmx 512m -nodes 3 -output /user/hdfs/H2OTestClusterOutput
I get the following output:
H2O cluster (3 nodes) is up
(Note: Use the -disown option to exit the driver after cluster formation)
(Press Ctrl-C to kill the cluster)
Blocking until the H2O cluster shuts down...
When I try to execute the command
h2o.init(ip="10.113.57.98", port=54321)
The process remains stuck at this stage. When I try to connect to the web UI at ip:54321, the browser endlessly loads the H2O admin page but nothing ever displays.
On forcefully terminating the init process I get the following error
No instance found at ip and port: 10.113.57.98:54321. Trying to start local jar...
However, if I try to use H2O from Python without setting up an H2O cluster, everything runs fine.
I executed all commands as the root user. The root user has permission to read and write to the /user/hdfs HDFS directory.
I'm not sure whether this is a permissions error or the port is not accessible.
Any help would be greatly appreciated.
It looks like you are using H2O2 (H2O Classic). I recommend upgrading your H2O to the latest (H2O 3). There is a build specifically for HDP2.3 here: http://www.h2o.ai/download/h2o/hadoop
Running H2O3 is a little cleaner too:
hadoop jar h2odriver.jar -nodes 1 -mapperXmx 6g -output hdfsOutputDirName
Also, 512 MB per node is tiny. What is your use case? I would give the nodes some more memory.
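For instance, reusing the driver invocation from the question but with a bigger per-node heap (the 4g figure is only an illustration; size it to your data):
hadoop jar h2odriver.jar -nodes 3 -mapperXmx 4g -output /user/hdfs/H2OTestClusterOutput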
How can I start HBase in standalone mode in a CDH5 VM? In the CDH3 VM, I used to run
'sudo sh start-hbase.sh'
in the below path:
/usr/lib/hbase/bin
But I can only see 'start-hbase.cmd' in the above path in the CDH5 VM. Please let me know how I can start my HBase instance by invoking the above '.cmd' file.
We can use the following command to start a service in the CDH5 VM:
sudo service <service-name> start
e.g.:
sudo service zookeeper-server start
Or we can go to the path
/etc/init.d
and execute the same command as above!
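For example, to see which init scripts exist on the VM and start the HBase daemons (the service names below are the usual CDH5 package names; check what is actually installed on your VM):
# list the HBase init scripts present on the VM
ls /etc/init.d/ | grep hbase
sudo service hbase-master start
sudo service hbase-regionserver start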
I am new to Spark. I installed Spark using the parcels available in Cloudera Manager.
I have configured the files as shown in the link below from Cloudera Enterprise:
http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM4Ent/4.8.1/Cloudera-Manager-Installation-Guide/cmig_spark_installation_standalone.html
After this setup, I started all the Spark nodes by running /opt/cloudera/parcels/SPARK/lib/spark/sbin/start-all.sh. But I couldn't get the worker nodes running, because I got the error below.
[root@localhost sbin]# sh start-all.sh
org.apache.spark.deploy.master.Master running as process 32405. Stop it first.
root@localhost.localdomain's password:
localhost.localdomain: starting org.apache.spark.deploy.worker.Worker, logging to /var/log/spark/spark-root-org.apache.spark.deploy.worker.Worker-1-localhost.localdomain.out
localhost.localdomain: failed to launch org.apache.spark.deploy.worker.Worker:
localhost.localdomain: at java.lang.ClassLoader.loadClass(libgcj.so.10)
localhost.localdomain: at gnu.java.lang.MainThread.run(libgcj.so.10)
localhost.localdomain: full log in /var/log/spark/spark-root-org.apache.spark.deploy.worker.Worker-1-localhost.localdomain.out
localhost.localdomain:starting org.apac
When I run the jps command, I get:
23367 Jps
28053 QuorumPeerMain
28218 SecondaryNameNode
32405 Master
28148 DataNode
7852 Main
28159 NameNode
I couldn't run the worker node properly. Actually, I intended to install standalone Spark, where the master and worker run on a single machine. In the slaves file of the Spark directory, I gave the address as "localhost.localdomain", which is my host name. I am not familiar with this settings file. Could anyone please help me out with this installation process? I couldn't run the worker nodes, but I can start the master node.
Thanks & Regards,
bips
Please notice the error info below:
localhost.localdomain: at java.lang.ClassLoader.loadClass(libgcj.so.10)
I met the same error when I installed and started the Spark master/workers on CentOS 6.2 x86_64 (after making sure that libgcj.x86_64 and libgcj.i686 were installed on my server), and I finally solved it. Below is my solution; I hope it helps you.
It seems your JAVA_HOME environment variable is not set correctly.
Maybe your JAVA_HOME points to the system's embedded Java, e.g. java version "1.5.0".
Spark needs Java version >= 1.6.0. If you are using Java 1.5.0 to start Spark, you will see this error.
Try exporting JAVA_HOME="your java home path", then start Spark again.
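A minimal sketch of that check and fix; the JDK path below is only an example, so substitute the location of a 1.6+ JDK on your machine:
# see which java is currently picked up
java -version
echo $JAVA_HOME
# point JAVA_HOME at a real JDK (example path; adjust to your install)
export JAVA_HOME=/usr/java/jdk1.7.0_67-cloudera
export PATH=$JAVA_HOME/bin:$PATH
# then start Spark again
/opt/cloudera/parcels/SPARK/lib/spark/sbin/start-all.sh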
I'm deploying Hadoop CDH3 in pseudo-distributed mode on a VPS.
So I installed CDH3, then executed
sudo apt-get install hadoop-0.20-conf-pseudo
but if I try to start all the daemons with
for service in /etc/init.d/hadoop-0.20-*; do sudo $service start; done
it throws
ERROR. Could not start Hadoop datanode daemon
The same installation and startup commands work on my notebook.
I don't understand the cause. In fact, the log file is empty. The available RAM is about 900 MB, with 98 GB of available disk space.
What could be the cause, or how can I discover it? I'm ruling out the configuration files as the source of the error.
Consider using Cloudera Manager; it could save you some time (especially if you use multiple nodes). There is a nice video on YouTube which shows the deployment process.
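If you go that route, the usual flow is to download Cloudera's standalone installer and run it on the host that will become the manager node (the URL and version below are illustrative assumptions; grab the installer matching your CDH release from Cloudera's download site):
# fetch and run the Cloudera Manager installer (example URL; pick the right release)
wget https://archive.cloudera.com/cm4/installer/latest/cloudera-manager-installer.bin
chmod u+x cloudera-manager-installer.bin
sudo ./cloudera-manager-installer.bin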
I can't start HMaster :(
Please help me; I've been stuck on this error for two days.
Exception in thread "main" java.lang.RuntimeException: Failed construction of Regionserver: class org.apache.hadoop.hbase.regionserver.HRegionServer
Unable to start master
The Hadoop cluster installation already works well. I wait 30 seconds before starting HBase.
I followed this tutorial http://hbase.apache.org/book/example_config.html#d0e2432
I changed the system configuration in the required sections (ulimit and nproc).
I have 1 master and 4 slaves.
Here is all the diagnostic information:
Java: java version "1.6.0_26"
Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02, mixed mode)
Debian 6.03 Linux slave1 2.6.32-5-amd64
I copied hadoop-core to hbase/lib on each machine:
hduser@slave1:/usr/local/hbase$ ls lib/hadoo*
lib/hadoop-core-1.0.0.jar
Hbase: hbase-0.90.5
Detailed configuration here: http://pastie.org/private/hnhpw2jeq7p2njegnuha
(unable to paste it here because of the 2-link limit)
You can also copy commons-configuration-1.6.jar from the Hadoop lib directory to the HBase lib directory and try again.
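For example (the Hadoop path below is an assumption, since the question only shows where HBase lives; adjust it to your actual Hadoop install):
# copy the jar from Hadoop's lib into HBase's lib (example Hadoop path)
cp /usr/local/hadoop/lib/commons-configuration-1.6.jar /usr/local/hbase/lib/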
There must be more errors or warnings before this error.
Clear the hbase logs dir, then run start-hbase.sh and provide the full log here.
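A quick way to do that, assuming the /usr/local/hbase layout shown in the question:
# remove old logs so only the fresh failure shows up
rm -f /usr/local/hbase/logs/*
/usr/local/hbase/bin/start-hbase.sh
# then look for the first ERROR/WARN in the new master log
grep -nE 'ERROR|WARN' /usr/local/hbase/logs/hbase-*-master-*.log | head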
Make your hosts file as follows:
127.0.0.1 localhost
# For Hadoop
192.168.56.1 master
192.168.56.101 slave
and in the HBase conf (hbase-site.xml) put the following entries:
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://master:9000/hbase</value>
</property>
<property>
  <name>hbase.master</name>
  <value>master:60000</value>
  <description>The host and port that the HBase master runs at.</description>
</property>
<property>
  <name>hbase.regionserver.port</name>
  <value>60020</value>
  <description>The port the HBase RegionServer binds to.</description>
</property>
<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>
<property>
  <name>hbase.tmp.dir</name>
  <value>/home/cluster/Hadoop/hbase-0.90.4/temp</value>
</property>
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>master</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
<property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>2181</value>
  <description>Property from ZooKeeper's config zoo.cfg. The port at which the clients will connect.</description>
</property>
If you are using localhost anywhere, remove it and replace it with "master", which is the name for the namenode in your hosts file.
One more thing you can do:
sudo gedit /etc/hostname
This will open the hostname file; by default "ubuntu" will be there, so change it to master and restart your system.
For HBase, in the "regionservers" file inside the conf dir, put these entries:
master
slave
and restart everything.
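As a sketch, assuming HBase is installed under /usr/local/hbase as shown in the question:
# one region server hostname per line
printf "master\nslave\n" > /usr/local/hbase/conf/regionservers
# restart HBase
/usr/local/hbase/bin/stop-hbase.sh
/usr/local/hbase/bin/start-hbase.sh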