Cloudera Docker keeps retrying to connect to 8032 - hadoop

I use the Docker image from Cloudera, but the configuration does not seem quite right. When I run this:
hadoop jar /usr/lib/hadoop*/contrib/streaming/hadoop-streaming*cdh*.jar \
-mapper mapper -reducer reducer \
-file mapper -file reducer \
-input input -output output
I get this every time:
18/03/14 02:34:33 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
These are the steps I took before running the command above:
Increase Docker memory to 8 GB
Start the container by running this on the host:
docker run -p 7180:7180 \
--hostname=quickstart.cloudera --privileged=true \
-t -i cloudera/quickstart:latest \
/usr/bin/docker-quickstart
Start the manager
/home/cloudera/cloudera-manager --express
Open cloudera manager to start HDFS
Upload sample input into HDFS

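You need to use Cloudera Manager to start not just HDFS but also YARN. Port 8032 is the YARN ResourceManager's client port, so the repeated retries mean nothing is listening there yet.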
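One way to confirm that the YARN ResourceManager is actually up is to test whether anything accepts connections on port 8032. The sketch below assumes bash (for its /dev/tcp redirection) and the coreutils timeout command; the function name port_open is made up for illustration.

```shell
#!/usr/bin/env bash
# port_open HOST PORT
# Returns 0 if a TCP connection to HOST:PORT succeeds within 2 seconds,
# non-zero otherwise. Relies on bash's /dev/tcp redirection, so it must
# run under bash, not plain sh.
port_open() {
  local host="$1" port="$2"
  # The child bash receives host and port as $1 and $2.
  timeout 2 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}
```

Inside the container, something like port_open quickstart.cloudera 8032 || echo "ResourceManager not reachable" would then show whether the job has anything to connect to.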

Related

Hive starts Tez, which fails to connect to YARN

I have installed a Hadoop cluster with one master and 3 nodes.
Hadoop 3.2.1, Hive 3.1.2 and Tez 0.10.0
I checked the configuration files and I can't find why Tez is trying to connect to YARN through localhost:18032.
YARN is configured to be on hadoop-master:8032.
I tested Tez directly with an example and it connects to YARN correctly:
hadoop jar tez-examples-0.10.0.jar orderedwordcount /user/hive/foo.txt /user/hive/tmp/out
But when I execute a SQL statement like insert into ... I get this log:
2022-10-28 12:36:15,036 INFO ipc.Client (Client.java:handleConnectionFailure(1010)) - Retrying connect to server: localhost/127.0.0.1:18032. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
I ran into the same situation, with no error reported (because I had turned off YARN's memory check). Without the Tez engine, Hive started normally. I suspected that there was too little memory for it to start. I restarted many components of the cluster, including HDFS, YARN, Flume, Kafka and ZooKeeper, then restarted HDFS, YARN and ZooKeeper once more. After that, Hive could be started.
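When the configured address (hadoop-master:8032) and the address in the log (localhost:18032) disagree, it can help to list exactly which targets the client is dialing. A minimal sketch, assuming the log uses the "Retrying connect to server" wording shown in the question; the function name retry_targets is made up for illustration.

```shell
#!/usr/bin/env bash
# retry_targets
# Reads Hadoop client log text on stdin and prints each distinct
# "host/ip:port" found in a "Retrying connect to server" line.
retry_targets() {
  sed -n 's/.*Retrying connect to server: \([^ ]*\)\. Already tried.*/\1/p' | sort -u
}
```

Piping the Hive log through retry_targets prints one line per distinct target. A target of localhost or 0.0.0.0 usually suggests that the process is reading a different (or default) configuration than the one you edited.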

Why am I unable to connect to yarn?

I'm trying to connect to YARN by running yarn application -list, but it fails with:
<date> <time> INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
<date> <time> INFO ipc.Client: Retrying connecting to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s): retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
<date> <time> INFO ipc.Client: Retrying connecting to server: 0.0.0.0/0.0.0.0:8032. Already tried 1 time(s): retry policy is RetryUpToMaximumCount
<date> <time> INFO ipc.Client: Retrying connecting to server: 0.0.0.0/0.0.0.0:8032. Already tried 2 time(s): retry policy is RetryUpToMaximumCount
I have a file at /etc/hadoop/conf.empty/yarn-site.xml, which I assume is related to this in some way. There is also a file called yarn-env.sh in /etc/hadoop/conf.empty/. I tried running that file, but it didn't change anything.
Am I doing something wrong? Or maybe something is not correctly configured? How do I start yarn?
yarn-site.xml is for configuring the YARN daemons (ResourceManager and NodeManager) and the ApplicationMaster. The properties relating to these services go in there. The environment settings for YARN can be modified in yarn-env.sh. That file only sets environment variables, so running it does not start anything.
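To see what a client will pick up from yarn-site.xml, you can read a property straight out of the file. The following is a minimal sketch, not a real XML parser: it assumes each <name> and <value> element sits on a single line and that the property is not inside an XML comment. The function name get_hadoop_property is made up for illustration.

```shell
#!/usr/bin/env bash
# get_hadoop_property FILE NAME
# Prints the <value> of the first property in FILE whose <name> equals NAME.
# Prints nothing if the property is not present.
get_hadoop_property() {
  awk -v name="$2" '
    /<name>/ {
      n = $0
      sub(/.*<name>/, "", n)      # drop everything up to the opening tag
      sub(/<\/name>.*/, "", n)    # drop the closing tag and what follows
      found = (n == name)
    }
    found && /<value>/ {
      v = $0
      sub(/.*<value>/, "", v)
      sub(/<\/value>.*/, "", v)
      print v
      exit
    }
  ' "$1"
}
```

For example, get_hadoop_property /etc/hadoop/conf.empty/yarn-site.xml yarn.resourcemanager.hostname. If that prints nothing, the hostname is not set in that file; unless yarn.resourcemanager.address is set directly, the client then falls back to the default 0.0.0.0, which matches the 0.0.0.0:8032 in the log.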
Start the YARN services. (Judging from the path of the yarn-site.xml file posted, the installation does not appear to have been done from tarballs, so the startup scripts might not be available.)
On the ResourceManager host:
sudo service hadoop-yarn-resourcemanager start
And on each NodeManager host:
sudo service hadoop-yarn-nodemanager start
Note: Make sure the preliminary configuration properties are set for both HDFS and YARN, and that the HDFS daemons (NameNode and DataNode) are started and running.
Additionally, configure MapReduce to use YARN in mapred-site.xml:
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
You need to start the Hadoop services. At a minimum, run:
start-dfs.sh
start-yarn.sh
In a standard Hadoop 2.x or later tarball, these shell scripts are located in the sbin folder.
Depending on the installation, you may also need to start the history server.
If this is the first time you are starting Hadoop, you need to format the NameNode first; otherwise the DFS service will not start.

Hadoop Configuration() object not picking up /etc/hadoop/conf/core-site.xml

I'm starting an application which creates a new YarnConfiguration() object.
When I'm running it I'm setting HADOOP_CONF_DIR to /etc/hadoop/conf where the configuration files are.
I then start the application with
yarn -jar jarname.jar --config.file config/local.properties
and get the following error:
INFO: Connecting to ResourceManager at /0.0.0.0:8032
Jul 25, 2016 12:33:49 PM org.apache.hadoop.ipc.Client handleConnectionFailure
INFO: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
So it doesn't seem to be picking up the details of the YARN ResourceManager, which is running on another machine.
The yarn-site.xml has the correct values in it.
Ignoring the shame of how long this took me to spot, in case anyone else has the same problem:
It came down to the -jar, which was incorrect. The command needed to be yarn jar, without the hyphen.

Running JAR in Hadoop on Google Cloud using Yarn-client

I want to run a JAR in Hadoop on Google Cloud using yarn-client.
I use this command on the master node of Hadoop:
spark-submit --class find --master yarn-client find.jar
but it returns this error:
15/06/17 10:11:06 INFO client.RMProxy: Connecting to ResourceManager at hadoop-m-on8g/10.240.180.15:8032
15/06/17 10:11:07 INFO ipc.Client: Retrying connect to server: hadoop-m-on8g/10.240.180.15:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
What is the problem? In case it is useful this is my yarn-site.xml
<?xml version="1.0" ?>
<!--
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/yarn-logs/</value>
<description>
The remote path, on the default FS, to store logs.
</description>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop-m-on8g</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>5999</value>
<description>
In your case, it looks like the YARN ResourceManager may be unhealthy for unknown reasons; you can try to fix yarn with the following:
sudo sudo -u hadoop /home/hadoop/hadoop-install/sbin/stop-yarn.sh
sudo sudo -u hadoop /home/hadoop/hadoop-install/sbin/start-yarn.sh
However, it looks like you're using the Click-to-Deploy solution; Click-to-Deploy's Spark + Hadoop 2 deployment actually doesn't support Spark on YARN at the moment, due to some bugs and lack of memory configs. You'd normally run into something like this if you just try to run it with --master yarn-client out-of-the-box:
15/06/17 17:21:08 INFO cluster.YarnClientSchedulerBackend: Application report from ASM:
appMasterRpcPort: -1
appStartTime: 1434561664937
yarnAppState: ACCEPTED
15/06/17 17:21:09 INFO cluster.YarnClientSchedulerBackend: Application report from ASM:
appMasterRpcPort: -1
appStartTime: 1434561664937
yarnAppState: ACCEPTED
15/06/17 17:21:10 INFO cluster.YarnClientSchedulerBackend: Application report from ASM:
appMasterRpcPort: 0
appStartTime: 1434561664937
yarnAppState: RUNNING
15/06/17 17:21:15 ERROR cluster.YarnClientSchedulerBackend: Yarn application already ended: FAILED
15/06/17 17:21:15 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/metrics/json,null}
15/06/17 17:21:15 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage/kill,null}
The well-supported way to deploy a cluster on Google Compute Engine with Hadoop 2 and Spark configured to run on YARN is to use bdutil. You'd run something like:
./bdutil -P <instance prefix> -p <project id> -b <bucket> -z <zone> -d \
-e extensions/spark/spark_on_yarn_env.sh generate_config my_custom_env.sh
./bdutil -e my_custom_env.sh deploy
# Shorthand for logging in to the master
./bdutil -e my_custom_env.sh shell
# Handy way to run a socks proxy to make it easy to access the web UIs
./bdutil -e my_custom_env.sh socksproxy
# When done, delete your cluster
./bdutil -e my_custom_env.sh delete
With spark_on_yarn_env.sh Spark should default to yarn-client, though you can always re-specify --master yarn-client if you want. You can see a more detailed explanation of the flags available in bdutil with ./bdutil --help. Here are the help entries just for the flags I included above:
-b, --bucket
    Google Cloud Storage bucket used in deployment and by the cluster.
-d, --use_attached_pds
    If true, uses additional non-boot volumes, optionally creating them on
    deploy if they don't exist already and deleting them on cluster delete.
-e, --env_var_files
    Comma-separated list of bash files that are sourced to configure the cluster
    and installed software. Files are sourced in order with later files being
    sourced last. bdutil_env.sh is always sourced first. Flag arguments are
    set after all sourced files, but before the evaluate_late_variable_bindings
    method of bdutil_env.sh. see bdutil_env.sh for more information.
-P, --prefix
    Common prefix for cluster nodes.
-p, --project
    The Google Cloud Platform project to use to create the cluster.
-z, --zone
    Specify the Google Compute Engine zone to use.

Pig keeps trying to connect to job history server (and fails)

I'm running a Pig job that fails to connect to the Hadoop job history server.
The task (usually any task with GROUP BY) runs for a while and then it starts with a message like:
2015-04-21 19:05:22,825 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2015-04-21 19:05:26,721 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2015-04-21 19:05:29,721 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
It then continues retrying the connection for a while. Sometimes it proceeds further with the job. Other times it throws this exception:
2015-04-21 19:05:55,822 [main] WARN org.apache.pig.tools.pigstats.mapreduce.MRJobStats - Unable to get job counters
java.io.IOException: java.io.IOException: java.net.NoRouteToHostException: No Route to Host from cluster-01/10.10.10.11 to 0.0.0.0:10020 failed on socket timeout exception: java.net.NoRouteToHostException: No route to host; For more details see: http://wiki.apache.org/hadoop/NoRouteToHost
at org.apache.pig.backend.hadoop.executionengine.shims.HadoopShims.getCounters(HadoopShims.java:132)
at org.apache.pig.tools.pigstats.mapreduce.MRJobStats.addCounters(MRJobStats.java:284)
at org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil.addSuccessJobStats(MRPigStatsUtil.java:235)
at org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil.accumulateStats(MRPigStatsUtil.java:165)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:360)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:280)
I found this question here but in my case the job history server is started. If I run netstat, I find:
tcp 0 0 0.0.0.0:10020 0.0.0.0:* LISTEN 12073/java off (0.00/0/0)
Where 12073 is ...
12073 pts/4 Sl 0:07 /usr/lib/jvm/java-7-openjdk-amd64/bin/java -Dproc_historyserver -Xmx1000m -Djava.library.path=/data/hadoop/hadoop/lib -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/data/hadoop/hadoop-2.3.0/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/data/hadoop/hadoop-2.3.0 -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,console -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/data/hadoop/hadoop/logs -Dhadoop.log.file=mapred-hadoop-historyserver-cluster-01.log -Dhadoop.root.logger=INFO,RFA -Dmapred.jobsummary.logger=INFO,JSA -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer
I tried opening port 10020 in case it was a firewall issue:
ACCEPT tcp -- anywhere anywhere tcp dpt:10020
... but no luck.
After a few minutes, some of the tasks just arbitrarily continue to the next part.
I'm using Hadoop 2.3 and Pig 0.14.
My question is:
1) What are the possible reasons why Pig cannot connect to the job history server (JHS) given that the JHS is running on the same port that Pig looks for it?
... or failing that ...
2) Is there any way to just tell Pig to stop trying to connect to the JHS and continue with the task?
It seems that most Hadoop installation/configuration guides neglect to mention configuring the Job History Server. It seems that Pig, in particular, relies on this server. It also seems like the default (local) settings for the JHS won't work in a multi-node cluster.
The solution was to add the hostname of the server to the configuration in mapred-site.xml so that it could be accessed from the other machines. (In my version of the file, the lines had to be added as "new" ... there were no previous settings.)
<property>
<name>mapreduce.jobhistory.address</name>
<value>cm:10020</value>
<description>Host and port for Job History Server (default 0.0.0.0:10020)</description>
</property>
Then restart the job history server:
mr-jobhistory-daemon.sh stop historyserver
mr-jobhistory-daemon.sh start historyserver
If you get a bind exception (port in use), it means the stop didn't work. Either:
1) Use ps ax | grep -e JobHistory to find the process and kill it manually with kill -9 [pid], then run the start command above again; or
2) Use a different port in the configuration.
Pig should pick up the new settings automatically. Run a Pig script and hope for the best.
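The failing log lines show the client dialing 0.0.0.0:10020, the default noted in the property description above. As a target that a client on another node has to reach, a wildcard or loopback host is not useful. A small sketch for flagging such values follows; the function name is_local_only_address is made up for illustration, and the check is a plain string match with no DNS lookup.

```shell
#!/usr/bin/env bash
# is_local_only_address ADDR
# ADDR is "host:port". Returns 0 if the host part is the 0.0.0.0 wildcard,
# localhost, or a 127.x.x.x loopback address; returns 1 otherwise.
is_local_only_address() {
  case "${1%:*}" in          # strip the trailing ":port"
    0.0.0.0|localhost|127.*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For instance, is_local_only_address 0.0.0.0:10020 succeeds, while is_local_only_address cm:10020 does not, so the value configured above would pass the check.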
Start the history server from the Hadoop bin directory using the command below:
bin$ ./mr-jobhistory-daemon.sh start historyserver
Then run Pig using the command below:
$ pig
Configure mapreduce.jobhistory.address in hadoop/etc/hadoop/mapred-site.xml, then start the history server (Hadoop 3 syntax):
mapred --daemon start historyserver
The problem was that the history server was not running. Starting it fixed it:
[user@vm9 sbin]$ ./mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /home/user/hadoop-2.7.7/logs/mapred-user-historyserver-vm9.out
[user@vm9 sbin]$ jps
5683 NameNode
6309 NodeManager
5974 SecondaryNameNode
8075 RunJar
6204 ResourceManager
8509 JobHistoryServer
5821 DataNode
8542 Jps
[user@vm9 sbin]$
Now Pig runs properly: it connects to the job history server, and the dump command works fine.
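The jps listing above is a convenient way to spot a daemon that was never started. A minimal sketch that compares jps output against a list of expected daemon names; the function name missing_daemons is made up for illustration, and it assumes the usual "<pid> <ClassName>" output format shown above.

```shell
#!/usr/bin/env bash
# missing_daemons NAME...
# Reads jps output ("<pid> <ClassName>" per line) on stdin and prints each
# expected daemon NAME that does not appear in it.
missing_daemons() {
  local running name
  running=$(awk '{ print $2 }')    # keep only the class-name column
  for name in "$@"; do
    grep -qxF -- "$name" <<<"$running" || echo "$name"
  done
}
```

Running jps | missing_daemons NameNode DataNode ResourceManager NodeManager JobHistoryServer prints nothing when everything is up, and prints JobHistoryServer in the situation described in the question.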

Resources