Hive on Tez fails to connect to YARN - hadoop

I have installed a Hadoop cluster with one master and 3 nodes.
Versions: Hadoop 3.2.1, Hive 3.1.2 and Tez 0.10.0.
I have checked the configuration files and can't find why Tez is trying to connect to YARN through localhost:18032.
YARN is configured to be on hadoop-master:8032.
I tested Tez directly with an example, and it connects to YARN correctly:
hadoop jar tez-examples-0.10.0.jar orderedwordcount /user/hive/foo.txt /user/hive/tmp/out
But when I execute a SQL statement such as insert into ..., I get this log:
2022-10-28 12:36:15,036 INFO ipc.Client (Client.java:handleConnectionFailure(1010)) - Retrying connect to server: localhost/127.0.0.1:18032. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

I ran into the same situation, with no error reported (because I had turned off YARN's memory check). Without the Tez engine, Hive started normally. My guess was that there was too little memory for it to start. I restarted many components of the cluster, including HDFS, YARN, Flume, Kafka and ZooKeeper, and then restarted HDFS, YARN and ZooKeeper once more. After that, Hive could be started.

Related

Not able to run dump in pig

I am trying to dump a relation but I get the following error.
I have tried start-all.sh, and I have tried formatting the namenode with hadoop namenode -format.
But I can't work out what is wrong.
Error:
Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
Start the JobHistoryServer:
$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver
When run in mapreduce mode, Pig expects the JobHistoryServer to be available.
To configure the JobHistoryServer, add these properties to mapred-site.xml, replacing hostname with the actual name of the host where the process is started:
<property>
<name>mapreduce.jobhistory.address</name>
<value>hostname:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hostname:19888</value>
</property>
I would first make sure I can connect to the namenode from an HDFS client on an edge node. If not, there is a problem or inconsistency in the namenode settings in core-site.xml, either with the ports or with the hostname.
Check that the command below runs without any issues, and check at http://namenode_host:50070 that the namenode is not in safe mode (which prevents any writes):
hadoop fs -ls /
Then I would proceed with Pig. Based on your error, it looks like the HDFS client is unable to reach the namenode for some reason, which could be a firewall or a configuration issue.
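Before digging into configuration files, it can help to confirm that the port is reachable at all from the client machine. The following is a minimal sketch of such a check; the function name port_is_open is made up for this illustration, and the host and port you pass in are whatever your fs.defaultFS setting names.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    A False result only says the connection could not be opened; it does
    not distinguish a firewall from a stopped daemon or a wrong hostname.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

For example, port_is_open("namenode_host", 8020) would test the namenode RPC port, assuming 8020 is the port configured in core-site.xml. The same check works for the ResourceManager or the JobHistoryServer address.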

Why am I unable to connect to yarn?

I'm trying to connect to yarn by doing yarn application -list. But I cannot because it says:
<date> <time> INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
<date> <time> INFO ipc.Client: Retrying connecting to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s): retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime= 1000 MILLISECONDS)
<date> <time> INFO ipc.Client: Retrying connecting to server: 0.0.0.0/0.0.0.0:8032. Already tried 1 time(s): retry policy is RetryUpToMaximumCount
<date> <time> INFO ipc.Client: Retrying connecting to server: 0.0.0.0/0.0.0.0:8032. Already tried 2 time(s): retry policy is RetryUpToMaximumCount
I have a file under /etc/hadoop/conf.empty/yarn-site.xml, which I assume is related to this in some way. I have a file at /etc/hadoop/conf.empty/ called yarn-env.sh. I tried running this file, but it didn't change anything.
Am I doing something wrong? Or maybe something is not correctly configured? How do I start yarn?
yarn-site.xml is for configuring the YARN services: ResourceManager, NodeManager and ApplicationMaster. The properties relating to these services go in there. The environment settings for YARN can be modified in yarn-env.sh; that script only sets environment variables, so running it does not start anything.
Start the YARN services. (Judging from the path of the yarn-site.xml file you posted, the installation does not appear to have been done from tarballs, so the startup scripts might not be available.)
On the ResourceManager host:
sudo service hadoop-yarn-resourcemanager start
And on each NodeManager host
sudo service hadoop-yarn-nodemanager start
Note: Make sure the preliminary configuration properties are set for both HDFS and YARN, and that the HDFS daemons (NameNode and DataNode) are started and running.
Additionally, configure MapReduce to use YARN in mapred-site.xml:
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
You need to start the Hadoop services. At a minimum, run:
start-dfs.sh
start-yarn.sh
These shell scripts are located in Hadoop's sbin folder (in Hadoop 2 and later).
Depending on the installation, you may also need to start the history server.
If it is the first time you start Hadoop, you need to format the namenode first, otherwise the DFS service will not start.
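The retry lines quoted in the question name the address the client is dialing, and the port hints at which service it is looking for. As a rough sketch, the helper below pulls the host and port out of such a line and maps the port to the property that uses it by default. The function name describe_retry is invented for this example, and the port table only covers the stock defaults from yarn-default.xml and mapred-default.xml, so a cluster with custom ports will simply get None for the property.

```python
import re

# Stock default ports from yarn-default.xml and mapred-default.xml.
DEFAULT_PORTS = {
    8030: "yarn.resourcemanager.scheduler.address",
    8031: "yarn.resourcemanager.resource-tracker.address",
    8032: "yarn.resourcemanager.address",
    8033: "yarn.resourcemanager.admin.address",
    10020: "mapreduce.jobhistory.address",
}

# Matches both "Retrying connect to server" and "Retrying connecting to server".
RETRY_RE = re.compile(r"Retrying connect(?:ing)? to server: (\S+?)/(\S+?):(\d+)\.")


def describe_retry(line):
    """Return (host, port, default_property_or_None), or None if no match."""
    match = RETRY_RE.search(line)
    if match is None:
        return None
    host = match.group(1)
    port = int(match.group(3))
    return host, port, DEFAULT_PORTS.get(port)
```

A host of 0.0.0.0 or localhost in the result usually means the client fell back to a default value instead of reading the cluster's own configuration.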

Running an Oozie job

I'm trying to configure Oozie to work on my hadoop-2.7.1 cluster. Everything else seems to work fine: YARN, Hue, MapReduce and Spark. Jobs sent with the yarn jar ... command finish correctly, but when I send a job with Oozie, either from the CLI with oozie job ... -run or from Hue, the job gets stuck at 33% and the node logs show this:
2015-11-06 06:08:56,121 INFO [main] org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:18030
2015-11-06 06:08:57,165 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:18030. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
...
I don't use port 18030 anywhere in my configuration. I should probably change the hostname from localhost to the network hostname, but where do I configure it? I've tried changing yarn.resourcemanager.scheduler.address, but that wasn't it.
EDIT:
I run oozie job -config examples/apps/shell/job.properties -run with job.properties containing:
nameNode=hdfs://master:8020
jobTracker=master:8032
queueName=default
examplesRoot=examples
oozie.libpath=/data/shared/hadoop-2.7.1/etc/hadoop
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/shell
The error is occurring while trying to contact the Resource Manager.
The above mentioned log line is being printed in RMProxy.java:
LOG.info("Connecting to ResourceManager at " + rmAddress);
When you use Oozie with MRv1, the value of jobTracker in the "job.properties" file is set to the Job Tracker's address:
jobTracker={JobTracker Host}:{JobTracker Port}
But when you migrate your Oozie job to MRv2, you need to change "job.properties" so that the jobTracker value points to the Resource Manager address:
jobTracker={RM Host}:{RM Port}
Please refer to the link here: https://support.pivotal.io/hc/en-us/articles/203355837-How-to-run-a-MapReduce-jar-using-Oozie-workflow
jobTracker = Variable to define the resource manager address in case of Yarn implementation. Format: <resourcemanager_hostname>:<port>
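As a quick sanity check on the host:port format described above, here is a small sketch that reads a job.properties text and splits jobTracker into its two parts. Both function names are invented for this example, and the parser is deliberately simplified: it only understands key=value lines and comments, not the full Java properties syntax.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and comment lines."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def job_tracker_endpoint(props):
    """Split jobTracker into (host, port); raise ValueError if malformed."""
    value = props.get("jobTracker", "")
    host, sep, port = value.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"jobTracker should look like host:port, got {value!r}")
    return host, int(port)
```

With the job.properties from the question, this yields the host master and port 8032, which should match yarn.resourcemanager.address on the cluster.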
EDIT:
I went through the Hadoop source code. The only place where port "18030" is being used is in "SLS" (Yarn Scheduler Load Simulator).
SLS has a yarn-site.xml file (located at \hadoop-tools\hadoop-sls\src\main\sample-conf\yarn-site.xml), which has the following configuration:
<property>
<description>The address of the scheduler interface.</description>
<name>yarn.resourcemanager.scheduler.address</name>
<value>localhost:18030</value>
</property>
From your description, it seems that the yarn-site.xml being used is similar to the one used by SLS.
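One way to track down which file supplies an unexpected value such as 18030 is to scan the configuration directories for it. The sketch below walks a directory tree and reports every property in a *-site.xml file whose value contains a given string. The function name find_property_values is made up for this illustration, and it only looks at plain files on disk, so a yarn-site.xml bundled inside a jar would not be found this way.

```python
import os
import xml.etree.ElementTree as ET


def find_property_values(root_dir, needle):
    """Yield (path, name, value) for properties whose value contains needle.

    Only files named *-site.xml below root_dir are inspected; files that
    cannot be read or parsed are skipped silently.
    """
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for filename in filenames:
            if not filename.endswith("-site.xml"):
                continue
            path = os.path.join(dirpath, filename)
            try:
                tree = ET.parse(path)
            except (ET.ParseError, OSError):
                continue
            for prop in tree.getroot().iter("property"):
                name = prop.findtext("name", default="").strip()
                value = prop.findtext("value", default="").strip()
                if needle in value:
                    yield path, name, value
```

Running list(find_property_values("/data/shared/hadoop-2.7.1", "18030")) against the installation directory from the question (the root used here is taken from oozie.libpath and is only a guess at where to start) would show which file, if any, sets the scheduler address to localhost:18030.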

apache-spark 1.3.0 and yarn integration and spring-boot as a container

I was running a Spark application as a query service (much like spark-shell, but inside the servlet container of Spring Boot) with Spark 1.0.2 in standalone mode. Now, after upgrading to Spark 1.3.1 and trying to use YARN instead of the standalone cluster, things are going south for me. I created an uber jar with all dependencies (spark-core, spark-yarn, spring-boot) and tried to deploy my application.
15/07/29 11:19:26 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/07/29 11:19:27 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
15/07/29 11:19:28 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
15/07/29 11:19:29 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
I also tried excluding the spark-yarn dependencies and supplying them at runtime, but I got the same exception. We use the MapR distribution, and they said it's not possible to run Spark jobs on YARN without using the spark-submit script. I could try to launch my webapp with that script, since my build artifact is a Spring Boot jar (not a war), but that just doesn't feel right. I should be able to initialize the service from my container, not the other way around.
EDIT 1: how I launch my application:
I launch it from a machine where hadoop client is installed and configured.
java -cp myspringbootapp.jar com.myapp.Application
com.myapp.Application in turn creates a SparkContext as a Spring-managed bean, which I use later to serve user requests.
I did get it working, in a few steps:
1) Exclude the Hadoop jars from the uber jar (the spring-boot maven plugin gives you an uber jar by default, so that is where you need to make the exclusion).
2) Use the ZIP layout of the spring-boot maven plugin, which lets you use the loader.path Spring configuration to provide extra classpath entries at runtime.
3) Launch with java -Dloader.path='/path/to/hadoop/jar,/path/to/hadoop/conf/' -jar myapp.jar
PS: the error I was getting was due to the Hadoop jar being on the classpath without proper configuration files. By default the Hadoop jar is packaged with yarn-default.xml, which tries to locate your resource manager at 0.0.0.0/0.0.0.0:8032. You can still try packaging the Hadoop jar, but be sure to provide the path to your custom Hadoop conf, i.e. a yarn-site.xml with the proper settings for your resource manager host, port, HA, etc.
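The fallback described above can be illustrated with a small sketch that mimics how the ResourceManager address is resolved when yarn-site.xml is missing or incomplete. This is not Hadoop's own code: the function name effective_rm_address is made up, and the sketch ignores HA setups and general ${...} variable substitution. It relies only on the stock defaults in yarn-default.xml, where yarn.resourcemanager.hostname is 0.0.0.0 and yarn.resourcemanager.address is that hostname with port 8032.

```python
import xml.etree.ElementTree as ET

# Defaults as shipped in yarn-default.xml.
DEFAULT_RM_HOSTNAME = "0.0.0.0"
DEFAULT_RM_PORT = 8032


def effective_rm_address(yarn_site_xml=None):
    """Return the host:port a client would use for the ResourceManager.

    yarn_site_xml is the text of yarn-site.xml, or None when no site file
    is on the classpath (the situation described in the answer).
    """
    props = {}
    if yarn_site_xml:
        root = ET.fromstring(yarn_site_xml)
        for prop in root.iter("property"):
            name = prop.findtext("name", default="").strip()
            if name:
                props[name] = prop.findtext("value", default="").strip()
    # An explicit address wins; otherwise fall back to hostname plus port.
    address = props.get("yarn.resourcemanager.address")
    if address:
        return address
    hostname = props.get("yarn.resourcemanager.hostname", DEFAULT_RM_HOSTNAME)
    return f"{hostname}:{DEFAULT_RM_PORT}"
```

Called with no argument, the function returns 0.0.0.0:8032, which is the address in the retry log. Once a yarn-site.xml that sets the hostname or address is on the classpath, the client picks that up instead.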

Pig keeps trying to connect to job history server (and fails)

I'm running a Pig job that fails to connect to the Hadoop job history server.
The task (usually any task with GROUP BY) runs for a while and then starts printing messages like:
2015-04-21 19:05:22,825 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2015-04-21 19:05:26,721 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2015-04-21 19:05:29,721 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
It then keeps retrying the connection for a while. Sometimes it proceeds with the job. Other times it throws this exception:
2015-04-21 19:05:55,822 [main] WARN org.apache.pig.tools.pigstats.mapreduce.MRJobStats - Unable to get job counters
java.io.IOException: java.io.IOException: java.net.NoRouteToHostException: No Route to Host from cluster-01/10.10.10.11 to 0.0.0.0:10020 failed on socket timeout exception: java.net.NoRouteToHostException: No route to host; For more details see: http://wiki.apache.org/hadoop/NoRouteToHost
at org.apache.pig.backend.hadoop.executionengine.shims.HadoopShims.getCounters(HadoopShims.java:132)
at org.apache.pig.tools.pigstats.mapreduce.MRJobStats.addCounters(MRJobStats.java:284)
at org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil.addSuccessJobStats(MRPigStatsUtil.java:235)
at org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil.accumulateStats(MRPigStatsUtil.java:165)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:360)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:280)
I found this question here but in my case the job history server is started. If I run netstat, I find:
tcp 0 0 0.0.0.0:10020 0.0.0.0:* LISTEN 12073/java off (0.00/0/0)
Where 12073 is ...
12073 pts/4 Sl 0:07 /usr/lib/jvm/java-7-openjdk-amd64/bin/java -Dproc_historyserver -Xmx1000m -Djava.library.path=/data/hadoop/hadoop/lib -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/data/hadoop/hadoop-2.3.0/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/data/hadoop/hadoop-2.3.0 -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,console -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/data/hadoop/hadoop/logs -Dhadoop.log.file=mapred-hadoop-historyserver-cluster-01.log -Dhadoop.root.logger=INFO,RFA -Dmapred.jobsummary.logger=INFO,JSA -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer
I tried opening port 10020 in case it was a firewall issue:
ACCEPT tcp -- anywhere anywhere tcp dpt:10020
... but no luck.
After a few minutes, some of the tasks just arbitrarily continue to the next part.
I'm using Hadoop 2.3 and Pig 0.14.
My question is:
1) What are the possible reasons why Pig cannot connect to the job history server (JHS) given that the JHS is running on the same port that Pig looks for it?
... or failing that ...
2) Is there any way to just tell Pig to stop trying to connect to the JHS and continue with the task?
It seems that most Hadoop installation/configuration guides neglect to mention configuring the Job History Server. It seems that Pig, in particular, relies on this server. It also seems like the default (local) settings for the JHS won't work in a multi-node cluster.
The solution was to add the hostname of the server to the configuration in mapred-site.xml so that it could be accessed from the other machines. (In my version of the file, the lines had to be added as new entries; there were no previous settings.)
<property>
<name>mapreduce.jobhistory.address</name>
<value>cm:10020</value>
<description>Host and port for Job History Server (default 0.0.0.0:10020)</description>
</property>
Then restart the job history server:
mr-jobhistory-daemon.sh stop historyserver
mr-jobhistory-daemon.sh start historyserver
If you get a bind exception (port in use), it means the stop didn't work. Either:
1) Use ps ax | grep -e JobHistory to find the process and kill it manually with kill -9 [pid], then run the start command above again; or
2) Use a different port in the configuration.
Pig should pick up the new settings automatically. Run a Pig script and hope for the best.
Start the history server from Hadoop's sbin directory with the command below:
sbin$ ./mr-jobhistory-daemon.sh start historyserver
Then run Pig with the command below:
$ pig
Configure mapreduce.jobhistory.address in hadoop/etc/hadoop/mapred-site.xml,
then:
mapred --daemon start historyserver
The problem was that the history server was not running. Starting it was the solution:
[user@vm9 sbin]$ ./mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /home/user/hadoop-2.7.7/logs/mapred-user-historyserver-vm9.out
[user@vm9 sbin]$ jps
5683 NameNode
6309 NodeManager
5974 SecondaryNameNode
8075 RunJar
6204 ResourceManager
8509 JobHistoryServer
5821 DataNode
8542 Jps
[user@vm9 sbin]$
Now Pig runs properly: it connects to the job history server, and the dump command works fine.
