Hive query is not executing - yarn shutting down - hadoop

I ran a Hive query that reports no error, but no data is inserted. If I rerun the same query about 10 minutes later, the records are inserted into the table.
My query:
insert into tablename
values('exr','e','r','20220909')
When I checked the HiveServer2 (hs2) logs, I found the following:
hiveserver2.log:2022-06-08T05:31:32,981 INFO [HiveServer2-Background-Pool: Thread-6666]: SessionState (:()) - Status: Running (Executing on YARN cluster with App id application_55555_09090)
hiveserver2.log:2022-06-08T05:32:31,017 INFO [HiveServer2-Handler-Pool: Thread-6666]: client.TezClient (:()) - Shutting down Tez Session, sessionName=HIVE-ceb52736-83c3-4c57-8c28-31bce3ee3791, applicationId=application_55555_09090
In the logs for this applicationId, the last line shows "Shutting down Tez Session"; this is the problem.
When I fetched the logs for an application id whose records were inserted into the Hive table, the last line was:
Completed executing command(queryId=hive_4-eee-444);
Why is this happening, and how can I avoid the Tez session being shut down in YARN?
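To compare a failing run with a successful one, it can help to measure how long the Tez session lived before HiveServer2 shut it down. The following is a minimal sketch, assuming the log lines look like the two excerpts quoted in the question; the function name session_lifetime_seconds is hypothetical and not part of Hive.

```python
import re
from datetime import datetime

# Timestamp format seen in the hiveserver2.log excerpts, e.g. 2022-06-08T05:31:32,981
TS_RE = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2},\d{3}")


def session_lifetime_seconds(lines, app_id):
    """Seconds between the 'Status: Running' line and the
    'Shutting down Tez Session' line for app_id, or None if either is missing."""
    started = stopped = None
    for line in lines:
        if app_id not in line:
            continue
        match = TS_RE.search(line)
        if match is None:
            continue
        ts = datetime.strptime(match.group(0), "%Y-%m-%dT%H:%M:%S,%f")
        if "Status: Running" in line and started is None:
            started = ts
        elif "Shutting down Tez Session" in line:
            stopped = ts
    if started is None or stopped is None:
        return None
    return (stopped - started).total_seconds()
```

For the two lines quoted in the question this gives roughly 58 seconds between the query starting on YARN and the session being shut down, which is a concrete number to compare against a run that did insert its records.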

Related

hadoop cluster with active standby namenode + gap in the edit log

We have an Ambari cluster running HDP version 2.6.5.
The cluster has two NameNodes (one active, the other standby)
and 65 DataNode machines.
The standby NameNode does not start, and in its logs we can see the following:
2021-01-01 15:19:43,269 ERROR namenode.NameNode (NameNode.java:main(1783)) - Failed to start namenode.
java.io.IOException: There appears to be a gap in the edit log. We expected txid 90247527115, but got txid 90247903412.
at org.apache.hadoop.hdfs.server.namenode.MetaRecoveryContext.editLogLoaderPrompt(MetaRecoveryContext.java:94)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:215)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:143)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:838)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:693)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:289)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1073)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:723)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:697)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:761)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:1001)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:985)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1710)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1778)
For now the active NameNode is up, but the standby NameNode is down.
Regarding
java.io.IOException: There appears to be a gap in the edit log. We expected txid 90247527115, but got txid 90247903412.
what is the preferred solution to fix this problem?
There are many possible causes for this; however, check this article, which should help.
Follow the exact steps, in the exact order, given in the article.
In short, the error means the NameNode metadata is damaged or corrupted.
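To get a sense of how much is missing, the two transaction ids in the error can be subtracted. This is a small sketch, assuming the message has exactly the wording quoted in the question; the helper name edit_log_gap is hypothetical.

```python
import re

# Matches the wording of the IOException quoted in the question.
GAP_RE = re.compile(r"We expected txid (\d+), but got txid (\d+)")


def edit_log_gap(message):
    """Return (expected_txid, got_txid, missing_count), or None if no match."""
    match = GAP_RE.search(message)
    if match is None:
        return None
    expected, got = int(match.group(1)), int(match.group(2))
    return expected, got, got - expected


message = ("java.io.IOException: There appears to be a gap in the edit log. "
           "We expected txid 90247527115, but got txid 90247903412.")
print(edit_log_gap(message))
```

For the error in the question the difference is 376297, so the standby is missing the edits from txid 90247527115 up to 90247903411.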

Why does my yarn application not have logs even with logging enabled?

I have enabled logs in the xml file: yarn-site.xml, and I restarted yarn by doing:
sudo service hadoop-yarn-resourcemanager restart
sudo service hadoop-yarn-nodemanager restart
I ran my application, and then I see the applicationID in yarn application -list. So, I do this: yarn logs -applicationId <application ID>, and I get the following:
hdfs://<ip address>/var/log/hadoop-yarn/path/to/application/ does not have any log files
Do I need to change some other configuration? Or am I accessing the logs the wrong way?
Thank you.
yarn application -list
will list only the applications that are either in SUBMITTED, ACCEPTED or RUNNING state.
Log aggregation collects each container's logs and moves them into the directory configured in yarn.nodemanager.remote-app-log-dir only after the application has completed. Refer to the description of the yarn.log-aggregation-enable property here.
So, the application listed by the command has not completed yet and its logs have not been collected. Hence the response when trying to access the logs of a running application:
hdfs://<ip address>/var/log/hadoop-yarn/path/to/application/ does not have any log files
You can try the same command yarn logs -applicationId <application ID> to view the logs once the application has completed.
To list all the FINISHED applications, use
yarn application -list -appStates FINISHED
Or to list all the applications
yarn application -list -appStates ALL
Enable Log Aggregation
Log aggregation is enabled in the yarn-site.xml file. The yarn.log-aggregation-enable property enables log aggregation for running applications.
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
In HDP 2.3.2 and higher you can have log aggregation run hourly for running jobs using this configuration in yarn-site.xml:
<property>
<name>yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds</name>
<value>3600</value>
</property>
See this for further details: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_yarn_resource_mgt/content/ref-375ff479-e530-46d8-9f96-8b52dadb5183.1.html
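To confirm what a yarn-site.xml actually sets, the file can be parsed rather than read by eye. The following is a sketch using Python's standard xml.etree module; the function name and the inline SAMPLE are illustrative only, and the fallback values used when a property is unset ("false" and "-1") are assumptions about the defaults that you should check against your distribution.

```python
import xml.etree.ElementTree as ET


def read_yarn_properties(xml_text):
    """Map each <property> name to its value from yarn-site.xml content."""
    root = ET.fromstring(xml_text)
    properties = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            properties[name.strip()] = (prop.findtext("value") or "").strip()
    return properties


# Inline sample for illustration; on a real node, read the file from your
# Hadoop configuration directory instead (its location varies by install).
SAMPLE = """<configuration>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds</name>
    <value>3600</value>
  </property>
</configuration>"""

props = read_yarn_properties(SAMPLE)
# Assumed fallbacks: aggregation off and rolling disabled when unset.
aggregation_enabled = props.get("yarn.log-aggregation-enable", "false") == "true"
roll_interval = int(props.get(
    "yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds", "-1"))
print(aggregation_enabled, roll_interval)
```

Remember that the NodeManagers must be restarted after editing the file, as in the question, for the parsed values to match what is running.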
It was probably saved under another appOwner. You can try specifying the application owner in your command:
yarn logs -appOwner <user> -applicationId <application ID>
ROOT CAUSE: When log aggregation has been enabled, each user's application logs will, by default, be placed in the directory hdfs:///app-logs//logs/<APPLICATION_ID>. By default only the user that submitted the job and members of the hadoop group have access to read the log files. In the example directory listing below you can see that the permissions are 770: no access for anyone other than the owner and members of the hadoop group.
[root#mycluster ~]$ hdfs dfs -ls /app-logs
Found 3 items
drwxrwx--- - hive hadoop 0 2017-03-10 15:33 /app-logs/hive
drwxrwx--- - user1 hadoop 0 2017-03-10 15:37 /app-logs/user1
drwxrwx--- - spark hadoop 0 2017-03-10 15:39 /app-logs/spark
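The effect of the 770 mode in this listing can be checked mechanically. The following is a simplified sketch of the owner/group/other read check, assuming the column layout of the hdfs dfs -ls output shown here; it ignores the HDFS superuser and ACLs, and the function name can_read and the users "alice" and "bob" are hypothetical.

```python
def can_read(ls_line, user, groups):
    """Simplified read check for one line of 'hdfs dfs -ls' output.

    Uses the owner bits if user owns the path, otherwise the group bits if
    the path's group is one of the user's groups, otherwise the 'other' bits.
    Superuser privileges and ACLs are deliberately not modelled.
    """
    fields = ls_line.split()
    mode, owner, group = fields[0], fields[2], fields[3]
    if user == owner:
        return mode[1] == "r"
    if group in groups:
        return mode[4] == "r"
    return mode[7] == "r"


line = "drwxrwx--- - user1 hadoop 0 2017-03-10 15:37 /app-logs/user1"
print(can_read(line, "user1", []))          # the submitting user
print(can_read(line, "alice", ["hadoop"]))  # a member of the hadoop group
print(can_read(line, "bob", ["users"]))     # anyone else
```

Only the last call returns False, which matches the situation where yarn logs reports no log files for a user who is neither the submitter nor in the hadoop group.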
SOLUTION: The message above can be deceiving and does not necessarily indicate that log aggregation has not been enabled. To obtain yarn logs for an application the 'yarn logs' command must be executed as the user that submitted the application. In the example below the application was submitted by user1. If we execute the same command as above as the user 'user1' we should get the following output if log aggregation has been enabled.
yarn logs -applicationId application_1473860344791_0001
16/09/19 23:10:33 INFO impl.TimelineClientImpl: Timeline service address: http://mycluster.somedomain.com:8188/ws/v1/timeline/
16/09/19 23:10:33 INFO client.RMProxy: Connecting to ResourceManager at mycluster.somedomain.com/192.168.1.89:8050
16/09/19 23:10:34 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
16/09/19 23:10:34 INFO compress.CodecPool: Got brand-new decompressor [.deflate]
Container: container_e03_1473860344791_0001_01_000001 on mycluster.somedomain.com_45454
LogType:stderr
Log Upload Time:Wed Sep 14 09:44:15 -0400 2016
LogLength:0
Log Contents:
End of LogType:stderr
REFERENCE: The following document describes how to use log aggregation to collect logs for long-running YARN applications.
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_yarn-resource-management/content/ch_log_a...

Spark NullPointerException on SQLListener.onTaskEnd while finishing task

I have a Spark application written in Scala that performs a series of transformations and then writes the result to a Parquet file.
The transformation part finishes without problems and the output is written to HDFS correctly. The application runs on a YARN cluster of 30 nodes.
However, the Spark application itself does not complete and exit YARN; it remains in the resource manager.
After hanging for about an hour (consuming resources and vcores), it either finishes or throws an error and kills itself.
Here is the error log of the application. I would appreciate it if anyone could shed some light on this matter.
16/08/24 14:51:12 INFO impl.ContainerManagementProtocolProxy: Opening proxy : phhdpdn013x.company.com:8041
16/08/24 14:51:22 INFO cluster.YarnClusterSchedulerBackend: Registered executor NettyRpcEndpointRef(null) (phhdpdn013x.company.com:54175) with ID 1
16/08/24 14:51:22 INFO storage.BlockManagerMasterEndpoint: Registering block manager phhdpdn013x.company.com:24700 with 2.1 GB RAM, BlockManagerId(1, phhdpdn013x.company.com, 24700)
16/08/24 14:51:29 INFO cluster.YarnClusterSchedulerBackend: SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000(ms)
16/08/24 14:51:29 INFO cluster.YarnClusterScheduler: YarnClusterScheduler.postStartHook done
16/08/24 15:11:00 ERROR scheduler.LiveListenerBus: Listener SQLListener threw an exception
java.lang.NullPointerException
at org.apache.spark.sql.execution.ui.SQLListener.onTaskEnd(SQLListener.scala:167)
at org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:42)
at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:55)
at org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:37)
at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(AsynchronousListenerBus.scala:80)
at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65)
at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(AsynchronousListenerBus.scala:64)
at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1181)
at org.apache.spark.util.AsynchronousListenerBus$$anon$1.run(AsynchronousListenerBus.scala:63)
16/08/24 15:11:46 ERROR scheduler.LiveListenerBus: Listener SQLListener threw an exception
java.lang.NullPointerException
What is your version of Spark?
Your ERROR looks a lot like this issue
https://issues.apache.org/jira/browse/SPARK-12339

PIG: script never ending

I have a coordinator which contains one workflow, with several "PIG forks". Each "PIG fork" is the execution of the same PIG script with different parameters.
This coordinator consumes all the resources available on the cluster, because the PIG scripts have a lot of data to process. Now here is the problem.
Sometimes, the coordinator successfully terminates in two hours. Sometimes, it never ends.
In this second case, PIG logs are:
2016-05-19 04:14:08,884 [communication thread] INFO org.apache.hadoop.mapred.TaskAttemptListenerImpl - Progress of TaskAttempt attempt_1460732649780_25701_m_000000_0 is : 1.0
...
2016-05-19 07:40:38,492 [communication thread] INFO org.apache.hadoop.mapred.TaskAttemptListenerImpl - Progress of TaskAttempt attempt_1460732649780_25701_m_000000_0 is : 1.0
This message then repeats indefinitely in every forked PIG script... and YARN seems frozen (it cannot allocate new containers).
Do you have any solutions for this issue?
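To find which task attempts are stuck, one option is to measure how long each attempt has been reporting a progress of 1.0. This is a minimal sketch, assuming log lines in the format quoted in the question; the function name time_at_full_progress is hypothetical.

```python
import re
from datetime import datetime

# Matches lines such as:
# 2016-05-19 04:14:08,884 [communication thread] INFO ... Progress of TaskAttempt <id> is : 1.0
PROGRESS_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) .*"
    r"Progress of TaskAttempt (\S+) is : ([\d.]+)"
)


def time_at_full_progress(lines):
    """Map each task attempt to the seconds between its first and last
    log line that reports progress 1.0."""
    first_seen, last_seen = {}, {}
    for line in lines:
        match = PROGRESS_RE.search(line)
        if match is None or float(match.group(3)) < 1.0:
            continue
        ts = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
        attempt = match.group(2)
        first_seen.setdefault(attempt, ts)
        last_seen[attempt] = ts
    return {a: (last_seen[a] - first_seen[a]).total_seconds() for a in first_seen}
```

Applied to the two lines quoted in the question, the attempt has been at 1.0 for about 12390 seconds (nearly three and a half hours), so the map task has finished its work but is never reported as complete.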

Submit Job in Spark using Yarn Cluster

I am unable to submit the job in yarn-cluster mode. The job runs fine with the yarn-client option. When I submit it to yarn-cluster, only this log line appears, repeated many times:
Application report for application_1421828570504_0002 (state: ACCEPTED)
and then it fails with the following exception:
diagnostics: Application application_1421828570504_0002 failed 10 times due to AM Container for app
attempt_1421828570504_0002_000010 exited with exitCode: 1 due to: Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException:
org.apache.hadoop.util.Shell$ExitCodeException:
You should have a look at the logs of your application:
> yarn logs --applicationId application_1421828570504_0002
This will yield some debug information about the actual run within the Spark containers.
Since it runs locally but not on the cluster, my wild guess would be a missing SparkContext definition. Have a look at my answer to this question for a fix.

Resources