How to kill a TEZ job started by Hive?

Below is what I can find, but the problem is that if we reuse the JDBC Hive session, all the Hive queries run under the same application ID. Is there a way to kill a single DAG?
Tez jobs can be listed using: yarn application -list
Tez jobs can be killed using: yarn application -kill <application_id>
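That kill, however, takes down the whole shared application. Hive 3.0 added a KILL QUERY statement (HIVE-17483) that terminates a single query's DAG inside a reused session without killing the YARN application. A sketch, assuming Hive 3.0+ and a hypothetical query ID (take the real one from the HiveServer2 web UI or logs):
-- run from any HiveServer2 session; kills just this query, not the shared application
KILL QUERY "hive_20180101123456_1234abcd-12ab-34cd-56ef-1234567890ab";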

Related

How to kill a Hive query without knowing the application ID?

My HiveServer2 lists a few running jobs, so I can find the various query_ids.
But there is no YARN application information on the YARN 8088 page.
My question is how to kill a running job.
If you are using YARN as the resource manager, you can find all jobs by running the following in a shell:
yarn application -list -appStates ALL
You can change ALL to RUNNING, etc., depending on which application states you are interested in seeing.
An alternative command that lists running MapReduce jobs is:
mapred job -list
To kill a specific application, with YARN you can run:
yarn application -kill <application_id>
or, for a MapReduce job:
mapred job -kill <job_id>
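If you do not know the application ID up front, you can filter the listing by name and kill every match in one pipeline. A minimal sketch, assuming GNU xargs and that the Hive query's YARN application name contains HIVE- (adjust the grep pattern to your job names):
# list running apps, keep the ones whose name matches, kill each by its application id
yarn application -list -appStates RUNNING | grep "HIVE-" | awk '{print $1}' | xargs -r -n1 yarn application -kill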

Setting a job name for a Tez job in Beeline and viewing it in YARN

I'm using Beeline and would like to set a specific name for a Tez job, the way I use mapreduce.job.name for a MapReduce job. I tried hive.query.name, but it makes no difference in yarn application -list.
Some say the name is only visible in the Tez UI, but I only have access to YARN. Please help me.
I have a load script running in Beeline with Tez as the execution engine,
and when I try to see the active applications in YARN with the yarn application -list command, I get something like HIVE-<UUID> as the job name.
I would like to change it to something more readable.
I can do this when the execution engine is MR with the SET mapreduce.job.name=myJobName command.
I want a similar command for the Tez engine; as I already said, SET hive.query.name=myJobName does not seem to work.
Try setting the session ID instead:
set hive.session.id=myJobName;
Or start Hive with the --hiveconf parameter:
hive --hiveconf hive.session.id=myJobName -f "myscript.hql"
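The same setting works from Beeline, where the question started. A sketch, assuming a hypothetical HiveServer2 at localhost:10000:
# pass the session id on the Beeline command line so the YARN application name reflects it
beeline -u "jdbc:hive2://localhost:10000" --hiveconf hive.session.id=myJobName -f myscript.hql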

How to know a Spark application's parent application in an Oozie Spark action

When I use Oozie's Spark action to launch a Spark application, Oozie first launches a MapReduce application (the launcher), and that MapReduce job then launches the Spark application. How can I know which MapReduce task launched a given Spark application?
So far I can see that the MapReduce application is named with some Oozie information, like oozie:launcher:T=spark:W=JavaWordCount:A=spark-test:ID=0000023-171207132348866-oozie-oozi-W, and the Spark application has an application tag like oozie-6e83d420c018bc0f63bccd19fe73b24f. But I still don't know how to associate them.
You can get the Spark application ID by using the YARN client.
To show all application IDs and find the oozie:launcher MapReduce application ID, run the following command:
yarn application -list
Then you can get the Spark application ID from the logs of the oozie:launcher MapReduce application like this:
yarn logs -applicationId $APPID | grep "Submitted application" | awk '{print $NF}'
Replace $APPID with the MapReduce application ID that launched the Spark application.
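Putting the two steps together, a small shell sketch; the oozie:launcher and W=JavaWordCount filters come from the launcher name above, so adjust them to your workflow:
# find the Oozie launcher's application id (ALL, since the launcher may already have finished)
LAUNCHER_ID=$(yarn application -list -appStates ALL | grep "oozie:launcher" | grep "W=JavaWordCount" | awk '{print $1}' | head -n1)
# the launcher's logs record which Spark application it submitted
SPARK_ID=$(yarn logs -applicationId "$LAUNCHER_ID" | grep "Submitted application" | awk '{print $NF}')
echo "launcher $LAUNCHER_ID -> spark $SPARK_ID"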

How to run Sqoop and Spark Streaming jobs together

I have a problem with Sqoop and Spark Streaming jobs running together.
When I start a Spark Streaming job and then Sqoop, the Sqoop job stays in the ACCEPTED state and cannot start. However, after killing the Spark job, the Sqoop job runs properly.
I really don't know what the problem is.
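A job that sits in ACCEPTED until another job is killed usually means the long-running Spark Streaming job is holding all of the YARN queue's containers, leaving none for Sqoop. One possible workaround, with purely illustrative numbers and a hypothetical class and jar name, is to cap the streaming job's footprint at submit time so capacity remains free:
# bound the streaming job so the queue keeps headroom for other applications
spark-submit --master yarn --deploy-mode cluster \
  --num-executors 2 --executor-cores 1 --executor-memory 2g \
  --class com.example.StreamingApp streaming-app.jar
Alternatively, place the two jobs in separate YARN scheduler queues so neither can starve the other.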

How to kill a mapred job started by Hive?

I'm working with CDH 5.1 now. It starts normal Hadoop jobs via YARN, but Hive still works with mapred. Sometimes a big query hangs for a long time and I want to kill it.
I can find this big job in the JobTracker web console, but it doesn't provide a button to kill it.
Another way is to kill it from the command line. However, I couldn't find the running job from the command line.
I have tried two commands:
yarn application -list
mapred job -list
How can I kill a big query like this?
You can get the job ID from the Hive CLI when you run a job, or from the web UI. You can also list the job IDs using the application ID from the ResourceManager. Ideally, you should get everything from
mapred job -list
or
hadoop job -list
Using the job ID, you can kill it with the command below:
hadoop job -kill <job_id>
Another alternative would be to kill the application using
yarn application -kill <application_id>
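The MapReduce job ID and the YARN application ID share the same numeric suffix, so you can translate between the two tools. A sketch with hypothetical IDs:
# list running MapReduce jobs and note the JobId column
mapred job -list
# kill by job id
mapred job -kill job_1407000000000_0042
# the equivalent YARN application id just swaps the job_ prefix for application_
yarn application -kill application_1407000000000_0042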
