I'm running Spark applications on YARN. When I kill a job using:
yarn application -kill application_XYZ
I cannot get to the Spark job UI of the killed application from the Hadoop UI (ResourceManager). When I open the Spark history server directly and display the incomplete applications, the logs show up fine. When a job completes (is not killed), its logs can be reached this way: Hadoop UI -> Spark history server. I'm using the YARN log aggregation service to aggregate logs. I can also access the application logs using:
yarn logs -applicationId application_XYZ
Have you experienced the same behaviour when you kill a Spark application? Is there anything wrong with killing an application this way?
There is nothing wrong with killing the application like that. And yes, the Hadoop UI does not show the output of killed jobs, but as you mentioned, you can still see it in the logs on the master.
Related
I'm running a large Spark job (about 20 TB read in and stored to HDFS) alongside Hadoop. The Spark console shows the job as complete, but Hadoop still thinks the job is running: both its console and its logs keep reporting 'running'.
How long should I wait before I start worrying?
Try stopping the SparkContext cleanly. If you haven't closed it, call its stop() method at the end of the job, for example:
sc.stop()
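If the job can throw before it reaches sc.stop(), the context is never stopped. A minimal Python sketch, assuming PySpark, that guarantees stop() runs on every exit path; the helper name `stopping` and the HDFS paths are hypothetical, and the helper accepts any object with a stop() method:

```python
from contextlib import contextmanager


@contextmanager
def stopping(ctx):
    """Yield ctx, then call ctx.stop() even if the body raises.

    `stopping` is a hypothetical helper, not part of the Spark API; in a
    real PySpark job ctx would be a SparkContext.
    """
    try:
        yield ctx
    finally:
        # Runs on normal exit and when an exception propagates, so Spark
        # always gets the chance to shut down cleanly.
        ctx.stop()


# Usage with PySpark (requires pyspark; paths are hypothetical):
# from pyspark import SparkContext
# with stopping(SparkContext(appName="my-job")) as sc:
#     sc.textFile("hdfs:///input").saveAsTextFile("hdfs:///output")
```

PySpark's SparkContext also supports the with statement directly, which amounts to the same try/finally.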
My Hadoop job has been running for over 10 hours, but since I put it in the wrong queue, its containers keep getting killed by the scheduler.
How do I change the queue of a currently running Hadoop job without restarting it?
Thank you
If you are running YARN, you can change the running job's queue with:
yarn application -movetoqueue <app_id> -queue <queue_name>
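The same move can be requested over the ResourceManager REST API, which is handy from scripts. A minimal Python sketch that only builds the PUT request for the Cluster Application Queue endpoint (`/ws/v1/cluster/apps/{appid}/queue`); it assumes your Hadoop version exposes that endpoint, that your scheduler supports moving applications, and that the cluster is unsecured (a Kerberized cluster needs SPNEGO authentication). The host, application ID and queue name are hypothetical:

```python
import json
import urllib.request


def move_queue_request(rm_url, app_id, queue):
    """Build (but do not send) a PUT request asking the ResourceManager
    to move a running application to another queue."""
    url = f"{rm_url.rstrip('/')}/ws/v1/cluster/apps/{app_id}/queue"
    body = json.dumps({"queue": queue}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical host and IDs; on a real cluster send it with
# urllib.request.urlopen(req).
req = move_queue_request("http://rm-host:8088", "application_1234_0001", "long_running")
```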
When I run a Hadoop job with the hadoop application, it prints a lot of output. Among other things, it shows the relative progress of the job ("map: 30%, reduce: 0%" and the like). But when I run a job without the application, it does not print anything, not even errors. Is there a way to get that level of logging without the application? That is, without running [hadoop_folder]/bin/hadoop jar <my_jar> <indexer> <args>....
You can get this information from the Application Master (assuming you use YARN and not MR1, where you would get it from the JobTracker). There is usually a web UI where you can find this information. Details will depend on your Hadoop installation / distribution.
For Hadoop v1, check the JobTracker web UI; for Hadoop v2, check the Application Master web UI.
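On Hadoop v2, the MapReduce Application Master also serves the same map/reduce percentages over REST, reachable through the ResourceManager proxy at `http://<rm-host>:8088/proxy/<app_id>/ws/v1/mapreduce/jobs` (8088 is the default ResourceManager web port; adjust for your distribution). A minimal Python sketch that turns such a response into console-style progress lines; the sample body is illustrative and trimmed to the fields used:

```python
import json


def format_progress(jobs_json):
    """Turn a MapReduce AM /ws/v1/mapreduce/jobs response into progress lines."""
    data = json.loads(jobs_json)
    # Guard against a missing or null "jobs"/"job" field.
    jobs = (data.get("jobs") or {}).get("job") or []
    return [
        f"{j['id']}: map {j['mapProgress']:.0f}%, reduce {j['reduceProgress']:.0f}%"
        for j in jobs
    ]


# Illustrative response body (not real output), trimmed to the fields used.
sample = '{"jobs": {"job": [{"id": "job_1234_0001", "mapProgress": 30.0, "reduceProgress": 0.0}]}}'
for line in format_progress(sample):
    print(line)  # job_1234_0001: map 30%, reduce 0%
```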
I'm running Spark 1.3.1 on YARN and EMR. When I run spark-shell, everything looks normal until I start seeing messages like INFO yarn.Client: Application report for application_1439330624449_1561 (state: ACCEPTED). These messages are generated endlessly, once per second. Meanwhile, I am unable to use the Spark shell.
I don't understand why this is happening.
Seeing (near) endless ACCEPTED messages from YARN has always been a sure sign that there were not enough cluster resources to allocate for my Spark jobs / shell. YARN will keep trying to schedule your Spark application, but it will eventually time out if not enough resources become available within a certain amount of time.
Are you passing any command-line options to spark-shell that override the defaults? When I ask for too many executors/cores/memory, YARN accepts my request but never transitions to a RUNNING ApplicationMaster.
Try running spark-shell with no options (other than perhaps --master yarn) and see if it gets past ACCEPTED.
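To see whether a request is realistic, it helps to estimate the container sizes Spark asks YARN for. A rough Python sketch, assuming an overhead of max(10% of executor memory, 384 MiB) (the factor has differed between Spark versions; check `spark.yarn.executor.memoryOverhead` in the docs for yours), rounding up to a hypothetical `yarn.scheduler.minimum-allocation-mb` of 1024, and a hypothetical 1024 MiB ApplicationMaster container:

```python
import math


def executor_container_mb(executor_mb, overhead_factor=0.10, min_overhead_mb=384,
                          min_allocation_mb=1024):
    """Estimate the YARN container size requested for one executor.

    Assumes overhead = max(overhead_factor * executor memory, 384 MiB) and
    that the scheduler rounds each request up to a multiple of
    yarn.scheduler.minimum-allocation-mb.
    """
    overhead = max(int(executor_mb * overhead_factor), min_overhead_mb)
    requested = executor_mb + overhead
    return math.ceil(requested / min_allocation_mb) * min_allocation_mb


def total_request_mb(num_executors, executor_mb, am_container_mb=1024):
    """Total memory the application asks for: executors plus the AM
    (am_container_mb is a hypothetical default)."""
    return num_executors * executor_container_mb(executor_mb) + am_container_mb


# Hypothetical request: 10 executors with 4 GiB each.
print(executor_container_mb(4096))  # 5120
print(total_request_mb(10, 4096))   # 52224
```

If a single container exceeds `yarn.scheduler.maximum-allocation-mb`, or the total is far above the free memory shown in the ResourceManager UI, the request is unlikely to be satisfied as is.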
I realized there were a couple of streaming jobs I had killed in the terminal that were somehow still running. I found them in the UI that lists all running applications on YARN (I wasn't able to execute Hive queries either). Once I killed the jobs using the command below, spark-shell started as usual.
yarn application -kill application_1428487296152_25597
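Lingering applications like these can also be found from a script. The CLI equivalent is `yarn application -list -appStates RUNNING,ACCEPTED`; over REST, the ResourceManager serves the same list at `http://<rm-host>:8088/ws/v1/cluster/apps?states=RUNNING,ACCEPTED`. A minimal Python sketch that filters such a response by application type; the sample response and application names are made up:

```python
import json


def lingering_apps(apps_json, app_type=None):
    """Return (id, name, state) tuples from a ResourceManager
    /ws/v1/cluster/apps response, optionally filtered by applicationType."""
    data = json.loads(apps_json)
    # "apps" can be null when nothing matches the query, so guard against it.
    apps = (data.get("apps") or {}).get("app") or []
    return [
        (a["id"], a["name"], a["state"])
        for a in apps
        if app_type is None or a.get("applicationType") == app_type
    ]


# Made-up response for illustration.
sample = json.dumps({"apps": {"app": [
    {"id": "application_1428487296152_25597", "name": "old-stream",
     "state": "RUNNING", "applicationType": "SPARK"},
    {"id": "application_1428487296152_25601", "name": "etl",
     "state": "ACCEPTED", "applicationType": "MAPREDUCE"},
]}})
print(lingering_apps(sample, "SPARK"))
```

Each ID found this way can then be passed to yarn application -kill.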
I suspect YARN does not have enough resources to run the jobs.
Please check
https://www.cloudera.com/documentation/enterprise/5-3-x/topics/cdh_ig_yarn_tuning.html
to calculate how many resources you can provide to YARN.
In particular, check the number of cores and the amount of RAM, which are controlled by the following properties:
yarn.nodemanager.resource.cpu-vcores
yarn.nodemanager.resource.memory-mb
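These two properties cap what a single NodeManager can hand out, so together with the container size they bound how many containers fit on each node. A small Python sketch of that arithmetic with hypothetical node and container sizes; whether vcores count at all depends on the scheduler's resource calculator, which is an assumption to check for your cluster:

```python
def containers_per_node(node_memory_mb, node_vcores, container_mb,
                        container_vcores=1, count_vcores=True):
    """How many containers of a given size one NodeManager can host.

    node_memory_mb / node_vcores correspond to
    yarn.nodemanager.resource.memory-mb and yarn.nodemanager.resource.cpu-vcores.
    Set count_vcores=False for a scheduler that only accounts for memory
    (e.g. the CapacityScheduler's default resource calculator).
    """
    by_memory = node_memory_mb // container_mb
    if not count_vcores:
        return by_memory
    return min(by_memory, node_vcores // container_vcores)


# Hypothetical node: 24 GiB and 8 vcores given to YARN, 2 GiB / 1 vcore containers.
print(containers_per_node(24576, 8, 2048))                      # 8
print(containers_per_node(24576, 8, 2048, count_vcores=False))  # 12
```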
Can someone tell me how to kill a container? I see nodes still running containers even after the application has finished, and I want to know the command to kill them. Because of this issue, my subsequent applications stay in the ACCEPTED state.
Thanks
hadoop job -list
This lists the running jobs along with their job IDs.
To kill a job:
hadoop job -kill <JobID>
If the YARN application has finished and some containers are still running, I'd say this is a bug somewhere. Is this an MR app? I don't think there are any commands to kill containers, and in any case they should be handled by the NodeManager: the ResourceManager and NodeManager should kill all containers when the application finishes.
You didn't provide any info on what this app is, the Hadoop version, operating system, etc. Having said that, I once had a problem on my Ubuntu hosts, which were affected by the HADOOP-9752 bug that prevented the NodeManager from killing a container.
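To check whether a node really holds containers of finished applications, you can compare the node's container list (NodeManager web UI, or `http://<nm-host>:8042/ws/v1/node/containers` over REST) with the applications the ResourceManager reports as finished. A container ID embeds its application ID, so the matching is plain string work. A minimal Python sketch; the IDs are hypothetical, and it only identifies such containers without killing anything:

```python
import re

# container_<clusterTimestamp>_<appSeq>_<attempt>_<containerSeq>, optionally
# with an "e<epoch>_" segment after "container_" on newer Hadoop versions.
_CONTAINER_RE = re.compile(r"^container_(?:e\d+_)?(\d+)_(\d+)_\d+_\d+$")


def app_id_of(container_id):
    """Derive the application ID a YARN container belongs to."""
    m = _CONTAINER_RE.match(container_id)
    if m is None:
        raise ValueError(f"not a container ID: {container_id!r}")
    cluster_ts, app_seq = m.groups()
    return f"application_{cluster_ts}_{app_seq}"


def orphaned(container_ids, finished_app_ids):
    """Containers whose application is already reported as finished."""
    finished = set(finished_app_ids)
    return [c for c in container_ids if app_id_of(c) in finished]
```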