How to get workdir and more of a completed job in slurm? - cluster-computing

Using scontrol makes it very easy to get the WorkDir, StdErr, StdOut and Command of a running or queued job. Is it possible to access this information after a job has completed? I could not find any hints in the documentation of sacct.

sacct will not report that information. If your cluster has the Elasticsearch job completion plugin configured, that information, as well as the full submitted script, will be stored in Elasticsearch.
The jobcomp/filetxt JobCompType plugin will also store the WorkDir but not the other fields.
You can also use an EpilogSlurmctld script to store all the data you want in a file.
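As an illustration of that last approach, a minimal sketch could call scontrol while the job record is still known to the controller and append the fields of interest to a file. The script path, log directory and the availability of SLURM_JOB_ID in the epilog environment are assumptions to check against your Slurm version:
#!/bin/bash
# Hypothetical EpilogSlurmctld script, e.g. /etc/slurm/epilog_slurmctld.sh,
# referenced from slurm.conf as EpilogSlurmctld=/etc/slurm/epilog_slurmctld.sh.
# At this point the job record should still be visible to scontrol, so we
# capture the fields that are no longer available once the job ages out.
LOGDIR=/var/log/slurm/job_info          # assumed location, adjust as needed
scontrol show job "$SLURM_JOB_ID" \
    | grep -E 'WorkDir|StdErr|StdOut|Command' \
    >> "${LOGDIR}/${SLURM_JOB_ID}.info"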

Related

See print in python script running on spark with spark-submit

I have to test some code using Spark and I'm pretty new to it.
The code I have runs an ETL script on a cluster. The ETL script is written in Python and has several prints in it, but I'm unable to see those prints. The Python script is passed to spark-submit with the --py-files flag. I don't know if those prints are unreachable because they happen in the YARN executors, and whether I should change them to logs with log4j or add them to an accumulator reachable by the driver.
Any suggestions would help.
The final goal is to see how the execution of the code is going. I don't know if simple prints are the best solution, but that was already in the code I was given to test.
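One hedged note on where those prints end up: print() calls executed inside the executors go to each executor's stdout on the YARN nodes, so if log aggregation is enabled on the cluster they can usually be pulled back after the run with the yarn CLI (the application ID below is a placeholder):
# fetch the aggregated container logs for the finished application; the
# stdout sections contain the print() output from the Python code that
# ran inside the executors
yarn logs -applicationId <application id> | less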

Check whether the job is completed or not through unix

I have to run multiple Spark jobs one by one in a sequence, so I am writing a shell script. One way I can do this is to check for a success file in the output folder for the job status, but I want to know whether there is another way to check the status of a spark-submit job from the Unix script where I am running my jobs.
You can use the command
yarn application -status <APPLICATION ID>
where <APPLICATION ID> is your application ID, and check for a line like:
State : RUNNING
This will give you the status of your application.
To see the list of applications run via YARN, you can use the command
yarn application -list
You can also add -appTypes to limit the listing based on the application type.
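For the sequential shell script, a minimal polling sketch along those lines might look like the following. The APP_ID variable, the sleep interval and the exact spacing of the status output are assumptions that may need adjusting for your Hadoop version:
# wait until YARN reports a final state for the application, then branch
# on whether it succeeded; $APP_ID is assumed to hold the application ID
while yarn application -status "$APP_ID" 2>/dev/null | grep -q 'Final-State : UNDEFINED'; do
    sleep 30
done
if yarn application -status "$APP_ID" | grep -q 'Final-State : SUCCEEDED'; then
    echo "Application $APP_ID succeeded, starting the next spark-submit"
else
    echo "Application $APP_ID failed or was killed" >&2
    exit 1
fi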

How to get the job id of a specific running hadoop jobs

I need to get the id of a specific hadoop job.
In my case, I launch a Sqoop command remotely and I want to verify the job status with this command:
hadoop job -status job_id | grep -w 'state'
I can get this information from the GUI, but I want to do it from the command line.
Can anyone help me?
You can use the YARN REST APIs, via your browser or with curl from the command line. They will list all the currently running and previously run jobs, including Sqoop jobs and the MapReduce jobs that Sqoop generates and executes. Use the UI first: if you have it up and running, just point your browser to http://<host>:8088/cluster (not sure if the port is the same on all Hadoop distributions; I believe 8088 is the default on Apache). Alternatively, you can use yarn commands directly, e.g. yarn application -list.
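As a sketch of both approaches (the host, port and the 'sqoop' filter are assumptions to adapt to your cluster):
# pick the application ID of a running Sqoop job out of the yarn listing;
# the first column of yarn application -list is the Application-Id
yarn application -list | grep -i 'sqoop' | awk '{print $1}'
# the same information is exposed by the ResourceManager REST API, e.g.:
curl -s "http://<host>:8088/ws/v1/cluster/apps?states=RUNNING"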

View Log from Map/Reduce Task

I know that I can find the map/reduce task logs inside /usr/local/hadoop/logs/userlogs/.
Is there a friendlier way to see them?
For example, when I open http://127.0.0.1:8088/cluster/, I can see all the jobs executed on the cluster. Then I click on a FINISHED job. But when I try to click on Tracking URL: History, it gives me an error. Why can't I see the task logs from there?
I would like to see the stderr, stdout and syslog from each task.
Try using the Job Browser in Hue,
or use the command
yarn logs -applicationId [OPTIONS]
General options are:
-appOwner <AppOwner>        AppOwner (assumed to be current user if not specified)
-containerId <ContainerId>  ContainerId (must be specified if node address is specified)
-nodeAddress <NodeAddress>  NodeAddress in the format nodename:port (must be specified if container id is specified)
Example: yarn logs -applicationId application_1414530900704_0007
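If log aggregation is enabled on the cluster, one convenient way to browse the per-task stdout, stderr and syslog after the job has finished is to dump the aggregated output to a file and search it (a sketch; the exact section headers can vary between Hadoop versions):
# save the aggregated logs for the application, then locate the per-container
# log sections (stdout, stderr, syslog) inside the dump
yarn logs -applicationId application_1414530900704_0007 > app_logs.txt
grep -n 'LogType' app_logs.txt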

Error: Jobflow entered COMPLETED while waiting to ssh

I just started to practice AWS EMR.
I have a sample word-count application set-up, run and completed from the web interface.
Following the guideline here, I have setup the command-line interface.
so when I run the command:
./elastic-mapreduce --list
I receive
j-27PI699U14QHH     COMPLETED     ec2-54-200-169-112.us-west-2.compute.amazonaws.com     Word count
COMPLETED Setup hadoop debugging
COMPLETED Word count
Now, I want to see the log files. I run the command
./elastic-mapreduce --ssh --jobflow j-27PI699U14QHH
Then I receive the following error:
Error: Jobflow entered COMPLETED while waiting to ssh
Can someone please help me understand what's going on here?
Thanks,
When you set up a job on EMR, Amazon provisions a cluster on demand for you for a limited amount of time. During that time, you are free to ssh to your cluster and look at the logs as much as you want, but once your job has finished running, your cluster is taken down! At that point, you won't be able to ssh anymore because your cluster simply won't exist.
The workflow typically looks like this:
Create your jobflow
It will be in status STARTING for a few minutes. At that point, if you try to run ./elastic-mapreduce --ssh --jobflow <jobid>, it will simply wait because the cluster is not available yet.
After a while the status will switch to RUNNING. If you had already started the ssh command above it should automatically connect you to your cluster. Otherwise you can initiate your ssh command now and it should connect you directly without any wait.
The RUNNING step could take a while or be very short, depending on the amount of data you're processing and the nature of your computations.
Once all your data has been processed, the status will switch to SHUTTING_DOWN. At that point, if you already sshed before you will get disconnected. If you try to use the ssh command at that point, it will not connect.
Once the cluster has finished shutting down it will enter a terminal state of either COMPLETED or FAILED depending on whether your job succeeded or not. At that point your cluster is no longer available, and if you try to ssh you will get the error you are seeing.
Of course there are exceptions: you could set up an EMR cluster in interactive mode, for example if you just want to have Hive set up, ssh there and run Hive queries, and then take the cluster down manually. But if you just want a MapReduce job to run, you will only be able to ssh for the duration of the job.
That being said, if all you want to do is debugging, there is no need to ssh in the first place! When you create your jobflow, you have the option to enable debugging, so you could do something like this:
./elastic-mapreduce --create --enable-debugging --log-uri s3://myawsbucket
What that means is that all the logs for your job will be written to the specified S3 bucket (you have to own this bucket, of course, and have permission to write to it). If you do that, you can also go into the AWS console afterwards, in the EMR section, and you will see a Debug button next to your job, which should make your life much easier.
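If you prefer to read the raw logs instead of using the console debugger, a sketch with the AWS CLI could look like this. The bucket name comes from the command above; EMR typically writes the logs under a prefix named after the jobflow ID, but check the exact layout in your bucket:
# list and download the log files that EMR wrote for the jobflow
aws s3 ls --recursive s3://myawsbucket/j-27PI699U14QHH/
aws s3 cp --recursive s3://myawsbucket/j-27PI699U14QHH/ ./emr-logs/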
