AutoSys file watcher job - job-scheduling

The task is to create a file watcher job in AutoSys that watches for a particular file.
The requirement is that the file arrives at 9:00 am every day and the file watcher job starts running by 8:50 am. If the file is received by 10:00 am, the job should terminate successfully; otherwise an alert email (through an SSIS package, another AutoSys job) should be triggered.
I'm using AutoSys (Windows).
I'm not sure how to tell the file watcher job to start looking for the file around 8:50 am, stop looking at 10:00 am, and, if the file has not been received by 10 am, trigger another AutoSys job. How do I set this up?
Any help would be much appreciated.
Thanks,
Cindy!!

For the first job (the file watcher, job_type: FW):
start_times: "08:50"
term_run_time: 70
For the second job (the alert, e.g. the AutoSys job that runs the SSIS package):
condition: failure(first_job)
With start_times the watcher begins looking for the file at 08:50. term_run_time: 70 terminates it 70 minutes later, i.e. at 10:00, if the file still has not arrived; if the file shows up before then, the watcher completes successfully. The second job only starts when the first one does not end in success, which is what triggers the alert email. Note that, depending on your AutoSys setup, a job killed by term_run_time may end with status TERMINATED rather than FAILURE; if so, use terminated(first_job) (t(first_job)) in the condition instead of failure(first_job).

Related

Cron job gets file from server, what should I add to the script to have it check again in 15 minutes if the file is unchanged?

This Crontab Day of the Week syntax does not provide a solution.
I have a cron job set up, using wget to download a file and generate a report.
What should I do to amend the script so that, if the file on the server hasn't been updated yet, it retries the job after 15 minutes?
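One possible approach (just a sketch, not from the original thread; the URL, file paths and report command below are placeholders): keep the cron entry as it is, but have the script compare a checksum of the freshly downloaded file with the previous run's checksum, and if nothing has changed, schedule a retry in 15 minutes with at(1):
#!/bin/bash
# Hypothetical retry wrapper; URL, paths and report script are placeholders.
URL="http://example.com/data.csv"
DOWNLOAD=/tmp/data.csv
SUM_FILE=/tmp/data.csv.md5

wget -q -O "$DOWNLOAD" "$URL" || exit 1
NEW_SUM=$(md5sum "$DOWNLOAD" | cut -d' ' -f1)

if [ -f "$SUM_FILE" ] && [ "$NEW_SUM" = "$(cat "$SUM_FILE")" ]; then
    # File unchanged on the server: try this same wrapper again in 15 minutes.
    echo "/home/user/fetch_and_report.sh" | at now + 15 minutes
    exit 0
fi

echo "$NEW_SUM" > "$SUM_FILE"
/home/user/generate_report.sh "$DOWNLOAD"   # hypothetical report step
The at(1) route avoids tightening the cron schedule itself; if at is not available, running the cron entry every 15 minutes and letting the unchanged-checksum branch exit early achieves the same effect.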

After triggering a Jenkins job remotely via a Bash script, when should I retrieve the job id?

I already built a script, trigger_jenkins_job.sh, which works perfectly fine for now. It's composed mainly of 3 functions:
input_checkpoint
run_remotejob   # runs the Jenkins job remotely using the JSON API
sleep 10        # 10 sec estimated time until the pending period is over and the
                # Jenkins job starts running, i.e. a given slave has been assigned to run the job
get_buildID     # retrieves the build state, the last build ID and the last stable build ID using ...
The problem is that I want to get rid of that sleep 10 seconds, and at the same time I want to be sure, before executing the function get_buildID, that the remotely-triggered job is actually running on a node.
That way I will be retrieving the triggered job's ID, and not that of the last one in the queue before triggering the job.
Regarding the Jenkinsfile of the job, I specified:
agent {
    label 'linux-node'
}
So, I guess the question is: I need, somehow from my bash script, to test whether linux-node is running the remotely-triggered job, and if so, execute the function get_buildID.
Get rid of the sleep command and use the wait command.
If you are triggering the job with a token, the command itself should return the build number.
Another way could be the REST API. Please see the "nextBuildNumber" field there (if the build is still pending), else "number".
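A sketch of the REST API suggestion (Jenkins URL, job name and credentials below are placeholders): when a build is triggered over the remote API, the response's Location header points at a queue item, and polling that queue item until it exposes an executable yields the build number of the run you actually triggered, which removes the need for the fixed sleep 10:
#!/bin/bash
# Hypothetical values; replace with your Jenkins URL, job name and credentials.
JENKINS_URL="http://jenkins.example.com"
JOB="my_job"
AUTH="user:api_token"

# The trigger response's Location header points at the queue item.
QUEUE_URL=$(curl -s -u "$AUTH" -X POST -D - -o /dev/null \
    "$JENKINS_URL/job/$JOB/build" \
    | awk 'tolower($1)=="location:" {print $2}' | tr -d '\r')

# Poll the queue item until Jenkins reports an executable (the running build).
while true; do
    BUILD_NUMBER=$(curl -s -u "$AUTH" "${QUEUE_URL}api/json" \
        | grep -o '"number":[0-9]*' | head -1 | cut -d: -f2)
    [ -n "$BUILD_NUMBER" ] && break
    sleep 2
done

echo "Triggered build #$BUILD_NUMBER is now running on its node."
Once the loop exits, get_buildID can be called safely, because the queue item only gains an executable after a node (e.g. linux-node) has picked the job up.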

How to get exceptions, errors and logs for a Hive/Sqoop-based batch job?

I have a Hadoop cluster with 6 datanodes and 1 namenode. I have a few (4) Hive jobs which run every day and push some data from log files to our OLTP database using Sqoop. I do not have Oozie installed in the environment. Everything is written in Hive script files (.sql files) and I run them from Unix shell scripts (.sh files). Those shell script files are attached to different OS cron jobs so they run at different times.
Now the requirement is this:
Generate a log/status for each job separately on a daily basis, so that at the end of the day, by looking at those logs, we can identify which jobs ran successfully and how long they took, and which jobs failed, along with the dump/stack trace for the failed jobs. (The future plan is that we will have a mail server, and the shell script of every failed or successful job will send mail to the respective stakeholders with the log/status file as an attachment.)
Now my problem is: how can I capture errors/exceptions when I run those batch jobs/shell scripts, and how can I also generate a success log with the execution time?
I tried to get the output of each query run in Hive into a text file by redirecting the output, but that is not working.
for example :
Select * from staging_table;>>output.txt
Is there any way to do this by configuring the Hive log for each and every Hive job on a day-to-day basis?
Please let me know if anyone has faced this issue and how I can resolve it.
Select * from staging_table;>>output.txt
If redirecting output is what you are looking for, then below is the way to do it from the console:
hive -e 'Select * from staging_table' > /home/user/output.txt
This will simply redirect the output. It won't display job-specific log information.
However, I am assuming that you are running on YARN; if you are expecting to see application- (job-) specific logs, please see this:
Resulting log file locations:
During run time you will see all the container logs in ${yarn.nodemanager.log-dirs}.
Using the UI you can see the logs at the job level and at the task level.
The other way is to look at and dump application/job-specific logs from the command line:
yarn logs -applicationId your_application_id
Please note that the yarn logs -applicationId <application_id> method is preferred, but it does require log aggregation to be enabled first.
Also, see a much better explanation here.
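On the shell-script side, a minimal sketch of the per-job log/status idea from the question (job name, paths and mail address are placeholders, and the mail step assumes a mail command is configured as per the future plan above): wrap each Hive invocation so stdout and stderr go to a dated log file, record the exit status and the run time, and mail the log on failure:
#!/bin/bash
# Hypothetical wrapper for one daily Hive job; names and paths are placeholders.
JOB_NAME="job1"
LOG_DIR="/var/log/hive_jobs"
LOG_FILE="$LOG_DIR/${JOB_NAME}_$(date +%F).log"
mkdir -p "$LOG_DIR"

START=$(date +%s)
# -v echoes each statement, so the log shows which query failed.
hive -v -f /home/user/hql/${JOB_NAME}.sql >> "$LOG_FILE" 2>&1
STATUS=$?
END=$(date +%s)

echo "job=$JOB_NAME status=$STATUS duration=$((END - START))s" >> "$LOG_FILE"

if [ "$STATUS" -ne 0 ]; then
    # The log already contains the Hive error and stack trace from stderr.
    mail -s "$JOB_NAME FAILED on $(date +%F)" team@example.com < "$LOG_FILE"
fi
The exit status of hive -f is non-zero when a statement fails, so the same wrapper covers both the success log (with duration) and the failure alert; the YARN-level logs described above remain the place to dig for container-level details.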

Script didn't finish execution but cron job started again

I am trying to run a cron job which executes my shell script; my shell script contains Hive and Pig scripts. I am setting the cron job to run every 2 minutes, but my cron job starts again before my shell script has finished. Is that going to affect my result, or will the next run only start once the script finishes its execution? I am in a bit of a dilemma here. Please help.
Thanks
I think there are two ways to resolve this, a long way and a short way:
Long way (probably most correct):
Use something like Luigi to manage job dependencies, then run that with cron (it won't run more than one instance of the same job).
Luigi will handle all your job dependencies for you and you can make sure that a particular job only executes once. It's a little more work to get set-up, but it's really worth it.
Short Way:
Lock files have already been mentioned, but you can do this on HDFS too, that way it doesn't depend on where you run the cron job from.
Instead of checking for a lock file, put a flag on HDFS when you start and finish the job, and have this as a standard thing in all of your cron jobs:
# at start
hadoop fs -touchz /jobs/job1/2016-07-01/_STARTED
# at finish
hadoop fs -touchz /jobs/job1/2016-07-01/_COMPLETED
# Then check them (pseudocode: if(!started && !completed): run_job; add_completed; remove_started),
# which in shell looks roughly like:
if ! hadoop fs -test -e /jobs/job1/2016-07-01/_STARTED && \
   ! hadoop fs -test -e /jobs/job1/2016-07-01/_COMPLETED; then
    run_job
    hadoop fs -touchz /jobs/job1/2016-07-01/_COMPLETED
    hadoop fs -rm /jobs/job1/2016-07-01/_STARTED
fi
At the start of the script, have a check:
#!/bin/bash
# Here /tmp/file.lock marks "previous run completed". Create it once by hand
# before the first scheduled run, otherwise the script will always exit.
if [ -e /tmp/file.lock ]; then
    rm /tmp/file.lock   # previous run finished: remove the marker and continue
else
    exit                # no marker file: the previous execution has not completed
fi

.... # Your script here

touch /tmp/file.lock    # mark this run as completed
There are many other ways of achieving the same thing; I am just giving a simple example.

How to sync a cron job with the files it runs, which take more than 5 mins to complete, when the cron job is set for every 3 mins

I am facing a problem with the scripts I am using in my code. My cron job runs every 5 minutes, but the scripts it runs sometimes take more time, and I want my cron job to wait for those files to finish processing and then execute at the earliest interval after that. Is that possible?
Please see the example below. Kindly propose a solution. TIA.
I am running a cronjob for e.g.:
*/5 * * * * /home/Sti/New_Int/fetch_My_Data.sh
This job invokes the scripts below; here are a few details about what each script does:
fetch_Some_Data.sh --> This script just moves a few files from one location to another so that only the required files get processed.
tran.sh --> This script opens a for loop and, for each file, opens a DB connection by invoking the PostP.sh script; for processing it has a sleep time of 60 seconds.
PostP.sh --> This is a script which creates a DB connection and terminates it for each file being processed in point 2.
So can you provide a solution so that, if the files in point 2 have not been processed yet, the cron job won't run until they have?
I usually use a temporary file in such cases to indicate a running instance; while that file exists, all other instances simply exit or raise an error.
Add this logic to your shell script before doing anything else:
if [ -e /tmp/fetch_My_Data.lock ]; then    # example lock path for this script
    exit                                   # another instance is still running
else
    touch /tmp/fetch_My_Data.lock          # take the lock
fi

# ... rest of the script ...

rm /tmp/fetch_My_Data.lock                 # release the lock when the script finishes
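An alternative to a hand-rolled marker file (not part of the answer above, just a sketch): util-linux ships flock(1), which can wrap the command directly in the crontab entry so that an overlapping run exits immediately while another one still holds the lock:
# hypothetical lock path; flock creates the file if it does not exist
*/5 * * * * flock -n /tmp/fetch_My_Data.lock /home/Sti/New_Int/fetch_My_Data.sh
With -n the second invocation gives up immediately instead of queueing; drop -n if you would rather have it wait for the running instance to finish and then start.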
