Should I create SYNC jobs only in SQLake?

Should we always be just creating sync jobs as our general rule of thumb in Upsolver SQLake?

In most cases, yes, you want to use sync jobs. The only case where you don't want a sync job is when a table has an input that you don't want its readers to wait on.
Example: you have 5 jobs that write to a table and some jobs that read from that table. If you don't want the entire pipeline to get stuck when one of the 5 writers is stuck, then your pipeline needs to be unsynchronized (or at least the specific job you suspect may get stuck should be non-sync).
Note: unsync is not a keyword. CREATE JOB creates an unsync job by default; CREATE SYNC JOB creates a sync job.
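A minimal sketch of the syntax (the catalog, schema, and table names are illustrative, as is the time_filter() template from SQLake's transformation-job examples):

-- Without SYNC, CREATE JOB produces a non-sync job that never waits on upstream writers.
-- Adding SYNC makes the job wait for the upstream sync jobs feeding its source table.
CREATE SYNC JOB enrich_orders
AS INSERT INTO my_catalog.analytics.orders_enriched MAP_COLUMNS_BY_NAME
   SELECT * FROM my_catalog.raw.orders
   WHERE time_filter();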

Related

Delete a Job and wait until the job is deleted in client-go

// Delete a Batch Job by name
func (k K8sClient) DeleteBatchJob(name string, namespace string) error {
    return k.K8sCS.BatchV1().Jobs(namespace).Delete(context.TODO(), name, metav1.DeleteOptions{})
}
I am deleting a job if it already exists and then starting a new one, but the delete operation here is asynchronous: the creation of the new job begins while the old job is still being deleted, which I don't want.
I want the deletion to complete successfully before the new job is created.
How can I implement this functionality in Go?
If you give every job a unique name, you won't have to wait for the asynchronous deletion before creating a new one. This is how the CronJob controller works in Kubernetes - it creates uniquely named Jobs every time.
To find and manage the jobs, you can use labels instead of the job name.
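A minimal sketch of that pattern, reusing the K8sClient type from the question (the helper names, the "task" label key, and the timestamp suffix are illustrative choices, not a client-go API):

import (
    "context"
    "fmt"
    "time"

    batchv1 "k8s.io/api/batch/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// Create a uniquely named Job per run so a new Job never collides with one
// that is still terminating asynchronously.
func (k K8sClient) CreateUniqueBatchJob(base string, namespace string, spec batchv1.JobSpec) (*batchv1.Job, error) {
    job := &batchv1.Job{
        ObjectMeta: metav1.ObjectMeta{
            Name:      fmt.Sprintf("%s-%d", base, time.Now().UnixNano()), // unique suffix per run
            Namespace: namespace,
            Labels:    map[string]string{"task": base}, // locate runs by label, not by name
        },
        Spec: spec,
    }
    return k.K8sCS.BatchV1().Jobs(namespace).Create(context.TODO(), job, metav1.CreateOptions{})
}

// List every run of a task via its label selector instead of its name.
func (k K8sClient) ListTaskJobs(base string, namespace string) (*batchv1.JobList, error) {
    return k.K8sCS.BatchV1().Jobs(namespace).List(context.TODO(),
        metav1.ListOptions{LabelSelector: "task=" + base})
}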

How to log to the database when each Job has been dispatched in Laravel?

In my application I need to log when each job is dispatched, retried, and executed. The reason I need this log is to detect whether, on a failure, the job is re-dispatched or not.
I also need to keep the dispatch log for one month.
In other words, I need to log:
Time dispatched
Time re-dispatched
Time finished
The error that caused the re-dispatch
Where the job was re-dispatched
Is there some sort of trigger in Laravel 5.7 that allows you to do custom logging during job dispatch?
So far I have seen the jobs and failed_jobs tables, which contain only ephemeral information during job execution; after the job executes, the records are removed.
If you want to build it yourself, use Laravel's job events.
If you want a dashboard and logs for everything you mentioned, use Laravel Horizon.
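For the build-it-yourself route, a minimal sketch of queue event listeners registered in a service provider (the job_log table and its columns are hypothetical; Queue::before, Queue::after, and Queue::failing are Laravel's queue-event hooks, available in 5.7):

<?php

namespace App\Providers;

use Illuminate\Support\ServiceProvider;
use Illuminate\Support\Facades\Queue;
use Illuminate\Support\Facades\DB;
use Illuminate\Queue\Events\JobProcessing;
use Illuminate\Queue\Events\JobProcessed;
use Illuminate\Queue\Events\JobFailed;

class AppServiceProvider extends ServiceProvider
{
    public function boot()
    {
        // Fires each time a worker picks the job up, including after a
        // re-dispatch, so repeated rows with rising attempts reveal retries.
        Queue::before(function (JobProcessing $event) {
            DB::table('job_log')->insert([
                'job'       => $event->job->resolveName(),
                'event'     => 'processing',
                'attempts'  => $event->job->attempts(),
                'logged_at' => now(),
            ]);
        });

        // Fires when the job finishes successfully.
        Queue::after(function (JobProcessed $event) {
            DB::table('job_log')->insert([
                'job'       => $event->job->resolveName(),
                'event'     => 'processed',
                'logged_at' => now(),
            ]);
        });

        // Captures the exception that caused the failure/re-dispatch.
        Queue::failing(function (JobFailed $event) {
            DB::table('job_log')->insert([
                'job'       => $event->job->resolveName(),
                'event'     => 'failed',
                'error'     => $event->exception->getMessage(),
                'logged_at' => now(),
            ]);
        });
    }
}

Note that these hooks fire when a worker processes the job; to capture the original dispatch time, you could write a row from the job's constructor instead.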

Create One Time Scheduled Job, Run When Others Not Running

I want to create a scheduled job in Oracle 11g Express.
I am new to job scheduling, and my search so far points to chains, but since I want to create a job out of a function that runs irregularly at a yet-unknown date, I believe chains won't work for my case.
The job will be created when a procedure finishes, which determines its scheduled date X.
The job will perform some critical changes, which is why I don't want it to start while other regularly scheduled jobs are running.
I want it to wait until the other jobs finish.
I only want it to run once and then drop the job.
Is there a good practice for this case, or an option I have missed?
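One common approach for the run-once-then-drop part is a one-time DBMS_SCHEDULER job with auto_drop enabled, created by the procedure once date X is known; a sketch (the job, schema, and procedure names are placeholders). Keeping it from overlapping other jobs needs a separate mechanism, for example taking an exclusive DBMS_LOCK inside the job's procedure or limiting concurrency with a resource-managed job class:

BEGIN
  DBMS_SCHEDULER.CREATE_JOB(
    job_name   => 'ONE_TIME_CRITICAL_JOB',          -- placeholder name
    job_type   => 'STORED_PROCEDURE',
    job_action => 'MY_SCHEMA.CRITICAL_CHANGES',     -- placeholder procedure
    start_date => SYSTIMESTAMP + INTERVAL '1' HOUR, -- the date X you computed
    enabled    => TRUE,
    auto_drop  => TRUE                              -- no repeat_interval: runs once, then is dropped
  );
END;
/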

Hadoop schedule jobs to run sequentially (one job after the other)?

Let's say I am resource-constrained in my Hadoop environment and I don't want to schedule really long-running jobs (i.e., ones that take days to complete). I am analyzing a vast amount of past time-series data, and I want to schedule MapReduce jobs that take one day's worth of data at a time (which takes an hour to crunch).
So how do I schedule things such that a new job is submitted as soon as the previous job has completed?
If you want a quick and simple approach, you could just write a shell script that calls hadoop jar in sequence for each job you want to run.
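For example, a minimal sketch of such a script (the JAR, class, paths, and date range are placeholders; hadoop jar blocks until the submitted job finishes, so each iteration starts only after the previous one completes):

#!/bin/bash
# Submit one day's worth of data at a time; abort the loop if any job fails.
for day in 2013-01-{01..31}; do
    hadoop jar analytics.jar com.example.DailyCrunch \
        "/data/input/${day}" "/data/output/${day}" || exit 1
done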
If you want a more robust approach, you could use Apache Oozie to define a workflow that will run your jobs in sequence. If you are new to Hadoop, you may find it easiest to define and run your Oozie workflow using the Hue GUI.

Periodic hadoop jobs running (best practice)

Customers are able to upload URLs to the database at any time, and the application should process those URLs as soon as possible. So I need Hadoop jobs to run periodically, or to run a Hadoop job automatically from another application (some script identifies that new links were added, generates the data for the Hadoop job, and runs the job). For a PHP or Python script I could set up a cron job, but what is the best practice for running periodic Hadoop jobs (prepare data for Hadoop, upload the data, run the Hadoop job, and move the data back to the database)?
Take a look at Oozie, the new workflow system from Y!, which can run jobs based on different triggers. A good overview is presented by Alejandro here: http://www.slideshare.net/ydn/5-oozie-hadoopsummit2010
If you want URLs to be processed as soon as possible, you'll have to process them one at a time. My recommendation instead is to wait for some number of links (or some volume in MB of links, or some interval, for example 10 minutes or a day) and batch-process them (I do my processing daily, but that job takes a few hours).
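If you go the Oozie route, that batching interval can be expressed as a coordinator that triggers your workflow on a fixed frequency; a minimal sketch (the app path and dates are placeholders):

<coordinator-app name="daily-url-batch" frequency="${coord:days(1)}"
                 start="2012-01-01T00:00Z" end="2020-01-01T00:00Z"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.2">
  <action>
    <workflow>
      <app-path>hdfs://namenode/apps/url-crunch</app-path>
    </workflow>
  </action>
</coordinator-app>

Coordinators can also trigger on data availability, via <datasets> and <input-events>, instead of (or in addition to) a time frequency.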
