How do I configure the Wait processor along with the ListSFTP processor? Before fetching the file I need to run a scheduled job, which runs every minute, to change the file's permissions.
I am stuck in a scenario in a Spring Batch remote-partitioning job where the master started successfully but the workers failed to start. The job is deployed on AWS Batch, so the master waits indefinitely for the workers to finish since they cannot come up.
Can anyone suggest a way to handle such a scenario? I don't want my master node to wait until the timeout has occurred.
The manager can be configured with a timeout so that it fails if the workers do not reply in time, so it won't wait indefinitely.
And if that happens, the job instance will fail and you can either:
restart it (only failed partitions will be restarted)
or abandon it and start a new instance.
I have a requirement where I need to watch for the arrival of a particular file. I am planning to use an Autosys file trigger (FT) job, but I need to kick off a dependent job once the FT job detects the file. However, as per the Autosys manual, only an alarm is triggered when a file is detected. Is there a way to kick off a script once an FT job detects a file?
I'm currently building a CD pipeline that replaces an existing Google Cloud Dataflow streaming pipeline with a new one using a bash command. The old and new jobs have the same name. I wrote the bash command like this:
gcloud dataflow jobs drain "${JOB_ID}" --region asia-southeast2 && \
gcloud dataflow jobs run NAME --other-flags
The problem with this command is that the first command doesn't wait until the job finishes draining, so the second command throws an error because of the duplicate job name.
Is there a way to wait until the Dataflow job finishes draining? Or is there a better way?
Thanks!
Seeing as this post hasn't garnered any attention, I will post my comment as an answer:
Dataflow jobs are asynchronous to the command gcloud dataflow jobs run, so when you use && the only thing you are waiting on is for the command itself to finish. Since that command only starts the process (be it draining a job or running one), it returns before the job/drain actually completes.
There are a couple of ways you could wait for the job/drain to finish, both of which have some added cost:
You could use a Pub/Sub step as part of a larger Dataflow job (think of it as a parent to the jobs you are draining and running, with those jobs sending a message to Pub/Sub whenever their status changes) - you may find the cost of Pub/Sub [here].
You could set up some kind of loop that repeatedly checks the status of the job you're draining/running, likely inside a bash script (see the sketch after this list). That can be a bit more tedious and isn't as neat as a listener, and it requires your own machine/connection to stay up, or a GCE instance.
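For the second option, a minimal sketch of such a polling loop could look like the following - assuming gcloud dataflow jobs describe reports the job's currentState and that a completed drain shows up as JOB_STATE_DRAINED (the region and the NAME/flag placeholders are taken from the question):

# Drain the old streaming job
gcloud dataflow jobs drain "${JOB_ID}" --region asia-southeast2

# Poll until the drain has actually completed
while true; do
  STATE=$(gcloud dataflow jobs describe "${JOB_ID}" \
    --region asia-southeast2 --format='value(currentState)')
  if [ "${STATE}" = "JOB_STATE_DRAINED" ]; then
    break
  fi
  echo "Job is in state ${STATE}, waiting..."
  sleep 30
done

# Only now is it safe to start the replacement job with the same name
gcloud dataflow jobs run NAME --other-flags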
I am using the File Source stream component to read files from a directory and send a File instance to a custom processor that reads the file and launches a specific task using a TaskLauncher sink. If I drop 5 files into the directory, 5 tasks launch at the same time. What I am trying to achieve is to have each task executed one after the other, so I need to monitor the state of the tasks to ensure the prior task has completed before launching another. What are my options for implementing this? As a side note, I am running this on a YARN cluster.
Thanks,
-Frank
I think asynchronous task launching by the YARN TaskLauncher could be why it looks like all the tasks are launched at the same time. One possible approach you can try is a custom task launcher sink that launches the task and waits for the task's status to be complete before it starts processing the next trigger request.
Is there an event scheduler available in PostgreSQL similar to the one in MySQL?
While a lot of people just use cron, the closest thing to a built-in scheduler is PgAgent. It's a component of the pgAdmin GUI management tool. A good intro to it can be found in Setting up PgAgent and doing scheduled backups.
pg_cron is a simple, cron-based job scheduler for PostgreSQL that runs inside the database as an extension. A background worker initiates commands according to their schedule by connecting to the local database as the user that scheduled the job.
pg_cron can run multiple jobs in parallel, but it runs at most one instance of a job at a time. If a second run is supposed to start before the first one finishes, then the second run is queued and started as soon as the first run completes. This ensures that jobs run exactly as many times as scheduled and don't run concurrently with themselves.
If you set up pg_cron on a hot standby, then it will start running the cron jobs, which are stored in a table and thus replicated to the hot standby, as soon as the server is promoted. This means your periodic jobs automatically fail over with your PostgreSQL server.
Source: citusdata.com
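For illustration, a minimal sketch of enabling pg_cron and scheduling a job from the shell might look like this (assuming pg_cron is already installed and listed in shared_preload_libraries; the database name, schedule and command are placeholders):

# Create the extension in the database pg_cron is configured for
psql -d postgres -c "CREATE EXTENSION IF NOT EXISTS pg_cron;"

# Schedule a job: run VACUUM every day at 03:00
psql -d postgres -c "SELECT cron.schedule('0 3 * * *', 'VACUUM');"

# Inspect the scheduled jobs
psql -d postgres -c "SELECT jobid, schedule, command FROM cron.job;"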