I'm currently building a CD pipeline that replaces an existing Google Cloud Dataflow streaming pipeline with a new one using a bash command. The old and new jobs have the same name. I wrote the bash command like this:
gcloud dataflow jobs drain "${JOB_ID}" --region asia-southeast2 && \
gcloud dataflow jobs run NAME --other-flags
The problem with this command is that the first command doesn't wait until the job finishes draining, so the second command throws an error because of the duplicated job name.
Is there a way to wait until the Dataflow job finishes draining? Or is there a better way?
Thanks!
Seeing as this post hasn't garnered any attention, I will be posting my comment as a post:
Dataflow jobs are asynchronous to the command gcloud dataflow jobs run, so when you use && the only thing you'll be waiting on is for the command to finish. Since that command just gets the process started (be it draining a job or running one), it finishes earlier than the job/drain does.
There are a couple of ways you could wait for the job/drain to finish, both having some added cost:
You could use a Pub/Sub step as part of a larger Dataflow job (think of it as a parent to the jobs you are draining and running, with those jobs sending a message to Pub/Sub about their status once it changes) - you may find the cost of Pub/Sub [here].
You could set up some kind of loop to repeatedly check the status of the job you're draining/running, likely inside a bash script. That's a bit more tedious and not as neat as a listener, and it requires your own computer/connection (or a GCE instance) to stay up while it runs; a minimal sketch of this approach follows below.
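To illustrate the second option, here is a rough bash sketch of such a polling loop. The region, sleep interval, and replacement-job flags are placeholders taken from the question, and the currentState field and JOB_STATE_DRAINED value assume the standard Dataflow job state names; verify them against your gcloud version.
#!/usr/bin/env bash
set -euo pipefail

# JOB_ID is assumed to be set already, as in the question.
REGION="asia-southeast2"

gcloud dataflow jobs drain "${JOB_ID}" --region="${REGION}"

# Poll until the drain has actually completed; the drain keeps running server-side,
# this loop only watches it. (A real script should also bail out on failed/cancelled states.)
while true; do
  STATE="$(gcloud dataflow jobs describe "${JOB_ID}" --region="${REGION}" --format='value(currentState)')"
  echo "Current state: ${STATE}"
  if [[ "${STATE}" == "JOB_STATE_DRAINED" ]]; then
    break
  fi
  sleep 30
done

# Only now start the replacement job (placeholder flags, as in the question).
gcloud dataflow jobs run NAME --other-flags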
I'm implementing a service that rejects job requests if an existing job is already running. Unfortunately, I'm not sure if there is a way to tell the difference between a job that is actively running and a job that ended due to an unexpected shutdown, such as Tomcat being turned off. The statuses in the tables are the same, with status = STARTED and exit_code = UNKNOWN.
Set<JobExecution> jobExecutions = jobExplorer.findRunningJobExecutions("MY_JOB");
Is there a way to tell the two apart, or an implementation that would change such stale job statuses to something like ABANDONED?
There is indeed no way, by just looking at the database, to distinguish between a job that is effectively running and a job that has been abruptly killed (in both cases, the status is STARTED).
What you need to do is, in addition to checking status in the database, find a way to see if a job is effectively running. This really depends on how you run your jobs. For example, if you run your jobs in separate JVMs, you can write some code that checks if there is a JVM currently running your job. If you deploy your jobs to Kubernetes, you could ask Kubernetes if there is a pod currently running your job, etc.
However, if you can identify an execution that has been abruptly stopped and whose status is stuck at STARTED (because Spring Batch did not get a chance to update it to FAILED via a graceful shutdown), then you can manually update its status to ABANDONED and set its END_TIME to a non-null value. This way, JobExplorer#findRunningJobExecutions will no longer return it as a running execution.
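For example, if the jobs run in their own JVMs and the Spring Batch metadata lives in PostgreSQL with the default BATCH_ table prefix (both assumptions here, along with the jar name and the DATABASE_URL variable), a cleanup sketch could look like this:
#!/usr/bin/env bash
# Sketch only: adjust the process check and the database client to your setup.

JOB_NAME="MY_JOB"

# 1) Is a JVM effectively running the job right now?
if pgrep -f "my-batch-job.jar" > /dev/null; then
  echo "A JVM is currently running ${JOB_NAME}; leaving executions untouched."
  exit 0
fi

# 2) No JVM found: mark executions stuck at STARTED as ABANDONED and give them an END_TIME,
#    so JobExplorer#findRunningJobExecutions stops reporting them as running.
psql "${DATABASE_URL}" <<SQL
UPDATE BATCH_JOB_EXECUTION E
SET    STATUS = 'ABANDONED',
       END_TIME = NOW()
FROM   BATCH_JOB_INSTANCE I
WHERE  E.JOB_INSTANCE_ID = I.JOB_INSTANCE_ID
AND    I.JOB_NAME = '${JOB_NAME}'
AND    E.STATUS = 'STARTED';
SQL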
So I currently have 2 YAML pipelines... one starts the server, and once the server is up and running I start the other pipeline, which runs the tests in one job and, once that's completed, starts a job that shuts down the server from the first pipeline.
I'm kind of new to YAML and wondering if there is a way to run all of this in a single pipeline...
The problem I came across is that if I run the server in the first job, I don't know how to condition the second job to kick off once the server is running. The first job never reaches a succeeded or failed state because it's still in progress: the server has to keep running for the tests to run.
I tried adding a variable that I set to true after the server is running, but it still never jumps to the next job.
I looked into templates too, but those aren't very clear to me, so any suggestion, documentation, or tutorial on how to achieve putting this in one pipeline would be very helpful...
I already googled a bunch and will keep googling but figured someone here might have an answer already.
Each agent can run only one job at a time. To run multiple jobs in parallel you must configure multiple agents. You also need sufficient parallel jobs.
You can specify the conditions under which each job runs. By default, a job runs if it does not depend on any other job, or if all of the jobs that it depends on have completed and succeeded. You can customize this behavior by forcing a job to run even if a previous job fails or by specifying a custom condition.
Since you have already added a variable that you set to true once the server is running, try enabling a custom condition on the second job so that it runs only when that variable has the expected value.
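As a side note, and separate from the condition mechanism above: if the test job can reach the server over the network, another common way to know the server is ready is to make the first step of the test job poll it before the tests start. A rough bash sketch, where the URL, retry count, and interval are all placeholders:
# Poll a health endpoint until the server responds, or give up after ~5 minutes.
# http://localhost:8080/health is a placeholder; use whatever your server exposes.
for i in $(seq 1 60); do
  if curl --silent --fail http://localhost:8080/health > /dev/null; then
    echo "Server is up."
    exit 0
  fi
  echo "Waiting for server... (${i}/60)"
  sleep 5
done
echo "Server did not come up in time." >&2
exit 1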
For more details, please check the official docs here:
Specify jobs in your pipeline
Specify conditions
We are trying to migrate our Laravel setup to Docker. Dockerizing the Laravel app was straightforward; however, we ran into an issue where, if we do a deployment while scheduled jobs are running, they get killed because the container is destroyed. What's the best practice here? Having a separate container to run the Laravel scheduler doesn't seem like it would solve the problem.
Run the scheduled job in a different container so you can scale it independently of the laravel app.
Run multiple containers of the scheduled job so you can stop some to upgrade them while the old ones will continue processing jobs.
Docker will send a SIGTERM signal to the container and wait for the container to exit cleanly before issuing SIGKILL (the time between the two signals is configurable, 10 seconds by default). This allows your current job to finish cleanly (or save a checkpoint to continue later).
The plan is to stop old containers and start new containers gradually so there are no lost jobs and no downtime. If you use an orchestrator like Docker Swarm or Kubernetes, it will handle most of these logistics for you.
Note: the Laravel scheduler is based on cron and fires processes that Docker will kill. To prevent this, have the scheduler push jobs onto a Laravel queue. The queue worker is a foreground process, so the SIGTERM it receives before being killed gives it the chance to stop/save cleanly.
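A rough illustration of that stop behaviour (the image and container names are placeholders, and the 60-second timeout is just an example):
# Run the queue worker as the container's foreground process (PID 1) so it
# receives the SIGTERM directly; 'laravel-app' is a placeholder image name.
docker run -d --name laravel-worker laravel-app php artisan queue:work

# On deployment, give the worker more than the default 10 seconds to finish
# its current job before Docker escalates to SIGKILL.
docker stop --time=60 laravel-worker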
Does the capacity scheduler in YARN run apps in parallel on the same queue for the same user?
For example: if we have 2 Hive CLIs in 2 terminals with the same user, and the same query is started on both, do they execute on the default queue in parallel or sequentially?
Currently, the UI shows 1 job running and 1 in the pending state.
Is there a way to run them in parallel?
The YARN capacity scheduler runs jobs in FIFO order within the same queue. For example, if both Hive CLIs submit to the default queue, whichever is able to secure resources first gets into the running state and the other waits (but only if there are not enough resources left in the queue).
If you want parallel execution:
1) You can run the other job in a different queue. You can specify the queue name when launching the job on YARN (see the sketch after this list).
2) You need to configure queue resources so that both jobs can get the resources they need.
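For instance, assuming a queue named etl already exists in your capacity-scheduler configuration (the queue name and query below are placeholders), the Hive CLI can be pointed at it like this; use tez.queue.name instead if the execution engine is Tez:
# Submit the query to the 'etl' queue instead of 'default' (MapReduce engine).
hive --hiveconf mapreduce.job.queuename=etl -e "SELECT COUNT(*) FROM my_table;"

# Equivalent setting from inside an already-running Hive CLI:
#   SET mapreduce.job.queuename=etl;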
Is there an event scheduler available in PostgreSQL similar to MySQL's?
While a lot of people just use cron, the closest thing to a built-in scheduler is PgAgent. It's a component of the pgAdmin GUI management tool. A good intro to it can be found at Setting up PgAgent and doing scheduled backups.
pg_cron is a simple, cron-based job scheduler for PostgreSQL that runs inside the database as an extension. A background worker initiates commands according to their schedule by connecting to the local database as the user that scheduled the job.
pg_cron can run multiple jobs in parallel, but it runs at most one instance of a job at a time. If a second run is supposed to start before the first one finishes, then the second run is queued and started as soon as the first run completes. This ensures that jobs run exactly as many times as scheduled and don't run concurrently with themselves.
If you set up pg_cron on a hot standby, then it will start running the cron jobs, which are stored in a table and thus replicated to the hot standby, as soon as the server is promoted. This means your periodic jobs automatically fail over with your PostgreSQL server.
Source: citusdata.com
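A minimal sketch of using pg_cron from the shell, assuming the extension is installed on the server and already added to shared_preload_libraries (the database name, schedule, and command are just examples):
# Enable the extension in the database pg_cron is configured to use ('postgres' by default).
psql -d postgres -c "CREATE EXTENSION IF NOT EXISTS pg_cron;"

# Schedule a nightly VACUUM at 03:30 using standard cron syntax.
psql -d postgres -c "SELECT cron.schedule('30 3 * * *', 'VACUUM');"

# Inspect the scheduled jobs; cron.job is the table mentioned above that gets replicated to standbys.
psql -d postgres -c "SELECT jobid, schedule, command FROM cron.job;"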