How to tell Octopus Deploy to wait until another deployment finishes on the same machine? - octopus-deploy

Sometimes it is preferred and/or required to host dozens of applications on a single server. Not saying this is "right" or "wrong," I'm only saying that it happens.
A downside to this configuration is that the message "Waiting for the script in task [TASK ID] to finish as this script requires that no other Octopus scripts are executing on this target at the same time" appears whenever more than one deployment to the same machine is running. It seems like Octopus Deploy is fighting itself.
How can I configure Octopus Deploy to wait for one deployment to completely finish before the next one is started?

Before diving into the answer, it is important to understand why that message appears in the first place. Each time a step is run on a deployment target, the Tentacle creates a "Mutex" to prevent other projects from interfering with it. An early use case for this was updating the IIS metabase during a deployment. In certain cases, concurrent updates would cause random errors.
Option 1: Disable the Mutex
We've seen cases where the mutex is the cause of the delay. The mutex is applied per step, not per deployment. It is common to see a situation where it looks like Octopus is "jumping" between deployments. Depending on the number of concurrent deployments, that can slow down the deployment. The natural thought is to disable the mutex altogether.
It is possible to disable the mutex by adding the variable OctopusBypassDeploymentMutex and setting it to True. That variable can exist in either a specific project or in a variable set.
More details on what that variable does can be found in this document. If you do disable the mutex please test it and monitor for any failures. For the most part, we don't see issues disabling the mutex, but it has happened from time to time. It depends on a host of other factors such as application type and Windows version.
Option 2: Leverage Deploy a Release Step
Another option is to coordinate the projects using the deploy a release step. Typically this works best when the projects being deployed are part of the same application suite. In the example screenshot below I have five "deployment" projects:
Azure Worker IaC
Database Worker IaC
Kubernetes Worker IaC
Script Worker IaC
OctoStudy
The project Unleash the Kraken coordinates deployments for those projects.
It does this by using the Deploy a Release step. First it spins up all the infrastructure, then it deploys the application.
This won't work as well if the server is hosting 50 disparate applications.
Option 3: Leverage the API to check for running deployments
The final option is to include a step at the start of each project which hits the API to check for active deployments to the deployment targets of the current deployment. If an active deployment is found, the step waits until it is done.
You can do this by hitting the endpoint https://[YOUR URL]/api/[SPACE ID]/machines/[Machine Id]/tasks?skip=0&name=Deploy&states=Executing%2CCancelling&spaces=[SPACE ID]&includeSystem=false. That will tell you all the active tasks being run for a specific machine.
You can get Machine Id by pulling the value from Octopus.Deployment.Machines. You can get Space Id by pulling the value from Octopus.Space.Id.
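As a rough sketch of how those two variables feed the endpoint, the Python below builds one query URL per machine. It assumes Octopus.Deployment.Machines arrives as a comma-separated list of machine IDs; the function name active_task_urls and the example host are hypothetical.

```python
from urllib.parse import urlencode


def active_task_urls(base_url, space_id, machines_csv):
    """Return {machine_id: url} for the active-deployment query of each machine.

    machines_csv is assumed to be the comma-separated value of
    Octopus.Deployment.Machines; space_id is the value of Octopus.Space.Id.
    """
    # urlencode escapes the comma in the states list as %2C,
    # matching the endpoint shown above.
    query = urlencode({
        "skip": 0,
        "name": "Deploy",
        "states": "Executing,Cancelling",
        "spaces": space_id,
        "includeSystem": "false",
    })
    machine_ids = [m.strip() for m in machines_csv.split(",") if m.strip()]
    root = base_url.rstrip("/")
    return {
        machine_id: f"{root}/api/{space_id}/machines/{machine_id}/tasks?{query}"
        for machine_id in machine_ids
    }
```

Calling active_task_urls("https://octopus.example.com", "Spaces-1", "Machines-1,Machines-2") would yield one URL per machine, each of which still has to be requested with whatever authentication your instance requires.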
The pseudo code for this approach could look like this (I'm not including the actual code as your requirements could be very different).
activeDeployments = true
while (activeDeployments)
{
    activeDeployments = false
    foreach (machineId in Octopus.Deployment.Machines)
    {
        activeTasks = GET https://[YOUR URL]/api/[Octopus.Space.Id]/machines/[machineId]/tasks?skip=0&name=Deploy&states=Executing%2CCancelling&spaces=[Octopus.Space.Id]&includeSystem=false
        if (activeTasks.Count > 0)
        {
            activeDeployments = true
        }
    }
    if (activeDeployments == true)
    {
        Sleep for 5 seconds
    }
}
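The pseudocode above could be turned into something like the following Python sketch. It is not an official Octopus script: wait_for_idle, count_active_tasks and max_polls are hypothetical names, and the HTTP call is deliberately left to a callable you supply, so the loop itself can be tried without a server. The calling deployment will presumably show up in its own results, so that callable likely needs to exclude the current task; that is an assumption about how the endpoint behaves.

```python
import time


def wait_for_idle(machine_ids, count_active_tasks, poll_seconds=5,
                  sleep=time.sleep, max_polls=None):
    """Block until no machine reports an active deployment.

    count_active_tasks(machine_id) -> int is supplied by the caller
    (hypothetical hook: it would query the tasks endpoint and return
    how many other deployments are still running on that machine).
    Returns the number of polling rounds performed.
    """
    polls = 0
    while True:
        polls += 1
        # One round: ask every machine whether something is still running.
        busy = any(count_active_tasks(m) > 0 for m in machine_ids)
        if not busy:
            return polls
        # Optional safety valve so a stuck deployment cannot block forever.
        if max_polls is not None and polls >= max_polls:
            raise TimeoutError("targets still busy after %d polls" % polls)
        sleep(poll_seconds)
```

The max_polls guard is an addition to the pseudocode; without it a hung deployment on the target would keep every later deployment waiting indefinitely.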

I got this message because I had hit the Task Cap on the Octopus Server.
In Octopus\Configuration\Nodes, change the task cap to 1 to run one deployment at a time, even with agents on different servers. The message will then display constantly.
Or simply increase this value to prevent the message from occurring at all.

Related

Looking for a way to run a workflow on all self-hosted agents in a pool for Github actions

I’m trying to run a maintenance workflow on all agents.
It’s a cleaning job like “docker system prune”. I’d like it to run on all self-hosted agents (about 12 agents) weekly on Sunday night.
I noticed that workflows can run on a schedule event. This is great.
But I didn’t find a way to make all self-hosted agents run the workflow. Any suggestions?
I believe this problem is more about your operating system(s) than about workflows on the GitHub side. It might be possible to do this as a workflow, but then your agents must either run on the host operating system or have access to the Docker socket (I don't know how you are hosting them). Either way, this might be insecure, depending on whether the host is used for something else as well.
As the GitHub docs state, you are responsible for your operating system.
In general, you can schedule maintenance jobs with cron, which is probably the most widely used scheduler. How to install it depends on your operating system.
To add a scheduled job, run the command crontab -e, select an editor, and add the following line to the end to run weekly on Sunday at 03:00 AM:
0 3 * * 0 /usr/bin/docker system prune -f
However, if you really want to use workflows, you could read the docs on runner labels. They state that "Labels allow you to send workflow jobs to specific types of self-hosted runners, based on their shared characteristics." So you could create a specific maintenance job for every runner, each targeting a different label. This requires many scheduled jobs, as a single job is not intended to be launched on multiple runners.

Azure DevOps Server 2019 (on-premises): Can agent jobs be run serially?

I have a scenario where I would like a build to start running on one agent (Job 1), and then after doing some work, I'd like it to run a step on a special agent (pool) of machines with specially licensed software. (Job 2). When that is done I'd like the rest of the build to complete on the original agent (Job 3).
I have been able to use "Variable Tools for Azure DevOps Services" to successfully pass any number of variables between agent jobs, even when they are running on different machines. It is no problem for me to pass a UNC path from Job1 to Job2 / Job3, etc.
However, what I am seeing is that no matter what I do, agent jobs are always running in parallel, and there is no way to get them to run serially, unless they are locked to the same agent on the same machine, which defeats the whole purpose.
Does anyone know of a means to accomplish this? Right now in tests, I have to use "Start-Sleep" or something similar, and repeatedly monitor an external event. A terribly inelegant work-around.
I found the answer. A job's properties contain a field called "dependencies". You can make the jobs run serially by setting a dependency on the previous job.
In Azure DevOps, the agent job gives you the options below.
You can select any option based on your requirements.

Pausing Teamcity builds that are running

I would like to have Teamcity build configuration that currently has 3 build steps:
1. Build an artifact to perform tests on & install on remote server
2. Kick off long running test job on remote server
3. Pause build awaiting external event (i.e. remote job finishing)
4. Retrieve results and record the report
I have had a look through the documentation and I can see how I can pause (step 3) the entire build configuration (which stops any additional builds running) ... but not just a single running build.
The Step 2 script that is running the external job has the various parameters passed to it, so that it can issue a REST call back to the teamcity server to resume the build job.
Basically I don't want to tie up a build agent waiting the entire hour the test takes to run.
I have googled and everything I can find points me at pausing the build configuration.
I am currently having to look at splitting the build configuration into two. The first will kick off the test job and finish. Then, when the external test job finishes, it will call TeamCity to start a second job to retrieve and store the reports. But that feels disconnected to me, in that I will not be able to show a single job with build/test/report.
At the moment (TeamCity v 2018.1) there is no direct way to pause the build, release the build agent, and later resume the execution.
What you described is the recommended workaround.
Also, please watch/vote for related issue: https://youtrack.jetbrains.com/issue/TW-30777
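For the split-configuration workaround, the external job's callback could queue the second build through the REST API. The Python sketch below only constructs the request; it assumes the build queue endpoint (POST to /app/rest/buildQueue with a build node in the body) is available on your TeamCity version. The function name and the build configuration ID in the usage note are hypothetical, and authentication is left out.

```python
import urllib.request
from xml.sax.saxutils import quoteattr


def build_trigger_request(server_url, build_type_id):
    """Build (but do not send) a request that queues one build.

    build_type_id is the ID of the second build configuration,
    the one that retrieves and stores the reports.
    """
    # quoteattr wraps the ID in quotes and escapes XML-special characters.
    body = "<build><buildType id=%s/></build>" % quoteattr(build_type_id)
    return urllib.request.Request(
        url=server_url.rstrip("/") + "/app/rest/buildQueue",
        data=body.encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/xml"},
    )
```

A caller would add credentials to the request returned by build_trigger_request("http://teamcity.example.com", "Reports_Collect") and pass it to urllib.request.urlopen.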

Preserve a lock on shared resource in TC pipeline (between build configurations)

I have a TeamCity pipeline with multiple build configurations. Some of them need a database for testing. So far I have a pool of DB schemas configured as a shared resource in TeamCity. Each build configuration locks a schema and then deploys the DB into it as the first step. Now I'd like to have a build configuration that locks a resource and deploys the DB, which should then be used by the other builds in the pipeline.
Something like:
deploy_db -> build_binaries -> unit tests
                            -> integration tests
                            -> ... other tests
I'd like to run test builds in parallel. For that I'd need to lock a schema in deploy_db and then pass a lock to other builds. Is it possible?
Perhaps I could lock any value from schema pool in deploy_db and then in other build configs use %dep.deploy_db.locked_schema% to lock a specific value.
However then it is probably possible that build for another branch can trigger deploy_db that could lock the same schema (using "lock any value") ? I mean it's probably possible if it's triggered after deploy_db finish and before next build executes lock? There would be a short period of time when lock is released after deploy_db and before it's acquired again. Technically another build could lock same value in this time. Is it possible to prevent this?
For example can I lock a value in the first build from a pipeline and release it in the last one?
This is not yet supported. You can watch/vote for the corresponding request in the issue tracker.

Teamcity: number of busy agents at any time

I'm new to TeamCity.
Is there a way through the API to get the number of busy agents at a given time?
I know I can do this to get the list of agents:
teamcityhost/app/rest/agents/
Since TeamCity can only run a single build per agent, running builds and busy agents map 1:1. You can get a list of the running builds using a build locator like the one below. The default count limit is 100, so if you have more than 100 agents you'll want to set count to something larger:
/httpAuth/app/rest/builds/?locator=running:true&count=200
You'll get back something like this with a count on the root element:
<builds count="1" nextHref="...">
<build id="10458" ... />
</builds>
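Reading that count programmatically could look like the Python sketch below. It only parses a response body shaped like the sample above; fetching it (with credentials) is left out, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def count_running_builds(xml_text):
    """Return the number of running builds in a /builds response body."""
    root = ET.fromstring(xml_text)
    # Prefer the count attribute on the root element; if it is absent,
    # fall back to counting the <build> children directly.
    count = root.get("count")
    if count is not None:
        return int(count)
    return len(root.findall("build"))
```

Because each agent runs one build at a time, the returned number would also be the number of busy agents, provided the count parameter in the request is at least as large as the agent pool.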
If you are using TeamCity 8.1, JetBrains have added an endpoint for queued builds; however, the instance I've got here is only 8.0, so I couldn't test it for you. If you use this, make sure you filter it to just running builds to exclude those that are actually queued waiting for an agent.
http://confluence.jetbrains.com/display/TCD8/REST+API#RESTAPI-QueuedBuilds

Resources