TeamCity Cloud Profile Canceling Builds in Progress

I am trying to set up TeamCity autoscaling using EC2 instances as the build agents.
We keep having the agents shut down mid-build.
We have the "Terminate instance idle time" value in the profile set to 120 (minutes).
The way I interpret that value, the agent is only terminated if no build has run on it for 2 hours.
Do I need to set another value to prevent it from shutting down mid-build?

Related

If an AWS spot instance is stopped by AWS and then restarts, will it just start where it left off?

I am running Luigi, a pipeline manager, which is processing 1000 tasks. Currently I poll for the AWS termination notice. If it is present, I requeue the job, wait 30 minutes, then launch a new server that starts all the tasks from scratch. However, it sometimes restarts the same job multiple times, which is inefficient.
Instead, I am considering using create_fleet with InstanceInterruptionBehavior=Stop. If I do this, then when the instance restarts, will it still be running the Luigi daemon and retain the state of all the tasks?
All InstanceInterruptionBehavior=Stop does is effectively shut down your EC2 instance rather than terminate it. Since the "persistent" request setting is required, in addition to EBS storage, you will keep all the data on the attached EBS volumes at the time of the instance stop.
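For reference, a minimal create-fleet sketch with the stop behavior might look like the following (the launch template ID is a placeholder, and note that the stop/hibernate behaviors require a persistent request, i.e. a fleet of Type maintain):
cat > fleet-config.json <<'EOF'
{
  "Type": "maintain",
  "SpotOptions": { "InstanceInterruptionBehavior": "stop" },
  "LaunchTemplateConfigs": [
    {
      "LaunchTemplateSpecification": {
        "LaunchTemplateId": "lt-0123456789abcdef0",
        "Version": "1"
      }
    }
  ],
  "TargetCapacitySpecification": {
    "TotalTargetCapacity": 1,
    "DefaultTargetCapacityType": "spot"
  }
}
EOF
aws ec2 create-fleet --cli-input-json file://fleet-config.json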
It is completely dependent on the application itself (Luigi in this case) being able to store the state of its execution and pick back up from where it left off. For one, you'll want to ensure you enable the service daemon to start automatically upon system start (example):
sudo systemctl enable yourservice
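A bare-bones unit file for the Luigi central scheduler might look like this sketch (the luigid path and unit name are assumptions; adjust for your install):
sudo tee /etc/systemd/system/luigid.service <<'EOF'
[Unit]
Description=Luigi central scheduler daemon
After=network.target

[Service]
# Path is an assumption; point this at wherever pip installed luigid
ExecStart=/usr/local/bin/luigid
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable --now luigid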

Does the ECS update-service command mark the container instance state as DRAINING when used with the --force-new-deployment option?

The command:
aws ecs update-service --service my-http-service --task-definition amazon-ecs-sample --force-new-deployment
As per the AWS docs: you can use this option (--force-new-deployment) to trigger a new deployment with no service definition changes. For example, you can update a service's tasks to use a newer Docker image with the same image/tag combination (my_image:latest) or to roll Fargate tasks onto a newer platform version.
My question: if I use --force-new-deployment (as I will reuse the existing tag and definition), will the underlying ECS instance automatically be set to the DRAINING state, so that no new task starts on the EXISTING ECS instance that is supposed to go away during the rolling-update deployment strategy (or deployment controller)?
In other words, is there any chance:
For a new task to be created on the existing/old container instance that is supposed to go away during the rolling update?
Also, what would happen to the ongoing tasks that are running on this existing/old container instance that is supposed to go away during the rolling update?
Ref: https://docs.aws.amazon.com/cli/latest/reference/ecs/update-service.html
Please note that no container instance is going anywhere with this update-service command. The command only creates a new Deployment under the ECS service and, once the new tasks become healthy, removes the old task(s).
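If you want to verify this yourself during a rollout, a quick sketch (the cluster name is assumed; the service name is from your command):
# The deployments roll over while the container instances stay ACTIVE:
aws ecs describe-services \
    --cluster my-cluster \
    --services my-http-service \
    --query 'services[0].deployments[*].[status,taskDefinition,runningCount]'
# This should stay empty for the whole deployment:
aws ecs list-container-instances --cluster my-cluster --status DRAINING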
Edit 1:
What about the requests that were being served by the old tasks?
I am assuming the tasks are behind an Application Load Balancer. In this case, old tasks will be deregistered from the ALB.
Note: in the following discussion, "target" refers to the ECS task.
To give you a brief description of how the Deregistration Delay works with ECS, the following is the sequence of events when a task connected to an ALB is stopped. This can be due to a scale-in event, deployment of a new task definition, a decrease in the number of tasks, a forced deployment, etc.
1. ECS sends a DeregisterTargets call and the targets change their status to "draining". New connections will not be sent to these targets.
2. If the deregistration delay elapses and there are still in-flight requests, the ALB will terminate them and clients will receive 5XX responses originating from the ALB.
3. The targets are deregistered from the target group.
4. ECS sends the stop call to the tasks and the ECS agent gracefully stops the containers (SIGTERM).
5. If the containers are not stopped within the stop timeout period (ECS_CONTAINER_STOP_TIMEOUT, 30s by default), they are force-stopped (SIGKILL).
As per the ELB documentation [1], if a deregistering target has no in-flight requests and no active connections, Elastic Load Balancing immediately completes the deregistration process without waiting for the deregistration delay to elapse. However, even though target deregistration is complete, the target's status is displayed as draining, and the registered target of the old task remains visible in the Target Group console with status draining until the deregistration delay elapses.
ECS is designed to stop the task only after the draining process completes, as mentioned in the ECS documentation [2], so the ECS service waits for the target group to finish draining before issuing the stop call.
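Both timeouts in that sequence are tunable; a sketch, assuming an EC2 launch type and a placeholder target group ARN:
# Shorten the ALB deregistration delay (default 300s):
aws elbv2 modify-target-group-attributes \
    --target-group-arn arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-targets/0123456789abcdef \
    --attributes Key=deregistration_delay.timeout_seconds,Value=60
# Give containers longer to exit gracefully before SIGKILL; set on the
# container instance before the ECS agent starts:
echo 'ECS_CONTAINER_STOP_TIMEOUT=60s' | sudo tee -a /etc/ecs/ecs.config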
Ref:
[1] Target Groups for Your Application Load Balancers - Deregistration Delay - https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-target-groups.html#deregistration-delay
[2] Updating a Service - https://docs.aws.amazon.com/AmazonECS/latest/developerguide/update-service.html

Run a batch file on a Windows EC2 instance, then shut down the instance?

I have a .bat file on a Windows EC2 instance that I would like to run every day.
Is there any way to schedule the instance to run this file every day and then shut down the EC2 instance, without manually going to the EC2 management console and launching the instance?
There are two requirements here:
Start the instance each day at a particular time (this is an assumption based on your desire to shut the instance down each day; something needs to turn it back on)
Run the script and then shut down
Option 1: Start & Stop
Amazon CloudWatch Events can perform a task on a given schedule, such as once per day. While it has many built-in capabilities, it cannot natively start an instance. Therefore, configure it to trigger an AWS Lambda function; the Lambda function can start the instance with a single API call.
When the instance starts up, use the normal Windows OS capabilities to run your desired program, e.g.: Automatically run program on Windows Server startup
When the program has finished running, it should issue a command to the Windows OS to shut down Windows. The benefit of doing it this way (instead of trying to schedule a shutdown) is that the program runs to completion before any shutdown is triggered. Just be sure to configure the EC2 instance to Stop on Shutdown (which is the default behaviour).
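As a rough sketch of the AWS-side wiring (the rule name, schedule, and instance ID are placeholders, and the put-targets call that attaches the Lambda function to the rule is omitted):
# Fire once a day at 08:00 UTC:
aws events put-rule --name start-daily-batch --schedule-expression 'cron(0 8 * * ? *)'
# The Lambda function body only needs the equivalent of this one call:
aws ec2 start-instances --instance-ids i-0123456789abcdef0
On the Windows side, the last line of the .bat file issues the shutdown:
shutdown /s /t 0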
Option 2: Launch & Terminate
Instead of starting and stopping an instance, you could instead launch a new instance using an Amazon CloudWatch Events schedule.
Pass the desired PowerShell script to run in the instance's User Data. This script can install and run software.
When the script has finished, it should call the Windows OS command to shut down Windows. However, this time configure Terminate on Shutdown so that the instance is terminated (deleted). This is fine, because the above schedule will launch a fresh instance next time.
The benefit of this method is that the software configuration, and what should be run each time, can be fully defined in the User Data script, rather than having to start the instance, log in, change the scripts, and shut down again. There is no need to keep an instance around just to sit Stopped for most of the day.
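A sketch of the launch call (the AMI ID, instance type, and script path are placeholders); the key part is --instance-initiated-shutdown-behavior terminate, which is what makes the final shutdown delete the instance:
cat > user-data.txt <<'EOF'
<powershell>
# Run the daily job, then power off:
& 'C:\jobs\daily.bat'
Stop-Computer -Force
</powershell>
EOF
aws ec2 run-instances \
    --image-id ami-0123456789abcdef0 \
    --instance-type t3.micro \
    --instance-initiated-shutdown-behavior terminate \
    --user-data file://user-data.txt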
Option 3: Rethink your plan and Go Serverless!
Instead of using an Amazon EC2 instance to run a script, investigate whether an AWS Lambda function could do all the processing you need, without having to launch/start/stop/terminate instances. It is also cheaper!
Some limitations might preclude this option (e.g. the maximum 5 minutes of run time, or the 500 MB disk space limit), but it should be the first option you explore, rather than starting and stopping an Amazon EC2 instance.

Octopus Deploy queue doesn't pick up new tasks

For the last 2 weeks, Octopus Deploy hasn't been picking up new tasks. In the server logs I keep finding this message:
Outstanding health checks for the machine policy Default Machine Policy were not completed before the next task was due to be scheduled. If this error persists, check the Tasks tab for any running health check tasks, and cancel them manually.
I've deleted the health check task manually, but that doesn't solve the problem. I also restarted the Tentacle service, but still no luck. How can I resolve this?

Is it possible to get TeamCity to stop & restart Amazon EC2 instances for build agents?

I have TeamCity (7.0.2) successfully spinning up an EC2 VM from a custom AMI, running our build, and sending back the build artifacts.
However, even when I did this with older TeamCity versions, I was always unhappy with the notion that it simply terminates the instances after they are done and then creates new instances from the configured AMI the next time a build agent is needed.
Can I get TeamCity to issue "stop" commands instead, followed later by "start" commands? This has a tonne of advantages: quicker spin-up time, named instances in the agent statistics, and keeping the Mercurial clone on EBS for the next build, to name just three.
p.s. I guess I could use chained builds to call the EC2 API directly rather than use the built-in cloud support, but that sounds like a lot of work and feels flaky.
We plan to provide support for starting and stopping EBS-backed instances in TeamCity 7.1.
Please vote for TW-16419.
Note that TeamCity 7.0 may leak EBS volumes; see TW-12517.
