Disabled Microsoft CI is still attempting to deploy

About six months ago I contacted Microsoft support because every time Continuous Delivery ran, I received failure messages from every server the app had ever been deployed to (five or six messages each time).
It was not supposed to deploy to any of those servers at all, since they were no longer in use.
They were unable to figure out the cause, despite an email chain that ran at least 35 responses deep.
So, they disabled CI for my repository.
Not a great solution, but I no longer needed to worry about my code being deployed to random servers.
However, today, even though I have not re-enabled the CI configuration, I again received a message that a deployment had been attempted and failed.
So, if Microsoft can't fix this CI issue, what is my next step?
EDIT: Well, it looks like the one failed deployment was a fluke, as I have since checked in twice more and I haven't received any emails indicating a failed deployment.

I have since pushed a build twice more and it has not happened again, so I'm chalking it up to a random error.

Related

Alter 'status' request interval of CloudBuild submit

I'm trying to set up CI/CD for a mono repository using Google Cloud Build. We have a single Cloud Build trigger that starts a build on a new commit; it does some general steps and then starts a build for every (micro)service in the mono repository using gcloud builds submit.
This however means that if 4 or 5 people push code to the repository at roughly the same time, we can have around 50-70 concurrent builds running in Cloud Build. That in itself isn't an issue for us. The only issue is that when this happens, the following error pops up:
{
  "code": 429,
  "message": "Quota exceeded for quota metric 'Build and Operation Get requests' and limit 'Build and Operation Get requests per minute' of service 'cloudbuild.googleapis.com' for consumer 'project_number:<PROJECT_NUMBER>'.",
  "status": "RESOURCE_EXHAUSTED",
  "details": [{
    "@type": "type.googleapis.com/google.rpc.ErrorInfo",
    "reason": "RATE_LIMIT_EXCEEDED",
    "domain": "googleapis.com",
    "metadata": {
      "service": "cloudbuild.googleapis.com",
      "consumer": "projects/<PROJECT_NUMBER>",
      "quota_limit": "GetRequestsPerMinutePerProject",
      "quota_metric": "cloudbuild.googleapis.com/get_requests"
    }
  }]
}
In other words: we are running into quota limits. The quota allows us to make only 900 operational requests per minute.
We already tried switching to private pools, in the hope that the above quota limit only applied when not using private pools, but unfortunately we still hit the quota.
Now I am trying to find out whether I can decrease the number of these operational requests.
A possible solution might be related to how I am using gcloud builds submit. When you run gcloud builds submit, it starts a new build, waits for the build to finish, and shows the output of the build. To achieve this, I presume that gcloud makes a request every few seconds to find out the status of the build. I suspect that these 'status' requests are why the Cloud Build quota limit is reached, which is why I'm trying to see how I can lower the number of these requests per minute.
One option is to simply decrease the number of builds running in parallel, which is unfortunately not possible in my situation: executing them sequentially takes more time than is acceptable.
Another option would be to increase the time between such 'status' requests. However, on this page I unfortunately did not find a CLI flag to alter this.
Note: I did find the --async flag, but that does NOT help me, since I still want the process to wait until the build has succeeded. I also found --suppress-logs, which also does NOT help me, since these requests presumably don't go to Cloud Build but to the GCS bucket where the logs are stored.
The only option left that I can think of is to start my builds with the --async flag and then manually poll whether the build has succeeded, using a longer interval. However, that feels like a lot of manual work, for which I would need to write and maintain some bash scripts; it isn't a path I would like to take unless really necessary.
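For concreteness, the kind of script I have in mind would look roughly like this; the config path, poll interval and variable names are placeholders, not something I actually run today:

# Start the build without waiting, then poll its status at a longer interval.
# SERVICE_DIR, POLL_SECONDS and the cloudbuild.yaml path are placeholders.
BUILD_ID=$(gcloud builds submit "$SERVICE_DIR" \
  --config="$SERVICE_DIR/cloudbuild.yaml" \
  --async \
  --format='value(id)')

POLL_SECONDS=60
while true; do
  STATUS=$(gcloud builds describe "$BUILD_ID" --format='value(status)')
  case "$STATUS" in
    SUCCESS) exit 0 ;;
    FAILURE|INTERNAL_ERROR|TIMEOUT|CANCELLED|EXPIRED)
      echo "Build $BUILD_ID ended with status $STATUS" >&2; exit 1 ;;
    *) sleep "$POLL_SECONDS" ;;   # still QUEUED or WORKING
  esac
done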
Does anyone know of another way of achieving this?
If 4 or 5 people push code to the repository
This shouldn't happen. It shouldn't happen because you should use the "push" trigger on the main branch, not on a development branch.
What do I mean by this?
I mean that building should occur on the main branch, which would correspond to the joint effort of those five users plus a responsible party in charge of unifying their changes.
So, really, your users should be pushing to the development branch, and pushes to main should be reserved for things that need to be built.
How can we work around this if we're only allowed one branch or are required to have updates visible on one branch?
My recommendation would be to use the tag filter, i.e. filter the pushes by tag, as mentioned in the documentation. That way, only pushes from the person in charge of merging the changes will be built (assuming that this person pushes the tag you've set).
TL;DR
Don't create push triggers for Cloud Build on a branch multiple people are working on. Either create the trigger with a tag filter or have separate development and main branches (people work on dev, builds are only made from pushes to main).
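As a rough example (repository name, owner and tag pattern below are placeholders, adjust to your setup), such a tag-filtered trigger could be created like this:

# Trigger builds only for pushed tags matching ^v.*, not for ordinary branch pushes.
gcloud builds triggers create github \
  --repo-name=my-mono-repo \
  --repo-owner=my-org \
  --tag-pattern='^v.*' \
  --build-config=cloudbuild.yaml

With that in place, only the person who merges and then pushes a matching tag (e.g. git push origin v1.2.3) kicks off a build.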

How can I suppress subsequent pipelines emails in Gitlab CI in case of a series of failed jobs?

I've got a pipeline that logs in and logs out of a web application every 5 minutes to ensure that the app's backend works, the database is up, and so on.
A problem occurred that was not even directly related to the app, and my boss was bombarded with email notifications. Is it possible to limit the emails notifying of a series of broken pipelines to only one, and suppress all subsequent emails until the pipeline has been fixed?
It seems that the editor for "Pipelines emails" is rather limited and doesn't support this directly. However, this option exists in Jenkins, and I'm wondering if someone has figured out a solution or a workaround to achieve this in GitLab CI. (Is it possible to script something like this in the .gitlab-ci.yml file?)
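One scripted workaround that might be possible (this is only a sketch of the idea, not a tested configuration) is a final job with when: on_failure that asks the GitLab API for the status of the previous pipeline on the same ref and only sends the alert when that previous pipeline had succeeded, so a long run of failures produces a single email:

# Sketch of an "alert only on the first failure" check, meant to run in a
# `when: on_failure` job; NOTIFY_TOKEN is an assumed API token, while the
# CI_* variables are provided by GitLab CI itself.
PREV_STATUS=$(curl --silent --header "PRIVATE-TOKEN: $NOTIFY_TOKEN" \
  "$CI_API_V4_URL/projects/$CI_PROJECT_ID/pipelines?ref=$CI_COMMIT_REF_NAME&per_page=2" \
  | jq -r '.[1].status')

if [ "$PREV_STATUS" != "failed" ]; then
  echo "Previous pipeline status was $PREV_STATUS -> sending the alert"
  # call your mail gateway / webhook here
else
  echo "Previous pipeline already failed -> suppressing the duplicate alert"
fi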

Inconsistent crashes on Heroku app: where to look?

Disclaimer: Please tell me if the question is too broad, and I will do my best to narrow it down.
We have a Heroku app running two 1X web dynos. This infrastructure has been running for the last 9 months.
However, in the last few weeks, we had several episodes where the app would see its response times skyrocketing for about an hour, before returning to normal without us doing anything about it.
In the pictures below, you can find an extract of the Heroku metrics during one of these "episodes", which happened yesterday afternoon.
As you can see, response time goes up and eventually almost every request made to the server times out. During the event, it was barely possible to even load the home page of our website, hosted on this app. Most of the time, we would get the "Application Error" Heroku page.
What I see is:
The number of requests to the server (failed or not) was not abnormally high (fewer than 1000 every 10 minutes). For this reason, I think a DDoS attack is out of the picture.
All that the Heroku logs show is that the failed requests get a 503 (Service Unavailable) error, which would make me think of an overload.
The dynos do not seem overloaded. The memory usage is low, and the dyno load is reasonable, nothing unusual.
Heroku reported no issue during our crash event, as https://status.heroku.com/ states (last incident was on the 1st of July).
Restarting the dynos through several methods (from the interface, from the command line, or by triggering an automatic deployment via our GitLab repository) had no effect.
I am quite unsure how to interpret these metrics, and what the solution would be to ensure this kind of episode does not happen again.
So my question is: where should I look? Is there some kind of documentation about how to investigate crashes on Heroku apps?
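In case it helps narrow things down, the only further check I know of is tailing the router output during an episode and looking at the H error codes it attaches to failed requests (the app name below is a placeholder):

# Stream the app's logs and keep only Heroku router error lines
# (code=H12 is a request timeout, code=H13 a closed connection, etc.)
heroku logs --tail --app my-app | grep 'code=H'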

CRM 2013 SP1 On-Prem: Mailbox stuck in: "The email configuration test is in progress" status

I have a main incoming mailbox in CRM (sort of like a contact#organisation.com address) but for some reason today, the mail has stopped flowing into CRM.
I have tried resetting the async services as mentioned here: https://community.dynamics.com/crm/f/117/t/159892
But the mailbox remains stuck.
I have also tried enabling tracing on the backend servers, but even after 20 minutes I didn't find any issues in the trace logs...
EDIT
So the process completed at 4 am this morning, and I can't work out why the test took over 17 hours to complete without error.
What is odd is that all the async jobs ran yesterday (in a timely manner) without issue, and I did not find any errors related to this email issue in the trace logs.
I am concerned about why this process took so long without any indication of an issue, and I would be happy for someone to raise some suggestions.

How can I prevent TeamCity Cloud Agents (running on Amazon EC2) from leaving defunct volume snapshots when they terminate?

I've recently set up TeamCity to use agents running on Amazon EC2. I've been using it for a week or so and noticed my bill was racking up quite quickly. When I looked in my dashboard I had 30 or so snapshots, which I presume I'm being charged for.
TeamCity starts agents on demand and stops them when they've been idle for about an hour. At that point, the instance is deleted. I would expect the corresponding snapshot to be deleted too, but that doesn't seem to be happening; they are piling up like zombies!
When setting this up, I followed Roy Osherove's guide, which I tried to follow to the letter, but I'm sure I must have missed something. Does anyone know where I went wrong?
You can create a special teamcity-agent image, and when adding the EBS volume, check the "Delete on Termination" checkbox.
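For an instance that is already running from an image without that flag, the same setting can also be flipped with the AWS CLI; a rough example (instance ID and device name are placeholders):

# Mark the agent's root EBS volume as delete-on-termination
aws ec2 modify-instance-attribute \
  --instance-id i-0123456789abcdef0 \
  --block-device-mappings '[{"DeviceName":"/dev/sda1","Ebs":{"DeleteOnTermination":true}}]'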
