TeamCity seems to completely shut down during the clean-up process, including stopping all active builds. The only option for scheduling seems to be nightly. I have builds that take up to several days to run. Do I have any options other than disabling the scheduled process?
Even for my play server the build process took almost 30 minutes to execute. I'm a bit worried as well about what a production server would look like, especially with the clean-up running daily!
You need solid infrastructure in place for CI if you are planning to make it robust. Check whether build artifacts are being cleared and are configured properly in TeamCity. Instead of doing the clean-up in one sitting, you can automate it and periodically remove the older build history.
And the answer to your question is no: clean-up does not require you to shut down the server unless it crashes.
Edit: The TeamCity shutdown may be due to any number of reasons I can only speculate on. It might have gone down because of a crash. But according to the clean-up policy in the JetBrains wiki, there is no mention of a server shutdown, and it would not be logical to shut down the entire server for a clean-up. Source.
I'm trying, but failing, to set up a reliable continuous integration environment using Xcode Server.
I have a Git repository on a headless Mac mini server running the Xcode Server service. The server has a separate development user account with administrator privileges that is used by Xcode.
I have set up my schemes with testing included and shared them to the repository.
The bots run, check out code, build, analyze, and archive, but only seem to run tests when they feel like it, which is almost never. I've checked the schemes, and they have not changed between the runs where Xcode ran the tests and the runs where it didn't.
When I first set the bots up, tests wouldn't run at all until I added administrator privileges to the development account. Then the tests ran a couple of times before Xcode Server decided to stop running them again.
I never get any reason why the tests aren't run. Sometimes the bots fail because of a crash during setup and an error is reported, but mostly the bots run, they just don't execute the tests, and no error is reported.
I've logged in remotely to the server, and the simulator is running, but never seems to do anything.
Here's a screenshot of an example bot. You can see that the tests used to run, and that I've reduced my warnings and got rid of an analysis issue. You can also see where no tests ran, and no kind of warning or error is given as to why.
I've tried restarting the server, nope.
I've tried restarting the client, nope.
It's really frustrating, and I can't find any recent issues that offer a proper solution to this. The server is in constant use running backups and other tasks, so I'd rather not have a solution that involves logging in to the server and restarting something every time there's a problem (which is always). It makes the whole point of bots useless if I'm spending more time logging in to my server trying to get them to work than they spend actually running.
Anyone have similar issues and a solution?
Edit: I noticed that memory usage was very high on the server and memory pressure was practically always amber, so I went out and got some memory today and increased the Mac mini's memory from 4GB to 16GB. Now the tests have started running again. The whole process is also much faster (less than surprising, I guess).
Could it just be low memory causing problems with the simulator? I've only just installed the memory and restarted, so I'll give it a few test runs before I confirm this solution; it's stopped working before...
It seems that this may be a memory issue. I upgraded the server's memory from 4GB to 16GB, as Activity Monitor was showing significant memory pressure.
Since doing this, the bots have started running tests again, and the total running time for a bot is a quarter of what it was.
As per my edit, I've been running the bots for a day now, including bots that run on multiple simulators, and everything seems to be fine.
It's not very good that no obvious indication is given in Xcode as to why the tests didn't run.
For reference, and to see if this might fix your problems, the original server specs were:
Mac Mini Server edition (late 2012)
2.3 GHz Intel Core i7
4GB memory
2x1TB drives
Replaced the 2x2GB memory sticks with 2x8GB sticks (The maximum allowed for the model)
EDIT: After a month of running with no problems, increasing the memory has solved the problem permanently.
On a Windows Server 2012 R2 (x64) test server we are running a Tomcat 8 installation, and the CPU usage is disconcerting in how regularly it hits peak usage.
The behavior started after an installation of our application but before anyone is accessing it. I have accessed a few pages and tested some features, but nothing that I know of that could create this behavior.
There are 2 virtual processors on the server and every ~20 seconds, the CPU usage will spike (on the one processor that is running Tomcat) to 100% for 10 seconds (give or take). See below:
The regularity of the pattern indicates to me that something is incorrect in either the installation or the settings of Tomcat 8.
I have installed the YourKit Java Profiler (by SO recommendation), which I was hoping could shed some light on what is causing these spikes, but I haven't been able to see the reason the threads are starting -- at least in part due to my newness to YourKit. I did attach it to the Tomcat launch file, and it seems to be tracking the behavior.
The Catalina logs are silent during the spiking occurrences (as are my application logs), but when I stopped Tomcat there were some messages about ThreadLocals that were started but could not be removed, and then: "...Threads are going to be renewed over time to try and avoid a probable memory leak."
I left the server running over the weekend and the pattern has continued until today so I don't think my symptoms are going away. Whatever is starting up has now consumed all available RAM on the system just from starting up these threads (and/or YourKit) every 20 seconds.
What is a possible approach to isolate this aberrant Tomcat activity and hopefully stop or rectify it?
There are many graphs and tabs in YourKit so I hesitate to list everything that might be helpful. Thanks for helping me narrow down the problem with what YourKit (or other tools) could offer me.
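In case it helps with suggestions, one thing I can try in order to correlate the spikes with specific threads is to poll per-thread CPU time over JMX from a separate JVM. Here is a rough sketch of what I mean (the class name and the port are placeholders of mine; it assumes Tomcat is started with the com.sun.management.jmxremote.port/authenticate/ssl system properties so that remote JMX is reachable, which is not configured by default):

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;
    import java.lang.management.ThreadMXBean;
    import java.util.HashMap;
    import java.util.Map;
    import javax.management.MBeanServerConnection;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class SpikeSampler {
        public static void main(String[] args) throws Exception {
            // Hypothetical port 9010: assumes Tomcat was launched with
            // -Dcom.sun.management.jmxremote.port=9010
            // -Dcom.sun.management.jmxremote.authenticate=false
            // -Dcom.sun.management.jmxremote.ssl=false
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://localhost:9010/jmxrmi");
            JMXConnector connector = JMXConnectorFactory.connect(url);
            try {
                MBeanServerConnection conn = connector.getMBeanServerConnection();
                ThreadMXBean threads = ManagementFactory.newPlatformMXBeanProxy(
                        conn, ManagementFactory.THREAD_MXBEAN_NAME, ThreadMXBean.class);

                // First sample of per-thread CPU time (nanoseconds)
                Map<Long, Long> before = new HashMap<Long, Long>();
                for (long id : threads.getAllThreadIds()) {
                    before.put(id, threads.getThreadCpuTime(id));
                }

                Thread.sleep(5000); // long enough to span part of a spike

                // Second sample: print threads that burned noticeable CPU in between
                for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
                    Long earlier = before.get(info.getThreadId());
                    long delta = threads.getThreadCpuTime(info.getThreadId())
                            - (earlier == null ? 0L : earlier);
                    if (delta > 500_000_000L) { // more than 0.5 s of CPU in the window
                        System.out.println(info.getThreadName() + " used "
                                + (delta / 1_000_000) + " ms CPU");
                        for (StackTraceElement frame : info.getStackTrace()) {
                            System.out.println("    at " + frame);
                        }
                    }
                }
            } finally {
                connector.close();
            }
        }
    }

Sampling across one of the ~20-second cycles should show which threads burned the CPU during the hump and what they were doing at that moment.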
Info from catalina log regarding start-up:
Apache Tomcat/8.0.23
Architecture: amd64
Java Home: C:\Program Files\Java\jre1.8.0_65
CATALINA_BASE: C:\Program Files\Apache Software Foundation\Tomcat 8.0
2015-12-08 Update
At Gergely's request: the application is a local installation of DSpace. It's a Java application with a PostgreSQL database backend. We are customizing an open-source version of it from here: http://www.dspace.org/introducing. I'm not exactly sure what else would be helpful, and I think the stack trace is more revealing as to what is (and isn't) running -- see below.
After turning on Stack Telemetry in YourKit, "CPU Estimation" became available by dragging the cursor across a period of profiler history. To me, it looks like all the CPU is doing is spinning idly. Are the Java files pictured below Tomcat routines? They don't strike me as DSpace-related (although I'm not an expert), nor does it look like any work is being done while the CPU is peaking.
Of note: the stack trace is identical during the quiet periods -- the only difference being CPU Time (ms) is in the hundreds rather than thousands of milliseconds. For a more direct comparison than what is below, the hump represents ~8,000 ms in Thread.run() and the quiet periods consume ~125 ms of cpu time (although covering approximately the same amount of time).
Lastly, when pages of the application are being requested, a subsequent branch of code appears in the Call Tree. If it happened during the time of a spike it may only take 400 ms of CPU time to load a whole page. The code branch that appears is ApplicationFilterChain.java as a whole separate branch alongside PooledExecutor$Worker.run() -- both underneath java.lang.Thread.run() in the hierarchy.
When trying to interpret the stack trace: Is EDU.oswego.cs.dl.util.concurrent.PooledExecutor$Worker.run() responsible?
2015-12-08 Update #2
YourKit comes pre-configured to hide certain Java class name patterns, which obscured drilling down on java.lang.Thread. Clearing the filters enabled the following screenshots, showing that the vast majority of processing time during a spike event is spent calling the following three methods:
java.io.WinNTFileSystem.canonicalize0
java.io.WinNTFileSystem.getBooleanAttributes (in File.exists())
StandardRoot.java
My apologies for not yet knowing enough about Tomcat or DSpace to know who is launching these tasks. (In case it matters, the line directly above the first line is java.lang.Thread.run() and then <All threads>.)
Thank you to those who have viewed and responded to this inquiry. As various individuals surmised, the problem was related to our settings and use of Tomcat -- not (most likely) a problem with Tomcat itself.
This is an attempt to answer the question without perfect knowledge of installing the DSpace application and Tomcat, but I think I know enough to be dangerous and potentially helpful to follow-up users.
When installing the DSpace application, there are some installation properties in Tomcat's configuration directories that determine whether changes to code files are reflected immediately, without a Tomcat restart. These settings for us were previously in the directory [tomcat]/conf/Catalina/localhost/, and each of the three files there was a small, insignificant XML file like this (e.g. oai.xml):
<?xml version='1.0'?>
<Context docBase="E:/dspace/webapps/oai"
reloadable="false"
cachingAllowed="true"/>
You can find documentation on these properties at the following link:
https://wiki.duraspace.org/display/DSDOC5x/Installing+DSpace
Within that documentation is a recommendation about the reloadable and cachingAllowed properties. Search for "Tomcat Context Settings in Production". Here is an excerpt (emphasis mine):
These settings are extremely useful to have when you are first getting started with DSpace, as they let you tweak the DSpace XMLUI (XSLTs or CSS) or JSPUI (JSPs) and see your changes get automatically reloaded by Tomcat (without having to restart Tomcat). However, it is worth noting that the Apache Tomcat documentation recommends Production sites leave the default values in place (reloadable="false" cachingAllowed="true"), as allowing Tomcat to automatically reload all changes may result in "significant runtime overhead".
It is entirely up to you whether to keep these Tomcat settings in place. We just recommend beginning with them, so that you can more easily customize your site without having to require a Tomcat restart. Smaller DSpace sites may not notice any performance issues with keeping these settings in place in Production. Larger DSpace sites may wish to ensure that Tomcat performance is more streamlined.
When I switched these boolean flags to reloadable="false" and cachingAllowed="true" the spiked CPU experience stopped immediately. I don't know if the warning about "Larger sites" applies to us or whether "streamlined performance" could refer to the negative activity I observed.
I presume there may be other problems with our installation that allowed this particular manifestation; one ominous clue is that our production server seems to be operating with these flags in the reloadable="true" configuration. Java, Tomcat, Windows, AND DSpace are ALL getting new versions at the same time so it is fairly difficult to pinpoint why similar Tomcat <context> settings produce such different results.
I am at least content for now to have new behavior and that the system has calmed down. I'll post more if I learn more but will be focusing next on other quandaries.
Update
FWIW, the attributes are settings that directly control Tomcat and they have changed between versions. E.g., cachingAllowed was removed in version 8 which means it can be removed from the Context elements. Compare:
https://tomcat.apache.org/tomcat-8.0-doc/config/context.html#Attributes
https://tomcat.apache.org/tomcat-7.0-doc/config/context.html#Attributes
And for good measure, here is the help text for reloadable in the Tomcat 8 documentation:
Set to true if you want Catalina to monitor classes in /WEB-INF/classes/ and /WEB-INF/lib for changes, and automatically reload the web application if a change is detected. This feature is very useful during application development, but it requires significant runtime overhead and is not recommended for use on deployed production applications. That's why the default setting for this attribute is false. You can use the Manager web application, however, to trigger reloads of deployed applications on demand.
So it would seem that the ultimate answer is that Tomcat 8 on Windows Server 2012 R2 with the flag reloadable='true' polls for changes to /WEB-INF/lib and /WEB-INF/classes. The sheer volume of folders and files to examine may very well be the cause of these intense CPU spikes. For now I will be relying on reloadable='false', which definitely removes the symptom for us.
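To make the cost concrete, here is a rough illustration of the kind of work a single background reload check implies: stat-ing every file under /WEB-INF/classes and /WEB-INF/lib on each pass. This is emphatically not Tomcat's actual code, just a sketch I wrote (the class name and command-line argument are mine). With the number of jars and class files in a DSpace webapp, that amounts to a lot of exists()/lastModified()-style calls every few seconds, which lines up with the WinNTFileSystem frames shown above.

    import java.io.IOException;
    import java.nio.file.FileVisitResult;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.SimpleFileVisitor;
    import java.nio.file.attribute.BasicFileAttributes;

    // Illustration only: walk WEB-INF/classes and WEB-INF/lib and check every
    // file's modification time, which is roughly what a reload check has to do.
    public class ReloadCheckSketch {
        public static void main(String[] args) throws IOException {
            Path webInf = Paths.get(args[0]); // e.g. E:/dspace/webapps/oai/WEB-INF
            final long[] count = {0};
            final long[] newest = {0};
            for (String dir : new String[] {"classes", "lib"}) {
                Path root = webInf.resolve(dir);
                if (!Files.isDirectory(root)) continue;
                Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
                    @Override
                    public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                        count[0]++;
                        newest[0] = Math.max(newest[0], attrs.lastModifiedTime().toMillis());
                        return FileVisitResult.CONTINUE;
                    }
                });
            }
            System.out.println("checked " + count[0] + " files; newest modification at " + newest[0]);
        }
    }

Pointing it at one of the DSpace webapps' WEB-INF directories gives a feel for how many files every polling pass has to touch.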
Not an explicit answer, but way too long for a comment
After reviewing the update on this question and reading a bit, I suspect that this recurring issue is caused by a CuratorTask. My reasons:
The stack trace you acquired clearly shows that a worker thread managed by the DSpace library (so Tomcat is not to be blamed) is using the processor at those times.
After reading a bit about DSpace itself, it looks like it has a feature that allows users to define curation tasks that are executed periodically.
On top of this, there is at least one task that is, according to the documentation, activated by default, so theoretically any number of tasks could be activated by default.
Moreover, this conversation reveals at least one curation task that is activated every 10 seconds.
All of these point in the same direction. I would suggest using the DSpace UI (probably in admin mode) to look around, find the active curation tasks, and verify whether their scheduling corresponds to what you have observed.
It may be obvious, but I can't find a workable use case for Delayed Job: because of how Ruby's garbage collector behaves, it doesn't release memory back to the OS, so eventually the delayed_job process takes all the memory anyway, and the only remedy is to restart the delayed_job process.
But if I restart the delayed_job process while a task is currently running, that task will never be completed. There is probably some workaround to restart the task later, but that approach seems ugly to me.
I tried real jobs and also some simple computations without any variables, symbols, or references, so I don't think "my code leaks". Still, every new job increases the memory of the delayed_job process.
Maybe I'm using Delayed Job for something it's not designed for? Or could it be an environment problem (I tried it both on a local machine and on a VPS)?
Tested on: Ubuntu 14.04 and Debian 6 (both x86), Rails 3.2, delayed_job 4.0.2, delayed_job_active_record 4.0.1, ruby 2.1.2
I could give some code examples, but as I mentioned, I tried both a real job and a simple computation, so I won't post them unless that would matter -- my mistakes seem to be more fundamental than the code.
Given my conditions -- tasks can run for a couple of minutes, read and write about 100K records to the database, require a lot of computation, can't be interrupted, and are limited to maybe 10-20 daily -- my only guess is to use Resque instead, because it forks a process for every job, so there should be no problem with memory accumulating over time.
So am I really doing something wrong, or is it the nature of DJ to occupy all the memory and require a restart -- and if I can't restart it, should I not be using this approach at all?
Everything I've read on the internet (not that much, by the way) says it's Ruby's GC trouble that it doesn't free memory back to the OS, and some advise profiling the code for unreferenced objects (which sounds the most realistic for my case, but I've tried plenty of code that doesn't create any objects, and I explicitly set everything to nil and call GC.start).
Our Jenkins server (a Linux machine) slows down over a period of time and becomes unresponsive. All the jobs take unexpectedly long, even though they run on slaves, which are different machines from the server. One of the things I have observed is an increase in the number of open files; the number keeps increasing, as shown in the image below. Does anyone have a solution to keep this in check without restarting the server? Also, are there any configurations/tweaks that could improve the performance of the Jenkins server?
We have been using Jenkins for four years and have tried to keep it up to date (Jenkins + plug-ins).
Like you, we experienced some problems, depending on new versions of Jenkins or plug-ins...
So we decided to stop this "continuous" upgrading.
Here are some humble tips:
Avoid technical debt. Update Jenkins as much as you can, but use only "Long Term Support" versions (latest is 2.138.2)
Backup your entire jenkins_home before any upgrade!
Restart Jenkins every night
Add RAM to your server. Jenkins uses the file system a lot, and this will improve caching
Define JVM min/max memory parameters with the same value to avoid dynamic reallocation, for example: -Xms4G -Xmx4G
Add slaves and execute jobs only on slaves
In addition to the above, you can also try:
Discarding old builds
Distribute the builds on multiple slaves, if possible.
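To keep an eye on the open file count from the question without restarting anything, a small sketch like the following (my own, not a Jenkins feature) counts the descriptors held by the Jenkins master process on Linux by listing /proc/<pid>/fd. It assumes it runs on the master host as the same user as Jenkins (or as root) and that you pass the master's PID as an argument:

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.stream.Stream;

    public class FdCount {
        public static void main(String[] args) throws IOException {
            String pid = args[0]; // PID of the Jenkins master, e.g. from `pgrep -f jenkins.war`
            Path fdDir = Paths.get("/proc", pid, "fd"); // one symlink per open file descriptor
            try (Stream<Path> fds = Files.list(fdDir)) {
                System.out.println("Open file descriptors for PID " + pid + ": " + fds.count());
            }
        }
    }

Logging that number periodically (e.g. from cron) makes it easier to spot which plug-in, job type, or upgrade makes the count climb, and lsof -p <pid> will show which files are actually being held open.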
After several months of successful and unadulterated continuous integration, my Hudson instance, running on Mac OSX 10.7.4 Lion, decides it wants to enter shutdown mode after every 20-30 minutes of inactivity.
For those of you familiar with shutdown mode, the instance of course doesn't shutdown, but has the undesirable effect (in this case) of stopping new jobs from starting.
I know I haven't changed any settings, so it makes me think the problem was slowly growing and keeps triggering shutdown mode.
I know there is plenty of storage space on the machine with 400+ GB to go so I'm wondering what else would trigger shutdown mode without actually using the Hudson web portal to manually do it.
As mentioned before, the problem also seems to be tied to inactivity. I tried creating a quick fix, which is a build job that does nothing every 5 minutes. It appeared to work at first, but after long periods of inactivity I will find it back in shutdown mode.
Any ideas what might be going on?
Solution: disable the thinBackup plugin
...
I figured this out by taking a look at the Hudson logs at http://localhost:8080/log/all
thinBackup was running every time the Hudson instance went into shutdown mode.
The fact that shutdown mode was occurring at periods of inactivity is also consistent with the behavior of thinBackup.
I then disabled the plug-in, and Hudson no longer enters shutdown mode. What's odd is that thinBackup had been installed for some time before this problem started occurring. I am seeking a solution from thinBackup to re-enable the plugin without the negative effects and will update here if I get an answer.
According to this link, the thinBackup plugin puts Hudson into shutdown mode on purpose to do the backup activity. It is supposed to automatically come out of shutdown mode once it is done.
I saw this with some jobs that seemed to stall and never finish overnight, so Hudson never came out of shutdown mode because thinBackup must have been waiting on the jobs to finish.