We have started to explore Spring Cloud Consul, and we see that if the Consul agent is down, the app crashes on startup... Is this the expected behavior? We expected the app to wait for the agent to come up and retry at least a few times, or at least that this behavior would be configurable...
Also, if the agent is up at startup, the app successfully registers the service in the catalog, but if the agent later goes down, even for a few seconds, the app fails to talk to the agent and never retries... This leads to a scenario where the app is up but no longer talking to the agent; again, we would expect a retry...
It is an open issue that we are tracking.
We have a Spring Boot service which has the capability to recover itself after a database restart. But all of a sudden we noticed "recoverer is already running, abandoning this recovery request" in the logs, and the health check of the service failed. We had to restart the service in both of our datacenters.
Has anybody faced a similar issue?
Edit:
Below is the configuration:
spring.jta.log-dir=target/transaction-logs
spring.jta.bitronix.datasource.className=bitronix.tm.resource.jdbc.lrc.LrcXADataSource
spring.jta.bitronix.datasource.driverProperties.driverClassName=com.microsoft.sqlserver.jdbc.SQLServerDriver
spring.jta.bitronix.datasource.driverProperties.url=
spring.jta.bitronix.datasource.driverProperties.user=
spring.jta.bitronix.datasource.driverProperties.password=
spring.jta.bitronix.datasource.test-query=select 1
spring.jta.bitronix.datasource.max-pool-size=100
spring.jta.bitronix.datasource.prepared-statement-cache-size=100
I have IBM WebSphere 8.5.5.13 ND on Windows 2016 Standard Edition with JDK 1.7 enabled. I see that the node agent and server1 (the application server) are getting stopped every day, but the deployment manager is still up and running (i.e. the admin console can be accessed). So I have to start the node agent and the associated server manually every day. Investigation done so far:
Checked whether the Windows servers are getting restarted every day? No, they are not.
Checked the node agent start and stop server logs, but there are no entries showing that a stop command was issued.
Checked the application server profile (server1) logs, but nothing is there.
FYI, I don't have clustering set up on WAS, but it is planned for the future.
I don't know where else I can look for the reason the node agent and server1 are getting stopped every day.
Okay, this is what I found out. In my case I have:
Dmgr01 - registered as a Windows service
Node agent - not registered as a Windows service
Application server - no need to register it (never register the application server as a service if you have a deployment manager)
Since my node agent was not registered as a Windows service, whenever I logged off or my session was killed due to inactivity, the default behavior is that all running processes (java.exe) associated with WebSphere are killed, without leaving any trace. This is why I was unable to find any logs.
I registered my node agent as a Windows service and everything worked.
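For reference, registering the node agent as a Windows service is normally done with the WASService.exe tool that ships with WebSphere. A minimal sketch; the service name and all paths below are placeholders for a typical profile layout, so adjust them to your install before running:

cd C:\IBM\WebSphere\AppServer\bin
WASService.exe -add "NodeAgent01" -serverName nodeagent ^
  -profilePath "C:\IBM\WebSphere\AppServer\profiles\AppSrv01" ^
  -wasHome "C:\IBM\WebSphere\AppServer" ^
  -logRoot "C:\IBM\WebSphere\AppServer\profiles\AppSrv01\logs\nodeagent" ^
  -logFile "C:\IBM\WebSphere\AppServer\profiles\AppSrv01\logs\nodeagent\startNode.log" ^
  -restart true -startType automatic

With -startType automatic the node agent starts at boot and keeps running independently of any interactive logon session.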
Where can I find whether an IIS reset job has been scheduled or when it last ran? I checked the Task Scheduler, but it did not lead me to anything. Any help will be appreciated.
Short Answer
Unfortunately, there is no short and fast way to see when IIS was last reset. This really needs a monitoring system in your environment.
Long Answer
IIS as a service is controlled by the World Wide Web Publishing Service in Windows Services. Therefore, any time that service is down, IIS is down (server reboot, stopping the service manually, etc.).
Service starts and stops are logged in the Windows Event Viewer as the service having stopped and started.
IIS has application pools, which are "containers" for websites to run in. These app pools may stop or restart for a variety of reasons:
The application pool was manually reset at some point using IIS Manager.
Application pools are by default set to restart every 1740 minutes (29 hours), so any request at that exact moment of restart is going to receive an error. You can check the configured interval with the appcmd command shown after this list.
At times, application pools crash due to exceptions in the website code.
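To see what your recycle and idle-timeout settings actually are, appcmd can list them. A quick sketch; the path is the default IIS location and "DefaultAppPool" is just an example name:

%windir%\system32\inetsrv\appcmd.exe list apppool "DefaultAppPool" /text:*

Look for recycling.periodicRestart.time and processModel.idleTimeout in the output.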
Just as with services, application pool restarts are logged in the Windows Event Viewer.
Example:
A worker process with process id of '1234' serving application pool '{your apppool name}' was shutdown due to inactivity. Application Pool timeout configuration was set to 20 minutes. A new worker process will be started when needed.
You can also look at the IIS logs (C:\inetpub\logs\LogFiles by default). Whenever you see the log header, it means the worker process was started at that time.
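If you just want a quick look from the command line, application pool events like the one quoted above are written to the System event log by the WAS (Windows Process Activation Service) source, so something like this should pull the most recent ones (the count of 20 is arbitrary):

wevtutil qe System /q:"*[System[Provider[@Name='WAS']]]" /c:20 /rd:true /f:text

The stops and starts of the World Wide Web Publishing Service itself show up in the same log under the Service Control Manager source.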
I have a GlassFish 4.1.1 installation with two domains on Windows Server 2012 R2 (no clusters, no instances). I've set up a Windows service for each of them.
Both services run fine up until the moment I restart either or both of them through their admin web console (server (Admin Server) -> Restart). Then the following happens:
The domain-related service stops, but does not start again,
The allegedly stopped domain is perfectly functional (deployed apps and admin console are there) (!!!),
When I try to start the Windows service manually, I get Error 1067 (GlassFish reports that "something" is already listening on the required ports, and that's the domain itself, which is now, somehow, NOT running as a service!),
I can start the service again only after I've stopped the domain through server (Admin Server) -> Stop.
Why did I mention two domains? Because this does not happen when I have just one domain with its service.
The domains do not share ports; the only things they have in common are the JDK/JRE and the general GlassFish files.
Is this a bug in Glassfish or did I set something wrong?
This is a limitation, rather than a bug. The problem is that GlassFish has no way to tell whether or not it is running as a service (and, if it is, what the name of that service would be).
The restart command means that GlassFish is restarting itself, so Windows detects that the process it started has been terminated and shows the service as stopped, but GlassFish spawns a new JVM itself. It has no capability to tell Windows to start the service again.
Essentially, the behaviour you are seeing is expected.
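If you do land in that state, the command-line equivalent of the workaround described in the question is to stop the stray domain process and then start the Windows service again. A sketch only, assuming the default domain name domain1, a default install path, and a service created with asadmin create-service (your service name may well differ; check it in services.msc):

C:\glassfish4\bin\asadmin.bat stop-domain domain1
net start domain1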
After some more testing, I realized what was going on:
GlassFish is definitely capable of restarting its own Windows service,
What was happening is that it takes GlassFish a few seconds to do this on its own,
But before the GlassFish domain could restart as a service, I clicked the URL to return to the admin console, every time. That forced it to run as an ordinary executable.
It does seem like the restart happens faster with just one Windows service, but I won't claim that as an absolute truth without more testing, for which I have no time right now.
TeamCity build agent is in a disconnected state ("Agent has unregistered (will upgrade)") in the server UI.
The build agent service was in a hung state. I tried a reboot, but it still didn't work, so I manually upgraded the TeamCity build agent to the version the server had and restarted the build agent service. It is still disconnected. Please suggest.
I ran into this issue and found a solution, but I'm going to make a few assumptions about your setup.
This fixed an issue I had with a TeamCity build agent on Windows running as a user account (as opposed to a system account).
Stopped the TeamCity service and changed the account to a System Account
Started the TeamCity service and waited about 10 minutes for the upgrade to complete. The build agent showed up in the "Connected" agents tab indicating a successful upgrade.
Stopped the TeamCity service and switched back to the user account
Started the TeamCity service
The other option is to grant the user account permissions to start/stop services, but I went this route instead. See this article for those steps.
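If you prefer to do the account switch from the command line rather than through the Services console, something like the following should work. This is a sketch; TCBuildAgent is the default build agent service name and an assumption here, and the user name and password are placeholders:

net stop TCBuildAgent
sc config TCBuildAgent obj= LocalSystem
net start TCBuildAgent
rem wait for the upgrade to finish, then switch back to your own account:
net stop TCBuildAgent
sc config TCBuildAgent obj= ".\tcagentuser" password= "********"
net start TCBuildAgent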
Old question, but someone might find my comments useful. If you can't read the upgrade logs, check the buildAgent/update/ folder. If files and file sizes are changing in that folder, it means the agent is updating and you only need to wait. If this is not the case, but you still see "Agent has unregistered (will upgrade)" in TeamCity under Agents --> Disconnected, then the agent is either hung or there is some other problem with it. Stop the agent from the services, then stop it again by running agent.bat (Windows) or agent.sh (*nix) with the stop argument, and start it from the same script with the start argument. You can also see the status of the agent with the status argument. If this still does not work, then you will have to read all the logs.
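On Windows that boils down to something like this (the agent install path is an assumption; use your own):

cd C:\BuildAgent\bin
agent.bat stop
agent.bat start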
This worked for me:
In the Agents tab, I removed the build agent by clicking "Remove Agent".
I restarted the service.
I refreshed the Agents tab, and the build agent appeared under Unauthorized Agents.
I authorized the agent and it's now connected.
For anyone who keeps restarting the build agent service and still sees "Agent has unregistered (will upgrade)", please check the logs under BuildAgent/logs to follow the upgrade process, and wait.
I just encountered this problem too, on Ubuntu Linux 19.10, and it is related to systemd. My TeamCity agents are started and stopped by a systemd unit, and apparently this is what prevents them from upgrading. When I stopped the TeamCity systemd services and started the agents manually with agent.sh start, the agents successfully updated and have worked just fine ever since.
It could be the permissions of the account under which the agent is running. In BuildAgent\Logs\Upgrade.txt, you may find this:
Upgrade failed: Failed to stop TeamCity build agent service. Please check TeamCity build agent service user have enough permissions to stop and start the service.
java.io.IOException: Failed to stop TeamCity build agent service. Please check TeamCity build agent service user have enough permissions to stop and start the service.
Although the service appears to be running fine on the machine (Windows in my case), it writes the error to its own log rather than to the Event Viewer or failing to start, and it disconnects from TeamCity on upgrade.
I gave the account higher privileges and it started to work. +1 to Lemtronix's approach if you do not want to change the permissions of your service account permanently.
I had the same issue. I triggered a build and the agent was automatically changed to connected status.
It looks like the agent tries to upgrade itself, but if your Windows service is set up to run under a non-admin account, it fails.
Options are:
Temporarily change the service account to System, as proposed by Lemtronix.
Add the user to the Administrators group and restart the service (see the commands below).
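For the second option, from an elevated prompt, it would look roughly like this; the domain\user name is a placeholder and TCBuildAgent is the default agent service name, which is an assumption:

net localgroup Administrators MYDOMAIN\tc-agent-user /add
net stop TCBuildAgent
net start TCBuildAgent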
I resolved this issue with TeamCity 2019.2.4 on Windows Server 2016 by completing the steps listed below (a command-line equivalent follows the list):
Stop the TeamCity Build Agent service.
Stop the TeamCity Server service.
Start the TeamCity Server service.
Start the TeamCity Build Agent service.
Refresh the TeamCity UI tab in your browser window, and wait a few moments for the status to reflect Connected in green.
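The same sequence from an elevated command prompt, assuming the default Windows service names TeamCity (server) and TCBuildAgent (agent); yours may differ:

net stop TCBuildAgent
net stop TeamCity
net start TeamCity
net start TCBuildAgent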
Changing the service user did not fix the issue for me, but this method did (Windows, with the TeamCity server and agents on the same machine).
Stop these services: the TeamCity server and all the TeamCity agents.
Open a cmd.exe instance as Administrator.
Run these commands (adapt them to your own paths and number of agents). This will start the TeamCity server and agents as plain processes (i.e. not as services):
cd C:\Teamcity\bin
.\runAll.bat start
cd C:\Teamcity\BuildAgent1\bin
.\agent.bat start
cd C:\Teamcity\BuildAgent2\bin
.\agent.bat start
cd C:\Teamcity\BuildAgent3\bin
.\agent.bat start
Check in the UI that the agents have registered again.
Kill the processes created in step 3, and start the TeamCity services again (see the sketch below).
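Rather than killing java.exe by hand, you can stop the temporary processes with the same scripts and then bring the services back up. A sketch using the same paths as above; the service names TeamCity and TCBuildAgent1 are assumptions:

cd C:\Teamcity\BuildAgent1\bin
.\agent.bat stop
rem repeat the agent stop for BuildAgent2 and BuildAgent3
cd C:\Teamcity\bin
.\runAll.bat stop
net start TeamCity
net start TCBuildAgent1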