Memurai hangs while processing - windows

I am using Memurai 2.0.2 for cache in my distributed application. It runs different services on different machines and all services have Memurai details with them.
The problem that happens is, that sometimes Memurai process just hangs. The Memurai process keeps on running but no queries are served. I am not able to create a connection to it. It's log file consists of an error:
Error trying to rename the existing AOF to old tempfile: Broken pipe
This generally occurs when I restart the Memurai service. Although I am not sure what is the reason for it. Memurai works fine if I restart its service once.
What can be the issue here? What steps can I take to avoid/ minimize its occurrence?

Memurai 2.0.2 is fairly outdated now. Perhaps get the latest version (3.1.4 at the time of this response) at https://www.memurai.com/get-memurai

For whoever looking for an answer, this happened because another service restarted Memurai service when background rewriting of AOF was in progress. Due to this, some zombie processes were getting created and when Memurai started again, this error was coming up.
The solution that we did was to check if any background rewriting is happening by using settings aof_rewrite_scheduled and aof_rewrite_in_progress from Persistence info. If any of these flags is true then don't stop the service.

Related

How do you troubleshoot all Apache threads becoming occupied and idle?

I have a Drupal 6 site that is frequently (about once a day) going down. The hosting provider is reporting that something in our site code is occupying all Apache threads but keeping them idle, making the server run out of threads to respond to new requests. A simple restart of Apache frees the threads and fixes the issue, though it reoccurs within a few hours or a day.
I have no idea how to troubleshoot this issue and have never come across PHP code doing this. Is there some kind of Apache settings change I can make to capture more information about what might be keeping a thread occupied but idle? What typical PHP routines can cause this behavior? I looked for code that connects to external resources, but didn't see any issues there.
Any hints for what to look at, capture more information, or PHP code that can cause this would be most useful.
With Drupal6 you could have the poormanscron module running sometimes, or even the classical cron (from crontab wget or whatever).
Then you could get one heavy cron operation putting your database under heavy stuff. Then if your database reponse time is becoming very slow every http request will become very slow (as for example sessions are in the database, and several hundreds queries are required for a drupal page). having all reqests slowing down may put all the avàilable php process in a 'occupied state'.
Restarting apache all current process are stoped. If you run the cron via wget and not via drush cron tasks are a nice thing to check (running cron via drush would make it run via php-cli' restarting apache would not kill the cron). You can try a module like elysia cron to get more details on cron tasks and maybe isolate the long ones (you have a report on tasks duration).
This effect (one request hurting bad the database, all requests slowing down, no more process available) could also be done by one bad piece of code coming from any of your installed modules. This would be harder to detect.
So I would ensure slow queries are tracked on MySQL (see my.cnf otinons), then analyse theses requests with tolls like mysqsla. The problem is that sometimes one query is so big that all query becames slow. Se use time of crash te detect the first ones. Use also tho MySQL option to track queries not using indexes.
Another way to get all apache process stalled on php operation with drupal is having a lock problem. Drupal is using is own lock implementation with MySQL. You could maybe add some watchdog (drupal internal debug messages) calls on theses files to try to detect locks problems.
Then you could also have sonme external http requests calls made by drupal. Calling external websites like facebook, google, some tiny url tools, or drupal.org module update things (which always try to find all modules, even the one you write). If the distant website is down or filtering your traffic you'll have problems (but the apache restart would not help you, so it may not be that).

Troubleshooting MVC4 Web API Performance Issues

I have an asp.net mvc4 web api interface that gets about 54k requests a day.
http://myserv.x.com/api/123/getstuff?whatstuff=thisstuff
I have 3 web servers behind a load balancer that are setup to handle the http requests.
On average response times are ~300ms. However, lately something has gone awry (or maybe it has always been there) as there is sporadic behavior of response times coming back in 10-20sec. This would be for the same request hitting the same server directly instead of through the load balancer.
GIVEN:
- System has been passed down to me so there may be gaps with IIS confiuration, etc,.
- Database: SQL Server 2008R2
- Web Servers: Windows Server 2008R2 Enterprise SP1
- IIS 7.5
- Using MemoryCache aggressively with Model and Business Objects with eviction set to 2hrs
- Looked at the logs but really don't see anything significantly relevant
- One application pool...no other LOB applications running on this server
Assumptions & Ask:
Somehow I'm thinking that something is recycling the application pool or IIS worker threads are shutting down and restarting thus causing each new request to warmup and recache itself. It's so sporadic that it's tough to trouble shoot right now. The same request to the same server comes back fast as expected (back to back N requests) since it was cached in about 300ms....but wait about 5-10-20min and that same request to the same server takes 16seconds.
I have limited tracing to go by as these are prod systems so I can only expose so much logging details. Any help and information attacking this or similar behavior somebody else has run into is appreciated. Thx
UPDATE:
The w3wpe.exe process grows to ~3G. Somehow it gets wiped out and the PID changes so itself or something is killing it every 3-4min I see tons of warnings in my webserver (IIS) log:
A process serving application pool 'MyApplication' suffered a fatal
communication error with the Windows Process Activation Service. The
process id was '1732'. The data field contains the error number.
After 4-5 days of assessing IIS and configuration vs internal code issues I finally found the issue with little to no help with windbg or debugdiag IIS tools. Those tools contain so much information even with mini dumps or log trace stacks that they can be red herrings. Best bet was to reproduce it by setting up a "copy intelligently" instance of a production system, which we did not have at the time and took a bit for ops to set something up.
Needless to say the problem had to do with over cacheing business objects. There was one race condition where updates on a certain table were updating an attribute to that corresponding business object (updates were coming from multiple servers) which was causing an OOC stackoverflow that pretty much caused the cacheing to recursively cache itself to death thus causing the w3wp.exe process to die and psuedo-recycle itself. It was one of those edge cases that was incredibly hard to test and repro in a non-production environment.

AppFabric velocity as a state server

Is anybody using Windows AppFabric Server for out of process state management?
Any feedback, advice would be appreciated.
Using AppFabric Caching. We tried this and it appeared to work, was easy to setup etc. There are some very strange settings when setting up the cache about peristance which need to be read carefully.
Our issue was on two server we installed IIS and Appfabric Caching and told app to try the local one first. When we went into production is just started to fail. It appears that with only two servers there is a lead server which if it goes down things stop working, we read that if needed to scale to 3 or more servers to get the behaviour we wanted. Not an option when just gone live and not working so we switched to SQL server for now while we look at nCache and ScaleOut and Memchached
The other issue is that caching and session state are not the same animal, if you loose your cache it should not be the end of the world just put it back together, we need to keep session state for the lotted time period at all costs.

Jboss failover testing

I have a peculiar situation here. I have installed JBoss 5.1.0 as a service in Wintel box.
The service will restart itself if the JBoss instance fails.
However I could not find a way to test this scenario. I killed the JVM that was running the JBoss, but it did not restart the service. I need to make the JBoss service end abnormally so that I can ensure it is restarts again.
In a nutsehll, I need a way to make JBoss end abnormally.
Please help.
Write a JSP that calls System.exit(1)? That might fall foul of the Security Manager, though, and JBoss might not permit it.
In my experience, JBoss nodes (and app servers in general) tend not to crash in such a way as the process exists. Instead, they're more likely to consume increasing resources (e.g. memory) until they stop responding, and need explicit restarting. That's certainly easier to reproduce, but it's harder to handle automatically.

Error 1053: the service did not respond to the start or control request in a timely fashion

I have recently inherited a couple of applications that run as windows services, and I am having problems providing a gui (accessible from a context menu in system tray) with both of them.
The reason why we need a gui for a windows service is in order to be able to re-configure the behaviour of the windows service(s) without resorting to stopping/re-starting.
My code works fine in debug mode, and I get the context menu come up, and everything behaves correctly etc.
When I install the service via "installutil" using a named account (i.e., not Local System Account), the service runs fine, but doesn't display the icon in the system tray (I know this is normal behavior because I don't have the "interact with desktop" option).
Here is the problem though - when I choose the "LocalSystemAccount" option, and check the "interact with desktop" option, the service takes AGES to start up for no obvious reason, and I just keep getting
Could not start the ... service on Local Computer.
Error 1053: the service did not respond to the start or control request in a timely fashion.
Incidentally, I increased the windows service timeout from the default 30 seconds to 2 minutes via a registry hack (see http://support.microsoft.com/kb/824344, search for TimeoutPeriod in section 3), however the service start up still times out.
My first question is - why might the "Local System Account" login takes SOOOOO MUCH LONGER than when the service logs in with the non-LocalSystemAccount, causing the windows service time-out? what's could the difference be between these two to cause such different behavior at start up?
Secondly - taking a step back, all I'm trying to achieve, is simply a windows service that provides a gui for configuration - I'd be quite happy to run using the non-Local System Account (with named user/pwd), if I could get the service to interact with the desktop (that is, have a context menu available from the system tray). Is this possible, and if so how?
Any pointers to the above questions would be appreciated!
After fighting this message for days, a friend told me that you MUST use the Release build. When I InstallUtil the Debug build, it gives this message. The Release build Starts fine.
If you continue down the road of trying to make your service interact with the user's desktop directly, you'll lose: even under the best of circumstances (i.e. "before Vista"), this is extremely tricky.
Windows internally manages several window stations, each with their own desktop. The window station assigned to services running under a given account is completely different from the window station of the logged-on interactive user. Cross-window station access has always been frowned upon, as it's a security risk, but whereas previous Windows versions allowed some exceptions, these have been mostly eliminated in Vista and later operating systems.
The most likely reason your service is hanging on startup, is because it's trying to interact with a nonexistent desktop (or assumes Explorer is running inside the system user session, which also isn't the case), or waiting for input from an invisible desktop.
The only reliable fix for these issues is to eliminate all UI code from your service, and move it to a separate executable that runs inside the interactive user session (the executable can be started using the global Startup group, for example).
Communication between your UI code and your service can be implemented using any RPC mechanism: Named Pipes work particularly well for this purpose. If your communications needs are minimal, using application-defined Service Control Manager commands might also do the trick.
It will take some effort to achieve this separation between UI and service code: however, it's the only way to make things work reliably, and will serve you well in the future.
ADDENDUM, April 2010: Since this question remains pretty popular, here's a way to fix another common scenario that causes "service did not respond..." errors, involving .NET services that don't attempt any funny stuff like interacting with the desktop, but do use Authenticode signed assemblies: disable the verification of the Authenticode signature at load time in order to create Publisher evidence, by adding the following elements to your .exe.config file:
<configuration>
<runtime>
<generatePublisherEvidence enabled="false"/>
</runtime>
</configuration>
Publisher evidence is a little-used Code Access Security (CAS) feature: only in the unlikely event that your service actually relies on the PublisherMembershipCondition will disabling it cause issues. In all other cases, it will make the permanent or intermittent startup failures go away, by no longer requiring the runtime to do expensive certificate checks (including revocation list lookups).
I faced this problem because of a missing framework on the box running my service. The box had .NET 4.0 and the service was written on top of .NET 4.5.
I installed the following download on the box, restarted, and the service started up fine:
http://www.microsoft.com/en-us/download/details.aspx?id=30653
To debug the startup of your service, add the following to the top of the OnStart() method of your service:
while(!System.Diagnostics.Debugger.IsAttached) Thread.Sleep(100);
This will stall the service until you manually attach the Visual Studio Debugger using Debug -> Attach to Process...
Note: In general, if you need a user to interact with your service, it is better to split the GUI components into a separate Windows application that runs when the user logs in. You then use something like named pipes or some other form of IPC to establish communication between the GUI app and your service. This is in fact the only way that this is possible in Windows Vista.
In service class within OnStart method don't do huge operation, OS expect short amount of time to run service, run your method using thread start:
protected override void OnStart(string[] args)
{
Thread t = new Thead(new ThreadStart(MethodName)); // e.g.
t.Start();
}
I'm shooting blind here, but I've very often found that long delays in service startups are directly or indirectly caused by network function timeouts, often when attemting to contact a domain controller when looking up account SIDs - which happens very often indirectly via GetMachineAccountSid() whether you realize it or not, since that function is called by the RPC subsystem.
For an example on how to debug in such situations, see The Case of the Process Startup Delays on Mark Russinovich's blog.
If you are using Debug code as below in your service the problem may arise.
#if(!DEBUG)
ServiceBase[] ServicesToRun;
ServicesToRun = new ServiceBase[]
{
new EmailService()
};
ServiceBase.Run(ServicesToRun);
#else
//direct call function what you need to run
#endif
To fix this, while you build your windows service remove #if condition because it didn't work as it is.
Please use argument for debug mode instead as below.
if (args != null && args.Length > 0)
{
_isDebug = args[0].ToLower().Contains("debug");
}
In my case the problem was missing version of .net framework.
My service used
<startup>
<supportedRuntime version="v4.0" sku=".NETFramework,Version=v4.5" />
</startup>
But .net Framework version of server was 4, so by changing 4.5 to 4 the problem fixed:
<startup>
<supportedRuntime version="v4.0" sku=".NETFramework,Version=v4.0" />
</startup>
Copy the release DLL or get the dll from release mode rather than Debug mode and paste it to installation folder,,it should work
I was running into a similar problem with a Service I was writing. It worked fine then one day I started getting the timeout on Start errors. It happened in one &/or both Release and Debug depending on what was going on. I had instantiated an EventLogger from System.Diagnostics, but whatever error I was seeing must have been happening before the Logger was able to write...
If you are not aware of where to look up the EventLogs, in VS you can go to your machine under the Server Explorer. I started poking around in some of the other EventLogs besides those for my Service. Under Application - .NETRuntime I found the Error logs pertinent to the error on startup. Basically, there were some exceptions in my service's constructor (one turned out to be an exception in the EventLog instance setup - which explained why I could not see any logs in my Service EventLog). On a previous build apparently there had been other errors (which had caused me to make the changes leading to the error in the EventLog set up).
Long story short - the reason for the timeout may be due to various exceptions/errors, but using the Runtime EventLogs may just help you figure out what is going on (especially in the instances where one build works but another doesn't).
Hope this helps!
Install the debug build of the service and attach the debugger to the service to see what's happening.
I want to echo mdb's comments here. Don't go this path. Your service is not supposed to have a UI... "No user interaction" is like the definining feature of a service.
If you need to configure your service, write another application that edits the same configuration that the service reads on startup. But make it a distinct tool -- when you want to start the service, you start the service. When you want to configure it, you run the configuration tool.
Now, if you need realtime monitoring of the service, then that's a little trickier (and certainly something I've wished for with services). Now you're talking about having to use interprocess communications and other headaches.
Worst of all, if you need user interaction, then you have a real disconnect here, because services don't interact with the user.
In your shoes I would step back and ask why does this need to be a service? And why does it need user interaction?
These two requirements are pretty incompatible, and that should raise alarms.
I had this problem and it drove me nuts for two days…
If your problem similar to mine:
I have settings “User settings” in my windows service, so the service can do self-maintenance, without stopping and starting the service. Well, the problem is with the “user settings”, where the config file for these settings is saved in a folder under the user-profile of the user who is running the windows service under the service-exe file version.
This folder for some reason was corrupted. I deleted the folder and service start working back again happily as usual…
I had this problem, it took about a day to fix. For me the problem was that my code skipped the "main content" and effectively ran a couple of lines then finished. And this caused the error for me. It is a C# console application which installs a Windows Service, as soon as it tried to run it with the ServiceController (sc.Run() ) then it would give this error for me.
After I fixed the code to go to the main content, it would run the intended code:
ServiceBase.Run(new ServiceHost());
Then it stopped showing up.
As lots of people have already said, the error could be anything, and the solutions people provide may or may not solve it. If they don't solve it (like the Release instead of Debug, adding generatePublisherEvidence=false into your config, etc), then chances are that the problem is with your own code.
Try and get your code to run without using sc.Run() (i.e. make the code run that sc.Run() would have executed).
This problem usually occurs when there is some reference missing on your assembly and usually the binding fails at the run time.
to debug put Thread.Sleep(1000) in the main(). and put a break point in the next line of execution.
Then start the process and attach the debugger to the process while it is starting. Press f5 after it hit the break point. It will throw the exception of missing assembly or reference.
Hopefully this will resolve this error.
Once try to run your exe file. I had the same problem, but when I ran it direct by double click on the exe file, I got a message about .Net framework version, because I was released the service project with a framework which it wasn't installed on target machine.
Took me hours, should have seen the event viewer get_AppSettings().
A change in the app config, caused the problem.
Adding 127.0.0.1 crl.microsoft.com to the "Hosts" file solved our issue.
My issue was due to target framework mentioned in windows service config was
<startup>
<supportedRuntime version="v4.0" sku=".NETFramework,Version=v4.6"/>
</startup>
and my server in which I tried to install windows service was not supported for this .Net version.
Changing which , I could able to resolve the issue.
I had a similar issue, steps I followed:
Put a Debugger.Launch() in the windows service constructor
Followed step by step to see where it got stuck
My issue wasn't due to any error.
I had a BlockingCollection.GetConsumingEnumerable() in the way that caused the windows service to wait.
I had this problem too. I made it to work by changing Log On account to Local System Account. In my project I had it setup to run as Local Service account. So when I installed it, by default it was using Local Service. I'm using .net 2.0 and VS 2005. So installing .net 1.1 SP1 wouldn't have helped.
Both Local System Account and Local Service would not work for me, i then set it to Network Service and this worked fine.
In my case, I had this trouble due to a genuine error. Before the service constructor is called, one static constructor of member variable was failing:
private static OracleCommand cmd;
static SchedTasks()
{
try
{
cmd = new OracleCommand("select * from change_notification");
}
catch (Exception e)
{
Log(e.Message);
// "The provider is not compatible with the version of Oracle client"
}
}
By adding try-catch block I found the exception was occuring because of wrong oracle version. Installing correct database solved the problem.
I also faced similar problem and found that there was issue loading assembly. I was receiving this error immediately when trying to start the service.
To quickly debug the issue, try to run service executable via command prompt using ProcDump http://technet.microsoft.com/en-us/sysinternals/dd996900. It shall provide sufficient hint about exact error.
http://bytes.com/topic/net/answers/637227-1053-error-trying-start-my-net-windows-service helped me quite a bit.
This worked for me. Basically make sure the Log on user is set to the right one. However it depends how the account infrastructure is set. In my example it's using AD account user credentials.
In start up menu search box search for 'Services'
-In Services find the required service
-right click on and select the Log On tab
-Select 'This account' and enter the required content/credentials
-Ok it and start the service as usual
In case you have a windows form used for testing, ensure that the startup object is still the service and not the windows form
We have Log4Net configured to log to a database table. The table had grown so large that the service was timing out trying to log messages.
open the services window as administrator,Then try to start the service.That worked for me.
Build project in Release Mode.
Copy all Release folder files to source path.
Execute Window service using command prompt window in administrative access.
Never delete files from source path.
At lease this works for me.
Release build did not work for me, however, I looked through my event viewer and Application log and saw that the Windows Service was throwing a security exception when it was trying to create an event log. I fixed this by adding the event source manually with administration access.
I followed this guide from Microsoft:
open registry editor, run --> regedit
Locate the following registry subkey:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Eventlog\Application
Right-click the Application subkey, point to New, and then click Key.
Type event source name used in your windows service for the key name.
Close Registry Editor.

Resources