Tomcat is running a webapp under Windows. After a few days (under very low load), the exception mentioned in the title starts to appear in the logs. From that point on no new connections can be established, and the only fix is to reboot the server.
Environment:
Latest Tomcat 6
Windows Server 2008 R2
JDK 6 update 30
SQL Server 2008
Kerberos authentication
Evidence collected so far:
netstat shows no excessive amount of connections
ProcessExplorer shows no excessive amount of open file handles
system main memory usage is average
JVM heap usage is average
restarting Tomcat does not solve the problem
Open questions:
if we were leaking connections, shouldn't they show up in netstat?
shouldn't a restart of the appserver resolve the problem, because the OS should free all process resources?
is there a way to trace the problem to its origin? E.g. installing monitoring software, maybe something similar to lsof etc.?
I'm out of ideas, any hints appreciated!
The reason we got this error is a bug in Windows Server 2008 R2 / Windows 7. The kernel leaks loopback sockets due to a race condition on machines with more than one core. This patch fixes the issue:
http://support.microsoft.com/kb/2577795
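To check whether a machine is currently in this state, a small probe that opens and connects a loopback socket can help. The following is a minimal Python sketch, not part of the original setup. The function name `probe_loopback` is hypothetical, and treating Winsock error 10055 (WSAENOBUFS, "no buffer space available") as the relevant code is an assumption about the exception involved.

```python
import socket

# Assumption: the failure of interest is Winsock's WSAENOBUFS.
WSAENOBUFS = 10055


def probe_loopback(timeout=2.0):
    """Try to listen on and connect to 127.0.0.1; return (ok, message)."""
    server = None
    client = None
    try:
        server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        server.listen(1)
        client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        client.settimeout(timeout)
        client.connect(server.getsockname())
        return True, "loopback connect succeeded"
    except OSError as exc:
        # On Windows the Winsock code is in exc.winerror; elsewhere use errno.
        code = getattr(exc, "winerror", None) or exc.errno
        if code == WSAENOBUFS:
            return False, "WSAENOBUFS (10055): no buffer space available"
        return False, "socket error %s: %s" % (code, exc)
    finally:
        for sock in (client, server):
            if sock is not None:
                sock.close()


if __name__ == "__main__":
    ok, message = probe_loopback()
    print("OK" if ok else "FAILED", "-", message)
```

Run on a schedule (for example from Task Scheduler), this would show roughly when the machine stops accepting new loopback connections, which helps to correlate the failure with other services on the box.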
I was running Alfresco Community 4.0d on Windows 7 64 bit and had the same symptoms and errors.
The problem was fixed with Microsoft's patch: "Kernel sockets leak on a multiprocessor computer that is running Windows Server 2008 R2 or Windows 7" (http://support.microsoft.com/kb/2577795), i.e. Buddy Casino's answer.
Another observation I'd like to add is that Windows connections (Internet Explorer, Remote Desktop, etc.) would work again about 5-10 minutes after the Alfresco services were shut down.
Alfresco is an excellent product and I was afraid I would have to scrap it. Fortunately Stack Overflow came to the rescue!
Thanks again to Buddy Casino's answer.
Boo to the person who down-voted the Question.
We are seeing the same thing on a similar setup: W2008R2, Tomcat 6.0.29, Java 1.6.0.25. Restarting Tomcat does not help, but restarting the server itself does, at least for a while. The last time it happened, we started shutting down individual services and believe we have narrowed it down to either an instance of Alfresco that is also running on the server or the Backup Exec Agent services. After those services (four in total) were stopped, the applications in Tomcat started working again, although, strangely, we were still seeing the buffer/connections error in the stdout log. We will need to wait for the problem to return before confirming which is the culprit, which could take anywhere from a few days to a week or more.
Any chance you are running either Alfresco or BE on your server?
I'm having a very strange problem with an application on Windows 10. It consists of several .exe files on the same computer that communicate with each other over sockets, using the System.Net.Sockets library.
The problem is that after installing Windows 10 on a new computer, installing all Windows updates and then installing the application, connecting to the sockets doesn't work correctly and the application fails. The strangest thing is that if you leave the computer alone for 1-2 days, the application starts working just fine. The same happened after installing the version 1803 update: it stopped working and then worked again one or two days later.
Any idea what it could be? Has anyone seen something similar?
It really seems to be related to the 1803 update you mentioned.
Symptoms:
Running an application from a network share will fail when creating a socket;
Copying the very same application to a local drive/path will work just fine, without any further modification.
We are also struggling with this while connecting to an Oracle database (both ODBC and ODP.NET) and it seems the issue has recently been acknowledged:
https://support.oracle.com/knowledge/Oracle%20Database%20Products/2399465_1.html
It also seems this is a recurrent Windows bug:
Win Socket Creation fails with Error code 10022 if non super user
https://social.msdn.microsoft.com/Forums/windowsdesktop/en-US/3076a9cd-57a0-418d-8de1-07adc3b486bb/socket-fails-with-error-10022-when-application-is-run-from-certain-network-shares-on-vista-and?forum=wsk
Sorry, no effective solution at the time (other than copying the app binaries to a local folder). I'll update this answer once we get a better solution.
OK, looking a little further I found here on SO that this might be related to an SMBv1 network share, which describes the environment we had here (the network share was disabled because of another bug we faced - thanks MSFT).
Re-enabling SMBv2 / SMBv3 on the server solved the issue.
Related post:
After Windows 10 update 1803 my program can't open a socket when running from network share
We have multiple applications developed in Visual FoxPro 8.0 running in a data center on Windows 2008 R2 on VMware. We also have a Citrix farm on the same network where users run yet another VFP 8.0 application in Citrix sessions. All applications share the same set of data tables located on a file server (also a Windows 2008 R2 VM). The virtual hosts are connected by a 10Gb LAN (managed switch).
Since mid-July we started seeing random 1104 "Error reading file..." errors on multiple different applications on multiple servers. All of them reference different files on the file server.
The problem started mid-July and its frequency gradually increased. Earlier it was most frequent in the afternoons around 3 pm; now it happens from early morning till late afternoon. It affects EDI servers (these run batch jobs in unattended mode) and Citrix servers, and a variety of applications. It occurs when a VFP application (any of them) tries to open a database container file or individual tables, most often with the USE command but sometimes when executing a SQL SELECT statement or when loading a VFP form that opens tables in its DataEnvironment.
We caught a moment when the same exact error happened on two different servers running different applications at the same exact moment (up to a second). We also saw two different applications running on the same computer erroring out at the same moment.
We replaced the file server with a new virtual machine with no relief (we have since changed it back to the old file server).
We disabled the antivirus.
We updated VMware on all hosts to the latest version.
Sysinternals Process Monitor displays "INVALID_NETWORK_RESPONSE" event when the error occurs.
We captured traffic on both the server side and client side when the error occurred and had it analyzed by a network analysis specialist. He observed a peculiar pattern, where the client OS starts retrieving the file in question from the file server AFTER the VFP application has thrown an error. It seems that the VFP application requests a file from the OS, then either gets an abnormal response or just times out, and only after that does the OS send packets requesting the file. Again, this happens sporadically.
OpLocks and SMB2 have been disabled on all computers both on the server and client side of the equation for many years and everything was running smoothly until now...
Any advice would be greatly appreciated.
My first piece of advice would be to re-enable OpLocks and SMB2. There is no reason to mess with either of those items as things stand today and you are losing a huge amount of performance running at SMB1 level.
In my experience these issues have almost always been caused by one of the following.
Antivirus/antimalware software.
Replication or online backup software like MozyPro.
The Windows Search indexing service.
You should consider installing the Windows 7 / Server 2008 R2 Enterprise Hotfix Rollup if you haven't already.
That problem is mostly related to SMB2!
Some antivirus software!
Windows updates! If you use VFP apps with DBF/DBC files, do not update your system/OS. That is my personal suggestion. Windows Server 2012+ or Windows 10+ will probably cause big problems in the near future.
And the most probable point is:
What is your I/O request rate per second? If it is higher than 1000~2000 requests per second for a DBF file, that is a bottleneck, and if your storage device is an HDD, you need to switch from HDD to SSD. I suggest an M.2 Pro series SSD.
I'm having a very strange problem. We're running a Java application, using Hibernate on top of JDBC, on a Windows Server 2012, on a VM.
When we try to read a lot of data from an Oracle 12c database, it's systematically super slow.
But once we run Wireshark once... It's instantly 100x faster! And it stays like this until we reboot the machine, even if we close Wireshark afterwards.
Any explanations? It really sounds like an issue with the Windows network card drivers.
Edit 1: We ruled out Hibernate: the problem happens with plain JDBC as well.
Edit 2: We ruled out WinPcap: Wireshark without it still fixes the problem.
I had a similar issue, but with a .NET application accessing an Oracle 12c database, and uninstalling WinPcap/Wireshark didn't solve the slowness problem.
As the environment in question runs under VMware, we looked into issues related to the vmxnet3 driver (under Windows 2012 R2) and found a workaround: disabling the Receive Segment Coalescing (RSC) feature on it. The problem went away without a reboot.
In PowerShell:
Disable-NetAdapterRsc *
Source: After upgrading a virtual machine to hardware version 11, network dependent workloads experience performance degradation (2129176)
The LAN has about half a dozen Windows XP Professional PCs and one Windows 7 Professional PC.
A Jet/Access 97 database file is acting as the database.
The method of access is via DAO (DAO350.dll) and the front-end app is written in VB6.
When an instance is created it immediately opens a global database object which it keeps open for the duration of its lifetime.
The Windows 7 machine had been acting as the file server for the last few months without any glitches.
Within the last week, instances of the app work for a while (say 30 minutes) on the XP machines and then fail on database operations, reporting connection errors (e.g. disk or network error, or unable to find such and such a table).
Instances on the Windows 7 machine work normally.
Moving the database file to one of the XP machines has the effect that the app works fine on ALL the XP machines but the error occurs on the Windows 7 machine instead.
Just before the problem became apparent a newer version of the app was installed.
Uninstalling and installing the previous version did not solve the problem.
No other network changes that I know of were made although I am not entirely sure about this as the hardware guy did apparently visit about the same time the problems arose, perhaps even to do something concerning online backing up of data. (There is data storage on more than one computer) Apparently he did not go near the win 7 machine.
Finally I know not very much about networks so please forgive me if the information I provide here is superfluous or deficient.
I have tried turning off the antivirus on the Windows 7 machine, restarting, etc., but nothing seems to work.
It is planned to move our database from Jet to SQL Server Express in the future.
I need some suggestions as to the possible causes of this so that I can investigate it further. Any suggestions would be greatly appreciated.
UPDATE 08/02/2011
The issue has been resolved by the hardware guy who visited the client today. The problem was that on this particular LAN the IP addresses were allocated dynamically except for the Win 7 machine which had a static IP address.
The static address happened to lie within the range from which the dynamic addresses were being selected. This wasn't a problem until last week when a dynamic address was generated that matched the static one and gave rise to the problems I described above.
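For anyone wanting to check their own LAN for this kind of overlap, here is a minimal Python sketch using the standard `ipaddress` module. The function name and all the addresses are made up for illustration; the actual addresses on this LAN are not known.

```python
import ipaddress


def static_in_dhcp_pool(static_ip, pool_start, pool_end):
    """Return True if a statically assigned address lies inside the DHCP pool."""
    ip = ipaddress.ip_address(static_ip)
    start = ipaddress.ip_address(pool_start)
    end = ipaddress.ip_address(pool_end)
    return start <= ip <= end


if __name__ == "__main__":
    # Hypothetical values: a pool of .10 to .100 and a static host at .50.
    if static_in_dhcp_pool("192.168.1.50", "192.168.1.10", "192.168.1.100"):
        print("Conflict risk: static address is inside the DHCP pool")
```

The usual remedy is either to move the static address outside the pool or to add a DHCP reservation/exclusion for it on the router.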
Thanks to everyone for their input and thanks for not closing the question.
Having smart knowledgeable people to call on is a great help when you're under pressure from an unhappy customer and the gaps in your own knowledge mean that you can't confidently state that your software is definitely not to blame.
I'd try:
Validate that the same DAO and ODBC drivers are used on both the XP and Vista machines.
Is the LAN a single broadcast domain? If not, rewire. (If routers are required, make sure WINS is working.)
Upgrade to MS SQL. It could be just a day of well-spent work ;-)
regards,
//t
I've just installed Windows 7 x64 Ultimate on my desktop PC. I installed IIS, Visual Studio 2008, registered ASP.NET, etc.
I have this ASP.NET 3.5 website I'm working on running EXTREMELY slow on this new IIS. On STA and PROD servers (Windows 2003 Server) and on my old XP/IIS 5.1 everything runs smoothly.
A page which usually takes 1-2 seconds to load is taking 8 seconds!!!
I saw this post on IIS forum. It says something about Vista/7 not pooling connections (just to let you know, the website is running locally but it's connecting to a SQL Server 2005 hosted on a remote server).
It seems that it takes a while to "start loading" the page... I mean, I click refresh and it stays for several seconds "Waiting for localhost"... Then when it gets response it loads the whole page normally...
I don't have a clue how to force Win7/IIS7.5 to pool database connections.
EDIT: I've created a new empty ASP.NET web application to see if the problem happens there too. The answer is no: it responds as fast as it should with an empty default page. Maybe it is something related to the DB connection. I will do a further test. There should be a way to fix it...
EDIT 2: Debugging the app I noticed that the delay occurs AFTER the execution of .NET code (Page_Load, etc)... so the delay seems to be somewhere when IIS serves the page to the browser.
For those having the same problem, here are two possible solutions.
1) Disabling IPv6 support in Firefox (only for Firefox)
Most of the authors that I found out about suggest this approach as quickest and cleanest solution. What you need to do is basically to open configuration settings in Firefox (about:config) and to change network.dns.disableIPv6 setting to true.
2) Change localhost settings in your hosts file (all browsers)
This came to me as an idea to check where and how can I interfere in IPv6 settings on my machine. I saw one of the comments on above mentioned sources saying that one can get rid of the problem by simply replacing localhost with machine name in the url.
It didn’t take me long to check and see that disabling my IPv6 localhost lookup does the same thing as disabling IPv6 directly in Firefox.
What you need to do is basically to comment out (or delete) the IPv6 localhost line in your hosts file. Commented out, it looks like this:
#::1 localhost
Note: ::1 is the IPv6 equivalent of the IPv4 loopback address 127.0.0.1.
I believe the second solution might be more suitable for users who do not want to disable IPv6 in general, and the first one for all others that still do not use IPv6 in their regular work.
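To see which localhost mappings are actually active in a hosts file, a short script can help. This is a minimal Python sketch under the assumption that the file uses the usual "address name [name ...]" layout with # comments; the function name `active_localhost_entries` is hypothetical.

```python
def active_localhost_entries(hosts_text):
    """Return the addresses that an uncommented hosts line maps to 'localhost'."""
    addresses = []
    for line in hosts_text.splitlines():
        # Drop everything after '#', so commented-out lines are ignored.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        address, names = fields[0], fields[1:]
        if any(name.lower() == "localhost" for name in names):
            addresses.append(address)
    return addresses


if __name__ == "__main__":
    sample = "127.0.0.1 localhost\n#::1 localhost\n"
    print(active_localhost_entries(sample))
```

On a real machine you would read the file itself (on Windows it is normally at C:\Windows\System32\drivers\etc\hosts) and pass its contents to the function. If ::1 shows up in the result, the IPv6 mapping is still active.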
I was having the same issue: extremely dead slow site performance using IIS 7.5 on Windows 7 64-bit with a Core 2 Duo with 4GB RAM and 3 Application Pool Processes running only 1 website. Here's what I did to get the speed back to IIS, problem solved...
The trick for me was to run IIS using 32-bit workers, as instructed by Microsoft on IIS.net, which you can read here:
http://learn.iis.net/page.aspx/201/32-bit-mode-worker-processes/
Simple solution provided (I don't want to rewrite it here)... Either you can run a 1-line command from the Windows Command Prompt or a 1-line command from Windows PowerShell. I just ran it from the command line (make sure you open Command Line or PowerShell as Administrator -- right-click > Run as Administrator).
Thanks,
Marty McGee
You can try running multiple processes as application pools:
Open IIS
Click Application Pools
Right click the app pool for your app and click Advanced Settings
Find the "Maximum Worker Processes" and update it to 3 (or the number of processes you want to allow to run).
I know the OP was running IIS 7.5 and this may not apply to him, but I'm posting this as it might help others running IIS Express 8.0. I had the same problem and none of the IPv6 or hosts file changes worked for me. My ASP.NET MVC 4 project was really slow after hitting F5 to refresh JS changes on localhost. It was happening across all browsers: Chrome, FF, and IE. Eventually I discovered that IIS Express 8.0 is extremely slow when serving up JS files, which seems to be a bug. If I ran iisexpress on the command line and hit F5, I could see that each JS file took 4 or 5 seconds to load.
I ended up uninstalling IIS Express 8.0 and installing IIS Express 7.5, and straight away the problem was fixed. Here are the steps I followed:
Uninstall IIS Express 8.0
Delete the IISExpress folder (on Win 7 it's in My Documents\IISExpress)
Install IIS Express 7.5 (Link to IIS Express 7.5 download)
IIS Express 8.0 seems to be installed with VS 2012 so if you had a new install or possibly a service pack update this might upgrade the previous IIS Express version.