What does "Non HTTP response message: Socket closed" mean? - jmeter

What does "Non HTTP response message: Socket closed" mean?
During a load test we see the call to a third party fail about 30% of the time.

This is a SocketException, and according to its description:
Thrown to indicate that there is an error creating or accessing a Socket.
In your situation it is the latter case: "Socket closed" means that JMeter either tried to send data to a socket which has already been closed or attempted to read from a closed socket.
In the majority of cases the error indicates a problem on the server side, so you should not be worrying about the JMeter side of things. However, if you are absolutely sure that the system under test behaves correctly, you can follow the recommendations from the JMeterSocketClosed wiki page, where several workarounds are listed.
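For illustration, here is a minimal plain-Java sketch (not JMeter code) of how the second case arises: reading from a socket after it has been closed locally throws a SocketException whose message is typically "Socket closed".

import java.io.InputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketException;

public class SocketClosedDemo {
    public static void main(String[] args) throws Exception {
        // A throwaway local server so the client has something to connect to.
        try (ServerSocket server = new ServerSocket(0)) {
            Socket client = new Socket("localhost", server.getLocalPort());
            InputStream in = client.getInputStream();

            client.close();           // the socket is closed here...

            try {
                in.read();            // ...so this read cannot succeed
            } catch (SocketException e) {
                // Prints something like: java.net.SocketException: Socket closed
                System.out.println(e);
            }
        }
    }
}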
Also, with regard to your "call to third party" remark: ideally you should focus only on the domain(s) related to the application you're testing. All third-party content like images, banners, videos, maps and so on should be excluded from your testing scope because:
you might not be allowed to create excessive traffic against these components, as it can be considered a DoS attack
even if you detect a problem with a third-party component, it won't be something you can efficiently control or fix
See the Excluding Domains from the Load Test article for more information and reasons to reconsider your approach.

Related

What constitutes a true error in JMeter when the server always returns 200?

I am load testing an e-commerce site. Under medium to heavy loads, JMeter reports an unusually high number of errors, i.e. success=false in the resulting .csv file, which translates to a fairly high error percentage in the JMeter dashboard report.
On inspecting the Kibana logs, other than a few WARNs there aren't any errors, and the status of all the HTTP requests is 200. In fact, I am able to verify that the test users created in my flow are able to log in and view whatever the test was supposed to do.
My question is: how is JMeter determining this error (non-200?) when the server side returns nothing but 200? With fewer threads, the errors are 0%.
First I would check whether any assertions can fail for any reason. So disabling all assertions and running the most primitive version of the script, with no possibility of failure other than the sampler response itself, would be my first step. If that doesn't help...
The HTTP Sampler will fail for a return code < 200 or >= 400, or for any Java exception, which happens when the request could not be sent or the response from the server was not received (in full). Most commonly these are timeouts related to various underlying problems:
Lack of available ports on the JMeter side. Since you indicated that the problem happens only with more JMeter users, this is a possibility. On paper you have 65535 ports, but both Windows and Linux limit the number of available ports to a few thousand by default. Make sure you have enough ports to run the load, and monitor port usage as you ramp threads up.
Even if all ports are available, many of them may sit for a few minutes in the TIME_WAIT or CLOSE_WAIT state, depending on how your client and server interact and on other network issues. So theoretically you may have 65k ports, but practically you don't have enough. In that case it's worth checking why it happens (it can point to a bug) and possibly reducing the time ports spend in these *_WAIT states.
You also need to make sure you have enough RAM, JVM heap and CPU: monitor for any bottleneck that could cause slowness severe enough to prevent a sampler from finishing its work successfully.
If all of the above is working as expected, then either the server or something between client and server (a load balancer, proxy or firewall) is causing this issue.
I would start with the server and verify that all the requests you are sending are actually received. E.g. if you send 2000 requests and the server receives 1000, all of which finish successfully, then from the server's perspective everything is fine, but JMeter will report a 50% error rate.
Also check your JMeter log configuration and make sure failures and exceptions are written to the logs properly (they are multi-line messages and may get mangled by the log layout, in which case you won't see them). For example, run an HTTP Sampler that points to a non-existent server and see whether you can diagnose the returned exception.
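One way to surface the underlying failure regardless of the log layout is to log it yourself. The sketch below assumes a JSR223 PostProcessor (Groovy engine, plain Java syntax) attached to the sampler; prev and log are the standard variables JMeter exposes to JSR223 elements.

// Runs after each sample; writes a one-line summary for failed samplers only.
if (!prev.isSuccessful()) {
    log.info("FAILED sampler '" + prev.getSampleLabel() + "'"
            + " code=" + prev.getResponseCode()
            + " message=" + prev.getResponseMessage());
}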
Beyond that, figuring out which part is "lying" to you would be the approach, but it's hard to offer more specific advice based on the current information.

Simulate slow speed for TCP sockets in Windows

I'm building an application that uses TCP sockets to communicate. I want to test how it behaves under slow-speed conditions.
There are similar questions on this site, but as I understand it, they deal with HTTP traffic or are about Linux. My traffic is not HTTP, just ordinary TCP sockets, and the OS is Windows.
I tried using Fiddler's Modem Speed setting but it didn't work; it seems to apply only to HTTP connections.
While it is true that you probably want to invest in an extensive set of unit tests, you can simulate various network conditions using VMware Workstation:
You will have to install a virtual machine for testing, set up bridged networking (for the VM to access your real network) and upload your code to the VM.
After that you can start changing the settings and see how your application performs.
NetLimiter can also be used, but it has fewer options (in your case, packet loss is very interesting to test and is not available in NetLimiter).
There is an excellent utility for Windows that can do throttling and much more:
https://jagt.github.io/clumsy/
I think you're taking the wrong approach here.
You can achieve everything that you need with some well-designed unit tests. All of the things that a slow network link causes can be simulated in a unit-test environment under controlled conditions.
Things that your code MUST handle to deal with "slow" links are just things that you should be dealing with anyway, including:
The correct handling of fragmented messages. All of your network-reading code needs to assume that each read will return between 1 byte and the size of your read buffer. You should never assume that you'll get complete 'messages', as TCP knows nothing of your concept of messages (see the read-loop sketch after this list).
TCP flow control causing either your synchronous sends to fail with some form of 'try later' error or your async sends to succeed and potentially use an uncontrolled amount of resources (see here for more details). Note that this can happen even on 'fast' links if you are sending faster than the receiver is consuming.
Timeouts - again, this isn't limited to "slow" links. All of your timeout-handling code should be robust and tested. You may want to base any read timeout on a read completing rather than on a complete message arriving within x time. You may be getting your data at a slow rate, but as long as you're still getting data the link is alive.
Connection failure - again not something specific to "slow" links. You need to know how you deal with connections being reset at any time.
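To make the first and third points concrete, here is a rough Java sketch of a read loop that never assumes whole messages and bases its timeout on individual reads; the 4-byte length prefix is just an assumed framing scheme for the example.

import java.io.DataInputStream;
import java.io.IOException;
import java.net.Socket;

public class FramedReader {
    private final DataInputStream in;

    public FramedReader(Socket socket) throws IOException {
        // The timeout applies to each individual read, not to a whole message:
        // as long as bytes keep trickling in, the link is treated as alive.
        socket.setSoTimeout(30_000);
        this.in = new DataInputStream(socket.getInputStream());
    }

    // Reads one message framed as: 4-byte big-endian length, then payload.
    public byte[] readMessage() throws IOException {
        int length = in.readInt();
        if (length < 0 || length > 10_000_000) {
            throw new IOException("Implausible frame length: " + length);
        }
        byte[] payload = new byte[length];
        int offset = 0;
        while (offset < length) {
            // read() returns anywhere between 1 byte and (length - offset) bytes,
            // or -1 if the peer closed the connection mid-message.
            int n = in.read(payload, offset, length - offset);
            if (n == -1) {
                throw new IOException("Connection closed mid-message");
            }
            offset += n;
        }
        return payload;
    }
}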
In summary, there is nothing you can test by running your client and server on a simulated slow network that cannot also be covered by a decent set of unit tests, and everything you would want to test on such a link is something that could affect any of your connections at any link speed.

How does Google Docs autosave work?

Okay, I know it sounds generic. But I mean on an AJAX level. I've tried using Firebug to track the NET connections and posts and it's a mystery. Does anyone know how they do the instant autosave constantly without DESTROYING the network / browser?
My guess (and this is only a guess) is that Google uses a push service. This seems like the most viable option, given that their chat client (which is also integrated within the window) uses it too, to deliver "real time" messages with minimal latency.
I'm betting they have a whole setup that manages everything connection-related and sends flags to trigger specific elements. You won't see connection triggers because the initial page visit establishes the connection, which then just stays open for the entire duration you have the page open, e.g.:
You visit the page
The browser establishes a connection to, for example, api.docs.google.com, and it remains open
The client-side code then sends various commands and receives an assortment of responses.
These commands are sent back and forth until you either:
Lose the connection (timeout, etc.), in which case it's re-established, or
Close the browser window
An example of how I imagine a typical exchange would look:
CLIENT: DOC_FETCH mydocument.doc
SERVER: DOC_CONTENT mydocument.doc 15616 ...
CLIENT: DOC_AUTOSAVE mydocument.doc 24335 ...
CLIENT: IM collaboratorName Hi Joe!
SERVER: IM_OK collaboratorName OK
SERVER: AUTOSAVE_OK mydocument.doc OK
Here the DOC_FETCH command says "I want the data". The server replies with the corresponding DOC_CONTENT <docname> <length> <contents>. Then the client triggers DOC_AUTOSAVE <docname> <length> <content>. Given the number of potential simultaneous requests, I would bet they keep some "context" in the requests/responses so that whatever is sent can be matched up afterwards. In this example, the client knows the IM_OK matches the second request (IM) and the AUTOSAVE_OK matches the first request (DOC_AUTOSAVE), something like how AOL's IM protocol works (a rough sketch of this correlation idea follows below).
Again, this is only a guess.
--
To prove this, use something like Ethereal (now Wireshark) and see if you can spot the information being transferred in the background.
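None of the above is confirmed, but the "keep context in the requests so responses can be matched up" idea is easy to sketch. The Java example below is purely hypothetical (all names are invented): each outgoing command gets an id, and whichever response comes back first completes the matching future, so IM_OK can arrive before AUTOSAVE_OK without confusing anything.

import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical correlation layer: every command carries an id so that
// out-of-order responses can still be matched to the request that caused them.
public class CommandChannel {
    private final AtomicLong nextId = new AtomicLong();
    private final Map<Long, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

    // Called by application code: sends "id command" and returns a future reply.
    public CompletableFuture<String> send(String command) {
        long id = nextId.incrementAndGet();
        CompletableFuture<String> reply = new CompletableFuture<>();
        pending.put(id, reply);
        transmit(id + " " + command);   // e.g. "42 DOC_AUTOSAVE mydocument.doc ..."
        return reply;
    }

    // Called by the connection's read loop whenever "id response" arrives.
    public void onResponse(long id, String response) {
        CompletableFuture<String> reply = pending.remove(id);
        if (reply != null) {
            reply.complete(response);   // wakes up whoever sent the matching command
        }
    }

    private void transmit(String frame) {
        // Placeholder: would write the frame to the long-lived connection.
    }
}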

What is meaning of 'Blocking' in Firebug Net Panel?

I'm using Firebug 1.5.2 and, while testing a site before a production release, I can see a huge amount of time consumed by the 'Blocking' parts of the requests.
What exactly does the 'Blocking' mean?
"Blocking" previously (earlier versions of FireBug) was called "Queuing". It actually means that request is sitting in queue waiting for available connection. As far as I know number of persistent connections by default is limited in last versions of Firefox to 6, IE8 also 6. Earlier it was only 2. It can be changed by user in browser settings.
Also as I know that while javascript file is loading, all other resources (css, images) are blocked
Blocking is a term used to describe an event that stops other events or code from processing (within the same thread).
For example if you use "blocking" sockets then code after the socket request has been made will not be processed until the request is complete (within the same thread).
Asynchronous activities (non blocking) would simply make the request and let other code run whilst the request happened in the background.
In your situation it basically means that certain parts of Firebug / the browser cannot proceed until other parts are complete, i.e. it is waiting for an image to download before downloading more.
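To make the blocking vs. non-blocking distinction concrete, here is a small Java sketch (example.com is just a placeholder host): the first read parks the calling thread until data arrives, while the non-blocking read returns immediately even when nothing has arrived yet.

import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;

public class BlockingVsNonBlocking {
    public static void main(String[] args) throws Exception {
        // Blocking: read() parks this thread until at least one byte arrives.
        try (Socket socket = new Socket("example.com", 80)) {
            OutputStream out = socket.getOutputStream();
            out.write("GET / HTTP/1.0\r\nHost: example.com\r\n\r\n".getBytes(StandardCharsets.US_ASCII));
            out.flush();
            InputStream in = socket.getInputStream();
            int firstByte = in.read();    // nothing below runs until the reply starts arriving
            System.out.println("blocking read returned: " + firstByte);
        }

        // Non-blocking: read() returns 0 immediately when no data is ready yet,
        // so the thread stays free to do other work and poll again later.
        try (SocketChannel channel = SocketChannel.open(new InetSocketAddress("example.com", 80))) {
            channel.configureBlocking(false);
            ByteBuffer buffer = ByteBuffer.allocate(1024);
            int n = channel.read(buffer); // 0 = nothing yet, -1 = closed, >0 = bytes read
            System.out.println("non-blocking read returned: " + n);
        }
    }
}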
As far as I know, two things cause components to block others from loading:
The browser's enforced (but usually configurable) limit on how many parallel resources can be loaded from a particular host at a time.
Inline JavaScript, which can cause the browser to wait and see whether it needs to download the rest of the components at all (just in case the JavaScript redirects or replaces the content of the page).
It means "waiting for connection". As explained in the official documentation by Mozilla, "Blocking" is "Time spent in a queue waiting for a network connection." That can be due to Firefox hitting its internal parallel connections limit, as explained there and in answers here.
It can also mean "waiting because server is busy". One possible reason for "Blocking" times is missing in the official documentation linked above: it can happen when the server cannot provide a connection at the time because it is overloaded. In that case, the connection request goes into a queue on the server until it can be processed once a worker process becomes free [source].
In a technical sense, such a connection is not yet established because the request is awaiting accept() from the server [source]. And maybe that is why it is subsumed under "Blocking" by Firefox, as it could also be considered "Time spent in a queue waiting for a network connection".
(This behaviour is not fully consistent as of Firefox 51 though: for the first URL called up in a new tab, the time before the server accepts the connection request is not counted at all in the "Timings" tab – only for subsequent URLs entered. Either of both behaviours could be a bug, I don't know which one.)
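If you want to see that kind of server-side queuing for yourself, a toy Java server that accepts connections slowly reproduces it. This is only an illustration; the port, delay and backlog below are arbitrary, and the exact behaviour of a full listen backlog varies by OS.

import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Toy "overloaded" server: it accept()s only one connection every two seconds
// and advertises a tiny listen backlog. Once the backlog fills, further
// connection attempts stall at the TCP-connect stage until a slot frees up,
// which is the kind of server-side waiting described above.
public class SlowAcceptServer {
    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(8080, 1)) { // port 8080, backlog 1
            while (true) {
                Thread.sleep(2_000);             // pretend every worker is busy
                Socket client = server.accept(); // finally take the next queued connection
                client.getOutputStream().write(
                        "HTTP/1.0 200 OK\r\n\r\nhello\r\n".getBytes(StandardCharsets.US_ASCII));
                client.close();
            }
        }
    }
}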

Meaning/cause of RPC Exception 'No interfaces have been exported.'

We have a fairly standard client/server application built using MS RPC. Both client and server are implemented in C++. The client establishes a session to the server, then makes repeated calls to it over a period of time before finally closing the session.
Periodically, however, especially under heavy load conditions, we are seeing an RPC exception show up with code 1754: RPC_S_NOTHING_TO_EXPORT.
It appears that this happens in the middle of a session. The user is logged on for a while, making successful calls, then one of the calls inexplicably returns this error. As far as we can tell, the server receives no indication that anything went wrong - and it definitely doesn't see the call the client made.
The error also appears to be permanent: having the client retry the connection doesn't work either. However, if the user has multiple sessions active simultaneously between the same client and server, the other connections are unaffected.
In essence, I have two questions:
Does anyone know what RPC_S_NOTHING_TO_EXPORT means? The MSDN documentation simply says: "No interfaces have been exported." ... Huh? The session was working fine for numerous instances of the same call up until this point...
Does anyone have any ideas as to how to identify the real problem? Note: Capturing network traffic is something we would rather avoid, if possible, as the problem is sporadic enough that we would likely go through multiple gigabytes of traffic before running into an occurrence.
Capturing network traffic would be one of the best ways to tackle this issue. If you can't do that, could you dump the client process and debug it with WinDbg or Visual Studio? Perhaps compare a dump taken during normal operation with one taken in the error state?
