What do the fields in rpcdebug -c's dmesg output mean? - linux-kernel

I'm trying to track down a stall that may be on the client host or may be on the server side. Unfortunately this is all at the kernel RPC level, with some if not all of it outside my code's control...
I'm something of a neophyte when it comes to rpc debugging, so any references would be helpful.
I have this output in dmesg when rpcdebug -c -m all is run:
[2109401.599881] -pid- flgs status -client- --rqstp- -timeout ---ops--
[2109401.600055] 51580 0880 0 ffff9af4c4da9800 ffff9af4c4416600 15000 ffffffffc0b26680 nfsv3 GETATTR a:call_status [sunrpc] q:xprt_pending
[2109401.600300] 51581 0880 0 ffff9af4c4da9800 ffff9af42465f800 15000 ffffffffc0b26680 nfsv3 GETATTR a:call_status [sunrpc] q:xprt_pending
I get the PID, flags, and timeout, but:
What are the "client" and "rqstp" fields supposed to mean? If I see either duplicated in subsequent outputs, does that mean the RPC is stalled? And which way?
Is "xprt_pending" a "waiting to send the RPC" queue? If that queue was "delayq," I would know what that means in the context of the problem we're trying to diagnose. But this state doesn't seem to be explained anywhere I find in Google. (And my Google-Fu is usually better than this...)
What is the "ffffffffc0b26680" supposed to be? It repeats all over the output, for EVERY RPC listed.
I'm trying to avoid running with rpcdebug set, because I'm dealing with an intermittent stall, and I'd rather not slow EVERYTHING down in the hopes of catching the stall.
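To illustrate what I mean by "duplicated in subsequent outputs", here's a rough Python sketch of the sampling I have in mind (it assumes exactly the dmesg format above; the 10-second interval is arbitrary, and whether a persisting pid/rqstp pair really means a stall is exactly what I'm asking):

import re, subprocess, time

# Matches the task lines shown above: pid, flags, status, client, rqstp, timeout.
TASK_RE = re.compile(
    r'(?P<pid>\d+)\s+[0-9a-f]+\s+-?\d+\s+'
    r'(?P<client>[0-9a-f]+)\s+(?P<rqstp>[0-9a-f]+)\s+\d+')

def snapshot():
    # Writing the debug flags (even clearing them) makes sunrpc dump its task list.
    subprocess.run(['rpcdebug', '-c', '-m', 'all'], check=True)
    time.sleep(0.5)  # give the kernel a moment to log
    dmesg = subprocess.run(['dmesg'], capture_output=True, text=True).stdout
    # NB: a real tool would parse only the newest dump (e.g. by timestamp).
    return {(m['pid'], m['rqstp']) for m in TASK_RE.finditer(dmesg)}

first = snapshot()
time.sleep(10)
for pid, rqstp in first & snapshot():
    print(f'task {pid} still holds rqstp {rqstp} after 10s -- stalled?')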

Related

Why would socket.write hang indefinitely?

What would make a write call to a TCPSocket hang indefinitely?
require 'socket'

lotsOfBytes = 'x' * (2 * 1024 * 1024) # a really large amount of data, like 1 or 2 MB
socket = TCPSocket.new(host, port)    # some config; host and port elided here
socket.write(lotsOfBytes)             # this line hangs
I am trying to debug an issue where a get_multi operation sent to memcached with a large number of keys hangs indefinitely, and it does so on a line that resembles that code snippet. I'm trying to better understand how the low-level sockets on which this library is built are expected to work.
What are the values of the following attributes on your TCPSocket:
Is keep-alive activated, and what value is it set to?
Is a timeout set, and what value is it set to?
If you capture a Wireshark dump, it's much easier to see what happens before the connection hangs.
tcpdump: are there any attempts to send anything?
netstat: to see the output queue.
Does it work with a small number of bytes in your environment? (A sketch of the relevant socket options follows below.)
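A minimal sketch (Python rather than Ruby, with a hypothetical host and port) of setting the options above so a stalled send fails fast instead of hanging forever:

import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.settimeout(10.0)                                      # fail instead of hanging forever
sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1) # enable keep-alive
sock.connect(('memcached.example', 11211))                 # hypothetical host/port

payload = b'x' * (2 * 1024 * 1024)  # ~2 MB, as in the question
try:
    sock.sendall(payload)
except socket.timeout:
    print('send stalled -- check tcpdump and the netstat send queue')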

ChromeCast - Stream Calls Failing when Stalled for long time

I'm attempting to play a live stream on ChromeCast. The stream is cast fine and playback starts appropriately.
However, when I play the stream for longer (somewhere between 2 and 15 minutes), the player stops and I get MediaStatus.IDLE_REASON_ERROR in my RemoteMediaClient.Callback.
When looking at the console logs from ChromeCast, I see that 3-4 calls have failed. Here are the logs:
14:50:26.931 GET https://... 0 ()
14:50:27.624 GET https://... 0 ()
14:50:28.201 GET https://... 0 ()
14:50:29.351 GET https://... 0 ()
14:50:29.947 media_player.js:64 [1381.837s] [cast.player.api.Host] error: cast.player.api.ErrorCode.NETWORK/3126000
Looking at Cast MediaPlayer.ErrorCode, error 312* is:
"Failed to retrieve the media (bitrated) playlist m3u8 file with three retries. Developers need to validate that their playlists are indeed available. It could be the case that a user cannot reach the playlist as well."
I checked, and the playlist was available. So I thought perhaps the server wasn't responding, and looked at the response logs of the network calls.
Successful Request
Stalled Request
Note that the stall time far exceeds the usual stall time.
ChromeCast isn't actually making these calls at all; the requests simply stall for a long time until they are cancelled. All the successful requests stall for less than 14ms (mostly under 2ms).
The Network Analysis Timing Breakdown provides three reasons for stalling
There are higher priority requests.
There are already six TCP connections open for this origin, which is the limit. Applies to HTTP/1.0 and HTTP/1.1 only.
The browser is briefly allocating space in the disk cache.
While I do believe the first one should not be the case, the latter two could be. However, in both cases I believe the fault lies with cast.player.
Am I doing something wrong?
Has anyone else faced the same issue? Is there any way to either fix it or come up with a workaround?

WinAPI - Why does ICMPSendEcho2Ex report false timeouts when Timeout is set below 1000ms?

Edit: I started asking this as a PowerShell / .Net question and couldn't find any reference to it on the internet. With feedback, it appears to be a WINAPI issue, so this is an edit/rewrite/retag; many of the tests and much of the background reference .Net for that reason.
Summary
The WINAPI ping function IcmpSendEcho2 appears to have a timing bug if the ping timeout parameter is set below 1000ms (1 second), causing it to return intermittent false timeout errors. Rather than a proportional "lower timeout = more failures" behaviour, there appears to be a hard cutoff: >=1000ms gives expected behaviour, <=999ms triggers false timeouts, often in an alternating success/fail/success/fail pattern.
I call them false timeouts partly because I have a WireShark packet capture showing a reply packet coming back well within the timeout, partly because a 1ms change shouldn't be a significant amount of time when the replies normally have 500-800ms of headroom, and partly because I can run two concurrent sets of pings with different timeouts and see different behaviour between the two.
In the comments of my original .Net question, @wOxxOm has:
located the Open Source .Net code where System.Net.NetworkInformation.Ping() wraps the WinAPI, and there appears to be no specific handling of timeouts there; it's passed down to the lower layer directly, possibly line 675, with a call to UnsafeNetInfoNativeMethods.IcmpSendEcho2()
and @Lieven Keersmaekers has investigated and found things beyond my skill level to interpret:
"I can second this being an underlying WINAPI problem. Following a success and timedout call into IPHLPAPI!IcmpSendEcho2Ex: the 000003e7 parameter is on the stack, both set up an event and both return into IPHLPAPI!IcmpEchoRequestComplete with the difference of the success call's eax register containing 00000000 and the timedout call's eax register containing 00000102 (WAIT_TIMEOUT)
"Compiling a 64bit C# version, there's no more calls into IPHLPAPI. A consistent thing that shows up is clr.dll GetLastError() returning WSA_QOS_ADMISSION_FAILURE for timeouts. Also consistent in my sample is the order of thread executions between a success and a timeout call being slightly different."
This StackOverflow question hints that the WSA_QOS_ADMISSION_FAILURE might be a misleading error, and is actually IP_REQ_TIMED_OUT.
Testing steps:
Pick a distant host and set some continuous pings running. (The IP in my examples belongs to Baidu.cn (China) and has ping replies of around 310ms to me.) Expected behaviour for all of them: almost entirely ping replies, with occasional drops due to network conditions.
PowerShell / .Net, with a 999ms timeout; the actual result is a bizarre reply/drop/reply/drop pattern, with far more drops than expected:
$Pinger = New-Object -TypeName System.Net.NetworkInformation.Ping
while ($true) {
    $Pinger.Send('111.13.101.208', 999)
    Start-Sleep -Seconds 1
}
Command prompt ping.exe with a 999ms timeout; the actual result is more reliable (edit: but later findings call this into question as well):
ping 111.13.101.208 -t -w 999
PowerShell / .Net, with a 1000ms timeout; the actual result is as expected:
$Pinger = New-Object -TypeName System.Net.NetworkInformation.Ping
while ($true) {
    $Pinger.Send('111.13.101.208', 1000)
    Start-Sleep -Seconds 1
}
It's repeatable with C# as well, but I've edited that code out now that it seems to be a WinAPI problem.
Example screenshot of these running side by side
On the left, .Net with 999ms timeout and 50% failure.
Center, command prompt, almost all replies.
On the right, .Net with 1000ms timeout, almost all replies.
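To take .Net out of the picture entirely, the same test can be driven straight against the WinAPI; here's a rough Python/ctypes sketch (illustrative only: it uses the synchronous IcmpSendEcho rather than IcmpSendEcho2, and the reply buffer size is just a generous guess). If the alternating pattern shows up here too, the problem is in iphlpapi rather than .Net:

import ctypes, socket, struct, time

iphlpapi = ctypes.windll.iphlpapi
iphlpapi.IcmpCreateFile.restype = ctypes.c_void_p
iphlpapi.IcmpCloseHandle.argtypes = [ctypes.c_void_p]
iphlpapi.IcmpSendEcho.argtypes = [
    ctypes.c_void_p,   # IcmpHandle
    ctypes.c_uint32,   # DestinationAddress (network byte order)
    ctypes.c_char_p,   # RequestData
    ctypes.c_uint16,   # RequestSize
    ctypes.c_void_p,   # RequestOptions (NULL here)
    ctypes.c_void_p,   # ReplyBuffer
    ctypes.c_uint32,   # ReplySize
    ctypes.c_uint32,   # Timeout (ms)
]

handle = iphlpapi.IcmpCreateFile()
dest = struct.unpack('<I', socket.inet_aton('111.13.101.208'))[0]
payload = b'x' * 32
reply = ctypes.create_string_buffer(1024)  # comfortably > sizeof(ICMP_ECHO_REPLY) + data

for timeout_ms in (999, 1000):
    failures = 0
    for _ in range(20):
        # Return value is the number of replies stored; 0 means the API reported a timeout/error.
        if iphlpapi.IcmpSendEcho(handle, dest, payload, len(payload),
                                 None, reply, len(reply), timeout_ms) == 0:
            failures += 1
        time.sleep(1)
    print('timeout=%dms: %d/20 failures' % (timeout_ms, failures))

iphlpapi.IcmpCloseHandle(handle)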
Earlier investigations / Background
I started with a 500ms timeout, and the quantity of false timeouts seems to vary depending on the ping reply time of the remote host:
pinging something 30ms away reports TimedOut around 1 in 10 pings
pinging something 100ms away reports TimedOut around 1 in 4 pings
pinging something 300ms away reports TimedOut around 1 in 2 pings
From the same computer, on the same internet connection, sending the same amount of data (a 32-byte buffer) with the same 500ms timeout setting, and with no other heavy bandwidth use. I run no antivirus or networking filter outside the Windows 10 defaults, and two other people I know have confirmed this frequent TimedOut behaviour on their computers (now two more have in the comments), with more or fewer timeouts, so it ought not to be my network card, drivers, or ISP.
WireShark packet capture of ping reply which is falsely reported as a timeout
I ran the ping by hand four times to a ~100ms away host, with a 500ms timeout, with WireShark capturing network traffic. PowerShell screenshot:
WireShark screenshot:
Note that the WireShark log records 4 requests leaving, 4 replies coming back, each with a time difference of around 0.11s (110 ms) - all well inside the timeout, but PowerShell wrongly reports the last one as a timeout.
Related questions
Googling shows me heaps of issues with System.Net.NetworkInformation.Ping but none which look the same, e.g.
System.Net.NetworkInformation.Ping crashing - it crashes if allocated/destroyed in a loop in .Net 3.5 because its internals get wrongly garbage collected. (.Net 4 here and not allocating in a loop)
Blue screen when using Ping - 6+ years of ping being able to BSOD Windows (not debugging an Async ping here)
https://github.com/dotnet/corefx/issues/15989 - if you set a 1ms timeout and a reply comes back in 20ms, it doesn't time out; it still succeeds. A false positive, but not a false negative.
The documentation for Ping() warns that a low timeout might still report success, but I can't see any warning that a timeout below 1 second might falsely report failure.
Edit: Now that I'm searching for ICMPSendEcho2, I have found exactly this problem documented before, in a May 2015 blog post: https://www.frameflow.com/ping-utility-flaw-in-windows-api-creating-false-timeouts/ - it finds the same behavior but offers no further explanation. It says that ping.exe is affected, where I originally thought it wasn't. It also says:
"In our tests we could not reproduce it on Windows Server 2003 R2 nor on Windows Server 2008 R2. However it was seen consistently in Windows Server 2012 R2 and in the latest builds of Windows 10."
Why? What's wrong with the timeout handling that makes it completely ignore ping responses coming into the network stack? And why is 1000ms a significant cutoff?

MongoDB-Java performance with rebuilt Sync driver vs Async

I have been testing MongoDB 2.6.7 for the last couple of months using YCSB 0.1.4. I have captured good data comparing SSD to HDD and am producing engineering reports.
After my testing was completed, I wanted to explore the allanbank async driver. When I got it up and running (I am not a developer, so it was a challenge for me), I first wanted to try the rebuilt sync driver. I found performance improvements of 30-100%, depending on the workload, and was very happy with it.
Next, I tried the async driver. I was not able to see much difference between it and my results with the native driver.
The command I'm running is:
./bin/ycsb run mongodb -s -P workloads/workloadb -p mongodb.url=mongodb://192.168.0.13:27017/ycsb -p mongodb.writeConcern=strict -threads 96
Over the course of my testing (mostly with the native driver), I have experimented with more and fewer threads than 96; turned on noatime; tried both xfs and ext4; disabled hyperthreading; disabled half of my 12 cores; put the journal on a different drive; changed the sync interval from 60 seconds to 1 second; and checked the network bandwidth between the client and server to ensure it's not oversubscribed (10GbE).
Any feedback or suggestions welcome.
The Async move exceeded my expectations. My experience is with the Python sync driver (pymongo) and async driver (motor), and the async driver achieved greater than 10x the throughput. Further, motor still uses pymongo under the hood but adds the async ability; that could easily be the case with your allanbank driver.
Often the dramatic changes come from threading policies and OS configurations.
Async needn't and shouldn't use any more threads than there are cores on the VM or machine. For example, if your server code is spawning a new thread per incoming connection, then all bets are off. Start by looking at the way the driver is being utilized: a 4-core machine should use <= 4 incoming threads.
On the OS level, you may have to fine-tune parameters like net.core.somaxconn, net.core.netdev_max_backlog, fs.file-max, and nofile in /etc/security/limits.conf (a sketch of such settings follows below). The best place to start is looking at nginx-related performance guides, including this one; nginx is the server that spearheaded, or at least caught the attention of, many Linux sysadmin enthusiasts. Contrary to popular lore, you should shorten your keepalive timeout rather than lengthen it. The default keep-alive timeout is some absurd number of seconds (4 hours); you might want to cut the cord at 1 minute. Basically, think of a short, sweet relationship with your clients' connections.
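For illustration only, the kernel-side knobs above would go in a sysctl drop-in along these lines (the values are arbitrary placeholders, not recommendations):

# /etc/sysctl.d/99-tuning.conf -- illustrative values only
net.core.somaxconn = 4096            # ceiling on the listen() accept backlog
net.core.netdev_max_backlog = 16384  # packets queued per CPU before the kernel drops
fs.file-max = 1000000                # system-wide open-file limit

# /etc/security/limits.conf -- per-process descriptor limit
* soft nofile 65536
* hard nofile 65536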
Bear in mind that Mongo is not async, so you can use a Mongo driver pool. Nevertheless, don't let the driver get stalled on slow queries; cut it off after 5 to 10 seconds using the equivalents of the following (these are pymongo option descriptions; the Java driver has matching settings). I'm just cutting and pasting here with no recommended values:
# Specifies a time limit for a query operation. If the specified time is exceeded, the operation will be aborted and ExecutionTimeout is raised. If max_time_ms is None no limit is applied.
# Raises TypeError if max_time_ms is not an integer or None. Raises InvalidOperation if this Cursor has already been used.
CONN_MAX_TIME_MS = None
# socketTimeoutMS: (integer) How long (in milliseconds) a send or receive on a socket can take before timing out. Defaults to None (no timeout).
CLIENT_SOCKET_TIMEOUT_MS=None
# connectTimeoutMS: (integer) How long (in milliseconds) a connection can take to be opened before timing out. Defaults to 20000.
CLIENT_CONNECT_TIMEOUT_MS=20000
# waitQueueTimeoutMS: (integer) How long (in milliseconds) a thread will wait for a socket from the pool if the pool has no free sockets. Defaults to None (no timeout).
CLIENT_WAIT_QUEUE_TIMEOUT_MS=None
# waitQueueMultiple: (integer) Multiplied by max_pool_size to give the number of threads allowed to wait for a socket at one time. Defaults to None (no waiters).
CLIENT_WAIT_QUEUE_MULTIPLY=None
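For reference, wired up in pymongo those settings look something like this (the timeout values are placeholders, not recommendations):

from pymongo import MongoClient

client = MongoClient(
    'mongodb://192.168.0.13:27017/',
    socketTimeoutMS=10000,      # abort a send/receive stuck for more than 10s
    connectTimeoutMS=20000,     # connection-open timeout (the pymongo default)
    waitQueueTimeoutMS=5000,    # don't wait forever for a free pooled socket
)

# Per-query server-side limit: abort execution after 5 seconds.
for doc in client.ycsb.usertable.find().max_time_ms(5000):
    pass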
Hopefully you will have the same success. I was ready to bail on Python prior to async.

Jmeter TCP Sampler

We are running JMeter to connect to a TCP socket through BinaryTCPClientImpl, and we are getting response code 500 with:
Response message: org.apache.jmeter.protocol.tcp.sampler.ReadException
JMeter version: 2.9
Please help.
If this is the error
ERROR - jmeter.protocol.tcp.sampler.TCPSampler: org.apache.jmeter.protocol.tcp.sampler.ReadException:
at org.apache.jmeter.protocol.tcp.sampler.BinaryTCPClientImpl.read(BinaryTCPClientImpl.java:140)
at org.apache.jmeter.protocol.tcp.sampler.TCPSampler.sample(TCPSampler.java:414)
at org.apache.jmeter.threads.JMeterThread.process_sampler(JMeterThread.java:429)
at org.apache.jmeter.threads.JMeterThread.run(JMeterThread.java:257)
at java.lang.Thread.run(Unknown Source)
then you have 2 options. The first (and much easier, if it applies to you) is to use LengthPrefixedBinaryTCPClientImpl. If this applies to you, that is, if your responses always carry a fixed-size length prefix, you can simply set the tcp.binarylength.prefix.length property and go about your business; see the example just below.
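A hypothetical jmeter.properties fragment (the 2-byte prefix is a guess; use whatever width your protocol's length field actually has):

# Use the length-prefixed TCP client implementation
tcp.handler=LengthPrefixedBinaryTCPClientImpl
# Width in bytes of the length prefix on each response
tcp.binarylength.prefix.length=2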
If that is not the case, then your other option is to extend org.apache.jmeter.protocol.tcp.sampler.TCPClient. It may help to get in touch with the team behind this proprietary protocol, because after all, they have implemented something that works. You'll probably have to extend it to do something like LengthPrefixedBinaryTCPClientImpl does: read N bytes. That approach runs the risk of reading too many or too few bytes, though; if your application server ever miscalculates the size of its output, you suffer the consequences by getting another timeout, or by leaving extra bytes in the buffer and reading them on the next iteration (and then cascading errors).
