I am having one MIB SAF-CKPT-MIB
When I am trying the command
snmpget -v2c -c public -mALL (IP_address) SAF-CKPT-MIB::saCkptCheckpointMaxSectionSize.14.118.100.115.95.118.100.101.115.116.95.100.98.95.49 SAF-CKPT-MIB::saCkptNodeReplicaType.14.118.100.115.95.118.100.101.115.116.95.100.98.95.49.14.115.97.102.78.111.100.101.61.83.67.95.50.95.50
I am getting the message "Timeout: No Response from IP_address"
When I am changing the order like
snmpget -v2c -c public -mALL (IP_address) SAF-CKPT-MIB::saCkptNodeReplicaType.14.118.100.115.95.118.100.101.115.116.95.100.98.95.49.14.115.97.102.78.111.100.101.61.83.67.95.50.95.50 SAF-CKPT-MIB::saCkptCheckpointMaxSectionSize.14.118.100.115.95.118.100.101.115.116.95.100.98.95.49
It is working fine .....
My question is how this changing the order is making difference here ??
I hope my question is clear ...
The message "Timeout: No Response from IP_address" is indicating that the snmp server at IP_address is not responding within the timeout period that snmpget is using (iirc it's 5 seconds by default).
Either the snmp server at IP_address is not responding at all in the first instance or is responding too slowly. This can be tested by increasing the timeout. eg:
snmpget -v2c -c public -mALL -t 60 (IP_address)
Of course it could also be tested by capturing the packets using Wireshark as suggested by Lex Li.
Any change in behaviour due to the order of the MIB variables in the request sounds like a problem in the implementation of this MIB at the snmp server.
Rarely saw such order related issue, as generally speaking the order does not have an impact on the resulting SNMP packets. But of course, capturing network traffic with either Wireshark or Microsoft Network Monitor can show some hints under the hood.
Lex Li
http://sharpsnmplib.codeplex.com
Related
I'm trying to track down a stall that may or may not be on the client host, but on the server side. Unfortunately this is all kernel RPC level, with some of if not controlled by my code...
I'm something of a neophyte when it comes to rpc debugging, so any references would be helpful.
I have this output in dmesg when rpcdebug -c -m all is run:
[2109401.599881] -pid- flgs status -client- --rqstp- -timeout ---ops--
[2109401.600055] 51580 0880 0 ffff9af4c4da9800 ffff9af4c4416600 15000 ffffffffc0b26680 nfsv3 GETATTR a:call_status [sunrpc] q:xprt_pending
[2109401.600300] 51581 0880 0 ffff9af4c4da9800 ffff9af42465f800 15000 ffffffffc0b26680 nfsv3 GETATTR a:call_status [sunrpc] q:xprt_pending
I get the PID, flags, and timeout, but:
What are the "client" and "rqstp" fields supposed to mean? If I see either duplicated in subsequent outputs, does that mean the RPC is stalled? And which way?
Is "xprt_pending" a "waiting to send the RPC" queue? If that queue was "delayq," I would know what that means in the context of the problem we're trying to diagnose. But this state doesn't seem to be explained anywhere I find in Google. (And my Google-Fu is usually better than this...)
What is the "ffffffffc0b26680" supposed to be? It repeats all over the output, for EVERY RPC listed.
I'm trying to avoid running with rpcdebug set, because I'm dealing with an intermittent stall, and I'd rather not slow EVERYTHING down in the hopes of catching the stall.
What would make a write call to a TCPSocket hang indefinitely?
lotsOfBytes = # a really large number of bytes, like 1 or 2 MB of data
socket = TCPSocket.new # some config
socket.write(lotsOfBytes) # this line hangs
I am trying to debug an issue where a get_multi operation sent to memcached with a large number of keys hangs indefinitely, and it does so on a line that resembles that code snippet. I'm trying to better understand how the low-level sockets on which this library is built are expected to work.
What are the values of following attributes on your TCPSocket:
Keep-alive activated and what value is set?
Timeout set and what value is set?
If you will do a Wireshark dump, it's much better so see what happens before hanging connection.
tcpdump? are there any attempts to send anything?
netstat - for see output queue.
does it work on a small number of bytes in your environment?
Is there any way to get the status of the registration of scalar in net-snmp ?
In my case there are around 14 scalars and registration of those taking time and returning "no such object found" error when I issue snmpget on localhost for ~5 seconds after target configuration and later the reply for snmpget is positive.
Should I keep snmpd in disabled state till the registration of scalars is done ?
Thanks in advance !
Edit: I started asking this as a PowerShell / .Net question and couldn't find any reference to it on the internet. With feedback, it appears to be a WINAPI issue so this is an edit/rewrite/retag, but much of the tests and background reference .Net for that reason.
Summary
WINAPI ping function IcmpSendEcho2 appears to have a timing bug if the ping timeout parameter is set below 1000ms (1 second). This causes it to return intermittent false timeout errors. Rather than a proportional "lower timeout = more fails" behaviour, it appears to be a cutoff of >=1000ms+ is expected behaviour, <=999ms triggers false timeouts, often in an alternating success/fail/success/fail pattern.
I call them false timeouts because I have WireShark packet capture showing a reply packet coming back well within the timeout, and partly because the 1ms change shouldn't be a significant amount of time when the replies normally have 500-800ms of headroom, and partly because I can run two concurrent sets of pings with different timeouts and see different behavior between the two.
In the comments of my original .Net question, #wOxxOm has:
located the Open Source .Net code where System.Net.NetworkInformation.Ping() wraps the WinAPI and there appears to be no specific handling of timeouts there, it's passed down to the lower layer directly - possibly line 675 with a call to UnsafeNetInfoNativeMethods.IcmpSendEcho2()
and #Lieven Keersmaekers has investigated and found things beyond my skill level to interpret:
"I can second this being an underlying WINAPI problem. Following a success and timedout call into IPHLPAPI!IcmpSendEcho2Ex: the 000003e7 parameter is on the stack, both set up an event and both return into IPHLPAPI!IcmpEchoRequestComplete with the difference of the success call's eax register containing 00000000 and the timedout call's eax register containing 00000102 (WAIT_TIMEOUT)
"Compiling a 64bit C# version, there's no more calls into IPHLPAPI. A consistent thing that shows up is clr.dll GetLastError() returning WSA_QOS_ADMISSION_FAILURE for timeouts. Also consistent in my sample is the order of thread executions between a success and a timeout call being slightly different."
This StackOverflow question hints that the WSA_QOS_ADMISSION_FAILURE might be a misleading error, and is actually IP_REQ_TIMED_OUT.
Testing steps:
Pick a distant host and set some continuous pings running. (The IP in my examples belongs to Baidu.cn (China) and has ping replies around ~310ms to me). Expected behaviour for all of them: almost entirely ping replies, with occasional drops due to network conditions.
PowerShell / .Net, with 999ms timeout, actual result is bizarre reply/drop/reply/drop patterns, far more drops than expected:
$Pinger = New-Object -TypeName System.Net.NetworkInformation.Ping
while($true) {
$Pinger.Send('111.13.101.208', 999)
start-sleep -Seconds 1
}
command prompt ping.exe with 999ms timeout, actual result is more reliable (edit: but later findings call this into question as well):
ping 111.13.101.208 -t -w 999
PowerShell / .Net, with 1000ms timeout, actual result is as expected:
$Pinger = New-Object -TypeName System.Net.NetworkInformation.Ping
while($true) {
$Pinger.Send('111.13.101.208', 1000)
start-sleep -Seconds 1
}
It's repeatable with C# as well, but I've edited that code out now it seems to be a WinAPI problem.
Example screenshot of these running side by side
On the left, .Net with 999ms timeout and 50% failure.
Center, command prompt, almost all replies.
On the right, .Net with 1000ms timeout, almost all replies.
Earlier investigations / Background
I started with a 500ms timeout, and the quantity of fake replies seems to vary depending on the ping reply time of the remote host:
pinging something 30ms away reports TimedOut around 1 in 10 pings
pinging something 100ms away reports TimedOut around 1 in 4 pings
pinging something 300ms away reports TimedOut around 1 in 2 pings
From the same computer, on the same internet connection, sending the same amount of data (32 byte buffer) with the same 500ms timeout setting, with no other heavy bandwidth use. I run no antivirus networking filter outside Windows 10 defaults, two other people I know have confirmed this frequent TimedOut behaviour on their computers (now two more have in the comments), with more or fewer timeouts, so it ought not to be my network card or drivers or ISP.
WireShark packet capture of ping reply which is falsely reported as a timeout
I ran the ping by hand four times to a ~100ms away host, with a 500ms timeout, with WireShark capturing network traffic. PowerShell screenshot:
WireShark screenshot:
Note that the WireShark log records 4 requests leaving, 4 replies coming back, each with a time difference of around 0.11s (110 ms) - all well inside the timeout, but PowerShell wrongly reports the last one as a timeout.
Related questions
Googling shows me heaps of issues with System.Net.NetworkInformation.Ping but none which look the same, e.g.
System.Net.NetworkInformation.Ping crashing - it crashes if allocated/destroyed in a loop in .Net 3.5 because its internals get wrongly garbage collected. (.Net 4 here and not allocating in a loop)
Blue screen when using Ping - 6+ years of ping being able to BSOD Windows (not debugging an Async ping here)
https://github.com/dotnet/corefx/issues/15989 - it doesn't timeout if you set a timeout to 1ms and a reply comes back in 20ms, it still succeeds. False positive, but not false negative.
The documentation for Ping() warns about the low-timeout-might-still-say-success but I can't see it warns that timeout might falsely report failure if set below 1 second.
Edit: Now that I'm searching for ICMPSendEcho2, I have found exactly this problem documented before in a blog post in May 2015 here: https://www.frameflow.com/ping-utility-flaw-in-windows-api-creating-false-timeouts/ - finding the same behavior, but having no further explanation. They say that ping.exe is affected, when I originally thought it wasn't. They also say:
"In our tests we could not reproduce it on Windows Server 2003 R2 nor on Windows Server 2008 R2. However it was seen consistently in Windows Server 2012 R2 and in the latest builds of Windows 10."
Why? What's wrong with the timeout handling that makes it ignore ping repsonses coming into the network stack completely? Why is 1000ms a significant cutoff?
I would like to know if I can get a breakdown of response times in JMeter load tests. E.g. when I use curl I can get the breakdown of each response time by specifying curl format like so,
\n
time_namelookup: %{time_namelookup}\n
time_connect: %{time_connect}\n
time_appconnect: %{time_appconnect}\n
time_pretransfer: %{time_pretransfer}\n
time_redirect: %{time_redirect}\n
time_starttransfer: %{time_starttransfer}\n
----------\n
time_total: %{time_total}\n
\n
and then making the actual curl call like so,
curl -w "#curl-format.txt" "http://some.api/call"
As you can see this gives me the breakdown in terms of time spent doing a DNS Name resolution, connecting with the server, transferring response form server to the client etc.
Is it possible to get something similar in JMeter?
So I have at least found a way to get what I want, partially.
In Jmeter I can collect the Connect time, which is a combination of DNS lookup, handshake & connection.
If someone has a better answer, would be happy to know it.