Understanding Xlib Failed Requests - x11

Without going into too much detail (I am looking for debugging techniques here), I would like to understand how to better debug Xlib failed requests, in particular ones involving the GLX extension. The bug I am fighting occurs in a complex place within my application, and pulling it apart to provide a small sample here is not possible.
With that said, the failed request I am seeing is:
XIO: fatal IO error 11 (Resource temporarily unavailable) on X server ":0.0"
after 46 requests (46 known processed) with 0 events remaining.
X Error of failed request: BadAccess (attempt to access private resource denied)
Major opcode of failed request: 135 (GLX)
Minor opcode of failed request: 5 (XGLMakeCurrent)
Serial number of failed request: 46
Current serial number in output stream: 46
I can see where the problem is being caused by stepping through with the debugger. However, I can't fully discern why it is happening.

The clue to where to look is in the name of the extension and the name of the request itself. Unfortunately, in this case, because of your use of Xgl, this isn't so helpful. But you can check what the request really is by checking the protocol documentation, like this one for glproto. From that you can see that the request is really glxMakeCurrent. Then you just need to find the documentation or code for that request.
The GLX specification says glxMakeCurrent will give BadAccess if "the context is current to some other thread".
Now, your error is about XGLMakeCurrent, which is an implementation detail of Xgl. But from reading the implementation of this function, it passes through to the underlying GLX implementation.
To fix your problem I suggest you try and identify if that context is being used in another thread.
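One way to narrow this down is to route every make-current call in the application through a small wrapper that records which thread holds each context and complains when a second thread tries to bind it. The sketch below is a Python illustration of that idea only. `ContextTracker`, `make_current`, `release`, and the `"ctx"` identifier are hypothetical names, not Xlib or GLX calls, and the real check would sit next to your own GLX calls.

```python
import threading


class ContextTracker:
    """Hypothetical debugging aid: remembers which thread holds each context."""

    def __init__(self):
        self._lock = threading.Lock()
        self._owners = {}  # context id -> name of the thread holding it

    def make_current(self, context_id):
        me = threading.current_thread().name
        with self._lock:
            owner = self._owners.get(context_id)
            if owner is not None and owner != me:
                # This is the situation the GLX specification reports as BadAccess.
                raise RuntimeError(
                    f"context {context_id!r} is current to thread {owner!r}, "
                    f"but thread {me!r} tried to bind it"
                )
            self._owners[context_id] = me

    def release(self, context_id):
        me = threading.current_thread().name
        with self._lock:
            # Only the owning thread may give the context up.
            if self._owners.get(context_id) == me:
                del self._owners[context_id]


def demo():
    """Bind a context on the calling thread, then try to bind it from a worker."""
    tracker = ContextTracker()
    errors = []
    tracker.make_current("ctx")

    def worker():
        try:
            tracker.make_current("ctx")
        except RuntimeError as exc:
            errors.append(str(exc))

    t = threading.Thread(target=worker, name="render-worker")
    t.start()
    t.join()
    return errors
```

Calling `demo()` returns a list with one message naming both threads, which is the information you want when the real BadAccess shows up.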

For debugging you want to turn on synchronous requests. This slows down your code, but it makes every X request wait until the server has processed it before continuing, so an error is reported immediately. You can turn it on with
XSynchronize(display, True);
Now you will get the X error at the routine that caused the issue and can use standard debugging tools from there.

Related

What does Non HTTP response message: Socket closed mean?

What does "Non HTTP response message: Socket closed" mean?
During our load test, calls to a third party fail about 30% of the time.
This is a SocketException and according to description:
Thrown to indicate that there is an error creating or accessing a Socket.
In your situation this is the latter case and "Socket closed" means that JMeter either tries to send a message to the socket which has already been closed or attempts to read from the closed socket.
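To make the "closed socket" case concrete, here is a minimal sketch in Python rather than JMeter's Java code. It uses a local socket pair, closes one end, and then tries to read from that same end. The function name `read_after_local_close` is made up for illustration. Python reports the condition as an `OSError` (typically "Bad file descriptor" on Linux) where Java reports "Socket closed".

```python
import socket


def read_after_local_close():
    """Return the exception raised when reading from a socket we already closed."""
    a, b = socket.socketpair()
    try:
        b.sendall(b"ping")   # data is waiting, so only the close can cause a failure
        a.close()            # the local side closes its own socket...
        try:
            a.recv(4)        # ...and then tries to read from it anyway
        except OSError as exc:
            return exc
        return None
    finally:
        b.close()
```

The point of the sketch is that the error comes from using a socket object after it was closed locally, even though the peer was healthy and had data ready.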
In the majority of cases the error indicates a problem with the server, so you should not worry about the JMeter side of things. However, if you are absolutely sure that the system under test behaves correctly, you can follow the recommendations from the JMeterSocketClosed wiki page, where several workarounds are listed.
Also, with regard to your "call to third party" remark: ideally you should focus only on the domain(s) related to the application you're testing. All 3rd-party stuff like images, banners, videos, maps, and so on should be excluded from your testing scope because:
you might not be allowed to create excessive traffic for these components, as it can be considered a DoS attack;
even if you detect a problem with a 3rd-party component, it won't be something you can efficiently control or fix.
See the Excluding Domains from the Load Test article for more information and reasons for reconsidering your approach.

ZeroMQ assertion failed: socket handle no longer valid for some reason

I have a Windows 10 C++ program using ZeroMQ that aborts very often on the same group of computers due to assertion failures.
The assert statement is buried deep in the libzmq code.
On other machines, the same program runs fine without those problems (but in all fairness, that's with different OS build numbers and program configurations).
The assertion failure seems to happen because internal zeromq (socket and/or pipe based) connection(s)/handles get unexpectedly closed.
What could possibly cause something like that?
More information:
The assertion failure seems to have something to do with the channels/mailboxes that ZeroMQ uses for internal signaling. In older versions of the library this works with several loopback TCP sockets while modern versions rely on a solution involving IOCP (I/O completion ports).
Here's a long-standing and possibly related issue where the original author himself talked about a similar crash that happened to him:
https://github.com/zeromq/libzmq/issues/1108
Working with the crash dumps of our application, I see that the stack trace leading to the assert statement usually happens at a point right after attempting to read from a socket (or socket file descriptor?). The read or receive action fails and then the library panics.
So, suddenly a socket handle no longer seems valid. Examples of errors that I see are "The resource is temporarily unavailable" and things like "Invalid handle/parameter".
Can it be that something or someone is forcefully closing the socket for us?
What could be causing this behavior?
This happens for an old version of zeromq (4.0.10) as well as a modern one (4.3.5). This leads me to believe that the fault is somewhere else if such different implementations fail roughly the same way.
When trying to reproduce the problem I can trigger a similar assertion failure for 4.0.x by manually force closing an internal TCP connection that ZeroMQ uses with TCPView. The resulting assertion failure is instant and the crash dump looks identical to what happens in the wild.
But the modern version doesn't seem to use loopback sockets, so I couldn't close the "private" connections there. Maybe they are using pipes or unix style sockets instead (which is now possible on Windows 10 I have heard).
For a moment I considered ephemeral port exhaustion as a reason for all this trouble, but that alone doesn't make sense to me. I don't expect the OS to force close existing connections; existing connections should keep working, and you'd expect only new connections to fail.
As @user253751 suggested, the culprit seems to be a particular piece of code in the application that closes the same HANDLE twice. A serious bug in our code, not ZeroMQ!
On Windows, closed handles get reused immediately, so anything that is opened right after the first CloseHandle is at risk of being unexpectedly closed when the second CloseHandle strikes.
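The same failure mode can be shown with POSIX file descriptors, which are also recycled as soon as they are closed. The sketch below is an analogy in Python, not the Windows HANDLE code from our application, and `double_close_demo` is a made-up name. It closes a descriptor, lets "unrelated" code open a new pipe that receives the same number, and then closes the stale number a second time.

```python
import os


def double_close_demo():
    """Return (handle_reused, victim_broken) for a double close on a descriptor."""
    r1, w1 = os.pipe()
    os.close(r1)                  # first close: legitimate

    r2, w2 = os.pipe()            # "unrelated" code opens a new pipe
    os.write(w2, b"x")            # queue data so a healthy r2 would not block
    handle_reused = (r2 == r1)    # the freed number is handed out again

    try:
        os.close(r1)              # second close: the bug, it hits r2 if reused
    except OSError:
        pass                      # number was not reused, nothing was hit

    try:
        os.read(r2, 1)            # the innocent owner of r2 tries to read
        victim_broken = False
        os.close(r2)
    except OSError:
        victim_broken = True      # its descriptor was closed behind its back

    os.close(w1)
    os.close(w2)
    return handle_reused, victim_broken
```

In a single-threaded run on a POSIX system I would expect both values to be true: the victim's read fails even though the victim never closed anything, which matches what the ZeroMQ assertion was telling us.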

net-snmp mfd implemented writable objects

I am trying to extend an agent to support my MIB. I was using the old API before, but this time I decided to move to the newer one. So I started with mib2c.mfd.conf as the mib2c configuration file, and after some effort following the ifTable tutorial on the net-snmp site I succeeded in processing get requests. But in the case of set requests I get this error:
Error in packet.
Reason: notWritable (That object does not support modification)
Failed object: ZT400-CONF-MIB::nodeLoc.1
and no debugging log is output by my agent. (In the case of get I receive something like this:)
internal:nodeTable:_mfd_nodeTable_pre_request: called
verbose:nodeTable:nodeTable_pre_request: called
internal:nodeTable:_mfd_nodeTable_object_lookup: called
verbose:nodeTable:nodeTable_row_prep: called
internal:nodeTable:_mfd_nodeTable_get_values: called
internal:nodeTable:_mfd_nodeTable_get_column: called for 2
verbose:nodeTable:nodeName_get: called
internal:nodeTable:_mfd_nodeTable_post_request: called
verbose:nodeTable:nodeTable_post_request: called
For a set request, as stated in the README file, I expect to receive at least the first two lines.
What is the problem? Can anyone help?
Is there a working tutorial for implementing snmp settable objects with net-snmp mfd?

bad internet connection and PFQueryCollectionViewController

I was testing my app and I noticed this error message popping up from the PFQueryCollectionViewController:
2015-07-06 21:40:58.445 Noms[320:29335] [Error]: The Internet connection appears to be offline. (Code: 100, Version: 1.7.5)
2015-07-06 21:40:58.446 Noms[320:29335] [Error]: Network connection failed. Making attempt 2 after sleeping for 1.604623 seconds.
This was expected, since my phone was not connected to the internet. However, I want to detect this error and handle it myself, rather than having the endless loading scroll on the screen. Looking at the documentation didn't yield any variables that I thought could be useful.
Does anyone know how I might receive this callback so I can handle it?
Parse provides a "cancel" method on all PFQueries, including the one in PFQueryTableViewController. In theory, you could cancel your query after a timeout of your choosing. Unfortunately, many developers are reporting a bug in that method that prevents it from working.
My best recommendation is to check for an internet connection before firing your query. Apple's Reachability project is a great tool for this.
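The "check first, then query" idea is language-agnostic, so here is a rough Python sketch of it. This is not Apple's Reachability API or the Parse SDK. `is_reachable` is a hypothetical helper, and the host and port you probe are up to you (for example, the backend your query would hit).

```python
import socket


def is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

You would call it right before loading the query and show your own "offline" state when it returns False, instead of letting the controller keep retrying. A probe like this only tells you the network was up a moment ago, so the query itself can still fail and needs its own error handling.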

What does CFNetwork error -4 in domain CFStreamErrorHTTP mean?

A customer is reporting a connection failure with a strange profile: it apparently only fails for the very first URL request via CFNetwork since the app has launched.
The error code apparently being returned by CFNetwork is domain CFStreamErrorHTTP, but with error code -4, which does not correspond to any publicly defined error code for this domain.
In CFHTTPStream.h, the publicly defined error codes for CFStreamErrorHTTP end auspiciously at -3, strongly hinting that -4 may be an error code that Apple is using but which has not yet been publicly documented.
Any idea what's going on here? Has anybody else seen this error code and found rhyme or reason for it?
Probably not the final answer, and this may have changed since they closed-sourced CFNetwork, but I did find the following online, which indicates that -4 is a connection lost error.
http://www.opensource.apple.com/source/CFNetwork/CFNetwork-129.9/HTTP/CFHTTPConnection.c
I guess you'll have to show some of the code that's failing, but a few questions spring to mind. First, can you trace this issue yourself, can you reproduce it? In particular, it would be interesting to see on which thread this happens, and what's the current runLoop mode. It could be indicative of a stream or connection that fails scheduling on the internal CF runloops.
Other than this (and it's a shame CFNetwork is no longer publicly updated), it could be a zillion things, but you'll need to log as much information as you can if you can't directly debug the failure (hint hint -- https://github.com/fpillet/NSLogger can help you remotely log the info from the client).
Finally, ask the question on Mac Dev Forums (or iOS Dev Forums if your code runs on iOS). Ping Quinn, He Knows It All. If he can't publicly answer the question, open a DTS incident and send him the ticket #. He's the guy you want to look into your problem :-)

Resources