CryptoAPI: CERT_STORE_PROV_MEMORY and thread safety - winapi

I'm writing a client/server solution using CryptoAPI to provide SSL encryption over a TCP socket. In the client I have a global CERT_STORE_PROV_MEMORY certificate store that I share between all connections (i.e. multiple threads).
My question is whether this is thread safe? Can multiple threads call functions (e.g. CertGetIssuerCertificateFromStore()) on the certificate store at the same time?

CertGetIssuerCertificateFromStore() is a reading function. So, concurrent usage of them is safe. Taken CERT_CONTEXT will be a copy of existing one, so it can be modified as you wish.

Related

Two-way communication between kernel-mode driver and user-mode application?

I need a two-way communication between a kernel-mode WFP driver and a user-mode application. The driver initiates the communication by passing a URL to the application which then does a categorization of that URL (Entertainment, News, Adult, etc.) and passes that category back to the driver. The driver needs to know the category in the filter function because it may block certain web pages based on that information. I had a thread in the application that was making an I/O request that the driver would complete with the URL and a GUID, and then the application would write the category into the registry under that GUID where the driver would pick it up. Unfortunately, as the driver verifier pointed out, this is unstable because the Zw registry functions have to run at PASSIVE_LEVEL. I was thinking about trying the same thing with mapped memory buffers, but I’m not sure what the interrupt requirements are for that. Also, I thought about lowering the interrupt level before the registry function calls, but I don't know what the side effects of that are.
You just need to have two different kinds of I/O request.
If you're using DeviceIoControl to retrieve the URLs (I think this would be the most suitable method) this is as simple as adding a second I/O control code.
If you're using ReadFile or equivalent, things would normally get a bit messier, but as it happens in this specific case you only have two kinds of operations, one of which is a read (driver->application) and the other of which is a write (application->driver). So you could just use WriteFile to send the reply, including of course the GUID so that the driver can match up your reply to the right query.
Another approach (more similar to your original one) would be to use a shared memory buffer. See this answer for more details. The problem with that idea is that you would either need to use a spinlock (at the cost of system performance and power consumption, and of course not being able to work on a single-core system) or to poll (which is both inefficient and not really suitable for time-sensitive operations).
There is nothing unstable about PASSIVE_LEVEL. Access to registry must be at PASSIVE_LEVEL so it's not possible directly if driver is running at higher IRQL. You can do it by offloading to work item, though. Lowering the IRQL is usually not recommended as it contradicts the OS intentions.
Your protocol indeed sounds somewhat cumbersome and doing a direct app-driver communication is probably preferable. You can find useful information about this here: http://msdn.microsoft.com/en-us/library/windows/hardware/ff554436(v=vs.85).aspx
Since the callouts are at DISPATCH, your processing has to be done either in a worker thread or a DPC, which will allow you to use ZwXXX. You should into inverted callbacks for communication purposes, there's a good document on OSR.
I've just started poking around WFP but it looks like even in the samples that they provide, Microsoft reinject the packets. I haven't looked into it that closely but it seems that they drop the packet and re-inject whenever processed. That would be enough for your use mode engine to make the decision. You should also limit the packet capture to a specific port (80 in your case) so that you don't do extra processing that you don't need.

Best way to communicate from KEXT to Daemon and block until result is returned from Daemon

In KEXT, I am listening for file close via vnode or file scope listener. For certain (very few) files, I need to send file path to my system daemon which does some processing (this has to happen in daemon) and returns the result back to KEXT. The file close call needs to be blocked until I get response from daemon. Based on result I need to some operation in close call and return close call successfully. There is lot of discussion on KEXT communication related topic on the forum. But they are not conclusive and appears be very old (year 2002 around). This requirement can be handled by FtlSendMessage(...) Win32 API. I am looking for equivalent of that on Mac
Here is what I have looked at and want to summarize my understanding:
Mach message: Provides very good way of bidirectional communication using sender and reply ports with queueing mechansim. However, the mach message APIs (e.g. mach_msg, mach_port_allocate, bootstrap_look_up) don't appear to be KPIs. The mach API mach_msg_send_from_kernel can be used, but that alone will not help in bidirectional communication. Is my understanding right?
IOUserClient: This appears be more to do with communicating from User space to KEXT and then having some callbacks from KEXT. I did not find a way to initiate communication from KEXT to daemon and then wait for result from daemon. Am I missing something?
Sockets: This could be last option since I would have to implement entire bidirectional communication channel from KEXT to Daemon.
ioctl/sysctl: I don't know much about them. From what I have read, its not recommended option especially for bidirectional communication
RPC-Mig: Again I don't know much about them. Looks complicated from what I have seen. Not sure if this is recommended way.
KUNCUserNotification: This appears to be just providing notification to the user from KEXT. It does not meet my requirement.
Supported platform is (10.5 onwards). So looking at the requirement, can someone suggest and provide some pointers on this topic?
Thanks in advance.
The pattern I've used for that process is to have the user-space process initiate a socket connection to the KEXT; the KEXT creates a new thread to handle messages over that socket and sleeps the thread. When the KEXT detects an event it needs to respond to, it wakes the messaging thread and uses the existing socket to send data to the daemon. On receiving a response, control is passed back to the requesting thread to decide whether to veto the operation.
I don't know of any single resource that will describe that whole pattern completely, but the relevant KPIs are discussed in Mac OS X Internals (which seems old, but the KPIs haven't changed much since it was written) and OS X and iOS Kernel Programming (which I was a tech reviewer on).
For what it's worth, autofs uses what I assume you mean by "RPC-Mig", so it's not too complicated (MIG is used to describe the RPC calls, and the stub code it generates handles calling the appropriate Mach-message sending and receiving code; there are special options to generate kernel-mode stubs).
However, it doesn't need to do any lookups, as automountd (the user-mode daemon to which the autofs kext sends messages) has a "host special port" assigned to it. Doing the lookups to find an arbitrary service would be harder.
If you want to use the socket established with ctl_register() on the KExt side, then beware: The communication from kext to user space (via ctl_enqueuedata()) works OK. However opposite direction is buggy on 10.5.x and 10.6.x.
After about 70.000 or 80.000 send() calls with SOCK_DGRAM in the PF_SYSTEM domain complete net stack breaks with disastrous consequences for complete system (hard turning off is the only way out). This has been fixed in 10.7.0. I workaround by using setsockopt() in our project for the direction from user space to kext as we only send very small data (just to allow/disallow some operation).

Many-to-many messaging on local machine without broker

I'm looking for a mechanism to use to create a simple many-to-many messaging system to allow Windows applications to communicate on a single machine but across sessions and desktops.
I have the following hard requirements:
Must work across all Windows sessions on a single machine.
Must work on Windows XP and later.
No global configuration required.
No central coordinator/broker/server.
Must not require elevated privileges from the applications.
I do not require guaranteed delivery of messages.
I have looked at many, many options. This is my last-ditch request for ideas.
The following have been rejected for violating one or more of the above requirements:
ZeroMQ: In order to do many-to-many messaging a central broker is required.
Named pipes: Requires a central server to receive messages and forward them on.
Multicast sockets: Requires a properly configured network card with a valid IP address, i.e. a global configuration.
Shared Memory Queue: To create shared memory in the global namespace requires elevated privileges.
Multicast sockets so nearly works. What else can anyone suggest? I'd consider anything from pre-packaged libraries to bare-metal Windows API functionality.
(Edit 27 September) A bit more context:
By 'central coordinator/broker/server', I mean a separate process that must be running at the time that an application tries to send a message. The problem I see with this is that it is impossible to guarantee that this process really will be running when it is needed. Typically a Windows service would be used, but there is no way to guarantee that a particular service will always be started before any user has logged in, or to guarantee that it has not been stopped for some reason. Run on demand introduces a delay when the first message is sent while the service starts, and raises issues with privileges.
Multicast sockets nearly worked because it manages to avoid completely the need for a central coordinator process and does not require elevated privileges from the applications sending or receiving multicast packets. But you have to have a configured IP address - you can't do multicast on the loopback interface (even though multicast with TTL=0 on a configured NIC behaves as one would expect of loopback multicast) - and that is the deal-breaker.
Maybe I am completely misunderstanding the problem, especially the "no central broker", but have you considered something based on tuple spaces?
--
After the comments exchange, please consider the following as my "definitive" answer, then:
Use a file-based solution, and host the directory tree on a Ramdisk to insure good performance.
I'd also suggest to have a look at the following StackOverflow discussion (even if it's Java based) for possible pointers to how to manage locking and transactions on the filesystem.
This one (.NET based) may be of help, too.
How about UDP broadcasting?
Couldn't you use a localhost socket ?
/Tony
In the end I decided that one of the hard requirements had to go, as the problem could not be solved in any reasonable way as originally stated.
My final solution is a Windows service running a named pipe server. Any application or service can connect to an instance of the pipe and send messages. Any message received by the server is echoed to all pipe instances.
I really liked p.marino's answer, but in the end it looked like a lot of complexity for what is really a very basic piece of functionality.
The other possibility that appealed to me, though again it fell on the complexity hurdle, was to write a kernel driver to manage the multicasting. There would have been several mechanisms possible in this case, but the overhead of writing a bug-free kernel driver was just too high.

Multi-threaded Windows Service - Erlang

I am going to tell the problem that I have to solve and I need some suggestions if i am in the right path.
The problem is:
I need to create a Windows Service application that receive a request and do some action. (Socket communication) This action is to execute a script (maybe in lua or perl).This script models te bussiness rules of the client, querying in Databases, making request in websites and then send a response to the client.
There are 3 mandatory requirements:
The service will receive a lot of request at the same time. So I think to use the worker's thread model.
The service must have a high throughput. I will have many of requests at the same second.
Low Latency: I must response these requests very quickly.
Every request will generate a log entries. I cant write these log entries in the physical disk at same time the scripts execute because the big I/O time. Probably I will make a queue in memory and others threds will consume this queue and write on disk.
In the future, is possible that two woker's thread have to change messages.
I have to make a protocol to this service. I was thinking to use Thrift, but i don't know the overhead involved. Maybe i will make my own protocol.
To write the windows service, i was thinking in Erlang. Is it a good idea?
Does anyone have suggestions/hints to solve this problem? Which is the better language to write this service?
Yes, Erlang is a good choice if you're know it or ready to learn. With Erlang you don't need any worker thread, just implement your server in Erlang style and you'll receive multithreaded solution automatically.
Not sure how to convert Erlang program to Windows service, but probably it's doable.
Writing to the same log file from many threads are suboptimal because requires locking. It's better to have a log-entries queue (lock-free?) and a separate thread (Erlang process?) that writes them to the file. BTW, are you sure that executing external script in another language is much faster than writing a log-record to the file?
It's doubtfully you'll receive much better performance with your own serialization library than Thrift provides for free. Another option is Google Protocol Buffers, somebody claimed that it's faster.
Theoretically (!) it's possible that Erlang solution won't provide you required performance. In this case consider a compilable language, e.g. C++ and asynchronous networking, e.g. Boost.Asio. But be ready that it's much more complicated than Erlang way.

Using OpenSSL without threads

Whenever I bash into OpenSSL on Windows or Mac I always make my own memory BIOs, and link them up to the platforms message based (asynchronous non blocking) socket implementation. (WSAAsyncSelect on windows: CFSocket on Mac)
Secure programming with the OpenSSL API hosted on ibm.com seems to be the best reference on implementing OpenSSL - but it implements a very simple blocking connection.
Is there a standard way to setup and use OpenSSL with non blocking sockets - such that calls to SSL_read will not block if there is no data for example?
SSL_read() (and the other SSL functions) work fine if the underlying socket is set non-blocking. If insufficient data is available, it will return a value less than zero; SSL_get_error() called on the return value will return SSL_ERROR_WANT_READ or SSL_ERROR_WANT_WRITE, indicating what SSL is waiting for.
Using BIO_set_nbio with either BIO_new_socket or BIO_new_connect/accept is probably less code than creating memory BIOs. Not sure if there's anything more standard than that. The docs explain this in more detail:
http://www.openssl.org/docs/crypto/BIO_s_connect.html

Resources