RISC-V: Minimum CSR requirements for a simple RV32I implementation capable of leveraging GCC

What would be the bare minimum CSR requirements for an RV32I core capable of running machine code generated with GCC?
I'm thinking of a simple FPGA-based (embedded) implementation. No virtual memory or Linux support is required.
Also, what GCC flags should I use in order to prevent it from emitting unimplemented CSR-related instructions?
I'm still quite confused after scanning through the RISC-V Privileged ISA Specification.
Thanks!

Have a look at the RARS simulator as an example of a simple RISC-V implementation. It implements sufficient CSRs (e.g. the exception cause, processor status, exception PC, vector table address, etc.) that you can program an interrupt handler.
You'll need:
utvec: sets the exception handler address
ustatus: enables/disables interrupts
uscratch: scratch register needed by the software exception handler
ucause: gives the reason for the exception
uepc: holds the address of the instruction that was interrupted by the exception
And some others.  In RARS, you can see the registers implemented in the register display, Control and Status tab.
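Note that the u* names above are the user-level CSRs RARS implements; a bare FPGA core would more typically implement only the machine-mode equivalents (mtvec, mstatus, mscratch, mcause, mepc). Since CSRs are reachable only through the dedicated csrr/csrw instructions, C code touches them via inline assembly. A minimal sketch, assuming an M-mode RV32 target and GCC (the helper names are mine; recent toolchains may also need a _zicsr suffix in -march for these to assemble):

    #include <stdint.h>

    /* The CSR name must appear literally in the instruction text. */
    static inline uint32_t read_mcause(void) {
        uint32_t x;
        __asm__ volatile ("csrr %0, mcause" : "=r"(x));
        return x;
    }

    static inline void write_mtvec(uint32_t base) {
        /* The low two bits of mtvec select the trap mode; 0 = direct. */
        __asm__ volatile ("csrw mtvec, %0" : : "r"(base));
    }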
I believe RARS supports the timer, so it has some CSRs for that. It also provides a floating-point unit, so there are some CSRs for its exception flags as well as rounding configuration. For handling memory access exceptions, it has utval. And then it offers some counters. See also table 2.2 in Document Version 20190608-Priv-MSU-Ratified of the privileged specification.
I would think that your usage of CSRs would be restricted to standalone application configuration, e.g. initial bootup, and interrupt handling, both of which would be written in assembly.
Hard to imagine that compiled C code (object files, .o's) would touch the CSRs in any way.  If you have an example of that, please share it.
In some environments, the C implementation allows for standalone (i.e. unhosted/freestanding) programs. It is possible that such a program created by some compiler includes startup configuration and an exception handler, though more likely these would be user supplied. See, for example, http://cs107e.github.io/guides/gcc/
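As for the GCC flags part of the question: as far as I know the compiler never emits CSR instructions for ordinary C code, so targeting the base ISA with -march=rv32i -mabi=ilp32 (plus -ffreestanding and -nostartfiles for bare metal) is enough; CSR accesses appear only in code you write yourself. For the handler itself, RISC-V GCC also has an interrupt function attribute. A hedged sketch, continuing the M-mode assumption from the earlier snippet:

    /* Because of the attribute, GCC saves/restores the registers this
       handler clobbers and returns with mret instead of ret. */
    void __attribute__((interrupt("machine"))) trap_handler(void) {
        uint32_t cause = read_mcause();  /* helper from the earlier sketch */
        (void)cause;                     /* dispatch on the cause here */
    }

    /* At boot, point mtvec at the handler before enabling interrupts:
       write_mtvec((uint32_t)&trap_handler); */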

Related

Is there a workqueue feature in the XNU kernel?

I need to use a workqueue-like feature on Mac OS X (in a kernel-mode driver) and am looking for a way to add work to a queue to be processed by a kernel thread later. Conceptually this is the same thing as the workqueue feature available in the Linux kernel. Is there something similar in the XNU kernel as well?
I don't think there's a direct equivalent as such, although I admit I'm not intimately familiar with the Linux side, so I'll avoid comparing and just tell you about what's available on macOS/xnu.
I/O Kit IOWorkLoops
If you're building an I/O Kit driver, and especially if you're writing a secondary interrupt handler, you'll be using IOWorkLoops. Interrupts are abstracted by IOEventSource objects, which schedule secondary interrupt handlers to run on the driver's IOWorkLoop.
Each IOWorkLoop wraps one kernel thread and also provides a serialisation/locking mechanism for resources shared with that thread. All jobs submitted to a workloop, whether explicitly through an IOCommandGate or the workloop object directly, or as a result of an IOEventSource event, will be serialised. Note that IOCommandGate jobs will run synchronously on the calling thread, not the workloop thread.
As always with macOS/OSX internals, you will want to look at the header file comments and possibly the implementation in the xnu source for details. I personally find IOWorkLoops a bit clumsy for some tasks, but if you're dealing with PCI devices, etc. you don't really have a choice.
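A rough sketch of the pattern, assuming an IOService subclass (MyDriver and its fGate member are hypothetical names, not part of I/O Kit):

    #include <IOKit/IOWorkLoop.h>
    #include <IOKit/IOCommandGate.h>

    // Runs serialised with everything else on the workloop, so it can
    // safely touch state shared with the event sources.
    static IOReturn DoWorkGated(OSObject *owner, void *arg0,
                                void *, void *, void *)
    {
        return kIOReturnSuccess;
    }

    bool MyDriver::setUpGate()
    {
        IOWorkLoop *loop = getWorkLoop();
        fGate = IOCommandGate::commandGate(this);
        if (!loop || !fGate || loop->addEventSource(fGate) != kIOReturnSuccess)
            return false;
        return true;
    }

    // From any thread; runs DoWorkGated synchronously under the gate:
    //   fGate->runAction(DoWorkGated);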
thread_call
A more lightweight background work mechanism is the thread_call API. It's defined in <kern/thread_call.h> and supports running functions on an OS-managed background thread, optionally after a delay or with a specific priority. This is probably closer to what you know from Linux, has a fairly straightforward API, but is not suitable for secondary interrupt handlers.
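A hedged sketch of the basic usage (process_work and the context value are hypothetical):

    #include <kern/thread_call.h>

    // Called later on an OS-managed background thread. param0 is the value
    // given at allocation time; param1 comes from thread_call_enter1().
    static void process_work(thread_call_param_t param0,
                             thread_call_param_t param1)
    {
        /* do the deferred work here */
    }

    // Setup, e.g. in the driver's start routine:
    //   thread_call_t call = thread_call_allocate(process_work, ctx);
    // Queue work (returns TRUE if the call was already pending):
    //   thread_call_enter(call);
    // Teardown:
    //   thread_call_cancel(call);
    //   thread_call_free(call);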

Is it possible to allow a particular user-level application to run in kernel-mode?

This is a hypothetical question. Suppose there is an application (which typically executes in user mode) that wants to access kernel data structures, read register values, and perform some kernel-level functions.
Is there a way for the kernel and/or the CPU to allow this application to perform its functions while maintaining the normal user-level/kernel-level isolation for all other applications?
In order either to put your app in kernel space (kernel memory) or to run it in ring 0 CPU mode, you would need to do so from kernel code. In the normal state of operation you can't run an app from the kernel with those privileges (at least there is no existing API to do that). It's probably possible to implement kernel code capable of this, but it would be tricky, it would break the whole concept of kernel-space/user-space separation, and if the app used any advanced user-space API it wouldn't work anyway.
If you are thinking about just giving your app ring 0 privileges, that won't work either, because the kernel has its own stack and because of kernel-space/user-space memory separation, so you wouldn't be able to call the internal kernel API.
Basically, you can achieve the same thing by writing a kernel module instead. And for running kernel code on behalf of a user-space app, you can use the system call interface.
So, answering your question: no, it's not possible to run a user-space app in kernel mode so that it can use the internal kernel API.
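To make the system-call route concrete, here is a hedged user-space sketch (Linux-flavoured; the device node, the request code, and the module behind them are all hypothetical):

    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <unistd.h>

    // Hypothetical request code handled by the kernel module's ioctl handler.
    #define MYDRV_DO_WORK _IOR('M', 1, long)

    int main(void)
    {
        int fd = open("/dev/mydrv", O_RDWR);   // hypothetical device node
        if (fd < 0)
            return 1;
        long result = 0;
        ioctl(fd, MYDRV_DO_WORK, &result);     // kernel code runs on our behalf
        close(fd);
        return 0;
    }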

EXE size bloats while using Websocketpp

I've built a very basic EXE which uses the Websocketpp client; it just connects to a WebSocket server, and sends and receives a message.
I've used VS 2013.
I'm noticing that the size of the EXE is mammoth. It's like 2.3 MB for Release and 6 MB for Debug.
Any ideas as to how I can reduce the size of the EXE?
WebSocket++ author here. The sizes you quote seem to be in the right ballpark. Keep in mind that a "very basic sample" like the echo_server (which produces a ~1 MB executable on Linux) does a lot more than you might think based on the ~50 lines in the program source.
Out of the box any WebSocket++/Asio based program is a high performance event based client/server system and includes code for DNS resolution, IPv4 and IPv6, timers, SHA1/MD5 hashing, base64 encoding, UTF8 validation, logging, thread safety, and parsers for URIs, HTTP, and multiple WebSocket protocol versions. Just because you only use these capabilities to echo back messages doesn't make this a trivial program.
Some observations/notes on the topic:
1. Due to the way templates work, the code for WebSocket++, Asio, and the STL is compiled into your program rather than sitting in an externally linked library. This may make a WebSocket++ or Asio program look artificially larger than a program that links to an external library.
2. The situation described in #1 can sometimes end up more efficient than an external library, because the program will only include the parts of the library that your code actually uses, rather than all parts. I.e., if you don't instantiate a client endpoint, no client code will be included. If your config disables TLS encryption, logging, or the thread safety features, they will also not be included. Again, due to the way templates work, this can go both ways. For example: a program that includes both a client and a server will have some potentially unnecessary duplication.
3. The size of WebSocket++'s code is largely correlated with the number of different endpoint configs that you use and the options enabled in each of those configs. These represent a fixed size no matter what else your program does. If your program does little, they will make up a large proportion of the code. If your program does a lot, that proportion will shrink.
4. WebSocket++ is fairly modular (though this is less well documented right now). If you are really concerned about code size (small embedded systems, perhaps?) and don't actually need all the features that Asio and WebSocket++ bring out of the box, you can set up a custom config that either removes many features or replaces them with your own space-optimized implementations.
Say you only ever need to service one non-TLS connection with no DNS lookup and no security timeouts in a guaranteed single-threaded program with no logging. You can implement your own network transport policy based on your native OS socket library that doesn't include all the stuff that Asio does. You can also stub out the locking/concurrency and logger policies you don't need, as the sketch below illustrates.
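For illustration, a rough sketch of that kind of trimming via a custom config, here just stubbing out the loggers on top of the stock non-TLS Asio config (the struct name is mine; see the config headers for the full set of policy typedefs you can override):

    #include <websocketpp/config/asio_no_tls.hpp>
    #include <websocketpp/logger/stub.hpp>
    #include <websocketpp/server.hpp>

    // Inherits everything from the stock config but replaces both the
    // access and error loggers with the no-op stub logger.
    struct quiet_config : public websocketpp::config::asio {
        typedef websocketpp::log::stub elog_type;
        typedef websocketpp::log::stub alog_type;
    };

    typedef websocketpp::server<quiet_config> server;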

Who enforces dwShareMode on CreateFile? The OS or the Driver?

I have a Windows application that interacts with some hardware. A handle to the hardware is opened with CreateFile, and we control the hardware using DeviceIoControl.
I'm attempting to update an application which uses this hardware to open the hardware in an exclusive mode, so that other programs can't access the hardware at the same time (the hardware has mutable state that I can't have changed out from under me). I do this by passing 0 as the dwShareMode parameter to CreateFile. After making this change, I am still able to run two separate instances of my application. Both calls to CreateFile in both processes are successful. Neither returns INVALID_HANDLE_VALUE.
I believe one of several things is happening, and I'm asking for help narrowing the problem down.
1. I badly misunderstand the dwShareMode parameter.
2. dwShareMode doesn't have any effect on DeviceIoControl, only on ReadFile or WriteFile.
3. The driver itself is somehow responsible for respecting the dwShareMode parameter, and our driver is written badly. This, sadly, isn't totally unheard of.
Edit: Option #2 is nonsense. dwShareMode should prevent the second CreateFile from succeeding; DeviceIoControl has nothing to do with it. It must be option #1 or option #3.
The Question:
Is the device driver responsible for looking at the dwShareMode parameter, and rejecting requests if someone has already opened a handle without sharing, or is the OS responsible?
If the device driver is responsible, then I'm going to assume #3 is happening. If the OS is responsible, then it must be #1.
Some additional clues:
IRP_MJ_CREATE documentation suggests that the sharing mode does indeed get passed down to the device driver
I believe that sharing rules are only enforced on some devices. In many (most?) cases enforcing sharing rules on the device object itself (as opposed to on objects within the device namespace) would make no sense.
Therefore, it must be the responsibility of the device driver to enforce these rules in those rare cases where they are required. (Either that or the device driver sets a flag to instruct the operating system to do so, but there doesn't seem to be a flag of this sort.)
In the case of a volume device, for example, you can open the device with a sharing mode of 0 even though the volume is mounted. [The documentation for CreateFile says you must use FILE_SHARE_WRITE but this does not appear to be true.]
In order to gain exclusive access to the volume, you use the FSCTL_LOCK_VOLUME control code.
[That's a file system driver, so it might not be a typical case. But I don't think it makes a difference in this context.]
Serial port and LPT drivers would be an example of a device that should probably enforce sharing rules. I think there may be some applicable sample code, perhaps this would shed light on things?
Edited to add:
I've had a look through the Windows Research Kernel source (this is essentially the same as the Windows Server 2003 kernel) and:
1) The code that opens a device object (by sending IRP_MJ_CREATE to the driver) does not appear to make any attempt to enforce the sharing mode parameter, though it does check access permissions and enforces the Exclusive flag for the driver;
2) I've also searched the code for references to the structure member that holds the requested dwShareMode. As far as I can see it is written into the relevant structure by the internal function that implements CreateFile, and later passed to the device driver, but otherwise ignored.
So, my conclusion remains the same: enforcing the sharing mode rules, or providing an alternative mechanism if appropriate, is the responsibility of the device driver.
(The kernel does, however, provide functions such as IoCheckShareAccess to assist file system drivers in enforcing sharing rules.)
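For completeness, a hedged sketch of what that looks like in a WDM driver's IRP_MJ_CREATE handler (MY_DEVICE_EXTENSION and its ShareAccess field are hypothetical; a matching IoRemoveShareAccess belongs in the close path, and the calls must be serialised against each other):

    #include <ntddk.h>

    typedef struct _MY_DEVICE_EXTENSION {
        SHARE_ACCESS ShareAccess;   // tracks who currently has the device open
    } MY_DEVICE_EXTENSION, *PMY_DEVICE_EXTENSION;

    NTSTATUS MyCreate(PDEVICE_OBJECT DeviceObject, PIRP Irp)
    {
        PIO_STACK_LOCATION sp = IoGetCurrentIrpStackLocation(Irp);
        PMY_DEVICE_EXTENSION ext =
            (PMY_DEVICE_EXTENSION)DeviceObject->DeviceExtension;

        // Compares the caller's desired access and share mode against all
        // existing opens and, if compatible, records this open (Update=TRUE).
        NTSTATUS status = IoCheckShareAccess(
            sp->Parameters.Create.SecurityContext->DesiredAccess,
            sp->Parameters.Create.ShareAccess,
            sp->FileObject,
            &ext->ShareAccess,
            TRUE);

        Irp->IoStatus.Status = status;
        Irp->IoStatus.Information = 0;
        IoCompleteRequest(Irp, IO_NO_INCREMENT);
        return status;
    }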
In cases where we open a COM port with:

    HANDLE h = CreateFile(devname,
                          GENERIC_READ | GENERIC_WRITE,
                          0,              // dwShareMode = 0: no sharing
                          NULL,           // default security attributes
                          OPEN_EXISTING,
                          0,              // no flags/attributes
                          NULL);          // no template file

it doesn't allow another application to open the same COM port until the previous handle is closed. I would suggest walking through serenum.sys to check if it has a role here.

Two-way communication between kernel-mode driver and user-mode application?

I need two-way communication between a kernel-mode WFP driver and a user-mode application. The driver initiates the communication by passing a URL to the application, which then does a categorization of that URL (Entertainment, News, Adult, etc.) and passes that category back to the driver. The driver needs to know the category in the filter function because it may block certain web pages based on that information.

I had a thread in the application that was making an I/O request that the driver would complete with the URL and a GUID, and the application would then write the category into the registry under that GUID, where the driver would pick it up. Unfortunately, as the driver verifier pointed out, this is unstable, because the Zw registry functions have to run at PASSIVE_LEVEL. I was thinking about trying the same thing with mapped memory buffers, but I'm not sure what the interrupt level requirements are for that. Also, I thought about lowering the interrupt level before the registry function calls, but I don't know what the side effects of that are.
You just need to have two different kinds of I/O request.
If you're using DeviceIoControl to retrieve the URLs (I think this would be the most suitable method) this is as simple as adding a second I/O control code.
If you're using ReadFile or equivalent, things would normally get a bit messier, but as it happens in this specific case you only have two kinds of operations, one of which is a read (driver->application) and the other of which is a write (application->driver). So you could just use WriteFile to send the reply, including of course the GUID so that the driver can match up your reply to the right query.
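A hedged sketch of what the user-mode side of the two-control-code approach might look like (the control codes, structures, and Categorize function are all hypothetical):

    #include <windows.h>
    #include <winioctl.h>

    #define IOCTL_GET_URL      CTL_CODE(FILE_DEVICE_UNKNOWN, 0x800, \
                                        METHOD_BUFFERED, FILE_ANY_ACCESS)
    #define IOCTL_SET_CATEGORY CTL_CODE(FILE_DEVICE_UNKNOWN, 0x801, \
                                        METHOD_BUFFERED, FILE_ANY_ACCESS)

    struct UrlQuery   { GUID id; char url[1024]; };  // driver -> application
    struct UrlVerdict { GUID id; ULONG category; };  // application -> driver

    ULONG Categorize(const char *url);  // the app's classifier (hypothetical)

    void CategorizerLoop(HANDLE dev)
    {
        UrlQuery q;
        UrlVerdict v;
        DWORD got;
        for (;;) {
            // Blocks until the driver completes it with a URL to classify.
            if (!DeviceIoControl(dev, IOCTL_GET_URL, NULL, 0,
                                 &q, sizeof q, &got, NULL))
                break;
            v.id = q.id;
            v.category = Categorize(q.url);
            DeviceIoControl(dev, IOCTL_SET_CATEGORY, &v, sizeof v,
                            NULL, 0, &got, NULL);
        }
    }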
Another approach (more similar to your original one) would be to use a shared memory buffer. See this answer for more details. The problem with that idea is that you would either need to use a spinlock (at the cost of system performance and power consumption, and of course not being able to work on a single-core system) or to poll (which is both inefficient and not really suitable for time-sensitive operations).
There is nothing unstable about PASSIVE_LEVEL. Access to the registry must happen at PASSIVE_LEVEL, so it's not possible directly if the driver is running at a higher IRQL. You can do it by offloading to a work item, though. Lowering the IRQL is usually not recommended, as it contradicts the OS's intentions.
Your protocol indeed sounds somewhat cumbersome, and direct app-driver communication is probably preferable. You can find useful information about this here: http://msdn.microsoft.com/en-us/library/windows/hardware/ff554436(v=vs.85).aspx
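A hedged sketch of the work-item offload mentioned above (WDM; the callout captures what it needs, queues the item, and lets the worker do the Zw* calls):

    #include <ntddk.h>

    // Runs at PASSIVE_LEVEL on a system worker thread, so Zw* registry
    // calls are legal here. Context carries the work item so it can free
    // itself (a common pattern).
    static VOID RegistryWorker(PDEVICE_OBJECT DeviceObject, PVOID Context)
    {
        UNREFERENCED_PARAMETER(DeviceObject);
        PIO_WORKITEM item = (PIO_WORKITEM)Context;
        /* ZwOpenKey / ZwSetValueKey / ... */
        IoFreeWorkItem(item);
    }

    // From the DISPATCH_LEVEL callout:
    //   PIO_WORKITEM item = IoAllocateWorkItem(DeviceObject);
    //   if (item != NULL)
    //       IoQueueWorkItem(item, RegistryWorker, DelayedWorkQueue, item);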
Since the callouts are at DISPATCH, your processing has to be deferred to a worker thread (a DPC still runs at DISPATCH_LEVEL, so only a PASSIVE_LEVEL worker thread will allow you to use the ZwXXX functions). You should look into inverted callbacks for communication purposes; there's a good document on OSR.
I've just started poking around WFP, but it looks like even in the samples that they provide, Microsoft reinjects the packets. I haven't looked into it that closely, but it seems that they drop the packet and re-inject it once it has been processed. That would be enough for your user-mode engine to make the decision. You should also limit the packet capture to a specific port (80 in your case) so that you don't do extra processing that you don't need.
