Why is jiffies protected by a seqlock instead of directly using an atomic operation? - linux-kernel

Why is jiffies protected by a seqlock instead of directly using an atomic operation?

A seqlock never blocks the writer, which matters because updating the jiffies counter is time-sensitive. The reader doesn't block either; it just loops until the read succeeds.
An atomic increment is a read-modify-write, so the writer would have to use a mechanism like CAS.

jiffies_64, being a 64-bit value, may not be atomic to read and/or write on all architectures without the use of expensive "lock" instruction prefixes. Reading the low-order portion (jiffies), which is as wide as the CPU's native word and register size, is however sufficiently atomic.
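To make the retry loop concrete, here is a minimal user-space sketch of the seqlock idea using C11 atomics. It is not the kernel's implementation: struct seq_counter and its functions are hypothetical names, a single writer is assumed, and default sequentially consistent atomics are used for simplicity rather than speed. The 64-bit value is deliberately stored as two 32-bit halves, the situation a 32-bit CPU is in with jiffies_64.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical user-space analogue of a seqlock-protected 64-bit counter. */
struct seq_counter {
    atomic_uint seq;        /* even: stable, odd: write in progress */
    _Atomic uint32_t lo;    /* low half of the 64-bit value */
    _Atomic uint32_t hi;    /* high half of the 64-bit value */
};

void seq_counter_init(struct seq_counter *c)
{
    atomic_init(&c->seq, 0u);
    atomic_init(&c->lo, 0u);
    atomic_init(&c->hi, 0u);
}

/* Writer: never waits for readers. Assumes there is only one writer. */
void seq_counter_write(struct seq_counter *c, uint64_t v)
{
    unsigned s = atomic_fetch_add(&c->seq, 1u);   /* sequence becomes odd */
    assert((s & 1u) == 0);                        /* no other write in flight */
    atomic_store(&c->lo, (uint32_t)v);
    atomic_store(&c->hi, (uint32_t)(v >> 32));
    atomic_fetch_add(&c->seq, 1u);                /* sequence is even again */
}

/* Reader: takes no lock, retries if a write overlapped the two loads. */
uint64_t seq_counter_read(struct seq_counter *c)
{
    unsigned before, after;
    uint32_t lo, hi;

    do {
        before = atomic_load(&c->seq);
        lo = atomic_load(&c->lo);
        hi = atomic_load(&c->hi);
        after = atomic_load(&c->seq);
    } while ((before & 1u) || before != after);

    return ((uint64_t)hi << 32) | lo;
}
```

The writer's cost is two increments and two stores regardless of how many readers exist, which is the property the answer above is pointing at.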

Related

How to access physical address during interrupt handler linux

I wrote an interrupt handler in Linux.
Part of the handler logic needs to access a physical address.
I used the ioremap function, but then I fell into KDB while the handler was running.
I started to debug it and saw the code below, which is eventually called by ioremap.
What should I do? Is there any other way besides mapping the region beforehand?
If I need to map it beforehand, I will probably have to map and cache a lot of unused area.
BTW, what are the limits for ioremap?
Setting up a new memory mapping is an expensive operation, which typically requires calls to potentially blocking functions (e.g. grabbing locks). So your strategy has two problems:
1) Calling a blocking function is not possible in your context (there is no kernel thread associated with your interrupt handler, so there is no way for the kernel to resume it if it had to be put to sleep).
2) Setting up/tearing down a mapping per IRQ would be a bad idea performance-wise (even if we ignore the fact that it can't be done).
Typically, you would set up any mappings you need in the driver's probe() function (or in the module's init() if it's more of a singleton thing). The mapping is then kept in some private device data structure, which is passed as the last argument to some variant of request_irq(), so that the kernel passes it back as the second argument to the IRQ handler.
Not sure what you mean by "need to map and cache a lot of unused area".
Depending on your particular system, you may end up consuming an entry in your CPU's MMU, or you may just re-use a broader mapping that was set up by whoever wrote the BSP. That's just the cost of doing business on a virtual memory system.
Caching is typically not enabled on I/O memory because of the many side-effects of both reads and writes. For the odd cases when you need it, you have to use ioremap_cached().

Why is Ruby's IO#pwrite thread-safe?

I was wondering if someone could explain to me why Ruby's IO#pwrite function is said to be thread-safe in the documentation:
This is advantageous to combining IO#seek and IO#write in that it is
atomic, allowing multiple threads/process to share the same IO object
for reading the file at various location
My understanding of atomicity is that it's all or nothing, if an error is raised the "transaction" will be rolled back so in this case the file would be closed with its original contents (correct?).
Atomicity does not guarantee thread synchronization however, unless rb_thread_io_blocking_region is a synchronized method?
Here's a snippet of the source of the pwrite function, also available here
n = (ssize_t)rb_thread_io_blocking_region(internal_pwrite_func, &arg, fptr->fd);
if (n < 0) rb_sys_fail_path(fptr->pathv);
rb_str_tmp_frozen_release(str, tmp);
return SSIZET2NUM(n);
}
The synchronization is performed by the kernel (the operating system), not Ruby.
As per the documentation, Ruby's pwrite calls the pwrite() system call, which takes care of the synchronization.
The behavior of the pwrite system call is described in the POSIX specification of write(). Specifically:
After a write() to a regular file has successfully returned:
Any successful read() from each byte position in the file that was
modified by that write shall return the data specified by the
write() for that position until such byte positions are again
modified.
Any subsequent successful write() to the same byte position in the
file shall overwrite that file data.
The extensive rationale discusses serialization in more detail.
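As a rough illustration of what the system call gives you, here is a small C sketch of positioned writes. write_all_at and pwrite_demo are hypothetical helper names, and the temporary file path is an assumption. The point it shows is that pwrite() takes the offset as an argument and leaves the descriptor's shared file position alone, so there is no seek-then-write window for another thread to interleave with.

```c
#define _XOPEN_SOURCE 700   /* for pwrite(), pread(), mkstemp() */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Write the whole buffer at an absolute offset. Returns 0 on success, -1 on
 * error. pwrite() may write fewer bytes than requested, so loop until done. */
int write_all_at(int fd, const void *buf, size_t len, off_t offset)
{
    const char *p = buf;

    assert(fd >= 0);
    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, offset);
        if (n < 0 && errno == EINTR)
            continue;               /* interrupted before writing: retry */
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
        offset += n;
    }
    return 0;
}

/* Hypothetical demo: write two regions out of order, then check that the
 * file position never moved. Returns 0 if everything behaved as expected. */
int pwrite_demo(void)
{
    char path[] = "/tmp/pwrite_demo_XXXXXX";
    char back[8] = {0};
    int fd = mkstemp(path);
    int ok;

    if (fd < 0)
        return -1;
    unlink(path);                   /* file lives until fd is closed */

    ok = write_all_at(fd, "BBBB", 4, 4) == 0      /* second half first */
      && write_all_at(fd, "AAAA", 4, 0) == 0
      && lseek(fd, 0, SEEK_CUR) == 0              /* position untouched */
      && pread(fd, back, sizeof back, 0) == 8
      && memcmp(back, "AAAABBBB", 8) == 0;

    close(fd);
    return ok ? 0 : -1;
}
```

Two threads calling write_all_at on the same descriptor with different offsets do not need a lock around the call; whether overlapping writes make sense is still the application's problem.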

Why doesn't gcc handle volatile register?

I'm working on a timing loop for the AVR platform where I'm counting down a single byte inside an ISR. Since this task is a primary function of my program, I'd like to permanently reserve a processor register so that the ISR doesn't have to hit a memory barrier when its usual code path is decrement, compare to zero, and reti.
The avr-libc docs show how to bind a variable to a register, and I got that working without a problem. However, since this variable is shared between the main program (for starting the timer countdown) and the ISR (for actually counting and signaling completion), it should also be volatile to ensure that the compiler doesn't do anything too clever in optimizing it.
In this context (reserving a register across an entire monolithic build), the combination volatile register makes sense to me semantically, as "permanently store this variable in register rX, but don't optimize away checks because the register might be modified externally". GCC doesn't like this, however, and emits a warning that it might go ahead and optimize away the variable access anyway.
The bug history of this combination in GCC suggests that the compiler team is simply unwilling to consider the type of scenario I'm describing and thinks it's pointless to provide for it. Am I missing some fundamental reason why the volatile register approach is in itself a Bad Idea, or is this a case that makes semantic sense but that the compiler team just isn't interested in handling?
The semantics of volatile are not exactly as you describe "don't optimize away checks because the register might be modified externally" but are actually more narrow: Try to think of it as "don't cache the variable's value from RAM in a register".
Seen this way, it does not make any sense to declare a register as volatile because the register itself cannot be 'cached' and therefore cannot possibly be inconsistent with the variable's 'actual' value.
The fact that read accesses to volatile variables are usually not optimized away is merely a side effect of the above semantics, but it's not guaranteed.
I think GCC should assume by default that a value in a register is 'like volatile' but I have not verified that it actually does so.
Edit:
I just did a small test and found:
avr-gcc 4.6.2 does not treat global register variables like volatiles with respect to read accesses, and
the Naggy extension for Atmel Studio detects an error in my code: "global register variables are not supported".
Assuming that global register variables are actually considered "unsupported" I am not surprised that gcc treats them just like local variables, with the known implications.
My test code looks like this:
#include <stdint.h>
uint8_t var;
volatile uint8_t volVar;
register uint8_t regVar asm("r13");
#define NOP asm volatile ("nop\r\n":::)
int main(void)
{
    var = 1;              // <-- kept
    if ( var == 0 ) {
        NOP;              // <-- optimized away, var is not volatile
    }
    volVar = 1;           // <-- kept
    if ( volVar == 0 ) {
        NOP;              // <-- kept, volVar *is* volatile
    }
    regVar = 1;           // <-- optimized away, regVar is treated like a local variable
    if ( regVar == 0 ) {
        NOP;              // <-- optimized away consequently
    }
    for(;;){}
}
The reason you would use the volatile keyword on AVR variables is to, as you said, avoid the compiler optimizing access to the variable. The question now is, how does this happen though?
A variable can reside in one of two places: the general-purpose register file or some location in RAM. Consider the case where the variable resides in RAM. To access the latest value of the variable, the compiler loads it from RAM using some form of the ld instruction, say lds r16, 0x000f. Here the variable is stored at RAM location 0x000f and the program has made a copy of it in r16.
Now, here is where things get interesting if interrupts are enabled. Say that after the load, inc r16 executes, then an interrupt triggers and its ISR runs. The ISR also uses the variable. There is a problem, however: the variable now exists in two versions, one in RAM and one in r16. Ideally the ISR would use the version in r16, but that copy is not guaranteed to exist, so it loads the value from RAM instead and the code does not behave as intended.
Enter the volatile keyword. The variable is still stored in RAM, but the compiler must perform every read and write of it in RAM rather than working on a cached copy. volatile alone does not make the read-modify-write atomic, though; for that the update also has to run with interrupts disabled (avr-libc provides ATOMIC_BLOCK for this). In assembly, the sequence looks something like this:
cli
lds r16, 0x000f
inc r16
sei
sts 0x000f, r16
First, interrupts are disabled. Then the variable is loaded into r16 and incremented, interrupts are re-enabled, and the variable is stored. It may appear confusing for the global interrupt flag to be set before the variable is stored back in RAM, but from the instruction set manual:
The instruction following SEI will be executed before any pending interrupts.
This means that the sts instruction will be executed before any interrupts trigger again, and that the interrupts are disabled for the minimum amount of time possible.
Consider now the case where the variable is bound to a register. Any operations done on the variable are done directly on the register. These operations, unlike operations done to a variable in RAM, can be considered atomic, as there is no read -> modify -> write cycle to speak of. If an interrupt triggers after the variable is updated, it will get the new value of the variable, since it will read the variable from the register it was bound to.
Also, since the variable is bound to a register, any test instructions will utilize the register itself and will not be optimized away on the grounds the compiler may have a "hunch" it is a static value, given that registers by their very nature are volatile.
Now, from experience, when using interrupts on AVR, I have sometimes noticed that global volatile variables never hit RAM. The compiler kept them in registers all the time, bypassing the read -> modify -> write cycle altogether. This was due, however, to compiler optimizations, and it should not be relied on. Different compilers are free to generate different assembly for the same piece of code. You can generate a disassembly of your final file or any particular object file using the avr-objdump utility.
Reserving a register for one variable for a complete compilation unit is probably too restrictive for a compiler's code generator. That is, every C routine would have to NOT use that register.
How do you guarantee that other called routines do NOT use that register once your code goes out of scope? Even stuff like serial i/o routines would have to NOT use that reserved register. Compilers do NOT recompile their run-time libraries based on a data definition in a user program.
Is your application really so time-sensitive that the extra delay for bringing memory up from L2 or L3 can be detected? If so, then your ISR might be running so frequently that the required memory location is always available (i.e. it doesn't get evicted from the cache) and thus does NOT hit a memory barrier (I assume by memory barrier you are referring to how memory in a CPU really operates, through caching, etc.). But for this to really be true the processor would have to have a fairly large L1 cache and the ISR would have to run at a very high frequency.
Finally, sometimes an application's requirements make it necessary to code it in ASM in which case you can do exactly what you are requesting!

Kernel threads vs Timers

I'm writing a kernel module which uses a customized print-on-screen system. Basically each time a print is involved the string is inserted into a linked list.
Every X seconds I need to process the list and perform some operations on the strings before printing them.
Basically I have two choices to implement such a filter:
1) Timer (which restarts itself in the end)
2) Kernel thread which sleeps for X seconds
While the filter is performing its stuff nothing else can use the linked list and, of course, while inserting a string the filter function shall wait.
AFAIK a timer runs in interrupt context so it cannot sleep, but what about kernel threads? Can they sleep? If yes, is there some reason not to use them in my project? What other solution could be used?
To summarize: my filter function has got only 3 requirements:
1) Must be able to printk
2) When using the list everything else which is trying to access the list must block until the filter function finishes execution
3) Must run every X seconds (not a realtime requirement)
kthreads are allowed to sleep. (However, not all kthreads offer sleepful execution to all clients. ksoftirqd, for example, would not.)
But then again, you could also use spinlocks (and their associated cost) and do without the extra thread (that's basically what the timer does, using spin_lock_bh). It's a tradeoff really.
each time a print is involved the string is inserted into a linked list
I don't really know if you meant print or printk. But if you're talking about printk(), you would need to allocate memory, and you are in trouble because printk() may be called in an atomic context. That leaves you the option of using a circular buffer (and thus you should be tolerant of dropping some strings, because you might not have enough memory to save all of them).
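A minimal sketch of such a circular buffer, written as plain user-space C so it can be tried outside the kernel. struct str_ring, RING_SLOTS and SLOT_LEN are hypothetical names and sizes; the drop-oldest policy is one possible choice, not the only one. Locking is deliberately left out: in a kernel module the push and pop would need to be protected, for example by a spinlock that is safe to take in atomic context.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define RING_SLOTS 4    /* assumed capacity, chosen small for the example */
#define SLOT_LEN   32   /* assumed maximum string length including NUL */

struct str_ring {
    char   slot[RING_SLOTS][SLOT_LEN];  /* preallocated storage, no malloc */
    size_t head;                        /* next slot to write */
    size_t count;                       /* strings currently stored */
    size_t dropped;                     /* strings overwritten when full */
};

/* Store a (possibly truncated) copy of s. Never allocates and never blocks.
 * When the ring is full the oldest string is overwritten and counted. */
void ring_push(struct str_ring *r, const char *s)
{
    assert(r != NULL && s != NULL);
    strncpy(r->slot[r->head], s, SLOT_LEN - 1);
    r->slot[r->head][SLOT_LEN - 1] = '\0';
    r->head = (r->head + 1) % RING_SLOTS;
    if (r->count < RING_SLOTS)
        r->count++;
    else
        r->dropped++;
}

/* Copy the oldest string into out. Returns 1 on success, 0 if empty. */
int ring_pop(struct str_ring *r, char out[SLOT_LEN])
{
    size_t tail;

    if (r->count == 0)
        return 0;
    tail = (r->head + RING_SLOTS - r->count) % RING_SLOTS;
    memcpy(out, r->slot[tail], SLOT_LEN);
    r->count--;
    return 1;
}
```

The periodic filter would then call ring_pop in a loop until it returns 0, and the dropped counter tells it how much output was lost since the last pass.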
Every X seconds I need to process the list and perform some operations on the strings before printing them.
In that case, I would not even do a kernel thread: I would do the processing in print() if not too costly.
Otherwise, I would create a new system call:
sys_get_strings() or something, that would dump the whole linked list into userspace (and remove entries from the list when copied).
This way the whole behavior is controlled by userspace. You could create a daemon that would call the syscall every X seconds. You could also do all the costly processing in userspace.
You could also create a new device, say /dev/print-on-screen:
dev_open would allocate the memory, and print() would no longer be a no-op, but feed the data in the device pre-allocated memory (in case print() would be used in atomic context and all).
dev_release would throw everything out
dev_read would get you the strings
dev_write could do something on your print-on-screen system

Can address space be recycled for multiple calls to MapViewOfFileEx without chance of failure?

Consider a complex, memory-hungry, multi-threaded application running within a 32-bit address space on Windows XP.
Certain operations require n large buffers of fixed size, where only one buffer needs to be accessed at a time.
The application uses a pattern where some address space the size of one buffer is reserved early and is used to contain the currently needed buffer.
This follows the sequence:
(initial run) VirtualAlloc -> VirtualFree -> MapViewOfFileEx
(buffer changes) UnmapViewOfFile -> MapViewOfFileEx
Here the pointer to the buffer location is provided by the call to VirtualAlloc and then that same location is used on each call to MapViewOfFileEx.
The problem is that Windows does not (as far as I know) provide any handshake-type operation for passing the memory space between the different users.
Therefore there is a small opportunity (at each -> in my above sequence) where the memory is not locked and another thread can jump in and perform an allocation within the buffer.
The next call to MapViewOfFileEx is broken and the system can no longer guarantee that there will be a big enough space in the address space for a buffer.
Obviously refactoring to use smaller buffers reduces the rate of failures to reallocate space.
Some use of HeapLock has had some success but this still has issues - something still manages to steal some memory from within the address space.
(We tried calling GetProcessHeaps, then using HeapLock to lock all of the heaps.)
What I'd like to know is: is there any way to lock a specific block of address space that is compatible with MapViewOfFileEx?
Edit: I should add that ultimately this code lives in a library that gets called by an application outside of my control
You could brute force it; suspend every thread in the process that isn't the one performing the mapping, Unmap/Remap, unsuspend the suspended threads. It ain't elegant, but it's the only way I can think of off-hand to provide the kind of mutual exclusion you need.
Have you looked at creating your own private heap via HeapCreate? You could set the heap to your desired buffer size. The only remaining problem is then how to get MapViewOfFile to use your private heap instead of the default heap.
I'd assume that MapViewOfFile internally calls GetProcessHeap to get the default heap and then it requests a contiguous block of memory. You can surround the call to MapViewOfFile with a detour, i.e., you rewire the GetProcessHeap call by overwriting the method in memory effectively inserting a jump to your own code which can return your private heap.
Microsoft has published the Detours library, though I'm not directly familiar with it. I know that detouring is surprisingly common. Security software, virus scanners etc. all use such frameworks. It's not pretty, but may work:
HANDLE g_hndPrivateHeap;
HANDLE WINAPI GetProcessHeapImpl() {
    return g_hndPrivateHeap;
}
struct SDetourGetProcessHeap { // object for exception safety
    SDetourGetProcessHeap() {
        // put detour in place
    }
    ~SDetourGetProcessHeap() {
        // remove detour again
    }
};
void MapFile() {
    g_hndPrivateHeap = HeapCreate( ... );
    {
        SDetourGetProcessHeap d;
        MapViewOfFile(...);
    }
}
These may also help:
How to replace WinAPI functions calls in the MS VC++ project with my own implementation (name and parameters set are the same)?
How can I hook Windows functions in C/C++?
http://research.microsoft.com/pubs/68568/huntusenixnt99.pdf
Imagine if I came to you with a piece of code like this:
void *foo;
foo = malloc(n);
if (foo)
    free(foo);
foo = malloc(n);
Then I came to you and said, help! foo does not have the same address on the second allocation!
I'd be crazy, right?
It seems to me like you've already demonstrated clear knowledge of why this doesn't work. There's a reason that the documentation for any API that takes an explicit address to map into lets you know that the address is just a suggestion, and it can't be guaranteed. This also goes for mmap() on POSIX.
I would suggest you write the program in such a way that a change in address doesn't matter. That is, don't store too many pointers to quantities inside the buffer, or if you do, patch them up after reallocation. Similar to the way you'd treat a buffer that you were going to pass into realloc().
Even the documentation for MapViewOfFileEx() explicitly suggests this:
While it is possible to specify an address that is safe now (not used by the operating system), there is no guarantee that the address will remain safe over time. Therefore, it is better to let the operating system choose the address. In this case, you would not store pointers in the memory mapped file, you would store offsets from the base of the file mapping so that the mapping can be used at any address.
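The offsets-instead-of-pointers advice can be sketched in a few lines of C. The record layout (struct node), the helper names, and the use of two local arrays to stand in for "the same view mapped at two different addresses" are all assumptions made for illustration; no Windows API is called here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record stored inside the mapped buffer. Links are byte
 * offsets from the buffer base, so they stay valid at any base address. */
struct node {
    uint32_t value;
    uint32_t next_off;   /* offset of the next node, 0 means "no next" */
};

/* Turn an offset back into a pointer for the current base address. */
static inline struct node *node_at(void *base, uint32_t off)
{
    return off ? (struct node *)((char *)base + off) : NULL;
}

/* Walk the chain starting at first_off and add up the values. */
uint32_t sum_chain(void *base, uint32_t first_off)
{
    uint32_t total = 0;
    struct node *n;

    assert(base != NULL);
    for (n = node_at(base, first_off); n != NULL; n = node_at(base, n->next_off))
        total += n->value;
    return total;
}

/* Hypothetical demo: build a two-node chain in one buffer, then copy the
 * bytes elsewhere to simulate the view reappearing at a new address. */
int offsets_demo(void)
{
    uint32_t a[16] = {0}, b[16];
    struct node *n1 = node_at(a, 8), *n2 = node_at(a, 16);

    n1->value = 40; n1->next_off = 16;
    n2->value = 2;  n2->next_off = 0;

    memcpy(b, a, sizeof a);   /* "remap" at a different address */
    memset(a, 0, sizeof a);   /* the old location is gone */

    return sum_chain(b, 8) == 42 ? 0 : -1;
}
```

With this layout the only thing that changes after a remap is the base pointer handed to node_at, which is what lets the operating system pick the address.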
Update from your comments
In that case, I suppose you could:
Not map into contiguous blocks. Perhaps you could map in chunks and write some intermediate function to decide which to read from/write to?
Try porting to 64 bit.
As the earlier post suggests, you can suspend every thread in the process while you change the memory mappings. You can use SuspendThread()/ResumeThread() for that. This has the disadvantage that your code has to know about all the other threads and hold thread handles for them.
An alternative is to use the Windows debug API to suspend all threads. If a process has a debugger attached, then every time the process faults, Windows will suspend all of the process's threads until the debugger handles the fault and resumes the process.
Also see this question which is very similar, but phrased differently:
Replacing memory mappings atomically on Windows
