What is a guarded region and how does it differ from a critical region? - windows

There is a "guarded region" concept in Windows that is similar to a critical region. Does anyone know how it differs from a critical region?

Recall that a process on any modern OS is made of several components. The ones we're interested in are:
code: the body of the program being executed,
memory: code, as well as execution related data (heap and stack) are stored there for the duration of a process,
thread: an execution context, i.e. CPU state and memory bits (the stack, for instance) related to that context,
signal slots: a structure holding signals that threads send to each other; each thread has one such structure (i.e., it is one of the memory bits of a thread)
Within a process, there may be several instances of these objects; usually code and memory are accessible to all the threads of the process. At any time there may be as many active threads (i.e., executing concurrently) as there are available cores on your CPUs. These threads, if they belong to the same process, might interfere with each other when accessing the same memory or code sections. To prevent that, Windows implements so-called critical sections, which are basically protected code blocks (very similar to synchronized code blocks in Java).
However, threads may also be diverted from their current execution path when a signal is triggered or posted to them. On Windows, APCs are one form of that signaling mechanism. Guarded sections make sure that a thread completes a given code block before it can handle these signals (APCs in this case).
So, while a critical section prevents any thread other than the active one from executing the protected code section concurrently, a guarded section ensures that the current thread does not execute any code other than the guarded code once it has started it.
As a simple analogy, imagine a code section as a flat in a building (the process code). A person (thread) who enters critically protected code locks the flat's main door, preventing anyone else from entering while she is still inside. If the code is guarded, the OS locks the person's cellphone, preventing her from answering calls until she actually leaves the flat.
A typical scenario for critical sections is when a specific resource needs to be accessed exclusively (a socket, a file handle, an in-memory data structure). For APCs, similarly, guarded sections prevent a thread from interfering with itself by accessing such a resource in an APC when it was already using it in its current execution path.
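To make the "locked cellphone" idea concrete, here is a minimal toy model in Python. It is not the Windows API: ToyThread, queue_apc and guarded_region are hypothetical names invented for this sketch, and the "APC" is just a callback. It only shows the ordering described above, where a callback posted during a guarded block is held back until the block has finished.

```python
from collections import deque
from contextlib import contextmanager


class ToyThread:
    """Toy model of one thread with a queue of pending callbacks ("APCs")."""

    def __init__(self):
        self.pending = deque()   # callbacks waiting to be delivered
        self.guard_depth = 0     # > 0 while inside a guarded block
        self.log = []            # records the order in which code ran

    def queue_apc(self, callback):
        """Deliver the callback now, or defer it if the thread is guarded."""
        self.pending.append(callback)
        self._deliver_pending()

    def _deliver_pending(self):
        # Callbacks only run while no guarded block is active.
        while self.guard_depth == 0 and self.pending:
            self.pending.popleft()(self)

    @contextmanager
    def guarded_region(self):
        """Hold back callbacks until the outermost guarded block is left."""
        self.guard_depth += 1
        try:
            yield
        finally:
            self.guard_depth -= 1
            self._deliver_pending()


thread = ToyThread()
with thread.guarded_region():
    thread.log.append("guarded: start")
    thread.queue_apc(lambda t: t.log.append("apc"))  # deferred, not run yet
    thread.log.append("guarded: end")
# The deferred callback runs when the guarded block is left.
print(thread.log)
```

A critical section, by contrast, would be modelled with a lock shared between several threads; the guard here involves only the one thread and the callbacks posted to it.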


Storing Per-Process Data in Kernel Module / Passing Data Between sys_enter and sys_exit Probe

Familiarity with how Linux Kernel Tracepoints work is not necessarily required to help with this question, it is just what is motivating this problem. In essence, I am looking for a way to store per-process data for a kernel module, without modifying the Linux source (e.g. struct task_struct), and ideally without using locks. Here is my specific question:
I have a kernel module that hooks into the sys_enter (defined here for x86_64, aarch64) and sys_exit (x86_64, aarch64) tracepoints. For each system call issued, I need to pass some data between the enter probe and the exit probe.
Some things I have considered: I could ...
...use one global variable -- but that will be shared between concurrently executing system calls on different CPUs, creating a race.
...use one global map from PID (of the process issuing the system call) to my data, together with locks -- but that will unnecessarily require synchronization between all CPUs on each system call. I would like to avoid this, since the data is "local" to each issued system call, so I feel like there should be a way to keep it local and not add costly synchronization.
...use a per-CPU global variable -- but (it is my understanding that) a process may move to another CPU during the system call execution, making this approach incorrect.
...kmalloc some memory for my custom data upon each system call entry, then pass the address of that memory by clobbering one of the registers in struct pt_regs (both the entry and exit probes receive a pointer to said struct) -- but then I will have a memory leak for system calls that do not trigger the exit probe (such as sys_exit, which never returns).
I am open to any suggestions how these ideas could be refined to address the problems I listed, or any completely different ideas that I am not thinking of.
I'd use an RCU enabled hashtable, for safety.
The first option isn't actually doable, as you stated.
The third one requires you to track which process is using which CPU, which seems unnecessary.
The leaking problem of the fourth option can probably be solved somehow, but allocating memory on each system call can introduce a serious delay.
Of course, accessing the hashtable will also slow down the system, but it won't trigger a memory allocation for each system call, so I assume it'll be less harmful.
Also, I may be wrong here, but if you assume that only process creation/destruction introduces changes to the table itself (not to the data within each entry, but to the location and hash value of each row), then maybe you won't even have to synchronize on each system call, but only on the ones that cause process creation/destruction.
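The bookkeeping behind that last idea can be sketched in user-space Python. This is a toy model of the logic only, not kernel code: the class PerTaskSlots and its method names are hypothetical, a plain dict stands in for the RCU hashtable, and the task-creation and task-exit hooks are assumed to exist in whatever form the module can register them.

```python
class PerTaskSlots:
    """Toy table mapping a task ID to the data of its in-flight system call."""

    def __init__(self):
        self._slots = {}

    def on_task_create(self, tid):
        # The table layout changes only here ...
        self._slots[tid] = None

    def on_task_exit(self, tid):
        # ... and here. Dropping the slot also discards data left behind
        # by a system call that never reached the exit probe.
        self._slots.pop(tid, None)

    def on_sys_enter(self, tid, data):
        # Only the content of an existing slot is written. Tasks without
        # a slot (e.g. created before the hooks were installed) are skipped.
        if tid in self._slots:
            self._slots[tid] = data

    def on_sys_exit(self, tid):
        # Hand the stored data to the exit probe and clear the slot.
        data = self._slots.get(tid)
        if tid in self._slots:
            self._slots[tid] = None
        return data

    def tracked_tasks(self):
        return len(self._slots)


slots = PerTaskSlots()
slots.on_task_create(101)

slots.on_sys_enter(101, {"nr": 1})
passed_to_exit = slots.on_sys_exit(101)   # data stored by the enter probe

slots.on_sys_enter(101, {"nr": 60})       # a call that never returns
slots.on_task_exit(101)                   # slot and leftover data are dropped
```

In this model the per-call path touches only the content of the calling task's own slot, which is why the answer suggests that synchronization might be limited to task creation and destruction.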

How to identify a process in Windows? Kernel and User mode

In Windows, what is the formal way of identifying a process uniquely? I am not talking about the PID, which is allocated dynamically, but a unique ID or a name which is permanent to that process. I know that every program/process has a security descriptor, but it seems to hold SIDs for the logged-in user and group (not the process). We cannot use the path and name of the executable the process starts from, as that can change.
My aim is to identify a process in the kernel mode and allow it to perform certain operation. What is the easiest and best way of doing this?
Your question is too vague to answer properly. For example how could the path possibly change (without poking around in kernel memory) after creation of a process? And yes, I am aware that one could hook into the memory-mapping process during process creation to replace the image originally destined to be loaded with another. Point is that a process is merely one instance of running a given executable. And it's not clear what exact tampering attempts you want to counter here.
But from kernel mode you do have the ability to simply use the pointer to the EPROCESS structure. No need to use the PID, although that will be unique while the process is still alive.
So assuming your process uses an IRP to communicate to the driver (whether it be WriteFile, ReadFile, DeviceIoControl or something more exotic), in order to register itself, you can use IoGetCurrentProcess to get the PEPROCESS value which will be unique to the process.
While the structure itself is not officially documented, hints can be gleaned from the "Windows Internals" book (in its various incarnations), the dt (Display Type) command in WinDbg (and friends) as well as from third-party resources on the internet (e.g. here, specific to Vista).
The process objects are kept in several linked lists. So if you know the (officially undocumented!!!) layout for a particular OS version, you may traverse the lists to get from one to the next process object (i.e. EPROCESS structure).
Cautionary notes
Make sure to reference the process object by using the respective object manager routines. Otherwise you cannot be certain it's safe either to reach into these structures (which is unsafe anyway, since you cannot rely on their layout across OS versions) or to pass the pointer to functions that expect a PEPROCESS.
As a side-note: Harry Johnston is of course right to assert that a privileged user can insert arbitrary (well almost arbitrary) code into the TCB in order to thwart your protective measures. In the end it is going to be an arms race.
Also keep in mind that, similar to PIDs, the value of the PEPROCESS may theoretically be recycled. But in both cases you can simply counter this by invalidating whatever internal state you keep in your driver that allows the process to do its magic, whenever the process goes down. Using something like PsSetCreateProcessNotifyRoutine would seem to be a good method here. The callback receives the process ID (typed as HANDLE, but not an object handle); to translate it to a PEPROCESS value, use PsLookupProcessByProcessId, and release the reference with ObDereferenceObject when you are done.
An alternative way to counter recycling of the PID/PEPROCESS is to keep a reference to the process object and thus keep it in a kind of undead state (similar to not closing a handle in user mode), even though the main thread may have finished.

Difference between pthread and fork on gnu/Linux

What is the basic difference between a pthread and fork on Linux in terms of
implementation, and does the scheduling differ?
I ran strace on two similar programs, one using pthreads and the other using fork.
Both end up making a clone() syscall with different arguments, so I am guessing
the two are essentially the same on a Linux system, with pthreads being easier
to handle in code.
Can someone give a deep explanation?
EDIT : see also a related question
In C there are some differences, however:
fork():
Purpose is to create a new process, which becomes the child process of the caller
Both processes will execute the next instruction following the fork() system call
Two identical copies of the address space, code, and stack are created, one for the parent and one for the child.
Thinking of the fork as if it were a person: forking causes a clone of your program (process), which runs the code it copied.
pthread_create():
Purpose is to create a new thread in the program, which belongs to the same process as the caller
Threads within the same process can communicate using shared memory. (Be careful!)
The second thread will share data, open files, signal handlers and signal dispositions, current working directory, user and group IDs. The new thread will get its own stack, thread ID, and registers though.
Continuing the analogy: your program (process) grows a second arm when it creates a new thread, connected to the same brain.
On Linux, the system call clone clones a task, with a configurable level of sharing.
fork() calls clone(least sharing) and pthread_create() calls clone(most sharing).
Forking costs a tiny bit more than calling pthread_create because of copying tables and creating COW mappings for memory.
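The difference in sharing can be observed from user space. The following is a minimal sketch in Python rather than C, assuming a Unix-like system where os.fork is available; the helper names run_in_fork and run_in_thread are made up for this example. A forked child increments its own copy of the data, while a thread increments the parent's.

```python
import os
import threading


def run_in_fork(data):
    """Increment data["value"] in a forked child; return what the child saw."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child process: works on its own copy-on-write copy of the memory.
        os.close(read_fd)
        data["value"] += 1
        os.write(write_fd, str(data["value"]).encode())
        os._exit(0)
    # Parent process: read the child's result from the pipe, then reap it.
    os.close(write_fd)
    child_value = int(os.read(read_fd, 32).decode())
    os.close(read_fd)
    os.waitpid(pid, 0)
    return child_value


def run_in_thread(data):
    """Increment data["value"] in a second thread of the same process."""
    def worker():
        data["value"] += 1

    t = threading.Thread(target=worker)
    t.start()
    t.join()


shared = {"value": 0}
child_saw = run_in_fork(shared)
after_fork = shared["value"]     # the child's change is not visible here
run_in_thread(shared)
after_thread = shared["value"]   # the thread's change is visible
print(child_saw, after_fork, after_thread)
```

The pipe is needed precisely because the forked child cannot hand its result back through ordinary memory.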
You should look at the clone manpage.
In particular, it lists all the possible clone modes and how they affect the process/thread, virtual memory space etc...
You say "threads easier to handle in code": that's very debatable. Writing bug-free, deadlock-free multi-thread code can be quite a challenge. Sometimes having two separate processes makes things much simpler.

Fork and Threads in ruby

I am running a program on a machine with two processors. When I do a fork, is the child created as a native thread, or is it like a green thread/coroutine? Does the child run concurrently with the parent, or just in parallel?
The working of fork() in general is to generate a new, independent process, duplicate the page table, and mark all pages owned by the process that called fork() as copy-on-write in that process. Then, fork() returns in both processes (the return value lets the respective process know which one it is).
On a system with more than one processor (or processor core) you can normally expect those two processes to use both processors (assuming you have an SMP-enabled system and CPU affinity doesn't prevent it), but you do not strictly have a guarantee.
Threads are generated in the same way on some systems (e.g. Linux) with the exception that the pages owned by the first process are not marked copy-on-write, but are instead owned by both processes afterwards (they use the same page table). On other systems, threads may be implemented differently, e.g. in user land, in which case you will not benefit from multiple cpus with threads.
As a side note, the disadvantage of using fork() and running 2 processes instead of threads is that the processes do not share a common address space, which means that the TLB must be flushed on a context switch.
This depends on the operating system, programming language, compiler and runtime library, so I can only give you an example: if you use _beginthread under Windows (no matter whether you use MinGW or the MSCRT directly), you use both your processors. As for the semantics of "concurrent" vs. "parallel": the two are not mutually exclusive.

Can I be sure that the code I write is always executed in the same thread?

I normally work on single threaded applications and have generally never really bothered with dealing with threads. My understanding of how things work - which certainly, may be wrong - is that as long as we're always dealing with single threaded code (i.e. no forks or anything like that) it will always be executed in the same thread.
Is this assumption correct? I have a fuzzy idea that UI libraries/frameworks may spawn off threads of their own to handle GUI stuff (which accounts for the fact that the Windows task manager tells me that my 'single threaded' application is actually running on 10 threads) but I'm guessing that this shouldn't affect me?
How does this apply to COM? For instance, if I were to create an instance of a COM component in my code; and that COM component writes some information to a thread-based location (using System.Threading.Thread.GetData for instance) will my application be able to get hold of that information?
So in summary:
In single threaded code, can I be sure that whatever I store in a thread-based location can be retrievable from anywhere else in the code?
If that single threaded code were to create an instance of a COM component which stores some information in a thread-based location, can that be similarly retrievable from anywhere else?
UI usually has the opposite constraint (sadly): it's single threaded and everything must happen on that thread.
The easiest way to check if you are always in the same thread (for, say, a function) is to have an integer variable set at -1, and have a check function like (say you are in C#):
// m_ThreadId is an int field initialized to -1.
void AssertSingleThread()
{
    if (m_ThreadId < 0) m_ThreadId = Thread.CurrentThread.ManagedThreadId;
    Debug.Assert(m_ThreadId == Thread.CurrentThread.ManagedThreadId);
}
That said:
I don't really understand question #1. Why store something in a thread-based location if your purpose is to have global scope?
About the second question: most COM code runs on a single thread and, most often, on the thread where your UI message processing lives. This is because most COM code is designed to be compatible with VB6, which is single-threaded.
The reason your program has about 10 threads is that both Windows (if you use some of its features, like completion ports or some kinds of timers) and the CLR (for example for the GC or, again, some types of timers) may create threads in your process space (technically, any program with enough privileges can too).
Consider a model with a single dataStore class living in your main thread, which all threads read and write their instance variables through. This avoids a lot of the problems that can arise when threads access each other's data all over the shop.
It's a simple idea, until you reach the fun part of threading: concurrency and synchronization. Simply put, if two threads want to read and write the same variable inside dataStore at the same time, you have a problem.
Java handles this by allowing you to declare a method or code block synchronized, allowing only one thread access at a time.
I believe .NET offers the same through the C# lock statement (built on System.Threading.Monitor), but I know no more than this.
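The single dataStore idea with synchronized access can be sketched as follows. This is a minimal illustration in Python rather than Java or C#; the DataStore class and its add and get methods are names assumed for this example, and threading.Lock plays the role that synchronized plays in Java.

```python
import threading


class DataStore:
    """One shared store; every read and write goes through the same lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._values = {}

    def add(self, key, amount):
        # The read-modify-write happens under the lock, so two threads
        # cannot interleave inside it and lose an update.
        with self._lock:
            self._values[key] = self._values.get(key, 0) + amount

    def get(self, key):
        with self._lock:
            return self._values.get(key, 0)


def worker(store, count):
    for _ in range(count):
        store.add("hits", 1)


store = DataStore()
threads = [threading.Thread(target=worker, args=(store, 1000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

total = store.get("hits")
print(total)
```

Keeping the lock inside the store means callers cannot forget to take it, which is the main appeal of routing all shared state through one class.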
