Do system call tables in Linux change over time? - linux-kernel

Specifically, for x86 64-bit architecture, will the syscall table ever change? For instance, could there be additions of new syscalls or relocations of certain syscalls? I am fairly new to the concept and was wondering about this.

New syscalls will be added, but existing ones won't be renumbered or removed, as doing so would break userspace programs, which Linus is famously dead-set against ever doing.

Related

How can an Operating System be coded in high level languages?

I just started diving into the world of operating systems and I've learned that processes have a certain memory space they can address which is handled by the operating system. I don't quite understand how can an Operating System written in high level languages like c and c++ obtain this kind of memory management functionality.
You have caught the bug and there is no cure for it :-)
The language you use to write your OS has very little to do with the way your OS operates. Yes, most people use C/C++, but there are others. As for the language, you do need a language that will let you directly communicate with the hardware you plan to manage, assembly being the main choice for this part. However, this is less than 5% of the whole project.
The code that you write must not rely upon any existing operating system. i.e.: you must code all of the function yourself, or call existing libraries. However, these existing libraries must be written so that they don't rely upon anything else.
Once you have a base, you can write your OS in any language you choose, with the minor part in assembly, something a high level language won't allow. In fact, in 64-bit code, some compilers no longer allow inline assembly, so this makes that 5% I mentioned above more like 15%.
Find out what you would like to do and then find out if that can be done in the language of choice. For example, the main operating system components can be written in C, while the actual processor management (interrupts, etc) must be done in assembly. Your boot code must be in assembly as well, at least most of it.
As mentioned in a different post, I have some early example code that you might want to look at. The boot is done in assembly, while the loader code, both Legacy BIOS and EFI, are mostly C code.
To clarify fysnet's answer, the reason you have to use at least a bit of assembly is that you can only explicitly access addressable memory in C/C++ (through pointers), while hardware registers (such as the program counter or stack pointer) often don't have memory addresses. Not only that, but some registers have to be manipulated with CPU architecture-dependent special instructions, and that, too, is only possible in machine language.
I don't quite understand how can an Operating System written in high level languages like c and c++ obtain this kind of memory management functionality.
As described above, depending on the architecture, this could be achieved by having special instructions to manage the MMU, TLB etc. INVLPG is one example of such an instruction in the x86 architecture. Note that having a special instruction requiring kernel privileges is probably the simplest way to implement such a feature in hardware in a secure manner, because then it is simply sufficient to check if the CPU is in kernel mode in order to determine whether the instruction can be executed or not.
Compilers turn high-level languages into asm / machine code for you, so you don't have to write asm yourself. You pick a compiler that handles memory the way you want your OS to; e.g. using the callstack for automatic storage, and not implicitly calling malloc / free (because those won't exist in your kernel).
To link your compiled C/C++ into a kernel, you typically have to know more about the ABI it targets, and the toolchain especially the linker.
The ISO C standard treats implementation details very much as a black box. But real compilers that people use for low level stuff work in well-known ways (i.e. make the expected/useful implementation choices) that kernel programmers depend on, in terms of compiling code and static data into contiguous blocks that can be linked into a single kernel executable that can be loaded all as one chunk.
As for actually managing the system's memory, you write code yourself to do that, with a bit of inline asm where necessary for special instructions like invlpg as other answers mention.
The entry point (where execution starts) will normally be written in pure asm, to set up a callstack with the stack pointer register pointing to it.
And set up virtual memory and so on so code is executable, data is read/write, and read-only data is readable. All of this before jumping to any compiled C code. The first C you jump to is probably more kernel init code, e.g. initializing data structures for an allocator to manage all the memory that isn't already in use by static code/data.
Creating a stack and mapping code/data into memory is the kind of setup that's normally done by an OS when starting a user-space program. The asm emitted by a compiler will assume that code, static data, and the stack are all there already.

Why would one add a custom system call to Linux kernel?

So I followed an online tutorial on implementing a custom syscall on Arch Linux that simply takes a string and prints it to the kernel log. After completing the tutorial and testing out the syscall, I was left wondering in what situations do kernel devs say "Oh, lets change how this syscall works" or "We need to add another one". I am familiar with the concepts of userspace and kernelspace and why syscalls exist in the first place, but the benefits of new ones are beyond me. I would appreciate any insight on the matter, even a reason why some of the existing syscalls were added or modified and how it affected the overall state of the kernel.
Your instincts are good, nurture them. Only under extreme pressure should a kernels interface be extended. Increasing the surface of the kernel increases the attack potential, needlessly complicates already complicated software, and is rarely a net benefit.
Micro-kernels, or even small-interface kernels (like plan9) have delivered on reduced attack surface, serviceability and reliability. The original architect of UNIX wrote a fantastic paper on why the kernel must be minimal. Everything discussed in the domain of least-privilege reaches out to minimal kernels.

Why isn't the distinction between CPUs more ubiquitous?

I know that every program one writes has to eventually boil down to machine code - that's what compilers produce, that's what executable files consist of, and that's the only language that processors understand. I also know that different processors may have different instruction sets (I know 65c816 assembly, and I imagine it's vastly different from today's computers).
Here's what I'm not getting, though: If there exist different instruction sets, then why do we not seem to have to care about that every time we use software?
If a program was compiled for one particular CPU, it might not run on another - and yet, I never see notices like "Intel users download this version, AMD users download this one". I never have to even be aware of what CPU I'm on, every executable just seems to... work. The same goes for compilers, apparently - there isn't a separate version of, say, GCC, for every processor there is, right?
I'm aware that the differences in instruction sets are much more subtle than they used to be, but even then there should at least be a bit of a distinction. I'm wondering why there doesn't seem to be any.
What is it that I'm not understanding?
There actually are sometimes different versions for Intel/AMD. Even for different versions of Intel and/or AMD. That's not common (especially in the kind of software people usually use) because it's not user friendly (most people don't even know what a CPU is or does, let alone what kind they have exactly), so what often is that either the multiple versions are all in the same executable and selected at runtime, or a "least common denominator" sub-set of instructions is used (when performance is not really a concern). There are significant differences between AMD and Intel though, the most significant one is in which instruction sets they support. AMD always lags behind Intel in implementing Intels new instructions (new sets come out regularly), and Intel usually does not implement AMDs extensions (AMD64 is the big exception (99% accepted by Intel, small changes made), also a couple of instructions here and there were borrowed, but XOP will never happen even though it's awesome, 3DNow! was never adopted by Intel). As an example of software that does not package the code for different "extended instruction sets" in the same executable, see y-cruncher.
To get back to the beginning though, for some (I can't name any off the top of my head, but I've seen it before) high performance software you may see different versions each specifically tailored to get maximum performance on one specific microarchitecture. For example, P4 (netburst) and Core2 are two very different beasts (that's mostly P4's fault for being crazy), even though Core2 is backwards compatible and could run the same code unmodified, that is not necessarily a good idea from a performance perspective.
There is no Intel/AMD versions, because they use the same IS family: x86.
However, there are applications where you have to look out for different versions when you download them. There are still instruction sets that are quite different and might make a program act differently. For example if you have a PowerPC architecture and code a network based application on it, you can forget the little to big endian conversion, but if you compile the same code on x86, which is little endian, the application most likely will produce garbage on the network side.
There is also the difference in how many instructions there are, e.g. RISC vs CISC.
In the end there are a lot of differences to look for and in most programming languages you don't have to worry too much about them though, as the compiler/interpreter will handle most things for you. If you work lower lever then you have to know what you're doing on each architecture.
Also if you compile for ARM, you won't be able to run the program on any other machine, like your PC with x86. It will not work at all.
Because the op codes may/do differ, take the mov instruction, on x86 the op code is 0x88, on ARM it might be 0x13 etc.
The distinction is in fact quite dramatic. Except in the case of Intel vs. AMD. AMD makes their processors compatible with Intel machine code. On purpose of course.
Today there is a move to JIT compiling (Java, .NET, etc.). In this case, the executable file doesn't contain machine code. It contains a simple intermediate language that is compiled just before it is executed, in the machine code of the running processor. This allows the processor architecture to be completely opaque.
AMD is an intel clone. or vice versa depending on your view of the situation. Either way, there is so much in common that programs can be compiled as to run on either (within reason, cant go back 10 years for example, cant make a 32bit processor understand 64 bit specific instructions). Next step is the motherboards have to do similar things and they do, there maybe some intel or amd specific chip support items but then you get into generic peripherals that can be found on either platform or are widespread enough on one or the other platform that the operating systems and/or applications support them.

Why code segment is common for different instances of same program

I wanted to know why code segment is common for different instances of same program.
For example: consider program P1.exe running, if another copy of P1.exe is running, code segment will be common for both running instances. Why is it so?
If the code segment in question is loaded from a DLL, it might be the operating system being clever and re-using the already loaded library. This is one of the core points of using dynamically loaded library code, it allows the code to be shared across multiple processes.
Not sure if Windows is clever enough to do this with the code sections of regular EXE files, but it would make sense if possible.
It could also be virtual memory fooling you; two processes can look like they have the same thing on the same address, but that address is virtual, so they really are just showing mappings of physical memory.
Code is typically read-only, so it would be wasteful to make multiple copies of it.
Also, Windows (at least, I can't speak for other OS's at this level) uses the paging infrastructure to page code in and out direct from the executable file, as if it were a paging file. Since you are dealing with the same executable, it is paging from the same location to the same location.
Self-modifying code is effectively no longer supported by modern operating systems. Generating new code is possible (by setting the correct flags when allocating memory) but this is separate from the original code segment.
The code segment is (supposed to be) static (does not change) so there is no reason not to use it for several instances.
Just to start at a basic level, Segmentation is just a way to implement memory isolation and partitioning. Paging is another way to achieve this. For the most part, anything you can achieve via segmentation, you can be achieve via paging. As such, most modern operating systems on the x86 forego using segmentation at all, instead relying completely on paging facilities.
Because of this, all processes will usually be running under the trivial segment of (Base = 0, Limit = 4GB, Privilege level = 3), which means the code/data segment registers play no real part in determining the physical address, and are just used to set the privilege level of the process. All processes will usually be run at the same privilege, so they should all have the same value in the segment register.
Edit
Maybe I misinterpreted the question. I thought the question author was asking why both processes have the same value in the code segment register.

Seeking articles on shared memory locking issues

I'm reviewing some code and feel suspicious of the technique being used.
In a linux environment, there are two processes that attach multiple
shared memory segments. The first process periodically loads a new set
of files to be shared, and writes the shared memory id (shmid) into
a location in the "master" shared memory segment. The second process
continually reads this "master" location and uses the shmid to attach
the other shared segments.
On a multi-cpu host, it seems to me it might be implementation dependent
as to what happens if one process tries to read the memory while it's
being written by the other. But perhaps hardware-level bus locking prevents
mangled bits on the wire? It wouldn't matter if the reading process got
a very-soon-to-be-changed value, it would only matter if the read was corrupted
to something that was neither the old value nor the new value. This is an edge case: only 32 bits are being written and read.
Googling for shmat stuff hasn't led me to anything that's definitive in this
area.
I suspect strongly it's not safe or sane, and what I'd really
like is some pointers to articles that describe the problems in detail.
It is legal -- as in the OS won't stop you from doing it.
But is it smart? No, you should have some type of synchronization.
There wouldn't be "mangled bits on the wire". They will come out either as ones or zeros. But there's nothing to say that all your bits will be written out before another process tries to read them. And there are NO guarantees on how fast they'll be written vs how fast they'll be read.
You should always assume there is absolutely NO relationship between the actions of 2 processes (or threads for that matter).
Hardware level bus locking does not happen unless you get it right. It can be harder then expected to make your compiler / library / os / cpu get it right. Synchronization primitives are written to makes sure it happens right.
Locking will make it safe, and it's not that hard to do. So just do it.
#unknown - The question has changed somewhat since my answer was posted. However, the behavior you describe is defiantly platform (hardware, os, library and compiler) dependent.
Without giving the compiler specific instructions, you are actually not guaranteed to have 32 bits written out in one shot. Imagine a situation where the 32 bit word is not aligned on a word boundary. This unaligned access is acceptable on x86, and in the case of the x68, the access is turned into a series of aligned accesses by the cpu.
An interrupt can occurs between those operations. If a context switch happens in the middle, some of the bits are written, some aren't. Bang, You're Dead.
Also, lets think about 16 bit cpus or 64 bit cpus. Both of which are still popular and don't necessarily work the way you think.
So, actually you can have a situation where "some other cpu-core picks up a word sized value 1/2 written to". You write you code as if this type of thing is expected to happen if you are not using synchronization.
Now, there are ways to preform your writes to make sure that you get a whole word written out. Those methods fall under the category of synchronization, and creating synchronization primitives is the type of thing that's best left to the library, compiler, os, and hardware designers. Especially if you are interested in portability (which you should be, even if you never port your code)
The problem's actually worse than some of the people have discussed. Zifre is right that on current x86 CPUs memory writes are atomic, but that is rapidly ceasing to be the case - memory writes are only atomic for a single core - other cores may not see the writes in the same order.
In other words if you do
a = 1;
b = 2;
on CPU 2 you might see location b modified before location 'a' is. Also if you're writing a value that's larger than the native word size (32 bits on an x32 processor) the writes are not atomic - so the high 32 bits of a 64 bit write will hit the bus at a different time from the low 32 bits of the write. This can complicate things immensely.
Use a memory barrier and you'll be ok.
You need locking somewhere. If not at the code level, then at the hardware memory cache and bus.
You are probably OK on a post-PentiumPro Intel CPU. From what I just read, Intel made their later CPUs essentially ignore the LOCK prefix on machine code. Instead the cache coherency protocols make sure that the data is consistent between all CPUs. So if the code writes data that doesn't cross a cache-line boundary, it will work. The order of memory writes that cross cache-lines isn't guaranteed, so multi-word writes are risky.
If you are using anything other than x86 or x86_64 then you are not OK. Many non-Intel CPUs (and perhaps Intel Itanium) gain performance by using explicit cache coherency machine commands, and if you do not use them (via custom ASM code, compiler intrinsics, or libraries) then writes to memory via cache are not guaranteed to ever become visible to another CPU or to occur in any particular order.
So just because something works on your Core2 system doesn't mean that your code is correct. If you want to check portability, try your code also on other SMP architectures like PPC (an older MacPro or a Cell blade) or an Itanium or an IBM Power or ARM. The Alpha was a great CPU for revealing bad SMP code, but I doubt you can find one.
Two processes, two threads, two cpus, two cores all require special attention when sharing data through memory.
This IBM article provides an excellent overview of your options.
Anatomy of Linux synchronization methods
Kernel atomics, spinlocks, and mutexes
by M. Tim Jones (mtj#mtjones.com), Consultant Engineer, Emulex
http://www.ibm.com/developerworks/linux/library/l-linux-synchronization.html
I actually believe this should be completely safe (but is depends on the exact implementation). Assuming the "master" segment is basically an array, as long as the shmid can be written atomically (if it's 32 bits then probably okay), and the second process is just reading, you should be okay. Locking is only needed when both processes are writing, or the values being written cannot be written atomically. You will never get a corrupted (half written values). Of course, there may be some strange architectures that can't handle this, but on x86/x64 it should be okay (and probably also ARM, PowerPC, and other common architectures).
Read Memory Ordering in Modern Microprocessors, Part I and Part II
They give the background to why this is theoretically unsafe.
Here's a potential race:
Process A (on CPU core A) writes to a new shared memory region
Process A puts that shared memory ID into a shared 32-bit variable (that is 32-bit aligned - any compiler will try to align like this if you let it).
Process B (on CPU core B) reads the variable. Assuming 32-bit size and 32-bit alignment, it shouldn't get garbage in practise.
Process B tries to read from the shared memory region. Now, there is no guarantee that it'll see the data A wrote, because you missed out the memory barrier. (In practise, there probably happened to be memory barriers on CPU B in the library code that maps the shared memory segment; the problem is that process A didn't use a memory barrier).
Also, it's not clear how you can safely free the shared memory region with this design.
With the latest kernel and libc, you can put a pthreads mutex into a shared memory region. (This does need a recent version with NPTL - I'm using Debian 5.0 "lenny" and it works fine). A simple lock around the shared variable would mean you don't have to worry about arcane memory barrier issues.
I can't believe you're asking this. NO it's not safe necessarily. At the very least, this will depend on whether the compiler produces code that will atomically set the shared memory location when you set the shmid.
Now, I don't know Linux, but I suspect that a shmid is 16 to 64 bits. That means it's at least possible that all platforms would have some instruction that could write this value atomically. But you can't depend on the compiler doing this without being asked somehow.
Details of memory implementation are among the most platform-specific things there are!
BTW, it may not matter in your case, but in general, you have to worry about locking, even on a single CPU system. In general, some device could write to the shared memory.
I agree that it might work - so it might be safe, but not sane.
The main question is if this low-level sharing is really needed - I am not an expert on Linux, but I would consider to use for instance a FIFO queue for the master shared memory segment, so that the OS does the locking work for you. Consumer/producers usually need queues for synchronization anyway.
Legal? I suppose. Depends on your "jurisdiction". Safe and sane? Almost certainly not.
Edit: I'll update this with more information.
You might want to take a look at this Wikipedia page; particularly the section on "Coordinating access to resources". In particular, the Wikipedia discussion essentially describes a confidence failure; non-locked access to shared resources can, even for atomic resources, cause a misreporting / misrepresentation of the confidence that an action was done. Essentially, in the time period between checking to see whether or not it CAN modify the resource, the resource gets externally modified, and therefore, the confidence inherent in the conditional check is busted.
I don't believe anybody here has discussed how much of an impact lock contention can have over the bus, especially on bus bandwith constrained systems.
Here is an article about this issue in some depth, they discuss some alternative schedualing algorythems which reduse the overall demand on exclusive access through the bus. Which increases total throughput in some cases over 60% than a naieve scheduler (when considering the cost of an explicit lock prefix instruction or implicit xchg cmpx..). The paper is not the most recent work and not much in the way of real code (dang academic's) but it worth the read and consideration for this problem.
More recent CPU ABI's provide alternative operations than simple lock whatever.
Jeffr, from FreeBSD (author of many internal kernel components), discusses monitor and mwait, 2 instructions added for SSE3, where in a simple test case identified an improvement of 20%. He later postulates;
So this is now the first stage in the
adaptive algorithm, we spin a while,
then sleep at a high power state, and
then sleep at a low power state
depending on load.
...
In most cases we're still idling in
hlt as well, so there should be no
negative effect on power. In fact, it
wastes a lot of time and energy to
enter and exit the idle states so it
might improve power under load by
reducing the total cpu time required.
I wonder what would be the effect of using pause instead of hlt.
From Intel's TBB;
ALIGN 8
PUBLIC __TBB_machine_pause
__TBB_machine_pause:
L1:
dw 090f3H; pause
add ecx,-1
jne L1
ret
end
Art of Assembly also uses syncronization w/o the use of lock prefix or xchg. I haven't read that book in a while and won't speak directly to it's applicability in a user-land protected mode SMP context, but it's worth a look.
Good luck!
If the shmid has some type other than volatile sig_atomic_t then you can be pretty sure that separate threads will get in trouble even on the very same CPU. If the type is volatile sig_atomic_t then you can't be quite as sure, but you still might get lucky because multithreading can do more interleaving than signals can do.
If the shmid crosses cache lines (partly in one cache line and partly in another) then while the writing cpu is writing you sure find a reading cpu reading part of the new value and part of the old value.
This is exactly why instructions like "compare and swap" were invented.
Sounds like you need a Reader-Writer Lock : http://en.wikipedia.org/wiki/Readers-writer_lock.
The answer is - it's absolutely safe to do reads and writes simultaneously.
It is clear that the shm mechanism
provides bare-bones tools for the
user. All access control must be taken
care of by the programmer. Locking and
synchronization is being kindly
provided by the kernel, this means the
user have less worries about race
conditions. Note that this model
provides only a symmetric way of
sharing data between processes. If a
process wishes to notify another
process that new data has been
inserted to the shared memory, it will
have to use signals, message queues,
pipes, sockets, or other types of IPC.
From Shared Memory in Linux article.
The latest Linux shm implementation just uses copy_to_user and copy_from_user calls, which are synchronised with memory bus internally.

Resources