Linux kernel: which asm headers / symbols / macros are available on all architectures?

I want to use something (header file, struct, function or macro) that is declared / defined under arch/XXX/include/asm (in my case, PAGE_TABLE) in a kernel module.
Is it possible to know if that thing is present on all architectures?
Phrased differently: what exactly is the arch-portable API that the kernel exposes to kernel space under asm/?
I could use find . or grep -r on the kernel tree, but is there a better way to know that, for every new architecture that comes out, that thing must be defined for the architecture to be supported? After all, even if something is provided by all existing architectures, nothing guarantees that this is more than a coincidence and that providing it is actually mandatory.
Taking headers as an example: in recent source snapshots, x86 contains acpi.h and arm does not, while all architectures seem to have page.h. So how can I know that I can use #include <asm/page.h> but not acpi.h? page.h, on the other hand, is expected to have an implementation on all archs, since include/linux/ uses it in several places and include/linux is meant to be portable to all architectures (please confirm this point).

You can check it yourself with the following algorithm:
1. Initialize an array of all the possible ARCH values, e.g. i386.
2. Create a /tmp/include/name_of_arch directory for each arch.
3. For each ARCH, run: make headers_install ARCH=name_of_arch INSTALL_HDR_PATH=/tmp/include/name_of_arch/
4. For each file in each arch folder, compute its sha256sum.
5. Find the intersection of the sha256 signatures common to all architectures.
You can find some of the ARCH values in the checkstack.pl script (e.g. m68k) or in the main Makefile of the kernel.

Related

GCC [for ARM] force no floating point

I would like to create a build of my embedded C code which specifically checks that floating point operations aren't introduced into it by accident. I've tried adding +nofp to my [cortex-m3] processor architecture but GCC for ARM doesn't like that (probably because the cortex-m3 doesn't have a floating point unit). I've tried specifying -mfpu=none but that isn't a permitted option. I've tried leaving -lm off the linker command-line but the linker seems too clever to be fooled by that and is compiling code with double in it and resolving pow() anyway.
This post: https://gcc.gnu.org/legacy-ml/gcc-help/2011-07/msg00093.html from 2011 hints that GCC has no such option, since no-one is interested in it, which surprises me as it seems like a common thing to want, at least from an embedded standpoint, to avoid accidental C-library bloat.
Does anyone know of a way to do this with GCC/newlib without me having to go through and manually hack stuff out of the C library file it chooses?

It is not just a library issue. Your target will use soft-fp, and the compiler will supply floating point code to implement arithmetic operators regardless of the library.
The solution I generally apply is to scan the map file for instances of the compiler supplied floating-point routines. If your code is "fp clean" there will be no such references. The math library and any other code that perform floating-point arithmetic operations will use these operator implementations, so you only need look for these operator calls and can ignore the Newlib math library functions.
The internal soft-fp routines are listed at https://gcc.gnu.org/onlinedocs/gccint/Soft-float-library-routines.html. It is probably feasible to check the map file for fp symbols manually, but you might write yourself a script or tool that scans the map file for these names to check your build. The cross-reference section of the map file lists all modules in which these symbols are used, so you can use that to identify where the floating-point code is used.
The Newlib stdio functions support floating-point by default. If your formatted I/O is limited to printf() you can use iprintf() instead or you can rebuild Newlib with FLOATING_POINT undefined to remove floating point support from all but scanf() (no idea why). You can then use the map file technique again to find "banned" formatted I/O functions (although these are likely to also use the floating point operator functions in any case, so you will already have spotted them indirectly).
Another option is to use an alternative stdio library to override the Newlib versions. There are any number of "tiny printf" implementations available that you could use. If you link such a library as object code, or list its library ahead of Newlib in the link command, it will override the Newlib versions.

Only one --hash-style in embedded Linux. Why?

I am trying to get a software package built and deployed into rootfs with OpenEmbedded-based Arago. Unfortunately the software package includes prebuilt shared libs. As far as I understand, Arago builds the entire Linux distro with --hash-style=gnu, while those shared libs have been built with --hash-style=sysv, I suspect. At least the build stops with "No GNU_HASH in the ELF binary" QA issue.
I understand what hashes are for. But I guess I do not understand how they are being used when the system is running.
Why is it necessary to have one hash style for all ELFs in the system? Why can't the dynamic linker determine the hash style on the fly and just use it?

The dynamic linker can and does figure out the kind of hash table ("sysv" or "gnu") present in the ELF and works accordingly.
Unfortunately, what you see is a case where support for GNU hash sections has NOT been back-ported to the older version of the dynamic linker in use on your system.
A similar situation exists wherein binaries built for RHEL5/FC6 do NOT work on RHEL4/FC5.
Why is .gnu.hash incompatible with .hash (sysv)?
Generating an ELF with the GNU hash section imposes certain restrictions (additional rules) on the construction of the dynamic symbol table.
With GNU hash, the dynamic symbol table is divided into two parts. The first part receives the symbols that can be omitted from the hash table. GNU hash does not impose any specific order for the symbols in this part of the dynamic symbol table.
The second part of the dynamic symbol table receives the symbols that are accessible from the hash table. These symbols are required to be sorted by increasing (hash % nbuckets) value, using the GNU hash function described above. The number of hash buckets (nbuckets) is recorded in the GNU hash section, described below. As a result, symbols which will be found in a single hash chain are adjacent in memory, leading to better cache performance.
Reference: blogs.oracle.com/ali/entry/gnu_hash_elf_sections

I had the same No GNU_HASH in the ELF binary issue with my Yocto Arago build, but it turned out that my application's Makefile wasn't using $(LDFLAGS) which is set by the Yocto build and contains -Wl,--hash-style=gnu among other important things.
I'm mentioning this because this question is the top search result for that error message, and this might help other people.

GCC atomic built-ins: Is there a list showing which are supported on which platform?

Is there a site listing the various platforms and their support for GCC's atomic built-ins, for the various GCC versions?
EDIT:
To be more clear:
GCC adds __sync_... as intrinsics on platforms it contains support for. On all other platforms it keeps those as normal function declarations but does not supply an implementation. This must be done by some framework.
So the question is: For which platforms does GCC supply which intrinsics without need to add a function implementation?

I'm not aware if there's such a list, however http://gcc.gnu.org/projects/cxx0x.html says atomics are supported since GCC 4.4.
GCC libstdc++ implements <atomic> on top of the builtin functions `__sync_fetch_and_add' and friends ( http://gcc.gnu.org/onlinedocs/gcc-4.6.1/gcc/Atomic-Builtins.html ).
These functions are expanded either using machine specific expanders in the machine description of the target (usually in a file named `sync.md') or, lacking such expanders, using a CAS loop. If the presence of a `sync.md' file is any indication of proper atomics support, then you can count in MIPS, i386, ARM, BlackFin, Alpha, PowerPC, IA64 and Sparc.

[Though this is an old question, I thought I should update and complete the answer]
I am not aware of a per-architecture-version and per-gcc-version table, describing supported built-ins.
The __sync built-in functions of gcc have existed since version 4.1 (see, e.g., the gcc 4.1.2 manual). As stated there:
Not all operations are supported by all target processors. If a particular operation cannot be implemented on the target processor, a warning will be generated and a call an external function will be generated. The external function will carry the same name as the builtin, with an additional suffix `_n' where n is the size of the data type.
So, when there is no implementation for a specific architecture, a compilation warning will appear and, I guess, a link-time error, unless you provide the required function with the appropriate name.
Since gcc 4.7 there are also the __atomic built-ins, and the GCC manual now describes the __sync built-ins as legacy.
For example, see how Fedora uses gcc __sync and __atomic here

Difference between API and ABI

I am new to Linux system programming and I came across API and ABI while reading
Linux System Programming.
Definition of API:
An API defines the interfaces by which one piece of software communicates with another at the source level.
Definition of ABI:
Whereas an API defines a source interface, an ABI defines the low-level binary interface between two or more pieces of software on a particular architecture. It defines how an application interacts with itself, how an application interacts with the kernel, and how an application interacts with libraries.
How can a program communicate at a source level? What is a source level? Is it related to source code in any way? Or the source of the library gets included in the main program?
The only difference I know is API is mostly used by programmers and ABI is mostly used by a compiler.

API: Application Program Interface
This is the set of public types/variables/functions that you expose from your application/library.
In C/C++ this is what you expose in the header files that you ship with the application.
ABI: Application Binary Interface
This is how the compiler builds an application.
It defines, among other things:
- How parameters are passed to functions (registers/stack).
- Who cleans parameters from the stack (caller/callee).
- Where the return value is placed for return.
- How exceptions propagate.

The API is what humans use. We write source code. When we write a program and want to use some library function we write code like:
long howManyDecibels = 123L;
int ok = livenMyHills(howManyDecibels);
and we needed to know that there is a method livenMyHills(), which takes a long integer parameter. So as a Programming Interface it's all expressed in source code. The compiler turns this into executable instructions which conform to the implementation of this language on this particular operating system. And in this case result in some low level operations on an Audio unit. So particular bits and bytes are squirted at some hardware. So at runtime there's lots of Binary level action going on which we don't usually see.
At the binary level there must be a precise definition of what bytes are passed at the Binary level, for example the order of bytes in a 4 byte integer, or the layout of a complex data structure - are there padding bytes to align some values. This definition is the ABI.

I mostly come across these terms in the sense of an API-incompatible change, or an ABI-incompatible change.
An API change is essentially where code that would have compiled with the previous version won't work anymore. This can happen because you added an argument to a function, or changed the name of something accessible outside of your local code. Any time you change a header, and it forces you to change something in a .c/.cpp file, you've made an API-change.
An ABI change is where code that has already been compiled against version 1 will no longer work with version 2 of a codebase (usually a library). This is generally trickier to keep track of than API-incompatible change since something as simple as adding a virtual method to a class can be ABI incompatible.
I've found two extremely useful resources for figuring out what ABI compatibility is and how to preserve it:
The list of Do's and Dont's with C++ for the KDE project
Ulrich Drepper's How to Write Shared Libraries.pdf (primary author of glibc)

Linux shared library minimal runnable API vs ABI example
This answer has been extracted from my other answer: What is an application binary interface (ABI)? but I felt that it directly answers this one as well, and that the questions are not duplicates.
In the context of shared libraries, the most important implication of "having a stable ABI" is that you don't need to recompile your programs after the library changes.
As we will see in the example below, it is possible to modify the ABI, breaking programs, even though the API is unchanged.
main.c
#include <assert.h>
#include <stdlib.h>
#include "mylib.h"

int main(void) {
    mylib_mystruct *myobject = mylib_init(1);
    assert(myobject->old_field == 1);
    free(myobject);
    return EXIT_SUCCESS;
}

mylib.c
#include <stdlib.h>
#include "mylib.h"

mylib_mystruct* mylib_init(int old_field) {
    mylib_mystruct *myobject;
    myobject = malloc(sizeof(mylib_mystruct));
    myobject->old_field = old_field;
    return myobject;
}

mylib.h
#ifndef MYLIB_H
#define MYLIB_H

typedef struct {
    int old_field;
} mylib_mystruct;

mylib_mystruct* mylib_init(int old_field);

#endif
Compiles and runs fine with:
cc='gcc -pedantic-errors -std=c89 -Wall -Wextra'
$cc -fPIC -c -o mylib.o mylib.c
$cc -L . -shared -o libmylib.so mylib.o
$cc -L . -o main.out main.c -lmylib
LD_LIBRARY_PATH=. ./main.out
Now, suppose that for v2 of the library, we want to add a new field to mylib_mystruct called new_field.
If we added the field before old_field as in:
typedef struct {
    int new_field;
    int old_field;
} mylib_mystruct;
and rebuilt the library but not main.out, then the assert fails!
This is because the line:
myobject->old_field == 1
had generated assembly that is trying to access the very first int of the struct, which is now new_field instead of the expected old_field.
Therefore this change broke the ABI.
If, however, we add new_field after old_field:
typedef struct {
    int old_field;
    int new_field;
} mylib_mystruct;
then the old generated assembly still accesses the first int of the struct, and the program still works, because we kept the ABI stable.
Here is a fully automated version of this example on GitHub.
Another way to keep this ABI stable would have been to treat mylib_mystruct as an opaque struct, and only access its fields through method helpers. This makes it easier to keep the ABI stable, but would incur a performance overhead as we'd do more function calls.
API vs ABI
In the previous example, it is interesting to note that adding the new_field before old_field, only broke the ABI, but not the API.
What this means, is that if we had recompiled our main.c program against the library, it would have worked regardless.
We would also have broken the API however if we had changed for example the function signature:
mylib_mystruct* mylib_init(int old_field, int new_field);
since in that case, main.c would stop compiling altogether.
Semantic API vs Programming API
We can also classify API changes in a third type: semantic changes.
The semantic API, is usually a natural language description of what the API is supposed to do, usually included in the API documentation.
It is therefore possible to break the semantic API without breaking the program build itself.
For example, if we had modified
myobject->old_field = old_field;
to:
myobject->old_field = old_field + 1;
then this would have broken neither the programming API nor the ABI, but it would break the semantic API that main.c relies on: the assert would now fail.
There are two ways to programmatically check the contract (semantic) API:
- Test a bunch of corner cases. Easy to do, but you might always miss one.
- Formal verification. Harder to do, but produces a mathematical proof of correctness, essentially unifying documentation and tests in a "human" / machine verifiable manner! As long as there isn't a bug in your formal description of course ;-)
Tested in Ubuntu 18.10, GCC 8.2.0.

This is my layman's explanation:
API - think of include files. They provide programming interfaces.
ABI - think of kernel module. When you run it on some kernel, it has to agree on how to communicate without include files, i.e. as low-level binary interface.

(Application Binary Interface) A specification for a specific hardware platform combined with the operating system. It is one step beyond the API (Application Program Interface), which defines the calls from the application to the operating system. The ABI defines the API plus the machine language for a particular CPU family. An API does not ensure runtime compatibility, but an ABI does, because it defines the machine language, or runtime, format.
Courtesy

Let me give a specific example how ABI and API differ in Java.
An ABI-incompatible change is, for example, changing a method A#m() from taking a String argument to taking a String... (varargs) argument. This is not ABI compatible because you have to recompile the code that calls it, but it is API compatible because you can resolve it by recompiling without any code changes in the caller.
Here is the example spelled out. I have my Java library with class A
// Version 1.0.0
public class A {
    public void m(String string) {
        System.out.println(string);
    }
}
And I have a class that uses this library
public class Main {
    public static void main(String[] args) {
        (new A()).m("string");
    }
}
Now, the library author compiled their class A, I compiled my class Main and it is all working nicely. Imagine a new version of A comes
// Version 2.0.0
public class A {
    public void m(String... string) {
        System.out.println(string[0]);
    }
}
If I just take the new compiled class A and drop it together with the previously compiled class Main, I get an exception on attempt to invoke the method
Exception in thread "main" java.lang.NoSuchMethodError: A.m(Ljava/lang/String;)V
at Main.main(Main.java:5)
If I recompile Main, this is fixed and all is working again.

Your program (source code) can be compiled with modules that provide the proper API.
Your program (binary) can run on platforms that provide the proper ABI.
The API constrains the type definitions, function definitions, macros and sometimes global variables that a library should expose.
The ABI constrains what a "platform" should provide for your program to run on. I like to consider it at three levels:
- processor level: the instruction set, the calling convention
- kernel level: the system call convention, the special file path convention (e.g. the /proc and /sys files in Linux), etc.
- OS level: the object format, the runtime libraries, etc.
Consider a cross-compiler named arm-linux-gnueabi-gcc. "arm" indicates the processor architecture, "linux" indicates the kernel, and "gnu" indicates that its target programs use GNU's libc as the runtime library, unlike arm-linux-androideabi-gcc, which uses Android's libc implementation.

API - Application Programming Interface is a compile-time interface which is used by a developer to access non-project functionality such as library, OS or core calls in source code
ABI - Application Binary Interface is a runtime interface which is used by a program during execution for communication between components in machine code

The ABI refers to the layout of an object file / library and final binary from the perspective of successfully linking, loading and executing certain binaries without link errors or logic errors occurring due to binary incompatibility.
The binary format specification (PE, COFF, ELF, .obj, .o, .a, .lib (import library, static library), .NET assembly, .pyc, COM .dll): the headers, the header format, defining where the sections are and where the import / export / exception tables are and the format of those
The instruction set used to encode the bytes in the code section, as well as the specific machine instructions
The actual signature of the functions and data as defined in the API (as well as how they are represented in the binary (the next 2 points))
The calling convention of the functions in the code section, which may be called by other binaries (particularly relevant to ABI compatibility being the functions that are actually exported)
The way data is represented and aligned in the data section with respect to its type (particularly relevant to ABI compatibility being the data that is actually exported)
The system call numbers or interrupt vectors hooked in the code
The name decoration of exported functions and data
Linker directives in object files
Preprocessor / compiler / assembler / linker flags and directives used by the API programmer and how they are interpreted to omit, optimise, inline or change the linkage of certain symbols or code in the library or final binary (be that binary a .dll or the executable in the event of static linking)
The bytecode format of .NET C# is an ABI (general), which includes the .NET assembly .dll format. The virtual machine that interprets the bytecode has a specific ABI that is C++ based, where types need to be marshalled between native C++ types that the native code's specific ABI uses and the boxed types of the virtual machine's ABI when calling bytecode from native code and native code from bytecode. Here I am calling an ABI of a specific program a specific ABI, whereas an ABI in general, such as 'MS ABI' or 'C ABI' simply refers to the calling convention and the way structures are organised, but not a specific embodiment of the ABI by a specific binary that introduces a new level of ABI compatibility concerns.
An API refers to the set of type definitions exported by a particular library imported and used in a particular translation unit, from the perspective of the compiler of a translation unit, to successfully resolve and check type references to be able to compile a binary, and that binary will adhere to the standard of the target ABI, such that if the library that actually implements the API is also compiled to a compatible ABI, it will link and work as intended. If the API is updated the application may still compile, but there will now be a binary incompatibility and therefore a new binary needs to be used.
An API involves:
Functions, variables, classes, objects, constants, their names, types and definitions presented in the language in which they are coded in a syntactically and semantically correct manner
What those functions actually do and how to use them in the source language
The source code files that need to be included / binaries that need to be linked to in order to make use of them, and the ABI compatibility thereof

I'll begin by answering your specific questions.
1. What is a source level? Is it related to source code in any way?
Yes, the term source level refers to the level of source code. The term level refers to the semantic level of the computation requirements as they get translated from the application domain level to the source code level, and from the source code level to the machine code level (binary code). The application domain level refers to what end users of the software want and specify as their computation requirements. The source code level refers to what programmers make of the application-level requirements and then specify as a program in a certain language.
2. How can a program communicate at a source level? Or does the source of the library get included in the main program?
A language API refers specifically to everything that a language requires (specifies), hence interfaces, in order to write reusable modules in that language. A reusable program conforms to these interface (API) requirements so that it can be reused in other programs in the same language. Every reuse needs to conform to the same API requirements as well. So the word "communicate" refers to reuse.
Yes, source code (of a reusable module; in the case of C/C++, .h files) getting included (copied at the pre-processing stage) is the common way of reusing code in C/C++ and is thus part of the C++ API. Even writing a simple function foo() in the global space of a C++ program and then calling it as foo(); any number of times is reuse as per the C++ language API. Java classes in Java packages are reusable modules in Java. The JavaBeans specification is also a Java API enabling reusable programs (beans) to be reused by other modules (which could be other beans) with the help of runtimes/containers (conforming to that specification).
Coming to your overall question of the difference between language API and ABI, and how service-oriented APIs compare with language APIs, my answer here on SO should be helpful.

Size of a library and the executable

I have a static library *.lib created using MSVC on Windows. The size of the library is, say, 70KB. Then I have an application which links this library. But the size of the final executable (*.exe) is 29KB, less than the library. What I want to know is:
Since the library is statically linked, I was thinking it should add directly to the executable size and the final exe size should be more than that? Does windows exe format also do some compression of the binary data?
How is it for linux systems, that is how do sizes of library on linux (*.a/*.la file) relate with size of linux executable (*.out) ?
-AD

A static library on both Windows and Unix is a collection of .obj/.o files. The linker looks at each of these object files and determines if it is needed for the program to link. If it isn't needed, then the object file won't get included in the final executable. This can lead to executables that are smaller than the library.
EDIT: As MSalters points out, on Windows the VC++ compiler now supports generating object files that enable function-level linking, e.g., see here. In fact, edit-and-continue requires this, since the edit-and-continue needs to be able to replace the smallest possible part of the executable.

There is additional bookkeeping information in the .lib file that is not needed for the final executable. This information helps the linker find the code to actually link. Also, debug information may be stored in the .lib file but not in the .exe file (I don't recall where debug info is stored for objs in a lib file, it might be somewhere else).

The static library probably contains several functions which are never used. When the linker links the library with the main executable, it sees that certain functions are never used (and that their addresses are never taken and stored in function pointers), it just throws away the code. It can also do this recursively: if function A() is never called, and A() calls B(), but B() is never otherwise called, it can remove the code for both A() and B(). On Linux, the same thing happens.

A static library has to contain every symbol defined in its source code, because it might get linked into an executable which needs just that specific symbol. But once it is linked into an executable, we know exactly which symbols end up being used, and which ones don't. So the linker can trivially remove unused code, trimming the file size by a lot. Similarly, any duplicate symbols (anything that's defined in both the static library and the executable it's linked into) get merged into a single instance.

Disclaimer: It's been a long time since I dealt with static linking, so take my answer with a grain of salt.
You wrote: I was thinking it should add directly to the executable size and final exe size should be more than that?
Naive linkers work exactly this way - back when I was doing hobby development for CP/M systems (a LONG time ago), this was a real problem.
Modern linkers are smarter, however - they only link in the functions referenced by the original code, or as required.

Additionally to the current answers, the linker is allowed to remove function definitions if they have identical object code - this is intended to help reduce the bloating effects of templated code.

@All: Thanks for the pointers.
@Greg Hewgill: Your answer was a good pointer. Thanks.
The answer I found was as follows:
1.) When building the library, if the option "Keep Program debug database" in MSVC (or something similar) is ON, the library will contain this debug info, bloating its size. But when I statically link that library and create an executable, the linker strips all that debug info from the library before generating the exe, and hence the exe is smaller than the library.
2.) When I disabled the option "Keep Program debug database", I got a library whose size was smaller than the final executable, which is what I thought was normal in most situations.
-AD