What is this apparently non-standard struct packing syntax fed into GCC? - gcc

I am a little bit dumbstruck by some code that is associated with a 3rd-party code base I'm working with. All code is written in C or assembler except for a number of files adhering to the syntax described below. I cannot find any documentation on this syntax, yet GCC swallows it without any problem. I'm working with GCC 8. The syntax must be some extension to GCC. It would be very nice if somebody could enlighten me as to exactly what extension it is and where it is documented.
The code obviously defines struct types with packing and uses syntax like this:
Comment lines begin with "--"
Keywords are "block", "padding", "field", and "field_high", possibly more. A typical piece of code looks like this:
block <BLOCK_NAME> {
    field <FIELD_NAME_NO_1> 1
    field <FIELD_NAME_NO_2> 1
    padding 8
    field_high <FIELD_NAME_NO_3> 6
}
A block can contain any number of fields and paddings. The numbers given always add up to the word length on the target architecture.
Files containing this kind of code most often have ".bf" as their extension, while ".c" occurs too. Some of these files have #includes referring to ordinary C headers, while some ordinary C files have #includes referring to ".bf" files.

A quick glance at the tools directory in the Git repository turned up bitfield_gen.py, which claims to be a code generator for "bitfield structures". I presume that's what .bf stands for.
There are some CMake functions for building bitfield targets in tools/helpers.cmake. That will probably make sense to people more familiar with CMake than I am.

The Bit Field Generator is documented here: http://research.davidcock.fastmail.fm/papers/Cock_08.pdf
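For illustration only, this is the kind of C such a generator typically emits for a block like the one above: a struct of machine words plus get/set accessors for each field. The type name, accessor names, and exact bit layout below are assumptions for the sake of the example, not the actual output of bitfield_gen.py.

#include <stdint.h>

/* Hypothetical output for the 16-bit example block above. */
typedef struct {
    uint16_t words[1];
} BLOCK_NAME_t;

/* FIELD_NAME_NO_2 assumed to occupy bit 1 of word 0. */
static inline uint16_t
BLOCK_NAME_get_FIELD_NAME_NO_2(BLOCK_NAME_t b)
{
    return (b.words[0] >> 1) & 0x1u;
}

static inline BLOCK_NAME_t
BLOCK_NAME_set_FIELD_NAME_NO_2(BLOCK_NAME_t b, uint16_t v)
{
    b.words[0] = (uint16_t)((b.words[0] & ~(1u << 1)) | ((v & 0x1u) << 1));
    return b;
}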

Related

What does #...# mean in this Makefile snippet?

Can somebody briefly explain (just the idea) what the following fragment means? I'm new to the C language, so I don't understand the meaning of the #...# signs:
#SET_MAKE#
VPATH = #srcdir#
pkgdatadir = $(datadir)/#PACKAGE#
pkgincludedir = $(includedir)/#PACKAGE#
pkglibdir = $(libdir)/#PACKAGE#
pkglibexecdir = $(libexecdir)/#PACKAGE#
or:
build_triplet = #build#
host_triplet = #host#
If more code is needed, let me know.
Thanks in advance.
Names enclosed in # are used by autoconf to mark strings that should be replaced by the configure script.
These appear to be build-system variables of some sort, as the # symbol is not (I believe) used in C at all. Considering the names, this seems even more likely. The package and source directory will be inserted in the corresponding places.
Perhaps more interesting are the $(var)s, which are often used in Visual Studio project files (but not in source; a VS project file is a make file of sorts itself).
My guess is you have two make/build system variable types being used here. Whether they're from two systems, I do not know. As Brian Roach pointed out in a comment, at least GNU autoconf is involved here.
What file did this come from, and what other text surrounds it? That may shed more light, if a well known name is used. It is possible this isn't a code file at all, and just a make file; or it could be a code file with build system variables in (for at-build replace).
This is not C at all, looks more like a makefile of some sort. Take a look at the filename where you found this, I doubt it ends in .c.

How can I force the order of functions in a binary with the gcc toolchain?

I'm building a static binary out of several source files and libraries, and I want to control the order in which the functions are put into the resulting binary.
The background is, I have external code which is linked against offsets in this binary. Now if I change the source, all the offsets change because gcc may decide to order the functions differently, so I want to put the referenced functions at the beginning in a fixed order so their offsets stay unchanged...
I looked through ld's documentation but couldn't find anything about order of functions.
The only thing I found was -fno-toplevel-reorder, which doesn't really help me.
There is really no clean and reliable way of forcing a function to a particular address (except for the entry function) or even forcing functions to have a particular order (and if you could enforce the order, that would still not mean that the addresses stay the same when the source is changed!).
The biggest problem that I see is that even if it may be possible to fix a function to some address, it will be all but impossible to fix all of them to exactly the addresses that the already existing external program expects (assuming you cannot modify this program). If that actually worked, it would be total coincidence and sheer luck.
It might almost be easiest to provide trampolines at the addresses that the other program expects, and have the real functions (wherever they may be) pointed to by these. That would require your code to use a different base address, so the actual program code doesn't collide with the trampolines.
There are three things that almost work for giving functions fixed addresses:
You can place each function that isn't allowed to move in its own section using __attribute__ ((section ("some name"))) (a sketch follows after this list). Unfortunately, .text always appears as the first section, so if anything in .text changes such that its size is bumped over the 512-byte boundary, your offsets will change. By default (but see below) you can't get a section to start before .text.
The -falign-functions=n command-line option lets you align functions to a boundary. Normally this is something around 16 bytes. Now, you could choose a large value such as 1024. That will waste an immense amount of space, but it will also make sure that as long as functions only change moderately, the addresses of the following functions will remain the same. Obviously it still does not prevent the compiler/linker from reordering entire blocks when it feels like it (though -fno-toplevel-reorder will prevent this at least partially).
If you are willing to write a custom linker script, you can assign a start address for each section. These are virtual memory addresses, not positions in the executable, but I assume the hard linking works with VMAs (based on the default image base) too. So that could kind of work, although with much trouble and not in a pretty way.
When writing your own linker script, you could also consider putting the functions that must not move into their own sections and moving these sections at the beginning of the executable (in front of .text), so changes in .text won't move your functions around.
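Here is a minimal sketch of the section-attribute approach from the first point above. The section name .fixed, the function names, and the linker-script line quoted in the comment are all made up for illustration.

/* Pin functions whose offsets must stay stable into a dedicated section.
 * A custom linker script can then place ".fixed" at a known address,
 * e.g. with something like:  .fixed 0x100000 : { *(.fixed) }            */
__attribute__((section(".fixed"), noinline))
int exported_entry_point(int arg)        /* hypothetical exported function */
{
    return arg + 1;
}

__attribute__((section(".fixed"), used, noinline))
static int keep_me_too(void)             /* 'used' keeps it even if unreferenced */
{
    return 42;
}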
Update:
The "gcc" tag suggests that you probably target *NIX, so again this is probably not going to help you, but... if you have the option to use COFF, dollar-sign sections might work (the info might be interesting for others, in any case).
I just stumbled across this today (emphasis mine):
The "$" character (dollar sign) has a special interpretation in section names in object files. When determining the image section that will contain the contents of an object section, the linker discards the "$" and all characters that follow it. Thus, an object section named .text$X actually contributes to the .text section in the image. However, the characters following the "$" determine the ordering of the contributions to the image section. All contributions with the same object-section name are allocated contiguously in the image, and the blocks of contributions are sorted in lexical order by object-section name. Therefore, everything in object files with section name .text$X ends up together, after the .text$W contributions and before the .text$Y contributions.
If the documentation does not lie (and if I'm not reading wrong), this means you should be able to pack all the functions that you want located in the front into one section .text$A, and everything else into .text$B, and it should do just that.
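Assuming a COFF/PE target (for example MinGW GCC), the grouping could look roughly like this from the C side; the function names are hypothetical.

/* The COFF linker sorts contributions lexically by the part after '$',
 * so everything in .text$A ends up before everything in .text$B.       */
__attribute__((section(".text$A")))
int pinned_first(void)        /* hypothetical: must stay at the front */
{
    return 1;
}

__attribute__((section(".text$B")))
int ordinary_code(void)
{
    return 2;
}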
Build your code with -ffunction-sections -- this will place each function into its own section.
If you are using GNU-ld, the linker script gives you absolute control, but is a very platform-specific and somewhat painful solution.
A better solution might be to use the recent work on gold, which allows exactly the function ordering you are seeking.
A lot of it comes down to the order the functions appear in the source file and the order the files appear on the command line when you link.
Embed something in the code that your external code can find: a const structure with some ASCII marker and the addresses of the functions, perhaps (see the sketch below). Then, no matter where the compiler puts the functions, you can find them.
Or use the normal .dll or .so mechanisms, and not have to mess with it at all.
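A rough sketch of that marker-structure idea, assuming hypothetical function names and a made-up magic string; the external code would scan the image for the magic and read the addresses stored next to it.

typedef int (*fn_t)(int);

static int frobnicate(int x)   { return x + 1; }   /* hypothetical functions  */
static int defrobnicate(int x) { return x - 1; }   /* the external code needs */

struct export_table {
    char magic[8];          /* searchable marker, e.g. "MYFUNCS" */
    fn_t funcs[2];          /* real addresses, wherever the linker put them */
};

__attribute__((used))                               /* keep it even if unreferenced */
static const struct export_table exports = {
    .magic = "MYFUNCS",
    .funcs = { frobnicate, defrobnicate },
};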
In my experience, gcc -O0 will fix the binary order of functions to match the order in the source code.
However as others have mentioned, even if the order is fixed, the offsets can change as you modify the source code or upgrade your toolchain.

Looking for C source code for snprintf()

I need to port snprintf() to another platform that does not fully support glibc.
I am looking for the underlying definition in the glibc 2.14 source code. I follow many function calls but get stuck on vfprintf(). It then seems to call _IO_vfprintf(), but I cannot find the definition. Probably a macro is obfuscating things.
I need to see the real C code that scans the format string and calculates the number of bytes it would write if the buffer were large enough.
I also tried looking in newlib 1.19.0, but I got stuck on _svfprintf_r(). I cannot find the definition anywhere.
Can someone point me to either definition or another one for snprintf()?
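For reference, the size-calculating behaviour being asked about is the C99 return-value rule: snprintf returns the number of characters it would have written, so it can even be called with a NULL buffer just to measure. A small usage sketch (format_alloc is a made-up helper):

#include <stdio.h>
#include <stdlib.h>

/* Ask snprintf how long the result would be, then allocate exactly that. */
char *format_alloc(int value)
{
    int needed = snprintf(NULL, 0, "value = %d\n", value);
    if (needed < 0)
        return NULL;

    char *buf = malloc((size_t)needed + 1);         /* +1 for the trailing NUL */
    if (buf)
        snprintf(buf, (size_t)needed + 1, "value = %d\n", value);
    return buf;
}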
I've spent quite a while digging through the sources to find the _svfprintf_r() (and friends) definitions in Newlib. Since the OP asked about it, I'll post my findings for the poor souls who need them as well. The following holds true for Newlib 1.20.0, but I guess it is more or less the same across different versions.
The actual sources are located in the vfprintf.c file. There is a macro _VFPRINTF_R set to one of _svfiprintf_r, _vfiprintf_r, _svfprintf_r, or _vfprintf_r (depending on the build options), and then the actual implementation function is defined accordingly:
int
_DEFUN(_VFPRINTF_R, (data, fp, fmt0, ap),
       struct _reent *data _AND
       FILE * fp _AND
       _CONST char *fmt0 _AND
       va_list ap)
{
    ...
http://www.ijs.si/software/snprintf/ has what they claim is a portable implementation of snprintf, including vsnprintf.c, asnprintf, vasnprintf, asprintf, vasprintf. Perhaps it can help.
The source code of the GNU C library (glibc) is hosted on sourceware.org.
Here is a link to the implementation of vfprintf(), which is called by snprintf():
https://sourceware.org/git/?p=glibc.git;a=blob;f=stdio-common/vfprintf.c
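Not from glibc or newlib, but for the porting use case it may help to see the overall shape such code usually has: every character goes through one bounded emit helper that keeps counting past the limit, and the final count is the return value. A stripped-down sketch handling only %d and %s (hypothetical, not a drop-in replacement):

#include <stdarg.h>
#include <stddef.h>

struct out { char *buf; size_t cap; size_t len; };

/* Write while there is room (leaving space for the NUL), but always count. */
static void emit(struct out *o, char c)
{
    if (o->len + 1 < o->cap)
        o->buf[o->len] = c;
    o->len++;
}

static void emit_int(struct out *o, int v)
{
    char tmp[12];
    int n = 0;
    unsigned u = (unsigned)v;
    if (v < 0) { emit(o, '-'); u = 0u - u; }
    do { tmp[n++] = (char)('0' + u % 10); u /= 10; } while (u);
    while (n--) emit(o, tmp[n]);
}

int my_snprintf(char *buf, size_t cap, const char *fmt, ...)
{
    struct out o = { buf, cap, 0 };
    va_list ap;
    va_start(ap, fmt);
    for (; *fmt; fmt++) {
        if (*fmt != '%') { emit(&o, *fmt); continue; }
        if (*++fmt == '\0') break;                   /* stray '%' at the end */
        switch (*fmt) {
        case 'd': emit_int(&o, va_arg(ap, int)); break;
        case 's': { const char *s = va_arg(ap, const char *);
                    while (*s) emit(&o, *s++); } break;
        default:  emit(&o, *fmt); break;             /* includes "%%" */
        }
    }
    va_end(ap);
    if (cap)                                         /* always NUL-terminate */
        buf[o.len < cap ? o.len : cap - 1] = '\0';
    return (int)o.len;
}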

Syntax Checking with unsupported languages

I have some files that have a particular syntax similar to Ada (not identical though), and I would like to verify the syntax before running them. There isn't a compiler for these files, so I can't check them before using them. I tried to use the following:
gcc -c -gnats <file>
However, this fails with "compilation unit expected". I've tried a few variations on this, but to no avail.
I just want to make sure the file is syntactically correct before using it, but I'm not sure how to do it, and I really don't want to write an entire syntax checker just for this.
Is there some way to include an additional unsupported language to gcc without going through a recompile? Also is this simply a file that details to gcc what the syntax constructs are, or what would be entailed? I don't need a full compile, only a syntax check.
Alternatively, are there any syntax checkers whose Ada syntax check I could update with the small number of changes required for this language?
I've listed Ada as a tag, since the syntax is nearly identical, and finding something that will do Ada syntax checking without compiling will be a 90% solution for me.
You could try running the files through gnatchop first. The GCC Ada compiler is rather unique in that it expects filenames to match up with the main unit names inside the file. That may be what your error message is trying to say.
gnatchop will go through any files you give it and write out Ada source files with the appropriate names to make gcc happy (even splitting files into multiple files if needed).
Another option you might be interested in is OpenToken. It is a parser construction toolkit, written in Ada, that allows you to build your own parsers fairly easily. It comes with a syntax recognizer for Ada, so you may just be able to tweak that a bit for your needs.

Where is the source code for isnan?

Because of the layers of standards, the include files for C++ are a rat's nest. I was trying to figure out what __isnan actually calls, and couldn't find an actual definition anywhere.
So I just compiled with -S to see the assembly, and if I write:
#include <ieee754.h>
void f(double x) {
    if (__isinf(x)) ...
    if (__isnan(x)) ...
}
Both of these routines are called. I would like to see the actual definition, and possibly refactor things like this to be inline, since it should be just a bit comparison, albeit one that is hard to achieve when the value is in a floating point register.
Anyway, whether or not it's a good idea, the question stands: WHERE is the source code for __isnan(x)?
Glibc has versions of the code in the sysdeps folder for each of the systems it supports. The one you’re looking for is in sysdeps/ieee754/dbl-64/s_isnan.c. I found this with git grep __isnan.
(While C++ headers include code for templates, functions from the C library will not, and you have to look inside glibc or whichever.)
Here, for the master head of glibc, for instance.
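Not the glibc code itself, but for reference, the bit comparison the question alludes to looks roughly like this for a 64-bit IEEE-754 double: a NaN has all exponent bits set and a non-zero mantissa.

#include <stdint.h>
#include <string.h>

/* Sketch of a bit-level NaN test, not the actual glibc implementation. */
static inline int my_isnan(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);                 /* avoids strict-aliasing issues */
    uint64_t exponent = (bits >> 52) & 0x7FF;       /* 11 exponent bits */
    uint64_t mantissa = bits & 0xFFFFFFFFFFFFFULL;  /* 52 mantissa bits  */
    return exponent == 0x7FF && mantissa != 0;
}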

Resources