How can I create a custom variable attribute to direct movs into different address spaces? - gcc

So, I'm building a custom backend for GCC for a processor. This processor has 4 address spaces: local, global, mmm, and mmr. I want to make it such that when writing c code, you can do this:
int global x = 5;
which would cause the compiler to spit out an instruction like this:
ldi.g %reg, 5
I know that certain processors like blackfin and MeP do something similar to this, so I figure its possible to do, however I have no idea how to do it. The technique that should allow me to do this is a variable attribute.
Any suggestions on how I could go about doing this?

You can add target-specific attributes by registering a struct attribute_spec table using TARGET_ATTRIBUTE_TABLE, as described in the GCC internals documentation. The details of struct attribute_spec can be found in the source (gcc/tree.h).
This handler doesn't need to do anything beyond returning NULL_TREE, although typically it will at least do some error checking. (Read the comments in gcc/tree.h, and look at examples in other targets.)
Later, you can obtain the list of attributes for a declaration tree node with DECL_ATTRIBUTES() (see the internals docs again), and use lookup_attribute() (see gcc/tree.h again) to see if a given attribute in the list.
You want to references to a symbol to generate different assembly based on your new attributes, so you probably want to use the TARGET_ENCODE_SECTION_INFO hook ("Define this hook if references to a symbol or a constant must be treated differently depending on something about the variable or function named by the symbol") to set a flag on the symbol_ref (as the docs suggest). You can define a predicate for testing this flag in the .md .

Related

Is there a way to somehow group several symbols from different files together so that if one of them is referenced, all are linked?

A bit of context first.
I'm curently working on a modular (embedded) microOS with drivers, ports, partitions and other such objects represented as structures, with distinct "operations" structures containing pointer to what you would call their methods. Nothing fancy there.
I have macros and linker script bits to make it so that all objects of a given type (say, all drivers), though their definitions are scattered across source files, are disposed as in an array, somewhere in flash, but in a way that lets the linker (I work with GNU GCC/LD.) garbage collected those who aren't explicitly referenced in the code.
However, after a few years refining the system and increasing its flexibility, I come at a point where it is too flash-greedy for small to medium microcontrollers. (I work only with 32 bits architectures, nothing too small.) I was to be exepected, you might say, but I'm trying to go further and do better, and currently I'm doubting LD will let me do it.
What I would like to get is that methods/functions which aren't used by the code get garbage collected too. This isn't currently the case, since they are all referred to by the pointer structures of the objects owning them. I would like to avoid using macro configuration switches and the application developper having to go through several layers of code to determine which functions are currently used and which ones can be safely disabled. I would really, really like that the linker's garbage collection let me automatize all the process.
First I thought I could split each methods structure between sections, so that, say, all the ports' "probe" methods end up in a section called .struct_probe, and that the wrapping port_probe() function could reference a zero-length object inside that function, so that all ports' probe references get linked if and only if the port_probe() function is called somewhere.
But I was wrong in that, for the linker, the "input sections" which are his resolution for garbage collection (since at this point there's no more alignment information inside, and it couldn't afford to take advantage of a symbol - and afferent object - being removed by reordering the insides of the containing section and shrinking it) aren't identified solely by a section name, but by a section name and a source file. So if I implement what I intended to, none of my methods will get linked in the final executable, and I will be toast.
That's where I'm at currently, and frankly I'm quite at a loss. I wondered if maybe someone here would have a better idea for either having each method "backward reference" the wrapping function or some other object which would in turn be referenced by the function and take all methods along, or as the title says, somehow group those methods / sections (without gathering all the code in a single file, please) so that referencing one means linking them all.
My gratitude for all eternity is on the line, here. ;)
Since I have spent some time documenting and experimenting on the following lead, even though without success, I would like to expose here what I found.
There's a feature of ELF called "group sections", which are used to define sections groups, or groups of sections, such as if one member section of the group is live (hence linked), all are.
I hoped this was the answer to my question. TL;DR: It wasn't, because group sections are meant to group sections inside a module. Actually, the only type of groups currently defined is COMDAT groups, which are by definition exclusive from groups with the same name defined in other modules.
Documentation on that feature and its implementation(s) is scarce to say the least. Currently, the standard's definition of group sections can be found here.
GCC doesn't provide a construct to manipulate section groups (or any kind of sections flags/properties for that matter). The GNU assembler's documentation specifies how to affect a section to a group here.
I've found no evocation in any GNU document about LD's handling of group sections. It is mentioned here and here, though.
As a bonus, I've found a way to specify sections properties (including grouping) in C code with GCC. This is a dirty hack, so it may be it won't work anymore by the time you read this.
Apparently, when you write
int bar __attribute__((section("<name>")));
GCC takes what's between the quotes an blindly pastes it so in the assembly output:
.section <name>,"aw"
(The actual flags can differ if the name matches one of a few predefined patterns.)
From there, it's all a matter of code injection. If you write
int bar __attribute__((section("<name>,\"awG\",%probbits,<group> //")));
you get
.section <name>,"awG",%progbits,<group> //"aw"
and the job is done. If you wonder why simply characterizing the section in a separate inline assembly statement isn't enough, if you do that you get an empty grouped section and a stuffed solitary section with the same name, which won't have any effect at link time. So.
This isn't entierely satisfying, but for lack of a better way, that's what I went for:
It seems the only way you have to effectively merge sections from several compilation units from the linker's point of view is to first link the resulting objects together in one big object, then link the final program using that big object instead of the small ones. Sections that had the same name in the small objects will be merged in the big one.
This is a bit dirty, though, and has also some drawbacks, such as perhaps merging some sections you wouldn't want to be, for garbage collecting purposes, and hiding which file each section comes from (though the information remains in the debug sections) if, say, you wanted to split the main sections (.text, .data, .bss...) in the final ELF so as to be able to see what file contributes what amount to flash and RAM usage, for instance.

goyacc: getting context to the yacc parser / no `%param`

What is the most idiomatic way to get some form of context to the yacc parser in goyacc, i.e. emulate the %param command in traditional yacc?
I need to parse to my .Parse function some context (in this case including for instance where to build its parse tree).
The goyacc .Parse function is declared
func ($$rcvr *$$ParserImpl) Parse($$lex $$Lexer) int {
Things I've thought of:
$$ParserImpl cannot be changed by the .y file, so the obvious solution (to add fields to it) is right out, which is a pity.
As $$Lexer is an interface, I could stuff the parser context into the Lexer implementation, then force type convert $$lex to that implementation (assuming my parser always used the same lexer), but this seems pretty disgusting (for which read non-idiomatic). Moreover there is (seemingly) no way to put a user-generated line at the top of the Parse function like c := yylex.(*lexer).c, so in the many tens of places I want to refer to this variable, I have to use the rather ugly form yylex.(*lexer).c rather than just c.
Normally I'd use %param in normal yacc / C (well, bison anyway), but that doesn't exist in goyacc.
I'd like to avoid postprocessing my generated .go file with sed or perl for what are hopefully obvious reasons.
I want to be able to (go)yacc parse more than one file at once, so a global variable is not possible (and global variables are hardly idiomatic).
What's the most idiomatic solution here? I keep thinking I must be missing something simple.
My own solution is to modify goyacc (see this PR) which adds a %param directive allowing one or more fields to be added to the $$ParserImpl structure (accessible as $$rcvr in code). This seems the most idiomatic route. This permits not only passing context in, but the ability for the user to add additional func()s using $$ParserImpl as a receiver.

__attribute__((io)), __attribute__((address)) in gcc for AVR don't seem to have any effect

I am trying to use variable attributes specifically provided by AVR flavor of gcc (https://gcc.gnu.org/onlinedocs/gcc/AVR-Variable-Attributes.html#AVR-Variable-Attributes).
The manual says that these special attributes should allow me to force the placement of a variable at the predetermined memory address. They even give an example:
volatile int porta __attribute__((address (0x600)));
But when I compile and debug this code example from the above mentioned document, the variable declared with such attribute is placed into a location in SRAM that compiler and linker determine, not at the address 0x600, as requested. Actually, if I remove the attribute entirely from the declaration, the end result does not change - the variable is placed at the same "whatever" address. Same thing happens when I use "io" and "io_low" attributes instead of "address".
I am using gcc toolchain packaged in the latest version Atmel Studio 7.0.19.31 targeted at 8-bit MCUs (ATMega64).
Hence the question: has anyone tried to use these special AVR-specific attributes with any success?
Important notes:
I am aware that in general to accomplish a placement of a variable at a fixed address in gcc you need to follow a two-step process (using section attribute and then modifying the linker script), but specificially for AVR it seems like these single-step attributes were provided, the question is how to make them work. A two-step process is not an option for me.
I am aware that in general one can always do this:
*(volatile int*)0x600 = your_data_here;
But this is not an option for me either, I need an actual variable declared (because I want to map it onto a bitwise structure to have access to individual bits without explicitly using the masks and logical operations.
So I am really looking for a way to make the provided attributes work, not for a workaround. What am I missing?
typedef struct {
uint8_t rx:4;
uint8_t tx:4;
} Pio_TXRXMUX_t;
#define Pio_TXRXMUX (*(volatile Pio_TXRXMUX_t *)(0x22)) //PORTA on ATMEGA1280

How to programmatically inject parameters/instructions into a pre-built portable executable

I have two executables, both manually created by me, I shall call them 1.exe and 2.exe respectively. First of all, both the executables are compiled by MSVS 2010, using the Microsoft compiler. I want to type a message into 1.exe, and I want 1.exe to inject that message into 2.exe (possibly as some sort of parameter), so when I run 2.exe after 1.exe has injected the message, 2.exe will display that message.
NOTE - this is not for illicit use, both these executables were created by me.
The big thing for me is:
Where to place the message/instructions in 2.exe so they can be easily accessed by 2.exe
How will 2.exe actually FIND use these parameters (message).
I fully understand that I can't simply use C++ code as injection, it must be naked assembly which can be generated/translated by the compiler at runtime (correct me if I am wrong)
Some solutions I have been thinking of:
Create a standard function in 2.exe requiring parameters (eg displaying the messagebox), and simply inject these parameters (the message) into the function?
Make some sort of structure in 2.exe to hold the values that 1.exe will inject, if so how? Will I need to hardcode the offset at which to put these parameters into?
Note- I don't expect a spoonfeed, I want to understand this aspect of programming proficiently, I have read up the PE file format and have solid understanding of assembly (MASM assembler syntax), and am keen to learn alot more. Thank you for your time.
Very few programmers ever need to do this sort of thing. You could go your entire career without it. I last did it in about 1983.
If I remember correctly, I had 2.exe include an assembler module with something like this (I've forgotten the syntax):
.GLOBAL TARGET
TARGET DB 200h ; Reserve 512 bytes
1.exe would then open 2.exe, search the symbol table for the global symbol "TARGET", figure out where that was within the file, write the 512 bytes it wanted to, and save the file. This was for a licensing scheme.
The comment from https://stackoverflow.com/users/422797/igor-skochinsky reminded me that I did not use the symbol table on that occasion. That was a different OS. In this case, I did scan for a string.
From your description it sounds like passing a value on the command line is all you need.
The Win32 GetCommandLine() function will give you ther passed value that you can pass to MessageBox().
If it needs to be another running instance then another form of IPC like windows messages (WM_COPYDATA) will work.

GCC syntax check ensure NULL passed as last parameter in function call with variable arguments

I want to do something similar to how, in GCC, you can do syntax checking on printf-style calls (to make sure that the argument list is actually correct for the call).
I have some functions that take a variable number of parameters. Rather than enforce what parameters are sent, I need to ensure that the last parameter passed is a NULL, regardless of how many parameters are passed-in.
Is there a way to get GCC to do this type of syntax check during compile time?
You probably want the sentinel function attribute, so declare your function like
void foo(int,double,...) __attribute__((sentinel));
You might consider customizing your GCC with a plugin or a MELT extension to typecheck more precisely your variadic functions. That is, you could extend GCC with your own attributes which would do more precise checks (or simply make additional checks based on the names of your functions).
The ex06/ example of melt-examples is doing a similar check for the jansson library; unfortunately that example is incomplete today October 18th 2012, I am still working on it.
In addition, you could define a variadic macro to call such a function by always adding a NULL e.g. something like:
#define FOO(N,D,...) foo((N),(D),##__V_ARGS__,NULL)
Then by coding FOO(i+3,3.14,"a") you'll get foo((i+3),(3.14),"a",NULL) so you are sure that a NULL is appended.
Basile Starynkevitch is right, go with a function attribute. There are a ton of other useful function attributes, like being able to tell the compiler "If the caller doesn't check the return value of this function, it's an error."
You may also want to see if splint can check for you, but I don't think so. I think it would have stuck in my memory.
If you haven't read over this page of GCC compiler flags, do that, too. There are a ton of handy checks in there. http://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html

Resources