How to embed metadata in object file from GCC plugin - gcc

I'm trying to write a GCC plugin that does some domain-specific analysis of the programs it compiles. I'm wondering about the best way to embed the analysis results as some kind of metadata (like debug information) in the generated object files.
Ideally, some metadata (in my case, text) should be embedded in each object file, the linker should retain the data from all the objects it links, and finally I should have some way to access all the metadata from the final binary using objdump, readelf or similar.
My current idea is to try to add a uniquely named global string variable to each compilation unit, by adding it to the GIMPLE AST. However, I'm wondering if there is a more "disciplined" way; how can plugins generate debug information or other such metadata?

I'm giving myself a preliminary answer, based on this answer on how to create a global variable: Insert global variable declaration whit a gcc plugin
This code seems to work for just embedding a string my_string of length size as variable varname in the binary:
// make a char array type
type = build_array_type_nelts(char_type_node, size);
// create the variable and set its name
var = add_new_static_var(type);
name = get_identifier(varname);
DECL_NAME(var) = name;
// make sure this is a definition (otherwise GCC optimizes it away!)
TREE_PUBLIC(var) = 1;
// initialize the variable to a string value
initializer = build_string_literal(size, my_string);
DECL_INITIAL(var) = initializer;

Related

Have dll import symbols from its calling .exe

Related to but not equivalent to DLL Get Symbols From Its Parent (Loader)
Is there a way to convince the Windows loader to resolve a particular symbol referenced by A.dll from either the loading executable or an intermediate dll without specifying the file to resolve symbols from in A.dll?
It's pretty obvious how to do it if the loading .exe has a known name, but if it isn't ...
Here's a good reason why you'd actually want to do this: https://www.gnu.org/software/libc/manual/html_node/Replacing-malloc.html
If this can be done, a good answer would say how to do it some way or another.
I'm half-expecting the answer is it can't be done. In that case, a good answer would show why this is impossible. "The build tools don't support this." is a bad answer.
when we use import we need exactly indicate module name and function name. and we can not use complex algorithms. also for exe not exist well known alias which we can use in place exactly exe name. for compare: in case get GetModuleHandle we can use NULL for get handle to the file used to create the calling process (.exe file). but in case LoadLibraryExW we can not use 0 or empty string (L"") or some another alias for say - we want handle to exe. when loader load our module - he read dll name from IMAGE_IMPORT_DESCRIPTOR and try found or load module with this name first by low level, private, core of LoadLibraryExW. here need exactly name. or load fail. as result use import - not a solution here, if we dont know exe name at build time
possible variant - resolve functions pointers yourself at runtime. here we can get exe HMODULE by GetModuleHandle(0). also if need we can search function not only in exe but somewhere else. can implement any search algorithm.
here exist several ways. for concrete example let we need get pointer to function with signature:
void WINAPI fn(int i);
we can declare pointer to this function and resolve it in runtime
void (WINAPI *fn)(int);
*(void**)&fn = GetProcAddress(GetModuleHandleW(0), "fn");
say on DLL_PROCESS_ATTACH
a slightly different solution (although at the binary level it is completely equivalent) declare function with __declspec(dllimport) attribute. this is for CL.EXE (more known as MSVC) compiler only. so
__declspec(dllimport) void fn(int i);
in this case CL yourself generate pointer to function with name __imp_ ## __FUNCDNAME__ name. so by fact the same as in first variant, when we declare pointer yourself. only difference in syntax and.. symbol name. it will be look like __imp_?fn2##YAXH#Z. problem here that __imp_?fn2##YAXH#Z not valid name for c/c++ - we can not direct assign value to it from c/c++. even if we declare function with extern "C" - function name will be containing # symbol (illegal for c++) for __stdcall and __fastcall functions, for x86. also name will be different for different platforms (x86, x64, etc). for access such names - need or use external asm file (for asm ? and # symbols valid in name) or use /alternatename linker option - for set alias for such name and access symbol via it. say like
__pragma(comment(linker, "/alternatename:__imp_?fn##YAXH#Z=__imp_fn"))
and init via
*(void**)&__imp_fn = GetProcAddress(GetModuleHandle(0), "fn");
another option use __declspec(dllimport) in function declarations + add import library, where all __imp___FUNCDNAME__ (such __imp_?fn2##YAXH#Z) is defined. (even if we have not such library we can easy create it yourself - all what need - correct function declarations with empty implementation). and after we add such import lib to linker input - add /DELAYLOAD:dllname where dllname - exactly name from import lib. sense that this dllname will(can) be not match to exe - all what need - it must be unique. and we need yourself handle delayload (called when we first time call fn). for implement delayload we need implement
extern "C" FARPROC WINAPI __delayLoadHelper2(
PCImgDelayDescr pidd,
FARPROC * ppfnIATEntry
);
we can implement it yourself, or add delayimp.lib to our project. here (in delayimp.lib) the delayLoadHelper2 and implemented. however we must customize this process (default implementation(look in /include/DelayHlp.cpp) will be use LoadLibraryExA with dllname which is not excepted in our case - otherwise we can be simply use import as is). so we need mandatory implement __pfnDliNotifyHook2:
for example:
FARPROC WINAPI MyDliHook(
unsigned dliNotify,
PDelayLoadInfo pdli
)
{
switch (dliNotify)
{
case dliNotePreLoadLibrary:
if (!strcmp(pdli->szDll, "unique_exe_alias"))
{
return (FARPROC)GetModuleHandle(0);
}
}
return 0;
}
const PfnDliHook __pfnDliNotifyHook2 = MyDliHook;
we can look for dliNotePreLoadLibrary notification and instead default LoadLibraryEx(dli.szDll, NULL, 0); use GetModuleHandle(0); for get base of exe.
the "unique_exe_alias" (which linker got from import library) here play role not real exe name, which is unknown, but unique tag(alias) for exe

Embedding big data file into executable binary

I am working on a C++11 application that is supposed to ship as a single executable binary file. Optionally, users can provide their own CSV data files to be used by the application. To simplify things, assume each element is in format key,value\n. I have created a structure such as:
typedef struct Data {
std::string key;
std::string value;
Data(std::string key, std::string value) : key(key), value(value) {}
} Data;
By default, the application should use data defined in a single header file. I've made a simple Python script to parse default CSV file and put it into header file like:
#ifndef MYPROJECT_DEFAULTDATA
#define MYPROJECT_DEFAULTDATA
#include "../database/DefaultData.h"
namespace defaults {
std::vector<Data> default_data = {
Data("SomeKeyA","SomeValueA"),
Data("SomeKeyB","SomeValueB"),
Data("SomeKeyC","SomeValueC"),
/* and on, and on, and on... */
Data("SomeKeyASFHOIEGEWG","SomeValueASFHOIEGEWG")
}
}
#endif //MYPROJECT_DEFAULTDATA
The only problem is, that file is big. I'm talking 116'087 (12M) lines big, and it will probably be replaced with even bigger file in the future. When I include it, my IDE is trying to parse it and update indices. It slows everything down to the point where I can hardly write anything.
I'm looking for a way to either:
prevent my IDE (CLion) from parsing it or
make a switch in cmake that would use this file only with release executables or
somehow inject data directly into executable
Since your build process already includes a pre-process, which generates C++ code from a CSV, this should be easy.
Step 1: Put most of the generated data in the .cpp file, not a header.
Step 2: Generate your code so that it doesn't use vector or string.
Here's how to do these:
struct Data
{
string_view key;
string_view value;
};
You will need an implementation of string_view or a similar type. While it was standardized in C++17, it doesn't rely on C++17 features.
As for the data structure itself, this is what gets generated in the header:
namespace defaults {
extern const std::array<Data, {{GENERATED_ARRAY_COUNT}}> default_data;
}
{{GENERATED_ARRAY_COUNT}} is the number of items in the array. That's all the generated header should expose. The generated .cpp file is a bit more complex:
static const char ptr[] =
"SomeKeyA" "SomeValueA"
"SomeKeyB" "SomeValueB"
"SomeKeyC" "SomeValueC"
...
"SomeKeyASFHOIEGEWG" "SomeValueASFHOIEGEWG"
;
namespace defaults
{
const std::array<Data, {{GENERATED_ARRAY_COUNT}}> default_data =
{
{{ptr+{{GENERATED_OFFSET}}, {{GENERATED_SIZE}}}, {ptr+{{GENERATED_OFFSET}}, {{GENERATED_SIZE}}}},
{{ptr+{{GENERATED_OFFSET}}, {{GENERATED_SIZE}}}, {ptr+{{GENERATED_OFFSET}}, {{GENERATED_SIZE}}}},
...
{{ptr+{{GENERATED_OFFSET}}, {{GENERATED_SIZE}}}, {ptr+{{GENERATED_OFFSET}}, {{GENERATED_SIZE}}}},
};
}
ptr is a string which is a concatenation of all of your individual strings. There is no need to put spaces or \0 characters or whatever between the individual strings. However, if you do need to pass these strings to APIs that take NULL-terminated strings, you'll either have to copy them into a std::string or have the generator stick \0 characters after each generated sub-string.
The point is that ptr should be a single, giant block of character data.
{{GENERATED_OFFSET}} and {{GENERATED_SIZE}} are offsets and sizes within the giant block of character data that represents a single substring.
This method will solve two of your problems. It will be much faster at load time, since it performs zero dynamic allocations. And it puts the generated strings in the .cpp file, thus making your IDE cooperate.

What input section attributes are required to place a C variable in a data memory output section?

My application has a number of modules that each require some variables to be stored in off-chip non-volatile memory. To make the reading and writing of these easier, I'm trying to collect them together into a contiguous region of RAM, to that the NVM driver can address a single block of memory when communicating with the NVM device.
To achieve this, I have created a custom linker script containing the following section definition.
.nvm_fram :
{
/* Include the "nvm_header" input section first. */
*(.nvm_header)
/* Include all other input sections prefixed with "nvm_" from all modules. */
*(.nvm_*)
/* Allocate a 16 bit variable at the end of the section to hold the CRC. */
. = ALIGN(2);
_gld_NvmFramCrc = .;
LONG(0);
} > data
_GLD_NVM_FRAM_SIZE = SIZEOF(.nvm_fram);
The data region is defined in the MEMORY section using the standard definition provided by Microchip for the target device.
data (a!xr) : ORIGIN = 0x1000, LENGTH = 0xD000
One example of a C source file which attempts to place its variables in this section is the NVM driver itself. The driver saves a short header structure at teh beginning of the NVM section so that it can verify the content of the NVM device before loading it into RAM. No linker error reported for this variable.
// Locate the NVM configuration in the non-volatile RAM section.
nvm_header_t _nvmHeader __attribute__((section(".nvm_header")));
Another module that has variables to store in the .nvm_fram section is the communications (CANopen) stack. This saves the Module ID and bitrate in NVM.
// Locate LSS Slave configuration in the non-volatile RAM section.
LSS_slave_config_t _slaveConfig __attribute__((section(".nvm_canopen"))) =
{ .BitRate = DEFAULT_BITRATE, .ModuleId = DEFAULT_MODULEID };
Everything compiles nicely, but when the linker runs, the following error stops the build.
elf-ld.exe: Link Error: attributes for input section '.nvm_canopen' conflict with
output section '.nvm_fram'
It's important that the variables can be initialised with values by the crt startup, as shown by the _slaveConfig declaration above, in case the NVM driver cannot load them from the NVM device (it's blank, or the software version has changed, etc.). Is this what's causing the attributes mismatch?
There are several questions here and on the Microchip forums, which relate to accessing symbols that are defined in the linker script from C. Most of these concern values in the program Flash memory and how to access them from C; I know how to do this. There is a similar question, but this doesn't appear to address the attributes issue, and is a little confusing due to being specific to a linker for a different target processor.
I've read the Microchip linker manual and various GCC linker documents online, but can't find the relevant sections because I don't really understand what the error means and how it relates to my code. What are the 'input and output section attributes', where are they specified in my code, and how do I get them to match eachother?
The problem is due to the _nvmHeader variable not having an initial value assigned to it in the C source, but the _slaveConfig variable does.
This results in the linker deducing that the .nvm_fram output section is uninitialised (nbss) from the .nvm_header input section attributes. So, when it enconters initialised data in the .nvm_canopen input section from the _slaveConfig variable, there is a mismatch in the input section attributes: .nvm_fram is for uninitialised data, but .nvm_canopen contains initialised data.
The solution is to ensure that all variables that are to be placed in the .nvm_fram output section are given initial values in the C source.
// Type used to hold metadata for the content of the NVM.
typedef struct
{
void* NvmBase; // The original RAM address.
uint16_t NvmSize; // The original NVM section size.
} nvm_header_t;
// The linker supplies the gld_NVM_FRAM_SIZE symbol as a 'number'.
// This is represented as the address of an array of unspecified
// length, so that it cannot be implicitly dereferenced, and cast
// to the correct type when used.
extern char GLD_NVM_FRAM_SIZE[];
// The following defines are used to convert linker symbols into values that
// can be used to initialise the _nvmHeader structure below.
#define NVM_FRAM_BASE ((void*)&_nvmHeader)
#define NVM_FRAM_SIZE ((uint16_t)GLD_NVM_FRAM_SIZE)
// Locate the NVM configuration in the non-volatile RAM section.
nvm_header_t _nvmHeader __attribute__((section(".nvm_header"))) =
{
.NvmBase = NVM_FRAM_BASE, .NvmSize = NVM_FRAM_SIZE
};
The answer is therefore that the output section attributes may be determined partly by the memory region in which the section is to be located and also by the attributes of the first input section assigned to it. Initialised and uninitialised C variables have different input section attributes, and therefore cannot be located within the same output section.

Golang assignment of []map[string]struct error

As you could probably tell from the below code I am working on a project which creates csv reports from data in mongoDB. After getting the data I need in, I need to structure the data into something more sensible then how it exists in the db, which is fairly horrendous (not my doing) and near impossible to print the way I need it. The structure that makes the most sense to me is a slice (for each document of data) of maps of the name of the data to a structure holding the data for that name. Then I would simply have to loop through the document and stuff values into the structs where they belong.
My implementation of this is
type mongo_essential_data_t struct {
caution string
citation string
caution_note string
}
mongo_rows_struct := make([]map[string]mongo_essential_data_t, len(mongodata_rows))
//setting the values goes like this
mongo_rows_struct[i][data_name].caution_note = fmt.Sprint(k)
//"i" being the document, "k" being the data I want to store
This doesn't work however. When doing "go run" it returns ./answerstest.go:140: cannot assign to mongo_rows_struct[i][data_name].caution_note. I am new to Go and not sure why I am not allowed to do this. I'm sure this is an invalid way to reference that particular data location, if it is even possible to reference it in Go. What is another way to accomplish this setting line? If it is too much work to accomplish this the way I want, I am willing to use a different type of data structure and am open to suggestions.
This is a known issue of Golang, known as issue 3117. You can use a temporary variable to get around it:
var tmp = mongo_rows_struct[i][data_name]
tmp.caution_note = fmt.Sprint(k)
mongo_rows_struct[i][data_name] = tmp
as per my understanding, when you write:
mongo_rows_struct[i][data_name]
compiler will generate code, which will return copy of mongo_essential_data_t struct(since struct in go is value type, not reference type), and
mongo_rows_struct[i][data_name].caution_note = fmt.Sprint(k)
will write new value to that copy. And after that copy will be discarded. Obviously, its not what you expect. So Go compiler generate error to prevent this misunderstanding.
In order to solve this problem you can:
1. Change definition of your data type to
[]map[string]*mongo_essential_data_t
2. Explicitly create copy of your struct, make changes in that copy and write it back to the map
data := mongo_rows_struct[i][data_name]
data.caution_note = fmt.Sprint(k)
mongo_rows_struct[i][data_name] = data
Of course, first solution is preferable because you will avoid unnecessary copying of data

How to create an INI file with an empty section?

I want to write a *.ini file in MFC. I know those typical methods with section names, key names and key values.
I`m wondering how to write a ini file which contains only a section name, an ini file like below:
...
[section]
...
I tried the function WritePrivateProfileString() with two NULL parameters; I thought it would work but failed.
Standard ini files are supposed to be in a special format, if you're writing them in a incompatible format (which I think you are), they're not standard ini files, but you can just write it manually using normal IO classes (CStdioFile or similar, too long since I did MFC so I can't remember the best way).
That way you can write any data you want in any format you want.
Inorder to get an empty section first define a key in that section and then delete that key by doing this you will get an empty section. [section]
[DllImport("kernel32", CharSet = CharSet.Unicode)]
static extern long WritePrivateProfileString(string section, string key, string value, string filePath);
public void DeleteKey(string Key, string Section = null)
{
Write(Key, null, Section ?? exe);
}

Resources