Extract global variables from a.out file - gcc

Edit (updated question)
I have a simple C program:
// It is not important to know what the code does; you may skip it.
main.c
#include <bsp.h>
unsigned int AppCtr;
unsigned char AppFlag;
int SOME_LARGE_VARIABLE;
static void AppTest (void);
void main (void)
{
    AppCtr = 0;
    AppFlag = 0;
    AppTest();
}
static void Foo(void){
    SOME_LARGE_VARIABLE=15;
}
static void AppTest (void)
{
    unsigned int i;
    i = 0;
    while (i < 200000) {
        i++;
    }
    BSP_Test();
    SOME_LARGE_VARIABLE=3;
    Foo();
}
bsp.c
extern int SOME_LARGE_VARIABLE;
extern unsigned char AppFlag;
unsigned int long My_GREAT_COUNTER;
void BSP_Test (void) {
    SOME_LARGE_VARIABLE = 5;
    My_GREAT_COUNTER = 4;
}
(The program does not do anything useful. My goal is to extract the variable names, the location where each one is declared, and their memory addresses.)
When I compile the program I get the file a.out, which is an ELF file containing debug information.
Someone at the company wrote a program in .NET 5 years ago that gets all this information from the a.out file. This is what it returns:
(A table with the columns Name, Display Name, Type, Size, and Address.)
For this small program it works great and also for other large projects.
That code is 2000 lines long, has several bugs, and does not support .NET version 4. That's why I am trying to recreate it.
My problem is that I don't know which approach to take to solve this. These are the options I have been considering:
Clean up the buggy code of that program, try to see how it parses the a.out file to get that information, and once I fully understand it, figure out why it does not support versions 3 and 4.
I am OK at writing regular expressions, so maybe I could look for the pattern directly in the a.out file. So far I was able to find the pattern when there is just one file (main.c), but when there are several files it gets more complicated. I haven't tried that case yet; maybe it will not be that complicated and it will be possible to find the pattern.
Install Cygwin so that I can use Linux commands such as objdump, nm, or readelf on Windows. I haven't played with these commands enough; when I use one such as readelf -w a.out, I get far more information than I need. There are some cons that explain why I have not spent much time on this approach:
Cons: Installing Cygwin on Windows takes a while, and when we give this application to our customers we don't want them to have to install it. Maybe there is a way to install just objdump and readelf without installing the whole thing.
Pros: If we find the right command, we will not be reinventing the wheel and will save some time. Maybe it is just a matter of parsing the output of a command such as objdump -W a.out.
Summary
I would like to be able to get the global variables from the a.out file: what type each variable is (int, char, ...), what memory address it has, and in which file it is declared (main.c or someOtherFile.c). I would appreciate a solution that does not require Cygwin, as that would make deployment easier. Since this question asks for a lot, I attempted to split it into smaller ones:
objdump/readelf get variables information
Get location of symbols in a.out file
Perhaps I should delete the other questions; sorry for being redundant.

Here is what I will do. Why reinvent the wheel!
Download Windows builds of the Linux commands we will need (readelf is part of GNU binutils).
In the bin directory there should be readelf.exe.
Note that we will not need Cygwin or any other program, so deployment will be simple!
Once we have that file, run this in cmd:
// cd "path where readelf.exe is"
readelf.exe -s a.out
This prints the symbol table, one row per symbol, with the columns Num, Value, Size, Type, Bind, Vis, Ndx, and Name.
We are interested in all symbols whose Type is OBJECT and whose Size is greater than 0: those are the variables.
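If you would rather not filter the list by eye, the OBJECT/size test is easy to do in code. Below is a minimal C sketch, assuming readelf's usual column layout (Num: Value Size Type Bind Vis Ndx Name); the struct and function names are made up for this example. Each line of readelf.exe -s a.out could be fed to it, for instance by reading the command's output through _popen on Windows (popen elsewhere).

```c
#include <stdio.h>
#include <string.h>

/* The fields of one `readelf -s` row that we care about. */
struct elf_symbol {
    unsigned long value;  /* Value column: symbol address, printed in hex */
    long size;            /* Size column: decimal, or hex with a 0x prefix */
    char type[16];        /* e.g. OBJECT, FUNC, NOTYPE, SECTION */
    char bind[16];        /* e.g. GLOBAL, LOCAL */
    char name[128];       /* symbol name */
};

/* Parse one line of `readelf -s` output.
 * Returns 1 if the line is a data object (Type OBJECT) with a non-zero
 * size, i.e. a variable candidate; 0 for header lines, functions,
 * unnamed section symbols, zero-sized objects, and so on. */
int parse_object_symbol(const char *line, struct elf_symbol *sym)
{
    unsigned num;
    char vis[16], ndx[16];
    /* %li accepts both "4" and "0x186a0" for the Size column */
    int n = sscanf(line, " %u: %lx %li %15s %15s %15s %15s %127s",
                   &num, &sym->value, &sym->size, sym->type, sym->bind,
                   vis, ndx, sym->name);
    if (n != 8)
        return 0;
    return strcmp(sym->type, "OBJECT") == 0 && sym->size > 0;
}
```

A row like `52: 0000000000004018     4 OBJECT  GLOBAL DEFAULT   25 AppCtr` is accepted, while FUNC rows and zero-sized objects are skipped; the Ndx column (section index) is read but not checked here.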
Once we have the variables, we can use readelf.exe -w a.out to look at the debug information tree. Let's look for one of the variables we found in the previous step (My_GREAT_COUNTER). From the tree we know the file where the variable is declared, and we get more information, such as the line where it was declared and its memory address.
The last thing missing is the type. The variable's entry shows DW_AT_type : <0x522>, which means we have to go to offset 0x522 of the tree to get more information about that type. Going to that entry, we can see that My_GREAT_COUNTER is of type unsigned long.
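Following these <0x...> references by hand gets tedious for many variables. Here is a rough C sketch of the lookups involved, assuming the line format GNU readelf usually prints for the debug info dump (DIE headers like ` <1><522>: Abbrev Number: 3 (DW_TAG_base_type)`, attributes like `DW_AT_type : <0x522>`). The exact layout can vary between binutils versions, and the helper names are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* If `line` starts a DIE, e.g. " <1><522>: Abbrev Number: 3 (...)",
 * return its offset (0x522 here); otherwise return -1. */
long die_offset(const char *line)
{
    int depth;
    unsigned long offset;
    if (sscanf(line, " <%d><%lx>:", &depth, &offset) == 2)
        return (long)offset;
    return -1;
}

/* If `line` is a DW_AT_type attribute, e.g. "<35>  DW_AT_type : <0x522>",
 * return the referenced DIE offset; otherwise return -1. */
long type_reference(const char *line)
{
    const char *p = strstr(line, "DW_AT_type");
    if (p == NULL)
        return -1;
    p = strstr(p, "<0x");
    if (p == NULL)
        return -1;
    return strtol(p + 1, NULL, 16);   /* base 16 accepts the 0x prefix */
}

/* For a DW_AT_name line, copy the name (text after the last ": ") into
 * `out`. Handles both "DW_AT_name : main.c" and the
 * "(indirect string, offset: 0x..): name" form. Returns 1 on success. */
int attribute_name(const char *line, char *out, size_t outlen)
{
    const char *p, *last = NULL;
    size_t len;

    if (strstr(line, "DW_AT_name") == NULL || outlen == 0)
        return 0;
    for (p = strstr(line, ": "); p != NULL; p = strstr(p + 1, ": "))
        last = p;
    if (last == NULL)
        return 0;
    last += 2;
    len = strcspn(last, "\r\n");      /* drop the trailing newline */
    if (len >= outlen)
        len = outlen - 1;
    memcpy(out, last, len);
    out[len] = '\0';
    return 1;
}
```

One way to use these: read the dump once, remember the name of each DIE by its offset, then for each variable follow its type reference. Typedef, const, and pointer entries carry their own DW_AT_type, so the lookup may need to repeat until it reaches a DW_TAG_base_type.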

Related

Getting "cannot find symbol .... while executing load ..." error when trying to run Hello World as a C extension (dll) example

I have used the C code from the following page verbatim: https://wiki.tcl-lang.org/page/Hello+World+as+a+C+extension
/*
 * hello.c -- A minimal Tcl C extension.
 */
#include <tcl.h>
static int
Hello_Cmd(ClientData cdata, Tcl_Interp *interp, int objc, Tcl_Obj *const objv[])
{
    Tcl_SetObjResult(interp, Tcl_NewStringObj("Hello, World!", -1));
    return TCL_OK;
}
/*
 * Hello_Init -- Called when Tcl loads your extension.
 */
int DLLEXPORT
Hello_Init(Tcl_Interp *interp)
{
    if (Tcl_InitStubs(interp, TCL_VERSION, 0) == NULL) {
        return TCL_ERROR;
    }
    /* changed this to check for an error - GPS */
    if (Tcl_PkgProvide(interp, "Hello", "1.0") == TCL_ERROR) {
        return TCL_ERROR;
    }
    Tcl_CreateObjCommand(interp, "hello", Hello_Cmd, NULL, NULL);
    return TCL_OK;
}
My command for compiling is nearly verbatim except for the last character, indicating Tcl version 8.6 rather than 8.4, and it compiles without error:
gcc -shared -o hello.dll -DUSE_TCL_STUBS -I$TCLINC -L$TCLLIB -ltclstub86
Then I created the following Tcl program:
load ./hello.dll Hello
puts "got here"
But when I run it with tclsh, I get the following error:
cannot find symbol "Hello_Init"
while executing
"load ./hello.dll Hello"
(file "hello.tcl" line 1)
So I am essentially following a couple of suggestions from Donal Fellows' answer here: cannot find symbol "Embeddedrcall_Init". The OP there, however, commented that, like me, the suggestion(s) hadn't resolved their issue. One thing I didn't try from that answer was "You should have an exported (extern "C") function symbol in your library"; could that be the difference maker? Shouldn't it have been in the example all along, then?
At the suggestion of somebody on comp.lang.tcl I found "DLL Export Viewer" but when I run it against the DLL it reports 0 functions found :( What am I doing wrong?
Could it be an issue with MinGW/gcc on Windows, and I need to bite the bullet and do this with Visual Studio? That's overkill I'd like to avoid if possible.
The core of the problem is that your function Hello_Init is not ending up in the global symbol table exported by the resulting DLL. (Some linkers would put such things in as _Hello_Init instead of Hello_Init; Tcl adapts to them transparently.) The symbol must be there for Tcl's load command to work: without it, there's simply no consistent way to tell your extension code what the Tcl_Interp context handle is (which allows it to make commands, variables, etc.)
(If you'd been working with C++, one of the possible problems is a missing extern "C", whose actual meaning is to turn off name mangling. That's probably not the problem here.)
Since you are on Windows — going by the symbols in your DLL, such as EnterCriticalSection and GetLastError — the problem is probably linked to exactly how you are linking. I'm guessing that Tcl is defining your function to have __declspec(dllexport) (assuming you've not defined STATIC_BUILD, which absolutely should not be used when building a DLL) and yet that's not getting respected. Assuming you're using a modern-enough version of GCC… which you probably are.
I'm also going through the process of learning how to build Tcl extensions in C and had exactly the same problem when working through this same example using Tcl 8.6.
I was compiling using MinGW GCC (64-bit) with the following:
gcc -shared -o hello.dll -DUSE_TCL_STUBS "-IC:\\ActiveTcl\\include" "-LC:\\ActiveTcl\\lib" -ltclstub86
And like the OP, I got no compile error, but when loading the DLL at a tclsh prompt, Tcl complained:
'cannot find symbol "Hello_Init"'
I can't say that I understand why, but I was able to find a solution that works thanks to some trial and error and some information on the Tcl wiki here:
https://wiki.tcl-lang.org/page/Building+Tcl+DLL%27s+for+Windows
In my case I had to adjust the compile command to the following:
gcc -shared -o hello.dll hello.c "-IC:\\ActiveTcl\\include" "-LC:\\ActiveTcl\\bin" -ltcl86t
Obviously those file paths are specific to my system, but basically
I had to add an explicit reference to the .c file
I had to include the tcl86t dll library from the tcl bin directory
I had to remove the -DUSE_TCL_STUBS flag (meaning that -LC:\\ActiveTcl\\lib and -ltclstub86 could also be removed)
(attempting to use the -DUSE_TCL_STUBS flag caused the linker to complain with C:\ActiveTcl\lib/tclstub86.lib: error adding symbols: File format not recognized)
This successfully compiled a dll that I could load, and then call the hello function to print my 'Hello World' message.
Something else I stumbled over, and which wasn't immediately obvious:
According to https://www.tcl.tk/man/tcl8.6/TclCmd/load.htm, Tcl expects to find an 'init' function whose name follows a certain naming convention.
If no package name is given to the load command, the name of that init function is derived from the DLL file name.
This caused a few problems for me (when compiling via the Eclipse IDE), as the DLL name was automatically determined from the Eclipse project name.
For example, if I recompile the same example but call the DLL something else, e.g.:
gcc -shared -o helloWorldExtension.dll hello.c "-IC:\\ActiveTcl\\include" "-LC:\\ActiveTcl\\bin" -ltcl86t
Then at tclsh prompt:
% load helloWorldExtension
cannot find symbol "Helloworldextension_Init"
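The rule is easy to reproduce if you want to predict which symbol Tcl will look for. Here is a C sketch of the default guess as the Tcl 8.6 load manual page describes it: take the last path element, strip a leading lib, keep the following letters and underscores, upper-case the first letter, lower-case the rest, and append _Init. The manual says the guess may be done differently on some platforms, so treat this as an approximation; guess_init_name is a made-up helper.

```c
#include <ctype.h>
#include <string.h>

/* Approximate Tcl's default init-function guess for `load fileName`.
 * Writes e.g. "Hello_Init" into `out`. Returns 1 on success, 0 if no
 * usable name could be derived or `out` is too small. */
int guess_init_name(const char *path, char *out, size_t outlen)
{
    const char *tail = path, *p;
    size_t len = 0, i;

    for (p = path; *p; p++)             /* last path element */
        if (*p == '/' || *p == '\\')
            tail = p + 1;
    if (strncmp(tail, "lib", 3) == 0)   /* drop a leading "lib" */
        tail += 3;
    while (isalpha((unsigned char)tail[len]) || tail[len] == '_')
        len++;                          /* stop at '.', digits, ... */
    if (len == 0 || len + sizeof "_Init" > outlen)
        return 0;
    for (i = 0; i < len; i++)           /* "Xxxx" capitalization */
        out[i] = (char)(i == 0 ? toupper((unsigned char)tail[i])
                               : tolower((unsigned char)tail[i]));
    strcpy(out + len, "_Init");
    return 1;
}
```

It maps helloWorldExtension.dll to Helloworldextension_Init, matching the error above. Passing the package name explicitly, as in load helloWorldExtension.dll Hello, avoids relying on the guess at all.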

How to add binary resources

I have a project that requires a bunch of graphic files in the executable. Since there is no file system on the target, I can't just use the fopen function. One way would be to convert the file content to C source code containing a variable definition like this:
unsigned char file1_content[] = {
0x01, 0x02, ...
};
It's cumbersome to build such files even with a converter tool.
Is there any way to add binary files to the rdata section while specifying a variable name for each file? I thought about using the linker script for this but didn't find a way.
It's not particularly cumbersome with a tool, and that's the classic solution. Search for "bin2c" to find some.
You simply need to include these "asset-building" steps in your build process, i.e. call the tool from the Makefile. This also means that the tool is only run if the source data has changed, which is nice.
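For reference, the core of such a tool is only a few lines. Here is a minimal C sketch; bin2c here is a hypothetical function name, not one of the existing tools. It writes the bytes as a const array (so with typical toolchains the data lands in a read-only section) plus a length variable.

```c
#include <stdio.h>

/* Write `data` (n >= 1 bytes; an empty initializer list is not valid C
 * before C23) as a C array definition named `name`, followed by a
 * `name_len` variable, to `out`. Returns 0 on success, -1 on I/O error. */
int bin2c(const unsigned char *data, size_t n, const char *name, FILE *out)
{
    size_t i;

    fprintf(out, "const unsigned char %s[] = {", name);
    for (i = 0; i < n; i++) {
        /* 12 bytes per line, comma-separated */
        const char *sep = (i == 0) ? "\n    "
                        : (i % 12 == 0) ? ",\n    " : ", ";
        fprintf(out, "%s0x%02x", sep, data[i]);
    }
    fprintf(out, "\n};\nconst unsigned long %s_len = %lu;\n",
            name, (unsigned long)n);
    return ferror(out) ? -1 : 0;
}
```

A small main that reads an asset file into memory and calls this could then be the command your Makefile runs for each graphic file.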
At least the GNU linker (LD) seems capable of placing files in the sections of the output file (see the Section Placement documentation), like so:
.data : { afile.o bfile.o cfile.o }
But this sounds quite cumbersome, and it requires you to think about the sections of your executable file, which is often a bit too low-level. Also, it seems to require the input(s) to be object files, which kind of makes the problem circular, since a generic binary asset isn't a linker-compatible object file.
I would recommend going with the bin2c approach.
You can use the linker option --format, passing it to the linker with -Wl, like:
gcc -Wl,--format=binary -Wl,myfile.bin -Wl,--format=default
Setting the format back to default at the end switches the linker back to its standard input format.
You can access your binary resources from your sources via the assembler symbol _binary_myfile_bin_start (for myfile.bin; for xxx.yyy the symbols will be _binary_xxx_yyy_start and _binary_xxx_yyy_end), like:
extern uint8_t data[] asm("_binary_myfile_bin_start");
Then just use data. This is much better than running objcopy yourself or using resource hacking.
UPD: Expanding with a little example: main prints the first four bytes of its own object file:
#include <stdio.h>
#include <stdint.h>
/* start of the copy of main.o that ld embeds via --format=binary */
extern uint8_t data[] asm("_binary_main_o_start");
int
main(void)
{
    fprintf(stdout, "0x%x, 0x%x, 0x%x, 0x%x\n", data[0], data[1], data[2], data[3]);
    return 0;
}
Now compile and run:
$ gcc -o main.o -c main.c
$ gcc -o main main.o -Wl,--format=binary -Wl,main.o -Wl,--format=default
$ ./main
0x7f, 0x45, 0x4c, 0x46
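Those four bytes are the ELF magic number (0x7f followed by 'E', 'L', 'F'), because main.o is an ELF object file. A small C sketch that checks for it and also reads the class byte (EI_CLASS at offset 4: 1 means 32-bit, 2 means 64-bit); elf_class is a made-up name for this example.

```c
#include <stddef.h>

/* Return 32 or 64 if `buf` starts with an ELF header of that class,
 * or 0 if it does not look like an ELF file. */
int elf_class(const unsigned char *buf, size_t n)
{
    if (n < 5 || buf[0] != 0x7f || buf[1] != 'E' ||
        buf[2] != 'L' || buf[3] != 'F')
        return 0;
    switch (buf[4]) {          /* EI_CLASS */
    case 1: return 32;         /* ELFCLASS32 */
    case 2: return 64;         /* ELFCLASS64 */
    default: return 0;         /* ELFCLASSNONE or invalid */
    }
}
```

In the program above, elf_class(data, 5) would be expected to return 64 when main.o was built on a typical x86-64 Linux machine.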

How can I dump an abstract syntax tree generated by gcc into a .dot file?

I think the question's title is self-explanatory: I want to dump an abstract syntax tree generated by gcc into a .dot file (the files used by Graphviz), because I then want to view it as a .png file or similar. Is there any way I can do that?
There are two methods, both involving two steps:
Using GCC internal vcg support
Compile your code (say test.c) with vcg dumps
gcc -fdump-tree-vcg -g test.c
Use any third-party tool to get dot output from the vcg file:
graph-easy test.c.006t.vcg --as_dot
Compile with raw dumps and then preprocess them with some scripts to form dot files (like in this useful article)
Both methods have their good and bad sides. With the first, you can really only get one dump of the AST before GIMPLE translation, but it is easy. With the second, you can convert any raw dump to dot format, but you must maintain the scripts, which is overhead.
Which to prefer is your choice.
UPD: times are changing. A brand new option in gcc 4.8.2 makes it possible to generate dot files directly. Just supply:
gcc test.c -fdump-tree-all-graph
and you will get plenty of dot files, already formatted for you:
test.c.008t.lower.dot
test.c.012t.cfg.dot
test.c.016t.ssa.dot
... etc ...
Please be sure to use a recent version of GCC for this option.
According to the man page, you can get this information via the -fdump- options.
Let's look at a dummy example:
// main.c
int sum(int a, int b) {
    return a + b;
}
int main(void) {
    if (sum(8, 10) < 20) {
        return -1;
    }
    return 1;
}
For gcc 7.3.0:
gcc -fdump-tree-all-graph main.c -o main
There are a lot of options to get the necessary information. Check out the manual for this info.
After that, you'll get many files, some of them with a .dot representation (because the graph option is used):
main.c.003t.original
main.c.004t.gimple
main.c.006t.omplower
...
main.c.011t.cfg
main.c.011t.cfg.dot
...
With GraphViz we can retrieve a pretty-printed graph for each function:
dot -Tpng main.c.011t.cfg.dot -o main.png
You'll get main.png, a rendering of those graphs.
There are a lot of developer options that can help you understand how the compiler processes your file at a low level: GCC Developer Options

gcc recompile "make" result no changes

I'm using the CS50 Appliance.
I tried writing a new file, test.c, and found that as long as I include the int i line, make doesn't generate a new test program; if I remove that line and run make again, it does generate test. Then I made changes to test.c, but running test still prints the original file's result and doesn't reflect the new changes.
#include <stdio.h>
#include <cs50.h>
int
main (void)
{
    printf("Number: \n");
    int i = GetInt();
}
It was running properly before, though. Can anyone help, please?
Apparently your default rules for make run the compiler on test.c.
The compiler notices that you are assigning a value to variable i, but you never use that value in any way; it would normally report this as a warning.
Apparently either your compiler or make is configured so that this warning becomes a fatal error.
The remedy is to use the variable. It looks as though you need to pick up a book on the C programming language, or follow a course, if that's not what you're doing already.

sys_call_table in linux kernel 2.6.18

I am trying to assign the sys_exit system call entry to a variable with:
extern void *sys_call_table[];
real_sys_exit = sys_call_table[__NR_exit];
However, when I run make, I get the error:
error: ‘__NR_exit’ undeclared (first use in this function)
Any tips would be appreciated :) Thank you
Since you are on kernel 2.6.x, sys_call_table isn't exported any more.
If you want to avoid the compilation error, try this include:
#include<linux/unistd.h>
However, that alone will not make it work. The workaround to "play" with sys_call_table is to find its address in System.map-XXXX (located in /boot) with this command:
grep sys_call System.map-2.6.X -i
This will give the address; then this code should allow you to modify the table:
unsigned long *sys_call_table;
/* replace with the address grep found in System.map for your kernel */
sys_call_table = (unsigned long *) simple_strtoul("0xc0318500",NULL,16);
original_mkdir = sys_call_table[__NR_mkdir];
sys_call_table[__NR_mkdir] = mkdir_modificado;
Hope it works for you. I have just tested it under kernel 2.6.24, so it should work for 2.6.18.
Also check here; it's very good:
http://commons.oreilly.com/wiki/index.php/Network_Security_Tools/Modifying_and_Hacking_Security_Tools/Fun_with_Linux_Kernel_Modules
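The grep step can also be done from user space in C, for instance to print the address you then put into the module. A sketch, assuming the usual System.map line format (hex address, one-letter symbol type, symbol name); find_symbol_address is a hypothetical helper. Keep in mind the address is only valid for the exact kernel build that the System.map file belongs to.

```c
#include <stdio.h>
#include <string.h>

/* Scan System.map-style lines ("<hex address> <type> <symbol>") from
 * `map` and store the address of `symbol` in *addr.
 * Returns 1 if the symbol was found, 0 otherwise. */
int find_symbol_address(FILE *map, const char *symbol, unsigned long *addr)
{
    char line[256], type[8], name[200];
    unsigned long value;

    while (fgets(line, sizeof line, map) != NULL) {
        if (sscanf(line, "%lx %7s %199s", &value, type, name) == 3 &&
            strcmp(name, symbol) == 0) {
            *addr = value;
            return 1;
        }
    }
    return 0;
}
```

For example, open /boot/System.map-2.6.X with fopen and call find_symbol_address(map, "sys_call_table", &addr).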
If you haven't included the file syscall.h, you should do that before the reference to __NR_exit. For example,
#include <syscall.h>
#include <stdio.h>
int main()
{
    printf("%d\n", __NR_exit);
    return 0;
}
which returns:
$ cc t.c
$ ./a.out
60
Some other observations:
If you've already included the file, the usual reasons __NR_exit wouldn't be defined are that the definition is being skipped due to conditional compilation (#ifdef or #ifndef at work somewhere) or that it is being removed elsewhere through an #undef.
If you're writing code for kernel space, you have a completely different set of headers to use. LXR (http://lxr.linux.no/linux), a searchable, browsable archive of the kernel source, is a helpful resource.
