Embedding accessible text in binary - post-compilation - gcc

I have a rather weird question but I don't really know how to put it or where to start looking from.
My question is not about "embedding" a text file (we already have at compile time) - that is too obvious.
My question is if (and how) I could let's say "package" an existing (created by C) binary with a text file and generate a new... working binary with access to that file.
I'm a Mac user. I know that could work with an .app package and all that. But that's still not what I want. I want to be able to "tweak" an existing binary, add some (accessible - how?) additional text data to it, and the binary remaining absolutely functional.
Is that even possible?
P.S. The only serious tool I've looked into is bsdiff and bspatch but I'm not really sure it's what I'm looking for.

You can definitely do this, but the exact procedure is going to be different for every platform, with a few commonalities. Your tool of choice here is probably going to be llvm-objcopy.
At a high level, you will create a special segment or section in the binary (or both as in the case of MachO) containing the data you want, and then you'll have to parse your own executable to retrieve it. Since you said you're on a Mac, we can start there as an example.
Create the dumbest possible test program as a starting point:
test.c
#include <stdio.h>
int main(int argc, char **argv)
{
printf("I'm a binary!\n");
return 0;
}
Compile and run it:
prompt$ clang -o test test.c
prompt$ ./test
I'm a binary!
Now create a text file hello.txt and put some text in it:
Hello world!
You can embed this into the MachO file with llvm-objcopy:
llvm-objcopy --add-section __MAGIC,__magic_section=hello.txt test test
Then check that it still runs:
prompt$ ./test
I'm a binary!
You can verify that the section has been added using otool -l. You can also run strings on the binary, and you should see your text:
prompt$ strings ./test
I'm a binary!
Hello world!
Now you have to be able to retrieve this data. If you were compiling everything in a priori, it would be easy since the linker would make symbols for you marking the start and end of the __magic_section section that you added.
Since you specifically said this has to be an a posteriori step, you're going to have to actually self-parse the MachO file to look up the __magic_section section in the __MAGIC segment that you added. You can find a few different references for parsing MachO files, but you probably also want to make use of built in dyld functionality. See Apple's Reference on dyld utility calls that can for example give you the Mach header for the running process. Linux has similar functionality by way of dl_iterate_phdr.
Once you know where the section is in your original binary, you can retrieve the text.
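A minimal sketch of that lookup step, assuming a thin (non-universal) 64-bit MachO file whose byte order matches the host. The struct layouts are re-declared by hand so the example compiles on any platform; on a Mac the real definitions live in <mach-o/loader.h>. The name macho_find_section and the macho_* types are made up for this illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hand-written copies of the 64-bit MachO layouts (assumption: they mirror
   mach_header_64, segment_command_64 and section_64 in <mach-o/loader.h>). */
#define MACHO_MAGIC_64   0xfeedfacfu  /* MH_MAGIC_64 */
#define MACHO_SEGMENT_64 0x19u        /* LC_SEGMENT_64 */

struct macho_header64 {
    uint32_t magic, cputype, cpusubtype, filetype;
    uint32_t ncmds, sizeofcmds, flags, reserved;
};
struct macho_load_command { uint32_t cmd, cmdsize; };
struct macho_segment64 {
    uint32_t cmd, cmdsize;
    char     segname[16];
    uint64_t vmaddr, vmsize, fileoff, filesize;
    uint32_t maxprot, initprot, nsects, flags;
};
struct macho_section64 {
    char     sectname[16], segname[16];
    uint64_t addr, size;
    uint32_t offset, align, reloff, nreloc, flags;
    uint32_t reserved1, reserved2, reserved3;
};

/* Hypothetical helper: search the MachO image in buf[0..len) for section
   `sect` inside segment `seg`. On success returns 0 and stores the section's
   file offset and size; returns -1 if it is missing or the image is malformed. */
int macho_find_section(const unsigned char *buf, size_t len,
                       const char *seg, const char *sect,
                       uint32_t *offset, uint64_t *size)
{
    struct macho_header64 hdr;
    if (len < sizeof hdr) return -1;
    memcpy(&hdr, buf, sizeof hdr);
    if (hdr.magic != MACHO_MAGIC_64) return -1;

    size_t pos = sizeof hdr;               /* load commands follow the header */
    for (uint32_t i = 0; i < hdr.ncmds; i++) {
        struct macho_load_command lc;
        if (pos > len || len - pos < sizeof lc) return -1;
        memcpy(&lc, buf + pos, sizeof lc);
        if (lc.cmdsize < sizeof lc || lc.cmdsize > len - pos) return -1;

        if (lc.cmd == MACHO_SEGMENT_64 &&
            lc.cmdsize >= sizeof(struct macho_segment64)) {
            struct macho_segment64 sg;
            memcpy(&sg, buf + pos, sizeof sg);
            size_t spos = pos + sizeof sg; /* section headers follow the segment command */
            for (uint32_t j = 0; j < sg.nsects; j++) {
                struct macho_section64 sc;
                if (spos + sizeof sc > pos + lc.cmdsize) return -1;
                memcpy(&sc, buf + spos, sizeof sc);
                /* Names are fixed 16-byte fields, not always NUL-terminated. */
                if (strncmp(sc.segname, seg, 16) == 0 &&
                    strncmp(sc.sectname, sect, 16) == 0) {
                    *offset = sc.offset;
                    *size = sc.size;
                    return 0;
                }
                spos += sizeof sc;
            }
        }
        pos += lc.cmdsize;
    }
    return -1;
}
```

To use it on the running program you would first read your own executable into memory (on a Mac, _NSGetExecutablePath from <mach-o/dyld.h> gives its path), call macho_find_section(buf, len, "__MAGIC", "__magic_section", &off, &size), and then read size bytes starting at off. A universal binary begins with a fat header instead, which this sketch does not handle.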
To repeat all of this on Linux, you will do pretty much the same thing, but you'll be working with the ELF file format instead of MachO. The same principles would apply though.
As a side note: this is exactly how code signing works on macOS. A signature is generated and placed into a dedicated "signature" section in the binary to be read by the system on launch.

Related

How to determine if an ELF file is a Go ELF file?

I need to determine whether a given ELF file originated from Go. According to this link:
$ readelf -a traefik.stripped | grep "\.note\.go\.buildid"
Is this in any way inferior to go's native way:
$ go tool buildid traefik.stripped
oPIWoPjqt1P3rttpA3ee/ByNXPhvgS37nIoJY-HYB/8b25JYXrgktA-FYgU5MU/0Posfq41xZW9BEPEG4Ub
Are both methods guaranteed to work on stripped binaries?
I need to determine whether a given ELF file originated from Go
That is impossible to do in general. What is and isn't a Go binary is not well defined, and a sufficiently optimized Go binary may end up containing just a few instructions. E.g. on x86_64, you may end up with a single HLT instruction.
how come strip itself doesn't remove this section?
This section (indeed every section) is not necessary for execution -- you can remove all sections, and the binary will still work.
This section is present only to help developers identify a particular build. strip doesn't remove it by default because that would defeat the purpose of this section, but it certainly can do so.
can an innocent go developer build a golang ELF and accidentally remove this (redundant??) section
Sure. The developer can run a broken version of strip, or he can have aliased strip with strip --strip-all, or he could have used some other ELF post-processing tool, or he could have used UPX, or ...
The mentioned section is a NOTE section:
$ readelf -a traefik.stripped | grep "\.note\.go\.buildid" | sed -n "1,1p"
[11] .note.go.buildid NOTE 0000000000400f9c 00000f9c
And apparently NOTE sections might sometimes be removed for size reductions (related):
objcopy --remove-section=.note.go.buildid traefik.stripped traefik.super.stripped
Removing the mentioned section does not seem to harm the integrity of the whole binary.
With the standard Go tools the section should be there, but the Go origin of a binary can be hidden without any malicious intent. For example, using upx to reduce the size of the binary will completely hide it, as upx works with binaries from any language.
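For reference, the readelf | grep check can also be done programmatically. This is a minimal sketch, assuming Linux (for <elf.h>), a 64-bit ELF file, and a byte order that matches the host; elf_has_section is a made-up name. As the answers above point out, a missing section proves nothing, so treat a 0 result as "unknown" rather than "not Go".

```c
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the 64-bit ELF file at `path` has a
   section called `name`, 0 if it does not, and -1 on I/O or format errors. */
int elf_has_section(const char *path, const char *name)
{
    int result = -1;
    size_t nlen = 0;
    Elf64_Ehdr eh;
    Elf64_Shdr *sh = NULL;
    char *names = NULL;
    FILE *f = fopen(path, "rb");
    if (!f) return -1;

    if (fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64)
        goto done;

    /* No usable section header table (e.g. stripped away by a packer). */
    if (eh.e_shnum == 0 || eh.e_shstrndx >= eh.e_shnum) {
        result = 0;
        goto done;
    }
    if (eh.e_shentsize != sizeof(Elf64_Shdr)) goto done;

    /* Read the whole section header table. */
    sh = malloc((size_t)eh.e_shnum * sizeof *sh);
    if (!sh || fseek(f, (long)eh.e_shoff, SEEK_SET) != 0 ||
        fread(sh, sizeof *sh, eh.e_shnum, f) != eh.e_shnum)
        goto done;

    /* Read the section-name string table that e_shstrndx points at. */
    nlen = (size_t)sh[eh.e_shstrndx].sh_size;
    names = malloc(nlen + 1);
    if (!names || fseek(f, (long)sh[eh.e_shstrndx].sh_offset, SEEK_SET) != 0 ||
        fread(names, 1, nlen, f) != nlen)
        goto done;
    names[nlen] = '\0';

    /* sh_name is an offset into that string table. */
    result = 0;
    for (unsigned i = 0; i < eh.e_shnum; i++) {
        if (sh[i].sh_name < nlen && strcmp(names + sh[i].sh_name, name) == 0) {
            result = 1;
            break;
        }
    }
done:
    free(names);
    free(sh);
    fclose(f);
    return result;
}
```

Calling elf_has_section("traefik.stripped", ".note.go.buildid") would mirror the readelf command from the question. The sketch ignores the extended section numbering that ELF uses for files with a very large number of sections.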

debugging yacc YYDEBUG where is y.debug

I am trying to debug the yacc-generated component for awk (awk.g.c), but when I define YYDEBUG it includes y.debug, which I don't seem to have.
Where does y.debug come from?
Without it there are several references that are undefined.
I'm compiling the old 32v or V7 version of awk so I'm not sure if this is something that still exists.
Some versions of yacc (in particular, the AT&T version, still available as part of Plan 9) generated an additional file with the suffix .debug containing debugging information, notably the table which translated symbol numbers back into names. Modern yacc-alikes just insert this information into the generated C file, on the grounds that the memory consumption is basically trivial these days.
The name table might not be generated if you don't request it, but the way you ask for it depends on the yacc version:
1. Most bison versions only generate the table if the trace option is enabled. (Posix mandates -t for this, but bison provides a host of alternatives and not all historical yaccs complied.)
2. As indicated above, some really old yaccs put the name table into y.debug. The AT&T implementation, as I mentioned above, always did this, but guarded the #include line with a preprocessor conditional on YYDEBUG.
3. However, the yacc implementation you pointed to in a comment, which uses the conditionally-included y.debug mechanism, only generates the y.debug file if you invoke it with the -D flag. So that's what you need to do.
Background notes
I unearthed the information in point 3 from the V10 source linked in a comment. The download link is at the top of this page; that wasn't immediately obvious from the link in the comment. (That's the complete source tarball, which is about 70MB. The individual files linked to by the link in the comment have been HTMLised, which makes them a pain to work with.) I could have saved myself some time by reading the release notes (called yaccnews rather than CHANGES). The last note in that file describes the implementation, and I include the paragraph here since it has all the details on how debugging works in this particular yacc version.
8/11/81
Debugging changed. If the parser starts with %{#define YYDEBUG %} and yacc is invoked as yacc -D (for Debugging), then the parser uses an external variable named yydebug to control debugging output. If yydebug == 1, the parser prints out the text of the reduction when it performs one. If yydebug == 2, the parser also prints out the name of the token returned by each call to yylex, and if yydebug == 3, the parser also prints out the active items each time it changes state (this is uninteresting).
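To make the levels concrete, here is a minimal sketch of the gating the note describes. It is not the code that this yacc generates: the demo_* names and the tiny name table are hypothetical, and only the first two levels are shown.

```c
#include <stdio.h>

/* In a real grammar this would come from %{ #define YYDEBUG %}. */
#define YYDEBUG 1

/* The external variable the release note describes; 0 means no tracing. */
int yydebug = 0;

#ifdef YYDEBUG
/* Hypothetical name table mapping token numbers back to names; in the yacc
   discussed above, the real table is what ends up in y.debug. */
static const char *const demo_token_names[] = { "$end", "NUMBER", "PLUS" };
#endif

/* Hypothetical hook: prints the text of a reduction when yydebug >= 1.
   Returns 1 if something was printed, 0 otherwise. */
int demo_trace_reduction(FILE *out, const char *rule_text)
{
#ifdef YYDEBUG
    if (yydebug >= 1) {
        fprintf(out, "reduce: %s\n", rule_text);
        return 1;
    }
#endif
    return 0;
}

/* Hypothetical hook: prints the name of a token returned by yylex when
   yydebug >= 2. Returns 1 if something was printed, 0 otherwise. */
int demo_trace_token(FILE *out, int token)
{
#ifdef YYDEBUG
    int count = (int)(sizeof demo_token_names / sizeof demo_token_names[0]);
    if (yydebug >= 2 && token >= 0 && token < count) {
        fprintf(out, "token: %s\n", demo_token_names[token]);
        return 1;
    }
#endif
    return 0;
}
```

Without the name table the token hook has nothing to print, which is why the references are undefined when y.debug is missing.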
For what it's worth, it should be possible to generate a working, compilable parser using a modern yacc (such as bison or byacc). In the long run, that will probably be easier. (If you use bison and you require legacy yacc compatibility, you can use the -y flag. That flag is not supported by byacc, which claims to be legacy compatible regardless.)

Interpreting Mach-O Data

I am trying to tinker with the appearance of the Dock in OS X.
I have the raw data from the Dock's Mach-O executable, but I do not know much about them. I am trying to figure out where I might find the segments/sections where the Dock actually gets drawn. For example, I see all kinds of sections, such as __DATA,__mod_init_func and __DATA,__cfstring, and I am just wondering if there is an easy way to tell which of these sections (or even particular segments) has the data I'm looking for, or a way decompile the data into a more readable format.
You can't really "decompile" a Mach-O file unless you understand the format thoroughly. You can get some "human-readable" contents from the raw data, like its methods and instances, e.g.
-[AClass anInstance:]
would be something like:
-(id)anInstance:(id)arg1;
I would suggest some other tool for understanding this.
There are a few commands that you could use:
nm /path/to/mach-o // Prints the symbol table of a Mach-O executable
hexdump -C /path/to/mach-o // Shows a hexadecimal dump of a Mach-O executable
otool -t /path/to/mach-o // Outputs the raw (hexadecimal) "__TEXT,__text" section of a Mach-O executable (compare this with the hexdump -C output)
otool -tV /path/to/mach-o // Outputs the disassembled (human-readable-ish) "__TEXT,__text" section of a Mach-O executable
But if you really want to understand everything about Mach-O, I suggest downloading Hopper at https://www.hopperapp.com, which is good for showing you the bytes of a Mach-O binary and what they're for. Then you can have a look here: https://www3.nd.edu/~dthain/courses/cse40243/fall2015/intel-intro.html which will help you understand how the Mach-O is compiled and how to read the instructions it executes.
e.g.
1. Open Hopper and drag and drop in the Mach-O executable then wait for it to load.
2. Execute "otool -tV /path/to/mach-o" in Terminal.app
You can notice the difference between Hopper's and Terminal's output and begin to piece the differences together. You can then open the website I provided and learn what everything in the output is for.
I hope this helped you a little and gets you started on a search for knowledge.
You're welcome.

Boost.Tests: Specify function name instead of main

I am working on a project using the cmake build system. By default CMake has a nice framework for generating a single executable from a set of C/C++ code. The cmake function is called create_test_sourcelist. What it does is generate a C/C++ dispatcher with a single main entry point which will call other C/C++ code.
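For readers unfamiliar with it, the dispatcher idea can be sketched roughly as follows. This is not the code CMake generates: the table, the run_named_test helper, and the stub test bodies are hypothetical, and only the TestFunctionality1-style signature comes from this question.

```c
#include <stdio.h>
#include <string.h>

/* Stub tests using the signature from the question; the bodies are placeholders. */
int TestFunctionality1(int argc, char *argv[]) { (void)argc; (void)argv; return 0; }
int TestFunctionality2(int argc, char *argv[]) { (void)argc; (void)argv; return 1; }

typedef int (*test_fn)(int, char *[]);

/* One entry per test source file: the test's name and its entry point. */
static const struct {
    const char *name;
    test_fn fn;
} test_table[] = {
    { "TestFunctionality1", TestFunctionality1 },
    { "TestFunctionality2", TestFunctionality2 },
};

/* Look up a test by name and run it with the given arguments.
   Returns the test's own result, or -1 if no test has that name. */
int run_named_test(const char *name, int argc, char *argv[])
{
    for (size_t i = 0; i < sizeof test_table / sizeof test_table[0]; i++) {
        if (strcmp(test_table[i].name, name) == 0)
            return test_table[i].fn(argc, argv);
    }
    fprintf(stderr, "unknown test: %s\n", name);
    return -1;
}
```

The single main of such a driver would then take the test name from argv[1] and forward the remaining arguments, along the lines of run_named_test(argv[1], argc - 1, argv + 1).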
Therefore I have a bunch of C/C++ files with a function signature such as: int TestFunctionality1(int argc, char *argv[]), which I'd like to keep as-is, unless of course it means even more work.
How can I keep this system in place and start using BOOST_CHECK? I could not figure out how to specify that the actual main entry point is not called int main(int argc, char *argv[]).
My goal is to have a framework for integration with Jenkins. Since the project already uses Boost, I believe this should be doable without rewriting the existing CMake test suite and changing every test into an independent main function.
Unfortunately, it seems there is no straightforward and clean way to do that.
On one side, the only useful function of create_test_sourcelist is to generate a test driver: a (pretty simple, naive, and hard to hack or extend) C/C++ translation unit based on ${cmake-prefix}/share/cmake/Templates/TestDriver.cxx.in (and there is no way to select some other template).
On the other side, Boost UTF offers its own test runner (the equivalent of a test driver in CMake's terminology), but in every variant (static, dynamic, single-header) it contains a definition of main() one way or another (even in the case of an external test runner).
…so you end up with two main() functions and no ability to choose a single one.
Digging a little into the sources of create_test_sourcelist, I really wonder why they implemented it as a command and not as an ordinary (external) CMake module: it doesn't do anything special (nothing that couldn't be implemented in the CMake language). This command is really stupid: it doesn't check that the required functions really exist (you'll get compile errors if something is wrong). There is no way to flexibly customize the output source file at all. All it does is strip paths and extensions from a list of source files and substitute them into the mentioned template using an ordinary configure_file()…
So, personally, I see no reason to use it at all. That is why I've done the same (but better ;) job in the module mentioned in the comment above.
If you still want to use that command, the generated test driver is completely useless if you want to use Boost UTF. You need to provide your own initialization function anyway (and it is not main()), where you can manually register your test cases into a master test suite (or organize your tests into some more complex tree). In that case there is absolutely no reason to use create_test_sourcelist! All you can get from it is a list of sources that needs to be given to add_executable()… but that is much easier to do with set()… This command can't even help you with the list of test functions (actually the list of filenames without extensions) to call (it is used internally and not exported). Do you still want to use that command??

How can I add sections to an existing (OS X) executable?

Is there any way of adding sections to an already-linked executable?
I'm trying to code-sign an OS X executable, based on the Apple instructions. These include the instruction to create a suitable section in the binary to be signed, by adding arguments to the linker options:
-sectcreate __TEXT __info_plist Info.plist_path
But: The executable I'm trying to sign is produced using Racket (a Scheme implementation), which assembles a standalone executable from Racket/scheme code by cloning the 'real' racket executable and editing the Mach-O file directly.
So the question is: is there a way I can further edit this executable, to add the section which is required for the code-signing?
Using ld doesn't work when used in the obvious way:
% ld -arch i386 -sectcreate __TEXT __info_plist ./hello.txt racket-executable
ld: in racket-executable, can't link with a main executable
%
That seems fair enough, I suppose. Libtool doesn't have any likely-looking options, and neither does the redo_prebinding command (which is at least a command for editing executables).
The two possibilities suggested by the relevant Racket list were (i) to extend the racket compilation tool to adjust the surgery which is done on the executable (feasible, but scary), or (ii) to create a custom racket executable which has the desired section already in place. Both seem like sledgehammer-and-nut solutions. The macosx-dev list didn't come up with any suggestions.
I think this is infeasible.
There appear to be no stock commands which edit a Mach-O object file (which includes executables). The otool command allows you to view the structure of such a file (use otool -l), but that's about it.
The structure of a Mach-O object file is documented on the Apple reference site. In summary, a Mach-O object file has the following structure:
a header, followed by
a sequence of 'load commands', followed by
a sequence of 'segments' (some of the load commands are responsible for pointing to the offsets of the segments within the file)
The segments contain zero or more 'sections'. The header and load commands are deemed to be in the first segment of the file, before any of that segment's sections. There are a couple of dozen load commands documented, and other commands defined in the relevant header file, therefore clearly private.
Adding a section would imply changing the length of a segment. Unless the section were very small, this would require pushing the following segment further into the file. That's a no-no, because a lot of the load commands refer to data within the file with absolute offsets from the beginning of the file (as opposed to, say, the beginning of the segment or section which contains them), so that relocating the segments in a putative Mach-O editor would involve patching up a large number of offsets. That doesn't look easy.
There are one or two Mach-O file viewers on the web, but none which do much that's different from otool, as far as I can see (it's not particularly hard: I wrote most of one this afternoon whilst trying to understand the format). There's at least one Mach-O editor, but it didn't seem to do anything that lipo didn't do (called 'moatool', but the source appears to have disappeared from google code).
Oh well. Time to find a Plan B, I suppose.
The gimmedebugah tool is able to modify the __TEXT,__info_plist section of an existing binary. See https://reverse.put.as/2013/05/28/gimmedebugah-how-to-embedded-a-info-plist-into-arbitrary-binaries/
It is available here: https://github.com/gdbinit/gimmedebugah

Resources