I need to determine whether a given ELF file originated from Go. According to this link, I can look for the Go build ID note section:
$ readelf -a traefik.stripped | grep "\.note\.go\.buildid"
Is this in any way inferior to Go's native way:
$ go tool buildid traefik.stripped
oPIWoPjqt1P3rttpA3ee/ByNXPhvgS37nIoJY-HYB/8b25JYXrgktA-FYgU5MU/0Posfq41xZW9BEPEG4Ub
Are both methods guaranteed to work on stripped binaries?
I need to determine whether a given ELF file originated from Go
That is impossible to do in general. What is and isn't a Go binary is not well defined, and a sufficiently optimized Go binary may end up containing just a few instructions. E.g. on x86_64, you may end up with a single HLT instruction.
how come strip itself doesn't remove this section?
This section (indeed every section) is not necessary for execution -- you can remove all sections, and the binary will still work.
This section is present only to help developers identify a particular build. strip doesn't remove it by default because that would defeat the purpose of this section, but it certainly can do so.
can an innocent go developer build a golang ELF and accidentally remove this (redundant??) section
Sure. The developer could run a broken version of strip, or have strip aliased to strip --strip-all, or use some other ELF post-processing tool, or use UPX, or ...
The mentioned section is a NOTE section:
$ readelf -a traefik.stripped | grep "\.note\.go\.buildid" | sed -n "1,1p"
[11] .note.go.buildid NOTE 0000000000400f9c 00000f9c
And apparently NOTE sections might sometimes be removed for size reductions (related):
objcopy --remove-section=.note.go.buildid traefik.stripped traefik.super.stripped
Removing the mentioned section does not seem to harm the integrity of the whole binary.
With the standard Go tools the section should be there, but the Go nature of a binary can be hidden without any malicious intent. Compressing the binary with UPX hides its Go origin completely, since UPX works with binaries from any language.
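For anyone who wants the readelf check without shelling out, here is a minimal sketch in C. It assumes a 64-bit ELF file in the host's byte order and ignores extended section numbering; the function name elf64_has_section is made up for this example. As the answers above point out, a result of 0 proves nothing: the section may have been stripped, or the section table removed altogether.

```c
#include <assert.h>
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the ELF64 file at `path` has a section
 * called `name`, 0 if it does not (or has no usable section table), and -1
 * on I/O error or if the file is not a 64-bit ELF. */
int elf64_has_section(const char *path, const char *name)
{
    Elf64_Ehdr eh;
    Elf64_Shdr *sh = NULL;
    char *names = NULL;
    size_t tab_size = 0;
    int result = -1;
    FILE *f;

    assert(path != NULL && name != NULL);
    f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    if (fread(&eh, sizeof eh, 1, f) != 1)
        goto done;
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64)
        goto done;

    /* A binary with every section removed still runs; report "not found". */
    result = 0;
    if (eh.e_shnum == 0 || eh.e_shstrndx >= eh.e_shnum ||
        eh.e_shentsize != sizeof(Elf64_Shdr))
        goto done;

    /* Load the section header table. */
    result = -1;
    sh = malloc((size_t)eh.e_shnum * sizeof *sh);
    if (sh == NULL || fseek(f, (long)eh.e_shoff, SEEK_SET) != 0 ||
        fread(sh, sizeof *sh, eh.e_shnum, f) != eh.e_shnum)
        goto done;

    /* Load the section name string table. */
    tab_size = (size_t)sh[eh.e_shstrndx].sh_size;
    names = malloc(tab_size + 1);
    if (names == NULL ||
        fseek(f, (long)sh[eh.e_shstrndx].sh_offset, SEEK_SET) != 0 ||
        fread(names, 1, tab_size, f) != tab_size)
        goto done;
    names[tab_size] = '\0';

    /* Compare each section's name against the one requested. */
    result = 0;
    for (unsigned i = 0; i < eh.e_shnum; i++) {
        if (sh[i].sh_name < tab_size &&
            strcmp(names + sh[i].sh_name, name) == 0) {
            result = 1;
            break;
        }
    }

done:
    free(names);
    free(sh);
    fclose(f);
    return result;
}
```

Calling elf64_has_section("traefik.stripped", ".note.go.buildid") would be the equivalent of the readelf | grep pipeline in the question.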
Related
I have a rather weird question but I don't really know how to put it or where to start looking from.
My question is not about "embedding" a text file that we already have at compile time - that is too obvious.
My question is if (and how) I could let's say "package" an existing (created by C) binary with a text file and generate a new... working binary with access to that file.
I'm a Mac user. I know that could work with an .app package and all that. But that's still not what I want. I want to be able to "tweak" an existing binary, add some (accessible - how?) additional text data to it, and have the binary remain fully functional.
Is that even possible?
P.S. The only serious tool I've looked into is bsdiff and bspatch but I'm not really sure it's what I'm looking for.
You can definitely do this, but the exact procedure is going to be different for every platform, with a few commonalities. Your tool of choice here is probably going to be llvm-objcopy.
At a high level, you will create a special segment or section in the binary (or both as in the case of MachO) containing the data you want, and then you'll have to parse your own executable to retrieve it. Since you said you're on a Mac, we can start there as an example.
Create the dumbest possible test program as a starting point:
test.c
#include <stdio.h>

int main(int argc, char **argv)
{
    printf("I'm a binary!\n");
    return 0;
}
Compile and run it:
prompt$ clang -o test test.c
prompt$ ./test
I'm a binary!
Now create a text file hello.txt and put some text in it:
Hello world!
You can embed this into the MachO file with llvm-objcopy:
prompt$ llvm-objcopy --add-section __MAGIC,__magic_section=hello.txt test test
Then check that it still runs:
prompt$ ./test
I'm a binary!
You can verify that the section has been added using otool -l. You can also run strings on the binary, and you should see your text:
prompt$ strings ./test
I'm a binary!
Hello world!
Now you have to be able to retrieve this data. If you were compiling everything in a priori, it would be easy since the linker would make symbols for you marking the start and end of the __magic_section section that you added.
Since you specifically said this has to be an a posteriori step, you're going to have to actually self-parse the MachO file to look up the __magic_section section in the __MAGIC segment that you added. You can find a few different references for parsing MachO files, but you probably also want to make use of built-in dyld functionality. See Apple's reference on dyld utility calls, which can, for example, give you the Mach header for the running process. Linux has similar functionality by way of dl_iterate_phdr.
Once you know where the section is in your original binary, you can retrieve the text.
To repeat all of this on Linux, you will do pretty much the same thing, but you'll be working with the ELF file format instead of MachO. The same principles would apply though.
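On the Linux side, here is a minimal sketch of what dl_iterate_phdr gives you, assuming glibc. The struct main_image and the function names are made up for this example. It only reports where the main executable is mapped and how many loadable segments it has. A section added with objcopy --add-section is typically not part of any loadable segment, so you would usually still open the executable file itself and read the section's bytes from there.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <link.h>
#include <stddef.h>

/* Hypothetical result type for this example. */
struct main_image {
    ElfW(Addr) base;   /* load bias of the main program (0 for non-PIE) */
    int n_load;        /* number of PT_LOAD program headers */
    int seen;          /* set once the callback has run */
};

/* Callback for dl_iterate_phdr. glibc visits the main program first,
 * so we record it and stop by returning a nonzero value. */
static int visit_object(struct dl_phdr_info *info, size_t size, void *data)
{
    struct main_image *out = data;
    (void)size;

    out->base = info->dlpi_addr;
    for (int i = 0; i < info->dlpi_phnum; i++) {
        if (info->dlpi_phdr[i].p_type == PT_LOAD)
            out->n_load++;
    }
    out->seen = 1;
    return 1;
}

/* Fills `out` for the running executable; returns 0 on success, -1 otherwise. */
int inspect_main_image(struct main_image *out)
{
    assert(out != NULL);
    out->base = 0;
    out->n_load = 0;
    out->seen = 0;
    dl_iterate_phdr(visit_object, out);
    return out->seen ? 0 : -1;
}
```

The program headers describe segments, not sections, which is why the section lookup itself still has to go through the section header table in the file on disk.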
As a side note: this is very similar to how code signing works on macOS. A signature is generated and embedded in the binary (in the __LINKEDIT segment, located through the LC_CODE_SIGNATURE load command) to be read by the system on launch.
I'm trying to debug the yacc-generated component for awk (awk.g.c), but when I define YYDEBUG it includes y.debug, which I don't seem to have.
Where does y.debug come from?
Without it there are several references that are undefined.
I'm compiling the old 32v or V7 version of awk so I'm not sure if this is something that still exists.
Some versions of yacc (in particular, the AT&T version, still available as part of Plan 9) generated an additional file with the suffix .debug containing debugging information, notably the table which translated symbol numbers back into names. Modern yacc-alikes just insert this information into the generated C file, on the grounds that the memory consumption is basically trivial these days.
The name table might not be generated if you don't request it, but the way you ask for it depends on the yacc version:
1. Most bison versions only generate the table if the trace option is enabled. (POSIX mandates -t for this, but bison provides a host of alternatives and not all historical yaccs complied.)
2. As indicated above, some really old yaccs put the name table into y.debug. The AT&T implementation, as I mentioned above, always did this, but guarded the #include line with a preprocessor conditional on YYDEBUG.
3. However, the yacc implementation you pointed to in a comment, which uses the conditionally-included y.debug mechanism, only generates the y.debug file if you invoke it with the -D flag. So that's what you need to do.
Background notes
I unearthed the information in point 3 from the V10 source linked in a comment. The download link is at the top of this page; that wasn't immediately obvious from the link in the comment. (That's the complete source tarball, which is about 70MB. The individual files linked to by the link in the comment have been HTMLised, which makes them a pain to work with.) I could have saved myself some time by reading the release notes (called yaccnews rather than CHANGES). The last note in that file describes the implementation, and I include the paragraph here since it has all the details on how debugging works in this particular yacc version.
8/11/81
Debugging changed. If the parser starts with %{#define YYDEBUG %} and yacc is invoked as yacc -D (for Debugging), then the parser uses an external variable named yydebug to control debugging output. If yydebug == 1, the parser prints out the text of the reduction when it performs one. If yydebug == 2, the parser also prints out the name of the token returned by each call to yylex, and if yydebug == 3, the parser also prints out the active items each time it changes state (this is uninteresting).
For what it's worth, it should be possible to generate a working, compilable parser using a modern yacc (such as bison or byacc). In the long run, that will probably be easier. (If you use bison and you require legacy yacc compatibility, you can use the -y flag. That flag is not supported by byacc, which claims to be legacy compatible regardless.)
In my code, I have a special section and some data in it:
#pragma section("dead",read,discard)
__declspec(allocate("dead"))
static const char dead_str[] = "DEAD";
doSomething(dead_str);
That section may or may not be loaded with the program, but is still part of the binary image. What I want to do is completely remove it from the image so it's guaranteed to not be loaded, and so that it cannot be found in the binary. Basically, I want:
strings myprogram.exe | grep DEAD
to have no hits. I'm fine with the program crashing when I try to reference the string.
In GNU, I'd do:
objcopy --remove-section=dead myprogram.exe
and Cygwin's objcopy actually does that, but corrupts the executable so it can no longer load. editbin.exe from MSVC can just change the flags for the section, but it stays in the image. And linker optimizations will not remove dead_str because it is referenced in doSomething().
Is there some way (ideally as a linker flag) I can remove an entire section from PE/COFF files?
I was searching for a linker setting too, but it seems this is not possible at all with MSVC.
What you can do is either write your own tool to strip sections from PE files, or use an existing one such as CFF Explorer.
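If you go the "write your own tool" route, here is a minimal sketch of a first step, assuming the whole PE file has been read into memory. Note that this is a different technique from what the question asks for: it does not remove the section, it only overwrites the section's raw bytes with zeros so that strings no longer finds them. The section header stays in the table, and any signature on the file would no longer match. The function name pe_scrub_section is made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read little-endian integers without assuming alignment or host byte order. */
static uint16_t rd16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: zero the raw data of section `name` inside the PE
 * image `img` of `len` bytes. Returns the number of bytes scrubbed, or -1
 * if the image is malformed or the section is not found. */
long pe_scrub_section(unsigned char *img, size_t len, const char *name)
{
    size_t name_len, pe, opt_size, table;
    unsigned nsect;
    char padded[8] = {0};

    assert(img != NULL && name != NULL);
    name_len = strlen(name);
    if (name_len > 8 || len < 0x40 || img[0] != 'M' || img[1] != 'Z')
        return -1;
    memcpy(padded, name, name_len);   /* section names are NUL-padded to 8 bytes */

    pe = rd32(img + 0x3C);            /* e_lfanew: offset of the "PE\0\0" signature */
    if (pe > len || len - pe < 24 || memcmp(img + pe, "PE\0\0", 4) != 0)
        return -1;

    nsect = rd16(img + pe + 6);       /* COFF NumberOfSections */
    opt_size = rd16(img + pe + 20);   /* COFF SizeOfOptionalHeader */
    table = pe + 24 + opt_size;       /* section table follows the optional header */
    if (table > len || (len - table) / 40 < nsect)
        return -1;

    for (unsigned i = 0; i < nsect; i++) {
        unsigned char *sh = img + table + 40u * i;
        size_t raw_size, raw_off;

        if (memcmp(sh, padded, 8) != 0)
            continue;
        raw_size = rd32(sh + 16);     /* SizeOfRawData */
        raw_off = rd32(sh + 20);      /* PointerToRawData */
        if (raw_off > len || raw_size > len - raw_off)
            return -1;
        memset(img + raw_off, 0, raw_size);
        return (long)raw_size;
    }
    return -1;
}
```

After writing the modified buffer back to disk, code that reads dead_str would see zero bytes instead of "DEAD".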
My C/Objective-C code (an iOS app built with clang) has some functions excluded by #ifdefs. I want to make sure that code which is called only from those functions, and not from anywhere else (dead code), gets stripped out (eliminated) at link time.
I tried:
Adding a local literal char[] in a function that should be eliminated; the string is still visible when running strings on the executable.
Adding a function that should be eliminated; the function name is still visible when running strings.
Before you ask, I'm building for release, and all strip settings (including dead-code stripping, obviously) are enabled.
The question is not really Xcode/Apple/iOS specific; I assume the answer should be pretty much the same on any POSIX development platform.
In binutils, ld has the --gc-sections option, which discards unreferenced sections; it works at the granularity of the sections in each object file. You have several options:
use gcc's flags -ffunction-sections and -fdata-sections to isolate each symbol into its own section, then use --gc-sections;
put all candidates for removal into a separate file and the linker will be able to strip the whole section;
disassemble the resulting binary, remove dead code, assemble again;
use strip with appropriate -N options to discard the offending symbols from the symbol table - this will leave the code and data there, but it won't show up in the symbol table.
Is there any way of adding sections to an already-linked executable?
I'm trying to code-sign an OS X executable, based on the Apple instructions. These include the instruction to create a suitable section in the binary to be signed, by adding arguments to the linker options:
-sectcreate __TEXT __info_plist Info.plist_path
But: The executable I'm trying to sign is produced using Racket (a Scheme implementation), which assembles a standalone executable from Racket/scheme code by cloning the 'real' racket executable and editing the Mach-O file directly.
So the question is: is there a way I can further edit this executable, to add the section which is required for the code-signing?
Using ld doesn't work when used in the obvious way:
% ld -arch i386 -sectcreate __TEXT __info_plist ./hello.txt racket-executable
ld: in racket-executable, can't link with a main executable
%
That seems fair enough, I suppose. Libtool doesn't have any likely-looking options, and neither does the redo_prebinding command (which is at least a command for editing executables).
The two possibilities suggested by the relevant Racket list were (i) to extend the racket compilation tool to adjust the surgery which is done on the executable (feasible, but scary), or (ii) to create a custom racket executable which has the desired section already in place. Both seem like sledgehammer-and-nut solutions. The macosx-dev list didn't come up with any suggestions.
I think this is infeasible.
There appear to be no stock commands which edit a Mach-O object file (which includes executables). The otool command allows you to view the structure of such a file (use otool -l), but that's about it.
The structure of a Mach-O object file is documented on the Apple reference site. In summary, a Mach-O object file has the following structure:
a header, followed by
a sequence of 'load commands', followed by
a sequence of 'segments' (some of the load commands are responsible for pointing to the offsets of the segments within the file)
The segments contain zero or more 'sections'. The header and load commands are deemed to be in the first segment of the file, before any of that segment's sections. There are a couple of dozen load commands documented, plus other commands which are defined in the relevant header file but not documented, and are therefore presumably private.
Adding a section would imply changing the length of a segment. Unless the section were very small, this would require pushing the following segment further into the file. That's a no-no, because a lot of the load commands refer to data within the file with absolute offsets from the beginning of the file (as opposed to, say, the beginning of the segment or section which contains them), so that relocating the segments in a putative Mach-O editor would involve patching up a large number of offsets. That doesn't look easy.
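To make the offset problem concrete, here is a minimal sketch that walks the load commands of a 64-bit Mach-O image held in memory and counts the LC_SEGMENT_64 commands whose file offset lies at or beyond a given insertion point. The struct layouts are local copies of the documented on-disk format so the code compiles without Apple's headers; it assumes the file is in the host's byte order, and the function name is made up for this example. It only looks at segment commands, so the real amount of patching needed would be larger: symbol table and other load commands carry absolute file offsets too.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MH_MAGIC_64_VALUE   0xfeedfacfu  /* 64-bit Mach-O magic */
#define LC_SEGMENT_64_VALUE 0x19u        /* 64-bit segment load command */

/* Local copy of the 64-bit Mach-O header layout (32 bytes). */
struct mh64 {
    uint32_t magic, cputype, cpusubtype, filetype;
    uint32_t ncmds, sizeofcmds, flags, reserved;
};

/* Local copy of the 64-bit segment command layout (72 bytes). */
struct seg64 {
    uint32_t cmd, cmdsize;
    char     segname[16];
    uint64_t vmaddr, vmsize, fileoff, filesize;
    uint32_t maxprot, initprot, nsects, flags;
};

/* Hypothetical helper: returns how many LC_SEGMENT_64 commands have
 * fileoff >= insert_off, or -1 if the buffer is not a well-formed
 * 64-bit Mach-O image. */
int macho64_segments_at_or_after(const unsigned char *buf, size_t len,
                                 uint64_t insert_off)
{
    struct mh64 h;
    size_t pos;
    int count = 0;

    assert(buf != NULL);
    if (len < sizeof h)
        return -1;
    memcpy(&h, buf, sizeof h);
    if (h.magic != MH_MAGIC_64_VALUE)
        return -1;

    pos = sizeof h;                    /* load commands follow the header */
    for (uint32_t i = 0; i < h.ncmds; i++) {
        uint32_t cmd, cmdsize;

        if (len - pos < 8)
            return -1;
        memcpy(&cmd, buf + pos, 4);
        memcpy(&cmdsize, buf + pos + 4, 4);
        if (cmdsize < 8 || cmdsize > len - pos)
            return -1;

        if (cmd == LC_SEGMENT_64_VALUE) {
            struct seg64 s;

            if (cmdsize < sizeof s)
                return -1;
            memcpy(&s, buf + pos, sizeof s);
            /* fileoff is measured from the start of the file, so this
             * command would need patching if bytes were inserted earlier. */
            if (s.fileoff >= insert_off)
                count++;
        }
        pos += cmdsize;
    }
    return count;
}
```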
There are one or two Mach-O file viewers on the web, but none which do much that's different from otool, as far as I can see (it's not particularly hard: I wrote most of one this afternoon whilst trying to understand the format). There's at least one Mach-O editor, called 'moatool', but it didn't seem to do anything that lipo didn't do, and the source appears to have disappeared from Google Code.
Oh well. Time to find a Plan B, I suppose.
The gimmedebugah tool is able to modify the __TEXT,__info_plist section of an existing binary. See https://reverse.put.as/2013/05/28/gimmedebugah-how-to-embedded-a-info-plist-into-arbitrary-binaries/
It is available here: https://github.com/gdbinit/gimmedebugah