Suggestions on how to write a debug format conversion tool - debugging

I'm looking to write a tool that aims to convert debug symbols of one format to another format that's compatible for use under GDB. This seems like a tedious and potentially complex project so I'm not exactly sure how to tackling it.
Intially I'm aiming to convert the Turbo Debug Symbol table(TDS) emitted from borland compilers into something like stabs or dwarf format(seems like dwarf is prefer from my research). But ideally I want to design my tool to be easy enough to extend so it could convert other formats too later on. e.g. codeview4 or maybe even pdb.
My primary motivation for creating this are:
Interoperability. If I can convert a foreign debug format into a form gdb can work with then source-level debugging would be possible on binaries compiled from another compiler other than gcc. This means any frontend debugging interface that uses gdb as a backend will work as well.
No other tools exist. I did a google searching around for similar tools and the closest I've found is tds2dbg. But it doesn't quite do what I'm looking for.
What I have to work with at the moment:
I already have a debug hook API that can understand the TDS debug format. I can use that to help me get at the needed information from the source format I'm converting from.
For the scope of this project, I'm mainly interested in getting this to work under the win32 environment. Other platforms and tools I'm not really concerned about.
The target dwarf debug format I'm converting to. This one I'm really not familiar with at all. I have used gcc ported compilers like MinGW before and debugged them with gdb with the dwarf format. But I don't have any idea how this format is implemented on windows.
The last point is the one I'm concerned about. I'm reading through the dwarf spec documentation but I find I'm having trouble really understanding and comprehending how it works. There's so much detail in there but at the same time it doesn't have any details about how dwarf gets implemented on object files and image files on a platform that doesn't use ELF natively -- namely the PE-COFF format that windows uses. The documentation is also a very dry read, long sentences make it hard to understand and diagrams and illustrations are sparse. I came across an API called libDwarf that should take most of the parsing work out of interpreting dwarf. The problem is I'm still trying to get it to build and I don't know yet how it will work out.
I haven't written any code yet since I don't fully understand what it is I need to build. I have a feeling the biggest hurtle will be figuring out how to work with dwarf due to it's complexity. Googling for information on how dwarf works under windows hasn't turned up anything helpful either. Like for example, there's no information about the 'glue' code that's needed to contain dwarf within a PE executable image file. How are the dwarf sections exactly layed out? Are there any header information for each section? GDB clearly doesn't just take a 'raw' dwarf debug file and use it as is. So what kind of format does gdb expect the debug file to be in for it to be able to work with it?
My question is, how can I start on such a project? More importantly, where can I turn to for help when I inevitably get stuck on a problem?

Affinic Assembler for Windows
Affinic Assembler is an x86/x86-64 assembler for Windows that takes GAS-syntax assembly source with DWARF debug information and generates corresponding CodeView format sections in object file in order to make the linked program debuggable in Visual Studio. This program is good for Cygwin and MinGW users to port Linux code to Windows.
http://www.affinic.com/?page_id=48

You are asking several questions here :-)
I think you are heading in the right direction, using libdwarf.
BUT, have you taken a look at objcopy to see if this tool can do some of the work for you? It probably doesn't support borland, pdb or codeview4, but it might be worth looking into. (Another approach may be to extend objcopy to support the formats you are trying to convert between.)
I have used the dwarf-discuss mailing list sometimes when I have become stuck.
http://lists.dwarfstd.org/listinfo.cgi/dwarf-discuss-dwarfstd.org
As for the questions on dwarf, split them into separate questions and I will do my best to
answer them. :-)

Related

How to inspect a MacOS executable file (Mach-O)?

How to see some basic information about a MacOS binary file, specifically what was used to build it, which frameworks it is linked against and what system calls it is using?
I tried nm and otool -L but their output is only partially useful.
For example, what would indicate that a binary was built with xcode or golang compiler?
NB. I am not interested in reverse engineering macos binaries. I just want to know better what is running on my system using just the tools that are already included OOTB.
The compiler used is not a trivial question. Some compilers embed information about themselves, and you sometimes find that by running strings, but it's not guaranteed. That said, the answer on Mac is almost always clang, so it's not usually that hard to get the basics. As an example:
strings iTunes | grep clang
COMPILER=clang-9.0.0
But that's just luck (and might not even be accurate).
IDA Pro does a good job at this, and is the gold standard for reverse engineering work. If this kind of thing is important to you, then IDA Pro is the tool. It's expensive.
To get a list of frameworks that it links at runtime, otool -L is the tool you want. I don't understand what you mean by "isn't very readable." It just prints out all the frameworks, one per line. It's hard to imagine anything more readable than that. What are you looking for here?
otool -L will not tell you what static libraries were used, so "which frameworks it was built against" may be too broad to answer. You can generally find the symbols for well-known static libraries (OpenSSL for instance), but there is no easy way to know precisely what was included in a binary, particularly if debugging information is stripped. (If there's debugging information, then the filepaths will tend to be available, which can tell you a lot more about how it was built.)
There is no easy static way to get a list of all system calls, since those may be buried in libraries. nm is generally what you want, though. It includes a lot of things that aren't "system" calls, though, since it's going to include every symbol linked from external sources (you probably want something like nm -ju). Again, I'm not sure what you mean by "isn't very readable." It's one symbol per line.
In order to get the list of system calls at runtime, run the application with dtruss. It'll output every system call as it's made, and will focus on actual "system" calls (i.e. syscall).

gdb, how to step into c runtime? Where is crt_c.c?

When I'm stepping into debugged program, it says that it can't find crt/crt_c.c file. I have sources of gcc 6.3.0 downloaded, but where is crt_c.c in there?
Also how can I find source code for printf and rand in there? I'd like to step through them in debugger.
Ide is codeblocks, if that's important.
Edit: I'm trying to do so because I'm trying to decrease size of my executable. Going straight into freestanding leaves me with a lot of missing functions, so I intend to study and replace them one by one. I'm trying to do that to make my program a little smaller and faster, and to be able to study assembly output a bit easier.
Also, forgot to mention, I'm on windows, msys2. But answer is still helpful.
How can I find source code for printf and rand in there?
They (printf, rand, etc....) are part of your C standard library which (on Linux) is outside of the GCC compiler. But crt0 is provided by GCC (however, is often not compiled with debug information) and some C files there are generated in the build tree during compilation of GCC.
(on Windows, most of the C standard library is proprietary -inside some DLL provided by MicroSoft- and you are probably forbidden to look into the implementation or to reverse-engineer it; AFAIK EU laws might mention some exception related to interoperability¸ but then you need to consult a lawyer and I am not a lawyer)
Look into GNU glibc (or perhaps musl-libc) if you want to study its source code. libc is generally using system calls (listed in syscalls(2)) provided by the Linux kernel.
I'd like to step through them in debugger.
In practice you won't be able to do that easily, because the libc is provided by your distribution and has generally been compiled without debug information in DWARF format.
Some Linux distributions provide a debuggable variant of libc, perhaps as some libc6-dbg package.
(your question lacks motivation and smells like some XY problem)
I intend to study and replace them one by one.
This is very unrealistic (particularly on Windows, whose system call interface is not well documented) and could take you many years (or perhaps more than a lifetime). Do you have that much time?
Read also Operating Systems: Three Easy Pieces and look into OsDev wiki.
I'm trying to do so because I'm trying to decrease size of my executable.
Wrong approach. A debugger needs debug info (e.g. in DWARF) which will increase the size of the executable (but could later be stripped). BTW standard C functions are in some common shared library (or DLL on Windows) which is used by many processes.
I'm on windows, msys2.
Bad choice. Windows is proprietary. Linux is made of free software (more than ten billions lines of source code, if you consider all useful packages inside a typical Linux distribution), whose source code you could study (even if it would take several lifetimes).

Debugging Assembly Code (Intel 8086)

I'm in an Assembly class focusing on the intel 8086 architecture (all compiling / linking / execution comes from running DOS on win7 via DOS-Box).
I've finished programming the latest assignment, but as I have yet to program any program successfully the first time through, I am now stuck trying to debug my code.
I have visual studio 2010 and was wondering if there was some built in feature that would help me debug my assembly code, specifically, I'm looking to track the value of a variable.
Failing that, instructions pointing to a DOS-Box debugger (and instructions!) would be much appreciated. (I think I've been able to run codeview debug, but I couldn't figure out how to do what I was looking for).
You are generating 16-bit code, you have to break into a museum to find better tooling. Try Borland's, maybe the debugger included with Turbo C.
Yes, indeed, you can use the debugger in VS to examine pretty much everything. Irvine's site has a section specifically on using the debugger here. You can examine registers, use the watch window, etc. He also has a guide for highlighting asm keywords if you need that.
Edit: as Hans pointed out, if you are using 16-bit instead of 32-bit protected, you'll need different tools. There are several choices, listed here.
Borland's tools for DOS were called tasm, tlink, and tdebug.

Can the Visual Studio ARM Assembler produce binaries that don't require an OS?

I'll admit upfront that I don't know a whole lot about ARM development, so I probably have by information wrong here.
Visual Studio comes with an ARM assembler (armasm.exe), which is extremely convenient because I use the tools included with VS for basically everything and I'm not too wild about paying for an ARM assembler that comes bundled with a C compiler that I'll never use from other companies.
Now, my understanding is that ARM binaries that are run on-the-metal need to be in a pure binary format instead of something like ELF or PE. Is ARMASM capable of outputting binaries that can run without an operating system? The MSDN documentation for ARMASM appears to be lacking in regards to that type of information.
If not, can you recommend a free ARM assembler that provides macro support and doesn't come bundled with a bunch of extra fluff?
The assembler just produces object files. It's up to the linker to produce the final, executable, file. I'm pretty sure Microsoft uses pretty much their usual linker, which produces PE format executables (which is a COFF variant, in case you care). Offhand, I don't know of a linker/locator that will take MS-COFF format object files and produce a pure binary output file (though that hardly means one doesn't exist -- I've never really looked for one).
Also note that running on the bare metal most means burning your file to some variant of ROM. That means you really don't need a pure binary output file -- what you really need is a file suitable for a ROM burner. That usually means Motorola S-records or Intel hex format (quite a few ROM burners accept both).
I know that doesn't give you a "final answer", but it should at least give you a few terms suitable for Googling to get more relevant information...

Is There a Way to Tell What Language Was Used for a Program?

I have a desktop program I downloaded and installed. It runs from an .exe file.
Is there some way from the .exe file to tell what programming language was used to write the program?
Are there any tools are available to help with this?
What languages can be determined and which ones cannot?
Okay here are two of the sort of things I'm looking for:
Tips to Determine Whether an App is Written in Delphi or Not
This "IsDelphi" program by Bruce McGee will find all applications built with Delphi, Delphi for .Net or C++ Builder that are on your hard drive.
I use WinDowse (a small freeware utility written in Delphi) to spy the windows of the program.. for example if you look at the "Class" TabSheet you can discover the "Class" Name of the control..
For example:
TFormXX, TEditYY, TPanelZZZ for delphi apps
WindowsForms10.XXXX.yyy, for .NET apps
wxWindowsXXX for wxWindows apps
AfxWndXX for MFC/VC++ apps (I think)
I think this is the fastest way (although not the most accurate) to find information about apps..
I understand your curiosity.
You can identify Delphi and C++ Builder apps and their SKU by looking for a couple of specific resources that the linker adds. Specifically RC Data\DVCLAL and RC DATA\PACKAGEINFO. The XN Resource Editor makes this a lot easier, but it might choke on compressed EXEs.
EXE compressors complicate things a little. They can hide or scramble the contents of the resources. Programs compressed with UPX are easy to identify with a HEX editor because the first 2 sections in the PE header are named UPX0 and UPX1. You can use the app to decompress these.
Applications compiled with .Net aren't difficult to detect. Recent versions of Delphi even include an IsAssembly function, or you could do a little spelunking in the PE header. Check out the IsManaged function in IsDelphi.
Telling which .Net language was used is trickier. By default, VB.Net includes a reference to Microsoft.VisualBasic, and VCL.Net apps included Borland specific references. However, VCL.Net is defunct in favour of Delphi Prism, and you can add a reference to the VB assembly to any managed language.
I haven't looked at some of the apps that use signatures to identify the the compiler, so I don't know how well they work.
I hope this helps.
First, look to see what run time libraries it loads. A C program won't normally load Visual Basic's library.
Also, examine the executable for telltale strings. In most executables, this is near the end. If the program uses string constants, there might be a clue in how they are stored.
A good disassembler, plus of course an excellent understanding of the underlying CPU architecture, can often help you identify the runtime libraries that are in play. Unless the exe has been carefully "stripped" of symbols and/or otherwise masked, the names of symbols seen in runtime libraries will often provide you with programming-language hints, because different languages' standards specify different names, and vendors of compilers and accompanying runtime libraries usually respect those standards pretty closely.
Of course, you won't get there without knowledge of the various possible languages and their library standards -- and if the code's author was intent to mask the information, that's not too hard for them to do, either.
If you have available a large set of samples from known compilers, I should think this would be an excellent application for machine learning. I believe so-called "supervised learning" is relevant here. Unfortunately I know next to nothing about the topic—only that I have heard some impressive results presented at conferences.
You might dig through the proceedings of the Working Conference on Reverse Engineering to see if anyone else is interested in this problem.
Assuming this is an application for Windows...
Does Reflector recognize it as a .NET assembly? Then it's MSIL, 99% either VB or C#, but you'll likely never know which, nor does it matter.
Does it need an intrepreter (like Java?)? Then it's Java (or whatever the interpreter is.)
Check what runtime DLLs it requires.
Does it require the VB runtime dlls? Congratulations, VB from VisualStudio 6.0 or earlier.
Does it require the Delphi dlls? Congratulations, Delphi.
Did you make it this far? C/C++. Assume C++ unless it requires msys or cygwin dlls, in which case C has maybe a 25% chance.
Congratulations, this should come out correct for the vast majority of Windows software. This probably doesn't actually help you though, as a lot of the same things can be done in all of these languages.
IDA Pro Free (http://www.hex-rays.com/idapro/idadownfreeware.htm) may be helpful. Even if you don't understand assembly language, if you load the EXE into IDA Pro then its initial progress output might (if there are any telltale signs) include its best guess as to which compiler was used.
Start with various options to dumpbin. The symbol names, if not carefully erased, will give you all kinds of hints as to whether it is C, C++, CLR, or something else.
Other tools use signatures to identify the compiler used to create the executable, like PEiD, CFF Explorer and others.
They normally scan the entry point of the executable vs the signature.
Signature Explorer from CFF Explorer can give you an understanding of how one signature is constructed.
It looks like the VC++ linker from V6 up adds a signature to the PE header which youcan parse.
i suggest PEiD (freeware, closed source). Has all of Delphi for Win32 signatures, also can tell you which was packer used (if any).

Resources