Decompile a Mac Kernel Extension? - macos

Is it possible to decompile a Mac kernel extension?

In theory it is possible to decompile any binary code.
Kernel extensions are a little bit tricky because
a) they're C++, so virtual methods make the code harder to follow.
b) linking happens differently in kernel extensions, so any decompiler would need to be specially designed to handle kernel extensions in order to find dependencies and symbol names.

You can use gdb (as nate c suggested) to inspect the assembly code of a kernel extension. I'm not aware of any decompilers for kernel extensions specifically.
You can use the kextload tool to create a symbols file that you can load into gdb. This will let you see decoded symbol names for functions, &c. There's a crash (haha, get it?) tutorial here: http://praveenmatanam.wordpress.com/2008/05/22/kext-debugging-on-mac/
Why do you want to do this?

It is no problem to decompile 32-bit kexts using the Hex-Rays decompiler.
Decompiling C++ code means you have to define your structs in the right way: when an object has virtual methods, the first field of the object will be a pointer to the object's vtable.
If you declare the vtable in IDA or Hex-Rays as well, and make sure all the types of the function pointers are correct, Hex-Rays will produce quite readable code.
But chances are that the parts of the kext you are interested in were written in C-like C++, and you don't need to worry about that at all.
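For example, the layout described above can be declared in IDA's Local Types roughly as follows (a minimal sketch; the class name, methods, and fields are invented for illustration and have to be replaced with whatever you recover from the kext):

```c
/* Hypothetical reconstruction of a C++ driver class for Hex-Rays.
 * The real names come from the kext's symbols or your own analysis. */
struct MyDriver;                        /* forward declaration */

struct MyDriver_vtbl {
    /* One function pointer per virtual method, in the same order as the
     * binary's vtable; the first argument is the implicit 'this' pointer. */
    int  (*probe)(struct MyDriver *thiz, void *provider);
    int  (*start)(struct MyDriver *thiz, void *provider);
    void (*stop)(struct MyDriver *thiz, void *provider);
};

struct MyDriver {
    struct MyDriver_vtbl *vtbl;         /* first field: pointer to the vtable */
    int                   state;        /* example data member */
};
```

Once the object's type is set to MyDriver *, Hex-Rays can render the indirect calls through the vtable (obj->vtbl->start(obj, provider) and so on) in readable form.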

For reversing 64-bit kexts, acquire IDA Pro and the x64 decompiler (any of the Mac/Linux/Windows versions).
Also, you can usually debug a kext (without symbols) using an lldb remote setup. (gdb is gone.)
If you happen to work for a large security shop, do the song-and-dance: sign an NDA, give away the rights to your firstborn, and just get the OS X source.
Also, here's a large list of decompilers:
https://en.wikibooks.org/wiki/X86_Disassembly/Disassemblers_and_Decompilers

Related

gdb, how to step into the C runtime? Where is crt_c.c?

When I'm stepping into the debugged program, it says that it can't find the crt/crt_c.c file. I have the sources of GCC 6.3.0 downloaded, but where is crt_c.c in there?
Also, how can I find the source code for printf and rand in there? I'd like to step through them in a debugger.
The IDE is Code::Blocks, if that's important.
Edit: I'm doing this because I'm trying to decrease the size of my executable. Going straight to a freestanding build leaves me with a lot of missing functions, so I intend to study and replace them one by one. I want to make my program a little smaller and faster, and to be able to study the assembly output a bit more easily.
Also, I forgot to mention: I'm on Windows, MSYS2. But the answer is still helpful.
How can I find source code for printf and rand in there?
They (printf, rand, etc.) are part of your C standard library, which (on Linux) is outside of the GCC compiler. But crt0 is provided by GCC (it is, however, often not compiled with debug information), and some of its C files are generated in the build tree during the compilation of GCC.
(On Windows, most of the C standard library is proprietary, inside some DLL provided by Microsoft, and you are probably forbidden to look into the implementation or to reverse-engineer it; AFAIK, EU law might mention some exception related to interoperability, but then you need to consult a lawyer, and I am not a lawyer.)
Look into GNU glibc (or perhaps musl libc) if you want to study its source code. libc generally uses system calls (listed in syscalls(2)) provided by the Linux kernel.
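As a quick illustration of that last point (a Linux-only sketch, not something that applies to the MSYS2 setup in the question): what printf eventually does deep inside glibc is issue a write(2) system call on stdout, which you can also do directly through the generic syscall(2) wrapper:

```c
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void)
{
    const char msg[] = "hello from a raw system call\n";

    /* Roughly what printf("...") boils down to inside the C library:
     * a write(2) system call on file descriptor 1 (stdout). */
    syscall(SYS_write, 1, msg, strlen(msg));
    return 0;
}
```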
I'd like to step through them in debugger.
In practice you won't be able to do that easily, because the libc is provided by your distribution and has generally been compiled without debug information in DWARF format.
Some Linux distributions provide a debuggable variant of libc, perhaps as some libc6-dbg package.
(your question lacks motivation and smells like some XY problem)
I intend to study and replace them one by one.
This is very unrealistic (particularly on Windows, whose system call interface is not well documented) and could take you many years (or perhaps more than a lifetime). Do you have that much time?
Read also Operating Systems: Three Easy Pieces and look into OsDev wiki.
I'm trying to do so because I'm trying to decrease size of my executable.
Wrong approach. A debugger needs debug info (e.g. in DWARF), which will increase the size of the executable (but it could later be stripped). BTW, the standard C functions are in some common shared library (or DLL on Windows) which is used by many processes.
I'm on windows, msys2.
Bad choice. Windows is proprietary. Linux is made of free software (more than ten billion lines of source code, if you consider all the useful packages inside a typical Linux distribution), whose source code you could study (even if it would take several lifetimes).

Can I mix arm-eabi with arm-elf?

I have a product whose bootloader and application are compiled using a compiler (GNUARM GCC 4.1.1) that generates "arm-elf" output.
The bootloader and application are segregated into different FLASH memory areas in the linker script.
The application has a feature that enables it to call the bootloader (as a simple C function with 2 parameters).
I need to be able to upgrade existing products around the world, and I can safely do this as long as I always use the same compiler.
Now I'd like to be able to compile this product application using a new GCC version that outputs arm-eabi.
Everything will be fine for new products, where both application and bootloader are compiled using the same toolchain, but what happens with existing products?
If I flash a new application, compiled with GCC 4.6.x and arm-none-eabi, will my application still be able to call the bootloader function from the old arm-elf bootloader?
Furthermore, not directly related to the above question, can I mix object files compiled with arm-elf into a binary compiled with arm-eabi?
EDIT:
I think it's good to make clear that I am building for a bare-metal ARM7, if it makes any difference...
No. An ABI is the magic that makes binaries compatible. The Application Binary Interface determines various conventions on how to communicate with other libraries/applications. For example, an ABI defines the calling convention, which makes implicit assumptions about things like which registers are used for passing arguments to C functions and how to deal with excess arguments.
I don't know the exact differences between EABI and ABI, but you can find some of them by reading up on EABI. Debian's page mentions the syscall convention is different, along with some alignment changes.
Given the above, of course, you cannot mix arm-elf and arm-eabi objects.
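One concrete, well-known example of such a difference is how 64-bit arguments are placed in registers (a sketch of the register-assignment rules; check the generated assembly of your own toolchains to confirm):

```c
/* long long f(int a, long long b);  called as  f(1, 2LL);
 *
 * Legacy APCS (old arm-elf toolchains):  a -> r0, b -> r1:r2
 * AAPCS/EABI (arm-eabi toolchains):      a -> r0, r1 left unused,
 *                                        b -> r2:r3 (even/odd register pair)
 *
 * A caller built with one convention and a callee built with the other will
 * disagree about where 'b' lives, and the call silently misbehaves. */
long long f(int a, long long b)
{
    return a + b;
}
```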
The above answer is given on the assumption that you talk to the bootloader code in your main application. Given that the interface may be very simple (just a function call with two parameters), it's possible that it might work. It'd be an interesting experiment to try. However, it is not **guaranteed** to work.
Please keep in mind you do not have to use the EABI. You can generate an arm-elf toolchain with GCC 4.6 just as well as with older versions. Since you're using a binary toolchain on Windows, you may have more of a challenge. I'd suggest investigating crosstool-ng, which works quite well on Linux, and may work okay on Cygwin to build the appropriate toolchain.
There is always the option of making the call to the bootloader in inline assembly, in which case you can adhere to any calling standard you need :). A sketch follows after the two caveats below.
However, besides the portability issue it introduces, this approach will also make two assumptions about your bootloader and application:
you are able to detect in your app that a particular device has a bootloader built with your non-EABI toolchain, as you can only call the older-type bootloader using the assembly code.
the two parameters you mentioned are used as primitive data by your bootloader. Should the bootloader use them, for example, as pointers to structs, then you could be facing issues with incorrect alignment, padding, and so forth.
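Here is a minimal sketch of such a call for a bare-metal ARM7 (ARMv4T) built with GCC, assuming the caller runs in ARM (not Thumb) state and the old bootloader expects its two parameters in r0 and r1; the entry address is a placeholder you would take from the old linker script:

```c
/* Hypothetical fixed entry point of the legacy (arm-elf) bootloader. */
#define BOOTLOADER_ENTRY 0x00000000u

static void call_old_bootloader(unsigned long arg0, unsigned long arg1)
{
    /* Pin the arguments to the exact registers the old bootloader expects. */
    register unsigned long r0 asm("r0") = arg0;
    register unsigned long r1 asm("r1") = arg1;
    register unsigned long fn asm("r4") = BOOTLOADER_ENTRY;

    /* ARMv4T has no 'blx <reg>', so set lr by hand and branch with bx.
     * The "+r" constraints tell GCC the bootloader may change these registers. */
    asm volatile(
        "mov lr, pc\n\t"
        "bx  %[fn]"
        : "+r"(r0), "+r"(r1), [fn] "+r"(fn)
        :
        : "r2", "r3", "r12", "lr", "cc", "memory");
}
```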
I think that this will be OK. I did a migration something like this myself; from what I remember, I only ran into a problem to do with handling division.
This is the best info I can find about the differences; it suggests that if you don't have struct alignment issues, you may be OK.

stdio's printf and Windows Driver

I want to use printf in driver code (DDK), so I've included stdio.h. But the linker says:
error LNK2001: unresolved external symbol __imp__printf
Any ideas? I've seen somewhere that it is not possible, but that's awful; I can't believe it. Why can't I use standard C routines in kernel code?
C functions like printf come from a static cstd.lib or something, AFAIK, don't they?
Why would WDK provide me with stdio.h then?
The Windows kernel only supports part of the standard C runtime. In particular, high-level functionality — like file streams, console I/O, and networking — is not supported. Instead, you need to use native kernel APIs for similar functionality.
The reason stdio.h is included with the WDK is that some parts of the C runtime are provided for your convenience. For example, you can use memcmp (although the native RtlCompareMemory is preferred). Microsoft has not picked through the CRT headers to #ifdef out the bits and pieces that are not available in kernel mode. Once you develop some experience writing kernel drivers, you'll get the hang of what's possible in the kernel and what probably won't work.
To address your high-level question: you're probably looking for some debug/logging mechanism. You really have two options:
DbgPrintEx is the easiest to use. It's basically a drop-in for printf (although you need to be careful about certain types of string inserts when running at >= DISPATCH_LEVEL). Output goes to the debugger or, if you like, to DbgView. A minimal usage sketch follows after this list.
WPP is the industrial-strength option. The initial learning curve is pretty steep (although there are samples in the WDK). However, it is very flexible (e.g., you can create your own shrieks, like Print("My IP address is: %!IPV4!", ip);), and it is very fast (Microsoft ships WPP tracing in the non-debug builds of most Windows components).
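For the first option, here is a minimal sketch of a DbgPrintEx call from DriverEntry (the component ID and level shown are common choices for third-party drivers; INFO-level output is masked by default unless you enable it via the Debug Print Filter registry key or the kernel debugger's component mask):

```c
#include <ntddk.h>

DRIVER_INITIALIZE DriverEntry;

NTSTATUS DriverEntry(PDRIVER_OBJECT DriverObject, PUNICODE_STRING RegistryPath)
{
    UNREFERENCED_PARAMETER(DriverObject);

    /* printf-style output that shows up in the kernel debugger / DbgView.
     * DriverEntry runs at PASSIVE_LEVEL, so %wZ (UNICODE_STRING) is safe here;
     * at >= DISPATCH_LEVEL you must avoid pageable string arguments. */
    DbgPrintEx(DPFLTR_IHVDRIVER_ID, DPFLTR_INFO_LEVEL,
               "MyDriver: DriverEntry, registry path = %wZ\n",
               RegistryPath);

    return STATUS_SUCCESS;
}
```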

Can the Visual Studio ARM Assembler produce binaries that don't require an OS?

I'll admit upfront that I don't know a whole lot about ARM development, so I probably have my information wrong here.
Visual Studio comes with an ARM assembler (armasm.exe), which is extremely convenient because I use the tools included with VS for basically everything, and I'm not too wild about paying another company for an ARM assembler that comes bundled with a C compiler I'll never use.
Now, my understanding is that ARM binaries that run on the metal need to be in a pure binary format instead of something like ELF or PE. Is ARMASM capable of outputting binaries that can run without an operating system? The MSDN documentation for ARMASM appears to be lacking in regard to that type of information.
If not, can you recommend a free ARM assembler that provides macro support and doesn't come bundled with a bunch of extra fluff?
The assembler just produces object files. It's up to the linker to produce the final, executable, file. I'm fairly sure Microsoft uses pretty much its usual linker, which produces PE-format executables (a COFF variant, in case you care). Offhand, I don't know of a linker/locator that will take MS COFF object files and produce a pure binary output file (though that hardly means one doesn't exist -- I've never really looked for one).
Also note that running on the bare metal mostly means burning your file to some variant of ROM. That means you really don't need a pure binary output file -- what you really need is a file suitable for a ROM burner. That usually means Motorola S-records or Intel hex format (quite a few ROM burners accept both).
I know that doesn't give you a "final answer", but it should at least give you a few terms suitable for Googling to get more relevant information...

Is There a Way to Tell What Language Was Used for a Program?

I have a desktop program I downloaded and installed. It runs from an .exe file.
Is there some way from the .exe file to tell what programming language was used to write the program?
Are there any tools available to help with this?
What languages can be determined and which ones cannot?
Okay, here are two examples of the sort of thing I'm looking for:
Tips to Determine Whether an App is Written in Delphi or Not
This "IsDelphi" program by Bruce McGee will find all applications built with Delphi, Delphi for .Net or C++ Builder that are on your hard drive.
I use WinDowse (a small freeware utility written in Delphi) to spy on the windows of the program. If you look at the "Class" tab sheet, you can discover the class name of the control (a small programmatic sketch of the same idea follows after the list below).
For example:
TFormXX, TEditYY, TPanelZZZ for Delphi apps
WindowsForms10.XXXX.yyy for .NET apps
wxWindowsXXX for wxWindows apps
AfxWndXX for MFC/VC++ apps (I think)
I think this is the fastest way (although not the most accurate) to find information about apps.
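The same class-name information that WinDowse displays can be pulled with plain Win32 calls; a small sketch that lists the window class of every visible top-level window:

```c
#include <stdio.h>
#include <windows.h>

/* Print the window class (and title) of every visible top-level window;
 * the class-name prefixes (TForm..., WindowsForms10..., Afx...) give the
 * same framework hints described above. */
static BOOL CALLBACK print_window_class(HWND hwnd, LPARAM lparam)
{
    (void)lparam;
    if (IsWindowVisible(hwnd)) {
        wchar_t cls[256] = L"", title[256] = L"";
        GetClassNameW(hwnd, cls, 256);
        GetWindowTextW(hwnd, title, 256);
        wprintf(L"%-40ls %ls\n", cls, title);
    }
    return TRUE;  /* keep enumerating */
}

int main(void)
{
    EnumWindows(print_window_class, 0);
    return 0;
}
```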
I understand your curiosity.
You can identify Delphi and C++ Builder apps and their SKU by looking for a couple of specific resources that the linker adds. Specifically, RCDATA\DVCLAL and RCDATA\PACKAGEINFO. The XN Resource Editor makes this a lot easier, but it might choke on compressed EXEs.
EXE compressors complicate things a little. They can hide or scramble the contents of the resources. Programs compressed with UPX are easy to identify with a hex editor because the first 2 sections in the PE header are named UPX0 and UPX1. You can use UPX itself to decompress these.
Applications compiled with .Net aren't difficult to detect. Recent versions of Delphi even include an IsAssembly function, or you could do a little spelunking in the PE header. Check out the IsManaged function in IsDelphi.
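A sketch of that PE-header spelunking in C: map the file and check whether the CLR (COM descriptor) data directory is present. It assumes the target's bitness matches this program's, since PE32 and PE32+ lay out the optional header differently (a robust tool would check OptionalHeader.Magic first); "target.exe" is a placeholder path.

```c
#include <stdio.h>
#include <windows.h>

/* Returns 1 if the image has a CLR (COM descriptor) directory, i.e. is a
 * .NET assembly; 0 if not; -1 on error. */
static int is_managed_exe(const wchar_t *path)
{
    int managed = -1;
    HANDLE file = CreateFileW(path, GENERIC_READ, FILE_SHARE_READ, NULL,
                              OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (file == INVALID_HANDLE_VALUE)
        return -1;

    HANDLE mapping = CreateFileMappingW(file, NULL, PAGE_READONLY, 0, 0, NULL);
    if (mapping != NULL) {
        const BYTE *base = (const BYTE *)MapViewOfFile(mapping, FILE_MAP_READ, 0, 0, 0);
        if (base != NULL) {
            const IMAGE_DOS_HEADER *dos = (const IMAGE_DOS_HEADER *)base;
            const IMAGE_NT_HEADERS *nt  = (const IMAGE_NT_HEADERS *)(base + dos->e_lfanew);
            const IMAGE_DATA_DIRECTORY *clr =
                &nt->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_COM_DESCRIPTOR];
            managed = (clr->VirtualAddress != 0 && clr->Size != 0) ? 1 : 0;
            UnmapViewOfFile(base);
        }
        CloseHandle(mapping);
    }
    CloseHandle(file);
    return managed;
}

int main(void)
{
    wprintf(L"managed: %d\n", is_managed_exe(L"target.exe"));  /* placeholder path */
    return 0;
}
```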
Telling which .Net language was used is trickier. By default, VB.Net includes a reference to Microsoft.VisualBasic, and VCL.Net apps included Borland-specific references. However, VCL.Net is defunct in favour of Delphi Prism, and you can add a reference to the VB assembly to any managed language.
I haven't looked at some of the apps that use signatures to identify the compiler, so I don't know how well they work.
I hope this helps.
First, look to see what runtime libraries it loads. A C program won't normally load Visual Basic's library.
Also, examine the executable for telltale strings. In most executables, this is near the end. If the program uses string constants, there might be a clue in how they are stored.
A good disassembler, plus of course an excellent understanding of the underlying CPU architecture, can often help you identify the runtime libraries that are in play. Unless the exe has been carefully "stripped" of symbols and/or otherwise masked, the names of symbols seen in runtime libraries will often provide you with programming-language hints, because different languages' standards specify different names, and vendors of compilers and accompanying runtime libraries usually respect those standards pretty closely.
Of course, you won't get there without knowledge of the various possible languages and their library standards -- and if the code's author was intent on masking the information, that's not too hard for them to do, either.
If you have available a large set of samples from known compilers, I should think this would be an excellent application for machine learning. I believe so-called "supervised learning" is relevant here. Unfortunately I know next to nothing about the topic—only that I have heard some impressive results presented at conferences.
You might dig through the proceedings of the Working Conference on Reverse Engineering to see if anyone else is interested in this problem.
Assuming this is an application for Windows...
Does Reflector recognize it as a .NET assembly? Then it's MSIL, 99% either VB or C#, but you'll likely never know which, nor does it matter.
Does it need an interpreter (like Java)? Then it's Java (or whatever the interpreter is).
Check what runtime DLLs it requires. (A quick sketch for dumping them follows after this answer.)
Does it require the VB runtime dlls? Congratulations, VB from VisualStudio 6.0 or earlier.
Does it require the Delphi dlls? Congratulations, Delphi.
Did you make it this far? C/C++. Assume C++ unless it requires MSYS or Cygwin DLLs, in which case C has maybe a 25% chance.
Congratulations, this should come out correct for the vast majority of Windows software. This probably doesn't actually help you though, as a lot of the same things can be done in all of these languages.
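A sketch of the "check what runtime DLLs it requires" step: map the image without running it and walk its import table (the file name is a placeholder, and the EXE's bitness must match this program's):

```c
#include <stdio.h>
#include <windows.h>

/* List the DLLs an executable imports: MSVBVM60.DLL suggests classic VB,
 * mscoree.dll suggests .NET, *.bpl packages suggest Delphi/C++ Builder,
 * msvcr*.dll / vcruntime*.dll suggest MSVC C/C++. */
int main(void)
{
    /* DONT_RESOLVE_DLL_REFERENCES maps the image without executing it;
     * "target.exe" is a placeholder. */
    HMODULE mod = LoadLibraryExW(L"target.exe", NULL, DONT_RESOLVE_DLL_REFERENCES);
    if (mod == NULL) {
        fprintf(stderr, "could not map image (error %lu)\n", GetLastError());
        return 1;
    }

    const BYTE *base = (const BYTE *)mod;
    const IMAGE_DOS_HEADER *dos = (const IMAGE_DOS_HEADER *)base;
    const IMAGE_NT_HEADERS *nt  = (const IMAGE_NT_HEADERS *)(base + dos->e_lfanew);
    const IMAGE_DATA_DIRECTORY *dir =
        &nt->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_IMPORT];

    if (dir->VirtualAddress != 0) {
        const IMAGE_IMPORT_DESCRIPTOR *imp =
            (const IMAGE_IMPORT_DESCRIPTOR *)(base + dir->VirtualAddress);
        for (; imp->Name != 0; ++imp)    /* descriptor list is zero-terminated */
            printf("%s\n", (const char *)(base + imp->Name));
    }

    FreeLibrary(mod);
    return 0;
}
```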
IDA Pro Free (http://www.hex-rays.com/idapro/idadownfreeware.htm) may be helpful. Even if you don't understand assembly language, if you load the EXE into IDA Pro then its initial progress output might (if there are any telltale signs) include its best guess as to which compiler was used.
Start with various options to dumpbin. The symbol names, if not carefully erased, will give you all kinds of hints as to whether it is C, C++, CLR, or something else.
Other tools, such as PEiD and CFF Explorer, use signatures to identify the compiler used to create the executable.
They normally match the executable's entry point against the signature database.
Signature Explorer from CFF Explorer can give you an understanding of how one signature is constructed.
It looks like the VC++ linker from V6 up adds a signature to the PE header which you can parse.
I suggest PEiD (freeware, closed source). It has all the Delphi for Win32 signatures, and it can also tell you which packer was used (if any).
