Rust library for inspecting .rlib binaries

I'm looking for a way to load and inspect .rlib binaries generated by rustc. I've hunted around the standard library without much luck. My assumption is that an .rlib contains all the type information necessary to statically type check programs that "extern crate" it. rustc::metadata is where my hunt ended. I can't quite figure out if the structures available at this point in the compiler are intended as entry points for users, or if they are solely intermediate abstractions depending on a chain of previously initialized data.
Alternatively, if there's a way to dump an .rlib to stdout in a parsable form, that would also be fantastic. I tried /usr/bin/nm, but it seemed to be excluding function type signatures. Maybe I'm missing something.
Anyway, I'm working on an editor utility for Emacs that I hope will at some point provide contextually relevant information such as available methods, module items and their types, etc. I'd really appreciate any hints anyone has.

The .rlib file is an ar archive file. You can use readelf to read its content.
Try readelf -s <your_lib>.rlib. The symbol names may be mangled/decorated by the compiler, so they may not look exactly the same as in the .rs file.
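For example, on Linux (file name illustrative; the library is assumed to have been built with rustc --crate-type=lib foo.rs):
ar t libfoo.rlib          # the .rlib is an ar archive: list its members
ar x libfoo.rlib          # optionally extract the members into the current directory
readelf -s libfoo.rlib    # readelf accepts ar archives and prints each member's symbol table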

Related

Linking against an external object file (.o) with autoconf

For work purposes I need to link against an object file that is generated by another program and found in its folder, but I did not find information about this kind of linkage. I think that hardcoding the path and putting name-of-obj.o in front of the package_LDADD variable should work, but I don't want to do it that way.
If the object is not found I want the configure to fail and tell the user that the name-of-obj.o is missing.
I tried using AC_LIBOBJ([name-of-obj.o]), but that tries to find a name-of-obj.c in the root directory and compile it.
Any tip or solution around this issue?
Thank you!
I need to link against an object file generated by another program and
found in its folder
What you describe is a very unusual requirement, not among those that the Autotools are designed to handle cleanly or easily. In particular, Autoconf has no mechanisms specifically applicable to searching for bare object files, as opposed to libraries, and Automake has no particular automation around including such objects when it links. Nevertheless, these tools do have enough general purpose functionality to do what you want; it just won't be as tidy as you might like.
I think that if I hardcode the paths and put the
name-of-obj.o in front of the package_LDADD variable should work, but
the case is that I don't want to do it that way.
I take it that it is the "hardcode the paths" part that you want to avoid. Adding an item to an appropriate LDADD variable is not negotiable; it is the right way to get your object included in the link.
If the object is not found I want the configure to fail and tell the
user that the name-of-obj.o is missing.
Well, then, the key thing appears to be to get configure to perform a search for your object file. Autoconf does not have a built-in mechanism to perform such a search, but it's just a macro-based shell-script generator, so you can write such a search in shell script + Autoconf, maybe something like this:
AC_MSG_CHECKING([for name-of-obj.o])
OTHER_LOCATION=
for my_dir in \
    /some/location/other_program/src \
    /another/location/other_program.12345/src \
    $srcdir/../relative/location/other_program/src; do
  AS_IF([test -r "${my_dir}/name-of-obj.o"], [
    # optionally, perform any desired test to check that the object is usable
    # ... perhaps one using AC_LINK_IFELSE ...
    # if it passes, then
    OTHER_LOCATION=${my_dir}
    break
  ])
done
# Check whether the object was in fact discovered, and act appropriately
AS_IF([test "x${OTHER_LOCATION}" = x], [
# Not found
AC_MSG_RESULT([not found])
AC_MSG_ERROR([Cannot configure without name-of-obj.o])
], [
AC_MSG_RESULT([${OTHER_LOCATION}/name-of-obj.o])
AC_SUBST([OTHER_LOCATION])
])
That's functional, but of course you could embellish, such as by providing for the package builder to specify a location to use via a command-line argument (AC_ARG_WITH(...)). And if you want to do this for multiple objects, then you would probably want to wrap up at least some of that into a custom macro.
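A minimal sketch of such an option, wired to the same OTHER_LOCATION variable used above (the option name is illustrative):
AC_ARG_WITH([name-of-obj],
  [AS_HELP_STRING([--with-name-of-obj=DIR],
    [directory containing name-of-obj.o])],
  [OTHER_LOCATION=$withval],
  [OTHER_LOCATION=])
# If OTHER_LOCATION is still empty at this point, fall back to the search loop shown earlier.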
The Automake side is much less involved. To get the object linked, you just need to add it to the appropriate LDADD variable, using the output variable created by the above, such as:
foo_LDADD = $(OTHER_LOCATION)/name-of-obj.o
Note that if you're building just one program target, you can use the general LDADD instead of foo_LDADD, but be aware that by default these are alternatives, not complements.
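For example, a Makefile.am along these lines (the program name foo is illustrative) makes the relationship explicit; explicitly appending $(LDADD) is the usual idiom if you want both:
bin_PROGRAMS = foo
foo_SOURCES  = foo.c
# foo_LDADD replaces the global LDADD for this program rather than adding to it,
# so reference $(LDADD) yourself if you still want its contents as well.
foo_LDADD    = $(OTHER_LOCATION)/name-of-obj.o $(LDADD)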
With that said, this is a bad idea overall. If you want to link something that is not part of your project, then you should get it from an installed library. That can be a local, custom-built library, of course, so long as it is a library, not a bare object file, and it is installed. It can be a static library if you don't want to rely on or distribute a separate shared library.
On the other hand, if your project is part of a larger build, then the best approach is probably to integrate it into that build, maybe as a subproject. It would still be best to link a library instead of a bare object file, but in a subproject context it might make sense to use a lib that was not installed to the build system. In conjunction with a command-line argument that tells it where to find the wanted lib, this could make the needed Autoconf code much cleaner and clearer.

Extract Structure definitions from executable

I need to extract structure definitions from an executable. How can I do that?
I read we can do it using ELF, but not sure how to do this. Any help here?
I read we can do it using ELF, but not sure how to do this.
What you probably read is that if a binary contains debug info, then the types of variables, structures, and a great many other kinds of info can be extracted from that binary.
This isn't specific to ELF: many other executable formats (such as COFF) allow for embedding of debugging info as well.
Further, the format of that debugging info is different between different platforms. Some of the common UNIX ones are DWARF and STABS (with DWARF being more recent and much more powerful).
If you have an ELF binary, and you suspect that it may contain DWARF debug info, you can decode it using readelf -wi a.out (be prepared for there to be a lot of info, if any is present at all). objdump -g can be used to decode STABS (recent objdump versions can decode DWARF as well).
Or, as suggested by tristan, you can load the executable into GDB and use info types and ptype commands.
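A quick experiment, assuming gcc and gdb are installed (the struct is just an example):
cat > demo.c <<'EOF'
struct point { int x; int y; };
struct point origin;
int main(void) { return origin.x; }
EOF
gcc -g -o demo demo.c                               # -g embeds DWARF debug info
readelf -wi demo | grep -A 4 DW_TAG_structure_type  # raw DWARF view of the struct
gdb -batch -ex 'ptype struct point' ./demo          # gdb reconstructs the definition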
If the binary doesn't contain debug info, then DrPrItay's answer is correct: you can't easily recover structure definitions from it. However, you still can recover them by using reverse-engineering techniques. For example, many struct definitions used by the Wine project (example) were obtained by such techniques.
As far as I know, you can't. C/C++ programs are not like Java: structs don't get a symbol. They are just definitions that tell your compiler how to align and pack variables within stack frames or other memory (the struct's data members). For example, unlike Java, you don't have anything resembling class loading when loading shared objects (no header file is included within your C program); you can only look up global variables and functions. Defining a struct is much like creating a data type: its definition only needs to be present at compile time. You don't get a symbol in the symbol table for int or char, so why should you for some struct? It simply makes no sense. Symbols are solely meant for objects whose addresses your compiler doesn't know during compilation and that get resolved at link time, load time, or run time.

Java Bytecode manipulation libraries

I am starting to work on a project and for one of the tasks I need to analyze the source code in order to gather information about the classes and their methods. More specifically, for each method I need to know which internal attributes and external objects (references) it uses throughout the entire method body.
I discussed it with my supervisors and they think that Bytecode manipulation libraries is the way to go. I already looked at BCEL, ASM and Javassist but I'm not sure which one I need to use. Do they all provide access to the method body where I can see all the instructions and get the information I need?
Any advice would be appreciated. Thank you!
If you really “need to analyze the source code”, then libraries that let you inspect the bytecode are not the way to go.
Otherwise, you really need to define your task precisely. Either you are analyzing classes, regardless of whether you look at their source code or byte code, or you want to analyze source code and are considering doing it by compiling first and then analyzing the compiled result. In the latter case, you have to compare the effort of both steps with alternative solutions, which may, e.g., involve direct source code analysis.
Parsing byte code is rather easy, easier than analyzing source code, which is the reason why byte code is produced prior to the execution of Java programs. To answer your concrete question: yes, all three libraries offer you a way to analyze the instructions and associated information. Which one best fits your needs is a question beyond the scope of Stack Overflow.
Whether analyzing the byte code helps depends on your exact requirements. When it comes to field and method access, you can get most of them precisely with that approach. Only inlined compile-time constants lack their origins. When it comes to type use, you have to consider that not every source code artifact has an existing counterpart in the byte code; e.g., widening casts produce no actual code, and local variables usually don't have a declared type (debugging information aside), only an implied type that depends on how they are actually used. They also carry no information about generics, unless debugging information has been included.
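Before committing to one of those libraries, it can help to eyeball the information that is actually present in a class file, e.g. with javap (the class name is illustrative):
javap -c -p com.example.Foo
# In the disassembly, getfield/putfield instructions name the accessed field and its owning
# class, and invokevirtual/invokestatic/invokespecial/invokeinterface name the called method,
# which is exactly the per-method usage information described above.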

Best way to fix IAT and relocs when patching (merging) two different binaries (x86 PE)?

First of - Hello and thank you for reading this,
I have a DLL for which I do not have the source code, but I need to add some functionality to it.
I made another DLL implementing all the needed functionality in C, using Visual Studio.
Now I need to insert the generated code from this new DLL into the target DLL (it has to be done at the file level, not at runtime).
I will probably create a new PE section in the target DLL and put all the code/data/rdata from the DLL I made there. The problem is that I somehow need to fix the IAT and the relocs relative to this newly inserted code in the target DLL.
My question is:
What is the best way to do it?
It would be nice if Visual Studio had an option to build using only (mostly) relative addressing; this would save me a lot of work when dealing with the relocs.
I guess I could encapsulate all my vars and constants into a struct; hopefully MSVC would then only need to relocate the address of this "container" struct and use relative addressing to access its members. But I don't know if this is a good idea.
I could even go further and get rid of the IAT by making a function pointer that would dynamically load the needed function's module (kind of like a delay-load stub). And again, put this function pointer inside the "container" struct I mentioned before.
The last option I have is to do it all by hand, manually editing the binary in hex... which I really didn't want to do, because it would take quite some time for every single IAT entry and reloc entry. I have already written a PE file encryptor some time ago, so I know most of the inner workings and know it can be done; I just want to know your thoughts, and maybe a tool already exists to help me out?
Any suggestions are highly appreciated!
Thanks again for your time for reading this!
Since you are asking for suggestions, take a look at the very good PORTABLE EXECUTABLE FILE FORMAT – A REVERSE ENGINEER VIEW PDF document. The section "Adding Code to a PE File" describes some techniques (and presents tools) for adding code to an existing PE image without having the code of the target image (your scenario), by manipulating the IAT and the section tables.

Size of a library and the executable

I have a static library *.lib created using MSVC on Windows. The size of the library is, say, 70 KB. Then I have an application which links against this library. But now the size of the final executable (*.exe) is 29 KB, less than the library. What I want to know is:
Since the library is statically linked, I was thinking it should add directly to the executable size, so the final exe size should be more than that. Does the Windows exe format also do some compression of the binary data?
How is it on Linux systems, i.e. how do the sizes of libraries on Linux (*.a/*.la files) relate to the size of a Linux executable (a.out)?
-AD
A static library on both Windows and Unix is a collection of .obj/.o files. The linker looks at each of these object files and determines whether it is needed for the program to link. If it isn't needed, the object file won't get included in the final executable. This can lead to executables that are smaller than the library.
EDIT: As MSalters points out, on Windows the VC++ compiler now supports generating object files that enable function-level linking (the /Gy option), e.g., see here. In fact, edit-and-continue requires this, since edit-and-continue needs to be able to replace the smallest possible part of the executable.
There is additional bookkeeping information in the .lib file that is not needed for the final executable. This information helps the linker find the code to actually link. Also, debug information may be stored in the .lib file but not in the .exe file (I don't recall where debug info is stored for objs in a lib file, it might be somewhere else).
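Here is a small Linux demonstration of the object-level selection described in the first paragraph (file and symbol names are illustrative):
cat > used.c <<'EOF'
int used(void) { return 1; }
EOF
cat > unused.c <<'EOF'
int unused(void) { return 2; }
EOF
cat > main.c <<'EOF'
int used(void);
int main(void) { return used(); }
EOF
gcc -c used.c unused.c main.c
ar rcs libdemo.a used.o unused.o       # the archive contains both objects
gcc -o demo main.o -L. -ldemo
nm demo | grep -E ' (used|unused)$'    # only 'used' appears; unused.o was never pulled in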
The static library probably contains several functions which are never used. When the linker links the library with the main executable and sees that certain functions are never used (and that their addresses are never taken and stored in function pointers), it just throws that code away. It can also do this recursively: if function A() is never called, and A() calls B(), but B() is never otherwise called, it can remove the code for both A() and B(). On Linux, the same thing happens.
A static library has to contain every symbol defined in its source code, because it might get linked into an executable which needs just that specific symbol. But once it is linked into an executable, we know exactly which symbols end up being used and which ones don't. So the linker can trivially remove unused code, trimming the file size by a lot. Similarly, any duplicate symbols (anything that's defined in both the static library and the executable it's linked into) get merged into a single instance.
Disclaimer: It's been a long time since I dealt with static linking, so take my answer with a grain of salt.
You wrote: I was thinking it should add directly to the executable size and final exe size should be more than that?
Naive linkers work exactly this way - back when I was doing hobby development for CP/M systems (a LONG time ago), this was a real problem.
Modern linkers are smarter, however - they only link in the functions referenced by the original code, or as required.
In addition to the current answers: the linker is also allowed to fold together function definitions that have identical object code; this is intended to help reduce the bloating effects of templated code.
#All: Thanks for the pointers.
#Greg Hewgill - Your answer was a good pointer. Thanks.
The answer I found was as follows:
1.) During library building, if the option "Keep Program Debug Database" in MSVC (or something similar) is ON, the library will contain this debug info, bloating its size.
But when I statically link that library and create an executable, the linker strips all that debug info from the library before generating the exe, and hence the exe size is less than that of the library.
2.) When I disabled the "Keep Program Debug Database" option, I got a library whose size was smaller than the final executable, which is what I thought was normal in most situations.
-AD
