Exclude specific symbols from dSYM - xcode

I'm building an iOS project that includes a sub-project whose symbols I would like to exclude from the product's .dSYM DWARF file.
The situation is that the sub-project (a static library) contains valuable proprietary code that I would not want an attacker to be able to symbolicate, even if they had the dSYM files used to resymbolicate crash reports for the whole app. The sub-project covers a very specific domain and is well tested independently, so I'm not worried about being unable to resymbolicate stack traces in that code. However, I do need to be able to resymbolicate crash reports for the rest of the app, so I need a dSYM (as distributing symbols with the app is not an option).
I've already managed to make sure that all of the relevant symbols are stripped from the binary, and setting GCC_GENERATE_DEBUGGING_SYMBOLS=NO removed a lot from the dSYM, but I'm still seeing class-private C++ method names inside the dSYM file. For reference, I'm using clang.
How could I produce a dSYM for my app without compromising the symbols of this sub-project?

With a bog-standard Xcode workflow, this might be difficult. You could probably do something with a shell script phase that moves the static library to a different filename ("hides" it) and then runs dsymutil on your main app binary to create a dSYM. Because dsymutil can't find the static library, it won't be able to include any debug information for those functions. Alternatively, you can create a no-debug-info version of the static library, although this will take a little more scripting. A static library is really just an archive of object (.o) files, much like a zip file: you need to create a directory, extract the .o files (ar x mylib.a), strip the .o files, then create a new static library (ar q mylib-nodebuginfo.a *.o, I think) and put that in place before running dsymutil.
I know of no way to selectively remove debug information from a dSYM once it has been created, though. It's possible in principle, but I don't think anyone has written a tool like that.
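The "hide the library, then run dsymutil" idea can be sketched as a small script that a build phase might call. This is a minimal sketch, not a tested Xcode integration: the function name, the ".hidden" suffix and the paths are made up for illustration, and the only tool behaviour assumed is that dsymutil takes the binary as input and writes the dSYM to the path given with -o.

```python
import subprocess
from pathlib import Path


def make_dsym_hiding_library(app_binary, static_lib, dsym_out, run=subprocess.run):
    """Run dsymutil while the static library is temporarily renamed.

    `run` is injectable so the flow can be exercised without dsymutil installed.
    All names here are hypothetical; adapt the paths to your own build.
    """
    static_lib = Path(static_lib)
    hidden = static_lib.with_name(static_lib.name + ".hidden")  # made-up suffix
    static_lib.rename(hidden)  # "hide" the library so dsymutil cannot read it
    try:
        # dsymutil <binary> -o <output> writes the dSYM bundle to <output>.
        run(["dsymutil", str(app_binary), "-o", str(dsym_out)], check=True)
    finally:
        hidden.rename(static_lib)  # always put the library back
```

The try/finally matters here: if dsymutil fails, the library is still restored, so a later build does not break because of a missing file.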

Related

Gcc dead code removal with specific functions used

Let's say I have a large library, liblarge, and an application, app, which links to liblarge.
Liblarge is under the LGPL license, and app is under a proprietary one. I'd like to be able to remove all "dead code" from liblarge that is not used by app. Can I do this somehow? Provide a list of used functions to the linker perhaps?
There is no easy way for you to proceed.
You can use the above technique (in my comment) on a private copy to work out which *.o files you can remove. Then you can build your own modified liblarge source tree that builds the DSO/DLL but leaves out of the linker command line the *.o files you worked out you did not need.
This is just how C/C++ works: a lot of information is lost once code is turned into object code.
For example you might then wish to try and reduce the size of each *.o file. The main way to do that is to split up .c/.cpp compilation units.
The problem with the C/C++ ABIs is that the compiler is free to put code anywhere in the *.o file and then jump into and out of segments inside it using relative offsets. There is not enough metadata saved in the *.o to be able to take apart compiled code and see all the dependencies it requires to function. To do this you need to manually split up the input source code.
This is one reason why, in embedded software development, when memory footprint used to be important, you would literally put one function in each source file. These days embedded systems have a lot of memory.
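The comment the answer refers to is not reproduced here, so the following is only one possible way to work out which *.o files are unused, under stated assumptions. It treats each object file as a set of defined and undefined symbols (the kind of listing nm produces) and follows references starting from the symbols app needs. The function name, file names and symbol names are all hypothetical, and anything that is not visible as a symbol reference is ignored.

```python
def needed_objects(app_undefined, objects):
    """Return the object files reachable from the app's undefined symbols.

    `objects` maps an object-file name to (defined_symbols, undefined_symbols).
    """
    # Index which object file defines each symbol (first definition wins).
    definer = {}
    for name, (defined, _) in objects.items():
        for sym in defined:
            definer.setdefault(sym, name)

    needed = set()
    pending = list(app_undefined)
    while pending:
        sym = pending.pop()
        obj = definer.get(sym)
        if obj is None or obj in needed:
            continue  # defined elsewhere, or this object is already pulled in
        needed.add(obj)
        pending.extend(objects[obj][1])  # its own references must resolve too
    return needed


# Hypothetical liblarge contents: render.o is never reached from app.
liblarge = {
    "parse.o": ({"parse"}, {"lex"}),
    "lex.o": ({"lex"}, set()),
    "render.o": ({"render"}, {"lex"}),
}
unused = set(liblarge) - needed_objects({"parse"}, liblarge)
```

Because the granularity is the whole object file, a single used function keeps every other function in the same *.o alive, which is why splitting compilation units helps.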

What do you need in a static library?

I want to try making a simple game engine. Just something that handles states, assets, characters/actors and their stats and an inventory. Most of the code I can take from other games I've written, but I'm confused about how I then turn it into a static library. Do I need a main.cpp? If so, what has to go in it? Under Linux I'm guessing I compile it to .so and add the headers to my include directory and then just link to the .so, but what do I do on Windows and Mac?
A .so is not a static library; it's a dynamic one. A static library is, in its most basic form, a .o file compiled from a single C file, or a .a file, which is simply a collection of .o files.
A static library is different from a shared one in that the object code is linked directly into the final executable, so the library is not needed as a dependency at run time.
Under Unix, the ar(1) command is used to bundle .o files into a composite .a file. I do not know the comparable utility for Windows.
Once you have the .a file, you will simply need the combination of the .a file and the .h files to build your code. You use the .h files for compiling, and then link against the .a file.
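The compile, archive and link steps described above can be written down as the command lines a build script would run. This is a sketch under assumptions: the helper names, the source file names and the "c++" compiler driver are illustrative, and the functions only build the argument lists rather than running anything.

```python
from pathlib import Path


def static_library_commands(sources, lib_name, cxx="c++"):
    """Return the commands that compile `sources` and bundle them into lib<lib_name>.a."""
    objects = [str(Path(src).with_suffix(".o")) for src in sources]
    # -c compiles to an object file without linking.
    compile_cmds = [[cxx, "-c", src, "-o", obj] for src, obj in zip(sources, objects)]
    # ar flags: r = insert/replace members, c = create the archive,
    # s = write a symbol index so the linker can look members up.
    archive_cmd = ["ar", "rcs", f"lib{lib_name}.a", *objects]
    return compile_cmds + [archive_cmd]


def link_command(main_source, lib_name, output, cxx="c++"):
    """Return the command that builds a program and links it against the archive."""
    return [cxx, main_source, f"lib{lib_name}.a", "-o", output]


# Hypothetical engine sources; note that none of them is a main.cpp.
commands = static_library_commands(["state.cpp", "assets.cpp"], "engine")
```

Each list can be passed to subprocess.run(cmd, check=True). The library sources contain no main function; that lives in the program that is linked against the .a file.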
Shared libraries have a specific advantage over static libraries: if several different programs rely on the same libraries, the code from the shared libraries can be shared among all of them at the same time, so in that sense they lower the overall impact on the system. Their downside is slower start-up times (though that's pretty marginal nowadays). Statically linked libraries cannot be shared across independent programs, but if you run the same executable several times, its code will be shared.

Exploring .app files on mac

Can one get access to an application's source files on a Mac? In the Applications folder, any .app file can be explored to get access to the header files. Is that all, or can the class files be accessed too?
Unless a Mac application includes private frameworks (in the application bundle), which includes their headers (rare), no.
Most of the time, a Mac application will just contain the application's binary, as well as resources (icons, images, L10N, etc.).
You may disassemble the binary, if you know how to deal with assembly language.
If the application was built with Objective-C, you can use specific tools to produce a header file from the binary, with all the Objective-C interfaces.
Take a look at ClassDump, for instance.
You may also use the nm command, on the application's binary, to get a list of the symbols it contains.
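To work with that symbol list programmatically, the text output of nm can be split into fields. The sketch below assumes the default layout of one symbol per line ("address type name", with the address left blank for undefined symbols). The sample listing is made up for illustration and is not output from a real application.

```python
def parse_nm_output(text):
    """Parse default nm output into (address, type, name) tuples.

    Undefined symbols carry no address, so their address is returned as None.
    """
    symbols = []
    for line in text.splitlines():
        fields = line.split(None, 1)
        if len(fields) < 2:
            continue  # blank line or a header such as "foo.o:"
        first, rest = fields
        if len(first) == 1:
            # The line starts with the type letter: no address present.
            symbols.append((None, first, rest.strip()))
            continue
        parts = rest.split(None, 1)
        if len(parts) != 2:
            continue
        try:
            address = int(first, 16)
        except ValueError:
            continue  # not a symbol line
        # Split only twice so Objective-C names containing a space stay whole.
        symbols.append((address, parts[0], parts[1].strip()))
    return symbols


# Made-up sample in the assumed layout.
sample = """\
0000000100000f20 T _main
                 U _printf
0000000100000e80 t -[AppDelegate run]
"""
parsed = parse_nm_output(sample)
```

Filtering the result by type letter then gives, for example, only the undefined symbols the binary imports.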

Size difference between static and dynamic (debug) library and impact on final exe

I never put much thought into the size difference between a static library and a dynamic library until I downloaded pre-built libraries of boost today. I found that the static libraries of boost are much much bigger than the dynamic libraries.
For example, the debug multi-threaded boost wave static library is 97.7 MB in size while the same library, but dynamic, is only 1.4 MB in size (including import library and DLL)! That is a huge difference. Why is that?
Second question, if I statically link against, let's say, the wave library. Does that mean my executable will balloon in size to more than 97.7 mb?
The static libraries have the full debug symbol information in them. For DLLs that information would be in .pdb files (which I assume would be similar in size to the static libs).
When you link to the static lib, the symbol information will not be copied into the .exe - it will be placed in the .pdb file (if your build is configured to create a .pdb file). The .pdb file does not need to be distributed with the .exe, whether or not the .pdb is created.
In the pre-built library download I get from boostpro.com, I don't get .pdb files for the boost DLLs they provide. If you build the DLLs yourself, you'll probably get the .pdb files (though you might have to set some config option, for which I have no idea what the details are).
update:
Looks like I might be wrong about easily getting .pdb files for the boost DLLs. From http://comments.gmane.org/gmane.comp.lib.boost.build/23246:
> Is there an additional option that I can pass on the command line to
> have the (correctly generated) PDB files also copied into the stage
> directory?
Not at this time. You can only hack tools/build/v2/tools/package.jam to add <install-type>PDB everywhere where <install-type>SHARED_LIB or <install-type>STATIC_LIB is now written.
No, just because the LIB file is a certain size, doesn't mean it will add that size to your EXE. In fact, most linkers are smart enough to link in only the stuff that's used. Compare that to a dynamic library, which must contain everything.
Static libraries definitely make your EXE larger, but I always prefer it. Then I don't have to worry about missing or incompatible libraries at run time. (Or at least, I minimize the chances of this.)
Since static libraries do not contain finished binary data, but rather the information the linker needs to build a binary, this information may be bigger than the built binaries.
When a function defined in a header file is used in a .cpp file, the compiler puts its code (either inlined or simply added) into the resulting object file. This means that there will be a lot of duplicates. It's the linker's job to merge them, so the static library just waits for the linker to reduce it :)
Generally, an executable is bigger when linked against static libraries, but it is usually smaller than the dynamically linked executable and its DLLs taken together. The DLL and the EXE are linked separately, so the linker cannot know which functionality in the DLL is needed and which can be thrown out. In the case of a static library, the linker has that information and can take only those object files that are used.
The debug static library contains debug information, which explains the huge size difference.

Debug info as dSYM file a security risk for disassembly?

I'm compiling software on Mac OS X, and I don't want to expose its internals. But it would be great if I could use crash logs sent by users to inspect the reasons for crashes. I fear that generating debug info as a dSYM file exposes the internals of my app (the dSYM files are not being distributed, anyway), so my question(s) is (are):
Does the dSYM file generation modify the generated application binary? If it does, how does it modify the binary? Is it a security risk for my intellectual property (e.g. is disassembly easier with dSYM file generation)?
Thanks.
The only thing the dSYM would provide to someone trying to disassemble your code is routine and symbol names that might have otherwise been stripped by the deployment build.
This only applies to unexported C and C++ routines. Routine names from Objective-C code get included no matter what.
So unless you're worried about revealing the names of your C routines, I don't see any security risk.
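To make concrete what those routine names are used for: a crash log carries raw addresses, and symbolication maps each address back to the routine that contains it. The following is a deliberately simplified sketch. The table of (start address, name) pairs is a hypothetical stand-in for the debug information, and real tools also read DWARF line tables and account for load addresses, which this ignores.

```python
import bisect


def symbolicate(address, symbols):
    """Return "name + offset" for `address`, or None if it precedes every routine.

    `symbols` is an iterable of (start_address, name) pairs.
    """
    table = sorted(symbols)
    starts = [start for start, _ in table]
    # Find the last routine whose start address is <= the crash address.
    i = bisect.bisect_right(starts, address) - 1
    if i < 0:
        return None
    start, name = table[i]
    return f"{name} + {address - start}"


# Hypothetical symbol table; the names are what a dSYM would reveal.
table = [(0x1000, "main"), (0x1040, "LicenseCheck::verify()")]
frame = symbolicate(0x1048, table)
```

Without the name table the same frame is only a number, which is why names left out of both the binary and the dSYM cannot be recovered from a crash log.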

Resources