GCC dead code removal when only specific functions are used

Let's say I have a large library, liblarge, and an application, app, which links to liblarge.
Liblarge is under the LGPL license, and app is under a proprietary one. I'd like to be able to remove all "dead code" from liblarge that is not used by app. Can I do this somehow? Perhaps by providing a list of used functions to the linker?

There is no easy way for you to proceed.
You can use the above technique (in my comment) on a private copy to work out which *.o files you can remove. Then you can build your own modified liblarge source tree that builds the DSO/DLL but removes the *.o files you worked out you did not need from the linker command line (for building the DSO/DLL).
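As an illustration of that kind of working-out (a sketch with placeholder file names, not necessarily the exact technique from the comment), GNU binutils can tell you which archive members the link actually pulls in, or you can cross-reference symbols with nm:
# Link the application against a static build of the library and ask the
# linker to report every archive member it actually uses.
gcc -o app app.o liblarge.a -Wl,--trace > pulled-in-objects.txt
# Alternatively, list the symbols app needs and the symbols each object
# file in the archive defines, and cross-reference the two lists.
nm -u app.o
ar x liblarge.a && nm --defined-only *.o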
This is just how C/C++ works: a lot of information is lost once code is turned into object code.
For example, you might then wish to try to reduce the size of each *.o file. The main way to do that is to split up the .c/.cpp compilation units.
The problem with the C/C++ ABIs is that the compiler is free to put code anywhere in the *.o file and then jump into and out of segments inside it using relative offsets. There is not enough metadata saved in the *.o to take compiled code apart and see all the dependencies it requires to function. To do this you need to manually split up the input source code.
This is one reason why, for embedded software development, back when memory footprint was important, you would literally put one function in each source file. These days embedded systems have a lot of memory.
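With GCC and GNU ld there is also a finer-grained option than physically splitting source files: compile with one section per function and let the linker garbage-collect unreferenced sections. A minimal sketch (file names are placeholders):
# Give every function and data object its own section when building the library.
gcc -c -ffunction-sections -fdata-sections large_part1.c large_part2.c
# Drop unreferenced sections at link time and print what was removed.
# Note: for a shared library, exported (default-visibility) symbols count as
# referenced, so this helps most with static linking or -fvisibility=hidden.
gcc -o app app.o large_part1.o large_part2.o -Wl,--gc-sections -Wl,--print-gc-sections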

Related

Should the STM32 HAL be included as a precompiled library

I have a Keil STM32 project for an STM32L0. I sometimes (more often than I want) have to change the include paths or global defines. This triggers a complete recompile of all the code, because the build needs to ‘check’ for behaviour changed by these settings. The problem is that I didn't necessarily change any parameters relevant to the HAL, so (as far as I understand) there is no need for those files to be completely recompiled. This recompilation takes up quite a bit of time because I included all the HAL drivers for my STM32L0.
Would a good course of action be to create a separate project which compiles the HAL as a single library and include that in my main project? (This would of course be done for every microcontroller separately as they have different HALs).
P.S. The question is not necessarily only useful for this specific example, but the example gives some scope to the question.
P.P.S. For people who aren't familiar with the STM32 HAL: it is the standardized interface through which the program talks to the underlying hardware. It is supplied as .c and .h files rather than in the precompiled form of the STD/STL.
Update
Here is an example of the defines that need to be managed in my example project:
STM32L072xx, USE_B_BOARD, USE_HAL_DRIVER, REGION_EU868, DEBUG, TRACE
Only STM32L072xx and DEBUG are relevant for configuring the HAL library, so there shouldn't be a need for me to recompile the HAL when I change TRACE from defined to undefined. Therefore it seems to me that the HAL could be managed separately.
Edit
Seeing as a close vote has been cast: I've read the don't-ask section, and my question seeks to constructively add to the knowledge of building STM32 programs and to find a best practice for using the HAL libraries more effectively. I haven't found any questions on SO about building the HAL as a static library, so this question at least qualifies as unique. It is also meant to invite a rich answer which elaborates on the pros and cons of building the HAL as a separate static library.
The answer here is: it depends. As already pointed out in the comments, it depends on how you're planning to manage your projects. To answer your question in an unbiased way:
Option #1 - having HAL sources directly in your project means rebuilding HAL every time anything in its (and underlying) headers changes, which you've already noticed. Downside of it is longer build times. Upside - you are sure that what you build is what you get.
Option #2 - having HAL as a precompiled static library. Upside - shorter build times, downside - you can no longer be absolutely certain that the HAL library you include actually works as you want it to. In particular, you'd need to make sure in some way that all the #defines are exactly the same as when the library has been built. This includes project-wide definitions (DEBUG, STM32L072xx etc.), as well as anything in HAL config files (stm32l0xx_hal_conf.h).
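To make the define problem concrete, here is a rough sketch of what option #2 amounts to with a GCC-based ARM toolchain (used purely for illustration - with Keil the equivalent would be a library target; file names, flags and defines are placeholders):
# The HAL is compiled once, with the configuration defines baked in at this point.
arm-none-eabi-gcc -c -mcpu=cortex-m0plus -DSTM32L072xx -DUSE_HAL_DRIVER stm32l0xx_hal_*.c
arm-none-eabi-ar rcs libhal.a stm32l0xx_hal_*.o
# The application later links against the prebuilt library. If project-wide
# defines or stm32l0xx_hal_conf.h change in a HAL-relevant way, nothing forces
# the library above to be rebuilt - that is the consistency risk.
arm-none-eabi-gcc -o app.elf main.o -L. -lhal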
Seeing how you're a Keil user - maybe it's just a matter of enabling multi-core build? See this link: http://www.keil.com/support/man/docs/armcc/armcc_chr1359124201769.htm. HAL library isn't so large that build times should be a concern when it comes to rebuilding its source files.
If I were to express my opinion and experience: personally I wouldn't do it, as it may lead to lower reliability or side effects that will be very hard to diagnose and will only get worse as you add more source files and more libraries like this. Not to mention adding more people to the project and explaining to them how they "need to remember to rebuild library X when they change a given set of header files or project-wide definitions".
In fact, we ran into the same dilemma for the code base I work on - it spans over 10k source and header files in total, some of which are configuration-specific and many of which are shared. It's highly modular, which allows us to quickly create something new (both hardware- and software-wise) just by configuring existing code, mainly through a set of header files. However, because this configuration is done through headers, making a change in them usually means rebuilding a large portion of the project. Even though build times get annoying sometimes, we opted against making static libraries for the reasons mentioned above. To me personally it's better to prioritize reliability, as in "I know what I build".
If I were to give any general tips that help to avoid rebuilds as your project gets large:
Avoid global headers holding all configuration. It's usually tempting to shove all configuration into one place and create pretty comments and sections for each software module in this one file. It's easier to manage this way (until the file becomes too big), but because this file is so widely included, any change made to it will cause a full rebuild. Split such files into separate headers corresponding to each module in your project.
Include header files only where you need them. I sometimes see an approach where header files are created that only "bundle" other header files, and that larger header is then included everywhere. In this case, making a change to any of the "smaller" headers means recompiling all source files that include the larger one. If such a file didn't exist, only the sources explicitly including the one small header would have to be recompiled. Obviously there's a line to be drawn here - including headers that are too "low level" may not be the greatest idea either, e.g. they may be internal library files not meant for direct inclusion, which may change at any time.
Prioritize including headers in source files over header files. If you have a pair of your own *.c (*.cpp) and *.h files - let's say temp_logger.c/.h - and you need the ADC, then unless you really need some ADC definition in your header (which you likely won't), include the ADC header in your temp_logger.c file. That way, files making use of the temp_logger functions won't have to be recompiled whenever the ADC header (or the HAL) changes. (A quick way to check what a given translation unit actually pulls in is sketched after this list.)
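If a GCC-compatible toolchain is available for this kind of inspection (Keil's compilers have their own dependency-file options), a quick way to see which headers a translation unit actually pulls in - and therefore what will trigger its recompilation - is (include paths omitted, defines illustrative):
# Print the full include hierarchy for this translation unit.
arm-none-eabi-gcc -H -c -DSTM32L072xx -DUSE_HAL_DRIVER temp_logger.c -o /dev/null
# Or emit a make-style dependency list for the same file.
arm-none-eabi-gcc -MM -DSTM32L072xx -DUSE_HAL_DRIVER temp_logger.c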
My opinion is yes, build the HAL into a library. The benefit of faster build time outweighs the risk of the library getting out of date. After some point early in the project it's unusual for me to change something that would affect the HAL. But the faster build time pays off many times.
I create a multi-project workspace with one project for the HAL library, another project for the bootloader, and a third project for the application. When I'm developing, I only rebuild the application project. When I make a release build, I select Project->Batch Build and rebuild all three projects. This way the release builds always use all the latest code and build settings.
Also, on the Options for Target dialog, Output tab, unchecking Browse Information will greatly reduce the build time.

Recommended tool to automate complicated build procedure

I am developing an OS for embedded devices that runs bytecode. Basically, a micro JVM.
In the process of doing so, I have become able to compile Java applications to bytecode(ish), flash that onto, for instance, an ATmega1284P, and run it there.
Now I've added support for C applications: I compile and process them using several tools, and with some manual editing I eventually get bytecode that runs on my OS.
The process is very cumbersome and heavy and I would like to automate it.
Currently, I am using makefiles for automatic compilation and flashing of the Java applications & OS to devices.
Roughly, the steps for a C application are as follows; each one is currently carried out manually:
(1) Use Docker to run a Linux container with lljvm that compiles a .c file to a .class file (see also https://github.com/davidar/lljvm/tree/master)
(2) convert the resulting .class file to a Jasmin file (https://github.com/davidar/jasmin) using the ClassFileAnalyzer tool (http://classfileanalyzer.javaseiten.de/)
(3) manually edit this jasmin file in a text editor by replacing/adjusting some strings
(4) convert the modified jasmin file to a .class file again using jasmin
(5) put this .class file in a folder where the rest of my makefiles (the ones that already make and deploy the OS and class files from Java apps) can take over.
The current option seems to be to just keep using makefiles, but this is a bit unwieldy (I already have 5 different makefiles and this would further extend that chain). I've also read a bit about SCons. In essence, I'm wondering which tools are recommended, or what a good approach would be, for complicated builds like this.
Hopefully this may help a bit, but the question as such could probably be the subject of a heated discussion without many helpful results.
As pointed out in the comments by others, you really need to automate the steps from your .c file up to the point where you can integrate the result with the rest of your system.
There is generally nothing wrong with make, and you would not win much by switching to SCons. You would get more ways to express what you want to do; among other things, if you wanted to write that automation directly inside the build system and its rules, you could use Python and not only shell (though if that were a concern, you could just as well call that Python code from make). But the essence of target, prerequisite, recipe is still there, and with it the need to write the automation for those .c-to-integration steps.
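For illustration, those five manual steps could be wrapped in one script that a make recipe invokes. Everything below is a sketch: the container image, jar names and exact command lines are placeholders for the invocations you already run by hand, and the manual text edits are assumed to be expressible as sed rules:
#!/bin/sh
# build_c_app.sh <source.c> - sketch of automating steps (1)-(5)
set -e
SRC="$1"
BASE=$(basename "$SRC" .c)
# (1) compile the .c file to a .class file inside the lljvm container
docker run --rm -v "$PWD":/work lljvm-image /work/compile.sh "/work/$SRC"
# (2) disassemble the .class file to a Jasmin (.j) file with ClassFileAnalyzer
java -jar ClassFileAnalyzer.jar "$BASE.class" > "$BASE.j"
# (3) apply the edits currently made by hand in a text editor
sed -f fixups.sed "$BASE.j" > "$BASE.fixed.j"
# (4) assemble the modified Jasmin file back into a .class file
java -jar jasmin.jar "$BASE.fixed.j"
# (5) drop the result where the existing makefiles can take over
#     (the output class name depends on the class declared inside the .j file)
cp "$BASE.class" ../apps/classes/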
If you really wanted to look into alternative options, Bazel might be of interest to you. The downside is that the initial effort to write the necessary rules to fit your needs could be costly and, depending on the size of your project, might just be too much. On the other hand, once done with that it'd be very easy to use (apply those rules on a growing code base), and you could also ditch the container and rely on Bazel's more lightweight sandboxing and external rules to get the tools and bits you need for your build... all with a single system for build description.

Exclude specific symbols from dSYM

I'm building an iOS project that includes a sub-project whose symbols I would like to exclude from the product's .dSYM DWARF file.
The situation is that the sub-project (a static library) contains valuable proprietary code that I would not want an attacker to be able to symbolicate, even if they had the dSYM files used to resymbolicate crash reports for the whole app. The sub-project covers a very specific domain and is well tested independently, so I'm not worried about being unable to resymbolicate stack traces in that code. However, I do need to be able to resymbolicate crash reports for the rest of the app, so I need a dSYM (as distributing symbols with the app is not an option).
I've already managed to make sure that all of the relevant symbols are stripped from the binary, and setting GCC_GENERATE_DEBUGGING_SYMBOLS=NO removed a lot from the dSYM, but I'm still seeing class-private C++ method names inside the dSYM file. For reference, I'm using clang.
How could I produce a dSYM for my app without compromising the symbols of this sub-project?
With a bog-standard Xcode workflow, this might be difficult. You could probably do something with a shell script phase which moves the static library to a different filename ("hides" it) and then runs dsymutil on your main app binary to create a dSYM. Because dsymutil can't find the static library, it won't be able to include any debug information for those functions. Alternatively, you can create a no-debug-info version of the static library, although this will take a little more scripting. A static library is really an archive of object (.o) files - you need to create a directory, extract the .o files (ar x mylib.a), strip the .o files, then create a new static library (ar q mylib-nodebuginfo.a *.o, I think) and put that in place before running dsymutil.
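A rough sketch of that second approach (file names are examples; the exact strip and archive flags depend on your toolchain - on macOS, strip -S removes debugging symbols):
# Build a no-debug-info copy of the proprietary static library.
mkdir tmp-objs && cd tmp-objs
ar x ../libproprietary.a
strip -S *.o
ar qc ../libproprietary-nodebug.a *.o
ranlib ../libproprietary-nodebug.a
cd ..
# Put the stripped library in place of the original, then generate the dSYM
# from the linked app binary as usual.
dsymutil path/to/MyApp -o MyApp.dSYM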
I know of no way to selectively remove debug information from a dSYM once it has been created, though. It's possible to do, but I don't think anyone has written a tool like that.

Ada95: I have 3 files (.ali, .adb and .o) - can I compile?

I've found some old college work with my final Ada95 project on it. Sadly, the disc was corrupted, and I have only managed to recover 3 files (the rest of the source and the executable couldn't be recovered):
project.adb, project.ali and project.o
Are these 3 files enough to compile a new exe? I'm downloading the GNAT compiler now, but I have to admit I have forgotten almost everything Ada-related...
Frank
[EDIT]
Shucks... using GCC to compile project.adb throws an error about a missing .ads file, which I cannot recover.
Is it possible to extract this / compile just the ".o" or ".ali" files? Or, am I stuffed?
project.adb is a source file.
Since you say that gcc complains about a missing .ads file, that indicates that project.adb contains a package body. You can manually construct a corresponding package spec by putting the following into project.ads:
package Project is
end Project;
Now that's almost certainly not enough, because the project spec probably had some type and constant declarations in it, so you'd have to analyze your package body and identify what it references. Infer what those declarations should look like and add them. Oh, and if your package body "with's" any packages that are not part of the standard Ada library, you'll have to recover those as well.
If you do manage to get your reverse-engineered spec and the body to compile, you'll still have to create a "driver" program that "with's" the project package and calls whatever functions and/or procedures carried out the work of your project (and you'll have to pull the specs of those subprograms - which must match their appearance in the package body - into the spec as well).
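Once you have a reconstructed project.ads and some driver source (main.adb here is just a hypothetical name), GNAT can compile, bind and link everything reachable from the main unit in one step:
# gnatmake builds all units the main program depends on, then links it.
gnatmake main.adb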
Frankly, if it were me, I'd spend more time on trying to use some disk recovery tools to pull whatever else I could off the disk.
In Ada95 (and 2005) one mostly works with .adb files (occasionally with .ads files); everything else is generated along the way. In your case the .adb file is surely linked up to other .ads files.
However, .ads files are usually small (obviously, unless you are attempting really exotic things such as 'the dining philosophers') and pertain to the algorithmic/mathematical structure of the program, so if you can dig out what you did in your project, it should not be impossible to restore them!

Comparing VB6 .exes

We're going through a massive migration project at the minute and are trying to validate that the code deployed to the live estate matches the code we have in source control.
Obviously the .NET code is easy to compare because we can disassemble it. I don't believe this is possible with VB6 .exes because of the manner of compilation.
Does anyone have any ideas on how I could validate that the source code and the compiled executable match the file I have in live?
Thanks
Visual Basic had (has) two ways of compiling: one to the interpreter (called P-code) that would result in smaller binaries, and a second one that generates a "regular" Windows .exe file (called native), which was introduced because it was supposed to be faster than P-code, although the compiled file size increased with this option.
If your compilation used P-code, it is in theory possible to restore the sources.
Either way is pretty difficult to do, but there are tools that claim they can partially do this. One that I know of (never tried it, but there is a trial version) is VB Decompiler:
http://www.vb-decompiler.org/
Unfortunately that's almost impossible. Bear in mind that VB6 code compiled on different machines will have different exe sizes and deployment requirements.
This is why the old VB'ers had a dedicated machine to compile their code.
This won't help you with already deployed items, but if you upped the revision number on every compile (there is a project setting to do this for you automatically) then you could easily compare version numbers.
My old company bought a copy of that VB Decompiler and, as noted before, when VB5/6 generated P-code that tool did produce some code, and if not, assembly code which could at least be "read".
If you have the binaries you originally compiled, you could compare their CRCs to what is deployed in the field. If you don't have the original compiled code, it depends on how you compiled it: if you used P-code rather than native code you may be able to disassemble it, but the disassembly will look nothing like your source code. I doubt you would have shipped the PDBs with the .exes, but if you did, you could certainly use those to compare with the source code in your repository.
Have a trusted computer that can check out the various libraries and exes you make and compile them automatically. Keep those in a read-only but accessible location. Then do a binary comparison between the deployed site and your comparison site.
However, I am not sure of the logic of disassembling the compiled units. My company and most other places I know of use a combination of a build computer and unit testing. In our company the EXE we make is a very thin shell over a bunch of libraries. For example, a button click will be passed to a UI ActiveX DLL that does the actual processing. What we do after a build is run a special EXE that performs our list of unit tests. If they all pass, we know our libraries, where 90% of our code is, are good. As for the actual EXE, we have a manual procedure that takes about two hours, and then we are good. It is rare for any errors to happen in the EXE.
