partial parallel and serial compilation with make - makefile

I am compiling LLVM with make. When I do a parallel compile I do not have enough RAM during the linking steps. Is it possible to do a parallel compilation for all the object files and a serial compilation during the linking step? For now I stop the compilation when my machine starts swapping and just restart the build process with make -j1; it would be neat if this could be done without human interaction.

I am not aware of any make implementation that dynamically adapts the degree of parallelism according to resource consumption. Indeed, although there is a potential to do that to some degree, the problem is not really solvable by make, because processes' resource consumption is not static. That is, a make could conceivably observe, say, that physical RAM was overcommitted, and react by holding off on starting new child processes, but it cannot easily protect against starting several child processes while resource consumption is low, which then balloon to demand more resources than are available.
However, depending on the makefile, there may be a workaround: you can name specific targets for make to build. You can use that to specify what targets you want to build in parallel, and then run a separate make to complete the build serially. That might look something like this:
make -j4 object1.o object2.o object3.o object4.o
make
That's rather unwieldy though. Supposing that there exists a target (or that you can create one) that represents all the object files but not any of the linked libraries / executables, then you can use that:
Makefile
OBJECTS = object1.o object2.o object3.o object4.o
all: my_program
my_program: objects
# ...
# This target collects all the objects:
objects: $(OBJECTS)
.PHONY: all objects
Command line
make -j4 objects
make

If your memory consumption correlates somehow with the system load, you may be able to limit make by taking the load average into account with the -l option. Relevant documentation:
When the system is heavily loaded, you will probably want to run fewer jobs than when it is lightly loaded. You can use the ‘-l’ option to tell make to limit the number of jobs to run at once, based on the load average. The ‘-l’ or ‘--max-load’ option is followed by a floating-point number.
This might or might not help in your case, but it may be worth checking.
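For example, the two options can be combined so that make starts new jobs only while the load average stays below a threshold (the numbers here are arbitrary and would need tuning for your machine):
make -j4 -l 3.5
Bear in mind that the load average reacts with a delay, so this is only a rough brake and not a direct limit on memory use.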

Related

many Shell Commands architecture

At work we are using Docker and docker-compose. Our developers need to start many containers locally and import a large database; there are many services that need to run together for development to be successful and easy.
So we define reusable functions as make targets to keep the code easier to maintain. Is there another way to define and reuse many shell commands that is better than make?
For us, due to network limitations, running Docker locally is the only option.
We managed to solve this challenge and make our developers' lives easier by abstracting complex shell commands away behind multiple make targets, and in order to organize these numerous targets that control our Docker infrastructure and containers we decided to split them among many files with a .mk extension.
There are multiple make commands, around 40 of them; some are low level, and some are meant to be called by developers to do certain tasks, for example:
make launch_app
make import_db_script
make build_docker_images
But lately things are starting to become a little slow. With make commands calling other make commands internally, each make call takes a significant amount of time, since every lower-level make invocation has to go through all the defined .mk files and redo its calculations (as we can see when we run make -d), so it adds up to considerable overhead.
Is there any way to manage a set of complex shell commands using anything other than make, while still being easy for our developers to call?
Thanks in advance.
Well, you could always just write your shell commands in a shell script instead of a makefile. Using shell functions, shell variables, etc., this can be managed. You don't give examples of how complex your use of make constructs is.
StackOverflow is not really a place to ask open-ended questions like "what's the best XYZ". So instead I'll treat this question as, "how can I speed up my makefiles".
To me it sounds like you just have poorly written makefiles. Again, you don't show any examples, but it sounds like your rules invoke lots of sub-makes (e.g., your rule recipes run $(MAKE), etc.). That means lots of processes invoked, lots of makefiles parsed, and so on. Why don't you just have a single instance of make and use prerequisites, instead of sub-makes, to run the other targets? You can still split the makefiles up into separate files and then use include ... to gather them all into a single instance of make.
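As a rough sketch of what that could look like (the target and file names below are invented, not taken from your setup), a single top-level makefile includes the .mk fragments once and expresses the relationships between targets as prerequisites instead of recursive $(MAKE) calls:
# Makefile -- parsed once per invocation
include docker.mk      # hypothetical fragment defining build_docker_images
include database.mk    # hypothetical fragment defining import_db_script

.PHONY: launch_app
# Prerequisites replace "$(MAKE) build_docker_images" etc. inside the recipe
launch_app: build_docker_images import_db_script
	docker-compose up -d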
Also, if you don't need to rebuild the makefiles themselves you should be sure to disable the built-in rules that might try to do that. In fact, if you are just using make to run docker stuff you can disable all the built-in rules and speed things up a good bit. Just add this to your makefile:
MAKEFLAGS += -r
(see Options Summary for details of this option).
ETA
You don't say what version of GNU make you're using, or what operating system you're running on. You don't show any examples of the recipes you're using so we can see how they are structured.
The problem is that your issue, "things are slow", is not actionable, or even defined. As an example, the software I work on every day has 41 makefiles containing 22,500 lines (generated from cmake, which means they are not as efficient as they could be: they are generic makefiles and not using GNU make features). The time it takes for my build to run when there is nothing to actually do (so, basically the entire time is taken by parsing the makefiles), is 0.35 seconds.
In your comments you suggest you have 10 makefiles and 50 variables... I can't imagine how any detectable slowness could be caused by this size of makefile. I'm not surprised, given this information, that -r didn't make much difference.
So, there must be something about your particular makefiles which is causing the slowness: the slowness is not inherent in make. Obviously we cannot just guess what that might be. You will have to investigate this.
Use time make launch_app. How long does that take?
Now use time make -n launch_app. This will read all makefiles but not actually run any commands. How long does that take?
If make -n takes no discernible time then the issue is not with make, but rather with the recipes you've written and switching to a different tool to run those same recipes won't help.
If make -n takes a noticeable amount of time then something in your makefiles is slow. You should examine it for uses of $(shell ...) and possibly $(wildcard ...); those are where the slowness will happen. You can add $(info ...) statements around them to get output before and after they run: maybe they're running lots of times unexpectedly.
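For instance (the variable name and shell command here are just placeholders), you can bracket a suspect call like this to see when, and how many times, it actually runs while the makefiles are being read:
$(info before: computing GIT_VERSION)
GIT_VERSION := $(shell git describe --tags --always 2>/dev/null)
$(info after: GIT_VERSION = $(GIT_VERSION))
Using := rather than = also guarantees the shell command runs once at parse time instead of every time the variable is expanded.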
Without specific examples of things that are slow, there's nothing else we can do to help.

Can a dynamically linked glibc application dlopen() a static linked musl shared object?

I have a library that is currently dynamically linked against glibc.
This library is dynamically loaded into an application that is also dynamically linked against glibc. I have no control over the application, only over the shared object.
However, sometimes loading the library causes the application to get SIGKILLd because it has pretty strict real-time requirements and rlimits set accordingly. Looking at this with a profiler tells me that most of the time is actually spent in the linker. So essentially dynamic linking is actually too slow (sometimes). Well that's not a problem I ever thought I'd have :)
I was hoping to solve this issue by producing a statically linked shared object. However, googling this issue and reading multiple other SO threads has warned me not to try to statically link glibc. But those seem to be glibc-specific issues.
So my question is, if I were to statically link this shared library against musl and then let a (dynamically linked) glibc application dlopen it, would that be safe? Is there a problem in general with multiple libc's?
Looking at this with a profiler tells me that most of the time is actually spent in the linker.
Something is very wrong with your profiling methodology.
First, the "linker" does not run when the application runs, only the loader (aka rtld, aka ld-linux) does. I assume you mean't the loader, not the linker.
Second, the loader does have some runtime cost at startup, but since every function you call is only resolved once, proportion of the loader runtime cost for the duration of an application which runs for any appreciable time (longer than about 1 minute) should quickly approach zero.
So essentially dynamic linking is actually too slow (sometimes).
You can ask the loader to resolve all dynamic symbols in your shared library at load time by linking it with the -Wl,-z,now linker flag.
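For example, when producing the shared object (the object and library names are hypothetical):
gcc -shared -fPIC -o libmylib.so mylib.o -Wl,-z,now
Alternatively, setting the LD_BIND_NOW environment variable forces immediate binding at load time, but that affects the whole process, not just your library.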
if I were to statically link this shared library against musl and then let a (dynamically linked) glibc application dlopen it, would that be safe?
Not only would this not be safe, it would most likely not work at all (except for the most trivial shared library).
Is there a problem in general with multiple libc's?
Linking multiple libc's into a single process will cause too many problems to count.
Update:
resolving all symbols at load time is exactly the opposite of what I want, as the process gets sigkilled during loading of the shared object, after that it runs fine.
It sounds from this that you are using dlopen while the process is already executing time-critical real-time tasks.
That is not a wise thing to do: dlopen (among other things) calls malloc, reads data from disk, performs mmap calls, etc. etc. All of these require locks, and can wait arbitrarily long.
The usual solution is for the application to perform initialization (which loading your library would be part of) before entering the time-critical loop.
Since you are not in control of the application, the only thing you can do is tell the application developers that their current requirements (if these are in fact their requirements) are not satisfiable -- they must provide some way to perform initialization before entering the time-critical section, or they will always risk a SIGKILL. Making your library load faster will only make that SIGKILL appear with lower frequency, but it will not remove it completely.
Update 2:
yes, i'm aware that the best I can do is lower the frequency and not "solve" the problem, only try to mitigate it.
You should look into prelink. It can dramatically lower the time required to perform relocations. It's not a guarantee that your chosen prelink address will be available, so you may still get SIGKILLed sometimes, but this could be an effective mitigation.
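A rough sketch of how that might look (the library name is hypothetical, and prelink may not be available or supported on every distribution):
# Prelink the shared object in place; -v shows what was changed
prelink -v ./libmylib.so
# Undo it later if needed
prelink -u ./libmylib.so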
It is theoretically possible to do something like that, but you will have to write a new version of the musl startup code that copes with the fact that the thread pointer and TCB have already been set up by glibc, and run that code from an ELF constructor in the shared object. Some musl functionality will be unavailable due to TCB layout differences.
I don't think it is likely that this will solve your actual problem. Even if it is time-related, it is possible that this hack makes things worse because it increases the amount of run-time relocations needed.

Using gmake to build large system

I'm working on trying to fix/redo the makefile(s) for a legacy system, and I keep banging my head against some things. This is a huge software system, consisting of numerous executables, shared objects, and libraries, using C, Fortran, and a couple of other compilers (such as Motif's UIL compiler). I'm trying to avoid recursive make and I would prefer "one makefile to build them all" rather than the existing "one makefile per executable/.so/.a" idea. (We call each executable, shared object, library, et al a "task," so permit me to use that term as I go forward.)
The system is scattered across tons of different source subdirectories, with includes usually in one of 6 main directories, though not always.
What would be the best way to deal with the plethora of source & target directories? I would prefer including a Task.mk file for each task that would provide the task-specific details, but target-specific variables don't give me enough control. For example, they don't allow me to change the source & target directories easily, at least from what I've tried.
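To make the idea concrete, a per-task fragment might look something like the sketch below (all names and paths are invented for illustration), using task-prefixed variables and its own pattern rule rather than target-specific variables:
# tasks/foo/Task.mk -- hypothetical per-task fragment, included by the top-level makefile
foo_SRCDIR  := src/foo
foo_OBJDIR  := obj/foo
foo_SOURCES := $(wildcard $(foo_SRCDIR)/*.c)
foo_OBJECTS := $(patsubst $(foo_SRCDIR)/%.c,$(foo_OBJDIR)/%.o,$(foo_SOURCES))

foo: $(foo_OBJECTS)
	$(CC) -o $@ $^

$(foo_OBJDIR)/%.o: $(foo_SRCDIR)/%.c
	$(CC) $(CFLAGS) -c -o $@ $<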
Some things I cannot do (i.e., am not allowed to do) include:
Change the structure of the project. There's too much that would need to be changed everywhere to pull that off.
Use a different make. The official configuration (which our customer has guaranteed, so we don't need to deal with unknown configurations) uses gmake 3.81, period.

LoadLibrary from offset in a file

I am writing a scriptable game engine, for which I have a large number of classes that perform various tasks. The size of the engine is growing rapidly, and so I thought of splitting the large executable up into dll modules so that only the components that the game writer actually uses can be included. When the user compiles their game (which is to say their script), I want the correct dll's to be part of the final executable. I already have quite a bit of overlay data, so I figured I might be able to store the dll's as part of this block. My question boils down to this:
Is it possible to trick LoadLibrary into starting to read the file at a certain offset? That would save me from having to either extract the DLL into a temporary file, which is not clean, or scrap the automatic inclusion of DLLs altogether and simply instruct my users to package the DLLs along with their games.
Initially I thought of going for the "load dll from memory" approach but rejected it on grounds of portability and simply because it seems like such a horrible hack.
Any thoughts?
Kind regards,
Philip Bennefall
You are trying to solve a problem that doesn't exist. Loading a DLL doesn't actually require any physical memory. Windows creates a memory mapped file for the DLL content. Code from the DLL only ever gets loaded when your program calls that code. Unused code doesn't require any system resources beyond reserved memory pages. You have 2 billion bytes worth of that on a 32-bit operating system. You have to write a lot of code to consume them all, 50 megabytes of machine code is already a very large program.
The memory mapping is also the reason you cannot make LoadLibrary() do what you want to do. There is no realistic scenario where you need to.
Look into the linker's /DELAYLOAD option to improve startup performance.
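A sketch of what that might look like on the command line (the file names are made up); delay-loaded DLLs require linking against delayimp.lib in addition to the DLL's import library:
cl /c engine_main.cpp
link engine_main.obj audio_module.lib delayimp.lib /DELAYLOAD:audio_module.dll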
I think every solution for that task is a "horrible hack" and nothing more.
The simplest way I can see is to create your own virtual drive that presents a custom filesystem and redirects accesses from one real file (the compiled bundle of your libraries) to multiple separate DLLs, for example like TrueCrypt does (it's open source). Then you could use the LoadLibrary function without changes.
But the only right way I can see is to change your approach and not use this technique at all. I think you need to create your own script interpreter and compiler, using structures, pointers and so on.
The main thing is that I don't understand what benefit you get from using separate libraries. Compiled code these days does not weigh very much and can be packed very well. Any other resources can be loaded dynamically on first use. All you need to do is organize the working cycles of all the components of the script engine in the right way.

profile linking times with gcc/g++ and ld

I'm using g++ to compile and link a project consisting of about 15 C++ source files and 4 shared object files. Recently the linking time more than doubled, but I don't have the history of the makefile available to me. Is there any way to profile g++ to see what part of the linking is taking a long time?
Edit: After I noticed that the makefile was using -O3 optimizations all the time, I managed to halve the linking time just by removing that switch. Is there any good way I could have found this without trial and error?
Edit: I'm not actually interested in profiling how ld works. I'm interested in knowing how I can match increases in linking time to specific command line switches or object files.
Profiling g++ will prove futile, because g++ doesn't perform linking; the linker ld does.
Profiling ld will also likely not show you anything interesting, because linking time is most often dominated by disk I/O, and if your link isn't, you wouldn't know what to make of the profiling data, unless you understand ld internals.
If your link time is noticeable with only 15 files in the link, there is likely something wrong with your development system [1]; either it has a disk that is on its last legs and is constantly retrying, or you do not have enough memory to perform the link (linking is often RAM-intensive), and your system swaps like crazy.
Assuming you are on an ELF based system, you may also wish to try the new gold linker (part of binutils), which is often several times faster than the GNU ld.
[1] My typical links involve 1000s of objects, produce 200+MB executables, and finish in less than 60s.
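For example, with a sufficiently recent GCC you can select gold for a single link without changing the system default linker (the file names here are hypothetical):
g++ -fuse-ld=gold -o myprogram main.o util.o parser.o -lmylib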
If you have just hit your RAM limit, you'll probably be able to hear the disk working, and a system activity monitor will tell you so. But if linking is still CPU-bound (i.e., if CPU usage is still high), that's not the issue. And if linking is I/O-bound, the most common culprit can be runtime info. Have a look at the executable size anyway.
To answer your problem in a different way: are you making heavy use of templates? For each use of a template with a different type parameter, a new instance of the whole template is generated, so the linker gets more work. To make that actually noticeable, though, you'd need a library that is really heavy on templates; a lot of the ones from the Boost project qualify. I got template-based code bloat when using Boost::Spirit with a complex grammar: ~4000 lines of code compiled to a 7.7 MB executable, and changing one line doubled the number of specializations required and the size of the final executable. Inlining helped a lot, though, bringing the output down to 1.9 MB.
Shared libraries might be causing other problems; you might want to look at the documentation for -fvisibility=hidden, which will improve your code anyway. From the GCC manual for -fvisibility:
Using this feature can very substantially improve linking and load times of shared object libraries, produce more optimized code, provide near-perfect API export and prevent symbol clashes. It is *strongly* recommended that you use this in any shared objects you distribute.
In fact, the linker normally must support the possibility of the application, or of other libraries, overriding symbols defined in the library, even though this is typically not the intended usage. Note that using this is not free, however; it does require (trivial) code changes.
The link suggested by the docs is: http://gcc.gnu.org/wiki/Visibility
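As a rough illustration (the file names are hypothetical), you compile with hidden visibility by default and then explicitly re-export only the symbols you intend to be public, as described on that wiki page:
g++ -fvisibility=hidden -fPIC -c mylib.cpp -o mylib.o
g++ -shared -o libmylib.so mylib.o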
Both gcc and g++ support the -v verbose flag, which makes them output details of the current task.
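For example, adding -v to the link step (object names hypothetical) prints the exact collect2/ld command line that g++ runs, which helps when matching a slowdown to particular switches or inputs:
g++ -v -o myprogram main.o util.o -L. -lmylib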
If you're interested in really profiling the tools, you may want to check out Sysprof or OProfile.
