How to link a program in GCC to prelinked library? - gcc

OK, I have the problem, I do not know exactly the correct terms in order to find what I am looking for on google. So I hope someone here can help me out.
When developing real time programs on embedded devices you might have to iterate a few hundred or thousand times until you get the desired result. When using e.g. ARM devices you wear out the internal flash quite quickly. So typically you develop your programs to reside in the RAM of the device and all is ok. This is done using GCC's functionality to split the code in various sections.
Unfortunately, the RAM of most devices is much smaller than the flash. So at one point in time, your program gets too big to fit in RAM with all variables etc. (You choose the size of the device such that one assumes it will fit the whole code in flash later.)
Classical shared objects do not work as there is nothing like a dynamical linker in my environment. There is no OS or such.
My idea was the following: For the controller it is no problem to execute code from both RAM and flash. When compiling with the correct attributes for the functions this is also no big problem for the compiler to put part of the program in RAM and part in flash.
When I have some functionality running successfully I create a library and put this in the flash. The main development is done in the 'volatile' part of the development in RAM. So the flash gets preserved.
The problem here is: I need to make sure that the library always gets linked to the exact same location as long as I do not reflash. So a single function must always be at the same address in flash for each compile cycle. When something is missing from the flash, it must be placed in RAM or a linking error must be thrown.
I thought about putting together a real library and linking against that. Here I am a bit lost. I need to tell GCC/LD to link against a prelinked file (and create such a prelinked file).
It should be possible to put all the library objects together and link this together in the flash. Then the addresses could be extracted and the main program (for use in RAM) can link against it. But: How to do these steps?
In the internet there is the term prelink as well as a matching program for linux. This is intended to speed up the loading times. I do not know if this program might help me out as a side effect. I doubt it but I do not understand the internals of its work.
Do you have a good idea how to reach the goal?

You are solving a non-problem. Embedded flash usually has a MINIMUM write cycle of 10,000. So even if you flash it 20 times a day, it will last about 500 days, well over a year. An ST Nucleo is $13. So that's less than 3 pennies a day :-). The TYPICAL write cycle is even longer, at about 100,000. It will be a long time before you wear them out.
Now if you are using them for dynamic storage, that might be a concern, depending on the usage patterns.
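The endurance estimate above can be checked with a few lines of arithmetic. This is only a sketch using the figures quoted in the answer (10,000 cycles, 20 flashes a day, a $13 board); the constant and function names are made up for illustration.

```python
# Back-of-the-envelope flash endurance estimate using the figures quoted above.
MIN_WRITE_CYCLES = 10_000   # minimum rated erase/write cycles (from the answer)
FLASHES_PER_DAY = 20        # development pace assumed in the answer
BOARD_COST_USD = 13.0       # quoted price of an ST Nucleo board


def endurance_days(cycles: int, flashes_per_day: int) -> float:
    """Days until the rated cycle count is used up at a constant flashing rate."""
    return cycles / flashes_per_day


days = endurance_days(MIN_WRITE_CYCLES, FLASHES_PER_DAY)
years = days / 365
cost_per_day = BOARD_COST_USD / days

print(f"{days:.0f} days (~{years:.2f} years), ${cost_per_day:.3f} per day")
```

With the typical figure of 100,000 cycles the same calculation gives ten times as long.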
But to answer your questions, you can build your code into a library .a file easily enough. However, GCC does not guarantee that the object code is linked in any particular order, as it depends on the optimization level. Furthermore, only functions that are referenced are pulled in from a library file, so if your function calls change, it may pull in more or fewer library functions.
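One way to approach the "extract the addresses and link against them" step from the question is to turn the symbol table of the already-flashed image into linker-script symbol assignments of the form `name = 0xADDR;`. The sketch below is an illustration, not a tested workflow: the `nm` output, the symbol names and the addresses are all hypothetical. GNU ld also has a `--just-symbols` (`-R`) option that reads symbol addresses from an existing file without including it in the output, which may achieve the same thing more directly.

```python
import re

# Hypothetical `nm` output for an already-flashed library image;
# the addresses and symbol names are made up for illustration.
NM_OUTPUT = """\
08004000 T lib_init
08004040 T lib_filter
08004100 t local_helper
         U memcpy
"""

# address, one-letter symbol type, symbol name
NM_LINE = re.compile(r"^([0-9a-fA-F]+)\s+([A-Za-z])\s+(\S+)$")


def symbols_to_linker_script(nm_text: str) -> str:
    """Turn global symbols from nm output into `name = 0xADDR;` assignments."""
    lines = []
    for raw in nm_text.splitlines():
        match = NM_LINE.match(raw.strip())
        if not match:
            continue  # undefined symbols have no address and are skipped here
        address, kind, name = match.groups()
        if kind.isupper():  # uppercase type letter means a global symbol
            lines.append(f"{name} = 0x{int(address, 16):08X};")
    return "\n".join(lines)


print(symbols_to_linker_script(NM_OUTPUT))
```

The generated assignments pin each library function to the address it already has in flash, so the RAM build would fail to link if it needed a function that is not listed. Any RAM data the library functions rely on would still have to be reserved separately.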

Related

Auto-detect GPU

I need to detect the GPU (video card) and configure the app's settings to match the GPU's performance.
I'm able to make a list with settings for each GPU model, but I don't understand how to easily detect the model of the GPU installed in the PC.
What is the best way to solve this task? Is there any way to do this that does not depend on the installed driver or other software?
The above comment by Ben Voigt summarizes it: Simply don't do it.
See if the minimum version of your favorite compute API (OpenCL or whatever) is supported, and if the required extensions are present, compile some kernels, and see if that produces errors. Run the kernels and benchmark them. Ask the API how much local/global memory you have available, what warp sizes it supports, and so on.
If you really insist on detecting the GPU model, prepare for trouble. There are two ways of doing this, one is parsing the graphic card's advertised human readable name, this is asking for trouble right away (since many cards that are hugely different will advertise the same human-readable name, and some model names even lie about their architecture generation!).
The other, slightly better way is finding the vendor/model ID combination and looking that one up. This works somewhat better but it is equally painful and error-prone.
You can parse these vendor and model IDs from the "key" string inside the structure that you get when you call EnumDisplayDevices. Which, if I remember correctly, Microsoft calls "reserved", in other words it's kind of unsupported/undocumented.
Finding out the vendor is still relatively easy. A vendor ID of 0x10DE is nVidia, and 0x1002 is AMD/ATI. 0x8086 is Intel. However, sometimes, very rarely, a cheapish OEM will advertise its own ID instead.
Then you have the kind of meaningless model number (it's not like bigger numbers are better, or some other obvious rule!), which you need to look up somewhere. nVidia and AMD publish these officially [1] [2], although they are not necessarily always up-to-date. There was a time when nVidia's list lacked the most recent models for almost one year (though the list I just downloaded seems to be complete). I'm not aware of other manufacturers, including Intel, doing this consistently.
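As a rough illustration of the vendor/model lookup described above, the following sketch pulls the two IDs out of a Windows-style hardware ID string of the form `PCI\VEN_xxxx&DEV_xxxx`. The sample string is hypothetical, the function names are invented for this example, and the vendor table contains only the IDs mentioned here (0x8086 being Intel's registered PCI vendor ID).

```python
import re

# PCI vendor IDs; 0x10DE and 0x1002 are quoted in the answer,
# 0x8086 is Intel's registered PCI vendor ID.
KNOWN_VENDORS = {0x10DE: "NVIDIA", 0x1002: "AMD/ATI", 0x8086: "Intel"}

ID_PATTERN = re.compile(r"VEN_([0-9A-Fa-f]{4})&DEV_([0-9A-Fa-f]{4})")


def parse_pci_ids(device_string: str):
    """Return (vendor_id, device_id) as ints, or None if no PCI IDs are present."""
    match = ID_PATTERN.search(device_string)
    if match is None:
        return None
    return int(match.group(1), 16), int(match.group(2), 16)


def vendor_name(vendor_id: int) -> str:
    """Map a vendor ID to a name; unknown IDs may belong to an OEM."""
    return KNOWN_VENDORS.get(vendor_id, "unknown (possibly an OEM-specific ID)")


# Hypothetical example string; real values depend on the installed hardware.
sample = r"PCI\VEN_10DE&DEV_1C82&SUBSYS_00000000&REV_A1"
ids = parse_pci_ids(sample)
print(ids, vendor_name(ids[0]))
```

Note that this only gets as far as the numeric IDs. Translating the device ID into a performance class still needs the lookup tables discussed above.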
Spending some time on Google will lead you to sites like this one, which are not "official" but may allow you to figure out most stuff anyway... in a painstaking manner.
And then, you know the model, and you have gained pretty much nothing. You still need to translate this to "good enough for what I want" or "not good enough".
Which you could have found out simply by compiling your kernels and seeing that no error is reported, and running them.
And what do you do in 6 months when 3 new GPU models are released after your application which obviously cannot know these has already shipped? How do you treat these?

AVR's Program memory

I've written some code in C for the ATmega128 and I'd like to know how the changes that I make in the code influence the program memory.
To be more specific, let's consider that the code is similar to that one:
d=fun1(a,b);
c=fun2(c,d);
the change that I do in the code is that I call the same functions more times e.g.:
d=fun1(a,b);
c=fun2(c,d);
h=fun1(k,l);
n=fun2(p,m);
etc...
I build the solution in Atmel Studio 6.1 and I see the changes in the program memory.
Is there any way to foresee, without building the solution, how the changes in the code will affect the program memory?
Thanks!!
Generally speaking this is next to impossible using C/C++ (that means the effort does not pay off).
In your simple case (the number of calls increases), you can determine the number of instructions for each call and multiply by the number of calls. This will only be correct if the compiler does not inline the calls and does not apply optimizations at a higher level.
These calculations might be wrong, if you upgrade to a newer gcc version.
So normally you only get exact numbers when you compare two builds (same compiler version, same optimisations). avr-size and avr-nm give you all the information you need, for example to compare functions by size. You can automate this task (by converting the output into .csv files), and use a spreadsheet or diff to look for changes.
This method normally only pays off, if you have to squeeze a program into a smaller device (from 4k flash into 2k for example - you already have 128k flash, that's quite a lot).
This process is frustrating, because if you apply the same design pattern in C with small differences, it can lead to different sizes: So from C/C++, you cannot really predict what's going to happen.
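The build-comparison approach described above could be automated along these lines. This is a sketch that assumes symbol listings in the four-column form printed by `avr-nm -S --size-sort` (address, size in hex, type, name); the two listings below are hypothetical, not real build output.

```python
# Hypothetical symbol listings from two builds: address, size (hex), type, name.
BUILD_A = """\
00000100 00000024 T fun1
00000124 00000030 T fun2
00000154 00000040 T main
"""
BUILD_B = """\
00000100 00000024 T fun1
00000124 00000030 T fun2
00000154 00000068 T main
"""


def symbol_sizes(nm_text: str) -> dict:
    """Map each symbol name to its size in bytes."""
    sizes = {}
    for line in nm_text.splitlines():
        parts = line.split()
        if len(parts) == 4:  # ignore lines that lack a size column
            _address, size_hex, _kind, name = parts
            sizes[name] = int(size_hex, 16)
    return sizes


def size_changes(before: str, after: str) -> dict:
    """Return {symbol: byte delta} for every symbol whose size changed."""
    old, new = symbol_sizes(before), symbol_sizes(after)
    names = sorted(set(old) | set(new))
    deltas = {name: new.get(name, 0) - old.get(name, 0) for name in names}
    return {name: delta for name, delta in deltas.items() if delta != 0}


print(size_changes(BUILD_A, BUILD_B))
```

In this made-up example only `main` grows, because the extra calls were added there while `fun1` and `fun2` themselves are unchanged.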

how to minimize a programming language compile time?

I was thinking more about the programming language I am designing, and I was wondering: what are ways I could minimize its compile time?
Your main problem today is I/O. Your CPU is many times faster than main memory and memory is about 1000 times faster than accessing the hard disk.
So unless you do extensive optimizations to the source code, the CPU will spend most of the time waiting for data to be read or written.
Try these rules:
Design your compiler to work in several, independent steps. The goal is to be able to run each step in a different thread so you can utilize multi-core CPUs. It will also help to parallelize the whole compile process (i.e. compile more than one file at the same time)
It will also allow you to load many source files in advance and preprocess them so the actual compile step can work faster.
Try to allow to compile files independently. For example, create a "missing symbol pool" for the project. Missing symbols should not cause compile failures as such. If you find a missing symbol somewhere, remove it from the pool. When all files have been compiled, check that the pool is empty.
Create a cache with important information. For example: File X uses symbols from file Y. This way, you can skip compiling file Z (which doesn't reference anything in Y) when Y changes. If you want to go one step further, put all symbols which are defined anywhere in a pool. If a file changes in such a way that symbols are added/removed, you will know immediately which files are affected (without even opening them).
Compile in the background. Start a compiler process which checks the project directory for changes and compiles them as soon as the user saves the file. This way, you will only have to compile a few files each time instead of everything. In the long run, you will compile much more, but for the user, turnaround times will be much shorter (= the time the user has to wait until she can run the compiled result after a change).
Use a "Just in time" compiler (i.e. compile a file when it is used, for example in an import statement). Projects are then distributed in source form and compiled when run for the first time. Python does this. To make this perform, you can precompile the library during the installation of your compiler.
Don't use header files. Keep all information in a single place and generate header files from the source if you have to. Maybe keep the header files just in memory and never save them to disk.
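The dependency cache from the rules above can be sketched as a map from each file to the files whose symbols it uses. The file names and the `USES` table are hypothetical, and the propagation here is deliberately conservative: it marks every transitive user as dirty, where a finer-grained cache would stop once a file's exported symbols turn out to be unchanged.

```python
from collections import defaultdict

# Hypothetical project: each file maps to the set of files whose symbols it uses.
USES = {
    "x.src": {"y.src"},   # file X uses symbols from file Y
    "z.src": set(),       # file Z references nothing in Y
    "w.src": {"x.src"},
    "y.src": set(),
}


def files_to_recompile(changed: str, uses: dict) -> set:
    """Return the changed file plus every file that depends on it, transitively."""
    used_by = defaultdict(set)  # inverted map: file -> files that use it
    for user, dependencies in uses.items():
        for dependency in dependencies:
            used_by[dependency].add(user)

    dirty, pending = {changed}, [changed]
    while pending:
        current = pending.pop()
        for user in used_by[current]:
            if user not in dirty:
                dirty.add(user)
                pending.append(user)
    return dirty


print(sorted(files_to_recompile("y.src", USES)))
```

When `y.src` changes, `z.src` is skipped without being opened, which is the point of keeping the cache.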
what are ways i could minimize its compile time?
No compilation (interpreted language)
Delayed (just in time) compilation
Incremental compilation
Precompiled header files
I've implemented a compiler myself, and ended up having to look at this once people started batch feeding it hundreds of source files. I was quite surprised by what I found out.
It turns out that the most important thing you can optimize is not your grammar. It's not your lexical analyzer or your parser either. Instead, the most important thing in terms of speed is the code that reads in your source files from disk. I/O's to disk are slow. Really slow. You can pretty much measure your compiler's speed by the number of disk I/Os it performs.
So it turns out that the absolute best thing you can do to speed up a compiler is to read the entire file into memory in one big I/O, do all your lexing, parsing, etc. from RAM, and then write out the result to disk in one big I/O.
I talked with one of the head guys maintaining Gnat (GCC's Ada compiler) about this, and he told me that he actually used to put everything he could onto RAM disks so that even his file I/O was really just RAM reads and writes.
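The "one big read, work in RAM, one big write" structure described above might look like this in outline. It is a toy sketch: `compile_file` is a hypothetical name, and whitespace splitting and upper-casing merely stand in for real lexing and code generation.

```python
import tempfile
from pathlib import Path


def compile_file(source: Path, target: Path) -> int:
    """Read the whole source at once, process it in memory, write the result once."""
    text = source.read_text(encoding="utf-8")           # single whole-file read
    tokens = text.split()                                # stand-in for lexing/parsing
    output = "\n".join(tok.upper() for tok in tokens)    # stand-in for code generation
    target.write_text(output, encoding="utf-8")          # single whole-file write
    return len(tokens)


# Small demonstration on a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "demo.src"
    out = Path(tmp) / "demo.out"
    src.write_text("let x = 1\nlet y = x", encoding="utf-8")
    count = compile_file(src, out)
    result = out.read_text(encoding="utf-8")

print(count, "tokens")
```

The important property is that no stage between the read and the write touches the disk.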
In most languages (pretty well everything other than C++), compiling individual compilation units is quite fast.
Binding/linking is often what's slow - the linker has to reference the whole program rather than just a single unit.
C++ suffers as - unless you use the pImpl idiom - it requires the implementation details of every object and all inline functions to compile client code.
Java (source to bytecode) suffers because the grammar doesn't differentiate objects and classes - you have to load the Foo class to see if Foo.Bar.Baz is the Baz field of object referenced by the Bar static field of the Foo class, or a static field of the Foo.Bar class. You can make the change in the source of the Foo class between the two, and not change the source of the client code, but still have to recompile the client code, as the bytecode differentiates between the two forms even though the syntax doesn't. AFAIK Python bytecode doesn't differentiate between the two - modules are true members of their parents.
C++ and C suffer if you include more headers than are required, as the preprocessor has to process each header many times, and the compiler has to compile them. Minimizing header size and complexity helps, suggesting better modularity would improve compilation time. It's not always possible to cache header compilation, as the definitions that are present when the header is preprocessed can alter its semantics, and even its syntax.
C suffers if you use the preprocessor a lot, but the actual compilation is fast; much of C code uses typedef struct _X* X_ptr to hide implementation better than C++ does - a C header can easily consist of typedefs and function declarations, giving better encapsulation.
So I'd suggest making your language hide implementation details from client code, and if you are an OO language with both instance members and namespaces, make the syntax for accessing the two unambiguous. Allow true modules, so client code only has to be aware of the interface rather than implementation details. Don't allow preprocessor macros or other variation mechanism to alter the semantics of referenced modules.
Here are some performance tricks that we've learned by measuring compilation speed and what affects it:
Write a two-pass compiler: characters to IR, IR to code. (It's easier to write a three-pass compiler that goes characters -> AST -> IR -> code, but it's not as fast.)
As a corollary, don't have an optimizer; it's hard to write a fast optimizer.
Consider generating bytecode instead of native machine code. The virtual machine for Lua is a good model.
Try a linear-scan register allocator or the simple register allocator that Fraser and Hanson used in lcc.
In a simple compiler, lexical analysis is often the greatest performance bottleneck. If you are writing C or C++ code, use re2c. If you're using another language (which you will find much more pleasant), read the paper about re2c and apply the lessons learned.
Generate code using maximal munch, or possibly iburg.
Surprisingly, the GNU assembler is a bottleneck in many compilers. If you can generate binary directly, do so. Or check out the New Jersey Machine-Code Toolkit.
As noted above, design your language to avoid anything like #include. Either use no interface files or precompile your interface files. This tactic dramatically reduces the burden on the lexer, which as I said is often the biggest bottleneck.
Here's a shot..
Use incremental compilation if your toolchain supports it.
(make, visual studio, etc).
For example, in GCC/make, if you have many files to compile, but only make changes in one file, then only that one file is compiled.
Eiffel had an idea of different states of frozen, and recompiling didn't necessarily mean that the whole class was recompiled.
How much can you break up the compilable modules, and how much do you care to keep track of them?
Make the grammar simple and unambiguous, and therefore quick and easy to parse.
Place strong restrictions on file inclusion.
Allow compilation without full information whenever possible (eg. predeclaration in C and C++).
One-pass compilation, if possible.
One thing surprisingly missing in answers so far: make sure you're using a context-free grammar, etc. Have a good hard look at languages designed by Wirth such as Pascal & Modula-2. You don't have to reimplement Pascal, but the grammar design is custom made for fast compiling. Then see if you can find any old articles about the tricks Anders pulled implementing Turbo Pascal. Hint: table driven.
it depends on what language/platform you're programming for. for .NET development, minimise the number of projects that you have in your solution.
In the old days you could get dramatic speedups by setting up a RAM drive and compiling there. Don't know if this still holds true, though.
In C++ you could use distributed compilation with tools like Incredibuild
A simple one: make sure the compiler can natively take advantage of multi-core CPUs.
Make sure that everything can be compiled the first time you try to compile it. E.g. ban forward references.
Use a context free grammar so that you can find the correct parse tree without a symbol table.
Make sure that the semantics can be deduced from the syntax so you can construct the correct AST directly rather than by mucking with a parse tree and symbol table.
How serious a compiler is this?
Unless the syntax is pretty convoluted, the parser should be able to run no more than 10-100 times slower than just indexing through the input file characters.
Similarly, code generation should be limited by output formatting.
You shouldn't be hitting any performance issues unless you're doing a big, serious compiler, capable of handling mega-line apps with lots of header files.
Then you need to worry about precompiled headers, optimization passes, and linking.
I haven't seen much work done for minimizing the compile time. But some ideas do come to mind:
Keep the grammar simple. Convoluted grammar will increase your compile time.
Try making use of parallelism, either using multicore GPU or CPU.
Benchmark a modern compiler and see what the bottlenecks are and what you can do in your compiler/language to avoid them.
Unless you are writing a highly specialized language, compile time is not really an issue..
Make a build system that doesn't suck!
There's a huge amount of programs out there with maybe 3 source files that take under a second to compile, but before you get that far you'd have to sit through an automake script that takes about 2 minutes checking things like the size of an int. And if you go to compile something else a minute later, it makes you sit through almost exactly the same set of tests.
So unless your compiler is doing awful things to the user like changing the size of its ints or changing basic function implementations between runs, just dump that info out to a file and let them get it in a second instead of 2 minutes.

What's a good profiling tool to use when source code isn't available?

I have a big problem. My boss told me that he wants two "magic black boxes":
1- something that takes a microprocessor as input and returns, as output, its MIPS and/or MFLOPS.
2- something that takes C code as input and returns, as output, something that characterizes the code in terms of performance (something like the MIPS that a uP must have to execute the code in a given time).
So the first "black box", I think, could be an EEMBC or SPEC benchmark: different uPs, same benchmark, which returns the MIPS/MFLOPS of each uP. The first problem is OK (I hope).
But the second... the second black box is my nightmare. The only thing I have found is to use a profiling tool, but I am asking for a particular kind of profiling tool.
Does anybody know a profiling tool that takes simple C code as input and gives me, as output, the performance characteristics of my C code (or the number of times each assembly instruction is executed)?
The real problem is that we must choose the correct uP for a certain piece of C code, but we want a uP tailored to our C code. So if we know the MIPS of a uP (and its architectural structure, memory structure...) and what our code needs, we can match the two.
Thanks to everyone
I have to agree with Adam, though I would be a little more gracious about it. Compiler optimizations only matter in hotspot code, i.e. tight loops that a) don't call functions, and b) take a large percentage of time.
On a positive note, here's what I would suggest:
Run the C code on a processor, any processor. On that processor, find out what takes the most time.
You could use a profiler for this. The simple method I prefer is to just run it under a debugger and manually halt it, some number of times (like 10) and each time write down the call stack. I suppose there is something in the code taking a good percentage of the time, like 50%. If so, you will see it doing that thing on roughly that percentage of samples, so you won't have to guess what it is.
If that activity is something that would be helped by some special processor, then try that processor.
It is important not to guess. If you say "I think this needs a DSP chip", or "I think it needs a multi-core chip", that is a guess. The guess might be right, but probably not. It is probably the case that what takes the most time is something you never would guess, like memory management or I/O formatting. Performance issues are very good at hiding from you.
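The manual halting method described above amounts to estimating, for each function, the fraction of samples whose call stack contains it. A minimal sketch follows; the ten call stacks are invented for illustration and would in practice be written down from the debugger.

```python
from collections import Counter

# Hypothetical call stacks noted on 10 manual halts (outermost frame first).
SAMPLES = [
    ["main", "process", "format_output"],
    ["main", "process", "format_output"],
    ["main", "process", "alloc"],
    ["main", "process", "format_output"],
    ["main", "read_input"],
    ["main", "process", "format_output"],
    ["main", "process", "alloc"],
    ["main", "process", "format_output"],
    ["main", "process", "filter"],
    ["main", "process", "format_output"],
]


def inclusive_share(samples):
    """Fraction of samples in which each function appears anywhere on the stack."""
    counts = Counter()
    for stack in samples:
        for function in set(stack):  # count a function once per sample
            counts[function] += 1
    return {function: n / len(samples) for function, n in counts.items()}


shares = inclusive_share(SAMPLES)
for function, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{function:15s} {share:.0%}")
```

In this made-up data `format_output` is on the stack in 6 of 10 samples, so output formatting rather than arithmetic would be the thing to look at before picking a specialised processor.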
No. If someone made a tool that could analyse (non-trivial) source code and tell you its performance characteristics, it would be commonplace, i.e. everyone would be using it.
Until source code is compiled for a particular target architecture, you will not be able to determine its overall performance. For instance, a parallelising compiler targeting n processors might conceivably be able to change an O(n^2) algorithm to one of O(n).
You won't find a tool to do what you want.
Your only option is to cross-compile the code and profile it on an emulator for the architecture you're running. The problem with profiling high level code is the compiler makes a stack of optimizations that are non trivial and you'd need to know how the particular compiler did that.
It sounds dumb, but why do you want to fit your code to a uP and a uP to your code? If you're writing signal processing buy a DSP. If you're building a SCADA box then look into Atmel or ARM stuff. Are you building a general purpose appliance with a user interface? Look into PPC or X86 compatible stuff.
Simply put, choose a bloody architecture that's suitable and provides the features you need. Optimization before choosing the processor is retarded (very roughly paraphrasing Knuth).
Fix the architecture at something roughly appropriate, work out roughly the processing requirements (you can scratch up an estimate by hand which will always be too high when looking at C code) and buy a uP to match.

What's the difference between Managed/Byte Code and Unmanaged/Native Code?

Sometimes it's difficult to describe some of the things that "us programmers" may think are simple to non-programmers and management types.
So...
How would you describe the difference between Managed Code (or Java Byte Code) and Unmanaged/Native Code to a Non-Programmer?
Managed Code == "Mansion House with an entire staff of Butlers, Maids, Cooks & Gardeners to keep the place nice"
Unmanaged Code == "Where I used to live in University"
Think of your desk: if you clean it up regularly, there's space to set what you're actually working on in front of you. If you don't clean it up, you run out of space.
That space is equivalent to computer resources like RAM, Hard Disk, etc.
Managed code lets the system automatically choose when and what to clean up. Unmanaged code makes the process "manual", in that the programmer needs to tell the system when and what to clean up.
I'm astonished by what emerges from this discussion (well, not really but rhetorically). Let me add something, even if I'm late.
Virtual Machines (VMs) and Garbage Collection (GC) are decades old and are two separate concepts. Garbage-collected languages compiled to native code exist, and have existed for decades (canonical example: ANSI Common Lisp; there is even at least one declarative language with compile-time garbage collection, Mercury, but apparently the masses scream at Prolog-like languages).
Suddenly GCed byte-code based VMs are a panacea for all IT diseases. Sandboxing of existing binaries (other examples here, here and here)? Principle of least authority (POLA)/capabilities-based security? Slim binaries (or its modern variant SafeTSA)? Region inference? No, sir: Microsoft & Sun do not authorize us even to think about such perversions. No, better rewrite our entire software stack for this wonderful(???) new(???) language§/API. As one of our hosts says, it's Fire and Motion all over again.
§ Don't be silly: I know that C# is not the only language that targets .Net/Mono; it's hyperbole.
Edit: it is particularly instructive to look at comments to this answer by S.Lott in the light of alternative techniques for memory management/safety/code mobility that I pointed out.
My point is that non technical people don't need to be bothered with technicalities at this level of detail.
On the other hand, if they are impressed by Microsoft/Sun marketing, it is necessary to explain to them that they are being fooled: GCed byte-code based VMs are not the novelty they are claimed to be, they don't magically solve every IT problem, and alternatives to these implementation techniques exist (some are better).
Edit 2: Garbage Collection is a memory management technique and, like every implementation technique, needs to be understood to be used correctly. Look how, at ITA Software, they bypass GC to obtain good performance:
4 - Because we have about 2 gigs of static data we need rapid access to, we use C++ code to memory-map huge files containing pointerless C structs (of flights, fares, etc), and then access these from Common Lisp using foreign data accesses. A struct field access compiles into two or three instructions, so there's not really any performance penalty for accessing C rather than Lisp objects. By doing this, we keep the Lisp garbage collector from seeing the data (to Lisp, each pointer to a C object is just a fixnum, though we do often temporarily wrap these pointers in Lisp objects to improve debuggability). Our Lisp images are therefore only about 250 megs of "working" data structures and code.
...
9 - We can do 10 seconds of Lisp computation on a 800mhz box and cons less than 5k of data. This is because we pre-allocate all data structures we need and die on queries that exceed them. This may make many Lisp programmers cringe, but with a 250 meg image and real-time constraints, we can't afford to generate garbage. For example, rather than using cons, we use "cons!", which grabs cells from an array of 10,000,000 cells we've preallocated and which gets reset every query.
Edit 3: (to avoid misunderstanding) is GC better than fiddling directly with pointers? Most of the time, certainly, but there are alternatives to both. Is there a need to bother users with these details? I don't see any evidence that this is the case, besides dispelling some marketing hype when necessary.
I'm pretty sure the basic interpretation is:
Managed = resource cleanup managed by runtime (i.e. Garbage Collection)
Unmanaged = clean up after yourself (i.e. malloc & free)
Perhaps compare it with investing in the stock market.
You can buy and sell shares yourself, trying to become an expert in what will give the best risk/reward - or you can invest in a fund which is managed by an "expert" who will do it for you - at the cost of you losing some control, and possibly some commission. (Admittedly I'm more of a fan of tracker funds, and the stock market "experts" haven't exactly done brilliant recently, but....)
Here's my Answer:
Managed (.NET) or Byte Code (Java) will save you time and money.
Now let's compare the two:
Unmanaged or Native Code
You need to do your own resource (RAM / Memory) allocation and cleanup. If you forget something, you end up with what's called a "Memory Leak" that can crash the computer. A Memory Leak is a term for when an application starts using up (eating up) RAM/Memory but does not let it go so the computer can use it for other applications; eventually this causes the computer to crash.
In order to run your application on different Operating Systems (Mac OSX, Windows, etc.) you need to compile your code specifically for each Operating System, and possibly change a lot of code that is Operating System specific so it works on each Operating System.
.NET Managed Code or Java Byte Code
All the resource (RAM / Memory) allocation and cleanup are done for you and the risk of creating "Memory Leaks" is reduced to a minimum. This allows more time to code features instead of spending it on resource management.
In order to run your application on different Operating Systems (Mac OSX, Windows, etc.) you just compile once, and it'll run on each as long as they support the given Framework your app runs on top of (.NET Framework / Mono or Java).
In Short
Developing using the .NET Framework (Managed Code) or Java (Byte Code) makes it cheaper overall to build an application that can target multiple operating systems with ease, and allows more time to be spent building rich features instead of the mundane tasks of memory/resource management.
Also, before anyone points out that the .NET Framework doesn't support multiple operating systems, I need to point out that technically Windows 98, WinXP 32-bit, WinXP 64-bit, WinVista 32-bit, WinVista 64-bit and Windows Server are all different Operating Systems, but the same .NET app will run on each. And, there is also the Mono Project that brings .NET to Linux and Mac OSX.
Unmanaged code is a list of instructions for the computer to follow.
Managed code is a list of tasks for the computer to follow, where the computer is free to decide on its own how to accomplish them.
The big difference is memory management. With native code, you have to manage memory yourself. This can be difficult and is the cause of a lot of bugs and lot of development time spent tracking down those bugs. With managed code, you still have problems, but a lot less of them and they're easier to track down. This normally means less buggy software, and less development time.
There are other differences, but memory management is probably the biggest.
If they were still interested I might mention how a lot of exploits are from buffer overruns and that you don't get that with managed code, or that code reuse is now easy, or that we no longer have to deal with COM (if you're lucky anyway). I'd probably stay away from COM, otherwise I'd launch into a tirade over how awful it is.
It's like the difference between playing pool with and without bumpers along the edges. Unless you and all the other players always make perfect shots, you need something to keep the balls on the table. (Ignore intentional ricochets...)
Or use soccer with walls instead of sidelines and endlines, or baseball without a backstop, or hockey without a net behind the goal, or NASCAR without barriers, or football without helmets...
"The specific term managed code is particularly pervasive in the Microsoft world."
Since I work in MacOS and Linux world, it's not a term I use or encounter.
The Brad Abrams "What is Managed Code" blog post has a definition that says things like ".NET Framework Common Language Runtime".
My point is this: it may not be appropriate to explain the term at all. If it's a bug, hack or work-around, it's not very important. Certainly not important enough to work up a sophisticated layperson's description. It may vanish with the next release of some batch of MS products.
