I need to compile a minimal Linux kernel, I mean one with just the basic, generic modules needed to get it working on low-resource machines.
Is there any specification of the minimum set of modules a kernel must have to accomplish this?
The only requirement is that it must be stable.
Where can I find information about it?
I'm not exactly sure what "accomplish my needs" means, but whenever I need something real small/quick/easy I use: http://www.damnsmalllinux.org/
OK, here is my problem: I do not know the correct terms to find what I am looking for on Google. So I hope someone here can help me out.
When developing real-time programs on embedded devices, you might have to iterate a few hundred or thousand times until you get the desired result. When using e.g. ARM devices, you wear out the internal flash quite quickly. So typically you develop your programs to reside in the RAM of the device and all is OK. This is done using GCC's functionality to split the code into various sections.
Unfortunately, the RAM of most devices is much smaller than the flash. So at some point your program gets too big to fit in RAM with all variables etc. (The device is sized on the assumption that the whole program will fit in flash later.)
Classical shared objects do not work, as there is nothing like a dynamic linker in my environment; there is no OS at all.
My idea was the following: for the controller it is no problem to execute code from both RAM and flash, and when the functions are compiled with the right attributes, it is also no big problem for the compiler to put part of the program in RAM and part in flash.
When I have some functionality running successfully, I build it into a library and put that in flash. The main development then happens in the 'volatile' part in RAM, so the flash is preserved.
The problem here is: I need to make sure that the library always gets linked to the exact same location as long as I do not reflash. So a given function must end up at the same flash address in every compile cycle. When something in the flash is missing, it must be placed in RAM, or a linking error must be thrown.
I thought about putting together a real library and linking against that. Here I am a bit lost. I need to tell GCC/LD to link against a prelinked file (and create such a prelinked file).
It should be possible to put all the library objects together and link this together in the flash. Then the addresses could be extracted and the main program (for use in RAM) can link against it. But: How to do these steps?
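The section-placement part of these steps can be sketched with a linker-script fragment. This is only an illustrative sketch, not code from the question: the section names (.flash_lib, .ramcode) and the MEMORY origins/lengths are made-up assumptions for a generic Cortex-M-style part.

```ld
/* Hypothetical fragment: pin library code to a fixed flash address and
   keep development code in RAM. All names and addresses are examples. */
MEMORY
{
  FLASH (rx)  : ORIGIN = 0x08000000, LENGTH = 512K
  RAM   (rwx) : ORIGIN = 0x20000000, LENGTH = 128K
}

SECTIONS
{
  /* Functions tagged __attribute__((section(".flash_lib"))) land here,
     at a fixed address, so they stay put between builds as long as the
     library objects themselves are unchanged. */
  .flash_lib ORIGIN(FLASH) :
  {
    KEEP(*(.flash_lib))   /* survive --gc-sections */
  } > FLASH

  /* The 'volatile' development code goes to RAM. */
  .ramcode :
  {
    *(.ramcode)
  } > RAM
}
```

For the "link against a prelinked file" part, GNU ld has an option that fits closely: `-R file` (alias `--just-symbols=file`) reads symbol names and their addresses from a previously linked image without including its contents, so the RAM build can resolve calls into the flash-resident library at its frozen addresses.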
On the internet there is the term prelink, as well as a matching Linux program intended to speed up load times. I do not know whether that program might help me as a side effect; I doubt it, but I do not understand the internals of how it works.
Do you have a good idea how to reach the goal?
You are solving a non-problem. Embedded flash usually has a minimum endurance of 10,000 write cycles. So even if you flash it 20 times a day, it will last a year and a half. An ST Nucleo board is $13, so that's less than 3 cents a day :-). The typical endurance is even longer, at about 100,000 cycles. It will be a long time before you wear them out.
Now if you are using them for dynamic storage, that might be a concern, depending on the usage patterns.
But to answer your question: you can build your code into a library .a file easily enough. However, the toolchain does not guarantee that it links the object code in any particular order, as that depends on the optimization level. Furthermore, only functions that are actually referenced are pulled in from a library, so if your function calls change, more or fewer library functions may be pulled in.
I used a ATmega649 before but then switched to ATmega649V.
Does it matter which MCU version is given to the compiler: ATmega649, ATmega649V or ATmega649P?
As I understand it, the architecture is exactly the same, and the only difference is some power saving that is somehow achieved without changing the architecture?
Using avr-gcc.
Well, you can use an "almost" compatible architecture with no harm, though you have to triple-check the datasheet that there is no difference in the way the registers are set up; otherwise your program won't work, or worse, will work until a feature fails. It is usually a source of frustration when you've forgotten that you've been using a close-enough, but not exact, match for the architecture you're targeting.
I don't know the ATmega649 variants well enough, and I won't thoroughly read the lengthy datasheets to find those differences. So if you decide to do it, be careful, and don't forget about that!
Usually the additional letters signal differences in maximum speed, supply-voltage ratings or power consumption. The core itself is compatible, so if the numbers match, there is no difference from the compiler's point of view.
However, the flashing tool may recognize them as different parts and require the correct settings.
I need to detect the GPU (video card) and choose application settings appropriate to its performance.
I am able to make a list of settings for each GPU model, but I don't understand how to easily detect the model of the GPU installed in a PC.
What is the best way to solve this task? Is there any way to do this that does not depend on the installed driver or other software?
The above comment by Ben Voigt summarizes it: Simply don't do it.
See if the minimum version of your favorite compute API (OpenCL or whatever) is supported, and if the required extensions are present, compile some kernels, and see if that produces errors. Run the kernels and benchmark them. Ask the API how much local/global memory you have available, what warp sizes it supports, and so on.
If you really insist on detecting the GPU model, prepare for trouble. There are two ways of doing this. One is parsing the graphics card's advertised human-readable name, which is asking for trouble right away (many cards that are hugely different advertise the same human-readable name, and some model names even lie about their architecture generation!).
The other, slightly better way is finding the vendor/model ID combination and looking that up. This works somewhat better, but it is equally painful and error-prone.
You can parse these vendor and model IDs from the "key" string inside the structure that you get when you call EnumDisplayDevices. If I remember correctly, Microsoft documents that field as "reserved"; in other words, it is kind of unsupported/undocumented.
Finding out the vendor is still relatively easy. A vendor ID of 0x10DE is nVidia, 0x1002 is AMD/ATI, and 0x8086 is Intel. However, sometimes, very rarely, a cheap OEM will advertise its own ID instead.
Then you have the kind of meaningless model number (it's not like bigger numbers are better, or some other obvious rule!), which you need to look up somewhere. nVidia and AMD publish these officially [1] [2], although they are not necessarily always up-to-date. There was a time when nVidia's list lacked the most recent models for almost one year (though the list I just downloaded seems to be complete). I'm not aware of other manufacturers, including Intel, doing this consistently.
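If you do go down the ID route, most of the work is plain string parsing once you have the device-ID string. The helper below is a hypothetical sketch (the function name is made up); it assumes the usual PNP-style "PCI\VEN_xxxx&DEV_xxxx" layout that the EnumDisplayDevices strings follow.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: extract the PCI vendor ID from a PNP device-ID
// string of the form "PCI\VEN_10DE&DEV_1C82&...". Returns 0 if no
// VEN_ field is present.
uint16_t vendor_from_device_id(const std::string& id) {
    const std::string tag = "VEN_";
    const auto pos = id.find(tag);
    if (pos == std::string::npos) return 0;
    // The four hex digits after "VEN_" are the vendor ID.
    return static_cast<uint16_t>(
        std::stoul(id.substr(pos + tag.size(), 4), nullptr, 16));
}
```

The same approach extracts the `DEV_` field for the model ID, which you would then look up in your table.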
Spending some time on Google will lead you to sites like this one, which are not "official" but may allow you to figure out most stuff anyway... in a painstaking manner.
And then, you know the model, and you have gained pretty much nothing. You still need to translate this to "good enough for what I want" or "not good enough".
Which you could have found out simply by compiling your kernels and seeing that no error is reported, and running them.
And what do you do in 6 months, when 3 new GPU models are released after your application, which obviously cannot know about them, has already shipped? How do you treat those?
I have to check the compatibility of a piece of software with systems (OS, device, browser, client). Only some of the systems are supported.
We have all four-parameter combinations for the compatible systems. Given the parameters of some system, I have to check its compatibility.
The best I can think of is to assign the OS values 0-9, the device values 100, 200, ..., 900, and similar ranges for the browser (1000, 2000, ..., 9000) and the client, then maintain a cache of all valid sums and check a given system against that cache.
Is there a better method? The above method has a scalability problem. Please suggest some similar algorithms.
To be absolutely sure some combination will work you will have to test it. If you have so many combinations to check that you cannot check each one, you can make assumptions about what is likely to go wrong, and find schemes that give you the most useful test under these assumptions.
If you assume that bugs can always be replicated by combining just two choices (e.g. Windows + device always gives trouble, regardless of browser and client) then you can find a scheme for testing every combination of two choices without testing every combination of everything - see http://en.wikipedia.org/wiki/All-pairs_testing
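To make the all-pairs idea concrete, here is a small sketch (the function name and parameter layout are invented for illustration) that checks whether a candidate test suite covers every pair of values across any two parameters:

```cpp
#include <string>
#include <vector>

// One configuration = one chosen value per parameter.
using Config = std::vector<std::string>;

// options[i] lists the possible values of parameter i. Returns true if
// every (value of parameter i, value of parameter j) pair, for i < j,
// appears together in at least one configuration of the suite.
bool covers_all_pairs(const std::vector<std::vector<std::string>>& options,
                      const std::vector<Config>& suite) {
    const size_t n = options.size();
    for (size_t i = 0; i < n; ++i)
        for (size_t j = i + 1; j < n; ++j)
            for (const auto& a : options[i])
                for (const auto& b : options[j]) {
                    bool found = false;
                    for (const auto& c : suite)
                        if (c[i] == a && c[j] == b) { found = true; break; }
                    if (!found) return false;  // this pair is never tested
                }
    return true;
}
```

With more than two parameters, a suite much smaller than the full cross product can still pass this check, which is exactly the saving pairwise testing buys you.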
Use a hash table. Virtually every language has them built in, together with methods to serialize them to a file.
You could create an object representing your system's configuration, hash it, and compare the hash to the hashes of the combinations known to work. This should solve your scalability issue.
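A minimal sketch of that approach, assuming string-valued parameters (the key format and function names are illustrative, not from the question):

```cpp
#include <string>
#include <unordered_set>

// Flatten one (os, device, browser, client) combination into a single
// lookup key. '|' is an arbitrary separator assumed not to occur in values.
std::string key(const std::string& os, const std::string& device,
                const std::string& browser, const std::string& client) {
    return os + "|" + device + "|" + browser + "|" + client;
}

// O(1) average-case membership test against the set of supported combos.
bool is_compatible(const std::unordered_set<std::string>& supported,
                   const std::string& os, const std::string& device,
                   const std::string& browser, const std::string& client) {
    return supported.count(key(os, device, browser, client)) != 0;
}
```

The supported set grows only with the number of valid combinations, not with the size of the value ranges, which avoids the scalability problem of the numeric-sum encoding.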
I'm working on converting an existing program to take advantage of some parallel functionality of the STL.
Specifically, I've rewritten a big loop to work with std::accumulate. It runs nicely.
Now, I want to have that accumulate operation run in parallel.
The documentation I've seen for GCC outlines two specific steps:
Add the compiler flag -D_GLIBCXX_PARALLEL
Possibly include the header <parallel/algorithm>
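For reference, the call site looks something like this minimal stand-in (sum_up_to is not my real code, just an illustration of the shape); the hope is that -D_GLIBCXX_PARALLEL swaps in a parallel std::accumulate without changing the call:

```cpp
#include <numeric>
#include <vector>

// Serial as written; compiled with -D_GLIBCXX_PARALLEL (and -fopenmp),
// libstdc++'s parallel mode is supposed to substitute a parallel
// std::accumulate behind the very same interface.
long sum_up_to(long n) {
    std::vector<long> v(static_cast<std::size_t>(n));
    std::iota(v.begin(), v.end(), 1L);               // 1, 2, ..., n
    return std::accumulate(v.begin(), v.end(), 0L);  // n*(n+1)/2
}
```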
Adding the compiler flag doesn't seem to change anything. The execution time is the same, and I don't see any indication of multiple core usage when monitoring the system.
I get an error when adding the <parallel/algorithm> header. I thought it would be included with the latest version of GCC (4.7).
So, a few questions:
Is there some way to definitively determine if code is actually running in parallel?
Is there a "best practices" way of doing this on OS X? (Ideal compiler flags, header, etc?)
Any and all suggestions are welcome.
Thanks!
See http://threadingbuildingblocks.org/
If you only ever parallelize STL algorithms, you are going to be disappointed in the results in general. Those algorithms generally only begin to show a scalability advantage when working over very large datasets (e.g. N > 10 million).
TBB (and others like it) work at a higher level, focusing on the overall algorithm design, not just the leaf functions (like std::accumulate()).
The second alternative is to use OpenMP, which is supported by both GCC and Clang; it is not STL by any means, but it is cross-platform.
The third alternative is to use Grand Central Dispatch, the official multicore API on OS X; again, hardly STL.
The fourth alternative is to wait for C++17, which will have a parallelism module.