ATmegaXXX V, P, does it matter for compilation?

ATmegaXXX V, P, does it matter for compilation? - gcc

I used a ATmega649 before but then switched to ATmega649V.
Does it matter which MCU version given to the compiler, ATmega649, ATmega649V or ATmega649P?
I understand it as the architecture is exactly the same it is only some powersaving that is somehow achieved without changing the architecture that is the difference?
Using avr-gcc.

well, you can use an "almost" compatible architecture with no harm, though you have to triple check the datasheet that there's no difference in the way registers are setup otherwise your program won't work, or worst will work until a feature is failing. It is usually a source of frustration when you've forgotten you've been using a close enough, but not exactly the architecture you're targetting.
I don't know well enough the Atmega649X, and I won't thoroughly read the lengthy datasheets to find those differences. So if you decide to do it, be careful, and don't forget about that!

usually the additional letters signalize differences in max speed, supply voltage ratings or power consumptions. the core itself is compatible. so if numbers match, it is no difference from the compilers point of view.
however the flash tool may recognize them as different parts and require correct settings.

Related

Differences between how a simulator and synthesizer treats VHDL code

In a recent question (Difference in initializing a state machine between a simulator and synthesizer) I found out that simulators and sythesizers do not always treat VHDL code in exactly the same way. For example, when initialising a state machine using an enumerated type a simulator defaults to the enumerator's left hand value; however, it does not appear so clear cut as to the value a synthesizer defaults to.
Being relatively new to VHDL and FPGAs, it got me wondering as to whether there are other differences between the two that would be useful to know about. Does anyone know of any such differences that they would share? Even links to other places explaining such differences would be useful.
Thanks

I come from Verilog but about the same rules apply.
1/ Do not use any initialization, use a reset.
2/ Do not use sensitivity list. Use always #( * ) or always_comb
I don't know the VHDL equivalent of this one, but I assume somebody will point it out in a comment soon ;-)
3/ Never assume, always know what kind of logic will be generated. If you are not sure, use a different language construct or find out.
4/ Be fussy, meticulous, precise, excessively orderly, the best description is: be anal!
By the way I followed the mentioned post and was somewhat stunned by it. I'll be honest: I don't like VHDL for many reasons and thought the one asset of it was that errors like that where not possible because of the tight type & vector length checking. Obviously not, so the only remaining reason for using VHDL goes out the window for me.

ARM Cortex-M compiler differences

I'm about to develop some firmwares for Cortex-M cores on STM32 processors using C for my projects, and searching on the web I've found a lot of different compilers:
Keil, IAR, Linaro, Yagarto and GNU Tools for ARM Embedded Processors.
I was wondering, what functional differences are there between these compilers that might influence my choice? For example as an enthusiast I don't need support or assistance from the vendor, and a limitation on the code size is OK for the moment. Also the ease of use is not a main concern since I like to learn (and for the moment I have both Keil Lite and Eclipse with GNU ARM configured and working).
Is the generated code so different in terms of size/speed between these compilers? Are there any comparison table? (I've found only stale infos on the web)

benchmarking is an artform in and of itself, usually easy to manipulate the results to show whatever you want. I would not expect the compilers to generate the same results except for very small test cases, and sometimes in those small test cases their results are either identical or sometimes vastly different as your test has exposed an optimization that one compiler knows/uses and one the other doesnt.
I used to keep track of such things (compiler performance numbers) with dhrystone for example, but in the case of known benchmarks (not that dhrystone means much anymore, but others) you may find that some compilers are tuning themselves to look good under benchmarks perhaps at the expense of something else.
There is no right answer, there is no universal "best", it is all in the eye of the beholder, you. Which tool is easier for you to use, which do you like better be it for the gui or pretty colors or sound card sounds or whatever. And go from there.
The gnu compiler generally for applications I have tested does not produce code as "fast" which is my benchmark, compared to the others, but there are way more people using the free gnu tools so the support for it is considerably wider due to the number of web pages and forums and examples. gnu wont have a size restriction either, but it may require more learning or whatever to get up and running...
The cortex-ms are split into the armv6m and armv7m families, the v6m (cortex-m0) only have a small number of thumb2 extensions, the armv7m have about 150 thumbv2 extensions to thumb, so you need to know what your tools support and not use the wrong stuff on the wrong chip. Then the compilers if they know all of this may and will produce different instruction mixes from the same source code. Further within the same compiler or family using different command line options you can/will get vastly different code. And then beyond that with a cortex-m4 with cache on if you have one with such a thing, depending on how the code lies in the cache lines you may get vastly different performance, so benchmarking is a research project in itself for each blob of C code you want to benchmark. The performance range within a single compiler may shadow another compiler or the overlap may be enough to not matter.
If you have access to the tools you add value to yourself professionally by learning to use the competing tools and being able to walk into a job and or within your job choose what you see as the right tool for the job or walk into a Kiel house and be able to work right away or a gnu house and work right away. Where you might lose a job if you are gnu only and the job is for a Kiel house.

We have done some comparisons; IAR and Keil typically outperform GCC with default settings. But with some compiler flags you can make GCC come pretty close to the result of IAR and Keil.
Some of the compilers you mention are integrated development environments. Others are just plain compilers.
Some people prefer a integrated environment with compiler, editor and debugger nicely packaged for you. Others prefer to set up their own environment. It is a matter of taste.
In addition to Yagarto, there is also the "Code Sourcery" distribution of GCC for ARM.

Performance should not be your first concern unless when it becomes so in a production environment. The reason is that first, most ARM compilers are plenty good enough, and really you are down to GCC based, Keil, and IAR. Second, most ARM MCU are "blazingly fast" and have "so much memory" (these are comparing to 8-bit MCU like AVR/PIC but also to older PC). A decent Cortex-M4 MCU runs up to 100MHz and has 256K of flash. Again, to put it in perspective, that's more memory and 10x faster clock rate than the original Macintosh etc. We went to the Moon with much less ;-)
Now the performance of the tools itself, in particular, the IDE and the debuggers, differ greatly. For example, the popular Eclipse is written in Java, might be a bit sluggish to slower or memory-starved PCs. The best thing to do is to install GCC+Eclipse, and the vendors' demos and see for yourself.

Parallel STL algorithms in OS X

I working on converting an existing program to take advantage of some parallel functionality of the STL.
Specifically, I've re-written a big loop to work with std::accumulate. It runs, nicely.
Now, I want to have that accumulate operation run in parallel.
The documentation I've seen for GCC outline two specific steps.
Include the compiler flag -D_GLIBCXX_PARALLEL
Possibly add the header <parallel/algorithm>
Adding the compiler flag doesn't seem to change anything. The execution time is the same, and I don't see any indication of multiple core usage when monitoring the system.
I get an error when adding the parallel/algorithm header. I thought it would be included with the latest version of gcc (4.7).
So, a few questions:
Is there some way to definitively determine if code is actually running in parallel?
Is there a "best practices" way of doing this on OS X? (Ideal compiler flags, header, etc?)
Any and all suggestions are welcome.
Thanks!

See http://threadingbuildingblocks.org/
If you only ever parallelize STL algorithms, you are going to disappointed in the results in general. Those algorithms generally only begin to show a scalability advantage when working over very large datasets (e.g. N > 10 million).
TBB (and others like it) work at a higher level, focusing on the overall algorithm design, not just the leaf functions (like std::accumulate()).

Second alternative is to use OpenMP, which is supported by both GCC and
Clang, though is not STL by any means, but is cross-platform.
Third alternative is to use Grand Central Dispatch - the official multicore API in OSX, again hardly STL.
Forth alternative is to wait for C++17, it will have Parallelism module.

Downloading the binary code from an AT89S52 chip

I have an AT89S52, and I want to read the program burned on it.
Is there a way to do it with the programming interface?
(I am well aware it will be assembly code, but I think I can handle it, since I'm looking for a specific string in that code)
Thanks

You may not be able to (at all easily anyway) -- that chip has three protection bits that are intended to prevent you from doing so. If you're dealing with some sort of commercial product, chances are pretty good that those bits will be set.
Reference: Datasheet, page 20, section 17.

What's a good profiling tool to use when source code isn't available?

I have a big problem. My boss said to me that he wants two "magic black box":
1- something that receives a micropocessor like input and return, like output, the MIPS and/or MFLOPS.
2- something that receives a c code like input and return, like output, something that can characterize the code in term of performance (something like the necessary MIPS that a uP must have to execute the code in some time).
So the first "black box" I think could be a benchmark of EEMBC or SPEC...different uP, same benchmark that returns MIPS/MFLOPS of each uP. The first problem is OK (I hope)
But the second...the second black box is my nightmare...the only thingh that i find is to use profiling tool but I ask a particular profiling tool.
Is there somebody that know a profiling tool that can have, like input, simple c code and gives me, like output, the performance characteristics of my c code (or the times that some assembly instruction is called)?
The real problem is that we must choose the correct uP for a certai c code...but we want a uP tailored for our c code...so if we know a MIPS (and architectural structure of uP, memory structure...) and what our code needed
Thanks to everyone

I have to agree with Adam, though I would be a little more gracious about it. Compiler optimizations only matter in hotspot code, i.e. tight loops that a) don't call functions, and b) take a large percentage of time.
On a positive note, here's what I would suggest:
Run the C code on a processor, any processor. On that processor, find out what takes the most time.
You could use a profiler for this. The simple method I prefer is to just run it under a debugger and manually halt it, some number of times (like 10) and each time write down the call stack. I suppose there is something in the code taking a good percentage of the time, like 50%. If so, you will see it doing that thing on roughly that percentage of samples, so you won't have to guess what it is.
If that activity is something that would be helped by some special processor, then try that processor.
It is important not to guess. If you say "I think this needs a DSP chip", or "I think it needs a multi-core chip", that is a guess. The guess might be right, but probably not. It is probably the case that what takes the most time is something you never would guess, like memory management or I/O formatting. Performance issues are very good at hiding from you.

No. If someone made a tool that could analyse (non-trivial) source code and tell you its performance characteristics, it would be common place. i.e. everyone would be using it.
Until source code is compiled for a particular target architecture, you will not be able to determine its overall performance. For instance, a parallelising compiler targeting n processors might conceivably be able to change an O(n^2) algorithm to one of O(n).

You won't find a tool to do what you want.
Your only option is to cross-compile the code and profile it on an emulator for the architecture you're running. The problem with profiling high level code is the compiler makes a stack of optimizations that are non trivial and you'd need to know how the particular compiler did that.
It sounds dumb, but why do you want to fit your code to a uP and a uP to your code? If you're writing signal processing buy a DSP. If you're building a SCADA box then look into Atmel or ARM stuff. Are you building a general purpose appliance with a user interface? Look into PPC or X86 compatible stuff.
Simply put, choose a bloody architecture that's suitable and provides the features you need. Optimization before choosing the processor is retarded (very roughly paraphrasing Knuth).
Fix the architecture at something roughly appropriate, work out roughly the processing requirements (you can scratch up an estimate by hand which will always be too high when looking at C code) and buy a uP to match.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio