Execution time of code on microcontroller - performance

On a 32-bit microcontroller, I want to measure the execution time of a piece of code at different operating frequencies of the microcontroller. First I used the periodic timer (PIT), but it did not provide enough resolution, because if I run the PIT at a high frequency its counter overflows. So I switched to the system timer (STM), because it can run at the system clock. But at different operating frequencies of the microcontroller, the STM gives the same execution time for the code. Could anyone help me with this? Thanks.

I realize this is an old question, but if this doesn't need to be done in the system "real time," I would just toggle a port pin when entering and exiting the function and use an oscilloscope to measure the time. I'm assuming that you just want to do this for software testing.
If you need to do it "real time" (in the application code), then you'll need to multiply your STM timer value by the microcontroller clock's period. The timer value for the function execution should always be the same (with some exceptions) regardless of the micro's clock frequency. (i.e. the timer's speed will change with clock frequency in the same way your code's execution speed will change)
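As a rough sketch of that conversion (the names stm_read_counter() and get_system_clock_hz() are placeholders for whatever your vendor headers actually provide, not a real API):

    #include <stdint.h>

    /* Placeholders for the vendor-specific way of reading the free-running
       STM counter and of querying the currently configured system clock. */
    extern uint32_t stm_read_counter(void);      /* hypothetical */
    extern uint32_t get_system_clock_hz(void);   /* hypothetical */

    void code_under_test(void);

    /* Returns the execution time of code_under_test() in microseconds. */
    uint32_t measure_us(void)
    {
        uint32_t start = stm_read_counter();
        code_under_test();
        uint32_t ticks = stm_read_counter() - start; /* unsigned math handles one wrap */

        /* The tick count stays roughly constant across operating frequencies;
           converting it to time requires the clock the core is actually running at. */
        return (uint32_t)(((uint64_t)ticks * 1000000u) / get_system_clock_hz());
    }

The division by the current system clock frequency is what makes the reported time shrink as you raise the operating frequency, which is exactly the effect the raw STM count hides.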

Related

Measure Power Consumption of Designed system on an Altera DE1 Board

I am designing a processor using an Altera DE1 kit.
I will be running a test bench to stress the processor.
I want to know if there is any way to measure only the power consumption of my design, excluding the other power dissipation caused by the DE1 board.
TIA for the answer.
Measure power at an idle state. The idle state can be many things. This needs to be decided by you:
The board operating when the FPGA is not programmed (no bitstream loaded).
FPGA loaded, but you hold down the reset for the logic.
Place the FPGA in some kind of suspended state (sleep mode).
Now that you have your reference power measurement, measure the power again with your design running fully. Subtract one from the other and you will get a result close to what you are looking for (the board may consume a slightly different amount in each idle state than it does while running your design normally, so this is only an approximation).
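As a purely illustrative calculation (the numbers are made up): if the board draws 1.9 W with the FPGA unconfigured and 2.6 W while your design is running the test bench, your design accounts for roughly 0.7 W, give or take the difference in static consumption between the two states.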
You should be able to replace the 0-ohm resistor R29 with a shunt resistor and measure the core current of the FPGA through that. It is right in series with VCCINT, so it should reflect only the current used by the FPGA logic.
There is also R30 in series with VCCIO, if you want to include I/O power consumption as well.
The resistor names are from this schematic (the only one I could find so far): http://d1.amobbs.com/bbs_upload782111/files_33/ourdev_586508CWZW3R.pdf
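As a purely illustrative calculation (the shunt value and measured drop are made-up numbers): with a 0.01-ohm shunt in place of R29 and a 5 mV drop measured across it, the core current is 5 mV / 0.01 ohm = 0.5 A, which at a nominal 1.2 V VCCINT corresponds to roughly 0.6 W of core power.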

How long does a multiplier function take on an FPGA, and is it possible to calculate this time?

I have implemented a hardware architecture on an FPGA and I use some multiplier functions in this architecture.
I'd like to know whether there is any way or method in the ISE software, or in hardware (using ChipScope), to calculate the maximum delay time of each section/step.
For example, I want to know: if I increase the input clock frequency, which sections will stop working correctly?
Look at the timing report for the design, which can give you delay information about various elements in a requested path.
Based on this you can also get minimum slack information, which then tells you how much you may increase the clock, and you can then change the clock frequency and rerun synthesis to check that it holds timing with the new clock frequency.
A specific measurement, from ChipScope for example, only gives information about that specific chip, on that specific power supply, with that specific data, etc., whereas the timing engine (static timing analysis, STA) gives you a worst-case analysis over the design and vendor parameters.
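As a purely illustrative reading of the slack numbers: if the design is constrained to a 10 ns clock period and the report shows 2 ns of worst-case positive slack, the critical path needs roughly 8 ns, so the clock could in principle be tightened toward about 125 MHz; rerun synthesis with the new constraint to confirm that timing still closes.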

Questions on Measuring Time Using the CPU Clock

I'm aware of the standard methods of getting time deltas using CPU clock counters on various operating systems. My question is: how do such operating systems account for changes in CPU frequency made for power-saving purposes? I initially thought this could be explained by the fact that OSes use specific calls to measure frequency and get the corrected frequency based on which core is being used, what frequency it is currently set to, and so on. But then I realized: wouldn't that make any time delta inaccurate if the CPU frequency was lowered and raised back to its original value between the two clock queries?
For example take the following scenario:
1. Query the CPU cycle counter.
2. The operating system lowers the CPU frequency for power saving.
3. Some other code runs here.
4. The operating system raises the CPU frequency for performance.
5. Query the CPU cycle counter again.
6. Calculate the delta as the cycle difference divided by the frequency.
This would yield an inaccurate delta since the CPU frequency was not constant between the two queries. How is this worked around by the operating system or programs that have to work with time deltas using CPU cycles?
See this: wrong clock cycle measurements with rdtsc
There are several ways to deal with it:
Set the CPU clock to maximum
The link above shows how to do that.
Use the PIT instead of RDTSC
The PIT is the programmable interval timer (Intel 8253/8254, if I remember correctly). It has been present on PC motherboards since the earliest IBM PCs, but its input clock is only about 1.19 MHz and not all operating systems give you access to it.
Combine the PIT and RDTSC
Repeatedly measure the CPU clock frequency with the PIT; once it is stable enough, start your measurement (and keep scanning for CPU clock changes). If the CPU clock changes during the measurement, throw the measurement away and start again.
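A minimal sketch of that last idea, with two substitutions: user-space code normally cannot program the PIT directly, so clock_gettime(CLOCK_MONOTONIC) stands in as the stable reference, and __rdtsc() from <x86intrin.h> (GCC/Clang) stands in for raw RDTSC. The 10 ms calibration window and the 1% threshold are arbitrary choices; note also that on newer CPUs the TSC is invariant and does not change with core frequency, which makes the check a no-op.

    #include <stdint.h>
    #include <time.h>
    #include <x86intrin.h>

    /* Estimate the TSC rate against a stable reference clock over ~10 ms. */
    static double tsc_hz_estimate(void)
    {
        struct timespec t0, t1;
        uint64_t c0, c1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        c0 = __rdtsc();
        do {
            clock_gettime(CLOCK_MONOTONIC, &t1);
        } while ((t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec) < 10e6);
        c1 = __rdtsc();
        double dt = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
        return (double)(c1 - c0) / dt;
    }

    /* Time fn() in seconds; returns a negative value if the TSC rate moved. */
    double measure_seconds(void (*fn)(void))
    {
        double f_before = tsc_hz_estimate();
        uint64_t start = __rdtsc();
        fn();
        uint64_t end = __rdtsc();
        double f_after = tsc_hz_estimate();

        /* If the estimated TSC rate shifted by more than ~1%, discard the run. */
        if (f_after < 0.99 * f_before || f_after > 1.01 * f_before)
            return -1.0;

        return (double)(end - start) / ((f_before + f_after) / 2.0);
    }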

How to generate a ~100 kHz clock signal in a Linux kernel module with bit-banging?

I'm trying to generate a clock signal of roughly 100 kHz on a GPIO pin (ARM platform, mach-davinci, kernel 2.6.27), using a tasklet with high priority. The theory is simple: set the GPIO high, udelay for 5 us, set the GPIO low, wait another 5 us. But strange problems appear. First of all, I can't actually get a 5 us delay; that's fine, it looks like a hardware performance limit, so I moved to a period of 40 us (which gives ~25 kHz). The second problem is worse: roughly once per 10 ms, udelay waits three times longer than usual. I suspect the heartbeat is taking this time, but that is unacceptable from the point of view of the protocol that will be implemented on top of this. Is there any way to temporarily disable the heartbeat procedure, say for 500 ms? Or maybe I'm doing this wrong from the beginning? Any comments?
You cannot use a tasklet for this kind of job. Tasklets can be preempted by interrupts, and in some cases your tasklet can even be executed in process context!
If you absolutely have to do it this way, use an interrupt handler - get in, disable interrupts, do whatever you have to do and get out as fast as you can.
Generating the clock asynchronously in software is not the right thing to do. I can think of two alternatives that will work better:
Your processor may have a built-in clock generator peripheral that isn't already being used by the kernel or another driver. When you set one of these up, you tell it how fast to run its clock, and it just starts running out the pulses.
Get your processor's datasheet and study it.
You might not find a peripheral called a "clock" per se, but might find something similar that you can press into service, like a PWM peripheral.
The other device you are talking to may not actually require a regular clock. Some chips that need a "clock" line merely need a line that goes high when there is a bit to read, which then goes low while the data line(s) are changing. If this is the case, the 100 kHz thing you're reading isn't a hard requirement for a clock of exactly that frequency, it is just an upper limit on how fast the clock line (and thus the data line(s)) are allowed to transition.
With a CPU so much faster than the clock, you want to split this into two halves:
The "top half" sets the data line(s) state correctly, then brings the clock line up. Then it schedules the bottom half to run 5 μs later, using an interrupt or kernel timer.
In the "bottom half", called by the interrupt or timer, bring the clock line back down, then schedule the top half to run again 5 μs later.
Unless you can run your timer tasklet at a higher priority than the kernel timer, you will always be susceptible to this kind of jitter. Do you really have to do this by bit-banging? It would be far easier to use a hardware timer or PWM generator: configure the timer to run at your desired rate, set the pin to output, and you're done.
If you need software control over each bit period, you can try to work around the other tasks by setting your tasklet to run at a shorter period, say three-fourths of your 40 us delay. In the tasklet, disable interrupts and poll the clock until you reach the correct 40 us timeslot, set the I/O state, re-enable interrupts, and exit. But this effectively ties up 25% of your system in watching a clock.

What is the clock source for the count returned by QueryPerformanceCounter?

I was under the impression that QueryPerformanceCounter was actually accessing the counter that feeds the HPET (High Precision Event Timer)---the difference of course being that the HPET is a timer which sends an interrupt when the counter value matches the desired interval, whereas to make a timer "out of" QueryPerformanceCounter you have to write your own loop in software.
The only reason I had assumed the hardware behind the two was the same is because somewhere I had read that QueryPerformanceCounter was accessing a counter on the chipset.
http://www.gamedev.net/reference/programming/features/timing/ claims that QueryPerformanceCounter uses chipset timers which apparently have a specified clock rate. However, I can verify that QueryPerformanceFrequency returns wildly different numbers on different machines, and in fact the number can change slightly from boot to boot.
The numbers returned can sometimes be totally ridiculous---implying ticks in the nanosecond range. Of course when put together it all works; that is, writing timer software using QueryPerformanceCounter/QueryPerformanceFrequency allows you to get proper timing and latency is pretty low.
A software timer using these functions can be pretty good. For example, with an interval of 1 millisecond, over 30 seconds it's easy to get nearly 100% of ticks to fall within 10% of the intended interval. With an interval of 100 microseconds you still get a high success rate (99.7%), but the worst ticks can be way off (200 microseconds).
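For reference, the usual pattern for the kind of delta measurement described above looks roughly like this; do_work() is just a placeholder for whatever is being timed:

    #include <windows.h>
    #include <stdio.h>

    void do_work(void);   /* hypothetical function being timed */

    int main(void)
    {
        LARGE_INTEGER freq, start, end;

        QueryPerformanceFrequency(&freq);   /* counts per second; machine-dependent */
        QueryPerformanceCounter(&start);

        do_work();

        QueryPerformanceCounter(&end);

        double seconds = (double)(end.QuadPart - start.QuadPart) / (double)freq.QuadPart;
        printf("elapsed: %.6f s\n", seconds);
        return 0;
    }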
I'm wondering if the clock behind the HPET is the same. Supposedly HPET should still increase accuracy since it is a hardware timer, but as of yet I don't know how to access it in Windows.
Sounds like Microsoft has made these functions use whatever the best available timer is:
http://www.microsoft.com/whdc/system/sysinternals/mm-timer.mspx
Did you try updating your CPU driver for your AMD multicore system? Did you check whether your motherboard chipset is on the "bad" list? Did you try setting the CPU affinity?
One can also use the RTC-based time functions and/or a skip-detecting heuristic to eliminate trouble with QPC.
This has some hints: CPU clock frequency and thus QueryPerformanceCounter wrong?
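A sketch of the kind of skip-detecting heuristic mentioned above: cross-check each QPC delta against a coarser but independent clock (GetTickCount64 here, which needs Vista or later; GetTickCount works too if you handle wraparound) and fall back to the coarse value when the two disagree badly. The 10% threshold and the fallback policy are arbitrary choices:

    #include <windows.h>

    /* Returns the elapsed time in milliseconds since the previous call. */
    double elapsed_ms_checked(void)
    {
        static LARGE_INTEGER freq, last_qpc;
        static ULONGLONG last_tick;
        static int initialised;

        LARGE_INTEGER now;
        ULONGLONG tick = GetTickCount64();  /* coarse (~10-16 ms) but steady */
        QueryPerformanceCounter(&now);

        if (!initialised) {
            QueryPerformanceFrequency(&freq);
            last_qpc = now;
            last_tick = tick;
            initialised = 1;
            return 0.0;
        }

        double qpc_ms  = (double)(now.QuadPart - last_qpc.QuadPart) * 1000.0 / (double)freq.QuadPart;
        double tick_ms = (double)(tick - last_tick);
        last_qpc = now;
        last_tick = tick;

        /* If QPC disagrees with the coarse clock by more than ~10% over a span
           long enough for the coarse clock to be meaningful, assume QPC skipped. */
        if (tick_ms > 50.0 && (qpc_ms < 0.9 * tick_ms || qpc_ms > 1.1 * tick_ms))
            return tick_ms;

        return qpc_ms;
    }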
