Linux: Timing during recording/playing sound - linux-kernel

I have a more general question regarding timing on a standard Linux OS when playing sound and receiving data over a serial port.
At the moment I'm reading a PCM signal arriving over a USB-to-serial bridge (pl2303); it is recorded, encoded and sent by an FPGA.
Now I need to create "peaks" at a known position in the recorded sound stream, and I plan to play a sound file, at a known moment, from the same machine that is recording. The peak has to begin and end inside windows of at most 50 ms; its length could be ~200 ms...
Now, my question is: how precise can I expect the timing to be? I know that several components add "unknown lag" and jitter:
The USB-to-serial bridge collects ~20 bytes from the serial side before sending them to the USB side (at 230400 baud this amounts to ~1 ms).
If I call "`sleep 1; mpg123 $MP3FILE` &" directly before my recording software, the Linux kernel will schedule them differently (maybe this causes a few tens of milliseconds, depending on system load?).
The sound card/driver will maybe add some more unknown lag...
Will tricks like "nice" or "sched_setscheduler" add value in my case?
I could build an additional thread inside my recording software that plays the sound. That way the timing might be more precise, but it would be a lot more work...
Thanks a lot.
I will try it anyway, but I'm looking for some background thoughts to understand and solve my upcoming problems better.

I am not 100% sure, but I would imagine that your kernel would need to be rebuilt to allow the scheduler to reduce task-switching latency. In the 2.6.x kernel series there's an option to make the kernel smoother by making it preemptible:
Go to Processor Type and features
Pre-emption Model
Select Preemptible kernel (low latency desktop)
This should streamline the timing and make the sounds appear smoother as a result of less jitter.
Try that and recompile the kernel. There are, of course, plenty of kernel patches that reduce the timeslice for each task switch to make it smoother still; your mileage may vary depending on:
Processor speed - what processor is used?
Memory - how much RAM?
Disk input/output - the faster, the merrier
Those three factors combined will influence the scheduler and the multitasking behaviour. The lower the latency, the more fine-grained the scheduling.
Incidentally, there is a specialized Linux distribution catering for real-time sound capture. I cannot remember its name, but its kernel was heavily patched to make sound capture very smooth.

It's me again... after one restless night, I solved my strange timing problems. My first edit was not completely correct, since what I posted was not 100% reproducible. After running some more tests, I can present the following plot showing timing accuracy:
Results from analysis http://mega2000.de/~mzenzes/pics4web/2010-05-28_13-37-08_timingexperiment.png
I tried two different ubuntu-kernels: 2.6.32-21-generic and 2.6.32-10-rt
I tried to achieve RT-scheduling: sudo chrt --fifo 99 ./experimenter.sh
And I tried to change the power-saving options: echo {performance,conservative} | sudo tee /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
This resulted in 8 different tests, with 50 runs each. Here are the numbers:
configuration                        mean(peakPos)  std(peakPos)
rt-kernel-fifo99-ondemand                 0.97         0.0212
rt-kernel-fifo99-performance              0.99         0.0040
rt-kernel-ondemand                        0.91         0.1423
rt-kernel-performance                     0.96         0.0078
standard-kernel-fifo99-ondemand           0.68         0.0177
standard-kernel-fifo99-performance        0.72         0.0142
standard-kernel-ondemand                  0.69         0.0749
standard-kernel-performance               0.69         0.0147
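For reference, the same FIFO scheduling used in these tests can also be requested from inside the program itself. A minimal sketch, assuming Linux/POSIX and root (or CAP_SYS_NICE); the priority 99 mirrors the chrt invocation above, everything else is illustrative:

```c
#include <sched.h>
#include <stdio.h>

int main(void)
{
    /* Request the SCHED_FIFO class at priority 99, equivalent to
       `sudo chrt --fifo 99 ...`; needs root/CAP_SYS_NICE. */
    struct sched_param sp = { .sched_priority = 99 };
    if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0) {
        perror("sched_setscheduler");
        return 1;
    }
    /* ... recording/playback work now runs under RT scheduling ... */
    return 0;
}
```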

Related

GPU affects core calculation and/or RAM access (high jitter)?

I have a kthread that runs alone on one core of a multi-core CPU. This kthread disables all IRQs for that core, runs a loop as fast as possible, and measures the maximum loop duration with the help of the TSC. All the ACPI stuff is disabled (no frequency scaling, no power saving, etc.).
My problem is that the maximum loop duration apparently depends on the GPU.
When the system is used normally (a little office work, Internet and programming / not really busy), the maximum loop duration is around 5 µs :-(
The same situation, but with a stressed CPU (the other three cores 100% busy), leads to a maximum loop duration of approximately 1 µs :-|
But when the GPU switches into idle mode (turning off the screen), the maximum loop duration goes down to less than 300 ns :-)
Why is that? And how can I influence this behavior? I thought the CPU and the RAM were directly connected. I noticed that the maximum loop duration improves in the first situation on a system with an external graphics card; for the second and third case I couldn't see a difference. I also tested AMD and Intel systems without success - always the same :-(
I'm fine with the second case. But is it possible to achieve that without additionally stressing the CPU?
Many thanks in advance!
Billy
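For concreteness, here is a minimal userspace sketch of the kind of measurement loop described above. The original runs as a kthread with IRQs disabled; this simplified, illustrative version (iteration count and output are arbitrary) just tracks the worst-case gap between iterations using the TSC on x86:

```c
#include <stdint.h>
#include <stdio.h>
#include <x86intrin.h>

int main(void)
{
    uint64_t prev = __rdtsc();
    uint64_t max_delta = 0;

    for (long i = 0; i < 100000000L; i++) {
        uint64_t now = __rdtsc();
        uint64_t delta = now - prev;   /* cycles since last iteration */
        if (delta > max_delta)
            max_delta = delta;         /* new worst case */
        prev = now;
    }
    /* Convert to time by dividing by the (invariant) TSC frequency. */
    printf("max loop duration: %llu cycles\n",
           (unsigned long long)max_delta);
    return 0;
}
```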

How to reduce time taken for large calculations in MATLAB

When using the desktop PCs in my university (which have 4 GB of RAM), calculations in MATLAB are fairly speedy, but on my laptop (which also has 4 GB of RAM) the exact same calculations take ages. My laptop is much more modern, so I assume it has a similar clock speed to the desktops.
For example, I have written a program that calculates the solid angle subtended by 50 disks at 500 points. On the desktop PC's this calculation takes about 15 seconds, on my laptop it takes about 5 minutes.
Is there a way to reduce the time taken to perform these calculations? E.g., can I allocate more RAM to MATLAB, or can I boot up my PC in a way that optimises it for running MATLAB? I'm thinking that if the processor on my laptop is also doing calculations to run other programs, this will slow down the MATLAB calculations. I've closed all other applications, but I know there's probably a lot of stuff going on that I can't see. Can I boot my laptop in a way that has fewer of these things going on in the background?
I can't modify the code to make it more efficient.
Thanks!
You might run some of my benchmarks which, along with example results, can be found via:
http://www.roylongbottom.org.uk/
The CPU core design used at a particular point in time is essentially the same across Pentiums, Celerons, Core 2s, Xeons and others; the only differences are L2/L3 cache sizes and external memory bus speeds. So you can compare most results with similar-vintage 2 GHz CPUs. Things to try, besides simple number-crunching tests:
1 - Try a memory test, such as my BusSpeed, to show that the caches are being used and RAM is not dead slow.
2 - Assuming Windows, check in Task Manager that the offending program is the one using most CPU time, and that with the program not running, CPU utilisation is around zero.
3 - Check that the CPU temperature is not too high, e.g. with SpeedFan (free download).
4 - If the disk light is flashing, too much RAM might be in use, with some being swapped in and out. Task Manager's Performance tab would show this. Increasing RAM demands can be checked by some of my reliability tests.
There are many things that go into computing power besides RAM. You mention processor speed, but there is also the number of cores, GPU capability and more. Programs like MATLAB are designed to take advantage of features like parallelism.
Summary: You can't compare only RAM between two machines and expect to know how they will perform with respect to one another.
Side note: 4 GB is not very much RAM for a modern laptop.
Firstly you should perform a CPU performance benchmark on both computers.
Modern operating systems usually apply the most aggressive power-management schemes when running on a laptop. This usually means turning off one or more cores, or setting them to a very low frequency. For example, while on battery, a quad-core CPU that normally runs at 2.0 GHz could be throttled down to 700 MHz on one core while the other three are basically put to sleep. (Remark: numbers are not taken from a real example.)
The OS manages the CPU frequency in a dynamic way, tweaking it on the order of seconds. You will need a software monitoring tool that actually asks for the CPU frequency every second (without doing busy work itself) in order to know if this is the case.
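For example, a minimal sketch of such a monitoring loop (Linux-specific, assuming the cpufreq sysfs interface is present):

```c
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    for (;;) {
        FILE *f = fopen("/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq", "r");
        if (!f) { perror("fopen"); return 1; }
        long khz = 0;
        if (fscanf(f, "%ld", &khz) != 1)   /* value is reported in kHz */
            khz = 0;
        fclose(f);
        printf("cpu0: %ld MHz\n", khz / 1000);
        sleep(1);    /* sleep rather than busy-wait, to avoid skewing */
    }
}
```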
Plugging in the laptop will make the OS use a less aggressive power management scheme.
(If this is found to be unrelated to MATLAB, please "flag" this post and ask moderator to move this question to the SuperUser site.)

Linux Driver real time constraints

I need to build a platform for logging some sensor data, and possibly later doing some calculations on the logged data.
The Raspberry Pi seem like an interesting (and cheap!) device for this.
I have a gyroscope that can sample at 800 Hz which is equivalent to one sample every 1.25 ms.
The gyroscope has a built-in FIFO that can store 32 samples.
This means that the FIFO has to be emptied at least every 32 * 1.25 = 40 ms, otherwise samples will be dropped.
So my question is: Can I be 100% certain that my kernel driver will be able to extract the data from this FIFO within the specified time?
The gyroscope communicates with the host via i2c, and it can also trigger an interrupt pin on an "almost full" event if that would make things simpler.
But it would be easiest if I could just have a loop in the driver that retrieves the data at regular intervals.
I can live with storing the data in kernel space, and move it to user space more infrequently (no constraint on time).
I can also live with sampling the gyroscope at lower sample rates (400 or 200 Hz is acceptable).
This is with regards to the stock kernel, and not the special real-time kernel as it seems like this is currently not supported for the Raspberry Pi.
You will need a real-time linux environment for tight timing:
You could try Xenomai on Raspberry Pi:
http://diy.powet.eu/2012/07/25/raspberry-pi-xenomai/
However, following along this blog:
http://linuxcnc.mah.priv.at/rpi/rpi-rtperf.html (dead, and I could not find it in wayback or google cache)
It seems he was getting repeatable ±20 µs timing out of the stock kernel. As your timing resolution is 1250 µs, you may be fine with the stock kernel if you are willing to lose a sample once in a blue moon. YMMV.
I have not tested this yet myself but I have been reading up in an attempt to try to drive a ws2811 LED controller with the Raspberry Pi and this was looking the most promising to me.
There is also the RT linux patch: https://rt.wiki.kernel.org/index.php/Main_Page
Which has at least one Pi version: https://github.com/licaon-kter/raspi-rt
However, I have run into a bunch of nay-sayers when looking deeper into this patch.
Your best bet is to read the millisecond timer and log (or light an LED) if you miss an interval, and then try some of the solutions. Happy hacking.
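As a starting point, here is a minimal userspace sketch of that "log it if you miss an interval" idea: a periodic loop on an absolute deadline, using the 40 ms budget from the FIFO math above. drain_fifo() is hypothetical; the real work would be an i2c read of the gyro FIFO.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <time.h>

#define PERIOD_NS (40 * 1000 * 1000L)   /* 32 samples * 1.25 ms */

int main(void)
{
    struct timespec next;
    clock_gettime(CLOCK_MONOTONIC, &next);

    for (;;) {
        /* advance the absolute deadline by one period */
        next.tv_nsec += PERIOD_NS;
        if (next.tv_nsec >= 1000000000L) {
            next.tv_nsec -= 1000000000L;
            next.tv_sec += 1;
        }
        clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);

        /* drain_fifo();  hypothetical i2c read goes here */

        struct timespec now;
        clock_gettime(CLOCK_MONOTONIC, &now);
        long late_ns = (now.tv_sec - next.tv_sec) * 1000000000L
                     + (now.tv_nsec - next.tv_nsec);
        if (late_ns > PERIOD_NS)   /* a whole period behind: samples lost */
            fprintf(stderr, "missed an interval by %ld ns\n", late_ns);
    }
}
```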

What is the best way to measure "spare" CPU time on a Linux based system

For some of the customers that we develop software for, we are required to "guarantee" a certain amount of spare resources (memory, disk space, CPU). Memory and disk space are simple, but CPU is a bit more difficult.
One technique that we have used is to create a process that consumes a guaranteed amount of CPU time (say 2.5 seconds every 5 seconds). We run this process at highest priority in order to guarantee that it runs and consumes all of its required CPU cycles.
If our normal applications are able to run at an acceptable level of performance and can pass all of their functionality tests while the spare time process is running as well, then we "assume" that we have met our commitment for spare CPU time.
I'm sure that there are other techniques for doing the same thing, and would like to learn about them.
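For illustration, a minimal sketch (Linux/POSIX, root needed for SCHED_FIFO) of the reserver process described above: consume 2.5 s of CPU in every 5 s window. Note that a FIFO-99 busy loop will starve everything else on its core while it burns.

```c
#include <sched.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(void)
{
    struct sched_param sp = { .sched_priority = 99 };
    if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0)
        perror("sched_setscheduler");   /* fall back to normal priority */

    for (;;) {
        double start = now_sec();
        while (now_sec() - start < 2.5)
            ;                            /* burn CPU for 2.5 s */
        usleep(2500000);                 /* idle for the other 2.5 s */
    }
}
```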
So this may not be exactly the answer you're looking for, but if all you want to do is make sure your application doesn't exceed certain limits on resource consumption, and you're running on Linux, you can customize /etc/security/limits.conf (the file may differ on your distro of choice) to force the limits on a particular user, and only run the process under that user. This of course assumes that you have that level of control over your client's production environment.
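A related sketch (POSIX): when you only control the program and not the system configuration, the process can cap its own CPU time with setrlimit(). This is coarser than a percentage limit; the 60 s figure is arbitrary.

```c
#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl = { .rlim_cur = 60, .rlim_max = 60 };  /* CPU seconds */
    if (setrlimit(RLIMIT_CPU, &rl) != 0) {  /* SIGXCPU once exceeded */
        perror("setrlimit");
        return 1;
    }
    /* ... workload runs here, signalled after 60 s of CPU time ... */
    return 0;
}
```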
If I understand correctly, your concern is whether the application also runs while a given percentage of the processing power is not available.
The most incontrovertible approach is to use underpowered hardware for your testing. If the processor in your setup allows you to, you can downclock it online. The Linux kernel gives you an easy interface for doing this, see /sys/devices/system/cpu/cpu0/cpufreq/. There is also a bunch of GUI applications for this available.
If your processor isn't capable of changing clock speed online, you can do it the hard way and select a smaller multiplier in your BIOS.
I think you get the idea: if it runs at 1600 MHz instead of 2400 MHz, you can guarantee 33% of spare CPU time.
sar is a standard *nix tool that collects information about the operational use of system resources. It also has a command-line interface for creating various reports, and it's common for the data to be persisted in a database.
With a multi-core/processor system you could use Affinity to your advantage.

Power Efficient Software Coding

In a typical handheld/portable embedded device, battery life is a major concern in the design of the hardware, the software and the features the device can support. From the software programming perspective, one is aware of MIPS- and memory- (data and program) optimized code.
I am aware of the hardware deep-sleep and standby modes that are used to clock the hardware at lower rates, or to turn off the clock entirely for some unused circuits, to save power. But I am looking for ideas from this point of view:
Where my code is running and needs to keep executing, how can I write the code "power"-efficiently so as to consume minimum watts?
Are there any special programming constructs, data structures or control structures which I should look at to achieve minimum power consumption for a given functionality?
Are there any high-level software design considerations which one should keep in mind at the time of code structure design, or during low-level design, to make the code as power-efficient (least power consuming) as possible?
Like 1800 INFORMATION said, avoid polling; subscribe to events and wait for them to happen
Update window content only when necessary - let the system decide when to redraw it
When updating window content, ensure your code recreates as little of the invalid region as possible
With quick code the CPU goes back to deep sleep mode faster and there's a better chance that such code stays in L1 cache
Operate on small data at one time so data stays in caches as well
Ensure that your application doesn't do any unnecessary action when in background
Make your software not only power efficient, but also power aware - update graphics less often when on battery, disable animations, less hard drive thrashing
And read some other guidelines. ;)
Recently a series of posts called "Optimizing Software Applications for Power" started appearing on Intel Software Blogs. It may be of some use to x86 developers.
Zeroth, use a fully static machine that can stop when idle. You can't beat zero Hz.
First, switch to a tickless operating-system scheduler; waking up every millisecond or so wastes power. If you can't, consider slowing the scheduler interrupt instead.
Second, ensure your idle thread executes a power-save / wait-for-next-interrupt instruction.
You can do this in the sort of under-regulated "userland" most small devices have.
Third, if you have to poll or perform user-confidence activities like updating the UI, sleep, do it, and get back to sleep.
Don't trust GUI frameworks that you haven't checked for "sleep and spin" kinds of code, especially the event timer you may be tempted to use for #2.
Block a thread on read instead of polling with select()/epoll()/WaitForMultipleObjects(); see the sketch below. This puts stress on the thread scheduler (and your brain), but the devices generally do okay.
It ends up changing your high-level design a bit; it gets tidier!
A main loop that polls all the things you might do ends up slow and wasteful on CPU, but does guarantee performance (guaranteed to be slow).
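A minimal sketch of that blocking-read pattern (POSIX; the descriptor and handler are illustrative):

```c
/* The thread sleeps in the kernel until data arrives, so the CPU can
   drop into a low-power state instead of spinning in a poll loop. */
#include <stdio.h>
#include <unistd.h>

static void consume(const char *buf, ssize_t n)
{
    fwrite(buf, 1, (size_t)n, stdout);   /* stand-in for real work */
}

static void event_loop(int fd)
{
    char buf[256];
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);  /* blocks; no busy-wait */
        if (n <= 0)
            break;            /* EOF or error: leave the loop */
        consume(buf, n);      /* do the work quickly, then block again */
    }
}

int main(void)
{
    event_loop(0);            /* demo: block on stdin */
    return 0;
}
```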
Cache results, lazily create things. Users expect the device to be slow so don't disappoint them. Less running is better. Run as little as you can get away with.
Separate threads can be killed off when you stop needing them.
Try to get more memory than you need; then you can insert into more than one hashtable and save ever having to search. This is a direct trade-off if the memory is DRAM.
Look at a more realtime system than you think you need; it saves time (sic) later.
They cope better with threading too.
Do not poll. Use events and other OS primitives to wait for notifiable occurrences. Polling ensures that the CPU will stay active and use more battery life.
From my work using smart phones, the best way I have found of preserving battery life is to ensure that everything you do not need for your program to function at that specific point is disabled.
For example, only switch Bluetooth on when you need it, similarly the phone capabilities, turn the screen brightness down when it isn't needed, turn the volume down, etc.
The power used by these functions will generally far outweigh the power used by your code.
Avoiding polling is a good suggestion.
A microprocessor's power consumption is roughly proportional to its clock frequency, and to the square of its supply voltage. If you have the possibility to adjust these from software, that could save some power. Also, turning off the parts of the processor that you don't need (e.g. floating-point unit) may help, but this very much depends on your platform. In any case, you need a way to measure the actual power consumption of your processor, so that you can find out what works and what not. Just like speed optimizations, power optimizations need to be carefully profiled.
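For reference, the standard first-order model behind that statement (textbook CMOS dynamic power, not specific to any platform):

```latex
P_{\text{dyn}} \approx \alpha \, C \, V^{2} \, f
```

Here α is the activity factor, C the switched capacitance, V the supply voltage and f the clock frequency. So halving f alone roughly halves dynamic power, and if the lower frequency also lets you drop V by 20%, you land at roughly 0.5 × 0.8² ≈ 32% of the original.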
Use the network interfaces as little as you can. You might want to gather information and send it out in bursts instead of sending it constantly.
Look at what your compiler generates, particularly for hot areas of code.
If you have low priority intermittent operations, don't use specific timers to wake up to deal with them, but deal with when processing other events.
Use logic to avoid stupid scenarios where your app might go to sleep for 10 ms and then have to wake up again for the next event. For the kind of platform mentioned it shouldn't matter if both events are processed at the same time.
Having your own timer & callback mechanism might be appropriate for this kind of decision making. The trade off is in code complexity and maintenance vs. likely power savings.
Simply put, do as little as possible.
Well, to the extent that your code can execute entirely in the processor cache, you'll have less bus activity and save power. To the extent that your program is small enough to fit code+data entirely in the cache, you get that benefit "for free". OTOH, if your program is too big, and you can divide it into modules that are more or less independent of each other, you might get some power saving by splitting it into separate programs. (I suppose it's also possible to make a toolchain that spreads related bundles of code and data into cache-sized chunks...)
I suppose that, theoretically, you can save some amount of unnecessary work by reducing the number of pointer dereferences, and by refactoring your jumps so that the most likely jumps are taken first -- but that's not realistic to do as a programmer.
Transmeta had the idea of letting the machine do some instruction optimization on-the-fly to save power... But that didn't seem to help enough... And look where that got them.
Set unused memory or flash to 0xFF, not 0x00. This is certainly true for flash and EEPROM; I'm not sure about SRAM or DRAM. For the PROMs there is an inversion, so a 0 is stored as a 1 and takes more energy, and a 1 is stored as a 0 and takes less. This is why you read 0xFFs after erasing a block.
Rather timely, this: an article on Hackaday today about measuring the power consumption of various commands:
Hackaday: the-effect-of-code-on-power-consumption
Aside from that:
- Interrupts are your friends
- Polling / wait() aren't your friends
- Do as little as possible
- make your code as small/efficient as possible
- Turn off as many modules, pins, peripherals as possible in the micro
- Run as slowly as possible
- If the micro has settings for pin drive strength, slew rate, etc., check and configure them; the defaults are often full power / max speed.
- returning to the article above, go back and measure the power & see if you can drop it by altering things.
Also, something that is not trivial to do: reduce the precision of the mathematical operations, go for the smallest data set available, and, if available in your development environment, pack data and aggregate operations.
Knuth's books can give you all the variants of specific algorithms you need to save memory or CPU, or to work with reduced precision while minimizing rounding errors.
Also, spend some time checking all the embedded device APIs - for example, most Symbian phones can do audio encoding via specialized hardware.
Do your work as quickly as possible, and then go to some idle state waiting for interrupts (or events) to happen. Try to make the code run out of cache with as little external memory traffic as possible.
On Linux, install powertop to see how often which piece of software wakes up the CPU. And follow the various tips that the powertop site links to, some of which are probably applicable to non-Linux, too.
http://www.lesswatts.org/projects/powertop/
Choose efficient algorithms that are quick and have small basic blocks and minimal memory accesses.
Understand the cache size and functional units of your processor.
Don't access memory. Don't use objects, garbage collection or any other high-level constructs if they expand your working code or data set beyond the available cache. If you know the cache size and associativity, lay out the entire working data set you will need in low-power mode so that it all fits into the dcache (forget some of the "proper" coding practices that scatter the data around in separate objects or data structures if that causes cache thrashing). Do the same with all the subroutines: put your working code set all in one module if necessary, to fit it all in the icache. If the processor has multiple levels of cache, try to fit in the lowest level of instruction or data cache possible. Don't use the floating-point unit or any other instructions that may power up other optional functional units unless you can make a good case that their use significantly shortens the time the CPU is out of sleep mode.
etc.
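A small sketch of the cache-layout point (sizes and names are illustrative; size N to your actual dcache):

```c
/* Keep hot data in one flat, contiguous block instead of
   pointer-linked heap objects scattered across the address space. */
#include <stdint.h>

#define N 1024

struct sample { int16_t x, y; };           /* 4 bytes per element */

static struct sample samples[N];           /* one contiguous 4 KiB block */

int32_t sum_x(void)
{
    int32_t s = 0;
    for (int i = 0; i < N; i++)
        s += samples[i].x;                 /* sequential, cache-friendly */
    return s;
}
```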
Don't poll, sleep
Avoid using power hungry areas of the chip when possible. For example multipliers are power hungry, if you can shift and add you can save some Joules (as long as you don't do so much shifting and adding that actually the multiplier is a win!)
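A toy illustration of the shift-and-add idea (whether it saves anything depends entirely on the target micro):

```c
#include <stdint.h>

/* Multiply by the constant 10 without the hardware multiplier:
   10*x = 8*x + 2*x = (x << 3) + (x << 1). Only a win if the target's
   multiplier really is more power-hungry than a couple of ALU ops. */
static inline uint32_t mul10(uint32_t x)
{
    return (x << 3) + (x << 1);
}
```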
If you are really serious, get a power-aware debugger, which can correlate power usage with your source code. Like this.
