Delphi timer more precise than milliseconds - windows

I've got a program in Delphi which takes in frames from an external application in 25 hertz (25 times per seconds) and then converts it to 60 hertz (60 frames per second) by creating 1-2 extra frames. I need to output these extra frames by continuously building a frame buffer and outputting the frames from here from a separate thread. The problem is that 1000/60 is 16.66667 which means I can't just send the frames in a "interval" on 16 or 17 milliseconds, I need it to be more precise. How do I do this in Delphi/Windows?

Use a multimedia timer via the Win32 API timeSetEvent() or CreateTimerQueueTimer() function.

You probably need to make use of both of the following:
The high resolution timer. This is available through the QueryPerformanceCounter Win32 function, and also wrapped by the Delphi TStopwatch type.
A waitable timer. This allows you to set a due date in the future and have your thread block, and be woken at the due date.
Both of these have higher resolution than the GUI timer, and should suffice for your needs. Read an overview here: http://msdn.microsoft.com/en-gb/library/windows/desktop/ms644900.aspx

Related

Performance Counters and IMC Counter Not Matching

I have an Intel(R) Core(TM) i7-4720HQ CPU # 2.60GHz (Haswell) processor. In a relatively idle situation, I ran the following Perf commands and their outputs are shown, below. The counters are offcore_response.all_data_rd.l3_miss.any_response and mem_load_uops_retired.l3_miss:
sudo perf stat -a -e offcore_response.all_data_rd.l3_miss.any_response,mem_load_uops_retired.l3_miss sleep 10
Performance counter stats for 'system wide':
3,713,037 offcore_response.all_data_rd.l3_miss.any_response
2,909,573 mem_load_uops_retired.l3_miss
10.016644133 seconds time elapsed
These two values seem consistent, as the latter excludes prefetch requests and those not targeted at DRAM. But they do not match the read counter in the IMC. This counter is called UNC_IMC_DRAM_DATA_READS and documented here. I read the counter reread it 1 second later. The difference was around 30,000,000 (EDITED). If multiplied by 10 (to estimate for 10 seconds) the resulting value will be around 300 million (EDITED), which is 100 times the value of the above-mentioned performance counters (EDITED). It is nowhere near 3 million! What am I missing?
P.S.: The difference is much smaller (but still large), when the system has more load.
The question is also asked, here:
https://community.intel.com/t5/Software-Tuning-Performance/Performance-Counters-and-IMC-Counter-Not-Matching/m-p/1288832
UPDATE:
Please note that PCM output matches my IMC counter reads.
This is the relevant PCM output:
The values for columns READ, WRITE and IO are calculated based on UNC_IMC_DRAM_DATA_READS, UNC_IMC_DRAM_DATA_WRITES and UNC_IMC_DRAM_IO_REQUESTS, respectively. It seems that requests classified as IO will be either READ or WRITE. In other words, during the depicted one second interval, almost (because of the inaccuracy reported in the above-mentioned doc) 2.01GB of the 2.42GB READ and WRITE requests belong to IO. Based on this explanation, the above three columns seem consistent with each other.
The problem is that there still exists a LARGE gap between the IMC and PMC values!
The situation is the same when I boot in runlevel 1. The processes on the scheduler are one of swapper, kworker and migration. Disk IO is almost 85KB/s. I'm wondering what leads to such a (relatively) huge amount of IO. Is it possible to detect that (e.g., using a counter or a tool)?
UPDATE 2:
I think that there is something wrong with the IO column. It is always something in the range [1.99,2.01], regardless of the amount of load in the system!
UPDATE 3:
In runlevel 1, the average number of occurrences of the uops_retired.all event in a 1-second interval is 15,000,000. During the same period, the number of read requests recorded by the associated IMC counter is around 30,000,000. In other words, assuming that all memory accesses are directly caused by cpu instructions, for each retired micro-operation, there exists two memory accesses. This seems impossible specially concerning the fact that there exist multiple levels of caches. Therefore, in the idle scenario, perhaps, the read accesses are caused by IO.
Actually, it was mostly caused by the GPU device. This was the reason for exclusion from performance counters. Here is the relevant output for a sample execution of PCM on a relatively idle system with resolution 3840x2160 and refresh rate 60 using xrandr:
And this is for the situation with resolution 800x600 and the same refresh rate (i.e., 60):
As can be seen, changing screen resolution reduced read and IO traffic considerably (more than 100x!).

Computing End-To-End Delay in Veins

I have read a bunch of posts on SO regarding the computation of end-to-end delay in Veins, but have not found an answer to be fulfilling in explaining why the delay is seemingly too low.
I am using:
Veins 4.7
Sumo 0.32.0
Omnetpp 5.3
Channel switching is turned off.
I have the following code, sending a message from the transmitting node:
if(sendMessage) {
WaveShortMessage* wsm = new WaveShortMessage();
sendDown(wsm);
}
The receiving node computes the delay using the wsm creation time, but I have also tried setting the timestamp on the transmitting side. The result is the same.
simtime_t delay = simTime() - wsm -> getCreationTime();
delayVector.record(delay);
The sample output for the delay vector is as follows:
Item# Event# Time Value
0 165 14.400239402394 2.39402394E-4
1 186 14.500240403299 2.40403299E-4
2 207 14.600241404069 2.41404069E-4
3 228 14.700242404729 2.42404729E-4
Which means that the end-to-end delay (from creation to reception) is equivalent to roughly a quarter of a millisecond, which seems to be quite low - and a fair bit below what is typically reported in the literature. This seems to be consistent with what other people have reported as being an issue (e.g. end to end delay in Veins)
Am I missing something in this computation? I have tried adding load on the network by adding a high number of vehicular nodes (21 nodes within a 1000x50 sandbox on a straight highway, with an average speed of 50 km/h), but the result seems to be the same. The difference is negligible. I have read several research papers that suggest that end-to-end delay should increase dramatically in high vehicular densities.
This end-to-end delay is to be expected. If your application's simulation model does not explicitly model processing delay (e.g., by an application running on a slow general purpose computer), all you would expect to delay a frame is propagation delay (lightspeed, so negligible here) and queueing delay on the MAC (time from inserting frame into TX queue until transmission finishes).
To give an example, for a 2400 bit frame sent at 6 Mbit/s this delay is roughly 0.45 ms. You are likely using slightly shorter frames, so your values appear to be reasonable.
For background information, see F. Klingler, F. Dressler, C. Sommer: "The Impact of Head of Line Blocking in Highly Dynamic WLANs" (DOI 10.1109/TVT.2018.2837157), which also includes a comparison of theory vs. Veins vs. real measurements.

How to prevent SD card from creating write delays during logging?

I've been working on an Arduino (ATMega328p) prototype that has to log data during certain events. An LSM6DS33 sensor is used to generate 6 values (2 bytes each) at a sample rate of 104 Hz. This data needs to be logged for a period of 500-20000ms.
In my code, I generate an interrupt every 1/104 sec using Timer1. When this interrupt occurs, data is read from the sensor, calibrated and then written to an SD card. Normally, this is not an issue. Reading the data from the sensor takes ~3350us, calibrating ~5us and writing ~550us. This means a total cycle takes ~4000us, whereas 9615us is available.
In order to save power, I wish to lower the voltage to 3.3V. According to the atmel datasheet, this also means that the clock frequency should be lowered to 8MHz. Assuming everything will go twice as slow, a measurement cycle would still be possible because ~8000us < 9615us.
After some testing (still 5V#16MHz), however, it occured to me that every now and then, a write cycle would take ~1880us instead of ~550us. I am using the library SdFat to write and test SD cards (RawWrite example). The following results came in when I tested the card:
Start raw write of 100000 KB
Target rate: 100 KB/sec
Target time: 100 seconds
Min block write time: 1244 micros
Max block write time: 12324 micros
Avg block write time: 1247 micros
As seen, the average time to write is fairly consistent, but sometimes a peak duration of 10x average occurs! According to the writer of the library, this is because the SD card needs some erase cycles in between x amount of write cycles. This causes a write delay (src:post#18&#22). This delay, however, pushes the time required for a cycle out of the available 9615us bracket, because the total measure cycle would be 10672us.
The data I am trying to write, is first put into a string using sprintf:
char buf[20] = "";
sprintf(buf,"%li\t%li\t%li\t%li\t%li\t%li",rawData[0],rawData[1],rawData[2],rawData[3],rawData[4],rawData[5]);
myLog.println(buf);
This writes the data to a txt file. But at my speed rate, only 21*104=2184 B/s would suffice. Lowering the speed of the RawWrite example to 6 KB/s, causes the SD card to write without getting an extended write delay. Yet my code still has them, even though less data is written.
My question is: how do I prevent this delay from occurring (if possible)? And if not possible, how can I work around it? It would help if I understood why exactly the delay occurs, because the interval is not always the same (every 10-15 writes).
Some additional info:
The sketch currently uses 69% of RAM (2kB) with variables. Creating two 512 byte buffers - like suggested in the same forum - is not possible for me.
Initially, I used two strings. Merging them into one, didn't affect the write speed with any significance.
I don't know how to work around the delay, but I experience a more stable and faster writing time, if I wrote to a binary file instead of a ".csv" or .txt" file.
The following link provide a fine script to write data as a binary struct to the SD card. (There are some small typo in his example, it is easily fixed)
https://hackingmajenkoblog.wordpress.com/2016/03/25/fast-efficient-data-storage-on-an-arduino/
This will not help you with the time variation, but it might minimize the writing time, and thus negleting the time issue.

High Resolution GetMessageTime()?

WINAPI has a function GetMessageTime() that returns the time a message was generated in system time, which has a resolution of 10 to 16 ms. Is there an effective way to get the time an event occurred in interrupt time (100 ns precision), or in some other format with at least 1 ms precision?
Even with Raw Input, the message timing isn't delivered with < 10 ms resolution. (The timing data comes from the WM_INPUT message.) As far as I can tell from the keyboard driver sources, the timing data simply isn't collected with < 10 ms resolution.

Ruby sleep or delay less than a second?

I'm making a script with ruby that must render frames at 24 frames per second, but I need to wait 1/24th of a second between sending the commands. What is the best way to sleep for less than a second?
sleep(1.0/24.0)
As to your follow up question if that's the best way: No, you could get not-so-smooth framerates because the rendering of each frame might not take the same amount of time.
You could try one of these solutions:
Use a timer which fires 24 times a second with the drawing code.
Create as many frames as possible, create the motion based on the time passed, not per frame.
Pass a float to sleep, like:
sleep 0.1

Resources