This is a two-part question:
I have a fluid flow sensor connected to an NI-9361 on my DAQ. For those who don't know, that's a pulse counter card. Nonetheless, from the data read from the card, I'm able to calculate the fluid flowing through the device in gallons per hour, minute, second, etc. What I need to do is the following:
Calculate the total number of gallons of fluid that has flowed through the sensor since the loop began running.
If possible, store that total so that it can be incremented the next time the program runs.
I know how to calculate it by hand; I'm just not sure how to achieve the running summation required to calculate the total amount of fluid that has passed through the sensor, or how to store the variable being incremented for the next program execution. I presume the latter would involve writing a TDMS file, then opening and reading back the data, unless there's a better way?
Edit:
Below is the code used to determine GPM flow through my sensor. This setup is in accordance with the 9361 manual; it executes and yields proper results.
See this link for details:
http://zone.ni.com/reference/en-XX/help/373197L-01/criodevicehelp/crio-9361/
I can extrapolate how many gallons flow per second (or per sample period). The 1526.99 scalar is the flow sensor manufacturer's constant: the number of pulses per gallon passing through the sensor. The 9361 is set to frequency/period mode, so I'm calculating cycles per second and dividing by the constant (cycles per gallon) to get gallons per second or per minute.
I suppose I could get a time reference by looking at the sample period, so I guess the better question is: how do I keep an incrementing sum?
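For concreteness, here is a minimal sketch of the idea, not the original program (which is presumably a LabVIEW VI, where a shift register plus a TDMS or text-file write would play the same role). The pulse counts are simulated and the file name is an assumption:

#include <fstream>
#include <iostream>

const double PULSES_PER_GALLON = 1526.99;  // manufacturer's constant from the question

// Load the previously stored total (0.0 on the very first run).
double loadTotal(const char* path) {
    std::ifstream in(path);
    double total = 0.0;
    if (in) in >> total;
    return total;
}

// Persist the running total so the next run can continue from it.
void saveTotal(const char* path, double total) {
    std::ofstream(path, std::ios::trunc) << total;
}

int main() {
    const char* path = "total_gallons.txt";  // hypothetical storage file
    double totalGallons = loadTotal(path);

    // Stand-in for the acquisition loop: each iteration is one sample period.
    for (int period = 0; period < 10; ++period) {
        double pulsesThisPeriod = 1526.99;   // simulated counter reading for the period
        totalGallons += pulsesThisPeriod / PULSES_PER_GALLON;
    }

    saveTotal(path, totalGallons);
    std::cout << "Total gallons so far: " << totalGallons << "\n";
}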
I ran a linear search on an array containing all unique elements in the range [1, 10000], sorted in increasing order, for every search value from 1 to 10000, and plotted the runtime vs. search value graph as follows:
Upon closely analysing the zoomed-in version of the plot:
I found that the runtime for some larger search values is smaller than for some lower search values, and vice versa.
My best guess for this phenomenon is that it is related to how the CPU processes data using main memory and cache, but I don't have a firm, quantifiable reason to explain it.
Any hint would be greatly appreciated.
PS: The code was written in C++ and executed on a Linux platform hosted on a virtual machine with 4 vCPUs on Google Cloud. The runtime was measured using the C++ chrono library.
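Presumably the measurement loop was something along these lines (a hypothetical reconstruction, not the code from the original post):

#include <chrono>
#include <cstdio>
#include <vector>

// Plain linear search over the array; returns the index of key or -1.
int linearSearch(const std::vector<int>& a, int key) {
    for (int i = 0; i < (int)a.size(); ++i)
        if (a[i] == key) return i;
    return -1;
}

int main() {
    std::vector<int> a(10000);
    for (int i = 0; i < 10000; ++i) a[i] = i + 1;  // unique, sorted values in [1, 10000]

    for (int key = 1; key <= 10000; ++key) {
        auto t0 = std::chrono::steady_clock::now();
        volatile int idx = linearSearch(a, key);   // volatile keeps the call from being optimized away
        auto t1 = std::chrono::steady_clock::now();
        auto us = std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count();
        std::printf("%d %lld %d\n", key, (long long)us, (int)idx);
    }
}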
CPU cache size depends on the CPU model, and there are several cache levels, so your experiment should take all those factors into account. The L1 cache is usually 8 KiB, which is about 4 times smaller than your 10000-element array. But I don't think cache misses are the cause here. L2 latency is about 100 ns, which is much smaller than the difference between the lowest line and the second line, which is about 5 usec. I suppose this second line/cloud comes from context switching. The longer the task, the more likely a context switch is to occur. This is why the cloud on the right side is thicker.
Now for the zoomed-in figure. As Linux is not a real-time OS, its time measurement is not very reliable. IIRC, its minimal reporting unit is the microsecond. Now, if a certain task takes exactly 15.45 microseconds, then the time reported for it depends on when it started. If the task started exactly on a clock tick, the time reported would be 15 microseconds. If it started when the internal clock was 0.1 microseconds into a tick, then you would get 16 microseconds. What you see on the graph is the mapping of an analogue straight line onto a discrete-valued axis. So the task duration you get is not the actual task duration, but the real value plus the task's start offset within a microsecond (which is uniformly distributed, ~U[0,1]), all rounded to the closest integer value.
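A toy simulation of that quantization effect (a sketch, not code from either post; it models a clock that truncates timestamps to whole microseconds, which produces the same plus-or-minus one tick ambiguity as the rounding described above):

#include <cmath>
#include <cstdio>
#include <random>

int main() {
    const double trueDurationUs = 15.45;                      // assumed "real" runtime
    std::mt19937 rng(42);
    std::uniform_real_distribution<double> phase(0.0, 1.0);   // start offset within one tick

    int reported15 = 0, reported16 = 0;
    for (int i = 0; i < 100000; ++i) {
        double start = phase(rng);
        double end = start + trueDurationUs;
        // A microsecond-resolution clock truncates both timestamps to whole ticks.
        long reported = static_cast<long>(std::floor(end)) - static_cast<long>(std::floor(start));
        if (reported == 15) ++reported15; else ++reported16;
    }
    // Roughly 55% of runs report 15 us and 45% report 16 us, even though the
    // true duration never changes.
    std::printf("15 us: %d, 16 us: %d\n", reported15, reported16);
}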
As I understand it, pprof stops and samples the Go program every 10 ms. So a 30 s run should get about 3000 samples, but what is the meaning of the 26.26s? How can the sample count be shown as a time duration?
What's more, I once even got output showing that the sample time is bigger than the wall time. How can that happen?
Duration: 5.13s, Total samples = 5.57s (108.58%)
That confusing wording was reported in google/pprof issue 128:
The "Total samples" part is confusing.
Milliseconds are continuous, but samples are discrete — they're individual points, so how can you sum them up into a quantity of milliseconds?
The "sum" is the sum of a discrete number (a quantity of samples), not a continuous range (a time interval).
Reporting the sums makes perfect sense, but reporting discrete numbers using continuous units is just plain confusing.
Please update the formatting of the Duration line to give a clearer indication of what a quantity of samples reported in milliseconds actually means.
Raul Silvera's answer:
Each callstack in a profile is associated to a set of values. What is reported here is the sum of these values for all the callstacks in the profile, and is useful to understand the weight of individual frames over the full profile.
We're reporting the sum using the unit described in the profile.
Would you have a concrete suggestion for this example?
Maybe just a rewording would help, like:
Duration: 1.60s, Samples account for 14.50ms (0.9%)
There are further pprof improvements being discussed in golang/go issue 36821:
pprof CPU profiles lack accuracy (closeness to the ground truth) and precision (repeatability across different runs).
The issue is with the OS timers used for sampling; OS timers are coarse-grained and have a high skid.
I will propose a design to extend CPU profiling by sampling the CPU Performance Monitoring Unit (PMU), aka hardware performance counters.
It includes examples where Total samples exceeds Duration; that can happen because on a multi-core machine several threads can be on-CPU and sampled during the same wall-clock interval, so the CPU time accumulated across samples can exceed the elapsed wall time.
Dependence on the number of cores and length of test execution:
The results of goroutine.go test depend on the number of CPU cores available.
On a multi-core CPU, if you set GOMAXPROCS=1, goroutine.go will not show a huge variation, since each goroutine runs for several seconds.
However, if you set GOMAXPROCS to a larger value, say 4, you will notice a significant measurement attribution problem.
One reason for this problem is that the itimer samples on Linux are not guaranteed to be delivered to the thread whose timer expired.
Since Go 1.17 (and improved in Go 1.18), you can add pprof labels to learn more:
A cool feature of Go's CPU profiler is that you can attach arbitrary key value pairs to a goroutine. These labels will be inherited by any goroutine spawned from that goroutine and show up in the resulting profile.
Let's consider the example below that does some CPU work() on behalf of a user.
By using the pprof.Labels() and pprof.Do() API, we can associate the user with the goroutine that is executing the work() function.
Additionally the labels are automatically inherited by any goroutine spawned within the same code block, for example the backgroundWork() goroutine.
import (
    "context"
    "runtime/pprof"
)

func work(ctx context.Context, user string) {
    // Attach a "user" label to every profiler sample taken while this code runs.
    labels := pprof.Labels("user", user)
    pprof.Do(ctx, labels, func(_ context.Context) {
        go backgroundWork() // spawned goroutine inherits the labels
        directWork()
    })
}
How you use these labels is up to you.
You might include things such as user ids, request ids, http endpoints, subscription plan or other data that can allow you to get a better understanding of what types of requests are causing high CPU utilization, even when they are being processed by the same code paths.
That being said, using labels will increase the size of your pprof files. So you should probably start with low cardinality labels such as endpoints before moving on to high cardinality labels once you feel confident that they don't impact the performance of your application.
I've been working on an Arduino (ATmega328P) prototype that has to log data during certain events. An LSM6DS33 sensor is used to generate 6 values (2 bytes each) at a sample rate of 104 Hz. This data needs to be logged for a period of 500 to 20000 ms.
In my code, I generate an interrupt every 1/104 sec using Timer1. When this interrupt occurs, data is read from the sensor, calibrated and then written to an SD card. Normally, this is not an issue. Reading the data from the sensor takes ~3350us, calibrating ~5us and writing ~550us. This means a total cycle takes ~4000us, whereas 9615us is available.
In order to save power, I wish to lower the voltage to 3.3 V. According to the Atmel datasheet, this also means that the clock frequency should be lowered to 8 MHz. Assuming everything then runs half as fast, a measurement cycle would still be possible, because ~8000us < 9615us.
After some testing (still at 5 V / 16 MHz), however, I noticed that every now and then a write cycle would take ~1880us instead of ~550us. I am using the SdFat library to write to and test SD cards (the RawWrite example). The following results came in when I tested the card:
Start raw write of 100000 KB
Target rate: 100 KB/sec
Target time: 100 seconds
Min block write time: 1244 micros
Max block write time: 12324 micros
Avg block write time: 1247 micros
As seen, the average write time is fairly consistent, but sometimes a peak duration of about 10x the average occurs! According to the author of the library, this is because the SD card needs to perform erase cycles after a certain number of write cycles, which causes a write delay (src: post #18). This delay, however, pushes the time required for a cycle outside the available 9615us budget, because the total measurement cycle would then be 10672us.
The data I am trying to write is first put into a string using sprintf:
char buf[48] = "";  // sized for six signed values plus tabs; 20 bytes can overflow
sprintf(buf, "%li\t%li\t%li\t%li\t%li\t%li",
        rawData[0], rawData[1], rawData[2], rawData[3], rawData[4], rawData[5]);
myLog.println(buf);
This writes the data to a .txt file. At my data rate, only 21*104 = 2184 B/s would be needed. Lowering the speed of the RawWrite example to 6 KB/s makes the SD card write without the extended write delay, yet my code still gets the delays, even though it writes less data.
My question is: how do I prevent this delay from occurring (if possible)? And if not possible, how can I work around it? It would help if I understood why exactly the delay occurs, because the interval is not always the same (every 10-15 writes).
Some additional info:
The sketch's variables currently use 69% of the 2 kB of RAM. Creating two 512-byte buffers, as suggested in the same forum, is not possible for me.
Initially, I used two strings. Merging them into one didn't affect the write speed significantly.
I don't know how to work around the delay, but I get a more stable and faster write time if I write to a binary file instead of a .csv or .txt file.
The following link provides a fine script for writing data as a binary struct to the SD card. (There are some small typos in the example, but they are easily fixed.)
https://hackingmajenkoblog.wordpress.com/2016/03/25/fast-efficient-data-storage-on-an-arduino/
This will not help you with the time variation, but it might minimize the write time, and thus mitigate the timing issue.
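For what it's worth, here is a minimal sketch of the binary-struct idea using the classic SdFat API (the file name, chip-select pin and record layout are assumptions, not the blog's exact code):

#include <SPI.h>
#include <SdFat.h>
#include <string.h>

const uint8_t CHIP_SELECT = 10;   // adjust to your wiring

SdFat sd;
SdFile logFile;

struct Record {
    uint32_t t_us;     // timestamp from micros()
    int16_t  raw[6];   // the six LSM6DS33 values
};                     // 16 bytes per sample instead of ~40 characters of text

void setup() {
    sd.begin(CHIP_SELECT);
    logFile.open("LOG.BIN", O_WRITE | O_CREAT | O_APPEND);
}

void logSample(const int16_t data[6]) {
    Record r;
    r.t_us = micros();
    memcpy(r.raw, data, sizeof(r.raw));
    logFile.write(&r, sizeof(r));  // one fixed-size binary record
}

void loop() {
    // In the real sketch this would be driven by the Timer1 interrupt flag.
    int16_t fake[6] = {0, 1, 2, 3, 4, 5};
    logSample(fake);
    logFile.sync();                // flush occasionally; sync() itself costs time
    delay(10);
}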
I have a book describing energy-saving compiler algorithms, with a variable whose unit of measure is "cycles" for the "distance" until something happens (an HDD being put into idle mode).
But the results for the efficiency of the algorithm have just "time" on one axis of a diagram, not "cycles". So is it safe to assume (i.e., is my understanding of the cycle concept correct) that unless something like dynamic frequency scaling is used, cycles correspond to real physical time (seconds, for example)?
Cycles correspond to real physical time at a fixed clock frequency: for example, a CPU running at 1 GHz executes 1,000,000,000 cycles per second, which is the same as 1/1,000,000,000 seconds per cycle, or in other words one cycle per nanosecond. With dynamic frequency scaling, that conversion changes according to whatever the frequency is at any particular time.
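As a tiny illustration of the conversion (the 1 GHz figure is just the example from above):

#include <cstdio>

// Convert a cycle count to seconds at a fixed clock frequency.
// Only valid while the frequency is constant (no dynamic frequency scaling).
double cyclesToSeconds(double cycles, double frequencyHz) {
    return cycles / frequencyHz;
}

int main() {
    // 5000 cycles on a 1 GHz CPU come out to 5 microseconds.
    std::printf("%g s\n", cyclesToSeconds(5000.0, 1e9));
}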
I'm currently building something like a tiny software audio synthesizer on Windows 7 in C++. The core engine is running, and upon receiving MIDI events it plays notes, changes programs, etc. What puzzles me at the moment is where to put the 0 dB reference sound pressure level of the output channels.
Let's say the synthesizer produces a 440 Hz sine wave with an amplitude of 0.5. In order to calculate the sound level in dB, I need to set the reference level (0 dB). Does anyone know of something like a default for this?
When decibels relative to full scale are in question, aka dBFS, zero dB is assigned to the maximum possible digital level. A quote from the Wikipedia article:
0 dBFS is assigned to the maximum possible digital level. For example, a signal that reaches 50% of the maximum level at any point would peak at -6 dBFS, i.e. 6 dB below full scale. All peak measurements will be negative numbers, unless they reach the maximum digital value.
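So for the 0.5-amplitude sine from the question, the peak level follows directly from 20*log10 of the amplitude (a small sketch, assuming full scale corresponds to an amplitude of 1.0):

#include <cmath>
#include <cstdio>

// Peak level in dBFS, assuming full scale corresponds to an amplitude of 1.0.
double peakDbfs(double peakAmplitude) {
    return 20.0 * std::log10(std::fabs(peakAmplitude));
}

int main() {
    std::printf("%.2f dBFS\n", peakDbfs(0.5));  // prints -6.02 dBFS
}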
First you need to be clear about units. dB on its own is a ratio, not an absolute value. As @Roman R. suggested, you can just use 0 dB to mean "full scale", and then your range will be 0 dB (max) down to some negative dB value corresponding to the minimum value you are interested in (e.g. -120 dB). However, this is just an arbitrary measurement which doesn't tell you anything about the absolute level of the signal.
In your question, though, you refer to dB SPL (SPL = Sound Pressure Level), which is an absolute unit. 0 dB SPL is typically defined as 20 µPa (RMS), which is around the threshold of human hearing, and in this case the range of interest might be, say, -20 dB SPL to +120 dB SPL. However, if you really do want to measure dB SPL and not just an arbitrary dB value, then you will need to calibrate your system to take into account microphone gain, microphone frequency response, A-D sensitivity/gain, and various other factors. This is non-trivial, but essential if you actually want to implement some kind of SPL measuring system.