ftrace: output through GPIO - linux-kernel

I am doing some research and need to collect all the kernel function calls within a certain time span, e.g. 60 seconds. I am using a Raspberry Pi 4B.
I've tried to use the function tracer ftrace and read the trace_pipe via
echo function > current_tracer
echo 1 > tracing_on
cat trace_pipe > /home/pi/trace/test.txt
This method seems to be too slow, and too much data gets lost because the buffer overflows: approx. 50-60 M data points get lost and I only get about 3 M data points. That is not enough for good statistics.
I also tried to use trace-cmd:
trace-cmd record -p function sleep 60
With trace-cmd about 20 M data points get lost, which is much better, but still not enough for good statistics. Furthermore, the file I get by doing
trace-cmd report > /home/pi/trace/test_trace-cmd.txt
is about 5-6 GB and takes a few minutes to write. I don't intend to make this file smaller (I assume that is impossible), but I can't wait that long.
I also worry about adding too much overhead to the system by saving such big trace files. Is that the case?
I am wondering whether it would be possible to direct the output of trace_pipe (or maybe of some other tracing file) to an I/O pin, so that I can connect a logic analyser to this pin and read the data stream with another device. There would then be no need to save the trace file on the Raspberry Pi itself. I also hope this would reduce the amount of data that gets lost.
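To judge whether a single pin could carry this stream, it may help to estimate the bit rate implied by the numbers above. The sketch below is only back-of-the-envelope arithmetic: it assumes "GB" means 10^9 bytes, uses the size of the text report (the raw trace data may well be smaller), and ignores the events that were lost. The function name is made up for illustration.

```python
def required_mbit_per_s(total_bytes, seconds):
    """Average bit rate (in Mbit/s) needed to move total_bytes in the given time."""
    return total_bytes * 8 / seconds / 1e6


# 5-6 GB of report text for a 60 s trace (assumption: 1 GB = 10**9 bytes).
low = required_mbit_per_s(5e9, 60)
high = required_mbit_per_s(6e9, 60)
print(f"{low:.0f}-{high:.0f} Mbit/s")  # 667-800 Mbit/s
```

The result can then be compared with whatever rate the pin and the logic analyser can actually sustain.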

Related

What is good way of using multiprocessing for bifacial_radiance simulations?

For a university project I am using bifacial_radiance v0.4.0 to run simulations of approx. 270 000 rows of data in an EPW file.
I have set up a scene with some panels in a module following a tutorial on the bifacial_radiance GitHub page.
I am running the python script for this on a high power computer with 64 cores. Since python natively only uses 1 processor I want to use multiprocessing, which is currently working. However it does not seem very fast, even when starting 64 processes it uses roughly 10 % of the CPU's capacity (according to the task manager).
The script will first create the scene with panels.
Then it will look at a result file (where I store results as csv), and compare it to the contents of the radObj.metdata object. Both metdata and my result file use dates, so all dates which exist in the metdata file but not in the result file are stored in a queue object from the multiprocessing package. I also initialize a result queue.
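As a rough illustration of that comparison step, the sketch below collects the indices of all dates that are not yet in the result file. It assumes both sources yield dates as strings in the same format; the function name and the sample values are hypothetical and not part of bifacial_radiance.

```python
def pending_indices(met_dates, done_dates):
    """Indices of entries in met_dates that have no result yet."""
    done = set(done_dates)  # set lookup keeps this fast for ~270 000 rows
    return [i for i, date in enumerate(met_dates) if date not in done]


# Hypothetical sample data: three timestamps, one already simulated.
met = ["2021-01-01_10", "2021-01-01_11", "2021-01-01_12"]
done = ["2021-01-01_11"]
todo = pending_indices(met, done)
print(todo)  # [0, 2]
```

Each index in the returned list would then be put on the index queue.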
I want to send a lot of the work to other processors.
To do this I have written two functions:
A file writer function which every 10 seconds gets all items from the result queue and writes them to the result file. This function is running in a single multiprocessing.Process process like so:
fileWriteProcess = Process(target=fileWriter, args=(resultQueue, resultFileName))
fileWriteProcess.start()  # start() returns None, so keep the Process object in its own variable
A ray trace function with a unique ID which does the following:
Get an index idx from the index queue (described above)
Use this index in radObj.gendaylit(idx)
Create the octfile. For this I have modified the name which the octfile is saved with to use a prefix which is the name of the process. This is to avoid all the processes using the same octfile on the SSD. octfile = radObj.makeOct(prefix=name)
Run an analysis analysis = bifacial_radiance.AnalysisObj(octfile,radObj.basename)
frontscan, backscan = analysis.moduleAnalysis(scene)
frontDict, backDict = analysis.analysis(octfile, radObj.basename, frontscan, backscan)
Read the desired results from resultDict and put them in the resultQueue as a single line of comma-separated values.
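The file writer side of this setup could look roughly like the sketch below: one helper empties the queue without blocking, another appends the lines to the result file. This is a minimal sketch, not the code from the question; the helper names are invented, and it is shown with queue.Queue for simplicity (multiprocessing.Queue raises the same queue.Empty exception).

```python
import queue


def drain(q):
    """Return every item currently in the queue without blocking."""
    items = []
    while True:
        try:
            items.append(q.get_nowait())
        except queue.Empty:
            return items


def append_lines(path, lines):
    """Append each CSV line to the result file, one line per row."""
    with open(path, "a") as f:
        for line in lines:
            f.write(line + "\n")
```

A writer loop would call drain() every 10 seconds and pass the result to append_lines().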
This all works. The processes are running after being created in a for loop.
This speeds up the whole simulation process quite a bit (10 days down to 1½ days), but as said earlier the CPU is running at around 10 % capacity and the GPU at around 25 % capacity. The computer has 512 GB of RAM, so memory is not an issue. The only communication with the processes is through the resultQueue and indexQueue, which should not bottleneck the program. I can see that the processes are not synchronizing, as the results are written slightly unsorted while the input EPW file is sorted.
My question is if there is a better way to do this, which might make it run faster? I can see in the source code that a boolean "hpc" is used to initiate some of the classes, and a comment in the code mentions that it is for multiprocessing, but I can't find any information about it elsewhere.

Transfer data async from one program to another

I am running a python program on a Raspberry Pi. This program writes data to a txt-file every second (every second some data is changed).
On a laptop I am running a Studio Basic program that reads that data file over the network from the Raspberry Pi. This works OK as long as the reads from that file are more than 15 seconds apart. If I read more often than that, the same data is returned. It looks as if the Windows program reads from a cache when the file is accessed again within 15 seconds. Is there a way to change this time limit so that I can read more often (say, every 5 seconds)?
Note: if I read the data file using another Python program on the Raspberry Pi, then the changed data is read correctly by that program. So the problem lies in the Windows system.
Please refer to this File Caching document: use win32file.CreateFile and specify FILE_FLAG_NO_BUFFERING to disable the cache, so that all read and write operations access the physical disk directly.
EDIT :
For using CreateFile in VB.net, please refer to:
https://social.msdn.microsoft.com/Forums/en-US/4a2ebfaa-d56d-487a-b03d-0f9ca72e3bbc/createfile-and-deviceiocontrol-function-in-vbnet?forum=winembplatdev

How to process sensor data in LabVIEW? Every value is 255

I'm trying to read data from the Yost Labs 3-Space Sensor Nano into LabVIEW via an NI MyRIO (1900). I was able to set up a sequence that communicates with the sensor through SPI. However, every time I run the program, it just spits out a single value of 255.
I think I understand that I need to include something that allows all the bytes to be read. I just don't know how to go about it.
As an example, I'm trying to read the gyros (0x26), which have a return length of 12 bytes and return a vector (float x3).
Here is my LabVIEW code, and here is the manual for the sensor. The commands I'm using are on pages 29-33. In the image, 0x2B is 'read temperature'.
Any help would be greatly appreciated! Thanks :)
Edit: I had messed up the wiring, so now the output jumps between ~35 and 255. I'm still having trouble getting all 3 gyro values from the SPI read.
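To make the "12 bytes, float x3" return format concrete, here is a small sketch (in Python rather than LabVIEW) of turning a complete 12-byte reply into three values. It assumes 32-bit IEEE floats in big-endian byte order, which should be checked against the sensor manual; the function name is invented for illustration. A single byte such as 255 cannot be decoded this way, which is why all 12 bytes have to be read first.

```python
import struct


def parse_gyro_vector(payload, byte_order=">"):
    """Decode a 12-byte reply into (x, y, z) floats.

    byte_order=">" (big-endian) is an assumption; use "<" for little-endian.
    """
    if len(payload) != 12:
        raise ValueError(f"expected 12 bytes, got {len(payload)}")
    return struct.unpack(f"{byte_order}3f", payload)


# Round trip with made-up values to show the layout.
reply = struct.pack(">3f", 1.0, -2.5, 0.25)
print(parse_gyro_vector(reply))  # (1.0, -2.5, 0.25)
```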
Quoting from Joe Friedrichsen in his comment:
The express block that resets the sensor is not guaranteed to precede the loop because there is no data flow between them. The LabVIEW runtime can see two independent and parallel groups and may choose to execute them simultaneously (which on the wire might mean reset comes between loop commands) or in "reverse" order. Add a wire from reset block to create a terminal on the loop.
Here's a picture of the fix.
You may wish to consider stringing the error wire through your program and wiring it to the stop terminal of the While Loop. Currently, your loop will keep running even if there's a fault in your hardware. Using the error wire would eliminate the need for the flat sequence structure.

hashes output more often in command line ftp

I use a command line ftp (the one on mac osx) inside some application I developed.
The thing is, this application will be used on very slow internet connections.
In the current configuration, I use stdout with hashes to determine upload progress this way:
1 I first get the exact file size.
2 I activate hash output in ftp.
3 I then simply count, each time I read something from stdout, how many hashes have been printed, and add them to my internal counter.
4 Multiplying this hash count by 1024 bytes, I get the transferred data over the total data to calculate the percentage.
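The counting scheme in the steps above can be sketched as follows. This is only an illustration of the arithmetic, with invented function names; the 1024 bytes per hash mark is taken from the description above and may differ between ftp clients.

```python
def count_hashes(chunk):
    """Number of hash marks in one chunk of text read from ftp's stdout."""
    return chunk.count("#")


def percent_done(hash_count, total_bytes, bytes_per_hash=1024):
    """Upload progress in percent, capped at 100."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return min(100.0, 100.0 * hash_count * bytes_per_hash / total_bytes)


# Example: a 10 KiB file and two reads from stdout.
total = 10 * 1024
seen = 0
for chunk in ["##", "#\n##"]:
    seen += count_hashes(chunk)
print(percent_done(seen, total))  # 50.0
```

With this scheme the progress can only move when ftp actually writes hash marks to stdout, so the granularity depends on how often that output arrives.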
This works but isn't as fluid as I would like it to be.
The final result jumps 20-30% at a time and 'waits' for 2-3 seconds between each stdout read.
On large files with a fast internet connection, say a 100 MB file over a 50 Mbps connection, this is very fluid.
Is there a way to tell ftp to output the current upload state more often, for example based on a time interval, every 200 ms or so?

I/O completion port silently fails to read completely

I'm developing a program that needs to write a large amount of data to disk, then read a much smaller amount of data back later on. It needs to "bin" related data together; once it figures out what to do with it, it can process the data further. It's basically acting like a database, but with temp files on disk. Portions of the temp files get reused fairly frequently, as I don't care about the data on disk after I read it back out, so that portion of the file can be recycled. I'm using I/O completion ports to implement this because sequential I/O is simply too slow.
The problem is that sometimes when I read the data, I don't get all of it back. For example, I will zero out my read buffer, do a read operation of, say, 20 bytes, and when the corresponding completion event triggers, some or even none of my read buffer will match what should be on disk, yet the buffer is not entirely zeroed either. Occasionally, I can detect this and try sleeping 5 seconds and reading the same portion again, and it matches what I read on the first try. This is taking place on a top-of-the-line SSD, so 5 seconds should be plenty to flush to disk. However, when I stop my application and look at the contents of the file, it's correct on disk. It's as if the previous write hasn't been flushed to disk and the read returned old data.
To test that theory, I tried writing 0xFF on entire sections as I read them. When this error happened again, my read buffer did not contain 0xFFs as I would have expected. So presumably, I'm not reading old data.
I also checked to make sure that the number of bytes returned from the completion event matched the number of bytes that I passed to ReadFile, and they do match. There is no error returned by the completion event or by ReadFile (other than ERROR_IO_PENDING). I am creating my temp files with FILE_ATTRIBUTE_NORMAL, FILE_FLAG_OVERLAPPED, and FILE_FLAG_RANDOM_ACCESS.
I also tried waiting for all pending writes for a given portion of the file to complete before trying to read, but to no avail. I would hope that Windows would do that for me, but it isn't covered in any documentation that I've read.
I'm really at a loss as to why I'm getting what look to be partial or corrupted reads. I'm really just looking for some ideas that might cause this behavior because I'm all out.
From the sound of things you're firing off writes and reads to the same portions of the same file and sometimes the data that the read returns isn't what you think you have previously written.
I assume you are waiting for the write completion for a piece of data before issuing a read request for the same area of the file? If not, the read could be occurring before the write completes. When lots of data is being written to the same disk, the write completions may begin to slow down and writes may spend more time pending (watch out for the resources that this consumes!)
Personally I'd include my own memory cache layer which knows about the data block until the write completion occurs - you can then satisfy reads for this part of the file from your cache if the write has not yet completed.
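A very small sketch of such a cache layer is shown below, in Python for brevity rather than against the Win32 API. It is only one possible shape for the idea: the class and method names are invented, and it assumes reads use the same block offsets as writes. A sequence number ensures that a late completion of an older write does not evict newer data for the same block.

```python
class PendingWriteCache:
    """Holds data for blocks whose writes have been issued but not completed."""

    def __init__(self):
        self._pending = {}   # offset -> (sequence number, data)
        self._next_seq = 0

    def begin_write(self, offset, data):
        """Remember the data before issuing the asynchronous write."""
        seq = self._next_seq
        self._next_seq += 1
        self._pending[offset] = (seq, bytes(data))
        return seq

    def complete_write(self, offset, seq):
        """Call from the write completion handler; drops the cached copy."""
        entry = self._pending.get(offset)
        if entry is not None and entry[0] == seq:
            del self._pending[offset]

    def read(self, offset, length, read_from_disk):
        """Serve from the cache while the write is pending, else hit the disk."""
        entry = self._pending.get(offset)
        if entry is not None and len(entry[1]) >= length:
            return entry[1][:length]
        return read_from_disk(offset, length)
```

Here read_from_disk stands for whatever function issues the real read; in the program described above that would be the overlapped ReadFile path.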

Resources