What is a good way of using multiprocessing for bifacial_radiance simulations?

For a university project I am using bifacial_radiance v0.4.0 to run simulations of approx. 270 000 rows of data in an EPW file.
I have set up a scene with some panels in a module following a tutorial on the bifacial_radiance GitHub page.
I am running the Python script for this on a high-power computer with 64 cores. Since a Python process natively uses only one core, I want to use multiprocessing, which is currently working. However, it does not seem very fast: even when starting 64 processes it uses only roughly 10% of the CPU's capacity (according to Task Manager).
The script will first create the scene with panels.
Then it will look at a result file (where I store results as CSV) and compare it to the contents of the radObj.metdata object. Both the metdata and my result file use dates, so all dates which exist in the metdata but not in the result file are stored in a queue object from the multiprocessing package (the index queue). I also initialize a result queue.
I want to send most of the work to the other cores.
To do this I have written two functions:
A file writer function which every 10 seconds gets all items from the result queue and writes them to the result file. This function is running in a single multiprocessing.Process process like so:
fileWriteProcess = Process(target=fileWriter, args=(resultQueue, resultFileName))
fileWriteProcess.start()
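For reference, a minimal sketch of what such a fileWriter loop could look like; the names mirror the description above, and the None sentinel used to stop the writer is an assumption:
import queue
import time

def fileWriter(resultQueue, resultFileName):
    # Every 10 seconds, drain the result queue and append the lines to the CSV file.
    done = False
    while not done:
        time.sleep(10)
        lines = []
        try:
            while True:
                item = resultQueue.get_nowait()
                if item is None:                 # assumed sentinel telling the writer to stop
                    done = True
                    break
                lines.append(item)
        except queue.Empty:
            pass
        if lines:
            with open(resultFileName, "a") as f:
                f.write("\n".join(lines) + "\n")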
A ray trace function with a unique ID which does the following (a sketch of the whole worker follows this list):
Get an index idx from the index queue (described above).
Use this index in radObj.gendaylit(idx)
Create the octfile. For this I have modified the octfile name to include a prefix, which is the name of the process, so that the processes do not all overwrite the same octfile on the SSD: octfile = radObj.makeOct(prefix=name)
Run an analysis analysis = bifacial_radiance.AnalysisObj(octfile,radObj.basename)
frontscan, backscan = analysis.moduleAnalysis(scene)
frontDict, backDict = analysis.analysis(octfile, radObj.basename, frontscan, backscan)
Read the desired results from the returned dictionaries and put them on the resultQueue as a single line of comma-separated values.
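Put together, the worker described above might look roughly like this. This is only a sketch: the bifacial_radiance calls are the ones quoted in the question (including the locally modified makeOct(prefix=...)), the final CSV line is a placeholder, and how radObj and scene reach each process is glossed over.
import queue
import bifacial_radiance

def rayTraceWorker(name, radObj, scene, indexQueue, resultQueue):
    # Pull timestamp indices until the index queue runs dry.
    while True:
        try:
            idx = indexQueue.get(timeout=5)      # stop when no timestamps are left
        except queue.Empty:
            return
        radObj.gendaylit(idx)                    # sky for this timestamp
        octfile = radObj.makeOct(prefix=name)    # per-process octfile (modified makeOct, as described above)
        analysis = bifacial_radiance.AnalysisObj(octfile, radObj.basename)
        frontscan, backscan = analysis.moduleAnalysis(scene)
        frontDict, backDict = analysis.analysis(octfile, radObj.basename, frontscan, backscan)
        # Placeholder: pull whichever fields you need from frontDict/backDict
        # and queue them as one comma-separated line.
        resultQueue.put(",".join(str(v) for v in (idx, frontDict, backDict)))
Each worker is then created and started in the for loop mentioned below, with its process name passed in as name so the octfiles don't collide.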
This all works; the worker processes are created and started in a for loop.
This speeds up the whole simulation quite a bit (from 10 days down to about 1½ days), but as said earlier the CPU is running at around 10% capacity and the GPU at around 25% capacity. The computer has 512 GB of RAM, which is not an issue. The only communication with the processes is through the resultQueue and the indexQueue, which should not bottleneck the program. I can see that the processes are not running in lockstep, as the results are written slightly out of order even though the input EPW file is sorted.
My question is whether there is a better way to do this that might make it run faster. I can see in the source code that a boolean "hpc" is used when initializing some of the classes, and a comment in the code mentions that it is for multiprocessing, but I can't find any information about it elsewhere.

Related

Looking for a more efficient way to pull data from multiple datasets in SAS

I'm trying to find a more efficient and speedier way (if possible) to pull subsets of observations that meet certain criteria from multiple hospital claims datasets in SAS. A simplified but common type of data pull would look like this:
data out.qualifying_patients;
set in.state1_2017
in.state1_2018
in.state1_2019
in.state1_2020
in.state2_2017
in.state2_2018
in.state2_2019
in.state2_2020;
array prcode{*} I10_PR1-I10_PR25;
do i=1 to 25;
if prcode{i} in ("0DTJ0ZZ","0DTJ4ZZ") then cohort=1;
end;
if cohort=1 then output;
run;
Now imagine that instead of 2 states and 4 years we have 18 states and 9 years -- each about 1GB in size. The code above works fine but it takes FOREVER to run on our non-optimized server setup. So I'm looking for alternate methods to perform the same task but hopefully at a faster clip.
I've tried including (KEEP=) or (DROP=) dataset options for each dataset in the SET statement to limit the variables being read, but this really didn't have much of an impact on speed -- and, for non-coding-related reasons, we pretty much need to pull all the variables.
I've also experimented a bit with hash tables, but the data is too much to store in memory, so that didn't solve the issue. This also isn't a MERGE problem, which seems to be what hash tables excel at.
Any thoughts on other approaches that might help? Every data pull we do has customized criteria for a given project, but we do these pulls a lot, and it seems really inefficient to keep processing through the same datasets over and over without benefiting from that. Thanks for any help!
I happened to have a 1 GB dataset on my computer. I tried several times, and it takes SAS no more than 25 seconds to SET the dataset 8 times; the SET statement is too simple and basic for there to be much efficiency to gain there.
I think the issue is more likely the DO loop. Your program runs the DO loop 25 times for each record and may assign to cohort more than once, which is not necessary. You can change it like this:
do i=1 to 25 until(cohort=1);
if prcode{i} in ("0DTJ0ZZ","0DTJ4ZZ") then cohort=1;
end;
This can save a lot of loop iterations.
First, parallelization will help immensely here. Instead of running one job that works through one dataset after the next, run one job per state, or one job per year, or whatever makes sense for your dataset size and CPU count (you don't want more than one job per CPU). If your server has 32 cores, you can easily run all the jobs you need here, say one per state, and then combine the results once they are done.
Look up SAS MP Connect for one way to do multiprocessing; it basically uses rsubmits to submit code to your own machine. You can also do this by using XCMD to literally launch SAS sessions: add a state parameter to the SAS program, run 18 of them, have each one write its results to a known location tagged with the state name or number, and then have your program collect them.
Second, you can optimize the DO loop further. In addition to the suggestions above, you may be able to optimize using pointers. SAS stores character array variables in adjacent memory locations (assuming they all come from the same place); see "From Obscurity to Utility: ADDR, PEEK, POKE as DATA Step Programming Tools" by Paul Dorfman for more details. On page 10 he shows the method I describe here: you use PEEKC to get the concatenated values and then INDEXW to find the value you want.
data want;
set have;
array prcode{*} $8 I10_PR1-I10_PR25;
* 200 bytes = 25 variables x 8 characters, stored adjacently in memory;
found = (^^ indexw (peekc (addr(prcode[1]), 200 ), '0DTJ0ZZ')) or
(^^ indexw (peekc (addr(prcode[1]), 200 ), '0DTJ4ZZ'))
;
run;
Something like that should work. It avoids the loop.
You could also, if you want to keep the loop, exit it once you run into an empty procedure code. In my experience these records rarely use all 25 slots; they're left-filled, so I10_PR1 is always filled, then perhaps 5 or 10 more are filled, and I10_PR11 onwards are empty. Once you hit an empty one, you're done for that row. So leaving the loop not only when you hit what you are looking for, but also when you hit an empty value, saves you a lot of processing time.
You probably should consider a hardware upgrade or find someone who can tune your server. This paper suggests tips to improve the processing of large datasets.
Your code is pretty straightforward. The only suggestion is to exit the loop as soon as the criterion is met, to avoid wasting resources.
do i=1 to 25;
if prcode{i} in ("0DTJ0ZZ","0DTJ4ZZ") then do;
output; * cohort criteria met so output the row;
leave; * exit the loop immediately;
end;
end;

TwinCAT fails to save data to CSV

I am part of a tractor pulling team and we have a Beckhoff CX8190-based PLC for data logging. The system works most of the time, but every now and then saving the sensor values (collected every 10 ms) to CSV fails, mostly in the middle of a CSV row. The guy who built the code is new to TwinCAT and does not know how to find what causes this. Any ideas where to look for the reason?
Writing to a file is always an asynchronous action in TwinCAT. That is to say, it is not a real-time action, and there is no guarantee that the write completes within the task cycle time of 10 ms. Therefore these function blocks always have a BUSY output which has to be evaluated, and the function block has to be called repeatedly until the BUSY output returns to FALSE. Only then can a new write command be executed.
I normally tackle this task with a two-sided ("ping-pong") buffer. Let's say the buffer array has 2x100 entries. Fill up the first 100 entries with sample values, then write them all to the file with one command. When that is done, clear that half. Meanwhile, the other half of the buffer is being filled with new samples; when the second half is full, write it to the file, and so on. That way you have far more time for the file-system access (in the example above, 100 x 10 ms = 1 s) than the 10 ms task cycle time.
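As a rough illustration only (sketched in Python rather than Structured Text, since no PLC code is shown; the buffer size and the write hook are assumptions), the idea is:
BUF_SIZE = 100                       # 100 samples x 10 ms task = 1 s per file write

buffers = [[], []]                   # the two halves of the buffer
active = 0                           # index of the half currently being filled

def sample(value, write_to_file):
    # Called once per 10 ms cycle; write_to_file is the slow, asynchronous file write.
    global active
    buffers[active].append(value)
    if len(buffers[active]) == BUF_SIZE:
        full = active
        active = 1 - active          # new samples now go into the other half
        write_to_file(buffers[full]) # hand the full half to the file write
        buffers[full] = []           # start a fresh list; the writer keeps its own reference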
But this is just a suggestion based on my experience. I agree with the others: some code would really help.

ftrace: output through GPIO

I am doing some research and need to collect all the kernel function calls within a certain time span, e.g. 60 seconds. I am using a Raspberry Pi 4B.
I've tried to use the function tracer ftrace and read the trace_pipe via
echo function > current_tracer
echo 1 > tracing_on
cat trace_pipe > /home/pi/trace/test.txt
This method seems to be too slow, and too much data gets lost because the buffer overflows: approx. 50-60 M data points are lost and I only get about 3 M data points, so the statistics are not good.
I also tried to use trace-cmd:
trace-cmd record -p function sleep 60
With trace-cmd about 20 M data points get lost, which is much better, but still not good enough for solid statistics. Furthermore, the file I get by running
trace-cmd report > /home/pi/trace/test_trace-cmd.txt
is about 5-6 GB and takes a few minutes to write. I don't intend to make this file smaller (I assume that is impossible), but I just can't wait that long.
I also worry that saving such big trace files adds too much overhead to the system. Is that the case?
I am wondering whether it would be possible to direct the output of trace_pipe (or maybe of some other tracing file) to an I/O pin, so that I can connect a logic analyser to that pin and read the data stream with another device. Then there would be no need to save the trace file on the Raspberry Pi itself, and I hope the amount of data being lost could also be reduced.

VBscript alternatives to writing variables to a file needed (FSO.write is too slow)

TL;DR: I need something way faster than FSO.write OR another way to share a variable in memory between different script instances.
Hello, I am running CCPulse (on Windows 7), which is a call-center monitoring tool. Agents are represented as "objects" and can have various statistics (like calls taken, total talk duration, etc.). CCPulse allows you to apply thresholds and actions to any statistic. These are basically VBScripts and, as far as I can tell, there are no restrictions.
This allows me to take the "Threshold StatValue" and do things with it, i.e. write it to a file. The issue is that if I apply a threshold to a statistic for all agents, the script executes for each agent object separately (in sequence, not in parallel). However, I want to export all the agent stats to a single CSV file.
I already got it working by creating the file if it doesn't exist, then opening it and reading it all into a string. If an agent has not been written to the file yet, his stat values get appended as a new line in the string; if he already exists in the file, I search and replace his line using a regex pattern. I then write the entire multiline string back to the file:
Set objFile = objFSO.OpenTextFile(inFile, 2) ' 2 = ForWriting
objFile.Write strMemoryBuffer
objFile.Close
Set objFile = Nothing
strMemoryBuffer contains the file's original content, with either a new line added or an existing line modified. This string (and subsequently the export file) is around 30 KB in size after all agents have been exported. It looks like this (simplified):
LoginID;Calls;TotalTalkTime
2243;08;9403
2132;12;8439
As I said, since the script runs separately for each agent, only one line is ever added or modified per pass (CCPulse executes the script one object at a time, until all are finished).
The write process is very slow, however: using Timer() it takes between 0.10 and 0.15 seconds! That is way too slow, as I need to run the script for almost 500 agents (ideally at intervals of no more than 30 seconds), but all the writing would take over a minute (CCPulse would build up a backlog of threshold operations that could never be finished; I can decrease the recalculation frequency, but that is detrimental in other ways).
If I comment out only the above block, execution time drops dramatically to ~0.02 seconds. So reading the file and manipulating the string take almost no time at all; only the write process is slow.
I am writing the file locally to a hard drive (no SSD though). I cannot use a RAM Disk.
I also already tried writing to the volatile environment, but somehow this is even slower (it does work, but for some reason the explorer process goes crazy with up to 50% CPU usage and CCPulse locks up, although the export file is still being updated).
The ideal solution would be to have the string repeatedly manipulated only in memory and then written to the file only once every 30 seconds or so, but I don't know how I can make the strMemoryBuffer variable available to the "next" agent. Any ideas?

Parallel processing in condor

I have a java program that will process 800 images.
I decided to use Condor as a platform for distributed computing, the aim being that I can divide those images across the available nodes -> get them processed -> combine the results back.
Say I have 4 nodes. I want to divide the processing to be 200 images on each node and combine the end result back to me.
I have tried executing it normally by submitting it as a java program and setting requirements = Machine == .. (listing all nodes), but it doesn't seem to work.
How can I divide the processing and execute it in parallel?
HTCondor can definitely help you but you might need to do a little bit of work yourself :-)
There are two possible approaches that come to mind: job arrays and DAG applications.
Job arrays: as you can see from example 5 in the HTCondor Quick Start Guide, you can use the queue command to submit more than one job. For instance, queue 800 at the bottom of your job file would submit 800 jobs to your HTCondor pool.
What people do in this case is organize the data to process using a filename convention and exploit that convention in the job file. For instance you could rename your images as img_0.jpg, img_1.jpg, ... img_799.jpg (possibly using symlinks rather than renaming the actual files) and then use a job file along these lines:
Executable = /path/to/my/script
Arguments = /path/to/data/dir/img_$(Process)
Queue 800
When the 800 jobs run, $(Process) is automatically assigned the value of the corresponding process ID (i.e. an integer going from 0 to 799), which means that each job will pick up the correct image to process.
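For example, the symlinking step and the way each job picks up its file could look roughly like this (a Python sketch with made-up paths; the question's actual worker is a Java program):
import os

DATA_DIR = "/path/to/data/dir"       # hypothetical, matching the submit file above

# One-off preparation: symlink the original images to the img_<N> naming
# convention that $(Process) in the submit file expects.
originals = sorted(os.listdir(os.path.join(DATA_DIR, "originals")))
for i, fname in enumerate(originals):
    os.symlink(os.path.join(DATA_DIR, "originals", fname),
               os.path.join(DATA_DIR, "img_%d" % i))

# Inside each job, the image to process simply arrives as the first
# command-line argument (args[0] in the Java worker).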
DAG: Another approach is to organize your processing in a simple DAG. In this case you could have a pre-processing script (SCRIPT PRE entry in your DAG file) organizing your input data (possibly creating symlinks named appropriately). The real job would be just like the example above.
