Vb.exe performance time - vb6

I am running a VB6 .exe through automation. In the exe I have written code that reads data from a database and saves that data to a file. The first time I ran the .exe it took 1 minute. As a baseline test I then called the same .exe 5 times, one after the other, but it took nearly 10 minutes to finish.
My question: if it takes 1 minute to generate 1 report, it should take 5 minutes to generate 5 reports, so why is it taking 10 minutes (more than double)? Is there any problem with calling an exe repeatedly, one after the other?

No, there isn't a problem with calling an exe multiple times. However, there might be a problem with the data that you're fetching or writing. Perhaps you aren't closing the database connection correctly when the program exits, and this causes a delay when you try to read from the database the second time.
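For illustration, typical ADO cleanup in VB6 looks something like this (a hedged sketch; the rs and conn object names are assumptions, not taken from your program):

' Close the recordset and connection before the program exits;
' leaked connections can slow down subsequent runs.
If Not rs Is Nothing Then
    If rs.State = adStateOpen Then rs.Close
    Set rs = Nothing
End If
If Not conn Is Nothing Then
    If conn.State = adStateOpen Then conn.Close
    Set conn = Nothing
End If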
But then again, without a more accurate description of what your program actually does and the relevant pieces of source code, it's very hard to be more specific.

Related

Looking for a more efficient way to pull data from multiple datasets in SAS

I'm trying to find a more efficient and speedier way (if possible) to pull subsets of observations that meet certain criteria from multiple hospital claims datasets in SAS. A simplified but common type of data pull would look like this:
data out.qualifying_patients;
  set in.state1_2017 in.state1_2018
      in.state1_2019 in.state1_2020
      in.state2_2017 in.state2_2018
      in.state2_2019 in.state2_2020;
  array prcode{*} I10_PR1-I10_PR25;
  do i=1 to 25;
    if prcode{i} in ("0DTJ0ZZ","0DTJ4ZZ") then cohort=1;
  end;
  if cohort=1 then output;
run;
Now imagine that instead of 2 states and 4 years we have 18 states and 9 years -- each about 1GB in size. The code above works fine but it takes FOREVER to run on our non-optimized server setup. So I'm looking for alternate methods to perform the same task but hopefully at a faster clip.
I've tried including (KEEP=) or (DROP=) statements for each dataset included in the SET statement to limit the variables being scanned, but this really didn't have much of an impact on speed -- and, for non-coding-related reasons, we pretty much need to pull all the variables anyway.
I've also experimented a bit with hash tables, but the data is too much to store in memory, so that didn't solve the issue. This also isn't a MERGE problem, which is what hash tables seem to excel at.
Any thoughts on other approaches that might help? Every data pull we do contains customized criteria for a given project, but we do these pulls a lot, and it seems really inefficient to constantly be processing through the same datasets over and over without benefiting from that. Thanks for any help!
I happened to have a 1GB dataset on my computer, so I tried several times: it takes SAS no more than 25 seconds to SET the dataset 8 times. I think the SET statement is too simple and basic for its efficiency to be improved.
I think the issue is more likely in the DO loop. Your program runs the loop 25 times for each record and may assign to cohort more than once, which is unnecessary. You can change it like this:
do i=1 to 25 until(cohort=1);
  if prcode{i} in ("0DTJ0ZZ","0DTJ4ZZ") then cohort=1;
end;
This can save a lot of loop iterations.
First, parallelization will help immensely here. Instead of running 1 job with one dataset after the next, run one job per state, or one job per year, or whatever makes sense for your dataset size and CPU count (you don't want more than 1 job per CPU). If your server has 32 cores, then you can easily run all the jobs you need here - 1 per state, say - and then, after they're all done, combine the results together.
Look up SAS MP CONNECT for one way to do multiprocessing; it basically uses rsubmits to submit code to your own machine. You can also do this by using XCMD to literally launch SAS sessions: add a state parameter to the SAS program, run 18 of them, have each output its results to a known location tagged with the state name or number, and then have your program collect them.
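A hedged sketch of the MP CONNECT route, assuming SAS/CONNECT is licensed; the SASCMD value, the libref names, and the year range are assumptions:

options autosignon=yes sascmd='!sascmd';

%macro pull_by_state(nstates=18);
  %local s tasks;
  %let tasks=;
  %do s = 1 %to &nstates;
    %let tasks = &tasks task&s;
    rsubmit task&s wait=no inheritlib=(in out);
      data out.qualifying_state&s;
        set in.state&s._2017 in.state&s._2018
            in.state&s._2019 in.state&s._2020;
        array prcode{*} I10_PR1-I10_PR25;
        do i=1 to 25 until(cohort=1);
          if prcode{i} in ("0DTJ0ZZ","0DTJ4ZZ") then cohort=1;
        end;
        if cohort=1 then output;
      run;
    endrsubmit;
  %end;
  waitfor _all_ &tasks;  /* block until every remote session finishes */
  signoff _all_;
%mend pull_by_state;

%pull_by_state(nstates=18)

After all tasks finish, one final SET over the out.qualifying_state1 through out.qualifying_state18 members stitches the results together.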
Second, you can optimize the DO loop more. In addition to the suggestions above, you may be able to optimize using pointers. SAS stores character array variables in memory in adjacent spots (assuming they all come from the same place) - see From Obscurity to Utility: ADDR, PEEK, POKE as DATA Step Programming Tools by Paul Dorfman for more details. On page 10, he shows the method I describe here: you PEEKC to get the concatenated values and then use INDEXW to find the thing you want.
data want;
  set have;
  array prcode{*} $8 I10_PR1-I10_PR25;
  /* one PEEKC reads all 25 codes at once (25 x 8 = 200 bytes) */
  found = (^^ indexw(peekc(addr(prcode[1]), 200), '0DTJ0ZZ')) or
          (^^ indexw(peekc(addr(prcode[1]), 200), '0DTJ4ZZ'));
  if found then output;  /* keep only qualifying rows, as in the original pull */
run;
Something like that should work. It avoids the loop.
You also could, if you want to keep the loop, exit the loop once you run into an empty procedure code. Usually these things don't go all 25, at least in my experience - they're left-filled, so I10_PR1 is always filled, then some of them - say, 5 or 10 - are filled, and I10_PR11 onward are empty. If you hit an empty one, you're done for that row. So leaving not just when you hit what you're looking for, but also when you hit an empty code, saves you a lot of processing time.
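A hedged sketch combining both early exits (this assumes the procedure codes really are left-filled):

do i=1 to 25;
  if missing(prcode{i}) then leave;  /* left-filled: stop at the first empty slot */
  if prcode{i} in ("0DTJ0ZZ","0DTJ4ZZ") then do;
    cohort=1;
    leave;                           /* stop as soon as the criterion is met */
  end;
end;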
You probably should consider a hardware upgrade or find someone who can tune your server. This paper suggests tips to improve the processing of large datasets.
Your code is pretty straightforward. The only suggestion is to exit the loop as soon as the criterion is met, to avoid wasting unnecessary resources.
do i=1 to 25;
  if prcode{i} in ("0DTJ0ZZ","0DTJ4ZZ") then do;
    output;  * cohort criteria met, so output the row;
    leave;   * exit the loop immediately;
  end;
end;

TwinCAT fails to save data to CSV

I am part of a tractor pulling team and we have a Beckhoff CX8190 based PLC for data logging. The system works most of the time, but every now and then saving the sensor values (collected every 10 ms) to CSV fails, mostly in the middle of a CSV row. The guy who built the code is new to TwinCAT and does not know how to find what causes this. Any ideas where to look for the cause?
Writing to a file is always an asynchronous action in TwinCAT. That is to say, it is not a real-time action, and it is not guaranteed that the writing process completes within the task cycle time of 10 ms. Therefore these function blocks always have a BUSY output which has to be evaluated, and the function block has to be called repeatedly until the BUSY output returns to FALSE. Only then can a new write command be executed.
I normally tackle this task with a double-buffer algorithm. Let's say the buffer array has 2x100 entries. Fill up the first 100 entries with sample values, then write them all together to the file with one command. When that's done, clear that half. Meanwhile the other half of the buffer is being filled with sample values. When the second half is full, write it all together to the file, and so on. This gives you far more time for the file-system access (in the example above, 100 x 10 ms = 1 s) than the 10 ms task cycle time.
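A minimal Structured Text sketch of that pattern, assuming the FB_FileWrite block from the Tc2_Utilities library; the file open/close handling and the CSV formatting of the values are omitted, and all variable names are illustrative:

VAR
    aBuf       : ARRAY[0..1, 0..99] OF LREAL;  // two buffer halves
    iFill      : INT := 0;   // half currently being filled
    iWrite     : INT := 0;   // half currently being written
    n          : INT := 0;   // next free slot in the fill half
    rSample    : LREAL;      // current sensor value
    bWritePend : BOOL;       // a full half is waiting to be written
    nState     : INT := 0;   // write handshake state
    hFile      : UINT;       // handle from FB_FileOpen (not shown)
    fbWrite    : FB_FileWrite;
END_VAR

// every 10 ms cycle: store one sample in the active half
aBuf[iFill, n] := rSample;
n := n + 1;
IF n > 99 THEN
    iWrite := iFill;     // hand the full half to the writer
    iFill := 1 - iFill;  // keep sampling into the other half
    n := 0;
    bWritePend := TRUE;  // a still-pending write here means the disk can't keep up
END_IF

// asynchronous write: keep calling until BUSY drops before triggering again
CASE nState OF
    0:
        IF bWritePend THEN
            fbWrite(sNetId := '', hFile := hFile,
                    pWriteBuff := ADR(aBuf[iWrite, 0]),
                    cbWriteLen := SIZEOF(aBuf[iWrite, 0]) * 100,
                    bExecute := TRUE, tTimeout := T#1S);
            nState := 1;
        END_IF
    1:
        fbWrite(bExecute := FALSE);
        IF NOT fbWrite.bBusy THEN
            bWritePend := FALSE;  // check fbWrite.bError / nErrId here in real code
            nState := 0;
        END_IF
END_CASE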
But this is just a suggestion out of my experience. I agree with the others - seeing some of the actual code would really help.

How to profile/debug VBA script that works fine on XLS files, but hangs on XLSX files?

I have a VB script that does row processing on a small Excel file (35 columns, 2000 rows, no hidden content). It takes about 3 seconds to process the file when applied to an .xls file. If I save the .xls as a .xlsx file, the same script takes about 30 to 60 seconds, completely freezing Excel several times in the process (although it finishes and the results are correct).
I suspect some loop parts of the code are responsible, but I need to find out which. I have found scripts like https://stackoverflow.com/a/3506143/5064264 which basically amount to timing all parts of the script by hand - similar to printf debugging, which is not very elegant and also consumes a lot of time for large scripts.
Is there a better alternative to semi-manual profiling, like the performance analyzers in most IDEs, or do other approaches to this problem exist? I don't want to post the code, as it is rather long and I also want to solve this for other scripts in the future.
Yes - kind of. I use this to test how fast things are running and to experiment with different approaches to get better timings. I stole it from somewhere here on SO.
Sub TheTimingStuff()
    Dim StartTime As Double
    Dim SecondsElapsed As Double

    StartTime = Timer

    'your code goes here

    SecondsElapsed = Round(Timer - StartTime, 2)
    MsgBox "This code ran successfully in " & SecondsElapsed & " seconds", vbInformation
End Sub
Another EDIT
You could bury Debug.Print lines sporadically through your code to see how much time each chunk takes to execute.
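For example (a hedged sketch; the chunk labels are placeholders, and the output appears in the Immediate window, Ctrl+G):

Dim t As Double
t = Timer
'... first chunk of your code ...
Debug.Print "chunk 1: " & Round(Timer - t, 3) & " s"
t = Timer
'... second chunk of your code ...
Debug.Print "chunk 2: " & Round(Timer - t, 3) & " s"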
The error in my case was, as Comintern's comment suggested, that a method was copying whole columns into a dictionary to reorder them, instead of just the cells with values in them. To preserve this for future readers, I am copying the excellent comment here - thanks again!
An xls file has a maximum of 65,536 rows. An xlsx file has a maximum of 1,048,576 rows. I'd start by searching for .Rows. My guess is that you have some code iterates the entire worksheet instead of only rows with data in them.
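For illustration, a hedged sketch of that difference (the ws worksheet and dict dictionary objects are assumptions):

'Slow on .xlsx: touches every cell of the column, all 1,048,576 rows
Dim c As Range
For Each c In ws.Columns(1).Cells
    dict(c.Row) = c.Value
Next c

'Fast: limit the loop to the rows that actually contain data
Dim lastRow As Long
lastRow = ws.Cells(ws.Rows.Count, 1).End(xlUp).Row
For Each c In ws.Range(ws.Cells(1, 1), ws.Cells(lastRow, 1))
    dict(c.Row) = c.Value
Next c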
As for generic profiling, or how to get there: I used a large number of breakpoints in the editor (set/remove with a left-click in the line-number column of the code editor).
I first set them about every 20 lines apart in the main method, ran the program with the large dataset, and after each press of Continue (F5) manually judged whether that section was slow or not.
Then I removed the breakpoints in the fast regions and added more in the slow regions, narrowing it down first to methods and then to lines inside the methods.
At the end, I could verify the diagnosis by commenting out the responsible line of code, and then fix it.

vsubst in batch file

In my office, every day they double-click the program vsubst to assign a partition of the space on the computer as a drive. I want to make a batch file out of it so they do not have to do this every day. It takes 5 seconds, but 5 people times 5 seconds times 250 working days is 6,250 seconds a year!
Can anyone help me out here? This shouldn't be too difficult. I run a Windows machine and could put it in the Windows Task Scheduler.
I guess it's something like: start ....vsubst.exe param param1
OK, trial and error got me there. Fill in the following line, place the batch file in your startup programs, and you're done :D

start "" "path\VSubst.exe" NEWDRIVE FOLDERPATH
exit

(The empty "" is the window title for start; without it, a quoted program path would be mistaken for the title.)

Rcpp in RStudio, can't cache in memory when parallel if I don't open the cpp file in RStudio

I met a weird problem, but I wonder if I'm asking the correct question:
result = parLapply(cl, 1:4,
  function(j, rho_list_needed, delta0_needed,
           V_iter_s, Sigma_list_needed) {
    rhoj = rho_list_needed[[j]]
    delta0_in_cpp = delta0_needed
    v = as.vector(V_iter_s[,,,j])
    sigmaj = Sigma_list_needed[[j]]
    sourceCpp('sample_Z.cpp')  # first compile is slow, then cached
    return(Sample_Z(rhoj, delta0_in_cpp, v, sigmaj, A, Cmatrix))
  },
  rho_list_needed, delta0_needed,
  V_iter[[s]], Sigma_list_needed)
When I was testing my sample_Z.cpp in parallel through parLapply, a single calculation takes around 1 second. In parallel, my 4 iterations take around 1.2 seconds, which is a big improvement over the unparallelized version, which takes 8 seconds.
There was no problem at all when I ran my program yesterday. Just now I noticed a bug and revised my program. To give my PC a fresh environment, I restarted my computer. When I started to run my program, I only opened the .R file and ran it. But the parallel step took 9 seconds, where it used to take 1.2 seconds. The 9 seconds was measured after warming up my cores, i.e., I had already sourced the cpp file before timing it.
I just didn't know where the bug was. I then tried to source the cpp file directly in my global environment, and I found out that there was no caching at all: the second run took the same time as the first.
But then I accidentally opened sample_Z.cpp in RStudio, explicitly in the editor, and after that everything worked correctly.
I don't know what keywords to google for this problem, and I don't know whether opening the cpp file is a must - I never knew of this before.
Can anyone tell me what the real issue is? Thanks!
After restarting your PC, you probably had extra processes running which would have competed for CPU cores and slowed down your algorithm. The fact that you're rebooting suggests to me you're not using Linux... but if you are, watch with top while starting your code, or the equivalent for your platform.
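Separately from background load, you can take the compile cost out of the timing entirely by sourcing the C++ file once per worker before the parallel call, instead of inside the parLapply function. A hedged sketch, reusing the names from the question:

library(parallel)
library(Rcpp)

cl <- makeCluster(4)

# compile sample_Z.cpp once on each worker; later parLapply calls can
# then call Sample_Z directly without invoking the compiler again
clusterEvalQ(cl, {
  library(Rcpp)
  sourceCpp('sample_Z.cpp')
})

# ... run the parLapply from the question, without the sourceCpp line ...

stopCluster(cl)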
