VBA I/O Performance

I'd like to know if there is a performance difference between these two pieces of code:
Open strFile For Output As #fNum
For var1 = 1 To UBound(strvar1)
    For var2 = 1 To UBound(strvar2)
        For var3 = 1 To UBound(strvar3)
            For var4 = 1 To UBound(strvar4)
                Print #fNum, texte
            Next var4
        Next var3
    Next var2
Next var1
Close #fNum
And
For var1 = 1 To UBound(strvar1)
    For var2 = 1 To UBound(strvar2)
        For var3 = 1 To UBound(strvar3)
            For var4 = 1 To UBound(strvar4)
                texteTotal = texteTotal + texte
            Next var4
        Next var3
    Next var2
Next var1
Open strFile For Output As #fNum
Print #fNum, texteTotal
Close #fNum
In case it matters, the loops are pretty big.

You'll have to try it, because it depends on the size of texte.
Each time you do texteTotal = texteTotal + texte, VBA makes a fresh copy of texteTotal. As texteTotal gets larger and larger, your loop will slow down.
You also run the risk of creating a string larger than VBA can handle.
So:
If you are writing to a network drive, and texte is a single character, the second approach will probably be better.
If you are writing to a fast local disc, and texte is 64kb, and the arrays are 1M entries each, the first approach will be better.

Since you said that texte and texteTotal are strings, I have a couple of suggestions:
1. Always concatenate strings with the & operator.
In VBScript, there are two ways to concatenate (add together) two string variables: the & operator and the + operator. The + operator is normally used to add together two numeric values, but is retained for backwards compatibility with older versions of BASIC that did not have the & operator for strings. Because the & operator is available in VBScript, it is recommended that you always prefer using it to concatenate strings and reserve the + for adding together numeric values. This won't necessarily provide any speed increase, but it eliminates any ambiguity in your code and makes clear your original intention. This is especially important in VBScript where you're working with Variants. When you use the + operator on a string that may contain numeric values, you have no way of knowing whether it will add the two numeric values together or combine the two strings.
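For instance, here is a minimal sketch of that ambiguity (the variable names and values are invented purely for illustration):
Dim a, b            ' Variants, as is typical when no type is declared
a = "1": b = "2"
MsgBox a & b        ' always "12": & forces string concatenation
MsgBox a + b        ' "12" here, but if a held the number 1 instead of the text "1", you would get 3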
2. Remember that string concatenation in VBScript has tremendous overhead and is very inefficient.
Unlike VB.NET and Java, VBScript does not have a StringBuilder class to aid in the creation of large strings. Instead, whenever you repeatedly add things to the end of a string variable, VB will repeatedly copy that variable over and over again. When you're building a string in a loop like you do in the above code, this can really degrade performance as VB constantly allocates space for the new string variable and performs a copy. With each iteration of the loop, the concatenation becomes slower and slower (in geek-speak, you're dealing with an n² algorithm, where n = the number of concatenations). The problem gets even worse if the string exceeds 64K in size. VB can store small strings in a 64K cache, but if a string becomes larger than the cache, performance drops even more. All of this is hidden from the programmer for simplicity's sake, but if you're concerned about optimization, it's worth understanding what is happening in the background.
In light of the above information, let's revisit the two code samples that you posted. You said that `texte` is "not very big but there are hundreds [of] thousands [of] lines." That means you may easily run out of space in the 64K string cache, and eventually you may even run out of space in the RAM allocated to the script. The limit varies, but you can eventually get "Out of Memory" errors, depending on how large your string grows. Even if you're not anywhere near that point now, it's worth considering for the future. If someone goes back later to add functionality to the script, will they remember or bother to change the string concatenation algorithm? Probably not.
To prevent any "Out of Memory" errors from cropping up, you can simply stop keeping the text string in RAM and write it directly to a file instead. This makes even more sense in your case because that's what you're eventually going to do with the string anyway! Why waste CPU cycles and space in memory by continually allocating and reallocating new string variables for each iteration of the loop when you could just write the value to the file and forget about it? I'd say that your first code sample is the simplest and preferred method to accomplish what you want.
The only time that I would consider the second method is if you were dealing with file I/O that was extremely inefficient, such as a disk being accessed over a network connection. As long as you're writing to a local disk, you're not going to have any performance problems here. The other concern, pointed out by #astander, is that the first method leaves the file you're writing to open for a long period of time, placing a lock on that resource. While this is an important concern, I think its implication for your application is minimal, as I assume that you're creating and writing to your own private file that no other application is expected to be able to access.
However, despite my recommendation, the first method is still not the most optimized and efficient way to concatenate strings. The best way would be a VBScript implementation of a StringBuilder class that stores the string in memory in an array of bytes as it is created. Periodically, as you add text to the string, the concatenation class would allocate more space to the array to hold the additional text. This would be much faster than the native VBScript implementation of string concatenation because it would perform these reallocations far less often. Additionally, as the string you're building in memory grows larger, your StringBuilder class could flush the contents of the string in memory to the file, and start over with an empty string. You could use something like Francesco Balena's CString class (http://www.vbcode.com/asp/showsn.asp?theID=415), or Microsoft's example (complete with some benchmarks and a further explanation) available here: http://support.microsoft.com/kb/170964.
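To give a rough idea of the technique, here is a minimal VBA sketch of a preallocate-and-fill buffer (this is only an illustration of the idea, not Balena's class or Microsoft's example; it would live in a class or standard module):
' Preallocate a string buffer and write into it with the Mid$ statement,
' growing it only occasionally instead of reallocating on every append.
Private buf As String
Private used As Long

Public Sub Append(ByVal s As String)
    If used + Len(s) > Len(buf) Then
        buf = buf & Space$(Len(buf) + Len(s))   ' roughly double the buffer so growth stays rare
    End If
    Mid$(buf, used + 1, Len(s)) = s             ' copy s into place without a new allocation
    used = used + Len(s)
End Sub

Public Function Text() As String
    Text = Left$(buf, used)
End Function
A helper like this could also flush buf to the file with a single Print # whenever used passes some threshold, which keeps both the memory footprint and the number of I/O calls low.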

I think the biggest difference would be the period for which you have the file open.
In the second case I would assume that it will be open for a shorter period of time, which is better as you should only ever lock resources for the smallest period of time required.

Cody, thank you very much for your time.
Here is some more information: Code Sample #1 is the current production code. To be more precise, it is one part of a whole process:
1) get information from a DB #1
2) calculations with mathematical formulas (matrices, vectors), N = BIG
3) copy the results to txt files (10k lines+ each)
4) psql queries to insert into databases
5) restitution
I am wondering whether the copy to txt files is really necessary and how costly it is compared to the psql insertions. I like your idea of building a custom string class; do you think it can beat the I/O performance?

Ran into this issue recently, where I was writing large amounts of text (~100k lines) to a network file. As each Print command creates I/O activity, the process of writing the file was terribly slow. However, creating a large string by concatenating new lines to it proved to be very slow as well, as explained in the other answers.
I solved this problem by writing the individual lines to a buffer array, then joining this array into a string, which is then written to the file at once.
Based on your example, it would be something like:
Dim buffer() As Variant
Dim i As Long
i = 1
ReDim buffer(1 To UBound(strvar1) * UBound(strvar2) * UBound(strvar3) * UBound(strvar4))
For var1 = 1 To UBound(strvar1)
    For var2 = 1 To UBound(strvar2)
        For var3 = 1 To UBound(strvar3)
            For var4 = 1 To UBound(strvar4)
                buffer(i) = texte
                i = i + 1
            Next var4
        Next var3
    Next var2
Next var1
Open strFile For Output As #fNum
Print #fNum, Join(buffer, vbCrLf)
Close #fNum
This prevents both the overhead of the incremental concatenations (the Join function scales linearly with the number of lines, instead of quadratically as repeated concatenation does), and the I/O overhead of writing many lines individually to a network file.

Related

Why is appending '$' to the end of string functions faster?

I was using a code-parser that checks for syntax errors and such, running through some old code, and saw the following suggestion:
String functions should use $ for speed, e.g. Trim$() instead of Trim()
I figure that using the '$' enforces strong typing (so we know it's going to be a string), but how much time does that really save?
So the question is: what extra steps happen behind the scenes without the '$' in place and how expensive are these extra steps?
It's pretty much guaranteed to be negligible these days, but I'm curious as to the relative cost; if not having the '$' costs 4 ticks, but having '$' costs 1 tick, then that's a pretty significant increase in relative performance.
VB6 has two versions of string functions. "Normal" string functions return Variant datatypes. The functions with $ return String datatypes. Using Strings is always faster than using Variants. Because of this, you should always use the $-functions.
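For example (the variable name and value are arbitrary, just to show the two forms):
Dim s As String
s = "  hello  "
Debug.Print Trim(s)     ' returns a Variant containing a String, which then has to be converted
Debug.Print Trim$(s)    ' returns a String directly, so no Variant conversion is needed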

Incremental text file processing for parallel processing

This is my first experience with the Julia language, and I'm quite surprised by its simplicity.
I need to process big files, where each line is composed of a set of tab-separated strings. As a first example, I started with a simple count program; I managed to use @parallel with the following code:
d = open(f)
lis = readlines(d)
ntrue = @parallel (+) for li in lis
    contains(li,s)
end
println(ntrue)
close(d)
I compared the parallel approach against a simple serial one with a 3.5 GB file (more than 1 million lines). On a 4-core Intel Xeon E5-1620, 3.60 GHz, with 32 GB of RAM, what I got is:
Parallel = 10.5 seconds; Serial = 12.3 seconds; Allocated Memory = 5.2 GB
My first concern is about memory allocation; is there a better way to read the file incrementally in order to lower the memory allocation, while preserving the benefits of parallelizing the processing?
Secondly, since the CPU gain related to the use of @parallel is not astonishing, I'm wondering whether it might be related to the specific case itself, or to my naive use of the parallel features of Julia. In the latter case, what would be the right approach to follow? Thanks for the help!
Your program is reading all of the file into memory as a large array of strings at once. You may want to try a serial version that processes the lines one at a time instead (i.e. streaming):
const s = "needle" # it's important for this to be const
open(f) do d
    ntrue = 0
    for li in eachline(d)
        ntrue += contains(li,s)
    end
    println(ntrue)
end
This avoids allocating an array to hold all of the strings and avoids allocating all of the string objects at once, allowing the program to reuse the same memory by periodically reclaiming it during garbage collection. You may want to try this and see if it improves the performance sufficiently for you. The fact that s is const is important, since it allows the compiler to predict the types in the for loop body, which isn't possible if s could change value (and thus type) at any time.
If you still want to process the file in parallel, you will have to open the file in each worker and advance each worker's read cursor (using the seek function) to an appropriate point in the file to start reading lines. Note that you'll have to be careful to avoid reading in the middle of a line and you'll have to make sure each worker does all of the lines assigned to it and no more – otherwise you might miss some instances of the search string or double count some of them.
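As a rough illustration of that idea (count_chunk and the byte offsets from/to are names invented here; the offsets would come from splitting filesize(f) into equal ranges), one worker's share of the work might look something like this:
# Count lines containing s whose starting byte lies in [from, to).
function count_chunk(path, s, from, to)
    open(path) do io
        if from > 0
            seek(io, from - 1)
            readline(io)    # finish the line straddling `from`; the previous chunk owns it
        end
        n = 0
        while !eof(io) && position(io) < to
            n += contains(readline(io), s)
        end
        n
    end
end
Each worker would run this over its own byte range, and the per-chunk counts would then be summed, e.g. with a parallel reduction.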
If this workload isn't just an example and you actually want to count the number of lines in which a certain string occurs in a file, you may simply want to use the grep command, e.g. calling it from Julia like this:
julia> s = "boo"
"boo"
julia> f = "/usr/share/dict/words"
"/usr/share/dict/words"
julia> parse(Int, readchomp(`grep -c -F $s $f`))
292
Since the grep command has been carefully optimized over decades to search text files for lines matching certain patterns, it's hard to beat its performance. [Note: if it's possible that zero lines contain the pattern you're looking for, you will want to wrap the grep command in a call to the ignorestatus function since the grep command returns an error status code when there are no matches.]

Fortran implied do write speedup

tl;dr: I found that an "implied do" write was slower than an explicit one under certain circumstances, and want to understand why/if I can improve this.
Details:
I've got a code that does something to the effect of:
DO i=1,n
    calculations...
    !m, x, and y all change each pass through the loop
    IF(m.GT.1)THEN
        DO j=1,m
            WRITE(10,*)x(j),y(j) !where 10 is an output file
        ENDDO
    ENDIF
ENDDO
The output file ends up being fairly large, and so it seems like the writing is a big performance factor, so I wanted to optimize it. Before anyone asks, no, moving away from ASCII isn't an option due to various downstream requirements. Accordingly, I rewrote the IF statement (and contents) as:
IF(m.GT.1)THEN
    !build format statement for write
    WRITE(mm1,*)m-1
    mm1=ADJUSTL(mm1)
    !implied do write statement
    WRITE(10,'('//TRIM(mm1)//'(i9,1x,f7.5/),i9,1x,f7.5)')(x(j),y(j),j=1,m)
ELSEIF(m.EQ.1)THEN
    WRITE(10,'(i9,1x,f7.5)')x(1),y(1)
ENDIF
This builds the format statement according to the # of values to be written out, then does a single write statement to output things. I've found that the code actually runs slower with this formulation. For reference, I've seen significant speedup on the same system (hardware and software) when going to an implied do write statement when the amount of data to be written was fixed. Under the assumption that the WRITE statement, itself, is faster, then that would mean the overhead from the couple of lines building that statement are what take the added time, but that seems hard to believe. For reference, m can vary a fair amount, but probably averages at least 1000. Is the concatenation of strings // a very slow operator, or is there something else I'm missing? Thanks in advance.
I don't have specific timing information to add, but your data transfer with an implied do loop is needlessly complicated.
In the first fragment, with the explicit looping, you are writing each pair of numbers to distinct records and you wish to repeat this output with the implied do loop. To do this, you use the slash edit descriptor to terminate each record once a pair has been written.
The needless complexity comes from two areas:
you have distinct cases for one/more than one pair;
for the more-than-one case you construct a format including a "dynamic" repeat count.
As Vladimir F comments you could just use a very large repeat count: it isn't erroneous for an edit descriptor to be processed when there are no more items to be written. The output terminates (successfully) when reaching such a non-matching descriptor. You could, then, just write
WRITE(10,'(*(i9,1x,f7.5/))') (x(j),y(j),j=1,m) ! * replacing a large count
rather than the if construct and the format creation.
Now, this doesn't quite match your first output. As I mentioned above, output termination comes about when a data edit descriptor is reached when there is no corresponding item to output. This means that / will be processed before that happens: you have a final empty record.
The colon edit descriptor is useful here:
WRITE(10,'(*(i9,1x,f7.5,:,/))') (x(j),y(j),j=1,m)
On reaching a : processing stops immediately if there is no remaining output item to process.
But my preferred approach is the far simpler
WRITE(10,'(i9,1x,f7.5)') (x(j),y(j),j=1,m) ! No repeat count
You had the more detailed format to include record termination. However, we have what is known as format reversion: if a format end is reached and more remains to be output then the record is terminated and processing goes back to the start of the format.
Whether these things make your output faster remains to be seen, but they certainly make the code itself much cleaner and clearer.
As a final note, it used to be trendy to avoid additional X editing. If your numbers fit inside a field of width 7, then 1x,f7.5 could be replaced by f8.5 and have the same look: the representation is right-justified in the field. It was claimed that this reduction had performance benefits, with less switching between descriptors.

Fortran unformatted I/O optimization

I'm working on a set of Fortran programs that are heavily I/O bound, and so am trying to optimize this. I've read at multiple places that writing entire arrays is faster than individual elements, i.e. WRITE(10)arr is faster than DO i=1,n; WRITE(10) arr(i); ENDDO. But, I'm unclear where my case would fall in this regard. Conceptually, my code is something like:
OPEN(10,FILE='testfile',FORM='UNFORMATTED')
DO i=1,n
    [calculations to determine m values stored in array arr]
    WRITE(10) m
    DO j=1,m
        WRITE(10) arr(j)
    ENDDO
ENDDO
But m may change each time through the DO i=1,n loop such that writing the whole array arr isn't an option. So, collapsing the DO loop for writing would end up with WRITE(10) arr(1:m), which isn't the same as writing the whole array. Would this still provide a speed-up to writing, what about reading? I could allocate an array of size m after the calculations, assign the values to that array, write it, then deallocate it, but that seems too involved.
I've also seen differing information on implied DO loop writes, i.e. WRITE(10) (arr(j),j=1,m), as to whether they help/hurt on I/O overhead.
I'm running a couple of tests now, and intend to update with my observations. Other suggestions on applicable optimizations are welcome.
Additional details:
The first program creates a large file, the second reads it. And, no, merging the two programs and keeping everything in memory isn't a valid option.
I'm using unformatted I/O and have access to the Portland Group and gfortran compilers. It's my understanding that PG's is generally faster, so that's what I'm using.
The output file is currently ~600 GB, the codes take several hours to run.
The second program (reading in the file) seems especially costly. I've monitored the system and seen that it's mostly CPU-bound, even when I reduce the code to little more than reading the file, indicating that there is very significant CPU overhead on all the I/O calls when each value is read in one at a time.
Compiler flags: -O3 (high optimization) -fastsse (various performance enhancements, optimized for SSE hardware) -Mipa=fast,inline (enables aggressive inter-procedural analysis/optimization on compiler)
UPDATE
I ran the codes with WRITE(10) arr(1:m) and READ(10) arr(1:m). My tests with these agreed and showed a reduction in runtime of about 30% for the WRITE code; the output file is also slightly less than half the original's size. For the second code, reading in the file, I made the code do basically nothing but read the file, to compare pure read time. This reduced the run time by a factor of 30.
If you use normal unformatted (record-oriented) I/O, you also write a record marker before and after the data itself. So you add eight bytes (usually) of overhead to each data item, which can easily (almost) double the data written to disc if your number is a double precision. The runtime overhead mentioned in the other answers is also significant.
The argument above does not apply if you use unformatted stream.
So, use
WRITE (10) m
WRITE (10) arr(1:m)
For gfortran, this is faster than an implied DO loop (i.e. the solution WRITE (10) (arr(i),i=1,m)).
In the suggested solution, an array descriptor is built and passed to the library with a single call. I/O can then be done much more efficiently, in your case taking advantage of the fact that the data is contiguous.
For the implied DO loop, gfortran issues multiple library calls, with much more overhead. This could be optimized, and is subject of a long-standing bug report, PR 35339, but some complicated corner cases and the presence of a viable alternative have kept this from being optimized.
I would also suggest doing I/O in stream access, not because of the rather insignificant saving in space (see above) but because keeping the leading record marker up to date on writing needs a seek, which is additional effort.
If your data size is very large, above ~ 2^31 bytes, you might run into different behavior with record markers. gfortran uses subrecords in this case (compatible to Intel), but it should just work. I don't know what Portland does in this case.
For reading, of course, you can read m, then allocate an allocatable array, then read the whole array in one READ statement.
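A minimal sketch of that reading pattern (the STREAM access, unit number 10, and REAL kind are assumptions made here only for illustration, to match the discussion above):
REAL, ALLOCATABLE :: arr(:)
INTEGER :: m
OPEN(10, FILE='testfile', FORM='UNFORMATTED', ACCESS='STREAM', STATUS='OLD')
READ(10) m                  ! the count written before each block
ALLOCATE(arr(m))
READ(10) arr(1:m)           ! one READ for the whole block
! ... use arr, then DEALLOCATE(arr) before reading the next block ...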
The point of avoiding outputting an array by looping over multiple WRITE() operations is to avoid the multiple WRITE() operations. It's not particularly important that the data being output are all the members of the array.
Writing either an array section or a whole array via a single WRITE() operation is a good bet. An implied DO loop cannot be worse than an explicit outer loop, but whether it's any better is a question of compiler implementation. (Though I'd expect the implied-DO to be better than an outer loop.)

Handle the total of Integers exceeding Long

I have the following code :
Dim L As Integer
Dim R As Integer
Dim a As Integer
a = ((L + R) / 2)
Now (L + R) exceeds the limit of Integer.
In order to handle this case, I have the following three options:
1. Define L (or R) as Long
2. Write a = ((CLng(L) + R) / 2)
3. Declare a new variable as Long, like this:
Dim S As Long
S = S + L + R
I am confused about which one is the best to implement.
Change all the variables to Long.
The code will be more robust.
The code will execute faster.
The additional 2 bytes of memory per variable is totally insignificant, unless you have many millions of these integer variables in use simultaneously.
You've already posted several questions here about integer overflow errors. With all respect, I really advise you to just change all your Integer variables to Long and get on with your coding.
I'd pick #2. I think (not sure) that this uses a little less memory than #1, because there's only one Long value in the equation, whereas changing L or R to Long would require space for two Long values.
I'm thinking #2 and #3 might end up looking the same (or pretty damn close) after compilation, and I personally think that in this case an extra variable wouldn't make it more readable. The difference of course is that in #2 the result of L+R might not need to be saved anywhere, but only moved between registers for the calculation.
I'm thinking a lot here, but I'm posting this partly because I hope that if I'm wrong, someone will correct me. Anyway, with the reasoning above, I'd go with #2. Edit: at least I'm quite certain that if one of the options uses less memory than the others, it's #2, but they might all be the same in that regard.
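To make option #2 concrete, here is a small illustrative sketch (the values are picked only to trigger the overflow):
Dim L As Integer, R As Integer, a As Integer
L = 30000
R = 30000
'a = ((L + R) / 2)        ' run-time error 6 (Overflow): L + R is evaluated as Integer, and 60000 > 32767
a = ((CLng(L) + R) / 2)   ' fine: CLng promotes the addition to Long; the result 30000 fits in a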
