How does the processing overhead of the length() function in REXX change with the length of the string?
Update: I'm using:
uni-REXX (R) Version 297t
Open-REXX
(TM) Copyright (C) iX Corporation
1989-2002. All rights reserved.
The overhead is 0. The length is stored in a descriptor.
Neil Milsted
Author of uni-REXX (no kidding).
It depends entirely on the implementation. Do you mean REXX for OS/2, REXX for z/VM, REXX for z/OS, OOREXX for Windows, REXX/400 or Regina?
Nothing in the REXX language specs from IBM dictates how the function is implemented under the covers; it could be O(n) if the implementation scans the string, or O(1) if the length is stored with the string somewhere.
If it's really important, it's best to test with benchmarking code to see whether the string length makes a difference.
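For illustration only, here is a minimal sketch (in C rather than REXX, purely as an analogy) of the two possibilities just described: a length found by scanning for a terminator, which is O(n) like C's strlen, versus a length stored in a descriptor next to the data, which is O(1). The descriptor struct, the sizes and the repetition count are assumptions for the sketch; the same timing approach carries over directly to a REXX benchmark.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical "descriptor" that stores the length alongside the data,
   the way descriptor-based REXX implementations reportedly do. */
typedef struct {
    char  *data;
    size_t len;
} descriptor;

static double seconds(clock_t a, clock_t b) {
    return (double)(b - a) / CLOCKS_PER_SEC;
}

int main(void) {
    const size_t lengths[] = { 10, 100, 1000, 10000 };
    const int reps = 100000;

    for (size_t i = 0; i < sizeof lengths / sizeof lengths[0]; i++) {
        size_t n = lengths[i];
        char *s = malloc(n + 1);
        memset(s, 'x', n);
        s[n] = '\0';
        descriptor d = { s, n };
        volatile size_t sink = 0;

        clock_t t0 = clock();
        for (int r = 0; r < reps; r++)
            sink += strlen(s);      /* O(n): scans the whole string */
        clock_t t1 = clock();
        for (int r = 0; r < reps; r++)
            sink += d.len;          /* O(1): reads the stored length */
        clock_t t2 = clock();

        printf("len=%5zu  scan=%.4fs  descriptor=%.4fs\n",
               n, seconds(t0, t1), seconds(t1, t2));
        free(s);
    }
    return 0;
}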
I'm not sure. I've written some Rexx in my day, but I've never had performance trouble with the length() function. How this scales probably also depends on your Rexx implementation.
I'd write a Rexx script that times 10,000 calls to length() on a 10-character string, then on a 100-character string, and then on a 1000-character string.
Plotting the resulting times in a graph would give you an approximation of how performance decreases.
Having said all this, my guess is that the performance decrease is at most linear, i.e. O(n). (See http://en.wikipedia.org/wiki/Big_O_notation)
It's language-implementation specific. It's been a long time since I wrote any REXX; in fact, what I wrote was AREXX (the Amiga implementation), and that was 15 years ago. :-)
You can write your own test routine: generate strings of increasing length and measure the time length() takes using a high-performance timer. If you store the times and string lengths in a comma-separated text file, you can then plot it using gnuplot, and you'll see very clearly how it scales.
Edit: I should have checked Rolf's answer first since he wrote more or less the same thing. :-)
I can speak for the IBM Mainframe versions, the Classic Rexx version for OS/2, and any of the Object Rexx implementations. The length is stored in the string descriptor, so the overhead is independent of the string length.
Related
Sometimes I encounter questions about converting something to bytes. Are there cases where it is vitally important to convert to bytes, and what might I want to convert something to bytes for?
In most languages, the most common string functions come as part of the language or in a pre-made library/include/import, often implemented in native code to take advantage of processor-level string instructions. Sometimes, however, you need to do something with a string that isn't natively supported by the language. Since the 8-bit days, people have therefore viewed strings as arrays of 7- or 8-bit characters, each of which fits within a byte, and used conventions like ASCII to determine which byte value represents which character.
While standard languages often have functions like string.replaceChar(OFFSET,'a'), this approach can be painstakingly slow, because each call to the replaceChar method incurs processing overhead that may be greater than the processing that actually needs to be done.
There is also the simplicity factor when designing your own string algorithms, but as I said, most of the common algorithms (stringCompare, trimString, reverseString, etc.) come prebuilt in modern languages.
Suppose you want to perform an operation on a string that doesn't come as standard.
Suppose, for example, you want to add two numbers that are represented as decimal digits in strings, and these numbers are larger than the 64-bit word size of the processor. The RSA encryption/decryption behind the SSL browser padlock uses numbers that don't fit into the word size of a desktop computer, but nonetheless the programs on a desktop that deal with RSA certificates and keys must be able to process this data, which is actually strings.
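As a rough illustration of that last example, here is a hedged C sketch that adds two non-negative numbers held as decimal-digit strings, digit by digit from the right, the way you would on paper. The function name, buffer sizes and example values are made up for the sketch, and input validation is omitted.

#include <stdio.h>
#include <string.h>

/* Add two non-negative decimal numbers given as digit strings.
   Writes the result into out (assumed large enough) and returns it. */
static char *add_decimal_strings(const char *a, const char *b, char *out) {
    int la = (int)strlen(a), lb = (int)strlen(b);
    char tmp[1024];                 /* reversed result, assumed big enough */
    int  pos = 0, carry = 0;

    for (int i = 0; i < la || i < lb || carry; i++) {
        int da  = (i < la) ? a[la - 1 - i] - '0' : 0;   /* digit from a */
        int db  = (i < lb) ? b[lb - 1 - i] - '0' : 0;   /* digit from b */
        int sum = da + db + carry;
        tmp[pos++] = (char)('0' + sum % 10);
        carry = sum / 10;
    }
    for (int i = 0; i < pos; i++)   /* reverse into the output buffer */
        out[i] = tmp[pos - 1 - i];
    out[pos] = '\0';
    return out;
}

int main(void) {
    char result[1026];
    /* Numbers far wider than a 64-bit word, handled purely as strings. */
    puts(add_decimal_strings("123456789012345678901234567890",
                             "987654321098765432109876543210", result));
    return 0;
}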
There are many and varied reasons you might want to treat a string as an array of bytes, but each of these reasons is fairly specialised.
There are places in Windows (or related technologies) where time is counted in 100-nanosecond units.
FILETIME
MFTIME
TimeSpan.TimeSpan(long)
The TimeSpan constructor does call these "ticks" - but since GetTickCount counts milliseconds, this sounds more like a general term for short periods of time than anything specific.
I can describe these things as "100 nanosecond units", but this is a bit unwieldy when writing documentation and comments or naming variables. I can make a term up, but if there's a standard one, or even one that's just somewhat common, then I'd rather use that instead.
"Tick" is indeed a generic term.
DateTime, TimeSpan and DateTimeOffset in .NET use 100ns ticks, as do FILETIME, MFTIME and several others from Win32 APIs.
Environment.TickCount in .NET and GetTickCount from Win32 use 1ms ticks.
System.Diagnostics.Stopwatch uses a tick size that varies from system to system, depending on the hardware capabilities of that system. Its length is determined by its Frequency property.
In the .NET space, it would be acceptable to say "TimeSpan ticks" or "DateTime ticks" (as opposed to "Stopwatch Ticks"). If you just say "ticks", you leave it open for ambiguity.
Indeed, a very common bug you will find is improper usage of someStopwatch.ElapsedTicks. The fix is a simple . character: someStopwatch.Elapsed.Ticks
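To make the unit concrete, here is a small C sketch of the arithmetic involved; it assumes the standard ratio of 10,000,000 ticks per second and the well-known offset of 11,644,473,600 seconds between the FILETIME epoch (1601-01-01) and the Unix epoch (1970-01-01). The example tick value is arbitrary.

#include <stdio.h>
#include <stdint.h>

/* 100-nanosecond ticks per second: 1 s / 100 ns = 10,000,000. */
#define TICKS_PER_SECOND      10000000ULL
#define TICKS_PER_MILLISECOND 10000ULL
/* Ticks between the FILETIME epoch (1601-01-01) and the Unix epoch
   (1970-01-01): 11,644,473,600 seconds expressed in 100 ns units. */
#define EPOCH_DIFFERENCE_TICKS 116444736000000000ULL

int main(void) {
    uint64_t filetime_ticks = 132537600000000000ULL; /* arbitrary example value */

    uint64_t seconds      = filetime_ticks / TICKS_PER_SECOND;
    uint64_t milliseconds = filetime_ticks / TICKS_PER_MILLISECOND;
    uint64_t unix_seconds = (filetime_ticks - EPOCH_DIFFERENCE_TICKS)
                            / TICKS_PER_SECOND;

    printf("ticks        : %llu\n", (unsigned long long)filetime_ticks);
    printf("as seconds   : %llu\n", (unsigned long long)seconds);
    printf("as ms        : %llu\n", (unsigned long long)milliseconds);
    printf("as Unix time : %llu\n", (unsigned long long)unix_seconds);
    return 0;
}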
Speaking generically, since there's not an SI prefix for 10⁻⁷, there is no better scientifically valid term for this unit of time than "100-nanosecond units".
You could combine SI prefixes hecto- (denoting a factor of one hundred, or 10²) and nano- (denoting one billionth, or 10⁻⁹), thus getting the term "hectonanoseconds". However, SI prefixes are not generally allowed to be combined in this way (at least according to Wikipedia and several other sources I found).
I don't think there's a standard term, but I have seen:
hns which stands for "hundreds of nanoseconds"
values padded out to nanoseconds (e.g., 16000 ns instead of 160 hundred-nanoseconds) or expressed as a decimal number of milliseconds (0.016 ms)
tick could be appropriate, as it is typically defined with respect to whatever clock is in context (GetTickCount is poorly named)
Ticks is the right term, at least in modern C# usage (as described at https://msdn.microsoft.com/en-us/library/system.timespan.ticks(v=vs.110).aspx). GetTickCount predates that but isn't actually attempting to define a tick.
The D programming language, which also uses this timestamp granularity (though not the same epoch), uses the term "hecto-nanoseconds", abbreviated as hnsecs.
I'm working on a set of Fortran programs that are heavily I/O bound, and so I'm trying to optimize this. I've read in multiple places that writing entire arrays is faster than writing individual elements, i.e. WRITE(10) arr is faster than DO i=1,n; WRITE(10) arr(i); ENDDO. But I'm unclear where my case falls in this regard. Conceptually, my code is something like:
OPEN(10,FILE='testfile',FORM='UNFORMATTED')
DO i=1,n
   [calculations to determine m values stored in array arr]
   WRITE(10) m
   DO j=1,m
      WRITE(10) arr(j)
   ENDDO
ENDDO
But m may change each time through the DO i=1,n loop, so writing the whole array arr isn't an option. Collapsing the inner DO loop for writing would give WRITE(10) arr(1:m), which isn't the same as writing the whole array. Would this still provide a speed-up for writing? What about reading? I could allocate an array of size m after the calculations, assign the values to that array, write it, then deallocate it, but that seems too involved.
I've also seen differing information on implied DO loop writes, i.e. WRITE(10) (arr(j),j=1,m), as to whether they help or hurt I/O overhead.
I'm running a couple of tests now and intend to update with my observations. Other applicable suggestions are welcome.
Additional details:
The first program creates a large file, the second reads it. And, no, merging the two programs and keeping everything in memory isn't a valid option.
I'm using unformatted I/O and have access to the Portland Group and gfortran compilers. It's my understanding that the Portland Group's is generally faster, so that's what I'm using.
The output file is currently ~600 GB, and the codes take several hours to run.
The second program (reading in the file) seems especially costly. I've monitored the system and seen that it's mostly CPU-bound, even when I reduce the code to little more than reading the file, indicating that there is very significant CPU overhead on all the I/O calls when each value is read in one-at-a-time.
Compiler flags: -O3 (high optimization), -fastsse (various performance enhancements, optimized for SSE hardware), -Mipa=fast,inline (enables aggressive inter-procedural analysis/optimization in the compiler)
UPDATE
I ran the codes with WRITE(10) arr(1:m) and READ(10) arr(1:m). My tests with these agreed and showed a reduction in runtime of about 30% for the WRITE code; the output file is also slightly less than half the original's size. For the second code, reading in the file, I made the code do basically nothing but read the file, to compare pure read time. This reduced the runtime by a factor of 30.
If you use normal unformatted (record-oriented) I/O, you also write a record marker before and after the data itself. So you add eight bytes (usually) of overhead to each data item, which can easily (almost) double the data written to disc if your values are double precision. The runtime overhead mentioned in the other answers is also significant.
The argument above does not apply if you use unformatted stream.
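For illustration, here is a hedged C sketch of how those layouts typically differ on disk; the 4-byte record markers mirror gfortran's default and are an assumption here, and the file names are made up. Writing 1000 doubles one per record costs 16 bytes per value (two markers plus the payload), writing them as a single record costs 8008 bytes in total, and stream access stores just the 8000 bytes of data.

#include <stdio.h>
#include <stdint.h>

/* Write one sequential unformatted record as many Fortran runtimes lay
   it out: a 4-byte length marker, the payload, and the marker again.
   (Marker width and layout are implementation details; this mirrors
   gfortran's default and is used here purely for illustration.) */
static void write_record(FILE *f, const void *payload, uint32_t nbytes) {
    fwrite(&nbytes, sizeof nbytes, 1, f);
    fwrite(payload, 1, nbytes, f);
    fwrite(&nbytes, sizeof nbytes, 1, f);
}

int main(void) {
    double arr[1000];
    for (int i = 0; i < 1000; i++) arr[i] = i;

    FILE *f1 = fopen("per_element.dat", "wb");
    for (int i = 0; i < 1000; i++)             /* like WRITE(10) arr(j) in a loop */
        write_record(f1, &arr[i], (uint32_t)sizeof arr[0]);
    fclose(f1);                                 /* 1000 * (8 + 8) = 16000 bytes   */

    FILE *f2 = fopen("one_record.dat", "wb");
    write_record(f2, arr, (uint32_t)sizeof arr);/* like WRITE(10) arr(1:1000)     */
    fclose(f2);                                 /* 8 + 8000 = 8008 bytes          */

    FILE *f3 = fopen("stream.dat", "wb");
    fwrite(arr, sizeof arr[0], 1000, f3);       /* like ACCESS='STREAM': no markers */
    fclose(f3);                                 /* 8000 bytes                     */
    return 0;
}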
So, use
WRITE (10) m
WRITE (10) arr(1:m)
For gfortran, this is faster than an implied DO loop (i.e. the solution WRITE (10) (arr(i),i=1,m)).
In the suggested solution, an array descriptor is built and passed to the library with a single call. I/O can then be done much more efficiently, in your case taking advantage of the fact that the data is contiguous.
For the implied DO loop, gfortran issues multiple library calls, with much more overhead. This could be optimized, and is the subject of a long-standing bug report, PR 35339, but some complicated corner cases and the presence of a viable alternative have kept this from being optimized.
I would also suggest doing I/O in stream access, not because of the rather insignificant saving in space (see above) but because keeping the leading record marker up to date when writing requires a seek, which is additional effort.
If your data size is very large, above ~ 2^31 bytes, you might run into different behavior with record markers. gfortran uses subrecords in this case (compatible with Intel), but it should just work. I don't know what Portland does in this case.
For reading, of course, you can read m, then allocate an allocatable array, then read the whole array in one READ statement.
The point of avoiding a loop over multiple WRITE() statements is simply to avoid the multiple WRITE() operations. It's not particularly important that the data being output are all the members of the array.
Writing either an array section or a whole array via a single WRITE() operation is a good bet. An implied DO loop cannot be worse than an explicit outer loop, but whether it's any better is a question of compiler implementation. (Though I'd expect the implied-DO to be better than an outer loop.)
Background: I'm writing a toy Lisp (Scheme) interpreter in Haskell. I'm at the point where I would like to be able to compile code using LLVM. I've spent a couple days dreaming up various ways of feeding untyped Lisp values into compiled functions that expect to know the format of the data coming at them. It occurs to me that I am not the first person to need to solve this problem.
Question: What are some historically successful ways of mapping untyped data into an efficient binary format?
Addendum: In point of fact, I do know which of about a dozen different types the data is; I just don't know which one might be sent to the function at compile time. The function itself needs a way to determine what it got.
Do you mean, "I just don't know which [type] might be sent to the function at runtime"? It's not that the data isn't typed; certainly 1 and '() have different types. Rather, the data is not statically typed, i.e., it's not known at compile time what the type of a given variable will be. This is called dynamic typing.
You're right that you're not the first person to need to solve this problem. The canonical solution is to tag each runtime value with its type. For example, if you have a dozen types, number them like so:
0 = integer
1 = cons pair
2 = vector
etc.
Once you've done this, reserve the first four bits of each word for the tag. Then, every time two objects get passed in to +, first you perform a simple bit mask to verify that both objects' first four bits are 0b0000, i.e., that they are both integers. If they are not, you jump to an error message; otherwise, you proceed with the addition, and make sure that the result is also tagged accordingly.
This technique essentially makes each runtime value a manually-tagged union, which should be familiar to you if you've used C. In fact, it's also just like a Haskell data type, except that in Haskell the taggedness is much more abstract.
I'm guessing that you're familiar with pointers if you're trying to write a Scheme compiler. To avoid limiting your usable memory space, it may make more sense to use the bottom (least significant) four bits for the tag rather than the top ones. Better yet, because aligned pointers already have a few meaningless zero bits at the bottom (two for 4-byte alignment, three for 8-byte alignment), you can simply co-opt those bits for your tag, as long as you dereference the actual address rather than the tagged one.
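Here is a minimal C sketch of that low-bit tagging scheme; the tag values, helper names, and the two-bit tag width are assumptions made for the example. Immediate integers (fixnums) carry their value in the upper bits of the word, heap objects keep their tag in the otherwise-zero alignment bits, and the + primitive checks both tags before touching the payloads, exactly as described above.

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

/* Bottom 2 bits hold the tag; 8-byte-aligned heap pointers leave
   3 zero bits free, so 2 are plenty for this sketch. */
#define TAG_MASK   0x3u
#define TAG_FIXNUM 0x0u   /* integer stored directly in the word  */
#define TAG_PAIR   0x1u   /* pointer to a heap-allocated cons cell */

typedef uintptr_t value;

static value    make_fixnum(intptr_t n) { return (value)(n << 2) | TAG_FIXNUM; }
static intptr_t fixnum_val(value v)     { return (intptr_t)v >> 2; }
static unsigned tag_of(value v)         { return (unsigned)(v & TAG_MASK); }

typedef struct { value car, cdr; } pair;

static value make_pair(value car, value cdr) {
    pair *p = malloc(sizeof *p);      /* malloc is suitably aligned */
    p->car = car; p->cdr = cdr;
    return (value)(uintptr_t)p | TAG_PAIR;
}

/* The "+" primitive: check both tags before touching the payloads. */
static value prim_add(value a, value b) {
    if (tag_of(a) != TAG_FIXNUM || tag_of(b) != TAG_FIXNUM) {
        fprintf(stderr, "error: + expects integers\n");
        exit(1);
    }
    return make_fixnum(fixnum_val(a) + fixnum_val(b));
}

int main(void) {
    value x = make_fixnum(40), y = make_fixnum(2);
    printf("40 + 2 = %ld\n", (long)fixnum_val(prim_add(x, y)));

    value p = make_pair(x, y);
    printf("tag of the pair: %u\n", tag_of(p));   /* prints 1 */
    return 0;
}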
Does that help?
Your default solution should be a simple tagged union. If you want to narrow your typing down to more specific types, you can do it - but it won't be that "toy" any more. One thing to look at is abstract interpretation.
There are a few successful implementations of such an optimisation, with V8 being probably the most widespread. In the Scheme world, the most aggressively optimising implementation is Stalin.
I have an array of integers
a = [1,2,3,4]
When I do
a.join
Ruby internally calls the to_s method 4 times, which is too slow for my needs.
What is the fastest way to output a big array of integers to the console?
I mean:
a = [1,2,3,4........,1,2,3,9], should be:
1234........1239
If you want to print an integer to stdout, you need to convert it to a string first, since that's all stdout understands. If you want to print two integers to stdout, you need to convert both of them to a string first. If you want to print three integers to stdout, you need to convert all three of them to a string first. If you want to print one billion integers to stdout, you need to convert all one billion of them to a string first.
There's nothing you, we, or Ruby, or really any programming language can do about that.
You could try interleaving the conversion with the I/O by doing a lazy stream implementation. You could try to do the conversion and the I/O in parallel, by doing a lazy stream implementation and separating the conversion and the I/O into two separate threads. (Be sure to use a Ruby implementation which can actually execute threads in parallel; not all of them can: MRI, YARV and Rubinius can't, for example.)
You can parallelize the conversion, by converting separate chunks in the array in separate threads in parallel. You can even buy a billion core machine and convert all billion integers at the same time in parallel.
But even then, the fact of the matter remains: every single integer needs to be converted. Whether you do that one after the other first, and then print them or do it one after the other interleaved with the I/O or do it one after the other in parallel with the I/O or even convert all of them at the same time on a billion core CPU: the number of needed conversions does not magically decrease. A large number of integers means a large number of conversions. Even if you do all billion conversions in a billion core CPU in parallel, it's still a billion conversions, i.e. a billion calls to to_s.
As stated in the comments above, if Fixnum.to_s is not performing quickly enough for you, then you really need to consider whether Ruby is the correct tool for this particular task.
However, there are a couple of things you could do that may or may not be applicable for your situation.
If the building of the array happens outside the time-critical area, then build the array, or a copy of the array, with strings instead of integers. With my small test of 10000 integers this is approximately 5 times faster.
If you control both the reading and the writing process, then use Array.pack to write the output and String.unpack to read the result. This may not be quicker, as pack seems to call Fixnum.to_int even when the elements are already Integers.
I expect these figures would be different with each version of Ruby so it is worth checking for your particular target version.
The slowness in your program does not come from to_s being called 4 times, but from printing to the console. Console output is slow, and you can't really do anything about it.
For single digits you can do this
[1,2,3,4,5].map{|x|(x+48).chr}.join
If you need to speed up larger numbers you could try memoizing the result of to_s
Unless you really need to see the numbers on the console (and it sounds like you do not), write them to a file in binary; that should be much faster.
And you can pipe binary files into other programs, not just text, if that is what you need to do.
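A rough C sketch of that suggestion (the file names and sizes are arbitrary): writing the integers as raw binary skips the per-integer decimal conversion entirely, and another program, or whatever sits at the other end of a pipe, can read them back with a single bulk read.

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    const size_t n = 1000000;
    int *a = malloc(n * sizeof *a);
    for (size_t i = 0; i < n; i++) a[i] = (int)(i % 10);

    /* Text output: every integer must be converted to characters. */
    FILE *txt = fopen("numbers.txt", "w");
    for (size_t i = 0; i < n; i++)
        fprintf(txt, "%d", a[i]);
    fclose(txt);

    /* Binary output: the bytes are written as-is, no conversion at all. */
    FILE *bin = fopen("numbers.bin", "wb");
    fwrite(a, sizeof *a, n, bin);
    fclose(bin);

    /* Reading the binary file back is a single bulk read. */
    int *b = malloc(n * sizeof *b);
    FILE *in = fopen("numbers.bin", "rb");
    if (fread(b, sizeof *b, n, in) != n) return 1;
    fclose(in);

    printf("first values read back: %d %d %d\n", b[0], b[1], b[2]);
    free(a); free(b);
    return 0;
}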