Which string copying method is faster in Delphi? - performance

I work in Delphi XE2 and I have to write a complicated function that sometimes copies longer parts of strings and sometimes only single characters, depending on the content of the source string. Which of the following two methods is faster?
Len := Length(Str);
SetLength(Result, Len);
for I := 1 to Len do Result[I] := Str[I];
Len := Length(Str);
SetLength(Result, Len);
Move(Str[1], Result[1], Len * SizeOf(Char));
I would also be curious how big the difference in running time is.

The Move() alternative would be faster, even if Move() were written naively as a byte-by-byte loop (which it is not in the RTL, although there is still much room for optimization, which we might get soon(tm)), because for every indexed write to a string the compiler inserts a call to System._UniqueStringU().
To copy a part (if contiguous) of a string into a new string, I would probably use either System.Copy() or System.SetString() instead.
However, if performance matters, my intuition tells me that this part would probably not be the one worth optimizing, but rather to reduce string usage and copying parts of them as new strings. In .NET, that was the reason why they implemented Span<T>, which basically is a length restricted pointer. When dealing with things like string parsing, using such an approach boosts performance way more than optimizing the copying itself.
Bonus: If you write your loop like this, you avoid the _UniqueStringU() call, because the preceding SetLength() has already ensured that Result is a string with RefCount = 1:
Len := Length(Str);
SetLength(Result, Len);
for I := 1 to Len do PChar(Pointer(Result))[I-1] := Str[I];
I am using a cast to Pointer first to avoid the _UStrToPWChar() call the compiler inserts when doing a string to PChar cast.

Related

Why does setting a Char variable to nil cause compilation to fail?

In my case below I don't want c to be assigned to anything until the first character of the file is read.
I tried setting the Char variable c to nil (c := nil;) but compilation fails. I tried an empty string as shown below, and it still doesn't work.
It works when I set it to a space character, but it seems peculiar that I have to do that.
Is there any way to initialize a Char to a null-like value as you can in other languages?
program CSVToMarkdown;
{$mode objfpc}{$H+}{$J-}
uses
  Sysutils;
var
  f: File of Char;
  c: Char;
begin
  Assign(f, 'test.csv');
  Reset(f);
  c := '';
  while not Eof(f) do
  begin
    Read(f, c);
    Write(c);
  end;
  Close(f);
  ReadLn;
end.
nil is a value for pointers, or more generally for reference types (interfaces, classes, dynamic arrays).
Non-reference types don't have a nil value. The type Char can take values from #0 to #255, and all of them are valid, though when interfacing with other languages #0 is sometimes interpreted as the end of a string.
If you mean nullable types like in Java or .NET, there is no default support for them, as they have the disadvantage of making the type larger than it needs to be (in other words, it becomes a pseudo-record with an added null boolean).
There are some generics-based solutions that try to implement nullable types, but I haven't used them, and they are not part of the standard distribution.

Fastest way to allocate a large string in Go?

I need to create a string in Go that is 1048577 characters (1MB + 1 byte). The content of the string is totally unimportant. Is there a way to allocate this directly without concatenating or using buffers?
Also, it's worth noting that the value of the string will not change. It's for a unit test to verify that strings that are too long will return an error.
Use strings.Builder to allocate a string without using extra buffers.
var b strings.Builder
b.Grow(1048577)
for i := 0; i < 1048577; i++ {
	b.WriteByte(0)
}
s := b.String()
The call to the Grow method allocates a slice with capacity 1048577. The WriteByte calls fill the slice to capacity. The String() method uses unsafe to convert that slice to a string.
The cost of the loop can be reduced by writing chunks of N bytes at a time and filling single bytes at the end.
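As a rough sketch of the chunked variant described above, the following fills the builder from a fixed-size zeroed array and then writes the remainder in one call. The function name makeFiller and the chunk size of 4096 are illustrative assumptions, not part of the original answer.

```go
package main

import (
	"fmt"
	"strings"
)

// makeFiller returns a string of exactly n zero bytes, written to a
// strings.Builder in chunks rather than one byte at a time.
// The name and the chunk size are illustrative choices.
func makeFiller(n int) string {
	const chunkSize = 4096
	var chunk [chunkSize]byte // zero-valued; the content does not matter here

	var b strings.Builder
	b.Grow(n) // reserve the full capacity up front
	for b.Len()+chunkSize <= n {
		b.Write(chunk[:])
	}
	// Write whatever remains (fewer than chunkSize bytes).
	b.Write(chunk[:n-b.Len()])
	return b.String()
}

func main() {
	s := makeFiller(1048577)
	fmt.Println(len(s))
}
```

Whether this is measurably faster than the byte-by-byte loop for a one-off test fixture is something to benchmark rather than assume.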
If you are not opposed to using the unsafe package, then use this:
p := make([]byte, 1048577)
s := *(*string)(unsafe.Pointer(&p))
If you are asking about how to do this with the simplest code, then use the following:
s := string(make([]byte, 1048577))
This approach does not meet the requirements set forth in the question. It uses an extra buffer instead of allocating the string directly.
I ended up using this:
string(make([]byte, 1048577))
https://play.golang.org/p/afPukPc1Esr

Skipping ahead n codepoints while iterating through a unicode string in Go

In Go, iterating over a string using
for i := 0; i < len(myString); i++ {
	doSomething(myString[i])
}
only accesses individual bytes in the string, whereas iterating over a string via
for i, c := range myString {
	doSomething(c)
}
iterates over individual Unicode codepoints (called runes in Go), which may span multiple bytes.
My question is: how does one go about jumping ahead while iterating over a string with range myString? continue can jump ahead by one Unicode codepoint, but it's not possible to just do i += 3, for instance, if you want to jump ahead three codepoints. So what would be the most idiomatic way to advance forward by n codepoints?
I asked this question on the golang nuts mailing list, and it was answered, courtesy of some of the helpful folks on the list. Someone messaged me however suggesting I create a self-answered question on Stack Overflow for this, to save the next person with the same issue some trouble. That's what this is.
I'd consider avoiding the conversion to []rune, and code this directly.
skip := 0
for _, c := range myString {
	if skip > 0 {
		skip--
		continue
	}
	skip = doSomething(c)
}
It looks inefficient to skip runes one by one like this, but it's the same amount of work as the conversion to []rune would be. The advantage of this code is that it avoids allocating the rune slice, which can be up to 4 times larger than the original string (a rune takes 4 bytes, while an ASCII character takes 1 byte in the string). Of course, converting to []rune is a bit simpler, so you may prefer that.
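If an explicit byte index is preferred over a range loop, another allocation-free option is to decode runes manually with the standard unicode/utf8 package. This is only a sketch of that alternative, not the approach from the answer above, and the helper name skipRunes is made up for illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// skipRunes returns the byte index reached after advancing n code points
// from byte index i in s. It stops at len(s) if the string ends first.
// (Hypothetical helper, named here for illustration only.)
func skipRunes(s string, i, n int) int {
	for ; n > 0 && i < len(s); n-- {
		_, size := utf8.DecodeRuneInString(s[i:])
		i += size
	}
	return i
}

func main() {
	s := "h\u00e9llo, \u4e16\u754c"
	i := skipRunes(s, 0, 2) // skip 'h' (1 byte) and U+00E9 (2 bytes)
	r, _ := utf8.DecodeRuneInString(s[i:])
	fmt.Printf("byte index %d, rune %c\n", i, r)
}
```

Like the skip counter, this still visits every skipped rune once, because UTF-8 is variable-width and the byte offset of the n-th code point cannot be computed directly.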
It turns out this can be done quite easily by converting the string into a slice of runes.
runes := []rune(myString)
for i := 0; i < len(runes); i++ {
	jumpHowFarAhead := doSomething(runes[i])
	i += jumpHowFarAhead
}

How to ignore fields with sscanf (%* is rejected)

I wish to ignore a particular field whilst processing a string with sscanf.
Man page for sscanf says
An optional '*' assignment-suppression character: scanf() reads input as directed by the conversion specification, but discards the input. No corresponding pointer argument is required, and this specification is not included in the count of successful assignments returned by scanf().
Attempting to use this in Golang, to ignore the 3rd field:
if c, err := fmt.Sscanf(str, " %s %d %*d %d ", &iface.Name, &iface.BTx, &iface.BytesRx); err != nil || c != 3 {
compiles OK, but at runtime err is set to:
bad verb %* for integer
Golang doco doesn't specifically mention the %* conversion specification, but it does say,
Package fmt implements formatted I/O with functions analogous to C's printf and scanf.
It doesn't indicate that %* is not implemented, so... Am I doing it wrong? Or has it just been quietly omitted? ...but then, why does it compile?
To the best of my knowledge there is no such verb (as the format specifiers are called in the fmt package) for this task. What you can do, however, is specify some verb and ignore its value. This is not particularly memory friendly, though. Ideally this would work:
fmt.Scan(&a, _, &b)
Sadly, it doesn't. So your next best option would be to declare the variables and ignore the one you don't want:
var a,b,c int
fmt.Scanf("%d %v %d", &a, &b, &c)
fmt.Println(a,c)
%v would read a space-separated token. Depending on what you're scanning, you may be able to fast forward the stream to the position you need to scan from. See this answer for details on seeking in buffers. If you're reading from stdin or you don't know how long your input may be, you seem to be out of luck here.
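Applied to the Sscanf call from the question, the throwaway-variable workaround might look like the sketch below. The function name parseCounters and the sample input line are assumptions made for illustration; the real input format is not shown in the question.

```go
package main

import "fmt"

// parseCounters reads a name and three integers from line, keeping the
// first and third integer and discarding the second.
// (Hypothetical helper; the input layout is assumed.)
func parseCounters(line string) (name string, tx, rx int, err error) {
	var ignored int // receives the field we do not care about
	_, err = fmt.Sscanf(line, "%s %d %d %d", &name, &tx, &ignored, &rx)
	return name, tx, rx, err
}

func main() {
	name, tx, rx, err := parseCounters("eth0 100 7 250")
	fmt.Println(name, tx, rx, err)
}
```

The cost is one extra local variable per ignored field, which is usually negligible next to the parsing itself.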
"It doesn't indicate that %* is not implemented, so... Am I doing it wrong? Or has it just been quietly omitted? ...but then, why does it compile?"
It compiles because, for the compiler, a format string is just a string like any other. The content of that string is evaluated at run time by functions of the fmt package. Some C compilers may check format strings for correctness, but this is a feature, not the norm. With Go, the go vet command will try to warn you about format string errors with mismatched arguments.
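A small sketch of that run-time behavior: the program below compiles without complaint, and the unsupported %*d only surfaces as a returned error once Sscanf interprets the format string. The function name scanWithStar is made up for this example, and the exact error text may differ between Go versions.

```go
package main

import "fmt"

// scanWithStar uses the C-style %*d in a Go format string. The compiler
// accepts it because the format is an ordinary string; fmt rejects the
// verb only when the scan runs.
// (Hypothetical helper, for illustration only.)
func scanWithStar(input string) (int, error) {
	var a, b int
	return fmt.Sscanf(input, "%d %*d %d", &a, &b)
}

func main() {
	n, err := scanWithStar("1 2 3")
	fmt.Println("items scanned:", n)
	fmt.Println("error:", err)
}
```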
Edit:
For the special case of needing to parse a row of integers and just caring for some of them, you
can use fmt.Scan in combination with a slice of integers. The following example reads 3 integers
from stdin and stores them in the slice named vals:
ints := make([]interface{}, 3)
vals := make([]int, len(ints))
for i := range ints {
	ints[i] = interface{}(&vals[i])
}
fmt.Scan(ints...)
fmt.Println(vals)
This is probably shorter than the conventional split/trim/strconv chain. It makes a slice of pointers
which each points to a value in vals. fmt.Scan then fills these pointers. With this you can even
ignore most of the values by assigning the same pointer over and over for the values you don't want:
ignored := 0
for i := range ints {
	if i == 0 || i == 2 {
		ints[i] = interface{}(&vals[i])
	} else {
		ints[i] = interface{}(&ignored)
	}
}
The example above would assign the address of ignored to all values except the first and the third, thus effectively ignoring them by overwriting.
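Put together as a self-contained program, the pointer-slice technique could look roughly like this. It uses fmt.Sscan on a string instead of fmt.Scan on stdin so that it can run without input, and the function name scanSelected and the keep mask are assumptions introduced for the example.

```go
package main

import "fmt"

// scanSelected reads len(keep) integers from input and returns them,
// leaving a zero in the result for every position where keep[i] is false.
// (Hypothetical helper; the keep mask is an illustrative design choice.)
func scanSelected(input string, keep []bool) ([]int, error) {
	vals := make([]int, len(keep))
	ptrs := make([]interface{}, len(keep))
	var ignored int // shared sink for unwanted fields
	for i := range ptrs {
		if keep[i] {
			ptrs[i] = &vals[i]
		} else {
			ptrs[i] = &ignored
		}
	}
	_, err := fmt.Sscan(input, ptrs...)
	return vals, err
}

func main() {
	vals, err := scanSelected("10 20 30", []bool{true, false, true})
	fmt.Println(vals, err)
}
```

The ignored fields must still parse as integers, so this only works when every field in the row has the same type.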

ResourceString VS Const for string literals

I have a couple of thousand string literals in a Delphi application. They have been isolated in a separate file and were used for localization in the past.
Now I don't need localization any more.
Is there any performance penalty in using resourcestring compared to plain constants?
Should I change them to const instead?
The const string makes a call to _UStrLAsg and the resource string ends up in LoadResString.
Since the question is about speed there is nothing like doing a test.
resourcestring
  str2 = 'str2';
const
  str1 = 'str1';

function ConstStr1: string;
begin
  result := str1;
end;

function ReceStr1: string;
begin
  result := str2;
end;

function ConstStr2: string;
begin
  result := str1;
end;

function ReceStr2: string;
begin
  result := str2;
end;

procedure Test;
var
  s1, s2, s3, s4: string;
begin
  s1 := ConstStr1;
  s2 := ReceStr1;
  s3 := ConstStr2;
  s4 := ReceStr2;
end;
For the first time I used AQTime, added in Delphi XE, to profile this code, and here is the result. The time column shows machine cycles.
I might have made a lot of rookie mistakes profiling this, but as I see it there is a difference between const and resourcestring. Whether the difference is noticeable to a user depends on what you do with the string. In a loop with many iterations it can matter, but when the string is only displayed to users, not so much.
Since they are stored in a single file which presumably does little else (well done!), there's no reason not to try it out. I predict it won't make any discernible difference to performance, but I guess it depends on what else you are doing in your app.
Resource strings do incur overhead.
Compared to displaying such a string, or writing it to a file or database, the overhead is not much.
On the other hand, it is just a switch from the resourcestring keyword to const (and back, if you ever consider doing localization again).
