Long strings in pascal - pascal

I want to be able to use a string that is quite long (not longer than 100000 characters).
As far as I know a typical string variable can contain only up to 256 chars.
Is there a way to store such a long string?

Old-style (Turbo Pascal, or Delphi 1) strings, now known as ShortString, are limited to 255 characters (byte 0 is reserved for the string length). This appears to still be the default in FreePascal (according to @MarcovandeVoort's comment below). Keep reading, though, until you get to the discussion and code sample for AnsiString below. :-)
Currently, most other dialects of Pascal I'm aware of default to either AnsiString (long strings of single-byte characters) or UnicodeString (long strings of multi-byte characters). Neither of those is limited to 255 characters.
Current versions of Delphi default to UnicodeString, so declaring a string variable in fact declares a long UnicodeString. There is no practical upper limit to the string length:
var
  Test: string; // Declare a new Unicode string
begin
  SetLength(Test, 100000); // Initialize it to hold 100000 characters
  Test := StringOfChar('X', 100000); // Fill it with 100000 'X' characters
end;
If you want to force single-byte characters (but not be limited to 255-character strings), use AnsiString (which can be set as the default string type in FreePascal if you use the {$H+} compiler directive - thanks @MarcovandeVoort):
var
  Test: AnsiString; // Declare a new AnsiString
begin
  SetLength(Test, 100000); // Initialize it to hold 100000 characters
  Test := StringOfChar('X', 100000); // Fill it with 100000 'X' characters
end;
Finally, if you do for some unknown reason want to use the old style ShortString that is restricted to 255 characters, declare it as such, either using ShortString or the old style String[Size] declaration:
var
  Test: ShortString; // Declare a new short string of 255 characters
  ShortTest: String[100]; // Also a ShortString, of 100 characters
begin
  // This assignment compiles (possibly with a warning), but the value
  // is truncated to the 255 characters that Test can hold
  Test := StringOfChar('X', 100000); // Only the first 255 'X' characters are kept
end;

In Free Pascal, you do not need to worry about this. You only need to insert the directive {$H+} at the beginning of the source code.
{$H+}
var s: String;
begin
s := StringOfChar('X', 1000);
writeln(s);
end.

You can use the AnsiString type.
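As a rough illustration of the difference between the two string kinds, here is a minimal sketch assuming Free Pascal with {$H+}. The program and variable names (LongStringDemo, LongText, ShortText) are made up for this example.

```pascal
program LongStringDemo;
{$mode objfpc}{$H+}  // {$H+} makes "string" mean AnsiString in Free Pascal

var
  LongText: string;        // AnsiString because of {$H+}
  ShortText: ShortString;  // old-style string, at most 255 characters
begin
  LongText := StringOfChar('X', 100000);
  ShortText := LongText;       // truncated to fit the ShortString
  WriteLn(Length(LongText));   // 100000
  WriteLn(Length(ShortText));  // 255
end.
```

The long string keeps all 100000 characters, while the ShortString copy is cut off at 255.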

Related

Binary file error in Lazarus Pascal with custom records - error SIGSEGV

I don't work with Pascal very often so I apologise if this question is basic. I am working on a binary file program that writes an array of custom made records to a binary file.
Eventually I want it to be able to write multiple arrays of different custom record types to one single binary file.
For that reason I thought I would write an integer first being the number of bytes that the next array will be in total. Then I write the array itself. I can then read the first integer type block - to tell me the size of the next blocks to read in directly to an array.
For example - when writing the binary file I would do something like this:
assignfile(f,MasterFileName);
{$I-}
reset(f,1);
{$I+}
n := IOResult;
if n<> 0 then
begin
{$I-}
rewrite(f);
{$I+}
end;
n:= IOResult;
If n <> 0 then
begin
writeln('Error creating file: ', n);
end
else
begin
SetLength(MyArray, 2);
MyArray[0].ID := 101;
MyArray[0].Att1 := 'Hi';
MyArray[0].Att2 := 'MyArray 0 - Att2';
MyArray[0].Value := 1;
MyArray[1].ID := 102;
MyArray[1].Att1:= 'Hi again';
MyArray[1].Att2:= 'MyArray 1 - Att2';
MyArray[1].Value:= 5;
SizeOfArray := sizeOf(MyArray);
writeln('Size of character array: ', SizeOfArray);
writeln('Size of integer var: ', sizeof(SizeOfArray));
blockwrite(f,sizeOfArray,sizeof(SizeOfArray),actual);
blockwrite(f,MyArray,SizeOfArray,actual);
Close(f);
Then you could re-read the file with something like this:
Assign(f, MasterFileName);
Reset(f,1);
blockread(f,SizeOfArray,sizeof(SizeOfArray),actual);
blockread(f,MyArray,SizeOfArray,actual);
Close(f);
This has the idea that after these blocks have been read that you can then have a new integer recorded and a new array then saved etc.
It reads the integer parts of the records in but nothing for the strings. The record would be something like this:
TMyType = record
ID : Integer;
att1 : string;
att2 : String;
Value : Integer;
end;
Any help gratefully received!!
TMyType = record
ID : Integer;
att1 : string; // <- your problem
That field att1 declared as string that way means that the record contains a pointer to the actual string data (att1 is really a pointer). The compiler manages this pointer and the memory for the associated data, and the string can be any (reasonable) length.
A quick fix for you would be to declare att1 as something like string[64]: a string which can be at most 64 chars long. That would eliminate the pointer and use the memory of the record (the att1 field itself, which now is a special static array) as the buffer for the string characters. Declaring the maximum length of the string can, of course, be slightly dangerous: if you try to assign it a string that is too long, the value will be truncated.
To be really complete: it depends on the compiler; some have a switch that makes your plain "string" declaration usable by treating it as an alias for "string[255]". This is not the default though. Consider also that using string[...] is faster but wastes memory.
You have a few mistakes.
MyArray is a dynamic array, a reference type (a pointer), so SizeOf(MyArray) is the size of a pointer, not the size of the array. To get the length of the array, use Length(MyArray).
But the bigger problem is saving long strings (AnsiStrings -- the usual type to which string maps --, WideStrings, UnicodeStrings). These are reference types too, so you can't just save them together with the record. You will have to save the parts of the record one by one, and for strings, you will have to use a function like:
procedure SaveStr(var F: File; const S: AnsiString);
var
Actual: Integer;
Len: Integer;
begin
Len := Length(S);
BlockWrite(F, Len, SizeOf(Len), Actual);
if Len > 0 then
begin
BlockWrite(F, S[1], Len * SizeOf(AnsiChar), Actual);
end;
end;
Of course you should normally check Actual and do appropriate error handling, but I left that out, for simplicity.
Reading back is similar: first read the length, then use SetLength to set the string to that size and then read the rest.
So now you do something like:
Len := Length(MyArray);
BlockWrite(F, Len, SizeOf(Len), Actual);
for I := Low(MyArray) to High(MyArray) do
begin
BlockWrite(F, MyArray[I].ID, SizeOf(Integer), Actual);
SaveStr(F, MyArray[I].att1);
SaveStr(F, MyArray[I].att2);
BlockWrite(F, MyArray[I].Value, SizeOf(Integer), Actual);
end;
// etc...
Note that I can't currently test the code, so it may have some little errors. I'll try this later on, when I have access to a compiler, if that is necessary.
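For the reading side described above (read the length, call SetLength, then read the characters), a minimal round-trip sketch might look like the following. It assumes Free Pascal. LoadStr and the file name strings.bin are hypothetical names chosen for this example, LongInt is used so the length field is 32 bits regardless of compiler mode, and error handling on Actual is again left out.

```pascal
program SaveLoadDemo;
{$mode objfpc}{$H+}

// Write a 32-bit length prefix followed by the raw characters.
procedure SaveStr(var F: File; const S: AnsiString);
var
  Actual, Len: LongInt;
begin
  Len := Length(S);
  BlockWrite(F, Len, SizeOf(Len), Actual);
  if Len > 0 then
    BlockWrite(F, S[1], Len, Actual);
end;

// Hypothetical counterpart: read the length, size the string, read the data.
procedure LoadStr(var F: File; out S: AnsiString);
var
  Actual, Len: LongInt;
begin
  S := '';
  BlockRead(F, Len, SizeOf(Len), Actual);
  SetLength(S, Len);
  if Len > 0 then
    BlockRead(F, S[1], Len, Actual);
end;

var
  F: File;
  S: AnsiString;
begin
  Assign(F, 'strings.bin');   // example file name
  Rewrite(F, 1);              // record size 1, so counts are in bytes
  SaveStr(F, 'MyArray 0 - Att2');
  Close(F);

  Reset(F, 1);
  LoadStr(F, S);
  Close(F);
  WriteLn(S);                 // MyArray 0 - Att2
end.
```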
Update
As Marco van de Voort commented, you may have to do:
rewrite(f, 1);
instead of a simple
rewrite(f);
But as I replied to him, if you can, use streams. They are easier to use (IMO) and provide a more consistent interface, no matter to what exactly you try to write or read. There are streams for many different kinds of I/O, and all derive from (and are thus compatible with) the same basic abstract TStream class.
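A rough sketch of the stream-based variant, assuming Free Pascal and the Classes unit. StreamWriteStr and StreamReadStr are hypothetical helper names, and a TMemoryStream stands in for a TFileStream so the example needs no file on disk. Because both derive from TStream, the helpers would work unchanged with either.

```pascal
program StreamDemo;
{$mode objfpc}{$H+}

uses
  Classes;

// Write a 32-bit length prefix followed by the characters.
procedure StreamWriteStr(Stream: TStream; const S: AnsiString);
var
  Len: LongInt;
begin
  Len := Length(S);
  Stream.WriteBuffer(Len, SizeOf(Len));
  if Len > 0 then
    Stream.WriteBuffer(S[1], Len);
end;

// Read the length prefix, size the result, then read the characters.
function StreamReadStr(Stream: TStream): AnsiString;
var
  Len: LongInt;
begin
  Result := '';
  Stream.ReadBuffer(Len, SizeOf(Len));
  SetLength(Result, Len);
  if Len > 0 then
    Stream.ReadBuffer(Result[1], Len);
end;

var
  Stream: TMemoryStream;
begin
  Stream := TMemoryStream.Create;
  try
    StreamWriteStr(Stream, 'Hi again');
    Stream.Position := 0;              // rewind before reading back
    WriteLn(StreamReadStr(Stream));    // Hi again
  finally
    Stream.Free;
  end;
end.
```

WriteBuffer and ReadBuffer raise an exception when fewer bytes than requested are transferred, which removes the need to check a separate "actual" count.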

Japan character encoding

I have Japanese string of 'ぱはめ'. I want to convert it into '%82%CF%82%CD%82%DF'. I hope someone will give me a function for this converting.
You need to take the string and encode it in a specific code page. Then take each encoded byte and produce its hex representation. Like this:
function MyEncode(const S: string; const CodePage: Integer): string;
var
Encoding: TEncoding;
Bytes: TBytes;
b: Byte;
sb: TStringBuilder;
begin
Encoding := TEncoding.GetEncoding(CodePage); // use the code page passed by the caller
try
Bytes := Encoding.GetBytes(S);
finally
Encoding.Free;
end;
sb := TStringBuilder.Create;
try
for b in Bytes do begin
sb.Append('%');
sb.Append(IntToHex(b, 2));
end;
Result := sb.ToString;
finally
sb.Free;
end;
end;
Although you have not stated this, you wish to encode the text as code page 932. So you should pass that value when calling the function.
Writeln(MyEncode('ぱはめ', 932));
I must say that in the modern day, it is somewhat surprising to see this Windows specific multi byte encoding still in use.

Delphi 7 WriteProcessMemory

This is my working code:
DriftMul:=99;
WriteProcessMemory(HandleWindow, ptr($4E709C), @DriftMul, 2, Write);
I want to convert it so that it does not use a variable, but it won't work.
Below is just an example of what I want to do.
WriteProcessMemory(HandleWindow, ptr($4E709C), ptr(99), 2, Write);
Does anyone know a way to make this work without using a variable?
I can program in a few languages, and in every language I use there is a
way to do this. The reason I want to do this is that I am going to be making a big program that does a lot of writing of different values, and it will save me around 300+ lines. Below is an example in C++ I was using.
WriteProcessMemory(hProcess, (void*)0x4E709C, (void*)(PBYTE)"\x20", 1, NULL);
Update:
Solved it
I'm using 4 procedures that I call depending on how many bytes I want to write.
procedure Wpm(Address: Cardinal; ChangeValues: Byte);
begin
  WriteProcessMemory(HandleWindow, Pointer(Address), @ChangeValues, 1, Write);
end;
procedure Wpm2(Address: Cardinal; ChangeValues: Word);
begin
  WriteProcessMemory(HandleWindow, Pointer(Address), @ChangeValues, 2, Write);
end;
// Cardinal rather than Word: a 3-byte value does not fit in a 2-byte Word
procedure Wpm3(Address: Cardinal; ChangeValues: Cardinal);
begin
  WriteProcessMemory(HandleWindow, Pointer(Address), @ChangeValues, 3, Write);
end;
procedure Wpm4(Address: Cardinal; ChangeValues: Cardinal);
begin
  WriteProcessMemory(HandleWindow, Pointer(Address), @ChangeValues, 4, Write);
end;
Example writes
Wpm($477343,$EB);
Wpm2($40A889,$37EB);
Wpm3($416E34,$0086E9);
PChar is the only method I found that compiles without procedures, but I don't want to use ASCII.
WriteProcessMemory(HandleWindow, Pointer($449A17), PChar('90'), 1, Write);
You have to store the contents of the word that you are writing somewhere. WriteProcessMemory expects a pointer to some memory in your process space. If you don't want to use a variable, use a constant.
const
DriftMul: word=99;
....
WriteProcessMemory(HandleWindow, ptr($4E709C), @DriftMul, 2, Write);
Passing ptr(99) fails because ptr(99) is not a pointer to a word containing the value 99. It is a pointer to address 99. I think you were trying to write @Word(99), but you cannot take the address of a true constant.
You can make this more convenient by wrapping up the call to WriteProcessMemory in a helper method. Although your question suggests that you want to write Word values, it became apparent in our lengthy chat that you actually want to write byte sequences. Writing integer data types will lead to machine endianness confusion. So instead I would do it using an open array of Byte to give the flexibility at the call site.
procedure WriteBytes(hProcess: THandle; Address: Pointer;
  const Buffer: array of Byte);
var
  NumberOfBytesWritten: DWORD;
begin
  if not WriteProcessMemory(hProcess, Address, @Buffer[0], Length(Buffer),
    NumberOfBytesWritten) then RaiseLastOSError;
end;
You can then call the code
WriteBytes(Handle, Pointer($523328), [$42]);//single byte
WriteBytes(Handle, Pointer($523328), [$CC, $90, $03]);//3 bytes
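To make the endianness point concrete, here is a small sketch, assuming Free Pascal on a little-endian machine such as x86. The program name is made up. It inspects how a Word value is laid out in memory, which is the byte order WriteProcessMemory would copy if you passed the address of that Word.

```pascal
program EndianDemo;
{$mode objfpc}{$H+}

uses
  SysUtils;

var
  W: Word;
  P: PByte;
begin
  W := $37EB;
  P := @W;  // look at the Word as raw bytes
  // On a little-endian machine the low byte comes first: EB 37
  WriteLn(IntToHex(P[0], 2), ' ', IntToHex(P[1], 2));
end.
```

So writing the Word $37EB places the bytes EB 37 in the target, which is easy to get backwards. Spelling the bytes out as [$EB, $37] in an open array avoids that ambiguity.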
In C++, this code:
WriteProcessMemory(hProcess, (void*)0x4E709C, (void*)(PBYTE)"\x20", 1, NULL);
Is declaring a const char[] buffer in the app's memory that contains the two characters '\x20' and '\x00' in it. This is evident by the use of the " double-quote characters around the literal. They are creating a string literal, not a character literal (which uses ' single-quote character instead). The starting address of that literal's first character is being passed to the third parameter and the fourth parameter is set to 1 to tell WriteProcessMemory() to copy only 1 byte from that 2-byte buffer.
Delphi, on the other hand, uses the ' single-quote character around both single-character and string literals, and thus relies on code context to decide which type of literal needs to be created. As such, Delphi does not have a direct means of declaring a single-character literal that is the equivalent of an inlined char[] like in the C++ code. The closest equivalent I can think of right now, without declaring a constant, would be something like this:
WriteProcessMemory(hProcess, Pointer($4E709C), PAnsiChar(AnsiString(' ')), 1, nil);
Otherwise, just use an explicit constant instead. The direct equivalent of what the C++ code is doing is the following:
const
buffer: array[0..1] of AnsiChar = (#$20, #0);
WriteProcessMemory(hProcess, Pointer($4E709C), Pointer(PByte(@buffer[0])), 1, nil);
Alternatively, you can simplify it to the following:
const
space: Byte = $20;
WriteProcessMemory(hProcess, Pointer($4E709C), @space, 1, nil);
The ptr() function converts an address to a pointer. So what the second call passes is not the value 99 but whatever is stored at address 99.
My dirty method, but with few lines of code:
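procedure WriteBytes(hProcess: THandle; address: Pointer; buffer: Cardinal; count: Integer);
var
  Written: DWORD;
begin
  // buffer is a by-value Cardinal (a Variant would expose its type tag, not the value),
  // so @buffer points at its bytes in little-endian order; count must be 1..4
  WriteProcessMemory(hProcess, address, @buffer, count, Written);
end;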
Then you can call the method with:
WriteBytes(HandleWindow, Pointer($449A17), 90, 1);

String to byte array in UTF-8?

How to convert a WideString (or other long string) to byte array in UTF-8?
A function like this will do what you need:
function UTF8Bytes(const s: UTF8String): TBytes;
begin
Assert(StringElementSize(s)=1);
SetLength(Result, Length(s));
if Length(Result)>0 then
Move(s[1], Result[0], Length(s));
end;
You can call it with any type of string and the RTL will convert from the encoding of the string that is passed to UTF-8. So don't be tricked into thinking you must convert to UTF-8 before calling, just pass in any string and let the RTL do the work.
After that it's a fairly standard array copy. Note the assertion that explicitly calls out the assumption on string element size for a UTF-8 encoded string.
If you want to get the zero-terminator you would write it so:
function UTF8Bytes(const s: UTF8String): TBytes;
begin
Assert(StringElementSize(s)=1);
SetLength(Result, Length(s)+1);
if Length(s)>0 then // Result always has at least the terminator, so test s instead
Move(s[1], Result[0], Length(s));
Result[high(Result)] := 0;
end;
You can use TEncoding.UTF8.GetBytes in SysUtils.pas.
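A minimal sketch of that call, assuming an RTL that provides TEncoding in SysUtils (Delphi 2009 or later, or Free Pascal 3.x, which this example targets). The program name and the sample text are made up. U+00E9 is encoded in UTF-8 as the two bytes C3 A9, so a correct conversion of this 4-character string yields 5 bytes.

```pascal
program Utf8BytesDemo;
{$mode objfpc}{$H+}

uses
  SysUtils;

var
  U: UnicodeString;
  Bytes: TBytes;
  i: Integer;
begin
  U := 'caf';
  U := U + WideChar($00E9);  // append U+00E9 without relying on source file encoding
  Bytes := TEncoding.UTF8.GetBytes(U);
  // Print each resulting byte in hex
  for i := 0 to High(Bytes) do
    Write(IntToHex(Bytes[i], 2), ' ');
  WriteLn;
end.
```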
If you're using Delphi 2009 or later (the Unicode versions), converting a WideString to a UTF8String is a simple assignment statement:
var
ws: WideString;
u8s: UTF8String;
u8s := ws;
The compiler will call the right library function to do the conversion because it knows that values of type UTF8String have a "code page" of CP_UTF8.
In Delphi 7 and later, you can use the provided library function Utf8Encode. For even earlier versions, you can get that function from other libraries, such as the JCL.
You can also write your own conversion function using the Windows API:
function CustomUtf8Encode(const ws: WideString): UTF8String;
var
n: Integer;
begin
n := WideCharToMultiByte(cp_UTF8, 0, PWideChar(ws), Length(ws), nil, 0, nil, nil);
Win32Check(n <> 0);
SetLength(Result, n);
n := WideCharToMultiByte(cp_UTF8, 0, PWideChar(ws), Length(ws), PAnsiChar(Result), n, nil, nil);
Win32Check(n = Length(Result));
end;
A lot of the time, you can simply use a UTF8String as an array, but if you really need a byte array, you can use David's and Cosmin's functions. If you're writing your own character-conversion function, you can skip the UTF8String and go directly to a byte array; just change the return type to TBytes or array of Byte. (You may also wish to increase the length by one, if you want the array to be null-terminated. SetLength will do that to the string implicitly, but not to an array.)
If you have some other string type that's neither WideString, UnicodeString, nor UTF8String, then the way to convert it to UTF-8 is to first convert it to WideString or UnicodeString, and then convert that to UTF-8.
var S: UTF8String;
B: TBytes;
begin
S := 'Șase sași în șase saci';
SetLength(B, Length(S)); // Length(s) = 26 for this 22 char string.
CopyMemory(@B[0], @S[1], Length(S));
end.
Depending on what you need the bytes for, you might want to include a NULL terminator.
For production code make sure you test for empty string. Adding the 3-4 LOC required would just make the sample harder to read.
I have the following two routines (source code can be downloaded here - http://www.csinnovations.com/framework_utilities.htm):
function CsiBytesToStr(const pInData: TByteDynArray; pStringEncoding: TECsiStringEncoding; pIncludesBom: Boolean): string;
function CsiStrToBytes(const pInStr: string; pStringEncoding: TECsiStringEncoding;
pIncludeBom: Boolean): TByteDynArray;
WideString -> UTF-8:
http://www.freepascal.org/docs-html/rtl/system/utf8encode.html
The opposite:
http://www.freepascal.org/docs-html/rtl/system/utf8decode.html
Note that assigning a widestring to an ansistring in a pre D2009 system (including current Free Pascal) will convert to the local ansi encoding, garbling characters.
For the TBytes part, see the remark of Rob Kennedy above.

Help in Pascal writing a word counter

I have to write a program in Pascal which has to detect how many words in a text (input by the user) start with a certain letter. I can't use arrays, can you give me any hints as to where to start?
If you know which letter, you merely need to keep a counter, no need for arrays.
If you don't know which letter, keep 26 counters. Stupid, but works as per your spec.
First thing to do is define the set of characters that constitute letters, or conversely which ones constitute non-letters.
Write a function that takes a character and returns a boolean based on whether that character is a letter. Then loop through the string and call it for each character. When you detect a letter right after a non-letter or at the start of the string, increment your counter if it is the target letter.
Count instances of SPACE LETTER, plus the first word if it matches.
(S) is your input string.
Create a for loop that goes from 1 to the length of (S) - 1.
Inside the loop, check if (S)[i] = ' ' and (S)[i+1] = 't', where i is the loop counter and 't' is the letter starting the words you want to count.
If the criteria in the previous step match, increment a counter.
Note the minus one on the loop size.
Also, remember that the very first letter of the string may be the one you want to match and that will not get picked up by the loop defined above.
If you need to make your code smarter in that it can locate a specific letter rather than a hardcoded 't' then you can pass the requested character as a parameter to the function/procedure that your loop is in.
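The steps above can be sketched roughly as follows, assuming Free Pascal. CountWordsStartingWith and the sample sentence are names and data invented for this example, and only single spaces are treated as word separators.

```pascal
program CountWords;
{$mode objfpc}{$H+}

function CountWordsStartingWith(const S: string; C: Char): Integer;
var
  i: Integer;
begin
  Result := 0;
  // The first word has no space in front of it, so check it separately
  if (Length(S) > 0) and (S[1] = C) then
    Inc(Result);
  // Every other word start is a space followed by a letter
  for i := 1 to Length(S) - 1 do
    if (S[i] = ' ') and (S[i + 1] = C) then
      Inc(Result);
end;

begin
  WriteLn(CountWordsStartingWith('the tall tree is there', 't'));  // 4
end.
```

The comparison is case-sensitive, so 'T' and 't' are counted separately unless you normalise the case first.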
Off the top of my head - not tested
function WordCount(const S: string; const C: Char): Integer;
const
  ValidChars: set of Char = ['A'..'Z', 'a'..'z']; // Alter for appropriate language
var
  i: Integer;
  t: string;
begin
  Result := 0;
  if Length(S) <> 0 then
  begin
    t := Trim(S); // lose any leading and trailing spaces (Trim is in SysUtils)
    t := t + ' '; // make sure a space is the last char
    repeat
      if (t[1] in ValidChars) and (t[1] = C) then
        Inc(Result);
      i := Pos(' ', t);
      t := Copy(t, i + 1, Length(t)); // drop the word just examined
    until Length(t) = 0;
  end;
end;
Why would you need an array or a case statement?
