Error with ISO-8859-1 encoding using TIdHTTP and Delphi 7

I have a big problem with accented characters in the result returned by the Post() method of TIdHTTP.
The URL I'm accessing is already encoded correctly; I saved the result to a text file on the server just to make sure it's all correct. But when I bring the data into Delphi through a function I created, the character "?" appears instead of the accented letters.
For example, if the page returns Conexão não configurada, my function returns Conex?o n?o configurada.
I've tried several solutions posted here on Stack Overflow, without success.
My function is as follows:
function HttpPost(PostUrl: string; PostParams: TStringList): string;
var
  IdHTTP1: TIdHTTP;
  IOHandler: TIdSSLIOHandlerSocketOpenSSL;
begin
  IdHTTP1 := TIdHTTP.Create(nil);
  IOHandler := TIdSSLIOHandlerSocketOpenSSL.Create(nil);
  try
    IdHTTP1.IOHandler := IOHandler;
    IdHTTP1.HandleRedirects := True;
    IdHTTP1.Request.ContentType := 'text/html';
    IdHTTP1.Request.CharSet := 'ISO-8859-1';
    IdHTTP1.Request.UserAgent := 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:12.0) Gecko/20100101 Firefox/12.0';
    IdHTTP1.ReadTimeout := 20000;
    try
      Result := IdHTTP1.Post(PostUrl, PostParams);
    except
      on E: Exception do
      begin
        Result := 'ErrorExcept';
        Msg(E, 2);
      end;
    end;
  finally
    // Free both components even if the setup code above raises
    IdHTTP1.Free;
    IOHandler.Free;
  end;
end;
I have updated Indy to version 10.6.2.0.

You are using an ANSI version of Delphi (Delphi switched to Unicode in 2009).
The version of TIdHTTP.Post() that returns a String decodes the raw server data to Unicode using the charset reported in the Content-Type response header, or a default if no charset is specified. So, make sure the data being sent is actually encoded in the correct charset, and that charset is being reported correctly.
In Unicode versions of Delphi, where String is an alias for UnicodeString, this Unicode data is returned as-is.
In ANSI versions of Delphi, where String is an alias for AnsiString, Post() converts this Unicode data to ANSI for output. The ? characters you are seeing mean the Unicode data has characters that do not exist in the ANSI charset being converted to. Post() has an optional ADestEncoding parameter to specify the desired ANSI charset for output. If not specified, Indy's default encoding is used. That default is controlled by the global GIdDefaultTextEncoding variable in the IdGlobal unit, which is set to encASCII (7bit US-ASCII) by default.
The output ANSI charset does not need to be the same as the charset used by the raw data. The point of ADestEncoding is to specify the charset that you want the output to be in.
If you know ahead of time the exact ANSI charset you want to use, you can set ADestEncoding to an IIdTextEncoding for that charset, such as from the CharsetToEncoding() function in the IdGlobalProtocols unit, or the IndyTextEncoding() function in the IdGlobal unit.
Or, to use the OS default charset of the machine your code is running on, set ADestEncoding to IndyTextEncoding_OSDefault (or set GIdDefaultTextEncoding to encOSDefault).
But note that Unicode-to-ANSI conversions are usually lossy, so it is better to use UTF-8 instead, which is lossless. You can set ADestEncoding to IndyTextEncoding_UTF8 (or set GIdDefaultTextEncoding to encUTF8).
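Here is a minimal sketch of the global-default option described above. It assumes Delphi 7 with Indy 10.6.2 and that the OpenSSL DLLs are available. HttpPostOSDefault and UseOSDefaultTextEncoding are hypothetical names, and the form field is a placeholder. The sketch sets GIdDefaultTextEncoding instead of passing ADestEncoding, so it does not depend on the exact parameter list of the Post() overload. Note that this global affects every Indy component in the process.

```pascal
program HttpPostDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes,
  IdGlobal,      // GIdDefaultTextEncoding, encOSDefault
  IdHTTP,        // TIdHTTP
  IdSSLOpenSSL;  // TIdSSLIOHandlerSocketOpenSSL

// Hypothetical helper: make Indy convert decoded text to the OS default ANSI
// code page instead of its 7-bit US-ASCII default (encASCII).
// This is a process-wide setting that affects every Indy component.
procedure UseOSDefaultTextEncoding;
begin
  GIdDefaultTextEncoding := encOSDefault;
end;

// Hypothetical variant of the HttpPost function from the question.
function HttpPostOSDefault(const PostUrl: string; PostParams: TStrings): string;
var
  IdHTTP1: TIdHTTP;
begin
  UseOSDefaultTextEncoding;
  IdHTTP1 := TIdHTTP.Create(nil);
  try
    // Owned by IdHTTP1, so it is freed together with it.
    IdHTTP1.IOHandler := TIdSSLIOHandlerSocketOpenSSL.Create(IdHTTP1);
    IdHTTP1.HandleRedirects := True;
    IdHTTP1.ReadTimeout := 20000;
    // Post() decodes the body using the charset reported by the server,
    // then converts the result to ANSI using Indy's default text encoding.
    Result := IdHTTP1.Post(PostUrl, PostParams);
  finally
    IdHTTP1.Free;
  end;
end;

var
  Params: TStringList;
begin
  if ParamCount < 1 then
    Writeln('Usage: HttpPostDemo <url>')
  else
  begin
    Params := TStringList.Create;
    try
      Params.Add('name=value'); // placeholder form field
      Writeln(HttpPostOSDefault(ParamStr(1), Params));
    finally
      Params.Free;
    end;
  end;
end.
```

The Windows console uses the OEM code page. As a result, accented characters can look wrong there even when the returned string is correct. Showing the result in a VCL control, or saving it to a file, is a more reliable check.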

Related

Inno Setup replace a string in a UTF-8 file without BOM

I need to change some values in a configuration file. The file is UTF-8 without BOM, and I need to save it the same way. How do I do it with the Unicode edition of Inno Setup? Note: This doesn't work, and this doesn't show how to read the file correctly.
const
  CP_UTF8 = 65001;
{ ... }
var
  FileName: string;
  S: string;
begin
  FileName := 'test.txt';
  if not LoadStringFromFileInCP(FileName, S, CP_UTF8) then
  begin
    Log('Error loading the file');
  end
  else if StringChangeEx(S, 'žluťoučký kůň', 'ďábelské ódy', True) <= 0 then
  begin
    Log('No value was replaced');
  end
  else if not SaveStringToFileInCP(FileName, S, CP_UTF8) then
  begin
    Log('Error writing the file');
  end
  else
  begin
    Log('Replacement successful');
  end;
end;
LoadStringFromFileInCP and SaveStringToFileInCP come from:
Inno Setup - Convert array of string to Unicode and back to ANSI
The code needs the Unicode version of Inno Setup (the only version since Inno Setup 6).
For Unicode string literals, your .iss file must be saved as UTF-8 with BOM.

Delphi Berlin 10.1 OS X app Decode cyrillic for writing to hardDevice

I have a Delphi application that I need to rewrite for OS X.
This app writes/reads data to/from an HID device.
I have issues when I try to write a string from the Mac.
Here is the string I'm writing (from the debugger on Windows): 'Новый комплекс 1'
and this works fine. Meanwhile, if I copy it from the debugger to somewhere else, it becomes 'Íîâûé êîìïëåêñ 1'. The device shows it as it was written, in Cyrillic, and that's OK.
When I try to repeat these steps on OS X, the device shows unreadable symbols. But if I hardcode 'Íîâûé êîìïëåêñ 1' from the Windows example, it's OK again.
Any hints?
How it works on Windows:
Some code:
s := 'Новый комплекс 1';
s := AnsiToUtf8(ReplaceNull(s));
Here is ReplaceNull:
function ReplaceNull(const Input: string): string;
var
  Index: Integer;
  Res: String;
begin
  Res := '';
  for Index := 1 to Length(Input) do
  begin
    if Input[Index] = #0 then
      Res := Res + #$12
    else
      Res := Res + Input[Index];
  end;
  ReplaceNull := Res;
end;
I put this string into a TStringList and then save it to a file:
ProgsList.SaveToFile(Mwork.pathLibs+'stream.ini', TEncoding.UTF8);
Another program reads this list and then writes it to the device:
Progs:= TStringList.Create();
Progs.LoadFromFile(****);
s:= UTF8ToAnsi(stringreplace(Progs.Strings[i], #$12, #0, [rfReplaceAll, rfIgnoreCase]));
Then it writes the string to the device.
So the line that gets written looks like this:
"'þ5'#0'ÿ'#$11'Новый комплекс 1'#0'T45/180;55;70;85;90;95;100;T45/180'#0'ÿ'"
On the Mac I successfully get the same string, but the device can't show it in Cyrillic.
A Delphi string is encoded in UTF-16 on all platforms. There is no need to convert it, unless you are interacting with non-Unicode data outside of your app.
That being said, if you have a byte array that is encoded in a particular charset, you can convert it to another charset using Delphi's TEncoding.Convert() method. You can use the TEncoding.GetEncoding() method to get a TEncoding object for a particular charset (other than the standard supported charsets ANSI, ASCII, UTF-7, UTF-8, and UTF-16, which have their own property getters in TEncoding).
var
  SrcEnc, DstEnc: TEncoding;
  SrcBytes, ConvertedBytes: TBytes;
begin
  SrcBytes := ...; // Cyrillic encoded bytes
  SrcEnc := TEncoding.GetEncoding('Cyrillic'); // or whatever the real name is...
  try
    DstEnc := TEncoding.GetEncoding('Windows-1251');
    try
      ConvertedBytes := TEncoding.Convert(SrcEnc, DstEnc, SrcBytes);
    finally
      DstEnc.Free;
    end;
  finally
    SrcEnc.Free;
  end;
  // use ConvertedBytes as needed...
end;
Update: To encode a Unicode string in a particular charset, simply call the TEncoding.GetBytes() method, eg:
s := 'Новый комплекс 1';
Enc := TEncoding.GetEncoding('Windows-1251');
try
  bytes := Enc.GetBytes(s);
finally
  Enc.Free;
end;
// or, to encode as UTF-8:
s := 'Новый комплекс 1';
bytes := TEncoding.UTF8.GetBytes(s);
You can use the TEncoding.GetString() to decode bytes in a particular charset back to a String, eg:
bytes := ...; // Windows-1251 encoded bytes
Enc := TEncoding.GetEncoding('Windows-1251');
try
  s := Enc.GetString(bytes);
finally
  Enc.Free;
end;
// or, for UTF-8:
bytes := ...; // UTF-8 encoded bytes
s := TEncoding.UTF8.GetString(bytes);
The answer turned out to be this: Delphi Berlin 10.1 uses KOI8-R, and my device uses cp1251.
Since I wanted to write Russian (Cyrillic) symbols, I created a table that maps symbols from KOI8-R to cp1251.
So I take the string in KOI8-R and convert it to cp1251.
Simple code:
Dict := TDictionary<String, String>.Create;
Dict.Add(#$439, #$E9); // 'й'
Dict.Add(#$44E, #$FE); // 'ю'
Dict.Add(#$430, #$E0); // 'а'
....
function tkoitocp.getCP1251Code(str: string): string;
var
  i: Integer;
  val: string;
begin
  Result := '';
  for i := 1 to Length(str) do
  begin
    // Look the character up without removing it from the dictionary
    if Dict.TryGetValue(str[i], val) then
      Result := Result + val
    else
      Result := Result + str[i];
  end;
end;
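For comparison, here is a sketch that has the RTL produce the same Windows-1251 bytes the lookup table builds by hand. It assumes Delphi XE2 or later and that TEncoding.GetEncoding(1251) is supported on the target platform. ToCp1251Bytes is a hypothetical name. The three sample characters are the ones from the table above, so the printed bytes can be checked against it.

```pascal
program Cp1251Demo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils; // TEncoding, TBytes, IntToHex

// Hypothetical helper: encode a Delphi (UTF-16) string as Windows-1251 bytes.
function ToCp1251Bytes(const S: string): TBytes;
var
  Enc: TEncoding;
begin
  Enc := TEncoding.GetEncoding(1251);
  try
    Result := Enc.GetBytes(S);
  finally
    Enc.Free;
  end;
end;

var
  B: TBytes;
  I: Integer;
begin
  // #$0439 = 'й', #$044E = 'ю', #$0430 = 'а' (escaped so the source file
  // encoding does not matter)
  B := ToCp1251Bytes(#$0439#$044E#$0430);
  for I := 0 to High(B) do
    Write(IntToHex(B[I], 2), ' ');
  Writeln; // Expected on Windows: E9 FE E0
end.
```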

Japan character encoding

I have the Japanese string 'ぱはめ'. I want to convert it into '%82%CF%82%CD%82%DF'. I hope someone can give me a function for this conversion.
You need to take the string and encode it in a specific code page. Then take each encoded byte and produce its hex representation. Like this:
function MyEncode(const S: string; const CodePage: Integer): string;
var
  Encoding: TEncoding;
  Bytes: TBytes;
  b: Byte;
  sb: TStringBuilder;
begin
  Encoding := TEncoding.GetEncoding(CodePage); // use the requested code page, e.g. 932
  try
    Bytes := Encoding.GetBytes(S);
  finally
    Encoding.Free;
  end;
  sb := TStringBuilder.Create;
  try
    for b in Bytes do
    begin
      // Emit each encoded byte as %XX
      sb.Append('%');
      sb.Append(IntToHex(b, 2));
    end;
    Result := sb.ToString;
  finally
    sb.Free;
  end;
end;
Although you have not stated this, you wish to encode the text as code page 932. So you should pass that value when calling the function.
Writeln(MyEncode('ぱはめ', 932));
I must say that, these days, it is somewhat surprising to see this Windows-specific multi-byte encoding still in use.
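Going the other way may also be useful. Below is a sketch of a hypothetical MyDecode inverse. It assumes Delphi XE2 or later, and input made up only of %XX triplets. It parses each hex pair into a byte, then lets TEncoding.GetString decode the bytes with the given code page.

```pascal
program SjisPercentDecode;

{$APPTYPE CONSOLE}

uses
  System.SysUtils; // TEncoding, TBytes, StrToInt, EConvertError

// Hypothetical inverse of MyEncode: '%XX%XX...' -> bytes -> string.
// Assumption: S contains only '%XX' triplets; trailing partial triplets are ignored.
function MyDecode(const S: string; const CodePage: Integer): string;
var
  Encoding: TEncoding;
  Bytes: TBytes;
  I, N: Integer;
begin
  SetLength(Bytes, Length(S) div 3);
  N := 0;
  I := 1;
  while I + 2 <= Length(S) do
  begin
    if S[I] <> '%' then
      raise EConvertError.CreateFmt('Expected %% at position %d', [I]);
    Bytes[N] := StrToInt('$' + Copy(S, I + 1, 2)); // two hex digits -> one byte
    Inc(N);
    Inc(I, 3);
  end;
  SetLength(Bytes, N);
  Encoding := TEncoding.GetEncoding(CodePage);
  try
    Result := Encoding.GetString(Bytes);
  finally
    Encoding.Free;
  end;
end;

begin
  // #$3071#$306F#$3081 = 'ぱはめ'
  if MyDecode('%82%CF%82%CD%82%DF', 932) = #$3071#$306F#$3081 then
    Writeln('round trip OK')
  else
    Writeln('mismatch');
end.
```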

Long strings in pascal

I want to be able to use a string that is quite long (no longer than 100,000 characters).
As far as I know, a typical string variable can contain only up to 256 characters.
Is there a way to store such a long string?
Old-style (Turbo Pascal, or Delphi 1) strings, now known as ShortString, are limited to 255 characters (byte 0 was reserved for the string length). This appears to still be the default in FreePascal (according to @MarcovandeVoort's comment below). Keep reading, though, until you get to the discussion and code sample for AnsiString below. :-)
Currently, most other dialects of Pascal I'm aware of default to either AnsiString (long strings of single-byte characters) or UnicodeString (long strings of multi-byte characters). Neither of those is limited to 255 characters.
Current versions of Delphi use UnicodeString as the default string type, so a variable declared as string is in fact a long UnicodeString. There is no practical upper limit to the string length:
var
  Test: string; // Declare a new Unicode string
begin
  SetLength(Test, 100000); // Initialize it to hold 100000 characters
  Test := StringOfChar('X', 100000); // Fill it with 100000 'X' characters
end;
If you want to force single-byte characters (but not be limited to 255-character strings), use AnsiString (which can be set as the default string type in FreePascal with the {$H+} compiler directive; thanks @MarcovandeVoort):
var
  Test: AnsiString; // Declare a new AnsiString
begin
  SetLength(Test, 100000); // Initialize it to hold 100000 characters
  Test := StringOfChar('X', 100000); // Fill it with 100000 'X' characters
end;
Finally, if for some reason you do want to use the old-style ShortString that is restricted to 255 characters, declare it as such, using either ShortString or the old-style String[Size] declaration:
var
  Test: ShortString;      // Declare a new short string of up to 255 characters
  ShortTest: String[100]; // Also a ShortString, of up to 100 characters
begin
  // This line compiles (possibly with an implicit string cast warning),
  // but the result is silently truncated to 255 characters
  Test := StringOfChar('X', 100000);
end;
In Free Pascal, you do not need to worry about this. You only need to insert the directive {$H+} at the beginning of the source code.
{$H+}
var s: String;
begin
s := StringOfChar('X', 1000);
writeln(s);
end.
You can use the AnsiString type.
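Here is a small console sketch that checks both limits at runtime. It assumes either a Unicode version of Delphi or Free Pascal, and TruncatedLength is a hypothetical helper name. It shows that a long string holds all 100,000 characters, while copying the same text into a ShortString keeps only the first 255.

```pascal
program StringLimits;

{$IFDEF FPC}
  {$MODE DELPHI} // also enables {$H+}, so "string" is a long string
{$ELSE}
  {$APPTYPE CONSOLE}
{$ENDIF}

// Hypothetical helper: copy S into a ShortString and report the resulting length.
function TruncatedLength(const S: string): Integer;
var
  Short: ShortString;
begin
  Short := S; // implicit conversion keeps at most 255 characters
  Result := Length(Short);
end;

var
  LongS: string;
begin
  LongS := StringOfChar('X', 100000);
  Writeln(Length(LongS));          // 100000
  Writeln(TruncatedLength(LongS)); // 255
end.
```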

String to byte array in UTF-8?

How do I convert a WideString (or other long string) to a byte array in UTF-8?
A function like this will do what you need:
function UTF8Bytes(const s: UTF8String): TBytes;
begin
  Assert(StringElementSize(s) = 1);
  SetLength(Result, Length(s));
  if Length(Result) > 0 then
    Move(s[1], Result[0], Length(s));
end;
You can call it with any type of string and the RTL will convert from the encoding of the string that is passed to UTF-8. So don't be tricked into thinking you must convert to UTF-8 before calling, just pass in any string and let the RTL do the work.
After that it's a fairly standard array copy. Note the assertion that explicitly calls out the assumption on string element size for a UTF-8 encoded string.
If you want a zero terminator, write it like this:
function UTF8Bytes(const s: UTF8String): TBytes;
begin
  Assert(StringElementSize(s) = 1);
  SetLength(Result, Length(s) + 1);
  // Result always has at least one element here, so test s instead
  if Length(s) > 0 then
    Move(s[1], Result[0], Length(s));
  Result[High(Result)] := 0;
end;
In Delphi 2009 and later, you can use TEncoding.UTF8.GetBytes from SysUtils.pas.
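Here is a minimal sketch of the TEncoding approach. It assumes Delphi XE2 or later because of the System.SysUtils unit name (in Delphi 2009 to XE, use SysUtils). It reuses the Romanian sample sentence from another answer here, written with character escapes so that the source file encoding does not matter.

```pascal
program Utf8BytesDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils; // TEncoding, TBytes, IntToHex

var
  S: string;
  B: TBytes;
  I: Integer;
begin
  // 'Șase sași în șase saci': #$0218 = 'Ș', #$0219 = 'ș', #$00EE = 'î'
  S := #$0218'ase sa'#$0219'i '#$00EE'n '#$0219'ase saci';
  B := TEncoding.UTF8.GetBytes(S); // no BOM and no zero terminator are added
  Writeln(Length(S)); // 22 UTF-16 code units
  Writeln(Length(B)); // 26 bytes: each of the 4 non-ASCII characters needs 2 bytes
  for I := 0 to High(B) do
    Write(IntToHex(B[I], 2), ' ');
  Writeln;
end.
```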
If you're using Delphi 2009 or later (the Unicode versions), converting a WideString to a UTF8String is a simple assignment statement:
var
  ws: WideString;
  u8s: UTF8String;
begin
  u8s := ws;
end;
The compiler will call the right library function to do the conversion because it knows that values of type UTF8String have a "code page" of CP_UTF8.
In Delphi 7 and later, you can use the provided library function Utf8Encode. For even earlier versions, you can get that function from other libraries, such as the JCL.
You can also write your own conversion function using the Windows API:
// needs: uses Windows (WideCharToMultiByte, CP_UTF8), SysUtils (Win32Check)
function CustomUtf8Encode(const ws: WideString): UTF8String;
var
  n: Integer;
begin
  Result := '';
  // WideCharToMultiByte fails for empty input, which Win32Check would report as an error
  if ws = '' then
    Exit;
  n := WideCharToMultiByte(CP_UTF8, 0, PWideChar(ws), Length(ws), nil, 0, nil, nil);
  Win32Check(n <> 0);
  SetLength(Result, n);
  n := WideCharToMultiByte(CP_UTF8, 0, PWideChar(ws), Length(ws), PAnsiChar(Result), n, nil, nil);
  Win32Check(n = Length(Result));
end;
A lot of the time, you can simply use a UTF8String as an array, but if you really need a byte array, you can use David's and Cosmin's functions. If you're writing your own character-conversion function, you can skip the UTF8String and go directly to a byte array; just change the return type to TBytes or array of Byte. (You may also wish to increase the length by one if you want the array to be null-terminated. SetLength does that implicitly for a string, but not for an array.)
If you have some other string type that's neither WideString, UnicodeString, nor UTF8String, then the way to convert it to UTF-8 is to first convert it to WideString or UnicodeString, and then convert that to UTF-8.
var
  S: UTF8String;
  B: TBytes;
begin
  S := 'Șase sași în șase saci';
  SetLength(B, Length(S)); // Length(S) = 26 for this 22-character string.
  CopyMemory(@B[0], @S[1], Length(S));
end.
Depending on what you need the bytes for, you might want to include a NULL terminator.
For production code, make sure you test for an empty string. Adding the 3-4 lines of code required would just make the sample harder to read.
I have the following two routines (source code can be downloaded here - http://www.csinnovations.com/framework_utilities.htm):
function CsiBytesToStr(const pInData: TByteDynArray; pStringEncoding: TECsiStringEncoding; pIncludesBom: Boolean): string;
function CsiStrToBytes(const pInStr: string; pStringEncoding: TECsiStringEncoding;
pIncludeBom: Boolean): TByteDynArray;
UTF-8 -> WideString:
http://www.freepascal.org/docs-html/rtl/system/utf8decode.html
the opposite (WideString -> UTF-8):
http://www.freepascal.org/docs-html/rtl/system/utf8encode.html
Note that assigning a WideString to an AnsiString on a pre-D2009 system (including current Free Pascal) converts it to the local ANSI encoding, garbling any characters that encoding cannot represent.
For the TBytes part, see the remark of Rob Kennedy above.
