Japan character encoding

Japan character encoding - windows

I have Japanese string of 'ぱはめ'. I want to convert it into '%82%CF%82%CD%82%DF'. I hope someone will give me a function for this converting.

You need to take the string and encode it in a specific code page. Then take each encoded byte and produce its hex representation. Like this:
function MyEncode(const S: string; const CodePage: Integer): string;
var
Encoding: TEncoding;
Bytes: TBytes;
b: Byte;
sb: TStringBuilder;
begin
Encoding := TEncoding.GetEncoding(932);
try
Bytes := Encoding.GetBytes(S);
finally
Encoding.Free;
end;
sb := TStringBuilder.Create;
try
for b in Bytes do begin
sb.Append('%');
sb.Append(IntToHex(b, 2));
end;
Result := sb.ToString;
finally
sb.Free;
end;
end;
Although you have not stated this, you wish to encode the text as code page 932. So you should pass that value when calling the function.
Writeln(MyEncode('ぱはめ', 932));
I must say that in the modern day, it is somewhat surprising to see this Windows specific multi byte encoding still in use.

Related

Delphi Function that takes Integer and returns a not-so-easily decoded Integer

Can anyone share what are some common Delphi examples of a function that takes a number
and returns a number that is not so obvious?
For example :
function GetNumber(const aSeed: Integer): Integer;
begin
Result := ((aSeed+5) * aSeed) + 15;
end;
So let's say the user knows that sending aSeed = 21 gives 561
and aSeed = 2, gives 29
and so on...
is there a function that makes it hard to reverse engineer the code,
even if one can generate a large number sets of Seed/Result ?
(hard : I do not mean impossible, just need to be non-trivial)
Preferably a function that does not allow in function result exceeding the
Integer result as well.
In any case, if you are not sure whether it's hard/impossible to reverse,
do feel free to share what you have.
some other requirements:
the same input always results in the same output; cannot have Random output
the same output regardless of platform: windows/android/mac/ios
won't result in some extraordinary big number (fit in Integer)

Using a hash is a very good way to achieve what you want. Here is an example that takes an integer, converts it to a string, appends it to a salt, computes the MD5 and returns the integer corresponding to the first 4 bytes:
uses
System.Hash;
function GetHash(const s: string): TBytes;
var
MD5: THashMD5;
begin
MD5 := THashMD5.Create;
MD5.Update(TEncoding.UTF8.GetBytes(s));
Result := MD5.HashAsBytes;
end;
function GetNumber(Input: Integer): Integer;
var
Hash: TBytes;
p: ^Integer;
begin
Hash := GetHash('secret' + IntToStr(Input));
P := #Hash[0];
Result := Abs(P^);
end;
procedure TForm1.Button1Click(Sender: TObject);
begin
ShowMessage(IntToStr(GetNumber(1))); // 996659739
ShowMessage(IntToStr(GetNumber(2))); // 939216101
ShowMessage(IntToStr(GetNumber(3))); // 175456750
end;

Binary file error in Lazarus Pascal with custom records - error SIGSEGV

I don't work with Pascal very often so I apologise if this question is basic. I am working on a binary file program that writes an array of custom made records to a binary file.
Eventually I want it to be able to write multiple arrays of different custom record types to one single binary file.
For that reason I thought I would write an integer first being the number of bytes that the next array will be in total. Then I write the array itself. I can then read the first integer type block - to tell me the size of the next blocks to read in directly to an array.
For example - when writing the binary file I would do something like this:
assignfile(f,MasterFileName);
{$I-}
reset(f,1);
{$I+}
n := IOResult;
if n<> 0 then
begin
{$I-}
rewrite(f);
{$I+}
end;
n:= IOResult;
If n <> 0 then
begin
writeln('Error creating file: ', n);
end
else
begin
SetLength(MyArray, 2);
MyArray[0].ID := 101;
MyArray[0].Att1 := 'Hi';
MyArray[0].Att2 := 'MyArray 0 - Att2';
MyArray[0].Value := 1;
MyArray[1].ID := 102;
MyArray[1].Att1:= 'Hi again';
MyArray[1].Att2:= MyArray 1 - Att2';
MyArray[1].Value:= 5;
SizeOfArray := sizeOf(MyArray);
writeln('Size of character array: ', SizeOfArray);
writeln('Size of integer var: ', sizeof(SizeOfArray));
blockwrite(f,sizeOfArray,sizeof(SizeOfArray),actual);
blockwrite(f,MyArray,SizeOfArray,actual);
Close(f);
Then you could re-read the file with something like this:
Assign(f, MasterFileName);
Reset(f,1);
blockread(f,SizeOfArray,sizeof(SizeOfArray),actual);
blockread(f,MyArray,SizeOfArray,actual);
Close(f);
This has the idea that after these blocks have been read that you can then have a new integer recorded and a new array then saved etc.
It reads the integer parts of the records in but nothing for the strings. The record would be something like this:
TMyType = record
ID : Integer;
att1 : string;
att2 : String;
Value : Integer;
end;
Any help gratefully received!!

TMyType = record
ID : Integer;
att1 : string; // <- your problem
That field att1 declared as string that way means that the record contains a pointer to the actual string data (att1 is really a pointer). The compiler manages this pointer and the memory for the associated data, and the string can be any (reasonable) length.
A quick fix for you would be to declare att1 something like string[64], for example: a string which can be at maximum 64 chars long. That would eliminate the pointer and use the memory of the record (the att1 field itself, which now is a special static array) as buffer for string characters. Declaring the maximum length of the string, of course, can be slightly dangerous: if you try to assign the string a string too long, it will be truncated.
To be really complete: it depends on the compiler; some have a switch to make your declaration "string" usable, making it an alias for "string[255]". This is not the default though. Consider also that using string[...] is faster and wastes memory.

You have a few mistakes.
MyArray is a dynamic array, a reference type (a pointer), so SizeOf(MyArray) is the size of a pointer, not the size of the array. To get the length of the array, use Length(MyArray).
But the bigger problem is saving long strings (AnsiStrings -- the usual type to which string maps --, WideStrings, UnicodeStrings). These are reference types too, so you can't just save them together with the record. You will have to save the parts of the record one by one, and for strings, you will have to use a function like:
procedure SaveStr(var F: File; const S: AnsiString);
var
Actual: Integer;
Len: Integer;
begin
Len := Length(S);
BlockWrite(F, Len, SizeOf(Len), Actual);
if Len > 0 then
begin
BlockWrite(F, S[1], Len * SizeOf(AnsiChar), Actual);
end;
end;
Of course you should normally check Actual and do appropriate error handling, but I left that out, for simplicity.
Reading back is similar: first read the length, then use SetLength to set the string to that size and then read the rest.
So now you do something like:
Len := Length(MyArray);
BlockWrite(F, Len, SizeOf(Len), Actual);
for I := Low(MyArray) to High(MyArray) do
begin
BlockWrite(F, MyArray[I].ID, SizeOf(Integer), Actual);
SaveStr(F, MyArray[I].att1);
SaveStr(F, MyArray[I].att2);
BlockWrite(F, MyArray[I].Value, SizeOf(Integer), Actual);
end;
// etc...
Note that I can't currently test the code, so it may have some little errors. I'll try this later on, when I have access to a compiler, if that is necessary.
Update
As Marco van de Voort commented, you may have to do:
rewrite(f, 1);
instead of a simple
rewrite(f);
But as I replied to him, if you can, use streams. They are easier to use (IMO) and provide a more consistent interface, no matter to what exactly you try to write or read. There are streams for many different kinds of I/O, and all derive from (and are thus compatible with) the same basic abstract TStream class.

Delphi Berlin 10.1 OS X app Decode cyrillic for writing to hardDevice

I have delphi application, i need to rewrite it for OS X.
This app writes/reads data to/from HID-device.
I have issues when i'm trying to write string from mac.
Here is the line that i'm writing(from debugger on windows): 'Новый комплекс 1'
and this works good. Meanwhile if copy this from debugger to somewhere it becomes 'Íîâûé êîìïëåêñ 1'. Device shows it as it was written, in cyrillic. And that's OK.
When i'm trying to repeat this steps on OS X, device shows unreadeble symbols. But if i do hardcode 'Íîâûé êîìïëåêñ 1' from windows example it's OK again.
Give some hints.
How it on windows
Some code:
s:= 'Новый комлекс 1'
s:= AnsiToUtf8(ReplaceNull(s));
Here is ReplaceNULL:
function ReplaceNull(const Input: string): string;
var
Index: Integer;
Res: String;
begin
Res:= '';
for Index := 1 to Length(Input) do
begin
if Input[Index] = #0 then
Res:= Res + #$12
else
Res:= Res + Input[Index];
end;
ReplaceNull:= Res;
end;
this string i put to Tstringlist and then save to file:
ProgsList.SaveToFile(Mwork.pathLibs+'stream.ini', TEncoding.UTF8);
Other program read this list and then writes to device:
Progs:= TStringList.Create();
Progs.LoadFromFile(****);
s:= UTF8ToAnsi(stringreplace(Progs.Strings[i], #$12, #0, [rfReplaceAll, rfIgnoreCase]));
And then write it to device.
So the line wich writes seems like this:
"'þ5'#0'ÿ'#$11'Новый комплекс 1'#0'T45/180;55;70;85;90;95;100;T45/180'#0'ÿ'"
On the mac i succesfully get the same string. But device can't show this in cyrillic.

A Delphi string is encoded in UTF-16 on all platforms. There is no need to convert it, unless you are interacting with non-Unicode data outside of your app.
That being said, if you have a byte array that is encoded in a particular charset, you can convert it to another charset using Delphi's TEncoding.Convert() method. You can use the TEncoding.GetEncoding() method to get a TEncoding object for a particular charset (if different than the standard supported charsets - ANSI, ASCII, UTF-7, UTF-8, and UTF-16 - which have their own property getters in TEncoding).
var
SrcEnc, DstEnc: TEncoding;
SrcBytes, ConvertedBytes: TBytes;
begin
SrcBytes := ...; // Cyrillic encoded bytes
SrcEnc := TEncoding.GetEncoding('Cyrillic'); // or whatever the real name is...
try
DstEnc := TEncoding.GetEncoding('Windows-1251');
try
ConvertedBytes := TEncoding.Convert(SrcEnc, DstEnc, SrcBytes);
finally
DstEnc.Free;
end;
finally
SrcEnc.Free;
end;
// use ConvertedBytes as needed...
end;
Update: To encode a Unicode string in a particular charset, simply call the TEncoding.GetBytes() method, eg:
s := 'Новый комлекс 1';
Enc := TEncoding.GetEncoding('Windows-1251');
try
bytes := Enc.GetBytes(s);
finally
Enc.Free;
end;
s := 'Новый комлекс 1';
bytes := TEncoding.UTF8.GetBytes(s);
You can use the TEncoding.GetString() to decode bytes in a particular charset back to a String, eg:
bytes := ...; // Windows-1251 encoded bytes
Enc := TEncoding.GetEncoding('Windows-1251');
try
s := Enc.GetString(bytes);
finally
Enc.Free;
end;
bytes := ...; // UTF-8 encoded bytes
s := TEncoding.UTF8.GetString(bytes);

The answer was next. Delphi Berlin 10.1 uses KOI8-R, and my device - cp1251.
As i'd wanted to write russian symbols(Cyrillic) i've created table of matches for symbols from KOI8-R and cp1251.
So, i take string in KOI8-R make it in cp1251.
Simple code:
Dict:=TDictionary<String,String>.Create;
Dict.Add(#$439,#$E9);//'й'
Dict.Add(#$44E,#$FE);//'ю'
Dict.Add(#$430,#$E0);//'а'
....
function tkoitocp.getCP1251Code(str:string):string;
var i:integer; res,key,val:string; pair:Tpair<String,String>;
begin
res:='';
for i:=1 to length(str) do
begin
if dict.ContainsKey(str[i]) then
begin
pair:= dict.ExtractPair(str[i]);
res:=res+pair.Value;
dict.Add(pair.Key,pair.Value);
end
else
res:=res+str[i];
end;
Result:=res;
end;

String to byte array in UTF-8?

How to convert a WideString (or other long string) to byte array in UTF-8?

A function like this will do what you need:
function UTF8Bytes(const s: UTF8String): TBytes;
begin
Assert(StringElementSize(s)=1);
SetLength(Result, Length(s));
if Length(Result)>0 then
Move(s[1], Result[0], Length(s));
end;
You can call it with any type of string and the RTL will convert from the encoding of the string that is passed to UTF-8. So don't be tricked into thinking you must convert to UTF-8 before calling, just pass in any string and let the RTL do the work.
After that it's a fairly standard array copy. Note the assertion that explicitly calls out the assumption on string element size for a UTF-8 encoded string.
If you want to get the zero-terminator you would write it so:
function UTF8Bytes(const s: UTF8String): TBytes;
begin
Assert(StringElementSize(s)=1);
SetLength(Result, Length(s)+1);
if Length(Result)>0 then
Move(s[1], Result[0], Length(s));
Result[high(Result)] := 0;
end;

You can use TEncoding.UTF8.GetBytes in SysUtils.pas

If you're using Delphi 2009 or later (the Unicode versions), converting a WideString to a UTF8String is a simple assignment statement:
var
ws: WideString;
u8s: UTF8String;
u8s := ws;
The compiler will call the right library function to do the conversion because it knows that values of type UTF8String have a "code page" of CP_UTF8.
In Delphi 7 and later, you can use the provided library function Utf8Encode. For even earlier versions, you can get that function from other libraries, such as the JCL.
You can also write your own conversion function using the Windows API:
function CustomUtf8Encode(const ws: WideString): UTF8String;
var
n: Integer;
begin
n := WideCharToMultiByte(cp_UTF8, 0, PWideChar(ws), Length(ws), nil, 0, nil, nil);
Win32Check(n <> 0);
SetLength(Result, n);
n := WideCharToMultiByte(cp_UTF8, 0, PWideChar(ws), Length(ws), PAnsiChar(Result), n, nil, nil);
Win32Check(n = Length(Result));
end;
A lot of the time, you can simply use a UTF8String as an array, but if you really need a byte array, you can use David's and Cosmin's functions. If you're writing your own character-conversion function, you can skip the UTF8String and go directly to a byte array; just change the return type to TBytes or array of Byte. (You may also wish to increase the length by one, if you want the array to be null-terminated. SetLength will do that to the string implicitly, but to an array.)
If you have some other string type that's neither WideString, UnicodeString, nor UTF8String, then the way to convert it to UTF-8 is to first convert it to WideString or UnicodeString, and then convert it back to UTF-8.

var S: UTF8String;
B: TBytes;
begin
S := 'Șase sași în șase saci';
SetLength(B, Length(S)); // Length(s) = 26 for this 22 char string.
CopyMemory(#B[0], #S[1], Length(S));
end.
Depending on what you need the bytes for, you might want to include an NULL terminator.
For production code make sure you test for empty string. Adding the 3-4 LOC required would just make the sample harder to read.

I have the following two routines (source code can be downloaded here - http://www.csinnovations.com/framework_utilities.htm):
function CsiBytesToStr(const pInData: TByteDynArray; pStringEncoding: TECsiStringEncoding; pIncludesBom: Boolean): string;
function CsiStrToBytes(const pInStr: string; pStringEncoding: TECsiStringEncoding;
pIncludeBom: Boolean): TByteDynArray;

widestring -> UTF8:
http://www.freepascal.org/docs-html/rtl/system/utf8decode.html
the opposite:
http://www.freepascal.org/docs-html/rtl/system/utf8encode.html
Note that assigning a widestring to an ansistring in a pre D2009 system (including current Free Pascal) will convert to the local ansi encoding, garbling characters.
For the TBytes part, see the remark of Rob Kennedy above.

saving a records containing a member of type string to a file (Delphi, Windows)

I have a record that looks similar to:
type
TNote = record
Title : string;
Note : string;
Index : integer;
end;
Simple. The reason I chose to set the variables as string (as opposed to an array of chars) is that I have no idea how long those strings are going to be. They can be 1 char long, 200 or 2000.
Of course when I try to save the record to a type file (file of...) the compiler complains that I have to give a size to string.
Is there a way to overcome this? or a way to save those records to an untyped file and still maintain a sort of searchable way?
Please do not point me to possible solutions, if you know the solution please post code.
Thank you

You can't do it with a typed file. Try something like this, with a TFileStream:
type
TStreamEx = class helper for TStream
public
procedure writeString(const data: string);
function readString: string;
procedure writeInt(data: integer);
function readInt: integer;
end;
function TStreamEx.readString: string;
var
len: integer;
iString: UTF8String;
begin
self.readBuffer(len, 4);
if len > 0 then
begin
setLength(iString, len);
self.ReadBuffer(iString[1], len);
result := string(iString);
end;
end;
procedure TStreamEx.writeString(const data: string);
var
len: cardinal;
oString: UTF8String;
begin
oString := UTF8String(data);
len := length(oString);
self.WriteBuffer(len, 4);
if len > 0 then
self.WriteBuffer(oString[1], len);
end;
function TStreamEx.readInt: integer;
begin
self.readBuffer(result, 4);
end;
procedure TStreamEx.writeInt(data: integer);
begin
self.WriteBuffer(data, 4);
end;
type
TNote = record
Title : string;
Note : string;
Index : integer;
procedure Save(stream: TStream);
end;
procedure TNote.Save(stream: TStream);
var
temp: TMemoryStream;
begin
temp := TMemoryStream.Create;
try
temp.writeString(Title);
temp.writeString(Note);
temp.writeInt(Index);
temp.seek(0, soFromBeginning);
stream.writeInt(temp.size);
stream.copyFrom(temp, temp.size);
finally
temp.Free;
end;
end;
I'll leave the Load procedure to you. Same basic idea, but it shouldn't need a temp stream. With the record size in front of each entry, you can read it and know how far to skip if you're looking for a certain record # instead of reading the whole thing.
EDIT: This was written specifically for versions of Delphi that use Unicode strings. On older versions, you could simplify it quite a bit.

Why not write this out as XML? See my session "Practical XML with Delphi" on how to get started with this.
Another possibility would be to make your records into classes descending form TComponent and store/retreive your data in DFM files.
This Stackoverflow entry shows you how to do that.
--jeroen
PS: Sorry my XML answer was a bit dense; I'm actually on the road for two conferences (BASTA! and DelphiLive! Germany).
Basically what you need to do is very simple: create a sample XML file, then start the Delphi XML Data Binding Wizard (available in Delphi since version 6).
This wizard will generate a unit for you that has the interfaces and classes mapping XML to Delphi objects, and a few helper functions for reading them from file, creating a new object, etc. My session (see the first link above) actually contains most of the details for this process.
The above link is a video demonstrating the usage of the Delphi XML Data Binding Wizard.

You could work with two different files, one that just stores the strings in some convenient way, the other stores the records with a reference to the strings. That way you will still have a file of records for easy access even though you don't know the size of the actual content.
(Sorry no code.)

TNote = record
Title : string;
Note : string;
Index : integer;
end;
could be translated as
TNote = record
Title : string[255];
Note : string[255];
Index : integer;
end;
and use Stream.writebuffer(ANodeVariable, sizeof(TNode), but you said that strings get go over 255 chars in this case IF a string goes over 65535 chars then change WORD to INTEGER
type
TNodeHeader=Record
TitleLen,
NoteLen: Word;
end;
(* this is for writing a TNode *)
procedure saveNodetoStream(theNode: TNode; AStream: TStream);
var
header: TNodeHeader;
pStr: PChar;
begin
...
(* writing to AStream which should be initialized before this *)
Header.TitleLen := Length(theNode.Title);
header.NodeLen := Length(theNode.Note);
AStream.WriteBuffer(Header, sizeof(TNodeHeader);
(* save strings *)
PStr := PChar(theNode.Title);
AStream.writeBuffer(PStr^, Header.TitleLen);
PStr := PChar(theNode.Note);
AStream.writebuffer(PStr^, Header.NoteLen);
(* save index *)
AStream.writebuffer(theNode.Index, sizeof(Integer));
end;
(* this is for reading a TNode *)
function readNode(AStream: TStream): TNode;
var
header: THeader
PStr: PChar;
begin
AStream.ReadBuffer(Header, sizeof(TNodeHeader);
SetLength(Result.Title, Header.TitleLen);
PStr := PChar(Result.Title);
AStream.ReadBuffer(PStr^, Header.TitleLen);
SetLength(Result.Note, Header.NoteLen);
PStr := PChar(Result.Note);
AStream.ReadBuffer(PStr^, Header.NoteLen);
AStream.ReadBuffer(REsult.Index, sizeof(Integer)(* 4 bytes *);
end;

You can use the functions available in this Open Source unit.
It allows you to serialize any record content into binary, including even dynamic arrays within:
type
TNote = record
Title : string;
Note : string;
Index : integer;
end;
var
aSave: TRawByteString;
aNote, aNew: TNote;
begin
// create some content
aNote.Title := 'Title';
aNote.Note := 'Note';
aNote.Index := 10;
// serialize the content
aSave := RecordSave(aNote,TypeInfo(TNote));
// unserialize the content
RecordLoad(aNew,pointer(aSave),TypeInfo(TNote));
// check the content
assert(aNew.Title = 'Title');
assert(aNew.Note = 'Note');
assert(aNew.Index = 10);
end;

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

Japan character encoding - windows

I have Japanese string of 'ぱはめ'. I want to convert it into '%82%CF%82%CD%82%DF'. I hope someone will give me a function for this converting.

Related

Delphi Function that takes Integer and returns a not-so-easily decoded Integer

Binary file error in Lazarus Pascal with custom records - error SIGSEGV

Delphi Berlin 10.1 OS X app Decode cyrillic for writing to hardDevice

String to byte array in UTF-8?

saving a records containing a member of type string to a file (Delphi, Windows)

Categories

Resources