I encrypted some text and put it in an INI file. Then I used GetPrivateProfileString() to retrieve the value, but some of the end characters are missing. I suspect it may be a newline character causing it to be incomplete. Writing to the INI file is OK. Opening the INI file and looking at the sections and keys, everything is in order. It's just the retrieving part that causes the bug.
Any help would be appreciated.
Thanks
Eddie
First off, when encrypting strings, make sure that they are converted to Base64 before dumping them into the INI file.
Most likely, the encryption produced a byte (a NUL, newline, or other control character) which is not handled very well by the INI-related APIs.
WritePrivateProfileStringW writes files in the active ANSI codepage by default; WritePrivateProfileStringA will always write ANSI.
To achieve the best results, follow the directions here and use GetPrivateProfileStringW when reading the data back.
It's more than likely that the encryption is injecting a NULL character into the stream you are writing. GetPrivateProfileString will read a string till it finds a NULL character.
So I agree with Angry Hacker, convert to Base64 or some other friendly human readable encoding and you won't have any problems.
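If it helps, here is a rough sketch of that approach using the Win32 Crypt32 Base64 helpers (link with Crypt32.lib). The section, key, and file names are only examples, and the ciphertext is assumed to come from whatever encryption you already have:

#include <windows.h>
#include <wincrypt.h>
#include <cstring>
#include <string>
#include <vector>
#pragma comment(lib, "crypt32.lib")

// Encode arbitrary bytes as Base64 so the profile APIs only ever see printable text.
std::string ToBase64(const std::vector<unsigned char>& bytes)
{
    DWORD chars = 0;
    CryptBinaryToStringA(bytes.data(), (DWORD)bytes.size(),
                         CRYPT_STRING_BASE64 | CRYPT_STRING_NOCRLF, nullptr, &chars);
    std::string s(chars, '\0');
    CryptBinaryToStringA(bytes.data(), (DWORD)bytes.size(),
                         CRYPT_STRING_BASE64 | CRYPT_STRING_NOCRLF, &s[0], &chars);
    s.resize(std::strlen(s.c_str()));              // trim the terminating NUL
    return s;
}

// Decode the Base64 value read back from the INI file into the original ciphertext.
std::vector<unsigned char> FromBase64(const std::string& s)
{
    DWORD size = 0;
    CryptStringToBinaryA(s.c_str(), 0, CRYPT_STRING_BASE64, nullptr, &size, nullptr, nullptr);
    std::vector<unsigned char> bytes(size);
    CryptStringToBinaryA(s.c_str(), 0, CRYPT_STRING_BASE64, bytes.data(), &size, nullptr, nullptr);
    return bytes;
}

// Writing: the INI file only ever contains printable Base64 text.
//   WritePrivateProfileStringA("Settings", "Secret", ToBase64(cipher).c_str(), "C:\\test\\app.ini");
// Reading: decode back to the ciphertext before decrypting.
//   char buf[512] = {0};
//   GetPrivateProfileStringA("Settings", "Secret", "", buf, sizeof(buf), "C:\\test\\app.ini");
//   std::vector<unsigned char> cipher = FromBase64(buf);

The CRYPT_STRING_NOCRLF flag keeps the Base64 on a single line, which matters here because an embedded newline would truncate the value just like a NUL does.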
Related
I get text files of random encoding format: UCS-2LE, ANSI, UTF-8, UCS-2BE, etc. I have to convert these files to UTF-8.
For the conversion I am using the following command:
iconv [options] -f from-encoding -t utf-8 < inputfile > outputfile
But if an incorrect from-encoding is provided, then an incorrect file is generated.
I want a way to find out the input file's encoding.
Thanks in advance
On Linux you could try using file(1) on your unknown input file. Most of the time it will guess the encoding correctly. Or else try several encodings with iconv till you "feel" that the result is acceptable (for example, if you know that the file is some Russian poetry, you might try KOI-8, UTF-8, etc. till you recognize a good Russian poem); a programmatic version of that trial and error is sketched at the end of this answer.
But character encoding is a nightmare and can be ambiguous. The provider of the file should tell you what encoding he used (and there is no way to get that encoding reliably in all cases: there are some byte sequences which would be valid, yet interpreted differently, under various encodings).
(notice that the HTTP protocol mentions and makes the encoding explicit)
In 2017, it is better to use UTF-8 everywhere (and you should follow that http://utf8everywhere.org/ link), so ask your human partners to send you UTF-8 (hopefully most of your files are in UTF-8, since today they all should be).
(so encoding is more a social issue than a technical one)
I get text files of random encoding format
Notice that "random encoding" doesn't exist. You want and need to find out what character encoding (and file format) has been used by the provider of that file (so you mean "unknown encoding", not a "random" one).
BTW, do you have a formal, unambiguous, sound and precise definition of text file, beyond a file without zero bytes, or a file with few control characters? LaTeX, C source, Markdown, SQL, UUencoding, shar, XPM, and HTML files are all text files, but very different ones!
You probably want to expect UTF-8, and you might use the file extension as some hint. Knowing the media-type could help.
(so if HTTP has been used to transfer the file, it is important to keep (and trust) the Content-Type...; read about HTTP headers)
[...] then an incorrect file is generated.
How do you know that the resulting file is incorrect? You can only know if you have some expectations about that result (e.g. that it contains Russian poetry, not junk characters; but perhaps these junk characters are some bytecode to some secret interpreter, or some music represented in weird fashion, or encrypted, etc....). Raw files are just sequences of bytes, you need some extra knowledge to use them (even if you know that they use UTF-8).
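If you want to automate that trial-and-error, here is a rough sketch using the iconv C library (the candidate list is only an example, and a clean conversion is merely a hint, not proof, that the guess is right):

#include <iconv.h>
#include <cstdio>
#include <string>
#include <vector>

// Returns true if the whole buffer converts to UTF-8 without hitting an
// invalid byte sequence for the given source encoding.
static bool converts_cleanly(const std::string& data, const char* from)
{
    iconv_t cd = iconv_open("UTF-8", from);
    if (cd == (iconv_t)-1)
        return false;                              // unknown encoding name

    std::vector<char> out(data.size() * 4 + 16);
    char*  in_ptr   = const_cast<char*>(data.data());
    size_t in_left  = data.size();
    char*  out_ptr  = out.data();
    size_t out_left = out.size();

    size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    iconv_close(cd);
    return rc != (size_t)-1 && in_left == 0;       // no invalid sequences left
}

int main()
{
    std::string data;                              // read the whole file into this
    const char* candidates[] = { "UTF-8", "UTF-16LE", "UTF-16BE", "CP1252", "KOI8-R" };
    for (const char* enc : candidates)
        if (converts_cleanly(data, enc)) { std::printf("plausible: %s\n", enc); break; }
}

Note that most single-byte encodings convert "cleanly" even when they are wrong, which is exactly why you still have to look at the output (or ask the provider).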
We do file encoding conversion with
vim -c "set encoding=utf8" -c "set fileencoding=utf8" -c "wq" filename
It's working fine, no need to give the source encoding.
I am having difficulty outputting data in UTF-8 format. I have a test case set up where data I am reading from an input file contains a British pound symbol (hex C2 A3). When I write it out on Linux, I get valid UTF-8 (C2 A3). On Windows, I only get hex A3.
I tried using a PrintStream and specifying the character set as "UTF-8". No luck. I tried many other streams with no luck until I finally tried a DataOutputStream. I used the "write()" method which took a byte array as a parameter. I needed to output a string, so I called "myString.getBytes("UTF-8")".
I ended up with code like:
dataOutputStream.write(myString.getBytes("UTF-8"));
This works properly on both systems: Windows 7 and Linux.
I am trying to understand why this worked and to convince myself my solution is correct. Does it come down to the system locales? Linux defaults to en_US.UTF-8, while all I could specify in Windows was just "en_US". So when the output stream attempted to get data from the string, was the string sending its data based upon the locale?
Or are you using a FileOutputStream, where the character encoding matters, or a DataOutputStream, where you write raw binary? You should do some research of your own too, but please take a look here.
I was wondering how Windows interprets characters.
I made a file with a hex editor with the 3 bytes E3 81 81.
Those bytes are the ぁ character in UTF-8.
I opened the file in Notepad and it displayed ぁ. I didn't specify the encoding of the file; I just created the bytes, and Notepad interpreted them correctly.
Is Notepad somehow guessing the encoding?
Or is the hex editor saving those bytes with a specific encoding?
If the file only contains these three bytes, then there is no information at all about which encoding to use.
A byte is just a byte, and there is no way to include any encoding information in it. Besides, the hex editor doesn't even know that you intended to decode the data as text.
Notepad normally uses ANSI encoding, so if it reads the file as UTF-8 then it has to guess the encoding based on the data in the file.
If you save a file as UTF-8, Notepad will put the BOM (byte order mark) EF BB BF at the beginning of the file.
Notepad makes an educated guess. I don't know the details, but loading the first few kilobytes and trying to convert them from UTF-8 is very simple, so it probably does something similar to that.
...and sometimes it gets it wrong...
https://ychittaranjan.wordpress.com/2006/06/20/buggy-notepad/
There is an easy and efficient way to check whether a file is in UTF-8. See Wikipedia: http://en.wikipedia.org/w/index.php?title=UTF-8&oldid=581360767#Advantages, fourth bullet point. Notepad probably uses this.
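The check amounts to verifying that every byte follows UTF-8's bit patterns. A rough sketch (it does not reject overlong encodings or surrogate ranges, which is usually good enough for guessing):

#include <cstddef>

bool LooksLikeUtf8(const unsigned char* p, size_t n)
{
    for (size_t i = 0; i < n; ) {
        unsigned char c = p[i++];
        size_t extra;
        if      (c < 0x80)           extra = 0;    // plain ASCII
        else if ((c & 0xE0) == 0xC0) extra = 1;    // 110xxxxx: 2-byte sequence
        else if ((c & 0xF0) == 0xE0) extra = 2;    // 1110xxxx: 3-byte sequence
        else if ((c & 0xF8) == 0xF0) extra = 3;    // 11110xxx: 4-byte sequence
        else                         return false; // stray continuation or invalid lead byte
        while (extra--)
            if (i >= n || (p[i++] & 0xC0) != 0x80) // continuation must be 10xxxxxx
                return false;
    }
    return true;
}

If this passes on the whole file (and at least one multi-byte sequence is present), treating the file as UTF-8 is a fairly safe bet; otherwise falling back to the ANSI code page is what Notepad appears to do.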
Wikipedia claims that Notepad used the IsTextUnicode function, which checks whether a particular text is written in UTF-16 (it may have stopped using it in Windows Vista, which fixed the "Bush hid the facts" bug): http://en.wikipedia.org/wiki/Bush_hid_the_facts.
How do I identify which encoding a file is in?
Open the file, try Save As..., and you can see the default (current) encoding of the file (the encoding it is saved in).
If we don't specify the encoding, what encoding will be used?
I do not think it's System.Text.Encoding.Default. Things work well if I EXPLICITLY pass System.Text.Encoding.Default, but things go wrong when I leave that empty.
So this doesn't work well
Dim b = System.IO.File.ReadAllText("test.txt")
System.IO.File.WriteAllText("test4.txt", b)
but this works well
Dim b = System.IO.File.ReadAllText("test.txt", System.Text.Encoding.Default)
System.IO.File.WriteAllText("test4.txt", b, System.Text.Encoding.Default)
If we do not specify an encoding, will VB.NET try to figure out the encoding from the text file?
Also what is System.Text.Encoding.Default?
It's the system default. What is my system default and how can I change it?
How do I know encoding used in a text file?
If I create a new text file and open it with SciTE, I see that the encoding is "Code Page Property". What is Code Page Property?
Look here, "This method attempts to automatically detect the encoding of a file based on the presence of byte order marks. Encoding formats UTF-8 and UTF-32 (both big-endian and little-endian) can be detected."
see also http://msdn.microsoft.com/en-us/library/ms143375(v=vs.110).aspx
This method uses UTF-8 encoding without a Byte-Order Mark (BOM)
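For what it's worth, the BOM detection that the documentation describes boils down to checking the first few bytes. A sketch of the idea (not the actual .NET implementation):

#include <cstddef>
#include <cstring>
#include <string>

std::string GuessFromBom(const unsigned char* p, size_t n)
{
    // Order matters: the UTF-32LE BOM starts with the same bytes as the UTF-16LE BOM.
    if (n >= 4 && !std::memcmp(p, "\xFF\xFE\x00\x00", 4)) return "UTF-32LE";
    if (n >= 4 && !std::memcmp(p, "\x00\x00\xFE\xFF", 4)) return "UTF-32BE";
    if (n >= 3 && !std::memcmp(p, "\xEF\xBB\xBF", 3))     return "UTF-8 with BOM";
    if (n >= 2 && !std::memcmp(p, "\xFF\xFE", 2))         return "UTF-16LE";
    if (n >= 2 && !std::memcmp(p, "\xFE\xFF", 2))         return "UTF-16BE";
    return "no BOM: fall back to the default (UTF-8 for ReadAllText)";
}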
Using WritePrivateProfileString and GetPrivateProfileString results in ??? instead of the real characters.
GetPrivateProfileString() and WritePrivateProfileString() will work with Unicode, sort of.
If the ini file is UTF-16LE encoded, i.e. it has a UTF-16 BOM, then the functions will work in Unicode. However if the functions have to create the file they will create an ANSI file and only work in ANSI.
So to use the functions with Unicode, create your ini file before you first use it and write a UTF-16LE Byte Order Mark to it. Then carry on as normal.
Note that the functions do not work at all with UTF-8.
See Michael Kaplan's blog for more detail than you ever wanted to know about this.
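In code, "write a UTF-16LE Byte Order Mark to it" can be as small as this sketch (error handling kept minimal; the path is whatever your INI path is):

#include <windows.h>

void EnsureUnicodeIni(const wchar_t* path)
{
    // CREATE_NEW fails harmlessly if the file already exists.
    HANDLE h = CreateFileW(path, GENERIC_WRITE, 0, nullptr,
                           CREATE_NEW, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (h == INVALID_HANDLE_VALUE)
        return;                                    // already there (or a real error)

    const unsigned char bom[2] = { 0xFF, 0xFE };   // UTF-16LE byte order mark
    DWORD written = 0;
    WriteFile(h, bom, sizeof(bom), &written, nullptr);
    CloseHandle(h);
}

Call that once before the first WritePrivateProfileStringW and the profile functions will keep the file in UTF-16LE from then on.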
The WritePrivateProfileStringW function will write the INI file in legacy system encoding (e.g. Shift-JIS on a Japanese system) because it is a legacy support function. If you want to have a fully Unicode-enabled INI file, you will need to use an external library.
Try SimpleIni http://code.jellycan.com/simpleini/
It is C++, single header file, template library with an MIT licence (i.e. commercial use is OK). Include it into your source file and use it. It is cross-platform, supports UTF-8 and legacy encoded files, and can read and write the INI file largely preserving comments and structure, etc. Easiest to check out the page.
It's been around for a while and appears to be used by quite a number of people. I wrote it and continue to support it.
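Typical usage looks roughly like this (a sketch; the file and key names are made up, so check the project page for the exact API):

#include "SimpleIni.h"
#include <cstdio>

int main()
{
    CSimpleIniA ini;
    ini.SetUnicode();                              // store values as UTF-8
    if (ini.LoadFile("config.ini") < 0)            // SI_Error is negative on failure
        return 1;

    const char* value = ini.GetValue("section", "key", "default");
    std::printf("%s\n", value);

    ini.SetValue("section", "key", "new value");
    ini.SaveFile("config.ini");
    return 0;
}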
According to the WritePrivateProfileString documentation, there is a Unicode version: WritePrivateProfileStringW. Use that, and you should be able to use Unicode characters.
It might just be a problem with how you are displaying or handling the strings.
For example, the normal console window can't display Japanese strings with printf.
Can you post some of your code?
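For reference, a minimal wide-character round trip looks something like the sketch below. It assumes the INI file already starts with a UTF-16LE BOM (see the earlier answer), the path and key names are made up, and the _setmode call is the MSVC-specific way to make the console print wide strings instead of question marks.

#include <windows.h>
#include <fcntl.h>
#include <io.h>
#include <cstdio>

int main()
{
    const wchar_t* ini = L"C:\\test\\app.ini";     // must already contain a UTF-16LE BOM

    WritePrivateProfileStringW(L"Section", L"Name", L"\u3042\u3044\u3046", ini); // "あいう"

    wchar_t buf[64] = {0};
    GetPrivateProfileStringW(L"Section", L"Name", L"", buf, 64, ini);

    _setmode(_fileno(stdout), _O_U16TEXT);         // otherwise the console mangles the output
    wprintf(L"read back: %ls\n", buf);
    return 0;
}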