Output data in UTF-8 format

I am having difficulty outputting data in UTF-8 format. I have a test case set up where the data I am reading from an input file contains a British pound symbol (hex C2 A3). When I write it out on Linux, I get valid UTF-8 (C2 A3). On Windows, I only get the single byte A3.
I tried using a PrintStream and specifying the character set as "UTF-8". No luck. I tried many other streams without luck until I finally tried a DataOutputStream. I used its write() method, which takes a byte array as a parameter; since I needed to output a string, I called myString.getBytes("UTF-8").
I ended up with code like:
dataOutputStream.write(myString.getBytes("UTF-8"));
This works properly on both systems: Windows 7 and Linux.
I am trying to understand why this worked, and to convince myself that my solution is correct. Does it come down to the system locale? Linux defaults to en_US.UTF-8, while all I could specify on Windows was just "en_US". So when the output stream requested data from the string, was the string handing over its data based on the locale?

Are you using a FileOutputStream, where the character encoding matters, or a DataOutputStream, where you write raw binary? You should do some research of your own too, but take a look here.
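A minimal, self-contained sketch (class and variable names are mine) of why the getBytes route works, and why a writer with an explicit charset is equally locale-independent:

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class PoundSignDemo {
    public static void main(String[] args) throws Exception {
        String pound = "\u00A3"; // British pound sign

        // Option 1: encode explicitly, then write the raw bytes
        // (the approach the asker ended up with).
        byte[] viaGetBytes = pound.getBytes(StandardCharsets.UTF_8);

        // Option 2: wrap the stream in a writer with an explicit charset,
        // so the platform default charset never gets involved.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(buf, StandardCharsets.UTF_8)) {
            w.write(pound);
        }
        byte[] viaWriter = buf.toByteArray();

        // Both yield the two-byte UTF-8 sequence C2 A3 on every platform.
        System.out.printf("%02X %02X%n", viaGetBytes[0] & 0xFF, viaGetBytes[1] & 0xFF); // C2 A3
        System.out.printf("%02X %02X%n", viaWriter[0] & 0xFF, viaWriter[1] & 0xFF);     // C2 A3
    }
}
```

The point is that an explicit charset, whether passed to getBytes or to the writer, removes the platform default (and hence the locale) from the picture entirely.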

Related

Zebra RW420: printing ZPL commands to get the TID

I am trying to print the TID using ZPL commands, but I am getting JJL179464. Can anyone please tell me what this value is?
To partially answer your question:
Are you reading ASCII-encoded data? Your result, "JJL179464", does not look like valid RFID tag data unless it is ASCII. Data in RFID tags is stored in binary; depending on the reader settings, it can be output in binary, hexadecimal, or ASCII format. Judging by the first three symbols, "JJL", your reader is set to output ASCII data, or there is an error in your code.
Try to answer us the following questions:
What are you trying to achieve?
Provide us your code. (whole, structured)
What device are you using to read the RFID tag?
Provide us the settings of your reading device. (unless they are a part of the code)
Do you know the data content of the RFID tag you are trying to read? That means, can you validate that the reading was successful?
Edit:
Thank you for your code:
^XA
^FN1^RFR,H,0,12,2^FS^FH_^HV1,256^FS
^XZ
It seems that there could be several issues in your code.
Firstly, your ^HV command is incomplete: it is missing three parameters. The first of them (the third parameter overall) sets the data prefix, the next one the data termination, and the last one specifies when to return the data. You should include all of them in the ^HV command.
There is already a good example how the ^HV command needs to be set:
^RFR,^FN1,^HV1 not sending output to computer
The second issue (at least I think it is an issue, though I don't have the means to verify it) is that you are using the ^FH_ command. There are no hexadecimal values encoding special characters in your code, so there is no point in using it; I would try omitting it.
Also, I am not sure about the order of commands. The ^FN1 command should be after ^RFR and before ^FS commands.
Try this code:
^XA
^RFR,H,0,12,2^FN1^FS^HV1,256,HEADER,TERMINATION,L^FS
^XZ
That should give you output in format:
HEADERhexadecimaldataTERMINATION
It is a little hard to read, but if it works, you can then format it more nicely.
The words HEADER and TERMINATION serve as the prefix and postfix of the data from the ^RFR command, so if this works you can replace them with brackets or whatever suits your needs.
I am also concerned about two things:
The number of bytes to read: 12. Usually it is 8, but it varies depending on the type of RFID tag and the data format. I am not saying it is a mistake, just unusual to me.
The last parameter of the ^HV1 command may need to be "F" instead of "L". "F" is the default value, and it seems your setup was working with it; at least you got some output, so maybe it should stay "F". But try "L" to get a response for each label; "F" means a single response after the entire job is done.
I hope this will work. We are currently in lockdown and I don't have the means to verify this on real devices, but theoretically it should help.
Please let me know the results.

How to find file encoding type or convert any encoding type to UTF-8 in shell?

I get text files of random encoding formats: UCS-2LE, ANSI, UTF-8, UCS-2BE, etc. I have to convert these files to UTF-8.
For the conversion I am using the following command:
iconv options -f from-encoding -t utf-8 <inputfile > outputfile
But if an incorrect from-encoding is provided, an incorrect file is generated.
I want a way to find the input file's encoding.
Thanks in advance
On Linux you could try using file(1) on your unknown input file. Most of the time it would guess the encoding correctly. Or else try several encodings to iconv till you "feel" that the result is acceptable (for example if you know that the file is some Russian poetry, you might try KOI-8, UTF-8, etc.... till you recognize a good Russian poem).
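A sketch of that detect-then-convert workflow, assuming file(1) with --mime-encoding and iconv are available (the sample file name and its contents are made up here):

```shell
# Make a small ISO-8859-1 sample: "café" with a Latin-1 e-acute byte.
printf 'caf\xe9\n' > sample.txt

# Ask file(1) for its best guess at the encoding.
enc=$(file -b --mime-encoding sample.txt)
# file can also answer "unknown-8bit"; fall back to a manual guess then.
case "$enc" in unknown-8bit|binary) enc=ISO-8859-1 ;; esac
echo "detected: $enc"

# Feed the guess to iconv, then verify the result by eye:
# the detection is heuristic and can be wrong.
iconv -f "$enc" -t UTF-8 sample.txt > sample.utf8.txt
cat sample.utf8.txt
```

Note that file(1) only inspects the bytes, so its guess is exactly the kind of heuristic the rest of this answer warns about.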
But character encoding is a nightmare and can be ambiguous. The provider of the file should tell you what encoding was used, because there is no way to determine the encoding reliably in all cases: some byte sequences are valid in several encodings and interpreted differently by each.
(Notice that the HTTP protocol mentions and makes the encoding explicit.)
In 2017, better to use UTF-8 everywhere (and you should follow that http://utf8everywhere.org/ link), so ask your human partners to send you UTF-8. Hopefully most of your files are in UTF-8 already, since today they all should be.
(so encoding is more a social issue than a technical one)
I get text file of random encoding format
Notice that "random encoding" don't exist. You want and need to find out what character encoding (and file format) has been used by the provider of that file (so you mean "unknown encoding", not "random" one).
BTW, do you have a formal, unambiguous, sound and precise definition of text file, beyond file without zero bytes, or files with few control characters? LaTeX, C source, Markdown, SQL, UUencoding, shar, XPM, and HTML files are all text files, but very different ones!
You probably want to expect UTF-8, and you might use the file extension as some hint. Knowing the media-type could help.
(so if HTTP has been used to transfer the file, it is important to keep (and trust) the Content-Type...; read about HTTP headers)
[...] then incorrect file is generated.
How do you know that the resulting file is incorrect? You can only know if you have some expectations about that result (e.g. that it contains Russian poetry, not junk characters; but perhaps these junk characters are some bytecode to some secret interpreter, or some music represented in weird fashion, or encrypted, etc....). Raw files are just sequences of bytes, you need some extra knowledge to use them (even if you know that they use UTF-8).
We do file encoding conversion with
vim -c "set encoding=utf8" -c "set fileencoding=utf8" -c "wq" filename
It works fine; there is no need to give the source encoding.

Ruby 1.9 iso-8859-8-i encoding

I'm trying to create a piece of code that will download a page from the internet and do some manipulation on it. The page is encoded in iso-8859-1.
I can't find a way to handle this file. I need to search through the file for Hebrew text and return the changed file to the user.
I tried to use String#encode, but I still get the wrong encoding.
When printing the response encoding, I get "encoding":{}, as if it were undefined, and this is an example of what it returns:
\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd \ufffd\ufffd-\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd \ufffd\ufffd\ufffd\ufffd
It should be Hebrew letters.
When I try final.body.encode('iso-8859-8-i'), I get the error: code converter not found (ASCII-8BIT to iso-8859-8-i).
When you have input where Ruby or the OS has incorrectly assigned the encoding, conversions will not work. That's because Ruby starts from the wrong assumption and tries to preserve the wrong characters when converting.
However, if you know from some other source what the correct encoding is, you can use the force_encoding method to tell Ruby how to interpret the bytes it has loaded into a String. Note that this alters the object in place.
E.g.
contents = final.body
contents.force_encoding( 'ISO-8859-8' )
puts contents
At this point (provided it works), you now can make conversions (to e.g. UTF-8), because Ruby has been correctly told what characters it is dealing with.
I could not find 'ISO-8859-8-I' on my version of Ruby. I am not sure yet how close 'ISO-8859-8' is to what you need (some Googling suggests that it may be OK for you, if the ...-I encoding is not available).
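If plain ISO-8859-8 does turn out to be close enough, the relabel-then-convert round trip looks like this (the byte string is a made-up sample: "שלום" in ISO-8859-8):

```ruby
# Bytes as they might arrive off the wire: Ruby labels them ASCII-8BIT.
raw = "\xF9\xEC\xE5\xED".dup

# Step 1: relabel in place (no bytes change) so Ruby knows what it has.
raw.force_encoding('ISO-8859-8')

# Step 2: now a real conversion is possible.
utf8 = raw.encode('UTF-8')
puts utf8 # => שלום
```

The order matters: calling encode first would convert from the wrongly assumed ASCII-8BIT and produce the replacement characters seen in the question.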

What's the default encoding for System.IO.File.ReadAllText

If we don't specify the encoding, which encoding will it use?
I do not think it's System.Text.Encoding.Default. Things work well if I EXPLICITLY pass System.Text.Encoding.Default, but they go wrong when I leave that argument out.
So this doesn't work well:
Dim b = System.IO.File.ReadAllText("test.txt")
System.IO.File.WriteAllText("test4.txt", b)
but this works well
Dim b = System.IO.File.ReadAllText("test.txt", System.Text.Encoding.Default)
System.IO.File.WriteAllText("test4.txt", b, System.Text.Encoding.Default)
If we do not specify encoding will vb.net try to figure out the encoding from the text file?
Also, what is System.Text.Encoding.Default?
It's the system default. What is my system default, and how can I change it?
How do I know the encoding used in a text file?
If I create a new text file and open it with SciTE, I see the encoding listed as a code page property. What is a code page property?
Look here, "This method attempts to automatically detect the encoding of a file based on the presence of byte order marks. Encoding formats UTF-8 and UTF-32 (both big-endian and little-endian) can be detected."
see also http://msdn.microsoft.com/en-us/library/ms143375(v=vs.110).aspx
This method uses UTF-8 encoding without a Byte-Order Mark (BOM)

GetPrivateProfileString bug

I encrypted some text and put it in an INI file. Then I used GetPrivateProfileString() to retrieve the value, but some of the trailing characters are missing. I suspect a newline character may be causing it to be incomplete. Writing to the INI file is OK, and when I open the INI file and look at the sections and keys, everything is in order. It's just the retrieval part that triggers the bug.
Please any help would be appreciated.
Thanks
Eddie
First off, when encrypting strings, make sure they are converted to Base64 before dumping them into the INI file.
Most likely, the encryption produced a character that is not handled well by the INI-related APIs.
WritePrivateProfileStringW writes files in the active ANSI codepage by default; WritePrivateProfileStringA will always write ANSI.
To achieve the best results, follow the directions here and use GetPrivateProfileStringW when reading the data back
It's more than likely that the encryption is injecting a NUL character into the stream you are writing; GetPrivateProfileString will only read a string until it finds a NUL character.
So I agree with Angry Hacker: convert to Base64 or some other printable, human-readable encoding and you won't have any problems.
