wxWidgets and UTF-8 - some characters missing

So I have this file encoded in UTF8. I load it and print like this:
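char buffer[2048] = {0};
FILE *pFile = fopen("D:/localization.csv", "rb");
if (pFile != NULL)
{
    // Read at most sizeof(buffer) - 1 bytes so the buffer stays NUL-terminated.
    size_t iret = fread(buffer, 1, sizeof(buffer) - 1, pFile);
    fclose(pFile);
    // Decode the bytes as UTF-8 and show the result.
    wxString strMessageText = wxString::FromUTF8(buffer);
    wxMessageBox(strMessageText);
}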
The problem is that when the text contains some "invalid" characters, it doesn't get created (length of strMessageText is 0). I noticed, for instance, that Danish or German characters are fine but when I put Polish or Russian chars in the text file the wxString::FromUTF8 function fails to create proper text. Any idea?

If the file contains correctly encoded UTF-8 text, wxString::FromUTF8() will decode it. If it doesn't, you can still use wxMBConvUTF8 with e.g. MAP_INVALID_UTF8_TO_OCTAL to preserve even incorrectly encoded bytes in the input, but this isn't a good idea, in general.
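One way to check whether the file really contains well-formed UTF-8 before handing it to wxString::FromUTF8() is to decode it strictly and report where decoding fails. The sketch below is in Python rather than C++ and does not depend on wxWidgets; the function name first_invalid_utf8_offset is hypothetical, and Windows-1250 is only an assumed example of a legacy code page. It also shows that a fixed-size read can cut a multi-byte character in half, which makes an otherwise valid file look invalid.

```python
def first_invalid_utf8_offset(data: bytes):
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")  # strict decoding: raises on any malformed sequence
    except UnicodeDecodeError as exc:
        return exc.start
    return None


polish = "Zażółć gęślą jaźń"
good = polish.encode("utf-8")

# Correctly encoded Polish text is valid UTF-8, so no offset is reported.
print(first_invalid_utf8_offset(good))

# The same text saved in a legacy code page (Windows-1250 is assumed here)
# is not valid UTF-8, so an offset is reported.
print(first_invalid_utf8_offset(polish.encode("cp1250")))

# Cutting the buffer in the middle of the two-byte "ż" also breaks it.
print(first_invalid_utf8_offset(good[:3]))
```

If the check reports an offset, the file (or the chunk that was read) is the problem; if it reports nothing, the decoding library is the more likely suspect.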

I found the solution here: https://forums.wxwidgets.org/viewtopic.php?f=1&t=41068
It turned out that my wxWidgets lib was out of date. I had version 2.8.12 and updated to 3.0.2 and it's fine.

Related

How to handle the javascript decoding error?

When the code is implemented, some characters cannot be decoded. I am getting a bunch of question marks like ??. How can I fix this?
HtmlInput inputBox2 = (HtmlInput)currentPage.getHtmlElementById("classNo");
inputBox2.setValueAttribute("2016同學15");
ScriptResult result = currentPage.executeJavaScript("javascript:Search(2)");
I found this in the compiler: ScriptResult[result=net.sourceforge.htmlunit.corejs.javascript.Undefined#24d7aac3 page=HtmlPage(http://www.xx.org/classNo=2016??15)#1330510442]
You might try URL-encoding the reserved ASCII characters and all non-ASCII characters, e.g. a space becomes %20.
The HTML URL Encoding Reference explains this, and you can also encode strings interactively there.
Your "2016同學15" would be encoded as:
"2016%E5%90%8C%E5%AD%B815"

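The percent-encoding described in the answer above can be reproduced with a few lines of Python. This is only a sketch of the encoding step, using the standard library's urllib.parse.quote, which encodes to UTF-8 by default; it does not use HtmlUnit, and the variable names are illustrative.

```python
from urllib.parse import quote, unquote

value = "2016同學15"

# safe="" means every reserved character is percent-encoded as well.
encoded = quote(value, safe="")
print(encoded)

# Decoding the result gives back the original string.
print(unquote(encoded) == value)
```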
special character was lost when saving excel into csv file

I have an Excel file containing a Latin character, shown below:
abcón
After saving it as a CSV file, the Latin character was lost:
abc??n
What causes this problem and how to solve it? Thanks.
It's likely that the ó in the Excel file isn't one that the CSV's text encoding supports. Several different symbols look almost, if not entirely, identical. In the Insert->Symbol character map, 00F3 (Latin small letter o with acute, from the Latin-1 Supplement block) is supported. However, 1F79, from the Greek Extended block, is not supported and, from my casual inspection, looks identical. Try replacing the character in question with 00F3 from the character map.
Alternatively, you can type the character with the Alt code 0243 (hold Alt and type 0243 on the numeric keypad), which should work.
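To see which of the two look-alike characters is actually in the cell, it helps to print the code point and its Unicode name, then try encoding it. The Python sketch below assumes the CSV is written in Windows-1252, a common "ANSI" code page on Western European Windows installations; the real code page depends on the system.

```python
import unicodedata

latin_o = "\u00f3"  # the character produced by Alt+0243
greek_o = "\u1f79"  # the visually similar Greek Extended character

for ch in (latin_o, greek_o):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# Windows-1252 (assumed code page) has a slot for U+00F3: byte 0xF3, decimal 243.
print(latin_o.encode("cp1252"))

# It has no slot for U+1F79, so a lossy encoder substitutes "?".
print(greek_o.encode("cp1252", errors="replace"))
```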

Printing superscript / subscript to zebra printer using ZPL

I'm trying to find a solution to print superscript using ZPL.
Example, if I have this string of ZPL:
string ZPLString =
"^XA" +
"^FO50,50" +
"^A0N50,50" +
"^FDHello, World!^FS" +
"^XZ";
sendToZebraPrinter(ZPLString);
Since there aren't any superscript characters, I could send this to my printer without issue. But if I wanted to use this string:
string ZPLString =
"^XA" +
"^FO50,50" +
"^A0N50,50" +
"^FDe = mc²^FS" +
"^XZ";
sendToZebraPrinter(ZPLString);
The superscript won't print natively. I think I need to access an international character set or something but I'm not sure how to do this, especially if I only need it for the one character. Do I need to change my entire character set, or do some sort of "replace" on it?
Note, we are generating ZPL code manually and shooting it directly at the printers (unfortunately this is our system), bypassing any drivers or 3rd party dev components of any kind.
Mark's answer gave me exactly what I needed to solve my issue. Here is additional information to further clarify the solution:
To use the hex code in your data, you need to prefix the ^FD command with ^FH_ (^FH tells the printer that the data in ^FD will contain hex values, and the _ defines the hex indicator so the printer knows which data is a hex code and which is standard text).
I got this to work immediately, exactly as you mentioned. Then, testing against additional printers, I found (though I'm not sure why) that I didn't actually need to send ^CI13 to specify code page 850. The ² appeared on all printers even without ^CI13.
In my .NET application, the ² didn't map to the hex code that the ZPL code page expected: the .NET app converted ² to hex code b2 instead of fd, although most standard characters converted to the same code as in the ZPL map. So in my application I created a conversion table: any character defined in the table is mapped to its ZPL hex code, and any character not defined is left as converted by the application.
I'd never used characters from a non-default code page, and I didn't realize that with ^FH you can mix standard text with hex (I thought that if you used ^FH, all of the information in ^FD had to be hex). So the information Mark provided led me right down the correct path.
The final example to solve the problem, using the information Mark provided, is:
string ZPLString =
"^XA" +
"^FO50,50" +
"^A0N50,50" +
"^FH_" +
"^FDe = mc_fd^FS" +
"^XZ";
sendToZebraPrinter(ZPLString);
Try using ^CI13 to select code page 850, then use _fd in your string for the superscripted 2. The underscore is used to designate a hex character.
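The conversion-table idea can be sketched without a hand-written table by letting a code page 850 codec supply the byte values. The Python below is an illustration, not the original .NET code: the helper names zpl_hex_escape and zpl_field are hypothetical, and it assumes the printer is using code page 850 (as selected by ^CI13). Plain ASCII passes through unchanged; non-ASCII characters, the hex indicator itself, and the ZPL prefix characters ^ and ~ are written as hex escapes.

```python
def zpl_hex_escape(text: str, indicator: str = "_", codepage: str = "cp850") -> str:
    """Escape text for a ^FD field that is preceded by ^FH<indicator>."""
    out = []
    for ch in text:
        if ch.isascii() and ch not in (indicator, "^", "~"):
            out.append(ch)  # ordinary ASCII is sent as-is
        else:
            # Raises UnicodeEncodeError if the code page lacks the character.
            for byte in ch.encode(codepage):
                out.append(f"{indicator}{byte:02x}")
    return "".join(out)


def zpl_field(text: str, indicator: str = "_") -> str:
    """Build the ^FH ... ^FD ... ^FS part of a label (hypothetical helper)."""
    return f"^FH{indicator}^FD{zpl_hex_escape(text, indicator)}^FS"


print(zpl_field("e = mc²"))

# For comparison: Latin-1 puts the same character at 0xb2, not 0xfd.
print("²".encode("latin-1").hex(), "²".encode("cp850").hex())
```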

Extended charset characters not recognized and converted to ? mark

I have a string containing a special character such as "\u2012", i.e. FIGURE DASH. When I try to print it on the console, I get a '?' mark instead of the symbol. I have an editor in which I can insert the symbol using Alt+numpad, like Alt+2012. In the editor I can see the symbol, but when I save it to an XML file and read the value back using nodevalue, I get a '?' mark.
To summarize, I am having trouble reading characters from extended character sets such as Latin Extended-A. What I need is that when I insert such symbols and read them back, I get something like &#xXXXX;.
Please help!
TIA :)
Simply put, I have a String inpath = "À"; and I want to get its Unicode value, like &#xXXXX;.
The default console encoding in Windows is an MS-DOS (OEM) code page, and those code pages don't support the character. You can try running chcp 65001 before running the program, but you might also need to change the console font.
You don't need to do anything you wouldn't do with any other character, as long as you use UTF-8. You aren't doing that in many places. You need to explicitly write in your code to save and read the file in UTF-8, and not rely on the platform default encoding.
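Both points, producing &#xXXXX; references and always naming UTF-8 explicitly when saving and reading, can be sketched in a few lines. The example is in Python rather than Java, and the function names to_hex_char_refs and roundtrip_utf8 are hypothetical; the same idea applies to any language that lets you pass an explicit charset to its file readers and writers.

```python
import os
import tempfile


def to_hex_char_refs(text: str) -> str:
    """Replace every non-ASCII character with a &#xXXXX; reference."""
    return "".join(ch if ord(ch) < 128 else f"&#x{ord(ch):04X};" for ch in text)


def roundtrip_utf8(text: str) -> str:
    """Write text to a temporary file and read it back, naming UTF-8 both times."""
    fd, path = tempfile.mkstemp(suffix=".xml")
    os.close(fd)
    try:
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(text)
        with open(path, "r", encoding="utf-8") as handle:
            return handle.read()
    finally:
        os.remove(path)


sample = "À and \u2012"
print(to_hex_char_refs(roundtrip_utf8(sample)))
```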

convert text from utf to read-able text

I have some UTF-Text starting with "ef bb bf". How can I turn this message to human read-able text? vim, gedit, etc. interpret the file as plain text and show all the ef-text even when I force them to read the file with several utf-encodings. I tried the "recode" tool, it doesn't work. Even php's utf8_decode failed to produce the expected text output.
Please help, how can I convert this file so that I can read it?
ef bb bf is the UTF-8 BOM. Strip off the first three bytes and try to utf8_decode the remainder.
$text = "\xef\xbb\xbf....";
echo utf8_decode(substr($text, 3));
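The same BOM stripping can be sketched in Python, where the "utf-8-sig" codec removes a leading BOM if one is present. The second half covers a guess about the question: if editors show the literal text "ef bb bf ...", the file may be a hex dump rather than raw bytes, in which case the hex has to be converted back to bytes first. The sample data is made up for illustration.

```python
# Raw bytes that start with the UTF-8 BOM (ef bb bf).
raw = b"\xef\xbb\xbfGr\xc3\xbc\xc3\x9fe"

# "utf-8-sig" decodes UTF-8 and drops a leading BOM if present.
text = raw.decode("utf-8-sig")
print(text)

# If the file is a textual hex dump, turn the hex digits into bytes first.
hex_dump = "ef bb bf 47 72 c3 bc c3 9f 65"
from_dump = bytes.fromhex(hex_dump).decode("utf-8-sig")
print(from_dump)
```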
Is it UTF-8, UTF-16, or UTF-32? It matters a lot! I assume you want to convert the text into old-fashioned ASCII (all characters are 1 byte long).
UTF-8 should already be (at least mostly) readable, as it uses 1 byte for standard ASCII characters and only uses multiple bytes for special/multilingual characters (character codes > 127). It sounds like your file isn't UTF-8, or you'd already be able to read it! Online content is generally UTF-8.
Unicode character codes are the same as the old ASCII codes up to 127.
UTF-16 uses at least 2 bytes per character (4 for characters outside the Basic Multilingual Plane) and UTF-32 always uses 4, whether or not those characters could be represented in a single byte. That makes the text unreadable if the text editor is expecting UTF-8.
Gedit supports UTF-16 and UTF-32, but you need to 'add' those encodings explicitly in the open dialog box (and possibly select them explicitly instead of using auto-detect).
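When it is unclear which of the three encodings a file uses, the byte order mark at the start is the quickest hint. The Python sketch below checks for the known BOMs and picks a codec accordingly; the function name decode_by_bom is hypothetical, and falling back to UTF-8 when no BOM is found is an assumption rather than a rule. The UTF-32 marks are tested first because the UTF-32 little-endian BOM begins with the same two bytes as the UTF-16 little-endian one.

```python
import codecs

# Longest BOMs first, so UTF-32 LE is not mistaken for UTF-16 LE.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def decode_by_bom(data: bytes, default: str = "utf-8") -> str:
    """Decode data using the encoding indicated by its BOM, if any."""
    for bom, codec in _BOMS:
        if data.startswith(bom):
            # These codecs consume the BOM and use it to pick the byte order.
            return data.decode(codec)
    return data.decode(default)


for encoding in ("utf-8-sig", "utf-16", "utf-32"):
    encoded = "héllo".encode(encoding)  # each of these writes a BOM
    print(encoding, len(encoded), decode_by_bom(encoded))
```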
