Unicode in Firefox extension

The application works when I use the following line:
xulschoolhello.greeting.label = Hello World?
But when I use Unicode, it does not work:
xulschoolhello.greeting.label = سلام دنیا ?
Why does this not work?

I don't have a problem loading that string in my extension in a XUL file from chrome://. Make sure you are not overriding the encoding (UTF-8 by default). See this page for more information.
To make sure, change your XUL's first line to:
<?xml version="1.0" encoding="UTF-8"?>
In case you are using this in a properties file, make sure you save the .properties file in UTF-8 format. From Property Files - XUL | MDN:
Non-ASCII Characters, UTF-8 and escaping
Gecko 1.8.x (or later) supports property files encoded in UTF-8. You can and should write non-ASCII characters directly without escape sequences, and save the file as UTF-8 without BOM. Double-check the save options of your text editor, because many don't do this by default. See Localizing extension descriptions for more details.
In some cases, it may be useful or needed to use escape sequences to express some characters. Property files support escape sequences of the form \uXXXX, where XXXX is a Unicode character code. For example, to put a space at the beginning or end of a string (which would normally be stripped by the properties file parser), use \u0020.
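For example, a hypothetical greeting.properties (saved as UTF-8 without BOM) could mix literal non-ASCII text with an escaped leading space:
xulschoolhello.greeting.label = سلام دنیا
xulschoolhello.greeting.padded = \u0020Hello with a leading space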

Related

How do WritePrivateProfileStringA() and Unicode go together?

I'm working on a legacy project that uses INI files, and I'm currently trying to understand how Microsoft deals with them.
The documentation of WritePrivateProfileStringA() [MSDN] says for lpFileName:
If the file was created using Unicode characters, the function writes Unicode characters to the file. Otherwise, the function writes ANSI characters.
What exactly does that mean? What is a file "created using Unicode characters"? How does Microsoft determine whether a file was created using Unicode characters or not?
Since this is documented under lpFileName, do they refer to Unicode characters in the file name, like "if the file has a Japanese file name, we'll read it as Unicode"?
By default, neither the ...A() nor the ...W() function supports Unicode as file content for INI files. If, for example, the file does not exist, both will create a file with ANSI content.
However, if you create the INI file first and you give it a UTF-16 BOM (byte-order-mark), both ...A() and ...W() will respect that BOM and write UTF-16 characters to the file.
Other than the BOM, the file can be empty, so a 2 byte file with 0xFF 0xFE content is enough to get the Microsoft API to write Unicode characters.
Neither function recognizes or respects a UTF-8 BOM. In fact, a UTF-8 BOM can break an existing file if the BOM and the first section are both on line 1; in that case you can't access any of the keys in the affected section. If the first section is on line 2, the UTF-8 BOM has no effect.
My tests on Windows 10 21H1 cannot confirm a statement about UTF-16 BE support from 2006:
Just for fun, you can even reverse the BOM bytes and WritePrivateProfileString will write to it as a UTF-16 BE (Big Endian) file!
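A minimal C sketch of the BOM trick described above (the INI path is hypothetical; note the profile functions need a full path, otherwise they resolve the file relative to the Windows directory):
#include <stdio.h>
#include <windows.h>

int main(void)
{
    const char *path = "C:\\temp\\test.ini";

    /* Seed an empty file with a UTF-16 LE BOM (0xFF 0xFE). */
    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return 1;
    fputc(0xFF, f);
    fputc(0xFE, f);
    fclose(f);

    /* Because of the BOM, even the ANSI variant should now
       write UTF-16 content into the file. */
    WritePrivateProfileStringA("section", "key", "value", path);
    return 0;
}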

Liquid template encoding issue with Spring Boot version 2.0.5

I am using Spring Boot version 2.0.5 and Liquid template version 0.7.8.
My problem is that when I use German text in the template file and then send mail, a few German characters are converted into question marks.
What is the solution for this?
Somewhere along the path from the text file template, through processing, to the outgoing email, the character encoding is being mangled, so that the German characters, encoded in one scheme, are rendered as the wrong glyph in another scheme.
The first thing to check is the encoding of the template file. Then investigate how the email is being rendered. For example, if it is an HTML email, see whether a different character encoding is declared in the header, e.g.:
<head><meta charset="utf-8" /></head>
If this differs from the encoding of the file, e.g. ISO-8859-1, the first thing I would try is to resave the template in UTF-8. You should be able to do that within most IDEs or advanced text editors such as Notepad++.
(As the glyphs are question marks, it may be that the template is UTF-8 or UTF-16 while the HTML is in a more limited charset.)
If that doesn't work then you may need to look at your code and pay attention to how the raw bytes from the template are converted to Strings. For example:
String template = new String(bytesFromFile);
Would use the system default Charset, which might differ from the file's encoding. The safe way to convert the bytes to a String is to specify the character set explicitly:
String template = new String(bytesFromFile, StandardCharsets.UTF_8);
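As a minimal sketch of the whole chain (the template path, recipient, and the use of Spring's MimeMessageHelper are assumptions, not taken from the question): read the template bytes with an explicit charset, and pass a matching encoding to the mail helper, whose third constructor argument sets the charset of the message.
import org.springframework.mail.javamail.JavaMailSender;
import org.springframework.mail.javamail.MimeMessageHelper;
import javax.mail.internet.MimeMessage;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class MailSketch {
    public static void send(JavaMailSender sender) throws Exception {
        // Decode the template with an explicit charset, never the platform default.
        byte[] bytesFromFile = Files.readAllBytes(Paths.get("templates/mail.liquid"));
        String template = new String(bytesFromFile, StandardCharsets.UTF_8);

        MimeMessage message = sender.createMimeMessage();
        // The third argument sets the charset used for subject and body.
        MimeMessageHelper helper = new MimeMessageHelper(message, true, "UTF-8");
        helper.setTo("user@example.com");
        helper.setSubject("Grüße");
        helper.setText(template, true); // true = HTML body
        sender.send(message);
    }
}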

Find reason for automatic encoding detection (UTF-8 vs Windows-1252)

I have a CSV with content that is UTF-8 encoded. However, various applications and systems erroneously detect the encoding of the CSV as Windows-1252, which breaks all the special characters in the file (e.g. umlauts).
I can see that Sublime Text (on Windows) for example also automatically detects the wrong Windows-1252 encoding, when opening the file for the first time, showing garbled text where special characters are supposed to be.
When I choose Reopen with Encoding » UTF-8, everything will look fine, as expected.
Now, to find the source of the error, I thought it might help to figure out why these applications are not detecting the correct encoding automatically in the first place. Maybe there is a stray character somewhere with the wrong encoding, for example.
The CSV in question is actually an automatically generated product export of a Magento 2 installation. Recently the character encodings broke and I am currently trying to figure out what happened - hence my investigation on why this export is detected as Windows-1252.
Is there any reliable way of figuring out why the automatic detection in applications like Sublime Text assumes the wrong character encoding?
This is what I did in the end to find out why the file was not detected as UTF-8, i.e. to find the characters that were not encoded in UTF-8. Since PHP is more readily available to me, I decided to use the following script to force-convert anything that is not UTF-8 into UTF-8, using the very handy neitanod/forceutf8 library.
$before = file_get_contents('export.csv');
$after = \ForceUTF8\Encoding::toUTF8($before);
file_put_contents('export.fixed.csv', $after);
Then I used a file comparison tool like Beyond Compare to compare the two resulting CSVs, in order to see more easily which characters were not originally encoded in UTF-8.
This in turn showed me that only one particular column of the export was affected. Upon further investigation I found out that the contents of that column were processed in PHP with the following preg_replace:
$value = preg_replace('/([^\pL0-9 -])+/', '', $value);
Using \p in the regular expression had an unintended side effect: without the u modifier, preg_replace treats the subject as single-byte data, so the multi-byte special characters were mangled into what looked like another encoding. A quick solution is to add the u flag to the regex (see the regex pattern modifiers reference), which makes preg_replace treat both pattern and subject as UTF-8. See also this answer.
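With the modifier added, the line above becomes:
$value = preg_replace('/([^\pL0-9 -])+/u', '', $value);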

How does Visual Studio resolve Unicode strings from source files with different encodings?

I know that if I use the Unicode character set in VS, I can use L"There is a string" to represent a Unicode string. I assume There is a string is read from the source file during lexical parsing and decoded to Unicode using the source file's encoding.
I have changed the source file to several different encodings, but I always get the correct Unicode data from the L prefix. Does VS detect the encoding of the source file to convert There is a string to the correct Unicode? If not, how does VS achieve this?
I'm not sure whether this question belongs on SO; if not, where should I ask? Thanks in advance.
VS won't detect the encoding without a BOM [1] signature at the start of a source file. It will just assume the localized ANSI encoding if no BOM is present.
A BOM signature identifies the UTF8/16/32 encoding used. So if you save something as UTF-8 (VS will add a BOM) and remove the first 3 bytes (EF BB BF), then the file will be interpreted as CP1252 on US Windows, but GB2312 on Chinese Windows, etc.
You are on Chinese Windows, so either save as GB2312 (without BOM) or UTF8 (with BOM) for VS to decode your source code correctly.
[1] https://en.wikipedia.org/wiki/Byte_order_mark
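As a quick way to see this for yourself, save a small file like the sketch below (the string is just an example) in different encodings: without a BOM, the bytes of the wide literal depend on the ANSI code page MSVC assumes, while saving with a BOM, or compiling with MSVC's /utf-8 switch, removes the ambiguity.
#include <cstdio>
#include <cwchar>

int main()
{
    // How these characters are decoded depends on the source file's
    // encoding: a BOM (or /utf-8) makes it explicit; otherwise MSVC
    // falls back to the localized ANSI code page.
    const wchar_t *s = L"你好, Grüße";
    std::wprintf(L"%ls\n", s);
    return 0;
}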

Generate UTF-8 file with NotesStream

I'm trying to export some text to a UTF-8 file with LotusScript. I checked the documentation, and the following lines should output my text as UTF-8, but Notepad++ says it's ANSI.
Dim streamCompanies As NotesStream
Dim sesCurrent As New NotesSession
Set streamCompanies = sesCurrent.CreateStream
Call streamCompanies.Open("C:\companies.txt", "UTF-8")
Call streamCompanies.WriteText("Test")
streamCompanies.Close
When I try the same with UTF-16 instead of UTF-8, the generated file format is correct. Could anyone point me in the right direction on how to write a UTF-8 file with LotusScript on a Windows platform?
Notes is most likely doing its job and encoding the file properly. It is likely that Notepad++ interprets the UTF-8 file as ANSI when no characters beyond ASCII exist in the file. There is no way to determine the encoding in this case other than to analyze the file's contents.
See this SO answer: How to avoid inadvertent encoding of UTF-8 files as ASCII/ANSI?
So a simple test to make sure Notes is working would be to output a non-ANSI character and then open in Notepad++ to confirm.
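For example, a minimal sketch based on the code from the question (path and sample text are hypothetical):
Dim sesTest As New NotesSession
Dim streamTest As NotesStream
Set streamTest = sesTest.CreateStream
Call streamTest.Open("C:\utf8test.txt", "UTF-8")
' Katakana requires bytes outside the ANSI range, so Notepad++ should detect UTF-8
Call streamTest.WriteText("Test テスト")
Call streamTest.Close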
Closed - down the line while coding I stumbled across some data with Asian characters which were displayed correctly in my text editor. Rechecking file encodings I found the following:
If the output text only includes ASCII characters, it is detected as ANSI by Notepad++
If the output text contains e.g. Katakana, it is detected as UTF-8 by Notepad++
-> problem solved for me.
