I am using Spring Boot version 2.0.5 and Liquid template version 0.7.8.
My problem: when I use German text in the template file and send the mail, some German characters are converted into question marks.
So what is the solution for this?
Somewhere along the path from the template file, through processing, to the outgoing email, the character encoding is being mangled: the German characters, encoded in one scheme, are being decoded in another and rendered as the wrong glyphs in the email.
The first thing to check is the encoding of the template file. Then investigate how the email is rendered. For example, if it is an HTML email, see whether the header declares a different character encoding, e.g.:
<head><meta charset="utf-8" /></head>
If this differs from the encoding of the file (e.g. ISO-8859-1), the first thing I would try is resaving the template as UTF-8. You should be able to do that in most IDEs or advanced text editors such as Notepad++.
(As the broken glyphs are question marks, it may be that the template is UTF-8 or UTF-16 while the HTML declares a more limited charset.)
If that doesn't work, look at your code and pay attention to how the raw bytes from the template are converted to Strings. For example:
String template = new String(bytesFromFile);
would use the system default Charset, which might differ from the file's encoding. The safe way to convert the bytes to a String is to specify the character set explicitly:
String template = new String(bytesFromFile, StandardCharsets.UTF_8); // java.nio.charset.StandardCharsets, avoids the checked UnsupportedEncodingException
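If the mail goes out through Spring's JavaMailSender, the message itself must also be built with an explicit encoding, or the correctly decoded String can still be re-encoded badly on the way out. A minimal sketch of both steps, assuming a hypothetical template path and an HTML body:

import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.mail.internet.MimeMessage;
import org.springframework.mail.javamail.JavaMailSender;
import org.springframework.mail.javamail.MimeMessageHelper;

public class GermanMail {
    public static void send(JavaMailSender mailSender) throws Exception {
        // Decode the template with an explicit charset, not the platform default.
        byte[] bytesFromFile = Files.readAllBytes(Paths.get("templates/mail.liquid"));
        String template = new String(bytesFromFile, StandardCharsets.UTF_8);

        MimeMessage message = mailSender.createMimeMessage();
        // The third constructor argument fixes the encoding of subject and body.
        MimeMessageHelper helper =
                new MimeMessageHelper(message, true, StandardCharsets.UTF_8.name());
        helper.setTo("user@example.com");
        helper.setSubject("Schöne Grüße"); // German umlauts survive intact
        helper.setText(template, true);    // true = treat the body as HTML
        mailSender.send(message);
    }
}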
Related
I have a CSV with content that is UTF-8 encoded. However, various applications and systems erroneously detect the encoding of the CSV as Windows-1252, which breaks all the special characters in the file (e.g. umlauts).
I can see that Sublime Text (on Windows), for example, also automatically detects the wrong Windows-1252 encoding when opening the file for the first time, showing garbled text where special characters are supposed to be.
When I choose Reopen with Encoding » UTF-8, everything looks fine, as expected.
Now, to find the source of the error, I thought it might help to figure out why these applications are not automatically detecting the correct encoding in the first place. Maybe there is a stray character somewhere with the wrong encoding, for example.
The CSV in question is actually an automatically generated product export of a Magento 2 installation. Recently the character encoding broke, and I am currently trying to figure out what happened; hence my investigation into why this export is detected as Windows-1252.
Is there any reliable way of figuring out why the automatic detection of applications like Sublime Text assumes the wrong character encoding?
This is what I did in the end to find out why the file was not detected as UTF-8, i.e. to find the characters that were not encoded in UTF-8. Since PHP is more readily available to me, I decided to simply use the following script to force-convert anything that is not UTF-8 to UTF-8, using the very handy neitanod/forceutf8 library:
$before = file_get_contents('export.csv');       // raw bytes of the original export
$after = \ForceUTF8\Encoding::toUTF8($before);   // re-encode anything that is not valid UTF-8
file_put_contents('export.fixed.csv', $after);   // write the normalized copy
Then I used a file comparison tool like Beyond Compare to compare the two resulting CSVs, in order to see more easily which characters were not originally encoded in UTF-8.
This in turn showed me that only one particular column of the export was affected. Upon further investigation I found out that the contents of that column were processed in PHP with the following preg_replace:
$value = preg_replace('/([^\pL0-9 -])+/', '', $value);
Using \p in the regular expression without the u modifier had an unintended side effect: all the special characters were converted to another encoding. A quick solution is to add the u flag to the regex (see the regex pattern modifiers reference), which forces preg_replace to treat the pattern and subject as UTF-8, so the result stays UTF-8. See also this answer.
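As a side note: if you would rather locate the offending bytes directly than diff two files, a strict decoder reports the exact byte offset of the first invalid sequence. A small illustrative sketch in Java (the file name is just an example):

import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class Utf8Check {
    public static void main(String[] args) throws Exception {
        byte[] bytes = Files.readAllBytes(Paths.get("export.csv"));
        // REPORT makes the decoder throw instead of silently substituting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(bytes);
        try {
            CharBuffer decoded = decoder.decode(in);
            System.out.println("Valid UTF-8, " + decoded.length() + " characters");
        } catch (CharacterCodingException e) {
            // The buffer position stops right at the offending sequence.
            System.out.println("Invalid UTF-8 near byte offset " + in.position());
        }
    }
}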
I would like to translate a game; this game loads its strings from a text file.
The destination language uses non-ASCII characters, so I naïvely saved my file as UTF-8, but it does not work: letters with diacritics are not shown correctly.
Looking more closely at the configuration file where the strings' text file name is set, I found a CHARSET option that can take any of these values:
ANSI_CHARSET DEFAULT_CHARSET SYMBOL_CHARSET MAC_CHARSET SHIFTJIS_CHARSET HANGEUL_CHARSET JOHAB_CHARSET GB2312_CHARSET CHINESEBIG5_CHARSET GREEK_CHARSET TURKISH_CHARSET VIETNAMESE_CHARSET HEBREW_CHARSET ARABIC_CHARSET BALTIC_CHARSET RUSSIAN_CHARSET THAI_CHARSET EASTEUROPE_CHARSET OEM_CHARSET
As far as I understood, these are fairly standard values in the Windows APIs, where "charset" and "character encoding" are synonymous.
So my question is: is there a correspondence between these names and standard names like UTF-8 or ISO-8859-2? If so, what is it?
Try using EASTEUROPE_CHARSET
ISO 8859-2 is mostly equivalent to Windows-1250. According to this MSDN article, the 1250 code page is accessed using EASTEUROPE_CHARSET.
Note that you will need to save your text file in the 1250 code page, as ISO 8859-2 is not exactly equivalent. From Wikipedia:
Windows-1250 is similar to ISO-8859-2 and has all the printable characters it has and more. However a few of them are rearranged (unlike Windows-1252, which keeps all printable characters from ISO-8859-1 in the same place). Most of the rearrangements seem to have been done to keep characters shared with Windows-1252 in the same place as in Windows-1252 but three of the characters moved (Ą,Ľ,ź) cannot be explained this way.
The names are symbolic identifiers for Windows code pages, which are character encodings (= charsets) defined or adopted by Microsoft. Many of them are registered at IANA with the prefix windows-. For example, EASTEUROPE_CHARSET stands for code page 1250, which has been registered as windows-1250 and is often called Windows Latin 2.
UTF-8 is something different. You need special routines to read and write UTF-8 encoded data. UTF-8 or UTF-16 is generally the only sensible choice for character encoding when you want to be truly global (support different languages and writing systems). For a single specific language, some of the code pages might be more practical in some cases.
You can get the standard encoding names (as registered by IANA) using the table under the Remarks section of this MSDN page.
Just find the Character set row and read the Code page number; the standard name is windows-[code page number].
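Once you know the code page, transcoding the translation file is straightforward. A minimal Java sketch, assuming hypothetical file names and the windows-1250 / EASTEUROPE_CHARSET mapping described above:

import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class Cp1250Save {
    public static void main(String[] args) throws Exception {
        // Decode the translation with the charset it was written in...
        String text = new String(
                Files.readAllBytes(Paths.get("strings.utf8.txt")), StandardCharsets.UTF_8);
        // ...then re-encode with the code page the game expects.
        // Characters with no windows-1250 mapping are replaced with '?'.
        Files.write(Paths.get("strings.cp1250.txt"),
                text.getBytes(Charset.forName("windows-1250")));
    }
}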
The application works when I use the following code:
xulschoolhello.greeting.label = Hello World?
But when I use Unicode, the application does not work:
xulschoolhello.greeting.label = سلام دنیا ?
Why does this not work?
I don't have a problem loading that string in my extension in a XUL file from chrome://. Make sure you are not overriding the encoding (UTF-8 by default). See this page for more information.
To make sure, change your XUL's first line to:
<?xml version="1.0" encoding="UTF-8" ?>
In case you are using this in a properties file, make sure you save the .properties file in UTF-8 format. From Property Files - XUL | MDN:
Non-ASCII Characters, UTF-8 and escaping
Gecko 1.8.x (or later) supports property files encoded in UTF-8. You can and should write non-ASCII characters directly without escape sequences, and save the file as UTF-8 without BOM. Double-check the save options of your text editor, because many don't do this by default. See Localizing extension descriptions for more details.
In some cases, it may be useful or needed to use escape sequences to express some characters. Property files support escape sequences of the form \uXXXX, where XXXX is a Unicode character code. For example, to put a space at the beginning or end of a string (which would normally be stripped by the properties file parser), use \u0020.
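If you do need those escape sequences, they are easy to generate mechanically. A small illustrative sketch in Java (the helper name is my own invention):

public class PropEscape {
    // Turn every non-ASCII character into its \uXXXX escape sequence.
    static String escapeNonAscii(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c < 128) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04X", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeNonAscii("سلام دنیا"));
        // prints: \u0633\u0644\u0627\u0645 \u062F\u0646\u06CC\u0627
    }
}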
I want to show Shift-JIS characters, but only when displaying: store in UTF-8 and show in Shift-JIS. What is the solution for doing that in Smarty?
You cannot mix different charsets/encodings in the output sent to the browser, so you can send either UTF-8 or Shift-JIS.
You can use UTF-8 internally and, in an output filter, convert the complete output from UTF-8 to Shift-JIS (using mb_convert_encoding).
Smarty is not (really) equipped to deal internally with charsets other than ASCII supersets (like Latin-1 or UTF-8).
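The same boundary-conversion idea, sketched in Java for illustration (since the other answers in this thread use it): keep everything Unicode internally and encode to Shift_JIS only when writing the final output.

import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;

public class ShiftJisBoundary {
    public static void main(String[] args) throws Exception {
        String page = "日本語のテキスト"; // kept as Unicode internally
        ByteArrayOutputStream response = new ByteArrayOutputStream();
        // Only the outermost writer knows about Shift_JIS; this mirrors
        // converting the complete rendered output in one place.
        try (Writer out = new OutputStreamWriter(response, Charset.forName("Shift_JIS"))) {
            out.write(page);
        }
        System.out.println(response.size() + " Shift_JIS bytes produced");
    }
}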
If we don't specify the encoding, which encoding will these methods use?
I do not think it's System.Text.Encoding.Default: things work well if I explicitly pass System.Text.Encoding.Default, but go wrong when I leave it out.
So this doesn't work well:
Dim b = System.IO.File.ReadAllText("test.txt")
System.IO.File.WriteAllText("test4.txt", b)
but this works well:
Dim b = System.IO.File.ReadAllText("test.txt", System.Text.Encoding.Default)
System.IO.File.WriteAllText("test4.txt", b, System.Text.Encoding.Default)
If we do not specify an encoding, will VB.NET try to figure out the encoding from the text file?
Also, what is System.Text.Encoding.Default?
It's the system default. What is my system default and how can I change it?
How do I know the encoding used in a text file?
If I create a new text file and open it with SciTE, I see that the encoding is shown as a "code page property". What is a code page property?
Look here: "This method attempts to automatically detect the encoding of a file based on the presence of byte order marks. Encoding formats UTF-8 and UTF-32 (both big-endian and little-endian) can be detected."
See also http://msdn.microsoft.com/en-us/library/ms143375(v=vs.110).aspx
As for writing, the WriteAllText overload without an encoding argument "uses UTF-8 encoding without a byte-order mark (BOM)".
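For illustration, that automatic detection amounts to sniffing the first few bytes for a byte order mark, falling back to a default (UTF-8 for ReadAllText) when none is found. A rough Java sketch of the mechanism, not the actual .NET implementation:

import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class BomSniffer {
    // Return the charset indicated by a leading byte order mark,
    // or null if there is none (the caller then falls back to a default).
    static Charset detectBom(byte[] b) {
        // UTF-32LE (FF FE 00 00) must be tested before UTF-16LE (FF FE).
        if (b.length >= 4 && b[0] == (byte) 0xFF && b[1] == (byte) 0xFE
                && b[2] == 0x00 && b[3] == 0x00) return Charset.forName("UTF-32LE");
        if (b.length >= 4 && b[0] == 0x00 && b[1] == 0x00
                && b[2] == (byte) 0xFE && b[3] == (byte) 0xFF) return Charset.forName("UTF-32BE");
        if (b.length >= 3 && b[0] == (byte) 0xEF && b[1] == (byte) 0xBB
                && b[2] == (byte) 0xBF) return StandardCharsets.UTF_8;
        if (b.length >= 2 && b[0] == (byte) 0xFF && b[1] == (byte) 0xFE) return StandardCharsets.UTF_16LE;
        if (b.length >= 2 && b[0] == (byte) 0xFE && b[1] == (byte) 0xFF) return StandardCharsets.UTF_16BE;
        return null;
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = Files.readAllBytes(Paths.get("test.txt"));
        Charset cs = detectBom(bytes);
        System.out.println(cs != null ? cs : "no BOM; fall back to a default");
    }
}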