No UTF8 encoding in Magento blocks - magento

I am using some widgets for showing new/bestselling items on my front page. These widgets need some blocks with the articles in them.
Recently, I moved my Magento folder and database to another server, and since then some Unicode characters (€, ä, ü, etc.) are not shown correctly anymore. At first I thought this could be a database issue, but the encoding is the same. Whatever non-ASCII character I type in gets garbled, e.g. € becomes â‚¬ and ä becomes Ã¤.
Do you know what could be causing this problem and how to solve it? I am confused, because I did not change any files or settings.

The "binary" transfer mode of FTP copies files exactly, byte for byte; "ASCII" mode may alter bytes in transit.
Try transferring the files again in binary mode.
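Before re-transferring everything, it can help to confirm that the symptom really is the classic "UTF-8 bytes decoded as Windows-1252" mix-up, which is exactly what the â‚¬/Ã¤ pattern suggests. A minimal Python sketch of that diagnosis (the helper name `looks_double_encoded` is invented here for illustration):

```python
def looks_double_encoded(text: str) -> bool:
    """Heuristic: True if `text` looks like UTF-8 bytes that were
    mistakenly decoded as Windows-1252 (e.g. '€' shown as 'â‚¬')."""
    try:
        # Undo the suspected wrong decode, then try the right one.
        return text.encode("cp1252").decode("utf-8") != text
    except (UnicodeEncodeError, UnicodeDecodeError):
        return False

print(looks_double_encoded("â‚¬"))  # True - the mangled euro sign
print(looks_double_encoded("€"))    # False - a healthy euro sign
```

If this returns True for strings coming out of your pages, the bytes themselves are fine and the mismatch is in whichever layer decodes them (DB connection charset, HTTP headers, or the transfer), not in the content.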

Related

PC-DOS vs MS-DOS vs Windows multilingual text files

As far as I know, in 1987 PC-DOS 3.3 as well as MS-DOS 3.3 were released, and they had several code pages (850, 860, 863, 865).
Does that mean a user could write text using Portuguese (cp860) and, say, Nordic (cp865) symbols in one file?
Or was it something like one code page per operating system? For example, PC-DOS from Portugal had only code page 860, and the user could use symbols only from that code page, while PC-DOS from Scandinavia had only code page 865.
The same question about Windows: starting with which version did it support multilingual text documents?
DOS had no real knowledge of code pages. Strings were just byte sequences (zero- or dollar-terminated).
Code pages were used mostly for display: changing the code page changes how a given byte value is drawn on screen.
What you describe is a frequent problem: mixed encodings in one text. If you are old enough, you will remember a lot of such problems on the web. A plain text file has no tag or metadata about its code page. If you mix encodings, you will just see the characters according to the active code page; change the screen's code page and you get a new interpretation of the same bytes.
You can do anything you want in your own file. It's communicating how to read it to others that would be a problem.
So, no, not really. Using more than one character encoding in a file and calling it a text file would be more trouble than it's worth.
The settings of an operating system do not have a direct relationship to the contents of a file. Programs that exchange files between systems (such as over the Internet) might use knowledge of the source character encoding plus a local character-encoding setting and perform a lossy transcoding.
Nothing has changed, except that with the advent of Unicode more than 25 years ago, more scripts than you can imagine became available in one character set. So, if there is any transcoding to be done, ideally it would only be to UTF-8.
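The display-time nature of code pages is easy to demonstrate: the very same byte renders as different characters under cp860 and cp865. A small Python sketch using the standard library's code-page codecs:

```python
# One byte, no metadata: its meaning depends entirely on the active
# code page used to render it.
raw = b"\x84"
print(raw.decode("cp860"))  # 'ã' under the Portuguese code page
print(raw.decode("cp865"))  # 'ä' under the Nordic code page
```

This is exactly why mixing code pages in one DOS-era text file could not work: the file carries only the bytes, and the active code page picks one interpretation for all of them at once.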

RStudio: keeping special characters in a script

I wrote a script with German special characters e.g. ü.
However, whenever I close R and reopen the script, the characters are substituted:
Before "für"; "hinzufügen"; "Ø" - After "fÃ¼r"; "hinzufÃ¼gen"; "Ã˜".
I tried to remedy it using Save with Encoding and choosing UTF-8, as suggested here, but it did not work.
What am I missing?
You don't say what OS you're using, but this kind of thing really only happens on Windows nowadays, so I'll assume that.
The problem is that Windows has a local encoding that is not UTF-8. It is commonly something like Latin1 in English-speaking countries. I'm not sure what encoding people use in German-speaking countries, if that's where you are. From the junk you saw, it looks as though you saved the file in UTF-8, then read it using your local encoding. The encodings for writing and reading have to match if you want things to work.
In RStudio you can try "Reopen with encoding..." and specify UTF-8, and you'll probably get your original back, as long as you haven't saved it after the bad read. If you did that, you've got a much harder cleanup to do.
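The mismatch the answer describes can be reproduced outside RStudio. Here is a Python sketch, with Latin-1 standing in for the Windows locale encoding:

```python
text = "für"
utf8_bytes = text.encode("utf-8")       # what "Save with Encoding: UTF-8" writes
mangled = utf8_bytes.decode("latin-1")  # file reopened with the locale encoding
fixed = utf8_bytes.decode("utf-8")      # file reopened with the matching encoding
print(mangled)  # fÃ¼r  - the substitution seen in the question
print(fixed)    # für  - what "Reopen with Encoding: UTF-8" recovers
```

Note that `mangled` is lossless here: as long as the garbled text has not been re-saved, decoding the original bytes with the right encoding gets everything back, which is why "Reopen with Encoding" works.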

Codemirror (AjaXplorer/Linux Webserver): ANSI-files with special characters

I have an AjaXplorer installation on a Linux web server.
One of AjaXplorer's plugins is CodeMirror, used to view and edit text files.
Now I have the following situation: if I create a txt file on Windows (ANSI) and upload it into AjaXplorer (UTF-8), CodeMirror shows every special character as a question mark. Consequently, the whole file will be saved with question marks instead of the special characters.
But once a file has been saved as UTF-8, the special characters are saved correctly.
So the problem lies in opening the ANSI file. This is the point where I have to implement the solution, for example converting ANSI to UTF-8.
The 'funny' thing is that if I open a freshly uploaded ANSI file and a saved UTF-8 file with Vim on the Linux console, they look exactly the same, but the output in CodeMirror is different.
This is a uploaded ANSI-File with special characters like ä and ö and ü
Output in Codemirror: 'like ? and ? and ?'
This is a saved UTF8-File with special characters like ä and ö and ü
Output in Codemirror 'like ä and ö and ü'
This is the CodeMirror-Class of AjaXplorer and I think here must be the point where I could intervene:
https://github.com/mattleff/AjaXplorer/blob/master/plugins/editor.codemirror/class.CodeMirrorEditor.js
As you may see, I'm not a pro, and I have already tried some code snippets - otherwise I would already have the solution ;-) I would be happy if someone could give me a hint. Thank you!
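One way to attack this server-side, before the file ever reaches CodeMirror, is to transcode the upload. The actual AjaXplorer hook would be PHP/JavaScript (see the linked class.CodeMirrorEditor.js); this Python sketch only illustrates the transcoding step, and it assumes the Windows "ANSI" files are cp1252 (the function name is invented here):

```python
def ansi_to_utf8(data: bytes) -> bytes:
    """Convert a cp1252 ('ANSI') upload to UTF-8.
    Assumption: the upload really is cp1252; a wrong guess will
    silently produce wrong characters rather than question marks."""
    return data.decode("cp1252").encode("utf-8")

sample = "like ä and ö and ü".encode("cp1252")
print(ansi_to_utf8(sample).decode("utf-8"))  # like ä and ö and ü
```

The question marks appear because a UTF-8 reader rejects the single-byte cp1252 codes for ä/ö/ü as invalid sequences; converting on upload removes the ambiguity once and for all.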

Some Chinese, Japanese characters appear garbled on IE/FF

We have a content rendering site that displays content in multiple languages. The site is built using JSP, with content fetched from an Oracle DB. All our pages are UTF-8 compliant.
When displaying zh/ja content, only some of the characters appear garbled (square boxes on IE and diamond question marks on FF). The data in the DB does not have any garbled characters. Since we don't understand the language, we don't know which characters are problematic. Would appreciate some pointers to the solution, please. Could it be that some characters appear invalid to the browsers?
Example in FF:
ネット犯���者 がアプ
脆弱性保護機能 - ネット犯���者 がアプリケーションのセキュリティホール (脆弱性) を突いて、パソコンに脅威を侵入させることを阻止します。
If in doubt about the sanity of a UTF-8 encoding, you can always re-encode it, either with a good text editor or with a specialized tool like iconv:
iconv -f UTF-8 -t UTF-8 yourfile > yourfile2
If your file is indeed invalid, iconv will also give you some information about the problem.
But, another way you might want to explore is installing new fonts for Far East languages…
Indeed, not knowing the actual bytes used in your file, it is hard to say why they are replaced with the replacement character � (U+FFFD). You might want to post a hex dump of the parts of your file that do not work.
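To narrow down which bytes are bad without being able to read the language, you can scan the raw data programmatically, in the same spirit as the iconv check. A small Python sketch (the function name is made up for illustration):

```python
def first_invalid_utf8(data: bytes):
    """Return the offset of the first invalid UTF-8 byte, or None."""
    try:
        data.decode("utf-8")
        return None
    except UnicodeDecodeError as err:
        return err.start

print(first_invalid_utf8("脆弱性".encode("utf-8")))  # None - valid UTF-8
print(first_invalid_utf8(b"ab\xe7\x8a"))             # 2 - truncated multibyte sequence
```

Run this over the bytes as fetched from the DB and again over the bytes as served to the browser; if the first pass is clean and the second is not, the corruption happens in the JSP/serving layer, which matches the "only some characters garbled" symptom.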

tcl utf-8 characters not displaying properly in ui

Objective : To have multi language characters in the user id in Enovia v6
I am using UTF-8 encoding in a Tcl script, and it seems to save multi-language characters properly in the database (after some conversion). But in the UI I literally see the saved information from the database.
While doing the same exercise through Power Web, the saved data somehow gets converted back into proper multi-language characters and displays properly.
Am I missing something with the Tcl approach?
Pasting one example to help understand better.
Original name: Kátai-Pál
Name saved in the database as: KÃ¡tai-PÃ¡l
In the UI I see the name as: KÃ¡tai-PÃ¡l
In Tcl I use the syntax below:
set encoded [encoding convertto utf-8 Kátai-Pál];
Now the user name becomes: KÃ¡tai-PÃ¡l
In the UI I see the name as "KÃƒÂ¡tai-PÃƒÂ¡l"
The trick is to think in terms of characters, not bytes. They're different things. Encodings are ways of representing characters as byte sequences (internally, Tcl's really quite complicated, but you shouldn't ever have to care about that if you're not developing Tcl's implementation itself; suffice to say it's Unicode). Thus, when you use:
encoding convertto utf-8 "Kátai-Pál"
You're taking a sequence of characters and asking for the sequence of bytes that encodes those characters in the given encoding (UTF-8); Tcl represents that byte sequence as a string with one character per byte.
What you need to do is to get the database integration layer to understand what encoding the database is using so it can convert back into characters for you (you can only ever communicate using bytes; everything else is just a simplification). There are two ways that can happen: either the information is correctly shared (via metadata or defined convention), or both sides make assumptions which come unstuck occasionally. It sounds like the latter is what's happening, alas.
If you can't handle it any other way, you can take the bytes produced out of the database layer and convert into characters:
encoding convertfrom $theEncoding $theBytes
Working out what $theEncoding should be is in general very tricky, but it sounds like it's utf-8 for you. Once you've got characters, Tcl/Tk will be able to display them correctly; it knows how to transfer them correctly into the guts of the platform's GUI. (And in scripts that you actually write, you're best off replacing non-ASCII characters with their \uXXXX escapes, because platforms don't agree on what encoding is right to use for scripts. Alas.)
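The double-encoding trap the answer describes is language-independent. Here it is sketched in Python, with Latin-1 standing in for the misinterpreting layer: one encode too many produces exactly the "KÃ¡tai-PÃ¡l" style of symptom:

```python
name = "Kátai-Pál"
correct = name.encode("utf-8")  # what the storage layer should hold
# The convertto-then-store path effectively encodes twice:
double = name.encode("utf-8").decode("latin-1").encode("utf-8")
print(correct.decode("utf-8"))  # Kátai-Pál  - one decode, correct
print(double.decode("utf-8"))   # KÃ¡tai-PÃ¡l - still mojibake after one decode
# Recovery needs the inverse of the extra step:
print(double.decode("utf-8").encode("latin-1").decode("utf-8"))  # Kátai-Pál
```

This mirrors the Tcl advice: `encoding convertto` on a string that the integration layer will encode again yields the double-encoded form, and `encoding convertfrom` with the right encoding is the recovery step.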
