Exporting to CSV (using built in tools) - incorrect characters

When I export data from a screen using the built-in show-csv-button="true" option, the application prints incorrect characters in place of special characters (the EUR sign, accented letters). Can I fix this by changing the character set?

This seems to be a matter of how MS Excel loads CSV files rather than of the export itself. Opening the file directly from the browser's downloads causes the problem. When I import the file the standard way and let Excel check the character set, the data is OK.
Excel's import utility recognizes the file as UTF-8.
When the file is opened directly from the OS, code page 1250 (Central European) is used instead.
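
A common workaround for the behaviour described above is to start the CSV file with a UTF-8 byte order mark (BOM), which Excel generally uses as a hint to read the file as UTF-8 even when it is opened directly. The source does not say whether the built-in export can be configured to do this, so the following is only a minimal Python sketch of the technique; the function name and the sample rows are hypothetical.

```python
import csv
import io


def rows_to_excel_csv_bytes(rows):
    """Serialize rows to CSV bytes that start with a UTF-8 BOM."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer).writerows(rows)
    # "utf-8-sig" prepends the three bytes EF BB BF (the byte order mark).
    return buffer.getvalue().encode("utf-8-sig")


# Hypothetical sample data with the kind of characters from the question.
sample = [["item", "price"], ["café", "12 €"]]
data = rows_to_excel_csv_bytes(sample)
```

Writing `data` to a file in binary mode gives a CSV whose first three bytes are the BOM; the rest is ordinary UTF-8.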

Related

Prevent Windows changing ANSI characters

I'm having this problem with (I assume) Windows.
I need to convert an existing application written in Delphi 7 into a multilingual application.
I'm using the Delphi ITE and everything works well, except that when the file is saved and recompiled, special characters like ë are converted to e.
I thought that this is a Delphi issue but then realized that, even if I create text document in notepad, insert the character ë, save the document in ANSI format, and then open it again, the character is converted to e.
Is there any workaround to this, except for upgrading to Delphi 2009 and using unicode components?
How can I make Windows keep the original character?
Obviously this is not a coding issue, so no relevant code could be posted.
Thanks

UTF-8 Tab Separated File not opening in Excel properly

I am generating a tab-separated .xls file from SAP MII using XSLT, and it has a few French characters in its headers/columns.
When I open this file in Excel, all special characters are messed up and do not appear correctly.
Such as Description Ã©vÃ¨nement
When I store this file as .txt and open it in notepad, everything is displayed correctly.
It seems that Excel is not opening this file in UTF-8 format but opens it by default in ASCII format.
How do I avoid that?
Soham

You have to open Excel and then import the CSV from the Data tab. UTF-8 will be recognized during the import. That said, I do need the ability to do this by simply double-clicking on the CSVs.
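
The garbled header in the question is what UTF-8 bytes look like when they are decoded with a single-byte Windows code page. A minimal Python sketch, assuming the reader used Windows-1252 (the question says "ASCII", but the output shown matches code page 1252), reproduces the effect and reverses it:

```python
original = "Description évènement"

# The file on disk holds the UTF-8 bytes of the text.
utf8_bytes = original.encode("utf-8")

# Reading those bytes as Windows-1252 turns each two-byte sequence
# into two separate characters, which is the garbling seen in Excel.
garbled = utf8_bytes.decode("cp1252")

# Nothing was lost, so the mistake can be undone by reversing it.
repaired = garbled.encode("cp1252").decode("utf-8")
```

This only illustrates why the text looks wrong; it does not change how Excel picks an encoding when a file is double-clicked.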

Why do Netbeans, Aptana Studio and Komodo Edit all not save in UTF-8?

I'm getting back into development and want to find a good editor for HTML5/JQuery.
Being able to save files in UTF-8 is important.
However, although I set my project in NetBeans 7.0 to encode in UTF-8, when I create a file in the project and then look at it in Notepad++, the file is encoded in ANSI and I have to manually set the encoding to UTF-8.
In Aptana Studio 3 I set the workspace to UTF-8 encoding, and my project inherits from that, but when I create a file in the project and look at it in Notepad++, it is encoded in ANSI and I have to change the encoding manually to UTF-8.
So I tried Komodo Edit 7, manually set the encoding of the file to UTF-8, saved the file, and looked at it in Notepad++, which said the file is in ANSI.
I notice in any of these editors if I put a German umlaut character in the file, then Notepad++ shows it as "ANSI as UTF-8" but I still have to manually change it to UTF-8 in Notepad++ where it will stay.
The reason I want an editor that saves in UTF-8 is I remember having a project a couple years ago which had German and French characters in the files and after they were viewed and saved in various editors, the characters would be replaced with garbage characters. The solution was to always initially set the encoding of the file to UTF-8.
I assumed that editors would be so far advanced now that if you specify that the files should be saved in UTF-8, that they actually save in UTF-8 in a way that is recognized by every modern text editor. Is this not the case? What am I not understanding about modern text editors and development environments in regard to UTF-8?
How can I get these editors to save their files in UTF-8 encoding?

A UTF-8 encoded file that only contains characters also present in the ASCII table (the first 128 Unicode characters, i.e. your basic alphanumeric characters) is indistinguishable from an ASCII/ANSI encoded file. My guess is that Notepad++ simply can't make the distinction (because there is none) and defaults to ANSI. You can see the difference when you include a character that is not in the ASCII table. By "ANSI as UTF-8" I can only guess that it means "this document contains characters from the ANSI table (a.k.a. Latin-1) and is saved in UTF-8".
In other words, your IDEs are probably fine, the problem is with Notepad++.
Try a character like 漢字, that will result in a pretty unique UTF-8 byte sequence that's most certainly not ANSI.
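
The point about ASCII-only files can be checked directly. The following Python sketch (the sample strings and the helper name are made up for illustration) compares the bytes produced by UTF-8 and by Windows-1252, which is assumed here to be what the editors call "ANSI":

```python
ascii_only = "plain text 123"
with_umlaut = "Grüße"
cjk = "漢字"

# ASCII-only text produces identical bytes in both encodings,
# so no editor can tell which one was "meant".
same_bytes = ascii_only.encode("utf-8") == ascii_only.encode("cp1252")

# As soon as a non-ASCII character appears, the bytes differ.
differs = with_umlaut.encode("utf-8") != with_umlaut.encode("cp1252")

# Each of these two CJK characters takes three bytes in UTF-8.
cjk_utf8 = cjk.encode("utf-8")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

An editor can use a check like `is_valid_utf8` to guess the encoding, but for ASCII-only content the check passes either way, which is consistent with Notepad++ falling back to ANSI.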

From what I've seen on this topic, Notepad's UTF-8 equates to Notepad++'s UTF-8, which means with BOM included. If a file is saved with this encoding and opened in NetBeans, it will actually show a - character or the ï»¿ characters for the BOM sequence (depending on whether the encoding for the project or IDE is set to UTF-8.) But if you save the file in Notepad++ encoded as "UTF-8 without BOM", and have either your project defined as UTF-8 or have your netbeans_default_options included with this -J-Dfile.encoding=UTF-8, you'll see what I think is UTF-8 as it should be. Unfortunately, if you try to edit this file in NetBeans without including characters that are outside of the ANSI code set, you see the behavior that you referred to in your question with the file having its encoding set to ANSI.
So in an attempt to make this a "sort-of" answer to your question, please remember that not all editors' concepts of UTF-8 are the same. Notepad++ gives the most accurate information on what the real encoding of a file is. I'd say that developing in either a Linux or Mac environment might be a good choice for making sure that localization is correct, but on Windows a decent workaround might be to just include a non-ANSI character in the file to ensure it always gets saved as a UTF-8 (non-BOM) file. This is all geared towards NetBeans development, by the way. I haven't tested this with the others, though I'm willing to bet that they will save the file correctly on a Windows machine if they have non-ANSI characters in them. Sorry for the kludge, gang, but either way, I hope it helps someone struggling with this same issue.
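
To see which variant a given file actually is, the first three bytes can be inspected. This is a small Python sketch (the helper name is hypothetical) that detects and strips a UTF-8 BOM, and also shows why the BOM appears as stray characters when it is read as Windows-1252:

```python
import codecs


def split_bom(data: bytes):
    """Return (has_bom, payload) for bytes that may start with a UTF-8 BOM."""
    if data.startswith(codecs.BOM_UTF8):
        return True, data[len(codecs.BOM_UTF8):]
    return False, data


# The BOM bytes EF BB BF, misread as Windows-1252, become three
# visible characters at the start of the file.
bom_as_cp1252 = codecs.BOM_UTF8.decode("cp1252")
```

For example, `split_bom(open(path, "rb").read())` on a file saved by Notepad as UTF-8 would be expected to report a BOM, while the same file saved as "UTF-8 without BOM" would not.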

TextPad and Unicode: full support?

I've got some UTF-8 files created in Mac, and when trying to open them using TextPad in Windows, I get the following warning:
WARNING: (file name) contains characters that do not exist in code
page 1252 (ANSI Latin 1). They will be converted to the system default
character, if you click OK.
Linux (GNOME gEdit) can open the same file without complaints. What does the above mean? I thought that TextPad had full UTF-8 support. Can I safely open and edit UTF-8 files using it without corrupting the file?

It seems that TextPad cannot handle characters outside windows-1252 (CP1252, here carrying the misnomer “ANSI Latin 1”). I tested it on Windows, opening a plain text file created on the same system, as UTF-8 encoded, both with and without BOM, with the same result. The program’s help does not seem to contain anything related to character encodings, and its tools for writing “international characters” are for Latin-1 characters only.
There are several text editors for Windows that can deal with UTF-8 (even Notepad can open a UTF-8 file, but it can hardly be recommended for serious editing). See Alan Wood’s collection of information on Unicode editors and word processors for Windows. (Personally, I like Notepad++ and BabelPad, which are both free.)

TextPad 8, the newest as of 2016-01-28, does finally properly support BMP Unicode. It's a paid upgrade, but so far has been working flawlessly for me.

TextPad ‘supports’ UTF-8 and UTF-16 documents only in as much as it will import and export them. But it still edits files as simple bytes, and not Unicode characters (using the ANSI code page, which is code page 1252 for Western European).
So unless the file happened to contain only characters that also exist in that code page, you will lose content. This rather defeats the point of Unicode.
Indeed, this was the issue that made me flee—to EmEditor, at the time, though now I would agree with the previous comments and recommend Notepad++. The era of paying for text editors is long gone.
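
The content loss described in this answer can be checked for before opening a file in an editor that works in a single code page. A minimal Python sketch, assuming code page 1252 as in the answer (the function name is made up for illustration):

```python
def unrepresentable(text: str, encoding: str = "cp1252") -> list:
    """List the distinct characters of text that the encoding cannot store."""
    lost = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            if ch not in lost:
                lost.append(ch)
    return lost


# With errors="replace", each such character is silently turned into "?".
replaced = "漢字".encode("cp1252", errors="replace")
```

An empty result means the text survives a round trip through the code page; anything else is what would be converted to the default character.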

Actually, TextPad does support displaying Unicode code points, granted they went about it the wrong way. In order to display the Unicode characters you have to choose Configure->Preferences and expand Document Classes->Text->Font.
You need to choose a Unicode font AND set the Script to match. E.g. Arial Unicode MS with script CHINESE_BIG5.
However, this is a backward approach since the application should handle this when the user tells TextPad to open the file in Unicode or UTF-8. The built in Notepad application with MS Windows will detect the encoding automatically and display the glyphs correctly based upon the encoding.

I found a discussion on this in the Textpad forums:
http://forums.textpad.com/viewtopic.php?t=11019
While I have Notepad++, Textpad handles large files with ease while other editors I've tried, including Notepad++, either slow to a crawl or die. I'm currently trying to edit a 475MB file and Notepad++ is not up to the task.

Textpad Configure Menu --> Preferences --> Document Classes --> Default --> Default encoding --> UTF-8

Try the ANSI code set with File/Open, that should solve the problem in TextPad

Globalizing source code

I'm running an open source project and every now and then Chinese users report build errors due to unrecognized escape sequences in .cs and .js files. When they paste the files as they see them, I notice that the Latin characters are changed into Chinese.
I'm using Visual Studio and when I look at "Advanced Save Options" the setting is "Western European (Windows) - Codepage 1252".
Should I be using Unicode (UTF-8 with signature)? Is there a way to convert all files in the solution?

You should probably use UTF-8 (or UTF-16) so that you can handle all characters. Codepage 1252 is almost the same as ISO-8859-1, and neither can represent Chinese characters.
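
As for converting every file at once, one option outside Visual Studio is a small script. The following Python sketch is only an illustration under stated assumptions: the files are assumed to be either Windows-1252 or already UTF-8, the function name and default patterns are hypothetical, and it rewrites files in place, so it should be run on a copy or a clean checkout.

```python
import codecs
from pathlib import Path


def convert_tree(root, patterns=("*.cs", "*.js"), source_encoding="cp1252"):
    """Rewrite matching files under root as UTF-8 with a signature (BOM)."""
    converted = []
    for pattern in patterns:
        for path in sorted(Path(root).rglob(pattern)):
            raw = path.read_bytes()
            if raw.startswith(codecs.BOM_UTF8):
                continue  # already UTF-8 with signature, leave untouched
            try:
                # Heuristic: bytes that are valid UTF-8 are kept as they are.
                text = raw.decode("utf-8")
            except UnicodeDecodeError:
                # Otherwise assume the legacy code page; this raises if the
                # file contains bytes that the code page does not define.
                text = raw.decode(source_encoding)
            path.write_bytes(codecs.BOM_UTF8 + text.encode("utf-8"))
            converted.append(path)
    return converted
```

Calling `convert_tree("path/to/solution")` returns the list of files it rewrote. The UTF-8 check is a heuristic, not a guarantee, so it is worth reviewing the diff afterwards.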
