Prevent Windows changing ANSI characters

I'm having this problem with (I assume) Windows.
I need to convert an existing application written in Delphi 7 into a multilingual application.
I'm using the Delphi ITE and everything works well, except that when the file is saved and recompiled, special characters like ë are converted to e.
I thought that this was a Delphi issue, but then realized that even if I create a text document in Notepad, insert the character ë, save the document in ANSI format, and then open it again, the character is converted to e.
Is there any workaround to this, except for upgrading to Delphi 2009 and using unicode components?
How can I make Windows keep the original character?
Obviously this is not a coding issue, so no relevant code could be posted.
Thanks

Related

Why some software can display all characters and some not?

Reference text: どうもありがとうございました
Copied to:
Notepad/Notepad++: displays it with no problems
LibreOffice Writer: it changes the font family to work, if you convert to Lucida Console, square boxes appear
Windows: displays it with no problems
Console: it needs the correct chcp and a font family (Lucida Console displays square boxes here too) which can display them if I am right
Is it possible to explain why Notepad can display any text in any font family while LibreOffice and the console cannot? Where are the differences? Is it possible to get the same behaviour in the console as in Notepad, for example?
Some Windows fonts have glyphs for many different scripts, some cover a few scripts, and many cover just one. (Fonts which support many scripts are sometimes called "Unicode fonts," which can be a misleading term. In other OSes, these kinds of fonts are more prevalent. Windows itself doesn't ship with any, though I think you get one or two with the Office suite.)
When you output text in multiple scripts through the standard Windows text functions with one of the well-known fonts selected, Windows uses font fallback and/or font linking, which automatically switches between fonts as needed to output the whole string. Most programs, like Notepad and Notepad++, thus get coverage automatically.
I haven't read the LibreOffice code, but I suspect that when you select a font for a span of text, it sticks with that font, effectively preventing Windows's font fallback and font linking mechanisms from helping. This isn't surprising, since a WYSIWYG editor is likely to use lower-level APIs for outputting text in order to have more typographic control. But using the lower-level APIs means you don't get fallback and linking for free, so you'd have to implement it yourself, and that's a lot of extra work that may not be important to very many users.
The Windows console has a lot of legacy and limitations that persist for backward compatibility with older programs. The console mostly emulates DOS systems, which didn't have any sort of Unicode support and instead relied on "code pages," which are, roughly speaking, alternate mappings between character values and glyphs. Each code page is geared toward just one (or maybe two) scripts, so if you need characters from another script, you are basically out of luck. I think modern versions of Windows have hacked in some support for a pseudo code page that supports UTF-8, but I've never gotten it to work well and it, too, has limitations.
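To make the code page limitation concrete, here is a minimal Python sketch. It only checks which encodings can represent the reference text at all; the helper name can_encode is made up for this illustration, and the choice of code pages (cp437 for the original DOS code page, cp932 for Japanese Windows) is an assumption about what a console might be set to.

```python
# Which code pages can represent the reference text from the question?
text = "どうもありがとうございました"

def can_encode(s: str, codec: str) -> bool:
    """Return True if every character of s exists in the given encoding."""
    try:
        s.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

# cp437: original IBM PC/DOS code page, cp1252: Western European "ANSI",
# cp932: Japanese (Shift-JIS based), utf-8: covers all of Unicode.
for codec in ("cp437", "cp1252", "cp932", "utf-8"):
    print(codec, can_encode(text, codec))
```

A console set to a code page that cannot encode the text has no byte values to map to those glyphs, whatever font is selected.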

Notepad++, Atom, encoding seems to be broken

I have to edit a PHP (.inc) file which was created a long time ago, and I don't know which editor was used to create it. In Notepad++, the Cyrillic letters are shown as if they were in the wrong encoding.
In GitHub's Atom editor, the Cyrillic letters are totally lost and replaced with the � character.
But in the browser everything is displayed correctly! The same is true when using Windows Notepad. Why is it displayed incorrectly in code editors, and is there a way to make it look normal?
P.S. OK the thought that I just can copy it from windows notepad and save in notepad++ only now came to me :D But still curious why this happened to code editors.
P.S.2 Problem is solved. Editors just didn't recognize the original encoding properly. When I changed it manually to Windows1251, everything became ok.
Atom's support for encodings isn't as mature as that of some other editors. As you have already discovered, you can change the encoding in the bottom right-hand corner and Atom will remember it. There are also some packages that help further:
Out of the box, as you have discovered, there is Encoding Selector, which allows you to choose how Atom interprets the contents of the text file.
There is a package named Auto Encoding that automatically selects the encoding for you. It does have some issues with certain types of file, but you might find this isn't a problem.
Finally, there is my personal favorite, editor-settings, which allows you to set the encoding for all files of a specific language, with a specific file extension, or in a specific directory.
As an example, if you wanted to configure all .inc files in a directory to use windows-1251, create a .editor-settings file in that directory and paste in the following:
encoding: utf-8
extensionConfig:
  inc:
    encoding: windows-1251
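For anyone curious why the editors showed garbage in the first place, the effect can be reproduced in a few lines of Python. This is only a sketch: the sample word is made up, and it assumes the editors guessed a Western European code page (cp1252) and UTF-8 respectively, which matches the symptoms described in the question but is not confirmed.

```python
# Bytes as an old editor might have saved them, using Windows-1251.
original = "Привет"
raw = original.encode("cp1251")

# Wrong guess 1: Western European code page. Every byte is valid there,
# so we get accented Latin letters instead of Cyrillic.
as_1252 = raw.decode("cp1252")

# Wrong guess 2: UTF-8. The bytes are not valid UTF-8, so a lenient
# decoder substitutes U+FFFD (the replacement character) for them.
as_utf8 = raw.decode("utf-8", errors="replace")

# Right guess: the encoding the file was actually written in.
as_1251 = raw.decode("cp1251")

print(as_1252)
print(as_utf8)
print(as_1251)
```

The bytes on disk never change; only the table used to interpret them does, which is why picking Windows-1251 manually fixes the display.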

Mac text editor that support LineBreak and Encoding detection/change

I'm used to Windows and I'm trying to find replacement tools in the Mac world.
I use Notepad++ on Windows and I frequently use its ability to detect and change the charset and the line break mode of a file.
I have found some editors that support changing line breaks, but I haven't found one that can display the charset of the current file and convert it to another.
Thank you for your help!
Yes: I recommend BBEdit or its free smaller brother TextWrangler. Both are very good and both handle changes to encoding and line breaks etc well.

Why do Netbeans, Aptana Studio and Komodo Edit all not save in UTF-8?

I'm getting back into development and want to find a good editor for HTML5/JQuery.
Being able to save files in UTF-8 is important.
However, although I set my project in NetBeans 7.0 to encode in UTF-8, when I create a file in the project and then look at it in Notepad++, the file is encoded in ANSI and I have to manually set the encoding to UTF-8.
In Aptana Studio 3 I set the workspace to UTF-8 encoding, and my project inherits from that, but when I create a file in the project and look at it in Notepad++, it is encoded in ANSI and I have to change the encoding manually to UTF-8.
So I tried Komodo Edit 7 and in the file manually set the encoding to UTF-8, saved the file, looked at it in Notepad++ which said the file is in ANSI.
I notice in any of these editors if I put a German umlaut character in the file, then Notepad++ shows it as "ANSI as UTF-8" but I still have to manually change it to UTF-8 in Notepad++ where it will stay.
The reason I want an editor that saves in UTF-8 is I remember having a project a couple years ago which had German and French characters in the files and after they were viewed and saved in various editors, the characters would be replaced with garbage characters. The solution was to always initially set the encoding of the file to UTF-8.
I assumed that editors would be so far advanced now that if you specify that the files should be saved in UTF-8, that they actually save in UTF-8 in a way that is recognized by every modern text editor. Is this not the case? What am I not understanding about modern text editors and development environments in regard to UTF-8?
How can I get these editors to save their files in UTF-8 encoding?
A UTF-8 encoded file that only contains characters also present in the ASCII table (the first 128 Unicode characters, i.e. your basic alphanumeric characters) is indistinguishable from an ASCII/ANSI encoded file. My guess is that Notepad++ simply can't make the distinction (because there is none) and defaults to ANSI. You can see the difference when you include a character that is not in the ASCII table. By "ANSI as UTF-8" I can only guess that it means "this document contains characters from the ANSI table (a.k.a. Latin-1) and is saved in UTF-8".
In other words, your IDEs are probably fine, the problem is with Notepad++.
Try characters like 漢字; they will result in a pretty unique UTF-8 byte sequence that's most certainly not ANSI.
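The point about indistinguishable files can be checked directly. The following Python sketch compares the bytes produced by different encodings; the sample strings and the helper name has_non_ascii are made up for illustration.

```python
# ASCII-only content produces identical bytes in ASCII, cp1252 and UTF-8,
# so no editor can tell from the bytes alone which one was intended.
ascii_only = "<p>hello</p>"
assert (ascii_only.encode("ascii")
        == ascii_only.encode("cp1252")
        == ascii_only.encode("utf-8"))

def has_non_ascii(data: bytes) -> bool:
    """True if the bytes contain anything an encoding guess could act on."""
    return any(b > 0x7F for b in data)

umlaut_utf8 = "ü".encode("utf-8")     # two bytes in UTF-8
umlaut_1252 = "ü".encode("cp1252")    # one byte in cp1252
kanji_utf8 = "漢字".encode("utf-8")    # three bytes per character

print(has_non_ascii(ascii_only.encode("utf-8")))  # False
print(umlaut_utf8.hex(" "), "vs", umlaut_1252.hex(" "))
print(kanji_utf8.hex(" "))  # e6 bc a2 e5 ad 97
```

Only once a non-ASCII character is present do the encodings diverge, which gives a detector something to work with.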
From what I've seen on this topic, Notepad's UTF-8 equates to Notepad++'s UTF-8, which means with a BOM included. If a file is saved with this encoding and opened in NetBeans, it will actually show a - character or the ï»¿ characters for the BOM sequence (depending on whether the encoding for the project or IDE is set to UTF-8). But if you save the file in Notepad++ encoded as "UTF-8 without BOM", and have either your project defined as UTF-8 or have your netbeans_default_options include -J-Dfile.encoding=UTF-8, you'll see what I think is UTF-8 as it should be. Unfortunately, if you try to edit this file in NetBeans without including characters that are outside of the ANSI code set, you see the behavior that you referred to in your question, with the file having its encoding set to ANSI.
So in an attempt to make this a "sort-of" answer to your question, please remember that not all editors' concepts of UTF-8 are the same. Notepad++ gives the most accurate info on what the real encoding of a file is. I'd say that developing in either a Linux or Mac environment might be a good choice for making sure that localization is correct, but on Windows a decent workaround might be to just include a non-ANSI character in the file to ensure it always gets saved as a UTF-8 (non-BOM) file. This is all geared towards NetBeans dev, by the way. I haven't tested this with the others, though I'm willing to bet that they will save the file correctly on a Windows machine if it has non-ANSI characters in it. Sorry for the kludge, gang, but either way, I hope it helps someone struggling with this same issue.
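The BOM difference mentioned above is easy to see at the byte level. This is a small Python sketch, not a description of how NetBeans or Notepad++ work internally; it just uses Python's "utf-8-sig" codec, which writes and strips the UTF-8 BOM.

```python
import codecs

text = "hello"
without_bom = text.encode("utf-8")      # plain UTF-8
with_bom = text.encode("utf-8-sig")     # UTF-8 with the BOM prepended

# The BOM is the three bytes EF BB BF in front of the content.
assert with_bom == codecs.BOM_UTF8 + without_bom
print(with_bom[:3].hex(" "))            # ef bb bf

# A reader that assumes cp1252 shows the BOM as three visible characters.
print(with_bom.decode("cp1252"))        # ï»¿hello

# A reader that assumes plain UTF-8 keeps it as an invisible U+FEFF.
assert with_bom.decode("utf-8") == "\ufeff" + text

# A BOM-aware reader strips it.
assert with_bom.decode("utf-8-sig") == text
```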

TextPad and Unicode: full support?

I've got some UTF-8 files created in Mac, and when trying to open them using TextPad in Windows, I get the following warning:
WARNING: (file name) contains characters that do not exist in code
page 1252 (ANSI Latin 1). They will be converted to the system default
character, if you click OK.
Linux (GNOME gEdit) can open the same file without complaints. What does the above mean? I thought that TextPad had full UTF-8 support. Can I safely open and edit UTF-8 files using it without corrupting the file?
It seems that TextPad cannot handle characters outside windows-1252 (CP1252, here carrying the misnomer “ANSI Latin 1”). I tested it on Windows, opening a plain text file created on the same system, as UTF-8 encoded, both with and without BOM, with the same result. The program’s help does not seem to contain anything related to character encodings, and its tools for writing “international characters” are for Latin-1 characters only.
There are several text editors for Windows that can deal with UTF-8 (even Notepad can open a UTF-8 file, but it can hardly be recommended for serious editing). See Alan Wood’s collection of information on Unicode editors and word processors for Windows. (Personally, I like Notepad++ and BabelPad, which are both free.)
TextPad 8, the newest as of 2016-01-28, does finally properly support BMP Unicode. It's a paid upgrade, but so far has been working flawlessly for me.
TextPad ‘supports’ UTF-8 and UTF-16 documents only in as much as it will import and export them. But it still edits files as simple bytes, and not Unicode characters (using the ANSI code page, which is code page 1252 for Western European).
So unless the file happened to contain only characters that also exist in that code page, you will lose content. This rather defeats the point of Unicode.
Indeed, this was the issue that made me flee—to EmEditor, at the time, though now I would agree with the previous comments and recommend Notepad++. The era of paying for text editors is long gone.
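The content loss described here can be simulated in Python. This sketch assumes the conversion behaves like a lossy encode to code page 1252 with a substitute character; the sample string is made up, and TextPad's actual "system default character" may differ from the "?" Python uses.

```python
# Text containing characters both inside and outside code page 1252.
original = "naïve café 漢字"

# Convert to cp1252, substituting anything the code page cannot hold.
as_bytes = original.encode("cp1252", errors="replace")

# Reading the bytes back cannot restore what was substituted.
round_tripped = as_bytes.decode("cp1252")

print(round_tripped)              # naïve café ??
print(round_tripped == original)  # False
```

The accented Latin letters survive because they exist in cp1252; the two kanji are gone for good once the file is saved.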
Actually, TextPad does support displaying Unicode code points, granted they went about it the wrong way. In order to display the Unicode characters, you have to choose Configure->Preferences and expand "Document Classes->Text->Font".
You need to choose a Unicode font AND set the Script to match, e.g. Arial Unicode MS with script CHINESE_BIG5.
However, this is a backward approach, since the application should handle this when the user tells TextPad to open the file as Unicode or UTF-8. The built-in Notepad application in MS Windows will detect the encoding automatically and display the glyphs correctly based upon the encoding.
I found a discussion on this in the Textpad forums:
http://forums.textpad.com/viewtopic.php?t=11019
While I have Notepad++, Textpad handles large files with ease while other editors I've tried, including Notepad++, either slow to a crawl or die. I'm currently trying to edit a 475MB file and Notepad++ is not up to the task.
Textpad Configure Menu --> Preferences --> Document Classes --> Default --> Default encoding --> UTF-8
Try the ANSI code set with File/Open; that should solve the problem in TextPad.

Resources