Notepad++, Atom, encoding seems to be broken - utf-8

I have to edit php (.inc) file which was created a long ago and I don't know which editor was used to create it. The Cyrillic letters in Notepad++ are shown like they were in wrong encoding:
In GitHub's Atom editor, Cyrillic letters are totally lost and replaced with the � character:
But in browser everything is displayed correctly! The same is true when using Windows Notepad. Why it is displayed incorrectly in code editors and is there a way to make it look normal?
P.S. OK the thought that I just can copy it from windows notepad and save in notepad++ only now came to me :D But still curious why this happened to code editors.
P.S.2 Problem is solved. Editors just didn't recognize the original encoding properly. When I changed it manually to Windows1251, everything became ok.

Atom's support for encoding isn't as mature as some other editors out there, as you have already discovered you can change the encoding in the bottom right hand corner and Atom will remember it, however there are some packages which help further:
Out of the box as you have discovered Encoding Selector which allows you to choose how Atom interprets the contents of the text file.
There is a package that automatically select encoding for you named Auto Encoding, however it does have some issues with certain types of file, you might find this isn't a problem.
Finally, there is my personal favorite, editor-settings, which allows you to set the encoding of all files of a specific language, with a specific file extension or or directory.
As an example if you wanted to configure all .inc files in a directory to use windows-1251 create a .editor-settings in the directory you are using and paste in the following:
encoding: utf-8
extensionConfig:
inc:
encoding: windows-1251

Related

SASS/ SCSS - Unknown Encoding Format

I am using the Brackets text editor and I have been using SASS. An unexpected computer reboot affected the file's encoding and my CSS code was replaced by diamonds with question marks.
I am getting the error "unknown encoding format", whenever I try opening my SCSS file. The content became rows of 0s instead of my code. I have also tried opening my SCSS file using different editors such as Sublime and Notepad++, all results are same.
What could be the cause of this? I cannot attach screenshots due to new account limitations.
Try to change your file encoding to UTF-8. try this utility EmEditor.

UTF-8 characters missing or displayed as boxes in Notepad, but works fine in webbrowser and other text editors

I have UTF-8 text stored in DB and served as text/plain; charset=utf-8 in a web application. All the things are working fine. I can see the UTF-8 text on browser window without any problem.
But when I save that text to a file and try to open it in Windows Notepad, I got some characters missing and displayed as a small rectangular box. However, the text file looks fine in other editors like EditPlus and Notepad++.
How is this caused and how can I solve it?
If it looks fine in other editors, then the text itself is fine. If it looks OK in the browser, then the response is probably fine too (but better check page info in the browser and see what the encoding is). Your problem is probably with notepad itself. Sometimes it requires BOM to detect Unicode properly. But BOM can break other apps that don't support it. You should also try Notepad on different versions of Windows. I have just tried opening an UTF-8 file in Windows 7, looks fine to me.
If you use tomcat as an application server you might want to add this to its configuration:
"-Dfile.encoding=UTF-8"
Also, take a look here:
Setting the default Java character encoding?
You need to use as below:
response.setContentType("text/html; charset=utf-8");
response.setCharacterEncoding("UTF-8");

Why do Netbeans, Aptana Studio and Komodo Edit all not save in UTF-8?

I'm getting back into development and want to find a good editor for HTML5/JQuery.
Being able to save files in UTF-8 is important.
However, although I set my project in NetBeans 7.0 to encode in UTF-8, when I create a file in the project, then look at it in Notepad++, the file is encoded in ANSI and I have to manually set the encoding to UTF-8:
In Aptana Studio 3 I set the workspace to UTF-8 encoding, and my project inherits from that, but when I create a file in the project and look at it in Notepad++, it is encoded in ANSI and I have to change the encoding manually to UTF-8:
So I tried Komodo Edit 7 and in the file manually set the encoding to UTF-8, saved the file, looked at it in Notepad++ which said the file is in ANSI.
I notice in any of these editors if I put a German umlaut character in the file, then Notepad++ shows it as "ANSI as UTF-8" but I still have to manually change it to UTF-8 in Notepad++ where it will stay.
The reason I want an editor that saves in UTF-8 is I remember having a project a couple years ago which had German and French characters in the files and after they were viewed and saved in various editors, the characters would be replaced with garbage characters. The solution was to always initially set the encoding of the file to UTF-8.
I assumed that editors would be so far advanced now that if you specify that the files should be saved in UTF-8, that they actually save in UTF-8 in a way that is recognized by every modern text editor. Is this not the case? What am I not understanding about modern text editors and development environments in regard to UTF-8?
How can I get these editors to save their files in UTF-8 encoding?
A UTF-8 encoded file that only contains characters also present in the ASCII table (the first 128 Unicode characters, i.e. your basic alphanumeric characters) is indistinguishable from an ASCII/ANSI encoded file. My guess is that Notepad++ simply can't make the distinction (because there is none) and defaults to ANSI. You can see the difference when you include a character that is not in the ASCII table. By "ANSI as UTF-8" I can only guess that it means "this documents contains characters from the ANSI table (a.k.a. Latin-1) and is saved in UTF-8".
In other words, your IDEs are probably fine, the problem is with Notepad++.
Try a character like 漢字, that will result in a pretty unique UTF-8 byte sequence that's most certainly not ANSI.
From what I've seen on this topic, Notepad's UTF-8 equates to Notepad++'s UTF-8, which means with BOM included. If a file is saved with this encoding and opened in NetBeans, it will actually show a - character or the  characters for the BOM sequence (depending on whether the encoding for the project or IDE is set to UTF-8.) But if you save the file in Notepad++ encoded as "UTF-8 without BOM", and have either your project defined as UTF-8 or have your netbeans_default_options included with this -J-Dfile.encoding=UTF-8, you'll see what I think is UTF-8 as it should be. Unfortunately, if you try to edit this file in NetBeans without including characters that are outside of the ANSI code set, you see the behavior that you referred to in your question with the file having its encoding set to ANSI.
So in an attempt to make this a "sort-of" answer to your question, please remember that not all editor's concept of UTF-8 are the same. Notepad++ gives the most actual info on what the real encoding for a file is. I'd say that developing in either a Linux or Mac environment might be a possible good choice for making sure that localization is correct, but on Windows a decent workaround might be to just include a non-ANSI character in the file to insure it always get saved as a UTF-8 (non-BOM) file. This is all geared towards NetBeans dev by the way. I haven't tested this with the others, though I'm willing to bet that they will save the file correctly on a Windows machine if they have non-ANSI characters in them. Sorry for the kluge gang, but either way, I hope it helps someone struggling with this same issue.

TextPad and Unicode: full support?

I've got some UTF-8 files created in Mac, and when trying to open them using TextPad in Windows, I get the following warning:
WARNING: (file name) contains characters that do not exist in code
page 1252 (ANSI Latin 1). They will be converted to the system default
character, if you click OK.
Linux (GNOME gEdit) can open the same file without complaints. What does the above mean? I thought that TextPad had full UTF-8 support. Can I safely open and edit UTF-8 files using it without corrupting the file?
It seems that TextPad cannot handle characters outside windows-1252 (CP1252, here carrying the misnomer “ANSI Latin 1”). I tested it on Windows, opening a plain text file created on the same system, as UTF-8 encoded, both with and without BOM, with the same result. The program’s help does not seem to contain anything related to character encodings, and its tools for writing “international characters” are for Latin-1 characters only.
There are several text editors for Windows that can deal with UTF-8 (even Notepad can open a UTF-8 file, but it can hardly be recommended for serious editing). See Alan Wood’s collection of information on Unicode editors and word processors for Windows. (Personally, I like Notepad++ and BabelPad, which are both free.)
TextPad 8, the newest as of 2016-01-28, does finally properly support BMP Unicode. It's a paid upgrade, but so far has been working flawlessly for me.
TextPad ‘supports’ UTF-8 and UTF-16 documents only in as much as it will import and export them. But it still edits files as simple bytes, and not Unicode characters (using the ANSI code page, which is code page 1252 for Western European).
So unless the file happened to contain only characters that also exist in that code page, you will lose content. This rather defeats the point of Unicode.
Indeed, this was the issue that made me flee—to EmEditor, at the time, though now I would agree with the previous comments and recommend Notepad++. The era of paying for text editors is long gone.
Actually TextPad does support displaying Unicode code points granted they went about it the wrong way. In order to display the Unicode characters you have to choose Configure->Preferences and expand "Document Classes->Text->Font.
You need to choose a Unicode font AND set the Script to match. E.g. Arial Unicode MS with script CHINESE_BIG5.
However, this is a backward approach since the application should handle this when the user tells TextPad to open the file in Unicode or UTF-8. The built in Notepad application with MS Windows will detect the encoding automatically and display the glyphs correctly based upon the encoding.
I found a discussion on this in the Textpad forums:
http://forums.textpad.com/viewtopic.php?t=11019
While I have Notepad++, Textpad handles large files with ease while other editors I've tried, including Notepad++, either slow to a crawl or die. I'm currently trying to edit a 475MB file and Notepad++ is not up to the task.
Textpad Configure Menu --> Preferences --> Document Classes --> Default --> Default encoding --> UTF-8
Try the ANSI code set with File/Open, that should solve the problem in TextPad

Force Visual Studio (2010) to save all files in UTF-8

Is there any way I can force Visual Studio (2010) to save all files in UTF-8, always?
I do not know of a way to force it to save everything in UTF-8, but you can do so on a case-by-case basis. When you first save a document and the Save As... dialog appears, the Save button will actually be a drop-down button with two options. You want "Save with Encoding...", which will then present you the entire list of installed Windows encodings.
The encoding you really want is way down the bottom:
Unicode (UTF-8 without signature) - Codepage 65001
although if you want to save yourself a lot of pain, you will probably want to pick the option near the top:
Unicode (UTF-8 with signature) - Codepage 65001
The difference is that the latter option stick the UTF-8 signature (which is just the UTF-16 byte-order mark encoded in UTF-8). This is one of my pet peeves, as UTF-8 doesn't have multiple byte orders, so the BOM is redundant at best, and breaks all kinds of text processing tools at worst. MS uses it to "detect" UTF-8 automatically, since for single-byte character, UTF-8, ISO-8859-1, and CP-1252 are identical except for a sequence of 32 characters (0x80 - 0x9f) that MS basically made up.
If you only ever edit or process your files with Visual Studio or the .NET tools, then saving with signature will probably work fine. If you need to save files for use by other tools (batch files, SQL queries, PHP scripts, etc), the signature will cause problems, and you should save them without it. If you do this, you may want to enable the option (Under Tools -> Options -> Text Editor) to "Auto-detect UTF-8 encoding without signature", or else, right-click on the file and chose "Open With..." and select the editor option that says " editor with Encoding".
I think it saves files in the current codepage. There's an option under Tools->Options->Environment->Documents that will make it save in unicode when it cannot save in current codepage. But I don't know if that helps...
I think you want to try ForceUtf8(with BOM)/ ForceUtf8(without BOM) extenstion.
Just search UTF8 on VS extension gallery(Tools -> Extension and updates)

Resources