Even if auto-detection of UTF-8 files is ON and you instruct the editor to open the file as UTF-8 in the Open dialog, UltraEdit will open the file as ASCII.
UltraEdit version 12 is imprecise version information. There were several UTF-8 related fixes between UE v12.00, released on 2006-03-15, and v12.20b+1, released on 2007-01-02, which was the last 12.xx version.
The UTF-8 detection algorithm is explained in the UltraEdit forum topic UTF-8 not recognized, largish file. There are further topics, such as Using UTF-8 with UltraEdit and UTF-8 auto-detection problem with first multi-byte after 10k, and some other UTF-8 related topics in the UltraEdit forum, which can be found using the forum search with the words UTF and open.
But I don't really know what to answer, as your post does not contain an actual question. All you wrote is that a file we can't see always opens as an ASCII/ANSI file in UltraEdit v12.xx, even with UTF-8 selected in the File Open dialog. We can't verify your statement without having the file, and of course without knowing the full version of the now already 8-year-old UltraEdit release you still use for some unknown reason.
It seems the problem is that the algorithm UltraEdit uses to detect UTF-8 files only examines the start of the file. So to make sure the file is detected as UTF-8, I just put a "UTF-8 trap" at the beginning of the file, in a comment:
<!-- €șăâțÎȚȘĂÂ - utf8 trap -->
Still not sure why UltraEdit does not honor the Format field in the Open dialog...
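To illustrate why the trap works, here is a rough sketch in plain Python of a detector that only sniffs the start of the file. This is only a guess at the general heuristic, not UltraEdit's actual code; the 10 KB window is borrowed from the forum topic title above:

# Hypothetical sketch of a "sniff only the start" UTF-8 detector.
# NOT UltraEdit's real algorithm, just an illustration of the failure mode.
def looks_like_utf8(path, sniff_size=10 * 1024):
    with open(path, 'rb') as f:
        chunk = f.read(sniff_size)          # only the first ~10 KB is examined
    if chunk.startswith(b'\xef\xbb\xbf'):   # a UTF-8 BOM is trivially detected
        return True
    if all(b < 0x80 for b in chunk):        # pure ASCII in the sniffed window:
        return False                        # the detector gives up, file opens as ANSI
    try:
        chunk.decode('utf-8')               # early multi-byte sequences decode cleanly
        return True
    except UnicodeDecodeError:              # e.g. a sequence cut off at the window edge
        return False

A file whose first non-ASCII character appears after the sniffed window gets classified as ANSI, which is exactly what the trap comment at the top of the file prevents.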
Make sure you have the right configuration first: http://www.ultraedit.com/support/tutorials_power_tips/ultraedit/unicode.html
When I push a package to a NuGet server (on a local TFS), it corrupts the files' encoding. If I open my index.cshtml in Notepad, it shows UTF-8 encoding, but VS can't show the Unicode characters at run time, and I have to open the .cshtml file in Notepad and save it as UTF-8 again.
Do you mean that VS cannot show the Unicode characters correctly, but they are shown correctly in VS after saving the file with UTF-8 encoding?
For UTF-8 to be recognized, you need a Byte Order Mark (BOM), sometimes called a signature. According to the description in this MSDN article:
Unicode characters are now supported in identifiers, macros, string and character literals, and in comments. Universal character names are also now supported.
Unicode can be input into a source code file in the following encodings:
UTF-16 little endian with or without byte order mark (BOM)
UTF-16 big endian with or without BOM
UTF-8 with BOM
For the love of all things decent, do NOT use "UTF-8 with BOM"!
You can try to recreate the NuGet package in VS with the UTF-8 encoded files, then publish it to the server and try again. Please note the following:
- Use the UTF-8 character encoding for the *.nuspec files.
- Do not save your *.nuspec files with a Byte Order Mark (BOM). A BOM is neither required nor recommended for UTF-8, because it can lead to several issues.
- Specify the UTF-8 encoding in the first line of your *.nuspec files, like so: <?xml version="1.0" encoding="utf-8"?>.
Also reference this thread: UTF-8 without BOM
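If some of your *.nuspec files were already saved with a BOM, a small script can strip it. A minimal sketch in Python (the file pattern and folder are assumptions; adjust them to your layout):

import glob

# Strip a leading UTF-8 BOM (EF BB BF) from every .nuspec file in the current folder.
BOM = b'\xef\xbb\xbf'
for path in glob.glob('*.nuspec'):
    with open(path, 'rb') as f:
        data = f.read()
    if data.startswith(BOM):
        with open(path, 'wb') as f:
            f.write(data[len(BOM):])   # rewrite the file without the BOM
        print('stripped BOM from', path)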
I am getting a Content Encoding Error when I enable Gzip Page Compression on my Joomla website. How can I enable it without getting this error?
The problem may be that your server already has gzip compression turned on. When the server gzips the output and then Joomla tries to gzip it again, it can cause some weird encoding issues. Contact your hosting company and find out if they gzip automatically. If so, there is no need to have it on in Joomla.
Maybe you had the same problem as me and actually have some UTF-8 files WITH a UTF-8 BOM* somewhere in your code or Joomla files.
I think gzip in combination with a UTF-8 BOM causes an encoding problem.
Notes:
- Not all editors are able to show whether the BOM is there or not. I actually had to use another editor, Notepad++, to realize there was a BOM, remove it there via "Convert to UTF-8 without BOM", and then save the file (also closing it in my original editor first). A script can also report which files carry a BOM; see the sketch after these notes. It could just as well be that you can set your editor not to include the BOM.
- Possibly this only occurs when PHP error reporting is on.
* More about the UTF BOM:
Stack Overflow: What's different between UTF-8 and UTF-8 without BOM?
Wikipedia: Byte order mark
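If you want to check a whole Joomla tree at once instead of opening files one by one, here is a minimal sketch in Python that only reports which files start with a UTF-8 BOM (the root path and the .php filter are assumptions; adjust them):

import os

JOOMLA_ROOT = '/var/www/joomla'   # hypothetical path; point this at your site
for root, dirs, files in os.walk(JOOMLA_ROOT):
    for name in files:
        if name.endswith('.php'):
            path = os.path.join(root, name)
            with open(path, 'rb') as f:
                if f.read(3) == b'\xef\xbb\xbf':   # the UTF-8 BOM bytes
                    print('BOM found:', path)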
Using Visual Studio 2010. I have a resource.h file which TortoiseHg thinks is binary so it won't display a diff for it in the commit window. I can easily open the file in a text editor and see that it is plain text.
I saw a related question (Why does Mercurial think my SQL files are binary?) which suggests it has to do with file encoding. Indeed opening the file in Notepad++ says the file is in "UCS-2 Little Endian". How can I fix this? I, obviously, don't want to break some Visual Studio expectation.
For display purposes only, Mercurial treats all files containing NUL bytes as binary, due to long-standing UNIX convention. This is just about always right... except for UTF-16 (formerly known as UCS-2), where half your file is NUL bytes!
Internally, Mercurial treats all files as binary all the time, so this issue is only relevant for things like whether or not we try to display diffs.
So you have two options:
ignore it, Mercurial will work just fine
use an encoding other than UTF-16
Some web searches for "resource.h utf-16" suggest that VS2010 will be just fine if you save this file in UTF-8 or ASCII, which should be perfectly fine choices for C source code.
http://social.msdn.microsoft.com/Forums/en/vssetup/thread/aff0f96d-16e3-4801-a7a2-5032803c8d83
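To see where the NUL bytes come from, a quick illustration in Python:

# ASCII-range text encoded as UTF-16 has a NUL in every other byte,
# which trips the "contains NUL, therefore binary" heuristic.
text = '#define IDD_ABOUTBOX 100'
utf16 = text.encode('utf-16-le')
print(utf16.count(b'\x00'))              # one NUL per character for ASCII-range text
print(b'\x00' in text.encode('utf-8'))   # False: the UTF-8 version contains no NULs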
Try explicitly converting/changing the encoding to UTF-8/ASCII and see. You can do that from Notepad++'s Encoding menu (choose Encode in UTF-8).
Visual Studio will work with the UTF-8 file just fine.
I have a website, and I can send my Turkish characters with jQuery in Firefox, but Internet Explorer doesn't send my Turkish characters.
I looked at my source file in Notepad, and the file's code page is ANSI.
When I convert it to UTF-8 without BOM and close the file, the file is again ANSI when I reopen it.
How can I convert my file from ANSI to UTF-8?
Regarding this part:
When I convert it to UTF-8 without BOM and close the file, the file is again ANSI when I reopen it.
The easiest solution is to avoid the problem entirely by properly configuring Notepad++.
Try Settings -> Preferences -> New document -> Encoding -> choose UTF-8 without BOM, and check Apply to opened ANSI files.
That way all the opened ANSI files will be treated as UTF-8 without BOM.
For an explanation of what's going on, read the comments below this answer.
To fully learn about Unicode and UTF-8, read this excellent article from Joel Spolsky.
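If you would rather convert the file programmatically than rely on editor settings, here is a minimal sketch in Python. The source code page (cp1254, Windows Turkish) and the file name are assumptions; substitute your own:

SRC_ENCODING = 'cp1254'   # assumed "ANSI" code page on Turkish Windows

with open('page.html', 'r', encoding=SRC_ENCODING) as f:   # hypothetical file name
    text = f.read()
with open('page.html', 'w', encoding='utf-8') as f:        # writes UTF-8 without a BOM
    f.write(text)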
Maybe this is not the answer you needed, but I encountered a similar problem, so I decided to put it here.
I needed to convert 500 XML files to UTF-8 via Notepad++. Why Notepad++? When I used the option "Encode in UTF-8" (many other converters use the same logic), it messed up all the special characters, so I had to use "Convert to UTF-8" explicitly.
Here are some simple steps to convert multiple files via Notepad++ without messing up special characters (e.g. diacritical marks).
Run Notepad++, then open the menu Plugins -> Plugin Manager -> Show Plugin Manager.
Install Python Script. When the plugin is installed, restart the application.
Choose the menu Plugins -> Python Script -> New script.
Choose its name, and then paste the following code:
convertToUTF8.py
import os
from Npp import notepad  # Notepad++ Python Script API; import it first!

filePathSrc = "C:\\Users\\"  # Path to the folder with files to convert

for root, dirs, files in os.walk(filePathSrc):
    for fn in files:
        if fn.endswith('.xml'):  # Specify the type of files to convert
            notepad.open(os.path.join(root, fn))
            notepad.runMenuCommand("Encoding", "Convert to UTF-8")
            # notepad.save()
            # If you save/replace the file in place, an annoying confirmation
            # window pops up, so save a copy next to the original instead:
            notepad.saveAs(os.path.join(root, fn[:-4] + '_utf8.xml'))
            notepad.close()
Finally, run the script.
If you don't have non-ASCII characters (codepoints 128 and above) in your file, UTF-8 without BOM is the same as ASCII, byte for byte - so Notepad++ will guess wrong.
What you need to do is to specify the character encoding when serving the AJAX response - e.g. with PHP, you'd do this:
header('Content-Type: application/json; charset=utf-8');
The important part is to specify the charset with every JS response - else IE will fall back to the user's system default encoding, which is wrong most of the time.
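To illustrate the first point: ASCII-only text is byte-for-byte identical in ASCII and UTF-8, so there is literally nothing for an encoding detector to go on. A quick demonstration in Python:

text = 'hello world'
assert text.encode('ascii') == text.encode('utf-8')   # identical bytes, nothing to detect

# One non-ASCII character and the byte streams diverge:
turkish = 'şehir'
print(turkish.encode('utf-8'))    # b'\xc5\x9fehir': 'ş' becomes a two-byte sequence
print(turkish.encode('cp1254'))   # b'\xfeehir': 'ş' is the single byte 0xFE in Turkish ANSI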
I have problems with file encodings in Visual Studio 2008. While compiling, I'm getting errors like these:
When I try to open the file where a particular error occurs, an encoding window appears:
By default, auto-detect is set. When I change the encoding option to UTF-8, everything works. If I open each problematic file in my project using UTF-8 encoding, the project starts to compile. The problem is that I have too many files, and it would be ridiculous to open each file and set its encoding to UTF-8. Is there any way to do this quickly?
My VS settings are:
I'm using Windows Server 2008 R2.
UPDATE:
For Hans Passant and Noah Richards. Thanks for the interaction. I recently changed my operating system, so everything is fresh. I've also downloaded a fresh solution from source control.
In the OS regional settings I've changed the system locale to Polish (Poland):
In VS I've changed the international settings to the same as Windows:
The problem is still not solved.
When I open some .cs files using auto-detection for encoding, and then check File -> Advanced Save Options..., some of these .cs files have codepage 1250:
but internally they look like this:
It is weird, because when I check the properties of these particular files in source control, they seem to have UTF-8 encoding set:
I don't understand this mismatch.
All other files have UTF-8 encoding:
and open correctly. I basically have no idea what is going wrong, because as far as I know my friend has the same options set as me, and the same project compiles correctly for him. So far he happily hasn't encountered any encoding issues.
That uppercase A with circumflex tells me that the file is UTF-8 (if you look with a hex editor you will probably see that the bytes are C2 A0). That is a non-breaking space in UTF-8.
Visual Studio does not detect the encoding because (most likely) there are not enough high-ASCII characters in the file to help with a reliable detection.
Also, there is no BOM (Byte Order Mark). That would help with the detection (this is the "signature" in the "UTF-8 with signature" description).
What you can do: add a BOM to all the files that don't have one.
How to add it? Make a file containing only a BOM (an empty file in Notepad, Save As, select UTF-8 as the encoding). It will be 3 bytes long (EF BB BF).
You can copy it to the beginning of each file that is missing the BOM:
copy /b/v BOM.txt + YourFile.cs YourFile_Ok.cs
ren YourFile.cs YourFile_Org.cs
ren YourFile_Ok.cs YourFile.cs
Make sure there is a + between the name of the BOM file and the one of the original file.
Try it on one or two files, and if it works you can create some batch file to do that.
Or write a small C# application (since you are a C# programmer) that can detect whether the file already has a BOM, so that you don't add it twice; see the sketch below. Of course, you can do this in almost anything, from Perl to PowerShell to C++ :-)
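For example, a minimal sketch of that idea (in Python here for brevity; the same few lines port directly to C# or PowerShell). The .cs file pattern is an assumption:

import glob

# Prepend a UTF-8 BOM to every .cs file that doesn't already have one,
# so running the script twice never doubles the BOM.
BOM = b'\xef\xbb\xbf'
for path in glob.glob('*.cs'):
    with open(path, 'rb') as f:
        data = f.read()
    if not data.startswith(BOM):
        with open(path, 'wb') as f:
            f.write(BOM + data)
        print('added BOM to', path)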
Once you've opened the files in UTF-8 mode, can you try changing the Advanced Save Options for the file and saving it (as UTF-8 with signature, if you think these files should be UTF-8)?
Encoding auto-detection is best-effort, so it's likely that something in the file is causing it to be detected as something other than UTF-8, such as having only ASCII characters in the first kilobyte of the file, or having a BOM that indicates the file is something other than UTF-8. Re-saving the file as UTF-8 with signature should (hopefully) correct that.
If it continues happening after that, let me know, and we can try to track down what is causing them to be created/saved like that in the first place.