Visual Studio C# disable unicode or utf-8 as file encoding and use ASCII instead - visual-studio

I am currently working on some LaTeX document which embeds C# files generated by Visual Studio 2008. My problem is that these files are encoded in UTF-8 with BOM. This causes LaTeX to produce output similar to the output described in this post:
Invalid characters in generated latex sources in Doxygen?
I know that I can use a tool like Notepad++ to convert the file to ASCII or some other format without a BOM. But what I would like to do is either:
cause LaTeX to use the correct input encoding (so far I have failed to do this with package imports like the following):
\usepackage{ucs} % unicode functionality
\usepackage[latin1]{inputenc}
or cause Visual Studio to save the files without a BOM or in plain ASCII.
Otherwise I might edit the file (compile it and save it in VC#) and unintentionally reintroduce the BOM, which would break the code listing in the document.
Many thanks,
Ovanes

Visual Studio does not have this option, by design I believe, because .NET is built from the ground-up to use Unicode.
However, I don't believe Visual Studio is supposed to use the byte order marks. You said that Visual Studio is "generating" these files, but what process is really creating them? Is it the result of some sort of code generation tool? If so, that's the culprit and the place where you should focus.
I checked several of my code files and none of them contain the byte order marks.
EDIT: Changing Visual Studio Project Templates
In the comments the questioner said that these files were generated by the built-in Console Application project template. These are stored on your hard drive and can be modified if necessary.
Your installation path may vary, but on my system, I navigated to this directory:
C:\Program Files\Microsoft Visual Studio 9.0\Common7\IDE\ProjectTemplates\CSharp\Windows\1033
Here I find ConsoleApplication.zip. I copied this to my desktop (for safety) and unzipped, and inside you find 4 files - a .vstemplate file, and the 3 files that are created by the project: AssemblyInfo.cs, ConsoleApplication.csproj, and Program.cs.
If you want, you can edit these files to remove the byte order marks, zip it back up, and replace the file in the source directory.
OR, to be safer, you can change the name of the template to "Console Project - No BOM" or something like that. In the .vstemplate file, there is a Name attribute that uses a Package attribute to call in information from somewhere by a guid. You can replace this name line with a simple line that specifies the name.
<Name>Console Application - No BOM</Name>
Then rezip the files, and put the zip file in the following path:
(My Documents)\Visual Studio 2008\Templates\ProjectTemplates\Visual C#
New projects created from this template should not contain the byte order marks, but remember, Microsoft apparently wanted those byte order marks in there, so your mileage may vary.
Item templates (like Class) can be modified in the same way - it shouldn't take too much exploring to find the default and user ItemTemplates directory.
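Removing the byte order marks from the unzipped template files does not have to be done by hand. The following is only a minimal sketch in Python, not part of the original answer: the function names and the default file patterns are hypothetical, and it assumes the template zip has already been extracted to a working folder.

```python
import codecs
from pathlib import Path
from typing import List


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def strip_bom_in_tree(root: Path, patterns=("*.cs", "*.csproj")) -> List[Path]:
    """Rewrite matching files under root without a BOM; return the files changed.

    The patterns are an assumption based on the files listed in the template.
    """
    changed = []
    for pattern in patterns:
        for path in sorted(root.rglob(pattern)):
            original = path.read_bytes()
            cleaned = strip_utf8_bom(original)
            if cleaned != original:
                # Bytes are written back untouched apart from the removed prefix,
                # so line endings and other content stay exactly as they were.
                path.write_bytes(cleaned)
                changed.append(path)
    return changed
```

Calling strip_bom_in_tree(Path("ConsoleApplication")) on the extracted folder would report which files were rewritten; the folder can then be zipped back up as described above.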

I'm not sure I understand your scenario. But if you simply want to convert a file to ASCII from within Visual Studio select "File - Save As" and switch the encoding to ASCII.

Have you tried \usepackage[utf8]{inputenc}? (The option name is lowercase.)

In VS 2015 you can specify the encoding using the File -> Advanced Save Options... dialog.

You can use this plugin for Visual Studio: https://vlasovstudio.com/fix-file-encoding/. It prevents Visual Studio from adding a BOM to the beginning of the file. That way all of my files can have UTF-8 encoding, and raw strings can contain special characters that are displayed and written without any issues.

Related

Visual Studio 2010 doesn't generate Resource (Resx) designer code if file is localized

I've just come across the weirdest behavior in Visual Studio 2010 while working with Resx resource files and I just can't wrap my head around it.
The problem is as follows: Visual Studio will not generate the designer.cs file for a resource file with a localized name (such as resource.fr.resx), but it works fine for other files with simple names (such as resource.resx).
Here's a snapshot of my visual studio project setup:
As you can see I just did a simple test with 3 resource files:
test.resx
test2.resx
test.fr.resx
The designer.cs files for test.resx and test2.resx are generated just fine. However test.fr.designer.cs is created but always blank, no matter what I do.
I have double and triple checked: the custom tool property is properly set for the localized file. All properties are exactly the same across all files (I'm using PublicResxFileCodeGenerator, but I get the same behavior if I set access modifier to internal and use ResxFileCodeGenerator).
Note: I've noticed that when creating a resource file, Visual Studio normally defaults the access modifier to "Internal". However, when creating a localized resource file (resource.fr.resx) it defaults to "No code generation". I just found that interesting to note, since it proves that Visual Studio is treating the localized file differently for some reason.
--> Is there something I'm missing here? I would appreciate if anybody has some insight on the subject, this is driving me crazy.
While I haven't looked into this particular issue, I've had numerous other problems with ".resx" files. Visual Studio is sometimes buggy (handling ".resx" files among other things), and I've officially reported some of these to MSFT (since it affects my own commercial localization program). In any case, you shouldn't normally be naming things this way. It effectively violates MSFT's localization rules. Default language files shouldn't normally have an embedded language code, and it could be choking on it for this reason. I'd need to investigate, but what you should be doing is creating "Test.resx", which has a "Designer.cs" file, and then "Test.fr.resx", which doesn't. All default language strings are then placed in "Test.resx" and the corresponding French strings in "Test.fr.resx". In code, you then access the strongly typed name found in "Test.Designer.cs", and the string you get back will be the default language string unless you set "System.Threading.Thread.CurrentThread.CurrentUICulture" to "fr". You'll then get back the French version of the string from "Test.fr.resx", unless it's not found there (there's no translation), in which case you'll get back the fallback string from "Test.resx" (i.e., the default language string). This is how the hub-and-spoke model basically works.
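The fallback behaviour described above can be illustrated with a small model. This is only a sketch in Python, not the .NET resource manager itself: the RESOURCES table and get_string function are hypothetical stand-ins for "Test.resx", "Test.fr.resx" and the strongly typed accessor, and the real lookup also walks through more specific cultures (such as "fr-FR") before reaching the neutral one.

```python
# Hypothetical in-memory stand-ins for Test.resx (default) and Test.fr.resx (French).
RESOURCES = {
    "": {"Greeting": "Hello", "Farewell": "Goodbye"},
    "fr": {"Greeting": "Bonjour"},  # "Farewell" has no French translation
}


def get_string(name, ui_culture=""):
    """Look in the culture-specific table first, then fall back to the default."""
    localized = RESOURCES.get(ui_culture, {})
    if name in localized:
        return localized[name]
    # No translation found: return the default-language string.
    return RESOURCES[""][name]
```

With this model, get_string("Greeting", "fr") comes from the French table, while get_string("Farewell", "fr") falls back to the default table because no translation exists.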

Why does TortoiseHg think Resource.h is binary?

Using Visual Studio 2010. I have a resource.h file which TortoiseHg thinks is binary so it won't display a diff for it in the commit window. I can easily open the file in a text editor and see that it is plain text.
I saw a related question (Why does Mercurial think my SQL files are binary?) which suggests it has to do with file encoding. Indeed opening the file in Notepad++ says the file is in "UCS-2 Little Endian". How can I fix this? I, obviously, don't want to break some Visual Studio expectation.
For display purposes only, Mercurial treats all files containing NUL bytes as binary due to long-standing UNIX convention. This is just about always right, except for UTF-16 (formerly known as UCS-2), where half your file is NUL bytes!
Internally, Mercurial treats all files as binary all the time, so this issue is only relevant for things like whether or not we try to display diffs.
So you have two options:
ignore it, Mercurial will work just fine
use an encoding other than UTF-16
Some web searches for "resource.h utf-16" suggest that VS2010 will be just fine if you save this file in UTF-8 or ASCII, which should be perfectly fine choices for C source code.
http://social.msdn.microsoft.com/Forums/en/vssetup/thread/aff0f96d-16e3-4801-a7a2-5032803c8d83
Try explicitly converting the file to UTF-8 or ASCII and see. You can do that from Notepad++'s Encoding menu (choose "Encode in UTF-8").
Visual Studio will work with the UTF-8 file just fine.
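If several files need the same treatment, the conversion can also be scripted. The following is a minimal sketch in Python under the assumption that the file really is UTF-16 with a BOM (which is what Notepad++ reports as "UCS-2 Little Endian"); the function name is hypothetical.

```python
from pathlib import Path


def convert_utf16_to_utf8(path: Path) -> None:
    """Re-encode a UTF-16 file as UTF-8 (without BOM), in place."""
    raw = path.read_bytes()
    # The "utf-16" codec reads the BOM to pick the byte order and drops it.
    text = raw.decode("utf-16")
    # Working on bytes keeps the original line endings (e.g. CR LF) intact.
    path.write_bytes(text.encode("utf-8"))
```

After the conversion the file no longer contains NUL bytes, so the diff should be displayed again. Keeping a backup of the original file is sensible before trying this on a real project.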

Why is mercurial (hg) treating my Visual Studio solutions (.sln) as binary?

I get the message "File or diffs not displayed: File is binary."
Why is mercurial (hg) treating my visual studio solutions (.sln) as binary?
And how do I stop it?
Thanks
I tried this out on one of my projects and the sln file was treated as a text file. Check if your sln file is in a different encoding like UTF-16. Otherwise, Hg should not be treating it as binary. Try explicitly converting / changing the encoding to UTF-8 / ASCII and see.
For actual storage Mercurial treats all files as binary. It never does line conversions or anything else that requires considering things as text or knowing the file's encoding.
However, at the UI level (separate from the storage level) it will try to avoid filling your screen with binary gookus, and to do that it uses a simple test -- a file won't be displayed in diffs if it has one or more NUL (0x00) characters in it.
So your .sln file must have a 0x00 somewhere in it. The most common cause is an editor saving the file as UTF-16 (typically with a Byte Order Mark (BOM) at the front), in which every ASCII character is stored with an accompanying 0x00 byte.
If you can remove the NUL Mercurial will display the file contents, and if you can't I think you're out of luck.
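The NUL test described above is easy to reproduce. This is a minimal sketch in Python of that heuristic, not Mercurial's actual code; looks_binary and the sample text are hypothetical. It also shows that a UTF-8 BOM on its own contains no NUL byte, whereas UTF-16 text does.

```python
import codecs


def looks_binary(data: bytes) -> bool:
    """Mimic the display heuristic: any NUL (0x00) byte means 'binary'."""
    return b"\0" in data


sample = "Microsoft Visual Studio Solution File\r\n"

# UTF-8 with a BOM: the three prefix bytes EF BB BF contain no NUL.
utf8_with_bom = codecs.BOM_UTF8 + sample.encode("utf-8")

# UTF-16 little endian: every ASCII character is followed by a 0x00 byte.
utf16_le = codecs.BOM_UTF16_LE + sample.encode("utf-16-le")
```

Here looks_binary(utf8_with_bom) is False and looks_binary(utf16_le) is True, so running the same check on the bytes of a .sln file is a quick way to see whether its encoding is what hides the diff.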

T4 vs UTF-8 vs VS2010

I'm trying to use T4 in VS2010, but I have a weird problem. T4 always shows the error message "A directive was specified in the wrong format", although all directives are in the correct format. It turned out this error was caused by a UTF-8 file prefix, which is not recognized by T4. Okay, I have to remove it. But every time I try to save this file with ANSI encoding in an external editor, VS2010 changes its encoding back to UTF-8. The same happens when I modify the file in VS2010. So T4 doesn't work again.
Any suggestions?
What Windows and Visual Studio language editions are you using?
T4 supports UTF-8 with/without prefix as it essentially replicates the encoding of the input template unless otherwise directed. (you have to close/reopen the output file in VS after changing encodings to see the switch). I'm not able to repro what you're seeing on EN-US Windows and VS.
If you do want to save as ANSI, you can use the File/Advanced Save Options menu in VS and pick a codepage.
I've found the reason, it had nothing to do with encoding, my bad. I copied some text from a sample, and it had wrong symbol for quotation mark (looking very similar visually, but having different char code). That made T4 parser fail.
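Look-alike quotation marks of this kind are hard to spot by eye but easy to find mechanically. The following is a minimal sketch in Python; the set of suspect characters and the function name are assumptions for illustration, not an exhaustive list of what a T4 parser rejects.

```python
import unicodedata

# Typographic quotes that commonly sneak in when copying text from web pages.
SUSPECT_QUOTES = {"\u2018", "\u2019", "\u201c", "\u201d"}


def find_suspect_quotes(text):
    """Return (line, column, character name) for each typographic quote in text."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in SUSPECT_QUOTES:
                hits.append((line_no, col, unicodedata.name(ch)))
    return hits
```

Running find_suspect_quotes on the contents of a template lists the position of each curly quote, which can then be replaced with a plain " character.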

Visual Studio encoding problems

I have problems with files encoding in Visual Studio 2008. While compiling I'm getting such errors:
When I try to open a file where a particular error occurs, an encoding window appears:
By default auto-detect is set. When I change the encoding option to UTF-8, everything works. If I open each problematic file in my project using UTF-8 encoding, the project starts to compile. The problem is that I have too many files, and it is ridiculous to open each one and set the encoding to UTF-8. Is there a quick way to do this?
My VS settings are:
I'm using Windows Server 2008 R2.
UPDATE:
For Hans Passant and Noah Richards. Thanks for interaction. I recently changed my operating system so everything is fresh. I've also downloaded fresh solution from source control.
In OS regional settings I've changed system locale to Polish(Poland):
In VS I've changed international settings to the same as windows:
The problem is still not solved.
When I open some .cs files using auto-detection for encoding, and then check File -> Advanced Save Options..., some of these .cs files have codepage 1250:
but internally they look like the following:
It is weird, because when I check the properties of these particular files in source control, they seem to have UTF-8 encoding set:
I don't understand this mismatch.
All other files have UTF-8 encoding:
and open correctly. I have basically no idea what is going wrong, because as far as I know my friend has the same options set as I do, and the same project compiles correctly for him. So far he happily hasn't encountered any encoding issues.
That uppercase A with circumflex tells me that the file is UTF-8 (if you look with a hex editor you will probably see that the bytes are C2 A0). That is a non-breaking space in UTF-8.
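The C2 A0 observation can be checked directly. This is a small sketch in Python, assuming the file was misread with codepage 1250 as mentioned in the question; the variable names are only illustrative.

```python
# A non-breaking space (U+00A0) encoded as UTF-8 is the two bytes C2 A0.
nbsp_utf8 = "\u00a0".encode("utf-8")

# Reading those same two bytes as codepage 1250 yields two characters:
# an uppercase A with circumflex (U+00C2) followed by a non-breaking space.
misread = nbsp_utf8.decode("cp1250")

# Reading them as UTF-8 gives back the single original character.
correct = nbsp_utf8.decode("utf-8")
```

The stray "Â" therefore appears whenever a UTF-8 non-breaking space is interpreted with a single-byte codepage.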
Visual Studio does not detect the encoding because (most likely) there are not enough high-ASCII characters in the file to help with a reliable detection.
Also, there is no BOM (Byte Order Mark). That would help with the detection (this is the "signature" in the "UTF-8 with signature" description).
What you can do: add BOM to all the files that don't have one.
How to add? Make a file with a BOM only (empty file in Notepad, Save As, select UTF-8 as encoding). It will be 3 bytes long (EF BB BF).
You can copy that at the beginning of each file that is missing the BOM:
copy /b/v BOM.txt + YourFile.cs YourFile_Ok.cs
ren YourFile.cs YourFile_Org.cs
ren YourFile_Ok.cs YourFile.cs
Make sure there is a + between the name of the BOM file and the one of the original file.
Try it on one or two files, and if it works you can create some batch file to do that.
Or a small C# application (since you are a C# programmer), that can detect if the file already has a BOM or not, so that you don't add it twice. Of course, you can do this in almost anything, from Perl to PowerShell to C++ :-)
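As one possible version of such a tool, here is a minimal sketch in Python rather than C#. The function name is hypothetical, and it assumes the files are valid UTF-8 already; a file that is not will raise UnicodeDecodeError instead of being mislabelled.

```python
import codecs
from pathlib import Path


def add_utf8_bom_if_missing(path: Path) -> bool:
    """Prefix the file with EF BB BF unless it already starts with it.

    Returns True if the file was changed, False if a BOM was already present.
    """
    data = path.read_bytes()
    if data.startswith(codecs.BOM_UTF8):
        return False  # already has the signature, do not add it twice
    # Sanity check: only tag content that really decodes as UTF-8.
    data.decode("utf-8")
    path.write_bytes(codecs.BOM_UTF8 + data)
    return True
```

It could be applied to a whole project with something like `for p in Path("MyProject").rglob("*.cs"): add_utf8_bom_if_missing(p)`, where "MyProject" is a placeholder folder name. As with the copy approach above, try it on one or two files first.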
Once you've opened the files in UTF-8 mode, can you try changing the Advanced Save Options for the file and saving it (as UTF-8 with signature, if you think these files should be UTF-8)?
The auto-detect encoding detection is best-effort, so it's likely that something in the file is causing it to be detected as something other than UTF-8, such as having only ASCII characters in the first kilobyte of the file, or having a BOM that indicates the file is something other than UTF-8. Re-saving the file as UTF-8 with signature should (hopefully) correct that.
If it continues happening after that, let me know, and we can try to track down what is causing them to be created/saved like that in the first place.

Resources