T4 vs UTF-8 vs VS2010

I'm trying to use T4 in VS2010, but I have a weird problem. T4 always shows the error message "A directive was specified in the wrong format", although all directives are in the correct format. It turned out the error was caused by the UTF-8 prefix at the start of the file, which T4 does not recognize. Okay, I have to remove it. But every time I try to save the file as ANSI with an external editor, VS2010 changes its encoding back to UTF-8, and the same happens when I modify the file in VS2010. So T4 stops working again.
Any suggestions?

What Windows and Visual Studio language editions are you using?
T4 supports UTF-8 with/without prefix as it essentially replicates the encoding of the input template unless otherwise directed. (you have to close/reopen the output file in VS after changing encodings to see the switch). I'm not able to repro what you're seeing on EN-US Windows and VS.
If you do want to save as ANSI, you can use the File/Advanced Save Options menu in VS and pick a codepage.

I've found the reason; it had nothing to do with encoding, my bad. I copied some text from a sample, and it contained the wrong quotation-mark character (a curly quote that looks very similar but has a different character code). That made the T4 parser fail.
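For anyone hitting the same symptom: a quick way to spot such look-alike characters is to scan the template for anything outside plain ASCII. A minimal C# sketch (the file name is just an example):

using System;
using System.IO;

class FindSuspectChars
{
    static void Main(string[] args)
    {
        // Path is only an example; pass your .tt file on the command line.
        string path = args.Length > 0 ? args[0] : "MyTemplate.tt";
        string[] lines = File.ReadAllLines(path);

        for (int i = 0; i < lines.Length; i++)
        {
            foreach (char c in lines[i])
            {
                // Flag anything outside printable ASCII, e.g. curly quotes
                // U+2018/U+2019/U+201C/U+201D that look like ' and ".
                if (c > 127)
                    Console.WriteLine("Line {0}: U+{1:X4} '{2}'", i + 1, (int)c, c);
            }
        }
    }
}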

Related

How to make ApprovalTests create UTF-8 files

I use Visual Studio 2019 and have added the ApprovalTests NuGet package. The test class is configured with [UseReporter(typeof(DiffReporter))] and approval is done with Approvals.Verify(result).
It works fine except for the file encoding. In VS I get two files opened. But I also get a warning: "These files have different encodings. Left file: Unicode (UTF-8) with signature. Right file: Western European (Windows). You can resolve the difference by saving the right file with the encoding Unicode (UTF-8) with signature."
I can obviously change the right file manually by saving it with a different encoding. That will make the comparison accept the result, but then I have content with weird-looking escaping in both windows, which makes it much less readable. For example, a simple plus sign is replaced with \u002B.
When debugging the code just before the approval, I can verify that the result looks good, with all characters as I expect them. What happens then? My impression is that the ApprovalTests framework forces an encoding that I cannot control.
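If you want to see what the framework actually wrote to disk (as opposed to what the VS diff viewer reports), a quick look at the files' first bytes tells you whether a UTF-8 signature is present; a rough C# diagnostic sketch, with hypothetical file names:

using System;
using System.IO;

class ShowBom
{
    static void Main()
    {
        // File names are placeholders for your actual approval files.
        foreach (var path in new[] { "MyTest.approved.txt", "MyTest.received.txt" })
        {
            byte[] head = File.ReadAllBytes(path);
            bool utf8Bom = head.Length >= 3 &&
                           head[0] == 0xEF && head[1] == 0xBB && head[2] == 0xBF;
            Console.WriteLine("{0}: {1}", path,
                utf8Bom ? "UTF-8 with signature (BOM)" : "no UTF-8 BOM");
        }
    }
}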

How can I avoid an encoding conflict when merging files in TFS?

We are working on a large codebase in Visual Studio 2010 using TFS as version control system.
When performing merges we recently get a lot of encoding issues. For most of them we get the option "Choose encoding and merge"; for other files we get:
"The encodings for the files being merged must match or an encoding conversion must be specified."
We normally keep every file in UTF-8, yet we still get conflicts when merging between codepage 1252 and UTF-8.
To solve these issues we always perform a manual merge which can be quite cumbersome. How can we avoid these errors? What is the recommended encoding for source-code files in TFS? How can we recursively set the encoding to avoid errors like these in the future?
You are getting the message because TFS thinks (correctly or incorrectly) that a file has different encodings in different branches. To double-check the encoding, go to the properties window of any source-controlled file.
Although TFS detects the encoding when a file is added to source control, if the encoding is later changed TFS will not always pick it up. Click the Set Encoding button and then click Detect to see what the actual encoding is. If it isn't what you expect, then check out the file, modify the encoding in a text editor, and then have TFS re-detect the encoding.
Once the encoding is the same in both branches you shouldn't get this error any longer.
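If you want to find mismatched files up front rather than during a merge, you can walk the workspace and report which files carry a UTF-8 signature; a minimal C# sketch, with the root path and extension filter as placeholders:

using System;
using System.IO;

class ReportEncodings
{
    static void Main()
    {
        // Adjust the root folder and the extension filter to your workspace.
        string root = @"C:\tfs\MyBranch";

        foreach (string file in Directory.GetFiles(root, "*.cs", SearchOption.AllDirectories))
        {
            byte[] head = new byte[3];
            using (var fs = File.OpenRead(file))
                fs.Read(head, 0, 3);

            bool utf8Bom = head[0] == 0xEF && head[1] == 0xBB && head[2] == 0xBF;
            Console.WriteLine("{0}\t{1}", utf8Bom ? "utf-8 (BOM)" : "no BOM", file);
        }
    }
}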

Why does TortoiseHg think Resource.h is binary?

Using Visual Studio 2010. I have a resource.h file which TortoiseHg thinks is binary so it won't display a diff for it in the commit window. I can easily open the file in a text editor and see that it is plain text.
I saw a related question (Why does Mercurial think my SQL files are binary?) which suggests it has to do with file encoding. Indeed opening the file in Notepad++ says the file is in "UCS-2 Little Endian". How can I fix this? I, obviously, don't want to break some Visual Studio expectation.
For display purposes only, Mercurial treats all files containing NUL bytes as binary due to long-standing UNIX convention. This is just about always right... except for UTF-16 (formerly known as UCS-2), where half your file is NUL bytes!
Internally, Mercurial treats all files as binary all the time, so this issue is only relevant for things like whether or not we try to display diffs.
So you have two options:
ignore it, Mercurial will work just fine
use an encoding other than UTF-16
Some web searches for "resource.h utf-16" suggest that VS2010 will be just fine if you save this file in UTF-8 or ASCII, which should be perfectly fine choices for C source code.
http://social.msdn.microsoft.com/Forums/en/vssetup/thread/aff0f96d-16e3-4801-a7a2-5032803c8d83
Try explicitly converting/changing the encoding to UTF-8 or ASCII and see if it helps. You can do that from Notepad++'s Encoding menu (choose Encode in UTF-8).
Visual Studio will work with the UTF-8 file just fine.
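If you prefer to convert outside the IDE, re-encoding the file from UTF-16 LE (what Notepad++ calls UCS-2 Little Endian) to UTF-8 is a few lines of C#; a sketch, assuming the file really is little-endian UTF-16 and you keep a backup:

using System.IO;
using System.Text;

class ConvertResourceHeader
{
    static void Main()
    {
        // resource.h is read as UTF-16 LE and rewritten as UTF-8 with BOM,
        // which VS2010 handles fine; keep a backup before converting.
        string text = File.ReadAllText("resource.h", Encoding.Unicode);
        File.WriteAllText("resource.h", text, new UTF8Encoding(true));
    }
}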

Visual Studio encoding problems

I have problems with file encodings in Visual Studio 2008. While compiling I'm getting errors like these:
When I try to open a file where a particular error occurs, an encoding window appears:
By default, auto-detect is selected. When I change the encoding option to UTF-8, everything works. If I open each problematic file in my project using UTF-8 encoding, the project starts to compile. The problem is I have too many files, and it is ridiculous to open each file and set its encoding to UTF-8. Is there any way to do this quickly?
My VS settings are:
I'm using Windows Server 2008 R2.
UPDATE:
For Hans Passant and Noah Richards. Thanks for interaction. I recently changed my operating system so everything is fresh. I've also downloaded fresh solution from source control.
In OS regional settings I've changed system locale to Polish(Poland):
In VS I've changed international settings to the same as windows:
The problem is still not solved.
When I open some .cs files using auto-detection for encoding and then check File -> Advanced Save Options..., some of these .cs files have codepage 1250:
but internally look like the following:
It is weird, because when I check the properties of those particular files in source control, they seem to have UTF-8 encoding set:
I don't understand this mismatch.
All other files have UTF-8 encoding:
and open correctly. I have basically no idea what is going wrong, because as far as I know my friend has the same options set as me, yet the same project compiles correctly for him; so far he happily hasn't encountered any encoding issues.
That uppercase A with circumflex tells me that the file is UTF-8 (if you look with a hex editor you will probably see that the bytes are C2 A0). That is a non-breaking space in UTF-8.
Visual Studio does not detect the encoding because (most likely) there are not enough high-ASCII characters in the file to help with a reliable detection.
Also, there is no BOM (Byte Order Mark). That would help with the detection (this is the "signature" in the "UTF-8 with signature" description).
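You can see the mismatch directly by decoding those two bytes both ways; a small C# illustration:

using System;
using System.Text;

class ShowMojibake
{
    static void Main()
    {
        byte[] bytes = { 0xC2, 0xA0 };

        // Decoded as UTF-8: a single non-breaking space (U+00A0).
        Console.WriteLine(Encoding.UTF8.GetString(bytes).Length);        // 1

        // Decoded as Windows-1252: "Â" followed by a non-breaking space,
        // i.e. the "Â " you see in the editor. (On .NET Core you would need
        // the code-pages encoding provider for codepage 1252.)
        Console.WriteLine(Encoding.GetEncoding(1252).GetString(bytes));
    }
}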
What you can do: add BOM to all the files that don't have one.
How to add? Make a file with a BOM only (empty file in Notepad, Save As, select UTF-8 as encoding). It will be 3 bytes long (EF BB BF).
You can copy that at the beginning of each file that is missing the BOM:
copy /b/v BOM.txt + YourFile.cs YourFile_Ok.cs
ren YourFile.cs YourFile_Org.cs
ren YourFile_Ok.cs YourFile.cs
Make sure there is a + between the name of the BOM file and the one of the original file.
Try it on one or two files, and if it works you can create some batch file to do that.
Or a small C# application (since you are a C# programmer), that can detect if the file already has a BOM or not, so that you don't add it twice. Of course, you can do this in almost anything, from Perl to PowerShell to C++ :-)
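A minimal sketch of such a C# tool, with the root folder and file pattern as placeholders; it assumes files without a BOM already contain valid UTF-8 text:

using System;
using System.IO;

class AddMissingBom
{
    static readonly byte[] Bom = { 0xEF, 0xBB, 0xBF };

    static void Main()
    {
        // Adjust the root folder and pattern to your solution.
        foreach (string file in Directory.GetFiles(@"C:\MySolution", "*.cs", SearchOption.AllDirectories))
        {
            byte[] content = File.ReadAllBytes(file);

            // Skip files that already start with the UTF-8 BOM, so it is never added twice.
            if (content.Length >= 3 &&
                content[0] == Bom[0] && content[1] == Bom[1] && content[2] == Bom[2])
                continue;

            // Note: this assumes the existing bytes are already valid UTF-8;
            // files in another codepage would need a real conversion first.
            using (var fs = new FileStream(file, FileMode.Create))
            {
                fs.Write(Bom, 0, Bom.Length);
                fs.Write(content, 0, content.Length);
            }

            Console.WriteLine("Added BOM: " + file);
        }
    }
}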
Once you've opened the files in UTF-8 mode, can you try changing the Advanced Save Options for the file and saving it (as UTF-8 with signature, if you think these files should be UTF-8)?
The auto-detect encoding detection is best-effort, so it's likely that something in the file is causing it to be detected as something other than UTF-8, such as having only ASCII characters in the first kilobyte of the file, or having a BOM that indicates the file is something other than UTF-8. Re-saving the file as UTF-8 with signature should (hopefully) correct that.
If it continues happening after that, let me know, and we can try to track down what is causing them to be created/saved like that in the first place.

Visual Studio C# disable unicode or utf-8 as file encoding and use ASCII instead

I am currently working on some LaTeX document which embeds C# files generated by Visual Studio 2008. My problem is that these files are encoded in UTF-8 with BOM. This causes LaTeX to produce output similar to the output described in this post:
Invalid characters in generated latex sources in Doxygen?
I know that I can use a tool like Notepad++ to convert the file to ASCII or some other format without BOM. But my intention would be to:
either cause LaTeX to use the correct input encoding (until now I have failed to do so with package imports like the following):
\usepackage{ucs} % unicode functionality
\usepackage[latin1]{inputenc}
or cause Visual Studio to save the files without BOM or in plain ASCII
Otherwise I might edit the file (compile it and save it in VC#) and unintentionally introduce BOM again, which would break the code listing in the document.
Many thanks,
Ovanes
Visual Studio does not have this option, by design I believe, because .NET is built from the ground up to use Unicode.
However, I don't believe Visual Studio is supposed to use the byte order marks. You said that Visual Studio is "generating" these files, but what process is really creating them? Is it the result of some sort of code generation tool? If so, that's the culprit and the place where you should focus.
I checked several of my code files and none of them contain the byte order marks.
EDIT: Changing Visual Studio Project Templates
In the comments the questioner said that these files were generated by the built-in Console Application project template. These are stored on your hard drive and can be modified if necessary.
Your installation path may vary, but on my system, I navigated to this directory:
C:\Program Files\Microsoft Visual Studio 9.0\Common7\IDE\ProjectTemplates\CSharp\Windows\1033
Here I find ConsoleApplication.zip. I copied this to my desktop (for safety) and unzipped, and inside you find 4 files - a .vstemplate file, and the 3 files that are created by the project: AssemblyInfo.cs, ConsoleApplication.csproj, and Program.cs.
If you want, you can edit these files to remove the byte order marks, zip it back up, and replace the file in the source directory.
OR, to be safer, you can change the name of the template to "Console Project - No BOM" or something like that. In the .vstemplate file there is a Name element that uses a Package attribute to pull in its text from somewhere by a GUID. You can replace this name line with a simple line that specifies the name directly:
<Name>Console Application - No BOM</Name>
Then rezip the files, and put the zip file in the following path:
(My Documents)\Visual Studio 2008\Templates\ProjectTemplates\Visual C#
New projects created from this template should not contain the byte order marks, but remember, Microsoft apparently wanted those byte order marks in there, so your mileage may vary.
Item templates (like Class) can be modified in the same way - it shouldn't take too much exploring to find the default and user ItemTemplates directory.
I'm not sure I understand your scenario, but if you simply want to convert a file to ASCII from within Visual Studio, select "File - Save As" and switch the encoding to ASCII.
Have you tried \usepackage[utf8]{inputenc}?
In VS 2015 you can specify the encoding using the File -> Advanced Save Options... dialog.
You can use the Fix File Encoding plugin for Visual Studio: https://vlasovstudio.com/fix-file-encoding/. It prevents Visual Studio from adding a BOM at the beginning of the file, so all of my files can stay in UTF-8, and raw strings can contain special characters and be displayed/written without any issues.
