How to make ApprovalTests create UTF-8 files - visual-studio

I use Visual Studio 2019 and have added the ApprovalTests NuGet package. The test class is configured with [UseReporter(typeof(DiffReporter))] and approval is done with Approvals.Verify(result).
It works fine except for the file encoding. In VS the two files open for comparison, but I also get a warning: "These files have different encodings. Left file: Unicode (UTF-8) with signature. Right file: Western European (Windows). You can resolve the difference by saving the right file with the encoding Unicode (UTF-8) with signature."
I can obviously change the right file manually by saving it with a different encoding. That makes the comparison accept the result, but I will then have content with weird-looking escaping in both windows, which makes it much less readable. Example: a simple plus sign is replaced with \u002B.
When debugging the code just before the approval, I can verify that the result looks good, with all characters as I expect them. What happens then? My impression is that the ApprovalTests framework forces an encoding that I cannot control.

Related

Android Studio warnings about gradle build files

How can I solve this problem? "Warning: The project encoding (windows-1252) does not match the encoding specified in the Gradle build files (UTF-8). This can lead to serious bugs."
The answer may be in the link that you posted.
"When you encounter the above problem (which points to this page), either change your IDE settings or build.gradle to UTF-8 so that the two match, or (if necessary) change your encoding to whatever custom encoding you have specified so that the two are in agreement.
(Note: If your source files contain more than plain ASCII characters, you can't "just" change the encoding to UTF-8. If your source files were written with a custom encoding, you'll need to convert them such that the actual characters are read in with the previous encoding and written out with the new encoding.)"
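For the build.gradle side, a common way to pin the compiler encoding in a Java project is a snippet like the following (a sketch; the exact DSL depends on your Gradle version and plugins):

```groovy
// build.gradle -- make javac read all sources as UTF-8,
// matching the IDE's file-encoding setting
tasks.withType(JavaCompile) {
    options.encoding = 'UTF-8'
}
```

With both the IDE's project encoding and this setting at UTF-8, the warning's "two must match" condition is satisfied.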

How can I avoid an encoding conflict when merging files in TFS?

We are working on a large codebase in Visual Studio 2010 using TFS as version control system.
When performing merges we recently get a lot of encoding issues. For most of them we get the option "Choose encoding and merge"; for other files we get:
"The encodings for the files being merged must match or an encoding conversion must be specified."
We normally put every file in UTF-8, yet we get conflicts when merging between codepage 1252 and UTF-8.
To solve these issues we always perform a manual merge, which can be quite cumbersome. How can we avoid these errors? What is the recommended encoding for source-code files in TFS? How can we recursively set the encoding to avoid errors like these in the future?
You are getting the message because TFS thinks (correctly or incorrectly) that a file has different encodings in different branches. To double-check the encoding, go to the properties window of any source-controlled file.
Although TFS detects the encoding when a file is added to source control, if the encoding is later changed TFS will not always pick it up. Click the Set Encoding button and then click Detect to see what the actual encoding is. If it isn't what you expect, then check out the file, modify the encoding in a text editor, and then have TFS re-detect the encoding.
Once the encoding is the same in both branches you shouldn't get this error any longer.
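For the "recursively set the encoding" part, a small script can do the bulk conversion before you let TFS re-detect each file. Here is a minimal Python sketch; the extension list and the Windows-1252 fallback are assumptions about this particular codebase, so adjust both before running it for real:

```python
import os

def convert_to_utf8(root, exts=(".cs", ".xml", ".config")):
    """Heuristically re-encode source files under root as UTF-8.

    Files that already decode as strict UTF-8 (which includes pure
    ASCII) are left alone; anything else is assumed to be in the
    legacy Windows-1252 codepage. That assumption is only safe if
    cp1252 really is your project's legacy encoding.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(exts):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                raw = f.read()
            try:
                raw.decode("utf-8")      # already valid UTF-8: skip
                continue
            except UnicodeDecodeError:
                pass
            text = raw.decode("cp1252")  # legacy-codepage assumption
            with open(path, "wb") as f:
                f.write(text.encode("utf-8"))
```

Because the conversion is idempotent (valid UTF-8 is skipped), it is safe to run more than once; you still need to have TFS re-detect the encoding afterwards, as described above.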

Why does TortoiseHg think Resource.h is binary?

Using Visual Studio 2010. I have a resource.h file which TortoiseHg thinks is binary, so it won't display a diff for it in the commit window. I can easily open the file in a text editor and see that it is plain text.
I saw a related question (Why does Mercurial think my SQL files are binary?) which suggests it has to do with file encoding. Indeed, opening the file in Notepad++ shows it is "UCS-2 Little Endian". How can I fix this? Obviously, I don't want to break some Visual Studio expectation.
For display purposes only, Mercurial treats all files containing NUL bytes as binary, due to a long-standing UNIX convention. This is just about always right, except for UTF-16 (formerly known as UCS-2), where half your file is NUL bytes!
Internally, Mercurial treats all files as binary all the time, so this issue is only relevant for things like whether or not we try to display diffs.
So you have two options:
ignore it, Mercurial will work just fine
use an encoding other than UTF-16
Some web searches for "resource.h utf-16" suggest that VS2010 will be just fine if you save this file in UTF-8 or ASCII, which should be perfectly fine choices for C source code.
http://social.msdn.microsoft.com/Forums/en/vssetup/thread/aff0f96d-16e3-4801-a7a2-5032803c8d83
Try explicitly converting the encoding to UTF-8 or ASCII and see. You can do that from Notepad++'s Encoding menu (choose "Encode in UTF-8").
Visual Studio will work with the UTF-8 file just fine.
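Both the NUL-byte heuristic and the suggested UTF-16-to-UTF-8 conversion fit in a few lines of Python. A sketch (the simplified check only approximates Mercurial's actual test, and the file path is whatever resource.h you want to fix):

```python
def looks_binary(data: bytes) -> bool:
    # Simplified version of the display heuristic described above:
    # any NUL byte makes the file count as binary.
    return b"\0" in data

def utf16_to_utf8(path):
    """Re-save a UTF-16 file as UTF-8 so text diffs work again."""
    with open(path, "r", encoding="utf-16") as f:  # honors the BOM
        text = f.read()
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
```

After the conversion the file contains no NUL bytes, so the heuristic stops classifying it as binary.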

T4 vs UTF-8 vs VS2010

I'm trying to use T4 in VS2010, but I have a weird problem. T4 always shows the error message "A directive was specified in the wrong format", although all directives are in the correct format. It turned out this error was caused by the UTF-8 file prefix (the byte order mark), which is not recognized by T4. Okay, I have to remove it. But every time I try to save this file with ANSI encoding in an external editor, VS2010 changes its encoding back to UTF-8. The same happens when I modify the file in VS2010. So T4 doesn't work again.
Any suggestions?
What Windows and Visual Studio language editions are you using?
T4 supports UTF-8 with/without prefix, as it essentially replicates the encoding of the input template unless otherwise directed. (You have to close/reopen the output file in VS after changing encodings to see the switch.) I'm not able to repro what you're seeing on EN-US Windows and VS.
If you do want to save as ANSI, you can use the File/Advanced Save Options menu in VS and pick a codepage.
I've found the reason; it had nothing to do with encoding, my bad. I copied some text from a sample, and it had the wrong symbol for a quotation mark (looking very similar visually, but having a different character code). That made the T4 parser fail.
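One way to catch this class of mistake is to scan a template for quote-like characters that are not plain ASCII. A Python sketch (the lookalike table is illustrative, not exhaustive):

```python
# Characters that render almost like ASCII quotes but break parsers.
LOOKALIKES = {
    "\u2018", "\u2019",   # curly single quotes
    "\u201c", "\u201d",   # curly double quotes
    "\u00ab", "\u00bb",   # guillemets
}

def find_lookalikes(text):
    """Return (line, column, char) for every suspicious quote mark."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in LOOKALIKES:
                hits.append((lineno, col, ch))
    return hits
```

Running it over a pasted-in template would point straight at the offending character instead of leaving you with a vague directive-format error.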

Visual Studio encoding problems

I have problems with file encodings in Visual Studio 2008. While compiling I'm getting errors like these:
When I try to open a file where a particular error occurs, the encoding window appears:
By default, auto-detect is set. When I change the encoding option to UTF-8, everything works. If I open each problematic file in my project using UTF-8 encoding, the project starts to compile. The problem is I have too many files, and it is ridiculous to open each file and set the encoding to UTF-8. Is there a quick way to do this?
My VS settings are:
I'm using Windows Server 2008 R2.
UPDATE:
For Hans Passant and Noah Richards: thanks for the interaction. I recently changed my operating system, so everything is fresh. I've also downloaded a fresh solution from source control.
In OS regional settings I've changed system locale to Polish(Poland):
In VS I've changed international settings to the same as windows:
The problem is still not solved.
When I open some .cs files using auto-detection for the encoding, and then check Files -> Advanced Save Options..., some of these .cs files have codepage 1250:
but internally look like this:
It is weird, because when I check the properties of such files in source control, they seem to have UTF-8 encoding set:
I don't understand this mismatch.
All other files have UTF-8 encoding:
and open correctly. I have basically no idea what is going wrong, because as far as I know my friend has the same options set as me, and the same project compiles correctly for him. So far he happily hasn't encountered any encoding issues.
That uppercase A with circumflex tells me that the file is UTF-8 (if you look with a hex editor you will probably see that the bytes are C2 A0). That is a non-breaking space in UTF-8.
Visual Studio does not detect the encoding because (most likely) there are not enough high-ASCII characters in the file to help with a reliable detection.
Also, there is no BOM (Byte Order Mark). That would help with the detection (this is the "signature" in the "UTF-8 with signature" description).
What you can do: add BOM to all the files that don't have one.
How to add? Make a file with a BOM only (empty file in Notepad, Save As, select UTF-8 as encoding). It will be 3 bytes long (EF BB BF).
You can copy that at the beginning of each file that is missing the BOM:
copy /b/v BOM.txt + YourFile.cs YourFile_Ok.cs
ren YourFile.cs YourFile_Org.cs
ren YourFile_Ok.cs YourFile.cs
Make sure there is a + between the name of the BOM file and the one of the original file.
Try it on one or two files, and if it works you can create some batch file to do that.
Or a small C# application (since you are a C# programmer), that can detect if the file already has a BOM or not, so that you don't add it twice. Of course, you can do this in almost anything, from Perl to PowerShell to C++ :-)
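That detect-then-add logic really is only a few lines. Here is a sketch in Python rather than C# (which file paths to feed it, and any extension filtering, are left to you):

```python
UTF8_BOM = b"\xef\xbb\xbf"  # the EF BB BF signature described above

def add_bom(path):
    """Prepend a UTF-8 BOM unless the file already starts with one.

    Returns True if the file was modified, False if it already
    had a BOM, so the operation is safe to run repeatedly.
    """
    with open(path, "rb") as f:
        data = f.read()
    if data.startswith(UTF8_BOM):
        return False               # already signed; don't add it twice
    with open(path, "wb") as f:
        f.write(UTF8_BOM + data)
    return True
```

Unlike the copy /b approach, this never double-prepends the signature, so you can point it at a whole tree without first checking each file.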
Once you've opened the files in UTF-8 mode, can you try changing the Advanced Save Options for the file and saving it (as UTF-8 with signature, if you think these files should be UTF-8)?
The auto-detect encoding detection is best-effort, so it's likely that something in the file is causing it to be detected as something other than UTF-8, such as having only ASCII characters in the first kilobyte of the file, or having a BOM that indicates the file is something other than UTF-8. Re-saving the file as UTF-8 with signature should (hopefully) correct that.
If it continues happening after that, let me know, and we can try to track down what is causing them to be created/saved like that in the first place.
