Why would an auto conversion of LF to CRLF by Xerces result in CRCRLF? - xerces-c

From the Xerces documentation on setNewLine, “However, Xerces-C++ always uses LF when this property is set to null since otherwise automatic translation of LF to CR-LF on Windows for text files would result in such files containing CR-CR-LF. If you need Windows-style end of line sequences in your output, consider writing to a file opened in text mode or explicitly set this property to CR-LF.” That statement makes no sense to me.
https://xerces.apache.org/xerces-c/apiDocs-3/classDOMLSSerializer.html#a56882d2fe0b4a0ecb1b3968febbcf4a3
Why an automatic conversion of line endings would result in a duplicate CR is beyond me. I do not understand why that would ever be reasonable. I have tried changing the code to explicitly set the line ending to CR-LF as described in the documentation, and that does not work. I still end up with XML files that have CRCRLF as the line ending, and then I have to remove the duplicate CR manually with a text editor such as Notepad++.
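A minimal sketch of how a doubled CR can arise, written in Python rather than C++ and not taken from Xerces itself: if the serializer already emits CR-LF and the output then passes through a text-mode layer that rewrites every LF as CR-LF, the existing CR is kept and a second one is added in front of the LF. The helper name write_text_mode is made up for this illustration, and io.TextIOWrapper with newline="\r\n" stands in for a file opened in text mode on Windows.

```python
import io


def write_text_mode(text: str) -> bytes:
    """Write text through a layer that translates every LF to CR-LF.

    This imitates a Windows text-mode file; it is an illustration only,
    not what Xerces-C++ does internally.
    """
    raw = io.BytesIO()
    wrapper = io.TextIOWrapper(raw, encoding="utf-8", newline="\r\n")
    wrapper.write(text)
    wrapper.flush()
    data = raw.getvalue()
    wrapper.detach()  # keep the wrapper from closing the buffer later
    return data


# The serializer emits plain LF: the text-mode layer produces CR-LF.
lf_result = write_text_mode("<a/>\n")

# The serializer already emits CR-LF: the LF is translated again,
# so the bytes on disk end in CR-CR-LF.
crlf_result = write_text_mode("<a/>\r\n")
```

If this is what is happening, writing the serializer's CR-LF output to a target opened in binary mode would avoid the second translation. That is an assumption about the setup in the question, which does not show how the output file is opened.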

Related

Prevent Git Gui from highlighting trailing spaces

Git Gui shows spaces at the end of line highlighted with red; how can I turn off this feature?
Apparently (see comments) Git Gui uses the same control knob here as plain command-line git, namely the core.whitespace setting, as described in the git config documentation:
core.whitespace
A comma separated list of common whitespace problems to notice. git diff will use color.diff.whitespace to highlight them, and git apply --whitespace=error will consider them as errors. You can prefix - to disable any of them (e.g. -trailing-space):
blank-at-eol treats trailing whitespaces at the end of the line as an error (enabled by default).
space-before-tab treats a space character that appears immediately before a tab character in the initial indent part of the line as an error (enabled by default).
indent-with-non-tab treats a line that is indented with space characters instead of the equivalent tabs as an error (not enabled by default).
tab-in-indent treats a tab character in the initial indent part of the line as an error (not enabled by default).
blank-at-eof treats blank lines added at the end of file as an error (enabled by default).
trailing-space is a short-hand to cover both blank-at-eol and blank-at-eof.
cr-at-eol treats a carriage-return at the end of line as part of the line terminator, i.e. with it, trailing-space does not trigger if the character before such a carriage-return is not a whitespace (not enabled by default).
tabwidth=<n> tells how many character positions a tab occupies; this is relevant for indent-with-non-tab and when Git fixes tab-in-indent errors. The default tab width is 8. Allowed values are 1 to 63.
(I'm not sure how Git Gui allows you to modify the config, or whether you must do that from a command line. Presumably you want -trailing-space in this case, or maybe just -blank-at-eol.)

Corruption when using certain batch variable names in custom build command

I have a VS2013 project with a custom build command. In the command script I set an environment variable, and read it out again in the same script. I can confirm by calling set that setting the variable works. However, depending on the variable name, I can't read it out again.
The following works as expected when run as a batch script:
set AVAR=xxx
set ABLAH=xxx
set BBLAH=xxx
set DEV=xxx
set #ABLAH=xxx
echo %AVAR%
echo %ABLAH%
echo %BBLAH%
echo %DEV%
echo %#ABLAH%
But produces the following output in the project:
1> xxx
1> «LAH
1> »LAH
1> ÞV
1> xxx
In this case, the name AVAR works, but many others don't. Also, variables starting with # seem to work. Any idea what is going on?
I've found the solution. Visual Studio (msbuild) converts %XX escape sequences like the ones used in URLs. I only expected it to do so in URLs, like browsers do. However, it seems to replace them everywhere.
So when it encounters %ABCDE%, it recognizes %AB and inserts the character « = 0xAB, giving «CDE% to the batch interpreter. But if the code is not a valid hexadecimal number, it silently ignores it, and the interpreter sees the right characters. That's why variable names with # at the beginning always worked.
So the solution is to escape at least every % that stands in front of a valid hex code 00-FF, or better still all of them, with %25.
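The behaviour described above can be imitated with a short Python sketch. It is a simulation of what was observed in the build output, not msbuild's actual implementation, and the function names simulate_unescape and escape_percent are hypothetical.

```python
import re

# A percent sign followed by exactly two hexadecimal digits.
_HEX_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def simulate_unescape(command: str) -> str:
    """Replace each %XX with the character whose code is 0xXX.

    Sequences that are not valid hex (such as %AV or %#A) are left alone,
    which is why some variable names survive.
    """
    return _HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), command)


def escape_percent(command: str) -> str:
    """Escape every percent sign as %25 so it survives the unescaping."""
    return command.replace("%", "%25")


broken = simulate_unescape("echo %ABLAH%")    # %AB becomes U+00AB
intact = simulate_unescape("echo %AVAR%")     # %AV is not hex, unchanged
fixed = simulate_unescape(escape_percent("echo %ABLAH%"))
```

In this simulation the escaped command comes back as echo %ABLAH%, which is what the batch interpreter needs to see.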
An easy solution would be to just edit the corresponding commands in the GUI (via project properties), and not directly in the .vcxproj or .props file. This way, VS inserts the correct escape codes. In my case this was not possible since the commands were defined as user macros (Property Pages: Common Properties/User Macros). My commands span multiple lines, but the user macro editor only supports single lines.
Another thing to watch out for is that it replaces more than just percent signs. Other symbols have special meaning and have to be replaced, too. (This goes beyond XML entities, like & -> &amp;.) Here is a list of special characters from MSDN. The characters are: % $ # ' ; ? *. It doesn't seem to be necessary to replace all of them all the time, but if you notice funky behavior then this is a thing to look at. You can try to enter these characters through the GUI and see how and if VS escapes them in the project file.
One other character to note especially is the semicolon. If you define a property with unescaped semicolons, like <MyPaths>DirA;DirB</MyPaths>, msbuild/VS will internally convert them to newlines (well, or it splits the property into a list or something). But it will still show the paths as separated with semicolons in the property pages! Except when you click the dropdown button next to a property and select <Edit...>, then it will show the paths as a list or separated by newlines! This is completely invisible most of the time, except when you set a property not in XML or the GUI, but by reading the output of a command into a property. In this case the command must output newlines if you want the effect of a semicolon. Otherwise you don't get multiple paths, but one long path with semicolons in it.
In North American and Western European countries, batch files are usually "ASCII" files that use an OEM code page such as code page 850 (OEM multilingual Latin I) or code page 437 (OEM US), and not code page Windows-1252, which is the usual choice for single-byte encoded text files. The code page to use for a batch file depends on the local settings for non-Unicode files in the console. The code page does not matter if the batch file contains only characters with a code value below 128, i.e. if the batch file is a real ASCII file.
Therefore make sure that you edit and save the batch file as an ASCII file using the right code page, and not as a Unicode file using UTF-8, UTF-16 Little Endian or UTF-16 Big Endian. The Visual Studio editor uses UTF-8 encoding for files by default. This is the wrong encoding for batch files.
In the table of code page 850, the character « has the code value 174 decimal (0xAE). In the table of code page 1252, code value 174 is the character ®, which is an indication that you want the batch file to output characters encoded in UTF-8 (® is also code point 174 in Unicode) or Windows-1252.
A simple batch file for demonstration, stored as an ANSI file with code page Windows-1252:
@echo off
cls
echo This batch file was saved as ANSI file using code page Windows-1252.
echo.
echo Registered trademark symbol ® has code value 174 in Windows-1252.
echo.
echo But active code page is not Windows 1252 in console window.
echo.
chcp
echo.
echo Therefore the left guillemet character is output instead of registered
echo trademark symbol as this character has in code page 850 code value 174.
echo.
echo Press any key to continue ...
pause>nul
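The same effect can be checked without a console. The following Python sketch decodes the single byte 174 (0xAE) with several code pages; the function name show_byte_in_code_pages is made up for this illustration, and the codec names are the ones Python's standard library uses for these code pages.

```python
def show_byte_in_code_pages(raw, code_pages=("cp850", "cp437", "cp1252")):
    """Decode the same bytes with each code page and return the results."""
    return {code_page: raw.decode(code_page) for code_page in code_pages}


# Byte 174 (0xAE): a left guillemet in the OEM code pages 850 and 437,
# but the registered trademark symbol in Windows-1252.
decoded = show_byte_in_code_pages(bytes([174]))
```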
Batch files are also meant for DOS/Windows and should therefore use carriage return + line feed as the line terminator instead of just line feed (UNIX) or just carriage return (old Mac).
Some text editors display the line terminator type and the encoding or code page of the active file in the status bar at the bottom of the main application window.

«Inconsistent Line Ending» in Visual Studio when editing from outside VS

I have written a script that checks out a file, changes a value in one line of the file, and checks the code back in. But after that, when I open the file, Visual Studio shows me this popup:
Inconsistent line Ending
The Line endings in the following file are
not consistent. Do you want to normalize the line ending.
Is there a way to avoid this? When I compare I do not see any difference. Would it cause any issues for compiling the program?
The problem you ran into is a mismatch in line-ending conventions. I bet that the script you wrote to change the file inserts a line ending like \n. That is the «*nix» notation, usually also called «LF». The Windows notation for a newline requires two characters for some (I guess historic) reason and is called «CR/LF». That is, your script needs to insert not just \n, but \r\n. Just for your interest, there is also a plain «CR» notation, i.e. \r, which was used on older Macs.
The message you see complains that the file now has different line endings. That is, every line in the file was most likely in «CR/LF», and now there is a line in another notation. You ought to use the same line notation throughout the file, regardless of whether it is the «Unix», «Mac», or «Windows» one.
When I compare I do not see any difference.
It is non-printable characters, and usually not shown in text-diff utilities.
Would it cause any issues for compiling the program?
It is unlikely to cause any compile problems. Anyway, now you know what the problem is and how to fix it.
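Since the characters are not visible in most diff tools, counting them is a quick way to confirm the diagnosis. This is a small Python sketch under the assumption that the file is read as raw bytes; the function names count_line_endings and normalize_to_crlf are hypothetical.

```python
import re


def count_line_endings(data: bytes) -> dict:
    """Count CR-LF pairs, lone LFs and lone CRs in the raw file content."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,
        "LF": data.count(b"\n") - crlf,  # LFs not preceded by a CR
        "CR": data.count(b"\r") - crlf,  # CRs not followed by an LF
    }


def normalize_to_crlf(data: bytes) -> bytes:
    """Rewrite every line ending (CR-LF, lone CR, lone LF) as CR-LF."""
    return re.sub(rb"\r\n|\r|\n", b"\r\n", data)


# One line edited by a script that wrote \n into a CR-LF file.
sample = b"first\r\nsecond\nthird\r\n"
counts = count_line_endings(sample)
cleaned = normalize_to_crlf(sample)
```

A file with more than one non-zero count is the kind of file that triggers the Visual Studio message.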
This is the code I used in PowerShell to save the file and normalize its line endings:
$enc = New-Object System.Text.UTF8Encoding( $false ) # required to save the file with UTF8 Without BOM
$wrt = New-Object System.XML.XMLTextWriter( $phyicalPath, $enc )
$wrt.Formatting = 'Indented'
$webconfig.Save($wrt)
$wrt.Close()
(Get-Content $phyicalPath)|Set-Content -Path $phyicalPath -Force # normalize line ending

Unknown Character

I am facing a strange issue with an unknown character.
I am trying to compile some packages in a database through a script and got the error below:
SP2-0734: unknown command beginning "?SET DEF..." - rest of line ignored.
When I open the log file in Notepad++, it shows the line as shown above.
Now, if I open the same log file in the SciTE editor, it shows the same line as:
SP2-0734: unknown command beginning "SET DEF..." - rest of line ignored.
I don't understand what the issue could be.
Any help would be welcomed.
Your script has an unprintable character at the start (as you discovered from comments), which some editors don't display at all, and others display as an unknown character. "ï»¿" is the byte order mark:
The UTF-8 representation of the BOM is the byte sequence
0xEF,0xBB,0xBF. A text editor or web browser interpreting the text as
ISO-8859-1 or CP1252 will display the characters ï»¿ for this.
According to that article, some editors (notably Notepad) add it automatically. It should be safe to open the file with a hex editor and remove the extra character, and you'll then be able to run the script normally.
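As an alternative to a hex editor, the three bytes can be removed with a few lines of Python. This is a minimal sketch that works on the raw bytes of the script; the function name strip_utf8_bom and the sample content are made up for illustration, while codecs.BOM_UTF8 is the standard library constant for the sequence 0xEF,0xBB,0xBF.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 byte order mark, if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Hypothetical script content that starts with a BOM.
with_bom = codecs.BOM_UTF8 + b"SET DEFINE OFF\n"
without_bom = strip_utf8_bom(with_bom)

# What an editor shows when it misreads the BOM bytes as CP1252.
misread = codecs.BOM_UTF8.decode("cp1252")
```

To apply it to a file, read the file in binary mode, pass the bytes through the function, and write the result back in binary mode.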

What's with Ruby's ZipInputStream screwing up my line endings?

I'd be happy with ZipInputStream taking indecent liberties with the line endings that are stored in a file if it would at least get them right for the platform I'm storing the file on. Unfortunately, when I pull a text file (.txt, .cpp, etc.) out of a zip, the \n (0x0A) gets replaced with \r\n (0x0D0A) and, as you can imagine, this is causing me a great deal of trouble.
Is there a flag I can set to tell it either to avoid changing the line endings altogether or to use one of my choosing?
Thanks.
(I've checked the zip file, my creation of it, etc. I've extracted it using other zip tools and verified that it is archived properly. I've stepped through my project with rdebug and seen that the ZipInputStream call to read() is returning \r\n for line endings.)
If you have an open(filename) or open(filename,"r") call in your code, try replacing it with open(filename,"rb").

Resources