How do I get patch to ignore carriage returns? - whitespace

I'm attempting to apply a patch to a file with Windows line endings on a Linux system and I'm getting conflicts due to the carriage returns in the file.
The -l option (ignore whitespace) isn't ignoring the EOL characters. Is there any way to get patch to ignore Windows-style line endings?

Try using the --binary option, from the manpage (emphasis mine)
--binary
Write all files in binary mode, except for standard output and /dev/tty. When reading, disable the heuristic for transforming CRLF line endings into LF line endings. (On POSIX-conforming systems, reads and writes never transform line endings. On Windows, reads and writes do transform line endings by default, and patches should be generated by diff --binary when line endings are significant.)
I don't fully understand the above, but it worked for me on a Linux machine to apply a Unix patch onto a DOS file.

Here's a link http://www.chemie.fu-berlin.de/chemnet/use/info/diff/diff_2.html
The -w and --ignore-all-space options ignore differences even if one file has white space where the other file has none. White space characters include tab, newline, vertical tab, form feed, carriage return, and space.
Run diff like: diff -w file1.txt file2.txt

I work around this using the following commands to convert all files of interest to unix line endings.
dos2unix `grep Index\: mixed-line-ending.patch | sed -e 's/Index\://'`
dos2unix mixed-line-ending.patch
patch -p0 < mixed-line-ending.patch
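If dos2unix is not installed, the same conversion can be sketched with tr. The helper name to_lf is made up for this example, and it assumes the files are plain text in which every CR belongs to a CRLF line ending:

```shell
#!/bin/sh
# to_lf FILE...: strip every carriage return from each file, in place.
# Hypothetical helper; a stand-in for dos2unix, not a full replacement.
to_lf() {
  for f in "$@"; do
    tmp=$(mktemp) || return 1
    # tr removes all CR bytes, including lone ones, so only use this on
    # text files where CR appears solely as part of CRLF.
    # Writing back with cat keeps the original file's permissions.
    tr -d '\r' < "$f" > "$tmp" && cat "$tmp" > "$f"
    rm -f "$tmp"
  done
}
```

With that in place, to_lf mixed-line-ending.patch followed by the patch -p0 command above should behave like the dos2unix variant.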

I had this problem with a diff that was manually copied and pasted from git diff console output, into a patch file with LFs. To get that patch file to work again - to be able to be applied on the actual files that were using CRs and LFs - several things had to be done manually:
find all instances of "^M" and drop them
add CR to all lines within the hunks - but not the meta format lines (@@ etc)
on all lines within hunks that were empty, add the missing space in the first column
joe syntax highlighting was very helpful there, because it colored hunks properly as soon as I fixed them.
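The manual steps above could be sketched as a small awk filter. This is a sketch under assumptions, not the procedure the author actually followed: the function name crlf_hunks is made up, it only understands unified diffs, it only drops real CR bytes (not a pasted literal "^M"), and it trusts the line counts in each hunk header (the @@ lines) to tell hunk bodies from meta lines.

```shell
#!/bin/sh
# crlf_hunks: read a unified diff on stdin and write it to stdout with
# CR LF endings on hunk body lines only. Hypothetical helper.
crlf_hunks() {
  awk '
    # Hunk header: remember how many old and new lines the body holds.
    /^@@ / && old <= 0 && new <= 0 {
      split($2, a, ","); old = (a[2] == "") ? 1 : a[2] + 0
      split($3, b, ","); new = (b[2] == "") ? 1 : b[2] + 0
      print; next
    }
    # Inside a hunk body.
    old > 0 || new > 0 {
      sub(/\r$/, "")                  # drop a stray CR so it is not doubled
      if ($0 == "") $0 = " "          # restore the column-one space
      c = substr($0, 1, 1)
      if (c == "\\") { print; next }  # "\ No newline at end of file" marker
      if (c == " ")      { old--; new-- }
      else if (c == "-") { old-- }
      else if (c == "+") { new-- }
      printf "%s\r\n", $0
      next
    }
    # File headers and anything else outside a hunk stay untouched.
    { print }
  '
}
```

It would be used as crlf_hunks < broken.patch > fixed.patch, where both file names are placeholders.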

Tell patch to ignore white space:
-l, --ignore-whitespace
Causes the pattern matching to be done loosely, in case the tabs
and spaces have been munged in your input file. Any sequence of
whitespace in the pattern line will match any sequence in the
input file. Normal characters must still match exactly. Each
line of the context must still match a line in the input file.
This ignores mismatches of EOLs too -- at least, on FreeBSD, using patch version 2.0-12u11.

Related

How to generate and apply git patches correctly in Powershell (vs bash)?

Please, observe the following short scenario (this is in Powershell):
PS> git diff -U3 -r -M HEAD -- .\Metadata\LegacyTypeModules\xyz.Web.Main.draft.json | Out-File -Encoding ascii c:\temp\1.diff
PS> git apply --cached C:\temp\1.diff
error: patch failed: Metadata/LegacyTypeModules/xyz.Web.Main.draft.json:69
error: Metadata/LegacyTypeModules/xyz.Web.Main.draft.json: patch does not apply
This fails because the last line in the file does not end with CRLF:
However, the same exact commands work when run in bash:
$ git diff -U3 -r -M HEAD -- Metadata/LegacyTypeModules/xyz.Web.Main.draft.json > /c/Temp/2.diff
$ git apply --cached /c/Temp/2.diff
P11F70F@L-R910LPKW MINGW64 /c/xyz/tip (arch/1064933)
The difference between the two patches is:
So the problem seems to happen because Powershell terminates each line going through the pipe with CRLF whereas bash preserves the original line endings.
I understand why this happens - Powershell operates with objects and the objects are strings excluding the EOL characters. When writing to file Powershell converts objects to strings (in the case of strings the conversion is a nop) and uses the default EOL sequence to delimit the lines.
Does it mean Powershell cannot be used at all in EOL sensitive scenarios?
Indeed:
PowerShell invariably decodes output from external programs as text (using [Console]::OutputEncoding)
It then sends the decoded output line by line through the pipeline, as lines become available.
A file-output cmdlet such as Out-File then invariably uses the platform-native newline sequence - CRLF on Windows - to terminate each (stringified) input object when writing to the target file. It writes using its default character encoding, or the one specified via -Encoding, which is technically unrelated to the encoding that was used to decode the external-program output.
In other words: PowerShell pipelines (and redirections) do not support passing raw binary data through, as of PowerShell 7.2 - future raw-data support is being discussed in GitHub issue #1908.
Workarounds:
Manually join and terminate the decoded output lines with LF newlines ("`n"), and write the resulting multi-line string as-is (-NoNewLine) to the target file, as shown in zdan's helpful answer.
In this simple case it is easiest to delegate to cmd.exe /c, given that cmd.exe's pipelines and redirections are raw byte conduits:
cmd /c @'
git diff -U3 -r -M HEAD -- .\Metadata\LegacyTypeModules\xyz.Web.Main.draft.json > c:\temp\1.diff
'@
Note the use of a here-string for readability and ease of any embedded quoting (n/a here).
You could try using join to replace the CRLF with unix EOL:
(git command arguments . . .) -join "`n" | out-file c:\temp\1.diff -NoNewline
A standard diff (also known as a patch) terminates lines with LF line endings. That's because that's what POSIX specifies for the output of diff. All lines must contain an LF line ending.
When a CR precedes the LF in a patch, it is considered part of the content to be patched. Therefore, the patch in your situation likely doesn't apply because the old content is being listed as having CRLF line endings, which it does not.
Unfortunately, PowerShell is completely broken in this regard and its pipelines corrupt data passing through them. This is especially true for any sort of binary data. If you're using any sort of tool designed to run on Unix like Git, you'll need to avoid using PowerShell's pipes.
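To check whether a patch file carries CRs before applying it, the line endings can be made visible from a Unix-style shell such as Git Bash. A small sketch using only sed and grep; the function names show_eol and count_crlf are made up for illustration:

```shell
#!/bin/sh
# show_eol FILE: print each line unambiguously; a CR shows up as \r and
# the end of every line is marked with $. Long lines may be wrapped.
show_eol() {
  sed -n l "$1"
}

# count_crlf FILE: print the number of lines that end in CR LF.
count_crlf() {
  cr=$(printf '\r')
  # grep -c exits non-zero when the count is 0, so ignore its status.
  grep -c "${cr}\$" "$1" || true
}
```

If count_crlf reports a non-zero number for a patch whose target file uses plain LF endings, the CRs were most likely added on the way to the file.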

Why is there no such file or directory_profile?

I am using Windows and MobaXterm.
I created a .bash_profile file in the ~ directory and the following line
alias sbp="source ~/.bash_profile"
is the only code in that file.
However, when I was trying to do sbp, I got an error.
This works on my Mac and it used to work on my old Windows computer (but that one has some water damage so it broke down). Why does this not work now?
Thanks in advance!
From the way that error message is garbled I'm pretty sure that the .bash_profile file you created has DOS/Windows-style line endings, consisting of a carriage return character followed by a newline character. Unix tools expect unix-style line endings consisting of just a newline; if they see DOS/Windows-style endings, they'll treat the carriage return as part of the content of the line. In this case, bash will treat the carriage return as part of the alias definition, and therefore part of the filename to source. Try running alias sbp | cat -vt to print the alias with invisible characters shown; my guess is it'll print alias sbp='source ~/.bash_profile^M' (where the ^M is cat -vt's way of representing the carriage return).
Solution: convert the file to unix format, and either switch to a text editor that knows how to save in unix format, or change your settings in the current editor to do it. For conversion, there are a number of semi-standard tools like dos2unix and fromdos. If you don't have any of those, this answer has some other options.
BTW, the reason the error message is garbled is that the CR gets printed as part of the error message, and the terminal treats that as an instruction to go back to the beginning of the line; it then prints the rest of the message over top of the beginning of the message. It's a little like this:
-bash: /home/dir/path/.bash_profile
: No such file or directory
...but with the second line printed over the first, so it comes out as:
: No such file or directory_profile
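The overprinting can be imitated without a terminal. The following is only a rough model of what a terminal does with a carriage return (real terminals handle many more control codes), and the function name render_cr is made up:

```shell
#!/bin/sh
# render_cr: read lines on stdin and print each one the way it would
# look on screen if every CR moved the cursor back to column one.
render_cr() {
  awk '{
    n = split($0, seg, "\r")
    line = seg[1]
    for (i = 2; i <= n; i++) {
      # Text after a CR is drawn from column one, over what is already
      # there; anything longer than the new text stays visible.
      line = seg[i] substr(line, length(seg[i]) + 1)
    }
    print line
  }'
}

printf '%s\r%s\n' '-bash: /home/dir/path/.bash_profile' \
  ': No such file or directory' | render_cr
# prints ": No such file or directory_profile"
```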

In shell script, colon(:) is being treated as an operator for variable creation

I have following snippet:
host="https://example.com"
port="80"
url="${host}:${port}"
echo $url
the output is:
:80ps://example.com
How can I escape the colon here? I also tried:
url="${host}\:${port}"
but it did not work.
Expected output is:
https://example.com:80
You've most likely run into what I call the Linefeed-Limbo.
If I copy the code you provided from StackOverflow and run it on my machine (bash version 4.4.19(1)), then it outputs correctly
user@host:~$ cat script.sh
host="https://example.com"
port="80"
url="${host}:${port}"
echo $url
user@host:~$ bash script.sh
https://example.com:80
What is Linefeed-Limbo?
Different operating systems use different ASCII symbols to represent when a new line occurs in a text, such as a script. This Wikipedia article gives a good introduction.
As you can see, Unix and Unix-like systems use the single character \n, also called a "Line Feed". Windows, as well as other systems, use \r\n, so a "carriage return" followed by a "line feed".
What happens now is that when you write a script on Windows in an editor such as Notepad, what you write is host="example.com"\r\n. When you copy this file to Linux, Linux interprets the \r as if it were part of the script, since only \n is considered a new line. And indeed, when I change my newline style to DOS-style, I get the exact output you get.
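The effect can be reproduced on any Unix-like machine without a Windows editor. A minimal sketch, assuming sh and a POSIX sed are available; the function name demo_crlf_script is made up:

```shell
#!/bin/sh
# demo_crlf_script: write the four-line script with CRLF endings, run it,
# and print its output with control characters made visible.
demo_crlf_script() {
  script=$(mktemp) || return 1
  # printf reuses the format for every argument, so each line gets \r\n.
  printf '%s\r\n' \
    'host="https://example.com"' \
    'port="80"' \
    'url="${host}:${port}"' \
    'echo $url' > "$script"
  # sed -n l shows each CR as \r and marks the end of the line with $.
  sh "$script" | sed -n l
  rm -f "$script"
}

demo_crlf_script
```

The \r sequences in the output, directly after example.com and after 80, are the carriage returns that ended up inside the variables.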
How can I fix this?
You have several options to fix this issue.
Converting the script (with dos2unix)
Since all you need to do is replace every instance of \r\n with \n, you could use any text-editing software you want. However, if you like simple solutions, then dos2unix (and its sister unix2dos) might be what you are looking for:
user@host:~$ dos2unix script.sh
dos2unix: converting file script.sh to Unix format...
That's it. Run your file now and you will see it behaves well.
Encoding the source-file correctly
By using a more advanced text editor such as Notepad++, you can define which style of newline you would like to use.
By changing the newline-type to whichever system you intend to run your script on, you will not run into any problems like this anymore.
Bonus round: Why does it output :80ps://example.com?
To understand why your output is like this, you have to look at what your script is doing, and what \r means.
Try thinking of your terminal as an old-fashioned typewriter. Returning the carriage means you start writing on the left again. Making a "new line" means sliding the paper. These two things are separate, and I think that's why some systems decided to use these two characters as a logical "new line".
But I digress. Let's look at the first line, host="https://example.com"\r.
What this means when printed is "Print https://example.com, then put the carriage back at the start". When you then print :80\r, it doesn't start after ".com"; it starts at the beginning of the line, because that's where you (unknowingly) told the cursor to go. It then overwrites the first few characters, resulting in ":80ps://example.com" being written. Keep in mind that after 80 you again placed a carriage return symbol, so any new text you write afterwards ends up overwriting the beginning again.
It works for me. Try removing the carriage returns from the variables and then run it again.
new_host=$(echo "$host" | tr -d '\r')
new_port=$(echo "$port" | tr -d '\r')
new_url="${new_host}:${new_port}"

Sed deletion affecting all lines in file

Disclaimer: I have very little experience with sed.
I'm trying to automate a somewhat tedious task by use of a bash script, and one of the steps is to remove a single line from a Maven pom.xml file, which I'm trying to do with the following sed command:
sed -i '/<module>'"${MODULE_NAME}"'<\/module>/d' ./pom.xml
It seems to work fine. The problem is that all lines of the pom are affected by this, as a git diff call shows that 1,680 lines have been added and 1,681 lines have been removed.
This is obviously a pain, as it makes it very hard for code reviewers to spot the one-line difference. Is there a way to make sed perform this deletion without it affecting the other lines in the file?
EDIT: When opening the project in IntelliJ, the diff is correctly recognized as just a single line. In addition it seems that sed has changed line break style from CRLF to LF (yes, I'm on Windows). Would this be enough to trigger all lines to be different in git?
$ git diff --ignore-space-at-eol
edit
To avoid unintended line-ending changes, you can configure Git to convert line endings automatically using core.autocrlf, or run the unix2dos program to fix the file after sed.
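If unix2dos is not available, the CRLF endings could be put back with a short awk filter. This is a sketch that assumes an awk which passes bytes through unchanged (as on a typical Unix-like system; ports that open files in text mode may behave differently), and the function name to_crlf is made up:

```shell
#!/bin/sh
# to_crlf: read text on stdin and write it to stdout with CRLF endings.
# Lines that already end in CR are not given a second one.
to_crlf() {
  awk '{ sub(/\r$/, ""); printf "%s\r\n", $0 }'
}
```

Running the edited file through it, for example to_crlf < pom.xml > pom.xml.new and then moving pom.xml.new over pom.xml, should leave only the deleted line in the diff. The pom.xml.new name is a placeholder.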

sed can not work in script file in Windows

I once wrote a simple sed command like this:
s/==/EQU/
When I run it on the command line:
sed 's/==/EQU' filename
it works well and replaces '==' with 'EQU'. But when I write the command to a script file named replace.sed and run it this way:
sed -f replace.sed filename
there is an error that says:
sed: file replace.sed line 1: unknown option to 's'
What I want to ask is: is there any problem with my script file replace.sed when it runs on Windows?
The unknown option is almost invariably a rogue character after the trailing / (which is missing from your command line version, by the way, so it should complain about an unterminated command).
Have a look at your replace.sed again. You may have a funny character at the end, which could include the ' if you forgot to delete it, or even a CTRL-M DOS-style line ending, though CygWin seems to handle this okay - you haven't specified which sed you're using (that may help).
Okay, based on your edit, it looks like one of my scattergun of suggestions was right :-) You had CTRL-M at the end of the line because of the CR/LF line endings:
At the end of each line in the *.sed file there was a 'CR\LF' pair, and that was the problem, but you cannot see it by default. I used Notepad to delete them manually and that fixed the problem. But I have not found a way to delete them automatically, or to avoid this new-line style when editing a new text file in Windows.
You may want to get your hands on a more powerful editor like Notepad++ or gVim (my favourite) but, in fact, you do have a tool that can get rid of those characters :-) It's called sed.
sed 's/\015//g' replace.sed >replace2.sed
should get rid of all the CR characters from your file and give you a replace2.sed that you can use for your real job.
