Setting grep end of line character - windows

I have a fairly simple bash script that does some grep to find all the text in a file that does not match a pattern.
grep -v "$1" original.txt > trimmed.txt
The input file ends each line with the Windows line end characters, i.e., with a carriage return and a line feed CR LF.
The output of this command (run in Cygwin) ends each line with an extra carriage return, i.e., CR CR LF.
How do I tell grep to just use CR LF?

I think you can only configure the EOL setting during Cygwin install.
If you run your original file through dos2unix first, grep should be able to process it properly (you may wish to run the result through unix2dos afterwards to restore the original EOLs).
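The same round trip can be sketched without dos2unix and unix2dos, using only tr and awk. The function name grep_v_crlf is hypothetical, and this has not been checked against Cygwin's text-mode mounts, so treat it as an illustration of the idea rather than a confirmed fix. Note that tr deletes every CR, including any that appear in the middle of a line.

```shell
# Hypothetical helper (not part of grep or Cygwin): apply "grep -v" to CRLF
# text read from stdin and write CRLF text to stdout.
grep_v_crlf() {
    # 1. tr removes every CR, so grep sees plain LF-terminated lines.
    # 2. grep -v drops the lines that match the pattern given in $1.
    # 3. awk appends CR LF to every line that survived.
    tr -d '\r' | grep -v -- "$1" | awk '{ printf "%s\r\n", $0 }'
}
```

In the script from the question it would be called as grep_v_crlf "$1" < original.txt > trimmed.txt.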

Related

How to solve "bad interpreter: No such file or directory"

I'm trying to run a sh script and get the following error on Mac:
/usr/bin/perl^M: bad interpreter: No such file or directory
How can I fix this?
Remove ^M control chars with
perl -i -pe 'y|\r||d' script.pl
/usr/bin/perl^M:
Remove the ^M at the end of /usr/bin/perl in the #! line at the beginning of the script. It is a spurious ASCII 13 (carriage return) character that is making the shell go crazy.
Possibly you would need to inspect the file with a binary editor if you do not see the character.
You could convert the file to Unix (LF) line endings, which is what macOS expects, like this:
$ vi your_script.sh
once in vi type:
:set ff=unix
:x
The problem is that you're trying to use DOS/Windows text format on Linux/Unix/OSX machine.
In DOS/Windows text files a line break, also known as newline, is a combination of two characters: a Carriage Return (CR) followed by a Line Feed (LF). In Unix text files a line break is a single character: the Line Feed (LF).
dos2unix can convert the line endings of a file for you, for example:
dos2unix yourfile
For help, run: man dos2unix.
You seem to have weird line endings in your script: ^M is a carriage return \r. Transform your script to Unix line endings (just \n instead of \r\n, which is the line ending on Windows systems).
And if you prefer Sublime Text - simply go to View -> Line Endings and check Unix.
An alternative approach:
sudo ln -s /usr/bin/perl /usr/local/bin/perl`echo -e '\r'`
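Several of the answers above attribute the error to a carriage return at the end of the #! line. A minimal sketch for checking that and for stripping the carriage returns with standard tools follows; the function names shebang_has_cr and strip_cr_in_place are hypothetical.

```shell
# Hypothetical helpers for illustration.

# Succeeds (exit status 0) when the first line of the file ends in a CR.
shebang_has_cr() {
    head -n 1 "$1" | grep -q "$(printf '\r')\$"
}

# Rewrites the file without any carriage returns. Writing back with cat
# (instead of mv) keeps the file's permissions, e.g. its executable bit.
strip_cr_in_place() {
    tmp=$(mktemp) || return 1
    tr -d '\r' < "$1" > "$tmp" && cat "$tmp" > "$1"
    rm -f "$tmp"
}
```

A typical use would be: if shebang_has_cr script.pl; then strip_cr_in_place script.pl; fi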

remove CR line terminators

First, I have read this post, but I still have problems with CR line terminators.
There is a file called build_test.sh. I edited it in leafpad, and it displays correctly in Vim:
cp ~/moonbox/llvm-2.9/lib/Transforms/MY_TOOL/$1 test.cpp
cd ~/moonbox/llvm-obj/tools/TEST_TOOL/
make
make install
cd -
However:
Using cat build_test.sh it outputs nothing.
Using more build_test.sh it outputs: cd - install/llvm-obj/tools/TEST_TOOL/Y_TOOL/$1 test.cpp
Using less build_test.sh it outputs: cp ~/moonbox/llvm-2.9/lib/Transforms/MY_TOOL/$1 test.cpp^Mcd ~/moonbox/llvm-obj/tools/TEST_TOOL/^Mmake^Mmake install^Mcd -
The result of file build_test.sh is:
build_test.sh: ASCII text, with CR line terminators
After following this post, the ^M characters are gone, but so are the line breaks :-(
The result of file build_test_nocr.sh is now:
build_test_nocr.sh: ASCII text, with no line terminators
The solution can be seen here.
However, I would still like to know why cat displays nothing and why more displays such an odd result, and also why dos2unix and set fileformat=unix in Vim fail in this case.
PS: I guess that my editor (Vim or leafpad?) generates only \r rather than \n for the newline. How can that be?
Simple \r terminators for newlines are "old Mac" line terminators, it is strange that an editor in 2012+ even generates files with such line terminators... Anyway, you can use the mac2unix command, which is part of the dos2unix distribution:
# Edits thefile inline
mac2unix thefile
# Takes origfile as an input, outputs to dstfile
mac2unix -n origfile dstfile
This command will not munge files that already have the expected line terminators, which is a bonus. The reverse (unix2mac) also exists.
Note that mac2unix is the same as dos2unix -c mac.
Also, if you work with vim, you can enforce UNIX line endings by executing
:set fileformat=unix
:w
or just add
set fileformat=unix
to your .vimrc file
I finally figured out that I could use this command:
tr '^M' '\n' <build_test.sh >build_test_nocr.sh
where ^M is entered by pressing Ctrl+V followed by Enter. Alternatively, this has the same effect:
tr '\r' '\n' <build_test.sh >build_test_nocr.sh
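A terminal treats a lone CR as "return to the start of the line", so every command in a CR-only file is drawn over the previous one, which fits the overlapping output of more shown in the question. By default dos2unix only rewrites CR LF pairs, which is why it leaves such a file alone. The tr command above can be wrapped in a small guard so that it only touches files that really are CR-only; the function name cr_only_to_lf is hypothetical.

```shell
# Hypothetical helper: convert a CR-only ("old Mac") file to LF in place.
# Files that already contain an LF are left untouched, because for CRLF files
# the right fix is to delete the CR, not to turn it into a second newline.
cr_only_to_lf() {
    # Count CR and LF bytes; $(( )) also strips any padding printed by wc.
    cr_count=$(( $(tr -cd '\r' < "$1" | wc -c) ))
    lf_count=$(( $(tr -cd '\n' < "$1" | wc -c) ))
    if [ "$cr_count" -gt 0 ] && [ "$lf_count" -eq 0 ]; then
        tmp=$(mktemp) || return 1
        tr '\r' '\n' < "$1" > "$tmp" && cat "$tmp" > "$1"
        rm -f "$tmp"
    fi
}
```

For the file in the question this would be run as cr_only_to_lf build_test.sh.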

Shell Scripting unwanted '?' character at the end of file name

I get an unwanted '?' at the end of my file name while doing this:
emplid=$(grep -a "Student ID" "$i".txt | sed 's/(Student ID: //g' | sed 's/)Tj//g' )
#gets emplid by doing a grep from some text file
echo "$emplid" #prints employee id correctly
cp "$i" "$emplid".pdf #getting an extra '?' character after emplid and before .pdf
i.e., instead of getting a file name like 123456.pdf, I get 123456?.pdf.
Why is this happening if the echo prints correctly?
How can I remove the trailing question mark character?
It sounds like your script file has DOS-style line endings (\r\n) instead of unix-style (just \n) -- when a script is in this format, the \r gets treated as part of the commands. In this instance, it's getting included in $emplid and therefore in the filename.
Many platforms support the dos2unix command to convert the file to unix-style line endings. And once it's converted, stick to text editors that support unix-style text files.
EDIT: I had assumed the problem line endings were in the shell script, but it looks like they're in the input file ("$i".txt) instead. You can use dos2unix on the input file to clean it and/or add a cleaning step to the sed command in your script. BTW, you can have a single instance of sed apply several edits with the -e option:
emplid=$(grep -a "Student ID" "$i".txt | sed -e 's/(Student ID: //g' -e 's/)Tj//g' -e $'s/\r$//' )
I'd recommend against using sed 's/.$//' -- if the file is in unix format, that'll cut off the last character of the filename.
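The difference between the two sed expressions can be seen on a sample ID. This is a small sketch; the function name strip_trailing_cr is hypothetical, and it builds the carriage return with printf so that it does not rely on sed understanding \r.

```shell
# Hypothetical helper: remove one trailing CR from each line of stdin, if present.
strip_trailing_cr() {
    sed "s/$(printf '\r')\$//"
}

# CRLF input: both approaches remove the trailing carriage return.
printf '123456\r\n' | strip_trailing_cr   # prints 123456
printf '123456\r\n' | sed 's/.$//'        # prints 123456

# LF-only input: only the CR-specific pattern leaves the ID intact.
printf '123456\n' | strip_trailing_cr     # prints 123456
printf '123456\n' | sed 's/.$//'          # prints 12345
```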
You can use the file command to detect whether a file is pure Unix or contains DOS line endings.
A DOS file is reported as: ASCII text, with CRLF line terminators
A Unix file is reported as plain: ASCII text
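Where the file command is not available, a rough equivalent can be sketched with grep by comparing the number of lines that end in CR with the total number of lines. The function name eol_style is hypothetical, and the sketch does not recognize CR-only ("old Mac") files.

```shell
# Hypothetical helper: print "dos", "unix", or "mixed" for the file in $1.
eol_style() {
    # grep -c exits non-zero when the count is 0, hence the "|| true".
    total=$(grep -c '' "$1" || true)                  # all lines
    crlf=$(grep -c "$(printf '\r')\$" "$1" || true)   # lines ending in CR
    if [ "$crlf" -eq 0 ]; then
        echo unix
    elif [ "$crlf" -eq "$total" ]; then
        echo dos
    else
        echo mixed
    fi
}
```

For the case in the question it would be called as eol_style "$i".txt.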

How to fix inconsistent line endings for whole VS solution?

Visual Studio will detect inconsistent line endings when opening a file and there is an option to fix it for that specific file. However, if I want to fix line endings for all files in a solution, how do I do that?
Just for a more complete answer, this worked best for me:
Replace
(?<!\r)\n
with
\r\n
in entire solution with "regEx" option.
This adds the missing \r to every line ending that lacks one and leaves lines that already end in \r\n untouched. It uses a negative lookbehind to check that there is no \r in front of the \n.
Be careful with the other solutions: They will either modify all lines in all files (ignoring the original line ending) or delete the last character of each line.
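For files processed outside Visual Studio, the same "add \r only where it is missing" rule can be sketched as a shell filter. This assumes an awk is available (for example in Git Bash or Cygwin); the function name to_crlf is hypothetical. Unlike the regex above, it also terminates a final line that has no line ending at all.

```shell
# Hypothetical helper: normalize stdin to CRLF line endings.
to_crlf() {
    # Drop an existing trailing CR, then always write CR LF, so lines that
    # are already correct are not given a second CR.
    awk '{ sub(/\r$/, ""); printf "%s\r\n", $0 }'
}
```

It could be applied to one file with to_crlf < file.cpp > file.cpp.tmp && mv file.cpp.tmp file.cpp.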
You can use the Replace in Files command and enable regular expressions. For example, to replace end-of-lines that have a single linefeed "\n" (like, from GitHub, for example) with the standard Windows carriage-return linefeed "\r\n", search for:
([^\r]|^)\n
This says to create a group (that's why the parentheses are required), where the first character is either not a carriage-return or is the beginning of a line. The beginning of the line test is really only for the very beginning of the file, if it happens to start with a "\n". Following the group is a newline. So, you will match ";\n" which has the wrong end-of-line, but not "\r\n" which is the correct end-of-line.
And replace it with:
$1\r\n
This says to keep the group ($1) and then replace the "\n" with "\r\n".
Try doing
Edit > Advanced > Format Document
Then save the document. As long as the file doesn't get modified by another external editor, it should stay consistent. This fixed it for me.
If you have Cygwin with the cygutils package installed, you can use this chain of commands from the Cygwin shell:
unix2dos -idu *.cpp | sed -e 's/ 0 [1-9][0-9]//' -e 's/ [1-9][0-9]* 0 //' | sed '/ [1-9][0-9] / !d' | sed -e 's/ [1-9][0-9 ] //' | xargs unix2dos
(Replace the *.cpp with whatever wildcard you need)
To understand how this works, the unix2dos command is used to convert the files, but only files that have inconsistent line endings (i.e., a mixture of UNIX and DOS) need to be converted. The -idu option displays the number of dos and unix line endings in the file. For example:
0 491 Acad2S5kDim.cpp
689 0 Acad2S5kPolyline.cpp
0 120 Acad2S5kRaster.cpp
433 12 Acad2S5kXhat.cpp
0 115 AppAuditInfo.cpp
Here, only the Acad2S5kXhat.cpp file needs to be converted. The sed commands filter the output to produce a list of just the files that need to be converted, and these are then processed via xargs.
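The filtering step of that pipeline can also be sketched with a single awk condition instead of the chain of sed commands. This assumes the three-column output shown above (DOS count, Unix count, file name) and file names without spaces; the function name mixed_eol_files is hypothetical.

```shell
# Hypothetical filter: reads "DOS_COUNT UNIX_COUNT FILENAME" lines, as printed
# by "unix2dos -idu", and prints the names of files in which both counts are
# non-zero, i.e. files with mixed line endings.
mixed_eol_files() {
    awk '$1 > 0 && $2 > 0 { print $3 }'
}
```

Under those assumptions the whole command becomes unix2dos -idu *.cpp | mixed_eol_files | xargs unix2dos.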

How to convert Windows end of line in Unix end of line (CR/LF to LF)

I'm a Java developer and I'm using Ubuntu to develop. The project was created in Windows with Eclipse and it's using the Windows-1252 encoding.
To convert to UTF-8 I've used the recode program:
find Web -iname \*.java | xargs recode CP1252...UTF-8
This command gives this error:
recode: Web/src/br/cits/projeto/geral/presentation/GravacaoMessageHelper.java failed: Ambiguous output in step `CR-LF..data
I searched for it and found a solution in "Bash and Windows, Recode: Ambiguous output in step `data..CR-LF'", which says:
Convert line endings from CR/LF to a single LF: Edit the file with Vim, give the command :set ff=unix and save the file. Recode now should run without errors.
Nice, but I have many files to remove the CR/LF characters from, and I can't open each one to do it. Vi doesn't provide any command-line option that I could use from Bash for this.
Can sed be used to do this? How?
There should be a program called dos2unix that will fix line endings for you. If it's not already on your Linux box, it should be available via the package manager.
sed cannot match \n because the trailing newline is removed before the line is put into the pattern space, but it can match \r, so you can convert \r\n (DOS) to \n (Unix) by removing \r:
sed -i 's/\r//g' file
Warning: this will change the original file
However, you cannot change from Unix EOL to DOS or old Mac (\r) by this. More readings here:
How can I replace a newline (\n) using sed?
Actually, Vim does allow what you're looking for. Enter Vim, and type the following commands:
:args **/*.java
:argdo set ff=unix | update | next
The first of these commands sets the argument list to every file matching **/*.java, which is all Java files, recursively. The second of these commands does the following to each file in the argument list, in turn:
Sets the line-endings to Unix style (you already know this)
Writes the file out iff it's been changed
Proceeds to the next file
I'll take a little exception to jichao's answer. You can actually do everything he just talked about fairly easily. Instead of looking for a \n, just look for carriage return at the end of the line.
sed -i 's/\r$//' "${FILE_NAME}"
To change from Unix back to DOS, simply look for the last character on the line and add a carriage return after it. (I'll add -r to enable extended regular expressions, which makes the grouping easier to write.)
sed -ri 's/(.)$/\1\r/' "${FILE_NAME}"
Theoretically, the file could be changed to Mac style by adding code to the last example that also appends the next line of input to the first line until all lines have been processed. I won't try to make that example here, though.
Warning: -i changes the actual file. If you want a backup to be made, add a string of characters after -i. This will move the existing file to a file with the same name with your characters added to the end.
Update: The Unix to DOS conversion can be simplified and made more efficient by not bothering to look for the last character. This also allows us to not require using -r for it to work:
sed -i 's/$/\r/' "${FILE_NAME}"
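The backup behavior described in the warning can be sketched as follows. The suffix .bak and the function name strip_cr_with_backup are arbitrary choices for illustration, and the carriage return is produced with printf so that the sketch does not depend on sed understanding \r. The -i option itself is an extension, assumed here to behave as in GNU sed.

```shell
# Hypothetical helper: strip a trailing CR from every line of "$1" in place,
# keeping the original file as "$1.bak".
strip_cr_with_backup() {
    cr=$(printf '\r')
    sed -i.bak "s/${cr}\$//" "$1"
}
```

After strip_cr_with_backup "${FILE_NAME}", the converted text is in the original file and the untouched CRLF version is in "${FILE_NAME}.bak".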
The tr command can also do this:
tr -d '\15\32' < winfile.txt > unixfile.txt
and should be available to you.
tr only reads standard input and cannot take file names as arguments, so to process many files you need to run it from a loop in a script. For example, create a file myscript.sh:
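#!/bin/bash
# Strip CR (octal 15) and Ctrl-Z (octal 32) from every .java file below the
# current directory, then recode the file from CP1252 to UTF-8.
find . -iname '*.java' | while IFS= read -r f; do
    echo "$f"
    tr -d '\15\32' < "$f" > "$f.tr"
    mv "$f.tr" "$f"
    recode CP1252...UTF-8 "$f"
done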
Running myscript.sh would process all the java files in the current directory and its subdirectories.
In order to overcome
Ambiguous output in step `CR-LF..data'
the simple solution might be to add the -f flag to force the conversion.
Try the Python script by Bryan Maupin found here (I've modified it a little bit to be more generic):
#!/usr/bin/env python3
import sys
input_file_name = sys.argv[1]
output_file_name = sys.argv[2]
# Open both files in binary mode so the bytes are decoded and encoded explicitly.
with open(input_file_name, 'rb') as input_file, \
        open(output_file_name, 'wb') as output_file:
    for line_number, input_line in enumerate(input_file, start=1):
        try:  # first try to decode it using cp1252 (Windows, Western Europe)
            output_line = input_line.decode('cp1252').encode('utf8')
        except UnicodeDecodeError as error:  # if there's an error
            sys.stderr.write('ERROR (line %s):\t%s\n' % (line_number, error))  # write to stderr
            try:  # then if that fails, try to decode using latin1 (ISO 8859-1)
                output_line = input_line.decode('latin1').encode('utf8')
            except UnicodeDecodeError as error:  # if there's an error
                sys.stderr.write('ERROR (line %s):\t%s\n' % (line_number, error))  # write to stderr
                sys.exit(1)  # give up: neither encoding could decode the line
        output_file.write(output_line)
You can use that script with
$ ./cp1252_utf8.py file_cp1252.sql file_utf8.sql
Go back to Windows, tell Eclipse to change the encoding to UTF-8, then back to Unix and run d2u on the files.
