Shell Scripting unwanted '?' character at the end of file name - bash

I get an unwanted '?' at the end of my file name while doing this:
emplid=$(grep -a "Student ID" "$i".txt | sed 's/(Student ID: //g' | sed 's/)Tj//g' )
#gets emplid by doing a grep from some text file
echo "$emplid" #prints employee id correctly
cp "$i" "$emplid".pdf #getting an extra '?' character after emplid and before .pdf
i.e., instead of getting a file name like 123456.pdf, I get 123456?.pdf.
Why is this happening if the echo prints correctly?
How can I remove the trailing question mark character?

It sounds like your script file has DOS-style line endings (\r\n) instead of Unix-style (just \n). When a script is in this format, the \r gets treated as part of the commands. In this instance, it's getting included in $emplid and therefore in the filename.
Many platforms support the dos2unix command to convert the file to unix-style line endings. And once it's converted, stick to text editors that support unix-style text files.
EDIT: I had assumed the problem line endings were in the shell script, but it looks like they're in the input file ("$i".txt) instead. You can use dos2unix on the input file to clean it and/or add a cleaning step to the sed command in your script. BTW, you can have a single instance of sed apply several edits with the -e option:
emplid=$(grep -a "Student ID" "$i".txt | sed -e 's/(Student ID: //g' -e 's/)Tj//g' -e $'s/\r$//' )
I'd recommend against using sed 's/.$//' -- if the file is in unix format, that'll cut off the last character of the filename.
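To see why echo looks right while the file name is wrong, it can help to make the hidden carriage return visible. The following is only a minimal sketch: the sample value and the helper name strip_cr are made up for illustration, and it strips the CR in the shell instead of in sed.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the question): remove one trailing CR, if present.
strip_cr() {
    cr=$(printf '\r')                  # a literal carriage return
    printf '%s' "${1%"$cr"}"
}

emplid=$(printf '123456\r')            # simulated value read from a CRLF file
printf '%s' "$emplid" | od -c          # od shows the \r that echo hides
emplid=$(strip_cr "$emplid")
printf '%s.pdf\n' "$emplid"            # the file name no longer carries the CR
```

The terminal silently swallows the CR when echo prints it, while ls shows unprintable characters in file names as '?'.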

Use the file command to check whether a file has Unix or DOS line endings.
For a DOS file it reports: ASCII text, with CRLF line terminators
For a Unix file it reports just: ASCII text
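If the check has to happen inside a script, parsing the output of file is a bit fragile. A possible alternative, sketched here with a made-up helper name (has_crlf), is to ask grep whether any line ends in a carriage return:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if at least one line of the file ends in CR.
has_crlf() {
    cr=$(printf '\r')                  # a literal carriage return
    grep -q -- "${cr}\$" "$1"
}
```

It could then be used as a guard, e.g. `if has_crlf "$i.txt"; then echo "DOS line endings found"; fi`.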

Related

Using variable names with find in a loop [duplicate]

How can I programmatically (not using vi) convert DOS/Windows newlines to Unix newlines?
The dos2unix and unix2dos commands are not available on certain systems.
How can I emulate them with commands such as sed, awk, and tr?
You can use tr to convert from DOS to Unix; however, you can only do this safely if CR appears in your file only as the first byte of a CRLF byte pair. This is usually the case. You then use:
tr -d '\015' <DOS-file >UNIX-file
Note that the name DOS-file is different from the name UNIX-file; if you try to use the same name twice, you will end up with no data in the file.
You can't do it the other way round (with standard 'tr').
If you know how to enter carriage return into a script (control-V, control-M to enter control-M), then:
sed 's/^M$//' # DOS to Unix
sed 's/$/^M/' # Unix to DOS
where the '^M' is the control-M character. You can also use the bash ANSI-C Quoting mechanism to specify the carriage return:
sed $'s/\r$//' # DOS to Unix
sed $'s/$/\r/' # Unix to DOS
However, if you're going to have to do this very often (more than once, roughly speaking), it is far more sensible to install the conversion programs (e.g. dos2unix and unix2dos, or perhaps dtou and utod) and use them.
If you need to process entire directories and subdirectories, you can use zip:
zip -r -ll zipfile.zip somedir/
unzip zipfile.zip
This will create a zip archive with line endings changed from CRLF to LF. unzip will then put the converted files back in place (and ask you file by file - you can answer: Yes-to-all). Credits to @vmsnomad for pointing this out.
Use:
tr -d "\r" < file
Take a look here for examples using sed:
# In a Unix environment: convert DOS newlines (CR/LF) to Unix format.
sed 's/.$//' # Assumes that all lines end with CR/LF
sed 's/^M$//' # In Bash/tcsh, press Ctrl-V then Ctrl-M
sed 's/\x0D$//' # Works on ssed, gsed 3.02.80 or higher
# In a Unix environment: convert Unix newlines (LF) to DOS format.
sed "s/$/`echo -e \\\r`/" # Command line under ksh
sed 's/$'"/`echo \\\r`/" # Command line under bash
sed "s/$/`echo \\\r`/" # Command line under zsh
sed 's/$/\r/' # gsed 3.02.80 or higher
Use sed -i for in-place conversion, e.g., sed -i 's/..../' file.
You can use Vim programmatically with the option -c {command}:
DOS to Unix:
vim file.txt -c "set ff=unix" -c ":wq"
Unix to DOS:
vim file.txt -c "set ff=dos" -c ":wq"
"set ff=unix/dos" means change fileformat (ff) of the file to Unix/DOS end of line format.
":wq" means write the file to disk and quit the editor (allowing to use the command in a loop).
Install dos2unix, then convert a file in-place with
dos2unix <filename>
To output converted text to a different file use
dos2unix -n <input-file> <output-file>
You can install it on Ubuntu or Debian with
sudo apt install dos2unix
or on macOS using Homebrew
brew install dos2unix
Using AWK you can do:
awk '{ sub("\r$", ""); print }' dos.txt > unix.txt
Using Perl you can do:
perl -pe 's/\r$//' < dos.txt > unix.txt
This problem can be solved with standard tools, but there are sufficiently many traps for the unwary that I recommend you install the flip command, which was written over 20 years ago by Rahul Dhesi, the author of zoo.
It does an excellent job converting file formats while, for example, avoiding the inadvertent destruction of binary files, which is a little too easy if you just race around altering every CRLF you see...
If you don't have access to dos2unix, but can read this page, then you can copy/paste dos2unix.py from here.
#!/usr/bin/env python
"""\
convert dos linefeeds (crlf) to unix (lf)
usage: dos2unix.py <input> <output>
"""
import sys

if len(sys.argv[1:]) != 2:
    sys.exit(__doc__)

content = b''
outsize = 0
# Read and write bytes so that no newline translation happens behind our back.
with open(sys.argv[1], 'rb') as infile:
    content = infile.read()
with open(sys.argv[2], 'wb') as output:
    # bytes.splitlines() splits on \n, \r\n and a lone \r
    for line in content.splitlines():
        outsize += len(line) + 1
        output.write(line + b'\n')

print("Done. Saved %s bytes." % (len(content) - outsize))
(Cross-posted from Super User.)
The solutions posted so far only deal with part of the problem, converting DOS/Windows' CRLF into Unix's LF; the part they're missing is that DOS uses CRLF as a line separator, while Unix uses LF as a line terminator. The difference is that a DOS file (usually) won't have anything after the last line in the file, while Unix will. To do the conversion properly, you need to add that final LF (unless the file is zero-length, i.e. has no lines in it at all). My favorite incantation for this (with a little added logic to handle Mac-style CR-separated files, and to leave alone files that are already in Unix format) is a bit of perl:
perl -pe 'if ( s/\r\n?/\n/g ) { $f=1 }; if ( $f || ! $m ) { s/([^\n])\z/$1\n/ }; $m=1' PCfile.txt
Note that this sends the Unixified version of the file to stdout. If you want to replace the file with a Unixified version, add perl's -i flag.
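The "missing final LF" part can also be handled on its own in the shell. This is only a sketch with a made-up helper name (ensure_final_newline); it assumes a text file and does nothing about the CRs themselves:

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a final LF when the file is non-empty
# and its last byte is not already a newline.
ensure_final_newline() {
    # $(...) strips a trailing newline, so the result is empty
    # exactly when the last byte is an LF.
    if [ -s "$1" ] && [ -n "$(tail -c 1 -- "$1")" ]; then
        printf '\n' >> "$1"
    fi
}
```

Running it a second time changes nothing, and a zero-length file is left zero-length.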
It is super duper easy with PCRE;
As a script, or replace "$@" with your files.
#!/usr/bin/env bash
perl -pi -e 's/\r\n/\n/g' -- "$@"
This will overwrite your files in place!
I recommend only doing this with a backup (version control or otherwise)
An even simpler AWK solution without a program:
awk -v ORS='\r\n' '1' unix.txt > dos.txt
Technically '1' is your program, because AWK requires one when it is given an option.
Alternatively, an internal solution is:
while IFS= read -r line;
do printf '%s\n' "${line%$'\r'}";
done < dos.txt > unix.txt
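One caveat with a read loop like this: read returns non-zero on a last line that has no trailing newline, so that line is silently dropped. A small variation (the function name strip_cr_lines is made up for this sketch) keeps it:

```shell
#!/usr/bin/env bash
# Hypothetical filter: copy stdin to stdout, stripping one trailing CR per line.
strip_cr_lines() {
    cr=$(printf '\r')                  # a literal carriage return
    # "|| [ -n "$line" ]" also processes a final line without a newline.
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s\n' "${line%"$cr"}"
    done
}
```

Usage would be `strip_cr_lines < dos.txt > unix.txt`; as a side effect the output always ends with a newline.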
Interestingly, in my Git Bash on Windows, sed "" did the trick already:
$ echo -e "abc\r" >tst.txt
$ file tst.txt
tst.txt: ASCII text, with CRLF line terminators
$ sed -i "" tst.txt
$ file tst.txt
tst.txt: ASCII text
My guess is that sed ignores them when reading lines from the input and always writes Unix line endings to the output.
For Mac OS X if you have Homebrew installed (http://brew.sh/):
brew install dos2unix
for csv in *.csv; do dos2unix -c mac ${csv}; done;
Make sure you have made copies of the files, as this command will modify the files in place.
The -c mac option selects Mac mode, which converts classic Mac line endings (a lone CR) to LF and leaves CRLF pairs unchanged.
I had just to ponder that same question (on Windows-side, but equally applicable to Linux).
Surprisingly, nobody mentioned a very much automated way of doing CRLF <-> LF conversion for text-files using the good old zip -ll option (Info-ZIP):
zip -ll textfiles-lf.zip files-with-crlf-eol.*
unzip textfiles-lf.zip
NOTE: this would create a ZIP file preserving the original file names, but converting the line endings to LF. Then unzip would extract the files as zip'ed, that is, with their original names (but with LF-endings), thus prompting to overwrite the local original files if any.
The relevant excerpt from the zip --help:
zip --help
...
-l convert LF to CR LF (-ll CR LF to LF)
sed -i.bak --expression='s/\r\n/\n/g' <file_path>
Since the question mentions sed, this is the most straightforward way to use sed to achieve this. The expression says replace all carriage-returns and line-feeds with just line-feeds only. That is what you need when you go from Windows to Unix. I verified it works.
Just complementing @Jonathan Leffler's excellent answer, if you have a file with mixed line endings (LF and CRLF) and you need to normalize to CRLF (DOS), use the following commands in sequence...
# DOS to Unix
sed -i $'s/\r$//' "<YOUR_FILE>"
# Unix to DOS (normalized)
sed -i $'s/$/\r/' "<YOUR_FILE>"
NOTE: If you have a file with mixed line endings (LF and CRLF), the second command above alone will cause a mess.
If you need to convert to LF (Unix) the first command alone will be enough...
# DOS to Unix
sed -i $'s/\r$//' "<YOUR_FILE>"
Thanks! 🤗
[Ref(s).: https://stackoverflow.com/a/3777853/3223785 ]
TIMTOWTDI!
perl -pe 's/\r\n/\n/; s/([^\n])\z/$1\n/ if eof' PCfile.txt
Based on Gordon Davisson's answer.
One must consider the possibility of [noeol]...
You can use AWK. Set the record separator (RS) to a regular expression that matches every possible newline character or character sequence, and set the output record separator (ORS) to the Unix-style newline character.
awk 'BEGIN{RS="\r|\n|\r\n|\n\r";ORS="\n"}{print}' windows_or_macos.txt > unix.txt
This worked for me
tr "\r" "\n" < sampledata.csv > sampledata2.csv
On Linux, it's easy to convert ^M (Ctrl + M) to *nix newlines (^J) with sed.
It will be something like this on the CLI, and there will actually be a line break in the text. However, the \ passes that ^J along to sed:
sed 's/^M/\
/g' < ffmpeg.log > new.log
You get this by using ^V (Ctrl + V), ^M (Ctrl + M) and \ (backslash) as you type:
sed 's/^V^M/\^V^J/g' < ffmpeg.log > new.log
As an extension to Jonathan Leffler's Unix to DOS solution, to safely convert to DOS when you're unsure of the file's current line endings:
sed '/^M$/! s/$/^M/'
This checks that the line does not already end in CRLF before converting to CRLF.
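If typing a literal ^M is awkward, the same guarded substitution can be written with the carriage return held in a variable. A sketch, with the function name to_dos invented for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical filter: add a CR to every line that does not already end in one.
to_dos() {
    cr=$(printf '\r')                  # a literal carriage return
    sed "/${cr}\$/! s/\$/${cr}/"
}
```

For example, `to_dos < mixed.txt > dos.txt` leaves existing CRLF lines alone and converts the LF-only ones.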
I made a script based on the accepted answer, so you can convert it directly without needing an additional file in the end and removing and renaming afterwards.
convert-crlf-to-lf() {
    file="$1"
    tr -d '\015' <"$file" >"$file"2
    rm -rf "$file"
    mv "$file"2 "$file"
}
Just make sure if you have a file like "file1.txt" that "file1.txt2" doesn't already exist or it will be overwritten. I use this as a temporary place to store the file in.
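The name clash can be avoided by letting mktemp pick the temporary file. This is a sketch, not a drop-in replacement: the function name is changed to convert_crlf_to_lf, and it assumes mktemp is available (it is on GNU and BSD systems, but it is not part of POSIX).

```shell
#!/usr/bin/env bash
# Hypothetical variant: convert CRLF to LF in place via a unique temporary file.
convert_crlf_to_lf() {
    tmp=$(mktemp) || return 1
    # Convert into the temporary file first; only overwrite the original
    # if tr succeeded. Writing with cat keeps the original's permissions.
    tr -d '\015' < "$1" > "$tmp" && cat "$tmp" > "$1"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```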
With Bash 4.2 and newer you can use something like this to strip the trailing CR, which only uses Bash built-ins:
if [[ "${str: -1}" == $'\r' ]]; then
    str="${str:: -1}"
fi
I tried
sed 's/^M$//' file.txt
on OS X as well as several other methods (Fixing Dos Line Endings or http://hintsforums.macworld.com/archive/index.php/t-125.html). None worked, and the file remained unchanged (by the way, Ctrl + V, Enter was needed to reproduce ^M). In the end I used TextWrangler. It's not strictly command line, but it works and it doesn't complain.

Deleting lines beginning with a CR in a file directly

I want to write a ksh script that deletes all lines of a file that begin with a carriage return. The same script reuses the modified file afterwards, so the modification has to be made directly in the file.
For example, here is my file in Notepad++ (with the line endings shown as CRLF, as it's a Windows-format file):
CE1;CPr1;CRLF
CE2;CPr2;CRLF
CRLF
CE3;CPr3;CRLF
CRLF
CRLF
and I want to obtain:
CE1;CPr1;CRLF
CE2;CPr2;CRLF
CE3;CPr3;CRLF
The script I wrote so far is:
sed -i '/^\n/d' ListeTable.lst
I also tried with \r and \R but nothing is working.
As I said, a later part of the script reuses the modified file; it looks like this (but there is more):
echo -n "(CE = '$(tail -n 1 ListeTable.lst | cut -d$';' -f1)'and CPr = '$(tail -n 1 ListeTable.lst | cut -d$';' -f2)')"
Ok, so I found a regex that works for this problem : '/^\s*$/d' (\s = match any whitespace character (newlines, spaces, tabs); * = the character may repeat any times or be absent; $ = to the end of the last \s character found)
So the working code is : sed -i '/^\s*$/d' ListeTable.lst
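Note that \s is a GNU extension, so on a ksh system with a different sed it may not be recognized. A more portable spelling uses the POSIX class [[:space:]], which also matches the carriage return. A sketch, with the filter name drop_blank_lines made up here:

```shell
#!/usr/bin/env bash
# Hypothetical filter: drop lines that are empty or contain only whitespace,
# which includes the lone CR left on "blank" lines of a CRLF file.
drop_blank_lines() {
    sed '/^[[:space:]]*$/d'
}
```

Without sed -i, the result can be written to a temporary file and moved back, e.g. `drop_blank_lines < ListeTable.lst > ListeTable.tmp && mv ListeTable.tmp ListeTable.lst` (ListeTable.tmp is just a placeholder name).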

How to fix inconsistent line endings for whole VS solution?

Visual Studio will detect inconsistent line endings when opening a file and there is an option to fix it for that specific file. However, if I want to fix line endings for all files in a solution, how do I do that?
Just for a more complete answer, this worked best for me:
Replace
(?<!\r)\n
with
\r\n
in entire solution with "regEx" option.
This will set the correct line ending in all files which didn't have the correct line ending so far. It uses a negative lookbehind to check that there is no \r in front of the \n.
Be careful with the other solutions: They will either modify all lines in all files (ignoring the original line ending) or delete the last character of each line.
You can use the Replace in Files command and enable regular expressions. For example, to replace end-of-lines that have a single linefeed "\n" (like, from GitHub, for example) with the standard Windows carriage-return linefeed "\r\n", search for:
([^\r]|^)\n
This says to create a group (that's why the parentheses are required), where the first character is either not a carriage-return or is the beginning of a line. The beginning of the line test is really only for the very beginning of the file, if it happens to start with a "\n". Following the group is a newline. So, you will match ";\n" which has the wrong end-of-line, but not "\r\n" which is the correct end-of-line.
And replace it with:
$1\r\n
This says to keep the group ($1) and then replace the "\n" with "\r\n".
Try doing
Edit > Advanced > Format Document
Then save the document, as long as the file doesn't get modified by another external editor, it should stay consistent. Fixes it for me.
If you have Cygwin with the cygutils package installed, you can use this chain of commands from the Cygwin shell:
unix2dos -idu *.cpp | sed -e 's/ 0 [1-9][0-9]//' -e 's/ [1-9][0-9]* 0 //' | sed '/ [1-9][0-9] / !d' | sed -e 's/ [1-9][0-9 ] //' | xargs unix2dos
(Replace the *.cpp with whatever wildcard you need)
To understand how this works, the unix2dos command is used to convert the files, but only files that have inconsistent line endings (i.e., a mixture of UNIX and DOS) need to be converted. The -idu option displays the number of dos and unix line endings in the file. For example:
0 491 Acad2S5kDim.cpp
689 0 Acad2S5kPolyline.cpp
0 120 Acad2S5kRaster.cpp
433 12 Acad2S5kXhat.cpp
0 115 AppAuditInfo.cpp
Here, only the Acad2S5kXhat.cpp file needs to be converted. The sed commands filter the output to produce a list of just the files that need to be converted, and these are then processed via xargs.
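Where cygutils is not at hand, the same "only touch files with mixed endings" test can be approximated with grep by counting both kinds of lines. This is a rough sketch under that assumption, and the helper name has_mixed_endings is invented:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the file contains both
# CRLF-terminated lines and LF-only lines.
has_mixed_endings() {
    cr=$(printf '\r')                            # a literal carriage return
    crlf=$(grep -c -- "${cr}\$" "$1")            # lines ending in CR
    lf=$(grep -c -v -- "${cr}\$" "$1")           # lines not ending in CR
    [ "$crlf" -gt 0 ] && [ "$lf" -gt 0 ]
}
```

A loop such as `for f in *.cpp; do has_mixed_endings "$f" && unix2dos "$f"; done` would then convert only the inconsistent files.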

How to convert Windows end of line in Unix end of line (CR/LF to LF)

I'm a Java developer and I'm using Ubuntu to develop. The project was created in Windows with Eclipse and it's using the Windows-1252 encoding.
To convert to UTF-8 I've used the recode program:
find Web -iname \*.java | xargs recode CP1252...UTF-8
This command gives this error:
recode: Web/src/br/cits/projeto/geral/presentation/GravacaoMessageHelper.java failed: Ambiguous output in step `CR-LF..data
I searched for it and found a solution in Bash and Windows, Recode: Ambiguous output in step `data..CR-LF', which says:
Convert line endings from CR/LF to a
single LF: Edit the file with Vim,
give the command :set ff=unix and save
the file. Recode now should run
without errors.
Nice, but I've many files to remove the CR/LF character from, and I can't open each to do it. Vi doesn't provide any option to command line for Bash operations.
Can sed be used to do this? How?
There should be a program called dos2unix that will fix line endings for you. If it's not already on your Linux box, it should be available via the package manager.
sed cannot match \n because the trailing newline is removed before the line is put into the pattern space, but it can match \r, so you can convert \r\n (DOS) to \n (Unix) by removing \r:
sed -i 's/\r//g' file
Warning: this will change the original file
However, you cannot change from Unix EOL to DOS or old Mac (\r) by this. More readings here:
How can I replace a newline (\n) using sed?
Actually, Vim does allow what you're looking for. Enter Vim, and type the following commands:
:args **/*.java
:argdo set ff=unix | update | next
The first of these commands sets the argument list to every file matching **/*.java, which is all Java files, recursively. The second of these commands does the following to each file in the argument list, in turn:
Sets the line-endings to Unix style (you already know this)
Writes the file out iff it's been changed
Proceeds to the next file
I'll take a little exception to jichao's answer. You can actually do everything he just talked about fairly easily. Instead of looking for a \n, just look for carriage return at the end of the line.
sed -i 's/\r$//' "${FILE_NAME}"
To change from Unix back to DOS, simply look for the last character on the line and add a carriage return after it. (I'll add -r to make this easier with extended regular expressions.)
sed -ri 's/(.)$/\1\r/' "${FILE_NAME}"
Theoretically, the file could be changed to Mac style by adding code to the last example that also appends the next line of input to the first line until all lines have been processed. I won't try to make that example here, though.
Warning: -i changes the actual file. If you want a backup to be made, add a string of characters after -i. This will move the existing file to a file with the same name with your characters added to the end.
Update: The Unix to DOS conversion can be simplified and made more efficient by not bothering to look for the last character. This also allows us to not require using -r for it to work:
sed -i 's/$/\r/' "${FILE_NAME}"
The tr command can also do this:
tr -d '\15\32' < winfile.txt > unixfile.txt
and should be available to you.
You'll need to run tr from within a script, since it cannot work with file names. For example, create a file myscript.sh:
#!/bin/bash
for f in `find -iname \*.java`; do
    echo "$f"
    tr -d '\15\32' < "$f" > "$f.tr"
    mv "$f.tr" "$f"
    recode CP1252...UTF-8 "$f"
done
Running myscript.sh would process all the java files in the current directory and its subdirectories.
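One weakness of looping over the output of find is that file names containing spaces get split. A variation that hands the names to a small inline script through -exec avoids this. It is only a sketch: the function name convert_tree is made up, it removes CR only, and the recode step is left out.

```shell
#!/usr/bin/env bash
# Hypothetical helper: strip CRs from every *.java file below directory $1,
# coping with spaces in file and directory names.
convert_tree() {
    find "$1" -type f -iname '*.java' -exec sh -c '
        for f do
            tmp=$(mktemp) || exit 1
            # Convert into a temporary file, then copy the result back.
            tr -d "\015" < "$f" > "$tmp" && cat "$tmp" > "$f"
            rm -f "$tmp"
        done
    ' sh {} +
}
```

Calling `convert_tree Web` would cover the directory from the question.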
In order to overcome
Ambiguous output in step `CR-LF..data'
the simple solution might be to add the -f flag to force the conversion.
Try the Python script by Bryan Maupin found here (I've modified it a little bit to be more generic):
#!/usr/bin/env python3
import sys

input_file_name = sys.argv[1]
output_file_name = sys.argv[2]

# Open both files in binary mode so the bytes are decoded explicitly below.
input_file = open(input_file_name, 'rb')
output_file = open(output_file_name, 'wb')

line_number = 0
for input_line in input_file:
    line_number += 1
    try:  # first try to decode it using cp1252 (Windows, Western Europe)
        output_line = input_line.decode('cp1252').encode('utf8')
    except UnicodeDecodeError as error:  # if there's an error
        sys.stderr.write('ERROR (line %s):\t%s\n' % (line_number, error))  # write to stderr
        try:  # then if that fails, try to decode using latin1 (ISO 8859-1)
            output_line = input_line.decode('latin1').encode('utf8')
        except UnicodeDecodeError as error:  # if there's an error
            sys.stderr.write('ERROR (line %s):\t%s\n' % (line_number, error))  # write to stderr
            sys.exit(1)  # and give up
    output_file.write(output_line)

input_file.close()
output_file.close()
You can use that script with
$ ./cp1252_utf8.py file_cp1252.sql file_utf8.sql
Go back to Windows, tell Eclipse to change the encoding to UTF-8, then back to Unix and run d2u on the files.


Resources