Removing Unicode Line Separator "U+2028" in Bash

I have a text file with a unicode line separator (hex code 2028).
I want to remove it using bash (I see implementations for Python, but not for this language). What command could I use to transform the text file (output4.txt) to lose the unicode line separator?
See in vim below:

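My first thought was a tr command like the one below, but it is not suitable here: tr translates single bytes, not multi-byte sequences, and (at least in GNU tr) \xHH escapes are not interpreted, so this would mangle ordinary characters instead of removing the separator:
tr '\xE2\x80\xA8' ' ' < inFile > outFile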
Working solution: Thanks to OP for finding this:
sed -i.old $'s/\xE2\x80\xA8/ /g' inFile

I noticed from your screenshot that you already have the file open in vim, so why not do the substitution there?
In vim you could run
:%s/(seebelow)//g
For the (seebelow) part, type:
Ctrl-V u2028

You can probably use sed. Note that U+2028 is not stored as the bytes 20 28 (that would be a space followed by an opening parenthesis); in a UTF-8 file it is the three bytes E2 80 A8, so with GNU sed:
sed 's/\xE2\x80\xA8//g' <file_in.txt >file_out.txt
To overwrite the original file:
sed -i 's/\xE2\x80\xA8//g' file.txt
Edit: (See chepner's comment) Make sure you have the correct bytes for the file's encoding before you use sed to delete them. You can use e.g. od -t x1 to look at the hex dump and work out the encoding.

This worked for me
sed $'s/\u2028//g' file_in.txt > file_out.txt
Note: other questions use the term <U+2028>
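To check that the separator is really gone, it can help to look at the raw bytes before and after. A minimal sketch, assuming the file is UTF-8 (so U+2028 is the bytes E2 80 A8); strip_u2028 is just a name made up for this example, and it deletes the separator rather than replacing it with a space:

```shell
# Hypothetical helper: delete every U+2028 (UTF-8 bytes E2 80 A8) from stdin.
# The separator is built with printf (octal escapes) so this also runs in plain sh.
strip_u2028() {
  sep=$(printf '\342\200\250')
  # LC_ALL=C makes sed work on raw bytes, independent of the current locale.
  LC_ALL=C sed "s/${sep}//g"
}

# Dump the bytes before and after to confirm the three-byte sequence is removed.
printf 'a\342\200\250b\n' | od -An -t x1
printf 'a\342\200\250b\n' | strip_u2028 | od -An -t x1
```

Usage would be something like strip_u2028 < output4.txt > cleaned.txt.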

Related

Trying to remove non-printable characters (junk values) from a UNIX file

I am trying to remove non-printable characters (e.g. ^@) from the records in my file. Since the file holds a very large number of records, looping over the output of cat is not an option; the loop takes too much time.
I tried using
sed -i 's/[^@a-zA-Z 0-9`~!@#$%^&*()_+\[\]\\{}|;'\'':",.\/<>?]//g' FILENAME
but the ^@ characters are still not removed.
Also I tried using
awk '{ sub("[^a-zA-Z0-9\"!@#$%^&*|_\[](){}", ""); print } FILENAME > NEW FILE
but it also did not help.
Can anybody suggest some alternative way to remove non-printable characters?
I also tried tr -cd, but it removes accented characters, which are required in the file.
Perhaps you could delete the complement of [:print:], which contains all printable characters. Newline is not in [:print:], so list it explicitly or the line breaks will be deleted as well:
tr -cd '[:print:]\n' < file > newfile
If your version of tr doesn't support multi-byte characters (it seems that many don't), this works for me with GNU sed (with UTF-8 locale settings):
sed 's/[^[:print:]]//g' file
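If the goal is only to drop NUL bytes and other ASCII control characters while keeping accented characters, another option is to delete an explicit byte range instead of relying on [:print:]. A rough sketch, assuming the file is UTF-8 or another encoding where bytes 0x80 and above are ordinary text; strip_control is a made-up name:

```shell
# Hypothetical helper: delete ASCII control bytes (NUL, bell, escape, DEL, ...)
# but keep tab (011), newline (012) and every byte >= 0x80, so multi-byte
# UTF-8 characters such as accented letters pass through untouched.
# Note that carriage return (015) is deleted too.
strip_control() {
  LC_ALL=C tr -d '\000-\010\013-\037\177'
}

# Usage: strip_control < file > newfile
```

Because tr streams the input once, this should cope with large files without any shell loop.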
Remove all control characters first:
tr -dc '\007-\011\012-\015\040-\376' < file > newfile
Then try your string:
sed -i 's/[^@a-zA-Z 0-9`~!@#$%^&*()_+\[\]\\{}|;'\'':",.\/<>?]//g' newfile
I believe that what you see as ^@ is in fact a zero value \0.
The tr filter from above will remove those as well.
strings -1 file... > outputfile
seems to work. The strings program will take all printable characters, in this case of length 1 (the -1 argument) and print them. It effectively is removing all the non-printable characters.
"man strings" will provide the documentation.
Was searching for this for a while & found a rather simple solution:
The package ansifilter does exactly this. All you need to do is just pipe the output through it.
On Mac:
brew install ansifilter
Then:
cat file.txt | ansifilter

Insert line after match using sed

For some reason I can't seem to find a straightforward answer to this and I'm on a bit of a time crunch at the moment. How would I go about inserting a choice line of text after the first line matching a specific string using the sed command. I have ...
CLIENTSCRIPT="foo"
CLIENTFILE="bar"
And I want insert a line after the CLIENTSCRIPT= line resulting in ...
CLIENTSCRIPT="foo"
CLIENTSCRIPT2="hello"
CLIENTFILE="bar"
Try doing this using GNU sed:
sed '/CLIENTSCRIPT="foo"/a CLIENTSCRIPT2="hello"' file
if you want to substitute in-place, use
sed -i '/CLIENTSCRIPT="foo"/a CLIENTSCRIPT2="hello"' file
Output
CLIENTSCRIPT="foo"
CLIENTSCRIPT2="hello"
CLIENTFILE="bar"
Doc
see sed doc and search \a (append)
Note the standard sed syntax (as in POSIX, so supported by all conforming sed implementations around (GNU, OS/X, BSD, Solaris...)):
sed '/CLIENTSCRIPT=/a\
CLIENTSCRIPT2="hello"' file
Or on one line:
sed -e '/CLIENTSCRIPT=/a\' -e 'CLIENTSCRIPT2="hello"' file
(-e expressions (and the contents of -f files) are joined with newlines to make up the sed script sed interprets).
The -i option for in-place editing is also a GNU extension, some other implementations (like FreeBSD's) support -i '' for that.
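One way to paper over the -i difference in a script is a small wrapper that picks the right form at run time. This is only a sketch: it assumes that GNU-compatible seds accept --version while BSD/macOS sed rejects it, and sed_inplace is a made-up name.

```shell
# Hypothetical wrapper: edit a file in place with either GNU or BSD/macOS sed.
# Assumption: only GNU-compatible seds succeed on `sed --version`.
sed_inplace() {
  if sed --version >/dev/null 2>&1; then
    sed -i "$@"        # GNU sed: the backup suffix is optional
  else
    sed -i '' "$@"     # BSD sed: -i needs a suffix argument, empty means no backup
  fi
}

# Usage (append a line after the match, no backup file):
# sed_inplace -e '/CLIENTSCRIPT=/a\' -e 'CLIENTSCRIPT2="hello"' file
```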
Alternatively, for portability, you can use perl instead:
perl -pi -e '$_ .= qq(CLIENTSCRIPT2="hello"\n) if /CLIENTSCRIPT=/' file
Or you could use ed or ex:
printf '%s\n' /CLIENTSCRIPT=/a 'CLIENTSCRIPT2="hello"' . w q | ex -s file
Sed command that works on MacOS (at least, OS 10) and Unix alike (ie. doesn't require gnu sed like Gilles' (currently accepted) one does):
sed -e '/CLIENTSCRIPT="foo"/a\'$'\n''CLIENTSCRIPT2="hello"' file
This works in bash and maybe other shells too that know the $'\n' evaluation quote style. Everything can be on one line and work in older/POSIX sed commands. If there might be multiple lines matching the CLIENTSCRIPT="foo" (or your equivalent) and you wish to only add the extra line the first time, you can rework it as follows:
sed -e '/^ *CLIENTSCRIPT="foo"/b ins' -e b -e ':ins' -e 'a\'$'\n''CLIENTSCRIPT2="hello"' -e ': done' -e 'n;b done' file
(this creates a loop after the line insertion code that just cycles through the rest of the file, never getting back to the first sed command again).
You might notice I added a '^ *' to the matching pattern in case that line shows up in a comment, say, or is indented. It's not 100% perfect but covers some other situations likely to be common. Adjust as required...
These two solutions also get round the problem (for the generic solution to adding a line) that if your new inserted line contains unescaped backslashes or ampersands they will be interpreted by sed and likely not come out the same, just like the \n is - eg. \0 would be the first line matched. Especially handy if you're adding a line that comes from a variable where you'd otherwise have to escape everything first using ${var//} before, or another sed statement etc.
This solution is a little less messy in scripts (that quoting and \n is not easy to read though), when you don't want to put the replacement text for the a command at the start of a line if, say, it sits in a function with indented lines. I've taken advantage of the fact that $'\n' is evaluated to a newline by the shell; it's not in regular '\n' single-quoted values.
It's getting long enough though that I think perl or even awk might win due to being more readable.
A POSIX compliant one using the s command:
sed '/CLIENTSCRIPT="foo"/s/.*/&\
CLIENTSCRIPT2="hello"/' file
Maybe a bit late to post an answer for this, but I found some of the above solutions a bit cumbersome.
I tried simple string replacement in sed and it worked:
sed 's/CLIENTSCRIPT="foo"/&\nCLIENTSCRIPT2="hello"/' file
& sign reflects the matched string, and then you add \n and the new line.
As mentioned, if you want to do it in-place:
sed -i 's/CLIENTSCRIPT="foo"/&\nCLIENTSCRIPT2="hello"/' file
Another thing. You can match using an expression:
sed -i 's/CLIENTSCRIPT=.*/&\nCLIENTSCRIPT2="hello"/' file
Hope this helps someone
The awk variant :
awk '1;/CLIENTSCRIPT=/{print "CLIENTSCRIPT2=\"hello\""}' file
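The awk one-liner above adds the new line after every matching line. If only the first match should get it (as the question asks), a flag variable is enough. A minimal sketch; insert_after_first and its argument order are invented for this example:

```shell
# Hypothetical helper: print FILE with NEWLINE inserted after the first line
# matching REGEX. Usage: insert_after_first REGEX NEWLINE FILE
# Caveat: awk -v interprets backslash escapes in the values it is given.
insert_after_first() {
  awk -v pat="$1" -v new="$2" '
    { print }                                   # always print the current line
    !seen && $0 ~ pat { print new; seen = 1 }   # add the new line only once
  ' "$3"
}

# Usage: insert_after_first 'CLIENTSCRIPT=' 'CLIENTSCRIPT2="hello"' file > file.new
```

It writes to stdout, so redirect to a new file and move it over the original if an in-place edit is wanted.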
I had a similar task, and was not able to get the above perl solution to work.
Here is my solution:
perl -i -pe "BEGIN{undef $/;} s/^\[mysqld\]$/[mysqld]\n\ncollation-server = utf8_unicode_ci\n/sgm" /etc/mysql/my.cnf
Explanation:
Uses a regular expression to search for a line in my /etc/mysql/my.cnf file that contained only [mysqld] and replaced it with
[mysqld]
collation-server = utf8_unicode_ci
effectively adding the collation-server = utf8_unicode_ci line after the line containing [mysqld].
I had to do this recently as well, for both Mac and Linux. After browsing through many posts and trying many things out, I never got to where I wanted, which was a solution that is simple to understand, uses well-known standard commands with simple patterns, fits on one line, is portable, and can be extended with more constraints. Then I looked at it from a different perspective and realized I could do without the "one liner" requirement if a "2-liner" met the rest of my criteria. In the end I came up with this solution, which works on both Ubuntu and Mac and which I wanted to share with everyone:
insertLine=$(( $(grep -n "foo" sample.txt | cut -f1 -d: | head -1) + 1 ))
sed -i -e "$insertLine"' i\'$'\n''bar'$'\n' sample.txt
In first command, grep looks for line numbers containing "foo", cut/head selects 1st occurrence, and the arithmetic op increments that first occurrence line number by 1 since I want to insert after the occurrence.
In second command, it's an in-place file edit, "i" for inserting: an ansi-c quoting new line, "bar", then another new line. The result is adding a new line containing "bar" after the "foo" line. Each of these 2 commands can be expanded to more complex operations and matching.

Trying to delete lines from file with sed -- what am I doing wrong?

I have a .csv file where I'd like to delete the lines between line 355686 and line 1048576.
I used the following command in Terminal (on MacOSx):
sed -i.bak -e '355686,1048576d' trips3.csv
This produces a file called trips3.csv.bak -- but it still has a total of 1,048,576 lines when I reopen it in Excel.
Any thoughts or suggestions you have are welcome and appreciated!
I suspect the problem is that Excel is using carriage return (\r, octal 015) to separate records, while sed assumes lines are separated by linefeed (\n, octal 012); this means that sed will treat the entire file as one really long line. I don't think there's an easy way to get sed to recognize CR as a line delimiter, but it's easy with perl:
perl -n -015 -i.bak -e 'print if $. < 355686 || $. > 1048576' trips3.csv
(Note: if 1048576 is the number of "lines" in the file, you can leave off the || $. > 1048576 part.)
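If perl is not at hand, roughly the same thing can be done by turning the CRs into linefeeds first and then letting sed count lines as usual. A sketch under the assumption that the records really are separated by bare CR (not CRLF, which would produce empty lines); delete_cr_range is a made-up name:

```shell
# Hypothetical helper: read CR-separated records on stdin and delete records
# FIRST..LAST (inclusive). The output is LF-separated.
# Usage: delete_cr_range FIRST LAST
delete_cr_range() {
  tr '\r' '\n' | sed "${1},${2}d"
}

# Usage: delete_cr_range 355686 1048576 < trips3.csv > trimmed.csv
```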
I'm not sure about the OS X sed implementation, but GNU sed, when passed the -i flag with a backup extension, first copies the original file to the specified backup and then modifies the original file in place. You should expect to see a reduced number of lines in the original file trips3.csv, not in trips3.csv.bak.
Some incantation that should do the job (if you have Ruby installed, obviously)
ruby -pe 'exit if $. >= 355686' < trips3.csv > output.csv
If you prefer Perl/Python, just follow the documentation to do something similar and you should be fine. :)
Also, I'm using one of the Ruby one-liners, by Dave.
EDIT: Sorry, forgot to say that you need '> output.csv' to redirect stdout to a file.
awk '!(NR>=355686 && NR<=1048576)' your_file

How to convert DOS/Windows newline (CRLF) to Unix newline (LF)

How can I programmatically (not using vi) convert DOS/Windows newlines to Unix newlines?
The dos2unix and unix2dos commands are not available on certain systems.
How can I emulate them with commands such as sed, awk, and tr?
You can use tr to convert from DOS to Unix; however, you can only do this safely if CR appears in your file only as the first byte of a CRLF byte pair. This is usually the case. You then use:
tr -d '\015' <DOS-file >UNIX-file
Note that the name DOS-file is different from the name UNIX-file; if you try to use the same name twice, you will end up with no data in the file.
You can't do it the other way round (with standard 'tr').
If you know how to enter carriage return into a script (control-V, control-M to enter control-M), then:
sed 's/^M$//' # DOS to Unix
sed 's/$/^M/' # Unix to DOS
where the '^M' is the control-M character. You can also use the bash ANSI-C Quoting mechanism to specify the carriage return:
sed $'s/\r$//' # DOS to Unix
sed $'s/$/\r/' # Unix to DOS
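If the script also has to run under a plain sh that lacks $'...' quoting, the carriage return can be produced with printf instead. A small sketch along those lines; the function names to_unix and to_dos are invented here, and to_dos is written so that lines which already end in CR are not given a second one:

```shell
cr=$(printf '\r')   # a literal carriage return, built without $'...' quoting

# DOS to Unix: strip one trailing CR from each line.
to_unix() { sed "s/${cr}\$//"; }

# Unix to DOS: end every line with exactly one CR, whether or not it had any.
to_dos() { sed "s/${cr}*\$/${cr}/"; }

# Usage:
# to_unix < dos.txt > unix.txt
# to_dos < unix.txt > dos.txt
```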
However, if you're going to have to do this very often (more than once, roughly speaking), it is far more sensible to install the conversion programs (e.g. dos2unix and unix2dos, or perhaps dtou and utod) and use them.
If you need to process entire directories and subdirectories, you can use zip:
zip -r -ll zipfile.zip somedir/
unzip zipfile.zip
This will create a zip archive with line endings changed from CRLF to LF. unzip will then put the converted files back in place (and ask you file by file - you can answer: Yes-to-all). Credits to @vmsnomad for pointing this out.
Use:
tr -d "\r" < file
Take a look here for examples using sed:
# In a Unix environment: convert DOS newlines (CR/LF) to Unix format.
sed 's/.$//' # Assumes that all lines end with CR/LF
sed 's/^M$//' # In Bash/tcsh, press Ctrl-V then Ctrl-M
sed 's/\x0D$//' # Works on ssed, gsed 3.02.80 or higher
# In a Unix environment: convert Unix newlines (LF) to DOS format.
sed "s/$/`echo -e \\\r`/" # Command line under ksh
sed 's/$'"/`echo \\\r`/" # Command line under bash
sed "s/$/`echo \\\r`/" # Command line under zsh
sed 's/$/\r/' # gsed 3.02.80 or higher
Use sed -i for in-place conversion, e.g., sed -i 's/..../' file.
You can use Vim programmatically with the option -c {command}:
DOS to Unix:
vim file.txt -c "set ff=unix" -c ":wq"
Unix to DOS:
vim file.txt -c "set ff=dos" -c ":wq"
"set ff=unix/dos" means change fileformat (ff) of the file to Unix/DOS end of line format.
":wq" means write the file to disk and quit the editor (allowing to use the command in a loop).
Install dos2unix, then convert a file in-place with
dos2unix <filename>
To output converted text to a different file use
dos2unix -n <input-file> <output-file>
You can install it on Ubuntu or Debian with
sudo apt install dos2unix
or on macOS using Homebrew
brew install dos2unix
Using AWK you can do:
awk '{ sub("\r$", ""); print }' dos.txt > unix.txt
Using Perl you can do:
perl -pe 's/\r$//' < dos.txt > unix.txt
This problem can be solved with standard tools, but there are sufficiently many traps for the unwary that I recommend you install the flip command, which was written over 20 years ago by Rahul Dhesi, the author of zoo.
It does an excellent job converting file formats while, for example, avoiding the inadvertent destruction of binary files, which is a little too easy if you just race around altering every CRLF you see...
If you don't have access to dos2unix, but can read this page, then you can copy/paste dos2unix.py from here.
#!/usr/bin/env python
"""\
convert dos linefeeds (crlf) to unix (lf)
usage: dos2unix.py <input> <output>
"""
import sys

if len(sys.argv[1:]) != 2:
    sys.exit(__doc__)

content = b''
outsize = 0
with open(sys.argv[1], 'rb') as infile:
    content = infile.read()
with open(sys.argv[2], 'wb') as output:
    for line in content.splitlines():
        outsize += len(line) + 1
        # Both files are opened in binary mode, so the newline must be bytes.
        output.write(line + b'\n')

print("Done. Saved %s bytes." % (len(content) - outsize))
(Cross-posted from Super User.)
The solutions posted so far only deal with part of the problem, converting DOS/Windows' CRLF into Unix's LF; the part they're missing is that DOS uses CRLF as a line separator, while Unix uses LF as a line terminator. The difference is that a DOS file (usually) won't have anything after the last line in the file, while Unix will. To do the conversion properly, you need to add that final LF (unless the file is zero-length, i.e. has no lines in it at all). My favorite incantation for this (with a little added logic to handle Mac-style CR-separated files, and to leave alone files that are already in Unix format) is a bit of perl:
perl -pe 'if ( s/\r\n?/\n/g ) { $f=1 }; if ( $f || ! $m ) { s/([^\n])\z/$1\n/ }; $m=1' PCfile.txt
Note that this sends the Unixified version of the file to stdout. If you want to replace the file with a Unixified version, add perl's -i flag.
It is super duper easy with PCRE;
As a script, or replace "$@" with your files.
#!/usr/bin/env bash
perl -pi -e 's/\r\n/\n/g' -- "$@"
This will overwrite your files in place!
I recommend only doing this with a backup (version control or otherwise)
An even simpler AWK solution without a program:
awk -v ORS='\r\n' '1' unix.txt > dos.txt
Technically '1' is your program, because AWK requires one when given an option.
Alternatively, an internal solution is:
while IFS= read -r line;
do printf '%s\n' "${line%$'\r'}";
done < dos.txt > unix.txt
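One thing to watch with a read loop like this: if the last line of the input has no terminating newline (common for DOS files, as noted above), read returns a non-zero status for it and the loop as written drops that line. A sketch of a variant that keeps it, with the CR built via printf so it does not depend on $'...'; crlf_to_lf is a made-up name:

```shell
# Hypothetical helper: CRLF to LF on stdin using only shell built-ins
# (plus printf), keeping a final line that lacks a trailing newline.
crlf_to_lf() {
  cr=$(printf '\r')
  # `|| [ -n "$line" ]` lets the loop body run once more for an
  # unterminated last line, which read stores before reporting EOF.
  while IFS= read -r line || [ -n "$line" ]; do
    printf '%s\n' "${line%"$cr"}"
  done
}

# Usage: crlf_to_lf < dos.txt > unix.txt
```

As with any shell read loop, this will be much slower than tr or sed on large files.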
Interestingly, in my Git Bash on Windows, sed "" did the trick already:
$ echo -e "abc\r" >tst.txt
$ file tst.txt
tst.txt: ASCII text, with CRLF line terminators
$ sed -i "" tst.txt
$ file tst.txt
tst.txt: ASCII text
My guess is that sed ignores them when reading lines from the input and always writes Unix line endings to the output.
For Mac OS X if you have Homebrew installed (http://brew.sh/):
brew install dos2unix
for csv in *.csv; do dos2unix -c mac "$csv"; done
Make sure you have made copies of the files, as this command will modify the files in place.
The -c mac option selects Mac conversion mode, which converts old Mac-style CR line endings to LF.
I had just to ponder that same question (on Windows-side, but equally applicable to Linux).
Surprisingly, nobody mentioned a very much automated way of doing CRLF <-> LF conversion for text-files using the good old zip -ll option (Info-ZIP):
zip -ll textfiles-lf.zip files-with-crlf-eol.*
unzip textfiles-lf.zip
NOTE: this would create a ZIP file preserving the original file names, but converting the line endings to LF. Then unzip would extract the files as zip'ed, that is, with their original names (but with LF-endings), thus prompting to overwrite the local original files if any.
The relevant excerpt from the zip --help:
zip --help
...
-l convert LF to CR LF (-ll CR LF to LF)
sed -i.bak --expression='s/\r$//' <file_path>
Since the question mentions sed, this is a straightforward way to do it with GNU sed. sed reads the input one line at a time with the trailing line feed already stripped, so the expression only has to remove the carriage return left at the end of each line (a pattern such as \r\n would never match). That is what you need when you go from Windows to Unix.
Just complementing @Jonathan Leffler's excellent answer, if you have a file with mixed line endings (LF and CRLF) and you need to normalize to CRLF (DOS), use the following commands in sequence...
# DOS to Unix
sed -i $'s/\r$//' "<YOUR_FILE>"
# Unix to DOS (normalized)
sed -i $'s/$/\r/' "<YOUR_FILE>"
NOTE: If you have a file with mixed line endings (LF and CRLF), the second command above alone will cause a mess.
If you need to convert to LF (Unix) the first command alone will be enough...
# DOS to Unix
sed -i $'s/\r$//' "<YOUR_FILE>"
Thanks! 🤗
[Ref(s).: https://stackoverflow.com/a/3777853/3223785 ]
TIMTOWTDI!
perl -pe 's/\r\n/\n/; s/([^\n])\z/$1\n/ if eof' PCfile.txt
Based on Gordon Davisson's answer.
One must consider the possibility of [noeol]...
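You can use AWK. Set the record separator (RS) to a regular expression that matches every possible newline character or sequence, and set the output record separator (ORS) to the Unix-style newline character. Note that a regular-expression RS needs an AWK such as gawk or mawk; POSIX only specifies the behavior for a single-character RS.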
awk 'BEGIN{RS="\r|\n|\r\n|\n\r";ORS="\n"}{print}' windows_or_macos.txt > unix.txt
This worked for me
tr "\r" "\n" < sampledata.csv > sampledata2.csv
On Linux, it's easy to convert ^M (Ctrl + M) to *nix newlines (^J) with sed.
It will be something like this on the CLI, and there will actually be a line break in the text. However, the \ passes that ^J along to sed:
sed 's/^M/\
/g' < ffmpeg.log > new.log
You get this by using ^V (Ctrl + V), ^M (Ctrl + M) and \ (backslash) as you type:
sed 's/^V^M/\^V^J/g' < ffmpeg.log > new.log
As an extension to Jonathan Leffler's Unix to DOS solution, to safely convert to DOS when you're unsure of the file's current line endings:
sed '/^M$/! s/$/^M/'
This checks that the line does not already end in CRLF before converting to CRLF.
I made a script based on the accepted answer, so you can convert it directly without needing an additional file in the end and removing and renaming afterwards.
convert-crlf-to-lf() {
    file="$1"
    # Replace the original only if tr succeeded; mv overwrites it, so no rm is needed.
    tr -d '\015' <"$file" >"$file"2 &&
        mv "$file"2 "$file"
}
Just make sure if you have a file like "file1.txt" that "file1.txt2" doesn't already exist or it will be overwritten. I use this as a temporary place to store the file in.
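To avoid clobbering an existing "file1.txt2", the temporary name can come from mktemp instead. A sketch of that variation, assuming a mktemp that accepts a template argument (both the GNU and BSD versions do, as far as I know); the function name is my own:

```shell
# Hypothetical variant: convert CRLF to LF in place via a uniquely named
# temporary file created next to the original.
convert_crlf_to_lf() {
  file=$1
  tmp=$(mktemp "${file}.XXXXXX") || return 1
  if tr -d '\r' < "$file" > "$tmp"; then
    mv -- "$tmp" "$file"      # replace the original only on success
  else
    rm -f -- "$tmp"           # clean up and report failure
    return 1
  fi
}

# Usage: convert_crlf_to_lf file1.txt
```

Note that the converted file ends up with the permissions mktemp gave the temporary file (typically owner-only), so restore the original mode afterwards if that matters.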
With Bash 4.2 and newer you can use something like this to strip the trailing CR, which only uses Bash built-ins:
if [[ "${str: -1}" == $'\r' ]]; then
    str="${str:: -1}"
fi
I tried
sed 's/^M$//' file.txt
on OS X as well as several other methods (Fixing Dos Line Endings or http://hintsforums.macworld.com/archive/index.php/t-125.html). None worked, and the file remained unchanged (by the way, Ctrl + V, Enter was needed to reproduce ^M). In the end I used TextWrangler. It's not strictly command line, but it works and it doesn't complain.
