Remove \r (CR) from CSV

On OSX I need to remove line-ending CR (\r) characters (represented as ^M in the output from cat -v) from my CSV file:
$ cat -v myitems.csv
output:
strPicture,strEmail^M
image1xl.jpg,me@example.com^M
I have tried lots of options with sed and perl but nothing works.
Any ideas?

Solutions with stock utilities:
Note: Except where noted (the sed -i incompatibility), the following solutions work on both OSX (macOS) and Linux.
Use sed as follows, which replaces \r\n with \n:
sed $'s/\r$//' myitems.csv
To update the input file in place, use
sed -i '' $'s/\r$//' myitems.csv
-i '' specifies updating in place, with '' indicating that no backup should be made of the input file; if you specify an extension, e.g., -i'.bak', the original input file will be saved with that extension as a backup.
Caveats:
* With GNU sed (Linux), to not create a backup file, you'd have to use just -i, without the separate '' argument, which is an unfortunate syntactic incompatibility between GNU Sed and the BSD Sed used on OSX (macOS) - see this answer of mine for the full story.
* -i creates a new file with a temporary name and then replaces the original file; the most notable consequence is that if the original file was a symlink, it is replaced with a regular file; for a detailed discussion, see the lower half of this answer.
Note: The above uses an ANSI C-quoted string ($'...') to create the \r character in the sed command, because BSD sed (the one used on OS X), doesn't natively recognize such escape sequences (note that the GNU sed used on Linux distros would).
ANSI C-quoted strings are supported in Bash, Ksh, and Zsh.
If you don't want to rely on such strings, use:
sed 's/'"$(printf '\r')"'$//'
Here, the \r is created via printf and spliced into the sed command with a command substitution ($(...)).
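A quick way to check either variant is to build a small CRLF sample and count the CR bytes before and after. This is only a minimal sketch: the sample contents and the temporary file name are made up for illustration, and it uses the printf-based form so that it does not depend on ANSI C quoting.

```shell
# Build a hypothetical two-line sample with CRLF line endings.
sample=$(mktemp "${TMPDIR:-/tmp}/crdemo.XXXXXX")
printf 'strPicture,strEmail\r\nimage1xl.jpg,me@example.com\r\n' > "$sample"

# Create a literal CR; command substitution strips only trailing newlines.
cr=$(printf '\r')

# Count CR bytes before and after stripping the line-ending CR.
before=$(tr -dc '\r' < "$sample" | wc -c | tr -d ' ')
after=$(sed "s/${cr}\$//" "$sample" | tr -dc '\r' | wc -c | tr -d ' ')

echo "CRs before: $before, after: $after"
rm -f "$sample"
```

The sed call writes to stdout only, so the same check works unchanged with BSD and GNU sed.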
Using perl:
perl -pe 's/\r\n/\n/' myitems.csv | cat -v
To update the input file in place, use
perl -i -pe 's/\r\n/\n/' myitems.csv # -i'.bak' creates backup with suffix '.bak' first; no -l here, because it would strip the \n before the substitution runs
The same caveat as above for sed with regard to in-place updating applies.
Using awk:
awk '{ sub("\r$", ""); print }' myitems.csv # shorter: awk 'sub("\r$", "")+1'
BSD awk offers no in-place updating option, so you'll have to capture the output in a different file; to use a temporary file and have it replace the original afterward, use the following idiom:
awk '{ sub("\r$", ""); print }' myitems.csv > tmpfile && mv tmpfile myitems.csv
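The temporary-file idiom can be wrapped in a small function. The following is a sketch under assumptions, not a drop-in tool: the function name strip_cr_in_place and the demo file are hypothetical, and mktemp is assumed to be available (it is on macOS and on common Linux distros).

```shell
# Hypothetical helper: strip line-ending CRs from the file named in $1,
# writing to a temporary file first and replacing the original on success.
strip_cr_in_place() {
  tmpfile=$(mktemp "$1.XXXXXX") || return 1
  if awk '{ sub(/\r$/, ""); print }' "$1" > "$tmpfile"; then
    mv "$tmpfile" "$1"
  else
    rm -f "$tmpfile"
    return 1
  fi
}

# Usage with a made-up sample file:
demo=$(mktemp "${TMPDIR:-/tmp}/awkdemo.XXXXXX")
printf 'a,b\r\nc,d\r\n' > "$demo"
strip_cr_in_place "$demo"
cat -v "$demo"   # inspect the result; ^M would mark any remaining CR
```

Creating the temporary file next to the original keeps the final mv on one filesystem. Note that the replacement inherits the permissions of the temporary file rather than those of the original.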
GNU awk v4.1 or higher offers -i inplace for in-place updating, to which the same caveat as above for sed applies.
Edge case for all variants above: If the very last char. in the input file happens to be a lone \r without a following \n, it will also be replaced with a \n.
For the sake of completeness: here are additional, possibly suboptimal solutions:
None of them offer in-place updating, but you can employ the > tmpfile && mv tmpfile myitems.csv idiom introduced above
Using tr: a very simple solution that simply removes all \r instances; thus, it can only be used if \r instances only occur as part of \r\n sequences; typically, however, that is the case:
tr -d '\r' < myitems.csv
Using pure bash code: note that this will be slow; like the tr solution, this can only be used if \r instances only occur as part of \r\n sequences.
while IFS=$'\r' read -r line; do
printf '%s\n' "$line"
done < myitems.csv
$IFS is the internal field separator, and setting it to \r causes read to read everything before \r, if present, into variable $line (if there's no \r, the line is read as is). -r prevents read from interpreting \ instances in the input.
Edge case: If the input doesn't end with \n, the last line will not print - you could fix that by using read -r line || [[ -n $line ]].
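The edge-case fix can be tried on input whose last line has no trailing newline. The sketch below is a variant rather than the loop above: it strips the CR with a parameter expansion instead of $IFS and uses the POSIX [ ... ] test instead of [[ ... ]], so it should also run in plain sh. The sample data is made up.

```shell
# Sample input: CRLF endings, and no newline after the last line.
input=$(mktemp "${TMPDIR:-/tmp}/loopdemo.XXXXXX")
printf 'one\r\ntwo\r\nthree' > "$input"

cr=$(printf '\r')   # a literal CR, created without ANSI C quoting

# '|| [ -n "$line" ]' processes a final line that lacks a trailing newline.
output=$(
  while IFS= read -r line || [ -n "$line" ]; do
    printf '%s\n' "${line%"$cr"}"   # drop one trailing CR, if present
  done < "$input"
)
printf '%s\n' "$output"
rm -f "$input"
```

Unlike the $IFS approach, this removes only a CR at the very end of each line and leaves any CR inside a line alone.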

Try dos2unix, which converts the file in place; naming the file once is enough:
dos2unix myitems.csv

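Note that unix2dos converts in the opposite direction (LF to CRLF), so it adds \r characters; to remove them, use its counterpart, dos2unix.
Example: dos2unix -n infile outfile
http://en.wikipedia.org/wiki/Unix2dos
The Wikipedia page has some examples using perl and sed too. Reversed so that they remove the \r, they become (the sed form relies on GNU sed, which understands \r and a bare -i):
perl -i -p -e 's/\r\n/\n/' file
sed -i -e 's/\r$//' file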

Related

Why is cat printing only the first and last line of file? [duplicate]

I have this line inside a file:
ULNET-PA,client_sgcib,broker_keplersecurities
,KEPLER
I try to get rid of that ^M (carriage return) character so I used:
sed 's/^M//g'
However this does remove everything after ^M:
[root@localhost tmp]# vi test
ULNET-PA,client_sgcib,broker_keplersecurities^M,KEPLER
[root@localhost tmp]# sed 's/^M//g' test
ULNET-PA,client_sgcib,broker_keplersecurities
What I want to obtain is:
[root@localhost tmp]# vi test
ULNET-PA,client_sgcib,broker_keplersecurities,KEPLER

Use tr:
tr -d '^M' < inputfile
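(Note that ^M here stands for a single literal carriage return, entered by pressing Ctrl+V and then Ctrl+M; typing a caret followed by M would delete those two characters instead.)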
EDIT: As suggested by Glenn Jackman, if you're using bash, you could also say:
tr -d $'\r' < inputfile

The sed approach works as long as ^M is entered as a literal carriage return:
sed -i 's/^M//g' file
When typing the command, enter ^M by pressing Ctrl+V and then Ctrl+M.
If you already have the file open in vim, you can do the same there:
:%s/^M//g
Again, type ^M as Ctrl+V followed by Ctrl+M.

You can simply use dos2unix which is available in most Unix/Linux systems. However I found the following sed command to be better as it removed ^M where dos2unix couldn't:
sed 's/\r//g' < input.txt > output.txt
Hope that helps.
Note: ^M is actually carriage return character which is represented in code as \r
What dos2unix does is most likely equivalent to:
sed 's/\r\n/\n/g' < input.txt > output.txt
It doesn't remove \r when it is not immediately followed by \n and replaces both with just \n. This fails with certain types of files like one I just tested with.
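The difference between deleting every \r and deleting only a line-ending \r shows up on a line that contains both kinds. A small sketch with made-up data follows; the CR is created with printf so the commands do not rely on sed understanding \r itself.

```shell
# Hypothetical sample: one CR in the middle of a line, one before the newline.
cr=$(printf '\r')
sample=$(printf 'a\rb,KEPLER\r\n')   # $(...) strips the final newline only

# Variant 1: delete every CR, wherever it occurs.
all_removed=$(printf '%s\n' "$sample" | tr -d '\r')

# Variant 2: delete only a CR directly before the end of the line.
eol_removed=$(printf '%s\n' "$sample" | sed "s/${cr}\$//")

printf '%s\n' "$all_removed"            # no CR left
printf '%s\n' "$eol_removed" | cat -v   # the mid-line CR survives as ^M
```

Which variant is appropriate depends on whether a CR inside a line is data or debris; for the file in this question it is debris, so removing all of them is the right call.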

alias dos2unix="sed -i -e 's/'\"\$(printf '\015')\"'//g' "
Usage:
dos2unix file

If Perl is an option:
perl -i -pe 's/\r\n$/\n/g' file
-i edits the file in place; write -i.bak instead to keep a .bak copy of the input file
\r = carriage return
\n = linefeed
$ = end of line
s/foo/bar/g = globally substitute "foo" with "bar"

In awk:
sub(/\r/,"")
If it is in the end of record, sub(/\r/,"",$NF) should suffice. No need to scan the whole record.

A better way to achieve this:
tr -d '\015' < inputfile_name > outputfile_name
Afterwards, rename the output file to the original file name.

I agree with @twalberg (see accepted answer comments, above), dos2unix on Mac OSX covers this, quoting man dos2unix:
To run in Mac mode use the command-line option "-c mac" or use the
commands "mac2unix" or "unix2mac"
I settled on 'mac2unix', which got rid of my less-cmd-visible '^M' entries, introduced by an Apple 'Messages' transfer of a bash script between 2 Yosemite (OSX 10.10) Macs!
I installed 'dos2unix', trivially, on Mac OSX using the popular Homebrew package installer; I highly recommend it and its companion command, Cask.

This is clean and simple and it works:
sed -i 's/\r//g' file
where \r of course is the equivalent for ^M.

Simply run the following command:
sed -i -e 's/\r$//' input.file
I verified this as valid in Mac OSX Monterey.

remove any \r :
nawk 'NF+=OFS=_' FS='\r'
gawk 3 ORS= RS='\r'
remove end of line \r :
mawk2 8 RS='\r?\n'
mawk -F'\r$' NF=1

How to delete a line (matching a pattern) from a text file? [duplicate]

How would I use sed to delete all lines in a text file that contain a specific string?

To remove the line and print the output to standard out:
sed '/pattern to match/d' ./infile
To directly modify the file – does not work with BSD sed:
sed -i '/pattern to match/d' ./infile
Same, but for BSD sed (Mac OS X and FreeBSD) – does not work with GNU sed:
sed -i '' '/pattern to match/d' ./infile
To directly modify the file (and create a backup) – works with BSD and GNU sed:
sed -i.bak '/pattern to match/d' ./infile
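If no backup is wanted but the command still has to run under both sed flavors, one option is to let sed create the backup and delete it afterwards. This is a sketch only; the function name delete_matching_lines and the sample file are hypothetical, and the pattern is pasted into the sed script as is, so it must be trusted and must not contain an unescaped /.

```shell
# Hypothetical helper: delete lines matching the regex $1 from file $2.
# "-i.bak" (suffix attached, no space) is accepted by both GNU and BSD sed;
# the backup is removed again once sed has succeeded.
delete_matching_lines() {
  sed -i.bak "/$1/d" "$2" && rm -f "$2.bak"
}

# Usage with a made-up file:
infile=$(mktemp "${TMPDIR:-/tmp}/seddemo.XXXXXX")
printf 'keep me\npattern to match\nkeep me too\n' > "$infile"
delete_matching_lines 'pattern to match' "$infile"
cat "$infile"
```

If sed fails, the && keeps the .bak file around, which is usually what you want.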

There are many other ways to delete lines with specific string besides sed:
AWK
awk '!/pattern/' file > temp && mv temp file
Ruby (1.9+)
ruby -i.bak -ne 'print if not /pattern/' file
Perl
perl -ni.bak -e "print unless /pattern/" file
Shell (bash 3.2 and later)
while read -r line
do
[[ ! $line =~ pattern ]] && echo "$line"
done <file > o
mv o file
GNU grep
grep -v "pattern" file > temp && mv temp file
And of course sed (printing the inverse is faster than actual deletion):
sed -n '/pattern/!p' file

You can use sed to replace lines in place in a file. However, it seems to be much slower than using grep for the inverse into a second file and then moving the second file over the original.
e.g.
sed -i '/pattern/d' filename
or
grep -v "pattern" filename > filename2; mv filename2 filename
The first command takes 3 times longer on my machine anyway.

The easy way to do it, with GNU sed:
sed --in-place '/some string here/d' yourfile

You may consider using ex (which is a standard Unix command-based editor):
ex +g/match/d -cwq file
where:
+ executes given Ex command (man ex), same as -c which executes wq (write and quit)
g/match/d - Ex command to delete lines with given match, see: Power of g
The above example is a POSIX-compliant method for in-place editing a file as per this post at Unix.SE and POSIX specifications for ex.
The difference with sed is that:
sed is a Stream EDitor, not a file editor. (BashFAQ)
Unless you enjoy unportable code, I/O overhead and some other bad side effects. So basically some parameters (such as in-place/-i) are non-standard FreeBSD extensions and may not be available on other operating systems.

I was struggling with this on Mac. Plus, I needed to do it using variable replacement.
So I used:
sed -i '' "/$pattern/d" "$file"
where $file is the file where deletion is needed and $pattern is the pattern to be matched for deletion.
I picked the '' from this comment.
The thing to note here is use of double quotes in "/$pattern/d". Variable won't work when we use single quotes.

You can also use this:
grep -v 'pattern' filename
Here, -v inverts the match, so grep prints only the lines that do not match the pattern.

To get an in-place-like result with grep, you can do this:
echo "$(grep -v "pattern" filename)" >filename

I have made a small benchmark with a file which contains approximately 345 000 lines. The way with grep seems to be around 15 times faster than the sed method in this case.
I have tried both with and without the setting LC_ALL=C; it does not seem to change the timings significantly. The search string (CDGA_00004.pdbqt.gz.tar) is somewhere in the middle of the file.
Here are the commands and the timings:
time sed -i "/CDGA_00004.pdbqt.gz.tar/d" /tmp/input.txt
real 0m0.711s
user 0m0.179s
sys 0m0.530s
time perl -ni -e 'print unless /CDGA_00004.pdbqt.gz.tar/' /tmp/input.txt
real 0m0.105s
user 0m0.088s
sys 0m0.016s
time (grep -v CDGA_00004.pdbqt.gz.tar /tmp/input.txt > /tmp/input.tmp; mv /tmp/input.tmp /tmp/input.txt )
real 0m0.046s
user 0m0.014s
sys 0m0.019s

Delete the matching lines from all files that contain the search text:
grep -rl 'text_to_search' . | xargs sed -i '/text_to_search/d'

SED:
'/James\|John/d'
-n '/James\|John/!p'
AWK:
'!/James|John/'
/James|John/ {next;} {print}
GREP:
-v 'James\|John'

perl -i -nle'/regexp/||print' file1 file2 file3
perl -i.bk -nle'/regexp/||print' file1 file2 file3
The first command edits the file(s) inplace (-i).
The second command does the same thing but keeps a copy or backup of the original file(s) by adding .bk to the file names (.bk can be changed to anything).

You can also delete a range of lines in a file.
For example to delete stored procedures in a SQL file.
sed '/CREATE PROCEDURE.*/,/END ;/d' sqllines.sql
This will remove every line from the one matching CREATE PROCEDURE through the next one matching END ;, both included.
I have cleaned up many SQL files with this sed command.
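A range address can be tried safely on a throwaway file first. The SQL below is made up for illustration, and the command prints to stdout instead of editing in place.

```shell
# Made-up SQL file with one stored procedure between two ordinary statements.
sqlfile=$(mktemp "${TMPDIR:-/tmp}/sqldemo.XXXXXX")
cat > "$sqlfile" <<'EOF'
SELECT 1;
CREATE PROCEDURE demo()
BEGIN
  SELECT 2;
END ;
SELECT 3;
EOF

# Delete from the first line matching the start pattern through the next
# line matching the end pattern, inclusive.
remaining=$(sed '/CREATE PROCEDURE/,/END ;/d' "$sqlfile")
printf '%s\n' "$remaining"
rm -f "$sqlfile"
```

Keep in mind that if the end pattern never matches, the range runs to the end of the file, so everything after the start line is deleted.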

echo -e "/thing_to_delete\ndd\033:x\n" | vim file_to_edit.txt

Just in case someone wants to do it for exact matches of strings, you can use the -w flag in grep - w for whole. That is, for example if you want to delete the lines that have number 11, but keep the lines with number 111:
-bash-4.1$ head file
1
11
111
-bash-4.1$ grep -v "11" file
1
-bash-4.1$ grep -w -v "11" file
1
111
It also works with the -f flag if you want to exclude several exact patterns at once. If "blacklist" is a file with one pattern per line that you want to delete from "file":
grep -w -v -f blacklist file

to show the treated text in console
cat filename | sed '/text to remove/d'
to save treated text into a file
cat filename | sed '/text to remove/d' > newfile
to append treated text to an existing file
cat filename | sed '/text to remove/d' >> newfile
to treat already treated text, in this case remove more lines of what has been removed
cat filename | sed '/text to remove/d' | sed '/remove this too/d' | more
the | more will show text in chunks of one page at a time.

Curiously enough, the accepted answer does not actually answer the question directly. The question asks about using sed to delete lines containing a specific string, but the answer seems to presuppose knowledge of how to convert an arbitrary string into a regex.
Many programming language libraries have a function to perform such a transformation, e.g.
python: re.escape(STRING)
ruby: Regexp.escape(STRING)
java: Pattern.quote(STRING)
But how to do it on the command line?
Since this is a sed-oriented question, one approach would be to use sed itself:
sed 's/\([\[/({.*+^$?]\)/\\\1/g'
So given an arbitrary string $STRING we could write something like:
re=$(sed 's/\([\[/({.*+^$?]\)/\\\1/g' <<< "$STRING")
sed "/$re/d" FILE
or as a one-liner:
sed "/$(sed 's/\([\[/({.*+^$?]\)/\\\1/g' <<< "$STRING")/d"
with variations as described elsewhere on this page.
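For comparison, here is a sketch of the same idea that targets sed's default basic regular expressions only. It is a variant under assumptions, not the command above: the helper name escape_bre is hypothetical, the string is assumed to be a single line, and + ? ( { are deliberately left alone, because in GNU sed's basic syntax a backslash in front of them turns them into operators.

```shell
# Hypothetical helper: escape the characters that are special in a POSIX
# basic regular expression, plus the "/" used as the address delimiter.
escape_bre() {
  printf '%s\n' "$1" | sed 's|[][\\/.*^$]|\\&|g'
}

# Made-up data: only the first line contains the literal string "100.am".
datafile=$(mktemp "${TMPDIR:-/tmp}/escdemo.XXXXXX")
printf '100.am\n100xam\nfoo/bar\n' > "$datafile"

re=$(escape_bre '100.am')
kept=$(sed "/$re/d" "$datafile")
printf '%s\n' "$kept"
rm -f "$datafile"
```

Without the escaping, the unescaped dot would also match 100xam. When the goal is just to drop lines containing a fixed string, grep -F -v sidesteps the escaping question entirely.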

cat filename | grep -v "pattern" > filename.1
mv filename.1 filename

You can use good old ed to edit a file in a similar fashion to the answer that uses ex. The big difference in this case is that ed takes its commands via standard input, not as command line arguments like ex can. When using it in a script, the usual way to accommodate this is to use printf to pipe commands to it:
printf "%s\n" "g/pattern/d" w | ed -s filename
or with a heredoc:
ed -s filename <<EOF
g/pattern/d
w
EOF

This solution performs the same operation on multiple files.
for file in *.txt; do grep -v "Matching Text" "$file" > temp_file.txt; mv temp_file.txt "$file"; done

I found most of the answers not useful for me, If you use vim I found this very easy and straightforward:
:g/<pattern>/d
Source

Bash Script to Replace a word followed by colon followed by a space and then a number

I want to find the revision number in a file. The user will enter a new revision number, and this new one should replace the old one.
example:
revision: 56
should be replaced with 67 if the users input is 67, like this;
revision: 67
I want a bash script which would find and replace the old number with the new one. The value for the new revision number will be stored in the variable revision_number.
So far this is what i got:
#!/bin/bash
echo “Insert the new revision number: “
read revision_number
sed -e ^revision: \d+$/^revision: $revision_number$ *.txt

Like this:
sed "s/^revision:.*/revision: ${revision_number}/" file

Using sed
To change the files in place with sed, use:
sed -i -E "s/^revision: [[:digit:]]+$/revision: $revision_number/" *.txt
Notes:
sed supports POSIX regular expressions. \d is not POSIX. [[:digit:]] is POSIX and is superior to [0-9] because it is unicode safe.
+ is not supported in POSIX's basic regular expressions. Use -E to get extended regular expressions.
-i tells sed to change the files in place.
Note that sed treats $revision_number not as data but as part of the command. This is dangerous. Malicious values of $revision_number could cause files to be deleted or overwritten.
Compatibility: The above works for modern GNU sed. For very old GNU sed, replace -E with -r:
sed -i -r "s/^revision: [[:digit:]]+$/revision: $revision_number/" *.txt
For BSD/OSX sed, replace -i with -i '':
sed -i '' -E "s/^revision: [[:digit:]]+$/revision: $revision_number/" *.txt
Using GNU awk
To change the files in-place using GNU awk:
gawk -i inplace -v r="$revision_number" '{sub(/^revision: [[:digit:]]+/, "revision: " r)} 1' *.txt
Notes:
BSD/OSX awk does not support -i inplace.
Because awk treats $revision_number as data not code, this is much safer to use than the sed approach.
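If sed is still preferred, the risk described above can be reduced by refusing anything that is not a plain number before it reaches the sed script. A minimal sketch follows, assuming revision numbers are unsigned integers; the function name set_revision and the sample file are hypothetical, and the result goes to stdout rather than being written in place.

```shell
# Hypothetical helper: print file $2 with its revision line set to $1.
# The value is rejected unless it consists of digits only, so nothing
# else can ever reach the sed script.
set_revision() {
  case $1 in
    ''|*[!0-9]*) echo "not a number: $1" >&2; return 1 ;;
  esac
  sed "s/^revision: [0-9][0-9]*\$/revision: $1/" "$2"
}

# Usage with a made-up file:
revfile=$(mktemp "${TMPDIR:-/tmp}/revdemo.XXXXXX")
printf 'name: demo\nrevision: 56\n' > "$revfile"
updated=$(set_revision 67 "$revfile")
printf '%s\n' "$updated"
```

The pattern uses [0-9][0-9]* instead of + so that it works in basic regular expressions without -E or -r.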

perl
perl -i -spe 's/^revision:\s+\K\d+/$revno/g' -- -revno="$revision_number" *.txt

sed delete not working with cat variable

I have a file named test-domain, the contents of which contain the line 100.am.
When I do this, the line with 100.am is deleted from the test-domain file, as expected:
for x in $(echo 100.am); do sed -i "/$x/d" test-domain; done
However, if instead of echo 100.am, I read each line from a file named unwanted-lines, it does NOT work.
for x in $(cat unwanted-lines); do sed -i "/$x/d" test-domain; done
This is even if the only contents of unwanted-lines is one line, with the exact contents 100.am.
Does anyone know why sed delete line works if you use echo in your variable, but not if you use cat?

fgrep -v -f unwanted-lines test-domain > /tmp/Buffer
mv /tmp/Buffer test-domain
Calling sed once per pattern from a shell loop is a poor fit here, because every iteration starts a new process and rewrites the file. You could still use sed by preloading the lines to delete and searching against that list, but that is much heavier than fgrep in this case.

Does anyone know why sed delete line works if you use echo in your
variable, but not if you use cat?
I believe that your file containing unwanted lines contains CR+LF line endings due to which it doesn't work when you use the file. You could strip the CR in your loop:
for x in $(cat unwanted-lines); do x="${x//$'\r'}"; sed -i "/$x/d" test-domain; done
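The CR stripping can also be done once, up front, and combined with a single fixed-string grep instead of one sed call per line. This is a sketch with made-up stand-ins for unwanted-lines and test-domain; it prints the filtered result instead of replacing the file.

```shell
# Made-up inputs: the pattern file has CRLF line endings.
patterns=$(mktemp "${TMPDIR:-/tmp}/unwanted.XXXXXX")
domains=$(mktemp "${TMPDIR:-/tmp}/domains.XXXXXX")
printf '100.am\r\n' > "$patterns"
printf '100.am\nexample.org\n' > "$domains"

# Strip the CRs into a clean copy, then drop every line of the data file
# that contains one of the patterns as a fixed string (-F), in one pass.
clean=$(mktemp "${TMPDIR:-/tmp}/clean.XXXXXX")
tr -d '\r' < "$patterns" > "$clean"
filtered=$(grep -F -v -f "$clean" "$domains")
printf '%s\n' "$filtered"
rm -f "$patterns" "$domains" "$clean"
```

Because -F treats each pattern literally, the dot in 100.am needs no escaping. Watch out for empty lines in the pattern file, since an empty pattern matches every line.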

One better strategy than yours would be to use a genuine editor, e.g., ed, as so:
ed -s test-domain < <(
shopt -s extglob
while IFS= read -r l; do
[[ $l = *([[:space:]]) ]] && continue
l=${l//./\\.}
echo "g/$l/d"
done < unwanted-lines
echo "wq"
)
Caveat. You must make sure that the file unwanted-lines doesn't contain any character that could clash with ed's regexps and commands. I have already included a match for a period (i.e., replace . with \.).
This method is quite efficient, as you're not forking so many times on sed, writing temp files, renaming them, etc.
Another possibility would be to use grep, but then you won't have the editing option ed offers.
Remark. ed is the standard editor.

Why not just apply the sed command to your file directly?
sed -i '/.*100\.am/d' your_file

unix commandline for inline replacement of all newlines in file with <br>\n

sed 's/$/<br>/' mytext.txt
worked but output it all on the command line. I want it to just do the replacement WITHIN the specified file itself. Should I be using another tool?

If you have GNU sed, you can use the -i option, which will do the replacement in place.
sed -i 's/$/<br>/' mytext.txt
Otherwise you will have to redirect to another file and rename it over the old one.
sed 's/$/<br>/' mytext.txt > mytext.txt.new && mv mytext.txt.new mytext.txt

Just for completeness. On Mac OS X (which uses FreeBSD sed) you have to use an additional null-string "" for in-place file editing without backup:
sed -i "" 's/$/<br>/' mytext.txt
As an alternative to using sed for no-backup in-place file editing, you may use ed(1), which, however, reads the entire file into memory before operating on it.
printf '%s\n' H 'g/$/s//<br>/g' ',p' | ed -s test.file # print to stdout
printf '%s\n' H 'g/$/s//<br>/g' wq | ed -s test.file # in-place file edit
For more information on ed(1) see:
"Editing files with the ed text editor from scripts",
http://wiki.bash-hackers.org/doku.php?id=howto:edit-ed

If you have an up-to-date sed, just use sed -i or sed --in-place, which will modify the actual file itself.
If you want a backup, you need to supply a suffix for it. So sed -i.bak or sed --in-place=.bak will create a backup file with the .bak suffix.
Use this! Seriously! You'll appreciate it a lot the first time you damage your file due to a mistyped sed command or wrong assumption about the data in the file.

Use the redirection symbol i.e.
sed 's/$/<br>/' mytext.txt > mytext2.txt && mv mytext2.txt mytext.txt
