How can I replace a newline ("\n") with a space (" ") using the sed command?
I unsuccessfully tried:
sed 's#\n# #g' file
sed 's#^$# #g' file
How do I fix it?
sed is intended to be used on line-based input, although it can do what you need.
A better option here is to use the tr command as follows:
tr '\n' ' ' < input_filename
or remove the newline characters entirely:
tr -d '\n' < input.txt > output.txt
or if you have the GNU version (with its long options)
tr --delete '\n' < input.txt > output.txt
Use this solution with GNU sed:
sed ':a;N;$!ba;s/\n/ /g' file
This reads the whole file in a loop (:a;N;$!ba), then replaces the newline(s) with a space (s/\n/ /g). Additional substitutions can simply be appended if needed.
Explanation:
sed starts by reading the first line excluding the newline into the pattern space.
Create a label via :a.
Append a newline and next line to the pattern space via N.
If we are before the last line, branch to the created label $!ba ($! means not to do it on the last line. This is necessary to avoid executing N again, which would terminate the script if there is no more input!).
Finally the substitution replaces every newline with a space on the pattern space (which is the whole file).
Here is cross-platform compatible syntax which works with BSD and OS X's sed (as per @Benjie's comment):
sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/ /g' file
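To watch the loop do its work, here is a small sketch. The function name join_lines is made up for illustration, and it assumes the whole input fits in sed's pattern space:

```shell
# join_lines: hypothetical helper. Collects all input lines in the pattern
# space with the :a / N / $!ba loop, then replaces the embedded newlines.
join_lines() {
    sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/ /g'
}

printf 'foo\nbar\nbaz\n' | join_lines    # prints: foo bar baz
```

The final newline is kept, because sed adds it back when it prints the pattern space.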
As you can see, using sed for this otherwise simple problem is problematic. For a simpler and adequate solution see this answer.
Fast answer
sed ':a;N;$!ba;s/\n/ /g' file
:a create a label 'a'
N append the next line to the pattern space
$! if not the last line, ba branch (go to) label 'a'
s substitute, /\n/ regex for new line, / / by a space, /g global match (as many times as it can)
sed will loop through steps 1 to 3 until it reaches the last line, so that all lines end up in the pattern space, where sed then substitutes all \n characters
Alternatives
All alternatives, unlike sed, do not need to reach the last line to begin the process
with bash, slow
while IFS= read -r line; do printf '%s ' "$line"; done < file
with perl, sed-like speed
perl -p -e 's/\n/ /' file
with tr, faster than sed, can replace by one character only
tr '\n' ' ' < file
with paste, tr-like speed, can replace by one character only
paste -s -d ' ' file
with awk, tr-like speed
awk 1 ORS=' ' file
Other alternatives like "echo $(< file)" are slow, work only on small files, and need to read the whole file before producing any output.
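The alternatives differ in what they do with the last newline, which matters if the result is fed to another command. A quick comparison, assuming a three-line input that ends with a newline (sample is just a stand-in for a real file):

```shell
# sample: hypothetical stand-in for "file"
sample() { printf 'a\nb\nc\n'; }

# The brackets make a trailing space visible; $(...) strips trailing newlines only.
printf '[%s]\n' "$(sample | tr '\n' ' ')"         # [a b c ]  last newline became a space
printf '[%s]\n' "$(sample | awk 1 ORS=' ')"       # [a b c ]  ORS follows every record
printf '[%s]\n' "$(sample | paste -s -d ' ' -)"   # [a b c]   separators only between lines
```

So paste is the one to pick when a trailing space would be a problem.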
Long answer from the sed FAQ 5.10
5.10. Why can't I match or delete a newline using the \n escape
sequence? Why can't I match 2 or more lines using \n?
The \n will never match the newline at the end-of-line because the
newline is always stripped off before the line is placed into the
pattern space. To get 2 or more lines into the pattern space, use
the 'N' command or something similar (such as 'H;...;g;').
Sed works like this: sed reads one line at a time, chops off the
terminating newline, puts what is left into the pattern space where
the sed script can address or change it, and when the pattern space
is printed, appends a newline to stdout (or to a file). If the
pattern space is entirely or partially deleted with 'd' or 'D', the
newline is not added in such cases. Thus, scripts like
sed 's/\n//' file # to delete newlines from each line
sed 's/\n/foo\n/' file # to add a word to the end of each line
will NEVER work, because the trailing newline is removed before
the line is put into the pattern space. To perform the above tasks,
use one of these scripts instead:
tr -d '\n' < file # use tr to delete newlines
sed ':a;N;$!ba;s/\n//g' file # GNU sed to delete newlines
sed 's/$/ foo/' file # add "foo" to end of each line
Since versions of sed other than GNU sed have limits to the size of
the pattern buffer, the Unix 'tr' utility is to be preferred here.
If the last line of the file contains a newline, GNU sed will add
that newline to the output but delete all others, whereas tr will
delete all newlines.
To match a block of two or more lines, there are 3 basic choices:
(1) use the 'N' command to add the Next line to the pattern space;
(2) use the 'H' command at least twice to append the current line
to the Hold space, and then retrieve the lines from the hold space
with x, g, or G; or (3) use address ranges (see section 3.3, above)
to match lines between two specified addresses.
Choices (1) and (2) will put an \n into the pattern space, where it
can be addressed as desired ('s/ABC\nXYZ/alphabet/g'). One example
of using 'N' to delete a block of lines appears in section 4.13
("How do I delete a block of specific consecutive lines?"). This
example can be modified by changing the delete command to something
else, like 'p' (print), 'i' (insert), 'c' (change), 'a' (append),
or 's' (substitute).
Choice (3) will not put an \n into the pattern space, but it does
match a block of consecutive lines, so it may be that you don't
even need the \n to find what you're looking for. Since GNU sed
version 3.02.80 now supports this syntax:
sed '/start/,+4d' # to delete "start" plus the next 4 lines,
in addition to the traditional '/from here/,/to there/{...}' range
addresses, it may be possible to avoid the use of \n entirely.
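The point of the FAQ entry can be checked in two lines. This is only a sketch, reusing the ABC/XYZ pattern from the FAQ text:

```shell
# Without N, each cycle sees one line whose newline is already stripped,
# so \n never matches and the input passes through unchanged.
printf 'ABC\nXYZ\n' | sed 's/ABC\nXYZ/alphabet/'      # prints ABC and XYZ on two lines

# With N, the second line is appended after an embedded \n that s can match.
printf 'ABC\nXYZ\n' | sed 'N;s/ABC\nXYZ/alphabet/'    # prints: alphabet
```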
A shorter awk alternative:
awk 1 ORS=' '
Explanation
An awk program is built up of rules which consist of conditional code-blocks, i.e.:
condition { code-block }
If the code-block is omitted, the default is used: { print $0 }. Thus, the 1 is interpreted as a true condition and print $0 is executed for each line.
When awk reads the input it splits it into records based on the value of RS (Record Separator), which by default is a newline, thus awk will by default parse the input line-wise. The splitting also involves stripping off RS from the input record.
Now, when printing a record, ORS (Output Record Separator) is appended to it, default is again a newline. So by changing ORS to a space all newlines are changed to spaces.
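A short sketch of both separators at work. The ", " and ";" values are arbitrary choices for the demonstration:

```shell
# ORS is appended after every record that print emits, including the last.
printf 'a\nb\nc\n' | awk 'BEGIN { ORS = ", " } 1'    # a, b, c,  (ends with ", ", no newline)

# RS controls how the input is split; here records end at ";" instead of newline.
printf 'a;b;c' | awk 'BEGIN { RS = ";" } 1'          # a, b and c on separate lines
```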
GNU sed has an option, -z, for null-separated records (lines). You can just call:
sed -z 's/\n/ /g'
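With -z, a text file without NUL bytes arrives in the pattern space as one record, final newline included. A short sketch, assuming GNU sed (other implementations may not have -z or accept \n in the replacement):

```shell
# Every newline is replaced, the last one too, so the output ends with a
# space and has no final newline.
printf 'one\ntwo\nthree\n' | sed -z 's/\n/ /g'

# To keep a final newline, turn the trailing space back into one.
printf 'one\ntwo\nthree\n' | sed -z 's/\n/ /g; s/ $/\n/'    # prints: one two three
```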
The Perl version works the way you expected.
perl -i -p -e 's/\n//' file
As pointed out in the comments, it's worth noting that this edits in place. -i.bak will give you a backup of the original file before the replacement in case your regular expression isn't as smart as you thought.
Who needs sed? Here is the bash way:
cat test.txt | while read line; do echo -n "$line "; done
In order to replace all newlines with spaces using awk, without reading the whole file into memory:
awk '{printf "%s ", $0}' inputfile
If you want a final newline:
awk '{printf "%s ", $0} END {printf "\n"}' inputfile
You can use a character other than space:
awk '{printf "%s|", $0} END {printf "\n"}' inputfile
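The commands above leave a separator after the last line. If that is unwanted, one possible variation (a sketch, still processing line by line) prints the separator before each line except the first:

```shell
# NR > 1: emit the separator only between records, never after the last one.
# END adds a single final newline.
printf 'a\nb\nc\n' | awk 'NR > 1 { printf "|" } { printf "%s", $0 } END { printf "\n" }'    # prints: a|b|c
```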
tr '\n' ' '
is the command.
Simple and easy to use.
Three things.
tr (or cat, etc.) is absolutely not needed. (GNU) sed and (GNU) awk, when combined, can do 99.9% of any text processing you need.
stream != line based. ed is a line-based editor. sed is not. See sed lecture for more information on the difference. Most people confuse sed to be line-based because it is, by default, not very greedy in its pattern matching for SIMPLE matches - for instance, when doing pattern searching and replacing by one or two characters, it by default only replaces on the first match it finds (unless specified otherwise by the global command). There would not even be a global command if it were line-based rather than STREAM-based, because it would evaluate only lines at a time. Try running ed; you'll notice the difference. ed is pretty useful if you want to iterate over specific lines (such as in a for-loop), but most of the times you'll just want sed.
That being said,
sed -e '{:q;N;s/\n/ /g;t q}' file
works just fine in GNU sed version 4.2.1. The above command will replace all newlines with spaces. It's ugly and a bit cumbersome to type in, but it works just fine. The {}'s can be left out, as they're only included for sanity reasons.
Why didn't I find a simple solution with awk?
awk '{printf "%s", $0}' file
printf prints every line without a trailing newline. If you want to separate the original lines with a space or another string:
awk '{printf "%s ", $0}' file
The answer with the :a label ...
How can I replace a newline (\n) using sed?
... does not work in freebsd 7.2 on the command line:
( echo foo ; echo bar ) | sed ':a;N;$!ba;s/\n/ /g'
sed: 1: ":a;N;$!ba;s/\n/ /g": unused label 'a;N;$!ba;s/\n/ /g'
foo
bar
But does if you put the sed script in a file or use -e to "build" the sed script...
> (echo foo; echo bar) | sed -e :a -e N -e '$!ba' -e 's/\n/ /g'
foo bar
or ...
> cat > x.sed << eof
:a
N
$!ba
s/\n/ /g
eof
> (echo foo; echo bar) | sed -f x.sed
foo bar
Maybe the sed in OS X is similar.
Easy-to-understand Solution
I had this problem. The kicker was that I needed the solution to work on BSD's (Mac OS X) and GNU's (Linux and Cygwin) sed and tr:
$ echo 'foo
bar
baz
foo2
bar2
baz2' \
| tr '\n' '\000' \
| sed 's:\x00\x00.*:\n:g' \
| tr '\000' '\n'
Output:
foo
bar
baz
(has trailing newline)
It works on Linux, OS X, and BSD - even without UTF-8 support or with a crappy terminal.
Use tr to swap the newline with another character.
NULL (\000 or \x00) is nice because it doesn't need UTF-8 support and it's not likely to be used.
Use sed to match the NULL
Use tr to swap back extra newlines if you need them
You can use xargs:
seq 10 | xargs
or
seq 10 | xargs echo -n
cat file | xargs
for the sake of completeness
If you are unfortunate enough to have to deal with Windows line endings, you need to remove the \r and the \n:
tr '\r\n' ' ' < $input > $output
I'm not an expert, but I guess in sed you'd first need to append the next line into the pattern space, by using "N". From the section "Multiline Pattern Space" in "Advanced sed Commands" of the book sed & awk (Dale Dougherty and Arnold Robbins; O'Reilly 1997; page 107 in the preview):
The multiline Next (N) command creates a multiline pattern space by reading a new line of input and appending it to the contents of the pattern space. The original contents of pattern space and the new input line are separated by a newline. The embedded newline character can be matched in patterns by the escape sequence "\n". In a multiline pattern space, the metacharacter "^" matches the very first character of the pattern space, and not the character(s) following any embedded newline(s). Similarly, "$" matches only the final newline in the pattern space, and not any embedded newline(s). After the Next command is executed, control is then passed to subsequent commands in the script.
From man sed:
[2addr]N
Append the next line of input to the pattern space, using an embedded newline character to separate the appended material from the original contents. Note that the current line number changes.
I've used this to search (multiple) badly formatted log files, in which the search string may be found on an "orphaned" next line.
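A minimal sketch of N on its own, with made-up input, joining lines in pairs:

```shell
# N appends the next input line after an embedded newline, so the s command
# sees two lines at once and can replace the newline between them.
printf '1\n2\n3\n4\n' | sed 'N;s/\n/ /'
# prints:
# 1 2
# 3 4
```

Because there is no loop, each cycle handles exactly two lines. The :a ... ba loop from the other answers is what extends this to the whole file.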
In response to the "tr" solution above, on Windows (probably using the Gnuwin32 version of tr), the proposed solution:
tr '\n' ' ' < input
was not working for me, it'd either error or actually replace the \n w/ '' for some reason.
Using another feature of tr, the "delete" option -d did work though:
tr -d '\n' < input
or '\r\n' instead of '\n'
I used a hybrid approach to get around the newline thing by using tr to replace newlines with tabs, then replacing tabs with whatever I want. In this case, "<br>", since I'm trying to generate HTML breaks.
echo -e "a\nb\nc\n" | tr '\n' '\t' | sed 's/\t/ <br> /g'
You can also use this method:
sed 'x;G;1!h;s/\n/ /g;$!d'
Explanation
x - exchanges the contents of the pattern space and the hold space.
G - appends a newline and the contents of the hold space to the pattern space.
h - copies the pattern space to the hold space.
1!h - on the first line the pattern space is not copied to the hold space, because at that point it begins with a stray \n.
$!d - deletes the pattern space on every line except the last, so nothing is printed until the end.
Flow
When the first line is read, the exchange moves 1 to the hold space and leaves the pattern space empty. G then appends a newline and the hold space to the pattern space, the substitution is performed, and the pattern space is deleted.
On the second line, the exchange moves 2 to the hold space and brings 1 into the pattern space. G appends the hold space to the pattern space, h copies the result back to the hold space, the substitution is made, and the pattern space is deleted. This continues until EOF is reached, at which point the complete result is printed.
Bullet-proof solution. Binary-data-safe and POSIX-compliant, but slow.
POSIX sed requires input according to the POSIX text file and POSIX line definitions, so NULL-bytes and too long lines are not allowed and each line must end with a newline (including the last line). This makes it hard to use sed for processing arbitrary input data.
The following solution avoids sed and instead converts the input bytes to octal codes and then to bytes again, but intercepts octal code 012 (newline) and outputs the replacement string in place of it. As far as I can tell the solution is POSIX-compliant, so it should work on a wide variety of platforms.
od -A n -t o1 -v | tr ' \t' '\n\n' | grep . |
while read x; do [ "0$x" -eq 012 ] && printf '<br>\n' || printf "\\$x"; done
POSIX reference documentation: sh, shell command language, od, tr, grep, read, [, printf.
read, [, and printf are all built-ins in at least bash, but that is probably not guaranteed by POSIX, so on some platforms each input byte may start one or more new processes, which will slow things down. Even in bash this solution only reaches about 50 kB/s, so it's not suited for large files.
Tested on Ubuntu (bash, dash, and busybox), FreeBSD, and OpenBSD.
In some situations maybe you can change RS to some other string or character. This way, \n is available for sub/gsub:
$ gawk 'BEGIN {RS="dn" } {gsub("\n"," ") ;print $0 }' file
The power of shell scripting is that if you do not know how to do something one way, you can do it another way. And often there is more to take into account than building a complex solution to a simple problem.
Regarding the claim that gawk is slow and reads the file into memory: I cannot confirm this. To me gawk seems to work on one line at a time and is very fast (not as fast as some of the others, but the time to write and test also counts).
I process MB and even GB of data, and the only limit I have found is line size.
Find and replace with \n allowed in the pattern (GNU sed):
sed -i -z 's/Marker\n/# Marker Comment\nMarker\n/g' myfile.txt
Marker
Becomes
# Marker Comment
Marker
You could use xargs; it will replace \n with a space by default.
However, it would have problems if your input has any case of an unterminated quote, e.g. if the quote signs on a given line don't match.
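Note that xargs does more than swap newlines for spaces: it splits the input into words and re-joins them. A small sketch with made-up input:

```shell
# xargs splits on any run of blanks or newlines and passes the words to echo,
# so repeated spaces collapse along with the newlines.
printf 'a  b\n   c\n' | xargs    # prints: a b c
```

That is fine for lists of simple words, but it changes input where the exact spacing matters.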
On Mac OS X (using FreeBSD sed):
# replace each newline with a space
printf "a\nb\nc\nd\ne\nf" | sed -E -e :a -e '$!N; s/\n/ /g; ta'
printf "a\nb\nc\nd\ne\nf" | sed -E -e :a -e '$!N; s/\n/ /g' -e ta
To remove empty lines:
sed -n "s/^$//;t;p;"
Using Awk:
awk "BEGIN { o=\"\" } { o=o \" \" \$0 } END { print o; }"
A solution I particularly like is to append all the file in the hold space and replace all newlines at the end of file:
$ (echo foo; echo bar) | sed -n 'H;${x;s/\n//g;p;}'
foobar
However, someone told me the hold space can be limited in size in some sed implementations.
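If the lines should be joined with spaces rather than glued together, the same idea needs one adjustment so that the hold space does not start with a newline. A sketch of that variation:

```shell
# 1h    first line: copy it to the hold space (no leading newline)
# 1!H   later lines: append newline + line to the hold space
# $     last line: fetch the hold space, replace the newlines, print
(echo foo; echo bar; echo baz) | sed -n '1h;1!H;${g;s/\n/ /g;p;}'    # prints: foo bar baz
```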
Replace newlines with any string, and replace the last newline too
The pure tr solutions can only replace with a single character, and the pure sed solutions don't replace the last newline of the input. The following solution fixes these problems, and seems to be safe for binary data (even with a UTF-8 locale):
printf '1\n2\n3\n' |
sed 's/%/%p/g;s/@/%a/g' | tr '\n' @ | sed 's/@/<br>/g;s/%a/@/g;s/%p/%/g'
Result:
1<br>2<br>3<br>
It is sed that introduces the new-lines after "normal" substitution. First, it trims the new-line char, then it processes according to your instructions, then it introduces a new-line.
Using sed you can replace "the end" of a line (not the new-line char) after being trimmed, with a string of your choice, for each input line; but, sed will output different lines. For example, suppose you wanted to replace the "end of line" with "===" (more general than a replacing with a single space):
PROMPT~$ cat <<EOF |sed 's/$/===/g'
first line
second line
3rd line
EOF
first line===
second line===
3rd line===
PROMPT~$
To replace the new-line char with the string, you can, inefficiently though, use tr, as pointed out before, to replace the newline chars with a "special char" and then use sed to replace that special char with the string you want.
For example:
PROMPT~$ cat <<EOF | tr '\n' $'\x01'|sed -e 's/\x01/===/g'
first line
second line
3rd line
EOF
first line===second line===3rd line===PROMPT~$
I can't figure out how to tell sed that dot should match a newline:
echo -e "one\ntwo\nthree" | sed 's/one.*two/one/m'
I expect to get:
one
three
instead I get original:
one
two
three
sed is a line-based tool. I don't think there is an option.
You can use h/H(hold), g/G(get).
$ echo -e 'one\ntwo\nthree' | sed -n '1h;1!H;${g;s/one.*two/one/p}'
one
three
Maybe you should try vim
:%s/one\_.*two/one/g
If you use a GNU sed, you may match any character, including line break chars, with a mere ., see :
.
Matches any character, including newline.
All you need to use is a -z option:
echo -e "one\ntwo\nthree" | sed -z 's/one.*two/one/'
# => one
# three
See the online sed demo.
However, one.*two might not be what you need since * is always greedy in POSIX regex patterns. So, one.*two will match the leftmost one, then any 0 or more chars as many as possible, and then the rightmost two. If you need to remove one, then any 0+ chars as few as possible, and then the leftmost two, you will have to use perl:
perl -i -0 -pe 's/one.*?two//sg' file # Non-Unicode version
perl -i -CSD -Mutf8 -0 -pe 's/one.*?two//sg' file # S&R in a UTF8 file
The -0 option sets the input record separator to the NUL character, so a file that contains no NUL bytes is read as a whole and not line-by-line (-0777 is the explicit slurp mode), -i enables in-place file modification, s makes . match any char including line break chars, and .*? matches any 0 or more chars as few as possible due to the non-greedy *?. The -CSD -Mutf8 part makes sure your input is decoded and output re-encoded back correctly.
You can use python this way:
$ echo -e "one\ntwo\nthree" | python3 -c 'import re, sys; s=sys.stdin.read(); s=re.sub("(?s)one.*two", "one", s); print(s, end="")'
one
three
$
This reads all of Python's standard input (sys.stdin.read()), then substitutes "one" for "one.*two" with the dot-matches-all setting enabled (using (?s) at the start of the regular expression), and then prints the modified string (end="" prevents print from adding an extra newline).
This might work for you:
<<<$'one\ntwo\nthree' sed '/two/d'
or
<<<$'one\ntwo\nthree' sed '2d'
or
<<<$'one\ntwo\nthree' sed 'n;d'
or
<<<$'one\ntwo\nthree' sed 'N;N;s/two.//'
Sed does match all characters (including the \n) using a dot . but usually it has already stripped the \n off, as part of the cycle, so it is no longer present in the pattern space to be matched.
Only certain commands (N,H and G) preserve newlines in the pattern/hold space.
N appends a newline to the pattern space and then appends the next line.
H does exactly the same except it acts on the hold space.
G appends a newline to the pattern space and then appends whatever is in the hold space too.
The hold space is empty until you place something in it so:
sed G file
will insert an empty line after each line.
sed 'G;G' file
will insert 2 empty lines etc etc.
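A quick check of the G behaviour described above, with made-up input:

```shell
# The hold space starts empty, so G appends just a newline:
# every input line is followed by one blank line.
printf 'a\nb\n' | sed G

# The reverse operation: drop every empty line again.
printf 'a\n\nb\n\n' | sed '/^$/d'
```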
How about two sed calls:
(get rid of the 'two' first, then get rid of the blank line)
$ echo -e 'one\ntwo\nthree' | sed 's/two//' | sed '/^$/d'
one
three
Actually, I prefer Perl for one-liners over Python:
$ echo -e 'one\ntwo\nthree' | perl -pe 's/two\n//'
one
three
The discussion below is based on GNU sed.
sed operates in a line-by-line manner, so it's not possible to tell it directly to make dot match a newline. However, there are some tricks that achieve this. You can use a loop structure (of a kind) to put all the text in the pattern space, and then do the operation.
To put everything in the pattern space, use:
:a;N;$!ba;
To make "dot match newline" indirectly, you use:
(\n|.)
So the result is:
root@u1804:~# echo -e "one\ntwo\nthree" | sed -r ':a;N;$!ba;s/one(\n|.)*two/one/'
one
three
root@u1804:~#
Note that in this case, (\n|.) matches newline and all characters. See below example:
root@u1804:~# echo -e "oneXXXXXX\nXXXXXXtwo\nthree" | sed -r ':a;N;$!ba;s/one(\n|.)*two/one/'
one
three
root@u1804:~#
I had a file containing CR and CRLF on Windows.
I ran this command on it:
$ sed -i 's \x0d \x0a ' foo
What I got back was that:
All CR that were not followed by LF were converted to LF
But
Those CR that were part of CRLF were left unchanged.
Why is that?
Assuming that you're running this on a Unix platform, using GNU sed:
sed -i 's/\r/\n/g; s/\n$//' foo
This replaces all isolated CR (\r, \x0d) instances as well as CRLF (\r\n, \x0d\x0a) sequences with one LF (\n, \x0a) each - see bottom for an explanation.
As for what you tried (again, assuming that you're running this on a Unix platform, using GNU sed):
sed reads everything up to, but not including, a LF (\n) as a single line, and, on output, terminates that line with LF.
In your case that means that a single line read would end in CR (\r) (due to sed reading up to CRLF, stripping the LF), possibly containing isolated CR instances in that line.
's \x0d \x0a ', due to not using option g, replaces at most 1 CR character with LF.
What that should have resulted in:
The first CR (\r, \x0d) instance on each line should have been replaced with LF (\n, \x0a)
Any additional CR instances on the current line - including one that is part of the line-ending CRLF sequence - would have been left alone.
Why does a correct solution need two s calls?
's/\r/\n/g' globally (g) replaces all CR (\r) instances in the current line with LF \n.
Since the CR that was part of the line-ending CRLF was therefore also replaced with \n, the in-memory line (the pattern space, in sed speak) now ends in \n.
Because sed invariably appends an LF (\n) on output, the extra trailing \n must be removed, which is what 's/\n$//' does.
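The two-step substitution can be tried on a small mixed input. This is a sketch: the function name normalize_eol is made up, and GNU sed is assumed for \r in the pattern and \n in the replacement.

```shell
# normalize_eol: hypothetical helper wrapping the two s commands from above.
normalize_eol() {
    sed 's/\r/\n/g; s/\n$//'
}

# "a" ends in a lone CR, "b" in CRLF, "c" in LF.
# od -c shows which line-ending bytes remain after the conversion.
printf 'a\rb\r\nc\n' | normalize_eol | od -c
```

After the conversion each of a, b and c is terminated by a single LF.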
The reason for this behavior is that, on Unix, a line ending in a lone \r is treated as ONE line together with the text that follows it, up to the next \n:
$ echo -e "line1\rline2\r\nline3" |cat -A
line1^Mline2^M$
line3$
As a result your sed, without g option, will replace the first \r in this "concatenated" line :
$ echo -e "line1\rline2\r\nline3" |sed 's \x0d \x0a ' |cat -A
line1$
line2^M$ #this is same input line as line1 and thus \r is not replaced the second time in the same line without g
line3$
You need to include g for global replacement of \r when it occurs more than once in what sed considers to be one input line:
$ echo -e "line1\rline2\r\nline3\rline4\r\nline5\r\nline6" |cat -A
line1^Mline2^M$ #line2 \r will not be replaced without g
line3^Mline4^M$ #line4 \r will not be replaced without g
line5^M$ # This \r will be replaced since it is unique on input line
line6$
$ echo -e "line1\rline2\r\nline3\rline4\r\nline5\r\nline6" |sed 's \r \n ' |cat -A
line1$
line2^M$
line3$
line4^M$
line5$ #the \r is removed from here even without g , since input line5 was alone
$
line6$
$ echo -e "line1\rline2\r\nline3\rline4\r\nline5\r\nline6" |sed 's \r \n g' |cat -A
line1$
line2$
$
line3$
line4$
$
line5$
$
line6$
Attention:
As is obvious from the tests above, replacing \r with \n turns CRLF into LFLF (\n\n), which generates an extra blank line. This may or may not be desirable. The extra line can be removed as advised in mklement0's answer.
On OSX I need to remove line-ending CR (\r) characters (represented as ^M in the output from cat -v) from my CSV file:
$ cat -v myitems.csv
output:
strPicture,strEmail^M
image1xl.jpg,me@example.com^M
I have tried lots of options with sed and perl but nothing works.
Any ideas?
Solutions with stock utilities:
Note: Except where noted (the sed -i incompatibility), the following solutions work on both OSX (macOS) and Linux.
Use sed as follows, which replaces \r\n with \n:
sed $'s/\r$//' myitems.csv
To update the input file in place, use
sed -i '' $'s/\r$//' myitems.csv
-i '' specifies updating in place, with '' indicating that no backup should be made of the input file; if you specify an extension, e.g., -i'.bak', the original input file will be saved with that extension as a backup.
Caveats:
* With GNU sed (Linux), to not create a backup file, you'd have to use just -i, without the separate '' argument, which is an unfortunate syntactic incompatibility between GNU Sed and the BSD Sed used on OSX (macOS) - see this answer of mine for the full story.
* -i creates a new file with a temporary name and then replaces the original file; the most notable consequence is that if the original file was a symlink, it is replaced with a regular file; for a detailed discussion, see the lower half of this answer.
Note: The above uses an ANSI C-quoted string ($'...') to create the \r character in the sed command, because BSD sed (the one used on OS X), doesn't natively recognize such escape sequences (note that the GNU sed used on Linux distros would).
ANSI C-quoted strings are supported in Bash, Ksh, and Zsh.
If you don't want to rely on such strings, use:
sed 's/'"$(printf '\r')"'$//'
Here, the \r is created via printf and spliced into the sed command with a command substitution ($(...)).
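The same idea wrapped in a reusable form, as a sketch. The names cr and strip_cr are made up for illustration, and the sample data is invented:

```shell
# Produce the CR once with printf; command substitution keeps it because
# only trailing newlines are stripped.
cr=$(printf '\r')

# strip_cr: hypothetical helper that removes one CR at the end of each line.
strip_cr() {
    sed "s/${cr}\$//"
}

printf 'a,b\r\nc,d\r\n' | strip_cr    # prints a,b and c,d with plain LF endings
```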
Using perl:
perl -pe 's/\r\n/\n/' myitems.csv | cat -v
To update the input file in place, use
perl -i -ple 's/\r\n/\n/' myitems.csv # -i'.bak' creates backup with suffix '.bak' first
The same caveat as above for sed with regard to in-place updating applies.
Using awk:
awk '{ sub("\r$", ""); print }' myitems.csv # shorter: awk 'sub("\r$", "")+1'
BSD awk offers no in-place updating option, so you'll have to capture the output in a different file; to use a temporary file and have it replace the original afterward, use the following idiom:
awk '{ sub("\r$", ""); print }' myitems.csv > tmpfile && mv tmpfile myitems.csv
GNU awk v4.1 or higher offers -i inplace for in-place updating, to which the same caveat as above for sed applies.
Edge case for all variants above: If the very last char. in the input file happens to be a lone \r without a following \n, it will also be replaced with a \n.
For the sake of completeness: here are additional, possibly suboptimal solutions:
None of them offer in-place updating, but you can employ the > tmpfile && mv tmpfile myitems.csv idiom introduced above
Using tr: a very simple solution that simply removes all \r instances; thus, it can only be used if \r instances only occur as part of \r\n sequences; typically, however, that is the case:
tr -d '\r' < myitems.csv
Using pure bash code: note that this will be slow; like the tr solution, this can only be used if \r instances only occur as part of \r\n sequences.
while IFS=$'\r' read -r line; do
printf '%s\n' "$line"
done < myitems.csv
$IFS is the internal field separator, and setting it to \r causes read to read everything before \r, if present, into variable $line (if there's no \r, the line is read as is). -r prevents read from interpreting \ instances in the input.
Edge case: If the input doesn't end with \n, the last line will not print - you could fix that by using read -r line || [[ -n $line ]].
try this, it will fix your issue.
dos2unix myitems.csv myitems.csv
Try the unix2dos command.
Example: unix2dos infile outfile
http://en.wikipedia.org/wiki/Unix2dos
The wikipedia page has some examples using perl and sed too.
perl -i -p -e 's/\n/\r\n/' file
sed -i -e 's/$/\r/' file
Inside a Makefile I run a shell command which I want to pass a NULL byte as argument. The following attempt fails:
echo $(shell /bin/echo -n $$'\x00' | ruby -e "puts STDIN.read.inspect")
It generates:
echo "$\\x00"
Instead I expected:
echo "\u0000"
How do I properly escape such a NULL byte?
echo disables interpretation of backslash escapes by default. You need to supply the -e option to enable it.
$ echo -ne "\x00" | ruby -e "puts STDIN.read.inspect"
"\u0000"
Due to the execve(2) semantics it is not possible to pass a string containing a null byte as argument. Each argument string is terminated by null byte, therefore making it impossible to distinguish between the contained null byte and the end of the string.
These uses of echo are totally non-portable. Use printf, it's much easier to use for anything other than the simplest strings, and much more portable.
$ cat makefile
all:
printf '\0' > foo.out
od -a foo.out
$ make
printf '\0' > foo.out
od -a foo.out
0000000 nul
0000001
You can't use NUL as argument in bash
You can't pass $'\0' as an argument, store it in a variable, or capture it with command substitution $(printf '\0'), since bash (and most shells?) use C strings, which are NUL-terminated. The part of the string before the NUL is taken as the string and the rest is discarded.
You can only feed it in through a pipe (printf '\0' | cat -v) or by letting the receiving program read its input from a file.
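A small sketch that shows NUL bytes arriving intact through a pipe. The helper name count_nul is made up for illustration:

```shell
# count_nul: hypothetical helper. Deletes everything except NUL bytes from
# stdin and counts what is left.
count_nul() {
    tr -cd '\000' | wc -c | tr -d ' '
}

printf 'foo\0bar\0' | count_nul           # prints: 2

# For display, translate each NUL into something printable.
printf 'foo\0bar\0' | tr '\000' '\n'      # prints foo and bar on separate lines
```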
Use another means of input
Most programs that work on line-separated strings (xargs, cut, ...) typically have a flag for NUL-separated strings instead, such as -z (or -0 for xargs). This is primarily used when dealing with paths, as a path may contain ANY character EXCEPT NUL.
Programs like find and git ls-files support outputting this format, usually in the form of a -print0 or -0 flag.
Programs like sed, tr, bash et al. use special escape sequences like \0, \x0, \x00 to generate NUL bytes.
Massage the input
OP originally seems to have wanted to know how to use cut with a NUL delimiter. The problem is typically that something is separated using \n, where \n is a valid part of the values and not a line-separator (typically in paths).
Say you have a situation where you group files, each separated by a NUL character, and the groups separated by \n.
# Temporary mock output with NUL columns and newline rows
printf 'foo\0bar\nbar\0\nFOO\0BAR\0\n' > /tmp/$$.output
A work-around is to get creative with a combination of sed, awk or tr to massage the output to something that suits our input/commands.
our.sed
#!/usr/bin/sed -nf
# beginning
:x
# match \0\n
/\x0$/ {
# Change \0\n to \n
s|\x0$|\n|g
# print
p
# delete
d
}
# match \n with no leading \0
/[^\x0]$/ {
# change \0 to \1
s|\x0|\x1|g
# read next line
N
# branch to beginning
bx
}
In this scenario we map:
\0\n => \n
\0 not followed by \n => \1
While \1 is a valid character in a filename, it's unlikely to pose a problem.
# Change NUL to another unlikely to be used control character
sed -f our.sed /tmp/$$.output |\
cut -d $'\x1' -f 2
output
bar
BAR
If anyone else came here looking how to escape a null via a shell command in ruby backticks:
irb(main):024:0> `curl --silent http://some-website-or-stream.com | sed 's/\\x0//g' 1>&2`
=> ""