replacing newlines with the string '\n' with POSIX tools - shell

Yes I know there are a number of questions (e.g. (0) or (1)) which seem to ask the same, but AFAICS none really answers what I want.
What I want is, to replace any occurrence of a newline (LF) with the string \n, with no implicitly assumed newlines... and this with POSIX only utilities (and no GNU extensions or Bashisms) and input read from stdin with no buffering of that is desired.
So for example:
printf 'foo' | magic
should give foo
printf 'foo\n' | magic
should give foo\n
printf 'foo\n\n' | magic
should give foo\n\n
The usually given answers, don't do this, e.g.:
awk
printf 'foo' | awk 1 ORS='\\n gives foo\n, whereas it should give just foo
so adds an \n when there was no newline.
sed
would work for just foo but in all other cases, like:
printf 'foo\n' | sed ':a;N;$!ba;s/\n/\\n/g' gives foo, whereas it should give foo\n
misses one final newline.
Since I do not want any sort of buffering, I cannot just look whether the input ended in an newline and then add the missing one manually.
And anyway... it would use GNU extensions.
sed -z 's/\n/\\n/g'
does work (even retains the NULs correctly), but again, GNU extension.
tr
can only replace with one character, whereas I need two.
The only working solution I'd have so far is with perl:
perl -p -e 's/\n/\\n/'
which works just as desired in all cases, but as I've said, I'd like to have a solution for environments where just the basic POSIX utilities are there (so no Perl or using any GNU extensions).
Thanks in advance.

The following will work with all POSIX versions of the tools being used and with any POSIX text permissible characters as input whether a terminating newline is present or not:
$ magic() { { cat -u; printf '\n'; } | awk -v ORS= '{print sep $0; sep="\\n"}'; }
$ printf 'foo' | magic
foo$
$ printf 'foo\n' | magic
foo\n$
$ printf 'foo\n\n' | magic
foo\n\n$
The function first adds a newline to the incoming piped data to ensure that what awk is reading is a valid POSIX text file (which must end in a newline) so it's guaranteed to work in all POSIX compliant awks and then the awk command discards that terminating newline that we added and replaces all others with "\n" as required.
The only utility above that has to process input without a terminating newline is cat, but POSIX just talks about "files" as input to cat, not "text files" as in the awk and sed specs, and so every POSIX-compliant version of cat can handle input without a terminating newline.

You can (I think) do this with pure POSIX shell. I am assuming you are working with text, not arbitrary binary data that can include null bytes.
magic () {
while read x; do
printf '%s\\n' "$x"
done
printf '%s' "$x"
}
read assumes POSIX text lines (terminated with a newline), but it still populates x with anything it reads until the end of its input when no linefeed is seen. So as long as read succeeds, you have a proper line (minus the linefeed) in x that you can write back, but with a literal \n instead of a linefeed.
Once the loop breaks, output whatever (if anything) in x after the failed read, but without a trailing literal \n.
$ [ "$(printf foo | magic)" = foo ] && echo passed
passed
$ [ "$(printf 'foo\n' | magic)" = 'foo\n' ] && echo passed
passed
$ [ "$(printf 'foo\n\n' | magic)" = 'foo\n\n' ] && echo passed
passed

Here is a tr + sed solution that should work on any POSIX shell as it doesn't call any gnu utility:
printf 'foo' | tr '\n' '\7' | sed 's/\x7/\\n/g'
foo
printf 'foo\n' | tr '\n' '\7' | sed 's/\x7/\\n/g'
foo\n
printf 'foo\n\n' | tr '\n' '\7' | sed 's/\x7/\\n/g'
foo\n\n
Details:
tr command replaces each line break with \x07
sed command replace each \x07 with \\n

Related

Replace one character by the other (and vice-versa) in shell

Say I have strings that look like this:
$ a='/o\\'
$ echo $a
/o\
$ b='\//\\\\/'
$ echo $b
\//\\/
I'd like a shell script (ideally a one-liner) to replace / occurrences by \ and vice-versa.
Suppose the command is called invert, it would yield (in a shell prompt):
$ invert $a
\o/
$ invert $b
/\\//\
For example using sed, it seems unavoidable to use a temporary character, which is not great, like so:
$ echo $a | sed 's#/#%#g' | sed 's#\\#/#g' | sed 's#%#\\#g'
\o/
$ echo $b | sed 's#/#%#g' | sed 's#\\#/#g' | sed 's#%#\\#g'
/\\//\
For some context, this is useful for proper printing of git log --graph --all | tac (I like to see newer commits at the bottom).
tr is your friend:
% echo 'abc' | tr ab ba
bac
% echo '/o\' | tr '\\/' '/\\'
\o/
(escaping the backslashes in the output might require a separate step)
I think this can be done with (g)awk:
$ echo a/\\b\\/c | gawk -F "/" 'BEGIN{ OFS="\\" } { for(i=1;i<=NF;i++) gsub(/\\/,"/",$i); print $0; }'
a\/b/\c
$ echo a\\/b/\\c | gawk -F "/" 'BEGIN{ OFS="\\" } { for(i=1;i<=NF;i++) gsub(/\\/,"/",$i); print $0; }'
a/\b\/c
$
-F "/" This defines the separator, The input will be split in "/", and should no longer contain a "/" character.
for(i=1;i<=NF;i++) gsub(/\\/,"/",$i);. This will replace, in all items in the input, the backslash (\) for a slash (/).
If you want to replace every instance of / with \, you can uses the y command of sed, which is quite similar to what tr does:
$ a='/o\'
$ echo "$a"
/o\
$ echo "$a" | sed 'y|/\\|\\/|'
\o/
$ b='\//\\/'
$ echo "$b"
\//\\/
$ echo "$b" | sed 'y|/\\|\\/|'
/\\//\
If you are strictly limited to GNU AWK you might get desired result following way, let file.txt content be
\//\\\\/
then
awk 'BEGIN{FPAT=".";OFS="";arr["/"]="\\";arr["\\"]="/"}{for(i=1;i<=NF;i+=1){if($i in arr){$i=arr[$i]}};print}' file.txt
gives output
/\\////\
Explanation: I inform GNU AWK that field is any single character using FPAT built-in variable and that output field separator (OFS) is empty string and create array where key-value pair represent charactertobereplace-replacement, \ needs to be escaped hence \\ denote literal \. Then for each line I iterate overall all fields using for loop and if given field hold character present in array arr keys I do exchange it for corresponding value, after loop I print line.
(tested in gawk 4.2.1)

shell script for reading file and replacing new file with | symbol

i have txt file like below.
abc
def
ghi
123
456
789
expected output is
abc|def|ghi
123|456|789
I want replace new line with pipe symbol (|). i want to use in egrep.After empty line it should start other new line.
you can try with awk
awk -v RS= -v OFS="|" '{$1=$1}1' file
you get,
abc|def|ghi
123|456|789
Explanation
Set RS to a null/blank value to get awk to operate on sequences of blank lines.
From the POSIX specification for awk:
RS
The first character of the string value of RS shall be the input record separator; a by default. If RS contains more than one character, the results are unspecified. If RS is null, then records are separated by sequences consisting of a plus one or more blank lines, leading or trailing blank lines shall not result in empty records at the beginning or end of the input, and a shall always be a field separator, no matter what the value of FS is.
$1==$1 re-formatting output with OFS as separator, 1 is true for always print.
Here's one using GNU sed:
cat file | sed ':a; N; $!ba; s/\n/|/g; s/||/\n/g'
If you're using BSD sed (the flavor packaged with Mac OS X), you will need to pass in each expression separately, and use a literal newline instead of \n (more info):
cat file | sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/|/g' -e 's/||/\
/g'
If file is:
abc
def
ghi
123
456
789
You get:
abc|def|ghi
123|456|789
This replaces each newline with a | (credit to this answer), and then || (i.e. what was a pair of newlines in the original input) with a newline.
The caveat here is that | can't appear at the beginning or end of a line in your input; otherwise, the second sed will add newlines in the wrong places. To work around that, you can use another character that won't be in your input as an intermediate value, and then replace singletons of that character with | and pairs with \n.
EDIT
Here's an example that implements the workaround above, using the NUL character \x00 (which should be highly unlikely to appear in your input) as the intermediate character:
cat file | sed ':a;N;$!ba; s/\n/\x00/g; s/\x00\x00/\n/g; s/\x00/|/g'
Explanation:
:a;N;$!ba; puts the entire file in the pattern space, including newlines
s/\n/\x00/g; replaces all newlines with the NUL character
s/\x00\x00/\n/g; replaces all pairs of NULs with a newline
s/\x00/|/g replaces the remaining singletons of NULs with a |
BSD version:
sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/\x00/g' -e 's/\x00\x00/\
/g' -e 's/\x00/|/g'
EDIT 2
For a more direct approach (GNU sed only), provided by #ClaudiuGeorgiu:
sed -z 's/\([^\n]\)\n\([^\n]\)/\1|\2/g; s/\n\n/\n/g'
Explanation:
-z uses NUL characters as line-endings (so newlines are not given special treatment and can be matched in the regular expression)
s/\([^\n]\)\n\([^\n]\)/\1|\2/g; replaces every 3-character sequence of <non-newline><newline><non-newline> with <non-newline>|<non-newline>
s/\n\n/\n/g replaces all pairs of newlines with a single newline
In native bash:
#!/usr/bin/env bash
curr=
while IFS= read -r line; do
if [[ $line ]]; then
curr+="|$line"
else
printf '%s\n' "${curr#|}"
curr=
fi
done
[[ $curr ]] && printf '%s\n' "${curr#|}"
Tested:
$ f() { local curr= line; while IFS= read -r line; do if [[ $line ]]; then curr+="|$line"; else printf '%s\n' "${curr#|}"; curr=; fi; done; [[ $curr ]] && printf '%s\n' "${curr#|}"; }
$ f < <(printf '%s\n' 'abc' 'def' 'ghi' '' 123 456 789)
abc|def|ghi
123|456|789
Use rs. For example:
rs -C'|' 2 3 < file
rs = reshape data array. Here I'm specifying that I want 2 rows, 3 columns, and the output separator to be pipe.

How do I avoid the usage of the "for" loop in this bash function?

I am creating this function to make multiple grep's over every line of a file. I run it as following:
cat file.txt | agrep string1 string2 ... stringN
function agrep () {
for a in $#; do
cmd+=" | grep '$a'";
done ;
while read line ; do
eval "echo "\'"$line"\'" $cmd";
done;
}
The idea is to print every line that contains all the strings: string1, string2, ..., stringN. This already works but I want to avoid the usage of the for to construct the expression:
| grep string1 | grep string2 ... | stringN
And if it's possible, also the usage of eval. I tried to make some expansion as follows:
echo "| grep $"{1..3}
And I get:
| grep $1 | grep $2 | grep $3
This is almost what I want but the problem is that when I try:
echo "| grep $"{1..$#}
The expansion doesn't occur because bash cant expand {1..$#} due to the $#. It just works with numbers. I would like to construct some expansion that works in order to avoid the usage of the for in the agrep function.
agrep () {
if [ $# = 0 ]; then
cat
else
pattern="$1"
shift
grep -e "$pattern" | agrep "$#"
fi
}
Instead of running each multiple greps on each line, just get all the lines that match string1, then pipe that to grep for string2, etc. One way to do this is make agrep recursive.
agrep () {
if (( $# == 0 )); then
cat # With no arguments, just output everything
else
grep "$1" | agrep "${#:2}"
fi
}
It's not the most efficient solution, but it's simple.
(Be sure to note Rob Mayoff's answer, which is the POSIX-compliant version of this.)
awk to the rescue!
you can avoid multiple grep calls and constructing the command by switching to awk
awk -v pat='string1 string2 string3' 'BEGIN{n=split(pat,p)}
{for(i=1;i<=n;i++) if($0!~p[i]) next}1 ' file
enter your space delimited strings as in the example above.
Not building a string for the command is definitely better (see chepner's and Rob Mayoff's answers). However, just as an example, you can avoid the for by using printf:
agrep () {
cmd=$(printf ' | grep %q' "$#")
sh -c "cat $cmd"
}
Using printf also helps somewhat with special characters in the patterns. From help printf:
In addition to the standard format specifications described in printf(1),
printf interprets:
%b expand backslash escape sequences in the corresponding argument
%q quote the argument in a way that can be reused as shell input
%(fmt)T output the date-time string resulting from using FMT as a format
string for strftime(3)
Since the aim of %q is providing output suitable for shell input, this should be safe.
Also: You almost always want to use "$#" with the quotes, not just plain $#.

bash script command output execution doesn't assign full output when using backticks

I used many times [``] to capture output of command to a variable. but with following code i am not getting right output.
#!/bin/bash
export XLINE='($ZWP_SCRIP_NAME),$ZWP_LT_RSI_TRIGGER)R),$ZWP_RTIMER'
echo 'Original XLINE'
echo $XLINE
echo '------------------'
echo 'Extract all word with $ZWP'
#works fine
echo $XLINE | sed -e 's/\$/\n/g' | sed -e 's/.*\(ZWP[_A-Z]*\).*/\1/g' | grep ZWP
echo '------------------'
echo 'Assign all word with $ZWP to XVAR'
#XVAR doesn't get all the values
export XVAR=`echo $XLINE | sed -e 's/\$/\n/g' | sed -e 's/.*\(ZWP[_A-Z]*\).*/\1/g' | grep ZWP` #fails
echo "$XVAR"
and i get:
Original XLINE
($ZWP_SCRIP_NAME),$ZWP_LT_RSI_TRIGGER)R),$ZWP_RTIMER
------------------
Extract all word with $ZWP
ZWP_SCRIP_NAME
ZWP_LT_RSI_TRIGGER
ZWP_RTIMER
------------------
Assign all word with $ZWP to XVAR
ZWP_RTIMER
why XVAR doesn't get all the values?
however if i use $() to capture the out instead of ``, it works fine. but why `` is not working?
Having GNU grep you can use this command:
XVAR=$(grep -oP '\$\KZWP[A-Z_]+' <<< "$XLINE")
If you pass -P grep is using Perl compatible regular expressions. The key here is the \K escape sequence. Basically the regex matches $ZWP followed by one or more uppercase characters or underscores. The \K after the $ removes the $ itself from the match, while its presence is still required to match the whole pattern. Call it poor man's lookbehind if you want, I like it! :)
Btw, grep -o outputs every match on a single line instead of just printing the lines which match the pattern.
If you don't have GNU grep or you care about portability you can use awk, like this:
XVAR=$(awk -F'$' '{sub(/[^A-Z_].*/, "", $2); print $2}' RS=',' <<< "$XLINE")
First, the smallest change that makes your code "work":
echo "$XLINE" | tr '$' '\n' | sed -e 's/.*\(ZWP[_A-Z]*\).*/\1/g' | grep ZWP_
The use of tr replaces a sed expression that didn't actually do what you thought it did -- try looking at its output to see.
One sane alternative would be to rely on GNU grep's -o option. If you can't do that...
zwpvars=( ) # create a shell array
zwp_assignment_re='[$](ZWP_[[:alnum:]_]+)(.*)' # ...and a regex
content="$XLINE"
while [[ $content =~ $zwp_assignment_re ]]; do
zwpvars+=( "${BASH_REMATCH[1]}" ) # found a reference
content=${BASH_REMATCH[2]} # stuff the remaining content aside
done
printf 'Found variable: %s\n' "${zwpvars[#]}"

Bash: Strip trailing linebreak from output

When I execute commands in Bash (or to be specific, wc -l < log.txt), the output contains a linebreak after it. How do I get rid of it?
If your expected output is a single line, you can simply remove all newline characters from the output. It would not be uncommon to pipe to the tr utility, or to Perl if preferred:
wc -l < log.txt | tr -d '\n'
wc -l < log.txt | perl -pe 'chomp'
You can also use command substitution to remove the trailing newline:
echo -n "$(wc -l < log.txt)"
printf "%s" "$(wc -l < log.txt)"
If your expected output may contain multiple lines, you have another decision to make:
If you want to remove MULTIPLE newline characters from the end of the file, again use cmd substitution:
printf "%s" "$(< log.txt)"
If you want to strictly remove THE LAST newline character from a file, use Perl:
perl -pe 'chomp if eof' log.txt
Note that if you are certain you have a trailing newline character you want to remove, you can use head from GNU coreutils to select everything except the last byte. This should be quite quick:
head -c -1 log.txt
Also, for completeness, you can quickly check where your newline (or other special) characters are in your file using cat and the 'show-all' flag -A. The dollar sign character will indicate the end of each line:
cat -A log.txt
One way:
wc -l < log.txt | xargs echo -n
If you want to remove only the last newline, pipe through:
sed -z '$ s/\n$//'
sed won't add a \0 to then end of the stream if the delimiter is set to NUL via -z, whereas to create a POSIX text file (defined to end in a \n), it will always output a final \n without -z.
Eg:
$ { echo foo; echo bar; } | sed -z '$ s/\n$//'; echo tender
foo
bartender
And to prove no NUL added:
$ { echo foo; echo bar; } | sed -z '$ s/\n$//' | xxd
00000000: 666f 6f0a 6261 72 foo.bar
To remove multiple trailing newlines, pipe through:
sed -Ez '$ s/\n+$//'
There is also direct support for white space removal in Bash variable substitution:
testvar=$(wc -l < log.txt)
trailing_space_removed=${testvar%%[[:space:]]}
leading_space_removed=${testvar##[[:space:]]}
If you want to print output of anything in Bash without end of line, you echo it with the -n switch.
If you have it in a variable already, then echo it with the trailing newline cropped:
$ testvar=$(wc -l < log.txt)
$ echo -n $testvar
Or you can do it in one line, instead:
$ echo -n $(wc -l < log.txt)
If you assign its output to a variable, bash automatically strips whitespace:
linecount=`wc -l < log.txt`
printf already crops the trailing newline for you:
$ printf '%s' $(wc -l < log.txt)
Detail:
printf will print your content in place of the %s string place holder.
If you do not tell it to print a newline (%s\n), it won't.
Adding this for my reference more than anything else ^_^
You can also strip a new line from the output using the bash expansion magic
VAR=$'helloworld\n'
CLEANED="${VAR%$'\n'}"
echo "${CLEANED}"
Using Awk:
awk -v ORS="" '1' log.txt
Explanation:
-v assignment for ORS
ORS - output record separator set to blank. This will replace new line (Input record separator) with ""

Resources