How does xargs format the input of $'\n'? - bash

Problem
(1) Given a string, I replace spaces with $'\n' using sed:
echo "one two" | sed 's/ /$'"'"'\\n'"'"'/g'
This outputs:
# one$'\n'two
(2) Note that echoing this output of (1):
echo one$'\n'two
results in:
# one
# two
(3) I echo the output of (1) in another way, by piping the output of (1) into xargs echo:
echo "one two" | sed 's/ /$'"'"'\\n'"'"'/g' | xargs echo
But I don't get the same output as (2):
# one$\ntwo
Question
What does xargs do when formatting the input of a string containing $'\n'?
Why is echoing a string with $'\n' not the same as using xargs echo on the same string?

When you write
echo one$'\n'two
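at the command line, bash replaces the "$'\n'" with a newline before echo runs. When the same text reaches xargs as data, no such replacement can happen. xargs does its own, simpler quote removal (so the single quotes around \n disappear), but it knows nothing about bash's $'...' syntax, which is why you get one$\ntwo.
But piping real newlines to xargs will still not do what you want, since by default xargs treats newlines (and blanks) as argument separators: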
$ echo "one two" | tr ' ' '\n' | xargs echo
one two
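You must tell xargs to use a different separator, even if it is a bogus one. With -0 it splits only on NUL bytes; the input contains none, so the whole text, newlines included, becomes a single argument: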
$ echo "one two" | tr ' ' '\n' | xargs -0 echo
one
two
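To see the quote handling described above in isolation, the sketch below feeds a few lines to xargs and prints each resulting argument in brackets. It assumes an xargs whose default mode honours quotes, as POSIX and GNU xargs do; the helper name show_args and the bracketed format are only for display.

```shell
#!/bin/bash
# Without -0 or -d, xargs splits its input on blanks and newlines and
# removes single and double quotes itself. It has no notion of bash's
# $'...' ANSI-C quoting, so only the plain '...' part is unquoted.
show_args() { xargs printf '[%s]\n'; }

printf '%s\n' "one two" | show_args      # two arguments: prints [one] then [two]
printf '%s\n' "'one two'" | show_args    # one argument:  prints [one two]
printf '%s\n' "one\$'x'two" | show_args  # one argument:  prints [one$xtwo]
```

The last line mirrors the question: the $ stays, the quotes go, and nothing turns into a newline.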

Unsure if this answers your question, but a trick I've used in the past for similar cases is to use printf instead, which reuses its format string for the remaining arguments (when there are more arguments than % conversions), e.g.:
$ printf "%s\n" one two
one
two
If the words are all in a single string, let the shell's own word splitting separate them (note that $args is unquoted):
$ args="one two"
$ printf "%s\n" $args
one
two
Just for completeness, you can also feed the lines to xargs -n1 and run a small sh scriptlet once per item:
$ printf "%s\n" one two |xargs -n1 sh -c 'echo [$(date -R)] foo=$1' --
[Sun, 03 Jun 2018 21:34:17 -0300] foo=one
[Sun, 03 Jun 2018 21:34:17 -0300] foo=two
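For the original goal of turning spaces into newlines, one option is to let bash produce the newline itself, since $'\n' is only special in bash syntax and not in data that xargs reads. A minimal sketch using bash pattern substitution; the variable names s and nl are illustrative.

```shell
#!/bin/bash
s="one two"
nl=$'\n'                      # bash expands $'\n' here, while parsing the script
printf '%s\n' "${s// /$nl}"   # replace every space with a real newline
```

This prints one and two on separate lines without any external command.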

Related

How to remove invisible chars in bash (tr and sed not working)

I want to remove invisible chars from a response:
Here is my code:
test_id=`clasp run testRunner`
echo "visible"
echo "$test_id"
echo "invisible"
echo "$test_id" | cat -v
echo "invisible2"
echo "$test_id" | tr -dc '[:print:]' | cat -v
echo "invisible3"
echo "$test_id" | sed 's/[^a-zA-Z0-9]//g' | cat -v
echo "invisible4"
printf '%q\n' "$test_id"
Here's the output:
visible
1d5422fb
invisible
^[[2K^[[1G1d5422fb
invisible2
[2K[1G1d5422fbinvisible3
2K1G1d5422fb
invisible4
$'\E[2K\E[1G1d5422fb'
The following code works with your example:
shopt -s extglob
test_id=$'\e[2K\e[1G1d5422fb'
test_id="${test_id//$'\e['*([^a-zA-Z])[a-zA-Z]}"
echo "$test_id" | cat -v
The crucial part is the third line, which applies a string substitution to the expanded variable. It matches (and removes) all occurrences of the pattern
$'\e[' - a single Esc character followed by [
*( ... ) - (this is what extglob is needed for) zero or more occurrences of ...
[^a-zA-Z] - a single non-alphabetic character
[a-zA-Z] - a single alphabetic character
In your example this gets rid of the two escape sequences \e[2K (erase line) and \e[1G (move cursor to column 1).
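If the text arrives on a pipe instead of in a variable, a sed filter along the same lines can be used. This is a narrower sketch than the extglob version: it assumes the sequences are CSI sequences whose parameters consist only of digits, ; or ?, and it builds the Esc character with printf because not every sed understands \e or \x1b. The function name strip_csi is hypothetical.

```shell
#!/bin/bash
esc=$(printf '\033')   # a literal Esc character

# strip_csi: remove "Esc [ params letter" sequences such as \e[2K and \e[1G
# from standard input.
strip_csi() {
  sed "s/${esc}\[[0-9;?]*[a-zA-Z]//g"
}

printf '\033[2K\033[1G1d5422fb\n' | strip_csi | cat -v   # 1d5422fb
```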
Instead of removing the escape sequences, prevent them from being generated, which I guess you can do with
test_id=$(TERM=dumb clasp run testRunner)
echo "solution"
echo "$test_id" | perl -pe 's/\e([^\[\]]|\[.*?[a-zA-Z]|\].*?\a)//g' | cat -v
as per @Dave's edit to his own question.

Get exact output of a shell command

The bash manual says regarding command substitution:
Bash performs the expansion by executing command and replacing the command substitution with the standard output of the command, with any trailing newlines deleted.
Demonstration - 3 characters, newlines first:
$ output="$(printf "\n\nx")"; echo -n "$output" | wc -c
3
Here the newlines are not at the end, and do not get removed, so the count is 3.
Demonstration - 3 characters, newlines last:
$ output="$(printf "x\n\n")"; echo -n "$output" | wc -c
1
Here the newlines are removed from the end, so the count is 1.
TL;DR
What is a robust work-around to get the binary-clean output of a command into a variable?
Bonus points for Bourne shell compatibility.
The only way to do it in a "Bourne compatible" way is to use external utilities.
Besides writing one in C, you can use xxd and expr (for example):
$ output="$(printf "x\n\n"; printf "X")" # get the output ending in "X".
$ printf '%s' "${output}" | xxd -p # transform the string to hex.
780a0a58
$ hexstr="$(printf '%s' "${output}" | xxd -p)" # capture the hex
$ expr "$hexstr" : '\(.*\)..' # remove the last two hex ("X").
780a0a
$ hexstr="$(expr "$hexstr" : '\(.*\)..')" # capture the shorter str.
$ printf '%s' "$hexstr" | xxd -p -r | wc -c # convert back to binary.
3
Shortened:
$ output="$(printf "x\n\n"; printf "X")"
$ hexstr="$(printf '%s' "${output}" | xxd -p )"
$ expr "$hexstr" : '\(.*\)..' | xxd -p -r | wc -c
3
The command xxd is being used for its ability to convert back to binary.
Note that wc -c is misleading with multibyte (e.g. UTF-8) characters:
$ printf "Voilà" | wc -c
6
$ printf "★" | wc -c
3
It will print the count of bytes, not characters.
The length of a variable ${#var} will also fail in older shells.
Of course, to get this to run in a Bourne shell you must use `…` instead of $(…).
In bash, the ${parameter%word} form of Shell Parameter Expansion can be used:
$ output="$(printf "x\n\n"; echo X)"; echo -n "${output%X}" | wc -c
3
This substitution is also specified by POSIX.1-2008.
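The sentinel trick loses the command's exit status, because the last command in the substitution is now the one printing X. A POSIX sh sketch (it needs $(...) and ${#var}, so not the original Bourne shell) that keeps both the exact bytes and the status; gen is a hypothetical stand-in for the real command.

```shell
#!/bin/sh
# gen: hypothetical command whose output ends in newlines and which fails.
gen() { printf 'x\n\n'; return 3; }

# Append a sentinel X after the real output, then re-exit with gen's status
# so that $? after the assignment reflects gen rather than printf.
out=$(gen; rc=$?; printf X; exit "$rc")
rc=$?
out=${out%X}   # strip the sentinel; the trailing newlines survive

echo "bytes: ${#out}, status: $rc"   # bytes: 3, status: 3
```

This works because the exit status of an assignment-only command is that of its last command substitution.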

Bash: Filter directory when piping from `ls` to `tee`

(background info)
Writing my first bash pseudo-program. The program downloads a bunch of files from the network, stores them in a sub-directory called ./network-files/, then removes all the files it downloaded. It also logs the result to several log files in ./logs/.
I want to log the filenames of each file deleted.
Currently, I'm doing this:
echo -e "$(date -u) >>> Removing files: $(ls -1 "$base_directory"/network-files/* | tr '\n' ' ')" | tee -a $network_files_log $verbose_log $network_log
($base_directory is a variable defining the base directory for the app, $network_files_log etc are variables defining the location of various log files)
This produces some pretty grody and unreadable output:
Tue Jun 21 04:55:46 UTC 2016 >>> Removing files: /home/vagrant/load-simulator/network-files/207822218.png /home/vagrant/load-simulator/network-files/217311040.png /home/vagrant/load-simulator/network-files/442119100.png /home/vagrant/load-simulator/network-files/464324101.png /home/vagrant/load-simulator/network-files/525787337.png /home/vagrant/load-simulator/network-files/581100197.png /home/vagrant/load-simulator/network-files/640387393.png /home/vagrant/load-simulator/network-files/650797708.png /home/vagrant/load-simulator/network-files/827538696.png /home/vagrant/load-simulator/network-files/833069509.png /home/vagrant/load-simulator/network-files/8580204.png /home/vagrant/load-simulator/network-files/858174053.png /home/vagrant/load-simulator/network-files/998266826.png
Any good way to strip out the /home/vagrant/load-simulator/network-files/ part from each of those file paths? I suspect there's something I should be doing with sed or grep, but haven't had any luck so far.
You might also consider using find. It's perfect for walking directories, removing files and using a customized printf for output (-printf is a GNU find extension):
find "$PWD/x" -type f -printf '%f\n' -delete >> "$YourLogFile.log"
Don't use ls at all; use a glob to populate an array with the desired files. You can then use parameter expansion to shorten each array element.
d=$base_directory/network-files
files=( "$d"/* )
printf '%s >>> Removing files: %s\n' "$(date -u)" "${files[*]#"$d"/}" | tee ...
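The same idea also works without bash arrays. Below is a sketch for POSIX sh that uses the positional parameters instead. It creates a throwaway directory with mktemp -d (available on Linux and the BSDs) purely for the demonstration; in the real script $d would point at the existing network-files directory.

```shell
#!/bin/sh
# Demo setup (hypothetical): a temporary directory with two files.
base_directory=$(mktemp -d)
d=$base_directory/network-files
mkdir -p "$d"
touch "$d/207822218.png" "$d/217311040.png"

set -- "$d"/*          # the glob expands to full paths, sorted
names=
for f in "$@"; do
  names="$names ${f##*/}"   # ${f##*/} keeps only the part after the last /
done
printf '%s >>> Removing files:%s\n' "$(date -u)" "$names"

rm -r "$base_directory"   # clean up the demo directory
```

If the directory is empty, the unmatched glob stays as a literal *, so a real script may want to check for that before logging.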
You could do it a couple of ways. To directly answer the question, you could use sed to do it with the substitution command like:
echo -e "$(date -u) >>> Removing files: $(ls -1 "$base_directory"/network-files/* | tr '\n' ' ')" | sed -e "s,$base_directory/network-files/,,g" | tee -a $network_files_log $verbose_log $network_log
which adds sed -e "s,$base_directory/network-files/,,g" to the pipeline. It replaces every occurrence of $base_directory/network-files/ with the empty string (the g flag is needed because all the paths are on one line), as long as base_directory contains no commas. If it does, you could use a different separator in the sed command, such as underscore: sed -e "s_$base_directory/network-files/__g"
Instead though, you could just have the subshell cd to that directory and then the string wouldn't be there in the first place:
echo -e "$(date -u) >>> Removing files: $(cd "$base_directory/network-files/"; ls -1 | tr '\n' ' ')" | tee -a "$network_files_log" "$verbose_log" "$network_log"
Or you could avoid some potential pitfalls with echo and use printf like
{ printf '%s >>> Removing files: ' "$(date -u)"; (cd "$base_directory/network-files" && printf '%s ' *); printf '\n'; } | tee -a ...
testdata="/home/vagrant/load-simulator/network-files/207822218.png /home/vagrant/load-simulator/network-files/217311040.png"
echo "$testdata" | sed -e 's/\/[^ ]*\///g'
Pipe your output to sed and replace every match of the regex with nothing.
The regex: \/[^ ]*\/
It starts at a /, then matches every character that is not a space, up to the last / before the next space, so each directory prefix is removed and only the file name is left.

Bash: Strip trailing linebreak from output

When I execute commands in Bash (or to be specific, wc -l < log.txt), the output contains a linebreak after it. How do I get rid of it?
If your expected output is a single line, you can simply remove all newline characters from the output. It would not be uncommon to pipe to the tr utility, or to Perl if preferred:
wc -l < log.txt | tr -d '\n'
wc -l < log.txt | perl -pe 'chomp'
You can also use command substitution to remove the trailing newline:
echo -n "$(wc -l < log.txt)"
printf "%s" "$(wc -l < log.txt)"
If your expected output may contain multiple lines, you have another decision to make:
If you want to remove MULTIPLE newline characters from the end of the file, again use cmd substitution:
printf "%s" "$(< log.txt)"
If you want to strictly remove THE LAST newline character from a file, use Perl:
perl -pe 'chomp if eof' log.txt
Note that if you are certain you have a trailing newline character you want to remove, you can use head from GNU coreutils to select everything except the last byte. This should be quite quick:
head -c -1 log.txt
Also, for completeness, you can quickly check where your newline (or other special) characters are in your file using cat and the 'show-all' flag -A. The dollar sign character will indicate the end of each line:
cat -A log.txt
One way:
wc -l < log.txt | xargs echo -n
If you want to remove only the last newline, pipe through:
sed -z '$ s/\n$//'
sed won't add a \0 to the end of the stream when the delimiter is set to NUL via -z. Without -z it always outputs a final \n, so that the result is a POSIX text file (defined as ending in a \n).
Eg:
$ { echo foo; echo bar; } | sed -z '$ s/\n$//'; echo tender
foo
bartender
And to prove no NUL added:
$ { echo foo; echo bar; } | sed -z '$ s/\n$//' | xxd
00000000: 666f 6f0a 6261 72 foo.bar
To remove multiple trailing newlines, pipe through:
sed -Ez '$ s/\n+$//'
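Where neither GNU sed's -z nor GNU head's negative -c count is available, a POSIX awk sketch can remove only the final newline by printing each line one step late. The function name strip_last_nl is made up for this example.

```shell
#!/bin/sh
# strip_last_nl: copy stdin to stdout, dropping only the final newline.
# Each line is printed (with its newline) when the next one is read;
# the last line is printed in END without a newline.
strip_last_nl() {
  awk 'NR > 1 { print prev } { prev = $0 } END { printf "%s", prev }'
}

{ echo foo; echo bar; } | strip_last_nl; echo tender   # foo, then bartender
```

Input that already lacks a trailing newline passes through unchanged, and input ending in two newlines keeps one of them.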
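There is also direct support for white space removal in Bash variable substitution. With extglob enabled, +([[:space:]]) matches a whole run of whitespace rather than a single character:
shopt -s extglob
testvar=$(wc -l < log.txt)
trailing_space_removed=${testvar%%+([[:space:]])}
leading_space_removed=${testvar##+([[:space:]])}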
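In plain POSIX sh (no bash extglob), all leading and trailing whitespace can still be removed by first computing the whitespace run with a nested expansion and then stripping exactly that. A sketch; the function name trim is hypothetical.

```shell
#!/bin/sh
# trim: print $1 without leading and trailing whitespace.
trim() {
  s=$1
  # ${s%%[![:space:]]*} is the leading whitespace; strip it from the front.
  s=${s#"${s%%[![:space:]]*}"}
  # ${s##*[![:space:]]} is the trailing whitespace; strip it from the end.
  s=${s%"${s##*[![:space:]]}"}
  printf '%s\n' "$s"
}

trim "      42  "   # 42
```

The inner expansions are quoted so that their result is removed literally rather than treated as a pattern. A string of only whitespace comes out empty.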
If you want to print output of anything in Bash without end of line, you echo it with the -n switch.
If you have it in a variable already, then echo it with the trailing newline cropped:
$ testvar=$(wc -l < log.txt)
$ echo -n $testvar
Or you can do it in one line, instead:
$ echo -n $(wc -l < log.txt)
If you assign its output to a variable, bash automatically strips trailing newlines:
linecount=`wc -l < log.txt`
printf does not add a trailing newline unless you ask for one (the command substitution has already removed wc's own newline):
$ printf '%s' $(wc -l < log.txt)
Detail:
printf will print your content in place of the %s string place holder.
If you do not tell it to print a newline (%s\n), it won't.
Adding this for my reference more than anything else ^_^
You can also strip a new line from the output using the bash expansion magic
VAR=$'helloworld\n'
CLEANED="${VAR%$'\n'}"
echo "${CLEANED}"
Using Awk:
awk -v ORS="" '1' log.txt
Explanation:
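-v assigns a value to a variable (here ORS) before the program runs
ORS - output record separator, set to the empty string. awk normally ends each output record with a newline; with ORS empty, every line is printed without one. Note that this removes all newlines, so multi-line input is joined into one line.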

How to split a string in shell and get the last field

Suppose I have the string 1:2:3:4:5 and I want to get its last field (5 in this case). How do I do that using Bash? I tried cut, but I don't know how to specify the last field with -f.
You can use string operators:
$ foo=1:2:3:4:5
$ echo ${foo##*:}
5
This trims everything from the front until a ':', greedily.
${foo <-- from variable foo
## <-- greedy front trim
* <-- matches anything
: <-- until the last ':'
}
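The same operator family covers the other end of the string as well: # trims from the front, % trims from the back, and the doubled forms are greedy. These are POSIX parameter expansions, so they also work in plain sh.

```shell
#!/bin/sh
foo=1:2:3:4:5
echo "${foo##*:}"   # 5        last field
echo "${foo%%:*}"   # 1        first field
echo "${foo%:*}"    # 1:2:3:4  everything but the last field
echo "${foo#*:}"    # 2:3:4:5  everything but the first field
```

If the string contains no ':' at all, ${foo##*:} returns it unchanged.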
Another way is to reverse before and after cut:
$ echo ab:cd:ef | rev | cut -d: -f1 | rev
ef
This makes it very easy to get the last but one field, or any range of fields numbered from the end.
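For example, with the same string, the second field from the end and the last two fields can be selected like this. rev is not POSIX, but it ships with util-linux and the BSDs.

```shell
#!/bin/sh
s=ab:cd:ef
echo "$s" | rev | cut -d: -f2 | rev     # cd     second field from the end
echo "$s" | rev | cut -d: -f1-2 | rev   # cd:ef  last two fields
```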
It's difficult to get the last field using cut, but here are some solutions in awk and perl
echo 1:2:3:4:5 | awk -F: '{print $NF}'
echo 1:2:3:4:5 | perl -F: -wane 'print $F[-1]'
Assuming fairly simple usage (no escaping of the delimiter, for example), you can use grep:
$ echo "1:2:3:4:5" | grep -oE "[^:]+$"
5
Breakdown - find all the characters not the delimiter ([^:]) at the end of the line ($). -o only prints the matching part.
If you want to use cut and already know how many fields there are (five here), you can select the last one directly:
echo "1:2:3:4:5" | cut -d ":" -f5
You can also use grep like this:
echo "1:2:3:4:5" | grep -o '[^:]*$'
One way:
var1="1:2:3:4:5"
var2=${var1##*:}
Another, using an array:
var1="1:2:3:4:5"
saveIFS=$IFS
IFS=":"
var2=($var1)
IFS=$saveIFS
var2=${var2[@]: -1}
Yet another with an array:
var1="1:2:3:4:5"
saveIFS=$IFS
IFS=":"
var2=($var1)
IFS=$saveIFS
count=${#var2[@]}
var2=${var2[$count-1]}
Using Bash (version >= 3.2) regular expressions:
var1="1:2:3:4:5"
[[ $var1 =~ :([^:]*)$ ]]
var2=${BASH_REMATCH[1]}
$ echo "a b c d e" | tr ' ' '\n' | tail -1
e
Simply translate the delimiter into a newline and choose the last entry with tail -1.
Using sed:
$ echo '1:2:3:4:5' | sed 's/.*://' # => 5
$ echo '' | sed 's/.*://' # => (empty)
$ echo ':' | sed 's/.*://' # => (empty)
$ echo ':b' | sed 's/.*://' # => b
$ echo '::c' | sed 's/.*://' # => c
$ echo 'a' | sed 's/.*://' # => a
$ echo 'a:' | sed 's/.*://' # => (empty)
$ echo 'a:b' | sed 's/.*://' # => b
$ echo 'a::c' | sed 's/.*://' # => c
There are many good answers here, but I still want to share this one using basename:
basename "$(echo "a:b:c:d:e" | tr ':' '/')"
However it will fail if there are already some '/' in your string.
If slash / is your delimiter then you just have to (and should) use basename.
It's not the best answer but it just shows how you can be creative using bash commands.
If your last field is a single character, you could do this:
a="1:2:3:4:5"
echo ${a: -1}
echo ${a:(-1)}
Check string manipulation in bash.
Using Bash.
$ var1="1:2:3:4:0"
$ IFS=":"
$ set -- $var1
$ eval echo \$${#}
0
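A variant of this approach avoids eval, restores IFS afterwards, and turns off pathname expansion so that a field such as * is not matched against files in the current directory. A for loop without an in list iterates over the positional parameters, and after the loop the variable still holds the last value.

```shell
#!/bin/sh
var1="1:2:3:4:0"
old_ifs=$IFS
IFS=:
set -f            # disable globbing while splitting
set -- $var1      # unquoted on purpose: split on IFS into $1, $2, ...
set +f
IFS=$old_ifs
for last do :; done   # iterates over "$@"; $last keeps the final value
echo "$last"          # 0
```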
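printf '%s' "a:b:c:d:e" | xargs -d : -n1 | tail -1
xargs -d : (a GNU xargs option) splits the input on ":", -n1 runs echo once per item so that each item ends up on its own line, and tail -1 prints the last one. printf '%s' is used instead of echo so that the trailing newline does not become part of the last item (with echo, tail -1 would print an empty line).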
Regex matching in sed is greedy (always goes to the last occurrence), which you can use to your advantage here:
$ foo=1:2:3:4:5
$ echo ${foo} | sed "s/.*://"
5
A solution using the read builtin:
IFS=':' read -a fields <<< "1:2:3:4:5"
echo "${fields[4]}"
Or, to make it more generic:
echo "${fields[-1]}" # prints the last item
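To print every field on its own line (here splitting the string in $str on ;), loop over the translated output; the last line printed is the last field:
for x in `echo "$str" | tr ";" "\n"`; do echo "$x"; done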
Improving on @mateusz-piotrowski's and @user3133260's answers:
echo "a:b:c:d::e:: ::" | tr ':' ' ' | xargs | tr ' ' '\n' | tail -1
first, tr ':' ' ' -> replace ':' with whitespace
then, trim with xargs
after that, tr ' ' '\n' -> replace the remaining whitespace with newlines
lastly, tail -1 -> get the last string
For those that comfortable with Python, https://github.com/Russell91/pythonpy is a nice choice to solve this problem.
$ echo "a:b:c:d:e" | py -x 'x.split(":")[-1]'
From the pythonpy help: -x treat each row of stdin as x.
With that tool, it is easy to write python code that gets applied to the input.
Edit (Dec 2020):
Pythonpy is no longer online.
Here is an alternative:
$ echo "a:b:c:d:e" | python -c 'import sys; sys.stdout.write(sys.stdin.read().split(":")[-1])'
It contains more boilerplate code (sys.stdin.read and sys.stdout.write) but requires only the Python standard library.

Resources