I'm trying to pipe a list of values through xargs. Here's a simple example:
echo "Hello Hola Bonjour" | xargs -I _ echo _ Landon
I would expect this to output the following:
Hello Landon
Hola Landon
Bonjour Landon
Instead, the command outputs this:
Hello Hola Bonjour Landon
What am I missing?
Under -I, man xargs says:
unquoted blanks do not terminate input items; instead the
separator is the newline character
You can specify a different delimiter (at least in GNU xargs):
printf 'Hello Hola Bonjour' | xargs -d' ' -I _ echo _ Landon
More portably, terminate each item with \0 and tell xargs to split on it with -0:
printf '%s\0' Hello Hola Bonjour | xargs -0 -I _ echo _ Landon
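A NUL byte can never occur inside a command-line argument, so -0 also keeps items that contain spaces intact. Here is a minimal sketch. The function name greet_all is only for illustration:

```shell
#!/bin/sh
# Hypothetical helper: greet each argument on its own line.
# printf '%s\0' terminates every argument with a NUL byte, and xargs -0
# splits only on NUL, so "Good morning" stays a single item.
greet_all() {
  printf '%s\0' "$@" | xargs -0 -I _ echo _ Landon
}

greet_all "Good morning" Hola Bonjour
```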
The items have to be separated by newlines, because that is what -I splits on. Either change xargs's delimiter with -d as the other answer suggests, or pipe through sed to replace each space with a newline. Since -I already implies -L 1, no separate -L option is needed:
echo "Hello Hola Bonjour" | sed -e 's/ /\n/g' | xargs -I _ echo _ Landon
Results in
Hello Landon
Hola Landon
Bonjour Landon
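POSIX does not guarantee the \n in the sed replacement, which is a GNU sed extension. A portable sketch of the same pipeline uses tr to turn each space into a newline:

```shell
#!/bin/sh
# tr replaces every space with a newline, so xargs -I sees one item per line.
echo "Hello Hola Bonjour" | tr ' ' '\n' | xargs -I _ echo _ Landon
```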
Sometimes changing the delimiter is not enough: by default, xargs packs as many input items as fit onto one command line and runs the command once with all of them.
For example,
seq 1 7 | xargs echo
results in
1
2
3
4
5
6
7
being passed to xargs, which hands all of them to a single echo, so the output is
1 2 3 4 5 6 7
If you add -L 1, described in the xargs man page as
-L max-lines
Use at most max-lines nonblank input lines per command line. Trailing blanks cause an input line to be
logically continued on the next input line. Implies -x.
seq 1 7 | xargs -L 1 echo
then you will see
1
2
3
4
5
6
7
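-L counts input lines and -n counts arguments. Either one caps how much input goes into each command. The sketch below compares the two on the same input:

```shell
#!/bin/sh
# -n 3: at most three arguments per echo
seq 1 7 | xargs -n 3 echo
# -L 2: at most two input lines per echo
seq 1 7 | xargs -L 2 echo
```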
You can also convert it manually to a for loop, which makes multi-line statements easier to set up.
# Make sure the values the for loop iterates over are sanitized, since an unquoted * (or other glob character) would be expanded.
for i in Hello Hola Bonjour
do
  if [ "$i" = "Hello" ]
  then
    echo "$i Landon, language detected as English!"
  else
    echo "$i Landon, language detected as non English."
  fi
done
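When the values arrive on a pipe instead of being typed into the script, a while read loop does the same job without word splitting or glob expansion. Here is a sketch, with detect_language as a hypothetical function name:

```shell
#!/bin/sh
# Reads one value per line; IFS= and -r keep each line exactly as written,
# and "$i" is always quoted, so no word splitting or glob expansion happens.
detect_language() {
  while IFS= read -r i; do
    if [ "$i" = "Hello" ]; then
      echo "$i Landon, language detected as English!"
    else
      echo "$i Landon, language detected as non English."
    fi
  done
}

printf '%s\n' Hello Hola Bonjour | detect_language
```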
The -I flag changes the delimiter to newline.
Unquoted blanks do not terminate input items; instead the separator is the newline character.
This is documented under -I in the xargs man page.
You have to specify the delimiter as a space manually. echo also appends a newline by default, so the last item would be "Bonjour" plus a newline, and " Landon" would end up on a line of its own. Use the -n flag to suppress that newline.
Here is the fixed command:
echo -n "Hello Hola Bonjour" | xargs -d' ' -I _ echo _ Landon
Problem
(1) Given a string, I replace spaces with $'\n' using sed:
echo "one two" | sed 's/ /$'"'"'\\n'"'"'/g'
This outputs:
# one$'\n'two
(2) Note that echoing this output of (1):
echo one$'\n'two
results in:
# one
# two
(3) I echo the output of (1) in another way, by piping the output of (1) into xargs echo:
echo "one two" | sed 's/ /$'"'"'\\n'"'"'/g' | xargs echo
But I don't get the same output as (2):
# one$\ntwo
Question
What does xargs do when formatting the input of a string containing $'\n'?
Why is echoing a string with $'\n' not the same as using xargs echo on the same string?
When you write
echo one$'\n'two
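at the command line, bash replaces the $'\n' with a newline. $'...' is ANSI-C quoting, which bash handles while parsing the command. Text that xargs reads from standard input is never parsed by bash, so no such replacement happens. xargs only does its own simpler quote removal: it strips the single quotes around \n but keeps the backslash and the $, which is why echo prints one$\ntwo.
Even with a real newline, piping to xargs will still not do what you want, since by default xargs treats newlines (as well as blanks) as argument separators: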
$ echo "one two" | tr ' ' '\n' | xargs echo
one two
You must tell xargs to use a different separator, even one that never occurs in the input, such as NUL with -0:
$ echo "one two" | tr ' ' '\n' | xargs -0 echo
one
two
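With -0, everything up to end of input becomes one argument. That includes the trailing newline added by echo and kept by tr, so echo actually prints an extra empty line after two. The sketch below avoids this by starting with printf '%s', which writes no trailing newline:

```shell
#!/bin/sh
# printf '%s' emits "one two" without a newline; tr turns the space into one.
# xargs -0 then sees a single item, "one<newline>two", and echo prints it.
printf '%s' "one two" | tr ' ' '\n' | xargs -0 echo
```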
I'm not sure this answers your question, but a trick I've used in the past for similar cases is to use printf instead. It reuses its format string until all the arguments are consumed, e.g.:
$ printf "%s\n" one two
one
two
If the words are in a single string, use the shell's own word splitting by leaving the variable unquoted:
$ args="one two"
$ printf "%s\n" $args
one
two
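The small sketch below shows two more details of printf's argument reuse. A format with several conversions consumes the arguments in groups. An unquoted $args is also subject to glob expansion, which set -f turns off:

```shell
#!/bin/sh
# Two conversions in the format, so printf consumes the arguments in pairs
# and reuses the format until none are left.
printf '%s=%s\n' one 1 two 2

# Unquoted $args is split on IFS but is also subject to globbing;
# set -f disables globbing so a literal * is printed as-is.
args="one two *"
set -f
printf '%s\n' $args
set +f
```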
Just for completeness, you can also feed the lines to xargs -n1 and run a small sh -c scriptlet on each:
$ printf "%s\n" one two |xargs -n1 sh -c 'echo [$(date -R)] foo=$1' --
[Sun, 03 Jun 2018 21:34:17 -0300] foo=one
[Sun, 03 Jun 2018 21:34:17 -0300] foo=two
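In the scriptlet above, the argument right after the sh -c script (-- here) becomes $0, so the item that xargs appends lands in $1. The sketch below uses the same idea with -n 2, so each run sees two positional parameters:

```shell
#!/bin/sh
# The word after the script ("--" here) becomes $0 inside sh -c, so the
# items xargs appends start at $1. With -n 2, each run gets two of them.
printf '%s\n' one 1 two 2 | xargs -n 2 sh -c 'echo "name=$1 value=$2"' --
```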
When invoking xargs with only -n1, xargs executes a separate echo command for every item:
$ echo 1 2 | xargs -n1
1
2
But when -n1 is combined with the -I option, which tells xargs which string to replace, it passes all the arguments to a single echo command, effectively ignoring -n1:
$ echo 1 2 | xargs -n1 -I% echo %
1 2
My goal is to execute an arbitrary command with different arguments:
$ echo 1 2 | xargs -n1 -I% mycommand %
# What I want to achieve
mycommand 1
mycommand 2
but I'm quite baffled by the behavior I'm seeing, so:
Why does xargs seemingly ignore -n1?
What is the correct way to do what I am trying to? Note that I don't want to deal with any files while doing so.
From xargs(1):
-I replace-str
Replace occurrences of replace-str in the initial-arguments with
names read from standard input. Also, unquoted blanks do not
terminate input items; instead the separator is the newline
character. Implies -x and -L 1.
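So with -I, each input line is one item (and -n1 has no further effect); put the values on separate lines instead:
$ echo $'1\n2' | xargs -n1 -I% echo %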
1
2
$ echo $'1\n2' | xargs -n1 -I% echo '*' %
* 1
* 2
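The question's values are space-separated, so one approach is to convert them to lines first. mycommand is the question's placeholder. In the sketch below, echo mycommand stands in for it and just prints each invocation:

```shell
#!/bin/sh
# Put each value on its own line so that -I treats it as a separate item.
echo 1 2 | tr ' ' '\n' | xargs -I% echo mycommand %

# When the value only needs to go at the end of the command, -n1 alone is
# enough: without -I, blanks separate items and each one is appended.
echo 1 2 | xargs -n1 echo mycommand
```

The second form relies on the default behavior: without -I, blanks separate items, so -n1 runs one command per value and appends the value at the end.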
The bash manual says regarding command substitution:
Bash performs the expansion by executing command and replacing the command substitution with the standard output of the command, with any trailing newlines deleted.
Demonstration - 3 characters, newlines first:
$ output="$(printf "\n\nx")"; echo -n "$output" | wc -c
3
Here the newlines are not at the end, and do not get removed, so the count is 3.
Demonstration - 3 characters, newlines last:
$ output="$(printf "x\n\n")"; echo -n "$output" | wc -c
1
Here the newlines are removed from the end, so the count is 1.
TL;DR
What is a robust work-around to get the binary-clean output of a command into a variable?
Bonus points for Bourne shell compatibility.
The only way to do it in a "Bourne compatible" way is to use external utilities.
Besides writing one in C, you can use xxd and expr (for example):
$ output="$(printf "x\n\n"; printf "X")" # get the output ending in "X".
$ printf '%s' "${output}" | xxd -p # transform the string to hex.
780a0a58
$ hexstr="$(printf '%s' "${output}" | xxd -p)" # capture the hex
$ expr "$hexstr" : '\(.*\)..' # remove the last two hex ("X").
780a0a
$ hexstr="$(expr "$hexstr" : '\(.*\)..')" # capture the shorter str.
$ printf '%s' "$hexstr" | xxd -p -r | wc -c # convert back to binary.
3
Shortened:
$ output="$(printf "x\n\n"; printf "X")"
$ hexstr="$(printf '%s' "${output}" | xxd -p )"
$ expr "$hexstr" : '\(.*\)..' | xxd -p -r | wc -c
3
The command xxd is being used for its ability to convert back to binary.
Note that wc -c gives misleading results with multibyte (e.g. UTF-8) characters:
$ printf "Voilà" | wc -c
6
$ printf "★" | wc -c
3
It will print the count of bytes, not characters.
${#var} (the length of a variable) is also not supported in older shells.
Of course, to get this to run in a Bourne shell you must use `…` instead of $(…).
In bash, the ${parameter%word} form of Shell Parameter Expansion can be used:
$ output="$(printf "x\n\n"; echo X)"; echo -n "${output%X}" | wc -c
3
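This substitution is also specified by POSIX.1-2008. (For full portability, use printf '%s' instead of echo -n, whose behavior POSIX leaves implementation-defined.)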
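The sentinel approach can be wrapped in a function so the command runs only once. The sketch below uses only POSIX sh features. capture_exact and its argument order are hypothetical. It keeps trailing newlines and the command's exit status. sh and bash variables still cannot hold NUL bytes, though:

```shell
#!/bin/sh
# Hypothetical helper: capture_exact VARNAME COMMAND [ARG...]
# Runs COMMAND once and stores its output in VARNAME byte-for-byte,
# trailing newlines included, then returns COMMAND's exit status.
# VARNAME is passed to eval, so it must be a plain, trusted variable name.
capture_exact() {
  _ce_var=$1
  shift
  # Append a sentinel X so command substitution has no trailing newlines
  # to strip; the subshell exits with the command's own status.
  _ce_out=$("$@"; _ce_rc=$?; printf X; exit "$_ce_rc")
  _ce_rc=$?
  # Remove the sentinel with POSIX ${var%word} and assign the result.
  eval "$_ce_var=\${_ce_out%X}"
  return "$_ce_rc"
}

capture_exact out printf 'x\n\n'
printf '%s' "$out" | wc -c   # 3 bytes: x and two newlines
```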
Does anybody know a way to convert "40900000" to "409-00-000" with a single command, such as sed or awk?
I already tried a couple of ways with sed, with no luck at all. I need to do this in bulk: there are around 40k lines, and some of these lines are not in the proper format, so they need to be fixed.
Thanks in advance
Using GNU sed, I would do it like this:
sed -r 's/([0-9]{3})([0-9]{2})([0-9]{3})/\1-\2-\3/' filename
# or, equivalently
sed -E 's/([0-9]{3})([0-9]{2})([0-9]{3})/\1-\2-\3/' filename
The -r or -E option enables extended regex mode, which avoids the need to escape the parentheses and braces.
\1 refers to the first capture group (the part matched inside the first pair of ( )).
[0-9] means the range zero to nine.
{3} means three of the preceding character or range.
edit: Thanks for all the comments.
On other systems that lack the -r switch, or its alias -E, you have to escape the ( ) and { } above. That leaves you with:
sed 's/\([0-9]\{3\}\)\([0-9]\{2\}\)\([0-9]\{3\}\)/\1-\2-\3/' filename
At the expense of repetition, you can avoid some of the escapes by simply repeating the [0-9]:
sed 's/\([0-9][0-9][0-9]\)\([0-9][0-9]\)\([0-9][0-9][0-9]\)/\1-\2-\3/' filename
For the record, Perl is equally capable of doing this sort of thing:
perl -pwe 's/(\d{3})(\d{2})(\d{3})/$1-$2-$3/' filename
-p wraps the code in a loop over the input lines and prints each line afterwards
-w enables warnings
-e supplies the one-line program on the command line
\d is the "digit" character class (zero to nine)
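The question says only some of the 40k lines need fixing, so it may help to anchor the pattern. Unanchored, the commands above would also rewrite part of a longer digit run: 1234567890 becomes 123-45-67890. The sketch below uses ^ and $ so only lines that are exactly eight digits change:

```shell
#!/bin/sh
# ^ and $ anchor the pattern to the whole line, so only lines that are
# exactly eight digits change; other lines pass through untouched.
printf '%s\n' 40900000 409-00-000 1234567890 |
  sed -E 's/^([0-9]{3})([0-9]{2})([0-9]{3})$/\1-\2-\3/'
```

To rewrite filename in place, GNU sed's -i option can be added (e.g. -i.bak to keep a backup).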
No need to run external commands; bash or ksh can do it themselves.
$ a=12345678
$ [ ${#a} = 8 ] && { b=${a:0:3}-${a:3:2}-${a:5};a=$b;}
$ echo $a
123-45-678
$ a=abc-de-fgh
$ [ ${#a} = 8 ] && { b=${a:0:3}-${a:3:2}-${a:5};a=$b;}
$ echo $a
abc-de-fgh
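The length test above accepts any eight characters, so abcdefgh would become abc-de-fgh. The bash-only sketch below checks for digits with [[ =~ ]] instead. reformat_id is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: reformat a value only if it is exactly eight digits.
# [[ =~ ]] checks the characters themselves, not just the length.
reformat_id() {
  local a=$1
  if [[ $a =~ ^([0-9]{3})([0-9]{2})([0-9]{3})$ ]]; then
    a="${BASH_REMATCH[1]}-${BASH_REMATCH[2]}-${BASH_REMATCH[3]}"
  fi
  printf '%s\n' "$a"
}

reformat_id 40900000
reformat_id abcdefgh
```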
You can use sed, like this:
sed 's/\([0-9][0-9][0-9]\)\([0-9][0-9]\)\([0-9][0-9][0-9]\)/\1-\2-\3/'
or more succinctly, with extended regex syntax:
sed -E 's/([0-9]{3})([0-9]{2})([0-9]{3})/\1-\2-\3/'
For golfing (GNU awk only, since FIELDWIDTHS is a gawk extension):
$ echo "40900000" | awk '$1=$1' FIELDWIDTHS='3 2 3' OFS='-'
409-00-000
With sed:
sed 's/\(...\)\(..\)\(...\)/\1-\2-\3/'
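The dot matches any character, and surrounding it with \( and \) makes it a group; \1 references the first group. Because the dots match anything, an already-formatted line such as 409-00-000 would be mangled, so use [0-9] if the input is mixed.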
Just for the fun of it, an awk version:
echo "40900000" | awk '{a=$0+0} length(a)==8 {$0=substr(a,1,3)"-"substr(a,4,2)"-"substr(a,6)}1'
409-00-000
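This tests whether there are 8 digits. Because a=$0+0 converts the line to a number, leading zeros are dropped, so a value such as 01234567 is left unchanged.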
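A variant can test the line's text with a regular expression instead of converting it to a number, so 01234567 is reformatted too. The sketch below uses only POSIX awk features, with no gensub and no interval expressions:

```shell
#!/bin/sh
# The regex tests the text itself, so leading zeros are kept and lines that
# are not exactly eight digits are printed unchanged (the trailing 1 prints
# every line). Bracket expressions are repeated instead of using {8}.
printf '%s\n' 40900000 01234567 409-00-000 |
  awk '/^[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]$/ {
         $0 = substr($0, 1, 3) "-" substr($0, 4, 2) "-" substr($0, 6)
       } 1'
```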
A more complex version (requires GNU awk because of gensub):
echo "40900000" | awk --re-interval '{print gensub(/([0-9]{3})([0-9]{2})([0-9]{3})/,"\\1-\\2-\\3","g")}'
409-00-000
echo "409-00-000" | awk --re-interval '{print gensub(/([0-9]{3})([0-9]{2})([0-9]{3})/,"\\1-\\2-\\3","g")}'
409-00-000
Alternatively, with grep and cut (GNU cut, for --output-delimiter), from standard input:
echo "40900000" | grep -E "[0-9]{8}" | cut -c "1-3,4-5,6-8" --output-delimiter=-
From a file:
grep -E "[0-9]{8}" filename | cut -c "1-3,4-5,6-8" --output-delimiter=-
But I prefer Tom Fenech's solution.
When I execute commands in Bash (or to be specific, wc -l < log.txt), the output contains a linebreak after it. How do I get rid of it?
If your expected output is a single line, you can simply remove all newline characters from the output. A common approach is to pipe to the tr utility, or to Perl if you prefer:
wc -l < log.txt | tr -d '\n'
wc -l < log.txt | perl -pe 'chomp'
You can also use command substitution to remove the trailing newline:
echo -n "$(wc -l < log.txt)"
printf "%s" "$(wc -l < log.txt)"
If your expected output may contain multiple lines, you have another decision to make:
If you want to remove MULTIPLE newline characters from the end of the file, again use command substitution:
printf "%s" "$(< log.txt)"
If you want to strictly remove THE LAST newline character from a file, use Perl:
perl -pe 'chomp if eof' log.txt
Note that if you are certain you have a trailing newline character you want to remove, you can use head from GNU coreutils to select everything except the last byte. This should be quite quick:
head -c -1 log.txt
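head -c -1 removes the last byte whether or not it is a newline. The sketch below checks first. It relies on command substitution turning a lone trailing newline into an empty string, and it still needs GNU head. strip_last_newline is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: print file $1, dropping its final byte only if that
# byte is a newline. $(tail -c 1 ...) is empty exactly when the last byte is
# a newline (or the file is empty), because the newline gets stripped.
strip_last_newline() {
  if [ -z "$(tail -c 1 "$1")" ]; then
    head -c -1 "$1"   # GNU head: everything except the last byte
  else
    cat "$1"
  fi
}
```

Usage: strip_last_newline log.txt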
Also, for completeness, you can quickly check where your newline (or other special) characters are in your file using cat and the 'show-all' flag -A. The dollar sign character will indicate the end of each line:
cat -A log.txt
One way:
wc -l < log.txt | xargs echo -n
If you want to remove only the last newline, pipe through:
sed -z '$ s/\n$//'
sed won't add a \0 to the end of the stream if the delimiter is set to NUL via -z. Without -z, sed always outputs a final \n, producing a POSIX text file (defined to end in a \n).
For example:
$ { echo foo; echo bar; } | sed -z '$ s/\n$//'; echo tender
foo
bartender
And to prove no NUL added:
$ { echo foo; echo bar; } | sed -z '$ s/\n$//' | xxd
00000000: 666f 6f0a 6261 72 foo.bar
To remove multiple trailing newlines, pipe through:
sed -Ez '$ s/\n+$//'
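There is also direct support for whitespace removal in Bash parameter expansion. Each pattern below removes only a single whitespace character, and the command substitution has already stripped the trailing newlines:
testvar=$(wc -l < log.txt)
trailing_space_removed=${testvar%%[[:space:]]}
leading_space_removed=${testvar##[[:space:]]}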
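To remove a whole run of leading or trailing whitespace in bash, the sketch below uses the extglob pattern +(...), which means one or more matches:

```shell
#!/usr/bin/env bash
# extglob enables +(pattern), "one or more matches of pattern", so the
# ## and %% expansions strip whole runs of whitespace, not just one char.
shopt -s extglob
testvar=$'  42 \n'
trimmed=${testvar##+([[:space:]])}
trimmed=${trimmed%%+([[:space:]])}
printf '[%s]\n' "$trimmed"
```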
To print the output of anything in Bash without a trailing newline, use echo with the -n switch.
If you have it in a variable already, then echo it with the trailing newline cropped:
$ testvar=$(wc -l < log.txt)
$ echo -n $testvar
Or you can do it in one line, instead:
$ echo -n $(wc -l < log.txt)
If you assign its output to a variable, bash automatically strips trailing newlines:
linecount=`wc -l < log.txt`
printf doesn't add a newline unless the format asks for one, and the command substitution has already stripped the trailing newline:
$ printf '%s' $(wc -l < log.txt)
Detail:
printf will print your content in place of the %s placeholder.
If you do not tell it to print a newline (%s\n), it won't.
Adding this for my reference more than anything else ^_^
You can also strip a trailing newline from the output using bash parameter expansion:
VAR=$'helloworld\n'
CLEANED="${VAR%$'\n'}"
echo "${CLEANED}"
Using Awk:
awk -v ORS="" '1' log.txt
Explanation:
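-v assigns the variable ORS
ORS, the output record separator, is set to the empty string, so each line (record) is printed without the newline awk normally appends. Note that this joins all lines of a multi-line file together.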
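To drop only the final newline while keeping the line breaks between lines, the sketch below prints the separator before each record except the first. For a file, pass log.txt instead of using the printf pipe:

```shell
#!/bin/sh
# Print a newline before every record except the first, instead of after
# every record, so only the final newline disappears.
printf 'a\nb\n' | awk 'NR > 1 { printf "\n" } { printf "%s", $0 }'
```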