What options to xargs will make it work with lines rather than words - xargs

I want to use xargs (or similar concept) on lines with spaces. In effect, if the input has
line one
line two
line three
...
and I call xargs such as cat f | xargs foo, then I want xargs to execute foo "line one" "line two" "line three" ... for as many arguments that fit on a command line.
Currently I do this using the shell such as
cat list | while read f ; do foo $f ; done
but that calls foo for each line and it would be a bit more efficient if foo is called with a list rather than just one argument.
The -I option, as documented, sounds like what I want but in my case (Free BSD), it is calling foo with only one argument. -L and -R don't seem to help.

You can replace newlines with nulls:
cat list | tr '\n' '\0' | xargs -0 foo

Related

How to split multiple line output to arguments in bash?

Let the output of a program to be a multiple line text:
$: some_program
↓
line 1
line 2
Now how to use the output so each line is passed as a single argument?
$: count_arguments $(some_program)
↓
4
won't work because it split by new lines and spaces.
count_arguments "$(some_program)"
↓
1
won't work either.
With an intermediary step could the output be read into an array and then use the array as "${arr[#]}"
But I am looking for a one line solution. Is it possible?
With Bashv4+, mapfile is one solution.
mapfile -t output < <(some_command); your_command "${output[#]}"
or
echo "${#output[*]}"
Counting the output of some_command, just use wc
some_command | wc -l
You could convert the newlines to NULL characters, and use xargs -0 (the -0 tells it to use NULL as a delimiter, instead of whitespace):
some_program | tr '\n' '\0' | xargs -0 count_arguments
There is one possible caveat, though: if xargs thinks there are too many lines (arguments) or they're too big, it'll split them into reasonable-sized groups and run the utility separately on each group. OTOH if xargs thinks that, it's probably right any any method that didn't split them would just straight-up fail.

Why doesn't this sed command put a newline

I have a file, ciao.py thas has only one line in it: print("ciao")
I want to do this: I want to do that via pipe stream, and als, if I do cat ciao.py | sed 's/.*/&\n&/' it would work, but I want to do this in two separated parts, simulating the case where I want to print it and then pass that to further commands.
If I do this:
cat ciao.py | sed 's/.*/&\n/' |tee >(xargs echo) | xargs echo
it does not work. It prints print("ciao") print("ciao") in the same line. I don't understand why, since I am putting \n with sed.
I'd guess print cia is appearing twice on the same line because xargs is calling echo with multiple strings since xargs calls the command you provide it with groups of input lines at a time by default.
Is this what you're trying to do?
$ cat ciao.py | sed 's/.*/&\n/' |tee >(xargs -n 1 echo) | xargs -n 1 echo
print(ciao)
print(ciao)
or:
$ cat ciao.py | sed 's/.*/&\n/' |tee >(cat) | xargs -n 1 echo
print(ciao)
print(ciao)
There are, of course, better ways to get that output from that input, e.g.:
$ sed 'p' ciao.py
print("ciao")
print("ciao")

xargs for each element in a list in the middle of a command, not just the end like -L does

What is the correct way to xargs to feed a list of strings as inputs to the middle of a command?
For example, say I want to move all files that come through a complicated series of "pipes" to the home directory. Something like
$ ... | ... | ... | awk '{print $2}' | xargs -L1 mv ~/
Tries to move the home directory to each input, rather than the desired order.
Someone previously asked a question about this, but the answers were not helpful:
Unix - "xargs" - output "in the middle" (not at the end!)
Is there a way to place the xargs input into a specific part of the command and not just at the end.
Using GNU Parallel it looks like this:
$ ... | ... | ... | awk '{print $2}' | parallel mv {} ~/
It will run one job per file. Faster is:
$ ... | ... | ... | awk '{print $2}' | parallel -X mv {} ~/
It will insert multiple names before running the job.
You didn't mention what xargs are you using.
If you use BSD xargs instead of GNU xargs, there is a powerful option -L that can do what you want:
-J replstr
If this option is specified, xargs will use the data read from
standard input to replace the first occurrence of replstr instead
of appending that data after all other arguments. This option
will not affect how many arguments will be read from input (-n),
or the size of the command(s) xargs will generate (-s). The op-
tion just moves where those arguments will be placed in the com-
mand(s) that are executed. The replstr must show up as a dis-
tinct argument to xargs. It will not be recognized if, for in-
stance, it is in the middle of a quoted string. Furthermore,
only the first occurrence of the replstr will be replaced. For
example, the following command will copy the list of files and
directories which start with an uppercase letter in the current
directory to destdir:
/bin/ls -1d [A-Z]* | xargs -J % cp -Rp % destdir
Examples
$ seq 3 | xargs -J# echo # fourth fifth
1 2 3 fourth fifth
seq 3 | xargs -J# ruby -e 'pp ARGV' # fourth fifth
["1", "2", "3", "fourth", "fifth"]
The Ruby code pp ARGV here would pretty print the command arguments it receives. I usually use this way to debug xargs scripts.

Concise and portable "join" on the Unix command-line

How can I join multiple lines into one line, with a separator where the new-line characters were, and avoiding a trailing separator and, optionally, ignoring empty lines?
Example. Consider a text file, foo.txt, with three lines:
foo
bar
baz
The desired output is:
foo,bar,baz
The command I'm using now:
tr '\n' ',' <foo.txt |sed 's/,$//g'
Ideally it would be something like this:
cat foo.txt |join ,
What's:
the most portable, concise, readable way.
the most concise way using non-standard unix tools.
Of course I could write something, or just use an alias. But I'm interested to know the options.
Perhaps a little surprisingly, paste is a good way to do this:
paste -s -d","
This won't deal with the empty lines you mentioned. For that, pipe your text through grep, first:
grep -v '^$' | paste -s -d"," -
This sed one-line should work -
sed -e :a -e 'N;s/\n/,/;ba' file
Test:
[jaypal:~/Temp] cat file
foo
bar
baz
[jaypal:~/Temp] sed -e :a -e 'N;s/\n/,/;ba' file
foo,bar,baz
To handle empty lines, you can remove the empty lines and pipe it to the above one-liner.
sed -e '/^$/d' file | sed -e :a -e 'N;s/\n/,/;ba'
How about to use xargs?
for your case
$ cat foo.txt | sed 's/$/, /' | xargs
Be careful about the limit length of input of xargs command. (This means very long input file cannot be handled by this.)
Perl:
cat data.txt | perl -pe 'if(!eof){chomp;$_.=","}'
or yet shorter and faster, surprisingly:
cat data.txt | perl -pe 'if(!eof){s/\n/,/}'
or, if you want:
cat data.txt | perl -pe 's/\n/,/ unless eof'
Just for fun, here's an all-builtins solution
IFS=$'\n' read -r -d '' -a data < foo.txt ; ( IFS=, ; echo "${data[*]}" ; )
You can use printf instead of echo if the trailing newline is a problem.
This works by setting IFS, the delimiters that read will split on, to just newline and not other whitespace, then telling read to not stop reading until it reaches a nul, instead of the newline it usually uses, and to add each item read into the array (-a) data. Then, in a subshell so as not to clobber the IFS of the interactive shell, we set IFS to , and expand the array with *, which delimits each item in the array with the first character in IFS
I needed to accomplish something similar, printing a comma-separated list of fields from a file, and was happy with piping STDOUT to xargs and ruby, like so:
cat data.txt | cut -f 16 -d ' ' | grep -o "\d\+" | xargs ruby -e "puts ARGV.join(', ')"
I had a log file where some data was broken into multiple lines. When this occurred, the last character of the first line was the semi-colon (;). I joined these lines by using the following commands:
for LINE in 'cat $FILE | tr -s " " "|"'
do
if [ $(echo $LINE | egrep ";$") ]
then
echo "$LINE\c" | tr -s "|" " " >> $MYFILE
else
echo "$LINE" | tr -s "|" " " >> $MYFILE
fi
done
The result is a file where lines that were split in the log file were one line in my new file.
Simple way to join the lines with space in-place using ex (also ignoring blank lines), use:
ex +%j -cwq foo.txt
If you want to print the results to the standard output, try:
ex +%j +%p -scq! foo.txt
To join lines without spaces, use +%j! instead of +%j.
To use different delimiter, it's a bit more tricky:
ex +"g/^$/d" +"%s/\n/_/e" +%p -scq! foo.txt
where g/^$/d (or v/\S/d) removes blank lines and s/\n/_/ is substitution which basically works the same as using sed, but for all lines (%). When parsing is done, print the buffer (%p). And finally -cq! executing vi q! command, which basically quits without saving (-s is to silence the output).
Please note that ex is equivalent to vi -e.
This method is quite portable as most of the Linux/Unix are shipped with ex/vi by default. And it's more compatible than using sed where in-place parameter (-i) is not standard extension and utility it-self is more stream oriented, therefore it's not so portable.
POSIX shell:
( set -- $(cat foo.txt) ; IFS=+ ; printf '%s\n' "$*" )
My answer is:
awk '{printf "%s", ","$0}' foo.txt
printf is enough. We don't need -F"\n" to change field separator.

How to apply shell command to each line of a command output?

Suppose I have some output from a command (such as ls -1):
a
b
c
d
e
...
I want to apply a command (say echo) to each one, in turn. E.g.
echo a
echo b
echo c
echo d
echo e
...
What's the easiest way to do that in bash?
It's probably easiest to use xargs. In your case:
ls -1 | xargs -L1 echo
The -L flag ensures the input is read properly. From the man page of xargs:
-L number
Call utility for every number non-empty lines read.
A line ending with a space continues to the next non-empty line. [...]
You can use a basic prepend operation on each line:
ls -1 | while read line ; do echo $line ; done
Or you can pipe the output to sed for more complex operations:
ls -1 | sed 's/^\(.*\)$/echo \1/'
for s in `cmd`; do echo $s; done
If cmd has a large output:
cmd | xargs -L1 echo
You can use a for loop:
for file in * ; do
echo "$file"
done
Note that if the command in question accepts multiple arguments, then using xargs is almost always more efficient as it only has to spawn the utility in question once instead of multiple times.
You actually can use sed to do it, provided it is GNU sed.
... | sed 's/match/command \0/e'
How it works:
Substitute match with command match
On substitution execute command
Replace substituted line with command output.
A solution that works with filenames that have spaces in them, is:
ls -1 | xargs -I %s echo %s
The following is equivalent, but has a clearer divide between the precursor and what you actually want to do:
ls -1 | xargs -I %s -- echo %s
Where echo is whatever it is you want to run, and the subsequent %s is the filename.
Thanks to Chris Jester-Young's answer on a duplicate question.
xargs fails with with backslashes, quotes. It needs to be something like
ls -1 |tr \\n \\0 |xargs -0 -iTHIS echo "THIS is a file."
xargs -0 option:
-0, --null
Input items are terminated by a null character instead of by whitespace, and the quotes and backslash are
not special (every character is taken literally). Disables the end of file string, which is treated like
any other argument. Useful when input items might contain white space, quote marks, or backslashes. The
GNU find -print0 option produces input suitable for this mode.
ls -1 terminates the items with newline characters, so tr translates them into null characters.
This approach is about 50 times slower than iterating manually with for ... (see Michael Aaron Safyans answer) (3.55s vs. 0.066s). But for other input commands like locate, find, reading from a file (tr \\n \\0 <file) or similar, you have to work with xargs like this.
i like to use gawk for running multiple commands on a list, for instance
ls -l | gawk '{system("/path/to/cmd.sh "$1)}'
however the escaping of the escapable characters can get a little hairy.
Better result for me:
ls -1 | xargs -L1 -d "\n" CMD

Resources