How to split multiple line output into arguments in bash?

Let the output of a program be multi-line text:
$: some_program
↓
line 1
line 2
Now, how can the output be used so that each line is passed as a single argument?
$: count_arguments $(some_program)
↓
4
won't work because the unquoted expansion is split on newlines and spaces alike.
count_arguments "$(some_program)"
↓
1
won't work either.
With an intermediate step, the output could be read into an array and the array then expanded as "${arr[@]}".
But I am looking for a one-line solution. Is that possible?

With Bash v4+, mapfile is one solution:
mapfile -t output < <(some_command); your_command "${output[@]}"
To count the elements afterwards:
echo "${#output[*]}"
If all you want is to count the lines of some_command's output, just use wc:
some_command | wc -l
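As a quick sanity check (a sketch; printf stands in for some_program, and count_arguments is the hypothetical script from the question that prints $#):
$: mapfile -t output < <(printf 'line 1\nline 2\n'); count_arguments "${output[@]}"
↓
2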

You could convert the newlines to NULL characters, and use xargs -0 (the -0 tells it to use NULL as a delimiter, instead of whitespace):
some_program | tr '\n' '\0' | xargs -0 count_arguments
There is one possible caveat, though: if xargs thinks there are too many arguments or they're too big, it will split them into reasonable-sized groups and run the utility separately on each group. On the other hand, if xargs thinks that, it's probably right, and any method that didn't split them would just fail outright.
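To see the splitting in action without count_arguments, sh -c 'echo $#' can stand in for it (a sketch):
$: printf 'line 1\nline 2\n' | tr '\n' '\0' | xargs -0 sh -c 'echo $#' sh
↓
2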

Related

Extracting only four-digit and five-digit numbers from a txt file

I have a file in Linux bash where I basically have a list of file names.
The filenames include all kinds of characters: A-Z, a-z, ., _ and numbers.
Examples:
hello-34.87-world
foo-34578-bar
fo.23-5789-foobar
and a lot more...
The goal is that I get a list of only the four- and five-digit numbers.
So the outcome should be:
34578
5789
I thought it would be a good idea to work with vi.
So I could use only one command like:
:%s/someregularexpression//g
Thanks for your help.
Without having to store the filenames in a file:
$ shopt -s extglob nullglob
$ for file in *-[0-9][0-9][0-9][0-9]?([0-9])-*; do
    tmp=${file#*-}  # remove everything up to and including the first hyphen
    tmp=${tmp%-*}   # remove the last hyphen and everything after it
    echo "$tmp"
done
5789
34578
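A quick way to sanity-check the glob is to create the sample names as empty files in a scratch directory (a sketch):
$ shopt -s extglob nullglob
$ touch hello-34.87-world foo-34578-bar fo.23-5789-foobar
$ printf '%s\n' *-[0-9][0-9][0-9][0-9]?([0-9])-*
fo.23-5789-foobar
foo-34578-bar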
Just use sed, making sure the 4-5 digit run is bounded by non-digits (or the ends of the line), so that shorter runs like the 23 in fo.23-5789-foobar don't block the match and longer runs don't slip through:
sed -nr 's/^(.*[^0-9])?([0-9]{4,5})([^0-9].*)?$/\2/p' < myfile.txt
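To check it against the sample names (a sketch using printf instead of a file):
$ printf '%s\n' hello-34.87-world foo-34578-bar fo.23-5789-foobar | sed -nr 's/^(.*[^0-9])?([0-9]{4,5})([^0-9].*)?$/\2/p'
34578
5789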
If you use vim, and no line has more than one such number, you can try the following:
:%s/^.\{-}\(\d\{4,5\}\).\{-}$/\1/g
See :help \{- for details on non-greedy matching.
This works when there is one instance per line, surrounded by a single pair of hyphens:
grep -P '\d{4,5}' mytxt | \
while read -r buff
do
    buff=${buff#*-}
    echo "${buff%-*}"
done
Normally you do not want to parse the output of ls, because of spaces, newlines and other special characters. In this case that doesn't matter.
First replace everything that is not a digit with a newline. Then look only for lines with 4 or 5 digits; after the replacement the lines contain nothing but digits, so this amounts to looking for lines of 4 or 5 characters.
ls | tr -c '0-9' '\n' | grep -E "^....(|.)$"
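To try it on the sample names without touching the filesystem (a sketch):
$ printf '%s\n' hello-34.87-world foo-34578-bar fo.23-5789-foobar | tr -c '0-9' '\n' | grep -E '^....(|.)$'
34578
5789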
When you already have the filenames in a file and you are in vi, use
:%!tr -c '0-9' '\n' | grep -E "^....(|.)$"

xargs for each element in a list in the middle of a command, not just the end like -L does

What is the correct way to xargs to feed a list of strings as inputs to the middle of a command?
For example, say I want to move all files that come through a complicated series of "pipes" to the home directory. Something like
$ ... | ... | ... | awk '{print $2}' | xargs -L1 mv ~/
This tries to move the home directory to each input, rather than the other way around.
Someone previously asked a question about this, but the answers were not helpful:
Unix - "xargs" - output "in the middle" (not at the end!)
Is there a way to place the xargs input into a specific part of the command, and not just at the end?
Using GNU Parallel it looks like this:
$ ... | ... | ... | awk '{print $2}' | parallel mv {} ~/
It will run one job per file. Faster is:
$ ... | ... | ... | awk '{print $2}' | parallel -X mv {} ~/
It will insert multiple names before running the job.
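To preview the difference without touching any files, echo can stand in for mv (a sketch; dest/ is a placeholder for ~/, and -k just keeps the output in input order):
$ printf '%s\n' a b c | parallel -k echo mv {} dest/
mv a dest/
mv b dest/
mv c dest/
$ printf '%s\n' a b c | parallel -k -X echo mv {} dest/
mv a b c dest/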
You didn't mention which xargs you are using.
If you use BSD xargs instead of GNU xargs, there is a powerful option, -J, that can do what you want:
-J replstr
If this option is specified, xargs will use the data read from standard input to replace the first occurrence of replstr instead of appending that data after all other arguments. This option will not affect how many arguments will be read from input (-n), or the size of the command(s) xargs will generate (-s). The option just moves where those arguments will be placed in the command(s) that are executed. The replstr must show up as a distinct argument to xargs. It will not be recognized if, for instance, it is in the middle of a quoted string. Furthermore, only the first occurrence of the replstr will be replaced. For example, the following command will copy the list of files and directories which start with an uppercase letter in the current directory to destdir:
/bin/ls -1d [A-Z]* | xargs -J % cp -Rp % destdir
Examples
$ seq 3 | xargs -J@ echo @ fourth fifth
1 2 3 fourth fifth
$ seq 3 | xargs -J@ ruby -e 'pp ARGV' @ fourth fifth
["1", "2", "3", "fourth", "fifth"]
The Ruby code pp ARGV pretty-prints the command-line arguments it receives; I usually debug xargs invocations this way.

Counting commas in a line in bash

Sometimes I receive a CSV file which has a carriage return inside a cell. This is not an acceptable format to a program that will use it as input.
In order to detect if an input line is split, I determined that a bad line would not have the expected number of commas in it. Is there a bash or other common unix command line tool that would allow me to count the commas in the line? If necessary, I can write a Python or Perl program to do it, but if possible, I'd like to add a line or two to an existing bash script to cause it to fail if the comma count is wrong. Any ideas?
Strip everything but the commas, and then count number of characters left:
$ echo foo,bar,baz | tr -cd , | wc -c
2
To count the number of times a comma appears, you can use something like awk:
string='line,of,input,from,the,CSV,file'   # one line read from the CSV
echo "$string" | awk -F "," '{print NF-1}'
But this really isn't sufficient to determine whether a field has carriage returns in it. Fields can have commas inside as long as they're surrounded by quotes.
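If the goal is to make an existing script fail when any line has the wrong comma count, awk can do the check in one line (a sketch; expected and input.csv are placeholders):
expected=2
awk -F, -v n="$expected" 'NF-1 != n { exit 1 }' input.csv || { echo 'bad comma count' >&2; exit 1; }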
What worked for me better than the other solutions was this. If test.txt has:
foo,bar,baz
baz,foo,foobar,bar
Then cat test.txt | xargs -I % sh -c 'echo % | tr -cd , | wc -c' produces
2
3
This works very well for streaming sources, or tailing logs, etc.
In pure Bash:
while IFS=, read -ra array
do
    echo "$(( ${#array[@]} - 1 ))"
done < inputfile
or
while read -r line
do
    count=${line//[^,]}
    echo "${#count}"
done < inputfile
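The ${line//[^,]} expansion deletes every character that is not a comma, so the length of what remains is the comma count (a quick sketch):
$ line='foo,bar,baz'; count=${line//[^,]}; echo "${#count}"
2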
Try Perl:
$ perl -ne 'print 0+@{[/,/g]},"\n"'
a
0
a,a
1
a,a,a,a,a
4
Depending on what you are trying to do with the CSV data, it may be helpful to use a wrapper script like csvquote to temporarily replace the problematic newlines (and commas) inside quoted fields, then restore them. For instance:
csvquote inputfile.csv | wc -l
and
csvquote inputfile.csv | cut -d, -f1 | csvquote -u
may be the sort of thing you're looking for. See https://github.com/dbro/csvquote for the code and more information.
An example Python command you could run (since Python is installed on most modern systems) is:
python -c "import pathlib; print({l.count(',') for l in pathlib.Path('my_file.csv').read_text().splitlines()})"
This counts the number of commas per line, then makes a set from them (so if your lines all have the same number of commas in, you'll get a set with just that number in).
Just remove all of the carriage returns:
tr -d "\r" < old_file > new_file

perform an operation for *each* item listed by grep

How can I perform an operation for each item listed by grep individually?
Background:
I use grep to list all files containing a certain pattern:
grep -l '<pattern>' directory/*.extension1
I want to delete all listed files but also all files having the same file name but a different extension: .extension2.
I tried using the pipe, but it seems to take the output of grep as a whole.
In find there is the -exec option, but grep has nothing like that.
If I understand your specification, you want:
grep --null -l '<pattern>' directory/*.extension1 | \
xargs -n 1 -0 -I{} bash -c 'rm "$1" "${1%.*}.extension2"' -- {}
This is essentially the same as what @triplee's comment describes, except that it's newline-safe.
What's going on here?
grep with --null will return output delimited with nulls instead of newline. Since file names can have newlines in them delimiting with newline makes it impossible to parse the output of grep safely, but null is not a valid character in a file name and thus makes a nice delimiter.
xargs will take a stream of whitespace-delimited items and execute a given command, passing as many of those items as possible (one per parameter) to that command (or to echo if no command is given). Thus if you said:
printf 'one\ntwo three \nfour\n' | xargs echo
xargs would execute echo one two three four, splitting the input on spaces as well as newlines. This is not safe for file names because, again, file names might contain embedded spaces and newlines.
The -0 switch to xargs changes it from splitting on whitespace to splitting on the null character. This makes it match the output we got from grep --null and makes it safe for processing a list of file names.
Normally xargs simply appends the input to the end of a command. The -I switch to xargs changes this to substituting the input for the specified replacement string. To get the idea try this experiment:
printf 'one\ntwo three \nfour\n' | xargs -I{} echo foo {} bar
And note the difference from the earlier printf | xargs command.
In the case of my solution the command I execute is bash, to which I pass -c. The -c switch causes bash to execute the commands in the following argument (and then terminate) instead of starting an interactive shell. The next block 'rm "$1" "${1%.*}.extension2"' is the first argument to -c and is the script which will be executed by bash. Any arguments following the script argument to -c are assigned as the arguments to the script. Thus, if I were to say:
bash -c 'echo $0' "Hello, world"
Then Hello, world would be assigned to $0 (the first argument to the script) and inside the script I could echo it back.
Since $0 is normally reserved for the script name I pass a dummy value (in this case --) as the first argument and, then, in place of the second argument I write {}, which is the replacement string I specified for xargs. This will be replaced by xargs with each file name parsed from grep's output before bash is executed.
The mini shell script might look complicated but it's rather trivial. First, the entire script is single-quoted to prevent the calling shell from interpreting it. Inside the script I invoke rm and pass it two file names to remove: the $1 argument, which was the file name passed when the replacement string was substituted above, and ${1%.*}.extension2. This latter is a parameter substitution on the $1 variable. The important part is %.* which says
% "Match from the end of the variable and remove the shortest string matching the pattern.
.* The pattern is a single period followed by anything.
This effectively strips the extension, if any, from the file name. You can observe the effect yourself:
foo='my file.txt'
bar='this.is.a.file.txt'
baz='no extension'
printf '%s\n' "${foo%.*}" "${bar%.*}" "${baz%.*}"
Since the extension has been stripped I concatenate the desired alternate extension .extension2 to the stripped file name to obtain the alternate file name.
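To sanity-check the whole pipeline without deleting anything, rm can be prefixed with echo (a sketch with hypothetical names a.extension1 and b.extension1):
printf 'a.extension1\0b.extension1\0' | xargs -0 -I{} bash -c 'echo rm "$1" "${1%.*}.extension2"' -- {}
rm a.extension1 a.extension2
rm b.extension1 b.extension2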
The following generates the rm commands as text; if the output looks like what you want, pipe it through /bin/sh:
grep -l 'RE' folder/*.ext1 | sed 's/\(.*\).ext1/rm "&" "\1.ext2"/'
Or if sed makes you itchy:
grep -l 'RE' folder/*.ext1 | while read -r file; do
    echo rm "$file" "${file%.ext1}.ext2"
done
Remove echo if the output looks like the commands you want to run.
But you can do this with find as well:
find /path/to/start -name \*.ext1 -exec grep -q 'RE' {} \; -print | ...
where ... is either the sed script or the three lines from while to done.
The idea here is that find will ... well, "find" things based on the qualifiers you give it -- namely, that things match the file glob *.ext1, AND that the result of the -exec is successful. The -q tells grep to look for RE in {} (the file supplied by find), and exit with a TRUE or FALSE without generating any of its own output.
The only real difference between doing this in find vs doing it with grep is that you get to use find's awesome collection of conditions to narrow down your search further if required. man find for details. By default, find will recurse into subdirectories.
You can pipe the list to xargs:
grep -l '<pattern>' directory/*.extension1 | xargs rm
As for the second set of files with a different extension, I'd do this (as usual use xargs echo rm when testing to make a dry run; I haven't tested it, it may not work correctly with filenames with spaces in them):
filelist=$(grep -l '<pattern>' directory/*.extension1)
echo $filelist | xargs rm
echo ${filelist//.extension1/.extension2} | xargs rm
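A variant of the same idea that tolerates spaces in filenames (a sketch using a bash array; it still breaks on newlines in names):
mapfile -t files < <(grep -l '<pattern>' directory/*.extension1)
rm -- "${files[@]}" "${files[@]/%.extension1/.extension2}"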
Pipe the result to xargs; it will allow you to run a command for each match.

How to apply shell command to each line of a command output?

Suppose I have some output from a command (such as ls -1):
a
b
c
d
e
...
I want to apply a command (say echo) to each one, in turn. E.g.
echo a
echo b
echo c
echo d
echo e
...
What's the easiest way to do that in bash?
It's probably easiest to use xargs. In your case:
ls -1 | xargs -L1 echo
The -L flag ensures the input is read properly. From the man page of xargs:
-L number
Call utility for every number non-empty lines read.
A line ending with a space continues to the next non-empty line. [...]
You can process each line with a while read loop:
ls -1 | while read -r line ; do echo "$line" ; done
Or you can pipe the output to sed for more complex operations:
ls -1 | sed 's/^\(.*\)$/echo \1/'
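Note that the sed above only generates the echo commands as text; to actually execute them, pipe the result into a shell (a sketch):
ls -1 | sed 's/^\(.*\)$/echo \1/' | sh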
for s in `cmd`; do echo "$s"; done
If cmd has a large output:
cmd | xargs -L1 echo
You can use a for loop:
for file in * ; do
echo "$file"
done
Note that if the command in question accepts multiple arguments, then using xargs is almost always more efficient as it only has to spawn the utility in question once instead of multiple times.
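A quick way to see the difference in spawn counts (a sketch; each output line of the -L1 variant corresponds to a separate echo invocation):
$ printf '%s\n' a b c | xargs echo
a b c
$ printf '%s\n' a b c | xargs -L1 echo
a
b
c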
You actually can use sed to do it, provided it is GNU sed.
... | sed 's/match/command \0/e'
How it works:
Substitute match with command match.
On substitution, execute command.
Replace the substituted line with the command output.
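For example (a sketch, assuming GNU sed; & denotes the whole match, equivalently to the \0 used above):
$ printf 'foo\nbar\n' | sed 's/.*/echo processed: &/e'
processed: foo
processed: bar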
A solution that works with filenames that have spaces in them, is:
ls -1 | xargs -I %s echo %s
The following is equivalent, but has a clearer divide between the precursor and what you actually want to do:
ls -1 | xargs -I %s -- echo %s
Where echo is whatever it is you want to run, and the subsequent %s is the filename.
Thanks to Chris Jester-Young's answer on a duplicate question.
xargs fails with backslashes and quotes. It needs to be something like
ls -1 | tr \\n \\0 | xargs -0 -iTHIS echo "THIS is a file."
xargs -0 option:
-0, --null
Input items are terminated by a null character instead of by whitespace, and the quotes and backslash are not special (every character is taken literally). Disables the end-of-file string, which is treated like any other argument. Useful when input items might contain white space, quote marks, or backslashes. The GNU find -print0 option produces input suitable for this mode.
ls -1 terminates the items with newline characters, so tr translates them into null characters.
This approach is about 50 times slower than iterating manually with a for loop (see Michael Aaron Safyan's answer): 3.55s vs. 0.066s. But for other input commands like locate, find, reading from a file (tr \\n \\0 <file) or similar, you have to work with xargs like this.
I like to use gawk for running a command on each item in a list, for instance:
ls -1 | gawk '{system("/path/to/cmd.sh " $0)}'
However, the escaping of the escapable characters can get a little hairy.
Better result for me:
ls -1 | xargs -L1 -d "\n" CMD
