How to apply shell command to each line of a command output? - bash

Suppose I have some output from a command (such as ls -1):
a
b
c
d
e
...
I want to apply a command (say echo) to each one, in turn. E.g.
echo a
echo b
echo c
echo d
echo e
...
What's the easiest way to do that in bash?

It's probably easiest to use xargs. In your case:
ls -1 | xargs -L1 echo
The -L flag ensures the input is read properly. From the man page of xargs:
-L number
Call utility for every number non-empty lines read.
A line ending with a space continues to the next non-empty line. [...]
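To see what -L1 does in practice, here is a small sketch with printf standing in for the real command (the sample input and the "item:" marker are made up for illustration):

```shell
# Each input line becomes one invocation of echo.
printf 'a\nb c\n' | xargs -L1 echo item:
# item: a
# item: b c
```

Note that the words on a line are still passed as separate arguments (b and c above), so -L1 groups by line rather than protecting spaces.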

You can use a basic prepend operation on each line:
ls -1 | while IFS= read -r line ; do echo "$line" ; done
Or you can pipe the output to sed for more complex operations:
ls -1 | sed 's/^\(.*\)$/echo \1/'
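A read loop is sensitive to how read is invoked. A minimal sketch, assuming a hypothetical helper called bracket_lines, showing that IFS= keeps leading whitespace and -r keeps backslashes:

```shell
# Hypothetical helper: wraps each input line in brackets, one line at a time.
bracket_lines() {
    # IFS= preserves leading/trailing whitespace, -r preserves backslashes;
    # the [ -n "$line" ] test also handles a last line without a newline.
    while IFS= read -r line || [ -n "$line" ]; do
        printf '[%s]\n' "$line"
    done
}

printf '  a b\nc\\d\n' | bracket_lines
# [  a b]
# [c\d]
```

Replace the printf inside the loop with whatever command should run per line, keeping "$line" quoted.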

for s in $(cmd); do echo "$s"; done
If cmd has a large output:
cmd | xargs -L1 echo

You can use a for loop:
for file in * ; do
echo "$file"
done
Note that if the command in question accepts multiple arguments, then using xargs is almost always more efficient as it only has to spawn the utility in question once instead of multiple times.
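The difference is easy to observe by letting the invoked command report how many arguments it received. This is only an illustrative sketch; the sh -c snippet stands in for a real utility:

```shell
# Default: xargs packs as many arguments as fit into a single invocation.
printf '%s\n' a b c | xargs sh -c 'echo "$# argument(s) in one call"' _
# 3 argument(s) in one call

# With -n 1 the utility is spawned once per argument.
printf '%s\n' a b c | xargs -n 1 sh -c 'echo "$# argument(s) in one call"' _
# 1 argument(s) in one call   (printed three times)
```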

You actually can use sed to do it, provided it is GNU sed.
... | sed 's/match/command \0/e'
How it works:
Substitute match with command match
On substitution execute command
Replace substituted line with command output.

A solution that works with filenames that have spaces in them is:
ls -1 | xargs -I %s echo %s
The following is equivalent, but has a clearer divide between the precursor and what you actually want to do:
ls -1 | xargs -I %s -- echo %s
Where echo is whatever it is you want to run, and the subsequent %s is the filename.
Thanks to Chris Jester-Young's answer on a duplicate question.

xargs fails with backslashes and quotes. It needs to be something like
ls -1 |tr \\n \\0 |xargs -0 -iTHIS echo "THIS is a file."
xargs -0 option:
-0, --null
Input items are terminated by a null character instead of by whitespace, and the quotes and backslash are
not special (every character is taken literally). Disables the end of file string, which is treated like
any other argument. Useful when input items might contain white space, quote marks, or backslashes. The
GNU find -print0 option produces input suitable for this mode.
ls -1 terminates the items with newline characters, so tr translates them into null characters.
This approach is about 50 times slower than iterating manually with for ... (see Michael Aaron Safyan's answer) (3.55s vs. 0.066s). But for other input commands like locate, find, reading from a file (tr \\n \\0 <file) or similar, you have to work with xargs like this.
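When the list comes from find, the tr step can be skipped because find can emit NUL-terminated names itself. A sketch, assuming a hypothetical helper list_names and a find that supports -print0 (GNU and BSD both do):

```shell
# Hypothetical helper: print the base name of every regular file under $1.
list_names() {
    # -print0 / -0 keep spaces, quotes and backslashes in names intact;
    # LC_ALL=C makes the sort order predictable.
    find "$1" -type f -print0 | xargs -0 -n 1 basename | LC_ALL=C sort
}

demo_dir=$(mktemp -d)
touch "$demo_dir/plain" "$demo_dir/with space" "$demo_dir/it's quoted"
list_names "$demo_dir"
rm -rf "$demo_dir"
```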

I like to use gawk for running multiple commands on a list, for instance
ls -1 | gawk '{system("/path/to/cmd.sh "$1)}'
However, escaping the special characters can get a little hairy.

Better result for me:
ls -1 | xargs -L1 -d "\n" CMD

Related

How to split multiple line output to arguments in bash?

Let the output of a program to be a multiple line text:
$: some_program
↓
line 1
line 2
Now how to use the output so each line is passed as a single argument?
$: count_arguments $(some_program)
↓
4
won't work because the output is split on newlines and spaces.
count_arguments "$(some_program)"
↓
1
won't work either.
With an intermediary step the output could be read into an array and then passed as "${arr[@]}".
But I am looking for a one line solution. Is it possible?
With Bash v4+, mapfile is one solution.
mapfile -t output < <(some_command); your_command "${output[@]}"
or
echo "${#output[*]}"
To count the lines of output of some_command, just use wc:
some_command | wc -l
You could convert the newlines to NULL characters, and use xargs -0 (the -0 tells it to use NULL as a delimiter, instead of whitespace):
some_program | tr '\n' '\0' | xargs -0 count_arguments
There is one possible caveat, though: if xargs thinks there are too many lines (arguments) or they're too big, it'll split them into reasonable-sized groups and run the utility separately on each group. OTOH, if xargs thinks that, it's probably right, and any method that didn't split them would just straight-up fail.
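The approaches from this thread can be compared side by side. This sketch needs bash 4+ for mapfile; count_arguments and some_program are stand-ins defined here only for the demonstration:

```shell
# Requires bash 4+ (mapfile and process substitution).
count_arguments() { echo "$#"; }                  # stand-in: prints its argument count
some_program() { printf 'line 1\nline 2\n'; }     # stand-in: two lines, each with a space

count_arguments $(some_program)      # 4: split on spaces and newlines
count_arguments "$(some_program)"    # 1: the whole output as one argument

mapfile -t lines < <(some_program)
count_arguments "${lines[@]}"        # 2: one argument per line

# xargs runs an external command, so sh -c plays the role of count_arguments here.
some_program | tr '\n' '\0' | xargs -0 sh -c 'echo "$#"' _   # 2
```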

Bash shell script, special characters and passing arguments to curl [duplicate]

I have the following problem.
Got a file which includes certain paths/files of a FS.
These for some reason do include the whole range of special characters, like space, single/double quotes, even sometimes the Copyright ASCII.
I need to run each line of the file and pass it to another command.
What I tried so far is:
<input_file xargs -I % command %
Which was working until I got this message from xargs
xargs: unmatched single quote; by default quotes are special to xargs unless you use the -0 option
But using this option did not work at all for me:
xargs: argument line too long
Does anybody have a solution which does work ok with special characters.
Doesn't have to be with xargs, but I need to pass the line as it is to the command.
Many thanks in advance.
You should separate the filenames with the \0 NULL character for processing.
This can be done with
find . <args> -print0 | xargs -0
or, if you must process the file with filenames, change the '\n' to '\0', e.g.
tr '\n' '\0' < filename | xargs -0 -n1 -I% echo "==%=="
the -n 1 says,
-n max-args
Use at most max-args arguments per command line.
and you should enclose the % in double quotes.
The xargs -0 -n1 -I% echo "==%==" solution didn't work for me on my Mac OS X, so I found another one.
<input_with_single_quotes cat | sed "s/'/\\\'/" | xargs -I {} echo {}
This replaces the ' character with \' that works well as an input to the commands in xargs.
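For reference, a sketch of the NUL-delimited variant that avoids -I altogether, so the line is handed to the command as a plain argument. The helper name wrap_lines is hypothetical, and printf stands in for the real command:

```shell
# Hypothetical helper: pass every line of file $1, unmodified, as one argument.
wrap_lines() {
    # With -0, quotes and backslashes in the input are not special to xargs;
    # -n 1 runs the command once per line.
    tr '\n' '\0' < "$1" | xargs -0 -n 1 printf '==%s==\n'
}

list_file=$(mktemp)
printf '%s\n' "it's here" 'say "hi" now' > "$list_file"
wrap_lines "$list_file"
# ==it's here==
# ==say "hi" now==
rm -f "$list_file"
```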

Bash: displaying wc with three digit output?

conducting a word count of a directory.
ls | wc -l
if output is "17", I would like the output to display as "017".
I have played with | printf with little luck.
Any suggestions would be appreciated.
printf is the way to go to format numbers:
printf "There were %03d files\n" "$(ls | wc -l)"
ls | wc -l will tell you how many lines it encountered parsing the output of ls, which may not be the same as the number of (non-dot) filenames in the directory. What if a filename has a newline? One reliable way to get the number of files in a directory is
x=(*)
printf '%03d\n' "${#x[@]}"
But that will only work with a shell that supports arrays. If you want a POSIX compatible approach, use a shell function:
countargs() { printf '%03d\n' $#; }
countargs *
This works because when a glob expands the shell maintains the words in each member of the glob expansion, regardless of the characters in the filename. But when you pipe a filename the command on the other side of the pipe can't tell it's anything other than a normal string, so it can't do any special handling.
You could use sed.
ls | wc -l | sed 's/^17$/017/'
And this applies to all the two digit numbers.
ls | wc -l | sed '/^[0-9][0-9]$/s/.*/0&/'
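Putting the glob-counting idea and printf together, a POSIX-style sketch might look like this. The function name count_files_padded is made up, and the empty-directory check is an addition not covered in the answers:

```shell
# Hypothetical helper: print the number of non-dot entries in directory $1,
# zero-padded to three digits.
count_files_padded() {
    set -- "$1"/*
    # An unmatched glob stays as the literal pattern; treat that as zero entries.
    if [ "$#" -eq 1 ] && [ ! -e "$1" ] && [ ! -L "$1" ]; then
        set --
    fi
    printf '%03d\n' "$#"
}

count_files_padded .
```

Because the count comes from the number of words in the glob expansion, a filename containing a newline is still counted once.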

Counting commas in a line in bash

Sometimes I receive a CSV file which has a carriage return inside a cell. This is not an acceptable format to a program that will use it as input.
In order to detect if an input line is split, I determined that a bad line would not have the expected number of commas in it. Is there a bash or other common unix command line tool that would allow me to count the commas in the line? If necessary, I can write a Python or Perl program to do it, but if possible, I'd like to add a line or two to an existing bash script to cause it to fail if the comma count is wrong. Any ideas?
Strip everything but the commas, and then count number of characters left:
$ echo foo,bar,baz | tr -cd , | wc -c
2
To count the number of times a comma appears, you can use something like awk:
string=(line of input from CSV file)
echo "$string" | awk -F "," '{print NF-1}'
But this really isn't sufficient to determine whether a field has carriage returns in it. Fields can have commas inside as long as they're surrounded by quotes.
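With that caveat in mind, a check that makes a script fail on an unexpected comma count could be sketched as follows. The function name check_commas and its parameters are hypothetical; it counts every comma, including any inside quoted fields:

```shell
# Hypothetical helper: $1 = expected commas per line, $2 = CSV file.
# Prints the offending lines and returns non-zero if any line differs.
check_commas() {
    awk -v expected="$1" '
        { n = gsub(/,/, ",") }    # gsub returns how many commas it matched
        n != expected { printf "line %d: %d commas\n", NR, n; bad = 1 }
        END { exit bad + 0 }
    ' "$2"
}
```

In an existing script this could be used as check_commas 2 input.csv || exit 1.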
What worked for me better than the other solutions was this. If test.txt has:
foo,bar,baz
baz,foo,foobar,bar
Then cat test.txt | xargs -I % sh -c 'echo % | tr -cd , | wc -c' produces
2
3
This works very well for streaming sources, or tailing logs, etc.
In pure Bash:
while IFS=, read -ra array
do
echo "$((${#array[@]} - 1))"
done < inputfile
or
while read -r line
do
count=${line//[^,]}
echo "${#count}"
done < inputfile
Try Perl:
$ perl -ne 'print 0+@{[/,/g]},"\n"'
a
0
a,a
1
a,a,a,a,a
4
Depending on what you are trying to do with the CSV data, it may be helpful to use a wrapper script like csvquote to temporarily replace the problematic newlines (and commas) inside quoted fields, then restore them. For instance:
csvquote inputfile.csv | wc -l
and
csvquote inputfile.csv | cut -d, -f1 | csvquote -u
may be the sort of thing you're looking for. See https://github.com/dbro/csvquote for the code and more information.
An example Python command you could run (since it's going to be installed on most modern systems) is:
python -c "import pathlib; print({l.count(',') for l in pathlib.Path('my_file.csv').read_text().splitlines()})"
This counts the number of commas per line, then makes a set from them (so if your lines all have the same number of commas in, you'll get a set with just that number in).
Just remove all of the carriage returns:
tr -d "\r" < old_file > new_file

perform an operation for *each* item listed by grep

How can I perform an operation for each item listed by grep individually?
Background:
I use grep to list all files containing a certain pattern:
grep -l '<pattern>' directory/*.extension1
I want to delete all listed files but also all files having the same file name but a different extension: .extension2.
I tried using the pipe, but it seems to take the output of grep as a whole.
In find there is the -exec option, but grep has nothing like that.
If I understand your specification, you want:
grep --null -l '<pattern>' directory/*.extension1 | \
xargs -n 1 -0 -I{} bash -c 'rm "$1" "${1%.*}.extension2"' -- {}
This is essentially the same as what @triplee's comment describes, except that it's newline-safe.
What's going on here?
grep with --null will return output delimited with nulls instead of newline. Since file names can have newlines in them delimiting with newline makes it impossible to parse the output of grep safely, but null is not a valid character in a file name and thus makes a nice delimiter.
xargs will take a stream of whitespace-delimited items (blanks and newlines, with quotes and backslashes treated specially) and execute a given command, passing as many of those items as fit (one as each parameter) to the given command (or to echo if no command is given). Thus if you said:
printf 'one\ntwo three \nfour\n' | xargs echo
xargs would execute echo one two three four. This is not safe for file names because file names might contain embedded spaces, quotes, or newlines.
The -0 switch to xargs changes it from splitting on whitespace to splitting on a null delimiter only. This makes it match the output we got from grep --null and makes it safe for processing a list of file names.
Normally xargs simply appends the input to the end of a command. The -I switch to xargs changes this to substituting the input for the specified replacement string. To get the idea try this experiment:
printf 'one\ntwo three \nfour\n' | xargs -I{} echo foo {} bar
And note the difference from the earlier printf | xargs command.
In the case of my solution the command I execute is bash, to which I pass -c. The -c switch causes bash to execute the commands in the following argument (and then terminate) instead of starting an interactive shell. The next block 'rm "$1" "${1%.*}.extension2"' is the first argument to -c and is the script which will be executed by bash. Any arguments following the script argument to -c are assigned as the arguments to the script. Thus, if I were to say:
bash -c 'echo $0' "Hello, world"
Then Hello, world would be assigned to $0 (the first argument to the script) and inside the script I could echo it back.
Since $0 is normally reserved for the script name I pass a dummy value (in this case --) as the first argument and, then, in place of the second argument I write {}, which is the replacement string I specified for xargs. This will be replaced by xargs with each file name parsed from grep's output before bash is executed.
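A quick way to check how the arguments after the script are numbered (the word placeholder is an arbitrary dummy value, playing the same role as -- in the command above):

```shell
# The first argument after the script becomes $0, the next one $1.
bash -c 'printf "0=%s 1=%s\n" "$0" "$1"' placeholder 'two words'
# 0=placeholder 1=two words
```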
The mini shell script might look complicated but it's rather trivial. First, the entire script is single-quoted to prevent the calling shell from interpreting it. Inside the script I invoke rm and pass it two file names to remove: the $1 argument, which was the file name passed when the replacement string was substituted above, and ${1%.*}.extension2. This latter is a parameter substitution on the $1 variable. The important part is %.* which says
% Match from the end of the variable and remove the shortest string matching the pattern.
.* The pattern is a single period followed by anything.
This effectively strips the extension, if any, from the file name. You can observe the effect yourself:
foo='my file.txt'
bar='this.is.a.file.txt'
baz='no extension'
printf '%s\n' "${foo%.*}" "${bar%.*}" "${baz%.*}"
Since the extension has been stripped I concatenate the desired alternate extension .extension2 to the stripped file name to obtain the alternate file name.
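The strip-and-append step can be wrapped in a small function to experiment with. The name swap_ext is hypothetical:

```shell
# Hypothetical helper: $1 = file name, $2 = new extension (without the dot).
swap_ext() {
    # ${1%.*} removes the shortest trailing ".something", if there is one.
    printf '%s\n' "${1%.*}.$2"
}

swap_ext 'my file.txt' bak          # my file.bak
swap_ext 'this.is.a.file.txt' bak   # this.is.a.file.bak
swap_ext 'no extension' bak         # no extension.bak
```

One thing to watch for: the pattern looks at the whole string, so a path such as dir.d/file (a dot in the directory name, none in the file name) would be cut at the directory's dot.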
If this does what you want, pipe the output through /bin/sh.
grep -l 'RE' folder/*.ext1 | sed 's/\(.*\)\.ext1$/rm "&" "\1.ext2"/'
Or if sed makes you itchy:
grep -l 'RE' folder/*.ext1 | while IFS= read -r file; do
echo rm "$file" "${file%.ext1}.ext2"
done
Remove echo if the output looks like the commands you want to run.
But you can do this with find as well:
find /path/to/start -name \*.ext1 -exec grep -q 'RE' {} \; -print | ...
where ... is either the sed script or the three lines from while to done.
The idea here is that find will ... well, "find" things based on the qualifiers you give it -- namely, that things match the file glob "*.ext1", AND that the result of the "exec" is successful. The -q tells grep to look for RE in {} (the file supplied by find), and exit with a TRUE or FALSE without generating any of its own output.
The only real difference between doing this in find vs doing it with grep is that you get to use find's awesome collection of conditions to narrow down your search further if required. man find for details. By default, find will recurse into subdirectories.
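The pipe can also be avoided by letting find run the second step itself, which sidesteps parsing file names from a stream. This is a dry-run sketch only: the function name list_pair_removals is made up, and it prints the rm commands instead of running them:

```shell
# Hypothetical helper: $1 = pattern, $2 = directory to search (recursively).
# Prints one "rm" line per matching .ext1 file and its .ext2 sibling.
list_pair_removals() {
    find "$2" -type f -name '*.ext1' -exec grep -q -e "$1" {} \; \
        -exec sh -c 'for f in "$@"; do
                         printf "rm %s %s\n" "$f" "${f%.ext1}.ext2"
                     done' _ {} +
}
```

Once the printed list looks right, the printf line could be swapped for an actual rm -- "$f" "${f%.ext1}.ext2".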
You can pipe the list to xargs:
grep -l '<pattern>' directory/*.extension1 | xargs rm
As for the second set of files with a different extension, I'd do this (as usual use xargs echo rm when testing to make a dry run; I haven't tested it, it may not work correctly with filenames with spaces in them):
filelist=$(grep -l '<pattern>' directory/*.extension1)
echo $filelist | xargs rm
echo ${filelist//.extension1/.extension2} | xargs rm
Pipe the result to xargs, it will allow you to run a command for each match.
