Bash: print each input string on a new line

With xargs, I give echo several strings to print. Is there any way to display each given string on a new line?
find . something -print0 | xargs -r0 echo
I'm sure this is super simple, but I can't find the right key words to google it, apparently.

Here's one way with echo to get what you want.
find . something -print0 | xargs -r0 -n1 echo
-n1 tells xargs to consume 1 command line argument with each invocation of the command (in this case, echo)
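For example, with arbitrary input strings:
echo one two three | xargs -n1 echo
invokes echo three times and prints one, two, and three on separate lines.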

find . something -print0 | xargs -r0 printf "%s\n"
Clearly, find can also print one per line happily without help from xargs, but presumably this is only part of what you're after. The -r option makes sense; it doesn't run the printf command if there's nothing to print, which is a useful and sensible extension in GNU xargs compared with POSIX xargs which always requires the command to be run at least once. (The -print0 option to find and the -0 option to xargs are also GNU extensions compared with POSIX.)
Most frequently these days, you don't need to use xargs with find, because POSIX find supports the + terminator to -exec in place of the legacy ;, which means you can write:
find . something -exec printf '%s\n' {} +
When the + is used, find collects as many arguments as will fit conveniently on a command line and invokes the command with those arguments. In this case, there isn't much point to using printf (or echo), but in general, this handles the arguments correctly, one argument per file name, even if the file name contains blanks, tabs, newlines, and other awkward characters.
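As a quick sanity check in a scratch directory (directory and filenames made up):
mkdir -p /tmp/demo && cd /tmp/demo
touch plain.txt 'with space.txt'
find . -type f -exec printf '%s\n' {} +
Each name is printed on its own line, embedded space and all.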

Human-readable filesize and line count

I want a bash command that will return a table, where each row is the human-readable filesize, number of lines, and filename. The table should be sorted by filesize.
I've been trying to do this using a combination of du -hs, wc -l, and sort -h, and find.
Here's where I'm at:
find . -exec echo $(du -h {}) $(wc -l {}) \; | sort -h
Your approach fell short not only because the shell expanded your command substitutions ($(...)) up front, but more fundamentally because you cannot pass shell command lines directly to find:
find's -exec action can only invoke external utilities with literal arguments - the only non-literal argument supported is the {} representing the filename(s) at hand.
choroba's answer fixes your immediate problem by invoking a separate shell instance in each iteration, to which the shell command to execute is passed as a string argument (-exec bash -c '...' \;).
While this works (assuming you pass the {} value as an argument rather than embedding it in the command-line string), it is also quite inefficient, because multiple child processes are created for each input file.
(While there is a way to have find pass (typically) all input files to a (typically) single invocation of the specified external utility - namely with terminator + rather than \;, this is not an option here due to the nature of the command line passed.)
An efficient and robust[1] implementation that minimizes the number of child processes created would look like this:
Note: I'm assuming GNU utilities here, due to use of head -n -1 and sort -h.
Also, I'm limiting find's output to files only (as opposed to directories), because wc -l only works on files.
paste <(find . -type f -exec du -h {} +) <(find . -type f -exec wc -l {} + | head -n -1) |
awk -F'\t *' 'BEGIN{OFS="\t"} {sub(" .+$", "", $3); print $1,$2,$3}' |
sort -h -t$'\t' -k1,1
Note the use of -exec ... + rather than -exec ... \;, which ensures that typically all input filenames are passed to a single invocation to the external utility (if not all filenames fit on a single command line, invocations are batched efficiently to make as few calls as possible).
wc -l {} + appends each filename after its line count and, when given multiple files (as it is here), also outputs a trailing summary line, which head -n -1 strips away.
paste combines the lines from each command (whose respective inputs are provided by a process substitution, <(...)) into a single output stream.
The awk command then strips the extraneous filename that stems from wc from the end of each line.
Finally, the sort command sorts the result by the 1st (-k1,1) tab-separated (-t$'\t') column by human-readable numbers (-h), such as the numbers that du -h outputs (e.g., 1K).
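For reference, GNU sort -h orders such human-readable sizes numerically where a plain lexical sort would not:
printf '2M\n512\n3G\n1K\n' | sort -h
outputs 512, 1K, 2M, 3G, whereas a plain sort would put 512 last.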
[1] As with any line-oriented processing, filenames with embedded newlines are not supported, but I do not consider this a real-world problem.
OK, I tried it with find/-exec as well, but the escaping is hell. With a shell function it works pretty straightforwardly:
#!/bin/bash
function dir
{
    du=$(du -sh "$1" | awk '{print $1}')
    wc=$(wc -l < "$1")
    printf "%10s %10s %s\n" $du $wc "${1#./}"
}
printf "%10s %10s %s\n" "size" "lines" "name"
OIFS=$IFS; IFS=""
find . -type f -print0 | while read -r -d $'\0' f; do dir "$f"; done
IFS=$OIFS
Using the bash-specific read -d it is even kind of safe, thanks to the NUL terminator. The empty IFS is needed to keep read from truncating trailing blanks in filenames.
BTW: $'\0' does not really work (same as '') - but it makes the intention clear.
Sample output:
      size      lines name
      156K        708 sash
       16K         64 hostname
      120K        460 netstat
       40K        110 fuser
      644K       1555 dir/bash
       28K         82 keyctl
      2.3M       8067 vim
The problem is that your shell interprets the $(...), so find doesn't get them. Escaping them doesn't help, either (\$\(du -h {}\)), as they become normal parameters to the commands, not command substitution.
The way to have them interpreted as command substitutions is to call a new shell, either directly
find . -exec bash -c 'echo $(du -h {}) $(wc -l {})' \; | sort -h
or by creating a script and calling it from find.
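A variant that passes the filename as an argument instead of embedding {} in the script string (safer with unusual filenames, per the caveat above; a sketch):
find . -type f -exec bash -c 'echo $(du -h "$1") $(wc -l < "$1")' _ {} \; | sort -h
Here _ fills $0 and each filename arrives as $1, so quoting inside the script works as expected.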

Bash shell script, special characters and passing arguments to curl [duplicate]

I have the following problem.
Got a file which includes certain paths/files of a FS.
These, for some reason, include the whole range of special characters, like space, single/double quotes, sometimes even the copyright sign.
I need to run each line of the file and pass it to another command.
What I tried so far is:
<input_file xargs -I % command %
Which was working until I got this message from xargs
xargs: unmatched single quote; by default quotes are special to xargs unless you use the -0 option
But using this option did not work at all for me:
xargs: argument line too long
Does anybody have a solution that works with special characters?
Doesn't have to be with xargs, but I need to pass the line as it is to the command.
Many thanks in advance.
You should separate the filenames with the \0 NULL character for processing.
This can be done with
find . <args> -print0 | xargs -0
or, if you must process the file with filenames, change the '\n' to '\0', e.g.
tr '\n' '\0' < filename | xargs -0 -n1 -I% echo "==%=="
the -n 1 says,
-n max-args
Use at most max-args arguments per command line.
and you should use quotes to enclose the %, as in "==%==" above.
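As a quick test with a hypothetical file list containing an unmatched single quote (-n1 omitted, since -I already implies one invocation per item):
printf "it's a file.txt\nplain.txt\n" > files.txt
tr '\n' '\0' < files.txt | xargs -0 -I% echo "==%=="
This prints ==it's a file.txt== and ==plain.txt== with no complaints about quoting.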
The xargs -0 -n1 -I% echo "==%==" solution didn't work for me on my Mac OS X, so I found another one.
<input_with_single_quotes cat | sed "s/'/\\\'/" | xargs -I {} echo {}
This replaces the ' character with \' that works well as an input to the commands in xargs.

In bash, how to batch show the text of certain line in files?

I want to batch-print the text of a certain line of the files in a certain directory; usually this can be done with the following commands:
for file in `find ./ -name "results.txt"`;
do
    sed -n '12p' < ${file};
done
On the 12th line of each file named "results.txt" is the text I want to output.
But, I wonder that if we can use the pipeline command to do this operation. I have tried the following command:
find ./ -name "results.txt" | xargs sed -n '12p'
or
find ./ -name "results.txt" | xargs sed -n '12p' < {} \;
But neither works.
Could you give some advice or recommend some references, please?
All suggestions are welcome. Thanks in advance!
This should do it
find ./ -name results.txt -exec sed '12!d' {} ';'
@Steven Penny's answer is the most elegant and best-performing solution, but to shed some light on why your solution didn't work:
find ./ -name "results.txt" | xargs sed -n '12p'
causes all filenames(1) to be passed at once(2) to sed. Since sed counts lines cumulatively, across input files, only 1 line will be printed for all input files, namely line 12 from the first input file.
Keeping in mind that find's -exec action is the best solution, if you still wanted to solve this problem with xargs, you'd have to use xargs's -I option as follows, so as to ensure that sed is called once per input line (filename) (% is a self-chosen placeholder; '12q;d' prints line 12 and then quits, so sed doesn't needlessly read the rest of each file):
find ./ -name "results.txt" | xargs -I % sed -n '12q;d' %
Footnotes:
(1) with word splitting applied, which would break with paths with embedded spaces, but that's a separate issue.
(2) assuming they don't make the entire command exceed the max. length of a command line; either way, multiple filenames are passed at once.
As an aside: parsing command output with for as in your first snippet is NEVER a good idea - see http://mywiki.wooledge.org/ParsingLs and http://mywiki.wooledge.org/BashFAQ/001
Your use of xargs results in running sed with multiple file arguments. But as you can see, sed doesn't reset the record number to 1 when it starts reading a new file. For example, try running the following command against files with more than 12 lines each.
sed -n '12p' x.txt y.txt
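To see the cumulative numbering on a small scale (scratch files, smaller line number for brevity):
printf 'a\nb\n' > x.txt
printf 'c\nd\n' > y.txt
sed -n '3p' x.txt y.txt
This prints c - line 3 of the combined stream, not line 3 of either file.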
If you want to use xargs, you might consider using awk:
find . -name 'results.txt' | xargs awk 'FNR==12'
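If you have GNU awk, you can also stop reading each file as soon as the line has been printed (a minor optimization; nextfile is a gawk extension):
find . -name 'results.txt' | xargs awk 'FNR==12 {print; nextfile}'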
P.S: I personally like using the for loop.

Is there a grep equivalent for find's -print0 and xargs's -0 switches?

I often want to write commands like this (in zsh, if it's relevant):
find <somebasedirectory> | \
grep stringinfilenamesIwant | \
grep -v stringinfilesnamesIdont | \
xargs dosomecommand
(or more complex combinations of greps)
In recent years find has added the -print0 switch, and xargs has added -0, which allow handling of files with spaces in the name in an elegant way, by NUL-terminating filenames instead of newline-terminating them, allowing for this:
find <somebasedirectory> -print0 | xargs -0 dosomecommand
However, grep (at least the version I have, GNU grep 2.10 on Ubuntu), doesn't seem to have an equivalent to consume and generate null-terminated lines; it has --null, but that only seems related to using -l to output names when searching in files directly with grep.
Is there an equivalent option or combination of options I can use with grep? Alternatively, is there an easy and elegant way to express my pipe of commands simply using find's -regex, or perhaps Perl?
Use GNU Grep's --null Flag
According to the GNU Grep documentation, you can use Output Line Prefix Control to handle ASCII NUL characters the same way as find and xargs.
-Z
--null
Output a zero byte (the ASCII NUL character) instead of the character that normally follows a file name. For example, ‘grep -lZ’ outputs a zero byte after each file name instead of the usual newline. This option makes the output unambiguous, even in the presence of file names containing unusual characters like newlines. This option can be used with commands like ‘find -print0’, ‘perl -0’, ‘sort -z’, and ‘xargs -0’ to process arbitrary file names, even those that contain newline characters.
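For example, a minimal sketch of the -lZ / -0 pairing (pattern and glob are placeholders; echo keeps it a dry run):
grep -lZ 'TODO' ./*.c | xargs -0 echo rm --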
Use tr from GNU Coreutils
As the OP correctly points out, this flag is most useful when handling filenames on input or output. In order to actually convert grep output to use NUL characters as line endings, you'd need to use a tool like sed or tr to transform each line of output. For example:
find /etc/passwd -print0 |
xargs -0 egrep -Z 'root|www' |
tr "\n" "\0" |
xargs -0 -n1
This pipeline will use NULs to separate filenames from find, and then convert newlines to NULs in the strings returned by egrep. This will pass NUL-terminated strings to the next command in the pipeline, which in this case is just xargs turning the output back into normal strings, but it could be anything you want.
As you are already using GNU find, you can use its internal regular-expression pattern-matching capabilities instead of those greps, e.g.:
find <somebasedirectory> -regex ".*stringinfilenamesIwant.*" ! -regex ".*stringinfilesnamesIdont.*" -exec dosomecommand {} +
Use
find <somebasedirectory> -print0 | \
grep -z stringinfilenamesIwant | \
grep -zv stringinfilesnamesIdont | \
xargs -0 dosomecommand
However, the pattern may not contain newline, see bug report.
The newest version of the GNU grep source can now use -z/--null to separate the output by null characters, while it previously only worked in conjunction with -l:
http://git.savannah.gnu.org/cgit/grep.git/commit/?id=cce2fd5520bba35cf9b264de2f1b6131304f19d2
This means that your issue is solved automatically when using the newest version.
Instead of using a pipe, you can use find's -exec with the + terminator. To chain multiple commands together, you can spawn a shell in -exec.
find ./ -type f -exec bash -c 'printf "%s\n" "$@" | grep stringinfilenamesIwant | grep -v stringinfilesnamesIdont | xargs dosomecommand' -- {} +
find <somebasedirectory> -print0 | xargs -0 -I % grep something '%'

perform an operation for *each* item listed by grep

How can I perform an operation for each item listed by grep individually?
Background:
I use grep to list all files containing a certain pattern:
grep -l '<pattern>' directory/*.extension1
I want to delete all listed files but also all files having the same file name but a different extension: .extension2.
I tried using the pipe, but it seems to take the output of grep as a whole.
In find there is the -exec option, but grep has nothing like that.
If I understand your specification, you want:
grep --null -l '<pattern>' directory/*.extension1 | \
xargs -n 1 -0 -I{} bash -c 'rm "$1" "${1%.*}.extension2"' -- {}
This is essentially the same as what @triplee's comment describes, except that it's newline-safe.
What's going on here?
grep with --null will return output delimited with NULs instead of newlines. Since file names can have newlines in them, delimiting with newlines makes it impossible to parse the output of grep safely, but NUL is not a valid character in a file name and thus makes a nice delimiter.
xargs will take a stream of whitespace-delimited items and execute a given command, passing as many of those items as possible (one per parameter) to that command (or to echo if no command is given). Thus if you said:
printf 'one\ntwo three \nfour\n' | xargs echo
xargs would execute echo one two three four, since by default unquoted blanks separate items just as newlines do. This is not safe for file names because they may contain spaces and, again, might even contain embedded newlines.
The -0 switch to xargs changes it from looking for a newline delimiter to a null delimiter. This makes it match the output we got from grep --null and makes it safe for processing a list of file names.
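A quick way to see the effect (the items are arbitrary):
printf 'two words\0second\0' | xargs -0 -n1 echo
prints two words on one line and second on the next - the space-containing item stays intact.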
Normally xargs simply appends the input to the end of a command. The -I switch to xargs changes this to substituting the input for the specified replacement string. To get the idea try this experiment:
printf 'one\ntwo three \nfour\n' | xargs -I{} echo foo {} bar
And note the difference from the earlier printf | xargs command.
In the case of my solution the command I execute is bash, to which I pass -c. The -c switch causes bash to execute the commands in the following argument (and then terminate) instead of starting an interactive shell. The next block 'rm "$1" "${1%.*}.extension2"' is the first argument to -c and is the script which will be executed by bash. Any arguments following the script argument to -c are assigned as the arguments to the script. Thus, if I were to say:
bash -c 'echo $0' "Hello, world"
Then Hello, world would be assigned to $0 (the first argument to the script) and inside the script I could echo it back.
Since $0 is normally reserved for the script name I pass a dummy value (in this case --) as the first argument and, then, in place of the second argument I write {}, which is the replacement string I specified for xargs. This will be replaced by xargs with each file name parsed from grep's output before bash is executed.
The mini shell script might look complicated but it's rather trivial. First, the entire script is single-quoted to prevent the calling shell from interpreting it. Inside the script I invoke rm and pass it two file names to remove: the $1 argument, which was the file name passed when the replacement string was substituted above, and ${1%.*}.extension2. This latter is a parameter substitution on the $1 variable. The important part is %.* which says
% "Match from the end of the variable and remove the shortest string matching the pattern.
.* The pattern is a single period followed by anything.
This effectively strips the extension, if any, from the file name. You can observe the effect yourself:
foo='my file.txt'
bar='this.is.a.file.txt'
baz='no extension'
printf '%s\n'"${foo%.*}" "${bar%.*}" "${baz%.*}"
Since the extension has been stripped I concatenate the desired alternate extension .extension2 to the stripped file name to obtain the alternate file name.
The following prints the rm commands that would be run; if the output looks like the commands you want, pipe it on through /bin/sh to execute them:
grep -l 'RE' folder/*.ext1 | sed 's/\(.*\)\.ext1/rm "&" "\1.ext2"/'
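For instance, given folder/a.ext1, the sed command above emits (illustrative):
rm "folder/a.ext1" "folder/a.ext2"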
Or if sed makes you itchy:
grep -l 'RE' folder/*.ext1 | while read file; do
    echo rm "$file" "${file%.ext1}.ext2"
done
Remove echo if the output looks like the commands you want to run.
But you can do this with find as well:
find /path/to/start -name \*.ext1 -exec grep -q 'RE' {} \; -print | ...
where ... is either the sed script or the three lines from while to done.
The idea here is that find will ... well, "find" things based on the qualifiers you give it -- namely, that they match the file glob "*.ext1", AND that the result of the -exec is successful. The -q tells grep to look for RE in {} (the file supplied by find) and to exit TRUE or FALSE without generating any output of its own.
The only real difference between doing this in find vs doing it with grep is that you get to use find's awesome collection of conditions to narrow down your search further if required. man find for details. By default, find will recurse into subdirectories.
You can pipe the list to xargs:
grep -l '<pattern>' directory/*.extension1 | xargs rm
As for the second set of files with a different extension, I'd do this (as usual, use xargs echo rm when testing to make a dry run; I haven't tested it, and it may not work correctly with filenames that have spaces in them):
filelist=$(grep -l '<pattern>' directory/*.extension1)
echo $filelist | xargs rm
echo ${filelist//.extension1/.extension2} | xargs rm
Pipe the result to xargs; it will allow you to run a command for each match.
