How to wrap output lines in quotes in bash? - bash

Essentially I want the inverse of the operation performed in this question.
I'm running a search, looking for files that have Windows line endings (\r\n) as I want to remove them.
$ grep -URl ^M .
Some of the returned files have spaces in their names:
./file name 1.txt
./file name 2.txt
In order to pass this on to another tool via xargs, I need to quote the lines. How can I transform the output into this instead:
"./file name 1.txt"
"./file name 2.txt"

BSD grep provides a --null option to print names followed by a null byte (instead of a newline).
GNU grep provides a -Z or --null option with the same semantics.
Both BSD and GNU xargs take a -0 option to indicate that file names are separated by null bytes.
Hence:
grep -URl --null ^M . | xargs -0 ...
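For example, to strip the carriage returns in place, a minimal sketch assuming GNU sed for the -i flag (dos2unix would do the same job):
grep -URl --null $'\r' . | xargs -0 sed -i 's/\r$//'
Here $'\r' is bash's ANSI-C quoting for a literal carriage return, the same character as the ^M above.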

To avoid the space problem I'd use the newline character as the separator for xargs, via its -d option:
xargs -d '\n' ...
So my solution would be:
grep -URl ^M . | xargs -d '\n' rm
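Note that -d is a GNU xargs extension; BSD xargs (including the one shipped with macOS) does not support it. If dos2unix is available, the whole cleanup could then look like this sketch:
grep -URl ^M . | xargs -d '\n' dos2unix  # assumes GNU xargs and dos2unix installed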

Related

Handle files with space in filename and output file names

I need to write a Bash script that achieves the following goals:
1) move the newest n pdf files from folder 1 to folder 2;
2) correctly handles files that could have spaces in file names;
3) output each file name in a specific position in a text file. (In my actual usage, I will use sed to put the file names in a specific position of an existing file.)
I tried to make an array of filenames and then move them and do text output in a loop. However, the following array cannot handle files with spaces in filename:
pdfs=($(find -name "$DOWNLOADS/*.pdf" -print0 | xargs -0 ls -1 -t | head -n$NUM))
Suppose a file has name "Filename with Space". What I get from the above array will have "with" and "Space" in separate array entries.
I am not sure how to avoid these words in the same filename being treated separately.
Can someone help me out?
Thanks!
-------------Update------------
Sorry for being vague on the third point as I thought I might be able to figure that out after achieving the first and second goals.
Basically, it is a text file that has a line starting with "%comment" near the end, and I will need to insert the filenames before that line in the format "file=PATH".
The PATH is the folder 2 that I have my pdfs moved to.
You can achieve this using mapfile in conjunction with the GNU versions of find, sort, cut and head, which have options to operate on NUL-terminated filenames:
mapfile -d '' -t pdfs < <(find "$DOWNLOADS" -name '*.pdf' -printf '%T@:%p\0' |
sort -z -t : -rnk1 | cut -z -d : -f2- | head -z -n "$NUM")
Commands used are:
mapfile -d '': reads the array using NUL as the delimiter
find: outputs each file's modification time in seconds since epoch + ":" + filename + a NUL byte
sort: sorts reverse numerically on the 1st field
cut: removes the 1st field from the output
head: outputs only the first $NUM filenames
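Once the array is populated, goals 1) and 2) reduce to a single quoted expansion. A minimal sketch, where $dest is a hypothetical variable holding folder 2:
dest='folder2'  # hypothetical destination folder
mv -- "${pdfs[@]}" "$dest"
The -- guards against any filename that starts with a dash.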
find downloads -name "*.pdf" -printf "%T@ %p\0" |
sort -z -t' ' -k1 -n |
cut -z -d' ' -f2- |
tail -z -n 3
find all *.pdf files in downloads
for each file, print its modification date (%T with the @ format specifier, meaning seconds since epoch with a fractional part), then a space, then the filename, terminated with \0
sort the NUL-separated stream numerically on the first field, using space as the field separator
remove the first field from the stream, i.e. the modification date, leaving only filenames
take the newest files, in this example the 3 newest, by using tail. We could also sort in reverse and use head; no difference.
Don't use ls in scripts; ls is for nicely formatted human output, not for parsing. You could do xargs -0 stat --printf "%Y %n\0" instead, which would serve the same purpose, only I couldn't make stat output the fractional part of the modification date.
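A sketch of that stat-based variant, assuming GNU stat (%Y prints whole seconds, so files modified within the same second sort in arbitrary order):
find downloads -name "*.pdf" -print0 |
xargs -0 stat --printf "%Y %n\0" |
sort -z -k1 -n |
cut -z -d' ' -f2- |
tail -z -n 3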
As for the second part, we need to save the NUL-delimited list to a file
find downloads ........ >"$tmp"
and then:
str='%comment'
{
grep -B$((2**32)) -x "$str" "$out" | grep -v "$str"
# I don't know what you expect to do with newlines in filenames, but I guess you don't have those
cat "$tmp" | sed -z 's/^/file=/' | sed 's/\x0/\n/g'
grep -A$((2**32)) -x "$str" "$out"
} | sponge "$out"
assuming the output file name is stored in the variable "$out". Step by step:
filter all lines before the %comment line and remove the %comment line itself from the file
output each filename with file= prepended; the NUL separators are also converted to newlines
then output all lines from %comment onward, including the %comment line itself
write the output back to "$out". sponge (from moreutils) absorbs all of its input before opening the output file, which is what makes rewriting "$out" in place safe here.
Don't use pdfs=($(...)) on NUL-separated input. You can use mapfile to store it in an array, as other answers show.
Then, to move the files, do something like:
<"$tmp" xargs -0 -i mv {} "$outdir"
or faster, with a single move:
{ cat <"$tmp"; printf "%s\0" "$outdir"; } | xargs -0 mv
or alternatively:
<"$tmp" xargs -0 sh -c 'outdir="$1"; shift; mv "$#" "$outdir"' -- "$outdir"
Live example at tutorialspoint.
I suppose the following code will be close to what you want:
IFS=$'\n' pdfs=($(find "$DOWNLOADS" -name '*.pdf' -print0 | xargs -0 -I{} ls -lt "{}" | tail -n +1 | head -n "$NUM"))
Then you can access the output through ${pdfs[0]}, ${pdfs[1]}, ...
Explanations
IFS=$'\n' makes the following line be split only on "\n".
The -I{} option tells xargs to substitute {} with each filename, so that the name can be quoted as "{}".
tail -n +1 is a trick to suppress the error message "xargs: 'ls' terminated by signal 13".
Hope this helps.
Bash v4 has an option globstar; after enabling it, we can use ** to match files in zero or more levels of subdirectories.
mapfile is a built-in command for reading lines into an indexed array variable. The -t option removes the trailing newline from each line.
shopt -s globstar
mapfile -t pdffiles < <(ls -t1 **/*.pdf | head -n"$NUM")
typeset -p pdffiles
for f in "${pdffiles[#]}"; do
echo "==="
mv "${f}" /dest/path
sed "/^%comment/i${f}=/dest/path" a-text-file.txt
done
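Note that the sed call above prints the modified text to stdout; with GNU sed, add -i if you want a-text-file.txt edited in place. Also, parsing ls -t1 output breaks on filenames containing newlines, so this variant is only as robust as your filenames.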

Bash shell script, special characters and passing arguments to curl [duplicate]

I have the following problem.
I've got a file which includes certain paths/files of a filesystem.
For some reason these include the whole range of special characters: space, single/double quotes, sometimes even the copyright sign.
I need to run each line of the file and pass it to another command.
What I tried so far is:
<input_file xargs -I % command %
Which was working until I got this message from xargs:
xargs: unmatched single quote; by default quotes are special to xargs unless you use the -0 option
But using this option did not work at all for me:
xargs: argument line too long
Does anybody have a solution that works with special characters?
Doesn't have to be with xargs, but I need to pass the line as it is to the command.
Many thanks in advance.
You should separate the filenames with the NUL (\0) character for processing.
This can be done with
find . <args> -print0 | xargs -0
or, if you must process a file of filenames, change the '\n' to '\0', e.g.
tr '\n' '\0' < filename | xargs -0 -n1 -I% echo "==%=="
The -n 1 says:
-n max-args
Use at most max-args arguments per command line.
and you should use quotes around the %, as in "==%==" above.
The xargs -0 -n1 -I% echo "==%==" solution didn't work for me on Mac OS X, so I found another one:
<input_with_single_quotes cat | sed "s/'/\\\'/" | xargs -I {} echo {}
This replaces the ' character with \', which works well as input to the commands in xargs.
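If neither xargs variant cooperates, a plain read loop avoids xargs's quote handling entirely. A minimal sketch, assuming one path per line in input_file, with command standing in for the real tool as in the question:
while IFS= read -r line; do
    command "$line"   # each line arrives as a single argument, quotes and spaces intact
done < input_file
The -r stops read from interpreting backslashes, and the empty IFS preserves leading and trailing whitespace.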

Find all files with text "example.html" and replace with "example.php" works only if no spaces are in file name

I have used the following to do a recursive find and replace within files, to update hrefs to point to a new page correctly:
#!/bin/bash
oldstring='features.html'
newstring='features.php'
grep -rl $oldstring public_html/ | xargs sed -i s#"$oldstring"#"$newstring"#g
It worked, except for a few files that had spaces in the name.
This isn't a blocker, as the files with spaces in their names are backups/duplicates I created while testing. But I'd like to understand how I could properly pass paths with spaces to the sed command here. Would anybody know how this could be corrected in this one-liner?
find public_html/ -type f -exec grep -q "$oldstring" {} \; -print0 |
xargs -0 sed -i '' s#"$oldstring"#"$newstring"#g
find will print all the filenames for which the grep command is successful. I use the -print0 option to print them with the NUL character as the delimiter. This goes with the -0 option to xargs, which treats NUL as the argument delimiter on its input, rather than breaking the input at whitespace.
Actually, you don't even need grep and xargs, just run sed from find:
find public_html/ -type f -exec sed -i '' s#"$oldstring"#"$newstring"#g {} +
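The + terminator batches as many filenames as fit onto each sed command line, so this behaves much like xargs but without the filenames ever passing through a text stream that needs delimiting.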
Here's a lazy approach:
grep -rl "$oldstring" public_html/ | xargs -d'\n' sed -i "s#$oldstring#$newstring#g"
By default, xargs uses whitespace as the delimiter of arguments coming from the input. So for example if you have two files, a b and c, then it will execute the command:
sed -i 's/.../.../' a b c
By telling xargs explicitly to use newline as the delimiter with -d '\n', it will correctly treat a b as a single argument, as if it had been quoted when running the command:
sed -i 's/.../.../' 'a b' c
I called it a lazy approach because, as @Barmar pointed out, this won't work if your files have newline characters in their names. If you need to handle such cases, use @Barmar's method with find ... -print0 and xargs -0 ...
PS: I also changed s#"$oldstring"#"$newstring"#g to "s#$oldstring#$newstring#g", which is equivalent but more readable.

Is there a grep equivalent for find's -print0 and xargs's -0 switches?

I often want to write commands like this (in zsh, if it's relevant):
find <somebasedirectory> | \
grep stringinfilenamesIwant | \
grep -v stringinfilesnamesIdont | \
xargs dosomecommand
(or more complex combinations of greps)
In recent years find has added the -print0 switch, and xargs has added -0, which allow handling of files with spaces in the name in an elegant way by null-terminating filenames instead, allowing for this:
find <somebasedirectory> -print0 | xargs -0 dosomecommand
However, grep (at least the version I have, GNU grep 2.10 on Ubuntu), doesn't seem to have an equivalent to consume and generate null-terminated lines; it has --null, but that only seems related to using -l to output names when searching in files directly with grep.
Is there an equivalent option or combination of options I can use with grep? Alternatively, is there an easy and elegant way to express my pipe of commands simply using find's -regex, or perhaps Perl?
Use GNU Grep's --null Flag
According to the GNU Grep documentation, you can use Output Line Prefix Control to handle ASCII NUL characters the same way as find and xargs.
-Z
--null
Output a zero byte (the ASCII NUL character) instead of the character that normally follows a file name. For example, ‘grep -lZ’ outputs a zero byte after each file name instead of the usual newline. This option makes the output unambiguous, even in the presence of file names containing unusual characters like newlines. This option can be used with commands like ‘find -print0’, ‘perl -0’, ‘sort -z’, and ‘xargs -0’ to process arbitrary file names, even those that contain newline characters.
Use tr from GNU Coreutils
As the OP correctly points out, this flag is most useful when handling filenames on input or output. In order to actually convert grep output to use NUL characters as line endings, you'd need to use a tool like sed or tr to transform each line of output. For example:
find /etc/passwd -print0 |
xargs -0 egrep -Z 'root|www' |
tr "\n" "\0" |
xargs -0 -n1
This pipeline will use NULs to separate filenames from find, and then convert newlines to NULs in the strings returned by egrep. This will pass NUL-terminated strings to the next command in the pipeline, which in this case is just xargs turning the output back into normal strings, but it could be anything you want.
As you are already using GNU find, you can use its internal regular-expression matching instead of the greps, e.g.:
find <somebasedirectory> -regex ".*stringinfilenamesIwant.*" ! -regex ".*stringinfilesnamesIdont.*" -exec dosomecommand {} +
Use
find <somebasedirectory> -print0 | \
grep -z stringinfilenamesIwant | \
grep -zv stringinfilesnamesIdont | \
xargs -0 dosomecommand
However, the pattern may not contain a newline; see this bug report.
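(GNU grep's lowercase -z / --null-data option makes grep treat NUL as the line terminator on both input and output, which is what lets it sit between find -print0 and xargs -0 in this pipeline.)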
The newest version of the GNU grep source can now use -z/--null to separate the output by null characters, while it previously only worked in conjunction with -l:
http://git.savannah.gnu.org/cgit/grep.git/commit/?id=cce2fd5520bba35cf9b264de2f1b6131304f19d2
This means that your issue is solved automatically when using the newest version.
Instead of using a pipe, you can use find's -exec with the + terminator. To chain multiple commands together, you can spawn a shell in -exec.
find ./ -type f -exec bash -c 'printf "%s\n" "$@" | grep -v something | xargs dosomething' -- {} +
find <somebasedirectory> -print0 | xargs -0 -I % grep something '%'

Bash: escape characters in backticks

I'm trying to escape characters within backticks in my bash command, mainly to handle spaces in filenames which cause my command to fail.
The command I have so far is:
grep -Li badword `grep -lr goodword *`
This command should result in a list of files that do not contain the word "badword" but do contain "goodword".
Your approach, even if you get the escaping right, will run into problems when the number of files output by the goodword grep exceeds the limit on command-line length. It is better to pipe the output of the first grep into a second grep, like this:
grep -lr -- goodword * | xargs grep -Li -- badword
This will correctly handle files with spaces in them, but it will fail if a file name has a newline in it. At least GNU grep and xargs support separating the file names with NUL bytes, like this
grep -lrZ -- goodword * | xargs -0 grep -Li -- badword
EDIT: Added double dashes -- to grep invocations to avoid the case when some file names start with - and would be interpreted by grep as additional options.
How about rewriting it to:
grep -lr goodword * | xargs grep -Li badword
