How to check whether the latest two files are identical or not with shell? - shell

I want to check whether the latest two files are different or not.
This is my code; it does not work.
#!/bin/bash
set -x -e
function test() {
ls ~/Downloads/* -t | head -n 2 | xargs cmp -s
echo $?
}
test
Thanks.

Assuming that you have GNU find and GNU sort:
#!/bin/bash
# ^^^^ - not /bin/sh, which lacks <()
{
    # each record is "<epoch-timestamp> <filename><NUL>"; discard the timestamp
    IFS= read -r -d ' ' _ && IFS= read -r -d '' file1
    IFS= read -r -d ' ' _ && IFS= read -r -d '' file2
} < <(find ~/Downloads -mindepth 1 -maxdepth 1 -type f -printf '%T@ %p\0' | sort -r -n -z)
if cmp -s -- "$file1" "$file2"; then
echo "Files are identical"
else
echo "Files differ"
fi
If your operating system is macOS and you have GNU findutils and coreutils installed through MacPorts, Homebrew or Fink, you might need to replace the find with gfind and the sort with gsort to get the GNU rather than BSD implementations of these tools.
Key points here:
find is asked to emit a stream in the form of [epoch-timestamp] [filename][NULL]. This is done because NUL is the only character that cannot exist in a pathname.
sort is asked to sort this stream numerically.
The first two items of the stream are read into shell variables.
Using the -- argument to cmp after options and before positional arguments ensures that filenames can never be parsed as options, even if they were to start with -.
So, why not use ls -t? Consider (as an example) what happens if you have a file created with the command touch $'hello\nworld', with a literal newline partway through its name; depending on your version of ls, it may be emitted as hello?world, hello^Mworld or hello\nworld (in any of these cases, a filename that doesn't actually exist if treated as literal and not glob-expanded), or as two lines, hello and world. Either way, the rest of your pipeline is broken. Even something as simple as a filename with spaces will break xargs with default options, as will filenames with literal quotes; xargs is only truly safe when used with -0 to treat its input as NUL-delimited (though the GNU extension -d $'\n' is less unsafe than the defaults).
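If you want to see the failure mode for yourself, here is a minimal sketch (it assumes GNU find and an empty scratch directory):
cd "$(mktemp -d)"
touch other-file
touch $'hello\nworld'
ls -t | head -n 2            # the newline-containing name typically shows up as two lines
find . -mindepth 1 -maxdepth 1 -printf '%p\0' | xargs -0 printf '<%s>\n'   # NUL-delimited: each name stays intact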
See also:
ParsingLs ("Why you shouldn't parse the output of ls")
BashFAQ #3 ("How can I find the latest (newest, earliest, oldest) file in a directory?")

Related

shell script to find all files in /etc with at least 7 hard links

Using standard UNIX tools (grep, awk, shell builtins, etc), I need to output any file that has at least 7 hard links in the /etc directory.
Any help with this would be appreciated.
find does have a -links test (for example, find /etc -links +6 matches names with more than 6 links), but if you also want to print the matches robustly, one approach is to have find emit the link count and do the filtering yourself. Assuming you have the GNU version of find, it can output the link count alongside each name:
#!/usr/bin/env bash
# ^^^^- NOT /bin/sh; we need the ability to tell read to stop on a NUL.
while IFS= read -r -d ' ' link_count && IFS= read -r -d '' filename; do
    (( link_count >= 7 )) && printf '%q\n' "$filename"
done < <(find /etc -printf '%n %p\0')
To answer some likely questions regarding the above:
What's with the -printf '%n %p\0'?
%n prints hardlink count.
%p prints the name of the file that was found.
\0 prints a NUL character -- the only character that's guaranteed not to be part of a filename, and which is thus safe to separate them with (filenames can have newlines inside of them, so newline-separated lists of names can be misleading!)
What's with the (( link_count >= 7 )) syntax? - See arithmetic expression on the bash-hackers' wiki.
What's with the printf '%q\n' "$filename"? - It prints names in a human-readable way, escaping any characters that would otherwise be unprintable or ambiguous (tabs, spaces, newlines, etc.) -- and the result can be copied-and-pasted into a bash prompt to refer to the same file.
Why a while read loop? - See BashFAQ #1.
Why read -d ' ' and then read -d ''? - The -d argument to read tells it to stop when it sees the first character of the succeeding argument. read -d ' ' tells it to stop when it sees a space; read -d '' tells it to stop when it sees a NUL (as a 0-length C string has one character in it -- the NUL terminator).
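For illustration, here is a tiny sketch of how the paired reads split one record (the input here is made up):
printf '3 /etc/passwd\0' | {
    IFS= read -r -d ' ' link_count   # reads "3", stopping at the space
    IFS= read -r -d '' filename      # reads "/etc/passwd", stopping at the NUL
    printf 'count=%s name=%s\n' "$link_count" "$filename"
}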
I think the question is ill-posed, and presents the misconception that there are any files at all in /etc. /etc is a directory, and as such it does not contain any files. It contains only names, which are references to files. Perhaps a short answer to the question is as simple as:
ls -ila /etc | awk '$3 >= 7'
This will list any of the names in /etc that link to a file with 7 or more links, but there is certainly no guarantee that all of those links are themselves in /etc. I suspect the question is intended to be worded as "list any file that has a link in /etc and at least 7 total links", in which case I would give the answer as:
for i in /etc/*; do stat -c '%h %i' "$i"; done |
awk '$1 >= 7 {a[$2]++} END {for (node in a) print node}'
Or, if you just want to list all of links that are in /etc, do:
for i in /etc/*; do stat -c '%h %n' "$i"; done | awk '$1 >= 7'
Use find if you want to do it recursively.
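A recursive variant along the same lines might look like this (a sketch, assuming GNU stat; adjust the threshold as needed):
find /etc -type f -exec stat -c '%h %n' {} + | awk '$1 >= 7'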

Syntax for if statement in a for loop in bash

for i in $( find . -name 'x.txt' ); do; if [ grep 'vvvv' ];
then; grep 'vvvv' -A 2 $i | grep -v vvvv | grep -v '-' >> y.csv; else
grep 0 $i >> y.csv; fi; done
What might be wrong with this?
Thanks!
A ; is not permitted after do.
This is automatically detected by http://shellcheck.net/
That said, what you probably want is something more like:
while IFS= read -r -d '' i; do
    if grep -q -e vvvv -- "$i"; then
        grep -e 'vvvv' -A 2 -- "$i" | egrep -v -e '(vvvv|-)'
    else
        grep 0 -- "$i"
    fi
done < <(find . -name 'x.txt' -print0) >y.csv
Note:
Using find -print0 and IFS= read -r -d '' ensures that all possible filenames (including filenames containing spaces, newlines, etc) can be handled correctly. See BashFAQ #1 for more background on this idiom.
if grep ... should be used if you want if to branch on grep's exit status. Making it if [ grep ... ] means you're passing grep as a string argument to the test command, not running it as a command itself (see the short example after this list).
We open y.csv only once for the entire loop, rather than re-opening the file over and over, only to write a single line (or short number of lines) and close it.
The argument -- should be used to separate options from positional arguments if you don't control those positional arguments.
When - is passed to grep as a string to search for, it should be preceded by -e. That said, in the present case, we can combine both grep -v invocations and avoid the need altogether.
Expansions should always be quoted. That is, "$i", not $i. Otherwise, the values are split on whitespace, and each piece generated is individually evaluated as a glob, preventing correct handling of filenames modified by either of these operations.
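A short example of the if grep point above (the $file variable here is hypothetical):
if grep -q -e vvvv -- "$file"; then   # if branches on grep's exit status
    echo "vvvv found in $file"
fi
# By contrast, if [ grep 'vvvv' ] hands the literal strings "grep" and "vvvv"
# to the test command; grep itself never runs.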

Human-readable filesize and line count

I want a bash command that will return a table, where each row is the human-readable filesize, number of lines, and filename. The table should be sorted by filesize.
I've been trying to do this using a combination of du -hs, wc -l, and sort -h, and find.
Here's where I'm at:
find . -exec echo $(du -h {}) $(wc -l {}) \; | sort -h
Your approach fell short not only because the shell expanded your command substitutions ($(...)) up front, but more fundamentally because you cannot pass shell command lines directly to find:
find's -exec action can only invoke external utilities with literal arguments - the only non-literal argument supported is the {} representing the filename(s) at hand.
choroba's answer fixes your immediate problem by invoking a separate shell instance in each iteration, to which the shell command to execute is passed as a string argument (-exec bash -c '...' \;).
While this works (assuming you pass the {} value as an argument rather than embedding it in the command-line string), it is also quite inefficient, because multiple child processes are created for each input file.
(While there is a way to have find pass (typically) all input files to a (typically) single invocation of the specified external utility - namely with terminator + rather than \;, this is not an option here due to the nature of the command line passed.)
An efficient and robust[1] implementation that minimizes the number of child processes created would look like this:
Note: I'm assuming GNU utilities here, due to use of head -n -1 and sort -h.
Also, I'm limiting find's output to files only (as opposed to directories), because wc -l only works on files.
paste <(find . -type f -exec du -h {} +) <(find . -type f -exec wc -l {} + | head -n -1) |
awk -F'\t *' 'BEGIN{OFS="\t"} {sub(" .+$", "", $3); print $1,$2,$3}' |
sort -h -t$'\t' -k1,1
Note the use of -exec ... + rather than -exec ... \;, which ensures that typically all input filenames are passed to a single invocation to the external utility (if not all filenames fit on a single command line, invocations are batched efficiently to make as few calls as possible).
wc -l {} + outputs a filename after each line count and invariably appends a summary line, which head -n -1 strips away.
paste combines the lines from each command (whose respective inputs are provided by a process substitution, <(...)) into a single output stream; see the small illustration after this list.
The awk command then strips the extraneous filename that stems from wc from the end of each line.
Finally, the sort command sorts the result by the 1st (-k1,1) tab-separated (-t$'\t') column by human-readable numbers (-h), such as the numbers that du -h outputs (e.g., 1K).
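To see what paste does with the two process substitutions, here is a toy run on made-up du- and wc-style output:
paste <(printf '4.0K\t./a\n8.0K\t./b\n') <(printf '  10 ./a\n  20 ./b\n')
# 4.0K    ./a       10 ./a
# 8.0K    ./b       20 ./b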
[1] As with any line-oriented processing, filenames with embedded newlines are not supported, but I do not consider this a real-world problem.
Ok, I tried it with find/-exec as well, but the escaping is hell. With a shell function it works pretty straightforwardly:
#!/bin/bash
function dir
{
    du=$(du -sh "$1" | awk '{print $1}')
    wc=$(wc -l < "$1")
    printf "%10s %10s %s\n" $du $wc "${1#./}"
}
printf "%10s %10s %s\n" "size" "lines" "name"
OIFS=$IFS; IFS=""
find . -type f -print0 | while read -r -d $'\0' f; do dir "$f"; done
IFS=$OIFS
Using the bashism read -d it is even reasonably safe, thanks to the NUL terminator. The IFS setting is needed to keep read from trimming trailing blanks in filenames.
BTW: $'\0' does not really work (it is the same as '') - but it makes the intention clear.
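A quick way to convince yourself, with a made-up name that has a trailing blank:
printf 'a b \0' | { IFS= read -r -d '' f; printf '<%s>\n' "$f"; }   # prints <a b >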
Sample output:
size lines name
156K 708 sash
16K 64 hostname
120K 460 netstat
40K 110 fuser
644K 1555 dir/bash
28K 82 keyctl
2.3M 8067 vim
The problem is that your shell interprets the $(...), so find doesn't get them. Escaping them doesn't help, either (\$\(du -h {}\)), as they become normal parameters to the commands, not command substitution.
The way to have them interpreted as command substitutions is to call a new shell, either directly
find . -exec bash -c 'echo $(du -h {}) $(wc -l {})' \; | sort -h
or by creating a script and calling it from find.
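A variant that passes the filename as an argument instead of embedding {} in the command string (safer for unusual filenames, as the previous answer notes, though it still starts one shell per file) might look like this:
find . -type f -exec bash -c 'echo $(du -h "$1") $(wc -l < "$1")' _ {} \; | sort -h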

bash script create file in dir containing (multiple) file types

I want to create a file (containing the dir name) in any sub dir of e.g. music that has at least one, but maybe more, .mp3 files. I want one file created no matter whether there are one or more .mp3 files in that dir, and the dir can have whitespace. I tried something like this: for i in $(find . -name "*.mp3" -printf "%h\n"|sort -u); do echo "$i" ; done
This breaks the path into 2 lines where the whitespace was, so:
./directory one
outputs as:
./directory
one
The construct $( ... ) in your
for x in $(find ... | ... | ... ) ; do ... ; done
executes whatever is in $( ... ) and passes the newline-separated output you would see in the terminal (had you executed the ... command from the shell prompt) to the for construct as one long list of names separated by blanks, as in
% ls -d1 b*
bianco nodi.pdf
bin
b.txt
% echo $(ls -d1 b*)
bianco nodi.pdf bin b.txt
%
now, the for cycle assigns to i the first item in the list, in my example bianco, and of course that's not what you want...
This situation is dealt with by this idiom, in which the shell reads ONE WHOLE LINE at a time
% ls -d1 b* | while read i ; do echo "$i" ; ... ; done
in your case
find . -name "*.mp3" -printf "%h\n" | sort -u | while read i ; do
echo "$i"
done
hth, ciao
Edit
My answer above catches the most common case of blanks inside the filename, but it still fails if there are blanks at the beginning or the end of the filename, and it also fails if there are newlines embedded in the filename.
Hence, I'm modifying my answer quite a bit, according to the advice from BroSlow (see comments below).
find . -name "*.mp3" -printf "%h\0" | \
sort -uz | while IFS= read -r -d '' i ; do
...
done
Key points
find's printf now separates filenames with a NUL.
sort, by the -z option, splits elements to be sorted on NULs rather than on newlines.
IFS= stops completely the shell habit of splitting on generic whitespace.
read's option -d (this is a bashism) tells it to stop reading at a particular character instead of at the default newline.
Here I have -d '', which bash sees as specifying NUL; BroSlow had $'\0', which bash expands (by the rules of ANSI-C quoting) to '', but which may be clearer, since one can see an explicit reference to the NUL character.
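For completeness, one possible loop body for the original question might be (a sketch; the marker filename .dirname is made up):
find . -name "*.mp3" -printf "%h\0" | sort -uz | while IFS= read -r -d '' i ; do
    printf '%s\n' "$i" > "$i/.dirname"   # one file per directory, holding that directory's name
done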
I like to close with "Thank you, BroSlow".

perform an operation for *each* item listed by grep

How can I perform an operation for each item listed by grep individually?
Background:
I use grep to list all files containing a certain pattern:
grep -l '<pattern>' directory/*.extension1
I want to delete all listed files but also all files having the same file name but a different extension: .extension2.
I tried using the pipe, but it seems to take the output of grep as a whole.
In find there is the -exec option, but grep has nothing like that.
If I understand your specification, you want:
grep --null -l '<pattern>' directory/*.extension1 | \
xargs -n 1 -0 -I{} bash -c 'rm "$1" "${1%.*}.extension2"' -- {}
This is essentially the same as what @triplee's comment describes, except that it's newline-safe.
What's going on here?
grep with --null will return output delimited with nulls instead of newline. Since file names can have newlines in them delimiting with newline makes it impossible to parse the output of grep safely, but null is not a valid character in a file name and thus makes a nice delimiter.
xargs will take a stream of newline-delimited items and execute a given command, passing as many of those items as possible (each item as a separate parameter) to that command (or to echo if no command is given). Thus if you said:
printf 'one\ntwo three \nfour\n' | xargs echo
xargs would execute echo one 'two three' four. This is not safe for file names because, again, file names might contain embedded newlines.
The -0 switch to xargs changes it from looking for a newline delimiter to a null delimiter. This makes it match the output we got from grep --null and makes it safe for processing a list of file names.
Normally xargs simply appends the input to the end of the command. The -I switch to xargs changes this to substituting the input for the specified replacement string. To get the idea try this experiment:
printf 'one\ntwo three \nfour\n' | xargs -I{} echo foo {} bar
And note the difference from the earlier printf | xargs command.
In the case of my solution the command I execute is bash, to which I pass -c. The -c switch causes bash to execute the commands in the following argument (and then terminate) instead of starting an interactive shell. The next block 'rm "$1" "${1%.*}.extension2"' is the first argument to -c and is the script which will be executed by bash. Any arguments following the script argument to -c are assigned as the arguments to the script. Thus, if I were to say:
bash -c 'echo $0' "Hello, world"
Then Hello, world would be assigned to $0 (the first argument to the script) and inside the script I could echo it back.
Since $0 is normally reserved for the script name I pass a dummy value (in this case --) as the first argument and, then, in place of the second argument I write {}, which is the replacement string I specified for xargs. This will be replaced by xargs with each file name parsed from grep's output before bash is executed.
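Analogous to the xargs invocation above, with the dummy -- landing in $0 and a made-up filename landing in $1:
bash -c 'echo "would remove: $1 and ${1%.*}.extension2"' -- "some file.extension1"
# would remove: some file.extension1 and some file.extension2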
The mini shell script might look complicated but it's rather trivial. First, the entire script is single-quoted to prevent the calling shell from interpreting it. Inside the script I invoke rm and pass it two file names to remove: the $1 argument, which was the file name passed when the replacement string was substituted above, and ${1%.*}.extension2. This latter is a parameter substitution on the $1 variable. The important part is %.* which says
% "Match from the end of the variable and remove the shortest string matching the pattern.
.* The pattern is a single period followed by anything.
This effectively strips the extension, if any, from the file name. You can observe the effect yourself:
foo='my file.txt'
bar='this.is.a.file.txt'
baz='no extension'
printf '%s\n'"${foo%.*}" "${bar%.*}" "${baz%.*}"
Since the extension has been stripped I concatenate the desired alternate extension .extension2 to the stripped file name to obtain the alternate file name.
If the following prints the commands you want, pipe its output through /bin/sh.
grep -l 'RE' folder/*.ext1 | sed 's/\(.*\).ext1/rm "&" "\1.ext2"/'
Or if sed makes you itchy:
grep -l 'RE' folder/*.ext1 | while read file; do
echo rm "$file" "${file%.ext1}.ext2"
done
Remove echo if the output looks like the commands you want to run.
But you can do this with find as well:
find /path/to/start -name \*.ext1 -exec grep -q 'RE' {} \; -print | ...
where ... is either the sed script or the three lines from while to done.
The idea here is that find will ... well, "find" things based on the qualifiers you give it -- namely, that things match the file glob "*.ext1", AND that the result of the "exec" is successful. The -q tells grep to look for RE in {} (the file supplied by find), and exit with a TRUE or FALSE without generating any of its own output.
The only real difference between doing this in find vs doing it with grep is that you get to use find's awesome collection of conditions to narrow down your search further if required. man find for details. By default, find will recurse into subdirectories.
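Spelled out with the while loop (a sketch; the echo is left in place as a dry run):
find /path/to/start -name '*.ext1' -exec grep -q 'RE' {} \; -print |
while read file; do
    echo rm "$file" "${file%.ext1}.ext2"
done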
You can pipe the list to xargs:
grep -l '<pattern>' directory/*.extension1 | xargs rm
As for the second set of files with a different extension, I'd do this (as usual use xargs echo rm when testing to make a dry run; I haven't tested it, it may not work correctly with filenames with spaces in them):
filelist=$(grep -l '<pattern>' directory/*.extension1)
echo $filelist | xargs rm
echo ${filelist//.extension1/.extension2} | xargs rm
Pipe the result to xargs; it will allow you to run a command for each match.
