using find sort and wc -l in the one command - bash

This is how I find files with find and show the number of lines in each file:
$ find ./ -type f -name "data*.csv" -exec wc -l {} +
380723 ./data_2016-07-07-10-41-13.csv
369869 ./data_2016-07-11-10-42-01.csv
363941 ./data_2016-07-08-10-41-50.csv
378981 ./data_2016-07-12-10-41-28.csv
1493514 total
How do I sort the results by file name? Below are my attempts, but none of them work.
$ find ./ -type f -name "data*.csv" -exec wc -l {} + | sort
1493514 total
363941 ./data_2016-07-08-10-41-50.csv
369869 ./data_2016-07-11-10-42-01.csv
378981 ./data_2016-07-12-10-41-28.csv
380723 ./data_2016-07-07-10-41-13.csv
$ find ./ -type f -name "data*.csv" | sort -exec wc -l {} +
sort: invalid option -- 'e'
Try `sort --help' for more information.
$ find ./ -type f -name "data*.csv" -exec sort | wc -l {} +
find: wc: {}missing argument to `-exec'
: No such file or directory
wc: +: No such file or directory
0 total
$
Can someone offer a solution and correct me so I understand it better?
EDIT1
from man sort
-k, --key=POS1[,POS2]
start a key at POS1 (origin 1), end it at POS2 (default end of line). See POS syntax below
POS is F[.C][OPTS], where F is the field number and C the character position in the field; both are origin 1. If neither -t nor -b is in effect, characters in a field are counted from the beginning
of the preceding whitespace. OPTS is one or more single-letter ordering options, which override global ordering options for that key. If no key is given, use the entire line as the key.

Ismail's suggestion of using sort -k is correct. However, I'm often too lazy to learn (or relearn) how -k works, so here's a cheap solution:
find . -name 'data*.csv' -print0 | sort -z | xargs -0 wc -l
Edit: after some experimentation, I did figure out how -k works:
find . -name 'data*.csv' -exec wc -l {} + | sort -k 2
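To make the null-delimited variant easier to reuse, here is a minimal sketch that wraps it in a shell function. The name count_lines_sorted and its two parameters are made up for illustration; -print0, sort -z and xargs -0 are extensions available in the GNU and BSD tools, not POSIX.

```shell
# Hypothetical helper: print "lines path" for every matching file, ordered by path.
# $1: directory to search, $2: file name pattern (illustrative parameters).
count_lines_sorted() {
    find "$1" -type f -name "$2" -print0 |
        sort -z |        # sort the NUL-separated names before wc sees them
        xargs -0 wc -l   # wc receives the files in sorted order
}

# Example call, using the pattern from the question:
# count_lines_sorted . 'data*.csv'
```

Because the names are sorted before wc runs, the total line stays at the end. With sort -k 2 applied to the wc output, the total line takes part in the sort like any other line. For a very long file list, xargs may run wc more than once and print more than one total.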

Related

Bash find: exec in reverse order

I am iterating over files like so:
find $directory -type f -exec codesign {} \;
Now the problem here is that files on a higher hierarchy are signed first.
Is there a way to iterate over a directory tree and handle the deepest files first?
So that
/My/path/to/app/bin
is handled before
/My/path/mainbin
Yes, just use -depth:
-depth
The primary shall always evaluate as true; it shall cause descent of the directory hierarchy to be done so that all entries in a directory are acted on before the directory itself. If a -depth primary is not specified, all entries in a directory shall be acted on after the directory itself. If any -depth primary is specified, it shall apply to the entire expression even if the -depth primary would not normally be evaluated.
For example:
$ mkdir -p top/a/b/c/d/e/f/g/h
$ find top -print
top
top/a
top/a/b
top/a/b/c
top/a/b/c/d
top/a/b/c/d/e
top/a/b/c/d/e/f
top/a/b/c/d/e/f/g
top/a/b/c/d/e/f/g/h
$ find top -depth -print
top/a/b/c/d/e/f/g/h
top/a/b/c/d/e/f/g
top/a/b/c/d/e/f
top/a/b/c/d/e
top/a/b/c/d
top/a/b/c
top/a/b
top/a
top
Note that at a particular level, ordering is still arbitrary.
Using GNU utilities, and decorate-sort-undecorate pattern (aka Schwartzian transform):
find . -type f -printf '%d %p\0' |
sort -znr |
sed -z 's/[0-9]* //' |
xargs -0 -I@ echo codesign @
Drop the echo if the output looks ok.
Using find's -depth option as in my other answer, or a naive sort as in some others, only ensures that the sub-directories of a directory are processed before the directory itself, not that the deepest level is processed first.
For example:
$ mkdir -p top/a/b/d/f/h top/a/c/e/g
$ find top -depth -print
top/a/c/e/g
top/a/c/e
top/a/c
top/a/b/d/f/h
top/a/b/d/f
top/a/b/d
top/a/b
top/a
top
For overall deepest level to be processed first, the ordering should be something like:
top/a/b/d/f/h
top/a/c/e/g
top/a/b/d/f
top/a/c/e
top/a/b/d
top/a/c
top/a/b
top/a
top
To determine this ordering, the entire list must be known first, and then the number of levels (i.e. the number of / characters) in each path must be counted so the paths can be ranked.
A simple-ish Perl script (assigned to a shell function for this example) to do this ordering is:
$ dsort(){
perl -ne '
BEGIN { $/ = "\0" } # null-delimited i/o
$fname[$.] = $_;
$depth[$.] = tr|/||;
END {
print
map { $fname[$_] }
sort { $depth[$b] <=> $depth[$a] }
keys @fname
}
'
}
Then:
$ find top -print0 | dsort | xargs -0 -I@ echo @
top/a/b/d/f/h
top/a/c/e/g
top/a/b/d/f
top/a/c/e
top/a/b/d
top/a/c
top/a/b
top/a
top
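If Perl is not at hand, roughly the same ranking can be sketched with awk, sort and cut, again as decorate-sort-undecorate. This version is newline-delimited, so it assumes no file name contains a newline. The function name deepest_first is hypothetical.

```shell
# Hypothetical helper: list everything under $1, deepest paths first.
# Assumes no path contains a newline.
deepest_first() {
    find "$1" -print |
        # decorate: prefix each path with its number of "/" separators
        awk -F/ '{ printf "%d\t%s\n", NF - 1, $0 }' |
        # sort: numeric, descending, on the depth column only
        sort -t "$(printf '\t')" -k1,1nr |
        # undecorate: drop the depth column again
        cut -f2-
}

# Example call:
# deepest_first top
```

Paths at the same depth come out in whatever order sort's tie-breaking gives them, so only the ranking between levels is guaranteed.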
How about sorting the output of find in descending order:
while IFS= read -d "" -r f; do
codesign "$f"
done < <(find "$directory" -type f -print0 | sort -zr)
<(command ..) is a process substitution which feeds the output
of the command to the read command in while loop via the redirect.
-print0, sort -z and read -d "" combo uses a null character
as a file delimiter. It is useful to protect filenames which include
special characters such as whitespace.
I don't know if there is a native way in find, but you may pipe the output of it into a loop and process it line by line as you wish this way:
find . | while IFS= read -r file; do echo filename: "$file"; done
In your case, if you are happy just reversing the output of find, you may go with something like:
find "$directory" -type f | tac | while IFS= read -r file; do codesign "$file"; done

Filter folders that do not contain any audio files with bash

Given a root folder, how do I filter down subfolders that do not contain any audio files (mp3, wav and flac)? Do I need to set a variable like
folders = find /parentfolder/ -type d
and then pass some expression on ${folders} or is there a one-liner for this?
All the subdirectories of . (we write that into a file):
find . -type d | sort > all_dirs.txt
All subdirectories that do contain an mp3 file (goes into another file):
find . -name "*.mp3" | xargs dirname | sort | uniq > music_dirs.txt
And these are the lines that appear only in the first file and not in the second:
diff --new-line-format="" --unchanged-line-format="" all_dirs.txt music_dirs.txt
If you think oneliners are cool and you are working in bash, here it is a bit more condensed:
diff --new-line-format="" --unchanged-line-format="" <(find . -type d | sort) <(find . -name "*.mp3" | xargs dirname | sort | uniq)
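The commands above only look for mp3. Here is a sketch that covers all three extensions from the question by checking each directory on its own. The name dirs_without_audio is made up, and -maxdepth and -iname are GNU/BSD find extensions rather than POSIX. Like the diff approach, it reports directories that have no audio file directly inside them, and it assumes directory names contain no newline.

```shell
# Hypothetical helper: print every directory under $1 that has no
# mp3, wav or flac file directly inside it (case-insensitive match).
dirs_without_audio() {
    find "$1" -type d | while IFS= read -r dir; do
        # look only at the files directly in this directory
        hits=$(find "$dir" -maxdepth 1 -type f \
            \( -iname '*.mp3' -o -iname '*.wav' -o -iname '*.flac' \) -print)
        if [ -z "$hits" ]; then
            printf '%s\n' "$dir"
        fi
    done
}

# Example call, using the path from the question:
# dirs_without_audio /parentfolder
```

This runs one find per directory, so it is slower than the diff one-liner on large trees, but it needs no temporary files.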

How to find the sub-directories using find

I'm using the command below in Perl code to get the sub-directories into an array @handoff.
chomp(@handoff = `find * -maxdepth 0 -type d -name "18????_????" | sort -u | tail -2`);
I'm getting the error as
find: unknown predicate `-lrt'
If I try the same command in terminal directly, I'm able to get the sub-directories. Please suggest me some solution.
No need to call an external program to find sub-directories:
opendir(my $dh, '.') || die "Can't opendir '.': $!";
my @handoff = grep { /^18.{4}_.{4}$/ && -d $_ } readdir($dh);
closedir $dh;
print join(' ', @handoff), "\n";
find expects the path to search as first argument, hence :
find * -maxdepth 0 -type d -name "18????_????" | sort -u | tail -2
should be :
find . -maxdepth 1 -type d -name "18????_????" | sort -u | tail -2
(assuming that you want to search the current path - else replace the . with the path to search).
But bottom line, as you are using perl already, why use an external command like find ?
Here is another solution using module Path::Iterator::Rule.
use Path::Iterator::Rule;
my @handoffs = Path::Iterator::Rule
->new
->directory # only directories (not files)
->max_depth(1) # do not recurse
->name("18????_????") # match directory name (glob or regex)
->all(".") # search the current path
;

list subdirectories recursively

I want to write a shell script so as to recursively list all different subdirectories which contain NEW subdirectory (NEW is a fixed name).
Dir1/Subdir1/Subsub1/NEW/1.jpg
Dir1/Subdir1/Subsub1/NEW/2.jpg
Dir1/Subdir1/Subsub2/NEW/3.jpg
Dir1/Subdir1/Subsub2/NEW/4.jpg
Dir1/Subdir2/Subsub3/NEW/5.jpg
Dir1/Subdir2/Subsub4/NEW/6.jpg
I want to get
Dir1/Subdir1/Subsub1
Dir1/Subdir1/Subsub2
Dir1/Subdir2/Subsub3
Dir1/Subdir2/Subsub4
How can I do that?
find . -type d -name NEW | sed 's|/NEW$||'
--- EDIT ---
for your comment, sed does not have a -print0. There are various ways of doing this (most of which are wrong). One possible solution would be:
find . -type d -name NEW -print0 | \
while IFS= read -rd '' subdirNew; do \
subdir=$(sed 's|/NEW$||' <<< "$subdirNew"); \
echo "$subdir"; \
done
which should be tolerant of spaces and newlines in the filename
ls -R will list things recursively.
Or find . | grep "/NEW/" should give you the type of list you are looking for.
You could try this:
find . -type d -name "NEW" -exec dirname {} \;
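As a variation on the answers above, here is a sketch that strips the trailing /NEW with shell parameter expansion inside a single -exec, so paths with spaces survive and no sed or dirname process is started per match. The function name parents_of_new is hypothetical.

```shell
# Hypothetical helper: print the parent of every directory named NEW under $1.
parents_of_new() {
    find "$1" -type d -name NEW -exec sh -c '
        # each matched path arrives as a positional argument
        for path do
            printf "%s\n" "${path%/NEW}"   # drop the trailing /NEW
        done
    ' sh {} +
}

# Example call:
# parents_of_new .
```

The output is still newline-separated, so a directory name that itself contains a newline would be ambiguous for whatever reads the list.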

Unix find and sort with wildcards

Let's suppose I have a folder with some XML files:
a-as-jdbc.xml
z-as-jdbc.xml
fa-jdbc.xml
config.xml
router.xml
paster.xml
cleardown.xml
I would like to pipe the output of find into some kind of sort command that uses wildcards and my own custom ordering logic.
That is because I want the file names to always be returned in the same order.
For example, i want always:
1 element: "config.xml"
2 element: "*.as-jdbc.xml"
3 element: "-jdbc.xml" (excluding pattern ".as-jdbc")
4 element: "router.xml"
and so on...
How can i achieve this? Any idea?
I did it in the past using arrays but don't remember exactly how i did now...
Thanks
Not too pretty but :
rules.txt:
config\.xml
.*\.as\-jdbc\.xml
^[^-]*\-jdbc\.xml
router\.xml
Commands:
$ find /path/to/dir > /tmp/result.txt
$ cat rules.txt | xargs -I{} grep -E "{}" /tmp/result.txt
config.xml
a-as-jdbc.xml
z-as-jdbc.xml
fa-jdbc.xml
router.xml
You will have to add the two other patterns needed for paster and cleardown.
It is certainly easier to do this in a higher level language like Python.
This is not a sorting problem; it is an ordering problem. As such, you cannot use the Unix sort command.
Inevitably, you will need to make 4 passes anyway so I would do either:
$ find /tmp/alex -name config.xml ; \
> find /tmp/alex -name '*-as-jdbc.xml' ; \
> find /tmp/alex \( \! -name '*-as-jdbc.xml' -a -name '*-jdbc.xml' \) ; \
> find /tmp/alex \( -type f -a \! -name config.xml -a \! -name '*-jdbc.xml' \)
/tmp/alex/config.xml
/tmp/alex/a-as-jdbc.xml
/tmp/alex/z-as-jdbc.xml
/tmp/alex/fa-jdbc.xml
/tmp/alex/cleardown.xml
/tmp/alex/paster.xml
/tmp/alex/router.xml
Or use grep:
$ find /tmp/alex -type f > /tmp/aaa
$ grep /config.xml /tmp/aaa ; \
> grep -- -as-jdbc.xml /tmp/aaa ; \
> grep -- -jdbc.xml /tmp/aaa | grep -v -- -as-jdbc.xml ; \
> grep -Ev '(config\.xml|-jdbc\.xml)' /tmp/aaa
/tmp/alex/config.xml
/tmp/alex/a-as-jdbc.xml
/tmp/alex/z-as-jdbc.xml
/tmp/alex/fa-jdbc.xml
/tmp/alex/cleardown.xml
/tmp/alex/paster.xml
/tmp/alex/router.xml
I recommend just putting your find commands in the order you want:
$ find . -name config.xml; \
> find . -name \*-as-jdbc.xml; \
> find . -name \*-jdbc.xml -a ! -name \*as-jdbc.xml; \
> find . -name router.xml; \
> ... and so on.
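The one-find-per-bucket idea can be sketched as a shell function. The bucket order is taken from the question. The function name ordered_xml, the sort inside each wildcard bucket, and the final "all remaining .xml files" bucket are assumptions added for illustration.

```shell
# Hypothetical helper: list the .xml files under $1 in a fixed bucket order.
ordered_xml() {
    # 1. config.xml
    find "$1" -type f -name 'config.xml'
    # 2. *-as-jdbc.xml, sorted so the order inside the bucket is stable
    find "$1" -type f -name '*-as-jdbc.xml' | sort
    # 3. *-jdbc.xml, excluding the -as-jdbc files already listed
    find "$1" -type f -name '*-jdbc.xml' ! -name '*-as-jdbc.xml' | sort
    # 4. router.xml
    find "$1" -type f -name 'router.xml'
    # 5. assumption: every remaining .xml file, sorted by name
    find "$1" -type f -name '*.xml' ! -name 'config.xml' \
        ! -name '*-jdbc.xml' ! -name 'router.xml' | sort
}

# Example call:
# ordered_xml /path/to/dir
```

Quoting each pattern keeps the shell from expanding the wildcard before find sees it.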
