How to find the sub-directories using find - bash

I'm using the command below to get the sub-directories into an array @handoff in a Perl script:
chomp(@handoff = `find * -maxdepth 0 -type d -name "18????_????" | sort -u | tail -2`);
I'm getting the error:
find: unknown predicate `-lrt'
If I try the same command directly in the terminal, I get the sub-directories as expected. Please suggest a solution.

No need to call an external program to find sub-directories:
opendir(my $dh, '.') || die "Can't opendir '.': $!";
my @handoff = grep { /^18.{4}_.{4}$/ && -d $_ } readdir($dh);
closedir $dh;
print join(' ', @handoff), "\n";
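Note that readdir returns entries in no particular order, so to mirror the sort -u | tail -2 from the original pipeline you could sort and slice the result (a sketch, reusing @handoff from above):
my @last_two = (sort @handoff)[-2, -1];   # lexicographic sort, keep the last two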

find expects the path to search as its first argument. The unknown predicate `-lrt' error suggests that the shell-expanded * matched an entry literally named -lrt, which find then tried to parse as part of its expression. Hence:
find * -maxdepth 0 -type d -name "18????_????" | sort -u | tail -2
should be:
find . -maxdepth 1 -type d -name "18????_????" | sort -u | tail -2
(assuming that you want to search the current path - otherwise replace the . with the path to search).
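Inside the backticks from the question, the corrected call would then be (a sketch, keeping the question's variable name; entries will carry a leading ./ prefix):
chomp(my @handoff = `find . -maxdepth 1 -type d -name "18????_????" | sort -u | tail -2`);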
But bottom line: as you are using Perl already, why call an external command like find?
Here is another solution using module Path::Iterator::Rule.
use Path::Iterator::Rule;
my @handoffs = Path::Iterator::Rule
->new
->directory # only directories (not files)
->max_depth(1) # do not recurse
->name("18????_????") # match directory name (glob or regex)
->all(".") # search the current path
;

Related

Bash find: exec in reverse order

I am iterating over files like so:
find $directory -type f -exec codesign {} \;
Now the problem here is that files on a higher hierarchy are signed first.
Is there a way to iterate over a directory tree and handle the deepest files first?
So that
/My/path/to/app/bin
is handled before
/My/path/mainbin
Yes, just use -depth:
-depth
The primary shall always evaluate as true; it shall cause descent of the directory hierarchy to be done so that all entries in a directory are acted on before the directory itself. If a -depth primary is not specified, all entries in a directory shall be acted on after the directory itself. If any -depth primary is specified, it shall apply to the entire expression even if the -depth primary would not normally be evaluated.
For example:
$ mkdir -p top/a/b/c/d/e/f/g/h
$ find top -print
top
top/a
top/a/b
top/a/b/c
top/a/b/c/d
top/a/b/c/d/e
top/a/b/c/d/e/f
top/a/b/c/d/e/f/g
top/a/b/c/d/e/f/g/h
$ find top -depth -print
top/a/b/c/d/e/f/g/h
top/a/b/c/d/e/f/g
top/a/b/c/d/e/f
top/a/b/c/d/e
top/a/b/c/d
top/a/b/c
top/a/b
top/a
top
Note that at a particular level, ordering is still arbitrary.
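Applied to the command from the question, that might look like the sketch below (GNU find warns if -depth appears after tests such as -type, so it goes first):
find "$directory" -depth -type f -exec codesign {} \;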
Using GNU utilities, and decorate-sort-undecorate pattern (aka Schwartzian transform):
find . -type f -printf '%d %p\0' |
sort -znr |
sed -z 's/[0-9]* //' |
xargs -0 -I# echo codesign #
Drop the echo if the output looks ok.
Using find's -depth option as in my other answer, or a naive sort as in some other answers, only ensures that the sub-directories of a directory are processed before the directory itself, not that the deepest level overall is processed first.
For example:
$ mkdir -p top/a/b/d/f/h top/a/c/e/g
$ find top -depth -print
top/a/c/e/g
top/a/c/e
top/a/c
top/a/b/d/f/h
top/a/b/d/f
top/a/b/d
top/a/b
top/a
top
For overall deepest level to be processed first, the ordering should be something like:
top/a/b/d/f/h
top/a/c/e/g
top/a/b/d/f
top/a/c/e
top/a/b/d
top/a/c
top/a/b
top/a
top
To determine this ordering, the entire list must be known, and then the number of levels (i.e. / separators) in each path counted to enable ranking.
A simple-ish Perl script (assigned to a shell function for this example) to do this ordering is:
$ dsort(){
perl -ne '
BEGIN { $/ = "\0" } # null-delimited i/o
$fname[$.] = $_;
$depth[$.] = tr|/||;
END {
print
map { $fname[$_] }
sort { $depth[$b] <=> $depth[$a] }
keys @fname
}
'
}
Then:
$ find top -print0 | dsort | xargs -0 -I# echo #
top/a/b/d/f/h
top/a/c/e/g
top/a/b/d/f
top/a/c/e
top/a/b/d
top/a/c
top/a/b
top/a
top
How about sorting the output of find in descending order:
while IFS= read -d "" -r f; do
codesign "$f"
done < <(find "$directory" -type f -print0 | sort -zr)
<(command ..) is a process substitution, which feeds the output of the command to the read command in the while loop via the redirect.
The -print0, sort -z and read -d "" combo uses a null character as the file delimiter. This protects filenames that include special characters such as whitespace.
I don't know if there is a native way in find, but you can pipe its output into a loop and process it line by line as you wish:
find . | while IFS= read -r file; do echo filename: "$file"; done
In your case, if you are happy just reversing the output of find, you can go with something like:
find "$directory" -type f | tac | while IFS= read -r file; do codesign "$file"; done

Calling external command in find and using pipe

I am wondering if there is a way to search for all the files from a certain directory including subdirectories using a find command on AIX 6.x, before calling an external command (e.g. hlcat) to display/convert them into a readable format, which can then be piped through a grep command to find a pattern instead of using loops in the shell?
e.g. find . -type f -name "*.hl7" -exec hlcat {} | grep -l "pattern" \;
The above command would not work and I have to use a while loop to display the content and search for the pattern as follows:
find . -type f -name "*.hl7" -print | while read file; do
hlcat $file | grep -l "pattern";
done
At the same time, these HL7 files have been renamed with round brackets in their names, which prevents them from being opened without double quotes around the file name.
e.g. hlcat (patient) filename.hl7 will fail to open.
hlcat "(patient) filename.hl7" will work.
In short, I am looking for a clean, concise one-liner within find to view and search the content of these HL7 files with round brackets in their names.
Many thanks,
George
P.S. HL7 raw data is made up of one continuous line and is not readable unless it is converted into a workable reading format using tools such as hlcat.
Update: The easy way
find . -type f -name '*.hl7' -exec grep -iEl 'Barry|Jolene' {} +
note: You may get some false positives though. See below for a targeted search.
Searching for a first name in a bunch of HL7v2 files:
1. Looking into the HL7v2 file format
Example of HL7v2 PID segment:
PID|||56782445^^^UAReg^PI||KLEINSAMPLE^BARRY^Q^JR||19620910|M|||
PID Segment decomposition:
Seq  NAME                HHIC USE            LEN
0    PID keyword         Segment Type        3
3    Patient ID          Medical Record Num  250
5    Patient Name        Last^First^Middle   250
7    Date/Time Of Birth  YYYYMMDD            26
8    Sex                 F, M, or U          1
2. Writing targeted searches
With grep (AIX):
find . -type f -name '*.hl7' -exec grep -iEl '^PID\|([^|]*\|){4}[^^|]*\^(Barry|Jolene)\^' {} +
With awk:
find . -type f -name '*.hl7' -exec awk -v firstname='^(Barry|Jolene)$' '
BEGIN { FS="|" }
FNR == 1 { if( found ) print filename; found = 0; filename = FILENAME }
$1 == "PID" { split($6, name, "^"); if (toupper(name[2]) ~ toupper(firstname)) { found = 1 } }
END { if ( found ) print filename }
' {} +
remark: The good part about this awk solution is that the first-name regexp is passed in as an argument, so the approach is easily extendable, for example to search by last name, as sketched below.
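A last-name variant might look like this (a sketch under the same PID layout assumptions: name[1] holds the last name, and KLEINSAMPLE is just the sample value from the segment above):
find . -type f -name '*.hl7' -exec awk -v lastname='^KLEINSAMPLE$' '
BEGIN { FS="|" }
FNR == 1 { if ( found ) print filename; found = 0; filename = FILENAME }
$1 == "PID" { split($6, name, "^"); if (toupper(name[1]) ~ toupper(lastname)) { found = 1 } }
END { if ( found ) print filename }
' {} +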

Script for printing out file names and their number of appearance starting from a given folder

I need to write a shell script which, given a folder name as an argument, prints out the names of the folders and files in it and how many times each name appears in the given folder.
Edit: I need to check only their names, without taking the file extensions into consideration.
#!/bin/bash
folder="$1"
for f in "$folder"
do
echo "$f"
done
And I would expect to see something like this (if I have 3 files with the same name and different extensions, like x.html, x.css, x.sh and so on, in a directory called dir)
x
3 times
after executing the script with dir (the name of the directory) as a parameter.
The find command already does most of this for you.
find . -printf "%f\n" |
sort | uniq -c
This will not work correctly if you have files whose names contain a newline.
If your find doesn't support -printf, maybe try
find . -exec basename {} \; |
sort | uniq -c
To restrict to just file names or directory names, add -type f or -type d, respectively, before the action (-exec or -printf).
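For example, to count regular file names only (GNU find assumed for -printf):
find . -type f -printf "%f\n" | sort | uniq -c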
If you genuinely want to remove extensions, try
find .... whatever ... |
sed 's%\.[^./]*$%%' |
sort | uniq -c
Can you try this,
#!/bin/bash
IFS=$'\n' array=($(ls))
iter=0;
for file in ${array[*]}; do
filename=$(basename -- "$file")
extension="${filename##*.}"
filename="${filename%.*}"
filenamearray[$iter]=$filename
iter=$((iter+1))
done
for filename in ${filenamearray[@]}; do
echo $filename;
grep -o $filename <<< "${filenamearray[@]}" | wc -l
done
You can try with find and awk :
find . -type f -print0 |
awk '
BEGIN {
FS="/"
RS="\0"
}
{
k = split( $NF , b , "." )
if ( k > 1 )
sub( "\\." b[k] "$" , "" , $NF )   # escape the dot and anchor at end, so only the trailing extension is removed
a[$NF]++
}
END {
for ( i in a ) {
j = a[i]>1 ? "s" : ""
print i
print a[i] " time" j
}
}'
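With the question's example (x.html, x.css and x.sh in the directory), this should print something like:
x
3 times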

list subdirectories recursively

I want to write a shell script to recursively list all distinct subdirectories which contain a NEW subdirectory (NEW is a fixed name).
Dir1/Subdir1/Subsub1/NEW/1.jpg
Dir1/Subdir1/Subsub1/NEW/2.jpg
Dir1/Subdir1/Subsub2/NEW/3.jpg
Dir1/Subdir1/Subsub2/NEW/4.jpg
Dir1/Subdir2/Subsub3/NEW/5.jpg
Dir1/Subdir2/Subsub4/NEW/6.jpg
I want to get
Dir1/Subdir1/Subsub1
Dir1/Subdir1/Subsub2
Dir1/Subdir2/Subsub3
Dir1/Subdir2/Subsub4
How can I do that?
find . -type d -name NEW | sed 's|/NEW$||'
--- EDIT ---
For your comment: sed does not have a -print0. There are various ways of doing this (most of which are wrong). One possible solution would be:
find . -type d -name NEW -print0 |
while IFS= read -rd '' subdirNew; do
subdir=$(sed 's|/NEW$||' <<< "$subdirNew")
echo "$subdir"
done
which should be tolerant of spaces and newlines in the filename
ls -R will list things recursively.
Or find . | grep "/NEW/" should give you the type of list you are looking for.
You could try this:
find . -type d -name "NEW" -exec dirname {} \;
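With GNU find, you could also print the parent directory directly and avoid spawning one dirname process per match (a GNU-only sketch; %h is the path with the last component removed):
find . -type d -name NEW -printf '%h\n'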

using find sort and wc -l in the one command

This is how find files using find and show the number of lines in each file
$ find ./ -type f -name "data*.csv" -exec wc -l {} +
380723 ./data_2016-07-07-10-41-13.csv
369869 ./data_2016-07-11-10-42-01.csv
363941 ./data_2016-07-08-10-41-50.csv
378981 ./data_2016-07-12-10-41-28.csv
1493514 total
How do I sort the results by file name? Below are my attempts, but they are not working.
$ find ./ -type f -name "data*.csv" -exec wc -l {} + | sort
1493514 total
363941 ./data_2016-07-08-10-41-50.csv
369869 ./data_2016-07-11-10-42-01.csv
378981 ./data_2016-07-12-10-41-28.csv
380723 ./data_2016-07-07-10-41-13.csv
$ find ./ -type f -name "data*.csv" | sort -exec wc -l {} +
sort: invalid option -- 'e'
Try `sort --help' for more information.
$ find ./ -type f -name "data*.csv" -exec sort | wc -l {} +
find: wc: {}missing argument to `-exec'
: No such file or directory
wc: +: No such file or directory
0 total
$
Can someone offer a solution and correct me so I understand it better?
EDIT1
from man sort
-k, --key=POS1[,POS2]
start a key at POS1 (origin 1), end it at POS2 (default end of line). See POS syntax below
POS is F[.C][OPTS], where F is the field number and C the character position in the field; both are origin 1. If neither -t nor -b is in effect, characters in a field are counted from the beginning of the preceding whitespace. OPTS is one or more single-letter ordering options, which override global ordering options for that key. If no key is given, use the entire line as the key.
Ismail's suggestion of using sort -k is correct. However, I'm often too lazy to learn (or relearn) how -k works, so here's a cheap solution:
find . -name 'data*.csv' -print0 | sort -z | xargs -0 wc -l
Edit: after some experimentation, I did figure out how -k works:
find . -name 'data*.csv' -exec wc -l {} + | sort -k 2
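If the trailing total line gets in the way (and there can be several of them when find ends up invoking wc more than once for a long file list), one option is to filter it out before sorting (a sketch):
find ./ -type f -name "data*.csv" -exec wc -l {} + | grep -v ' total$' | sort -k 2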
