How to count files in subdir and filter output in bash - bash

Hi hoping someone can help, I have some directories on disk and I want to count the number of files in them (as well as dir size if possible) and then strip info from the output. So far I have this
find . -type d -name "*,d" -print0 | xargs -0 -I {} sh -c 'echo -e $(find "{}" | wc -l) "{}"' | sort -n
This gets me all the dir's that match my pattern as well as the number of files - great!
This gives me something like
2 ./bob/sourceimages/psd/dzv_body.psd,d
2 ./bob/sourceimages/psd/dzv_body_nrm.psd,d
2 ./bob/sourceimages/psd/dzv_body_prm.psd,d
2 ./bob/sourceimages/psd/dzv_eyeball.psd,d
2 ./bob/sourceimages/psd/t_zbody.psd,d
2 ./bob/sourceimages/psd/t_gear.psd,d
2 ./bob/sourceimages/psd/t_pupil.psd,d
2 ./bob/sourceimages/z_vehicles_diff.tga,d
2 ./bob/sourceimages/zvehiclesa_diff.tga,d
5 ./bob/sourceimages/zvehicleswheel_diff.jpg,d
From that I would like to filter based on max number of files so > 4 for example, I would like to capture filetype as a variable for each remaining result e.g ./bob/sourceimages/zvehicleswheel_diff.jpg,d
I guess I could use awk for this?
Then finally I would like like to remove all the results from disk, with find I normally just do something like -exec rm -rf {} \; but I'm not clear how it would work here
Thanks a lot
EDITED
While this is clearly not the answer, these commands get me the info I want in the form I want it. I just need a way to put it all together and not search multiple times as that's total rubbish
filetype=$(find . -type d -name "*,d" -print0 | awk 'BEGIN { FS = "." }; {
print $3 }' | cut -d',' -f1)
filesize=$(find . -type d -name "*,d" -print0 | xargs -0 -I {} sh -c 'du -h
{};' | awk '{ print $1 }')
filenumbers=$(find . -type d -name "*,d" -print0 | xargs -0 -I {} sh -c
'echo -e $(find "{}" | wc -l);')

files_count=`ls -keys | nl`
For instance:
ls | nl
nl printed numbers of lines

Related

Finding most recent file from a list of directories from find command

I use find . -type d -name "Financials" to find all the directories called "Financials" under the current directory. Since I am on Mac, I can use the following (which I found from another stackoverflow question) to find the latest modified file in my current directory: find . -type f -print0 | xargs -0 stat -f "%m %N" | sort -rn | head -1 | cut -f2- -d" ". What I would like to do is find a way to pipe the results of the first command into the second command--i.e. to find the most recently modified file in each "Financials" directory. Is there a way to do this?
I think you could:
find . -type d -name "Financials" -print0 |
xargs -0 -I{} find {} -type f -print0 |
xargs -0 stat -f "%m %N" | sort -rn | head -1 | cut -f2- -d" "
But if you want separately for each dir, then... why not just loop it:
find . -type d -name "Financials" |
while IFS= read -r dir; do
echo "newest file in $dir is $(
find "$dir" -type f -print0 |
xargs -0 stat -f "%m %N" | sort -rn | head -1 | cut -f2- -d" "
)"
done
Nest the 2nd file+xargs inside a first find+xargs:
find . -type d -name "Financials" -print0 \
| xargs -0 sh -c '
for d in "$#"; do
find "$d" -type f -print0 \
| xargs -0 stat -f "%m %N" \
| sort -rn \
| head -1 \
| cut -f2- -d" "
done
' sh
Note the trailing "sh" in sh -c '...' sh -- that word becomes "$0" inside the shell script so the for-loop can iterate over all the directories.
A robust way that will also avoid problems with funny filenames that contain special characters is:
find all files within this particular subdirectory, and extract the inode number and modifcation time
$ find . -type f -ipath '*/Financials/*' -printf "%T# %i\n"
extract youngest file's inode number
$ ... | awk '($1>t){t=$1;i=$2}END{print i}'
search file information by inode number
$ find . -type f -inum "$inode" '*/Financials/*'
So this gives you:
$ inode="$(find . -type f -ipath '*/Financials/*' -printf "%T# %i\n" | awk '($1>t){t=$1;i=$2}END{print i}')"
$ find . -type f -inum "$inode" '*/Financials/*'

execute an if statement on every folder

I have for example 3 files (it could 1 or it could be 30) like this :
name_date1.tgz
name_date2.tgz
name_date3.tgz
When extracted it will look like :
name_date1/data/info/
name_date2/data/info/
name_date3/data/info/
Here how it looks inside each folder:
name_date1/data/info/
you.log
you.log.1.gz
you.log.2.gz
you.log.3.gz
name_date2/data/info/
you.log
name_date3/data/info/
you.log
you.log.1.gz
you.log.2.gz
What I want to do is concatenate all you file from each folder and concatenate one more time all the concatenated one to one single file.
1st step: extract all the folder
for a in *.tgz
do
a_dir=${a%.tgz}
mkdir $a_dir 2>/dev/null
tar -xvzf $a -C $a_dir >/dev/null
done
2nd step: executing an if statement on each folder available and cat everything
myarray=(`find */data/info/ -maxdepth 1 -name "you.log.*.gz"`)
ls -d */ | xargs -I {} bash -c "cd '{}' &&
if [ ${#myarray[#]} -gt 0 ];
then
find data/info -name "you.log.*.gz" -print0 | sort -z -rn -t. -k4 | xargs -0 zcat | cat -
data/info/you.log > youfull1.log
else
cat - data/info/you.log > youfull1.log
fi "
cat */youfull1.log > youfull.log
My issue when I put multiple name_date*.tgzit gives me this error:
gzip: stdin: unexpected end of file
With the error, I still have all my files concatenated, but why error message ?
But when I put only one .tgz file then I don't have any issue regardless the number you file.
any suggestion please ?
Try something simpler. No need for myarray. Pass files one at a time as they are inputted and decide what to do with them one at a time. Try:
find */data/info -type f -maxdepth 1 -name "you.log*" -print0 |
sort -z |
xargs -0 -n1 bash -c '
if [[ "${1##*.}" == "gz" ]]; then
zcat "$1";
else
cat "$1";
fi
' --
If you have to iterate over directories, don't use ls, still use find.
find . -maxdepth 1 -type d -name 'name_date*' -print0 |
sort -z |
while IFS= read -r -d '' dir; do
cat "$dir"/data/info/you.log
find "$dir"/data/info -type f -maxdepth 1 -name 'you.log.*.gz' -print0 |
sort -z -t'.' -n -k3 |
xargs -r -0 zcat
done
or (if you have to) with xargs, which should give you the idea how it's used:
find . -maxdepth 1 -type d -name 'name_date*' -print0 |
sort -z |
xargs -0 -n1 bash -c '
cat "$1"/data/info/you.log
find "$1"/data/info -type f -maxdepth 1 -name "you.log.*.gz" -print0 |
sort -z -t"." -n -k3 |
xargs -r -0 zcat
' --
Use -t option with xargs to see what it's doing.

Why does my xargs command with a pipe work only for a single file, but not multiple?

I am trying to pipe a few commands in a row; it works with a single file, but gives me an error once I try it on multiple files at once.
On a single file in my working folder:
find . -type f -iname "summary.5runs.*" -print0 | xargs -0 cut -f1-2 | head -n 2
#It works
Now I want to scan all files with a certain prefix/suffix in the name in all subdirectories of my working folder, then write the results to text file
find . -type f -iname "ww.*.out.txt" -print0 | xargs -0 cut -f3-5 | head -n 42 > summary.5runs.txt
#Error: xargs: cut: terminated by signal 13
I guess my problem is to reiterate through multiple files, but I am not sure how to do it.
Your final head stops after 42 lines of total output, but you want it to operate per file. You could fudge around with a subshell in xargs:
xargs -0 -I{} bash -c 'cut -f3-5 "$1" | head -n 42' _ {} > summary.5runs.txt
or you could make it part of an -exec action:
find . -type f -iname "ww.*.out.txt" \
-exec bash -c 'cut -f3-5 "$1" | head -n 42' _ {} \; > summary.5runs.txt
Alternatively, you could loop over all the files in the subshell so you have to spawn just one:
find . -type f -iname "ww.*.out.txt" \
-exec bash -c 'for f; do cut -f3-5 "$f" | head -n 42; done' _ {} + \
> summary.5runs.txt
Notice the {} + instead of {} \;.

How can I count the number of words in a directory recursively?

I'm trying to calculate the number of words written in a project. There are a few levels of folders and lots of text files within them.
Can anyone help me find out a quick way to do this?
bash or vim would be good!
Thanks
use find the scan the dir tree and wc will do the rest
$ find path -type f | xargs wc -w | tail -1
last line gives the totals.
tldr;
$ find . -type f -exec wc -w {} + | awk '/total/{print $1}' | paste -sd+ | bc
Explanation:
The find . -type f -exec wc -w {} + will run wc -w on all the files (recursively) contained by . (the current working directory). find will execute wc as few times as possible but as many times as is necessary to comply with ARG_MAX --- the system command length limit. When the quantity of files (and/or their constituent lengths) exceeds ARG_MAX, then find invokes wc -w more than once, giving multiple total lines:
$ find . -type f -exec wc -w {} + | awk '/total/{print $0}'
8264577 total
654892 total
1109527 total
149522 total
174922 total
181897 total
1229726 total
2305504 total
1196390 total
5509702 total
9886665 total
Isolate these partial sums by printing only the first whitespace-delimited field of each total line:
$ find . -type f -exec wc -w {} + | awk '/total/{print $1}'
8264577
654892
1109527
149522
174922
181897
1229726
2305504
1196390
5509702
9886665
paste the partial sums with a + delimiter to give an infix summation:
$ find . -type f -exec wc -w {} + | awk '/total/{print $1}' | paste -sd+
8264577+654892+1109527+149522+174922+181897+1229726+2305504+1196390+5509702+9886665
Evaluate the infix summation using bc, which supports both infix expressions and arbitrary precision:
$ find . -type f -exec wc -w {} + | awk '/total/{print $1}' | paste -sd+ | bc
30663324
References:
https://www.cyberciti.biz/faq/argument-list-too-long-error-solution/
https://www.in-ulm.de/~mascheck/various/argmax/
https://linux.die.net/man/1/find
https://linux.die.net/man/1/wc
https://linux.die.net/man/1/awk
https://linux.die.net/man/1/paste
https://linux.die.net/man/1/bc
You could find and print all the content and pipe to wc:
find path -type f -exec cat {} \; -exec echo \; | wc -w
Note: the -exec echo \; is needed in case a file doesn't end with a newline character, in which case the last word of one file and the first word of the next will not be separated.
Or you could find and wc and use awk to aggregate the counts:
find . -type f -exec wc -w {} \; | awk '{ sum += $1 } END { print sum }'
If there's one thing I've learned from all the bash questions on SO, it's that a filename with a space will mess you up. This script will work even if you have whitespace in the file names.
#!/usr/bin/env bash
shopt -s globstar
count=0
for f in **/*.txt
do
words=$(wc -w "$f" | awk '{print $1}')
count=$(($count + $words))
done
echo $count
Assuming you don't need to recursively count the words and that you want to include all the files in the current directory , you can use a simple approach such as:
wc -l *
10 000292_0
500 000297_0
510 total
If you want to count the words for only a specific extension in the current directory , you could try :
cat *.txt | wc -l

Bash: how to pipe each result of one command to another

I want to get the total count of the number of lines from all the files returned by the following command:
shell> find . -name *.info
All the .info files are nested in sub-directories so I can't simply do:
shell> wc -l *.info
Am sure this should be in any bash users repertoire, but am stuck!
Thanks
wc -l `find . -name *.info`
If you just want the total, use
wc -l `find . -name *.info` | tail -1
Edit: Piping to xargs also works, and hopefully can avoid the 'command line too long'.
find . -name *.info | xargs wc -l
You can use xargs like so:
find . -name *.info -print0 | xargs -0 cat | wc -l
some googling turns up
find /topleveldirectory/ -type f -exec wc -l {} \; | awk '{total += $1} END{print total}'
which seems to do the trick
#!/bin/bash
# bash 4.0
shopt -s globstar
sum=0
for file in **/*.info
do
if [ -f "$file" ];then
s=$(wc -l< "$file")
sum=$((sum+s))
fi
done
echo "Total: $sum"
find . -name "*.info" -exec wc -l {} \;
Note to self - read the question
find . -name "*.info" -exec cat {} \; | wc -l
# for a speed-up use: find ... -exec ... '{}' + | ...
find . -type f -name "*.info" -exec sed -n '$=' '{}' + | awk '{total += $0} END{print total}'

Resources