I want to group and sort file sizes by extension in the current folder and all subfolders.
for i in $(find . -type f -name '*.*' | sed 's/.*\.//' | sort | uniq)
do
    echo "$i"
done
This code gets all file extensions in the current folder and all subfolders. Now I need to sum the file sizes by extension and print the totals. Any ideas how this could be done?
Example output:
sh (files sizes sum by sh extension)
pl (files sizes sum by pl extension)
c (files sizes sum by c extension)
I would use a loop, so that you can provide a different extension every time and find just the files with that extension:
for extension in c php pl ...
do
find . -type f -name "*.$extension" -print0 | du --files0-from=- -hc
done
The sum is based on the answer to "total size of group of files selected with 'find'".
In case you want the very specific output you mention in the question, you can store the last line and then print it together with the extension name:
for extension in c php pl ...
do
sum=$(find . -type f -name "*.$extension" -print0 | du --files0-from=- -hc | tail -1)
echo "$extension ($sum)"
done
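Combining this with the extension discovery from the question avoids having to name the extensions by hand. A sketch assuming GNU du; the cut -f1 trims the word "total" off du's summary line:
for extension in $(find . -type f -name '*.*' | sed 's/.*\.//' | sort -u)
do
    sum=$(find . -type f -name "*.$extension" -print0 | du --files0-from=- -hc | tail -1 | cut -f1)
    echo "$extension ($sum)"
done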
If you don't want to name the file extensions beforehand, the stat(1) program has a format option (-c) that makes tasks like this a bit easier, provided you're on a system that includes it. xargs(1) usually helps performance, too.
#!/bin/sh
# For every file with an extension, emit "<size> <path>", reduce that to
# "<size> <extension>", then total the sizes per extension in awk.
find . -type f -name '*.*' -print0 |
    xargs -0 stat -c '%s %n' |   # GNU stat: size in bytes, then name
    sed 's/ .*\./ /' |           # reduce "<size> <path>.<ext>" to "<size> <ext>"
    awk '
    {
        sums[$2] += $1           # accumulate bytes per extension
    }
    END {
        for (key in sums) {
            printf "%s %d\n", key, sums[key]
        }
    }'
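The question also asked for sorted output; since each output line is an "extension total" pair, a numeric sort on the second column orders the result by total size (descending here; the script name is an assumption):
sh sumbyext.sh | sort -k2,2nr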
Related
I'm counting files in a photos folder:
% find . -type f | wc -l
22188
Then I'm counting files per extension:
% find . -type f | sed -n 's/..*\.//p' | sort | uniq -c
268 AVI
14983 JPG
61 MOV
1 MP4
131 MPG
1 VOB
21 avi
1 jpeg
6602 jpg
12 mov
20 mp4
74 mpg
12 png
The sum of that is 22187, not 22188. So I thought it could be a file without an extension:
% find . -type f ! -name "*.*"
But the result was empty. Maybe a file starting with .:
% find . -type f ! -name "?*.*"
But also empty. How can I find out what that file is?
I'm on macOS 10.15.
This command should find the missing file:
comm -3 <(find . -type f | sort) <(find . -type f | sed -n '/..*\./p' | sort)
Perhaps a file with an embedded carriage return (or linefeed)?
Would be curious to see what this generates:
find . -type f | grep -Eiv '\.(avi|jpe?g|mov|mp4|mpg|vob|png)$'
Would you please try:
find . -type f -name $'*\n*'
It will pick up filenames that contain a newline character.
ANSI-C quoting ($'...') is supported by the bash 3.2.x that ships with macOS.
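If that matches something, BSD/macOS ls can make the culprit visible; its -b flag prints non-printing characters as C escapes (a sketch):
find . -type f -name $'*\n*' -exec ls -b {} +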
I have two zip files called 10.88.10 and 10.88.12. One or more files in 10.88.12 have been modified. Is there any way I can find out which file has been modified?
The zip file contains a directory, a subdirectory, and zip files inside.
Code I've tried (I don't think I'm on the right path):
m1=$(md5sum 10.88.10.zip | cut -d' ' -f1)
m2=$(md5sum 10.88.12.zip | cut -d' ' -f1)
if [ "$m1" != "$m2" ]; then
    echo "files are not the same"
    cd "/c/Users/name/Downloads/10.88.10/"
    while [ "$(find . -type f -name '*.zip' | wc -l)" -gt 0 ]
    do
        cd "/c/Users/name/Downloads/10.88.10/"
        find . -type f -name "*.zip" -exec unzip -- '{}' \; -exec rm -- '{}' \;
    done
    cd "/c/Users/name/Downloads/10.88.12/"
    while [ "$(find . -type f -name '*.zip' | wc -l)" -gt 0 ]
    do
        find . -type f -name "*.zip" -exec unzip -- '{}' \; -exec rm -- '{}' \;
    done
    cd "/c/Users/name/Downloads/"
    find 10.88.10/* -type f -print0 | xargs -0 sha1sum | cut -d' ' -f1 > file1.txt
    find 10.88.12/* -type f -print0 | xargs -0 sha1sum | cut -d' ' -f1 > file2.txt
    diff file1.txt file2.txt
else
    echo false
fi
I tried hashing to find the modified files by comparing and getting unique values, but unfortunately I only receive the hashes and can't think of a way to get the name of the input file that corresponds to each hash.
Running the hash cmd:
find 10.88.10/* -type f -print0 | xargs -0 sha1sum
Output:
c3f2b563b3cb091e2adsss321221a3d *10.88.12/name.xml
Difference/Modified file in hash:
1c1
< 3c2a991d1231c3eae391fadsdadda19e8f7b85df8caf2d
---
> c3f2b56qwdq2112e375b40fbfd5e60f526da3d1874c1874
< fbdc82dasdaa30538e5adadadada2d9456ff86953fbeeb1
---
> f962e8eqeqeqqe3b65d3ed43559adc879f5600c738e1e1c
Required output:
< 10.88.10/FOLDER/FILE1.XML
---
> 10.88.12/FOLDER1/FILE1.XML
< 10.88.10/FOLDER/FILE2.TXT
---
> 10.88.12/FOLDER/FILE2.TXT
If anyone has a Java solution or a bash script, please share it.
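A direct fix for the hash-to-filename problem in the script above is to keep sha1sum's filename column instead of cutting it away, and to make the paths comparable by running find from inside each folder; diff then names the changed files. A minimal sketch, assuming filenames without newlines:
cd "/c/Users/name/Downloads/"
(cd 10.88.10 && find . -type f -print0 | xargs -0 sha1sum | sort -k2) > file1.txt
(cd 10.88.12 && find . -type f -print0 | xargs -0 sha1sum | sort -k2) > file2.txt
diff file1.txt file2.txt    # differing lines now carry both the hash and the filename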
The following is a shell script that leverages the sqlite3 command-line tool's ability to open zip files (via its zipfile virtual table), which avoids unzipping the archives into a temporary location, and uses some simple SQL to do all the work:
#!/bin/sh
oldfile="$1"
newfile="$2"
sqlite3 -batch -bail <<EOF
.mode tabs
.headers off
CREATE VIRTUAL TABLE oldfile USING zipfile('${oldfile}');
CREATE VIRTUAL TABLE newfile USING zipfile('${newfile}');
-- Show files present in newfile that are absent in oldfile
SELECT 'added', name
FROM (SELECT name FROM newfile EXCEPT SELECT name FROM oldfile)
ORDER BY name;
-- Show files missing from newfile that are present in oldfile
SELECT 'deleted', name
FROM (SELECT name FROM oldfile EXCEPT SELECT name FROM newfile)
ORDER BY name;
-- Show files whose contents differ between the two
SELECT 'modified', of.name
FROM oldfile AS of
JOIN newfile AS nf ON of.name = nf.name
WHERE of.data <> nf.data
ORDER BY of.name;
EOF
Example usage:
$ unzip -l test1.zip
Archive: test1.zip
Length Date Time Name
--------- ---------- ----- ----
0 2020-02-27 04:05 1/
4 2020-02-27 04:05 1/a.txt
4 2020-02-27 04:05 1/b.txt
4 2020-02-27 04:05 a.txt
--------- -------
12 4 files
$ unzip -l test2.zip
Archive: test2.zip
Length Date Time Name
--------- ---------- ----- ----
0 2020-02-27 04:07 1/
4 2020-02-27 04:07 1/a.txt
4 2020-02-27 04:06 a.txt
4 2020-02-27 04:06 b.txt
--------- -------
12 4 files
$ ./cmpzip test1.zip test2.zip
added b.txt
deleted 1/b.txt
modified 1/a.txt
(I'm not sure why you want diff-style output when all you seem to care about is whether a file changed, not what the change is, so this produces TSV output that's easier to understand and to process further.)
I am using the shell to count the number of files in a folder. For example, folder A has 100 files, and I just want to show 20% of them, which must be an integer, i.e. 20. This is my code, but it fails:
file_num=$(find . -type f | wc -l)
prob_select=0.2
file_num=$(expr $file_num \* $prob_select)
file_num=$( printf "%.0f" $file_num)
For a somewhat simpler approach that shows every nth file instead of requiring you to know how many there are before deciding which ones to display:
find . -type f | awk -v factor=5 'NR%factor == 0'
You can't do floating-point arithmetic like that in bash, but try converting 20% -> 0.2 -> 2/10 -> 1/5, so:
file_num=$(($(find . -type f | wc -l) / 5)); echo "${file_num}"
That gives you 20% of the number of files found.
Next, just run find . -type f | head -n "${file_num}"
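For other percentages, bash integer arithmetic still works as long as you multiply before dividing (a sketch; the division truncates, i.e. rounds down):
percent=20
file_num=$(( $(find . -type f | wc -l) * percent / 100 ))
find . -type f | head -n "${file_num}"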
I have a list of files. One time the list can contain:
1489247450-filename1
1489248450-filename2
1489249450-filename3
1489249550-filename4
and another time:
1489249450-filename3
1489249550-filename4
and another time:
1489245450-filename1
1489246450-filename2
1489247450-filename3
1489248450-filename4
1489249450-filename5
1489249550-filename6
The list is created by:
find ./ -type f -name '*filename*' -exec stat --format="%X-%n" {} \; | sort
I would like to choose all of the files except the 2 newest.
I can build a script that counts all the files, subtracts 2, and then does | head. But is there a simpler way to do this?
I would like to remove old files, with the only condition being that the 2 newest remain.
I don't want to use ctime, because the files are not created at regular intervals.
If the list is sorted oldest first (which plain sort gives you), drop the last two entries:
find ./ -type f -name '*filename*' -exec stat --format="%X-%n" {} \; | sort | head -n -2
If it is sorted newest first instead, skip the first two:
find ./ -type f -name '*filename*' -exec stat --format="%X-%n" {} \; | sort -r | tail -n +3
The result was really simple.
If you would like to list all files except the newest 3, you can do:
find ./ -type f -name "*605*" -exec stat --format="%X-%n" {} \; | sort | head -n -3
The head -n -3 is the key part!
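To actually delete them, note that each entry is a "timestamp-path" string, so the timestamp has to be stripped first. A sketch assuming GNU tools and filenames without newlines (using a space instead of - as the separator makes the split unambiguous):
find . -type f -name '*filename*' -exec stat --format='%X %n' {} \; | sort -n | head -n -2 | cut -d' ' -f2- | xargs -d '\n' rm --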
In order to write a cleanup script on a directory, I need to take look at all files that are older than one day. Additionally, I need to delete them in reverse order of modification time (oldest first) until a specified size is reached.
I came along with the following approach to list the files:
find . -mtime +1 -exec ls -a1rt {} +
Am I right that this does not work for a large number of files (since more than one 'ls' will be executed)? How can I achieve my goal in that case?
You can use the following command to find the 10 oldest files:
find . -mtime +1 -type f -printf '%T@ %p\n' | sort -n | head -10 | cut -d' ' -f2-
The steps used:
For each file returned by find, we print the modification timestamp along with the filename.
Then we numerically sort by the timestamp.
We take the 10 first.
We print only the filename part.
Later if you want to remove them, you can do the following:
rm $(...)
where ... is the command described above.
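Note that rm $(...) relies on word splitting and breaks on filenames containing whitespace; with GNU xargs, a newline-delimited pipe is safer (same assumptions as above, i.e. no newlines in filenames):
find . -mtime +1 -type f -printf '%T@ %p\n' | sort -n | head -10 | cut -d' ' -f2- | xargs -d '\n' rm --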
Here is a Perl script that you can use to delete the oldest files first in a given directory, until the total size of the files in the directory gets down to a given size:
&CleanupDir("/path/to/directory/", 30*1024*1024); #delete oldest files first in /path/to/directory/ until total size of files in /path/to/directory/ gets down to 30MB
sub CleanupDir {
    my($dirname, $dirsize) = @_;
    my($cmd, $r, @lines, $line, @vals, $b, $dsize, $fname);

    $b=1;
    while($b) {
        $cmd="du -sk " . $dirname . " | cut -f1";   # total size of the directory tree, in KB
        $r=`$cmd`;
        $dsize=$r * 1024;                           # convert to bytes
        #print $dsize . "\n";
        if($dsize>$dirsize) {
            $cmd="ls -lrt " . $dirname . " | head -n 100";  # oldest files first
            $r=`$cmd`;
            @lines=split(/\n/, $r);
            foreach $line (@lines) {
                @vals=split(" ", $line);
                if($#vals>=8) {                     # skip ls's "total" header line
                    if(length($vals[8])>0) {
                        # NB: $vals[8] breaks on filenames containing spaces
                        $fname=$dirname . $vals[8];
                        #print $fname . "\n";
                        unlink $fname;
                    }
                }
            }
        } else {
            $b=0;
        }
    }
}
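The same approach as a shell sketch, assuming GNU find and du and filenames without newlines: walk the files oldest first and delete until the directory drops below the limit.
dir=/path/to/directory
limit=$((30 * 1024 * 1024))               # target size in bytes
total=$(du -sb "$dir" | cut -f1)          # current size in bytes (GNU du)
find "$dir" -type f -printf '%T@ %s %p\n' | sort -n |
while read -r mtime size path; do
    [ "$total" -le "$limit" ] && break    # under the limit: stop deleting
    rm -- "$path"
    total=$((total - size))
done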