Bash: find all files except the 2 newest

I have a list of files. At one time the list can contain:
1489247450-filename1
1489248450-filename2
1489249450-filename3
1489249550-filename4
and another time:
1489249450-filename3
1489249550-filename4
and another time:
1489245450-filename1
1489246450-filename2
1489247450-filename3
1489248450-filename4
1489249450-filename5
1489249550-filename6
The list is created by:
find ./ -type f -name "*filename*" -exec stat --format="%X-%n" {} \; | sort
I would like to select all of the files except the 2 newest.
I could write a script that counts the files, subtracts 2, and pipes the list through head. Is there a simpler way to do this?
I want to remove old files only on the condition that the 2 newest remain.
I don't want to use ctime because the files are not created at regular intervals.

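tail -n +3 prints its input from the third line onward, so it drops the first two lines. If the list is sorted newest first, that removes the 2 newest:
find ./ -type f -name "*filename*" -exec stat --format="%X-%n" {} \; | sort -r | tail -n +3
With the ascending sort from the question, the same tail would drop the 2 oldest instead; in that case the lines have to be cut from the end of the list.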

The solution turned out to be really simple.
If you would like to list all files except the newest 3, you can do:
find ./ -type f -name "*605*" -exec stat --format="%X-%n" {} \; | sort | head -n -3
The key part is head -n -3: with a negative count, GNU head prints everything except the last 3 lines.
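A negative count for head is a GNU extension, so it may not be available everywhere. As a rough sketch for such systems, the same "everything except the last N lines" filter can be written with awk. The function name all_but_last is made up for this example and is not part of any tool.

```shell
# all_but_last N: print stdin without its last N lines.
# Hypothetical helper; buffers the whole input, so it is meant for short lists.
all_but_last() {
  awk -v n="$1" '
    { buf[NR] = $0 }                                   # remember every line
    END { for (i = 1; i <= NR - n; i++) print buf[i] } # skip the last n
  '
}

# Possible usage with the list from the question (oldest first):
#   find ./ -type f -name "*filename*" -exec stat --format="%X-%n" {} \; | sort | all_but_last 2
```

If the list holds 2 files or fewer, nothing is printed, which matches the condition that the 2 newest files must always remain.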

Related

Bash to find missing file

I'm counting files in a photos folder:
% find . -type f | wc -l
22188
Then I'm counting files per extension:
% find . -type f | sed -n 's/..*\.//p' | sort | uniq -c
268 AVI
14983 JPG
61 MOV
1 MP4
131 MPG
1 VOB
21 avi
1 jpeg
6602 jpg
12 mov
20 mp4
74 mpg
12 png
The sum of that is 22187, not 22188. So I thought it could be a file without extension:
% find . -type f ! -name "*.*"
But the result was empty. Maybe a file starting with .:
% find . -type f ! -name "?*.*"
But also empty. How can I find out what that file is?
I'm on macOS 10.15.
This command should find the missing file:
comm -3 <(find . -type f | sort) <(find . -type f | sed -n '/..*\./p' | sort)
Perhaps a file with an embedded carriage return (or linefeed)?
Would be curious to see what this generates:
find . -type f | grep -Eiv '\.avi|\.jpg|\.mov|\.mp4|\.mpg|\.vob|\.avi|\.jpeg|\.png'
Would you please try:
find . -type f -name $'*\n*'
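It picks up filenames that contain a newline character. Such a name is counted as two lines by wc -l but can produce only one extension in the sed output, which would explain the difference of one.
The ANSI-C quoting ($'...') is supported by the bash 3.2.x that ships with macOS.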
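To illustrate why the two counts can disagree, here is a small sketch that counts files by NUL separators instead of lines, so a newline inside a name cannot inflate the total. It assumes a find that supports -print0 (GNU and BSD find both do); count_files is a name invented for this example.

```shell
# count_files [DIR]: count regular files under DIR (default: .)
# without being fooled by newlines in file names.
# Hypothetical helper; relies on find -print0.
count_files() {
  find "${1:-.}" -type f -print0 |
    tr -dc '\0' |   # keep only the NUL terminators, one per file
    wc -c |         # count them
    tr -d ' '       # strip the padding some wc implementations add
}
```

If count_files reports one less than find . -type f | wc -l, a file name with an embedded newline is the likely cause.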

Compare two version of zip file and find which file has been modified within that zip

I have two zip files called 10.88.10 and 10.88.12. One or more files in 10.88.12 have been modified. Is there any way I can find out which file has been modified?
The zip file contains a directory, a subdirectory, and zip files inside.
Code I've tried (I don't think I am on the right path):
m1=$(md5sum 10.88.10.zip | cut -d' ' -f1)
m2=$(md5sum 10.88.12.zip | cut -d' ' -f1)
if [ "$m1" != "$m2" ]; then
    echo "files are not the same"
    cd "/c/Users/name/Downloads/10.88.10/"
    # Keep extracting until no nested zip files remain
    while [ "$(find . -type f -name '*.zip' | wc -l)" -gt 0 ]
    do
        cd "/c/Users/name/Downloads/10.88.10/"
        find . -type f -name "*.zip" -exec unzip -- '{}' \; -exec rm -- '{}' \;
    done
    cd "/c/Users/name/Downloads/10.88.12/"
    while [ "$(find . -type f -name '*.zip' | wc -l)" -gt 0 ]
    do
        find . -type f -name "*.zip" -exec unzip -- '{}' \; -exec rm -- '{}' \;
    done
    cd "/c/Users/name/Downloads/"
    find 10.88.10/* -type f -print0 | xargs -0 sha1sum | cut -d' ' -f1 > file1.txt
    find 10.88.12/* -type f -print0 | xargs -0 sha1sum | cut -d' ' -f1 > file2.txt
    diff file1.txt file2.txt
else
    echo false
fi
I tried comparing hashes to find the modified files, but I only get the hashes and can't think of a way to get the names of the input files that correspond to them.
Running the hash cmd:
find 10.88.10/* -type f -print0 | xargs -0 sha1sum
Output:
c3f2b563b3cb091e2adsss321221a3d *10.88.12/name.xml
Difference/Modified file in hash:
1c1
< 3c2a991d1231c3eae391fadsdadda19e8f7b85df8caf2d
---
> c3f2b56qwdq2112e375b40fbfd5e60f526da3d1874c1874
< fbdc82dasdaa30538e5adadadada2d9456ff86953fbeeb1
---
> f962e8eqeqeqqe3b65d3ed43559adc879f5600c738e1e1c
Required output:
< 10.88.10/FOLDER/FILE1.XML
---
> 10.88.12/FOLDER1/FILE1.XML
< 10.88.10/FOLDER/FILE2.TXT
---
> 10.88.12/FOLDER/FILE2.TXT
If anyone has a Java solution or a bash script, please share it.
The following shell script uses the sqlite3 command-line tool's ability to open zip files, which avoids unzipping them into a temporary location, and does all the work with some simple SQL:
#!/bin/sh
oldfile="$1"
newfile="$2"
sqlite3 -batch -bail <<EOF
.mode tabs
.headers off
CREATE VIRTUAL TABLE oldfile USING zipfile('${oldfile}');
CREATE VIRTUAL TABLE newfile USING zipfile('${newfile}');
-- Show files present in newfile that are absent in oldfile
SELECT 'added', name
FROM (SELECT name FROM newfile EXCEPT SELECT name FROM oldfile)
ORDER BY name;
-- Show files missing from newfile that are present in oldfile
SELECT 'deleted', name
FROM (SELECT name FROM oldfile EXCEPT SELECT name FROM newfile)
ORDER BY name;
-- Show files whose contents differ between the two
SELECT 'modified', of.name
FROM oldfile AS of
JOIN newfile AS nf ON of.name = nf.name
WHERE of.data <> nf.data
ORDER BY of.name;
EOF
Example usage:
$ unzip -l test1.zip
Archive:  test1.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
        0  2020-02-27 04:05   1/
        4  2020-02-27 04:05   1/a.txt
        4  2020-02-27 04:05   1/b.txt
        4  2020-02-27 04:05   a.txt
---------                     -------
       12                     4 files
$ unzip -l test2.zip
Archive:  test2.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
        0  2020-02-27 04:07   1/
        4  2020-02-27 04:07   1/a.txt
        4  2020-02-27 04:06   a.txt
        4  2020-02-27 04:06   b.txt
---------                     -------
       12                     4 files
$ ./cmpzip test1.zip test2.zip
added b.txt
deleted 1/b.txt
modified 1/a.txt
(I'm not sure why you want diff-style output when all you seem to care about is if a file changed, not what the change is, so this produces TSV output that's easier to understand and work with in further processing)
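If the archives have already been extracted into two directories, as in the question's attempt, the file names can be kept next to the hashes by hashing from inside each directory, so the relative paths line up. The following is only a sketch of that idea: hash_tree and changed_files are names made up here, and it assumes sha1sum is available and that file names contain no newlines.

```shell
# hash_tree DIR: print "hash  ./relative/path" for every regular file in DIR.
hash_tree() {
  ( cd "$1" && find . -type f -exec sha1sum {} + )
}

# changed_files OLD_DIR NEW_DIR: print relative paths that were modified,
# added, or deleted between the two trees. Hypothetical helper.
changed_files() {
  old_list=$(mktemp)
  new_list=$(mktemp)
  hash_tree "$1" > "$old_list"
  hash_tree "$2" > "$new_list"
  # A line that occurs only once across both listings belongs to a file
  # whose hash differs or that exists on one side only.
  sort "$old_list" "$new_list" | uniq -u |
    sed 's/^[^ ]* [ *]//' |   # drop the hash and the text/binary marker
    sort -u                   # a modified file shows up twice; list it once
  rm -f "$old_list" "$new_list"
}

# Possible usage:
#   changed_files 10.88.10 10.88.12
```

Unlike the sqlite3 script, this does not distinguish between added, deleted, and modified files; it only lists the paths that are worth a closer look.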

How to get 20% the total number of file in a folder?

I am using the shell to count the number of files in a folder. For example, if folder A has 100 files, I want to show only 20% of them, and the result must be an integer, i.e. 20. This is my code, but it fails:
file_num= find . -type f | wc -l
prob_select=0.2
file_num=$(expr $file_num \* $prob_select)
file_num=$( printf "%.0f" $file_num)
For a somewhat simpler approach, which shows every nth file instead of requiring you to know how many there are before deciding which ones to display:
find . -type f | awk -v factor=5 'NR%factor == 0'
You can't do floating-point arithmetic like that in bash, but you can convert 20% -> 0.2 -> 2/10 -> 1/5, so:
file_num=$(($(find . -type f | wc -l) / 5)); echo "${file_num}"
This gives you 20% of the number of files found, rounded down.
Next, just run find . -type f | head -n "${file_num}"
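The division by 5 only works for exactly 20%. For an arbitrary whole-number percentage, a minimal sketch with integer arithmetic is to multiply first and then divide by 100. The function name percent_of is invented for this example.

```shell
# percent_of TOTAL PERCENT: PERCENT% of TOTAL as an integer, rounded down.
# Hypothetical helper using only shell integer arithmetic.
percent_of() {
  echo $(( $1 * $2 / 100 ))
}

# Possible usage:
#   total=$(find . -type f | wc -l)
#   find . -type f | head -n "$(percent_of "$total" 20)"
```

Multiplying before dividing matters here: dividing first would truncate small totals to zero before the percentage is applied.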

Unix shell group files extensions by size

I want to group file sizes by extension, and sort them, in the current folder and all subfolders.
for i in $(find . -type f -name '*.*' | sed 's/.*\.//' | sort | uniq)
do
    echo "$i"
done
I have this code, which gets all file extensions in the current folder and all subfolders.
Now I need to sum the file sizes for each of those extensions and print them.
Any ideas how this could be done?
example output:
sh (files sizes sum by sh extension)
pl (files sizes sum by pl extension)
c (files sizes sum by c extension)
I would use a loop, so that you can provide a different extension every time and find just the files with that extension:
for extension in c php pl ...
do
    find . -type f -name "*.$extension" -print0 | du --files0-from=- -hc
done
The sum is based on the answer to "total size of group of files selected with 'find'".
In case you want the very specific output you mention in the question, you can store the last line (the grand total printed by du -c) and then print it together with the extension name:
for extension in c php pl ...
do
    sum=$(find . -type f -name "*.$extension" -print0 | du --files0-from=- -hc | tail -n 1)
    echo "$extension ($sum)"
done
If you don't want to name file extensions beforehand, the stat(1) program has a format option (-c) that can make tasks like this a bit easier, if you're on a system that includes it, and xargs(1) usually helps performance.
#!/bin/sh
find . -type f -name '*.*' -print0 |
xargs -0 stat -c '%s %n' |
sed 's/ .*\./ /' |
awk '
{
sums[$2] += $1
}
END {
for (key in sums) {
printf "%s %d\n", key, sums[key]
}
}'
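The question also asks for the result to be sorted, which the script above does not do. A possible variation, sketched here under the same assumption that GNU stat -c is available, sorts the per-extension totals from largest to smallest. The function name sizes_by_ext is made up for this example, and names containing newlines are not handled.

```shell
# sizes_by_ext [DIR]: print "extension total_bytes", largest total first.
# Hypothetical helper; requires GNU stat for the -c format option.
sizes_by_ext() {
  find "${1:-.}" -type f -name '*.*' -exec stat -c '%s %n' {} + |
    awk '
      {
        size = $1
        ext = $0
        sub(/.*\./, "", ext)   # keep only what follows the last dot
        sums[ext] += size
      }
      END { for (e in sums) printf "%s %d\n", e, sums[e] }
    ' |
    sort -k2,2nr               # numeric, descending, on the size column
}
```

The extension is taken from the whole line rather than from a whitespace-separated field, so file names with spaces are still grouped by their real extension.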

Sort output from ls command

I'm trying to sort the output from ls. The order that I'm going for is this:
any directories with names that begin with _
any directories with names that begin with +
all soft links (which may include some dot files)
all remaining .files
all remaining .directories
everything else
Everything is sorted alphabetically within these 'sublists'. At the moment I'm using the find command a number of times to find files meeting the criteria above. Following that I pipe the output from find to sort and then pass the entire sorted list to ls:
#!/bin/bash
find1=`find . -maxdepth 1 -name "_*" -type d -printf "%f\n" | sort`
find2=`find . -maxdepth 1 -name "+*" -type d -printf "%f\n" | sort`
find3=`find . -maxdepth 1 -type l -printf "%f\n" | sort`
find4=`find . -maxdepth 1 -name ".*" -type f -printf "%f\n" | sort`
find5=`find . -maxdepth 1 \( ! -name "." \) -name ".*" -type d -printf "%f\n" | sort`
find6=`find . -maxdepth 1 \( ! -name "_*" \) \( ! -name "+*" \) \( ! -name ".*" \) \( ! -type l \) -printf "%f\n"`
find="$find1 $find2 $find3 $find4 $find5 $find6"
ls -dfhlF --color=auto $find
This doesn't handle any names that contain spaces, and overall seems a bit excessive. I'm sure there is a better way to do this. Any ideas?
Will this work for you? It prints the files in the order you specified, but it won't print them in color. In order to do that, you'd need to strip the ANSI codes from the names before pattern-matching them. As it is, it will handle filenames with embedded spaces, but not horribly pathological names, like those with embedded newlines or control characters.
I think the awk script is fairly self-explanatory, but let me know if you'd like clarification. The BEGIN line is processed before the ls output starts, and the END line is processed after all the output is consumed. The other lines start with an optional condition, followed by a sequence of commands enclosed in curly brackets. The commands are executed on (only) those lines that match the condition.
ls -ahlF --color=none | awk '
BEGIN { name_col = 45 }
{ name = substr($0, name_col) }
name == "" { next }
/^d/ && substr(name, 1, 1) == "_" { under_dirs = under_dirs $0 "\n"; next }
/^d/ && substr(name, 1, 1) == "+" { plus_dirs = plus_dirs $0 "\n"; next }
/^l/ { links = links $0 "\n"; next }
/^[^d]/ && substr(name, 1, 1) == "." { dot_files = dot_files $0 "\n"; next }
/^d/ && substr(name, 1, 1) == "." { dot_dirs = dot_dirs $0 "\n"; next }
{ others = others $0 "\n" }
END { print under_dirs plus_dirs links dot_files dot_dirs others }
'
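The awk script depends on a fixed name column (name_col = 45), which varies with the ls output format. As an alternative sketch that avoids parsing ls altogether, each entry can be given a numeric rank that follows the six groups from the question, and the names can then be sorted by rank and name. rank_entry and list_sorted are names invented here; the sketch prints plain names (not ls -l details) and assumes names contain no newlines.

```shell
# rank_entry NAME: print the group number (1-6) for one directory entry,
# following the order given in the question. Hypothetical helper.
rank_entry() {
  if [ -L "$1" ]; then
    echo 3                                  # soft links, including dot links
  elif [ -d "$1" ]; then
    case $1 in
      _*) echo 1 ;;                         # directories starting with _
      +*) echo 2 ;;                         # directories starting with +
      .*) echo 5 ;;                         # remaining dot directories
      *)  echo 6 ;;
    esac
  else
    case $1 in
      .*) echo 4 ;;                         # remaining dot files
      *)  echo 6 ;;
    esac
  fi
}

# list_sorted: print the entries of the current directory in that order.
list_sorted() {
  tab=$(printf '\t')
  for f in .* *; do
    case $f in .|..) continue ;; esac
    [ -e "$f" ] || [ -L "$f" ] || continue  # skip unmatched glob patterns
    printf '%s\t%s\n' "$(rank_entry "$f")" "$f"
  done | sort -t "$tab" -k1,1n -k2 | cut -f2-
}
```

Because the names are never split on whitespace, entries with embedded spaces survive intact. The resulting names could be read line by line and passed to ls -d if the long listing is still wanted.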

Resources