Sort output from the ls command

I'm trying to sort the output from ls. The order that I'm going for is this:
any directories with names that begin with _
any directories with names that begin with +
all soft links (which may include some dot files)
all remaining .files
all remaining .directories
everything else
Everything is sorted alphabetically within these 'sublists'. At the moment I'm using the find command a number of times to find files meeting the criteria above. Following that I pipe the output from find to sort and then pass the entire sorted list to ls:
#!/bin/bash
find1=`find . -maxdepth 1 -name "_*" -type d -printf "%f\n" | sort`
find2=`find . -maxdepth 1 -name "+*" -type d -printf "%f\n" | sort`
find3=`find . -maxdepth 1 -type l -printf "%f\n" | sort`
find4=`find . -maxdepth 1 -name ".*" -type f -printf "%f\n" | sort`
find5=`find . -maxdepth 1 \( ! -name "." \) -name ".*" -type d -printf "%f\n" | sort`
find6=`find . -maxdepth 1 \( ! -name "_*" \) \( ! -name "+*" \) \( ! -name ".*" \) \( ! -type l \) -printf "%f\n"`
find="$find1 $find2 $find3 $find4 $find5 $find6"
ls -dfhlF --color=auto $find
This doesn't handle any names that contain spaces, and overall seems a bit excessive. I'm sure there is a better way to do this. Any ideas?

Will this work for you? It prints the files in the order you specified, but it won't print them in color. In order to do that, you'd need to strip the ANSI codes from the names before pattern-matching them. As it is, it will handle filenames with embedded spaces, but not horribly pathological names, like those with embedded newlines or control characters.
I think the awk script is fairly self-explanatory, but let me know if you'd like clarification. The BEGIN line is processed before the ls output starts, and the END line is processed after all the output is consumed. The other lines start with an optional condition, followed by a sequence of commands enclosed in curly brackets. The commands are executed on (only) those lines that match the condition.
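As a minimal illustration of that structure (a toy example, separate from the actual solution below):
printf 'a\nbb\nccc\n' | awk '
BEGIN { print "start" }            # runs once, before any input is read
length($0) > 1 { long++ }          # condition + action: count lines longer than one character
END { print long " long lines" }   # runs once, after all input is consumed
'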
ls -ahlF --color=none | awk '
BEGIN { name_col = 45 }
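# name_col is the column where the filename starts in this ls output;
# adjust it if your ls prints wider or narrower columns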
{ name = substr($0, name_col) }
name == "" { next }
/^d/ && substr(name, 1, 1) == "_" { under_dirs = under_dirs $0 "\n"; next }
/^d/ && substr(name, 1, 1) == "+" { plus_dirs = plus_dirs $0 "\n"; next }
/^l/ { links = links $0 "\n"; next }
/^[^d]/ && substr(name, 1, 1) == "." { dot_files = dot_files $0 "\n"; next }
/^d/ && substr(name, 1, 1) == "." { dot_dirs = dot_dirs $0 "\n"; next }
{ others = others $0 "\n" }
END { print under_dirs plus_dirs links dot_files dot_dirs others }
'
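If you'd rather not parse ls at all, here is a rough glob-based sketch of the same ordering (my own variant, assuming bash and GNU ls; it handles spaces and keeps color, though I haven't checked every corner of your criteria):
#!/bin/bash
shopt -s nullglob dotglob   # empty globs expand to nothing; * also matches dot files
under=() plus=() links=() dotf=() dotd=() rest=()
for f in *; do
    if [ -L "$f" ]; then links+=("$f")   # soft links first, whatever they point at
    elif [ -d "$f" ]; then
        case $f in
            _*) under+=("$f") ;;
            +*) plus+=("$f") ;;
            .*) dotd+=("$f") ;;
            *)  rest+=("$f") ;;
        esac
    else
        case $f in
            .*) dotf+=("$f") ;;
            *)  rest+=("$f") ;;
        esac
    fi
done
# globs already expand alphabetically, and GNU ls -U preserves argument order
ls -dUhlF --color=auto "${under[@]}" "${plus[@]}" "${links[@]}" "${dotf[@]}" "${dotd[@]}" "${rest[@]}"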

Related

Bash to find missing file

I'm counting files in a photos folder:
% find . -type f | wc -l
22188
Then I'm counting files per extension:
% find . -type f | sed -n 's/..*\.//p' | sort | uniq -c
268 AVI
14983 JPG
61 MOV
1 MP4
131 MPG
1 VOB
21 avi
1 jpeg
6602 jpg
12 mov
20 mp4
74 mpg
12 png
The sum of that is 22187, not 22188. So I thought it could be a file without extension:
% find . -type f ! -name "*.*"
But the result was empty. Maybe a file starting with .:
% find . -type f ! -name "?*.*"
But also empty. How can I find out what that file is?
I'm on macOS 10.15.
This command should find the missing file:
comm -3 <(find . -type f | sort) <(find . -type f | sed -n '/..*\./p' | sort)
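In case comm is unfamiliar: it compares two sorted inputs, and -3 suppresses the lines common to both, leaving only the lines unique to each side. A tiny demo with hypothetical files:
printf 'a\nb\nc\n' > all
printf 'a\nc\n' > with_ext
comm -3 all with_ext    # prints "b", the only line not present in both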
Perhaps a file with an embedded carriage return (or linefeed)?
Would be curious to see what this generates:
find . -type f | grep -Eiv '\.avi|\.jpg|\.mov|\.mp4|\.mpg|\.vob|\.jpeg|\.png'
Would you please try:
find . -type f -name $'*\n*'
It will pick up filenames that contain a newline character.
ANSI-C quoting ($'...') is supported by the bash 3.2.x that ships with macOS.
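A quick way to test the newline hypothesis is to count the files with a NUL delimiter instead of newlines; if this prints 22187, some name contains an embedded newline (a sketch using find -print0, which macOS supports):
find . -type f -print0 | tr -dc '\0' | wc -c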

Bash: find all files but not the 2 newest

I have a list of files. One time the list can contain:
1489247450-filename1
1489248450-filename2
1489249450-filename3
1489249550-filename4
and another time:
1489249450-filename3
1489249550-filename4
and another time:
1489245450-filename1
1489246450-filename2
1489247450-filename3
1489248450-filename4
1489249450-filename5
1489249550-filename6
The list is created by:
find ./ -type f -name '*filename*' -exec stat --format="%X-%n" {} \; | sort
I would like to choose all of the files but not the 2 newest.
I could write a script that counts all the files, subtracts 2, and then uses | head. But is there a simpler way to do this?
I would like to remove old files only on the condition that the 2 newest remain.
I don't want to use ctime because the files are not created at regular intervals.
If the list is in the right order:
find ./ -type f -name '*filename*' -exec stat --format="%X-%n" {} \; | sort | tail -n +3
Otherwise:
find ./ -type f -name '*filename*' -exec stat --format="%X-%n" {} \; | sort -r | tail -n +3
The solution turned out to be really simple.
If you would like to list all files but the newest 3, you can do:
find ./ -type f -name "*605*" -exec stat --format="%X-%n" {} \; | sort | head -n -3
The head -n -3 is the main thing!
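A quick way to see the difference between the two forms (note that head -n -N needs GNU coreutils; the head on macOS does not accept a negative count):
printf '%s\n' 1 2 3 4 5 | head -n -2    # drop the last 2 lines: prints 1 2 3
printf '%s\n' 1 2 3 4 5 | tail -n +3    # drop the first 2 lines: prints 3 4 5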

Listing files that are older than one day in reverse order of modification time

In order to write a cleanup script on a directory, I need to take look at all files that are older than one day. Additionally, I need to delete them in reverse order of modification time (oldest first) until a specified size is reached.
I came up with the following approach to list the files:
find . -mtime +1 -exec ls -a1rt {} +
Am I right that this does not work for a large number of files (since more than one 'ls' will be executed)? How can I achieve my goal in that case?
You can use the following command to find the 10 oldest files:
find . -mtime +1 -type f -printf '%T# %p\n' | sort -n | head -10 | awk '{print $2}'
The steps used:
For each file returned by find, we print the modification timestamp along with the filename.
Then we numerically sort by the timestamp.
We take the first 10.
We print only the filename part.
Later if you want to remove them, you can do the following:
rm $(...)
where ... is the command described above.
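Be aware that rm $(...) will mangle names containing spaces, since the awk step prints only the first word after the timestamp. With GNU find and coreutils 8.25 or later, a NUL-delimited variant of the same pipeline is safer (a sketch, not tested on other platforms):
find . -mtime +1 -type f -printf '%T@ %p\0' |   # NUL-terminated "timestamp path" records
    sort -zn |                                  # numeric sort on NUL-delimited records
    head -zn 10 |                               # keep the 10 oldest
    cut -zd' ' -f2- |                           # strip the timestamp, keep the full path
    xargs -0r rm --                             # -r: do nothing if the list is empty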
Here is a perl script that you can use to delete the oldest files first in a given directory, until the total size of the files in the directory gets down to a given size:
&CleanupDir("/path/to/directory/", 30*1024*1024); #delete oldest files first in /path/to/directory/ until total size of files in /path/to/directory/ gets down to 30MB
sub CleanupDir {
    my($dirname, $dirsize) = @_;
    my($cmd, $r, @lines, $line, @vals, $b, $dsize, $fname);

    $b = 1;
    while($b) {
        $cmd = "du -k " . $dirname . " | cut -f1";
        $r = `$cmd`;
        $dsize = $r * 1024;
        #print $dsize . "\n";
        if($dsize > $dirsize) {
            $cmd = "ls -lrt " . $dirname . " | head -n 100";
            $r = `$cmd`;
            @lines = split(/\n/, $r);
            foreach $line (@lines) {
                @vals = split(" ", $line);
                if($#vals >= 8) {
                    if(length($vals[8]) > 0) {
                        $fname = $dirname . $vals[8];
                        #print $fname . "\n";
                        unlink $fname;
                    }
                }
            }
        } else {
            $b = 0;
        }
    }
}

Unix shell group files extensions by size

I want to group and sum file sizes by extension in the current folder and all subfolders.
for i in `find . -type f -name '*.*' | sed 's/.*\.//' | sort | uniq`
do
    echo $i
done
This code gets all the file extensions in the current folder and all subfolders.
Now I need to sum the file sizes by those extensions and print them.
Any ideas how this could be done?
example output:
sh (files sizes sum by sh extension)
pl (files sizes sum by pl extension)
c (files sizes sum by c extension)
I would use a loop, so that you can provide a different extension every time and find just the files with that extension:
for extension in c php pl ...
do
find . -type f -name "*.$extension" -print0 | du --files0-from=- -hc
done
The sum is based on the answer in total size of group of files selected with 'find'.
In case you want the very specific output you mention in the question, you can store the last line and then print it together with the extension name:
for extension in c php pl ...
do
sum=$(find . -type f -name "*.$extension" -print0 | du --files0-from=- -hc | tail -1)
echo "$extension ($sum)"
done
If you don't want to name file extensions beforehand, the stat(1) program has a format option (-c) that can make tasks like this a bit easier, if you're on a system that includes it, and xargs(1) usually helps performance.
#!/bin/sh
find . -type f -name '*.*' -print0 |
xargs -0 stat -c '%s %n' |
sed 's/ .*\./ /' |
awk '
{
sums[$2] += $1
}
END {
for (key in sums) {
printf "%s %d\n", key, sums[key]
}
}'
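If you want the totals sorted and human-readable, GNU numfmt can post-process the byte counts (assuming GNU coreutils; the rest of the pipeline is unchanged):
find . -type f -name '*.*' -print0 |
    xargs -0 stat -c '%s %n' |
    sed 's/ .*\./ /' |
    awk '{ sums[$2] += $1 } END { for (k in sums) print k, sums[k] }' |
    sort -k2 -rn |                    # largest totals first
    numfmt --field=2 --to=iec         # e.g. 1536000 -> 1.5M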

Bash script to help identify files and paths with a count sorted by day of the week recursively

I'm not really a scripter (yet), so apologies in advance.
What I need to do is search a path for files modified within the last 7 days, then count the number of files for each day (Mon to Sun) in each directory.
So for example:
From folder - Rootfiles
Directory 1 :
Number of files Monday
Number of files ..n
Number of files Sunday
Directory 2 :
Number of files Monday
Number of files ..n
Number of files Sunday
So far I have this from my basic command line knowledge and a bit of research.
#!/bin/bash
find . -type f -mtime -7 -exec ls -l {} \; | grep "^-" | awk '{
key=$6$7
freq[key]++
}
END {
for (date in freq)
printf "%s\t%d\n", date, freq[date]
}'
But there are a couple of problems: I need to print each directory, and then I need to figure out the Monday, Tuesday, Wednesday sort.
And for some reason it works on my test folders with basic folders and names, but not on the production folders.
Even some pointers on where to start thinking would be helpful.
Thanks in advance all, you're all awesome!
Neil
I found some additional code that is helping:
#!/bin/bash
# pass in the directory to search on the command line; use $PWD if no arg received
rdir=${1:-$(pwd)}
# if $rdir is a file, get its directory
if [ -f "$rdir" ]; then
    rdir=$(dirname "$rdir")
fi
# first, find our tree of directories
for dir in $( find "$rdir" -type d -print ); do
    # get a count of directories within $dir.
    sdirs=$( find "$dir" -maxdepth 1 -type d | wc -l );
    # only proceed if sdirs is less than 2 ( 1 = self ).
    if (( sdirs < 2 )); then
        # get a count of all the files in $dir, but not in subdirs of $dir
        files=$( find "$dir" -maxdepth 1 -type f | wc -l );
        echo "$dir : $files";
    fi
done
if I could somehow replace the line
sdirs=$( find $dir -maxdepth 1 -type d | wc -l );
with my original code block that would help.
props to
https://unix.stackexchange.com/questions/22803/counting-files-in-leaves-of-directory-tree
for that bit of code
Neat problem.
I think in your find command you will want to add --time-style=+%w to get the day of the week.
find . -type f -mtime -7 -exec ls -l --time-style=+%w {} \;
I'm not sure why you are grepping for lines that start with a dash (since you're already only finding files). This is not necessary, so I would remove it.
Then I would get the directory names from this output by stripping the filenames, or everything after the last slash from each line.
| sed -e 's:/[^/]*$:/:'
Then I would cut out all the tokens before the day of the week. Since you're using . as the starting point, you can expect each directory to start with ./.
| sed -e 's:.*\([0-6]\) \./:\1 ./:'
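To see what those two sed steps do, take a hypothetical ls -l --time-style=+%w line (the name and sizes are made up):
-rw-r--r-- 1 neil users 1234 3 ./M3Javadoc/Foo.html     (raw line from ls)
-rw-r--r-- 1 neil users 1234 3 ./M3Javadoc/             (after stripping the filename)
3 ./M3Javadoc/                                          (after keeping only day and directory)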
From here you can sort -k2 to sort by directory name and then day of the week.
Eventually you can pipe this into uniq -c to get the per-day counts for each directory, but I would convert it to human-readable days first.
| awk '
/^0/ { $1 = "Sunday   " }
/^1/ { $1 = "Monday   " }
/^2/ { $1 = "Tuesday  " }
/^3/ { $1 = "Wednesday" }
/^4/ { $1 = "Thursday " }
/^5/ { $1 = "Friday   " }
/^6/ { $1 = "Saturday " }
{ print $0 }
'
Putting this all together:
find . -type f -mtime -7 -exec ls -l --time-style=+%w {} \; \
| sed -e 's:/[^/]*$:/:' \
| sed -e 's:.*\([0-6]\) \./:\1 ./:' \
| sort -k2 \
| awk '
/^0/ { $1 = "Monday " }
/^1/ { $1 = "Tuesday " }
/^2/ { $1 = "Wednesday" }
/^3/ { $1 = "Thursday " }
/^4/ { $1 = "Friday " }
/^5/ { $1 = "Saturday " }
/^6/ { $1 = "Sunday " }
{ print $0 }
' | uniq -c
On my totally random PWD it looks like this:
1 Monday ./
1 Tuesday ./
5 Saturday ./
2 Sunday ./
17 Monday ./M3Javadoc/
1 Thursday ./M3Javadoc/
1 Saturday ./M3Javadoc/
1 Sunday ./M3Javadoc/
