List files that match a list without duplicates, keeping the highest version number - bash

Sorry if this is a repeated question, but I have been looking for several hours.
I have a list of files generated by
find /usr/lib64/ -maxdepth 1 -type l -printf "%f\n"
that prints something like
libncurses.so.6
libaudit.so.1
libncurses.so.5
libicuuc.so.65
libnghttp2.so.14
libicuuc.so.71
I would like to keep only the files with the highest version number:
libncurses.so.6
libaudit.so.1
libnghttp2.so.14
libicuuc.so.71
Thanks a lot for your help.

Maybe with awk?
find /usr/lib64/ -maxdepth 1 -type l -printf '%f\n' |
awk -F '\\.so\\.' -v OFS='.so.' '
$2 > vers[$1] { vers[$1] = $2 }
END { for (lib in vers) print lib, vers[lib] }
'
libaudit.so.1
libicuuc.so.71
libnghttp2.so.14
libncurses.so.6
Note: you might need to implement a more accurate operator than > for comparing versions, or use:
find /usr/lib64/ -maxdepth 1 -type l -printf '%f\n' |
sort -rV |
awk -F '\\.so\\.' -v OFS='.so.' '!seen[$1]++'
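For example, plain comparisons can mis-rank multi-part versions (libfoo below is a made-up name, just to illustrate):
printf '%s\n' libfoo.so.1.2.9 libfoo.so.1.2.10 |
sort -rV |
awk -F '\\.so\\.' '!seen[$1]++'
This prints libfoo.so.1.2.10: version sort ranks 1.2.10 above 1.2.9, where a string comparison would not.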

You could chain some commands with a pipe, something like:
find /usr/lib64/ -maxdepth 1 -type l -printf "%f\n" |
sort -rn -k3 -t. |
awk -F. '!seen[$1]++'
The code above assumes that the file names always consist of exactly three dot-separated fields.
In pure bash, using an associative array, with GNU find and sort:
#!/usr/bin/env bash
declare -A uniq
while IFS= read -rd '' files; do
  if ((!uniq[${files%%.*}]++)); then
    printf '%s\n' "$files"
  fi
done < <(
  find /usr/lib64/ -maxdepth 1 -type l -printf "%f\0" |
  sort -rnz -k3 -t.
)
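With the sample names from the question, both this loop and the sort/awk pipe above should print:
libicuuc.so.71
libnghttp2.so.14
libncurses.so.6
libaudit.so.1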

You can use the following to sort:
find /usr/lib64/ -maxdepth 1 -type l -printf "%f\n" | sort -V
To keep only the latest version of each library, note that uniq -f skips whitespace-separated fields, so -f2 does nothing useful on these dot-separated names (there are no blanks to split on). Let awk remember the last entry per library instead; since sort -V sorts ascending, the last line seen for each library name is its highest version:
find /usr/lib64/ -maxdepth 1 -type l -printf "%f\n" | sort -V | awk -F '\\.so\\.' '{ latest[$1] = $0 } END { for (lib in latest) print latest[lib] }'

Related

Get files from directories alphabetically sorted with bash

I have this code that works in the directory that I execute:
pathtrabajo=.
filedirs="files.txt"
dirs=$(find . -maxdepth 1 -mindepth 1 -type d | sort -n)
SAVEIFS=$IFS
IFS=$(echo -en "\n\b")
for entry in $dirs
do
echo "${entry}" >> "${filedirs}"
find "$entry" -maxdepth 1 -mindepth 1 -name '*.md' -printf '%f\n' | sort | sed 's/\.md$//1' | awk '{print "- [["$0"]]"}' >> "${filedirs}"
done
IFS=$SAVEIFS
But when I try to make it generic so it works with variables, find gives an error:
pathtrabajo="/path/to/a/files"
filedirs="files.txt"
dirs=$(find "${pathtrabajo}" -maxdepth 1 -mindepth 1 -type d | sort -n)
SAVEIFS=$IFS
IFS=$(echo -en "\n\b")
for entry in "${dirs[#]}"
do
echo "${entry}" >> "${pathtrabajo}"/"${filedirs}"
find "${entry}" -maxdepth 1 -mindepth 1 -name '*.md' -printf '%f\n' | sort | sed 's/\.md$//1' | awk '{print "- [["$0"]]"}' >> "${pathtrabajo}"/"${filedirs}"
done
IFS=$SAVEIFS
What did I do wrong?
It's really not clear why you are using find here at all. The following will probably do what you are trying to do, if I guess correctly from your code.
dirs=([0-9][!0-9]*/ [0-9][0-9][!0-9]*/ [0-9][0-9][0-9][!0-9]*/ [!0-9]*/)
printf "%s\n" "${dirs[#]}" >"$filedirs"
for dir in "${dirs[#]}"; do
printf "%s\n" "$dir"/*.md |
awk '{ sub(/\.md$/, ""); print "- [["$0"]]" }'
done >>"$filedirs"
The shell already expands wildcards alphabetically. The dirs assignment will expand all directories which start with a single digit, then the ones with two digits, then the ones with three digits -- extend if you need more digits -- then the ones which do not start with a digit.
It would not be hard, but cumbersome, to extend the code to run in an arbitrary directory. My proposed solution would be (for once!) to cd to the directory where you want the listing, then run this script.
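A minimal sketch of that idea, assuming the target directory is passed as the script's first argument (the $1 handling is my addition, not part of the answer above):
#!/usr/bin/env bash
# Run the listing inside the directory given as $1 (default: current directory).
cd "${1:-.}" || exit 1
filedirs="files.txt"
dirs=([0-9][!0-9]*/ [0-9][0-9][!0-9]*/ [0-9][0-9][0-9][!0-9]*/ [!0-9]*/)
printf "%s\n" "${dirs[@]}" > "$filedirs"
for dir in "${dirs[@]}"; do
  printf "%s\n" "$dir"*.md |
  awk '{ sub(/\.md$/, ""); print "- [["$0"]]" }'
done >> "$filedirs"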

Finding most recent file from a list of directories from find command

I use find . -type d -name "Financials" to find all the directories called "Financials" under the current directory. Since I am on Mac, I can use the following (which I found from another stackoverflow question) to find the latest modified file in my current directory: find . -type f -print0 | xargs -0 stat -f "%m %N" | sort -rn | head -1 | cut -f2- -d" ". What I would like to do is find a way to pipe the results of the first command into the second command--i.e. to find the most recently modified file in each "Financials" directory. Is there a way to do this?
I think you could:
find . -type d -name "Financials" -print0 |
xargs -0 -I{} find {} -type f -print0 |
xargs -0 stat -f "%m %N" | sort -rn | head -1 | cut -f2- -d" "
But if you want separately for each dir, then... why not just loop it:
find . -type d -name "Financials" |
while IFS= read -r dir; do
  echo "newest file in $dir is $(
    find "$dir" -type f -print0 |
    xargs -0 stat -f "%m %N" | sort -rn | head -1 | cut -f2- -d" "
  )"
done
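If the directory names could ever contain newlines, a NUL-delimited version of the same loop is safer (a sketch along the same lines; bash-specific):
find . -type d -name "Financials" -print0 |
while IFS= read -r -d '' dir; do
  echo "newest file in $dir is $(
    find "$dir" -type f -print0 |
    xargs -0 stat -f "%m %N" | sort -rn | head -1 | cut -f2- -d" "
  )"
done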
Nest the 2nd find+xargs inside a first find+xargs:
find . -type d -name "Financials" -print0 \
| xargs -0 sh -c '
  for d in "$@"; do
    find "$d" -type f -print0 \
    | xargs -0 stat -f "%m %N" \
    | sort -rn \
    | head -1 \
    | cut -f2- -d" "
  done
' sh
Note the trailing "sh" in sh -c '...' sh -- that word becomes "$0" inside the shell script so the for-loop can iterate over all the directories.
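A quick, hypothetical demo of that argument numbering:
sh -c 'echo "\$0=$0 \$1=$1 \$2=$2"' sh one two
# prints: $0=sh $1=one $2=two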
A robust way that will also avoid problems with funny filenames that contain special characters is:
find all files within this particular subdirectory, and extract the inode number and modification time
$ find . -type f -ipath '*/Financials/*' -printf "%T@ %i\n"
extract youngest file's inode number
$ ... | awk '($1>t){t=$1;i=$2}END{print i}'
search file information by inode number
$ find . -type f -inum "$inode" -ipath '*/Financials/*'
So this gives you:
$ inode="$(find . -type f -ipath '*/Financials/*' -printf "%T# %i\n" | awk '($1>t){t=$1;i=$2}END{print i}')"
$ find . -type f -inum "$inode" '*/Financials/*'

How to print only results different from zero?

I have this script, and I would like to print only the non-zero results.
My environment is OS X.
find /PATH/ -type f -exec basename "{}" \; | grep -i "Word" | wc -l
First, here is a much faster find command that will do the same thing:
find /PATH/ -type f -iname '*Word*' | wc -l
Now, you can put this optimized command into an if statement. Note that [[ `...` ]] alone only tests for non-empty output, and wc -l always prints a number (possibly 0), so compare the count against zero:
if [[ `find /PATH/ -type f -iname '*Word*' | wc -l` -gt 0 ]]; then
find /PATH/ -type f -iname '*Word*' | wc -l
fi
To run the command just once, save the result into a variable:
count=`find /PATH/ -type f -iname '*Word*' | wc -l`
if [[ $count -gt 0 ]]; then
echo $count
fi
You can use grep -v to remove output that consists of just zero (with spaces before it, 'cause that's what wc prints). With @joanis' optimization of the search, that gives:
find /PATH/ -type f -iname '*Word*' | wc -l | grep -v '^ *0$'
When you count selected records, you do not have to filter out the 0 hits.
This command shows all basenames that appear once or more:
find . -type f -iname '*Word*' -printf "%f\n" | sort | uniq -c
You might want to add | sort -n at the end to see which file occurs most often.
Or maybe you wanted something else: how often "Word" occurs inside the different files:
grep -Rci "Word" | grep -v ":0$"

How to count files in subdir and filter output in bash

Hi, hoping someone can help: I have some directories on disk and I want to count the number of files in them (as well as the dir size, if possible) and then strip info from the output. So far I have this:
find . -type d -name "*,d" -print0 | xargs -0 -I {} sh -c 'echo -e $(find "{}" | wc -l) "{}"' | sort -n
This gets me all the dirs that match my pattern as well as the number of files - great!
This gives me something like
2 ./bob/sourceimages/psd/dzv_body.psd,d
2 ./bob/sourceimages/psd/dzv_body_nrm.psd,d
2 ./bob/sourceimages/psd/dzv_body_prm.psd,d
2 ./bob/sourceimages/psd/dzv_eyeball.psd,d
2 ./bob/sourceimages/psd/t_zbody.psd,d
2 ./bob/sourceimages/psd/t_gear.psd,d
2 ./bob/sourceimages/psd/t_pupil.psd,d
2 ./bob/sourceimages/z_vehicles_diff.tga,d
2 ./bob/sourceimages/zvehiclesa_diff.tga,d
5 ./bob/sourceimages/zvehicleswheel_diff.jpg,d
From that I would like to filter based on the number of files, for example keeping only entries with more than 4 files, and capture the filetype as a variable for each remaining result, e.g. jpg for ./bob/sourceimages/zvehicleswheel_diff.jpg,d.
I guess I could use awk for this?
Then finally I would like to remove all the remaining results from disk. With find I normally just do something like -exec rm -rf {} \;, but I'm not clear how it would work here.
Thanks a lot
EDITED
While this is clearly not the answer, these commands get me the info I want in the form I want it. I just need a way to put it all together and not search multiple times, as that's total rubbish:
filetype=$(find . -type d -name "*,d" -print0 | awk 'BEGIN { FS = "." }; { print $3 }' | cut -d',' -f1)
filesize=$(find . -type d -name "*,d" -print0 | xargs -0 -I {} sh -c 'du -h {};' | awk '{ print $1 }')
filenumbers=$(find . -type d -name "*,d" -print0 | xargs -0 -I {} sh -c 'echo -e $(find "{}" | wc -l);')
files_count=`ls -keys | nl`
For instance:
ls | nl
nl numbers the lines of its input.
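A sketch that puts the question's pieces together in a single pass (the ",d" directory pattern and the > 4 threshold come from the question; the rm is deliberately left commented out):
find . -type d -name '*,d' -print0 |
while IFS= read -r -d '' dir; do
  count=$(find "$dir" | wc -l)
  if (( count > 4 )); then
    filetype=${dir%,d}        # strip the trailing ",d"
    filetype=${filetype##*.}  # keep what follows the last dot, e.g. "jpg"
    size=$(du -sh "$dir" | awk '{ print $1 }')
    printf '%s %s %s %s\n' "$count" "$size" "$filetype" "$dir"
    # rm -rf -- "$dir"        # uncomment once the output looks right
  fi
done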

Using awk to print ALL spaces within filenames which have a varied number of spaces

I'm executing the following using bash and awk to get the potentially space-full filename, a colon, then the file size (in ls -l output, column 5 contains the size, and columns 9 to EOL the file name):
src="Desktop"
echo "Constructing $src files list. `date`"
cat /dev/null > "$src"Files.txt
find -s ~/"$src" -type f -exec ls -l {} \; |
awk '{for(i=9;i<=NF;i++) {printf("%s", $i " ")} print ":" $5}' |
grep -v ".DS_Store" | grep -v "Icon\r" |
while read line; do
  filespacesize=`basename "$line"`
  filesize=`echo "$filespacesize" | sed -e 's/ :/:/1'`
  path=`dirname "$line"`
  echo "$filesize:$path" >> "$src"Files.txt
done
And it works fine, BUT…
If a filename has > 1 space between parts, I only get 1 space between filename parts, and the colon, followed by the filesize.
How can I get the full filename, :, and then the file size?
It seems you want the following (provided your find handles the -printf option with the %f, %s and %h directives):
src=Desktop
echo "Constructing $src files list. $(date)"
find ~/"$src" -type f -printf '%f:%s:%h\n' > "$src"Files.txt
Much shorter and much more efficient than your method!
This will not discard the .DS_Store and Icon\r things… but I'm not really sure what you really want to discard. If you want to skip the .DS_Store entries altogether (they are plain files, so no -type d is needed):
find ~/"$src" -name '.DS_Store' -prune -o -type f -printf '%f:%s:%h\n' > "$src"Files.txt
@guido seems to have guessed what you mean by grep -v "Icon\r": ignore files whose names end with Icon; if his guess is right, then this will do:
find ~/"$src" -name '.DS_Store' -prune -o ! -name '*Icon' -type f -printf '%f:%s:%h\n' > "$src"Files.txt
