Get files from directories alphabetically sorted with bash

I have this code that works in the directory where I execute it:
pathtrabajo=.
filedirs="files.txt"
dirs=$(find . -maxdepth 1 -mindepth 1 -type d | sort -n)
SAVEIFS=$IFS
IFS=$(echo -en "\n\b")
for entry in $dirs
do
echo "${entry}" >> "${filedirs}"
find "$entry" -maxdepth 1 -mindepth 1 -name '*.md' -printf '%f\n' | sort | sed 's/\.md$//1' | awk '{print "- [["$0"]]"}' >> "${filedirs}"
done
IFS=$SAVEIFS
But when I try to make it generic, to work with variables, find gives an error:
pathtrabajo="/path/to/a/files"
filedirs="files.txt"
dirs=$(find "${pathtrabajo}" -maxdepth 1 -mindepth 1 -type d | sort -n)
SAVEIFS=$IFS
IFS=$(echo -en "\n\b")
for entry in "${dirs[@]}"
do
echo "${entry}" >> "${pathtrabajo}"/"${filedirs}"
find "${entry}" -maxdepth 1 -mindepth 1 -name '*.md' -printf '%f\n' | sort | sed 's/\.md$//1' | awk '{print "- [["$0"]]"}' >> "${pathtrabajo}"/"${filedirs}"
done
IFS=$SAVEIFS
What did I do wrong?

It's really not clear why you are using find here at all. (The immediate error, by the way, is that dirs is a plain string, not an array: quoted, "${dirs[@]}" expands a scalar to a single word, so the loop runs once with the whole newline-joined list, and find is handed that multi-line string as a path. In your first version the unquoted $dirs was split on your modified IFS, which is why it worked.) The following will probably do what you are trying to do, if I'm guessing correctly from your code.
dirs=([0-9][!0-9]*/ [0-9][0-9][!0-9]*/ [0-9][0-9][0-9][!0-9]*/ [!0-9]*/)
printf "%s\n" "${dirs[@]}" >"$filedirs"
for dir in "${dirs[@]}"; do
printf "%s\n" "$dir"/*.md |
awk '{ sub(/\.md$/, ""); print "- [["$0"]]" }'
done >>"$filedirs"
The shell already expands wildcards alphabetically. The dirs assignment will expand all directories which start with a single digit, then the ones with two digits, then the ones with three digits -- extend if you need more digits -- then the ones which do not start with a digit.
It would not be hard, just cumbersome, to extend the code to run in an arbitrary directory. My proposed solution would be (for once!) to cd to the directory where you want the listing, then run this script.
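The glob ordering can be checked with a throwaway demo (all directory and file names below are invented for the sketch):

```shell
# Throwaway demo of the glob-based ordering in a temporary directory
tmp=$(mktemp -d)
cd "$tmp" || exit 1
mkdir 2-notes 10-notes alpha
touch 2-notes/a.md 2-notes/b.md 10-notes/c.md alpha/z.md
# one-digit prefixes first, then two-digit, then non-digit names
dirs=([0-9][!0-9]*/ [0-9][0-9][!0-9]*/ [!0-9]*/)
out=$(
  for dir in "${dirs[@]}"; do
    printf '%s\n' "$dir"
    printf '%s\n' "$dir"*.md |
      awk '{ sub(/\.md$/, ""); print "- [["$0"]]" }'
  done
)
printf '%s\n' "$out"
```

Note how 2-notes sorts before 10-notes, which a plain alphabetical glob would get wrong.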

List files that match list without duplicates, keep highest version number

Sorry if this is a repeated question but I have been looking for several hours.
I have a list of files generated by
find /usr/lib64/ -maxdepth 1 -type l -printf "%f\n"
that print something like
libncurses.so.6
libaudit.so.1
libncurses.so.5
libicuuc.so.65
libnghttp2.so.14
libicuuc.so.71
I would like to keep only the files with the highest version number
libncurses.so.6
libaudit.so.1
libnghttp2.so.14
libicuuc.so.71
Thanks a lot for your help
maybe with awk?
find /usr/lib64/ -maxdepth 1 -type l -printf '%f\n' |
awk -F '\\.so\\.' -v OFS='.so.' '
$2 > vers[$1] { vers[$1] = $2 }
END { for (lib in vers) print lib, vers[lib] }
'
libaudit.so.1
libicuuc.so.71
libnghttp2.so.14
libncurses.so.6
note: you might need to implement a more accurate operator than > for comparing versions, or use:
find /usr/lib64/ -maxdepth 1 -type l -printf '%f\n' |
sort -rV |
awk -F '\\.so\\.' -v OFS='.so.' '!seen[$1]++'
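Using the question's sample names in place of the find output, the sort -rV pipeline (GNU sort) behaves like this:

```shell
# The question's sample names, version-sorted descending, then
# deduplicated on the library name (the part before ".so.")
out=$(
  printf '%s\n' libncurses.so.6 libaudit.so.1 libncurses.so.5 \
    libicuuc.so.65 libnghttp2.so.14 libicuuc.so.71 |
    sort -rV |
    awk -F '\\.so\\.' '!seen[$1]++'
)
printf '%s\n' "$out"
```

Each library keeps its highest version; the output order follows the reverse version sort rather than the input order.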
You could chain some commands with a pipe, something like:
find /usr/lib64/ -maxdepth 1 -type l -printf "%f\n" |
sort -rn -k3 -t. |
awk -F. '!seen[$1]++'
The code above assumes that there are always 3 fields/columns separated by a dot (.) in the file names.
In pure bash, using an associative array, with GNU find and sort:
#!/usr/bin/env bash
declare -A uniq
while IFS= read -rd '' files; do
if ((!uniq[${files%%.*}]++)); then
printf '%s\n' "$files"
fi
done < <(
find /usr/lib64/ -maxdepth 1 -type l -printf "%f\0" |
sort -rnz -k3 -t.
)
You can use the following to sort:
find /usr/lib64/ -maxdepth 1 -type l -printf "%f\n" | sort -V
Note that uniq's -f option skips whitespace-separated fields, and these names contain no whitespace, so sort -V | uniq -f2 would compare empty strings and collapse everything to a single line. To keep only the latest version of each library, combine the version sort with an awk dedup on the library name:
find /usr/lib64/ -maxdepth 1 -type l -printf "%f\n" | sort -rV | awk -F '\\.so\\.' '!seen[$1]++'

How to get list of certain strings in a list of files using bash?

The title is maybe not really descriptive, but I couldn't find a more concise way to describe the problem.
I have a directory containing different files which have a name that e.g. looks like this:
{some text}2019Q2{some text}.pdf
So the filenames have somewhere in the name a year followed by a capital Q and then another number. The other text can be anything, but it won't contain anything matching the format year-Q-number. There will also be no numbers directly before or after this format.
I can work something out to get this from one filename, but I actually need a 'list' so I can do a for-loop over this in bash.
So, if my directory contains the files:
costumerA_2019Q2_something.pdf
costumerB_2019Q2_something.pdf
costumerA_2019Q3_something.pdf
costumerB_2019Q3_something.pdf
costumerC_2019Q3_something.pdf
costumerA_2020Q1_something.pdf
costumerD2020Q2something.pdf
I want a for loop that goes over 2019Q2, 2019Q3, 2020Q1, and 2020Q2.
EDIT:
This is what I have so far. It is able to extract the substrings, but it still has doubles, and since I'm already in the loop I don't see how I can remove them.
find original/*.pdf -type f -print0 | while IFS= read -r -d '' line; do
echo $line | grep -oP '[0-9]{4}Q[0-9]'
done
# list all _filanames_ that end with .pdf from the folder original
find original -maxdepth 1 -name '*.pdf' -type f -printf "%f\n" |
# extract the pattern
sed -E 's/.*([0-9]{4}Q[0-9]).*/\1/' |
# iterate
while IFS= read -r file; do
echo "$file"
done
I used -printf "%f\n" to print just the filename, instead of the full path. GNU sed has a -z option that you can use together with -print0 (or -printf "%f\0").
Given what you want to do, if your file names contain no newlines, there is no need to loop over the list in bash at all (as a rule of thumb, try to avoid while read line; it's very slow):
find original -maxdepth 1 -name '*.pdf' -type f | grep -oP '[0-9]{4}Q[0-9]'
or with a zero-separated stream:
find original -maxdepth 1 -name '*.pdf' -type f -print0 |
grep -zoP '[0-9]{4}Q[0-9]' | tr '\0' '\n'
If you want to remove duplicate elements from the list, pipe it to sort -u.
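For instance, with the question's file names standing in for the find output (grep -oE is enough here, since the pattern only needs ERE):

```shell
# Extract the yearQn substrings and drop duplicates
out=$(
  printf '%s\n' costumerA_2019Q2_something.pdf costumerB_2019Q2_something.pdf \
    costumerA_2019Q3_something.pdf costumerA_2020Q1_something.pdf \
    costumerD2020Q2something.pdf |
    grep -oE '[0-9]{4}Q[0-9]' | sort -u
)
printf '%s\n' "$out"
```

The two 2019Q2 files collapse into a single list entry, which is exactly the deduplication the question asks for.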
Try this, in bash:
~ > $ ls
costumerA_2019Q2_something.pdf costumerB_2019Q2_something.pdf
costumerA_2019Q3_something.pdf other.pdf
costumerA_2020Q1_something.pdf someother.file.txt
~ > $ for x in `(ls)`; do [[ ${x} =~ [0-9]Q[1-4] ]] && echo $x; done;
costumerA_2019Q2_something.pdf
costumerA_2019Q3_something.pdf
costumerA_2020Q1_something.pdf
costumerB_2019Q2_something.pdf
~ > $ (for x in *; do [[ ${x} =~ ([0-9]{4}Q[1-4]).+pdf ]] && echo ${BASH_REMATCH[1]}; done;) | sort -u
2019Q2
2019Q3
2020Q1

Count number of files in directories with blanks in their names

If you want a breakdown of how many files are in each dir under your current dir:
for i in $(find . -maxdepth 1 -type d) ; do
echo -n $i": " ;
(find $i -type f | wc -l) ;
done
It does not work when the directory name has a blank in it. Can anyone here tell me how I must edit this shell script so that such directory names are also accepted when counting their file contents?
Thanks
Your code suffers from a common issue described in http://mywiki.wooledge.org/BashPitfalls#for_i_in_.24.28ls_.2A.mp3.29.
In your case you could do this instead:
for i in */; do
echo -n "${i%/}: "
find "$i" -type f | wc -l
done
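A quick throwaway check that this loop handles blanks (the directory and file names below are invented):

```shell
# Throwaway demo: directories whose names contain blanks
tmp=$(mktemp -d)
cd "$tmp" || exit 1
mkdir 'dir one' 'dir two'
touch 'dir one/a' 'dir one/b' 'dir two/c'
out=$(
  for i in */; do
    printf '%s: %s\n' "${i%/}" "$(find "$i" -type f | wc -l)"
  done
)
printf '%s\n' "$out"
```

Because the glob expands to one word per directory (no word splitting involved), the blanks survive intact.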
This will work with all types of file names:
find . -maxdepth 1 -type d -exec sh -c 'printf "%s: %i\n" "$1" "$(find "$1" -type f | wc -l)"' Counter {} \;
How it works
find . -maxdepth 1 -type d
This finds the directories just as you were doing
-exec sh -c 'printf "%s: %i\n" "$1" "$(find "$1" -type f | wc -l)"' Counter {} \;
This feeds each directory name to a shell script which counts the files, similarly to what you were doing.
There are some tricks here: Counter {} are passed as arguments to the shell script. Counter becomes $0 (which is only used if the shell script generates an error). find replaces {} with the name of a directory it found, and this is available to the shell script as $1. This is done in a way that is safe for all types of file names.
Note that, wherever $1 is used in the script, it is inside double-quotes. This protects it from word splitting and other unwanted shell expansions.
I found the solution; this is what I have to consider:
#!/bin/bash
SAVEIFS=$IFS
IFS=$(echo -en "\n\b")
for i in $(find . -maxdepth 1 -type d); do
echo -n " $i: ";
(find $i -type f | wc -l) ;
done
IFS=$SAVEIFS

bash script: how to display output on different lines

I'm trying to use one line of code to solve a problem:
echo $(find . -maxdepth 1 -type f -newer $1 | sed 's,\.\/,,g')
This will print out all the files in the current folder that are newer than the input file, but it prints them on one single line:
file1 file2 file3 file4....
How can I display each file name on its own line, like:
file1
file2
file3
...
This seems very simple, but I've been searching and haven't found a solution.
Thank you in advance.
Get rid of the echo and the $(...).
find . -maxdepth 1 -type f -newer "$1" | sed 's,\.\/,,g'
If you have GNU find you can replace the sed with a -printf action:
find . -maxdepth 1 -type f -newer "$1" -printf '%P\n'
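A throwaway check of the -printf variant (GNU find and GNU touch; the file names below are invented), using touch -d to backdate the reference file:

```shell
# Throwaway demo: only files newer than the reference file are printed,
# one per line, without the leading ./
tmp=$(mktemp -d)
cd "$tmp" || exit 1
touch -d '2000-01-01' reference.txt   # backdate the reference file
touch newer.txt
out=$(find . -maxdepth 1 -type f -newer reference.txt -printf '%P\n')
printf '%s\n' "$out"
```

%P prints the path relative to the starting point, so the ./ prefix never appears and no sed cleanup is needed.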
pipe it to tr:
... | tr " " "\n"
(Note that this also splits any file names that themselves contain spaces.)

Using awk to print ALL spaces within filenames which have a varied number of spaces

I'm executing the following using bash and awk to get the potentially space-full filename, colon, file size (column 5 contains the space-delimited size, and columns 9 to EOL the file name):
src="Desktop"
echo "Constructing $src files list. `date`"
cat /dev/null > "$src"Files.txt
find -s ~/"$src" -type f -exec ls -l {} \; |
awk '{for(i=9;i<=NF;i++) {printf("%s", $i " ")} print ":" $5}' |
grep -v ".DS_Store" | grep -v "Icon\r" |
while read line ; do filespacesize=`basename "$line"`; filesize=`echo "$filespacesize" |
sed -e 's/ :/:/1'`
path=`dirname "$line"`; echo "$filesize:$path" >> "$src"Files.txt ;
done
And it works fine, BUT…
If a filename has > 1 space between parts, I only get 1 space between filename parts, and the colon, followed by the filesize.
How can I get the full filename, :, and then the file size?
It seems you want the following (provided your find supports the -printf action with the %f, %s and %h directives):
src=Desktop
echo "Constructing $src files list. $(date)"
find ~/"$src" -type f -printf '%f:%s:%h\n' > "$src"Files.txt
Much shorter and much more efficient than your method!
This will not discard the .DS_Store and Icon\r things… but I'm not really sure what you really want to discard. Since .DS_Store is a plain file, you can exclude it by name:
find ~/"$src" -type f ! -name '.DS_Store' -printf '%f:%s:%h\n' > "$src"Files.txt
@guido seems to have guessed what you mean by grep -v "Icon\r": ignore files whose names end with Icon; if his guess is right, then this will do:
find ~/"$src" -type f ! -name '.DS_Store' ! -name '*Icon' -printf '%f:%s:%h\n' > "$src"Files.txt
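A throwaway check of the -printf format (GNU find; the names below are invented), with a space in the directory name and a double space in the file name:

```shell
# Throwaway demo: name:size:directory, with all spaces preserved as-is
tmp=$(mktemp -d)
mkdir "$tmp/sub dir"
printf 'hello' > "$tmp/sub dir/a  file.txt"   # 5 bytes; note the two spaces
out=$(find "$tmp" -type f -printf '%f:%s:%h\n')
printf '%s\n' "$out"
```

Unlike the ls-plus-awk pipeline, find writes the raw file name itself, so runs of multiple spaces come through untouched.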
