Bash: check if a folder exists when the folders are numbered - bash

I have a series of folders (A1, A2, ...), each containing some subfolders. I only need to check the subfolders that follow the pattern 0$n_st* and can ignore the rest:
A1/
    01_stjhs_lkk/
    02_stlkd_ooe/
    03_stoie_akwe/
    ...
A2/
    01_stpw_awq/
    02_stoe_iwoq/
    03_stak_weri/
    ...
...
I want to find the subfolder with the largest number (0$n) in each folder (the number of subfolders varies between folders), go into that subfolder and grep something, then repeat the process for the other folders (A1, A2, ...). Here is my script, which does not work (the if condition seems to be the problem):
for dd in */; do
    cd "$dd" # A1, A2,...
    for num in {8,7,6,5,4,3,2,1}; do
        if [ -d 0${num}_st* ]; then
            echo "0${num}_st*"
            cd "0${num}_st*"
            echo $(grep -i 'xyz f' 04_*.log) #grep this line from log file
            break
        fi
        cd ..
    done
    cd ..
done

The immediate problem is that if [ -d ... ] will produce an error if ... is a wildcard which matches more than one file. You can work around this in various ways, but probably the simplest which matches your (somewhat vague) requirements is something like
for dd in */; do
    for dir in "$dd"/0[0-9]_st*/; do
        : nothing
    done
    # the variable $dir will now contain the alphabetically last match
    grep -i 'xyz f' "$dir"/04_*.log
done
If the directory names contain different numbers of digits, sorting them alphabetically will not work (2 will sort after 19), but your examples only show names with two digits in all cases, so let's assume that's representative.
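To see the difference, compare plain and numeric sorting of such names (a quick demonstration of sort's behavior):
$ printf '%s\n' 2_st_x 19_st_x 03_st_y | sort
03_st_y
19_st_x
2_st_x
$ printf '%s\n' 2_st_x 19_st_x 03_st_y | sort -n
2_st_x
03_st_y
19_st_x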
Demo: https://ideone.com/1N2Iui
Here's a variation which exhibits a different way to find the biggest number, by sorting numerically on the subdirectory name, and which should thus also work for directories with variable numbers of digits.
for dd in */; do
    biggest=$(printf '%s\n' "$dd"/0*_st*/ | sort -t / -k3,3n | tail -n 1)
    grep -i 'xyz f' "$biggest"/04_*.log
done
Because the wildcards already end with /, the / after e.g. "$dd" is strictly speaking redundant, but harmless as far as the file system is concerned. It does mean the subdirectory name is field 3 for sort (field 2 is the empty string between the doubled slashes); if you take the extra / out for legibility, change the key to -k2,2n.

Suggesting a find command:
find . -type d -printf "%f\n" | grep "^0" | sort -n | tail -n1
Notice that this find command recursively scans all directories under the current directory.
To limit the find command to specific directories dir1 dir2, specify them before the -type option:
find dir1 dir2 -type d -printf "%f\n" | grep "^0" | sort -n | tail -n1
Explanation
find . -type d
Prints all the directories and subdirectories under the current directory
find . -type d -printf "%f\n"
Prints only the directory name, not the directory path.
grep "^0"
Filter only directory names starting with 0
If this matches more directories than required, refine the grep filter, e.g. grep "^0[[:digit:]]\+_".
sort -n
Sort directory names numerically
tail -n1
Print the last directory
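To plug this into a loop over the A1, A2, ... folders and grep in the largest-numbered subfolder of each, a minimal sketch (the 04_*.log name and the grep pattern are taken from the question) could be:
for dd in */; do
    latest=$(find "$dd" -maxdepth 1 -type d -printf "%f\n" | grep "^0" | sort -n | tail -n1)
    [ -n "$latest" ] && grep -i 'xyz f' "$dd$latest"/04_*.log
done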

Related

Find duplicate folder names and ignore further subdirectory matches

I have a data directory where duplicate folders have been created at various hierarchy levels. I want to find these duplicates using case-insensitive comparison. I also want to ignore further subdirectory matches for every match, to reduce the noise, since in my use case directory names are quite unique and indicate that all subdirectories are also likely to be duplicates (basically, a given folder and all its subfolders have been copied to various locations). In other words, if two directories match and their parents, while being in different paths, also match, just return the parents.
For example, if I have the following folders:
a/b/Dup1
a/b/dup1/ignore1
c/dup1/ignore1
I want the output to be something like:
'dup1' found in:
a/b/Dup1
a/b/dup1
c/dup1
Any other output that lists all the case-insensitive duplicates and their relative or absolute path is also acceptable (or even just the name of the duplicate directory, since it should be easy to list its locations from there).
You can try combining the find, sort and uniq tools to get the desired result. This one-liner should work:
find yourstartingpath -type d -printf '%f\n' | sort -f | uniq -icd | while read -r c x; do echo "===== $x found $c times in ====="; find yourstartingpath -type d -iname "$x" -printf '%p\n'; done
This one-liner will not remove subdirectories from the search, but I believe it would be hard to do so with only the core utilities. If that is necessary, depending on your system it would probably be better to write a program in Perl or Python to get what you need. Alternatively, you may restrict your results to folders which have some common subdirectories, i.e. using the following one-liner:
find yourstartingpath -type d -printf '%P\n' | awk -F/ -v OFS='/' 'NF>=4 {print $(NF-1),$NF}' | sort -f | uniq -id | while read -r x; do echo "======= $x "; find yourstartingpath -type d -ipath "*$x"; done
but it will remove the Dup1 folder from your example (because it does not have an ignore1 subfolder).
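For readability, here is the first one-liner laid out as a small script (the same commands, just split across lines; yourstartingpath is the placeholder used above):
#!/bin/bash
root=yourstartingpath
find "$root" -type d -printf '%f\n' | sort -f | uniq -icd |
while read -r count name; do
    echo "===== $name found $count times in ====="
    find "$root" -type d -iname "$name" -printf '%p\n'
done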

Shorten filename to n characters while preserving file extension

I'm trying to shorten a filename while preserving the extension.
I think cut may be the best tool to use, but I'm not sure how to preserve the file extension.
For example, I'm trying to rename abcdefghijklmnop.txt to abcde.txt
I'd like to simply lop off the end of the filename so that the total character length doesn't exceed [in this example] 5 characters.
I'm not concerned with filename clashes because my dataset likely won't contain any, and anyway I'll do a find, analyze the files, and test before I rename anything.
The background for this is ultimately that I want to mass truncate filenames that exceed 135 characters so that I can rsync the files to an encrypted share on a Synology NAS.
I found a good way to search for all filenames that exceed 135 characters:
find . -type f | awk -F'/' 'length($NF)>135{print $0}'
And I'd like to pipe that to a simple cut command to trim the filename down to size. Perhaps there is a better way than this. I found a method to shorten filenames while preserving extensions, but I need to recurse through all sub-directories.
Any help would be appreciated, thank you!
Update for clarification:
I'd like to use a one-liner with a syntax like this:
find . -type f | awk -F'/' 'length($NF)>135{print $0}' | some_code_here_to_shorten_the_filename_while_preserving_the_extension
With GNU find and bash:
export n=10 # change according to your needs
find . -type f \
    ! -name '.*' \
    -regextype egrep \
    ! -regex '.*\.[^/.]{'"$n"',}' \
    -regex '.*[^/]{'$((n+1))',}' \
    -execdir bash -c '
        echo "PWD=$PWD"
        for f in "${@##./}"; do
            ext=${f#"${f%.*}"}
            echo mv -- "$f" "${f:0:n-${#ext}}${ext}"
        done' bash {} +
This will perform a dry run, that is, it shows the folders followed by the commands to be executed within them. Once you're happy with its output, drop the echo before mv (and the echo "PWD=$PWD" line too if you want) and it will actually rename all files whose names exceed n characters to names of exactly n characters, including the extension.
Note that this excludes hidden files, and files whose extensions are equal to or longer than n in length (e.g. .hidden, app.properties where n=10).
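For instance, with n=10 a file named abcdefghijklmnop.txt (as in the question) would be renamed to abcdef.txt: six characters of the base name plus the four-character .txt extension add up to exactly 10.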
Use bash string manipulation.
Details: https://www.linuxtopia.org/online_books/advanced_bash_scripting_guide/string-manipulation.html (scroll to "Substring Extraction").
The example below cuts a filename to 10 characters while preserving the extension:
~ % cat test
rawFileName=$(basename "$1")
filename="${rawFileName%.*}"
ext="${rawFileName##*.}"
if [[ ${#filename} -gt 10 ]]; then
    echo "${filename:0:10}.${ext}"
else
    echo "$1"
fi
And tests:
~ % ./test 12345678901234567890.txt
1234567890.txt
~ % ./test 1234567.txt
1234567.txt
Update
Since your files are distributed in a tree of directories, you can use my original approach, but pass the script to a bash command via the -exec option of find:
n=5 find . -type f -exec bash -c 'f=$1; d=${f%/*}; b=${f##*/}; e=${b##*.}; b=${b%.*}; mv -- "$f" "$d/${b:0:n}.$e"' bash {} \;
Original answer
If the filename is in a variable x, then ${x:0:5}.${x##*.} should do the job.
So you might do something like
n=5 # or 135, or whatever you like
for f in *; do
    mv -- "$f" "${f:0:n}.${f##*.}"
done
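For example, with n=5 this renames abcdefghijklmnop.txt from the question to abcde.txt (${f:0:5} is abcde and ${f##*.} is txt).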
Clearly this assumes that there are no clashes between the shortened names. If there are clashes, then only one would survive! So be careful.

sh/bash: Find all files in a folder that start with a number followed by a _blank_

I am working on a shell script and have come to a point where I need to rename files that start with a number and a blank: the leading pattern must be removed and the file moved to a specific folder, which is basically the string between the first and second " - ".
example :
001 - folder1 - example.doc > /folder1/example.doc
002 - folder2 - someexample.doc > /folder2/someexample.doc
003 - folder3 - someotherexample.doc > /folder3/someotherexample.doc
I want to do something like
find /tmp -name '*.doc' -print | rename .... ...
what I do not know is:
- how to tell find that the file starts with a number, and
- how to split the name on a pattern like " - " and tell rename to place the file in the folder
If possible, I would avoid find and just use bash's regular-expression matching. If you don't need to recursively search /tmp (as your version of find does), this should work in any version of bash:
regex='^/tmp/[[:digit:]]+ - (.+) - (.+)$'
for f in /tmp/*.doc; do
    [[ $f =~ $regex ]] || continue
    mv -- "$f" "/${BASH_REMATCH[1]}/${BASH_REMATCH[2]}"
done
If you do need to recursively search, and you are using bash 4 or later, you can use the globstar option:
shopt -s globstar
regex='^[[:digit:]]+ - (.+) - (.+)$'
for f in /tmp/**/*.doc; do
    b=$(basename "$f")
    [[ $b =~ $regex ]] || continue
    mv -- "$f" "/${BASH_REMATCH[1]}/${BASH_REMATCH[2]}"
done
If your destination folder and file names have no spaces, and if all your original files are in the current directory, you could try something like:
while read f; do
    if [[ "$f" =~ ^\./[0-9]+\ -\ ([^\ ]+)\ -\ (.+\.doc)$ ]]; then
        mkdir -p "${BASH_REMATCH[1]}"
        mv "$f" "${BASH_REMATCH[1]}/${BASH_REMATCH[2]}"
    fi
done < <(find . -maxdepth 1 -name '[0-9]*.doc')
Explanations:
find . -maxdepth 1... restricts the search to the current directory.
find...-name '[0-9]*.doc' matches only files whose names start with at least one digit and end with .doc.
The regular expression models your original file names (with the initial ./ that find adds). It contains two sub-expressions enclosed in (...) corresponding to the folder name and to the destination file name. These sub-expressions are stored by bash in the BASH_REMATCH array if there is a match, at positions 1 and 2, respectively.
The regular expression removes leading and trailing spaces from the destination folder name and the leading spaces from the destination file name (I assume this is what you want).
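For example, for ./001 - folder1 - example.doc the regular expression captures folder1 into BASH_REMATCH[1] and example.doc into BASH_REMATCH[2], so the file ends up as folder1/example.doc under the current directory.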
With gawk:
find . -regex '^.*[0-9]+ - [a-zA-Z]+[0-9] - [a-zA-Z]+\.doc$' -printf '%f\n' | awk -F- '{ print "mv \""$0"\" /"gensub(" ","","g",$2)"/"gensub(" ","","g",$3) }'
Use find with a regular expression and pipe the output to awk to build the move command.
When you have verified that the commands are OK, run them with:
find . -regex '^.*[0-9]+ - [a-zA-Z]+[0-9] - [a-zA-Z]+\.doc$' -printf '%f\n' | awk -F- '{ print "mv \""$0"\" /"gensub(" ","","g",$2)"/"gensub(" ","","g",$3) }' | sh
Be wary that such a strategy can be open to a risk of command injection.
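For the example file 001 - folder1 - example.doc, the generated command would be roughly: mv "001 - folder1 - example.doc" /folder1/example.doc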
The answer is very close to the suggestion made by Raman. I removed the printf, added the part that creates the folder, and added the part that removes the leading and trailing blanks.
In the end it looks like this:
find . -regex '^.*[0-9].*\.doc$' | awk -F- '{ print "mkdir \""gensub(" $","","g",gensub("^ ","","g",$2))"\" \nmv \""$0"\" \"./"gensub(" $","","g",gensub("^ ","","g",$2))"/"gensub(" $","","g",gensub("^ ","","g",$2"-"$3))"\"" }' | sh
Thank you everybody for the suggestions.

Recursively concatenating (joining) and renaming text files in a directory tree

I am using a Mac OS X Lion.
I have a folder: LITERATURE with the following structure:
LITERATURE > Y > YATES, DORNFORD > THE BROTHER OF DAPHNE:
Chapters 01-05.txt
Chapters 06-10.txt
Chapters 11-end.txt
I want to recursively concatenate the chapters that are split into multiple files (not all are). Then, I want to write the concatenated file to its parent's parent directory. The name of the concatenated file should be the same as the name of its parent directory.
For example, after running the script (in the folder structure shown above) I should get the following.
LITERATURE > Y > YATES, DORNFORD:
THE BROTHER OF DAPHNE.txt
THE BROTHER OF DAPHNE:
Chapters 01-05.txt
Chapters 06-10.txt
Chapters 11-end.txt
In this example, the parent directory is THE BROTHER OF DAPHNE and the parent's parent directory is YATES, DORNFORD.
[Updated March 6th—Rephrased the question/answer so that the question/answer is easy to find and understand.]
It's not clear what you mean by "recursively" but this should be enough to get you started.
#!/bin/bash
titlecase () { # adapted from http://stackoverflow.com/a/6969886/874188
    local arr
    arr=("${@,,}")
    echo "${arr[@]^}"
}
for book in LITERATURE/?/*/*; do
    title=$(titlecase ${book##*/})
    for file in "$book"/*; do
        cat "$file"
        echo
    done >"$book/$title"
    echo '# not doing this:' rm "$book"/*.txt
done
This loops over LITERATURE/initial/author/BOOK TITLE and creates a file Book Title (where should a space be added?) from the catenated files in each book directory. (I would generate it in the parent directory and then remove the book directory completely, assuming it contains nothing of value any longer.) There is no recursion, just a loop over this directory structure.
Removing the chapter files is a bit risky so I'm not doing it here. You could remove the echo prefix from the line after the first done to enable it.
If you have book names which contain an asterisk or some other shell metacharacter this will be rather more complex -- the title assignment assumes you can use the book title unquoted.
Only the parameter expansion with case conversion is beyond the very basics of Bash. The array operations could perhaps also be a bit scary if you are a complete beginner. Proper understanding of quoting is also often a challenge for newcomers.
cat Chapters*.txt > FinaleFile.txt.raw
Chapters="$( ls -1 Chapters*.txt | sed -n 'H;${x;s/\
//g;s/ *Chapters //g;s/\.txt/ /g;s/ *$//p;}' )"
mv FinaleFile.txt.raw "FinaleFile ${Chapters}.txt"
cat all the txt files at once (assuming a name-sorted list)
take the chapter numbers/refs from the ls of the folder, with a sed to adapt the format
rename the concatenated file to include the chapters
Shell doesn't like white space in names. However, over the years, Unix has come up with some tricks that'll help:
$ find . -name "Chapters*.txt" -type f -print0 | xargs -0 cat >> final_file.txt
Might do what you want.
The find recursively finds all of the directory entries in a file tree that match the query (in this case, the type must be a file and the name matches the pattern Chapters*.txt).
Normally, find separates out the directory entry names with NL, but the -print0 says to separate out the entries names with the NUL character. The NL is a valid character in a file name, but NUL isn't.
The xargs command takes the output of the find and processes it. xargs gathers all the names and passes them in bulk to the command you give it -- in this case the cat command.
Normally, xargs separates out files by white space which means Chapters would be one file and 01-05.txt would be another. However, the -0 tells xargs, to use NUL as a file separator -- which is what -print0 does.
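This drops everything into a single final_file.txt. If you instead want one output file per book directory, named after that directory and written to its parent (as in the question), here is a minimal sketch along the same lines (it assumes the LITERATURE/initial/author/title layout shown above):
find LITERATURE -mindepth 3 -maxdepth 3 -type d -print0 |
while IFS= read -r -d '' book; do
    cat "$book"/*.txt > "$(dirname "$book")/$(basename "$book").txt"
done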
Thanks for all your input. They got me thinking, and I managed to concatenate the files using the following steps:
This script replaces spaces in filenames with underscores.
#!/bin/bash
# We are going to iterate through the directory tree, up to a maximum depth of 20.
for i in $(seq 1 20)
do
    # In UNIX based systems, files and directories are the same (Everything is a File!).
    # The 'find' command lists all files which contain spaces in their names. The | (pipe) …
    # … forwards the list to a 'while' loop that iterates through each file in the list.
    find . -maxdepth "$i" -name '* *' | while IFS= read -r file
    do
        # Here, we use 'sed' to replace spaces in the filename with underscores.
        # The 'echo' prints a message to the console before renaming the file using 'mv'.
        item=$(echo "$file" | sed 's/ /_/g')
        echo "Renaming '$file' to '$item'"
        mv "$file" "$item"
    done
done
This script concatenates text files that start with Part, Chapter, Section, or Book.
#!/bin/bash
# Here, we go through all the directories (up to a depth of 20).
for D in $(find . -maxdepth 20 -type d)
do
    # Check if the parent directory contains any files of interest.
    if ls "$D"/Part*.txt &>/dev/null ||
       ls "$D"/Chapter*.txt &>/dev/null ||
       ls "$D"/Section*.txt &>/dev/null ||
       ls "$D"/Book*.txt &>/dev/null
    then
        # If we get here, then there are split files in the directory; we will concatenate them.
        # First, we trim the full directory path ($D) so that we are left with the path to the …
        # … files' parent's parent directory—We will write the concatenated file here. (✝)
        ppdir="$(dirname "$D")"
        # Here, we concatenate the files using 'cat'. The 'awk' command extracts the name of …
        # … the parent directory from the full directory path ($D) and gives us the filename.
        # Finally, we write the concatenated file to its parent's parent directory. (✝)
        cat "$D"/*.txt > "$ppdir/$(echo "$D" | awk -F'/' '$0=$NF').txt"
    fi
done
Now, we delete all the files that we concatenated so that their parent directories are left empty.
find . -name 'Part*' -delete
find . -name 'Chapter*' -delete
find . -name 'Section*' -delete
find . -name 'Book*' -delete
The following command will delete empty directories. (✝) We wrote the concatenated file to its parent's parent directory so that its parent directory is left empty after deleting all the split files.
find . -type d -empty -delete

Recursively find all files that match a certain pattern

I need to find (or more specifically, count) all files that match this pattern:
*/foo/*.doc
Where the first wildcard asterisk includes a variable number of subdirectories.
With GNU find you can use -regex, which (unlike -name) matches against the entire path:
find . -regex '.*/foo/[^/]*\.doc'
To just count the number of files:
find . -regex '.*/foo/[^/]*\.doc' -printf '%i\n' | wc -l
(The %i format code causes find to print the inode number instead of the filename; unlike the filename, the inode number is guaranteed not to contain characters like a newline, so counting is more reliable. Thanks to @tripleee for the suggestion.)
I don't know if that will work on OSX, though.
how about:
find BASE_OF_SEARCH/*/foo -name \*.doc -type f | wc -l
What this is doing:
start at directory BASE_OF_SEARCH/
look in all directories that have a directory foo
look for files named like *.doc
count the lines of the result (one per file)
The benefit of this method:
not recursive nor iterative (no loops)
it's easy to read, and if you include it in a script it's fairly easy to decipher (regex sometimes is not).
UPDATE: you want variable depth? ok:
find BASE_OF_SEARCH -name \*.doc -type f | grep foo | wc -l
start at directory BASE_OF_SEARCH
look for files named like *.doc
only show the lines of this result that include "foo"
count the lines of the result (one per file)
Optionally, you could filter out results that have "foo" in the filename, because this will show those too.
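If you want to count only files that actually sit under a foo directory (rather than files that merely have foo somewhere in their name), a sketch using find's -path test (supported by GNU and BSD find) would be:
find BASE_OF_SEARCH -type f -name '*.doc' -path '*/foo/*' | wc -l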
Based on the answers on this page and on other pages, I managed to put together the following, which searches the current folder and everything under it for files with the pdf extension, then filters for those that contain test_text in their name.
find . -name "*.pdf" | grep test_text | wc -l
Untested, but try:
find . -type d -name foo -print | while IFS= read -r d; do printf '%s\n' "$d"/*.doc; done | wc -l
find all the "foo" directories (at varying depths) (this ignores symlinks, if that's part of the problem you can add them); use shell globbing to find all the ".doc" files, then count them.
