bash: recursively shorten directory name to first 10 characters

I need to recursively rename all subdirectories to the first 10 characters of the original subdirectory name.
For example, the below directory:
/Documents/super-long-folder-name/other-folder-name/
would be renamed to:
/Documents/super-long/other-fold/
I have found a way to rename files to the first 10 characters of the original file name, but now I need to do this for directories.
To recursively rename the file names, I installed the Perl rename utility (brew install rename) and then executed the command below:
find . -path '????????????????????*' -exec rename 's/^(.{10}).*(\..*)$/$1$2/' {} \;
The above command finds files whose paths are at least 20 characters long, then renames each file to the first 10 characters of its original name, keeping the extension.
Now I am trying to find a similar solution that would allow me to do this to directory names.
Thank you in advance for any insight you might have!

Please try the following:
#!/bin/bash

finddir="."

# calculate the max depth
depth=1
while IFS= read -r -d "" dir; do
    str="${dir//[^\/]}"             # keep only the slashes
    if (( depth < ${#str} )); then
        depth=${#str}
    fi
done < <(find "$finddir" -type d -links 2 -print0)

# shorten long dirnames level by level, starting with depth=1
for (( i=1; i<=depth; i++ )); do
    while IFS= read -r -d "" dir; do
        stem=${dir%/*}    # parent dir
        leaf=${dir##*/}   # current target dir
        if (( ${#leaf} > 10 )); then
            short=${leaf:0:10}
            if [[ -d $stem/$short ]]; then
                echo "$stem/$short exists. $stem/$leaf unchanged."
            else
                mv -- "$stem/$leaf" "$stem/$short"
                # echo mv "$stem/$leaf" "$stem/$short"   # uncomment for a dry run
            fi
        fi
    done < <(find "$finddir" -mindepth "$i" -maxdepth "$i" -type d -print0)
done
The difficult point is that the pathnames change dynamically during execution, so the pathnames reported by find may differ from the actual (renamed) pathnames.
My approach is:
to calculate the maximum depth in advance;
to iterate one directory level at a time, starting with depth=1 and continuing until the maximum depth calculated above, so each level is renamed before the next level is listed.
Hope this helps.
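If you want to sanity-check it first, comment out the real mv and uncomment the echo line, then run it against a scratch copy of the tree. A hypothetical dry run (the script name shorten_dirs.sh and the paths are assumptions for illustration):
mkdir -p /tmp/renametest/super-long-folder-name/other-folder-name
cd /tmp/renametest && bash /path/to/shorten_dirs.sh
# Expected echo output. Note that a dry run renames nothing, so the
# depth-2 line still shows the long parent name; a real run would not:
# mv ./super-long-folder-name ./super-long
# mv ./super-long-folder-name/other-folder-name ./super-long-folder-name/other-fold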

Related

Find all directories, that don't contain other directories

Currently:
$ find -type d
./a
./a/sub
./b
./b/sub
./b/sub/dub
./c/sub
./c/bub
I need:
$ find -type d -not -contains -type d
./a/sub
./b/sub/dub
./c/sub
./c/bub
How do I exclude directories that contain other (sub)directories, but keep those that are not empty (contain files)?
You can find the leaf directories that only have 2 links (or fewer) and then check whether each found directory contains some files.
Something like this:
# find leaf directories
find -type d -links -3 -print0 | while IFS= read -r -d '' dir
do
    # check if it contains some files
    if ls -1qA "$dir" | grep -q .
    then
        echo "$dir"
    fi
done
Or simply:
find -type d -links -3 ! -empty
Note that you may need the find option -noleaf on some filesystems, like CD-ROM or some MS-DOS filesystems. It works without it in WSL2 though.
In the btrfs filesystem the directories always have 1 link so using -links won't work there.
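To illustrate the link-count rule on a classic filesystem (GNU stat assumed): a directory gets one link for its entry in the parent, one for its own ".", and one per subdirectory's "..":
mkdir -p demo/leaf demo/branch/leaf2
stat -c '%h %n' demo/leaf demo/branch
# 2 demo/leaf     <- no subdirectories: only "." and its entry in demo
# 3 demo/branch   <- one subdirectory adds one ".." link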
A much slower, but filesystem agnostic, find based version:
prev='///' # some impossible dir
# A depth-first find to collect non-empty directories
readarray -d '' dirs < <(find -depth -type d ! -empty -print0)
for dir in "${dirs[@]}"
do
    dirterm=$dir'/'
    # skip if it is a prefix of (i.e. an ancestor of) the previous dir
    [[ $dirterm == "${prev:0:${#dirterm}}" ]] && continue
    # skip if it has sub directories
    [[ $(find "$dir" -mindepth 1 -maxdepth 1 -type d -print -quit) != '' ]] && continue
    echo "$dir"
    prev=$dir
done # add "| sort" if you want the same order as a "find" without "-depth"
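To see why checking only the previously printed entry is enough, note that a -depth traversal prints every ancestor right after its own subtree, so each ancestor shares a prefix with the most recently printed descendant. A small demo (sibling order may vary):
mkdir -p demo/b/sub/dub demo/a/sub
find demo -depth -type d
# demo/a/sub
# demo/a
# demo/b/sub/dub
# demo/b/sub
# demo/b
# demo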
You didn't show us which of these directories do and do not contain files. You specify files, so I'm working on the assumption that you only want directories that have no subdirectories but do have files.
shopt -s dotglob nullglob globstar  # customize glob evaluation
for d in **/                        # loop directories only
do  for s in "${d}"*/               # check subdirs in each
    do  [[ -h "$s" ]] || continue 2 # skip dirs with subdirs
    done
    for f in "${d}"*                # check for nondirs in each
    do  echo "$d"                   # there's something here!
        continue 2                  # done with this dir, check next
    done
done
dotglob includes "hidden" files whose names start with a "dot" (.foo)
nullglob makes no*such return nothing instead of the string 'no*such'.
globstar makes **/ match arbitrary depth - e.g., ./x/, ./x/y/, and ./x/y/z/.
for d in **/ loops over all subdirectories, including subdirectories of subdirectories, though the trailing / means it will only report directories, not files.
for s in "${d}"*/ loops over all the subdirectories of $d, if there are any; nullglob means that if there are none, the loop won't execute at all. If we do see a subdirectory, [[ -h "$s" ]] || continue 2 says symlinks are OK, but anything else disqualifies $d, so skip out of the two enclosing loops and advance the top level to the next dir.
If it gets this far, there are no disqualifying real subdirectories, so we have to confirm there are files of some sort, even if they are just symlinks to other directories. for f in "${d}"* loops through anything else in the directory, since we know there aren't subdirs. Because of nullglob it won't even enter the loop if the directory is empty, so if it enters at all, anything there is a reason to report the dir (echo "$d") as non-empty. Once that's done, there's no reason to keep checking, so continue 2 again advances the top loop to the next dir.
I expected **/ to work, but it fails to find any subdirectories at all under my Windows/Git Bash emulation. **/*/ ignores subdirectories of the current directory, which is why I originally used */ **/*/, but **/ avoids redundancies when run on a proper CentOS VM. Use that.
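Since ** silently degrades to * when globstar is off, it may be worth probing for it first; a minimal check (globstar needs bash >= 4.0, and /bin/bash on macOS is still 3.2):
bash -c 'shopt -s globstar' 2>/dev/null && echo "globstar available" || echo "globstar not supported here"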

Find and store in a variable the full path of a folder if it exists in BASH

I need to obtain the full path of a folder (if it exists) that matches specific names. There is always at most one folder that matches.
E.g.: the code must find, if it exists, the folder with one of these possible names:
/home/user/myfolder
/home/user/myfolder_aaa
/home/user/myfolder_bbb
/home/user/myfolder_ccc
But it must not match any other "similar" folder, like
/home/user/myfolder_xxx
And if the folder exists I need to save its full path in a variable.
Something like this matches unwanted cases too and does not return the full path:
path=`ls /home/user/myfolder*`
With a fairly small number of possibilities and only one target directory, this would be enough:
top_level='/home/user/myfolder'
for end in '' '_aaa' '_bbb' '_ccc'
do
    name=$top_level$end
    if [[ -d $name ]]
    then
        var="$name"
        break
    fi
done
echo "$var found"
You can use find with regex:
find /home/user -regextype posix-extended -type d -regex '.*/myfolder(_(aaa|bbb|ccc))?$'
To store the results in an array (as you don't appear to have whitespace in these folder names):
arr=()
while IFS= read -r f; do
    arr+=( "$f" )
done < <(find /home/user -regextype posix-extended -type d -regex '.*/myfolder(_(aaa|bbb|ccc))?$')
# check array contents
declare -p arr

Bash Script To Move Newest Specific File Type From Subdirectories

Searches all subdirectories from the current directory.
Only targets specific file extension type.
Copies only the newest file, which has a timestamp in its title, to another directory.
find . -mindepth 2 -name "*.ZIP" -exec cp {} tempZIP \;
The only problem is I don't know how to tell it to grab only the newest file in each subdirectory. The files have the format:
2015-09-01_10-48-09.941+0000
for files in */; do
    echo "Beginning of for loop"
    echo "The current directory is $files"
    cd $files
    currentDirectory=$(pwd)
    echo "Current working directory: $currentDirectory"
    echo "Removing excess files from acquisition zips..."
    rm *.csv *.tfr *.ini *.log
    rm _Background.mca _Escape.mca _Gaussfit.mca _SumPeak.mca
    echo "Removing the oldest MCA files..."
    theDate=$(date +"LIVE_DATA_%Y-%m-%d_%H-%M-%S.000+0000.MCA")
    echo "The date timestamp is $theDate"
    for file in *; do
        echo "Current file is: $file"
        file=${file/.[0-9][0-9][0-9]/}
        if [[ $theDate -gt $max ]] ; then
            max=$theDate
            latest="$file"
        fi
    done
    echo "Latest: $latest"
    echo "Moving up a folder"
    cd ../
    movedDirectory=$(pwd)
    echo "Moved directory $movedDirectory"
    echo "End of for loop"
done
How do I do comparisons between the date format I've specified and the files?
The current directory is U-500.0.0.2015-09-01_10-49-01-34/
Current working directory: /Users/user/Desktop/WatsonErrorLogs/v448/AlloyScript/temp/U-500.0.0.2015-09-01_10-49-01-34
Removing excess files from acquisition zips...
rm: *.csv: No such file or directory
rm: *.tfr: No such file or directory
rm: *.ini: No such file or directory
rm: *.log: No such file or directory
rm: _Background.mca: No such file or directory
rm: _Escape.mca: No such file or directory
rm: _Gaussfit.mca: No such file or directory
rm: _SumPeak.mca: No such file or directory
Removing the oldest MCA files...
The date timestamp is LIVE_DATA_2015-09-08_11-31-59.000+0000.MCA
Current file is: LIVE_DATA_2015-09-01_10-49-04.446+0000.MCA
./test.sh: line 46: [[: LIVE_DATA_2015-09: value too great for base (error token is "09")
Current file is: LIVE_DATA_2015-09-01_10-49-09.916+0000.MCA
./test.sh: line 46: [[: LIVE_DATA_2015-09: value too great for base (error token is "09")
Latest:
Moving up a folder
Moved directory /Users/user/Desktop/WatsonErrorLogs/v448/AlloyScript/temp
End of for loop
Similar to David's Linux-centric answer, here's one that should work in OSX, FreeBSD, NetBSD, etc.
#!/usr/bin/env bash

max=0
# You can make this pattern more explicit if you like.
# Or you could add an `if` that verifies it and `continue`s the loop on failure.
# Or you could just ignore the errors. :)
for f in *.ZIP; do
    fname=${f/.[0-9][0-9][0-9]/}  # strptime/strftime doesn't support ms...
    epoch=$(date -j -f '%Y-%m-%d_%H-%M-%S%z.ZIP' "$fname" '+%s')
    if [ "$epoch" -gt "$max" ]; then
        max=$epoch
        latest="$f"               # keep the real filename for later use
    fi
done
echo "Latest: $latest"
This has the advantage of using a for loop, so it will not barf on filenames with special characters like newlines in them, in case you decide to expand the pattern to recognize such formats.
The other thing a for loop does for us is avoid a subshell to run find. This saves a minuscule amount of resources on your server.
Some provisos:
If you need accuracy to less than a second, this solution will need extra tweaking.
This may not support dates after January 19th 2038. :-)
If I understand your question, the key to doing what you want is parsing a valid datestring from the filename that can be used with the date command to find the newest file in the selection returned by find. To do this, you will need to write a small script, as it requires more than the single command usable with find ... -exec.
Note: you did not provide a full filename, but I will assume that you mean something like 2015-09-01_10-48-09.941+0000.ZIP
There are many ways to approach a small script to find the newest file based on the timestring. Using parameter expansion and substring replacement is what came to mind to handle creating a valid datestring to use with date. Each timestamp is then converted by date to seconds since epoch for comparison purposes:
#!/bin/bash

declare -i max=0

while IFS= read -r fname; do
    ## parse timestring from filename (assuming time.ZIP as filename)
    tmp="${fname%.ZIP}"
    t1=${tmp%_*}                   # first part (date)
    t2=${tmp#*_}                   # second part (time)
    dtstring="$t1 ${t2//-/:}"      # create date string, replace '-' with ':'
    tse=$(date -d "$dtstring" +%s) # time (sec) since epoch
    if [ "$tse" -gt "$max" ]; then # test for tse > max
        newest="$fname"            # save filename & update max
        max=$tse
    fi
done < <(find . -type f -name "*.ZIP" -printf "%f\n")

# just for testing
echo "max : $max"
echo "newest: $newest"

## uncomment for actual copy
# cp "$newest" tempZIP
Test Files
$ ls -1 2015*
2015-08-31_10-48-09.941+0000.ZIP
2015-09-01_10-48-09.941+0000.ZIP
Output
$ bash neweststamp.sh
max : 1441104489
newest: 2015-09-01_10-48-09.941+0000.ZIP
Try it before actually copying, then you can adjust and uncomment the actual copy.
Note: the substring replacement (${t2//-/:}), arrays, and process substitution used here are features of advanced shells like bash. If you must limit it to the POSIX shell (old Bourne shell +), leave a comment and we can adjust the script (it will just get much longer).
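For the record, a rough POSIX-sh sketch of the same loop. It keeps the GNU date -d and find -printf calls from the script above (so the shell is POSIX, not the utilities); tr stands in for the bash-only ${t2//-/:}, and because the pipeline runs the loop in a subshell, results must be printed inside it:
find . -type f -name '*.ZIP' -printf '%f\n' | {
    max=0 newest=
    while read -r fname; do
        tmp=${fname%.ZIP}
        t1=${tmp%_*}                                  # date part
        t2=$(printf '%s\n' "${tmp#*_}" | tr '-' ':')  # time part, '-' -> ':'
        tse=$(date -d "$t1 $t2" +%s)
        [ "$tse" -gt "$max" ] && { newest=$fname; max=$tse; }
    done
    echo "newest: $newest"
}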
Deleting All Files EXCEPT Newest
Continuing from your comment, once you have the newest file, you can use find once again to delete all of the other files in a given directory. Use the ! (not) operator along with -name (e.g. ! -name "$newest") to build a list that excludes your newest file and delete the rest:
find /path/to/dir -type f ! -name "$newest" -exec rm '{}' \;
You could also use a for loop as well:
for fname in /path/to/dir/*; do
    [ "$fname" != "$newest" ] && rm "$fname"
done
Remember: TEST with an echo or printf before you actually let the script remove anything. Examples:
find /path/to/dir -type f ! -name "$newest" -exec printf "rm %s\n" '{}' \;
or
for fname in /path/to/dir/*; do
    [ "$fname" != "$newest" ] && printf "rm %s\n" "$fname"
done
Fewer regrets that way...

Split a folder into multiple subfolders in terminal/bash script

I have several folders, each with between 15,000 and 40,000 photos. I want each of these to be split into sub folders - each with 2,000 files in them.
What is a quick way to do this that will create each folder I need on the go and move all the files?
Currently I can only find how to move the first x items in a folder into a pre-existing directory. In order to use this on a folder with 20,000 items... I would need to create 10 folders manually, and run the command 10 times.
ls -1 | sort -n | head -2000 | xargs -i mv "{}" /folder/
I tried putting it in a for-loop, but am having trouble getting it to make folders properly with mkdir. Even after I get around that, I need the program to create a folder only at every 2,000th file (the start of a new group); as it stands it wants to make a new folder for each file.
So... how can I easily move a large number of files into folders of an arbitrary number of files in each one?
Any help would be very... well... helpful!
Try something like this:
for i in `seq 1 20`; do mkdir -p "folder$i"; find . -maxdepth 1 -type f | head -n 2000 | xargs -i mv "{}" "folder$i"; done
Full script version:
#!/bin/bash

dir_size=2000
dir_name="folder"
n=$(( ($(find . -maxdepth 1 -type f | wc -l) / dir_size) + 1 ))
for i in $(seq 1 "$n"); do
    mkdir -p "$dir_name$i"
    find . -maxdepth 1 -type f | head -n "$dir_size" | xargs -i mv "{}" "$dir_name$i"
done
For dummies:
create a new file: vim split_files.sh
update the dir_size and dir_name values to match your desires
note that the dir_name will have a number appended
navigate into the desired folder: cd my_folder
run the script: sh ../split_files.sh
This solution worked for me on MacOS:
i=0; for f in *; do d=dir_$(printf %03d $((i/100+1))); mkdir -p "$d"; mv "$f" "$d"; let i++; done
It creates subfolders of 100 elements each.
This solution can handle names with whitespace and wildcards and can be easily extended to support less straightforward tree structures. It will look for files in all direct subdirectories of the working directory and sort them into new subdirectories of those. New directories will be named 0, 1, etc.:
#!/bin/bash

maxfilesperdir=20

# loop through all top level directories:
while IFS= read -r -d $'\0' topleveldir
do
    # enter top level subdirectory:
    cd "$topleveldir"

    declare -i filecount=0 # number of moved files per dir
    declare -i dircount=0  # number of subdirs created per top level dir

    # loop through all files in that directory and below
    while IFS= read -r -d $'\0' filename
    do
        # whenever the file counter is 0, make a new dir:
        if [ "$filecount" -eq 0 ]
        then
            mkdir "$dircount"
        fi

        # move the file into the current dir:
        mv "$filename" "${dircount}/"
        filecount+=1

        # whenever our file counter reaches its maximum, reset it, and
        # increase the dir counter:
        if [ "$filecount" -ge "$maxfilesperdir" ]
        then
            dircount+=1
            filecount=0
        fi
    done < <(find -type f -print0)

    # go back to top level:
    cd ..
done < <(find -mindepth 1 -maxdepth 1 -type d -print0)
The find -print0/read combination with process substitution has been stolen from another question.
It should be noted that simple globbing can handle all kinds of strange directory and file names as well. It is however not easily extensible for multiple levels of directories.
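For comparison, here is a minimal globbing sketch of the same counter logic for a single directory level (assumes bash; nullglob keeps an empty directory from yielding a literal *; the numeric directory names follow the script above):
shopt -s nullglob
maxfilesperdir=20
dircount=0 filecount=0
for f in *; do
    [[ -f $f ]] || continue                      # regular files only
    (( filecount == 0 )) && mkdir -p "$dircount"
    mv -- "$f" "$dircount/"
    if (( ++filecount >= maxfilesperdir )); then
        (( ++dircount ))
        filecount=0
    fi
done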
The code below assumes that the filenames do not contain linefeeds, spaces, tabs, single quotes, double quotes, or backslashes, and that filenames do not start with a dash. It also assumes that IFS has not been changed, because it uses while read instead of while IFS= read, and because variables are not quoted. Add setopt shwordsplit in Zsh.
i=1;while read l;do mkdir $i;mv $l $((i++));done< <(ls|xargs -n2000)
The code below assumes that filenames do not contain linefeeds and that they do not start with a dash. -n2000 takes 2000 arguments at a time and {#} is the sequence number of the job. Replace {#} with '{=$_=sprintf("%04d",$job->seq())=}' to pad numbers to four digits.
ls|parallel -n2000 mkdir {#}\;mv {} {#}
The command below assumes that filenames do not contain linefeeds. It uses the implementation of rename by Aristotle Pagaltzis which is the rename formula in Homebrew, where -p is needed to create directories, where --stdin is needed to get paths from STDIN, and where $N is the number of the file. In other implementations you can use $. or ++$::i instead of $N.
ls|rename --stdin -p 's,^,1+int(($N-1)/2000)."/",e'
I would go with something like this:
#!/bin/bash

# outnum generates the name of the output directory
outnum=1
# n is the number of files we have moved
n=0

# Go through all JPG files in the current directory
for f in *.jpg; do
    # Create new output directory if first of new batch of 2000
    if [ $n -eq 0 ]; then
        outdir=folder$outnum
        mkdir $outdir
        ((outnum++))
    fi
    # Move the file to the new subdirectory
    mv "$f" "$outdir"
    # Count how many we have moved to there
    ((n++))
    # Start a new output directory if we have sent 2000
    [ $n -eq 2000 ] && n=0
done
The answer above is very useful, but there is a very important point for the Mac (10.13.6) terminal: because the xargs "-i" argument is not available there, I have changed the command from the above to the below.
ls -1 | sort -n | head -2000| xargs -I '{}' mv {} /folder/
Then, I use the below shell script (referencing tmp's answer):
#!/bin/bash

dir_size=500
dir_name="folder"
n=$(( ($(find . -maxdepth 1 -type f | wc -l) / dir_size) + 1 ))
for i in $(seq 1 "$n"); do
    mkdir -p "$dir_name$i"
    find . -maxdepth 1 -type f | head -n "$dir_size" | xargs -I '{}' mv {} "$dir_name$i"
done
This is a tweak of Mark Setchell's answer above.
Usage:
bash splitfiles.bash $PWD/directoryoffiles splitsize
It doesn't require the script to be located in the same directory as the files to be split, it operates on all files rather than just .jpg, and it lets you specify the split size as an argument.
#!/bin/bash

# outnum generates the name of the output directory
outnum=1
# n is the number of files we have moved
n=0

if [ "$#" -ne 2 ]; then
    echo Wrong number of args
    echo Usage: bash splitfiles.bash $PWD/directoryoffiles splitsize
    exit 1
fi

# Go through all files in the specified directory
for f in "$1"/*; do
    # Create new output directory if first of new batch
    if [ $n -eq 0 ]; then
        outdir=$1/$outnum
        mkdir "$outdir"
        ((outnum++))
    fi
    # Move the file to the new subdirectory
    mv "$f" "$outdir"
    # Count how many we have moved to there
    ((n++))
    # Start a new output directory if current new dir is full
    [ $n -eq "$2" ] && n=0
done
Can be directly run in the terminal
i=0
for f in *; do
    d=picture_$(printf %03d $((i/2000+1)))
    mkdir -p "$d"
    mv "$f" "$d"
    let i++
done
This script will move all files within the current directory into picture_001, picture_002, ... and so on; each newly created folder will contain 2000 files.
2000 is the chunk size.
%03d is the zero-padded numeric suffix, which you can adjust (currently 001, 002, 003).
picture_ is the folder prefix.
The script chunks all files in the current directory into subdirectories it creates on the fly.
You'll certainly have to write a script for that.
Hints of things to include in your script:
First count the number of files within your source directory
NBFILES=$(find . -type f -name '*.jpg' | wc -l)
Divide this count by 2000 and add 1 to determine the number of directories to create:
NBDIR=$(( NBFILES / 2000 + 1 ))
Finally, loop through your files and move them across the subdirs.
You'll have to use two nested loops: one to pick and create the destination directory, the other to move 2000 files into that subdir; then create the next subdir and move the next 2000 files into it, and so on, as in the sketch below.
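A minimal sketch of those nested loops (hedged: the dir prefix is hypothetical, NBDIR comes from the snippet above, and plain *.jpg names are assumed):
shopt -s nullglob   # an exhausted *.jpg glob then yields nothing
i=0
for (( d=1; d<=NBDIR; d++ )); do
    mkdir -p "dir$d"
    for f in *.jpg; do
        mv -- "$f" "dir$d/"
        (( ++i % 2000 == 0 )) && break   # batch full, start the next directory
    done
done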

Script fails with spaces in directory names

I have a really easy question. I have found a bunch of similar questions answered, but none that solved this for me.
I have a shell script that goes through a directory and prints out the number of files and directories in each immediate subdirectory, followed by the directory name.
However, it fails with directories containing spaces; it attempts to use each word as a new argument. I have tried putting $dir in quotation marks, but that doesn't help, perhaps because it's already inside the echo's quotation marks.
for dir in `find . -mindepth 1 -maxdepth 1 -type d`
do
    echo -e "`ls -1 $dir | wc -l`\t$dir"
done
Thanks in advance for your help :)
Warning: Two of the three code samples below use bashisms. Please take care to use the correct one if you need POSIX sh rather than bash.
Don't do any of those things. If your real problem does involve using find, you can use it like so:
shopt -s nullglob
while IFS='' read -r -d '' dir; do
    files=( "$dir"/* )
    printf '%s\t%s\n' "${#files[@]}" "$dir"
done < <(find . -mindepth 1 -maxdepth 1 -type d -print0)
However, for iterating over only immediate subdirectories, you don't need find at all:
shopt -s nullglob
for dir in */; do
    files=( "$dir"/* )
    printf '%s\t%s\n' "${#files[@]}" "$dir"
done
If you're trying to do this in a way compatible with POSIX sh, you can try the following:
for dir in */; do
    [ "$dir" = "*/" ] && continue
    set -- "$dir"/*
    [ "$#" -eq 1 ] && [ "$1" = "$dir/*" ] && continue
    printf '%s\t%s\n' "$#" "$dir"
done
You shouldn't ever use ls in scripts: http://mywiki.wooledge.org/ParsingLs
You shouldn't ever use for to read lines: http://mywiki.wooledge.org/DontReadLinesWithFor
Use arrays and globs when counting files to do this safely, robustly, and without external commands: http://mywiki.wooledge.org/BashFAQ/004
Always NUL-terminate file lists coming out of find -- otherwise, filenames containing newlines (yes, they're legal in UNIX!) can cause a single name to be read as multiple files, or (in some find versions and usages) your "filename" to not match the real file's name. http://mywiki.wooledge.org/UsingFind
