Breaking down a filename into lexicographic based folders - bash

Let's say I have thousands of images in a folder in the format filename_order.jpg.
filename are encoded as a 7 digits integer from 0000000 to 9999999
order is a number between 0 and 9
folder/
6398305_0.jpg
6398305_1.jpg
6398305_2.jpg
...
6399305_0.jpg
Is there an easy way to sort them into equality repartitioned folders based on the filenames?
folder/
6/3/9/
8/3/0/5/
6398305_0.jpg
6398305_1.jpg
6398305_2.jpg
...
9/3/0/7/
6399307_0.jpg
Is there a way to do the reverse operation as well: given a nested tree structure bringing it back to level 1 only.
The goal is being able to store them in S3 in an efficient way for millions of images.
Thank you.

This would do it in pure Bash:
#!/usr/bin/env bash
# extglob needed to expand number into a serie of folders path
shopt -s extglob
# Starting folder name
folder=folder
# Iterate all *.jpg files in folder
for file in "$folder/"*.jpg; do
# Remove leading directory path from file to get basename
basename="${file##*/}"
# Remove everything ater first _ to get only numbers
numbers="${basename%_*}"
# Insert / before each number to create a directory path from numbers
# Need Bash extglob
dir="$folder${numbers//?()/\/}"
# Create the directory path
echo mkdir -p "$dir"
# move file to its directory
echo mv "$file" "$dir/"
done
Remove the echo if the output matches your expectations.

Nesting a flat folder,
cp -R flat_folder/ nested_folder/
cd nested_folder/
for f in *_[0-9].jpg
do
filename=${f%.*}
extension=${f##*.}
number=${filename%_*}
index=${filename##*_}
folder=$(echo $number | sed 's/\(.\)\(.\)\(.\)\(.\)\(.\)\(.\)\(.\)/\1\/\2\/\3\/\4\/\5\/\6\/\7/')
mkdir -p $folder
mv $f $folder/
done
Flattening a nested folder,
cd nested_folder/
find . -name "*.jpg" -exec cp {} ../flat_folder/ \;

Related

change unzipped folder name

I have a fairly large number of directories (500+), each directory (and possible sub-directories) contains 4 or more zip files.
I managed to piece together a bash script that unzips the compressed files while maintaining zip filename as directory and all the directory hierarchy.
For example: If I have a zip file called 100011_test123.zip, and it contains 10 files. The script will uncompress all the files into 100011_test123/ directory.
The occurrence of numbers 100010 before the underscore in the filename/directoryname is totally random.
Here's the actual bash script:
#!/bin/bash
cd <directory-with-large-number-of-zip-files>
find . -name "*.zip" | while read filename; do unar -d -o "`dirname "$filename"`" "$filename"; done;
find . -name "*.zip" -type f -delete
Now I would like to update the script in order to remove the 100010_ from the .zip filename without tampering with the directory structure/hierarchy (I guess there's a way to rename the zip files before using unar command) and then uncompress the files into a directory without 100010_ at the beginning.
I have been stuck with this for more than 3 days. Any insights on this would be highly appreciated.
Thank you.
With all zip files at the same level, you don't need find, but a regular filename pattern globbing will do to iterate each zip archive.
And with bash's globstar option, you can also find the zip archives inside sub-directories
#!/usr/bin/env bash
shopt -s nullglob # Prevents iterating if no filename match
shopt -s globstar # ./**/ Allow searching inside sub-directories
# Set the basedir if you want all output directories at same place
#basedir="$PWD"
for zipfile in ./**/*.zip; do
# Extract the base directory containing the archive
zipdir="${zipfile%/*}"
# Extract the base name without the directory path
basename="${zipfile##*/}"
# Remove the .zip extension
# 100011_test123.zip -> 100011_test123
extensionless="${basename%.zip}"
# Remove everything before and first underscore 100011_
# 100011_test123 -> test123
outputdir="${basedir:-$zipdir}/${extensionless#*_}"
# Create output directory or continue with next archive
# mkdir -p test123
mkdir -p "$outputdir" || continue
# Unzip the zipfile into the outputdir and remove the zipfile if successful
# unrar -d -o test123 100011_test123.zip && rm -f -- 100011_test123.zip
unar -d -o "$outputdir" "$zipfile" && rm -f -- "$zipfile"
done
You need to parse directory name and filename first for each entry. Please check the ${fullpath%/*} and ${fullpath##*/} for this purpose. And awk for splitting filename with '_' and getting second part of it.
You can try following code.
#!/bin/bash
# cd directory
zip_files=($(find . -name "*.zip"))
for fullpath in "${zip_files[#]}"; do
echo "Processing: "$fullpath""
DIRNAME="${fullpath%/*}"
FILENAME="${fullpath##*/}"
NEW_FILENAME="`echo $FILENAME | awk -F'_' '{print $NF}'`"
echo " DIRNAME="$DIRNAME
echo " NEW_FILENAME="$NEW_FILENAME
mv $fullpath "$DIRNAME/$NEW_FILENAME"
# call unar command
unar -d -o $DIRNAME $NEW_FILENAME
# delete file if you want
done

Bash to rename files to append folder name

In folders and subfolders, I have a bunch of images named by date. I'm trying to come up with a script to look into a folder/subfolders and rename all jpg files to add the folder name.
Example:
/Desktop/Trip 1/200512 1.jpg
/Desktop/Trip 1/200512 2.jpg
would became:
/Desktop/Trip 1/Trip 1 200512 1.jpg
/Desktop/Trip 1/Trip 1 200512 2.jpg
I tried tweaking this script but I can't figure out how to get it to add the new part. I also don't know how to get it to work on subfolders.
#!/bin/bash
# Ignore case, i.e. process *.JPG and *.jpg
shopt -s nocaseglob
shopt -s nullglob
cd ~/Desktop/t/
# Get last part of directory name
here=$(pwd)
dir=${here/*\//}
i=1
for image in *.JPG
do
echo mv "$image" "${dir}${name}.jpg"
((i++))
done
Using find with the -iname option for a case insensitive match and a small script to loop over the images:
find /Desktop -iname '*.jpg' -exec sh -c '
for img; do
parentdir=${img%/*} # leave the parent dir (remove the last `/` and filename)
dirname=${parentdir##*/} # leave the parent directory name (remove all parent paths `*/`)
echo mv -i "$img" "$parentdir/$dirname ${img##*/}"
done
' sh {} +
This extracts the parent path for each image path (like the dirname command) and the directory name (like basename) and constructs a new output filename with the parent directory name before the image filename.
Remove the echo if the output looks as expected.
Try the following in your for loop. Note that '' is used to that the script can deal with spaces in the file names.
for image in "$search_dir"*.JPG
do
echo mv "'$image'" "'${dir} ${image}'"
done

How to move files from subfolders to their parent directory (unix, terminal)

I have a folder structure like this:
A big parent folder named Photos. This folder contains 900+ subfolders named a_000, a_001, a_002 etc.
Each of those subfolders contain more subfolders, named dir_001, dir_002 etc. And each of those subfolders contain lots of pictures (with unique names).
I want to move all these pictures contained in the subdirectories of a_xxx inside a_xxx. (where xxx could be 001, 002 etc)
After looking in similar questions around, this is the closest solution I came up with:
for file in *; do
if [ -d $file ]; then
cd $file; mv * ./; cd ..;
fi
done
Another solution I got is doing a bash script:
#!/bin/bash
dir1="/path/to/photos/"
subs= `ls $dir1`
for i in $subs; do
mv $dir1/$i/*/* $dir1/$i/
done
Still, I'm missing something, can you help?
(Then it would be nice to discard the empty dir_yyy, but not much of a problem at the moment)
You could try the following bash script :
#!/bin/bash
#needed in case we have empty folders
shopt -s nullglob
#we must write the full path here (no ~ character)
target="/path/to/photos"
#we use a glob to list the folders. parsing the output of ls is baaaaaaaddd !!!!
#for every folder in our photo folder ...
for dir in "$target"/*/
do
#we list the subdirectories ...
for sub in "$dir"/*/
do
#and we move the content of the subdirectories to the parent
mv "$sub"/* "$dir"
#if you want to remove subdirectories once the copy is done, uncoment the next line
#rm -r "$sub"
done
done
Here is why you don't parse ls in bash
Make sure the directory where the files exist is correct (and complete) in the following script and try it:
#!/bin/bash
BigParentDir=Photos
for subdir in "$BigParentDir"/*/; do # Select the a_001, a_002 subdirs
for ssdir in "$subdir"/*/; do # Select dir_001, … sub-subdirs
for f in "$ssdir"/*; do # Select the files to move
if [[ -f $f ]]; do # if indeed are files
echo \
mv "$ssdir"/* "$subdir"/ # Move the files.
fi
done
done
done
No file will be moved, just printed. If you are sure the script does what you want, comment the echo line and run it "for real".
You can try this
#!/bin/bash
dir1="/path/to/photos/"
subs= `ls $dir1`
cp /dev/null /tmp/newscript.sh
for i in $subs; do
find $dir1/$i -type f -exec echo mv \'\{\}\' $dir1/$i \; >> /tmp/newscript.sh
done
then open /tmp/newscript.sh with an editor or less and see if looks like what you are trying to do.
if it does then execute it with sh -x /tmp/newscript.sh

Bash: how to copy multiple files with same name to multiple folders

I am working on Linux machine.
I have a lot of files named the same, with a directory structure like this:
P45_input_foo/result.dat
P45_input_bar/result.dat
P45_input_tar/result.dat
P45_input_cool/result.dat ...
It is difficult to copy them one by one. I want to copy them into another folder named as data with similar folder names and file names:
/data/foo/result.dat
/data/bar/result.dat
/data/tar/result.dat
/data/cool/result.dat ...
In stead of copy them one by one what I should do?
Using a for loop in bash :
# we list every files following the pattern : ./<somedirname>/<any file>
# if you want to specify a format for the folders, you could change it here
# i.e. for your case you could write 'for f in P45*/*' to only match folders starting by P45
for f in */*
do
# we strip the path of the file from its filename
# i.e. 'P45_input_foo/result.dat' will become 'P45_input_foo'
newpath="${f%/*}"
# mkdir -p /data/${newpath##*_} will create our new data structure
# - /data/${newpath##*_} extract the last chain of character after a _, in our example, 'foo'
# - mkdir -p will recursively create our structure
# - cp "$f" "$_" will copy the file to our new directory. It will not launch if mkdir returns an error
mkdir -p /data/${newpath##*_} && cp "$f" "$_"
done
the ${newpath##*_} and ${f%/*} usage are part of Bash string manipulation methods. You can read more about it here.
You will need to extract the 3rd item after "_" :
P45_input_foo --> foo
create the directory (if needed) and copy the file to it. Something like this (not tested, might need editing):
STARTING_DIR="/"
cd "$STARTING_DIR"
VAR=$(ls -1)
while read DIR; do
TARGET_DIR=$(echo "$DIR" | cut -d'_' -f3)
NEW_DIR="/data/$DIR"
if [ ! -d "$NEW_DIR" ]; then
mkdir "$NEW_DIR"
fi
cp "$DIR/result.dat" "$NEW_DIR/result.dat"
if [ $? -ne 0 ];
echo "ERROR: encountered an error while copying"
fi
done <<<"$VAR"
Explanation: assuming all the paths you've mentioned are under root / (if not change STARTING_PATH accordingly). With ls you get the list of the directories, store the output in VAR. Pass the content of VAR to the while loop.
A bit of find and with a few bash tricks, the below script could do the trick for you. Remember to run the script without the mv and see if "/data/"$folder"/" is the actual path that you want to move the file(s).
#!/bin/bash
while IFS= read -r -d '' file
do
fileNew="${file%/*}" # Everything before the last '\'
fileNew="${fileNew#*/}" # Everything after the last '\'
IFS="_" read _ _ folder <<<"$fileNew"
mv -v "$file" "/data/"$folder"/"
done < <(find . -type f -name "result.dat" -print0)

Split a folder into multiple subfolders in terminal/bash script

I have several folders, each with between 15,000 and 40,000 photos. I want each of these to be split into sub folders - each with 2,000 files in them.
What is a quick way to do this that will create each folder I need on the go and move all the files?
Currently I can only find how to move the first x items in a folder into a pre-existing directory. In order to use this on a folder with 20,000 items... I would need to create 10 folders manually, and run the command 10 times.
ls -1 | sort -n | head -2000| xargs -i mv "{}" /folder/
I tried putting it in a for-loop, but am having trouble getting it to make folders properly with mkdir. Even after I get around that, I need the program to only create folders for every 20th file (start of a new group). It wants to make a new folder for each file.
So... how can I easily move a large number of files into folders of an arbitrary number of files in each one?
Any help would be very... well... helpful!
Try something like this:
for i in `seq 1 20`; do mkdir -p "folder$i"; find . -type f -maxdepth 1 | head -n 2000 | xargs -i mv "{}" "folder$i"; done
Full script version:
#!/bin/bash
dir_size=2000
dir_name="folder"
n=$((`find . -maxdepth 1 -type f | wc -l`/$dir_size+1))
for i in `seq 1 $n`;
do
mkdir -p "$dir_name$i";
find . -maxdepth 1 -type f | head -n $dir_size | xargs -i mv "{}" "$dir_name$i"
done
For dummies:
create a new file: vim split_files.sh
update the dir_size and dir_name values to match your desires
note that the dir_name will have a number appended
navigate into the desired folder: cd my_folder
run the script: sh ../split_files.sh
This solution worked for me on MacOS:
i=0; for f in *; do d=dir_$(printf %03d $((i/100+1))); mkdir -p $d; mv "$f" $d; let i++; done
It creates subfolders of 100 elements each.
This solution can handle names with whitespace and wildcards and can be easily extended to support less straightforward tree structures. It will look for files in all direct subdirectories of the working directory and sort them into new subdirectories of those. New directories will be named 0, 1, etc.:
#!/bin/bash
maxfilesperdir=20
# loop through all top level directories:
while IFS= read -r -d $'\0' topleveldir
do
# enter top level subdirectory:
cd "$topleveldir"
declare -i filecount=0 # number of moved files per dir
declare -i dircount=0 # number of subdirs created per top level dir
# loop through all files in that directory and below
while IFS= read -r -d $'\0' filename
do
# whenever file counter is 0, make a new dir:
if [ "$filecount" -eq 0 ]
then
mkdir "$dircount"
fi
# move the file into the current dir:
mv "$filename" "${dircount}/"
filecount+=1
# whenever our file counter reaches its maximum, reset it, and
# increase dir counter:
if [ "$filecount" -ge "$maxfilesperdir" ]
then
dircount+=1
filecount=0
fi
done < <(find -type f -print0)
# go back to top level:
cd ..
done < <(find -mindepth 1 -maxdepth 1 -type d -print0)
The find -print0/read combination with process substitution has been stolen from another question.
It should be noted that simple globbing can handle all kinds of strange directory and file names as well. It is however not easily extensible for multiple levels of directories.
The code below assumes that the filenames do not contain linefeeds, spaces, tabs, single quotes, double quotes, or backslashes, and that filenames do not start with a dash. It also assumes that IFS has not been changed, because it uses while read instead of while IFS= read, and because variables are not quoted. Add setopt shwordsplit in Zsh.
i=1;while read l;do mkdir $i;mv $l $((i++));done< <(ls|xargs -n2000)
The code below assumes that filenames do not contain linefeeds and that they do not start with a dash. -n2000 takes 2000 arguments at a time and {#} is the sequence number of the job. Replace {#} with '{=$_=sprintf("%04d",$job->seq())=}' to pad numbers to four digits.
ls|parallel -n2000 mkdir {#}\;mv {} {#}
The command below assumes that filenames do not contain linefeeds. It uses the implementation of rename by Aristotle Pagaltzis which is the rename formula in Homebrew, where -p is needed to create directories, where --stdin is needed to get paths from STDIN, and where $N is the number of the file. In other implementations you can use $. or ++$::i instead of $N.
ls|rename --stdin -p 's,^,1+int(($N-1)/2000)."/",e'
I would go with something like this:
#!/bin/bash
# outnum generates the name of the output directory
outnum=1
# n is the number of files we have moved
n=0
# Go through all JPG files in the current directory
for f in *.jpg; do
# Create new output directory if first of new batch of 2000
if [ $n -eq 0 ]; then
outdir=folder$outnum
mkdir $outdir
((outnum++))
fi
# Move the file to the new subdirectory
mv "$f" "$outdir"
# Count how many we have moved to there
((n++))
# Start a new output directory if we have sent 2000
[ $n -eq 2000 ] && n=0
done
The answer above is very useful, but there is a very import point in Mac(10.13.6) terminal. Because xargs "-i" argument is not available, I have change the command from above to below.
ls -1 | sort -n | head -2000| xargs -I '{}' mv {} /folder/
Then, I use the below shell script(reference tmp's answer)
#!/bin/bash
dir_size=500
dir_name="folder"
n=$((`find . -maxdepth 1 -type f | wc -l`/$dir_size+1))
for i in `seq 1 $n`;
do
mkdir -p "$dir_name$i";
find . -maxdepth 1 -type f | head -n $dir_size | xargs -I '{}' mv {} "$dir_name$i"
done
This is a tweak of Mark Setchell's
Usage:
bash splitfiles.bash $PWD/directoryoffiles splitsize
It doesn't require the script to be located in the same dir as the files for splitting, it will operate on all files, not just the .jpg and allows you to specify the split size as an argument.
#!/bin/bash
# outnum generates the name of the output directory
outnum=1
# n is the number of files we have moved
n=0
if [ "$#" -ne 2 ]; then
echo Wrong number of args
echo Usage: bash splitfiles.bash $PWD/directoryoffiles splitsize
exit 1
fi
# Go through all files in the specified directory
for f in $1/*; do
# Create new output directory if first of new batch
if [ $n -eq 0 ]; then
outdir=$1/$outnum
mkdir $outdir
((outnum++))
fi
# Move the file to the new subdirectory
mv "$f" "$outdir"
# Count how many we have moved to there
((n++))
# Start a new output directory if current new dir is full
[ $n -eq $2 ] && n=0
done
Can be directly run in the terminal
i=0;
for f in *;
do
d=picture_$(printf %03d $((i/2000+1)));
mkdir -p $d;
mv "$f" $d;
let i++;
done
This script will move all files within the current directory into picture_001, picture_002... and so on. Each newly created folder will contain 2000 files
2000 is the chunked number
%03d is the suffix digit you can adjust (currently 001,002,003)
picture_ is the folder prefix
This script will chunk all files into its directory (create subdirectory)
You'll certainly have to write a script for that.
Hints of things to include in your script:
First count the number of files within your source directory
NBFiles=$(find . -type f -name *.jpg | wc -l)
Divide this count by 2000 and add 1, to determine number of directories to create
NBDIR=$(( $NBFILES / 2000 + 1 ))
Finally loop through your files and move them accross the subdirs.
You'll have to use two imbricated loops : one to pick and create the destination directory, the other to move 2000 files in this subdir, then create next subdir and move the next 2000 files to the new one, etc...

Resources