Looping through sub dirs in large data set, making a new folder with the subdir name, & then hardlinking select files to that new directory - bash

I'm struggling to get a nested for loop to work for this. The data set I am working with is very large (a little over a million files), and my nested-loop attempt seems unstable.
count=0
for dir in $(find "$sourceDir" -mindepth 1 -maxdepth 1 -type d)
do
    (
        mkdir -p "$destDir/$dir"
        for file in $(find . -type f)
        do
            (
                if [ $((count % 3)) -eq 2 ]
                then
                    cp -prl "$file" $destDir/$dir
                fi
                ((count++))
            )
        done
    )
    ((count++))
done
^^ this only goes into the last directory and finds the 3rd file. I need it to enter every directory and pick out every third file.
I've thought of breaking this up into chunks and running several scripts instead of just one to make it more scalable.

I was able to figure out the answer thanks to the commenters! My input was a folder with 4 subfolders, and each of those 4 subfolders contained 12 files.
My ideal output was every 3rd file (starting with the 3rd) hardlinked at an external location, sorted into matching subdirectories, something like this:
subdirA (3rd-file hardlink, 6th-file hardlink, 9th-file hardlink, 12th-file hardlink), subdirB (3rd-file hardlink, 6th-file hardlink, ...)
... and so on!
Here is what got it to work:
#!/bin/bash
for d in */; do
    echo "$d"
    mkdir -p "Desktop/testjan16/$d"
    # loop through each file in the folder and hardlink every third file
    # (starting with the 3rd) to the matching directory
    find "./$d" -type f | sort | awk 'NR % 3 == 0' | while read -r f; do
        ln "$f" "Desktop/testjan16/$d"
    done
done
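The word-splitting `for f in $(find ...)` pattern in the script above breaks on file names containing spaces. A null-delimited sketch of the same idea, wrapped in a function for reuse (the function name and arguments are my own; assumes GNU sort for the -z option):

```shell
#!/bin/bash
# Hardlink every 3rd file (sorted by name) from each subdirectory of $1
# into a same-named subdirectory of $2. Whitespace-safe via -print0/-z.
link_every_third() {
    local srcRoot=$1 destRoot=$2 d sub count f
    for d in "$srcRoot"/*/; do
        sub=$(basename "$d")
        mkdir -p "$destRoot/$sub"
        count=0
        while IFS= read -r -d '' f; do
            if (( ++count % 3 == 0 )); then
                ln "$f" "$destRoot/$sub/"
            fi
        done < <(find "$d" -type f -print0 | sort -z)
    done
}

# Example: link_every_third . "$HOME/Desktop/testjan16"
```

Process substitution (`< <(...)`) is used instead of a pipeline so the loop runs in the current shell and the counter behaves predictably.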

Related

BASH: copy only images from directory, not copying folder structure and rename copied files in sequential order

I have found an old HDD which was used in the family computer back in 2011. There are a lot of images on it which I would love to copy to my computer and print out in a nice photobook as a surprise to my parents and sister.
However, I have a problem: these photos were taken with older cameras, which means I have a lot of photos with names such as 01, 02, etc. These are now spread across hundreds of sub-folders.
I have already tried the following command but I still get exceptions where the file cannot be copied because one with the same name already exists.
Example: cp: cannot create regular file 'C:/Users/patri/Desktop/Fotoboek/battery.jpg': File exists
The command I execute:
$ find . -type f -regex '.*\(jpg\|jpeg\|png\|gif\|bmp\|mp4\)' -exec cp --backup=numbered '{}' C:/Users/patri/Desktop/Fotoboek \;
I had hoped that the --backup=numbered would solve my problem. (I thought that it would add either a 0,1,2 etc to the filename if it already exists, which it unfortunately doesn't do successfully).
Is there a way to find only media files such as images and videos like I have above and make it so that every file copied gets renamed to a sequential number? So the first copied image would have the name 0, then the 2nd 1, etc.
** doesn't do successfully ** is not a clear problem description. If I try your find command on sample directories on my system (Linux Mint 20), it works just fine: it creates files with ~1~, ~2~, ... added to the filename (mind you, after the extension).
If you want a quick and dirty solution, you could do:
#!/bin/bash
counter=1
find sourcedir -type f -print0 | while IFS= read -r -d '' file
do
    filename=$(basename -- "$file")
    extension="${filename##*.}"
    fileonly="${filename%.*}"
    cp "$file" "targetdir/${fileonly}_${counter}.$extension"
    (( counter += 1 ))
done
In this solution the counter is incremented every time a file is copied, so the numbers are global rather than sequential per filename.
Yes, I know it is an anti-pattern and not ideal, but it works.
If you want a "more evolved" version of the previous, where the numbers are sequential, you could do:
#!/bin/bash
find sourcedir -type f -print0 | while IFS= read -r -d '' file
do
    filename=$(basename -- "$file")
    extension="${filename##*.}"
    fileonly="${filename%.*}"
    counter=1
    while [[ -f "targetdir/${fileonly}_${counter}.$extension" ]]
    do
        (( counter += 1 ))
    done
    cp "$file" "targetdir/${fileonly}_${counter}.$extension"
done
This version increments the counter every time a file is found to exist with that counter. Ex. if you have 3 a.jpg files, they will be named a_1.jpg, a_2.jpg, a_3.jpg
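If the plain sequential names asked for in the question (0, 1, 2, ...) are preferred over suffixed originals, here is a minimal sketch (the function name, arguments, and zero-padding are my own choices; the media-type filtering from the question's find -regex is omitted for brevity):

```shell
#!/bin/bash
# Copy every file under $1 into $2, renamed to a zero-padded sequence
# number while keeping the original extension. Process substitution is
# used instead of a pipeline so $counter survives each iteration.
copy_sequential() {
    local src=$1 dst=$2 counter=0 file ext name
    mkdir -p "$dst"
    while IFS= read -r -d '' file; do
        ext="${file##*.}"
        printf -v name '%04d.%s' "$counter" "$ext"
        cp "$file" "$dst/$name"
        counter=$((counter + 1))
    done < <(find "$src" -type f -print0)
}

# Example: copy_sequential sourcedir targetdir
```

Note that files without an extension would get their whole name treated as the extension; a real script would want to guard against that.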

Running Command Recursively in all Subdirectories

I have a folder which contains a depth of up to 3 layers of subdirectories, with images in the deepest subdirectory (for most images, this is just one level). I want to rename these images to include the name of the directory they are in as a prefix. For this, I need to be able to run a single command for each subdirectory in this tree of subdirectories.
This is my attempt:
DIRS=/home/arjung2/data_256_nodir/a/*
for dir in $DIRS
do
    for f in $dir
    do
        echo "$dir"
        echo "$f"
        echo "$dir_$f"
        mv "$f" "$dir_$f"
    done
done
However, the first three echos print out the same thing for each 1-level-deep subdirectory (not for all subdirectories up to 3 levels deep, as I desire), and I get an error. An example output is the following:
/home/arjung2/data_256_nodir/a/airfield
/home/arjung2/data_256_nodir/a/airfield
/home/arjung2/data_256_nodir/a/airfield
mv: cannot move ‘/home/arjung2/data_256_nodir/a/airfield’ to a subdirectory of itself, ‘/home/arjung2/data_256_nodir/a/airfield/airfield’
Any ideas what I'm doing wrong? Any help will be much appreciated, thanks!!
Assuming all images can be identified with 'find' (e.g., by suffix), consider the following bash script:
#! /bin/bash
find . -type f -name '*.jpeg' -print0 | while IFS= read -r -d '' file; do
    d=${file%/*}
    d1=${d##*/}
    new_file="$d/${d1}_${file##*/}"
    echo "Move: $file -> $new_file"
    mv "$file" "$new_file"
done
It will move a/b/c.jpeg to a/b/b_c.jpeg, for every folder/file. Adjust (or remove) the -name as needed.
Say dir has the value /home/arjung2/data_256_nodir/a/airfield. In this case, the statement
for f in $dir
expands to
for f in /home/arjung2/data_256_nodir/a/airfield
which means that the inner loop is executed exactly once, with f taking the value /home/arjung2/data_256_nodir/a/airfield, which is the same as dir.
It would make more sense to iterate over the files within the directory:
for f in "$dir"/*
Also note that in "$dir_$f" the shell looks up a variable named dir_ (the underscore is a valid name character), which is unset; write "${dir}_${f}" to get the value of dir followed by an underscore.
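Putting both fixes together, a sketch of the corrected one-level renaming loop (the function name is mine; the root path argument corresponds to the asker's /home/arjung2/data_256_nodir/a):

```shell
#!/bin/bash
# For each subdirectory of $1, prefix every file inside it with the
# subdirectory's name, e.g. a/airfield/01.jpg -> a/airfield/airfield_01.jpg.
prefix_with_dirname() {
    local root=$1 dir prefix f
    for dir in "$root"/*/; do
        prefix=$(basename "$dir")
        for f in "$dir"*; do
            [ -f "$f" ] || continue
            mv "$f" "$dir${prefix}_$(basename "$f")"
        done
    done
}

# Example: prefix_with_dirname /home/arjung2/data_256_nodir/a
```

This handles only one directory level, like the asker's attempt; the find-based answer above is the way to cover the deeper levels.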

Sorting loop function creates infinite subdirectories

I routinely have to sort large amounts of images in multiple folders into two file types, ".JPG" and ".CR2". I'm fairly new to bash but have created a basic script that successfully sorts through one individual folder and divides these file types into distinct folders.
I'm having problems scaling this up to automatically loop through subdirectories. My current script creates an infinite loop of new subfolders until terminal times out.
How can I use the loop function without having it cycle through new folders?
function f {
    cd "$1"
    count=`ls -1 *.JPG 2>/dev/null | wc -l`
    if [ $count != 0 ]; then
        echo true
        mkdir JPG; mv *.JPG jpg
    else
        echo false
    fi
    count=`ls -1 *.CR2 2>/dev/null | wc -l`
    if [ $count != 0 ]; then
        echo true
        mkdir RAW; mv *.CR2 raw;
    else
        echo false
    fi
    for d in * .[!.]* ..?*; do
        cd "$1"
        test -d "$1/$d" && f "$1/$d"
    done
}
f "`pwd`"
I still advise people to use find instead of globbing with * in scripts. The * may not always work reliably, and it can fail in confusing ways.
First we create directories to move to:
mkdir -p ./jpg ./cr2
Note that -p in mkdir will make mkdir not fail in case the directory already exists.
Use find. Find all files named *.JPG and move each file to jpg :
find . -maxdepth 1 -mindepth 1 -name '*.JPG' -exec mv {} ./jpg \;
// similar
find . -maxdepth 1 -mindepth 1 -name '*.CR2' -exec mv {} ./cr2 \;
The -maxdepth 1 -mindepth 1 options stop find from scanning the directory recursively (recursion is the default, so remove them if you want it). If you want, you can also add -type f to match files only.
Notes to your script:
Don't parse ls output.
To count the files in a directory, you can use find . -mindepth 1 -maxdepth 1 -name '*.jpg' | wc -l instead.
for d in * .[!.]* ..?*; do — I have a vague idea what this is supposed to do: some kind of recursive scan of the directory. But if the directory JPG is inside $(pwd), then you will scan infinitely into your own output and move files into themselves. If the destination folder is outside the current directory, just modify the find commands by removing -mindepth 1; they will then scan recursively.
Don't use backticks; they are less readable and deprecated. Use $( .. ) instead.
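Pulling these points together, a find-based sketch that avoids the infinite-recursion trap by moving files into folders outside the scanned tree (the function name and the jpg/cr2 destination layout are my own choices):

```shell
#!/bin/bash
# Recursively sort .JPG and .CR2 files from $1 into $2/jpg and $2/cr2.
# Because the destinations live outside the scanned tree, find can
# never descend into its own output.
sort_images() {
    local src=$1 dest=$2
    mkdir -p "$dest/jpg" "$dest/cr2"
    find "$src" -type f -name '*.JPG' -exec mv {} "$dest/jpg/" \;
    find "$src" -type f -name '*.CR2' -exec mv {} "$dest/cr2/" \;
}

# Example: sort_images ~/Pictures/shoot ~/Pictures/sorted
```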

Split a folder into multiple subfolders in terminal/bash script

I have several folders, each with between 15,000 and 40,000 photos. I want each of these to be split into sub folders - each with 2,000 files in them.
What is a quick way to do this that will create each folder I need on the go and move all the files?
Currently I can only find how to move the first x items in a folder into a pre-existing directory. In order to use this on a folder with 20,000 items... I would need to create 10 folders manually, and run the command 10 times.
ls -1 | sort -n | head -2000| xargs -i mv "{}" /folder/
I tried putting it in a for-loop, but am having trouble getting it to make folders properly with mkdir. Even after I get around that, I need the program to only create folders for every 20th file (start of a new group). It wants to make a new folder for each file.
So... how can I easily move a large number of files into folders of an arbitrary number of files in each one?
Any help would be very... well... helpful!
Try something like this:
for i in `seq 1 20`; do mkdir -p "folder$i"; find . -maxdepth 1 -type f | head -n 2000 | xargs -i mv "{}" "folder$i"; done
Full script version:
#!/bin/bash
dir_size=2000
dir_name="folder"
n=$((`find . -maxdepth 1 -type f | wc -l`/$dir_size+1))
for i in `seq 1 $n`; do
    mkdir -p "$dir_name$i"
    find . -maxdepth 1 -type f | head -n $dir_size | xargs -i mv "{}" "$dir_name$i"
done
For dummies:
create a new file: vim split_files.sh
update the dir_size and dir_name values to match your desires
note that the dir_name will have a number appended
navigate into the desired folder: cd my_folder
run the script: sh ../split_files.sh
This solution worked for me on MacOS:
i=0; for f in *; do d=dir_$(printf %03d $((i/100+1))); mkdir -p $d; mv "$f" $d; let i++; done
It creates subfolders of 100 elements each.
This solution can handle names with whitespace and wildcards and can be easily extended to support less straightforward tree structures. It will look for files in all direct subdirectories of the working directory and sort them into new subdirectories of those. New directories will be named 0, 1, etc.:
#!/bin/bash
maxfilesperdir=20

# loop through all top level directories:
while IFS= read -r -d $'\0' topleveldir
do
    # enter top level subdirectory:
    cd "$topleveldir"
    declare -i filecount=0 # number of moved files per dir
    declare -i dircount=0  # number of subdirs created per top level dir
    # loop through all files in that directory and below
    while IFS= read -r -d $'\0' filename
    do
        # whenever file counter is 0, make a new dir:
        if [ "$filecount" -eq 0 ]
        then
            mkdir "$dircount"
        fi
        # move the file into the current dir:
        mv "$filename" "${dircount}/"
        filecount+=1
        # whenever our file counter reaches its maximum, reset it, and
        # increase dir counter:
        if [ "$filecount" -ge "$maxfilesperdir" ]
        then
            dircount+=1
            filecount=0
        fi
    done < <(find -type f -print0)
    # go back to top level:
    cd ..
done < <(find -mindepth 1 -maxdepth 1 -type d -print0)
The find -print0/read combination with process substitution has been stolen from another question.
It should be noted that simple globbing can handle all kinds of strange directory and file names as well. It is however not easily extensible for multiple levels of directories.
The code below assumes that the filenames do not contain linefeeds, spaces, tabs, single quotes, double quotes, or backslashes, and that filenames do not start with a dash. It also assumes that IFS has not been changed, because it uses while read instead of while IFS= read, and because variables are not quoted. Add setopt shwordsplit in Zsh.
i=1;while read l;do mkdir $i;mv $l $((i++));done< <(ls|xargs -n2000)
The code below assumes that filenames do not contain linefeeds and that they do not start with a dash. -n2000 takes 2000 arguments at a time and {#} is the sequence number of the job. Replace {#} with '{=$_=sprintf("%04d",$job->seq())=}' to pad numbers to four digits.
ls|parallel -n2000 mkdir {#}\;mv {} {#}
The command below assumes that filenames do not contain linefeeds. It uses the implementation of rename by Aristotle Pagaltzis which is the rename formula in Homebrew, where -p is needed to create directories, where --stdin is needed to get paths from STDIN, and where $N is the number of the file. In other implementations you can use $. or ++$::i instead of $N.
ls|rename --stdin -p 's,^,1+int(($N-1)/2000)."/",e'
I would go with something like this:
#!/bin/bash
# outnum generates the name of the output directory
outnum=1
# n is the number of files we have moved
n=0

# Go through all JPG files in the current directory
for f in *.jpg; do
    # Create new output directory if first of new batch of 2000
    if [ $n -eq 0 ]; then
        outdir=folder$outnum
        mkdir $outdir
        ((outnum++))
    fi
    # Move the file to the new subdirectory
    mv "$f" "$outdir"
    # Count how many we have moved to there
    ((n++))
    # Start a new output directory if we have sent 2000
    [ $n -eq 2000 ] && n=0
done
The answer above is very useful, but there is a very important point for the Mac (10.13.6) terminal: because the xargs "-i" argument is not available there, I have changed the command from above to the one below.
ls -1 | sort -n | head -2000| xargs -I '{}' mv {} /folder/
Then, I use the below shell script(reference tmp's answer)
#!/bin/bash
dir_size=500
dir_name="folder"
n=$((`find . -maxdepth 1 -type f | wc -l`/$dir_size+1))
for i in `seq 1 $n`; do
    mkdir -p "$dir_name$i"
    find . -maxdepth 1 -type f | head -n $dir_size | xargs -I '{}' mv {} "$dir_name$i"
done
This is a tweak of Mark Setchell's answer.
Usage:
bash splitfiles.bash $PWD/directoryoffiles splitsize
It doesn't require the script to be located in the same directory as the files to split, it operates on all files rather than just .jpg, and it lets you specify the split size as an argument.
#!/bin/bash
# outnum generates the name of the output directory
outnum=1
# n is the number of files we have moved
n=0

if [ "$#" -ne 2 ]; then
    echo Wrong number of args
    echo Usage: bash splitfiles.bash $PWD/directoryoffiles splitsize
    exit 1
fi

# Go through all files in the specified directory
for f in $1/*; do
    # Create new output directory if first of new batch
    if [ $n -eq 0 ]; then
        outdir=$1/$outnum
        mkdir $outdir
        ((outnum++))
    fi
    # Move the file to the new subdirectory
    mv "$f" "$outdir"
    # Count how many we have moved to there
    ((n++))
    # Start a new output directory if current new dir is full
    [ $n -eq $2 ] && n=0
done
Can be directly run in the terminal
i=0
for f in *; do
    d=picture_$(printf %03d $((i/2000+1)))
    mkdir -p $d
    mv "$f" $d
    let i++
done
This script will move all files within the current directory into picture_001, picture_002, ... and so on, with each newly created folder containing 2000 files.
2000 is the chunk size.
%03d is the zero-padded suffix you can adjust (currently 001, 002, 003).
picture_ is the folder prefix.
The script chunks all files into subdirectories that it creates as it goes.
You'll certainly have to write a script for that.
Hints of things to include in your script:
First count the number of files within your source directory:
NBFILES=$(find . -type f -name '*.jpg' | wc -l)
Divide this count by 2000 and add 1 to determine the number of directories to create:
NBDIR=$(( NBFILES / 2000 + 1 ))
Finally loop through your files and move them across the subdirs.
You'll have to use two nested loops: one to pick and create the destination directory, the other to move 2000 files into that subdir, then create the next subdir and move the next 2000 files into it, and so on...
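A minimal sketch completing those hints (the function name and numbered-folder convention are my own; folders are created lazily as each chunk starts, which makes the up-front NBDIR computation unnecessary):

```shell
#!/bin/bash
# Move the files directly inside $1 into numbered subdirectories of $1
# (1, 2, 3, ...), $2 files per subdirectory.
split_into_chunks() {
    local src=$1 chunk=$2 i=0 dir f
    for f in "$src"/*; do
        [ -f "$f" ] || continue           # skip the subdirs we create
        dir="$src/$(( i / chunk + 1 ))"   # 0..chunk-1 -> 1, etc.
        mkdir -p "$dir"
        mv "$f" "$dir/"
        i=$(( i + 1 ))
    done
}

# Example: split_into_chunks ~/photos 2000
```

Because the glob is expanded once before the loop starts, the newly created numbered directories are never themselves processed.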

command line find first file in a directory

My directory structure is as follows
Directory1\file1.jpg
          \file2.jpg
          \file3.jpg
Directory2\anotherfile1.jpg
          \anotherfile2.jpg
          \anotherfile3.jpg
Directory3\yetanotherfile1.jpg
          \yetanotherfile2.jpg
          \yetanotherfile3.jpg
I'm trying to use the command line in a bash shell on ubuntu to take the first file from each directory and rename it to the directory name and move it up one level so it sits alongside the directory.
In the above example:
file1.jpg would be renamed to Directory1.jpg and placed alongside the folder Directory1
anotherfile1.jpg would be renamed to Directory2.jpg and placed alongside the folder Directory2
yetanotherfile1.jpg would be renamed to Directory3.jpg and placed alongside the folder Directory3
I've tried using:
find . -name "*.jpg"
but it does not list the files in sequential order (I need the first file).
This line:
find . -name "*.jpg" -type f -exec ls "{}" +;
lists the files in the correct order but how do I pick just the first file in each directory and move it up one level?
Any help would be appreciated!
Edit: When I refer to the first file, what I mean is that each jpg is numbered, from 0 up to however many files are in that folder, for example: file1, file2, ..., file34, file35, etc. Another thing to mention: the naming format is inconsistent, so the numbering might start at 0 or 1a or 1b, etc.
You can go inside each dir and run:
$ mv `ls | head -n 1` ..
If first means whatever the shell glob finds first (lexical, but probably affected by LC_COLLATE), then this should work:
for dir in */; do
    for file in "$dir"*.jpg; do
        echo mv "$file" "${file%/*}.jpg" # If it does what you want, remove the echo
        break 1
    done
done
Proof of concept:
$ mkdir dir{1,2,3} && touch dir{1,2,3}/file{1,2,3}.jpg
$ for dir in */; do for file in "$dir"*.jpg; do echo mv "$file" "${file%/*}.jpg"; break 1; done; done
mv dir1/file1.jpg dir1.jpg
mv dir2/file1.jpg dir2.jpg
mv dir3/file1.jpg dir3.jpg
Look for all first-level directories, identify the first file in each directory, and then move it one level up:
find . -type d \! -name . -prune | while read -r d; do
    f=$(ls "$d" | head -n 1)
    mv "$d/$f" .
done
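Note that the loop above only moves each first file without renaming it. To also rename it to the directory name as asked, here is a glob-based sketch (assuming bash and .jpg output names; the function name is mine):

```shell
#!/bin/bash
# For each direct subdirectory of $1, move its lexically first file up
# one level, renamed to <directoryname>.jpg.
first_file_up() {
    local root=$1 d f
    for d in "$root"/*/; do
        for f in "$d"*; do
            [ -f "$f" ] || continue
            mv "$f" "${d%/}.jpg"   # e.g. Directory1/file1.jpg -> Directory1.jpg
            break
        done
    done
}

# Example: first_file_up .
```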
Building on the top answer, here is a general use bash function that simply returns the first path that resolves to a file within the given directory:
getFirstFile() {
    for dir in "$1"; do
        for file in "$dir"*; do
            if [ -f "$file" ]; then
                echo "$file"
                break 1
            fi
        done
    done
}
Usage:
# don't forget the trailing slash
getFirstFile ~/documents/
NOTE: it will silently return nothing if you pass it an invalid path.
