Merge two directories keeping larger files - bash

Consider for example
mkdir dir1
mkdir dir2
cd dir1
echo "This file contains something" > a
touch b
echo "This file contains something" > c
echo "This file contains something" > d
touch e
cd ../dir2
touch a
echo "This file contains something" > b
echo "This file contains something" > c
echo "This file contains more data than the other file that has the same name but is in the other directory. BlaBlaBlaBlaBlaBlaBlaBlaBla BlaBlaBlaBlaBlaBlaBlaBlaBla BlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBla. bla!" > d
I would like to merge dir1 and dir2. If two files have the same name, then only the one which size is the largest must be kept. Here is the expected content of the merged directory
a # Comes from `dir1`
b # Comes from `dir2`
c # Comes from either `dir1` or `dir2`
d # Comes from `dir2`
e # Comes from `dir1`(is empty)

Assuming that no file name a newline:
find . -type f -printf '%s %p\n' \
| sort -nr \
| while read -r size file; do
if ! [ -e "dest/${file#./*/}" ]; then
cp "$file" "dest/${file#./*/}";
fi;
done
The output of find is a list of "filesize path":
221 ./dir1/a
1002 ./dir1/b
11 ./dir2/a
Then we sort the list numeric:
1002 ./dir1/b
221 ./dir1/a
11 ./dir2/a
And fianlly we reach the while read -r size filename loop, where each file is copied over to the destination dest/${file#./*/} if they don't already exists.
${file#./*/} expands to the value of the parameter file with the leading directory removed:
./abc/def/foo/bar.txt -> def/foo/bar.txt, which means you might need to create the directory def/foo in the dest directory:
| while read -r size file; do
dest=dest/${file#./*/}
destdir=${dest%/*}
[ -e "$dest" ] && continue
[ -e "$destdir" ] || mkdir -p -- "$destdir"
cp -- "$file" "$dest"
done

I cannot comment on the other answer due to not enough reputation, but I was getting a syntax error due to missing fi. I also got an error where the target directory needed to be created before copying. So:
find . -type f -printf '%s %p\n' | sort -nr | while read -r size file; do if ! [ -e "dest/${file#./*/}" ]; then mkdir -p "$(dirname "dest/${file#./*/}")" && cp "$file" "dest/${file#./*/}"; fi; done

Related

Separate folders into subfolders according to numbering in bash

I have the following directory tree:
1_loc
2_buzdfg
4_foodga
5_bardfg
6_loc
8_buzass
9_foossd
12_bardaf
There may be numbers missing in the folder ordering.
I want to separate these folders into subfolders according to their numbers, so that all folders with a number smaller than 6 (before the second _loc folder) would go to folder1 and all folders with a number equal or greater than 6 with go to folder2.
I can solve the problem very easily using the mouse, of course, but I wanted a suggestion of how to do this automatically from the terminal.
Any ideas?
while read -r line; do
# Regex match the beginning of the string for a character between 1 and 5 - change this as you'd please to any regex
FOLDERNUMBER=""
[[ "$line" ~= "^[1-5]" ]] && FOLDERNUMBER="1" || FOLDERNUMBER="2"
# So FOLDERPATH = "./folder1", if FOLDERNUMBER=1
FOLDERPATH="./folder$FOLDERNUMBER"
# Check folder exists, if not create it
[[ ! -d "$FOLDERPATH" ]] && mkdir "$FOLDERPATH"
# Finally, move the file to FOLDERPATH
mv "$line" "$FOLDERPATH/$(basename $line)"
done < <(find . -type f)
# This iterates through each line of the command in the brackets - in this case, each file path from the `find` command.
I think the solution is to loop through the files and check the number before the first _.
Firstly, let's check how to get the number before _:
$ d="1_loc_b"
$ echo "${d%%_*}"
1
OK, so this works. Then, let's loop:
for file in *
do
echo "$file"
(( ${file%%_*} > 5)) && echo "moving to dir2/" || echo "moving to dir1/"
done
Suppose folder1 and folder2 exists in the same directory, I will do it like this:
for d in *_*; do # to avoid folder1 and folder2
# check if the first field seperated by _ is less than 5
if ((`echo $d | cut -d"_" -f1` < 6)); then
mv $d folder1/$d;
else
mv $d folder2/$d;
fi;
done
(more about cut)
You can go to the current directory and run these simple commands:
mv {1,2,3,4}_* folder1/
mv {5,6,7,8}_* folder2/
This assumes no other files/directory starting with these prefixes (i.e. 1-8).
Another pure bash, parameter-expansion solution:-
#!/bin/bash
# 'find' returns folders having a '_' in their names, the flag -print0 option to
# preserve special characters in names.
# the folders are names as './1_folder', './2_folder', bash magic is done
# to remove those special characters.
# '-v' flag in 'mv' for verbose action
while IFS= read -r -d '' folder; do
folderName="${folder%_*}" # To strip the characters after the '_'
finalName="${folderName##*/}" # To strip the everything before '/'
((finalName > 5)) && mv -v "$folder" folder1 || mv -v "$folder" folder2
done < <(find . -maxdepth 1 -mindepth 1 -name "*_*" -type d -print0)
You can create the a script with the following code and when you run it, the folders will be moved as desired..
#seperate the folders into 2 folders
#this is a generic solution for any folder that start with a number
#!/bin/bash
for file in *
do
prefix=`echo $file | awk 'BEGIN{FS="_"};{print $1}'`
if [[ $prefix != ?(-)+([0-9]) ]]
then continue
fi
if [ $prefix -le 4 ]
then mv "$file" folder1
elif [ $prefix -ge 5 ]
then mv "$file" folder2
fi
done

Move files from directories listed in file

I have a directory structure like the following toy example
DirectoryTo
DirectoryFrom
-Dir1
---File1.txt
---File2.txt
---File3.txt
-Dir2
---File4.txt
---File5.txt
---File6.txt
-Dir3
---File1.txt
---File5.txt
---File7.txt
I'm trying to copy all the files from DirectoryFrom to DirectoryTo, keeping the newer file if there are duplicates.
DirectoryTo
-File1.txt
-File2.txt
-File3.txt
-File4.txt
-File5.txt
-File6.txt
-File7.txt
DirectoryFrom
-Dir1
---File1.txt
---File2.txt
---File3.txt
-Dir2
---File4.txt
---File5.txt
---File6.txt
-Dir3
---File1.txt
---File5.txt
---File7.txt
I've created a text file with a list of all the subdirectories. This list is in the order such that the NEWEST files will be listed first:
Filelist.txt
C:/DirectoryFrom/Dir1
C:/DirectoryFrom/Dir2
C:/DirectoryFrom/Dir3
So what I'd like to do is loop through each directory in Filelist.txt, copy the files, and NOT replace if the file already exists.
I'd like to do this at the command line, in a shell script, or possibly in Python. I'm pretty new to Python, but have a little experience with the command line. However, I've never done something this complicated.
In reality, I have ~60 folders, each with 50-200 files in them, to give you a feel for how many I have. Also, each file is ~75MB.
I've done something similar in R before, but it's slow and not really meant for this. But here's what I've tried for a shell script, edited to fit this toy example:
#!/bin/bash
for line in Filelist.txt
do
cp -n line C:/DirectoryTo/
done
If you have only one one directory level in your DirectoryFrom then you can use:
cp -n DirectoryFrom/*/* DirectoryTo
explanation : copy every file which exist in subdirectories of DirectoryFrom to DirectoryTo if it doesn't exist
n flag is for not overwriting files if they already exist.
cp will also ignore directories if they exist in subdirectories of DirectoryTo
# Create test environnement :
mkdir C:/DirectoryTo
mkdir C:/DirectoryFrom
cd C:/DirectoryFrom
mkdir Dir1 Dir2 Dir3
(
cat << EOF
Dir1/File1.txt
Dir1/File2.txt
Dir1/File3.txt
Dir2/File4.txt
Dir2/File5.txt
Dir2/File6.txt
Dir3/File1.txt
Dir3/File5.txt
Dir3/File7.txt
EOF
)| while read f
do
echo "$f : `date`"
echo "$f : `date`" > $f
sleep 1
done
# create Filelist.txt file :
(
cat << EOF
C:/DirectoryFrom/Dir1
C:/DirectoryFrom/Dir2
C:/DirectoryFrom/Dir3
EOF
) > Filelist.txt
# Generate the liste of all files :
cd C:/DirectoryFrom
cat Filelist.txt | while read f; do ls -1 $f; done | sort -u > filenames.txt
cat filenames.txt
# liste of all files path, sorted by time order :
cd C:/DirectoryFrom
ls -1tr */* > all_filespath_sorted.txt
cat all_filespath_sorted.txt
# selected files to be copied :
cat filenames.txt | while read f; do cat all_filespath_sorted.txt | grep $f | tail -1 ; done
# copy of selected files:
cat filenames.txt | while read f; do cat all_filespath_sorted.txt | grep $f | tail -1 ; done | while read c
do
echo $c
cp -p $c C:/DirectoryTo
done
# verifying :
cd C:/DirectoryTo
ls -ltr
# or
ls -1 | while read f; do echo -e "\n$f\n-------"; cat $f; done
#------------------------------------------------
# Other solution for a limited number of files :
#------------------------------------------------
# To list files by order :
find `cat Filelist.txt | xargs` -type f | xargs ls -1tr
# To copy files, the newer will replace the older :
find `cat Filelist.txt | xargs` -type f | xargs ls -1tr | while read c
do
echo $c
cp -p $c C:/DirectoryTo
done

counting the total numbers of files and directories in a provided folder including subdirectories and their files

I want to count all the files and directories from a provided folder including files and directories in a subdirectory. I have written a script which will count accurately the number of files and directory but it does not handle the subdirectories any ideas ???
I want to do it without using FIND command
#!/bin/bash
givendir=$1
cd "$givendir" || exit
file=0
directories=0
for d in *;
do
if [ -d "$d" ]; then
directories=$((directories+1))
else
file=$((file+1))
fi
done
echo "Number of directories :" $directories
echo "Number of file Files :" $file
Use find:
echo "Number of directories: $(find "$1" -type d | wc -l)"
echo "Number of files/symlinks/sockets: $(find "$1" ! -type d | wc -l)"
Using plain shell and recursion:
#!/bin/bash
countdir() {
cd "$1"
dirs=1
files=0
for f in *
do
if [[ -d $f ]]
then
read subdirs subfiles <<< "$(countdir "$f")"
(( dirs += subdirs, files += subfiles ))
else
(( files++ ))
fi
done
echo "$dirs $files"
}
shopt -s dotglob nullglob
read dirs files <<< "$(countdir "$1")"
echo "There are $dirs dirs and $files files"
find "$1" -type f | wc -l will give you the files, find "$1" -type d | wc -l the directories
My quick-and-dirty shellscript would read
#!/bin/bash
test -d "$1" || exit
files=0
# Start with 1 to count the starting dir (as find does), else with 0
directories=1
function docount () {
for d in $1/*; do
if [ -d "$d" ]; then
directories=$((directories+1))
docount "$d";
else
files=$((files+1))
fi
done
}
docount "$1"
echo "Number of directories :" $directories
echo "Number of file Files :" $files
but mind it: On my build folder for a project, there were quite some differences:
find: 6430 dirs, 74377 non-dirs
my script: 6032 dirs, 71564 non-dirs
#thatotherguy's script: 6794 dirs, 76862 non-dirs
I assume that has to do with the legions of links, hidden files etc., but I am too lazy to investigate: find is the tool of choice.
Here are some one-line commands that work without find:
Number of directories: ls -Rl ./ | grep ":$" | wc -l
Number of files: ls -Rl ./ | grep "[0-9]:[0-9]" | wc -l
Explanation:
ls -Rl lists all files and directories recursively, one line each.
grep ":$" finds just the results whose last character is ':'. These are all of the directory names.
grep "[0-9]:[0-9]" matches on the HH:MM part of the timestamp. The timestamp only shows up on file, not directories. If your timestamp format is different then you will need to pick a different grep.
wc -l counts the number of lines that matched from the grep.

Why is while not not working?

AIM: To find files with a word count less than 1000 and move them another folder. Loop until all under 1k files are moved.
STATUS: It will only move one file, then error with "Unable to move file as it doesn't exist. For some reason $INPUT_SMALL doesn't seem to update with the new file name."
What am I doing wrong?
Current Script:
Check for input files already under 1k and move to Split folder
INPUT_SMALL=$( ls -S /folder1/ | grep -i reply | tail -1 )
INPUT_COUNT=$( cat /folder1/$INPUT_SMALL 2>/dev/null | wc -l )
function moveSmallInput() {
while [[ $INPUT_SMALL != "" ]] && [[ $INPUT_COUNT -le 1003 ]]
do
echo "Files smaller than 1k have been found in input folder, these will be moved to the split folder to be processed."
mv /folder1/$INPUT_SMALL /folder2/
done
}
I assume you are looking for files that has the word reply somewhere in the path. My solution is:
wc -w $(find /folder1 -type f -path '*reply*') | \
while read wordcount filename
do
if [[ $wordcount -lt 1003 ]]
then
printf "%4d %s\n" $wordcount $filename
#mv "$filename" /folder2
fi
done
Run the script once, if the output looks correct, then uncomment the mv command and run it for real this time.
Update
The above solution has trouble with files with embedded spaces. The problem occurs when the find command hands its output to the wc command. After a little bit of thinking, here is my revised soltuion:
find /folder1 -type f -path '*reply*' | \
while read filename
do
set $(wc -w "$filename") # $1= word count, $2 = filename
wordcount=$1
if [[ $wordcount -lt 1003 ]]
then
printf "%4d %s\n" $wordcount $filename
#mv "$filename" /folder2
fi
done
A somewhat shorter version
#!/bin/bash
find ./folder1 -type f | while read f
do
(( $(wc -w "$f" | awk '{print $1}' ) < 1000 )) && cp "$f" folder2
done
I left cp instead of mv for safery reasons. Change to mv after validating
I you also want to filter with reply use #Hai's version of the find command
Your variables INPUT_SMALL and INPUT_COUNT are not functions, they're just values you assigned once. You either need to move them inside your while loop or turn them into functions and evaluate them each time (rather than just expanding the variable values, as you are now).

Need a bash scripts to move files to sub folders automatically

I have a folder with 320G images, I want to move the images to 5 sub folders randomly(just need to move to 5 sub folders). But I know nothing on bash scripts.Please could someone help? thanks!
You could move the files do different directories based on their first letter:
mv [A-Fa-f]* dir1
mv [F-Kf-k]* dir2
mv [^A-Ka-k]* dir3
Here is my take on this. In order to use it place the script somewhere else (not in you folder) but run it from your folder. If you call your script file rmove.sh, you can place it in, say ~/scripts/, then cd to your folder and run:
source ~/scripts/rmove.sh
#/bin/bash
ndirs=$((`find -type d | wc -l` - 1))
for file in *; do
if [ -f "${file}" ]; then
rand=`dd if=/dev/random bs=1 count=1 2>/dev/null | hexdump -b | head -n1 | cut -d" " -f2`
rand=$((rand % ndirs))
i=0
for directory in `find -type d`; do
if [ "${directory}" = . ]; then
continue
fi
if [ $i -eq $rand ]; then
mv "${file}" "${directory}"
fi
i=$((i + 1))
done
fi
done
Here's my stab at the problem:
#!/usr/bin/env bash
sdprefix=subdir
dirs=5
# pre-create all possible sub dirs
for n in {1..5} ; do
mkdir -p "${sdprefix}$n"
done
fcount=$(find . -maxdepth 1 -type f | wc -l)
while IFS= read -r -d $'\0' file ; do
subdir="${sdprefix}"$(expr \( $RANDOM % $dirs \) + 1)
mv -f "$file" "$subdir"
done < <(find . -maxdepth 1 -type f -print0)
Works with huge numbers of files
Does not beak if a file is not moveable
Creates subdirectories if necessary
Does not break on unusual file names
Relatively cheap
Any scripting language will do so I'll write in Python here:
#!/usr/bin/python
import os
import random
new_paths = ['/path1', '/path2', '/path3', '/path4', '/path5']
image_directory = '/path/to/images'
for file_path in os.listdir(image_directory):
full_path = os.path.abspath(os.path.join(image_directory, file_path))
random_subdir = random.choice(new_paths)
new_path = os.path.abspath(os.path.join(random_subdir, file_path))
os.rename(full_path, new_path)
mv `ls | while read x; do echo "`expr $RANDOM % 1000`:$x"; done \
| sort -n| sed 's/[0-9]*://' | head -1` ./DIRNAME
run it in your current image directory, this command will select one file at a time and move it to ./DIRNAME, iterate this command until there are no more files to move.
Pay attention that ` is backquotes and not just quotes characters.

Resources