Move files from directories listed in a file - shell

I have a directory structure like the following toy example
DirectoryTo
DirectoryFrom
-Dir1
---File1.txt
---File2.txt
---File3.txt
-Dir2
---File4.txt
---File5.txt
---File6.txt
-Dir3
---File1.txt
---File5.txt
---File7.txt
I'm trying to copy all the files from DirectoryFrom to DirectoryTo, keeping the newer file if there are duplicates.
DirectoryTo
-File1.txt
-File2.txt
-File3.txt
-File4.txt
-File5.txt
-File6.txt
-File7.txt
DirectoryFrom
-Dir1
---File1.txt
---File2.txt
---File3.txt
-Dir2
---File4.txt
---File5.txt
---File6.txt
-Dir3
---File1.txt
---File5.txt
---File7.txt
I've created a text file with a list of all the subdirectories. The list is ordered so that the NEWEST files are listed first:
Filelist.txt
C:/DirectoryFrom/Dir1
C:/DirectoryFrom/Dir2
C:/DirectoryFrom/Dir3
So what I'd like to do is loop through each directory in Filelist.txt, copy the files, and NOT replace if the file already exists.
I'd like to do this at the command line, in a shell script, or possibly in Python. I'm pretty new to Python, but have a little experience with the command line. However, I've never done something this complicated.
In reality, I have ~60 folders, each containing 50-200 files, and each file is ~75 MB, to give you a feel for the scale.
I've done something similar in R before, but it's slow and not really meant for this. Here's what I've tried for a shell script, edited to fit this toy example:
#!/bin/bash
for line in Filelist.txt
do
cp -n line C:/DirectoryTo/
done

If you have only one directory level in your DirectoryFrom, then you can use:
cp -n DirectoryFrom/*/* DirectoryTo
Explanation: this copies every file that exists in a subdirectory of DirectoryFrom into DirectoryTo if it doesn't already exist there.
The -n flag means files that already exist in the destination are not overwritten.
cp will also skip any directories found inside the subdirectories of DirectoryFrom, since -r is not used.

# Create a test environment:
mkdir C:/DirectoryTo
mkdir C:/DirectoryFrom
cd C:/DirectoryFrom
mkdir Dir1 Dir2 Dir3
(
cat << EOF
Dir1/File1.txt
Dir1/File2.txt
Dir1/File3.txt
Dir2/File4.txt
Dir2/File5.txt
Dir2/File6.txt
Dir3/File1.txt
Dir3/File5.txt
Dir3/File7.txt
EOF
) | while read f
do
    echo "$f : $(date)"
    echo "$f : $(date)" > "$f"
    sleep 1
done
# Create the Filelist.txt file:
(
cat << EOF
C:/DirectoryFrom/Dir1
C:/DirectoryFrom/Dir2
C:/DirectoryFrom/Dir3
EOF
) > Filelist.txt
# Generate the list of all file names:
cd C:/DirectoryFrom
cat Filelist.txt | while read f; do ls -1 $f; done | sort -u > filenames.txt
cat filenames.txt
# List all file paths, sorted by modification time (oldest first):
cd C:/DirectoryFrom
ls -1tr */* > all_filespath_sorted.txt
cat all_filespath_sorted.txt
# Select the files to be copied (the newest copy of each name):
cat filenames.txt | while read f; do cat all_filespath_sorted.txt | grep $f | tail -1 ; done
# Copy the selected files:
cat filenames.txt | while read f; do cat all_filespath_sorted.txt | grep $f | tail -1 ; done | while read c
do
echo $c
cp -p $c C:/DirectoryTo
done
# Verify:
cd C:/DirectoryTo
ls -ltr
# or
ls -1 | while read f; do echo -e "\n$f\n-------"; cat $f; done
#------------------------------------------------
# Another solution, for a limited number of files:
#------------------------------------------------
# To list the files in time order:
find `cat Filelist.txt | xargs` -type f | xargs ls -1tr
# To copy the files; newer copies will overwrite older ones:
find `cat Filelist.txt | xargs` -type f | xargs ls -1tr | while read c
do
echo $c
cp -p $c C:/DirectoryTo
done
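For completeness, the loop from the question can also be fixed to read Filelist.txt directly; a minimal sketch (it assumes, as the question states, that the newest directories are listed first, so cp -n keeps the newest copy of each duplicate name):
#!/bin/bash
# Read each directory path from Filelist.txt and copy its files.
# cp -n never overwrites an existing file, so because the newest
# directories are listed first, the newest copy of each name is kept.
while read -r dir
do
    cp -n "$dir"/* C:/DirectoryTo/
done < Filelist.txt
Unlike for line in Filelist.txt, the while read loop actually iterates over the lines of the file, and quoting "$dir" keeps paths containing spaces intact.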

Related

Processing every file from a list of folders

I have a folder structure like
base
|
|---rbbc_23434
| |------rbbp_34954
| | |___this.json
|
|---rbbc_222334
| |------rbbp_39884954
| | |___this.json
|
etc
And I want to process each this.json. Notice that the letters after rbbp are random
I have the following
#! /bin/bash
search_dir=/path/to/base/
bf="$(basename -- $search_dir)"
for entry in "$search_dir"*/
do
#echo "$entry"
f="$(basename -- $entry)"
echo "$f"
if [[ "$f" == "rbb"* ]]
then
echo "$entry"
ls "$entry""rbbp"*"/this.json"
#echo "$entry""rdgp"*"/this2.json"
#python3 something.py --input "$entry""rbbp"*"/this.json"
fi
done
With ls I can locate the this.json files in all folders, but these wildcards do not seem to work when specifying a file as input to a Python script, or even with echo.
How can I specify this this.json file as a path to the something.py script?
You don't need the nested loops and if statements; just use a wildcard that matches all the directories in the path.
search_dir=/path/to/base
for file in "$search_dir"/rbb*/rbbp*/this.json
do
python3 something.py --input "$file"
done
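If some of the rbb* directories might not contain a matching this.json, the unexpanded glob would be passed to the script literally. A hedged variant that guards against that (assuming bash) is:
shopt -s nullglob   # an unmatched glob expands to nothing instead of itself
search_dir=/path/to/base
for file in "$search_dir"/rbb*/rbbp*/this.json
do
    python3 something.py --input "$file"
done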

Merge two directories keeping larger files

Consider for example
mkdir dir1
mkdir dir2
cd dir1
echo "This file contains something" > a
touch b
echo "This file contains something" > c
echo "This file contains something" > d
touch e
cd ../dir2
touch a
echo "This file contains something" > b
echo "This file contains something" > c
echo "This file contains more data than the other file that has the same name but is in the other directory. BlaBlaBlaBlaBlaBlaBlaBlaBla BlaBlaBlaBlaBlaBlaBlaBlaBla BlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBla. bla!" > d
I would like to merge dir1 and dir2. If two files have the same name, only the larger one should be kept. Here is the expected content of the merged directory:
a # Comes from `dir1`
b # Comes from `dir2`
c # Comes from either `dir1` or `dir2`
d # Comes from `dir2`
e # Comes from `dir1` (is empty)
Assuming that no file name contains a newline:
find . -type f -printf '%s %p\n' \
    | sort -nr \
    | while read -r size file; do
        if ! [ -e "dest/${file#./*/}" ]; then
            cp "$file" "dest/${file#./*/}"
        fi
    done
The output of find is a list of "filesize path":
221 ./dir1/a
1002 ./dir1/b
11 ./dir2/a
Then we sort the list numerically, in descending order:
1002 ./dir1/b
221 ./dir1/a
11 ./dir2/a
And finally we reach the while read -r size file loop, where each file is copied to the destination dest/${file#./*/} if it doesn't already exist.
${file#./*/} expands to the value of the parameter file with the leading directory removed:
./abc/def/foo/bar.txt -> def/foo/bar.txt, which means you might need to create the directory def/foo in the dest directory:
| while read -r size file; do
    dest=dest/${file#./*/}
    destdir=${dest%/*}
    [ -e "$dest" ] && continue
    [ -e "$destdir" ] || mkdir -p -- "$destdir"
    cp -- "$file" "$dest"
done
I cannot comment on the other answer due to not having enough reputation, but I was getting a syntax error due to a missing fi. I also got an error because the target directory needed to be created before copying. So:
find . -type f -printf '%s %p\n' | sort -nr | while read -r size file; do if ! [ -e "dest/${file#./*/}" ]; then mkdir -p "$(dirname "dest/${file#./*/}")" && cp "$file" "dest/${file#./*/}"; fi; done
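To sanity-check the result on the toy example above (a sketch; it assumes the merge was run from the parent of dir1 and dir2 into a directory named dest):
ls -l dest                                    # every name should appear exactly once
cmp -s dest/d dir2/d && echo "d is the larger copy, from dir2"
cmp -s dest/b dir2/b && echo "b is the non-empty copy, from dir2"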

Bash - Concatenate files in a directory ordered by date

I need some help with a simple script I'm writing. The script takes as input a directory that contains files like:
FILENAME20160220.TXT
FILENAME20160221.TXT
FILENAME20160222.TXT
...
The script takes the directory as input and concatenates the files into a new file called:
FILENAME.20160220_20160222.TXT
The output filename needs to contain the "earliest"_"latest" dates it finds. The script I've written so far is below, but it doesn't produce the necessary output. Can someone help me tinker with it?
declare FILELISTING="FILELISTING.TXT"
declare SOURCEFOLDER="/Cat_test/cat_test/"
declare TEMPFOLDER="/Cat_Test/cat_test/temp/"
# Create temporary folder
cd $SOURCEFOLDER
mkdir $TEMPFOLDER
chk_abnd $?
# Move files into temporary folder
mv *.TXT $SOURCEFOLDER $TEMPFOLDER
chk_abnd $?
# Change directory to temporary folder
cd $TEMPFOLDER
chk_abnd $?
# Iterate through files in temp folder and create temporary listing files
for FILE in $TEMPFOLDER
do
echo $FILE >> $FILELISTING
done
# Iterate through the lines of FILELISTING and store dates into array for sorting
while read lines
do
array[$i] = "${$line:x:y}"
(( i++ ))
done <$FILELISTING
# Sort dates in array
for ((i = 0; i < $n ; i++ ))
do
for ((j = $i; j < $n; j++ ))
do
if [ $array[$i] -gt $array[$j] ]
then
t=${array[i]}
array[$i]=${array[$j]}
array[$j]=$t
fi
done
done
# Get first and last date of array and construct output filename
OT_FILE=FILENAME.${array[1]}_${array[-1]}.txt
# Sort files in folder
# Cat files into one
cat *.ACCT > "$OT_FILE.temp"
chk_abnd $?
# Remove Hex 1A
# tr '\x1A' '' < "$OT_FILE.temp" > $OT_FILE
# Cleanup - Remove File Listing
rm $FILE_LISTING
chk_abnd $?
rm $OT_FILE.temp
chk_abnd $?
Assuming that the base list of your files can be identified using FILENAME*.TXT, which is nice and simple, ls can be used to generate a list that is by default sorted in ascending alphabetical order and thus (because of the date format you've chosen) in ascending date order.
You can get the earliest and latest dates as follows (cut -c9-16 extracts characters 9 through 16, which is the YYYYMMDD field in names like FILENAME20160220.TXT):
$ earliest=$( ls -1 FILENAME*.TXT | head -1 | cut -c9-16 )
$ echo $earliest
20160220
$ latest=$( ls -1 FILENAME*.TXT | tail -1 | cut -c9-16 )
$ echo $latest
20160222
Therefore your file name can be produced using:
filename="FILENAME.${earliest}_${latest}.TXT"
And the concatenation should be as simple as:
cat $( ls -1 FILENAME*.TXT ) > ${filename}
though if you are writing to the same directory, you may wish to direct the output first to a temporary name that doesn't meet this pattern and then rename it. Perhaps something like:
earliest=$( ls -1 FILENAME*.TXT | head -1 | cut -c9-16 )
latest=$( ls -1 FILENAME*.TXT | tail -1 | cut -c9-16 )
filename="FILENAME.${earliest}_${latest}.TXT"
cat $( ls -1 FILENAME*.TXT ) > temp_${filename}
mv temp_${filename} ${filename}
Here are some hints; cat does most of the work.
If your filenames have fixed-size date fields, as in your example, lexical sorting is enough.
ls -1 FILENAME* > allfiles
aggname=$(cat allfiles | sed -rn '1s/([^0-9]*)/\1./p;$s/[^0-9]*//p' |
paste -sd-)
cat allfiles | xargs cat > $aggname
You can combine the last two steps into one, but it is more readable this way.
Don't reinvent the wheel.

counting the total numbers of files and directories in a provided folder including subdirectories and their files

I want to count all the files and directories in a provided folder, including the files and directories in its subdirectories. I have written a script which accurately counts the number of files and directories, but it does not handle subdirectories. Any ideas?
I want to do it without using the find command.
#!/bin/bash
givendir=$1
cd "$givendir" || exit
file=0
directories=0
for d in *;
do
if [ -d "$d" ]; then
directories=$((directories+1))
else
file=$((file+1))
fi
done
echo "Number of directories :" $directories
echo "Number of file Files :" $file
Use find:
echo "Number of directories: $(find "$1" -type d | wc -l)"
echo "Number of files/symlinks/sockets: $(find "$1" ! -type d | wc -l)"
Using plain shell and recursion:
#!/bin/bash
countdir() {
cd "$1"
dirs=1
files=0
for f in *
do
if [[ -d $f ]]
then
read subdirs subfiles <<< "$(countdir "$f")"
(( dirs += subdirs, files += subfiles ))
else
(( files++ ))
fi
done
echo "$dirs $files"
}
shopt -s dotglob nullglob
read dirs files <<< "$(countdir "$1")"
echo "There are $dirs dirs and $files files"
find "$1" -type f | wc -l will give you the files, find "$1" -type d | wc -l the directories
My quick-and-dirty shellscript would read
#!/bin/bash
test -d "$1" || exit
files=0
# Start with 1 to count the starting dir (as find does), else with 0
directories=1
function docount () {
for d in $1/*; do
if [ -d "$d" ]; then
directories=$((directories+1))
docount "$d";
else
files=$((files+1))
fi
done
}
docount "$1"
echo "Number of directories :" $directories
echo "Number of file Files :" $files
But mind it: on the build folder of one of my projects, there were some notable differences:
find: 6430 dirs, 74377 non-dirs
my script: 6032 dirs, 71564 non-dirs
@thatotherguy's script: 6794 dirs, 76862 non-dirs
I assume that has to do with the legions of links, hidden files etc., but I am too lazy to investigate: find is the tool of choice.
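A likely reason for the undercount is that the unquoted glob in the function above skips hidden entries and splits names containing spaces. A variant of the same script with quoting and dotglob enabled (a sketch, assuming bash) is:
#!/bin/bash
shopt -s dotglob nullglob        # include hidden entries; empty dirs expand to nothing
test -d "$1" || exit
files=0
# Start with 1 to count the starting dir (as find does)
directories=1
docount () {
    local d
    for d in "$1"/*; do
        if [ -d "$d" ]; then
            directories=$((directories+1))
            docount "$d"
        else
            files=$((files+1))
        fi
    done
}
docount "$1"
echo "Number of directories: $directories"
echo "Number of files: $files"
Note that [ -d ] is also true for symlinks that point to directories, so the counts can still differ slightly from find's.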
Here are some one-line commands that work without find:
Number of directories: ls -Rl ./ | grep ":$" | wc -l
Number of files: ls -Rl ./ | grep "[0-9]:[0-9]" | wc -l
Explanation:
ls -Rl lists all files and directories recursively, one line each.
grep ":$" finds just the results whose last character is ':'. These are all of the directory names.
grep "[0-9]:[0-9]" matches on the HH:MM part of the timestamp. The timestamp only shows up for files, not directories. If your timestamp format is different then you will need to pick a different grep.
wc -l counts the number of lines that matched from the grep.
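Put together as a tiny script (a sketch of the same two pipelines; it counts the current directory and inherits the caveats above):
#!/bin/bash
dirs=$(ls -Rl ./ | grep ":$" | wc -l)
files=$(ls -Rl ./ | grep "[0-9]:[0-9]" | wc -l)
echo "Number of directories: $dirs"
echo "Number of files: $files"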

How to segregate files based on recursive grep

I have a directory, sub-directories each containing some text files.
main-dir
|
sub-dir1
| file1 "foo"
|
sub-dir2
| file2 "bar"
|
sub-dir3
| file3 "foo"
The files file1 and file3 contain the same text. I want to segregate these sub-directories based on the content of their files: I would like to group sub-dir1 and sub-dir3, since the files in those sub-dirs have the same content. In this example, that means moving sub-dir1 and sub-dir3 to another directory.
Using grep in recursive mode lists all the files whose content matches, together with their subdirectories. How can I make use of that output?
Your solution could be simplified to this:
for dir in *; do
if grep "foo" "$dir/file1" >/dev/null; then
cp -rf "$dir" "$HOME_PATH/newdir/"
fi
done
but it will work only when every directory actually contains a file named file1.
Something like this:
grep -rl "foo" * | sed -r 's|(.*)/.*|\1|' | sort -u | while read dir; do
cp -rf "$dir" "$HOME_PATH/newdir/"
done
or like this:
grep -rl "foo" * | while read f; do
dirname "$f"
done | sort -u | while read dir; do
cp -rf "$dir" "$HOME_PATH/newdir/"
done
or like this:
find . -type f -exec grep -l "foo" {} \; | xargs -I {} dirname {} | sort -u |
while read dir; do
cp -rf "$dir" "$HOME_PATH/newdir/"
done
might be better.
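If the file names may contain spaces, a null-delimited variant of the same idea (a sketch; it still assumes no newlines in file names) would be:
grep -rlZ "foo" . | xargs -0 -n1 dirname | sort -u | while read -r dir; do
    cp -rf "$dir" "$HOME_PATH/newdir/"
done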
I managed to write this script, which solves my question:
PWD=$(pwd)
FILES=$PWD/*
for f in $FILES
do
str=$(cat $f/file1)
if [ "$str" == "foo" ];
then
cp -rf $f $HOME_PATH/newdir/
fi
done
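A quoted variant of the same idea (a sketch; like the original, it assumes every sub-directory contains a file1):
#!/bin/bash
# Compare the whole content of each sub-dir's file1 against "foo".
for d in */; do
    if [ -f "${d}file1" ] && [ "$(cat "${d}file1")" = "foo" ]; then
        cp -rf "$d" "$HOME_PATH/newdir/"
    fi
done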
