bash script to pattern match and delete files - bash

Problem: In a directory, I have files of the following form:
<account-number>-<invoice-number>, an example being:
123456-3456789
123456-6789023
123456-2568907
...
456789-2347890
456789-2344357
etc.
What I want to do is: if there is more than one invoice for the same account, delete all except the latest. If there's only one, leave it alone.
Thanks for any pointers.

You can use this awk based script:
mkdir _tmp
ls -rt *-*|awk -F'-' '{a[$1]=$0} END{for (i in a) system("mv " a[i] " _tmp/")}'
Once you're satisfied with the files in ./_tmp/, remove all files from the current directory and move the kept files back.

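Here is a pure bash solution (replace echo with rm once you have validated it):
for file1 in *-*
do
  # Take the account number with parameter expansion; a bare
  # "IFS=- arr=($file1)" would leave IFS changed for the rest of the script.
  account=${file1%%-*}
  for file2 in "$account"-*
  do
    # file2 is stale if another invoice of the same account is newer
    [ "$file1" -nt "$file2" ] && echo "$file2"
  done
done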
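The nested loop above can list the same file several times during the echo dry run, once for every newer sibling. A minimal sketch that prints each stale file only once follows. The function name list_stale_invoices and the demo file names are assumptions made for illustration, and "latest" is taken to mean newest modification time.

```bash
#!/usr/bin/env bash
# list_stale_invoices DIR (hypothetical helper): print every <account>-<invoice>
# file in DIR that has a newer sibling with the same account number.
# Nothing is deleted; the output is a dry-run list.
list_stale_invoices() {
    local dir="$1" file name account sibling
    for file in "$dir"/*-*; do
        [ -f "$file" ] || continue
        name=${file##*/}          # strip the directory part
        account=${name%%-*}       # text before the first "-"
        for sibling in "$dir/$account"-*; do
            if [ "$sibling" -nt "$file" ]; then
                printf '%s\n' "$file"
                break             # one newer sibling is enough
            fi
        done
    done
}

# Demo in a throwaway directory with explicit modification times.
demo=$(mktemp -d)
touch -t 202001010000 "$demo/123456-3456789"
touch -t 202002010000 "$demo/123456-6789023"
touch -t 202003010000 "$demo/456789-2347890"
list_stale_invoices "$demo"   # prints only the path of 123456-3456789
```

Once the list looks right, it can be fed to rm, for example with list_stale_invoices DIR | while IFS= read -r f; do rm -- "$f"; done.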

A nice one in Bash:
(pseudo "in place" processing)
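#!/bin/bash -e
ADIR="/path/to/account/directory"
TMP="$ADIR.tmp"
mkdir "$TMP" && rmdir "$TMP" && mv "$ADIR" "$TMP" && mkdir "$ADIR"
# ls runs again on every iteration; read takes only its first (newest) line
while IFS=- read -r ACCNT INVOICE < <( ls -t1 "$TMP" )
do
  # rm -f: no error (and no -e abort) when the account had only one invoice
  mv "$TMP/$ACCNT-$INVOICE" "$ADIR/$ACCNT-$INVOICE" && rm -f "$TMP/$ACCNT-"*
done
rmdir "$TMP"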
What it does:
1 First, move the accounts directory to a temporary directory (this is atomic).
2 In a loop: list the newest invoice, move it to the new directory, and delete the remaining invoices of the same account.
3 Remove the temporary directory.
PROs:
solid, safe, short, reasonably fast and halts on serious errors
CONs:
Destructive and irreversible, so always make sure you have a backup
Comment:
You may have noticed mkdir "$TMP" && rmdir "$TMP".
This is on purpose: rmdir gives the same return value for "dir does not exist" as for "dir not empty",
so instead of checking which of the two it is with
[ -d "$DIRNAME" ] && { rmdir "$DIRNAME" || exit; }
I used the above construction.
Also, the ls -t1 "$TMP" may seem to be in a strange place at first sight,
but it is OK: it is executed again on every iteration (and only the first line is read).

Related

Bash: splitting filename by space? A backup rollback script

I need some help with Bash. I am a Python/Rust guy and do not understand bash too well. I have a "backup" script which copies a selected file to a "$filename $datetime.backup" file. Now I need to write a rollback script which copies the latest backup file over the original (the name without the space, datetime and .backup suffix). Any guidance will be appreciated.
Backup script, for your convenience:
set -e
DT=$(date --iso=seconds)
for f in "$@"
do
OLD="${f%/}"
NEW="${f%/} $DT.backup"
cp --no-clobber --recursive "$OLD" "$NEW"
done
Use parameter expansion to get the original name back.
for b in *.backup ; do
original=${b% *}
cp "$b" "$original"
done
${b% *} removes the last space in $b and everything after it.
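A short illustration of that expansion, using a made-up backup name that itself contains a space (the file name is an assumption, not taken from the question):

```bash
#!/usr/bin/env bash
# ${var% *} deletes the shortest suffix matching " *": the last space and
# everything after it. Earlier spaces in the name are left alone.
b='my notes.txt 2024-05-01T10:00:00+02:00.backup'   # hypothetical backup name
original=${b% *}
printf '%s\n' "$original"    # prints: my notes.txt

# By contrast, %% deletes the longest matching suffix, starting at the first space.
printf '%s\n' "${b%% *}"     # prints: my
```

This is why the single % form is the right one here: the timestamp appended by the backup script contains no spaces, so the last space is always the separator.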
Solved! Yay!
set -e
# List only the backups of "$1" (name, space, timestamp, .backup); newest sorts last
LB=$(ls "$1 "*.backup | sort --reverse | head -n 1)
echo "Moving $1 to trash for safe keeping"
trash "$1"
echo "Copying from $LB"
cp --no-clobber --recursive "$LB" "$1"
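The latest backup can also be picked without parsing ls output. The sketch below assumes that backups follow the "$filename $datetime.backup" pattern from the question and that all timestamps are ISO formatted in the same time zone, so that name order equals time order. The helper name latest_backup is hypothetical.

```bash
#!/usr/bin/env bash
# latest_backup FILE (hypothetical helper): print the backup of FILE whose
# name sorts last, or return non-zero if there is none.
latest_backup() {
    local original="$1" candidate latest=""
    for candidate in "$original "*.backup; do
        [ -e "$candidate" ] || continue   # an unmatched glob stays literal; skip it
        latest=$candidate                 # glob results are sorted, so the last one wins
    done
    [ -n "$latest" ] && printf '%s\n' "$latest"
}

# Demo in a throwaway directory.
demo=$(mktemp -d)
touch "$demo/notes.txt" \
      "$demo/notes.txt 2024-01-01T00:00:00+00:00.backup" \
      "$demo/notes.txt 2024-06-01T00:00:00+00:00.backup"
( cd "$demo" && latest_backup notes.txt )
# prints: notes.txt 2024-06-01T00:00:00+00:00.backup
```

The quoted "$original " prefix keeps names with spaces intact, and the trailing space inside the quotes prevents a file such as notes.txt.old from matching.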

if then else statement will not loop properly

I figured how to get an if then else statement to work but it now seems to have broken. =( I cannot work out what is going wrong!
There are up to 10 directories in ./, called barcode01 to barcode09 plus one called unclassified. This script is supposed to go into each one and prepare the directory for ~/Taxonomy.R (which requires all the fastq files to be gzipped and put into a sub-directory titled "data"). It then runs the ~/Taxonomy.R script to make a metadata file for each.
Edit: the tmp.txt file is created using ls > tmp.txt followed by echo "0" >> tmp.txt, to make a sacrificial list of directories for the script to chew through until it reaches the 0.
#!/bin/bash
source deactivate
source activate R-Env
value=(sed -n 1p tmp.txt)
if [ "$value" = "0" ]
then
rm tmp.txt
else
cd "$(sed -n 1p tmp.txt)"
gzip *fastq
#
for i in *.gz
do
mv "$i" "${i%.*}_R1.fastq.gz";
done
#this adds the direction identifier "R1" to all the fastq.gzips
mkdir Data
mv *gz Data
~/Taxonomy3.R
cd Data
mv * ..
cd ..
rm -r Data
cd ..
sed '1d' tmp.txt > tmp2.txt
mv tmp2.txt tmp.txt
fi
Currently, it is only making the metadata file in the first barcode directory.
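If you indent your code, things will get a lot clearer. As posted, the script contains no loop: the if/else runs once per invocation, which is why only the first directory gets processed.
On the other hand, modifying your tmp.txt file this way is slow and dangerous. It is better to traverse its contents while only reading it.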
#!/bin/bash
source deactivate
source activate R-Env
for value in $(<tmp.txt)
do
  cd "$value" || continue   # skip entries that cannot be entered
  gzip *fastq
  for i in *.gz
  do
    # This adds the direction identifier "R1" to all the fastq.gzips
    mv "$i" "${i%.*}_R1.fastq.gz"
  done
  mkdir Data
  mv *gz Data
  ~/Taxonomy3.R
  mv Data/* .
  rmdir Data
  cd - > /dev/null          # back to the previous directory, without printing it
done
rm tmp.txt
With this reworked script you only need to create the tmp.txt file WITHOUT adding any marker at the end (in fact, you never needed it; you could have checked for an empty file).
The operations you wanted are executed for each folder listed in tmp.txt. I simplified the folder changes, keeping only the ones the R script requires to run properly. To go back, I used cd -, which returns to the previous folder; that way the entries in your tmp.txt file can be more than one level deep.
Hope everything else is clear.
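An alternative to pairing cd with cd - is to run each directory's work in a subshell, so the caller's working directory never changes. The following is a minimal sketch under that assumption; run_in_each_dir is a hypothetical helper, and touch visited merely stands in for the gzip and R steps.

```bash
#!/usr/bin/env bash
# run_in_each_dir CMD [ARGS...] (hypothetical helper): read directory names
# from stdin, one per line, and run CMD inside each of them.
run_in_each_dir() {
    local dir
    while IFS= read -r dir; do
        [ -d "$dir" ] || { printf 'skipping %s\n' "$dir" >&2; continue; }
        # The subshell confines the cd; /dev/null keeps CMD from eating the list.
        ( cd "$dir" && "$@" ) < /dev/null
    done
}

# Demo: two barcode directories listed in a tmp.txt-style file.
demo=$(mktemp -d)
mkdir "$demo/barcode01" "$demo/barcode02"
printf '%s\n' "$demo/barcode01" "$demo/barcode02" > "$demo/tmp.txt"
run_in_each_dir touch visited < "$demo/tmp.txt"
ls "$demo/barcode01" "$demo/barcode02"   # each directory now contains "visited"
```

Reading the list line by line also copes with directory names that contain spaces, which for value in $(<tmp.txt) would split.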

bash check for subdirectories under directory

This is my first day scripting. I use Linux, but I have been racking my brain over this script and am finally asking for help. I need to check a directory that already contains known directories to see whether any new, unexpected directories have been added.
OK, I think I have got this as simple as possible. The script below works, but it displays all files in the directory as well. I will keep working at it unless someone can tell me how not to list the files too. I tried ls -d, but then it always echoes "nothing new". I feel like an idiot and should have got this sooner.
#!/bin/bash
workingdirs=`ls ~/ | grep -viE "temp1|temp2|temp3"`
if [ -d "$workingdirs" ]
then
echo "nothing new"
else
echo "The following directories are now present"
echo ""
echo "$workingdirs"
fi
If you want to take some action when a new directory is created, use inotifywait. If you just want to check that the directories that exist are the ones you expect, you could do something like:
trap 'rm -f "$TMPDIR/manifest"' 0
# Create the expected values. Really, you should hand edit
# the manifest, but this is just for demonstration.
find "$Workingdir" -maxdepth 1 -type d > "$TMPDIR/manifest"
while true; do
  sleep 60 # Check every 60 seconds. Modify period as needed, or
           # (recommended) use inotifywait
  if ! find "$Workingdir" -maxdepth 1 -type d | cmp - "$TMPDIR/manifest"; then
    : Unexpected directories exist or have been removed
  fi
done
The shell script below reports, for each directory in a list, whether it is present or not.
#!/bin/bash
Workingdir=/root/working/
knowndir1=/root/working/temp1
knowndir2=/root/working/temp2
knowndir3=/root/working/temp3
my=/home/learning/perl
arr=("$Workingdir" "$knowndir1" "$knowndir2" "$knowndir3" "$my") #creating an array
for i in "${arr[@]}" #checking for each element in array
do
if [ -d "$i" ]
then
echo "directory $i present"
else
echo "directory $i not present"
fi
done
output:
directory /root/working/ not present
directory /root/working/temp1 not present
directory /root/working/temp2 not present
directory /root/working/temp3 not present
directory /home/learning/perl present
This will save the available directories in a list to a file. When you run the script a second time, it will report directories that have been deleted or added.
#!/bin/sh
dirlist="$HOME/dirlist" # dir list file for saving state between runs
topdir='/some/path' # the directory you want to keep track of
tmpfile=$(mktemp)
find "$topdir" -type d -print | sort -o "$tmpfile"
if [ -f "$dirlist" ] && ! cmp -s "$dirlist" "$tmpfile"; then
echo 'Directories added:'
comm -1 -3 "$dirlist" "$tmpfile"
echo 'Directories removed:'
comm -2 -3 "$dirlist" "$tmpfile"
else
echo 'No changes'
fi
mv "$tmpfile" "$dirlist"
The script will have problems with directories that have very exotic names (containing newlines).
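For the original question (known directories such as temp1 to temp3, report anything else), a one-shot check against a hand-written list may be enough. This is a minimal sketch under that assumption; unexpected_dirs and expected.txt are hypothetical names, and only immediate subdirectories are examined.

```bash
#!/usr/bin/env bash
# unexpected_dirs TOPDIR EXPECTED_FILE (hypothetical helper): print the name of
# every immediate subdirectory of TOPDIR that is not listed, one name per line,
# in EXPECTED_FILE. Hidden directories are not matched by the * glob.
unexpected_dirs() {
    local topdir="$1" expected="$2" path name
    for path in "$topdir"/*/; do
        [ -d "$path" ] || continue
        name=${path%/}        # drop the trailing slash added by the glob
        name=${name##*/}      # keep only the last path component
        # -x: whole-line match, -F: fixed string, -q: quiet
        grep -qxF -- "$name" "$expected" || printf '%s\n' "$name"
    done
}

# Demo: three expected directories, one surprise, one regular file.
demo=$(mktemp -d)
mkdir "$demo/temp1" "$demo/temp2" "$demo/temp3" "$demo/surprise"
touch "$demo/a-regular-file"
printf '%s\n' temp1 temp2 temp3 > "$demo/expected.txt"
unexpected_dirs "$demo" "$demo/expected.txt"   # prints: surprise
```

The trailing slash in the glob restricts matches to directories, which addresses the complaint that plain ls also listed files.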

Mac OS X Terminal batch rename...but with folder paths

I tried incorrectly to add my question on to a very similar thread w/ good solutions here:
mac os x terminal batch rename
I have essentially the same question, but I'm wanting to do this and change the folder path when renaming. Here is what I asked:
Would any of these solutions work to change underscores to a folder path? For example, I have mbox files on one level that need to be nested, such as:
TopLevel_NextLevel_mbox
TopLevel_NextLevel_FinalLevel_mbox
I'd like to automatically put these in a hierarchy like so:
TopLevel/NextLevel/mbox
TopLevel/NextLevel/FinalLevel/mbox
Can this be done? When I try simple replacement with "/", I get this:
fred$ for f in *_mbox; do mv "$f" "${f/_//}"; done
mv: rename TopLevel_NextLevel_mbox to TopLevel/NextLevel_mbox: No such file or directory
Looks like it just tries to sub in the "/", but then gets confused because there is no current folder TopLevel w/ NextLevel_mbox inside it...
Thanks,
Fred
The basics of making this work start with creating an array from the names of the directories in the current directory that match *mbox. Each array element holds one of the words found between the underscores:
TopLevel_NextLevel_mbox
Is transformed into an array like this:
( TopLevel, NextLevel, mbox )
From there we create the first directory, TopLevel, then cd into it and run mkdir for the next element, repeating the process until there are no more elements. This way each array element creates a new nested directory (as a bonus, the script also copies any data from the original directory into the new one whilst keeping its structure).
Create Nested Folders from Original
#!/bin/bash
DIR=$PWD
for f in *mbox
do cd "$DIR"
  if [[ -d $f ]]; then
    ARR=(${f//_/ }); n=0   # split the name on underscores
    for i in "${ARR[@]}"
    do
      if [[ $n -eq 0 ]]; then
        # first component: create it and copy the original contents in
        mkdir -p "$i" && cp -R "$f"/* "$i" && cd "$_"
      else
        mkdir -p "$i" && cd "$_"
      fi
      let n++
    done
  fi
done
Pseudo One-liner
DIR=$PWD; for f in *mbox; do cd "$DIR"; if [[ -d $f ]]; then ARR=(${f//_/ }); n=0; for i in "${ARR[@]}"; do if [[ $n -eq 0 ]]; then mkdir -p "$i" && cp -R "$f"/* "$i" && cd "$_"; else mkdir -p "$i" && cd "$_"; fi; let n++; done; fi; done
This is exactly the same script as the one above, formatted on one line. Note that the script leaves the original directories intact (I'll leave the exercise of removing them up to the OP).
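The mv attempt in the question fails for two reasons: ${f/_//} replaces only the first underscore, and the parent directories do not exist yet. A minimal sketch that addresses both is shown below. It moves the items rather than copying them, the helper name nest_by_underscores is hypothetical, and it expects bare names in the current directory.

```bash
#!/usr/bin/env bash
# nest_by_underscores NAME (hypothetical helper): move NAME to the path obtained
# by turning every underscore into a slash, creating parent directories first.
nest_by_underscores() {
    local src="$1" dest
    dest=${src//_//}                  # // replaces all matches; a single / only the first
    [[ $dest == */* ]] || return 0    # no underscore, nothing to nest
    mkdir -p -- "${dest%/*}" && mv -- "$src" "$dest"
}

# Demo in a throwaway directory, using the names from the question.
demo=$(mktemp -d)
(
    cd "$demo" || exit 1
    touch TopLevel_NextLevel_mbox TopLevel_NextLevel_FinalLevel_mbox
    for f in *_mbox; do nest_by_underscores "$f"; done
    find . -type f   # lists the two nested mbox files
)
```

Because mv is used, no cleanup of the flat originals is needed afterwards; prefix the mkdir and mv with echo for a dry run.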

Removing old directories with logs

My IM stores the logs according to the contact name. I have created a file with the list of active contacts. My problem is the following:
I would like to create a bash script which reads the active contact names from the file and compares them with the directories. If a directory name is not found on the list, it should be moved to another directory (let's call it "archive"). Let me try to visualise it for you.
content of the list:
contact1
contact2
content of the dir
contact1
contact2
contact3
contact4
after running the script, the content of the dir:
contact1
contact2
contact3 ==> ../archive
contact4 ==> ../archive
You could use something like this:
mv $(ls | grep -v -x -F -f ../file.txt) ../archive
Where ../file.txt contains the names of the directories that should not be moved. It is assumed here that the current directory only contains directories, if that is not the case, ls should be replaced with something else. Note that the command fails if there are no directories that should be moved.
Since in the comments to the other answer you state that directories with whitespace in the name can occur, you could replace this by:
for i in *
do
printf '%s\n' "$i" | grep -v -x -q -F -f ../file.txt && mv "$i" ../archive
done
This is an improved version of marcog's answer. Note that the associative array requires Bash 4.
#!/bin/bash
sourcedir=/path/to/foo
destdir=/path/to/archive
contactfile=/path/to/list
declare -A contacts
while read -r contact
do
contacts[$contact]=1
done < "$contactfile"
for contact in "$sourcedir"/*
do
if [[ -f $contact ]]
then
index=${contact##*/}
if [[ ! ${contacts[$index]} ]]
then
mv "$contact" "$destdir"
fi
fi
done
Edit:
If you're moving directories instead of files, then change the for loop above to look like this:
for contact in "$sourcedir"/*/
do
index=${contact/%\/}
index=${index##*/}
if [[ ! ${contacts[$index]} ]]
then
mv "$contact" "$destdir"
fi
done
There might be a more concise solution, but this works. I'd strongly recommend prefixing the mv with echo to test it out first, otherwise you could end up with a serious mess if it doesn't do what you want.
declare -A contacts
for contact in "$@"
do
contacts[$contact]=1
done
ls a | while read contact
do
if [[ ! ${contacts[$contact]} ]]
then
mv "a/$contact" ../archive
fi
done

Resources