Parse CSV to find names corresponding to codes, then copy folders with matching codes into the folders with the corresponding names - bash

I'm trying to automate the packaging of files and contents from various sources using a bash script.
I have a main directory which contains pdf files, a csv file, and various folders with additional contents. The folders are named with the location code they pertain to, e.g. 190, 191, etc.
A typical row in my csv file looks like this: form_letters_Part1.pdf,PX_A31_smith.adam.pdf,190,
Where the first column is the original pdf name, the second is what it will be renamed to, and the third column is the location code the person belongs to.
The first part of my script renames the pdf files from the cover letters format to the PX_A31... format, and then creates a directory for each file and moves them into it.
#!/usr/bin/tcsh bash
sed 's/"//g' rename_list_lab.csv | while IFS=, read orig new num; do
mv "$orig" "$new"
done
echo 'Rename Done.'
for file in *.pdf; do
mkdir "${file%.*}"
mv "$file" "${file%.*}"
done
echo 'Directory creation done.'
What needs to happen next is the folders with the location-specific contents get copied into those new directories just created, corresponding to the location code from the csv file.
So I tried this after the above echo 'Directory Creation Done.' line:
echo 'Directory Creation Done.'
sed 's/"//g' rename_list.csv | while IFS=, read orig new num; do
for folder in *; do
if [[ -d .* = "$num" ]]; then
cp -R "$folder" "${file%.*}"
fi
done
echo 'Code Folder Contents Sort Done.'
However this results in a syntax error:
syntax error in conditional expression
syntax error near `='
` if [[ -d .* = "$num" ]]; then'
EDIT: To clarify the if statement in the second part, the intended logic is as follows: for each item in the current directory, if it is a directory and its name matches the location code from the csv, that directory should be copied into any directories which correspond to that same location code in the csv.
In other words, if the newly created directory from the first part is PX_A31_smith.adam whose location code in the csv line above is 190, then the folder called 190 should be copied into the directory PX_A31_smith.adam.
If three other people also have the 190 code in the csv, the 190 directory should also be copied to those as well.
EDIT 2: I resolved the syntax error, and also realized I had an unterminated do statement. After fixing those, I still seem to be having trouble with the evaluation of the if statement. Updated script below:
#!/usr/bin/tcsh bash
sed 's/"//g' rename_list.csv | while IFS=, read orig new num; do
mv "$orig" "$new"
done
echo '1 Done.'
for file in *.pdf; do
mkdir "${file%.*}"
mv "$file" "${file%.*}"
done
echo '2 done.'
sed 's/"//g' rename_list.csv | while IFS=, read orig new num; do
for folder in * ; do
if [[ .* = "$num" ]]; then
cp -R "$folder" "${file%.*}"
else echo "No matches found."
fi
done
done
echo '3 Done.'

I'm not really sure if this answers your question, but I think it will at least set you on the right track. Structurally, I just combined all of the loops into one. This removes some of the possible logic errors that would not show up as syntax errors, like the use of $file in the second part: that variable is local to the loop in the first part and no longer exists in the second, so it would be interpreted as an empty string.
#!/usr/bin/bash
#^Fixed shebang line.
sed 's/"//g' rename_list.csv | while IFS=, read -r orig new num; do
    if [[ -f $orig ]]; then #If the file we want to rename is indeed a file.
        mkdir "${new%.*}" #make the directory from the file name you want
        mv "$orig" "${new%.*}/$new" #Rename when we move the file into the new directory
        if [[ -d $num ]]; then #If the number directory exists
            cp -R "$num" "${new%.*}" #Fixed this based on your edit.
        else
            #Here you can handle what to do if the number directory does not exist.
            echo "$num is not a directory."
        fi
    else
        #Here you can handle what to do if the file does not exist.
        echo "The file $orig does not exist."
    fi
done
Edited based on your clarification
Note: This is pretty lacking as far as error checking goes. Remember, any of these commands could fail, which would lead to unwanted behavior. You can check if [[ $? != 0 ]] to test the exit status (0 being success) of the last issued command, or do something like mkdir somedir || exit 2 to exit on failure.
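For example, a minimal sketch of that kind of checking (the file and directory names here are placeholders, not taken from the asker's CSV):
#!/usr/bin/bash
# Hypothetical error-handling sketch; adapt the names to the real script.
mkdir "somedir" || exit 2           # stop right away if the directory cannot be created
mv "orig.pdf" "somedir/new.pdf"
if [[ $? != 0 ]]; then              # check the exit status (0 means success) of the last command
    echo "mv failed" >&2
    exit 3
fi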

Related

Execute rm with string on file and delete line

I have a file (log.txt) with multiple lines.
Uploaded 1Y3JxCDpjsId_f8C7YAGAjvHHk-y-QVQM at 1.9 MB/s, total 3.9 MB
Uploaded 14v58hwKP457ZF32rwIaUFH216yrp9fAB at 317.3 KB/s, total 2.1 MB
Each line in log.txt represents a file that needs to be deleted.
I want to delete the file and then delete the respective line.
Example:
rm 1Y3JxCDpjsId_f8C7YAGAjvHHk-y-QVQM
and after deleting the file, remove its line from log.txt, leaving only the others.
Uploaded 14v58hwKP457ZF32rwIaUFH216yrp9fAB at 317.3 KB/s, total 2.1 MB
Try this:
#!/bin/bash
logfile="log.txt"
logfilecopy=$( mktemp )
cp "$logfile" "$logfilecopy"
while IFS= read -r line
do
    filename=$( echo "$line" | sed 's/Uploaded \(.*\) at .*/\1/' )
    if [[ -f "$filename" ]]
    then
        tempfile=$( mktemp )
        rm -f "$filename" && grep -v "$line" "$logfile" >"$tempfile" && mv "$tempfile" "$logfile"
    fi
done < "$logfilecopy"
# Cleanup
rm -f "$logfilecopy"
It does:
keep a copy of the original log file.
read each line of this copy using while and read.
for each line, extract the filename. Note this is done with sed since a filename could contain spaces, so cut would not work as required.
If the file exists, delete it, filter its line out of the log file into a temporary file, then move the temporary file over the log file.
that last step is done with && between commands to ensure each command succeeds before the next one runs. If the rm fails, the log entry must not be deleted.
finally delete the original log file copy.
you can add echo statements and/or -x to #!/bin/bash to debug if required.
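For instance, a small illustration of those two options, reusing the variable names from the loop above:
#!/bin/bash -x
# -x on the shebang traces every command as it runs; alternatively,
# turn tracing on and off around just the section being debugged:
set -x
rm -f "$filename" && grep -v "$line" "$logfile" >"$tempfile" && mv "$tempfile" "$logfile"
set +x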
The following code reads log.txt line by line, captures the filename with a bash ERE and tries to delete that file. When the regex or the deletion fails it outputs the original line.
#!/bin/bash
tmpfile=$( mktemp ) || exit 1
while IFS='' read -r line
do
    [[ $line =~ ^Uploaded\ (.*)\ at ]] &&
        rm -- "${BASH_REMATCH[1]}" ||
        echo "$line"
done < log.txt > "$tmpfile" &&
mv "$tmpfile" log.txt
remark: the while loop's final result is true unless there's a problem reading log.txt or generating "$tmpfile", so chaining the mv with && ensures you won't overwrite the original logfile wrongly.
Another approach using bash4+ and GNU tools.
#!/usr/bin/env bash
##: Save the file names in an array named files using mapfile aka readarray.
##: Uses process substitution and GNU grep, which supports the -P flag.
mapfile -t files < <(grep -Po '(?<=Uploaded ).*(?= at)' log.txt)
##: Loop through the files ("${files[@]}") and check if each one exists (-e).
##: If it does, save it in an array named existing_file.
##: Add an additional test if need be, see "help test".
for f in "${files[@]}"; do
    [[ -e $f ]] && existing_file+=("$f")
done
##: Format the array existing_file into a syntax that is accepted
##: by GNU sed, e.g. "/file1|file2|file3|file4/d" and save it
##: in a variable named to_delete.
to_delete=$(IFS='|'; printf '%s' "/${existing_file[*]}/d")
##: delete/remove the existing files.
##: Not sure if ARG_MAX will come up.
echo rm -v -- "${existing_file[@]}"
##: Remove the deleted files (lines that contains the file name)
##: from log.txt using GNU sed.
echo sed -E -i "$to_delete" log.txt
Remove all the echo if you're satisfied with the output.
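With the echo prefixes removed, those last two steps run the commands directly:
rm -v -- "${existing_file[@]}"
sed -E -i "$to_delete" log.txt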
This is not exactly what you asked for and it is not perfect, but it might be what you need.

Check if files exist when some filenames contain [ in Bash

I've got a set of files, let's say
file1.txt
File2.txt
File [3].txt
file 4.txt
In my script, I store the path of each file in a var called $file.
Here is my issue:
In bash, testing for its existence with the following command
[[ ! -f "$file" ]]
WILL WORK (= the system sees that the file exists) for regular files like
file1.txt
File2.txt
file 4.txt BUT WILL NOT WORK (= the system doesn't find the file, as if it did not exist) for files containing [ ] in the name, like File [3].txt does.
I assume it is because the [ ] interferes with the double [[. Testing with
test ! -f "$file"
behaves the same; the system does not see it and reports a missing file.
What can I do to escape the [ or to avoid this behaviour? I've tried to find a solution on the net, but searching for "check if file exists with filename containing [" is biased, since [ / [[ is itself used to check existence.
Thanks for your help !
EDIT - 2022-01-15
Here is the loop I'm using
while read -r file; do
if [[ ! -f "$file" ]]; then
echo "Missing file $file"
fi
done < Compil.all ;
where Compil.all is a text file containing the file paths:
$cat Compil.all
/media/veracrypt1/file1.txt
/media/veracrypt1/File2.txt
/media/veracrypt1/File [3].txt
/media/veracrypt1/file 4.txt
$
As I don't want issues with spaces in filenames, I've put the following code at the beginning of the script. Could it be the reason?
IFS=$(echo -en "\n\b")
How are you storing the file var?
Simply iterating works as shown below:
$ ls
file1.txt File2.txt 'File [3].txt' 'file 4.txt'
$ for file in ./* ;do if [[ -f "$file" ]];then echo $file; fi; done
./file1.txt
./File2.txt
./File [3].txt
./file 4.txt
This also works:
$ [[ ! -f "File [3].txt" ]]
$ echo $?
1
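The same holds when the name comes from a variable (a quick check against the files listed above):
$ file="File [3].txt"
$ [[ -f "$file" ]] && echo "exists"
exists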

How to identify files which are not in list using bash?

Unfortunately my bash knowledge is not very good and I have a very non-standard task.
I have a file with a list of files. Example:
/tmp/my/file1.txt
/tmp/my/file2.txt
How can I write a script which checks that the files from the folder /tmp/my exist, and prints two types of messages after the script is done?
1 - The files that exist, showing the files:
/tmp/my/file1.txt
/tmp/my/file2.txt
2 - The files and folders in /tmp/my which are not in your list:
/tmp/my/test
/tmp/my/1.txt
You speak of files and folders, which seems unclear.
Anyway, I wanted to try it with arrays, so here we go:
unset valid_paths; declare -a valid_paths
unset invalid_paths; declare -a invalid_paths
while read -r line
do
    if [ -e "$line" ]
    then
        valid_paths=("${valid_paths[@]}" "$line")
    else
        invalid_paths=("${invalid_paths[@]}" "$line")
    fi
done < files.txt
echo "VALID PATHS:"; echo "${valid_paths[@]}"
echo "INVALID PATHS:"; echo "${invalid_paths[@]}"
You can check for the files' existence (assuming a list of files, one filename per line) and print the existing ones with a prefix using this
# Part 1 - check list contents for files
while read thefile; do
    if [[ -n "$thefile" ]] && [[ -f "/tmp/my/$thefile" ]]; then
        echo "Y: $thefile"
    else
        echo "N: $thefile"
    fi
done < filelist.txt | sort

# Part 2 - check existing files against list
for filepath in /tmp/my/* ; do
    filename="$(basename "$filepath")"
    grep "$filename" filelist.txt -q || echo "U: $filename"
done
The files that exist are prefixed here with Y:, all others are prefixed with N:
In the second section, files in the tmp directory that are not in the file list are labelled with U: (unaccounted for/unexpected)
You can swap the -f test, which checks that a path exists and is a regular file, for -d (exists and is a directory) or -e (exists).
See
man test
for more options.
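For instance, a minimal variant of Part 1 that counts anything that exists rather than only regular files (only the test changes):
# Part 1 variant - accept any existing path, not just regular files
while read thefile; do
    if [[ -n "$thefile" ]] && [[ -e "/tmp/my/$thefile" ]]; then
        echo "Y: $thefile"
    else
        echo "N: $thefile"
    fi
done < filelist.txt | sort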

Bash script with a loop not executing a utility that has parameters passed in?

Anyone able to help me out? I have a shell script I am working on, but in the loop below the command after echo "First file is $firstbd" is not being executed: the $PROBIN/proutil line. Not sure why this is...
Basically I have a list of files in a directory (*.list); I grab each one, read its first line and pass it as a parameter to the utility, then move the .list file and the files it lists to another directory (the .list contains a list of files with full paths).
for i in $(ls $STAGEDIR/*.list); do
echo "Working with $i"
# grab first .bd file
firstbd=`head -1 $i`
echo "First file is $firstbd"
$PROBIN/proutil $DBENV/$DBNAME -C load $firstbd tenant $TENANT -dumplist $STAGEDIR/$i.list >> $WRKDIR/$i.load.log
#move the list and its content to finished folder
binlist=`cat $i`
for movethis in $binlist; do
echo "Moving file $movethis to $STAGEDIR/finished"
mv $movethis $STAGEDIR/finished/
done
echo "Finished working with list $i"
echo "Moving it to $STAGEDIR/finished"
mv $i $STAGEDIR/finished/
done
The error I was getting is..
./tableload.sh: line 107: /usr4/dlc/bin/proutil /usr4/testdbs/xxxx2 -C load /usr4/dumpdir/xxxxx.bd tenant xxxxx -dumplist /usr4/dumpdir/PUB.xxxxx.list >> /usr4/dumpdir/PUB.xxxx.list.load.log: A file or directory in the path name does not exist... however if I run "/usr4/dlc/bin/proutil"
The fix was to remove ">> $WRKDIR/$i.load.log"; the binary utility wouldn't run when trying to send its output to a file. Strange...
A couple of really bad practices here:
parsing the output of ls
not quoting variables
iterating the lines of a file with cat and for
As shelter comments, you don't check that you've created all the directories in the path for your log file.
A rewrite:
for i in "$STAGEDIR"/*.list; do
    echo "Working with $i"
    # grab first .bd file
    firstbd=$(head -1 "$i")
    echo "First file is $firstbd"
    # ensure the output directory exists
    logfile="$WRKDIR/$i.load.log"
    mkdir -p "$(dirname "$logfile")"
    "$PROBIN"/proutil "$DBENV/$DBNAME" -C load "$firstbd" tenant "$TENANT" -dumplist "$STAGEDIR/$i.list" >> "$logfile"
    # move the list and its content to finished folder
    while IFS= read -r movethis; do
        echo "Moving file $movethis to $STAGEDIR/finished"
        mv "$movethis" "$STAGEDIR"/finished/
    done < "$i"
    echo "Finished working with list $i"
    echo "Moving it to $STAGEDIR/finished"
    mv "$i" "$STAGEDIR"/finished/
done

Moving a file and adding the date to the filename

#!/bin/bash
while read server <&3; do #read server names into the while loop
    if [[ ! $server =~ [^[:space:]] ]] ; then #empty line exception
        continue
    fi
    echo "Connecting to - $server"
    #ssh "$server" #SSH login
    while read updatedfile <&3 && read oldfile <&4; do
        echo Comparing $updatedfile with $oldfile
        if diff "$updatedfile" "$oldfile" >/dev/null ; then
            echo The files compared are the same. No changes were made.
        else
            echo The files compared are different.
            # copy the new file and put it in the right location
            # make a back up of the old file and put in right location (time stamp)
            # rm the old file (not the back up)
            #cp -f -v $newfile
            # ****************************
            mv $oldfile /home/u0146121/backupfiles/$oldfile_$(date +%F-%T)
            # ****************************
        fi
    done 3</home/u0146121/test/newfiles.txt 4</home/u0146121/test/oldfiles.txt
done 3</home/u0146121/test/servers.txt
The line between the asterisks is where I am having trouble with my script. It should output the file with both the filename and the date in the new name, but it just uses the date. I want it to use both.
Variable names may contain underscores, so you can't have underscores immediately after bare variable names. In your case you're actually using an (undefined) variable $oldfile_ in the destination file name, so that the new name is constructed as "empty string + date". Put the variable name between curly brackets
mv $oldfile /home/u0146121/backupfiles/${oldfile}_$(date +%F-%T)
and the renaming should work as expected.
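A small illustration of the difference, with a hypothetical file name:
oldfile="report.txt"
echo "$oldfile_$(date +%F-%T)"    # $oldfile_ is undefined, so only the date is printed
echo "${oldfile}_$(date +%F-%T)"  # prints report.txt_ followed by the date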
