Use inotifywait to change filename and further loop through sql loader - bash

Objective: The moment multiple .csv files are uploaded to the folder, the code should check each filename. If the filename is appropriate, the file should then be used by sqlloader to get its data uploaded into the database. Once a file is uploaded, the code should delete the processed file. Next time, the same process repeats.
I have some parts of the code working but some are creating problem, especially related to inotifywait. Please help.
In first loop, I am trying to monitor the /uploads folder, the moment it finds the .csv file, it checks if the filename has space. If yes, it wants to change the space to underscore in the filename. I have been trying to find a way to find "space, () or ," in the filename but only could do the 'space' part change. This is giving me an error that file cannot be moved, no such file or directory.
Second loop works separately but not when incorporated with first loop as there are errors which I have not been able to debug. If I run second loop separately, it is working correctly. But if there is a way to optimize the code better in one loop, I would be happy to know. Thanks!
Example: folder name: /../../upload
filenames: abc_123.csv (code should not make any change) , pqr(12 Apr).csv (code should change it to pqr_12_Apr.csv), May 12.csv (code should change it to May_12.csv) etc.
Once these 3 files have proper naming, it should be ready to be uploaded through sql loader and once files are processed, they get deleted.
My code is:
#!bin/bash
inotifywait -mqe create /../../upload | while read file; do
if [[ $file = '* *'.csv]]; then
mv "$file" ${file// /_}
fi
done
for file in /../..upload/*.csv
do
sqlcommand="sqlldr user/pwd control="/../xxx.ctl" data=$file silent=feedback, header"
$sqlcommand
rm $file
done
Thank you!

I have modified your script to this,
#!/usr/bin/env bash
while IFS= read -r file; do
filename=${file#* CREATE }
pathname=${file%/*}
if [[ $pathname/$filename = *\ *.csv ]]; then
echo mv -v "$pathname/$filename" "$pathname/${filename// /_}"
fi
done < <(inotifywait -mqe create /../../upload)
Remove the echo if you think the output is correct.
I just don't know how you can integrate the other parts of your script with that, probably create a separate script or remove the -m (which you don't want to do most probably). Well you could use a named pipe if mkfifo is available.
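One way to fold both loops into a single one is sketched here, under a few assumptions. It watches for close_write instead of create, because a create event fires as soon as the file appears, before its contents have been written. The helper names load_and_remove and watch_uploads are made up for this sketch, the loader command is passed in as arguments, and the loader is assumed to exit non-zero when it fails. Files that arrive by being renamed into the folder would need moved_to as well.

```shell
#!/usr/bin/env bash

# Hypothetical helper: run the loader on one CSV and delete the file only
# if the loader reports success.
# Usage: load_and_remove FILE LOADER [LOADER_ARGS...]
load_and_remove() {
    local csv=$1
    shift
    if "$@" "data=$csv"; then
        rm -- "$csv"
    else
        echo "loader failed, keeping $csv" >&2
        return 1
    fi
}

# Hypothetical helper: watch DIR forever; rename and load each finished .csv.
# Usage: watch_uploads DIR LOADER [LOADER_ARGS...]
watch_uploads() {
    local dir=$1 name new
    shift
    inotifywait -mqe close_write --format '%f' "$dir" |
    while IFS= read -r name; do
        [[ $name == *.csv ]] || continue
        new=${name// /_}
        if [[ $new != "$name" ]]; then
            mv -- "$dir/$name" "$dir/$new"
        fi
        load_and_remove "$dir/$new" "$@"
    done
}

# Example call, using the placeholders from the question:
# watch_uploads /../../upload sqlldr user/pwd control=/../xxx.ctl
```

Because the rename is done with mv, it produces move events rather than another close_write, so the loop does not trigger itself.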
EDIT: as per OP's message, add another parameter expansion to strip the parentheses from the name.
Add the line below right after the if [[ ... ]]; then
newfilename=${filename//[()]}
Then change "${filename// /_}" to "${newfilename// /_}"
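To get exactly the names from the question (pqr(12 Apr).csv becoming pqr_12_Apr.csv), the spaces, parentheses and commas can all be handled in one place. This is only a sketch: sanitize_name is a made-up helper, and it also collapses double underscores that were already in a name.

```shell
#!/usr/bin/env bash

# Hypothetical helper: turn every space, parenthesis and comma in the stem
# into an underscore, collapse repeats, and trim underscores at both ends.
sanitize_name() {
    local name=$1 ext=
    # Split off the extension so the dot before it is left alone
    if [[ $name == *.* ]]; then
        ext=.${name##*.}
        name=${name%.*}
    fi
    name=${name//[ (),]/_}
    while [[ $name == *__* ]]; do
        name=${name//__/_}
    done
    name=${name#_}
    name=${name%_}
    printf '%s%s\n' "$name" "$ext"
}

for sample in 'abc_123.csv' 'pqr(12 Apr).csv' 'May 12.csv'; do
    printf '%s -> %s\n' "$sample" "$(sanitize_name "$sample")"
done
```

Inside the loop above, the rename would then be mv -- "$pathname/$filename" "$pathname/$(sanitize_name "$filename")", run whenever the two names differ instead of only when the name contains a space.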

Related

Wildcard on mv folder destination

I'm writing a small piece of code that checks for .mov files in a specific folder over 4gb and writes it to a log.txt file by name (without an extension). I'm then reading the names into a while loop line by line which signals some archiving and copying commands.
Consider a file named abcdefg.mov (new) and a corresponding folder somewhere else named abcdefg_20180525 (<-*underscore timestamp) that also contains a file named abcedfg.mov (old).
When reading in the filename from the log.txt, I strip the extension to store the variable "abcdefg" ($in1) and i'm using that variable to locate a folder elsewhere that contains that matching string at the beginning.
My problem is with how the mv command seems to support a wild card in the "source" string, but not in the "destination" string.
For example i can write;
mv -f /Volumes/Myshare/SourceVideo/$in1*/$in1.mov /Volumes/Myshare/Archive
However a wildcard on the destination doesn't work in the same way. For example;
mv -f /Volumes/Myshare/Processed/$in1.mov Volumes/Myshare/SourceVideo/$in1*/$in1.mov
Is there an easy fix here that doesn't involve using another method?
Cheers for any help.
mv accepts a single destination path. Suppose that $in1 is abcdefg, and that $in1* matches the directories abcdefg_20180525 and abcdefg_20180526, each of which already contains abcdefg.mov. Then the command
mv -f /dir1/$in1.mov /dir2/$in1*/$in1.mov
is expanded by the shell into one command with three file arguments:
mv -f /dir1/abcdefg.mov /dir2/abcdefg_20180525/abcdefg.mov /dir2/abcdefg_20180526/abcdefg.mov
mv treats the last argument as the destination and everything before it as a source. With more than one source the destination has to be a directory, and here it is a regular file, so mv refuses to run. Had the last match been a directory, both sources would have been moved into it, which is not what you want either.
You should create a precise list and do a precise copy instead of using wild cards.
This is what I would probably do: generate a list of results in a file with FULL path information, then read those results in another function. I could have used arrays but I wanted to keep it simple. At the bottom of this script is a function call that scans for files with the extension in EXT (mp4, case-insensitive) and writes the results to a file in /tmp. The script then reads the results from that file in another function and performs some operation (mv etc.). Note: if functions are confusing, you can just remove the function name { } wrappers and the calls, and it becomes a normal script again. Functions are really handy, learn to love them!
#!/usr/bin/env bash
readonly SIZE_CHECK_LIMIT_MB="10M"
readonly FOLDER="/tmp"
readonly DESTINATION_FOLDER="/tmp/archive"
readonly SAVE_LIST_FILE="/tmp/$(basename "$0")-save-list.txt"
readonly EXT="mp4"
readonly CASE="-iname" #change to -name for exact ext type upper/lower
function find_files_too_large() {
> ${SAVE_LIST_FILE}
find "${FOLDER}" -maxdepth 1 -type f "${CASE}" "*.${EXT}" -size +${SIZE_CHECK_LIMIT_MB} -print0 | while IFS= read -r -d $'\0' line ; do
echo "FOUND => $line"
echo "$line" >> ${SAVE_LIST_FILE}
done
}
function archive_large_files() {
local read_file="${SAVE_LIST_FILE}"
local write_folder="$DESTINATION_FOLDER"
if [ ! -s "${read_file}" ] || [ ! -f "${read_file}" ] ;then
echo "No work to be done ... "
return
fi
while IFS= read -r line ;do
echo "mv $line $write_folder" ;sleep 1
done < "${read_file}"
}
# MAIN (this is where the script starts) We just call two functions.
find_files_too_large
archive_large_files
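For comparison, the array variant mentioned above could look roughly like this. It is a sketch: collect_large_files and the global array large_files are names made up for this example. It keeps the NUL-separated output from find, so paths with spaces survive intact.

```shell
#!/usr/bin/env bash

# Hypothetical helper: fill the global array large_files with every regular
# file directly inside DIR whose name ends in .EXT (any case) and whose size
# exceeds SIZE (a find -size value such as 10M).
# Usage: collect_large_files DIR EXT SIZE
collect_large_files() {
    local dir=$1 ext=$2 size=$3 path
    large_files=()
    # Process substitution keeps the loop in the current shell,
    # so the array is still filled after the loop ends.
    while IFS= read -r -d '' path; do
        large_files+=("$path")
    done < <(find "$dir" -maxdepth 1 -type f -iname "*.$ext" -size "+$size" -print0)
}

# Example, with the values used in the script above:
# collect_large_files /tmp mp4 10M
# for path in "${large_files[@]}"; do echo mv -- "$path" /tmp/archive/; done
```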
it might be easier, i think, to change the filenames to the folder name initially. So abcdefg.mov would be abcdefg_timestamp.mov. I can always strip the timestamp from the filename easy enough after its copied to the right location. I was hoping i had a small syntax issue but i think there is no easy way of doing what i thought i could...
I think you have a basic misunderstanding of how wildcards work here. The mv command doesn't support wildcards at all; the shell expands all wildcards into lists of matching files before they get passed to the mv command as arguments. Furthermore, the mv command doesn't know if the list of arguments it got came from wildcards or not, and the shell doesn't know anything about what the command is going to do with them. For instance, if you run the command grep *, the grep command just gets a list of names of files in the current directory as arguments, and will treat the first of them as a regex pattern ('cause that's what the first argument to grep is) to search the rest of the files for. If you ran mv * (note: don't do this!), it will interpret all but the last filename as sources, and the last one as a destination.
I think there's another source of confusion as well: when the shell expands a string containing a wildcard, it tries to match the entire thing to existing files and/or directories. So when you use Volumes/Myshare/SourceVideo/$in1*/$in1.mov, it looks for an already-existing file in a matching directory; since (AIUI) the file isn't there yet, there's no match. What the shell does in that case is pass the raw (unexpanded) wildcard-containing string to mv as an argument; mv looks for that exact name, doesn't find it, and gives you an error.
(BTW, should there be a "/" at the front of that pattern? I assume so below.)
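The expansion behaviour described above can be watched directly with a small stand-in for mv. This is a sketch: show_args is a made-up helper, the directory names are borrowed from the question, and bash's default settings are assumed (neither nullglob nor failglob enabled).

```shell
#!/usr/bin/env bash

# Print how many arguments arrived and what each one is.
show_args() {
    printf '%d args:' "$#"
    printf ' [%s]' "$@"
    printf '\n'
}

demo=$(mktemp -d)
mkdir "$demo/abcdefg_20180525"
touch "$demo/abcdefg_20180525/old.mov"

show_args "$demo"/abcdefg*/old.mov   # file exists: the pattern becomes a real path
show_args "$demo"/abcdefg*/new.mov   # nothing matches: the pattern arrives unexpanded
show_args "$demo"/abcdefg*/          # only the directory has to exist here
```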
If I understand the situation correctly, you might be able to use this:
mv -f /Volumes/Myshare/Processed/$in1.mov /Volumes/Myshare/SourceVideo/$in1*/
Since the filename isn't supplied in the second string, it doesn't look for existing files by that name, just directories with the right prefix; mv will automatically retain the filename from the source.
However, I'll echo @Sergio's warning about chaos from multiple matches. In this case, it won't overwrite files (well, it might, but for other reasons), but if it gets multiple matching target directories it'll move all but the last one into the last one (along with the file you meant to move). You say you're 100% certain this won't be a problem, but in my experience that means that there's at least a 50% chance that something you'd never have thought of will go ahead and make it happen anyway. For instance, is it possible that $in1 could wind up empty, or contain a space, or...?
Speaking of spaces, I'd also recommend double-quoting all variable references. You want the variables inside double-quotes, but the wildcards outside them (or they won't be expanded), like this:
mv -f "/Volumes/Myshare/Processed/$in1.mov" "/Volumes/Myshare/SourceVideo/$in1"*/
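If you would rather have the script refuse to act when the prefix matches zero or several directories (or when the variable is empty), the match can be resolved first and checked. A minimal sketch, with find_unique_dir being a made-up helper name:

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the one directory under BASE whose name starts
# with PREFIX. Fails if PREFIX is empty or if zero or several directories match.
find_unique_dir() {
    local base=$1 prefix=$2 d
    local -a matches=()
    if [[ -z $prefix ]]; then
        echo "empty prefix" >&2
        return 1
    fi
    for d in "$base/$prefix"*/; do
        # An unmatched pattern stays literal and fails this test
        if [[ -d $d ]]; then
            matches+=("${d%/}")
        fi
    done
    if (( ${#matches[@]} != 1 )); then
        echo "expected 1 directory for '$prefix', found ${#matches[@]}" >&2
        return 1
    fi
    printf '%s\n' "${matches[0]}"
}

# Usage with the paths from the question:
# dest=$(find_unique_dir /Volumes/Myshare/SourceVideo "$in1") &&
#     mv -f "/Volumes/Myshare/Processed/$in1.mov" "$dest/"
```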

Append to list of files in bash

so I'm trying to get a simple bash script to continuously read a directory and update a list of files to play through a command. However, I'm having some trouble thinking out the logic in it. What I need to do is put the current items in the directory into the list, have each item in the directory run through a program, and when a new item comes in, just append it to the list. I'm attempting to use inotifywait but can't seem to think of the proper logic. I may need it to run in the background, as the process that is running on these files will run before inotifywait is read again, at which point it will not pick up any new files that have been added as it only checks when it runs. Here's the code so hopefully it makes more sense.
#!/bin/bash
#Initial check to see if files are converted.
if [ ! -d "/home/pi/rpitx/converted" ]; then
echo "Converted directory does not exist, cannot play!"
exit 1
fi
CYAN='\e[36m'
NC='\e[39m'
LGREEN='\e[92m'
#iterate through directory first and act upon each item
for f in $FILES
do
echo -e "${CYAN}Now playing ${f##*/}...${NC}"
#Figure out a way to always watch directory even when it is playing
inotifywait -m /home/pi/rpitx/converted -e create -e moved_to |
while read path action file; do
echo -e "${LGREEN}New file found: ${CYAN}${file}${NC}"
FILES+=($file)
done
# take action on each file. $f store current file name
sudo ./rpitx -m RF -i "${f}" -f 101100
done
exit 0
So for example. if rpitx is currently playing something, and a file is converted, it won't pick up the latest file and add it to the list, nor will it make it since it's always reading. Is there a way to get inotifywait to run in the background of this script somehow? Thanks.
This is actually quite a difficult problem to get 100% perfect, but it is possible to get pretty close.
It is easy to get all the files in a directory, and it is easy to use inotifywait to get iteratively informed of new files being placed into the directory. The issue is getting the two to be consistent. If inotifywait isn't started until all the files have been processed (or even just listed), then you might miss new files created between the listing and the invocation of inotifywait. If, on the other hand, you start inotifywait first, then a file created between the invocation of inotifywait and the extraction of the current file list will be listed twice.
Since it is easier to filter duplicates than notice orphans, the recommended approach is the second one.
As a first approximation, we could ignore the duplicate problem on the assumption that the window of vulnerability is pretty short and so it is probably unlikely to happen. This simplifies the code, but it's not that difficult to track and eliminate duplicates: we could, for example, store each filename as the key in an associative array, ignoring the file if the key already exists.
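The associative-array filter could be as small as this. It is a sketch that needs bash 4 or later; first_time and seen are names made up for the example.

```shell
#!/usr/bin/env bash

# Requires bash 4 or later for associative arrays.
declare -A seen=()

# Hypothetical helper: succeed the first time NAME is offered, fail afterwards.
first_time() {
    local name=$1
    if [[ -n ${seen[$name]+set} ]]; then
        return 1
    fi
    seen[$name]=1
}

for name in a.wav b.wav a.wav; do
    if first_time "$name"; then
        echo "handle $name"
    else
        echo "skip duplicate $name"
    fi
done
```

The array has to live in the same shell process as the loop that consults it, which is the case when both sit inside the final while loop of a pipeline.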
We need three processes: one to execute inotifywait; one to produce the list of initial files; and one to handle each file as it is identified. So the basic structure of the code will be:
list_new_files |
{ list_existing_files; pass_through; } |
while read -r action file; do
handle "$action" "$file"
done
Note that the second process first produces the existing files, and then calls pass_through, which reads from standard input and writes to standard output, thus passing through the files being discovered by list_new_files. Since pipes have a finite capacity, it is possible that the execution of list_existing_files will block a few times (if there are lots of existing files and handling them takes a long time), so when pass_through finally gets executed, it could have quite a bit of queued-up input to pass through. That doesn't matter, unless the first pipe also fills up, which will happen if a large number of new files are created. And that still won't matter as long as inotifywait doesn't lose notifications while it is blocked on a write. (This may actually be a problem, since the manpage for inotifywait on my system includes in the "BUGS" section the note, "It is assumed the inotify event queue will never overflow." We could fix the problem by inserting another process which carefully buffers inotifywait's output, but that shouldn't be necessary unless you intend to flood the directory with lots of files.)
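The ordering this structure produces can be checked without inotifywait by swapping in fixed lists for both producers. In this sketch the two list functions are stand-ins that only print sample names, and run_pipeline is a made-up wrapper.

```shell
#!/usr/bin/env bash

# Stand-ins for the real producers, so the ordering can be observed
# without inotifywait: both simply print a fixed list.
list_new_files()      { printf 'CREATE %s\n' new1.wav new2.wav; }
list_existing_files() { printf 'EXISTING %s\n' old1.wav old2.wav; }

run_pipeline() {
    list_new_files |
    { list_existing_files; cat; } |
    while read -r action file; do
        echo "$action: $file"
    done
}

run_pipeline
```

The existing files always come out first, because cat only starts copying the queued-up new files once list_existing_files has finished.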
Now, let's examine each of the functions in turn.
list_new_files could be just the call to inotifywait from your original script:
inotifywait -m /home/pi/rpitx/converted -e create -e moved_to
Listing existing files is also easy. Here's one simple solution:
printf "%s\n" /home/pi/rpitx/converted/*
However, that will print out the full file path, which is different from the output from inotifywait. To make them the same, we cd into the directory in order to do the listing. Since we might not actually want to change the working directory, we use a subshell by surrounding the commands inside parentheses:
( cd /home/pi/rpitx/converted; printf "%s\n" *; )
The printf just prints its arguments each on a separate line. Since glob-expansions are not word-split or recursively glob-expanded, this is safe against whitespace or metacharacters in filenames, except newline characters. Filenames with newline characters are pretty rare; for now, I'll ignore the issue but I'll indicate how to handle it at the end.
Even with the change indicated above, the output from these two commands is not compatible: the first one outputs three things on each line (directory, action, filename), and the second one just one thing (the filename). In the listing below, you'll see how we modify the format to printf and introduce a format for inotifywait in order to make the outputs fully compatible, with the "action" for existing files set to EXISTING.
pass_through could, in theory, just be cat, and that's how I've coded it below. However, it is important that it operate in line-buffered mode; otherwise, nothing will happen until "enough" files have been written by list_existing_files. On my system, cat in this configuration works perfectly; if that doesn't work for you or you don't want to count on it, you could write it explicitly as a while read loop:
pass_through() {
while read -r line; do echo "$line"; done
}
Finally, handle is essentially the code from the original post, but modified a bit to take the new format into account, and to do the right thing with action EXISTING.
# Colours. Note the use of `$'...'` to actually store the code,
# thereby avoiding the need to later reinterpret backslash sequences
CYAN=$'\e[36m'
NC=$'\e[39m'
LGREEN=$'\e[92m'
converted=/home/pi/rpitx/converted
list_new_files() {
inotifywait -m "$converted" -e create -e moved_to --format "%e %f"
}
# Note the use of ( ) around the body instead of { }
# This is the same as `{( ... )}`; it makes the `cd` local to the function.
list_existing_files() (
cd "$converted"
printf "EXISTING %s\n" *
)
# Invoked as `handle action filename`
handle() {
case "$1" in
EXISTING)
echo "${CYAN}Now playing ${2}...${NC}"
;;
*)
echo "${LGREEN}New file found: ${CYAN}${2}${NC}"
;;
esac
# $2 is a bare filename, so prefix it with the watched directory
sudo ./rpitx -m RF -i "$converted/${2}" -f 101100
}
# Put everything together
list_new_files |
{ list_existing_files; cat; } |
while read -r action file; do handle "$action" "$file"; done
What if we thought a filename might have a newline character in it? There are two "safe" characters which could be used to delimit the filenames, in the sense that they cannot appear inside a filename. One is /, which can obviously appear in a path, but cannot appear in a simple filename, which is what we're working with here. The other one is the NUL character, which cannot appear inside a filename at all, but can sometimes be a bit annoying to deal with.
Normally, faced with this problem, we would use a NUL, but that depends on the various utilities we're using allowing the separation of data with NUL instead of newline. That's not the case for inotifywait, which always outputs a newline after a notification line. So in this case it seems simpler to use a /. First we modify the formats:
inotifywait -m "$converted" -e create -e moved_to --format "%e %f/"
printf "EXISTING %s/\n" *
Now, when we're reading the lines, we need to read until we find a line ending with / (and remember to remove it). read doesn't allow two-character line terminators, so we need to accumulate the lines ourselves:
while read -r action file; do
# If file doesn't end with a slash, we need to read another line
while [[ $file != */ ]] && read -r line; do
file+=$'\n'"$line"
done
# Remember to remove the trailing slash
handle "$action" "${file%/}"
done
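To see the slash-terminated format cope with a newline inside a name, the reading loop can be fed some sample records. This is a sketch: read_records is a made-up wrapper that prints each record instead of calling handle, and the continuation line is read with IFS= so that leading whitespace in it is kept.

```shell
#!/usr/bin/env bash

# Hypothetical wrapper around the loop above: prints each complete record
# as ACTION <filename> instead of handling it.
read_records() {
    local action file line
    while read -r action file; do
        # Keep reading until the accumulated name ends with a slash
        while [[ $file != */ ]] && IFS= read -r line; do
            file+=$'\n'$line
        done
        printf '%s <%s>\n' "$action" "${file%/}"
    done
}

# Two sample records; the second filename contains a newline.
printf 'EXISTING %s/\n' 'plain.wav' $'two\nlines.wav' | read_records
```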

Loop through a folder of PDF files and append a single PDF to each

This code just seems to replace the first file, not append file1.pdf to it.
I need the file to append not replace.
#!/bin/bash
FILES=("/Users/a/folder/"*.pdf)
for f in "${FILES[@]}"
do
echo "${f}"
"/System/Library/Automator/Combine PDF Pages.action/Contents/Resources/join.py" -o "${f}" "${f}" "/Users/a/folder2/file1.pdf"
done
I noticed, if I run the code manually, but use a different name for the first and second parameters, it seems to work. However, I do not know how to change the name of the first parameter without making it a constant.
It seems to me your problem has nothing to do with Ruby. As I'm understanding it, you are trying to use the command line on MacOS X El Capitan to merge a PDF file with other PDF files.
If I understood your problem correctly, then you probably should heed the advice of this weblog and use the command "/System/Library/Automator/Combine PDF Pages.action/Contents/Resources/join.py" which is available from MacOS X Tiger onwards.
Note that if the file you want to append is in the same directory where all the files are you want to append to, you'll run into problems: the script join.py does not seem to appreciate being given the same file thrice, so place your file elsewhere (the one you want to append to all files).
Try something along the lines of:
#!/bin/bash
for f in /Absolute/Path/To/The/PDFS/*.pdf;
do /System/Library/Automator/Combine\ PDF\ Pages.action/Contents/Resources/join.py -o "$f" "$f" /Absolute/Path/To/The/File/To/Append; done
Solution:
#!/bin/bash
FILES=("/Users/a/folder/"*.pdf)
for f in "${FILES[@]}"
do
echo "${f}"
a="${f%.pdf}"
"/System/Library/Automator/Combine PDF Pages.action/Contents/Resources/join.py" -o "${a}_x.pdf" "${f}" "/Users/a/folder2/file1.pdf"
done
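If the combined file should end up under the original name rather than as a separate _x.pdf, one option is to write to a temporary name first and rename afterwards. This is only a sketch: append_in_place is a made-up helper, the combining command is passed in as arguments and assumed to take -o OUTPUT INPUT1 INPUT2 as in the call above, and it has not been tried against join.py itself.

```shell
#!/bin/bash

# Hypothetical helper: combine TARGET and EXTRA into a temporary file, then
# replace TARGET with the result only if the combiner succeeded.
# Usage: append_in_place TARGET EXTRA COMBINER [COMBINER_ARGS...]
append_in_place() {
    local target=$1 extra=$2 tmp
    shift 2
    # Different name from the input, still ending in .pdf
    tmp=${target%.pdf}.tmp.$$.pdf
    if "$@" -o "$tmp" "$target" "$extra"; then
        mv -- "$tmp" "$target"
    else
        rm -f -- "$tmp"
        return 1
    fi
}

# Usage with the paths from the question:
# for f in "/Users/a/folder/"*.pdf; do
#     append_in_place "$f" "/Users/a/folder2/file1.pdf" \
#         "/System/Library/Automator/Combine PDF Pages.action/Contents/Resources/join.py"
# done
```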

Bash - extremely simple script redirecting output to file

Disclaimer: I'm very new to bash and for some reason I'm having a very hard time learning this one. The syntax seems very different depending on the website I visit.
I have a simple wrapper script that I want to test if a file is gzipped or not, and if so, to zcat the file to a new temporary file and open it in an editor. Here's part of the script:
if file $FILE | grep -q gzip
then
timestamp=$(date +"%D_%T")
$( zcat $FILE > tmp-$timestamp )
fi
I'm getting an error: "tmp-10/19/15_15:16:41: No such file or directory"
I tried removing the command substitution syntax or putting tmp-$timestamp in double quotes and I get the same error. If I remove the -$timestamp part, then it seems to work fine. Can someone tell me what's going on here? I'm clearly missing something very simple.
tmp-10/19/15_15:16:41 refers to a file named 15_15:16:41 in directory 19 which is a subdirectory of tmp-10. If those directories and subdirectories do not exist, you cannot write to them.
Replace:
timestamp=$(date +"%D_%T")
With:
timestamp=$(date +"%F_%T")
This gives the date without the /.
As an example of this format:
$ date +"%F_%T"
2015-10-19_12:37:05
With %F, the year comes before the month which comes before the day. This means that your files will sort properly. For most people, that is an important advantage over %D.
Revised script
Your script can be simplified to:
if file "$file" | grep -q gzip
then
zcat "$file" > "tmp-$(date +"%F_%T")"
fi
Notes:
It is best practice not to use all caps for your shell variables. The system uses all caps for its variables and you don't want to accidentally overwrite one. Use lower case or mixed case and you'll be safe.
File names, such as $file, should always be in double-quotes. Some day, someone will give you a file name with a space in it and you don't want that to cause your script to fail.
The command substitution $(...) does not belong here. It has been removed.
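Putting the notes together, the wrapper's first half might look like the sketch below. editable_copy is a made-up helper name. Note one swap: it uses gzip -t to detect a gzipped file instead of file | grep, and gzip -dc in place of zcat; both read from standard input here so the file's suffix does not matter.

```shell
#!/usr/bin/env bash

# Hypothetical helper: if FILE is gzipped, decompress it into a timestamped
# temporary file in the current directory and print that file's name;
# otherwise print FILE unchanged.
editable_copy() {
    local file=$1 tmp
    if gzip -t < "$file" 2>/dev/null; then
        tmp="tmp-$(date +%F_%T)"          # %F has no slashes, unlike %D
        gzip -dc < "$file" > "$tmp" && printf '%s\n' "$tmp"
    else
        printf '%s\n' "$file"
    fi
}

# Usage: open whatever name comes back in the editor of your choice
# "${EDITOR:-vi}" "$(editable_copy "$file")"
```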

Linux shell script delete/restore/trash part 2

Thank you in advance for any help, this is coursework so further reading/ pointers is greatly appreciated.
I asked a question the other day relating to my own delete/trash/restore scripts and I have completed delete and trash as well as giving delete a backup text file for Restore to use later on.
However, instead of giving me errors, the Restore script just kinda stops in the console. Like when I type # ~/Restore -n the cursor skips to the next line without the usual # and I have to close it manually. Likewise without the -n option. The -n option should ask for a new location to restore to, and without it should restore to the files original location.
I'll post my script, see what y'all think.
#!/bin/bash
if [ "$1" == "-n" ]
then cd ~/rubbish
restore= grep $2 ~/store
filename= basename "$restore"
echo "Type the files new location"
read location
location1 = "readlink -f $location"
mv -i $filename "$location1" /$filename
else cd ~/rubbish
restore= grep $2 ~/store
filename= basename "$restore"
mv -i $filename "$location1" $location
fi
so, ~/rubbish is my own created directory to act as a recycle bin and ~/store is my text file which appends the deleted files readlink details on deletion. I can post the whole 3 scripts if necessary?
Many thanks!
If you call ~/Restore -n it will go to the if part and do a grep $2 ~/store. Since there is no parameter $2 it will result in grep ~/store, which tells grep to search for "~/store" in the input coming from standard input.
That's why your script stops and waits for input.
You can either test for a second parameter or enclose $2 in double quotes to make sure grep gets the correct number of parameters. Better yet, do both: 1. test for a second parameter and 2. enclose $2 in double quotes.
Some more points:
Don't put spaces around =
enclose commands in backticks `, if you want to capture the output
And no spaces between directory and filename
So, you should presumably write
restore=`grep "$2" ~/store`
filename=`basename "$restore"`
echo "Type the files new location"
read location
location1=`readlink -f "$location"`
mv -i "$filename" "$location1/$filename"
I suggest you look at bash info and follow the "Books and Resources".
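With the argument check added, the whole Restore script could be arranged roughly as follows. It is a sketch, not a drop-in replacement: restore_file and the store and rubbish variables are names made up here, grep -F is used so that the name is matched literally, and only the first matching line of the store file is used.

```shell
#!/bin/bash

store=${STORE:-$HOME/store}         # text file with the original paths
rubbish=${RUBBISH:-$HOME/rubbish}   # directory acting as the recycle bin

# Hypothetical helper.  Usage: restore_file [-n] NAME
restore_file() {
    local ask=0 original filename target
    if [[ $1 == -n ]]; then
        ask=1
        shift
    fi
    # Without this check an empty name would make grep wait on standard input
    if [[ -z $1 ]]; then
        echo "usage: restore_file [-n] NAME" >&2
        return 2
    fi
    original=$(grep -m1 -F -- "$1" "$store") || {
        echo "no record of $1 in $store" >&2
        return 1
    }
    filename=$(basename -- "$original")
    if (( ask )); then
        read -r -p "Type the file's new location: " target
        target=$(readlink -f -- "$target")/$filename
    else
        target=$original
    fi
    mv -i -- "$rubbish/$filename" "$target"
}

# At the end of the real script you would call: restore_file "$@"
```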
I wrote one of these quite some time ago which I still use today. I don't have a restore script because I wrote it so that you could open your desktop trash can, right click and select "Restore". In other words it follows the Linux "trash info" standard.
http://wiki.linuxquestions.org/wiki/Scripting#KDE4_Command_Line_Trash_Can
