Monitor and ftp newly-added files on Linux -- modify existing code - bash

The OS is CentOS 7. I have a small application that implements the following functionality:
1. Read settings from config.ini, which looks like this:
# Configuration file for ftpxml service
# Remote FTP server information
ftpadress=1.2.3.4
username=test
password=test
# Local folders configuration
# folderA: folder for incoming files
folderA=/u02/dump
# folderB: successfully transferred files are copied here
folderB=/u02/dump_bak
# retrydir: when the FTP upload fails, failed files are stored in this
# directory
retrydir=/u02/dump_retry
2. Monitor folder A. If there are any newly added files in A, do step 3.
3. FTP these new files to a remote FTP server in order of their creation time. When the upload finishes, copy the uploaded files to folder B and delete them from folder A.
4. If the FTP transfer fails, store the affected files in retrydir and try to upload them later.
5. Record every operation in a log file.
Detailed setup instructions for the application:
Install the ncftp package: yum install ncftp -y. It is neither a service nor a daemon, just a client tool that the bash script invokes for the FTP transfer.
Customize these files to suit your setup using vi: config.ini, ftpmon.path and ftpmon.service.
Copy ftpmon.path and ftpmon.service to /etc/systemd/system/, copy config.ini and ftpxml.sh to /u02/ftpxml/, and run: chmod +x ftpxml.sh
Start the monitoring tool:
sudo systemctl start ftpmon.path
If you want to enable it at boot time, enter: sudo systemctl enable ftpmon.path
Set up a cron task to purge queued files (add option -p):
*/5 * * * * /u02/ftpxml/ftpxml.sh -p
The application now seems to work well, except in one situation:
When we put several files into folder A in quick succession (for instance 1.txt, 2.txt, 3.txt and so on, one after another within a short time), we usually find that 1.txt is uploaded fine, but the following files are not uploaded and remain in folder A.
I am trying to fix this problem. I suspect the cause is that the second file is created in folder A while the first file is still being uploaded, so the script never picks it up.
Below is the code of ftpxml.sh:
#!/bin/bash
# ! Please read the README.txt file !
# Copy files from upload dir to remote FTP then save them
# to folderB
# look our location
SCRIPT=$(readlink -f $0)
# Absolute path to this script
SCRIPTPATH=`dirname $SCRIPT`
PIDFILE=${SCRIPTPATH}/ftpmon_prog.lock
# load config.ini
if [ -f $SCRIPTPATH/config.ini ]; then
source $SCRIPTPATH/config.ini
else
echo "No config found. Exiting"
fi
# Lock to avoid multiple instances
if [ -f $PIDFILE ]; then
kill -0 $(cat $PIDFILE) 2> /dev/null
if [ $? == 0 ]; then
exit
fi
fi
# Store PID in lock file
echo $$ > $PIDFILE
# Parse cmdline arguments
while getopts ":ph" opt; do
case $opt in
p)
#we set the purge mode (cron mode)
purge_only=1
;;
\?|h)
echo "Help text"
exit 1
;;
esac
done
# declare usefull functions
# common logging function
function logmsg() {
LOGFILE=ftp_upload_`date +%Y%m%d`.log
echo $(date +%m-%d-%Y\ %H:%M:%S) $* >> $SCRIPTPATH/log/${LOGFILE}
}
# Upload to remote FTP
# we use ncftpput to batch silently
# $1 file to upload $2 return value placeholder
function upload() {
ncftpput -V -u $username -p $password $ftpadress /prog/ $1
return $?
}
function purge_retry() {
failed_files=$(ls -1 -rt ${retrydir}/*)
if [ -z $failed_files ]; then
return
fi
while read line
do
#upload ${retrydir}/$line
upload $line
if [ $? != 0 ]; then
# upload failed we exit
exit
fi
logmsg File $line Uploaded from ${retrydir}
mv $line $folderB
logmsg File $line Copyed from ${retrydir}
done <<< "$failed_files"
}
# first check out 'queue' directory
purge_retry
# if called from a cron task we are done
if [ $purge_only ]; then
rm -f $PIDFILE
exit 0
fi
# look in incoming dir
new_files=$(ls -1 -rt ${folderA}/*)
while read line
do
# launch upload
if [ Z$line == 'Z' ]; then
break
fi
#upload ${folderA}/$line
upload $line
if [ $? == 0 ]; then
logmsg File $line Uploaded from ${folderA}
else
# upload failed we cp to retry folder
echo upload failed
cp $line $retrydir
fi
# don't care upload successfull or failed, we ALWAYS move the file to folderB
mv $line $folderB
logmsg File $line Copyed from ${folderA}
done <<< "$new_files"
# clean exit
rm -f $PIDFILE
exit 0
Below is the content of ftpmon.path:
[Unit]
Description= Triggers the service that logs changes.
Documentation= man:systemd.path
[Path]
# Enter the path to monitor (/u02/dump)
PathModified=/u02/dump/
[Install]
WantedBy=multi-user.target
Below is the content of ftpmon.service:
[Unit]
Description= Starts File Upload monitoring
Documentation= man:systemd.service
[Service]
Type=oneshot
#Set here the user that ftpmxml.sh will run as
User=root
#Set the exact path to the script
ExecStart=/u02/ftpxml/ftpxml.sh
[Install]
WantedBy=default.target
Thanks in advance, hope any experts can give me some suggestion.

Since you remove successfully transferred files from A, you can simply leave files whose transfer failed in A. That way you only have to deal with files in one folder.
List your files by timestamp (stat's %y is the last modification time; the birth time, %w, is often not available) with
find . -maxdepth 1 -type f -print0 | xargs -r0 stat -c %y\ %n | sort
if you want hidden files to be included or, if not, with
find . -maxdepth 1 -type f ! -name '.*' -print0 | xargs -r0 stat -c %y\ %n | sort
You'll get something like
2016-02-19 18:53:41.000000000 ./.dockerenv
2016-02-19 18:53:41.000000000 ./.dockerinit
2016-02-19 18:56:09.000000000 ./versions.txt
2016-02-19 19:01:44.000000000 ./test.sh
Now cut out the filenames (or use xargs -r0 stat -c %n if it does not matter that the files are ordered by name instead of by timestamp) and
do the transfer
check the success
move successfully transferred files to B
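The three steps above could be sketched roughly as follows. This is only an illustration: transfer_cmd, process_files and REMOTE_STANDIN are hypothetical names, and the local cp merely stands in for the real upload (for example the question's ncftpput call) so that the sketch can be run without an FTP server.

```shell
#!/usr/bin/env bash
# Sketch: transfer each file, check the result, move it only on success.
# transfer_cmd, process_files and REMOTE_STANDIN are hypothetical names.

transfer_cmd() {
    # Stand-in for the real upload: copies into a local directory instead.
    cp -- "$1" "$REMOTE_STANDIN/"
}

# Read one file path per line on stdin; move transferred files to directory $1.
process_files() {
    local done_dir=$1 file
    while IFS= read -r file; do
        if transfer_cmd "$file"; then
            mv -- "$file" "$done_dir/"
        else
            # Failed files stay where they are and are retried on the next run.
            printf 'transfer failed: %s\n' "$file" >&2
        fi
    done
}
```

The sorted file list from the find pipeline would be piped into process_files with the backup folder as its argument. Paths containing newlines are not handled by this line-based loop.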
As you stated above, there are situations where newly stored files are not transferred successfully. This can happen if the file is still being written after you started the transfer. So filter on the timestamp so that only files of a certain age are picked up. Add -mmin +1 to the find command to select only files last modified more than one minute ago (-mmin -1 would select the opposite, files modified within the last minute):
find . -maxdepth 1 -type f ! -name '.*' -mmin +1 -print0 | xargs -r0 stat -c %n | sort
If you don't want to rely on a one-minute file age, you'll have to check whether the file is still open with lsof | grep ./testfile, but this may have issues if you have tmpfs in your file system:
lsof | grep ./testfile
lsof: WARNING: can't stat() tmpfs file system /var/lib/docker/containers/8596cd310292a54652c7f50d7315c8390703b4816442146b340946779a72a40c/shm
Output information may be incomplete.
lsof: WARNING: can't stat() proc file system /run/docker/netns/fb9323486c44
Output information may be incomplete.
So add %s to the stat format to check the file size twice a few seconds apart; if it is constant, the file has probably been written completely. Only probably, because the writing process may merely be stalled.
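A minimal sketch of the two checks discussed above (minimum file age and stable size), assuming GNU find and GNU stat as shipped with CentOS 7. The helper names list_ready_files and size_is_stable are made up for this example. Note that find's -mmin +N selects files modified more than N minutes ago.

```shell
#!/usr/bin/env bash
# Sketch: pick files that look complete, oldest first.
# list_ready_files and size_is_stable are hypothetical helper names.

# Print regular, non-hidden files in directory $1 whose last modification
# is more than $2 minutes in the past, ordered oldest first.
list_ready_files() {
    local dir=$1 min_age=${2:-1}
    find "$dir" -maxdepth 1 -type f ! -name '.*' -mmin "+$min_age" \
        -printf '%T@ %p\n' | sort -n | cut -d' ' -f2-
}

# Succeed if the size of file $1 does not change over $2 seconds.
size_is_stable() {
    local file=$1 wait=${2:-2} before after
    before=$(stat -c %s -- "$file") || return 1
    sleep "$wait"
    after=$(stat -c %s -- "$file") || return 1
    [ "$before" = "$after" ]
}
```

A caller could loop over the output of list_ready_files /u02/dump 1 and upload only the files for which size_is_stable succeeds. As noted above, a constant size is a hint and not a proof, and file names containing newlines are not supported by this line-based output.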

Related

Are delete events considered close_write in inotifywait?

I have a simple inotifywait script that watches for FTP file uploads to be closed and then moves them to AWS S3. It seems to work, except that the logs indicate that the file was not found (although the file was indeed uploaded to S3).
The s3 mv command moves the file to the cloud and deletes it locally. Could this be because inotifywait reports the deletion of the file as a close_write event?
Why does inotify seem to execute the commands twice?
TARGET=/home/*/ftp/files
inotifywait -m -r -e close_write $TARGET |
while read directory action file
do
if [[ "$file" =~ .*mp4$ ]]
then
echo COPY PATH IS "$directory$file"
aws s3 mv "$directory$file" s3://bucket
fi
done
example logs:
Setting up watches. Beware: since -r was given, this may take a while!
Watches established.
COPY PATH IS /home/user/ftp/files/2022/05/16/user-cam-1_00_20220516114055.mp4
COPY PATH IS /home/user/ftp/files/2022/05/16/user-cam-1_00_20220516114055.mp4
COPY PATH IS /home/user/ftp/files/2022/05/16/user-cam-1_00_20220516114055.mp4
move: ../user/ftp/files/2022/05/16/user-cam-1_00_20220516114055.mp4 to s3://bucket/user-cam-1_00_20220516114055.mp4
upload: ../user/ftp/files/2022/05/16/user-cam-1_00_20220516114055.mp4 to s3://bucket/user-cam-1_00_20220516114055.mp4
move failed: ../user/ftp/files/2022/05/16/user-cam-1_00_20220516114055.mp4 to s3://bucket/user-cam-1_00_20220516114055.mp4 [Errno 2] No such file or directory: '/home/user/ftp/files/2022/05/16/user-cam-1_00_20220516114055.mp4'
rm: cannot remove '/home/user/ftp/files/2022/05/16/user-cam-1_00_20220516114055.mp4': No such file or directory
I cleaned up your script and added some safety: quoting, and a check for already processed files in case the filesystem triggers duplicate events for the same file.
#!/usr/bin/env bash
# Prevents expanding pattern without matches
shopt -s nullglob
# Expands pattern into an array
target=(/home/*/ftp/files/)
# Creates temporary directory and cleanup trap
declare -- tmpdir=
if tmpdir=$(mktemp -d); then
trap 'rm -fr -- "$tmpdir"' EXIT INT
else
# or exit error if it fails
exit 1
fi
# In case no target matches, exit error
[ "${#target[@]}" -gt 0 ] || exit 1
s3move() {
local -- p=$1
# Use only the file name, so the temporary path needs no subdirectories
local -- tmp="$tmpdir/${p##*/}"
printf 'Copy path is: %s\n' "$p"
# Moves the file to temporary dir
# so it is away from inotify watch dir ASAP
mv -- "$p" "$tmp"
# Then perform the slow remote copy to s3 bucket, from the temporary copy
# Remove the echo once it is ok
echo aws s3 mv "$tmp" s3://bucket
# File has been copied to s3, tmp file no longer needed
rm -f -- "$tmp"
}
while read -r -d '' p; do
# Skip if file does not exist, as it has already been moved away
# case of a duplicate event for already processed file
[ -e "$p" ] || continue
s3move "$p"
done < <(
# Good practice to spell long option names in a script
# --format will print null-delimited full file path
inotifywait \
--monitor \
--recursive \
--event close_write \
--includei '.*\.mp4$' \
--format '%w%f%0' \
"${target[@]}" 2>/dev/null
)
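As a complement to the existence check above, duplicate events could also be dropped by remembering which paths were already seen. The following is a sketch only, assuming bash 4 or later for associative arrays; dedupe_events is a hypothetical helper name. One limitation to keep in mind: a file that is later uploaded again under the same name would be ignored for as long as the loop keeps running.

```shell
#!/usr/bin/env bash
# Sketch: drop repeated paths from a NUL-delimited event stream.
# dedupe_events is a hypothetical helper name.
dedupe_events() {
    local -A seen=()
    local p
    while IFS= read -r -d '' p; do
        [[ -n $p ]] || continue                 # ignore empty records
        [[ -n ${seen[$p]:-} ]] && continue      # path was already handled
        seen[$p]=1
        [[ -e $p ]] || continue                 # file vanished in the meantime
        printf '%s\0' "$p"
    done
}
```

It would sit between the inotifywait process substitution and the read loop, passing each existing path through once.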

cron not able to run the commands in shell

I am trying to run the following cron job from bash (RHEL 7.4). It is an entry-level Postgres DB log backup script that I wrote:
#!/bin/bash
# find latest file
echo $PATH
cd /home/postgres/log/
echo "------------ backup starts-----------"
latest_file=$( ls -t | head -n 1 | grep '\.log$' )
echo "latest file"
echo $latest_file
# find older files than above
echo "old file"
old_file=$( find . -maxdepth 1 -name "postgresql*" ! -newer $latest_file -mmin +1 )
if [ -f "$old_file" ]
then
echo $old_file
file_name=${old_file##*/}
echo "file name"
echo $file_name
# zip older file
tar czvf /home/postgres/log/archived_logs/$old_file.gz /home/postgres/log/$file_name
rm -rf /home/postgres/log/$file_name
else
echo "no old file found"
fi
The script above runs correctly from the shell and performs the intended tasks. It also echoes the needed info.
I have installed it for the postgres user (not root) with crontab -e:
*/2 * * * * /home/postgres/log/rollup.sh >> /home/postgres/log/logfile.csv 2>&1
From cron, it correctly echoes the text I embedded for testing, but the output of the commands does not appear in the .csv. That is not my main concern, though. My concern is that those few commands are not run at all.
I also tried changing the log file (.csv) path to /dev/null, and then the commands in the shell script do execute. I do not understand what I am missing here.
The .csv file has permission 777, just for testing.
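One possible explanation, offered as a guess rather than a confirmed diagnosis: the cron line redirects output into logfile.csv inside /home/postgres/log/, the same directory the script scans. As soon as the script echoes its first line, logfile.csv is the most recently modified file there, so ls -t | head -n 1 returns logfile.csv, grep '\.log$' rejects it, and latest_file ends up empty. That would match the observation that redirecting to /dev/null makes the commands run. A sketch of a selection that filters by name before taking the newest file (newest_log is a hypothetical helper name, and GNU find is assumed):

```shell
#!/usr/bin/env bash
# Sketch: pick the newest *.log file even when a newer non-log file exists.
# newest_log is a hypothetical helper name.
newest_log() {
    local dir=$1
    # Filter by name first, then take the newest; the original pipeline took
    # the newest file of any kind and only afterwards filtered by name.
    find "$dir" -maxdepth 1 -type f -name '*.log' -printf '%T@ %f\n' |
        sort -rn | head -n 1 | cut -d' ' -f2-
}
```

In the script, latest_file=$(newest_log /home/postgres/log) would replace the ls pipeline. Writing the cron output to a file outside the scanned directory would be another way to avoid the interference.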

Download a fixed number of directories from an FTP server

I have an FTP server with thousands of directories. What I want to do is download a specific number of them (for example, 500 directories) using a shell script. How can I do that? I tried wget with the -Q option. For example, "wget -Q25MB" gives me 25 MB of data. The problem is that each folder has a different size, so using this option stops the download in the middle of a folder.
Assuming wget returns an error when the download gets interrupted:
#!/bin/bash
to_del=() # start with an empty array, in case you copy-paste this into a terminal instead of using a file
username=blablabla
password=blablabla
server=blablabla
printf -v today '%(%Y_%m_%d)T'
# Get the 500 first directory names to download
ftp -n "$server" << EOF | grep -v '^\.\.\?$' | head -n 500 > "to_download_$today.txt"
user $username $password
ls
bye
EOF
# Then, you can download each folder one by one:
while read -r dir; do
if [[ -e $dir ]]; then
echo >&2 "WARNING: '$dir' already exists!"
continue # We don't download or remove it. Manual action needed
fi
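# -r downloads the directory recursively; -nH avoids a local directory named after the host
if wget -r -nH "ftp://$username:$password@$server/$dir/"; then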
to_del+=("$dir")
else
# A directory was not successfully downloaded, we delete the temporary files
echo >&2 "WARNING: '$dir' download failed, skipping..."
rm -rf "$dir"
fi
done < "to_download_$today.txt"
# Now, delete the successfully downloaded folders using a single FTP connection
{
printf 'user %s %s\n' "$username" "$password"
for dir in "${to_del[@]}"; do
printf 'del %s\n' "$dir"
done
printf 'bye\n'
} | ftp -i -n "$server"

Bash: Check if remote directory exists using FTP

I'm writing a bash script to send files from a linux server to a remote Windows FTP server.
I would like to check using FTP if the folder where the file will be stored exists before attempting to create it.
Please note that I cannot use SSH nor SCP and I cannot install new scripts on the linux server. Also, for performance issues, I would prefer if checking and creating the folders is done using only one FTP connection.
Here's the function to send the file:
sendFile() {
ftp -n $FTP_HOST <<! >> ${LOCAL_LOG}
quote USER ${FTP_USER}
quote PASS ${FTP_PASS}
binary
$(ftp_mkdir_loop "$FTP_PATH")
put ${FILE_PATH} ${FTP_PATH}/${FILENAME}
bye
!
}
And here's what ftp_mkdir_loop looks like:
ftp_mkdir_loop() {
local r
local a
r="$@"
while [[ "$r" != "$a" ]]; do
a=${r%%/*}
echo "mkdir $a"
echo "cd $a"
r=${r#*/}
done
}
The ftp_mkdir_loop function helps in creating all the folders in $FTP_PATH (Since I cannot do mkdir -p $FTP_PATH through FTP).
Overall my script works but is not "clean"; this is what I'm getting in my log file after the execution of the script (yes, $FTP_PATH is composed of 5 existing directories):
(directory-name) Cannot create a file when that file already exists.
Cannot create a file when that file already exists.
Cannot create a file when that file already exists.
Cannot create a file when that file already exists.
Cannot create a file when that file already exists.
To solve this, do as follows:
1. To ensure that you only use one FTP connection, generate the input (the FTP commands) as the output of a shell script.
E.g.
$ cat a.sh
echo "cd /home/test1"
echo "mkdir /home/test1/test2"
$ ./a.sh | ftp $Your_login_and_server > /your/log 2>&1
2. To let FTP test whether a directory exists, use the fact that the "DIR" command has an option to write its listing to a local file:
# ...continuing a.sh
# In a loop, $CURRENT_DIR is the next subdirectory to check-or-create
echo "DIR $CURRENT_DIR $local_output_file"
sleep 5 # to leave time for the file to be created
if [ ! -s "$local_output_file" ]
then
echo "mkdir $CURRENT_DIR"
fi
Please note that the "-s" test is not necessarily correct. I don't have access to FTP right now and don't know what the exact output of running DIR on a non-existing directory will be: it could be an empty file, or it could be a specific error. If it is an error, you can grep for the error text in $local_output_file.
3. Now wrap step 2 in a loop over your individual subdirectories in a.sh.
#!/bin/bash
FTP_HOST=prep.ai.mit.edu
FTP_USER=anonymous
FTP_PASS=foobar@example.com
DIRECTORY=/foo # /foo does not exist, /pub exists
LOCAL_LOG=/tmp/foo.log
ERROR="Failed to change directory"
ftp -n $FTP_HOST << EOF | tee -a ${LOCAL_LOG} | grep -q "${ERROR}"
quote USER ${FTP_USER}
quote pass ${FTP_PASS}
cd ${DIRECTORY}
EOF
if [[ "${PIPESTATUS[2]}" -eq 1 ]]; then
echo ${DIRECTORY} exists
else
echo ${DIRECTORY} does not exist
fi
Output:
/foo does not exist
If you want to suppress only the messages in ${LOCAL_LOG}:
ftp -n $FTP_HOST <<! | grep -v "Cannot create a file" >> ${LOCAL_LOG}
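If the mkdir attempts are kept and only the log noise is filtered, the command generator itself can be made a little more robust. The following is a sketch of a variant of ftp_mkdir_loop, not the original function: ftp_mkdir_cmds is a hypothetical name, and it splits the path on "/" so that leading or doubled slashes and repeated directory names are handled. It only prints commands and does not contact a server.

```shell
#!/usr/bin/env bash
# Sketch: emit a "mkdir"/"cd" pair for every component of a path.
# ftp_mkdir_cmds is a hypothetical name, not part of the original script.
ftp_mkdir_cmds() {
    local path=$1 part
    local -a parts
    # Split on "/" into an array; empty fields are kept and skipped below.
    IFS=/ read -r -a parts <<< "$path"
    for part in "${parts[@]}"; do
        [[ -n $part ]] || continue   # leading or doubled slash
        printf 'mkdir %s\ncd %s\n' "$part" "$part"
    done
}
```

Inside the here-document, $(ftp_mkdir_cmds "$FTP_PATH") would take the place of the ftp_mkdir_loop call. For an absolute path the commands are still relative to the directory the FTP session starts in, since no initial "cd /" is emitted.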

Bash curl returns 0 whether the copy has finished or not

I'm calling curl on bash to copy a file from a mounted SD card with the option to resume the copy later if the device gets unmounted. I receive the same status exit code 0 when I interrupt the copy by unmounting the volume and when the file gets actually copied. Any suggestions how to catch the case where the file has not been copied?
I'm copying only one file at a time.
This is the command:
curl -C - -O file:///mnt/sdcard/DCIM/100/0044.MP4
I came up with a solution which is not as clean as I would like, but it works. I execute the command twice in a row: when the first command returns 0 after an unmount, the second one tries to copy the file again and returns error code 37 because the source is unreachable. If the second command also returns 0, I consider the file copied.
Following your concept you could have a script like this:
#!/bin/bash
# Copies files persistently.
#
# Usage: pc <filepath> [<filepath2>] ...
#
function pc {
local FILE
for FILE; do
echo "Copying $FILE."
until curl -C - -O "file://${FILE}" && curl -C - -O "file://${FILE}"; do
if [[ -e $FILE ]]; then
echo "File $FILE can't be copied."
break
else
echo "Waiting for $FILE."
until
sleep 5
[[ -e $FILE ]]
do
continue
done
fi
done
done
}
pc "$@"
You could also just embed the function in a bash startup script if you like.
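As a complement to running curl twice, the result could also be checked directly by comparing the size of the source and the copy. This is a sketch under assumptions: copy_complete is a hypothetical helper name, GNU stat is assumed, and the check only works while the source is still readable. Equal sizes do not prove identical content, for which cmp would be the stricter test.

```shell
#!/usr/bin/env bash
# Sketch: treat a copy as complete only if source and destination both
# exist and have the same size. copy_complete is a hypothetical name.
copy_complete() {
    local src=$1 dst=$2
    # If the source is gone (e.g. unmounted), completion cannot be confirmed.
    [[ -f $src && -f $dst ]] || return 1
    [[ $(stat -c %s -- "$src") -eq $(stat -c %s -- "$dst") ]]
}
```

For the command in the question, this would be called as copy_complete /mnt/sdcard/DCIM/100/0044.MP4 ./0044.MP4 after curl returns, since -O saves the file under its base name in the current directory.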
