Download a fixed number of directories from an FTP server - shell

I have an FTP server with thousands of directories. What I want to do is download a specific number of them (for example, 500 directories) using a shell script. How can I do that? I tried wget with the -Q option, for example "wget -Q25MB", which limits the download to 25MB of data. The problem is that each folder has a different size, so this command stops the download in the middle of a folder.

Assuming wget returns an error when the download gets interrupted:
#!/bin/bash
to_del=() # empty array, in case you copy-paste this into a terminal instead of using a file
username=blablabla
password=blablabla
server=blablabla
printf -v today '%(%Y_%m_%d)T'

# Get the first 500 directory names to download ("." and ".." are filtered out first)
ftp -n "$server" << EOF | grep -v '^\.\.\?$' | head -n 500 > "to_download_$today.txt"
user $username $password
ls
bye
EOF

# Then, you can download each folder one by one:
while read -r dir; do
    if [[ -e $dir ]]; then
        echo >&2 "WARNING: '$dir' already exists!"
        continue # We don't download or remove it. Manual action needed
    fi
    # -r: a directory needs a recursive fetch; -nH: don't create a local host directory
    if wget -r -nH "ftp://$username:$password@$server/$dir"; then
        to_del+=("$dir")
    else
        # A directory was not successfully downloaded, we delete the temporary files
        echo >&2 "WARNING: '$dir' download failed, skipping..."
        rm -rf "$dir"
    fi
done < "to_download_$today.txt"

# Now, delete the successfully downloaded folders using a single FTP connection
# (NB: depending on the server, directories may need 'rmdir' instead of 'del')
{
    printf 'user %s %s\n' "$username" "$password"
    for dir in "${to_del[@]}"; do
        printf 'del %s\n' "$dir"
    done
    printf 'bye\n'
} | ftp -i -n "$server"


Unable to break out of While loop

I would like to break out of a loop when it gets to a blank line in a file. The issue is that the regexps I use to condition my data create lines with characters in them, so I need a check at the start of the loop to tell whether a line is empty so I can break out. What am I missing?
#!/bin/bash
#NOTES: chmod this script with chmod 755 to run as regular local user
#This line allows for passing in a source file as an argument to the script (i.e: ./script.sh source_file.txt)
input_file="$1"
#This creates the folder structure used to mount the SMB Share and copy the assets over to the local machines
SOURCE_FILES_ROOT_DIR="${HOME}/operations/source"
DESTINATION_FILES_ROOT_DIR="${HOME}/operations/copied_files"
#This creates the fileshare mount point and place to copy files over to on the local machine.
echo "Creating initial folders..."
mkdir -p "${SOURCE_FILES_ROOT_DIR}"
mkdir -p "${DESTINATION_FILES_ROOT_DIR}"
echo "Folders Created! Destination files will be copied to ${DESTINATION_FILES_ROOT_DIR}/SHARE_NAME"
while read -r line; do
    if [ -n "$line" ]; then
        continue
    fi
    line=${line/\\\\///}
    line=${line//\\//}
    line=${line%%\"*\"}
    SERVER_NAME=$(echo "$line" | cut -d / -f 4)
    SHARE_NAME=$(echo "$line" | cut -d / -f 5)
    ASSET_LOC=$(echo "$line" | cut -d / -f 6-)
    SMB_MOUNT_PATH="//$(whoami)@${SERVER_NAME}/${SHARE_NAME}"
    if df -h | grep -q "${SMB_MOUNT_PATH}"; then
        echo "${SHARE_NAME} is already mounted. Copying files..."
    else
        echo "Mounting it"
        mount_smbfs "${SMB_MOUNT_PATH}" "${SOURCE_FILES_ROOT_DIR}"
    fi
    cp -a ${SOURCE_FILES_ROOT_DIR}/${ASSET_LOC} ${DESTINATION_FILES_ROOT_DIR}
done < $input_file
# cleanup
hdiutil unmount ${SOURCE_FILES_ROOT_DIR}
exit 0
The expected result was for the script to realize when it gets to a blank line and then stop. The script works when I remove the
if [ -n "$line" ]; then
    continue
fi
The script runs and pulls assets but just keeps on going and never breaks out. When I run it as-is now, I get:
Creating initial folders...
Folders Created! Destination files will be copied to /Users/baguiar/operations/copied_files
Mounting it
mount_smbfs: server connection failed: No route to host
hdiutil: unmount: "/Users/baguiar/operations/source" failed to unmount due to error 16.
hdiutil: unmount failed - Resource busy
cat test.txt
This is some file
There are lines in it
And empty lines
etc
while read -r line; do
    if [[ -n "$line" ]]; then
        continue
    fi
    echo "$line"
done < "test.txt"
will print out only the blank lines, since every non-empty line hits the continue.
That's because -n matches strings that are not null, i.e., non-empty.
It sounds like you have a misunderstanding of what continue means. It does not mean "continue on within this iteration of the loop"; it means "continue to the next iteration of the loop", i.e., go back to the top of the while loop and run it with the next line of the file.
Right now, your script says "go line by line, and if the line is not empty, skip the rest of the processing". I think your goal is actually "go line by line, and if the line is empty, skip the rest of the processing". That would be achieved by if [[ -z "$line" ]]; then continue; fi
TL;DR You are skipping all the non-empty lines. Use -z to check if your variable is empty instead of -n.
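And since the stated goal is to stop at the first blank line rather than skip it, here is a minimal sketch of that variant (same loop shape as the question's script, with break instead of continue):

while read -r line; do
    if [[ -z "$line" ]]; then
        break   # blank line reached: leave the loop entirely
    fi
    # ... process "$line" here ...
done < "$input_file"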

Bash shell script not working

I have a script I am trying to work out that scans my LAN and sends me a notification if there is a new MAC address that does not appear in my master list. I believe my variables may be messed up. This is what I have:
#!/bin/bash
LIST=$HOME/maclist.log
MASTERFILE=$HOME/master
FILEDIFF="$(diff $LIST $MASTERFILE)"
# backup the maclist first
if [ -f $LIST ]; then
    cp $LIST maclist_`date +%Y%m%H%M`.log.bk
else
    touch $LIST
fi
# this will scan the network and extract the IP and MAC address
nmap -n -sP 192.168.122.0/24 | awk '/^Nmap scan/{IP=$5};/^MAC/{print IP,$3};{next}' > $LIST
# this will use a diff command to compare the maclist created above and master list of known good devices on the LAN
if [ $FILEDIFF ] 2> /dev/null; then
    echo
    echo "---- All is well on `date` ----" >> macscan.log
    echo
else
    # echo -e "\nWARNING!!" | `mutt -e 'my_hdr From:user@email.com' -s "WARNING!! NEW DEVICE ON THE LAN" -i maclist.log user@email.com`
    echo "emailing you"
fi
When I execute this while maclist.log does not exist, I get this response:
diff: /root/maclist.log: No such file or directory
If I execute it again with the maclist.log file existing, the file gets backed up by the cp line without any issue.
The line
FILEDIFF="$(diff $LIST $MASTERFILE)"
executes the diff at the moment it is run (not when you use $FILEDIFF later). At that time the list file hasn't been created yet.
The easiest fix is just to put the diff command in full where $FILEDIFF is currently used.
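For example, a minimal sketch of that fix, keeping the question's variable names and relying on diff's exit status (0 means the files match) instead of capturing its output up front:

# scan first, then compare the fresh list against the master
nmap -n -sP 192.168.122.0/24 | awk '/^Nmap scan/{IP=$5};/^MAC/{print IP,$3};{next}' > "$LIST"
if diff "$LIST" "$MASTERFILE" > /dev/null; then
    echo "---- All is well on $(date) ----" >> macscan.log
else
    echo "emailing you"
fi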

Monitor and ftp newly-added files on Linux -- modify existing code

The OS is CentOS 7. I have a small application that implements the functionality below:
1. Read information from config.ini like this:
# Configuration file for ftpxml service
# Remote FTP server informations
ftpadress=1.2.3.4
username=test
password=test
# Local folders configuration
# folderA: folder for incomming files
folderA=/u02/dump
# folderB: Successfuly transfered files are copied here
folderB=/u02/dump_bak
# retrydir: when ftp upload fails, store failed files in this
# directory
retrydir=/u02/dump_retry
2. Monitor folder A. If there are any newly-added files in A, do step 3.
3. FTP the new files to a remote FTP server in the order of their creation time. When an upload finishes, copy the uploaded files to folder B and delete them from folder A.
4. If an FTP transfer fails, store the relevant files in retrydir and try to FTP them again later.
5. Record every operation in a log file.
Detailed setup instructions for the application:
Install the ncftp package: yum install ncftp -y. It is not a service or a daemon, just a client tool invoked in the bash script for the FTP transfers.
Customize these files to suit your setup using vi: config.ini, ftpmon.path and ftpmon.service.
Copy ftpmon.path and ftpmon.service to /etc/systemd/system/, copy config.ini and ftpxml.sh to /u02/ftpxml/, then run: chmod +x ftpxml.sh
Start the monitoring tool:
sudo systemctl start ftpmon.path
If you want to enable it at boot time, just enter: sudo systemctl enable ftpmon.path
Set up a cron task to purge queued files (add the -p option):
*/5 * * * * /u02/ftpxml/ftpxml.sh -p
Now the application seems to work well, except in one special situation:
When we put several files into folder A continuously, for instance 1.txt, 2.txt and 3.txt one after another in a short time, we usually find that 1.txt transfers fine, but the subsequent files fail to transfer and remain in folder A.
Now I am trying to fix this problem. I suspect the cause is that while the first file is being transferred, the second file has already been created in folder A, so the script never picks it up.
Below is code of ftpxml.sh:
#!/bin/bash
# ! Please read the README.txt file !
# Copy files from upload dir to remote FTP then save them
# to folderB

# Absolute path to this script
SCRIPT=$(readlink -f $0)
# Directory containing this script
SCRIPTPATH=`dirname $SCRIPT`
PIDFILE=${SCRIPTPATH}/ftpmon_prog.lock

# load config.ini
if [ -f $SCRIPTPATH/config.ini ]; then
    source $SCRIPTPATH/config.ini
else
    echo "No config found. Exiting"
fi

# Lock to avoid multiple instances
if [ -f $PIDFILE ]; then
    kill -0 $(cat $PIDFILE) 2> /dev/null
    if [ $? == 0 ]; then
        exit
    fi
fi

# Store PID in lock file
echo $$ > $PIDFILE

# Parse cmdline arguments
while getopts ":ph" opt; do
    case $opt in
        p)
            # we set the purge mode (cron mode)
            purge_only=1
            ;;
        \?|h)
            echo "Help text"
            exit 1
            ;;
    esac
done

# declare useful functions
# common logging function
function logmsg() {
    LOGFILE=ftp_upload_`date +%Y%m%d`.log
    echo $(date +%m-%d-%Y\ %H:%M:%S) $* >> $SCRIPTPATH/log/${LOGFILE}
}

# Upload to remote FTP
# we use ncftpput to batch silently
# $1: file to upload
function upload() {
    ncftpput -V -u $username -p $password $ftpadress /prog/ $1
    return $?
}

function purge_retry() {
    failed_files=$(ls -1 -rt ${retrydir}/*)
    if [ -z "$failed_files" ]; then
        return
    fi
    while read line
    do
        #upload ${retrydir}/$line
        upload $line
        if [ $? != 0 ]; then
            # upload failed, we exit
            exit
        fi
        logmsg File $line Uploaded from ${retrydir}
        mv $line $folderB
        logmsg File $line Copied from ${retrydir}
    done <<< "$failed_files"
}

# first check the 'retry' queue directory
purge_retry

# if called from a cron task we are done
if [ $purge_only ]; then
    rm -f $PIDFILE
    exit 0
fi

# look in incoming dir
new_files=$(ls -1 -rt ${folderA}/*)
while read line
do
    if [ Z$line == 'Z' ]; then
        break
    fi
    # launch upload
    #upload ${folderA}/$line
    upload $line
    if [ $? == 0 ]; then
        logmsg File $line Uploaded from ${folderA}
    else
        # upload failed, we cp to retry folder
        echo upload failed
        cp $line $retrydir
    fi
    # whether the upload succeeded or failed, we ALWAYS move the file to folderB
    mv $line $folderB
    logmsg File $line Copied from ${folderA}
done <<< "$new_files"

# clean exit
rm -f $PIDFILE
exit 0
Below is the content of ftpmon.path:
[Unit]
Description= Triggers the service that logs changes.
Documentation= man:systemd.path
[Path]
# Enter the path to monitor (/u02/dump)
PathModified=/u02/dump/
[Install]
WantedBy=multi-user.target
Below is the content of ftpmon.service:
[Unit]
Description= Starts File Upload monitoring
Documentation= man:systemd.service
[Service]
Type=oneshot
#Set here the user that ftpxml.sh will run as
User=root
#Set the exact path to the script
ExecStart=/u02/ftpxml/ftpxml.sh
[Install]
WantedBy=default.target
Thanks in advance; I hope some experts can give me suggestions.
Since you remove the successfully transferred files from A, you can leave the files with transfer errors in A. So I deal only with the files in one folder.
List your files by modification time (the closest thing to creation time available here) with
find . -maxdepth 1 -type f -print0 | xargs -r0 stat -c '%y %n' | sort
if you want hidden files to be included, or, if not:
find . -maxdepth 1 -type f ! -name '.*' -print0 | xargs -r0 stat -c '%y %n' | sort
You'll get something like
2016-02-19 18:53:41.000000000 ./.dockerenv
2016-02-19 18:53:41.000000000 ./.dockerinit
2016-02-19 18:56:09.000000000 ./versions.txt
2016-02-19 19:01:44.000000000 ./test.sh
Now cut the filenames off (or use xargs -r0 stat -c %n if it does not matter that the files are then ordered by name instead of by timestamp) and, for each file:
do the transfer,
check the success,
move successfully transferred files to B, as sketched below.
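A minimal sketch of that loop, assuming the upload() function and the $folderA/$folderB variables from the question's script are available (%Y, the mtime in epoch seconds, is used here so the filename column is easy to cut off):

# oldest first: numeric sort on the epoch timestamp, then drop the timestamp column
find "$folderA" -maxdepth 1 -type f -print0 | xargs -r0 stat -c '%Y %n' | sort -n | cut -d' ' -f2- |
while IFS= read -r f; do
    if upload "$f"; then
        mv "$f" "$folderB"   # move only after a confirmed successful transfer
    fi
done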
As you stated above, there are situations where newly stored files are not successfully transferred. This can happen when the file is still being written to after you have started the transfer. So filter the timestamps to require files to be at least some time old. Add -mmin +0 to the find statement for "at least one minute old":
find . -maxdepth 1 -type f -mmin +0 -print0 | xargs -r0 stat -c %n | sort
If you don't want to rely on a minimum file age, you'll have to check whether the file is still open, e.g. with lsof | grep ./testfile, but this may have issues if you have tmpfs in your file system:
lsof | grep ./testfile
lsof: WARNING: can't stat() tmpfs file system /var/lib/docker/containers/8596cd310292a54652c7f50d7315c8390703b4816442146b340946779a72a40c/shm
Output information may be incomplete.
lsof: WARNING: can't stat() proc file system /run/docker/netns/fb9323486c44
Output information may be incomplete.
So add %s to the stat format to capture the file size, check the size twice within some seconds, and if it is constant, the file may have been written completely. May, because the writing process could also just be stalled.
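A minimal sketch of that stable-size check (the helper name and the two-second interval are my choices, not from the original):

# returns success if the file's size did not change over a short interval
file_is_stable() {
    local size1 size2
    size1=$(stat -c %s "$1")
    sleep 2
    size2=$(stat -c %s "$1")
    [ "$size1" = "$size2" ]
}

# usage: only upload files that no longer seem to be written to
if file_is_stable "$f"; then
    upload "$f"
fi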

Bash script with a loop not executing utility that has parameters passed in?

Is anyone able to help me out? I have a shell script I am working on, but in the loop below the command after echo "First file is $firstbd" is not being executed: the $PROBIN/proutil call. Not sure why this is...
Basically I have a list of files in a directory (*.list). I grab each one, read its first line and pass that as a parameter to the utility, then move the .list file and the files it names to another directory (the .list contains a list of files with full paths).
for i in $(ls $STAGEDIR/*.list); do
    echo "Working with $i"
    # grab first .bd file
    firstbd=`head -1 $i`
    echo "First file is $firstbd"
    $PROBIN/proutil $DBENV/$DBNAME -C load $firstbd tenant $TENANT -dumplist $STAGEDIR/$i.list >> $WRKDIR/$i.load.log
    # move the list and its content to finished folder
    binlist=`cat $i`
    for movethis in $binlist; do
        echo "Moving file $movethis to $STAGEDIR/finished"
        mv $movethis $STAGEDIR/finished/
    done
    echo "Finished working with list $i"
    echo "Moving it to $STAGEDIR/finished"
    mv $i $STAGEDIR/finished/
done
The error I was getting is:
./tableload.sh: line 107: /usr4/dlc/bin/proutil /usr4/testdbs/xxxx2 -C load /usr4/dumpdir/xxxxx.bd tenant xxxxx -dumplist /usr4/dumpdir/PUB.xxxxx.list >> /usr4/dumpdir/PUB.xxxx.list.load.log: A file or directory in the path name does not exist... however if I run "/usr4/dlc/bin/proutil"
The fix was to remove ">> $WRKDIR/$i.load.log"; the binary utility wouldn't run when trying to redirect its results to a file. Strange...
There are a couple of really bad practices here:
parsing the output of ls,
not quoting variables,
iterating over the lines of a file with cat and for.
As shelter comments, you also don't check that you've created all the directories in the path for your log file.
A rewrite:
for i in "$STAGEDIR"/*.list; do
echo "Working with $i"
# grab first .bd file
firstbd=$(head -1 "$i")
echo "First file is $firstbd"
# ensure the output directory exists
logfile="$WRKDIR/$i.load.log"
mkdir -p "$(dirname "$logfile")"
"$PROBIN"/proutil "$DBENV/$DBNAME" -C load "$firstbd" tenant "$TENANT" -dumplist "$STAGEDIR/$i.list" >> "$logfile"
# move the list and its content to finished folder
while IFS= read -r movethis; do
echo "Moving file $movethis to $STAGEDIR/finished"
mv "$movethis" "$STAGEDIR"/finished/
done < "$i"
echo "Finished working with list $i"
echo "Moving it to $STAGEDIR/finished"
mv "$i" "$STAGEDIR"/finished/
done
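As a quick illustration of why the unquoted cat-and-for iteration bites (a toy example of mine, not from the original): the unquoted expansion is word-split on whitespace and then glob-expanded, while read -r preserves each line intact.

binlist='my file.txt
*.log'
for f in $binlist; do
    echo "[$f]"        # prints [my] and [file.txt], and expands *.log against $PWD
done
while IFS= read -r f; do
    echo "[$f]"        # prints [my file.txt] and the literal [*.log]
done <<< "$binlist"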

Bash: Check if remote directory exists using FTP

I'm writing a bash script to send files from a linux server to a remote Windows FTP server.
I would like to check using FTP if the folder where the file will be stored exists before attempting to create it.
Please note that I cannot use SSH or SCP, and I cannot install new scripts on the Linux server. Also, for performance reasons, I would prefer that checking and creating the folders be done over a single FTP connection.
Here's the function to send the file:
sendFile() {
    ftp -n $FTP_HOST <<! >> ${LOCAL_LOG}
quote USER ${FTP_USER}
quote PASS ${FTP_PASS}
binary
$(ftp_mkdir_loop "$FTP_PATH")
put ${FILE_PATH} ${FTP_PATH}/${FILENAME}
bye
!
}
And here's what ftp_mkdir_loop looks like:
ftp_mkdir_loop() {
    local r
    local a
    r="$@"
    while [[ "$r" != "$a" ]]; do
        a=${r%%/*}
        echo "mkdir $a"
        echo "cd $a"
        r=${r#*/}
    done
}
The ftp_mkdir_loop function helps in creating all the folders in $FTP_PATH one by one (since I cannot do mkdir -p $FTP_PATH through FTP).
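For example, tracing the function by hand with a hypothetical three-level path, ftp_mkdir_loop "dir1/dir2/dir3" emits:

mkdir dir1
cd dir1
mkdir dir2
cd dir2
mkdir dir3
cd dir3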
Overall my script works, but it is not "clean"; this is what I get in my log file after the script executes (yes, $FTP_PATH is composed of 5 existing directories):
(directory-name) Cannot create a file when that file already exists.
Cannot create a file when that file already exists.
Cannot create a file when that file already exists.
Cannot create a file when that file already exists.
Cannot create a file when that file already exists.
To solve this, do as follows:
To ensure that you only use one FTP connection, you create the input (the FTP commands) as the output of a shell script, e.g.:
$ cat a.sh
cd /home/test1
mkdir /home/test1/test2
$ ./a.sh | ftp $Your_login_and_server > /your/log 2>&1
To let FTP test whether a directory exists, use the fact that the DIR command can write its output to a local file:
# ...continuing a.sh
# In a loop, $CURRENT_DIR is the next subdirectory to check-or-create
echo "DIR $CURRENT_DIR $local_output_file"
sleep 5 # to leave time for the file to be created
if [ ! -s "$local_output_file" ]; then
    echo "mkdir $CURRENT_DIR"
fi
Please note that "-s" test is not necessarily correct - I don't have acccess to ftp now and don't know what the exact output of running DIR on non-existing directory will be - cold be empty file, could be a specific error. If error, you can grep the error text in $local_output_file
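A sketch of that grep variant (the "not found" pattern here is a placeholder; the actual message is server-specific):

# assumption: the server writes an error such as "... not found" for a missing directory
if grep -qi "not found" "$local_output_file"; then
    echo "mkdir $CURRENT_DIR"
fi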
Now, wrap step #2 in a loop over your individual subdirectories in a.sh.
#!/bin/bash
FTP_HOST=prep.ai.mit.edu
FTP_USER=anonymous
FTP_PASS=foobar@example.com
DIRECTORY=/foo # /foo does not exist, /pub exists
LOCAL_LOG=/tmp/foo.log
ERROR="Failed to change directory"
ftp -n $FTP_HOST << EOF | tee -a ${LOCAL_LOG} | grep -q "${ERROR}"
quote USER ${FTP_USER}
quote pass ${FTP_PASS}
cd ${DIRECTORY}
EOF
if [[ "${PIPESTATUS[2]}" -eq 1 ]]; then
    echo ${DIRECTORY} exists
else
    echo ${DIRECTORY} does not exist
fi
Output:
/foo does not exist
If you want to suppress only the messages in ${LOCAL_LOG}:
ftp -n $FTP_HOST <<! | grep -v "Cannot create a file" >> ${LOCAL_LOG}
