program starting next line of code when gzip is still running - shell

In a shell script I first generate a file, then compress it with gzip, and then transfer it to a remote machine with scp. The problem is that execution moves on to the next line before gzip has finished, so scp transfers a partial .gz file to the remote machine.
In other words, gzip starts compressing, but because the file is large and compression takes a while, the next command (scp) runs before the compression completes, and the remote machine ends up with an incomplete file.
My question: is there a gzip option that keeps the script from moving on to the next line until compression has completed successfully?
gzip is used like this in my code:
gzip -c <filename> > <zip_filename> 2>&1 | tee -a <log_filename>
Please suggest.

First of all, if I do this
gzip -cr /home/username > homefolder.gz
bash waits until the command has finished before executing the next one (even when used in a script).
However, if I'm mistaken, or you ran the command in the background, you can wait until gzip is finished using the following code:
#!/bin/bash
gzip -cr "$1" > "$1.gz" &
while true; do
    if fuser -s "$1" "$1.gz"; then
        echo "gzip still compressing '$1'"
        sleep 1  # poll once per second instead of spinning
    else
        echo "gzip finished compressing '$1' (archive is '$1.gz')"
        break
    fi
done
exit 0
Or if you just want to wait and do nothing more, it's even simpler:
gzip -cr "$1" > "$1.gz"
while fuser -s "$1" "$1.gz"; do :; done
# copy your file wherever you want
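If gzip really is running in the background, the shell's built-in wait is a cheaper alternative to polling with fuser. The following is only a minimal sketch: the temporary input file and the variable names are made up for the demo, and the place where scp would run is just marked by a variable.

```shell
#!/bin/bash
# Demo input; in a real script this would be the generated file.
workdir=$(mktemp -d)
src="$workdir/data.txt"
seq 1 100000 > "$src"

gzip -c "$src" > "$src.gz" &    # start gzip in the background
gzip_pid=$!

wait "$gzip_pid"                # blocks until that gzip process has exited
gzip_status=$?                  # exit status of gzip itself

# Only transfer if gzip succeeded and the archive passes an integrity test.
if [ "$gzip_status" -eq 0 ] && gzip -t "$src.gz"; then
    transfer_ready=yes          # scp would go here
else
    transfer_ready=no
fi
echo "gzip exit status: $gzip_status, ready to transfer: $transfer_ready"
rm -rf "$workdir"
```

Checking the exit status before the transfer also catches the case where gzip finished but failed, which polling alone would not notice.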


Extracting certain files from a tar archive on a remote ssh server

I am running numerous simulations on a remote server (via ssh). The outcomes of these simulations are stored as .tar archives in an archive directory on this remote server.
What I would like to do, is write a bash script which connects to the remote server via ssh and extracts the required output files from each .tar archive into separate folders on my local hard drive.
These folders should have the same name as the .tar file from which the files come (To give an example, say the output of simulation 1 is stored in the archive S1.tar on the remote server, I want all '.dat' and '.def' files within this .tar archive to be extracted to a directory S1 on my local drive).
For the extraction itself, I was trying:
for f in *.tar; do
(
mkdir ../${f%.tar}
tar -x -f "$f" -C ../${f%.tar} "*.dat" "*.def"
)
done
wait
Every .tar file is around 1GB and there is a lot of them. So downloading everything takes too much time, which is why I only want to extract the necessary files (see the extensions in the code above).
Now the code works perfectly when I have the .tar files on my local drive. However, what I can't figure out is how I can do it without first having to download all the .tar archives from the server.
When I first connect to the remote server via ssh username@host, the script stops there and the terminal just opens a session on the server.
Btw I am doing this in VS Code and running the script through terminal on my MacBook.
I hope I have described it clear enough. Thanks for the help!
Stream the results of tar back with filenames via SSH
To get the data you wish to retrieve from .tar files, you'll need to pass the results of tar to a string of commands with the --to-command option. In the example below, we'll run three commands.
# Send the files name back to your shell
echo $TAR_FILENAME
# Send the contents of the file back
cat /dev/stdin
# Send EOF (Ctrl+d) back (note: since we're already in a $'' we don't use the $ again)
echo '\004'
Once the information is captured in your shell, we can start to process the data. This is a three-step process.
Get the file's name
note that, in this code, we aren't handling directories at all (simply stripping them away; i.e. dir/1.dat -> 1.dat)
you can write code to create directories for the file by replacing the forward slashes / with spaces and iterating over each directory name but that seems out-of-scope for this.
Check for the EOF (end-of-file)
Add content to file
# Get the files via ssh and tar
files=$(ssh -n <user@server> $'tar -xf <tar-file> --wildcards \'*\' --to-command=$\'echo $TAR_FILENAME; cat /dev/stdin; echo \'\004\'\'')
# Keeps track of what state we're in (filename or content)
state="filename"
filename=""
# Each line is one of these:
# - file's name
# - file's data
# - EOF
while read line; do
if [[ $state == "filename" ]]; then
filename=${line/*\//}
touch $filename
echo "Copying: $filename"
state="content"
elif [[ $state == "content" ]]; then
# look for EOF (ctrl+d)
if [[ $line == $'\004' ]]; then
filename=""
state="filename"
else
# append data to file
echo $line >> <output-folder>/$filename
fi
fi
# Double quotes here are very important
done < <(echo -e "$files")
Alternative: tar + scp
If the above example seems overly complex for what it's doing, it is. An alternative that touches the disk more and requires two separate ssh connections would be to extract the files you need from your .tar file to a folder and scp that folder back to your workstation.
ssh -n <username>@<server> 'mkdir output/; tar -C output/ -xf <tar-file> --wildcards "*.dat" "*.def"'
scp -r <username>@<server>:output/ ./
The breakdown
First, we'll make a place to keep our outputted files. You can skip this if you already know the folder they'll be in.
mkdir output/
Then, we'll extract the matching files to this folder we created (if you don't want them to be in a different folder remove the -C output/ option).
tar -C output/ -xf <tar-file> --wildcards "*.dat" "*.def"
Lastly, now that we're running commands on our machine again, we can run scp to reconnect to the remote machine and pull the files back.
scp -r <username>@<server>:output/ ./
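If two connections and a leftover output/ folder on the server are a concern, the same idea can be sketched with a single pipe: unpack only the wanted members into a scratch directory on the server, re-pack them to stdout, and unpack that stream locally. This assumes GNU tar on the server (for --wildcards). run_remote is a hypothetical helper that runs the command locally so the sketch can be tried without a server, and the archive S1.tar is built on the fly for the same reason.

```shell
#!/bin/bash
# Stand-in for the server so the sketch runs locally.
# For a real server (assumption): run_remote() { ssh user@host "$1"; }
run_remote() { sh -c "$1"; }

demo=$(mktemp -d)

# Throwaway archive playing the role of S1.tar on the server.
mkdir "$demo/src" "$demo/S1"
touch "$demo/src/a.dat" "$demo/src/b.def" "$demo/src/c.log"
tar -cf "$demo/S1.tar" -C "$demo/src" a.dat b.def c.log

# Server side: extract only *.dat and *.def into a scratch directory,
# stream them back as a small tar, then remove the scratch directory.
remote_cmd="tmp=\$(mktemp -d) && tar -xf '$demo/S1.tar' -C \"\$tmp\" --wildcards '*.dat' '*.def' && tar -cf - -C \"\$tmp\" . ; rm -rf \"\$tmp\""

# Local side: unpack the stream into a folder named after the archive.
run_remote "$remote_cmd" | tar -xf - -C "$demo/S1"

extracted=$(cd "$demo/S1" && echo *)
echo "extracted: $extracted"
rm -rf "$demo"
```

Only the selected files cross the network, and nothing is left behind on the server. Wrapping the last two steps in a loop over archive names would give one local folder per archive.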

Monitor Pre-existing and new files in a directory with bash

I have a script using inotify-tool.
This script notifies when a new file arrives in a folder. It performs some work with the file, and when done it moves the file to another folder. (it looks something along these line):
inotifywait -m -e modify "${path}" |
while read NEWFILE
do
    work on/with NEWFILE
    move NEWFILE to a new directory
done
By using inotifywait, one can only monitor new files. A similar procedure using for OLDFILE in path instead of inotifywait will work for existing files:
for OLDFILE in "${path}"/*
do
    work on/with OLDFILE
    move OLDFILE to a new directory
done
I tried combining the two loops by running the second loop first. But if files arrive quickly and in large numbers, there is a chance that files will arrive while the second loop is running. These files will then not be captured by either loop.
Given that files already exists in a folder, and that new files will arrive quickly inside the folder, how can one make sure that the script will catch all files?
Once inotifywait is up and waiting, it will print the message Watches established. to standard error. So you need to go through existing files after that point.
So, one approach is to write something that will process standard error, and when it sees that message, lists all the existing files. You can wrap that functionality in a function for convenience:
function list-existing-and-follow-modify() {
local path="$1"
inotifywait --monitor \
--event modify \
--format %f \
-- \
"$path" \
2> >( while IFS= read -r line ; do
printf '%s\n' "$line" >&2
if [[ "$line" = 'Watches established.' ]] ; then
for file in "$path"/* ; do
if [[ -e "$file" ]] ; then
basename "$file"
fi
done
break
fi
done
cat >&2
)
}
and then write:
list-existing-and-follow-modify "$path" \
| while IFS= read -r file
do
    # ... work on/with "$file"
    # move "$file" to a new directory
done
Notes:
If you're not familiar with the >(...) notation that I used, it's called "process substitution"; see https://www.gnu.org/software/bash/manual/bash.html#Process-Substitution for details.
The above will now have the opposite race condition from your original one: if a file is created shortly after inotifywait starts up, then list-existing-and-follow-modify may list it twice. But you can easily handle that inside your while-loop by using if [[ -e "$file" ]] to make sure the file still exists before you operate on it.
I'm a bit skeptical that your inotifywait options are really quite what you want; modify, in particular, seems like the wrong event. But I'm sure you can adjust them as needed. The only changes I've made above, other than switching to long options for clarity and adding -- for robustness, is to add --format %f so that you get the filenames without extraneous details.
There doesn't seem to be any way to tell inotifywait to use a separator other than newlines, so, I just rolled with that. Make sure to avoid filenames that include newlines.
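To make the duplicate handling from the notes concrete, here is a small sketch that does not need inotifywait at all: it feeds the loop a made-up stream in which a.txt is reported twice and uses the existence check to skip the second report. The directory names and the processed counter are assumptions for the demo only.

```shell
#!/bin/bash
demo=$(mktemp -d)
mkdir "$demo/in" "$demo/done"
touch "$demo/in/a.txt" "$demo/in/b.txt"

processed=0
while IFS= read -r file; do
    # Skip names whose file is gone, e.g. because it was already handled.
    [ -e "$demo/in/$file" ] || continue
    mv -- "$demo/in/$file" "$demo/done/"
    processed=$((processed + 1))
done <<EOF
a.txt
b.txt
a.txt
EOF
echo "processed $processed file(s)"
rm -rf "$demo"
```

Because the file is moved away as the last step of processing, the second mention of a.txt finds nothing and is ignored.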
By using inotifywait, one can only monitor new files.
I would ask for a definition of a "new file". man inotifywait lists the available events, including create, delete and delete_self, and inotifywait can also watch "old files" (files that existed before inotifywait was started) and directories. You specified a single event, -e modify, which reports modifications of files within ${path}; that covers both pre-existing files and files created after inotifywait started.
... how can one make sure that the script will catch all files?
Your script is already enough to catch all the events that happen inside the path. If you have no means of synchronization between the part that generates files and the part that receives them, there is nothing you can do, and there will always be a race condition. What if your script receives 0% of CPU time and the part that generates the files gets 100%? There is no guarantee of CPU time between processes (unless you use a certified real-time system...). Implement synchronization between them.
You can watch some other event. If the generating side closes files when it is done with them, watch for the close event. You could also run work on/with NEWFILE in parallel in the background to speed up execution and the reading of new files. But if the receiving side is slower than the sending side, that is, if your script works through NEWFILEs more slowly than the generating part creates them, there is nothing you can do...
If you have no special characters and spaces in filenames, I would go with:
inotifywait -m -e modify "${path}" |
while IFS=' ' read -r path event file ;do
lock "${path}"
work on "${path}/${file}"
ex. mv "${path}/${file}" ${new_location}
unlock "${path}"
done
where lock and unlock are some locking mechanism implemented between your script and the generating part. You can set up communication between the process that creates the files and the process that handles them.
I think you can use some transaction file system, that would let you to "lock" a directory from the other scripts until you are ready with the work on it, but I have no experience in that field.
I tried combining the two loops. But if files arrive quickly and in large numbers, there is a chance that files will arrive while the second loop is running.
Run the process_new_files_loop in the background before running the process_old_files_loop. It would also be nice to make sure (i.e. synchronize) that inotifywait has started successfully before you continue to the loop that processes existing files, so that there is no race condition between the two either.
Maybe a simple example and/or startpoint would be:
work() {
local file="$1"
some work "$file"
mv "$file" "$predefined_path"
}
process_new_files_loop() {
# let's work on modified files in parallel, so that it is faster
trap 'wait' INT
inotifywait -m -e modify "${path}" |
while IFS=' ' read -r path event file ;do
work "${path}/${file}" &
done
}
process_old_files_loop() {
# maybe we should parse in parallel here too?
# maybe export -f work; find "${path}" -type f | xargs -P0 -n1 -- bash -c 'work $1' -- ?
find "${path}" -type f |
while IFS= read -r file; do
work "${file}"
done
}
process_new_files_loop &
child=$!
sleep 1
if ! ps -p "$child" >/dev/null 2>&1; then
echo "ERROR running processing-new-file-loop" >&2
exit 1
fi
process_old_files_loop
wait # wait for process_new_file_loop
If you really care about execution speed, switch to Python or C (or anything but shell). Bash is not fast: it is a shell, meant to connect processes (passing the stdout of one to the stdin of another). Parsing a stream line by line with while IFS= read -r line is extremely slow in bash and should generally be a last resort. Using xargs, as in xargs -P0 -n1 sh -c "work on $1; mv $1 $path" --, or parallel might speed things up, but an average Python or C program will probably be many times faster.
A simpler solution is to add an ls in front of the inotifywait in a subshell, with awk to create output that looks like inotifywait.
I use this to detect and process existing and new files:
(ls ${path} | awk '{print "'${path}' EXISTS "$1}' && inotifywait -m ${path} -e close_write -e moved_to) |
while read dir action file; do
echo $action $dir $file
# DO MY PROCESSING
done
So it runs ls, formats the output and sends it to stdout, then runs inotifywait in the same subshell, whose output also goes to stdout for processing.

Monitor and ftp newly-added files on Linux -- modify existing code

The OS is centos 7, I have a small application to implement below functionality:
1. Read information from config.ini like this:
# Configuration file for ftpxml service
# Remote FTP server informations
ftpadress=1.2.3.4
username=test
password=test
# Local folders configuration
# folderA: folder for incomming files
folderA=/u02/dump
# folderB: Successfuly transfered files are copied here
folderB=/u02/dump_bak
# retrydir: when ftp upload fails, store failed files in this
# directory
retrydir=/u02/dump_retry
2. Monitor folder A. If there are any newly-added files in A, do step 3.
3. Ftp these new files to a remote ftp server in the order of their creation time. When the upload has finished, copy the uploaded files to folder B and delete the relevant files in folder A.
4. If ftp fails, store the relevant files in retrydir and try to ftp them later.
5. Record every operation in a log file.
Detailed setting instruction for the application:
install ncftp package: yum install ncftp -y, it's not a service nor a daemon, just a client tool which is invoked in bash file for ftp purpose.
Customize these files to suit your setting using vi: config.ini, ftpmon.path and ftpmon.service
copy ftpmon.path and ftpmon.service to /etc/systemd/system/, copy config.ini and ftpxml.sh to /u02/ftpxml/, run: chmod +x ftpxml.sh
Start the monitoring tool
sudo systemctl start ftpmon.path
If you want to enable it at boot time just enter: sudo systemctl enable ftpmon.path
Setup a cron task to purge queued files (add option -p)
*/5 * * * * /u02/ftpxml/ftpxml.sh -p
Now the application seems to work well, except in one special situation:
When we put several files in folder A in quick succession, for instance 1.txt, 2.txt and 3.txt one after another within a short time, we usually find that 1.txt is uploaded fine, but the following files fail to upload and stay in folder A.
Now I am going to fix this problem. I suppose the error may be due to the following: while the first file is being uploaded, the second file has already been created in folder A, so the code never handles the second file.
Below is code of ftpxml.sh:
#!/bin/bash
# ! Please read the README.txt file !
# Copy files from upload dir to remote FTP then save them
# to folderB
# look our location
SCRIPT=$(readlink -f $0)
# Absolute path to this script
SCRIPTPATH=`dirname $SCRIPT`
PIDFILE=${SCRIPTPATH}/ftpmon_prog.lock
# load config.ini
if [ -f $SCRIPTPATH/config.ini ]; then
source $SCRIPTPATH/config.ini
else
echo "No config found. Exiting"
fi
# Lock to avoid multiple instances
if [ -f $PIDFILE ]; then
kill -0 $(cat $PIDFILE) 2> /dev/null
if [ $? == 0 ]; then
exit
fi
fi
# Store PID in lock file
echo $$ > $PIDFILE
# Parse cmdline arguments
while getopts ":ph" opt; do
case $opt in
p)
#we set the purge mode (cron mode)
purge_only=1
;;
\?|h)
echo "Help text"
exit 1
;;
esac
done
# declare usefull functions
# common logging function
function logmsg() {
LOGFILE=ftp_upload_`date +%Y%m%d`.log
echo $(date +%m-%d-%Y\ %H:%M:%S) $* >> $SCRIPTPATH/log/${LOGFILE}
}
# Upload to remote FTP
# we use ncftpput to batch silently
# $1 file to upload $2 return value placeholder
function upload() {
ncftpput -V -u $username -p $password $ftpadress /prog/ $1
return $?
}
function purge_retry() {
failed_files=$(ls -1 -rt ${retrydir}/*)
if [ -z $failed_files ]; then
return
fi
while read line
do
#upload ${retrydir}/$line
upload $line
if [ $? != 0 ]; then
# upload failed we exit
exit
fi
logmsg File $line Uploaded from ${retrydir}
mv $line $folderB
logmsg File $line Copyed from ${retrydir}
done <<< "$failed_files"
}
# first check out 'queue' directory
purge_retry
# if called from a cron task we are done
if [ $purge_only ]; then
rm -f $PIDFILE
exit 0
fi
# look in incoming dir
new_files=$(ls -1 -rt ${folderA}/*)
while read line
do
# launch upload
if [ Z$line == 'Z' ]; then
break
fi
#upload ${folderA}/$line
upload $line
if [ $? == 0 ]; then
logmsg File $line Uploaded from ${folderA}
else
# upload failed we cp to retry folder
echo upload failed
cp $line $retrydir
fi
# don't care upload successfull or failed, we ALWAYS move the file to folderB
mv $line $folderB
logmsg File $line Copyed from ${folderA}
done <<< "$new_files"
# clean exit
rm -f $PIDFILE
exit 0
below is content of ftpmon.path:
[Unit]
Description= Triggers the service that logs changes.
Documentation= man:systemd.path
[Path]
# Enter the path to monitor (/u02/dump)
PathModified=/u02/dump/
[Install]
WantedBy=multi-user.target
below is content of ftpmon.service:
[Unit]
Description= Starts File Upload monitoring
Documentation= man:systemd.service
[Service]
Type=oneshot
#Set here the user that ftpmxml.sh will run as
User=root
#Set the exact path to the script
ExecStart=/u02/ftpxml/ftpxml.sh
[Install]
WantedBy=default.target
Thanks in advance, hope any experts can give me some suggestion.
Since you remove successfully transferred files from A, you can simply leave files whose transfer failed in A. So I am dealing with files in one folder only.
List your files by timestamp (stat's %y is the last modification time, used here as a stand-in for creation time) with
find . -maxdepth 1 -type f -print0 | xargs -r0 stat -c %y\ %n | sort
if you want hidden files to be included or, if not, with
find ./* -maxdepth 0 -type f -print0 | xargs -r0 stat -c %y\ %n | sort
You'll get something like
2016-02-19 18:53:41.000000000 ./.dockerenv
2016-02-19 18:53:41.000000000 ./.dockerinit
2016-02-19 18:56:09.000000000 ./versions.txt
2016-02-19 19:01:44.000000000 ./test.sh
Now cut out the filenames (or use xargs -r0 stat -c %n if it does not matter that the files are ordered by name instead of by timestamp) and
do the transfer
check the success
move successfully transfered files to B
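The three steps above can be sketched roughly as follows. This is not the original ftpxml.sh: upload is a hypothetical stand-in for the real transfer (it deliberately fails for b.txt), the folders are temporary, and GNU find is assumed for -printf. Filenames containing newlines are not handled.

```shell
#!/bin/bash
demo=$(mktemp -d)
folderA="$demo/A"; folderB="$demo/B"
mkdir "$folderA" "$folderB"

# Three files whose modification times run opposite to their names.
touch -t 202001010001 "$folderA/c.txt"
touch -t 202001010002 "$folderA/b.txt"
touch -t 202001010003 "$folderA/a.txt"

# Hypothetical stand-in for the real transfer; fails for b.txt on purpose.
upload() { [ "${1##*/}" != b.txt ]; }

# Oldest first: print "<mtime> <path>", sort numerically, drop the mtime.
find "$folderA" -maxdepth 1 -type f -printf '%T@ %p\n' | sort -n | cut -d' ' -f2- > "$demo/queue"

uploaded=""
while IFS= read -r file; do
    if upload "$file"; then
        mv -- "$file" "$folderB/"        # move only after a successful transfer
        uploaded="$uploaded ${file##*/}"
    fi                                   # failed files simply stay in A
done < "$demo/queue"
uploaded=${uploaded# }

remaining=$(ls "$folderA")
echo "uploaded in order: $uploaded; left in A: $remaining"
rm -rf "$demo"
```

Files that fail stay in A and are picked up again on the next run, so no separate retry folder is needed in this variant.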
As you stated above, there are situations where newly stored files are not transferred successfully. This can happen if the file is still being written when the transfer starts. So filter on the timestamp and only pick files that are at least some time old. Add -mmin +1 to the find statement to select files last modified more than one minute ago (-mmin -1 would do the opposite and select only files modified within the last minute):
find ./* -maxdepth 0 -type f -mmin +1 -print0 | xargs -r0 stat -c %n | sort
If you don't want to use a minute file age you'll have to check if the file is still open: lsof | grep ./testfile but this may have issues if you have tmpfs in your file system.
lsof | grep ./testfile
lsof: WARNING: can't stat() tmpfs file system /var/lib/docker/containers/8596cd310292a54652c7f50d7315c8390703b4816442146b340946779a72a40c/shm
Output information may be incomplete.
lsof: WARNING: can't stat() proc file system /run/docker/netns/fb9323486c44
Output information may be incomplete.
So add %s to the stat format and check the file size twice, a few seconds apart. If it is constant, the file may be completely written. Only "may", because the writing process could simply be stalled.
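A rough sketch of that size check, assuming GNU stat (for -c %s). is_stable is a made-up helper name, and the background writer only exists to simulate a file that is still growing.

```shell
#!/bin/bash
# is_stable FILE [DELAY]: succeed if FILE's size is unchanged after DELAY seconds.
is_stable() {
    size_before=$(stat -c %s -- "$1") || return 1
    sleep "${2:-2}"
    size_after=$(stat -c %s -- "$1") || return 1
    [ "$size_before" -eq "$size_after" ]
}

demo=$(mktemp)
echo "complete" > "$demo"

# Nobody is writing: the size stays the same.
if is_stable "$demo" 1; then idle_result=stable; else idle_result=changing; fi

# Simulate a writer that keeps appending for about two seconds.
( for i in 1 2 3 4; do echo "chunk" >> "$demo"; sleep 0.5; done ) &
writer=$!
if is_stable "$demo" 1; then busy_result=stable; else busy_result=changing; fi
wait "$writer"

echo "idle file: $idle_result, file being written: $busy_result"
rm -f "$demo"
```

As noted above, a constant size is only a hint: a stalled writer looks exactly like a finished one, so a longer delay or the file-age filter is the safer choice when in doubt.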

Printing shell script doesn't print anything

I'm trying to create a print service on my Raspberry Pi. The idea is to have a pop3 account for print jobs where I can send PDF files and get them printed at home. Therefore I set up fetchmail → procmail → uudeview to collect the emails (using a whitelist), extract the documents and save them to /home/pi/attachments/. Up to this point everything is working.
To get the files printed I wanted to set up a shell script which I planned to execute via a cronjob every minute. That's where I'm stuck now since I get "permission denied" messages and nothing gets printed at all with the script while it works when executing the commands manually.
This is what my script looks like:
#!/bin/bash
fetchmail # gets the emails, extracts the PDFs to ~/attachments
wait $! # takes some time so I have to wait for it to finish
FILES=/home/pi/attachments/*
for f in $FILES; do # go through all files in the directory
if $f == "*.pdf" # print them if they're PDFs
then
lpr -P ColorLaserJet1525 $f
fi
sudo rm $f # delete the files
done;
sudo rm /var/mail/pi # delete emails
After the script is executed I get the following Feedback:
1 message for print@MYDOMAIN.TLD at pop3.MYDOMAIN.TLD (32139 octets).
Loaded from /tmp/uudk7XsG: 'Test 2' (Test): Stage2.pdf part 1 Base64
Opened file /tmp/uudk7XsG
procmail: Lock failure on "/var/mail/pi.lock"
reading message print@MYDOMAIN.TLD@SERVER.HOSTER.TLD:1 of 1 (32139 octets) flushed
mail2print.sh: 6: mail2print.sh: /home/pi/attachments/Stage2.pdf: Permission denied
The email is fetched from the pop3 account, the attachment is extracted and appears for a short moment in ~/attachments/ and then gets deleted. But there's no printout.
Any ideas what I'm doing wrong?
if $f == "*.pdf"
tries to execute $f itself as a command, which is why you get "Permission denied" for the PDF. It should be
if [[ $f == *.pdf ]]
Also I think
FILES=/home/pi/attachments/*
should be quoted:
FILES='/home/pi/attachments/*'
Suggestion:
#!/bin/bash
fetchmail # gets the emails, extracts the PDFs to ~/attachments
wait "$!" # takes some time so I have to wait for it to finish
shopt -s nullglob # don't present pattern if no files are matched
FILES=(/home/pi/attachments/*)
for f in "${FILES[@]}"; do # go through all files in the directory
    [[ $f == *.pdf ]] && lpr -P ColorLaserJet1525 "$f" # print them if they're PDFs
done
sudo rm -- "${FILES[@]}" /var/mail/pi # delete files and emails at once
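Use the following to select only the PDF files in the first place; then you can remove the if statement inside the for loop. Note that a quoted string such as "ls /home/pi/attachments/*.pdf" would just store that text, so let the shell expand the glob into an array instead:
FILES=(/home/pi/attachments/*.pdf)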
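For comparison, here is a sketch of the same filtering that avoids both the if statement and bash-only features, so it also works when the script is run with plain sh. The temporary directory and file names are invented for the demo, and echo stands in for the real lpr call.

```shell
#!/bin/bash
demo=$(mktemp -d)
touch "$demo/Stage2.pdf" "$demo/with space.pdf" "$demo/notes.txt"

pdf_count=0
for f in "$demo"/*.pdf; do
    # If nothing matches, the pattern is left as-is; skip that case.
    [ -e "$f" ] || continue
    echo "would print: ${f##*/}"    # a real script would call lpr here
    pdf_count=$((pdf_count + 1))
done

# Same loop with a pattern that matches nothing.
ps_count=0
for f in "$demo"/*.ps; do
    [ -e "$f" ] || continue
    ps_count=$((ps_count + 1))
done

echo "PDFs found: $pdf_count, PostScript files found: $ps_count"
rm -rf "$demo"
```

The existence test plays the role that shopt -s nullglob plays in bash, and quoting "$f" keeps filenames with spaces intact.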

Bash curl returns 0 whether the copy has finished or not

I'm calling curl on bash to copy a file from a mounted SD card with the option to resume the copy later if the device gets unmounted. I receive the same status exit code 0 when I interrupt the copy by unmounting the volume and when the file gets actually copied. Any suggestions how to catch the case where the file has not been copied?
I'm copying only one file at a time.
This is the command:
curl -C - -O file:///mnt/sdcard/DCIM/100/0044.MP4
I came to a solution which is not as clear as I want, but still working. I'm executing the command 2 times one after another, so when the first command returns 0 upon unmount, the second now tries to copy the file and return error code 37 because of the unreachable source. If the second command returns 0 I consider the file copied.
Following your concept you could have a script like this:
#!/bin/bash
# Copies files persistently.
#
# Usage: pc <filepath> [<filepath2>] ...
#
function pc {
local FILE
for FILE; do
echo "Copying $FILE."
until curl -C - -O "file://${FILE}" && curl -C - -O "file://${FILE}"; do
if [[ -e $FILE ]]; then
echo "File $FILE can't be copied."
break
else
echo "Waiting for $FILE."
until
sleep 5
[[ -e $FILE ]]
do
continue
done
fi
done
done
}
pc "$@"
You could also just embed the function to a bash startup script if you like.
