I have an inotify shell script that monitors a directory and executes certain commands when a new file comes in. I need to parallelize this script so that it doesn't wait for one file's processing to complete when multiple files come into the directory.
I have tried nohup, & and xargs to achieve this. The problem is that xargs starts the same script as a number of processes, so whenever a new file comes in, all n running processes try to handle it. Essentially I want only one of the processes, whichever is idle, to pick up the new file: something like a worker pool, where whichever worker is free executes the task.
This is my shell script.
#!/bin/bash
# script.sh
inotifywait --monitor -r -e close_write --format '%w%f' ./ | while read FILE
do
echo "started script";
sleep $(( $RANDOM % 10 ))s;
#some more process which takes time when a new file comes in
done
I tried to execute the script with xargs like this:
xargs -n1 -P3 bash sample.sh
So whenever a new file comes in, it gets processed three times because of -P3, but ideally I want only one process, whichever is idle, to pick up the task.
Can someone shed some light on how to approach this problem?
There is no reason to have a pool of idle processes. Just run one per new file when you see new files appear.
#!/bin/bash
inotifywait --monitor -r -e close_write --format '%w%f' ./ |
while read -r file
do
echo "started script";
( sleep $(( $RANDOM % 10 ))s
#some more process which takes time when a new "$file" comes in
) &
done
Notice the addition of & and the parentheses to group the sleep and the subsequent processing into a single subshell which we can then background.
Also, notice how we prefer read -r, and use lower case for private shell variables (see Correct Bash and shell script variable capitalization).
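If you really do want a bounded worker pool, one sketch is to keep a single inotifywait watcher and let xargs hand each filename to one idle worker. Here process_file.sh is a hypothetical script containing only the per-file work (no inotifywait loop of its own), taking the filename as its first argument:
inotifywait --monitor -r -e close_write --format '%w%f' ./ |
  xargs -d '\n' -n1 -P3 bash process_file.sh
  # -d '\n': one filename per line; -n1: one filename per invocation; -P3: at most three workers
With this layout, -P3 caps the pool at three concurrent workers and each new file goes to exactly one of them.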
Maybe this will work:
https://www.gnu.org/software/parallel/man.html#EXAMPLE:-GNU-Parallel-as-dir-processor
If you have a dir in which users drop files that need to be processed, you can do this on GNU/Linux (if you know what inotifywait is called on other platforms, file a bug report):
inotifywait -qmre MOVED_TO -e CLOSE_WRITE --format %w%f my_dir |
parallel -u echo
This will run the command echo on each file put into my_dir or subdirs of my_dir.
To run at most 5 processes use -j5.
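Applied to the question, this might look like the following (assuming sample.sh is reduced to just the per-file work, taking the filename as its first argument):
inotifywait -qmre MOVED_TO -e CLOSE_WRITE --format %w%f ./ |
  parallel -u -j5 bash sample.sh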
It seems that when my screen is locked for some period of time, my S.gpg-agent.ssh disappears, and so in order to continue using my key I have to re-initialise it.
Obviously, this is a relatively frequent occurrence, so I've written a function for my shell to kill gpg-agent, restart it, and reset the appropriate environment variables.
This may be a bit of an 'X-Y problem', X being above this line, but I think Y below is more generally useful to know anyway.
How can I automatically run a command when an extant file no longer exists?
The best I've come up with is:
nohup echo "$file" | entr $command &
at login. But entr runs a command when files change, not just deletion, so it's not clear to me how that will behave with a socket.
According to your comment, the cron daemon does not fit.
Watch socket file deletion
Try auditd
# auditctl -w /var/run/<your_socket_file> -p wa
$ tail -f /var/log/audit/audit.log | grep 'nametype=DELETE'
How to run a script when the event occurs
If you want to run a script on socket file deletion, you can use a while loop, e.g.:
tail -Fn0 /var/log/audit/audit.log | grep --line-buffered 'name=<your_socket_file>' | grep --line-buffered 'nametype=DELETE' |
while IFS= read -r line; do
    # your script here
done
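For the asker's gpg-agent case, the loop body could simply relaunch the agent, e.g. (a sketch; gpgconf --launch is one way to restart the agent, and matching the socket name in the audit log is an assumption):
    # inside the loop, instead of "# your script here":
    gpgconf --launch gpg-agent    # recreate the agent and its sockets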
Thanks to Tom Klino and his answer.
You don't mention the OS you're using, but if it's Linux, you can use inotifywait from the inotify-tools package:
#!/bin/sh
while inotifywait -qq -e delete_self /path/to/S.gpg-agent.ssh; do
echo "Socket was deleted!"
# Recreate it.
done
I have several large files that I need to transfer to a local machine and process. The transfer takes about as long as the processing of the file, and I would like to start processing it immediately after it transfers. But the processing could take longer than the transfer, and I don't want the processes to keep building up, but I would like to limit it to some number, say 4.
Consider the following:
LIST_OF_LARGE_FILES="file1 file2 file3 file4 ... fileN"
for FILE in $LIST_OF_LARGE_FILES; do
scp user@host:$FILE ./
myCommand $FILE &
done
This will transfer each file and start processing it after the transfer, while allowing the next file to start transferring. However, if myCommand $FILE takes much longer than the time to transfer one file, these could keep piling up and bogging down the local machine. So I would like to limit myCommand to maybe 2-4 parallel instances; subsequent attempts to invoke myCommand should queue until a "slot" is open. Is there a good way to do this in bash (using xargs or other utilities is acceptable)?
UPDATE:
Thanks for the help in getting this far. Now I'm trying to implement the following logic:
LIST_OF_LARGE_FILES="file1 file2 file3 file4 ... fileN"
for FILE in $LIST_OF_LARGE_FILES; do
echo "Starting on $FILE" # should go to terminal output
scp user@host:$FILE ./
echo "Processing $FILE" # should go to terminal output
echo $FILE # should go through pipe to parallel
done | parallel myCommand
You can use GNU Parallel for that. Just echo the commands you want run into parallel and it will run one job per CPU core your machine has.
for f in ... ; do
scp ...
echo ./process "$f"
done | parallel
If you specifically want 4 processes at a time, use parallel -j 4.
If you want a progress bar, use parallel --bar.
Alternatively, print just the filename with null-termination, and add the processing command into the invocation of parallel:
for f in ... ; do
scp ...
printf "%s\0" "$f"
done | parallel -0 -j4 ./process
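Since the question mentions xargs as acceptable, here is a similar sketch without GNU Parallel (names taken from the question; -P4 caps the number of concurrent myCommand instances at 4):
LIST_OF_LARGE_FILES="file1 file2 file3 file4"
for FILE in $LIST_OF_LARGE_FILES; do
    scp user@host:"$FILE" ./      # transfers stay sequential
    printf '%s\0' "$FILE"         # hand the finished file to the worker pool
done | xargs -0 -n1 -P4 myCommand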
I have a script:
nohup tail -f /somefile >> /someotherfile.dat &
nohup while inotifywait -e close_write /someotherfile.dat; do ./script.sh; done &
but it seems that script.sh is never activated despite input arriving at the tail of /somefile every 5 minutes. What is wrong with my script above?
From the inotifywait docs:
close_write
A watched file or a file within a watched directory was closed, after being opened in writeable mode. This does not necessarily imply the file was written to.
close_write only triggers when a file is closed.
tail -f /somefile >> /someotherfile.dat
...continually appends to someotherfile.dat. It does not close it after each individual write.
Probably you want the modify event instead.
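A minimal sketch with the question's file names (modify fires on every write to the watched file, so script.sh would run each time tail appends):
tail -f /somefile >> /someotherfile.dat &
while inotifywait -qq -e modify /someotherfile.dat; do
    ./script.sh
done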
I am very new to shell scripting, and I am trying to write a shell pipeline that submits multiple qsub jobs, but has several commands to run in between these qsubs, which are contingent on the most recent job completing. I have been researching multiple ways to try and hold the shell script from proceeding after submission of a qsub job, but none have been successful.
The simplest chunk of code I can provide to illustrate the issue is as follows:
THREADS=`wc -l < list1.txt`
qsub -V -t 1-$THREADS firstjob.sh
echo "firstjob.sh completed"
There are obviously other lines of code after this that are actually contingent on firstjob.sh finishing, but I have omitted them here for clarity. I have tried the following methods of pausing/holding the script:
1) Only using wait, which is supposed to stop the script until all background programs are completed. This pushed right past the wait and printed the echo statement to the terminal while the array job was still running. My guess is this happens because once the qsub job is submitted, qsub itself exits, so wait thinks the work has completed?
qsub -V -t 1-$THREADS firstjob.sh
wait
echo "firstjob.sh completed"
2) Assigning the qsub submission to a variable, echoing that variable, and passing the entire job ID to wait to pause. The final echo should not run until all elements of the array job have completed. The error message is shown after the code, within the code block.
job1=$(qsub -V -t 1-$THREADS firstjob.sh)
echo "$job1"
wait $job1
echo "firstjob.sh completed"
####ERROR RECEIVED####
-bash: wait: `4585057[].cluster-name.local': not a pid or valid job spec
3) Using -sync y with qsub. This should prevent qsub from returning until the job is complete, acting as an effective pause... or so I had hoped. The error is shown after the commands. For some reason it is not reading the -sync option correctly?
qsub -V -sync y -t 1-$THREADS firstjob.sh
echo "firstjob.sh completed"
####ERROR RECEIVED####
qsub: script file 'y' cannot be loaded - No such file or directory
4) Using a dummy shell script (the dummy just creates an empty file) so that I could use qsub's -W depend=afterok: option to pause the script. This again pushes right past to the echo statement without any pause; both jobs get submitted, one right after the other.
job1=$(qsub -V -t 1-$THREADS demux.sh)
echo "$job1"
check=$(qsub -V -W depend=afterok:$job1 dummy.sh)
echo "$check"
echo "firstjob.sh completed"
Some further details regarding the script:
Each job submission is an array job.
The pipeline is being run in the terminal using a command resembling the following, so that I may provide it with 3 inputs: source Pipeline.sh -r list1.txt -d /workingDir/ -s list2.txt
I am certain that the firstjob.sh has not actually completed running because I see them in the queue when I use showq.
Perhaps there is an easy fix in most of these scenarios, but being new to all this, I am really struggling. I have to use this method in 8-10 places throughout the script, so it is really hindering progress. Would appreciate any assistance. Thanks.
POST EDIT 1
Here is the code contained in firstjob.sh, though I doubt it will help. Everything in it functions as expected and always produces the correct results.
#!/bin/bash
#PBS -S /bin/bash
#PBS -N demux
#PBS -l walltime=72:00:00
#PBS -j oe
#PBS -l nodes=1:ppn=4
#PBS -l mem=15gb
module load biotools
cd ${WORKDIR}/rawFQs/
INFILE=`head -$PBS_ARRAYID ${WORKDIR}${RAWFQ} | tail -1`
BASE=`basename "$INFILE" .fq.gz`
zcat $INFILE | fastx_barcode_splitter.pl --bcfile ${WORKDIR}/rawFQs/DemuxLists/${BASE}_sheet4splitter.txt --prefix ${WORKDIR}/fastqs/ --bol --suffix ".fq"
I just tried using -sync y, and that worked for me, so good idea there... Not sure what's different about your setup.
But a couple other things you could try involve your main script knowing the status of the qsub jobs you're running. One idea is that you could have your main script check the status of your job using qstat and wait until it finishes before proceeding.
Alternatively, you could have the first job write to a file as its last step (or, as you suggested, set up a dummy job that waits for the first job to finish). Then in your main script, you can test to see whether that file has been written before going on.
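A rough sketch of the qstat polling idea (the 30-second interval is arbitrary, and the exact job-ID format accepted by qstat depends on your scheduler):
job1=$(qsub -V -t 1-$THREADS firstjob.sh)
while qstat "$job1" >/dev/null 2>&1; do   # job still queued or running
    sleep 30
done
echo "firstjob.sh completed"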
I've created a background shell script to watch a folder (with inotifywait) and execute a process (a php script that sends information to several other servers and updates a database, but I don't think that's relevant) when a new file is created in it.
My problem is that after some time the script is terminated, and I don't understand why (I redirected the output to a file so as not to fill up the buffer, even for the php execution).
I'm using Ubuntu 12.04 server and the latest version of PHP.
Here is my script:
#!/bin/sh
#get the script directory
SCRIPT=$(readlink -f "$0")
script_path=$(dirname "$SCRIPT")
for f in `ls "$script_path"/data/`
do
php myscript.php "$script_path"/data/$f &
done
#watch the directory for file creation
inotifywait -q -m --format %w%f -e create "$script_path"/data/ | while read -r line; do
php myscript.php "$line" &
done
You should take a look at nohup and screen; this is exactly what you are looking for.
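For example, assuming the watcher script is saved as watch.sh (a hypothetical name), either of these keeps it running after you log out:
nohup ./watch.sh > watch.log 2>&1 &      # detach with nohup, log its output
screen -dmS watcher ./watch.sh           # or run it in a detached screen session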
OK, after hours and hours I finally found a solution. It might (must) be a bit dirty, but it works! As I said in a previous comment, I used the trap command; here is my final script:
#!/bin/sh
#get the script directory
SCRIPT=$(readlink -f "$0")
script_path=$(dirname "$SCRIPT")
#trap SIGHUP SIGINT SIGTERM and relaunch the script
trap "pkill -9 inotifywait;($SCRIPT &);exit" 1 2 15
for f in `ls "$script_path"/data/`
do
php myscript.php "$script_path"/data/$f &
done
#watch the directory for file creation
inotifywait -q -m --format %w%f -e create "$script_path"/data/ | while read -r line; do
php myscript.php "$line" &
done
Hope it will help shell beginners like me :)
Edit: added "pkill -9 inotifywait" to make sure inotifywait processes won't stack up, the parentheses to make sure the new process is not a child of the current one, and exit to make sure the current process stops running.