I have an inotify shell script which monitors a directory and executes certain commands when a new file comes in. I need to parallelize this script, so that it doesn't wait for one file's processing to complete whenever multiple files come into the directory.
I have tried using nohup, &, and xargs to achieve this. The problem was that xargs runs the same script as a number of processes, and whenever a new file comes in, all n running processes try to handle it. Essentially I only want one of the processes, whichever is idle, to process the new file. Something like a worker pool: whichever worker is free or idle takes the task.
This is my shell script.
#!/bin/bash
# script.sh
inotifywait --monitor -r -e close_write --format '%w%f' ./ | while read FILE
do
echo "started script";
sleep $(( $RANDOM % 10 ))s;
#some more process which takes time when a new file comes in
done
I did try to execute the script with xargs like this:
xargs -n1 -P3 bash sample.sh
So whenever a new file comes in, it gets processed three times because of -P3, but ideally I want just one idle process to pick up the task.
Could someone shed some light on how to approach this problem?
There is no reason to have a pool of idle processes. Just run one per new file when you see new files appear.
#!/bin/bash
inotifywait --monitor -r -e close_write --format '%w%f' ./ |
while read -r file
do
echo "started script";
( sleep $(( $RANDOM % 10 ))s
#some more process which takes time when a new "$file" comes in
) &
done
Notice the addition of & and the parentheses to group the sleep and the subsequent processing into a single subshell which we can then background.
Also, notice how we always prefer read -r, and note the lower-case loop variable: correct Bash style reserves upper-case variable names for environment variables.
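If you do want a bounded worker pool anyway (say, at most 3 files in flight), note that the mistake in the question was running the whole watcher script under xargs, which just starts three watchers. Instead, keep a single inotifywait and let xargs spread its output over workers. A sketch, with process_one.sh standing in for the per-file work:
#!/bin/bash
# One watcher; xargs runs at most 3 workers at a time, passing each
# new file name as a single argument.
inotifywait --monitor -r -e close_write --format '%w%f' ./ |
    xargs -d '\n' -n1 -P3 bash process_one.sh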
Maybe this will work:
https://www.gnu.org/software/parallel/man.html#EXAMPLE:-GNU-Parallel-as-dir-processor
If you have a dir in which users drop files that need to be processed, you can do this on GNU/Linux (if you know what inotifywait is called on other platforms, file a bug report):
inotifywait -qmre MOVED_TO -e CLOSE_WRITE --format %w%f my_dir |
parallel -u echo
This will run the command echo on each file put into my_dir or subdirs of my_dir.
To run at most 5 processes use -j5.
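For example (process_file here is a stand-in for your own per-file script):
inotifywait -qmre MOVED_TO -e CLOSE_WRITE --format %w%f my_dir |
    parallel -u -j5 ./process_file {}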
It seems that when my screen is locked for some period of time, my S.gpg-agent.ssh disappears, and so in order to continue using my key I have to re-initialise it.
Obviously, this is a relatively frequent occurrence, so I've written a function for my shell to kill gpg-agent, restart it, and reset the appropriate environment variables.
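(For reference, such a function can be as small as the sketch below. The gpgconf subcommands are standard GnuPG 2.1+; which variables need resetting is an assumption about the setup.)
restart_gpg_agent() {
    gpgconf --kill gpg-agent                       # kill the old agent
    gpg-connect-agent /bye                         # launch a fresh one
    unset SSH_AGENT_PID
    export SSH_AUTH_SOCK="$(gpgconf --list-dirs agent-ssh-socket)"
}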
This may be a bit of an 'X-Y problem', X being above this line, but I think Y below is more generally useful to know anyway.
How can I automatically run a command when an extant file no longer exists?
The best I've come up with is:
nohup echo "$file" | entr $command &
at login. But entr runs a command when files change, not just deletion, so it's not clear to me how that will behave with a socket.
According to your comment, the cron daemon does not fit.
Watch socket file deletion
Try auditd
# auditctl -w /var/run/<your_socket_file> -p wa
$ tail -f /var/log/audit/audit.log | grep 'nametype=DELETE'
How to run a script when the event occurs
If you want to run a script on socket file deletion, you can use a while loop, e.g.:
tail -Fn0 /var/log/audit/audit.log | grep --line-buffered 'name=<your_socket_file>' | grep --line-buffered 'nametype=DELETE' |
while IFS= read -r line; do
# your script here
done
Thanks to Tom Klino and his answer.
You don't mention the OS you're using, but if it's linux, you can use inotifywait from the inotify-tools package:
#!/bin/sh
while inotifywait -qq -e delete_self /path/to/S.gpg-agent.ssh; do
echo "Socket was deleted!"
# Recreate it.
done
I am a newbie at shell scripting, so if you find this post redundant, please redirect me to an existing post.
I have a jar command which runs in the background and keeps updating a log file.
I am automating this process and writing a shell script. I am using the code below. Please help me figure out what I am missing.
java -jar fileName.jar &
pid=$!
echo $!
cd /usr/ebp/logs/
logs=$(ls -t | head -n 1) #trying to read latest log file from directory
tail -f ${logs} | while read LOGLINE
do
if [ "${LOGLINE}" == *"Batch process completed successfully"* ]
then
echo ${LOGLINE}
echo "csv files created successfully"
break
fi
done
kill -9 ${pid}
Once the batch process is completed, I want to kill the jar process by its id. But I am not able to read the log file, and it is stuck in the while loop.
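A likely culprit: the single-bracket [ test does not do glob matching, so the condition never fires and the loop reads forever; [[ ... == *pattern* ]] does. A sketch of a working version (assuming GNU tail for the --pid option):
#!/bin/bash
java -jar fileName.jar &
pid=$!
echo "$pid"
cd /usr/ebp/logs/ || exit 1
logs=$(ls -t | head -n 1)   # latest log file in the directory
# [[ ]] supports glob matching; --pid makes tail exit once the jar dies.
while read -r LOGLINE; do
    if [[ $LOGLINE == *"Batch process completed successfully"* ]]; then
        echo "$LOGLINE"
        echo "csv files created successfully"
        break
    fi
done < <(tail -f --pid="$pid" "$logs")
kill "$pid"
Feeding the loop through process substitution keeps it in the main shell, so break behaves as expected and $pid is still in scope for the final kill.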
I'm currently creating a lock folder which is created when my script runs; I also move files into subfolders there for processing. When the script ends, a TRAP is called which removes the lock folder and its contents, all of which works fine.
We had an issue the other day when someone pulled the power from one of the servers, so my TRAP was never called. When it rebooted, the lock folder was still there, which meant my scripts couldn't restart until it was manually removed.
What's the best way of checking whether the script is already running? I currently have this approach using process IDs:
if ! mkdir $LOCK_DIR 2>/dev/null; then # Try to create the lock dir. This should pass successfully first run.
# If the lock dir exists
pid=$(cat $LOCK_DIR/pid.txt)
if [[ $(ps -ef | awk '{print $2}' | grep $pid | grep -v grep | wc -l) == 1 ]]; then
echo "Script is already running"
exit 1
else
echo "It looks like the previous script was killed. Restarting process."
# Do some cleanup here before removing dir and re-starting process.
fi
fi
# Create a file in the lock dir containing the pid. Echo the current process id into the file.
touch $LOCK_DIR/pid.txt
echo $$ > $LOCK_DIR/pid.txt
# Rest of script below
Checking /proc/ and cmdline is a good call - especially as at the moment you are simply checking that a process with that process id exists, not that the process is actually your script.
You could still do this with your ps command - which would offer some form of platform agnosticism.
COMMAND=$(ps -o comm= -p $pid)
if [[ $COMMAND == my_process ]]
then
.....
Note the command-line arguments to ps limit the output to the command name only, with no header.
Many systems nowadays use tmpfs for directories like /tmp. These directories will therefore always be cleared after a reboot.
If using your pid file, note you can easily see the command
running under that pid in /proc/$pid/cmdline and /proc/$pid/exe.
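A sketch of that check (my_script.sh is a stand-in for the actual script name):
pid=$(cat "$LOCK_DIR/pid.txt")
# /proc/<pid>/cmdline is NUL-separated; convert to spaces before matching.
if [ -r "/proc/$pid/cmdline" ] &&
   tr '\0' ' ' < "/proc/$pid/cmdline" | grep -q "my_script.sh"; then
    echo "Script is already running"
    exit 1
fi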
Is there any way to set the process name of a shell script? This is needed for killing this script with the killall command.
Here's a way to do it. It is a hack/workaround, but it works pretty well. Feel free to tweak it to your needs; it certainly needs some checks on the symbolic link creation, or a tmp folder, to avoid possible race conditions (if they are problematic in your case).
Demonstration
wrapper
#!/bin/bash
script="./dummy"
newname="./killme"
rm -iv "$newname"
ln -s "$script" "$newname"
exec "$newname" "$#"
dummy
#!/bin/bash
echo "I am $0"
echo "my params: $#"
ps aux | grep bash
echo "sleeping 10s... Kill me!"
sleep 10
Test it using:
chmod +x dummy wrapper
./wrapper some params
In another terminal, kill it using:
killall killme
Notes
Make sure you can write in your current folder (current working directory).
If your current command is:
/path/to/file -q --params somefile1 somefile2
Set the script variable in wrapper to /path/to/file (instead of ./dummy) and call wrapper like this:
./wrapper -q --params somefile1 somefile2
You can use the kill command on a PID, so what you can do is run something in the background, get its ID, and kill it.
The PID of the last job run in the background can be obtained using $!.
echo test & echo $!
You cannot do this reliably and portably, as far as I know. On some flavors of Unix, changing what's in argv[0] will do the job. I don't believe there's a way to do that in most shells, though.
Here are some references on the topic.
Howto change a UNIX process and child process name by modifying argv0
Is there a way to change the effective process name in Python?
This is an extremely old post; pretty sure the original poster got their answer long ago. But for newcomers, I thought I'd explain my own experience (after playing with bash for half an hour). If you start a script by script name, with something like:
./script.sh
the process name listed by ps will be "bash" (on my system). However if you start a script by calling bash directly:
/bin/bash script.sh
/bin/sh script.sh
bash script.sh
you will end up with a process name that contains the name of the script, e.g.:
/bin/bash script.sh
results in a process name containing the script name. This can be used to mark pids with a specific script name, and it is useful, for example, for stopping all processes (by pid) whose process name contains that script name with the kill command.
You can also use the -f flag to pgrep/pkill, which searches the entire command line rather than just the process name. E.g.:
./script &
pkill -f script
Include a shebang as the very first line:
#![path to shell]
Examples for the path to the shell:
/usr/bin/bash
/bin/bash
/bin/sh
Full example:
#!/usr/bin/bash
On Linux at least, killall dvb works even though dvb is a shell script labelled with #!. The only trick is to make the script executable and invoke it by name, e.g.,
dvb watch abc write game7 from 9pm for 3:30
Running ps shows a process named
/usr/bin/lua5.1 dvb watch ...
but killall dvb takes it down.
Job specs %1, %2, ... also do an adequate job:
#!/bin/bash
# set -ex
sleep 101 &
FIRSTPID=$!
sleep 102 &
SECONDPID=$!
echo $(ps ax|grep "^\(${FIRSTPID}\|${SECONDPID}\) ")
kill %2
echo $(ps ax|grep "^\(${FIRSTPID}\|${SECONDPID}\) ")
sleep 1
kill %1
echo $(ps ax|grep "^\(${FIRSTPID}\|${SECONDPID}\) ")
I put these two lines at the start of my scripts so I do not have to retype the script name each time I revise it. It won't take $0 if you put it after the first shebang; maybe someone who actually knows can correct me, but I believe this is because the script hasn't started until the second line, so $0 doesn't exist until then:
#!/bin/bash
#!/bin/bash ./$0
This should do it.
My solution uses a trivial python script, and the setproctitle package. For what it's worth:
#!/usr/bin/env python3
from sys import argv
from setproctitle import setproctitle
from subprocess import run
setproctitle(argv[1])
run(argv[2:])
Call it e.g. run-with-title and stick it in your path somewhere. Then use via
run-with-title <desired-title> <script-name> [<arg>...]
Run the bash script with an explicit call to bash (not just ./test.sh). In that case the process name contains the script, and the process can be found by script name. Alternatively, call bash with its full path, as suggested in display_name_11011's answer:
bash test.sh # explicit bash mentioning
/bin/bash test.sh # or with full path to bash
ps aux | grep test.sh | grep -v grep # searching PID by script name
If the first line in the script (test.sh) explicitly specifies the interpreter:
#!/bin/bash
echo 'test script'
then it can be called without mentioning bash explicitly, creating a process with the name '/bin/bash test.sh':
./test.sh
ps aux | grep test.sh | grep -v grep
Also, as a dirty workaround, it is possible to copy bash and use it under a custom name:
sudo cp /usr/bin/bash /usr/bin/bash_with_other_name
/usr/bin/bash_with_other_name test.sh
ps aux | grep bash_with_other_name | grep -v grep
Erm... unless I'm misunderstanding the question, the name of a shell script is whatever you've named the file. If your script is named foo then killall foo will kill it.
We won't be able to find the pid of the shell script using "ps -ef | grep {scriptName}" unless the script's name is baked in via the shebang. All running shell scripts do show up in the output of "ps -ef | grep bash", but that makes it tricky to identify a particular one, since multiple bash processes will typically be running simultaneously.
So a better approach is to give the shell script an appropriate name.
Edit the shell script file and use the shebang (the very first line) to name the process, e.g. #!/bin/bash /scriptName.sh
In this way we will be able to grep the process id of scriptName using
"ps -ef | grep {scriptName}"