Determining all the processes started with a given executable in Linux - shell

I need to collect and log every command line used to start a given process on my machine while a Perl script (a test automation script) is running. The script starts the executable in question (MySQL) multiple times with various command lines, and I would like to inspect the command line of each invocation. What would be the right way to do this? One possibility I see is to run something like "ps -aux | grep mysqld | grep -v grep" in a loop in a shell script and capture the results in a file, but then I would have to post-process the output to remove duplicates, and I could miss some command lines because of timing issues. Is there a better way to achieve this?

Processing the ps output can always miss some processes. It will only capture the ones currently existing. The best way would be to modify the Perl script to log each command before or after it executes it.
If that's not an option, you can get the child pids of the perl script by running:
pgrep -P $pid -a
-a gives the full process command. $pid is the pid of the perl script. Then process just those.
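If the test script can be made to start the executable through a wrapper, logging at launch time avoids the timing problem entirely. This is a minimal sketch and not something from the question: log_and_run and the log path are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): append the command line
# to a log file, then run the command unchanged.
log_and_run() {
    log=$1
    shift
    # "$*" joins the arguments with spaces; original quoting is not preserved.
    printf '%s\n' "$*" >> "$log"
    "$@"
}
```

Every invocation is recorded once, in order, so no de-duplication is needed afterwards. A wrapper script named like the real binary and placed earlier in PATH could call log_and_run with the real binary's full path.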

You could use strace to log calls to execve.
$ strace -f -o strace.out -e execve perl -e 'system("echo hello")'
hello
$ egrep ' = 0$' strace.out
11232 execve("/usr/bin/perl", ["perl", "-e", "system(\"echo hello\")"], 0x7ffc6d8e3478 /* 55 vars */) = 0
11233 execve("/bin/echo", ["echo", "hello"], 0x55f388200cf0 /* 55 vars */) = 0
Note that strace.out will also show the failed execs (where execve returned -1), hence the egrep command to find the successful ones. A successful execve call does not return, but strace records it as if it returned 0.
Be aware that this is a relatively expensive solution because it is necessary to include the -f option (follow forks), as perl will be doing the exec call from forked subprocesses. This is applied recursively, so it means that your MySQL executable will itself run through strace. But for a one-off diagnostic test it might be acceptable.
Because strace follows children recursively, any exec calls made by your MySQL executable will also appear in strace.out, and you will have to filter those out. The PID is shown for every call, and if you also log fork and clone calls (i.e. strace -e execve,fork,clone), you will see both the parent and child PIDs, in the form <parent_pid> clone(......) = <child_pid>. That should give you enough information to reconstruct the process tree and decide which processes you are interested in.
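To pull out only the successful invocations of one executable from strace.out, something like the following might do. It is a sketch that assumes the line format shown in the sample output above; successful_execs is a hypothetical helper name, and the executable name is assumed to contain no regex metacharacters.

```shell
#!/bin/sh
# Hypothetical helper: print the execve lines from an strace log that
# succeeded (" = 0" at end of line) and whose path ends in /<name>.
# Assumes <name> contains no regex metacharacters.
successful_execs() {
    name=$1
    file=$2
    grep -E "execve\(\"[^\"]*/${name}\"," "$file" | grep -E ' = 0$'
}
```

For the MySQL case this would be called as successful_execs mysqld strace.out, leaving one line per successful start with its full argument list.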

Related

Loop through docker output until I find a String in bash

I am quite new to bash (barely any experience at all) and I need some help with a bash script.
I am using docker-compose to create multiple containers - for this example let's say 2 containers. The 2nd container will execute a bash command, but before that, I need to check that the 1st container is operational and fully configured. Instead of using a sleep command I want to create a bash script that will be located in the 2nd container and once executed do the following:
Execute a command and log the console output in a file
Read that file and check if a String is present. The command that I will execute in the previous step will take a few seconds (5 to 10) to complete, and I need to read the file after it has finished executing. I suppose I can add a sleep to make sure the command has finished, or is there a better way to do this?
If the string is not present I want to execute the same command again until I find the String I am looking for
Once I find the string I am looking for I want to exit the loop and execute a different command
I found out how to do this in Java, but I need to do this in a bash script.
The docker-containers have alpine as an operating system, but I updated the Dockerfile to install bash.
I tried this solution, but it does not work.
#!/bin/bash
[command to be executed] > allout.txt 2>&1
until
tail -n 0 -F /path/to/file | \
while read LINE
do
if echo "$LINE" | grep -q $string
then
echo -e "$string found in the console output"
fi
done
do
echo "String is not present. Executing command again"
sleep 5
[command to be executed] > allout.txt 2>&1
done
echo -e "String is found"
In your docker-compose file make use of depends_on option.
depends_on will take care of startup and shutdown sequence of your multiple containers.
But it does not check whether a container is ready before moving to another container startup. To handle this scenario check this out.
As described in this link,
You can use tools such as wait-for-it, dockerize, or sh-compatible wait-for. These are small wrapper scripts which you can include in your application’s image to poll a given host and port until it’s accepting TCP connections.
OR
Alternatively, write your own wrapper script to perform a more application-specific health check.
In case you don't want to make use of above tools then check this out. Here they use a combination of HEALTHCHECK and service_healthy condition as shown here. For complete example check this.
Just:
while :; do
# 1. Execute a command and log the console output in a file
command > output.log
# TODO: handle errors, etc.
# 2. Read that file and check if a String is present.
if grep -q "searched_string" output.log; then
# Once I find the string I am looking for I want to exit the loop
break;
fi
# 3. If the string is not present I want to execute the same command again until I find the String I am looking for
# add ex. sleep 0.1 for the loop to delay a little bit, not to use 100% cpu
done
# ...and execute a different command
different_command
You can timeout a command with timeout.
Notes:
: (colon) is a utility that returns a zero exit status, much like true. I prefer while : to while true; they mean the same.
The code presented should work in any posix shell.
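The same loop can be packaged as a function with an upper bound on retries, so the second container does not wait forever if the first one never comes up. This is a sketch under assumptions: retry_until_match and its argument order are made up for illustration, and the command's output is piped straight into grep instead of going through a file.

```shell
#!/bin/sh
# Hypothetical helper: run a command repeatedly until its combined
# stdout/stderr contains the pattern, giving up after max attempts.
# Usage: retry_until_match PATTERN MAX_ATTEMPTS DELAY_SECONDS command [args...]
retry_until_match() {
    pattern=$1
    max=$2
    delay=$3
    shift 3
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        # The pipeline only finishes once the command has finished (or grep
        # has matched), so no extra sleep is needed before checking.
        if "$@" 2>&1 | grep -q -- "$pattern"; then
            return 0
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    return 1
}
```

A call might look like retry_until_match "searched_string" 30 5 command && different_command, where command and different_command stand for the placeholders used in the answer above.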

Is there a way to redirect all stdout and stderr to systemd journal from within script?

I like the idea of using systemd's journal to view and manage the logs of my own scripts. I have become aware you can log to journal from my user scripts on a per message basis..
echo 'hello' | systemd-cat -t myscript -p emerg
Is there a way to redirect all messages to journald, even those generated by other commands? Like..
exec &> systemd-cat
Update:
Some partial success.
Tried Inian's suggestion from terminal.
~/scripts/myscript.sh 2>&1 | systemd-cat -t myscript.sh
and it worked, stdout and stderr were directed to systemd's journal.
Curiously,
~/scripts/myscript.sh &> | systemd-cat -t myscript.sh
didn't work in my Bash terminal.
I still need to find a way to do this inside my script for when other programs call my script.
I tried..
exec 2>&1 | systemd-cat -t myscript.sh
but it doesn't work.
Update 2:
From terminal
systemd-cat ~/scripts/myscript.sh
works. But I'm still looking for a way to do this from within the script.
A pipe to systemd-cat is a process which needs to run concurrently with your script. Bash offers a facility for this, though it's not portable to POSIX sh.
exec > >(systemd-cat -t myscript -p emerg) 2>&1
The >(command) process substitution starts another process and returns a pseudo-filename (something like /dev/fd/63) which you can redirect into. This is basically a wrapper for the mkfifo hacks you could do if you wanted to port this to POSIX sh.
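For reference, the named-pipe variant mentioned above could look roughly like this in plain sh. It is a sketch, not a drop-in replacement: redirect_all_to is a hypothetical name, and it assumes mktemp and mkfifo are available.

```shell
#!/bin/sh
# Hypothetical helper: send all later stdout/stderr of the current shell to
# the stdin of the given command, using a named pipe instead of >( ... ).
redirect_all_to() {
    dir=$(mktemp -d) || return 1
    fifo=$dir/pipe
    mkfifo "$fifo" || return 1
    # The reader runs concurrently with the rest of the script.
    "$@" < "$fifo" &
    # Opening the FIFO for writing blocks until the reader has opened it.
    exec > "$fifo" 2>&1
    # Both ends are open now, so the name is no longer needed.
    rm -rf "$dir"
}
```

Inside a script this would be used as redirect_all_to systemd-cat -t myscript near the top; systemd-cat reads standard input when it is given no command to run.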
If your script happens to not be a shell script, but some other programming language that allows loading extension modules linked to -lsystemd, there is another way. There is a library function sd_journal_stream_fd that quite precisely matches the task at hand. Calling it from bash itself (as opposed to some child) seems difficult at best. In Python for instance, it is available as systemd.journal.stream. What this function does in essence is connecting a unix domain stream socket and communicating what kind of data is being transmitted (e.g. priority). The difficult part with a shell here is making it connect a unix domain socket (as opposed to connecting in a child).
The key idea to this answer was given by Freenode/libera.chat user grawity.
Apparently, and for reasons that are beyond me, you can't really redirect all stdout and stderr to journald from within a script because it has to be piped in. To work around that I found a trick people were using with syslog's logger which works similarly.
You can wrap all your code into a function and then pipe the function into systemd-cat.
#!/bin/bash
mycode(){
echo "hello world"
echor "echo typo producing error"
}
mycode 2>&1 | systemd-cat -t myscript.sh
exit 0
And then to search journal logs..
journalctl -t myscript.sh --since yesterday
I'm disappointed there isn't a more direct way of doing this.

Confirmation about pgrep returning itself

I have read several posts here about cases where pgrep 'seems' to return itself even though it never should. The key seems to be the difference between how bash and sh function. Except that in my case, I have confirmed that sh really is a link to bash.
I'm running on SuSE 12 x86_64
/bin/sh is a link to bash
/bin/bash is the real binary
I have a Ruby script which calls pgrep like this:
cmd="/usr/bin/pgrep -lf \"#{target}\""
pidList=`#{cmd}`
I need to use the full command line because I'm actually using an argument to uniquely identify a specific 'java' process.
Now, due to some unrelated foolishness, I almost immediately do a ps -p on each of the pids returned. For a while, this was causing me great grief because the ps would sometimes return nothing. Eventually I was able to catch a case where the ps on the pid returned the pgrep command. But it was the pgrep command itself, not something like sh -c "pgrep -f blah"
To recap:
pgrep never returns itself. But differences in sh vs bash can cause it to show a subshell. But I verified that sh is a link to bash, so there should be no difference in behavior.
What I suspect (and am looking for confirmation for) is that an extra subcommand is being created because of the Ruby backticks and that is what is (only sometimes.. timing issues?) being picked up by the pgrep command.
This has been a real pain and I want to make sure the fix I implement will truly make the problem go away. Given the code I'm working with, I'm either going to
1. append a | grep -v grep to the end of my command
2. throw out any results containing 'grep' while looping through the returned results within the Ruby script
I figure #2 is faster, but it still irks me that I have to filter out pgrep itself.
Am I on the right track or do you think something else is at play?
Thanks for your time!
The problem is not the shell flavor: the shell process that calls pgrep also appears among the processes (and has the searched string in its full command line), so we need to filter it out like this:
pgrep -f target | grep -v $$
The answer is already in the comments to my question, but I figure I'll close this out with an official answer.
The piece of information I was missing is that
When bash is invoked as sh it behaves as a POSIX sh, not bash. – Jörg W Mittag Jan 23 at 23:31
So yes, pgrep was behaving normally. But when you call it from a Ruby script via backticks, you still need to filter out 'pgrep'
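One caveat with filtering through grep -v $$ is that it drops any line that merely contains the shell's PID as a substring (PID 123 would also hide 1234). A stricter filter compares the PID field exactly. This is a sketch; exclude_pids is a hypothetical helper and assumes the PID is the first field of each line, as in pgrep and pgrep -l output.

```shell
#!/bin/sh
# Hypothetical helper: read pgrep-style lines on stdin and drop those whose
# first field (the PID) exactly equals one of the PIDs given as arguments.
exclude_pids() {
    awk -v skip="$*" '
        BEGIN { n = split(skip, a, " "); for (i = 1; i <= n; i++) drop[a[i]] = 1 }
        !($1 in drop)
    '
}
```

From a shell, the call would be something like pgrep -lf "$target" | exclude_pids "$$".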

bash script + rsync: bash won't sync to host?

I've only been writing actual .sh scripts since sometime this morning, and I'm a bit stuck. I'm trying to write a script to check to see if a process is running, and to start it if it isn't. (I plan to run this script once every 10 to 15 minutes with cron.)
Here's what I have so far:
#!/bin/bash
APPCHK=$(ps aux | grep -c "/usr/bin/rsync -rvz -e ssh /home/e-smith/files/ibays/drive-i/files/Warehouse\ Pics/organized_pics imgserv@192.168.0.140:~/webapps/pavlick_container/public/images
")
RUNSYNC=$(rsync -rvz -e ssh /home/e-smith/files/ibays/drive-i/files/Warehouse\ Pics/organized_pics imgserv@192.168.0.140:~/webapps/pavlick_container/public/images)
if [ $APPCHK < '2' ];
then
$RUNSYNC
fi
exit
Here's the error that I'm getting:
$ ./image_sync.sh
rsync: mkdir "/home/i/webapps/pavlick_container/public/images" failed: No such file or directory (2)
rsync error: error in file IO (code 11) at main.c(595) [Receiver=3.0.7]
rsync: connection unexpectedly closed (9 bytes received so far) [sender]
rsync error: error in rsync protocol data stream (code 12) at io.c(601) [sender=3.0.7]
./image_sync.sh: line 8: 2: No such file or directory
TRTWF is that
rsync -rvz -e ssh /home/e-smith/files/ibays/drive-i/files/Warehouse\ Pics/organized_pics imgserv@192.168.0.140:~/webapps/pavlick_container/public/images
runs just fine from a terminal window.
What am I doing wrong?
Your grep call is wrong on two counts. The pattern shouldn't include a newline. To look for an exact string, use grep -F 'substring' or grep -xF 'exact whole line'.
Finding if a process is running with ps | grep is highly brittle. On most unices (at least Solaris, Linux and *BSD), use pgrep: pgrep -f 'PATTERN' returns true if there's a running process whose command line matches PATTERN.
Every program returns a status code, either 0 to indicate success or a number between 1 and 255 to indicate failure. In the shell, any command is a valid boolean expression; the status code 0 is treated as true and anything else as false.
$(…) means run the command inside the parentheses and capture its output. So rsync is executed as soon as the shell hits the definition of the RUNSYNC variable. To store a block of shell code, use a function (example below, although you don't actually need a function here, you could just write the code directly).
Your test [ $APPCHK < 2 ] should be [ $APPCHK -lt 2 ]: < means input redirection. (In bash, you can also write [[ foo < bar ]], but that's string comparison, not numeric comparison.)
~/ at the beginning of the remote rsync path is optional. Also, -e ssh is the default unless your version of rsync is really old.
exit at the end of the script is useless, the script will exit anyway.
Here's a script taking the above into account:
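#!/bin/bash
# Sync the pictures directory to the image server.
run_rsync () {
  /usr/bin/rsync -rvz '/home/e-smith/files/ibays/drive-i/files/Warehouse Pics/organized_pics' \
    imgserv@192.168.0.140:webapps/pavlick_container/public/images
}
# Regex matched against the whole command line (-f with -x), so it has to
# start with the same path the function invokes.
process_pattern='/usr/bin/rsync -rvz /home/e-smith/files/ibays/drive-i/files/Warehouse Pics/organized_pics imgserv@192\.168\.0\.140:webapps/pavlick_container/public/images'
# Start rsync only when no matching process is already running.
if ! pgrep -xf "$process_pattern" >/dev/null; then
  run_rsync
fi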
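The point about -lt versus < is easy to check in isolation. Here is a small sketch, with run_if_below as a hypothetical helper that runs a command only when a count is numerically below a limit:

```shell
#!/bin/sh
# Hypothetical helper: run the command only if COUNT is numerically less
# than LIMIT. -lt compares integers; inside [ ], an unquoted < would be
# parsed as input redirection instead.
run_if_below() {
    count=$1
    limit=$2
    shift 2
    if [ "$count" -lt "$limit" ]; then
        "$@"
    fi
}
```

With a count of 10 and a limit of 2 nothing runs, which is the numeric answer; a string comparison would sort "10" before "2" and get it wrong.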
Looks like with your rsync command that some directory along this path is wrong: ~/webapps/pavlick_container/public/images
Have you checked on the server 192.168.0.140 in imgserv's home directory to see if "pavlick_container/public" exists? That's my guess.
You have a number of problems. First you are running the commands instead of putting the commands in variables. There is also a much easier way.
# Keep the command in an array so the path containing a space stays one argument.
RUNSYNC=(rsync -rvz -e ssh '/home/e-smith/files/ibays/drive-i/files/Warehouse Pics/organized_pics' 'imgserv@192.168.0.140:~/webapps/pavlick_container/public/images')
if ! pgrep -f "rsync.*organized_pics" >/dev/null; then "${RUNSYNC[@]}"; fi
First of all, checking whether the program is running this way is unreliable; it may or may not work. Instead, rely on a special file that your script creates when it starts and deletes when it ends. You can then tell whether the script is running just by checking whether that file exists.
Then, try either putting a \ before the ~ or removing the ~/ completely. If cron runs the script as a different user, the tilde is expanded on the client to that user's home directory. It may work from the command line because your user's home directory happens to be the same on both machines, which need not hold for the user cron runs as. This is a guess at this point, but again, try removing the ~/ and see if it works.
If your real code is missing a closing double quote on the grep target, you're going to get weird results from the get-go.
Also, ps aux will not list a complete command line like the one you show (at least on all the ps implementations I have used).
You need to make it ps auxwww. Often you will see people add | grep -v grep | (you'll see why at some point). This can be avoided by changing your static search target slightly, e.g. from "/usr/bin/rsync" to "/usr/bin/[r]sync ".
Other users are also helping with their comments. Using a flag file as @DiegoSevilla mentions is marginally deprecated. Use a mkdir /tmp/MyWatcher_flagDir for your flag. Directory creation is an atomic operation (whereas checking for a file and then creating it is not), and this will eliminate any errors you might encounter from having 2 copies of your monitor try to make a flag file at the same time. Only one process will succeed in making or removing a flag dir.
I hope this helps.
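A flag directory can be wrapped around the job roughly as follows. This is only a sketch: with_lock_dir is a hypothetical helper name, and a stale directory left behind by a killed script is not handled.

```shell
#!/bin/sh
# Hypothetical helper: run a command only if the lock directory can be
# created; mkdir either creates it or fails, with no window in between.
with_lock_dir() {
    lock=$1
    shift
    if ! mkdir "$lock" 2>/dev/null; then
        echo "already running (lock: $lock)" >&2
        return 1
    fi
    status=0
    "$@" || status=$?
    # Release the lock whether or not the command succeeded.
    rmdir "$lock"
    return "$status"
}
```

From cron this might be invoked as with_lock_dir /tmp/MyWatcher_flagDir followed by the rsync command, so a second run started while the first is still copying exits immediately.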

Why do I end up with two processes?

I wrote a script that has been running as a daemon for quite some time now.
If I ever needed to debug it, I would stop the daemon version and rerun manually in current shell. I have never logged anything out of this script, but as I am getting ready to deploy it on a remote server I figured I want to log any errors that the script would get into. For that purpose I followed hints from several SO postings and am doing the following:
if ! tty > /dev/null; then
exec > >(/bin/logger -p syslog.warning -t mytag -i) 2>&1
fi
This seems to log just fine, I am just surprised to see two instances of my script listed by ps when this feature is enabled. Is there a way to avoid it?
I know I get another process for logger and I assume that it has to do with the >(...), but still hope to avoid it
bash spawns a subshell to execute the command(s) in >( ... ). In this case, the only thing that subshell does is run /bin/logger, so it's rather pointless. I think you can "fix" this with another exec command:
if ! tty > /dev/null; then
exec > >(exec /bin/logger -p syslog.warning -t mytag -i) 2>&1
fi
This doesn't prevent the subshell from starting, but instead of running /bin/logger as a subprocess (of the subshell), the subshell gets replaced with /bin/logger. I haven't tested this with logger, but it worked fine in a quick test I did with cat.
Look at the PPID column (parent process); I think you'll see that the 2 processes are connected to each other.
Generally, commands surrounded by '( )' pairs indicate 'running-as-a-subprocess', hence 2 listings in ps because there are 2 copies of the process.
(I'm not familiar with the bash syntax exec > **${spaceChar}** >( .... ) 2>&1, meaning the '>' separated by a space from the 2nd '>' )
What is wrong with a crontab entry?
