PS command does not show running process - shell

I have a very long-running shell script (it runs for more than 24 hours).
It is a very simple script: it just reads XML files from a directory and performs a sed operation on the content of each file. There are 1 million XML files in the directory.
My script is named something like runDataManipulation.sh.
When I run the following command
ps -ef | grep "runDa*"
then sometimes I see my process as
username 34535 1 48 11:42:01 - 224:22 /usr/bin/ksh ./runDataManipulation.sh
But if I run exactly the same command a couple of seconds later, I don't see the above process at all.
Since my process runs the whole time, I expect the ps command to show it every time.
If I grep for the process id of my script, like ..
ps -ef | grep 34535
then sometimes I see a result like
username 34535 1 51 11:42:01 - 229:22 [ksh]
and sometimes I see a result like
username 45678 34535 0 14:12:11 - 0:0 [sed]
My main question is: why do I not see my process when I grep for it using the script name? I am using AIX 6.1.

It looks to me like your script is spawning off another process.
If you look at the results of your ps command below, the first line shows process id 34535; this is the main (parent) process.
username 34535 1 51 11:42:01 - 229:22 [ksh]
This process in turn fires off another process, which can be seen below. Notice that the id of the parent process (34535) appears in the line: the first number is the id of the new process and the second number is the id of the calling (parent) process.
username 45678 34535
If you change your ps command to also match the sed command, you should always see some results while your script is still running!
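For example, something along these lines should match either the script or the sed child it spawns (a minimal sketch; -E for extended regular expressions is POSIX grep, so it should be available on AIX 6.1):
ps -ef | grep -E "runDataManipulation|sed" | grep -v grep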

Related

check autosys job status via unix without using autorep command

I am trying to fetch the status of an Autosys job via a shell script, without using the autorep command.
Input:
user id PID Env(AUTOSYS_JOB_NAME="test_job",script_location="/tmp/SA/test.sh")
From the above output I want to fetch the following details:
run id
the job status (success/failure/in progress) every time
script location
start time
end time
while the job is running, and it is getting overwritten in my SQL Server DB under the following table.
I have tried
ps aux | grep -v grep | grep 'user_id' | grep 'AUTOSYS_JOB_NAME'
The above command is not working as expected.
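Assuming the ps line really does contain key="value" pairs as shown above, one way to pull out individual fields is a sed capture. A minimal, hedged sketch (the field names are taken from the sample line; the status, start time and end time would have to come from another source, since ps does not report them):
ps aux | grep -v grep | grep 'user_id' | grep 'AUTOSYS_JOB_NAME' |
sed -n 's/.*AUTOSYS_JOB_NAME="\([^"]*\)".*script_location="\([^"]*\)".*/job=\1 script=\2/p'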

Stopping a task in shell script if logs contain specific string

I am trying to run a command in a shell script and would like to exit it if the processing logs (not sure what you call the logs that are output to the terminal while the task is running) contain the string "INFO | Next session will start at".
I tried using grep, but because the string "INFO | Next session will start at" is not on stdout, it is not detected while the command is running.
The specific command I'm running is below:
pipenv run python3 run.py --config accounts/user/config.yml
By 'processing logs' I mean the log output before the stdout is displayed in the terminal.
...
[D 211127 10:07:12 init:400] atx-agent version 0.10.0
[D 211127 10:07:12 init:403] device wlan ip: route ip+net: no such network interface
[11/27 10:07:12] INFO | Time delta has set to 00:11:51.
[11/27 10:07:13] INFO | Kill atx agent.
[11/27 09:59:32] INFO | Next session will start at: 10:28:30 (2021/11/27).
[11/27 09:59:32] INFO | Time left: 00:28:57.
I am trying to do this because the yml file I'm trying to run has a limit on what time you can execute it, and I would like to exit the task if the time is not met.
I tried to give as much context but if there's something missing please let me know.
This may work:
pipenv run python3 run.py --config accounts/user/config.yml |
sed "/INFO | Next session will start at/q"
sed prints the piped input until it matches the expression, then quits (q). The program will receive SIGPIPE (broken pipe) when it tries to continue writing, and will (likely) exit. It's the same as what happens when you do something like find | head.
You could also use kill in a shell wrapper:
sh -c 'pipenv run python3 run.py --config accounts/user/config.yml |
{ sed "/INFO | Next session will start at/q"; kill -- -$$; }'
Notes:
The program may print a different log if stdout is not a terminal.
If you want to match a literal string, you could use grep -Fm 1 PATTERN, but other log output will be hidden. grep fails if no match, which can be useful.
This will work in any shell, including zsh. zsh or bash can also be used for the kill wrapper.
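If the log lines turn out to be written to stderr rather than stdout (which would explain why grep did not see them), you could merge the two streams before the pipe; a minimal sketch:
pipenv run python3 run.py --config accounts/user/config.yml 2>&1 |
sed "/INFO | Next session will start at/q"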
There are other approaches. This thread focuses on tail, but is a useful reference: https://superuser.com/questions/270529/monitoring-a-file-until-a-string-is-found

Determining all the processes started with a given executable in Linux

I need to collect/log all the command lines that were used to start a process on my machine during the execution of a Perl script, which happens to be a test automation script. This Perl script starts the executable in question (MySQL) multiple times with various command lines, and I would like to inspect all of the command lines of those invocations. What would be the right way to do this? One possibility I see is to run something like "ps -aux | grep mysqld | grep -v grep" in a loop in a shell script and capture the results in a file, but then I would have to do some post-processing to remove duplicates etc., and I could possibly miss some process command lines because of timing issues. Is there a better way to achieve this?
Processing the ps output can always miss some processes; it will only capture the ones that exist at the moment you run it. The best way would be to modify the Perl script to log each command before or after it executes it.
If that's not an option, you can get the child pids of the perl script by running:
pgrep -P $pid -a
-a gives the full process command. $pid is the pid of the perl script. Then process just those.
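If you do go the polling route, a small loop around pgrep can at least deduplicate the output for you. A minimal sketch (the log path and one-second interval are arbitrary, and very short-lived processes can still be missed):
# poll the Perl script's children while it is alive (assumes $pid is set)
while kill -0 "$pid" 2>/dev/null; do
    pgrep -P "$pid" -a >> /tmp/child_cmds.log
    sleep 1
done
sort -u /tmp/child_cmds.log   # drop duplicate command lines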
You could use strace to log calls to execve.
$ strace -f -o strace.out -e execve perl -e 'system("echo hello")'
hello
$ egrep ' = 0$' strace.out
11232 execve("/usr/bin/perl", ["perl", "-e", "system(\"echo hello\")"], 0x7ffc6d8e3478 /* 55 vars */) = 0
11233 execve("/bin/echo", ["echo", "hello"], 0x55f388200cf0 /* 55 vars */) = 0
Note that strace.out will also show the failed execs (where execve returned -1), hence the egrep command to find the successful ones. A successful execve call does not return, but strace records it as if it returned 0.
Be aware that this is a relatively expensive solution because it is necessary to include the -f option (follow forks), as perl will be doing the exec call from forked subprocesses. This is applied recursively, so it means that your MySQL executable will itself run through strace. But for a one-off diagnostic test it might be acceptable.
Because of the need to use recursion, any exec calls done from your MySQL executable will also appear in strace.out, and you will have to filter those out. But the PID is shown for all calls, and if you were to log any fork or clone calls as well (i.e. strace -e execve,fork,clone), you would see both the parent and child PIDs, in the form <parent_pid> clone(......) = <child_pid>, so you should hopefully have enough information to reconstruct the process tree and decide which processes you are interested in.
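A minimal sketch of that variant, reusing the earlier example (the final grep is just one way to drop the failed calls):
$ strace -f -o strace.out -e trace=execve,fork,clone perl -e 'system("echo hello")'
hello
$ grep -E 'execve|clone' strace.out | grep -v ' = -1'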

Repeating Bash Task using At

I am running Ubuntu 13.10 and want to write a bash script that will execute a given task at non-predetermined time intervals. My understanding is that cron jobs require me to know when the task will be performed again. Thus, it was recommended that I use "at."
I'm having a bit of trouble using "at." Based on some experimentation, I've found that
echo "hello" | at now + 1 minutes
will run in my terminal (with and without quotes). Running "atq" results in my computer telling me that the command is in the queue. However, I never see the results of the command. I assume that I'm doing something wrong, but the manpages don't seem to be telling me anything useful.
Thanks in advance for any help.
Besides the fact that commands are run without a terminal (output and input are probably redirected to /dev/null), your command would also not run, since what you're passing to at is not echo hello but just hello. Unless hello is really an existing command, it won't run. What you probably want is:
echo "echo hello" | at now + 1 minutes
If you want to know if your command is really running, try redirecting the output to a file:
echo "echo hello > /var/tmp/hello.out" | at now + 1 minutes
Check the file later.
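Once that works, the usual way to get a repeating job with at (since the question is about repeating a task at non-predetermined intervals) is to have the job re-queue itself. A minimal sketch, with the script path and the way the next delay is chosen purely as placeholders:
#!/bin/bash
# /var/tmp/repeat.sh -- hypothetical self-rescheduling job
echo "hello $(date)" >> /var/tmp/hello.out
delay=$(( (RANDOM % 10) + 1 ))   # pick the next interval however you like
echo "bash /var/tmp/repeat.sh" | at now + ${delay} minutes
Kick it off once with echo "bash /var/tmp/repeat.sh" | at now + 1 minutes and it will keep rescheduling itself.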

How to get the time of a process created remotely via ssh?

I am currently writing a script whose purpose is to kill a process whose running time exceeds a threshold. The "killer" is run on a bunch of hosts and the jobs are sent by a server to each host. My idea was to use 'ps' to get the running time of the job, but all it prints is
17433 17433 ? 00:00:00 python
no matter how long I wait.
I tried to find a simplified example to avoid posting all the code I wrote. Let's call S the server and H the host.
If I do the following steps:
1) ssh login@H from the server
2) python myscript.py (now logged in on the host)
3) ps -U login (still on the host)
I get the same result as the one above, 00:00:00, as far as the time is concerned.
How can I get the real execution time? When I do everything locally on my machine, it works fine.
I thank you very much for your help.
V.
Alternatively, you can look at the pid file your process created in /var/run, assuming it created one, and use the find command to see if its age exceeds a certain threshold:
find /var/run/ -name "myProcess.pid" -mmin +1440
This will return the filename if it meets the criterion (last modified more than one day ago). You probably also want to check that the process is actually running, as it may have crashed and left the pid file behind.
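A minimal sketch of that liveness check, using the same hypothetical pid file:
pid=$(cat /var/run/myProcess.pid)
if kill -0 "$pid" 2>/dev/null; then
    echo "process $pid is still running"
else
    echo "stale pid file"
fi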
If you want to know how long the process has been alive, you could try
stat -c %Y /proc/`pgrep python`
...which will give it back to you in epoch time. If you instead want the kill in one go, I suggest using the find approach mentioned above (but perhaps pointed at /proc).
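To turn that into an elapsed time, you can subtract it from the current epoch time; a minimal sketch, assuming there is exactly one matching python process:
start=$(stat -c %Y /proc/$(pgrep -x python))
now=$(date +%s)
echo "running for $(( now - start )) seconds"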
Try this out:
ps kstart_time -ef | grep myProc.py | awk '{print $5}'
This will show the start date/time of the process myProc.py:
[ 19:27 root#host ~ ]# ps kstart_time -ef | grep "httpd\|STIME" | awk '{print $5}'
STIME
19:25
Another option is etime.
etime is the elapsed time since the process was started, in the form dd-hh:mm:ss. dd is the number of days; hh, the number of hours; mm, the number of minutes; ss, the number of seconds.
[ 19:47 root#host ~ ]# ps -eo cmd,etime
CMD ELAPSED
/bin/sh 2-16:04:45
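Since the goal is to kill processes that exceed a threshold, the etimes variant (elapsed time in plain seconds, supported by procps-ng ps on Linux) is convenient for comparisons. A minimal sketch, with the user name, process name and 24-hour threshold as placeholders:
# kill python processes owned by 'login' that have run longer than 24 hours
for pid in $(pgrep -u login python); do
    elapsed=$(ps -o etimes= -p "$pid" | tr -d ' ')
    if [ "$elapsed" -gt 86400 ]; then
        kill "$pid"
    fi
done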
And yet another way to do this:
Get the process pid and read off the timestamp in the corresponding subdirectory in /proc.
First, get the process pid using the ps command (ps -ef or ps aux).
Then, use the ls command to display the creation timestamp of the directory.
[ 19:57 root#host ~ ]# ls -ld /proc/1218
dr-xr-xr-x 5 jon jon 0 Sep 20 16:14 /proc/1218
You can tell from the timestamp that the process 1218 began executing on Sept 20, 16:14.
