How do I get the Windows PPID of a process running under Cygwin? - bash

I need to kill the Windows processes that were started by a program launched from Cygwin.
Here's what I do:
${wccoaDirNix}/bin/WCCILpmon.exe -proj ${projName} -user root: &
This process creates other Windows processes:
$ ps -W
PID PPID PGID WINPID TTY UID STIME COMMAND
1960 1 1960 1960 ? 197609 19:21:57 /usr/bin/mintty
7316 0 0 7316 ? 0 19:21:57 C:\Windows\System32\conhost.exe
1700 1960 1700 1576 pty1 197609 19:21:57 /usr/bin/bash
I 10760 9840 10760 7560 pty0 197609 19:25:47 /usr/bin/bash
32 10760 10760 32 pty0 197609 19:26:28 /cygdrive/c/Siemens/Automation/WinCC_OA/3.14/bin/WCCILpmon
6264 0 0 6264 ? 0 19:26:28 C:\Siemens\Automation\WinCC_OA\3.14\bin\WCCILpmon.exe
8420 0 0 8420 ? 0 19:26:29 C:\Siemens\Automation\WinCC_OA\3.14\bin\WCCILdata.exe
6336 0 0 6336 ? 0 19:26:29 C:\Windows\System32\conhost.exe
2808 0 0 2808 ? 0 19:26:30 C:\Siemens\Automation\WinCC_OA\3.14\bin\WCCILevent.exe
6784 0 0 6784 ? 0 19:26:30 C:\Windows\System32\conhost.exe
2972 0 0 2972 ? 0 19:26:30 C:\Siemens\Automation\WinCC_OA\3.14\bin\WCCOActrl.exe
11004 0 0 11004 ? 0 19:26:30 C:\Windows\System32\conhost.exe
9536 0 0 9536 ? 0 19:26:31 C:\Siemens\Automation\WinCC_OA\3.14\bin\WCCILsim.exe
7372 0 0 7372 ? 0 19:26:31 C:\Windows\System32\conhost.exe
9128 0 0 9128 ? 0 19:26:31 C:\Siemens\Automation\WinCC_OA\3.14\bin\WCCOAui.exe
3964 0 0 3964 ? 0 19:27:48 C:\Siemens\Automation\WinCC_OA\3.14\bin\WCCILdatabg.exe
How can I kill them?
I tried to kill them with the following command:
ps -W | grep "WCC" | awk '{print $1}' | xargs kill -f;
But it does not work as intended: it kills every process whose name contains the letters WCC, whereas I need to terminate only the child processes of WCCILpmon.exe.
I also read a question on the Cygwin mailing list about the same problem, and it discouraged me. Is there really no way to do this?

Since you need to kill a non-Cygwin process, it is better to use a Windows-specific program.
One example is PsKill:
https://learn.microsoft.com/en-us/sysinternals/downloads/pskill
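If installing an extra tool is not an option, a rough sketch of the same idea with stock tools might look like the following. It assumes that Windows records the WCC* processes as descendants of WCCILpmon.exe, so that taskkill's /T (process tree) switch reaches them, and that taskkill.exe is on PATH. The helper names winpid_of and kill_tree_of are made up for this example. PsKill offers a comparable -t switch.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of Cygwin or WinCC OA.

# Read `ps -W` output on stdin and print the WINPID (4th column) of every
# process whose command ends in the given executable name.
# Assumes the command path contains no spaces.
winpid_of() {
    awk -v exe="$1" '
        {
            # Some rows start with a status flag such as "I"; skip it.
            off = ($1 ~ /^[0-9]+$/) ? 0 : 1
            cmd = $NF
            gsub(/\\/, "/", cmd)      # normalise Windows path separators
            sub(/.*\//, "", cmd)      # keep only the file name
            if (tolower(cmd) == tolower(exe)) print $(4 + off)
        }'
}

# Kill each matching process together with the processes it started.
# /T = terminate the process tree, /F = force.
kill_tree_of() {
    ps -W | winpid_of "$1" | while read -r winpid; do
        taskkill /F /T /PID "$winpid"
    done
}
```

With the listing from the question, kill_tree_of WCCILpmon.exe would run taskkill against WINPID 6264 only. It is worth checking first that the other WCC* processes really are its descendants on the Windows side, since ps -W reports PPID 0 for them.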

Related

Gearman worker in shell hangs as a zombie

I have a Gearman worker in a shell script started with perp in the following way:
runuid -s gds \
/usr/bin/gearman -h 127.0.0.1 -t 1000 -w -f gds-rel \
-- xargs /home/gds/gds-rel-worker.sh < /dev/null 2>/dev/null
The worker only does some input validation and calls another shell script run.sh that invokes bash, curl, Terragrunt, Terraform, Ansible and gcloud to provision and update resources in GCP like this:
./run.sh --release 1.2.3 2>&1 >> /var/log/gds-release
The script is intended to run unattended. The problem is that after the job finishes successfully (that is, both shell scripts run.sh and gds-rel-worker.sh), the Gearman job remains in the executing state because the child process becomes a zombie (see the last line below).
root 144748 1 0 Apr29 ? 00:00:00 perpboot -d /etc/perp
root 144749 144748 0 Apr29 ? 00:00:00 \_ tinylog -k 8 -s 100000 -t -z /var/log/perp/perpd-root
root 144750 144748 0 Apr29 ? 00:00:00 \_ perpd /etc/perp
root 2492482 144750 0 May14 ? 00:00:00 \_ tinylog (gearmand) -k 10 -s 100000000 -t -z /var/log/perp/gearmand
gearmand 2492483 144750 0 May14 ? 00:00:08 \_ /usr/sbin/gearmand -L 127.0.0.1 -p 4730 --verbose INFO --log-file stderr --keepalive --keepalive-idle 120 --keepalive-interval 120 --keepalive-count 3 --round-robin --threads 36 --worker-wakeup 3 --job-retries 1
root 2531800 144750 0 May14 ? 00:00:00 \_ tinylog (gds-rel-worker) -k 10 -s 100000000 -t -z /var/log/perp/gds-rel-worker
gds 2531801 144750 0 May14 ? 00:00:00 \_ /usr/bin/gearman -h 127.0.0.1 -t 1000 -w -f gds-rel -- xargs /home/gds/gds-rel-worker.sh
gds 2531880 2531801 0 May14 ? 00:00:00 \_ [xargs] <defunct>
So far I have traced the problem to run.sh, because if I replace its call with something simpler (e.g. echo "Hello"; sleep 5) the worker does not hang. Unfortunately, I have no clue what is causing the problem. The script run.sh is rather long and complex, but has been working without a problem so far. Tracing the worker process I see this:
getpid() = 2531801
write(2, "gearman: ", 9) = 9
write(2, "gearman_worker_work", 19) = 19
write(2, " : ", 3) = 3
write(2, "gearman_wait(GEARMAN_TIMEOUT) ti"..., 151) = 151
write(2, "\n", 1) = 1
sendto(5, "\0REQ\0\0\0'\0\0\0\0", 12, MSG_NOSIGNAL, NULL, 0) = 12
recvfrom(5, "\0RES\0\0\0\n\0\0\0\0", 8192, MSG_NOSIGNAL, NULL, NULL) = 12
sendto(5, "\0REQ\0\0\0\4\0\0\0\0", 12, MSG_NOSIGNAL, NULL, 0) = 12
poll([{fd=5, events=POLLIN}, {fd=3, events=POLLIN}], 2, 1000) = 1 ([{fd=5, revents=POLLIN}])
sendto(5, "\0REQ\0\0\0'\0\0\0\0", 12, MSG_NOSIGNAL, NULL, 0) = 12
recvfrom(5, "\0RES\0\0\0\6\0\0\0\0\0RES\0\0\0(\0\0\0QH:terra-"..., 8192, MSG_NOSIGNAL, NULL, NULL) = 105
pipe([6, 7]) = 0
pipe([8, 9]) = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fea38480a50) = 2531880
close(6) = 0
close(9) = 0
write(7, "1.2.3\n", 18) = 6
close(7) = 0
read(8, "which: no terraform-0.14 in (/us"..., 1024) = 80
read(8, "Identity added: /home/gds/.ssh/i"..., 1024) = 54
read(8, 0x7fff6251f5b0, 1024) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=2531880, si_uid=1006, si_status=0, si_utime=0, si_stime=0} ---
read(8,
So the worker continues reading standard output even though the child has finished successfully and presumably closed it. Any ideas how to catch what causes this problem?
I was able to solve it. The script run.sh was starting ssh-agent, which kept running after the script exited and still held the write end of the pipe that Gearman uses to capture the job's output, so the worker kept reading that file descriptor even after the script had completed successfully.
I found it by examining the open file descriptors of the Gearman worker process after it hung:
# ls -l /proc/2531801/fd/*
lr-x------. 1 gds devops 64 May 17 11:26 /proc/2531801/fd/0 -> /dev/null
l-wx------. 1 gds devops 64 May 17 11:26 /proc/2531801/fd/1 -> 'pipe:[9356665]'
l-wx------. 1 gds devops 64 May 17 11:26 /proc/2531801/fd/2 -> 'pipe:[9356665]'
lr-x------. 1 gds devops 64 May 17 11:26 /proc/2531801/fd/3 -> 'pipe:[9357481]'
l-wx------. 1 gds devops 64 May 17 11:26 /proc/2531801/fd/4 -> 'pipe:[9357481]'
lrwx------. 1 gds devops 64 May 17 11:26 /proc/2531801/fd/5 -> 'socket:[9357482]'
lr-x------. 1 gds devops 64 May 17 11:26 /proc/2531801/fd/8 -> 'pipe:[9369888]'
Then I identified the processes using the inode of the pipe behind file descriptor 8, which the Gearman worker kept reading:
# lsof | grep 9369888
gearman 2531801 gds 8r FIFO 0,13 0t0 9369888 pipe
ssh-agent 2531899 gds 9w FIFO 0,13 0t0 9369888 pipe
Finally, I listed the files opened by ssh-agent and found what is behind its file descriptor 3:
# ls -l /proc/2531899/fd/*
lrwx------. 1 root root 64 May 17 11:14 /proc/2531899/fd/0 -> /dev/null
lrwx------. 1 root root 64 May 17 11:14 /proc/2531899/fd/1 -> /dev/null
lrwx------. 1 root root 64 May 17 11:14 /proc/2531899/fd/2 -> /dev/null
lrwx------. 1 root root 64 May 17 11:14 /proc/2531899/fd/3 -> 'socket:[9346577]'
# lsof | grep 9346577
ssh-agent 2531899 gds 3u unix 0xffff89016fd34000 0t0 9346577 /tmp/ssh-0b14coFWhy40/agent.2531898 type=STREAM
As a solution, I added a kill of ssh-agent before run.sh exits, and now no jobs hang because of a zombie process.
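The underlying effect can be reproduced without Gearman or ssh-agent. The sketch below is illustrative only: sleep stands in for ssh-agent, cat stands in for the worker reading the job's output, and the function names are invented for the example.

```shell
#!/usr/bin/env bash
# A pipe reader reaches EOF only after *every* holder of the write end has
# closed it, so a leftover background process keeps the reader waiting.

# Job that exits while a background process still holds the inherited stdout.
leaky_job() {
    sleep 2 &                 # stand-in for ssh-agent
    echo "job finished"
}

# Same job, but the background process is killed before the job returns.
clean_job() {
    sleep 2 &
    bg_pid=$!
    echo "job finished"
    kill "$bg_pid"
}

# Pipe a job into a reader and print the whole seconds until the reader saw EOF.
time_reader() {
    start=$(date +%s)
    "$1" | cat >/dev/null
    echo $(( $(date +%s) - start ))
}
```

Here time_reader leaky_job reports about two seconds, while time_reader clean_job returns almost immediately. In run.sh itself, one way to get the clean behaviour on every exit path would be trap 'ssh-agent -k >/dev/null' EXIT placed right after the agent is started, assuming it was started with eval "$(ssh-agent -s)" so that SSH_AGENT_PID is set.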

verifying where 'kworker/n:n' (in ps -aux) is invoked from

In the output of 'ps -aux', I couldn't find a way to determine where the 'kworker/...' threads are created from and which modules/functions are related to them.
Please let me know how I can find out where kworkers come from, by PID or otherwise.
I've tried checking the files in /proc, but nothing there shows this.
$ ps -aux | grep kworker
root 15 0.0 0.0 0 0 ? S Aug12 0:00 [kworker/1:0]
root 16 0.0 0.0 0 0 ? S< Aug12 0:00 [kworker/1:0H]
root 85 0.0 0.0 0 0 ? S< Aug12 0:09 [kworker/0:1H]
root 3562 0.0 0.0 0 0 ? S< Aug12 0:00 [kworker/0:2H]
root 5578 0.0 0.0 0 0 ? S 11:13 0:01 [kworker/0:0]
root 5579 0.0 0.0 0 0 ? S 11:13 0:00 [kworker/u4:1]
root 8789 0.1 0.0 0 0 ? S 12:19 0:10 [kworker/0:2]
root 30236 0.0 0.0 0 0 ? S 08:39 0:01 [kworker/u4:0]
A good solution for these kinds of problems that I'm familiar with is to use the perf tool (It's not always enabled by default and you may need to install perf on your device).
Step 1: Set perf to record workqueue events:
perf record -e 'workqueue:*' -ag -T
Step 2: Run it as long as you think you need to catch the event (10 seconds should be ok if this event is frequent enough, but you can let it run longer, depending on the available free space you have left on your device) and then stop it with Ctrl + C.
Step 3: Print the captured events (on Linux versions < 4.1 I think it should be -f and not -F):
perf script -F comm,pid,tid,time,event,trace
This will display something like this: 
task-name pid/tid timestamp event
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
turtle   9201/9201 1473.339166:  workqueue:workqueue_queue_work: work struct=0xef20d4c4 function=pm_runtime_work workqueue=0xef1cb600 req_cpu=8 cpu=1
turtle   9201/9201 1473.339176: workqueue:workqueue_activate_work: work struct 0xef20d4c4
kworker/0:3  24223/24223 1473.339221: workqueue:workqueue_execute_start: work struct 0xef20d4c4: function pm_runtime_work
kworker/0:3  24223/24223 1473.339248:  workqueue:workqueue_execute_end: work struct 0xef20d4c4
Step 4: Analyzing the table above:
In the first row, a task named turtle (pid 9201) is pushing the work pm_runtime_work to the workqueue.
In the third row, we can see that the kworker/0:3 (pid 24223) is executing that work.
Summary: Back to your question, we see that the turtle task requested kworker/0:3 to run the pm_runtime_work function.
Now, if you want to dig further, you'll have to step into the code and see what the pm_runtime_work function does. Good luck!
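To get an overview rather than reading the trace line by line, the perf script output can be summarized with standard tools. The following is a sketch that assumes the line layout shown in Step 3 (task name first, function name last on workqueue_execute_start lines). The function name summarize_kworker_work is made up for the example, and the field layout may differ between perf versions.

```shell
#!/usr/bin/env bash
# Count how often each kworker executed each work function.
# Reads `perf script` output on stdin.
summarize_kworker_work() {
    awk '
        /workqueue_execute_start/ {
            fn = $NF
            sub(/^function=/, "", fn)   # tolerate "function=name" layouts
            print $1, fn
        }' | sort | uniq -c | sort -rn
}
```

It would be used as perf script -F comm,pid,tid,time,event,trace | summarize_kworker_work, which prints one line per kworker/function pair, most frequent first.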

How to get bash to print the output without the fields with zero size when running smem command?

Here is the 'smem' command I run on a Red Hat/CentOS Linux system. I want the output printed without the rows whose swap size is zero, but still including the column headings.
smem -kt -c "pid user command swap"
PID User Command Swap
7894 root /sbin/agetty --noclear tty1 0
9666 root ./nimbus /opt/nimsoft 0
7850 root /sbin/auditd 236.0K
7885 root /usr/sbin/irqbalance --fore 0
11205 root nimbus(hdb) 0
10701 root nimbus(spooler) 0
8446 trapsanalyzer1 /opt/traps/analyzerd/analyz 0
50316 apache /usr/sbin/httpd -DFOREGROUN 0
50310 apache /usr/sbin/httpd -DFOREGROUN 0
3971 root /usr/sbin/lvmetad -f 36.0K
63988 root su - 0
7905 ntp /usr/sbin/ntpd -u ntp:ntp - 4.0K
7876 dbus /usr/bin/dbus-daemon --syst 44.0K
9672 root nimbus(controller) 0
7888 root /usr/lib/systemd/systemd-lo 0
63990 root -bash 0
59978 postfix pickup -l -t unix -u 0
3977 root /usr/lib/systemd/systemd-ud 736.0K
9016 postfix qmgr -l -t unix -u 0
50303 root /usr/sbin/httpd -DFOREGROUN 0
3941 root /usr/lib/systemd/systemd-jo 52.0K
8199 root //usr/lib/vmware-caf/pme/bi 0
8598 daemon /opt/quest/sbin/.vasd -p /v 0
8131 root /usr/sbin/vmtoolsd 0
7881 root /usr/sbin/NetworkManager -- 8.0K
8364 root /opt/puppetlabs/puppet/bin/ 0
8616 daemon /opt/quest/sbin/.vasd -p /v 0
23290 root /usr/sbin/rsyslogd -n 3.8M
64091 root python /bin/smem -kt -c pid 0
7887 polkitd /usr/lib/polkit-1/polkitd - 0
8363 root /usr/bin/python2 -Es /usr/s 0
53606 root /usr/share/metricbeat/bin/m 0
24631 nagios /usr/local/ncpa/ncpa_passiv 0
24582 nagios /usr/local/ncpa/ncpa_listen 0
7886 root /opt/traps/bin/authorized 76.0K
7872 root /opt/traps/bin/pmd 12.0K
8374 root /opt/puppetlabs/puppet/bin/ 0
7883 root /opt/traps/bin/trapsd 64.0K
----------------------------------------------------
54 10 5.1M
Like this?:
$ awk '$NF!=0' file
PID User Command Swap
7850 root /sbin/auditd 236.0K
...
7883 root /opt/traps/bin/trapsd 64.0K
----------------------------------------------------
54 10 5.1M
But instead of using the form awk ... file, you'd probably want smem ... | awk '$NF!=0'.
Could you please try the following? As an extra precaution, it removes trailing whitespace from each line (in case there is any) before testing the last field.
smem -kt -c "pid user command swap" | awk 'FNR==1{print;next} {sub(/[[:space:]]+$/,"")} $NF==0{next} 1'

make zpool status output scriptable

How can I turn the output of zpool status -v into something usable, with data that matches by row in a data-oriented format, instead of the silly "visual" output it uses, so that it is scriptable with standard Unix-like utilities? I had a Python script that did something acceptable, but Python 3 completely breaks it, and I'm not fixing it just to have some new version of Python break it again. (After fiddling with the script until it ran with no errors, it returns nothing :)
Basically, this space-aligned mess:
pool: data
state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
the pool may no longer be accessible by software that does not support
the features. See zpool-features(7) for details.
scan: scrub repaired 0 in 4h52m with 0 errors on Fri Aug 18 04:52:47 2017
config:
NAME STATE READ WRITE CKSUM
data ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
gptid/6dfb7dbe-68c5-11e6-982d-00e04c68f511 ONLINE 0 0 0
gptid/27f40ebe-8f1b-11e4-94f8-3085a9405b85 ONLINE 0 0 0
gptid/9244318f-c1b4-11e6-a31d-0cc47ae2abe8 ONLINE 0 0 0
mirror-1 ONLINE 0 0 0
gptid/1993f2d7-8f1b-11e4-94f8-3085a9405b85 ONLINE 0 0 0
gptid/529e2c88-f1d1-11e6-89c3-0cc47ae2abe8 ONLINE 0 0 0
gptid/53a09a3e-f1d1-11e6-89c3-0cc47ae2abe8 ONLINE 0 0 0
mirror-2 ONLINE 0 0 0
gptid/51f3b377-6a20-11e6-be8c-00e04c68f511 ONLINE 0 0 0
gptid/9fb54bde-1e2d-11e7-a83e-0cc47ae2abe8 ONLINE 0 0 0
gptid/9eebde32-1e2d-11e7-a83e-0cc47ae2abe8 ONLINE 0 0 0
cache
gptid/63db5172-20bd-11e7-b561-0cc47ae2abe8 ONLINE 0 0 0
errors: No known data errors
to something with actual columns like this:
NAME STATE READ WRITE CKSUM
data ONLINE 0 0 0
data mirror-0 ONLINE 0 0 0
data mirror-0 gptid/6dfb7dbe-68c5-11e6-982d-00e04c68f511 ONLINE 0 0 0
data mirror-0 gptid/27f40ebe-8f1b-11e4-94f8-3085a9405b85 ONLINE 0 0 0
data mirror-0 gptid/9244318f-c1b4-11e6-a31d-0cc47ae2abe8 ONLINE 0 0 0
data mirror-1 ONLINE 0 0 0
data mirror-1 gptid/1993f2d7-8f1b-11e4-94f8-3085a9405b85 ONLINE 0 0 0
data mirror-1 gptid/529e2c88-f1d1-11e6-89c3-0cc47ae2abe8 ONLINE 0 0 0
data mirror-1 gptid/53a09a3e-f1d1-11e6-89c3-0cc47ae2abe8 ONLINE 0 0 0
data mirror-2 ONLINE 0 0 0
data mirror-2 gptid/51f3b377-6a20-11e6-be8c-00e04c68f511 ONLINE 0 0 0
data mirror-2 gptid/9fb54bde-1e2d-11e7-a83e-0cc47ae2abe8 ONLINE 0 0 0
data mirror-2 gptid/9eebde32-1e2d-11e7-a83e-0cc47ae2abe8 ONLINE 0 0 0
data cache
data cache gptid/63db5172-20bd-11e7-b561-0cc47ae2abe8 ONLINE 0 0 0
I can use perl to remove and rearrange, but I can't work out how to match the rows dynamically, in a way that would work with mirror/raidz123/stripe/cache.
datadata ONLINE 0 0 0
data mirror-0 ONLINE 0 0 0
data gptid/6dfb7dbe-68c5-11e6-982d-00e04c68f511 ONLINE 0 0 0
data gptid/27f40ebe-8f1b-11e4-94f8-3085a9405b85 ONLINE 0 0 0
data gptid/9244318f-c1b4-11e6-a31d-0cc47ae2abe8 ONLINE 0 0 0
data mirror-1 ONLINE 0 0 0
data gptid/1993f2d7-8f1b-11e4-94f8-3085a9405b85 ONLINE 0 0 0
data gptid/529e2c88-f1d1-11e6-89c3-0cc47ae2abe8 ONLINE 0 0 0
data gptid/53a09a3e-f1d1-11e6-89c3-0cc47ae2abe8 ONLINE 0 0 0
data mirror-2 ONLINE 0 0 0
data gptid/51f3b377-6a20-11e6-be8c-00e04c68f511 ONLINE 0 0 0
data gptid/9fb54bde-1e2d-11e7-a83e-0cc47ae2abe8 ONLINE 0 0 0
data gptid/9eebde32-1e2d-11e7-a83e-0cc47ae2abe8 ONLINE 0 0 0
datacache
data gptid/63db5172-20bd-11e7-b561-0cc47ae2abe8 ONLINE 0 0 0
This is the code that generates the above.
zpool status -v data | sed '/ data/, $!d' | grep -v errors: > /tmp/diskslistzpoolstatusdata
perl -pi -e 's/^\n$//' /tmp/diskslistzpoolstatusdata #remove blank lines
perl -pi -e 's/\t$//' /tmp/diskslistzpoolstatusdata
perl -p -i -e 's/\t//g' /tmp/diskslistzpoolstatusdata
perl -pi -e 's/^/data/' /tmp/diskslistzpoolstatusdata
extra:
include the scrub summary and error lines per gptid
NAME STATE READ WRITE CKSUM
misc ONLINE 0 0 0
misc mirror-0 ONLINE 0 0 0
misc mirror-0 gptid/aefbaf6e-e004-11e6-8f42-0cc47ae2abe8 ONLINE 0 0 0 0err/4h52m/0err/Fri Aug 18 04:52:47 2017 No known data errors
misc mirror-0 gptid/affc3cac-e004-11e6-8f42-0cc47ae2abe8 ONLINE 0 0 0 0err/4h52m/0err/Fri Aug 18 04:52:47 2017 No known data errors
misc cache gptid/3139819b-20bd-11e7-b561-0cc47ae2abe8 ONLINE 0 0 0 0err/4h52m/0err/Fri Aug 18 04:52:47 2017 No known data errors
Unfortunately there is no integrated solution available. You have two options:
1. Parse it yourself in a language of your choice. You already extracted the essential information. The layout is relatively static, as vdevs and pools cannot be nested (pools contain vdevs, never pools themselves), the order is respected (no devices from vdev A come after vdev B), the keywords are few and fixed (mirror-N, raidzX-N, etc), and the output is quite small (less than hundreds of lines usually). This means you just have to go through each row, read the info you need, store it in nested objects or simply arrays and go to the next line.
2. Directly call the appropriate C functions to get the status in non-readable form and convert the output. To do this, have a look at status_callback(zpool_handle_t *zhp, void *data), where all printf-output is generated from the pool data. You could mirror this function to convert the output into a format you like instead of the indented format, and then call your mini-application from your script to give you your data.
If you are familiar with C, option 2 would be faster I think. Performance-wise it does not matter much, as the data is small (even on big systems) and the calls will most likely be very infrequent (as pool layouts do not change often).
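As a sketch of the parse-it-yourself option using only awk: it assumes the usual zpool status layout, in which the config rows start with a tab and each nesting level adds two spaces, and that auxiliary sections are named cache, logs, spares, special, or dedup. The function name zpool_flatten is invented for the example.

```shell
#!/usr/bin/env bash
# Flatten the config section of `zpool status` so every row carries its
# ancestors (pool, vdev, device). Reads `zpool status` output on stdin.
zpool_flatten() {
    awk '
        BEGIN {
            # Section keywords printed at pool level (assumed list).
            n = split("cache logs spares special dedup", k, " ")
            for (i = 1; i <= n; i++) aux[k[i]] = 1
        }
        /^errors:/ { in_cfg = 0; pool = ""; next }
        !in_cfg {
            # The NAME/STATE header marks the start of the config table.
            if ($1 == "NAME" && $2 == "STATE") {
                in_cfg = 1
                if (!hdr++) { $1 = $1; print }
            }
            next
        }
        NF == 0 { next }
        {
            # Depth = leading spaces after the tab, two per nesting level.
            line = $0
            sub(/^\t/, "", line)
            len = length(line)
            sub(/^ +/, "", line)
            depth = int((len - length(line)) / 2)

            if (depth == 0 && pool != "" && ($1 in aux)) {
                in_aux = 1; d = 1          # e.g. "cache" hangs under the pool
            } else if (depth == 0) {
                in_aux = 0; d = 0; pool = $1
            } else {
                d = depth + in_aux
            }
            path[d] = $1

            out = path[0]
            for (i = 1; i <= d; i++) out = out " " path[i]
            for (i = 2; i <= NF; i++) out = out " " $i
            print out
        }'
}
```

It would be used as zpool status -v data | zpool_flatten. Because the hierarchy is derived from indentation rather than from vdev names, the same logic should apply to mirror, raidz, and striped layouts, but it needs the raw output with its leading whitespace intact.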

How can I extract fields from output of more +n?

I need to extract certain fields from the output of the more +n command in Windows. The output of the command is shown below.
Backup SAP L01_xyzabc_d01p001_PBW_ON_Daily Completed full 9/17/2013 6:00:05 PM 0:00 5:49 2360.00 1 0 0 254 100% 2013/09/17-135
Backup SAP L01_xyzabc_d01p001_PEC_ON_Daily Completed full 9/17/2013 7:00:05 PM 0:00 1:37 549.89 1 0 0 75 100% 2013/09/17-142
Backup SAP L01_xyzabc_d01p001_PPI_ON_Daily Completed full 9/17/2013 7:00:07 PM 0:00 2:04 656.00 1 0 0 104 100% 2013/09/17-143
Backup SAP L01_xyzabc_d01p001_PEP_ON_Daily Completed full 9/17/2013 8:00:05 PM 0:00 0:09 12.89 1 0 0 15 100% 2013/09/17-148
Backup SAP L01_xyzabc_d01p001_PDI_ON_Daily Completed full 9/17/2013 9:00:05 PM 0:00 0:07 5.63 1 0 0 14 100% 2013/09/17-156
Backup SAP L01_xyzabc_d01p001_PSM_ON_Daily Completed full 9/17/2013 10:00:06 P 0:00 0:22 92.08 1 0 0 21 100% 2013/09/17-161
Backup SAP L01_xyzabc_d01p001_PMD_ON_Daily Completed full 9/17/2013 11:00:06 P 0:00 0:09 9.53 1 0 0 26 100% 2013/09/17-169
Can this be done without installing anything or without using PowerShell?
-Louie
Try using a for loop. This is a batch file version.
@echo off
for /f "tokens=1,2,3" %%a in ('more +n ...') do (
echo %%a %%b %%c
)
The tokens to use depend on which columns you want. You can see more information by typing help for at the command line.
