How to get the host name of kill signal sender in systemtap script - kill

all,
I encounter a problem when use systemtap script.
I don't know how to get the host name of kill signal sender in systemtap script.
for example. I am execute kill -9 xclock_process_pid in server 'sf1'. at the same time, I run 1.stap -x xclock_process_pid to monitor xclock,
is there any method to obtain the server name 'sf1' in systemtap script when send a kill -9 xclock_process_pid in 'sf1'?
but I am encounter some problem. my 1.stap is shown below:
#!/usr/bin/env stap
function hostname:string () %{
STAP_RETURN(current->nsproxy->uts_ns->name.nodename);
%}
probe oneshot {
log(hostname())
}
when I run 'stap -g 1.stap' will reprot the following error Could you help me? semantic error: probe point mismatch at position 0 (alternatives: __nfs __scheduler __signal __tcpmib __vm _linuxmib _signal _sunrpc _syscall _vfs begin begin(number) end end(number) error error(number) generic ioblock ioblock_trace ioscheduler ioscheduler_trace ipmib irq_handler kernel kprobe kprocess linuxmib module(string) nd_syscall netdev never nfs nfsd perf process process(number) process(string) procfs procfs(string) scheduler scsi signal socket softirq stap staprun sunrpc syscall tcp tcpmib timer tty udp vfs vm workqueue): identifier 'oneshot' at systemtap.stap:87:7 while resolving probe point oneshot source: probe oneshot { ^ Pass 2: analysis failed. Try again with another '--vp 01' option.

In other words, you're asking how to get the hostname of the current machine. A TCP/IP-level hostname is not in reach. The sethostname(2) level name is not easy to reach, being hidden inside kernel variables behind locked utsname()-> fields.
If safety were not a problem, you could do it via an embedded-C function for 4.8 era kernels:
function hostname:string () %{
STAP_RETURN(current->nsproxy->uts_ns->name.nodename);
%}
probe oneshot {
log(hostname())
}
that you would run with stap -g ... (guru mode).

Related

Running awesome-client from a script executing as root

Running Awesome on Debian (11) testing
awesome v4.3 (Too long)
• Compiled against Lua 5.3.3 (running with Lua 5.3)
• D-Bus support: ✔
• execinfo support: ✔
• xcb-randr version: 1.6
• LGI version: 0.9.2
I'm trying to signal to Awesome when systemd triggers suspend. After fiddling with D-Bus directly for awhile and getting nowhere, I wrote a couple of functions that somewhat duplicate the functionality of signals.
I tested it by running the following command in a shell, inside of my Awesome session:
$ awesome-client 'require("lib.syskit").signal("awesome-client", "Hello world!")'
This runs just fine. A notification posts to the desktop "Hello world!" as expected. I added the path to my lib.syskit code to the $LUA_PATH in my ~/.xsessionrc. Given the error described below, I doubt this is an issue.
Now for the more difficult part. I put the following in a script located at /lib/systemd/system-sleep/pre-suspend.sh
#!/bin/bash
if [ "${1}" == "pre" ]; then
ERR=$(export DISPLAY=":0"; sudo -u naddan awesome-client 'require("lib.syskit").signal("awesome-client", "pre-suspend")' 2>&1)
echo "suspending at `date`, ${ERR}" > /tmp/systemd_suspend_test
elif [ "${1}" == "post" ]; then
ERR=$(export DISPLAY=":0"; sudo -u naddan awesome-client 'require("lib.syskit").signal("awesome-client", "post-suspend")' 2>&1)
echo "resuming at `date`, ${ERR}" >> /tmp/systemd_suspend_test
fi
Here's the output written to /tmp/systemd_suspend_test
suspending at Thu 22 Jul 2021 10:58:01 PM MDT, Failed to open connection to "session" message bus: /usr/bin/dbus-launch terminated abnormally without any error message
E: dbus-send failed.
resuming at Thu 22 Jul 2021 10:58:05 PM MDT, Failed to open connection to "session" message bus: /usr/bin/dbus-launch terminated abnormally without any error message
E: dbus-send failed.
Given that I'm already telling it the $DISPLAY that Awesome is running under (this is a laptop), and that I'm running awesome-client as my user, not root, what else am I missing that's keeping this from working?
Is there a better way that I could achieve telling Awesome when the system suspends?
awesome-client is a shell script. It is a thin wrapper around dbus-send. Thus, since you write "After fiddling with D-Bus directly for awhile and getting nowhere", I guess the same reasoning applies.
Given that I'm already telling it the $DISPLAY that Awesome is running under (this is a laptop), and that I'm running awesome-client as my user, not root, what else am I missing that's keeping this from working?
You are missing the address of the dbus session bus. For me, it is:
$ env | grep DBUS
DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/1000/bus
Is there a better way that I could achieve telling Awesome when the system suspends?
Instead of sending a message directly to awesome via some script, you could use the existing mechanism for this: Dbus signals. These are broadcasts that interested parties can listen to.
Google suggests that there is already a PrepareForSleep signal:
https://serverfault.com/questions/573379/system-suspend-dbus-upower-signals-are-not-seen
Based on this, Google then gave me the following AwesomeWM lua code that listens for logind's PrepareForSleep signal (written by yours truely - thanks Google for finding that!):
https://github.com/awesomeWM/awesome/issues/344#issuecomment-328354719
local lgi = require("lgi")
local Gio = lgi.require("Gio")
local function listen_to_signals()
local bus = lgi.Gio.bus_get_sync(Gio.BusType.SYSTEM)
local sender = "org.freedesktop.login1"
local interface = "org.freedesktop.login1.Manager"
local object = "/org/freedesktop/login1"
local member = "PrepareForSleep"
bus:signal_subscribe(sender, interface, member, object, nil, Gio.DBusSignalFlags.NONE,
function(bus, sender, object, interface, signal, params)
-- "signals are sent right before (with the argument True) and
-- after (with the argument False) the system goes down for
-- reboot/poweroff, resp. suspend/hibernate."
if not params[1] then
-- This code is run before suspend. You can replace the following with something else.
require("gears.timer").start_new(2, function()
mytextclock:force_update()
end)
end
end)
end
listen_to_signals()

NET-SNMP: custom disman monitor not sending trap

I configured in the snmpd.conf file the "extend" function to monitor a custom script.
extend shelltest /bin/sh /tmp/snmptest.sh
Then I checked the MIB for this if this "extend" function is working or not. I am able to see all Mibs for this.
snmpwalk -v 1 -c public localhost .1.3.6.1.4.1.8072.1.3.2
If the output of my custom script returns a failed state then the following OID returns other valued then 0.
NET-SNMP-EXTEND-MIB::nsExtendResult."shelltest" = INTEGER: 3
The disman monitors looks like this:
monitor -r 30 -u internal -o ErrorMsg "False" nsExtendResult."shelltest" != 0
So from my point of view if the nsExtendResult."shelltest" return other values then 0 a trap should be triggered.
I tried several options for the monitor function but it was never working. The default monitors for example disk monitor or file monitors are working well.

Find process where a particular system call returns a particular error

On OS X El Capitan, my log file system.log feels with hundreds of the following lines at times
03/07/2016 11:52:17.000 kernel[0]: hfs_clonefile: cluster_read failed - 34
but there is no indication of the process where this happens. Apart from that, Disk Utility could not find any fault with the file system. But I would still like to know what is going on and it seems to me that dtrace should be perfectly suited to find out that faulty process but I am stuck. I know of the function return probe but it seems to require the PID, e.g.
dtrace -n 'pidXXXX::hfs_clonefile:return { printf("ret: %d", arg1); }'
Is there a way to tell dtrace to probe all processes? And then how would I print the process name?
You can try something like this (I don't have access to an OS X machine to test it)
#!/usr/sbin/dtrace -s
# pragma D option quiet
fbt::hfs_clonefile:return
/ args[ 1 ] != 0 /
{
printf( "\n========\nprocess: %s, pid: %d, ret value: %d\n", execname, pid, args[ 1 ] );
/* get kernel and user-space stacks */
stack( 20 );
ustack( 20 );
}
For the fbt probes, args[ 1 ] is the value returned by the function.
The dTrace script will print out the process name, pid, and return value from hfs_clonefile() whenever the return value is not zero. It also adds the kernel and user space stack traces. That should be more than enough data for you to find the source of the errors.
Assuming it works on OS X, anyway.
You can use the syscall provider rather than the pid provider to do this sort of thing. Something like:
sudo dtrace -n 'syscall::hfs_clonefile*:return /errno != 0/ { printf("ret: %d\n", errno); }'
The above command is a minor variant of what's used within the built-in DTrace-based errinfo utility. You can view /usr/bin/errinfo in any editor to see how it works.
However, there's no hfs_clonefile syscall, as least as far as DTrace is concerned, on my El Capitan (10.11.5) system:
$ sudo dtrace -l -n 'syscall::hfs*:'
ID PROVIDER MODULE FUNCTION NAME
dtrace: failed to match syscall::hfs*:: No probe matches description
Also, unfortunately the syscall provider is prevented from tracing system processes by the System Integrity Protection feature introduced with El Capitan (macOS 10.11). So, you will have to disable SIP which makes your system less secure.

Read fails after tcpreplay with error: 0: Resource temporarily unavailabl

I have a very simple script to run. It calls tcpreplay and then ask the user to type in something. Then the read will fail with read: read error: 0: Resource temporarily unavailable.
Here is the code
#!/bin/bash
tcpreplay -ieth4 SMTP.pcap
echo TEST
read HANDLE
echo $HANDLE
And the output is
[root#vse1 quick_test]# ./test.sh
sending out eth4
processing file: SMTP.pcap
Actual: 28 packets (4380 bytes) sent in 0.53 seconds. Rated: 8264.2 bps, 0.06 Mbps, 52.83 pps
Statistics for network device: eth4
Attempted packets: 28
Successful packets: 28
Failed packets: 0
Retried packets (ENOBUFS): 0
Retried packets (EAGAIN): 0
TEST
./test.sh: line 6: read: read error: 0: Resource temporarily unavailable
[root#vse1 quick_test]#
I am wondering if I need to close or clear up any handles or pipes after I run tcpreplay?
Apparently tcpreplay sets O_NONBLOCK on stdin and then doesn't remove it. I'd say it's a bug in tcpreplay. To work it around you can run tcpreplay with stdin redirected from /dev/null. Like this:
tcpreplay -i eth4 SMTP.pcap </dev/null
Addition: note that this tcpreplay behavior breaks non-interactive shells only.
Another addition: alternatively, if you really need tcpreplay to receive your
input you can write a short program which resets O_NONBLOCK. Like this one
(reset-nonblock.c):
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
int
main()
{
if (fcntl(STDIN_FILENO, F_SETFL,
fcntl(STDIN_FILENO, F_GETFL) & ~O_NONBLOCK) < 0) {
perror(NULL);
return 1;
}
return 0;
}
Make it with "make reset-nonblock", then put it in your PATH and use like this:
tcpreplay -i eth4 SMTP.pcap
reset-nonblock
While the C solution works, you can turn off nonblocking input in one line from the command-line using Python. Personally, I alias it to "setblocking" since it is fairly handy.
$ python3 -c $'import os\nos.set_blocking(0, True)'
You can also have Python print the previous state so that it may be changed only temporarily:
$ o=$(python3 -c $'import os\nprint(os.get_blocking(0))\nos.set_blocking(0, True)')
$ somecommandthatreadsstdin
$ python3 -c $'import os\nos.set_blocking(0, '$o')'
Resource temporarily unavailable is EAGAIN (or EWOULDBLOCK) which is the error code for nonblocking file descriptor when no further data is available (would block if wasn't in nonblocking mode). The previous command (tcpreplay in this case) erroneously left STDIN in nonblocking mode. The shell will not correct it, and the following process isn't meant to work with non- default nonblocking STDIN.
In your script, you can also turn off nonblocking with:
perl -MFcntl -e 'fcntl STDIN, F_SETFL, fcntl(STDIN, F_GETFL, 0) & ~O_NONBLOCK'

pppd stuck in dialing process

i'm connecting several usb modems to my Ubuntu:
uname -a
Linux devlp 2.6.32-28-generic #55-Ubuntu SMP Mon Jan 10 21:21:01 UTC 2011 i686 GNU/Linux
pppd version: 2.4.5
i'm doing a test with 8 sierra wireless modems, and all of them are connected and work. each of them has a "ppp" interface.
after they are connected, i'm trying to reconnect ppp7, and at first, pppd fails, then in the second try it reaches the point where it says: "Serial connection established" and stuck. i tried all kill signals to kill that pppd without success, and the only way to terminate it is to plug-out the modem it tried to dial.
i looked for the exact place where the pppd get stuck and it's right over here:
int generic_establish_ppp (int fd)
{
int x;
if (new_style_driver) {
int flags;
FILE *f=fopen("/root/ptest.log","a");
fputs("before get channel\n",f);
fflush(f);
/* Open an instance of /dev/ppp and connect the channel to it */
if (ioctl(fd, PPPIOCGCHAN, &chindex) == -1) { // <<<<<<< STUCK HERE
error("Couldn't get channel number: %m");
goto err;
}
fputs("after get channel\n",f);
....
}
}
it looks like the problem is specifically with ppp7 - and it can be any modem so i don't think it's a modem problem, but what i don't understand is what really happens in that command? who is responsible for the answer? is it only the kernel? the modem driver? the modem itself? i don't quite understand what to do with that information since PPPIOCGCHAN documentation is very poor..
i thought at first that maybe pppd is not releasing the channel or ppp after disconnect so i compiled my own pppd version and added PPPIOCDISCONN and PPPIOCDETACH just to make sure my version is fine with that, and the results were the same.
what you think?
well i'm pretty i solved this one - I've added some lines in pppd before the ioctl command to make fd NONBLOCK :)

Resources