kernel module's reference count - linux-kernel

I have a module to maintain, and it appears it has issues with reference counter being kept in the kernel, which results in that I'm unable to rmmod my module, after I kill the daemon opening 3 raw sockets to the module. Interestingly, after loading the daemon 'lsmod' displays 6 references to the module exist, I'd expect only three.
This is occuring on ARM-based embedded system with Linux-2.6.31, and rmmod doesn't have 'force' mode to try enforce unload a module (it's anyway not good idea).
I've analyzed the code and here is what I have:
1) The module creates new socket address family AF_HSL and registers with the kernel:
static struct proto_ops SOCKOPS_WRAPPED (hsl_ops) = {
family: AF_HSL,
owner: THIS_MODULE,
release: hsl_sock_release,
bind: _hsl_sock_bind,
connect: sock_no_connect,
socketpair: sock_no_socketpair,
accept: sock_no_accept,
getname: _hsl_sock_getname,
poll: datagram_poll,
ioctl: sock_no_ioctl,
listen: sock_no_listen,
shutdown: sock_no_shutdown,
setsockopt: sock_no_setsockopt,
getsockopt: sock_no_getsockopt,
sendmsg: _hsl_sock_sendmsg,
recvmsg: _hsl_sock_recvmsg,
mmap: sock_no_mmap,
sendpage: sock_no_sendpage,
};
static struct net_proto_family hsl_family_ops = {
family: AF_HSL,
create: _hsl_sock_create,
owner: THIS_MODULE
};
...
static int
_hsl_sock_create (struct net *net, struct socket *sock, int protocol)
{
struct sock *sk = NULL;
sock->state = SS_UNCONNECTED;
sk = sk_alloc (current->nsproxy->net_ns, AF_HSL, GFP_KERNEL, &_prot);
if (sk == NULL)
goto ERR;
sock->ops = &SOCKOPS_WRAPPED(hsl_ops);
sock_init_data (sock,sk);
sock_hold (sk); /* XXX */
...
}
static void
_hsl_sock_destruct (struct sock *sk)
{
struct hsl_sock *hsk, *phsk;
if (!sk)
return;
...
sock_orphan (sk);
skb_queue_purge (&sk->sk_receive_queue);
sock_put (sk);
}
int
hsl_sock_release (struct socket *sock)
{
struct sock *sk = sock->sk;
/* Here goes logic to destroy net_devices */
...
_hsl_sock_destruct (sk);
sock->sk = NULL;
return 0;
}
2) the daemon creates sockets in such way
fd = socket(AF_HSL, SOCK_RAW, 0);;
bind();
getsockname();
However I don't think _hsl_sock_create() should be calling sock_hold(), that would bump socket's reference count, but it's already set to 1 by sock_init_data(), and at socket delete phase sock_put() would decrement by 1, however this would not have the socket free;d and completely removed from the system.
So I experimented and removed sock_hold(); Now killing daemon yields in all references removed and 'rmmod' succedes, however the number of references after starting daemon is still 3.
I also have checked code at socket_create() whick calls internal function __socket_create(), which in turn calls try_module_get() and holds module's reference count. This appears to be the only place I've found that explicitly increments the module's refcnt.
I am still confused. Would anyone try to help understand what is happening?
Looking forward to hearing from you.
Mark

Related

Has there been any change between kernel 5.15 and 5.4.0 concerning ioctl valid commands?

We have some custom driver working on 5.4.0. It's pretty old and the original developers are no longer supporting it, so we have to maintain it in our systems.
When upgrading to Ubuntu 22 (Kernel 5.15), the driver suddenly stopped working, and sending ioctl with the command SIOCDEVPRIVATE (which used to work in kernel 5.4.0, and in fact is used to get some necessary device information)now gives "ioctl: Operation not supported" error with no extra information anywhere on the logs.
So... has something changed between those two kernels? We did have to adapt some of the structures used to register the driver, but I can't see anything concerning registering valid operations there. Do I have to register valid operations somewhere now?
Alternatively, does somebody know what part of the kernel code is checking for the operation to be supported? I've been trying to find it from ioctl.c, but I can't seem to find where that particular error comes from.
The driver code that supposedly takes care of this (doesn't even reach first line on 5.15):
static int u50_dev_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd) {
struct u50_priv *priv = netdev_priv(dev);
if (cmd == SIOCDEVPRIVATE) {
memcpy(&ifr->ifr_data, priv->tty->name, strlen(priv->tty->name));
}
return 0;
}
And the attempt to access it that does no longer work:
struct ifreq ifr = {0};
struct ifaddrs *ifaddr, *ifa;
getifaddrs(&ifaddr);
for (ifa = ifaddr; ifa != NULL; ifa = ifa->ifa_next) {
memcpy(ifr.ifr_name, ifa->ifa_name, IFNAMSIZ);
if (ioctl(lonsd, SIOCDEVPRIVATE, &ifr) < 0) {
perror("ioctl");
syslog(LOG_ERR, "Ioctl:%d: %s\n", __LINE__, strerror(errno));
}
...
and structure for registration
static const struct net_device_ops u50_netdev_ops = {
.ndo_init = u50_dev_init,
.ndo_uninit = u50_dev_uninit,
.ndo_open = u50_dev_open,
.ndo_stop = u50_dev_stop,
.ndo_start_xmit = u50_dev_xmit,
.ndo_do_ioctl = u50_dev_ioctl,
.ndo_set_mac_address = U50SetHWAddr,
};
If you need some code to respond to SIOCDEVPRIVATE, you used to be able to do it via ndo_do_ioctl (writing a compatible function, then linking it in a net_device_ops struct in 5.4). However, in 5.15 it was changed so now you have to implement a ndo_siocdevprivate function, rather than ndo_do_ioctl, which is no longer called, according to the kernel documentation.
source:
https://elixir.bootlin.com/linux/v5.15.57/source/include/linux/netdevice.h
Patch that did this: spinics.net/lists/netdev/msg698158.html

winsock2: How to get the ipv4/ipv6 address of a connected client after server side code calls `accept()`

There are other similar questions on this site, but they either do not related to winsock2 or they are suitable only for use with ipv4 address spaces. The default compiler for Visual Studio 2019 produces an error when the ntoa function is used, hence an ipv4 and ipv6 solution is required.
I did once produce the code to do this for a Linux system however I am currently at work and do not have access to that. It may or may not be "copy and paste"-able into a windows environment with winsock2. (Edit: I will of course add that code later this evening, but of course it might not be useful.)
The following contains an example, however this is an example for client side code, not server side code.
https://www.winsocketdotnetworkprogramming.com/winsock2programming/winsock2advancedInternet3c.html
Here, the getaddrinfo() function is used to obtain a structure containing matching ipv4 and ipv6 addresses. To obtain this information there is some interaction with DNS, which is not required in this case.
I have some server code which calls accept() (after bind and listen) to accept a client connection. I want to be able to print the client ip address and port to stdout.
The most closely related question on this site is here. However the answer uses ntoa and is only ipv4 compatible.
What I have so far:
So far I have something sketched out like this:
SOCKET acceptSocket = INVALID_SOCKET;
SOCKADDR_IN addr; // both of these are NOT like standard unix sockets
// I don't know how they differ and if they can be used with standard
// unix like function calls (eg: inet_ntop)
int addrlen = sizeof addr;
acceptSocket = accept(listenSocket, (SOCKADDR*)&addr, &addrlen);
if(acceptSocket == INVALID_SOCKET)
{
// some stuff
}
else
{
const std::size_t addrbuflen = INET6_ADDRSRTLEN;
char addrbuf[addrbuflen] = '\0'
inet_ntop(AF_INET, (void*)addr.sin_addr, (PSTR)addrbuf, addrbuflen);
// above line does not compile and mixes unix style function calls
// with winsock2 structures
std::cout << addrbuf << ':' << addr.sin_port << std::endl;
}
getpeername()
int ret = getpeername(acceptSocket, addrbuf, &addrbuflen);
// addrbuf cannot convert from char[65] to sockaddr*
if(ret == ???)
{
// TODO
}
You need to access the SOCKADDR. This is effectively a discriminated union. The first field tells you whether its an IPv4 (==AF_INET) or IPv6 (==AF_INET6) address. Depending on that you cast the addr pointer to be either struct sockaddr_in* or struct sockaddr_in6*, and then read off the IP address from the relevant field.
C++ code snippet in vs2019:
char* CPortListener::get_ip_str(struct sockaddr* sa, char* s, size_t maxlen)
{
switch (sa->sa_family) {
case AF_INET:
inet_ntop(AF_INET, &(((struct sockaddr_in*)sa)->sin_addr),
s, maxlen);
break;
case AF_INET6:
inet_ntop(AF_INET6, &(((struct sockaddr_in6*)sa)->sin6_addr),
s, maxlen);
break;
default:
strncpy(s, "Unknown AF", maxlen);
return NULL;
}
return s;
}
Example:
{
...
char s[INET6_ADDRSTRLEN];
sockaddr_storage ca;
socklen_t al = sizeof(ca);
SOCKET recv = accept(sd, (sockaddr*)&ca, &al);
pObj->m_ip = get_ip_str(((sockaddr*)&ca),s,sizeof(s));
}

Busy inodes/dentries after umount in self-written virtual fs

I wrote a simple FS that should only statically contain one file named hello. This file should contain the string Hello, world!. I did this for educational purposes. While the fs is mounted it actually behaves like expected. I can read the file just fine.
However after unmounting I always get
VFS: Busy inodes after unmount of dummyfs. Self-destruct in 5 seconds. Have a nice day...
If I called ls on the rootdir while the fs was mounted I get
BUG: Dentry (ptrval){i=2,n=hello} still in use (-1) [unmount of dummyfs dummyfs]
on top of that.
What does this mean in detail and how can I fix it?
The mount and kill_sb routines call mount_nodev and allocate space for a struct holding the 2 inodes this FS uses.
static struct dentry *dummyfs_mount(struct file_system_type* fs_type,
int flags, const char* dev_name, void* data)
{
struct dentry *ret;
ret = mount_nodev(fs_type, flags, data, dummyfs_fill_super);
if (IS_ERR(ret)) {
printk(KERN_ERR "dummyfs_mount failed");
}
return ret;
}
static void dummyfs_kill_sb(struct super_block *sb) {
kfree(sb->s_fs_info);
kill_litter_super(sb);
}
The fill superblock method creates the 2 inodes and saves them in the struct allocated by mount:
static int dummyfs_fill_super(struct super_block *sb, void *data, int flags)
{
struct dummyfs_info *fsi;
sb->s_magic = DUMMYFS_MAGIC;
sb->s_op = &dummyfs_sops;
fsi = kzalloc(sizeof(struct dummyfs_info), GFP_KERNEL);
sb->s_fs_info = fsi;
fsi->root = new_inode(sb);
fsi->root->i_ino = 1;
fsi->root->i_sb = sb;
fsi->root->i_op = &dummyfs_iops;
fsi->root->i_fop = &dummyfs_dops;
fsi->root->i_atime = fsi->root->i_mtime = fsi->root->i_ctime = current_time(fsi->root);
inode_init_owner(fsi->root, NULL, S_IFDIR);
fsi->file = new_inode(sb);
fsi->file->i_ino = 2;
fsi->file->i_sb = sb;
fsi->file->i_op = &dummyfs_iops;
fsi->file->i_fop = &dummyfs_fops;
fsi->file->i_atime = fsi->file->i_mtime = fsi->file->i_ctime = current_time(fsi->file);
inode_init_owner(fsi->file, fsi->root, S_IFREG);
sb->s_root = d_make_root(fsi->root);
return 0;
}
The lookup method just adds the fsi->file_inode to the dentry if the parent is the root dir:
if (parent_inode->i_ino == fsi->root->i_ino) {
d_add(child_dentry, fsi->file);
}
And the iterate method just emits the dot files and the hello file when called:
if (ctx->pos == 0) {
dir_emit_dots(file, ctx);
ret = 0;
}
if (ctx->pos == 2) {
dir_emit(ctx, "hello", 5, file->f_inode->i_ino, DT_UNKNOWN);
++ctx->pos;
ret = 0;
}
The read method just writes a static string using copy_to_user. The offsets are calculated correctly and on EOF the method just returns 0. However since the problems occur even when the read method was not called I think it is out-of-scope for this already too long question.
For actually running this I use user-mode linux from the git master (4.15+x commit d48fcbd864a008802a90c58a9ceddd9436d11a49). The userland is compiled from scratch and the init process is a derivative of Rich Felker's minimal init to which i added mount calls for /proc, /sys and / (remount).
My command line is ./linux ubda=../uml/image root=/dev/ubda
Any pointers to more thorough documentation are also appreciated.
Using gdb watching the dentry->d_lockref.count I realized that the kill_litter_super call in umount was actually responsible for the dentry issues. Replacing it with kill_anon_super solved that problem.
The busy inode problem vanished too mostly except when i unmounted after immediately after mounting. Allocating the second inode lazily solved that problem too.

FreeBSD newbus driver loading succesfully but cant create /dev/** file and debugging

I am installing a new newbuf driver on FreeBSD 10.0 . After compiling with make the driver.ko file has been created and than kldload can load successfully. kldload returns 0 and I can see the device at the kldstat output. When attempt to use the driver opening the /dev/** file, the file is not exist.
I think that this /dev/** file should be created by make_dev function which is located in device_attach member method. To test if the kldload reaches this attaching function; when write printf and uprintf to debug the driver, I can not see any output at console nor dmesg output.
But the problem is after writing printf at beginnings (after local variable definitions) of device_identify and device_probe functions, I can't see any output again at console nor dmesg.
My question is that even if the physical driver has problem (not located etc.), should I see the ouput of printf at the device_identify member function which is called by kldload at starting course (I think)?
Do I have a mistake when debugging newbuf driver with printf (I also tried a hello_world device driver and at this driver I can take output of printf at dmesg)?
Mainly how can I test/debug this driver's kldload processes?
Below some parts of my driver code (I think at least I should see MSG1, but I can not see):
struct mydrv_softc
{
device_t dev;
};
static devclass_t mydrv_devclass;
static struct cdevsw mydrv_cdevsw = {
.d_version = D_VERSION,
.d_name = "mydrv",
.d_flags = D_NEEDGIANT,
.d_open = mydrv_open,
.d_close = mydrv_close,
.d_ioctl = mydrv_ioctl,
.d_write = mydrv_write,
.d_read = mydrv_read
};
static void mydrv_identify (driver_t *driver, device_t parent) {
devclass_t dc;
device_t child;
printf("MSG1: The process inside the identfy function.");
dc = devclass_find("mydrv");
if (devclass_get_device(dc, 0) == NULL) {
child = BUS_ADD_CHILD(parent, 0, "mydrv", -1);
}
}
static int mydrv_probe(device_t dev) {
printf("MSG2: The process inside the probe function.");
mydrv_init();
if (device_get_unit(dev) != 0)
return (ENXIO);
device_set_desc(dev, "FreeBSD Device Driver");
return (0);
}
static int mydrv_attach(device_t dev) {
struct mydrv_softc *sc;
device_printf(dev, "MSG3: The process will make attachment.");
sc = (struct mydrv_softc *) device_get_softc(dev);
sc->dev = (device_t)make_dev(&mydrv_cdevsw, 0, UID_ROOT, GID_WHEEL, 0644, "mydrv_drv");
return 0;
}
static int mydrv_detach(device_t dev) {
struct mydrv_softc *sc;
sc = (struct mydrv_softc *) device_get_softc(dev);
destroy_dev((struct cdev*)(sc->dev));
bus_generic_detach(dev);
return 0;
}
static device_method_t mydrv_methods[] = {
DEVMETHOD(device_identify, mydrv_identify),
DEVMETHOD(device_probe, mydrv_probe),
DEVMETHOD(device_attach, mydrv_attach),
DEVMETHOD(device_detach, mydrv_detach),
{ 0, 0 }
};
static driver_t mydrv_driver = {
"mydrv",
mydrv_methods,
sizeof(struct mydrv_softc),
};
DRIVER_MODULE(mydrv, ppbus, mydrv_driver, mydrv_devclass, 0, 0);
If you don't see your printf's output on your console then your device functions will probably not be called. Can you show us your module's code?
Have you used DRIVER_MODULE() or DEV_MODULE()?
What parent bus are you using?
I guess printf works fine, but I prefer to use device_printf as it also prints the device name, and will be easier when looking through logs or dmesg output. Also leave multiple debug prints and check the log files on your system. Most logs for the device drivers are logged in /var/log/messages. But check other log files too.
Are you running your code on a virtual machine? Some device drivers don't show up their device files in /dev if the OS is running on a virtual machine. You should probably run your OS on actual hardware for the device file to show up.
As far as I know, you can't see the output in dmesg if you cannot find the corresponding device file in /dev but you may have luck with logs as I mentioned.
The easiest way to debug is of course using the printf statements. Other than this, you can debug the kernel using gdb running on another system. I am not familiar with the exact process but I know you can do this. Google it.

corrupted pointer in 'net_device'

the device driver I'm working on is implementing a virtual device. The logic
is as follows:
static struct net_device_ops virt_net_ops = {
.ndo_init = virt_net_init,
.ndo_open = virt_net_open,
.ndo_stop = virt_net_stop,
.ndo_do_ioctl = virt_net_ioctl,
.ndo_get_stats = virt_net_get_stats,
.ndo_start_xmit = virt_net_start_xmit,
};
...
struct net_device *dev;
struct my_dev *virt;
dev = alloc_netdev(..);
/* check for NULL */
virt = netdev_priv(dev);
dev->netdev_ops = &virt_net_ops;
SET_ETHTOOL_OPS(dev, &virt_ethtool_ops);
dev_net_set(dev, net);
virt->magic = MY_VIRT_DEV_MAGIC;
ret = register_netdev(dev);
if (ret) {
printk("register_netdev failed\n");
free_netdev(dev);
return ret;
}
...
What happens is that somewhere somehow the pointer net_device_ops in
'net_dev' gets corrupted, i.e.
1) create the device the first time (allocated net_dev, init the fields
including net_device_ops,which is
initialized with a static structure containing function pointers), register
the device with the kernel invoking register_netdev() - OK
2) attempt to create the device with the same name again, repeat the above
steps, call register_netdev() which will return negative and we
free_netdev(dev) and return error to the caller.
And between these two events the pointer to net_device_ops has changed,
although nowhere in the code it is done explicitly except the initialization
phase.
The kernel version is 2.6.31.8, platform MIPS. Communication channel between the user space and the kernel is implemented via netlink sockets.
Could anybody suggest what possibly can go wrong?
Appreciate any advices, thanks.
Mark
"The bug is somewhere else. "
The second device should not interact with the existing one. If you register_netdev with an existing name, nevertheless the ndo_init virtual function is called first before the condition is detected and -EEXIST is returned. Maybe your init function does something nasty involving some global variables. (For example, does the code assume there is one device, and stash a global pointer to it during initialization?)

Resources