Device number linux module issue - linux-kernel

I'm developing character device drivers for a Linux system (Poky 1.7, built with Yocto) installed on a Freescale imx6sx target board. They are simple modules that are just used to read/write I/O registers of the SoC.
Everything worked well as long as I loaded the modules manually after login, by simply calling this in the target console:
modprobe mydevice
The problems began when I started loading these modules automatically at boot time. On the first system startup after flashing the board, my custom modules are loaded and I can open their devices and do read/write operations, so they are working.
After restarting the system (power cycling or rebooting), the modules are still loaded, but I'm unable to use them.
After some trials, I found that the problem is the dynamic allocation of the device number they use, in particular the function alloc_chrdev_region() called in the module's __init function.
In fact, the very first time the system is switched on after being flashed, the device number of the device file in /dev matches the one in /sys/class/myclass/mydevice/dev, while from the first reboot onward they no longer match. Briefly:
First boot:
/dev/mydevice device number [Major, Minor] = 247,0
/sys/class/myclass/mydevice/dev device number [Major, Minor] = 247,0
Second and later boots:
/dev/mydevice device number [Major, Minor] = 247,0
/sys/class/myclass/mydevice/dev device number [Major, Minor] = 248,0
So it seems that the device file is kept in memory instead of being updated at every boot.
Here is the main code for the __init and __exit functions:
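One way to confirm the mismatch described above from user space is to compare the device number stored in the /dev node with the one the kernel publishes in sysfs. The following is a minimal sketch, not part of the original post: the helper names parse_sysfs_dev and devnode_matches_sysfs are made up for illustration, and it assumes a glibc system where major() and minor() come from <sys/sysmacros.h>. Note that the sysfs dev attribute is written as MAJOR:MINOR.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>   /* major(), minor() on glibc (assumption) */

/* Parse a sysfs "dev" attribute such as "248:0\n". Returns 0 on success, -1 otherwise. */
static int parse_sysfs_dev(const char *text, unsigned *maj, unsigned *min)
{
    return sscanf(text, "%u:%u", maj, min) == 2 ? 0 : -1;
}

/*
 * Compare the device number of a character device node with the number
 * published in a sysfs "dev" file.
 * Returns 1 if they match, 0 if they differ, -1 if either cannot be read.
 */
static int devnode_matches_sysfs(const char *node_path, const char *sysfs_dev_path)
{
    struct stat st;
    char line[32];
    unsigned maj, min;
    FILE *f;

    if (stat(node_path, &st) != 0 || !S_ISCHR(st.st_mode))
        return -1;

    f = fopen(sysfs_dev_path, "r");
    if (f == NULL)
        return -1;
    if (fgets(line, sizeof line, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);

    if (parse_sysfs_dev(line, &maj, &min) != 0)
        return -1;

    return major(st.st_rdev) == maj && minor(st.st_rdev) == min;
}
```

Called with "/dev/mydevice" and "/sys/class/myclass/mydevice/dev", this would be expected to return 1 on the first boot described above and 0 on the later boots.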
static struct class *g_deviceClass;
static struct cdev g_characterDevice;
static dev_t g_deviceNumber;

static int __init npe_power_drv_init(void)
{
    struct device *device;
    int result;

    /* Ask the kernel for a free major number (dynamic allocation). */
    if (alloc_chrdev_region(&g_deviceNumber, 0, 1, DEVICE_NAME) < 0)
        return -EAGAIN;

    cdev_init(&g_characterDevice, &npe_power_drv_ops);

    /* class_create() and device_create() report failure with ERR_PTR(), not NULL. */
    g_deviceClass = class_create(THIS_MODULE, CLASS_NAME);
    if (IS_ERR(g_deviceClass)) {
        unregister_chrdev_region(g_deviceNumber, 1);
        return -EAGAIN;
    }

    device = device_create(g_deviceClass, NULL, g_deviceNumber, NULL, DEVICE_NAME);
    if (IS_ERR(device)) {
        class_destroy(g_deviceClass);
        unregister_chrdev_region(g_deviceNumber, 1);
        return -EAGAIN;
    }

    result = cdev_add(&g_characterDevice, g_deviceNumber, 1);
    if (result < 0) {
        device_destroy(g_deviceClass, g_deviceNumber);
        class_destroy(g_deviceClass);
        unregister_chrdev_region(g_deviceNumber, 1);
        return -EAGAIN;
    }

    return 0;
}

static void __exit npe_power_drv_exit(void)
{
    device_destroy(g_deviceClass, g_deviceNumber);
    class_destroy(g_deviceClass);
    cdev_del(&g_characterDevice);
    unregister_chrdev_region(g_deviceNumber, 1);
}

module_init(npe_power_drv_init);
module_exit(npe_power_drv_exit);
With static device number allocation via register_chrdev_region(), everything works fine. Of course I can implement my code that way, but I would like to know why I get this behaviour.
Many thanks for any help.
Andrea

Related

Has there been any change between kernel 5.15 and 5.4.0 concerning ioctl valid commands?

We have a custom driver working on 5.4.0. It's pretty old and the original developers no longer support it, so we have to maintain it in our systems.
When upgrading to Ubuntu 22 (kernel 5.15), the driver suddenly stopped working: sending an ioctl with the command SIOCDEVPRIVATE (which used to work in kernel 5.4.0, and is in fact used to get some necessary device information) now gives an "ioctl: Operation not supported" error, with no extra information anywhere in the logs.
So... has something changed between those two kernels? We did have to adapt some of the structures used to register the driver, but I can't see anything concerning registering valid operations there. Do I have to register valid operations somewhere now?
Alternatively, does somebody know what part of the kernel code is checking for the operation to be supported? I've been trying to find it from ioctl.c, but I can't seem to find where that particular error comes from.
The driver code that supposedly takes care of this (doesn't even reach first line on 5.15):
static int u50_dev_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd) {
    struct u50_priv *priv = netdev_priv(dev);

    if (cmd == SIOCDEVPRIVATE) {
        memcpy(&ifr->ifr_data, priv->tty->name, strlen(priv->tty->name));
    }
    return 0;
}
And the attempt to access it, which no longer works:
struct ifreq ifr = {0};
struct ifaddrs *ifaddr, *ifa;
getifaddrs(&ifaddr);
for (ifa = ifaddr; ifa != NULL; ifa = ifa->ifa_next) {
    memcpy(ifr.ifr_name, ifa->ifa_name, IFNAMSIZ);
    if (ioctl(lonsd, SIOCDEVPRIVATE, &ifr) < 0) {
        perror("ioctl");
        syslog(LOG_ERR, "Ioctl:%d: %s\n", __LINE__, strerror(errno));
    }
    ...
And the structure used for registration:
static const struct net_device_ops u50_netdev_ops = {
    .ndo_init = u50_dev_init,
    .ndo_uninit = u50_dev_uninit,
    .ndo_open = u50_dev_open,
    .ndo_stop = u50_dev_stop,
    .ndo_start_xmit = u50_dev_xmit,
    .ndo_do_ioctl = u50_dev_ioctl,
    .ndo_set_mac_address = U50SetHWAddr,
};
If you need some code to respond to SIOCDEVPRIVATE, you used to be able to do it via ndo_do_ioctl (writing a compatible function, then linking it in a net_device_ops struct, as in 5.4). In 5.15 this changed: according to the kernel documentation, you now have to implement an ndo_siocdevprivate function rather than ndo_do_ioctl, which is no longer called for this command.
source:
https://elixir.bootlin.com/linux/v5.15.57/source/include/linux/netdevice.h
Patch that did this: spinics.net/lists/netdev/msg698158.html
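To make the effect of that change concrete, here is a small user-space model of the dispatch, written as a sketch under stated assumptions rather than as kernel code. The types model_dev and model_ops, the function model_dev_ioctl and both handlers are hypothetical stand-ins; only the command value 0x89F0 (SIOCDEVPRIVATE in linux/sockios.h), the range of 16 private commands and the -EOPNOTSUPP result mirror what the 5.15 kernel does when ndo_siocdevprivate is missing.

```c
#include <errno.h>
#include <stddef.h>

/* Value of SIOCDEVPRIVATE in <linux/sockios.h>; redefined so the model is self-contained. */
#define MODEL_SIOCDEVPRIVATE 0x89F0

struct model_dev;

/* Hypothetical stand-in for struct net_device_ops, reduced to the two hooks discussed. */
struct model_ops {
    int (*ndo_do_ioctl)(struct model_dev *dev, void *ifr, int cmd);
    int (*ndo_siocdevprivate)(struct model_dev *dev, void *ifr, void *data, int cmd);
};

struct model_dev {
    const struct model_ops *ops;
};

/* Models the 5.15 behaviour: private commands only ever reach ndo_siocdevprivate. */
static int model_dev_ioctl(struct model_dev *dev, void *ifr, void *data, int cmd)
{
    if (cmd >= MODEL_SIOCDEVPRIVATE && cmd <= MODEL_SIOCDEVPRIVATE + 15) {
        if (dev->ops->ndo_siocdevprivate)
            return dev->ops->ndo_siocdevprivate(dev, ifr, data, cmd);
        return -EOPNOTSUPP;   /* seen in user space as "Operation not supported" */
    }
    return -EOPNOTSUPP;       /* other commands are outside the scope of this model */
}

/* A 5.4-style handler: registered via ndo_do_ioctl, never reached in this model. */
static int old_style_handler(struct model_dev *dev, void *ifr, int cmd)
{
    (void)dev; (void)ifr; (void)cmd;
    return 0;
}

/* A 5.15-style handler: same job, but with the extra user data pointer argument. */
static int new_style_handler(struct model_dev *dev, void *ifr, void *data, int cmd)
{
    (void)dev; (void)ifr; (void)data; (void)cmd;
    return 0;
}
```

In this model a device that only fills in ndo_do_ioctl gets -EOPNOTSUPP for SIOCDEVPRIVATE, matching the symptom in the question, while one that fills in ndo_siocdevprivate has its handler called.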

How to remove dynamically assigned major number from /proc/devices?

In my kernel driver project I register with a dynamic major number by calling
register_chrdev(0, "xxxxx", &xxxxx);
and unregister my module with
unregister_chrdev(0, "xxxxx");
When I load my driver with insmod, I receive a dynamic major number, for example 243, and rmmod then removes the module successfully.
But after removing the module, /proc/devices still shows the major number (243).
How do I make removing my driver also remove its major number from the list in /proc/devices?
When you call register_chrdev() with 0 as the first argument to request the assignment of a dynamic major number, the return value will be the assigned major number, which you should save.
Then when you call unregister_chrdev() you should pass the saved major number as an argument, rather than the 0 you were passing. Also make sure that the device name argument matches. Be aware that in older kernels, such as the one the example below was written for, this function returns a result which you can check for status/failure; in the latter case you definitely want to printk() a message so that you know that your code has not accomplished its goal. In newer kernels unregister_chrdev() returns void, so there is nothing to check.
You can see a complete example at http://www.tldp.org/LDP/lkmpg/2.6/html/x569.html with the key parts being:
static int Major; /* Major number assigned to our device driver */

int init_module(void)
{
    Major = register_chrdev(0, DEVICE_NAME, &fops);
    if (Major < 0) {
        printk(KERN_ALERT "Registering char device failed with %d\n", Major);
        return Major;
    }
    return SUCCESS;
}

void cleanup_module(void)
{
    int ret = unregister_chrdev(Major, DEVICE_NAME);
    if (ret < 0)
        printk(KERN_ALERT "Error in unregister_chrdev: %d\n", ret);
}
Also be aware that this method of registering a device is considered outdated; you might want to research the newer method, which uses alloc_chrdev_region() together with cdev_init() and cdev_add().
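To verify from user space that the major number really disappears after rmmod, you can scan /proc/devices for it. The function below is a rough sketch that works on text already read from that file; the name chrdev_major_listed is invented for this example, and it assumes the usual layout with a "Character devices:" section followed by a "Block devices:" section.

```c
#include <stdio.h>
#include <string.h>

/*
 * Returns 1 if `major` appears in the "Character devices:" section of text
 * formatted like /proc/devices, and 0 otherwise.
 */
static int chrdev_major_listed(const char *text, int major)
{
    const char *p = text;
    int in_char_section = 0;

    while (*p != '\0') {
        const char *eol = strchr(p, '\n');
        size_t len = eol ? (size_t)(eol - p) : strlen(p);
        char line[128];
        int value;

        /* Copy one line into a bounded, NUL-terminated buffer. */
        if (len >= sizeof line)
            len = sizeof line - 1;
        memcpy(line, p, len);
        line[len] = '\0';

        if (strncmp(line, "Character devices:", 18) == 0)
            in_char_section = 1;
        else if (strncmp(line, "Block devices:", 14) == 0)
            in_char_section = 0;
        else if (in_char_section && sscanf(line, "%d", &value) == 1 && value == major)
            return 1;

        /* Advance to the next line, or to the terminator on the last one. */
        p = eol ? eol + 1 : p + strlen(p);
    }
    return 0;
}
```

With the fix from the answer applied, a check for the saved major (243 in the question) should report it as listed after insmod and as absent after rmmod.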

FreeBSD newbus driver loading successfully but can't create /dev/** file and debugging

I am installing a new newbus driver on FreeBSD 10.0. After compiling with make, the driver.ko file is created and kldload loads it successfully: kldload returns 0 and I can see the module in the kldstat output. But when I attempt to use the driver by opening the /dev/** file, the file does not exist.
I think this /dev/** file should be created by the make_dev function, which is called in the device_attach method. To test whether kldload reaches this attach function, I added printf and uprintf calls to the driver, but I cannot see any output on the console or in dmesg.
The problem is that even after adding printf at the beginning (after the local variable definitions) of the device_identify and device_probe functions, I still see no output on the console or in dmesg.
My question is: even if the physical device has a problem (not present, etc.), shouldn't I see the output of printf in the device_identify method, which (I think) is called by kldload at the start?
Am I making a mistake in debugging a newbus driver with printf? (I also tried a hello_world device driver, and with that driver I do get the printf output in dmesg.)
Mainly, how can I test/debug what happens when this driver is loaded with kldload?
Below are some parts of my driver code (I think I should at least see MSG1, but I do not):
struct mydrv_softc
{
    device_t dev;
};

static devclass_t mydrv_devclass;

static struct cdevsw mydrv_cdevsw = {
    .d_version = D_VERSION,
    .d_name = "mydrv",
    .d_flags = D_NEEDGIANT,
    .d_open = mydrv_open,
    .d_close = mydrv_close,
    .d_ioctl = mydrv_ioctl,
    .d_write = mydrv_write,
    .d_read = mydrv_read
};

static void mydrv_identify (driver_t *driver, device_t parent) {
    devclass_t dc;
    device_t child;

    printf("MSG1: The process inside the identfy function.");
    dc = devclass_find("mydrv");
    if (devclass_get_device(dc, 0) == NULL) {
        child = BUS_ADD_CHILD(parent, 0, "mydrv", -1);
    }
}

static int mydrv_probe(device_t dev) {
    printf("MSG2: The process inside the probe function.");
    mydrv_init();
    if (device_get_unit(dev) != 0)
        return (ENXIO);
    device_set_desc(dev, "FreeBSD Device Driver");
    return (0);
}

static int mydrv_attach(device_t dev) {
    struct mydrv_softc *sc;

    device_printf(dev, "MSG3: The process will make attachment.");
    sc = (struct mydrv_softc *) device_get_softc(dev);
    sc->dev = (device_t)make_dev(&mydrv_cdevsw, 0, UID_ROOT, GID_WHEEL, 0644, "mydrv_drv");
    return 0;
}

static int mydrv_detach(device_t dev) {
    struct mydrv_softc *sc;

    sc = (struct mydrv_softc *) device_get_softc(dev);
    destroy_dev((struct cdev*)(sc->dev));
    bus_generic_detach(dev);
    return 0;
}

static device_method_t mydrv_methods[] = {
    DEVMETHOD(device_identify, mydrv_identify),
    DEVMETHOD(device_probe, mydrv_probe),
    DEVMETHOD(device_attach, mydrv_attach),
    DEVMETHOD(device_detach, mydrv_detach),
    { 0, 0 }
};

static driver_t mydrv_driver = {
    "mydrv",
    mydrv_methods,
    sizeof(struct mydrv_softc),
};

DRIVER_MODULE(mydrv, ppbus, mydrv_driver, mydrv_devclass, 0, 0);
If you don't see your printf output on your console, then your device functions are probably not being called. Can you show us your module's code?
Have you used DRIVER_MODULE() or DEV_MODULE()?
What parent bus are you using?
I guess printf works fine, but I prefer to use device_printf as it also prints the device name, and will be easier when looking through logs or dmesg output. Also leave multiple debug prints and check the log files on your system. Most logs for the device drivers are logged in /var/log/messages. But check other log files too.
Are you running your code on a virtual machine? Some device drivers don't show their device files in /dev if the OS is running on a virtual machine. You should probably run your OS on actual hardware for the device file to show up.
As far as I know, you can't see the output in dmesg if you cannot find the corresponding device file in /dev but you may have luck with logs as I mentioned.
The easiest way to debug is of course using the printf statements. Other than this, you can debug the kernel using gdb running on another system. I am not familiar with the exact process but I know you can do this. Google it.

corrupted pointer in 'net_device'

The device driver I'm working on implements a virtual device. The logic is as follows:
static struct net_device_ops virt_net_ops = {
    .ndo_init = virt_net_init,
    .ndo_open = virt_net_open,
    .ndo_stop = virt_net_stop,
    .ndo_do_ioctl = virt_net_ioctl,
    .ndo_get_stats = virt_net_get_stats,
    .ndo_start_xmit = virt_net_start_xmit,
};
...
struct net_device *dev;
struct my_dev *virt;

dev = alloc_netdev(..);
/* check for NULL */
virt = netdev_priv(dev);
dev->netdev_ops = &virt_net_ops;
SET_ETHTOOL_OPS(dev, &virt_ethtool_ops);
dev_net_set(dev, net);
virt->magic = MY_VIRT_DEV_MAGIC;
ret = register_netdev(dev);
if (ret) {
    printk("register_netdev failed\n");
    free_netdev(dev);
    return ret;
}
...
What happens is that somewhere, somehow, the net_device_ops pointer in the net_device gets corrupted, i.e.
1) Create the device the first time (allocate the net_device, initialize its fields including net_device_ops, which is initialized with a static structure containing function pointers), and register the device with the kernel by invoking register_netdev() - OK.
2) Attempt to create a device with the same name again, repeating the above steps; register_netdev() returns a negative value, so we call free_netdev(dev) and return an error to the caller.
Between these two events the pointer to net_device_ops has changed, although nowhere in the code is it modified explicitly except in the initialization phase.
The kernel version is 2.6.31.8, platform MIPS. The communication channel between user space and the kernel is implemented via netlink sockets.
Could anybody suggest what could possibly be going wrong?
I'd appreciate any advice, thanks.
Mark
"The bug is somewhere else. "
The second device should not interact with the existing one. However, if you call register_netdev with an existing name, the ndo_init virtual function is still called first, before the condition is detected and -EEXIST is returned. Maybe your init function does something nasty involving some global variables. (For example, does the code assume there is one device, and stash a global pointer to it during initialization?)
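The failure mode suggested in that answer can be illustrated with a small user-space model. This is only a sketch of the ordering the answer describes (init hook first, duplicate-name check second); every name here (model_netdev, model_ndo_init, model_register, g_the_device) is hypothetical and none of it is the poster's driver or the kernel's code.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct net_device. */
struct model_netdev {
    char name[16];
    const void *ops;
};

/* The suspect pattern: a global that assumes there is exactly one device. */
static struct model_netdev *g_the_device;

static struct model_netdev *g_registered[4];
static int g_registered_count;

/* Init hook that stashes the device in the global. */
static int model_ndo_init(struct model_netdev *dev)
{
    g_the_device = dev;
    return 0;
}

/* Models the ordering from the answer: the init hook runs before the name check. */
static int model_register(struct model_netdev *dev)
{
    int i;

    model_ndo_init(dev);

    for (i = 0; i < g_registered_count; i++) {
        if (strcmp(g_registered[i]->name, dev->name) == 0)
            return -EEXIST;   /* too late: the global was already overwritten */
    }
    if (g_registered_count == 4)
        return -ENOMEM;

    g_registered[g_registered_count++] = dev;
    return 0;
}
```

After a failed second registration, g_the_device in this model points at the rejected device, which the caller is about to free. Any later access through the global then reads freed memory, which would look like a corrupted ops pointer.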

Getting the current stack trace on Mac OS X

I'm trying to work out how to store and then print the current stack in my C++ apps on Mac OS X. The main problem seems to be getting dladdr to return the right symbol when given an address inside the main executable. I suspect that the issue is actually a compile option, but I'm not sure.
I have tried the backtrace code from Darwin/Leopard but it calls dladdr and has the same issue as my own code calling dladdr.
Original post:
Currently I'm capturing the stack with this code:
int BackTrace(Addr *buffer, int max_frames)
{
    void **frame = (void **)__builtin_frame_address(0);
    void **bp = ( void **)(*frame);
    void *ip = frame[1];
    int i;

    for ( i = 0; bp && ip && i < max_frames; i++ )
    {
        *(buffer++) = ip;
        ip = bp[1];
        bp = (void**)(bp[0]);
    }
    return i;
}
Which seems to work ok. Then to print the stack I'm looking at using dladdr like this:
Dl_info dli;
if (dladdr(Ip, &dli))
{
    ptrdiff_t offset;
    int c = 0;

    if (dli.dli_fname && dli.dli_fbase)
    {
        /* offset of the address from the start of the image */
        offset = (ptrdiff_t)Ip - (ptrdiff_t)dli.dli_fbase;
        c = snprintf(buf, buflen, "%s+0x%tx", dli.dli_fname, offset );
    }
    if (dli.dli_sname && dli.dli_saddr)
    {
        /* offset of the address from the nearest symbol */
        offset = (ptrdiff_t)Ip - (ptrdiff_t)dli.dli_saddr;
        c += snprintf(buf+c, buflen-c, "(%s+0x%tx)", dli.dli_sname, offset );
    }
    if (c > 0)
        snprintf(buf+c, buflen-c, " [%p]", Ip);
}
Which almost works, some example output:
/Users/matthew/Library/Frameworks/Lgi.framework/Versions/A/Lgi+0x2473d(LgiStackTrace+0x5d) [0x102c73d]
/Users/matthew/Code/Lgi/LgiRes/build/Debug/LgiRes.app/Contents/MacOS/LgiRes+0x2a006(tart+0x28e72) [0x2b006]
/Users/matthew/Code/Lgi/LgiRes/build/Debug/LgiRes.app/Contents/MacOS/LgiRes+0x2f438(tart+0x2e2a4) [0x30438]
/Users/matthew/Code/Lgi/LgiRes/build/Debug/LgiRes.app/Contents/MacOS/LgiRes+0x35e9c(tart+0x34d08) [0x36e9c]
/Users/matthew/Code/Lgi/LgiRes/build/Debug/LgiRes.app/Contents/MacOS/LgiRes+0x1296(tart+0x102) [0x2296]
/Users/matthew/Code/Lgi/LgiRes/build/Debug/LgiRes.app/Contents/MacOS/LgiRes+0x11bd(tart+0x29) [0x21bd]
It's getting the method name right for the shared object but not for the main app. Those just map to "tart" (or "start" minus the first character).
Ideally I'd like line numbers as well as the method name at that point. But I'll settle for the correct function/method name for starters. Maybe I'll shoot for line numbers after that; on Linux I hear you have to write your own parser for a private ELF block that has its own instruction set. Sounds scary.
Anyway, can anyone sort this code out so it gets the method names right?
Which releases of OS X are you targeting? If you are running on Mac OS X 10.5 or higher, you can just use the backtrace() and backtrace_symbols() library calls. They are declared in execinfo.h, and there is a man page with some sample code.
Edit:
You mentioned in the comments that you need to run on Tiger. You can probably just include the implementation from Libc in your app. The source is available from Apple's open source site. Here is a link to the relevant file.
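For reference, the backtrace() and backtrace_symbols() calls mentioned in the answer can be used roughly as follows. This is a minimal sketch, not the sample from the man page; the function name print_stack_trace and the 64-frame limit are choices made for this example.

```c
#include <execinfo.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Print up to max_frames of the current call stack to `out`.
 * Returns the number of frames printed, or -1 if symbol lookup failed.
 */
static int print_stack_trace(FILE *out, int max_frames)
{
    void *frames[64];
    char **symbols;
    int count, i;

    if (max_frames > 64)
        max_frames = 64;
    if (max_frames <= 0)
        return 0;

    /* Capture return addresses, then resolve them to printable strings. */
    count = backtrace(frames, max_frames);
    symbols = backtrace_symbols(frames, count);
    if (symbols == NULL)
        return -1;

    for (i = 0; i < count; i++)
        fprintf(out, "%s\n", symbols[i]);

    free(symbols);   /* one allocation holds the whole array of strings */
    return count;
}
```

The strings come from a single malloc'd block, so only the array itself is freed. How much symbol detail appears for the main executable still depends on how it was built and linked.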
