corrupted pointer in 'net_device' - linux-kernel

the device driver I'm working on is implementing a virtual device. The logic
is as follows:
static struct net_device_ops virt_net_ops = {
.ndo_init = virt_net_init,
.ndo_open = virt_net_open,
.ndo_stop = virt_net_stop,
.ndo_do_ioctl = virt_net_ioctl,
.ndo_get_stats = virt_net_get_stats,
.ndo_start_xmit = virt_net_start_xmit,
};
...
struct net_device *dev;
struct my_dev *virt;
dev = alloc_netdev(..);
/* check for NULL */
virt = netdev_priv(dev);
dev->netdev_ops = &virt_net_ops;
SET_ETHTOOL_OPS(dev, &virt_ethtool_ops);
dev_net_set(dev, net);
virt->magic = MY_VIRT_DEV_MAGIC;
ret = register_netdev(dev);
if (ret) {
printk("register_netdev failed\n");
free_netdev(dev);
return ret;
}
...
What happens is that somewhere somehow the pointer net_device_ops in
'net_dev' gets corrupted, i.e.
1) create the device the first time (allocated net_dev, init the fields
including net_device_ops,which is
initialized with a static structure containing function pointers), register
the device with the kernel invoking register_netdev() - OK
2) attempt to create the device with the same name again, repeat the above
steps, call register_netdev() which will return negative and we
free_netdev(dev) and return error to the caller.
And between these two events the pointer to net_device_ops has changed,
although nowhere in the code it is done explicitly except the initialization
phase.
The kernel version is 2.6.31.8, platform MIPS. Communication channel between the user space and the kernel is implemented via netlink sockets.
Could anybody suggest what possibly can go wrong?
Appreciate any advices, thanks.
Mark

"The bug is somewhere else. "
The second device should not interact with the existing one. If you register_netdev with an existing name, nevertheless the ndo_init virtual function is called first before the condition is detected and -EEXIST is returned. Maybe your init function does something nasty involving some global variables. (For example, does the code assume there is one device, and stash a global pointer to it during initialization?)

Related

Has there been any change between kernel 5.15 and 5.4.0 concerning ioctl valid commands?

We have some custom driver working on 5.4.0. It's pretty old and the original developers are no longer supporting it, so we have to maintain it in our systems.
When upgrading to Ubuntu 22 (Kernel 5.15), the driver suddenly stopped working, and sending ioctl with the command SIOCDEVPRIVATE (which used to work in kernel 5.4.0, and in fact is used to get some necessary device information)now gives "ioctl: Operation not supported" error with no extra information anywhere on the logs.
So... has something changed between those two kernels? We did have to adapt some of the structures used to register the driver, but I can't see anything concerning registering valid operations there. Do I have to register valid operations somewhere now?
Alternatively, does somebody know what part of the kernel code is checking for the operation to be supported? I've been trying to find it from ioctl.c, but I can't seem to find where that particular error comes from.
The driver code that supposedly takes care of this (doesn't even reach first line on 5.15):
static int u50_dev_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd) {
struct u50_priv *priv = netdev_priv(dev);
if (cmd == SIOCDEVPRIVATE) {
memcpy(&ifr->ifr_data, priv->tty->name, strlen(priv->tty->name));
}
return 0;
}
And the attempt to access it that does no longer work:
struct ifreq ifr = {0};
struct ifaddrs *ifaddr, *ifa;
getifaddrs(&ifaddr);
for (ifa = ifaddr; ifa != NULL; ifa = ifa->ifa_next) {
memcpy(ifr.ifr_name, ifa->ifa_name, IFNAMSIZ);
if (ioctl(lonsd, SIOCDEVPRIVATE, &ifr) < 0) {
perror("ioctl");
syslog(LOG_ERR, "Ioctl:%d: %s\n", __LINE__, strerror(errno));
}
...
and structure for registration
static const struct net_device_ops u50_netdev_ops = {
.ndo_init = u50_dev_init,
.ndo_uninit = u50_dev_uninit,
.ndo_open = u50_dev_open,
.ndo_stop = u50_dev_stop,
.ndo_start_xmit = u50_dev_xmit,
.ndo_do_ioctl = u50_dev_ioctl,
.ndo_set_mac_address = U50SetHWAddr,
};
If you need some code to respond to SIOCDEVPRIVATE, you used to be able to do it via ndo_do_ioctl (writing a compatible function, then linking it in a net_device_ops struct in 5.4). However, in 5.15 it was changed so now you have to implement a ndo_siocdevprivate function, rather than ndo_do_ioctl, which is no longer called, according to the kernel documentation.
source:
https://elixir.bootlin.com/linux/v5.15.57/source/include/linux/netdevice.h
Patch that did this: spinics.net/lists/netdev/msg698158.html

Reading the contents of a user-space page from Kernel

I'm getting a crash as a result of a call to kmap and I don'nt know why. I'm hoping someone with more kernel knowledge than I can help with this. Here is the code:
pgd_t *pgd = pgd_offset(vma->vm_mm, userspace_addr);
pud_t *pud = pud_offset(pgd, userspace_addr);
pmd_t *pmd = pmd_offset(pud, userspace_addr);
pte_t *pte = pte_offset_map(pmd, userspace_addr);
if (pte_present_user(*pte)) {
void *p = NULL;
struct page *page = pte_page(*pte);
get_page(page);
p = kmap(page); /* CRASH HERE??? */
/* Read from 'p' */
kunmap(p);
put_page(page);
}
I've isolated that the call to kmap is the culprit, since without it the code runs fine. All pointers are valid as far as I can tell.
I'm unsure if pte_offset_map should be used in conjunction with kmap...
The code above is run with mm->mmap_sem and vma->vm_mm->page_table_lock locked and on a kthread within the kernel context.
I think I solved it. The main issue in my code was that I was passing the pointer returned from kmap into kunmap. I should have passed in the page pointer instead.
One other change would be to use pte_offset_kernel instead of pte_offset_map.

How to remove dynamically assigned major number from /proc/devices?

In my kernel driver project I register with a dynamic major number by calling
register_chrdev(0, "xxxxx", &xxxxx);
and unregistered my module with
unregister_chrdev(0. "xxxxx");
When I load my driver with insmod, I received dynamic major number, for example 243, and, after rmmod, success removing module.
But, after removing the module /proc/devices still shows the major number (243).
How do I get removing my driver to also remove its major number from the list in /proc/devices?
When you call register_chrdev() with 0 as the first argument to request the assignment of a dynamic major number, the return value will be the assigned major number, which you should save.
Then when you call unregister_chrdev() you should pass the saved major number as an argument, rather than the 0 you were. Also make sure that the device name argument matches. And be aware that this function returns a result, which you can check for status/failure - in the latter case you definitely want to printk() a message so that you know that your code has not accomplished its goal.
You can see a complete example at http://www.tldp.org/LDP/lkmpg/2.6/html/x569.html with the key parts being:
static int Major; /* Major number assigned to our device driver */
int init_module(void)
{
Major = register_chrdev(0, DEVICE_NAME, &fops);
if (Major < 0) {
printk(KERN_ALERT "Registering char device failed with %d\n", Major);
return Major;
}
return SUCCESS;
}
void cleanup_module(void)
{
int ret = unregister_chrdev(Major, DEVICE_NAME);
if (ret < 0)
printk(KERN_ALERT "Error in unregister_chrdev: %d\n", ret);
}
Also be aware that this method of registering a device is considered outdated - you might want to research the newer method.

Get_user running at kernel mode returns error

I have a problem with get_user() macro. What I did is as follows:
I run the following program
int main()
{
int a = 20;
printf("address of a: %p", &a);
sleep(200);
return 0;
}
When the program runs, it outputs the address of a, say, 0xbff91914.
Then I pass this address to a module running in Kernel Mode that retrieves the contents at this address (at the time when I did this, I also made sure the process didn't terminate, because I put it to sleep for 200 seconds... ):
The address is firstly sent as a string, and I cast them into pointer type.
int * ptr = (int*)simple_strtol(buffer, NULL,16);
printk("address: %p",ptr); // I use this line to make sure the cast is correct. When running, it outputs bff91914, as expected.
int val = 0;
int res;
res= get_user(val, (int*) ptr);
However, res is always not 0, meaning that get_user returns error. I am wondering what is the problem....
Thank you!!
-- Fangkai
That is probably because you're trying to get value from a different user space. That address you got is from your simple program's address space, while you're probably using another program for passing the value to the module, aren't you?
The call to get_user must be made in the context of the user process.
Since you write "I also made sure the process didn't terminate, because I put it to sleep for 200 seconds..." I have a feeling you are not abiding by that rule. For the call to get_user to be in the context of the user process, you would have had to make a system call from that process and there would not have been a need to sleep the process.
So, you need to have your user process make a system call (an ioctl would be fine) and from that system call make the call to get_user.

How to tell if I'm leaking IMalloc memory?

I'd like to just know if there is a well-established standard way to ensure that one's process doesn't leak COM based resources (such as IMalloc'd objects)?
Take the following code as an example:
HRESULT STDMETHODCALLTYPE CShellBrowserDialog::OnStateChange(__RPC__in_opt IShellView *ppshv, ULONG uChange)
{
TRACE("ICommDlgBrowser::OnStateChange\n");
if (uChange == CDBOSC_SELCHANGE)
{
CComPtr<IDataObject> data;
if (ppshv->GetItemObject(SVGIO_SELECTION, IID_IDataObject, (void**)&data) == S_OK )
{
UINT cfFormat = RegisterClipboardFormat(CFSTR_SHELLIDLIST);
FORMATETC fmtetc = { cfFormat, 0, DVASPECT_CONTENT, -1, TYMED_HGLOBAL };
STGMEDIUM stgmed;
if (data->GetData(&fmtetc, &stgmed) == S_OK)
{
TCHAR path[MAX_PATH];
// check if this single selection (or multiple)
CIDA * cida = (CIDA*)stgmed.hGlobal;
if (cida->cidl == 1)
{
const ITEMIDLIST * pidlDirectory = (const ITEMIDLIST *)(((LPBYTE)cida) + cida->aoffset[0]);
const ITEMIDLIST * pidlFile = (const ITEMIDLIST *)(((LPBYTE)cida) + cida->aoffset[1]);
ITEMIDLIST * pidl = Pidl_Concatenate(pidlDirectory, pidlFile);
// we now have the absolute pidl of the currently selected filesystem object
if (!SHGetPathFromIDList(pidl, path))
strcpy_s(path, _T("<this object has no path>"));
// free our absolute pidl
Pidl_Free(pidl);
}
else if (cida->cidl > 1)
strcpy_s(path, _T("{multiple selection}"));
else
strcpy_s(path, _T("-"));
// trace the current selection
TRACE(_T(" Current Selection = %s\n"), path);
// release the data
ReleaseStgMedium(&stgmed);
}
}
}
return E_NOTIMPL;
}
So in the above code, I have at least three allocations that occur in code that I call, with only one of them being properly cleaned up automatically. The first is the acquisition of IDataObject pointer, which increments that object's reference count. CComPtr<> takes care of that issue for me.
But then there is IDataObject::GetData, which allocates an HGLOBAL for me. And a utility function Pidl_Concatenate which creates a PIDL for me (code left out, but you can imagine it does the obvious thing, via IMalloc::Alloc()). I have another utility Pidl_Free which releases that memory for me, but must be manually called [which makes the code in question full of exception safety issues (its utterly unsafe as its currently written -- I hate MS's coding mechanics - just asking for memory to fail to be released properly].
I will enhance this block of code to have a PIDL class of some sort, and probably a CIDA class as well, to ensure that they're properly deallocated even in the face of exceptions. But I would still like to know if there is a good utility or idiom for writing COM applications in C++ that can ensure that all IMallocs and AddRef/Dispose are called for that application's lifetime!
Implementing the IMallocSpy interface (see CoRegisterMallocSpy Function) may help get you some of the way.
Note that this is for debugging only, and be careful. There are cautionary tales on the web...
You can not free the global handle returned by IDataObject::GetData, otherwise other programs can not paste from the clipboard after the data is cleaned up.
Any pidl you get from shell needs to be freed using IMalloc::Free or ILFree (same effect once OLE32.DLL is loaded into the process). Exceptions are pointers to the middle of item list which can not be freed independently. If you are worried about exceptions, guard your code with try/catch/finally and put the free code in the finally block.

Resources