cdev_alloc() vs cdev_init() - linux-kernel

In Linux kernel modules, two different approaches can be followed when creating a struct cdev, as suggested in this site and in this answer:
First approach, cdev_alloc()
struct cdev *my_dev;
...
static int __init example_module_init(void) {
...
my_dev = cdev_alloc();
if (my_dev != NULL) {
my_dev->ops = &my_fops; /* The file_operations structure */
my_dev->owner = THIS_MODULE;
}
else
...
}
Second approach, cdev_init()
static struct cdev my_cdev;
...
static int __init example_module_init(void) {
...
cdev_init(&my_cdev, my_fops);
my_cdev.owner = THIS_MODULE;
...
}
(assuming that my_fops is a pointer to an initialized struct file_operations).
Is the first approach deprecated, or still in use?
Can cdev_init() be used also in the first approach, with cdev_alloc()? If no, why?
The second question is also in a comment in the linked answer.

Can cdev_init() be used also in the first approach, with cdev_alloc()?
No, cdev_init shouldn't be used for a character device, allocated with cdev_alloc.
At some extent, cdev_alloc is equivalent to kmalloc plus cdev_init. So calling cdev_init for a character device, created with cdev_alloc, has no sense.
Moreover, a character device allocated with cdev_alloc contains a hint that the device should be deallocated when no longer be used. Calling cdev_init for that device will clear that hint, so you will get a memory leakage.
Selection between cdev_init and cdev_alloc depends on a lifetime you want a character device to have.
Usually, one wants lifetime of a character device to be the same as lifetime of the module. In that case:
Define a static or global variable of type struct cdev.
Create the character device in the module's init function using cdev_init.
Destroy the character device in the module's exit function using cdev_del.
Make sure that file operations for the character device have .owner field set to THIS_MODULE.
In complex cases, one wants to create a character device at specific point after module's initializing. E.g. a module could provide a driver for some hardware, and a character device should be bound with that hardware. In that case the character device cannot be created in the module's init function (because a hardware is not detected yet), and, more important, the character device cannot be destroyed in the module's exit function. In that case:
Define a field inside a structure, describing a hardware, of pointer type struct cdev*.
Create the character device with cdev_alloc in the function which creates (probes) a hardware.
Destroy the character device with cdev_del in the function which destroys (disconnects) a hardware.
In the first case cdev_del is called at the time, when the character device is not used by a user. This guarantee is provided by THIS_MODULE in the file operations: a module cannot be unloaded if a file, corresponded to the character device, is opened by a user.
In the second case there is no such guarantee (because cdev_del is called NOT in the module's exit function). So, at the time when cdev_del returns, a character device can be still in use by a user. And here cdev_alloc really matters: deallocation of the character device will be deferred until a user closes all file descriptors associated with the character device. Such behavior cannot be obtained without cdev_alloc.

They do different things. The preference would be usual - prefer not to use dynamic allocation when not needed and allocate on stack when it's possible.
cdev_alloc() dynamically allocates my_dev, so it will call kfree(pointer) when cdev_del().
cdev_init() will not free the pointer.
Most importantly, the lifetime of the structure my_cdev is different. In cdev_init() case struct cdev my_cdev is bound to the containing lexical scope, while cdev_alloc() returns dynamically allocate pointer valid up until free-d.

Related

What happens when the raw pointer from shared_ptr get() is deleted?

I wrote some code like this:
shared_ptr<int> r = make_shared<int>();
int *ar = r.get();
delete ar; // report double free or corruption
// still some code
When the code ran up to delete ar;, the program crashed, and reported​ "double free or corruption", I'm confused why double free? The "r" is still in the scope, and not popped-off from stack. Do the delete operator do something magic?? Does it know the raw pointer is handled by a smart pointer currently? and then counter in "r" be decremented to zero automatically?
I know the operations is not recommended, but I want to know why?
You are deleting a pointer that didn't come from new, so you have undefined behavior (anything can happen).
From cppreference on delete:
For the first (non-array) form, expression must be a pointer to an object type or a class type contextually implicitly convertible to such pointer, and its value must be either null or pointer to a non-array object created by a new-expression, or a pointer to a base subobject of a non-array object created by a new-expression. If expression is anything else, including if it is a pointer obtained by the array form of new-expression, the behavior is undefined.
If the allocation is done by new, we can be sure that the pointer we have is something we can use delete on. But in the case of shared_ptr.get(), we cannot be sure if we can use delete because it might not be the actual pointer returned by new.
shared_ptr<int> r = make_shared<int>();
There is no guarantee that this will call new int (which isn't strictly observable by the user anyway) or more generally new T (which is observable with a user defined, class specific operator new); in practice, it won't (there is no guarantee that it won't).
The discussion that follows isn't just about shared_ptr, but about "smart pointers" with ownership semantics. For any owning smart pointer smart_owning:
The primary motivation for make_owning instead of smart_owning<T>(new T) is to avoid having a memory allocation without owner at any time; that was essential in C++ when order of evaluation of expressions didn't provide the guarantee that evaluation of the sub-expressions in the argument list was immediately before call of that function; historically in C++:
f (smart_owning<T>(new T), smart_owning<U>(new U));
could be evaluated as:
T *temp1 = new T;
U *temp2 = new U;
auto &&temp3 = smart_owning<T>(temp1);
auto &&temp4 = smart_owning<U>(temp2);
This way temp1 and temp2 are not managed by any owning object for a non trivial time:
obviously new U can throw an exception
constructing an owning smart pointer usually requires the allocation of (small) ressources and can throw
So either temp1 or temp2 could be leaked (but not both) if an exception was thrown, which was the exact problem we were trying to avoid in the first place. This means composite expressions involving construction of owning smart pointers was a bad idea; this is fine:
auto &&temp_t = smart_owning<T>(new T);
auto &&temp_u = smart_owning<U>(new U);
f (temp_t, temp_u);
Usually expression involving as many sub-expression with function calls as f (smart_owning<T>(new T), smart_owning<U>(new U)) are considered reasonable (it's a pretty simple expression in term of number of sub-expressions). Disallowing such expressions is quite annoying and very difficult to justify.
[This is one reason, and in my opinion the most compelling reason, why the non determinism of the order of evaluation was removed by the C++ standardisation committee so that such code is not safe. (This was an issue not just for memory allocated, but for any managed allocation, like file descriptors, database handles...)]
Because code frequently needed to do things such as smart_owning<T>(allocate_T()) in sub-expressions, and because telling programmers to decompose moderately complex expressions involving allocation in many simple lines wasn't appealing (more lines of code doesn't mean easier to read), the library writers provided a simple fix: a function to do the creation of an object with dynamic lifetime and the creation of its owning object together. That solved the order of evaluation problem (but was complicated at first because it needed perfect forwarding of the arguments of the constructor).
Giving two tasks to a function (allocate an instance of T and a instance of smart_owning) gives the freedom to do an interesting optimization: you can avoid one dynamic allocation by putting both the managed object and its owner next to each others.
But once again, that was not the primary purpose of functions like make_shared.
Because exclusive ownership smart pointers by definition don't need to keep a reference count, and by definition don't need to share the data needed for the deleter either between instances, and so can keep that data in the "smart pointer"(*), no additional allocation is needed for the construction of unique_ptr; yet a make_unique function template was added, to avoid the dangling pointer issue, not to optimize a non-thing (an allocation that isn't done in the fist place).
(*) which BTW means unique owner "smart pointers" do not have pointer semantic, as pointer semantic implies that you can makes copies of the "pointer", and you can't have two copies of a unique owner pointing to the same instance; "smart pointers" were never pointers anyway, the term is misleading.
Summary:
make_shared<T> does an optional optimization where there is no separate dynamic memory allocation for T: there is no operator new(sizeof (T)). There is obviously still the creation of an instance with dynamic lifetime with another operator new: placement new.
If you replace the explicit memory deallocation with an explicit destruction and add a pause immediately after that point:
class C {
public:
~C();
};
shared_ptr<C> r = make_shared<C>();
C *ar = r.get();
ar->~C();
pause(); // stops the program forever
The program will probably run fine; it is still illogical, indefensible, incorrect to explicitly destroy an object managed by a smart pointer. It isn't "your" resource. If pause() could exit with an exception, the owning smart pointer would try to destroy the managed object which doesn't even exist anymore.
It of course depends on how library implements make_shared, however most probable implementation is that:
std::make_shared allocates one block for two things:
shared pointer control block
contained object
std::make_shared() will invoke memory allocator once and then it will call placement new twice to initialize (call constructors) of mentioned two things.
| block requested from allocator |
| shared_ptr control block | X object |
#1 #2 #3
That means that memory allocator has provided one big block, which address is #1.
Shared pointer then uses it for control block (#1) and actual contained object (#2).
When you invoke delete with actual object kept by shred_ptr ( .get() ) you call delete(#2).
Because #2 is not known by allocator you get an corruption error.
See here. I quot:
std::shared_ptr is a smart pointer that retains shared ownership of an object through a pointer. Several shared_ptr objects may own the same object. The object is destroyed and its memory deallocated when either of the following happens:
the last remaining shared_ptr owning the object is destroyed;
the last remaining shared_ptr owning the object is assigned another pointer via operator= or reset().
The object is destroyed using delete-expression or a custom deleter that is supplied to shared_ptr during construction.
So the pointer is deleted by shared_ptr. You're not suppose to delete the stored pointer yourself
UPDATE:
I didn't realize that there were more statements and the pointer was not out of scope, I'm sorry.
I was reading more and the standard doesn't say much about the behavior of get() but here is a note, I quote:
A shared_ptr may share ownership of an object while storing a pointer to another object. get() returns the stored pointer, not the managed pointer.
So it looks that it is allowed that the pointer returned by get() is not necessarily the same pointer allocated by the shared_ptr (presumably using new). So delete that pointer is undefined behavior. I will be looking a little more into the details.
UPDATE 2:
The standard says at § 20.7.2.2.6 (about make_shared):
6 Remarks: Implementations are encouraged, but not required, to perform no more than one memory allocation. [ Note: This provides efficiency equivalent to an intrusive smart pointer. — end note ]
7 [ Note: These functions will typically allocate more memory than sizeof(T) to allow for internal bookkeeping structures such as the reference counts. — end note ]
So an specific implementation of make_shared could allocate a single chunk of memory (or more) and use part of that memory to initialize the stored pointer (but maybe not all the memory allocated). get() must return a pointer to the stored object, but there is no requirement by the standard, as previously said, that the pointer returned by get() has to be the one allocated by new. So delete that pointer is undefined behavior, you got a signal raised but anything can happen.

How to access a dynamic character device from userspace?

The register_chrdev() function in kernel registers a character device:
int register_chrdev(unsigned int major, const char*name,
struct file_operations*ops));
If major is 0 the kernel dynamically allocates a major number and the register function returns it.
Now, let's assume a module foo.ko wants to use /dev/foo with a dynamic major number. How does the userspace learn what major number to pass to mknod to create /dev/foo ?
As soon as a character device gets registered with a dynamic major number, the corresponding information appears in /proc/devices and thus can be retrieved by a user-space application/script in order to create an appropriate node.
For a better example you may refer to Linux Device Drivers book (3rd edition), for instance, a script to read /proc/devices is shown on this page.

Successfully de-referenced userspace pointer in kernel space without using copy_from_user()

There's a bug in a driver within our company's codebase that's been there for years.
Basically we make calls to the driver via ioctls. The data passed between userspace and driver space is stored in a struct, and the pointer to the data is fed into the ioctl. The driver is responsible to dereference the pointer by using copy_from_user(). But this code hasn't been doing that for years, instead just dereferencing the userspace pointer. And so far (that I know of) it hasn't caused any issues until now.
I'm wondering how this code hasn't caused any problems for so long? In what case will dereferencing a pointer in kernel space straight from userspace not cause a problem?
In userspace
struct InfoToDriver_t data;
data.cmd = DRV_SET_THE_CLOCK;
data.speed = 1000;
ioctl(driverFd, DEVICE_XX_DRIVER_MODIFY, &data);
In the driver
device_xx_driver_ioctl_handler (struct inode *inode, struct file *filp, unsigned int cmd, unsigned long arg)
{
struct InfoToDriver_t *user_data;
switch(cmd)
{
case DEVICE_XX_DRIVER_MODIFY:
// what we've been doing for years, BAD
// But somehow never caused a kernel oops until now
user_data = (InfoToDriver_t *)arg;
if (user_data->cmd == DRV_SET_THE_CLOCK)
{ .... }
// what we're supposed to do
copy_from_user(user_data, (void *)arg, sizeof(InfoToDriver_t));
if (user_data->cmd == DRV_SET_THE_CLOCK)
{ ... }
A possible answer is, this depends on the architecture. As you have seen, on a sane architecture (such as x86 or x86-64) simply dereferencing __user pointers just works. But Linux pretends to support every possible architecture, there are architectures where simple dereference does not work. Otherwise copy_to/from_user won't existed.
Another reason for copy_to/from_user is possibility that usermode side modifies its memory simultaneously with the kernel side (in another thread). You cannot assume that the content of usermode memory is frozen while accessing it from kernel. Rougue usermode code can use this to attack the kernel. For example, you can probe the pointer to output data before executing the work, but when you get to copy the result back to usermode, this pointer is already invalid. Oops. The copy_to_user API ensures (should ensure) that the kernel won't crash during the copy, instead the guilty application will be killed.
A safer approach is to copy the whole usermode data structure into kernel (aka 'capture'), check this copy for consistency.
The bottom line... if this driver is proven to work well on certain architecture, and there are no plans to port it, there's no urgency to change it. But check carefully robustness of the kernel code, if capture of the usermode data is needed, or problem may arise during copying from usermode.

How to translate a delegate to absolute address in DRAM?

I'd like to translate the delegate members .ptr and .funcptr to an absolute address that matches something in the executable image in DRAM.
The goal is not to call, neither to modify, but rather to allow the target to disassemble itself at run-time, when its own image is loaded in DRAM.
So far it already works with global functions.
Is it possible ?
The address of a delegate is the value of the .funcptr property. The type of this property is a bit misleading - it is of type function and does not list the hidden argument that is actually expected for passing the context in, but for just getting the address, you can ignore the type (explicitly casting to void* or size_t if you like to change the type) and just look at the address.
This isn't the address in physical memory, you'd have to ask the operating system for that, but since the virtual address it gives is automatically translated by the processor, it is most likely what you want anyway.

What does actually cdev_add() do? in terms of registering a device to the kernel

What does cdev_add() actually do? I'm asking terms of registering a device with the kernel.
Does it add the pointer to cdev structure in some map which is indexed by major and minor number? How exactly does this happen when you say the device is added/registered with the kernel. I want to know what steps the cdev_add takes to register the device in the running kernel. We create a node for user-space using mknod command. Even this command is mapped using major and minor number. Does registration also do something similar?
cdev_add registers a character device with the kernel. The kernel maintains a list of character devices under cdev_map
static struct kobj_map *cdev_map;
kobj_map is basically an array of probes, which in this case is the list of character devices:
struct kobj_map {
struct probe {
struct probe *next;
dev_t dev;
unsigned long range;
struct module *owner;
kobj_probe_t *get;
int (*lock)(dev_t, void *);
void *data;
} *probes[255];
struct mutex *lock;
};
You can see that each entry in the list has the major and minor number for the device (dev_t dev), and the device structure (in the form of kobj_probe_t, which is a kernel object, which represents a cdev in this case). cdev_add adds your character device to the probes list:
int cdev_add(struct cdev *p, dev_t dev, unsigned count)
{
...
error = kobj_map(cdev_map, dev, count, NULL,
exact_match, exact_lock, p);
When you do an open on a device from a process, the kernel finds the inode associated to the filename of your device (via namei function). The inode has the major a minor number for the device (dev_t i_rdev), and flags (imode) indicating that it is a special (character) device. With this it can access the cdev list I explained above, and get the cdev structure instantiated for your device. From there it can create a struct file with the file operations to your cdev, and install a file descriptor in the process's file descriptor table.
This is what actually 'registering' a character device means and why it needs to be done. Registering a block device is similar. The kernel maintains another list for registered gendisks.
You can read Linux Device Driver. It is a little bit old, but the main ideas are the same. It is difficoult to explain a simple operation like cdev_add() and all the stuff around in few lines.
I suggest you to read the book and the source code. If you have trouble to navigate your source code, you can use some tag system like etags + emacs, or the eclipse indexer.
Please see the code comments here:
cdev_add() - add a char device to the system 464 *
#p: the cdev structure for the device 465 * #dev: the first device
number for which this device is responsible 466 * #count: the number
of consecutive minor numbers corresponding to this 467 *
device 468 * 469 * cdev_add() adds the device represented by #p to
the system, making it 470 * live immediately. A negative error code
is returned on failure. 471 */ `
the immediate answer to any such question is read the code. Thats what Linus say.
[edit]
the cdev_add basically adds the device to the system. What it means essentially is that after the cdev_add operation your new device will get visibility through the /sys/ file system. The function does all the necessary house keeping activities related to that particularly the kobj reference to your device will get inserted at its position in the object hierarchy. If you want to get more information about it, I would suggest some reading around /sysfs/ and struct kboj

Resources