The register_chrdev() function in the kernel registers a character device:
int register_chrdev(unsigned int major, const char *name,
                    const struct file_operations *fops);
If major is 0, the kernel dynamically allocates a major number and register_chrdev() returns it.
Now, let's assume a module foo.ko wants to use /dev/foo with a dynamic major number. How does user space learn which major number to pass to mknod to create /dev/foo?
As soon as a character device gets registered with a dynamic major number, the corresponding information appears in /proc/devices and thus can be retrieved by a user-space application/script in order to create an appropriate node.
For a fuller example, refer to the Linux Device Drivers book (3rd edition), which shows a script that reads /proc/devices.
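The lookup described above can also be done in C. The following is only a minimal sketch: find_char_major is a hypothetical helper name, and it assumes the caller has already read the contents of /proc/devices into a string. It returns the major number listed under "Character devices:" for the given name.

```c
#include <stdio.h>
#include <string.h>

/* Scan text in /proc/devices format and return the major number registered
 * under `name` in the "Character devices:" section, or -1 if it is absent.
 * find_char_major is a hypothetical helper, not a kernel or libc API. */
int find_char_major(const char *text, const char *name)
{
    int in_char_section = 0;

    while (*text) {
        size_t len = strcspn(text, "\n");   /* length of the current line */
        int major_nr;
        char dev[64];

        if (strncmp(text, "Character devices:", 18) == 0)
            in_char_section = 1;
        else if (strncmp(text, "Block devices:", 14) == 0)
            in_char_section = 0;
        else if (in_char_section && len > 0 &&
                 sscanf(text, "%d %63s", &major_nr, dev) == 2 &&
                 strcmp(dev, name) == 0)
            return major_nr;

        text += len;                        /* advance to the next line */
        if (*text == '\n')
            text++;
    }
    return -1;
}
```

A script or program would pass the returned number to mknod, for example mknod /dev/foo c <major> 0.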
In Linux kernel modules, two different approaches can be followed when creating a struct cdev, as suggested in this site and in this answer:
First approach, cdev_alloc()
struct cdev *my_dev;
...
static int __init example_module_init(void) {
...
my_dev = cdev_alloc();
if (my_dev != NULL) {
my_dev->ops = &my_fops; /* The file_operations structure */
my_dev->owner = THIS_MODULE;
}
else
...
}
Second approach, cdev_init()
static struct cdev my_cdev;
...
static int __init example_module_init(void) {
...
cdev_init(&my_cdev, &my_fops);
my_cdev.owner = THIS_MODULE;
...
}
(assuming that my_fops is an initialized struct file_operations).
Is the first approach deprecated, or still in use?
Can cdev_init() be used also in the first approach, with cdev_alloc()? If no, why?
The second question is also in a comment in the linked answer.
Can cdev_init() be used also in the first approach, with cdev_alloc()?
No, cdev_init shouldn't be used for a character device allocated with cdev_alloc.
To some extent, cdev_alloc is equivalent to kmalloc plus cdev_init. So calling cdev_init for a character device created with cdev_alloc makes no sense.
Moreover, a character device allocated with cdev_alloc carries a hint that the device should be deallocated when it is no longer in use. Calling cdev_init for that device clears the hint, so you get a memory leak.
The choice between cdev_init and cdev_alloc depends on the lifetime you want the character device to have.
Usually, one wants the lifetime of a character device to be the same as the lifetime of the module. In that case:
Define a static or global variable of type struct cdev.
Create the character device in the module's init function using cdev_init.
Destroy the character device in the module's exit function using cdev_del.
Make sure that the file operations for the character device have the .owner field set to THIS_MODULE.
In more complex cases, one wants to create a character device at a specific point after the module has been initialized. For example, a module could provide a driver for some hardware, and the character device should be bound to that hardware. In that case the character device cannot be created in the module's init function (because the hardware has not been detected yet) and, more importantly, it cannot be destroyed in the module's exit function. In that case:
Define a field of type struct cdev * inside the structure that describes the hardware.
Create the character device with cdev_alloc in the function that creates (probes) the hardware.
Destroy the character device with cdev_del in the function that destroys (disconnects) the hardware.
In the first case, cdev_del is called at a time when the character device is not in use by any user. This guarantee is provided by THIS_MODULE in the file operations: a module cannot be unloaded while a file corresponding to the character device is open.
In the second case there is no such guarantee (because cdev_del is NOT called in the module's exit function). So, by the time cdev_del returns, the character device can still be in use by a user. This is where cdev_alloc really matters: deallocation of the character device is deferred until the user closes all file descriptors associated with it. Such behavior cannot be obtained without cdev_alloc.
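To make the deferred deallocation and the leak concrete, here is a small user-space model. It is only a sketch: the toy_* names are invented for illustration, and a plain integer reference count with a release callback stands in for the kobject embedded in a real struct cdev.

```c
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for struct cdev: a reference count plus a release hook.
 * All toy_* names are hypothetical; this is not kernel code. */
struct toy_cdev {
    int refcount;
    void (*release)(struct toy_cdev *);
};

int toy_freed;  /* number of objects released by the dynamic hook */

/* Release hook for objects from toy_cdev_alloc(): frees the memory. */
static void release_dynamic(struct toy_cdev *c) { free(c); toy_freed++; }

/* Release hook for caller-owned objects: nothing to free. */
static void release_static(struct toy_cdev *c) { (void)c; }

/* Models cdev_alloc(): heap allocation, freed when the last reference goes. */
struct toy_cdev *toy_cdev_alloc(void)
{
    struct toy_cdev *c = calloc(1, sizeof *c);

    if (c) {
        c->refcount = 1;
        c->release = release_dynamic;
    }
    return c;
}

/* Models cdev_init(): wipes the object and installs the non-freeing hook. */
void toy_cdev_init(struct toy_cdev *c)
{
    memset(c, 0, sizeof *c);
    c->refcount = 1;
    c->release = release_static;
}

/* An open file takes a reference; close and the final "del" drop one. */
void toy_cdev_get(struct toy_cdev *c) { c->refcount++; }

void toy_cdev_put(struct toy_cdev *c)
{
    if (--c->refcount == 0)
        c->release(c);
}
```

In this model, dropping the initial reference (the analogue of cdev_del) while a file is still open leaves the object alive, and it is freed only by the last put. Calling toy_cdev_init on an object returned by toy_cdev_alloc replaces the freeing hook, so the last put no longer frees the memory, which is the leak described above.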
They do different things. The usual preference applies: avoid dynamic allocation when it is not needed and use a statically allocated structure when possible.
cdev_alloc() dynamically allocates my_dev, so the structure is freed with kfree() after cdev_del(), once its last reference is dropped.
A structure set up with cdev_init() is not freed; its storage belongs to the caller.
Most importantly, the lifetime of the structure is different. In the cdev_init() case, struct cdev my_cdev lives as long as the object that contains it (for a static variable, as long as the module is loaded), while cdev_alloc() returns a pointer to dynamically allocated memory that stays valid until it is freed.
I followed some tutorials that explained how to write Linux kernel modules and I am a bit confused. Even after reading the official "documentation", I have a poor understanding of the concepts.
After creating a character device (register_chrdev), I see it is common to use a combination of the following functions:
class_create
class_device_create
device_create
I was not able to understand what a class, a device, a class device and a driver are.
Which one of these is actually responsible for creating an entry under /proc/?
Rather than going into what's a class, or what's a device (I'm no expert in Linux kernel), I will address the question as follows.
After creating the character device, you want to be able to access it from the user space. To do this, you need to add a device node under /dev. You can do this in two ways.
Use mknod to manually add a device node (old)
mknod /dev/<name> c <major> <minor>
OR
Use udev
This is where the class_create and device_create or class_device_create (old) come in.
To notify udev from your kernel module, you first create a virtual device class using
struct class * class_create(owner, name)
Now, the name will appear in /sys/class/<name>.
Then, create a device and register it with sysfs.
struct device *device_create(struct class *class, struct device *parent,
dev_t devt, void *drvdata, const char *fmt, ...)
Now the device name will appear in /sys/devices/virtual/<class name>/<device name> and /dev/<device name>.
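Whichever way the node is created, the result is a character special file whose inode carries the major and minor numbers. As a rough user-space check, the sketch below uses stat(2) with the major() and minor() macros from glibc's <sys/sysmacros.h>; char_dev_numbers is a hypothetical helper name.

```c
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/* Report the major/minor numbers of the character device node at `path`.
 * Returns 0 on success, -1 if the path cannot be stat'ed or is not a
 * character device. char_dev_numbers is a hypothetical helper name. */
int char_dev_numbers(const char *path, unsigned int *maj, unsigned int *min)
{
    struct stat st;

    if (stat(path, &st) != 0 || !S_ISCHR(st.st_mode))
        return -1;

    *maj = major(st.st_rdev);   /* st_rdev holds the device number */
    *min = minor(st.st_rdev);
    return 0;
}
```

Calling it on /dev/<device name> after the module is loaded should report the same major number that /proc/devices lists for the driver.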
It's not clear what you are asking about the /proc entry.
After your module is loaded, it will appear in /proc/modules (do a cat /proc/modules to see it). And, after you allocate the device numbers, say with
int register_chrdev_region(dev_t first, unsigned int count, const char *name)
the name will appear in /proc/devices (do a cat /proc/devices to see it).
Please also check the kernel sources for these functions, as their comments give a good description of what they do.
The good old LDD3 does not cover these mechanisms, but it is still a very good source.
In linux kernel, when I do
cat /proc/pid/maps
I get some entries that map the files in /dev/XXX. I understand this is the device file, which corresponds to hardware devices instead of actual files. How does the memory management in linux kernel handle such mapping? What happens if I read or write to /dev/XXX?
This is what I believe is correct: When a device maps a particular memory region, say X to Y, a vm_area_struct (vm_start = X and vm_end = Y) is associated with that region and then mapped into the process's virtual memory map. That vm_area_struct is then associated with a vm_operations_struct that is supplied by the device driver.
That means a device driver will implement some or all of the functions in the vm_operations_struct, the most important one being the fault function.
Now, assume the process references a page in that area for the first time. That will trigger the page fault handler. The page fault has this code in it:
3646 if (pte_none(entry)) {
3647 if (vma->vm_ops) {
3648 if (likely(vma->vm_ops->fault))
3649 return do_linear_fault(mm, vma, address,
3650 pte, pmd, flags, entry);
3651 }
3652 return do_anonymous_page(mm, vma, address,
3653 pte, pmd, flags);
3654 }
Notice line #3648. It checks whether the vm_operations_struct for that vm_area_struct implements the "fault" member function. If it does, that function is called. Look at some of the implementations of that function.
The implementation of this function should return a pointer to a struct page. The page itself is allocated and populated with data by the device driver (i.e. by its "fault" function).
The rest of the page handler will associate a pte entry with that page. That means the next time the page is accessed and causes a page fault the check pte_none on line #3646 will fail. This will result in skipping the device driver fault function the second time that particular page is referenced.
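The "fault once per page" behavior can be modelled in user space. The sketch below is a toy, not kernel code: struct toy_vma, toy_access and toy_driver_fault are invented names, an int array plays the role of the page table entries, and a counter records how often the driver's handler runs.

```c
#include <stddef.h>

#define TOY_PAGE_SIZE 4096UL
#define TOY_MAX_PAGES 16

/* Toy stand-in for vm_area_struct covering [vm_start, vm_end). */
struct toy_vma {
    unsigned long vm_start, vm_end;
    int (*fault)(struct toy_vma *vma, unsigned long pgoff); /* driver hook */
    int pte_present[TOY_MAX_PAGES];  /* 0 plays the role of pte_none() */
    int fault_calls;                 /* how often the driver hook ran */
};

/* Hypothetical driver handler: a real one would provide the page here. */
int toy_driver_fault(struct toy_vma *vma, unsigned long pgoff)
{
    (void)pgoff;
    vma->fault_calls++;
    return 0;
}

/* Model of one memory access. Returns 0 on success, -1 if the address is
 * outside the area or no handler can supply the page. */
int toy_access(struct toy_vma *vma, unsigned long addr)
{
    unsigned long pgoff;

    if (addr < vma->vm_start || addr >= vma->vm_end)
        return -1;
    pgoff = (addr - vma->vm_start) / TOY_PAGE_SIZE;
    if (pgoff >= TOY_MAX_PAGES)
        return -1;

    if (!vma->pte_present[pgoff]) {          /* first touch: "page fault" */
        if (!vma->fault || vma->fault(vma, pgoff) != 0)
            return -1;
        vma->pte_present[pgoff] = 1;         /* install the "pte" */
    }
    return 0;                                /* later accesses skip the hook */
}
```

Touching the same page twice runs the handler once; touching a second page in the same area runs it again for that page only.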
Note: a vm_area_struct may cover more than one struct page. Assuming PAGE_SIZE is 4KB, one vm_area_struct spans N*4KB for some N = 1, 2, ...
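As a small worked example of that arithmetic, the sketch below computes how many pages an area spans. vma_page_count is a hypothetical helper; it assumes both bounds are page-aligned and that vm_end is exclusive, as in /proc/<pid>/maps.

```c
/* Number of pages spanned by [vm_start, vm_end), assuming both bounds are
 * page-aligned. vma_page_count is a hypothetical helper name. */
unsigned long vma_page_count(unsigned long vm_start, unsigned long vm_end,
                             unsigned long page_size)
{
    return (vm_end - vm_start) / page_size;
}
```

For a maps entry such as 00400000-00403000 with 4KB pages, this gives 3 pages, each of which may fault independently.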
What does cdev_add() actually do? I'm asking in terms of registering a device with the kernel.
Does it add the pointer to the cdev structure to some map that is indexed by major and minor number? What exactly happens when you say the device is added to or registered with the kernel? I want to know what steps cdev_add takes to register the device in the running kernel. We create a node for user space using the mknod command, and that command also takes a major and minor number. Does registration do something similar?
cdev_add registers a character device with the kernel. The kernel maintains a list of character devices under cdev_map:
static struct kobj_map *cdev_map;
kobj_map is basically an array of 255 probe lists, indexed by the major number modulo 255, which in this case hold the registered character devices:
struct kobj_map {
struct probe {
struct probe *next;
dev_t dev;
unsigned long range;
struct module *owner;
kobj_probe_t *get;
int (*lock)(dev_t, void *);
void *data;
} *probes[255];
struct mutex *lock;
};
You can see that each entry in the list holds the major and minor number of the device (dev_t dev), the number of consecutive device numbers it covers (range), and a way to reach the device itself: cdev_add passes the struct cdev pointer as the data argument, and the get callback (a kobj_probe_t) returns the kobject embedded in that cdev. cdev_add adds your character device to the probes list:
int cdev_add(struct cdev *p, dev_t dev, unsigned count)
{
...
error = kobj_map(cdev_map, dev, count, NULL,
exact_match, exact_lock, p);
When you do an open on a device from a process, the kernel finds the inode associated with the filename of your device (via the namei function). The inode has the major and minor number for the device (dev_t i_rdev), and flags (i_mode) indicating that it is a special (character) device. With this it can search the cdev list explained above and get the cdev structure instantiated for your device. From there it can create a struct file with the file operations of your cdev, and install a file descriptor in the process's file descriptor table.
This is what actually 'registering' a character device means and why it needs to be done. Registering a block device is similar. The kernel maintains another list for registered gendisks.
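The registration and lookup described above can be modelled in user space. The following is only a sketch: the toy_* names are invented, locking and module ownership are left out, and the device number layout (20 minor bits) is assumed to mirror the kernel's MKDEV/MAJOR macros.

```c
#include <stddef.h>
#include <stdlib.h>

/* Assumed to mirror the kernel's dev_t layout: 20 bits of minor number. */
#define TOY_MINORBITS 20
#define TOY_MKDEV(ma, mi) (((unsigned)(ma) << TOY_MINORBITS) | (unsigned)(mi))
#define TOY_MAJOR(dev)    ((unsigned)(dev) >> TOY_MINORBITS)

/* One registered range of device numbers and the object that serves it. */
struct toy_probe {
    struct toy_probe *next;
    unsigned dev;            /* first device number of the range */
    unsigned long range;     /* how many consecutive numbers it covers */
    void *data;              /* stands in for the struct cdev pointer */
};

/* 255 buckets, chosen by major number modulo 255. */
static struct toy_probe *toy_probes[255];

/* Model of cdev_add(): record that [dev, dev + range) belongs to `data`. */
int toy_map_add(unsigned dev, unsigned long range, void *data)
{
    struct toy_probe *p = malloc(sizeof *p);
    unsigned idx = TOY_MAJOR(dev) % 255;

    if (!p)
        return -1;
    p->dev = dev;
    p->range = range;
    p->data = data;
    p->next = toy_probes[idx];   /* push onto the bucket's list */
    toy_probes[idx] = p;
    return 0;
}

/* Model of the lookup done at open(): find the object for a device number. */
void *toy_map_lookup(unsigned dev)
{
    struct toy_probe *p;

    for (p = toy_probes[TOY_MAJOR(dev) % 255]; p; p = p->next)
        if (dev >= p->dev && dev - p->dev < p->range)
            return p->data;
    return NULL;
}
```

Two majors that share a bucket (for example 250 and 505) are still told apart, because each entry is matched against its full device number range rather than the bucket index alone.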
You can read Linux Device Drivers. It is a little bit old, but the main ideas are the same. It is difficult to explain even a simple operation like cdev_add() and everything around it in a few lines.
I suggest you read the book and the source code. If you have trouble navigating the source code, you can use a tag system like etags + emacs, or the Eclipse indexer.
Please see the code comments here:
/**
 * cdev_add() - add a char device to the system
 * @p: the cdev structure for the device
 * @dev: the first device number for which this device is responsible
 * @count: the number of consecutive minor numbers corresponding to this
 *         device
 *
 * cdev_add() adds the device represented by @p to the system, making it
 * live immediately.  A negative error code is returned on failure.
 */
The immediate answer to any such question is to read the code. That's what Linus says.
[edit]
cdev_add basically adds the device to the system. What this means, essentially, is that after the cdev_add operation your new device will get visibility through the /sys/ file system. The function does all the necessary housekeeping related to that; in particular, the kobject reference to your device is inserted at its position in the object hierarchy. If you want more information about it, I would suggest some reading around sysfs and struct kobject.
I have been learning about Linux network drivers recently, and I wonder: if I have many network cards of the same type on my board, how does the kernel drive them? Does the kernel need to load the same driver many times? I think that's not possible, since insmod won't do that, so how can I make all cards of the same kind work at the same time?
regards
The state of every card (I/O addresses, IRQs, ...) is stored in a driver-specific structure that is passed (directly or indirectly) to every entry point of the driver, which can differentiate the cards this way. That way the very same code can control different cards (which means that yes, the kernel only keeps one instance of a driver's module no matter how many devices it controls).
For instance, have a look at drivers/video/backlight/platform_lcd.c, which is a very simple LCD power driver. It contains a structure called platform_lcd that is private to this file and stores the state of the LCD (whether it is powered, and whether it is suspended). One instance of this structure is allocated in the probe function of the driver through kzalloc - that is, one per LCD device - and stored into the platform device representing the LCD using platform_set_drvdata. The instance that has been allocated for this device is then fetched back at the beginning of all other driver functions so that it knows which instance it is working on:
struct platform_lcd *plcd = to_our_lcd(lcd);
to_our_lcd expands to lcd_get_data, which itself expands to dev_get_drvdata (a counterpart of platform_set_drvdata) if you look at include/linux/lcd.h. The function can then know the state of the device it has been invoked for.
This is a very simple example, and the platform_lcd driver does not directly control any device (this is deferred to a function pointer in the platform data), but add hardware-specific parameters (IRQ, I/O base, etc.) and you get how 99% of the drivers in Linux work.
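The pattern can be shown outside the kernel with a small model. This is a sketch under stated assumptions: struct toy_device and the toy_lcd_* functions are invented for illustration, with a plain drvdata pointer standing in for platform_set_drvdata and dev_get_drvdata.

```c
#include <stdlib.h>

/* Stand-in for a device object that can carry driver-private data. */
struct toy_device {
    void *drvdata;
};

/* Per-device state, private to the (toy) driver. */
struct toy_lcd {
    int power;
    int suspended;
};

/* Called once per device: allocate that device's own state. */
int toy_lcd_probe(struct toy_device *dev)
{
    struct toy_lcd *plcd = calloc(1, sizeof *plcd);

    if (!plcd)
        return -1;
    dev->drvdata = plcd;         /* like platform_set_drvdata() */
    return 0;
}

/* Every entry point first fetches the state of the device it was given. */
void toy_lcd_set_power(struct toy_device *dev, int power)
{
    struct toy_lcd *plcd = dev->drvdata;   /* like dev_get_drvdata() */

    plcd->power = power;
}

int toy_lcd_get_power(const struct toy_device *dev)
{
    const struct toy_lcd *plcd = dev->drvdata;

    return plcd->power;
}

/* Called once per device on removal: release that device's state. */
void toy_lcd_remove(struct toy_device *dev)
{
    free(dev->drvdata);
    dev->drvdata = NULL;
}
```

Probing two toy_device objects gives two independent toy_lcd instances, so changing the power state of one leaves the other untouched even though both are handled by the same functions.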
The driver code is only loaded once, but it allocates a separate context structure for each card. Typically you will see a struct pci_driver with a .probe function pointer. The probe function is called once for each card by the PCI support code, and it calls alloc_etherdev to allocate a network interface with space for whatever private context it needs.