How does Linux kernel find dirty page to flush? - memory-management

Since the pages are stored in address_space within each inode, how does the background page cache flush thread know all dirty pages?

They're all in one place:
struct bdi_writeback {
struct backing_dev_info *bdi; /* our parent bdi */
unsigned int nr;
unsigned long last_old_flush; /* last old data flush */
unsigned long last_active; /* last time bdi thread was active */
struct task_struct *task; /* writeback thread */
struct timer_list wakeup_timer; /* used for delayed bdi thread wakeup */
struct list_head b_dirty; /* dirty inodes */
struct list_head b_io; /* parked for writeback */
struct list_head b_more_io; /* parked for more writeback */
spinlock_t list_lock; /* protects the b_* lists */
};
b_dirty is the list you're looking for.
For some information on how the flushing happens, take a look in here. The code is kind of complex though. To summarize (alot), consider that in the default configuration data written to disk will sit in memory until either a) they're more than 30 seconds old, or b) the dirty pages have consumed more than 10% of the active, working memory.

On the x86 platform the OS must inspect its page table entries to find the pages that are dirty. There's a special, dirty, bit in them that's set by the CPU automatically during memory writes. There must be some code that scans the PTEs for dirty=1.

Related

Create array of struct scatterlist from buffer

I am trying to build an array of type "struct scatterlist", from a buffer pointed by a virtual kernel address (I know the byte size of the buffer, but it may be large). Ideally I would like to have function like init_sg_array_from_buf:
void my_function(void *buffer, int buffer_length)
{
struct scatterlist *sg;
int sg_count;
sg_count = init_sg_array_from_buf(buffer, buffer_length, sg);
}
Which function in the scatterlist api, does something similar? Currently the only possibility I see, is to manually determine the amount of pages, spanned by the buffer. Windows has a kernel macro called "ADDRESS_AND_SIZE_TO_SPAN_PAGES", but I didn't even manage to find something like this in the linux kernel.

Structure defined in a dynamically loaded library

I am dynamically loading the cudart (Cuda Run Time Library) to access just the cudaGetDeviceProperties function. This one requires two arguments:
A cudaDeviceProp structure which is defined in a header of the run time library;
An integer which represents the device ID.
I am not including the cuda_runtime.h header in order to not get extra constants, macros, enum, class... that I do not want to use.
However, I need the cudaDeviceProp structure. Is there a way to get it without redefining it? I wrote the following code:
struct cudaDeviceProp;
class CudaRTGPUInfoDL
{
typedef int(*CudaDriverVersion)(int*);
typedef int(*CudaRunTimeVersion)(int*);
typedef int(*CudaDeviceProperties)(cudaDeviceProp*,int);
public:
struct Properties
{
char name[256]; /**< ASCII string identifying device */
size_t totalGlobalMem; /**< Global memory available on device in bytes */
size_t sharedMemPerBlock; /**< Shared memory available per block in bytes */
int regsPerBlock; /**< 32-bit registers available per block */
int warpSize; /**< Warp size in threads */
size_t memPitch; /**< Maximum pitch in bytes allowed by memory copies */
/*... Tons of members follow..*/
};
public:
CudaRTGPUInfoDL();
~CudaRTGPUInfoDL();
int getCudaDriverVersion();
int getCudaRunTimeVersion();
const Properties& getCudaDeviceProperties();
private:
QLibrary library;
private:
CudaDriverVersion cuDriverVer;
CudaRunTimeVersion cuRTVer;
CudaDeviceProperties cuDeviceProp;
Properties properties;
};
As everybody can see, I simply "copy-pasted" the declaration of the structure.
In order to get the GPU properties, I simply use this method:
const CudaRTGPUInfoDL::Properties& CudaRTGPUInfoDL::getCudaDeviceProperties()
{
// Unsafe but needed.
cuDeviceProp(reinterpret_cast<cudaDeviceProp*>(&properties), 0);
return properties;
}
Thanks for your answers.
If you need the structure to be complete, you should define it (probably by including the appropriate header).
If you're just going to be passing around references or pointers, such as in the method you show, then it doesn't need to be complete and can just be forward declared:
class cudaDeviceProp;

dma_common_mmap documentation to let user read/write physical address

I am trying to write a Linux kernel module to map some address back to the user using dma_common_mmap(). I then want the user to mmap and write/read the address space.
My main problem now is that I can't find the documentation for dma_common_mmap(), does any exist? I have googled but didn't find out how to use it and let the user read/write the address.
The documentation for dma_common_mmap() doesn't exist. But you can look at Doxygen comment for dma_mmap_attrs() function:
/**
* dma_mmap_attrs - map a coherent DMA allocation into user space
* #dev: valid struct device pointer, or NULL for ISA and EISA-like devices
* #vma: vm_area_struct describing requested user mapping
* #cpu_addr: kernel CPU-view address returned from dma_alloc_attrs
* #handle: device-view address returned from dma_alloc_attrs
* #size: size of memory originally requested in dma_alloc_attrs
* #attrs: attributes of mapping properties requested in dma_alloc_attrs
*
* Map a coherent DMA buffer previously allocated by dma_alloc_attrs
* into user space. The coherent DMA buffer must not be freed by the
* driver until the user space mapping has been released.
*/
static inline int
dma_mmap_attrs(struct device *dev, struct vm_area_struct *vma, void *cpu_addr,
dma_addr_t dma_addr, size_t size, struct dma_attrs *attrs)
{
struct dma_map_ops *ops = get_dma_ops(dev);
BUG_ON(!ops);
if (ops->mmap)
return ops->mmap(dev, vma, cpu_addr, dma_addr, size, attrs);
return dma_common_mmap(dev, vma, cpu_addr, dma_addr, size);
}
#define dma_mmap_coherent(d, v, c, h, s) dma_mmap_attrs(d, v, c, h, s, NULL)
dma_mmap_attrs() calls in turn dma_common_mmap(), so all the documentation (except for attrs param) applies to dma_common_mmap() as is.
EDIT
I think you should use dma_mmap_coherent() (along with dma_alloc_coherent()), which does pretty much the same as dma_common_mmap() (see code above). See this example to get some clue on how to use it both in kernel side and in user-space. See also how dma_mmap_coherent() is used in ALSA kernel code, in snd_pcm_lib_default_mmap() function.

Memory limit for mmap

I am trying to mmap a char device. It works for 65536 bytes. But I get the following error if I try for more memory.
mmap: Resource temporarily unavailable
I want to mmap 1MB memory for a device. I use alloc_chrdev_region, cdev_init, cdev_add for the char device. How can I mmap memory larger than 65K? Should I use block device?
Using the MAP_LOCKED flag in the mmap call can cause this error. The used mlock can return EAGAIN if the amount of memory can not be locked.
From man mmap:
MAP_LOCKED (since Linux 2.5.37) Lock the pages of the mapped region
into memory in the manner of mlock(2). This flag is ignored in older
kernels.
From man mlock:
EAGAIN:
Some or all of the specified address range could not be
locked.
Did you implement *somedevice_mmap()* file operation?
static int somedev_mmap(struct file *filp, struct vm_area_struct *vma)
{
/* Do something. You probably need to use ioremap(). */
return 0;
}
static const struct file_operations somedev_fops = {
.owner = THIS_MODULE,
/* Initialize other file operations. */
.mmap = somedev_mmap,
};

What is the use of ssi_code in signalfd_siginfo structure?

I am using signalfd() to monitor the death of child processes created by my process. If I kill a child process with a signal, parent gets a read event on the signal fd with signalfd_siginfo structure populated. It has a field ssi_code which is set to the signal number the child received (for example 9 if I sent SIGKILL to the child).
Can I rely on this behavior always ? All versions of Linux kernel where signalfd is supported has the same usage for this field ?
Note: If the child calls exit() then the code passed to exit is populated in ssi_code.
man page of signalfd states :
The format of the signalfd_siginfo structure(s) returned by read(2)s from a signalfd file descriptor is as follows:
struct signalfd_siginfo {
uint32_t ssi_signo; /* Signal number */
int32_t ssi_errno; /* Error number (unused) */
int32_t ssi_code; /* Signal code */
uint32_t ssi_pid; /* PID of sender */
uint32_t ssi_uid; /* Real UID of sender */
int32_t ssi_fd; /* File descriptor (SIGIO) */
uint32_t ssi_tid; /* Kernel timer ID (POSIX timers)
uint32_t ssi_band; /* Band event (SIGIO) */
uint32_t ssi_overrun; /* POSIX timer overrun count */
uint32_t ssi_trapno; /* Trap number that caused signal */
int32_t ssi_status; /* Exit status or signal (SIGCHLD) */
int32_t ssi_int; /* Integer sent by sigqueue(2) */
uint64_t ssi_ptr; /* Pointer sent by sigqueue(2) */
uint64_t ssi_utime; /* User CPU time consumed (SIGCHLD) */
uint64_t ssi_stime; /* System CPU time consumed (SIGCHLD) */
uint64_t ssi_addr; /* Address that generated signal
(for hardware-generated signals) */
uint8_t pad[X]; /* Pad size to 128 bytes (allow for
additional fields in the future) */
};
It seems clear : ssi_signo contains the signal number. It says about ssi_code that :
Not all fields in the returned signalfd_siginfo structure will be valid for a specific signal; the set of valid fields can be determined
from the value returned in the ssi_code field. This field is the
analog of the siginfo_t si_code field; see sigaction(2) for details.
See sigaction man page for more details about this code which is not the signal number.

Resources