Prefetching in vm_fault(), Linux drivers

I'm implementing a simple device driver. The program that uses this driver takes an argument from the user choosing between demand paging and prefetching (fetch the next page as well). When the user requests prefetching, that information has to be passed to the driver. The problem is that vm_fault has a fixed signature:
int (*fault)(struct vm_area_struct *vma, struct vm_fault *vmf);
So how can I get this extra prefetching information into the fault handler, so that I can use it to run a different routine for prefetching?
Or is there any other way to achieve this?
[EDIT]
To give a clearer picture:
This is how the program takes input:
./user_prog [filename] --prefetch
user_prog sets some flags internally; now, how do I send this flag information to dev.c (the driver file), given that the function arguments are fixed, as in fault() above? I hope this gives more clarification.

You can use the flags argument of mmap() to pass your custom flags too.
void *mmap(void *addr, size_t length, int prot, int flags,
           int fd, off_t offset);
Make sure your custom flag values use bits different from the flag values used by mmap(). From the manpage, the macros are defined in sys/mman.h. Find the exact values (they may vary across systems) with echo '#include <sys/mman.h>' | gcc -E - -dM | grep MAP_. My system has these:
#define MAP_32BIT 0x40
#define MAP_TYPE 0x0f
#define MAP_EXECUTABLE 0x01000
#define MAP_FAILED ((void *) -1)
#define MAP_PRIVATE 0x02
#define MAP_ANON MAP_ANONYMOUS
#define MAP_LOCKED 0x02000
#define MAP_STACK 0x20000
#define MAP_NORESERVE 0x04000
#define MAP_HUGE_SHIFT 26
#define MAP_POPULATE 0x08000
#define MAP_DENYWRITE 0x00800
#define MAP_FILE 0
#define MAP_SHARED 0x01
#define MAP_GROWSDOWN 0x00100
#define MAP_HUGE_MASK 0x3f
#define MAP_HUGETLB 0x40000
#define MAP_FIXED 0x10
#define MAP_ANONYMOUS 0x20
#define MAP_NONBLOCK 0x10000
Some non-clashing flags would be 0x200 and 0x400.
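As a rough sketch of the userspace side, assuming a hypothetical MAP_PREFETCH bit (the name and value are made up here, and whether a custom bit actually reaches your fault handler depends on the kernel version and on how the driver inspects the mapping, so treat this as an illustration rather than a guaranteed mechanism):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical custom flag; 0x200 does not clash with the MAP_* values above */
#define MAP_PREFETCH 0x200

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s <file> [--prefetch]\n", argv[0]);
        return 1;
    }
    int prefetch = (argc > 2 && strcmp(argv[2], "--prefetch") == 0);

    int fd = open(argv[1], O_RDWR);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    int flags = MAP_SHARED | (prefetch ? MAP_PREFETCH : 0);
    char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, flags, fd, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    p[0] = 1;   /* first touch faults the page in through the driver's fault handler */

    munmap(p, 4096);
    close(fd);
    return 0;
}

A common alternative that avoids relying on custom mmap() flag bits is to tell the driver about the prefetch preference through an ioctl() issued before mmap(); the driver can then stash the preference (for example in vma->vm_private_data from its mmap handler) and read it back inside fault().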

Related

can't find the definition of __arm64_sys_ppoll function in linux source

I'm trying to boot a development board containing an arm64 core using busybox, u-boot, and linux-5.10.0-rc5. The boot process is almost complete, but when it enters the shell program it stops shortly afterwards (with no kernel panic). It doesn't even show the '#' prompt (with the qemu model, the same image and busybox work fine and end in a normal shell). I can see that before it stops there are some system calls from busybox arriving in the kernel, and when it stopped it was processing system call 73.
(You can follow this in arch/arm64/kernel/syscall.c: do_el0_svc() -> el0_svc_common() -> invoke_syscall() -> __invoke_syscall() -> syscall_fn.)
By examining the files I can see that syscall 73 is sys_ppoll (in include/uapi/asm-generic/unistd.h). In that same file I found:
/* fs/select.c */
#if defined(__ARCH_WANT_TIME32_SYSCALLS) || __BITS_PER_LONG != 32
#define __NR_pselect6 72
__SC_COMP_3264(__NR_pselect6, sys_pselect6_time32, sys_pselect6, compat_sys_pselect6_time32)
#define __NR_ppoll 73
__SC_COMP_3264(__NR_ppoll, sys_ppoll_time32, sys_ppoll, compat_sys_ppoll_time32)
#endif
The definition of __SC_COMP_3264 is in the first lines of the same file. To see which lines are selected and compiled by the #if/#endif directives, I tried adding a stray character 'x' to provoke a compile error, which shows whether a line is actually compiled. That is shown below.
#ifndef __SYSCALL
x <---- compile error, so compiled, and __SYSCALL(x,y) defined to be nothing?
#define __SYSCALL(x, y)
#endif
#if __BITS_PER_LONG == 32 || defined(__SYSCALL_COMPAT)
x <--------- no compile error, so not compiled
#define __SC_3264(_nr, _32, _64) __SYSCALL(_nr, _32)
#else
#define __SC_3264(_nr, _32, _64) __SYSCALL(_nr, _64)
#endif
#ifdef __SYSCALL_COMPAT
x <-------------- no compile error, so not compiled
#define __SC_COMP(_nr, _sys, _comp) __SYSCALL(_nr, _comp)
#define __SC_COMP_3264(_nr, _32, _64, _comp) __SYSCALL(_nr, _comp)
#else
#define __SC_COMP(_nr, _sys, _comp) __SYSCALL(_nr, _sys)
#define __SC_COMP_3264(_nr, _32, _64, _comp) __SC_3264(_nr, _32, _64)
#endif
So this means __SYSCALL(x, y) is defined to do nothing. But if that were true, all the other syscalls would do nothing as well, so I figured __SYSCALL must be redefined somewhere, and found this in arch/arm64/kernel/sys.c:
#undef __SYSCALL
#define __SYSCALL(nr, sym) asmlinkage long __arm64_##sym(const struct pt_regs *);
#include <asm/unistd.h>
So the declared function name becomes __arm64_sys_ppoll and I can see it in the System.map file.
But I couldn't find the definition of __arm64_sys_ppoll. Where can I find the source? My other question is: how can the line below be compiled (and produce a compile error) when I do make -j28?
#ifndef __SYSCALL
x <---- compile error, so compiled, and __SYSCALL(x,y) defined to be nothing?
#define __SYSCALL(x, y)
#endif
By the way, this is what I see when I grep for sys_ppoll in the source (excluding all non-arm64 arch files).
./include/linux/compat.h:asmlinkage long compat_sys_ppoll_time32(struct pollfd __user *ufds,
./include/linux/compat.h:asmlinkage long compat_sys_ppoll_time64(struct pollfd __user *ufds,
./include/linux/syscalls.h:asmlinkage long sys_ppoll(struct pollfd __user *, unsigned int,
./include/linux/syscalls.h:asmlinkage long sys_ppoll_time32(struct pollfd __user *, unsigned int,
./include/uapi/asm-generic/unistd.h:__SC_COMP_3264(__NR_ppoll, sys_ppoll_time32, sys_ppoll, compat_sys_ppoll_time32)
./include/uapi/asm-generic/unistd.h:__SC_COMP(__NR_ppoll_time64, sys_ppoll, compat_sys_ppoll_time64)
./tools/include/uapi/asm-generic/unistd.h:__SC_COMP_3264(__NR_ppoll, sys_ppoll_time32, sys_ppoll, compat_sys_ppoll_time32)
./tools/include/uapi/asm-generic/unistd.h:__SC_COMP(__NR_ppoll_time64, sys_ppoll, compat_sys_ppoll_time64)
./arch/arm64/include/asm/unistd32.h:__SYSCALL(__NR_ppoll, compat_sys_ppoll_time32)
./arch/arm64/include/asm/unistd32.h:__SYSCALL(__NR_ppoll_time64, compat_sys_ppoll_time64)
Thanks for reading and sorry for the long question.
For SYSCALL_DEFINE0, ..., SYSCALL_DEFINE6, first see Ian Abbott's comment on my original question.
In include/uapi/asm-generic/unistd.h you can see the syscall definitions. For example, if you want to see the source code for shmat, grep in that file and you will find these lines:
/* ipc/shm.c */
#define __NR_shmget 194
__SYSCALL(__NR_shmget, sys_shmget)
#define __NR_shmctl 195
__SC_COMP(__NR_shmctl, sys_shmctl, compat_sys_shmctl)
#define __NR_shmat 196
__SC_COMP(__NR_shmat, sys_shmat, compat_sys_shmat)
#define __NR_shmdt 197
__SYSCALL(__NR_shmdt, sys_shmdt)
So the comment says the definition is in ipc/shm.c. There you can see these lines:
SYSCALL_DEFINE3(shmat, int, shmid, char __user *, shmaddr, int, shmflg)
{
    unsigned long ret;
    long err;

    err = do_shmat(shmid, shmaddr, shmflg, &ret, SHMLBA);
    if (err)
        return err;
    force_successful_syscall_return();
    return (long)ret;
}
Here you can see the definition of the shmat function together with its argument list.
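To connect this back to the __arm64_ prefix, here is a toy user-space model of how the wrapper macros paste the names together. The real macros (include/linux/syscalls.h plus arch/arm64/include/asm/syscall_wrapper.h in linux-5.10) carry much more bookkeeping, so the bodies below are only an approximation of the mechanism:

#include <stdio.h>

/* Toy stand-in for the kernel's saved-register structure */
struct pt_regs { long regs[8]; };

/* Simplified model of include/linux/syscalls.h */
#define SYSCALL_DEFINE1(name, t, a) SYSCALL_DEFINEx(1, _##name, t, a)
#define SYSCALL_DEFINEx(x, sname, ...) __SYSCALL_DEFINEx(x, sname, __VA_ARGS__)

/* Simplified model of the arm64 syscall wrapper: it emits a function named
   __arm64_sys<name> that takes the saved registers and calls the real body */
#define __SYSCALL_DEFINEx(x, name, t, a)                      \
    static long __do_sys##name(t a);                          \
    long __arm64_sys##name(const struct pt_regs *regs)        \
    {                                                         \
        return __do_sys##name((t)regs->regs[0]);              \
    }                                                         \
    static long __do_sys##name(t a)

/* A fake one-argument "syscall" just to show the generated symbol name */
SYSCALL_DEFINE1(demo, int, n)
{
    return n + 1;
}

int main(void)
{
    struct pt_regs regs = { .regs = { 41 } };
    printf("%ld\n", __arm64_sys_demo(&regs));   /* prints 42 */
    return 0;
}

For ppoll itself the same lookup applies: the /* fs/select.c */ comment above __NR_ppoll points at fs/select.c, where the body is written as SYSCALL_DEFINE5(ppoll, ...). It goes through the same machinery as above and so ends up emitting __arm64_sys_ppoll, which is why grepping for the literal string sys_ppoll only finds prototypes and table entries, not the definition.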

PE/COFF symbol type field

Microsoft's documentation for PE/COFF says of the type field in the symbol table:
"The most significant byte specifies whether the symbol is a pointer to, function returning, or array of the base type that is specified in the LSB. Microsoft tools use this field only to indicate whether the symbol is a function, so that the only two resulting values are 0x0 and 0x20 for the Type field."
However, the documentation and winnt.h both specify that IMAGE_SYM_DTYPE_FUNCTION = 2, not 0x20. Even if this is taken to be the value of the MSB, that would give a value for the entire field of 0x200, not 0x20.
What am I missing?
Check winnt.h for the following lines:
// type packing constants
#define N_BTMASK 0x000F
#define N_TMASK 0x0030
#define N_TMASK1 0x00C0
#define N_TMASK2 0x00F0
#define N_BTSHFT 4
#define N_TSHIFT 2
// MACROS
// Basic Type of x
#define BTYPE(x) ((x) & N_BTMASK)
// Is x a pointer?
#ifndef ISPTR
#define ISPTR(x) (((x) & N_TMASK) == (IMAGE_SYM_DTYPE_POINTER << N_BTSHFT))
#endif
// Is x a function?
#ifndef ISFCN
#define ISFCN(x) (((x) & N_TMASK) == (IMAGE_SYM_DTYPE_FUNCTION << N_BTSHFT))
#endif
So it seems the official MSB/LSB description is wrong: they are not bytes but nibbles. So 0x20 would be a function (most significant nibble = 2) returning the base type IMAGE_SYM_TYPE_NULL (least significant nibble = 0).
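As a quick check that the nibble reading is consistent, here is a minimal stand-alone sketch that reuses the constants quoted above against the Type value 0x20 that Microsoft tools emit:

#include <stdio.h>

/* Constants copied from the winnt.h excerpt above */
#define IMAGE_SYM_DTYPE_FUNCTION 2
#define N_BTMASK 0x000F
#define N_TMASK  0x0030
#define N_BTSHFT 4
#define BTYPE(x) ((x) & N_BTMASK)
#define ISFCN(x) (((x) & N_TMASK) == (IMAGE_SYM_DTYPE_FUNCTION << N_BTSHFT))

int main(void)
{
    unsigned short type = 0x20;   /* Type field as written by Microsoft tools */
    printf("ISFCN(0x%04x) = %d\n", type, ISFCN(type));   /* 1: it is a function */
    printf("BTYPE(0x%04x) = %d\n", type, BTYPE(type));   /* 0: IMAGE_SYM_TYPE_NULL */
    return 0;
}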

ffmpeg exit status -1094995529

I'm developing an application that makes calls to ffprobe, and for certain files on Windows those calls return the unorthodox exit status -1094995529. This exit status is given consistently, and there is some minor discussion of it.
Why is this value given, and where is it documented? Can I expect this status to be different on a unix machine where the allowed exit statuses are more constrained?
Error codes from ffmpeg (error.h from avutil):
http://ffmpeg.org/doxygen/trunk/error_8h_source.html
It turns out the value you specified is:
#define AVERROR_INVALIDDATA FFERRTAG( 'I','N','D','A')
-1094995529 is -0x41444E49, and when you read those bytes as ASCII, 0x41 = 'A', 0x44 = 'D', 0x4E = 'N', and 0x49 = 'I'. Because of the way the tag macro packs the characters, the order is reversed, so ADNI reads as INDA, which, as you can see from the #define snippet, is AVERROR_INVALIDDATA, defined as FFERRTAG('I','N','D','A').
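You can verify the arithmetic with a small stand-alone program; the two macros are reproduced here from FFmpeg's libavutil headers (their exact location varies between versions):

#include <stdio.h>

/* Reproduced from FFmpeg's libavutil: MKTAG packs four characters little-endian,
   FFERRTAG negates the packed value to form an error code */
#define MKTAG(a,b,c,d)    ((a) | ((b) << 8) | ((c) << 16) | ((unsigned)(d) << 24))
#define FFERRTAG(a,b,c,d) (-(int)MKTAG(a, b, c, d))
#define AVERROR_INVALIDDATA FFERRTAG('I','N','D','A')

int main(void)
{
    printf("%d\n", AVERROR_INVALIDDATA);   /* prints -1094995529 */
    return 0;
}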
The rest of the error codes are in that file; I've pasted them below:
#define AVERROR_BSF_NOT_FOUND FFERRTAG(0xF8,'B','S','F') ///< Bitstream filter not found
#define AVERROR_BUG FFERRTAG( 'B','U','G','!') ///< Internal bug, also see AVERROR_BUG2
#define AVERROR_BUFFER_TOO_SMALL FFERRTAG( 'B','U','F','S') ///< Buffer too small
#define AVERROR_DECODER_NOT_FOUND FFERRTAG(0xF8,'D','E','C') ///< Decoder not found
#define AVERROR_DEMUXER_NOT_FOUND FFERRTAG(0xF8,'D','E','M') ///< Demuxer not found
#define AVERROR_ENCODER_NOT_FOUND FFERRTAG(0xF8,'E','N','C') ///< Encoder not found
#define AVERROR_EOF FFERRTAG( 'E','O','F',' ') ///< End of file
#define AVERROR_EXIT FFERRTAG( 'E','X','I','T') ///< Immediate exit was requested; the called function should not be restarted
#define AVERROR_EXTERNAL FFERRTAG( 'E','X','T',' ') ///< Generic error in an external library
#define AVERROR_FILTER_NOT_FOUND FFERRTAG(0xF8,'F','I','L') ///< Filter not found
#define AVERROR_INVALIDDATA FFERRTAG( 'I','N','D','A') ///< Invalid data found when processing input
#define AVERROR_MUXER_NOT_FOUND FFERRTAG(0xF8,'M','U','X') ///< Muxer not found
#define AVERROR_OPTION_NOT_FOUND FFERRTAG(0xF8,'O','P','T') ///< Option not found
#define AVERROR_PATCHWELCOME FFERRTAG( 'P','A','W','E') ///< Not yet implemented in FFmpeg, patches welcome
#define AVERROR_PROTOCOL_NOT_FOUND FFERRTAG(0xF8,'P','R','O') ///< Protocol not found
#define AVERROR_STREAM_NOT_FOUND FFERRTAG(0xF8,'S','T','R') ///< Stream not found
#define AVERROR_BUG2 FFERRTAG( 'B','U','G',' ')
#define AVERROR_UNKNOWN FFERRTAG( 'U','N','K','N') ///< Unknown error, typically from an external library
#define AVERROR_EXPERIMENTAL (-0x2bb2afa8) ///< Requested feature is flagged experimental. Set strict_std_compliance if you really want to use it.
#define AVERROR_INPUT_CHANGED (-0x636e6701) ///< Input changed between calls. Reconfiguration is required. (can be OR-ed with AVERROR_OUTPUT_CHANGED)
#define AVERROR_OUTPUT_CHANGED (-0x636e6702) ///< Output changed between calls. Reconfiguration is required. (can be OR-ed with AVERROR_INPUT_CHANGED)
#define AVERROR_HTTP_BAD_REQUEST FFERRTAG(0xF8,'4','0','0')
#define AVERROR_HTTP_UNAUTHORIZED FFERRTAG(0xF8,'4','0','1')
#define AVERROR_HTTP_FORBIDDEN FFERRTAG(0xF8,'4','0','3')
#define AVERROR_HTTP_NOT_FOUND FFERRTAG(0xF8,'4','0','4')
#define AVERROR_HTTP_OTHER_4XX FFERRTAG(0xF8,'4','X','X')
#define AVERROR_HTTP_SERVER_ERROR FFERRTAG(0xF8,'5','X','X')
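If your application links against libavutil, you don't have to decode the tag by hand; a minimal sketch using av_strerror(), assuming the FFmpeg development headers and libraries are installed:

#include <stdio.h>
#include <libavutil/error.h>

int main(void)
{
    char buf[AV_ERROR_MAX_STRING_SIZE];

    /* Translate the raw exit status back into FFmpeg's own message */
    if (av_strerror(-1094995529, buf, sizeof(buf)) == 0)
        printf("%s\n", buf);   /* "Invalid data found when processing input" */
    return 0;
}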

Segment definitions for linux on x86

Linux 3.4.6 defines the following macros in arch/x86/include/asm/segment.h. Can anybody explain why the __USER macros add 3 to the defined constant and why this is not done for __KERNEL macros?
#define __KERNEL_CS (GDT_ENTRY_KERNEL_CS*8)
#define __KERNEL_DS (GDT_ENTRY_KERNEL_DS*8)
#define __USER_DS (GDT_ENTRY_DEFAULT_USER_DS*8+3)
#define __USER_CS (GDT_ENTRY_DEFAULT_USER_CS*8+3)
These four symbols are segment selectors. The two least significant bits of a selector contain the privilege level associated with it, and the third least significant bit contains the descriptor table type (GDT or LDT). This is made clearer by code occurring a little later in the same file:
/* User mode is privilege level 3 */
#define USER_RPL 0x3
/* LDT segment has TI set, GDT has it cleared */
#define SEGMENT_LDT 0x4
#define SEGMENT_GDT 0x0
/* Bottom two bits of selector give the ring privilege level */
#define SEGMENT_RPL_MASK 0x3
/* Bit 2 is table indicator (LDT/GDT) */
#define SEGMENT_TI_MASK 0x4
To build a selector, the descriptor table index is multiplied by 8, which shifts it three bits to the left, and the result is then ORed with the table type and privilege level (the +3 does that OR by addition):
/* GDT, ring 0 (kernel mode) */
#define __KERNEL_CS (GDT_ENTRY_KERNEL_CS*8)
/* GDT, ring 3 (user mode) */
#define __USER_CS (GDT_ENTRY_DEFAULT_USER_CS*8+3)
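A minimal sketch of decoding a selector with the masks above (the sample value is made up for illustration):

#include <stdio.h>

/* Masks quoted from arch/x86/include/asm/segment.h above */
#define SEGMENT_RPL_MASK 0x3
#define SEGMENT_TI_MASK  0x4

int main(void)
{
    unsigned int sel = 6 * 8 + 3;   /* hypothetical selector: GDT entry 6, ring 3 */

    printf("index = %u, table = %s, rpl = %u\n",
           sel >> 3,
           (sel & SEGMENT_TI_MASK) ? "LDT" : "GDT",
           sel & SEGMENT_RPL_MASK);
    return 0;
}

Running it prints index = 6, table = GDT, rpl = 3, which is exactly the shape the __USER_* macros produce; the __KERNEL_* macros leave the bottom bits at 0 because kernel mode is ring 0.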

how can I find native exceptions in an x64 stack?

I've been using WinDbg to debug dump files for a while now.
There's a nice "trick" that works with x86 native programs: you can scan the stack for the CONTEXT_ALL flags value (0x1003f).
In x64 the CONTEXT_ALL flags apparently aren't 0x1003f...
Now the problem is that sometimes, when you mix native with managed code, the regular methods of finding exceptions (like .exc or .lastevent) don't work.
What is the equivalent of 0x1003f in x64? Is there such a constant?
EDIT:
BTW, if you were wondering, in theory it should have been 0x10003f because of the definitions:
#define CONTEXT_I386 0x00010000
#define CONTEXT_AMD64 0x00100000
#define CONTEXT_CONTROL 0x00000001L // SS:SP, CS:IP, FLAGS, BP
#define CONTEXT_INTEGER 0x00000002L // AX, BX, CX, DX, SI, DI
#define CONTEXT_SEGMENTS 0x00000004L // DS, ES, FS, GS
#define CONTEXT_FLOATING_POINT 0x00000008L // 387 state
#define CONTEXT_DEBUG_REGISTERS 0x00000010L // DB 0-3,6,7
#define CONTEXT_EXTENDED_REGISTERS 0x00000020L // cpu specific extensions
#define CONTEXT_FULL (CONTEXT_CONTROL | CONTEXT_INTEGER | CONTEXT_SEGMENTS)
#define CONTEXT_ALL (CONTEXT_FULL | CONTEXT_FLOATING_POINT | CONTEXT_DEBUG_REGISTERS | CONTEXT_EXTENDED_REGISTERS)
#define CONTEXT_I386_FULL CONTEXT_I386 | CONTEXT_FULL
#define CONTEXT_I386_ALL CONTEXT_I386 | CONTEXT_ALL
#define CONTEXT_AMD64_FULL CONTEXT_AMD64 | CONTEXT_FULL
#define CONTEXT_AMD64_ALL CONTEXT_AMD64 | CONTEXT_ALL
But it's not...
I usually use the segment register values as my key for finding context records (ES and DS have the same value and sit next to each other in the CONTEXT structure). The flags trick is neat too, though.
Forcing an exception in a test application and then digging the context record structure off the stack, it looks like the magic value in my case would be 0x10001f:
0:000> dt ntdll!_context 000df1d0
...
+0x030 ContextFlags : 0x10001f
...
+0x03a SegDs : 0x2b
+0x03c SegEs : 0x2b
...
Also note that the ContextFlags value is not at the start of the structure, so if you find that value you'll have to subtract ##c++(#FIELD_OFFSET(ntdll!_CONTEXT, ContextFlags)) from its address to get the base of the CONTEXT structure.
Also, just in case it wasn't obvious, this value comes from a sample size of exactly one. It may not be correct in your environment and it is of course subject to change (as is anything implementation specific such as this).
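For what it's worth, 0x10001f also falls out of the AMD64 section of winnt.h, where the flag combinations are built differently from the x86 definitions quoted in the question; a small sketch with the constants renamed to avoid clashing with the SDK headers (values as I understand them, so double-check against your copy of winnt.h):

#include <stdio.h>

/* AMD64 flavour of the CONTEXT flags (renamed; compare with your winnt.h) */
#define CTX_AMD64           0x00100000
#define CTX_CONTROL         (CTX_AMD64 | 0x00000001)
#define CTX_INTEGER         (CTX_AMD64 | 0x00000002)
#define CTX_SEGMENTS        (CTX_AMD64 | 0x00000004)
#define CTX_FLOATING_POINT  (CTX_AMD64 | 0x00000008)
#define CTX_DEBUG_REGISTERS (CTX_AMD64 | 0x00000010)
/* Unlike x86, the x64 CONTEXT_FULL does not include segments or extended registers */
#define CTX_FULL (CTX_CONTROL | CTX_INTEGER | CTX_FLOATING_POINT)
#define CTX_ALL  (CTX_FULL | CTX_SEGMENTS | CTX_DEBUG_REGISTERS)

int main(void)
{
    printf("CONTEXT_FULL (x64) = 0x%x\n", CTX_FULL);   /* 0x10000b */
    printf("CONTEXT_ALL  (x64) = 0x%x\n", CTX_ALL);    /* 0x10001f */
    return 0;
}

That matches the ContextFlags value dug out of the stack above, so 0x10001f is a reasonable scan key on x64, with the usual caveat that it is implementation specific.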
