Run a process at the same physical memory location - linux-kernel

For a research project, I have a long-running process that uses various buffers and stack variables. I'd like to be able to launch this process multiple times such that the physical addresses backing its heap, stack, code, and static variables are equal each time. I know the exact size of all of these variables, and the size of the heap and stack stay constant during execution. To help with this, I use some helper code to translate arbitrary virtual addresses in my program to their corresponding physical addresses (sourced from here):
struct pagemap
{
union status
{
struct present
{
unsigned long long pfn : 54;
unsigned char soft_dirty : 1;
unsigned char exclusive : 1;
unsigned char zeroes : 4;
unsigned char type : 1;
unsigned char swapped : 1;
unsigned char present : 1;
} present;
struct swapped
{
unsigned char swaptype : 4;
unsigned long long offset : 50;
unsigned char soft_dirty : 1;
unsigned char exclusive : 1;
unsigned char zeroes : 4;
unsigned char type : 1;
unsigned char swapped : 1;
unsigned char present : 1;
} swapped;
} status;
} __attribute__ ((packed));
unsigned long get_pfn_for_addr(void *addr)
{
unsigned long offset;
struct pagemap pagemap;
FILE *pagemap_file = fopen("/proc/self/pagemap", "rb");
offset = (unsigned long) addr / getpagesize() * 8;
if(fseek(pagemap_file, offset, SEEK_SET) != 0)
{
fprintf(stderr, "failed to seek pagemap to offset\n");
exit(1);
}
fread(&pagemap, 1, sizeof(struct pagemap), pagemap_file);
fclose(pagemap_file);
return pagemap.status.present.pfn;
}
unsigned long virt_to_phys(void *addr)
{
unsigned long pfn, page_offset, phys_addr;
pfn = get_pfn_for_addr(addr);
page_offset = (unsigned long) addr % getpagesize();
phys_addr = (pfn << PAGE_SHIFT) + page_offset;
return phys_addr;
}
So far, my methodology has only required that a specific buffer in my program is located at the same physical address for each run. For this, I was just able to exit and relaunch the process whenever the physical address for that buffer was wrong, and I would end up with the correct location relatively quickly each time. However, I'd like to extend my experiment to ensure that my process is loaded identically in physical memory between runs, and this try-and-restart method does not seem to work well for this. Ideally, I would like to be able to set apart some small number of physical page frames that can't be allocated to another process, or to the kernel itself. Then, I would pass a flag down to do_fork that notifies the kernel that this is my special process and to allocate specific page frames to it.
My questions are:
Is there any sort of isolation mechanism already built into the kernel that would let me set aside an exclusive physical memory space that I could launch my process in?
If not, what would be a starting point for modifying the kernel to support behavior like this?
Is there any other solution (not involving either of the two above) that I could use for my desired behavior?

This is something that the kernel, using virtual memory, is tasked to abstract from you, so I'm not sure it is even possible to do (without insane amounts of work).
May I ask what experiment requires this? Perhaps if you describe what you want to achieve, it is easier to offer advice.

Related

How can i know the minor on Linux module initialisation

I am writing a linux kernel module.
Here is what i've done in module's init function:
register_chrdev(300 /* major */, "mydev", &fops);
It works fine. But i need to know the minor number.
I have read we cannot set this minor number. It is the kernel which gives us this number. If so, how can i know it in module's init function ?
Thanks
register_chrdev calls __register_chrdev internally.
static inline int register_chrdev(unsigned int major, const char *name,
const struct file_operations *fops)
{
return __register_chrdev(major, 0, 256, name, fops);
}
If you will see __register_chrdev function signature, it is
int __register_chrdev(unsigned int major, unsigned int baseminor,
unsigned int count, const char *name,
const struct file_operations *fops)
register_chrdev will pass your major number(300) and a base minor number 0 with a count of 256. So, it will reserve 0-255 minor number range for your device.
Also, in the definition of __register_chrdev, dev_t structure is created (contains major & minor number) for your device.
err = cdev_add(cdev, MKDEV(cd->major, baseminor), count);
MKDEV(cd->major, baseminor) creates it. So, the first device number(dev_t) will have 0 as its minor number. Besides, count(256) is the consecutive minor numbers that can be further used.
You can also dynamically get the major & minor number if you use alloc_chrdev_region. All you have to do is pass a dev_t struct
to alloc_chrdev_region. It will dynamically allocate a major and minor number to your device. To get the major and minor number in your module, you can use
major = MAJOR(dev);
minor = MINOR(dev);

How to identify what parts of the allocated virtual memory a process is using

I want to be able to search through the allocated memory of a process (say you open notepad and type “HelloWorld” then ran the search looking for the string “HelloWorld”). For 32bit applications this is not a problem but for 64 bit applications the large quantity of allocated virtual memory takes hours to search through.
Obviously the vast majority of applications are not utilising the full amount of virtual memory allocated. I can identify the areas in memory allocated to each process with VirtualQueryEX and read them with ReadProcessMemory but when it comes to 64 bit applications this still takes hours to complete.
Does anyone know of any resources or any methods that could be used to help narrow down the amount of memory to be searched?
It is important that you only scan proper memory. If you just scanned from 0x0 to 0xFFFFFFFFF it would take at least 5 seconds in most processes. You can skip bad regions of memory by checking the memory page settings by using VirtualQueryEx. This will retrieve a MEMORY_BASIC_INFORMATION which will define the state of that memory region.
If the MemoryBasicInformation.state is not MEM_COMMIT then it is bad memory
If the MBI.Protect is PAGE_NOACCESS you also want to skip this memory.
If VirtualQuery fails then you skip to the next region.
In this manner it should only take 0-2 seconds to scan the memory on your average process because it is only scanning good memory.
char* ScanEx(char* pattern, char* mask, char* begin, intptr_t size, HANDLE hProc)
{
char* match{ nullptr };
SIZE_T bytesRead;
DWORD oldprotect;
char* buffer{ nullptr };
MEMORY_BASIC_INFORMATION mbi;
mbi.RegionSize = 0x1000;//
VirtualQueryEx(hProc, (LPCVOID)begin, &mbi, sizeof(mbi));
for (char* curr = begin; curr < begin + size; curr += mbi.RegionSize)
{
if (!VirtualQueryEx(hProc, curr, &mbi, sizeof(mbi))) continue;
if (mbi.State != MEM_COMMIT || mbi.Protect == PAGE_NOACCESS) continue;
delete[] buffer;
buffer = new char[mbi.RegionSize];
if (VirtualProtectEx(hProc, mbi.BaseAddress, mbi.RegionSize, PAGE_EXECUTE_READWRITE, &oldprotect))
{
ReadProcessMemory(hProc, mbi.BaseAddress, buffer, mbi.RegionSize, &bytesRead);
VirtualProtectEx(hProc, mbi.BaseAddress, mbi.RegionSize, oldprotect, &oldprotect);
char* internalAddr = ScanBasic(pattern, mask, buffer, (intptr_t)bytesRead);
if (internalAddr != nullptr)
{
//calculate from internal to external
match = curr + (internalAddr - buffer);
break;
}
}
}
delete[] buffer;
return match;
}
ScanBasic is just a standard comparison function which compares your pattern against the buffer.
Second, if you know the address is relative to a module, only scan the address range of that module, you can get the size of the module via ToolHelp32Snapshot. If you know it's dynamic memory on the heap, then only scan the heap. You can get all the heaps also with ToolHelp32Snapshot and TH32CS_SNAPHEAPLIST.
You can make a wrapper for this function as well for scanning the entire address space of the process might look something like this
char* Pattern::Ex::ScanProc(char* pattern, char* mask, ProcEx& proc)
{
unsigned long long int kernelMemory = IsWow64Proc(proc.handle) ? 0x80000000 : 0x800000000000;
return Scan(pattern, mask, 0x0, (intptr_t)kernelMemory, proc.handle);
}

how to transfer string(char*) in kernel into user process using copy_to_user

I'm making code to transfer string in kernel to usermode using systemcall and copy_to_user
here is my code
kernel
#include<linux/kernel.h>
#include<linux/syscalls.h>
#include<linux/sched.h>
#include<linux/slab.h>
#include<linux/errno.h>
asmlinkage int sys_getProcTagSysCall(pid_t pid, char **tag){
printk("getProcTag system call \n\n");
struct task_struct *task= (struct task_struct*) kmalloc(sizeof(struct task_struct),GFP_KERNEL);
read_lock(&tasklist_lock);
task = find_task_by_vpid(pid);
if(task == NULL )
{
printk("corresponding pid task does not exist\n");
read_unlock(&tasklist_lock);
return -EFAULT;
}
read_unlock(&tasklist_lock);
printk("Corresponding pid task exist \n");
printk("tag is %s\n" , task->tag);
/*
task -> tag : string is stored in task->tag (ex : "abcde")
this part is well worked
*/
if(copy_to_user(*tag, task->tag, sizeof(char) * task->tag_length) !=0)
;
return 1;
}
and this is user
#include<stdio.h>
#include<stdlib.h>
int main()
{
char *ret=NULL;
int pid = 0;
printf("PID : ");
scanf("%4d", &pid);
if(syscall(339, pid, &ret)!=1) // syscall 339 is getProcTagSysCall
printf("pid %d does not exist\n", pid);
else
printf("Corresponding pid tag is %s \n",ret); //my output is %s = null
return 0;
}
actually i don't know about copy_to_user well. but I think copy_to_user(*tag, task->tag, sizeof(char) * task->tag_length) is operated like this code
so i use copy_to_user like above
#include<stdio.h>
int re();
void main(){
char *b = NULL;
if (re(&b))
printf("success");
printf("%s", b);
}
int re(char **str){
char *temp = "Gdg";
*str = temp;
return 1;
}
Is this a college assignment of some sort?
asmlinkage int sys_getProcTagSysCall(pid_t pid, char **tag){
What is this, Linux 2.6? What's up with ** instead of *?
printk("getProcTag system call \n\n");
Somewhat bad. All strings are supposed to be prefixed.
struct task_struct *task= (struct task_struct*) kmalloc(sizeof(struct task_struct),GFP_KERNEL);
What is going on here? Casting malloc makes no sense whatsoever, if you malloc you should have used sizeof(*task) instead, but you should not malloc in the first place. You want to find a task and in fact you just overwrite this pointer's value few lines later anyway.
read_lock(&tasklist_lock);
task = find_task_by_vpid(pid);
find_task_by_vpid requires RCU. The kernel would have told you that if you had debug enabled.
if(task == NULL )
{
printk("corresponding pid task does not exist\n");
read_unlock(&tasklist_lock);
return -EFAULT;
}
read_unlock(&tasklist_lock);
So... you unlock... but you did not get any kind of reference to the task.
printk("Corresponding pid task exist \n");
printk("tag is %s\n" , task->tag);
... in other words by the time you do task->tag, the task may already be gone. What requirements are there to access ->tag itself?
if(copy_to_user(*tag, task->tag, sizeof(char) * task->tag_length) !=0)
;
What's up with this? sizeof(char) is guaranteed to be 1.
I'm really confused by this entire business.
When you have a syscall which copies data to userspace where amount of data is not known prior to the call, teh syscall accepts both buffer AND its size. Then you can return appropriate error if the thingy you are trying to copy would not fit.
However, having a syscall in the first place looks incorrect. In linux per-task data is exposed to userspace in /proc/pid/. Figuring out how to add a file to proc is easy and left as an exercise for the reader.
It's quite obvious from the way you fixed it. copy_to_user() will only copy data between two memory regions - one accessible only to kernel and the other accessible also to user. It will not, however, handle any memory allocation. Userspace buffer has to be already allocated and you should pass address of this buffer to the kernel.
One more thing you can change is to change your syscall to use normal pointer to char instead of pointer to pointer which is useless.
Also note that you are leaking memory in your kernel code. You allocate memory for task_struct using kmalloc and then you override the only pointer you have to this memory when calling find_task_by_vpid() and this memory is never freed. find_task_by_vpid() will return a pointer to a task_struct which already exists in memory so there is no need to allocate any buffer for this.
i solved my problem by making malloc in user
I changed
char *b = NULL;
to
char *b = (char*)malloc(sizeof(char) * 100)
I don't know why this work properly. but as i guess copy_to_user get count of bytes as third argument so I should malloc before assigning a value
I don't know. anyone who knows why adding malloc is work properly tell me

Is a spinlock necessary in this Linux device driver code?

Is the following Linux device driver code safe, or do I need to protect access to interrupt_flag with a spinlock?
static DECLARE_WAIT_QUEUE_HEAD(wq_head);
static int interrupt_flag = 0;
static ssize_t my_write(struct file* filp, const char* __user buffer, size_t length, loff_t* offset)
{
interrupt_flag = 0;
wait_event_interruptible(wq_head, interrupt_flag != 0);
}
static irqreturn_t handler(int irq, void* dev_id)
{
interrupt_flag = 1;
wake_up_interruptible(&wq_head);
return IRQ_HANDLED;
}
Basically, I kick off some event in my_write() and wait for the interrupt to indicate that it completes.
If so, which form of spin_lock() do I need to use? I thought spin_lock_irq() was appropriate, but when I tried that I got a warning about the IRQ handler enabling interrupts.
Doesn't wait_event_interruptible evaluate the interrupt_flag != 0 condition? That would imply that the lock should be held while it reads the flag, right?
No lock is needed in the example given. Memory barriers are needed after the store of the flag, and before the load -- to ensure visibility to the flag -- but the wait_event_* and wake_up_* functions provide those. See the section entitled "Sleep and wake-up functions" in this document: https://www.kernel.org/doc/Documentation/memory-barriers.txt
Before adding a lock, consider what is being protected. Generally locks are needed if you're setting two or more separate pieces of data and you need to ensure that another cpu/core doesn't see an incomplete intermediate state (after you started but before you finished). In this case, there's no point in protecting the storing / loading of the flag value because stores and loads of a properly aligned integer are always atomic.
So, depending on what else your driver is doing, it's quite possible you do need a lock, but it isn't needed for the snippet you've provided.
Yes you need a lock. With the given example (that uses int and no specific arch is mentioned), the process context may be interrupted while accessing the interrupt_flag. Upon return from the IRQ, it may continue and interrupt_flag may be left in inconsistent state.
Try this:
static DECLARE_WAIT_QUEUE_HEAD(wq_head);
static int interrupt_flag = 0;
DEFINE_SPINLOCK(lock);
static ssize_t my_write(struct file* filp, const char* __user buffer, size_t length, loff_t* offset)
{
/* spin_lock_irq() or spin_lock_irqsave() is OK here */
spin_lock_irq(&lock);
interrupt_flag = 0;
spin_unlock_irq(&lock);
wait_event_interruptible(wq_head, interrupt_flag != 0);
}
static irqreturn_t handler(int irq, void* dev_id)
{
unsigned long flags;
spin_lock_irqsave(&lock, flags);
interrupt_flag = 1;
spin_unlock_irqrestore(&lock, flags);
wake_up_interruptible(&wq_head);
return IRQ_HANDLED;
}
IMHO, the code has to be written without making any arch or compiler-related assumptions (like the 'properly aligned integer' in Gil Hamilton answer).
Now if we can change the code and use atomic_t instead of the int flag, then no locks should be needed.

Setting register values in PIC16F876 using Hi Tech PICC

I am using MPLABx and the HI Tech PICC compiler. My target chip is a PIC16F876. By looking at the pic16f876.h include file, it appears that it should be possible to set the system registers of the chip by referring to them by name.
For example, within the CCP1CON register, bits 0 to 3 set how the CCP and PWM modules work. By looking at the pic16f876.h file, it looks like it should be possible to refer to these 4 bits alone, without change the value of the rest of the CCP1CON register.
However, I have tried to refer to these 4 bits in a variety of ways with no success.
I have tried;
CCP1CON.CCP1M=0xC0; this results in "error: struct/union required
CCP1CON:CCP1M=0xC0; this results in "error: undefined identifier "CCP1M"
but both have failed. I have read through the Hi Tech PICC compiler manual, but cannot see how to do this.
From the pic16f876.h file, it looks to me as though I should be able to refer to these subsets within the system registers by name, as they are defined in the .h file.
Does anyone know how to accomplish this?
Excerpt from pic16f876.h
// Register: CCP1CON
volatile unsigned char CCP1CON # 0x017;
// bit and bitfield definitions
volatile bit CCP1Y # ((unsigned)&CCP1CON*8)+4;
volatile bit CCP1X # ((unsigned)&CCP1CON*8)+5;
volatile bit CCP1M0 # ((unsigned)&CCP1CON*8)+0;
volatile bit CCP1M1 # ((unsigned)&CCP1CON*8)+1;
volatile bit CCP1M2 # ((unsigned)&CCP1CON*8)+2;
volatile bit CCP1M3 # ((unsigned)&CCP1CON*8)+3;
#ifndef _LIB_BUILD
volatile union {
struct {
unsigned CCP1M : 4;
unsigned CCP1Y : 1;
unsigned CCP1X : 1;
};
struct {
unsigned CCP1M0 : 1;
unsigned CCP1M1 : 1;
unsigned CCP1M2 : 1;
unsigned CCP1M3 : 1;
};
} CCP1CONbits # 0x017;
#endif
You need to access the bitfield members through an instance of a struct. In this case, that is CCP1CONbits. Because it is a bitfield, you only need to have the number of significant bits as defined in the bitfield, not the full eight bits in your code.
So:
CCP1CONbits.CCP1M = 0x0c;
Should be the equivalent of what you are trying to do. If you want to set all eight bits at once you can use CCP1CON = 0xc0. That would set the CCP1M bits to 0x0c and all the other bits to zero.
The header you gave also has individual bit symbols, so you could do this too:
CCP1M0 = 1;
CCP1M1 = 1;
CCP1M2 = 0;
CCP1M3 = 0;
Although the bitfield approach is cleaner.

Resources