Read init_task task_struct from /dev/kmem - linux-kernel

I am root on an x86-64 Linux kernel.
My goal is to understand the task_struct structure.
I looked up the init_task address in /proc/kallsyms.
I found: 0xffffffff83613740
I wanted to read this task_struct entry from /dev/kmem with this C code (run as root):
#include <stdio.h>
#include <stdlib.h>
int main()
{
FILE* fd = fopen("/dev/kmem","r");
fseek(fd,SEEK_SET, 0xffffffff83613740);
unsigned char buffer[1];
fread(buffer,1,1,fd);
fclose(fd);
return 0;
}
As you can see, I read just a single byte, not the whole task_struct object.
I get a segfault when running this code as root.
Can you tell me what is wrong?
Doesn't the init_task entry in /proc/kallsyms give the kernel address of init's task_struct?
If not, how can I get the address?
Thanks

Related

linux hw_breakpoint does not work while accessing memory from userspace

I am debugging an ARMv7 board, and I want to know whether a kernel symbol is accessed, so I have to use hw_breakpoint in the kernel.
For simplicity, I used the kernel sample data_breakpoint, located in samples/hw_breakpoint/data_breakpoint.c, for testing.
Then I did the following operation:
insmod data_breakpoint.ko ksym=max
cat /proc/kallsyms | grep max
./read_kmem c06fa128
But this did not trigger the callback function.
If I print the value at that address from any kernel module, the callback function is triggered.
I read the CPU manual, and it says that the breakpoint registers on my CPU support virtual address matching. But I don't know why the breakpoint doesn't fire when the memory is accessed from userspace. I think the program does read the correct value of the kernel symbol.
read_kmem.c
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/mman.h>
#define DEVKMEM "/dev/kmem"
#define PAGE_SIZE 0x1000
#define PAGE_MASK (~(PAGE_SIZE-1))
int main(int argc, char* argv[])
{
    int fd;
    char *mbase;
    unsigned long varAddr;
    if (argc < 2) {
        fprintf(stderr, "usage: %s <hex kernel address>\n", argv[0]);
        exit(EXIT_FAILURE);
    }
    varAddr = strtoul(argv[1], NULL, 16);
    unsigned long ptr = varAddr & ~(PAGE_MASK);   /* offset within the page */
    fd = open(DEVKMEM, O_RDONLY);
    if (fd == -1) {
        perror("open");
        exit(EXIT_FAILURE);
    }
    /* Map the page containing varAddr; the mmap offset is the page-aligned kernel address. */
    mbase = mmap(0, PAGE_SIZE, PROT_READ, MAP_SHARED, fd, (varAddr & PAGE_MASK));
    if (mbase == MAP_FAILED) {
        printf("map failed %s\n", strerror(errno));
        close(fd);
        exit(EXIT_FAILURE);   /* never dereference MAP_FAILED */
    }
    printf("varAddr = 0x%lX \n", varAddr);
    printf("mapbase = %p \n", (void *)mbase);
    printf("value = 0x%X \n", *(unsigned int*)(mbase+ptr));
    munmap(mbase, PAGE_SIZE);
    close(fd);
    return 0;
}
Your userspace program does not access address c06fa128; it accesses a different address, the one that mmap() returned (plus the offset). Thus the breakpoint is not hit.
The fact that the virtual address being accessed resolves to the same physical address as another virtual address that has a breakpoint does not matter. The CPU executing your userspace code has no idea that the other mapping exists.

Accessing SuperBlock object of linux kernel in a system call

I am trying to access the super_block object, which is defined in linux/fs.h.
But how do I initialize the object so that I can access its fields?
I found that alloc_super() is used to initialize a super_block, but how is it called?
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <errno.h>
#include <linux/fs.h>
int main(){
printf("hello there");
struct super_block *sb;
return 0;
}
The answer is very much file-system dependent, since different file systems have different superblock layouts and, in fact, different arrangements of blocks.
For instance, the ext2 file system's superblock is at a known location on disk (byte 1024) and has a known size (sizeof(struct superblock) bytes).
So a typical implementation of what you want would be the following (this is not complete working code, but it can be made to work with minor modifications):
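#include <assert.h>
#include <stdlib.h>
#include <unistd.h>
/* struct superblock must be defined to match the on-disk layout. */
struct superblock *read_superblock(int fd) {
    struct superblock *sb = malloc(sizeof(struct superblock));
    assert(sb != NULL);
    if (lseek(fd, (off_t) 1024, SEEK_SET) == (off_t) -1 ||
        read(fd, sb, sizeof(struct superblock)) != (ssize_t) sizeof(struct superblock)) {
        free(sb);   /* seek failed or short read */
        return NULL;
    }
    return sb;
}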
Now, you can take the superblock struct from the Linux headers, or write your own struct that exactly matches the superblock of the ext2/ext3/etc. file system.
Then you must know where to find the superblock (this is where lseek() comes in).
You also need to pass a file descriptor for the disk device to the function.
So do:
int fd = open(argv[1], O_RDONLY);
struct superblock * sb = read_superblock(fd);

Retrieving inode struct given the path to a file

I've seen lots of questions about getting a file's path from its inode, but almost none about doing the reverse. My kernel module needs to do this to get further information about the subjects of requests passed to open(), such as its file flags or whether or not it's a device. From what I was able to scrounge together from mailing lists, manual pages, and the Linux source code, I came up with this small function:
struct inode* get_inode_from_pathname(const char *pathname) {
struct path path;
kern_path(pathname, LOOKUP_FOLLOW, &path);
return path.dentry->d_inode;
}
Trying to use it in my replacement system call causes kernel messages to be printed to the console, though:
struct inode *current_inode;
...
asmlinkage int custom_open(char const *__user file_name, int flags, int mode) {
current_inode = get_inode_from_pathname(file_name);
printk(KERN_INFO "intercepted: open(\"%s\", %X, %X)\n", file_name, flags, mode);
printk(KERN_INFO "i_mode of %s:%hu\n", file_name, current_inode->i_mode);
return real_open(file_name, flags, mode);
}
Is there a better way to do this? I'm almost positive my way is wrong.
You can use the kern_path() kernel API to get the inode from a path string. It in turn calls do_path_lookup(), which performs the actual path lookup. You can verify the result of kern_path() by printing the inode number (the i_ino field of struct inode) of the inode returned by your get_inode_from_pathname() function and comparing it with the inode number shown by ls -i <path of the file>.
I wrote the following kernel module, and it does not crash the kernel. I am using the 2.6.39 kernel.
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/init.h>
#include <linux/mount.h>
#include <linux/path.h>
#include <linux/namei.h>
#include <linux/fs.h>
static char *path_name = "/home/shubham/test_prgs/temp.c";
static int __init myinit(void)
{
    struct inode *inode;
    struct path path;
    int err = kern_path(path_name, LOOKUP_FOLLOW, &path);
    if (err) {
        printk(KERN_ERR "kern_path(%s) failed: %d\n", path_name, err);
        return err;
    }
    inode = path.dentry->d_inode;
    printk(KERN_INFO "Path name : %s, inode :%lu\n", path_name, inode->i_ino);
    path_put(&path);   /* drop the reference taken by kern_path() */
    return 0;
}
static void __exit myexit(void)
{
}
module_init(myinit);
module_exit(myexit);
//MODULE_AUTHOR("Shubham");
//MODULE_DESCRIPTION("Module to get inode from path");
MODULE_LICENSE("GPL");
Can you send the crash stack trace?
I guess the author has already fixed the problem, but this question was the first link in Google's search results, so I'll explain it further.
The problem with the code in the question is that it uses a __user pointer directly.
When you hook a function that deals with __user pointers, the first thing you have to do is copy the content into your own kernel buffer and work on that copy, or otherwise make sure the pointer won't become invalid while you are using it.
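To copy it into your buffer you could use copy_from_user(); for a NUL-terminated string such as a path, strncpy_from_user() is the better fit, because it stops at the terminator and never writes more than the buffer size:
char *path = kmalloc(PATH_MAX, GFP_KERNEL);   /* <linux/slab.h>; PATH_MAX is too big for the kernel stack */
long len;
if (!path)
    return -ENOMEM;
len = strncpy_from_user(path, user_path, PATH_MAX);
if (len < 0 || len == PATH_MAX) {
    //error: bad user pointer (-EFAULT) or path too long (not NUL-terminated)
    kfree(path);
    return len < 0 ? len : -ENAMETOOLONG;
}
//success: path is a NUL-terminated kernel copy; kfree() it when done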
If you hook sys_open, you can use the getname()/putname() functions, as do_sys_open() does:
long do_sys_open(int dfd, const char __user *filename, int flags, umode_t mode)
{
    struct open_flags op;
    int fd = build_open_flags(flags, mode, &op);
    struct filename *tmp;
    if (fd)
        return fd;
    tmp = getname(filename);
    if (IS_ERR(tmp))
        return PTR_ERR(tmp);
    fd = get_unused_fd_flags(flags);
    if (fd >= 0) {
        struct file *f = do_filp_open(dfd, tmp, &op);
        if (IS_ERR(f)) {
            put_unused_fd(fd);
            fd = PTR_ERR(f);
        } else {
            fsnotify_open(f);
            fd_install(fd, f);
        }
    }
    putname(tmp);
    return fd;
}
P.S.: The code from S_S's answer won't crash because its path string lives in kernel memory (a module global), so it can't become invalid while the module is working with it.

Limiting memory usage for a single process in OSX /Darwin

I am trying to modify some JNI code to limit the amount of memory that a process can consume. Here is the code I am using to test setrlimit on Linux and OS X. On Linux it works as expected, and buf is NULL.
This code sets the limit to 32 MB and then tries to malloc a 64 MB buffer; if the buffer is NULL, setrlimit works.
#include <sys/time.h>
#include <sys/resource.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#define errExit(msg) do { perror(msg); exit(EXIT_FAILURE); \
} while (0)
int main(void) {
    struct rlimit current;
    rlim_t memLimit = 32 * 1024 * 1024;
    int result = getrlimit(RLIMIT_AS, &current);
    if (result != 0)
        errExit("Unable to get rlimit");
    current.rlim_cur = memLimit;
    current.rlim_max = memLimit;
    result = setrlimit(RLIMIT_AS, &current);
    if (result != 0)
        errExit("Unable to setrlimit");
    printf("Doing malloc \n");
    size_t memSize = 64 * 1024 * 1024;
    char *buf = malloc(memSize);
    if (buf == NULL) {
        printf("Your out of memory\n");
    } else {
        printf("Malloc successsful\n");
    }
    free(buf);
    return 0;
}
On the Linux machine, this is my result:
memtest]$ ./m200k
Doing malloc
Your out of memory
On OS X 10.8:
./m200k
Doing malloc
Malloc successsful
My question is: if this does not work on OS X, is there a way to accomplish this task in the Darwin kernel? The man pages all seem to say it will work, but it does not appear to do so. I have seen that launchctl has some support for limiting memory, but my goal is to add this ability in code. I also tried ulimit, but that did not work either, and I am fairly sure ulimit uses setrlimit to set limits. Also, is there a signal I can catch when the setrlimit soft or hard limit is exceeded? I haven't been able to find one.
Bonus points if it can be accomplished on Windows too.
Thanks for any advice
Update
As pointed out, the header defines RLIMIT_RSS as an alias for RLIMIT_AS, so when referring to the documentation, RLIMIT_RSS and RLIMIT_AS are interchangeable on OS X.
/usr/include/sys/resource.h on osx 10.8
#define RLIMIT_RSS RLIMIT_AS /* source compatibility alias */
I tested trojanfoe's excellent suggestion to use RLIMIT_DATA, which is described here:
The RLIMIT_DATA limit specifies the maximum amount of bytes the process
data segment can occupy. The data segment for a process is the area in which
dynamic memory is located (that is, memory allocated by malloc() in C, or in C++,
with new()). If this limit is exceeded, calls to allocate new memory will fail.
The result was the same on Linux and OS X: malloc succeeded on both.
chinshaw#osx$ ./m200k
Doing malloc
Malloc successsful
chinshaw#redhat ./m200k
Doing malloc
Malloc successsful

Problems doing syscall hooking

I use the following module code to hook a syscall (the code is credited to someone else, e.g., Linux Kernel: System call hooking example).
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/moduleparam.h>
#include <linux/unistd.h>
#include <asm/semaphore.h>
#include <asm/cacheflush.h>
void **sys_call_table;
asmlinkage int (*original_call) (const char*, int, int);
asmlinkage int our_sys_open(const char* file, int flags, int mode)
{
printk(KERN_ALERT "A file was opened\n");
return original_call(file, flags, mode);
}
int set_page_rw(long unsigned int _addr)
{
struct page *pg;
pgprot_t prot;
pg = virt_to_page(_addr);
prot.pgprot = VM_READ | VM_WRITE;
return change_page_attr(pg, 1, prot);
}
int init_module()
{
// sys_call_table address in System.map
sys_call_table = (void*)0xffffffff804a1ba0;
original_call = sys_call_table[1024];
set_page_rw(sys_call_table);
sys_call_table[1024] = our_sys_open;
return 0;
}
void cleanup_module()
{
// Restore the original call
sys_call_table[1024] = original_call;
}
When I insmod the compiled .ko file, the terminal prints "Killed". Looking at /proc/modules (cat /proc/modules), the module is stuck in the Loading state:
my_module 10512 1 - Loading 0xffffffff882e7000 (P)
As expected, I cannot rmmod this module, as it complains that it is in use. I rebooted the system to get a clean slate.
Later on, after commenting out two lines in the above source, sys_call_table[1024] = our_sys_open; and sys_call_table[1024] = original_call;, the module can be insmod-ed successfully. More interestingly, when I uncomment these two lines (changing back to the original code), the recompiled module can also be insmod-ed successfully. I don't quite understand why this happens. Is there any way to compile the code and insmod it successfully on the first try?
I did all this on Red Hat with Linux kernel 2.6.24.6.
I think you should take a look at the kprobes API, which is well documented in Documentation/kprobes.txt. It lets you install a handler at almost any kernel address (e.g., a syscall entry point), so you can do what you want there. An added bonus is that your code would be more portable.
If you're only interested in tracing those syscalls, you can use the audit subsystem and write your own userland daemon that receives events from the audit kthread over a NETLINK socket. libaudit provides a simple API to register for and read events.
If you do have a good reason for not using kprobes/audit, I would suggest checking that the address you are trying to write to is not beyond the page that you made writable. A quick calculation shows that:
offset_in_sys_call_table * sizeof(*sys_call_table) = 1024 * 8 = 8192
which is two pages after the one you set writable if you are using 4K pages.

Resources