Not understanding heap overflow article - memory-management

Hi, I'm trying to understand how heap overflows work, and I've been reading this article, which seems very foggy to me. Below is the page of the article that I am stuck on.
http://www.h-online.com/security/features/A-Heap-of-Risk-747224.html
My understanding ceases after the second half of page 4 of the article. They implement their own heap manager on page 2, which may also be useful. The figure below represents the heap data structure after the string copy into the image data (hopefully this is right).
Root = Hdr                    Free Memory
 __________________            _________________
|*Next = 0xF       |--->0xF-->|*Next = "AAAA"   |
|------------------|          |-----------------|
|*Previous = NULL  |          |*Previous="AAAA" |
|------------------|          |-----------------|
|Size = 0          |          |Size = "AAAA"    |
|------------------|          |-----------------|
|Used = 0          |          |Used = "AAAA"    |
|------------------|          |-----------------|
|Free Mem Data     |
 ------------------
(Let Root start at 0x0. Each field is 32 bits and thus 4 bytes wide. "AAAA" stands for the string "AAAA", where each 'A' is a character and therefore one byte of memory.)
The tutorial says that when the memory is supposedly freed, the function Free_Heap() will want to read from the address "AAAA" = 0x4141414d. Their explanation is that the "used" field is at an offset of 12 bytes from the beginning of the header section, and thus 0x41414141 + 0xc = 0x4141414d. To me that explanation makes no sense, for the following reasons.
A) Why would Free_Heap() even try to read from the address in the "used" field, when that value only tells Free_Heap() whether or not the data in the heap structure is being used? Unless the "used" field is a pointer to the actual data being written (which is not mentioned in the tutorial), this makes no sense to me.
B) Assuming that the "used" field in the heap struct really is a pointer to the data that may be written to, why would the offset have anything to do with where the heap should be read from? Maybe if the data section were located right after the "used" pointer field (like in a stack), that would mean the data should be placed at an offset of 0x10 (just past the 16-byte header) and not 0xc, so that the data does not overwrite the "used" field.
Thanks for any helpful input to clear this up.

That part of the article seems either wrong or just really badly written. Free_Heap() will read hdr->next->used to check whether the follow-on memory object is in use; since the stomped header's next field is 0x41414141 and used sits 12 bytes into the header, that read targets 0x41414141 + 0xc = 0x4141414d. As you say, the stomped header's own used and size fields hold 0x41414141, so we won't try to merge with it. Still, the setup is sound; you will shortly afterwards dereference one of those pointers: when freeing the 'line' memory object (the one whose header we stomped), the allocator will check whether its next and prev memory blocks are in use, and dereferencing either of those attacker-controlled pointer fields will crash or can be actively exploited.

Linux `alloc_pages_node` not incrementing `_refcount` for all allocated pages

When allocating physically contiguous memory with alloc_pages_node in Linux v6.0, the _refcount in struct page is not incremented for all of the allocated pages. Only the first page of the allocation has its _refcount incremented.
Is this correct/intended behavior?
Is this function only intended to be used in particular use cases/in a particular way such that the incorrect _refcount is accounted for?
Context: alloc_pages* are a family of kernel functions for allocating a physically contiguous set of pages (see the kernel documentation). These functions return a pointer to the struct page corresponding to the first page of the allocated region.
I am using this function during early boot (in fact while setting up the stacks for the init process and for kthreadd).
By this point, the buddy-allocator is functional and usable.
Similar APIs (ignoring the need for physical contiguity) such as vmalloc increment the _refcount for all allocated pages.
This is the code I am running. The output is also listed below.
Code
order = get_order(nr_pages << PAGE_SHIFT); /* get_order() takes a size in bytes */
p = alloc_pages_node(node, gfp_mask, order);
if (!p)
    return;
for (i = 0; i < nr_pages; i++, p++)
    printk("_refcount = %d\n", page_ref_count(p));
Output
_refcount = 1
_refcount = 0
_refcount = 0
...
Arguments
gfp_mask is (THREADINFO_GFP & ~__GFP_ACCOUNT) | __GFP_NOWARN | __GFP_HIGHMEM.
The first part THREADINFO_GFP & ~__GFP_ACCOUNT of this is sent by alloc_thread_stack_node
__vmalloc_area_node adds __GFP_NOWARN | __GFP_HIGHMEM
order = get_order(nr_pages << PAGE_SHIFT) = 2, since nr_pages is 4.
Is this correct/intended behavior?
Yes, this is normal. Page allocations of order higher than 0 are effectively treated as a single "high-order" page(1) by the buddy allocator, so functions such as alloc_pages() and __free_pages(), which operate on both order-0 and high-order pages, only care about the reference count of the first page.
Upon allocation (alloc_pages), only the first struct page of the group gets its refcount initialized. Upon deallocation (__free_pages), the refcount of the first page is decremented and tested: if it reaches zero, the whole group of pages gets actually freed(2). When this happens, a sanity check is also performed on every single page to ensure that the reference count is zero.
If you intend to allocate multiple pages at once, but then manage them separately, you will need to split them using split_page(), which effectively "enables" reference counting for every single struct page and initializes its refcount to 1. You can then use __free_pages(p, 0) (or __free_page()) on each page separately.(3)
Similar APIs (ignoring the need for physical contiguity) such as vmalloc increment the _refcount for all allocated pages.
Whether to allocate single order-0 pages or do a higher-order allocation is a choice that depends on the semantics of the specific memory allocation API. Problem is, these semantics can often change based on the actual API usage in kernel code(4). Indeed, as of now, vmalloc() splits the high-order page obtained from alloc_pages() using split_page(), but this was only a recent change, made because some of its callers were relying on the allocated pages being independent (e.g., doing their own reference counting).
(1) Not to be confused with compound pages, although their refcounting is performed in the same way, i.e. only the first page (PageHead()) is refcounted.
(2) It is actually a little bit more complex than that, all pages except the first are freed regardless of the refcount of the first, to avoid memory leaks in rare situations, see this relevant commit. The refcount sanity check on all the freed pages is done anyway.
(3) Note that allocating high-order pages and then splitting them into order-0 pages is generally not a good idea, as you can guess from the comment on top of split_page(): "Note: this is probably too low level an operation for use in drivers. Please consult with lkml before using this in your driver." - This is because high-order allocations are harder to satisfy than order-0 allocations, and breaking up high-order page blocks only makes it even harder.
(4) Welcome to the magic world of kernel APIs I guess. Much like Hogwarts' staircases, they like to change.

How can I calculate the file offset from the memory virtual address of the export table?

So, I was trying to read a DLL file. Everything was fine until I reached the Optional Header's Data Directories, specifically its first member, the Export Table.
My problem is that I can't move my reader's offset, because the virtual address member is a memory VA while my reader works with file offsets. Maybe a visual example helps:
As you can see, the virtual address that this PE viewer shows for the Export Table Address in the Data Directory (Optional Header) is the value 0x00002630 (let's refer to it as hex1 from now on).
However, when I click on the Export Table to see the actual content, the program converts this address from a memory address to a file offset, redirecting me to this address as the result:
The address it redirects me to is 0x00001a30 (let's refer to it as hex2 from now on).
I did some tests on my own, like dividing hex1 by 8, because I thought it could be the transition from the memory alignment, which is 4096, to the file alignment, which is 512, but it didn't give me the same result as hex2. I also tried some weird stuff to derive the formula, but it gave me even more bizarre results.
So, my question is: how can I get/calculate that file offset (hex2) if I only know the memory offset from the Data Directory (hex1)?
Assuming you are using MSVC C/C++, you first need to locate the array of IMAGE_SECTION_HEADER structures following the Optional Header. The SDK has a macro called IMAGE_FIRST_SECTION(pNtHeaders), to which you just pass the pointer to your PE header to make this process easier. It basically skips past the Optional Header in memory, which is where the section headers start. This macro works on both 32-bit and 64-bit Windows PE files.
Once you have the address of the IMAGE_SECTION_HEADER array, you loop through the structures up to FileHeader.NumberOfSections using pointer math. Each structure describes the relative starting memory address (VirtualAddress) of a named PE section, along with the file offset (PointerToRawData) of that section within the file you have loaded.
The size of the section WITHIN the file is SizeOfRawData. At this point, you have everything you need to translate any given RVA to a file offset. First, range-check each IMAGE_SECTION_HEADER's VirtualAddress against the RVA you are looking up, i.e.:
if (uRva >= pSect->VirtualAddress && uRva < (pSect->VirtualAddress + pSect->SizeOfRawData))
{
//found
}
If you find a matching section, you then subtract the VirtualAddress from your lookup RVA, then add the PointerToRawData offset:
uFileOffset = uRva - pSect->VirtualAddress + pSect->PointerToRawData;
This results in an offset from the beginning of the file corresponding to that RVA. At this point you have translated the RVA to a file offset.
NOTE: Due to padding, incorrect PE files, etc., you may find not all RVAs will map to a location within the file at which point you might display an error message.

What exactly set_bh_page does for a given buffer head in page cache?

I was diving into the kernel source code and I noticed the function set_bh_page(). However, I could not clearly understand what it does.
I could only find this comment in the fs/buffer.c file:
/* Link the buffer to its page */
set_bh_page(bh, page, offset);
But it is still not clear to me what it does.
So, to make it clear, I want to understand the relationship of this function call to the buffer and the physical page, as well as whether it has anything to do with the page cache itself.
UPDATE 1:
The function alloc_page_buffers() calls this set_bh_page(), and there is some comment about that, as follows:
Create the appropriate buffers when given a page for the data area and the size of each buffer. Use the bh->b_this_page linked list to follow the buffers created. Return NULL if unable to create more buffers.
And I checked who calls alloc_page_buffers(); one of its callers is read_page(), which has this description:
Read a page from a file.
We both read the page, and attach buffers to the page to record the address of each block (using bmap). These addresses will be used to write the block later, completely bypassing the filesystem. This usage is similar to how swap files are handled, and allows us to write to a file with no concerns of memory allocation failing.
So, by looking through the source code of read_page(), my understanding is that the allocated buffer_head must be associated with its physical page address, like a direct mapping.
Is that correct?
When the kernel needs to access a block from a block device and it discovers that there is no page in the page cache that contains the block, it allocates a page, called a block device buffer page or simply a buffer page, and then writes to it the requested block(s). The process starts with the grow_buffers function, which calls alloc_page_buffers, which is declared as follows:
struct buffer_head *alloc_page_buffers(struct page *page, unsigned long size, bool retry);
page points to the descriptor of the buffer page that is going to hold the block. size represents the size of a block in bytes; all blocks of a buffer page are of the same size. Note that a block is a memory region of the block device, while a buffer is a memory region of main memory. A buffer holds the data of a single block and is of the same size. So a buffer page looks like this:
.
.
.
|-------------|
| buffer |
|-------------|
| buffer |
|-------------|
| buffer |
|-------------|
.
.
.
The block contained in each buffer is identified by a buffer head. You can find the struct declaration of buffer_head here. The b_bdev and b_blocknr fields together identify a block on a block device. Note that each buffer head has a pointer to the next buffer head within the same buffer page. The alloc_page_buffers function allocates and initializes the buffer heads of all the buffers of the specified buffer page. alloc_page_buffers calls the set_bh_page function to initialize two particular fields of the buffer head, b_page and b_data, which are described by the comments in the code:
struct page *b_page; /* the page this bh is mapped to */
char *b_data; /* pointer to data within the page */
As you can see, it "links the buffer to its page".

Does the second parameter of ioremap() give the size in bits for a register? - Linux

My NEC microcontroller has an 8-bit timer control register.
Do I need to pass 8 as the second parameter of ioremap?
After reading the spec, I learned the following properties of it:
Address   | Function Register Name  | Symbol  | R/W | Manipulatable Bits | Default Val
FFFFF590H | TMP0 control register 0 | TP0CTL0 | R/W | √ √                | 00H
So, I believe that the physical address at which the Timer register TP0CTL0 is mapped is 0xFFFFF590.
Now, I am remapping this register as follows. After reading more of the description, I learned that the register is 8 bits in size.
The spec says: "The TPnCTL0 register is an 8-bit register that controls the operation of TMPn."
Is this right? I am using 0xFFFFF590 as the base address, and since the register is 8 bits in size, I have given the size as 8. Is that correct? Is the second parameter of ioremap_nocache given in bits? Is the following call correct? Have I used the parameters of ioremap_nocache correctly?
void *tp0ctl0 = ioremap_nocache(0xFFFFF590, 8);
Next, I am doing the following -
unsigned int val = ioread8(tp0ctl0);
val = 2;
iowrite8(val, tp0ctl0);
Please correct me here. Please let me know whether I am using the APIs correctly based on the microcontroller information I have.
The size given to ioremap_* is in bytes, not bits. The purpose of this function is to map physical address space into kernel virtual address space, though, so any size greater than zero and less than or equal to the system page size will be equivalent.
Given the information you provided above, ioremap_nocache(0xFFFFF590, 1) would actually be correct. But the effect of 1 versus 8 will be identical, since the system page size is (no doubt) larger than both.

How do you allocate memory at a predetermined location?

How do I allocate memory using new at a fixed location? My book says to do this:
char *buf=new char[sizeof(sample)];
sample *p=new(buf)sample(10,20);
Here new is allocating memory at buf's address, and (10,20) are the values being passed. But what is sample? Is it an address or a data type?
Let me explain this code to you...
char *buf=new char[sizeof(sample)];
sample *p=new(buf)sample(10,20);
This is really four lines of code, written as two for convenience. Let me expand them:
char *buf; // 1
buf = new char[sizeof(sample)]; // 2
sample *p; // 3
p = new(buf)sample(10,20); // 4
Lines 1 and 3 are simple to explain: they both declare pointers. buf is a pointer to a char, p is a pointer to a sample. Now, we cannot see what sample is, but we can assume that it is either a class defined elsewhere, or some data type that has been typedef'd (more or less just given a new name). Either way, sample can be thought of as a data type, just like int or string.
Line 2 allocates a block of memory and assigns it to our char pointer called buf. Let's say sample is a class that contains 2 ints; this means it is (under most compilers) going to be 8 bytes (4 per int). And so buf points to the start of a block of memory that has been set aside to hold chars.
Line 4 is where it gets a bit complex. If it were just p = new sample(10,20), it would be a simple case of creating a new object of type sample, passing it the two ints, and storing the address of this new object in the pointer p. The addition of (buf) basically tells new to use the memory pointed to by buf instead of allocating its own (this is called placement new).
The end effect is that you have one block of memory allocated (more or less 8 bytes) with two pointers pointing to it. One of the pointers, buf, looks at that memory as 8 chars; the other, p, looks at it as a single sample.
Why would you do this?
Normally, you wouldn't. Modern C++ has made the direct use of new rather redundant; there are many better ways to deal with objects. I guess the main reason for using this method is that, for some reason, you want to keep a pool of memory allocated: it can take time to get large blocks of memory, and you might be able to save yourself some time.
For the most part, if you think you need to do something like this, you are probably trying to solve the wrong problem.
A Bit Extra
I do not have much experience with embedded or mobile devices, but I have never seen this used.
The code you posted is basically the same as just doing sample *p = new sample(10,20); neither method controls where the sample object is created, since buf itself comes from wherever the ordinary heap happens to put it.
Also consider that you do not always need to create objects dynamically using new.
void myFunction() {
    sample p = sample(10, 20);
}
This automatically creates a sample object for you. This method is much preferable, as it is easier to read and understand, and you do not need to worry about deleting the object; it will be cleaned up for you when the function returns.
If you really do need dynamic objects, consider using smart pointers, something like unique_ptr<sample>. This gives you dynamic object creation but saves you the hassle of manually deleting the sample object (I can point you towards more info on this if you like).
It is a datatype or a typedef.
