How can I access to external memory as heap - memory-management

I am working on embedded project with Keil MDK-ARM compiler. I try to access to external memory as heap but when I download my program in my micro (micro is a lpc1788) after the download is done it crashed (without start main function) in startup.c file, although I can access to external memory (before increase heap part).
Now can anyone provide a small sample project how to configure uVision, using the external RAM as heap? I want to configure uVision for program execution in internal FLASH using internal and external RAM for STACK and HEAP.

First specify the external ram range for RAM1 (for example) in the project Target settings dialog, and make sure all other settings are appropriate for your project.
Then in the Linker settings tab, un-check the "Use Memory Layout from Target Dialog" option. This will allow you to manually edit the scatter file which will initially reflect the layout defined in the Target settings.
Edit the scatter file to create a section in external ram thus (for example):
RW_RAM1 0x60000000 UNINIT 0x00040000 { ; RW data
*(HEAP) ; external SRAM
}
The actual addresses may differ for your part. If you want to use all external RAM for heap that is sufficient, if you want to allow the linker to place other data in this space then:
RW_RAM1 0x60000000 UNINIT 0x00040000 { ; RW data
*(HEAP) ; external SRAM
.ANY (+RW +ZI)
}
Check the map file for the HEAP section to verify that the space was allocated as required.
You can similarly relocate the stack if necessary. But be aware that the external memory access may be be slower than internal so doing so may affect performance.
All this assumes of course that you have correctly initialised the external RAM controller to match the external RAM device - this should be done in system_lpc1788.c (or the similarly named file for your start-up code - my experience is with STM32 so I don't know, perhaps system_lpc17xx.c)

Related

STM32f429: problem with using external SDRAM as an additional data memory

I can read/write an external sdram using fmc in stm32f429. But working with address and read/write functions is not proper for my purpose. I want to introduce external sdram as if internal sram is clearly extended and whenever I define a big variable it is projected to external sdram automatically.
I checked stm32f4 cubemx repository examples (SDRAM+DATAMEMORY) and searched a lot but it seems this is not straightforward.
Following these steps based on what I found, I get hardfault after system_init.
Defining external sdram address and size in the linker (off-chip ram)
Adding some code in startup_stm32f420xx.s
Defining DATA_IN_ExtSDRAM for initializing sdram before main function
Enabling system clock before main function
My external sdram is connected to SDRAM1 in stm32f429.
What is the correct procedure? Is SystemInit_ExtMemCtl() function implemented correctly? Is any modification needed? Is enabling clock before main function and after system_init needed?
Can anyone tell what is the correct code step by step?
Thanks in advance.
What you are asking for is not really possible.
The internal SRAM and external SDRAM are not contiguous; their addresses are a long way apart and variables cannot simply overflow automatically from one to the other.
The correct steps for using the external memory are exactly as given in the example projects, it would be meaningless to repeat them here.
The work you have to do yourself is to decide which variables go in which memory. You can assign a variable to a section using the gcc section attribute or a similar feature of another compiler. There are examples of this in the STM32Cube package.

STM32F4 running FreeRTOS in external RAM

We have a thesis project at work were the guys are trying to get external RAM to work for the STM32F417 MCU.
The project is trying out some stuff that is really resource hungry and the internal RAM just isn't enough.
The question is how to best do this.
The current approach has been to just replace the RAM address in the link script (gnu ld) with the address for external RAM.
The problem with that approach is that during initialisation, the chip has to run on internal RAM since the FSMC has not been initialized.
It seems to work but as soon as pvPortMalloc is run we get a hard fault and it is probably due to dereferencing bogus addresses, we can see that variables are not initialized correctly at system init (which makes sense I guess since the internal RAM is not used at all, when it probably should be).
I realize that this is a vague question, but what is the general approach when running code in external RAM on a Cortex M4 MCU, more specifically the STM32F4?
Thanks
FreeRTOS defines and uses a single big memory area for stack and heap management; this is simply an array of bytes, the size of which is specified by the configTOTAL_HEAP_SIZE symbol in FreeRTOSConfig.h. FreeRTOS allocates tasks stack in this memory area using its pvPortMalloc function, therefore the main goal here is to place the FreeRTOS heap area into external SRAM.
The FreeRTOS heap memory area is defined in heap_*.c (with the exception of heap_3.c that uses the standard library malloc and it doesn't define any custom heap area), the variable is called ucHeap. You can use your compiler extensions to set its section. For GCC, that would be something like:
static uint8_t ucHeap[ configTOTAL_HEAP_SIZE ] __attribute__ ((section (".sram_data")));
Now we need to configure the linker script to place this custom section into external SRAM. There are several ways to do this and it depends again on the toolchain you're using. With GCC one way to do this would be to define a memory region for the SRAM and a section for ".sram_data" to append to the SRAM region, something like:
MEMORY
{
...
/* Define SRAM region */
sram : ORIGIN = <SRAM_START_ADDR>, LENGTH = <SRAM_SIZE>
}
SECTIONS
{
...
/* Define .sram_data section and place it in sram region */
.sram_data :
{
*(.sram_data)
} >sram
...
}
This will place the ucHeap area in external SRAM, while all the other text and data sections will be placed in the default memory regions (internal flash and ram).
A few notes:
make sure you initialize the SRAM controller/FSMC prior to calling any FreeRTOS function (like xTaskCreate)
once you start the tasks, all stack allocated variables will be placed in ucHeap (i.e. ext RAM), but global variables are still allocated in internal RAM. If you still have internal RAM size issues, you can configure other global variables to be placed in the ".sram_section" using compiler extensions (as shown for ucHeap)
if your code uses dynamic memory allocation, make sure you use pvPortMalloc/vPortFree, instead of the stdlib malloc/free. This is because only pvPortMalloc/vPortFree will use the ucHeap area in ext RAM (and they are thread-safe, which is a plus)
if you're doing a lot of dynamic task creation/deletion and memory allocation with pvPortMalloc/vPortFree with different memory block sizes, consider using heap_4.c instead of heap_2.c. heap_2.c has memory fragmentation problems when using several different block sizes, whereas heap_4.c is able to combine adjacent free memory blocks into a single large block
Another (and possibly simpler) solution would be to define the ucHeap variable as a pointer instead of an array, like this:
static uint8_t * const ucHeap = <SRAM_START_ADDR>;
This wouldn't require any special linker script editing, everything can be placed in the default sections. Note that with this solution the linker won't explicitly reserve any memory for the heap and you will loose some potentially useful information/errors (like heap area not fitting in ext RAM). But as long as you only have ucHeap in external RAM and you have configTOTAL_HEAP_SIZE smaller than external RAM size, that might work just fine.
When the application starts up it will try to initialise data by either clearing it to zero, or initialising it to a non-zero value, depending on the section the variable is placed in. Using a normal run time model, that will happen before main() is called. So you have something like:
1) Reset vector calls init code
2) C run time init code initialises variables
3) C run time init code calls main()
If you use the linker to place variables in external RAM then you need to ensure the RAM is accessible before that initialisation takes place, otherwise you will get a hard fault. Therefore you need to either have a boot loader that sets up the system for you, then starts your application....or more simply just edit the start up code to do the following:
1) Reset vector calls init code
2) >>>C run time init code configures external RAM<<<
3) C run time init code initialised variables
4) C run time init code calls main().
That way the RAM is available before you try to access it.
However, if all you want to do is have the FreeRTOS heap in external RAM, then you can leave the init code untouched, and just use an appropriate heap implementation - basically one that does not just declare a large static array. For example, if you use heap_5 then all you need to do is ensure the heap init function is called before any allocation is performed, because the heap init just describes which RAM to use as the heap, rather than statically declaring the heap.

Unknown symbol flush_cache_range in linux device driver

I am just writing my very first linux device driver, and I have ran into a problem. I want to prevent one memory region from being cached, so I have been trying to use flush_cache_range() and flush_tlb_range() to flush the cache for this memory region. Everything compiles well, but when I try to load the kernel module I get the following errors:
Unknown symbol flush_cache_range (err 0)
Unknown symbol flush_tlb_range (err 0)
I find this very strange. Shouldn't they be defined in kernel?
I know that alternatively I could also use dma_alloc_coherent() to allocate a non-cached memory region. But I don't have a device structure and passing NULL for this parameter didn't cause any errors, but I also couldn't see any of the data that was supposed to be there.
Some information about my system: I'm trying to get this running on a ARM microcontroller with an integrated FPGA (the Xilinx Zynq). The FPGA copies some data to a memory location specified by the CPU. Now I want to access this memory without getting old data from the caches.
Any help is very appreciated.
You cannot use functions such as flush_cache_range() because they are not intended to be used by modules.
To allocate memory that can be accessed by a DMA device, you must use dma_alloc_coherent().
This requires a valid device structure so that it can do proper mapping between memory addresses and bus addresses.
If your device is not on a bus that is handled by an existing framework (such as PCI), you have to create a platform device.
A few notes:
1- flush_cache_range doesn't "prevent one memory region from being cached" .. It just simply flush (clean + invalidate) the caches. Any future writes/reads to this memory region through the same virtual range will go through the cache again.
2- If the FPGA is writing to memory and then the CPU are going to read from this memory, probably flushing the cache isn't the correct thing to do any way. Usually what you need to do is to invalidate the memory region and then tell the FPGA to write.
3- Please take a look at "${kernel-src}/Documentation/DMA-API.txt" in the kernel sources. It has plenty of information about how you can safely ( cache maintenance + phys_to_dma translation ) use a specific region of memory for DMA.

How does a PE file get mapped into memory?

So I have been reasearching the PE format for the last couple days, and I still have a couple of questions
Does the data section get mapped into the process' memory, or does the program read it from the disk?
If it does get mapped into its memory, how can the process aqquire the offset of the section? ( And other sections )
Is there any way the get the entry point of a process that has already been mapped into the memory, without touching the file on disk?
Does the data section get mapped into the process' memory
Yes. That's unlikely to survive for very long, the program is apt to write to that section. Which triggers a copy-on-write page copy that gets the page backed by the paging file instead of the PE file.
how can the process aqquire the offset of the section?
The linker already calculated the offsets of variables in the section. It might be relocated, common for DLLs that have an awkward base address that's already in use when the DLL gets loaded. In which case the relocation table in the PE file is used by the loader to patch the addresses in the code. The pages that contain such patched code get the same treatment as the data section, they are no longer backed by the PE file and cannot be shared between processes.
Is there any way the get the entry point of a process
The entire PE file gets mapped to memory, including its headers. So you can certainly read IMAGE_OPTIONAL_HEADER.AddressOfEntryPoint from memory without reading the file. Do keep in mind that it is painful if you do this for another process since you don't have direct access to its virtual address space. You'd have to use ReadProcessMemory(), that's fairly little joy and unlikely to be faster than reading the file. The file is pretty likely to be present in the file system cache. The Address Space Layout Randomization feature is apt to give you a headache, designed to make it hard to do these kind of things.
Does the data section get mapped into the process' memory, or does the program read it from the disk?
It's mapped into process' memory.
If it does get mapped into its memory, how can the process aqquire the offset of the section? ( And other sections )
By means of a relocation table: every reference to a global object (data or function) from the executable code, that uses direct addressing, has an entry in this table so that the loader patches the code, fixing the original offset. Note that you can make a PE file without relocation section, in which case all data and code sections have a fixed offset, and the executable has a fixed entry point.
Is there any way the get the entry point of a process that has already been mapped into the memory, without touching the file on disk?
Not sure, but if by "not touching" you mean not even reading the file, then you may figure it out by walking up the stack.
Yes, all sections that are described in the PE header get mapped into memory. The IMAGE_SECTION_HEADER struct tells the loader how to map it (the section can for example be much bigger in memory than on disk).
I'm not quite sure if I understand what you are asking. Do you mean how does code from the code section know where to access data in the data section? If the module loads at the preferred load address then the addresses that are generated statically by the linker are correct, otherwise the loader fixes the addresses with relocation information.
Yes, the windows loader also loads the PE Header into memory at the base address of the module. There you can file all the info that was in the file PE header - also the Entry Point.
I can recommend this article for everything about the PE format, especially on relocations.
Does the data section get mapped into the process' memory, or does the
program read it from the disk?
Yes, everything before execution by the dynamic loader of operating systems either Windows or Linux must be mapped into memory.
If it does get mapped into its memory, how can the process acquire the
offset of the section? ( And other sections )
PE file has a well-defined structure which loader use that information and also parse that information to acquire the relative virtual address of sections around ImageBase. Also, if ASLR - Address randomization feature - was activated on the system, the loader has to use relocation information to resolve those offsets.
Is there any way the get the entry point of a process that has already
been mapped into the memory, without touching the file on disk?
NOPE, the loader of the operating system for calculation of OEP uses ImageBase + EntryPoint member values of the optional header structure and in some particular places when Address randomization is enabled, It uses relocation table to resolve all addresses. So we can't do anything without parsing of PE file on the disk.

windows memory segmentation & Ollydbg

a few questions about windows memory segmentation.
every process in windows got his own virtual memory. does it mean that each each process has it own task
(I mean own Task descriptor or Task gate) ?
I opened a simple exe with ollydbg and I saw that for each CALL intruction to a dll function is taking me to the jumping table. the jumping table had jumping instructions to the DLLs like this one :
JMP DWORD PTR DS:[402058]
my question is why its uses the data segment and not the CS selector for the base address?
if I open the memory map and find what stored at 402058 I find that it containes resorces.
if I understand correctly the addresses of the DLL function stored in the DS ?
I noticed that the memory map is organized by owner. shouldn't it be organized with segments like all the code be in CS data in DS etc ?
thank you
1.
A Process has it's own virtual address space.
I do not understand what you're referring to as "Task descriptor or Task gate", but the Windows operating system holds a descriptor for each process, called the Process Control Block, that contains information about the process (such as identification, access tokens, execution state, virtual memory mapping, etc).
A Task is a logical unit that can be used to manage a single process, or multiple processes.
Job -> Tasks
Task -> Processes
Process -> Threads
2.
In the case you mentioned, which is common for compilers, the program uses the .DATA section to store the jump table after loading the function addresses.
The reason why this happens in the first place is because the compiler cannot know the DLL base address at compile-time, therefore the address has to be fixed at load-time to point to the function. This is known as Relocation.
In order to maintain the jump table seperately from the code, compilers store it in the .DATA section. This way, we can also give it write permissions (usually the .DATA segment has write permissions) and modify it as necessary without sacrificing stability and security.
3.
Each module loaded in the process' virtual address space contains it's own sections - that's why you see a different set of .text, .data, .reloc etc for each module. The "Owner" column is the module name.
P.S. Please ask one question per post - that way it will be easily accesible by other users after you get answered, and each question will likely get more accurate answers.

Resources