Resource Sizing with only static memory use

Resource Sizing with only static memory use - gcc

For my embedded application, we are using a STM32F411 chip. The chip has 512kb of flash and 128kb of RAM.
I wanted to do a resource sizing exersize so I can monitor how I am doing on resources (FLASH and RAM)
I only statically allocate memory with no Malloc() calls. and sizing with gcc gives me:
text data bss dec hex filename
230868 11236 74048 316152 4d2f8 application.elf
From the readings I have done (https://mcuoneclipse.com/2013/04/14/text-data-and-bss-code-and-data-size-explained/) I understand that because there are no dynamically allocated resources, the above information should give me a clear measure of how deep into the RAM usage I will run.
Can I expect the RAM use to ultimately be the data section + the bss sections per the summary on the link above? So in this case 85284 bytes.
And the Flash size to be text + data sections. In this case: 242104 ?

Can I expect the RAM use to ultimately be the data section + the bss
sections per the summary on the link above? So in this case 85284
bytes.
Depending on your linker script. Especially stack and heap configuration. Same is with the text segment & data segment.
For more detailed information you need to see the .map file.

Yes, but also consider that even though you don't explicitly use dynamic memory in your code, library functions might. If you're trying to maintain super-tight control over memory use and you have an application that uses close to your total amount of RAM, you'll need to account for that. If you don't, you may run into nasty runtime issues.

In short, yes. Because of the need to store the initializers for the initialized data section, the "data" section counts twice in the memory usage -- once for flash and once for RAM. This is also why it is important to be very diligent about declaring constant data as "const". That data is then placed in flash and only counts once in the overall memory usage.

Related

what is the use of attaching static data along with a program when it is loaded on the main memory?

When the operating system loads Program onto the main memory , it , along with the stack and heap memory , also attaches the static data along with it. I googled about what is present in the static data which said it contained the global variables and static variables. But I am confused as both of these are already present in the text file of the program then why do we add them seperately?

The data in the executable is often referred as the data segment. The CPU doesn't interact with the hard-disk but only with RAM. The data segment must thus be loaded in RAM before the CPU can access it. The file of the executable is not really a text file. It is an executable so it has a different extension. Text files often refer to an actual file with a .txt extension.
With that said, you also asked another question not long ago (If the amount of stack memory provided to a program is fixed then why does it grow downwards in the process architecture? Or am I getting it wrong?) so I will try to give some insight for both of these in this same answer.
I don't know much about caching and low level inner CPU workings but, today mostly, the CPU doesn't even operate on RAM directly. It will load a bunch of RAM chunks into the cache and make operations on them and keep RAM-cache consistency by implementing complex mechanisms. The OS also has its role to play in RAM-cache consistency but, like I said, I am far from an expert here. Other than that, caching is mostly transparent to the OS. The CPU handles it and the OS simply provides instructions to the CPU which executes them.
Today, you have paging used by most OS and implemented on most CPU architectures. With paging, every process sees a full contiguous virtual address space. The virtual address space is accessed contiguously and the hardware MMU translates those addresses to physical ones automatically by crossing the page tables. The OS is responsible to make sure the page tables are consistent and the MMU does the rest of the job (for more info read: What is paging exactly? OSDEV). If you understand paging well, things become much clearer.
For a process, there is mostly 3 types of memory. There is the stack (often called automatic storage), the heap and the static/global data. I will attempt to give precision on all of these to give a global picture.
The stack is given a maximum size when the process begins. The OS handles that and creates the page tables and places the proper address in the stack pointer register so that stack accesses reach the proper region of physical memory. The stack is automatic storage which means that it isn't handled manually by the high level programmer. For example, in C/C++, the stack is managed by the compiler which, at the entry of a function, will create a stack frame and place offsets from the stack base pointer in the instructions. Every local variable (within a function) will be accessed with a relative negative offset from the stack base pointer. What the compiler needs to do is to create a stack frame of the proper size so that there will be enough place for all local variables of a particular function (for more info on the stack see: Each program allocates a fixed stack size? Who defines the amount of stack memory for each application running?).
For the heap, the OS reserves a very big amount of virtual memory. Today, virtual memory is very big (2^48 bytes or more). The amount of heap available for each process is often only limited by the amount of physical memory available to back virtual memory allocations. For example, a process could use malloc() to allocate 4KB of memory in C. The OS will be called with a system call by the libc library which is an implementation of the C standard library. The OS will then reserve a page of the virtual memory available for the heap and change the page tables so that accessing that portion of virtual memory will translate to somewhere in RAM (probably somewhere another process wasn't already using).
The static/global data are simply placed in the executable in the data segment. The data segment is loaded in the virtual memory alongside the text segment. The text segment will thus be able to access this data often using RIP-relative addressing.

Memory allocation under uCOS-III

I'm developing a C-library to be used under uCOS-III. The CPU is an ARM Cortex M4 SAM4C. Within the library I want to use a third party product X, whose particular name is not relevant here. The source code for X is completely available and compiles without problems.
Inside X a lot of memory allocations are executed, using calloc() and free().
The problem is, that plain usage of malloc is not advisable for embedded systems, because of memory fragmentation. The documentation for uCOS-III explicitly advises against using malloc - instead OSMemCreate/OSMemGet/OSMemPut shall be used to allocate and free chunks of memory out of a statically allocated memory block.
Question-1:
What is the general advice to get around the "standard implementation" of malloc? I would prefer a kind of malloc, where I have access to a fixed memory pool (e.g. dedicated for a special task)
Question-2:
How should OSMemCreate() be used correctly? I have first to initialize a memory partition with a certain block size. The amount of requested memory is between 4 Bytes and about 800 bytes. I can get blocks on request, but with fixed size. If block-size=4 I cannot allocate 16 Bytes, since blocks are not contiguous in memory. If block-size=800 and I need only 4 bytes, most of the block is left unused and I will very soon run out of blocks.
So I don't know, how to solve my original problem by use of OSMemCreate...
Can anybody give me an advice how I could proceed?
Many thanks,
Michael

1) Don't link with the standard library version of malloc/free. Instead create your own implementation of malloc/free that serves as a wrapper to OSMemGet/OSMemPut.
2) You can create more than one memory partition with OSMemCreate. Create small, medium, and large partitions that hold block sizes which are tuned for your application to reduce waste.
If you want malloc to get an appropriately sized block from your various memory partitions then you'll have to invent some magic so that free returns the block to the appropriate memory partition. (Perhaps malloc allocates an extra word, stores the pointer to the memory partition in the first word, and then returns the address after the word where the pointer is stored. Then free knows to get the memory partition pointer from the preceding word.)
An alternative to using malloc/free is to rewrite that code to use statically allocated variables or call OSMemGet/OSMemPut directly.

map a buffer from Kernel to User space allocated by another module

I am developing a Linux kernel driver on 3.4. The purpose of this driver is to provide a mmap interface to Userspace from a buffer allocated in an other kernel module likely using kzalloc() (more details below). The pointer provided by mmap must point to the first address of this buffer.
I get the physical address from virt_to_phys(). I give this address right shifted by PAGE_SHIFT to remap_pfn_range() in my mmap fops call.
It is working for now but it looks to me that I am not doing the things properly because nothing ensure me that my buffer is at the top of the page (correct me if I am wrong). Maybe mmap()ing is not the right solution? I have already read the chapter 15 of LDD3 but maybe I am missing something?
Details:
The buffer is in fact a shared memory region allocated by the remoteproc module. This region is used within an asymetric multiprocessing design (OMAP4). I can get this buffer thanks to the rproc_da_to_va() call. That is why there is no way to use something like get_free_pages().
Regards
Kev

Yes, you're correct: there is no guarantee that the allocated memory is at the beginning of a page. And no simple way for you to both guarantee that and to make it truly shared memory.
Obviously you could (a) copy the data from the kzalloc'd address to a newly allocated page and insert that into the mmap'ing process' virtual address space, but then it's not shared with the original datum created by the other kernel module.
You could also (b) map the actual page being allocated by the other module into the process' memory map but it's not guaranteed to be on a page boundary and you'd also be sharing whatever other kernel data happened to reside in that page (which is both a security issue and a potential source of kernel data corruption by the user-space process into which you're sharing the page).
I suppose you could (c) modify the memory manager to return every piece of allocated data at the beginning of a page. This would work, but then every time a driver wants to allocate 12 bytes for some small structure, it will in fact be allocating 4K bytes (or whatever your page size is). That's going to waste a huge amount of memory.
There is simply no way to trick the processor into making the memory appear to be at two different offsets within a page. It's not physically possible.
Your best bet is probably to (d) modify the other driver to allocate the specific bits of data that you want to make shared in a way that ensures alignment on a page boundary (i.e. something you write to replace kzalloc).

Memory profiling tools and methods

I'm attempting to analyze memory usage of our Windows Phone 7 app. Querying the ApplicationPeakMemoryUsage property yields a value of ~90Mb following a soak test. System.GC.GetTotalMemory(true) returns ~11Mb at this time, so the balance must be unmanaged memory. The app does not explicitly allocate any unmanaged memory, so I assume the balance is GPU assets, audio and the app binary itself.
By surrounding calls to ContentManager.Load() and GPU resource allocations (new RenderTarget2D(), etc). with code similar to
System.GC.Collect();
unused = System.GC.GetTotalMemory(true);
GC.WaitForPendingFinalizers();
long mem = ((long)Microsoft.Phone.Info.DeviceExtendedProperties.GetValue("ApplicationCurrentMemoryUsage"));
.
. // perform loads/allocations
.
mem = ((long)Microsoft.Phone.Info.DeviceExtendedProperties.GetValue("ApplicationCurrentMemoryUsage")) - mem;
I am able to obtain approximate figures for memory used by render buffers, texture/audio resources etc. These total ~45-50Mb across my app. ApplicationCurrentMemoryUsage yields just under 10Mb immediately at the start of initialization. Subtracting the 11Mb managed heap as well (which is partly double-counting!), this leaves ~20Mb unaccounted for.
The Mango memory profiler tracks the totals but only breaks down allocations for the managed heap. What else might be using large quantities of unmanaged memory other than GPU resources, audio and the app binary itself? Are there any more sensible tools or methods for tracking memory than what I am doing?

Downloaded files (including images from the web) can use lots of memory. If you're using them be sure to free up the memory again properly (see http://blogs.msdn.com/b/swick/archive/2011/04/07/image-tips-for-windows-phone-7.aspx).

Are you using a WebBrowser control?
It has some flaws and can cause huge (and incremental) memory leaks in some scenarios, especially if the page contains many media or complicated scripts, or when its page is reloaded/changed with unlucky timing..

Is there any way to convert unshared heap memory to shared memory? Primary target *nix

I was wondering whether there is any reasonably portable way to take existing, unshared heap memory and to convert it into shared memory. The usage case is a block of memory which is too large for me to want to copy it unnecessarily (i.e. into a new shared memory segment) and which is allocated by routines out of my control. The primary target is *nix/POSIX, but I would also be interested to know if it can be done on Windows.

Many *nixe have Plan-9's procfs which allows to open read a process's memory by inspecting /proc/{pid}/mem
http://en.wikipedia.org/wiki/Procfs
You tell the other process your pid, size and the base address and it can simply read the data (or mmap the region in its own address space)
EDIT:: Apparently you can't open /proc/{pid}/mem without a prior ptrace(), so this is basically worthless.
Under most *nixes, ptrace(2) allows to attach to a process and read its memory.
The ptrace method does not work on OSX, you need more magic there:
http://www.voidzone.org/readprocessmemory-and-writeprocessmemory-on-os-x/
What you need under Windows is the function ReadProcessMemory
.
Googling "What is ReadProcessMemory for $OSNAME" seems to return comprehensive result sets.

You can try using mmap() with MAP_FIXED flag and address of your unshared memory (allocated from heap). However, if you provide mmap() with own pointer then it is constrained to be aligned and sized according to the size of memory page (may be requested by sysconf()) since mapping is only available for the whole page.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio