Why is non-zeroed memory only a problem with big data usage? - memory-management

I was doing a graded programming assignment: an implementation of the Rope data structure. The grader fed it an initial string and a series of edit operations. I did my development in C++ on a Linux machine. After testing my solution locally with small inputs (a string of about 10 characters) I submitted it to the grader, but got a segmentation fault on one of the test cases.
I generated random input data of the maximum size given in the assignment specs (a string of 300k characters) and got the segmentation fault locally as well. After a short debugging session I found out that the leaves of my tree had random left and right pointers instead of NULL. After replacing the new Vertex calls with new Vertex() (the latter value-initializes the members, unlike the former, which leaves the memory as-is) the code worked fine and got accepted by the grader.
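To make the difference concrete, here is a minimal sketch (the Vertex layout is simplified, not my actual class):

struct Vertex {
    Vertex* left;
    Vertex* right;
    char key;
};

// Default-initialization: for a class without a user-provided constructor
// the members are left indeterminate, so left/right may hold garbage.
Vertex* a = new Vertex;

// Value-initialization: the members are zero-initialized first,
// so left and right are guaranteed to be null pointers.
Vertex* b = new Vertex();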
This, however, makes me wonder: why did my code work correctly with a small input, both locally and on the grader's machine? Is some amount of heap guaranteed to be zeroed when I run a process? Is this an artifact of some previously run program? What exactly is happening here?

Uninitialised objects can have any value. Uninitialised pointers can contain null, they can contain valid pointers by coincidence, or contain invalid pointers. It is completely undefined. Your program will behave accordingly. And it’s quite possible that memory is filled with some amount of zeroes followed by some amount of rubbish.
There may be a compiler option that fills uninitialised variables with a pattern likely to lead to a crash. More likely, there are compiler options that will warn you when you use an uninitialised variable.

Related

Flash ECC algorithm on STM32L1xx

How does the flash ECC algorithm (Flash Error Correction Code) implemented on STM32L1xx work?
Background:
I want to do multiple incremental writes to a single word in program flash of a STM32L151 MCU without doing a page erase in between. Without ECC, one could set bits incrementally, e.g. first 0x00, then 0x01, then 0x03 (STM32L1 erases bits to 0 rather than to 1), etc. As the STM32L1 has 8 bits of ECC per word, this method doesn't work. However, if we knew the ECC algorithm, we could easily find a short sequence of values that could be written incrementally without violating the ECC.
We could simply try different sequences of values and see which ones work (one such sequence is 0x00000001, 0x00000101, 0x00030101, 0x03030101), but if we don't know the ECC algorithm, we can't check whether the sequence violates the ECC, in which case error correction wouldn't work if bits were corrupted.
[Edit] The functionality should be used to implement a simple file system using the STM32L1's internal program memory. Chunks of data are tagged with a header, which contains a state. Multiple chunks can reside on a single page. The state can change over time (first 'new', then 'used', then 'deleted', etc.). The number of states is small, but it would make things significantly easier if we could overwrite a previous state without having to erase the whole page first.
Thanks for any comments! As there are no answers so far, I'll summarize what I have found out so far (empirically and based on comments to this answer):
According to the STM32L1 datasheet "The whole non-volatile memory embeds the error correction code (ECC) feature.", but the reference manual doesn't state anything about ECC in program memory.
The datasheet is in line with what we can find out empirically when successively writing multiple words to the same program memory location without erasing the page in between. In such cases some sequences of values work while others don't.
The following are my personal conclusions, based on empirical findings, limited research and comments from this thread. It's not based on official documentation. Don't build any serious work on it (I won't either)!
It seems that the ECC is calculated and persisted per 32-bit word. If so, the ECC must have a length of at least 7 bits.
The ECC of each word is probably written to the same non-volatile memory as the word itself. Therefore the same limitations apply, i.e. between erases, only additional bits can be set. As stark pointed out, we can only overwrite words in program memory with values that:
Only set additional bits but don't clear any bits
Have an ECC that also only sets additional bits compared to the previous ECC.
If we write a value that only sets additional bits, but the ECC would need to clear bits (and therefore cannot be written correctly), then:
If the ECC is wrong by one bit, the error is corrected by the ECC algorithm and the written value can be read correctly. However, ECC wouldn't work anymore if another bit failed, because ECC can only correct single-bit errors.
If the ECC is wrong by more than one bit, the ECC algorithm cannot correct the error and the read value will be wrong.
We cannot (easily) find out empirically which sequences of values can be written correctly and which can't. If a sequence of values can be written and read back correctly, we wouldn't know whether this is due to the automatic correction of single-bit errors. This aspect is the whole reason for this question asking for the actual algorithm.
The ECC algorithm itself seems to be undocumented. Hamming code seems to be a commonly used algorithm for ECC, and in AN4750 they write that Hamming code is actually used for error correction in SRAM. The algorithm may or may not be used for the STM32L1's program memory.
The STM32L1 reference manual doesn't seem to explicitly forbid multiple writes to program memory without erase, but there is no documentation stating the opposite either. In order not to use undocumented functionality, we will refrain from using it in our products and find workarounds.
Interesting question.
First I have to say that even if you find out the ECC algorithm, you can't rely on it, as it's not documented and can be changed at any time without notice.
But finding out the algorithm seems possible with a reasonable number of tests.
I would try to build tests which start with a constant value and then clear only one bit.
If you read the value back and it is still the start value, clearing that single bit couldn't change all the necessary bits in the ECC.
Like:
for bitIdx = 0 to 31
    erase the cell
    write the start value, e.g. 0xFFFFFFFF & ~(1 << startBit)
    clear bit bitIdx in the cell
    read the cell back
next
If you find a start value for which this test works for all bits, then the start value probably has an ECC with all bits set.
Edit: This should be true for any ECC, as every ECC always needs a difference of at least two bits to reliably detect and repair one defective bit.
As the first bit difference is in the value itself, the second change needs to be in the hidden ECC bits, and the number of hidden bits is very limited.
If you repeat this test with different start values, you should be able to gather enough data to work out which error correction code is used.
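For completeness, a rough sketch of that test loop in C-style code. flash_erase_page, flash_write_word and flash_read_word are just placeholders for whatever erase/program/read routines your project uses, not actual ST library calls:

#include <stdint.h>

// Hypothetical flash helpers -- wire these up to your own flash driver.
void flash_erase_page(uint32_t addr);
void flash_write_word(uint32_t addr, uint32_t value);
uint32_t flash_read_word(uint32_t addr);

// For a given start value: for each bit, erase, write the start value,
// then try to clear that single bit without erasing, and read back.
void test_start_value(uint32_t addr, uint32_t startValue)
{
    for (int bitIdx = 0; bitIdx < 32; ++bitIdx) {
        uint32_t cleared = startValue & ~(1u << bitIdx);
        if (cleared == startValue)
            continue;                       // bit is already 0 in the start value

        flash_erase_page(addr);
        flash_write_word(addr, startValue);
        flash_write_word(addr, cleared);    // second write, no erase in between

        uint32_t readBack = flash_read_word(addr);
        // readBack == startValue: the ECC "repaired" the cleared bit, i.e.
        // clearing this bit alone cannot produce a writable ECC.
        // readBack == cleared: the value and its ECC could both be written.
    }
}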

Efficient Algorithm for Parsing OpCodes

Let's say I'm writing a virtual machine. I read the program data into an array of bytes. Now I need to loop through those bytes (instructions are two bytes) and instantiate a little class representing each instruction and its arguments.
What would be a fast parsing approach? Here are the two ways I've thought of:
Logically branching by inspecting each bit from left to right until I've narrowed it down to a particular opcode. This would be like a binary search.
Inspecting some programs to come up with a list of opcodes ordered by frequency of use, and then checking for the full opcode in that order.
Note: I will be using bit shifting and masking in C to check, not regexes or string comps or anything high-level like that.
You don't need to parse anything. If this is in C, you make a table of function pointers which has 256 entries in it, one for each possible byte value, then jump to the appropriate function based on the first byte value. If the second byte is significant then a switch statement can be used within the function to handle the second byte. This is how the original Visual Basic interpreter (versions 1-6) worked.
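A bare-bones sketch of that dispatch-table idea (the handler names here are made up for illustration):

#include <stdint.h>
#include <stddef.h>

typedef void (*op_handler)(uint8_t operand);

static void handle_nop(uint8_t operand)     { (void)operand; /* do nothing */ }
static void handle_unknown(uint8_t operand) { (void)operand; /* report an illegal opcode */ }

// One entry per possible value of the first byte -- no parsing, just indexing.
static op_handler dispatch[256];

static void init_dispatch(void)
{
    for (int i = 0; i < 256; ++i)
        dispatch[i] = handle_unknown;
    dispatch[0x00] = handle_nop;
    // dispatch[0x1A] = handle_load; ... fill in the rest of your instruction set
}

static void run(const uint8_t *program, size_t length)
{
    for (size_t pc = 0; pc + 1 < length; pc += 2)   // instructions are two bytes
        dispatch[program[pc]](program[pc + 1]);
}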

What are the alignas and alignof keywords used for?

I'm having trouble understanding what the purpose of the alignas and alignof keywords is, and I'm not quite sure I fully understand what alignment is.
As I understand it, a memory address is aligned to n bytes if it is divisible by n, that is, it can be reached by counting n bytes at a time (from 0, or from some default value?). Also, the alignas keyword, when prefixing a variable declaration, specifies how the address at which the variable is stored is to be aligned, and alignof returns how a variable's address is aligned.
However, I am not confident that this is a correct understanding of alignment or the alignof/alignas keywords - please correct me on any of the points I got wrong. I also don't see what use these keywords serve, so I would appreciate it if anyone could point out what their purpose is.
Some special types must be aligned to more bytes than usual - for example, matrices must be aligned to 16 bytes on x86 for the most efficient copying to the GPU. SSE vector types can behave this way too. As such, if you want to make a container type, you must know the alignment requirements of the type you're trying to contain or allocate.
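To make that concrete, a small example of both keywords (the 16-byte figure simply mirrors the SSE/GPU case mentioned above):

#include <cstdint>
#include <iostream>

struct Vec4 {
    alignas(16) float data[4];   // force 16-byte alignment, e.g. for SSE loads
};

int main()
{
    std::cout << alignof(char)   << '\n';  // 1
    std::cout << alignof(double) << '\n';  // typically 8
    std::cout << alignof(Vec4)   << '\n';  // 16, because of the alignas above

    Vec4 v;
    // The address of v is guaranteed to be a multiple of 16.
    std::cout << (reinterpret_cast<std::uintptr_t>(&v) % 16) << '\n';  // 0
    return 0;
}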

Mapping Untyped Lisp data into a typed binary format for use in compiled functions

Background: I'm writing a toy Lisp (Scheme) interpreter in Haskell. I'm at the point where I would like to be able to compile code using LLVM. I've spent a couple days dreaming up various ways of feeding untyped Lisp values into compiled functions that expect to know the format of the data coming at them. It occurs to me that I am not the first person to need to solve this problem.
Question: What are some historically successful ways of mapping untyped data into an efficient binary format?
Addendum: In point of fact, I do know which of about a dozen different types the data is, I just don't know which one might be sent to the function at compile time. The function itself needs a way to determine what it got.
Do you mean, "I just don't know which [type] might be sent to the function at runtime"? It's not that the data isn't typed; certainly 1 and '() have different types. Rather, the data is not statically typed, i.e., it's not known at compile time what the type of a given variable will be. This is called dynamic typing.
You're right that you're not the first person to need to solve this problem. The canonical solution is to tag each runtime value with its type. For example, if you have a dozen types, number them like so:
0 = integer
1 = cons pair
2 = vector
etc.
Once you've done this, reserve the first four bits of each word for the tag. Then, every time two objects get passed in to +, first you perform a simple bit mask to verify that both objects' first four bits are 0b0000, i.e., that they are both integers. If they are not, you jump to an error message; otherwise, you proceed with the addition, and make sure that the result is also tagged accordingly.
This technique essentially makes each runtime value a manually-tagged union, which should be familiar to you if you've used C. In fact, it's also just like a Haskell data type, except that in Haskell the taggedness is much more abstract.
I'm guessing that you're familiar with pointers if you're trying to write a Scheme compiler. To avoid limiting your usable memory space, it may be more sensible to use the bottom (least significant) four bits rather than the top ones. Better yet, because aligned pointers already have a few meaningless bits at the bottom (three for 8-byte-aligned data), you can simply co-opt those bits for your tag, as long as you dereference the actual address rather than the tagged one.
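A toy sketch of the low-bit tagging idea in C++ (the particular tag values and helper names are made up; a real Scheme runtime would choose its own):

#include <cstdint>
#include <cassert>

// Low-bit tags: a pointer to 8-byte-aligned data leaves the bottom 3 bits free.
constexpr std::uintptr_t TAG_MASK   = 0x7;
constexpr std::uintptr_t TAG_FIXNUM = 0x1;   // small integer stored in the word itself
constexpr std::uintptr_t TAG_PAIR   = 0x2;   // pointer to a heap-allocated cons cell

using Value = std::uintptr_t;

Value make_fixnum(std::intptr_t n)  { return (static_cast<Value>(n) << 3) | TAG_FIXNUM; }
bool  is_fixnum(Value v)            { return (v & TAG_MASK) == TAG_FIXNUM; }
std::intptr_t fixnum_value(Value v) { return static_cast<std::intptr_t>(v) >> 3; }

// What a compiled '+' might do: check both tags, untag, add, retag.
Value prim_add(Value a, Value b)
{
    assert(is_fixnum(a) && is_fixnum(b));    // a real runtime would raise a type error
    return make_fixnum(fixnum_value(a) + fixnum_value(b));
}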
Does that help?
Your default solution should be a simple tagged union. If you want to narrow your typing down to more specific types, you can do it - but it won't be that "toy" any more. A thing to look at is called abstract interpretation.
There are few successful implementations of such an optimisation, with V8 probably being the most widespread. In the Scheme world, the most aggressively optimising implementation is Stalin.

Matlab: avoiding memory allocation in mex

I'm trying to make my mex library avoid all memory allocation whatsoever.
Until now, the mex got an input, created some matrices using mxCreate...() and returned these as its output.
But now I'd like to modify this interface so that the mex itself would not do any allocations.
What I had in mind is that the mexFunction will get as input the matrix to fill values into and return this very same matrix as an output.
Is this supposed to be possible?
The slight alarm that got me thinking about whether this is at all something I should be doing is that the right-hand arguments come to the mexFunction as const while the left-hand arguments are non-const. To return the input matrix as an output I'll need to cast away this const.
Funnily enough I was just looking at this the other day. The best info I found was threads here and here and also this.
Basically it is generally considered a very bad thing in the Matlab world... but at the same time, nothing stops you, so you can do it - try some simple examples and you will see that the changes are propagated. Just make changes to the data you get from prhs (you don't need to return anything - since you changed the raw data it will be reflected in the variable in the workspace).
However, as pointed out in the links, this can have strange consequences because of Matlab's copy-on-write semantics. Setting format debug can help a lot with getting intuition on this. If you do a=b then you will see a and b have different 'structure addresses', or headers, representing the fact that they are different variables, but the data pointer, pr, points to the same area in memory. Normally, if you change b in Matlab, copy-on-write kicks in and the data area is copied before being changed, so afterwards b has a new data pointer. When you change things in mex this doesn't happen, so if you changed b, a would also change.
I think it's OK to do it - it's incredibly useful if you need to handle large datasets, but you need to keep an eye out for any oddness - try to make sure the data you're putting in isn't shared among variables. Things get even more complicated with struct and cell arrays, so I would be more inclined to avoid doing it to those.
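For what it's worth, a stripped-down sketch of the in-place approach (it assumes one real double input; the calls used are the standard mex C API):

#include "mex.h"

// Doubles the first input in place -- no mxCreate* calls, no copies.
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    if (nrhs < 1 || !mxIsDouble(prhs[0]) || mxIsComplex(prhs[0]))
        mexErrMsgTxt("Expected one real double input.");

    double *data = mxGetPr(prhs[0]);              // raw data of the caller's variable
    mwSize n = mxGetNumberOfElements(prhs[0]);

    for (mwSize i = 0; i < n; ++i)
        data[i] *= 2.0;   // the edit shows up directly in the workspace variable

    // No output needs to be created; if the caller insists on y = f(x), you can
    // cast away the const and set plhs[0] = (mxArray *)prhs[0], with all the
    // copy-on-write caveats described above.
}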
Modifying the right-hand arguments would be a bad idea. Those inputs can be reference counted, and if you modify them when the reference count is greater than one, then you will be silently modifying the value stored in other variables as well.
Unfortunately, I don't believe there is a way to do what you want given the existing MEX API.
