Win32 EXCEPTION_INT_OVERFLOW vs EXCEPTION_INT_DIVIDE_BY_ZERO - windows

I have a question about the EXCEPTION_INT_OVERFLOW and EXCEPTION_INT_DIVIDE_BY_ZERO exceptions.
Windows will trap the #DE errors generated by the IDIV instruction and will end up generating an SEH exception with one of those 2 codes.
The question I have is how does it differentiate between the two conditions? The information about IDIV in the Intel manual indicates that it will generate #DE in both the "divide by zero" and "overflow" (quotient too large for the destination) cases.
I took a quick look at the section on the #DE error in Volume 3 of the Intel manual, and the best I could gather is that the OS must be decoding the DIV instruction, loading the divisor argument, and then comparing it to zero.
That seems a little crazy to me though. Why would the chip designers not use a flag of some sort to differentiate between the 2 causes of the error? I feel like I must be missing something.
Does anyone know for sure how the OS differentiates between the 2 different causes of failure?

Your assumptions appear to be correct. The only information available on #DE is CS and EIP, which gives the instruction. Since the two status codes are different, the OS must be decoding the instruction to determine which.
I'd also suggest that the chip makers don't really need two separate interrupts for this case, since anything divided by zero is infinity, which is too big to fit into your destination register.
As for "knowing for sure" how it differentiates, all of those who do know are probably not allowed to reveal it, either to prevent people exploiting it (not entirely sure how, but jumping into kernel mode is a good place to start looking to exploit) or making assumptions based on an implementation detail that may change without notice.
Edit: Having played with kd I can at least say that on the particular version of Windows XP (32-bit) I had access to (and the processor it was running on) the nt!Ki386CheckDivideByZeroTrap interrupt handler appears to decode the ModRM value of the instruction to determine whether to return STATUS_INTEGER_DIVIDE_BY_ZERO or STATUS_INTEGER_OVERFLOW.
(Obviously this is original research, is not guaranteed by anyone anywhere, and also happens to match the deductions that can be made based on Intel's manuals.)
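The decision described above can be modelled in a few lines. This is only a sketch of the logic, not the Windows handler: the function names are made up for illustration, and it assumes the handler has already decoded the instruction and fetched the divisor. The two status values are the documented NTSTATUS codes.

```python
STATUS_INTEGER_DIVIDE_BY_ZERO = 0xC0000094
STATUS_INTEGER_OVERFLOW = 0xC0000095


def idiv32_faults(dividend, divisor):
    """Return True if a 32-bit IDIV of a 64-bit dividend (EDX:EAX) would raise #DE."""
    if divisor == 0:
        return True
    # IDIV truncates toward zero, unlike Python's floor division.
    quotient = abs(dividend) // abs(divisor)
    if (dividend < 0) != (divisor < 0):
        quotient = -quotient
    # The quotient must fit in a signed 32-bit register (EAX).
    return not (-2**31 <= quotient <= 2**31 - 1)


def classify_de(divisor):
    """Hypothetical classifier: the CPU reports only #DE, so inspect the divisor."""
    if divisor == 0:
        return STATUS_INTEGER_DIVIDE_BY_ZERO
    return STATUS_INTEGER_OVERFLOW
```

For example, dividing -2**31 by -1 faults because the quotient 2**31 does not fit, and classify_de(-1) reports the overflow status.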

Zooba's answer summarizes that Windows parses the instruction to decide which exception to raise.
But you cannot rely on the routine choosing the correct code.
I observed the following on 64 bit Windows 7 with 64 bit DIV instructions:
If the operand (divisor) is a memory operand it always raises EXCEPTION_INT_DIVIDE_BY_ZERO, regardless of the argument value.
If the operand is a register and the lower dword is zero, it raises EXCEPTION_INT_DIVIDE_BY_ZERO even if the upper half is non-zero.
Took me a day to find this out... Hope this helps.

Related

Flash ECC algorithm on STM32L1xx

How does the flash ECC algorithm (Flash Error Correction Code) implemented on STM32L1xx work?
Background:
I want to do multiple incremental writes to a single word in program flash of a STM32L151 MCU without doing a page erase in between. Without ECC, one could set bits incrementally, e.g. first 0x00, then 0x01, then 0x03 (STM32L1 erases bits to 0 rather than to 1), etc. As the STM32L1 has 8 bit ECC per word, this method doesn't work. However, if we knew the ECC algorithm, we could easily find a short sequence of values, that could be written incrementally without violating the ECC.
We could simply try different sequences of values and see which ones work (one such sequence is 0x00000001, 0x00000101, 0x00030101, 0x03030101), but if we don't know the ECC algorithm, we can't check whether the sequence violates the ECC, in which case error correction wouldn't work if bits were corrupted.
[Edit] The functionality should be used to implement a simple file system using STM32L1's internal program memory. Chunks of data are tagged with a header, which contains a state. Multiple chunks can reside on a single page. The state can change over time (first 'new', then 'used', then 'deleted', etc.). The number of states is small, but it would make things significantly easier, if we could overwrite a previous state without having to erase the whole page first.
Thanks for any comments! As there are no answers so far, I'll summarize what I have found out (empirically and based on comments to this answer):
According to the STM32L1 datasheet "The whole non-volatile memory embeds the error correction code (ECC) feature.", but the reference manual doesn't state anything about ECC in program memory.
The datasheet is in line with what we can find out empirically when successively writing multiple words to the same program memory location without erasing the page in between. In such cases some sequences of values work while others don't.
The following are my personal conclusions, based on empirical findings, limited research and comments from this thread. It's not based on official documentation. Don't build any serious work on it (I won't either)!
It seems that the ECC is calculated and persisted per 32-bit word. If so, the ECC must have a length of at least 7 bits.
The ECC of each word is probably written to the same nonvolatile mem as the word itself. Therefore the same limitations apply. I.e. between erases, only additional bits can be set. As stark pointed out, we can only overwrite words in program mem with values that:
Only set additional bits but don't clear any bits
Have an ECC that also only sets additional bits compared to the previous ECC.
If we write a value, that only sets additional bits, but the ECC would need to clear bits (and therefore cannot be written correctly), then:
If the ECC is wrong by one bit, the error is corrected by the ECC algorithm and the written value can be read correctly. However, ECC wouldn't work anymore if another bit failed, because ECC can only correct single-bit errors.
If the ECC is wrong by more than one bit, the ECC algorithm cannot correct the error and the read value will be wrong.
We cannot (easily) find out empirically, which sequences of values can be written correctly and which can't. If a sequence of values can be written and read back correctly, we wouldn't know, whether this is due to the automatic correction of single-bit errors. This aspect is the whole reason for this question asking for the actual algorithm.
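The two overwrite conditions can be expressed as a small check. This is a sketch under a loud assumption: toy_ecc is a made-up Hamming-style placeholder, not the STM32L1 algorithm, which is undocumented. Only the shape of the check carries over: both the data word and its check bits may only gain set bits.

```python
def toy_ecc(word):
    """Placeholder 6-bit check code (NOT the STM32L1 ECC).

    Check bit i is the parity of the data bits whose 1-based index has bit i set.
    """
    ecc = 0
    for pos in range(32):
        if (word >> pos) & 1:
            ecc ^= pos + 1
    return ecc & 0x3F


def can_overwrite(old, new, ecc=toy_ecc):
    """True if writing `new` over `old` only sets bits, in both data and ECC."""
    data_ok = (old & new) == old
    ecc_ok = (ecc(old) & ecc(new)) == ecc(old)
    return data_ok and ecc_ok
```

With this placeholder code, 0x1 followed by 0x3 passes, while 0x3 followed by 0x7 fails because the check bits would have to be cleared.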
The ECC algorithm itself seems to be undocumented. Hamming code seems to be a commonly used algorithm for ECC and in AN4750 they write, that Hamming code is actually used for error correction in SRAM. The algorithm may or may not be used for STM32L1's program memory.
The STM32L1 reference manual doesn't seem to explicitly forbid multiple writes to program memory without erase, but there is no documentation stating the opposite either. In order not to use undocumented functionality, we will refrain from using such functionality in our products and find workarounds.
Interesting question.
First I have to say, that even if you find out the ECC algorithm, you can't rely on it, as it's not documented and it can be changed anytime without notice.
But to find out the algorithm seems to be possible with a reasonable amount of tests.
I would try to build tests which start with a constant value and then change only one bit.
If the value you read back is still the start value, the single-bit change could not update all the necessary bits in the ECC.
Like:
for <bitIdx> = 0 to 31
    erase cell
    write start value, like 0xFFFFFFFF & ~(1<<testBit)
    clear bit <bitIdx> in the cell
    read the cell
next
If you find a start value where the tests work for all bits, then the start value probably has an ECC with all bits set.
Edit: This should be true for any ECC, as every ECC always needs a difference of at least two bits to reliably detect and repair one defective bit.
As the first bit difference is in the value itself, the second change needs to be in the hidden ECC-bits and the hidden bits will be very limited.
If you repeat this test with different start values, you should be able to gather enough data to prove which error correction is used.

How to see the local variable in DDC-I debugger?

I am trying to see the index value of for loop in DDC-I debugger and it always shows me ERROR.
With the assembly of the same, it shows the following instruction:
cmp cr7,0,r20,r23
so it's comparing r20 and r23, but neither of these registers holds the index value. I am not sure what cr7 is.
In short, most embedded tool chains (including the ones you pay for) are horrible about reconstructing local/automatic variables in even lightly optimized code. A lot of them simply can't reconstruct variables that never have storage because they live in registers the whole time (loop index variables like the one you can't see are typical cases). Some even have issues with interim computation holders, and arguments (since they're almost always passed in registers).
Typical strategies might be:
Temporarily turning off optimizations around the code in question
Temporarily moving the variable in question to the global scope
Becoming proficient at reading disassembly.
This isn't a terribly practical answer, but it is surprising for a lot of people that are new to the embedded world or never had the luxury of a source level debugger on their embedded platform.
On PowerPC there are eight CR fields, cr0 to cr7. If you don't specify a CR field for a compare result the default is cr0, but in this case cr7 is specified and so the flags in field cr7 will indicate the result of the compare operation. There are 4 condition code bits in each CR field: lt, gt, eq and so. Typically the compare will be followed by a conditional branch, bc.
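As a rough model of what the compare writes into a CR field, here is a sketch. The function name is made up for illustration; the so bit is passed in because the hardware copies it from the summary-overflow bit in XER rather than computing it from the operands.

```python
def cmp_signed(a, b, so=0):
    """Model a signed compare: return the 4-bit CR field as (lt, gt, eq, so)."""
    lt = int(a < b)
    gt = int(a > b)
    eq = int(a == b)
    # Pack with lt as the most significant bit of the field.
    return (lt << 3) | (gt << 2) | (eq << 1) | so
```

A following conditional branch would then test one of these bits, for example the eq bit to leave a loop when the index reaches its limit.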
There is some useful info in this IBM developerWorks article: Assembly language for Power Architecture, Part 3: Programming with the PowerPC branch processor.

Overflow datastructure

Everyone knows about overflow in programming languages: if it happens, the program crashes. However, it is not clear to me what actually happens to the data that goes beyond the boundary. Could you explain it, say, with an example in C++ or Java? For example, an Integer can hold at most 4 bytes; what will happen if one puts more than 4 bytes of data into an Integer? How will the compiler identify this undefined behaviour?
What will happen if one puts more than 4 bytes of data into an Integer?
Typically the value will roll over [1], meaning it will jump from one end of its range to the other.
This can be seen even in Windows Calculator. Start with the highest possible signed 32-bit value, 2147483647 (0x7FFFFFFF).
Now add one to it, and the display shows -2147483648 (0x80000000).
We overflowed the maximum value of a signed Dword (2^31 - 1).
[1] This is a typical result. Some architectures might actually generate an exception on integer overflow, so you shouldn't count on this behavior.
How will the compiler identify this undefined behaviour?
The compiler won't identify it. That's the problem. C# can mitigate this with the checked keyword, which checks to make sure that any arithmetic done on an integer will not cause overflow/underflow.
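The roll-over can be reproduced with a small sketch. Python integers do not overflow, so the helper below (a name made up for illustration) emulates two's-complement 32-bit wrapping explicitly.

```python
def wrap_int32(value):
    """Reduce an arbitrary integer to the signed 32-bit value a wrapping add would give."""
    value &= 0xFFFFFFFF  # keep only the low 32 bits
    if value & 0x80000000:  # sign bit set: interpret as negative
        return value - 0x100000000
    return value


INT32_MAX = 2**31 - 1
wrapped = wrap_int32(INT32_MAX + 1)  # jumps to the other end of the range
```

Here wrapped ends up as the most negative 32-bit value, -2**31.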

Packed and encrypted section in x86 reversing challenge, without tripping entropy heuristics

TASK:
I'm building a set of x86 assembly reverse engineering challenges, of which I have twenty or so already completed. They're just for fun / education.
The current challenge is one of the more advanced ones, and involves some trickery that makes it look like the EP is actually in the normal program, but it's actually packed away in another PE section.
Here's the basic flow:
Starts out as if it were a normal MSVC++ application.
Injected a sneaky call away to a bunch of anti-debugger tricks.
If they pass, a DWORD in memory is set to 1.
Later in the program flow, it checks for that value being 1, and if it works it decrypts a small call table. If it fails, it sends them off on a wild goose chase of fake anti-debug tricks and eventually just crashes.
The call table points to the real decryption routines that decrypt the actual program code section.
The decryption routines are called, and they decrypt using a basic looped xor (C^k^n where C is ciphertext, k is a 32-bit key and n is the current data offset)
VirtualProtect is used to switch the section's protection flags from RW to RX.
Control flow is redirected to OEP, program runs.
The idea is that since they think they're in normal program flow, it makes them miss the anti-debug call and later checks. Anyway, that all works fine.
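For reference, the looped xor from the flow above might look like the following sketch. The post does not say how n is counted or how wide each step is, so this assumes 4-byte little-endian blocks with n as the byte offset of each block; xor_crypt is a made-up name.

```python
import struct


def xor_crypt(data, key):
    """Apply P = C ^ k ^ n to each dword; the same call encrypts and decrypts."""
    if len(data) % 4:
        raise ValueError("data length must be a multiple of 4")
    out = bytearray()
    for offset in range(0, len(data), 4):
        (block,) = struct.unpack_from("<I", data, offset)
        # Assumption: n is the byte offset of the block within the section.
        out += struct.pack("<I", (block ^ key ^ offset) & 0xFFFFFFFF)
    return bytes(out)
```

Because xor is its own inverse, running xor_crypt twice with the same key returns the original bytes.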
PROBLEM:
The current problem is that OllyDbg and a few other tools look at the packed section and see that it has high entropy, and throw up a warning that it's packed. The code section pointer in the PE header is correctly set, so it doesn't get this from having EP outside code - it's purely an entropy analysis thing.
QUESTION:
Is there an encryption method I can use that preserves low entropy, but is still easy to implement in x86 asm? I don't want to use a plain xor, since it's too easy, but I also don't want it to catch it as packed and give the game away.
I thought of something like a shuffler (somehow produce a keystream and use it to swap 4-byte blocks of code around), but I'm not sure that this is going to work, or even be simple.
Anyone got any ideas?
Actually, OllyDbg works like this pseudocode:
useful_bytes = number_of_bytes_in_section - count_bytes_with_values(0x00, 0x90, 0xCC)
warn about compression if useful_bytes > 0x2000 and count_bytes_with_values(0xFF, 0xE8, 0x8B, 0x89, 0x83) / useful_bytes < 0.075
So, the way to avoid that warning is to use enough bytes with the values 0xFF 0xE8 0x8B 0x89 0x83 in the compressed section.
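The pseudocode above translates directly into a runnable check. This is a sketch of the heuristic as described in this answer, not code taken from OllyDbg; looks_packed is a made-up name.

```python
FILLER_BYTES = (0x00, 0x90, 0xCC)
COMMON_CODE_BYTES = (0xFF, 0xE8, 0x8B, 0x89, 0x83)


def looks_packed(section):
    """Return True if the section would trigger the compression warning."""
    filler = sum(section.count(b) for b in FILLER_BYTES)
    useful = len(section) - filler
    if useful <= 0x2000:
        return False  # too little data to judge
    common = sum(section.count(b) for b in COMMON_CODE_BYTES)
    return common / useful < 0.075
```

A challenge author could run this over the encrypted section to see whether enough of the five byte values are present.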
Don't pack/encrypt your entire program code. Just encrypt a small percentage of bytes, randomly selected from your program code. If they're not decrypted, the program will soon crash if it tries to run the code anyway - and because the majority of the program is unchanged, entropy-based checks won't be set off.
What about simply reversing the bytes (from last to first)? Intel assembler instructions aren't fixed length, so this would shuffle them a little. Or you could simply rotate each byte by a fixed amount...
EDIT: Wrong guess, this is not how Olly works. See my other answer. This still applies to tools other than OllyDbg that calculate entropy.
Expanding on ninjalj's comment:
While I haven't checked, the entropy value OllyDbg calculates is likely bytewise, without context. See How to calculate the entropy of a file? for a common algorithm for doing this.
Under this algorithm, the sequence 0 1 2 ... 254 255 has the maximum possible entropy, despite being completely predictable. A sequence of random bytes between 0 and 255 would get slightly lower entropy, since it won't have exactly the same number of each possible value.
Some quick checks on uncompressed executables with pefile tells me that uncompressed x86 code has entropy of about 6.3 to 6.6. Compressed code with entropy 8.0, encoded with base64, has entropy 6.0. Thus, base64 is easily enough to stop this algorithm from finding compressed code.
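A bytewise entropy calculation of the kind discussed here can be sketched as follows. This is the common Shannon formula over byte frequencies, not necessarily what any particular tool implements.

```python
import base64
import math
from collections import Counter


def byte_entropy(data):
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


sequential = bytes(range(256))  # predictable, yet maximal entropy
encoded = base64.b64encode(sequential * 3)  # base64 uses a 64-symbol alphabet
```

The sequential bytes score 8.0, while the base64 text cannot exceed 6.0 because only 64 distinct symbols occur.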

Modify only the LSB of a memory cell

Is it possible to write a sequence of instructions that will place a 1 in the least significant bit of the memory cell at address B3 without disturbing the other bits in the memory cell?
The machine instructions I am referring to are STOP, ADD, SWITCH, LOAD, ROTATE, etc.
Clarification: this question was originally tagged C#; since it wasn't the OP that re-tagged it, I'll leave this here until the OP's intentions are clearer.
C# is a high-level programming language, which compiles down to IL, not machine code. As such: no, there is absolutely no supported mechanism for performing specific machine code operations (and even if there were, it couldn't possibly port between languages).
You can do high level bit operations, using the operators on the integer-based types; and if you really want you can write IL, either building it manually (ilasm), or at runtime via DynamicMethod / ILGenerator - but these still only deal with CIL opcodes, not machine codes.
I think ORing it with 1 will do the job:
algo:
byte= [data at 0xB3]
byte = byte | 0x01
This works fine for me when developing for 8051 MCUs.
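The same read-modify-write step can be tried out in a sketch that models memory as a byte array. The names and the 256-cell size are assumptions for illustration only.

```python
def set_lsb(memory, address):
    """Set the least significant bit of one cell, leaving the other bits untouched."""
    memory[address] |= 0x01


memory = bytearray(256)  # hypothetical 256-cell memory
memory[0xB3] = 0xA4      # example contents with the LSB clear
set_lsb(memory, 0xB3)
```

After the call the cell holds 0xA5; calling set_lsb again changes nothing, since the bit is already set.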

Resources