MIPS input string "2147483648", how to return "error: the input number is too big." - overflow

Checking MIPS algorithm overflow
error: .asciiz "error: the input number is too big."
buffer: .space 256

For QtSpim's syscall #5 (read integer), there appears to be no way to detect when user input exceeds the 32-bit signed integer range. No success/failure error code is returned, and the value returned by the syscall, given whatever the user entered, is simply truncated to 32 bits.¹
If you want to do overflow checking, you'll have to accept string data (instead of integer data) from the console and parse the string to an integer, checking for overflow during that parse.
Such a parse involves doing a multiplication by 10 and an addition for each digit, so with one approach both of those arithmetic operations would need to involve overflow checking.
Importantly, the input to the parse would be a string, and the output would be both a status and a numeric value, unlike syscall #5, so your program can customize its behavior. The status could indicate success or failures of various kinds, such as non-digit characters or overflow.
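Such a parse might be sketched in C as follows (the status names and the trick of accumulating in the negative range, so that INT32_MIN parses cleanly, are illustrative choices, not the only way to do it):

```c
#include <stdint.h>

/* Status codes for the parse (names are invented for illustration). */
enum parse_status { PARSE_OK, PARSE_EMPTY, PARSE_BAD_CHAR, PARSE_OVERFLOW };

/* Decimal parse that checks for 32-bit overflow on every multiply-by-10
 * and add, as described above. Accumulates as a negative number so that
 * -2147483648 (INT32_MIN) can be represented during the parse. */
enum parse_status parse_int32(const char *s, int32_t *out)
{
    int neg = (*s == '-');
    if (neg || *s == '+') s++;
    if (*s == '\0') return PARSE_EMPTY;

    int32_t acc = 0;                 /* accumulated value, kept <= 0 */
    for (; *s; s++) {
        if (*s < '0' || *s > '9') return PARSE_BAD_CHAR;
        int d = *s - '0';
        if (acc < INT32_MIN / 10) return PARSE_OVERFLOW;  /* acc*10 would overflow */
        acc *= 10;
        if (acc < INT32_MIN + d) return PARSE_OVERFLOW;   /* acc-d would overflow */
        acc -= d;
    }
    if (!neg) {
        if (acc == INT32_MIN) return PARSE_OVERFLOW;      /* +2147483648 doesn't fit */
        acc = -acc;
    }
    *out = acc;
    return PARSE_OK;
}
```

With this, the input "2147483648" from the question's title yields PARSE_OVERFLOW rather than a silently truncated value.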
Other approaches to detecting overflow while parsing a string to an integer are possible, since we know certain facts about decimal digits: at most 9 is added per digit, and 11 or more total digits (not counting leading zeros, if present) guarantee overflow.
For example, an alternative approach is to compute the intermediate parse result in 64 bits, then, only when converting to 32 bits, check that the 32-bit value matches the original 64-bit value (a match is OK; a mismatch means overflow). Since 64 bits can still overflow, this approach should also count the total non-zero-prefixed digits, where anything over 11 indicates overflow.
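The 64-bit variant might look like this in C (a sketch; the function name and the 1/0 success convention are invented):

```c
#include <stdint.h>

/* Accumulate in 64 bits, cap the significant-digit count so the 64-bit
 * intermediate itself cannot overflow, then verify the value survives
 * truncation to 32 bits. Returns 1 on success, 0 on any failure. */
int parse_int32_via64(const char *s, int32_t *out)
{
    int neg = (*s == '-');
    if (neg) s++;
    if (*s == '\0') return 0;

    int64_t acc = 0;
    int significant = 0;                 /* digits seen after leading zeros */
    for (; *s; s++) {
        if (*s < '0' || *s > '9') return 0;
        if (significant || *s != '0') significant++;
        if (significant > 11) return 0;  /* 12+ digits must overflow 32 bits */
        acc = acc * 10 + (*s - '0');     /* fits easily in 64 bits */
    }
    if (neg) acc = -acc;
    int32_t truncated = (int32_t)acc;
    if ((int64_t)truncated != acc) return 0;  /* mismatch means overflow */
    *out = truncated;
    return 1;
}
```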
The other popular MIPS simulator MARS presents a different challenge/opportunity on overflow with syscall #5 — it takes an exception on overflow, but this is not like a Java or C# exception that is easily caught and handled, but instead this is a fault that transfers control to the kernel exception handler in supervisor mode.
Doing something useful with that would be a topic for advanced operating-systems coursework. For one, once you write your own kernel exception handler, you have to be prepared to handle all possible processor exceptions, not just overflow; for another, you have to differentiate the overflow condition from the others so they can be handled separately; and lastly, there's no predefined interface for telling the user-mode program that issued syscall #5 that overflow occurred, or for letting it decide what to do about it, so you'd have to provide such a mechanism yourself.
¹ These simulators implement a useful but irregular set of system calls — there appears to have been no attempt to make the set complete or well rounded. For example, you can read or print an integer or a string from/to the console, but can only read or print strings from files — if you want to use files instead of the console with integers, you now have to write your own int-to-string conversion and vice versa.

Related

Could a CRC32 key with a most or least significant bit of 0 be valid?

I have a server receiving UDP packets whose payload is a number of CRC32-checksummed 4-byte words. The header in each UDP packet has a 2-byte field holding the "repeating" key used for the words in the payload. The way I understand it, in CRC32 the key must start and end with a 1 in its binary representation; in other words, the least and most significant bits of the key must be 1, not 0. My issue is that, for example, the first UDP packet received has the key field reading 0x11BC, which has the binary representation 00010001 10111100. So the 1's are neither right- nor left-aligned in the key-holding word; there are trailing 0's on both sides. Is my understanding of valid CRC32 keys wrong, then? I ask because I'm trying to write the code to check each word using the key as-is, and it seems to always give a remainder, meaning every word in the payload has an error, yet the instructions I've been given guarantee that the first packet in the given sample has no errors.
Although it is true that CRC polynomials always have the top and bottom bit set, often this is dealt with implicitly; a 32-bit CRC is actually a 33-bit calculation and the specified polynomial ordinarily omits the top bit.
So e.g. the standard quoted polynomial for a CCITT CRC16 is 0x1021, which does not have its top bit set.
It is normal to include the LSB, so if you're certain you know which way around the polynomial has been specified then either the top or the bottom bit of your word should be set.
However, for UDP purposes, have you possibly also made a byte-ordering error on one side of the connection or the other? Network byte order is conventionally big endian, whereas most processors today are little endian — is one side of the link swapping byte order but not the other?
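A quick way to test the byte-swap hypothesis (a sketch; swap16 is an illustrative helper, not part of any networking API): if the 0x11BC from the question arrived in the opposite byte order, swapping it gives 0xBC11, whose least significant bit is set, consistent with the usual reversed-polynomial convention.

```c
#include <stdint.h>

/* Swap the two bytes of a 16-bit value, e.g. to undo a big/little-endian
 * mismatch between the sender and the receiver. */
static uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v >> 8) | (v << 8));
}
```

So before concluding the key is malformed, it may be worth checking each word against the byte-swapped key as well.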

Any byte sequences that can not be present in valid x86 code?

I'm looking for a byte sequence (or sequences), to inject into an x86 program compiled using GCC, that cannot show up in the binary as a by-product of compilation.
The reason is that I want these byte sequences to act as "labels", so that I can recognize them later during inspection.
Is it possible to construct patterns of bytes, so that, searching through the binary, these patterns will not show up except with very small probability (I prefer probability zero). In other words, I want to minimize the number of false positives!
There are sequences that today are not a valid encoding of any instruction.
Rather than digging through the opcode tables in Volume 2 of the Intel manual, you can exploit two facts of the x86 architecture:
The maximum instruction length is 15 bytes.
You can repeat prefixes.
These should also be more stable across generations than reserved opcodes.
The sequence 666666666666666666666666666666 (15 operand-size override prefixes, but any prefix will do) will generate an #UD exception because it is invalid.
For what it's worth, there is a specific instruction that fulfills the role of invalid instruction: ud2.
Its presence in a binary module is possible, but it's more idiomatic than an invalid encoding and it is standard; for example, Linux uses it to mark a bug: if ud2 is reached in the execution flow, the code behind it cannot be valid.
That said, if I got you right, that's not going to be useful to you.
You want to skip the process of decoding the instructions and scan the code section of the binary instead.
There is no guarantee that the code section will contain only code; for example, ARM compilers generate literal pools. That's definitely uncommon on x86, though.
However, compilers usually align functions to a specific boundary (usually 16 bytes); this can be done in several ways, such as stretching the previous function or with mere padding.
This padding can be a sequence of bytes of any value - hence arbitrary bytes can be present in the code section.
Long story short, there is no universal byte sequence that appears with probability zero in the code section.
Everything that is not in the execution flow can have any value.
We will deal with probability later; for now, let's assume that 66..66h appears rarely enough in an executable.
You can't just use it directly, as 66..66h can be part of two instructions and thus be a valid sequence:
mov rax, 6666666666666666h
db 66h, 66h, 66h , 66h
db 66h, 66h, 66h
nop
is valid.
This is due to the immediate operands of instructions — the biggest immediate can be 8 bytes long (as of today), so the sequence must be lengthened to 15 + 8 = 23 bytes.
If you really want to be safe against future features, you can use a sequence of 14 + 15 = 29 bytes (for the 15-byte instruction length limit).
It's possible to find 23/29 bytes of value 66h in the code section or in the whole binary.
But how probable is that?
If the bytes in a binary were uniformly random, the probability would be astronomically small: 256^-23 = 2^-184.
Well, the point is that the bytes in a binary are not uniformly random.
You can open a file with an embedded icon to confirm that.
You can make the probability arbitrarily small by stretching the sequence - it's up to you to find a compromise between the length and an acceptable number of false positives.
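Scanning for such a marker is straightforward; a minimal sketch in C (find_marker is an invented helper, and 0x66 / the run length are parameters you'd tune as discussed above):

```c
#include <stddef.h>

/* Scan a loaded binary image for a run of `runlen` consecutive bytes of
 * value `marker` (e.g. 0x66 as in the discussion). Returns the offset of
 * the first byte of the first match, or -1 if there is no match. */
static long find_marker(const unsigned char *buf, size_t len,
                        unsigned char marker, size_t runlen)
{
    size_t run = 0;
    for (size_t i = 0; i < len; i++) {
        run = (buf[i] == marker) ? run + 1 : 0;
        if (run == runlen)
            return (long)(i - runlen + 1);
    }
    return -1;
}
```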
It's unclear what you want to do, but here is some advice:
Most, if not all, building tools support generating a map file.
It is a file with all the symbols/names and their addresses.
If you could use actual labels (with a prefix and a random suffix) you'd collect them easily after the build.
Most output formats can be enriched with meta-information.
You can add an ELF/PE section with a table of offsets to the locations you want to mark.

Overflow datastructure

Everyone knows about overflow in programming languages: if it happens, the program can crash. However, it is not clear to me what actually happens to data that goes out of bounds. Could you explain, giving an example in C++ or Java? For example, an Integer can hold at most 4 bytes; what will happen if one puts more than 4 bytes of data into an Integer? How will the compiler identify this undefined behaviour?
what will happen if one puts data more than 4 byte to Integer.
Typically the value will roll over¹, meaning it will jump from one end of its range to the other.
This can be seen even in Windows Calculator. Start with the highest possible signed 32-bit value, 2147483647, and add one to it: the result rolls over to -2147483648. We overflowed the maximum value of a signed Dword (2^31 - 1).
¹ This is a typical result. Some architectures might actually generate an exception on integer overflow, so you shouldn't count on this behavior.
How compiler will identify this undefined behaviour?
The compiler won't identify it. That's the problem. C# can mitigate this with the checked keyword, which verifies that any arithmetic done on an integer will not cause overflow/underflow.
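In C the same distinction can be sketched: unsigned arithmetic wraps by definition, whereas signed overflow is undefined behavior and must be detected before it happens. GCC and Clang provide __builtin_add_overflow for this; the wrapper names below are illustrative.

```c
#include <stdint.h>

/* Unsigned addition wraps modulo 2^32 by definition in C. */
uint32_t wrap_add(uint32_t a, uint32_t b)
{
    return a + b;
}

/* Checked signed addition: returns 1 and stores the sum on success,
 * returns 0 if the addition would overflow (GCC/Clang builtin). */
int checked_add(int32_t a, int32_t b, int32_t *out)
{
    return !__builtin_add_overflow(a, b, out);
}
```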

Efficient Algorithm for Parsing OpCodes

Let's say I'm writing a virtual machine. I read the program data into an array of bytes. Now I need to loop through those bytes (instructions are two bytes) and instantiate a little class representing each instruction and its arguments.
What would be a fast parsing approach? Here are the two ways I've thought of:
Logically branching by inspecting each bit from the left to the right until I narrowed it down to a particular op code. This would be like a binary search.
Inspecting some programs to come up with a list of opcodes ordered by frequency of use, and then checking for the full opcode in that order.
Note: I will be using bit shifting and masking in C to check, not regexes or string comps or anything high-level like that.
You don't need to parse anything. If this is in C, you make a table of function pointers which has 256 entries in it, one for each possible byte value, then jump to the appropriate function based on the first byte value. If the second byte is significant then a switch statement can be used within the function to handle the second byte. This is how the original Visual Basic interpreter (versions 1-6) worked.
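A minimal sketch of that table-dispatch scheme in C (the opcodes 0x01/0x02 and all names here are invented for illustration):

```c
#include <stdint.h>
#include <stddef.h>

typedef struct { int32_t acc; } vm_t;           /* toy VM: one accumulator */
typedef void (*op_fn)(vm_t *, uint8_t);

static void op_nop (vm_t *vm, uint8_t arg) { (void)vm; (void)arg; }
static void op_add (vm_t *vm, uint8_t arg) { vm->acc += arg; }
static void op_load(vm_t *vm, uint8_t arg) { vm->acc = arg; }

/* One entry per possible first-byte value; no decoding logic needed. */
static op_fn dispatch[256];

static void vm_init(void)
{
    for (int i = 0; i < 256; i++) dispatch[i] = op_nop;
    dispatch[0x01] = op_add;                     /* invented opcodes */
    dispatch[0x02] = op_load;
}

/* Instructions are two bytes: opcode, then operand. */
static void vm_run(vm_t *vm, const uint8_t *code, size_t len)
{
    for (size_t i = 0; i + 1 < len; i += 2)
        dispatch[code[i]](vm, code[i + 1]);
}
```

The key design point is that dispatch is a single indexed load and indirect call per instruction, regardless of how many opcodes exist, instead of a chain of comparisons.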

MIPS Overflow check

I am writing mips code in MARS that uses system call code 5 to read an integer.
Is there a way to verify that the integer input is within the 32-bit range?
If you allow both positive and negative integers, there is no way to verify that the integer input is within the 32-bit range, since the MARS syscall does not appear to return any error status.

Resources