Casting Primitive Values - overflow

The value 1,921,222 is too large to be stored as a short, so numeric overflow occurs and it becomes 20,678.
Can anyone demonstrate the process of 1,921,222 becoming 20,678?
How does it “wrap around” to the next lowest value and count up from there to get 20,678?
Thank you in advance

In C, the "short" type is 2 bytes. An integer literal is treated by the compiler as a 32-bit, 4-byte "int" (this can vary depending on the compiler).
short s = 1921222;
In this statement you are losing 2 bytes of data:
                  Information that remains in the variable (2 bytes)
                      ^        ^
00000000 00011101 01010000 11000110   -> total data (4 bytes, 32 bits)
    v        v
Information discarded when you put this value in a short type.
In other words, you "crop" the data, leaving only the part that fits the specified type.
01010000 11000110
"01010000 11000110" is 20678.
This site can help you to understand better how this process works:
https://hexed.it/

Related

How do you determine the length of the string in an OUTPUT_DEBUG_STRING_INFO?

The documentation for the OUTPUT_DEBUG_STRING_INFO structure doesn't explain how to determine the length (or size) of the string value it points to. Specifically, the documentation for nDebugStringLength is confusing:
The lower 16 bits of the length of the string in bytes. As nDebugStringLength is of type WORD, this does not always contain the full length of the string in bytes.
For example, if the original output string is longer than 65536 bytes, this field will contain a value that is less than the actual string length in bytes.
As I understand it, the true size can be any value that's a solution to the equation:
size = nDebugStringLength + (n * 65536)
for some n in [0..65536).
Question:
How do I determine the correct size of the string? Unless I'm overlooking something, the documentation appears to be insufficient in this regard.
Initially the debug event arrives in the form of a DBGUI_WAIT_STATE_CHANGE.
If you use the WaitForDebugEvent[Ex] API, it internally converts the DBGUI_WAIT_STATE_CHANGE to a DEBUG_EVENT by using DbgUiConvertStateChangeStructure[Ex].
A DbgExceptionStateChange (in NewState) event with DBG_PRINTEXCEPTION_WIDE_C or DBG_PRINTEXCEPTION_C (in ExceptionCode) is converted to OUTPUT_DEBUG_STRING_INFO. The nDebugStringLength is taken from Exception.ExceptionRecord.ExceptionInformation[0], or from ExceptionInformation[3] (in the case of DBG_PRINTEXCEPTION_C and the API version without Ex). But because nDebugStringLength is only 16 bits wide, when the original value is 32/64 bits wide it is truncated: only the low 16 bits of ExceptionInformation[0] (or [3]) are used.
Note that ExceptionInformation[0] (and [3] in the case of DBG_PRINTEXCEPTION_WIDE_C) contains the string length in characters, including the terminating 0.
In contrast, nDebugStringLength is in bytes (if we use WaitForDebugEventEx and the DBG_PRINTEXCEPTION_WIDE_C exception, nDebugStringLength = (WORD)(ExceptionInformation[0] * sizeof(WCHAR))).
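To make the truncation concrete, here is a small C sketch of the arithmetic only (the 70,000-character OutputDebugStringW call is an assumed example, not taken from the question):

#include <windows.h>
#include <stdio.h>

int main(void)
{
    /* assumed example: OutputDebugStringW() with a string of 70,000 WCHARs,
       including the terminating 0                                           */
    ULONG_PTR exceptionInfo0 = 70000;                            /* length in characters */
    DWORD fullBytes = (DWORD)(exceptionInfo0 * sizeof(WCHAR));   /* 140000 bytes         */

    WORD nDebugStringLength = (WORD)fullBytes;                   /* keeps low 16 bits only */

    printf("full length in bytes : %lu\n", (unsigned long)fullBytes);   /* 140000                  */
    printf("nDebugStringLength   : %u\n", nDebugStringLength);          /* 140000 - 2*65536 = 8928 */
    return 0;
}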

gfortran compiler Error: result of exponentiation exceeds the range of INTEGER(4)

I have this line in Fortran and I'm getting the compiler error in the title. dFeV is a 1-D array of reals.
dFeV(x)=R1*5**(15) * (a**2) * EXP(-(VmigFe)/kbt)
For the record, the variable names are inherited and not my fault. I think this is an issue with not having the memory space to compute the value on the right before I store it on the left as a real (which would have enough room), but I don't know how to allocate more space for that computation.
The problem arises as one part of your computation is done using integer arithmetic of type integer(4).
That type has an upper limit of 2^31 - 1 = 2147483647, whereas your intermediate result 5^15 = 30517578125 is far larger (thanks to @evets' comment).
As pointed out in your question, you save the result in a real variable.
Therefore, you could just compute that exponentiation using real data types: 5.0**15.
Your formula will then end up like the following:
dFeV(x)= R1 * (5.0**15) * (a**2) * exp(-(VmigFe)/kbt)
Note that integer(4) need not be the same implementation for every processor (thanks #IanBush).
Which just means that for some specific machines the upper limit might be different from 2^31-1 = 2147483647.
As indicated in the comment, the value of 5**15 exceeds the range of 4-byte signed integers, which are the typical default integer type. So you need to instruct the compiler to use a larger type for these constants. This program example shows one method. The ISO_FORTRAN_ENV module provides the int64 type. UPDATE: corrected to what I meant, as pointed out in comments.
program test_program
   use ISO_FORTRAN_ENV
   implicit none
   integer (int64) :: i
   i = 5_int64**15_int64
   write (*, *) i
end program
Although there does seem to be an additional point here that may be specific to gfortran:
integer(kind = 8) :: result
result = 5**15
print *, result
gives: Error: Result of exponentiation at (1) exceeds the range of INTEGER(4)
while
integer(kind = 8) :: result
result = 5**7 * 5**8
print *, result
gives: 30517578125
i.e. the exponentiation function seems to have an integer(4) limit even if the variable to which the answer is being assigned has a larger capacity.

how bytes are used to store information in protobuf

I am trying to understand protocol buffers. Here is the sample. What I am not able to understand is how bytes are being used in the following messages; I don't know what the numbers 1, 2, 3 are used for.
message Point {
  required int32 x = 1;
  required int32 y = 2;
  optional string label = 3;
}

message Line {
  required Point start = 1;
  required Point end = 2;
  optional string label = 3;
}

message Polyline {
  repeated Point point = 1;
  optional string label = 2;
}
I read the following paragraph in the Google protobuf documentation but am not able to understand what is being said here. Can anyone help me understand how bytes are being used to store the info?
The " = 1", " = 2" markers on each element identify the unique "tag" that field uses in the binary encoding. Tag numbers 1-15 require one less byte to encode than higher numbers, so as an optimization you can decide to use those tags for the commonly used or repeated elements, leaving tags 16 and higher for less-commonly used optional elements.
The general form of a protobuf message is that it is a sequence of pairs of the form:
field header
payload
For your question, we can largely forget about the payload - that isn't the bit that relates to the 1/2/3 and the 1-15 optimisation - all of that is in the field header. The field header is a "varint" encoded integer; "varint" uses the most-significant bit as an optional continuation bit, so small values (<=127, assuming unsigned and not zig-zag) require one byte to encode - larger values require multiple bytes. Or in other words, you get 7 useful bits to play with before you need to set the continuation bit, requiring at least 2 bytes.
However! The field header itself is composed of two things:
the wire-type
the field-number / "tag"
The wire-type is the lowest 3 bits, and indicates the fundamental format of the payload - "length-delimited", "64-bit", "32-bit", "varint", "start-group", "end-group". That means that of the 7 useful bits we had, only 4 are left; 4 bits is enough to encode the field numbers 1 through 15. This is why field numbers <= 15 are suggested (as an optimisation) for your most common elements.
In your question, the 1 / 2 / 3 is the field-number; at the time of encoding this is left-shifted by 3 and composed with the payload's wire-type; then this composed value is varint-encoded.
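To illustrate that composition, here is a minimal C sketch (not from any protobuf library; encode_varint is just a throwaway helper for this example) showing why field number 1 needs a single header byte while field number 16 needs two:

#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

/* Encode an unsigned value as a protobuf varint: 7 bits per byte,
   most-significant bit set on every byte except the last.          */
static size_t encode_varint(uint64_t value, uint8_t *out)
{
    size_t n = 0;
    while (value >= 0x80) {
        out[n++] = (uint8_t)(value & 0x7F) | 0x80;   /* set continuation bit */
        value >>= 7;
    }
    out[n++] = (uint8_t)value;
    return n;
}

int main(void)
{
    uint8_t buf[10];

    /* field header = (field_number << 3) | wire_type; wire type 0 = varint */
    size_t len1  = encode_varint((1u  << 3) | 0, buf);   /* 1 byte:  0x08      */
    size_t len16 = encode_varint((16u << 3) | 0, buf);   /* 2 bytes: 0x80 0x01 */

    printf("header for field  1: %zu byte(s)\n", len1);
    printf("header for field 16: %zu byte(s)\n", len16);
    return 0;
}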
Protobuf stores messages like a map from an id (the = 1, = 2, which they call tags) to the actual value. This makes it easier to extend than if the data were transferred more like a struct with fixed offsets. So a message Point, for instance, would look something like this at a high level:
1 -> 100,
2 -> 500
This is then interpreted as x=100, y=500 and label=not set. At a lower level, protobuf serializes this tag-value mapping in a highly compact format which, among other things, stores integers with variable-length encoding. The paragraph you quoted just highlights exactly this in the case of tags, which can be stored more compactly if they are < 16, but the same holds, for instance, for integer values in your protobuf definition.
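As a worked example of that map (the x=100, y=500 values from above, with label unset and therefore simply omitted), the serialized Point is only five bytes:

08 64        field 1 (x), wire type 0: header (1 << 3) | 0 = 0x08, varint 100 = 0x64
10 F4 03     field 2 (y), wire type 0: header (2 << 3) | 0 = 0x10, varint 500 = 0xF4 0x03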

Convert Enum to Binary (via Integer or something similar)

I have an Ada enum with two values, type Polarity is (Normal, Reversed), and I would like to convert them to 0, 1 (or True, False, as Boolean seems to implicitly play nicely as binary) respectively, so I can store their values as specific bits in a byte. How can I accomplish this?
An easy way is a lookup table:
Bool_Polarity : constant array (Polarity) of Boolean :=
  (Normal => False, Reversed => True);
then use it as
B : Boolean := Bool_Polarity(P);
Of course there is nothing wrong with using the 'Pos attribute, but the LUT makes the mapping readable and very obvious.
As it is constant, you'd like to hope it optimises away during the constant folding stage, and it seems to: I have used similar tricks compiling for AVR with very acceptable executable sizes (down to 0.6k to independently drive 2 stepper motors)
3.5.5 Operations of Discrete Types includes the function S'Pos(Arg : S'Base), which "returns the position number of the value of Arg, as a value of type universal integer." Hence,
Polarity'Pos(Normal) = 0
Polarity'Pos(Reversed) = 1
You can change the numbering using 13.4 Enumeration Representation Clauses.
...and, of course:
Boolean'Val(Polarity'Pos(Normal)) = False
Boolean'Val(Polarity'Pos(Reversed)) = True
I think what you are looking for is a record type with a representation clause:
with Ada.Unchecked_Conversion;

procedure Main is

   type Byte_T is mod 2**8;
   for Byte_T'Size use 8;

   type Filler7_T is mod 2**7;
   for Filler7_T'Size use 7;

   type Polarity_T is (Normal, Reversed);
   for Polarity_T use (Normal => 0, Reversed => 1);
   for Polarity_T'Size use 1;

   type Byte_As_Record_T is record
      Filler   : Filler7_T;
      Polarity : Polarity_T;
   end record;

   for Byte_As_Record_T use record
      Filler   at 0 range 0 .. 6;
      Polarity at 0 range 7 .. 7;
   end record;
   for Byte_As_Record_T'Size use 8;

   function Convert is new Ada.Unchecked_Conversion
     (Source => Byte_As_Record_T,
      Target => Byte_T);

   function Convert is new Ada.Unchecked_Conversion
     (Source => Byte_T,
      Target => Byte_As_Record_T);

begin
   -- TBC
   null;
end Main;
As Byte_As_Record_T & Byte_T are the same size, you can use unchecked conversion to convert between the types safely.
The representation clause for Byte_As_Record_T allows you to specify which bits/bytes to place your Polarity_T in (I chose the 8th bit).
My definition of Byte_T might not be what you want, but as long as it is 8 bits long the principle should still be workable. From Byte_T you can also safely upcast to Integer or Natural or Positive. You can also use the same technique to go directly to/from a 32 bit Integer to/from a 32 bit record type.
Two points here:
1) Enumerations are already stored as binary. Everything is. In particular, your enumeration, as defined above, will be stored as a 0 for Normal and a 1 for Reversed, unless you go out of your way to tell the compiler to use other values.
If you want to get that value out of the enumeration as an Integer rather than an enumeration value, you have two options. The 'pos() attribute will return a 0-based number for that enumeration's position in the enumeration, and Unchecked_Conversion will return the actual value the computer stores for it. (There is no difference in the value, unless an enumeration representation clause was used).
2) Enumerations are nice, but don't reinvent Boolean. If your enumeration can only ever have two values, you don't gain anything useful by making a custom enumeration, and you lose a lot of useful properties that Boolean has. Booleans can be directly selected off of in loops and if checks. Booleans have and, or, xor, etc. defined for them. Booleans can be put into packed arrays, and then those same operators are defined bitwise across the whole array.
A particular pet peeve of mine is when people end up defining themselves a custom boolean with the logic reversed (so its true condition is 0). If you do this, the ghost of Ada Lovelace will come back from the grave and force you to listen to an exhaustive explanation of how to calculate Bernoulli sequences with a Difference Engine. Don't let this happen to you!
So if it would never make sense to have a third enumeration value, you just name objects something appropriate describing the True condition (eg: Reversed_Polarity : Boolean;), and go on your merry way.
It seems all I needed to do was pragma Pack([type name]); (where 'type name' is the composite type containing Polarity) to compress the value down to a single bit.

Little endian data and sha 256

I have to generate SHA-256 hashes of data that is in little-endian form. I would like to know if I have to convert it to big-endian first, before using the SHA-256 algorithm, or if the algorithm is "endian-agnostic".
EDIT: Sorry, I think I wasn't clear. What I would like to know is the following: the SHA-256 algorithm requires you to pad the end of a message with certain bits. The first step is to add a 1 at the end of the message. Then you pad with zeros up to the end of the block. At the very end, you must add the length of the message in bits. What I would like to know is whether this padding can be performed in little endian. For example, for a 640-bit message, I could write the last word as 0x280 (in big endian), or 0x8002000 (in little endian). Can this padding be done in little endian?
SHA-256 is endian-agnostic if all you want is a good hash. But if you are writing SHA-256 and want to get the same results as a correct implementation, then you must play games on little-endian hardware. SHA-256 combines arithmetic addition (mod 2^32) with boolean operations, and thus is not endian-agnostic internally.
The SHA-256 implementation itself should take care of padding - you shouldn't have to deal with that unless you're implementing your own specialized SHA-256 code. If you are, note that the padding rules specified in the "pre-processing step" say that the length is a 64-bit big-endian integer. See SHA-2 - Wikipedia
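If you do end up writing the padding yourself, here is a minimal C sketch (illustrative only; it assumes the 0x80 marker byte and the zero padding have already been written and that the length fits in this final block) of appending the question's 640-bit length as a big-endian 64-bit value:

#include <stdio.h>
#include <stdint.h>

/* Write a 64-bit bit-length into the last 8 bytes of a block, big-endian,
   regardless of the host's endianness.                                    */
static void put_length_be(uint8_t *last8, uint64_t bit_len)
{
    for (int i = 7; i >= 0; --i) {
        last8[i] = (uint8_t)(bit_len & 0xFF);
        bit_len >>= 8;
    }
}

int main(void)
{
    uint8_t block[64] = {0};       /* final (already zero-padded) block */
    /* assumed example from the question: a 640-bit message             */
    put_length_be(block + 56, 640);

    /* prints 00 00 00 00 00 00 02 80 - i.e. 0x280, always big-endian   */
    for (int i = 56; i < 64; ++i)
        printf("%02X ", block[i]);
    printf("\n");
    return 0;
}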
It's hard to even figure out what "endian-agnostic" would mean, but the order of all the bits, bytes and words in a hash algorithm matters a whole lot, so I sure wouldn't use that term.
Let me reply regarding SHA-256 as well as SHA-512.
In short:
The algorithm itself is endian-agnostic. The endian-sensitive parts are when data is imported from a byte buffer into the algorithm's working variables, and when it is exported back to the digest result, which is also a byte buffer. If the import/export involves casting, then endianness matters.
Where could casting occur?
In SHA-512 there is a working buffer of 128 bytes.
In my code it's defined like this:
union
{
    U64  w[80];        /* see the U64 example below */
    byte buffer[128];
};
Input data is copied into this byte buffer, and then the work is done on w. This means the data has been cast to some 64-bit type, and it will have to be byte-swapped; in my case it's swapped for little-endian machines.
A better method would be to prepare a "get" macro that takes each byte and places it in its correct position in the U64 type.
When the algorithm is done, the digest result is output from the working variables to some byte buffer; if this is done by memcpy it will also have to be swapped.
Another cast could occur when implementing SHA-512 (which is designed for 64-bit machines) on 32-bit machines. In my case I have a 64-bit type that is defined:
typedef struct {
    uint high;
    uint low;
} U64;
Assume I define it for little endian as well, as follows:
typedef struct {
    uint low;
    uint high;
} U64;
And then the k constant array is initialized like this:
static const SHA_U64 k[80] =
{
    {0xD728AE22, 0x428A2F98}, {0x23EF65CD, 0x71374491}, ...
    ...
    ...
};
But I need the logical value of k[0].high to be the same on any machine.
So in this example I would need another k array with the high and low values swapped.
After the data is stored in the working parameters, any bitwise manipulation will have the same result on both big- and little-endian machines.
A good method is to avoid any casting:
Import bytes from the input buffer into your working parameters using a macro.
Work with logical values without thinking about the memory mapping.
Export the output to the digest result with a macro.
Macros for taking 32 bits from a byte buffer into a 32-bit integer (BE = big-endian, LE = little-endian):
#define GET_BE_BYTES_FROM32(a)       \
    ((((NQ_UINT32) (a)[0]) << 24) |  \
     (((NQ_UINT32) (a)[1]) << 16) |  \
     (((NQ_UINT32) (a)[2]) << 8)  |  \
      ((NQ_UINT32) (a)[3]))

#define GET_LE_BYTES_FROM32(a)       \
    ((((NQ_UINT32) (a)[3]) << 24) |  \
     (((NQ_UINT32) (a)[2]) << 16) |  \
     (((NQ_UINT32) (a)[1]) << 8)  |  \
      ((NQ_UINT32) (a)[0]))
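As a usage sketch (NQ_UINT32 and the GET_BE macro are the ones from above, repeated so the snippet stands alone; the sample block bytes are assumed for illustration), loading words through such a macro gives the same 32-bit values on big- and little-endian hosts:

#include <stdio.h>
#include <stdint.h>

typedef uint32_t NQ_UINT32;              /* assumption: a 32-bit unsigned type */

#define GET_BE_BYTES_FROM32(a)       \
    ((((NQ_UINT32) (a)[0]) << 24) |  \
     (((NQ_UINT32) (a)[1]) << 16) |  \
     (((NQ_UINT32) (a)[2]) << 8)  |  \
      ((NQ_UINT32) (a)[3]))

int main(void)
{
    /* first 8 bytes of a hypothetical padded message block ("abc" + 0x80 marker) */
    const unsigned char block[8] = { 0x61, 0x62, 0x63, 0x80, 0x00, 0x00, 0x00, 0x00 };
    NQ_UINT32 w[2];

    /* import big-endian words; the result is identical on BE and LE hosts */
    for (int i = 0; i < 2; i++)
        w[i] = GET_BE_BYTES_FROM32(block + 4 * i);

    printf("%08X %08X\n", (unsigned) w[0], (unsigned) w[1]);   /* 61626380 00000000 */
    return 0;
}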
