Algorithm to generate relative precedence number

I want to write a module/function which takes as input an unsigned 32-bit integer and outputs an 8-bit value. The output values must be in the same relative order as the inputs,
and an output value, once assigned, should not be altered.
testing:
input: a randomly generated 32-bit integer
output: in the range 0-255
repeat the action 256 times, until the output values are exhausted.
all output values should be in the same relative order as the input values.
EDIT: Based on the comments, it is impossible to find a perfect solution for this.
However, can we find the best solution possible, i.e. generate relatively ordered 8-bit outputs without comparing the (32-bit) input values?

Assume the first input number is 123456. You don't know whether the following numbers will be smaller or greater than this. So the best thing to do is to output a number in the middle of the available range: 127.
Store this assignment (123456,127).
Now, if the second number is greater, e.g. 234567, output a number from the upper half of the output range:
(234567,191)
With each following number, output the center of the corresponding output range:
(200000, 159)
You get the idea.
Of course, depending on the input sequence, you normally cannot assign unique output to each input.
Worst case:
Input sequence 2^31, 2^30, ... i.e. 2147483648, 1073741824 yields output 127, 63, 31, 15, 7, 3, 1, 0, 0, ...
So you can only use 8 of the 256 available output numbers.
You can, however, fine-tune this near the limits of the input range: if the input number is smaller than 127, then output == input.
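A minimal sketch of this midpoint strategy in Python (the names are mine, and a real implementation would keep a sorted structure instead of scanning every stored pair):

def make_assigner():
    assigned = {}  # input -> output; never altered once set

    def assign(x):
        if x in assigned:
            return assigned[x]
        # Tightest output interval consistent with already-assigned neighbours.
        lo, hi = 0, 255
        for seen, out in assigned.items():
            if seen < x:
                lo = max(lo, out + 1)
            else:
                hi = min(hi, out - 1)
        if lo > hi:
            raise ValueError("output range exhausted for this input")
        assigned[x] = (lo + hi) // 2   # pick the middle of what is left
        return assigned[x]

    return assign

assign = make_assigner()
print(assign(123456))   # 127
print(assign(234567))   # 191
print(assign(200000))   # 159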

From your comment objecting to the idea of simply taking the top bits, and reading between the lines of "till the output values are exhausted", it sounds like you want an algorithm which:
Takes 256 distinct 32-bit integers as its input
Returns the values 0..255 associated with those integers, in order.
But of course this is simply a sort algorithm. Sort the 256 inputs, and then the output value is the 0-based index in the sort result.
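As a sketch (random.sample here merely stands in for the 256 distinct inputs):

import random

inputs = random.sample(range(2**32), 256)            # 256 distinct 32-bit values
rank = {v: i for i, v in enumerate(sorted(inputs))}  # value -> 0-based sort index
outputs = [rank[v] for v in inputs]                  # 0..255, in relative order
assert sorted(outputs) == list(range(256))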
If you're looking for some way to return the output value without having first examined all the inputs, it's pretty clear that that's not possible, given your "output value once assigned should not be altered" requirement.

This is just a sorting problem in disguise.
You can't output anything until you've seen all the values - unless you're under-specifying the problem, and there's a limit on the number of inputs.
If you have the intuitive limit of 256 inputs, then whatever output you assign to the first input, one can construct fewer than 255 further inputs that you won't be able to map.

Case study of streams of digits

I'm doing a case study of a random number portal. The portal displays a sequence of numbers (1 to 49) that changes every 4:25 (about 4 1/2 minutes) to a new sequence of numbers.
Examples:
Previous stream:
36, 1, 37, 6, 17, 48
Current stream:
45, 4, 49, 30, 41, 16
What will the next stream be?
Can we reverse engineer the current output of streams of numbers to get the next stream ?
No. First of all, you specified a random portal -- which, by definition of "random" cannot be predicted from any preceding sequence of output.
If you mean a pseudo-random sequence, then reverse-engineering is theoretically possible, but you must have enough knowledge of the RNG (random-number generator) to reduce the possible outputs to 1 from the 49^6 possible sequences (you didn't specify that numbers are unique within the stream of 6; if that's another oversight, then it's 49!/(49-6)!; if order is unimportant, then divide again by 6!).
Look at the amount of information you've presented here: just 12 numbers in a particular sequence. Weigh that against the quantity of possible continuations ... the result is far more than 1.
If you can provide the characteristics of the RNG, and those characteristics are sufficiently restrictive, then perhaps it's possible to determine the future sequence. Until then, the answer remains a resounding NO.
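For concreteness, the three counts work out as follows (standard library functions; nothing here is specific to the portal):

from math import perm, comb

print(49**6)         # ordered, repeats allowed: 13,841,287,201
print(perm(49, 6))   # ordered, no repeats:      10,068,347,520
print(comb(49, 6))   # unordered, no repeats:    13,983,816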
UPDATE per OP's comment
If the application is, indeed, a TRNG, then there's your answer: see my first paragraph.
If you're trying to implement a linear congruential RNG (e.g. the equation you posted), then simply check the myriad available hits and pick one that looks good to you. Getting a set of six numbers is simply a matter of calling the generator six times.
Either way, there is still insufficient information to definitively obtain the parameters of even a generic linear congruential RNG. Do you have bounds on the values of a and c? Do you know the range of the X values and how they're converted to the range [1,49]?
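For what it's worth, a generic linear congruential generator is only a few lines. The constants below are the classic glibc ones, used purely as placeholders; the portal's actual parameters remain unknown:

def lcg(seed, a=1103515245, c=12345, m=2**31):
    # X(n+1) = (a*X(n) + c) mod m
    x = seed
    while True:
        x = (a * x + c) % m
        yield x

gen = lcg(seed=42)                             # seed is a placeholder too
print([next(gen) % 49 + 1 for _ in range(6)])  # crude mapping into [1, 49]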

Base91, how is it calculated?

I've been looking online to find out how basE91 is calculated. I have found resources such as this one, which specify the characters used for a specific value, but nowhere have I found how to get that value.
I have tried changing the input values into binary and taking chunks of both 6 and 7 bits, but these do not work and I get incorrect output. I do not want code that will do this for me, as I wish to write that myself; I only want to know the process needed to encode a string into basE91.
First, you need to see the input as a bit stream.
Then, read 13 bits from the stream, and form an integer value from it. If the value of this integer is lower than or equal to 88, then read one additional bit, and put it into the 14th bit (lowest bit being 1st) of the integer. This integer's (let's call it v) maximum value is: 8192+88 = 8280.
Then split v into two indices: i0 = v%91, i1 = v/91. Then use a 91-element character table, and output two characters: table[i0], table[i1].
(now you can see the reason of 88: for the maximal value (8280), both i0 and i1 become 90)
So this process is more complicated than base64, but more space efficient. Furthermore, unlike base64, the size of the output depends a little on the input bytes: an N-length sequence of 0x00 will encode shorter than an N-length sequence of 0xff (where N is a sufficiently large number).
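Here is a sketch of that process in Python, modelled on the reference basE91 encoder: bits are queued little-endian, so each new byte enters above the bits already in the queue, and the table is the standard basE91 alphabet:

ALPHABET = ("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
            "0123456789!#$%&()*+,./:;<=>?@[]^_`{|}~\"")

def base91_encode(data):
    out = []
    queue, nbits = 0, 0
    for byte in data:
        queue |= byte << nbits       # append 8 new bits above the queued ones
        nbits += 8
        if nbits > 13:
            v = queue & 8191         # take 13 bits
            if v > 88:
                queue >>= 13
                nbits -= 13
            else:                    # value <= 88: take a 14th bit instead
                v = queue & 16383
                queue >>= 14
                nbits -= 14
            out.append(ALPHABET[v % 91])
            out.append(ALPHABET[v // 91])
    if nbits:                        # flush whatever bits remain
        out.append(ALPHABET[queue % 91])
        if nbits > 7 or queue > 90:
            out.append(ALPHABET[queue // 91])
    return "".join(out)

print(base91_encode(b"test"))        # fPNKd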

Sorting 100 unique numbers using 40 bytes of memory

I've been asked a good programming problem:
In the input I've got 100 unique numbers in the range 0-255 (1 byte each). I can read only one number at a time, and only once. I've got 40 bytes of memory which I can use. The goal is to sort all the numbers and print them to the output. I know for sure that the uniqueness of the numbers is very important.
Any ideas?
32 bytes give you 256 bits, just enough to maintain a bit map of which of the 256 possible byte values are seen in the input. One additional byte is used to store the input value. Read each value, mark it in the bitmap, then discard. Once you've read all 100 input values, simply write out the values associated with the bits you set in the bit map.
Then ask what you are supposed to do with the other 7 bytes :)
Since your numbers are unique and they are only 1-byte long, they have to be within 0 to 255. Treat your 40 bytes of storage as a long bit vector. As you read each number, set the appropriate bit in this 320-bit bit-vector. When you're done reading the input, turn around and scan through this bit-vector, printing the number corresponding to each set bit.
In response to @JavaNewb, here is some more detail. First, since a byte contains 8 bits, it can assume only one of 256 possible values, namely 0 through 255. Armed with this little factoid, you look at the 40-byte storage array you have. This array turns out to have 40 bytes * 8 bits/byte = 320 bits.
Since the problem states that each of the 100 1-byte numbers is unique, you know that you will see a particular number (which can range from 0 through 255) at most once. Each time you see a number, you set the corresponding bit in the 40-byte array. For instance, if you encounter the number 50, you set bit number 2 in byte number 6. In general, a number N corresponds to bit N%8 in byte N/8. You are guaranteed to never encounter an already-set bit, since that would imply the existence of duplicates among the 100 numbers.
After you've read in all the numbers, you look at the 40-byte array. Each bit that is set in this array corresponds to one of the 100 numbers you read in. By traversing this array from the 0th bit in the 0th byte all the way to the 7th bit in the 31st byte, you will:
Capture all the numbers that were read in
Observe them in sorted order
All you have to do now is print the numbers corresponding to the set bits as you traverse the 40-byte array.
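A compact sketch of this approach in Python (print stands in for whatever output the problem expects):

import random

def bitmap_sort(stream):
    bitmap = bytearray(32)               # 32 bytes * 8 bits = 256 bits
    for n in stream:                     # read each value once, then discard it
        bitmap[n // 8] |= 1 << (n % 8)   # number N -> bit N%8 of byte N/8
    for n in range(256):                 # scanning bits in order gives sorted output
        if bitmap[n // 8] & (1 << (n % 8)):
            print(n)

bitmap_sort(random.sample(range(256), 100))  # 100 unique 1-byte values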

Data Compression : Arithmetic coding unclear

Can anyone please explain arithmetic encoding for data compression with implementation details? I have surfed the internet and found Mark Nelson's post, but the implementation technique is indeed unclear to me after trying for many hours.
Mark Nelson's explanation of arithmetic coding can be found at
http://marknelson.us/1991/02/01/arithmetic-coding-statistical-modeling-data-compression/
The main idea of arithmetic compression is its capability to code a probability using exactly the amount of data length required.
This amount of data is known, proven by Shannon, and can be calculated simply by using the following formula : -log2(p)
For example, if p=50%, then you need 1 bit.
And if p=25%, you need 2 bits.
That's simple enough for probabilities which are powers of 2 (and in this special case, Huffman coding could be enough). But what if the probability is 63%? Then you need -log2(0.63) = 0.67 bits. Sounds tricky...
This property is especially important if your probability is high. If you can predict something with a 95% accuracy, then you only need 0.074 bits to represent a good guess. Which means you are going to compress a lot.
Now, how to do that ?
Well, it's simpler than it sounds. You will divide your range depending on probabilities. For example, if you have a range of 100, 2 possible events, and a probability of 95% for the 1st one, then the first 95 values will say "Event 1", and the last 5 remaining values will say "Event 2".
OK, but on computers, we are accustomed to use powers of 2. For example, with 16 bits, you have a range of 65536 possible values. Just do the same : take the 1st 95% of the range (which is 62259) to say "Event 1", and the rest to say "Event 2". You obviously have a problem of "rounding" (precision), but as long as you have enough values to distribute, it does not matter too much. Furthermore, you are not constrained to 2 events, you could have a myriad of events. All that matters is that values are allocated depending on the probabilities of each event.
OK, but now I have 62259 possible values to say "Event 1", and 3277 to say "Event 2". Which one should I choose?
Well, any of them will do. Whether it is 1, 30, 5500 or 62256, it still means "Event 1".
In fact, deciding which value to select will not depend on the current guess, but on the next ones.
Suppose I get "Event 1". So now I have to choose a value between 0 and 62258. On the next guess, I have the same distribution (95% Event 1, 5% Event 2). I simply allocate the distribution map with these probabilities, except that this time it is distributed over the 62259 remaining values. And we continue like this, reducing the range of values with each guess.
So in fact we are defining "ranges", which narrow with each guess. At some point, however, there is a problem of accuracy, because very few values remain.
The idea, is to simply "inflate" the range again. For example, each time the range goes below 32768 (2^15), you output the highest bit, and multiply the rest by 2 (effectively shifting the values by one bit left). By continuously doing like this, you are outputting bits one by one, as they are being settled by the series of guesses.
Now the relation to compression becomes obvious: when the range narrows swiftly (e.g. on a 5% event), you output a lot of bits to get the range back above the limit. On the other hand, when the probability is very high, the range narrows very slowly; you can even have a lot of guesses before outputting your first bits. That's how it is possible to compress an event to "a fraction of a bit".
I've intentionally used the terms "probability", "guess" and "event" to keep this article generic. But for data compression, you just have to replace them with the way you want to model your data. For example, the next event can be the next byte; in this case, you have 256 of them.
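To make the mechanism concrete, here is a toy Python encoder for the two-event model above: a 16-bit range renormalised around 2^15, Event 1 at 95%. The names are mine, the "straddling the middle" case is handled with the usual pending-bits trick, and a real coder would also need a termination step and a matching decoder:

TOP = 1 << 16
HALF = TOP // 2
QUARTER = TOP // 4

def encode(events, p1=0.95):
    low, high = 0, TOP - 1    # current range, as integers
    pending = 0
    out = []

    def emit(bit):
        nonlocal pending
        out.append(bit)
        out.extend([1 - bit] * pending)   # flush bits delayed by straddling
        pending = 0

    for e in events:
        span = high - low + 1
        split = low + int(span * p1) - 1
        if e == 1:
            high = split                  # first 95% of the range: "Event 1"
        else:
            low = split + 1               # last 5%: "Event 2"
        while True:
            if high < HALF:               # settled in the lower half: bit is 0
                emit(0)
            elif low >= HALF:             # settled in the upper half: bit is 1
                low -= HALF; high -= HALF
                emit(1)
            elif low >= QUARTER and high < 3 * QUARTER:
                low -= QUARTER; high -= QUARTER
                pending += 1              # straddles the middle: delay the bit
            else:
                break
            low, high = 2 * low, 2 * high + 1   # inflate the range again
    return out

bits = encode([1] * 40 + [2])
print(len(bits), bits)   # 40 guesses of "Event 1" cost only a few bits in total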
Maybe this script could be useful for building a better mental model of an arithmetic coder: gen_map.py. Originally it was created to facilitate debugging of an arithmetic coder library and to simplify generation of unit tests for it. However, it creates nice ASCII visualizations that could also be useful in understanding arithmetic coding.
A small example. Imagine we have an alphabet of 3 symbols: 0, 1 and 2, with probabilities 1/10, 2/10 and 7/10 respectively, and we want to encode the sequence [1, 2]. The script gives the following output (ignore the -b N option for now):
$ ./gen_map.py -b 6 -m "1,2,7" -e "1,2"
000000111111|1111|111222222222222222222222222222222222222222222222
------011222|2222|222000011111111122222222222222222222222222222222
---------011|2222|222-------------00011111122222222222222222222222
------------|----|-------------------------00111122222222222222222
------------|----|-------------------------------01111222222222222
------------|----|------------------------------------011222222222
==================================================================
000000000000|0000|000000000000000011111111111111111111111111111111
000000000000|0000|111111111111111100000000000000001111111111111111
000000001111|1111|000000001111111100000000111111110000000011111111
000011110000|1111|000011110000111100001111000011110000111100001111
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
001100110011|0011|001100110011001100110011001100110011001100110011
010101010101|0101|010101010101010101010101010101010101010101010101
The first 6 lines (before the ==== line) represent the range from 0.0 to 1.0, recursively subdivided into intervals proportional to the symbol probabilities. Annotated first line:
[1/10][ 2/10 ][ 7/10 ]
000000111111|1111|111222222222222222222222222222222222222222222222
Then we subdivide each interval again:
[ 0.1][ 0.2 ][ 0.7 ]
000000111111|1111|111222222222222222222222222222222222222222222222
[ 0.7 ][.1][ 0.2 ][ 0.7 ]
------011222|2222|222000011111111122222222222222222222222222222222
[.1][ .2][ 0.7 ]
---------011|2222|222-------------00011111122222222222222222222222
Note that some intervals are not subdivided. That happens when there is not enough space to represent every subinterval within the given precision (which is specified by the -b option).
Each line corresponds to a symbol from the input (in our case, the sequence [1, 2]). By following the subintervals for each input symbol, we get the final interval that we want to encode with a minimal number of bits. In our case it's the first '2' subinterval on the second line:
[ This one ]
------011222|2222|222000011111111122222222222222222222222222222222
The following 7 lines (after ====) represent the same interval 0.0 to 1.0, but subdivided according to binary notation. Each line is one bit of output, and by choosing between 0 and 1 you choose the left or right half-subinterval. For example, the bits 01 correspond to the subinterval [0.25, 0.5) on the second line:
[ This one ]
000000000000|0000|111111111111111100000000000000001111111111111111
The idea of the arithmetic coder is to output bits (0 or 1) until the corresponding interval lies entirely inside (or is equal to) the interval determined by the input sequence. In our case it's 0011. The ~~~~ line shows where we have enough bits to unambiguously identify the interval we want.
Vertical lines formed by the | symbol show the range of bit sequences (rows) that could be used to encode the input sequence.
First of all, thanks for introducing me to the concept of arithmetic compression!
I can see that this method has the following steps:
Creating the mapping: calculate the fraction of occurrences for each letter, which gives a range size for each symbol. Then order them and assign actual ranges from 0 to 1
Given a message, calculate the range (pretty straightforward IMHO)
Find the optimal code
The third part is a bit tricky. Use the following algorithm.
Let b be the optimal representation. Initialize it to the empty string (''). Let x be the minimum value and y the maximum value.
double x and y: x=2*x, y=2*y
If both of them are greater than 1, append 1 to b and subtract 1 from both. Go to step 1.
If both of them are less than 1, append 0 to b. Go to step 1.
If x<1 but y>1, then append 1 to b and stop
b essentially contains the fractional part of the number you are transmitting. E.g. if b=011, then the fraction corresponds to 0.011 in binary.
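As a sketch, these steps translate directly into Python (assuming the range [x, y) of the message has already been computed and lies within [0, 1)):

def optimal_code(x, y):
    b = ''
    while True:
        x, y = 2 * x, 2 * y
        if x >= 1 and y >= 1:   # both in the upper half
            b += '1'
            x, y = x - 1, y - 1
        elif y <= 1:            # both in the lower half
            b += '0'
        else:                   # x < 1 < y: 0.b1 lies inside the range
            return b + '1'

print(optimal_code(0.3, 0.5))   # '011', i.e. 0.011 binary = 0.375, inside [0.3, 0.5)

Note that floats limit this to short messages; for long ones you renormalise with integers, as described in the earlier answer.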
What part of implementation do you not understand?

DBL_MAX & Max value of a double

This line:
NSLog(@"DBL_MAX: %f", DBL_MAX);
prints this very large value:
17976931348623157081452742373170435679807056752584499659891747680315726078002853876058955863276687817154045895351438246423432132688946418276846754670353751698604991057655128207624549009038932894407586850845513394230458323690322294816580855933212334827479
However, when I test a double value like this:
double test = 9999999999999999.0;
NSLog(@"test: %f", test);
I get this unexpected result:
test: 10000000000000000.000000
This appears to be the maximum number of digits that produce the expected result:
double test = 999999999999999.0;
NSLog(@"test: %f", test);
test: 999999999999999.000000
How can I work with higher positive fractions?
Thanks.
Unfortunately, I can't answer the question directly, as I don't understand what you mean by "How can I work with higher positive fractions?".
However, I can shed some light on what a floating-point number is and what it isn't.
A floating-point number consists of:
A sign (plus or minus)
An exponent
A value (known as the "mantissa").
These are combined using a clever encoding typically into 32, 64, 80, or 128 bits. In addition, some special encodings are used to represent +-infinity, Not a Number (NaN), and +-Zero.
As the mantissa has a limited number of bits, your value can only have this number of significant bits. A really small floating-point number can represent values around 10^-308, and a really large one around 10^308. However, any number can only have about 16 significant decimal digits.
In other words, the print-out of DBL_MAX does not correspond to the amount of information stored in the variable. For example, there is no way to represent the same number but with ...7480 instead of ...7479 at the end.
So, back to the question: in order to tell how to represent your values, you must describe what kind of values you want to represent. Are they really fractions (i.e. one integer divided by another)? In that case you might want to represent them using two integers. If you want to represent really large values, you might want to use a package like http://gmplib.org
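For illustration, the same digit limit can be observed from Python, whose float is the same IEEE 754 double (sys.float_info is the standard library's view of DBL_MAX and friends):

import sys
from fractions import Fraction

print(sys.float_info.max)          # 1.7976931348623157e+308, i.e. DBL_MAX
print(sys.float_info.dig)          # 15: decimal digits guaranteed to survive
print(repr(9999999999999999.0))    # 1e+16 -- 16 nines round to the nearest double
print(repr(999999999999999.0))     # 999999999999999.0 -- 15 nines are exact
print(Fraction(1, 3) + Fraction(1, 6))   # 1/2: exact rational arithmetic instead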
Floating point in C# doesn't produce accurate results all the time. There are numbers that cannot be represented in doubles, floats or decimals. You can improve your accuracy by using decimal instead of double, but that still doesn't ensure that all numbers will be represented exactly.
