I understand that a seed is a number used to initialize a pseudo-random number generator. In PyTorch, the torch.get_rng_state documentation states: "Returns the random number generator state as a torch.ByteTensor." When I print it, I get a 1-D tensor of size 5048 whose values are shown below:
tensor([ 80, 78, 248, ..., 0, 0, 0], dtype=torch.uint8)
Why does a seed have 5048 values, and how is this different from the usual seed, which we can get using torch.initial_seed?
It sounds like you're thinking of the seed and the state as equivalent. For older pseudo-random number generators (PRNGs) that was true, but more modern PRNGs tend to work as described here. (The answer in the link was written with respect to Mersenne Twister, but the concepts apply equally to other generators.)
Why is it a good idea to not have a 32- or 64-bit state space and report the state as the generator's output? Because if you do that, as soon as you see any value repeat, the entire sequence will repeat. PRNGs were designed to be "full cycle," i.e., to iterate through the maximum number of values possible before repeating. This paper showed that the birthday problem could quickly (O(sqrt(cycle-length))) identify such PRNGs as non-random. This meant, for instance, that with 32-bit integers you shouldn't use more than ~50000 values before a statistician could call you out with a better than 99% level of confidence. The solution, used by many modern PRNGs, is to have a larger state space and collapse it down to output a 32- or 64-bit result. Since multiple states can produce the same output, duplicates will occur in the output stream without the entire stream being replicated. It looks like that's what PyTorch is doing.
Given the larger state space, why allow seeding with a single integer? Convenience. For instance, Mersenne Twister has a 19,937 bit state space, but most people don't want to enter that much info to kick-start it. You can if you want to, but most people use the front-end which populates the full state space from a single integer input.
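The seed-versus-state distinction can be seen without PyTorch. The sketch below uses Python's standard-library `random` module (also a Mersenne Twister) as a stand-in; it is an illustration of the concept, not of PyTorch's internals, and the seed value 42 is arbitrary.

```python
import random

# Seed the generator with one small integer, as most users do.
random.seed(42)
version, internal_state, gauss_next = random.getstate()

# The internal state is far larger than the seed: CPython stores
# 624 32-bit words plus one position index for its Mersenne Twister.
print(len(internal_state))

# Re-seeding with the same integer rebuilds exactly the same full state.
random.seed(42)
same_state_again = random.getstate()[1] == internal_state
print(same_state_again)
```

The single integer is only a convenient front end; the generator expands it into the full state before producing any output.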
I need to generate a random number between 1 and 52, for a card game (I know how to).
I could either use random (52) to directly reference each card in the pack, or I could do random(4) and random(13) to get the Suit and Value separately.
I can get the suit and value from the number between 1 and 52 with r div 13 and r mod 13 + 1.
But I am wondering if generating two random numbers will affect the "randomness" of the outcome. The numbers generated will be pseudo-random numbers, so could that affect it in some way?
And if the low numbers 4 and 13 vs 52 don't make a difference, is there a value where this could become an issue?
If you're using a low-quality PRNG (like your average rand() implementation): Sure, it'll affect stuff, but not in a way which is easily predictable without knowing your exact PRNG implementation and your exact code. Either one might be "better" than the other, for some value of "better".
If you're using a good-quality PRNG: Nah, doesn't matter. Go wild.
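For reference, here is a minimal sketch of the two approaches side by side. It assumes a zero-based draw (0 to 51) so that integer division and remainder map cleanly onto 4 suits and 13 values; the function names and the suit ordering are made up for this illustration.

```python
import random

SUITS = ["clubs", "diamonds", "hearts", "spades"]

def card_from_single_draw(rng):
    """One draw in 0..51, split into (suit, value)."""
    r = rng.randrange(52)
    suit, value = divmod(r, 13)   # suit in 0..3, value in 0..12
    return SUITS[suit], value + 1

def card_from_two_draws(rng):
    """Two independent draws, one for the suit and one for the value."""
    return SUITS[rng.randrange(4)], rng.randrange(13) + 1

# Every r in 0..51 maps to a distinct card, so the single-draw mapping is a bijection.
all_cards = {(SUITS[r // 13], r % 13 + 1) for r in range(52)}
print(len(all_cards))
```

With a good generator, both functions pick each of the 52 cards with the same probability.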
I have to deal with sequences of a lot of small numbers, about a million, and I have to put as many as possible (more is better) in 4KB. Obviously that's just too little space to put all of them. Also, while this is a specific scenario, I'd love an answer as general as possible.
The numbers don't follow any pattern, but here is what a small script has to say about their distribution:
407037 times 1
165000 times 2
85389 times 3
52257 times 4
34749 times 5
23567 times 6
15892 times 7
11183 times 8
7636 times 9
5402 times 10
3851 times 11
2664 times 12
2023 times 13
1547 times 14
1113 times 15
... many more lines ...
1 times 62
62 is the biggest number I have, so let's set the maximum number we care about at 64. If the method is easily adaptable to accommodate for bigger max numbers, that would be better.
Here is a sample of the numbers:
20
1
1
1
13
1
5
1
15
1
3
4
3
2
2
A naive way to do this would just be to use 6 bits per number, but I think we can do better.
EDIT: adding a bit of info following discussion in comments.
I also have 2KB of ram and a dozen cycles on a microprocessor to decode each number. I need to store, sequentially, from the first number, as many numbers as I can.
EDIT: see graybeard's comment and my followup too.
The textbook way to do this would be range coding, Huffman coding or Shannon-Fano coding, which are covered in many digital-communication resources online, so I won't explain them here.
I can suggest a custom method that is really simple; you can compare it with the other methods to see whether it lets you store more numbers.
I see that there are no 0s in your data, so decrease each number by 1 (and add 1 back to the result when decoding). Then use either 4 or 7 bits per number. Original numbers up to 8 become 0 to 7, which fit in 3 bits. If the number is n <= 8, set the first bit to 0 and use the next 3 bits for n - 1. Otherwise (n > 8), set the first bit to 1 and use the next 6 bits for n - 1.
By contrast, in Huffman or Shannon-Fano coding a few of the codes can be over 20 bits long.
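A minimal sketch of the 4-or-7-bit scheme described above, written with bit strings for readability rather than packed bytes. The function names are invented for this example, and it assumes every number is between 1 and 64.

```python
def encode(numbers):
    """Encode numbers in 1..64 as 4-bit or 7-bit codes (returned as a bit string)."""
    bits = []
    for n in numbers:
        v = n - 1                                 # shift 1..64 down to 0..63
        if v < 8:
            bits.append("0" + format(v, "03b"))   # flag 0 + 3 bits
        else:
            bits.append("1" + format(v, "06b"))   # flag 1 + 6 bits
    return "".join(bits)

def decode(bits):
    """Inverse of encode: read a flag bit, then 3 or 6 value bits."""
    out, i = [], 0
    while i < len(bits):
        width = 6 if bits[i] == "1" else 3
        out.append(int(bits[i + 1:i + 1 + width], 2) + 1)
        i += 1 + width
    return out

sample = [20, 1, 1, 1, 13, 1, 5, 1, 15, 1, 3, 4, 3, 2, 2]
encoded = encode(sample)
print(len(encoded), "bits instead of", 6 * len(sample))
```

On the question's 15-number sample this uses 69 bits, compared with 90 bits at a fixed 6 bits per number.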
To give a correct answer, I need to know: is the decoder size also limited, or is there no limit on the decoder size?
If there is no limit on the decoder (only on the data), I suggest a range coder or Huffman coding. A range coder compresses better, but makes heavy use of arithmetic operations.
However, both decoders use memory for their code and for statistical tables. So perhaps a better answer is something simpler (a custom compressor) with small, compact code and no tables. As a simple, code-compact option I can propose the run-1 algorithm. It is not very efficient for your data (a range coder or Huffman is better), but it has a trivial, compact decoder without any tables.
The idea: each symbol is a run of zero or more 1 bits, with a 0 bit as the symbol separator. For example, suppose we want to encode this sequence with run-1:
1, 1, 2, 1, 5
The resulting bit sequence is:
0-0-10-0-11110
To decode, you just count the consecutive 1 bits, add 1, and return that value as the decoded number.
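The run-1 scheme above fits in a few lines. This is a sketch that uses bit strings instead of packed bytes; the function names are made up for illustration.

```python
def run1_encode(numbers):
    """Each number n >= 1 becomes (n - 1) one-bits followed by a zero-bit."""
    return "".join("1" * (n - 1) + "0" for n in numbers)

def run1_decode(bits):
    """Count consecutive one-bits; each zero-bit ends a symbol."""
    out, run = [], 0
    for b in bits:
        if b == "1":
            run += 1
        else:
            out.append(run + 1)
            run = 0
    return out

print(run1_encode([1, 1, 2, 1, 5]))  # 0010011110
```

The decoder needs only a counter, which is why it suits a target with very little RAM and few cycles per symbol.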
Something slightly better than straight Huffman can be attempted by combining it with run-length coding.
If you count the successive identical elements, you can rewrite your sequence as pairs of (value, count). Every such pair appears with some probability and you can use Huffman coding on these. (I don't mean to code the values and the counts separately, but the pairs as a whole.)
Your sample yields
(20, 1), (1, 3), (13, 1), (1, 1), (5, 1), (1, 1), (15, 1), (1, 1), (3, 1), (4, 1), (3, 1), (2, 2)
The singletons will be (practically) coded as before, and there are more opportunities for compression of longer runs.
You can limit the maximum count(s) that are supported; if the actual count exceeds the limit, it is no big deal to insert several pairs.
The very first step is to compute a histogram of the count values to see if there are enough repetitions for this approach to be worthwhile.
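As a rough sketch of that first step, the pairs and the histogram of counts can be computed with the standard library. The helper name `to_pairs` is invented here, and the input is the 15-number sample from the question.

```python
from collections import Counter
from itertools import groupby

def to_pairs(seq):
    """Collapse runs of identical values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(seq)]

sample = [20, 1, 1, 1, 13, 1, 5, 1, 15, 1, 3, 4, 3, 2, 2]
pairs = to_pairs(sample)

# Histogram of run lengths: how often each count occurs.
count_histogram = Counter(count for _, count in pairs)
print(pairs)
print(count_histogram)
```

If almost every count is 1, as in this short sample, the pair alphabet adds little over plain Huffman coding; the full data set would have to be checked the same way.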
Alternatively, you can try Huffman coding on the deltas (signed differences between successive values). If there are many repetitions, the frequency of 0 will be much higher, which lowers the entropy. Obviously, run-length coding of the deltas is also possible.
I took the distribution you listed, and tried an exponential fit. The result was decently good:
More importantly, the fit was reasonably close to p(x) ~= 2^-x. This suggests a very simple coding, known as "unary coding": to encode the number k, output k-1 zeroes, followed by a 1. If your numbers exactly fit the p(x) ~= 2^-x distribution, that would give you an expected code length of 2 bits. Since your numbers appear to be heavier-tailed than that (otherwise it would be vanishingly unlikely to see a 62 in only a million numbers), you won't quite achieve that. Still, given the simplicity of the coding and the ease of decoding (twelve cycles should be sufficient), you should consider trying it out.
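The 2-bit figure follows from the standard series for the mean of a geometric distribution. Writing L for the code length (L = k for the number k) and assuming p(k) = 2^{-k} exactly:

```latex
\mathbb{E}[L]
  = \sum_{k=1}^{\infty} k \, 2^{-k}
  = \left. \frac{x}{(1-x)^{2}} \right|_{x = 1/2}
  = \frac{1/2}{1/4}
  = 2 \text{ bits},
\qquad \text{using } \sum_{k=1}^{\infty} k x^{k} = \frac{x}{(1-x)^{2}} \text{ for } |x| < 1 .
```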
You might also look into other universal codes, such as Elias Delta. Golomb coding would be optimal, but decoding it is an involved process.
Can anyone please explain arithmetic encoding for data compression, with implementation details? I have searched the internet and found Mark Nelson's post, but the implementation technique is still unclear to me after trying for many hours.
Mark Nelson's explanation of arithmetic coding can be found at
http://marknelson.us/1991/02/01/arithmetic-coding-statistical-modeling-data-compression/
The main idea of arithmetic compression is its ability to code a probability using the exact amount of data required.
This amount is known, as proven by Shannon, and can be calculated with the following formula: -log2(p)
For example, if p=50%, then you need 1 bit.
And if p=25%, you need 2 bits.
That's simple enough for probabilities which are powers of 2 (and in this special case, Huffman coding could be enough). But what if the probability is 63%? Then you need -log2(0.63) = 0.67 bits. Sounds tricky...
This property is especially important if your probability is high. If you can predict something with a 95% accuracy, then you only need 0.074 bits to represent a good guess. Which means you are going to compress a lot.
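The figures quoted above can be reproduced directly from the formula. A tiny sketch (the function name is made up):

```python
import math

def ideal_bits(p):
    """Shannon's ideal code length, in bits, for an event of probability p."""
    return -math.log2(p)

for p in (0.5, 0.25, 0.63, 0.95):
    print(f"p = {p:.2f} -> {ideal_bits(p):.3f} bits")
```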
Now, how to do that ?
Well, it's simpler than it sounds. You will divide your range depending on probabilities. For example, if you have a range of 100, 2 possible events, and a probability of 95% for the 1st one, then the first 95 values will say "Event 1", and the last 5 remaining values will say "Event 2".
OK, but on computers, we are accustomed to using powers of 2. For example, with 16 bits, you have a range of 65536 possible values. Just do the same: take the first 95% of the range (which is 62259 values) to say "Event 1", and the rest to say "Event 2". You obviously have a problem of "rounding" (precision), but as long as you have enough values to distribute, it does not matter too much. Furthermore, you are not constrained to 2 events; you could have a myriad of events. All that matters is that values are allocated depending on the probabilities of each event.
OK, but now I have 62259 possible values to say "Event 1", and 3277 to say "Event 2". Which one should I choose?
Well, any of them will do. Whether it is 1, 30, 5500 or 62256, it still means "Event 1".
In fact, deciding which value to select will not depend on the current guess, but on the next ones.
Suppose I get "Event 1". So now I have to choose a value between 0 and 62258. On the next guess, I have the same distribution (95% Event 1, 5% Event 2). I simply allocate the distribution map with these probabilities, except that this time it is distributed over 62259 values. And we continue like this, reducing the range of values with each guess.
So in fact, we are defining "ranges", which narrow with each guess. At some point, however, there is a problem of accuracy, because very few values remain.
The idea is to simply "inflate" the range again. For example, each time the range goes below 32768 (2^15), you output the highest bit, and multiply the rest by 2 (effectively shifting the values by one bit left). By continuously doing this, you are outputting bits one by one, as they are being settled by the series of guesses.
Now the relation with compression becomes obvious: when the range is narrowed swiftly (e.g. 5%), you output a lot of bits to get the range back above the limit. On the other hand, when the probability is very high, the range narrows very slowly. You can even have a lot of guesses before outputting your first bits. That's how it is possible to compress an event to "a fraction of a bit".
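To make the "fraction of a bit" claim concrete, here is a simplified sketch that tracks only how the range narrows, using floating point instead of the 16-bit integer range and bit-by-bit output described above. The 95%/5% split is taken from the example; the names are invented for this illustration.

```python
import math

P_EVENT_1 = 0.95  # probability of "Event 1"; "Event 2" gets the remaining 5%

def narrow(events):
    """Return (low, width) of the sub-range selected by a sequence of events."""
    low, width = 0.0, 1.0
    for event in events:
        if event == 1:
            width *= P_EVENT_1             # keep the first 95% of the range
        else:
            low += width * P_EVENT_1       # skip past the "Event 1" part
            width *= 1.0 - P_EVENT_1       # keep the last 5% of the range
    return low, width

def bits_needed(events):
    """Bits required to single out the final range: -log2(width)."""
    return -math.log2(narrow(events)[1])

print(round(bits_needed([1] * 100), 2))  # a hundred likely events cost about 7.4 bits
print(round(bits_needed([2]), 2))        # one unlikely event costs about 4.32 bits
```

A real coder gets the same totals while working in fixed-width integers, which is what the "inflate the range" step is for.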
I've intentionally used the terms "probability", "guess", "events" to keep this article generic. But for data compression, you just have to replace them with the way you want to model your data. For example, the next event can be the next byte; in this case, you have 256 of them.
Maybe this script could be useful for building a better mental model of an arithmetic coder: gen_map.py. It was originally created to help debug an arithmetic coder library and to simplify generating unit tests for it. However, it also produces nice ASCII visualizations that can be useful in understanding arithmetic coding.
A small example. Imagine we have an alphabet of 3 symbols: 0, 1 and 2 with probabilities 1/10, 2/10 and 7/10 correspondingly. And we want to encode sequence [1, 2]. Script will give the following output (ignore -b N option for now):
$ ./gen_map.py -b 6 -m "1,2,7" -e "1,2"
000000111111|1111|111222222222222222222222222222222222222222222222
------011222|2222|222000011111111122222222222222222222222222222222
---------011|2222|222-------------00011111122222222222222222222222
------------|----|-------------------------00111122222222222222222
------------|----|-------------------------------01111222222222222
------------|----|------------------------------------011222222222
==================================================================
000000000000|0000|000000000000000011111111111111111111111111111111
000000000000|0000|111111111111111100000000000000001111111111111111
000000001111|1111|000000001111111100000000111111110000000011111111
000011110000|1111|000011110000111100001111000011110000111100001111
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
001100110011|0011|001100110011001100110011001100110011001100110011
010101010101|0101|010101010101010101010101010101010101010101010101
First 6 lines (before ==== line) represent a range from 0.0 to 1.0 which is recursively subdivided on intervals proportional to symbol probabilities. Annotated first line:
[1/10][ 2/10 ][ 7/10 ]
000000111111|1111|111222222222222222222222222222222222222222222222
Then we subdivide each interval again:
[ 0.1][ 0.2 ][ 0.7 ]
000000111111|1111|111222222222222222222222222222222222222222222222
[ 0.7 ][.1][ 0.2 ][ 0.7 ]
------011222|2222|222000011111111122222222222222222222222222222222
[.1][ .2][ 0.7 ]
---------011|2222|222-------------00011111122222222222222222222222
Note, that some intervals are not subdivided. That happens when there is not enough space to represent every subinterval within given precision (which is specified by -b option).
Each line corresponds to a symbol from the input (in our case - sequence [1, 2]). By following subintervals for each input symbol we'll get a final interval that we want to encode with minimal amount of bits. In our case it's a first 2 subinterval on a second line:
[ This one ]
------011222|2222|222000011111111122222222222222222222222222222222
The following 7 lines (after ====) represent the same interval 0.0 to 1.0, but subdivided according to binary notation. Each line is a bit of output, and by choosing between 0 and 1 you choose the left or right half-subinterval. For example, bits 01 correspond to the subinterval [0.25, 0.5) on the second line:
[ This one ]
000000000000|0000|111111111111111100000000000000001111111111111111
The idea of arithmetic coder is to output bits (0 or 1) until the corresponding interval will be entirely inside (or equal to) the interval determined by the input sequence. In our case it's 0011. The ~~~~ line shows where we have enough bits to unambiguously identify the interval we want.
Vertical lines formed by | symbol show the range of bit sequences (rows) that could be used to encode the input sequence.
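The same example can be checked with exact fractions. This is an independent sketch, not the code of gen_map.py: it ignores the limited precision set by -b and simply looks for the shortest bit string whose binary interval fits inside the interval of the input sequence. Function names are made up.

```python
from fractions import Fraction

def interval(weights, symbols):
    """Interval [low, high) selected by `symbols`, given integer symbol weights."""
    total = sum(weights)
    low, width = Fraction(0), Fraction(1)
    for s in symbols:
        low += width * Fraction(sum(weights[:s]), total)  # skip smaller symbols
        width *= Fraction(weights[s], total)              # shrink to this symbol
    return low, low + width

def shortest_code(low, high):
    """Shortest bit string whose dyadic interval lies inside [low, high)."""
    n = 1
    while True:
        step = Fraction(1, 2 ** n)
        k = -(-low // step)            # smallest k with k * step >= low
        if (k + 1) * step <= high:
            return format(k, f"0{n}b")
        n += 1

low, high = interval([1, 2, 7], [1, 2])
print(low, high)                # 4/25 3/10
print(shortest_code(low, high)) # 0011
```

With weights 1, 2, 7 and the sequence [1, 2], the interval is [0.16, 0.3), and the first binary interval that fits is [0.1875, 0.25), i.e. 0011.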
First of all thanks for introducing me to the concept of arithmetic compression!
I can see that this method has the following steps:
1. Create the mapping: calculate the fraction of occurrences of each letter, which gives a range size for each letter of the alphabet. Then order them and assign actual ranges from 0 to 1.
2. Given a message, calculate the range (pretty straightforward IMHO).
3. Find the optimal code.
The third part is a bit tricky. Use the following algorithm.
Let b be the optimal representation. Initialize it to the empty string (''). Let x be the minimum value and y the maximum value of the range.
1. Double x and y: x=2*x, y=2*y
2. If both of them are greater than or equal to 1, append 1 to b and subtract 1 from both x and y. Go to step 1.
3. If both of them are less than 1, append 0 to b. Go to step 1.
4. If x<1 but y>1, append 1 to b and stop.
b essentially contains the fractional part of the number you are transmitting. Eg. If b=011, then the fraction corresponds to 0.011 in binary.
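A sketch of the loop above, using exact fractions. One assumption is made explicit here: after appending a 1, 1 is subtracted from both x and y so that they stay within [0, 2) on the next doubling. The case where y lands exactly on 1 is treated like the stopping case. Function names are invented.

```python
from fractions import Fraction

def bits_inside(x, y):
    """Bits b such that the binary fraction 0.b lies between x and y (0 <= x < y <= 1)."""
    b = ""
    while True:
        x, y = 2 * x, 2 * y
        if x >= 1:            # both are >= 1: the next bit must be 1
            b += "1"
            x, y = x - 1, y - 1
        elif y < 1:           # both are < 1: the next bit must be 0
            b += "0"
        else:                 # x < 1 <= y: a final 1 lands inside the range
            return b + "1"

def as_fraction(b):
    """Interpret the bit string b as the binary fraction 0.b."""
    return Fraction(int(b, 2), 2 ** len(b))

code = bits_inside(Fraction(4, 25), Fraction(3, 10))
print(code, as_fraction(code))  # 01 1/4
```

For the range [0.16, 0.3] the loop stops after two bits with b = 01, i.e. 0.25, which does lie inside the range.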
What part of implementation do you not understand?
Are there any pseudo-random number generators that are easy enough to do with mental arithmetic, or mental arithmetic plus counting on your fingers? Obviously this limits it to fairly simple math: it needs to be something someone of average mathematical ability can do, or maybe average ability for a programmer, not a math prodigy.
The simplest I have found is the Middle square method, but not only is it known to be a poor source of randomness, it still looks too complex to do without pencil and paper.
If the only way to do this is by limiting the range, like maybe it only can output 8 bit numbers, that is fine. I suspect one of the standard PRNG algorithms would be simple enough in an 8 bit version, but I don't know enough to simplify any of them from the 32 bit version to an 8 bit version. (All the ones I looked at depend on specially picked seed numbers that are different depending how many bits you are working with, and usually only 32 and 64 bit examples are given.)
A linear feedback shift register is pretty simple, as long as you're comfortable with thinking in binary (or maybe hex, since it's easy to map between the two).
A more complex one is Xorshift, but if you know your bitwise operations, it should be quite possible to work with as well.
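As a sketch of how small a linear feedback shift register can be, here is a 4-bit one. The tap positions (the two lowest bits) are a choice made for this example, not something specified in the answer above; with them, any nonzero seed cycles through all 15 nonzero states.

```python
def lfsr4_step(state):
    """Advance a 4-bit LFSR: XOR the two lowest bits, shift right, feed the result in on the left."""
    new_bit = (state ^ (state >> 1)) & 1
    return (state >> 1) | (new_bit << 3)

def lfsr4_cycle(seed=0b0001):
    """List the states visited until the register returns to the seed."""
    if not 1 <= seed <= 15:
        raise ValueError("seed must be a nonzero 4-bit value")
    states, s = [], seed
    while True:
        s = lfsr4_step(s)
        states.append(s)
        if s == seed:
            return states

cycle = lfsr4_cycle()
print([format(s, "04b") for s in cycle])
print(len(cycle))  # 15
```

Each step is one XOR of two bits and a shift, which is the kind of operation that can be tracked on four fingers.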
In your head you can do "semantic" random number generation :-)
Like taking a random word and calculating some metric out of it; repeat until you get a number of reasonable length.
For example, word "exercise" might get converted to 10100101b (you can see my conversion idea here).
How about Blum Blum Shub, but with prime numbers too small for secure use? Used securely it's slow, but it involves operations that we're used to dealing with, so you might be able to get to a manageable speed without too much practice, maybe with M = 437 or moderately bigger.
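A toy sketch of that idea with M = 437 = 19 * 23 (both primes are congruent to 3 mod 4). The seed 3 and the choice to output the lowest bit of each state are assumptions for this illustration, and a modulus this small is of course not secure.

```python
def bbs_bits(seed, count, m=437):
    """Toy Blum Blum Shub: square modulo m repeatedly, emit the low bit of each state."""
    x = seed
    bits = []
    for _ in range(count):
        x = (x * x) % m
        bits.append(x & 1)
    return bits

print(bbs_bits(3, 6))  # states 9, 81, 6, 36, 422, 225 -> bits [1, 1, 0, 0, 0, 1]
```

The only operation is squaring a number below 437 and reducing it, which is slow by hand but needs nothing beyond ordinary arithmetic.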
I doubt whether anything I could do in my head will be secure, anyway. I just can't remember big enough numbers to work without mistakes on a reasonably-sized state.
You can easily do a 10 bit LFSR on your fingers, if you have decent tendons ;-)
Not a direct answer, but depending why you're asking you might be interested in Solitaire, which generates a keystream (i.e. a pseudo-random sequence) using a deck of cards. Can't be done in your head, but doesn't require pencil and paper.
A comment points out that this is wrong. Months later I still haven't found the time to revisit how I came up with the magic numbers and where I went wrong, so I'm adding this note at the top in the interim.
This is pretty basic and should fit in most people's heads:
1. Start with a three-digit seed number (finding a suitable seed may be a harder problem).
2. Multiply it by nine.
3. Separate the fourth digit from the bottom three and add the two numbers together for a new three-digit number.
4. Write down these digits. To help disguise the pattern you might write down just one or two of the digits.
5. Repeat 2-4 as required.
So long as you don't start with zero, this will iterate through a period of 4500 results. The output doesn't "look" random, but it's in decimal and even true random results fail to look random, which is why humans suck at this task.
I might try to hack up a program to convert it to binary in an unbiased way to test it.
Alternative configurations:
three digits and multiply by 3
four digits and multiply by 6
five digits and multiply by 2
The easiest way would be to generate several numbers that come to your head and then sum and mod 10 each of the digits. The more numbers you add, the more random and less biased it will be.
510932
689275
539108
======
628205
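The column-wise sum shown above can be written as a short sketch (the function name is made up):

```python
def combine(numbers):
    """Add equal-length digit strings column by column, keeping each sum modulo 10."""
    return "".join(
        str(sum(int(d) for d in column) % 10)
        for column in zip(*numbers)
    )

print(combine(["510932", "689275", "539108"]))  # 628205
```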
Yes, I know of one that can possibly be done in your head and, if modified further, can result in truly random numbers. Take an ordered list of digits in base ten (the easiest base to calculate in). Add them all up, keep only the ones digit of the sum, append that digit to the end of the list, and drop the first digit. Then repeat. This will not produce true random numbers, but they are random enough. Depending on the size of the list you choose, the sequence will eventually repeat, but for a large initial list it will not repeat for a long time.
For example, suppose I use a list of just 5 digits, 12345. The sum 1+2+3+4+5 is 15, so the new digit is 5 and the next list is 23455. The 1 has now dropped off and is no longer used, so the next sum is 19 (15 + 5, minus the 1 that dropped off) and the next list is 34559. Then come 45596, 55969 and 59694. Here we stop, because we have generated a full seed's worth of digits: we started with 12345 and the next seed is 59694.
There is also a shortcut that you can use once a full seed has been calculated (or the shortcut could be used on its own): take the last digit, multiply it by 2 and subtract the digit that most recently dropped off the front, keeping only the ones digit. Doubling one digit is easily done in the head; the important thing is to remember all the other digits and their order in the sequence. At best this will only produce pseudo-random numbers, with longer repeat times the larger the list you use. The initial list must be chosen with care: for instance, don't pick all zeroes as your list or you will get an endless stream of zeroes, and some sets of digits will produce longer repeat cycles than others. (Maybe this should be done on paper, provided you have a pencil or pen and a sheet of paper handy... :) Hope this helps. (Modified a bit, this makes the start of a very good true random number generator.) Enjoy...
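The 12345 example can be traced with a short sketch. The function names are invented, and the shortcut is implemented as "double the last digit, subtract the digit that was dropped when the current list was formed, keep the ones digit".

```python
def step(digits):
    """Append the ones digit of the sum and drop the first digit."""
    return digits[1:] + [sum(digits) % 10]

def next_seed(seed):
    """Run one step per digit, producing a full seed's worth of new digits."""
    digits = [int(c) for c in seed]
    for _ in range(len(digits)):
        digits = step(digits)
    return "".join(map(str, digits))

def shortcut_digit(current, dropped):
    """Next digit via the shortcut: 2 * last digit - previously dropped digit (mod 10)."""
    return (2 * current[-1] - dropped) % 10

print(next_seed("12345"))  # 59694
```

For the list 23455 (formed by dropping the 1), the shortcut gives 2 * 5 - 1 = 9, matching the full sum 2+3+4+5+5 = 19.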
I hope this is better if not then tell me so :) (I was never very good in English ! :)
If non-deterministic algorithms are allowed, your eyes are in your head, so what about something like "the number of red objects in front of me plus the number of blue things modulo the number of green things plus the height of the tallest stack of things containing at least one thing with the letters g and uppercase A on it"?
I'm sure there is a way to do this that would actually be fairly random.
Here is a very simple one that is based on a linear method:
1. Pick three numbers $a$, $b$, $n$ with $2 \le a < n$ and $1 \le b < n$, where $n$ is prime. In this example, I'll use $a=83$, $b=52$, $n=101$.
2. Let $f(x) = (ax+b) \bmod n$.
3. Find the unique stationary point, which is the value $k$ such that $f(k)=k$. For the values of $a$, $b$, $n$ as above, the stationary point is $k=24$.
4. Recursively apply $f$ starting with a seed that is not the stationary point. You get a stream that outputs values from $0$ to $n-1$ except $k$. When the value $n-1$ is generated, write it down as $k$ instead.
For this example, starting with 0, we get a stream of 0, 52, 25, 6, 45, 50, 61, 65, 94, 77, 80, 26, 89, 66, 76, 98, 5, 63, 29, 35, 28, 53, 7, 27, 71, 87, 1, 34, 46, 32, 82, 91, 30, 17, 49, 79, 44, 68, 40, 39, 57, 36, 10, 74, 33, 64, 11, 56, 54, 90, 48, 97, 23, 42, 3, 99, 88, 84, 55, 72, 69, 22, 60, 83, 73, 51, 43, 86, 19, 13, 20, 96, 41, 21, 78, 62, 47, 14, 2, 16, 67, 58, 18, 31, 24, 70, 4, 81, 8, 9, 92, 12, 38, 75, 15, 85, 37, 93, 95, 59, which has a period of 100. A period of $n-1$ is guaranteed if $a$ is a primitive root of $n$, so there are many pairs of $(a, b)$ that gives a period of $n-1$.
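A sketch of the method with the example parameters. The function names are made up; the stationary point is computed by solving (1 - a)k = b modulo n, which relies on `pow(x, -1, n)` for the modular inverse (available from Python 3.8).

```python
from itertools import islice

def stationary_point(a, b, n):
    """Solve a*k + b = k (mod n), i.e. k = b / (1 - a) mod n."""
    return b * pow((1 - a) % n, -1, n) % n

def stream(a=83, b=52, n=101, seed=0):
    """Yield seed, f(seed), f(f(seed)), ... with the value n-1 written as k."""
    k = stationary_point(a, b, n)
    if seed == k:
        raise ValueError("seed must not be the stationary point")
    x = seed
    while True:
        yield k if x == n - 1 else x
        x = (a * x + b) % n

first_period = list(islice(stream(), 100))
print(stationary_point(83, 52, 101))  # 24
print(first_period[:4])               # [0, 52, 25, 6]
```

Over one full period of 100 outputs, every value from 0 to 99 appears exactly once.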
I recommend a set of 23 functions
X = 0
Definition_One(X);
....
Definition_TwentyThree(X);
What each one does can be as simple as (X^2), but given one input value, all 23 must produce unique results.
From here you build a sequencer that will call all 23 in a given order based on any seed. So if I gave you "Jimmy" as a seed, for example, you could accept that and convert it to some form of decimal, then multiply it by some known non-repeating decimal that goes out 23 decimal places (this value can be made up on the spot).
Then it will call the function closest to the last 2 decimal values, and every time that function has already been called it will attempt to call the 2nd closest above, followed by the 2nd closest below. After 23 passes, all remaining functions will be sequenced in a predetermined order (highest to lowest will work fine), stopping at the point that at least half of the functions have been called and X is very much pseudo-random. After all remaining functions are called, the class will return the final X value.
This takes a computer like .000000001 seconds to do, a human about 15 minutes on paper.
Your 23 functions could be as simple as X+1 to X+23, each returning X. You will never be able to accurately predict the result without first doing the math of each function, then running the decimal modifier, then redoing the math over and over to find out which functions will get called and in what order, and only the author would know this. Given that at least 12 and at most 23 of the functions will get called, you shouldn't ever have to worry about anyone reverse engineering your code :)
Sure, they can keep putting in the same seed, but that won't solve anything, and in a game or application setting your seed will in most cases be amended with a piece of extra info generated from storage. I like to use touch sequences on mobile for that extra data: your last 3 initial contact points are always saved and added to whatever random seed you start with. On a computer, if it's an application, I use a pointer to some memory that only gets allocated after the application starts. I don't know what to use for HTML, but I am sure there is a way to get information that isn't random but isn't the same in every instance to amend the seed, to make reverse engineering much more difficult.