I'm trying to predict what number the computer will randomly choose. Will I need a specific algorithm, or do I need artificial intelligence?
You can't predict a true random number; by definition, it is random. Pseudorandom numbers can be predicted if you know both the algorithm being used and the seed it was given.
Here's a good article on the different methods of generating random numbers: https://www.howtogeek.com/183051/htg-explains-how-computers-generate-random-numbers/
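As a concrete illustration of the second point, here is a minimal sketch using java.util.Random (the class and variable names are mine): two generator instances built from the same algorithm and the same seed emit exactly the same sequence, which is what "predictable" means here.

```java
import java.util.Random;

// Sketch: a pseudo-random generator is fully determined by its algorithm
// and seed. Two java.util.Random instances seeded identically produce
// exactly the same "random" sequence.
public class PredictableDemo {
    public static void main(String[] args) {
        Random victim = new Random(42L);
        Random attacker = new Random(42L); // same algorithm, same seed

        for (int i = 0; i < 5; i++) {
            int drawn = victim.nextInt(100);
            int predicted = attacker.nextInt(100);
            System.out.println(drawn + " predicted as " + predicted); // always equal
        }
    }
}
```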
Actually, I am working on cryptography and I have to distinguish a number generated by a true random number generator from one produced by a biased generator.
Is there a theorem that tells how many samples of data we need to distinguish the two, and how large the bias must be?
Thank you in advance.
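For what it's worth, the standard tool for this kind of question is a concentration bound such as Hoeffding's inequality: treating the generator's bits as coin flips with unknown bias, the required sample size grows as 1/ε². A sketch of the calculation (the decision threshold and notation are my own choices, not from any particular source):

```latex
% Sketch: distinguishing fair bits from bits with bias \epsilon.
% Let X_1,\dots,X_n be the observed bits and \bar{X} their mean.
% Under the fair source E[\bar{X}] = 1/2; under the biased source
% E[\bar{X}] = 1/2 + \epsilon. Decide "biased" iff \bar{X} > 1/2 + \epsilon/2.
% Hoeffding's inequality bounds the probability of either kind of error:
\Pr\left[\,\lvert \bar{X} - \mathbb{E}[\bar{X}] \rvert \ge \tfrac{\epsilon}{2}\,\right]
  \le 2\exp\!\left(-\tfrac{n\epsilon^2}{2}\right)
% so pushing the error probability below \delta requires
n \;\ge\; \frac{2}{\epsilon^2}\,\ln\frac{2}{\delta}
% i.e. the sample size scales as 1/\epsilon^2: halving the bias
% quadruples the number of samples needed.
```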
I am currently trying to create two specialized versions of the linear congruential generator (pseudo-random number generator) algorithm in my program, and I set the seed to the result of the algorithm every time I generate a random number. I would think this would make both random number generators more random, but altering one implementation seems to ruin the other whenever I get one of them behaving randomly. Does sharing the seed actually make the program more random in any way, or do both generators behave like one generator when a seed is shared?
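A minimal sketch of the situation being described (class and method names are mine; the constants are the widely published Numerical Recipes and Borland LCG parameters): when two LCGs read and overwrite the same seed variable, they interleave into a single state sequence rather than acting as two independent generators.

```java
// Sketch: two LCG variants sharing one seed variable behave as
// interleaved steps of a single generator, not as two independent ones.
public class SharedSeedLcg {
    // Shared state: both "generators" read and overwrite this value.
    private static long seed = 12345L;

    // One LCG step with the Numerical Recipes parameters, mod 2^32.
    private static long lcgA() {
        seed = (1664525L * seed + 1013904223L) & 0xFFFFFFFFL;
        return seed;
    }

    // A "different" LCG (Borland parameters), but it advances the same
    // shared state, so its output depends on every prior call to lcgA.
    private static long lcgB() {
        seed = (22695477L * seed + 1L) & 0xFFFFFFFFL;
        return seed;
    }

    public static void main(String[] args) {
        // Alternating calls: each generator's output is a function of the
        // other's last output. The pair is effectively one combined LCG,
        // so changing A's implementation changes everything B produces.
        for (int i = 0; i < 5; i++) {
            System.out.println("A: " + lcgA() + "  B: " + lcgB());
        }
    }
}
```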
I would like to apply a permutation test to a sequence with 4,000,000 elements. To my knowledge, this is infeasible because the number of possible permutations is ridiculously large (no RNG will generate uniformly distributed values in the range {1 ... 4000000!}). I've heard of pseudorandom permutations, though, and it sounds like something I need, but I can't tell whether it's actually a proper replacement for a random shuffle in my case.
If you are running a permutation test I presume that you want to generate a random sample from the set of all possible permutations, so that you can test some statistic calculated on the real data against the distribution of statistics calculated on the permuted data.
Algorithms for generating random permutations, such as those described at http://en.wikipedia.org/wiki/Random_permutation, typically use many random numbers, so no single step of the generation process needs numbers as large as 4000000!. The only worry would be that, since the seed used to generate the random numbers is typically much smaller than 4000000!, not all permutations are possible.
There are other statistical tests which consume very large quantities of pseudo-random numbers (e.g. MCMC), so I wouldn't worry about this if you are using a random number generator which is commonly used for statistical tests. If you are worried about this, you could repeat the test with a cryptographically secure random number generator, such as http://docs.oracle.com/javase/6/docs/api/java/security/SecureRandom.html. This will be slower, so you might need to reduce the number of permutations tested, but it is very unlikely that it has any characteristic which would stand out far enough to affect your test results, because any such characteristic would be a security weakness - it would mean that, given a large quantity of random numbers already generated, you would have a slightly better than random chance of guessing the next number correctly.
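To make the "many small random numbers" point concrete, here is a sketch of the standard Fisher-Yates shuffle, the main algorithm behind the Wikipedia page linked above (the class name is mine). It consumes one bounded random index per element, so no single draw comes anywhere near 4000000!, and because the generator is passed in as a parameter, a SecureRandom can be dropped in exactly as suggested:

```java
import java.security.SecureRandom;
import java.util.Random;

// Sketch of the standard Fisher-Yates shuffle: one bounded random index
// per element, never a value anywhere near 4000000!.
public class PermutationSketch {

    static void shuffle(int[] a, Random rng) {
        for (int i = a.length - 1; i > 0; i--) {
            int j = rng.nextInt(i + 1);   // uniform in 0..i
            int tmp = a[i];
            a[i] = a[j];
            a[j] = tmp;
        }
    }

    public static void main(String[] args) {
        int[] perm = new int[4_000_000];
        for (int i = 0; i < perm.length; i++) perm[i] = i;

        shuffle(perm, new Random());        // ordinary PRNG
        shuffle(perm, new SecureRandom());  // drop-in CSPRNG, slower but
                                            // free of exploitable structure
    }
}
```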
I got the following interesting task:
You are given a list of 1 million 16-digit numbers (say, credit card numbers): 990,000 purely random numbers generated by a computer system and 10,000 created manually by fraudsters. The numbers are labeled as genuine or fraud. Build an algorithm to predict the non-random ones.
My approach so far is a bit of a brute force: looking at the non-random numbers to find patterns (such as repeated or sequential digits: 22222 or 01234).
I wonder if there's a ready-made algorithm or tool for this kind of task. I imagine it must be quite common in the fraud-analytics community.
Thanks.
First off, if you know they're credit card numbers, use Luhn's algorithm, which is a quick checksum algorithm for valid credit card numbers.
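A straightforward implementation of that checksum (class and method names are mine): starting from the rightmost digit, double every second digit, subtract 9 from doubled values above 9, and require the total to be divisible by 10.

```java
// Luhn checksum: a number that fails it cannot be a real card number,
// so this is a cheap first filter before any statistical modelling.
public class Luhn {

    static boolean isLuhnValid(String digits) {
        int sum = 0;
        boolean doubleIt = false;
        for (int i = digits.length() - 1; i >= 0; i--) {
            int d = digits.charAt(i) - '0';
            if (doubleIt) {
                d *= 2;
                if (d > 9) d -= 9;   // same as summing the two digits of d
            }
            sum += d;
            doubleIt = !doubleIt;
        }
        return sum % 10 == 0;
    }

    public static void main(String[] args) {
        System.out.println(isLuhnValid("4539578763621486")); // true
        System.out.println(isLuhnValid("4539578763621487")); // false
    }
}
```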
However, if they are simply 16-digit integers, there are a couple of approaches you can use. It is hard to tell whether an individual number came from a random source (the number 1111111111111111 is just as likely as any other output of a random number generator). As for your repeated numbers and patterns, that is very reminiscent of the concept of Kolmogorov complexity (see links below). You could look for patterns with this brute-force method, but I feel it would be quite inaccurate, as humans may actually tend to avoid obviously repeated digits and sequences in these numbers!
Instead, I suggest focusing on the way people generate numbers. You can treat human input like a very poor random number generator. So I recommend making a list of human-entered numbers yourself, if you don't have another dataset. Then you can use machine learning to build a classifier that separates purely random numbers from those with the 'human-like' attributes your algorithm has learned to recognize. As metrics for the statistical classifier, Kolmogorov complexity could be one, frequency of digits another (see Benford's law on Wikipedia), and the number of repeating digits a third (humans might avoid repeating digits to look non-random, so let your classifier do the work!).
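As an illustration of the kind of features such a classifier could consume (the specific feature choices below are mine, not a standard fraud-detection toolkit): digit-frequency skew, longest repeated run, and longest ascending run are all cheap to compute per number, and each becomes one column of the training matrix.

```java
// Sketch of hand-rolled features for one 16-digit string; illustrative
// choices, not a standard library.
public class NumberFeatures {

    // Total deviation of digit frequencies from the uniform 1/10 expectation.
    static double digitFrequencySkew(String s) {
        int[] counts = new int[10];
        for (char c : s.toCharArray()) counts[c - '0']++;
        double skew = 0;
        for (int count : counts) {
            skew += Math.abs(count / (double) s.length() - 0.1);
        }
        return skew;
    }

    // Longest run of the same digit (e.g. "22222" -> 5).
    static int longestRepeatRun(String s) {
        int best = 1, run = 1;
        for (int i = 1; i < s.length(); i++) {
            run = (s.charAt(i) == s.charAt(i - 1)) ? run + 1 : 1;
            best = Math.max(best, run);
        }
        return best;
    }

    // Longest ascending run of consecutive digits (e.g. "01234" -> 5).
    static int longestAscendingRun(String s) {
        int best = 1, run = 1;
        for (int i = 1; i < s.length(); i++) {
            run = (s.charAt(i) - s.charAt(i - 1) == 1) ? run + 1 : 1;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        String n = "4539578763621486";
        System.out.printf("skew=%.3f repeat=%d ascending=%d%n",
                digitFrequencySkew(n), longestRepeatRun(n), longestAscendingRun(n));
    }
}
```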
From my personal experience, tough problems like this are a textbook case for machine learning algorithms and statistical classifiers.
Hope this helps!
Links:
Kolmogorov Complexity
Complexity calculator
I am trying to test 100 different sets of 100 human-generated random numbers for randomness, in comparison to 100 different sets of 100 computer-generated random numbers, but the diehard program wants a single set of around 100,000 numbers.
I was wondering if it's possible to combine the human sets into a block of 100,000 numbers by using the human numbers as seeds for a pseudo-random number generator, and then feeding its output to the diehard program. I would do the same with the computer sets and the same pseudo-random generator. Would this actually change the result of the randomness test, given that all I'm trying to show is that computer-generated numbers are more random than human-generated ones?
You can try just concatenating the numbers. I wouldn't expect any particular way of combining them to be consistently much better than another. Any way of combining the numbers will cause them to lose some properties (possibly including the classification of 'random' by some test), some combinations more than others in certain cases, but with random numbers you can't really predict much.
I'm not sure why you'd want to use the numbers as seeds for another random number generator (if I understand you correctly); that will not yield any useful results. If you use a pseudo-random number generator, you get a sequence of numbers from a fixed pseudo-random set; the seed only determines where in that set you start, and starting from any seed should produce results just as random as starting from any other.
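A quick way to see this (a minimal sketch; the class name, seeds, and the crude mean statistic are mine, and the mean check is only a stand-in for a real battery like diehard): feed two very different seeds to the same generator and the output has the same statistical character either way, because the seed only picks the starting point in one fixed sequence. Seeding with human numbers would therefore test the PRNG, not the humans.

```java
import java.util.Random;

// Sketch: the seed selects a starting point in the generator's fixed
// sequence, so a "human-looking" seed and an arbitrary machine seed
// yield statistically indistinguishable output.
public class SeedDemo {

    static double meanOfOutput(long seed, int n) {
        Random rng = new Random(seed);
        double sum = 0;
        for (int i = 0; i < n; i++) sum += rng.nextDouble();
        return sum / n;
    }

    public static void main(String[] args) {
        // Both means land near 0.5 regardless of the seed's origin.
        System.out.println(meanOfOutput(1111111111111111L, 100_000));
        System.out.println(meanOfOutput(8_674_665_223_082_153_551L, 100_000));
    }
}
```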
Any alleged test for randomness can, at best, say that some set is probably random. No test can measure true randomness exactly; that would arguably contradict the definition of randomness.