I'm trying to set up remember me functionality for a simple Sinatra CRUD application.
I have found answers that explain how to structure this by setting a random token and an anonymised user reference. However, the suggested method says to randomise over a "sufficiently large space", and I'm unclear on what that actually means.
Should I be using a randomly generated alphanumeric string? If so, what length is sufficient?
Is there any standard practice in this area?
I'm looking at this answer.
TL;DR
Use a well-maintained gem for generating login tokens when you can, rather than rolling your own. However, evaluating the relative strength of such tokens requires understanding the size of the numerical range and the entropy inherent in how the tokens are generated.
Understanding Space and Entropy
"A suffiently large space" is using the term in the mathematical or cryptographic sense. If a chosen random number can only vary between whole numbers from 1..10, you have a very small space. If your number can vary over 128 bits or more, then you have a much larger space from which to choose. This avoids the likelihood of collisions. Mathematically speaking, the amount of entropy and the seed value used to generate a pseudo-random value will also have a significant impact on the overall security and collision-resistance of the generated number.
What constitutes a sufficiently large space depends on your problem domain. In many cases, a UUIDv4 as generated by Ruby's SecureRandom#uuid method is random enough to be treated as a universally unique identifier and to avoid collisions. Because it is (pragmatically speaking) "universally unique," salting it or hashing it with other information adds little value. However, it is still important to associate the UUID with a user ID or some other unique attribute of the user, so that the value can be carried in a cookie, form data, or a query parameter and tied back to an existing login, or to whatever other data you're trying to persist.
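By way of illustration (Python here; Ruby's SecureRandom.uuid or SecureRandom.urlsafe_base64 behaves analogously), a minimal sketch of issuing such a token and tying it to a user record might look like the following. The store and field names are hypothetical, not from the question.

```python
import uuid

def issue_remember_token(user_id, store):
    """Generate a 128-bit token and associate it with an existing user record."""
    token = str(uuid.uuid4())      # UUIDv4: 122 random bits
    # alternatively: secrets.token_urlsafe(32) for 256 URL-safe random bits
    store[token] = user_id         # e.g. a remember_tokens table keyed by token
    return token

def user_for_token(token, store):
    """Look the token up on each request; treat a miss as 'not logged in'."""
    return store.get(token)

# Usage
sessions = {}
cookie_value = issue_remember_token(42, sessions)
assert user_for_token(cookie_value, sessions) == 42
```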
Rather than doing this yourself, it is generally better to use a well-designed and well-maintained authentication mechanism like Devise to manage your Rails logins. The same is true for authorization, where other gems like CanCan may be useful. In either case, collision avoidance in authentication tokens is handled for you under the hood.
If you are rolling your own, then understanding statistics, entropy, and the risks of deliberate or accidental collisions is extremely important. While a short answer here simply cannot do justice to the complexity of the underlying question, it should give you enough to get started and help you select the amount of randomness or uniqueness your current problem requires.
This may have to be more of a statistical/mathematical question, especially given the general nature of it. I'll move it if need be, but feedback is greatly appreciated.
I am very curious what the potential impact on a random number generator would be in an SOA design. For the sake of argument, let's assume a well-seeded and well-designed generator, with no issues and perfectly random output.
Within the full set of numbers created by this generator we see perfect randomness. But what happens when the consuming applications are distributed and multiple remote hosts request a random number? Will the numbers still be random from each host's point of view?
It seems like there is at least a potential reduction in the variability of the randomness, since each host will be taking a subset of the randomly generated numbers. As such it could, by chance or coincidence, form a more or less structured pattern. The pattern would only exist on a host-by-host basis, while the overall system would still appear to be generating perfectly random results.
If you really were generating random numbers, then any subset of those numbers would still be random.
Since that isn't possible, what happens if you take the random number generators used for security purposes such as encryption (those have more sophisticated algorithms than the simple generators)? I see two scenarios:
Your clients talk to the server in the same order, so 10 clients means each one gets every 10th generated number. People could try to take those numbers and figure out the configuration of the random number generator, but they could also do that when they see all the numbers. And if any subset of such a generator's output were predictable, the generator wouldn't be useful for encryption anymore, so here you can argue that any subset should be just as random.
Your clients act completely independently and access the server in random orders. Here you actually get real randomness (if it is, for instance, based on user input), and therefore even badly designed pseudo-random generators could give you a better random distribution on each client, since the clients don't know whether they are getting the generator's next number or its 100th.
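As a toy illustration of the first scenario (not code from the question), here is a sketch in which ten "clients" each take every 10th output of one shared CSPRNG; each per-client slice looks just as statistically balanced as the full stream.

```python
import secrets

STREAM_LEN = 10_000
NUM_CLIENTS = 10

# One central generator produces a single interleaved stream of 32-bit values.
stream = [secrets.randbits(32) for _ in range(STREAM_LEN * NUM_CLIENTS)]

for client in range(NUM_CLIENTS):
    subset = stream[client::NUM_CLIENTS]           # this client's every-10th slice
    ones = sum(bin(x).count("1") for x in subset)  # total set bits in the slice
    total_bits = 32 * len(subset)
    print(f"client {client}: fraction of 1-bits = {ones / total_bits:.4f}")
    # each fraction should hover around 0.5, just like the full stream
```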
I have a project that needs to do validation on the frontend for an American Social Security Number (format ddd-dd-dddd). One suggestion would be to use a hash algorithm, but given the tiny character set used ([0-9]), this would be disastrous. It would be acceptable to validate with some high probability that a number is correct and allow the backend to do a final == check, but I need to do far better than "has nine digits" etc etc.
In my search for better alternatives, I came upon the validation checksums for ISBN numbers and UPC. These look like a great alternative with a high probability of success on the frontend.
Given those constraints, I have three questions:
Is there a way to prove that an algorithm like ISBN-13's will work with a different category of data like SSNs, or to tell whether it is more or less fit for the purpose from a security perspective? The checksum seems reasonable for my quite large sample of one real SSN, but I'd hate to find out that such checksums aren't generally applicable for some reason.
Is this a solved problem somewhere, so that I can simply use a pre-existing validation scheme to take care of the problem?
Are there any such algorithms that would also easily accommodate validating the last 4 digits of an SSN without giving up too much extra information?
Thanks as always,
Joe
UPDATE:
In response to a question below, a little more detail. I have the customer's SSN as previously entered, stored securely on the backend of the app. What I need to do is verify (to the maximum extent possible) that the customer has entered that same value again on this page. The issue is that I need to prevent the information from being incidentally revealed to the frontend in case some unauthorized person is able to access the page.
That is why an MD5/SHA-1 hash is inappropriate: it could be used to derive the complete SSN without much difficulty. A checksum (say, modulo 11) reveals almost no information to the frontend while still allowing a high degree of accuracy for the field validation. However, as stated above, I have concerns about its general applicability.
Wikipedia is not the best source for this kind of thing, but given that caveat, http://en.wikipedia.org/wiki/Social_Security_number says
Unlike many similar numbers, no check digit is included.
But before that it mentions some widely used filters:
The SSA publishes the last group number used for each area number. Since group numbers are allocated in a regular (if unusual) pattern, it is possible to identify an unissued SSN that contains an invalid group number. Despite these measures, many fraudulent SSNs cannot easily be detected using only publicly available information. In order to do so there are many online services that provide SSN validation.
Restating your basic requirements:
A reasonably strong checksum to protect against simple human errors.
"Expected" checksum is sent from server -> client, allowing client-side validation.
Checksum must not reveal too much information about SSN, so as to minimize leakage of sensitive information.
I might propose using a cryptographic hash (SHA-1, etc.), but do not send the complete hash value to the client. For example, send only the lowest 4 bits of the 160-bit hash result[1]. By sending 4 bits of checksum, your chance of detecting a data entry error is 15/16, meaning that you'll detect mistakes about 94% of the time. The flip side, though, is that you have "leaked" enough information to reduce their SSN to 1/16 of the search space. It's up to you to decide whether the convenience of client-side validation is worth this leakage.
By tuning the number of "checksum" bits sent, you can adjust between convenience to the user (i.e. detecting mistakes) and information leakage.
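A rough sketch of that idea follows; the SSN, salt value, and function names are made up for illustration, the choice of SHA-1 mirrors the answer, and BITS is the tuning knob described above.

```python
import hashlib

BITS = 4  # tune this: more bits = better error detection, but more leakage

def short_checksum(ssn: str, salt: bytes, bits: int = BITS) -> int:
    """Return only the lowest `bits` bits of a salted SHA-1 of the SSN."""
    digest = hashlib.sha1(salt + ssn.encode()).digest()
    return digest[-1] & ((1 << bits) - 1)

# Server side: compute once from the stored SSN and embed in the page.
salt = b"per-user-random-salt"   # hypothetical; would be random and kept server-side
expected = short_checksum("123-45-6789", salt)

# Client side (conceptually): recompute on the entered value and compare.
print(short_checksum("123-45-6788", salt) == expected)  # a typo is caught ~15/16 of the time
```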
Finally, given your requirements, I suspect this convenience/leakage tradeoff is an inherent problem: certainly, you could use a more sophisticated crypto challenge/response algorithm (as Nick ODell astutely suggests). However, doing so would require a separate round-trip request, something you said you were trying to avoid in the first place.
[1] In a good cryptographic hash function, all output digits are well randomized due to the avalanche effect, so the specific digits you choose don't particularly matter; they're all effectively random.
Simple solution: take the number mod 100001 as your checksum. There is about a 1 in 100,000 chance that you'll accidentally get the checksum right with the wrong number (and it is very resistant to one- or two-digit mistakes canceling out), and roughly 10,000 possible SSNs that it could be, so you have not revealed the SSN to an attacker.
The only drawback is that the 10,000 possible other SSNs are easy to figure out. If the person can get the last 4 digits of the SSN from elsewhere, then they can probably figure out the SSN. If you are concerned about this, then you should take the user's SSN, add a salt, and hash it, deliberately using an expensive hash algorithm. (You can just iterate a cheaper algorithm, like MD5, a fixed number of times to increase the cost.) Then use only a certain number of bits. The point here is that while someone can certainly go through all billion possible SSNs to come up with a limited list of possibilities, it will cost them more to do so. Hopefully enough that they don't bother.
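Here is a hedged sketch combining both suggestions from this answer; the modulus matches the one proposed, while the salt and iteration count are placeholder values.

```python
import hashlib

def ssn_checksum(ssn: str) -> int:
    """Cheap checksum suitable for client-side validation."""
    digits = int(ssn.replace("-", ""))
    return digits % 100_001          # the modulus proposed in the answer

def slow_hash(ssn: str, salt: bytes, rounds: int = 100_000) -> bytes:
    """Deliberately expensive salted hash: iterate a cheap hash many times."""
    h = salt + ssn.encode()
    for _ in range(rounds):
        h = hashlib.md5(h).digest()
    return h

print(ssn_checksum("123-45-6789"))                   # value checked on the client
print(slow_hash("123-45-6789", b"salt")[:2].hex())   # only a few bytes would be used
```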
Given the extremely high requirements for unpredictability to prevent casinos from going bankrupt, what random number generation algorithm and seeding scheme is typically used in devices like slot machines, video poker machines, etc.?
EDIT: Related questions:
Do stateless random number generators exist?
True random number generator
Alternative Entropy Sources
There are many things that gaming sites have to consider when choosing/implementing an RNG. Without due diligence, it can go spectacularly wrong.
To get a licence to operate a gaming site in a particular jurisdiction usually requires that the RNG has been certified by an independent third-party. The third-party testers will analyse the source code and run statistical tests (e.g. Diehard) to ensure that the RNG behaves randomly. Reputable poker sites will usually include details of the certification that their RNG has undergone (for example: PokerStars's RNG page).
I've been involved in a few gaming projects, and for one of them I had to design and implement the RNG part, so I had to investigate all of these issues. Most poker sites will use some hardware device for entropy, but they won't rely on just hardware. Usually it will be used in conjunction with a pseudo-RNG (PRNG). There are two main reasons for this. Firstly, the hardware is slow, it can only extract a certain number of bits of entropy in a given time period from whatever physical process it is monitoring. Secondly, hardware fails in unpredictable ways that software PRNGs do not.
Fortuna is the state of the art in terms of cryptographically strong PRNGs. It can be fed entropy from one or more external sources (e.g. a hardware RNG) and is resilient in the face of attempted exploits or RNG hardware failure. It's a decent choice for gaming sites, though some might argue it is overkill.
Pokerroom.com used to just use Java's SecureRandom (they probably still do, but I couldn't find details on their site). This is mostly good enough, but it does suffer from the degrees of freedom problem.
Most stock RNG implementations (e.g. Mersenne Twister) do not have sufficient degrees of freedom to be able to generate every possible shuffle of a 52-card deck from a given initial state (this is something I tried to explain in a previous blog post).
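A quick back-of-the-envelope calculation makes the degrees-of-freedom point concrete; this is just illustrative arithmetic, not code from the blog post mentioned.

```python
import math

shuffles = math.factorial(52)
print(f"52! is about 2^{math.log2(shuffles):.1f}")   # roughly 2^225.6 distinct orderings

# A generator whose reachable states are limited by its seed size can only ever
# produce a vanishing fraction of those orderings.
for seed_bits in (32, 48, 64, 128):
    fraction = 2 ** seed_bits / shuffles
    print(f"a {seed_bits}-bit seed reaches at most a {fraction:.2e} fraction of all shuffles")
```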
EDIT: I've answered mostly in relation to online poker rooms and casinos, but the same considerations apply to physical video poker and video slots machines in real world casinos.
For casino gaming applications, I think the seeding of the algorithm is the most important part, to make sure that all games "booted" up don't run through the same sequence or some small set of predictable sequences. That is, the source of entropy leading to the seed for the starting position is the critical thing. Beyond that, any good quality random number generator where each bit position has a ~50/50 probability of being 1 or 0, and where the period is relatively long, would be sufficient. For example, something like the Mersenne Twister PRNG has such properties.
Using cryptographically secure random generators only becomes important when the actual output of the random generator can be viewed directly. For example, if you were monitoring each number actually generated by the generator, then with a non-cryptographic generator, viewing many numbers in the sequence can yield enough information to establish the generator's entire internal state. At that point, if you know what the algorithm looks like, you would be able to predict future numbers, and that would be bad. A cryptographic generator prevents that reverse engineering back to the internal state, so that predicting future numbers becomes "impossible".
However, in the case of a casino game, you would (or should) have no visibility into the actual numbers being generated under the hood. Each time a random number is generated, say a 32-bit number, that number will then be used, for example, mod 52 for a deck-shuffling algorithm. Nowhere in that process do you have any idea what numbers were being generated by the algorithm to shuffle that deck. That is, most of the bits of "randomness" are simply thrown out, and even the ones being used are not visible to you. Therefore, there is no way to reverse engineer the state.
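To make that concrete, here is a minimal sketch of the kind of shuffle described: each full-width draw is collapsed to a tiny index, so an observer of the dealt cards sees almost none of the generator's output bits. (Real code would also avoid the modulo bias shown here.)

```python
import random

def shuffled_deck(rng):
    """Fisher-Yates shuffle driven by full 32-bit draws."""
    deck = list(range(52))
    for i in range(51, 0, -1):
        raw = rng.getrandbits(32)   # full 32-bit output...
        j = raw % (i + 1)           # ...collapsed into a small index (modulo bias ignored here)
        deck[i], deck[j] = deck[j], deck[i]
    return deck

print(shuffled_deck(random.Random())[:10])
```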
Getting back to a true source of entropy to seed the whole process, that is the hard part. See the Wikipedia entry on entropy for some starting points on techniques.
As an aside, if you did want a cryptographically secure sequence of random numbers from a "regular" algorithm, a simple approach is to take a few random numbers in sequence, concatenate them together, and then run something like MD5 or SHA-1 on them; the result is just as random and also cryptographically secure. That is, you just made your own "secure" random number generator.
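For what it's worth, a literal rendering of that construction might look like the sketch below; it is shown only to make the idea concrete, and a vetted CSPRNG (such as the operating system's) remains the safer choice.

```python
import hashlib
import random

def hashed_block(rng):
    """Concatenate a few raw PRNG outputs, then hash the lot."""
    raw = b"".join(rng.getrandbits(32).to_bytes(4, "big") for _ in range(4))
    return hashlib.sha1(raw).digest()   # 20 "whitened" output bytes

rng = random.Random()
print(hashed_block(rng).hex())
```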
We've been using the Protego R210-USB TRNG (and the non-USB version before that) as a random seed generator in casino applications, with java.security.SecureRandom on top. We had the Swedish National Laboratory of Forensic Science perform a separate audit of the R210, and it passed without a flaw.
You probably need a cryptographically secure pseudo-random generator. There are a lot of variants. Google "Blum-Blum-Shub", for example.
The security properties of these pseudo-random generators will generally be that, even when the attacker can observe polynomially many outputs from such generators, it won't be feasible to guess the next output with a probability much better than random guessing. Also, it is not feasible to distinguish the output of such generators from truly random bits. The security holds even when all the algorithms and parameters are known by the attacker (except for the secret seed).
The security of the generators is often measured with respect to a security parameter. In the case of BBS, it is the size of the modulus. This is no different from other crypto stuff. For example, RSA is secure only when the key is long enough.
Note that the output of such generators may not be uniform (in fact, it can be far from uniform in a statistical sense). But since no one can distinguish the two distributions without infinite computing power, these generators will suffice in most applications that require truly random bits.
Bear in mind, however, that these cryptographically secure pseudo-random generators are usually slow. So if speed is indeed a concern, less rigorous approaches may be more relevant, such as using hash functions, as suggested by Jeff.
Casino slot machines generate random numbers continuously at very high speed and use the most recent result(s) when the user pulls the lever (or hits the button) to spin the reels.
Even a simplistic generator can be used. Even if you knew the algorithm used, you could not observe where in the sequence it is, because nearly all the results are discarded. If you somehow did know where it was in the sequence, you'd need millisecond or better timing to take advantage of it.
Modern "mechanical reel" machines use PRNGs and drive the reels with stepper motors to simulate the old style spin-and-brake.
http://en.wikipedia.org/wiki/Slot_machine#Random_number_generators
http://entertainment.howstuffworks.com/slot-machine3.htm
Casinos shouldn't be using pseudo-random number generators; they should be using hardware ones: http://en.wikipedia.org/wiki/Hardware_random_number_generator
I suppose anything goes for apps and offshore gambling nowadays, but all these other answers are incomplete, at least for Nevada Gaming Control Board licensed machines, which I believe the question is originally about.
The technical specifications for RNGs licensed in Nevada for gaming purposes are laid out in Regulation 14.040(2).
As of May 24, 2012, here is a summary of the rules an RNG must follow:
Static seeds cannot be used. You have to seed the RNG using a millisecond time source or other true entropy source which has no external readout anywhere on the machine. (This helps to reduce the incidence of "magic number" attacks)
The RNG must continue generating numbers in its sequence at least 100 times per second when the game is not being played. (This helps to avoid timing attacks)
RNG outputs cannot be reused; they must be used exactly once if at all and then thrown away.
Multi-system cabinets must use a separate RNG and separate seed for each game.
Games that use RNGs for helping to choose numbers on behalf of the player (such as Lotto Quick Pick) must use a separate RNG for that process.
Games must not roll RNGs until they are actually needed in a game. (i.e. you need to wait until the player chooses to deal or spin before generating RNGs)
The RNG must pass a 95% confidence chi-squared test based on 10,000 trials as part of a system test (a rough sketch of such a test follows this list). It must display a warning if this test fails, and it must disable play if it fails twice in a row.
It must remember, and be able to report on, the last 10 results of the chi-squared test described in the previous rule.
Every possible game outcome must be generatable by the RNG. As a pessimal example, linear congruential generators don't actually generate every possible output in their range, so they're not very useful for gaming.
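As a rough sketch of the self test mentioned in the chi-squared rule above (not the regulation's actual procedure), a 95% confidence uniformity check over 10,000 trials might look like this, here assuming ten equally likely outcomes:

```python
import random

def chi_squared_uniform(samples, num_outcomes):
    """Chi-squared statistic against a uniform distribution over num_outcomes."""
    expected = len(samples) / num_outcomes
    counts = [0] * num_outcomes
    for s in samples:
        counts[s] += 1
    return sum((c - expected) ** 2 / expected for c in counts)

TRIALS = 10_000
OUTCOMES = 10
CRITICAL_95 = 16.919   # chi-squared critical value, 9 degrees of freedom, p = 0.05

rng = random.SystemRandom()
stat = chi_squared_uniform([rng.randrange(OUTCOMES) for _ in range(TRIALS)], OUTCOMES)
print(f"chi-squared = {stat:.2f}, pass = {stat < CRITICAL_95}")
```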
Additionally, your machine design has to be submitted to the gaming commission and it has to be approved, which is expensive and takes lots of time. There are a few third-party companies that specialize in auditing your new RNG to make sure it's random. Gaming Laboratories publishes an even stricter set of standards than Nevada does. They go into much greater detail about the limitations of hardware RNGs, and Nevada in particular likes to see core RNGs that it's previously approved. This can all get very expensive, which is why many developers prefer to license an existing previously-approved RNG for new game projects.
Here is a fun list of random number generator attacks to keep you up late at night.
For super-nerds only: the source for most USB hardware RNGs is typically an avalanche diode. However, the thermal noise produced by this type of diode is not quantum-random, and it is possible to influence the randomness of an avalanche diode by significantly lowering the temperature.
As a final note, someone above recommended just using a Mersenne Twister for random number generation. This is a Bad Idea unless you are taking additional entropy from some other source. The plain vanilla Mersenne Twister is highly inappropriate for gaming and cryptographic applications, as described by its creator.
If you want to do it properly, you have to get physical: ERNIE, the UK national savings number picker, uses shot noise in neon tubes.
I have certainly seen a German gambling machine that was not allowed to be run commercially after a given date, so I suppose it was a PRNG with a looong one-time-pad seed list.
Most poker sites use hardware random number generators. They will also modify the output to remove any scaling bias, and they often use "pots" of numbers which can be "stirred" using entropic events (user activity, server I/O events, etc.). Quite often the resulting numbers just index pre-generated decks (starting off as a sorted list of cards).
Is there a difference between generating multiple numbers using a single random number generator (RNG) versus generating one number per generator and discarding it? Do both implementations generate numbers which are equally random? Is there a difference between the normal RNGs and the secure RNGs for this?
I have a web application that is supposed to generate a list of random numbers on behalf of clients. That is, the numbers should appear to be random from each client's point of view. Does this mean I need to retain a separate RNG per client session? Or can I share a single RNG across all sessions? Or can I create and discard an RNG on a per-request basis?
UPDATE: This question is related to Is a subset of a random sequence also random?
A random number generator has a state -- that's actually a necessary feature. The next "random" number is a function of the previous number and the seed/state. The purists call them pseudo-random number generators. The numbers will pass statistical tests for randomness, but aren't -- actually -- random.
The sequence of random values is finite and does repeat.
Think of a random number generator as shuffling a collection of numbers and then dealing them out in a random order. The seed is used to "shuffle" the numbers. Once the seed is set, the sequence of numbers is fixed and very hard to predict. Some seeds will repeat sooner than others.
Most generators have a period that is long enough that no one will notice it repeating. A 48-bit random number generator will produce about 280 trillion (2^48) random numbers before it repeats, with (AFAIK) any 32-bit seed value.
A generator will only generate random-like values when you give it a single seed and let it spew values. If you change seeds, then numbers generated with the new seed value may not appear random when compared with values generated by the previous seed -- all bets are off when you change seeds. So don't.
A sound approach is to have one generator and "deal" the numbers around to your various clients. Don't mess with creating and discarding generators. Don't mess with changing seeds.
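A minimal sketch of that approach, assuming a threaded web server and using Python for illustration, is a single module-level generator that every session draws from:

```python
import random
import threading

_rng = random.Random()        # one generator, seeded once at startup
_lock = threading.Lock()      # needed if requests are served from multiple threads

def next_value_for_client(upper=1_000_000):
    """Every client session draws from the same underlying sequence."""
    with _lock:
        return _rng.randrange(upper)

# Three different "clients" simply receive successive values from one stream.
print([next_value_for_client() for _ in range(3)])
```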
Above all, never try to write your own random number generator. The built-in generators in most language libraries are really good. Especially modern ones that use more than 32 bits.
Some Linux distros have a /dev/random and /dev/urandom device. You can read these once to seed your application's random number generator. These have more-or-less random values, but they work by "gathering noise" from random system events. Use them sparingly so there are lots of random events between uses.
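For example, a sketch of that seeding pattern (reading the OS entropy pool once, then reusing one application-level generator) might look like:

```python
import os
import random

seed = int.from_bytes(os.urandom(8), "big")   # backed by /dev/urandom on Linux
app_rng = random.Random(seed)                 # seeded once, then reused everywhere

print(app_rng.random(), app_rng.randrange(52))
```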
I would recommend using a single generator multiple times. As far as I know, all the generators have a state. When you seed a generator, you set its state to something based on the seed. If you keep spawning new ones, it's likely that the seeds you pick will not be as random as the numbers generated by using just one generator.
This is especially true with most generators I've used, which use the current time in milliseconds as a seed.
Hardware-based, true [1], random number generators are possible, but they are non-trivial and often have low mean rates. Availability can also be an issue [2]. Googling for "shot noise" or "radioactive decay" in combination with "random number generator" should return some hits.
These systems do not need to maintain state. Probably not what you were looking for.
As noted by others, software systems are only pseudo-random, and must maintain state.
A compromise is to use a hardware based RNG to provide an entropy pool (stored state) which is made available to seed a PRNG. This is done quite explicitly in the linux implementation of /dev/random [3] and /dev/urandom [4].
There is some argument about just how random the default inputs to the /dev/random entropy pool really are.
Footnotes:
[1] modulo any problems with our understanding of physics
[2] because you're waiting for a random process
[3] /dev/random features direct access to the entropy pool, seeded from various sources believed to be really or nearly random, and it blocks when the entropy is exhausted
[4] /dev/urandom is like /dev/random, but when the entropy is exhausted a cryptographic hash is employed, which makes the entropy pool effectively a stateful PRNG
If you create a RNG and generate a single random number from it then discard the RNG, the number generated is only as random as the seed used to start the RNG.
It would be much better to create a single RNG and draw many numbers from it.
As people have already said, it's much better to seed the PRNG once and reuse it. A secure PRNG is simply one which is suitable for cryptographic applications. The only way re-seeding each time will give reasonably random results is where the seed comes from a genuinely random "real world" source, i.e. specialised hardware. Even then, it's possible that the source is biased, and it will still be theoretically better to reuse the same PRNG.
Normally, seeding a new state takes quite a while for a serious PRNG, and making new ones each time won't really help much.
The only case I can think of where you might want more than one PRNG is for different subsystems: say, in a casino game, one generator for shuffling cards and a separate one for generating the comments made by the computer-controlled characters, so that really dedicated users can't guess outcomes based on character behaviour.
A nice option for seeding is Random.org; they supply random numbers generated from atmospheric noise for free. It could be a better source for seeding than using the time.
Edit: In your case, I would definitely use one PRNG per client, if for no other reason than good programming standards. Anyway, if you share one PRNG among clients, you will still be providing pseudo-random values to each, of a quality equal to your PRNG's quality. So that's a viable option, but it seems like poor programming practice.
It's worth mentioning that Haskell is a language which attempts to entirely eliminate mutable state. In order to reconcile this goal with hard requirements like IO (which requires some form of mutability), monads must be used to thread state from one calculation to the next. In this way, Haskell implements its pseudo-random number generator. Strictly speaking, generating random numbers is an inherently stateful operation, but Haskell is able to hide this fact by moving the state "mutation" into the bind (>>=) operation.
This probably sounds a little abstract, and it doesn't really answer your question completely, but I think it is still applicable. From a theoretical standpoint, it is impossible to work with an RNG without involving state. Regardless, there are techniques which can be used to mitigate this interaction and make it appear as if the entire operation is stateless in nature.
It's generally better to create a single PRNG and pull multiple values from it. Creating multiple instances means you need to ensure that the seeds for the instances are guaranteed unique, which will require incorporating instance-specific information.
As an aside, there are better "true" Random Number Generators, but they usually require specialized hardware which does things like derive random data from electrical signal variance inside the computer. Unless you're really worried about it, I'd say the Pseudo Random Number Generators built into the language libraries and/or OS are probably sufficient, as long as your seed value is not easily predictable.
The use of a secure PRNG depends on your application. What are the random numbers used for?
If they're something of real value (e.g. anything cryptographically related), you wouldn't want to use anything less.
Secure PRNGs are much slower, and they may require libraries for arbitrary-precision arithmetic, primality testing, and so on.
Well, as long as they are seeded differently each time they're created, then no, I don't think there'd be any difference; however, if the seed depends on something like the time, then they'd probably be non-uniform, due to the biased seeds.