Characteristics of various hash algorithms: performance?

MD5, MD6?, all the SHA-somethings, CRC-somethings. I've used them before and seen them used in various places, but I have no idea why you would use one over another.
On a very high level, what is the difference between all these three/four-letter acronyms in terms of performance, collision probability, and general hard-to-crackness? Does any of this depend on what kind or what amount of data I am hashing?
What trade-offs am I making when I choose one over another? I've read that CRC is not suitable for security use, but what about for general hash-table collision avoidance?

CRC-whatever is used primarily (and should be used exclusively) for protection against accidental changes in data. CRCs do quite a good job of detecting noise and the like, but they are not intended for cryptographic purposes: finding a second preimage (a second input that produces the same hash) is, by cryptographic standards, trivial. [Edit: As noted by @Jon, unlike the other hashes mentioned here, CRC is not and never was intended for cryptographic use.]
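To make the contrast concrete, here is a small Python sketch: CRC-32 reliably flags an accidental bit flip, but it is linear over GF(2), and that algebraic structure is exactly what makes deliberate forgery easy.

```python
import zlib

msg = b"transfer $100 to account 12345"
crc = zlib.crc32(msg)

# A single accidental bit flip is reliably detected...
corrupted = bytes([msg[0] ^ 0x01]) + msg[1:]
assert zlib.crc32(corrupted) != crc

# ...but CRC-32 is linear: for equal-length inputs,
# CRC(a xor b) == CRC(a) xor CRC(b) xor CRC(0).
# An attacker can exploit this structure to construct collisions on purpose.
a, b = b"AAAAAAAA", b"BBBBBBBB"
x = bytes(p ^ q for p, q in zip(a, b))
zero = bytes(len(a))
assert zlib.crc32(x) == zlib.crc32(a) ^ zlib.crc32(b) ^ zlib.crc32(zero)
```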
MD-5. Originally intended for cryptographic use, but fairly old and now considered fairly weak. Although no second preimage attack is known, a collision attack is known (i.e., a way to produce two selected inputs that produce the same result, but not a second input to produce the same result as one that's specified). About the only time to use this any more is as a more elaborate version of a CRC.
SHA-x
Once upon a time, there was simply "SHA". Very early in its history, a defect was found, and a slight modification was made to produce SHA-1. SHA was in use for a short enough time that it's rarely of practical interest.
SHA-1 is generally more secure than MD-5, but still in the same general range -- a collision attack is known, though it's a lot1 more expensive than for MD-5. No second preimage attack is known, but the collision attack is enough to say "stay away".
SHA-256, SHA-384, SHA-512: These are sort of based on SHA-1, but are somewhat more complex internally. At least as far as I'm aware, neither a second-preimage attack nor a collision attack is known on any of these at the present time.
SHA-3: The US National Institute of Standards and Technology (NIST) is currently holding a competition to standardize a replacement for the current SHA-2 series of hash algorithms, which will apparently be called SHA-3. As I write this (September 2011) the competition is in its third round, with five candidates (BLAKE, Grøstl, JH, Keccak and Skein2) left in the running. Round 3 is scheduled to end in January 2012, at which time public comments on the algorithms will no longer be (at least officially) accepted. In March 2012, a (third) SHA-3 conference will be held (in Washington, DC). At some unspecified date later in 2012, the final selection will be announced.
1 For anybody who cares about how much more expensive it is to attack SHA-1 than MD-5, I'll try to give some concrete numbers. For MD-5, my ~5 year-old machine can produce a collision in about 40-45 minutes. For SHA-1, I only have an estimate, but my estimate is that a cluster to produce collisions at a rate of one per week would cost well over a million US dollars (and probably closer to $10 million). Even given an existing machine, the cost of operating the machine long enough to find a collision is substantial.
2 Since it's almost inevitable that somebody will at least wonder, I'll point out that the entry Bruce Schneier worked on is Skein.

Here's the really short summary:
MD4, MD5, SHA-1 and SHA-2 all belong to a category of general purpose secure hashing algorithms. They're generally interchangeable, in that they all produce a hashcode from a plaintext such that it's designed to be computationally infeasible to determine a plaintext that generates a hash (a 'preimage'), or to find two texts that hash to the same thing (a 'collision'). All of them are broken to various degrees, or at least believed to be vulnerable.
MD6 was a candidate for NIST's SHA-3 competition, but it was withdrawn. It has the same general characteristics as the above hash functions, but like many of the SHA-3 candidates it adds extra features; in this case a Merkle-tree-like structure to improve parallelization of hashing. It goes without saying that it, along with the remaining SHA-3 candidates, is not yet well tested.
A CRC is in fact not a hash algorithm at all. Its name stands for Cyclic Redundancy Check, and it's a checksum rather than a hash. Different CRCs are designed to resist different levels of transmission errors, but they all have in common a guarantee that they will detect a certain number of bit errors, something hash algorithms do not share. They're not as well distributed as a hash algorithm, so shouldn't be used as one.
There are a range of general purpose hash algorithms suitable for use in hashtables etcetera, such as FNV. These tend to be a lot faster than cryptographic hashes, but aren't designed to resist attacks by an adversary. Unlike secure hashes, some of them show quite poor distribution characteristics, and are only suitable for hashing certain types of data.
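For reference, FNV-1a is simple enough to fit in a few lines; here is a sketch of the 64-bit variant in Python:

```python
FNV64_OFFSET = 0xcbf29ce484222325  # published 64-bit offset basis
FNV64_PRIME = 0x100000001b3        # published 64-bit FNV prime

def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a: xor in each byte, then multiply by the FNV prime."""
    h = FNV64_OFFSET
    for byte in data:
        h ^= byte
        h = (h * FNV64_PRIME) & 0xFFFFFFFFFFFFFFFF  # keep 64 bits
    return h

# The empty input hashes to the offset basis, per the FNV specification.
assert fnv1a_64(b"") == 0xcbf29ce484222325
assert fnv1a_64(b"hello") != fnv1a_64(b"hellp")
```

Note how cheap the per-byte work is (one xor, one multiply) compared with a cryptographic compression function; that is where the speed advantage comes from.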

To complete the other answers: performance varies among hash functions. Hash functions are built on elementary operations, which are more or less efficient, depending on the architecture. For instance, the SHA-3 candidate Skein uses additions on 64-bit integers and is very fast on platforms which offer 64-bit operations, but on 32-bit-only systems (including all ARM processors), Skein is much slower.
SHA-256 is usually said to be "slow", but it will still hash data at a rate of about 150 megabytes per second on a basic PC (a 2.4 GHz Core2), which is more than enough for most applications. It is rare that hash function performance is really important on a PC. Things can be different on embedded systems (from smartcards to smartphones), where you can get more data to process than the CPU can handle. MD5 is typically 3 to 6 times faster than SHA-256. SHA-256 is still the recommended default choice, since its security is still intact; consider using something else only if you hit a real, duly observed and measured performance issue.
On small 32-bit architectures (MIPS, ARM...), all remaining SHA-3 candidates are slower than SHA-256, so getting something faster and yet secure could be challenging.
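To get a feel for the relative speeds on your own hardware, here is a rough benchmark sketch; the numbers will vary with machine, OpenSSL build, and data size.

```python
import hashlib
import time

def throughput_mb_s(name: str, size: int = 16 * 1024 * 1024) -> float:
    """Hash `size` bytes of zeros and return throughput in MB/s (machine-dependent)."""
    data = bytes(size)
    h = hashlib.new(name)
    start = time.perf_counter()
    h.update(data)
    h.digest()
    return size / (time.perf_counter() - start) / 1e6

for name in ("md5", "sha1", "sha256", "sha512"):
    print(f"{name:8s} {throughput_mb_s(name):8.1f} MB/s")
```

A single large `update` call like this measures the raw compression-function speed; hashing many small inputs instead would be dominated by per-call overhead.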

Related

Fast hash function with collision possibility near SHA-1

I'm using SHA-1 to detect duplicates in a program handling files. It does not need to be cryptographically strong, and it may be reversible. I found this list of fast hash functions https://code.google.com/p/xxhash/ (the list has since moved to https://github.com/Cyan4973/xxHash).
What should I choose if I want a faster function with a collision probability on random data near that of SHA-1?
Maybe a 128-bit hash is good enough for file deduplication? (vs. 160-bit SHA-1)
In my program the hash is calculated on chunks of 0 to 512 KB.
Maybe this will help you:
https://softwareengineering.stackexchange.com/questions/49550/which-hashing-algorithm-is-best-for-uniqueness-and-speed
collisions rare: FNV-1, FNV-1a, DJB2, DJB2a, SDBM & MurmurHash
I don't know about xxHash, but it also looks promising.
MurmurHash is very fast, and version 3 supports a 128-bit output; I would choose this one. (It is implemented in Java and Scala.)
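Neither xxHash nor MurmurHash ships with Python's standard library. As an illustrative stand-in for a 128-bit fingerprint (a hedged sketch, not a claim about relative speed), BLAKE2b with a truncated digest works out of the box:

```python
import hashlib

def hash128(data: bytes) -> bytes:
    # BLAKE2b with a 16-byte digest: a 128-bit fingerprint from the
    # standard library (xxHash/MurmurHash would need third-party packages).
    return hashlib.blake2b(data, digest_size=16).digest()

chunk = b"example chunk data" * 1024
assert len(hash128(chunk)) == 16           # 128 bits
assert hash128(chunk) == hash128(chunk)    # deterministic
assert hash128(chunk) != hash128(chunk + b"!")
```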
Since the only relevant property of hash algorithms in your case is the collision probability, you should estimate it and choose the fastest algorithm which fulfills your requirements.
If we suppose your algorithm has perfect uniformity, the probability of a hash collision among n files using hashes with d possible values is, by the birthday approximation:
p ≈ 1 - e^(-n(n-1)/(2d)) ≈ n^2 / (2d)
For example, if you need a collision probability lower than one in a million among one million of files, you will need to have more than 5*10^17 distinct hash values, which means your hashes need to have at least 59 bits. Let's round to 64 to account for possibly bad uniformity.
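That arithmetic can be checked directly with the birthday approximation:

```python
import math

def collision_probability(n: int, d: float) -> float:
    """Birthday approximation: p ~= 1 - exp(-n(n-1) / (2d))."""
    return 1.0 - math.exp(-n * (n - 1) / (2.0 * d))

n = 1_000_000        # number of files
target = 1e-6        # acceptable collision probability
d_needed = n * (n - 1) / (2 * target)   # ~5e17 distinct hash values required
bits = math.ceil(math.log2(d_needed))
assert bits == 59                                  # the 59-bit minimum
assert collision_probability(n, 2.0**64) < target  # 64 bits is comfortable
```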
So I'd say any decent 64-bit hash should be sufficient for you. Longer hashes will further reduce the collision probability, at the price of heavier computation and increased hash storage volume. Shorter hashes like CRC32 will require you to write some explicit collision-handling code.
Google developed and uses (I think) FarmHash for performance-critical hashing. From the project page:
FarmHash is a successor to CityHash, and includes many of the same tricks and techniques, several of them taken from Austin Appleby’s MurmurHash.
...
On CPUs with all the necessary machine instructions, about six different hash functions can contribute to FarmHash's lineup. In some cases we've made significant performance gains over CityHash by using newer instructions that are now commonly available. However, we've also squeezed out some more speed in other ways, so the vast majority of programs using CityHash should gain at least a bit when switching to FarmHash.
(CityHash was already a performance-optimized hash function family by Google.)
It was released a year ago, at which point it was almost certainly the state of the art, at least among the published algorithms. (Or else Google would have used something better.) There's a good chance it's still the best option.
The facts:
Good hash functions, especially the cryptographic ones (like SHA-1), require considerable CPU time because they have to honor a number of properties that won't be very useful to you in this case;
Any hash function will give you only one certainty: if the hash values of two files are different, the files are surely different. If, however, their hash values are equal, chances are that the files are also equal, but the only way to tell for sure if this "equality" is not just a hash collision, is to fall back to a binary comparison of the two files.
The conclusion:
In your case I would try a much faster algorithm like CRC32. It has pretty much all the properties you need, it would be capable of handling more than 99.9% of the cases, and you would only resort to a slower comparison method (like binary comparison) to rule out the false positives. Being a lot faster in the great majority of comparisons would probably compensate for not having "awesome" uniformity (and possibly generating a few more collisions).
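A minimal sketch of that approach (the names are illustrative): bucket files by CRC-32 first, then confirm each suspected duplicate with a full byte comparison to rule out CRC collisions.

```python
import zlib
from collections import defaultdict

def find_duplicates(files):
    """files: dict mapping name -> content bytes.
    Bucket by CRC-32, then confirm each candidate pair byte-for-byte."""
    by_crc = defaultdict(list)
    for name, data in files.items():
        by_crc[zlib.crc32(data)].append(name)
    duplicates = []
    for names in by_crc.values():
        for i in range(len(names)):
            for j in range(i + 1, len(names)):
                # binary comparison rules out false positives (CRC collisions)
                if files[names[i]] == files[names[j]]:
                    duplicates.append((names[i], names[j]))
    return duplicates

files = {"a.txt": b"same bytes", "b.txt": b"same bytes", "c.txt": b"other"}
assert find_duplicates(files) == [("a.txt", "b.txt")]
```

The CRC pass is O(total bytes) and cheap; the expensive byte-for-byte comparison runs only within buckets, which are almost always singletons.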
128 bits is indeed good enough to detect different files or chunks. The risk of collision is infinitesimal, at least as long as no intentional collision is being attempted.
64 bits can also prove good enough if the number of files or chunks you want to track remains "small enough" (i.e. no more than a few million).
Once the size of the hash is settled, you need a hash with some very good distribution properties, such as the ones listed with Q.Score = 10 in your link.
It depends somewhat on how many hashes you are going to compute in an iteration.
E.g., a 64-bit hash reaches a collision probability of 1 in 1,000,000 after about 6 million hashes have been computed.
Refer to : Hash collision probabilities
Check out MurmurHash2_160. It's a modification of MurmurHash2 which produces 160-bit output.
It computes 5 unique results of MurmurHash2 in parallel and mixes them thoroughly. The collision probability is equivalent to SHA-1 based on the digest size.
It's still fast, but MurmurHash3_128, SpookyHash128 and MetroHash128 are probably faster, albeit with a higher (but still very unlikely) collision probability. There's also CityHash256 which produces a 256-bit output which should be faster than SHA-1 as well.

Is there a somewhat-reliable way to detect that a list of integers came from a common PRNG?

Basically I'm looking for a detective function. I pass it a list of integers (probably between 20 and 100 integers) and it tells me "Yeah, 84% chance this came from a PRNG, I tested it against the main ones that most modern programming languages use", or "No, only 12% chance this came from a well-known PRNG".
If it helps (or hinders), the integers will always be between 1 and 999.
Does this exist?
Unless you are prepared to break new ground in number theory, you would only be able to detect obsolete, badly designed, or poorly seeded PRNGs. Good PRNGs are explicitly designed to prevent what you are trying to do. Random number generation is a critical part of digital cryptography, so a lot of effort goes into producing random numbers that meet all known tests.
There are batteries of tests to profile PRNGs. See for example this NIST page.
As the comments point out, the first two sentences are overstated and are only strictly true for PRNGs that may be used in cryptography. Weaker (i.e. more predictable) PRNGs might be chosen for other domains in order to improve time or space performance.
You can write a battery of tests for a list of candidate generators, but there are a lot of generators, and some have enormous state, where adjacent values of a well-seeded generator will reveal nothing useful and you'll have to wait a long time before you can get two data points that have an informative relationship.
On the plus side: while the list of random number generators that you might encounter is vast, there are telltale signs that will help you identify some classes of simple generators quickly, and then you can perform focused analysis to derive the specific configuration.
Unfortunately even a simple generator like KISS shows that while the generator can be trivially broken when you know its configuration, it can hide its signature from anything that does not know its configuration, leaving you in a situation where you have to individually test for every possible configuration.
There are quality tests like dieharder and TestU01 which will consume many megabytes of data to identify any weakness in a generator; however, these can also identify weaknesses in real RNGs, so they could give a strong false positive.
To get by with only 100 integers, you would really need to have a list of candidate generators in mind. For example, to detect an LCG used inappropriately, you simply test whether the bottom three bits cycle through a repeating pattern of 8 values; but this is by far the easiest case.
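A sketch of that low-bits test in Python, against a hypothetical glibc-style LCG with a power-of-two modulus (for such generators the bottom three bits provably cycle with period 8):

```python
def lcg_outputs(seed, n, a=1103515245, c=12345, m=2**31):
    """A glibc-style LCG with a power-of-two modulus (hypothetical target)."""
    out, x = [], seed
    for _ in range(n):
        x = (a * x + c) % m
        out.append(x)
    return out

def low_bits_repeat(values, bits=3):
    """True if the bottom `bits` bits cycle with period 2**bits."""
    period = 1 << bits
    low = [v & (period - 1) for v in values]
    return all(low[i] == low[i % period] for i in range(len(low)))

assert low_bits_repeat(lcg_outputs(seed=42, n=64))   # LCG: detected
assert not low_bits_repeat([0] * 8 + [1] * 8)        # non-cyclic data: not flagged
```

For a good generator, 64 values passing this test by chance has probability roughly 8^-56, so a positive result is essentially conclusive.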
If you had a sequence of 625 or more 32-bit integers, you could detect with high confidence whether it came from consecutive calls to a Mersenne Twister. That is because the generator leaks state information in its output values.
For an example of how it is done, see this blog entry.
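As a sketch of the idea: Python's random module uses MT19937, and its output tempering is invertible, so 624 consecutive 32-bit outputs suffice to clone the generator and predict everything it will emit next. (The helper names here are mine.)

```python
import random

def undo_xor_rshift(y: int, shift: int) -> int:
    # Invert y ^= y >> shift by iterating the recurrence to a fixed point.
    result = y
    for _ in range(32 // shift + 1):
        result = y ^ (result >> shift)
    return result

def undo_xor_lshift_mask(y: int, shift: int, mask: int) -> int:
    # Invert y ^= (y << shift) & mask the same way.
    result = y
    for _ in range(32 // shift + 1):
        result = y ^ ((result << shift) & mask)
    return result & 0xFFFFFFFF

def untemper(y: int) -> int:
    # Reverse MT19937's tempering steps in opposite order to recover
    # the raw state word behind one output.
    y = undo_xor_rshift(y, 18)
    y = undo_xor_lshift_mask(y, 15, 0xEFC60000)
    y = undo_xor_lshift_mask(y, 7, 0x9D2C5680)
    y = undo_xor_rshift(y, 11)
    return y

# Clone Python's Mersenne Twister from 624 consecutive 32-bit outputs.
target = random.Random(2021)
outputs = [target.getrandbits(32) for _ in range(624)]

clone = random.Random()
clone.setstate((3, tuple(untemper(v) for v in outputs) + (624,), None))

# The clone now predicts the target's future output exactly.
assert all(clone.getrandbits(32) == target.getrandbits(32) for _ in range(100))
```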
Similar results are in theory possible when you don't have ideal data such as full 32-bit integers, but you would need a longer sequence and the maths gets harder. You would also need to know - or perhaps guess by trying obvious options - how the numbers were being reduced from the larger range to the smaller one.
Similar results are possible from other PRNGs, but generally only the non-cryptographic ones.
In principle you could identify specific PRNG sequences with very high confidence, but even simple barriers such as missing numbers from the strict sequence can make it a lot harder. There will also be many PRNGs that you will not be able to reliably detect, and typically you will either have close to 100% confidence of a match (to a hackable PRNG) or 0% confidence of any match.
Whether or not a PRNG is a hackable (and therefore could be detected by the numbers it emits) is not a general indicator of PRNG quality. Obviously, "hackable" is opposite to a requirement for "secure", so don't consider Mersenne Twister for creating unguessable codes. However, do consider it as a source of randomness for e.g. neural networks, genetic algorithms, monte-carlo simulations and other places where you need a lot of statistically random-looking data.

Ideal hashing method for wide distribution of values?

As part of my rhythm game that I'm working, I'm allowing users to create and upload custom songs and notecharts. I'm thinking of hashing the song and notecharts to uniquely identify them. Of course, I'd like as few collisions as possible, however, cryptographic strength isn't of much importance here as a wide uniform range. In addition, since I'd be performing the hashes rarely, computational efficiency isn't too big of an issue.
Is this as easy as selecting a tried-and-true hashing algorithm with the largest digest size? Or are there some intricacies that I should be aware of? I'm looking at either SHA-256 or 512, currently.
All cryptographic-strength algorithms should exhibit no collisions at all. Of course, collisions necessarily exist (there are more possible inputs than possible outputs), but it should be impossible, using existing computing technology, to actually find one.
When the hash function has an output of n bits, it is possible to find a collision with about 2^(n/2) work, so in practice a hash function with less than about 140 bits of output cannot be cryptographically strong. Moreover, some hash functions have weaknesses that allow attackers to find collisions faster than that; such functions are said to be "broken". A prime example is MD5.
If you are not in a security setting, and fear only random collisions (i.e. nobody will actively try to provoke a collision; they may happen only out of pure bad luck), then a broken cryptographic hash function can be fine. The usual recommendation is then MD4. Cryptographically speaking, it is as broken as can be, but for non-cryptographic purposes it is devilishly fast, and its 128 bits of output avoid random collisions.
However, chances are that you will not have any performance issue with SHA-256 or SHA-512. On a most basic PC, they already process data faster than what a hard disk can provide: if you hash a file, the file reading will be the bottleneck, not the hashing. My advice would be to use SHA-256, possibly truncating its output to 128 bits (if used in a non-security situation), and consider switching to another function only if some performance-related trouble is duly noticed and measured.
If you're using it to uniquely identify tracks, you do want a cryptographic hash: otherwise, users could deliberately create tracks that hash the same as existing tracks, and use that to overwrite them. Barring a compelling reason otherwise, SHA-1 should be perfectly satisfactory.
If cryptographic security is not a concern, then you can look at this link & this. The fastest and simplest (to implement) would be Pearson hashing, if you are planning to compute a hash of the title/name and later do lookups. Or you can have a look at SuperFastHash here. It is also very good for non-cryptographic use.
What's wrong with something like an md5sum? Or, if you want a faster algorithm, I'd just create a hash from the file length (mod 64K, to fit in two bytes) and a 32-bit checksum. That'll give you a 6-byte hash which should be reasonably well distributed. It's not overly complex to implement.
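A sketch of that 6-byte hash: two bytes of length (mod 64K) followed by a 32-bit checksum. Using CRC-32 as the checksum is my assumption; the answer leaves the choice open.

```python
import struct
import zlib

def quick_hash(data: bytes) -> bytes:
    """6-byte hash: 2 bytes of length (mod 64K) + a 32-bit CRC-32 checksum."""
    return struct.pack(">HI", len(data) % 65536, zlib.crc32(data))

song = b"fake song bytes" * 1000
assert len(quick_hash(song)) == 6
assert quick_hash(song) != quick_hash(song + b"x")
```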
Of course, as with all hashing solutions, you should monitor the collisions and change the algorithm if the cardinality gets too low. This would be true regardless of the algorithm chosen (since your users may start uploading degenerate data).
You may end up finding you're trying to solve a problem that doesn't exist (in other words, possible YAGNI).
Isn't cryptographic hashing overkill in this case, even though I understand that modern computers do this calculation pretty fast? I assume that your users will have a unique user ID. When they upload, you just need to increment a number, so you would represent the songs internally as userid1_song_1, userid1_song_2, etc. You can store this info in a database with that as the unique key, along with the user-specified name.
You also didn't mention the size of these songs. If they are MIDI, the files will be small. If the files are big (say 3 MB), then SHA calculations will not be instantaneous. On my Core 2 Duo laptop, sha256sum of a 3.8 MB file takes 0.25 seconds; sha1sum takes 0.2 seconds.
If you intend to use a cryptographic hash, then SHA-1 should be more than adequate, and you don't need SHA-256. No collisions (though they exist) have been found yet. Git, Mercurial and other distributed version control systems use SHA-1. Git is a content-addressed system and uses SHA-1 to find out whether content has been modified.

New cryptographic algorithms?

I was wondering about new trends in cryptography. Which algorithms are new? Which have been improved, and which have died with the passage of time?
For example, ECC (Elliptic Curve Cryptography) is quite a new approach, but definitely not the only one. Could you name some of them?
ECC actually originates from the 80's; it is not exactly new.
In the context of asymmetric encryption and digital signatures, there has been in the last few years much research on pairings. Pairings open up the next level. Conceptually: symmetric cryptography is for problems with one entity (all entities share a secret key, thus they are the "same" entity), asymmetric cryptography is for problems with two entities (the signer and the verifier), and pairings are a tool for protocols with three entities (for instance, electronic cash, where there are the bank, the merchant and the buyer). The only really practical pairings found so far use elliptic curves, but with a much higher dose of mathematics.
As for more classical asymmetric encryption and signatures, there has been some work on many other algorithms, such as HFE, which seems especially good with regards to signature sizes, or lattice-based cryptography. This is still quite new. It takes some time (say a dozen years or so) before a newly created algorithm becomes mature enough to be standardized.
Following work by Bellovin and Merritt in 1992, some Password Authenticated Key Exchange protocols have been described. These protocols are meant to allow for password-based mutual authentication immune to offline dictionary attacks (i.e. the protocols imply that an attacker, even if actively masquerading as one of the parties, cannot obtain enough information to test passwords at his leisure; each guess from the attacker must go through an interaction with one of the entities who knows the password). IEEE group P1363 is working on writing standards on that subject.
In the area of symmetric encryption, the AES has been a bit "final". A few stream ciphers have been designed afterwards (stream ciphers are supposed to provide better performance, at the cost of less generic applicability); some were analyzed by the eSTREAM project. There has been quite some work on encryption modes, which try to combine symmetric encryption and integrity checks in one efficient system (see for instance GCM and CWC).
Hash functions have been a hot subject lately. A bunch of old hash functions, including the famous MD5, were broken in 2004. There is an ongoing competition to determine the next American standard hash function, codenamed SHA-3.
An awful lot of work has been done on some implementation issues, in particular side-channel leaks (how secret data leaks through power consumption, timing, residual electro-magnetic emissions...) and how to block them.
The main problem of contemporary cryptography is not finding algorithms but whole concepts and approaches for different situations (but of course the algorithms are continually improved too).
We have today
Symmetric algorithms (AES)
Asymmetric algorithms (RSA, ECC)
Key exchange (Diffie–Hellman key exchange, Shamir's no-key protocol)
Secret sharing (intersection of n-dimensional planes)
Cryptographic hash functions (SHA)
Some have proven insecure and were improved
DES, due to a much too small key space
MD5
and some are broken
Merkle–Hellman knapsack cryptosystem
Monoalphabetic substitution
Naive Vigenère
Which particular algorithm is chosen is often a question of available resources (elliptic curves need smaller keys than the RSA algorithm for comparable safety) or just of standardization (as tanascius pointed out, there are competitions for such algorithms). Totally new trends usually start when a whole class of cryptosystems has been shown to be vulnerable to a specific attack (man-in-the-middle, side-channel) or when scientific progress is made (quantum cryptography).
Of course, there is also steganography, which doesn't attempt to conceal the content but rather the existence of a secret message, by hiding it in other documents.
Currently there is the NIST hash function competition running with the goal to find a replacement for the older SHA-1 and SHA-2 functions. So this is about a cryptographic hash function.
You can have a look at the list of the accepted algorithms for round two, and you can get whitepapers to all of the algorithms taking part there.
I am not up-to-date, but I doubt that there are any completely new approaches for the algorithms.
EDIT: Well, one of the candidates was the Elliptic curve only hash, but it is listed under "Entrants with substantial weaknesses" ^^
People have covered most other things; I'll talk about design:
Block ciphers: Traditional block ciphers (e.g. DES) use a Feistel structure. There's a move to more general S-P networks (Rijndael, Serpent), which are easier to parallelize, and to cipher modes which support parallelization (CS, GCM) and efficient authenticated encryption (CS, GCM, IGE/BIGE).
Hashes: Traditional hashes use the obvious Merkle–Damgård construction. This has some undesirable properties:
Length extension is trivial (to some extent this is mitigated by proper finalization)
Several attacks on the collision resistance (multicollisions, Joux 2004; "expandable message" attacks, Kelsey 2005; herding attacks, Kelsey 2006).
In the popular Davies–Meyer mode used in MD{4,5} and SHA-{0,1,2}, where h_i = h_{i-1} XOR E(m_i, h_{i-1}), every message block has a fixed point h_{i-1} = D(m_i, 0). This makes the expandable-message attack much easier.
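To make the construction concrete, here is a toy Merkle–Damgård sketch in Python, using a stand-in compression function (not a real design) and MD strengthening (padding that encodes the message length):

```python
import hashlib

def toy_compress(h: bytes, block: bytes) -> bytes:
    # Stand-in compression function (NOT a real design): 8-byte chaining value.
    return hashlib.sha256(h + block).digest()[:8]

def merkle_damgard(msg: bytes, iv: bytes = bytes(8), block: int = 8) -> bytes:
    # MD strengthening: append 0x80, zero-pad, then an 8-byte message length.
    padded = msg + b"\x80"
    while (len(padded) + 8) % block:
        padded += b"\x00"
    padded += len(msg).to_bytes(8, "big")
    h = iv
    for i in range(0, len(padded), block):
        h = toy_compress(h, padded[i:i + block])   # chain block by block
    return h

assert merkle_damgard(b"abc") == merkle_damgard(b"abc")
assert merkle_damgard(b"abc") != merkle_damgard(b"abd")
```

The chaining structure is exactly what length extension exploits: anyone who knows the final h and the message length can keep feeding blocks without knowing the original message.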
The SHA-3 contest notice (070911510–7512–01) also suggests randomized hashing and resistance against length extension (currently achieved with HMAC for MD5/SHA-1/SHA-2) and parallelization (a few hashes specify a tree hashing mode).
There's a general trend towards avoiding potential table lookups (e.g. Threefish, XTEA) to mitigate cache timing attacks.

What Type of Random Number Generator is Used in the Casino Gaming Industry? [closed]

Given the extremely high requirements for unpredictability to prevent casinos from going bankrupt, what random number generation algorithm and seeding scheme is typically used in devices like slot machines, video poker machines, etc.?
EDIT: Related questions:
Do stateless random number generators exist?
True random number generator
Alternative Entropy Sources
There are many things that gaming sites have to consider when choosing/implementing an RNG. Without due diligence, it can go spectacularly wrong.
To get a licence to operate a gaming site in a particular jurisdiction usually requires that the RNG has been certified by an independent third-party. The third-party testers will analyse the source code and run statistical tests (e.g. Diehard) to ensure that the RNG behaves randomly. Reputable poker sites will usually include details of the certification that their RNG has undergone (for example: PokerStars's RNG page).
I've been involved in a few gaming projects, and for one of them I had to design and implement the RNG part, so I had to investigate all of these issues. Most poker sites will use some hardware device for entropy, but they won't rely on just hardware. Usually it will be used in conjunction with a pseudo-RNG (PRNG). There are two main reasons for this. Firstly, the hardware is slow, it can only extract a certain number of bits of entropy in a given time period from whatever physical process it is monitoring. Secondly, hardware fails in unpredictable ways that software PRNGs do not.
Fortuna is the state of the art in terms of cryptographically strong PRNGs. It can be fed entropy from one or more external sources (e.g. a hardware RNG) and is resilient in the face of attempted exploits or RNG hardware failure. It's a decent choice for gaming sites, though some might argue it is overkill.
Pokerroom.com used to just use Java's SecureRandom (they probably still do, but I couldn't find details on their site). This is mostly good enough, but it does suffer from the degrees of freedom problem.
Most stock RNG implementations (e.g. Mersenne Twister) do not have sufficient degrees of freedom to be able to generate every possible shuffle of a 52-card deck from a given initial state (this is something I tried to explain in a previous blog post).
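The arithmetic behind that claim: indexing every ordering of a 52-card deck takes log2(52!) ≈ 226 bits of entropy, far more than a 32-bit (or even 64-bit) seed provides.

```python
import math

# Entropy needed to index every possible ordering of a 52-card deck:
bits_needed = math.log2(math.factorial(52))
assert 225 < bits_needed < 226   # about 225.58 bits

# A PRNG seeded from a single 32-bit value can reach at most 2**32
# starting states, a vanishing fraction of the 52! possible shuffles.
fraction_reachable = 2.0**32 / float(math.factorial(52))
assert fraction_reachable < 1e-50
```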
EDIT: I've answered mostly in relation to online poker rooms and casinos, but the same considerations apply to physical video poker and video slots machines in real world casinos.
For casino gaming applications, I think the seeding of the algorithm is the most important part, to make sure all games "booted up" don't run through the same sequence or some small set of predictable sequences. That is, the source of entropy leading to the seed for the starting position is the critical thing. Beyond that, any good-quality random number generator where each bit position has a ~50/50 probability of being 1/0 and the period is relatively long would be sufficient. For example, something like the Mersenne Twister PRNG has such properties.
Using cryptographically secure random generators only becomes important when the actual output of the random generator can be viewed directly. For example, if you were monitoring each number actually generated, then after viewing many numbers in the sequence of a non-cryptographic generator, you can establish information about its entire internal state. At that point, if you know what the algorithm looks like, you would be able to predict future numbers, and that would be bad. The cryptographic generator prevents that reverse engineering back to the internal state, so that predicting future numbers becomes "impossible".
However, in the case of a casino game, you would (or should) have no visibility into the actual numbers being generated under the hood. Each time a random number is generated (say a 32-bit number), that number will be used, for example, mod 52 in a deck-shuffling algorithm. Nowhere in that process do you have any idea what numbers were generated by the algorithm to shuffle that deck. That is, most of the bits of "randomness" are simply thrown out, and even the ones being used are not visible to you. Therefore, there is no way to reverse-engineer the state.
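For illustration, here is a Fisher–Yates shuffle driven by raw 32-bit words. Note that a plain "mod 52" introduces a slight bias, which rejection sampling removes; the RNG source below is a stand-in for whatever generator the machine actually uses.

```python
import secrets

def unbiased_below(rng_u32, n: int) -> int:
    """Draw a uniform value in [0, n) from raw 32-bit words, rejecting
    the top partial range to avoid modulo bias."""
    limit = (2**32 // n) * n
    while True:
        r = rng_u32()
        if r < limit:
            return r % n

def shuffle_deck(rng_u32) -> list:
    """Fisher-Yates shuffle of a 52-card deck driven by a 32-bit RNG."""
    deck = list(range(52))
    for i in range(51, 0, -1):
        j = unbiased_below(rng_u32, i + 1)
        deck[i], deck[j] = deck[j], deck[i]
    return deck

# Example: raw words from the OS CSPRNG stand in for the machine's RNG.
deck = shuffle_deck(lambda: secrets.randbits(32))
assert sorted(deck) == list(range(52))
```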
Getting back to a true source of entropy to seed the whole process, that is the hard part. See the Wikipedia entry on entropy for some starting points on techniques.
As an aside, if you did want cryptographically secure random numbers from a "regular" algorithm, a simple approach is to take a few random numbers in sequence, concatenate them together, and then run something like MD5 or SHA-1 on them; the result is just as random and also cryptographically secure. That is, you have just made your own "secure" random number generator.
We've been using the Protego R210-USB TRNG (and the non-USB version before that) as a random seed generator in casino applications, with java.security.SecureRandom on top. We had the Swedish National Laboratory of Forensic Science perform a separate audit of the R210, and it passed without a flaw.
You probably need a cryptographically secure pseudo-random generator. There are a lot of variants. Google "Blum-Blum-Shub", for example.
The security properties of these pseudo-random generators will generally be that, even when the attacker can observe polynomially many outputs from such generators, it won't be feasible to guess the next output with a probability much better than random guessing. Also, it is not feasible to distinguish the output of such generators from truly random bits. The security holds even when all the algorithms and parameters are known by the attacker (except for the secret seed).
The security of the generators is often measured with respect to a security parameter. In the case of BBS, it is the size of the modulus. This is no different from other crypto stuff. For example, RSA is secure only when the key is long enough.
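A toy sketch of Blum-Blum-Shub to make the construction concrete (the primes here are illustrative only and vastly too small to be secure; a real modulus, the security parameter discussed above, would be hundreds of digits):

```python
def bbs_bits(seed: int, n_bits: int, p: int = 499, q: int = 547) -> list[int]:
    # Blum-Blum-Shub: repeatedly square and reduce modulo M = p*q, where
    # p and q are primes congruent to 3 mod 4. These toy primes are far
    # too small for real use.
    assert p % 4 == 3 and q % 4 == 3
    m = p * q
    x = seed % m
    out = []
    for _ in range(n_bits):
        x = (x * x) % m
        out.append(x & 1)  # emit only the least-significant bit per step
    return out

print(bbs_bits(seed=123456, n_bits=16))
```

Note how little of each internal state leaks per step (one bit), which is part of why these generators are slow compared to ordinary PRNGs.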
Note that the output of such generators may not be uniform (in fact, it can be far from uniform in a statistical sense). But since no one can distinguish the two distributions without infinite computing power, these generators suffice in most applications that require truly random bits.
Bear in mind, however, that these cryptographically secure pseudo-random generators are usually slow. So if speed is indeed a concern, less rigorous approaches may be more relevant, such as using hash functions, as suggested by Jeff.
Casino slot machines generate random numbers continuously at very high speed and use the most recent result(s) when the user pulls the lever (or hits the button) to spin the reels.
Even a simplistic generator can be used: even if you knew the algorithm, you could not observe where in the sequence it is, because nearly all the results are discarded. If somehow you did know where it was in the sequence, you'd need millisecond or better timing to take advantage of it.
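A sketch of that free-running scheme (class and method names are mine, for illustration):

```python
import random

class FreeRunningRNG:
    """Sketch of the slot-machine scheme described above: the generator
    advances continuously, and only the value present at the instant of
    the button press is used; everything in between is discarded."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._latest = self._rng.getrandbits(32)

    def tick(self) -> None:
        # Called continuously at high speed; the discarded results hide
        # the generator's position in its sequence from any observer.
        self._latest = self._rng.getrandbits(32)

    def press_button(self) -> int:
        return self._latest

machine = FreeRunningRNG()
for _ in range(10_000):          # generator spins between player actions
    machine.tick()
reel_stop = machine.press_button() % 22  # e.g. 22 stops on a reel
```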
Modern "mechanical reel" machines use PRNGs and drive the reels with stepper motors to simulate the old style spin-and-brake.
http://en.wikipedia.org/wiki/Slot_machine#Random_number_generators
http://entertainment.howstuffworks.com/slot-machine3.htm
Casinos shouldn't be using Pseudo-random number generators, they should be using hardware ones: http://en.wikipedia.org/wiki/Hardware_random_number_generator
I suppose anything goes for apps and offshore gambling nowadays, but all these other answers are incomplete, at least for Nevada Gaming Control Board licensed machines, which I believe the question is originally about.
The technical specifications for RNGs licensed in Nevada for gaming purposes are laid out in Regulation 14.040(2).
As of May 24, 2012, here is a summary of the rules a RNG must follow:
1. Static seeds cannot be used. You have to seed the RNG using a millisecond time source or another true entropy source which has no external readout anywhere on the machine. (This helps to reduce the incidence of "magic number" attacks.)
2. The RNG must continue generating numbers in its sequence at least 100 times per second when the game is not being played. (This helps to avoid timing attacks.)
3. RNG outputs cannot be reused; they must be used exactly once, if at all, and then thrown away.
4. Multi-system cabinets must use a separate RNG and separate seed for each game.
5. Games that use RNGs to help choose numbers on behalf of the player (such as Lotto Quick Pick) must use a separate RNG for that process.
6. Games must not roll RNGs until they are actually needed in a game (i.e., you need to wait until the player chooses to deal or spin before generating RNGs).
7. The RNG must pass a 95% confidence chi-squared test based on 10,000 trials as part of a system test. It must display a warning if this test fails, and it must disable play if it fails twice in a row.
8. It must remember, and be able to report on, the last 10 test results as described in 7.
9. Every possible game outcome must be generatable by the RNG. As a pessimal example, linear congruential generators don't actually generate every possible output in their range, so they're not very useful for gaming.
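A sketch of the 95% chi-squared uniformity check described above (the 52-outcome deck and the function name are mine, for illustration):

```python
import random

def chi_squared_stat(samples: list[int], n_outcomes: int) -> float:
    # Pearson chi-squared statistic against a uniform expectation
    expected = len(samples) / n_outcomes
    counts = [0] * n_outcomes
    for s in samples:
        counts[s] += 1
    return sum((c - expected) ** 2 / expected for c in counts)

rng = random.Random()
samples = [rng.randrange(52) for _ in range(10_000)]  # e.g. card draws
stat = chi_squared_stat(samples, 52)
# At 95% confidence with 51 degrees of freedom the critical value is
# roughly 68.7; a healthy RNG should usually come in below that.
print(stat)
```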
Additionally, your machine design has to be submitted to the gaming commission and it has to be approved, which is expensive and takes lots of time. There are a few third-party companies that specialize in auditing your new RNG to make sure it's random. Gaming Laboratories publishes an even stricter set of standards than Nevada does. They go into much greater detail about the limitations of hardware RNGs, and Nevada in particular likes to see core RNGs that it's previously approved. This can all get very expensive, which is why many developers prefer to license an existing previously-approved RNG for new game projects.
Here is a fun list of random number generator attacks to keep you up late at night.
For super-nerds only: the source for most USB hardware RNGs is typically an avalanche diode. However, the thermal noise produced by this type of diode is not quantum-random, and it is possible to influence the randomness of an avalanche diode by significantly lowering the temperature.
As a final note, someone above recommended just using a Mersenne Twister for random number generation. This is a Bad Idea unless you are taking additional entropy from some other source. The plain vanilla Mersenne Twister is highly inappropriate for gaming and cryptographic applications, as described by its creator.
If you want to do it properly you have to get physical - ERNIE, the UK national savings number picker, uses shot noise in neon tubes.
I have certainly seen a German gambling machine that was not allowed to be run commercially after a given date, so I suppose it used a PRNG with a long one-time-pad seed list.
Most poker sites use hardware random number generators. They also modify the output to remove any scaling bias, and often use 'pots' of numbers which can be 'stirred' using entropic events (user activity, server I/O events, etc.). Quite often the resultant numbers just index pre-generated decks (starting off as a sorted list of cards).
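The scaling-bias removal mentioned above is typically done by rejection sampling; a minimal sketch (the function name is mine, and `secrets` stands in for the hardware source):

```python
import secrets

def unbiased_index(n: int) -> int:
    """Map random bytes to a uniform index in [0, n) by rejecting values
    that would introduce modulo (scaling) bias; works for n <= 256."""
    assert 1 <= n <= 256
    limit = (256 // n) * n  # largest multiple of n that fits in one byte
    while True:
        b = secrets.token_bytes(1)[0]
        if b < limit:        # reject the biased tail, then reduce
            return b % n

# e.g. picking a card position in a 52-card deck without bias
print(unbiased_index(52))
```

Without the rejection step, a plain `b % 52` would make low card indices slightly more likely, since 256 is not a multiple of 52.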

Resources