two-way keyed encryption/hash algorithm - algorithm

I am no way experienced in this type of thing so I am not even sure of the keywords (hence the title).
Basically I need a two way function
encrypt(w,x,y) = z
decrypt(z) = w, x, y
Where w = integer
x = string (username)
y = unix timestamp
and z = is an 8 digit number (possibly including letters, spec isn't there yet.)
I would like z to be not easily guessable and easily verifiable. Speed isn't a huge concern, security isn't either. Tracking one-to-one relationship is the main requirement.
Any resources or direction would be appreciated.
EDIT
Thanks for the answers, learning a lot. So to clarify, 8 characters is the only hard requirement, along with the ability to link W <-> Z. The username (Y) and timestamp (Z) would be considered icing on the cake.
I would like to do this mathematically rather than doing some database looks up, if possible.
If i had to finish this tonight, I could just find a fitting hash algorithm and use a look up table. I am simply trying to expand my understanding of this type of thing and see if I could do it mathematically.

Encryption vs. Hashing
This is an encryption problem, since the original information needs to be recovered. The quality of a cryptographic hash is judged by how difficult it is to reverse the hash and recover the original information, so hashing is not applicable here.
To perform encryption, some key material is needed. There are many encryption algorithms, but they fall into two main groups: symmetric and asymmetric.
Application
The application here isn't clear. But if you are "encrypting" some information and sending it somewhere, then later getting it back and doing something with it, symmetric encryption is the way to go. For example, say you want to encode a user name, an IP address, and some identifier from your application in a parameter that you include in a link in some HTML. When the user clicks the link, that parameter is passed back to your application and you decode it to recover the original information. That's a great fit for symmetric encryption, because the sender and the recipient are the same party, and key exchange is a no-op.
Background
In symmetric encryption, the sender and recipient need to know the same key, but keep it secret from everyone else. As a simple example, two people could meet in person, and decide on a password. Later on, they could use that password to keep their email to each other private. However, anyone who overhears the password exchange will be able to spy on them; the exchange has to happen over a secure channel... but if you had a secure channel to begin with, you wouldn't need to exchange a new password.
In asymmetric encryption, each party creates a pair of keys. One is public, and can be freely distributed to anyone who wants to send a private message. The other is private. Only the message recipient knows that private key.
A big advantage to symmetric encryption is that it is fast. All well-designed protocols use a symmetric algorithm to encrypt large amounts of data. The downside is that it can be difficult to exchange keys securely—what if you can't "meet up" (virtually or physically) in a secure place to agree on a password?
Since public keys can be freely shared, two people can exchange a private message over an insecure channel without having previously agreed on a key. However, asymmetric encryption is much slower, so its usually used to encrypt a symmetric key or perform "key agreement" for a symmetric cipher. SSL and most cryptographic protocols go through a handshake where asymmetric encryption is used to set up a symmetric key, which is used to protect the rest of the conversation.

You just need to encrypt a serialization of (w, x, y) with a private key. Use the same private key to decrypt it.
In any case, the size of z cannot be simply bounded like you did, since it depends on the size of the serialization (since it needs to be two way, there's a bound on the compression you can do, depending on the entropy).
And you are not looking for a hash function, since it would obviously lose some information and you wouldn't be able to reverse it.
EDIT: Since the size of z is a hard limit, you need to restrict the input to 8 bytes, and choose a encryption technique that use 64 bits (or less) block size. Blowfish and Triple DES use 64 bits blocks, but remember that those algorithms didn't receive the same scrutiny as AES.
If you want something really simple and quite unsecure, just xor your input with a secret key.

You probably can't.
Let's say that w is 32 bits, x supports at least 8 case-insensitive ASCII chars, so at least 37 bits, and y is 32 bits (gets you to 2038, and 31 bits doesn't even get you to now).
So, that's a total of at least 101 bits of data. You're trying to store it in an 8 digit number. It's mathematically impossible to create an invertible function from a larger set to a smaller set, so you'd need to store more than 12.5 bits per "digit".
Of course if you go to more than 8 characters, or if your characters are 16 bit unicode, then you're at least in with a chance.

Let's formalize your problem, to better study it.
Let k be a key from the set K of possible keys, and (w, x, y) a piece of information, from a set I, that we need to crypt. Let's define the set of "crypted-messages" as A8, where A is the alphabet from which we extract the characters to our crypted message (A = {0, 1, ..., 9, a, b, ..., z, ... }, depending on your specs, as you said).
We define the two functions:
crypt: I * K --> A^8.
decrypt A^8 * K --> I
The problem here is that the size of the set A^8, of crypted-messages, might be smaller than the set of pieces of information (w, x, y). If this is so, it is simply impossible to achieve what you are looking for, unless we try something different...
Let's say that only YOU (or your server, or your application on your server) have to be able to calculate (w, x, y) from z. That is, you might send z to someone, and you don't care that they will not be able to decrypt it.
In this case, what you can do is use a database on your server. You will crypt the information using a well-known algorithm, than you generate a random number z. You define the table:
Id: char[8]
CryptedInformation: byte[]
You will then store z on the Id column, and the crypted information on the corresponding column.
When you need to decrypt the information, someone will give you z, the index of the crypted information, and then you can proceed to decryption.
However, if this works for you, you might not even need to crypt the information, you could have a table:
Id: char[8]
Integer: int
Username: char[]
Timestamp: DateTime
And use the same method, without crypting anything.
This can be applied to an "e-mail verification system" on a subscription process, for example. The link you would send to the user by mail would contain z.
Hope this helps.

I can't tell if you are trying to set this up a way to store passwords, but if you are, you should not use a two way hash function.
If you really want to do what you described, you should just concatenate the string and the timestamp (fill in extra spaces with underscores or something). Take that resulting string, convert it to ASCII or UTF-8 or something, and find its value modulo the largest prime less than 10^8.

Encryption or no encryption, I do not think it is possible to pack that much information into an 8 digit number in such a way that you will ever be able to get it out again.
An integer is 4 bytes. Let's assume your username is limited to 8 characters, and that characters are bytes. Then the timestamp is at least another 4 bytes. That's 16 bytes right there. In hex, that will take 32 digits. Base36 or something will be less, but it's not going to be anywhere near 8.

Hashes by definition are one way only, once hashed, it is very difficult to get the original value back again.
For 2 way encryption i would look at TripleDES which .net has baked right in with TripleDESCryptoServiceProvider.
A fairly straight forward implementation article.
EDIT
It has been mentioned below that you can not cram a lot of information into a small encrypted value. However, for many (not all) situations this is exactly what Bit Masks exist to solve.

Related

Will this obfuscation algorithm for a URL shortener work?

DISCLAIMER: I am not asking how to make a URL shortener (I have already implemented the "bijective function" answer found HERE that uses a base-62 encoded string). Instead, I want to expand this implementation to obfuscate the generated string so that it is both:
A) not an easily guessable sequence, and
B) still bijective.
You can easily randomize your base-62 character set, but the problem is that it still increments like any other number in any other base. For example, one possible incremental progression might be {aX9fgE, aX9fg3, aX9fgf, aX9fgR, … ,}
I have come up with an obfuscation technique that I am pleased with in terms of requirement A), but I'm only partially sure that it satisfies B). The idea is this:
The only thing that is guaranteed to change in the incremental approach is the "1's place" (I'll use decimal terminology for practicality reasons). In the sample progression I gave earlier, that would be {E, 3, f, R, …}. So if each character in the base-62 set had its own unique offset number (say, its distance from the "zero character"), then you could apply the offset of the "1's place" character to the rest of the string.
For instance, let's assume a base-5 set with characters {A, f, 9, p, Z, 3} (in ascending order from 0 to 5). Each one would then have a unique offset of 0 to 5 respectively. Counting would look like {A, f, 9, p, Z, 3, fA, ff, f9, fp, …} and so on. So the algorithm, when given a value of fZ3p, would look at the p and, having an offset of +3, would permute the string into Zf9p (assuming the base-5 set is a circular array). The next incremental number would be fZ3Z, and with Z's offset being +4, the algorithm returns 39pZ. These permutated results would be handed off to the user as his/her "unique URL", who would never see the actual base-62 encoded string.
This approach certainly seems reversible; just look at the last character, and perform the same permutation with the negative offset. And I'm thinking that for this reason, it has to still be bijective. But I don't know if this is necessarily true? Are there any edge/corner cases I'm not considering?
EDIT : My intentions are more heavily weighed towards the length of the shortened-URL rather than the security of the pattern. I realize there are plenty of solutions involving cryptographic functions, block ciphers, etc. But I would like to emphasize that I am not asking the best way to achieve A), but rather, "is my offset-approach satisfying B)".
Any holes you can find would be appreciated.
If you honestly want them to be hard to guess, keep it simple.
Start with a normal encryption algorithm running in counter mode. When you get a URL to shorten, increment your counter, encrypt it, convert the result to something using printable characters (e.g., base 64) and put the original URL and the shortened version into your table so you can get the original URL from the shortened version when needed.
The only real question at that point is what encryption algorithm to use. That, in turn, depends on your threat model. I don't see exactly what you gain by making shortened URLs hard to guess, so I'm a bit uncertain about the threat model.
If you want to make it mildly difficult to guess, you could use something like a 40-bit version of RC4. This is pretty easy to break, but enough to keep most people from bothering.
If you want a bit more security, you could step up to DES. That's been broken, but even at this late date breaking it is quite a bit of work.
If you want more security than that, you can use AES.
Note that as you increase the security, the shortened URL gets longer though. RC4-40 starts out with 5 bytes, DES 7 bytes, and AES with 32 bytes. Depending on how you convert to printable text, that's going to expand at least a little.
Another option is to use the Luby-Rackoff construction (see also here), which is a way to generate a pseudo-random permutation from a pseudo-random function.
You just have to pick a "round function" F. F must take as input a key K and a block of bits half the size of what you are encoding. F must produce as output a block of bits also half the size of whatever you are encoding.
Then you just run the Luby-Rackoff construction (aka. "Feistel network") for four rounds, each round using a different K.
The construction guarantees that the result is a bijective map, and it will be hard to invert provided that F is hard to invert.
I tried to solve the same problem (in php) and ended up with those functions:
hashing the row id with a kind of feistel algo
applying a bijective function to compress the integer
So for the A): it's not easily guessable (to me) as you cant increment a string to get the next record without an algo
And for the B): for what I understand it's 100% bijective.
Thanks to #Nemo for naming the feistel network, which lead me to the first function i have linked to.
If you're trying to avoid people crawling the URLs, I think Nick Johnson has the right idea, that you need to make sure your URL space is not dense.
Here's a simple idea: take your URL, and prepend a few random characters to it. Then run it through a compression algorithm -- I'd try range encoding (you can probably specify the basis if you find a good library). This should be decompressible to the original form, and should both impact locality and make the encoded space more sparse.
That said, I imagine nearly all URL shorteners out there keep a hash table with state on the server side. How else are you going to losslessly compress a hundred-character URL into 5 or 6 characters?

Simple integer encryption

Is there a simple algorithm to encrypt integers? That is, a function E(i,k) that accepts an n-bit integer and a key (of any type) and produces another, unrelated n-bit integer that, when fed into a second function D(E(i),k) (along with the key) produces the original integer?
Obviously there are some simple reversible operations you can perform, but they all seem to produce clearly related outputs (e.g. consecutive inputs lead to consecutive outputs). Also, of course, there are cryptographically strong standard algorithms, but they don't produce small enough outputs (e.g. 32-bit). I know any 32-bit cryptography can be brute-forced, but I'm not looking for something cryptographically strong, just something that looks random. Theoretically speaking it should be possible; after all, I could just create a dictionary by randomly pairing every integer. But I was hoping for something a little less memory-intensive.
Edit: Thanks for the answers. Simple XOR solutions will not work because similar inputs will produce similar outputs.
Would not this amount to a Block Cipher of block size = 32 bits ?
Not very popular, because it's easy to break. But theorically feasible.
Here is one implementation in Perl :
http://metacpan.org/pod/Crypt::Skip32
UPDATE: See also Format preserving encryption
UPDATE 2: RC5 supports 32-64-128 bits for its block size
I wrote an article some time ago about how to generate a 'cryptographically secure permutation' from a block cipher, which sounds like what you want. It covers using folding to reduce the size of a block cipher, and a trick for dealing with non-power-of-2 ranges.
A simple one:
rand = new Random(k);
return (i xor rand.Next())
(the point xor-ing with rand.Next() rather than k is that otherwise, given i and E(i,k), you can get k by k = i xor E(i,k))
Ayden is an algorithm that I developed. It is compact, fast and looks very secure. It is currently available for 32 and 64 bit integers. It is on public domain and you can get it from http://github.com/msotoodeh/integer-encoder.
You could take an n-bit hash of your key (assuming it's private) and XOR that hash with the original integer to encrypt, and with the encrypted integer to decrypt.
Probably not cryptographically solid, but depending on your requirements, may be sufficient.
If you just want to look random and don't care about security, how about just swapping bits around. You could simply reverse the bit string, so the high bit becomes the low bit, second highest, second lowest, etc, or you could do some other random permutation (eg 1 to 4, 2 to 7 3 to 1, etc.
How about XORing it with a prime or two? Swapping bits around seems very random when trying to analyze it.
Try something along the lines of XORing it with a prime and itself after bit shifting.
How many integers do you want to encrypt? How much key data do you want to have to deal with?
If you have few items to encrypt, and you're willing to deal with key data that's just as long as the data you want to encrypt, then the one-time-pad is super simple (just an XOR operation) and mathematically unbreakable.
The drawback is that the problem of keeping the key secret is about as large as the problem of keeping your data secret.
It also has the flaw (that is run into time and again whenever someone decides to try to use it) that if you take any shortcuts - like using a non-random key or the common one of using a limited length key and recycling it - that it becomes about the weakest cipher in existence. Well, maybe ROT13 is weaker.
But in all seriousness, if you're encrypting an integer, what are you going to do with the key no matter which cipher you decide on? Keeping the key secret will be a problem about as big (or bigger) than keeping the integer secret. And if you're encrypting a bunch of integers, just use a standard, peer reviewed cipher like you'll find in many crypto libraries.
RC4 will produce as little output as you want, since it's a stream cipher.
XOR it with /dev/random

Generating unique N-valued key

I want to generate unique random, N-valued key.
This key can contain numbers and latin characters, i.e. A-Za-z0-9.
The only solution I am thinking about is something like this (pseudocode):
key = "";
smb = "ABC…abc…0123456789"; // allowed symbols
for (i = 0; i < N; i++) {
key += smb[rnd(0, smb.length() - 1)]; // select symbol at random position
}
Is there any better solution? What can you suggest?
I would look into GUIDs. From the Wikipedia entry, "the primary purpose of the GUID is to have a totally unique number," which sounds exactly like what you are looking for. There are several implementations out there that generate GUIDs, so it's likely you will not have to reinvent the wheel.
Keeping in mind that the whole field of cryptography relies on, amongst other things, making random numbers. Therefore the NSA, the CIA, and some of the best mathematicians in the world are working on this so I guarantee you that there are better ideas.
Me? I'd just do what fbrereto suggests, and just get a guid. Or look into cryptographic key generators, or y'know, some lava lamps and a camera.
Oh, and as to the code you have; depending on the language, you may need to seed the RNG, or it'll generate the same key every time.
Whatever you do, if you wind up generating a key that uses all numbers and all letters, and if a person is ever going to see that key (which is likely if you are using numbers and letters), omit the characters l, I, 1, O, and 0. People get them confused.
Nothing in your post addresses the question of uniqueness. You're going to have to have some way of not generating the same key twice. Usually, when I need a unique key, I have some unique information to start with. I usually take a one-way hash like MD5, then there are ways to convert that to a key with varying degrees of readability:
Convert to hex
Base64 encode it
Use bits of of the key to index into a list of words.
Example: the unique string computed by hashing the part of this answer above the horizontal line is
abduction's brogue's melted bragger's
You could do a base64 encoding of some random data and remove the +, /, and = characters from the result? I don't know if this would make a predictable distribution. Also, it seems like more work that what you're doing now, which is a fine solution.
Assuming you're using a language/library without an utterly pathetic random number generator, what you've got looks pretty good. N symbols uniformly distributed over a reasonable alphabet works for me, and no amount of applying fancier code is likely to make it more random (just slower).
(For the record, pathetic would include ditching the high-order bits of the underlying random numbers when choosing a value from the given range. While ideally all RNGs would make every bit equally random, in practice that's not so; the higher-order bits tend to be more random. This means that the modulus operator is totally the wrong thing to use when clamping to a restricted range.)

Encryption algorithm that output byte by byte based on password and offset

Is there a well-known (to be considered) algorithm that can encrypt/decrypt any arbitrary byte inside the file based on the password entered and the offset inside the file.
(Databyte, Offset, Password) => EncryptedByte
(EncryptedByte, Offset, Password) => DataByte
And is there some fundamental weakness in this approach or it's still theoretically possible to build it strong enough
Update:
More datails: Any cryptographic algorithm has input and output. For many existing ones the input operates on large blocks. I want to operate on only one byte, but the system based on this can only can remap bytes and weak by default, but if we take the position in the file of this byte, we for example can take the bits of this position value to interpret them as some operation on some step (0: xor, 1: shitf) and create the encrypted byte with this. But it's too simple, I'm looking for something stronger.
Maybe it's not very efficient but how about this:
for encryption use:
encryptedDataByte = Encrypt(offset,key) ^ dataByte
for decryption use:
dataByte = Encrypt(offset,key) ^ encryptedDataByte
Where Encrypt(offset,key) might be e.g. 3DES or AES (with padding the offset, if needed, and throwing away all but one result bytes)
If you can live with block sizes of 16 byte, you can try the XTS-mode described in the wikipedia article about Disk encryption theory (the advantage being that some good cryptologists already looked at it).
If you really need byte-wise encryption, I doubt that there is an established solution. In the conference Crypto 2009 there was a talk about How to Encipher Messages on a Small Domain: Deterministic Encryption and the Thorp Shuffle. In your case the domain is a byte, and as this is a power of 2, a Thorp Shuffle corresponds to a maximally unbalanced Feistel network. Maybe one can build something using the position and the password as key, but I'd be surprised if a home-made solution will be secure.
You can use AES in Counter Mode where you divide your input into blocks of 16 bytes (128 bits) and then basically encrypt a counter on the block number to get a pseudo-random 16 bytes that you can XOR with the plaintext. It is critically important to not use the same counter start value (and/or initialization vector) for the same key ever again or you will open yourself for an easy attack where an attacker can use a simple xor to recover the key.
You mention that you want to only operate on individual bytes, but this approach would give you that flexibility. Output Feedback Mode is another common one, but you have to be careful in its use.
You might consider using the EAX mode for better security. Also, make sure you're using something like PBKDF-2 or scrypt to generate your encryption key from the password.
However, as with most cryptography related issues, it's much better to use a rigorously tested and evaluated library rather than rolling your own.
Basically what you need to do is generate some value X (probably 1 byte) based on the offset and password, and use this to encrypt/decrypt the byte at that offset. We'll call it
X = f(offset,password)
The problem is that an attacker that "knows something" about the file contents (e.g. the file is English text, or a JPEG) can come up with an estimate (or sometimes be certain) of what an X could be. So he has a "rough idea" about many X values, and for each of these he knows what the offset is. There is a lot of information available.
Now, it would be nice if all that information were of little use to the attacker. For most purposes, using a cryptographic hash function (like SHA-1) will give you a reasonable assurance of decent security.
But I must stress that if this is something critical, consult an expert.
One possibility is a One Time Pad, possibly using the password to seed some pseudo-random number generator. One time pads theoretically achieve perfect secrecy, but there are some caveats. It should do what you're looking for though.

How to generate a verification code/number?

I'm working on an application where users have to make a call and type a verification number with the keypad of their phone.
I would like to be able to detect if the number they type is correct or not. The phone system does not have access to a list of valid numbers, but instead, it will validate the number against an algorithm (like a credit card number).
Here are some of the requirements :
It must be difficult to type a valid random code
It must be difficult to have a valid code if I make a typo (transposition of digits, wrong digit)
I must have a reasonable number of possible combinations (let's say 1M)
The code must be as short as possible, to avoid errors from the user
Given these requirements, how would you generate such a number?
EDIT :
#Haaked: The code has to be numerical because the user types it with its phone.
#matt b: On the first step, the code is displayed on a Web page, the second step is to call and type in the code. I don't know the user's phone number.
Followup : I've found several algorithms to check the validity of numbers (See this interesting Google Code project : checkDigits).
After some research, I think I'll go with the ISO 7064 Mod 97,10 formula. It seems pretty solid as it is used to validate IBAN (International Bank Account Number).
The formula is very simple:
Take a number : 123456
Apply the following formula to obtain the 2 digits checksum : mod(98 - mod(number * 100, 97), 97) => 76
Concat number and checksum to obtain the code => 12345676
To validate a code, verify that mod(code, 97) == 1
Test :
mod(12345676, 97) = 1 => GOOD
mod(21345676, 97) = 50 => BAD !
mod(12345678, 97) = 10 => BAD !
Apparently, this algorithm catches most of the errors.
Another interesting option was the Verhoeff algorithm. It has only one verification digit and is more difficult to implement (compared to the simple formula above).
For 1M combinations you'll need 6 digits. To make sure that there aren't any accidentally valid codes, I suggest 9 digits with a 1/1000 chance that a random code works. I'd also suggest using another digit (10 total) to perform an integrity check. As far as distribution patterns, random will suffice and the check digit will ensure that a single error will not result in a correct code.
Edit: Apparently I didn't fully read your request. Using a credit card number, you could perform a hash on it (MD5 or SHA1 or something similar). You then truncate at an appropriate spot (for example 9 characters) and convert to base 10. Then you add the check digit(s) and this should more or less work for your purposes.
You want to segment your code. Part of it should be a 16-bit CRC of the rest of the code.
If all you want is a verification number then just use a sequence number (assuming you have a single point of generation). That way you know you are not getting duplicates.
Then you prefix the sequence with a CRC-16 of that sequence number AND some private key. You can use anything for the private key, as long as you keep it private. Make it something big, at least a GUID, but it could be the text to War and Peace from project Gutenberg. Just needs to be secret and constant. Having a private key prevents people from being able to forge a key, but using a 16 bit CR makes it easier to break.
To validate you just split the number into its two parts, and then take a CRC-16 of the sequence number and the private key.
If you want to obscure the sequential portion more, then split the CRC in two parts. Put 3 digits at the front and 2 at the back of the sequence (zero pad so the length of the CRC is consistent).
This method allows you to start with smaller keys too. The first 10 keys will be 6 digits.
Does it have to be only numbers? You could create a random number between 1 and 1M (I'd suggest even higher though) and then Base32 encode it. The next thing you need to do is Hash that value (using a secret salt value) and base32 encode the hash. Then append the two strings together, perhaps separated by the dash.
That way, you can verify the incoming code algorithmically. You just take the left side of the code, hash it using your secret salt, and compare that value to the right side of the code.
I must have a reasonnable number of possible combinations (let's say 1M)
The code must be as short as possible, to avoid errors from the user
Well, if you want it to have at least one million combinations, then you need at least six digits. Is that short enough?
When you are creating the verification code, do you have access to the caller's phone number?
If so I would use the caller's phone number and run it through some sort of hashing function so that you can guarantee that the verification code you gave to the caller in step 1 is the same one that they are entering in step 2 (to make sure they aren't using a friend's validation code or they simply made a very lucky guess).
About the hashing, I'm not sure if it's possible to take a 10 digit number and come out with a hash result that would be < 10 digits (I guess you'd have to live with a certain amount of collision) but I think this would help ensure the user is who they say they are.
Of course this won't work if the phone number used in step 1 is different than the one they are calling from in step 2.
Assuming you already know how to detect which key the user hit, this should be doable reasonably easily. In the security world, there is the notion of a "one time" password. This is sometimes referred to as a "disposable password." Normally these are restricted to the (easily typable) ASCII values. So, [a-zA-z0-9] and a bunch of easily typable symbols. like comma, period, semi colon, and parenthesis. In your case, though, you'd probably want to limit the range to [0-9] and possibly include * and #.
I am unable to explain all the technical details of how these one-time codes are generated (or work) adequately. There is some intermediate math behind it, which I'd butcher without first reviewing it myself. Suffice it to say that you use an algorithm to generate a stream of one time passwords. No matter how mnay previous codes you know, the subsequent one should be impossibel to guess! In your case, you'll simply use each password on the list as the user's random code.
Rather than fail at explaining the details of the implementation myself, I'll direct you to a 9 page article where you can read up on it youself: https://www.grc.com/ppp.htm
It sounds like you have the unspoken requirement that it must be quickly determined, via algorithm, that the code is valid. This would rule out you simply handing out a list of one time pad numbers.
There are several ways people have done this in the past.
Make a public key and private key. Encode the numbers 0-999,999 using the private key, and hand out the results. You'll need to throw in some random numbers to make the result come out to the longer version, and you'll have to convert the result from base 64 to base 10. When you get a number entered, convert it back to base64, apply the private key, and see if the intereting numbers are under 1,000,000 (discard the random numbers).
Use a reversible hash function
Use the first million numbers from a PRN seeded at a specific value. The "checking" function can get the seed, and know that the next million values are good. It can either generate them each time and check one by one when a code is received, or on program startup store them all in a table, sorted, and then use binary search (maximum of compares) since one million integers is not a whole lot of space.
There are a bunch of other options, but these are common and easy to implement.
-Adam
You linked to the check digits project, and using the "encode" function seems like a good solution. It says:
encode may throw an exception if 'bad' data (e.g. non-numeric) is passed to it, while verify only returns true or false. The idea here is that encode normally gets it's data from 'trusted' internal sources (a database key for instance), so it should be pretty usual, in fact, exceptional that bad data is being passed in.
So it sounds like you could pass the encode function a database key (5 digits, for instance) and you could get a number out that would meet your requirements.

Resources