Algorithm for unique CD-KEY generation with validation

I am trying to create a unique CD-KEY to put in our product's box, just like a normal CD-KEY found in standard software boxes that users use to register the product.
However, we are not selling software; we are selling DNA collection kits for criminal and medical purposes. Users will receive a saliva collection kit by mail with the CD-KEY on it, and they will use that CD-KEY to create an account on our website and get their results. The results from the test will be linked to the CD-KEY. This is the only way we will have to link the results to the patients, so it is important that it does not fail :)
One of the requirements is that the CD-KEYs must be sufficiently "spread" apart that there is no possibility of someone entering an incorrect CD-KEY and still having it approved for someone else's kit, thereby mixing up two kits. That could cost us thousands of dollars in liability.
For example, it cannot be an incremental sequence of numbers such as
00001
00002
00003
...
The reason is that if someone receives kit 00002 but registers it as 00003 by accident, his results will be matched to someone else's kit. So it must be like credit card numbers... unless a valid sequence is entered, your chance of randomly hitting a valid number is 1 in a million...
Also, we are selling over 50,000 kits annually to various providers (who will generate their own CD-KEYs using our algorithm), so we cannot maintain a list of all previously issued CD-KEYs to check for duplicates. The algorithm must generate unique CD-KEYs.
We also require the ability to verify that a CD-KEY is valid using a quick check algorithm, so that we can inform the user if the code he enters is invalid. This rules out many hashing or MD5 algorithms, I believe. And it cannot be a 128-bit key, because who would take the time to type that out?
So far this is what I was thinking the final CD-KEY structure would look like
(4 char product code) - (4 char reseller code) - (12 char unique, verifiable CD-KEY)
Ex. 384A - GTLD - {4565 - FR54 - EDF3}
To ensure the uniqueness of the keys, I could include the current date (20090521) as part of the source. We won't generate unique keys more than once a week, so this value changes often enough to serve as a unique initial value.
What possible algorithm can I use to generate the unique keys?

Create the strings <providername>000001, <providername>000002, etc. or whatever and encrypt them with a public key, and that's your "CD-KEY" that the user enters. Decrypt the CD-KEY with the private key and validate that when decrypted you get a valid string with a valid provider name.
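A minimal sketch of the encrypt-then-decrypt idea, with one loudly stated swap: Python's standard library has no public-key cipher, so this uses a small keyed Feistel network (a symmetric, format-preserving construction built from HMAC-SHA256) instead. That means every party that issues keys would need the secret, unlike the public-key version. The 60-bit layout (20 zero "tag" bits, a 12-bit provider number, a 28-bit counter), the 8 rounds, and the Crockford-style base32 alphabet are all assumptions chosen so the key fits the question's 12 characters. A mistyped key decrypts to an essentially random block, which passes the tag check with probability about 1 in 2^20 (roughly one in a million).

```python
import hashlib
import hmac

ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"  # Crockford-style base32: no I, L, O, U
HALF_BITS = 30                                # 60-bit block = two 30-bit halves
HALF_MASK = (1 << HALF_BITS) - 1
ROUNDS = 8

def _round(secret: bytes, i: int, half: int) -> int:
    """Feistel round function: HMAC-SHA256 of (round index, half), cut to 30 bits."""
    digest = hmac.new(secret, bytes([i]) + half.to_bytes(4, "big"), hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big") & HALF_MASK

def encrypt60(secret: bytes, block: int) -> int:
    left, right = block >> HALF_BITS, block & HALF_MASK
    for i in range(ROUNDS):
        left, right = right, left ^ _round(secret, i, right)
    return (left << HALF_BITS) | right

def decrypt60(secret: bytes, block: int) -> int:
    left, right = block >> HALF_BITS, block & HALF_MASK
    for i in reversed(range(ROUNDS)):
        left, right = right ^ _round(secret, i, left), left
    return (left << HALF_BITS) | right

def make_key(secret: bytes, provider: int, counter: int) -> str:
    """Encrypt (tag=0 | provider | counter) and print it as 12 base32 characters."""
    if not (0 <= provider < (1 << 12) and 0 <= counter < (1 << 28)):
        raise ValueError("provider or counter out of range")
    block = encrypt60(secret, (provider << 28) | counter)
    return "".join(ALPHABET[(block >> shift) & 31] for shift in range(55, -1, -5))

def check_key(secret: bytes, text: str):
    """Return (provider, counter) for a valid key, or None."""
    text = text.replace("-", "").strip().upper()
    if len(text) != 12 or any(c not in ALPHABET for c in text):
        return None
    block = 0
    for c in text:
        block = (block << 5) | ALPHABET.index(c)
    plain = decrypt60(secret, block)
    if plain >> 40:                      # the 20 tag bits must all be zero
        return None
    return plain >> 28, plain & ((1 << 28) - 1)
```

Because a Feistel network is a permutation, sequential counters still yield distinct keys, but the keys look unrelated to each other.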

Credit card numbers use the Luhn algorithm; you might want to look at something similar to that.
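For reference, a minimal Luhn sketch over decimal digits. Keep in mind that Luhn only guarantees detection of single-digit errors and of most adjacent transpositions (it misses 09 ↔ 90); the "1 in a million" separation the question asks for has to come from issuing only a sparse subset of the possible numbers, not from the check digit itself.

```python
def luhn_check_digit(payload: str) -> str:
    """Digit that makes payload + digit pass the Luhn check."""
    total = 0
    # Right to left: the rightmost payload digit is doubled, because the
    # check digit will be appended to its right.
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)

def luhn_valid(number: str) -> bool:
    """True if the full number (check digit included) passes Luhn."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0
```

For example, `luhn_check_digit("7992739871")` gives `"3"`, and `"79927398713"` validates.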

I use SeriousBit Ellipter for software protection, but I don't see any reason you couldn't generate a group of unique keys each week and use the library to verify the key's validity when it is entered into your web site. You can also encode optional services into the key, allowing you to control from the key how the sample is processed (if you have different service levels).
As it uses an encrypted method of key generation in the first place and it's relatively cheap, it's certainly worth a look I would say.

I finally settled on a CD-KEY of this form
<TIMESTAMP>-<incremented number>-<8 char MD5 hash>-<checksumdigit>
I used the mod 11 ISBN checksum digit algorithm.
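A sketch of a key in that form. The answer does not say what the MD5 is computed over, so hashing a secret together with the timestamp and counter is an assumption, as is computing the mod 11 check digit over the decimal parts only (ISBN's rule is defined for decimal digits). For inputs longer than ISBN's 9 digits, the weights here cycle through 2 to 10 so that none is a multiple of 11; that extension is also an assumption.

```python
import hashlib
import hmac

def mod11_check_digit(digits: str) -> str:
    """ISBN-10-style weighted mod 11 check digit; a value of 10 is written 'X'.

    Weights run 2, 3, ..., 10 from the right (exactly ISBN-10 for 9 digits)
    and then wrap back to 2 for longer inputs.
    """
    total = sum(int(ch) * (2 + i % 9) for i, ch in enumerate(reversed(digits)))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)

def make_cd_key(secret: str, timestamp: str, counter: int) -> str:
    """<TIMESTAMP>-<counter>-<8 char MD5>-<check digit> (hash input is hypothetical)."""
    number = f"{counter:06d}"
    digest = hashlib.md5(f"{secret}{timestamp}{number}".encode()).hexdigest()[:8].upper()
    return f"{timestamp}-{number}-{digest}-{mod11_check_digit(timestamp + number)}"

def verify_cd_key(secret: str, cd_key: str) -> bool:
    parts = cd_key.strip().upper().split("-")
    if len(parts) != 4 or not (parts[0] + parts[1]).isdecimal():
        return False
    timestamp, number, _digest, check = parts
    # Cheap typo check first; it needs no secret.
    if mod11_check_digit(timestamp + number) != check:
        return False
    # Then recompute the hash part, which only the key issuer can do.
    return hmac.compare_digest(make_cd_key(secret, timestamp, int(number)), "-".join(parts))
```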

Generate a GUID and concatenate a random number to it. A GUID is unique for all practical purposes, and the random number makes it improbable to hit a valid code accidentally. Just don't modify the GUID in any way, or you might compromise its uniqueness.
http://msdn.microsoft.com/en-us/library/aa475087.aspx
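A minimal sketch of "GUID plus random number" using Python's `uuid` and `secrets` modules; using a random (version 4) UUID and a 6-digit suffix are assumptions. Note the result is 43 characters long, and since it has no internal structure, checking a code means looking it up in a list of issued codes, which the question hoped to avoid.

```python
import secrets
import uuid

def make_code(random_digits: int = 6) -> str:
    guid = uuid.uuid4()                            # the unique part
    salt = secrets.randbelow(10 ** random_digits)  # makes accidental hits unlikely
    return f"{guid}-{salt:0{random_digits}d}"
```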

Related

What are alternatives for GUIDs for key generation when central server is not possible?

I am looking for an alternative to GUIDs for key generation in a distributed app. For example, suppose I have Bob, James, and Jack all running a bug tracking application on their desktops, where they can do things like create bug tickets à la JIRA or Bugzilla. When a ticket is created it is assigned a number such as T-1, T-2, T-3, T-4, etc. Tickets need to have a stable ID and should be creatable without having to consult a central server.
I understand that this is what GUIDs are really good for, but in my case displaying a GUID in a UI is ugly: people can't just copy and paste it and discuss it on a phone call. I really want integers or some sort of short string that is easy to talk about, read at a glance, etc.
Is there a way to use the bitcoin block chain as some sort of counter?
You may want to evaluate the approach taken by git. Git uses the SHA-1 hash of the commit information as the ID, and then allows abbreviated IDs, which are much shorter and easier to read/transfer manually.
Since the number of bugs in your tracker is unlikely to reach the millions, that should be sufficient. If it does, you'll just need a longer abbreviation.
There seems to be plenty of information around on how git calculates hash IDs and abbreviates them.
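A sketch of the git-like approach applied to tickets: hash the ticket's creation data with SHA-1 and show the shortest prefix that is unique among the tickets known locally. The fields hashed (author, creation time, title) and the 7-character minimum are assumptions. Because tickets are created offline, a prefix that is unique today may need to grow after replicas sync.

```python
import hashlib

def ticket_hash(author: str, created_at: str, title: str) -> str:
    """Full 40-hex-digit id derived from the ticket's creation data."""
    return hashlib.sha1(f"{author}\n{created_at}\n{title}".encode("utf-8")).hexdigest()

def shortest_unique_prefix(full_id: str, all_ids, min_len: int = 7) -> str:
    """Shortest prefix (at least min_len chars) not shared by any other known id."""
    others = [other for other in all_ids if other != full_id]
    for n in range(min_len, len(full_id) + 1):
        prefix = full_id[:n]
        if not any(other.startswith(prefix) for other in others):
            return prefix
    return full_id
```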
If I recall correctly how UUIDv1 works, it's "just" putting together the MAC address and a very exact timestamp, plus maybe some additional integer. As your MAC address should be unique (unless you've fiddled with it) and there are only so many UUIDs one computer can generate within a nanosecond, the resulting ID will be unique.
This is a very general and uninformed way to create IDs. If you implemented a version of it yourself for your specific use case, you could get much smaller IDs.
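To see the pieces described above, Python's `uuid` module exposes the fields of a version 1 UUID. The timestamp is a 60-bit count of 100-nanosecond intervals since 15 October 1582, and a 14-bit clock sequence covers clock resets; if no hardware address can be found, the module falls back to a random node value.

```python
import uuid

u = uuid.uuid1()
print(u)                 # 36-character textual form
print(u.version)         # 1
print(f"{u.node:012x}")  # 48-bit node id, normally the MAC address
print(u.time)            # 60-bit timestamp in 100 ns units since 1582-10-15
print(u.clock_seq)       # 14-bit clock sequence
```

Together these fields take 128 bits, which is why a custom scheme with a short node name and a local counter can be much smaller.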
Assuming you can identify each node running the bug tracking system with a simple and unique string (for instance "Bob", "James", "Jack") and you can create unique consecutive integers within each node, you could combine those two and have IDs like "Bob-1", "James-12", ...
As you can see, there still has to be one central point that assigns the unique strings; however, depending on the number of nodes and how long they stay in the system, this could just as well be done by a human being.
An additional disadvantage (or advantage, depending on how you look at it) of this approach (as well as of UUIDv1) is that you'd know where the ticket was created, as well as the order of the tickets within one system.
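A sketch of the "Bob-1", "James-12" scheme: a per-node counter behind a lock, so threads on one desktop never hand out the same number. The class name is hypothetical, and persisting the last counter value between runs is left out.

```python
import itertools
import threading

class NodeTicketIds:
    """Issues ids like 'Bob-1', 'Bob-2' for one uniquely named node."""

    def __init__(self, node_name: str, start: int = 1):
        self._node = node_name
        self._counter = itertools.count(start)
        self._lock = threading.Lock()

    def next_id(self) -> str:
        with self._lock:  # one id per call, even across threads
            return f"{self._node}-{next(self._counter)}"
```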

Barcode Encryption of Personal Identifiers (or alternatives suggested by you)

I am trying to create a health application of a rather sensitive nature, which will require some form of cryptography/obfuscation. There is a health study in which, once a year, known individuals with permanent and recognisable identifier numbers (e.g. KIG0005001 as an individual's identifier) walk into the clinic, are identified, and have their blood tested as part of a study. Next year, the same happens again, as this is a longitudinal study. The results of the blood test should NOT be traceable to an actual individual (HIV status, etc. are highly sensitive pieces of information that should not be linkable to actual individuals, due to their right to privacy), but it is IMPERATIVE that we can identify year on year which blood samples belong to one unique individual (without knowing WHO the individual actually is; the emphasis is on the blood samples being traceable to one individual, not on the individual).
My idea (and here is where I am asking for your expertise in cryptography and obfuscation) is that when individuals visit the clinic, they come with an identifying card with their regular ID number, KIG0005001. This number is entered into a system where an algorithm/encryption produces a barcode (based on the original ID KIG0005001, so any future visit should produce the SAME barcode for a particular individual), which can be printed out as stickers. These barcode stickers are the ones used to identify the samples (stick them on the samples). The stickers should carry the following information: a unique identifier (via barcode?), the round in which the sample was taken (samples will be taken once a year, so year 1 = round 1), and the date the sample was taken.
Is this possible? What are the alternatives? What should I do to transform KIG0005001 into an encrypted barcode that is repeatable year on year (so a blood sample can always be traced back to the same source)? I am programming in Java.
Thanks in advance,
Tumaini
To answer this question, I don't think it needs to be in the barcode section.
First of all, there is no way to keep everything 100% secure... but you can make it more complicated to be understood by a human.
It's the same thing as the passport controversy... A biometric passport must be secure: it's not possible to read the information without knowing the "private key". But let's say you read and record everybody's passport that enters your store and save it to a database. You will be able to trace who is coming back and even what they previously bought since you have their passport's ID...
To make life harder for your employees, you need to generate an ID that maps to the real person's ID. So if an employee is testing the blood of KIG0005001, they will receive a different unique ID for that day, and the computer will know how to link them up, so your employee has no idea who this number belongs to at that moment...
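One way to read this suggestion, as a hypothetical sketch: every visit gets a fresh random sample ID, and only a separately secured link table knows which person it belongs to, so lab staff see unrelated codes from year to year. The class, the 10-character length, and the in-memory dict (standing in for a protected database) are assumptions.

```python
import secrets

class VisitRegistry:
    """Issues a fresh random sample id per visit; only this registry links ids to people."""

    ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

    def __init__(self):
        self._person_by_sample = {}   # would live in a separately secured database

    def new_sample_id(self, person_id: str, length: int = 10) -> str:
        while True:
            sample_id = "".join(secrets.choice(self.ALPHABET) for _ in range(length))
            if sample_id not in self._person_by_sample:   # retry on the rare collision
                self._person_by_sample[sample_id] = person_id
                return sample_id

    def same_person(self, sample_a: str, sample_b: str) -> bool:
        """Lets the system link samples without showing who the person is."""
        person = self._person_by_sample.get(sample_a)
        return person is not None and person == self._person_by_sample.get(sample_b)
```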
Cryptography is probably useless here, since you work with IDs. Even gibberish data, repeated multiple times, is still an ID.
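The repeatable barcode the question asks for can be sketched with a keyed hash (HMAC-SHA256) rather than encryption: the same ID always gives the same code, but without the secret key nobody can recompute it. A plain unkeyed hash would not be enough, since IDs like KIG0005001 come from a small space and could simply be hashed one by one and matched. As the answer points out, the result is still a stable identifier: this is pseudonymization, and whoever holds the key can re-link codes to people. The 12-character code, the normalization, and the sticker layout are assumptions; the sketch is in Python, though in Java `javax.crypto.Mac` with `"HmacSHA256"` plays the same role.

```python
import hashlib
import hmac
from datetime import date

def pseudonym(secret_key: bytes, person_id: str, length: int = 12) -> str:
    """Repeatable code for a person id; not recomputable without secret_key."""
    normalized = person_id.strip().upper()
    digest = hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length].upper()

def sticker_text(secret_key: bytes, person_id: str, round_no: int, taken: date) -> str:
    """Payload to encode in the barcode: pseudonym | round | date (hypothetical layout)."""
    return f"{pseudonym(secret_key, person_id)}|R{round_no}|{taken.isoformat()}"
```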

How to create pins with one password (aka Google 2-Factor-Auth)

Maybe some of you know Google's 2-Factor-Authentication: first, Google generates a constant password (e.g. "abcd").
When you log in, you're asked for a pin; an app can generate it, or you can use one of 10 preset pins. The interesting part is that you don't have to use a single fixed pin: the app generates a random one without using network access.
How is that done? I know how to do it with one specific pin, but how could you use several "random" pins?
Thanks,
Marc
This is made possible by systems like HOTP (hash-based OTP). The RFC (RFC 4226) explains how it works in detail, but in short:
The server generates a random secret key and shares it with the OTP generator.
Both server and OTP generator initialize a counter to 0.
When the user requests a new key from the OTP generator, it increments the counter, calculates the HMAC of it using the shared key, and encodes part of the hash in a specified way, resulting in a numeric code.
When the server receives an OTP code, it performs the same calculation, accepting the code if it matches. If it does not, it tries again with several other (larger) counter values, in case the user skipped one or more codes.
Pre-generated lists of OTPs are simply produced as described above, ahead of time.
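A compact sketch of the steps above following RFC 4226: HMAC-SHA1 over the 8-byte big-endian counter, then "dynamic truncation" to 6 digits. Here the generator computes the code for its current counter and then advances it; the server's look-ahead window of 10 is an assumption. The RFC's published test vectors (secret `"12345678901234567890"`) serve as a check.

```python
import hashlib
import hmac
import struct

def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """RFC 4226 HOTP: HMAC-SHA1 of the 8-byte counter, then dynamic truncation."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F                       # low 4 bits of the last byte
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

class HotpGenerator:
    """Client side: holds the shared secret; the counter starts at 0."""

    def __init__(self, secret: bytes):
        self.secret, self.counter = secret, 0

    def next_code(self) -> str:
        code = hotp(self.secret, self.counter)
        self.counter += 1
        return code

def verify_hotp(secret: bytes, code: str, server_counter: int, look_ahead: int = 10):
    """Server side: try the expected counter and a few later ones.

    Returns the counter value to store next on success, or None.
    """
    for c in range(server_counter, server_counter + look_ahead + 1):
        if hmac.compare_digest(hotp(secret, c), code):
            return c + 1
    return None
```

A pre-generated list of backup codes is then just `[hotp(secret, c) for c in range(start, start + 10)]` for some counter range reserved for that purpose.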
I believe that Google does it by computing multiple pins that it thinks you could use, and is willing to accept any of these that match.
This is an important usability feature, because it means that if someone fails to log in once using 2-factor, they can try to log in another time and still be OK.
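Authenticator apps commonly use TOTP (RFC 6238), where the HOTP counter is simply the current Unix time divided into 30-second steps. Accepting codes from a neighbouring time step is one concrete form of "accepting any of several pins"; the one-step window below is an assumption, not a description of Google's actual settings.

```python
import hashlib
import hmac
import struct
import time

def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """RFC 4226 HOTP: HMAC-SHA1 plus dynamic truncation."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

def totp(secret: bytes, at: float, step: int = 30, digits: int = 6) -> str:
    """RFC 6238 TOTP: HOTP with counter = floor(unix_time / step)."""
    return hotp(secret, int(at // step), digits)

def verify_totp(secret: bytes, code: str, at=None, step: int = 30, window: int = 1) -> bool:
    """Accept the code for the current step or up to `window` steps either side."""
    now = time.time() if at is None else at
    current = int(now // step)
    candidates = (c for c in range(current - window, current + window + 1) if c >= 0)
    return any(hmac.compare_digest(hotp(secret, c), code) for c in candidates)
```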

Is there a way to generate a short random id, avoiding collisions, without hitting persistent storage?

If you've used GoToMeeting, that's the type of ID I want. I'd like it to be random so that it obfuscates the number of items being tracked and short, so that it's easy to reference manually; UUIDs are way too long. I'd like to avoid hitting persistent storage merely for performance reasons, but I can't think of any other way to avoid collisions. Is 9 digits enough to do something time-based?
In response to questions:
I'm building a ticket-tracking application. This ID would be used as the primary key for a table, but it would be needed before the record is persisted which would result in an extra database call that I'd like to avoid if possible.
I'd like to keep it at a 9 digit int. I consider a UUID to be too long because people are going to have to reference the ID manually (via email, phone, etc.).
I'm thinking of using the time of generation somehow. Since time is always ticking on forward, it would continually limit the set of potential IDs, excluding those that had already been generated.
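One way is to take a unique number or string (like a random UUID), calculate a fixed-length digest (such as MD5 or SHA-1) and/or encode it in a higher base (like base64), and keep only the first few characters. Note that it is the truncation that actually shortens the ID, and it gives up the uniqueness guarantee, so collisions still have to be checked for.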
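A sketch of the digest-and-encode approach, plus a birthday-bound estimate that bears on the question's "Is 9 digits enough?". The 8-character length is an assumption. Hashing a random UUID does not make the result more unique; truncated IDs can collide, and the estimate shows how quickly.

```python
import base64
import hashlib
import math
import uuid

def short_id(length: int = 8) -> str:
    """SHA-1 of a random UUID, base64url-encoded and truncated."""
    digest = hashlib.sha1(uuid.uuid4().bytes).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii")[:length]

def collision_probability(n_ids: int, space: int) -> float:
    """Birthday-bound approximation: chance that any two of n random ids collide."""
    return 1 - math.exp(-n_ids * (n_ids - 1) / (2 * space))
```

With 9 decimal digits (a space of 10^9), about 10,000 random IDs already give roughly a 5% chance that at least two collide, so a uniqueness check or retry-on-insert is still needed.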
Git does something similar: it generates SHA-1 hashes for commits (and other objects), and the user can reference them manually to look up those commits. The trick is that the user doesn't have to enter the whole string to find the correct commit; they only have to enter a prefix long enough that it doesn't collide with any other object currently in the repository. In general this only requires 5 or so hex digits for relatively large repositories.

Algorithm for message code validation

If you read this thread before, forget everything I wrote; I must have been drunk when I wrote it. I'm starting over:
I'm currently working on a project where we will be using some sort of algorithm for validating user input. There are three parties to consider:
Client - Browsing our web pages
Company - We, handling the Client requests
3rd Party Company - Handling Client messages
On our web pages we will show the Client some information about a product. If the Client wants more information about the product, they have to contact the 3rd Party Company and state the product's code (not unique per se, but not common either). In return, the Client will receive another code from the 3rd Party Company, which they should enter on our web page, where we will validate the code for approval.
The best would be if we, the Company, had no interaction with the 3rd Party Company. Pure encryption is out of the picture because it generates a string that is too long. We are doing this by SMS, so the codes have to be short.
What I've come up with so far:
For every product I generate a somewhat unique code (it doesn't really matter whether it's unique) in base 16 (0-f). The Client who wants more info about the product sends an SMS to the 3rd Party Company stating the product's code. In return the Client receives the same code, but with the digits multiplied (possibly by 2) and converted to base 36. On top of that, a final control character is appended so that the code is valid under the Luhn algorithm in base 36. The user enters the received code and we, the Company, validate it on the server side against the product code (validate against Luhn, divide by 2 and convert back to base 16).
Does this sound reasonably safe and appropriate? Is it a valid way to send messages by three parties, when two of them shouldn't need to communicate?
Sorry for the edit, but my mind must have been elsewhere when I wrote the first post.
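A sketch of the scheme as described, to make it concrete: hex product code, multiplied by 2, written in base 36, with a Luhn mod N check character (N = 36). Reading "the digits are multiplied" as multiplying the whole number is an assumption, and the function names are hypothetical. Note that this is obfuscation rather than security: anyone who sees a few product code / reply pairs could likely work out the rule.

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"  # base-36 digits
N = len(ALPHABET)

def to_base36(n: int) -> str:
    digits = ""
    while True:
        n, r = divmod(n, N)
        digits = ALPHABET[r] + digits
        if n == 0:
            return digits

def _luhn_sum(code: str, factor: int) -> int:
    """Luhn mod N weighted sum, right to left, alternating factors 2 and 1."""
    total = 0
    for ch in reversed(code):
        addend = factor * ALPHABET.index(ch)
        total += addend // N + addend % N     # "digit sum" of addend in base N
        factor = 1 if factor == 2 else 2
    return total

def luhn_mod_n_check(payload: str) -> str:
    # The rightmost payload char is doubled, since the check char sits to its right.
    return ALPHABET[(N - _luhn_sum(payload, 2) % N) % N]

def luhn_mod_n_valid(code: str) -> bool:
    return _luhn_sum(code, 1) % N == 0

def third_party_reply(product_code_hex: str) -> str:
    """3rd party: hex code -> times 2 -> base 36 -> append Luhn check char."""
    body = to_base36(int(product_code_hex, 16) * 2)
    return body + luhn_mod_n_check(body)

def company_validate(reply: str, product_code_hex: str) -> bool:
    """Company: check Luhn, then undo the base-36 step and the doubling."""
    reply = reply.strip().lower()
    if len(reply) < 2 or any(ch not in ALPHABET for ch in reply):
        return False
    if not luhn_mod_n_valid(reply):
        return False
    value = int(reply[:-1], 36)
    return value % 2 == 0 and value // 2 == int(product_code_hex, 16)
```

Comparing integers rather than hex strings sidesteps leading zeros in the product code.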
I think you are confusing things: if you use the Luhn algorithm, for example, it will just return True or False for the checksum. The sample code you gave seems to indicate that you want some checksum result (e.g. 12345) that can be hashed from two different values. That problem would be more difficult.
How will the third party create this value? Will you give them some JavaScript code to execute, or code in some other language? Couldn't you have a shared secret key that they use to symmetrically encrypt the value? You could have them prefix the part they encrypt with some known value so you could verify it quickly.
Their code:
to_send = encrypt(shared_key, 'check' + code)  # encrypt/decrypt: any symmetric cipher
Your code:
unencrypted = decrypt(shared_key, to_send)
if not unencrypted.startswith('check'):
    return False  # failed check
OK, so you want no interaction between the other application and your application. And you would like to limit the codes to 6 characters. Here are my thoughts:
Use 10 characters, that will make brute-force attacks harder;
Use all Latin letters and digits - that will give you 36 possible character values;
Why not use some big number library and simply multiply your code (taken as a Base36 number) by some ludicrously large value (say, 2048 random bits)? Then convert it to Base36 and take the last 10 digits. Or maybe the first 5 and last 5. Or maybe some other combination dependent on the original code. I've no idea how cryptographically strong this will be (probably not much), but I doubt cracking the code would take less effort than simply paying for the service.
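A sketch of the big-multiplier idea using Python's built-in arbitrary-precision integers. The 2048-bit multiplier stands in for a secret both sides share (it must be generated once and stored, not regenerated per run). One caveat: the last 10 base-36 digits of the product depend only on the multiplier modulo 36^10, and the mapping is linear, so a single observed code/response pair can reveal that residue, which supports the "probably not much".

```python
import secrets

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"

def to_base36(n: int) -> str:
    digits = ""
    while True:
        n, r = divmod(n, 36)
        digits = ALPHABET[r] + digits
        if n == 0:
            return digits

# Shared secret multiplier: generate once, then store and share it (hypothetical setup).
MULTIPLIER = secrets.randbits(2048)

def response_code(product_code: str, multiplier: int = MULTIPLIER, length: int = 10) -> str:
    """Multiply the base-36 product code by the secret and keep the last `length` digits."""
    product = int(product_code, 36) * multiplier
    return to_base36(product)[-length:].rjust(length, "0")
```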
Alternatively you could salt (prepend some secret string to) your code and then calculate the MD5 of it. Return the MD5 (or some N characters of it) to the user as your code. This should be pretty cryptographically OK, although I'm no expert. Converting the MD5 result to Base36 doesn't make the hash itself stronger, but it fits more of the hash into the same number of characters.
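A sketch of the salted-hash variant using `hashlib`. The salt is a secret both sides know, and the 8-character base-36 output is an assumption; reducing the digest modulo 36^8 keeps the characters close to uniformly distributed. For new designs HMAC is the standard way to build a keyed hash, but this follows the answer as written.

```python
import hashlib
import hmac

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"

def to_base36(n: int) -> str:
    digits = ""
    while True:
        n, r = divmod(n, 36)
        digits = ALPHABET[r] + digits
        if n == 0:
            return digits

def salted_code(secret_salt: str, product_code: str, length: int = 8) -> str:
    """MD5(salt + code), reduced to `length` base-36 characters."""
    digest = hashlib.md5((secret_salt + product_code).encode("utf-8")).digest()
    value = int.from_bytes(digest, "big") % 36 ** length
    return to_base36(value).rjust(length, "0")

def check_code(secret_salt: str, product_code: str, code: str) -> bool:
    # compare_digest avoids leaking how many leading characters matched
    return hmac.compare_digest(salted_code(secret_salt, product_code), code.strip().lower())
```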
Why a "checksum"? Can't the 3rd party run any little utility that you give them? All you need is a 5-digit encryptor that the 3rd party can run on their computer, feed the product code into, and send the 5-digit result to the client as the key code.
The encryptor always produces the same result from the same input.
Then, the client sends you the product code and the key code. You run the product code through an exact copy of that encryptor, and compare that result to the key code.
The security of this system can be enhanced without changing the fundamental architecture.
-Al.
Edit after some clarifications:
I still think that the product code and the matching 3rd party response cannot be constant; otherwise the response can be shared with other users, who would then be able to give the response code without going to the 3rd party.
If the product code is constant, a possible approach is for the 3rd party response to depend on both the code and the user's phone number, and for your validation to do the same. This way, each response is both product- and user-specific.
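A sketch of a response bound to both the product and the requester's phone number, assuming the 3rd party and the company share a secret key. The HMAC-SHA256 construction, the phone normalization (digits only), and the 6-digit length are assumptions, not part of the answer.

```python
import hashlib
import hmac

def response_for(shared_key: bytes, product_code: str, phone: str, digits: int = 6) -> str:
    """Code that depends on both the product and the requester's phone number."""
    normalized_phone = "".join(ch for ch in phone if ch.isdigit())
    msg = f"{product_code.lower()}|{normalized_phone}".encode("utf-8")
    value = int.from_bytes(hmac.new(shared_key, msg, hashlib.sha256).digest()[:8], "big")
    return str(value % 10 ** digits).zfill(digits)

def validate(shared_key: bytes, product_code: str, phone: str, code: str) -> bool:
    return hmac.compare_digest(response_for(shared_key, product_code, phone), code.strip())
```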
The specific permutation of the Luhn algorithm isn't too important in my opinion - if someone can crack one variation, he'll probably be able to crack another one.
Original Answer:
In short, I think you can use the Luhn algorithm, if you give the user a one-time ticket, valid for a limited amount of time.
First, if I understand the problem correctly, your product code cannot be constant - otherwise the response created by the 3rd party will always be the same for this product. This means the user will be able to use this code again later, or even give it to another user.
Therefore, I think you should generate and give the user a random new code per his request of information/access to the product. This code should be valid for this product for a limited period of time (an hour, a day, depending on your needs).
The response sent by the 3rd party to the user should be valid only when entered together with the code you provided to the user.
After validation, this code cannot be used again until the specified time period is over.
As an option, I think you and the 3rd party can append something like the current date to the code and response pair during computation, so they are not always the same pair.
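A sketch of the one-time ticket flow: the company issues a random ticket per request, records its product and expiry, and accepts a 3rd-party response only once and only before expiry. The 6-hex-character ticket, one-hour lifetime, HMAC-based response, and in-memory storage are all assumptions for illustration.

```python
import hashlib
import hmac
import secrets
import time

def third_party_response(shared_key: bytes, ticket: str, product_code: str) -> str:
    """What the 3rd party would compute from the ticket and product code (hypothetical)."""
    mac = hmac.new(shared_key, f"{ticket}|{product_code}".encode("utf-8"), hashlib.sha256).digest()
    return str(int.from_bytes(mac[:8], "big") % 10 ** 6).zfill(6)

class TicketDesk:
    """Company side: short random ticket per request, single use, limited lifetime."""

    def __init__(self, shared_key: bytes, ttl_seconds: int = 3600):
        self._key = shared_key
        self._ttl = ttl_seconds
        self._open = {}                      # ticket -> (product_code, expires_at)

    def issue(self, product_code: str, now=None) -> str:
        now = time.time() if now is None else now
        ticket = secrets.token_hex(3)        # 6 hex characters
        while ticket in self._open:          # avoid reusing a ticket that is still open
            ticket = secrets.token_hex(3)
        self._open[ticket] = (product_code, now + self._ttl)
        return ticket

    def redeem(self, ticket: str, response: str, now=None) -> bool:
        now = time.time() if now is None else now
        # pop: a ticket is consumed by the first attempt, right or wrong
        entry = self._open.pop(ticket, None)
        if entry is None or now > entry[1]:
            return False
        expected = third_party_response(self._key, ticket, entry[0])
        return hmac.compare_digest(expected, response)
```

Consuming the ticket even on a wrong answer is a design choice that limits guessing; a friendlier variant could allow a few attempts before discarding it.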
After long debates with the 3rd Party Company, we've concluded that the best solution is for them to pass the Client's SMS to me; I generate a new code and send it back to them, and they in turn send a new SMS to the Client with the code I generated. Not optimal from my point of view, but at least I can now do it any way I want.
Thanks for your input though.

Resources