I am trying to create a health application of a rather sensitive nature which will require some form of cryptography/obfuscation. There is a health study in which, once a year, known individuals with permanent and recognisable identifier numbers (e.g. KIG0005001 as an individual's identifier) walk into the clinic, are identified, and have their blood tested. The next year the same happens again, as this is a longitudinal study. The results of the blood test should NOT be traceable to an actual individual (HIV status, etc. are highly sensitive and should not be linkable to actual individuals, given their right to privacy), but it is IMPERATIVE that we can identify, year on year, which blood samples belong to one unique individual without knowing WHO that individual actually is. The emphasis is on linking the blood samples to each other, not to the person.
My idea (and here is where I am asking for your expertise in cryptography and obfuscation) is that when the individual visits the clinic they bring an identifying card with their regular ID number, KIG0005001. This number is entered into a system which, via an algorithm/encryption, spits out a barcode (derived from the original ID KIG0005001, so any future visit should produce the SAME barcode for a particular individual) that can be printed out as stickers. These barcode stickers are what identify the samples (stick 'em on the samples). The stickers should carry the following information: the unique identifier (via barcode?), the round number in which the sample was taken (samples will be taken once a year, so year 1 = round 1) and the date the sample was taken.
Is this possible? What are the alternatives? How should I transform KIG0005001 into an encrypted barcode that is repeatable year on year (so a blood sample can always be traced back to the same source)? I am programming in Java.
Thanks in advance,
Tumaini
To answer this question, I don't think it needs to be in the barcode section.
First of all, there is no way to keep everything 100% secure... but you can make it more complicated to be understood by a human.
It's the same thing as the passport controversy... A biometric passport must be secure: it's not possible to read the information without knowing the "private key". But let's say you read and record everybody's passport that enters your store and save it to a database. You will be able to trace who is coming back and even what they previously bought since you have their passport's ID...
To make life harder for your employees, you need to generate an ID that maps to the real person's ID. So if the employee is testing the blood of KIG0005001, they will receive a different unique ID for that day; the computer will know how to link them up, so your employee has no idea who is behind the number at that moment...
Cryptography is probably useless here since you work with IDs. Even gibberish data, repeated multiple times, is still an ID.
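For what the question literally asks (the same input ID always yielding the same label), one common option is a keyed hash (HMAC) rather than encryption. The sketch below is only an illustration under assumptions: the class and method names, the 16-character truncation and the label layout are made up here, and the secret key would have to be stored away from anyone handling samples. It does not remove the concern above: the output is still a stable ID, it just cannot be mapped back to KIG0005001 without the key.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

final class SampleLabel {
    // Derives a repeatable pseudonym from a participant ID with HMAC-SHA256.
    // Same ID + same key always gives the same output; without the key the
    // output cannot be recomputed from the (easily guessable) ID.
    static String pseudonym(String participantId, byte[] secretKey) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secretKey, "HmacSHA256"));
            byte[] tag = mac.doFinal(participantId.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            // Assumption: 8 bytes (16 hex characters) is enough for a barcode.
            for (int i = 0; i < 8; i++) {
                hex.append(String.format("%02X", tag[i] & 0xff));
            }
            return hex.toString();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HMAC-SHA256 unavailable", e);
        }
    }

    // Hypothetical sticker text: pseudonym, round number, sample date.
    static String label(String participantId, byte[] key, int round, java.time.LocalDate date) {
        return pseudonym(participantId, key) + "|R" + round + "|" + date;
    }
}
```

The string returned by label(...) is what would be fed to whatever barcode library prints the stickers.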
I'm aware of the Luhn algorithm for validation of payment card numbers.
However, is there something similar that will tell me whether a particular card requires a start date or issue number, as these aren't universal?
Using this information, I would then show or hide the start date and/or issue number input boxes once a customer has entered their payment card number.
As far as I know, there is no way to detect whether the expiry date (I assume that's what you meant by start date) is required based on the card number.
In this kind of situation (i.e. special cases for different cards), I've turned to this site as well as the backing data. With this data, you can get the BIN of a credit card from the first 6-8 digits. I'm skeptical as to whether even the BIN data contains what you are looking for, but good luck!
Just a note: be wary of the data since it is open source. In my experience it has been accurate, but make sure you keep that in mind. There are enterprise solutions to get bin data if accuracy is that important to you.
Suppose we have an event and we want to prove that the event occurred after a particular date, we have a few easy ways of doing so. For example, one may just show a snapshot of a newspaper with a particular date and headline, indicating that the event is at least after that day. Or we could put in the ending stock price in a particular exchange of a particular date to say that it was after the end of trading hours of that day. This could be as fine grained to the second after the time when the exchange closed.
How to do the converse? How can one show that an event occurred before a particular point in time? One could depict large events (the skyline of NYC, to show before or after WTC) and geological changes, but that is a very large-scale measure. Is there a much more fine-grained way to show this, with a granularity of a few hours or days?
Hash up the information you need to preserve (e.g. with a https://en.wikipedia.org/wiki/Merkle_tree) and publish the resulting hash value openly. This doesn't disclose any usable information, but if you later need to prove precedence, you can disclose the values you hashed up to show you had the information at that time.
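As a rough illustration of the publish-then-reveal idea, here is a minimal sketch using a plain SHA-256 digest over a single document (not a Merkle tree, which would only be needed to commit to many items at once). The class and method names are made up for the example. If the data is short or guessable, you would want to append a random value to it before hashing and reveal that value too.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class Commitment {
    // Hex-encoded SHA-256 digest of the data; this is the value you publish.
    static String sha256Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }

    // Later, anyone can check the revealed data against the published hash.
    static boolean verify(byte[] revealed, String publishedHex) {
        return sha256Hex(revealed).equals(publishedHex);
    }
}
```

The proof of "before" comes from where the hash was published (a newspaper, a public mailing list), not from the hash itself.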
I heard a story of AT&T paying for newspaper advertisements, long before computer security was mainstream, which disclosed a hash value. After a while the paper became worried that they were publishing mysterious advertisements every day that looked like secret codes and AT&T had to explain to the newspaper what the function of these were.
(A web search finds https://www.newscientist.com/article/mg13318103-800-technology-computer-fraudsters-foiled-by-the-small-ads/ including
Bellcore began running its advertisements in the New York Times in October
1991. They were interrupted for several months when newspaper employees
became suspicious of their cryptic contents. ‘Somebody said, ‘These look
like codes. You might be telling a terrorist to kill somebody,’ says Haber.
Fortunately for Bellcore, the Times’ computer correspondent persuaded the
newspaper to allow the advertisements back in.
Beware - article is buried in CSS and cookie notifications and inline ads)
I am looking for an alternative to GUIDs for key generation in a distributed app. For example, suppose I have Bob, James, and Jack all running a bug tracking application on their desktops, where they can do things like create bug tickets a la JIRA or Bugzilla. When a ticket is created it is assigned a number such as T-1, T-2, T-3, T-4, etc. Tickets need to have a stable ID and should be creatable without having to consult a central server.
I understand that this is what GUIDs are really good for, but in my case displaying a GUID in a UI is ugly: people can't just copy and paste it or discuss it on a phone call. I really want integers or some sort of short string that is easy to talk about and to read at a glance.
Is there a way to use the bitcoin block chain as some sort of counter?
You may want to evaluate the approach taken by git. It uses a SHA-1 hash of the commit information as the ID, and abbreviated IDs are allowed, which are much shorter and easier to read or pass along manually.
As long as the number of bugs in your tracker is not going to reach millions, that should be sufficient. Once it does, you'll just need a longer abbreviation.
There seems to be plenty of information around on how git calculates hash IDs and abbreviates them.
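A minimal sketch of the same idea applied to tickets, assuming a ticket is identified by its author, creation time and title. This is not git's actual object format, and the field choice and the names TicketId, fullId and shortId are invented for the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class TicketId {
    // Full 40-character hex SHA-1 over the ticket's identifying fields.
    static String fullId(String author, long createdMillis, String title) {
        String payload = author + "\n" + createdMillis + "\n" + title;
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(payload.getBytes(StandardCharsets.UTF_8))) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 unavailable", e);
        }
    }

    // Abbreviated form shown in the UI; lengthen it if prefixes start to clash.
    static String shortId(String fullId, int length) {
        return fullId.substring(0, Math.min(length, fullId.length()));
    }
}
```

The full ID is what gets stored and synced; the short form is only a display convenience, resolved by prefix lookup.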
If I recall correctly how UUIDv1 works, it's "just" putting together the MAC address and a very exact timestamp, plus maybe some additional integer. As your MAC address should be unique (unless you've fiddled with it) and there are only so many UUIDs one computer can generate within a nanosecond, the resulting ID will be unique.
This is a very general and uninformed way to create IDs. If you'd implement a version of it yourself for your specific use case you could get much smaller IDs.
Assuming you can identify each node with a bug tracking system with a simple and unique string - for instance "Bob", "James", "Jack" - and you can create unique continuous integers within each node, you could combine those two and have IDs like "Bob-1", "James-12", ...
As you can see, actually there has to be again one central point, which will assign the unique strings, however depending on the number of nodes and how long they stay within the system, this could be as well done just by a human being.
The additional disadvantage (or advantage, depends how you look at it) of this approach (as well as of UUIDv1) would be, that you'd know where the ticket has been created as well as order of the tickets within one system.
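The node-name-plus-counter scheme could look roughly like this. It is a sketch only: the class name is made up, and it assumes each node persists its own last issued number somewhere and passes it in on startup.

```java
import java.util.concurrent.atomic.AtomicLong;

final class NodeTicketIds {
    private final String nodeName;    // assigned once, e.g. by a human
    private final AtomicLong counter; // local to this node, never shared

    // lastIssued is the highest number this node has already handed out.
    NodeTicketIds(String nodeName, long lastIssued) {
        this.nodeName = nodeName;
        this.counter = new AtomicLong(lastIssued);
    }

    // Returns IDs such as "Bob-1", "Bob-2", ... without any network call.
    String next() {
        return nodeName + "-" + counter.incrementAndGet();
    }
}
```

For example, new NodeTicketIds("James", 11).next() gives "James-12".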
I am trying to create a unique CD-KEY to put in our product's box, just like a normal CD-KEY found in standard software boxes that users use to register the product.
However we are not selling software, we are selling DNA collection kit for criminal and medical purposes. Users will receive a saliva collection kit by mail with the CD-KEY on it and they will use that CD-KEY to create an account on our website and get their results. The results from the test will be linked to the CD-KEY. This is the only way that we will have to link the results to the patients. It is therefore important that it does not fail :)
One of the requirements is that the list of CD-KEYs must be sufficiently "spread" apart so that there is no possibility of someone entering an incorrect CD-KEY and still having it accepted as someone else's kit, thereby mixing up two kits. That could cost us thousands of dollars in liability.
For example, it cannot be an incremental sequence of numbers such as
00001
00002
00003
...
The reason is that if someone receives kit 00002 but registers it as 00003 by accident, then his results will be matched to someone else. So it must be like credit card numbers: unless a valid sequence is entered, your chances of randomly hitting a valid number are 1 in a million...
Also, we are selling over 50,000 kits annually to various providers (who will generate their own CD-KEYs using our algorithm), so we cannot maintain a list of all previously issued CD-KEYs to check for duplicates. The algorithm must generate unique CD-KEYs.
We also require the ability to verify that the CD-KEY is valid using a quick check algorithm, so that we can inform the user if the code he enters is invalid. This leaves out many hashing or MD5 algorithms I believe. And it cannot be a 128 bit because, who would take that time to type it out on the computer screen?
So far this is what I was thinking the final CD-KEY structure would look like
(4 char product code) - (4 char reseller code) - (12 char unique, verifiable CD-KEY)
Ex. 384A - GTLD - {4565 - FR54 - EDF3}
To ensure the uniqueness of the keys, I could include the current date (20090521) as part of the source. We won't generate unique keys more than once a week, so this value changes often enough to serve as a unique initial value.
What possible algorithm can I use to generate the unique keys?
Create the strings <providername>000001, <providername>000002, etc. or whatever and encrypt them with a public key, and that's your "CD-KEY" that the user enters. Decrypt the CD-KEY with the private key and validate that when decrypted you get a valid string with a valid provider name.
Credit card numbers use the Luhn algorithm; you might want to look at something similar to that.
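For reference, a small sketch of a Luhn check and check-digit generator for all-numeric keys (the class and method names are made up). Luhn catches any single mistyped digit, which covers the 00002 vs 00003 case from the question, but it says nothing about uniqueness, and it would need adapting for keys that contain letters.

```java
final class Luhn {
    // True if the digit string (spaces and hyphens ignored) passes the Luhn check.
    static boolean isValid(String number) {
        int sum = 0;
        int digits = 0;
        boolean doubleIt = false; // the rightmost digit is not doubled
        for (int i = number.length() - 1; i >= 0; i--) {
            char c = number.charAt(i);
            if (c == ' ' || c == '-') continue;
            if (c < '0' || c > '9') return false;
            int d = c - '0';
            if (doubleIt) {
                d *= 2;
                if (d > 9) d -= 9;
            }
            sum += d;
            doubleIt = !doubleIt;
            digits++;
        }
        return digits > 1 && sum % 10 == 0;
    }

    // Check digit to append to a numeric payload so the result passes isValid.
    static int checkDigit(String payload) {
        int sum = 0;
        boolean doubleIt = true; // payload's last digit sits next to the check digit
        for (int i = payload.length() - 1; i >= 0; i--) {
            int d = payload.charAt(i) - '0';
            if (d < 0 || d > 9) throw new IllegalArgumentException("digits only");
            if (doubleIt) {
                d *= 2;
                if (d > 9) d -= 9;
            }
            sum += d;
            doubleIt = !doubleIt;
        }
        return (10 - sum % 10) % 10;
    }
}
```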
I use SeriousBit Ellipter for software protection, but I don't see any reason you couldn't generate a group of unique keys each week and use the library to verify the key's validity when it is entered into your web site. You can also encode optional services into the key, allowing you to control how the sample is processed from the key (that's if you have different service levels).
As it uses an encrypted method of key generation in the first place and it's relatively cheap, it's certainly worth a look I would say.
I finally settled for a cd-key of this form
<TIMESTAMP>-<incremented number>-<8 char MD5 hash>-<checksumdigit>
I used the mod 11 ISBN checksum digit algorithm.
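For anyone wanting to try the same check digit, here is a sketch of the standard ISBN-10 mod 11 calculation over a 9-digit body. How it was applied to the longer key format above is not stated, so treat the method below (and its name) as an illustration of the checksum only.

```java
final class Mod11 {
    // ISBN-10 check character for a 9-digit body: weights 10 down to 2,
    // check value chosen so the weighted total is divisible by 11; 10 is 'X'.
    static char isbn10CheckDigit(String nineDigits) {
        if (nineDigits.length() != 9) {
            throw new IllegalArgumentException("expected exactly 9 digits");
        }
        int sum = 0;
        for (int i = 0; i < 9; i++) {
            char c = nineDigits.charAt(i);
            if (c < '0' || c > '9') {
                throw new IllegalArgumentException("digits only");
            }
            sum += (c - '0') * (10 - i);
        }
        int check = (11 - sum % 11) % 11;
        return check == 10 ? 'X' : (char) ('0' + check);
    }
}
```

Because 11 is prime, this scheme catches every single-digit error and every swap of two adjacent digits, at the cost of an occasional 'X' in the key.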
Generate a GUID and concatenate a random number to it. A GUID is, for practical purposes, guaranteed to be unique, and the random number will make it improbable to hit a code accidentally. Just don't modify the GUID in any way or you might compromise the uniqueness.
http://msdn.microsoft.com/en-us/library/aa475087.aspx
I guess this is a multi-part question. I am building a membership site and want to have the accounts as international as possible.
What is the best way to collect phone numbers on a form that allows for international numbers? I'm not worried about storing them, just collection and validation. What I have now is a drop-down with a country list that will add the country code, then the number itself with validation for US/CAN/UK based on the country code, and then the extension. These will be stored as strings in 3 fields for cc/number/ext. Does anyone have a better, solid solution for this, or perhaps seen one in action anywhere?
Ditto for addresses. What is the best way to go? Address/City/State/Zip/Country or just lines? I would like to be able to sort by these, so a single text field isn't a very good solution, though it is the most flexible.
This is also important because we may be sending actual mail to our members. I am put in mind of a few members I've had for other services that had addresses in countries I had never heard of, that even the woman at the post office couldn't tell if they were formatted correctly.
I want to have geodata in the db, at least country/state, for things like populating a state dropdown after selecting a country, field standardization, etc. Does anyone know of a great database that can be used as the geodata base of an app?
Phone number validation - I'm not sure if I'd spend a lot of time on this. Numbering schemes change quite often (for example, during the time I lived in the UK, the phone numbers for London area codes changed at least once, with another change shortly before I moved there) and in Germany it is (or at least used to be) quite common to increase the number of available phone numbers on a given exchange by taking an old number and tacking an extra digit or two at the end. So any assumption about a given phone number format will change and you'll end up playing catch-up. If you insist on splitting the phone number into international/area code/main number you'll probably find that this is a very country-specific way of representing the information so you'll need an input mask pretty much for every country and specific validation rules. Not to mention that in places like Germany, an area code can have between two and four digits etc...
Regarding postal addresses, the most important suggestion I have is to ensure that you can accept non-numeric post/zip codes, otherwise you won't be able to handle addresses in Canada and the UK (and possibly other places). This is a bit of a hobby horse of mine as I've had a few issues with websites in other countries that simply refused to let me put in a non-numeric post code and I had to resort to faxing over my address information as I couldn't fill in the online application form. In my book that's bad karma if you allow international customers....
Also, assuming the existence of certain parts of the address (state/county, for example) and requiring them is usually more of a headache than it's worth. I'd be tempted to offer the standard house number + street (combine them: different languages put the house number in different places, so separating them out is not a good idea IMHO unless you know how to reassemble them correctly, plus sometimes you'll end up with a house name instead of a number), town and zip/post code, possibly with an optional county/state field. If you want to be really helpful to your international audience, offer a free-form, single text entry field for those addresses that don't conform to our "standard" assumptions of how an address looks. And please make the fields big enough so people with quite long addresses don't run out of space...
There is an international standard for telephone numbers, but it leaves a lot of breathing room. Separators are not mandatory, but are restricted to space, period, and hyphen. Round brackets (aka parentheses) are to be put around digits which are optional depending on where you are dialling from. For example, the area code is optional in some areas. I would provide a text field and let the user enter their number however they want.
For addresses, provide lots of fields and don't restrict too much. House numbers sometimes contain letters. Road types are sometimes written in full, and other times abbreviated. (St = Street, Ave = Avenue, etc.) I would provide drop-downs where possible (state/province), but allow freeform input when you don't have a list. When the user is entering their address, it's ok to validate for security risks, but you might want to leave geographical validation until later. For example, if the user enters a postal code of T8N 4E3 and selects Ontario as their province, the address is not valid because the given postal code is for Alberta. Display a friendly message to the user letting them know that they need to correct their address or contact you if it's correct (possible bug in your code).
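The T8N 4E3 example works because the first letter of a Canadian postal code identifies the province or territory. A sketch of that one check (the class and method names are made up, and this is deliberately only a soft check to drive a friendly message, not a full address validator):

```java
import java.util.Arrays;

final class PostalCheck {
    // Province/territory implied by the first letter of a Canadian postal code,
    // or null if the letter is not in use. X is shared by NT and NU.
    static String provinceFor(String postalCode) {
        if (postalCode == null || postalCode.trim().isEmpty()) return null;
        switch (Character.toUpperCase(postalCode.trim().charAt(0))) {
            case 'A': return "NL";
            case 'B': return "NS";
            case 'C': return "PE";
            case 'E': return "NB";
            case 'G': case 'H': case 'J': return "QC";
            case 'K': case 'L': case 'M': case 'N': case 'P': return "ON";
            case 'R': return "MB";
            case 'S': return "SK";
            case 'T': return "AB";
            case 'V': return "BC";
            case 'X': return "NT/NU";
            case 'Y': return "YT";
            default: return null;
        }
    }

    // True if the selected province abbreviation agrees with the postal code.
    static boolean matches(String postalCode, String selectedProvince) {
        String expected = provinceFor(postalCode);
        return expected != null
                && Arrays.asList(expected.split("/")).contains(selectedProvince);
    }
}
```

Here matches("T8N 4E3", "ON") is false, which is the point at which to show the message rather than reject the form outright.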
Address - just remember that not every country has states and ZIP codes, and where there are ZIP codes they can be in a different format ([0-9]{2}-[0-9]{3} here). (edit: usually a postal address with 2 address lines, city, state (optional), zip code (optional) and country is OK).
The same goes for geodata - you can make sequential dropdowns with states and cities, but I guess you won't cover every city. Why not show a piece of Google Maps and allow the users to click there to mark their position?