I am trying to create a health application of a rather sensitive nature which will require some form of cryptography/obfuscation. There is a health study in which, once a year, known individuals with permanent and recognisable identifier numbers (e.g. KIG0005001 as an individual's identifier) walk into the clinic, are identified, and have their blood tested. The next year the same happens again, as this is a longitudinal study. The results of the blood test must NOT be traceable to an actual individual (HIV status etc. are highly sensitive pieces of information that should not be linkable to actual individuals, due to their right to privacy), but it is IMPERATIVE that we can identify, year on year, which blood samples belong to one unique individual without knowing WHO that individual actually is. The emphasis is on the blood samples being traceable to a single source, not to a named person.
My idea (and this is where I am asking for your expertise in cryptography and obfuscation) is that when the individual visits the clinic they bring an identifying card with their regular ID number, KIG0005001. This number is entered into a system which, via an algorithm/encryption, spits out a barcode based on the original ID KIG0005001, so any future visit should produce the SAME barcode for a particular individual. The barcode can be printed out as stickers, and these stickers are what identify the samples (stick 'em on the samples). The stickers should carry the following information: the unique identifier (via barcode?), the round number in which the sample was taken (samples are taken once a year, so year 1 = round 1) and the date the sample was taken.
Is this possible? What are the alternatives? How should I go about transforming KIG0005001 into an encrypted barcode which is repeatable year on year (so a blood sample can always be traced back to the same source)? I am programming in Java.
Thanks in advance,
Tumaini
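One way the repeatable transformation asked about here could be sketched is with a keyed hash (HMAC) rather than encryption: the same ID and key always give the same code, and without the key the code cannot be recomputed from the ID. This is only a minimal sketch, in Python for brevity rather than Java; STUDY_KEY, pseudonym and sticker_text are made-up names, and the 12-character length is an arbitrary assumption.

```python
import hashlib
import hmac

# Hypothetical secret. In practice it would be generated randomly and kept
# away from the people who handle the samples.
STUDY_KEY = b"replace-with-a-long-random-secret"


def pseudonym(participant_id: str, key: bytes = STUDY_KEY, length: int = 12) -> str:
    """Return a repeatable code for one participant (same ID and key -> same code)."""
    digest = hmac.new(key, participant_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length].upper()


def sticker_text(participant_id: str, round_number: int, sample_date: str) -> str:
    """Compose the sticker payload: code, round number and sample date."""
    return f"{pseudonym(participant_id)}|R{round_number}|{sample_date}"
```

Anyone who holds both the key and the list of participant IDs can recompute every code, so the key would have to be stored separately from the lab data.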
To answer this question, I don't think it needs to be in the barcode section.
First of all, there is no way to keep everything 100% secure... but you can make it more complicated to be understood by a human.
It's the same thing as the passport controversy... A biometric passport must be secure: it's not possible to read the information without knowing the "private key". But let's say you read and record the passport of everybody who enters your store and save it to a database. You will be able to trace who is coming back and even what they previously bought, since you have their passport's ID...
To make life harder for your employees, you need to generate an ID that maps to the real person's ID. So if the employee is testing the blood of KIG0005001, they will receive a different unique ID for that day; the computer will know how to link them up. That way your employee has no idea who this number belongs to at that moment...
Cryptography is probably useless here since you work with IDs. Even gibberish data repeated multiple times is still an ID.
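The linking table this answer describes could look roughly like the sketch below: every visit gets a fresh random label, and only the computer keeps the map from label back to participant. VisitRegistry and its methods are hypothetical names, and an in-memory dict stands in for whatever protected storage would really be used.

```python
import secrets


class VisitRegistry:
    """Hands out a new random label per visit and remembers who it belongs to."""

    def __init__(self) -> None:
        self._participant_by_label: dict[str, str] = {}

    def new_label(self, participant_id: str) -> str:
        # Draw random labels until one is unused (collisions are very unlikely).
        label = secrets.token_hex(6).upper()
        while label in self._participant_by_label:
            label = secrets.token_hex(6).upper()
        self._participant_by_label[label] = participant_id
        return label

    def same_person(self, label_a: str, label_b: str) -> bool:
        """Tell whether two sample labels belong to one participant, without naming them."""
        return self._participant_by_label[label_a] == self._participant_by_label[label_b]
```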
I am trying to create a unique CD-KEY to put in our product's box, just like a normal CD-KEY found in standard software boxes that users use to register the product.
However, we are not selling software; we are selling DNA collection kits for criminal and medical purposes. Users will receive a saliva collection kit by mail with the CD-KEY on it, and they will use that CD-KEY to create an account on our website and get their results. The results from the test will be linked to the CD-KEY. This is the only way we will have to link the results to the patients. It is therefore important that it does not fail :)
One of the requirements is that the list of CD-KEYs must be sufficiently "spread" apart so that there is no possibility of someone entering an incorrect CD-KEY and still having it accepted as someone else's kit, thereby mixing up two kits. That could cost us thousands of dollars in liability.
For example, it cannot be an incremental sequence of numbers such as
00001
00002
00003
...
The reason is that if someone receives kit 00002 but registers it as 00003 by accident, then his results will be matched to someone else. So it must be like credit card numbers... Unless a valid sequence is entered, your chances of randomly hitting a valid number are 1 in a million...
Also, we are selling over 50,000 kits annually to various providers (who will generate their own CD-KEYs using our algorithm), so we cannot maintain a list of all previously issued CD-KEYs to check for duplicates. The algorithm must generate unique CD-KEYs.
We also require the ability to verify that a CD-KEY is valid using a quick check algorithm, so that we can inform the user if the code he enters is invalid. This rules out many hashing or MD5 algorithms, I believe. And it cannot be 128 bits long, because who would take the time to type that out on the computer screen?
So far this is what I was thinking the final CD-KEY structure would look like
(4 char product code) - (4 char reseller code) - (12 char unique, verifiable CD-KEY)
Ex. 384A - GTLD - {4565 - FR54 - EDF3}
To ensure the uniqueness of the keys, I could include the current date (20090521) as part of the source. We won't generate unique keys more than once a week, so this value changes often enough to serve as a unique initial value.
What possible algorithm can I use to generate the unique keys?
Create the strings <providername>000001, <providername>000002, etc. or whatever and encrypt them with a public key, and that's your "CD-KEY" that the user enters. Decrypt the CD-KEY with the private key and validate that when decrypted you get a valid string with a valid provider name.
Credit card numbers use the Luhn algorithm; you might want to look at something similar to that.
I use SeriousBit Ellipter for software protection, but I don't see any reason you couldn't generate a group of unique keys each week and use the library to verify the key's validity when it is entered into your web site. You can also encode optional services into the key, allowing you to control how the sample is processed from the key (that's if you have different service levels).
As it uses an encrypted method of key generation in the first place and it's relatively cheap, it's certainly worth a look I would say.
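For reference, a minimal sketch of the Luhn check digit mentioned here, for plain decimal strings. The function names are illustrative; the scheme catches every single mistyped digit, but it is a typo check and not a security measure.

```python
def luhn_check_digit(payload: str) -> int:
    """Compute the Luhn check digit for a string of decimal digits."""
    total = 0
    # Walk right to left; the rightmost payload digit is the first one doubled.
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9  # same as adding the two digits of the product
        total += d
    return (10 - total % 10) % 10


def luhn_is_valid(number: str) -> bool:
    """Check a full number whose last digit is the Luhn check digit."""
    if len(number) < 2 or not all(ch in "0123456789" for ch in number):
        return False
    return luhn_check_digit(number[:-1]) == int(number[-1])
```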
I finally settled for a cd-key of this form
<TIMESTAMP>-<incremented number>-<8 char MD5 hash>-<checksumdigit>
I used the mod 11 ISBN checksum digit algorithm.
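The post does not say how the check digit was computed over the timestamp, counter and hash, so the sketch below only shows the ISBN-10 style mod 11 rule itself, applied to a short digit string. mod11_check_char is a made-up name, and the limit of 9 digits is the ISBN-10 layout (weights 10 down to 2), not something taken from the post.

```python
def mod11_check_char(digits: str) -> str:
    """ISBN-10 style check character for up to 9 decimal digits."""
    if not 1 <= len(digits) <= 9 or not all(ch in "0123456789" for ch in digits):
        raise ValueError("expected 1 to 9 decimal digits")
    n = len(digits)
    # Weights run from n + 1 (leftmost digit) down to 2 (rightmost digit).
    total = sum((n + 1 - i) * int(d) for i, d in enumerate(digits))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)
```

With nine digits this reproduces real ISBN-10 check digits, e.g. 030640615 gives 2.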
Generate a GUID and concatenate a random number to it. A GUID is unique for all practical purposes, and the random number will make it improbable to hit a valid code accidentally. Just don't modify the GUID in any way or you might compromise the uniqueness.
http://msdn.microsoft.com/en-us/library/aa475087.aspx
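A rough sketch of this suggestion, using Python's uuid module as a stand-in for the Windows GUID API linked above. make_code and the six-digit suffix are assumptions for illustration.

```python
import secrets
import uuid


def make_code() -> str:
    """Return an unmodified random GUID followed by a random six-digit suffix."""
    guid = uuid.uuid4().hex.upper()              # 32 hex characters
    suffix = f"{secrets.randbelow(10**6):06d}"   # 000000 to 999999
    return f"{guid}-{suffix}"
```

The result is 39 characters long, which runs into the question's own objection that nobody wants to type a 128-bit value.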
If you read this thread before - forget everything I wrote, I must have been drunk when I wrote it. I'm starting over:
I'm currently working on a project where we will be using some sort of algorithm for validating user input. There are three parties to consider;
Client - Browsing our web pages
Company - We, handling the Client requests
3rd Party Company - Handling Client messages
On our web pages we will show the Client some information about a product. If he/she wants more information about the product, he has to contact the 3rd Party Company and state the product's code (not unique per se, but not common either). In return the Client will receive some other code from the 3rd Party Company, which he should input on our web page, where we will validate the code for approval.
The best would be if we, the Company, had no interaction with the 3rd Party Company. Pure encryption is out of the picture because it generates a string that is too long. We are doing this by SMS, so the codes have to be short.
What I've come up with so far:
For every product I generate a somewhat unique code (it doesn't really matter whether it's unique or not) in base 16 (0-f). The Client who wants more info about the product sends an SMS to the 3rd Party Company stating the product's code. In return the Client receives the same code, but the digits are multiplied (possibly by 2) and converted to base 36. On top of that, a last character is added to the code, a control character, to make the code valid for the Luhn algorithm in base 36. The user enters the received code and we, the Company, validate it on the server side against the product code (validate against Luhn, divide by 2 and switch back to base 16).
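To make the scheme concrete, here is a sketch under one reading of it: the hex code is taken as a whole number, doubled, written in base 36, and given a Luhn mod 36 check character. All names are hypothetical. Note that nothing in it is secret, so anyone who learns the recipe can produce valid codes.

```python
from typing import Optional

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"
N = len(ALPHABET)  # 36


def to_base36(value: int) -> str:
    """Write a non-negative integer in base 36."""
    if value == 0:
        return "0"
    out = []
    while value:
        value, rem = divmod(value, N)
        out.append(ALPHABET[rem])
    return "".join(reversed(out))


def _luhn_sum(text: str, factor: int) -> int:
    """Luhn mod N sum, walking right to left and alternating the factor."""
    total = 0
    for ch in reversed(text):
        addend = factor * ALPHABET.index(ch)
        factor = 1 if factor == 2 else 2
        total += addend // N + addend % N  # add the base-36 "digits" of the product
    return total


def encode(product_code_hex: str) -> str:
    """3rd party side: double the code, convert to base 36, append the check character."""
    payload = to_base36(int(product_code_hex, 16) * 2)
    check = ALPHABET[(N - _luhn_sum(payload, 2) % N) % N]
    return payload + check


def decode(code: str) -> Optional[str]:
    """Company side: return the product code in hex, or None if the code is invalid."""
    code = code.lower()
    if len(code) < 2 or any(ch not in ALPHABET for ch in code):
        return None
    if _luhn_sum(code, 1) % N != 0:
        return None
    value = int(code[:-1], 36)
    if value % 2:
        return None  # a doubled code is always even
    return format(value // 2, "x")
```

decode returns the hex code without leading zeros, so a product code such as 0a3f would come back as a3f.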
Does this sound reasonably safe and appropriate? Is it a valid way to send messages by three parties, when two of them shouldn't need to communicate?
Sorry for the edit, but my mind must have been elsewhere when I wrote the first post.
I think you are confusing things. If you use the Luhn algorithm, for example, it'll just return True or False on the checksum. The sample code you gave seems to indicate that you want to have some checksum result (e.g. 12345) that can be hashed from two different values. This problem would be more difficult.
How will the third party create this value? Will you give them some Javascript code for them to execute, or some other language? Couldn't you have a shared secret key and they could symmetrically encrypt the value with that secret key, you could have them prefix the part they encrypt with some known value so you could verify it quickly.
Their code:
    to_send = encrypt(shared_key, 'check' + code)
Your code:
    decrypted = decrypt(shared_key, to_send)
    if not decrypted.startswith('check'):
        return False  # failed check
OK, so you want no interaction between the other application and your application. And you would like to limit the codes to 6 characters. Here are my thoughts:
Use 10 characters, that will make brute-force attacks harder;
Use all Latin letters and digits - that will give you 36 possible character values;
Why not use some big number library and simply multiply your code (taken as a Base36 number) by some ludicrously large value (say, 2048 random bits)? Then convert it to Base36 and take the last 10 digits. Or maybe the first 5 and the last 5. Or maybe some other combination dependent on the original code. I've no idea how cryptographically strong this will be (probably not very), but the effort to crack the code will hardly be smaller than simply paying for the service.
Alternatively you could salt your code (prepend some secret string) and then calculate the MD5 of it. Return the MD5 (or some N characters of it) to the user as your code. This should be pretty OK cryptographically, although I'm no expert. By converting the MD5 result to Base36 you pack more bits into each character, so a code of the same length is harder to guess.
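The salted-hash idea might be sketched as follows. SECRET_SALT, response_code and is_valid are hypothetical names, and the 10-character length follows the earlier suggestion in this answer. MD5 is used only because the answer names it; an HMAC would be the more standard keyed construction.

```python
import hashlib

SECRET_SALT = b"hypothetical-secret-known-to-both-sides"
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def response_code(product_code: str, length: int = 10) -> str:
    """Hash salt + code with MD5 and return `length` Base36 characters of the result."""
    digest = hashlib.md5(SECRET_SALT + product_code.encode("utf-8")).digest()
    value = int.from_bytes(digest, "big")
    chars = []
    for _ in range(length):
        value, rem = divmod(value, 36)  # peel off the low-order Base36 digits
        chars.append(ALPHABET[rem])
    return "".join(chars)


def is_valid(product_code: str, entered: str) -> bool:
    """Recompute the code on the server and compare it with what the user typed."""
    return response_code(product_code) == entered.strip().upper()
```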
Why a "checksum"? Can't the 3rd party run any little utility that you give them? All you need is a 5-digit encryptor that the 3rd party can run on their computer, feed the product code into, and send the 5-digit result to the client as the key code.
The encryptor always produces the same result from the same input.
Then, the client sends you the product code and the key code. You run the product code through an exact copy of that encryptor, and compare that result to the key code.
The security of this system can be enhanced without changing the fundamental architecture.
-Al.
Edit after some clarifications:
I still think that the product code and the matching 3rd party response cannot be constant - otherwise the pair can be shared with other users, who will then be able to give the response code without going to the 3rd party.
If the product code is constant, a possible approach is that the 3rd party response depends on both the code and the user's phone number, and so is your validation. This way, each response is both product and user specific.
The specific permutation of the Luhn algorithm isn't too important in my opinion - if someone can crack one variation, he'll probably be able to crack another one.
Original Answer:
In short, I think you can use the Luhn algorithm, if you give the user a one-time ticket, valid for a limited amount of time.
First, if I understand the problem correctly, your product code cannot be constant - otherwise the response created by the 3rd party will always be the same for this product. This means the user will be able to use this code again later, or even give it to another user.
Therefore, I think you should generate and give the user a random new code per his request of information/access to the product. This code should be valid for this product for a limited period of time (an hour, a day, depending on your needs).
The response sent by the 3rd party to the user should be valid only when entered together with the code you provided to the user.
After validation, this code cannot be used until the specified time period is over.
As an option, I think you and the 3rd party can append something like the current date to the code and response pair during computation, so they are not always the same pair.
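The two ideas in this answer, tying the response to the user's phone number and to the current date, could be combined along these lines. This is a sketch that assumes the Company and the 3rd party share a secret key; SHARED_KEY, response_for and is_valid are made-up names, and HMAC-SHA256 is my choice rather than anything the answer prescribes.

```python
import hashlib
import hmac
from datetime import date

SHARED_KEY = b"hypothetical-key-shared-with-the-3rd-party"


def response_for(product_code: str, phone: str, day: date, length: int = 8) -> str:
    """Response that depends on the product, the user's phone number and the day."""
    message = f"{product_code}|{phone}|{day.isoformat()}".encode("utf-8")
    digest = hmac.new(SHARED_KEY, message, hashlib.sha256).hexdigest()
    return digest[:length]


def is_valid(product_code: str, phone: str, entered: str, today: date) -> bool:
    """Accept only the response computed for this product, phone number and day."""
    expected = response_for(product_code, phone, today)
    return hmac.compare_digest(expected, entered.strip().lower())
```

A response handed to a friend fails because the phone number differs, and yesterday's response fails because the date differs.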
After long debates with the 3rd Party Company we've concluded that the best solution is for them to pass the Client's SMS on to me; I generate a new code and send it back to them, and they in turn send a new SMS to the Client with the code I generated. Not optimal from my point of view, but at least I can now do it any way I want.
Thanks for your input though.
I have a form that asks users to enter a start and end time for an event. For many years, we have allowed them to enter the times by selecting the hour (1-12), minute (1-60), and AM/PM from three drop down boxes. This has worked fine without complaints from customers. However, today I was hit with a request to change the input to one text box for the user to enter time in military time (aka 0000 - 2359). In my gut I believe this is a bad idea but am having trouble coming up with any hard facts.
What are the best reasons I can give that this would be a bad idea?
If there is a better solution for entering time, what would it be?
Also, FYI the users filling out the form run the gamut from very little skill with computers to advanced users. They are in no way military related.
Update: All my users are local and no other forms (web or print) use military time as the standard.
Three dropdowns are a nightmare usability-wise. You can cut these down to two by eliminating AM/PM and moving to 24-hour format, but still: a dropdown with 60 items is overkill.
I'd much prefer to enter time "manually", provided that these input boxes will be intelligent enough (say, they should be able to convert 18 to 1800, 0 to 0000, allow : as a separator, etc.). Plus do not allow users to enter incorrect data in the first place.
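A small sketch of what such an "intelligent" box could do with its input, assuming 24-hour entry. normalize_24h is a hypothetical helper; it returns None for anything it cannot read as a time, so the form can refuse it.

```python
import re
from typing import Optional


def normalize_24h(text: str) -> Optional[str]:
    """Turn '18', '0', '9:30' or '2359' into a four-digit HHMM string, else None."""
    cleaned = text.strip().replace(":", "")
    if not re.fullmatch(r"[0-9]{1,4}", cleaned):
        return None
    if len(cleaned) <= 2:
        hour, minute = int(cleaned), 0                      # '18' means 18:00
    else:
        hour, minute = int(cleaned[:-2]), int(cleaned[-2:])
    if hour > 23 or minute > 59:
        return None
    return f"{hour:02d}{minute:02d}"
```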
To answer your question: I see no reason to disallow your users to do what they want. After all, they are users.
Well, from a user interface standpoint, this could be a mistake simply according to some of Jakob Nielsen's user interface heuristics:
"Match between system and real world." If your users are not used to entering times in military time, asking them to do so for your app can be distracting at best, and frustrating at worst.
"Error prevention" You are not eliminating error-prone conditions, but possibly introducing them.
There is also the question of why this change is being made. Are customers complaining? Is data coming in incorrectly? As mentioned by others, are your users used to military time? Any interface change should happen for a reason, IMO, because you're going to change the user experience and there will be ramifications for that; it's just a matter of how large those ramifications will be. My assumption is that data entry errors are supposedly going to be avoided -- but are they? Asking a user to enter a time as "XX:XX" and parsing out the colon (or, as Aaron Digulla stated, ANY non-number characters) and then converting it as needed seems less likely to result in errors than asking a user to enter a time in a format they are not used to using daily.
My concern would be that a user wants to enter 3:30 PM, and, while not paying much attention, simply enters 330. This is now 3:30 AM, and the user will never know the difference, because the app takes the information and happily assumes that this is what is meant. However, allowing the user to enter the time in "XX:XX" format and having an "AM/PM" selection makes much more sense.
As far as hard facts, well, I don't have them either. But if your boss/client won't be swayed by Nielsen's heuristics, I'm not sure what can change their mind.
Oh my.
My advice is to quit and find a different project.
We did a scheduling app for a "military customer" - and even they could not agree on what constituted "military time". Half of them wanted something called "Zulu Time" - the other half wanted "GMT plus offset" - then some wanted local time in 24h format. Contrary to what our contract specified, a Colonel insisted we use "Zulu" - we made the change for political reasons (in violation of our contract) - and then HE missed showing up for a scheduled event, because he thought it was in local time. Then contract management came down on us like a ton of bricks.
(never mind that the published schedule also used an obsolete "offset" that was a cold-war holdover meant to "fool the Russians").
In that this is just me sharing a war-story. . .
The real answer is to Elicit Requirements from your customer. Get those requirements SPECIFICALLY written into your contract. Make sure that the stakeholder who is actually writing your check, agrees. Develop to that specification exactly. When someone complains tell them to pay for a contract mod. You'll probably be changing this back and forth among many different settings for the next 10 years. You'll have steady work, and you'll understand why military contracts frequently go way over budget and are never on schedule.
"They are in no way military related."
That's a good enough reason for me. It's an uncommon format that, while not exactly "user-hostile," is nonetheless not the way most of us are used to seeing times, and requiring your users to do the conversion in their head will lead to arithmetic errors eventually.
That said, drop-down boxes aren't great either. Best to go with 2 input boxes and an AM/PM dropdown, in my opinion.
It may not be a bad idea. Imagine the case where users must enter that bit of information lots of times, for example because they are in call support. Or they may find the dropdown boxes not usable enough, even after having tried them. They may prefer that other format.
It is usually a good idea to talk to the stakeholder and ask him: "Why do you want it this way?" you can then contrast their ideas with yours, but if yours are only that you have the "gut" feeling that this is not right, guess who will win the argument. The gut feeling is not a valid business argument - especially when the business is not yours.
So in short, do what your customer wants - just make sure that they understand their options well, and point out to them any inconvenience that they may not have foreseen - once you find one, that is.
Honestly, I think using AM/PM format is a bad practice, but that may be because I'm used to the 24-hour scale.
One reason against is that if all your users are used to the 12-hour scale, then most of them might still enter 1:00 instead of 13:00 for 1:00 PM. Since the PM is not there, it will result in mistakes.
However, one good reason to make the switch is simply that it's the international standard.
Depending on where you want to put the emphasis (speed or functionality), you can use a time picker that relies on the regional settings to display the time in the user's format, or use a clock-like control. If speed is important, you might prefer a simple masked textbox.
Hmmm, describing the 24 hour clock as "military time" and then noting that the users are not military makes me more than a little twitchy.
It will depend on your users but I think that it is more than reasonable to expect people in contemporary society to understand the 24 hour time format and to be able to enter times using that format (given that I would - possibly naively - expect that format to be in use for bus, train, plane and other timetables almost universally for the simple reason that it's unambiguous). Perhaps this is not true worldwide - but it is certainly true across Europe.
That said, changes need to be made for a reason - "if it ain't broke..." is a very sound maxim for a working site and whilst I wouldn't ever willingly use am/pm for time entry I don't have a problem with use of dropdowns for time entry - especially as one can type "into" them. In this case I think that going from drop downs to text boxes is most likely an opportunity to introduce errors (although again it rather depends on the users).
I can see why you think this is a bad idea, silly users input wrong format etc.
However have you considered a jQuery Masked input box?
In my own frames, I accept times and dates in a wide variety of formats. When the field loses focus, I'll try to parse the input and format it into the "correct" or "official" format. This gives the user a nice way to enter the data and a visual cue when something is wrong.
For example, in a date field, I'll accept "1" as "01.12.2009" (current month+year). In a time box, I'll accept "1030", "10 30", "10.30" (i.e. I just filter out anything which isn't a number). "010409 1125" becomes 1. April 2009, 11:25am.
Few outside the United States know the words "military time". They also prefer the 24-hour format.
If you want globalization, you can do one of the two:
use accepted and de-facto standards, such as ISO8601 date format, 24h time and speak English
dive into the nightmare of the vast regional-based localization complexity (some unfortunate programmers have to do it anyway. Then they support AM/PM, unicode and never-showing-yellow-color for certain cultures)
I cannot believe how much consideration this idea has gotten.
Forcing your user to do things your way, because it's "more efficient" is a terrible idea.
Your forms should be both streamlined (power users can enter data quickly from the keypad) and comprehendible (first time users can navigate successfully). The conversion to 24 hour time will throw people immediately. I lived in Quebec for almost six years and still had troubles switching back and forth from 24hour time. DON'T DO THIS.
Just in addition to all the rest of the comments, you should think about one more thing.
Programmers and designers usually think the client pays us just for creating what he tells us to... That's only half true. They pay us, even if they don't realize it, for telling them what they need, what's best for them.
Of course, the final decision is always theirs, as they pay, but if you feel it is wrong and you think you know the business model better than they do, then do not blindly accept whatever they tell you to do.
You might want to consider using the jQuery timepicker (or Telerik DateTimePicker in Time-only mode for WinForms) and also build in support, on the backend, for multiple formats in the event that javascript is disabled.
date/time input through select boxes is a horrible UI design.
but, if some of your users come from the few countries that stick to AM/PM for time format, then forcing the "military" format on them without assistance from the program is also bad.
use something like the jQuery masked input plugin.
if i was doing this, i would use a masked text input and a "PM" checkbox: if the value is more than 1259, the checkbox is disabled. otherwise, it's clear by default.
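the checkbox rule above could be sketched like this. it assumes values from 0100 to 1259 are read on the 12-hour clock (so 1230 with the box clear means 12:30 AM) and anything else is already 24-hour; both function names are made up.

```python
def pm_checkbox_enabled(hhmm: int) -> bool:
    """The PM box only makes sense for values that could be 12-hour times."""
    return 100 <= hhmm <= 1259


def to_24h(hhmm: int, pm_checked: bool) -> str:
    """Combine the masked input (as a number) and the PM box into HH:MM."""
    hour, minute = divmod(hhmm, 100)
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError(f"not a valid time: {hhmm}")
    if pm_checkbox_enabled(hhmm):
        if pm_checked and hour != 12:
            hour += 12          # 3:30 PM -> 15:30
        elif not pm_checked and hour == 12:
            hour = 0            # 12:30 AM -> 00:30
    return f"{hour:02d}:{minute:02d}"
```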
Why not use a TimePicker control of some sort?
You shouldn't force non-military users to use a time format that is strange to them.
In any case, assuming that all input is by logged-in users, you can provide multiple input mechanisms (and certainly multiple ways of displaying time) and make the choice a user preference. But I'd strongly recommend that whatever you do, for any given user times should be entered and displayed in a consistent manner.
I guess this is a multi-part question. I am building a membership site and want to have the accounts as international as possible.
What is the best way to collect phone numbers on a form that allows for international numbers? I'm not worried about storing them, just collection and validation. What I have now is a drop down with a country list that adds the country code, then the number itself with validation for US/CAN/UK based on the country code, and then the extension. These will be stored as strings in 3 fields for cc/number/ext. Does anyone have a better, solid solution for this, or perhaps seen one in action anywhere?
Ditto for addresses. What is the best way to go? Address/City/State/Zip/Country or just lines? I would like to be able to sort by these, so a single text field isn't a very good solution, though it is the most flexible.
This is also important because we may be sending actual mail to our members. I am put in mind of a few members I've had for other services that had addresses in countries I had never heard of, that even the woman at the post office couldn't tell if they were formatted correctly.
I want to have geodata in the db, at least country/state, for things like populating a state dropdown after selecting a country, field standardization, etc. Does anyone know of a great database that can be used as the geodata base of an app?
Phone number validation - I'm not sure if I'd spend a lot of time on this. Numbering schemes change quite often (for example, during the time I lived in the UK, the phone numbers for London area codes changed at least once, with another change shortly before I moved there) and in Germany it is (or at least used to be) quite common to increase the number of available phone numbers on a given exchange by taking an old number and tacking an extra digit or two at the end. So any assumption about a given phone number format will change and you'll end up playing catch-up. If you insist on splitting the phone number into international/area code/main number you'll probably find that this is a very country-specific way of representing the information so you'll need an input mask pretty much for every country and specific validation rules. Not to mention that in places like Germany, an area code can have between two and four digits etc...
Regarding postal addresses, the most important suggestion I have is to ensure that you can accept non-numeric post/zip codes, otherwise you won't be able to handle addresses in Canada and the UK (and possibly other places). This is a bit of a hobby horse of mine as I've had a few issues with websites in other countries that simply refused to let me put in a non-numeric post code and I had to resort to faxing over my address information as I couldn't fill in the online application form. In my book that's bad karma if you allow international customers....
Also, assuming the existence of certain parts of the address (state/county, for example) and requiring them is usually more of a headache than it's worth. I'd be tempted to offer the standard house number + street (combine them: different languages put the house number in different places, so separating them out is not a good idea IMHO unless you know how to reassemble them correctly, plus sometimes you'll end up with a house name instead of a number), town and zip/post code, possibly with an optional county/state field. If you want to be really helpful to your international audience, offer a free-form, single text entry field for those addresses that don't conform to our "standard" assumptions of how an address looks. And please make the fields big enough so people with quite long addresses don't run out of space...
There is an international standard for telephone numbers, but it leaves a lot of breathing room. Separators are not mandatory, but are restricted to space, period, and hyphen. Round brackets (aka parentheses) are to be put around digits which are optional depending on where you are dialling from. For example, the area code is optional in some areas. I would provide a text field and let the user enter their number however they want.
For addresses, provide lots of fields and don't restrict too much. House numbers sometimes contain letters. Road types are sometimes written in full, and other times abbreviated. (St = Street, Ave = Avenue, etc.) I would provide drop-downs where possible (state/province), but allow freeform input when you don't have a list. When the user is entering their address, it's ok to validate for security risks, but you might want to leave geographical validation until later. For example, if the user enters a postal code of T8N 4E3 and selects Ontario as their province, the address is not valid because the given postal code is for Alberta. Display a friendly message to the user letting them know that they need to correct their address or contact you if it's correct (possible bug in your code).
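The postal code example can be sketched as a soft check that produces a friendly warning instead of a hard rejection. The first-letter table below is written from memory and should be verified against Canada Post before being relied on; address_warning and the message wording are hypothetical.

```python
import re
from typing import Optional

# First letter of a Canadian postal code -> province/territory codes (assumed table).
PROVINCES_BY_FIRST_LETTER = {
    "A": {"NL"}, "B": {"NS"}, "C": {"PE"}, "E": {"NB"},
    "G": {"QC"}, "H": {"QC"}, "J": {"QC"},
    "K": {"ON"}, "L": {"ON"}, "M": {"ON"}, "N": {"ON"}, "P": {"ON"},
    "R": {"MB"}, "S": {"SK"}, "T": {"AB"}, "V": {"BC"},
    "X": {"NT", "NU"}, "Y": {"YT"},
}
POSTAL_RE = re.compile(r"[A-Z][0-9][A-Z] ?[0-9][A-Z][0-9]")


def address_warning(postal_code: str, province: str) -> Optional[str]:
    """Return a message for the user, or None if code and province agree."""
    code = postal_code.strip().upper()
    if not POSTAL_RE.fullmatch(code):
        return "That does not look like a Canadian postal code. Please check it."
    expected = PROVINCES_BY_FIRST_LETTER.get(code[0])
    if expected is None or province.strip().upper() not in expected:
        return ("The postal code and province do not seem to match. "
                "Please correct your address, or contact us if it is right.")
    return None
```

For the example in the answer, T8N 4E3 with Ontario selected produces the mismatch message, while the same code with Alberta passes.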
Address - just remember that not every country has states and ZIP codes, and where ZIP codes exist they can be in a different format ([0-9]{2}-[0-9]{3} here). (edit: usually a postal address with 2 address lines, city, state (optional), zip code (optional) and country is ok).
The same goes for geodata - you can make sequential dropdowns with states and cities, but I guess you won't cover every city. Why not show a piece of Google Maps and allow the users to click there to mark their position?