How to choose a barcode type - barcode

I am building a time keeping application for a business. Their staff will carry a unique barcode (either on a lanyard or as an image on their phone) and will display it to a barcode reader. The reader will use it to identify the clock-on/clock-off activity of the staff member.
As it currently stands, each staff member already has a unique id. These are incrementing integers, starting at the digit 1. At the very most, there may be hundreds or thousands of unique staff members (throughout the duration of the lifetime of my application) but certainly not one million or more. I am planning to encode this unique ID as the barcode.
Given the above, how should I choose a bar code system?
It seems to me that EAN-13 is widely supported by barcode readers and has ample 'space' for my needs (i.e. fewer than one million unique IDs). This would seem like a good choice.
I see that some other systems include 'error checking', but they include a lot more visual detail. I presume that these codes would need to be printed carefully (e.g. not on a home printer) and would only be useful in well-lit environments.

EAN-13 uses "registered" numbers and is primarily for consumer packaging.
If you want a linear barcode, look at code128. It has a built in check-digit and packs two digits per "character". You should plan to zero-pad the numbers to 6 or 8 digits so the barcodes are always the same size. The following shows "000101" in code128:
https://bwipjs-api.metafloor.com/?bcid=code128&text=000101&includetext&backgroundcolor=ffffff
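To make the zero-padding and check-digit idea concrete, here is a minimal Python sketch. It assumes the digits are encoded with Code 128 code set C (two digits per symbol); the function names are made up for illustration, and a real barcode library would do this for you.

```python
START_C = 105  # Code 128 symbol value for "Start Code C"
STOP = 106     # Code 128 stop symbol value


def zero_pad(staff_id: int, width: int = 6) -> str:
    """Pad a staff ID to a fixed, even number of digits."""
    text = str(staff_id)
    if staff_id < 0 or len(text) > width:
        raise ValueError("staff_id does not fit in the chosen width")
    return text.zfill(width)


def code128c_symbols(digits: str) -> list[int]:
    """Return the symbol values (start, data pairs, check, stop) for set C."""
    if len(digits) % 2 or not (digits.isascii() and digits.isdigit()):
        raise ValueError("code set C needs an even number of digits")
    pairs = [int(digits[i:i + 2]) for i in range(0, len(digits), 2)]
    # Check symbol: start value plus position-weighted data values, modulo 103.
    weighted = sum(pos * val for pos, val in enumerate(pairs, start=1))
    check = (START_C + weighted) % 103
    return [START_C, *pairs, check, STOP]


print(code128c_symbols(zero_pad(101)))  # [105, 0, 1, 1, 7, 106]
```

The scanner verifies the check symbol itself and only hands the data digits to your application, so the application still just sees "000101".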

I'd choose QR codes cos they're sick, you can stick a logo in them, and they're just nice in general.

Related

How can I generate a unique identifier that is apparently not progressive [duplicate]

A few months back I was tasked with implementing a unique and random code for our web application. The code would have to be user friendly and as small as possible, but still be essentially random (so users couldn't easily predict the next code in the sequence).
It ended up generating values that looked something like this:
Af3nT5Xf2
Unfortunately, I was never satisfied with the implementation. GUIDs were out of the question; they were simply too big and difficult for users to type in. I was hoping for something more along the lines of 4 or 5 characters/digits, but our particular implementation would generate noticeably patterned sequences if we encoded to fewer than 9 characters.
Here's what we ended up doing:
We pulled a unique sequential 32bit id from the database. We then inserted it into the center bits of a 64bit RANDOM integer. We created a lookup table of easily typed and recognized characters (A-Z, a-z, 2-9 skipping easily confused characters such as L,l,1,O,0, etc.). Finally, we used that lookup table to base-54 encode the 64-bit integer. The high bits were random, the low bits were random, but the center bits were sequential.
The final result was a code that was much smaller than a guid and looked random, even though it absolutely wasn't.
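The scheme described above can be sketched roughly like this in Python. The exact alphabet and the 16/32/16 bit split are assumptions (the post only says "center bits" and "base-54"), so treat this as an illustration rather than the original implementation.

```python
import secrets

# Assumed 54-character alphabet: A-Z and a-z without I/L/O and i/l/o, plus 2-9.
ALPHABET = "ABCDEFGHJKMNPQRSTUVWXYZabcdefghjkmnpqrstuvwxyz23456789"


def encode(value: int) -> str:
    """Base-54 encode a non-negative integer using ALPHABET."""
    if value == 0:
        return ALPHABET[0]
    chars = []
    while value:
        value, rem = divmod(value, len(ALPHABET))
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def decode(code: str) -> int:
    value = 0
    for ch in code:
        value = value * len(ALPHABET) + ALPHABET.index(ch)
    return value


def make_code(seq_id: int) -> str:
    """Put a sequential 32-bit ID between 16 random high and 16 random low bits."""
    if not 0 <= seq_id < 2**32:
        raise ValueError("seq_id must fit in 32 bits")
    value = (secrets.randbits(16) << 48) | (seq_id << 16) | secrets.randbits(16)
    return encode(value)


def extract_seq_id(code: str) -> int:
    """Recover the sequential ID from the middle 32 bits."""
    return (decode(code) >> 16) & 0xFFFFFFFF
```

Because the sequential ID sits in the middle bits unchanged, uniqueness comes from the database counter and the random bits only disguise it.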
I was never satisfied with this particular implementation. What would you guys have done?
Here's how I would do it.
I'd obtain a list of common English words with usage frequency and some grammatical information (like is it a noun or a verb?). I think you can look around the intertubes for some copy. Firefox is open-source and it has a spellchecker... so it must be obtainable somehow.
Then I'd run a filter on it so obscure words are removed and that words which are too long are excluded.
Then my generation algorithm would pick 2 words from the list and concatenate them and add a random 3 digits number.
I can also randomize word selection pattern between verb/nouns like
eatCake778
pickBasket524
rideFlyer113
etc..
the case needn't be camel casing, you can randomize that as well. You can also randomize the placement of the number and the verb/noun.
And since that's a lot of randomizing, Jeff's The Danger of Naïveté is a must-read. Also make sure to study dictionary attacks well in advance.
And after I'd implemented it, I'd run a test to make sure that my algorithms should never collide. If the collision rate was high, then I'd play with the parameters (amount of nouns used, amount of verbs used, length of random number, total number of words, different kinds of casings etc.)
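A minimal Python sketch of the word-pair idea, assuming tiny hard-coded word lists (a real list would come from a filtered dictionary) and a set of already issued codes to guard against collisions:

```python
import secrets

# Placeholder word lists, purely for illustration.
VERBS = ["eat", "pick", "ride", "bake", "find"]
NOUNS = ["Cake", "Basket", "Flyer", "Garden", "Lantern"]


def make_word_code(issued: set[str], max_attempts: int = 1000) -> str:
    """Return a verb+noun+3-digit code that has not been issued before."""
    for _ in range(max_attempts):
        code = (
            f"{secrets.choice(VERBS)}"
            f"{secrets.choice(NOUNS)}"
            f"{secrets.randbelow(1000):03d}"
        )
        if code not in issued:
            issued.add(code)
            return code
    raise RuntimeError("code space looks exhausted; enlarge the word lists")
```

With 5 verbs, 5 nouns and 3 digits there are only 25,000 possible codes, which is why the size of the word lists matters so much for the collision rate.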
In .NET you can use the RNGCryptoServiceProvider method GetBytes() which will "fill an array of bytes with a cryptographically strong sequence of random values" (from ms documentation).
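using System.Security.Cryptography;
// Fill a 4-byte buffer with cryptographically strong random values.
byte[] randomBytes = new byte[4];
RNGCryptoServiceProvider rng = new RNGCryptoServiceProvider();
rng.GetBytes(randomBytes);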
You can increase the length of the byte array and pluck out the character values you want to allow.
In C#, I have used the 'System.IO.Path.GetRandomFileName() : String' method... but I was generating salt for debug file names. This method returns stuff that looks like your first example, except with a random '.xyz' file extension too.
If you're in .NET and just want a simpler (but not 'nicer' looking) solution, I would say this is it... you could remove the random file extension if you like.
At the time of this writing, this question's title is:
How can I generate a unique, small, random, and user-friendly key?
To that, I should note that it's not possible in general to create a random value that's also unique, at least if each random value is generated independently of any other. In addition, there are many things you should ask yourself if you want to generate unique identifiers (which come from my section on unique random identifiers):
Can the application easily check identifiers for uniqueness within the desired scope and range (e.g., check whether a file or database record with that identifier already exists)?
Can the application tolerate the risk of generating the same identifier for different resources?
Do identifiers have to be hard to guess, be simply "random-looking", or be neither?
Do identifiers have to be typed in or otherwise relayed by end users?
Is the resource an identifier identifies available to anyone who knows that identifier (even without being logged in or authorized in some way)?
Do identifiers have to be memorable?
In your case, you have several conflicting goals: You want identifiers that are—
unique,
easy to type by end users (including small), and
hard to guess (including random).
Important points you don't mention in the question include:
How will the key be used?
Are other users allowed to access the resource identified by the key, whenever they know the key? If not, then additional access control or a longer key length will be necessary.
Can your application tolerate the risk of duplicate keys? If so, then the keys can be completely randomly generated (such as by a cryptographic RNG). If not, then your goal will be harder to achieve, especially for keys intended for security purposes.
Note that I don't go into the issue of formatting a unique value into a "user-friendly key". There are many ways to do so, and they all come down to mapping unique values one-to-one with "user-friendly keys" — if the input value was unique, the "user-friendly key" will likewise be unique.
If by user friendly, you mean that a user could type the answer in then I think you would want to look in a different direction. I've seen and done implementations for initial random passwords that pick random words and numbers as an easier and less error prone string.
If, though, you're looking for a way to encode a random code in the URL string, which is an issue I've dealt with for a while, then what I have done is use Base64-encoded GUIDs.
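Assuming the encoding meant here is URL-safe Base64, a minimal Python sketch of squeezing a GUID into a URL token could look like this (function names are illustrative):

```python
import base64
import uuid


def guid_to_token(g: uuid.UUID) -> str:
    """Encode the 16 GUID bytes as URL-safe Base64 without '=' padding."""
    return base64.urlsafe_b64encode(g.bytes).rstrip(b"=").decode("ascii")


def token_to_guid(token: str) -> uuid.UUID:
    """Restore the padding and decode the token back to a GUID."""
    padded = token + "=" * (-len(token) % 4)
    return uuid.UUID(bytes=base64.urlsafe_b64decode(padded))


token = guid_to_token(uuid.uuid4())
print(token, len(token))  # 22 characters instead of 36
```

This is shorter than the usual hex form but still not something a user would want to type.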
You could load your list of words as chakrit suggested into a data table or xml file with a unique sequential key. When getting your random word, use a random number generator to determine what words to fetch by their key. If you concatenate 2 of them, I don't think you need to include the numbers in the string unless "true randomness" is part of the goal.

10 digit phone numbers... Is that enough for USA?

My app may be used anywhere in the USA, but will be used by local businesses serving their own areas.
As my project-in-development exists now (and I can change it) I'm storing only 10 digits. I'd like to think my software may be in use 10 years from now, although I can certainly release updates. But since the trend is for every person to have a phone instead of just one number per household, I understand the USA is running out of 10-digit phone numbers.
I know it may not seem so, but yes, I HAVE Googled and the answer I seek is still as clear as mud.
I read that there are locales within the USA (I don't know where) in which even within the same area code, a 1 and the area code must be dialed first. Other times, just the area code must be dialed, without the 1, even within the same area code.
MY QUESTION IS: To accommodate the whole USA and the foreseeable future, will I need to add an "optional 1" in front of the number, in the form of a check box or other device to distinguish those which need a 1 from those that don't? Is there another phone number schema coming in the future? Or putting it all more simply: Is 10 digits enough?
If you only want to store North American numbers, you'll be fine.
North American Numbering Plan
10 digits is the standard length in North America (this includes Canada).
You should allow for 15 digits including the country code. You already need 12 to 14 digits (including country code) for many countries.
Store all numbers in E.164 format including the country code, without spaces or punctuation.
This will allow easy expansion internationally to other countries and also allow manipulation of numbers in the database if the length of numbers used in any country were to ever change.
There's talk that US numbers will become a digit longer some time in the next decade or two. You should plan for that now, not when you have tens of millions of numbers stored.
There's constant change in national number plans. If you know that area code 765 in country 980 is changing to area code 77 and all local numbers are having 88 added to the beginning it's a simple operation to make that change if all the numbers in the database include the country code.
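As a rough Python sketch of the "store everything in E.164" advice: the helper below assumes that input without a leading "+" is a North American number, which is an assumption of this example rather than a general rule.

```python
import re

E164_MAX_DIGITS = 15  # country code plus national number


def to_e164(raw: str) -> str:
    """Normalise user input to E.164: '+' followed by digits only."""
    stripped = raw.strip()
    digits = re.sub(r"\D", "", stripped)
    if stripped.startswith("+"):
        pass  # caller already supplied a country code
    elif len(digits) == 11 and digits.startswith("1"):
        pass  # NANP number typed with the leading 1
    elif len(digits) == 10:
        digits = "1" + digits  # assume NANP and add country code 1
    else:
        raise ValueError(f"cannot interpret phone number: {raw!r}")
    if not digits or len(digits) > E164_MAX_DIGITS:
        raise ValueError(f"not a plausible E.164 number: {raw!r}")
    return "+" + digits


print(to_e164("(212) 555-0123"))    # +12125550123
print(to_e164("+44 20 7946 0958"))  # +442079460958
```

Storing the result in a column that holds 16 characters (the "+" plus 15 digits) also answers the "optional 1" question: the 1 is simply the country code and is always stored.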

Public source of randomness

I want to set up "public lottery", in which everyone can see the selection is random and fair. If I only needed one bit, I would use, for example, the LSB of the closing Dow Jones index for that day. The problem is, I need 32 bits. I need a source that is:
available daily
visible to the public throughout the world
not manipulable (by me or anyone else)
unbiased
simple
I suppose I could just pick 32 stocks or stock-indices and use the LSB of each, that would be at least difficult to manipulate, and run them through some hash to eliminate any bias toward 0, but that doesn't really qualify as "simple". Other thoughts: some feed of meteorological or seismological data. That would be more difficult to manipulate (much easier to buy a share of stock than to cause an earthquake) but harder to authenticate (since there aren't armies of auditors watching weather data).
Any suggestions?
Check out http://www.random.org/. They have a section for their Third-Party Draw Service:
The Third-Party Draw Service is useful for people who operate raffles,
sweepstakes, promotional giveaways and other lottery type services
professionally. In a similar fashion to a certified official,
RANDOM.ORG acts as an unbiased third party who conducts the drawings
in a manner that is guaranteed to be fair and truly random. The
drawings are made using true randomness that comes from atmospheric
noise, which for many purposes is better than the pseudo-random number
algorithms typically used in computer programs.
Check out the Public Records for details about recent drawings held
with the service.
This sounds like what you are looking for, but you would end up having to rely on random.org for the numbers.
The part "visible to the public throughout the world" is the trickiest part in my opinion.
An excellent source of really random numbers is the noise on a webcam (or any other CCD camera). This noise is caused by quantum fluctuation of electron temperature on the CCD plate, so it's truly random.
You could use a picture from a publicly available webcam, but it's hard to find one with a closed shutter... You could set one up and make it available yourself, or you could use one that monitors some meteorological event and subtract a time-averaged image every day.
I hope this is simple enough!
Look at the XKCD GeoHashing algorithm.
MD5(Date, Dow Jones Opening)
Depends how "simple" you want.
I would take a large set of unrelated inputs. You could include some or all of these:
Stock prices (preferably from multiple locations, e.g. Last digit of Dow Jones + last digit of FTSE)
Last digit of the reading from a publicly-visible digital thermometer (easy to find in large cities)
The date
MD5 sum of the current google.com logo image
Name of top-billed guest on today's episode of <insert name of TV talk show here>
Other public lotteries
Concatenate all of these into one large string and apply a cryptographic hash function to it.
The hash will not increase the total entropy, but what it will do is make the output harder to manipulate (because the attacker would need to manipulate many inputs simultaneously.)
Now just take the first 32 bits of the hash.
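The concatenate-and-hash step might be sketched like this in Python. SHA-256 is my choice of hash here, and the input strings are placeholders, not real data.

```python
import hashlib


def lottery_value(inputs: list[str]) -> int:
    """Hash the published inputs and return the first 32 bits as an integer."""
    # Join with a separator so that ["ab", "c"] and ["a", "bc"] hash differently.
    joined = "\n".join(inputs)
    digest = hashlib.sha256(joined.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


# Placeholder values; in practice these are the day's published figures.
todays_inputs = [
    "2009-05-14",
    "dow-close:<value>",
    "ftse-close:<value>",
    "thermometer:<value>",
]
print(lottery_value(todays_inputs))
```

Publishing the exact input strings and the order in which they are joined lets anyone recompute the value and confirm the draw.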
Separate the non-deterministic part from the random part: use a third-party service that streams sets of random numbers, with a serial number (sn) assigned to each set.
You set the number of bits per set and the number of digits in the sn.
The service then streams random sets, each with its sn, in a loop as long as the sn range. Save the stream and you have a batch of numbered sets that you publish as a public record.
Now you can choose a smaller number that doesn't need to be random, just non-deterministic, to pick a single set from the batch.

Is there pseudocode for UK address or phone number validation?

Do you have pseudocode for field validation of the following items in the UK? I am from the USA, so I only know the ones in the USA right now.
Address Line 1
Phone Number
Mobile Number (in case they have a special rule for this, which they might not)
Post Code
Address line 1: if you want to validate what the user entered freeform, you're probably hosed. There's huge variability. You can use the Postcode Address File (see below) to assist.
Typically, if you want a "standard" address, UK-oriented websites ask for the postcode, then prompt the user to choose the correct address from all addresses at that postcode.
Phone and mobile numbers. See here http://en.wikipedia.org/wiki/Telephone_numbers_in_the_United_Kingdom. A script to validate these (in several languages) can be found here: http://www.braemoor.co.uk/software/telnumbers.shtml
Post code format: http://en.wikipedia.org/wiki/UK_postcodes (contains a regular expression for validation, and refers to the Postcode Address File which lists valid addresses)
Address line 1 could be almost anything. There aren't always house numbers.
Phone numbers: the length of an area code varies. I wouldn't like to swear whether the full length is constant, but I suspect it's always at least 10 digits. Mobile numbers typically (IME) start with 07 whereas landlines typically start with 01 or 02. Special numbers (free, local rate etc) typically start with 08. I'll try to find a reference for this. (EDIT: Again, there's a good Wikipedia article.)
Wikipedia has a good article about UK postcodes, including regular expressions for them.
Perl has the Number::Phone module that can handle UK phone numbers. Royal Mail has services for address validation and list cleansing.
For UK phone numbers, your best bet (unfortunately) is probably to download the numbering plans from Ofcom. They're Excel spreadsheets with all relevant number ranges, split up into area codes, "geographical numbers", personal numbers, mobile numbers, assorted service numbers and the different pay-for numbers. They also have a mapping from number range to operator (the latter is probably NOT something you need, though).
As always with this sort of question, don't get hung up on over-validation. Check for the most likely grossly-malformed inputs then move on. Trying to keep track of what prefixes and lengths of phone number are in use in any particular locale is an enormous waste of your time. Letting a few, possibly-fixable mistakes through is better than losing customers by telling them they are ‘invalid’.
(The nearest to a standard addressing format you get in the UK is postcode plus house number. Even then there are always exceptions. )
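In the spirit of catching only grossly malformed input, a loose postcode shape check might look like the Python sketch below. The pattern is a deliberately simplified one (letters, digits and the inward "digit plus two letters" part), not the full rule set, and it does not prove that the postcode exists.

```python
import re

# Loose shape check: outward code (A9, A99, A9A, AA9, AA99, AA9A),
# optional space, then inward code (9AA).
POSTCODE_SHAPE = re.compile(r"^[A-Z]{1,2}[0-9][A-Z0-9]? ?[0-9][A-Z]{2}$")


def looks_like_uk_postcode(text: str) -> bool:
    """Return True if the text has the general shape of a UK postcode."""
    normalised = " ".join(text.upper().split())  # trim and collapse spaces
    return bool(POSTCODE_SHAPE.match(normalised))


print(looks_like_uk_postcode("sw1a 1aa"))  # True
print(looks_like_uk_postcode("12345"))     # False
```

Anything that passes can then be looked up against the Postcode Address File if you need real validation.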
I've started the code (or at least a bunch of RegEx patterns) for Django forms validation for GB telephone numbers here.
With a basic explanation here.

What is the best format for a customer number, order number?

A large international company deploys a new web and MOTO (Mail Order and Telephone Order) handling system. Among other things you are tasked to design format for both order and customer identification numbers.
What would be the best format in your opinion? Please list any assumptions and considerations.
Accepted Answer
Michael Haren's answer selected due to the most up votes, but please do read other answers and comments as they make Michael's answer more complete.
Go with all numbers or all letters. If you must mix it up, then make sure there are no ambiguous characters (Il1m, O0, etc.).
When displayed/printed, put spaces in every 3-4 characters but make sure your systems can handle inputs without the spaces.
Edit:
Another thing to consider is having a built in way to distinguish orders, customers, etc. e.g. customers always start with 10, orders always start with 20, vendors always start with 30, etc.
DON'T encode ANY mutable customer/order information into the numbers! And you have to assume that everything is mutable!
Some of the above suggestions include a region code. Companies can move. Your own company might reorganize and change its own definition of regions. Customer/company names can change as well.
Customer/order information belongs in the customer/order record. Not in the ID. You can modify the customer/order record later. IDs are generally written in stone.
Even just encoding the date on which the number was generated into the ID might seem safe, but that assumes that the date is never wrong on the systems generating the numbers. Again, this belongs in the record. Otherwise it can never be corrected.
Will more than one system be generating these numbers? If so, you have the potential for duplication if you use only date-based and/or sequential numbers.
Without knowing much about the company, I'd start down this path:
A one-character code identifying the type of number. C for customers, R for orders (don't use "O" as it could be confused with zero), etc.
An identifier of the system that generated the number. The length of this identifier depends on how many of these systems there will be.
A sequence number, unique to the system generating it. Just a counter.
A random number, to prevent guessable order/customer numbers. Make this as long as your paranoia requires.
A simple checksum. Not for security, but for error checking.
Breaking this up into segments makes it more human-readable as others have pointed out.
CX5-0000758-82314-12 is a possible number generated by this approach. This consists of:
C: it's a customer number.
X5: the station that generated the number.
0000758: this is the 758th number generated by X5. We can generate 10 million before retiring this station ID or the station itself. Or don't pad with zeros and there's no limit.
82314: this was randomly generated and results in a 1/100,000 chance of guessing a customer ID.
12: checksum.
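A minimal Python sketch of generating and checking a number in this shape. The answer does not say how the checksum is computed, so the "sum of character codes modulo 100" used here is purely a placeholder, and the example number above will not necessarily validate under it.

```python
import secrets


def checksum(body: str) -> int:
    """Placeholder checksum: sum of character codes (dashes ignored) mod 100."""
    return sum(ord(c) for c in body if c != "-") % 100


def make_id(kind: str, station: str, sequence: int) -> str:
    """Build TYPE+STATION-SEQUENCE-RANDOM-CHECKSUM, e.g. CX5-0000758-82314-12."""
    random_part = secrets.randbelow(100_000)  # 5 random digits
    body = f"{kind}{station}-{sequence:07d}-{random_part:05d}"
    return f"{body}-{checksum(body):02d}"


def is_well_formed(identifier: str) -> bool:
    """Recompute the checksum; catches many typos without a database lookup."""
    body, _, check = identifier.rpartition("-")
    return len(check) == 2 and check.isdigit() and int(check) == checksum(body)


new_id = make_id("C", "X5", 758)
print(new_id, is_well_formed(new_id))
```

A production scheme would more likely use a standard check-digit algorithm that also catches swapped digits, which a plain sum does not.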
A primary advantage of using only numbers is that they can be entered much more efficiently using 10-key.
The length of that number should be as short as possible while still encompassing the entire entity space you expect to catalog with room to spare. This can be tricky and should be given a bit of thought. A little set theory can give you the number of unique keys you will have access to, given a group of elements.
It is natural when speaking, to break numbers up into sets of two to four digits. By inserting dashes in some pattern, you can "force" the customer to repeat them in a more efficient and unambiguous manner.
For instance, 323-23-5344, which, of course, is social security number format, helps to inform the speaker where to pause when vocalizing the number. It also provides a visual delineation when writing the number and makes it easy to compare when copying the number.
I second the recommendation that the ordering system masks the input correctly so that no dashes need to be entered at any time. This should be carried through to printed forms to provide a clear expectation of what should be entered. For instance, a printed box for each digit separated by printed dashes.
I disagree that too much information should be embedded in this number especially if those attributes might change. For instance, say we give "323" the meaning of "is a nice customer" but then they call in four times with an attitude. Are we then going to change their customer key to "324", "is a jerk"? What if they are in region 04 and move their company to region 05?
If that happens, your options will be to update that primary key throughout the database or live with the ambiguity that the information embedded in that key is no longer reliable, thus rendering all of the information embedded in the keys of questionable utility.
It is better to store attributes that may change as separate fields in the database and have the customer number be a unique, unchanging key for that customer.
To build on Daniel and Michael's questions: it's even better if the separated numbers MEAN something else. For example, I worked for a company where account numbers were like this:
xxxx-xxxx-xxxxxxxx
The first set of numbers represented the region and the second set represented the market within that region. Once you got used to knowing what numbers were from where, it made it really easy to tell what area an account was in without even having to look at the customer's account.
There are several assumptions that I make when answering this question; some are based on the fact that it is a large international organization, and some are based on the fact that the format is for two separate table types.
Assumptions based on the fact that it's an international organization:
It is probable that each region will need to operate independently -- that is, region A must be able to add customer numbers independently from region B
Each region probably uses a different language so to make the identifiers easily type-able by users around the world, it is best to stick to numbers and spaces only.
Assumptions based on the fact that there are two tables for which this format will be used:
This format may be used by more than the two tables listed, so it should be able to handle an arbitrarily large number of tables.
Experienced users should be able to know what type of identifier they are looking at based on information encoded into the identifier itself.
It would be nice if identifiers were globally unique within the entire system.
Considerations:
For a global company, identifiers can be very long if only numerics are used. We should attempt to limit the amount of extraneous information encoded into the identifier as much as possible.
Identifiers should be self-verifiable to a limited extent; that is a program should be able to detect a large percent of invalid identifiers without looking anything up at all. This implies a checksum.
Proposed format:
SSSS0RR0TTC
The format proposed is as simple as possible, but no simpler:
C The first (rightmost) character will be a checksum of all other characters in the identifier. A simple checksum will do. This will eliminate 90% of all typing errors. If it is decided that this is not enough, then this can be expanded to 2 digits which will eliminate 99% of all typing errors.
TT The next N digits represent the table type number. No table type number can contain the digit zero.
The next digit is a zero. This zero separates the table type number from the region number.
RR The next N digits are the region number. No region numbers can contain a zero.
The next digit is a zero. This zero separates the region from the sequence number.
SSSS The next N digits are the sequence number. This number can contain zeros.
Each set of four numbers are separated by spaces when printed or typed in by convention. Internally they are not separated, but this helps the user transfer them correctly.
Examples
Assuming:
Customer table type=1
Order table type=2
Region code for US-Alabama=1
Region code for CA-Alberta=43
Region code for Ethiopia=924
10 1013 - Customer #1 in Alabama (3 is the checksum: 1 + 1 + 1)
10 1024 - Order #1 in Alabama
9259 0304 3016 - customer # 925903 in Alberta, Canada
20 3043 4092 4023 - order number 2030434 in Ethiopia
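The examples above are consistent with a checksum that is the sum of all other digits modulo 10. Taking that as an inferred assumption, building and parsing the format can be sketched in Python as follows (function names are illustrative):

```python
def digit_checksum(digits: str) -> str:
    """Sum of the digits modulo 10 (inferred from the examples)."""
    return str(sum(int(d) for d in digits) % 10)


def build_id(sequence: int, region: int, table_type: int) -> str:
    """Assemble SSSS 0 RR 0 TT C; region and type may not contain a zero."""
    region_s, type_s = str(region), str(table_type)
    if "0" in region_s or "0" in type_s:
        raise ValueError("region and table type numbers cannot contain 0")
    body = f"{sequence}0{region_s}0{type_s}"
    return body + digit_checksum(body)


def parse_id(text: str) -> tuple[int, int, int]:
    """Return (sequence, region, table_type) or raise ValueError."""
    digits = "".join(text.split())  # spaces are only a display convention
    if len(digits) < 6 or not (digits.isascii() and digits.isdigit()):
        raise ValueError("identifier must be at least six digits")
    body, check = digits[:-1], digits[-1]
    if digit_checksum(body) != check:
        raise ValueError("checksum mismatch")
    # Read from the right: type up to a zero, region up to the next zero.
    rest, _, type_s = body.rpartition("0")
    seq_s, _, region_s = rest.rpartition("0")
    if not (seq_s and region_s and type_s):
        raise ValueError("missing sequence, region or table type")
    return int(seq_s), int(region_s), int(type_s)


print(build_id(925903, 43, 1))      # 925903043016
print(parse_id("20 3043 4092 4023"))  # (2030434, 924, 2)
```

Parsing from the right is what makes the "no zeros in region or type" rule necessary: the sequence number is simply whatever is left over.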
Advantages of this approach:
90% of mistyped numbers will be caught
There are an unlimited number of table types
There are an unlimited number of regions
There are an unlimited number of sequential numbers for each table
Identifier numbers are globally unique to the system. This is important - a customer number cannot be mistaken for an order number and vice versa.
Each region can independently add sequence numbers without a global key
Disadvantages
Each identifier is at least six characters
table types numbers and region numbers cannot contain a zero because the zero is used to separate the sequence number from the region number from the table type number.
Make the number as long as necessary, but not any longer. Every time I pay my water bill, I have to enter my 20-digit customer number, and an 18-digit invoice number. Thankfully, a dash in my customer number separates it into two parts.
Do not depend on leading zeros. Having to figure out how many zeros are in my invoice number is extremely annoying. Take 000000000051415432 for example. Their system won't recognize just 51415432.
Group digits together. If you absolutely have to use long numbers, four-digit chunks should work well.
I would never use user information in IDs. Suppose you use the first letters of the customer's last name followed by some number: e.g. Thomson could be customer THOM-0001.
Only, it appears you made a mistake, and the man's name is Tomson instead of Thomson. User data can be corrected, IDs should never be modifiable. So next time you look up Tomson under TOMS-... you can't find him. Same with other data, like a customer type. It can always change, the ID can't.
This is very basic to RDBMS.
Simply use counting numbers. For readability it's a good idea to insert separators such that you never have more than 4 successive digits: 9999-9999 is better than 999-99999. And don't make the number longer than necessary; people are much more annoyed by being reduced to a 20 digit number than just being reduced to a number.
There's a catch, though. Especially if you have a small business simple counters can give more away than you would appreciate. Say I order something from you, and the order number is 090145. Next month I order again, and the order number is 090171. Er.. 26 orders in a month? Same, I wouldn't feel comfortable to become customer 0006 in a business which has been active for 10 years.
The solution is simple: skip numbers. Don't use random numbers, because you still want them to be in sequence.
I would have my order numbers follow this format:
ddmmyyyy-####-####
Where ####-#### resets to zero at the beginning of every day. This makes it very easy to correlate orders with the date it was placed.
For customer IDs, I would mix capital letters and numbers, but as Michael said avoid commonly mistaken characters (0, O, L, 1, 5, S). This will give you 30 characters to deal with. If you use 20 characters, that will give you roughly a 98-bit range of customer IDs (30^20 is about 3.5 × 10^29), well beyond 64 bits, which is pretty good for security. Make sure you use a secure random number generator when generating IDs. As for how you display the format, it should be the following:
####-####-####-####-####
As Michael said again, make sure your system can deal with dashes, spaces, no spaces, or no dashes. (It should just strip all those characters from the input before validation.)
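A small Python sketch of this approach: 20 characters drawn from a 30-character alphabet with a secure generator, displayed in groups of four, with separators stripped from whatever the user types. The alphabet is the 36 digits and capitals minus 0, O, L, 1, 5 and S; names are illustrative.

```python
import secrets

# Digits and capital letters without 0, O, L, 1, 5, S (30 characters).
ALPHABET = "2346789ABCDEFGHIJKMNPQRTUVWXYZ"
ID_LENGTH = 20


def new_customer_id() -> str:
    """Generate a random ID with a cryptographically secure generator."""
    return "".join(secrets.choice(ALPHABET) for _ in range(ID_LENGTH))


def display(raw: str) -> str:
    """Format as ####-####-####-####-#### for printing."""
    return "-".join(raw[i:i + 4] for i in range(0, len(raw), 4))


def normalise(typed: str) -> str:
    """Strip dashes, spaces and other separators; accept lower case."""
    return "".join(ch for ch in typed.upper() if ch.isalnum())


cid = new_customer_id()
print(display(cid))
```

Random IDs can collide in principle, so the value should still be checked against a unique index when it is stored.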
I hope that helps!
You may add a small checksum (using XOR for instance) to ensure (enhance) correctness of given ids.
If it's by mail, consider z-base-32 encoding. But here, with telephone orders, you may prefer decimal identification.
assuming that the creation of orders/customers is not centralized, or will not always be centralized, use a GUID
if the creation of orders/customers will always be centralized, an unsigned integer would be fine
There is no compelling reason for the order number or customer number to "mean" anything, and it is likely that any segmented number scheme invented will have to be overhauled down the road. Stick to something unique and meaningless.
EDIT: for MOTO, any multi-character alphabetical identifier will cause problems over the phone, so GUIDs are right out. Assuming multiple decentralized MOTO locations, assign each MOTO location a prefix (A, B, C, etc., or 01, 02, ...) and use an integer or big-integer for the customer and order IDs, e.g. 01-1 is the first order from MOTO location #1. Note that zero-padding is unnecessary, imposes an implicit digit limit to the numbers, and requires the customer to distinguish between six zeros and seven zeros when speaking the number. If you must use a padded fixed-length format, break the number up into groups of no more than 4 or 5 digits each.
ADDENDUM: the order number and the customer number do not have to be the primary keys of their respective tables, just unique indexed columns for lookup. You'll probably want to use something simpler/more efficient for the primary keys in the database.
We use leading zeroes for some of our references "numbers" where I work and I can't tell you how many wasted hours I've had over the last seven years forcing Excel to treat them as text. Don't do it.
Auto-incrementing integers are all well and good for computers, but they greatly reduce human beings' ability to spot errors. How important that is will depend on your business. I work with property (housing) related data and our primary reference has the front door embedded in it. It's not elegant but it means that experienced admin staff can spot 90% of minor errors (when we get invoices, etc in) before they get near a database. But in an environment where you're not relying on that kind of process this argument is less compelling.
(Now, some folks have strongly warned about using meaningful data in references as it could be changed, and there's some truth in that, but you can be smart. You don't have to pick something obviously fickle like whether the person is married - you can anchor yourself on past events like a character representing the region they first opened a particular account. Even if you don't do that, have some kind of pattern to help communication with customers. I've worked in a number of call centres and people sometimes come to phone with every piece of documentation from birth certificate onwards as they desperately try to find their account/order/customer number. I don't think saying "It'll be a number between 1 and 100 trillion" would be very handy)
It's been said, but don't create enormously long references. We're busy people, we haven't got time to be keying in this crap over a phone system and making a mistake on digit 17 only to restart (again). Some of your customers may have disabilities and it's likely a growing number will be over 55+. Once again, watch out for the zeroes. You see purchase order numbers and the like with fourteen digits. How many orders do they think they're going to be placing?
If there's going to be any data aggregation outside of your network (and thus not connected to your database), have some sort of check digit or regular expression pattern that your partners/suppliers can use to verify they've not made mistakes. The UK's electrical supply numbering system (MPAN) is a good example of this: it is designed so people can maintain their own records without having to download the big list of every electricity meter in the universe to check they've not made a typo.
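To illustrate the check-digit idea, here is a minimal sketch using the Luhn algorithm (the one used on payment card numbers). This is a swapped-in example, not the MPAN scheme mentioned above, and the function names are my own.

```python
def luhn_check_digit(payload: str) -> int:
    """Return the Luhn check digit for a string of decimal digits."""
    total = 0
    # Walk right to left; the rightmost payload digit is doubled first.
    for position, char in enumerate(reversed(payload)):
        digit = int(char)
        if position % 2 == 0:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return (10 - total % 10) % 10


def is_valid(reference: str) -> bool:
    """Check a full reference whose last digit is its Luhn check digit."""
    if not (reference.isascii() and reference.isdigit()) or len(reference) < 2:
        return False
    return luhn_check_digit(reference[:-1]) == int(reference[-1])
```

A partner can run `is_valid` on a keyed-in reference without any access to your database; for instance `is_valid("79927398713")` is true, while changing the last digit makes it false.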
I would use numbers only, since it is an international company. I would use spaces or dashes every 4-6 digits to separate it. I would also keep the two formats distinct for quick identification.
Example:
000-00000-00000 - could be a customer number
00000-00000-00000-00000 - could be an order number
Stick to numbers (no chars or special stuff):
Can be easily input in an IVR flow
It's international - no language hassles
No confusion between letters and numbers - O vs. 0, I vs. 1
As long as leading 0 is meaningless, you can store/manipulate them more effectively
I would use a completely numeric system for both Order Number and Customer Number; this will allow you to avoid issues with other languages.
Avoid leading zeros, as this can cause issues with data entry and validation.
The number of digits for each will depend on your expected volume. You will always have a greater number of Order Numbers than Customer Numbers. A six-digit Customer Number starting at 100000 will still give you 900,000 customers. Adding 3-4 digits for the order number will give you 999 to 9,999 orders per customer (more if you consider one-off customers).
There is no need to build any sort of identification into your numbering sequence. You have other database fields to identify where a customer is from, etc. Do not overly complicate your system.
KISS (keep it simple stackoverflow)
I would suggest using 16 digit identifiers that when printed or shown to customers are formatted in the format of xxxx-xxxx-xxxx-xxxx but stored as numbers without the dashes in your system.
The reason for using this format is that it makes the number easier for people to read out over the phone, as they can do it in batches of 4 rather than trying to remember how much they have said already.
If you wish, the first 4 digits can be used to identify the type of number: 1000 for customers, 2000 for suppliers, 3000 for orders, 4000 for invoices etc.
The second set can then be a year/month identifier if you wish to keep that sort of information encoded in the number itself, using a format of yymm, so 1000-0903-xxxx-xxxx would be a customer entered in March 2009.
This then leaves you with 8 digits for the actual data itself.
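A minimal sketch of the display/storage split described above, assuming the identifier is stored as a plain integer. The function names are hypothetical, and the sample value simply follows the 1000-0903-xxxx-xxxx layout.

```python
def format_identifier(value: int) -> str:
    """Render a stored 16-digit identifier as xxxx-xxxx-xxxx-xxxx."""
    if not 0 <= value < 10 ** 16:
        raise ValueError("identifier must fit in 16 digits")
    digits = f"{value:016d}"
    return "-".join(digits[i:i + 4] for i in range(0, 16, 4))


def parse_identifier(text: str) -> int:
    """Strip dashes and spaces from what the user typed or read out."""
    digits = text.replace("-", "").replace(" ", "")
    if len(digits) != 16 or not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected 16 digits")
    return int(digits)
```

With a non-zero type prefix such as 1000 in the first group, the stored number never has leading zeros, so nothing is lost by storing it as an integer.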
I would consider the use of letters in the identifiers to be a very bad idea for any system that deals with telephones, as accents and levels of understanding vary so much that people are bound to get upset trying to get their identifier recognised by someone who cannot understand their accent properly.
An additional consideration to the format issue: in the code, create a separate class for OrderId and CustomerId. These classes are immutable, and validate their input to ensure that they are acceptable IDs. Also, no value could be both an order ID and a customer ID.
The simplest approach would just be to have the backing values for OrderId be ints that start with 1, and CustomerIds be ints that start with 2, or something similar.
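As a rough sketch of that idea, assuming the "order IDs start with 1, customer IDs start with 2" convention suggested above (the class layout is my own, using frozen dataclasses for immutability):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderId:
    value: int

    def __post_init__(self):
        # Order IDs are integers whose first digit is 1.
        if not isinstance(self.value, int) or not str(self.value).startswith("1"):
            raise ValueError(f"not a valid order ID: {self.value!r}")


@dataclass(frozen=True)
class CustomerId:
    value: int

    def __post_init__(self):
        # Customer IDs are integers whose first digit is 2.
        if not isinstance(self.value, int) or not str(self.value).startswith("2"):
            raise ValueError(f"not a valid customer ID: {self.value!r}")
```

Because the leading digits differ, `CustomerId(1042)` raises `ValueError`, and a function that accepts an `OrderId` cannot be handed a customer number by accident.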
Wow - what a simple yet revealing question! And what a lot of contradictory answers. I think there are 3 obvious candidate answers here:
1) Use an autoincrementing long integer.
2) Use a GUID
3) Use a compound type that includes other information in the ID.
For simpler systems, and especially web based systems where all users are hitting a central database, (1) works well. It has the advantage that numbers stay as short and simple as possible, but no shorter, and it avoids alphabetic characters (you would be amazed how different the names for the same letters are in different countries - one country's E is another country's I). It does not differentiate the order ID from the customer ID intrinsically, but you could always prepend or append a "C" or "O" to each and silently drop them on entry.
It also does not have a checksum or error check.
For distributed systems where many software components need to create the numbers on the fly, without reference to a master database, (2) is the only way to go. GUIDs have the advantage of being largely self error-checking, since the address space is so large that a mistyped value is very unlikely to match another real ID, but by the same token they are too long and alphanumeric to comfortably read over the telephone.
As for (3) - embedding region information or today's date into the number - those are the sorts of ideas that experienced developers train themselves out of. It looks like a good idea at first, but always comes back to haunt you. Consider the case where a customer moves to a new state, or an order is manually rekeyed a week after it was originally issued. These items of information belong in related tables where they can be edited independently of the ID, which should represent the entity's identity only.
To repeat: NEVER ENCODE BUSINESS DATA IN AN ID OR PRIMARY KEY - every time you do that you leave a time bomb for others to clean up one day.
Given that this is a centralised (phone based) system I would go with option (1) until a clear need arose to change. Simpler is usually better. Insert hyphens as others suggest and prepend or postpend a checksum and/or identifying letter if required.
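One way option (1) with hyphens and an identifying letter might look, as a sketch only. The helper names and the grouping size of 4 are assumptions; the database key stays a plain auto-increment integer.

```python
def display_reference(kind: str, number: int, group: int = 4) -> str:
    """Format an integer key for humans, e.g. with a 'C' or 'O' prefix."""
    digits = str(number)
    groups = []
    # Group from the right so any short group ends up at the front.
    while digits:
        groups.insert(0, digits[-group:])
        digits = digits[:-group]
    return kind + "-".join(groups)


def parse_reference(text: str) -> int:
    """Silently drop the letter, hyphens and spaces; return the integer key."""
    cleaned = "".join(ch for ch in text if ch in "0123456789")
    if not cleaned:
        raise ValueError("no digits found")
    return int(cleaned)
```

Here `display_reference("O", 1234567)` gives `"O123-4567"`, and `parse_reference("o123-4567")` returns `1234567`, so the letter and hyphens never reach the database.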
First step: in an org sufficiently large to require such a system, there is an existing system that you're replacing. Continue the previous system's scheme, if possible. It makes a lot of things easier if you can access, even at a basic level, the data from the old system.
That said, there's often a good reason to change the scheme, particularly when it's coming from a legacy system. I find, though, that it's often helpful to formally rule out the old scheme before proceeding.
Second step: systems like this never exist in a vacuum. Is there already an organization-wide scheme for user and/or order IDs, such as in the accounting, inventory management, or CRM system? If so, consider adopting the existing schemes to make interoperability easier. Many large orgs have multiple ways to specify a single customer or order, and it just makes getting useful intelligence out of the data that much harder.
Third step: if the old system's scheme is too awful to continue and there's no other scheme to adopt, roll your own. In this case, look at the shortcomings of the original scheme, whatever they are, and correct them. The right answer will depend on the specific requirements of the application. The problem statement you've given us is too vague to speculate usefully on what the final form might look like.
I always stick with auto-increment numbers, and I always seed the sequence high enough so that they will all have a consistent number of digits - seems to be less confusing.
I also sometimes start an order number, say 6 digits, starting at 200,000 and customer numbers at 5 digits, starting at 10,000 which would for example give me 90,000 unique customer numbers and 800,000 unique order numbers to use, and you could always tell just by looking at it whether it was a customer number or an order number. (i.e. so if a customer rep was asking for a number over the phone it would immediately be obvious which was which)
I would not however build logic in the app that would depend on that, so even if it did roll over, the system wouldn't care.
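The seed arithmetic above can be checked with a small sketch (the constant and function names are made up for illustration):

```python
CUSTOMER_SEED = 10_000   # 5-digit customer numbers
ORDER_SEED = 200_000     # 6-digit order numbers


def capacity(seed: int) -> int:
    """How many numbers are available before the sequence gains a digit."""
    return 10 ** len(str(seed)) - seed
```

`capacity(CUSTOMER_SEED)` is 90,000 and `capacity(ORDER_SEED)` is 800,000, matching the figures above. Nothing in the application needs to call this at runtime; it only helps when picking the seed values.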
The biggest issue here is to try not to overthink the problem.
Although I'm more experienced in e-commerce systems I think some of the points made in this post could be applied to mail order and telephone order systems.
For orders, an auto-increment integer works perfectly as the primary key in the database as well as the number that the customer will see on his/her invoice. There is absolutely no reason to create some overcomplicated algorithm for your numbers. If you want to tell which country/region they're from, use a separate field in your database. Also, if you are concerned about your competitors spying on you: let them! If your business revolves around spying on your competitors because you're not generating enough revenue, then most likely your business idea isn't good in the first place. Also, if you wanted to fool your competitor, you could just create your own script that will auto-create fake orders. If your e-commerce system is well designed then this won't be an issue.
Key stuff using an auto-increment integer:
All numbers/digits => easier to communicate, no ambiguities over the phone, works for all languages/cultures that use 0-9 as their numerical system
No extra coding
Looks nice on the invoice and it's the shortest possible number of digits a customer would ever need to spell out over the phone
Works for small AND large businesses
It's scalable
Service-minded/customer-minded (what's best for the customer) (see bullets 1 and 3)
Simple
Whenever or whatever you're designing should always begin with what's best for the customer. At the end of the day they are the ones putting food on your table. A happy customer is a returning customer.
My preferred approach is to combine the date with a counter of today's transactions. I was challenged to come up with an order number of only 5 digits, so I came up with the following:
I get the current date, then
get the current counter for today's transactions and add 1.
I decided to count in a base larger than decimal (base 10), so I use base 16. The largest 5-digit hexadecimal value (FFFFF) is 1,048,575 counts. By involving the date, I can say I can get 1,048,575 counts per day. To make that count unique every day, I mix in the date by taking the sum of the following:
Current Year count starting from the year of the implementation which is 1
Current hour(max is 24)
Current day of the year(max is 365)
With that, the date part is at most 3 digits. The count is therefore XXX followed by today's current transaction number. Example:
Current Date:
2014-12-31 01:22 PM
Implementation date: 2010
Running total for today's transaction: 100
Count: (5 + 13 + 365) = 383, followed by 101, gives 383101, which is 5D87D in hexadecimal
Order Number: AD-5D87D
AD there is just a custom order number prefix. By my estimate, I will only run out of order numbers about 1,000,000 years after my implementation date.
Anyway, this is not a good solution if you think your transactions per day can be as high as 1,000,000.
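The scheme can be sketched as follows. Treating the date part and the counter as decimal strings joined together is my reading of the worked example (383 and 101 giving 383101), and the function and parameter names are hypothetical.

```python
from datetime import datetime


def order_number(now: datetime, implementation_year: int,
                 todays_count: int, prefix: str = "AD") -> str:
    """Build an order number from the date part and today's counter."""
    year_count = now.year - implementation_year + 1  # implementation year is 1
    day_of_year = now.timetuple().tm_yday
    date_part = year_count + now.hour + day_of_year
    counter = todays_count + 1
    # Join the two decimal values, then render the result in base 16.
    combined = int(f"{date_part}{counter}")
    return f"{prefix}-{combined:X}"
```

Calling `order_number(datetime(2014, 12, 31, 13, 22), 2010, 100)` reproduces the example above, `AD-5D87D`. Because the date part is a sum, different combinations of year, hour and day can give the same value, so the counter still has to be checked against existing order numbers.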

Resources