Systems have to sometimes accommodate the possibility of real world bad data. Consider that some data originates with paper forms. And forms inherently have a limited means of validating data.
Example 1: On one form users are expected to enter an integer distance (in miles) into a blank. We capture the information as written as a string since we don't always end up getting integer values.
Example 2: On another form we capture a code. That code should map to one of the codes in our system. However, sometimes the code written on the form is incorrect. We capture the code and allow it to exist with an invalid value until some future time of resolution. That is, we temporarily allow bad data since it's important to record the record even if some of it is invalid.
I'm interested in learning more about how systems accommodate bad data, that is, human error. Databases are supposed to be bastions of data integrity, but the real world is messy and people make mistakes. Systems must allow us to reflect those mistakes.
What are some ways systems you've developed accommodate human error? What practices have you used? What lessons have you learned?
Any further reading on the topic? (I had trouble Googling it.)
I agree with you, whatever we do there's no guarantee that we can get rid of bad or incorrect data. Especially, but not only, if it comes to user input. In my experience the same problems exist in complex integration projects, in which you have to integrate and merge (often inconsistent) data retrieved from different systems.
A good strategy is to decouple the input from the operational system itself. First, place user (or external system) provided data in a separate datastore (e.g. different schema). In a second step load this data into your operational datastore, but only if it confirms to strict rules (e.g. use address verification software to verify a given address). This Extract, Transform, Load (ETL) approach is fairly common in Data Warehousing (DWH) solutions, but can be applied programmatically in transactional systems as well (in my experience).
The above approach often leads to asynchronous processes in which the input is subitted first and (maybe) at a later time the external entity (user or system) retrives feedback whether its data was correct or not.
EDIT: For further readings I recommend to have a look at DWH concepts. Alhtough, you may not want to build such a thing, you could partially apply those concepts:
http://en.wikipedia.org/wiki/Extract,_transform,_load
http://en.wikipedia.org/wiki/Data_warehouse
http://en.wikipedia.org/wiki/Data_cleansing
A government department I worked in does a lot of surveys, most of which are (were) still paper based.
All the results were OCR'd into the system.
As part of the OCR process a digital scan of the forms is kept.
Data is then validated, data that is undecipherable or which fails validation is flagged.
When a human operator reviews the digital data they can modify the data if they are confident that they can correctly interpret what the code could not; they (here's the cool bit) can also bring up the scan of the paper based original, and use that to determine what the user was trying to say.
On a different thread; at some point you want to validate the data coming in against any expected data ranges that you want it to conform to; buy rejecting it at the point of entry you give the user a chance to correct it - the trade off is that every time you reject it you increase the chance of them abandoning the whole process.
At some point in your system you need to specify the rules which will be used for validation. At the end of the day a system is only going to be as smart as those rules. You can develop these yourself into the code (probably the business logic) or you might use a 3rd party component.
having flexible control over the validation is pretty important as they are likely to change overtime.
To be honest with you, one point of migrating from paper-based systems to IT is to remove these errors and make sure all data is always correct. I doubt any correctly planned and developed IT system (especially business financial systems) would allow such errors. Not in the company I am working for anyway...
There are lots of software tools that address the kinds of problems you mention. There are platforms and tools that let you define rules for scrubbing and transforming data and handling validation errors. Those techniques are widely used for Data Integration and Business Intelligence applications. Google for "Data Quality" or "Data Integration".
The easiest thing to do is to (this is not always possible) design the interface where users enter the data to limit as much as possible the amount of text that they need to enter. In my experience this seems to be where a lot of problems come from. One simple example of this is to provide a select, or auto-complete select field
One thing that you could do is do everything possible to determine if the data is correct before going into the db. I try to give the user entering the data as much feedback as possible so they can (ideally) fix some of the issues before the data gets persisted. For example, it is a very quick check to determine if the data being entered is of the correct type.
I got started in legal systems before the PC era. Litigation support databases routinely have to accommodate factually incorrect, incomplete, and contradictory information. It takes a different way of thinking.
The short version . . .
Instead of recording a single fact, you record multiple assertions about a fact. It boils down to designing a database to store data from assertions like these.
In an interview at 2011-01-03 08:13, Neil Rimes told Officer Cane
that he was at home from 2011-01-02 20:00 until 2011-01-03 08:13.
In an interview at 2011-01-03 08:25, Liza Nevers told Officer Cane
that Neil Rimes came home at 2011-01-02 23:45.
In a deposition at 2011-05-13 10:22, Cody Maxon told attorney Kurt
Schlagel that he saw Neil Rimes at Kroger at 2011-01-03 03:00
I've noticed that error messages tend to be written in a handful of common styles. Either in full-form, casual-friendly sentences, or in shortened passive ones that don't always form a full sentence. The latter of the two seems to be the more common - though maybe not as common as the haphazard mixing of styles that I see in a lot of applications.
Does anyone include error messages in their style guide? I'm more curious about opinions on the consistent grammatical construction of these things than about the content of them, which has already been discussed.
Both the iPhone Human Interface Guidelines and the Apple Human Interface Guidelines contain sections on alerts.
Also, the Windows User Experience Interaction Guidelines have information for a few different types of dialogs, including error messages.
(iPhone)
As you compose the required alert title:
Keep the title short enough to display on a single line, if possible.
A long alert title is difficult for
people to read quickly, and it might
get truncated or force the alert
message to scroll.
Avoid single-word titles that don’t provide any useful information, such
as “Error” or “Warning.”
When possible, use a sentence fragment. A short, informative
statement is often easier to
understand than a complete sentence.
Don’t hesitate to be negative. People understand that most alerts
tell them about problems or warn them
about dangerous situations. It’s
better to be negative and direct than
it is to be positive but oblique.
Avoid using “you,” “your,” “me,” and “my” as much as possible. Sometimes,
text that identifies people directly
can be ambiguous and can even be
interpreted as an insult.
Use title-style capitalization and no ending punctuation when:
The title is a sentence fragment
The title consists of a single sentence that is not a question
Use sentence-style capitalization and an ending question mark if the
title consists of a single sentence
that is a question. In general,
consider using a question for an alert
title if it allows you to avoid adding
a message.
Use sentence-style capitalization and appropriate ending punctuation for
each sentence if the title consists of
two or more sentences. A two-sentence
alert title should seldom be
necessary, although you might consider
it if it allows you to avoid adding a
message.
If you provide an optional alert message:
Create a short, complete sentence that uses sentence-style
capitalization and appropriate ending
punctuation.
Avoid creating a message that is too long. If possible, keep the message
short enough to display on one or two
lines. If the message is too long it
will scroll, which is not a good user
experience.
Avoid lengthening your alert text with descriptions of which button to
tap, such as “Tap View to see the
information.” Ideally, the combination
of unambiguous alert text and logical
button labels gives people enough
information to understand the
situation and their choices. However,
if you must provide detailed guidance,
follow these guidelines:
Be sure to use the word “tap” (not “touch” or “click” or “choose”) to
describe the selection action.
Don’t enclose a button title in quotes, but do preserve its
capitalization.
-
(Mac)
Alert message text. This text, in emphasized (bold) system font, provides a short, simple summary of the error or condition that summoned the alert. This should be a complete sentence; often it is presented as a question. See “Writing Good Alert Messages” for more information.
Informative text. This text appears in the small system font and provides a fuller description of the situation, its consequences, and ways to get out of it. For example, a warning that an action cannot be undone is an appropriate use of informative text.
A good alert message states clearly what caused the alert to appear and what the user can do about it. Express everything in the user’s vocabulary.
To make the alert really useful, provide a suggestion about what the user can do about the current situation. Even if the alert serves as a notification to the user and no further action is required, provide as much information as needed to describe the situation.
-
(Microsoft)
The characteristics of good error messages
In contrast to the previous bad
examples, good error messages have:
A problem. States that a problem occurred.
A cause. Explains why the problem occurred.
A solution. Provides a solution so that users can fix the problem.
Additionally, good error messages are
presented in a way that is:
Relevant. The message presents a problem that users care about.
Actionable. Users should either perform an action or change their
behavior as the result of the message.
User-centered. The message describes the problem in terms of target user
actions or goals, not in terms of what
the code is unhappy with.
Brief. The message is as short as possible, but no shorter.
Clear. The message uses plain language so that the target users can
easily understand problem and
solution.
Specific. The message describes the problem using specific language,
giving specific names, locations, and
values of the objects involved.
Courteous. Users shouldn't be blamed or made to feel stupid.
Rare. Displayed infrequently. Frequently displayed error messages
are a sign of bad design.
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 1 year ago.
Improve this question
If you program for a nontechnical audience, you find yourself at a high risk that users will not read your carefully worded and enlightening error messages, but just click on the first button available with a shrug of frustration.
So, I'm wondering what good practices you can recommend to help users actually read your error message, instead of simply waiving it aside. Ideas I can think of would fall along the lines of:
Formatting of course help; maybe a simple, short message, with a "learn more" button that leads to the longer, more detailed error message
Have all error messages link to some section of the user guide (somewhat difficult to achieve)
Just don't issue error messages, simply refuse to perform the task (a somewhat "Apple" way of handling user input)
Edit: the audience I have in mind is a rather broad user base that doesn't use the software too often and is not captive (i.e., no in-house software or narrow community). A more generic form of this question was asked on slashdot, so you may want to check there for some of the answers.
That is an excellent question worthy of a +1 from me. The question despite being simple, covers many aspects of the nature of end-users. It boils down to a number of factors here which would benefit you and the software itself, and of course for the end-users.
Do not place error messages in the status bar - they will never read them despite having it jazzed up with colours etc....they will always miss them! No matter how hard you'll try... At one stage during the Win 95 UI testing before it was launched, MS carried out an experiment to read the UI (ed - it should be noted that the message explicitly stated in the context of 'Look under the chair'), with a $100 dollar bill taped to the underside of the chair that the subjects were sitting on...no one spotted the message in the status bar!
Make the messages short, do not use intimidating words such as 'Alert: the system encountered a problem', the end-user is going to hit the panic button and will over-react...
No matter how hard you try, do not use colours to identify the message...psychologically, it's akin to waving a red-flag to the bull!
Use neutral sounding words to convey minimal reaction and how to proceed!
It may be better to show a dialog box listing the neutral error message and to include a checkbox indicating 'Do you wish to see more of these error messages in the future?', the last thing an end-user wants, is to be working in the middle of the software to be bombarded with popup messages, they will get frustrated and will be turned off by the application! If the checkbox was ticked, log it to a file instead...
Keep the end-users informed of what error messages there will be...which implies...training and documentation...now this is a tricky one to get across...you don't want them to think that there will be 'issues' or 'glitches' and what to do in the event of that...they must not know that there will be possible errors, tricky indeed.
Always, always, be not afraid to ask for feedback when the uneventful happens - such as 'When that error number 1304 showed up, how did you react? What was your interpretation' - the bonus with that, the end-user may be able to give you a more coherent explanation instead of 'Error 1304, database object lost!', instead they may be able to say 'I clicked on this so and so, then somebody pulled the network cable of the machine accidentally', this will clue you in on having to deal with it and may modify the error to say 'Ooops, Network connection disconnected'... you get the drift.
Last but not least, if you want to target international audiences, take into account of internationalization of the error messages - hence that's why to keep it neutral, because then it will be easier to translate, avoid synonyms, slang words, etc which would make the translation meaningless - for example, Fiat Ford, the motor car company was selling their brand Fiat Ford Pinto, but noticed no sales was happening in South America, it turned out, Pinto was a slang there for 'small penis' and hence no sales...
(ed)Document the list of error messages to be expected in a separate section of the documentation titled 'Error Messages' or 'Corrective Actions' or similar, listing the error numbers in the correct order with a statement or two on how to proceed...
(ed) Thanks to Victor Hurdugaci for his input, keep the messages polite, do not make the end-users feel stupid. This goes against the answer by Jack Marchetti if the user base is international...
Edit: A special word of thanks to gnibbler who mentioned another extremely vital point as well!
Allow the end-user to be able to select/copy the error message so that they can if they do so wish, to email to the help support team or development team.
Edit#2: My bad! Whoops, thanks to DanM who mentioned that about the car, I got the name mixed up, it was Ford Pinto...my bad...
Edit#3: Have highlighted by ed to indicate additionals or addendums and credited to other's for their inputs...
Edit#4: In response to Ken's comment - here's my take...
No it is not, use neutral standard Windows colours...do not go for flashy colours! Stick to the normal gray back-colour with black text, which is a normal standard GUI guideline in the Microsoft specifications..see UX Guidelines (ed).
If you insist on flashy colours, at least, take into account of potential colour-blind users i.e. accessibility which is another important factor for those that have a disability, screen magnification friendly error messages, colour-blindness, those that suffer with albino, they may be sensitive to flashy colours, and epileptics as well...who may suffer from a particular colours that could trigger a seizure...
Show them the message. Due dilligence and all, but log every error to a file. Users can't remember what they were doing or what the error message was seconds after the event, it's like eye-witness accounts of perpatrators.
Provide a good way to allow them to email or upload the log to you so that you can assist them in reconciling the issue. If it's a web application: even better, you can be receiving information about the situation ahead of anyone even reporting the problem.
Short answer: You can't.
Less short answer: Make them visible, relevant, and contextual (highlight what they messed up). But still, you're fighting a losing battle. People don't read on computer screens, they scan, and they've been trained to click the buttons until the dialog boxes go away.
We put a simple memorable graphic in the error box: not an icon, a fairly large bitmap, and nothing like the standard Windows message icons. Nobody can ever remember the wording of a messagebox (most won't even read it if the box has an "OK" button they can press), but most people DO remember the picture they saw. So our support people can ask the customer "did you see the coffee-drinking guy?" or "did you see the empty desk?". At least that way we know roughly what went wrong.
Depending on your user base, writing funny/rude/personal error messages can work great.
For instance, I wrote an application which allowed our HR people to better track the hire/fire dates of employees. [we were a small company, very laid back].
When they entered wrong dates I would write:
Hey dumb ass, learn how to enter a date!
EDIT: Of course a more helpful message is to say: "Please enter date as mm/dd/yyyy" or perhaps in code to try and figure out what they entered and if they entered "blahblah" to show an error. However, this was a very small application for an HR person I knew personally. Hence again people, read the first line of this post: Depending on your user base...
I recently worked on an Art Institute project, so the error messages were geared towards the audience, such as:
Most art before the Baroque period was
unsigned. However, we’re beyond the
Baroque period now, so all fields must
be completed.
Basically gear it to your audience if at all possible, and avoid boring as all unearthly general errors such as: "please enter email" or "please enter valid email".
Alerts/popups are annoying, that's why everyone hits the first button they see.
Make it less annoying. Example: if the user entered the date incorrectly, or entered a text where numbers are expected, then DON'T popup a message, just highlight the field and write a message somewhere around it.
Make a custom message box. Do not ever use the default message box of the system, for example Windows XP message boxes are annoying themselves. Make a new colored message box, with a different background color than system default.
Very Important: do not insist. Some message boxes use the Modal dialog and insist on making you read it, this is very annoying. If you can make the message box appear as a warning message it would be better, for example, Stack Overflow messages that appear right on the top of the page, informing but not annoying.
UPDATE
Make the message meaningful and helpful. For example, do not write something like, "No Keyboard found, press F1 to continue."
The best UI design will be where you virtually never show an error message. The software should adapt to the user. With that sort of a design, an error message will be novel and will grab the users attention. If you pepper the user with senseless dialogs like that you're explicitly training them to ignore your messages.
In my opinion and experience, it's the power users, who do not read error messages. The nontechnical audience I know reads every message on the screen most carefully and the problem at this point mostly is: They don't understand it.
This point may be the cause of your experience, because at some point they will stop reading them, because "they don't understand it anyway", so your task is easy:
Make the error message as easy to understand as possible and keep the technical part under the hood.
For example I transfer a message like this:
ORA-00237: snapshot operation disallowed: control file newly created
Cause: An attempt to invoke cfileMakeAndUseSnapshot with a currently mounted control file that was newly created with CREATE CONTROLFILE was made.
Action: Mount a current control file and retry the operation.
to something like:
This step could not be processed due to momentary problems with the database. Please contact (your admin|the helpdesk|anyone who can contact the developer or admin to solve the problem). Sorry for the inconvenience.
Show users that the error message has a meaning, and it's a way to provide assistance to them and they will read it. If it's just jargon-bable or generic nonsense message they will learn to dismiss them quicly.
I have learned that is very good practice to include an error dialog with default action to send (eg. via email) detailed diagnostic info, if you quickly respond to those emails with valuable information or workaround, they will worship you.
This is also a great learning tool. In future versions you can solve known-issues or at least provide in-place workaround info. Until then users will learn that this message is caused by X and the problem can by solved by Y - all because someone did explain it to them.
Of course this won't work on a large scale application, but works very well in enterprise applications with few hundred users, and in a lean agile, release early release often, environment.
EDIT:
Since you have a broad user base I recommend to provide software that does what users are/can expect it to do, eg. do not show them eroror message if phone number is not formatted well, reformat if for them.
I personally like software that does not make me think, and when occasionally there is nothing you (the developer) can do to interpret my intention, provide a very well written (and reviewed by actual users) messages.
It's common knowlege that people do not read documentation (did you read instructions back-to-back do when you did plugged in household appliance?), they try a way to get results quickly, when failed you have to grab their attention (eg. disable default button for a while) with meaningful and helpful info. They don't care about your sofware failure, they want to get results, now.
One good tip I've learned is that you should write a dialog box like a newspaper article. Not in the size-sense, but in the importance-sense. Let me explain.
You should write the most important things to read, first, and provide more detailed information second.
In other words, this is no good:
There was a problem loading the file, the file might have been deleted, or
it might be present on a network share that you don't have access to at
your present location.
Do you want to retry opening the file?
Instead, change the order:
Problem loading file, do you want to retry?
There was a problem loading the file, the file might have been deleted, or
it might be present on a network share that you don't have access to at
your present location.
This way, the user can read just as much as he wants, or bothers, and still have an idea about what's being asked.
To start, write error messages that users can actually understand. "Error: 1023" is not good example. I think better way is logging the error, than showing it to the user with some "fancy" code. Or if logging is not possible, give the users proper way to send the error details to the support department.
Also, be short and clear enough. Do not include some technical details. Do not show them information that they cannot use. If possible provide a workaround for the error. If not provide a default route, that should be taken.
If your application is a web app, designing custom error pages is a good idea. They stress users less, take SO for example. You can get some ideas how to design a good error page here: http://www.smashingmagazine.com/2007/07/25/wanted-your-404-error-pages/
Make them fun. (It seemed relevant, given the site we're on :) )
One thing I'd like to add.
Use verbs for your action buttons to close your error messages rather than exclamations, example don't use "Ok!" "Close" etc.
Unless you can provide the user some simple work-around, don't bother showing the user an error message at all. There is just no point, since 90% of users won't care what it says.
On the other hand If you CAN actually show the user a useful workaround, then one way to force them to read it is make the OK button become enabled after 10 seconds or so. Sort of how Firefox does it whenever you are trying to install a new plug-in.
If it is a total crash that you cannot gracefully recover from, then inform the user in very layman terms saying:
"I'm sorry we screwed up, we would like to send some information about this crash, will you allow us to do so? YES / NO"
In addition, try not to make your error messages longer than a sentence. When people (me included) see a whole paragraph talking about the error, my mind just shuts off.
With so much social media and information overload, people's mind freeze when they see a wall of text.
EDIT:
Someone once also recently suggested using comic strips along with whatever message you want to show. Such as something from Dilbert that may be close to the type of error you may have.
From my experience: you don't get users (especially non-technical ones) to read error messages. No matter how clear and understandable, bold, red and flashing the message is, that you display, most users will just click anything away that they're not used to, even if it's "Do you really want to delete everything?". I have seen users click the "window close"-icon instead of "OK" or "cancel" even though they didn't even know which option they chose by doing so ...
If you really need to force users to read what you're displaying, I'd suggest a JavaScript-Countdown until a button is clickable. That way the user will hopefully use the waiting time to really read what he's supposed to. Be careful though: most users will be even more annoyed by that :)
I furthermore like your idea of a "read more"-link, although I doubt that will get users more interested that just want to get rid of the message by all means ...
Just for the record: there are users that DO read error messages but are so afraid that they won't do anything with it. I once had a support call where the customer would read an error message to me, asking me, what he should do. "Well, what are your options?", I asked. "The window only has an 'OK'-button.", he replied. ... mmh, hard one :)
I often display the error in red (when the design allows it).
Red stands for "alert", etc. so it's more often read.
Well, to answer your question directly: Don't have your programmers write your error messages. If you follow this one piece of advice, you'd save, cumulatively, thousands of hours of user angst and productivity and millions of dollars in technical support costs.
The real goal, however, should be to design your application so users can't make mistakes. Don't let them take actions that lead to error massages and require them to back up. As a simple example, in a web form that requires all its fields to be filled in, instead of popping up an error message when users click on the Send button, don't enable the Send button until all the field contain valid content. It means more work on the back side, but it results in a better user experience.
Of course, that's a bit of an ideal world. Sometimes, program errors are unavoidable. When they do occur, you need to provide clear, complete, and useful information, and most importantly, don't expose the system to user and don't blame users for their actions.
A good error message should contain:
What the problem is and why it happened.
How to resolve the problem.
One of the worst things you can do is simply pass system error messages through to users. For example, when your Java program throws an exception, don't simply pass the programmer-ese up to the UI and expose it to the user. Catch it, and have a clear message created by your user assistance developer that you can present to your user.
I was lucky enough, on my last job, to work with a team of programmer who wouldn't think of writing their own error messages. Any time they found themselves in a situation where one was required and the program couldn't be designed to avoid it (often because of limited resources), they always came to me, explained what they needed, and let me create an error message that was clear and followed company style. If that was the default mindset of every programmer, the computing world would be a far, far better place.
Less errors
If an application throws vomit at you on a regular basis, you become immune to it, and errors become irritating background muzak. If an error is a rare event, it will garner more attention.
Quosh anything which isn't a major deal, throw out all those warnings, find ways of understanding user intent, take out the decisions wherever possible. I have a few apps which I continue to streamline in this way. Developers see every error as important, but this is not true from a user perspective. Look for the users' common response to a problem and capture that, deploy that as your response.
If you do need to raise an error: short, concise, low terror factor, no exclamation marks. Paragraphs are fail.
There's no silver bullet, but you need to socially engineer to make errors important.
We told users their manager had been contacted (which was a lie). It worked a little too well and had to be removed.
Adding an "Advanced" button that enables some more technical details will provide an incentive to read it for the part of the target audience that thinks itself as technical
I'd suggest that you give feedback (stating that the user made a mistake) immediately after the mistake is made. (For instance, when entering a value of a date field, check the value and, if it is wrong, make the input field visually different).
If there are errors on the page (I'm more into web development, hence I'm referring to it as a "page", but it can be also called "form"), show an "error summary", explaining that there were errors and a bulleted list of what exactly errors happened. However, if there are more than 5-6 words per message, those won't be read/understood.
How about making the button state "Click here to speak with a support technician who will assist you with this issue."
There are many websites that provide the option to speak with a real person.
I read a candidate for the most horrific solution on slashdot:
We have found that the only way to
make users take responsibility for
errors is to give them a penalty for
forcing the error to go away. For
starters, where possible, the error
wont actually close for them unless we
enter an admin password to make it go
away, and if they reboot to get rid of
it (Task Manager is disabled on all
client PC's) the machine will not open
the application that crashed for 15
minutes. Of course, this all depends
on the type of users you are dealing
with, as more technically adept users
wouldnt accept this kind of system,
but after trying for literally YEARS
to make users take responsibility for
crashes and making sure the IT
department is aware of them in order
to fix the issue before it gets too
hard to manage, these are the only
steps that worked. Now, all of our end
users are aware that if they ignore
errors, they are going to suffer for
it themselves.
"ATTENTION! ATTENTION! If you do not read error message you WILL DIE!"
Despite all the recommendations in the accepted answer, my users continued to click the first button they could find. So now I show this:
The user has to make a choice before the OK button appears
If he selects the 3rd option, he can continue, otherwise the application quits.
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 2 years ago.
The community reviewed whether to reopen this question 11 months ago and left it closed:
Original close reason(s) were not resolved
Improve this question
Is it even possible to perform address (physical, not e-mail) validation? It seems like the sheer number of address formats, even in the US alone, would make this a fairly difficult task. On the other hand it seems like a task that would be necessary for several business requirements.
Here's a free and sort of "outside the box" way to do it. Not 100% perfect, but it should reject blatantly non-existent addresses.
Submit the entire address to Google's geocoding web service. This service attempts to return the exact coordinates of the location you feed it, i.e. latitude and longitude.
In my experience if the address is invalid you will get a result of 602 from the service. There's definitely a possibility of false positives or false negatives, but used in conjunction with other consistency checks it could be useful.
(Yahoo's geocoding web service, on the other hand, will return the coordinates of the center of the town if the town exists but the rest of the address is bogus. Potentially useful as long as you pay close attention to the "precision" field in the result).
There are a number of good answers in here but most of them make the assumption that the user wants an "API" solution where they must write code to connect to a 3rd-party service and/or screen scrape the USPS. This is all well and good, but should be factored into the business requirements and costs associated with the implementation and then weighed against the desired benefits.
Depending upon the business requirements and the way that the data is received into the system, a real-time address processing solution may be the best bet. If a real-time solution is required, you will want to consider the license agreement and technical limitations of the Google Maps/Bing/Yahoo APIs. They typically limit the number of calls you can make each day. The USPS web tools API is the same in additional they restrict how/why you can use their system and how you are allowed to use the data thereafter.
At the same time, there are a handful of great service providers that can easily process a static list of addresses. Essentially, you give the service provider a CSV file or Excel file, they clean it up and get it back to you. It's a one-time deal with no long-term commitment or obligation—usually.
Full disclosure: I'm the founder of SmartyStreets. We do address verification for addresses within the United States. We are easily able to CASS certify a list and we also offer a address verification web service API. We have no hidden fees, contracts, or anything. You use our service until you no longer need it and you can walk away. (Unlike cell phone companies that require a contract.)
USPS has an address cleaner online, which someone has screen scraped into a poor man's webservice. However, if you're doing this often enough, it'd be a better idea to apply for a USPS account and call their own webservice.
I will refer you to my blog post - A lesson in address storage, I go into some of the techniques and algorithms used in the process of address validation. My key thought is "Don't be lazy with address storage, it will cause you nothing but headaches in the future!"
Also, there is another StackOverflow question that asks this question entitled How should international geographic addresses be stored in a relational database.
In the course of developing an in-house address verification service at a German company I used to work for I've come across a number of ways to tackle this issue. I'll do my best to sum up my findings below:
Free, Open Source Software
Clearly, the first approach anyone would take is an open-source one (like openstreetmap.org), which is never a bad idea. But whether or not you can really put this to good and reliable use depends very much on how much you need to rely on the results.
Addresses are an incredibly variable thing. Verifying U.S. addresses is not an easy task, but bearable, but once you're going for Europe, especially the U.K. with their extensive Postal Code system, the open-source approach will simply lack data.
Web Services / APIs
Enterprise-Class Software
Money gets it done, obviously. But not every business or developer can spend ~$0.15 per address lookup (that's $150 for 1,000 API requests) - a very expensive business model the vast majority of address validation APIs have implemented.
What I ended up integrating: streetlayer API
Since I was not willing to take on the programmatic approach of verifying address data manually I finally came to the conclusion that I was in need of an API with a price tag that would not make my boss want to fire me and still deliver solid and reliable international verification results.
Long story short, I ended up integrating an API built by apilayer, called "streetlayer API". I was easily convinced by a simple JSON integration, surprisingly accurate validation results and their developer-friendly pricing. Also, 100 requests/month are entirely free.
Hope this helps!
I have used the services of http://www.melissadata.com Their "address object" works very well. Its pricey, yes. But when you consider costs of writing your own solutions, the cost of dirty data in your application, returned mailers - lost sales, and the like - the costs can be justified.
For us-based address data my company has used GeoStan. It has bindings for C and Java (and we created a Perl binding). Note that it is a commercial product and isn't cheap. It is quite fast though (~300 addresses per second) and offers features like CASS certification (USPS bulk mail discount), DPV (Delivery point verification) flagging, and LON/LAT geocoding.
There is a Perl module Geo::PostalAddress, but it uses heuristics and doesn't have the other features mentioned for GeoStan.
Edit: some have mentioned 'doing it yourself', if you do decide to do this, a good source of information to start with is the US Census Tiger Data Set, which contains a lot of information about the US including address information.
As seen on reddit:
$address = urlencode('1600 Pennsylvania Avenue, Washington, DC');
$json = json_decode(file_get_contents("http://where.yahooapis.com/geocode?q=$address&flags=J"));
print_r($json);
Fixaddress.com service is available that provides following services,
1) Address Validation.
2) Address Correction.
3) Address spell correcting.
4) Correct addresses phonetic mistakes.
Fixaddress.com uses USPS and Tiger data as reference data.
For more detail visit below link,
http://www.fixaddress.com/
One area where address lookups have to be performed reliably is for VOIP E911 services. I know companies reliably using the following services for this:
Bandwidth.com 9-1-1 Access API MSAG Address Validation
MSAG = Master Street Address Guide
https://www.bandwidth.com/9-1-1/
SmartyStreet US Street Address API
https://smartystreets.com/docs/cloud/us-street-api
There are companies that provide this service. Service bureaus that deal with mass mailing will scrub an entire mailing list to that it's in the proper format, which results in a discount on postage. The USPS sells databases of address information that can be used to develop custom solutions. They also have lists of approved vendors who provide this kind of software and service.
There are some (but not many) packages that have APIs for hooking address validation into your software.
However, you're right that its a pretty nasty problem.
http://www.usps.com/ncsc/ziplookup/vendorslicensees.htm
As mentioned there are many services out there, if you are looking to truly validate the entire address then I highly recommend going with a Web Service type service to ensure that changes can quickly be recognized by your application.
In addition to the services listed above, webservice.net has this US Address Validation service. http://www.webservicex.net/WCF/ServiceDetails.aspx?SID=24
We have had success with Perfect Address.
Their database has all the US street names and street number ranges. Also acts as a pretty decent parser for free-form address fields, if you are lucky enough to have that kind of data.
Validating it is a valid address is one thing.
But if you're trying to validate a given person lives at a given address, your only almost-guarantee would be a test mail to the address, and even that is not certain if the person is organised or knows somebody at that address.
Otherwise people could just specify an arbitrary random address which they know exists and it would mean nothing to you.
The best you can do for immediate results is request the user send a photographed / scanned copy of the head of their bank statement or some other proof-of-recent-residence, because at least then they have to work harder to forget it, and forging said things show up easily with a basic level of image forensic analysis.
There is no global solution. For any given country it is at best rather tricky.
In the UK, the PostOffice controlls postal addresses, and can provide (at a cost) address information for validation purposes.
Government agencies also keep an extensive list of addresses, and these are centrally collated in the NLPG (National Land and Property Gazetteer).
Actually validating against these lists is very difficult. Most people don't even know exactly how their address as it is held by the PostOffice. Some businesses don't even know what number they are on a particular street.
Your best bet is to approach a company that specialises in this kind of thing.
Yahoo has also a Placemaker API. It is good only for locations but it has an universal id for all world locations.
It look that there is no standard in ISO list.
You could also try SAP's Data Quality solutions which are available in both a server platform is processing a large number of requests or as an embeddable SDK if you wanted to run it in process with your application. We use it in our application and it's very robust and scalable.
NAICS.com is coming out with an API that will add all kinds of key business data including street address. This would happen on the fly as your site's forms are processed. https://www.naics.com/business-intelligence-api/
You can try Pitney Bowes “IdentifyAddress” Api available at - https://identify.pitneybowes.com/
The service analyses and compares the input addresses against the known address databases around the world to output a standardized detail. It corrects addresses, adds missing postal information and formats it using the format preferred by the applicable postal authority. I also uses additional address databases so it can provide enhanced detail, including address quality, type of address, transliteration (such as from Chinese Kanji to Latin characters) and whether an address is validated to the premise/house number, street, or city level of reference information.
You will find a lot of samples and sdk available on the site and i found it extremely easy to integrate.
For US addresses you can require a valid state, and verify that the zip is valid. You could even check that the zip code is in the right state, but beyond that I don't think there are many tests you could run that wouldn't provide a lot of false negatives.
What are you trying to do -- prevent simple mistakes or enforcing some kind of identity check?