Which Google Places API query results are allowed to be stored in a database? [closed] - google-api

I am exploring the Google APIs, mostly the Places API. Because the number of requests to the Google Places API is limited to 100,000, I am looking for ways to minimize the number of requests sent to the API. I was thinking of using a database to store previously received responses, so that in the future I could retrieve them without making requests to the API, and only call the API when the needed data has not already been stored in my database.
According to the Google API terms of use, specifically section 10.1.3 (Restrictions against Copying or Data Export), you may not store data indefinitely, but you may cache it temporarily:
You must not pre-fetch, cache, or store any Content, except that you may store: (i) limited amounts of Content for the purpose of improving the performance of your Maps API Implementation if you do so temporarily (and in no event for more than 30 calendar days), securely, and in a manner that does not permit use of the Content outside of the Service; and (ii) any content identifier or key that the Maps APIs Documentation specifically permits you to store. For example, you must not use the Content to create an independent database of "places" or other local listings information.
I find this section unclear. Can I store any data received from the API in my database for 30 days, or only the IDs of places? In some other contexts I have read that only the IDs may be stored. My understanding is this: I can store place IDs indefinitely, but may use the full data only for 30 days.
Because I have only been reading about the Google APIs for a couple of days, I may have missed some term of use, so I would be really thankful if you could help me.
Any suggestions on how to minimize the number of calls to the APIs, or experiences from real projects using those APIs, would be really appreciated. Suggestions for alternative APIs that provide similar functionality would also be really helpful.
Thank you in advance!

From my experience with the Google Places API, your understanding is essentially correct. Let me explain the two stipulations in my own words:
i) Without pre-fetching or redistributing outside your application, you may cache API results for up to 30 days.
ii) You can use a place ID or key in your application-specific data, but nothing else (e.g. if your app lets users "check in" to places, you can store a list of place IDs where they've been on a user object and look up the places as needed by ID, but you can't store a list of all the places with Google's names/details).
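To make stipulation (ii) concrete, here is a minimal TypeScript sketch of the pattern; the CheckIn shape, the GOOGLE_API_KEY variable, and the exact query parameters are my own illustrative assumptions, so check them against the Place Details documentation:

interface CheckIn {
  userId: string;
  placeId: string; // the one thing that is safe to store long-term
  checkedInAt: Date;
}

const GOOGLE_API_KEY = process.env.GOOGLE_API_KEY ?? '';

// Fetch full place details on demand instead of persisting them.
async function getPlaceDetails(placeId: string): Promise<unknown> {
  const url =
    'https://maps.googleapis.com/maps/api/place/details/json' +
    `?place_id=${encodeURIComponent(placeId)}&key=${GOOGLE_API_KEY}`;
  const res = await fetch(url);
  const body = await res.json();
  // Google's names/details are used transiently here, never written to our DB.
  return body.result;
}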
In order to reduce the number of API calls and speed up my app, I cache the nearby-places responses in a simple key-value cache, where the key is the lat/lng pair rounded to a certain precision (so that calls within a certain radius hit the cache) and the value is the entire JSON response string. Here is my code, which is Java running on Google App Engine:
import java.math.BigDecimal;
import java.math.RoundingMode;

import com.google.appengine.api.memcache.Expiration;
import com.google.appengine.api.memcache.MemcacheService;

// Using 4 decimal places for rounding represents approximately 11 meters of precision:
// http://gis.stackexchange.com/questions/8650/how-to-measure-the-accuracy-of-latitude-and-longitude
public static final int LAT_LONG_CACHE_PRECISION = 4;
public static final int CACHE_DURATION_SEC = 24 * 60 * 60; // one day in seconds
...
// Round both coordinates so nearby lookups map to the same cache entry.
// (asyncCache is an AsyncMemcacheService obtained elsewhere, not shown.)
String cacheKey = "lat,lng:" + round(latitude) + "," + round(longitude);
asyncCache.put(cacheKey, dataJSON,
        Expiration.byDeltaSeconds(CACHE_DURATION_SEC),
        MemcacheService.SetPolicy.SET_ALWAYS);
...
private static double round(double value) {
    BigDecimal bd = new BigDecimal(value);
    bd = bd.setScale(LAT_LONG_CACHE_PRECISION, RoundingMode.HALF_UP);
    return bd.doubleValue();
}
As for alternative APIs, I would suggest you look at the following:
Yelp API - provides bar/restaurant data that Google lacks
Facebook API - easy to use if you're already using Facebook's SDK
Factual: Places Crosswalk - aggregates and normalizes places data from many sources, including Facebook and Yelp, but not Google
Currently I'm only using Google Places API, but I'm planning on adding Yelp or Factual later to improve the results for the end user.

Related

How Google charges for language detection

Background
We have an application that uses the Google Translation API to translate some data into a target language.
For that, we use the /language/translate/v2 call without specifying the source language; Google detects the source language and translates in the same call.
Issue
Most of the time, the text we send for translation is already in our target language. For example: we receive a row in English (we don't know the source language) and it needs to be translated into English.
Questions
We know that Google charges by characters and operation. But is Google charging double when you call /translate and it has to perform both operations?
Is Google performing a translation when the target lang is the same as the one detected?
If so, is Google charging when the source lang is the same as the target lang?
You might already know about the Cloud Translation API > Documentation > Pricing section of the documentation; however, I am posting it here so everyone can access it.
As mentioned in the documentation, "Price is per character sent to the API for processing, including whitespace characters. Empty queries are charged for one character." This means that Google charges per character sent to the API.
However, it is also specified that "Google does not charge extra for language detection when you do not specify the source language for the translate method; the Language Detection price applies to detect method calls." This means that if you don't specify the source language, you will be charged only for the translation process and not for the language detection.
We know that Google charges by characters and operation. But is Google charging double when you call /translate and it has to perform both operations?
No.
Is Google performing a translation when the target lang is the same as the one detected?
No.
If so, is Google charging when the source lang is the same as the target lang?
Good question, I assume so, but cannot say.
It may seem a bit unfair, but much of their cost is development and then handling requests at scale. This goes back to their original motivation for making the API paid - people were calling it in very inefficient ways, for example every time a webpage loaded, which risked making it unstable for other developers or unsustainable for Google.
Similarly, they do not charge twice for translation between two languages that are not English, even though usually they are translating twice under the hood - from the source language to English, and English to the target language. So most of the time, it works in the client's favour.
Personally, if I knew that most of my queries would be no-ops, I would run a client-side language-identification library first. You can then set a threshold to balance cost and risk; in fact, you could send multiple requests if language identification gives a high score for multiple languages.
Besides the API cost there are latency, machine, and network costs, and I would also bet that you can do better than Google Translate's detection by tuning a bit for your content, assuming it's not open domain.
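For illustration, a minimal TypeScript sketch of that pre-filter, assuming the franc language-detection library and a made-up confidence threshold:

import { francAll } from 'franc';

const TARGET = 'eng'; // ISO 639-3 code for English
const CONFIDENCE_THRESHOLD = 0.9; // arbitrary; tune for your content

function needsTranslation(text: string): boolean {
  // francAll returns [languageCode, score] pairs, best match first.
  const [best] = francAll(text);
  if (!best) return true; // detector is unsure: let the API decide
  const [lang, score] = best;
  // Only skip the paid /translate call when we are confident the text
  // is already in the target language.
  return !(lang === TARGET && score >= CONFIDENCE_THRESHOLD);
}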

Cost-efficient way of displaying web page view count with Go/Datastore

I'd like each page to display its number of visitors; using Google Analytics does not seem to be an option, since too many visitors use ad blockers.
I guess there is no smart way to do it with Go alone, since I doubt there is an efficient way of storing the counter in a text file; Datastore seems like a good option, but I suspect the cost of updating and retrieving the view count on every request is not worth it.
I was hoping someone might be able to recommend a good and cheap solution (preferably one that does not rely on third-party services outside the Google Cloud Platform).
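One common pattern for exactly this tradeoff is a sharded counter with a cached read path. Here is a hedged sketch, written in TypeScript against the @google-cloud/datastore client rather than Go, purely for illustration; the PageViews kind and shard count are made up:

import { Datastore } from '@google-cloud/datastore';

const datastore = new Datastore();
const NUM_SHARDS = 20;

// Write path: pick a random shard and increment it in a transaction,
// spreading write contention across NUM_SHARDS entities.
async function incrementViews(page: string): Promise<void> {
  const shard = Math.floor(Math.random() * NUM_SHARDS);
  const key = datastore.key(['PageViews', `${page}#${shard}`]);
  const tx = datastore.transaction();
  await tx.run();
  const [entity] = await tx.get(key);
  tx.save({ key, data: { count: (entity?.count ?? 0) + 1 } });
  await tx.commit();
}

// Read path: sum the shards. In practice you would cache this sum
// (e.g. in Memcache/Redis) and refresh it every minute or so, so most
// page loads cost zero Datastore reads.
async function getViews(page: string): Promise<number> {
  const keys = Array.from({ length: NUM_SHARDS },
    (_, i) => datastore.key(['PageViews', `${page}#${i}`]));
  const [entities] = await datastore.get(keys);
  return entities.reduce((sum: number, e: any) => sum + (e?.count ?? 0), 0);
}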

What is the difference between Falcor and GraphQL?

GraphQL consists of a type system, query language and execution
semantics, static validation, and type introspection, each outlined
below. To guide you through each of these components, we've written an
example designed to illustrate the various pieces of GraphQL.
- https://github.com/facebook/graphql
Falcor lets you represent all your remote data sources as a single
domain model via a virtual JSON graph. You code the same way no matter
where the data is, whether in memory on the client or over the network
on the server.
- http://netflix.github.io/falcor/
What is the difference between Falcor and GraphQL (in the context of Relay)?
I have viewed the Angular Air Episode 26: FalcorJS and Angular 2 where Jafar Husain answers how GraphQL compares to FalcorJS. This is the summary (paraphrasing):
FalcorJS and GraphQL are tackling the same problem (querying data, managing data).
The important distinction is that GraphQL is a query language and FalcorJS is not.
When you are asking FalcorJS for resources, you are very explicitly asking for finite series of values. FalcorJS does support things like ranges, e.g. genres[0..10]. But it does not support open-ended queries, e.g. genres[0..*].
GraphQL is set-based: give me all records where true, order by this, etc. In this sense, the GraphQL query language is more powerful than FalcorJS.
With GraphQL you have a powerful query language, but you have to interpret that query language on the server.
Jafar argues that in most applications, the queries that go from client to server share the same shape. Therefore, having specific and predictable operations like get and set exposes more opportunities to leverage the cache. Furthermore, many developers are familiar with mapping requests using a simple router in a REST architecture.
The end discussion revolves around whether the power that comes with GraphQL outweighs the complexity.
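A toy contrast (not from the talk itself) may make this difference tangible; this TypeScript sketch assumes the falcor client package and made-up genre data:

import falcor from 'falcor';

const model = new falcor.Model({
  cache: { genres: { 0: { name: 'Drama' }, 1: { name: 'Comedy' } } },
});

// Falcor: you ask for a finite, explicitly enumerated series of values --
// indices 0..1 of genres, field "name".
model.get('genres[0..1].name')
  .then(response => console.log(JSON.stringify(response.json)));

// GraphQL (for contrast): the equivalent request is a declarative query
// string that the server must interpret, and it could just as well be
// open-ended ("all genres where ...").
const query = '{ genres(first: 2) { name } }';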
I have now written apps with both libraries and I can agree with everything in Gajus' post, but found some different things most important in my own use of the frameworks.
Probably the biggest practical difference is that most of the examples, and presumably most of the work done up to this point on GraphQL, have concentrated on integrating GraphQL with Relay, Facebook's system for matching ReactJS widgets with their data requirements. FalcorJS, on the other hand, tends to act separately from the widget system, which means both that it may be easier to integrate into a non-React/Relay client and that it will do less for you automatically in terms of matching widget data dependencies with widgets.
The flip side of FalcorJS being flexible in client-side integrations is that it can be very opinionated about how the server needs to act. FalcorJS actually does have a straight-up "call this query over HTTP" capability (although Jafar Husain doesn't seem to talk about it very much), and once you include that, the way the client libraries react to server information is quite similar, except that GraphQL/Relay adds a layer of configuration. In FalcorJS, if you are returning a value for movie, your return value had better say 'movie', whereas in GraphQL you can describe that, even though the query returns 'film', you should put it in the client-side datastore as 'movie'. This is part of the power-versus-complexity tradeoff that Gajus mentioned.
On a practical basis, GraphQL and Relay seem more developed. Jafar Husain has mentioned that the next version of the Netflix frontend will run at least in part on FalcorJS, whereas the Facebook team has mentioned that they've been using some version of the GraphQL/Relay stack in production for over 3 years.
The open source developer community around GraphQL and Relay seems to be thriving. There are a large number of well-attended supporting projects around GraphQL and Relay whereas I have personally found very few around FalcorJS. Also the base github repository for Relay (https://github.com/facebook/relay/pulse) is significantly more active than the github repository for FalcorJS (https://github.com/netflix/falcor/pulse). When I first pulled the Facebook repo, the examples were broken. I opened a github issue and it was fixed within hours. On the other hand, the github issue I opened on FalcorJS has had no official response in two weeks.
Lee Byron, one of the engineers behind GraphQL, did an AMA on Hashnode; here is his answer when asked this question:
Falcor returns Observables, GraphQL just values. For how Netflix wanted to use Falcor, this makes a lot of sense for them. They make multiple requests and present data as it's ready, but it also means that the client developer has to work with the Observables directly. GraphQL is a request/response model, and returns back JSON, which is trivially easy to then use. Relay adds back in some of the dynamism that Falcor presents while maintaining only using plain values.
Type system. GraphQL is defined in terms of a type system, and that's allowed us to built lots of interesting tools like GraphiQL, code generators, error detection, etc. Falcor is much more dynamic, which is valuable in its own right but limits the ability to do this kind of thing.
Network usage. GraphQL was originally designed for operating Facebook's news feed on low end devices on even lower end networks, so it goes to great lengths to allow you to declare everything you need in a single network request in order to minimize latency. Falcor, on the other hand, often performs multiple round trips to collect additional data. This is really just a tradeoff between the simplicity of the system and the control of the network. For Netflix, they also deal with very low end devices (e.g. Roku stick) but the assumption is the network will be good enough to stream video.
Edit: Falcor can indeed batch requests, making the comment about network usage inaccurate. Thanks to @PrzeoR.
UPDATE: I've found a very useful comment under my post that I want to share with you as a complement to the main content:
Regarding the lack of examples, you may find the awesome-falcorjs repo useful; it contains different examples of Falcor CRUD usage:
https://github.com/przeor/awesome-falcorjs ... Second, there is a book called "Mastering Full Stack React Development" which covers Falcor as well (a good way to learn how to use it):
ORIGINAL POST BELOW:
FalcorJS (https://www.facebook.com/groups/falcorjs/) is much simpler to become productive with than Relay/GraphQL.
The learning curve for GraphQL+Relay is HUGE:
In my short summary: go for Falcor. Use Falcor in your next project; once you have a large budget and a lot of learning time for your team, then use Relay+GraphQL.
GraphQL+Relay has a huge API that you must become proficient in. Falcor has a small API and is very easy to grasp for any front-end developer who is familiar with JSON.
If you have an AGILE project with limited resources -> then go for FalcorJS!
MY SUBJECTIVE opinion: FalcorJS is 500%+ easier to become productive in for full-stack JavaScript.
I have also published some FalcorJS starter kits on my project (+more full-stack falcor's example projects): https://www.github.com/przeor
To go into more technical detail:
1) When you use Falcor, you can use it on both the front end and the back end:
import falcor from 'falcor';
and then build your model upon it.
... you also need two libraries, which are simple to use on the backend (a combined sketch follows below):
a) falcor-express - you use it once (e.g. app.use('/model.json', FalcorServer.dataSourceRoute(() => new NamesRouter()))). Source: https://github.com/przeor/falcor-netflix-shopping-cart-example/blob/master/server/index.js
b) falcor-router - where you define SIMPLE routes (e.g. route: '_view.length'). Source:
https://github.com/przeor/falcor-netflix-shopping-cart-example/blob/master/server/router.js
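A combined sketch of (a) and (b) might look like the following; the names route and its value are placeholders of mine, not taken from the linked repositories:

import express from 'express';
import FalcorServer from 'falcor-express';
import Router from 'falcor-router';

// (b) falcor-router: define the virtual JSON graph one route at a time.
const NamesRouter = Router.createClass([
  {
    route: 'names.length',
    get() {
      return { path: ['names', 'length'], value: 3 };
    },
  },
]);

const app = express();

// (a) falcor-express: expose the router over HTTP at a single endpoint.
app.use('/model.json', FalcorServer.dataSourceRoute(() => new NamesRouter()));
app.listen(3000);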
Falcor is a piece of cake in terms of learning curve.
You can also look at the documentation, which is much simpler than FB's library docs, and check the article "why you should care about falcorjs (netflix falcor)".
2) Relay/GraphQL is more like a huge enterprise-grade tool.
For example, you have two different sets of documentation that talk separately about:
a) Relay: https://facebook.github.io/relay/docs/tutorial.html
- Containers
- Routes
- Root Container
- Ready State
- Mutations
- Network Layer
- Babel Relay Plugin
- GRAPHQL
GraphQL Relay Specification
Object Identification
Connection
Mutations
Further Reading
API REFERENCE
Relay
RelayContainer
Relay.Route
Relay.RootContainer
Relay.QL
Relay.Mutation
Relay.PropTypes
Relay.Store
INTERFACES
RelayNetworkLayer
RelayMutationRequest
RelayQueryRequest
b) GraphQL: https://facebook.github.io/graphql/
2 Language
2.1 Source Text
2.1.1 Unicode
2.1.2 White Space
2.1.3 Line Terminators
2.1.4 Comments
2.1.5 Insignificant Commas
2.1.6 Lexical Tokens
2.1.7 Ignored Tokens
2.1.8 Punctuators
2.1.9 Names
2.2 Query Document
2.2.1 Operations
2.2.2 Selection Sets
2.2.3 Fields
2.2.4 Arguments
2.2.5 Field Alias
2.2.6 Fragments
2.2.6.1 Type Conditions
2.2.6.2 Inline Fragments
2.2.7 Input Values
2.2.7.1 Int Value
2.2.7.2 Float Value
2.2.7.3 Boolean Value
2.2.7.4 String Value
2.2.7.5 Enum Value
2.2.7.6 List Value
2.2.7.7 Input Object Values
2.2.8 Variables
2.2.8.1 Variable use within Fragments
2.2.9 Input Types
2.2.10 Directives
2.2.10.1 Fragment Directives
3 Type System
3.1 Types
3.1.1 Scalars
3.1.1.1 Built-in Scalars
3.1.1.1.1 Int
3.1.1.1.2 Float
3.1.1.1.3 String
3.1.1.1.4 Boolean
3.1.1.1.5 ID
3.1.2 Objects
3.1.2.1 Object Field Arguments
3.1.2.2 Object Field deprecation
3.1.2.3 Object type validation
3.1.3 Interfaces
3.1.3.1 Interface type validation
3.1.4 Unions
3.1.4.1 Union type validation
3.1.5 Enums
3.1.6 Input Objects
3.1.7 Lists
3.1.8 Non-Null
3.2 Directives
3.2.1 @skip
3.2.2 @include
3.3 Starting types
4 Introspection
4.1 General Principles
4.1.1 Naming conventions
4.1.2 Documentation
4.1.3 Deprecation
4.1.4 Type Name Introspection
4.2 Schema Introspection
4.2.1 The "__Type" Type
4.2.2 Type Kinds
4.2.2.1 Scalar
4.2.2.2 Object
4.2.2.3 Union
4.2.2.4 Interface
4.2.2.5 Enum
4.2.2.6 Input Object
4.2.2.7 List
4.2.2.8 Non-null
4.2.2.9 Combining List and Non-Null
4.2.3 The __Field Type
4.2.4 The __InputValue Type
5 Validation
5.1 Operations
5.1.1 Named Operation Definitions
5.1.1.1 Operation Name Uniqueness
5.1.2 Anonymous Operation Definitions
5.1.2.1 Lone Anonymous Operation
5.2 Fields
5.2.1 Field Selections on Objects, Interfaces, and Unions Types
5.2.2 Field Selection Merging
5.2.3 Leaf Field Selections
5.3 Arguments
5.3.1 Argument Names
5.3.2 Argument Uniqueness
5.3.3 Argument Values Type Correctness
5.3.3.1 Compatible Values
5.3.3.2 Required Arguments
5.4 Fragments
5.4.1 Fragment Declarations
5.4.1.1 Fragment Name Uniqueness
5.4.1.2 Fragment Spread Type Existence
5.4.1.3 Fragments On Composite Types
5.4.1.4 Fragments Must Be Used
5.4.2 Fragment Spreads
5.4.2.1 Fragment spread target defined
5.4.2.2 Fragment spreads must not form cycles
5.4.2.3 Fragment spread is possible
5.4.2.3.1 Object Spreads In Object Scope
5.4.2.3.2 Abstract Spreads in Object Scope
5.4.2.3.3 Object Spreads In Abstract Scope
5.4.2.3.4 Abstract Spreads in Abstract Scope
5.5 Values
5.5.1 Input Object Field Uniqueness
5.6 Directives
5.6.1 Directives Are Defined
5.7 Variables
5.7.1 Variable Uniqueness
5.7.2 Variable Default Values Are Correctly Typed
5.7.3 Variables Are Input Types
5.7.4 All Variable Uses Defined
5.7.5 All Variables Used
5.7.6 All Variable Usages are Allowed
6 Execution
6.1 Evaluating requests
6.2 Coercing Variables
6.3 Evaluating operations
6.4 Evaluating selection sets
6.5 Evaluating a grouped field set
6.5.1 Field entries
6.5.2 Normal evaluation
6.5.3 Serial execution
6.5.4 Error handling
6.5.5 Nullability
7 Response
7.1 Serialization Format
7.1.1 JSON Serialization
7.2 Response Format
7.2.1 Data
7.2.2 Errors
A Appendix: Notation Conventions
A.1 Context-Free Grammar
A.2 Lexical and Syntactical Grammar
A.3 Grammar Notation
A.4 Grammar Semantics
A.5 Algorithms
B Appendix: Grammar Summary
B.1 Ignored Tokens
B.2 Lexical Tokens
B.3 Query Document
It's your choice:
Simple, sweet, and briefly documented FalcorJS VERSUS a huge enterprise-grade tool with long and advanced documentation like GraphQL&Relay.
As I said before, if you are a front-end dev who grasps the idea of using JSON, then the JSON Graph implementation from Falcor's team is the best way to do your full-stack dev project.
In short, Falcor, GraphQL, and REST solve the same problem: providing a tool to query and manipulate data effectively.
How they differ is in how they present their data:
Falcor wants you to think of its data as a very big virtual JSON tree, and uses get, set, and call to read and write data.
GraphQL wants you to think of its data as a group of predefined typed objects, and uses queries and mutations to read and write data.
REST wants you to think of its data as a group of resources, and uses HTTP verbs to read and write data.
Whenever we need to provide data to a user, we end up with something like: client -> query -> {a layer that translates the query into data ops} -> data.
After struggling with GraphQL, Falcor, and JSON API (and even OData), I wrote my own data query layer. It's simpler, easier to learn, and roughly equivalent to GraphQL.
Check it out at:
https://github.com/giapnguyen74/nextql
It also integrates with FeathersJS for real-time query/mutation.
https://github.com/giapnguyen74/nextql-feathers
OK, let's start with a simple but important difference: GraphQL is query-based, while Falcor is not!
But how do they help you?
Basically, they both help us manage and query data, but GraphQL has a request/response model and returns data as JSON. The core idea in GraphQL is to fetch all your data in a single request, and to get an exact response for an exact request, which suits low-speed internet and mobile devices on networks like 3G. So if you have many mobile users, or if for some reason you want fewer requests and faster responses, use GraphQL. (Falcor is not far behind on this, so read on...)
On the other hand, Falcor, by Netflix, usually makes extra requests (often more than one) to retrieve all your data, even though the team is trying to improve it toward a single request. Falcor is more limited in its queries and doesn't have predefined query helpers like range and so on.
But for more clarification, let's see how each of them introduce itself:
GraphQL, A query language for your API
GraphQL is a query language for APIs and a runtime for fulfilling
those queries with your existing data. GraphQL provides a complete and
understandable description of the data in your API, gives clients the
power to ask for exactly what they need and nothing more, makes it
easier to evolve APIs over time, and enables powerful developer tools.
Send a GraphQL query to your API and get exactly what you need,
nothing more and nothing less. GraphQL queries always return
predictable results. Apps using GraphQL are fast and stable because
they control the data they get, not the server.
GraphQL queries access not just the properties of one resource but
also smoothly follow references between them. While typical REST APIs
require loading from multiple URLs, GraphQL APIs get all the data your
app needs in a single request. Apps using GraphQL can be quick even on
slow mobile network connections.
GraphQL APIs are organized in terms of types and fields, not
endpoints. Access the full capabilities of your data from a single
endpoint. GraphQL uses types to ensure Apps only ask for what’s
possible and provide clear and helpful errors. Apps can use types to
avoid writing manual parsing code.
Falcor, a JavaScript library for efficient data fetching
Falcor lets you represent all your remote data sources as a single
domain model via a virtual JSON graph. You code the same way no matter
where the data is, whether in memory on the client or over the network
on the server.
A JavaScript-like path syntax makes it easy to access as much or as
little data as you want, when you want it. You retrieve your data
using familiar JavaScript operations like get, set, and call. If you
know your data, you know your API.
Falcor automatically traverses references in your graph and makes
requests as needed. Falcor transparently handles all network
communications, opportunistically batching and de-duping requests.

How to look up elevation data by lat/lng [closed]

I am planning an app that will need the ability to look up the elevation of geographic points by lat/lng. Ideally I would like something that would work worldwide, but US-only would also suffice. I have looked at using the USGS Elevation Query Web Service, however it only allows you to query for one point at a time, and I will need to look up several hundred, and possibly up to several thousand. I also considered downloading & hosting the National Elevation Dataset myself, but it's almost 100 gigs, and apparently the USGS only allows you to download 1.5 gigs at a time.
Can anyone familiar with GIS recommend a good solution for me? I'm looking for something as lightweight & simple as possible. I am completely new to GIS, so I would really appreciate suggestions on where to get the data, how to store it, and how/what to use when working with it.
Thanks in advance.
EDIT: Just to clarify, the data points I need are not predetermined. They are arbitrary points chosen by the user (by interacting with a google maps mashup), so I do need to be able to query for any point, not just a small subset.
EDIT 2: If there is no lightweight or simple solution, I'll take whatever I can get =)
I'll give you one of the best "secrets" that I learned over the years, after going through many different pains (leeching scripts, manual clicking, etc.). It is an old-school trick... contact a real person there!
The best way to get the NED elevation dataset is to contact USGS's EROS group directly at bulkdatainfo_at_usgs.gov
You send them an external drive and after 4 to 8 weeks (usually much less than that) they will send you the entire dataset that you requested.
Then use GDAL to query your points in a way similar to this example. You may want to read the Affine Geotransform section of the GDAL Data Model
I recently stumbled into this question while doing research. I don't have a complete, simple answer either, but there are some other options not listed in the answers so far:
Google Elevation API: 25,000 requests/day, limited to Google Maps applications
Lat/Lon to Elevation: 2 points / second
GPSVisualizer: no published speed, but not intended for general DEM use
Shuttle Radar Topography: alternative to the NED, 7 gigabytes for the US.
The USGS Elevation Query Web Service only allows one query at a time, but it allows you to make requests with SOAP, HTTP GET, or HTTP POST. Choose your favorite language and write a script to generate requests for each of your data points.
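For example, a small TypeScript sketch of such a script; the endpoint shown is the USGS point-query service as best I recall it, so treat the URL, parameters, and response shape as assumptions to verify:

interface Point { lat: number; lng: number; }

// Query one point from the (assumed) USGS elevation point query service.
async function elevationFor(p: Point): Promise<number> {
  const url = 'https://epqs.nationalmap.gov/v1/json' +
              `?x=${p.lng}&y=${p.lat}&units=Meters`;
  const res = await fetch(url);
  const body = await res.json();
  return Number(body.value);
}

// Query sequentially to stay polite to the service; add small batches
// or retries as needed for thousands of points.
async function elevations(points: Point[]): Promise<number[]> {
  const out: number[] = [];
  for (const p of points) {
    out.push(await elevationFor(p));
  }
  return out;
}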
I bought a GeoIP database and stored every post's long/lat data in MySQL. After that I just implemented the Google Maps API by passing dynamic long/lat data to the Google Map. The result is that all my posts are shown on the Google Map, which also displays the posts nearest to a given location.
What you need is: a GeoIP database, a query that calculates distance in miles based on a given long/lat, the Google API, and PHP to dynamically pass the API variables.
Example from my site: Matchimedia.com.

How do you perform address validation? [closed]

Is it even possible to perform address (physical, not e-mail) validation? It seems like the sheer number of address formats, even in the US alone, would make this a fairly difficult task. On the other hand it seems like a task that would be necessary for several business requirements.
Here's a free and sort of "outside the box" way to do it. Not 100% perfect, but it should reject blatantly non-existent addresses.
Submit the entire address to Google's geocoding web service. This service attempts to return the exact coordinates of the location you feed it, i.e. latitude and longitude.
In my experience if the address is invalid you will get a result of 602 from the service. There's definitely a possibility of false positives or false negatives, but used in conjunction with other consistency checks it could be useful.
(Yahoo's geocoding web service, on the other hand, will return the coordinates of the center of the town if the town exists but the rest of the address is bogus. Potentially useful as long as you pay close attention to the "precision" field in the result).
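For reference, the 602 code comes from the old v2 geocoder; the current Geocoding JSON endpoint reports a status string instead. A small TypeScript sketch of the same check (YOUR_KEY is a placeholder):

async function looksLikeRealAddress(address: string): Promise<boolean> {
  const url = 'https://maps.googleapis.com/maps/api/geocode/json' +
              `?address=${encodeURIComponent(address)}&key=YOUR_KEY`;
  const res = await fetch(url);
  const body = await res.json();
  // 'ZERO_RESULTS' roughly corresponds to the old 602 "unknown address".
  return body.status === 'OK' && body.results.length > 0;
}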
There are a number of good answers in here but most of them make the assumption that the user wants an "API" solution where they must write code to connect to a 3rd-party service and/or screen scrape the USPS. This is all well and good, but should be factored into the business requirements and costs associated with the implementation and then weighed against the desired benefits.
Depending upon the business requirements and the way the data is received into the system, a real-time address processing solution may be the best bet. If a real-time solution is required, you will want to consider the license agreements and technical limitations of the Google Maps/Bing/Yahoo APIs. They typically limit the number of calls you can make each day. The USPS Web Tools API is similar; in addition, they restrict how and why you can use their system, and how you are allowed to use the data thereafter.
At the same time, there are a handful of great service providers that can easily process a static list of addresses. Essentially, you give the service provider a CSV file or Excel file, they clean it up and get it back to you. It's a one-time deal with no long-term commitment or obligation—usually.
Full disclosure: I'm the founder of SmartyStreets. We do address verification for addresses within the United States. We are easily able to CASS-certify a list, and we also offer an address verification web service API. We have no hidden fees, contracts, or anything. You use our service until you no longer need it and you can walk away. (Unlike cell phone companies that require a contract.)
USPS has an address cleaner online, which someone has screen scraped into a poor man's webservice. However, if you're doing this often enough, it'd be a better idea to apply for a USPS account and call their own webservice.
I will refer you to my blog post, "A lesson in address storage", where I go into some of the techniques and algorithms used in the process of address validation. My key thought is: "Don't be lazy with address storage, it will cause you nothing but headaches in the future!"
Also, there is another Stack Overflow question on this topic, entitled "How should international geographic addresses be stored in a relational database".
In the course of developing an in-house address verification service at a German company I used to work for I've come across a number of ways to tackle this issue. I'll do my best to sum up my findings below:
Free, Open Source Software
Clearly, the first approach anyone would take is an open-source one (like openstreetmap.org), which is never a bad idea. But whether or not you can really put this to good and reliable use depends very much on how much you need to rely on the results.
Addresses are an incredibly variable thing. Verifying U.S. addresses is not an easy task, but it is bearable; once you move on to Europe, however, especially the U.K. with its extensive postal code system, the open-source approach will simply lack data.
Web Services / APIs
Enterprise-Class Software
Money gets it done, obviously. But not every business or developer can spend ~$0.15 per address lookup (that's $150 for 1,000 API requests) - a very expensive business model the vast majority of address validation APIs have implemented.
What I ended up integrating: streetlayer API
Since I was not willing to take on the programmatic approach of verifying address data manually I finally came to the conclusion that I was in need of an API with a price tag that would not make my boss want to fire me and still deliver solid and reliable international verification results.
Long story short, I ended up integrating an API built by apilayer, called "streetlayer API". I was easily convinced by a simple JSON integration, surprisingly accurate validation results and their developer-friendly pricing. Also, 100 requests/month are entirely free.
Hope this helps!
I have used the services of http://www.melissadata.com. Their "Address Object" works very well. It's pricey, yes. But when you consider the costs of writing your own solution, the cost of dirty data in your application, returned mailers, lost sales, and the like, the costs can be justified.
For US-based address data, my company has used GeoStan. It has bindings for C and Java (and we created a Perl binding). Note that it is a commercial product and isn't cheap. It is quite fast though (~300 addresses per second) and offers features like CASS certification (USPS bulk mail discount), DPV (Delivery Point Verification) flagging, and LON/LAT geocoding.
There is a Perl module Geo::PostalAddress, but it uses heuristics and doesn't have the other features mentioned for GeoStan.
Edit: some have mentioned "doing it yourself"; if you do decide to do this, a good source of information to start with is the US Census TIGER data set, which contains a lot of information about the US, including address information.
As seen on reddit:
$address = urlencode('1600 Pennsylvania Avenue, Washington, DC');
$json = json_decode(file_get_contents("http://where.yahooapis.com/geocode?q=$address&flags=J"));
print_r($json);
The Fixaddress.com service is available and provides the following:
1) Address validation.
2) Address correction.
3) Address spell-checking.
4) Correction of phonetic mistakes in addresses.
Fixaddress.com uses USPS and TIGER data as reference data.
For more detail, visit the link below:
http://www.fixaddress.com/
One area where address lookups have to be performed reliably is for VOIP E911 services. I know companies reliably using the following services for this:
Bandwidth.com 9-1-1 Access API MSAG Address Validation
MSAG = Master Street Address Guide
https://www.bandwidth.com/9-1-1/
SmartyStreets US Street Address API
https://smartystreets.com/docs/cloud/us-street-api
There are companies that provide this service. Service bureaus that deal with mass mailing will scrub an entire mailing list so that it's in the proper format, which results in a discount on postage. The USPS sells databases of address information that can be used to develop custom solutions. They also have lists of approved vendors who provide this kind of software and service.
There are some (but not many) packages that have APIs for hooking address validation into your software.
However, you're right that it's a pretty nasty problem.
http://www.usps.com/ncsc/ziplookup/vendorslicensees.htm
As mentioned, there are many services out there; if you are looking to truly validate the entire address, then I highly recommend going with a web-service-type solution to ensure that changes can quickly be recognized by your application.
In addition to the services listed above, webservicex.net has this US Address Validation service: http://www.webservicex.net/WCF/ServiceDetails.aspx?SID=24
We have had success with Perfect Address.
Their database has all the US street names and street number ranges. It also acts as a pretty decent parser for free-form address fields, if you are lucky enough to have that kind of data.
Validating that an address exists is one thing.
But if you're trying to validate that a given person lives at a given address, your only near-guarantee would be a test mail to the address, and even that is not certain if the person is organised or knows somebody at that address.
Otherwise, people could just specify an arbitrary random address which they know exists, and it would mean nothing to you.
The best you can do for immediate results is to request that the user send a photographed or scanned copy of the header of their bank statement or some other proof of recent residence, because at least then they have to work harder to forge it, and forged documents show up easily with a basic level of image forensic analysis.
There is no global solution. For any given country it is at best rather tricky.
In the UK, the Post Office controls postal addresses and can provide (at a cost) address information for validation purposes.
Government agencies also keep an extensive list of addresses, and these are centrally collated in the NLPG (National Land and Property Gazetteer).
Actually validating against these lists is very difficult. Most people don't even know exactly how their address is held by the Post Office. Some businesses don't even know what number they are on a particular street.
Your best bet is to approach a company that specialises in this kind of thing.
Yahoo also has a Placemaker API. It is good only for locations, but it has a universal ID for all world locations.
It looks like there is no standard in the ISO list.
You could also try SAP's Data Quality solutions, which are available both as a server platform for processing a large number of requests and as an embeddable SDK if you want to run it in-process with your application. We use it in our application and it's very robust and scalable.
NAICS.com is coming out with an API that will add all kinds of key business data including street address. This would happen on the fly as your site's forms are processed. https://www.naics.com/business-intelligence-api/
You can try Pitney Bowes' "IdentifyAddress" API, available at https://identify.pitneybowes.com/
The service analyses and compares the input addresses against known address databases around the world to output a standardized detail. It corrects addresses, adds missing postal information, and formats them using the format preferred by the applicable postal authority. It also uses additional address databases so it can provide enhanced detail, including address quality, type of address, transliteration (such as from Chinese Kanji to Latin characters), and whether an address is validated to the premise/house number, street, or city level of reference information.
You will find a lot of samples and SDKs available on the site, and I found it extremely easy to integrate.
For US addresses you can require a valid state and verify that the ZIP code is valid. You could even check that the ZIP code is in the right state, but beyond that I don't think there are many tests you could run that wouldn't produce a lot of false negatives (a minimal format-check sketch follows below).
What are you trying to do: prevent simple mistakes, or enforce some kind of identity check?
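Here is the promised minimal format-check sketch in TypeScript; it catches typos, not fake addresses, and the ZIP-to-state range check is deliberately omitted because it needs a lookup table:

// The 50 US state abbreviations plus DC.
const STATES = new Set([
  'AL','AK','AZ','AR','CA','CO','CT','DE','DC','FL','GA','HI','ID','IL',
  'IN','IA','KS','KY','LA','ME','MD','MA','MI','MN','MS','MO','MT','NE',
  'NV','NH','NJ','NM','NY','NC','ND','OH','OK','OR','PA','RI','SC','SD',
  'TN','TX','UT','VT','VA','WA','WV','WI','WY',
]);

const ZIP_RE = /^\d{5}(-\d{4})?$/; // ZIP or ZIP+4

function passesBasicChecks(state: string, zip: string): boolean {
  return STATES.has(state.toUpperCase()) && ZIP_RE.test(zip);
}

// Checking that the ZIP actually falls inside the state would need a
// ZIP-range lookup table (e.g. built from USPS data), omitted here.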
