What is the best way to convert latitude/longitudes into a street address at a high scale in batch? - google-geocoder

The existing questions are related to real-time needs; the answers are API-based solutions.
These solutions are not well suited to do a large batch task.
Convert Latitude/Longitude To Address
This solution suggests caching google results for a short time:
what is the best geocode to get long and lot for a street address?
Is there a better solution? If there is not a good open-source dataset available for this then do you know of a good paid service?

Open Street Map has the Nominatim database which might do what you need.

Related

Is opencv image similarity comparison reliable for objects? Is there any cost/benefit quality alternative to open-source API's?

I'm trying to choose an API to match object images taken with a cell phone with a list of images in a file system. The point is, I'm afraid that I won't get reliable results and it won't be worth it to loose time in this feature.
I would really appreciate some advice regarding this topic.

Synchronize contact lists with a central server

We have some data that we are trying to synchronize between N machines and a centralized server, and I'm looking for a way to do this that is relatively efficient and robust.
Looking around, it appears that this is called a "set reconciliation problem". It's good to have a label for it, but searching on that turns up a lot of fairly academic work, which is at times a bit difficult to gauge in terms of its usefulness for our data, which is best described as contact lists in terms of its properties: objects (people) with multiple fields that do get updated, but not that often.
Our system involves a central server and machines connected to it. The central server, ideally, is the 'good' copy. A feature that's nice to have also, is the ability to force the machines to resend by tweaking something on the server.
So far, my thinking is along the lines of a UUID for each object and something like a version or timestamp (per object and or per collection of objects?) to use to tell which data to attempt to synchronize... but my thinking is still a bit fuzzy, and I thought asking would probably lead to a better solution than trying to invent this on my own.
It is not easy, and the perfect solution is academical. So you are on the good track.
You can craft a sync algorithm for your own problem, relaxing some of the requirements of the general solution.
I delivered a presentation on these topics at the last JsDay in Italy.
Here are my slides: http://www.slideshare.net/matteocollina/operational-transformation-12962149
Let me know if they help you, or if you need some assistance.

What are the performance consequences of encryption in SQL Server 2008?

If you had a table that had 100,000,000 email addresses (example) and you want to store them securely but you don't want to take a huge hit with performance when you retrieve them, how would you go about storing them?
Encryption algorithms are designed to be fast. If you have decided that encrypting your data is important (and personal information such as email address is important as #Marc B pointed out) then the question becomes "which algorithm has the right performance for my use case?"
I'd suggest you look at how much throughput you are expecting and test it. Use an industry standard algorithm and see if it meets your needs. Whatever you do, don't roll your own encryption. Crypto is hard to get right so trust the experts.
I'd suggest you start with AES/CBC/PKCS5. Store the ciphertext as a blob in the database. Key storage is always tricky and has been discussed on StackOverflow a fair bit so with any luck you can get advice on that aspect fairly easily. If you get sufficient throughput with that algorithm then you're done. If not, look around for a faster algorithm. I recommend http://bouncycastle.org as a good crypto library. It has a bunch of algorithms implemented so you can experiment and find one that meets your needs.
Did I mention that you shouldn't roll your own encryption? Good.

How the computer knows "Recommended for You"?

Recently, I found several web site have something like : "Recommended for You", for example youtube, or facebook, the web site can study my using behavior, and recommend some content for me... ...I would like to know how they analysis this information? Is there any Algorithm to do so? Thank you.
Amazon and Netflix (among others) use a technique called Collaborative filtering to suggest things you might like based on the likes/dislikes of others who have made purchases and selections similar to yours.
Is there any Algorithm to do so?
Yes
Yes. One fairly common one is to look at things you've selected in the past, find other people who've made those selections, then find the other selections most common among those other people, and guess that you're likely to be interested in those as well.
Yup there are lots of algorithms. Things such as k-nearest neighbor: http://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm.
Here is a pretty good book on the subject that covers making these sorts of systems along with others: http://www.amazon.com/gp/product/0596529325?ie=UTF8&tag=ianburriscom-20&linkCode=as2&camp=1789&creative=9325&creativeASIN=0596529325.
It's generally done by matching you with other users who have similar usage history / profile and then recommending other things that they've purhased/watched/whatever.
Searching for "recommendation algorithm" yields lots of papers. Most algorithms incorporate "machine learning" algorithms to determine groups of things (comedy movies, books on gardening, orchestral music, etc.). Your matching with those groups yields recommendations. Some companies use humans to classify things, too.
Such an algorithm is going to vary wildly from company to company. In many cases, it analyzes some combination of your search history, purchase history, physical location, and other factors. It probably will also compare purchases/searches amongst other people to find what those people have purchased/searched for, and recommend some of those products to you.
There are probably hundreds of these algorithms out there, but I doubt you can use any of them (that are actually good). Probably you are better off figuring it out yourself.
If you can categorize your contents (i.e. by tagging or content analysis), you can also categorize your users and their preferences.
For example: you have a video portal with 5 million videos .. 1 mio of them are tagged mostly red. If 80% of all videos watched by a user (who is defined by an IP, a persistent user account, ...) are tagged mostly red, you might want to recommend even more red videos to him. You might want to refine your recommendations by looking at his further actions: does he like your recommendations -- if so, why not give him even more, if not, try the second-best guess, maybe he's not looking for color, but for the background music ...
There's no absolute algorithm to do it, but all implementations will go into a similar direction. It's always basing on observing users, which scares me from time to time :-)
There's whole lot of algorithms tackling the issue: Wiki article. It's a Machine Learning domain problem. Computer's can be learned using two main techniques: classification and clustering. They require some datasets as input. If the dataset is informative (really holds some useful patterns) than those ML techniques can dig most of it.
Clustering could be best to use for this kind of problem. It's main usage is to find similarities among points in provided dataset. If the points are, e.g. your search history, they can be grouped together to form certain clusters. If Your search history closely relates to another, a hint can be given - picking links that are most similar to Your's.
The same comes with book recommendations - it's obvious what dataset they use: "Other people who bought this product also bought Product A, Product B,...". The key here is to match your profile to other's and use the most similar to recommend.
The computer retrieves information from the human brain with complex memory scan process, sorts it accordingly and outputs results based on what you have experienced in your life so far.

Media recommendation engine - Single user system - How to start

I want to implement a media recommendation engine. I saw a similar posts on this, but I think my requirements are bit different from those, so posting here.
Here is the deal.
I want to implement a recommendation engine for media players like VLC, which would be an engine that has to care for only single user. Like, it would be embedded in a media player on a PC which is typically used by single user. And it will start learning the likes and dislikes of the user and gradually learns what a user likes. Here it will not be able to find similar users for using their data for recommendation as its a single user system. So how to go about this?
Or you can consider it as a recommendation engine that has to be put in say iPods, which has to learn about a single user and recommend music/Movies from the collections it has.
I thought of start collecting the genre of music/movies (maybe even artist name) that user watches and recommend movies from the most watched Genre, but it look very crude, isn't it?
So is there any algorithms I can use or any resources I can refer up to?
Regards,
MicroKernel :)
What you're trying to do is quite challenging... particularly because it's still in the research stage and a lot of PHDs from reputable universities across the world are trying to get a good solution for that.
SO here are some things that you might need:
Data that you can analyze:
Lots, and lots, and lots of data!
It could be meta data about the media (name, duration, title, author, style, etc.)
Or you can try to do some crazy feature extraction from the media itself.
References to correlate the data to.
Since you can't get other users, you always need the user feedback.
If you don't want to annoy your user to death with feedback questions, then make your application connect to a central server so you can compare users.
An algorithm that can model your data sufficiently well.
If you have no experience at all, then try k-nearest neighbor (the simplest one).
Collaborative filtering
Pearson Correlation
Matrix Factorization/Decomposition
Singular value decomposition (SVD)
Ensemble learning <-- Allows you to combine multiple algorithms and take advantage of their strengths.
The winners of the NetFlix prize said this:
Predictive accuracy is substantially
improved when blending multiple
predictors. Our experience is that
most efforts should be concentrated in
deriving substantially different
approaches, rather than refining a
single technique. Consequently, our
solution is an ensemble of many
methods.
Conclusion:
There is no silver bullet for recommendation engines and it takes years of exploration to find a good combination of algorithms that produce sufficient results. :)

Resources