How can I organize search in a social network web application? Searching is done by full name. I want to use stored procedures. Is that the best solution? What algorithm can be used?
While registering, the user specifies his/her full name, for example: Alice Johnson Martin.
I want to search for a user by his/her full name. If someone searches for Johnson Martin Alice, the user named Alice Johnson Martin should be found. I am using PostgreSQL and ASP.NET MVC.
What algorithm can be used?
Graph algorithms, machine learning, heuristics, etc. come to mind. It depends on what you want to achieve.
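For the name search in the question, one way to make word order irrelevant is to treat a full name as a set of lower-cased tokens and require every token of the query to appear in the stored name. Below is a minimal, hypothetical in-memory inverted index in Python (the `NameIndex` class and its methods are made up for illustration). In PostgreSQL the same idea maps onto a token-array column with a GIN index and the `@>` containment operator, or onto full-text search with `to_tsvector`/`to_tsquery`, either of which a stored procedure could wrap.

```python
import re
from collections import defaultdict


def name_tokens(full_name: str) -> frozenset:
    """Lower-case a name and split it into word tokens; order is discarded."""
    return frozenset(re.findall(r"\w+", full_name.lower()))


class NameIndex:
    """Hypothetical in-memory inverted index: token -> ids of users whose name contains it."""

    def __init__(self) -> None:
        self._postings = defaultdict(set)
        self._names = {}

    def add(self, user_id: int, full_name: str) -> None:
        self._names[user_id] = full_name
        for token in name_tokens(full_name):
            self._postings[token].add(user_id)

    def search(self, query: str) -> list:
        tokens = name_tokens(query)
        if not tokens:
            return []
        # A user matches only if every query token occurs in the name,
        # so "Johnson Martin Alice" finds "Alice Johnson Martin".
        matches = set.intersection(*(self._postings.get(t, set()) for t in tokens))
        return sorted(self._names[user_id] for user_id in matches)


index = NameIndex()
index.add(1, "Alice Johnson Martin")
index.add(2, "Bob Martin")
print(index.search("Johnson Martin Alice"))  # ['Alice Johnson Martin']
print(index.search("martin"))                # ['Alice Johnson Martin', 'Bob Martin']
```

Intersecting posting sets keeps each lookup proportional to the number of users sharing a token rather than to the whole user table.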
As part of our start-up product, we need to compute a "similar users" feature, and we've decided to use Pig for it.
I've been learning Pig for a few days now and understand how it works.
To start, here is what the log file looks like:
user url time
user1 http://someurl.com 1235416
user1 http://anotherlik.com 1255330
user2 http://someurl.com 1705012
user3 http://something.com 1705042
user3 http://someurl.com 1705042
Because the number of users and URLs can be huge, we can't use a brute-force approach here, so first we need to find the users that have accessed at least one common URL.
The algorithm could be split as follows:
1. Find all users that have accessed some common URLs.
2. Generate pair-wise combinations of all users for each resource accessed.
3. For each pair and URL, compute the similarity of those users: the similarity depends on the time interval between the accesses (so we need to keep track of the time).
4. Sum up the similarity for each pair across URLs.
Here is what I've written so far:
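-- Load the tab-separated log: user id, URL, access time.
A = LOAD 'logs.txt' USING PigStorage('\t') AS (uid:chararray, url:chararray, time:long);
-- Group by URL: each group holds every (uid, url, time) row for one URL.
grouped_pos = GROUP A BY url;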
I know it is not much yet, but now I don't know how to generate the pairs or move further.
Any help would be appreciated.
Thanks.
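The four steps in the question can be prototyped in plain Python before porting them to Pig (in Pig, one common approach to step 2 is to flatten two projections of the grouped bag in a single FOREACH, which yields their cross product, and step 4 is a GROUP on the user pair followed by SUM). This is a sketch, not a tuned solution: the exponential time kernel and its `TAU` constant are assumptions, since the question only says that similarity depends on the time interval.

```python
import math
from collections import defaultdict
from itertools import combinations

# Assumed decay constant, in the same units as the log's time column.
TAU = 3600.0


def time_similarity(t1: int, t2: int, tau: float = TAU) -> float:
    """Assumed kernel: 1.0 for simultaneous accesses, decaying towards 0 as the gap grows."""
    return math.exp(-abs(t1 - t2) / tau)


def similar_users(log_rows):
    """log_rows: iterable of (uid, url, time). Returns {(uid_a, uid_b): summed similarity}."""
    # Step 1: group accesses by URL (what GROUP A BY url does in Pig).
    by_url = defaultdict(list)
    for uid, url, t in log_rows:
        by_url[url].append((uid, t))

    scores = defaultdict(float)
    for accesses in by_url.values():
        # Step 2: pair up the users who accessed this URL; users who never share
        # a URL are never compared, which avoids the all-pairs brute force.
        for (u1, t1), (u2, t2) in combinations(accesses, 2):
            if u1 == u2:
                continue  # same user visiting the same URL twice
            pair = tuple(sorted((u1, u2)))
            # Steps 3 and 4: time-based similarity, summed per user pair.
            scores[pair] += time_similarity(t1, t2)
    return dict(scores)


rows = [
    ("user1", "http://someurl.com", 1235416),
    ("user1", "http://anotherlik.com", 1255330),
    ("user2", "http://someurl.com", 1705012),
    ("user3", "http://something.com", 1705042),
    ("user3", "http://someurl.com", 1705042),
]
for pair, score in sorted(similar_users(rows).items()):
    print(pair, round(score, 4))
```

A very popular URL still produces a quadratic number of pairs, so in practice one might cap or skip such URLs; that trade-off is left open here.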
There's a nice, detailed paper from IBM on doing co-clustering with MapReduce that may be useful for you.
The Google News Personalization paper describes a fairly straightforward implementation of Locality Sensitive Hashing for solving the same problem.
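The LSH family usually used for this kind of set similarity is MinHash: the chance that two users' MinHash values agree is (approximately, for the hash functions used here) the Jaccard similarity of their URL sets, so splitting signatures into bands turns similar users into bucket collisions. Below is a minimal sketch of MinHash with banding, assuming each user is represented by a set of URLs; the hash construction, the number of hashes and the band size are illustrative choices, not the paper's parameters.

```python
import random
import zlib
from collections import defaultdict
from itertools import combinations

PRIME = 2**61 - 1  # Mersenne prime, larger than any crc32 value


def make_hash_funcs(n: int, seed: int = 0):
    """n random (a, b) pairs for hash functions h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(n)]


def minhash_signature(urls, hash_funcs):
    """For each hash function, keep the minimum hash over the user's URLs."""
    ids = [zlib.crc32(url.encode("utf-8")) for url in urls]
    return tuple(min((a * x + b) % PRIME for x in ids) for a, b in hash_funcs)


def candidate_pairs(user_urls, n_hashes=20, band_size=2, seed=0):
    """user_urls: {user: set of URLs}. Returns user pairs sharing at least one band."""
    hash_funcs = make_hash_funcs(n_hashes, seed)
    buckets = defaultdict(set)
    for user, urls in user_urls.items():
        if not urls:
            continue  # no URLs, no signature
        sig = minhash_signature(urls, hash_funcs)
        for start in range(0, n_hashes, band_size):
            buckets[(start, sig[start:start + band_size])].add(user)
    pairs = set()
    for users_in_bucket in buckets.values():
        pairs.update(combinations(sorted(users_in_bucket), 2))
    return pairs


users = {
    "user1": {"http://someurl.com", "http://anotherlik.com"},
    "user2": {"http://someurl.com", "http://anotherlik.com"},
    "user3": {"http://something.com"},
}
print(candidate_pairs(users))
```

Only the candidate pairs then need the exact (for example, time-based) similarity computation. More, shorter bands catch less similar pairs at the cost of more false candidates.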
For algorithms, look at papers on query/URL bipartite graphs. Here are a couple of links:
Query suggestion using hitting time, by Qiaozhu Mei, Dengyong Zhou and Kenneth Church: http://www-personal.umich.edu/~qmei/pub/cikm08-sugg.ppt
Random walks on the click graph, by Nick Craswell and Martin Szummer, July 2007: http://research.microsoft.com/apps/pubs/default.aspx?id=65235
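The shared idea in these click-graph papers is that queries connected through the same clicked URLs are related. As a rough illustration only (a simplified forward walk, not the exact model of either paper), a few steps of a random walk over the bipartite query/URL graph, with transition probabilities proportional to click counts, give a relatedness score for other queries. The same code works unchanged for a user/URL graph.

```python
from collections import defaultdict


def walk_scores(clicks, start, steps=2):
    """clicks: {(query, url): click count}. Returns {other query: probability}
    of being on that query after `steps` hops from `start` (use an even number)."""
    # Bipartite adjacency, weighted by click counts in both directions.
    adj = defaultdict(dict)
    for (query, url), count in clicks.items():
        adj[("q", query)][("u", url)] = count
        adj[("u", url)][("q", query)] = count

    dist = {("q", start): 1.0}
    for _ in range(steps):
        nxt = defaultdict(float)
        for node, prob in dist.items():
            total = sum(adj[node].values())
            # Spread this node's probability over its neighbours by click weight.
            for neighbour, weight in adj[node].items():
                nxt[neighbour] += prob * weight / total
        dist = nxt
    return {name: p for (kind, name), p in dist.items() if kind == "q" and name != start}


clicks = {
    ("cheap flights", "kayak.com"): 3,
    ("cheap flights", "expedia.com"): 1,
    ("airfare", "kayak.com"): 2,
    ("hotels", "expedia.com"): 5,
}
print(walk_scores(clicks, "cheap flights"))
```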
I am implementing a web application that has many users, and I would like to give users a rating based on their activities and on how much other users like those activities. How would I implement such an algorithm? I am looking for an elegant and smart algorithm that could help.
You are basically looking for scoring algorithms. These articles might help:
How not to sort by average rating
Rank hotness with Newton's law of cooling
How Reddit Ranking Algorithms work
Hope this helps.
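Two of the ideas from those articles can be sketched in Python: the lower bound of the Wilson score interval (what "How not to sort by average rating" recommends for ranking by positive and negative votes) and exponential time decay (the idea behind the Newton's law of cooling article). Combining them into a per-user rating in `user_rating` is a hypothetical choice for illustration, as are the 95% confidence level (z = 1.96) and the 24-hour half-life.

```python
import math


def wilson_lower_bound(positive: int, total: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for the fraction of positive votes."""
    if total == 0:
        return 0.0
    p = positive / total
    centre = p + z * z / (2 * total)
    margin = z * math.sqrt((p * (1 - p) + z * z / (4 * total)) / total)
    return (centre - margin) / (1 + z * z / total)


def decayed(points: float, age_hours: float, half_life_hours: float = 24.0) -> float:
    """Exponential ("cooling") decay: points halve every half_life_hours."""
    return points * 0.5 ** (age_hours / half_life_hours)


def user_rating(posts) -> float:
    """Hypothetical combination. posts: list of (likes, dislikes, age_hours)."""
    return sum(decayed(wilson_lower_bound(likes, likes + dislikes), age)
               for likes, dislikes, age in posts)


# 9/10 and 90/100 have the same average, but more votes give a higher lower bound.
print(round(wilson_lower_bound(9, 10), 3), round(wilson_lower_bound(90, 100), 3))
print(round(user_rating([(90, 10, 0), (5, 0, 48)]), 3))
```

The Wilson bound keeps a post with a single like from outranking one with hundreds of mostly positive votes, and the decay keeps old activity from dominating forever.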
Maybe your answer is staring right at you next to your username on this site :-) Stack Overflow's scoring system and badges exist to promote certain behaviors on the site. The algorithm is simple and the feedback is immediate, so everybody can see the consequences of certain actions.
What are the ratings used for? If you want to use the ratings as incentives to encourage a specific behavior from your users, then I believe you need to look at disciplines like behavioral psychology to figure out which behaviors you want to measure and reward.
If you already have a user base that reflects the typical users you're trying to address, you might want to try simple trial and error. Pick some actions, such as receiving a like on a post, and add points to the user's score whenever that happens. Watch the community's reaction when you introduce the scoring system and see if it helps motivate the behavior you want. If not, change some other parameters and repeat.
Depending on your system, some users might try to game it, so you could find yourself locked into an eternal cat-and-mouse game once you introduce a rating system (Google's page ranking is an example).
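The trial-and-error approach above boils down to a table of point values per action that you keep adjusting while watching how the community reacts. A minimal sketch; the action names and point values are hypothetical.

```python
from collections import Counter

# Hypothetical point values per action; these are the knobs you tune by trial and error.
POINTS = {"post_liked": 10, "answer_accepted": 15, "post_disliked": -2}


def score_users(events, points=POINTS):
    """events: iterable of (user_id, action). Actions missing from `points` score 0."""
    scores = Counter()
    for user, action in events:
        scores[user] += points.get(action, 0)
    return dict(scores)


events = [("alice", "post_liked"), ("alice", "post_disliked"), ("bob", "post_liked")]
print(score_users(events))  # {'alice': 8, 'bob': 10}
```

Keeping the weights in one table makes each experiment a one-line change, and recomputing scores from the raw event log lets you compare the old and new weightings on the same history.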
I'm trying to learn more about trust metrics (including related algorithms) and how user voting, ranking and rating systems can be wired to stifle abuse. I've read abstract articles and papers describing trust metrics but haven't seen any actual implementations. My goal is to create a system that allows users to vote on other users and on the content of other users and, with those votes and related metadata, determine whether those votes can be applied to a user's level or popularity.
Have you used or seen some sort of trust system within a social graph? How did it work and what were its areas of strength and weaknesses?
I'm reading the book Programming Collective Intelligence.
From the description:
Want to tap the power behind search rankings, product recommendations, social bookmarking, and online matchmaking? This fascinating book demonstrates how you can build Web 2.0 applications to mine the enormous amount of data created by people on the Internet.
The algorithms in the book are implemented in Python.
I've just started reading the book so I don't know if it can help solve your problem, but it's worth taking a look.
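One widely discussed way to make votes harder to abuse is to propagate trust from a small set of hand-picked seed users, PageRank-style, so that accounts nobody trustworthy votes for end up with little weight; EigenTrust uses a similar power iteration with pre-trusted peers. The sketch below is a simplified, hypothetical version of that idea (the `alpha` restart probability and the handling of users who cast no votes are choices made for this sketch), not an implementation of any particular paper.

```python
def propagate_trust(votes, seeds, alpha=0.15, iterations=50):
    """votes: {voter: {target: positive weight}}; seeds: non-empty set of trusted users.
    Returns {user: trust}; the trust values sum to 1."""
    users = set(seeds) | set(votes)
    for targets in votes.values():
        users |= set(targets)
    prior = {u: (1.0 / len(seeds) if u in seeds else 0.0) for u in users}
    trust = dict(prior)
    for _ in range(iterations):
        # With probability alpha, jump back to a seed (personalized PageRank restart).
        nxt = {u: alpha * prior[u] for u in users}
        for voter, targets in votes.items():
            total = sum(targets.values())
            if total <= 0:
                continue
            # Each voter passes on its own trust in proportion to its vote weights.
            for target, weight in targets.items():
                nxt[target] += (1 - alpha) * trust[voter] * weight / total
        # Trust held by users who cast no votes would be lost; return it to the seeds.
        leaked = 1.0 - sum(nxt.values())
        for s in seeds:
            nxt[s] += leaked / len(seeds)
        trust = nxt
    return trust


votes = {
    "alice": {"bob": 1},
    "bob": {"alice": 1, "carol": 1},
    "sybil1": {"sybil2": 1},  # a ring of fake accounts voting for each other
    "sybil2": {"sybil1": 1},
}
for user, value in sorted(propagate_trust(votes, {"alice"}).items()):
    print(user, round(value, 3))
```

The fake accounts vote for each other as much as they like, but because no trusted user votes for them, their trust stays at zero; the weakness is that everything depends on choosing good seeds.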
I am thinking of starting a project based on a recommendation system. I want to improve my skills in this area, which looks like a hot topic in web development. I'm also wondering which algorithms Last.fm, Grooveshark and Pandora use for their recommendation systems. If you know any book, site or other resource about this kind of algorithm, please let me know.
Have a look at Collaborative filtering or Recommender systems.
One simple algorithm is Slope One.
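Slope One predicts a user's rating for an item from the average rating differences between that item and the items the user has already rated. Below is a minimal sketch of the weighted variant (each average difference is weighted by how many users rated both items), run on a tiny made-up data set.

```python
from collections import defaultdict


def slope_one_predict(ratings, user, item):
    """Weighted Slope One. ratings: {user: {item: rating}}.
    Returns the predicted rating of `item` for `user`, or None if nothing overlaps."""
    # For every other item j, sum (rating of item - rating of j) over users who rated both.
    diff_sum = defaultdict(float)
    count = defaultdict(int)
    for their_ratings in ratings.values():
        if item not in their_ratings:
            continue
        for j, r_j in their_ratings.items():
            if j != item:
                diff_sum[j] += their_ratings[item] - r_j
                count[j] += 1

    numerator = denominator = 0.0
    for j, r_uj in ratings[user].items():
        if j != item and count[j] > 0:
            # (average deviation + user's rating of j), weighted by count[j]
            numerator += diff_sum[j] + r_uj * count[j]
            denominator += count[j]
    return numerator / denominator if denominator else None


ratings = {
    "John": {"A": 5, "B": 3, "C": 2},
    "Mark": {"A": 3, "B": 4},
    "Lucy": {"B": 2, "C": 5},
}
print(slope_one_predict(ratings, "Lucy", "A"))  # 4.333333333333333 (= 13/3)
```

The average differences can be precomputed once per item pair, which is what makes Slope One cheap to serve.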
A fashionably late response:
Pandora and Grooveshark use very different algorithms.
Basically, there are two major approaches to recommendation systems, plus hybrid systems that combine them:
1. collaborative filtering
2. content-based filtering
Most systems are based on collaborative filtering. This basically means matching lists of preferences: if I liked items A, B, C, D, E and F, and several other users liked A, B, C, D, E, F and J, the system will recommend J to me because I share these users' taste (it's not that simple, but that's the idea). The main features analyzed here are the item IDs and the users' votes on those items.
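The "liked A through F, others also liked J" example translates almost directly into code. Here is a minimal user-based sketch on sets of liked items, using Jaccard similarity between users (one of several possible similarity measures, chosen here for simplicity).

```python
from collections import defaultdict


def recommend(likes, user, k=5):
    """likes: {user: set of liked items}. Ranks items the user hasn't liked by the
    summed Jaccard similarity of the other users who liked them."""
    mine = likes[user]
    scores = defaultdict(float)
    for other, theirs in likes.items():
        if other == user:
            continue
        union = mine | theirs
        sim = len(mine & theirs) / len(union) if union else 0.0
        if sim == 0:
            continue  # no shared taste, no vote
        for item in theirs - mine:
            scores[item] += sim
    return sorted(scores, key=lambda item: (-scores[item], item))[:k]


likes = {
    "me": {"A", "B", "C", "D", "E", "F"},
    "u1": {"A", "B", "C", "D", "E", "F", "J"},
    "u2": {"A", "B", "C", "D", "E", "F", "J"},
    "u3": {"X", "Y"},
}
print(recommend(likes, "me"))  # ['J']
```

Note that only item identities and who liked them are used; nothing about the items' content enters the computation.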
A content-based method analyzes the content of the items at hand and builds my profile from the content of the items I like, not from what other users like.
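A content-based recommender needs features for each item. In this minimal sketch the features are hand-assigned tag weights (the tags are made up, loosely in the spirit of human-annotated song attributes), the user profile is the sum of the tag vectors of the liked items, and unseen items are ranked by cosine similarity to that profile.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse tag vectors."""
    dot = sum(weight * b[tag] for tag, weight in a.items() if tag in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def content_recommend(item_tags, liked, k=5):
    """item_tags: {item: {tag: weight}}; liked: set of items the user liked."""
    # The profile is built only from this user's likes, not from other users.
    profile = Counter()
    for item in liked:
        profile.update(item_tags[item])
    candidates = [item for item in item_tags if item not in liked]
    return sorted(
        candidates,
        key=lambda item: (-cosine(profile, Counter(item_tags[item])), item),
    )[:k]


item_tags = {
    "song1": {"acoustic": 1, "folk": 1},
    "song2": {"acoustic": 1, "folk": 1, "female_vocals": 1},
    "song3": {"electronic": 1, "dance": 1},
}
print(content_recommend(item_tags, {"song1"}))  # ['song2', 'song3']
```

Because the score comes from shared tags, the system can also say why it recommended a song, which is the advantage described for Pandora.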
That said, Grooveshark is based on collaborative filtering, while Pandora is content-based (maybe with some collaborative filtering layer on top).
The interesting thing about Pandora is that the content is analyzed by humans (musicians), not automatically. They call it the Music Genome Project (http://www.pandora.com/mgp.shtml): annotators tag each song with a number of labels on a few axes such as structure, rhythm, tonality, recording technique and more (full list: http://en.wikipedia.org/wiki/List_of_Music_Genome_Project_attributes).
That is what lets them explain and justify each recommended song.
Programming Collective Intelligence is a nice, approachable introduction to this field.
There's a good demo video with an explanation (and a link to the author's thesis) at Mapping and visualizing music collections. This approach analyzes the characteristics of the music itself. Other services, such as Netflix and Amazon, rely on recommendations from other users with similar tastes as well as on basic category filtering.
Great paper by Yehuda Koren (on the team that won the Netflix Prize): The BellKor Solution to the Netflix Grand Prize (search for "GrandPrize2009_BPC_BellKor.pdf").
A couple of websites:
Trustlet.org
Collaborative Filtering tutorials by Dr. Jun Wang
Google: item-based top-n recommendation algorithms
Manning also has two good books on this subject: Algorithms of the Intelligent Web and Collective Intelligence in Action.
Last.fm "neighbours" is probably collaborative filtering.
Pandora hired hundreds of musicologists to classify songs along ~500 dimensions.
http://en.wikipedia.org/wiki/Music_Genome_Project
These are two very different approaches. Google Scholar is your friend as far as the literature goes.
Pandora's algorithm started by simply matching specific music genres to the song you entered. Since then it has been growing slowly as people vote on whether they like or dislike a song, which lets it eliminate bad songs and push good songs to the front. It will also sneak new songs that have few votes, up or down, into your playlist so that those songs can collect some votes.
Not sure about the other sites listed.
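Mixing a few little-voted songs into an otherwise vote-ranked playlist is a form of exploration versus exploitation. The sketch below is an epsilon-greedy version of that idea; it is a hypothetical illustration of the behaviour described, not Pandora's actual algorithm, and the `explore_rate` and `min_votes` thresholds are arbitrary.

```python
import random


def build_playlist(songs, length, explore_rate=0.2, min_votes=5, rng=None):
    """songs: {title: (likes, dislikes)}. Hypothetical epsilon-greedy playlist builder."""
    rng = rng or random.Random()
    # Songs with enough votes, best net score first (exploitation).
    known = sorted(
        (s for s, (up, down) in songs.items() if up + down >= min_votes),
        key=lambda s: songs[s][0] - songs[s][1],
        reverse=True,
    )
    # Songs with few votes, which need exposure to collect votes (exploration).
    fresh = [s for s, (up, down) in songs.items() if up + down < min_votes]
    playlist = []
    while len(playlist) < length and (known or fresh):
        explore = bool(fresh) and (not known or rng.random() < explore_rate)
        if explore:
            playlist.append(fresh.pop(rng.randrange(len(fresh))))
        else:
            playlist.append(known.pop(0))
    return playlist


songs = {"song_a": (40, 2), "song_b": (12, 10), "song_c": (1, 0), "song_d": (0, 0)}
print(build_playlist(songs, 4, rng=random.Random(42)))
```

Raising `explore_rate` gets new songs rated faster at the cost of playing more songs the listener may not like.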
I need to find an algorithm that finds the best matches in a social network. The system is a social network for college students, and the main idea is to find a study partner for a class. The idea is to suggest to users their potential best partners based on different criteria, such as common classes, GPA, rating, common schedule, etc. I wonder what the best algorithm would be.
Such a problem is called collaborative filtering. Collaborative filtering systems can produce personal recommendations by computing the similarity between your preferences and those of other people.
There is a lot of information about such techniques. You might start with a good presentation.
Maybe some sort of clustering algorithm could help: students whose vectors (common classes, GPA, etc.) are similar would be clustered together.
You might want to start off by looking at recommendation systems and nearest neighbor search.
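At the scale of one college, a brute-force nearest-neighbour search over hand-built features is often enough. This sketch assumes hypothetical student records with a set of classes, a GPA on a 0 to 4 scale and a set of free time slots, plus made-up weights; the match score is a weighted sum of per-criterion similarities, and the best partners are the highest-scoring other students.

```python
def jaccard(a: set, b: set) -> float:
    """Overlap of two sets, from 0 (disjoint) to 1 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical weights for (shared classes, GPA closeness, schedule overlap).
WEIGHTS = (0.5, 0.2, 0.3)


def match_score(a: dict, b: dict, weights=WEIGHTS) -> float:
    """Weighted sum of per-criterion similarities, each in [0, 1]."""
    w_classes, w_gpa, w_schedule = weights
    gpa_closeness = 1.0 - abs(a["gpa"] - b["gpa"]) / 4.0  # GPA assumed on a 0 to 4 scale
    return (w_classes * jaccard(a["classes"], b["classes"])
            + w_gpa * gpa_closeness
            + w_schedule * jaccard(a["free_slots"], b["free_slots"]))


def best_partners(students: dict, me: str, k: int = 3) -> list:
    """Brute-force nearest neighbours: score every other student, keep the top k."""
    others = [name for name in students if name != me]
    return sorted(others, key=lambda name: -match_score(students[me], students[name]))[:k]


students = {
    "ann": {"classes": {"CS101", "MATH200"}, "gpa": 3.5, "free_slots": {"mon_am", "wed_pm"}},
    "ben": {"classes": {"CS101", "MATH200"}, "gpa": 3.4, "free_slots": {"mon_am", "wed_pm"}},
    "cat": {"classes": {"HIST100"}, "gpa": 3.5, "free_slots": {"fri_am"}},
}
print(best_partners(students, "ann"))  # ['ben', 'cat']
```

A hard filter (for example, requiring at least one shared class) before scoring is a design choice you may want; the clustering idea would instead group students first and search only within a cluster.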