I want to implement a "text" suggestion feature.
I have a huge amount of data; how can I implement an efficient and scalable auto-complete?
Edit 1:
I have a MySQL table with one client per row and a 'name' column, and I want to build a suggestion feature for searching client names (like Google Suggest, but for client names instead of queries). Given the huge number of rows, how can I design an efficient suggestion feature?
When the user starts typing inside an input text field, I want to display possible client names.
OK, I think I understand what you're looking for, and here are some possible solutions:
What is the best autocomplete/suggest algorithm, datastructure [C++/C] (the answers are generic enough despite the fact that it's a C/C++ question)
How to implement autocomplete on a massive dataset
Autocomplete using a trie
Algorithm for autocomplete?
Trie based addressbook and efficient search by name and contact number
How do you autocomplete names containing spaces?
Essentially, it seems like you're looking for auto-complete functionality (if I understood your question correctly). Along those lines, the above questions and their answers also provide a lot of references on how to do more complex suggestions (e.g. based on content, semantics, intent, etc.).
This will probably not address your question if you're looking for an algorithm that makes "related" suggestions, e.g.:
"water" may suggest kool-aid, gatorade, vitamin water.
"sea" may suggest ocean, lake, river
I have a book review site where readers can write reviews about books and other users can post comments. I want to determine the following automatically whenever a new review or comment is published:
(1) Whether a book review is positive or negative, and how positive or negative it is as a percentage.
(2) Whether a comment made by a particular user is positive or negative, and how positive or negative it is as a percentage.
(3) I want to read tweets about a particular book and check whether each tweet is positive or negative.
Bottom line: I want some open-source tool suggestions that I can use for my website. The website is written in PHP, and I'm looking for a semantic analysis tool that I can customize to meet my needs, or one that best fits them out of the box.
If not, I want to know whether it's easy to build one with minimal requirements. I know PHP, Perl, and shell scripting, and I can learn Python. I know C++; Java may be the right language to start from scratch, but I don't have much experience with it.
There is an open-source semantic analysis engine currently incubating at the Apache Software Foundation called Stanbol. It provides APIs to interface with it over HTTP, as well as a Java API if needed. It's pretty advanced, but generally speaking, if your needs are simpler you can always try a SaaS solution like uClassify.
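If you do end up building something minimal yourself, a crude lexicon-based scorer is a common starting point. The Python sketch below is a toy illustration (it is not the Stanbol or uClassify API); the word lists are placeholders and far too small for real use, where you'd want a full sentiment lexicon or a trained classifier.

```python
# Toy lexicon-based sentiment scorer (illustrative sketch only).
import re

POSITIVE = {"good", "great", "excellent", "loved", "wonderful", "best"}
NEGATIVE = {"bad", "boring", "terrible", "hated", "worst", "poor"}

def sentiment_percent(text):
    """Return (positive %, negative %) based on matched sentiment words."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(1 for w in words if w in POSITIVE)
    neg = sum(1 for w in words if w in NEGATIVE)
    total = pos + neg
    if total == 0:
        return 50.0, 50.0  # no sentiment words found; treat as neutral
    return 100.0 * pos / total, 100.0 * neg / total

print(sentiment_percent("A great book, the best I have read"))  # (100.0, 0.0)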
In response to your first request, I'd suggest you create a form where the user has a voting option (such as an x/5 star rating), and then calculate the average across all of the reviews.
I'm building a formula for news, to produce a list of "hot" news items.
The available factors are page views, time, and content.
What would be a good solution to this problem?
Thanks
You can try a machine learning approach for this problem.
Extract your features, and give each a numeric value (you can use the Bag of Words model for content). Note that some feature selection algorithm might be needed.
Manually label a large enough set of examples, giving each a score according to its importance.
Use linear regression and build a function that evaluates each article and gives it a score.
Now that you have your regression function, you can use it to give each article a score. This is the raw score.
For post-processing, combine this score with the time in a second function to get the article's final score. @MattBall's suggested link seems like a reasonable approach.
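As a rough illustration of the regression and post-processing steps above, here is a toy Python sketch using NumPy least squares. The features (page views, age in hours), the hand-labeled scores, and the decay constant are made-up placeholders, not a recommended model.

```python
# Toy "hotness" scoring: fit a linear model, then apply time decay.
import numpy as np

# Each row: [page_views, age_hours]; y: manually assigned importance scores.
X = np.array([[1200.0, 2.0],
              [300.0, 1.0],
              [5000.0, 24.0],
              [80.0, 48.0]])
y = np.array([8.0, 5.0, 6.0, 1.0])

# Add an intercept column and solve the least-squares problem.
A = np.hstack([X, np.ones((X.shape[0], 1))])
weights, *_ = np.linalg.lstsq(A, y, rcond=None)

def raw_score(page_views, age_hours):
    return weights[0] * page_views + weights[1] * age_hours + weights[2]

def final_score(page_views, age_hours, decay=0.05):
    # Post-processing: combine the raw score with a simple time decay.
    return raw_score(page_views, age_hours) * np.exp(-decay * age_hours)

print(final_score(2000, 3))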
I need to test-drive the naïve string search algorithm.
http://en.wikipedia.org/wiki/String_searching_algorithm
Can someone shed some light on how I could approach the issue?
Should my tests only be testing outside behaviour (i.e. the indexes at which the pattern occurs, irrespective of the algorithm used)?
Or should I be algorithm-specific and test-drive algorithm-specific implementations?
This largely depends on how your class will be used. Testing the public contract is usually the way to go (and it's fairly easy to write decent tests for that), so unless your clients can somehow make use of implementation-detail knowledge, I'd stick to that.
Note that having the specific algorithm on paper could help pinpoint a few basic tests, without writing strictly implementation-related tests, like:
invalid input (empty strings, nulls)
input that is too large or too small (e.g. a pattern exceeding the searched string's length - what do you do then?)
valid input, yet matching nothing
This should give you a basic entry point for more implementation-specific testing. Keep in mind that data-driven testing can help you avoid the need for implementation-level knowledge altogether, and with a large enough data set it might be enough to verify algorithm correctness as well.
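As a concrete illustration of testing the public contract, here is a minimal Python sketch covering the cases listed above. find_all is a hypothetical function name, and the empty-pattern behavior shown is just one possible contract decision you'd have to make explicit.

```python
# Contract-level tests for a naive string search (illustrative sketch).
import unittest

def find_all(pattern, text):
    """Naive search: slide the pattern over the text one position at a time.
    Returns every starting index at which the pattern occurs."""
    if not pattern or len(pattern) > len(text):
        return []  # contract decision: empty pattern matches nothing
    return [i for i in range(len(text) - len(pattern) + 1)
            if text[i:i + len(pattern)] == pattern]

class FindAllTests(unittest.TestCase):
    def test_empty_pattern_matches_nothing(self):
        self.assertEqual(find_all("", "abc"), [])

    def test_pattern_longer_than_text(self):
        self.assertEqual(find_all("abcd", "abc"), [])

    def test_valid_input_matching_nothing(self):
        self.assertEqual(find_all("xyz", "abcabc"), [])

    def test_multiple_matches(self):
        self.assertEqual(find_all("ab", "abcab"), [0, 3])

    def test_overlapping_matches(self):
        self.assertEqual(find_all("aa", "aaa"), [0, 1])

if __name__ == "__main__":
    unittest.main()
```

Note that none of these tests depend on how the search is implemented; the same suite would pass against Knuth-Morris-Pratt or Boyer-Moore, which is exactly the point of testing the contract.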
I want to scrape reviews about various products and things on the web. How can I do that? There is a company called searchreviews.com that does this, and I want to know how they do it.
They get a page's HTML then parse it, targeting whatever information they need.
It's really awful, because it depends on the DOM of the site you're scraping, which can change at any time, in both trivial and complex ways. I've worked with companies that have scraped (legitimately) various types of sites, and it's horrible.
mechanize, watir, and rautomation are related Ruby gems that might help you here.
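As a rough illustration of that fetch-and-parse approach (in Python rather than Ruby), here is a minimal sketch using requests and BeautifulSoup. The URL and CSS selector are hypothetical; real selectors depend entirely on the target site's DOM and, as noted above, will break whenever it changes.

```python
# Minimal fetch-and-parse scraping sketch (hypothetical URL and selector).
import requests
from bs4 import BeautifulSoup

def scrape_reviews(url):
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    # Target whatever element actually wraps a review on the real page.
    return [node.get_text(strip=True)
            for node in soup.select("div.review-text")]

for review in scrape_reviews("https://example.com/product/123/reviews"):
    print(review)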
I've done this very often for various clients, and most of the time a site that gathers reviews is pretty well structured, so scraping isn't too hard. Look at Yelp.com, for example. I built a routine in screen-scraper that searched zip codes in the client's area, used the filters to hone in on the desired business types, and made a list of unique results (since the zip code searches could return duplicate results). From there I hit each unique URL. The reviews are pretty easy to parse with just regex and some page iteration.
I was wondering whether such an algorithm exists. I have a bunch of text documents and would like to find a pattern among all these documents, if a pattern exists. Please note I'm NOT trying to classify the documents; all I want to do is find a pattern, if one exists, among some of the documents. Thanks!
The question as it stands is kind of vague: you need to know what you're looking for in order to be able to find it.
Some ideas that may be of use:
(1) Get n-gram counts for each document separately for n = 1, 2, 3, 4, and then compare the frequencies of each n-gram across the documents. This should help you find commonly occurring phrases shared by the documents (see the sketch after this list).
(2) Use a part-of-speech tagger to convert all the docs into a stream of POS tags, and then do the same as in (1).
(3) Use PCFG software such as the Stanford Parser to get parse trees for all the sentences across all the documents, and then try to figure out how similar the distributions of sentence structures are across documents.
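As a rough sketch of idea (1), here is a toy Python example that counts n-grams per document and reports those common to every document; the sample texts are placeholders, and a real pipeline would add proper tokenization and frequency thresholds.

```python
# Count n-grams per document and report those shared by all documents.
from collections import Counter
import re

def ngram_counts(text, n):
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

docs = ["the quick brown fox", "a quick brown dog", "my quick brown cat"]

for n in (1, 2, 3, 4):
    per_doc = [ngram_counts(d, n) for d in docs]
    # N-grams that occur in every document are candidate shared patterns.
    shared = set(per_doc[0])
    for counts in per_doc[1:]:
        shared &= set(counts)
    if shared:
        print(n, sorted(" ".join(g) for g in shared))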