Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 8 years ago.
I'm working on a school project on product analysis that is based on sentiment analysis. I've been looking for a training dataset for quite some time now, and all I've been able to find so far is a dataset of movie reviews. My question is: can I use this dataset to train the classifier, i.e. will it have an effect on the accuracy of classification? If so, does anyone here know where I can get a free dataset of product reviews?
I am assuming you are using some textual model like the bag of words model.
From my experiments, you usually don't get good results when moving from one domain to another (even if the training set and the test set are both product reviews, but from different categories!).
Think of it logically: an oven that gets hot quickly usually indicates a good product. Is the same true for laptops?
When I experimented with this a few years ago, I used Amazon comments both as the training set and to test my algorithms.
The comments are short and informative and were enough to get ~80% accuracy. The 'ground truth' was the star rating, where 1-2 stars were 'negative', 3 stars 'neutral', and 4-5 stars 'positive'.
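The labelling scheme described above can be sketched in a few lines of Python. This is only an illustration, not the code used in the original experiment: the function names `label_from_stars`, `bag_of_words` and `build_training_set` are made up for this example, and the whitespace tokenizer is a simplifying assumption.

```python
from collections import Counter


def label_from_stars(stars):
    """Map a 1-5 star rating to a sentiment label (1-2 negative, 3 neutral, 4-5 positive)."""
    if stars not in (1, 2, 3, 4, 5):
        raise ValueError("stars must be an integer from 1 to 5")
    if stars <= 2:
        return "negative"
    if stars == 3:
        return "neutral"
    return "positive"


def bag_of_words(text):
    """Very rough bag-of-words features: lowercase, whitespace-separated token counts."""
    return Counter(text.lower().split())


def build_training_set(reviews):
    """Turn (text, stars) pairs into (features, label) pairs for a classifier."""
    return [(bag_of_words(text), label_from_stars(stars)) for text, stars in reviews]
```

The resulting (features, label) pairs could then be fed to whatever classifier you are using.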
I used a Perl script from esuli.it to crawl Amazon's comments.
Closed 8 years ago.
I've been told that there are car number plate image databases available on the web for free download, for developing image processing and automatic number plate recognition algorithms. Does anyone have a download link, or at least some keywords to search for?
If this is not legal, or if there are any ethical issues, I would be grateful if you let me know.
It's perfectly legal to do so, as long as the images are CC (Creative Commons) licensed, or you have the website owner's permission to do so.
A quick search for number plate image database yields some results:
Examples of test images
Academic Document (More examples of number plates)
Good small library of number plates.
Closed 9 years ago.
Our site allows users to enter the company they work for as a free form text entry.
Historically, we have gathered a few million unique entries. Since we put no constraints on the field, we ended up with a lot of variations and typos (e.g. over 1000 distinct entries just for McDonald's).
We realized we could provide our users with a great feature if only we could tie these variations together. We compiled a clean list of companies as a starting point using various online sources [Dictionary].
Now, we're trying to find the best way to deal with the user data source. We thought about assigning some similarity score:
- comparing each entry with [Dictionary], calculating a lexical distance (possibly in Hadoop job)
- taking advantage of some search database (e.g. Solr)
We would then associate each user-entered text with a [Dictionary] entry based on that score.
What we're wondering is whether anyone has gone through a similar "classification" exercise and could share any tips.
Thanks,
Piotr
I'd use simple Levenshtein distance (http://en.wikipedia.org/wiki/Levenshtein_distance).
A few million entries: you should be able to process them easily on one computer (no Hadoop or other heavyweight tools).
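As a rough sketch of what that could look like in plain Python: the `levenshtein` function below is the standard two-row dynamic-programming version, while `closest_company` and its `max_ratio` cut-off are hypothetical helpers invented for this example, and the 0.3 threshold is an arbitrary assumption you would need to tune. The dictionary is assumed to be non-empty.

```python
def levenshtein(a, b):
    """Edit distance between two strings, computed row by row."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def closest_company(entry, dictionary, max_ratio=0.3):
    """Return the nearest dictionary name, or None if nothing is close enough."""
    key = entry.strip().lower()
    best = min(dictionary, key=lambda name: levenshtein(key, name.lower()))
    distance = levenshtein(key, best.lower())
    # Accept the match only if the distance is small relative to the name length.
    if distance <= max_ratio * max(len(key), len(best)):
        return best
    return None
```

Comparing every entry against every dictionary name is quadratic, so for millions of entries you would probably first normalize the text (case, punctuation) and deduplicate exact matches before running the distance computation.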
Closed 9 years ago.
I was wondering if anyone knew of a website that provides a great review of data structures and algorithms. I would like it to be geared specifically towards interview questions about data structures and algorithms. Would implementing all of these data structures be a good thing to review?
Thanks!
This page is a good starting point:
This webpage covers the space and time Big-O complexities of common algorithms used in Computer Science. When preparing for technical interviews in the past, I found myself spending hours crawling the internet putting together the best, average, and worst case complexities for search and sorting algorithms so that I wouldn't be stumped when asked about them. Over the last few years, I've interviewed at several Silicon Valley startups, and also some bigger companies, like Yahoo, eBay, LinkedIn, and Google, and each time that I prepared for an interview, I thought to myself "Why oh why hasn't someone created a nice Big-O cheat sheet?". So, to save all of you fine folks a ton of time, I went ahead and created one.
Closed 9 years ago.
I am a newbie to MATLAB, and I want to learn more about methods for comparing two images to measure the similarity between them.
I need more information from international journals, conference proceedings, books, or other reports that describe this.
I will use it for my literature study.
Are there any suggestions for journals, books, or proceedings that discuss this? If so, please include the title and a link.
Thank you for your attention.
For journals I would recommend the IEEE Transactions on Image Processing:
http://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=83
This is a good general intro from MIT:
http://www.mit.edu/~ka21369/Imaging2012/tannenbaum.pdf
You need to define "similarity" better.
In the image compression sense, similarity is a function of the pixel-wise difference between the images (PSNR, and other metrics).
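To make the pixel-wise notion concrete, here is a minimal PSNR sketch in Python (the question mentions MATLAB, but the formula is the same). It assumes two non-empty grayscale images of equal size, given as nested lists of pixel values, with a maximum pixel value of 255. The function name `psnr` and this input layout are assumptions for the example.

```python
import math


def psnr(image_a, image_b, max_value=255):
    """Peak signal-to-noise ratio in dB between two equally sized grayscale images."""
    if len(image_a) != len(image_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(image_a, image_b)
    ):
        raise ValueError("images must have the same dimensions")
    squared_error = 0
    pixel_count = 0
    for row_a, row_b in zip(image_a, image_b):
        for pixel_a, pixel_b in zip(row_a, row_b):
            squared_error += (pixel_a - pixel_b) ** 2
            pixel_count += 1
    mse = squared_error / pixel_count  # mean squared error over all pixels
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)
```

Higher values mean the images are closer pixel by pixel. Identical images give infinity.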
In a computer vision sense, you would want to see if the two images contain similar content such as objects or scenes. I would recommend using Google Scholar for that.
Closed 8 years ago.
I wish to use Google 2-grams for my project, but the size of the data makes searching expensive in terms of both speed and storage.
Is there a web API available for this purpose (in any language)? The website http://books.google.com/ngrams/graph renders an image; can I get the data values?
Well, I found a roundabout way of doing that, using Google BigQuery.
There, trigrams are publicly available. Command-line access did the job for me.
I found a great alternative: Microsoft Web N-Gram
It can be queried in different ways, including a straightforward GET call through the REST interface.
For instance, calling the URL:
http://weblm.research.microsoft.com/weblm/rest.svc/bing-body/apr10/1/jp?u={YOUR_TOKEN}&p=red+panda
returns
-9.005
which is the log likelihood of the phrase red panda.
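As a small sketch, the request URL above can be assembled in Python like this. The helper name `web_ngram_url` is made up for this example, the base path is copied from the URL shown above, and the code only builds the URL; it does not check whether the service is reachable or whether your token is valid.

```python
from urllib.parse import urlencode

# Base path copied from the example URL above (bing-body model, apr10, order 1, "jp" operation).
BASE_URL = "http://weblm.research.microsoft.com/weblm/rest.svc/bing-body/apr10/1/jp"


def web_ngram_url(token, phrase):
    """Build the GET URL for a phrase; spaces in the phrase become '+'."""
    return BASE_URL + "?" + urlencode({"u": token, "p": phrase})
```

You could then fetch that URL with any HTTP client and parse the response body as a float.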
Furthermore, it is handier than Google N-Grams, as for a given phrase it does not simply output its absolute frequency, but it can output its joint probability, conditional probability and even the most likely words that follow.
Disclaimer: I am not a Microsoft employee; I simply think I have found an awesome service.