Similar Sounding Music [closed] - algorithm

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question appears to be off-topic because it lacks sufficient information to diagnose the problem. Describe your problem in more detail or include a minimal example in the question itself.
Closed 8 years ago.
Improve this question
We have a user, with a music library of 100 songs. Out of those he loves 20, he hates 10 and there are 5 he neither hates nor loves. He never listened to the remaining 65.
Question: What kind of algorithm(s) is/are used to scan the remaining 65 songs and find out music the user will like?

Do some research on a product called MusicIP, it had some very clever algorithm fingerprinting technology. It converted the track to WAV and then created a fingerprint, then some clever magic to match songs that were similar.

To suggest new unfamiliar content to a user, the general approach is to use machine learning, specifically collaborative filtering, which is often used for recommender systems. The idea is to use the knowledge of the crowd, and finds people (or groups) that have similar taste to yours, and recommend new items that they tend to like.
An alternative is creating a classification algorithm for like/dislike, but that might require extracting features from each song that will describe the essense of the problem, and that's usually not trivial at all.
Some classification algorithms you might want to try are SVM, Naive Bayes, neural networks, Decision trees and more. The real challenge, as I mentioned would be to find the right features for the problem.

Related

Detecting presence of music in ambient sound [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 8 years ago.
Improve this question
I hoping to create an application that would listen to ambient sound and detect if music is being played. It is not important to identify the music being played; just detecting that some music is being played is enough.
I looked around for existing solutions but couldn't find any. Does anyone know algorithms that I can use to solve this problem? If source code is available, all the better.
I found are a couple of academic papers and implemented solutions suggested in them. But the results I obtained were not satisfactory.
PS:
i) It would be a bonus if the algorithm is not computationally intensive; if algorithm is completely in time-domain that would be wonderful. ii) It is okay if the solution is not very accurate; occasional false-positives are okay.
Under the assumption that music is made of a bunch of chords instead of single pitch (like monophonic MIDI), multiple pitches at the same time (aka, the chords) may be a good candidate to be detected and differentiated from pure noise. Actually there is a very good Harmony Progression Analyser software package in which chords are detected based on a chromagram. Hope it helps.

How to classify a large collection of user entered company names? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 9 years ago.
Improve this question
Our site allows users to enter the company they work for as a free form text entry.
Historically we gathered around a few millions of unique entries. Since we put no constraints we ended up with a lot of variations, typos (e.g. over 1000 distinct entries just for McDonald's)
We realized we could provide our users with a great feature if only we could tie these variations together. We compiled a clean list of companies as a starting point using various online sources [Dictionary]
Now, we're trying to find out a best way to deal with the user data source. We thought about assigning some similarity score:
- comparing each entry with [Dictionary], calculating a lexical distance (possibly in Hadoop job)
- taking advantage of some search database (e.g. Solr)
And associate the user enter text this way.
What we're wondering is did anyone go through similar "classification" exercise and could share any tips?
Thanks,
Piotr
I'd use simple Levenshtein distance (http://en.wikipedia.org/wiki/Levenshtein_distance).
A few millions entries - you should be able to process it easily on one computer (no hadoop, or other heavy-weight tools).

Reviewing Data Structures and Algorithms [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 9 years ago.
Improve this question
I was wondering if anyone had knew of a website that provides a great review of data structures and algorithms. I would like it to specifically geared towards interview questions with regards to data structures and algorithms. Would implementation of all of these data structures be something good to review?
Thanks!
This page is a good starting point:
This webpage covers the space and time Big-O complexities of common algorithms used in Computer Science. When preparing for technical interviews in the past, I found myself spending hours crawling the internet putting together the best, average, and worst case complexities for search and sorting algorithms so that I wouldn't be stumped when asked about them. Over the last few years, I've interviewed at several Silicon Valley startups, and also some bigger companies, like Yahoo, eBay, LinkedIn, and Google, and each time that I prepared for an interview, I thought to msyelf "Why oh why hasn't someone created a nice Big-O cheat sheet?". So, to save all of you fine folks a ton of time, I went ahead and created one.

Is there any online judge for data mining [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 9 years ago.
Improve this question
There are many Online Judges (OJ) for ACM/ICPC questions. And another Online Judge for Interview questions, named Leetcode (http://leetcode.com).
I think these OJs are very useful for us to learn algorithms. Recently, I am going to learn data mining algorithms. Is there any OJ for data mining questions?
Thank you very much.
There is MLcomp, where you can submit an algorithm and it will run it on a number of data sets to judge how well it is doing.
Plus, there is Kaggle, which hosts various classification competitions.
And of course you can do classes at Cousera. These are pretty much low level, but in order to get submission points you need to reproduce the known performance.
In particular the first also allows you to run several standard algorithms such as naive bayes and SVM and see how well they did. Obviously, your own implementation should perform similar then.
Unfortunately, both are pretty much focused on machine learning (i.e. classification and regression). There is very little in the unsupervised domain, clustering and outlier detection. On unlabeled data, things get too hard even to evaluate locally, so doing any kind of online judging is pretty much unsolved. What you can do is largely a one-class classification, or you just strip labels before running the algorithm.

Naive approaches to detecting plagiarism? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 10 years ago.
Improve this question
Let's say you wanted to compare essays by students and see if one of those essays was plagiarized. How would you go about this in a naive manner (i.e. not too complicated approach)? Of course, there are simple ways like comparing the words used in the essays, and complicated ways like using compressing functions, but what are some other ways to check plagiarism without too much complexity/theory?
There are several papers giving several approaches, I recommend reading this
The paper shows an algorithm based on an index structure
built over the entire file collection.
So they say their algorithm can be used to find similar code fragments in a large software system. Before the index is built, all the files in the
collection are tokenized. This is a simple parsing problem, and can be solved in
linear time. For each of the N files in the collection, The output of the tokenizer
for a file F_i is a string of n_i tokens.
here is other paper you could read
Other good algorithm is a scam based algorithm that consists on detecting plagiarism by making comparison on a set of words that are common between test document
and registered document. Our plagiarism detection system, like many Information Retrieval systems, is evaluated with metrics of precision and recall.
You could take a look at Dick Grune's similarity comparator, which claims to work on natural language texts as well (I've only tried it on software). The algorithms are described as well. (By the way, his book on parsing is really good, in my opinion.)

Resources