What machine learning algorithm to use for face matching? [closed] - algorithm

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 5 years ago.
Improve this question
I want your opinion on what would be the best machine learning algorithm, or even better, library, to use if I wanted to match two faces that look similar. Kind of like how google photos can put photos of the same people in their own album automatically. What's the best way to tackle this?

Face-based user identification isn't only one algorithm, it's a whole process that's still in active research. One way to experiment with it, is to follow these four steps:
Face region detection/extraction using the Histogram of Oriented Gradients (HOG)
Centralize eyes and lips using face landmark estimation
Image encoding using a CNN Model (openface for instance)
Image Search using a voisinage algorithm (KNN, LSHForest, etc)
Here's a blog article that draws a nice walk through the steps required to do face-based user identification:
machine learning is fun part 4 modern face recognition with deep learning

Related

What's a good data structure to store and work with a thesaurus? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 2 years ago.
Improve this question
I've been working for a few years on an English-language thesaurus project, which combines a few sources (e.g. WordNet, Wiktionary Thesaurus, Moby Thesaurus, Word2vec) to make a large thesaurus. Currently I have the data defined as a list of lists. And each link has a score (higher = stronger), so "hotel" and "inn" might have a score of 2.0; but "hotel" and "fleabag" has a score of 0.2. High scores are near synonyms, low scores are more distant associations. I've been able to use Dijkstra and A* to find links between words (so-called "synonym chains").
Is there a type of graph database and/or analysis tools which is ideally suited for this sort of data? Word relationship strengths are often asymmetric. For example "Hoover Dam" links to "Herbert Hoover" more strongly than "Herbert Hoover" links back to "Hoover Dam". I'm interested in better ways to find the links between words, find unrelated words, measure word similarity.
I'd appreciate any new pointers/direction.
Interesting question. Not sure about the best data structure, but for processing, you can look at shell neighbors within this package: https://grispy.readthedocs.io/en/latest/api.html

Find duplicate images algorithm [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 5 years ago.
Improve this question
I want to create a program that find the duplicate images into a directory, something like this app does and I wonder what would be the algorithm to determine if two images are the same.
Any suggestion is welcome.
This task can be solved by perceptual-hashing, depending on your use-case, combined with some data-structure responsible for nearest-neighbor search in high-dimensions (kd-tree, ball-tree, ...) which can replace the brute-force search (somewhat).
There are tons of approaches for images: DCT-based, Wavelet-based, Statistics-based, Feature-based, CNNs (and more).
Their designs are usually based on different assumptions about the task, e.g. rotation allowed or not?
A google scholar search on perceptual image hashing will list a lot of papers. You can also look for the term image fingerprinting.
Here is some older ugly python/cython code doing the statistics-based approach.
Remark: Digikam can do that for you too. It's using some older Haar-wavelet based approach i think.

Is there some reliable way of detecting fake Facebook profiles [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 9 years ago.
Improve this question
I believe this could be interesting for many Facebook developers. Is there some reliable way of detecting fake profiles on Facebook? I am developing some games and applications for Facebook and have some virtual goods for sale. If player wants to play more he can create another profile or many others and play as much as he like. The idea here is to somehow detect this and stop them from doing so.
Best Regards!
Put validation on no. of friends.. if no. of friends < A PARTICULAR THRESHOLD, disallow user, else continue. Well.. That's only an opinion, not a solution.. :)
You can try using anomaly detection.
Make your 'features' number of likes/spam/friends/other relevant features you've found helpful, and use the algorithm to detect the anomalies.
Another approach could be supervised learning - but will require a labeled set of examples of "fake" and "real" users. The 'features' will be similar to these for anomaly detection.
Train your learning algorithm using the labeled set (usually referred as training set), and use the resulting classifier to decide if a new user is fake or not.
Some algorithms you can use are SVM, C4.5, KNN, Naive Bayes.
You can evaluate results for both methods using cross-validation (this requires a training set, of course)
If you want to learn more about machine learning approaches, I recommend taking the webcourse at coursera.

Algorithms under Plagiarism detection machines [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 8 years ago.
Improve this question
I'm very impressed to how plagiarism checkers (such as Turnitin website ) works. But how do they do that ? In a very effective way, I'm new to this area thus is there any word matching algorithm or anything that is similar to that is used for detecting alike sentences?
Thank you very much.
I'm sure many real-world plagiarism detection systems use more sophisticated schemes, but the general class of problem of detecting how far apart two things are is called the edit distance. That link includes links to many common algorithms used for this purpose. The gist is effectively answering the question "How many edits must I perform to turn one input into the other?". The challenge for real-world systems is performing this across a large corpus in an efficient manner. A related problem is the longest common subsequence, which might also be useful for such schemes to identify passages that are copied verbatim.

Google similar images algorithm [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 9 years ago.
Improve this question
Does any one have an idea regarding what sort of algorithm might Google be using to find similar images ?
No, but they could be using SIFT.
I'm not sure this has much to do with image processing. When I ask for "similar images" of the Eiffel tower, I get a bunch of photos of Paris Hilton, and street maps from Paris. Curiously, all of these images have the word "Paris" in the file name.
Currently the Google Image Search provides these filtering options:
Image size
Face detection
Continuous-tone ("Photo") vs. Smooth shading ("Clipart") vs. bitonal("Line drawing")
Color histogram
These options can be seen in its Image Search Result page.
I don't know about faces, but see at least:
http://www.incm.cnrs-mrs.fr/LaurentPerrinet/Publications/Perrinet08spie
Compare two images the python/linux way
I have heard, that one should use this when comparing images
(I mean: make the prob model, calc. the probs, use this):
http://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence
Or then it might even be one of those PCFG things that MIT people tend to use with robotics stuff. One I read used some sort of PCFG model made of basic shapes (that you can rotate magically) and searched the best match with
http://en.wikipedia.org/wiki/Inside-outside_algorithm

Resources