Best Fuzzy Matching Algorithm? [closed] - fuzzy-search

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 11 years ago.
What is the best fuzzy matching algorithm (fuzzy logic, n-gram, Levenshtein, Soundex, ...) for processing more than 100,000 records quickly?

I suggest you read the articles by Navarro listed in the References section of the Wikipedia article titled "Approximate string matching".
Basing your decision on actual research is always better than relying on suggestions from random strangers, especially if performance on a known set of records is important to you.
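As a concrete entry point into the approximate string matching literature, here is a minimal sketch of Levenshtein (edit) distance using the standard dynamic-programming recurrence, keeping only two rows in memory. The function name is illustrative. Note that it costs O(len(a) * len(b)) per pair, so comparing every pair among 100,000 records this way would be far too slow on its own; it is meant to be combined with some way of limiting which pairs get compared.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `a` into `b`."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop (smaller rows)
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))  # 3
```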

It massively depends on your data. Some records can be matched better than others. For example, a postcode has a defined format, so it can be compared differently from ordinary strings. People can be matched on initials and date of birth (DOB), or on other combinations of fields.
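A minimal sketch of that data-dependent approach, assuming hypothetical records with `name`, `dob` and `postcode` fields. Records are first grouped ("blocked") by initials plus DOB, so only pairs inside the same block are compared at all, which is what keeps 100,000 records tractable. Postcodes are then normalised (whitespace removed, uppercased) before an exact comparison. The field names, the blocking key and the UK-style postcodes are assumptions for illustration, not a recommended schema.

```python
from collections import defaultdict
from itertools import combinations


def normalise_postcode(postcode):
    # Postcodes have a defined format, so compare a canonical form:
    # drop all whitespace and uppercase ("sw1a 1aa" -> "SW1A1AA").
    return "".join(postcode.split()).upper()


def blocking_key(record):
    # Hypothetical blocking key: initials of each name part plus DOB.
    initials = "".join(part[0].upper() for part in record["name"].split())
    return (initials, record["dob"])


def candidate_pairs(records):
    # Only records that share a blocking key are ever paired up.
    blocks = defaultdict(list)
    for idx, record in enumerate(records):
        blocks[blocking_key(record)].append(idx)
    for ids in blocks.values():
        yield from combinations(ids, 2)


def likely_same(a, b):
    return normalise_postcode(a["postcode"]) == normalise_postcode(b["postcode"])


if __name__ == "__main__":
    records = [
        {"name": "John Smith", "dob": "1980-01-02", "postcode": "sw1a 1aa"},
        {"name": "Jon Smyth", "dob": "1980-01-02", "postcode": "SW1A1AA"},
        {"name": "Jane Doe", "dob": "1990-05-06", "postcode": "EC1A 1BB"},
    ]
    matches = [(i, j) for i, j in candidate_pairs(records)
               if likely_same(records[i], records[j])]
    print(matches)  # [(0, 1)]
```

A string-distance measure can be applied inside each block where an exact comparison is too strict; the blocking step is what limits how many such comparisons are needed.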

Related

What are the best sites to learn about formal languages, automata, algorithms and data structure? [closed]

Closed 9 years ago.
I'd like to know the best sites for learning about formal languages, automata, algorithms and data structures, preferably with many solved questions.
Thanks in advance.
I'd recommend a book on automata theory: http://www.amazon.com/Introduction-Automata-Languages-Computation-Edition/dp/0321455363
I have read this book, and it is superb.
Visit http://rosettacode.org/wiki/Rosetta_Code
You can also compare how programs are structured by looking at its examples.
You didn't mention what kind of algorithms you want to learn. Anyway, for basic algorithms and data structures, TopCoder's algorithm tutorials page is a good place to start. Visit http://www.topcoder.com/tc?d1=tutorials&d2=alg_index&module=Static

Which data structure should be used while storing large number of data, but not any RDBMS? [closed]

Closed 11 years ago.
This question was asked in an interview. First, I came up with a B-tree. The interviewer asked me to be more specific and to describe how I would store the data so that it would be easier to retrieve.
Can you please shed some light on this? Thanks in advance.
Your question isn't really clear.
"Good" ways to store the data depend on what you want to do with it.
If you want to access parts of your data, a list of offsets suffices. If you want to search in text, an additional inverted index combined with a docId -> offsets mapping works great. If your data is updated frequently and read rarely, none of those make sense. So it really depends.
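A minimal in-memory sketch of the offsets-plus-inverted-index layout described above: documents are appended to a single byte buffer, a `docId -> (start, length)` map gives direct access to any document, and a `term -> docIds` map supports search. The `TextStore` class, its method names and its naive whitespace tokenisation are hypothetical. As the answer notes, this append-only layout suits read-heavy data; it does not handle updating or deleting documents.

```python
from collections import defaultdict


class TextStore:
    """Hypothetical append-only store: one byte buffer plus two indexes."""

    def __init__(self):
        self._blob = bytearray()          # all documents, back to back
        self._offsets = {}                # docId -> (start, length) in _blob
        self._index = defaultdict(set)    # term -> {docId, ...}

    def add(self, doc_id, text):
        data = text.encode("utf-8")
        self._offsets[doc_id] = (len(self._blob), len(data))
        self._blob.extend(data)
        for term in text.lower().split():  # naive whitespace tokenisation
            self._index[term].add(doc_id)

    def get(self, doc_id):
        # Direct access via the offsets list: no scanning needed.
        start, length = self._offsets[doc_id]
        return self._blob[start:start + length].decode("utf-8")

    def search(self, term):
        # .get() avoids inserting empty entries into the defaultdict.
        return sorted(self._index.get(term.lower(), ()))


if __name__ == "__main__":
    store = TextStore()
    store.add(1, "hello world")
    store.add(2, "hello there")
    print(store.search("hello"))  # [1, 2]
    print(store.get(2))           # hello there
```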
Sounds like an open question, so you can demonstrate your vast experience of ... well, http://en.wikipedia.org/wiki/NoSQL would be my guess, but you could argue that http://en.wikipedia.org/wiki/Dbm answers the question.
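For the dbm suggestion, Python's standard-library `dbm` module provides a simple on-disk key-value store. A minimal sketch, assuming a scratch file path (the helper name `store_and_fetch` is hypothetical). Keys and values are stored as bytes, and which backend (`dbm.gnu`, `dbm.ndbm`, `dbm.dumb`, ...) is used depends on how Python was built.

```python
import dbm
import os
import tempfile


def store_and_fetch(path):
    # "c": open for reading and writing, creating the database if needed.
    with dbm.open(path, "c") as db:
        db[b"user:1"] = b"alice"
        db["user:2"] = "bob"  # str keys/values are encoded to bytes on store
    # Reopen read-only; values always come back as bytes.
    with dbm.open(path, "r") as db:
        return db[b"user:1"], db[b"user:2"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(store_and_fetch(os.path.join(tmp, "records")))
```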

Algorithms: where can I find algorithm resources (scheduling algorithms and so on)? [closed]

Closed 10 years ago.
I wonder whether there is a place that provides many algorithms.
I want to know the details of some process scheduling algorithms.
For example, if I want information about networking, I check the RFC documents. I want to know whether there is something like the RFCs for operating-system algorithms.
Furthermore, is there a place where I can read about many algorithms from many fields? I think reading algorithms from many fields would help me a lot; someday I might even combine two algorithms to solve a particular problem.
Thanks.
How about this: List of Algorithms. You can also study Donald Knuth's The Art of Computer Programming, Vols. 1-4.
Wikipedia has lots of them. I don't think there is any organization that provides algorithms for operating systems.
Wikipedia holds a lot of algorithms.
Use the "See also" section there.

metrics for algorithms [closed]

Closed 10 years ago.
Can anyone provide a complete list of metrics for rating an algorithm?
For example, my list starts with:
elegance
readability
computational efficiency
space efficiency
correctness
This list is not in any particular order, and I suspect it isn't nearly complete. Can anyone provide a more complete list?
An exhaustive list may be difficult to put in a concise answer, since some important qualities will only apply to a subset of algorithms, like "levels of security offered by an encryption system for particular key sizes".
In any case, I'm interested to see more additions people might have. Here are a few:
optimal (mathematically proven to be the best)
accuracy / precision (heuristics)
any bounds on best, worst, average-case
any pathological cases? (asymptotes for chosen bad data, or encryption systems which do poorly for particular "weak" keys)
safety margin (encryption systems are breakable given enough time and resources)
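To make "bounds on best, worst, average-case" and "pathological cases" concrete, here is a small sketch that counts key comparisons in insertion sort, chosen purely as an illustration. Already-sorted input costs n - 1 comparisons, while reverse-sorted input of distinct values costs n(n - 1)/2. Counting an abstract operation like this measures computational efficiency independently of hardware.

```python
def insertion_sort_comparisons(values):
    """Return (sorted list, number of key comparisons performed)."""
    a = list(values)
    comparisons = 0
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0:
            comparisons += 1          # one comparison of a[j] against key
            if a[j] <= key:
                break                 # key is already in place
            a[j + 1] = a[j]           # shift the larger element right
            j -= 1
        a[j + 1] = key
    return a, comparisons


if __name__ == "__main__":
    n = 10
    print(insertion_sort_comparisons(range(n))[1])         # best case: 9
    print(insertion_sort_comparisons(range(n, 0, -1))[1])  # worst case: 45
```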

Hash stable to small changes in text [closed]

Closed 10 years ago.
Is there a hash function that is stable to small changes in text? I'm looking for the opposite of a cryptographic hash, where small changes in the source lead to huge changes in the result.
Something like a perceptual hash for text. Is there such a thing?
Edited: by "small changes in text" I mean changes in punctuation, corrections of orthographic or grammatical mistakes, etc. The text itself is an article, like a Wikipedia entry (but it can be much shorter, like 2 or 3 paragraphs).
Bonus points if somebody can point to a Python implementation.
You're looking for locality-sensitive hashing (LSH).
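For the Python bonus, a minimal sketch of SimHash, one locality-sensitive hashing scheme (MinHash is a common alternative). The text is normalised (lowercased, punctuation dropped), split into overlapping word 3-grams, and each shingle's hash votes on each of 64 bits; similar texts tend to get fingerprints with a small Hamming distance. MD5 is used only as a convenient, stable hash, and the shingle size and bit width are assumptions to tune, not fixed parts of the technique.

```python
import hashlib
import re


def _tokens(text):
    # Lowercase and keep only word characters, so punctuation changes
    # do not affect the token stream at all.
    return re.findall(r"\w+", text.lower())


def _shingles(tokens, k=3):
    # Overlapping word k-grams; very short texts become a single shingle.
    if len(tokens) < k:
        return [" ".join(tokens)] if tokens else []
    return [" ".join(tokens[i:i + k]) for i in range(len(tokens) - k + 1)]


def simhash(text, bits=64):
    counts = [0] * bits
    for shingle in _shingles(_tokens(text)):
        digest = hashlib.md5(shingle.encode("utf-8")).digest()
        h = int.from_bytes(digest[:bits // 8], "big")
        for i in range(bits):
            # Each shingle votes +1 or -1 on every bit position.
            counts[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i in range(bits):
        if counts[i] > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a, b):
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    a = "The quick brown fox jumps over the lazy dog."
    b = "The quick brown fox, jumps over the lazy dog!"
    print(hamming(simhash(a), simhash(b)))  # 0: punctuation is stripped first
```

A spelling or grammar fix changes only the shingles that contain the corrected word, so the distance is usually small but not guaranteed to be; the cutoff for "near-duplicate" has to be tuned on your own data.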
