Hash function for classification [closed]

Given a known set $A$ of distinct numbers in $\{0, 1, \dots, 2^{n+1}-1\}$. In binary, each number is an $(n+1)$-dimensional vector with 0/1 elements. Now, for an arbitrary subset $S \subseteq A$ containing $m$ distinct numbers, is it possible to find a function $f$ such that $f$ maps $S$ onto $\{0, 1, \dots, m-1\}$, while $f(A \setminus S)$ does not fall in $\{0, 1, \dots, m-1\}$? The function $f$ should be as simple as possible; a linear one is preferred. Thanks.

The keyword you're looking for is a minimal perfect hash function, and yes, it's always possible to construct one for a given $S$.
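As a minimal sketch in Python (this uses a lookup table rather than a closed-form linear function, since whether a linear $f$ exists depends on the structure of $S$; it sends every non-member to $m$, which guarantees $f(A \setminus S)$ falls outside $\{0, \dots, m-1\}$):

```python
def build_mphf(S):
    """Build a minimal perfect hash for a known, fixed subset S.
    Members of S map bijectively onto 0..m-1; any other input
    maps to m, which lies outside the required range."""
    table = {x: i for i, x in enumerate(sorted(S))}
    m = len(S)
    def f(x):
        return table.get(x, m)  # non-members fall outside 0..m-1
    return f

f = build_mphf({3, 9, 14})
assert sorted(f(x) for x in (3, 9, 14)) == [0, 1, 2]
assert f(7) not in (0, 1, 2)
```

If you specifically need a compact formula instead of a table, the standard constructions (e.g. CHD or BDZ, as implemented in the CMPH library) trade the explicit table for a small auxiliary structure.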

Related

Why is it so difficult to program a true random number generator? [closed]

I don't understand why a PRNG is easier to program than a true RNG. Shouldn't a typical processor make short work of producing a truly random number?
Computers are deterministic machines: given the same input, code included, they will produce the same result. To get true randomness you need to introduce something from the real world that you can't predict, like the time, cosmic rays, or other physical noise.
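To illustrate, here is a toy linear congruential generator in Python (the constants are the common Numerical Recipes choices). The same seed always reproduces the exact same "random" sequence; unpredictability has to be injected from outside the program:

```python
import time

def lcg(seed, a=1664525, c=1013904223, m=2**32):
    """A tiny linear congruential PRNG. Fully deterministic:
    the same seed always yields the same sequence."""
    state = seed
    while True:
        state = (a * state + c) % m
        yield state

# Same seed -> identical "random" numbers, on every run, on any machine.
g1, g2 = lcg(42), lcg(42)
assert [next(g1) for _ in range(5)] == [next(g2) for _ in range(5)]

# The only escape is to seed from something outside the program,
# e.g. the clock, or better, OS-collected hardware entropy.
unpredictable_seed = time.time_ns()
```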

Cutting rectangular pieces from a rectangular paper and minimizing the wastage [closed]

A rectangular piece of paper of size W*H (width * height) is given, and one is supposed to cut rectangular pieces out of it. A list of k piece sizes is given, each specified as w*h. All the numbers are integers.
Every cut must go from one edge of the paper to the other.
Any number of pieces of each listed size may be cut (including none).
The aim is to use as much paper as possible, i.e. to minimize wastage.
Can anyone suggest how to approach this problem?
This is your typical knapsack problem. I will spare you the details here, but you can get more info and ideas on how to approach it here:
http://en.wikipedia.org/wiki/Knapsack_problem
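As a concrete starting point, here is a dynamic-programming sketch in the same spirit, assuming integer sizes, edge-to-edge (guillotine) cuts, no piece rotation, and a sheet small enough to tabulate; the function names and structure are illustrative:

```python
from functools import lru_cache

def max_used_area(W, H, pieces):
    """Maximize the paper used in a W x H sheet when every cut must
    run edge to edge (a guillotine cut). pieces is a list of (w, h)
    sizes; unlimited copies of each piece allowed, no rotation."""
    @lru_cache(maxsize=None)
    def best(w, h):
        # A piece that exactly fills this sub-rectangle wastes nothing.
        for pw, ph in pieces:
            if pw == w and ph == h:
                return w * h
        value = 0  # leaving this sub-rectangle as waste is always allowed
        # Try every vertical guillotine cut (x and w - x are symmetric).
        for x in range(1, w // 2 + 1):
            value = max(value, best(x, h) + best(w - x, h))
        # Try every horizontal guillotine cut.
        for y in range(1, h // 2 + 1):
            value = max(value, best(w, y) + best(w, h - y))
        return value
    return best(W, H)

print(max_used_area(10, 7, [(3, 2), (5, 7)]))  # 70: two 5x7 pieces tile the sheet exactly
```

Recovering the actual cut positions is a matter of storing the argmax alongside each value instead of only the area.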

Robust image hashing algorithm implementation? [closed]

Is there any robust image hashing implementation in any programming language that I can use? At a minimum, it should generate the same hash for images that have been altered in minor ways (resized, rotated, slightly retouched, cropped, etc.).
The best example is Tineye.com. They somehow hash each image and are able to detect duplicate images with minor modifications.
I found some research but no implementations:
http://scholar.google.com/scholar?hl=en&as_sdt=0,10&q=robust+image+hashing
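As a sketch of the simplest technique in that literature, here is an average hash (aHash) in Python, assuming the Pillow library is installed. Note the caveats: aHash survives resizing and mild retouching, but not rotation or heavy cropping, and it is certainly not Tineye's actual (proprietary) algorithm:

```python
from PIL import Image  # assumes Pillow is installed

def average_hash(path, hash_size=8):
    """Average hash (aHash): shrink to hash_size x hash_size grayscale,
    then emit one bit per pixel: 1 if brighter than the mean, else 0.
    Resizing and mild retouching barely move the bits."""
    img = Image.open(path).convert("L").resize((hash_size, hash_size))
    pixels = list(img.getdata())
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (p > mean)
    return bits

def hamming(h1, h2):
    """Count of differing bits; a small distance suggests a near-duplicate."""
    return bin(h1 ^ h2).count("1")
```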

Hash stable to small changes in text [closed]

Is there a hash function that is stable to small changes in text? I'm looking for the opposite of a cryptographic hash, where small changes in the source lead to huge changes in the result.
Something like a perceptual hash for text. Is there such a thing?
Edited: by "small changes in text" I mean changes in punctuation, correction of orthographic/grammatical mistakes, etc. The text itself is an article, like a Wikipedia entry (but it can be much smaller, like 2 or 3 paragraphs).
Bonus points if somebody can point to a Python implementation.
You're looking for locality-sensitive hashing.
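As a sketch of one classic locality-sensitive scheme for text, here is SimHash in Python: similar documents produce fingerprints that differ in only a few bits, so small edits yield small Hamming distances. The whitespace tokenizer is a simplification; real systems use word shingles and term weighting:

```python
import hashlib

def simhash(text, bits=64):
    """SimHash: a locality-sensitive fingerprint for text.
    Each token votes +1/-1 on every bit position according to its
    own hash; the sign of the total decides the fingerprint bit."""
    weights = [0] * bits
    for token in text.lower().split():
        h = int.from_bytes(
            hashlib.md5(token.encode("utf-8")).digest()[:bits // 8], "big")
        for i in range(bits):
            weights[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, w in enumerate(weights):
        if w > 0:
            fingerprint |= 1 << i
    return fingerprint

def hamming(a, b):
    """Differing bits between two fingerprints; small means similar."""
    return bin(a ^ b).count("1")
```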

Best Fuzzy Matching Algorithm? [closed]

What is the best fuzzy matching algorithm (fuzzy logic, n-gram, Levenshtein, Soundex, ...) for processing more than 100,000 records quickly?
I suggest you read the articles by Navarro mentioned in the References section of the Wikipedia article titled Approximate string matching. Basing your decision on actual research is always better than on suggestions from random strangers, especially if performance on a known set of records is important to you.
It depends massively on your data; some fields can be matched better than others. For example, a postcode has a defined format, so it can be compared differently from free-form strings; people can be matched on initials and date of birth, or other combinations. For free-form strings, edit distance is the basic building block, as in the sketch below.
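For reference, here is the classic Levenshtein edit-distance DP in Python, using a rolling row for O(min-side) memory. A caveat (my assumption, not from the thread): for 100,000+ records you would not run this all-pairs; you would first block candidates, e.g. by n-gram or phonetic key, and only compute edit distance within blocks:

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions
    needed to turn string a into string b (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free on match)
            ))
        prev = curr
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # 3
```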
