Find the pair of bitstrings with the largest number of common set bits - algorithm

I want an algorithm that finds the pair of bitstrings in an array with the largest number of common set bits (among all pairs in the array). I know it is possible to do this by comparing all pairs of bitstrings in the array, but this is O(n²). Is there a more efficient algorithm? Ideally, I would like the algorithm to work incrementally, processing one incoming bitstring in each iteration.
For example, suppose we have this array of bitstrings (of length 8):
B1:01010001
B2:01101010
B3:01101010
B4:11001010
B5:00110001
The best pair here is B2 and B3, which have four common set bits.
I found a paper that appears to describe such an algorithm (S. Taylor & T. Drummond (2011); "Binary Histogrammed Intensity Patches for Efficient and Robust Matching"; Int. J. Comput. Vis. 94:241–265), but I don't understand this description from page 252:
This can be incrementally updated in each iteration as the only [bitstring] overlaps that need recomputing are those for the new parent feature and any other [bitstrings] in the root whose “most overlapping feature” was one of the two selected for combination. This avoids the need for the O(N²) overlap comparison in every iteration and allows a forest for a typically-sized database of 700 features to be built in under a second.

As far as I can tell, Taylor & Drummond (2011) do not purport to give an O(n) algorithm for finding the pair of bitstrings in an array with the largest number of common set bits. They sketch an argument that a record of the best such pairs can be updated in O(n) after a new bitstring has been added to the array (and two old bitstrings removed).
Certainly the explanation of the algorithm on page 252 is not very clear, and I think their sketch argument that the record can be updated in O(n) is incomplete at best, so I can see why you are confused.
Anyway, here's my best attempt to explain Algorithm 1 from the paper.
Algorithm
The algorithm takes an array of bitstrings and constructs a lookup tree. A lookup tree is a binary forest (set of binary trees) whose leaves are the original bitstrings from the array, whose internal nodes are new bitstrings, and where if node A is a parent of node B, then A & B = A (that is, all the set bits in A are also set in B).
For example, if the input is the array of bitstrings B1, ..., B5 from the question, then the output is a lookup tree consisting of two trees: B2 and B3 sit under the parent 01101010, which in turn sits with B4 under the root 01001010, while B1 and B5 sit under the root 00010001.
The algorithm as described in the paper proceeds as follows:
1. Let R be the initial set of bitstrings (the root set).
2. For each bitstring f1 in R that has no partner in R, find and record its partner (the bitstring f2 in R − {f1} which has the largest number of set bits in common with f1) and record the number of bits they have in common.
3. If there is no pair of bitstrings in R with any common set bits, stop.
4. Let f1 and f2 be the pair of bitstrings in R with the largest number of common set bits.
5. Let p = f1 & f2 be the parent of f1 and f2.
6. Remove f1 and f2 from R; add p to R.
7. Go to step 2.
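Here is a minimal Python sketch of these steps (my own code, not the paper's). Bitstrings are Python ints, the popcount of f1 & f2 gives the number of common set bits, and the partner bookkeeping of step 2 is replaced by a plain scan over all pairs, so this is the slow O(n³) variant analysed below.
def build_lookup_tree(bitstrings):
    # R maps a node id to its bitstring; children records the forest structure.
    R = {i: b for i, b in enumerate(bitstrings)}
    children = {}
    next_id = len(bitstrings)
    while True:
        # Step 2, done naively: find the pair in R with the most common set bits.
        best = None
        nodes = list(R.items())
        for i in range(len(nodes)):
            for j in range(i + 1, len(nodes)):
                (id1, b1), (id2, b2) = nodes[i], nodes[j]
                common = bin(b1 & b2).count('1')
                if common and (best is None or common > best[0]):
                    best = (common, id1, id2)
        if best is None:                 # step 3: no pair shares a set bit
            return R, children           # R is now the root set of the forest
        _, id1, id2 = best               # steps 4-6: replace the pair by its parent
        parent = R[id1] & R[id2]
        del R[id1], R[id2]
        R[next_id] = parent
        children[next_id] = (id1, id2)
        next_id += 1
On the example above, build_lookup_tree([0b01010001, 0b01101010, 0b01101010, 0b11001010, 0b00110001]) merges B2 and B3 first (four common bits), as claimed in the question.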
Analysis
Suppose that the array contains n bitstrings of fixed length. Then the algorithm as described is O(n³): step 2 is O(n²), and there are O(n) iterations, since each iteration removes two bitstrings from R and adds one.
The paper contains an argument that step 2 is Ω(n²) only on the first time around the loop, and on other iterations it is O(n) because we only have to find the partner of p "and any other bitstrings in R whose partner was one of the two selected for combination." However, this argument is not convincing to me: it is not clear that there are only O(1) other such bitstrings. (Maybe there's a better argument?)
We could bring the algorithm down to O(n²) by storing the number of common set bits between every pair of bitstrings. This requires O(n²) extra space.
Reference
S. Taylor & T. Drummond (2011). "Binary Histogrammed Intensity Patches for Efficient and Robust Matching". Int. J. Comput. Vis. 94:241–265.

Well, for each bit position you could maintain two sets: the bitstrings with that bit set and the bitstrings with it clear. The sets could be stored in two binary trees, for example.
Then you perform set intersections: first over all eight bit positions, then over every combination of 7, and so on, until you find an intersection with at least two elements.
The complexity here grows exponentially in the bit length, but if it is small and fixed this isn't a problem.
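A rough Python sketch of that idea (my own code; plain Python sets stand in for the binary trees): keep, for every bit position, the set of bitstrings that have that bit set, then intersect the per-position sets for every combination of k positions, with k going from 8 down to 1, and stop at the first combination whose intersection contains at least two bitstrings.
from itertools import combinations
def best_pair_by_bit_sets(bitstrings, width=8):
    # on[p] = indices of the bitstrings whose bit p is set
    on = {p: {i for i, b in enumerate(bitstrings) if (b >> p) & 1}
          for p in range(width)}
    for k in range(width, 0, -1):                 # try 8 common bits, then 7, ...
        for positions in combinations(range(width), k):
            shared = set.intersection(*(on[p] for p in positions))
            if len(shared) >= 2:
                i, j = sorted(shared)[:2]
                return i, j, k                    # a best pair and its number of common set bits
    return None                                   # no two bitstrings share any set bit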
Another way to do it might be to look at the n k-bit strings as n points in a kD space, and your task is to find the two points closest together. There are a number of geometric algorithms to do this.

Related

Search in locality sensitive hashing

I'm trying to understand section 5 of this paper about LSH, in particular how to bucket the generated hashes. Quoting the linked paper:
Given bit vectors consisting of d bits each, we choose N = O(n^(1/(1+epsilon))) random permutations of the bits. For each random permutation σ, we maintain a sorted order O_σ of the bit vectors, in lexicographic order of the bits permuted by σ. Given a query bit vector q, we find the approximate nearest neighbor by doing the following: For each permutation σ, we perform a binary search on O_σ to locate the two bit vectors closest to q (in the lexicographic order obtained by bits permuted by σ). We now search in each of the sorted orders O_σ, examining elements above and below the position returned by the binary search in order of the length of the longest prefix that matches q. This can be done by maintaining two pointers for each sorted order O_σ (one moves up and the other down). At each step we move one of the pointers up or down corresponding to the element with the longest matching prefix. (Here the length of the longest matching prefix in O_σ is computed relative to q with its bits permuted by σ.) We examine 2N = O(n^(1/(1+epsilon))) bit vectors in this way. Of all the bit vectors examined, we return the one that has the smallest Hamming distance to q.
I'm confused by this algorithm and I don't think I understand how it works.
I have already found this question on the topic, but I didn't understand the answer in the comments. Also, in this question, in point 2, the same algorithm is described, but again I don't understand how it works.
Can you please try to explain to me how it works, step by step, as simply as possible?
I even tried to make a list of things that I don't understand, but in practice it is so badly written that I don't understand most of the sentences!
EDIT after gsamaras' answer:
I mostly understood the answer, but I still have some doubts:
Is it correct to say that the total cost of performing the N permutations is O(N·n·log n), since we have to sort each one of them?
Is the permutation+sorting process described above performed only once during pre-processing, or for every query q? It already seems pretty expensive, O(N·n·log n), even in pre-processing; if we have to do it at query time it's a disaster :D
At the last point, where we compare v0 and v4 to q, we compare their permuted version or the original one (before their permutation)?
This question is somewhat broad, so I am just going to give a minimal (abstract) example here:
We have 6 (= n) vectors in our dataset, with d bits each. Let's assume that we do 2 (= N) random permutations.
Let the 1st random permutation begin! Remember that we permute the bits, not the order of the vectors. After permuting the bits, they maintain an order, for example:
v1
v5
v0
v3
v2
v4
Now the query vector q arrives, but it's (almost certainly) not going to be identical to any vector in our dataset (after the permutation), so we won't find it by binary search.
However, we are going to end up between two vectors. So now we can imagine the scenario to be like this (for example, q lies between v0 and v3):
v1
v5
v0 <-- up pointer
<-- q lies here
v3 <-- down pointer
v2
v4
Now we move either the up pointer or the down pointer, seeking the vector vi that matches q in the most bits. Let's say it was v0.
Similarly, we do the second permutation and find another such vector, say v4. We now compare v0 from the first permutation with v4, to see which one is closer to q, i.e. which one has more bits equal to q.
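To make the whole procedure concrete, here is a toy Python sketch (my own code, not from the paper; n_perms, to_examine and the helper names are made up). Bit vectors are stored as strings of '0'/'1', so lexicographic order of the permuted bits is just ordinary string order.
import random
from bisect import bisect_left
def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))
def prefix_len(a, b):
    n = 0
    while n < len(a) and a[n] == b[n]:
        n += 1
    return n
def build_index(vectors, n_perms, d):
    index = []
    for _ in range(n_perms):
        sigma = random.sample(range(d), d)        # one random permutation of the bit positions
        permuted = sorted((''.join(v[i] for i in sigma), v) for v in vectors)
        index.append((sigma, permuted))           # the sorted order O_sigma
    return index
def query(index, q, to_examine):
    best = None
    for sigma, permuted in index:
        qp = ''.join(q[i] for i in sigma)         # q with its bits permuted by sigma
        pos = bisect_left(permuted, (qp, q))      # binary search for q's slot in O_sigma
        up, down = pos - 1, pos                   # the two pointers
        for _ in range(to_examine):
            take_up = up >= 0 and (down >= len(permuted) or
                                   prefix_len(permuted[up][0], qp) >= prefix_len(permuted[down][0], qp))
            if take_up:
                cand, up = permuted[up][1], up - 1
            elif down < len(permuted):
                cand, down = permuted[down][1], down + 1
            else:
                break                             # ran out of vectors in this sorted order
            if best is None or hamming(cand, q) < hamming(best, q):
                best = cand
    return best                                   # examined vector with the smallest Hamming distance to q
In the toy example above you would build the index with N = 2 permutations and then call query(index, q, to_examine) to get back the candidate closest to q in Hamming distance.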
Edit:
Is it correct to say that the total cost of performing the N permutations is O(N·n·log n), since we have to sort each one of them?
If they actually sort every permutation from scratch, then yes, but it's not clear to me how they do it.
Is the permutation+sorting process described above performed only once during pre-processing, or for every query q?
ONCE.
At the last point, where we compare v0 and v4 to q, we compare their permuted version or the original one (before their permutation)?
I think they do it with the permuted version (see the parentheses before 2N in the paper). But that wouldn't make any difference, since they permute q too with the same permutation σ.
This Quora answer may shed some light too.

Ordered lattice point enumeration

Setup: Let e₁, ..., eₙ be an orthogonal basis for n-dimensional Euclidean space, but suppose that each eᵢ has irrational (L1) norm. Let L be the set of points obtained by taking linear combinations of the eᵢ with coefficients in the natural numbers (including zero). Now order the points in L first by their L1-norm and then lexicographically.
Question: Is there an efficient algorithm for producing the points in L in increasing order up to some pre-defined bound? Note that I do not want to produce the points and then sort them, rather I want to walk the lattice in order.
Observation: This is easy to do if the ei are an orthonormal basis. For instance, this problem is solved here. In principle something similar would work here, however determining the radii to iterate over is almost as hard as solving the enumeration problem, so it isn't very useful.
How about this:
Let L₁ be the list of visited/processed lattice vectors and let L₂ be a list of lists of vectors that will be visited next.
1. Set L₁ = {} and L₂ = {[0]}, where 0 is the zero vector.
2. Let v be the smallest vector of the first list in L₂.
3. Visit/process the vector v.
4. Add the list L = [v+e₁, ..., v+eₙ] to L₂, keeping the lists sorted by their smallest element. Only generate v+eᵢ as long as its norm is smaller than your predefined bound.
5. Insert v at the end of L₁ and remove it from the front of the first list in L₂.
6. If that first list is now empty, remove it from L₂. If not, move it to its correct place.
7. If L₂ is not empty, go to step 2.
This algorithm requires the eᵢ to be sorted by norm, from smallest to largest.
This algorithm adds at most n vectors to L₂ per round. Let B be your predefined upper bound; then there are at most n^(k-1) vectors you are going to visit, where k = 1 + B/||e₁||. For roughly the first n^(k') rounds the lists have size n, where k' = B/||eₙ||. So in total you have to store fewer than N = n^(k') + n^(k-1)/(n^(k')+1) lists. You can generate a new list in O(n) and place it in L₂ in O(log N) (binary search for the correct place and link it in there).
So the overall complexity would be something like O(N⋅n⋅log N), but notice that N is about the number of vectors you are looking for.
Notice: most likely there is a faster algorithm, but this is something you can try.
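For what it's worth, here is a small Python sketch of the same idea (my own code). Instead of the list-of-lists bookkeeping above it uses a heap keyed by (L1-norm, coefficient vector), which yields the points in the same order: by norm first, then lexicographically by coefficients. The basis norms are passed in as plain floats, so for truly irrational ||eᵢ|| you would want exact arithmetic in practice.
import heapq
def enumerate_lattice(norms, bound):
    # norms[i] = ||e_i||; a lattice point is a tuple of natural-number coefficients.
    n = len(norms)
    start = (0,) * n
    heap = [(0.0, start)]                  # (L1 norm, coefficient vector)
    seen = {start}
    while heap:
        norm, v = heapq.heappop(heap)      # smallest unvisited point
        yield norm, v
        for i in range(n):                 # push the n successors v + e_i
            w = v[:i] + (v[i] + 1,) + v[i + 1:]
            new_norm = norm + norms[i]
            if new_norm <= bound and w not in seen:
                seen.add(w)
                heapq.heappush(heap, (new_norm, w))
For example, for norm, point in enumerate_lattice([2**0.5, 3**0.5], 10.0): ... walks a 2-dimensional instance in increasing order of norm.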

How to code the maximum set packing algorithm?

Suppose we have a finite set S and a list of subsets of S. Then, the set packing problem asks if some k subsets in the list are pairwise disjoint.
The optimization version of the problem, maximum set packing, asks for the maximum number of pairwise disjoint sets in the list.
http://en.wikipedia.org/wiki/Set_packing
So, let S = {1,2,3,4,5,6,7,8,9,10}
and `Sa = {1,2,3,4}`
and `Sb = {4,5,6}`
and `Sc = {5,6,7,8}`
and `Sd = {9,10}`
Then the maximum number of pairwise disjoint sets is 3 (Sa, Sc, Sd).
I could not find any articles about the algorithm involved. Can you shed some light on the same?
My approach:
Sort the sets by size and start from the smallest set. If no element of the next set intersects with the sets chosen so far, add it and increase the count of chosen sets. Does this sound good to you? Any better ideas?
As hivert pointed out, this problem is NP-hard, so there's no efficient way to do this. However, if your input is relatively small, you can still pull it off. Exponential doesn't mean impossible, after all. It's just that exponential problems become impractical very quickly, as the input size grows. But for something like 25 sets, you can easily brute force it.
Here's one approach. Let's say you have n subsets, called S0, S1, ..., etc. We can try every combination of subsets and pick the largest combination whose members are pairwise disjoint. For n = 25 there are only 2^25 = 33554432 choices, so this is probably reasonable enough.
An easy way to do this is to notice that any non-negative number strictly below 2^N represents a particular choice of subsets. Look at the binary representation of the number, and choose the sets whose indices correspond to the bits that are on. So if the number is 11, the 0th, 1st and 3rd bits are on, and this corresponds to the combination [S0, S1, S3]. Then you just verify that these three sets are in fact disjoint.
Your procedure is as follows:
Iterate i from 0 to 2^N - 1
For each value of i, use the bits that are on to figure out the corresponding combination of subsets.
If those subsets are pairwise disjoint, update your best answer with this combination (i.e., use this if it is bigger than your current best).
Alternatively, use backtracking to generate your subsets. The two approaches are equivalent, modulo implementation tradeoffs. Backtracking will have some stack overhead, but can cut off entire lines of computation if you check disjointness as you go. For example, if S1 and S2 are not disjoint, then it will never bother with any bigger combinations containing those two, saving some time. The iterative method can't optimize itself in this way, but is fast and efficient because of the bitwise operations and tight loop.
The only nontrivial matter here is how to check if the subsets are pairwise disjoint. There are all sorts of tricks you can pull here as well, depending on the constraints.
A simple approach is to start with an empty set structure (pick whatever you want from the language of your choice) and add elements from each subset one by one. If you ever hit an element that's already in the set, then it occurs in at least two subsets, and you can give up on this combination.
If the original set S has m elements, and m is relatively small, you can map each of them to the range [0, m-1] and use bitmasks for each set. So if m <= 64, you can use a Java long to represent each subset. Turn on all the bits that correspond to the elements in the subset. This allows blazing fast set operations, because of the speed of bitwise operations. Bitwise AND corresponds to set intersection, and bitwise OR is a union. You can check if two subsets are disjoint by seeing if the intersection is empty (i.e., ANDing the two bitmasks gives you 0).
If you don't have so few elements, you can still avoid repeating the set intersections multiple times. You have very few sets, so precompute which ones are disjoint at the start. You can just store a boolean matrix D, such that D[i][j] = true iff i and j are disjoint. Then you just look up all pairs in a combination to verify pairwise disjointness, rather than doing real set operations.
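Putting those pieces together, here is a small Python sketch of the bitmask brute force (my own code; fine for up to roughly 25 subsets).
def max_set_packing(universe, subsets):
    # Map each element of S to a bit, and each subset to a bitmask.
    bit = {e: k for k, e in enumerate(sorted(universe))}
    masks = [sum(1 << bit[e] for e in sub) for sub in subsets]
    n = len(masks)
    best = []
    for i in range(1 << n):                   # every combination of subsets
        used, ok, chosen = 0, True, []
        for j in range(n):
            if i >> j & 1:
                if used & masks[j]:           # overlaps a set already chosen: not pairwise disjoint
                    ok = False
                    break
                used |= masks[j]
                chosen.append(j)
        if ok and len(chosen) > len(best):
            best = chosen
    return best                               # indices of a largest pairwise-disjoint collection
For the example in the question, max_set_packing(range(1, 11), [{1,2,3,4}, {4,5,6}, {5,6,7,8}, {9,10}]) returns [0, 2, 3], i.e. Sa, Sc and Sd.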
You can solve the set packing problem by searching for a maximum independent set. You encode your problem as follows:
for each set you add a vertex;
you add an edge between two vertices if the corresponding sets share a common number.
Then you want a maximum set of vertices no two of which are adjacent. Unfortunately this is an NP-hard problem; any known algorithm is exponential.
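For what it's worth, the encoding itself is only a few lines in Python (my own sketch); a maximum set packing then corresponds to a maximum independent set of this graph, which you can hand to any exact or heuristic solver.
from itertools import combinations
def conflict_graph(subsets):
    # One vertex per subset; an edge whenever two subsets share an element.
    return {(i, j) for i, j in combinations(range(len(subsets)), 2)
            if set(subsets[i]) & set(subsets[j])}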

Generate all subset sums within a range faster than O((k+N) * 2^(N/2))?

Is there a way to generate all of the subset sums s1, s2, ..., sk that fall in a range [A,B] faster than O((k+N)·2^(N/2)), where k is the number of sums there are in [A,B]? Note that k is only known after we have enumerated all subset sums within [A,B].
I'm currently using a modified Horowitz-Sahni algorithm. For example, I first call it for the smallest sum greater than or equal to A, giving me s1. Then I call it again for the next smallest sum greater than s1, giving me s2. I repeat this until we find a sum sk+1 greater than B. There is a lot of computation repeated between each iteration, even without rebuilding the two initial 2^(N/2)-element lists, so is there a way to do better?
In my problem, N is about 15, and the magnitude of the numbers is on the order of millions, so I haven't considered the dynamic programming route.
Check the subset sum problem on Wikipedia. As far as I know, that's the fastest known algorithm, and it operates in O(2^(N/2)) time.
Edit:
If you're looking for multiple possible sums, instead of just 0, you can save the end arrays and just iterate through them again (which is roughly an O(2^(N/2)) operation) and save re-computing them. The values of all the possible subsets don't change with the target.
Edit again:
I'm not wholly sure what you want. Are we running K searches for one independent value each, or looking for any subset that has a value in a specific range that is K wide? Or are you trying to approximate the second by using the first?
Edit in response:
Yes, you do get a lot of duplicate work even without rebuilding the list. But if you don't rebuild the list, that's not O(k * N * 2^(N/2)). Building the list is O(N * 2^(N/2)).
If you know A and B right now, you could begin iteration, and then simply not stop when you find the right answer (the bottom bound), but keep going until it goes out of range. That should be roughly the same as solving subset sum for just one solution, involving only +k more ops, and when you're done, you can ditch the list.
More edit:
You have a range of sums, from A to B. First, you solve subset sum problem for A. Then, you just keep iterating and storing the results, until you find the solution for B, at which point you stop. Now you have every sum between A and B in a single run, and it will only cost you one subset sum problem solve plus K operations for K values in the range A to B, which is linear and nice and fast.
s = *i + *j; if s > B then ++i; else if s < A then ++j; else { print s; ... what_goes_here? ... }
No, no, no. I get the source of your confusion now (I misread something), but it's still not as complex as what you had originally. If you want to find ALL combinations within the range, instead of one, you will just have to iterate over all combinations of both lists, which isn't too bad.
Excuse my use of auto. C++0x compiler.
std::vector<int> sums;
std::vector<int> firstlist;
std::vector<int> secondlist;
// Fill in first/secondlist.
std::sort(firstlist.begin(), firstlist.end());
std::sort(secondlist.begin(), secondlist.end());
auto firstit = firstlist.begin();
auto secondit = secondlist.begin();
// Since we want all in a range, rather than just the first, we need to check all combinations. Horowitz/Sahni is only designed to find one.
for(; firstit != firstlist.end(); firstit++) {
for(secondit = secondlist.begin(); secondit != secondlist.end(); secondit++) {
int sum = *firstit + *secondit;
if (sum >= A && sum <= B)
sums.push_back(sum);
}
}
It's still not great. But it could be optimized if you know in advance that N is very large, for example, by mapping or hashmapping sums to iterators, so that any given firstit can find its suitable partners in secondlist, reducing the running time.
It is possible to do this in O(N*2^(N/2)), using ideas similar to Horowitz Sahni, but we try and do some optimizations to reduce the constants in the BigOh.
We do the following
Step 1: Split into sets of N/2, and generate all possible 2^(N/2) sets for each split. Call them S1 and S2. This we can do in O(2^(N/2)) (note: the N factor is missing here, due to an optimization we can do).
Step 2: Next sort the larger of S1 and S2 (say S1) in O(N*2^(N/2)) time (we optimize here by not sorting both).
Step 3: Find Subset sums in range [A,B] in S1 using binary search (as it is sorted).
Step 4: Next, for each sum in S2, use binary search to find the sets in S1 whose sum together with it falls in the range [A,B]. This is O(N*2^(N/2)). At the same time, check whether that sum from S2 is itself in the range [A,B]. The optimization here is to combine the loops. Note: this gives you a representation of the sets (in terms of indices into S1 and S2), not the sets themselves. If you want all the sets, this becomes O(K + N*2^(N/2)), where K is the number of sets.
Further optimizations might be possible; for instance, when the sum from S2 is negative, we don't consider sums < A, etc.
Since Steps 2,3,4 should be pretty clear, I will elaborate further on how to get Step 1 done in O(2^(N/2)) time.
For this, we use the concept of Gray Codes. Gray codes are a sequence of binary bit patterns in which each pattern differs from the previous pattern in exactly one bit.
Example: 00 -> 01 -> 11 -> 10 is a Gray code with 2 bits.
There are Gray codes which go through all possible N/2-bit numbers, and these can be generated iteratively (see the wiki page I linked to) in O(1) time per step (for a total of O(2^(N/2)) steps): given the current bit pattern, we can generate the next bit pattern in O(1) time.
This enables us to form all the subset sums, by using the previous sum and changing that by just adding or subtracting one number (corresponding to the differing bit position) to get the next sum.
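A small Python sketch of that Gray-code walk (my own code): each step flips exactly one bit, so the running sum is updated by adding or subtracting a single element, and all subset sums of one half come out in O(2^(N/2)) total work.
def subset_sums_gray(nums):
    # Returns (gray_code, sum) for every subset of nums, one O(1) update per step.
    out = []
    gray_prev, total = 0, 0
    for i in range(1 << len(nums)):
        gray = i ^ (i >> 1)                                # i-th reflected Gray code
        if i:
            flipped = (gray ^ gray_prev).bit_length() - 1  # the single bit that changed
            total += nums[flipped] if gray >> flipped & 1 else -nums[flipped]
        out.append((gray, total))
        gray_prev = gray
    return out
Run this on each half of the input to get S1 and S2; sorting one of them (step 2) is then the only O(N*2^(N/2)) part.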
If you modify the Horowitz-Sahni algorithm in the right way, then it's hardly slower than the original Horowitz-Sahni. Recall that Horowitz-Sahni works with two lists of subset sums: sums of subsets in the left half of the original list, and sums of subsets in the right half. Call these two lists of sums L and R. To obtain subsets that sum to some fixed value A, you can sort R, and then look up a number in R that matches each number in L using a binary search. However, the algorithm is asymmetric only to save a constant factor in space and time. It's a good idea for this problem to sort both L and R.
In my code below I also reverse L. Then you can keep two pointers into R, updated for each entry in L: A pointer to the last entry in R that's too low, and a pointer to the first entry in R that's too high. When you advance to the next entry in L, each pointer might either move forward or stay put, but they won't have to move backwards. Thus, the second stage of the Horowitz-Sahni algorithm only takes linear time in the data generated in the first stage, plus linear time in the length of the output. Up to a constant factor, you can't do better than that (once you have committed to this meet-in-the-middle algorithm).
Here is a Python code with example input:
# Input
terms = [29371, 108810, 124019, 267363, 298330, 368607,
438140, 453243, 515250, 575143, 695146, 840979, 868052, 999760]
(A,B) = (500000,600000)
# Subset iterator stolen from Sage
def subsets(X):
    yield []; pairs = []
    for x in X:
        pairs.append((2**len(pairs), x))
        for w in range(2**(len(pairs)-1), 2**len(pairs)):
            yield [x for m, x in pairs if m & w]
# Modified Horowitz-Sahni with toolow and toohigh indices
L = sorted([(sum(S), S) for S in subsets(terms[:len(terms)//2])])
R = sorted([(sum(S), S) for S in subsets(terms[len(terms)//2:])])
(toolow, toohigh) = (-1, 0)
for (Lsum, S) in reversed(L):
    while toolow < len(R)-1 and R[toolow+1][0] < A-Lsum: toolow += 1
    while toohigh < len(R) and R[toohigh][0] <= B-Lsum: toohigh += 1
    for n in range(toolow+1, toohigh):
        print('+'.join(map(str, S+R[n][1])), '=', sum(S+R[n][1]))
"Moron" (I think he should change his user name) raises the reasonable issue of optimizing the algorithm a little further by skipping one of the sorts. Actually, because each list L and R is a list of sizes of subsets, you can do a combined generate and sort of each one in linear time! (That is, linear in the lengths of the lists.) L is the union of two lists of sums, those that include the first term, term[0], and those that don't. So actually you should just make one of these halves in sorted form, add a constant, and then do a merge of the two sorted lists. If you apply this idea recursively, you save a logarithmic factor in the time to make a sorted L, i.e., a factor of N in the original variable of the problem. This gives a good reason to sort both lists as you generate them. If you only sort one list, you have some binary searches that could reintroduce that factor of N; at best you have to optimize them somehow.
At first glance, a factor of O(N) could still be there for a different reason: If you want not just the subset sum, but the subset that makes the sum, then it looks like O(N) time and space to store each subset in L and in R. However, there is a data-sharing trick that also gets rid of that factor of O(N). The first step of the trick is to store each subset of the left or right half as a linked list of bits (1 if a term is included, 0 if it is not included). Then, when the list L is doubled in size as in the previous paragraph, the two linked lists for a subset and its partner can be shared, except at the head:
0
|
v
1 -> 1 -> 0 -> ...
Actually, this linked list trick is an artifact of the cost model and never truly helpful. Because, in order to have pointers in a RAM architecture with O(1) cost, you have to define data words with O(log(memory)) bits. But if you have data words of this size, you might as well store each word as a single bit vector rather than with this pointer structure. I.e., if you need less than a gigaword of memory, then you can store each subset in a 32-bit word. If you need more than a gigaword, then you have a 64-bit architecture or an emulation of it (or maybe 48 bits), and you can still store each subset in one word. If you patch the RAM cost model to take account of word size, then this factor of N was never really there anyway.
So, interestingly, the time complexity for the original Horowitz-Sahni algorithm isn't O(N*2^(N/2)), it's O(2^(N/2)). Likewise the time complexity for this problem is O(K+2^(N/2)), where K is the length of the output.

Superset Search

I'm looking for an algorithm to solve the following in a reasonable amount of time.
Given a set of sets, find all such sets that are subsets of a given set.
For example, if you have a set of search terms like ["stack overflow", "foo bar", ...], then given a document D, find all search terms whose words all appear in D.
I have found two solutions that are adequate:
Use a list of bit vectors as an index. To query for a given superset, create a bit vector for it, then iterate over the list, OR-ing each indexed vector with the query vector. If the result is equal to the query vector, the search set is a superset of the set represented by the current vector (sketched below). This algorithm is O(n) where n is the number of sets in the index, and bitwise OR is very fast. Insertion is O(1). Caveat: to support all words in the English language, the bit vectors will need to be several million bits long, and there will need to exist a total order for the words, with no gaps.
Use a prefix tree (trie). Sort the sets before inserting them into the trie. When searching for a given set, sort it first. Iterate over the elements of the search set, activating nodes that match if they are either children of the root node or of a previously activated node. All paths, through activated nodes to a leaf, represent subsets of the search set. The complexity of this algorithm is O(a log a + ab) where a is the size of the search set and b is the number of indexed sets.
What's your solution?
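To make the first (bit vector) approach concrete, here is a toy Python sketch of it (my own code; arbitrary-precision ints play the role of the long bit vectors, and the vocabulary is assumed to cover every word in the indexed term sets).
def find_contained_terms(term_sets, document_words, vocab):
    bit = {word: k for k, word in enumerate(vocab)}     # the total order on the words
    def vector(words):
        return sum(1 << bit[w] for w in words if w in bit)
    doc_vec = vector(document_words)
    hits = []
    for terms in term_sets:
        vec = vector(terms)
        # OR-ing a term set's vector into the document vector changes nothing
        # exactly when every word of the term set already occurs in the document.
        if vec | doc_vec == doc_vec:
            hits.append(terms)
    return hits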
The prefix trie sounds like something I'd try if the sets were sparse compared to the total vocabulary. Don't forget that if the suffix set of two different prefixes is the same, you can share the subgraph representing the suffix set (this can be achieved by hash-consing rather than arbitrary DFA minimization), giving a DAG rather than a tree. Try ordering your words least or most frequent first (I'll bet one or the other is better than some random or alphabetic order).
For a variation on your first strategy, where you represent each set by a very large integer (bit vector), use a sparse ordered set/map of integers (a trie on the sequence of bits which skips runs of consecutive 0s) - http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.37.5452 (implemented in http://www.scala-lang.org/docu/files/api/scala/collection/immutable/IntMap.html).
If your reference set (of sets) is fixed, and you want to find for many of those sets which ones contain others, I'd compute the immediate containment relation (a directed acyclic graph with a path from a->b iff b is contained in a, and without the redundant arcs a->c where a->b and b->c). The branching factor is no more than the number of elements in a set. The vertices reachable from the given set are exactly those that are subsets of it.
First I would construct 2 data structures, S and E.
S is an array holding the N indexed subsets.
S[0] = set(element1, element2, ...)
S[1] = set(element1, element2, ...)
...
S[N-1] = set(element1, element2, ...)
E is a map (keyed by element) of lists. Each list contains the S-indices of the subsets in which that element appears.
// O( S_total_elements ) = O(n) operation
E[element1] = list(S1, S6, ...)
E[element2] = list(S3, S4, S8, ...)
...
Now, 2 new structures, set L and array C.
Store in L all the elements of D that exist in E (an O(n) operation).
C is an array of counters, indexed by S-index.
// count subset's elements that are in E
foreach e in L:
foreach idx in E[e]:
C[idx] = C[idx] + 1
Finally,
foreach index i of C:
if C[i] == S[i].Count()
// subset S[i] is contained in D
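The same idea as a short runnable Python sketch (my own code):
from collections import defaultdict
def subsets_contained_in(subsets, D):
    # E: element -> list of S-indices of the subsets containing it
    E = defaultdict(list)
    for idx, s in enumerate(subsets):
        for e in set(s):
            E[e].append(idx)
    # C[idx] counts how many distinct elements of subsets[idx] occur in D.
    C = [0] * len(subsets)
    for e in set(D):                    # the distinct elements of D (those not in E contribute nothing)
        for idx in E.get(e, ()):
            C[idx] += 1
    # A subset is contained in D exactly when all of its elements were counted.
    return [idx for idx, s in enumerate(subsets) if C[idx] == len(set(s))]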
Can you build an index for your documents? i.e. a mapping from each word to those documents containing that word. Once you've built that, lookup should be pretty quick and you can just do set intersection to find the documents matching all words.
Here's Wiki on full text search.
EDIT: Ok, I got that backwards.
You could convert your document to a set (if your language has a set datatype), do the same with your searches. Then it becomes a simple matter of testing whether one is a subset of the other.
Behind the scenes, this is effectively the same idea: it would probably involve building a hash table for the document, hashing the queries, and checking each word in the query in turn. This would be O(nm) where n is the number of searches and m the average number of words in a search.
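In Python, for instance, that is just a couple of set operations per query (a toy sketch of my own):
def matching_searches(document_text, searches):
    doc_words = set(document_text.split())
    # A search matches when all of its words appear in the document.
    return [s for s in searches if set(s.split()) <= doc_words]
For example, matching_searches("the stack overflow of foo", ["stack overflow", "foo bar"]) returns ["stack overflow"].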

Resources