Doubts about Alphabetic complexity

Doubts about Alphabetic complexity - algorithm

I'm beginner in Description complexity so i have some doubts to resolve many exercises. Maybe, can someone explainde me how to resolve this problem? also the main idea that involves this problem and maybe some interesting lectures. Thanks in advance.
Alphabetic complexity
Rank the following structures based on their complexity:
d j x k
a a a a
p q r s
a b c d
w x y z
By default, a series of four letters requires 4 × log(26) bits, but we can do much better for several of these sequences (one needs log2(N) bits in general to represent one object taken among N objects; if the set of objects is ordered, the complexity of the nth one is at most log2(n)).
Provide indications showing how the complexity of the above structures can be computed. Be precise. Imagine a simple "machine" with a few avilable operations such as copy, increment, search, last; the machine may read several lists: list of operations, alphabet, list of lists. One needs a few bits to designate a list, plus more bits to designate an element in that list. The problem is to find the smallest designation of the object.

Related

Efficiently search for pairs of numbers in various rows

Imagine you have N distinct people and that you have a record of where these people are, exactly M of these records to be exact.
For example
1,50,299
1,2,3,4,5,50,287
1,50,299
So you can see that 'person 1' is at the same place with 'person 50' three times. Here M = 3 obviously since there's only 3 lines. My question is given M of these lines, and a threshold value (i.e person A and B have been at the same place more than threshold times), what do you suggest the most efficient way of returning these co-occurrences?
So far I've built an N by N table, and looped through each row, incrementing table(N,M) every time N co occurs with M in a row. Obviously this is an awful approach and takes 0(n^2) to O(n^3) depending on how you implent. Any tips would be appreciated!

There is no need to create the table. Just create a hash/dictionary/whatever your language calls it. Then in pseudocode:
answer = []
for S in sets:
for (i, j) in pairs from S:
count[(i,j)]++
if threshold == count[(i,j)]:
answer.append((i,j))
If you have M sets of size of size K the running time will be O(M*K^2).
If you want you can actually keep the list of intersecting sets in a data structure parallel to count without changing the big-O.
Furthermore the same algorithm can be readily implemented in a distributed way using a map-reduce. For the count you just have to emit a key of (i, j) and a value of 1. In the reduce you count them. Actually generating the list of sets is similar.

The known concept for your case is Market Basket analysis. In this context, there are different algorithms. For example Apriori algorithm can be using for your case in a specific case for sets of size 2.
Moreover, in these cases to finding association rules with specific supports and conditions (which for your case is the threshold value) using from LSH and min-hash too.

you could use probability to speed it up, e.g. only check each pair with 1/50 probability. That will give you a 50x speed up. Then double check any pairs that make it close enough to 1/50th of M.
To double check any pairs, you can either go through the whole list again, or you could double check more efficiently if you do some clever kind of reverse indexing as you go. e.g. encode each persons row indices into 64 bit integers, you could use binary search / merge sort type techniques to see which 64 bit integers to compare, and use bit operations to compare 64 bit integers for matches. Other things to look up could be reverse indexing, binary indexed range trees / fenwick trees.

Search in locality sensitive hashing

I'm trying to understand the section 5. of this paper about LSH, in particular how to bucket the generated hashes. Quoting the linked paper:
Given bit vectors consisting of d bits each, we choose N = O(n 1/(1+epsilon)
) random permutations of the bits. For each random permutation σ, we
maintain a sorted order O σ of the bit vectors, in lexicographic order
of the bits permuted by σ. Given a query bit vector q, we find the
approximate nearest neighbor by doing the following: For each permu-
tation σ, we perform a binary search on O σ to locate the two bit
vectors closest to q (in the lexicographic order ob- tained by bits
permuted by σ). We now search in each of the sorted orders O σ
examining elements above and below the position returned by the binary
search in order of the length of the longest prefix that matches q.
This can be done by maintaining two pointers for each sorted order O σ
(one moves up and the other down). At each step we move one of the
pointers up or down corresponding to the element with the longest
matching prefix. (Here the length of the longest matching prefix in O
σ is computed relative to q with its bits permuted by σ). We examine
2N = O(n 1/(1+epsilon) ) bit vec- tors in this way. Of all the bit vectors
examined, we return the one that has the smallest Hamming distance to
q.
I'm confused by this algorithm and I don't think that I understood how it works.
I found already this question on the topic, but I didn't understand the answer in the comments. Also in this question in point 2 the same algorithm is described but again, I don't understand how its works.
Can you please try to explain to me how it works step by step trying to be more simple as possible?
I even tried to make a list of things that I don't understand, but in practice is so bad written that I don't understand most of the sentences!
EDIT after gsamaras answer:
I mostly understood the answer, but I still have some doubts:
Is it correct to say that the total cost of performing the N permutations is O(Nnlogn), since we have to sort each one of them ?
The permutation+sorting process described above is performed only once during the pre-processing or for every query q? It seems already pretty expensive O(Nnlogn) even in pre-processing, if we have to do this at query time it's a disaster :D
At the last point, where we compare v0 and v4 to q, we compare their permuted version or the original one (before their permutation)?

This question is somehow broad, so I am just going to give a minimal (abstract) example here:
We have 6 (= n) vectors in our dataset, with d bits each. Let's assume that we do 2 (= N) random permutation.
Let the 1st random permutation begin! Remember that we permute the bits, not the order of the vectors. After permuting the bits, they maintain an order, for example:
v1
v5
v0
v3
v2
v4
Now the query vector, q, arrives, but it's (almost) unlikely that is going to be the same with a vector in our dataset (after the permutation), thus we won't find it by performing binary search.
However, we are going to end up between two vectors. So now we can imagine the scenario to be like this (for example q lies between v0 and v3:
v1
v5
v0 <-- up pointer
<-- q lies here
v3 <-- down pointer
v2
v4
Now we move either up or down pointer, seeking for the vi vector that will match at the most bits with q. Let's say it was v0.
Similarly, we do the second permutation and we find the vector vi, let's say v4. we now compare v0 from the first permutation and v4, to see which one is closest to q, i.e. which one has the most bits equal with q.
Edit:
Is it correct to say that the total cost of performing the N permutations is O(Nnlogn), since we have to sort each one of them?
If they actually sort every permutation from scratch, then yes, but it's not clear for me how they do it.
The permutation+sorting process described above is performed only once during the pre-processing or for every query q?
ONCE.
At the last point, where we compare v0 and v4 to q, we compare their permuted version or the original one (before their permutation)?
I think they do it with the permuted version (see the parentheses before 2N in the paper). But that wouldn't make any difference, since they permute q too with the same permute (σ).
This quora answer may shed some light too.

Is this searching algorithm optimal?

I have two lists, L and M, each containing thousands of 64-bit unsigned integers. I need to find out whether the sum of any two members of L is itself a member of M.
Is it possible to improve upon the performance of the following algorithm?
Sort(M)
for i = 0 to Length(L)
for j = i + 1 to Length(L)
BinarySearch(M, L[i] + L[j])

(I'm assuming your goal is to find all pairs in L that sum to something in M)
Forget hashtables!
Sort both lists.
Then do the outer loop of your algorithm: walk over every element i in L, then every larger element j in L. As you go, form the sum and check to see if it's in M.
But don't look using a binary search: simply do a linear scan from the last place you looked. Let's say you're working on some value i, and you have some value j, followed by some value j'. When searching for (i+j), you would have got to the point in M where that value is found, or the first largest value. You're now looking for (i+j'); since j' > j, you know that (i+j') > (i+j), and so it cannot be any earlier in M than the last place you got. If L and M are both smoothly distributed, there is an excellent chance that the point in M where you would find (i+j') is only a little way off.
If the arrays are not smoothly distributed, then better than a linear scan might be some sort of jumping scan - look forward N elements at a time, halving N if the jump goes too far.
I believe this algorithm is O(n^2), which is as fast as any proposed hash algorithm (which have an O(1) primitive operation, but still have to do O(n**2) of them. It also means that you don't have to worry about the O(n log n) to sort. It has much better data locality than the hash algorithms - it basically consists of paired streamed reads over the arrays, repeated n times.
EDIT: I have written implementations of Paul Baker's original algorithm, Nick Larsen's hashtable algorithm, and my algorithm, and a simple benchmarking framework. The implementations are simple (linear probing in the hashtable, no skipping in my linear search), and i had to make guesses at various sizing parameters. See http://urchin.earth.li/~twic/Code/SumTest/ for the code. I welcome corrections or suggestions, about any of the implementations, the framework, and the parameters.
For L and M containing 3438 items each, with values ranging from 1 to 34380, and with Larsen's hashtable having a load factor of 0.75, the median times for a run are:
Baker (binary search): 423 716 646 ns
Larsen (hashtable): 733 479 121 ns
Anderson (linear search): 62 077 597 ns
The difference is much bigger than i had expected (and, i admit, not in the direction i had expected). I suspect i have made one or more major mistakes in the implementation. If anyone spots one, i really would like to hear about it!
One thing is that i have allocated Larsen's hashtable inside the timed method. It is thus paying the cost of allocation and (some) garbage collection. I think this is fair, because it's a temporary structure only needed by the algorithm. If you think it's something that could be reused, it would be simple enough to move it into an instance field and allocate it only once (and Arrays.fill it with zero inside the timed method), and see how that affects performance.

The complexity of the example code in the question is O(m log m + l2 log m) where l=|L| and m=|M| as it runs binary search (O(log m)) for every pair of elements in L (O(l2)), and M is sorted first.
Replacing the binary search with a hash table reduces the complexity to O(l2) assuming that hash table insert and lookup are O(1) operations.
This is asymptotically optimal as long as you assume that you need to process every pair of numbers on the list L, as there are O(l2) such pairs. If there are a couple of thousands of numbers on L, and they are random 64-bit integers, then definitely you need to process all the pairs.

Instead of sorting M at a cost of n * log(n), you could create a hash set at the cost of n.
You could also store all sums in another hash set while iterating and add a check to make sure you don't perform the same search twice.

You can avoid binary search by using hashtable except sorted M array.

Alternatively, add all of the members of L to a hashset lSet, then iterate over M, performing these steps for each m in M:
add m to hashset mSet - if m is already in mSet, skip this iteration; if m is in hashset dSet, also skip this iteration.
subtract each member l of L less than m from m to give d, and test whether d is also in lSet;
if so, add (l, d) to some collection rSet; add d to hashset dSet.
This will require fewer iterations, at the cost of more memory. You will want to pre-allocate the memory for the structures, if this is to give you a speed increase.

Generate all subset sums within a range faster than O((k+N) * 2^(N/2))?

Is there a way to generate all of the subset sums s1, s2, ..., sk that fall in a range [A,B] faster than O((k+N)*2N/2), where k is the number of sums there are in [A,B]? Note that k is only known after we have enumerated all subset sums within [A,B].
I'm currently using a modified Horowitz-Sahni algorithm. For example, I first call it to for the smallest sum greater than or equal to A, giving me s1. Then I call it again for the next smallest sum greater than s1, giving me s2. Repeat this until we find a sum sk+1 greater than B. There is a lot of computation repeated between each iteration, even without rebuilding the initial two 2N/2 lists, so is there a way to do better?
In my problem, N is about 15, and the magnitude of the numbers is on the order of millions, so I haven't considered the dynamic programming route.

Check the subset sum on Wikipedia. As far as I know, it's the fastest known algorithm, which operates in O(2^(N/2)) time.
Edit:
If you're looking for multiple possible sums, instead of just 0, you can save the end arrays and just iterate through them again (which is roughly an O(2^(n/2) operation) and save re-computing them. The value of all the possible subsets is doesn't change with the target.
Edit again:
I'm not wholly sure what you want. Are we running K searches for one independent value each, or looking for any subset that has a value in a specific range that is K wide? Or are you trying to approximate the second by using the first?
Edit in response:
Yes, you do get a lot of duplicate work even without rebuilding the list. But if you don't rebuild the list, that's not O(k * N * 2^(N/2)). Building the list is O(N * 2^(N/2)).
If you know A and B right now, you could begin iteration, and then simply not stop when you find the right answer (the bottom bound), but keep going until it goes out of range. That should be roughly the same as solving subset sum for just one solution, involving only +k more ops, and when you're done, you can ditch the list.
More edit:
You have a range of sums, from A to B. First, you solve subset sum problem for A. Then, you just keep iterating and storing the results, until you find the solution for B, at which point you stop. Now you have every sum between A and B in a single run, and it will only cost you one subset sum problem solve plus K operations for K values in the range A to B, which is linear and nice and fast.
s = *i + *j; if s > B then ++i; else if s < A then ++j; else { print s; ... what_goes_here? ... }
No, no, no. I get the source of your confusion now (I misread something), but it's still not as complex as what you had originally. If you want to find ALL combinations within the range, instead of one, you will just have to iterate over all combinations of both lists, which isn't too bad.
Excuse my use of auto. C++0x compiler.
std::vector<int> sums;
std::vector<int> firstlist;
std::vector<int> secondlist;
// Fill in first/secondlist.
std::sort(firstlist.begin(), firstlist.end());
std::sort(secondlist.begin(), secondlist.end());
auto firstit = firstlist.begin();
auto secondit = secondlist.begin();
// Since we want all in a range, rather than just the first, we need to check all combinations. Horowitz/Sahni is only designed to find one.
for(; firstit != firstlist.end(); firstit++) {
for(; secondit = secondlist.end(); secondit++) {
int sum = *firstit + *secondit;
if (sum > A && sum < B)
sums.push_back(sum);
}
}
It's still not great. But it could be optimized if you know in advance that N is very large, for example, mapping or hashmapping sums to iterators, so that any given firstit can find any suitable partners in secondit, reducing the running time.

It is possible to do this in O(N*2^(N/2)), using ideas similar to Horowitz Sahni, but we try and do some optimizations to reduce the constants in the BigOh.
We do the following
Step 1: Split into sets of N/2, and generate all possible 2^(N/2) sets for each split. Call them S1 and S2. This we can do in O(2^(N/2)) (note: the N factor is missing here, due to an optimization we can do).
Step 2: Next sort the larger of S1 and S2 (say S1) in O(N*2^(N/2)) time (we optimize here by not sorting both).
Step 3: Find Subset sums in range [A,B] in S1 using binary search (as it is sorted).
Step 4: Next, for each sum in S2, find using binary search the sets in S1 whose union with this gives sum in range [A,B]. This is O(N*2^(N/2)). At the same time, find if that corresponding set in S2 is in the range [A,B]. The optimization here is to combine loops. Note: This gives you a representation of the sets (in terms of two indexes in S2), not the sets themselves. If you want all the sets, this becomes O(K + N*2^(N/2)), where K is the number of sets.
Further optimizations might be possible, for instance when sum from S2, is negative, we don't consider sums < A etc.
Since Steps 2,3,4 should be pretty clear, I will elaborate further on how to get Step 1 done in O(2^(N/2)) time.
For this, we use the concept of Gray Codes. Gray codes are a sequence of binary bit patterns in which each pattern differs from the previous pattern in exactly one bit.
Example: 00 -> 01 -> 11 -> 10 is a gray code with 2 bits.
There are gray codes which go through all possible N/2 bit numbers and these can be generated iteratively (see the wiki page I linked to), in O(1) time for each step (total O(2^(N/2)) steps), given the previous bit pattern, i.e. given current bit pattern, we can generate the next bit pattern in O(1) time.
This enables us to form all the subset sums, by using the previous sum and changing that by just adding or subtracting one number (corresponding to the differing bit position) to get the next sum.

If you modify the Horowitz-Sahni algorithm in the right way, then it's hardly slower than original Horowitz-Sahni. Recall that Horowitz-Sahni works two lists of subset sums: Sums of subsets in the left half of the original list, and sums of subsets in the right half. Call these two lists of sums L and R. To obtain subsets that sum to some fixed value A, you can sort R, and then look up a number in R that matches each number in L using a binary search. However, the algorithm is asymmetric only to save a constant factor in space and time. It's a good idea for this problem to sort both L and R.
In my code below I also reverse L. Then you can keep two pointers into R, updated for each entry in L: A pointer to the last entry in R that's too low, and a pointer to the first entry in R that's too high. When you advance to the next entry in L, each pointer might either move forward or stay put, but they won't have to move backwards. Thus, the second stage of the Horowitz-Sahni algorithm only takes linear time in the data generated in the first stage, plus linear time in the length of the output. Up to a constant factor, you can't do better than that (once you have committed to this meet-in-the-middle algorithm).
Here is a Python code with example input:
# Input
terms = [29371, 108810, 124019, 267363, 298330, 368607,
438140, 453243, 515250, 575143, 695146, 840979, 868052, 999760]
(A,B) = (500000,600000)
# Subset iterator stolen from Sage
def subsets(X):
yield []; pairs = []
for x in X:
pairs.append((2**len(pairs),x))
for w in xrange(2**(len(pairs)-1), 2**(len(pairs))):
yield [x for m, x in pairs if m & w]
# Modified Horowitz-Sahni with toolow and toohigh indices
L = sorted([(sum(S),S) for S in subsets(terms[:len(terms)/2])])
R = sorted([(sum(S),S) for S in subsets(terms[len(terms)/2:])])
(toolow,toohigh) = (-1,0)
for (Lsum,S) in reversed(L):
while R[toolow+1][0] < A-Lsum and toolow < len(R)-1: toolow += 1
while R[toohigh][0] <= B-Lsum and toohigh < len(R): toohigh += 1
for n in xrange(toolow+1,toohigh):
print '+'.join(map(str,S+R[n][1])),'=',sum(S+R[n][1])
"Moron" (I think he should change his user name) raises the reasonable issue of optimizing the algorithm a little further by skipping one of the sorts. Actually, because each list L and R is a list of sizes of subsets, you can do a combined generate and sort of each one in linear time! (That is, linear in the lengths of the lists.) L is the union of two lists of sums, those that include the first term, term[0], and those that don't. So actually you should just make one of these halves in sorted form, add a constant, and then do a merge of the two sorted lists. If you apply this idea recursively, you save a logarithmic factor in the time to make a sorted L, i.e., a factor of N in the original variable of the problem. This gives a good reason to sort both lists as you generate them. If you only sort one list, you have some binary searches that could reintroduce that factor of N; at best you have to optimize them somehow.
At first glance, a factor of O(N) could still be there for a different reason: If you want not just the subset sum, but the subset that makes the sum, then it looks like O(N) time and space to store each subset in L and in R. However, there is a data-sharing trick that also gets rid of that factor of O(N). The first step of the trick is to store each subset of the left or right half as a linked list of bits (1 if a term is included, 0 if it is not included). Then, when the list L is doubled in size as in the previous paragraph, the two linked lists for a subset and its partner can be shared, except at the head:
0
|
v
1 -> 1 -> 0 -> ...
Actually, this linked list trick is an artifact of the cost model and never truly helpful. Because, in order to have pointers in a RAM architecture with O(1) cost, you have to define data words with O(log(memory)) bits. But if you have data words of this size, you might as well store each word as a single bit vector rather than with this pointer structure. I.e., if you need less than a gigaword of memory, then you can store each subset in a 32-bit word. If you need more than a gigaword, then you have a 64-bit architecture or an emulation of it (or maybe 48 bits), and you can still store each subset in one word. If you patch the RAM cost model to take account of word size, then this factor of N was never really there anyway.
So, interestingly, the time complexity for the original Horowitz-Sahni algorithm isn't O(N*2^(N/2)), it's O(2^(N/2)). Likewise the time complexity for this problem is O(K+2^(N/2)), where K is the length of the output.

Dynamic Programming: Sum-of-products

Let's say you have two lists, L1 and L2, of the same length, N. We define prodSum as:
def prodSum(L1, L2) :
ans = 0
for elem1, elem2 in zip(L1, L2) :
ans += elem1 * elem2
return ans
Is there an efficient algorithm to find, assuming L1 is sorted, the number of permutations of L2 such that prodSum(L1, L2) < some pre-specified value?
If it would simplify the problem, you may assume that L1 and L2 are both lists of integers from [1, 2, ..., N].
Edit: Managu's answer has convinced me that this is impossible without assuming that L1 and L2 are lists of integers from [1, 2, ..., N]. I'd still be interested in solutions that assume this constraint.

I want to first dispell a certain amount of confusion about the math, then discuss two solutions and give code for one of them.
There is a counting class called #P which is a lot like the yes-no class NP. In a qualitative sense, it is even harder than NP. There is no particular reason to believe that this counting problem is any better than #P-hard, although it could be hard or easy to prove that.
However, many #P-hard problems and NP-hard problems vary tremendously in how long they take to solve in practice, and even one particular hard problem can be harder or easier depending on the properties of the input. What NP-hard or #P-hard mean is that there are hard cases. Some NP-hard and #P-hard problems also have less hard cases or even outright easy cases. (Others have very few cases that seem much easier than the hardest cases.)
So the practical question could depend a lot on the input of interest. Suppose that the threshold is on the high side or on the low side, or you have enough memory for a decent number of cached results. Then there is a useful recursive algorithm that makes use of two ideas, one of them already mentioned: (1) After partially assigning some of the values, the remaining threshold for list fragments may rule out all of the permutations, or it may allow all of them. (2) Memory permitting, you should cache the subtotals for some remaining threshold and some list fragments. To improve the caching, you might as well pick the elements from one of the lists in order.
Here is a Python code that implements this algorithm:
list1 = [1,2,3,4,5,6,7,8,9,10,11]
list2 = [1,2,3,4,5,6,7,8,9,10,11]
size = len(list1)
threshold = 396 # This is smack in the middle, a hard value
cachecutoff = 6 # Cache results when up to this many are assigned
def dotproduct(v,w):
return sum([a*b for a,b in zip(v,w)])
factorial = [1]
for n in xrange(1,len(list1)+1):
factorial.append(factorial[-1]*n)
cache = {}
# Assumes two sorted lists of the same length
def countprods(list1,list2,threshold):
if dotproduct(list1,list2) <= threshold: # They all work
return factorial[len(list1)]
if dotproduct(list1,reversed(list2)) > threshold: # None work
return 0
if (tuple(list2),threshold) in cache: # Already been here
return cache[(tuple(list2),threshold)]
total = 0
# Match the first element of list1 to each item in list2
for n in xrange(len(list2)):
total += countprods(list1[1:],list2[:n] + list2[n+1:],
threshold-list1[0]*list2[n])
if len(list1) >= size-cachecutoff:
cache[(tuple(list2),threshold)] = total
return total
print 'Total permutations below threshold:',
print countprods(list1,list2,threshold)
print 'Cache size:',len(cache)
As the comment line says, I tested this code with a hard value of the threshold. It is quite a bit faster than a naive search over all permutations.
There is another algorithm that is better than this one if three conditions are met: (1) You don't have enough memory for a good cache, (2) the list entries are small non-negative integers, and (3) you're interested in the hardest thresholds. A second situation to use this second algorithm is if you want counts for all thresholds flat-out, whether or not the other conditions are met. To use this algorithm for two lists of length n, first pick a base x which is a power of 10 or 2 that is bigger than n factorial. Now make the matrix
M[i][j] = x**(list1[i]*list2[j])
If you compute the permanent of this matrix M using the Ryser formula, then the kth digit of the permanent in base x tells you the number of permutations for which the dot product is exactly k. Moreover, the Ryser formula is quite a bit faster than the summing over all permutations directly. (But it is still exponential, so it does not contradict the fact that computing the permanent is #P-hard.)
Also, yes it is true that the set of permutations is the symmetric group. It would be great if you could use group theory in some way to accelerate this counting problem. But as far as I know, nothing all that deep comes from that description of the question.
Finally, if instead of exactly counting the number of permutations below a threshold, you only wanted to approximate that number, then probably the game changes completely. (You can approximate the permanent in polynomial time, but that doesn't help here.) I'd have to think about what to do; in any case it isn't the question posed.
I realized that there is another kind of caching/dynamic programming that is missing from the above discussion and the above code. The caching implemented in the code is early-stage caching: If just the first few values of list1 are assigned to list2, and if a remaining threshold occurs more than once, then the cache allows the code to reuse the result. This works great if the entries of list1 and list2 are integers that are not too large. But it will be a failed cache if the entries are typical floating point numbers.
However, you can also precompute at the other end, when most of the values of list1 have been assigned. In this case, you can make a sorted list of the subtotals for all of the remaining values. And remember, you can use up list1 in order, and do all of the permutations on the list2 side. For example, suppose that the last three entries of list1 are [4,5,6], and suppose that three of the values in list2 (somewhere in the middle) are [2.1,3.5,3.7]. Then you would cache a sorted list of the six dot products:
endcache[ [2.1, 3.5, 3.7] ] = [44.9, 45.1, 46.3, 46.7, 47.9, 48.1]
What does this do for you? If you look in the code that I did post, the function countprods(list1,list2,threshold) recursively does its work with a sub-threshold. The first argument, list1, might have been better as a global variable than as an argument. If list2 is short enough, countprods can do its work much faster by doing a binary search in the list endcache[list2]. (I just learned from stackoverflow that this is implemented in the bisect module in Python, although a performance code wouldn't be written in Python anyway.) Unlike the head cache, the end cache can speed up the code a lot even if there are no numerical coincidences among the entries of list1 and list2. Ryser's algorithm also stinks for this problem without numerical coincidences, so for this type of input I only see two accelerations: Sawing off a branch of the search tree using the "all" test and the "none" test, and the end cache.

Probably not (without the simplifying assumption): your problem is NP-Hard. Here's a trivial reduction to SUBSET-SUM. Let count_perms(L1, L2, x) represent the function "count the number of permutations of L2 such that prodSum(L1, L2) < x"
SUBSET_SUM(L2,n): # (determine if any subset of L2 adds up to n)
For i in [1,...,len(L2)]
Set L1=[0]*(len(L2)-i)+[1]*i
calculate count_perms(L1,L2,n+1)-count_perms(L1,L2,n)
if result positive, return true
Return false
Thus, if there were a way to calculate your function count_perms(L1, L2, x) efficiently, then we would have an efficient algorithm to calculate SUBSET_SUM(L2,n).

This also turns out to be an abstract algebra problem. It's been awhile for me, but here's a few things to get started. There's nothing terribly significant about the following (it's all very basic; an expansion on the fact that every group is isomorphic to a permutation group), but it provides a different way of looking at the problem.
I'll try to stick to fairly standard notation: "x" is a vector, and "xi" is the ith component of x. If "L" is a list, L is the equivalent vector. "1n" is a vector with all components = 1. The set of natural numbers ℕ is taken to be the positive integers. "[a,b]" is the set of integers from a through b, inclusive. "θ(x, y)" is the angle formed by x and y
Note prodSum is the dot product. The question is equivalent to finding all vectors L generated by an operation (permuting elements) on L2 such that θ(L1, L) less than a given angle α. The operation is equivalent to reflecting a point in ℕn through a subspace with presentation:
< ℕn | (xixj-1)(i,j) ∈ A >
where i and j are in [1,n], A has at least one element and no (i,i) is in A (i.e. A is a non-reflexive subset of [1,n]2 where |A| > 0). Stated more plainly (and more ambiguously), the subspaces are the points where one or more components are equal to one or more other components. The reflections correspond to matrices whose columns are all the standard basis vectors.
Let's name the reflection group "RPn" (it should have another name, but memory fails). RPn is isomorphic to the symmetric group Sn. Thus
|RPn| = |Sn| = n!
In 3 dimensions, this gives a group of order 6. The reflection group is D3, the triangle symmetry group, as a subgroup of the cube symmetry group. It turns out you can also generate the points by rotating L2 in increments of π/3 around the line along 1n. This is the the modular group ℤ6 and this points to a possible solution: find a group of order n! with a minimal number of generators and use that to generate the permutations of L2 as sequences with increasing, then decreasing, angle with L2. From there, we can try to generate the elements L with θ(L1, L) < α directly (for example we can binsearch on the 1st half of each sequence to find the transition point; with that, we can specify the rest of the sequence that fulfills the condition and count it in O(1) time). Let's call this group RP'n.
RP'4 is constructed of 4 subspaces isomorphic to ℤ6. More generally, RP'n is constructed of n subspaces isomorphic to RP'n-1.
This is where my abstract algebra muscles really begins to fail. I'll try to keep working on the construction, but Managu's answer doesn't leave much hope. I fear that reducing RP3 to ℤ6 is the only useful reduction we can make.

It looks like if l1 and l2 are both ordered high->low (or low->high, whatever, if they have the same order), the result is maximized, and if they are ordered oposite, the result is minimized, and other alterations of order appear to follow some rules; swapping two numbers in a continuous list of integers always reduces the sum by a fixed amount which seems to be related to their distance apart (ie swapping 1 and 3 or 2 and 4 have the same effect). This was just from a little messing around, but the idea is that there is a maximum, a minimum, and if some-pre-specified-value is between them, there are ways to count the permutations that make that possible (although; if the list isn't evenly spaced, then there aren't. Well, not that I know of. If l2 is (1 2 4 5) swapping 1 2 and 2 4 would have different effects)

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio