Mapping an index to each pair of unique points - algorithm

Given a sample size N, I want to map each pair of unique points to an index, in such a way that I can generate the pair from the index.
Example: let's say N=5, so the number of unique pairs is N*(N-1)/2=10.
The pairs are -
0: (0, 1)
1: (0, 2)
2: (0, 3)
3: (0, 4)
4: (1, 2)
5: (1, 3)
6: (1, 4)
7: (2, 3)
8: (2, 4)
9: (3, 4)
So given a specific i, let's say i=4, a mapping function should return (1, 2).
The original ordering of the pairs can be changed, if that helps.

I like to order the pairs (or, in general, ordered tuples) in what's called "colex" order, which is lexicographic order of the reversed tuple. Or, in other words, sorted by the largest element (and using the next largest element as a tie-breaker if the tuples are bigger than pairs). This results in the ordering:
0: (0, 1)
1: (0, 2)
2: (1, 2)
3: (0, 3)
4: (1, 3)
5: (2, 3)
6: (0, 4)
7: (1, 4)
8: (2, 4)
9: (3, 4)
The advantage of this ordering is that it doesn't depend on N, which is extremely helpful if you might later need to increase N without invalidating any existing index.
You can then compute (x, y) as:
n = floor((sqrt(8 * i + 1) - 1) / 2)
x = i - n * (n + 1) / 2
y = n + 1
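In code, a small sketch of the formulas above (not part of the original answer): math.isqrt keeps the square root exact in integer arithmetic, and the inverse mapping index_from_pair is my own addition for round-tripping.

import math

def pair_from_index(i):
    # n is the largest integer with n*(n+1)/2 <= i; the larger pair element is y = n + 1.
    n = (math.isqrt(8 * i + 1) - 1) // 2
    x = i - n * (n + 1) // 2
    y = n + 1
    return (x, y)

def index_from_pair(x, y):
    # Pairs whose larger element is y occupy indices y*(y-1)/2 .. y*(y-1)/2 + y - 1.
    return y * (y - 1) // 2 + x

print(pair_from_index(4))      # (1, 3), matching index 4 in the colex listing above
print(index_from_pair(1, 3))   # 4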

Related

Get permutation count

I'm searching for an algorithm which gives me the permutation count of the elements 1...n, given the cycle lengths.
For example n := 4
<Set of cycle lengths> -> permutation count
1,1,1,1 -> 1: 4 cycles of length 1 lead to 1 permutation: 1,2,3,4
1,1,2 -> 6: 2 cycles of length 1 and 1 cycle of length 2 lead to 6 permutations: 1,2,4,3, 1,4,3,2, 1,3,2,4, 2,1,3,4, 3,2,1,4, 4,2,3,1
2,2 -> 3: 2 cycles of length 2 lead to 3 permutations: 2,1,4,3, 3,4,1,2, 4,3,2,1
1,3 -> 8: 1 cycle of length 1 and 1 cycle of length 3 lead to 8 permutations: 1,3,4,2, 1,4,2,3, 2,3,1,4, 2,4,3,1, 3,1,2,4, 3,2,4,1, 4,1,3,2, 4,2,1,3
4 -> 6: 1 cycle of length 4 leads to 6 permutations:
2,3,4,1, 2,4,1,3, 3,1,4,2, 3,4,2,1, 4,1,2,3, 4,3,1,2
How can I compute the permutation count for a given set of cycle lengths? Iterating through all permutations is not an option.
For a given cycle type, we can produce a permutation with that cycle type by writing down a permutation of the list 1, ..., n and then bracketing it appropriately, according to the lengths in the cycle type, to get a permutation written in cycle notation.
For example, if we want cycle type (3, 2, 2), then the permutation 1, 2, 3, 4, 5, 6, 7 is bracketed as (1 2 3)(4 5)(6 7), while 5, 1, 6, 2, 4, 3, 7 gives (5 1 6)(2 4)(3 7).
It's clear that we get all permutations of cycle type (3, 2, 2) this way, but it's also clear that we can get each permutation in multiple different ways. There are two causes of overcounting: first, we can make a cyclic shift for any of the cycles: (5 1 6)(2 4)(3 7) is the same permutation as (1 6 5)(2 4)(3 7) or (6 5 1)(2 4)(3 7). Second, cycles of the same length can be permuted arbitrarily: (5 1 6)(2 4)(3 7) is the same permutation as (5 1 6)(3 7)(2 4). A bit of thought should convince you that these are the only possible causes of overcounting.
To account for both causes of overcounting, we divide the total number of permutations by (a) the product of the cycle lengths, and also (b) the factorial of the number of cycles for any given cycle length. In the (3, 2, 2) case: we divide by 3 × 2 × 2 for (a), and 2! for (b), because there are two cycles of length 2.
Since this is Stack Overflow, here's some Python code:
from collections import Counter
from math import factorial

def count_cycle_type(p):
    """Number of permutations with a given cycle type."""
    count = factorial(sum(p))
    for cycle_length, ncycles in Counter(p).items():
        count //= cycle_length ** ncycles * factorial(ncycles)
    return count
Example:
>>> count_cycle_type((2, 2))
3
>>> count_cycle_type((3, 2, 2))
210
To double check correctness, we can add the counts for all cycle types of a given length n, and check that we get n!. The cycle types are the partitions of n. We can compute those fairly simply by a recursive algorithm. Here's some code to do that. partitions is the function we want; bounded_partitions is a helper.
def bounded_partitions(n, k):
    """Generate partitions of n with largest element <= k."""
    if k == 0:
        if n == 0:
            yield ()
    else:
        if n >= k:
            for c in bounded_partitions(n - k, k):
                yield (k,) + c
        yield from bounded_partitions(n, k - 1)

def partitions(n):
    """Generate partitions of n."""
    return bounded_partitions(n, n)
Example:
>>> for partition in partitions(5): print(partition)
...
(5,)
(4, 1)
(3, 2)
(3, 1, 1)
(2, 2, 1)
(2, 1, 1, 1)
(1, 1, 1, 1, 1)
And here's the double check: the sum of all the cycle type counts, for total lengths 5, 6, 7 and 20. We get the expected results of 5!, 6!, 7! and 20!.
>>> sum(count_cycle_type(p) for p in partitions(5))
120
>>> sum(count_cycle_type(p) for p in partitions(6))
720
>>> sum(count_cycle_type(p) for p in partitions(7))
5040
>>> sum(count_cycle_type(p) for p in partitions(20))
2432902008176640000
>>> factorial(20)
2432902008176640000
Alternatively, this count can be broken down into (a sketch of this breakdown in code follows the list):
the number of ways to partition the elements into buckets matching the required count of elements for each distinct cycle size;
multiplied by, for each distinct cycle size, the number of unique ways to partition the bucket's elements evenly into the required number of cycles;
multiplied by, for each cycle, the number of distinct cyclic orderings.
1: For bucket sizes s1, ..., sk, that works out to n! / (s1! * ... * sk!).
2: For a bucket containing m elements that must be partitioned into c cycles, there are m! / (((m/c)!)^c * c!) ways.
3: For a cycle containing m elements, there are (m-1)! distinct cyclic orderings if m > 1, and just 1 ordering otherwise.
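Here is that sketch (mine, not part of either answer above), checking that the three-step breakdown reproduces the same numbers as count_cycle_type; the name count_cycle_type_via_buckets is made up.

from collections import Counter
from math import factorial

def count_cycle_type_via_buckets(p):
    """Count permutations with cycle type p via the bucket breakdown above."""
    n = sum(p)
    total = factorial(n)
    for cycle_length, ncycles in Counter(p).items():
        bucket = cycle_length * ncycles           # elements living in cycles of this length
        total //= factorial(bucket)               # step 1: n! / (s1! * ... * sk!)
        # step 2: split the bucket evenly into ncycles unlabelled cycles
        total *= factorial(bucket) // (factorial(cycle_length) ** ncycles * factorial(ncycles))
        # step 3: (m - 1)! distinct cyclic orderings for each cycle of length m
        total *= factorial(cycle_length - 1) ** ncycles
    return total

print(count_cycle_type_via_buckets((2, 2)))      # 3
print(count_cycle_type_via_buckets((3, 2, 2)))   # 210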

Moving maximum variant

Yesterday, I got asked the following question during a technical interview.
Imagine that you are working for a news agency. At every discrete point of time t, a story breaks. Some stories are more interesting than others. This "hotness" is expressed as a natural number h, with greater numbers representing hotter news stories.
Given a stream S of n stories, your job is to find the hottest story out of the most recent k stories for every t >= k.
So far, so good: this is the moving maximum problem (also known as the sliding window maximum problem), and there is a linear-time algorithm that solves it.
Now the question gets harder. Of course, older stories are usually less hot compared to newer stories. Let the age a of the most recent story be zero, and let the age of any other story be one greater than the age of its succeeding story. The "improved hotness" of a story is then defined as max(0, min(h, k - a)).
Here's an example:
n = 11, k = 4
S indices:                 0  1  2  3  4  5  6  7  8  9 10
S values:                  1  3  1  7  1  3  9  3  1  3  1
mov max hot indices:                3  3  3  6  6  6  6  9
mov max hot values:                 7  7  7  9  9  9  9  3
mov max imp-hot indices:            3  3  5  6  7  7  9  9
mov max imp-hot values:             4  3  3  4  3  3  3  3
I was at a complete loss with this question. I thought about adding the index to every element before computing the maximum, but that gives you the answer for when the hotness of a story decreases by one at every step, regardless of whether it reached the hotness bound or not.
Can you find an algorithm for this problem with sub-quadratic (ideally: linear) running time?
I'll sketch a linear-time solution to the original problem involving a double-ended queue (deque) and then extend it to improved hotness with no loss of asymptotic efficiency.
Original problem: keep a deque containing the stories that are (1) newer or hotter than every other story so far and (2) still in the window. At any given time, the hottest story in the deque is at the front. New stories are pushed onto the back of the deque, after popping every story from the back that is not hotter than the new one. Stories are popped from the front as they age out of the window.
For example:
S indices: 0 1 2 3 4 5 6 7 8 9 10
S values: 1 3 1 7 1 3 9 3 1 3 1
deque: (front) [] (back)
push (0, 1)
deque: [(0, 1)]
pop (0, 1) because it's not hotter than (1, 3)
push (1, 3)
deque: [(1, 3)]
push (2, 1)
deque: [(1, 3), (2, 1)]
pop (2, 1) and then (1, 3) because they're not hotter than (3, 7)
push (3, 7)
deque: [(3, 7)]
push (4, 1)
deque: [(3, 7), (4, 1)]
pop (4, 1) because it's not hotter than (5, 3)
push (5, 3)
deque: [(3, 7), (5, 3)]
pop (5, 3) and then (3, 7) because they're not hotter than (6, 9)
push (6, 9)
deque: [(6, 9)]
push (7, 3)
deque: [(6, 9), (7, 3)]
push (8, 1)
deque: [(6, 9), (7, 3), (8, 1)]
pop (8, 1) and (7, 3) because they're not hotter than (9, 3)
push (9, 3)
deque: [(6, 9), (9, 3)]
push (10, 1)
pop (6, 9) because it exited the window
deque: [(9, 3), (10, 1)]
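For reference, here is a minimal runnable sketch (mine, not part of the original answer) of this deque scheme for the unmodified moving-maximum problem; the function name winmaxhot is made up.

import collections

def winmaxhot(hots, k):
    """Plain sliding-window maximum: yield the hottest of the last k stories."""
    q = collections.deque()           # (t, hot) pairs, hotness strictly decreasing front to back
    for t, hot in enumerate(hots):
        while q and q[-1][1] <= hot:  # pop stories that are not hotter than the new one
            q.pop()
        q.append((t, hot))
        if q[0][0] <= t - k:          # front story has aged out of the window
            q.popleft()
        if t + 1 >= k:
            yield q[0][1]

print(list(winmaxhot([1, 3, 1, 7, 1, 3, 9, 3, 1, 3, 1], 4)))
# prints [7, 7, 7, 9, 9, 9, 9, 3], matching the mov max hot values above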
To handle the new problem, we modify how we handle aging stories. Instead of popping stories only as they slide out of the window, we pop the front story as soon as the age bound k - a falls to its hotness or below (while still positive), i.e. as soon as its improved hotness is governed by its age rather than by h; from then on it just decreases by one per step. When determining the top story, only the most recently popped story needs to be considered, because its decayed value dominates the values of stories popped earlier.
In Python:
import collections

Elem = collections.namedtuple('Elem', ('hot', 't'))

def winmaximphot(hots, k):
    q = collections.deque()
    oldtop = 0                     # decayed value of the most recently popped front story (0 if none)
    for t, hot in enumerate(hots):
        # Pop stories from the back that are not hotter than the new one.
        while q and q[-1].hot <= hot:
            del q[-1]
        q.append(Elem(hot, t))
        # Pop the front story once the age bound k - a governs its improved hotness.
        while q and q[0].hot >= k - (t - q[0].t) > 0:
            oldtop = k - (t - q[0].t)
            del q[0]
        if t + 1 >= k:
            yield max(oldtop, q[0].hot) if q else oldtop
        oldtop = max(0, oldtop - 1)  # the popped story's value decays by one per step

print(list(winmaximphot([1, 3, 1, 7, 1, 3, 9, 3, 1, 3, 1], 4)))
The idea is the following: each breaking news story beats all previous stories after k - h steps. For example, with k == 30 and hotness h == 28, the story becomes hotter than all previous stories after 2 steps.
Let's keep the moments of time at which each story will become the hottest. At step i, the moment at which the current story beats all previous ones is i + k - h.
So we keep a sequence of objects {news_date | news_beats_all_previous_ones_date}, in increasing order of news_beats_all_previous_ones_date:
{i1 | i1+k-h1} {i3 | i3+k-h3} {i4 | i4+k-h4} {i7 | i7+k-h7} {i8 | i8+k-h8}
At the current step we compute i9+k-h9 and add it to the end of this list, after removing all values which are bigger (since the sequence is increasing, this is easy).
Once the first element's news_beats_all_previous_ones_date becomes equal to the current date i, that story becomes the answer to the sliding-window query and we remove the item from the sequence.
So you need a data structure that supports adding at the end and removing from both the beginning and the end - a deque. The time complexity of the solution is O(n).

Exhaustively permutate a vector of size 20 in Matlab

I'm trying to exhaustively permute a vector of size 20, but when I try to use perms(v), I get the error
Error using perms (line 23)
Maximum variable size allowed by the program is exceeded.
I've read from the documentation that the memory required for vectors longer than 10 is astronomical. So I'm looking for an alternative.
What I'm trying to do is the following (using a smaller scale example, where the vector here is only of size 3 instead of 20) - find all vectors, x, of length 3 where (x_i)^2 = 1, e.g.
(1, 1, 1),
(-1, 1, 1), (1, -1, 1), (1, 1, -1),
(-1, -1, 1), (-1, 1, -1), (1, -1, -1),
(-1, -1, -1)
I was trying to iteratively create the "base vector", where the number of '-1' elements increased from 0 to 20, then use perms(v) to permutate each "base vector", but I ran into the memory problem.
Is there any alternative to do this?
There are 2^20 such vectors (about 1 million), so you can run a loop with a counter over the range 0..2^20-1 and map each counter value (its binary representation) to the needed vector (a zero bit to -1, a one bit to +1, or vice versa). Simple mapping formula:
Vector_Element = bit * 2 - 1
Example for length 4:
i=10
binary form 1 0 1 0
+/-1 vector: 1 -1 1 -1
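For illustration, here is a small sketch of that mapping (in Python rather than MATLAB, to match the rest of the code in this document; the function name pm1_vectors is made up, and a MATLAB loop using bitget would follow the same pattern).

def pm1_vectors(n):
    """Yield every length-n vector with entries in {-1, +1}, one per counter value."""
    for i in range(2 ** n):
        # Bit b of the counter maps to -1 (bit 0) or +1 (bit 1) via bit * 2 - 1.
        yield [((i >> b) & 1) * 2 - 1 for b in range(n)]

for v in pm1_vectors(3):
    print(v)
# prints the 2**3 = 8 sign vectors, from [-1, -1, -1] up to [1, 1, 1]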

How to implement a data structure which takes an integer and its weight such that a query to it returns that integer its weight% times?

Data is entered as the number followed by the weight. For example, if the data structure has data entered (1, 9) and (2, 1) then it should return the number 1 90% of the time and the number 2 10% of the time.
Which data structure should be used for this implementation? Also, what would the basic code for the query function look like?
Edit: I was considering a tree which stores the cumulative sum for every subtree. Say I have (1, 4), (2, 7), (3, 1), and (4, 11).
The tree would look like:
       23
      /  \
    11    12
   /  \   / \
  4    7 1   11
I do not know if this tree should be binary. Also, does it make sense to store the weights in the tree and map them to the number or somehow use the numbers given as data input?
From the value/weight tuples (1, 4), (2, 7), (3, 1), (4, 11), make an array with cumulative weight sums
[(1, 4), (2, 11), (3, 12), (4, 23)]
and get the value with a binary search on the cumulative weight field.
It is not clear from the question how the query should work - randomly?
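Assuming the query is meant to be a random draw (which the answer above leaves open), here is a minimal Python sketch of the cumulative-sum-plus-binary-search approach; the helper names build and query are made up.

import bisect
import random

def build(pairs):
    """From (value, weight) pairs, build parallel lists of values and cumulative weights."""
    values, cumulative = [], []
    total = 0
    for value, weight in pairs:
        total += weight
        values.append(value)
        cumulative.append(total)
    return values, cumulative

def query(values, cumulative):
    """Return a value with probability proportional to its weight."""
    r = random.randrange(cumulative[-1])               # uniform integer in [0, total weight)
    return values[bisect.bisect_right(cumulative, r)]  # first cumulative sum greater than r

values, cumulative = build([(1, 4), (2, 7), (3, 1), (4, 11)])
print([query(values, cumulative) for _ in range(10)])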

How to iterate through array combinations with constant sum efficiently?

I have an array and its length is X. Each element of the array is in the range 1 .. L. I want to iterate efficiently through all array combinations that have sum L.
Correct solutions for: L = 4 and X = 2
1 3
3 1
2 2
Correct solutions for: L = 5 and X = 3
1 1 3
1 3 1
3 1 1
1 2 2
2 1 2
2 2 1
The naive implementation is (no wonder) too slow for my problem (X is up to 8 in my case and L is up to 128).
Could anybody tell me what this problem is called, or where to find a fast algorithm for it?
Thanks!
If I understand correctly, you're given two numbers 1 ≤ X ≤ L and you want to generate all sequences of positive integers of length X that sum to L.
(Note: this is similar to the integer partition problem, but not the same, because you consider 1,2,2 to be a different sequence from 2,1,2, whereas in the integer partition problem we ignore the order, so that these are considered to be the same partition.)
The sequences that you are looking for correspond to the combinations of X − 1 items out of L − 1. For, if we put the numbers 1 to L − 1 in order, and pick X − 1 of them, then the lengths of intervals between the chosen numbers are positive integers that sum to L.
For example, suppose that L is 16 and X is 5. Then choose 4 numbers from 1 to 15 inclusive, say 3, 7, 8, 14.
Add 0 at the beginning and 16 at the end to get 0, 3, 7, 8, 14, 16; the intervals are 3, 4, 1, 6, 2,
and 3 + 4 + 1 + 6 + 2 = 16 as required.
So generate the combinations of X − 1 items out of L − 1, and for each one, convert it to a partition by finding the intervals. For example, in Python you could write:
from itertools import combinations

def partitions(n, t):
    """
    Generate the sequences of `n` positive integers that sum to `t`.
    """
    assert(1 <= n <= t)
    def intervals(c):
        last = 0
        for i in c:
            yield i - last
            last = i
        yield t - last
    for c in combinations(range(1, t), n - 1):
        yield tuple(intervals(c))
>>> list(partitions(2, 4))
[(1, 3), (2, 2), (3, 1)]
>>> list(partitions(3, 5))
[(1, 1, 3), (1, 2, 2), (1, 3, 1), (2, 1, 2), (2, 2, 1), (3, 1, 1)]
There are (L − 1)! / ((X − 1)! (L − X)!) combinations of X − 1 items out of L − 1, so the runtime of this algorithm (and the size of its output) is exponential in L. However, if you don't count the output, it only needs O(L) space.
With L = 128 and X = 8, there are 89,356,415,775 partitions, so it'll take a while to output them all!
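(As a quick sanity check of that figure, not part of the original answer, Python's math.comb gives the same number:)
>>> from math import comb
>>> comb(127, 7)
89356415775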
(Maybe if you explain why you are computing these partitions, we might be able to suggest some way of meeting your requirements without having to actually produce them all.)

Resources