Range Minimum Query <O(n), O(1)> approach (Last steps) - algorithm

Continued from my last question "Range Minimum Query approach (from tree to restricted RMQ)" (It's recommended to give it a read)
Again, from this tutorial on TopCoder, I have a few questions here and there, and I hope someone can clear them up.
So if I transform an RMQ (Range Minimum Query) problem into an LCA (Lowest Common Ancestor) problem and then transform it back, I get a simplified array. (Both transforms can be found in the tutorial; the simplified array is the array L discussed in "From LCA to RMQ".)
Anyway, I can get that array by using an Euler Tour, and that's the core part of all the calculation.
First, I need to make it even simpler by making the whole array consist of only +1 and -1, so this is what I do: Ls[i] = L[i] - L[i-1].
The second step is actually partitioning, and that's simple enough, but there's this third step that confuses me.
Let A'[i] be the minimum value for the i-th block in A and B[i] be the position of this minimum value in A.
A refers to the L array in this sentence, so the minimum value would always be +1 or -1, and there are going to be multiple +1s and -1s. It confuses me since I don't think this makes the calculation easier.
The fourth step,
Now, we preprocess A' using the ST algorithm described in Section 1. This will take O(N/l * log(N/l)) = O(N) time and space.
If A' only keeps records of +1s and -1s, it would seem useless to do anything with it.
The last step,
To index table P, preprocess the type of each block in A and store it in array T[1, N/l]. The block type is a binary number obtained by replacing -1 with 0 and +1 with 1.
What does it mean? To calculate each kind of combination? Like 000, 001, ...?
It looks like multiple questions, but I was hoping that someone could just walk me through these last steps. Thanks!

Hopefully this helps explain things.
A refers to the L array in this sentence, so the minimum value would always be +1 or -1, and there are going to be multiple +1s and -1s. It confuses me since I don't think this makes the calculation easier.
I think that the author is mixing up terms here. In this case, I believe that array A refers to the array of original values before they've been preprocessed into -1's and +1's. These values are good to have lying around, since having the minimum value computed for each block of the original array makes it a lot faster to do RMQ. More on that later. For now, don't worry about the +1 and -1 values. They come into play later.
If A' only keeps records of +1s and -1s, it would seem useless to do anything with it.
That's true. However, here A' holds the minimum values from each block before they've been preprocessed into -1 and +1 values, so this actually is an interesting problem to solve. Again, the -1 and +1 steps haven't come into play yet.
To index table P, preprocess the type of each block in A and store it in array T[1, N/l]. The block type is a binary number obtained by replacing -1 with 0 and +1 with 1.
This is where the -1 and +1 values come in. The key idea behind this step is that with small block sizes, there aren't very many possible combinations of -1's and +1's in a block. For example, if the block size is 3, then the possible blocks are
---
--+
-+-
-++
+--
+-+
++-
+++
Here, I'm using + and - to mean +1 and -1.
The article you're reading gives the following trick. Rather than using -1 and +1, use binary 0 and 1. This means the possible blocks are
000 = 0
001 = 1
010 = 2
011 = 3
100 = 4
101 = 5
110 = 6
111 = 7
The advantage of this scheme is twofold. First, since there are only finitely many blocks, it's possible to precompute, for each possible block, the RMQ answer for any pair of indices within that block. Second, since each block can be interpreted as an integer, it's possible to store the answers to these questions in an array keyed by integers, where each integer is what you get by converting the block's -1 and +1 values into 0s and 1s.
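To make that concrete, here's a minimal Python sketch of the precomputation (the names, like precompute_block_tables, are mine, not the tutorial's): it enumerates every possible ±1 block, interprets it as an integer type, and stores a full in-block RMQ answer table for each type.

from itertools import product

def precompute_block_tables(block_size):
    # For every possible +1/-1 block, encoded as a binary number
    # (-1 -> 0, +1 -> 1), build rmq[i][j]: the in-block position of
    # the minimum prefix value over indices i..j.
    tables = {}
    for bits in product((0, 1), repeat=block_size):
        block_type = int("".join(map(str, bits)), 2)
        prefix, val = [], 0
        for b in bits:                    # each bit is a +1 or -1 step
            val += 1 if b else -1
            prefix.append(val)
        rmq = [[0] * block_size for _ in range(block_size)]
        for i in range(block_size):
            best = i
            for j in range(i, block_size):
                if prefix[j] < prefix[best]:
                    best = j
                rmq[i][j] = best
        tables[block_type] = rmq
    return tables

Since there are only 2^l block types and l is O(log N), this whole table is tiny, which is exactly why the trick works.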
Hope this helps!

Related

Calculate the number of distinct possible values of the array after the repeated process of absolute difference between any 2 elements

This was a question asked in a mock test, so I could not find any online evaluator for it.
Basically, you are provided with an array of elements. You can take any 2 elements and add their absolute difference back to the array.
The solution should be the number of distinct values in the array after repeating the above step an infinite number of times.
Example 1-
Input [2,3,4,5]
Output - 5
Explanation: Since 3-2=1, this can be added to the set. So the total number of distinct values will be 5.
Example 2-
Input [1,100]
Output - 100
Explanation - 100-1 = 99 -> add this back to the set
Then, 99-1 = 98 -> add this back to the set.
After repeating the process, all the numbers from 1 to 100 will be present in the set.
I used a hash set to store the distinct elements: I stored the initial array elements and used 2 for loops to compute the differences, storing the distinct results back into the same set.
But only after submitting my code did I realize that the newly added results should themselves be used for further subtraction.
Hence my answer was wrong.
Can anyone help me solve this ? Thanks in advance.
Edit: correcting the solution of the first example to 5 instead of 6.
Adding explanation for maximum(array)/gcd(array) as pointed out in the comments.
gcd(array) - the greatest common divisor of all the numbers in the array is calculated, i.e. the largest number that divides all the values in the array. It is known that division is nothing but repeated subtraction. So, no matter how many times we find absolute differences of the numbers in the array, the GCD of the numbers is the smallest we can reduce the difference to. For instance, if the array is [2, 10], the array can only become [2, 4, 6, 8, 10]. Other numbers in this range will never get added; that is, an absolute difference of 1 never occurs.
maximum(array) - we are finding difference here, so at any point, the difference cannot exceed the maximum value.
Therefore, the maximum(array)/gcd(array) formula gives the right output for the question.
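As a minimal sketch of that formula in Python (function name is my own):

from functools import reduce
from math import gcd

def count_distinct(arr):
    # Every reachable value is a multiple of gcd(arr), and the
    # maximum never grows, so the answer is max(arr) // gcd(arr).
    return max(arr) // reduce(gcd, arr)

print(count_distinct([2, 3, 4, 5]))  # 5
print(count_distinct([1, 100]))      # 100
print(count_distinct([2, 10]))       # 5, i.e. {2, 4, 6, 8, 10}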
Hope this helps!

Solving ACM ICPC - SEERC 2009

I have been sitting on this for almost a week now. Here is the question in a PDF format.
I could only think of one idea so far but it failed. The idea was to recursively create all connected subgraphs which works in O(num_of_connected_subgraphs), but that is way too slow.
I would really appreciate someone giving me a direction. I'm inclined to think that the only way is dynamic programming, but I can't seem to figure out how to do it.
OK, here is a conceptual description for the algorithm that I came up with:
Form an array of the (x,y) board map from -7 to 7 in both dimensions and place the opponent's pieces on it.
Starting with the first row (lowest Y value, -N):
enumerate all possible combinations of the 2nd player's pieces on the row, eliminating only those that conflict with the opponent's pieces.
for each combination on this row:
--group connected pieces into separate networks and number these networks starting with 1, ascending
--encode the row as a vector using:
= 0 for any unoccupied or opponent position
= (1-8) for the network group that that piece/position is in.
--give each such grouping a COUNT of 1, and add it to a dictionary/hashset using the encoded vector as its key
Now, for each succeeding row, in ascending order {y=y+1}:
For every entry in the previous row's dictionary:
--If the entry has exactly 1 group, add its COUNT to TOTAL
--enumerate all possible combinations of the 2nd player's pieces on the current row, eliminating only those that conflict with the opponent's pieces. (change:) you should skip the initial combination (where all entries are zero) for this step, as the step above actually covers it. For each such combination on the current row:
+ produce a grouping vector as described above
+ compare the current row's group-vector to the previous row's group-vector from the dictionary:
++ if there are any group-*numbers* from the previous row's vector that are not adjacent to any groups in the current row's vector, *for at least one value of X*, then skip to the next combination.
++ any groups for the current row that are adjacent to any groups of the previous row acquire the lowest such group number
++ any groups for the current row that are not adjacent to any groups of the previous row are assigned an unused group number
+ Re-normalize the group-number assignments for the current row's combination (**) and encode the vector, giving it a COUNT equal to the previous row-vector's COUNT
+ Add the current row's vector to the dictionary for the current row, using its encoded vector as the key. If it already exists, then add its COUNT to the COUNT for the pre-existing entry
Finally, for every entry in the dictionary for the last row:
If the entry has exactly one group, then add its COUNT to TOTAL
**: Re-normalizing simply means re-assigning the group numbers so as to eliminate any permutations in the grouping pattern. Specifically, new group numbers should be assigned in increasing order, from left to right, starting from one. So for example, if your grouping vector looked like this after grouping it to the previous row:
2 0 5 5 0 3 0 5 0 7 ...
it should be re-mapped to this normal form:
1 0 2 2 0 3 0 2 0 4 ...
Note that as in this example, after the first row, the groupings can be discontiguous. This relationship must be preserved, so the two groups of "5"s are re-mapped to the same number ("2") in the re-normalization.
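Re-normalization is mechanical enough that a short sketch may help. This is my own Python version, not code from the original post:

def normalize(groups):
    # Re-map group numbers to 1, 2, 3, ... in order of first
    # appearance from left to right; 0 (empty/opponent) stays 0.
    # Repeated occurrences of the same old number keep mapping to
    # the same new number, preserving discontiguous groups.
    mapping = {}
    result = []
    for g in groups:
        if g == 0:
            result.append(0)
        else:
            if g not in mapping:
                mapping[g] = len(mapping) + 1
            result.append(mapping[g])
    return result

print(normalize([2, 0, 5, 5, 0, 3, 0, 5, 0, 7]))
# [1, 0, 2, 2, 0, 3, 0, 2, 0, 4]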
OK, a couple of notes:
A. I think that this approach is correct, but I am really not certain, so it will definitely need some vetting, etc.
B. Although it is long, it's still pretty sketchy. Each individual step is non-trivial in itself.
C. Although there are plenty of individual optimization opportunities, the overall algorithm is still pretty complicated. It is a lot better than brute-force, but even so, my back-of-the-napkin estimate is still around (2.5 to 10)*10^11 operations for N=7.
So it's probably tractable, but still a long way off from doing 74 cases in 3 seconds. I haven't read all of the detail of Peter de Revaz's answer, but his idea of rotating the "diamond" might be workable for my algorithm. Although it would increase the complexity of the inner loop, it may drop the size of the dictionaries (and thus, the number of grouping-vectors to compare against) by as much as 100x, though it's really hard to tell without actually trying it.
Note also that there isn't any dynamic programming here. I couldn't come up with an easy way to leverage it, so that might still be an avenue for improvement.
OK, I enumerated all possible valid grouping-vectors to get a better estimate of (C) above, which lowered it to O(3.5*10^9) for N=7. That's much better, but still about an order of magnitude over what you probably need to finish 74 tests in 3 seconds. That does depend on the tests though, if most of them are smaller than N=7, it might be able to make it.
Here is a rough sketch of an approach for this problem.
First note that the lattice points need |x|+|y| < N, which results in a diamond shape going from coordinates (0,6) to (6,0), i.e. with 7 points on each side.
If you imagine rotating this diamond by 45 degrees, you will end up with a 7*7 square lattice which may be easier to think about. (Although note that there are also intermediate 6-high columns.)
For example, for N=3 the original lattice points are:
..A..
.BCD.
EFGHI
.JKL.
..M..
which rotates to
A D I
 C H
B G L
 F K
E J M
On the (possibly rotated) lattice I would attempt to solve by dynamic programming the problem of counting the number of ways of placing armies in the first x columns such that the last column is a certain string (plus a boolean flag to say whether some points have been placed yet).
The string contains a digit for each lattice point.
0 represents an empty location
1 represents an isolated point
2 represents the first of a new connected group
3 represents an intermediate in a connected group
4 represents the last in a connected group
During the algorithm the strings can represent shapes containing multiple connected groups, but we reject any transformations that leave an orphaned connected group.
When you have placed all columns you need to only count strings which have at most one connected group.
For example, the string for the first 5 columns of the shape below is:
....+ = 2
..+++ = 3
..+.. = 0
..+.+ = 1
..+.. = 0
..+++ = 3
..+++ = 4
The middle + is currently unconnected, but may become connected by a later column, so it still needs to be tracked. (In this diagram I am also assuming an up/down/left/right 4-connectivity. The rotated lattice should really use a diagonal connectivity, but I find that a bit harder to visualise and I am not entirely sure it is still a valid approach with this connectivity.)
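As an illustration only (my own sketch, not Peter's code), here is one possible way to produce the digit string for a column, assuming you already know, for each occupied cell in the column, the id of the connected group it belongs to in the whole prefix shape:

def encode_column(groups):
    # groups: one entry per lattice point in the column; 0 for an
    # empty location, otherwise the id of the connected group the
    # occupied point belongs to (within the prefix shape so far).
    first, last = {}, {}
    for r, g in enumerate(groups):
        if g:
            first.setdefault(g, r)
            last[g] = r
    digits = []
    for r, g in enumerate(groups):
        if g == 0:
            digits.append(0)                  # empty location
        elif first[g] == r and last[g] == r:
            digits.append(1)                  # isolated point
        elif first[g] == r:
            digits.append(2)                  # first of a group
        elif last[g] == r:
            digits.append(4)                  # last of a group
        else:
            digits.append(3)                  # intermediate
    return digits

# The example column above: rows 0, 1, 5, 6 share one group, row 3 is alone
print(encode_column([1, 1, 0, 2, 0, 1, 1]))   # [2, 3, 0, 1, 0, 3, 4]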
I appreciate that this answer is not complete (and could do with lots more pictures/explanation), but perhaps it will prompt someone else to provide a more complete solution.

Range Minimum Query <O(n), O(1)> approach (Query)

Continued from my last two questions, "Range Minimum Query approach (from tree to restricted RMQ)" and "Range Minimum Query approach (Last steps)"
I followed this tutorial on TopCoder, and the approach is introduced in the last section.
Now assuming I have everything done, and I am ready for query. According to the tutorial, this is what I should do:
i and j are in the same block, so we use the value computed in P and T
For example, if there's a block like this:
000111
The minimum value lies of course in the third 0, but if i and j are, say, 4 and 6, the third 0 won't lie in the queried range. Is my understanding wrong?
i and j are in different blocks, so we compute three values: the minimum from i to the end of i's block using P and T, the minimum of all blocks between i's and j's block using precomputed queries on A', and the minimum from the beginning of j's block to j, again using T and P; finally return the position where the overall minimum is using the three values you just computed.
Why compute the minimum from i to the end of i's block and the minimum from the start of j's block to j? Doesn't the answer to both lie outside of i...j? Also, how do you do that if it's not an exact fit, just like in the last question?
The minimum value lies of course in the third 0, but if i and j are, say, 4 and 6, the third 0 won't lie in the queried range. Is my understanding wrong?
The idea is to precompute the RMQ for all pairs of indices in every possible block. As a result, regardless of what indices you query within that block, you should always be able, in O(1) time, to read off the RMQ of the two values within the block. In the case you listed in your question, the fact that indices 4 and 6 don't contain the block minimum is true but irrelevant. You'll already have the RMQ precomputed for indices 4 and 6.
Why compute the minimum from i to the end of i's block and the minimum from the start of j's block to j? Doesn't the answer to both lie outside of i...j? Also, how do you do that if it's not an exact fit, just like in the last question?
Consider this picture:
+------+------+------+------+------+------+
| ?i?? | ???? | ???? | ???? | ??j? | ???? |
+------+------+------+------+------+------+
   ^                            ^
   i                            j
If you want to solve RMQ(i, j), then the minimum could be in one of three places:
In the same block as i, at an index from the position of i within its block to the end of its block,
In the same block as j, at an index from 0 to the position of j within its block, or
Somewhere in one of the middle three blocks.
The algorithm works by using the precomputed tables to solve the problem in the first two cases, then using the other algorithm to solve it for the third case. The minimum of these three should be your answer.
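To make the case analysis concrete, here's a hedged Python sketch of the query. The structure names (T, block_tables, argmin_block) are my own stand-ins for the tutorial's P, T, and the sparse table on A', not its actual API:

def rmq(A, i, j, bsize, T, block_tables, argmin_block):
    # A: underlying array; T[b]: type of block b; block_tables[t][x][y]:
    # in-block argmin for block type t; argmin_block(lo, hi): index of
    # the minimum block among A'[lo..hi] (the sparse table from step 4).
    bi, bj = i // bsize, j // bsize
    if bi == bj:
        # same block: a single table lookup
        return bi * bsize + block_tables[T[bi]][i % bsize][j % bsize]
    candidates = [
        # from i to the end of i's block
        bi * bsize + block_tables[T[bi]][i % bsize][bsize - 1],
        # from the beginning of j's block to j
        bj * bsize + block_tables[T[bj]][0][j % bsize],
    ]
    if bi + 1 <= bj - 1:
        # full blocks strictly between i's and j's blocks
        mb = argmin_block(bi + 1, bj - 1)
        candidates.append(mb * bsize + block_tables[T[mb]][0][bsize - 1])
    return min(candidates, key=lambda k: A[k])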
Hope this helps! This is by no means an easy algorithm, so please feel free to ask more questions here if you need help!

Returning i-th combination of a bit array

Given a bit array of fixed length and the number of 0s and 1s it contains, how can I arrange all possible combinations such that returning the i-th combinations takes the least possible time?
It is not important the order in which they are returned.
Here is an example:
array length = 6
number of 0s = 4
number of 1s = 2
possible combinations (6! / 4! / 2!)
000011 000101 000110 001001 001010
001100 010001 010010 010100 011000
100001 100010 100100 101000 110000
problem
1st combination = 000011
5th combination = 001010
9th combination = 010100
With a different arrangement such as
100001 100010 100100 101000 110000
001100 010001 010010 010100 011000
000011 000101 000110 001001 001010
it shall return
1st combination = 100001
5th combination = 110000
9th combination = 010100
Currently I am using an O(n) algorithm which tests each bit for whether it is a 1 or 0. The problem is I need to handle lots of very long arrays (on the order of 10000 bits), and so it is still very slow (and caching is out of the question). I would like to know if you think a faster algorithm may exist.
Thank you
I'm not sure I understand the problem, but if you only want the i-th combination without generating the others, here is a possible algorithm:
There are C(M,N) = M!/(N!(M-N)!) combinations of N bits set to 1 whose highest set bit is at position at most M.
You want the i-th: you iteratively increment M until C(M,N) >= i
while( C(M,N) < i ) M = M + 1
That will tell you the highest bit that is set.
Of course, you compute the combination iteratively with
C(M+1,N) = C(M,N)*(M+1)/(M+1-N)
Once found, you have the problem of finding the (i-C(M-1,N))-th combination of N-1 bits, so you can apply a recursion on N...
Here is a possible variant with D = C(M+1,N) - C(M,N), and i = i - 1 to make it start at zero:
SOL = 0
i = i - 1
while (N > 0)
    M = N
    C = 1
    D = 1
    while (i >= D)
        i = i - D
        M = M + 1
        D = N * C / (M - N)
        C = C + D
    SOL = SOL + (1 << (M - 1))
    N = N - 1
RETURN SOL
This will require large integer arithmetic if you have that many bits...
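Here is a minimal Python sketch of this unranking idea, using math.comb for the big-integer binomials (the function name is mine). It reproduces the question's first listing for the 6-choose-2 example:

from math import comb

def ith_combination(i, n_ones):
    # Return the i-th (1-indexed) bitmask with n_ones set bits.
    i -= 1                       # zero-based rank
    sol, n = 0, n_ones
    while n > 0:
        m = n                    # smallest possible highest-bit position
        while comb(m, n) <= i:   # grow M until C(M,N) > i
            m += 1
        i -= comb(m - 1, n)      # skip combinations with a lower highest bit
        sol |= 1 << (m - 1)      # bit m (1-indexed) is set
        n -= 1                   # recurse on the remaining N-1 bits
    return sol

# Matches the example: 1st, 5th and 9th of "6 bits, 2 ones"
print(format(ith_combination(1, 2), "06b"))  # 000011
print(format(ith_combination(5, 2), "06b"))  # 001010
print(format(ith_combination(9, 2), "06b"))  # 010100

Each bit costs a short search, so the whole thing is roughly O(array length) big-integer operations per query, with no table of all combinations.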
If the ordering doesn't matter (it just needs to remain consistent), I think the fastest thing to do would be to have combination(i) return anything you want that has the desired density the first time combination() is called with argument i. Then store that value in a member variable (say, a hashmap that has the value i as key and the combination you returned as its value). The second time combination(i) is called, you just look up i in the hashmap, figure out what you returned before and return it again.
Of course, when you're returning the combination for argument(i), you'll need to make sure it's not something you have returned before for some other argument.
If the number you will ever be asked to return is significantly smaller than the total number of combinations, an easy implementation for the first call to combination(i) would be to make a value of the right length with all 0s, randomly set num_ones of the bits to 1, and then make sure it's not one you've already returned for a different value of i.
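As a sketch of that memoization idea (hypothetical names, Python; only sensible when the number of queried i's is far below the total count, as stated above):

import random

_cache = {}    # i -> combination previously returned for i
_used = set()  # combinations already handed out

def combination(i, length, num_ones):
    # Return any mask with num_ones set bits, but always the same one
    # for a given i, and never the same mask for two different i's.
    if i in _cache:
        return _cache[i]
    while True:
        mask = 0
        for pos in random.sample(range(length), num_ones):
            mask |= 1 << pos
        if mask not in _used:
            break
    _used.add(mask)
    _cache[i] = mask
    return mask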
Your problem appears to be constrained by the binomial coefficient. In the example you give, the problem can be translated as follows:
there are 6 items that can be chosen 2 at a time. By using the binomial coefficient, the total number of unique combinations can be calculated as N! / (K! (N - K)!), which for the case of K = 2 simplifies to N(N-1)/2. Plugging 6 in for N, we get 15, which is the same number of combinations that you calculated with 6! / 4! / 2! - which appears to be another way to calculate the binomial coefficient that I have never seen before. I have tried other combinations as well and both formulas generate the same number of combinations. So, it looks like your problem can be translated to a binomial coefficient problem.
Given this, it looks like you might be able to take advantage of a class that I wrote to handle common functions for working with the binomial coefficient:
Outputs all the K-indexes in a nice format for any N choose K to a file. The K-indexes can be substituted with more descriptive strings or letters. This method makes solving this type of problem quite trivial.
Converts the K-indexes to the proper index of an entry in the sorted binomial coefficient table. This technique is much faster than older published techniques that rely on iteration. It does this by using a mathematical property inherent in Pascal's Triangle. My paper talks about this. I believe I am the first to discover and publish this technique, but I could be wrong.
Converts the index in a sorted binomial coefficient table to the corresponding K-indexes.
Uses the Mark Dominus method to calculate the binomial coefficient, which is much less likely to overflow and works with larger numbers.
The class is written in .NET C# and provides a way to manage the objects related to the problem (if any) by using a generic list. The constructor of this class takes a bool value called InitTable that when true will create a generic list to hold the objects to be managed. If this value is false, then it will not create the table. The table does not need to be created in order to perform the 4 above methods. Accessor methods are provided to access the table.
There is an associated test class which shows how to use the class and its methods. It has been extensively tested with 2 cases and there are no known bugs.
To read about this class and download the code, see Tablizing The Binomial Coefficient.
It should not be hard to convert this class to the language of your choice.
There may be some limitations since you are using a very large N that could end up creating larger numbers than the program can handle. This is especially true if K can be large as well. Right now, the class is limited to the size of an int. But, it should not be hard to update it to use longs.

Find the smallest set group to cover all combinatory possibilities

I'm doing some exercises on combinatorics algorithms and trying to figure out how to solve the question below:
Given a group of 25 bits, set (choose) 15 (combinations, so order does NOT matter):
n!/(k!(n-k)!) = 3,268,760
Now, for every one of these possibilities, construct a matrix where I cross every unique 25-bit member against all other 25-bit members, where in the relation between them there must be at least 11 common set bits (only ones, not zeroes).
Let me try to illustrate representing it as binary data, so the first member would be:
0000000000111111111111111 (10 zeros and 15 ones) or (15 bits set on 25 bits)
0000000001011111111111111 second member
0000000001101111111111111 third member
0000000001110111111111111 and so on....
...
1111111111111110000000000 up to here. The 3,268,760th member.
Now, crossing these values over a matrix, for 1 x 1 I must have 15 bits in common. Since the result is >= 11, it is a "useful" result.
For 1 x 2 we have 14 bits in common, so also a valid result.
Doing that for all members, finally, crossing 1 x 3,268,760 should result in 5 bits in common, so since it's < 11 it's not "useful".
What I need is to find out (by math or algorithm) the minimum number of members needed to cover all possibilities with at least 11 bits in common.
In other words, a group of N members that, if tested against all others, has at least 11 bits in common over the whole 3,268,760 x 3,268,760 universe.
Using a brute force algorithm I found out that with 81 25-bit members it is possible to achieve this. But I'm guessing that this number should be smaller (something near 12).
I was trying to use a brute force algorithm to generate all possible variations of 12 members over the 3,268,760, but the number of possibilities is so huge that it would take more than a hundred years to compute (3.156×10^69 combinations).
I've googled combinatorics, but there are so many fields that I don't know which one this problem fits in.
So any directions on which field of combinatorics, or any algorithm for this issue, are greatly appreciated.
PS: Just for reference. The "likeness" of two members is calculated using:
(Not(a xor b)) and a
After that there's a small recursive loop to count the bits, giving the number of common bits.
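For what it's worth, the recursive counting loop isn't needed in languages with a population-count primitive. A Python sketch (and note that (Not(a xor b)) and a is just a and b):

def common_ones(a, b):
    # ~(a ^ b) & a keeps exactly the positions where both a and b
    # are 1, i.e. it equals a & b; counting its set bits gives the
    # "likeness" directly (int.bit_count needs Python 3.10+).
    return ((~(a ^ b)) & a).bit_count()

print(common_ones(0b0000000000111111111111111,
                  0b0000000001011111111111111))  # 14, as in the 1 x 2 case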
EDIT: As promised (@btilly) in the comment below, here's the 'fractal' image of the relations (link to image).
The color scale ranges from red (15bits match) to green (11bits match) to black for values smaller than 10bits.
This image is just sample of the 4096 first groups.
tl;dr: you want to solve dominating set on a large, extremely symmetric graph. btilly is right that you should not expect an exact answer. If this were my problem, I would try local search starting with the greedy solution. Pick one set and try to get rid of it by changing the others. This requires data structures to keep track of which sets are covered exactly once.
EDIT: Okay, here's a better idea for a lower bound. For every k from 1 to the value of the optimal solution, there's a lower bound of [25 choose 15] * k / [maximum joint coverage of k sets]. Your bound of 12 (actually 10 by my reckoning, since you forgot some neighbors) corresponds to k = 1. Proof sketch: fix an arbitrary solution with m sets and consider the most coverage that can be obtained by k of the m. Build a fractional solution where all symmetries of the chosen k are averaged together and scaled so that each element is covered once. The cost of this solution is [25 choose 15] * k / [maximum joint coverage of those k sets], which is at least as large as the lower bound we're shooting for. It's still at least as small, however, as the original m-set solution, as the marginal returns of each set are decreasing.
Computing maximum coverage is in general hard, but there's a factor (e/(e-1))-approximation (≈ 1.58) algorithm: greedy, which it sounds as though you could implement quickly (note: you need to choose the set that covers the most uncovered other sets each time). By multiplying the greedy solution by e/(e-1), we obtain an upper bound on the maximum coverage of k elements, which suffices to power the lower bound described in the previous paragraph.
Warning: if this upper bound is larger than [25 choose 15], then k is too large!
This type of problem is extremely hard; you should not expect to be able to find the exact answer.
A greedy solution should produce a "fairly good" answer. But... how to be greedy?
The idea is to always choose the next element to be the one that is going to match as many currently unmatched possibilities as you can. Unfortunately, with over 3 million possible members that you have to try to match against millions of unmatched members (note: your best next guess might already match another member in your candidate set), even choosing that next element is probably not feasible.
So we'll have to be greedy about choosing the next element. We will choose each bit to maximize the sum of the probabilities of eventually matching all of the currently unmatched elements.
For that we will need a 2-dimensional lookup table P such that P(n, m) is the probability that two random members will turn out to have at least 11 bits in common, if m of the first n bits that are 1 in the first member are also 1 in the second. This table of 225 probabilities should be precomputed.
This table can easily be computed using the following rules:
P(15, m) is 0 if m < 11, 1 otherwise.
For n < 15:
P(n, m) = P(n+1, m+1) * (15-m) / (25-n) + P(n+1, m) * (10-n+m) / (25-n)
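A short sketch of that precomputation (my own code), restricting the loop to the reachable states, since the second member has only 10 zeros and hence m >= n - 10:

def build_P():
    # P[n][m]: probability that two random 15-of-25 members share at
    # least 11 set bits, given that m of the first n ones of the first
    # member are also ones in the second (recurrence from above).
    P = [[0.0] * 16 for _ in range(16)]
    for m in range(16):
        P[15][m] = 1.0 if m >= 11 else 0.0
    for n in range(14, -1, -1):
        for m in range(max(0, n - 10), n + 1):
            P[n][m] = (P[n + 1][m + 1] * (15 - m) / (25 - n)
                       + P[n + 1][m] * (10 - n + m) / (25 - n))
    return P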
Now let's start with a few members that are "very far" from each other. My suggestion would be:
First 15 bits 1, rest 0.
First 10 bits 0, rest 1.
First 8 bits 1, last 7 1, rest 0.
Bits 1-4, 9-12, 16-23 are 1, rest 0.
Now starting with your universe of (25 choose 15) members, eliminate all of those that match one of the elements in your initial collection.
Next we go into the heart of the algorithm.
While there are unmatched members:
    Find the bit that appears in the most unmatched members (break ties randomly)
    Make that the first set bit of our candidate member for the group.
    While the candidate member has less than 15 set bits:
        Let p_best = 0, bit_best = 0
        For each unset bit:
            Let p = 0
            For each unmatched member:
                p += P(n, m) where m = number of bits in common between
                     candidate member + this bit and the unmatched member,
                     and n = bits in candidate member + 1
            If p_best < p:
                p_best = p
                bit_best = this unset bit
        Set bit_best as the next bit in our candidate member.
    Add the candidate member to our collection
    Remove all unmatched members that match it from unmatched members
The list of candidate members is our answer
I have not written code, so I have no idea how good an answer this algorithm will produce. But assuming that it does no better than your current solution, for 77 candidate members (we cheated and started with 4) you have to make 271 passes through your unmatched candidates (25 to find the first bit, 24 to find the second, etc. down to 11 to find the 15th, and one more to remove the matched members). That's 20867 passes. If you have an average of 1 million unmatched members, that's on the order of 20 billion operations.
This won't be quick. But it should be computationally feasible.
