Ugly Number - Mathematical intuition for dp - algorithm

I am trying to find the "ugly" numbers: the sequence of numbers whose only prime factors are 2, 3, and 5.
I found a dynamic programming solution and wanted to understand how it works and what the mathematical intuition behind the logic is.
The algorithm is to keep three different counter variables for the multiples of 2, 3 and 5; let's call them i2, i3, and i5.
Declare an ugly array and initialize index 0 to 1, as the first ugly number is 1.
Initialize i2=i3=i5=0;
ugly[i] = min(ugly[i2]*2, ugly[i3]*3, ugly[i5]*5) and increment i2, i3, or i5, whichever index was chosen.
Dry run:
ugly = |1|
i2=0;
i3=0;
i5=0;
ugly[1] = min(ugly[0]*2, ugly[0]*3, ugly[0]*5) = 2
---------------------------------------------------
ugly = |1|2|
i2=1;
i3=0;
i5=0;
ugly[2] = min(ugly[1]*2, ugly[0]*3, ugly[0]*5) = 3
---------------------------------------------------
ugly = |1|2|3|
i2=1;
i3=1;
i5=0;
ugly[3] = min(ugly[1]*2, ugly[1]*3, ugly[0]*5) = 4
---------------------------------------------------
ugly = |1|2|3|4|
i2=2;
i3=1;
i5=0;
ugly[4] = min(ugly[2]*2, ugly[1]*3, ugly[0]*5) = 5
---------------------------------------------------
ugly = |1|2|3|4|5|
i2=2;
i3=1;
i5=1;
ugly[5] = min(ugly[2]*2, ugly[1]*3, ugly[1]*5) = 6
---------------------------------------------------
ugly = |1|2|3|4|5|6|
I am getting lost on how six is formed from 2's index. Can someone explain it in an easy way?

Every "ugly" number (except 1) can be formed by multiplying a smaller ugly number by 2, 3, or 5.
So let's say that the ugly numbers found so far are [1,2,3,4,5]. Based on that list we can generate three sequences of ugly numbers:
Multiplying by 2, the possible ugly numbers are [2,4,6,8,10]
Multiplying by 3, the possible ugly numbers are [3,6,9,12,15]
Multiplying by 5, the possible ugly numbers are [5,10,15,20,25]
But we already have 2, 3, 4, and 5 in the list, so we don't care about values less than or equal to 5. Let's mark those entries with a - to indicate that we don't care about them:
Multiplying by 2, the possible ugly numbers are [-,-,6,8,10]
Multiplying by 3, the possible ugly numbers are [-,6,9,12,15]
Multiplying by 5, the possible ugly numbers are [-,10,15,20,25]
And in fact, all we really care about is the smallest number in each sequence:
Multiplying by 2, the smallest number greater than 5 is 6
Multiplying by 3, the smallest number greater than 5 is 6
Multiplying by 5, the smallest number greater than 5 is 10
After adding 6 to the list of ugly numbers, each sequence has one additional element:
Multiplying by 2, the possible ugly numbers are [-,-,-,8,10,12]
Multiplying by 3, the possible ugly numbers are [-,-,9,12,15,18]
Multiplying by 5, the possible ugly numbers are [-,10,15,20,25,30]
But the elements from each sequence that are useful are:
Multiplying by 2, the smallest number greater than 6 is 8
Multiplying by 3, the smallest number greater than 6 is 9
Multiplying by 5, the smallest number greater than 6 is 10
So you can see that what the algorithm is doing is creating three sequences of ugly numbers. Each sequence is formed by multiplying all of the existing ugly numbers by one of the three factors.
But all we care about is the smallest number in each sequence (larger than the largest ugly number found so far).
So the indexes i2, i3, and i5 are the indexes into the corresponding sequences. When you use a number from a sequence, you update the index to point to the next number in that sequence.
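To make this concrete, here is a minimal Python sketch of the procedure just described (my own illustration, not code from the question):

def first_ugly(n):
    # generate the first n ugly numbers with the three-index dp
    ugly = [1]
    i2 = i3 = i5 = 0
    while len(ugly) < n:
        nxt = min(ugly[i2] * 2, ugly[i3] * 3, ugly[i5] * 5)
        ugly.append(nxt)
        # advance every index whose product matched, so a tie such as
        # 6 = 3*2 = 2*3 is appended only once
        if ugly[i2] * 2 == nxt:
            i2 += 1
        if ugly[i3] * 3 == nxt:
            i3 += 1
        if ugly[i5] * 5 == nxt:
            i5 += 1
    return ugly

print(first_ugly(10))   # [1, 2, 3, 4, 5, 6, 8, 9, 10, 12]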

The intuition is the following:
any ugly number can be written as the product of 2, 3, or 5 and another (smaller) ugly number.
With that in mind, the solution mentioned in the question keeps track of i2, i3 and i5: the indices of the smallest ugly numbers generated so far which, multiplied by 2, 3, and 5 respectively, lead to a number that was not already generated. The smallest of these products is the smallest ugly number that was not already generated.
To state this differently, I believe that the following statement from the question might be the source of some confusion:
The algorithm is to keep three different counter variable for a
multiple of 2, 3 and 5. Let's assume i2,i3, and i5.
Note, for example, that ugly[i2] is not necessarily a multiple of 2. It is simply the smallest ugly number for which 2 * ugly[i2] is greater than ugly[i] (the largest ugly number known so far).
Regarding how the number 6 is generated in the next step, the procedure is shown below:
ugly = |1|2|3|4|5|
i2 = 2;
i3 = 1;
i5 = 1;
ugly[5] = min(ugly[2]*2, ugly[1]*3, ugly[1]*5) = min(3*2, 2*3, 2*5) = 6
---------------------------------------------------
ugly = |1|2|3|4|5|6|
i2 = 3
i3 = 2
i5 = 1
Note that here both i2 and i3 need to be incremented after generating the number 6, because both ugly[i2]*2 and ugly[i3]*3 produced the same next smallest ugly number.

Related

How to find the xth decibinary number?

Hackerrank has a problem called Decibinary numbers: numbers whose digits are 0-9 but whose place values are powers of 2. The question asks us to display the xth decibinary number. There is another twist to the problem: multiple decibinary numbers can equal the same decimal number. For example, 4 in decimal can be 100, 20, 12, or 4 in decibinary.
At first, I thought that finding how many decibinary numbers for a given decimal number would be helpful.
I consulted this post for a bit of help (https://math.stackexchange.com/questions/3540243/whats-the-number-of-decibinary-numbers-that-evaluate-to-given-decimal-number). The post was a bit too hard to understand, but I also realized that even knowing how many decibinary numbers evaluate to a given decimal number doesn't help in FINDING them (at least to my knowledge), which is the original goal of the question.
I do realize that for any decimal number, the largest decibinary number for it is simply its binary representation. For example, for 4 it is 100. So the brute-force approach would be to check all numbers in this range for each decimal number and see if their decibinary representation evaluates to the given decimal number, but it is evident that this approach will never pass, since the input constraints define x to be from 1 to 10^16. Not only that, we have to find the xth decibinary number for q queries, where q is from 1 to 10^5.
This question falls under the dp section, but I am confused about how dp can even be used here. To calculate the xth decibinary number q times (as described in the brute-force method above) it would be better to use a table (as the problem suggests). But for that we would need to store and calculate 10^16 integers, since that is how big x can be. Assuming an integer is 4 bytes, 4B * 10^16 ~= 4B * (2^3)^16 = 2^50 bytes.
Can someone please explain how this problem is solved optimally. I am still new to CP so if I have made an error in something, please let me know.
(see link below for full problem statement):
https://www.hackerrank.com/challenges/decibinary-numbers/problem
This is solvable with about 80 MB of data. I won't give code, but I will explain the strategy.
Build a lookup count[n][i] that gives you the number of ways to get the decimal number n using the first i digits. You start by inserting 0 everywhere, and then put a 1 in count[0][0]. Now start filling in using the rule (the i-th digit has place value 2**(i-1)):
count[n][i] = count[n][i-1] + count[n - 2**(i-1)][i-1] + count[n - 2*2**(i-1)][i-1] + ... + count[n - 9*2**(i-1)][i-1]
It turns out that you only need the first 19 digits, and you only need counts of n up to 2**19-1. And the counts all fit in 8 byte longs.
Once you have that, create a second data structure count_below[n] which is the count of how many decibinary numbers will give a value less than n. Use the same range of n as before.
And now a lookup proceeds as follows. First you do a binary search on count_below to find the last value n that has fewer than your target number below it. Subtracting count_below[n] from your query, you know which decibinary number of that value you want.
Next, search through count[n][i] to find the i such that you get your target query with i digits, and not with fewer. This will be the position of the leading digit of your answer. You then subtract off count[n][i-1] from your query (all the decibinaries with fewer digits). Then subtract off count[n-2**(i-1)][i-1], count[n-2*2**(i-1)][i-1], ..., count[n-8*2**(i-1)][i-1] until you find what that leading digit is. Now you subtract the contribution of that digit from the value, and repeat the logic to find the correct decibinary for that smaller value with fewer digits.
Here is a worked example to clarify. First the data structures for the first 3 digits and up to 2**3 - 1:
count = [
[1, 1, 1, 1], # sum 0
[0, 1, 1, 1], # sum 1
[0, 1, 2, 2], # sum 2
[0, 1, 2, 2], # sum 3
[0, 1, 3, 4], # sum 4
[0, 1, 3, 4], # sum 5
[0, 1, 4, 6], # sum 6
[0, 1, 4, 6], # sum 7
]
count_below = [
0, 1, 2, 4, 6, 10, 14, 20, 26, ...
]
Let's find the 20th.
count_below[6] is 14 and count_below[7] is 20 so our decimal sum is 6.
We want the 20 - count_below[6] = 6th decibinary with decimal sum 6.
count[6][2] is 4 while count[6][3] is 6 so we have a non-zero third digit.
We want the 6 - count[6][2] = 2nd of those with a non-zero third digit.
count[6 - 2**2][2] = count[2][2] is 2, so 2 of them have third digit 1.
The third digit is 1
We are now looking for the second decibinary whose decimal sum is 2.
count[2][1] is 1 and count[2][2] is 2 so it has a non-zero second digit.
We want the 2 - count[2][1] = 1st of those with a non-zero second digit.
The second digit is 1
The rest is 0 because 2 - 2**1 = 0.
And thus you find that the answer is 110.
Now for such a small number, this was a lot of work. But even for your hardest lookup you'll only need about 20 steps of a binary search to find your decimal sum, another 20 steps to find the position of the first non-zero digit, and for each of those digits, you'll have to do 1-9 different calculations to find what that digit is. Which means only hundreds of calculations to find the number.
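The answer above deliberately gives no code, so purely as an illustration of the strategy, here is a hedged Python sketch (my own; it uses smaller constants than the real 19-digit tables so the demo runs instantly, and it folds the leading-digit search into a single digit-by-digit walk, which is equivalent because fixed-width numbers with leading zeros sort numerically):

from bisect import bisect_right

def build_tables(max_digits):
    # count[n][i] = ways to write decimal n using the first i decibinary digits
    # count_below[n] = how many decibinary numbers evaluate to a value < n
    max_sum = 2 ** max_digits
    count = [[0] * (max_digits + 1) for _ in range(max_sum)]
    count[0][0] = 1
    for i in range(1, max_digits + 1):
        place = 2 ** (i - 1)          # place value of the i-th digit
        for n in range(max_sum):
            count[n][i] = sum(count[n - d * place][i - 1]
                              for d in range(10) if n - d * place >= 0)
    count_below = [0] * (max_sum + 1)
    for n in range(max_sum):
        count_below[n + 1] = count_below[n] + count[n][max_digits]
    return count, count_below

def xth_decibinary(x, count, count_below, max_digits):
    # x is 1-indexed; binary search for the decimal value n of the answer
    n = bisect_right(count_below, x - 1) - 1
    x -= count_below[n]               # rank among numbers evaluating to n
    digits = []
    for pos in range(max_digits, 0, -1):
        place = 2 ** (pos - 1)
        for d in range(10):
            if n - d * place < 0:
                break
            c = count[n - d * place][pos - 1]
            if x <= c:                # the answer's digit at this position is d
                digits.append(str(d))
                n -= d * place
                break
            x -= c                    # skip all numbers with a smaller digit here
    return int(''.join(digits))

count, count_below = build_tables(10)   # the answer uses 19 digits for x up to 10**16
print(xth_decibinary(20, count, count_below, 10))   # -> 110, as in the example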

Split array into four boxes such that sum of XOR's of the boxes is maximum

Given an array of integers that needs to be split into four
boxes such that the sum of the XORs of the boxes is maximum.
I/P -- [1,2,1,2,1,2]
O/P -- 9
Explanation: Box1--[1,2]
Box2--[1,2]
Box3--[1,2]
Box4--[]
I've tried using recursion but failed for larger test cases as the
Time Complexity is exponential. I'm expecting a solution using dynamic
programming.
def max_Xor(b1, b2, b3, b4, A, index, size):
    if index == size:
        return b1 + b2 + b3 + b4
    # try putting A[index] into each of the four boxes
    m = max(max_Xor(b1 ^ A[index], b2, b3, b4, A, index + 1, size),
            max_Xor(b1, b2 ^ A[index], b3, b4, A, index + 1, size),
            max_Xor(b1, b2, b3 ^ A[index], b4, A, index + 1, size),
            max_Xor(b1, b2, b3, b4 ^ A[index], A, index + 1, size))
    return m

def main():
    A = [1, 2, 1, 2, 1, 2]
    print(max_Xor(0, 0, 0, 0, A, 0, len(A)))

main()
Thanks in Advance!!
There are several things to speed up your algorithm:
Build in some start-up logic: it doesn't make sense to put anything into box 3 until boxes 1 & 2 are differentiated. In fact, you should generally have an order of precedence to keep you from repeating configurations in a different order.
Memoize your logic; this avoids repeating computations.
For large cases, take advantage of what value algebra exists.
This last item may turn out to be the biggest saving. For instance, if your longest numbers include several 5-bit and 4-bit numbers, it makes no sense to consider shorter numbers until you've placed those decently in the boxes, gaining maximum advantage for the leading bits. With only four boxes, you cannot have a gain from 3-bit numbers that dominates a single misplaced 5-bit number.
Your goal is to place an odd number of 5-bit numbers into 3 or all 4 boxes; against this, check only whether this "pessimizes" bit 4 of the remaining numbers. For instance, given six 5-bit numbers (range 16-31) and a handful of small ones (0-7), your first consideration is to handle only combinations that partition the 5-bit numbers by (3, 1, 1, 1), as this leaves that valuable fifth bit turned on in each box.
With a more even mixture of values in your input, you'll also need to consider how to distribute the 4-bit numbers for a similar "keep it odd" heuristic. Note that, as you work from largest to smallest, you need worry only about keeping it odd, and watching the following bit.
These techniques should let you prune your recursion enough to finish in time.
We can use dynamic programming here to break the problem into smaller sets and store their results in a table, then use the already-stored results to calculate the answer for a bigger set.
For example:
Input -- [1,2,1,2,1,2]
We need to divide the array consecutively into 4 boxes such that the sum of the XOR of all boxes is maximised.
Let's take your test case, break the problem into smaller sets, and start solving for the smaller sets.
box = 1, num = [1,2,1,2,1,2]
ans = 1 3 2 0 1 3
Since we only have one box, all numbers go into it. We will store this answer in a table; let's call the matrix DP.
DP[0] = [1 3 2 0 1 3]
DP[i][j] stores the answer for distributing numbers 0..j among boxes 0..i.
Now let's take the case where we have two boxes, adding the numbers one by one.
num = [1]: since we only have one number, it goes into the first box.
DP[1][0] = 1
Let's add another number.
num = [1 2]
Now there are two ways to place the new number:
case 1: 2 goes into the first box. Since we already have the answer for both numbers in one box, we just use that:
answer = DP[0][1] + 0 (second box is empty)
case 2: 2 goes into the second box:
answer = DP[0][0] + 2 (only 2 is present in the second box)
Maximum of the two cases will be stored in DP[1][1].
DP[1][1] = max(3+0, 1+2) = 3.
Now for num = [1 2 1]. Again, for the new number we have three cases:
box1 = [1 2 1], box2 = [], DP[0][2] + 0
box1 = [1 2], box2 = [1], DP[0][1] + 1
box1 = [1], box2 = [2 1], DP[0][0] + (2 xor 1)
The maximum of these three will be the answer for DP[1][2].
Similarly we can find the answer for num = [1 2 1 2 1 2], boxes = 4:
1 3 2 0 1 3
1 3 4 6 5 3
1 3 4 6 7 9
1 3 4 6 7 9
Also note that a xor b xor a = b. You can use this property to get the XOR of a segment of the array in constant time, as suggested in the comments.
This way you can break the problem into smaller subsets and use the smaller-set answers to compute the bigger ones. Hope this helps. After understanding the concept you can go ahead and implement it with better time than exponential.
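Here is a hedged Python sketch of the DP just described (my own illustration; it assumes, as this answer does, that each box holds a consecutive segment and that boxes may be left empty):

def max_xor_sum(A, boxes=4):
    n = len(A)
    # prefix[i] = A[0] ^ ... ^ A[i-1]; since a xor b xor a = b,
    # the XOR of any segment A[i..j] is prefix[j+1] ^ prefix[i]
    prefix = [0] * (n + 1)
    for i, v in enumerate(A):
        prefix[i + 1] = prefix[i] ^ v
    def seg(i, j):
        return prefix[j + 1] ^ prefix[i]
    dp = [seg(0, j) for j in range(n)]   # one box: the row 1 3 2 0 1 3
    for b in range(2, boxes + 1):
        new = dp[:]                      # leaving box b empty is allowed
        for j in range(n):
            for k in range(j):           # box b takes the segment A[k+1..j]
                new[j] = max(new[j], dp[k] + seg(k + 1, j))
        dp = new
    return dp[n - 1]

print(max_xor_sum([1, 2, 1, 2, 1, 2]))   # 9, matching the table above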
I would go bit by bit from the highest bit to the lowest bit. For every bit, try all combinations that distribute the still unused numbers that have that bit set so that an odd number of them is in each box, nothing else matters. Pick the best path overall. One issue that complicates this greedy method is that two boxes with a lower bit set can equal one box with the next higher bit set.
Alternatively, memoize the boxes state in your recursion as an ordered tuple.
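For instance, a sketch of that memoization (my own illustration; sorting the box states lets all permutations of the boxes share one cache entry, which prunes the recursion considerably, though the state space can still grow large in the worst case):

from functools import lru_cache

def max_xor_memo(A):
    n = len(A)

    @lru_cache(maxsize=None)
    def go(index, boxes):
        # boxes is an ordered tuple of the four running XOR values
        if index == n:
            return sum(boxes)
        best = 0
        for b in range(4):
            nxt = list(boxes)
            nxt[b] ^= A[index]
            best = max(best, go(index + 1, tuple(sorted(nxt))))
        return best

    return go(0, (0, 0, 0, 0))

print(max_xor_memo([1, 2, 1, 2, 1, 2]))   # 9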

Dynamic Programming Implementation of Unique subset generation

I came across this problem where we need to find the total number of ways to form a subset of numbers that sums to n (where n is the input from the user). The subset conditions are: the numbers in the subset should be distinct, and the numbers in the subset should be in decreasing order.
Example: Given n = 7, the output is 4 because the possible combinations are (6,1), (5,2), (4,3), (4,2,1). Note that though (4,1,1,1) also adds up to 7, it has repeating numbers, hence it is not a valid subset.
I have solved this problem using the backtracking method, which has exponential complexity. However, I am trying to figure out how we can solve this problem using dynamic programming. The nearest I could come up with is the series: for n = 0,1,2,3,4,5,6,7,8,9,10 the output is 0,0,0,1,1,2,3,4,5,7,9 respectively. However, I am not able to come up with a general formula to calculate f(n). While going through the internet I came across this kind of integer sequence at https://oeis.org/search?q=0%2C0%2C0%2C1%2C1%2C2%2C3%2C4%2C5%2C7%2C9%2C11%2C14&sort=&language=&go=Search . But I am unable to understand the formula they have provided there. Any help would be appreciated. Any other insight that improves on the exponential complexity is also appreciated.
What you call "unique subset generation" is better known as integer partitions with distinct parts. Mathworld has an entry on the Q function, which counts the number of partitions with distinct parts.
However, your function does not count trivial partitions (i.e. n -> (n)), so what you're looking for is actually Q(n) - 1, which generates the sequence that you linked in your question--the number of partitions into at least 2 distinct parts. This answer to a similar question on Mathematics contains an efficient algorithm in Java for computing the sequence up to n = 200, which easily can be adapted for larger values.
Here's a combinatorial explanation of the referenced algorithm:
Let's start with a table of all the subsets of {1, 2, 3}, grouped by their sums:
index (sum) partitions count
0 () 1
1 (1) 1
2 (2) 1
3 (1+2), (3) 2
4 (1+3) 1
5 (2+3) 1
6 (1+2+3) 1
Suppose we want to construct a new table of all of the subsets of {1, 2, 3, 4}. Notice that every subset of {1, 2, 3} is also a subset of {1, 2, 3, 4}, so each subset above will appear in our new table. In fact, we can divide our new table into two equally sized categories: subsets which do not include 4, and those that do. What we can do is start from the table above, copy it over, and then "extend" it with 4. Here's the table for {1, 2, 3, 4}:
index (sum) partitions count
0 () 1
1 (1) 1
2 (2) 1
3 (1+2), (3) 2
4 (1+3), [4] 2
5 (2+3), [1+4] 2
6 (1+2+3),[2+4] 2
7 [1+2+4],[3+4] 2
8 [1+3+4] 1
9 [2+3+4] 1
10 [1+2+3+4] 1
All the subsets that include 4 are surrounded by square brackets, and they are formed by adding 4 to exactly one of the old subsets which are surrounded by parentheses. We can repeat this process and build the table for {1,2,..,5}, {1,2,..,6}, etc.
But we don't actually need to store the actual subsets/partitions, we just need the counts for each index/sum. For example, if we have the table for {1, 2, 3} with only counts, we can build the count-table for {1, 2, 3, 4} by taking each (index, count) pair from the old table, and adding count to the current count for index + 4. The idea is that, say, if there are two subsets of {1, 2, 3} that sum to 3, then adding 4 to each of those two subsets will make two new subsets for 7.
With this in mind, here's a Python implementation of this process:
def c(n):
    counts = [1]
    for k in range(1, n + 1):
        new_counts = counts[:] + [0]*k
        for index, count in enumerate(counts):
            new_counts[index + k] += count
        counts = new_counts
    return counts
We start with the table for {}, which has just one subset--the empty set-- which sums to zero. Then we build the table for {1}, then for {1, 2}, ..., all the way up to {1,2,..,n}. Once we're done, we'll have counted every partition for n, since no partition of n can include integers larger than n.
Now, we can make 2 major optimizations to the code:
Limit the table to only include entries up to n, since that's all we're interested in. If index + k exceeds n, then we just ignore it. We can even preallocate space for the final table up front, rather than growing bigger tables on each iteration.
Instead of building a new table from scratch on each iteration, if we carefully iterate over the old table backwards, we can actually update it in-place without messing up any of the new values.
With these optimizations, you effectively have the same algorithm from the Mathematics answer that was referenced earlier, which was motivated by generating functions. It runs in O(n^2) time and takes only O(n) space. Here's what it looks like in Python:
def c(n):
    counts = [1] + [0]*n
    for k in range(1, n + 1):
        for i in reversed(range(k, n + 1)):
            counts[i] += counts[i - k]
    return counts

def Q(n):
    "-> the number of subsets of {1,..,n} that sum to n."
    return c(n)[n]

def f(n):
    "-> the number of subsets of {1,..,n} of size >= 2 that sum to n."
    return Q(n) - 1
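As a quick sanity check, this reproduces the sequence from the question:

print([f(n) for n in range(11)])   # [0, 0, 0, 1, 1, 2, 3, 4, 5, 7, 9]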

Algorithm to count the number of valid blocks in a permutation [duplicate]

Possible Duplicate:
Finding sorted sub-sequences in a permutation
Given an array A which holds a permutation of 1,2,...,n. A sub-block A[i..j]
of the array A is called a valid block if all the numbers appearing in A[i..j]
are consecutive numbers (not necessarily in order).
Given the array A = [7 3 4 1 2 6 5 8], the valid blocks are [3 4], [1 2], [6 5],
[3 4 1 2], [3 4 1 2 6 5], [7 3 4 1 2 6 5], and [7 3 4 1 2 6 5 8].
So the count for the above permutation is 7.
Give an O( n log n) algorithm to count the number of valid blocks.
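(For reference, a brute-force counter that pins down the definition -- O(n^2), not the requested O(n log n). It uses the fact that a block of distinct values is consecutive exactly when its maximum minus its minimum equals its length minus one, and, matching the example, it counts only blocks of length at least 2:)

def count_valid_blocks(A):
    n, total = len(A), 0
    for i in range(n):
        lo = hi = A[i]
        for j in range(i + 1, n):
            lo, hi = min(lo, A[j]), max(hi, A[j])
            if hi - lo == j - i:     # values in A[i..j] are consecutive
                total += 1
    return total

print(count_valid_blocks([7, 3, 4, 1, 2, 6, 5, 8]))   # 7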
Ok, I am down to 1 rep because I put a 200 bounty on a related question: Finding sorted sub-sequences in a permutation,
so I cannot leave comments for a while.
I have an idea:
1) Locate all permutation groups. They are: (78), (34), (12), (65). Unlike in group theory, their order and position, and whether they are adjacent, matter. So, a group (78) can be represented as a structure (7, 8, false), while (34) would be (3, 4, true). I am using Python's notation for tuples, but it might actually be better to use a whole class for the group. Here true or false means contiguous or not. Two groups are "adjacent" if (max(gp1) == min(gp2) + 1 or max(gp2) == min(gp1) + 1) and contiguous(gp1) and contiguous(gp2). This is not the only condition for union(gp1, gp2) to be contiguous, because (14) and (23) combine into (14) nicely. This is a great question for algo class homework, but a terrible one for an interview. I suspect this is homework.
Just some thoughts:
At first sight, this sounds impossible: a fully sorted array would have O(n^2) valid sub-blocks.
So you would need to count more than one valid sub-block at a time. Checking the validity of a sub-block is O(n). Checking whether a sub-block is fully sorted is O(n) as well. A fully sorted sub-block contains n·(n - 1)/2 valid sub-blocks, which you can count without further breaking this sub-block up.
Now, the entire array is obviously always valid. For a divide-and-conquer approach, you would need to break this up. There are two conceivable breaking points: the location of the highest element, and that of the lowest element. If you break the array into two at one of these points, including the extremum in the part that contains the second-to-extreme element, there cannot be a valid sub-block crossing this break-point.
By always choosing the extremum that produces a more even split, this should work quite well (average O(n log n)) for "random" arrays. However, I can see problems when your input is something like (1 5 2 6 3 7 4 8), which seems to produce O(n^2) behaviour. (1 4 7 2 5 8 3 6 9) would be similar (I hope you see the pattern). I currently see no trick to catch this kind of worst case, but it seems that it requires other splitting techniques.
This question does involve a bit of a "math trick", but it's fairly straightforward once you get it. However, the rest of my solution won't fit the O(n log n) criterion.
The math portion:
For any two consecutive numbers, their sum is 2k+1 where k is the smallest element. For three it is 3k+3, for four it is 4k+6, and for N such numbers it is N·k + N(N-1)/2. Hence, you need two steps, which can be done simultaneously:
Create the sum of all the sub-arrays.
Determine the smallest element of a sub-array.
The dynamic programming portion
Build two tables using the results of the previous row's entries to build each successive row's entries. Unfortunately, I'm totally wrong as this would still necessitate n^2 sub-array checks. Ugh!
My proposition
STEP = 2 // number of examined elements
B [0,0,0,0,0,0,0,0]
B [1,1,0,0,0,0,0,0]
VALID(A,B) - if not valid move one
B [0,1,1,0,0,0,0,0]
VALID(A,B) - if valid move one and step
B [0,0,0,1,1,0,0,0]
VALID (A,B)
B [0,0,0,0,0,1,1,0]
STEP = 3
B [1,1,1,0,0,0,0,0] not ok
B [0,1,1,1,0,0,0,0] ok
B [0,0,0,0,1,1,1,0] not ok
STEP = 4
B [1,1,1,1,0,0,0,0] not ok
B [0,1,1,1,1,0,0,0] ok
.....
CON <- 0
STEP <- 2
i <- 0
j <- 0
WHILE(STEP <= LEN(A)) DO
    j <- STEP
    WHILE(STEP <= LEN(A) - j) DO
        IF(VALID(A,i,j)) DO
            CON <- CON + 1
            i <- j + 1
            j <- j + STEP
        ELSE
            i <- i + 1
            j <- j + 1
        END
    END
    STEP <- STEP + 1
END
The VALID method checks that all the elements are consecutive.
Never tested, but it might be ok.
The original array doesn't contain duplicates, so it must itself be a consecutive block. Let's call this block (1 ~ n). We can test whether block (2 ~ n) is consecutive by checking if the first element is 1 or n, which is O(1). Likewise we can test block (1 ~ n-1) by checking whether the last element is 1 or n.
I can't quite mould this into a solution that works but maybe it will help someone along...
Like everybody else, I'm just throwing this out ... it works for the single example below, but YMMV!
The idea is to count the number of illegal sub-blocks, and subtract this from the total possible number. We count the illegal ones by examining each array element in turn and ruling out sub-blocks that include the element but not its predecessor or successor.
Foreach i in [1,N], compute B[A[i]] = i.
Let Count = the total number of sub-blocks with length>1, which is N-choose-2 (one for each possible combination of starting and ending index).
Foreach i, consider A[i]. Ignoring edge cases, let x=A[i]-1, and let y=A[i]+1. A[i] cannot participate in any sub-block that does not include x or y. Let iX=B[x] and iY=B[y]. There are several cases to be treated independently here. The general case is that iX < i < iY. In this case, we can eliminate the sub-block A[iX+1 .. iY-1] and all intervening blocks containing i. There are (i - iX + 1) * (iY - i + 1) such sub-blocks, so call this number Eliminated. (Other cases left as an exercise for the reader, as are those edge cases.) Set Count = Count - Eliminated.
Return Count.
The total cost appears to be N * (cost of step 3) = O(N).
WRINKLE: In step 3, we must be careful not to eliminate each sub-interval more than once. We can accomplish this by only eliminating sub-intervals that lie fully or partly to the right of position i.
Example:
A = [1, 3, 2, 4]
B = [1, 3, 2, 4]
Initial count = (4*3)/2 = 6
i=1: A[i]=1, so need sub-blocks with 2 in them. We can eliminate [1,3] from consideration. Eliminated = 1, Count -> 5.
i=2: A[i]=3, so need sub-blocks with 2 or 4 in them. This rules out [1,3] but we already accounted for it when looking right from i=1. Eliminated = 0.
i=3: A[i] = 2, so need sub-blocks with [1] or [3] in them. We can eliminate [2,4] from consideration. Eliminated = 1, Count -> 4.
i=4: A[i] = 4, so we need sub-blocks with [3] in them. This rules out [2,4] but we already accounted for it when looking right from i=3. Eliminated = 0.
Final Count = 4, corresponding to the sub-blocks [1,3,2,4], [1,3,2], [3,2,4] and [3,2].
(This is an attempt to do this N.log(N) worst case. Unfortunately it's wrong -- it sometimes undercounts. It incorrectly assumes you can find all the blocks by looking at only adjacent pairs of smaller valid blocks. In fact you have to look at triplets, quadruples, etc, to get all the larger blocks.)
You do it with a struct that represents a subblock and a queue for subblocks.
struct c_subblock
{
    int index;      /* index into original array, head of subblock */
    int width;      /* width of subblock > 0 */
    int lo_value;
    c_subblock * p_above;   /* null or subblock above with same index */
};
Alloc an array of subblocks the same size as the original array, and init each subblock to have exactly one item in it. Add them to the queue as you go. If you start with array [ 7 3 4 1 2 6 5 8 ] you will end up with a queue like this:
queue: ( [7,7] [3,3] [4,4] [1,1] [2,2] [6,6] [5,5] [8,8] )
The { index, width, lo_value, p_above } values for subblock [7,7] will be { 0, 1, 7, null }.
Now it's easy. Forgive the c-ish pseudo-code.
loop {
    c_subblock * const p_left = Pop subblock from queue.
    int const right_index = p_left.index + p_left.width;
    if ( right_index < length original array ) {
        // Find adjacent subblock on the right.
        // To do this you'll need the original array of length-1 subblocks.
        c_subblock const * p_right = array_basic_subblocks[ right_index ];
        do {
            Check the left/right subblocks to see if the two merged are also a subblock.
            If they are, add a new merged subblock to the end of the queue.
            p_right = p_right.p_above;
        } while ( p_right );
    }
}
This will find them all I think. It's usually O(N log(N)), but it'll be O(N^2) for a fully sorted or anti-sorted list. I think there's an answer to this though -- when you build the original array of subblocks you look for sorted and anti-sorted sequences and add them as the base-level subblocks. If you are keeping a count increment it by (width * (width + 1))/2 for the base-level. That'll give you the count INCLUDING all the 1-length subblocks.
After that just use the loop above, popping and pushing the queue. If you're counting you'll have to have a multiplier on both the left and right subblocks and multiply these together to calculate the increment. The multiplier is the width of the leftmost (for p_left) or rightmost (for p_right) base-level subblock.
Hope this is clear and not too buggy. I'm just banging it out, so it may even be wrong.
[Later note. This doesn't work after all. See note below.]

Random number generator that fills an interval

How would you implement a random number generator that, given an interval, (randomly) generates all numbers in that interval, without any repetition?
It should consume as little time and memory as possible.
Example in a just-invented C#-ruby-ish pseudocode:
interval = new Interval(0,9)
rg = new RandomGenerator(interval);
count = interval.Count // equals 10
count.times.do{
print rg.GetNext() + " "
}
This should output something like :
1 4 3 2 7 5 0 9 8 6
Fill an array with the interval, and then shuffle it.
The standard way to shuffle an array of N elements is to pick a random number R between 0 and N-1 (inclusive), and swap item[R] with item[N-1]. Then subtract one from N, and repeat until you reach N = 1.
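A minimal Python transcription of that (my own sketch):

import random

def shuffled_interval(lo, hi):
    # fill an array with the interval, then Fisher-Yates shuffle it
    items = list(range(lo, hi + 1))
    n = len(items)
    while n > 1:
        r = random.randrange(n)     # 0 <= r <= n-1
        items[r], items[n - 1] = items[n - 1], items[r]
        n -= 1
    return items

print(' '.join(map(str, shuffled_interval(0, 9))))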
This has come up before. Try using a linear feedback shift register.
One suggestion, but it's memory intensive:
The generator builds a list of all numbers in the interval, then shuffles it.
A very efficient way to shuffle an array of numbers where each index is unique comes from image processing and is used when applying techniques like pixel-dissolve.
Basically you start with an ordered 2D array and then shift columns and rows. Those permutations are, by the way, easy to implement; you can even have one exact method that will yield the resulting value at (x,y) after n permutations.
The basic technique, described on a 3x3 grid:
1) Start with an ordered list, each number may exist only once
0 1 2
3 4 5
6 7 8
2) Pick a row/column you want to shuffle and advance it one step. In this case, I am shifting the second row one to the right.
0 1 2
5 3 4
6 7 8
3) Pick a row/column you want to shuffle... I shuffle the second column one down.
0 7 2
5 1 4
6 3 8
4) Pick ... For instance, first row, one to the left.
2 0 7
5 1 4
6 3 8
You can repeat those steps as often as you want. You can always do this kind of transformation also on a 1D array. So your result would be now [2, 0, 7, 5, 1, 4, 6, 3, 8].
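A small Python sketch of the mechanism (my own illustration; note this is not a uniform shuffle, just the row/column shifting described above):

import random

def grid_shuffle(values, side, steps=1000):
    # shuffle by cyclically shifting random rows/columns of a side x side grid
    grid = [values[r * side:(r + 1) * side] for r in range(side)]
    for _ in range(steps):
        i = random.randrange(side)
        if random.random() < 0.5:
            grid[i] = [grid[i][-1]] + grid[i][:-1]        # rotate row i right
        else:
            col = [grid[r][i] for r in range(side)]
            col = [col[-1]] + col[:-1]                    # rotate column i down
            for r in range(side):
                grid[r][i] = col[r]
    return [x for row in grid for x in row]

print(grid_shuffle(list(range(9)), 3))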
An occasionally useful alternative to the shuffle approach is to use a subscriptable set container. At each step, choose a random number 0 <= n < count. Extract the nth item from the set.
The main problem is that typical containers can't handle this efficiently. I have used it with bit-vectors, but it only works well if the largest possible member is reasonably small, due to the linear scanning of the bitvector needed to find the nth set bit.
99% of the time, the best approach is to shuffle as others have suggested.
EDIT
I missed the fact that a simple array is a good "set" data structure - don't ask me why, I've used it before. The "trick" is that you don't care whether the items in the array are sorted or not. At each step, you choose one randomly and extract it. To fill the empty slot (without having to shift an average of half your items one step down) you just move the current end item into the empty slot in constant time, then reduce the size of the array by one.
For example...
#include <vector>

class remaining_items_queue
{
  private:
    std::vector<int> m_Items;

  public:
    ...
    bool Extract (int &p_Item);  // return false if items already exhausted
};

bool remaining_items_queue::Extract (int &p_Item)
{
  if (m_Items.size () == 0)  return false;

  int l_Random = Random_Num (m_Items.size ());
  // Random_Num written to give 0 <= result < parameter

  p_Item = m_Items [l_Random];
  m_Items [l_Random] = m_Items.back ();
  m_Items.pop_back ();
  return true;  // success
}
The trick is to get a random number generator that gives (with a reasonably even distribution) numbers in the range 0 to n-1 where n is potentially different each time. Most standard random generators give a fixed range. Although the following DOESN'T give an even distribution, it is often good enough...
int Random_Num (int p)
{
    return (std::rand () % p);
}
std::rand returns values in the range 0 <= x <= RAND_MAX, where RAND_MAX is implementation-defined.
Take all numbers in the interval, put them to list/array
Shuffle the list/array
Loop over the list/array
One way is to generate an ordered list (0-9 in your example).
Then use the random function to select an item from the list. Remove the item from the original list and add it to the tail of new one.
The process is finished when the original list is empty.
Output the new list.
You can use a linear congruential generator with parameters chosen randomly, but such that it generates the full period. You need to be careful, because the quality of the random numbers may be bad, depending on the parameters.
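For instance, a hedged Python sketch (my own; it relies on the Hull-Dobell theorem: with the modulus m a power of two, x -> (a*x + c) % m has full period when c is odd and a % 4 == 1, so every residue appears exactly once, and values outside the interval -- at most half of them -- are skipped):

import random

def lcg_sequence(count):
    # yield 0..count-1 in a pseudo-random order, without repetition
    m = 2
    while m < count:                 # smallest power of two >= count
        m *= 2
    a = 4 * random.randrange(max(1, m // 4)) + 1   # a % 4 == 1
    c = 2 * random.randrange(m // 2) + 1           # c odd
    x = random.randrange(m)
    for _ in range(m):
        if x < count:                # skip values outside the interval
            yield x
        x = (a * x + c) % m

print(list(lcg_sequence(10)))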
