recursively split line into smaller segments - algorithm

I have a line. It starts with two indexes, call them 0 and 1, at the outermost points. At any point I can create a new point which bisects two other ones (there must not already be a point between them). However, when this happens the indexes need to increment. For example, here's a potential series of steps to achieve N=5, since there are 5 indexes in the result.
(graph)                            (split between)  (iteration #)
< ============================ >
0                              1   0,1              0
0               1              2   1,2              1
0               1       2      3   0,1              2
0       1       2       3      4
I have two questions:
What pseudocode could be used to find the "split between" values given the iteration number?
How could I prevent the shape from being unbalanced? Are there certain restrictions I should place on the value of N? I don't particularly care what order the splits happen in, but I do want to make sure the result is balanced.
This is an issue I've encountered when developing a video game.

I'm not sure if this is the kind of answer you are looking for, but I see this as a binary tree structure. Every tree node contains its own label and its left and right labels. The root of the tree (level 0) would be (2, 0, 1) (split 2 with 0 on the left and 1 on the right). Every node would be split into two children. The algorithm would go something like this:
At step N, pick the leftmost node without two children in level floor(log2(N - 1)) - 1, so that the new node lands in level floor(log2(N - 1)).
Take the node label T and the left and right labels L and R from that node.
If the node does not have a left child, add a left child node (N, L, T).
If the node already has a left child, add a right child node (N, T, R).
N <- N + 1
For example, at iteration 5 you would have something like this:
Level 0:               (2, 0, 1)
                        /     \
                       /       \
                      /         \
Level 1:      (3, 0, 2)         (4, 2, 1)
                 /
                /
Level 2:  (5, 0, 3)
Now, to reconstruct the current split, you would do the following:
Initialize a list S <- [0].
For every node (T, L, R) in the tree traversed in postorder:
If the node does not have a left child, append T to S.
If the node does not have a right child, append R to S.
For the previous case, you would have:
S = [0]
(5, 0, 3) -> S = [0, 5, 3]
(3, 0, 2) -> S = [0, 5, 3, 2]
(4, 2, 1) -> S = [0, 5, 3, 2, 4, 1]
(2, 0, 1) -> S = [0, 5, 3, 2, 4, 1]
So the complete split would be [0, 5, 3, 2, 4, 1]. The split would be perfectly balanced only when N = 2^k for some positive integer k. Of course, you can annotate the tree nodes with additional "distance" information if you need to keep track of something like that.
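The tree-building and reconstruction steps above can be sketched in Python roughly as follows. This is only an illustration under some assumptions: the names `Node`, `build` and `current_split` are made up here, and a queue of "open" nodes stands in for the "leftmost node without two children" rule.

```python
from collections import deque


class Node:
    """Tree node holding its own label plus the labels to its left and right."""

    def __init__(self, label, left_label, right_label):
        self.label = label
        self.left_label = left_label
        self.right_label = right_label
        self.left = None
        self.right = None


def build(max_label):
    """Add nodes 3..max_label level by level, left to right (hypothetical helper)."""
    root = Node(2, 0, 1)
    open_nodes = deque([root])  # nodes still missing a child, in level order
    for n in range(3, max_label + 1):
        parent = open_nodes[0]
        if parent.left is None:
            child = Node(n, parent.left_label, parent.label)
            parent.left = child
        else:
            child = Node(n, parent.label, parent.right_label)
            parent.right = child
            open_nodes.popleft()  # parent now has both children
        open_nodes.append(child)
    return root


def current_split(root):
    """Postorder walk that rebuilds the left-to-right order of labels."""
    out = [0]

    def visit(node):
        if node is None:
            return
        visit(node.left)
        visit(node.right)
        if node.left is None:
            out.append(node.label)
        if node.right is None:
            out.append(node.right_label)

    visit(root)
    return out


print(current_split(build(5)))  # [0, 5, 3, 2, 4, 1]
```

Because children are always attached to the oldest open node, each level fills completely before the next one starts, which is what keeps the splits balanced.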

I agree with jdehesa that what you are doing has similarities with a binary tree. I would recommend looking into using that data structure if you can, since it is highly structured and well-defined, and many great algorithms exist for working with it.
Additionally, as mentioned in the comment section above, a linked list would also be a nice option, since you are adding in a lot of elements. A normal array (which is contiguous in memory) will require you to move many elements over and over again as you insert additional elements, which is slow. A linked list would allow you to add your element anywhere in memory, and then just update a few pointers in the linked list on both sides of where you want to insert it, and be done. No moving things around.
However, if you really just want to put together a working solution using an array and aren't concerned with using other data structures, here is the math for the indexing you requested:
Each pair can be listed as (a, b), and we can quickly see b = a + 1. Thus, if you find a, you know b. To get these, you'll need two loops:
iteration := 0
i := 0
while iteration < desired_iterations
    for j = (2 ^ i) - 1; j >= 0 && iteration < desired_iterations; j--
        print j, j + 1
        iteration++
    i++
Where ^ is the exponentiation operator. What we do is find the second-to-last element in the list, (2^i)-1, and count backwards, listing off the indices. We then increment "i" to signify that we've now doubled our array size, and repeat. If at any point we reach our desired number of iterations, we break out of both loops because we're finished.
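For reference, the two loops above translate to Python along these lines. The function name `split_pairs` is just a label chosen for this sketch; it returns the pairs instead of printing them.

```python
def split_pairs(desired_iterations):
    """Return the (a, a + 1) pair to split at each iteration, in order."""
    pairs = []
    i = 0
    while len(pairs) < desired_iterations:
        # One full pass doubles the number of segments: walk from the
        # second-to-last index (2^i - 1) down to 0.
        for j in range(2 ** i - 1, -1, -1):
            if len(pairs) == desired_iterations:
                break
            pairs.append((j, j + 1))
        i += 1
    return pairs


print(split_pairs(3))  # [(0, 1), (1, 2), (0, 1)]
```

The first three pairs match the "split between" column from the question.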

Related

Daily Coding Problem 316 : Coin Change Problem - determination of denomination?

I'm going through the Daily Coding Problems and am currently stuck in one of the problems. It goes by:
You are given an array of length N, where each element i represents
the number of ways we can produce i units of change. For example, [1,
0, 1, 1, 2] would indicate that there is only one way to make 0, 2, or
3 units, and two ways of making 4 units.
Given such an array, determine the denominations that must be in use.
In the case above, for example, there must be coins with values 2, 3,
and 4.
I'm unable to figure out how to determine the denomination from the total number of ways array. Can you work it out?
Somebody already worked out this problem here, but it's devoid of any explanation.
From what I could gather, he collects all the elements whose value is 1 (number of ways == 1) and appends them to his answer, but I think this doesn't consider the fact that the same number can be formed from a combination of lower denominations, in which case the number of ways would still come out to be 1 irrespective of the denomination's presence.
For example, in the case of arr = [1, 1, a, b, c, 1]. We know that denomination 1 exists since arr[1] = 1. Now we can also see that arr[5] = 1, this should not necessarily mean that denomination 5 is available since 5 can be formed using coins of denomination 1, i.e. (1 + 1 + 1 + 1 + 1).
Thanks in advance!
If you're solving the coin change problem, the best technique is to maintain an array of ways of making change with a partial set of the available denominations, and add in a new denomination d by updating the array like this:
for i = d upto N
    a[i] += a[i-d]
Your actual problem is the reverse of this: finding denominations based on the total number of ways. Note that if you know one d, you can remove it from the ways array by reversing the above procedure:
for i = N downto d
    a[i] -= a[i-d]
You can find the lowest denomination available by looking for the first 1 in the array (other than the value at index 0, which is always 1). Then, once you've found the lowest denomination, you can remove its effect on the ways array, and repeat until the array is zeroed (except for the first value).
Here's a full solution in Python:
def rways(A):
    dens = []
    for i in range(1, len(A)):
        if not A[i]: continue
        dens.append(i)
        for j in range(len(A)-1, i-1, -1):
            A[j] -= A[j-i]
    return dens

print(rways([1, 0, 1, 1, 2]))
You might want to add error-checking: if you find a non-zero value that's not 1 when searching for the next denomination, then the original array isn't valid.
For reference and comparison, here's some code for computing the ways of making change from a set of denominations:
def ways(dens, N):
    A = [1] + [0] * N
    for d in dens:
        for i in range(d, N+1):
            A[i] += A[i-d]
    return A

print(ways([2, 3, 4], 4))
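The error-checking mentioned above could look roughly like this. It is a sketch, not part of the original answer: `rways_checked` is a hypothetical name, it works on a copy so the caller's list is left intact, and raising `ValueError` is just one possible way to report an invalid array. `ways` is repeated so the block runs on its own.

```python
def ways(dens, n):
    """Forward direction: number of ways to make 0..n from the denominations."""
    table = [1] + [0] * n
    for d in dens:
        for i in range(d, n + 1):
            table[i] += table[i - d]
    return table


def rways_checked(table):
    """Recover denominations, rejecting arrays that cannot be a ways table."""
    a = list(table)  # copy, so the input is not modified
    if not a or a[0] != 1:
        raise ValueError("index 0 must hold exactly 1 way")
    dens = []
    for i in range(1, len(a)):
        if a[i] == 0:
            continue
        if a[i] != 1:
            # The smallest remaining denomination always shows up as a 1.
            raise ValueError(f"unexpected value {a[i]} at index {i}")
        dens.append(i)
        for j in range(len(a) - 1, i - 1, -1):
            a[j] -= a[j - i]
    return dens


print(rways_checked([1, 0, 1, 1, 2]))    # [2, 3, 4]
print(rways_checked(ways([1, 5], 10)))   # [1, 5]
```

The second call shows the case raised in the question: index 5 of the table holds 2, not 1, once denomination 1 is present, and removing denomination 1 first is what exposes the coin of value 5.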

How can I sort a vector of boolean vectors in this way? ('ranking analysis')

We need to sort a large number of vectors (an array of arrays) containing only true and false (1's and 0's), all the same size.
We have the rules that 1 + 1 = 1 (true + true = true) and 1 + 0 = 1 and 0 + 0 = 0.
The first vector is the one with the most 1's.
The second vector is the one which brings more 1's in addition to the ones we already had in the first vector.
The third vector is the one which brings more 1's in addition to the ones we already had in the previous 2 vectors.
And so on.
For example, let's say we have these 3 vectors:
a. (0, 1, 0, 0, 1, 1, 0)
b. (1, 0, 1, 1, 0, 1, 1)
c. (0, 1, 1, 1, 0, 1, 0)
The first one in our sort is b because it has the most 1's.
The next one is a. Even though c has more 1's than a, a has more 1's in addition to the 1's we had in b.
By now, the sum of a + b is (1, 1, 1, 1, 1, 1, 1), so the last one is c because, it brings nothing new to the sorting.
If two vectors bring the same number of extra 1's, their order doesn't really matter. I believe there are multiple possible results for this kind of sorting and they are all equally good.
We call this a 'ranking analysis' here, but we don't have a clear term for this kind of sort and google doesn't yield very useful info on it.
The easiest method is to just take them one by one, which is O(n^2). However, we are working with big data and we already have software for this which is too slow, so we need something really optimized.
How can we achieve this? Programming language doesn't matter, we can use anything. Can this be parallelized (run it on multiple CPU's to speed up the process)? Any sources or ideas are welcome.
Edit: I checked; apparently we have a case where the length of these vectors is 103, so they can be longer than 64 slots.
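To pin down the ordering being asked for, here is a minimal sketch of the straightforward quadratic greedy, not the optimized version the question is after. It assumes each vector is packed into a Python integer used as a bitmask, which has no 64-bit limit, so lengths such as 103 are fine. The name `rank_vectors` and the lowest-index tie-break are choices made only for this example.

```python
def rank_vectors(vectors):
    """Return vector indices ordered by how many new 1's each one contributes."""
    # Pack every 0/1 vector into an integer bitmask.
    masks = [sum(bit << pos for pos, bit in enumerate(v)) for v in vectors]
    remaining = set(range(len(vectors)))
    covered = 0  # OR of all vectors chosen so far
    order = []
    while remaining:
        # Pick the vector adding the most uncovered 1's; ties go to the lowest index.
        best = max(
            remaining,
            key=lambda idx: (bin(masks[idx] & ~covered).count("1"), -idx),
        )
        order.append(best)
        covered |= masks[best]
        remaining.remove(best)
    return order


a = (0, 1, 0, 0, 1, 1, 0)
b = (1, 0, 1, 1, 0, 1, 1)
c = (0, 1, 1, 1, 0, 1, 0)
print(rank_vectors([a, b, c]))  # [1, 0, 2], i.e. b, then a, then c
```

Each round is independent across candidates, so the inner scoring step is the natural place to parallelize.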

Finding if it's possible to get to a certain element in a binary tree

I was wondering whether, starting from a certain point in something like a binary tree, I could get to another given point.
I should also say that I don't have a tree structure. I will have just points.
For example, given (342,124) -> (23420,1324), the program should tell me whether it is possible to go from (342,124) to (23420,1324).
My coordinate template is (depthToNode, Node). I just need to know whether I can move from one point to another, where the points are linked in exactly the same way, with the same values, as in the data structure in the image.
Some explanation:
The top node is (0, 0)
Every MOVE increases depthToNode by one. At the same time, the value of the node decreases by 1 when moving to the left or increases by 1 when moving to the right. Every node is connected to a left and a right subnode. Thus a basic MOVE can only be (+1, -1) or (+1, +1).
At first this looked to me like a problem that can be solved using Breadth First Search or Depth First Search.
After having a closer look at the problem, I noticed that there is a clear pattern that can be turned into an equation. I focused on the fact that the left subnode has value Node-1 and the right subnode has value Node+1.
So thinking in terms of points we can have points (a, b) and (c, d):
For (a, b) and (c, d) where c>=a
You have N=c-a consecutive operations.
There are two types of operations L= -1 and R=+1.
A solution exists if there are l and r such that:
(L*l + R*r) = d-b where l+r == N and l >= 0, r >= 0, both integers.
So:
(L*(N-r) + R*r)=d-b
NL-Lr+Rr = d-b
-N+2r=d-b
2r=d-b+N
2r=d-b+c-a
So in the end:
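If d-b+c-a is even and greater than or equal to zero (0, 2, 4, ...), then r = (d-b+c-a)/2 is a valid number of right moves. It must also not exceed N (so that l = N - r >= 0), which means d-b <= c-a. When both hold, there is a path.
Let's try it:
(0, 0) -> (3, 1): 1 - 0 + 3 - 0 = 4 (path exists, because it is one of 0, 2, 4, ... and 1 - 0 <= 3 - 0)
(2, 0) -> (4, -4): -4 - 0 + 4 - 2 = -2 (path does not exist).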
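The check can be written as a small Python function. This is a sketch with a made-up name, `can_reach`. On top of the parity test derived above, it also requires |d-b| <= c-a, which follows from needing both l >= 0 and r >= 0.

```python
def can_reach(start, target):
    """True if target is reachable from start using only (+1, -1) and (+1, +1) moves."""
    a, b = start
    c, d = target
    n = c - a      # number of moves available
    diff = d - b   # net change in node value that is required
    # r = (diff + n) / 2 right moves and l = n - r left moves must both be
    # non-negative integers.
    return n >= 0 and abs(diff) <= n and (diff + n) % 2 == 0


print(can_reach((0, 0), (3, 1)))    # True
print(can_reach((2, 0), (4, -4)))   # False
print(can_reach((0, 0), (1, 3)))    # False: even sum, but too few moves
```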

Disperse Duplicates in an Array

Source : Google Interview Question
Write a routine to ensure that identical elements in the input are maximally spread in the output.
Basically, we need to place the same elements in such a way that the TOTAL spreading is as large as possible.
Example:
Input: {1,1,2,3,2,3}
Possible Output: {1,2,3,1,2,3}
Total dispersion = Difference between position of 1's + 2's + 3's = 4-1 + 5-2 + 6-3 = 9 .
I am NOT AT ALL sure if there's an optimal polynomial-time algorithm available for this. Also, no other detail is provided for the question other than this.
What I thought is: calculate the frequency of each element in the input, then arrange them in the output, one distinct element at a time, until all the frequencies are exhausted.
I am not sure of my approach.
Any approaches or ideas, people?
I believe this simple algorithm would work:
count the number of occurrences of each distinct element.
make a new list
add one instance of all elements that occur more than once to the list (order within each group does not matter)
add one instance of all unique elements to the list
add one instance of all elements that occur more than once to the list
add one instance of all elements that occur more than twice to the list
add one instance of all elements that occur more than thrice to the list
...
Now, this will intuitively not give a good spread:
for {1, 1, 1, 1, 2, 3, 4} ==> {1, 2, 3, 4, 1, 1, 1}
for {1, 1, 1, 2, 2, 2, 3, 4} ==> {1, 2, 3, 4, 1, 2, 1, 2}
However, I think this is the best spread you can get given the scoring function provided.
Since the dispersion score counts the sum of the distances instead of the squared sum of the distances, you can have several duplicates close together, as long as you have a large gap somewhere else to compensate.
For a sum-of-squared-distances score, the problem becomes harder.
Perhaps the interview question hinged on the candidate recognizing this weakness in the scoring function?
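The layered procedure described above can be sketched in Python as follows. The name `layered_spread` is hypothetical, and sorting within each layer is an assumption made only to get a deterministic output, since the order within a layer does not matter.

```python
from collections import Counter


def layered_spread(values):
    """Emit repeated values first, then unique ones, then the remaining copies layer by layer."""
    counts = Counter(values)
    repeated = sorted(v for v, c in counts.items() if c > 1)
    unique = sorted(v for v, c in counts.items() if c == 1)
    out = repeated + unique  # first copies of repeated values, then the uniques
    copy = 2                 # which copy of each repeated value is emitted next
    while True:
        layer = [v for v in repeated if counts[v] >= copy]
        if not layer:
            break
        out += layer
        copy += 1
    return out


print(layered_spread([1, 1, 1, 1, 2, 3, 4]))     # [1, 2, 3, 4, 1, 1, 1]
print(layered_spread([1, 1, 1, 2, 2, 2, 3, 4]))  # [1, 2, 3, 4, 1, 2, 1, 2]
```

Both outputs match the two examples listed in the answer.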
In Perl:
@a=(9,9,9,2,2,2,1,1,1);
Then make a hash table of the counts of the different numbers in the list, like a frequency table:
map { $x{$_}++ } @a;
Then repeatedly walk through all the keys found, with the keys in a known order, and add the appropriate number of individual numbers to an output list until all the keys are exhausted:
@r=();
$g=1;
while( $g == 1 ) {
    $g=0;
    for my $n (sort keys %x)
    {
        if ($x{$n}>0) {
            push @r, $n;
            $x{$n}--;
            $g=1
        }
    }
}
I'm sure that this could be adapted to any programming language that supports hash tables
Python code for the algorithm suggested by Vorsprung and HugoRune:
from collections import Counter, defaultdict

def max_spread(data):
    # Count occurrences, then emit one copy of each remaining value per round.
    cnt = Counter(data)
    res, num = [], list(cnt)
    while len(cnt) > 0:
        for i in num:
            if cnt[i] > 0:
                res.append(i)
                cnt[i] -= 1
                if cnt[i] == 0: del cnt[i]
    return res

def calc_spread(data):
    # Sum, over all values, of the distance between first and last position.
    d = defaultdict(list)
    for i, v in enumerate(data):
        d[v].append(i)
    return sum(max(x) - min(x) for x in d.values())
HugoRune's answer takes some advantage of the unusual scoring function but we can actually do even better: suppose there are d distinct non-unique values, then the only thing that is required for a solution to be optimal is that the first d values in the output must consist of these in any order, and likewise the last d values in the output must consist of these values in any (i.e. possibly a different) order. (This implies that all unique numbers appear between the first and last instance of every non-unique number.)
The relative order of the first copies of non-unique numbers doesn't matter, and likewise nor does the relative order of their last copies. Suppose the values 1 and 2 both appear multiple times in the input, and that we have built a candidate solution obeying the condition I gave in the first paragraph that has the first copy of 1 at position i and the first copy of 2 at position j > i. Now suppose we swap these two elements. Element 1 has been pushed j - i positions to the right, so its score contribution will drop by j - i. But element 2 has been pushed j - i positions to the left, so its score contribution will increase by j - i. These cancel out, leaving the total score unchanged.
Now, any permutation of elements can be achieved by swapping elements in the following way: swap the element in position 1 with the element that should be at position 1, then do the same for position 2, and so on. After the ith step, the first i elements of the permutation are correct. We know that every swap leaves the scoring function unchanged, and a permutation is just a sequence of swaps, so every permutation also leaves the scoring function unchanged! This is true for the d elements at both ends of the output array.
When 3 or more copies of a number exist, only the position of the first and last copy contribute to the distance for that number. It doesn't matter where the middle ones go. I'll call the elements between the 2 blocks of d elements at either end the "central" elements. They consist of the unique elements, as well as some number of copies of all those non-unique elements that appear at least 3 times. As before, it's easy to see that any permutation of these "central" elements corresponds to a sequence of swaps, and that any such swap will leave the overall score unchanged (in fact it's even simpler than before, since swapping two central elements does not even change the score contribution of either of these elements).
This leads to a simple O(n log n) algorithm (or O(n) if you use bucket sort for the first step) to generate a solution array Y from a length-n input array X:
Sort the input array X.
Use a single pass through X to count the number of distinct non-unique elements. Call this d.
Set i, j and k to 0.
While i < n:
    If X[i+1] == X[i], we have a non-unique element:
        Set Y[j] = Y[n-j-1] = X[i].
        Increment i twice, and increment j once.
        While X[i] == X[i-1]:
            Set Y[d+k] = X[i].
            Increment i and k.
    Otherwise we have a unique element:
        Set Y[d+k] = X[i].
        Increment i and k.
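A direct Python transcription of these steps might look like this. The names `disperse` and `spread` are made up for the sketch, and the explicit bounds checks (`i + 1 < n`, `i < n`) are additions that the step list leaves implicit.

```python
def disperse(values):
    """Place first/last copies of repeated values at the two ends, the rest in the middle."""
    x = sorted(values)
    n = len(x)
    # d = number of distinct values occurring more than once
    # (count each value at the position of its second copy).
    d = sum(1 for i in range(1, n) if x[i] == x[i - 1] and (i < 2 or x[i - 2] != x[i]))
    y = [None] * n
    i = j = k = 0
    while i < n:
        if i + 1 < n and x[i + 1] == x[i]:
            # Non-unique value: one copy at each end.
            y[j] = y[n - j - 1] = x[i]
            i += 2
            j += 1
            # Any further copies go into the central region.
            while i < n and x[i] == x[i - 1]:
                y[d + k] = x[i]
                i += 1
                k += 1
        else:
            # Unique value: central region.
            y[d + k] = x[i]
            i += 1
            k += 1
    return y


def spread(data):
    """Total dispersion: last position minus first position, summed over values."""
    first, last = {}, {}
    for pos, v in enumerate(data):
        first.setdefault(v, pos)
        last[v] = pos
    return sum(last[v] - first[v] for v in first)


result = disperse([1, 1, 2, 3, 2, 3])
print(result, spread(result))  # [1, 2, 3, 3, 2, 1] 9
```

The score of 9 equals the dispersion of the sample output given in the question.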

Algorithm to count the number of valid blocks in a permutation [duplicate]

This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
Finding sorted sub-sequences in a permutation
Given an array A which holds a permutation of 1,2,...,n. A sub-block A[i..j]
of an array A is called a valid block if all the numbers appearing in A[i..j]
are consecutive numbers (may not be in order).
Given an array A= [ 7 3 4 1 2 6 5 8] the valid blocks are [3 4], [1,2], [6,5],
[3 4 1 2], [3 4 1 2 6 5], [7 3 4 1 2 6 5], [7 3 4 1 2 6 5 8]
So the count for above permutation is 7.
Give an O( n log n) algorithm to count the number of valid blocks.
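As a baseline for checking faster attempts, here is a brute-force O(n^2) counter. It is not the O(n log n) algorithm being asked for. It relies on the fact that, in a permutation, a sub-block holds consecutive numbers exactly when max - min equals its length minus one. The function name `count_valid_blocks` is hypothetical, and blocks of length 1 are not counted, matching the example.

```python
def count_valid_blocks(a):
    """Count sub-blocks of length >= 2 whose values form a run of consecutive numbers."""
    count = 0
    for i in range(len(a)):
        lo = hi = a[i]
        for j in range(i + 1, len(a)):
            lo = min(lo, a[j])
            hi = max(hi, a[j])
            # Distinct values spanning exactly j - i must be consecutive.
            if hi - lo == j - i:
                count += 1
    return count


print(count_valid_blocks([7, 3, 4, 1, 2, 6, 5, 8]))  # 7
```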
Ok, I am down to 1 rep because I put 200 bounty on a related question: Finding sorted sub-sequences in a permutation
so I cannot leave comments for a while.
I have an idea:
1) Locate all permutation groups. They are: (78), (34), (12), (65). Unlike in group theory, their order and position, and whether they are adjacent, matters. So, a group (78) can be represented as a structure (7, 8, false), while (34) would be (3,4,true). I am using Python's notation for tuples, but it might actually be better to use a whole class for the group. Here true or false means contiguous or not. Two groups are "adjacent" if (max(gp1) == min(gp2) + 1 or max(gp2) == min(gp1) + 1) and contiguous(gp1) and contiguous(gp2). This is not the only condition for union(gp1, gp2) to be contiguous, because (14) and (23) combine into (14) nicely. This is a great question for algo class homework, but a terrible one for an interview. I suspect this is homework.
Just some thoughts:
At first sight, this sounds impossible: a fully sorted array would have O(n^2) valid sub-blocks.
So, you would need to count more than one valid sub-block at a time. Checking the validity of a sub-block is O(n). Checking whether a sub-block is fully sorted is O(n) as well. A fully sorted sub-block contains n·(n - 1)/2 valid sub-blocks, which you can count without further breaking this sub-block up.
Now, the entire array is obviously always valid. For a divide-and-conquer approach, you would need to break this up. There are two conceivable breaking points: the location of the highest element, and that of the lowest element. If you break the array into two at one of these points, including the extremum in the part that contains the second-to-extreme element, there cannot be a valid sub-block crossing this break-point.
By always choosing the extremum that produces a more even split, this should work quite well (average O(n log n)) for "random" arrays. However, I can see problems when your input is something like (1 5 2 6 3 7 4 8), which seems to produce O(n^2) behaviour. (1 4 7 2 5 8 3 6 9) would be similar (I hope you see the pattern). I currently see no trick to catch this kind of worst case, but it seems that it requires other splitting techniques.
This question does involve a bit of a "math trick" but it's fairly straight forward once you get it. However, the rest of my solution won't fit the O(n log n) criteria.
The math portion:
For any two consecutive numbers their sum is 2k+1 where k is the smallest element. For three it is 3k+3, 4 : 4k+6 and for N such numbers it is Nk + sum(1,N-1). Hence, you need two steps which can be done simultaneously:
Create the sum of all the sub-arrays.
Determine the smallest element of a sub-array.
The dynamic programming portion
Build two tables using the results of the previous row's entries to build each successive row's entries. Unfortunately, I'm totally wrong as this would still necessitate n^2 sub-array checks. Ugh!
My proposition
STEP = 2 // number of elements examined
B [0,0,0,0,0,0,0,0]
B [1,1,0,0,0,0,0,0]
VALID(A,B) - if not valid move one
B [0,1,1,0,0,0,0,0]
VALID(A,B) - if valid move one and step
B [0,0,0,1,1,0,0,0]
VALID (A,B)
B [0,0,0,0,0,1,1,0]
STEP = 3
B [1,1,1,0,0,0,0,0] not ok
B [0,1,1,1,0,0,0,0] ok
B [0,0,0,0,1,1,1,0] not ok
STEP = 4
B [1,1,1,1,0,0,0,0] not ok
B [0,1,1,1,1,0,0,0] ok
.....
CON <- 0
STEP <- 2
i <- 0
j <- 0
WHILE(STEP <= LEN(A)) DO
    j <- STEP
    WHILE(STEP <= LEN(A) - j) DO
        IF(VALID(A,i,j)) DO
            CON <- CON + 1
            i <- j + 1
            j <- j + STEP
        ELSE
            i <- i + 1
            j <- j + 1
        END
    END
    STEP <- STEP + 1
END
The VALID method checks that all elements are consecutive.
Never tested, but it might be OK.
The original array doesn't contain duplicates, so it must itself be a consecutive block. Let's call this block (1 ~ n). We can test whether block (2 ~ n) is consecutive by checking if the first element is 1 or n, which is O(1). Likewise we can test block (1 ~ n-1) by checking whether the last element is 1 or n.
I can't quite mould this into a solution that works but maybe it will help someone along...
Like everybody else, I'm just throwing this out ... it works for the single example below, but YMMV!
The idea is to count the number of illegal sub-blocks, and subtract this from the total possible number. We count the illegal ones by examining each array element in turn and ruling out sub-blocks that include the element but not its predecessor or successor.
Foreach i in [1,N], compute B[A[i]] = i.
Let Count = the total number of sub-blocks with length>1, which is N-choose-2 (one for each possible combination of starting and ending index).
Foreach i, consider A[i]. Ignoring edge cases, let x=A[i]-1, and let y=A[i]+1. A[i] cannot participate in any sub-block that does not include x or y. Let iX=B[x] and iY=B[y]. There are several cases to be treated independently here. The general case is that iX < i < iY. In this case, we can eliminate the sub-block A[iX+1 .. iY-1] and all intervening blocks containing i. There are (i - iX + 1) * (iY - i + 1) such sub-blocks, so call this number Eliminated. (Other cases are left as an exercise for the reader, as are those edge cases.) Set Count = Count - Eliminated.
Return Count.
The total cost appears to be N * (cost of step 2) = O(N).
WRINKLE: In step 2, we must be careful not to eliminate each sub-interval more than once. We can accomplish this by only eliminating sub-intervals that lie fully or partly to the right of position i.
Example:
A = [1, 3, 2, 4]
B = [1, 3, 2, 4]
Initial count = (4*3)/2 = 6
i=1: A[i]=1, so need sub-blocks with 2 in them. We can eliminate [1,3] from consideration. Eliminated = 1, Count -> 5.
i=2: A[i]=3, so need sub-blocks with 2 or 4 in them. This rules out [1,3] but we already accounted for it when looking right from i=1. Eliminated = 0.
i=3: A[i] = 2, so need sub-blocks with [1] or [3] in them. We can eliminate [2,4] from consideration. Eliminated = 1, Count -> 4.
i=4: A[i] = 4, so we need sub-blocks with [3] in them. This rules out [2,4] but we already accounted for it when looking right from i=3. Eliminated = 0.
Final Count = 4, corresponding to the sub-blocks [1,3,2,4], [1,3,2], [3,2,4] and [3,2].
(This is an attempt to do this N.log(N) worst case. Unfortunately it's wrong -- it sometimes undercounts. It incorrectly assumes you can find all the blocks by looking at only adjacent pairs of smaller valid blocks. In fact you have to look at triplets, quadruples, etc, to get all the larger blocks.)
You do it with a struct that represents a subblock and a queue for subblocks.
struct c_subblock
{
    int index ; /* index into original array, head of subblock */
    int width ; /* width of subblock > 0 */
    int lo_value;
    c_subblock * p_above ; /* null or subblock above with same index */
};
Alloc an array of subblocks the same size as the original array, and init each subblock to have exactly one item in it. Add them to the queue as you go. If you start with array [ 7 3 4 1 2 6 5 8 ] you will end up with a queue like this:
queue: ( [7,7] [3,3] [4,4] [1,1] [2,2] [6,6] [5,5] [8,8] )
The { index, width, lo_value, p_above } values for subblock [7,7] will be { 0, 1, 7, null }.
Now it's easy. Forgive the c-ish pseudo-code.
loop {
    c_subblock * const p_left = Pop subblock from queue.
    int const right_index = p_left.index + p_left.width;
    if ( right_index < length original array ) {
        // Find adjacent subblock on the right.
        // To do this you'll need the original array of length-1 subblocks.
        c_subblock const * p_right = array_basic_subblocks[ right_index ];
        do {
            Check the left/right subblocks to see if the two merged are also a subblock.
            If they are add a new merged subblock to the end of the queue.
            p_right = p_right.p_above;
        }
        while ( p_right );
    }
}
This will find them all I think. It's usually O(N log(N)), but it'll be O(N^2) for a fully sorted or anti-sorted list. I think there's an answer to this though -- when you build the original array of subblocks you look for sorted and anti-sorted sequences and add them as the base-level subblocks. If you are keeping a count increment it by (width * (width + 1))/2 for the base-level. That'll give you the count INCLUDING all the 1-length subblocks.
After that just use the loop above, popping and pushing the queue. If you're counting you'll have to have a multiplier on both the left and right subblocks and multiply these together to calculate the increment. The multiplier is the width of the leftmost (for p_left) or rightmost (for p_right) base-level subblock.
Hope this is clear and not too buggy. I'm just banging it out, so it may even be wrong.
[Later note. This doesn't work after all. See note below.]

Resources