Difficulties formulating a square matrix algorithm in O(n) time - algorithm

I am looking back on some past papers, for my exam, and I have come across a square matrix algorithm question/analysis that I cannot for the life of me approach.
Basically, I am given an N by N matrix (a square matrix), and I need to implement a data structure that allows me to increase the size of the matrix by 1 (one extra row and one extra column) in O(n) time.
After pressing my tutor, I realised the best data structure would be an array of arrays, so essentially something like this: [ {1,2,3}, {4,5,6}, {7,8,9} ] would denote my matrix as row 1, row 2, row 3.
Now I need to be able to expand this matrix by 1 when the increase_size() method is called. I have already tried a naive solution: create a new empty array of size 4 (since our previous matrix has 3 rows), append this array to our matrix_array, and then add a 0 to all the remaining arrays. However, this takes O(n^2) time.
I believe there is something here related to the rows and columns: when we increase our matrix size we are essentially creating a new row and a new column, and I believe this has something to do with the solution.
I have attached the question below.

Try an array of arrays:
M = [ A1, A2, ..., An ]
each array A_x contains the values a_{i,j} for which max(i,j) == x.
I'll let you do the proofs.
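For illustration, here is a rough Python sketch of that layout (my own, not part of the answer): shell A_x stores the 2x-1 cells with max(i,j) == x, so increase_size only has to allocate one new shell.

class GrowableSquareMatrix:
    def __init__(self):
        self.shells = []                      # shells[x-1] holds the cells with max(i,j) == x

    def size(self):
        return len(self.shells)

    def increase_size(self, fill=0):
        n = self.size() + 1
        self.shells.append([fill] * (2 * n - 1))   # the only work: one new shell, O(n)

    def _locate(self, i, j):                  # 1-based indices; max(i, j) picks the shell
        x = max(i, j)
        offset = i - 1 if j == x else (x - 1) + (x - j)
        return x - 1, offset

    def get(self, i, j):
        x, off = self._locate(i, j)
        return self.shells[x][off]

    def set(self, i, j, value):
        x, off = self._locate(i, j)
        self.shells[x][off] = value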

The naive solution actually works in O(n):
Consider any matrix of size n * n implemented as an array of arrays (where the arrays are dynamic).
To reach size n + 1, we first create a new array of size n and append it to the end of our matrix; this takes O(n) time.
Then, for every row in our matrix, we append a 0 to the end. We have n + 1 rows and append 1 element to each, so this too takes O(n) time.
In total the runtime is O(n).
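As a concrete illustration, a minimal sketch of this (assuming Python lists play the role of the dynamic arrays):

def increase_size(matrix):
    """Grow an n x n matrix (list of row lists) to (n+1) x (n+1)."""
    n = len(matrix)
    matrix.append([0] * n)       # new row of length n: O(n)
    for row in matrix:           # n + 1 rows ...
        row.append(0)            # ... one amortized O(1) append each

m = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
increase_size(m)                 # m is now 4 x 4, padded with zeros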

Related

How to find 2 special elements in the array in O(n)

Let a1,...,an be a sequence of real numbers. Let m be the minimum of the sequence, and let M be the maximum of the sequence.
I proved that there exist 2 elements in the sequence, x and y, such that |x-y| <= (M-m)/n.
Now, is there a way to find an algorithm that finds 2 such elements in time complexity O(n)?
I thought about sorting the sequence, but since I don't know anything about M I cannot use radix/bucket sort or any other linear-time algorithm that I'm familiar with.
I'd appreciate any idea.
Thanks in advance.
1. First find n, M, m. If not already given, they can be determined in O(n).
2. Then create storage for n+1 elements; we will use it as n+1 buckets of width w = (M-m)/n. The buckets cover the range of values equally: bucket 1 covers [m; m+w[, bucket 2 covers [m+w; m+2*w[, ..., bucket n covers [m+(n-1)*w; m+n*w[ = [M-w; M[, and the (n+1)-th bucket covers [M; M+w[.
3. Now we go once through all the values and sort them into the buckets according to these intervals. There should be at most 1 element per bucket. If a bucket is already filled, the two elements in it lie closer together than the width of the half-open interval, i.e. we have found elements x, y with |x-y| < w = (M-m)/n.
4. If no such pair was found in step 3, then n of the n+1 buckets hold one element each, and all those elements are sorted. We go once more through the buckets and compare the contents of neighbouring buckets only, to see whether two of them fulfil the condition. Due to the width of the buckets, the condition cannot hold for buckets that are not adjoining: for those the distance is always |x-y| > w.
(The fulfilment of the last inequality in step 4 is also the reason why the interval is half-open and cannot be closed, and why we need n+1 buckets instead of n. An alternative would be to use n buckets and make the last bucket a special case covering [M; M+w]. But O(n+1) = O(n), and using n+1 buckets is preferable to special-casing the last one.)
The running time is O(n) for step 1, essentially nothing for step 2 (we do no real work there), O(n) for step 3 and O(n) for step 4, as there is at most 1 element per bucket. Altogether O(n).
This task shows that sorting elements that are not close together, or coarse sorting that does not have to resolve fine distances, can be done in O(n) instead of O(n*log(n)). It has useful applications. Numbers on computers are discrete and have finite precision. I have successfully used this sorting method in real-time production code for signal processing / fast sorting.
About @Damien's remark: the real threshold of (M-m)/(n-1) provably holds for every such sequence. So far I have assumed in this answer that the sequence we are looking at is of a special kind for which the stronger condition is true, or at least that, for all sequences where the stronger condition is true, we would find such elements in O(n).
If this was instead a small mistake by the OP (who said they had proven the stronger condition) and we should find two elements x, y with |x-y| <= (M-m)/(n-1), we can simplify:
3'. We do steps 1 to 3 as above, but with n buckets and the bucket width set to w = (M-m)/(n-1). Bucket n now covers [M; M+w[.
For step 4 we do the following alternative:
4'. n buckets are filled with one element each. The element in bucket n has to be M and lies at the left boundary of its interval. The distance of this element y = M to any element x in the (n-1)-th bucket is |M-x| <= w = (M-m)/(n-1), so we have found x and y which fulfil the condition, q.e.d.
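For illustration, a small Python sketch of steps 1-4 (my own, written under the question's original assumption that a pair with |x-y| <= (M-m)/n exists; if that assumption fails, as the remark above notes it can, the function returns None):

def close_pair(a):
    n = len(a)
    m, M = min(a), max(a)                  # step 1
    if m == M:
        return a[0], a[1]                  # all values equal
    w = (M - m) / n
    buckets = [None] * (n + 1)             # step 2
    for x in a:                            # step 3
        k = min(int((x - m) / w), n)
        if buckets[k] is not None:
            return buckets[k], x           # two values share a bucket: |x - y| < w
        buckets[k] = x
    filled = [x for x in buckets if x is not None]
    for x, y in zip(filled, filled[1:]):   # step 4: neighbouring filled buckets
        if abs(x - y) <= w:
            return x, y
    return None                            # only possible if the assumed bound fails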
First note that the real threshold should be (M-m)/(n-1).
The first step is to calculate the minimum m and maximum M of the elements, in O(n).
You calculate the value mid = (m + M)/2.
You move the values less than mid to the beginning of the array and the values greater than mid to the end.
You select the part with the larger number of elements and you iterate, until very few numbers are left.
If both parts have the same number of elements, you can select either of them. If the remaining part has many more elements than n/2, then in order to maintain O(n) complexity you can keep only n/2 + 1 of them, as the goal is not to find the smallest difference, but only one difference that is small enough.
As indicated in a comment by @btilly, this solution could fail in some cases, for example with the input [0, 2.1, 2.9, 5]. To handle that, you also need to calculate the maximum value of the left part and the minimum value of the right part, and to test whether the answer is not right_min - left_max. This doesn't change the O(n) complexity, even if the solution becomes less elegant.
Complexity of the search procedure: O(n) + O(n/2) + O(n/4) + ... + O(2) = O(2n) = O(n).
Damien is correct in his comment that the correct result is that there must be x, y such that |x-y| <= (M-m)/(n-1). If you have the sequence [0, 1, 2, 3, 4] you have 5 elements, but no two elements are closer than (M-m)/n = (4-0)/5 = 4/5.
With the right threshold, the solution is easy - find M and m by scanning through the input once, and then bucket the input into (n-1) buckets of size (M-m)/(n-1), putting values that are on the boundaries of a pair of buckets into both buckets. At least one bucket must have two values in it by the pigeon-hole principle.

find 4th smallest element in linear time

So i had an exercise given to me about 2 months ago, that says the following:
Given n (n>=4) distinct elements, design a divide & conquer algorithm to compute the 4th smallest element. Your algorithm should run in linear time in the worst case.
I had an extremely hard time with this problem, and could only find relevant algorithms that run in the worst case in O(n*k). After several weeks of trying, we managed, with the help of our teacher, to "solve" this problem. The final algorithm is as follows:
Rules: The input size can only be of size 2^k
(1): Divide the input into two halves of size n/2: one left array, one right array.
(2): If input size == 4, sort the arrays using merge sort.
(2.1) Merge left array with right array into a new result array with length 4.
(2.2) Return element at index [4-1]
(3): Repeat step 1
This is solved recursively and our base case is at step 2. Step 2.2 means that for all of our recursive calls, we will get a final result array of length 4, and at that point we can just return the element at index [4-1].
With this algorithm, my teacher claims that this runs in linear time. My problem with that statement is that we are dividing the input until we reach sub-arrays of size 4, and then those are sorted. So for an input size of 8, we would sort 2 sub-arrays of length 4, since 8/4 = 2. How is this in any way linear time? We are still sorting the whole input, just in blocks, aren't we? This really does not make sense to me. Does it matter whether we sort the whole input as it is, or divide it into sub-arrays of size 4 and sort those? Won't it still be a worst case of O(n*log(n))?
Would appreciate some explanations on this !
To make it easier to prove that the algorithm runs in linear time, let's modify it a bit (we will only change the order of dividing and merging blocks, nothing more):
(1): Divide input into n/4 blocks, each has size 4.
(2): While there is more than one block, repeat:
Merge each pair of adjacent blocks into one block of size 4.
(For example, if we have 4 blocks, we will split them in 2 pairs -
first pair contains first and second blocks,
second pair contains third and fourth blocks.
After merging we will have 2 blocks -
the first one contains 4 least elements from blocks 1 and 2,
the second one contains 4 least elements from blocks 3 and 4).
(3): The answer is the last element of that one block left.
Proof: It's a fact that an array of constant length (in your case, 4) can be sorted in constant time. Let k = log(n). Loop (2) runs k-2 iterations (on each iteration the count of elements left is divided by 2, until 4 elements are left).
Before the i-th iteration (0 <= i < k-2) there are 2^(k-i) elements left, so there are 2^(k-i-2) blocks and we will merge 2^(k-i-3) pairs of blocks. Let's find how many pairs we merge over all iterations. The count of merges equals
mergeOperationsCount = 2^(k-3) + 2^(k-4) + ... + 2^0 =
= 2^(k-3) * (1 + 1/2 + 1/4 + 1/8 + ...) < 2^(k-2) = O(2^k) = O(n)
Since we can merge each pair in constant time (because the blocks have constant size), and the only operation we perform is merging pairs, the algorithm runs in O(n).
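A short Python sketch of the modified algorithm (illustrative only; it assumes, as stated, that the input length is a power of two and at least 4):

import heapq

def fourth_smallest(a):
    blocks = [sorted(a[i:i + 4]) for i in range(0, len(a), 4)]     # O(1) per block
    while len(blocks) > 1:
        # merge adjacent pairs, keeping only the 4 least elements of each pair
        blocks = [list(heapq.merge(l, r))[:4]
                  for l, r in zip(blocks[0::2], blocks[1::2])]
    return blocks[0][3]                                            # element at index [4-1]

print(fourth_smallest([9, 1, 7, 3, 8, 2, 6, 4]))                   # -> 4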
And after this proof, I want to notice that there is another linear algorithm which is trivial, but it is not divide-and-conquer.

Check if array B is a permutation of A

I tried to find a solution to this but couldn't get much out of my head.
We are given two unsorted integer arrays A and B. We have to check whether array B is a permutation of A. How can this be done? Even XORing the numbers won't work, as there are several counterexamples which have the same XOR value but are not permutations of each other.
A solution needs to be O(n) time and with space O(1)
Any help is welcome!!
Thanks.
The question is theoretical, but you can do it in O(n) time and O(1) space. Allocate an array of 2^32 counters and set them all to zero. This is an O(1) step because the array has constant size. Then iterate through the two arrays. For array A, increment the counters corresponding to the integers read. For array B, decrement them. If you run into a negative counter value during the iteration of array B, stop --- the arrays are not permutations of each other. Otherwise at the end (assuming A and B have the same size, a prerequisite) the counter array is all zeros and the two arrays are permutations of each other.
This is an O(1) space and O(n) time solution. However, it is not practical, but it would easily pass as a solution to the interview question. At least it should.
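A hedged sketch of the idea (using a much smaller bound than 2^32 so that it actually runs; the bound is my assumption, the logic is as described above):

UNIVERSE = 2**16                         # stand-in for the 2^32 counters above

def is_permutation(A, B):
    if len(A) != len(B):
        return False
    counters = [0] * UNIVERSE            # constant size, independent of len(A)
    for x in A:
        counters[x] += 1
    for x in B:
        counters[x] -= 1
        if counters[x] < 0:              # B contains more copies of x than A
            return False
    return True                          # equal lengths + never negative => all zero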
More obscure solutions
Using a nondeterministic model of computation, checking that the two arrays are not permutations of each others can be done in O(1) space, O(n) time by guessing an element that has differing count on the two arrays, and then counting the instances of that element on both of the arrays.
In a randomized model of computation, construct a random commutative hash function and calculate the hash values for the two arrays. If the hash values differ, the arrays are not permutations of each other. Otherwise they might be. Repeat many times to bring the probability of error below the desired threshold. This is also an O(1) space, O(n) time approach, but randomized.
In a parallel computation model, let 'n' be the size of the input array. Allocate 'n' threads. Every thread i = 1 .. n reads the i-th number from the first array; let that be x. Then the same thread counts the number of occurrences of x in the first array, and then checks for the same count in the second array. Every single thread uses O(1) space and O(n) time.
Interpret an integer array [ a1, ..., an ] as the polynomial x^{a1} + x^{a2} + ... + x^{an}, where x is a free variable, and check numerically for the equivalence of the two polynomials obtained. Use floating-point arithmetic for an O(1) space and O(n) time operation. This is not an exact method, because of rounding errors and because the numerical check for equivalence is probabilistic. Alternatively, interpret the polynomial over the integers modulo a prime number, and perform the same probabilistic check.
If we are allowed to freely access a large list of primes, you can solve this problem by leveraging properties of prime factorization.
For each array, calculate the product of Prime[i] over each integer i it contains, where Prime[i] is the i-th prime number. The products of the two arrays are equal iff they are permutations of one another.
Prime factorization helps here for two reasons.
Multiplication is commutative and associative, and so the ordering of the operands when calculating the product is irrelevant. (Some alluded to the fact that if the arrays were sorted, this problem would be trivial. By multiplying, we are implicitly sorting.)
Prime numbers multiply losslessly. If we are given a number and told it is the product of only prime numbers, we can calculate exactly which prime numbers were fed into it and exactly how many.
Example:
a = 1,1,3,4
b = 4,1,3,1
Product of ith primes in a = 2 * 2 * 5 * 7 = 140
Product of ith primes in b = 7 * 2 * 5 * 2 = 140
That said, we probably aren't allowed access to a list of primes, but this seems a good solution otherwise, so I thought I'd post it.
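A quick sketch of the prime-product check; the prime generator here is my own helper (and it assumes the array values are positive integers), not part of the answer:

from math import prod

def first_primes(k):
    primes, candidate = [], 2
    while len(primes) < k:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes

def same_multiset(a, b):
    primes = first_primes(max(a + b))
    return len(a) == len(b) and \
        prod(primes[i - 1] for i in a) == prod(primes[i - 1] for i in b)

print(same_multiset([1, 1, 3, 4], [4, 1, 3, 1]))   # 2*2*5*7 == 7*2*5*2 -> True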
I apologize for posting this as an answer as it should really be a comment on antti.huima's answer, but I don't have the reputation yet to comment.
The size of each counter seems to be O(log(n)) bits, as it depends on the number of instances of a given value in the input array.
For example, let the input array A be all 1's with a length of (2^32) + 1. This will require a counter of size 33 bits to encode (which, in practice, would double the size of the array, but let's stay with theory). Double the size of A (still all 1 values) and you need 65 bits for each counter, and so on.
This is a very nit-picky argument, but these interview questions tend to be very nit-picky.
If we need not sort this in-place, then the following approach might work:
Create a HashMap with the array element as key and the number of occurrences as value. (This handles multiple occurrences of the same number.)
Traverse array A.
Insert the array elements in the HashMap.
Next, traverse array B.
Search every element of B in the HashMap. If the corresponding value is 1, delete the entry. Else, decrement the value by 1.
If we are able to process the entire array B and the HashMap is empty at that point, success; else failure.
HashMap will use constant space and you will traverse each array only once.
Not sure if this is what you are looking for. Let me know if I have missed any constraint about space/time.
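A compact Python sketch of these steps (a Counter stands in for the HashMap):

from collections import Counter

def is_permutation_map(A, B):
    counts = Counter(A)                  # traverse A once
    for x in B:                          # traverse B once
        if counts.get(x, 0) == 0:
            return False                 # x occurs more often in B than in A
        counts[x] -= 1
        if counts[x] == 0:
            del counts[x]
    return not counts                    # empty map => every count matched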
You're given two constraints: computational O(n), where n means the total length of both A and B, and memory O(1).
If two sequences A and B are permutations of each other, then there is also a sequence C that results from permuting either A or B. So the problem becomes permuting both A and B into sequences C_A and C_B and comparing them.
One such permutation would be sorting. There are several sorting algorithms which work in place, so you can sort A and B in place. Now in a best case scenario Smooth Sort sorts with O(n) computational and O(1) memory complexity, in the worst case with O(n log n) / O(1).
The per-element comparison then takes O(n), but since in O notation O(2*n) = O(n), using a Smooth Sort plus a comparison gives you an O(n) / O(1) check of whether two sequences are permutations of each other. However, in the worst case it will be O(n log n) / O(1).
The solution needs to be O(n) time and with space O(1).
This rules out sorting, and the O(1) space requirement is a hint that you probably should compute a hash of the strings and compare them.
If you have access to a prime number list do as cheeken's solution.
Note: If the interviewer says you don't have access to a prime number list. Then generate the prime numbers and store them. This is O(1) because the Alphabet length is a constant.
Else here's my alternative idea. I will define the Alphabet as = {a,b,c,d,e} for simplicity.
The values for the letters are defined as:
a, b, c, d, e
1, 2, 4, 8, 16
Note: if the interviewer says this is not allowed, then make a lookup table for the Alphabet; this takes O(1) space because the size of the Alphabet is a constant.
Define a function which can find the distinct letters in a string.
// set bit value of char c in variable i and return result
distinct(char c, int i) : int
E.g. distinct('a', 0) returns 1
E.g. distinct('a', 1) returns 1
E.g. distinct('b', 1) returns 3
Thus if you iterate the string "aab" the distinct function should give 3 as the result
Define a function which can calculate the sum of the letters in a string.
// return sum of c and i
sum(char c, int i) : int
E.g. sum('a', 0) returns 1
E.g. sum('a', 1) returns 2
E.g. sum('b', 2) returns 4
Thus if you iterate the string "aab" the sum function should give 4 as the result
Define a function which can calculate the length of the letters in a string.
// return length of string s
length(string s) : int
E.g. length("aab") returns 3
Running the methods on two strings and comparing the results takes O(n) running time. Storing the hash values takes O(1) in space.
e.g.
distinct of "aab" => 3
distinct of "aba" => 3
sum of "aab" => 4
sum of "aba" => 4
length of "aab" => 3
length of "aba" => 3
Since all the values are equal for both strings, they must be a permutation of each other.
EDIT: The solutions is not correct with the given alphabet values as pointed out in the comments.
You can convert one of the two arrays into an in-place hashtable. This will not be exactly O(N), but it will come close, in non-pathological cases.
Just use [number % N] as its desired index, or in the chain that starts there. If any element has to be replaced, it can be placed at the index where the offending element started. Rinse, wash, repeat.
UPDATE:
This is a similar (N=M) hash table. It did use chaining, but it could be downgraded to open addressing.
I'd use a randomized algorithm that has a low chance of error.
The key is to use a universal hash function.
def hash(array, hash_fn):
    cur = 0
    for item in array:
        cur ^= hash_fn(item)     # XOR is order-independent, so permutations hash equally
    return cur

def are_perm(a1, a2):
    hash_fn = pick_random_universal_hash_func()
    return hash(a1, hash_fn) == hash(a2, hash_fn)
If the arrays are permutations, it will always be right. If they are different, the algorithm might incorrectly say that they are the same, but it will do so with very low probability. Further, you can get an exponential decrease in the chance of error with a linear amount of work by asking many are_perm() questions on the same input; if it ever says no, then they are definitely not permutations of each other.
I just found a counterexample, so the assumption below is incorrect.
I cannot prove it, but I thought this might possibly be true:
Since all elements of the arrays are integers, suppose each array has 2 elements,
and we have
a1 + a2 = s
a1 * a2 = m
b1 + b2 = s
b1 * b2 = m
then {a1, a2} == {b1, b2}
If this is true, it would also be true for arrays with n elements.
So we compare the sum and product of each array; if they are equal, one is a permutation of the other.

Efficient way to find all zeros in a matrix?

I am trying to think of an efficient algorithm to find the number of zeros in a row of a matrix, but can only come up with an O(n^2) solution (i.e. by iterating over each row and column). Is there a more efficient way to count the zeros?
For example, given the matrix
3, 4, 5, 6
7, 8, 0, 9
10, 11, 12, 3
4, 0, 9, 10
I would report that there are two zeros.
Without storing any external information, no, you can't do any better than Θ(N^2). The rationale is simple - if you don't look at all N^2 locations in the matrix, then you can't guarantee that you've found all of the zeros and might end up giving the wrong answer back. For example, if I know that you look at fewer than N^2 locations, then I can run your algorithm on a matrix and see how many zeros you report. I could then look at the locations that you didn't access, replace them all with zeros, and run your algorithm again. Since your algorithm doesn't look at those locations, it can't know that they have zeros in them, and so at least one of the two runs of the algorithm would give back the wrong answer.
More generally, when designing algorithms to process data, a good way to see if you can do better than certain runtimes is to use this sort of "adversarial analysis." Ask yourself the question: if I run faster than some time O(f(n)), could an adversary manipulate the data in ways that change the answer but I wouldn't be able to detect? This is the sort of analysis that, along with some more clever math, proves that comparison-based sorting algorithms cannot do any better than Ω(n log n) in the average case.
If the matrix has some other properties to it (for example, if it's sorted), then you might be able to do a better job than running in O(N^2). As an example, suppose that you know that all rows of the matrix are sorted. Then you can easily do a binary search on each row to determine how many zeros it contains, which takes O(N log N) time and is faster.
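To make the sorted-rows case concrete, a small sketch (assuming each row is sorted in ascending order): two binary searches per row bound the run of zeros.

from bisect import bisect_left, bisect_right

def count_zeros_sorted_rows(matrix):
    total = 0
    for row in matrix:                              # each row sorted ascending
        total += bisect_right(row, 0) - bisect_left(row, 0)
    return total

print(count_zeros_sorted_rows([[-1, 0, 0, 5], [0, 1, 2, 3]]))   # -> 3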
Depending on the parameters of your setup, you might be able to get the algorithm to run faster if you assume that you're allowed to scan in parallel. For example, if your machine has K processors on it that can be dedicated to the task of scanning the matrix, then you could split the matrix into K roughly evenly-sized groups, have each processor count the number of zeros in the group, then sum the results of these computations up. This ends up giving you a runtime of Θ(N^2 / K), since the runtime is split across multiple cores.
Always O(n^2) - or rather O(n x m). You cannot jump over it.
But if you know that the matrix is sparse (only a few elements have nonzero values), you can store only the values that are non-zero plus the matrix size. Then consider using hashing rather than storing the whole matrix - generally, create a hash which maps a row number to a nested hash from column number to value.
Example:
m =
[
0 0 0 0
0 2 0 0
0 0 1 0
0 0 1 0
]
Will be represented as:
row_numbers = 4
column_numbers = 4
hash = { 1 => { 1 => 2 }, 2 => { 2 => 1 }, 3 => { 2 => 1 } }
Then:
number_of_zeros = row_numbers * column_numbers - number_of_cells_in_hash(hash)
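In Python this might look like the following sketch (dicts stand in for the hashes; note that building the sparse form from a dense matrix is of course still O(rows * columns) - the saving comes when the matrix is already stored sparsely):

def count_zeros(row_numbers, column_numbers, cells):
    # cells: {row => {column => nonzero value}}
    return row_numbers * column_numbers - sum(len(r) for r in cells.values())

cells = {1: {1: 2}, 2: {2: 1}, 3: {2: 1}}
print(count_zeros(4, 4, cells))          # -> 13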
For any unsorted matrix it should be O(n), since we generally use 'n' for the total number of elements.
If the matrix contains X rows and Y columns, X * Y = n.
E.g. a 4 x 4 unsorted matrix has 16 elements in total, so when we iterate linearly with 2 loops we do 4 x 4 = 16 steps. That is O(n), because the total number of elements in the array is 16.
Many people voted for O(n^2) because they considered the matrix to be n x n.
Please correct me if my understanding is wrong.
Assuming that when you say "in a row of a matrix", you mean that you have the row index i and you want to count the number of zeros in the i-th row, you can do better than O(N^2).
Suppose N is the number of rows and M is the number of columns. Store your matrix as a single array [3,4,5,6,7,8,0,9,10,11,12,3,4,0,9,10]; then to access row i, you start at index i*M in the array.
Since arrays have constant-time access, this part doesn't depend on the size of the matrix. You can then iterate over the whole row by visiting the elements i*M + j for j from 0 to M-1. This is O(M), provided you know which row you want to visit and you are using an array.
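A tiny sketch of that row access (0-based row index, names are my own):

def zeros_in_row(flat, n_cols, i):
    row = flat[i * n_cols : (i + 1) * n_cols]      # constant-time slice start, O(columns) scan
    return sum(1 for v in row if v == 0)

flat = [3, 4, 5, 6, 7, 8, 0, 9, 10, 11, 12, 3, 4, 0, 9, 10]
print(zeros_in_row(flat, 4, 1))                    # row 1 is [7, 8, 0, 9] -> 1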
This is not a perfect answer for the reasons I'll explain, but it offers an alternative solution potentially faster than the one you described:
Since you don't need to know the position of the zeros in the matrix, you can flatten it into a 1D array.
After that, perform a quicksort on the elements; this may give a performance of O(n log n), depending on the randomness of the matrix you feed in.
Finally, count the zero elements at the beginning of the array until you reach a non-zero number.
In some cases, this will be faster than checking every element, although in a worst-case scenario the quicksort will take O(n^2), which in addition to the zero counting at the end may be worse than iterating over each row and column.
Assuming the given matrix is M, do an M + (-M) operation, but do not use the default +; instead use my_add(int a, int b) such that
int my_add(int a, int b){
    return (a == 0 && b == 0) ? 1 : (a + b);
}
That will give you a matrix like
0 0 0 0
0 0 1 0
0 0 0 0
0 1 0 0
Now you create a sum s := 0 and keep adding all the elements to it: s += a[i][j].
You can even do both in one pass: s += my_add(a[i][j], (-1)*a[i][j]).
But it is still O(m*n).
NOTE
To count the number of 1's you generally have to check all items in the matrix. Without looking at all the elements I don't think you can tell the number of 1's, and looping over all elements is O(m*n). It can only be faster than O(m*n) if you can leave some elements unchecked and still state the number of 1's.
EDIT
However, if you move a 2x2 kernel over the matrix and hop, you will get (m*n)/k iterations, e.g. if you operate on the neighbouring elements a[i][j], a[i+1][j], a[i][j+1], a[i+1][j+1] while i < m and j < n.

Data structure that supports range-based most frequently occurring element query

I'm looking for a data structure with which I can find the most frequently occurring number (among an array of numbers) in a given, variable range.
Let's consider the following 1 based array:
1 2 3 1 1 3 3 3 3 1 1 1 1
If I query the range (1,4), the data structure must return 1, which occurs twice.
Several other examples:
(1,13) = 1
(4,9) = 3
(2,2) = 2
(1,3) = 1 (all of 1,2,3 occur once, so return the first/smallest one. not so important at the moment)
I have searched, but could not find anything similar. I'm looking (ideally) for a data structure with minimal space requirements and fast preprocessing and/or query complexities.
Thanks in advance!
Let N be the size of the array and M the number of different values in that array.
I'm considering two complexities for each solution: pre-processing and querying an interval of size n, each in both space and time.
Solution 1 :
Spacial : O(1) and O(M)
Temporal : O(1) and O(n + M)
No pre-processing, we look at all values of the interval and find the most frequent one.
Solution 2 :
Spacial : O(M*N) and O(1)
Temporal : O(M*N) and O(min(n,M))
For each position of the array, we have an accumulative array that gives us for each value x, how many times x is in the array before that position.
Given an interval we just need for each x to subtract 2 values to find the number of x in that interval. We iterate over each x and find the maximum value. If n < M we iterate over each value of the interval, otherwise we iterate over all possible values for x.
Solution 3 :
Spacial : O(N) and O(1)
Temporal : O(N) and O(min(n,M)*log(n))
For each value x, build a binary heap of all the positions in the array where x is present. The key in your heap is the position, but you also store the total number of occurrences of x between that position and the beginning of the array.
Given an interval, we again just need, for each x, to subtract 2 values to find the number of occurrences of x in that interval: in O(log(N)) we can ask x's heap for the two positions just before the start/end of the interval and subtract the counts. Basically it needs less space than a histogram, but the query is now O(log(N)).
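For illustration, a sketch of Solution 2 (my own code; the query below simply iterates over all M values rather than switching on min(n, M)):

class RangeMode:
    def __init__(self, a):
        values = set(a)
        self.prefix = {x: [0] * (len(a) + 1) for x in values}    # O(M*N) space
        for i, v in enumerate(a):                                # O(M*N) preprocessing
            for x in values:
                self.prefix[x][i + 1] = self.prefix[x][i] + (v == x)

    def query(self, lo, hi):                        # 1-based, inclusive
        best, best_count = None, 0
        for x, counts in self.prefix.items():
            c = counts[hi] - counts[lo - 1]         # two subtractions per value
            if c > best_count or (c == best_count and best is not None and x < best):
                best, best_count = x, c
        return best

rm = RangeMode([1, 2, 3, 1, 1, 3, 3, 3, 3, 1, 1, 1, 1])
print(rm.query(1, 4), rm.query(4, 9), rm.query(2, 2))   # -> 1 3 2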
You could create a binary partition tree where each node represents a histogram map of {value -> frequency} for a given range, and has two child nodes which represent the upper half and lower half of the range.
Querying is then just a case of recursively adding together a small number of these histograms to cover the range required, and scanning the resulting histogram once to find the highest occurrence count.
Useful optimizations include:
Using a histogram with mutable frequency counts as an "accumulator" while you add histograms together
Stop using precomputed histograms once you get down to a certain size (maybe a range smaller than the total number of possible values M) and just count the numbers directly. It's a time/space trade-off that I think will pay off a lot of the time.
If you have a fixed small number of possible values, use an array rather than a map to store the frequency counts at each node
UPDATE: my thinking on algorithmic complexity assuming a bounded small number of possible values M and a total of N values in the complete range:
Preprocessing is O(N log N) - basically you need to traverse the complete list and build a binary tree, building one node for every M elements in order to amortise the overhead of each node
Querying is O(M log N) - basically adding up O(log N) histograms each of size M, plus counting O(M) values on either side of the range
Space requirement is O(N) - approx. 2N/M histograms each of size M. The 2 factor is the sum from having N/M histograms at the bottom level, 0.5N/M histograms at the next level, 0.25N/M at the third level etc...
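To make the idea concrete, here is a simplified sketch (my own; it builds a histogram per node all the way down to single elements, so it omits the "one node per M elements" and "mutable accumulator" optimisations described above):

from collections import Counter

class HistTree:
    def __init__(self, a):
        self.n = len(a)
        self.a = a
        self.hist = {}                                 # (lo, hi) -> histogram of a[lo..hi]
        self._build(0, self.n - 1)

    def _build(self, lo, hi):
        if lo == hi:
            h = Counter([self.a[lo]])
        else:
            mid = (lo + hi) // 2
            h = self._build(lo, mid) + self._build(mid + 1, hi)
        self.hist[(lo, hi)] = h
        return h

    def _collect(self, lo, hi, ql, qr, acc):
        if qr < lo or hi < ql:
            return
        if ql <= lo and hi <= qr:
            acc.update(self.hist[(lo, hi)])            # node fully covered: add its histogram
            return
        mid = (lo + hi) // 2
        self._collect(lo, mid, ql, qr, acc)
        self._collect(mid + 1, hi, ql, qr, acc)

    def most_frequent(self, l, r):                     # 1-based inclusive, as in the question
        acc = Counter()
        self._collect(0, self.n - 1, l - 1, r - 1, acc)
        return max(acc.items(), key=lambda kv: (kv[1], -kv[0]))[0]

t = HistTree([1, 2, 3, 1, 1, 3, 3, 3, 3, 1, 1, 1, 1])
print(t.most_frequent(1, 4), t.most_frequent(4, 9))    # -> 1 3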
