Tree Hash: How to verify if a range is tree-hash-aligned? - algorithm

"Tree Hash" is a concept similar to a Merkle tree/Tiger tree hash, used by Amazon Glacier to verify the data integrity of subsets of a given data stream.
In order to receive tree hashes from Amazon Glacier when retrieving data, the specified byte range has to be "tree hash aligned".
The concept of "tree hash aligned" is described here.
Quoting from the developer documentation:
A range [A, B] is tree-hash aligned with respect to an archive if and only if when a new tree hash is built over [A, B], the root of the tree hash of that range is equivalent to a node in the tree hash of the whole archive. [...]
Consider [P, Q) as the range query for an archive of N megabytes (MB) and P and Q are multiples of one MB. Note that the actual inclusive range is [P MB, Q MB – 1 byte], but for simplicity, we show it as [P, Q). With these considerations, then
If P is an odd number, there is only one possible tree-hash aligned range—that is [P, P + 1 MB).
If P is an even number and k is the maximum number, where P can be written as 2k * X, then there are at most k tree-hash aligned ranges that start with P. X is an integer greater than 0. The tree-hash aligned ranges fall in the following categories:
For each i, where (0 <= i <= k) and where P + 2i < N, then [P, Q + 2i) is a tree-hash aligned range.
P = 0 is the special case where A = 2[lgN]*0
Now the question: How do I verify programmatically if a given range [startByte, endByte] is tree-hash-aligned? Programming language does not matter.
Test cases:
[0,0) => true
[0,1) => true
[0,2) => false
[0,3) => true
[1,2) => false
[4,5) => true

Here is a basic implementation of the is_treehash_aligned function in Python:
def max_k(x):
    # largest k such that 2**k divides x (x must be non-zero)
    return 1 + max_k(x // 2) if x % 2 == 0 else 0
def is_treehash_aligned(P, Q):
    if Q < P:
        return False
    elif P % 2 == 1:
        return Q == P
    else:
        ilen = Q - P + 1  # size(interval)
        if not (((ilen & (ilen - 1)) == 0) and ilen != 0):
            return False  # size(interval) ~ not power of two
        if P == 0:
            return True
        else:
            k = max_k(P)
            i = ilen.bit_length() - 1  # exact log2, since ilen is a power of two
            return i <= k
if __name__ == "__main__":
    ranges = [(0, 0), (0, 1), (0, 2), (0, 3), (1, 2),
              (4, 5), (6, 7), (2, 4), (6, 8), (5, 6),
              (4, 4), (1, 1), (4194304, 5242879),
              (4194304, 5242880), (4194304, 5242881)]
    for r in ranges:
        ret = is_treehash_aligned(*r)
        print("[" + str(r[0]) + ", " + str(r[1]) + ") => " + str(ret))
The output is:
[0, 0) => True
[0, 1) => True
[0, 2) => False
[0, 3) => True
[1, 2) => False
[4, 5) => True
[6, 7) => True
[2, 4) => False
[6, 8) => False
[5, 6) => False
[4, 4) => True
[1, 1) => True
[4194304, 5242879) => True
[4194304, 5242880) => False
[4194304, 5242881) => False
Note that:
I adopted your notation for intervals rather than the one provided by the instructions. As a consequence, it is possible to assume that each interval is Megabyte aligned.
The result for the test-case [4194304, 5242880) differs from what you put in your original question, though I double-checked it and I am somewhat confident it is correct.
If N is known, which is not the case in your test cases, then when P == 0 one should also accept any range such that Q >= floor(N), and not only those with a size that is a power of two. A similar argument could be made for sub-trees for which there is nothing else on the right. Both of these cases would match the definition of Tree-Hash Alignment given here, but not the instructions for identifying it.
Notes: both the question and the description of the problem appear to be somewhat confusing.
The test cases are given with the notation [A, B) where A is the index of the starting block and B is the index of the ending block (included), assuming that the whole archive is composed of an array, indexed starting from 0, of N blocks of size 1 MB each (except possibly the last one). E.g.:
[0,0) => true
[0,1) => true
[0,2) => false
[0,3) => true
[1,2) => false
[4,5) => true
However, the instructions assume that the ranges are given with the notation [P MB, Q MB – 1 byte].
The instructions are misleading.
For example, here it says:
If P is an even number and k is the maximum number, where P can be written as 2k * X, then there are at most k tree-hash aligned ranges that start with P
The power symbol appears to be omitted, perhaps due to wrong HTML code, as the sentence should be "the largest k s.t. P = (2^k)*X".
Another example is:
For each i, where (0 <= i <= k) and where P + 2i < N, then [P, Q + 2i) is a tree-hash aligned range.
Assume for example that Q = P + 1, i > 0 and k > 0. Then the interval [P, Q + 2^i) has size = Q + 2^i - P = P + 1 + 2^i - P = 2^i + 1 > 1. However, by construction there exists no such tree-hash aligned range with an odd size larger than one. The proposition should be: "[...], then [P, P + 2^i) is a tree-hash aligned range".
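The corrected proposition can be sketched in Python as follows. This is only an illustration, not Glacier's implementation: `aligned_ranges_from` is a hypothetical helper, P and the archive size N are counted in whole 1 MB blocks, ranges are half-open [P, Q), and the sketch assumes a range may end exactly at N (the quoted text uses the strict condition P + 2^i < N).

```python
def aligned_ranges_from(p, n):
    """List the half-open ranges [p, p + 2**i) starting at block p.

    Assumption: a range may end exactly at the archive end n.
    """
    ranges = []
    size = 1
    # size runs over 1, 2, 4, ... and must divide p, i.e. i <= k where
    # p = 2**k * X; p == 0 is divisible by every power of two.
    while p + size <= n and p % size == 0:
        ranges.append((p, p + size))
        size *= 2
    return ranges


print(aligned_ranges_from(4, 16))
print(aligned_ranges_from(5, 16))
```

An odd P yields only the single-block range, which agrees with the first rule quoted from the documentation.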

Related

minimum number of operations to make two numbers equal

I had an interview and couldn't think of a clear/best solution for this problem.
Given two numbers A and B, we need to convert A to B with the minimum number of the following operations:
Subtract 1
Add 1
Multiply 2
Divide 2
Multiply 3
Divide 3
E.g.: if a=3 and b=7, the program should output 2.
1st operation: *2 -> 3*2 = 6.
2nd operation: +1 -> 6 + 1 = 7.
E.g.: if a=10 and b=60, the program should output 2.
1st operation: *2 -> 10*2 = 20.
2nd operation: *3 -> 20*3 = 60.
Since we can change a (10) to b (60) in 2 operations, the answer is 2.
Tried to use dynamic programming and recursion approach but to no avail. Any tips?
As mentioned in other answers, this can be approached using BFS in a graph whose nodes correspond to numbers and whose edges correspond to operations.
Interestingly, sometimes, optimal paths need to contain quite large numbers (larger than 3 * max(A, B)).
Below is an example of an optimal path with such large numbers within it:
a = 82, b = 73
optimal path:
[82, 164, 328, 656, 657, 219, 73] (6 operations)
optimal path if paths with values larger than 3 * max(a, b) are discarded:
[82, 81, 162, 54, 108, 216, 72, 73] (7 operations)
Below is a Python implementation of this BFS solution:
def solve(a, b, max_n=None):
    # the bfs queue
    queue = []
    # length[i] = length of the shortest
    # path to get from `a' to `i'
    length = {}
    # previous[i] = previous value reached
    # in the shortest path from `a' to `i'
    previous = {}
    # node with value `a' is the first in the path
    queue.append(a)
    length[a] = 0
    previous[a] = None
    while True:
        val = queue.pop(0)
        # add an element to the queue (if it was not
        # already visited, and eventually not above
        # some limit)
        def try_add(next_val):
            if max_n is not None and next_val > max_n:
                return
            if next_val in length:
                return
            queue.append(next_val)
            length[next_val] = length[val] + 1
            previous[next_val] = val
        try_add(val + 1)
        try_add(val - 1)
        try_add(val * 2)
        if val % 2 == 0:
            try_add(val // 2)
        try_add(val * 3)
        if val % 3 == 0:
            try_add(val // 3)
        # check whether we already have a solution
        if b in length:
            break
    path = [b]
    while True:
        if path[-1] == a:
            break
        else:
            path.append(previous[path[-1]])
    path.reverse()
    return path
if __name__ == '__main__':
    a = 82
    b = 73
    path = solve(a, b)
    print(len(path), ': ', path)
    path = solve(a, b, 3 * max(a, b))
    print(len(path), ': ', path)
Treat numbers as nodes of a graph, and operations as edges. Use BFS to find the shortest path from A to B.
I think you can cap the nodes at 3 times the larger of |A| and |B| to bound the search. This is not necessary, and such a cap can miss the shortest path: for A = 82 and B = 73 the optimal path passes through 657.
The space and time complexity grows exponentially with the answer: each node has up to 6 neighbours, so if the answer is 2, in the worst case we have to visit 6 + 6*6 = 42 nodes.
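A minimal sketch of this BFS in Python, returning only the number of operations. The name `min_operations` is illustrative, and no cap is applied, so the search relies on +1/-1 eventually reaching any integer target.

```python
from collections import deque


def min_operations(a, b):
    """Return the fewest operations needed to turn a into b (unbounded BFS)."""
    if a == b:
        return 0
    distance = {a: 0}      # value -> number of operations to reach it
    queue = deque([a])
    while queue:
        value = queue.popleft()
        candidates = [value + 1, value - 1, value * 2, value * 3]
        if value % 2 == 0:
            candidates.append(value // 2)
        if value % 3 == 0:
            candidates.append(value // 3)
        for nxt in candidates:
            if nxt not in distance:
                distance[nxt] = distance[value] + 1
                if nxt == b:
                    return distance[nxt]
                queue.append(nxt)
    return None  # not reached for integer inputs


print(min_operations(3, 7))    # 2
print(min_operations(10, 60))  # 2
```

A `deque` is used so that taking the next node from the front of the queue is constant time.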
Here's a BFS Javascript solution:
const findPath = (ops) => (A, B) => {
  const queue = new Set() .add ( [A, []] )
  const paths = new Map()
  while (queue .size !== 0 && !paths .has (B)) {
    const next = [...queue] [0]
    const [n, p] = next
    ops.forEach((fn) => {
      const m = fn(n);
      if (Number.isInteger(m)) {
        if (!paths.has(m)) {
          queue.add([m, [...p, n]])
          paths.set(m, [...p, n])
        }
      }
    })
    // the current entry is fully processed, so drop it from the queue
    queue.delete(next)
  }
  return paths.get(B)
}
const ops = [n => n + 1, n => n - 1, n => 2 * n, n => 3 * n, n => n / 2, n => n / 3]
console .log (
  findPath (ops) (82, 73)
)
We keep a queue of numbers still to process and a dictionary recording the paths for each number found, and keep testing them until the queue is empty (won't happen with these operations, but others might let us drain it) or we've found our target. For each number we run each operation and for integer results add it to our structures if it's not already found.
There is nothing in here to attempt to stop a chain from spiraling out of control. It's not clear how we would do that. And it would clearly be possible with different operations: if we had, say, add 2, subtract 2, and double, we'd never be able to get from 2 to 3. This algorithm would never stop.
While this could of course be converted to a recursive algorithm, the naive recursion is not likely to succeed as it would work depth-first and usually miss the value and never halt.

Can someone explain the mathematics behind this solution

A question asks:
Take a sequence of numbers from 1 to n (where n > 0).
Within that sequence, there are two numbers, a and b.
The product of a and b should equal the sum of all numbers in the sequence excluding a and b.
Given a number n, could you tell me the numbers excluded from the sequence?
My plan was to get the sum of the range, then create an array using the combination enumerator to get all of the possible pairs of the range, then check if the product of the pair equals the sum of the range minus the sum of the pair. This solution worked, but took way too long:
def removNb(n)
  arr = [*1..n]
  sum = arr.inject(:+)
  ab = []
  [*(n/2)..n].combination(2).to_a.each do |pair|
    if pair.inject(:*) == sum - pair.inject(:+)
      ab << pair
      ab << [pair[1],pair[0]]
    end
  end
  ab
end
Here is a solution that I found:
def removNb(n)
  res = []
  total = (n*n + n) / 2
  range = (1..n)
  (1..n).each do |a|
    b = ((total - a) / (a * 1.0 + 1.0))
    if b == b.to_i && b <= n
      res.push([a,b.to_i])
    end
  end
  return res
end
but can't understand how it works. I understand the equation behind the total.
You could form an equation
a * b = (sum of sequence from 1 to n) - (a + b)
from this statement
the product of a and b should be equal to the sum of all numbers in
the sequence, excluding a and b
sum of sequence from 1 to n (denote as total) = n(n+1)/2 = (n*n + n) / 2
Move b to the left-hand side, so that a * b + b = total - a, i.e. b * (a + 1) = total - a, and you get
b = (total - a) / (a + 1)
The remaining work is to test whether there exist integers a and b matching this equation
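As a rough illustration of that test, here is a Python sketch of the same check using integer arithmetic. The function name `excluded_pairs` is made up for this example, and skipping a == b is an assumption (it reads "two numbers" as two distinct elements).

```python
def excluded_pairs(n):
    """Return all (a, b) in 1..n, a != b, with a*b == sum(1..n) - a - b."""
    total = n * (n + 1) // 2
    pairs = []
    for a in range(1, n + 1):
        # b = (total - a) / (a + 1) must be a whole number inside 1..n
        b, remainder = divmod(total - a, a + 1)
        if remainder == 0 and 1 <= b <= n and b != a:
            pairs.append((a, b))
    return pairs


print(excluded_pairs(10))  # [(6, 7), (7, 6)]
```

Using `divmod` keeps everything in integers, so no floating-point comparison is needed to decide whether b is whole.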
The code returns an array of all pairs of numbers in the sequence that have the desired property. Let's step through it.
Initialize the array to be returned.
res = []
Compute the sum of the elements in the sequence. The sum of the elements of any arithmetic sequence equals the first element plus the last element, multiplied by the number of elements in the sequence, the product divided by 2. Here that is total = n*(1+n)/2, which can be expressed as
total = (n*n + n) / 2
range = (1..n) is unnecessary as range is not subsequently referenced.
Loop through the elements of the sequence
(1..n).each do |a|
For each value of a we seek another element of the sequence b such that
a*b = total - a - b
Solving for b:
b = (total - a)/ (a * 1.0 + 1.0)
If b is a whole number no greater than n, save the pair [a, b]
if b == b.to_i && b <= n
  res.push([a,b.to_i])
end
Return the array res
res
This method contains two errors:
If [a,b] is added to res, it will be added twice (once as [a,b] and once as [b,a])
[a,a] could be added to res (such as n=5, a=b=3)
I would write this as follows.
def remove_numbers(n)
  total = n*(n+1)/2
  (1..n-1).each_with_object([]) do |a,res|
    next unless (total-a) % (a+1) == 0
    b = (total-a)/(a+1)
    res << [a,b] if (a+1..n).cover?(b)
  end
end
For example,
remove_numbers 10
#=> [[6, 7]]
remove_numbers 1000
#=> []
Out of curiosity:
(2..10_000).map { |x| [x, remove_numbers(x).size] }.max_by(&:last)
#=> [3482, 4]
remove_numbers 3482
#=> [[1770, 3423], [2023, 2995], [2353, 2575], [2460, 2463]]

Number of unique sequences of 3 digits (-1,0,1) given a length that matches a sum

Say you have a vertical game board of length n (being the number of spaces). And you have a three-sided die that has the options: go forward one, stay and go back one. If you go below or above the number of board game spaces it is an invalid game. The only valid move once you reach the end of the board is "stay". Given an exact number of die rolls t, is it possible to algorithmically work out the number of unique dice rolls that result in a winning game?
So far I've tried producing a list of every possible combination of (-1,0,1) for the given number of die rolls and sorting through the list to see if any add up to the length of the board and also meet all the requirements for being a valid game. But this is impractical for dice rolls above 20.
For example:
t=1, n=2; Output=1
t=3, n=2; Output=3
You can use a dynamic programming approach. The sketch of a recurrence is:
M(0, 1) = 1
M(t, n) = M(t-1, n-1) + M(t-1, n) + M(t-1, n+1)
Of course you have to consider the border cases (like going off the board, or not being allowed to leave the end of the board), but it's easy to code that.
Here's some Python code:
def solve(N, T):
    M, M2 = [0]*N, [0]*N
    M[0] = 1
    for i in range(T):
        M, M2 = M2, M
        for j in range(N):
            M[j] = (j>0 and M2[j-1]) + M2[j] + (j+1<N-1 and M2[j+1])
    return M[N-1]
print(solve(3, 2)) #1
print(solve(2, 1)) #1
print(solve(2, 3)) #3
print(solve(5, 20)) #19535230
Bonus: fancy "one-liner" with list comprehension and reduce
from functools import reduce
def solve(N, T):
    return reduce(
        lambda M, _: [(j>0 and M[j-1]) + M[j] + (j<N-2 and M[j+1]) for j in range(N)],
        range(T), [1]+[0]*(N-1))[-1]
Let M[i, j] be an N by N matrix with M[i, j] = 1 if |i-j| <= 1 and 0 otherwise (and the special case for the "stay" rule of M[N, N-1] = 0)
This matrix counts paths of length 1 from position i to position j.
To find paths of length t, simply raise M to the t'th power. This can be performed efficiently by linear algebra packages.
The solution can be read off: M^t[1, N].
For example, computing paths of length 20 on a board of size 5 in an interactive Python session:
>>> import numpy
>>> M = numpy.matrix('1 1 0 0 0;1 1 1 0 0; 0 1 1 1 0; 0 0 1 1 1; 0 0 0 0 1')
>>> M
matrix([[1, 1, 0, 0, 0],
        [1, 1, 1, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 1, 1, 1],
        [0, 0, 0, 0, 1]])
>>> M ** 20
matrix([[31628466, 51170460, 51163695, 31617520, 19535230],
        [51170460, 82792161, 82787980, 51163695, 31617520],
        [51163695, 82787980, 82792161, 51170460, 31628465],
        [31617520, 51163695, 51170460, 31628466, 19552940],
        [       0,        0,        0,        0,        1]])
So there's M^20[1, 5], or 19535230 paths of length 20 from start to finish on a board of size 5.
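For readers without a linear algebra package, the same idea can be sketched in plain Python with exponentiation by squaring. The helper names `mat_mul`, `mat_pow` and `count_paths` are made up for this sketch, and indices are 0-based, so the answer is read from row 0, column n - 1.

```python
def mat_mul(a, b):
    """Multiply two square integer matrices given as lists of rows."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size))
             for j in range(size)] for i in range(size)]


def mat_pow(m, exponent):
    """Raise a square matrix to a non-negative power by repeated squaring."""
    size = len(m)
    result = [[int(i == j) for j in range(size)] for i in range(size)]
    while exponent:
        if exponent & 1:
            result = mat_mul(result, m)
        m = mat_mul(m, m)
        exponent >>= 1
    return result


def count_paths(n, t):
    """Count valid games of exactly t rolls on a board with n spaces."""
    m = [[1 if abs(i - j) <= 1 else 0 for j in range(n)] for i in range(n)]
    if n > 1:
        m[n - 1][n - 2] = 0  # from the last space, only "stay" is allowed
    return mat_pow(m, t)[0][n - 1]


print(count_paths(5, 20))  # 19535230
```

Repeated squaring needs on the order of log t matrix multiplications, which is what makes large t practical.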
Try a backtracking algorithm. Recursively "dive down" to depth t and only continue with dice values that could still result in a valid state, probably by passing a "remaining budget" around.
For example, with n=10 and t=20, when you have reached depth 10 of 20 and your budget is still 10 (the steps forward and backward so far have cancelled out), the remaining recursion steps down to depth t would drop the 0 and -1 possibilities, because they could not result in a valid state at the end.
A backtracking algorithm for this case is still very heavy (exponential), but better than first blowing up a bubble with all possibilities and then filtering.
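A possible sketch of this pruned recursion in Python follows. The names `count_games` and `dive` are illustrative, spaces are numbered 1..n, and the `lru_cache` decorator is an addition not mentioned in the answer: it memoizes repeated (position, rolls left) states, which keeps larger t tractable.

```python
from functools import lru_cache


def count_games(n, t):
    """Count roll sequences of length t that go from space 1 to space n."""

    @lru_cache(maxsize=None)
    def dive(pos, rolls_left):
        # Prune: the end can no longer be reached with the rolls left.
        if n - pos > rolls_left:
            return 0
        if rolls_left == 0:
            return 1  # the prune above guarantees pos == n here
        if pos == n:
            return 1  # only "stay" is valid for all remaining rolls
        total = 0
        for step in (-1, 0, 1):
            nxt = pos + step
            if nxt >= 1:  # never step off the start of the board
                total += dive(nxt, rolls_left - 1)
        return total

    return dive(1, t)


print(count_games(2, 3))   # 3
print(count_games(5, 20))  # 19535230
```

Without the cache this is the plain exponential backtracking described above; with it, each state is evaluated once.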
Since zeros can be added anywhere, we'll multiply those possibilities by the different arrangements of (-1)'s:
X (space 1) X (space 2) X (space 3) X (space 4) X
(-1)'s can only appear in spaces 1,2 or 3, not in space 4. I got help with the mathematical recurrence that counts the number of ways to place minus ones without skipping backwards.
JavaScript code:
function C(n,k){
  if (k==0 || n==k) return 1;
  var p = n;
  for (var i=2; i<=k; i++) p *= (n+1-i)/i;
  return p;
}
function sumCoefficients(arr,cs){
  var s = 0, i = -1;
  while (arr[++i]){
    s += cs[i] * arr[i];
  }
  return s;
}
function f(n,t){
  var numMinusOnes = (t - (n-1)) >> 1,
      result = C(t,n-1),
      numPlaces = n - 2,
      cs = [];
  for (var i=1; numPlaces-i>=i-1; i++){
    cs.push(-Math.pow(-1,i) * C(numPlaces + 1 - i,i));
  }
  var As = new Array(cs.length),
      An;
  As[0] = 1;
  for (var m=1; m<=numMinusOnes; m++){
    var zeros = t - (n-1) - 2*m;
    An = sumCoefficients(As,cs);
    As.unshift(An);
    As.pop();
    result += An * C(zeros + 2*m + n-1,zeros);
  }
  return result;
}
Output:
console.log(f(5,20))
19535230

Minimum number of special moves to sort number

Given the list of numbers
1 15 2 5 10
I need to obtain
1 2 5 10 15
The only operation I can do is "move the number X at position Y".
In the above example I only need to do "move the number 15 at position 5".
I would like to minimize the number of operations but I can't find/remember a classical algorithm for that, given the operation available.
Some background :
I'm interacting with an API for a kanban-like service.
I have about 600 cards and some actions on our bug-tracker can imply a reordering of these 600 cards in the kanban (multiple cards can move at the same time if the priority of a project is changed)
I can do it in 600 calls to the API but I'm trying to reduce that number as much as possible.
Lemma: The minimum number of (delete element, insert element) pairs you can perform to sort a list L (in increasing order) is:
Smin(L) = |L| - |LIC(L)|
Where LIC(L) is the Longest Increasing Subsequence.
Thus, you have to:
Establish the LIC of your list.
Remove the elements not in it and insert them back at the appropriate position (using binary search).
Proof:
By induction.
For a list of size 1, the longest increasing subsequence is of length... 1! The list is already sorted so the number of (del,ins) pairs required is
|L| - |LIC(L)| = 1 - 1 = 0
Now let L_n be a list of length n, n ≥ 1. Let L_{n+1} be the list obtained by adding an element e_{n+1} to the left of L_n.
This element may or may not influence the Longest Increasing Subsequence. Let's try to see how...
Let i_{n,1} and i_{n,2} be the first two elements of LIC(L_n) (*):
If e_{n+1} > i_{n,2}, then LIC(L_{n+1}) = LIC(L_n)
If e_{n+1} ≤ i_{n,1}, then LIC(L_{n+1}) = e_{n+1} || LIC(L_n)
Else, LIC(L_{n+1}) = LIC(L_n) - i_{n,1} + e_{n+1}. We keep the LIC with the highest first element. This is done by removing i_{n,1} from the LIC and replacing it with e_{n+1}.
In the first case, we delete e_{n+1}, and are thus left with sorting L_n. By the induction hypothesis, this requires n - |LIC(L_n)| (deletion, insertion) pairs. We then have to insert e_{n+1} at the appropriate position. Thus:
Smin(L_{n+1}) = 1 + Smin(L_n)
Smin(L_{n+1}) = 1 + n - |LIC(L_n)|
Smin(L_{n+1}) = |L_{n+1}| - |LIC(L_{n+1})|
In the second case, we ignore e_{n+1}. We begin by deleting elements not in LIC(L_n). These elements have to be inserted again! There are
Smin(L_n) = |L_n| - |LIC(L_n)|
such elements.
Now, we just have to take care to insert them in the right order (relative to e_{n+1}). In the end, it requires:
Smin(L_{n+1}) = |L_n| - |LIC(L_n)|
Smin(L_{n+1}) = |L_n| + 1 - (|LIC(L_n)| + 1)
Since we have |LIC(L_{n+1})| = |LIC(L_n)| + 1 and |L_{n+1}| = |L_n| + 1, we have in the end:
Smin(L_{n+1}) = |L_{n+1}| - |LIC(L_{n+1})|
The last case can be proved by considering the list L'_n obtained by removing i_{n,1} from L_{n+1}. In that case LIC(L'_n) = LIC(L_{n+1}) and thus:
|LIC(L'_n)| = |LIC(L_n)| (1)
From there, we can sort L'_n (which takes |L'_n| - |LIC(L'_n)| pairs by the induction hypothesis). The previous equality (1) leads to the result.
(*): If |LIC(L_n)| < 2, then i_{n,2} doesn't exist. Just ignore the comparisons with it. In that case, only case 2 and case 3 apply... The result is still valid.
One possible solution is to find the longest increasing subsequence and move only elements that aren't inside it.
I can't prove it's optimal, but it is easy to prove it is correct and better than N swaps.
Here is a proof-of-concept in Python. I implemented it as an O(n^2) algorithm, but I'm pretty sure it can be reduced to O(n log n).
from operator import itemgetter
def LIS(V):
    T = [1]*(len(V))
    P = [-1]*(len(V))
    for i, v in enumerate(V):
        for j in range(i-1, -1, -1):
            if T[j]+1 > T[i] and V[j] <= V[i]:
                T[i] = T[j] + 1
                P[i] = j
    i, _ = max(enumerate(T), key=itemgetter(1))
    while i != -1:
        yield i
        i = P[i]
def complement(L, n):
    # yield every index not in L, including those before L[0]
    for a, b in zip([-1]+L, L+[n]):
        for i in range(a+1, b):
            yield i
def find_moves(V):
    n = len(V)
    L = list(LIS(V))[::-1]
    SV = sorted(range(n), key=lambda i:V[i])
    moves = [(x, SV.index(x)) for x in complement(L, n)]
    while len(moves):
        a, b = moves.pop()
        yield a, b
        moves = [(x-(x>a)+(x>b), y) for x, y in moves]
def make_and_print_moves(V):
    print('Initial array:', V)
    for a, b in find_moves(V):
        x = V.pop(a)
        V.insert(b, x)
        print('Move {} to {}. Result: {}'.format(a, b, V))
    print('***')
make_and_print_moves([1, 15, 2, 5, 10])
make_and_print_moves([4, 3, 2, 1])
make_and_print_moves([1, 2, 4, 3])
It outputs something like:
Initial array: [1, 15, 2, 5, 10]
Move 1 to 4. Result: [1, 2, 5, 10, 15]
***
Initial array: [4, 3, 2, 1]
Move 3 to 0. Result: [1, 4, 3, 2]
Move 3 to 1. Result: [1, 2, 4, 3]
Move 3 to 2. Result: [1, 2, 3, 4]
***
Initial array: [1, 2, 4, 3]
Move 3 to 2. Result: [1, 2, 3, 4]
***
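The O(n log n) reduction mentioned above can be sketched with the standard `bisect` module. This sketch only counts the moves and does not produce them. The name `min_moves` is illustrative, and it uses the longest non-decreasing subsequence to match the `<=` comparison in the proof-of-concept.

```python
from bisect import bisect_right


def min_moves(values):
    """Return len(values) minus the longest non-decreasing subsequence length."""
    # tails[k] is the smallest possible last element of a non-decreasing
    # subsequence of length k + 1 seen so far.
    tails = []
    for v in values:
        pos = bisect_right(tails, v)
        if pos == len(tails):
            tails.append(v)
        else:
            tails[pos] = v
    return len(values) - len(tails)


print(min_moves([1, 15, 2, 5, 10]))  # 1
print(min_moves([4, 3, 2, 1]))       # 3
```

Each element costs one binary search, which gives the O(n log n) bound.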

How do you find the largest gap in a vector in O(n) time?

You are given the locations of various cars in the same lane on a highway as doubles to a vector, in no particular order. How can you find the largest gap between neighboring cars in O(n) time?
It seems like a simple solution would be to sort then check, but of course this isn't linear.
Divide the range between the minimum and the maximum value into n+1 equally sized buckets. For each bucket, store the maximum and the minimum value; all other values can be discarded. Because of the pigeonhole principle, at least one of the buckets is empty, so the largest gap spans at least one empty bucket and the values between the minimum and maximum of a bucket have no influence on the result.
Then, go over the buckets and calculate the distance to the next and the previous non-empty bucket, and take the maximum; this is the final result.
An example with n=5 and values 5,2,20,17,3. Minimum is 2, maximum is 20 => bucket size is (20-2)/5 = 3.6, rounded up to 4.
Bucket:   2    6    10   14     18     20
Min/Max:  2,5  -    -    17,17  20,20
Differences: 2-5, 5-17, 17-20.
Maximum is 5-17, i.e. a gap of 12.
My Python implementation of ipc's solution:
def maximum_gap(l):
    n = len(l)
    if n < 2:
        return 0
    (x_min, x_max) = (min(l), max(l))
    if x_min == x_max:
        return 0
    buckets = [None] * (n + 1)
    bucket_size = float(x_max - x_min) / n
    for x in l:
        k = int((x - x_min) / bucket_size)
        if buckets[k] is None:
            buckets[k] = (x, x)
        else:
            buckets[k] = (min(x, buckets[k][0]), max(x, buckets[k][1]))
    result = 0
    for i in range(n):
        if buckets[i + 1] is None:
            buckets[i + 1] = buckets[i]
        else:
            result = max(result, buckets[i + 1][0] - buckets[i][1])
    return result
assert maximum_gap([]) == 0
assert maximum_gap([42]) == 0
assert maximum_gap([1, 1, 1, 1]) == 0
assert maximum_gap([1, 2, 3, 4, 6, 8]) == 2
assert maximum_gap([5, 2, 20, 17, 3]) == 12
I use a tuple for bucket's elements, None if empty. In the last part, I eliminate preemptively any remaining empty bucket by assigning it to the previous one (this works, since the first one is guaranteed to be non-empty).
Note the special case when all elements are equal.
