Choose second largest or smallest number in KDB - max

Is there a simple way to find the second largest or smallest number in a group of columns of a table?
I can easily find the largest or smallest by using select min/max (a, b, c, d) by i from t.
However I can't seem to figure out a simple way to find the second (or third) largest or smallest from the group.
Thanks

If you want only second maximum/minimum, you can use:
q) a: 4 3 5 1 6 8
q) max a except max a / second maximum
q) min a except min a / second minimum
But if you want general function that would work for any nth min/max, here is one way:
For Nth maximum
f:a (idesc a)[n-1]
q) a (idesc a)[2-1] // second maximum
For Nth minimum
f: a (iasc a)[n-1]
q) a (iasc a)[2-1] // second minimum

you can try using rank for this:
q)a:10?100
q)a
65 93 15 82 76 14 75 78 44 79
q)f:{x where y=rank x}
q)f[a;1] / second smallest
,15
q)f[a;2] / third smallest
,44
I don't think there is a reverse rank function in q so you can do this:
q)f2:{x where y=iasc idesc x}
q)f2[a;1] / second biggest
,82
q)f2[a;2] / third biggest
,79
http://code.kx.com/q/ref/sort/#rank

Related

Find the number of all possible path in a grid, from (0, 0) to (n, n)

I don't know how to find the number of all possible path in a grid, from a Point A to a Point B.
The point A is on (0,0) and the point B is on (n,n).
A can move up, down, right, and left, and can't move on visited points.
While A moving, A(x,y) = (x,y|(0=<x=<n)∩(0=<y=<n)).
You can solve this problem with recursive backtracking, but there's another approach which I think is more interesting.
If we work out the first few cases by hand we find that:
A 1x1 square has 1 path
A 2x2 square has 2 paths
A 3x3 square has 12 paths
If we then go to OEIS (the Online Encyclopedia of Integer Sequences) and put in the search phrase "1,2,12 paths", the very first result is A007764 which is entitled "Number of nonintersecting (or self-avoiding) rook paths joining opposite corners of an n X n grid".
Knowing what integer sequence you're looking for unlocks significant mathematical resources, including source code to generate the sequence, related sequences, and best-known values.
The known values of the sequence are:
1 1
2 2
3 12
4 184
5 8512
6 1262816
7 575780564
8 789360053252
9 3266598486981642
10 41044208702632496804
11 1568758030464750013214100
12 182413291514248049241470885236
13 64528039343270018963357185158482118
14 69450664761521361664274701548907358996488
15 227449714676812739631826459327989863387613323440
16 2266745568862672746374567396713098934866324885408319028
17 68745445609149931587631563132489232824587945968099457285419306
18 6344814611237963971310297540795524400449443986866480693646369387855336
19 1782112840842065129893384946652325275167838065704767655931452474605826692782532
20 1523344971704879993080742810319229690899454255323294555776029866737355060592877569255844
21 3962892199823037560207299517133362502106339705739463771515237113377010682364035706704472064940398
22 31374751050137102720420538137382214513103312193698723653061351991346433379389385793965576992246021316463868
23 755970286667345339661519123315222619353103732072409481167391410479517925792743631234987038883317634987271171404439792
24 55435429355237477009914318489061437930690379970964331332556958646484008407334885544566386924020875711242060085408513482933945720
25 12371712231207064758338744862673570832373041989012943539678727080484951695515930485641394550792153037191858028212512280926600304581386791094
26 8402974857881133471007083745436809127296054293775383549824742623937028497898215256929178577083970960121625602506027316549718402106494049978375604247408
27 17369931586279272931175440421236498900372229588288140604663703720910342413276134762789218193498006107082296223143380491348290026721931129627708738890853908108906396
You can generate the first few terms yourself on paper or via recursive backtracking, per the other answer.
I would suggest solving this with naive recursion.
Keep a set visted of places that you have visited. And in pseudo-code that is deliberately not any particular language:
function recursive_call(i, j, visited=none)
if visited is none then
visited = set()
end if
if i = n and j = n then
return 1
else if (i, j) in visited or not in grid then
return 0
else
total = 0
add (i, j) to visited
for direction in directions:
(new_i, new_j) = move(i, j, direction)
total += recursive_call(new_i, new_j, visited)
remove (i, j) from visited
return total
end if
end function

Falling segment reunites with other segments with a probability, determine the expected medium length of the segment

This is a really tough problem, just a heads-up.
We have N segments, numbered from 1 to N and defined by their left and right points, {Left[i],Right[i]}.
The i-th segment is at height N-i. The first segment (the highest one) starts falling while the others remain fixed. If during the fall a segment i intersects another segment j in at least one point, then the two will reunite with the probability P[j]/Q[j], and the obtained segment will keep falling. From the reunion of two segments, {A,B} and {C,D}, the obtained segment will be {min(A,C),max(B,D)}.
You are asked to determine the expected medium length of the first segment (i.e after it reached a height smaller than the height of any of the other segments). If this answer is a rational number U/V, you are asked to determine X such that X*V=U (mod 10^9+7)
Restrictions :
0 < P < Q < 1 000
0 < Left < Right < 1 000 000
N ≤ 100 000
time : 2.5 sec
memory : 32768 kbytes
`
The input contains N on the first line, then on the following N lines there are 4 integers : Left, Right, P, Q, representing the i-th segment [Left, Right] with a probability P/Q to reunite with the falling segment.
Example:
input:
5
35 64 58 873
41 70 407 729
18 90 165 628
10 57 33 104
60 69 152 466
output:
779316733
The answer is approximately 49.813963.
Idea 1
The length of the final segment is R-L where R is the location of the right end, and L is the location of the left end.
Expectation is a linear operation so
E(length) = E(R) - E(L)
We can compute E(R) and E(L) separately, then combined the results.
Idea 2
We can iteratively compute the PDF for the position of the left end.
It starts off being at the left end of the first segment (Left[1]) with probability 1.
When it falls past segment i, there will be an interesting collision if the left end is between Left[i] and Right[i]. We define an interesting collision to be one that affects the position of the left end.
The key point here is that if we need to know the current position of the right end to determine if there is a collision, then it is not an interesting collision! This is because if we need to know the right end, then the segment i must be completely to the right of the start point, and therefore it does not affect the position of the left edge.
So to update the PDF we collect up all the probability mass between Left[i] and Right[i], multiply by the probability of collision, and add the result to Left[i]. (The existing mass in those locations is scaled down by the probability of collision.)
Idea 3
At the moment we have an O(n^2) algorithm made of n iterations of O(n) to count and modify the mass in each range.
However, we can use a data structure such as a segment tree to allow us to perform each iteration in O(logn) time for a total time complexity of O(nlogn).

The 1000th element which is product of 2, 3, 5

There is a sequence S.
All the elements in S is product of 2, 3, 5.
S = {2, 3, 4, 5, 6, 8, 9, 10, 12, 15, 16, 18, 20, 24 ...}
How to get the 1000th element in this sequence efficiently?
I check each number from 1, but this method is too slow.
A geometric approach:
Let s = 2^i . 3^j . 5^k, where the triple (i, j, k) belongs to the first octant of a 3D state space.
Taking the logarithm,
ln(s) = i.ln(2) + j.ln(3) + k.ln(5)
so that in the state space the iso-s surfaces are planes, which intersect the first octant along a triangle. On the other hand, the feasible solutions are the nodes of a square grid.
If one wants to produce the s-values in increasing order, one can keep a list of the grid nodes closest to the current s-plane*, on its "greater than" side.
If I am right, to move from one s-value to the next, it suffices to discard the current (i, j, k) and replace it by the three triples (i+1, j, k), (i, j+1, k) and (i, j, k+1), unless they are already there, and pick the next smallest s.
An efficient implementation will be by storing the list as a binary tree with the log(s)-value as the key.
If you are asking for the first N values, you will explore a pyramidal volume of state-space of height O(³√N), and base area O(³√N²), which is the number of tree nodes, hence the spatial complexity. Every query in the tree will take O(log(N)) comparisons (and O(1) operations to fetch the minimum), for a total of O(N.log(N)).
*More precisely, the list will contain all triples on the "greater than" side and such that no index can be decreased without getting on the other side of the plane.
Here is Python code that implements these ideas.
You will notice that the logarithms are converted to fixed point (7 decimals) to avoid floating-point inaccuracies that could result in the log(s)-values not being found equal. This causes the s values being inexact in the last digits, but this does not matter as long as the ordering of the values is preserved. Recomputing the s-values from the indexes yields exact values.
import math
import bintrees
# Constants
ln2= round(10000000 * math.log(2))
ln3= round(10000000 * math.log(3))
ln5= round(10000000 * math.log(5))
# Initial list
t= bintrees.FastAVLTree()
t.insert(0, (0, 0, 0))
# Find the N first products
N= 100
for i in range(N):
# Current s
s= t.pop_min()
print math.pow(2, s[1][0]) * math.pow(3, s[1][1]) * math.pow(5, s[1][2])
# Update the list
if not s[0] + ln2 in t:
t.insert(s[0] + ln2, (s[1][0]+1, s[1][1], s[1][2]))
if not s[0] + ln3 in t:
t.insert(s[0] + ln3, (s[1][0], s[1][1]+1, s[1][2]))
if not s[0] + ln5 in t:
t.insert(s[0] + ln5, (s[1][0], s[1][1], s[1][2]+1))
The 100 first values are
1 2 3 4 5 6 8 9 10 12
15 16 18 20 24 25 27 30 32 36
40 45 48 50 54 60 64 72 75 80
81 90 96 100 108 120 125 128 135 144
150 160 162 180 192 200 216 225 240 243
250 256 270 288 300 320 324 360 375 384
400 405 432 450 480 486 500 512 540 576
600 625 640 648 675 720 729 750 768 800
810 864 900 960 972 1000 1024 1080 1125 1152
1200 1215 1250 1280 1296 1350 1440 1458 1500 1536
The plot of the number of tree nodes confirms the O(³√N²) spatial behavior.
Update:
When there is no risk of overflow, a much simpler version (not using logarithms) is possible:
import math
import bintrees
# Initial list
t= bintrees.FastAVLTree()
t[1]= None
# Find the N first products
N= 100
for i in range(N):
# Current s
(s, r)= t.pop_min()
print s
# Update the list
t[2 * s]= None
t[3 * s]= None
t[5 * s]= None
Simply put, you just have to generate each ith number consecutively. Let's call the set {2, 3, 5} to be Z. At ith iteration, assume you have all (i-1) of the values generated in the previous iteration. While generating the next one, what you basically have to do is trying all the elements in Z and for each of them generating **the least element they can form that is larger than the element generated at (i-1)th iteration. Then, you simply consider the smallest one among them as the ith value. A simple and not so efficient implementation is given below.
def generate_simple(N, Z):
generated = [1]
for i in range(1, N+1):
minFound = -1
minElem = -1
for j in range(0, len(Z)):
for k in range(0, len(generated)):
candidateVal = Z[j] * generated[k]
if candidateVal > generated[-1]:
if minFound == -1 or minFound > candidateVal:
minFound = candidateVal
minElem = j
break
generated.append(minFound)
return generated[-1]
As you may observe, this approach has a time complexity of O(N2 * |Z|). An improvement in terms of efficiency would be to store where we left off scanning in the array of generated values for each element in a second array, indicesToStart. Then, for each element we would only scan all N values of the array generated for once(i.e. all through the algorithm), which means the time complexity after such an improvement would be O(N * |Z|).
A simple implementation of the improvement based on the simple version provided above, is given below.
def generate_improved(N, Z):
generated = [1]
indicesToStart = [0] * len(Z)
for i in range(1, N+1):
minFound = -1
minElem = -1
for j in range(0, len(Z)):
for k in range(indicesToStart[j], len(generated)):
candidateVal = Z[j] * generated[k]
if candidateVal > generated[-1]:
if minFound == -1 or minFound > candidateVal:
minFound = candidateVal
minElem = j
break
indicesToStart[j] += 1
generated.append(minFound)
indicesToStart[minElem] += 1
return generated[-1]
If you have a hard time understanding how complexity decreases with this algorithm, try looking into the difference in time complexity of any graph traversal algorithm when an adjacency list is used, and when an adjacency matrix is used. The improvement adjacency lists help achieve is almost exactly the same kind of improvement we get here. In a nutshell, you have an index for each element and instead of starting to scan from the beginning you continue from wherever you left the last time you scanned the generated array for that element. Consequently, even though there are N iterations in the algorithm(i.e. the outermost loop) the overall number of operations you make is O(N * |Z|).
Important Note: All the code above is a simple implementation for demonstration purposes, and you should consider it just as a pseudocode you can test. While implementing this in real life, based on the programming language you choose to use, you will have to consider issues like integer overflow when computing candidateVal.

minimum steps required to make array of integers contiguous

given a sorted array of distinct integers, what is the minimum number of steps required to make the integers contiguous? Here the condition is that: in a step , only one element can be changed and can be either increased or decreased by 1 . For example, if we have 2,4,5,6 then '2' can be made '3' thus making the elements contiguous(3,4,5,6) .Hence the minimum steps here is 1 . Similarly for the array: 2,4,5,8:
Step 1: '2' can be made '3'
Step 2: '8' can be made '7'
Step 3: '7' can be made '6'
Thus the sequence now is 3,4,5,6 and the number of steps is 3.
I tried as follows but am not sure if its correct?
//n is the number of elements in array a
int count=a[n-1]-a[0]-1;
for(i=1;i<=n-2;i++)
{
count--;
}
printf("%d\n",count);
Thanks.
The intuitive guess is that the "center" of the optimal sequence will be the arithmetic average, but this is not the case. Let's find the correct solution with some vector math:
Part 1: Assuming the first number is to be left alone (we'll deal with this assumption later), calculate the differences, so 1 12 3 14 5 16-1 2 3 4 5 6 would yield 0 -10 0 -10 0 -10.
sidenote: Notice that a "contiguous" array by your implied definition would be an increasing arithmetic sequence with difference 1. (Note that there are other reasonable interpretations of your question: some people may consider 5 4 3 2 1 to be contiguous, or 5 3 1 to be contiguous, or 1 2 3 2 3 to be contiguous. You also did not specify if negative numbers should be treated any differently.)
theorem: The contiguous numbers must lie between the minimum and maximum number. [proof left to reader]
Part 2: Now returning to our example, assuming we took the 30 steps (sum(abs(0 -10 0 -10 0 -10))=30) required to turn 1 12 3 14 5 16 into 1 2 3 4 5 6. This is one correct answer. But 0 -10 0 -10 0 -10+c is also an answer which yields an arithmetic sequence of difference 1, for any constant c. In order to minimize the number of "steps", we must pick an appropriate c. In this case, each time we increase or decrease c, we increase the number of steps by N=6 (the length of the vector). So for example if we wanted to turn our original sequence 1 12 3 14 5 16 into 3 4 5 6 7 8 (c=2), then the differences would have been 2 -8 2 -8 2 -8, and sum(abs(2 -8 2 -8 2 -8))=30.
Now this is very clear if you could picture it visually, but it's sort of hard to type out in text. First we took our difference vector. Imagine you drew it like so:
4|
3| *
2| * |
1| | | *
0+--+--+--+--+--*
-1| |
-2| *
We are free to "shift" this vector up and down by adding or subtracting 1 from everything. (This is equivalent to finding c.) We wish to find the shift which minimizes the number of | you see (the area between the curve and the x-axis). This is NOT the average (that would be minimizing the standard deviation or RMS error, not the absolute error). To find the minimizing c, let's think of this as a function and consider its derivative. If the differences are all far away from the x-axis (we're trying to make 101 112 103 114 105 116), it makes sense to just not add this extra stuff, so we shift the function down towards the x-axis. Each time we decrease c, we improve the solution by 6. Now suppose that one of the *s passes the x axis. Each time we decrease c, we improve the solution by 5-1=4 (we save 5 steps of work, but have to do 1 extra step of work for the * below the x-axis). Eventually when HALF the *s are past the x-axis, we can NO LONGER IMPROVE THE SOLUTION (derivative: 3-3=0). (In fact soon we begin to make the solution worse, and can never make it better again. Not only have we found the minimum of this function, but we can see it is a global minimum.)
Thus the solution is as follows: Pretend the first number is in place. Calculate the vector of differences. Minimize the sum of the absolute value of this vector; do this by finding the median OF THE DIFFERENCES and subtracting that off from the differences to obtain an improved differences-vector. The sum of the absolute value of the "improved" vector is your answer. This is O(N) The solutions of equal optimality will (as per the above) always be "adjacent". A unique solution exists only if there are an odd number of numbers; otherwise if there are an even number of numbers, AND the median-of-differences is not an integer, the equally-optimal solutions will have difference-vectors with corrective factors of any number between the two medians.
So I guess this wouldn't be complete without a final example.
input: 2 3 4 10 14 14 15 100
difference vector: 2 3 4 5 6 7 8 9-2 3 4 10 14 14 15 100 = 0 0 0 -5 -8 -7 -7 -91
note that the medians of the difference-vector are not in the middle anymore, we need to perform an O(N) median-finding algorithm to extract them...
medians of difference-vector are -5 and -7
let us take -5 to be our correction factor (any number between the medians, such as -6 or -7, would also be a valid choice)
thus our new goal is 2 3 4 5 6 7 8 9+5=7 8 9 10 11 12 13 14, and the new differences are 5 5 5 0 -3 -2 -2 -86*
this means we will need to do 5+5+5+0+3+2+2+86=108 steps
*(we obtain this by repeating step 2 with our new target, or by adding 5 to each number of the previous difference... but since you only care about the sum, we'd just add 8*5 (vector length times correct factor) to the previously calculated sum)
Alternatively, we could have also taken -6 or -7 to be our correction factor. Let's say we took -7...
then the new goal would have been 2 3 4 5 6 7 8 9+7=9 10 11 12 13 14 15 16, and the new differences would have been 7 7 7 2 1 0 0 -84
this would have meant we'd need to do 7+7+7+2+1+0+0+84=108 steps, the same as above
If you simulate this yourself, can see the number of steps becomes >108 as we take offsets further away from the range [-5,-7].
Pseudocode:
def minSteps(array A of size N):
A' = [0,1,...,N-1]
diffs = A'-A
medianOfDiffs = leftMedian(diffs)
return sum(abs(diffs-medianOfDiffs))
Python:
leftMedian = lambda x:sorted(x)[len(x)//2]
def minSteps(array):
target = range(len(array))
diffs = [t-a for t,a in zip(target,array)]
medianOfDiffs = leftMedian(diffs)
return sum(abs(d-medianOfDiffs) for d in diffs)
edit:
It turns out that for arrays of distinct integers, this is equivalent to a simpler solution: picking one of the (up to 2) medians, assuming it doesn't move, and moving other numbers accordingly. This simpler method often gives incorrect answers if you have any duplicates, but the OP didn't ask that, so that would be a simpler and more elegant solution. Additionally we can use the proof I've given in this solution to justify the "assume the median doesn't move" solution as follows: the corrective factor will always be in the center of the array (i.e. the median of the differences will be from the median of the numbers). Thus any restriction which also guarantees this can be used to create variations of this brainteaser.
Get one of the medians of all the numbers. As the numbers are already sorted, this shouldn't be a big deal. Assume that median does not move. Then compute the total cost of moving all the numbers accordingly. This should give the answer.
community edit:
def minSteps(a):
"""INPUT: list of sorted unique integers"""
oneMedian = a[floor(n/2)]
aTarget = [oneMedian + (i-floor(n/2)) for i in range(len(a))]
# aTargets looks roughly like [m-n/2?, ..., m-1, m, m+1, ..., m+n/2]
return sum(abs(aTarget[i]-a[i]) for i in range(len(a)))
This is probably not an ideal solution, but a first idea.
Given a sorted sequence [x1, x2, …, xn]:
Write a function that returns the differences of an element to the previous and to the next element, i.e. (xn – xn–1, xn+1 – xn).
If the difference to the previous element is > 1, you would have to increase all previous elements by xn – xn–1 – 1. That is, the number of necessary steps would increase by the number of previous elements × (xn – xn–1 – 1). Let's call this number a.
If the difference to the next element is >1, you would have to decrease all subsequent elements by xn+1 – xn – 1. That is, the number of necessary steps would increase by the number of subsequent elements × (xn+1 – xn – 1). Let's call this number b.
If a < b, then increase all previous elements until they are contiguous to the current element. If a > b, then decrease all subsequent elements until they are contiguous to the current element. If a = b, it doesn't matter which of these two actions is chosen.
Add up the number of steps taken in the previous step (by increasing the total number of necessary steps by either a or b), and repeat until all elements are contiguous.
First of all, imagine that we pick an arbitrary target of contiguous increasing values and then calculate the cost (number of steps required) for modifying the array the array to match.
Original: 3 5 7 8 10 16
Target: 4 5 6 7 8 9
Difference: +1 0 -1 -1 -2 -7 -> Cost = 12
Sign: + 0 - - - -
Because the input array is already ordered and distinct, it is strictly increasing. Because of this, it can be shown that the differences will always be non-increasing.
If we change the target by increasing it by 1, the cost will change. Each position in which the difference is currently positive or zero will incur an increase in cost by 1. Each position in which the difference is currently negative will yield a decrease in cost by 1:
Original: 3 5 7 8 10 16
New target: 5 6 7 8 9 10
New Difference: +2 +1 0 0 -1 -6 -> Cost = 10 (decrease by 2)
Conversely, if we decrease the target by 1, each position in which the difference is currently positive will yield a decrease in cost by 1, while each position in which the difference is zero or negative will incur an increase in cost by 1:
Original: 3 5 7 8 10 16
New target: 3 4 5 6 7 8
New Difference: 0 -1 -2 -2 -3 -8 -> Cost = 16 (increase by 4)
In order to find the optimal values for the target array, we must find a target such that any change (increment or decrement) will not decrease the cost. Note that an increment of the target can only decrease the cost when there are more positions with negative difference than there are with zero or positive difference. A decrement can only decrease the cost when there are more positions with a positive difference than with a zero or negative difference.
Here are some example distributions of difference signs. Remember that the differences array is non-increasing, so positives always have to be first and negatives last:
C C
+ + + - - - optimal
+ + 0 - - - optimal
0 0 0 - - - optimal
+ 0 - - - - can increment (negatives exceed positives & zeroes)
+ + + 0 0 0 optimal
+ + + + - - can decrement (positives exceed negatives & zeroes)
+ + 0 0 - - optimal
+ 0 0 0 0 0 optimal
C C
Observe that if one of the central elements (marked C) is zero, the target must be optimal. In such a circumstance, at best any increment or decrement will not change the cost, but it may increase it. This result is important, because it gives us a trivial solution. We pick a target such that a[n/2] remains unchanged. There may be other possible targets that yield the same cost, but there are definitely none that are better. Here's the original code modified to calculate this cost:
//n is the number of elements in array a
int targetValue;
int cost = 0;
int middle = n / 2;
int startValue = a[middle] - middle;
for (i = 0; i < n; i++)
{
targetValue = startValue + i;
cost += abs(targetValue - a[i]);
}
printf("%d\n",cost);
You can not do it by iterating once on the array, that's for sure.
You need first to check the difference between each two numbers, for example:
2,7,8,9 can be 2,3,4,5 with 18 steps or 6,7,8,9 with 4 steps.
Create a new array with the difference like so: for 2,7,8,9 it wiil be 4,1,1. Now you can decide whether to increase or decrease the first number.
Lets assume that the contiguous array looks something like this -
c c+1 c+2 c+3 .. and so on
Now lets take an example -
5 7 8 10
The contiguous array in this case will be -
c c+1 c+2 c+3
In order to get the minimum steps, the sum of the modulus of the difference of the integers(before and after) w.r.t the ith index should be the minimum. In which case,
(c-5)^2 + (c-6)^2 + (c-6)^2 + (c-7)^2 should be minimum
Let f(c) = (c-5)^2 + (c-6)^2 + (c-6)^2 + (c-7)^2
= 4c^2 - 48c + 146
Applying differential calculus to get the minima,
f'(c) = 8c - 48 = 0
=> c = 6
So our contiguous array is 6 7 8 9 and the minimum cost here is 2.
To sum it up, just generate f(c), get the first differential and find out c.
This should take O(n).
Brute force approach O(N*M)
If one draws a line through each point in the array a then y0 is a value where each line starts at index 0. Then the answer is the minimum among number of steps reqired to get from a to every line that starts at y0, in Python:
y0s = set((y - i) for i, y in enumerate(a))
nsteps = min(sum(abs(y-(y0+i)) for i, y in enumerate(a))
for y0 in xrange(min(y0s), max(y0s)+1)))
Input
2,4,5,6
2,4,5,8
Output
1
3

Finding the minimum and maximm element from one of many arrays

I received a question during an Amazon interview and would like assistance with solving it.
Given N arrays of size K each, each of these K elements in the N arrays are sorted, and each of these N*K elements are unique. Choose a single element from each of the N arrays, from the chosen subset of N elements. Subtract the minimum and maximum element. This difference should be the least possible minimum.
Sample:
N=3, K=3
N=1 : 6, 16, 67
N=2 : 11,17,68
N=3 : 10, 15, 100
here if 16, 17, 15 are chosen, we get the minimum difference as
17-15=2.
I can think of O(N*K*N)(edited after correctly pointed out by zivo, not a good solution now :( ) solution.
1. Take N pointer initially pointing to initial element each of N arrays.
6, 16, 67
^
11,17,68
^
10, 15, 100
^
2. Find out the highest and lowest element among the current pointer O(k) (6 and 11) and find the difference between them.(5)
3. Increment the pointer which is pointing to lowest element by 1 in that array.
6, 16, 67
^
11,17,68
^
10, 15, 100 (difference:5)
^
4. Keep repeating step 2 and 3 and store the minimum difference.
6, 16, 67
^
11,17,68
^
10,15,100 (difference:5)
^
6, 16, 67
^
11,17,68
^
10,15,100 (difference:2)
^
Above will be the required solution.
6, 16, 67
^
11,17,68
^
10,15,100 (difference:84)
^
6, 16, 67
^
11,17,68
^
10,15,100 (difference:83)
^
And so on......
EDIT:
Its complexity can be reduced by using a heap (as suggested by Uri). I thought of it but faced a problem: Each time an element is extracted from heap, its array number has to be found out in order to increment the corresponding pointer for that array. An efficient way to find array number can definitely reduce the complexity to O(K*N log(K*N)). One naive way is to use a data structure like this
Struct
{
int element;
int arraynumer;
}
and reconstruct the initial data like
6|0,16|0,67|0
11|1,17|1,68|1
10|2,15|2,100|2
Initially keep the current max for first column and insert the pointed elements in heap. Now each time an element is extracted, its array number can be found out, pointer in that array is incremented , the newly pointed element can be compared to current max and max pointer can be adjusted accordingly.
So here is an algorithm to do solve this problem in two steps:
First step is to merge all your arrays into one sorted array which would look like this:
combined_val[] - which holds all numbers
combined_ind[] - which holds index of which array did this number originally belonged to
this step can be done easily in O(K*N*log(N)) but i think you can do better than that too (maybe not, you can lookup variants of merge sort because they do step similar to that)
Now second step:
it is easier to just put code instead of explaining so here is the pseduocode:
int count[N] = { 0 }
int head = 0;
int diffcnt = 0;
// mindiff is initialized to overall maximum value - overall minimum value
int mindiff = combined_val[N * K - 1] - combined_val[0];
for (int i = 0; i &lt N * K; i++)
{
count[combined_ind[i]]++;
if (count[combined_ind[i]] == 1) {
// diffcnt counts how many arrays have at least one element between
// indexes of "head" and "i". Once diffcnt reaches N it will stay N and
// not increase anymore
diffcnt++;
} else {
while (count[combined_ind[head]] > 1) {
// We try to move head index as forward as possible while keeping diffcnt constant.
// i.e. if count[combined_ind[head]] is 1, then if we would move head forward
// diffcnt would decrease, that is something we dont want to do.
count[combined_ind[head]]--;
head++;
}
}
if (diffcnt == N) {
// i.e. we got at least one element from all arrays
if (combined_val[i] - combined_val[head] &lt mindiff) {
mindiff = combined_val[i] - combined_val[head];
// if you want to save actual numbers too, you can save this (i.e. i and head
// and then extract data from that)
}
}
}
the result is in mindiff.
The runing time of second step is O(N * K). This is because "head" index will move only N*K times maximum. so the inner loop does not make this quadratic, it is still linear.
So total algorithm running time is O(N * K * log(N)), however this is because of merging step, if you can come up with better merging step you can probably bring it down to O(N * K).
This problem is for managers
You have 3 developers (N1), 3 testers (N2) and 3 DBAs (N3)
Choose the less divergent team that can run a project successfully.
int[n] result;// where result[i] keeps the element from bucket N_i
int[n] latest;//where latest[i] keeps the latest element visited from bucket N_i
Iterate elements in (N_1 + N_2 + N_3) in sorted order
{
Keep track of latest element visited from each bucket N_i by updating 'latest' array;
if boundary(latest) < boundary(result)
{
result = latest;
}
}
int boundary(int[] array)
{
return Max(array) - Min(array);
}
I've O(K*N*log(K)), with typical execution much less. Currently cannot think anything better. I'll explain first the easier to describe (somewhat longer execution):
For each element f in the first array (loop through K elements)
For each array, starting from the second array (loop through N-1 arrays)
Do a binary search on the array, and find element closest to f. This is your element (Log(K))
This algorithm can be optimized, if for each array, you add a new Floor Index. When performent the binary search, search between 'Floor' to 'K-1'.
Initially Floor index is 0, and for first element you search through the entire arrays. Once you find an element closest to 'f', update the Floor Index with the index of that element. Worse case is the same (Floor may not update, if maximum element of first array is smaller than any other minimum), but average case will improve.
Correctness proof for the accepted answer (Terminal's solution)
Assume that the algorithm finds a series A=<A[1],A[2],...,A[N]> which isn't the optimal solution (R).
Consider the index j in R, such that item R[j] is the first item among R that the algorithm examines and replaces it with the next item in its row.
Let A' denote the candidate solution at that phase (prior to the replacement). Since R[j]=A'[j] is the minimum value of A', it's also the minimum of R.
Now, consider the maximum value of R, R[m]. If A'[m]<R[m], then R can be improved by replacing R[m] with A'[m], which contradicts the fact that R is optimal. Therefore, A'[m]=R[m].
In other words, R and A' share the same maximum and minimum, therefore they are equivalent. This completes the proof: if R is an optimal solution, then the algorithm is guaranteed to find a solution as good as R.
for every element in 1st array
choose the element in 2nd array that is closest to the element in 1st array
current_array = 2;
do
{
choose the element in current_array+1 that is closest to the element in current_array
current_array++;
} while(current_array < n);
complexity: O(k^2*n)
Here is my logic on how to resolve this issue, keeping in mind that we need to pick one element from each of the N arrays (to compute the least minimum)
// if we take the above values as an example!
// then the idea would be to sort all three arrays while keeping another
// array to keep the reference to their sets (1 or 2 or 3, could be
// extended to n sets)
1 3 2 3 1 2 1 2 3 // this is the array that holds the set index
6 10 11 15 16 17 67 68 100 // this is the sorted combined array.
| |
5 2 33 // this is the computed least minimum,
// the rule is to make sure the indexes of the values
// we are comparing are different (to make sure we are
// comparing elements from different sets), then for example
// the first element of that example is index:1|value:6 we hold
// that value 6 (that is the value we will be using to compute the least minimum,
// then we go to the edge of the comparison which would be the second different index,
// we skip index:3|value:10 (we remove it from the array) we compare index:2|value:11
// to index:1|value:6 we obtain 5 which would go to a variable named leastMinimum = 5,
// now we remove the indexes and values we already used,
// and redo the same steps.
Step 1:
1 3 2 3 1 2 1 2 3
6 10 11 15 16 17 67 68 100
|
5
leastMinumum = 5
Step 2:
3 1 2 1 2 3
15 16 17 67 68 100
|
2
leastMinimum = min(2, leastMinumum) // which is equal 2
Step 3:
1 2 3
67 68 100
33
leastMinimum = min(33, leastMinumum) // which is equal to old leastMinumum which is 2
Now: We suppose we have elements from the same array that are very close to each other (k=2 this time which means we only have 3 sets with two values) :
// After sorting the n arrays we will have the below indexes array and values array
1 1 2 3 2 3
6 7 8 12 15 16
* * *
* we skip second index of 1|7 and we take the least minimum of 1|6 and 3|12 (index:2|value:8 will be removed as it is not at the edges, we pick the minimum and maximum of the unique index subset of n elements)
1 3
6 12
=6
* second step we remove the values we already used, so the array become like below:
1 2 3
7 15 16
* * *
7 - 16
= 9
Note:
Another approach that consumes more memory would consist of creating N sub-arrays from which we would be comparing the maximum - minumum
So from the below sorted values array and its corresponding indexes array we extract three other sub arrays:
1 3 2 3 1 2 1 2 3
6 10 11 15 16 17 67 68 100
First Array:
1 3 2
6 10 11
11-6 = 5
Second Array:
3 1 2
15 15 17
17-15 = 2
Third Array:
1 2 3
67 68 100
100 - 67 = 33

Resources