How to perform range updates in sqrt{n} time? - algorithm

I have an array and I have to perform query and updates on it.
For queries, I have to find frequency of a particular number in a range from l to r and for update, I have to add x from some range l to r.
How to perform this?
I thought of sqrt{n} optimization but I don't know how to perform range updates with this time complexity.
Edit - Since some people are asking for an example, here is one
Suppose the array is of size n = 8
and it is
1 3 3 4 5 1 2 3
And there are 3 queries to help everybody explain about what I am trying to say
Here they are
q 1 5 3 - This means that you have to find the frequency of 3 in range 1 to 5 which is 2 as 3 appears on 2nd and 3rd position.
second is update query and it goes like this - u 2 4 6 -> This means that you have to add 6 in the array from range 2 to 4. So the new array will become
1 9 9 10 5 1 2 3
And the last query is again the same as first one which will now return 0 as there is no 3 in the array from position 1 to 5 now.
I believe things must be more clear now. :)

I developed this algorithm long time (20+ years) ago for Arithmetic coder.
Both Update and Retrieve are performed in O(log(N)).
I named this algorithm "Method of Intervals". Let I show you the example.
Imagine, we have 8 intervals, with numbers 0-7:
+--0--+--1--+--2-+--3--+--4--+--5--+--6--+--7--+
Lets we create additional set of intervals, each spawns pair of original ones:
+----01-----+----23----+----45-----+----67-----+
Thereafter, we'll create the extra one layer of intervals, spawn pairs of 2nd:
+---------0123---------+---------4567----------+
And at last, we create single interval, covers all 8:
+------------------01234567--------------------+
As you see, in this structure, to retrieve right border of the interval [5], you needed just add together length of intervals [0123] + [45]. to retrieve left border of the interval [5], you needed sum of length the intervals [0123] + [4] (left border for 5 is right border for 4).
Of course, left border of the interval [0] is always = 0.
When you'll watch this proposed structure carefully, you will see, the odd elements in the each layers aren't needed. I say, you do not needed elements 1, 3, 5, 7, 23, 67, 4567, since these elements aren't used, during Retrieval or Update.
Lets we remove the odd elements and make following remuneration:
+--1--+--x--+--3-+--x--+--5--+--x--+--7--+--x--+
+-----2-----+-----x----+-----6-----+-----x-----+
+-----------4----------+-----------x-----------+
+----------------------8-----------------------+
As you see, with this remuneration, used the numbers [1-8]. Lets they will be array indexes. So, you see, there is used memory O(N).
To retrieve right border of the interval [7], you needed add length of the values with indexes 4,6,7. To update length of the interval [7], you needed add difference to all 3 of these values. As result, both Retrieval and Update are performed for Log(N) time.
Now is needed algorithm, how by the original interval number compute set of indexes in this data structure. For instance - how to convert:
1 -> 1
2 -> 2
3 -> 3,2
...
7 -> 7,6,4
This is easy, if we will see binary representation for these numbers:
1 -> 1
10 -> 10
11 -> 11,10
111 -> 111,110,100
As you see, in the each chain - next value is previous value, where rightmost "1" changed to "0". Using simple bit operation "x & (x - 1)", we can wtite a simple loop to iterate array indexes, related to the interval number:
int interval = 7;
do {
int index = interval;
do_something(index);
} while(interval &= interval - 1);

Related

Algorithms: Get final state of array after incrementing/decrementing elements based on neighbors

You're given an array of integers A. You keep doing iterations of the following until the array stops changing: if an element is larger than both of its adjacent neighbors, decrement it by 1. If an element is smaller than both of its adjacent neighbors, increment it by 1. Return the final state of the array (when it will not change any more). Note that the first and last elements do not have two neighbors, so they will never change.
Example: [1,2,7,4,6] -> [1,2,6,5,6] ->[1,2,5,6,6]
Example: [1,2,3,4] Does not change
Anyone have an idea of how to do this better than simulation? I feel like there should be an O(n) solution, but I can't think of it.
You can calculate teh differnce between all numbers. For your series:
1 , 2 , 7 , 4 , 6 you get
1 5 -3 2
you can conclude that a sign change from + to - means a decrease of the number and from - to + and increase
You also can conclude that 7 can be decreased min(abs(5), abs(-3)) = 3 times max before it "hits its boundry" what is 4. Problem is that 4 changes with the first iteration. This you can recognize by the 2 sequential sign changes. So when this is happening, your max before hitting the boudry for 7 becomes: min(abs(5), ceiling(abs(-3)/2)) = 2
as the max of hitting the boundy on 4 becomes min(ceiling(abs(-3)/2), abs(2)) = 2
With the knowledge above, you know you need to deduct: 7 - 2 = 5 and increase 4 + 2 = 6 to get your answer.

Data structure to handle numerous queries on large size array

Given q queries of the following form. A list is there.
1 x y: Add number x to the list y times.
2 n: find the nth number of the sorted list
constraints
1 <= q <= 5 * 100000
1 <= x, y <= 1000000000
1 <= n < length of list
sample.
input
4
1 3 6
1 5 2
2 7
2 4
output
5
3
This is a competitive programming problem that it's too early in the morning for me to solve right now, but I can try and give some pointers.
If you were to store the entire array explicitly, it would obviously blow out your memory. But you can exploit the structure of the array to instead store the number of times each entry appears in the array. So if you got the query
1 3 5
then instead of storing [3, 3, 3], you'd store the pair (3, 5), indicating that the number 3 is in the list 5 times.
You can pretty easily build this, perhaps as a vector of pairs of ints that you update.
The remaining task is to implement the 2 query, where you find an element by its index. A side effect of the structure we've chosen is that you can't directly index into that vector of pairs of ints, since the indices in that list don't match up with the indices into the hypothetical array. We could just add up the size of each entry in the vector from the start until we hit the index we want, but that's O(n^2) in the number of queries we've processed so far... likely too slow. Instead, we probably want some updatable data structure for prefix sums—perhaps as described in this answer.

Determining the Longest Continguous Subsequence

There are N nodes (1 <= N <= 100,000) various positions along a
long one-dimensional length. The ith node is at position x_i (an
integer in the range 0...1,000,000,000) and has a node type b_i(an integer in
the range 1..8). Nodes can not be in the same position
You want to get a range on this one-dimension in which all of the types of nodes are fairly represented. Therefore, you want to ensure that, for whatever types of nodes that are present in the range, there is an equal number of each node type (for example, a range with 27 each of types 1 and 3 is ok, a range with 27 of types 1, 3, and 4 is
ok, but 9 of type 1 and 10 of type 3 is not ok). You also want
at least K (K >= 2) types (out of the 8 total) to be represented in the
rand. Find the maximum size of this range that satisfies the constraints. The size of a photo is the difference between the maximum and minimum positions of the nodes in the photo.
If there are no ranges satisfying the constraints, output -1 instead.
INPUT:
* Line 1: N and K separated by a space
* Lines 2..N+1: Each line contains a description of a node as two
integers separated by a space; x(i) and its node type.
INPUT:
9 2
1 1
5 1
6 1
9 1
100 1
2 2
7 2
3 3
8 3
INPUT DETAILS:
Node types: 1 2 3 - 1 1 2 3 1 - ... - 1
Locations: 1 2 3 4 5 6 7 8 9 10 ... 99 100
OUTPUT:
* Line 1: A single integer indicating the maximum size of a fair
range. If no such range exists, output -1.
OUTPUT:
6
OUTPUT DETAILS:
The range from x = 2 to x = 8 has 2 each of types 1, 2, and 3. The range
from x = 9 to x = 100 has 2 of type 1, but this is invalid because K = 2
and so you need at least 2 distinct types of nodes.
Could You Please help in suggesting some algorithm to solve this. I have thought about using some sort of priority queue or stack data structure, but am really unsure how to proceed.
Thanks, Todd
It's not too difficult to invent almost linear-time algorithm because recently similar problem was discussed on CodeChef: "ABC-Strings".
Sort nodes by their positions.
Prepare all possible subsets of node types (for example, we could expect types 1,2,4,5,7 to be present in resulting interval and all other types not present there). For K=2 there may be only 256-8-1=247 subsets. For each subset perform remaining steps:
Initialize 8 type counters to [0,0,0,0,0,0,0,0].
For each node perform remaining steps:
Increment counter for current node type.
Take L counters for types included to current subset, subtract first of them from other L-1 counters, which produces L-1 values. Take remaining 8-L counters and combine them together with those L-1 values into a tuple of 7 values.
Use this tuple as a key for hash map. If hash map contains no value for this key, add a new entry with this key and value equal to the position of current node. Otherwise subtract value in the hash map from the position of current node and (possibly) update the best result.

minimum steps required to make array of integers contiguous

given a sorted array of distinct integers, what is the minimum number of steps required to make the integers contiguous? Here the condition is that: in a step , only one element can be changed and can be either increased or decreased by 1 . For example, if we have 2,4,5,6 then '2' can be made '3' thus making the elements contiguous(3,4,5,6) .Hence the minimum steps here is 1 . Similarly for the array: 2,4,5,8:
Step 1: '2' can be made '3'
Step 2: '8' can be made '7'
Step 3: '7' can be made '6'
Thus the sequence now is 3,4,5,6 and the number of steps is 3.
I tried as follows but am not sure if its correct?
//n is the number of elements in array a
int count=a[n-1]-a[0]-1;
for(i=1;i<=n-2;i++)
{
count--;
}
printf("%d\n",count);
Thanks.
The intuitive guess is that the "center" of the optimal sequence will be the arithmetic average, but this is not the case. Let's find the correct solution with some vector math:
Part 1: Assuming the first number is to be left alone (we'll deal with this assumption later), calculate the differences, so 1 12 3 14 5 16-1 2 3 4 5 6 would yield 0 -10 0 -10 0 -10.
sidenote: Notice that a "contiguous" array by your implied definition would be an increasing arithmetic sequence with difference 1. (Note that there are other reasonable interpretations of your question: some people may consider 5 4 3 2 1 to be contiguous, or 5 3 1 to be contiguous, or 1 2 3 2 3 to be contiguous. You also did not specify if negative numbers should be treated any differently.)
theorem: The contiguous numbers must lie between the minimum and maximum number. [proof left to reader]
Part 2: Now returning to our example, assuming we took the 30 steps (sum(abs(0 -10 0 -10 0 -10))=30) required to turn 1 12 3 14 5 16 into 1 2 3 4 5 6. This is one correct answer. But 0 -10 0 -10 0 -10+c is also an answer which yields an arithmetic sequence of difference 1, for any constant c. In order to minimize the number of "steps", we must pick an appropriate c. In this case, each time we increase or decrease c, we increase the number of steps by N=6 (the length of the vector). So for example if we wanted to turn our original sequence 1 12 3 14 5 16 into 3 4 5 6 7 8 (c=2), then the differences would have been 2 -8 2 -8 2 -8, and sum(abs(2 -8 2 -8 2 -8))=30.
Now this is very clear if you could picture it visually, but it's sort of hard to type out in text. First we took our difference vector. Imagine you drew it like so:
4|
3| *
2| * |
1| | | *
0+--+--+--+--+--*
-1| |
-2| *
We are free to "shift" this vector up and down by adding or subtracting 1 from everything. (This is equivalent to finding c.) We wish to find the shift which minimizes the number of | you see (the area between the curve and the x-axis). This is NOT the average (that would be minimizing the standard deviation or RMS error, not the absolute error). To find the minimizing c, let's think of this as a function and consider its derivative. If the differences are all far away from the x-axis (we're trying to make 101 112 103 114 105 116), it makes sense to just not add this extra stuff, so we shift the function down towards the x-axis. Each time we decrease c, we improve the solution by 6. Now suppose that one of the *s passes the x axis. Each time we decrease c, we improve the solution by 5-1=4 (we save 5 steps of work, but have to do 1 extra step of work for the * below the x-axis). Eventually when HALF the *s are past the x-axis, we can NO LONGER IMPROVE THE SOLUTION (derivative: 3-3=0). (In fact soon we begin to make the solution worse, and can never make it better again. Not only have we found the minimum of this function, but we can see it is a global minimum.)
Thus the solution is as follows: Pretend the first number is in place. Calculate the vector of differences. Minimize the sum of the absolute value of this vector; do this by finding the median OF THE DIFFERENCES and subtracting that off from the differences to obtain an improved differences-vector. The sum of the absolute value of the "improved" vector is your answer. This is O(N) The solutions of equal optimality will (as per the above) always be "adjacent". A unique solution exists only if there are an odd number of numbers; otherwise if there are an even number of numbers, AND the median-of-differences is not an integer, the equally-optimal solutions will have difference-vectors with corrective factors of any number between the two medians.
So I guess this wouldn't be complete without a final example.
input: 2 3 4 10 14 14 15 100
difference vector: 2 3 4 5 6 7 8 9-2 3 4 10 14 14 15 100 = 0 0 0 -5 -8 -7 -7 -91
note that the medians of the difference-vector are not in the middle anymore, we need to perform an O(N) median-finding algorithm to extract them...
medians of difference-vector are -5 and -7
let us take -5 to be our correction factor (any number between the medians, such as -6 or -7, would also be a valid choice)
thus our new goal is 2 3 4 5 6 7 8 9+5=7 8 9 10 11 12 13 14, and the new differences are 5 5 5 0 -3 -2 -2 -86*
this means we will need to do 5+5+5+0+3+2+2+86=108 steps
*(we obtain this by repeating step 2 with our new target, or by adding 5 to each number of the previous difference... but since you only care about the sum, we'd just add 8*5 (vector length times correct factor) to the previously calculated sum)
Alternatively, we could have also taken -6 or -7 to be our correction factor. Let's say we took -7...
then the new goal would have been 2 3 4 5 6 7 8 9+7=9 10 11 12 13 14 15 16, and the new differences would have been 7 7 7 2 1 0 0 -84
this would have meant we'd need to do 7+7+7+2+1+0+0+84=108 steps, the same as above
If you simulate this yourself, can see the number of steps becomes >108 as we take offsets further away from the range [-5,-7].
Pseudocode:
def minSteps(array A of size N):
A' = [0,1,...,N-1]
diffs = A'-A
medianOfDiffs = leftMedian(diffs)
return sum(abs(diffs-medianOfDiffs))
Python:
leftMedian = lambda x:sorted(x)[len(x)//2]
def minSteps(array):
target = range(len(array))
diffs = [t-a for t,a in zip(target,array)]
medianOfDiffs = leftMedian(diffs)
return sum(abs(d-medianOfDiffs) for d in diffs)
edit:
It turns out that for arrays of distinct integers, this is equivalent to a simpler solution: picking one of the (up to 2) medians, assuming it doesn't move, and moving other numbers accordingly. This simpler method often gives incorrect answers if you have any duplicates, but the OP didn't ask that, so that would be a simpler and more elegant solution. Additionally we can use the proof I've given in this solution to justify the "assume the median doesn't move" solution as follows: the corrective factor will always be in the center of the array (i.e. the median of the differences will be from the median of the numbers). Thus any restriction which also guarantees this can be used to create variations of this brainteaser.
Get one of the medians of all the numbers. As the numbers are already sorted, this shouldn't be a big deal. Assume that median does not move. Then compute the total cost of moving all the numbers accordingly. This should give the answer.
community edit:
def minSteps(a):
"""INPUT: list of sorted unique integers"""
oneMedian = a[floor(n/2)]
aTarget = [oneMedian + (i-floor(n/2)) for i in range(len(a))]
# aTargets looks roughly like [m-n/2?, ..., m-1, m, m+1, ..., m+n/2]
return sum(abs(aTarget[i]-a[i]) for i in range(len(a)))
This is probably not an ideal solution, but a first idea.
Given a sorted sequence [x1, x2, …, xn]:
Write a function that returns the differences of an element to the previous and to the next element, i.e. (xn – xn–1, xn+1 – xn).
If the difference to the previous element is > 1, you would have to increase all previous elements by xn – xn–1 – 1. That is, the number of necessary steps would increase by the number of previous elements × (xn – xn–1 – 1). Let's call this number a.
If the difference to the next element is >1, you would have to decrease all subsequent elements by xn+1 – xn – 1. That is, the number of necessary steps would increase by the number of subsequent elements × (xn+1 – xn – 1). Let's call this number b.
If a < b, then increase all previous elements until they are contiguous to the current element. If a > b, then decrease all subsequent elements until they are contiguous to the current element. If a = b, it doesn't matter which of these two actions is chosen.
Add up the number of steps taken in the previous step (by increasing the total number of necessary steps by either a or b), and repeat until all elements are contiguous.
First of all, imagine that we pick an arbitrary target of contiguous increasing values and then calculate the cost (number of steps required) for modifying the array the array to match.
Original: 3 5 7 8 10 16
Target: 4 5 6 7 8 9
Difference: +1 0 -1 -1 -2 -7 -> Cost = 12
Sign: + 0 - - - -
Because the input array is already ordered and distinct, it is strictly increasing. Because of this, it can be shown that the differences will always be non-increasing.
If we change the target by increasing it by 1, the cost will change. Each position in which the difference is currently positive or zero will incur an increase in cost by 1. Each position in which the difference is currently negative will yield a decrease in cost by 1:
Original: 3 5 7 8 10 16
New target: 5 6 7 8 9 10
New Difference: +2 +1 0 0 -1 -6 -> Cost = 10 (decrease by 2)
Conversely, if we decrease the target by 1, each position in which the difference is currently positive will yield a decrease in cost by 1, while each position in which the difference is zero or negative will incur an increase in cost by 1:
Original: 3 5 7 8 10 16
New target: 3 4 5 6 7 8
New Difference: 0 -1 -2 -2 -3 -8 -> Cost = 16 (increase by 4)
In order to find the optimal values for the target array, we must find a target such that any change (increment or decrement) will not decrease the cost. Note that an increment of the target can only decrease the cost when there are more positions with negative difference than there are with zero or positive difference. A decrement can only decrease the cost when there are more positions with a positive difference than with a zero or negative difference.
Here are some example distributions of difference signs. Remember that the differences array is non-increasing, so positives always have to be first and negatives last:
C C
+ + + - - - optimal
+ + 0 - - - optimal
0 0 0 - - - optimal
+ 0 - - - - can increment (negatives exceed positives & zeroes)
+ + + 0 0 0 optimal
+ + + + - - can decrement (positives exceed negatives & zeroes)
+ + 0 0 - - optimal
+ 0 0 0 0 0 optimal
C C
Observe that if one of the central elements (marked C) is zero, the target must be optimal. In such a circumstance, at best any increment or decrement will not change the cost, but it may increase it. This result is important, because it gives us a trivial solution. We pick a target such that a[n/2] remains unchanged. There may be other possible targets that yield the same cost, but there are definitely none that are better. Here's the original code modified to calculate this cost:
//n is the number of elements in array a
int targetValue;
int cost = 0;
int middle = n / 2;
int startValue = a[middle] - middle;
for (i = 0; i < n; i++)
{
targetValue = startValue + i;
cost += abs(targetValue - a[i]);
}
printf("%d\n",cost);
You can not do it by iterating once on the array, that's for sure.
You need first to check the difference between each two numbers, for example:
2,7,8,9 can be 2,3,4,5 with 18 steps or 6,7,8,9 with 4 steps.
Create a new array with the difference like so: for 2,7,8,9 it wiil be 4,1,1. Now you can decide whether to increase or decrease the first number.
Lets assume that the contiguous array looks something like this -
c c+1 c+2 c+3 .. and so on
Now lets take an example -
5 7 8 10
The contiguous array in this case will be -
c c+1 c+2 c+3
In order to get the minimum steps, the sum of the modulus of the difference of the integers(before and after) w.r.t the ith index should be the minimum. In which case,
(c-5)^2 + (c-6)^2 + (c-6)^2 + (c-7)^2 should be minimum
Let f(c) = (c-5)^2 + (c-6)^2 + (c-6)^2 + (c-7)^2
= 4c^2 - 48c + 146
Applying differential calculus to get the minima,
f'(c) = 8c - 48 = 0
=> c = 6
So our contiguous array is 6 7 8 9 and the minimum cost here is 2.
To sum it up, just generate f(c), get the first differential and find out c.
This should take O(n).
Brute force approach O(N*M)
If one draws a line through each point in the array a then y0 is a value where each line starts at index 0. Then the answer is the minimum among number of steps reqired to get from a to every line that starts at y0, in Python:
y0s = set((y - i) for i, y in enumerate(a))
nsteps = min(sum(abs(y-(y0+i)) for i, y in enumerate(a))
for y0 in xrange(min(y0s), max(y0s)+1)))
Input
2,4,5,6
2,4,5,8
Output
1
3

How to display unique numbers with their frequencies as occurring in a matrix?

I have a matrice with some number:
1 2 3 6
6 7 2 1
1 4 5 6
And the program should display all different number with own frequency for example:
1 -> 3
2 -> 2
3 -> 1
4 -> 1
5 -> 1
6 -> 3
7 -> 1
Please help me
You probably mean
1->3
Create vector (array), filled with zeros, that have size of max value in matrice (like [0..9]), travell by whole matrice and with every step increment index of vector that equals actual number.
This is soluction for short range values in matrice. If you excpect some big values, use joined list insted of vector, or matrice like this for counting:
1 0
5 0
15 0
142 0
2412 0
And increment values in second column and expand this matrice rows every time you find a new number.
Using pointers this problem reduces from matrix to a single dimensional array. Maintain a 1D array whose size is equal to the total no. of elements in the matrix, say it COUNT. Initialize it with zero. Now start with first element of the matrix and compare it with all the other elements. If we use pointers this problem transforms into traversing a 1D array and finding the no of occurrences of each element. For traversing all you have to do is just increment the pointer. While comparing when you encounter the same number just shift forward all the consecutive numbers one place ahead. For example, if 0th element is 1 and you again found 1 on 4th index, then shift forward element on 5th index to 4th, 6th to 5th and so on till the last element. This way the duplicate entry at 4th index is lost. Now decrease the count of total no of elements in the matrix by 1 and increase the corresponding entry in array COUNT by 1. Continuing this way till the last element we get a matrix with distinct nos. and their corresponding frequency in array COUNT.
This implementation is very effective for languages which support pointers.
Here's an example of how it could be done in Python.
The dict is of this format: {key:value, key2:value2}. So you can use that so you have something like {'2':3} so it'll tell you what number has how many occurances. (I'm not assuming you're going to use Python. It's just so you understand the code... maybe)
matrix = [[1,5,6],
[2,6,3],
[5,3,9]]
dict = {}
for row in matrix:
for column in row:
if str(column) in dict.keys():
dict[str(column)] += 1
else:
dict[str(column)] = 1
for key in sorted(dict.keys()):
print key, '->', dict[key]
I hope you can figure out what this does. This codepad shows the output and nice syntax hightlighting.
(I don't get why SO isn't aligning the code properly... it's monospaced but not aligned :S ... turns out it's because I was using IE6 (It's the only browser at work :-(

Resources