I have read through some tutorials about two common data structures which can achieve range update and query in O(lg N): the Segment Tree and the Binary Indexed Tree (BIT / Fenwick Tree).
Most of the examples I have found are about associative and commutative operations, like "sum of the integers in a range", "XOR of the integers in a range", etc.
I wonder whether these two data structures (or any other data structure / algorithm, please propose) can achieve the query below in O(lg N). (If not, how about O(sqrt N)?)
Given an array of integers A, query the number of distinct integers in a range [l, r].
PS: Assume the number of possible integer values is ~10^5, so used[color] = true or a bitmask is not feasible.
For example: A = [1,2,3,2,4,3,1], query([2,5]) = 3, where the range indices are 0-based (the values in that range are 3, 2, 4, 3, i.e. three distinct values).
Yes, this is possible to do in O(log n), even if you must answer the queries online. However, it requires some rather complex techniques.
First, let's solve the following problem: given an array, answer queries of the form "how many numbers <= x are there within indices [l, r]?". This is done with a segment-tree-like structure sometimes called a Merge Sort Tree. It is basically a segment tree where each node stores a sorted copy of its subarray. This structure requires O(n log n) memory (there are log n layers, and each of them stores n numbers in total). It is built in O(n log n) as well: just go bottom-up and, for each inner node, merge the sorted lists of its children.
Here is an example. Say 1 5 2 6 8 4 7 1 is the original array.
|1 1 2 4 5 6 7 8|
|1 2 5 6|1 4 7 8|
|1 5|2 6|4 8|1 7|
|1|5|2|6|8|4|7|1|
Now you can answer those queries in O(log^2 n) time: just make a regular segment tree query (traversing O(log n) nodes) and do a binary search in each visited node to count how many of its numbers are <= x (an additional O(log n) factor from there).
This can be sped up to O(log n) using the Fractional Cascading technique, which basically lets you do the binary search only in the root rather than in every node. However, it is too complex to describe in this post.
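For concreteness, here is a minimal Python sketch of such a Merge Sort Tree (the function names and the 4n array layout are my own choices, not part of the original description):

    import bisect

    # tree[node] holds the sorted list of values in that node's segment
    def build(a):
        n = len(a)
        tree = [[] for _ in range(4 * n)]
        def go(node, lo, hi):
            if lo == hi:
                tree[node] = [a[lo]]
                return
            mid = (lo + hi) // 2
            go(2 * node, lo, mid)
            go(2 * node + 1, mid + 1, hi)
            # sorted() on the concatenation keeps the sketch short; a linear
            # merge of the two sorted children gives the true O(n log n) build
            tree[node] = sorted(tree[2 * node] + tree[2 * node + 1])
        go(1, 0, n - 1)
        return tree

    # "how many numbers <= x within indices [l, r]": O(log n) nodes visited,
    # one binary search (bisect) per node, hence O(log^2 n) per query
    def count_leq(tree, n, l, r, x):
        def go(node, lo, hi):
            if r < lo or hi < l:
                return 0
            if l <= lo and hi <= r:
                return bisect.bisect_right(tree[node], x)
            mid = (lo + hi) // 2
            return go(2 * node, lo, mid) + go(2 * node + 1, mid + 1, hi)
        return go(1, 0, n - 1)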
Now we return to the original problem. Assume you have an array a_1, ..., a_n. Build another array b_1, ..., b_n, where b_i is the index of the next occurrence of a_i in the array, or ∞ if a_i has no later occurrence.
Example (1-indexed):
a = 1 3 1 2 2 1 4 1
b = 3 ∞ 6 5 ∞ 8 ∞ ∞
Now let's count the distinct numbers in [l, r]. For each distinct number we count only its last occurrence within the segment. With the b_i notation you can see that an occurrence at index i is the last one inside the segment if and only if b_i > r. So the problem boils down to "how many values b_i > r are there among indices [l, r]?", which trivially reduces to the problem described above (the segment length minus the count of b_i <= r).
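A sketch of this reduction in Python, reusing build and count_leq from the Merge Sort Tree sketch above (0-indexed; helper names are mine; build b_tree as build(build_next(a))):

    INF = float('inf')

    # b[i] = index of the next occurrence of a[i], or INF if there is none
    def build_next(a):
        nxt = {}                      # value -> index of its next occurrence
        b = [INF] * len(a)
        for i in range(len(a) - 1, -1, -1):
            if a[i] in nxt:
                b[i] = nxt[a[i]]
            nxt[a[i]] = i
        return b

    # distinct values in a[l..r] = count of indices i in [l, r] with b[i] > r
    def distinct_in_range(b_tree, n, l, r):
        return (r - l + 1) - count_leq(b_tree, n, l, r, r)

For the question's example a = [1,2,3,2,4,3,1] (0-indexed), b = [6, 3, 5, INF, INF, INF, INF], and distinct_in_range over [2, 5] counts the three entries greater than 5 (at indices 3, 4, 5), giving 3.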
Hope it helps.
If you're willing to answer queries offline, then plain old Segment Trees / BIT can still help:
Sort the queries by their r values.
Build a segment tree for range sum queries over [0, n].
For each value in the input array, from left to right:
Increment by 1 at the current index i in the segment tree.
If the current element has been seen before, decrement by 1 in the segment tree at its previous position.
Answer all queries ending at the current index i by querying for the sum in the range [l, r == i].
The idea, in short, is to keep only the latest occurrence of each element marked as we sweep rightward, resetting previous occurrences back to 0. The sum over a range then gives the count of distinct elements in it, as sketched below.
Overall time complexity is again O(n log n).
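A minimal Python sketch of this sweep, with a Fenwick tree (BIT) standing in for the segment tree (class and helper names are mine):

    class BIT:
        def __init__(self, n):
            self.n = n
            self.t = [0] * (n + 1)
        def add(self, i, d):              # point update at 0-based index i
            i += 1
            while i <= self.n:
                self.t[i] += d
                i += i & -i
        def prefix(self, i):              # sum of positions [0, i]
            i += 1
            s = 0
            while i > 0:
                s += self.t[i]
                i -= i & -i
            return s

    def distinct_counts(a, queries):      # queries: (l, r) pairs, 0-based inclusive
        bit = BIT(len(a))
        order = sorted(range(len(queries)), key=lambda q: queries[q][1])
        last = {}                          # last seen position of each value
        ans = [0] * len(queries)
        qi = 0
        for i, x in enumerate(a):
            if x in last:
                bit.add(last[x], -1)       # unmark the previous occurrence
            bit.add(i, 1)                  # mark the latest occurrence
            last[x] = i
            while qi < len(order) and queries[order[qi]][1] == i:
                l, r = queries[order[qi]]
                ans[order[qi]] = bit.prefix(r) - (bit.prefix(l - 1) if l else 0)
                qi += 1
        return ans

For the question's example, distinct_counts([1,2,3,2,4,3,1], [(2, 5)]) returns [3].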
There is a well-known offline method to solve this problem. If you have an array of size n and q queries on it, each asking for the count of distinct numbers in a range, then you can solve the whole thing in O(n log n + q log n) time, which amounts to answering every query in O(log n) time.
Let's solve the problem using the RSQ (range sum query) technique. For RSQ you can use a segment tree or a BIT; let's discuss the segment tree version.
For solving this problem you need an offline technique and a segment tree. Now, what is an offline technique? It means you read in all the queries first, reorder them in a way that lets you answer them easily and correctly, and finally output the answers in the original input order.
Solution idea:
1. Read the n given numbers into an array, say array[], and read the q queries into a vector v, where every element of v holds three fields: l, r, idx. Here l is the start point of a query, r is its endpoint, and idx is the query's position in the input order (so the answers can be printed in that order at the end).
2. Sort the vector v by the endpoint r.
3. Take a segment tree which can store the information of at least 10^5 elements, and an array last[100005] which stores the last seen position of each value of array[]. Initially, all elements of the tree are zero and all elements of last[] are -1.
4. Now run a loop over array[]. Inside the loop, check the following for every index i: is last[array[i]] equal to -1? If it is, set last[array[i]] = i and call the update() function of the segment tree, which adds +1 at position i. If last[array[i]] is not -1, first call update() to add -1 at position last[array[i]]; then store the current position as the last position for the future, i.e. set last[array[i]] = i, and call update() to add +1 at position i.
5. Now check whether a query finishes at the current index, i.e. whether v[current].r == i. If so, call the query() function of the segment tree, which returns the sum over the range v[current].l to v[current].r, store the result at index v[current].idx of the answer[] array, and increment current. Repeat while further queries end at i.
6. Print the answer[] array, which contains the final answers in the given input order.
The complexity of the algorithm is O((n + q) log n).
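A sketch of the update()/query() pair the steps refer to, as a compact iterative point-update / range-sum segment tree in Python (this particular layout is my choice; any point-update / range-sum tree works):

    class SegTree:
        def __init__(self, n):
            self.n = n
            self.t = [0] * (2 * n)
        def update(self, i, delta):        # add delta at position i
            i += self.n
            while i > 0:                   # update the leaf and all its ancestors
                self.t[i] += delta
                i //= 2
        def query(self, l, r):             # sum over [l, r], inclusive
            s = 0
            l += self.n
            r += self.n + 1                # work with half-open [l, r)
            while l < r:
                if l & 1:
                    s += self.t[l]
                    l += 1
                if r & 1:
                    r -= 1
                    s += self.t[r]
                l //= 2
                r //= 2
            return s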
The given problem can also be solved using Mo's (offline) algorithm, which is based on square root decomposition.
Overall time complexity is O(N*SQRT(N)).
Refer to mos-algorithm for a detailed explanation; it even has a complexity analysis and a SPOJ problem that can be solved with this approach.
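A minimal Python sketch of Mo's algorithm for these distinct-count queries (function names are mine; the pointer discipline is the standard one):

    from math import isqrt

    def mos_distinct(a, queries):          # queries: (l, r) pairs, 0-based inclusive
        block = max(1, isqrt(len(a)))
        order = sorted(range(len(queries)),
                       key=lambda q: (queries[q][0] // block, queries[q][1]))
        cnt = {}
        distinct = 0
        def add(x):
            nonlocal distinct
            cnt[x] = cnt.get(x, 0) + 1
            if cnt[x] == 1:
                distinct += 1
        def remove(x):
            nonlocal distinct
            cnt[x] -= 1
            if cnt[x] == 0:
                distinct -= 1
        ans = [0] * len(queries)
        cl, cr = 0, -1                     # current window [cl, cr] starts empty
        for qi in order:
            l, r = queries[qi]
            while cr < r: cr += 1; add(a[cr])
            while cl > l: cl -= 1; add(a[cl])
            while cr > r: remove(a[cr]); cr -= 1
            while cl < l: remove(a[cl]); cl += 1
            ans[qi] = distinct
        return ans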
kd-trees provide range queries in O(log n), where n is the number of points.
If you want faster queries than a kd-tree, and you are willing to pay the memory cost, then range trees are your friends, offering a query time of:
O(log^d n + k)
where n is the number of points stored in the tree, d is the dimension of each point, and k is the number of points reported by a given query.
Bentley is an important name when it comes to this field. :)
The problem link is: http://www.spoj.com/problems/ORDERSET/en/
In this problem, you have to maintain a dynamic set of numbers which supports the two fundamental operations
INSERT(S,x): if x is not in S, insert x into S
DELETE(S,x): if x is in S, delete x from S
and the two type of queries
K-TH(S) : return the k-th smallest element of S
COUNT(S,x): return the number of elements of S smaller than x
Input
Line 1: Q (1 ≤ Q ≤ 200000), the number of operations
In the next Q lines, the first token of each line is a character I, D, K or C, meaning that the corresponding operation is INSERT, DELETE, K-TH or COUNT, respectively, followed by a whitespace and an integer which is the parameter for that operation.
If the parameter is a value x, it is guaranteed that 0 ≤ |x| ≤ 10^9. If the parameter is an index k, it is guaranteed that 1 ≤ k ≤ 10^9.
Output
For each query, print the corresponding result in a single line. In particular, for the queries K-TH, if k is larger than the number of elements in S, print the word 'invalid'.
I thought of using a set here, since insertion and deletion in a set can be done in logarithmic time. However, I am not sure whether a set is the ideal data structure for finding the k-th element and the number of elements less than a given value. What other data structure can I use to make this optimal? Thanks!
A set is an abstract data type, not a data structure. A hash table is a data structure that can be used to implement a set, as is a binary tree.
Speaking of binary trees, they seem well suited to your needs here:
If balanced: fast INSERT and DELETE.
If you keep track of subtree counts: K-TH and COUNT should be very fast too.
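A Python sketch of the subtree-count idea, using an unbalanced BST for brevity (a real solution would use a balanced tree for the guaranteed O(log n) bounds; DELETE is omitted but maintains sizes the same way):

    class Node:
        def __init__(self, key):
            self.key = key
            self.left = self.right = None
            self.size = 1                  # number of nodes in this subtree

    def size(t):
        return t.size if t else 0

    def insert(t, x):                      # INSERT(S, x); no-op if x already in S
        if t is None:
            return Node(x)
        if x < t.key:
            t.left = insert(t.left, x)
        elif x > t.key:
            t.right = insert(t.right, x)
        t.size = 1 + size(t.left) + size(t.right)
        return t

    def count_less(t, x):                  # COUNT(S, x): elements smaller than x
        if t is None:
            return 0
        if x <= t.key:
            return count_less(t.left, x)
        return size(t.left) + 1 + count_less(t.right, x)

    def kth(t, k):                         # K-TH(S): k-th smallest, 1-based
        if t is None or k > size(t):
            return None                    # caller prints 'invalid'
        if k <= size(t.left):
            return kth(t.left, k)
        if k == size(t.left) + 1:
            return t.key
        return kth(t.right, k - size(t.left) - 1)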
I tried to find a solution to this but couldn't get much out of my head.
We are given two unsorted integer arrays A and B. We have to check whether array B is a permutation of A. How can this be done? Even XORing the numbers won't work, as there are counterexamples with the same XOR value that are not permutations of each other.
The solution needs to be O(n) time and O(1) space.
Any help is welcome!!
Thanks.
The question is theoretical, but you can do it in O(n) time and O(1) space. Allocate an array of 2^32 counters and set them all to zero. This is an O(1) step because the array has constant size. Then iterate through the two arrays. For array A, increment the counters corresponding to the integers read; for array B, decrement them. If you run into a negative counter value during the iteration of array B, stop: the arrays are not permutations of each other. Otherwise at the end (assuming A and B have the same size, a prerequisite) the counter array is all zero and the two arrays are permutations of each other.
This is an O(1)-space, O(n)-time solution. It is not practical, but it would easily pass as a solution to the interview question. At least it should.
More obscure solutions
Using a nondeterministic model of computation, checking that the two arrays are not permutations of each other can be done in O(1) space and O(n) time by guessing an element that has differing counts in the two arrays, and then counting the instances of that element in both arrays.
In a randomized model of computation, construct a random commutative hash function and calculate the hash values of the two arrays. If the hash values differ, the arrays are not permutations of each other; otherwise they might be. Repeat many times to bring the probability of error below the desired threshold. This is also an O(1)-space, O(n)-time approach, but randomized.
In a parallel computation model, let n be the size of the input array. Allocate n threads. Every thread i = 1 .. n reads the i-th number of the first array; let that be x. Then the same thread counts the number of occurrences of x in the first array, and then checks for the same count in the second array. Every single thread uses O(1) space and O(n) time.
Interpret an integer array [a_1, ..., a_n] as the polynomial x^a_1 + x^a_2 + ... + x^a_n, where x is a free variable, and check numerically for the equivalence of the two polynomials obtained. Use floating-point arithmetic for an O(1)-space, O(n)-time operation. This is not an exact method, because of rounding errors and because the numerical equivalence check is probabilistic. Alternatively, interpret the polynomial over the integers modulo a prime number and perform the same probabilistic check.
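A Python sketch of the last (modular) variant, evaluating both polynomials at a random point modulo a fixed prime (the choice of prime and the nonnegative-values assumption are mine):

    import random

    P = (1 << 61) - 1                  # a Mersenne prime, my choice of modulus

    def maybe_permutations(a, b):
        if len(a) != len(b):
            return False
        x = random.randrange(2, P)     # random evaluation point
        ha = sum(pow(x, v, P) for v in a) % P   # assumes nonnegative values v
        hb = sum(pow(x, v, P) for v in b) % P
        return ha == hb                # unequal: definitely not permutations;
                                       # equal: permutations with high probability

Repeating the check with fresh random points drives the error probability down exponentially.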
If we are allowed free access to a large list of primes, you can solve this problem by leveraging properties of prime factorization.
For both arrays, calculate the product of Prime[i] over each integer i in the array, where Prime[i] is the i-th prime number. The products of the two arrays are equal iff the arrays are permutations of one another.
Prime factorization helps here for two reasons.
Multiplication is commutative and associative, so the ordering of the operands when calculating the product is irrelevant. (Some have alluded to the fact that if the arrays were sorted, this problem would be trivial. By multiplying, we are implicitly sorting.)
Prime numbers multiply losslessly. If we are given a number and told it is a product of prime numbers only, we can calculate exactly which prime numbers were multiplied into it and exactly how many times each.
Example:
a = 1,1,3,4
b = 4,1,3,1
Product of ith primes in a = 2 * 2 * 5 * 7 = 140
Product of ith primes in b = 7 * 2 * 5 * 2 = 140
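A Python sketch of this check (first_n_primes stands in for the assumed list of primes, and array values are assumed to be positive integers):

    def first_n_primes(n):
        # simple trial-division generator of the first n primes
        primes = []
        candidate = 2
        while len(primes) < n:
            if all(candidate % p for p in primes if p * p <= candidate):
                primes.append(candidate)
            candidate += 1
        return primes

    def is_permutation_by_primes(a, b):
        if len(a) != len(b):
            return False
        primes = first_n_primes(max(max(a), max(b)))
        pa = pb = 1
        for v in a: pa *= primes[v - 1]    # Prime[v], 1-based as in the example
        for v in b: pb *= primes[v - 1]
        return pa == pb

For the example above, both products come out as 2 * 2 * 5 * 7 = 140.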
That said, we probably aren't allowed access to a list of primes, but this seems a good solution otherwise, so I thought I'd post it.
I apologize for posting this as an answer, as it should really be a comment on antti.huima's answer, but I don't have the reputation yet to comment.
The size of each counter seems to be O(log n), as it depends on the number of instances of a given value in the input array.
For example, let the input array A be all 1's with a length of 2^32 + 1. This requires a counter of 33 bits to encode (which, in practice, would double the size of the array, but let's stay with theory), and every further doubling of A (still all 1 values) adds another bit to each counter, and so on.
This is a very nit-picky argument, but these interview questions tend to be very nit-picky.
If we need not sort this in place, then the following approach might work:
Create a HashMap with the array element as key and its number of occurrences as value (to handle multiple occurrences of the same number).
Traverse array A.
Insert the array elements into the HashMap.
Next, traverse array B.
Search for every element of B in the HashMap. If the corresponding value is 1, delete the entry; else, decrement the value by 1.
If we are able to process the entire array B and the HashMap is empty at that point: success. Else: failure.
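A Python sketch of these steps, with collections.Counter playing the role of the HashMap:

    from collections import Counter

    def is_permutation_by_hashing(a, b):
        counts = Counter(a)               # traverse A, count occurrences
        for x in b:                       # traverse B, consume the counts
            if counts.get(x, 0) == 0:
                return False              # x occurs more often in B than in A
            if counts[x] == 1:
                del counts[x]             # value is 1: delete the entry
            else:
                counts[x] -= 1            # else decrement
        return not counts                 # success iff the map ended up empty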
HashMap will use constant space and you will traverse each array only once.
Not sure if this is what you are looking for. Let me know if I have missed any constraint about space/time.
You're given two constraints: computational O(n), where n means the total length of both A and B, and memory O(1).
If two sequences A and B are permutations of each other, then there's also a sequence C resulting from a permutation of either A or B. So the problem is permuting both A and B into sequences C_A and C_B and comparing them.
One such permutation is sorting. There are several sorting algorithms which work in place, so you can sort A and B in place. Now, in the best case smoothsort sorts with O(n) computational and O(1) memory complexity; in the worst case with O(n log n) / O(1).
The per-element comparison then happens in O(n), but since in O notation O(2*n) = O(n), using smoothsort plus a comparison gives you an O(n) / O(1) check of whether two sequences are permutations of each other. However, in the worst case it will be O(n log n) / O(1).
The solution needs to be O(n) time and O(1) space.
This rules out sorting, and the O(1) space requirement is a hint that you should probably compute a hash of each string and compare them.
If you have access to a list of primes, do as cheeken's solution.
Note: if the interviewer says you don't have access to a list of primes, then generate the prime numbers and store them. This is O(1) because the alphabet length is a constant.
Otherwise, here's my alternative idea. I will define the alphabet as {a, b, c, d, e} for simplicity.
The values for the letters are defined as:
a, b, c, d, e
1, 2, 4, 8, 16
Note: if the interviewer says this is not allowed, then make a lookup table for the alphabet; this takes O(1) space because the size of the alphabet is a constant.
Define a function which can find the distinct letters in a string.
// set bit value of char c in variable i and return result
distinct(char c, int i) : int
E.g. distinct('a', 0) returns 1
E.g. distinct('a', 1) returns 1
E.g. distinct('b', 1) returns 3
Thus if you iterate the string "aab" the distinct function should give 3 as the result
Define a function which can calculate the sum of the letters in a string.
// return sum of c and i
sum(char c, int i) : int
E.g. sum('a', 0) returns 1
E.g. sum('a', 1) returns 2
E.g. sum('b', 2) returns 4
Thus if you iterate the string "aab" the sum function should give 4 as the result
Define a function which can calculate the length of the letters in a string.
// return length of string s
length(string s) : int
E.g. length("aab") returns 3
Running these functions on two strings and comparing the results takes O(n) running time. Storing the hash values takes O(1) space.
e.g.
distinct of "aab" => 3
distinct of "aba" => 3
sum of "aab => 4
sum of "aba => 4
length of "aab => 3
length of "aba => 3
Since all the values are equal for both strings, they must be a permutation of each other.
EDIT: The solution is not correct with the given alphabet values, as pointed out in the comments.
You can convert one of the two arrays into an in-place hash table. This will not be exactly O(N), but it will come close in non-pathological cases.
Just use [number % N] as the element's desired index, or in the chain that starts there. If any element has to be displaced, it can be placed at the index where the offending element started. Rinse, wash, repeat.
UPDATE:
This is a similar (N=M) hash table. It did use chaining, but it could be downgraded to open addressing.
I'd use a randomized algorithm that has a low chance of error.
The key is to use a universal hash function.
    import random

    def pick_random_universal_hash_func():
        # a simple universal family: random multiplier/offset modulo a fixed prime
        p = (1 << 61) - 1
        a, b = random.randrange(1, p), random.randrange(p)
        return lambda x: (a * x + b) % p

    def hash_array(array, hash_fn):
        cur = 0
        for item in array:
            cur ^= hash_fn(item)       # XOR-combine the per-item hashes
        return cur

    def are_perm(a1, a2):
        hash_fn = pick_random_universal_hash_func()
        return hash_array(a1, hash_fn) == hash_array(a2, hash_fn)
If the arrays are permutations, the check will always say so. If they are different, the algorithm might incorrectly say they are the same, but it will do so with very low probability. Further, you can get an exponential decrease in the chance of error for a linear amount of work by asking many are_perm() questions on the same input: if it ever says no, they are definitely not permutations of each other.
I just found a counterexample, so the assumption below is incorrect. (For instance, {1, 6, 6} and {2, 2, 9} have the same sum, 13, and the same product, 36, yet are not permutations of each other.)
I cannot prove it, but I thought this might possibly be true.
Since all elements of the arrays are integers, suppose each array has 2 elements,
and we have
a1 + a2 = s
a1 * a2 = m
b1 + b2 = s
b1 * b2 = m
then {a1, a2} == {b1, b2}. (For two elements this does hold: both pairs are the roots of the same quadratic x^2 - s*x + m.)
If this were true for arrays of n elements as well, we could simply compare the sum and the product of each array: if both are equal, one array would be a permutation of the other.
Given two arrays
a[] = {1,3,2,4}
b[] = {4,2,3,1}
both will have the same numbers but in different order.
We have to sort both of them. The condition is that you cannot compare elements within the same array.
I can give you an algorithm with O(N*log(N)) time complexity, based on quicksort:
Randomly select an element a1 in array A.
Use a1 to partition array B; note that you only have to compare every element of array B with a1.
Partitioning returns the position b1. Use the element of B at position b1 to partition array A (in the same way as step 2).
Go back to step 1 for the partitioned sub-arrays if their lengths are greater than 1.
Expected time complexity: T(N) = 2*T(N/2) + O(N) for balanced splits, so the overall complexity is O(N*log(N)) according to the master theorem.
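A Python sketch of this cross-partitioning (names are mine; note that every comparison pairs an element of one array with a pivot value taken from the other array):

    import random

    def partition(arr, lo, hi, pivot):     # pivot comes from the *other* array
        i = lo
        for j in range(lo, hi + 1):        # move elements < pivot to the left
            if arr[j] < pivot:
                arr[i], arr[j] = arr[j], arr[i]
                i += 1
        for j in range(i, hi + 1):         # place one copy of pivot at the boundary
            if arr[j] == pivot:
                arr[i], arr[j] = arr[j], arr[i]
                break
        return i

    def co_sort(a, b, lo=0, hi=None):
        if hi is None:
            hi = len(a) - 1
        if lo >= hi:
            return
        pivot = a[random.randint(lo, hi)]  # step 1: pivot value from A
        p = partition(b, lo, hi, pivot)    # step 2: partition B against it
        partition(a, lo, hi, b[p])         # step 3: partition A against B's pivot
        co_sort(a, b, lo, p - 1)           # step 4: recurse on both halves
        co_sort(a, b, p + 1, hi)

Since both subarrays always hold the same multiset of values, the pivot lands at the same index p in A as in B, so the recursion stays in lockstep.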
Not sure I understood the question properly, but from my understanding the task is as follows:
Sort a given array a without comparing any two elements from a directly. However, we are given a second array b which is guaranteed to contain the same elements as a, but in arbitrary order. You are not allowed to modify b (otherwise just sort b and return it...).
In case the elements of a are distinct this is easy: for every element of a, count how many elements of b are smaller. This number gives us the (zero-based) index in sorted order.
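A sketch of that counting idea (names are mine; this is O(n^2) overall, one linear count per element):

    def sort_via_counts(a, b):
        result = [None] * len(a)
        for x in a:
            rank = sum(1 for y in b if y < x)  # elements of b smaller than x
            result[rank] = x
        return result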
The case where elements are not necessarily distinct is left to the reader :)