Finding closest number in a range - algorithm

I thought of a problem which is as follows:
We have an array A of n integers and t test cases. In every test case we are given a number m and a range [s,e], and we have to find the number closest to m in that range of the array (A[s] through A[e]).
You may assume the array is indexed from 1 to n.
For example:
A = {5, 12, 9, 18, 19}
m = 13
s = 4 and e = 5
So the answer should be 18.
Constraints:
n<=10^5
t<=n
All I can think of is an O(n) solution for every test case, and I think a better solution exists.

This is a rough sketch:
Create a segment tree from the data. At each node, besides the usual data like left and right indices, you also store the numbers found in the sub-tree rooted at that node, in sorted order. You can achieve this by constructing the segment tree bottom-up. In the node just above the leaves, you store the two leaf values in sorted order. In an intermediate node, you keep the numbers from the left child and the right child, which you can combine using a standard merge. There are O(n) nodes in the tree, and since every level stores each number once, keeping this data should take O(nlog(n)) space overall.
Once you have this tree, for every query, walk down the tree until you reach the node(s) that together cover the given range [s, e]. As the tutorial shows, one or more nodes combine to form the given range. As the tree depth is O(log(n)), at most O(log(n)) such nodes are needed, and reaching them takes O(log(n)) time. For each node which lies completely inside the range, find the closest number using binary search in the sorted array stored in that node; this is O(log(n)) per node. Find the closest among all these candidates, and that is the answer. Thus, you can answer each query in O(log^2(n)) time.
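The sketch above could look roughly like this in Python. It is only an illustration, not the answerer's code: the names build, closest_in_node and query are made up, the tree uses an iterative bottom-up layout, and ties between two equally close numbers are broken towards the smaller one (the question does not say which to prefer).

```python
from bisect import bisect_left
from heapq import merge


def build(a):
    """Bottom-up segment tree; node i holds a sorted list of the values below it."""
    n = len(a)
    tree = [[] for _ in range(2 * n)]
    for i, v in enumerate(a):
        tree[n + i] = [v]
    for i in range(n - 1, 0, -1):
        # Standard linear merge of the two sorted children.
        tree[i] = list(merge(tree[2 * i], tree[2 * i + 1]))
    return tree


def closest_in_node(sorted_vals, m):
    """Binary search for the value closest to m in one non-empty sorted list."""
    i = bisect_left(sorted_vals, m)
    candidates = sorted_vals[max(i - 1, 0):i + 1]
    # Assumption: on a tie, prefer the smaller value.
    return min(candidates, key=lambda v: (abs(v - m), v))


def query(tree, s, e, m):
    """Closest value to m among A[s..e], 1-indexed and inclusive as in the question."""
    n = len(tree) // 2
    lo, hi = s - 1 + n, e + n  # half-open range [lo, hi) over the leaves
    found = []
    while lo < hi:
        if lo & 1:
            found.append(closest_in_node(tree[lo], m))
            lo += 1
        if hi & 1:
            hi -= 1
            found.append(closest_in_node(tree[hi], m))
        lo >>= 1
        hi >>= 1
    return min(found, key=lambda v: (abs(v - m), v))


A = [5, 12, 9, 18, 19]
tree = build(A)
print(query(tree, 4, 5, 13))  # 18, as in the example from the question
```

Each query touches O(log(n)) nodes and does one binary search in each of them.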
The tutorial I link to contains other data structures, such as square-root decomposition, which is easier to implement and should give about O(sqrt(n)) per query. But I haven't thought much about this.

Sort the array and do a binary search. Complexity: O(nlog(n) + tlog(n)).

I'm fairly sure no faster solution exists. A slight variation of your problem is:
There is no array A, but each test case contains an unsorted array of numbers to search. (The array slice of A from s to e).
In that case, there is clearly no better way than a linear search for each test case.
Now, in what way is your original problem more specific than the variation above? The only added information is that all the slices come from the same array. I don't think that this additional constraint can be used for an algorithmic speedup.
EDIT: I stand corrected. The segment tree data structure should work.

Related

How to effectively answer range queries in an array of integers?

How can I effectively answer range queries on an array of integers?
Queries are of one type only: given a range [a,b], find the sum of the elements that are less than x (here x is part of each query, say of the form a b x).
Initially, I tried to literally go from a to b, check whether the current element is less than x, and add it up. But this is very inefficient, as the complexity is O(n) per query.
Now I am trying segment trees, sorting the numbers while merging. But my challenge is that if I sort, I lose the integers' relative order. So when a query comes, I cannot use the sorted array to get the values from a to b.
Here are two approaches to solving this problem with segment trees:
Approach 1
You can use a segment tree of sorted arrays.
As usual, the segment tree divides your array into a series of subranges of different sizes. For each subrange you store a sorted list of the entries plus a cumulative sum of the sorted list. You can then use binary search to find the sum of entries below your threshold value in any subrange.
When given a query, you first work out the O(log(n)) subranges that cover your [a,b] range. For each of these you use an O(log(n)) binary search. Overall this is O(qlog^2(n)) complexity to answer q queries (plus the preprocessing time).
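A minimal Python sketch of Approach 1, under a few assumptions that are not in the answer: positions are 0-indexed and inclusive, "less than x" is strict, and the names build and sum_less_than are invented for illustration.

```python
from bisect import bisect_left
from heapq import merge
from itertools import accumulate


def build(a):
    """Segment tree where each node stores its values sorted, plus prefix sums."""
    n = len(a)
    vals = [[] for _ in range(2 * n)]
    for i, v in enumerate(a):
        vals[n + i] = [v]
    for i in range(n - 1, 0, -1):
        vals[i] = list(merge(vals[2 * i], vals[2 * i + 1]))
    # prefix[i][k] is the sum of the k smallest values stored in node i.
    prefix = [[0] + list(accumulate(v)) for v in vals]
    return vals, prefix


def sum_less_than(vals, prefix, a, b, x):
    """Sum of the elements strictly less than x at positions a..b."""
    n = len(vals) // 2
    lo, hi = a + n, b + 1 + n  # half-open range [lo, hi) over the leaves
    total = 0
    while lo < hi:
        if lo & 1:
            total += prefix[lo][bisect_left(vals[lo], x)]
            lo += 1
        if hi & 1:
            hi -= 1
            total += prefix[hi][bisect_left(vals[hi], x)]
        lo >>= 1
        hi >>= 1
    return total


data = [3, 1, 4, 1, 5, 9, 2, 6]
vals, prefix = build(data)
print(sum_less_than(vals, prefix, 2, 5, 5))  # elements 4, 1, 5, 9: prints 5
```

The relative order of the original array is never needed inside a node, because a node is only used when its whole subrange lies inside [a,b].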
Approach 2
You can use a dynamic segment tree.
A segment tree allows you to answer queries of the form "Compute sum of elements from a to b" in O(logn) time, and also to modify a single entry in O(logn).
Therefore if you start with an empty segment tree, you can insert the entries in increasing order of value. Suppose we have added all entries with values from 1 to 5, so our array may look like:
[0,0,0,3,0,0,0,2,0,0,0,0,0,0,1,0,0,0,4,4,0,0,5,1]
(The 0s represent entries that are bigger than 5 so haven't been added yet.)
At this point you can answer any queries that have a threshold of 5.
Overall this will cost O(nlog(n)) to add all the entries into the segment tree, O(qlog(q)) to sort the queries, and O(qlog(n)) to use the segment tree to answer the queries.
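Here is a rough offline sketch of Approach 2. Note one substitution: it uses a Fenwick (binary indexed) tree for the point updates and range sums instead of a full segment tree, simply because it is shorter to write. The function name, the 0-indexed inclusive positions and the strict "less than x" are all assumptions made for the example.

```python
def sum_less_than_offline(a, queries):
    """Answer (lo, hi, x) queries offline; returns the answers in input order."""
    n = len(a)
    tree = [0] * (n + 1)  # Fenwick tree over positions, initially all zero

    def add(i, v):
        """Add v at 0-indexed position i."""
        i += 1
        while i <= n:
            tree[i] += v
            i += i & -i

    def prefix(i):
        """Sum of positions 0..i-1."""
        s = 0
        while i > 0:
            s += tree[i]
            i -= i & -i
        return s

    order = sorted(range(n), key=lambda i: a[i])  # positions by increasing value
    answers = [0] * len(queries)
    p = 0
    # Process queries by increasing threshold so entries are only ever added.
    for qi in sorted(range(len(queries)), key=lambda q: queries[q][2]):
        lo, hi, x = queries[qi]
        while p < n and a[order[p]] < x:
            add(order[p], a[order[p]])
            p += 1
        answers[qi] = prefix(hi + 1) - prefix(lo)
    return answers


data = [3, 1, 4, 1, 5, 9, 2, 6]
print(sum_less_than_offline(data, [(2, 5, 5), (0, 7, 4)]))  # [5, 7]
```

This only works when all queries are known up front, since they are answered in threshold order rather than in arrival order.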

Best sorting algorithm - Partially sorted linked list

Problem: Given a sorted doubly linked list and two numbers C and K, you need to decrease the info of the node with data K by C and insert the resulting node at its correct position so that the list remains sorted.
I would think of insertion sort for such a problem, because insertion sort at any instant looks like a hand of cards that is partially sorted. For insertion sort, the number of swaps is equal to the number of inversions, and the number of compares is at most the number of exchanges + (N-1).
So, in the problem above, if the node with data K is decreased by C, then the sorted linked list becomes partially sorted. Insertion sort is the best fit.
Another point: when selecting a sorting algorithm, if some sorting logic is the best fit for an array representation of the data, then the same sorting logic should be the best fit for a linked list representation of the same data.
For this problem, is my thought process correct in choosing insertion sort?
Maybe you mean something else, but insertion sort is not the best algorithm, because you actually don't need to sort anything. If there is only one element with value K then it doesn't make a big difference, but otherwise it does.
So I would suggest the following O(n) algorithm, ignoring edge cases for simplicity:
Go forward in the list until the value of the current node is > K - C.
Save this node; all the reduced nodes will be inserted before this one.
Continue to go forward while the value of the current node is < K.
While the value of the current node is K, remove the node, set its value to K - C and insert it before the saved node. This could be optimized further, so that you only do one remove and insert operation for the whole sublist of nodes which had value K.
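For illustration, the steps above might be sketched in Python as follows. The Node class, the circular sentinel layout and all helper names are assumptions made for this example, and C is assumed to be positive. One edge case the steps skip is handled explicitly: if no value lies strictly between K - C and K, the saved node is itself the first K node, so the values are changed in place.

```python
class Node:
    def __init__(self, value=None):
        self.value = value
        self.prev = self.next = self  # a lone node is a circular list


def insert_before(pos, node):
    node.prev, node.next = pos.prev, pos
    pos.prev.next = node
    pos.prev = node


def unlink(node):
    node.prev.next = node.next
    node.next.prev = node.prev


def build(values):
    """Return the sentinel of a circular doubly linked list holding values."""
    head = Node()
    for v in values:
        insert_before(head, Node(v))
    return head


def to_list(head):
    out, cur = [], head.next
    while cur is not head:
        out.append(cur.value)
        cur = cur.next
    return out


def decrease(head, k, c):
    """Decrease every node with value k by c (c > 0), keeping the list sorted."""
    cur = head.next
    # Step 1: find the first node with value > k - c and save it.
    while cur is not head and cur.value <= k - c:
        cur = cur.next
    target = cur
    # Step 2: skip the nodes with value < k.
    while cur is not head and cur.value < k:
        cur = cur.next
    # Step 3: move every node with value k in front of the saved node.
    while cur is not head and cur.value == k:
        nxt = cur.next
        if cur is target:
            cur.value = k - c  # already in the right place, update in place
            target = nxt
        else:
            unlink(cur)
            cur.value = k - c
            insert_before(target, cur)
        cur = nxt


lst = build([1, 3, 4, 5, 5, 9])
decrease(lst, 5, 3)
print(to_list(lst))  # [1, 2, 2, 3, 4, 9]
```

The whole operation is a single forward pass, so it stays O(n) without sorting anything.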
If these decrease operations can be batched up before the sorted list must be available, then you can simply remove all the decremented nodes from the list. Then, sort them, and perform a two-way merge into the list.
If the list must be maintained in order after each node decrement, then there is little choice but to remove the decremented node and re-insert in order.
Doing this with a linear search for a deck of cards is probably acceptable, unless you're running some monstrous Monte Carlo simulation involving cards that runs for hours or days, so that optimization counts.
Otherwise the way we would deal with the need to maintain order would be to use an ordered sequence data structure: balanced binary tree (red-black, splay) or a skip list. Take the node out of the structure, adjust value, re-insert: O(log N).

How can this be done in O(nlogn) time complexity

I had a question on my exams for which I had to come up with an efficient algorithm. The problem was like this:
We have some objects which have two properties:
H = <1,1000000>
R = <1,1000000>
We can insert one object into another if H1>H2 and R1>R2. The input contains pairs of H and R, one pair per line. If the current object can be inserted into any of the previous objects, we choose the one with the least H and then destroy both of them. Print the number of objects left in the output.
I wonder how this problem can be solved in O(nlog(n)) time complexity using binary search trees, a segment tree, or a Fenwick tree.
Thanks in advance.
A solution with a Fenwick tree is as follows.
Let's first sort the whole array by R (for now we don't care about H), and assign each item a token (which is equal to its position in the sorted array).
Let's get back to our original array. We are going to run a sweep over the given array. Say we have a Fenwick tree which, instead of a cumulative sum, stores the maximum (from the beginning to that position) of H only.
Say we couldn't fit an item into another item. Then we'll insert it into the tree, at the position equal to its token.
So, right now, we have a Fenwick tree which contains only the items we've dealt with so far. Other values are 0. The items in the tree are positioned in R-sorted order.
Now, how do we find out whether we can fit the current item into another object? We can run a binary search (upper bound) on the Fenwick tree for the current item's H. And, as the items are already sorted in R order, instead of the whole tree, we only need to search in the effective range.
Binary search in a Fenwick tree can be done in O(log(n)). Check out the "Find index with given cumulative frequency" part of this article.

Number of binary search trees over n distinct elements

How many binary search trees can be constructed from n distinct elements? And how can we find a mathematically proved formula for it?
Example:
If we have 3 distinct elements, say 1, 2, 3, there are 5 binary search trees.
Given n elements, the number of binary search trees that can be made from those elements is given by the nth Catalan number (denoted Cn). This is equal to
Cn = (2n)! / ((n+1)! n!)
Intuitively, the Catalan numbers represent the number of ways that you can create a structure out of n elements that is made in the following way:
Order the elements as 1, 2, 3, ..., n.
Pick one of those elements to use as a pivot element. This splits the remaining elements into two groups - those that come before the element and those that come after.
Recursively build structures out of those two groups.
Combine those two structures together with the one element you removed to get the final structure.
This pattern perfectly matches the ways in which you can build a BST from a set of n elements. Pick one element to use as the root of the tree. All smaller elements must go to the left, and all larger elements must go to the right. From there, you can then build smaller BSTs out of the elements to the left and the right, then fuse them together with the root node to form an overall BST. The number of ways you can do this with n elements is given by Cn, and therefore the number of possible BSTs is given by the nth Catalan number.
Hope this helps!
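As a quick check, the standard closed form for the Catalan numbers, Cn = (2n)! / ((n+1)! n!), which is the binomial coefficient (2n choose n) divided by n+1, can be evaluated directly. This is a small sketch; the function name catalan is just chosen for the example, and math.comb needs Python 3.8 or later.

```python
from math import comb


def catalan(n):
    """n-th Catalan number: binomial(2n, n) / (n + 1), always an integer."""
    return comb(2 * n, n) // (n + 1)


print(catalan(3))  # 5, matching the 5 BSTs on the elements 1, 2, 3
```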
I am sure this question is not just about counting with a mathematical formula. I took some time to write the program and the explanation of the idea behind the calculation.
I tried solving it with both recursion and dynamic programming. Hope this helps.
The formula is already present in the previous answer.
So if you are interested in learning the solution and understanding the approach, you can always check my article, Count Binary Search Trees created from N unique elements.
Let T(n) be the number of BSTs of n elements.
Given n distinct ordered elements, numbered 1 to n, we select i as the root.
This leaves (1..i-1) in the left subtree for T(i-1) combinations and (i+1..n) in the right subtree for T(n-i) combinations.
Therefore:
T(n) = sum_{i=1}^{n} T(i-1) * T(n-i)
and of course T(0) = T(1) = 1 (the empty subtree counts as one combination)
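The recurrence translates almost directly into a small dynamic-programming loop. This is a sketch with an invented function name; it relies on the base case T(0) = 1 for the empty subtree, which the sum needs whenever the root is the smallest or the largest element.

```python
def count_bsts(n):
    """Number of BSTs on n distinct keys, computed bottom-up from the recurrence."""
    t = [0] * (n + 1)
    t[0] = 1  # the empty tree
    for size in range(1, n + 1):
        # Root i leaves i-1 keys on the left and size-i keys on the right.
        t[size] = sum(t[i - 1] * t[size - i] for i in range(1, size + 1))
    return t[n]


print(count_bsts(3))  # 5
```

This takes O(n^2) multiplications, which is fine for small n.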

What is the fastest way of updating an ordered array of numbers?

I need to calculate a 1D histogram that must be dynamically maintained and looked up frequently. One idea I had involves keeping an ordered array with the data (because then I can determine percentiles in O(1), and this suffices for quickly finding a histogram with non-uniform bins that have exactly the same number of points inside each bin).
So, is there a way that is less than O(N) to insert a number into an ordered array while keeping it ordered?
I guess the answer is very well known but I don't know a lot about algorithms (physicists doing numerical calculations rarely do).
In the general case, you could use a more flexible tree-like data structure. This would allow access, insertion and deletion in O(log n) time and is also relatively easy to get ready-made from a library (e.g. C++'s STL map).
(Or a hash map...)
An ordered array with binary search does the same things as a tree, but is more rigid. It will probably be faster for access and use less memory, but you pay when you have to insert or delete things in the middle (O(n) cost).
Note, however, that an ordered array might be enough for you: if your data points are often the same, you can maintain a list of pairs {key, count}, ordered by key, which lets you quickly add another instance of an existing item (though adding a new item still takes more work).
You could use binary search to find the insertion position. This is O(log(n)), although in a plain array the elements after that position still have to be shifted, which is O(n).
If you want to insert a number x, take the number in the middle of your array and compare it to x. If x is smaller, take the number in the middle of the first half; otherwise take the number in the middle of the second half, and so on.
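In Python, for example, the standard bisect module performs this search. A minimal sketch follows; the helper name insert_sorted is made up for illustration. Keep in mind that list.insert still moves the tail of the list, so only the search part is O(log(n)).

```python
from bisect import bisect_left


def insert_sorted(values, x):
    """Insert x into the sorted list values; returns the index that was used."""
    pos = bisect_left(values, x)  # O(log n) comparisons to find the slot
    values.insert(pos, x)         # O(n) shift of the elements after pos
    return pos


data = [2, 3, 5, 8, 13]
insert_sorted(data, 7)
print(data)  # [2, 3, 5, 7, 8, 13]
```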
You can perform insertions in O(1) time if you rearrange your array as a bunch of linked-lists hanging off of each element:
keys = Array([0][1][2][3][4]......)
a c b e f . .
d g i . . .
h j .
|__|__|__|__|__|__|__/linked lists
There's also the strategy of keeping two datastructures at the same time, if your update workload supports it without increasing time-complexity of common operations.
"So, is there a way that is less than O(N) to insert a number into an ordered array while keeping it ordered?"
Yes, you can implement a binary search tree on top of an array and do the insertion in O(log n) time. How?
Keep index 0 empty; index 1 = root; if a node is the left child of its parent, index of node = 2 * index of parent; if a node is the right child of its parent, index of node = 2 * index of parent + 1.
Insertion will thus be O(log n) as long as the tree is balanced. Unfortunately, if you don't balance the tree, the binary search tree for an ordered list may degenerate into a linear search, i.e. O(n), which is pointless. Here, you may have to implement a red-black tree to keep the height balanced. This is quite complicated, but insertion can then be done with arrays in O(log n). Note that the array elements will no longer be ints; instead, they'll have to be objects with a colour attribute.
I wouldn't recommend it.
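To make the index arithmetic concrete, here is a toy, unbalanced version of that layout. It is only a sketch under two assumptions: a Python dict stands in for the array (a real array would need a slot for every possible index, which grows exponentially with the tree depth), and no red-black rebalancing is attempted. The names bst_insert and in_order are invented for the example.

```python
def bst_insert(slots, x):
    """Insert x into a BST stored by index: root at 1, children at 2i and 2i+1."""
    i = 1
    while i in slots:
        # Smaller keys go to the left child, everything else to the right child.
        i = 2 * i if x < slots[i] else 2 * i + 1
    slots[i] = x


def in_order(slots, i=1):
    """Yield the stored keys in sorted order."""
    if i in slots:
        yield from in_order(slots, 2 * i)
        yield slots[i]
        yield from in_order(slots, 2 * i + 1)


slots = {}
for v in [5, 2, 8, 1, 9, 3]:
    bst_insert(slots, v)
print(list(in_order(slots)))  # [1, 2, 3, 5, 8, 9]
```

Inserting already sorted input makes every key a right child, so the indices double at each step and the search degrades to O(n), which is the problem described above.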
Any particular reason this demands an array? You need a data structure which keeps data ordered and allows you to insert quickly. Why not a binary search tree? Or better still, a red-black tree. In C++, you could use std::set from the Standard Template Library, which is typically implemented as a red-black tree. It gives you O(log(n)) insertion time and the ability to iterate over it like an array.

Resources