Finding tuple with maximum difference between its minimum and maximum first element - algorithm

Given an array of elements of the form (a, b), where a is an integer and b is a string.
The array is sorted by a, the first element. We have to find the string b
which has the maximum difference between its lowest a and its highest a.
My Thoughts:
A simple approach is to hash each string into a hash table, ensuring that no two distinct
strings map to the same bucket. In the bucket for a string b we need to store only
two elements: the minimum a encountered so far and the maximum a encountered so far.
Once the hash table is populated, we simply iterate over all the strings and
find the one with the maximum difference.
This would run in O(N) time.
The only questionable assumption here is that distinct strings always go into different buckets.
That cannot be guaranteed by any implementation of a hash table that also keeps the
average time complexity of insert, search, and delete at Theta(1).
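For concreteness, here is a minimal sketch of that idea in Java; the method name and the parallel-array input format are my own choices, not part of the question. Because the array is sorted by a, the first and last occurrences of a string already give its min and max, but the sketch tracks both explicitly so it also works on unsorted input.

import java.util.HashMap;
import java.util.Map;

// Sketch: one pass records, per string, the min and max first component;
// a pass over the map then picks the string with the largest spread.
// Expected O(N) overall, subject to the hashing caveat discussed above.
static String maxSpreadString(int[] a, String[] b) {
    Map<String, int[]> range = new HashMap<>(); // value: { min a, max a }
    for (int i = 0; i < a.length; i++) {
        final int ai = a[i];
        int[] r = range.computeIfAbsent(b[i], k -> new int[] { ai, ai });
        r[0] = Math.min(r[0], ai);
        r[1] = Math.max(r[1], ai);
    }
    String best = null;
    int bestDiff = -1;
    for (Map.Entry<String, int[]> e : range.entrySet()) {
        int diff = e.getValue()[1] - e.getValue()[0];
        if (diff > bestDiff) { bestDiff = diff; best = e.getKey(); }
    }
    return best;
}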


How to effectively answer range queries in an array of integers?

Queries are of one type only, which is: given a range [a,b], find the sum of elements that are less than x (here x is part of each query, say of the form a b x).
Initially, I tried to literally go from a to b, check whether the current element is less than x, and add it up if so. But this way is very inefficient, as the complexity per query is O(n).
Now I am trying segment trees, sorting the numbers while merging. But my challenge is that if I sort, I lose the integers' relative order, so when a query comes I cannot use the sorted array to get the values from a to b.
Here are two approaches to solving this problem with segment trees:
Approach 1
You can use a segment tree of sorted arrays.
As usual, the segment tree divides your array into a series of subranges of different sizes. For each subrange you store a sorted list of the entries plus a cumulative sum of the sorted list. You can then use binary search to find the sum of entries below your threshold value in any subrange.
When given a query, you first work out the O(log(n)) subranges that cover your [a,b] range. For each of these you use an O(log(n)) binary search. Overall this is O(q log^2 n) complexity to answer q queries (plus the preprocessing time). A sketch follows.
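A minimal sketch of this approach in Java (the class and method names are mine, not from any library). Building the tree costs O(n log n) time and memory; each query is O(log^2 n):

// Segment tree of sorted arrays ("merge sort tree") with per-node prefix sums.
class MergeSortTree {
    private final long[][] sorted; // sorted[node] = sorted values of that node's range
    private final long[][] prefix; // prefix[node][i] = sum of first i sorted values
    private final int n;

    MergeSortTree(long[] a) {
        n = a.length; // assumes a non-empty array
        sorted = new long[4 * n][];
        prefix = new long[4 * n][];
        build(1, 0, n - 1, a);
    }

    private void build(int node, int lo, int hi, long[] a) {
        if (lo == hi) {
            sorted[node] = new long[] { a[lo] };
        } else {
            int mid = (lo + hi) / 2;
            build(2 * node, lo, mid, a);
            build(2 * node + 1, mid + 1, hi, a);
            // Merge the children's sorted arrays.
            long[] l = sorted[2 * node], r = sorted[2 * node + 1];
            long[] m = new long[l.length + r.length];
            int i = 0, j = 0, k = 0;
            while (i < l.length && j < r.length) m[k++] = l[i] <= r[j] ? l[i++] : r[j++];
            while (i < l.length) m[k++] = l[i++];
            while (j < r.length) m[k++] = r[j++];
            sorted[node] = m;
        }
        long[] s = sorted[node];
        prefix[node] = new long[s.length + 1];
        for (int i = 0; i < s.length; i++) prefix[node][i + 1] = prefix[node][i] + s[i];
    }

    // Sum of elements < x in [a, b] (inclusive indices), O(log^2 n).
    long query(int a, int b, long x) { return query(1, 0, n - 1, a, b, x); }

    private long query(int node, int lo, int hi, int a, int b, long x) {
        if (b < lo || hi < a) return 0;
        if (a <= lo && hi <= b) {
            // Count values < x by binary search, then read off the prefix sum.
            int pos = lowerBound(sorted[node], x);
            return prefix[node][pos];
        }
        int mid = (lo + hi) / 2;
        return query(2 * node, lo, mid, a, b, x) + query(2 * node + 1, mid + 1, hi, a, b, x);
    }

    // First index whose value is >= x.
    private static int lowerBound(long[] a, long x) {
        int lo = 0, hi = a.length;
        while (lo < hi) { int mid = (lo + hi) / 2; if (a[mid] < x) lo = mid + 1; else hi = mid; }
        return lo;
    }
}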
Approach 2
You can use a dynamic segment tree.
A segment tree allows you to answer queries of the form "Compute the sum of elements from a to b" in O(log n) time, and also to modify a single entry in O(log n).
Therefore, if you start with an empty segment tree, you can insert the entries one by one in increasing order of value, processing the queries offline in increasing order of their threshold x. Suppose we have added all entries with values from 1 to 5, so our array may look like:
[0,0,0,3,0,0,0,2,0,0,0,0,0,0,1,0,0,0,4,4,0,0,5,1]
(The 0s represent entries that are bigger than 5 so haven't been added yet.)
At this point you can answer, with an ordinary range-sum query over [a,b], any query whose threshold is 5.
Overall this will cost O(n log(n)) to add all the entries into the segment tree, O(q log(q)) to sort the queries, and O(q log(n)) to use the tree to answer the queries. A sketch follows.
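Here is a sketch of this offline idea in Java. I have used a Fenwick (binary indexed) tree in place of the segment tree, since it supports the same point-update / prefix-sum operations in O(log n); all names are my own:

import java.util.Arrays;
import java.util.Comparator;

class OfflineRangeSum {
    // queries[i] = { a, b, x }; returns answers in the original query order.
    static long[] solve(long[] arr, long[][] queries) {
        int n = arr.length;
        long[] bit = new long[n + 1];

        // Entry positions sorted by value, query indices sorted by threshold x.
        Integer[] byValue = new Integer[n];
        for (int i = 0; i < n; i++) byValue[i] = i;
        Arrays.sort(byValue, Comparator.comparingLong(i -> arr[i]));
        Integer[] byX = new Integer[queries.length];
        for (int i = 0; i < queries.length; i++) byX[i] = i;
        Arrays.sort(byX, Comparator.comparingLong(i -> queries[i][2]));

        long[] ans = new long[queries.length];
        int next = 0; // next entry (in value order) still to insert
        for (int qi : byX) {
            long x = queries[qi][2];
            // Insert every entry with value < x at its original position.
            while (next < n && arr[byValue[next]] < x) {
                add(bit, byValue[next] + 1, arr[byValue[next]]);
                next++;
            }
            int a = (int) queries[qi][0], b = (int) queries[qi][1];
            ans[qi] = prefixSum(bit, b + 1) - prefixSum(bit, a);
        }
        return ans;
    }

    // Standard Fenwick tree: point add and prefix sum, both O(log n).
    static void add(long[] bit, int i, long delta) {
        for (; i < bit.length; i += i & -i) bit[i] += delta;
    }

    static long prefixSum(long[] bit, int i) {
        long s = 0;
        for (; i > 0; i -= i & -i) s += bit[i];
        return s;
    }
}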

Data structure that supports random access by index and key, insertion, deletion in logarithmic time with order maintained

I'm looking for the data structure that stores an ordered list of E = (K, V) elements and supports the following operations in at most O(log(N)) time where N is the number of elements. Memory usage is not a problem.
E get(index) // get element by index
int find(K) // find the index of the element whose K matches
delete(index) // delete element at index, the following elements have their indexes decreased by 1
insert(index, E) // insert element at index, the following elements have their indexes increased by 1
I have considered the following incorrect solutions:
Use an array: find, delete, and insert will still cost O(N)
Use an array + a map of K to index: delete and insert will still cost O(N) for shifting elements and updating the map
Use a linked list + a map of K to element address: get and find will still cost O(N)
In my imagination, the last solution is the closest, but instead of a linked list, a self-balancing tree where each node stores the number of elements to its left would make it possible to do get in O(log(N)).
However, I'm not sure if I'm correct, so I want to ask whether my imagination is correct and whether there is a name for this kind of data structure, so I can look for an off-the-shelf solution.
The closest data structure I could think of is the treap.
An implicit treap is a simple modification of the regular treap, which is a very powerful data structure. In fact, an implicit treap can be considered as an array with the following procedures implemented (all in O(log N) in the online mode):
Inserting an element in the array in any location
Removal of an arbitrary element
Finding sum, minimum / maximum element etc. on an arbitrary interval
Addition, painting on an arbitrary interval
Reversing elements on an arbitrary interval
Using the modification with implicit keys allows you to do all the operations except the second one (find the index of the element whose K matches). I'll edit this answer if I come up with a better idea :)
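For reference, a minimal sketch of an implicit treap in Java, supporting get, insert, and delete by index in expected O(log N). All names are illustrative, and find(K) is exactly the missing piece (one plausible route, not shown here, would be an auxiliary map from K to its node plus parent pointers, walking up to accumulate the index):

import java.util.Random;

class ImplicitTreap {
    static final Random RNG = new Random();

    static class Node {
        String value;
        int priority = RNG.nextInt(); // heap order on random priorities
        int size = 1;                 // size of the subtree rooted here
        Node left, right;
        Node(String v) { value = v; }
    }

    Node root;

    static int size(Node n) { return n == null ? 0 : n.size; }

    static void pull(Node n) { n.size = 1 + size(n.left) + size(n.right); }

    // Merge two treaps where every index in a precedes every index in b.
    static Node merge(Node a, Node b) {
        if (a == null) return b;
        if (b == null) return a;
        if (a.priority > b.priority) { a.right = merge(a.right, b); pull(a); return a; }
        else { b.left = merge(a, b.left); pull(b); return b; }
    }

    // Split n into [0, k) and [k, size), returned as a two-element array.
    static Node[] split(Node n, int k) {
        if (n == null) return new Node[] { null, null };
        if (size(n.left) >= k) {
            Node[] parts = split(n.left, k);
            n.left = parts[1]; pull(n);
            return new Node[] { parts[0], n };
        } else {
            Node[] parts = split(n.right, k - size(n.left) - 1);
            n.right = parts[0]; pull(n);
            return new Node[] { n, parts[1] };
        }
    }

    // get(index): walk down using subtree sizes; assumes 0 <= index < size(root).
    String get(int index) {
        Node n = root;
        while (true) {
            int leftSize = size(n.left);
            if (index < leftSize) n = n.left;
            else if (index == leftSize) return n.value;
            else { index -= leftSize + 1; n = n.right; }
        }
    }

    // insert(index, value): split at index and merge the pieces around a new node.
    void insert(int index, String value) {
        Node[] parts = split(root, index);
        root = merge(merge(parts[0], new Node(value)), parts[1]);
    }

    // delete(index): split out the single element at index and drop it.
    void delete(int index) {
        Node[] parts = split(root, index);
        Node[] rest = split(parts[1], 1);
        root = merge(parts[0], rest[1]);
    }
}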

Find the sum of nodes in a binary search tree whose value lie in a certain range by augmenting the BST

I want to augment a binary search tree such that search, insertion, and deletion are still supported in O(h) time, and then I want to implement an algorithm to find the sum of all node values in a given range.
You can add an additional data structure to your BST class, specifically a HashMap or Hashtable. Your keys will be the different numbers your BST contains, and your values the number of occurrences of each. BST search(...) will not be impacted; however, insert(...) and delete(...) will need slight code changes.
Insert
When adding a node to the BST, check whether its value exists in the HashMap as a key. If it does, increment the occurrence count by 1. If it doesn't, add it to the HashMap with an initial value of 1.
Delete
When deleting, decrement the occurrence count in the HashMap (assuming you aren't being told to delete a node that doesn't exist)
Sum
Now for the sum function
sum(int start, int end)
You can iterate over the range and check your HashMap to see which numbers from the range exist in your map and their number of occurrences. You build up the sum by adding every value in the map that falls in the range, multiplied by its number of occurrences.
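A small sketch of this bookkeeping in Java (names are mine; the BST itself is omitted and only the HashMap side is shown):

import java.util.HashMap;
import java.util.Map;

// Occurrence counts maintained alongside the BST, as described above.
class OccurrenceIndex {
    private final Map<Integer, Integer> counts = new HashMap<>();

    // Call from the BST's insert(...): add the key or bump its count.
    void recordInsert(int key) { counts.merge(key, 1, Integer::sum); }

    // Call from the BST's delete(...): drop the count, removing at zero.
    void recordDelete(int key) {
        counts.computeIfPresent(key, (k, c) -> c == 1 ? null : c - 1);
    }

    // Sum over [start, end]: O(range size), independent of the BST's shape.
    long sum(int start, int end) {
        long total = 0;
        for (int v = start; v <= end; v++)
            total += (long) counts.getOrDefault(v, 0) * v;
        return total;
    }
}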
Complexities
Space: O(n)
Time of sum method: O(range size).
All other method time complexity isn't impacted.
You didn't mention a space constraint, so hopefully this is OK. I am very interested to see if you can somehow use the properties of a BST to solve this more efficiently; nothing comes to mind for me.

Find the N-th most frequent number in the array

Find the nth most frequent number in array.
(There is no limit on the range of the numbers)
I think we can
(i) store the occurrence count of every element using a map in C++,
(ii) build a max-heap, in linear time, of the occurrences (or frequencies) of the elements, and then extract up to the N-th element; each extraction takes O(log n) time to heapify,
(iii) we will get the frequency of the N-th most frequent number,
(iv) then we can linearly search through the hash to find the element having this frequency.
Time - O(N log N)
Space - O(N)
Is there any better method ?
It can be done in linear time and space. Let T be the total number of elements in the input array from which we have to find the Nth most frequent number:
Count and store the frequency of every number in the input in a map. Let M be the total number of distinct elements in the array, so the size of the map is M. -- O(T)
Find the Nth largest frequency in the map using a selection algorithm (sketched below). -- O(M)
Total time = O(T) + O(M) = O(T)
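A sketch of this in Java (names are mine). I have used randomized quickselect, which gives expected rather than worst-case O(M); a median-of-medians selection would make the bound worst-case:

import java.util.HashMap;
import java.util.Map;
import java.util.Random;

class NthFrequent {
    // Count frequencies, select the Nth largest count, then scan for its key.
    // Assumes 1 <= n <= number of distinct values.
    static int nthMostFrequent(int[] a, int n) {
        Map<Integer, Integer> freq = new HashMap<>();
        for (int x : a) freq.merge(x, 1, Integer::sum);
        int[] counts = new int[freq.size()];
        int idx = 0;
        for (int c : freq.values()) counts[idx++] = c;
        int target = quickselect(counts, counts.length - n); // Nth largest count
        for (Map.Entry<Integer, Integer> e : freq.entrySet())
            if (e.getValue() == target) return e.getKey();
        throw new IllegalStateException("unreachable");
    }

    // In-place quickselect for the kth smallest (0-based), expected O(length).
    static int quickselect(int[] a, int k) {
        Random rng = new Random();
        int lo = 0, hi = a.length - 1;
        while (lo < hi) {
            int pivot = a[lo + rng.nextInt(hi - lo + 1)];
            int i = lo, j = hi;
            while (i <= j) { // Hoare-style partition around the pivot value
                while (a[i] < pivot) i++;
                while (a[j] > pivot) j--;
                if (i <= j) { int t = a[i]; a[i] = a[j]; a[j] = t; i++; j--; }
            }
            if (k <= j) hi = j;
            else if (k >= i) lo = i;
            else return a[k]; // k landed in the band of pivot-equal values
        }
        return a[lo];
    }
}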
Your method is basically right. You would avoid the final hash search if you marked each vertex of the constructed heap with the number it represents. Moreover, it is possible to keep watch on the N-th element of the heap as you are building it, because at some point the outcome cannot change anymore and the rest of the computation can be dropped. But this would probably not make the algorithm faster in the general case, and maybe not even in special cases. So you answered your own question correctly.
It depends on whether you want most effective, or the most easy-to-write method.
1) If you know that all numbers will be from 0 to 1000, you just make an array of 1001 zeros (occurrences), loop through your array, and increment the right occurrence position. Then you sort these occurrences and select the Nth value.
2) You keep a "bag" of unique items: you loop through your numbers and check whether each number is in the bag; if not, you add it, and if it is, you just increment its occurrence count. Then you pick the Nth most frequent from it.
The bag can be a linear array, a BST, or a dictionary (hash table).
The question is "N-th most frequent", so I think you cannot avoid sorting (or a clever data structure), so the best complexity cannot be better than O(n*log(n)).
I just wrote a method in Java 8; this is not an efficient solution:
Create a frequency map of the elements.
Sort the map contents by value in reverse order.
Skip the first (N-1) entries, then take the next one.
import java.util.Arrays;
import java.util.Comparator;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
// Frequency map -> entries sorted by descending count -> take the n-th (1-based).
private static Integer findMostNthFrequentElement(int[] inputs, int n) {
    return Arrays.stream(inputs).boxed()
            .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
            .entrySet().stream().sorted(Map.Entry.comparingByValue(Comparator.reverseOrder()))
            .skip(n - 1).findFirst().get().getKey();
}

Determining if a sequence T is a sorting of a sequence S in O(n) time

I know that one can easily determine whether a sequence is sorted in O(n) time. However, how can we ensure that some sequence T is indeed a sorting of the elements of a sequence S in O(n) time?
That is, someone might have an algorithm that outputs some sequence T that is indeed in sorted order but may not contain the elements of sequence S. So how can we check that T is indeed a sorted sequence of S in O(n) time?
Get the length L of S.
Check the length of T as well. If they differ, you are done!
Let Hs be a hash map (with something like 2L buckets) of all elements in S.
Let Ht be a hash map (again, with 2L buckets) of all elements in T.
For each element in T, check that it exists in Hs.
For each element in S, check that it exists in Ht.
This will work if the elements are unique in each sequence. See wcdolphin's answer for the small changes needed to make it work with non-unique sequences.
I have NOT taken memory consumption into account. Creating two hash maps of double the size of each sequence may be expensive. This is the usual tradeoff between speed and memory.
While Emil's answer is very good, you can do slightly better.
Fundamentally, in order for T to be a reordering of S, it must contain all of the same elements. That is to say, every element of T or S must occur the same number of times in both. Thus, we will (see the sketch after this list):
Create a Hash table of all elements in S, mapping from the 'Element' to the number of occurrences.
Iterate through every element in T, decrementing the number of times the current element occurred.
If the number of occurrences is zero, remove it from the hash.
If the current element is not in the hash, T is not a reordering of S.
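A sketch of that counting check in Java (the method name is mine); the length comparison and the sortedness check from the earlier answer are included so the method stands alone:

import java.util.HashMap;
import java.util.Map;

// Returns true iff t is sorted and is a permutation of s; expected O(n).
static boolean isSortedPermutation(int[] s, int[] t) {
    if (s.length != t.length) return false;
    for (int i = 1; i < t.length; i++)
        if (t[i - 1] > t[i]) return false; // t must be non-decreasing
    Map<Integer, Integer> counts = new HashMap<>();
    for (int x : s) counts.merge(x, 1, Integer::sum); // occurrences in s
    for (int x : t) {
        Integer c = counts.get(x);
        if (c == null) return false; // t contains an element s lacks
        if (c == 1) counts.remove(x);
        else counts.put(x, c - 1);
    }
    return counts.isEmpty(); // redundant given the length check, but explicit
}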
Create a hash map for each sequence. Use the character as key and the count of that character as value. If a character has not been added yet, add it with a count of 1. If a character has already been added, increase its count by 1.
Then verify that, for each character in the input sequence, the hash map of the sorted sequence contains that character as a key with the same count as value.
I believe this is an O(n^2) approach because:
Assuming the data structure you use to store elements is a linked list (for minimal cost when removing an element),
you will be doing an S.contains(element of T) for every element of T, plus one check that they are the same size.
You cannot assume that S is ordered, and therefore need to do an element-by-element comparison for every element.
The worst case would be if S is the reverse of T.
This would mean that for element (0+x) of T you would do (n-x) comparisons, if you remove each matched element.
This results in (n*(n+1))/2 operations, which is O(n^2).
There might be some cleverer algorithm out there, though.
