Unordered set vs Sorted Set - find(x) - set

How is the find(x) method of an unordered set different from that of an ordered set?
find(x) in an unordered set looks up the hash table in O(1) on average and returns the element if present, else null. What does find(x) do in an ordered set? I came across the term "successor search"; can anyone explain what a successor search is?

Related

Algorithm to find closest tuples for each tuple in a list

Consider an array of elements, where each element is a pair (a, b).
The first element a is a date and b is some positive integer.
The given array is sorted by date.
We have to write a function that returns an array of integers.
The element at the ith position is derived from the ith tuple in the original array as follows:
Take the ith tuple, say (a, b). Now look at all the tuples that occur after it, and find the one (c, d) such that d is less than b and is maximum among them.
The ith element in the returned array will be (c - a).
My thoughts:
We scan the given array of tuples from the right, adding each tuple we encounter to an AVL tree. Each search then takes time equal to the height of the tree.
So if the second elements are distinct, this works in O(n log n) time.
But if the second element of a tuple occurs many times, we may end up traversing a large part of the tree.
Not sure how to address that.
We could probably store the min and max of each subtree in its root node.
// input list of pairs is of the form (date, value)
// DS is a data structure that supports lower-bound search and insert in O(log n)
index = size of list of pairs - 1
for (pair p in input list of pairs, scanning from right to left):
    // search should return a sentinel if DS is empty
    resultant_array[index--] = pair<p.date, search(predecessor of lower bound of p.value in DS)?.date>
    if (DS doesn't contain a pair with key p.value)
        insert the pair <p.value, p.date> into DS
The above algorithm takes the highest date when b in (a, b) is duplicated. To take the lowest date instead, update rather than insert on the last line whenever p.value already exists in DS.
DS can be an ordered map, an AVL tree, a red-black tree, etc. Even with duplicate values, the whole DS never needs to be traversed, so each search is just O(log n).

Topological sort based on a comparator (rather than a graph)

I have a set of items and a comparator function which defines a partial ordering -- given two items, it returns "=", "<", ">", or "no defined ordering" (say "<>"). I want to produce a sorted list of items that respects this partial ordering.
If I look for algorithms to do a topological sort, they generally start with a directed acyclic graph. But I don't have a DAG, and I can't see a simple way to construct one without doing a large number (N*N perhaps?) of comparisons. What I would like is some kind of QuickSort-like algorithm that works by comparing and swapping selected items in a list. Is there such an algorithm? I'm assuming most classical sorting algorithms would fail because of the indeterminism.
I thought of trying to use a classical sort algorithm and treating "<>" as "=", but it doesn't work because I can have the situation A < B, A <> C, B <> C, so I can't treat C as being equal to both A and B.
Any ideas or pointers?
You don't need to create a graph explicitly to use a topological sort algorithm.
Let S be the set of elements you want to sort, and there is a partial order in S. Let used be the dictionary that maps each element from S to a boolean value (false by default) that will be true when we visit a 'node' with this element. Let stack be the stack of elements from S (empty by default).
Define a method dfs(x) (x is an element of S) that does the following:
    Set used[x] to true
    For each element y in S:
        If used[y] is false and x is less than or equal to y (*):
            Call dfs(y)
    Add x to stack
Then:
    For each element x in S:
        If used[x] is false:
            Call dfs(x)
After this loop, elements in stack will be ordered (first item to be popped from stack is minimal one (not necessarily minimum), last item is maximal one (not necessarily maximum)).
This algorithm obviously runs in O(n^2) comparisons, because it is still a topological sort, just without building the graph explicitly.
(*): Just as a topological sort follows only the edges that go from x to y, ignoring edges from y to x and non-edges, this algorithm follows only the 'less than or equal to' relations.

Sorting an array of pairs using a given sort function

Let's say I have access to a stable sorting algorithm that can sort an array of integers A. So sort(A) would return the elements of A in ascending order.
I have an array of pairs of integers that I would like to sort on the second element, where duplicates are possible. If duplicates exist in the second element, the sort should preserve the original ordering of those elements (it should be stable).
So if the array had entries :
(1,2),(1,1),(0,2),(3,2),(4,1)
Then the result would be :
(1,1),(4,1),(1,2),(0,2),(3,2)
Is this possible, using just the sorting function I am provided, or do I need to write my own sorting function?
array.sorted(by: { $0.1 < $1.1 })
After thinking about it for a while, I think this might work, but perhaps someone can double-check that it makes sense:
I was thinking of creating a map such that if (x, y) and (z, y) are the only pairs in the array whose second component is y, then map(y) = [x, z]. In other words, given an integer x, the map gives the list of all integers y such that (y, x) is a pair in the array, in their original order. I should be able to construct this map in linear time (in the size of the array of pairs) by looping through the array once; then, after sorting on the second components, I can use the map to reconstruct the sorted array of pairs, again in linear time.

Two Heaps with same key Algorithms

Hi, if you have two heaps, how do you determine whether they share a key, in O(n log n) time, where n is the total size of the two min-heaps?
I was thinking it might involve adding one of the heaps to the other, but I am not positive.
bool have_same_element(heap<int> h1, heap<int> h2) {
    while (!h1.empty() && !h2.empty()) {
        int t1 = h1.top(), t2 = h2.top();
        if (t1 == t2) return true;
        if (t1 < t2) h1.pop();
        else h2.pop();
    }
    return false;
}
This is O(s1 log s1 + s2 log s2), which is O(n log n) where n = s1 + s2.
I think the heap structure doesn't help with this task, so it doesn't matter whether you have two heaps or two arrays of items. To find whether two data sets share a value, you can use different algorithms; I suggest two, for example:
You can put the items of the smaller set into a hash-based dictionary and then enumerate the items of the other set, checking whether each is in the first set. This is probably the fastest way, but it takes some additional space for the dictionary.
Suppose you have two classical heap structures kept in arrays (for a heap this is possible). Then you can sort the smaller array. After that, just enumerate the items of the second heap and use binary search to check whether each item is in the first heap.
When you finish, you can rebuild the broken heap (this can be done in place, in the same array where the items are).
So you get O(n log n), where n is the size of the smaller heap, and you can do it without additional memory.

Best data structure for nearest neighbour in 1 dimension

I have a list of values (1-dimensional) and I would like to know the best data structure / algorithm for finding the nearest to a query value I have. Most of the solutions (all?) I found for questions here are for 2 or more dimensions. Can anybody suggest to me the approach for my case?
My instinct tells me to sort the data and use binary search somehow. By the way, there is no limit on the construction or insertion time for any tree needed, so probably someone can suggest a better tree than simply a sorted list.
If you need something faster than the O(log n) you can easily get with a sorted array or a binary search tree, and your values are integers drawn from a bounded universe, you can use a van Emde Boas tree. vEB trees give you O(log log u) search for the closest element on either side, where u is the universe size.
If insertion time is irrelevant, then binary search on a sorted array is the simplest way to achieve O(log N) query time. Each time an item is added sort everything. For each query, perform a binary search. If a match is found, return it. Otherwise, the binary search should return the index of the item, where it should have been inserted. Use this index to check the two neighboring items and determine which of them is closer to the query point.
I suppose that there are solutions with O(1) time. I will try to think of one that doesn't involve too much memory usage...
As you already mentioned, the fastest and easiest way should be sorting the data and then looking for the left and right neighbour of a data point.
Sort the list and use binary search to find the element you are looking for, then compare your left and right neighbors. You can use an array which is O(1) access.
Something like:
int nearest(int[] list, int element) {
    Arrays.sort(list);
    int idx = Arrays.binarySearch(list, element);
    if (idx >= 0) return list[idx];            // exact match
    int ins = -idx - 1;                        // insertion point (Java convention)
    if (ins == 0) return list[0];              // smaller than everything
    if (ins == list.length) return list[list.length - 1];  // larger than everything
    // otherwise compare the two neighbors of the insertion point
    return (element - list[ins - 1] <= list[ins] - element)
            ? list[ins - 1] : list[ins];
}
This is O(n log n) because of the sort; the cost is amortized if you are going to perform many look-ups.
EDIT: for that, you'd have to move the sorting out of this method.
Using OCaml's Set:
module S = Set.Make(struct type t = int let compare = compare end)
let nearest xs y =
let a, yisin, b = S.split y xs in
if yisin then y
else
let amax, bmin = S.max_elt a, S.min_elt b in
if abs (amax - y) < abs (bmin - y) then amax else bmin
Incidentally, you may appreciate my nth-nearest neighbor sample from OCaml for Scientists and The F#.NET Journal article Traversing networks: nth-nearest neighbors.
