How to order a list according to an arbitrary order - algorithm

I searched for a relevant question but couldn't find one. So my question is: how do I sort an array based on an arbitrary order? For example, let's say the ordering is:
order_of_elements = ['cc', 'zz', '4b', '13']
and my list to be sorted:
list_to_be_sorted = ['4b', '4b', 'zz', 'cc', '13', 'cc', 'zz']
so the result needs to be:
ordered_list = ['cc', 'cc', 'zz', 'zz', '4b', '4b', '13']
Please note that the reference list (order_of_elements) describes the ordering; I am not asking about sorting according to the alphabetically sorted indices of the reference list.
You can assume that the order_of_elements array includes all the possible elements.
Any pseudocode is welcome.

A simple and Pythonic way to accomplish this would be to compute an index lookup table for the order_of_elements array, and use the indices as the sorting key:
order_index_table = { item: idx for idx, item in enumerate(order_of_elements) }
ordered_list = sorted(list_to_be_sorted, key=lambda x: order_index_table[x])
The table reduces order lookup to O(1) (amortized) and thus does not change the time complexity of the sort.
(Of course it does assume that all elements in list_to_be_sorted are present in order_of_elements; if this is not necessarily the case then you would need a default return value in the key lambda.)
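If some elements might be missing from order_of_elements, one possible variant is to use dict.get with a default rank. This is only a sketch: it assumes that unknown elements should be placed after all known ones, and the names default_rank and 'xx' are made up for the illustration.

```python
order_of_elements = ['cc', 'zz', '4b', '13']
# 'xx' does not appear in the reference order
list_to_be_sorted = ['4b', 'xx', 'zz', 'cc', '13', 'cc', 'zz']

order_index_table = {item: idx for idx, item in enumerate(order_of_elements)}

# Assumption: unknown elements rank after every known element.
default_rank = len(order_of_elements)
ordered_list = sorted(list_to_be_sorted,
                      key=lambda x: order_index_table.get(x, default_rank))
print(ordered_list)  # ['cc', 'cc', 'zz', 'zz', '4b', '13', 'xx']
```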

Since you have a limited number of possible elements, and if these elements are hashable, you can use a kind of counting sort.
Put all the elements of order_of_elements in a hashmap as keys, with counters as values. Traverse your list_to_be_sorted, incrementing the counter corresponding to the current element. To build ordered_list, go through order_of_elements and add each element as many times as its counter indicates.
hashmap hm;
for e in order_of_elements {
hm.add(e, 0);
}
for e in list_to_be_sorted {
hm[e]++;
}
list ordered_list;
for e in order_of_elements {
ordered_list.append(e, hm[e]); // Append hm[e] copies of element e
}
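The pseudocode above could look roughly like this in Python. This is a sketch; the function name counting_order is made up here, and collections.Counter plays the role of the hashmap of counters.

```python
from collections import Counter


def counting_order(items, order):
    """Return items rearranged to follow order, using per-element counts."""
    counts = Counter(items)  # element -> number of occurrences
    result = []
    for e in order:
        # Append counts[e] copies of e (Counter returns 0 for absent elements)
        result.extend([e] * counts[e])
    return result


order_of_elements = ['cc', 'zz', '4b', '13']
list_to_be_sorted = ['4b', '4b', 'zz', 'cc', '13', 'cc', 'zz']
print(counting_order(list_to_be_sorted, order_of_elements))
# ['cc', 'cc', 'zz', 'zz', '4b', '4b', '13']
```

This runs in O(n + m) for n items and m possible elements, at the cost of only working when equal elements are interchangeable.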

Approach:
1. Create an auxiliary array which holds, for each element of the main array, its index in 'order_of_elements'.
2. Sort the auxiliary array.
2.1 Rearrange the values in the main array while sorting the auxiliary array.
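One way to read the approach above, sketched in Python: pair each element with its auxiliary index, sort the pairs by that index, and read the elements back. The variable names aux and pairs are illustrative, and list.index makes building the auxiliary array O(n * m), so this is not the fastest option.

```python
order_of_elements = ['cc', 'zz', '4b', '13']
list_to_be_sorted = ['4b', '4b', 'zz', 'cc', '13', 'cc', 'zz']

# Step 1: auxiliary array of positions in order_of_elements
aux = [order_of_elements.index(x) for x in list_to_be_sorted]

# Steps 2 and 2.1: sort by the auxiliary index, carrying the main values along
pairs = sorted(zip(aux, list_to_be_sorted), key=lambda p: p[0])
ordered_list = [item for _, item in pairs]
print(ordered_list)  # ['cc', 'cc', 'zz', 'zz', '4b', '4b', '13']
```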

Related

How to augment a skip list such that we can extract max value of a specific segment of the skiplist efficiently? [Skiplist not sorted by value]

I have a problem I'm struggling with.
I have a skip list with elements:
element = (date, value)
The dates are the keys of the skip list, and hence the skip list is sorted by date.
How can I augment the skip list so that the function
Max(d1, d2) -> returns the largest value between dates d1 and d2
is as efficient as possible?
The values are integers.
The most efficient way is to iterate over each item from d1 to d2 and select the maximum item. Because the skip list is ordered by date, you cannot assume anything about the order of values: they might as well be randomly ordered. So you'll have to look at each one.
So it's O(log n) (on average: this is a skip list, after all) to find d1, and then it's O(range) to find the maximum element, where range is the number of items between d1 and d2, inclusive.
How you'd implement this is to add a function to the skip list that will allow you to iterate the list starting at an arbitrary element. You almost certainly already have a function that will iterate over the entire list in order, so all you have to do is create a function that will iterate over a range of keys (i.e. from a start key to an end key).
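A sketch of that range scan in Python. To keep it short, it does not implement a skip list: two parallel lists sorted by date stand in for it, and bisect_left stands in for the O(log n) search for d1. The function name range_max and the integer dates are assumptions made for the example.

```python
from bisect import bisect_left


def range_max(dates, values, d1, d2):
    """Largest value whose date lies in [d1, d2], or None if the range is empty.

    dates must be sorted; values[i] belongs to dates[i].
    """
    i = bisect_left(dates, d1)  # first position with date >= d1
    best = None
    # Walk forward through the range, as a skip list would along its bottom level
    while i < len(dates) and dates[i] <= d2:
        if best is None or values[i] > best:
            best = values[i]
        i += 1
    return best


dates = [1, 3, 5, 8, 10]
values = [7, 2, 9, 4, 6]
print(range_max(dates, values, 3, 8))  # 9
```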

Given 2 arrays, returns elements that are not included in both arrays

I had an interview, and did one of the questions described below:
Given two arrays, please calculate the result: get the union and then remove the intersection from the union. e.g.
int a[] = {1, 3, 4, 5, 7};
int b[] = {5, 3, 8, 10}; // it was not mentioned whether an array can contain duplicate values
result = {1,4,7,8,10}
This is my idea:
Sort a, b.
Check each item of b using binary search in a. If it is not found, skip it. Otherwise, remove this item from both a and b.
result = elements left in a + elements left in b
I know it is a lousy algorithm, but nonetheless it's better than nothing. Is there a better approach than this one?
There are many approaches to this problem. One approach is:
1. Construct a hash map from the distinct elements of array a, with the elements as keys and 1 as the value.
2. For every element e in array b:
     if e is in the hash map
       set the value of that key to 0
     else
       add e to the result array.
3. Add all keys from the hash map whose value is 1 to the result array.
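The three steps above might be written like this in Python. It is a sketch with a made-up function name; like the description, it assumes b has no repeated elements that are absent from a (such elements would be added more than once).

```python
def symmetric_difference_hash(a, b):
    # Step 1: every distinct element of a starts with the value 1
    seen = {x: 1 for x in a}
    result = []
    # Step 2: elements of b either cancel an element of a or go to the result
    for e in b:
        if e in seen:
            seen[e] = 0
        else:
            result.append(e)
    # Step 3: elements of a that were never matched
    result.extend(k for k, v in seen.items() if v == 1)
    return result


a = [1, 3, 4, 5, 7]
b = [5, 3, 8, 10]
print(sorted(symmetric_difference_hash(a, b)))  # [1, 4, 7, 8, 10]
```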
Another approach may be:
1. Join both lists.
2. Sort the joined list.
3. Walk through the joined list and completely remove any element that occurs multiple times.
This has one drawback: it does not work if the input lists already contain duplicates. But since we are talking about sets and set theory, I would also expect the inputs to be sets in the mathematical sense.
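A compact sketch of the join-and-sort idea in Python, using itertools.groupby to find runs of equal elements in the sorted list. The function name is made up, and the same caveat applies: inputs that already contain duplicates give wrong results.

```python
from itertools import groupby


def symmetric_difference_joined(a, b):
    joined = sorted(list(a) + list(b))
    # Keep only the elements whose run of equal values has length 1
    return [key for key, run in groupby(joined) if len(list(run)) == 1]


print(symmetric_difference_joined([1, 3, 4, 5, 7], [5, 3, 8, 10]))
# [1, 4, 7, 8, 10]
```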
Another (in my opinion the best) approach:
You do not need to search through both lists. You can just iterate through them sequentially:
1. Sort a and b.
2. Declare an empty result set.
3. Take iterators to both lists and repeat the following steps:
   - If the iterators' values are unequal: add the smaller number to the result set and advance the corresponding iterator.
   - If the iterators' values are equal: advance both iterators without adding anything to the result set.
   - If one iterator reaches the end: add all remaining elements of the other list to the result.
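The sequential walk described above, sketched in Python with two indices in place of iterators. The function name is illustrative, and each input is assumed to contain no duplicates.

```python
def symmetric_difference_sorted(a, b):
    a, b = sorted(a), sorted(b)
    i = j = 0
    result = []
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            result.append(a[i])  # only in a
            i += 1
        elif a[i] > b[j]:
            result.append(b[j])  # only in b
            j += 1
        else:
            i += 1               # in both: skip it on both sides
            j += 1
    # One side is exhausted: everything left on the other side is unmatched
    result.extend(a[i:])
    result.extend(b[j:])
    return result


print(symmetric_difference_sorted([1, 3, 4, 5, 7], [5, 3, 8, 10]))
# [1, 4, 7, 8, 10]
```

After the O(n log n) sorts, the merge itself is a single linear pass.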

O(1) find value from a key in a range

What kind of data structure would allow me to get the corresponding value for a given key from a set of ordered, range-like keys, where my key is not necessarily in the set?
Consider, [key, value]:
[3, 1]
[5, 2]
[10, 3]
Looking up 3 or 4 would return 1, 5 - 9 would return 2 and 10 would return 3. The ranges are not constant sized.
O(1) or like-O(1) is important, if possible.
A balanced binary search tree will give you O(log n).
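The same O(log n) bound can be sketched in Python without a tree: a binary search over a sorted list of range starts (a swapped-in technique, not a balanced BST, and only suitable when the key set rarely changes). The function name lookup and the choice to return None below the first key are assumptions.

```python
from bisect import bisect_right

keys = [3, 5, 10]   # sorted start of each range
values = [1, 2, 3]  # values[i] applies from keys[i] up to the next key


def lookup(k):
    i = bisect_right(keys, k) - 1  # index of the largest key <= k
    if i < 0:
        return None                # k lies below the first range
    return values[i]


print(lookup(4), lookup(9), lookup(10))  # 1 2 3
```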
What about a key-indexed array? Say you know your keys are below 1000; you can simply fill an int[1000] with values, like this:
[0,0]
[1,0]
[2,0]
[3,1]
[4,1]
[5,2]
......
and so on. That will give you O(1) performance, but with a huge memory overhead.
Otherwise, a hash table is the closest I know of. Hope it helps.
Edit: look up red-black trees; a red-black tree is a self-balancing tree with a worst case of O(log n) for search.
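A sketch of the key-indexed array in Python, assuming integer keys below a known limit. The helper name build_key_indexed_table is made up, and slots below the first key get 0, as in the listing above.

```python
def build_key_indexed_table(pairs, limit, default=0):
    """Expand (range_start, value) pairs into a table indexed by every key."""
    table = [default] * limit
    ordered = sorted(pairs)
    for n, (key, value) in enumerate(ordered):
        # The range runs up to the next range start, or to the end of the table
        stop = ordered[n + 1][0] if n + 1 < len(ordered) else limit
        for k in range(key, stop):
            table[k] = value
    return table


table = build_key_indexed_table([(3, 1), (5, 2), (10, 3)], limit=1000)
print(table[4], table[9], table[10])  # 1 2 3
```

Each lookup is a single index operation, and the memory cost is one slot per possible key.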
I would use a Dictionary in this scenario. Retrieving a value by its key is very fast, close to O(1).
Dictionary<int, int> myDictionary = new Dictionary<int, int>();
myDictionary.Add(3,1);
myDictionary.Add(5,2);
myDictionary.Add(10,3);
// If you know the key exists, you can use the indexer
int value = myDictionary[3];
// If you don't know whether the key is in the dictionary, use the TryGetValue method
int found;
if (myDictionary.TryGetValue(3, out found))
{
    // The key was found and the corresponding value is stored in "found"
}
For more info: http://msdn.microsoft.com/en-us/library/xfhwa508.aspx

Queue-like data structure with fast search and insertion

I need a datastructure with the following properties:
It contains integer numbers.
Duplicates aren't allowed (that is, it stores at most one of any integer).
After it reaches the maximal size the first element is removed.
So if the capacity is 3, then this is how it would look when putting in it sequential numbers:
{}, {1}, {1, 2}, {1, 2, 3}, {2, 3, 4}, {3, 4, 5} etc.
Only two operations are needed: inserting a number into this container (INSERT) and checking if the number is already in the container (EXISTS).
The number of EXISTS operations is expected to be approximately 2 * number of INSERT operations.
I need these operations to be as fast as possible.
What would be the fastest data structure or combination of data structures for this scenario?
Sounds like a hash table using a ring buffer for storage.
O(1) for both insert and lookup (and delete if you eventually need it).
Data structures:
Queue of Nodes containing the integers, implemented as a linked list (queue)
and
HashMap mapping integers to Queue's linked list nodes (hashmap)
Insert:
if (!hashmap.exists(newInt)) { // remove this condition to allow dupes
    if (queue.size >= MAX_SIZE) {
        // Remove oldest int from queue and hashmap
        hashmap.remove(queue.pop());
    }
    // Add new int to queue and hashmap
    Node n = new Node(newInt);
    queue.push(n);
    hashmap.add(newInt, n);
}
Lookup:
return hashmap.exists(lookupInt);
Note: With this implementation, you can also remove integers in O(1) since all you have to do is lookup the Node in the hashmap, and remove it from the linked list (by linking its neighbors together.)
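You want a ring buffer. The best way to do this is to define an array of the size you want and then maintain indexes for where it starts and ends.
#define CAPACITY 3
int buffer[CAPACITY] = {0, 0, 0};
int start = 0; /* index of the oldest item */
int end = 0;   /* index of the next free slot */
int count = 0; /* number of items currently stored */

void remove_oldest(void)
{
    start = (start + 1) % CAPACITY;
    count--;
}

void insert(int data)
{
    if (count == CAPACITY)
        remove_oldest(); /* buffer is full: drop the oldest item */
    buffer[end] = data;
    end = (end + 1) % CAPACITY;
    count++;
}

int exists(int data)
{
    /* linear scan from start, wrapping around the end of the array */
    for (int i = 0; i < count; i++)
        if (buffer[(start + i) % CAPACITY] == data)
            return 1;
    return 0;
}
start always points to the oldest item
end always points to the slot after the most recent item
the list may wrap over the end of the array
search: O(n) (linear scan)
insert: O(1)
delete: O(1)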
For true geek marks, though, build a Bloom filter: http://en.wikipedia.org/wiki/Bloom_filter
It is not guaranteed to be 100% accurate (it can report false positives, though never false negatives), but membership checks are very fast.
If you want to remove the lowest value, use a sorted list and if you have more elements than needed, remove the lowest one.
If you want to remove the oldest value, use a set and a queue. Both the set and the queue contain a copy of each value. If the value is in the set, no-op. If the value isn't in the set, append the value to the queue and add it to the set. If you've exceeded your size, pop the queue and remove that value from the set.
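The set-plus-queue combination could be sketched in Python as follows. The class name BoundedUniqueQueue is made up for the example; collections.deque serves as the queue and a built-in set gives the expected O(1) EXISTS.

```python
from collections import deque


class BoundedUniqueQueue:
    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()  # insertion order, oldest on the left
        self.members = set()  # same values, for fast membership tests

    def insert(self, value):
        if value in self.members:
            return            # already present: no-op
        self.queue.append(value)
        self.members.add(value)
        if len(self.queue) > self.capacity:
            oldest = self.queue.popleft()
            self.members.remove(oldest)

    def exists(self, value):
        return value in self.members


q = BoundedUniqueQueue(3)
for n in range(1, 6):
    q.insert(n)
print(list(q.queue))  # [3, 4, 5]
```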
If you need to move duplicated values to the back of the queue, you'll need to switch from a set to a hash table mapping values to stable iterators into the queue and be able to remove from the middle of the queue.
Alternatively, you could use a sorted list and a hash table. Instead of just putting your values into the sorted list, you could put in pairs (id, value) and then have the hash table map from value to (id, value). id would just be incremented after every insert. When you find a match in the hash table, you remove that (id, value) from the list and add a new (id, value) pair at the end of the list. Otherwise you just add to the end of the list and pop from the beginning if it's too long.

What's the best data structure for storing 2-tuples (a, b) which supports adding, deleting and comparing tuples (on either a or b)?

So here is my problem. I want to store 2-tuples (key, val) with the following properties and operations:
keys are strings and values are Integers
multiple keys can have same value
adding new tuples
updating any key with new value (any new value or updated value is greater than the previous one, like timestamps)
fetching all the keys with values less than or greater than given value
deleting tuples.
A hash seems to be the obvious choice for updating a key's value, but then lookups by value will take longer (O(n)). The other option is a balanced binary search tree with key and value switched. Then lookups by value will be fast (O(lg(n))), but updating a key will take O(n). So is there any data structure which can be used to address these issues?
Thanks.
I'd use 2 datastructures, a hash table from keys to values and a search tree ordered by values and then by keys. When inserting, insert the pair into both structures, when deleting by key, look up the value from the hash and then remove the pair from the tree. Updating is basically delete+insert. Insert, delete and update are O(log n). For fetching all the keys less than a value lookup the value in the search tree and iterate backwards. This is O(log n + k).
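A rough Python sketch of the two-structure idea. Python has no built-in balanced search tree, so a sorted list of (value, key) pairs maintained with bisect stands in for it. This is an assumption of the sketch: lookups stay O(log n), but inserts and deletes become O(n) because list elements have to shift. The class and method names are made up, and values are assumed to be integers as in the question.

```python
from bisect import bisect_left, insort


class TupleStore:
    def __init__(self):
        self.by_key = {}    # key -> value
        self.by_value = []  # sorted list of (value, key); stand-in for the tree

    def _remove_pair(self, key):
        pair = (self.by_key[key], key)
        del self.by_value[bisect_left(self.by_value, pair)]

    def put(self, key, value):
        """Add a new tuple, or update an existing key (delete + insert)."""
        if key in self.by_key:
            self._remove_pair(key)
        self.by_key[key] = value
        insort(self.by_value, (value, key))

    def delete(self, key):
        self._remove_pair(key)
        del self.by_key[key]

    def keys_less_than(self, value):
        # (value,) sorts before every (value, key) pair
        i = bisect_left(self.by_value, (value,))
        return [k for _, k in self.by_value[:i]]

    def keys_greater_than(self, value):
        # Integer values assumed: the first pair with value >= value + 1
        i = bisect_left(self.by_value, (value + 1,))
        return [k for _, k in self.by_value[i:]]


store = TupleStore()
for k, v in [('a', 1), ('b', 2), ('c', 3), ('d', 3)]:
    store.put(k, v)
print(store.keys_less_than(3))     # ['a', 'b']
print(store.keys_greater_than(1))  # ['b', 'c', 'd']
```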
The choices for good hash table and search tree implementations depend a lot on your particular distribution of data and operations. That said, a good general purpose implementation of both should be sufficient.
For a binary search tree, insert is an O(log n) operation on average and O(n) in the worst case. The same holds for lookup. So I believe this should be your choice.
Dictionary or Map types tend to be based on one of two structures.
Balanced tree (guarantee O(log n) lookup).
Hash based (best case is O(1), but a poor hash function for the data could result in O(n) lookups).
Any book on algorithms should cover both in lots of detail.
To provide operations both on keys and values, there are also multi-index based collections (with all the extra complexity) which maintain multiple structures (much like an RDBMS table can have multiple indexes). Unless you have a lot of lookups over a large collection the extra overhead might be a higher cost than a few linear lookups.
You can create a custom data structure which holds two dictionaries, i.e. a hash table from keys->values and another hash table from values->lists of keys.
class Foo:
    def __init__(self):
        self.keys = {}    # (KEY=key, VALUE=value)
        self.values = {}  # (KEY=value, VALUE=list of keys)

    def add_tuple(self, kd, vd):
        self.keys[kd] = vd
        if vd in self.values:
            self.values[vd].append(kd)
        else:
            self.values[vd] = [kd]

f = Foo()
f.add_tuple('a', 1)
f.add_tuple('b', 2)
f.add_tuple('c', 3)
f.add_tuple('d', 3)
print(f.keys)
print(f.values)
print(f.keys['a'])
print(f.values[3])
print([f.values[v] for v in f.values.keys() if v > 1])
OUTPUT (Python 3.7+, where dicts keep insertion order):
{'a': 1, 'b': 2, 'c': 3, 'd': 3}
{1: ['a'], 2: ['b'], 3: ['c', 'd']}
1
['c', 'd']
[['b'], ['c', 'd']]

Resources