Is there an efficient algorithm that could do this?

I have two lists of integers of equal length, each with no duplicates, and I need to map them to each other based on the absolute value of their differences, so that no two pairs in the output could be switched to make the total of all pair differences smaller. The 'naive' approach I could think of would be this (in condensed C#, but I think it's pretty easy to follow):
var output = new Dictionary<int, int>();
List<int> list1, list2; //the two input lists, assumed to be filled elsewhere
while(list1.Count > 0) //While we haven't arranged all the pairs
{
int bestDistance = Int32.MaxValue; //best distance between numbers so far
int bestFirst = 0, bestSecond = 0; //best numbers so far
foreach(int i in list1)
{
foreach(int j in list2)
{
int distance = Math.Abs(i - j);
//if the distance is better than the best so far, make it the new best
if(distance < bestDistance)
{
bestDistance = distance;
bestFirst = i;
bestSecond = j;
}
}
}
output[bestFirst] = bestSecond; //add the best to dictionary
list1.Remove(bestFirst); //remove it from the lists
list2.Remove(bestSecond);
}
Essentially, it just finds the best pair, removes it, and then repeats until it's done. But this runs in cubic time, if I see it correctly, and would take incredibly long for large lists. Is there any faster way to do this?

This is less trivial than my initial hunch suggested. The key to keeping this O(N log(N)) is to work with sorted lists and to search the second sorted list for the "pivot" element, the one with the smallest difference to the first element of the first sorted list.
The steps to take become:
1. Sort both input lists.
2. Find the pivot element in the second sorted list.
3. Return this pivot element together with the first element of the first sorted list.
4. Keep track of the indexes of the elements immediately to the left and to the right of the pivot.
5. Iterate over the rest of the first list in sorted order, returning either the left or the right element, whichever has the smaller difference, and adjusting the left and right indexes accordingly.
As in (C# example):
public static IEnumerable<KeyValuePair<int, int>> FindSmallestDistances(List<int> first, List<int> second)
{
Debug.Assert(first.Count == second.Count); // precondition.
// sort the input: O(N log(N)).
first.Sort();
second.Sort();
// determine pivot: O(N).
var min_first = first[0];
var smallest_abs_dif = Math.Abs(second[0] - min_first);
var pivot_ndx = 0;
for (int i = 1; i < second.Count; i++)
{
var abs_dif = Math.Abs(second[i] - min_first);
if (abs_dif < smallest_abs_dif)
{
smallest_abs_dif = abs_dif;
pivot_ndx = i;
}
}
// return the first one.
yield return new KeyValuePair<int, int>(min_first, second[pivot_ndx]);
// Iterate the rest: O(N)
var left = pivot_ndx - 1;
var right = pivot_ndx + 1;
for (var i = 1; i < first.Count; i++)
{
if (left >= 0)
{
if (right < first.Count && Math.Abs(first[i] - second[left]) > Math.Abs(first[i] - second[right]))
yield return new KeyValuePair<int, int>(first[i], second[right++]);
else
yield return new KeyValuePair<int, int>(first[i], second[left--]);
}
else
yield return new KeyValuePair<int, int>(first[i], second[right++]);
}
}
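For readers who want to try the steps above outside of C#, here is a minimal JavaScript sketch of the same pivot-and-two-index idea. The function name `findSmallestDistances` is reused from the answer for illustration only; unlike the C# version, this sketch copies its inputs instead of sorting them in place, and it returns an array of pairs rather than a lazy sequence.

```javascript
// Sketch of the pivot approach: sort both lists, pair the smallest element
// of the first list with its closest match in the second list, then walk
// outwards from that pivot with a left and a right index.
function findSmallestDistances(first, second) {
  if (first.length !== second.length) {
    throw new Error("lists must have equal length");
  }
  if (first.length === 0) return [];
  const a = [...first].sort((x, y) => x - y);
  const b = [...second].sort((x, y) => x - y);

  // Find the pivot: the element of b closest to the smallest element of a.
  let pivot = 0;
  for (let i = 1; i < b.length; i++) {
    if (Math.abs(b[i] - a[0]) < Math.abs(b[pivot] - a[0])) pivot = i;
  }

  const pairs = [[a[0], b[pivot]]];
  let left = pivot - 1;
  let right = pivot + 1;
  for (let i = 1; i < a.length; i++) {
    // Take the right neighbour if nothing is left on the left side, or if
    // the right neighbour exists and is strictly closer than the left one.
    const takeRight =
      left < 0 ||
      (right < b.length && Math.abs(a[i] - b[left]) > Math.abs(a[i] - b[right]));
    pairs.push([a[i], takeRight ? b[right++] : b[left--]]);
  }
  return pairs;
}

console.log(findSmallestDistances([9, 1, 5], [6, 10, 2]));
console.log(findSmallestDistances([5, 6, 7], [1, 5, 9]));
```

Note that the second call pairs 5 with 5, 6 with 9 and 7 with 1, for a total difference of 9, while pairing the two sorted lists index by index (5 with 1, 6 with 5, 7 with 9) totals 7. If the goal is the smallest total, it is worth comparing the output against that simpler pairing.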

Trie Autocomplete with word weight(frequency)

I was asked this during a recent phone interview -
Given a Dictionary with a word and the weight of a word(frequency, higher is better), like so -
var words = new Dictionary<string,int>();
words.Add("am",7);
words.Add("ant", 5);
words.Add("amazon", 10);
words.Add("amazing", 8);
words.Add("an", 4);
words.Add("as", 11);
words.Add("be", 8);
words.Add("bee", 2);
words.Add("bed", 4);
words.Add("best", 12);
words.Add("amuck", 1);
words.Add("amock", 2);
words.Add("bestest", 1);
Design an API method that, given a prefix and a number k, returns the top k words that match the prefix.
The words should be sorted by weight, the higher the better.
So, prefix = "am", k = 5, returns amazon, amazing, am, amock, amuck - in that specific order.
Performance of the prefix lookup is paramount; you can pre-process and use as much space as you like, as long as the prefix lookup is fast.
This calls for a Trie implementation, but my question is how best to handle the word weight and optimise the lookup. In my mind the options are -
a. For each node in the Trie, also store a sorted list of words (SortedDictionary<int,List<string>>) that start with this prefix - more space, but faster lookup.
b. For each node, store the Child nodes in some kind of sorted list, so you would still need to do a DFS for each child node to get the K words needed - less space compared to a., but slower.
I decided to go with option a.
public class TrieWithSuggestions
{
TrieWithSuggestions _trieRoot;
public TrieWithSuggestions()
{
}
public char Character { get; set; }
public int WordCount { get; set; } = 1;
public TrieWithSuggestions[] ChildNodes { get; set; } = new TrieWithSuggestions[26];
//Stores all words with this prefix.
public SortedDictionary<int, HashSet<string>> PrefixWordsDictionary = new SortedDictionary<int, HashSet<string>>();
public TrieWithSuggestions ConstructTrie(Dictionary<string, int> words)
{
if (words.Count > 0)
{
_trieRoot = new TrieWithSuggestions() { Character = default(char) };
foreach (var word in words)
{
var node = _trieRoot;
for (int i = 0; i < word.Key.Length; i++)
{
var c = word.Key[i];
if (node.ChildNodes[c - 'a'] != null)
{
node = node.ChildNodes[c - 'a'];
UpdateParentNodeInformation(node, word.Key, words[word.Key]);
node.WordCount++;
}
else
{
InsertIntoTrie(node, word.Key, i, words);
break;
}
}
}
}
return _trieRoot;
}
public List<string> GetMathchingWords(string prefix, int k)
{
if (_trieRoot != null)
{
var node = _trieRoot;
foreach (var ch in prefix)
{
if (node.ChildNodes[ch - 'a'] != null)
{
node = node.ChildNodes[ch - 'a'];
}
else
return null;
}
if (node != null)
return GetWords(node, k);
else
return null;
}
return null;
}
List<string> GetWords(TrieWithSuggestions node, int k)
{
List<string> output = new List<string>();
foreach (var dictEntry in node.PrefixWordsDictionary)
{
var entries = dictEntry.Value;
var take = Math.Min(entries.Count, k);
output.AddRange(entries.Take(take));
k -= take; // subtract only what was actually taken, so k never goes negative
if (k <= 0)
break;
}
return output;
}
void InsertIntoTrie(TrieWithSuggestions parentNode, string word, int startIndex, Dictionary<string, int> words)
{
for (int i = startIndex; i < word.Length; i++)
{
var c = word[i];
var childNode = new TrieWithSuggestions() { Character = c };
parentNode.ChildNodes[c - 'a'] = childNode;
UpdateParentNodeInformation(parentNode, word, words[word]);
parentNode = childNode;
if (i == word.Length - 1)
UpdateParentNodeInformation(parentNode, word, words[word]);
}
}
void UpdateParentNodeInformation(TrieWithSuggestions parentNode, string word, int wordWeight)
{
wordWeight *= -1;
if (parentNode.PrefixWordsDictionary.ContainsKey(wordWeight))
{
if (!parentNode.PrefixWordsDictionary[wordWeight].Contains(word))
parentNode.PrefixWordsDictionary[wordWeight].Add(word);
}
else
parentNode.PrefixWordsDictionary.Add(wordWeight, new HashSet<string>() { word });
}
}
Construct Trie - Runtime O(N * M * log N), Space O(N * M * N), where N is the number of words and M is the average word length.
Justification -
Without the SortedDictionary, construction would be O(N * M); each insertion into a SortedDictionary is O(log N), so the worst-case runtime must be O(N * M * log N).
Space seems trickier, but as before, without the SortedDictionary the space would be O(N * M), and in the worst case the dictionary at a node could hold all N words, so the space complexity looks like O(N * M * N).
GetMatchingWords - RunTime O(len(prefix) + k)
Function call -
var trie = new TrieWithSuggestions();
trie.ConstructTrie(words);
var list = trie.GetMathchingWords("am", 10); //amazon, amazing, am, amock, amuck
QUESTION:
Given the conditions on space and pre-processing, is there a better way to do this?
EDIT 1 -
a. Given this setup, it is best to sort the words by weight and then insert them into the Trie. In this case a simple List<string> would suffice, since higher-frequency words would automatically have been inserted first.
b. Now let's say that, in addition to being initialized with a Dictionary<string,int>, we are also going to receive additional (word, frequency) pairs. We would still want the lookup to be as fast as possible. Given this requirement, what is now the best data structure to store the sorted list of words within a TrieNode? Is a SortedDictionary<int,HashSet<string>> the best option?
You could first sort the input by weight. Then you could use Lists instead of SortedDictionaries on the nodes of the trie. Since the words arrive in increasing (or decreasing) order of weight, checking the last element of the list is enough to decide where to put the new word. This gets rid of the O(log N) time taken by the SortedDictionary.
The input can be sorted in O(N * log N) with a comparison sort, or in O(N + W) with a counting sort, where W is the maximum weight.
The time complexity of setting up the trie becomes O(N * log N + N * M). This is better than O(N * M * log N). Query time does not change.
(The last paragraph assumes HashSet operations execute in O(1), as in the question. It is wrong to make this assumption for arbitrary inputs and hash functions.)
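The sort-first idea can be sketched in a few lines of JavaScript. This is only an illustration under assumptions: the names `buildTrie` and `topK` are made up here, children are kept in a `Map` rather than a 26-slot array, and every node simply stores the words below it in the order they were inserted, which is descending weight because the input is sorted first.

```javascript
// Sketch: a trie whose nodes each keep a list of the words sharing that
// prefix. Words are inserted in descending-weight order, so every list is
// already sorted and a lookup is just "walk the prefix, take k items".
function buildTrie(words) {
  // words is a plain object mapping word -> weight (hypothetical input shape).
  const root = { children: new Map(), words: [] };
  const byWeightDesc = Object.entries(words).sort((a, b) => b[1] - a[1]);
  for (const [word] of byWeightDesc) {
    let node = root;
    for (const ch of word) {
      if (!node.children.has(ch)) {
        node.children.set(ch, { children: new Map(), words: [] });
      }
      node = node.children.get(ch);
      node.words.push(word); // appended in weight order, no sorting needed
    }
  }
  return root;
}

function topK(root, prefix, k) {
  let node = root;
  for (const ch of prefix) {
    node = node.children.get(ch);
    if (!node) return []; // no word starts with this prefix
  }
  return node.words.slice(0, k);
}

const trie = buildTrie({
  am: 7, ant: 5, amazon: 10, amazing: 8, an: 4, as: 11, be: 8,
  bee: 2, bed: 4, best: 12, amuck: 1, amock: 2, bestest: 1,
});
console.log(topK(trie, "am", 5));
```

The lookup cost is the prefix walk plus copying k entries, matching the O(len(prefix) + k) figure from the question; the price is that inserting a new word later means finding its place in each list along its path.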

Last remaining number

I was asked this question in an interview.
Given an array 'arr' of positive integers and a starting index 'k' in the array, delete the element at k and jump arr[k] steps through the array in circular fashion. Do this repeatedly until only one element remains. Find the last remaining element.
I thought of an O(n log n) solution using an ordered map. Is an O(n) solution possible?
My guess is that there is not an O(n) solution to this problem based on the fact that it seems to involve doing something that is impossible. The obvious thing you would need to solve this problem in linear time is a data structure like an array that exposes two operations on an ordered collection of values:
O(1) order-preserving deletes from the data structure.
O(1) lookups of the nth undeleted item in the data structure.
However, such a data structure has been formally proven to not exist; see "Optimal Algorithms for List Indexing and Subset Rank" and its citations. It is not a proof to say that if the natural way to solve some problem involves using a data structure that is impossible, the problem itself is probably impossible, but such an intuition is often correct.
Anyway, there are lots of ways to do this in O(n log n). Below is an implementation that maintains a tree of undeleted ranges in the array. GetIndex() returns an index into the original array, given a zero-based index into the array as it would look after the deleted items were removed. The tree is not self-balancing, so operations are O(n) in the worst case, but in the average case Delete and GetIndex are O(log n).
using System;
namespace CircleGame
{
class Program
{
class ArrayDeletes
{
private class UndeletedRange
{
private int _size;
private int _index;
private UndeletedRange _left;
private UndeletedRange _right;
public UndeletedRange(int i, int sz)
{
_index = i;
_size = sz;
}
public bool IsLeaf()
{
return _left == null && _right == null;
}
public int Size()
{
return _size;
}
public void Delete(int i)
{
if (i >= _size)
throw new IndexOutOfRangeException();
if (! IsLeaf())
{
int left_range = _left._size;
if (i < left_range)
_left.Delete(i);
else
_right.Delete(i - left_range);
_size--;
return;
}
if (i == _size - 1)
{
_size--; // Can delete the last item in a range by decrementing its size
return;
}
if (i == 0) // Can delete the first item in a range by incrementing the index
{
_index++;
_size--;
return;
}
_left = new UndeletedRange(_index, i);
int right_index = i + 1;
_right = new UndeletedRange(_index + right_index, _size - right_index);
_size--;
_index = -1; // the index field of a non-leaf is no longer necessarily valid.
}
public int GetIndex(int i)
{
if (i >= _size)
throw new IndexOutOfRangeException();
if (IsLeaf())
return _index + i;
int left_range = _left._size;
if (i < left_range)
return _left.GetIndex(i);
else
return _right.GetIndex(i - left_range);
}
}
private UndeletedRange _root;
public ArrayDeletes(int n)
{
_root = new UndeletedRange(0, n);
}
public void Delete(int i)
{
_root.Delete(i);
}
public int GetIndex(int indexRelativeToDeletes )
{
return _root.GetIndex(indexRelativeToDeletes);
}
public int Size()
{
return _root.Size();
}
}
static int CircleGame( int[] array, int k )
{
var ary_deletes = new ArrayDeletes(array.Length);
while (ary_deletes.Size() > 1)
{
int next_step = array[ary_deletes.GetIndex(k)];
ary_deletes.Delete(k);
k = (k + next_step - 1) % ary_deletes.Size();
}
return array[ary_deletes.GetIndex(0)];
}
static void Main(string[] args)
{
var array = new int[] { 5,4,3,2,1 };
int last_remaining = CircleGame(array, 2); // third element, this call is zero-based...
}
}
}
Also note that if the values in the array are known to be bounded such that they are always less than some m less than n, there are lots of O(nm) algorithms -- for example, just using a circular linked list.
I couldn't think of an O(n) solution. However, we could have O(n log n) average time by using a treap or an augmented BST with a value in each node for the size of its subtree. The treap enables us to find and remove the kth entry in O(log n) average time.
For example, A = [1, 2, 3, 4] and k = 3 (as Sumit reminded me in the comments, use the array indexes as values in the tree since those are ordered):
      2(0.9)
     /      \
1(0.81)    4(0.82)
           /
      3(0.76)
Find and remove the 3rd element. Start at 2 with size = 2 (including the left subtree). Go right. The left subtree there is size 1, which together makes 3, so we found the 3rd element. Remove:
      2(0.9)
     /      \
1(0.81)    4(0.82)
Now we're starting on the third element in an array with n - 1 = 3 elements and looking for the 3rd element from there. We'll use zero-indexing to correlate with our modular arithmetic, so the third element in modulus 3 would be 2 and 2 + 3 = 5 mod 3 = 2, the second element. We find it immediately since the root with its left subtree is size 2. Remove:
    4(0.82)
    /
1(0.81)
Now we're starting on the second element in modulus 2, so 1, and we're adding 2. 3 mod 2 is 1. Removing the first element we are left with 4 as the last element.
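As an alternative to a treap, the same "find and remove the k-th remaining element" operation can be sketched with a Fenwick (binary indexed) tree over present/deleted flags, which gives O(log n) per operation in the worst case. This is a swapped-in technique, not the structure used in either answer; the function name `lastRemaining` is made up here, and the index update reuses the zero-based formula from the C# answer, `k = (k + step - 1) % size`.

```javascript
// Sketch: Fenwick tree over 0/1 flags (1 = element still present).
function lastRemaining(arr, k) {
  const n = arr.length;
  const tree = new Array(n + 1).fill(0);
  // Build in O(n): every position starts as present.
  for (let i = 1; i <= n; i++) {
    tree[i] += 1;
    const parent = i + (i & -i);
    if (parent <= n) tree[parent] += tree[i];
  }
  let highBit = 1;
  while (highBit * 2 <= n) highBit *= 2;

  // Mark the element at original 0-based index pos as deleted.
  function remove(pos) {
    for (let i = pos + 1; i <= n; i += i & -i) tree[i] -= 1;
  }

  // Original 0-based index of the element with 0-based rank among survivors.
  function select(rank) {
    let idx = 0;
    let remaining = rank + 1;
    for (let step = highBit; step > 0; step >>= 1) {
      const next = idx + step;
      if (next <= n && tree[next] < remaining) {
        idx = next;
        remaining -= tree[next];
      }
    }
    return idx;
  }

  let size = n;
  while (size > 1) {
    const original = select(k);
    const jump = arr[original];
    remove(original);
    size--;
    k = (k + jump - 1) % size; // same update as in the C# answer
  }
  return arr[select(0)];
}

console.log(lastRemaining([1, 2, 3, 4], 2)); // start on the third element
console.log(lastRemaining([5, 4, 3, 2, 1], 2));
```

The first call replays the worked example above (the array [1, 2, 3, 4], starting on the third element) and the second uses the input from the C# answer's Main.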

Find count of all points in a 3d space that are strictly less than any of the points in that space?

We are given n points in a 3D space, and we need to find the count of all points that are strictly less than at least one other point in that space,
i.e.
x1<x2 and y1<y2 and z1<z2
so (x1,y1,z1) would be one such point.
For example, given the points
1 4 2
4 3 2
2 5 3
(1,4,2)<(2,5,3)
So the answer for the above case should be the count of such points i.e. 1.
I know this can be solved with an O(n^2) algorithm, but I need something faster. I tried sorting along one dimension and then searching only over the greater part of the key, but it's still O(n^2) in the worst case.
What is an efficient way to do this?
There is a way to optimize your search that may be faster than O(n^2) - I would welcome counter-sample input.
Keep three lists of the indexes of the points, sorted by x, y and z respectively. Make a fourth list associating each point with its place in each of the lists (indexes in the code below; e.g., indexes[0] = [5,124,789] would mean the first point is 5th in the x-sorted list, 124th in the y-sorted list, and 789th in the z-sorted list).
Now iterate over the points - pick the list where the point is highest and test the point against the higher-indexed points in that list, exiting early if the point is strictly less than one of them. If a point is low on all three lists, the likelihood of finding a strictly higher point is greater. Otherwise, a higher place in one of the lists means fewer iterations.
JavaScript code:
function strictlyLessThan(p1,p2){
return p1[0] < p2[0] && p1[1] < p2[1] && p1[2] < p2[2];
}
// iterations
var it = 0;
function f(ps){
var res = 0,
indexes = new Array(ps.length);
// sort by x
var sortedX =
ps.map(function(x,i){ return i; })
.sort(function(a,b){ return ps[a][0] - ps[b][0]; });
// record index of point in x-sorted list
for (var i=0; i<sortedX.length; i++){
indexes[sortedX[i]] = [i,null,null];
}
// sort by y
var sortedY =
ps.map(function(x,i){ return i; })
.sort(function(a,b){ return ps[a][1] - ps[b][1]; });
// record index of point in y-sorted list
for (var i=0; i<sortedY.length; i++){
indexes[sortedY[i]][1] = i;
}
// sort by z
var sortedZ =
ps.map(function(x,i){ return i; })
.sort(function(a,b){ return ps[a][2] - ps[b][2]; });
// record index of point in z-sorted list
for (var i=0; i<sortedZ.length; i++){
indexes[sortedZ[i]][2] = i;
}
// check for possible greater points only in the list
// where the point is highest
for (var i=0; i<ps.length; i++){
var listToCheck,
startIndex;
if (indexes[i][0] > indexes[i][1]){
if (indexes[i][0] > indexes[i][2]){
listToCheck = sortedX;
startIndex = indexes[i][0];
} else {
listToCheck = sortedZ;
startIndex = indexes[i][2];
}
} else {
if (indexes[i][1] > indexes[i][2]){
listToCheck = sortedY;
startIndex = indexes[i][1];
} else {
listToCheck = sortedZ;
startIndex = indexes[i][2];
}
}
var j = startIndex + 1;
while (listToCheck[j] !== undefined){
it++;
var point = ps[listToCheck[j]];
if (strictlyLessThan(ps[i],point)){
res++;
break;
}
j++;
}
}
return res;
}
// var input = [[5,0,0],[4,1,0],[3,2,0],[2,3,0],[1,4,0],[0,5,0],[4,0,1],[3,1,1], [2,2,1],[1,3,1],[0,4,1],[3,0,2],[2,1,2],[1,2,2],[0,3,2],[2,0,3], [1,1,3],[0,2,3],[1,0,4],[0,1,4],[0,0,5]];
var input = new Array(10000);
for (var i=0; i<input.length; i++){
input[i] = [Math.random(),Math.random(),Math.random()];
}
console.log(input.length + ' points');
console.log('result: ' + f(input));
console.log(it + ' iterations not including sorts');
I doubt that the worst-case complexity can be reduced below N×N, because it is possible to create input where no point is strictly less than any other point:
For any value n, consider the plane that intersects the X, Y and Z axes at (n,0,0), (0,n,0) and (0,0,n), described by the equation x+y+z=n. If the input consists of points on such a plane, none of the points is strictly less than any other point.
Example of worst-case input:
(5,0,0) (4,1,0) (3,2,0) (2,3,0) (1,4,0) (0,5,0)
(4,0,1) (3,1,1) (2,2,1) (1,3,1) (0,4,1)
(3,0,2) (2,1,2) (1,2,2) (0,3,2)
(2,0,3) (1,1,3) (0,2,3)
(1,0,4) (0,1,4)
(0,0,5)
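The claim about this input is easy to check with a brute-force count. The following JavaScript sketch is only a reference check, not an efficient algorithm: `countStrictlyLessBruteForce` and `planePoints` are names made up for this illustration, and points are represented as [x, y, z] arrays.

```javascript
// True if a is strictly less than b in all three coordinates.
function strictlyLess(a, b) {
  return a[0] < b[0] && a[1] < b[1] && a[2] < b[2];
}

// O(n^2) reference count of points that are strictly less than some other point.
function countStrictlyLessBruteForce(points) {
  let count = 0;
  for (const p of points) {
    if (points.some((q) => strictlyLess(p, q))) count++;
  }
  return count;
}

// All points with non-negative integer coordinates on the plane x + y + z = n.
function planePoints(n) {
  const points = [];
  for (let z = 0; z <= n; z++) {
    for (let x = n - z; x >= 0; x--) {
      points.push([x, n - z - x, z]);
    }
  }
  return points;
}

console.log(planePoints(5).length);                           // number of points on the plane
console.log(countStrictlyLessBruteForce(planePoints(5)));     // points dominated by another
console.log(countStrictlyLessBruteForce([[1, 4, 2], [4, 3, 2], [2, 5, 3]]));
```

For n = 5 this generates the 21 points listed above and counts none of them as strictly less than another, since all coordinate sums are equal; the three-point example from the question gives a count of 1.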
However, the average complexity can be reduced to much less than N×N, e.g. with this approach:
1. Take the first point from the input and put it in a list.
2. Take the second point from the input and compare it to the first point in the list. If it is strictly less, discard the new point. If it is strictly greater, replace the point in the list with the new point. If it is neither, add the point to the list.
3. For each new point from the input, compare it to each point in the list. If it is strictly less than any point in the list, discard the new point. If it is strictly greater, replace the point in the list with the new point, and also discard any further points in the list which are strictly less than the new point. If the new point is neither strictly less nor strictly greater than any point in the list, add the new point to the list.
4. After checking every point in the input, the result is the number of points in the input minus the number of points in the list.
Since the probability that for any two random points a and b either a<b or b<a is 25%, the list won't grow to be very large (unless the input is specifically crafted to contain few or no points that are strictly less than any other point).
Limited testing with the code below (100 cases) with 1,000,000 randomly distributed points in a cubic space shows that the average list size is around 116 (with a maximum of 160), and the number of checks whether a point is strictly less than another point is around 1,333,000 (with a maximum of 2,150,000).
(And a few tests with 10,000,000 points show that the average number of checks is around 11,000,000 with a list size around 150.)
So in practice, the average complexity is close to N rather than N×N.
function xyzLessCount(input) {
var list = [input[0]]; // put first point in list
for (var i = 1; i < input.length; i++) { // check every point in input
var append = true;
for (var j = 0; j < list.length; j++) { // against every point in list
if (xyzLess(input[i], list[j])) { // new point < list point
append = false;
break; // continue with next point
}
if (xyzLess(list[j], input[i])) { // new point > list point
list[j] = input[i]; // replace list point
for (var k = list.length - 1; k > j; k--) {
if (xyzLess(list[k], list[j])) { // check rest of list
list.splice(k, 1); // remove list point
}
}
append = false;
break; // continue with next point
}
}
if (append) list.push(input[i]); // append new point to list
}
return input.length - list.length;
function xyzLess(a, b) {
return a.x < b.x && a.y < b.y && a.z < b.z;
}
}
var points = []; // random test data
for (var i = 0; i < 1000000; i++) {
points.push({x: Math.random(), y: Math.random(), z: Math.random()});
}
document.write("1000000 → " + xyzLessCount(points));

Logic to randomly reorder remaining tile positions using a tile map array

The title explains most of the question.
I have a tile grid which is represented by a 2D array. Some tiles are marked as empty (but they exist in the array, for certain continued uses) while others are in normal state.
What I need to do is reorder the remaining (non-empty) tiles in the grid so that all (or most) of them end up in a different non-empty position. If I just iterate over all the non-empty positions and swap each tile with another random one, I might already be reordering many of them automatically (the swapped ones).
So I was wondering if there's some technique I can follow to reorder the grid satisfactorily with minimal looping. Any hints?
public void RandomizeGrid<T>(T[,] grid, Func<T,bool> isEmpty)
{
// Create a list of the indices of all non-empty cells.
var indices = new List<Point>();
int width = grid.GetLength(0);
int height = grid.GetLength(1);
for (int y = 0; y < height; y++)
{
for (int x = 0; x < width; x++)
{
if (!isEmpty(grid[x,y])) // function to check emptiness
{
indices.Add(new Point(x,y));
}
}
}
// Randomize the cells using the index-array as displacement.
int n = indices.Count;
var rnd = new Random();
for (int i = 0; i < n; i++)
{
int j = rnd.Next(i,n); // Random index i <= j < n
if (i != j)
{
// Swap the two cells
var p1 = indices[i];
var p2 = indices[j];
var tmp = grid[p1.X,p1.Y];
grid[p1.X,p1.Y] = grid[p2.X,p2.Y];
grid[p2.X,p2.Y] = tmp;
}
}
}
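The C# code above is a Fisher-Yates shuffle over the non-empty cells, which can leave some tiles where they were. If every tile must end up in a different position, one known variation is Sattolo's algorithm, which picks the swap partner strictly below the current index and always produces a single cycle. The JavaScript sketch below shows it on a flat array standing in for the list of non-empty cells; the name `sattoloShuffle` and the optional `random` parameter are assumptions for illustration.

```javascript
// Sattolo's algorithm: like Fisher-Yates, but the partner index j is always
// different from i, so the result is one big cycle and no item stays put
// (for arrays of length 2 or more).
function sattoloShuffle(items, random = Math.random) {
  for (let i = items.length - 1; i > 0; i--) {
    const j = Math.floor(random() * i); // 0 <= j < i, never i itself
    [items[i], items[j]] = [items[j], items[i]];
  }
  return items;
}

const tiles = ["A", "B", "C", "D", "E"];
console.log(sattoloShuffle([...tiles]));
```

Applied to the grid, `items` would be the contents of the cells collected in `indices`, written back to the same cells after shuffling.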
Would it meet your needs ("satisfactorily" is a bit vague) to ensure that every non-empty tile is swapped with one other non-empty tile exactly once?
Say you have a list:
(1,4,7,3,8,10)
We can write down the indices of the list
(0,1,2,3,4,5)
and perform N random swaps on the indices to shuffle them; maybe some numbers move, some don't.
(5,1,3,2,4,0)
Then take these pairwise, as (5,1), (3,2), (4,0), as a sequence of swaps to perform on our original list.
(8,10,3,7,1,4)
If you have an odd number of elements, the leftover is swapped with any other element in the list.
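The pairwise-swap step can be sketched as follows in JavaScript. The function name `swapInPairs` is made up for this illustration, the shuffled index list is passed in so the example is reproducible, and for an odd count the leftover index is swapped with the first index in the list, which is just one possible choice of "any other element".

```javascript
// Sketch: read a shuffled list of indices two at a time and swap the items
// at each pair of positions. Returns a new array; the input is not modified.
function swapInPairs(items, shuffledIndices) {
  const result = items.slice();
  const swap = (a, b) => {
    [result[a], result[b]] = [result[b], result[a]];
  };
  let i = 0;
  for (; i + 1 < shuffledIndices.length; i += 2) {
    swap(shuffledIndices[i], shuffledIndices[i + 1]);
  }
  // Odd count: swap the leftover with another position (here, the first one).
  if (i < shuffledIndices.length && shuffledIndices.length > 1) {
    swap(shuffledIndices[i], shuffledIndices[0]);
  }
  return result;
}

console.log(swapInPairs([1, 4, 7, 3, 8, 10], [5, 1, 3, 2, 4, 0]));
```

With the indices from the example this reproduces the list (8,10,3,7,1,4).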

Remove duplicate items with minimal auxiliary memory?

What is the most efficient way to remove duplicate items from an array under the constraint that auxiliary memory usage must be kept to a minimum, preferably small enough to not even require any heap allocations? Sorting seems like the obvious choice, but this is clearly not asymptotically efficient. Is there a better algorithm that can be done in place or close to in place? If sorting is the best choice, what kind of sort would be best for something like this?
I'll answer my own question since, after posting, I came up with a really clever algorithm to do this. It uses hashing, building something like a hash set in place. It's guaranteed to be O(1) in auxiliary space (the recursion is a tail call), and is typically O(N) in time. The algorithm is as follows:
1. Take the first element of the array; this will be the sentinel.
2. Reorder the rest of the array, as much as possible, such that each element is in the position corresponding to its hash. As this step is completed, duplicates will be discovered. Set them equal to the sentinel.
3. Move all elements for which the index is equal to the hash to the beginning of the array.
4. Move all elements that are equal to the sentinel, except the first element of the array, to the end of the array.
5. What's left between the properly hashed elements and the duplicate elements will be the elements that couldn't be placed in the index corresponding to their hash because of a collision. Recurse to deal with these elements.
This can be shown to be O(N) provided there is no pathological scenario in the hashing:
Even if there are no duplicates, approximately 2/3 of the elements will be eliminated at each recursion. Each level of recursion is O(n), where small n is the number of elements left. The only problem is that, in practice, it's slower than a quick sort when there are few duplicates, i.e. lots of collisions. However, when there are huge amounts of duplicates, it's amazingly fast.
Edit: In current implementations of D, hash_t is 32 bits. Everything about this algorithm assumes that there will be very few, if any, hash collisions in full 32-bit space. Collisions may, however, occur frequently in the modulus space. However, this assumption will in all likelihood be true for any reasonably sized data set. If the key is less than or equal to 32 bits, it can be its own hash, meaning that a collision in full 32-bit space is impossible. If it is larger, you simply can't fit enough of them into 32-bit memory address space for it to be a problem. I assume hash_t will be increased to 64 bits in 64-bit implementations of D, where datasets can be larger. Furthermore, if this ever did prove to be a problem, one could change the hash function at each level of recursion.
Here's an implementation in the D programming language:
import std.algorithm : swap;
void uniqueInPlace(T)(ref T[] dataIn) {
uniqueInPlaceImpl(dataIn, 0);
}
void uniqueInPlaceImpl(T)(ref T[] dataIn, size_t start) {
if(dataIn.length - start < 2)
return;
invariant T sentinel = dataIn[start];
T[] data = dataIn[start + 1..$];
static hash_t getHash(T elem) {
static if(is(T == uint) || is(T == int)) {
return cast(hash_t) elem;
} else static if(__traits(compiles, elem.toHash)) {
return elem.toHash;
} else {
static auto ti = typeid(typeof(elem));
return ti.getHash(&elem);
}
}
for(size_t index = 0; index < data.length;) {
if(data[index] == sentinel) {
index++;
continue;
}
auto hash = getHash(data[index]) % data.length;
if(index == hash) {
index++;
continue;
}
if(data[index] == data[hash]) {
data[index] = sentinel;
index++;
continue;
}
if(data[hash] == sentinel) {
swap(data[hash], data[index]);
index++;
continue;
}
auto hashHash = getHash(data[hash]) % data.length;
if(hashHash != hash) {
swap(data[index], data[hash]);
if(hash < index)
index++;
} else {
index++;
}
}
size_t swapPos = 0;
foreach(i; 0..data.length) {
if(data[i] != sentinel && i == getHash(data[i]) % data.length) {
swap(data[i], data[swapPos++]);
}
}
size_t sentinelPos = data.length;
for(size_t i = swapPos; i < sentinelPos;) {
if(data[i] == sentinel) {
swap(data[i], data[--sentinelPos]);
} else {
i++;
}
}
dataIn = dataIn[0..sentinelPos + start + 1];
uniqueInPlaceImpl(dataIn, start + swapPos + 1);
}
To keep auxiliary memory usage to a minimum, your best bet would be to do an efficient sort to get the elements in order, then do a single pass over the array with a FROM and a TO index.
You advance the FROM index every time through the loop. You only copy the element from FROM to TO (and increment TO) when the key is different from the last.
With Quicksort, that'll average O(n log n) for the sort and O(n) for the final pass.
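The sort-then-compact pass can be sketched like this in JavaScript. It is a minimal illustration that assumes an array of numbers and uses the built-in sort rather than a hand-written Quicksort; `dedupeSorted` is a name made up here, and the variables `from` and `to` correspond to the FROM and TO indexes described above.

```javascript
// Sketch: sort in place, then compact duplicates with a read index (from)
// and a write index (to). No second array is allocated.
function dedupeSorted(arr) {
  arr.sort((a, b) => a - b); // numeric sort; assumes the array holds numbers
  if (arr.length === 0) return arr;
  let to = 1; // arr[0] is always kept
  for (let from = 1; from < arr.length; from++) {
    // Copy only when the value differs from the last one kept.
    if (arr[from] !== arr[to - 1]) {
      arr[to++] = arr[from];
    }
  }
  arr.length = to; // drop the leftover tail
  return arr;
}

console.log(dedupeSorted([5, 2, 3, 5, 1, 2, 5, 4, 3, 5, 4, 8, 6, 4, 1]));
```

The result is sorted, so this only fits when the original order of the elements does not need to be preserved.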
If you sort the array, you will still need another pass to remove duplicates, so the complexity is O(N*N) in the worst case (assuming Quicksort), or O(N*sqrt(N)) using Shellsort.
You can achieve O(N*N) by simply scanning the array for each element, removing duplicates as you go.
Here is an example in Lua:
function removedups (t)
local result = {}
local count = 0
local found
for i,v in ipairs(t) do
found = false
if count > 0 then
for j = 1,count do
if v == result[j] then found = true; break end
end
end
if not found then
count = count + 1
result[count] = v
end
end
return result, count
end
I don't see any way to do this without something like a bubblesort. When you find a dupe, you need to reduce the length of the array. Quicksort is not designed for the size of the array to change.
This algorithm is always O(n^2), but it also uses almost no extra memory -- stack or heap.
// returns the new size
int bubblesqueeze(int* a, int size) {
for (int j = 0; j < size - 1; ++j) {
for (int i = j + 1; i < size; ++i) {
// when a dupe is found, move the end value to index j
// and shrink the size of the array
while (i < size && a[i] == a[j]) {
a[i] = a[--size];
}
if (i < size && a[i] < a[j]) {
int tmp = a[j];
a[j] = a[i];
a[i] = tmp;
}
}
}
return size;
}
If you have two different variables for traversing a dataset instead of just one, then you can limit the output by dismissing all duplicates that are already in the dataset.
Obviously this example in C is not an efficient sorting algorithm, but it is just an example of one way to look at the problem.
You could also blindly sort the data first and then relocate the data to remove the dups, but I'm not sure that would be faster.
#include <stdio.h>
#include <stdlib.h>
#define ARRAY_LENGTH 15
int stop = 1;
int scan_sort[ARRAY_LENGTH] = {5,2,3,5,1,2,5,4,3,5,4,8,6,4,1};
void step_relocate(int tmp,int s,int *dataset)
{
for(;tmp<s;s--)
dataset[s] = dataset[s-1];
}
int exists(int var,int *dataset)
{
int tmp=0;
for(;tmp < stop; tmp++)
{
if( dataset[tmp] == var)
return 1;/* value exists */
if( dataset[tmp] > var)
tmp=stop;/* Value not in array*/
}
return 0;/* Value not in array*/
}
int main(void)
{
int tmp1=0;
int tmp2=0;
int index = 1;
while(index < ARRAY_LENGTH)
{
if(exists(scan_sort[index],scan_sort))
;/* Dismiss all values currently in the final dataset */
else if(scan_sort[stop-1] < scan_sort[index])
{
scan_sort[stop] = scan_sort[index];/* Insert the value as the highest one */
stop++;/* One more value added to the final dataset */
}
else
{
for(tmp1=0;tmp1<stop;tmp1++)/* find where the data shall be inserted */
{
if(scan_sort[index] < scan_sort[tmp1])
break;/* tmp1 is the insertion position */
}
tmp2 = scan_sort[index]; /* Store in case this value is the next after stop*/
step_relocate(tmp1,stop,scan_sort);/* Relocated data already in the dataset*/
scan_sort[tmp1] = tmp2;/* insert the new value */
stop++;/* One more value added to the final dataset */
}
index++;
}
printf("Result: ");
for(tmp1 = 0; tmp1 < stop; tmp1++)
printf( "%d ",scan_sort[tmp1]);
printf("\n");
system( "pause" );
}
I liked the problem so I wrote a simple C test prog for it as you can see above. Make a comment if I should elaborate or you see any faults.
