I was thinking of using a red-black tree that does not support multiple insertions of the same key, with a comparison function similar to this one:
int compare(MyObject A, MyObject B)
{
if (A.error > B.error) return 1;
if (A.error < B.error) return -1;
if (A.name == B.name) return 0;
return 1;
}
This trick would be useful to have multiple items with the same error but a different "name". If two items with the same error are found but the names do not coincide, the item being compared is just treated as "bigger".
I am pretty sure this trick would work with a normal BST, but I am having trouble with a red-black tree. I do not know the red-black tree algorithm and I am using an existing implementation, so I wonder if there is any reason why this should not work.
P.S.: name has no ordering relation, so the only thing I can do is check whether two names are the same.
P.P.S.: assuming this does not work, and knowing that I cannot define an order relation between the "name" values, what other possibilities do I have? I could use a data structure that allows inserting multiple values with the same key, but that won't work for me, because when I delete a value I must be sure that I am deleting the value I am actually passing (for me the key and the value are basically the same thing; I need a sort of ordered multiset data structure!)
Binary search trees expect your comparison function to obey the usual rules for a total ordering over the elements you are going to insert into the tree. Your current comparison function doesn't obey this, because if you have objects A and B with the same error key but different value keys, then according to compare both A < B and B < A hold.
I think it should all work correctly if you change your comparison function to
int compare(MyObject A, MyObject B)
{
if (A.error > B.error) return 1;
if (A.error < B.error) return -1;
if (A.value > B.value) return 1;
if (A.value < B.value) return -1;
return 0;
}
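If the names really cannot be ordered, one common workaround (a sketch only, not the asker's actual types; the id field and the ByErrorThenId comparator are assumptions here) is to stamp each object with a unique sequence number at creation and use it as the final tiebreaker. That restores a strict total order, so a red-black tree such as std::set works, and you still get the "ordered multiset" behaviour asked for in the P.P.S.:
#include <cstdint>
#include <set>

struct MyObject {
    double        error;
    std::uint64_t id;   // hypothetical: a unique sequence number assigned once at creation
};

struct ByErrorThenId {
    bool operator()(const MyObject& a, const MyObject& b) const {
        if (a.error != b.error) return a.error < b.error;
        return a.id < b.id;   // unique ids break ties, so this is a strict total order
    }
};

// Behaves like an ordered multiset over error: equal errors coexist,
// and erasing a specific object removes exactly that object.
std::set<MyObject, ByErrorThenId> tree;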
You did not define an order relation.
In your case, your objects are two-dimensional. I understand from your question that the priority in ordering should be given to the error field. Thus, an order relation (using lexicographic order) should be:
struct my_object {
int error;
int value;
};
int compare(struct my_object *a, struct my_object *b)
{
    /* sort NULL pointers to one end */
    if (!a) {
        return 1;
    }
    else if (!b) {
        return -1;
    }
    /* compare explicitly rather than subtracting, so large values cannot overflow */
    if (a->error != b->error) {
        return (a->error > b->error) ? 1 : -1;
    }
    if (a->value != b->value) {
        return (a->value > b->value) ? 1 : -1;
    }
    return 0;
}
I have a working recursive solution to a DP problem. I wish to memoize it.
Currently it depends on two states: the index i and a boolean variable true or false.
Could someone please point out how I could memoize it? Specifically, how should I initialize the memoization table (dp)?
I am confused because if I initialize the second state with false, I wouldn't be able to differentiate between the false that is due to initialization, versus the one where it is actually the value of the state.
Could someone please provide some advice?
Thanks.
To clarify further: This is how I declare the dp table right now:
vector<vector<bool>> dp;
How do I initialize the inner vector<bool>? I don't think I can set it to either true or false since I wouldn't be able to distinguish later if that is the value generated while executing (solving the problem) or the initialization value.
Edit: Adding the code:
class Solution {
public:
unordered_map<int, int> m1, m2;
vector<int> n1, n2;
vector<vector<int>> v;
int helper(int i, bool parsingNums1) {
if((parsingNums1 && i>=n1.size()) || (!parsingNums1 && i>=n2.size())) return v[i][parsingNums1]=0;
if(v[i][parsingNums1]!=-1) return v[i][parsingNums1];
int ans=0;
if(parsingNums1) {
//we are traversing path 1
//see if we can switch to path 2
if(m2.find(n1[i])!=m2.end())
ans=n1[i] + helper(m2[n1[i]]+1, false);
ans=max(ans, n1[i] + helper(i+1, true));
}
if(!parsingNums1) {
//we are traversing path 2
//see if we can switch to path 1
if(m1.find(n2[i])!=m1.end())
ans=n2[i] + helper(m1[n2[i]]+1, true);
ans=max(ans, n2[i] + helper(i+1, false));
}
return v[i][parsingNums1]=ans;
}
int maxSum(vector<int>& nums1, vector<int>& nums2) {
for(int i=0; i<nums1.size(); i++)
m1[nums1[i]]=i;
for(int i=0; i<nums2.size(); i++)
m2[nums2[i]]=i;
n1=nums1;
n2=nums2;
v.resize((nums1.size()>nums2.size()?nums1.size()+1:nums2.size()+1), vector<int>(2,-1));
return max(helper(0, true), helper(0, false))%(int)(1e9+7);
}
};
I am solving this LeetCode question: https://leetcode.com/problems/get-the-maximum-score/
There are 2 easy methods for handling this.
Declare another vector<vector<bool>> is_stored, initialized to false, and when dp[i][j] is calculated, mark is_stored[i][j] as true. Then, when you are checking whether a particular state has been memoized, you can look into is_stored.
Use vector<vector<int>> instead of vector<vector<bool>> and initialize every state to -1 to mark it as not memoized.
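A minimal sketch of the second method in C++, with a hypothetical computeState standing in for the actual recurrence; only the memo handling matters here:
#include <vector>
using namespace std;

vector<vector<int>> dp;   // dp[i][j] == -1 means "not computed yet"

bool computeState(int i, bool flag);   // hypothetical recurrence

int solve(int i, bool flag) {
    int j = flag ? 1 : 0;
    if (dp[i][j] != -1) return dp[i][j];        // already memoized
    return dp[i][j] = computeState(i, flag);    // store 0/1 and return it
}

// before the first call: dp.assign(n, vector<int>(2, -1));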
Another way to store values is using
Map<String, Boolean> map = new HashMap<String, Boolean>(); // just a java version
Then you can create a key by concatenating i and j and store the respective boolean value under that key. For example:
String key = i + "," + j; // use a string literal so this concatenates instead of doing char arithmetic
// To validate if we calculated data before
if(map.containsKey(key)) return map.get(key);
// To store/memoize values
boolean result = someDPmethod(); map.put(key, result);
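The same pattern translates to C++ with an unordered_map; this is only a sketch of the key-building idea, with someDPmethod again standing in for whatever recurrence you are memoizing:
#include <string>
#include <unordered_map>
using namespace std;

unordered_map<string, bool> memo;

bool someDPmethod(int i, int j);   // hypothetical recurrence

bool solve(int i, int j) {
    string key = to_string(i) + "," + to_string(j);
    auto it = memo.find(key);
    if (it != memo.end()) return it->second;   // calculated before
    bool result = someDPmethod(i, j);
    memo[key] = result;                        // store/memoize
    return result;
}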
In C# you can use nullable value types.
A nullable value type T? represents all values of its underlying value type T and an additional null value. For example, you can assign any of the following three values to a bool? variable: true, false, or null. An underlying value type T cannot be a nullable value type itself.
You can use null for indication of unvisited or unprocessed dp states.
You can simulate this in C++ by initializing your dp memo to
vector<vector<bool*>> dp( m, vector<bool*>(n, nullptr) );
now you can use nullptr as an indicator for unprocessed dp states.
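If C++17 is available, std::optional<bool> expresses the same "third state" idea without raw pointers; a small sketch, with computeState as a stand-in for the actual recurrence and example dimensions:
#include <optional>
#include <vector>
using namespace std;

int m = 100, n = 2;   // dimensions of the state space (example values)

// nullopt plays the role of null: "this state has not been computed yet"
vector<vector<optional<bool>>> dp(m, vector<optional<bool>>(n, nullopt));

bool computeState(int i, int j);   // hypothetical recurrence

bool solve(int i, int j) {
    if (dp[i][j].has_value()) return *dp[i][j];   // already memoized
    return *(dp[i][j] = computeState(i, j));      // store and return
}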
It is an interview question. Given an array, e.g., [3,2,1,2,7], we want to make all elements in this array unique by incrementing duplicate elements, and we require the sum of the refined array to be minimal. For example, the answer for [3,2,1,2,7] is [3,2,1,4,7] and its sum is 17. Any ideas?
It's not quite as simple as my earlier comment suggested, but it's not terrifically complicated.
First, sort the input array. If it matters to be able to recover the original order of the elements then record the permutation used for the sort.
Second, scan the sorted array from left to right (ie from low to high). If an element is less than or equal to the element to its left, set it to be one greater than that element.
Pseudocode
sar = sort(input_array)
for index = 2:size(sar) ! I count from 1
if sar(index)<=sar(index-1) sar(index) = sar(index-1)+1
forend
Is the sum of the result minimal? I've convinced myself that it is through some head-scratching and trials, but I haven't got a formal proof.
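For what it's worth, here is a small C++ sketch of that sort-and-bump pass (it assumes the original element order does not need to be restored):
#include <algorithm>
#include <iostream>
#include <vector>
using namespace std;

int minUniqueSum(vector<int> a) {
    sort(a.begin(), a.end());
    for (size_t i = 1; i < a.size(); ++i)
        if (a[i] <= a[i - 1])
            a[i] = a[i - 1] + 1;   // bump a duplicate just past its left neighbour
    int sum = 0;
    for (int v : a) sum += v;
    return sum;
}

int main() {
    vector<int> v = {3, 2, 1, 2, 7};
    cout << minUniqueSum(v) << "\n";   // prints 17
}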
If you only need to find ONE of the best solutions, here's the algorithm with some explanations.
The idea of this problem is to find an optimal solution, which can be found only by testing all existing solutions (well, they're infinite, let's stick with the reasonable ones).
I wrote a program in C, because I'm familiar with it, but you can port it to any language you want.
The program does this: it tries to increment one value up to the maximum possible (I'll explain how to find it in the comments under the code sections); then, if the solution is not found, it decreases this value and goes on with the next one, and so on.
It's an exponential algorithm, so it will be very slow on large amounts of duplicated data (yet it assures you the best solution is found).
I tested this code with your example, and it worked; not sure if there's any bug left, but the code (in C) is this.
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
typedef int BOOL; //just to ease meanings of values
#define TRUE 1
#define FALSE 0
Just to ease comprehension, I did some typedefs. Don't worry.
typedef struct duplicate { //used to speed up the algorithm; it uses some more memory just to be safe
int value;
BOOL duplicate;
} duplicate_t;
int maxInArrayExcept(int *array, int arraySize, int index); //find the max value in array except the value at the index given
//the result is the max value in the array, not counting the index
int *findDuplicateSum(int *array, int arraySize);
BOOL findDuplicateSum_R(duplicate_t *array, int arraySize, int *tempSolution, int *solution, int *totalSum, int currentSum); //recursive function used to find the solution
BOOL check(int *array, int arraySize); //checks if there's any repeated value in the solution
These are all the functions we'll need, all split up for comprehension purposes.
First, we have a struct. This struct is used to avoid checking, at every iteration, whether the value at a given index was originally duplicated. We don't want to modify any value that was not duplicated originally.
Then we have a couple of functions: first, we need to handle the worst case scenario, where every value after the duplicated ones is already occupied; in that case we need to increment the duplicated value up to the maximum value reached + 1.
Then there are the main functions we'll discuss later.
The check function only checks whether there's any duplicated value in a temporary solution.
int main() { //testing purpose
int i;
int testArray[] = { 3,2,1,2,7 }; //test array
int nTestArraySize = 5; //test array size
int *solutionArray; //needed if you want to use the solution later
solutionArray = findDuplicateSum(testArray, nTestArraySize);
for (i = 0; i < nTestArraySize; ++i) {
printf("%d ", solutionArray[i]);
}
return 0;
}
This is the main function: I used it to test everything.
int * findDuplicateSum(int * array, int arraySize)
{
int *solution = malloc(sizeof(int) * arraySize);
int *tempSolution = malloc(sizeof(int) * arraySize);
duplicate_t *duplicate = calloc(arraySize, sizeof(duplicate_t));
int i, j, currentSum = 0, totalSum = INT_MAX;
for (i = 0; i < arraySize; ++i) {
tempSolution[i] = solution[i] = duplicate[i].value = array[i];
currentSum += array[i];
for (j = 0; j < i; ++j) { //to find ALL the best solutions, we should also put the first found value as true; it's just a line more
//yet, it saves the algorithm half of the duplicated numbers (best/this case scenario)
if (array[j] == duplicate[i].value) {
duplicate[i].duplicate = TRUE;
}
}
}
if (!findDuplicateSum_R(duplicate, arraySize, tempSolution, solution, &totalSum, currentSum)) {
    printf("No solution found\n");
}
free(tempSolution);
free(duplicate);
return solution;
}
This function does a lot of things: first, it sets up the solution array, then it initializes both the solution values and the duplicate array, which is the one used to check for duplicated values at startup. Then we compute the current sum and set the best total sum found so far to the maximum integer possible.
Then the recursive function is called; it tells us whether a solution has been found (which should always be the case), and then we return the solution as an array.
int findDuplicateSum_R(duplicate_t * array, int arraySize, int * tempSolution, int * solution, int * totalSum, int currentSum)
{
    int i;
    BOOL found = FALSE;
    if (check(tempSolution, arraySize)) {
        if (currentSum < *totalSum) { //optimal solution checking
            for (i = 0; i < arraySize; ++i) {
                solution[i] = tempSolution[i];
            }
            *totalSum = currentSum;
        }
        return TRUE; //a duplicate-free assignment was reached
    }
    for (i = 0; i < arraySize; ++i) {
        if (array[i].duplicate == TRUE) {
            //worst case scenario: stop incrementing once we go past the max value in the rest of the array
            if (tempSolution[i] <= maxInArrayExcept(solution, arraySize, i)) {
                tempSolution[i]++;
                if (findDuplicateSum_R(array, arraySize, tempSolution, solution, totalSum, currentSum + 1)) {
                    found = TRUE;
                }
                tempSolution[i]--; //backtracking
            }
        }
    }
    return found; //FALSE if no duplicate-free assignment was reachable from here
}
This is the recursive function. It first checks whether the temporary solution is valid and whether it is the best one found so far. If everything is correct, it updates the actual solution with the temporary values and updates the optimal sum.
Then, we iterate on every repeated value (the if excludes other indexes) and we progress in the recursion until (if unlucky) we reach the worst case scenario: the check condition not satisfied above the maximum value.
Then we have to backtrack and continue with the iteration, that will go on with other values.
PS: an optimization is possible here, if we move the optimal condition from the check into the for: if the solution is already not optimal, we can't expect to find a better one just adding things.
The hard part is over; here are the supporting functions:
int maxInArrayExcept(int *array, int arraySize, int index) {
int i, max = 0;
for (i = 0; i < arraySize; ++i) {
if (i != index) {
if (array[i] > max) {
max = array[i];
}
}
}
return max;
}
BOOL check(int *array, int arraySize) {
int i, j;
for (i = 0; i < arraySize; ++i) {
for (j = 0; j < i; ++j) {
if (array[i] == array[j]) return FALSE;
}
}
return TRUE;
}
I hope this was useful.
Write if anything is unclear.
Well, I got the same question in one of my interviews.
Not sure if you still need it. But here's how I did it. And it worked well.
num_list1 = [2,8,3,6,3,5,3,5,9,4]
def UniqueMinSumArray(num_list):
max=min(num_list)
for i,V in enumerate(num_list):
while (num_list.count(num_list[i])>1):
if (max > num_list[i]+1) :
num_list[i] = max + 1
else:
num_list[i]+=1
max = num_list[i]
i+=1
return num_list
print (sum(UniqueMinSumArray(num_list1)))
You can try with your list of numbers and I am sure it will give you the correct unique minimum sum.
I got the same interview question too. But my answer is in JS in case anyone is interested.
For sure it can be improved to get rid of the for loop.
function getMinimumUniqueSum(arr) {
// [1,1,2] => [1,2,3] = 6
// [1,2,2,3,3] = [1,2,3,4,5] = 15
if (arr.length > 1) {
var sortedArr = [...arr].sort((a, b) => a - b);
var current = sortedArr[0];
var res = [current];
for (var i = 1; i + 1 <= arr.length; i++) {
// check current equals to the rest array starting from index 1.
if (sortedArr[i] > current) {
res.push(sortedArr[i]);
current = sortedArr[i];
} else if (sortedArr[i] == current) {
current = sortedArr[i] + 1;
// sortedArr[i]++;
res.push(current);
} else {
current++;
res.push(current);
}
}
return res.reduce((a,b) => a + b, 0);
    } else {
        // an empty input sums to 0; a single element is already unique
        return arr.length === 1 ? arr[0] : 0;
    }
}
I have now built an AVL tree. Here is a function to find the kth smallest node in the AVL tree (k starts from 0).
Code:
int kthMin(int k)
{
int input=k+1;
int count=0;
return KthElement(root,count,input);
}
int KthElement( IAVLTreeNode * root, int count, int k)
{
if( root)
{
KthElement(root->getLeft(), count,k);
count ++;
if( count == k)
return root->getKey();
KthElement(root->getRight(),count,k);
}
return NULL;
}
It can find some of the right nodes, but some cases fail. Can anyone help me debug this?
Thanks
From the root, after recursing left, count will be 1, regardless of how many nodes are on the left.
You need to change count in the recursive calls, so change count to be passed by reference (assuming this is C++).
int KthElement( IAVLTreeNode * root, int &count, int k)
(I don't think any other code changes are required to get pass by reference to work here).
And beyond that you need to actually return the value generated in the recursive call, i.e. change:
KthElement(root->getLeft(), count, k);
to:
int val = KthElement(root->getLeft(), count, k);
if (val != 0)
return val;
And similarly for getRight.
Note I used 0, not NULL. NULL is typically used to refer to a null pointer, and it converts to a 0 int (the latter is preferred when using int).
This of course assumes that 0 isn't a valid node in your tree (otherwise your code won't work). If it is, you'll need to find another value to use, or a pointer to the node instead (in which case you can use NULL to indicate not found).
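Putting both fixes together, a corrected version might look like the sketch below (it keeps the question's IAVLTreeNode interface and the assumption that 0 is not a valid key):
int KthElement(IAVLTreeNode *root, int &count, int k)
{
    if (!root)
        return 0;                          // 0 means "not found" (assumes 0 is not a valid key)

    int val = KthElement(root->getLeft(), count, k);
    if (val != 0)
        return val;                        // already found in the left subtree

    ++count;
    if (count == k)
        return root->getKey();

    return KthElement(root->getRight(), count, k);
}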
Here is a simple algorithm for finding the kth smallest node in any tree in general:
count=0, found=false;
kthElement(Node p,int k) {
if(p==NULL)
return -1
else {
value = kthElement(p.left, k)
if(found)
return value
count++
if(count==k) {
found = true
return p.value
}
value = kthElement(p.right, k)
return value
}
}
Note:- Use of global variables is the key.
The built-in collision filter groups and masks are not enough to store the information needed by my collision filter callback.
I have to use a hash table to store my collision data, and btHashMap, which is also included in the library, seems the way to go because of the SIMD code optimizations.
Inspecting the source code, I've found that searching for an element in a btHashMap depends on the number of stored elements, so it is not really O(1) but O(n) in the worst case.
Value* find(const Key& key)
{
int index = findIndex(key);
if (index == BT_HASH_NULL)
{
return NULL;
}
return &m_valueArray[index];
}
int findIndex(const Key& key) const
{
unsigned int hash = key.getHash() & (m_valueArray.capacity()-1);
if (hash >= (unsigned int)m_hashTable.size())
{
return BT_HASH_NULL;
}
int index = m_hashTable[hash];
while ((index != BT_HASH_NULL) && key.equals(m_keyArray[index]) == false)
{
index = m_next[index];
}
return index;
}
Although the table will be storing no more than 10 elements, would it be a performance hit to use btHashMap in that way?
What is the most efficient way to remove duplicate items from an array under the constraint that auxiliary memory usage must be kept to a minimum, preferably small enough to not even require any heap allocations? Sorting seems like the obvious choice, but this is clearly not asymptotically efficient. Is there a better algorithm that can be done in place or close to in place? If sorting is the best choice, what kind of sort would be best for something like this?
I'll answer my own question since, after posting, I came up with a really clever algorithm to do this. It uses hashing, building something like a hash set in place. It's guaranteed to be O(1) in auxiliary space (the recursion is a tail call), and is typically O(N) in time complexity. The algorithm is as follows:
Take the first element of the array, this will be the sentinel.
Reorder the rest of the array, as much as possible, such that each element is in the position corresponding to its hash. As this step is completed, duplicates will be discovered. Set them equal to sentinel.
Move all elements for which the index is equal to the hash to the beginning of the array.
Move all elements that are equal to sentinel, except the first element of the array, to the end of the array.
What's left between the properly hashed elements and the duplicate elements will be the elements that couldn't be placed in the index corresponding to their hash because of a collision. Recurse to deal with these elements.
This can be shown to be O(N) provided there is no pathological scenario in the hashing:
Even if there are no duplicates, approximately 2/3 of the elements will be eliminated at each recursion. Each level of recursion is O(n), where n is the number of elements left. The only problem is that, in practice, it's slower than a quick sort when there are few duplicates, i.e. lots of collisions. However, when there are huge amounts of duplicates, it's amazingly fast.
Edit: In current implementations of D, hash_t is 32 bits. Everything about this algorithm assumes that there will be very few, if any, hash collisions in full 32-bit space. Collisions may, however, occur frequently in the modulus space. However, this assumption will in all likelihood be true for any reasonably sized data set. If the key is less than or equal to 32 bits, it can be its own hash, meaning that a collision in full 32-bit space is impossible. If it is larger, you simply can't fit enough of them into 32-bit memory address space for it to be a problem. I assume hash_t will be increased to 64 bits in 64-bit implementations of D, where datasets can be larger. Furthermore, if this ever did prove to be a problem, one could change the hash function at each level of recursion.
Here's an implementation in the D programming language:
void uniqueInPlace(T)(ref T[] dataIn) {
uniqueInPlaceImpl(dataIn, 0);
}
void uniqueInPlaceImpl(T)(ref T[] dataIn, size_t start) {
if(dataIn.length - start < 2)
return;
invariant T sentinel = dataIn[start];
T[] data = dataIn[start + 1..$];
static hash_t getHash(T elem) {
static if(is(T == uint) || is(T == int)) {
return cast(hash_t) elem;
} else static if(__traits(compiles, elem.toHash)) {
return elem.toHash;
} else {
static auto ti = typeid(typeof(elem));
return ti.getHash(&elem);
}
}
for(size_t index = 0; index < data.length;) {
if(data[index] == sentinel) {
index++;
continue;
}
auto hash = getHash(data[index]) % data.length;
if(index == hash) {
index++;
continue;
}
if(data[index] == data[hash]) {
data[index] = sentinel;
index++;
continue;
}
if(data[hash] == sentinel) {
swap(data[hash], data[index]);
index++;
continue;
}
auto hashHash = getHash(data[hash]) % data.length;
if(hashHash != hash) {
swap(data[index], data[hash]);
if(hash < index)
index++;
} else {
index++;
}
}
size_t swapPos = 0;
foreach(i; 0..data.length) {
if(data[i] != sentinel && i == getHash(data[i]) % data.length) {
swap(data[i], data[swapPos++]);
}
}
size_t sentinelPos = data.length;
for(size_t i = swapPos; i < sentinelPos;) {
if(data[i] == sentinel) {
swap(data[i], data[--sentinelPos]);
} else {
i++;
}
}
dataIn = dataIn[0..sentinelPos + start + 1];
uniqueInPlaceImpl(dataIn, start + swapPos + 1);
}
Keeping auxiliary memory usage to a minimum, your best bet would be to do an efficient sort to get them in order, then do a single pass of the array with a FROM and a TO index.
You advance the FROM index every time through the loop. You only copy the element from FROM to TO (and increment TO) when the key is different from the last.
With Quicksort, that'll average to O(n log n) for the sort and O(n) for the final pass.
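A sketch of that FROM/TO pass in C++ (essentially std::sort followed by a manual std::unique-style compaction):
#include <algorithm>
#include <cstddef>

// Sorts the array in place, then compacts it; returns the new length.
// Only O(log n) stack space for the sort, no heap allocations.
std::size_t sortAndDedup(int *a, std::size_t n) {
    if (n == 0) return 0;
    std::sort(a, a + n);
    std::size_t to = 1;                       // next write position
    for (std::size_t from = 1; from < n; ++from)
        if (a[from] != a[to - 1])             // copy only when the key differs from the last kept one
            a[to++] = a[from];
    return to;
}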
If you sort the array, you will still need another pass to remove duplicates, so the complexity is O(N*N) in the worst case (assuming Quicksort), or O(N*sqrt(N)) using Shellsort.
You can achieve O(N*N) by simply scanning the array for each element removing duplicates as you go.
Here is an example in Lua:
function removedups (t)
local result = {}
local count = 0
local found
for i,v in ipairs(t) do
found = false
if count > 0 then
for j = 1,count do
if v == result[j] then found = true; break end
end
end
if not found then
count = count + 1
result[count] = v
end
end
return result, count
end
I don't see any way to do this without something like a bubblesort. When you find a dupe, you need to reduce the length of the array. Quicksort is not designed for the size of the array to change.
This algorithm is always O(n^2), but it also uses almost no extra memory, either stack or heap.
// returns the new size
int bubblesqueeze(int* a, int size) {
for (int j = 0; j < size - 1; ++j) {
for (int i = j + 1; i < size; ++i) {
// when a dupe is found, move the end value to index j
// and shrink the size of the array
while (i < size && a[i] == a[j]) {
a[i] = a[--size];
}
if (i < size && a[i] < a[j]) {
int tmp = a[j];
a[j] = a[i];
a[i] = tmp;
}
}
}
return size;
}
If you have two different variables for traversing a dataset instead of just one, then you can limit the output by dismissing all duplicates that are already in the dataset.
Obviously this example in C is not an efficient sorting algorithm, but it is just an example of one way to look at the problem.
You could also blindly sort the data first and then relocate the data for removing dups, but I'm not sure that would be faster.
#include <stdio.h>
#include <stdlib.h>

#define ARRAY_LENGTH 15
int stop = 1;
int scan_sort[ARRAY_LENGTH] = {5,2,3,5,1,2,5,4,3,5,4,8,6,4,1};
void step_relocate(char tmp,char s,int *dataset)
{
for(;tmp<s;s--)
dataset[s] = dataset[s-1];
}
int exists(int var,int *dataset)
{
int tmp=0;
for(;tmp < stop; tmp++)
{
if( dataset[tmp] == var)
return 1;/* value exsist */
if( dataset[tmp] > var)
tmp=stop;/* Value not in array*/
}
return 0;/* Value not in array*/
}
int main(void)
{
int tmp1=0;
int tmp2=0;
int index = 1;
while(index < ARRAY_LENGTH)
{
if(exists(scan_sort[index],scan_sort))
;/* Dismiss all values currently in the final dataset */
else if(scan_sort[stop-1] < scan_sort[index])
{
scan_sort[stop] = scan_sort[index];/* Insert the value as the highest one */
stop++;/* One more value added to the final dataset */
}
else
{
for(tmp1=0;tmp1<stop;tmp1++)/* find where the data shall be inserted */
{
if(scan_sort[index] < scan_sort[tmp1])
{
break;
}
}
tmp2 = scan_sort[index]; /* Store in case this value is the next after stop*/
step_relocate(tmp1,stop,scan_sort);/* Relocated data already in the dataset*/
scan_sort[tmp1] = tmp2;/* insert the new value */
stop++;/* One more value added to the final dataset */
}
index++;
}
printf("Result: ");
for(tmp1 = 0; tmp1 < stop; tmp1++)
printf( "%d ",scan_sort[tmp1]);
printf("\n");
system( "pause" );
}
I liked the problem, so I wrote a simple C test program for it, as you can see above. Leave a comment if I should elaborate or if you see any faults.