How to improve recursive backtracking algorithm

I implemented a backtracking-based solution for the problem I described in my previous post: Packing items into fixed number of bins
(Bin is a simple wrapper around vector<int> with additional methods such as sum().)
bool backtrack(vector<int>& items, vector<Bin>& bins, unsigned index, unsigned bin_capacity)
{
if (bin_capacity - items.front() < 0) return false;
if (index < items.size())
{
//try to put an item into all opened bins
for(unsigned i = 0; i < bins.size(); ++i)
{
if (bins[i].sum() + items[index] + items.back() <= bin_capacity || bin_capacity - bins[i].sum() == items[index])
{
bins[i].add(items[index]);
return backtrack(items, bins, index + 1, bin_capacity);
}
}
//put an item without exceeding maximum number of bins
if (bins.size() < BINS)
{
Bin new_bin = Bin();
bins.push_back(new_bin);
bins.back().add(items[index]);
return backtrack(items, bins, index + 1, bin_capacity);
}
}
else
{
//check if solution has been found
if (bins.size() == BINS )
{
for (unsigned i = 0; i <bins.size(); ++i)
{
packed_items.push_back(bins[i]);
}
return true;
}
}
return false;
}
Although this algorithm runs quite fast, it is prone to stack overflow on large data sets.
I'm looking for ideas and suggestions on how to improve it.
Edit:
I decided to try an iterative approach with an explicit stack, but my solution doesn't work as expected: sometimes it gives incorrect results.
bool backtrack(vector<int>& items, vector<Bin>& bins, unsigned index, unsigned bin_capacity)
{
stack<Node> stack;
Node node, child_node;
Bin new_bin;
//init the stack
node.bins.add(new_bin);
node.bins.back().add(items[item_index]);
stack.push(node);
item_index++;
while(!stack.empty())
{
node = stack.top();
stack.pop();
if (item_index < items.size())
{
if (node.bins.size() < BINS)
{
child_node = node;
Bin empty;
child_node.bins.add(empty);
child_node.bins.back().add(items[item_index]);
stack.push(child_node);
}
int last_index = node.bins.size() - 1;
for (unsigned i = 0; i < node.bins.size(); i++)
{
if (node.bins[last_index - i]->get_sum() + items[item_index]+ items.back() <= bin_capacity ||
bin_capacity - node.bins[last_index - i]->get_sum() == items[item_index])
{
child_node = node;
child_node.bins[last_index - i]->push_back(items[item_index]);
stack.push(child_node);
}
}
item_index++;
}
else
{
if (node.bins() == BINS)
{
//copy solution
bins = node.bins;
return true;
}
}
}
return false;
}
Any suggestions are highly appreciated.

I think there's a dynamic programming algorithm for solving the multiple-bin packing problem, or at least, a polynomial approximation algorithm. Take a look here and here.
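Separately from the approximation route, note that the recursive function in the question returns the result of the first recursive call it makes, so a placement is never undone. Below is a minimal sketch of backtracking with an explicit undo step, written in Python rather than the question's C++. The name `pack` and the "at most `num_bins` bins" acceptance rule are assumptions made for illustration; the question's code requires exactly BINS bins. The recursion depth equals the number of items, so the stack only becomes a concern for very long item lists.

```python
def pack(items, num_bins, capacity):
    """Try to place every item into at most num_bins bins of the given capacity.

    Returns a list of bins (lists of items), or None if no packing exists.
    """
    items = sorted(items, reverse=True)  # large items first tends to prune earlier
    bins = []

    def place(index):
        if index == len(items):
            return True  # every item has been placed
        item = items[index]
        tried = set()  # bin loads already tried at this depth (skips symmetric cases)
        for b in bins:
            load = sum(b)
            if load + item <= capacity and load not in tried:
                tried.add(load)
                b.append(item)
                if place(index + 1):
                    return True
                b.pop()  # undo the choice and try the next bin
        if len(bins) < num_bins and item <= capacity:
            bins.append([item])  # open a new bin
            if place(index + 1):
                return True
            bins.pop()  # undo opening the bin
        return False

    return [list(b) for b in bins] if place(0) else None


print(pack([4, 3, 3, 2, 2, 2], 2, 8))
print(pack([5, 5, 5], 2, 8))
```

The important difference from the question's version is the `b.pop()` / `bins.pop()` after a failed recursive call: without it, the search commits to the first feasible bin and can miss valid packings.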

Related

A Scapegoat Tree That Just Won't Balance

So, I'm working on this project for Comp 272, Data Structures and Algorithms, and before anyone asks I have no one to help me. It's an online program through Athabasca University and for some unknown reason they didn't supply me with a tutor for this course, which is a first... So... Yeah. The question is as follows:
"(20 marks) Exercise 8.2. Illustrate what happens when the sequence 1, 5, 2, 4, 3 is added to an empty ScapegoatTree, and show where the credits described in the proof of Lemma 8.3 go, and how they are used during this sequence of additions."
This is my code; it's complete and it compiles:
/*
Name: Westcott.
Assignment: 2, Question 3.
Date: 08-26-2022.
"(20 marks) Exercise 8.2. Illustrate what happens when the sequence 1, 5, 2, 4, 3 is added to an empty
ScapegoatTree, and show where the credits described in the proof of Lemma 8.3 go, and how they are used
during this sequence of additions."
*/
#include <iostream>
using namespace std;
class Node { // Originally I did this with Node as a subclass of sgTree but I found that this
public: // way was easier. This is actually my second attempt, from scratch, at doing this
int data; // problem. First version developed so many bugs I couldn't keep up with them.
Node* left;
Node* right;
Node* parent;
Node() : data(0), parent(NULL), left(NULL), right(NULL) {};
Node(int x) : data(x), parent(NULL), left(NULL), right(NULL) {};
~Node() {}; // Normally I would do a little more work on clean up but... Yea this problem didn't leave me much room.
Node* binarySearch(Node* root, int x); // The Node class only holds binarySearch in addition to its
// constructors/destructor, and of course the Node*'s left, right and parent.
};
class sgTree { // The sgTree keeps track of the root, n (the number of nodes in the tree), and q which is
public: // as Pat put it a 'high water mark'.
Node* root;
int n;
int q;
sgTree() : root(new Node()), n(1), q(1) {}
sgTree(int x) : root(new Node(x)), n(0), q(0) {}
~sgTree() {
delete root;
}
bool add(int x); // The add function is compounded, within it are findDepth and rebuild.
bool removeX(int x); // removeX works, but it didn't have a big part to play in this question,
int findDepth(Node* addedNode); // but I'll include it to maintain our sorted set interface.
void printTree(Node* u, int space) { // This was extra function I wrote to help me problem solve.
cout << "BINARY TREE DISPLAY" << endl; // this version only prints a title and then it calls printTreeSub on line 46.
cout << "________________________________________________\n\n" << endl;
printTreeSub(u, space);
cout << "________________________________________________\n\n" << endl;
}
int printTreeSub(Node* u, int space); // Function definition for this is on line 81.
int storeInArray(Node* ptr, Node* arr[], int i);// this is our function for storing all the elements of a tree in an array.
int size(Node* u); // this is size, defined on line 74.
void rebuild(Node* u); // And rebuild and buildBalanced are the stars of the show, defined on lines 262 and 282
Node* buildBalanced(Node** a, int i, int ns); // just above the main() function.
};
int log32(int q) { // As you can see there's two versions of this function.
int c = 0; // this is supposed to return the log of n to base 3/2.
while (q != 0) { // The version below I got from this website:
q = q / 2; // https://www.geeksforgeeks.org/scapegoat-tree-set-1-introduction-insertion/
c++; // It works fine but I prefer the one I wrote.
} // this is a much simpler function. It just divides q until its zero
return c; // and increments c on each division. Its not exact but it is based on what Pat said
} // in this lecture: https://www.youtube.com/watch?v=OGNUoDPVRCc&t=4852s
/*
static int const log32(int n)
{
double const log23 = 2.4663034623764317;
return (int)ceil(log23 * log(n));
}
*/
int sgTree::size(Node* u) {
if (u == NULL) {
return 0;
}
return 1 + size(u->left) + size(u->right); // Recursion in size();
}
int sgTree::printTreeSub(Node* u, int space) { // Here is my strange print function
if (u == NULL) return space; // I say strange because I'm not even 100% sure
space--; // how I got it to work. The order itself I worked out, but I built it
space -= printTreeSub(u->left, space); // and, originally, got a half decent tree, but then I just kept playing
if (u->right == NULL && u->left == NULL) { // around with increments, decrements, and returned values
cout << "\n\n\n" << u->data << "\n\n\n" << endl; // of space until it just sort of came together.
return 1; // Basically it prints the left most Node first and then prints every node
} // beneath that using recursion. I realized that by setting the for loop
for (int i = space; i >= 0; i--) { // on line 89 I could imitate different nodes having different heights in
cout << " "; // the tree. I figured that using n as an input I could take advantage of
} // the recursion to get an accurate tree. That much I understand.
cout << " " << u->data << "'s children are: "; // But it didn't work out quite how I wanted it to so I just kept playing
if (u->left != NULL) { // with space increments and decrements on different sides of the tree until
cout << u->left->data; // I got something pretty good.
}
else {
cout << "NULL";
}
if (u->right != NULL) {
cout << " and " << u->right->data;
}
else {
cout << " NULL";
}
cout << "\n\n" << endl;
space--;
space -= printTreeSub(u->right, space);
return 1;
}
int sgTree::storeInArray(Node* ptr, Node* a[], int i) { // This function took me a while to figure out.
if (ptr == NULL) { // The recursive insertions of values using i, when
return i; // i is defined by the very same recursion, makes this
} // a bit of a challenge to get your head around.
i = storeInArray(ptr->left, a, i); // Basically it's just taking advantage of an in-order
a[i] = ptr; // traversal to get the values stored into the array
i++; // in order from least to greatest.
return storeInArray(ptr->right, a, i);
}
Node* Node::binarySearch(Node* root, int x) { // I covered this in another question.
if (root->data == x) {
return root;
}
else if (x < root->data) {
if (root->left == NULL) {
return root;
}
return binarySearch(root->left, x);
}
else if (x > root->data) {
if (root->right == NULL) {
return root;
}
return binarySearch(root->right, x);
}
}
bool sgTree::add(int x) { // The add function itself isn't too difficult.
Node* addedNode = new Node(x); // We make a Node using our data, then we search for that Node
Node* parent = root->binarySearch(root, x); // in the tree. I amended binarySearch to return the parent
addedNode->parent = parent; // if it hits a NULL child, on lines 127 and 133.
if (x < parent->data) { // That way the new Node can just go into the returned parents child
parent->left = addedNode; // here is where we choose whether it enters the left or the right.
}
else if (x > parent->data) {
parent->right = addedNode;
}
int h = findDepth(addedNode); // We run findDepth() on the addedNode. I realize that this probably should
// have been a part of the binarySearch, it means we go down
if (h > log32(q)) { // the tree twice instead of once. I did look at changing binarySearch into searchAndDepth
// having binarySearch return an int for the height isn't a problem, but then that would
// mess up removeX and, I don't know. What's more important?
Node* w = addedNode->parent; // If this were going to be a database hosting millions of pieces of data I would give
while (3 * size(w) < 2 * size(w->parent)) { // that alot more consideration but, this is just an exercise after all so...
w = w->parent; // From there, we compare our height to the value output by log32(q) on line 152.
}
rebuild(w); // This expression 3 * size(w) < 2 * size(w->parent) is the formula on page 178 rewritten
//rebuild(root); // as a cross multiplication, clever. It keeps going up the tree until we find the scapegoat w.
// This is a much nicer result.
//See line 311.
} // Now, this is where my problems began. Pat says that this line should read: rebuild(w->parent);
n++; // but when I do that I get an error when w is the root. Because then w->parent is NULL. And in that case
q++; // line 258 throws an error because we're trying to set p equal to NULL's parent. It's not there.
return true; // So my work around was to just offset this by one and send rebuild(w). But that doesn't seem
} // to balance the tree just right. In fact, the best tree results when we replace w with root.
// and just rebalance the whole tree. But in any case, we increment n and q and lets pick this up on line 256.
int sgTree::findDepth(Node* addedNode) {
int d = 0;
while (addedNode != root) {
addedNode = addedNode->parent;
d++;
}
return d;
}
bool sgTree::removeX(int x) {
Node* u = root->binarySearch(root, x);
if (u->left == NULL && u->right == NULL) {
if (u == u->parent->left) {
u->parent->left = NULL;
}
if (u == u->parent->right) {
u->parent->right = NULL;
}
cout << u->data << " deleted" << endl;
n--;
delete u;
return true;
}
if (u->left != NULL && u->right == NULL) {
if (u->parent->left == u) { // compare, don't assign
u->parent->left = u->left;
}
else if (u->parent->right == u) {
u->parent->right = u->left;
}
u->left->parent = u->parent; // keep the child's parent pointer in sync
cout << u->data << " deleted" << endl;
n--;
delete u;
return true;
}
if (u->left == NULL && u->right != NULL) {
if (u == u->parent->left) {
u->parent->left = u->right;
u->right->parent = u->parent;
}
else if (u == u->parent->right) {
u->parent->right = u->right;
u->right->parent = u->parent;
}
cout << u->data << " deleted" << endl;
n--;
delete u;
return true;
}
if (u->left != NULL && u->right != NULL) {
Node* X = u->right;
if (X->left == NULL) {
X->left = u->left;
if (u->parent != NULL) {
if (u->parent->right == u) {
u->parent->right = X;
}
else if (u->parent->left == u) {
u->parent->left = X;
}
}
else {
root = X;
}
X->parent = u->parent;
cout << u->data << " deleted" << endl;
n--;
delete u;
return true;
}
while (X->left != NULL) {
X = X->left;
}
X->parent->left = NULL;
X->left = u->left;
X->right = u->right;
if (u->parent != NULL) {
X->parent = u->parent;
}
cout << u->data << " deleted" << endl;
n--;
root = X;
delete u;
return true;
}
}
void sgTree::rebuild(Node* u) {
int ns = size(u); // Everything is pretty kosher here. Just get the number of nodes in the subtree.
Node* p = u->parent; // Originally I had n here instead of ns and... I don't want to talk about how long it took me to find that mistake...
/* It's funny because while writing the comments for this I'm like "Oh, hang on, if I just push the definition of p behind the if statement on line 262
and evaluate for whether or not u is NULL instead of p, that should solve all my problems! Yea, no, it doesn't. Because then for some reason it tries rebalancing
empty tree and... Yea I just have to stop myself from trying to fix this because everytime I do I get caught in an infinite loop of me chasing my tail in errors.
I think a solution could be found in buildBalanced, and I literally went through that function line by line, trying to comprehend a work around. I've included at
a photograph of that white board. Yea this is the code that Pat gave us... and its garbage. It doesn't work. Maybe its a C++ thing, I don't know... But I'm
getting frustrated again so I'm going to stop thinking about this part RIGHT HERE, and move on LOL*/
Node** a = new Node * [ns]; // a Node pointer-pointer array... again, another fine piece of code from the textbook. Sorry, trying to stay positive here.
storeInArray(u, a, 0); // See Line 112
if (p == NULL) { // Okay, once we have our array we use buildBalanced to rebuild the subtree with respect to which
root = buildBalanced(a, 0, ns); // child u is relative to its parent.
root->parent = NULL; // See line 281 for buildBalanced().
}
else if (p->right == u) {
p->right = buildBalanced(a, 0, ns);
p->right->parent = p;
}
else {
p->left = buildBalanced(a, 0, ns);
p->left->parent = p;
}
}
Node* sgTree::buildBalanced(Node** a, int i, int ns) { // This is without a doubt one of the hardest functions I've ever had
if (ns == 0) { // the displeasure of trying to understand... Trying to stay positive.
return NULL; // I've gone through it, in a line by line implementation of the array:
} // a[] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10} you can find that analysis in
int m = ns / 2; // the photo buildBalanced_Analysis.
a[i + m]->left = buildBalanced(a, i, m); // As confusing as it is, I have to admit that it is a beautiful function.
if (a[i + m]->left != NULL) { // It basically uses the two integers i and m to simultaneously
a[i + m]->left->parent = a[i + m]; // regulate the organization of the new tree and to specifically
} // grab the right value from the array when its needed.
a[i + m]->right = buildBalanced(a, i + m + 1, ns - m - 1); // but trying to map this out didn't help me to solve the issues I've been having.
if (a[i + m]->right != NULL) {
a[i + m]->right->parent = a[i + m];
}
return a[i + m];
}
int main() {
sgTree newTree(1);
int a[] = { 5, 2, 4, 3 };
for (int i = 0; i < (sizeof(a) / sizeof(a[0])); i++) {
newTree.add(a[i]);
}
newTree.printTree(newTree.root, newTree.n);
/*
This is a nice test, when paired with rebuild(root), that too me is the only thing that approaches redeeming this whole question.
sgTree newTreeB(1);
int b[] = { 2, 3, 4, 5, 6, 7, 8, 9, 10 };
for (int i = 0; i < (sizeof(b) / sizeof(b[0])); i++) {
newTreeB.add(b[i]);
}
newTreeB.printTree(newTreeB.root, newTreeB.n);
*/
}
Now the issue itself is not that hard to understand. My tree should look like this:
But instead, it looks like this, with 5 at the root and the values 1 and 4 as the leaves:
I'm confident that the problem lives somewhere around line 159 and in those first few calls to buildBalanced. The comments in the code itself elaborate more on the issue. I've spent days poring over this, trying everything I can think of to make it work and... Yeah... I just can't figure it out.
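One detail worth checking is the scapegoat search itself. The sketch below is in Python rather than the question's C++, and `find_scapegoat` and `insert_unbalanced` are hypothetical names. It assumes the textbook loop keeps climbing while 3*size(w) <= 2*size(w.parent) and then rebuilds w's parent, which matches the comment attributed to Pat about calling rebuild(w->parent). It also stops at the root instead of dereferencing a NULL parent, which is the crash described in the comments.

```python
class Node:
    def __init__(self, data, parent=None):
        self.data = data
        self.parent = parent
        self.left = None
        self.right = None


def size(u):
    """Number of nodes in the subtree rooted at u."""
    return 0 if u is None else 1 + size(u.left) + size(u.right)


def find_scapegoat(u):
    """Walk up from the newly added node u and return the node to rebuild.

    Returns None if the walk reaches the root without finding a scapegoat.
    """
    w = u.parent
    while w is not None and w.parent is not None:
        if 3 * size(w) > 2 * size(w.parent):
            return w.parent  # w is too heavy for its parent: rebuild the parent
        w = w.parent
    return None


def insert_unbalanced(root, x):
    """Plain BST insert with no rebalancing; returns the new node."""
    cur = root
    while True:
        if x < cur.data:
            if cur.left is None:
                cur.left = Node(x, cur)
                return cur.left
            cur = cur.left
        else:
            if cur.right is None:
                cur.right = Node(x, cur)
                return cur.right
            cur = cur.right


root = Node(1)
for value in (5, 2, 4):
    insert_unbalanced(root, value)
newest = insert_unbalanced(root, 3)
scapegoat = find_scapegoat(newest)
print(scapegoat.data if scapegoat else None)
```

Under these assumptions, for the chain produced by 1, 5, 2, 4, 3 the search reports node 5, so if a rebuild is triggered after adding 3 it would cover the subtree holding 2, 3, 4 and 5. With a strict < in the loop condition, as in the question's add(), the walk stops one level too early at node 4.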

Find K arrays that sum up to a given array with a certain accuracy

Let's say I have a set containing thousands of arrays (let's fix it at 5000) of a fixed size (size = 8) with non-negative values. I'm given another array of the same size with non-negative values (the Input Array). My task is to select a subset of the arrays such that their element-wise sum (vector summation) is close to the Input Array within a desired accuracy (+-m).
For example, if the desired result (input array) is (3, 2, 5) and the accuracy is 2,
then of course the best set would be one that sums to exactly (3, 2, 5), but any solution of the form (3 +- m, 2 +- m, 5 +- m) would also be acceptable.
The question is: what would be the right algorithmic approach here? It is similar to the multi-dimensional knapsack problem, but there is no cost optimization part in my task.
At least one solution that meets the constraints is required, but several would be better, so that there is a choice.

This is a kind of extended knapsack problem. It is NP-complete, which means we cannot brute-force it by trying all possibilities: with 5000 arrays there are far too many subsets to enumerate.
What we can do is use a heuristic. One simple and useful option is simulated annealing. The principle is quite simple: at the beginning of the algorithm, when the temperature is high, you are not afraid to accept a solution that is worse at the moment (which can actually lead to the best possible solution), so you accept almost anything. Then you start cooling, and the cooler you are, the more cautious you become: you try to improve your solution more and more and risk less and less.
The GIFs on the wiki are actually a nice example: https://en.wikipedia.org/wiki/Simulated_annealing
I have also implemented a solution that, at the end, prints the inputArray, your solution, and the "negative score" (the lower the better).
You are not guaranteed to get the best or even a valid solution, but you can run this in a loop until you find a solution that is good enough or you hit some threshold (for example, if you do not find a good solution after 100 runs, you report "data not valid" or take the best of the "not good" solutions).
class Simulation {
constructor(size, allArrSize, inputArrayRange, ordinarySize, maxDif, overDifPenalisation) {
this.size = size;
this.allArrSize = allArrSize;
this.inputArrayRange = inputArrayRange;
this.ordinarySize = ordinarySize;
this.maxDif = maxDif;
this.overDifPenalisation = overDifPenalisation;
this.allArr = [];
this.solutionMap = new Map();
for (let i = 0; i < allArrSize; i++) {
let subarr = [];
for (let j = 0; j < size; j++) {
subarr.push(Math.round(Math.random() * ordinarySize));
}
this.allArr.push(subarr);
}
this.temperature = 100;
this.inputArray = [];
for (let i = 0; i < size; i++) {
this.inputArray.push(Math.round(Math.random() * inputArrayRange));
}
}
findBest() {
while (this.temperature > 0) {
const oldScore = this.countScore(this.solutionMap);
// console.log(oldScore);
let newSolution = new Map(this.solutionMap);
if (this.addNewOrRemove(true)) {
const newCandidate = Math.floor(Math.random() * this.allArrSize);
newSolution.set(newCandidate, true);
} else if (this.addNewOrRemove(false)) {
const deleteCandidate = Math.floor(Math.random() * this.solutionMap.size);
Simulation.deleteFromMapByIndex(newSolution, deleteCandidate);
} else {
const deleteCandidate = Math.floor(Math.random() * this.solutionMap.size);
Simulation.deleteFromMapByIndex(newSolution, deleteCandidate);
const newCandidate = Math.floor(Math.random() * this.allArrSize);
newSolution.set(newCandidate, true);
}
const newScore = this.countScore(newSolution);
if (newScore < oldScore) {
this.solutionMap = newSolution;
} else if ((newScore - oldScore) / newScore < this.temperature / 300) {
this.solutionMap = newSolution;
}
this.temperature -= 0.001;
}
console.log(this.countScore(this.solutionMap), 'Negative Score');
console.log(this.sumTheSolution(this.solutionMap).toString(), 'Solution');
console.log(this.inputArray.toString(), 'Input array');
console.log('Solution is built on these inputs:');
this.solutionMap.forEach((val, key) => console.log(this.allArr[key].toString()))
}
addNewOrRemove(addNew) {
const sum = this.sumTheSolution(this.solutionMap);
let dif = 0;
sum.forEach((val, i) => {
const curDif = this.inputArray[i] - val;
if (curDif < -this.maxDif) {
dif -= 1;
}
if (curDif > this.maxDif) {
dif += 1;
}
});
let chance;
if (addNew) {
chance = (dif + this.size - 1) / (this.size * 2);
} else {
chance = (-dif + this.size - 1) / (this.size * 2);
}
return chance > Math.random();
}
countScore(solution) {
const sum = this.sumTheSolution(solution);
let dif = 0;
sum.forEach((val, i) => {
let curDif = Math.abs(this.inputArray[i] - val);
if (curDif > this.maxDif) {
curDif += (curDif - this.maxDif) * this.overDifPenalisation;
}
dif += curDif;
});
return dif;
}
sumTheSolution(solution) {
const sum = Array(this.size).fill(0);
solution.forEach((unused, key) => this.allArr[key].forEach((val, i) => sum[i] += val));
return sum;
}
static deleteFromMapByIndex(map, index) {
let i = 0;
let toDelete = null;
map.forEach((val, key) => {
if (index === i) {
toDelete = key;
}
i++;
});
map.delete(toDelete);
}
}
const simulation = new Simulation(8, 5000, 1000, 100, 40, 100);
simulation.findBest();
You can play a bit with the numbers to get what you need (the speed of cooling, how it affects the probability, some values in the constructor, etc.).
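The acceptance test in findBest compares a relative score change against temperature / 300, which is a hand-tuned rule. A common alternative is the Metropolis criterion, where a worse candidate is accepted with probability exp(-delta / T). A minimal Python sketch follows; `accept` and its `rng` parameter are hypothetical names for illustration and are not part of the code above.

```python
import math
import random


def accept(old_score, new_score, temperature, rng=random.random):
    """Metropolis rule for a score that should be minimized.

    Improvements are always accepted. A worse candidate is accepted with
    probability exp(-(new_score - old_score) / temperature).
    """
    if new_score <= old_score:
        return True
    if temperature <= 0:
        return False  # fully cooled: behave like plain hill climbing
    return rng() < math.exp(-(new_score - old_score) / temperature)


# High temperature: a worse candidate is usually accepted.
print(accept(5, 10, 1000, rng=lambda: 0.5))
# Low temperature: the same candidate is almost always rejected.
print(accept(5, 10, 0.1, rng=lambda: 0.5))
```

With this rule the cooling schedule and the score scale interact directly, so the starting temperature would need to be chosen relative to typical score differences.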

Understanding Big-O with a specific example

I am working on a rather simple question, to make sure that I understand these concepts.
The question is: there exists an array A of n elements, either being RED, WHITE, or BLUE. Rearrange the array such that all WHITE elements come before all BLUE elements, and all BLUE elements come before all RED elements. Construct an algorithm in O(n) time and O(1) space.
From my understanding, the pseudocode for the solution would be:
numW = numB = 0
for i = 0 to n:
if ARRAY[i] == WHITE:
numW++
else if ARRAY[i] == BLUE:
numB++
for i = 0 to n:
if numW > 0:
ARRAY[i] = WHITE
numW--
else if numB > 0:
ARRAY[i] = BLUE
numB--
else:
ARRAY[i] = RED
I believe it is O(n) because it runs through the array twice and O(2n) is in O(n). I believe the space is O(1) because it does not depend on the number of elements, i.e. there is always the same fixed number of counters regardless of n.
Is my understanding correct?

If it's linear time, and your algorithm appears to be, then it's O(n) as you suspect. There's a great summary here: Big-O for Eight Year Olds?

Yes, your solution runs in O(n) time in O(1) space.
Below is my solution, which also runs in O(n) time and O(1) space but also works when we have references to objects, as @kenneth suggested in the comments.
import java.util.Arrays;
import java.util.Random;
import static java.lang.System.out;
class Color{
char c;
Color(char c){
this.c = c;
}
}
public class Solution {
private static void rearrangeColors(Color[] collection){
int ptr = 0;
// move all whites to the left
for(int i=0;i<collection.length;++i){
if(collection[i].c == 'W'){
swap(collection,ptr,i);
ptr++;
}
}
// move all blues to the left, right after the whites
for(int i=ptr;i<collection.length;++i){
if(collection[i].c == 'B'){
swap(collection,ptr,i);
ptr++;
}
}
}
private static void swap(Color[] collection,int ptr1,int ptr2){
Color temp = collection[ptr1];
collection[ptr1] = collection[ptr2];
collection[ptr2] = temp;
}
private static void printColors(Color[] collection){
for(int i=0;i<collection.length;++i){
out.print(collection[i].c + ( i != collection.length - 1 ? "," : ""));
}
out.println();
}
public static void main(String[] args) {
// generate a random collection of 'Color' objects
Random r = new Random();
int array_length = r.nextInt(20) + 1;// to add 1 if in case 0 gets generated
Color[] collection = new Color[array_length];
char[] colors_domain = {'B','W','R'};
for(int i=0;i<collection.length;++i){
collection[i] = new Color(colors_domain[r.nextInt(3)]);
}
// print initial state
printColors(collection);
// rearrange them according to the criteria
rearrangeColors(collection);
// print final state
printColors(collection);
}
}

I won't say this is 100% correct, but a quick test case here did work. If anything, it shows the idea of being able to do it in one pass. Is it faster? Probably not. OP's answer I believe is still the best for this case.
#include <stdio.h>
char temp;
#define SWAP(a,b) { temp = a; a = b; b = temp;}
int main()
{
int n = 10;
char arr[] = "RWBRWBRWBR";
printf("%s\n", arr);
int white = 0;
for(int i=0; i<n; i++)
{
if(arr[i] == 'R') // reds go to the end; re-examine the element swapped in
{
SWAP(arr[i], arr[n-1]);
i--; n--;
}
else if(arr[i] == 'W') // whites go to the front
{
SWAP(arr[i], arr[white]);
white++;
}
}
printf("%s\n", arr);
}
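For comparison, the one-pass idea can be written as a three-way partition (often called the Dutch national flag partition). This is a minimal Python sketch under the question's ordering of WHITE, then BLUE, then RED; the function name `three_way_partition` and the single-letter color codes are assumptions for illustration.

```python
def three_way_partition(colors):
    """Rearrange a list of 'W', 'B', 'R' in place: whites, then blues, then reds."""
    low, mid, high = 0, 0, len(colors) - 1
    # Invariant: colors[:low] are 'W', colors[low:mid] are 'B',
    # colors[high + 1:] are 'R', and colors[mid:high + 1] is unexamined.
    while mid <= high:
        if colors[mid] == 'W':
            colors[low], colors[mid] = colors[mid], colors[low]
            low += 1
            mid += 1
        elif colors[mid] == 'R':
            colors[mid], colors[high] = colors[high], colors[mid]
            high -= 1  # do not advance mid: the swapped-in element is unexamined
        else:
            mid += 1
    return colors


print("".join(three_way_partition(list("RWBRWBRWBR"))))
```

It uses O(1) extra space and examines each position a bounded number of times, so it stays O(n); whether it beats the two-pass counting version in practice would have to be measured.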

Make unique array with minimal sum

This is an interview question. Given an array, e.g., [3,2,1,2,7], we want to make all elements in the array unique by incrementing duplicate elements, and we require that the sum of the resulting array be minimal. For example, the answer for [3,2,1,2,7] is [3,2,1,4,7] and its sum is 17. Any ideas?

It's not quite as simple as my earlier comment suggested, but it's not terrifically complicated.
First, sort the input array. If it matters to be able to recover the original order of the elements then record the permutation used for the sort.
Second, scan the sorted array from left to right (ie from low to high). If an element is less than or equal to the element to its left, set it to be one greater than that element.
Pseudocode
sar = sort(input_array)
for index = 2:size(sar) ! I count from 1
if sar(index)<=sar(index-1) sar(index) = sar(index-1)+1
forend
Is the sum of the result minimal? I've convinced myself that it is through some head-scratching and trials, but I haven't got a formal proof.
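The sort-and-scan steps above can be sketched in Python as follows. The function name `make_unique_min_sum` is an assumption for illustration, and the result is returned in sorted order rather than the original order.

```python
def make_unique_min_sum(values):
    """Return a sorted copy of values in which duplicates have been incremented
    so that all elements are distinct."""
    result = sorted(values)
    for i in range(1, len(result)):
        # An element that does not exceed its left neighbour is bumped to
        # the smallest value that is still free.
        if result[i] <= result[i - 1]:
            result[i] = result[i - 1] + 1
    return result


unique = make_unique_min_sum([3, 2, 1, 2, 7])
print(unique, sum(unique))
```

For the example from the question this gives a sum of 17, matching the expected answer. The cost is dominated by the sort, so it runs in O(n log n) time.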

If you only need to find ONE of the best solutions, here's the algorithm with some explanations.
The idea here is to find an optimal solution by testing all candidate solutions (well, there are infinitely many, so let's stick with the reasonable ones).
I wrote a program in C, because I'm familiar with it, but you can port it to any language you want.
The program does this: it tries to increment one value up to the maximum possible (I'll explain how to find it in the comments under the code sections); then, if the solution is not found, it decreases this value and goes on with the next one, and so on.
It's an exponential algorithm, so it will be very slow when there are many duplicated values (yet it ensures that the best solution is found).
I tested this code with your example, and it worked; not sure if there's any bug left, but the code (in C) is this.
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
typedef int BOOL; //just to ease meanings of values
#define TRUE 1
#define FALSE 0
Just to ease comprehension, I did some typedefs. Don't worry.
typedef struct duplicate { //used to speed up the algorithm at the cost of some extra memory
int value;
BOOL duplicate;
} duplicate_t;
int maxInArrayExcept(int *array, int arraySize, int index); //find the max value in array except the value at the index given
//the result is the max value in the array, not counting the given index
int *findDuplicateSum(int *array, int arraySize);
BOOL findDuplicateSum_R(duplicate_t *array, int arraySize, int *tempSolution, int *solution, int *totalSum, int currentSum); //recursive function used to find the solution
BOOL check(int *array, int arraySize); //checks if there's any repeated value in the solution
These are all the functions we'll need, split up for readability.
First, we have a struct. It is used to avoid checking, on every iteration, whether the value at a given index was originally duplicated. We don't want to modify any value that was not originally duplicated.
Then, we have a couple of functions. First, we need to handle the worst-case scenario, where every value after the duplicated ones is already occupied: then we need to increment the duplicated value up to the maximum value reached + 1.
Then there is the main function, which we'll discuss later.
The check function only checks whether there is any duplicated value in a temporary solution.
int main() { //testing purpose
int i;
int testArray[] = { 3,2,1,2,7 }; //test array
int nTestArraySize = 5; //test array size
int *solutionArray; //needed if you want to use the solution later
solutionArray = findDuplicateSum(testArray, nTestArraySize);
for (i = 0; i < nTestArraySize; ++i) {
printf("%d ", solutionArray[i]);
}
return 0;
}
This is the main Function: I used it to test everything.
int * findDuplicateSum(int * array, int arraySize)
{
int *solution = malloc(sizeof(int) * arraySize);
int *tempSolution = malloc(sizeof(int) * arraySize);
duplicate_t *duplicate = calloc(arraySize, sizeof(duplicate_t));
int i, j, currentSum = 0, totalSum = INT_MAX;
for (i = 0; i < arraySize; ++i) {
tempSolution[i] = solution[i] = duplicate[i].value = array[i];
currentSum += array[i];
for (j = 0; j < i; ++j) { //to find ALL the best solutions, we should also put the first found value as true; it's just a line more
//yet, it saves the algorithm half of the duplicated numbers (best/this case scenario)
if (array[j] == duplicate[i].value) {
duplicate[i].duplicate = TRUE;
}
}
}
if (findDuplicateSum_R(duplicate, arraySize, tempSolution, solution, &totalSum, currentSum));
else {
printf("No solution found\n");
}
free(tempSolution);
free(duplicate);
return solution;
}
This function does a lot of things: first it sets up the solution array, then it initializes both the solution values and the duplicate array, which is used to mark the values that were duplicated at startup. Then we compute the current sum and set the best sum so far to the maximum possible integer.
Then the recursive function is called; it tells us whether a solution was found (which should always be the case), and we return the solution as an array.
int findDuplicateSum_R(duplicate_t * array, int arraySize, int * tempSolution, int * solution, int * totalSum, int currentSum)
{
int i;
if (check(tempSolution, arraySize)) {
if (currentSum < *totalSum) { //optimal solution checking
for (i = 0; i < arraySize; ++i) {
solution[i] = tempSolution[i];
}
*totalSum = currentSum;
}
return TRUE; //just to ensure a solution is found
}
for (i = 0; i < arraySize; ++i) {
if (array[i].duplicate == TRUE) {
if (array[i].duplicate <= maxInArrayExcept(solution, arraySize, i)) { //worst case scenario, you need it to stop the recursion on that value
tempSolution[i]++;
return findDuplicateSum_R(array, arraySize, tempSolution, solution, totalSum, currentSum + 1);
tempSolution[i]--; //backtracking
}
}
}
return FALSE; //just in case the solution is not found, but we won't need it
}
This is the recursive Function. It first checks if the solution is ok and if it is the best one found until now. Then, if everything is correct, it updates the actual solution with the temporary values, and updates the optimal condition.
Then, we iterate on every repeated value (the if excludes other indexes) and we progress in the recursion until (if unlucky) we reach the worst case scenario: the check condition not satisfied above the maximum value.
Then we have to backtrack and continue with the iteration, that will go on with other values.
PS: an optimization is possible here, if we move the optimal condition from the check into the for: if the solution is already not optimal, we can't expect to find a better one just adding things.
That was the hard part; here are the supporting functions:
int maxInArrayExcept(int *array, int arraySize, int index) {
int i, max = 0;
for (i = 0; i < arraySize; ++i) {
if (i != index) {
if (array[i] > max) {
max = array[i];
}
}
}
return max;
}
BOOL check(int *array, int arraySize) {
int i, j;
for (i = 0; i < arraySize; ++i) {
for (j = 0; j < i; ++j) {
if (array[i] == array[j]) return FALSE;
}
}
return TRUE;
}
I hope this was useful.
Write if anything is unclear.

Well, I got the same question in one of my interviews.
Not sure if you still need it. But here's how I did it. And it worked well.
num_list1 = [2,8,3,6,3,5,3,5,9,4]
def UniqueMinSumArray(num_list):
    max=min(num_list)
    for i,V in enumerate(num_list):
        while (num_list.count(num_list[i])>1):
            if (max > num_list[i]+1) :
                num_list[i] = max + 1
            else:
                num_list[i]+=1
            max = num_list[i]
        i+=1
    return num_list
print (sum(UniqueMinSumArray(num_list1)))
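You can try it with your own list of numbers. It gives the correct minimum unique sum (65) for the list above, but the result depends on the order of the input: for example, [5, 5, 1, 1] yields 19, whereas the minimum is 14 (1, 2, 5, 6).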
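For comparison, here is a sort-first greedy version whose result does not depend on the input order. It is a minimal sketch, assuming that values may only be increased and that each duplicate is raised to the smallest value still free; the function name `min_unique_sum` is hypothetical.

```python
def min_unique_sum(values):
    """Return the smallest sum reachable by raising duplicates until all values differ."""
    total = 0
    previous = None  # last value placed in the (implicit) result
    for v in sorted(values):
        if previous is not None and v <= previous:
            v = previous + 1  # smallest value not yet taken
        total += v
        previous = v
    return total


print(min_unique_sum([2, 8, 3, 6, 3, 5, 3, 5, 9, 4]))  # prints 65
```

Sorting first means every element only has to be compared with the value placed just before it, so the whole pass is O(N log N) and the input list is left unmodified.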

I got the same interview question too. But my answer is in JS in case anyone is interested.
It can surely be improved to get rid of the for loop.
function getMinimumUniqueSum(arr) {
// [1,1,2] => [1,2,3] = 6
// [1,2,2,3,3] = [1,2,3,4,5] = 15
if (arr.length > 0) { // a single-element array should return that element, not 0
var sortedArr = [...arr].sort((a, b) => a - b);
var current = sortedArr[0];
var res = [current];
for (var i = 1; i + 1 <= arr.length; i++) {
// check current equals to the rest array starting from index 1.
if (sortedArr[i] > current) {
res.push(sortedArr[i]);
current = sortedArr[i];
} else if (sortedArr[i] == current) {
current = sortedArr[i] + 1;
// sortedArr[i]++;
res.push(current);
} else {
current++;
res.push(current);
}
}
return res.reduce((a,b) => a + b, 0);
} else {
return 0;
}
}

Cycle finding algorithm

I need to find a cycle beginning and ending at a given point. It is not guaranteed that one exists.
I use a bool[,] points array to indicate which grid points may be part of the cycle. Points can only lie on the grid.
I need to find this cycle using the minimum number of points.
Each point can be used only once.
Connections can only be vertical or horizontal.
Let these be our points (red is the starting point):
(image no longer available)
I realized that I can do this:
while(numberOfPointsChanged)
{
//remove points that are alone in row or column
}
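The loop above can be sketched in Python as follows. This is only an illustration under assumptions not stated in the question: points are `(x, y)` tuples, the starting point is included in the set, and a point is dropped when it is alone in its row or alone in its column. The name `prune_lonely_points` is made up for this example.

```python
from collections import Counter


def prune_lonely_points(points):
    """Repeatedly drop points that have no partner in their row or in their column."""
    points = set(points)
    changed = True
    while changed:
        rows = Counter(y for _, y in points)  # number of points per row
        cols = Counter(x for x, _ in points)  # number of points per column
        keep = {(x, y) for x, y in points if rows[y] > 1 and cols[x] > 1}
        changed = len(keep) != len(points)
        points = keep
    return points
```

Each pass recounts rows and columns, because removing one point can leave another point alone, which is why the loop runs until nothing changes.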
So I have:
(image no longer available)
Now, I can find the path.
(image no longer available)
But what if there are points that are not deleted by this loop but should not be in the path?
I have written this code:
class MyPoint
{
public int X { get; set; }
public int Y { get; set; }
public List<MyPoint> Neighbours = new List<MyPoint>();
public MyPoint parent = null;
public bool marked = false;
}
private static MyPoint LoopSearch2(bool[,] mask, int supIndexStart, int recIndexStart)
{
List<MyPoint> points = new List<MyPoint>();
//here begins translation bool[,] to list of points
points.Add(new MyPoint { X = recIndexStart, Y = supIndexStart });
for (int i = 0; i < mask.GetLength(0); i++)
{
for (int j = 0; j < mask.GetLength(1); j++)
{
if (mask[i, j])
{
points.Add(new MyPoint { X = j, Y = i });
}
}
}
for (int i = 0; i < points.Count; i++)
{
for (int j = 0; j < points.Count; j++)
{
if (i != j)
{
if (points[i].X == points[j].X || points[i].Y == points[j].Y)
{
points[i].Neighbours.Add(points[j]);
}
}
}
}
//end of translating
List<MyPoint> queue = new List<MyPoint>();
MyPoint start = (points[0]); //beginning point
start.marked = true; //it is marked
MyPoint last=null; //last point. this will be returned
queue.Add(points[0]);
while(queue.Count>0)
{
MyPoint current = queue.First(); //taking point from queue
queue.Remove(current); //removing it
foreach(MyPoint neighbour in current.Neighbours) //checking Neighbours
{
if (!neighbour.marked) //if the neighbour isn't marked, add it to the queue
{
neighbour.marked = true;
neighbour.parent = current;
queue.Add(neighbour);
}
//if the neighbour is marked, check that it is not the starting point and that its parent is not the current point; if so, the search has already reached it, so walk back through the parents towards the starting point
else if(!neighbour.Equals(start) && !neighbour.parent.Equals(current))
{
current = neighbour;
while(true)
{
if (current.parent.Equals(start))
{
last = current;
break;
}
else
current = current.parent;
}
break;
}
}
}
return last;
}
But it doesn't work. The path it finds contains only two points: the start and its first neighbour.
What am I doing wrong?
EDIT:
Forgot to mention... After a horizontal connection there has to be a vertical one, then horizontal, vertical and so on.
What is more, each row and column may contain at most two points of the cycle (two or none). But this condition is the same as "The cycle has to be the shortest one".

First of all, you should change your representation to a more efficient one. You should make vertex a structure/class, which keeps the list of the connected vertices.
Having changed the representation, you can easily find the shortest cycle using breadth-first search.
You can speed the search up with the following trick: traverse the graph in breadth-first order, marking the traversed vertices (and storing at each vertex its "parent vertex" on the way back to the root). As soon as you find an already marked vertex, the search is finished. You can find the two paths from the found vertex to the root by walking back along the stored "parent" vertices.
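A minimal Python sketch of this parent-pointer BFS follows. Two things in it are assumptions added for the example, not part of the answer: each vertex also remembers which neighbour of the start it descends from (its "branch"), so that only cycles passing through the start are reported, and all such closing edges are compared instead of stopping at the first one. The alternating horizontal/vertical rule from the question is ignored here. The names `shortest_cycle_through` and `neighbours` (a dict mapping each vertex to its adjacent vertices) are hypothetical.

```python
from collections import deque


def shortest_cycle_through(start, neighbours):
    """Return a shortest cycle through start as a list of vertices, or None."""
    parent = {start: None}
    depth = {start: 0}
    branch = {start: None}  # first vertex after start on the tree path
    queue = deque([start])
    best = None  # (cycle length, u, w) for the best closing edge found
    while queue:
        u = queue.popleft()
        for w in neighbours[u]:
            if w not in parent:  # unmarked: mark it and remember its parent
                parent[w] = u
                depth[w] = depth[u] + 1
                branch[w] = w if u == start else branch[u]
                queue.append(w)
            elif u != start and w != start and branch[u] != branch[w]:
                # The edge u-w joins two different branches: it closes a cycle.
                length = depth[u] + depth[w] + 1
                if best is None or length < best[0]:
                    best = (length, u, w)
    if best is None:
        return None
    _, u, w = best
    left = []
    while u is not None:  # walk from u back to start
        left.append(u)
        u = parent[u]
    left.reverse()  # now start ... u
    right = []
    while w != start:  # walk from w back towards start (start excluded)
        right.append(w)
        w = parent[w]
    return left + right
```

For a square `{"A": ["B", "D"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "A"]}` and start `"A"`, the result is a list of four vertices beginning with `"A"`. The branch labels matter because two vertices in the same branch share part of their path to the root, so joining them would give a cycle that does not pass through the start.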
Edit:
Are you sure your code is right? I tried the following:
while (queue.Count > 0)
{
MyPoint current = queue.First(); //taking point from queue
queue.Remove(current); //removing it
foreach (MyPoint neighbour in current.Neighbours) //checking Neighbours
{
if (!neighbour.marked) //if neighbour isn't marked adding it to queue
{
neighbour.marked = true;
neighbour.parent = current;
queue.Add(neighbour);
}
else if (!neighbour.Equals(current.parent)) // not considering own parent
{
// found!
List<MyPoint> loop = new List<MyPoint>();
MyPoint p = current;
do
{
loop.Add(p);
p = p.parent;
}
while (p != null);
p = neighbour;
while (!p.Equals(start))
{
loop.Add(p);
p = p.parent;
}
return loop;
}
}
}
return null;
instead of the corresponding part in your code (I changed the return type to List<MyPoint>, too). It works and correctly finds a smaller loop, consisting of 3 points: the red point, the point directly above and the point directly below.

This is what I ended up with. I don't know if it is optimised, but it does work correctly. I have not done the sorting of the points that @marcog suggested.
private static bool LoopSearch2(bool[,] mask, int supIndexStart, int recIndexStart, out List<MyPoint> path)
{
List<MyPoint> points = new List<MyPoint>();
points.Add(new MyPoint { X = recIndexStart, Y = supIndexStart });
for (int i = 0; i < mask.GetLength(0); i++)
{
for (int j = 0; j < mask.GetLength(1); j++)
{
if (mask[i, j])
{
points.Add(new MyPoint { X = j, Y = i });
}
}
}
for (int i = 0; i < points.Count; i++)
{
for (int j = 0; j < points.Count; j++)
{
if (i != j)
{
if (points[i].X == points[j].X || points[i].Y == points[j].Y)
{
points[i].Neighbours.Add(points[j]);
}
}
}
}
List<MyPoint> queue = new List<MyPoint>();
MyPoint start = (points[0]);
start.marked = true;
queue.Add(points[0]);
path = new List<MyPoint>();
bool found = false;
while(queue.Count>0)
{
MyPoint current = queue.First();
queue.Remove(current);
foreach (MyPoint neighbour in current.Neighbours)
{
if (!neighbour.marked)
{
neighbour.marked = true;
neighbour.parent = current;
queue.Add(neighbour);
}
else
{
if (neighbour.parent != null && neighbour.parent.Equals(current))
continue;
if (current.parent == null)
continue;
bool previousConnectionHorizontal = current.parent.Y == current.Y;
bool currentConnectionHorizontal = current.Y == neighbour.Y;
if (previousConnectionHorizontal != currentConnectionHorizontal)
{
MyPoint prev = current;
while (true)
{
path.Add(prev);
if (prev.Equals(start))
break;
prev = prev.parent;
}
path.Reverse();
prev = neighbour;
while (true)
{
if (prev.Equals(start))
break;
path.Add(prev);
prev = prev.parent;
}
found = true;
break;
}
}
if (found) break;
}
if (found) break;
}
if (path.Count == 0)
{
path = null;
return false;
}
return true;
}

Your points removal step is worst case O(N^3) if implemented poorly, with the worst case being stripping a single point in each iteration. And since it doesn't always save you that much computation in the cycle detection, I'd avoid doing it as it also adds an extra layer of complexity to the solution.
Begin by creating an adjacency list from the set of points. You can do this efficiently in O(NlogN) if you sort the points by X and by Y (separately) and iterate through the points in each order. Then, to find the shortest cycle length (measured in number of points), start a BFS from each point by initially putting all points on the queue. As you traverse an edge, store the source of the path along with the current point. That way you will know when the BFS returns to its source, in which case you have found a cycle. If you end up with an empty queue before finding a cycle, then none exists. Be careful not to track back immediately to the previous point, or you will end up with a degenerate cycle formed by two points. You might also want to avoid, for example, a cycle formed by the points (0, 0), (0, 2) and (0, 1), as this forms a straight line.
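The sorted construction could look like the sketch below. It follows one possible reading of the answer, namely that each point is linked only to the nearest point on either side in its row and in its column; the question's own code links every pair of points that share a row or column, which can take O(N^2) space. Points are assumed to be distinct `(x, y)` tuples, and `build_adjacency` is a name made up for this example.

```python
def build_adjacency(points):
    """Link each point to its nearest neighbours in the same row and column."""
    adjacency = {p: [] for p in points}

    # Sort by (y, x): consecutive entries with the same y are row neighbours.
    by_row = sorted(points, key=lambda p: (p[1], p[0]))
    for a, b in zip(by_row, by_row[1:]):
        if a[1] == b[1]:
            adjacency[a].append(b)
            adjacency[b].append(a)

    # Sort by (x, y): consecutive entries with the same x are column neighbours.
    by_col = sorted(points)
    for a, b in zip(by_col, by_col[1:]):
        if a[0] == b[0]:
            adjacency[a].append(b)
            adjacency[b].append(a)

    return adjacency
```

The two sorts dominate the cost, which gives the O(NlogN) bound, and each point ends up with at most four neighbours.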
The BFS potentially has a worst case of being exponential, but I believe such a case can either be proven to not exist or be extremely rare as the denser the graph the quicker you'll find a cycle while the sparser the graph the smaller your queue will be. On average it is more likely to be closer to the same runtime as the adjacency list construction, or in the worst realistic cases O(N^2).

I think that I'd use an adapted variant of Dijkstra's algorithm which stops and returns the cycle whenever it arrives at any node for the second time. If this never happens, you don't have a cycle.
This approach should be much more efficient than a breadth-first or depth-first search, especially if you have many nodes. It is guaranteed that you'll only visit each node once, so you have a linear runtime.


How to improve recursive backtracking algorithm - algorithm

I think there's a dynamic programming algorithm for solving the multiple-bin packing problem, or at least, a polynomial approximation algorithm. Take a look here and here.
