Given
class Interval{
int start;
int end;
}
The task is to insert an Interval into a disjoint set (or list) of Intervals. For example,
<4,8> into <3,7><10,13><20,21><30,31><40,45> gives <3,8><10,13><20,21><30,31><40,45>
<1,30> into <3,7><10,13><20,21><30,31><40,45> gives <1,30><40,45>
and so on.
I know the most efficient solution uses two binary searches, and that we should compare the inserted interval's start against the list intervals' ends and vice versa. How exactly do we handle binary search when we can't find exactly what we're looking for?
Count the number of nodes. Go to the centre; insert if applicable. Otherwise decide which half of the list is of interest, go to its centre, insert, and so on. You need to handle the case of <4,9> into <2,5><8,12>, where the new interval bridges several existing ones.
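To make the "not found exactly" handling concrete, here is a sketch in Python (my own helper, not from either answer): `bisect` plays the role of the two binary searches, and the insertion points it returns handle the case where no exact match exists. Intervals are closed and kept as sorted `(start, end)` tuples.

```python
import bisect

def insert_interval(intervals, new):
    """intervals: sorted, disjoint, closed intervals as (start, end) tuples."""
    starts = [s for s, _ in intervals]
    ends = [e for _, e in intervals]
    # binary search 1: first interval whose end >= new start
    lo = bisect.bisect_left(ends, new[0])
    # binary search 2: last interval whose start <= new end
    hi = bisect.bisect_right(starts, new[1]) - 1
    if lo > hi:
        # no overlap at all: plain insertion at position lo
        return intervals[:lo] + [new] + intervals[lo:]
    # everything from lo to hi overlaps the new interval: merge it all
    merged = (min(new[0], intervals[lo][0]), max(new[1], intervals[hi][1]))
    return intervals[:lo] + [merged] + intervals[hi + 1:]

print(insert_interval([(3, 7), (10, 13), (20, 21)], (4, 8)))  # [(3, 8), (10, 13), (20, 21)]
```

This also covers the <4,9> into <2,5><8,12> case, which merges everything into <2,12>.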
Assuming C++, I'd use an std::map from the interval end to the interval start.
For searching, use std::upper_bound() to find the first overlapping
interval and then advance the iterator to find all overlapping intervals.
Only one binary search is required.
#include <map>
#include <stdio.h>

typedef std::map<int, int> IntervalMap;

struct Interval {
    int start;
    int end;
};

int main()
{
    IntervalMap imap;
    Interval query;
    imap[7] = 3;   // <3,7>
    imap[13] = 10; // <10,13>
    // Insert: <4,8>
    query.start = 4;
    query.end = 8;
    // Find the first overlapping interval
    auto it = imap.upper_bound(query.start);
    if (it != imap.end() && it->second < query.end)
    {
        // There is one or more overlapping interval
        // Update lower bound for the new interval
        if (it->second < query.start)
            query.start = it->second;
        while (it != imap.end() && it->second < query.end)
        {
            // Update upper bound for the new interval
            if (it->first > query.end)
                query.end = it->first;
            auto tmp = it;
            ++it;
            imap.erase(tmp);
        }
    }
    // Insert the new interval
    imap[query.end] = query.start;
    for (auto it = imap.begin(); it != imap.end(); ++it)
    {
        fprintf(stderr, "<%d,%d>\n", it->second, it->first);
    }
    return 0;
}
This is an interview question. Given an array, e.g. [3,2,1,2,7], we want to make all elements in this array unique by incrementing duplicate elements, and we require the sum of the refined array to be minimal. For example, the answer for [3,2,1,2,7] is [3,2,1,4,7] and its sum is 17. Any ideas?
It's not quite as simple as my earlier comment suggested, but it's not terrifically complicated.
First, sort the input array. If it matters to be able to recover the original order of the elements then record the permutation used for the sort.
Second, scan the sorted array from left to right (i.e. from low to high). If an element is less than or equal to the element to its left, set it to be one greater than that element.
Pseudocode
sar = sort(input_array)
for index = 2:size(sar) ! I count from 1
if sar(index)<=sar(index-1) sar(index) = sar(index-1)+1
forend
Is the sum of the result minimal? I've convinced myself that it is through some head-scratching and trials, but I don't have a formal proof.
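The pseudocode above translates directly to Python (a sketch; the function name is my own):

```python
def min_unique_sum_array(arr):
    sar = sorted(arr)
    for i in range(1, len(sar)):       # scan left to right
        if sar[i] <= sar[i - 1]:       # duplicate (or smaller after earlier bumps)
            sar[i] = sar[i - 1] + 1    # set one greater than the left neighbour
    return sar

print(sum(min_unique_sum_array([3, 2, 1, 2, 7])))  # 17
```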
If you only need to find ONE of the best solutions, here's the algorithm with some explanations.
The idea of this problem is to find an optimal solution, which can be found only by testing all existing solutions (well, they're infinite; let's stick with the reasonable ones).
I wrote a program in C because I'm familiar with it, but you can port it to any language you want.
The program does this: it tries to increment one value up to the maximum possible (I'll explain how to find that in the comments under the code sections); then, if the solution is not found, it decreases this value and goes on with the next one, and so on.
It's an exponential algorithm, so it will be very slow on large amounts of duplicated data (yet it assures you the best solution is found).
I tested this code with your example and it worked; not sure if there's any bug left, but the code (in C) is this.
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
typedef int BOOL; //just to ease meanings of values
#define TRUE 1
#define FALSE 0
Just to ease comprehension, I did some typedefs. Don't worry.
typedef struct duplicate { //used to speed up the algorithm; it uses some more memory just to be safe
    int value;
    BOOL duplicate;
} duplicate_t;
int maxInArrayExcept(int *array, int arraySize, int index); //find the max value in the array,
//ignoring the value at the given index
int *findDuplicateSum(int *array, int arraySize);
BOOL findDuplicateSum_R(duplicate_t *array, int arraySize, int *tempSolution, int *solution, int *totalSum, int currentSum); //recursive function used to find the solution
BOOL check(int *array, int arraySize); //checks if there's any repeated value in the solution
These are all the functions we'll need, split up for clarity.
First, we have a struct. It is used to avoid checking, on every iteration, whether the value at a given index was originally duplicated. We don't want to modify any value that wasn't duplicated originally.
Then we have a couple of helper functions. First, we need to handle the worst-case scenario: every value after the duplicated ones is already occupied, so we may need to increment a duplicated value up to the maximum value reached + 1.
Then there are the main functions, which we'll discuss later.
The check function only checks whether there's any duplicated value in a temporary solution.
int main() { //testing purpose
    int i;
    int testArray[] = { 3,2,1,2,7 }; //test array
    int nTestArraySize = 5; //test array size
    int *solutionArray; //needed if you want to use the solution later
    solutionArray = findDuplicateSum(testArray, nTestArraySize);
    for (i = 0; i < nTestArraySize; ++i) {
        printf("%d ", solutionArray[i]);
    }
    free(solutionArray); //the caller owns the returned array
    return 0;
}
This is the main function; I used it to test everything.
int *findDuplicateSum(int *array, int arraySize)
{
    int *solution = malloc(sizeof(int) * arraySize);
    int *tempSolution = malloc(sizeof(int) * arraySize);
    duplicate_t *duplicate = calloc(arraySize, sizeof(duplicate_t));
    int i, j, currentSum = 0, totalSum = INT_MAX;
    for (i = 0; i < arraySize; ++i) {
        tempSolution[i] = solution[i] = duplicate[i].value = array[i];
        currentSum += array[i];
        for (j = 0; j < i; ++j) { //to find ALL the best solutions, we should also mark the first occurrence; it's just one line more
            //yet, this saves the algorithm half of the duplicated numbers (best case)
            if (array[j] == duplicate[i].value) {
                duplicate[i].duplicate = TRUE;
            }
        }
    }
    if (!findDuplicateSum_R(duplicate, arraySize, tempSolution, solution, &totalSum, currentSum)) {
        printf("No solution found\n");
    }
    free(tempSolution);
    free(duplicate);
    return solution;
}
This function does a lot of things: first it sets up the solution array, then it initializes both the solution values and the duplicate array, which is used to check for duplicated values at startup. Then we compute the current sum and set the best total sum to the maximum integer possible.
Then the recursive function is called; it tells us whether a solution was found (which should always be the case), and we return the solution as an array.
BOOL findDuplicateSum_R(duplicate_t *array, int arraySize, int *tempSolution, int *solution, int *totalSum, int currentSum)
{
    int i;
    if (check(tempSolution, arraySize)) {
        if (currentSum < *totalSum) { //optimality check
            for (i = 0; i < arraySize; ++i) {
                solution[i] = tempSolution[i];
            }
            *totalSum = currentSum;
        }
        return TRUE; //a solution was found
    }
    for (i = 0; i < arraySize; ++i) {
        if (array[i].duplicate == TRUE) {
            if (tempSolution[i] <= maxInArrayExcept(solution, arraySize, i)) { //worst-case bound: stops the recursion on that value
                tempSolution[i]++;
                BOOL found = findDuplicateSum_R(array, arraySize, tempSolution, solution, totalSum, currentSum + 1);
                tempSolution[i]--; //backtracking
                if (found) return TRUE;
            }
        }
    }
    return FALSE; //no solution on this branch
}
This is the recursive function. It first checks whether the current solution is valid and whether it is the best one found so far; if so, it copies the temporary values into the actual solution and updates the optimum.
Then we iterate over every repeated value (the if skips the other indexes) and recurse, until (if unlucky) we reach the worst-case scenario: the bound above the maximum value stops the recursion on that value.
Then we backtrack and continue with the iteration, which goes on with the other values.
PS: an optimization is possible here: if we move the optimality check from check into the for loop, we can prune early, since if a partial solution is already not optimal we can't expect to find a better one by adding to it.
The hard part is over; here are the supporting functions:
int maxInArrayExcept(int *array, int arraySize, int index) {
    int i, max = 0;
    for (i = 0; i < arraySize; ++i) {
        if (i != index) {
            if (array[i] > max) {
                max = array[i];
            }
        }
    }
    return max;
}

BOOL check(int *array, int arraySize) {
    int i, j;
    for (i = 0; i < arraySize; ++i) {
        for (j = 0; j < i; ++j) {
            if (array[i] == array[j]) return FALSE;
        }
    }
    return TRUE;
}
I hope this was useful.
Write if anything is unclear.
Well, I got the same question in one of my interviews.
Not sure if you still need it, but here's how I did it, and it worked well.
num_list1 = [2,8,3,6,3,5,3,5,9,4]

def UniqueMinSumArray(num_list):
    cur_max = min(num_list)  # don't shadow the built-in max
    for i in range(len(num_list)):
        while num_list.count(num_list[i]) > 1:  # while this value is duplicated
            if cur_max > num_list[i] + 1:
                num_list[i] = cur_max + 1       # jump past the largest value placed so far
            else:
                num_list[i] += 1
            cur_max = num_list[i]
    return num_list

print(sum(UniqueMinSumArray(num_list1)))
You can try with your list of numbers and I am sure it will give you the correct unique minimum sum.
I got the same interview question too, but my answer is in JS in case anyone is interested.
It can surely be improved to get rid of the for loop.
function getMinimumUniqueSum(arr) {
    // [1,1,2] => [1,2,3] = 6
    // [1,2,2,3,3] => [1,2,3,4,5] = 15
    if (arr.length > 1) {
        var sortedArr = [...arr].sort((a, b) => a - b);
        var current = sortedArr[0];
        var res = [current];
        for (var i = 1; i < sortedArr.length; i++) {
            if (sortedArr[i] > current) {
                // strictly larger: keep it as-is
                res.push(sortedArr[i]);
                current = sortedArr[i];
            } else {
                // duplicate, or smaller after earlier bumps: bump past current
                current++;
                res.push(current);
            }
        }
        return res.reduce((a, b) => a + b, 0);
    }
    return arr.length === 1 ? arr[0] : 0; // a single element is its own sum
}
I am very new to tree data structures. I know how the entire structure works, but am not sure how to approach randomly generating one.
For example, to create a binary tree with depth 3, you essentially put the pieces together one by one, i.e.:
root = Node()
root.leftChild = Node()
root.rightChild = Node()
root.leftChild.leftChild = 'left'
root.rightChild.rightChild = 'right'
The above doesn't work when I want to randomly create binary tree structures that vary from run to run. By randomly creating a tree structure I mean randomly choosing a node type and randomly assigning a child or not, while the end result always has a depth of N.
Does anyone have any suggestions on how to approach this? I would love to see some pseudo code/algorithm or anything of that nature.
thanks
I wrote a simple program to illustrate my method. The program will generate a binary-heap-like structure, and it will be simple to convert it to your structure.
#include <iostream>
#include <cstring>
#include <cstdlib>
#include <ctime>
using namespace std;

int main(){
    int maxDepth;            //The max depth of the tree
    int totalNodes;          //The least number of nodes in the tree
    int realTotalNodes = 0;  //The real number of nodes in the tree
    cin >> maxDepth >> totalNodes;
    srand(time(NULL));
    int indexMax = (1 << maxDepth) - 1; //Max index of the nodes in the n-depth binary tree
    bool* nodes = new bool[indexMax + 1];
    memset(nodes, 0, indexMax + 1);
    int lastMin = 1 << (maxDepth - 1); //Min index of the nodes at the n-th level
    //First, guarantee that the tree will be n levels high.
    //That is, create a path from the root to the n-th level.
    int lastIndex = (rand() % lastMin) + lastMin; //Generate a node at the n-th level
    while(lastIndex > 0){ //Create its parent, grandparent, great-grandparent...
        nodes[lastIndex] = true;
        realTotalNodes++;
        lastIndex = lastIndex / 2;
        totalNodes--;
    }
    while(totalNodes > 0){
        int currentIndex = rand() % indexMax + 1; //Randomly generate a node (indices are 1-based)
        totalNodes--;
        while(currentIndex > 0){ //Create its parents...
            if(nodes[currentIndex]){ //If some parent exists, its ancestors have already been created
                break;
            }
            nodes[currentIndex] = true;
            realTotalNodes++;
            currentIndex = currentIndex / 2;
            totalNodes--;
        }
    }
    //Print the tree, level by level.
    int level = 2;
    for(int i = 1; i <= indexMax; i++){
        if(nodes[i]){
            cout << i << "\t";
        }
        if(i == level - 1){
            cout << endl;
            level = level * 2;
        }
    }
    delete[] nodes;
    return 0;
}
We have an array arr[0 .. n-1]. We should be able to efficiently find the minimum value from index qs (query start) to qe (query end), where 0 <= qs <= qe <= n-1.
I know the Segment Tree data structure for this. I am wondering whether a Binary Indexed Tree (BIT) can also be used for this operation. If yes, how can I use a BIT in this scenario? And if the array is not static, can we change an element and update our BIT or Segment Tree?
Yes, a BIT can also solve this problem, with a little trick.
Let num[] be the initial array and idx[] the BIT array.
The key point is that idx[k] represents the min value of the range num[k-lowbit(k)+1 .. k], where k starts from 1.
#define MAX_VALUE 10000
#define lowbit(x) ((x)&(-(x)))

int num[MAX_VALUE];
int idx[MAX_VALUE]; // must be initialized to a large value before the first update

// pos is 1-based; max_index is one past the last valid position.
// Note: this update is only correct when the new value is <= the old one,
// since idx[pos] is only ever lowered, never recomputed.
void update(int pos, int val, int max_index) {
    num[pos] = val;
    while (pos < max_index) {
        idx[pos] = min(idx[pos], val);
        pos += lowbit(pos);
    }
}

int getMin(int left, int right) {
    int res = num[right];
    while (true) {
        res = min(res, num[right]);
        if (left == right) break;
        // walk left, jumping over whole idx[] blocks while they fit in [left, right]
        for (right -= 1; right - left >= lowbit(right); right -= lowbit(right)) {
            res = min(res, idx[right]);
        }
    }
    return res;
}
Hope this helps.
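The same trick, rendered in Python as a sketch (class and method names are mine), with the getMin walk kept structurally identical to the C version. The caveat from the comments applies here too: since idx[pos] is only ever lowered, update only supports decreasing a value.

```python
def lowbit(x):
    return x & -x

class MinBIT:
    """1-indexed; idx[k] holds the min of num[k-lowbit(k)+1 .. k]."""
    def __init__(self, values):
        self.n = len(values)
        self.num = [0] + list(values)
        self.idx = [float("inf")] * (self.n + 1)
        for pos in range(1, self.n + 1):
            self.update(pos, self.num[pos])

    def update(self, pos, val):
        # only correct when val <= the current value at pos
        self.num[pos] = val
        while pos <= self.n:
            self.idx[pos] = min(self.idx[pos], val)
            pos += lowbit(pos)

    def get_min(self, left, right):
        # min over num[left .. right], 1-indexed
        res = self.num[right]
        while True:
            res = min(res, self.num[right])
            if left == right:
                return res
            right -= 1
            # jump over whole idx[] blocks while they fit inside [left, right]
            while right - left >= lowbit(right):
                res = min(res, self.idx[right])
                right -= lowbit(right)

b = MinBIT([5, 3, 4, 1, 2])
print(b.get_min(2, 5))  # 1  (the min of 3, 4, 1, 2)
```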
While practising problems on HackerEarth I came across the following problem (not from an active contest) and have been unsuccessful in solving it after many attempts.
Chandler is participating in a race competition involving N track
races. He wants to run his old car on these tracks having F amount of
initial fuel. At the end of each race, Chandler spends si fuel and
gains some money using which he adds ei amount of fuel to his car.
Also for participating in race i at any stage, Chandler should have
more than si amount of fuel. Also he can participate in race i once.
Help Chandler in maximizing the number of races he can take part in if
he has a choice to participate in the given races in any order.
How can I approach this problem? My approach was to sort by (ei-si), but then I couldn't incorporate the condition that the fuel present must be sufficient for the race.
EDIT I tried to solve it using the following algorithm, but it fails, and I can't think of any inputs on which it fails either. Please help me figure out what's wrong, or give some input where my algorithm fails.
Sort (ei-si) in non-increasing order;
start iterating through the sorted (ei-si) and find the first element such that fuel>=si;
update fuel=fuel+(ei-si);
update count;
erase that element from the list, and start searching again;
if fuel was not updated then we can't take part in any more races, so stop searching
and output count.
EDIT And here is my code as requested.
#include<iostream>
#include<vector>
#include<algorithm>
#include<list>
using namespace std;

struct race{
    int ei;
    int si;
    int earn;
};

bool compareByEarn(const race &a, const race &b)
{
    return a.earn <= b.earn;
}

int main(){
    int t;
    cin>>t;
    while(t--){
        vector<struct race> fuel;
        int f,n;
        cin>>f>>n;
        int si,ei;
        while(n--){
            cin>>si>>ei;
            fuel.push_back({ei,si,ei-si});
        }
        sort(fuel.begin(),fuel.end(),compareByEarn);
        list<struct race> temp;
        std::copy( fuel.rbegin(), fuel.rend(), std::back_inserter(temp) );
        int count=0;
        while(1){
            int flag=0;
            for (list<struct race>::iterator ci = temp.begin(); ci != temp.end(); ++ci){
                if(ci->si<=f){
                    f+=ci->earn;
                    ci=temp.erase(ci);
                    ++count;
                    flag=1;
                    break;
                }
            }
            if(!flag){
                break;
            }
        }
        cout<<count<<endl;
    }
}
EDIT As noted in the answer below, the above greedy approach doesn't always work, so any alternative method would be useful.
Here is my solution, which gets accepted by the judge:
Greedily take the races which yield a profit (ei > si)
Sort by ei (in decreasing order)
Solve the problem using a dynamic programming algorithm. (It is similar to a pseudo-polynomial solution for the 0-1 knapsack.)
It is clear that the order in which you take the profitable races does not matter (as long as you keep processing them until no more profitable races can be entered).
For the rest, I will first prove that if a solution exists, you can perform the same set of races in decreasing order of ei, and the solution will still be feasible. Imagine we have a solution in which k races were chosen and let's say these k races have starting and ending fuel values of s1,...,sk and e1,...,ek. Let i be the first index where ei < ej (where j=i+1). We will show that we can swap i and i+1 without violating any constraints.
It is clear that swapping i and i+1 will not disrupt any constraints before i or after i+1, so we only need to prove that we can still perform race i if we swap its order with race i+1 (j). In the normal order, if the fuel level before we start on race i was f, after race i it will be f-si+ei, and this is at least sj. In other words, we have: f-si+ei>=sj, which means f-sj+ei>=si. However, we know that ei < ej so f-sj+ej >= f-sj+ei >= si, and therefore racing on the jth race before the ith race will still leave at least si fuel for race i.
From there, we implement a dynamic programming algorithm in which d[i][j] is the maximum number of races we can participate in if we can only use races i..n and we start with j units of fuel.
Here is my code:
#include <iostream>
#include <algorithm>
#include <cstring>
using namespace std;

const int maxn = 110;
const int maxf = 110*1000;
int d[maxn][maxf];

struct Race {
    int s, e;
    bool used;
    inline bool operator < (const Race &o) const {
        return e > o.e;
    }
} race[maxn];

int main() {
    int t;
    for (cin >> t; t--;) {
        memset(d, 0, sizeof d);
        int f, n;
        cin >> f >> n;
        for (int i = 0; i < n; i++) {
            cin >> race[i].s >> race[i].e;
            race[i].used = false;
        }
        sort(race, race + n);
        // First, greedily take every profitable race we can afford.
        int count = 0;
        bool found;
        do {
            found = false;
            for (int i = 0; i < n; i++)
                if (!race[i].used && race[i].e >= race[i].s && f >= race[i].s) {
                    race[i].used = true;
                    count++;
                    f += race[i].e - race[i].s; // spend s, gain e
                    found = true;
                }
        } while (found);
        // Then a knapsack-style DP over the remaining races, in decreasing ei order.
        for (int i = n - 1; i >= 0; i--) {
            for (int j = 0; j < maxf; j++) {
                d[i][j] = d[i + 1][j];
                if (!race[i].used && j >= race[i].s) {
                    int f2 = j - race[i].s + race[i].e;
                    if (f2 < maxf)
                        d[i][j] = max(d[i][j], 1 + d[i + 1][f2]);
                }
            }
        }
        cout << d[0][f] + count << endl;
    }
    return 0;
}
You need to change your compareByEarn function
bool compareByEarn(const race &a, const race &b)
{
if(a.earn == b.earn) return a.si < b.si;
return a.earn < b.earn;
}
The comparison above means: choose the track with more earning (or a smaller loss). But if there are 2 tracks with the same earning, prefer the track which requires more fuel.
Consider the example
Initially fuel in the car = 4
track 1 : s = 2, e = 1
track 2 : s = 3, e = 2
track 3 : s = 4, e = 3
Expected answer = 3
Received answer = 2 or 3, depending on whether the sorting algorithm is stable and on the order of the input.
As a side note:
Also for participating in race i at any stage, Chandler should have
more than si amount of fuel
Should translate to
if(ci->si < f){ // and not if(ci->si<=f){
You can check if my observation is right or problem author chose incorrect sentence to describe the constraint.
EDIT With more reasoning I realized you cannot do it with a greedy approach alone.
Consider the following input.
Initially fuel in the car = 9
track 1 : s = 9, e = 6
track 2 : s = 2, e = 0
track 3 : s = 2, e = 0
track 4 : s = 2, e = 0
Expected answer = 4
Received answer = 3
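Since these counterexamples are tiny, a throwaway brute force over all participation orders (my own sketch, assuming f >= si is enough to enter a race, which matches the expected answers above) confirms them:

```python
from itertools import permutations

def best_count(fuel, races):
    """races: list of (s, e); max number of races attendable in any order."""
    best = 0
    for order in permutations(range(len(races))):
        f, done = fuel, 0
        for i in order:
            s, e = races[i]
            if f >= s:      # enough fuel to enter this race
                f += e - s  # spend s, gain e
                done += 1
        best = max(best, done)
    return best

print(best_count(4, [(2, 1), (3, 2), (4, 3)]))          # 3
print(best_count(9, [(9, 6), (2, 0), (2, 0), (2, 0)]))  # 4
```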
Looking for an efficient algorithm to match sets among a group of sets, ordered by the number of overlapping members. Two identical sets are the best match, for example, while sets with no overlapping members are the worst.
So, the algorithm takes as input a list of sets and returns matching set pairs, ordered by the pairs with the most overlapping members.
I would be interested in ideas to do this efficiently. The brute-force approach is to try all combinations and sort, which is obviously not very performant when the number of sets is very large.
Edit: Use case - Assume a large number of sets already exist. When a new set arrives, the algorithm is run and the output includes matching sets (with at least one element overlap) sorted by the most matching to least (doesn't matter how many items are in the new/incoming set). Hope that clarifies my question.
If you can afford an approximation algorithm with a chance of error, then you should probably consider MinHash.
This algorithm allows estimating the similarity between 2 sets in constant time. For any constructed set, a fixed-size signature is computed, and then only the signatures are compared when estimating similarities. The similarity measure used is the Jaccard index, which ranges from 0 (disjoint sets) to 1 (identical sets); it is defined as the ratio of the size of the intersection to the size of the union of the two sets.
With this approach, any new set has to be compared against all existing ones (in linear time), and then the results can be merged into the top list (you can use a bounded search tree/heap for this purpose).
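A minimal MinHash sketch in Python (the function names and the random affine hash family are my own choices, not a specific library's API; elements are assumed to be integers, so hash other types to ints first):

```python
import random

PRIME = (1 << 31) - 1  # any large prime works

def make_hashes(k, seed=42):
    """k random affine hash functions h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(PRIME)) for _ in range(k)]

def signature(s, hashes):
    """Fixed-size signature: the minimum hash value per function."""
    return [min((a * x + b) % PRIME for x in s) for a, b in hashes]

def estimate_jaccard(sig1, sig2):
    """Fraction of agreeing positions estimates the Jaccard similarity."""
    return sum(p == q for p, q in zip(sig1, sig2)) / len(sig1)

hashes = make_hashes(128)
a = signature(set(range(100)), hashes)
b = signature(set(range(50, 150)), hashes)  # true Jaccard with a: 50/150 ≈ 0.33
print(estimate_jaccard(a, b))
```

A new set's signature is computed once, compared against all stored signatures in linear time, and the results merged into a bounded heap to keep the top matches.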
Since the number of possible different values is not very large, you get a fairly efficient hashing if you simply set the nth bit in a "large integer" when the nth number is present in your set. You can then look for overlap between sets with a simple bitwise AND followed by a "count set bits" operation. On 64 bit architecture, that means that you can look for the similarity between two numbers (out of 1000 possible values) in about 16 cycles, regardless of the number of values in each cluster. As the cluster gets more sparse, this becomes a less efficient algorithm.
Still - I implemented some of the basic functions you might need in some code that I attach here - not documented but reasonably understandable, I think. In this example I made the numbers small so I can check the result by hand - you might want to change some of the #defines to get larger ranges of values, and obviously you will want some dynamic lists etc to keep up with the growing catalog.
#include <stdio.h>
#include <stdlib.h>

// biggest number you will come across: want this to be much bigger
#define MAXINT 25
// use the biggest type you have - not int
#define BITSPER (8*sizeof(int))
#define NWORDS (MAXINT/BITSPER + 1)
// max number in a cluster
#define CSIZE 5

typedef struct {
    unsigned int num[NWORDS]; // want to use the longest type, but not for this demo
    int newmatch;
    int rank;
} hmap;

int countBits(unsigned int b); // forward declaration: used before it is defined

// convert a set of numbers to a bitmap:
void hashIt(int* t, int n, hmap* h) {
    int ii;
    for(ii = 0; ii < n; ii++) {
        int a = t[ii] % BITSPER;
        int b = t[ii] / BITSPER;
        h->num[b] |= 1u << a;
    }
}

// print a binary number:
void printBinary(unsigned int n) {
    unsigned int jj = 1u << 31;
    while(jj != 0) {
        printf("%c", ((n & jj) != 0) ? '1' : '0');
        jj >>= 1;
    }
    printf(" ");
}

// print the array of binary numbers:
void printHash(hmap* h) {
    unsigned int ii;
    for(ii = 0; ii < NWORDS; ii++) {
        printf("0x%08x: ", h->num[ii]);
        printBinary(h->num[ii]);
    }
}

// find the maximum overlap for set m among n sets
int maxOverlap(hmap* h, int m, int n) {
    int ii, jj;
    int overlap, maxOverlap = -1;
    for(ii = 0; ii < n; ii++) {
        if(ii == m) continue; // don't compare with yourself
        overlap = 0;
        for(jj = 0; jj < NWORDS; jj++) {
            // bitwise AND of the two bitmaps, then count the common bits
            overlap += countBits(h[ii].num[jj] & h[m].num[jj]);
        }
        if(overlap > maxOverlap) maxOverlap = overlap;
    }
    return maxOverlap;
}

int countBits(unsigned int b) {
    int count;
    for (count = 0; b != 0; count++) {
        b &= b - 1; // this clears the lowest set bit
    }
    return count;
}

int main(void) {
    int temp[CSIZE];
    int ii, jj;
    static hmap H[20]; // static: all zero initially
    for(jj = 0; jj < 20; jj++) {
        for(ii = 0; ii < CSIZE; ii++) {
            temp[ii] = rand() % MAXINT;
        }
        hashIt(temp, CSIZE, &H[jj]);
    }
    for(ii = 0; ii < 20; ii++) {
        printHash(&H[ii]);
        printf("max overlap: %d\n", maxOverlap(H, ii, 20));
    }
    return 0;
}
See if this helps at all...