Algorithm interview from Google - algorithm

I am a long time lurker, and just had an interview with Google where they asked me this question:
Various artists want to perform at the Royal Albert Hall and you are responsible for scheduling
their concerts. Requests for performing at the Hall are accommodated on a first come first served
policy. Only one performance is possible per day and, moreover, there cannot be any concerts
taking place within 5 days of each other
Given a requested time d which is impossible (i.e. within 5 days of an already sched-
uled performance), give an O(log n)-time algorithm to find the next available day d2
(d2 > d).
I had no clue how to solve it, and now that the interview is over, I am dying to figure out how to solve it. Knowing how smart most of you folks are, I was wondering if you can give me a hand here. This is NOT for homework, or anything of that sort. I just want to learn how to solve it for future interviews. I tried asking follow up questions but he said that is all I can tell you.

You need a normal binary search tree of intervals of available dates. Just search for the interval containing d. If it does not exist, take the interval next (in-order) to the point where the search stopped.
Note: contiguous intervals must be fused together in a single node. For example: the available-dates intervals {2 - 15} and {16 - 23} should become {2 - 23}. This might happen if a concert reservation was cancelled.
Alternatively, a tree of non-available dates can be used instead, provided that contiguous non-available intervals are fused together.

Store the scheduled concerts in a binary search tree and find a feasible solution by doing a binary search.
Something like this:
FindDateAfter(tree, x):
n = tree.root
if n.date < x
n = FindDateAfter(n.right, x)
else if n.date > x and n.left.date < x
return n
return FindDateAfter(n.left, x)
FindGoodDay(tree, x):
n = FindDateAfter(tree, x)
while (n.date + 10 < n.right.date)
n = FindDateAfter(n, n.date + 5)
return n.date + 5

I've used a binary search tree (BST) that holds the ranges for valid free days that can be scheduled for performances.
One of the ranges must end with int.MaxValue, because we have an infinite amount of days so it can't be bound.
The following code searches for the closest day to the requested day, and returns it.
The time complexity is O(H) when H is the tree height (usually H=log(N), but can become H=N in some cases.).
The space complexity is the same as the time complexity.
public static int FindConcertTime(TreeNode<Tuple<int, int>> node, int reqDay)
{
// Not found!
if (node == null)
{
return -1;
}
Tuple<int, int> currRange = node.Value;
// Found range.
if (currRange.Item1 <= reqDay &&
currRange.Item2 >= reqDay)
{
// Return requested day.
return reqDay;
}
// Go left.
else if (currRange.Item1 > reqDay)
{
int suggestedDay = FindConcertTime(node.Left, reqDay);
// Didn't find appropriate range in left nodes, or found day
// is further than current option.
if (suggestedDay == -1 || suggestedDay > currRange.Item1)
{
// Return current option.
return currRange.Item1;
}
else
{
// Return suggested day.
return suggestedDay;
}
}
// Go right.
// Will always find because the right-most node has "int.MaxValue" as Item2.
else //if (currRange.Item2 < reqDay)
{
return FindConcertTime(node.Right, reqDay);
}
}

Store the number of used nights per year, quarter, and month. To find a free night, find the first year that is not fully booked, then the quarter within that year, then the month. Then check each of the nights in that month.
Irregularities in the calendar system makes this a little tricky so instead of using years and months you can apply the idea for units of 4 nights as "month", 16 nights as "quarter", and so on.

Assume, at level 1 all schedule details are available.
Group schedule of 16 days schedule at level 2.
Group 16 level 2 status at level 3.
Group 16 level 3 status at level 4.
Depends on number of days that you want to expand, increase the level.
Now search from higher level and do binary search at the end.

Asymtotic complexity:-
It means runtime is changing as the input grows.
suppose we have an input string “abcd”. Here we traverse through each character to find its length thus the time taken is proportional to the no of characters in the string like n no of char. Thus O(n).
but if we put the length of the string “abcd” in a variable then no matter how long the string be we still can find the length of thestring by looking at the variable len. (len=4).
ex: return 23. no matter what you input is we still have the output as 23.
thus the complexity is O(1). Thus th program will be running in a constant time wrt input size.
for O(log n) - the operations are happening in logarithmic steps.
https://drive.google.com/file/d/0B7eUOnXKVyeERzdPUE8wYWFQZlk/view?usp=sharing
Observe the image in the above link. Over here we can see the bended line(logarithmic line). Here we can say that for smaller inputs the O(log n) notation works good as the time taken is less as we can see in the bended line but when the input grows the linear notation i.e O(n) is considered as better way.
There are also the best and worst case scenarios to be seen. Like the above example.
You can also refer to this cheat for the algorithms: http://bigocheatsheet.com/

It was already mentioned above, but basically keep it simple with a binary tree. You know a binary tree has log N complexity. So you already know what algorithm you need to use.
All you have to do is to come up with a tree node structure and use binary tree insertion algorithm to find next available date:
A possible one:
The tree node has two attributes: d (date of the concert) and d+5 (end date for the blocking period of 5 days). Again to keep it simple, use a timestamp for the two date attributes.
Now it is trivial to find next available date by using binary tree inorder insertion algorithm with initial condition of root = null.

Why not try to use Union-Find? You can group each concert day + the next 5 days as part of one set and then perform a FIND on the given day which would return the next set ID which would be your next concert date.
If implemented using a tree, this gives a O(log n) time complexity.

Related

An algorithm to calculate foreign residence without enumerating days?

The visa conditions for a country to which I travel frequently include this restriction:
"You may reside in [country] for a maximum of 90 days in any period of 180"
Given a tentative list of pairs of dates (entry and exit dates), is there an algorithm that can tell me, for each visit, whether I will be in or out of compliance, and by how many days?
Clearly, one way to do this would be to build a large array of individual days and then slide a 180-day window along it, counting residence days. But I'm wondering whether there is a more elegant method which doesn't involve building a great long list of days.
The normal algorithm for this is basically a greedy algorithm, though it could also be seen as a 1D dynamic progamming algorithm. Basically, rather than sliding the window 1 day at a time, you slide it 1 starting-date at a time. Like so:
first_interval = 0
last_interval = 0
for first_interval = 0 to N:
# include more intervals as long as they (partially) fit within 180 days after the first one
while last_interval < N and A[last_interval].start - A[first_interval].start < 180:
last_interval += 1
calculate total number of days in intervals, possibly clipping the last one
The need to clip the last interval makes it a bit less elegant than it could otherwise be: in similar algorithms, rather than summing the total each time, you add to it for added-on intervals (when incrementing last_interval) and subtract from it for left-behind intervals (when incrementing first_interval). You could do something kind of similar here with the second-to-last interval, but unless you're in a serious performance bind it's probably best not to.
The following C++ code calculates the duration between two arbitrary dates no earlier than Jan 1, 1 A.D. in O(1) time. Is this what you're looking for?
#include <iostream>
using namespace std;
int days(int y,int m,int d){
int i,s,n[13]={0,31,28,31,30,31,30,31,31,30,31,30,31};
y+=(m-1)/12; m=(m-1)%12+1; if(m<1) {y--; m+=12;}
y--; s=y*365+y/4-y/100+y/400; y++;
if(y%4==0 && y%100 || y%400==0) n[2]++;
for(i=1;i<m;i++) s+=n[i]; s+=d; return s;
}
int main(){
cout<<days(2017,8,14)-days(2005,2,28)<<endl;
return 0;
}
You can use the days() function to map all dates of entry and exit to integers and then use the Sneftel's algorithm.

How to find the minimum value of M?

I'm trying to solve this problem:
You have N relatives. You will talk to ith relative for exactly Ti
minutes. Each minute costs you 1 dollar . After the conversation,
they will add a recharge of Xi dollars in your mobile. Initially, you
have M dollars balance in your mobile phone.
Find the minimum value of M, that you must have initially, in your
phone, so that you don't run out of balance during any of the call
(encounter negative balance).
Note : You can call relatives in any order. Each relative will be
called exactly once.
Input:
N
T1 X1
T2 X2
2
1 1
2 1
Output:
2
This looks easy to me at first but I'm not able to find the exact solution.
My Initial thoughts:
We have no problem where Xi > Ti as it will not reduce our initial
balance. We need to take care of situation where where we will run
into loss i.e Ti > Xi.
But I am unable to make expression which will result in minimum
initial value.
Need guidance in approaching this problem to find optimal solution.
UPDATE:-
Binary Search approach seems to lead to wrong result (as proved by the
test case provided in the comment below by user greybeard.
So, this is another approach.We maintain the difference between call cost
and recharge amount.
Then we maintain two arrays/vectors.
If our recharge amount is strictly greater than cost of call, we put
the call in the first array ,else we put it in the second array.
Then we can sort the first array according to the cost and the second array
according to the recharge amount. We then update the diff by adding the
least amount of recharge from the call where our cost is greater than recharge
Then we can iterate through our first array and update our max
requirement,requirement for each call and current balance.Finally, our answer
will be the maximum between max requirement and the diff we have maintained.
Example :-
N = 2
T1 = 1 R1 = 1
T2 = 2 R2 = 1
Our first array contains nothing as all the calls have cost greater than
or equal to recharge amount. So, we place both calls in our second array
The diff gets updated to 2 before we sort the array. Then, we add the min
recharge we can get from the calls to our diff(i.e 1).Now, the diff stands
at 3.Then as our first array contains no elements, our answer is equal to
the diff i.e 3.
Time Complexity :- O(nlogn)
Working Example:-
#include<bits/stdc++.h>
using namespace std;
#define MAXN 100007
int n,diff;
vector<pair<int,int> > v1,v2;
int main(){
diff = 0;
cin>>n;
for(int i=0;i<n;i++){
int cost,recharge;
cin>>cost>>recharge;
if(recharge > cost){
v1.push_back(make_pair(cost,recharge));
}else{
v2.push_back(make_pair(recharge,cost));
}
diff += (cost-recharge);
}
sort(v1.begin(), v1.end());
sort(v2.begin(), v2.end());
if(v2.size() > 0)diff += v2[0].first;
int max_req = diff, req = 0,cur = 0;
for(int i=0; i<v1.size(); i++){
req = v1[i].first - cur;
max_req = max(max_req, req);
cur += v1[i].second-v1[i].first;
}
cout<<max(max_req,diff)<<endl;
return 0;
}
(This is a wiki post: you are invited to edit, and don't need much reputation to do so without involving a moderator.)
Working efficiently means accomplishing the task at hand, with no undue effort. Aspects here:
the OP asks for guidance in approaching this problem to find optimal solution - not for a solution (as this entirely similar, older question does).
the problem statement asks for the minimum value of M - not an optimal order of calls or how to find that.
To find the minimum balance initially required, categorise the relatives/(T, X)-pairs/calls (the order might have a meaning, if not for the problem as stated)
T < X Leaves X-T more for calls to follow. Do in order of increasing cost.
Start assuming an initial balance of 1. For each call, if you can afford it, subtract its cost, add its refund and be done accounting for it. If you can't afford it (yet), put it on hold/the back burner/in a priority queue. At the end of "rewarding calls", remove each head of the queue in turn, accounting for necassary increases in intitial balance.
This part ends with a highest balance, yet.
T = X No influence on any other call. Just do at top balance, in any order.
The top balance required for the whole sequence can't be lower than the cost of any single call, including these.
T > X Leaves T-X less for subsequent calls. Do in order of decreasing refund.
(This may, as any call, go to a balance of zero before refund.
As order of calls does not change the total cost, the ones requiring the least initial balance will be those yielding the lowest final one. For the intermediate balance required by this category, don't forget that least refund.)
Combine the requirements from all categories.
Remember the request for guidance.

Interview q: Data structure and algorithm for O(1) retrieval of avg. response time in client server architecture

Intv Q:
In a client-server architecture, there are multiple requests from multiple clients to the server. The server should maintain the response times of all the requests in the previous hour. What data structure and algo will be used for this? Also, the average response time needs to be maintained and has to be retrieved in O(1).
My take:
algo: maintain a running mean
mean = mean_prev *n + current_response_time
-------------------------------
n+1
DS: a set (using order statistic tree).
My question is whether there is a better answer. I felt that my answer is very trivial and the answer to the questions(in the interview) before this one and after this one where non trivial.
EDIT:
Based on what amit suggested:
cleanup()
while(queue.front().timestamp-curr_time > 1hr)
(timestamp,val)=queue.pop();
sum=sum-val
n=n-1;
insert(timestamp,value)
queue.push(timestamp,value);
sum=sum+val
n=n+1;
cleanup();
query_average()
cleanup();
return sum/n;
And if we can ensure that cleanup() is triggered once every hour or half an hour, then query_average() will not take very long. But if someone were to implement timer trigger for a function call, how would they do it?
The problem with your solution is it only takes the total average since the beginning of time, and not for the last one hour, as you supposed to.
To do so, you need to maintain 2 variables and a queue of entries (timestamp,value).
The 2 variables will be n (the number of elements that are relevant to the last hours) and sum - the sum of the elements from the last hour.
When a new element arrives:
queue.add(timestamp,value)
sum = sum + value
n = n+1
When you have a query for average:
while (queue.front().timestamp > currentTimeAtamp() - 1 hour):
(timestamp,value) = queue.pop()
sum = sum - value
n = n-1
return sum/n
Note that the above is still O(1) on average, because for every insertion to the queue - you do exactly one deletion. You might add the above loop to the insertion procedure as well.

Divide a group of people into two disjoint subgroups (of arbitrary size) and find some values

As we know from programming, sometimes a slight change in a problem can
significantly alter the form of its solution.
Firstly, I want to create a simple algorithm for solving
the following problem and classify it using bigtheta
notation:
Divide a group of people into two disjoint subgroups
(of arbitrary size) such that the
difference in the total ages of the members of
the two subgroups is as large as possible.
Now I need to change the problem so that the desired
difference is as small as possible and classify
my approach to the problem.
Well,first of all I need to create the initial algorithm.
For that, should I make some kind of sorting in order to separate the teams, and how am I suppose to continue?
EDIT: for the first problem,we have ruled out the possibility of a set being an empty set. So all we have to do is just a linear search to find the min age and then put it in a set B. SetA now has all the other ages except the age of setB, which is the min age. So here is the max difference of the total ages of the two sets, as high as possible
The way you described the first problem, it is trivial in the way that it requires you to find only the minimum element (in case the subgroups should contain at least 1 member), otherwise it is already solved.
The second problem can be solved recursively the pseudo code would be:
// compute sum of all elem of array and store them in sum
min = sum;
globalVec = baseVec;
fun generate(baseVec, generatedVec, position, total)
if (abs(sum - 2*total) < min){ // check if the distribution is better
min = abs(sum - 2*total);
globalVec = generatedVec;
}
if (position >= baseVec.length()) return;
else{
// either consider elem at position in first group:
generate(baseVec,generatedVec.pushback(baseVec[position]), position + 1, total+baseVec[position]);
// or consider elem at position is second group:
generate(baseVec,generatedVec, position + 1, total);
}
And now just start the function with generate(baseVec,"",0,0) where "" stand for an empty vector.
The algo can be drastically improved by applying it to a sorted array, hence adding a test condition to stop branching, but the idea stays the same.

What is the difference between Linear search and Binary search?

What is the difference between Linear search and Binary search?
A linear search looks down a list, one item at a time, without jumping. In complexity terms this is an O(n) search - the time taken to search the list gets bigger at the same rate as the list does.
A binary search is when you start with the middle of a sorted list, and see whether that's greater than or less than the value you're looking for, which determines whether the value is in the first or second half of the list. Jump to the half way through the sublist, and compare again etc. This is pretty much how humans typically look up a word in a dictionary (although we use better heuristics, obviously - if you're looking for "cat" you don't start off at "M"). In complexity terms this is an O(log n) search - the number of search operations grows more slowly than the list does, because you're halving the "search space" with each operation.
As an example, suppose you were looking for U in an A-Z list of letters (index 0-25; we're looking for the value at index 20).
A linear search would ask:
list[0] == 'U'? No.
list[1] == 'U'? No.
list[2] == 'U'? No.
list[3] == 'U'? No.
list[4] == 'U'? No.
list[5] == 'U'? No.
...
list[20] == 'U'? Yes. Finished.
The binary search would ask:
Compare list[12] ('M') with 'U': Smaller, look further on. (Range=13-25)
Compare list[19] ('T') with 'U': Smaller, look further on. (Range=20-25)
Compare list[22] ('W') with 'U': Bigger, look earlier. (Range=20-21)
Compare list[20] ('U') with 'U': Found it! Finished.
Comparing the two:
Binary search requires the input data to be sorted; linear search doesn't
Binary search requires an ordering comparison; linear search only requires equality comparisons
Binary search has complexity O(log n); linear search has complexity O(n) as discussed earlier
Binary search requires random access to the data; linear search only requires sequential access (this can be very important - it means a linear search can stream data of arbitrary size)
Think of it as two different ways of finding your way in a phonebook. A linear search is starting at the beginning, reading every name until you find what you're looking for. A binary search, on the other hand, is when you open the book (usually in the middle), look at the name on top of the page, and decide if the name you're looking for is bigger or smaller than the one you're looking for. If the name you're looking for is bigger, then you continue searching the upper part of the book in this very fashion.
A linear search works by looking at each element in a list of data until it either finds the target or reaches the end. This results in O(n) performance on a given list.
A binary search comes with the prerequisite that the data must be sorted. We can leverage this information to decrease the number of items we need to look at to find our target. We know that if we look at a random item in the data (let's say the middle item) and that item is greater than our target, then all items to the right of that item will also be greater than our target. This means that we only need to look at the left part of the data. Basically, each time we search for the target and miss, we can eliminate half of the remaining items. This gives us a nice O(log n) time complexity.
Just remember that sorting data, even with the most efficient algorithm, will always be slower than a linear search (the fastest sorting algorithms are O(n * log n)). So you should never sort data just to perform a single binary search later on. But if you will be performing many searches (say at least O(log n) searches), it may be worthwhile to sort the data so that you can perform binary searches. You might also consider other data structures such as a hash table in such situations.
A linear search starts at the beginning of a list of values, and checks 1 by 1 in order for the result you are looking for.
A binary search starts in the middle of a sorted array, and determines which side (if any) the value you are looking for is on. That "half" of the array is then searched again in the same fashion, dividing the results in half by two each time.
Make sure to deliberate about whether the win of the quicker binary search is worth the cost of keeping the list sorted (to be able to use the binary search). I.e. if you have lots of insert/remove operations and only an occasional search the binary search could in total be slower than the linear search.
Try this: Pick a random name "Lastname, Firstname" and look it up in your phonebook.
1st time: start at the beginning of the book, reading names until you find it, or else find the place where it would have occurred alphabetically and note that it isn't there.
2nd time: Open the book at the half way point and look at the page. Ask yourself, should this person be to the left or to the right. Whichever one it is, take that 1/2 and find the middle of it. Repeat this procedure until you find the page where the entry should be and then either apply the same process to columns, or just search linearly along the names on the page as before.
Time both methods and report back!
[also consider what approach is better if all you have is a list of names, not sorted...]
Linear search also referred to as sequential search looks at each element in sequence from the start to see if the desired element is present in the data structure. When the amount of data is small, this search is fast.Its easy but work needed is in proportion to the amount of data to be searched.Doubling the number of elements will double the time to search if the desired element is not present.
Binary search is efficient for larger array. In this we check the middle element.If the value is bigger that what we are looking for, then look in the first half;otherwise,look in the second half. Repeat this until the desired item is found. The table must be sorted for binary search. It eliminates half the data at each iteration.Its logarithmic.
If we have 1000 elements to search, binary search takes about 10 steps, linear search 1000 steps.
binary search runs in O(logn) time whereas linear search runs in O(n) times thus binary search has better performance
A linear search looks down a list, one item at a time, without jumping. In complexity terms this is an O(n) search - the time taken to search the list gets bigger at the same rate as the list does.
A binary search is when you start with the middle of a sorted list, and see whether that's greater than or less than the value you're looking for, which determines whether the value is in the first or second half of the list. Jump to the half way through the sublist, and compare again etc. This is pretty much how humans typically look up a word in a dictionary (although we use better heuristics, obviously - if you're looking for "cat" you don't start off at "M"). In complexity terms this is an O(log n) search - the number of search operations grows more slowly than the list does, because you're halving the "search space" with each operation.
Linear Search looks through items until it finds the searched value.
Efficiency: O(n)
Example Python Code:
test_list = [1, 3, 9, 11, 15, 19, 29]
test_val1 = 25
test_val2 = 15
def linear_search(input_array, search_value):
index = 0
while (index < len(input_array)) and (input_array[index] < search_value):
index += 1
if index >= len(input_array) or input_array[index] != search_value:
return -1
return index
print linear_search(test_list, test_val1)
print linear_search(test_list, test_val2)
Binary Search finds the middle element of the array. Checks that middle value is greater or lower than the search value. If it is smaller, it gets the left side of the array and finds the middle element of that part. If it is greater, gets the right part of the array. It loops the operation until it finds the searched value. Or if there is no value in the array finishes the search.
Efficiency: O(logn)
Example Python Code:
test_list = [1, 3, 9, 11, 15, 19, 29]
test_val1 = 25
test_val2 = 15
def binary_search(input_array, value):
low = 0
high = len(input_array) - 1
while low <= high:
mid = (low + high) / 2
if input_array[mid] == value:
return mid
elif input_array[mid] < value:
low = mid + 1
else:
high = mid - 1
return -1
print binary_search(test_list, test_val1)
print binary_search(test_list, test_val2)
Also you can see visualized information about Linear and Binary Search here: https://www.cs.usfca.edu/~galles/visualization/Search.html
For a clear understanding, please take a look at my codepen implementations https://codepen.io/serdarsenay/pen/XELWqN
Biggest difference is the need to sort your sample before applying binary search, therefore for most "normal sized" (meaning to be argued) samples will be quicker to search with a linear search algorithm.
Here is the javascript code, for html and css and full running example please refer to above codepen link.
var unsortedhaystack = [];
var haystack = [];
function init() {
unsortedhaystack = document.getElementById("haystack").value.split(' ');
}
function sortHaystack() {
var t = timer('sort benchmark');
haystack = unsortedhaystack.sort();
t.stop();
}
var timer = function(name) {
var start = new Date();
return {
stop: function() {
var end = new Date();
var time = end.getTime() - start.getTime();
console.log('Timer:', name, 'finished in', time, 'ms');
}
}
};
function lineerSearch() {
init();
var t = timer('lineerSearch benchmark');
var input = this.event.target.value;
for(var i = 0;i<unsortedhaystack.length - 1;i++) {
if (unsortedhaystack[i] === input) {
document.getElementById('result').innerHTML = 'result is... "' + unsortedhaystack[i] + '", on index: ' + i + ' of the unsorted array. Found' + ' within ' + i + ' iterations';
console.log(document.getElementById('result').innerHTML);
t.stop();
return unsortedhaystack[i];
}
}
}
function binarySearch () {
init();
sortHaystack();
var t = timer('binarySearch benchmark');
var firstIndex = 0;
var lastIndex = haystack.length-1;
var input = this.event.target.value;
//currently point in the half of the array
var currentIndex = (haystack.length-1)/2 | 0;
var iterations = 0;
while (firstIndex <= lastIndex) {
currentIndex = (firstIndex + lastIndex)/2 | 0;
iterations++;
if (haystack[currentIndex] < input) {
firstIndex = currentIndex + 1;
//console.log(currentIndex + " added, fI:"+firstIndex+", lI: "+lastIndex);
} else if (haystack[currentIndex] > input) {
lastIndex = currentIndex - 1;
//console.log(currentIndex + " substracted, fI:"+firstIndex+", lI: "+lastIndex);
} else {
document.getElementById('result').innerHTML = 'result is... "' + haystack[currentIndex] + '", on index: ' + currentIndex + ' of the sorted array. Found' + ' within ' + iterations + ' iterations';
console.log(document.getElementById('result').innerHTML);
t.stop();
return true;
}
}
}

Resources