Student Council Election - algorithm

Student council elections work in an odd manner. Each candidate is assigned a unique
identification number. The university is divided into five zones, and each zone proposes a
list of candidates that it would like to nominate to the Council. Any candidate who is
proposed by three or more zones is elected. There is no lower or upper limit on the
size of the Council. Design an algorithm that takes the proposed lists of candidates from all five
zones as input (each list in sorted order) and calculates how many candidates are elected to the
Council. Illustrate your algorithm on the following example:
Suppose the candidates proposed by the five zones are:
Zone 1: [5,12,15,62,87]
Zone 2: [7,14,48,62,87,92]
Zone 3: [5,12,14,87]
Zone 4: [12,17,49,52,92,98]
Zone 5: [5,12,14,87,92]
I think the hint here is the sorted order, but I couldn't find a way to approach this problem. If anyone comes up with a solution, please post it. Thank you.

I have a simple idea.
Initialize a HashMap(key, value) with the key representing the candidate id and the value representing the number of zones that proposed the candidate.
Loop over each element of each zone.
If the element has not appeared yet, insert a new (key, value) pair with value 1; otherwise increase its value by 1.
Finally, scan the hashmap and elect every key whose value is 3 or greater.
So, you can follow my pseudocode:
Hashmap map = new HashMap<int, int>()
ForEach z in ZoneLine
    ForEach e in z
        If map.containsKey(e)
            map[e] += 1
        Else
            map[e] = 1
elected = 0
ForEach key in map
    If map[key] >= 3
        elected += 1
Return elected
Hope this helps.
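Since the question asks for a count, here is a minimal runnable Python sketch of the same counting approach (the function and variable names are my own; the set() call guards against a candidate appearing twice in one zone's list, which the statement does not explicitly rule out):

from collections import Counter

def count_elected(zones, threshold=3):
    # count in how many zone lists each candidate id appears
    counts = Counter()
    for zone in zones:
        counts.update(set(zone))
    # a candidate is elected when proposed by `threshold` or more zones
    return sum(1 for votes in counts.values() if votes >= threshold)

zones = [
    [5, 12, 15, 62, 87],
    [7, 14, 48, 62, 87, 92],
    [5, 12, 14, 87],
    [12, 17, 49, 52, 92, 98],
    [5, 12, 14, 87, 92],
]
print(count_elected(zones))  # 5: candidates 5, 12, 14, 87 and 92 each appear in 3+ zones

The sorted order would also permit a five-way merge that counts runs of equal ids in one pass with O(1) extra space, but the hashmap version is the simplest to get right.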

Related

Maximum profit earned in Interval

We have to rent our car to customers. We are given a list in which each element's first entry is the time at which the car is handed over, the second is the time at which it is returned, and the third is the profit earned from that rental. I need to find the maximum profit that can be earned.
Eg:
( [1,2,20], [3,6,15], [2,8,25], [7,12,18], [13,31,22] )
The maximum profit earned is 75: [1,2,20] + [3,6,15] + [7,12,18] + [13,31,22].
Intervals can overlap. We need to select the subset that maximizes our profit.
Assuming you have only one car, the problem we are solving is Weighted Interval Scheduling.
Let us assume we have intervals I0, I1, I2, ..., In-1, where interval Ii is (si, ti, pi).
Algorithm:
First sort the intervals on the basis of their starting points si.
Create an array for dynamic programming, where MaxProfit[i] represents the maximum profit you can make from intervals Ii, Ii+1, ..., In-1. Initialise the last value:
MaxProfit[n-1] = profit_of_(n-1)
Then using DP we can find the maximum profit as:
a. Either we can ignore the given interval; in this case our maximum profit will be the maximum profit we can gain from the remaining intervals:
MaxProfit[i+1]
b. Or we can include this interval; in this case the maximum profit can be written as
profit_of_i + MaxProfit[r]
where r is the index of the next interval such that sr > ti (found by binary search, since the intervals are sorted by starting point).
So our overall DP becomes
MaxProfit[i] = max{ MaxProfit[i+1], profit_of_i + MaxProfit[r] }
Return the value of MaxProfit[0].
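A minimal Python sketch of this DP (my own naming; dp[i] plays the role of MaxProfit[i], and bisect finds the index r described above):

import bisect

def max_profit(intervals):
    # intervals: list of (start, end, profit); assumes a single car
    intervals = sorted(intervals, key=lambda iv: iv[0])  # sort by start time s_i
    starts = [s for s, _, _ in intervals]
    n = len(intervals)
    dp = [0] * (n + 1)  # dp[i] = best profit achievable from intervals i..n-1
    for i in range(n - 1, -1, -1):
        s, t, p = intervals[i]
        r = bisect.bisect_right(starts, t)  # first interval with s_r > t_i
        dp[i] = max(dp[i + 1], p + dp[r])   # either skip interval i, or take it
    return dp[0]

print(max_profit([[1, 2, 20], [3, 6, 15], [2, 8, 25], [7, 12, 18], [13, 31, 22]]))  # 75

With the binary search, the whole thing runs in O(n log n).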
Use something like dynamic programming.
First, sort by the first element.
Keep two rows: the first holds the most that can be earned if the current interval is used, the second the most that can be earned if it is not.
Then place each task in its period of time and decide, for each part of the timeline, whether including it is the better choice.
Note that if no intervals overlap, we simply choose all of them.

Find timestamp given K best candidates

So I was asked a weird inversion of the K best candidates problem. The normal problem is as follows.
Given a list of 'votes' which are tuples of timestamps and candidates like below:
(111111, Clinton)
(111111, Bush)
...
Return the top K candidates with the most votes.
It's a typical problem, and the solution is to use a hashmap of candidate -> votes within the timestamp bound, and also to build a min-heap of size K where the top of the heap is the candidate that is vulnerable to being ejected from the K best candidates.
In the end you return the heap.
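A minimal Python sketch of that typical version (my naming; assumes votes is a list of (timestamp, candidate) tuples, and that ties may come back in arbitrary order):

import heapq
from collections import Counter

def top_k_candidates(votes, curr_time, k):
    # tally every vote cast at or before curr_time
    tally = Counter(cand for ts, cand in votes if ts <= curr_time)
    # heapq.nlargest maintains a size-k min-heap internally, as described above
    return [cand for cand, _ in heapq.nlargest(k, tally.items(), key=lambda kv: kv[1])]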
But I was asked at the end: given a list of K candidates, return the timestamp at which these were the K best candidates. I'm not sure I'm recalling the question 100% correctly, because it would have to be either the first occurrence of these K candidates as the best, or I would have been given their vote tally.
If I understand everything, votes is a list of vote tuples, each made up of the candidate being voted for and the timestamp at which the vote took place. currTime is a timestamp, and the tally at currTime includes all votes cast at that timestamp or before it. topCandidates are the candidates with the highest vote counts at currTime.
Your first question gives you votes and currTime, and you are expected to return topCandidates. Your second question gives you votes and topCandidates, and you are expected to return currTime.
Focusing on the second question: I would build a map whose keys are timestamps and whose values are all votes cast at that moment, plus a second map whose keys are candidates and whose values are their running vote counts. I would traverse the first map in ascending timestamp order, take all votes cast at each timestamp, and increment the second map's counts for the corresponding candidates. After processing each timestamp, I would build the list of most-voted-for candidates from the second map; if that list matches topCandidates, the timestamp just processed is currTime.
To code this in python:
from collections import Counter, defaultdict

def findCurrTime(votes, topCandidates):
    if not (votes and topCandidates):
        return -1
    votesAtTime = defaultdict(list)
    candidatePoll = Counter()
    k = len(topCandidates)
    for time, candidate in votes:  # votes = [(time0, candidate0), ...]
        votesAtTime[time].append(candidate)
    for ts in sorted(votesAtTime):  # ascending timestamp order
        candidatePoll += Counter(votesAtTime[ts])
        if list(map(lambda pair: pair[0], candidatePoll.most_common(k))) == topCandidates:
            return ts
    # if topCandidates cannot be created from these votes:
    return -1
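For example, with some made-up votes (hypothetical data):

votes = [(1, 'A'), (2, 'B'), (2, 'B'), (3, 'C')]
print(findCurrTime(votes, ['B']))  # 2: after timestamp 2 the tally is {'B': 2, 'A': 1}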
There are some assumptions that I've made (that you hopefully asked your interviewer about). I assumed that the order of topCandidates matters, which Counter.most_common handles, although it won't break ties between candidates with the same number of votes.
The time complexity is O(t * n * log(k)), with t being the number of timestamps, n being the number of distinct candidates, and k being the size of topCandidates. This is because Counter.most_common looks to be O(n * log(k)) and it can run t times. There are definitely more efficient answers, though.

Place closest students as far as possible

You are in charge of a classroom which has n seats in a single row, numbered 0 through n-1.
During the day, students enter and leave the classroom for the exam.
In order to minimize cheating, your task is to seat all incoming students efficiently.
You're given 2 types of queries: add_student(student_id) -> seat index and remove_student(student_id) -> void
The rules for seating a student are the following:
The seat must be unoccupied
The closest student must be as far away as possible
Ties can be resolved by choosing the lowest-numbered seat.
My approach is to use a hashtable, since we have to assign a seat index to each student, which we could do using a hash function. Is this approach correct?
If a hashtable is the right approach, then how should I design an efficient hash function for 'the closest student must be as far away as possible'?
Is there any better way to solve this problem?
This seemed like an interesting problem to me, so here is an algorithm I came up with after giving it some thought.
As the question says, it's a single row of seats numbered from 0 to n-1.
I was thinking of a greedy approach to this problem.
Here is how it goes.
The person coming first is seated at 0.
The person coming second is seated at n-1, as this creates the farthest distance between them.
Now we start keeping track of segments, so we have one segment: (0, n-1).
When the third person comes in, they get seated at the midpoint of that segment, i.e. (0 + n-1)/2. After this we have two segments: (0, (n-1)/2) and ((n-1)/2, n-1).
When the fourth person arrives, they can sit at the midpoint of either (0, (n-1)/2) or ((n-1)/2, n-1).
And this flow goes on.
When a person leaves their seat and goes out of the class, our segment ranges change, and for the next incoming person we compute a new set of segments.
Hope this helps. For this approach, I think an array of size n will also work nicely.
Update: this solution does not work, because students can leave in any order and open gaps.
So here's the solution I came up with. I would create an initial position array, so the position of the incoming student is just position[current_number_of_students - 1].
Creating the array is the tricky part.
def position(number_of_chairs)
  # handle edge cases
  return [] if number_of_chairs.nil? || number_of_chairs <= 0
  return [0] if number_of_chairs == 1
  # initialize with the first chair and the last chair
  position = [0, number_of_chairs - 1]
  # We want to start adding the middles of row segments,
  # but in the correct order.
  # The first segment is going to be the entire row.
  segments = [[0, number_of_chairs - 1]]
  while !segments.empty?
    current_segment = segments.shift
    mid = (current_segment[0] + current_segment[1]) / 2
    if mid > current_segment[0] && mid < current_segment[1]
      position << mid
      # add the bottom half to the queue
      segments.push([current_segment[0], mid])
      # add the top half to the queue
      segments.push([mid, current_segment[1]])
    end
  end
  position
end
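Since the precomputed order breaks down once students can leave and open gaps (see the update above), here is a Python sketch that instead recomputes the best seat from the currently occupied seats on every add, at O(n) per operation. All names are my own, and it assumes the room never overfills:

import bisect

class Classroom:
    def __init__(self, n):
        self.n = n
        self.occupied = []  # sorted seat indices
        self.seat_of = {}   # student_id -> seat

    def add_student(self, student_id):
        if not self.occupied:
            best = 0  # empty room: every seat ties, take the lowest
        else:
            # candidate 1: seat 0, whose nearest student sits at occupied[0]
            best, best_dist = 0, self.occupied[0]
            # candidate 2: the middle of every gap between occupied seats
            for a, b in zip(self.occupied, self.occupied[1:]):
                d = (b - a) // 2
                if d > best_dist:  # strict '>' keeps the lowest seat on ties
                    best, best_dist = a + d, d
            # candidate 3: seat n-1, whose nearest student sits at occupied[-1]
            d = self.n - 1 - self.occupied[-1]
            if d > best_dist:
                best, best_dist = self.n - 1, d
        bisect.insort(self.occupied, best)
        self.seat_of[student_id] = best
        return best

    def remove_student(self, student_id):
        self.occupied.remove(self.seat_of.pop(student_id))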

Divide a group of people into two disjoint subgroups (of arbitrary size) and find some values

As we know from programming, sometimes a slight change in a problem can
significantly alter the form of its solution.
Firstly, I want to create a simple algorithm for solving
the following problem and classify it using bigtheta
notation:
Divide a group of people into two disjoint subgroups
(of arbitrary size) such that the
difference in the total ages of the members of
the two subgroups is as large as possible.
Now I need to change the problem so that the desired
difference is as small as possible and classify
my approach to the problem.
Well, first of all I need to create the initial algorithm.
For that, should I do some kind of sorting in order to separate the teams, and how am I supposed to continue?
EDIT: for the first problem, we have ruled out the possibility of a subgroup being empty. So all we have to do is a linear search to find the minimum age and put that person in set B. Set A then holds all the other ages, everything except the minimum age in set B. This gives the maximum possible difference between the total ages of the two sets.
The way you described the first problem, it is trivial: it only requires you to find the minimum element (in case each subgroup should contain at least one member); otherwise it is already solved, since one subgroup may simply be left empty.
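As a tiny Python sketch of that observation (my naming; assumes non-negative ages and that both subgroups must be non-empty):

def max_difference(ages):
    # put the youngest person alone in one subgroup; everyone else in the other
    return sum(ages) - 2 * min(ages)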
The second problem can be solved recursively; the pseudocode would be:
// compute the sum of all elements of the array and store it in sum
min = sum;
globalVec = baseVec;
fun generate(baseVec, generatedVec, position, total)
    if (abs(sum - 2*total) < min) { // check if this distribution is better
        min = abs(sum - 2*total);
        globalVec = generatedVec;
    }
    if (position >= baseVec.length()) return;
    else {
        // either put the element at position in the first group:
        generate(baseVec, generatedVec.pushback(baseVec[position]), position + 1, total + baseVec[position]);
        // or put the element at position in the second group:
        generate(baseVec, generatedVec, position + 1, total);
    }
Now just start the function with generate(baseVec, "", 0, 0), where "" stands for an empty vector.
The algorithm can be drastically improved by applying it to a sorted array and adding a test condition to stop branching early, but the idea stays the same.
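Since this is the classic partition problem, here is a minimal Python sketch of a dynamic-programming alternative that avoids the exponential branching, assuming integer ages. It runs in O(len(ages) * sum(ages)); note it permits an empty subgroup, so exclude the sums 0 and total if both groups must be non-empty:

def min_difference(ages):
    # track every subset sum achievable with the ages seen so far
    total = sum(ages)
    achievable = {0}
    for a in ages:
        achievable |= {s + a for s in achievable}
    # a subset with sum s leaves total - s for the other group
    return min(abs(total - 2 * s) for s in achievable)

print(min_difference([25, 30, 45, 18, 22]))  # 0: {25, 45} vs {30, 18, 22}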

Algorithm For Ranking Items

I have a list of 6500 items that I would like to trade or invest in. (Not for real money, but for a certain game.) Each item has 5 numbers that will be used to rank it among the others.
Total quantity of item traded per day: The higher this number, the better.
The Donchian Channel of the item over the last 5 days: The higher this number, the better.
The median spread of the price: The lower this number, the better.
The spread of the 20 day moving average for the item: The lower this number, the better.
The spread of the 5 day moving average for the item: The higher this number, the better.
All 5 numbers have the same 'weight'; in other words, they should all affect the final number with the same worth or value.
At the moment, I just multiply all 5 numbers for each item, but it doesn't rank the items the way I would like them to be ranked. I just want to combine all 5 numbers into one weighted number that I can use to rank all 6500 items, but I'm unsure how to do this correctly or mathematically.
Note: the total quantity of the item traded per day and the Donchian channel are numbers that are much higher than the spreads, which are more like percentages. This is probably why multiplying them all together didn't work for me: the quantity traded per day and the Donchian channel had a much bigger role in the final number.
The reason people are having trouble answering this question is that we have no way of comparing two different "attributes". If there were just two attributes, say quantity traded and median price spread, would (20 million, 50%) be worse or better than (100, 1%)? Only you can decide this.
Converting everything into numbers of the same scale could help; this is what is known as "normalisation". A good way of doing this is the z-score which Prasad mentions. This is a statistical concept, looking at how the quantity varies. You need to make some assumptions about the statistical distributions of your numbers to use it.
Things like spreads are probably normally distributed - shaped like a normal distribution. For these, as Prasad says, take z(spread) = (spread - mean(spreads)) / standardDeviation(spreads).
Things like the quantity traded might follow a power-law distribution. For these you might want to take the log() before calculating the mean and sd. That is, the z-score is z(qty) = (log(qty) - mean(log(quantities))) / sd(log(quantities)).
Then just add up the z-scores for each attribute.
To do this for each attribute you will need to have an idea of its distribution. You could guess, but the best way is to plot a graph and have a look. You might also want to plot graphs on log scales. See Wikipedia for a long list of distributions.
You can replace each attribute-vector x (of length N = 6500) by the z-score of the vector Z(x), where
Z(x) = (x - mean(x))/sd(x).
This would transform them onto the same "scale", and then you can add up the Z-scores (with equal weights) to get a final score and rank the N = 6500 items by this total score. If you can find in your problem some other attribute-vector that would be an indicator of "goodness" (say, the 10-day return of the security?), then you could fit a regression model of this predicted attribute against these z-scored variables to figure out the best non-uniform weights.
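A minimal Python sketch of this z-scoring scheme (the log transform for quantity, the attribute order, and all names are my assumptions based on the discussion above; the sign pattern matches the a + b - c - d + e score suggested further down):

import math
from statistics import mean, stdev

def zscore(xs):
    # standardize one attribute column: (x - mean) / standard deviation
    m, s = mean(xs), stdev(xs)  # assumes the column actually varies (s > 0)
    return [(x - m) / s for x in xs]

def score_items(items):
    # items[i] = (qty, donchian, median_spread, spread_20d, spread_5d)
    qty, donch, med_sp, sp20, sp5 = zip(*items)
    z_qty = zscore([math.log(q) for q in qty])  # assumed power-law: z-score the logs
    z_donch, z_med, z_sp20, z_sp5 = (zscore(c) for c in (donch, med_sp, sp20, sp5))
    # flip the sign of the "lower is better" attributes before summing
    return [a + b - c - d + e
            for a, b, c, d, e in zip(z_qty, z_donch, z_med, z_sp20, z_sp5)]

# rank best-first by total score:
# ranking = sorted(range(len(items)), key=lambda i: -score_items(items)[i])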
Start each item with a score of 0. For each of the 5 numbers, sort the list by that number (so that better values come first, i.e. descending for "higher is better" and ascending for "lower is better") and add each item's position in that ordering to its score. Then just sort the items by the combined score, lowest first.
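A sketch of this rank-based scoring in Python (my naming; signs[a] is +1 when higher is better for attribute a and -1 when lower is better):

def rank_scores(items, signs):
    n = len(items)
    score = [0] * n
    for a, sign in enumerate(signs):
        # order the items so that better values of attribute a come first
        order = sorted(range(n), key=lambda i: sign * items[i][a], reverse=True)
        for rank, i in enumerate(order):
            score[i] += rank  # rank 0 is the best for this attribute
    return sorted(range(n), key=lambda i: score[i])  # indices, best item first

One nice property of rank aggregation is that it is immune to the scale differences described in the question, at the cost of throwing away the magnitudes.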
You would usually normalize your data entries to their respective ranges. Since there is no fixed range for them, you'll have to use a sliding range - or, to keep it simpler, normalize them to the daily ranges.
For each day, take all entries of a given type, find the highest and the lowest of them, and determine the difference between them. Let Bottom = the value of the lowest and Range = the difference between highest and lowest. Then calculate for each entry (value - Bottom) / Range, which will result in something between 0.0 and 1.0. These are the numbers you can continue to work with.
Pseudocode (brackets replaced by indentation to make it easier to read):
double maxvalues[5];
double minvalues[5];
// init arrays with any item
for (i = 0; i < 5; i++)
    maxvalues[i] = items[0][i];
    minvalues[i] = items[0][i];
// find minimum and maximum values
foreach (items as item)
    for (i = 0; i < 5; i++)
        if (minvalues[i] > item[i])
            minvalues[i] = item[i];
        if (maxvalues[i] < item[i])
            maxvalues[i] = item[i];
// now scale them - in this case, to the range of 0 to 1
double scaledItems[sizeof(items)][5];
for (i = 0; i < 5; i++)
    double delta = maxvalues[i] - minvalues[i];
    for (j = sizeof(items) - 1; j >= 0; --j)
        scaledItems[j][i] = (items[j][i] - minvalues[i]) / delta;
        // linear normalization
Something like that. It'd be more elegant with a good library (STL, Boost, whatever you have on the implementation platform), and the normalization should be in a separate function, so you can replace it with other variations, like log(), as the need arises.
Total quantity of item traded per day: The higher this number, the better. (a)
The Donchian Channel of the item over the last 5 days: The higher this number, the better. (b)
The median spread of the price: The lower this number, the better. (c)
The spread of the 20 day moving average for the item: The lower this number, the better. (d)
The spread of the 5 day moving average for the item: The higher this number, the better. (e)
a + b -c -d + e = "score" (higher score = better score)