I'm trying to solve this problem:
There is a shuffled list of the n most important events from year 0 to the end of the second millennium (year 2000).
What is the most efficient way to sort the list, given that all events from the same year already appear in the correct relative order (for example, an event from January 2014 comes before an event from February 2014, though they need not be at adjacent indexes)?
I understand I should use a stable algorithm, and that the answer depends on n: if n is bigger than some number I should use radix sort, otherwise counting sort. But I don't know how to show this formally.
I also know that if the range of the keys is much bigger than n, then radix sort is preferable.
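For the counting-sort half of the argument, here is a minimal sketch of a stable counting sort keyed on the year, assuming (hypothetically) that each event exposes a year field in 0..2000. It runs in O(n + 2001), which is the quantity to compare against radix sort's cost when arguing formally; the left-to-right final pass is what makes it stable:

def counting_sort_by_year(events, max_year=2000):
    # Stable counting sort keyed on event.year in 0..max_year.
    counts = [0] * (max_year + 1)
    for e in events:
        counts[e.year] += 1
    # Prefix sums: counts[y] becomes the first output index for year y.
    total = 0
    for y in range(max_year + 1):
        counts[y], total = total, total + counts[y]
    out = [None] * len(events)
    for e in events:  # left-to-right pass is what keeps the sort stable
        out[counts[e.year]] = e
        counts[e.year] += 1
    return out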
I'm doing some practice interview questions and came across this one:
Given a list of integers which represent hedge heights, determine the minimum number of moves to make the hedges pretty - that is, compute the minimum number of changes needed to make the array alternate between increasing and decreasing. For example, [1,6,6,4,4] should return 2, as you need to change the second 6 to something less than 4 and the last 4 to something less than 4 (giving, e.g., [1,6,3,4,1]). Assume the min height is 1 and the max height is 9. You can change a hedge to any number between 1 and 9, and that counts as 1 move regardless of the difference from the current number.
My solution is here: https://repl.it/#plusfuture/GrowlingOtherEquipment
I'm trying to figure out the big-O runtime for this solution, which is memoized recursion. I think it's O(n^3) because for each index I need to check three possible states for the rest of the array: changeUp, noChange, and changeDown. My friend maintains that it's O(n), since I'm memoizing most of the solutions and exiting branches where the array is not "pretty" immediately.
Can someone help me understand how to analyze the runtime for this solution? Thanks.
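I can't see the linked repl, but a memoized recursion for this problem is often structured over the state (index, value chosen at the previous position, direction of the next step). With heights capped at 9, that is at most n * 9 * 2 distinct states with O(9) work each, i.e. O(n) with a large constant; whether a given solution is O(n) or worse comes down to exactly what the memo key is. A hypothetical sketch (function and parameter names are mine, not the poster's code):

from functools import lru_cache

def min_moves(heights):
    # Minimum changes (each to any value 1..9) so the array strictly alternates.
    n = len(heights)
    if n <= 1:
        return 0
    hs = tuple(heights)

    @lru_cache(maxsize=None)
    def solve(i, prev, go_up):
        # Min moves for hs[i:], given the (possibly changed) value at i-1
        # is `prev` and the step into position i must go up (or down).
        if i == n:
            return 0
        best = float("inf")
        for h in range(1, 10):
            if (h > prev) if go_up else (h < prev):
                cost = 0 if h == hs[i] else 1
                best = min(best, cost + solve(i + 1, h, not go_up))
        return best

    return min(
        (0 if first == hs[0] else 1) + solve(1, first, go_up)
        for first in range(1, 10)       # value chosen for position 0
        for go_up in (True, False)      # direction of the first step
    )

print(min_moves([1, 6, 6, 4, 4]))  # 2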
Let's say I have an array of the birth and death years of 100 people, all between the years 1900 and 2000.
ergo:
persons[0][birth] = 1900;
persons[0][death] = 1960;
.
.
.
persons[99][birth] = 1904;
persons[99][death] = 1982;
And I want to find the year in which the most of those people were alive.
The obvious way is to go through each person, compute the years they lived (death - birth), and add those years to an accumulating array using an inner loop. The runtime for that would be O(n^2).
Is there a more efficient way to do this?
The O notation may be a bit misleading sometimes, because you do not have only one variable here. Let the number of people be n, the length of a life y, and the whole interval Y. Precisely evaluated, your solution is not O(n^2) but O(max(n*y, Y)). If you say that Y is 100 and always will be, your solution is O(n), because y <= Y and both are eliminated as constants: you do twice as much work with twice as many people, so it really is linear, just with high constants. Calling it O(n^2) strictly speaking means that the length of a life is proportional to the number of people, which is probably nonsense. Let us say, from now on, that y is equal to Y. Your solution then simplifies to O(n*y).
You can improve your algorithm: store +1 for a birth and -1 for a death in the accumulating array, i.e. only two values per person. Then go through that array once, keeping a running sum, to get the number of people living in every year. The time is O(max(n, y)). That is better than @Aziuth's solution when n*log(n) grows faster than y, but worse otherwise.
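A minimal sketch of that idea, assuming persons is a list of (birth, death) pairs in 1900..2000, and using the convention that the death year itself is not counted as lived:

def year_most_alive(persons, first_year=1900, last_year=2000):
    # persons: list of (birth, death) pairs. One +1/-1 array, one sweep: O(n + y).
    delta = [0] * (last_year - first_year + 1)
    for birth, death in persons:
        delta[birth - first_year] += 1   # +1 for a birth
        delta[death - first_year] -= 1   # -1 for a death
    alive, best_year, best_count = 0, first_year, 0
    for i, d in enumerate(delta):
        alive += d
        if alive > best_count:
            best_count, best_year = alive, first_year + i
    return best_year, best_count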
Create an array that stores pairs of data ([birth or death], year). Every person contributes two such entries. This takes O(n).
Sort this array by year. This takes O(n log n).
Process through that array, keeping the current number of people alive in a variable, increasing and decreasing it accordingly, and store the maximum seen so far. This again takes O(n). (A sketch follows the example below.)
Example:
A: born 1904, died 1960. B: born 1930, died 1960. C: born 1940, died 1950.
-> array of deaths and births starts as (birth, 1904), (death, 1960), ...
-> array is sorted: (birth, 1904), (birth, 1930), (birth, 1940), (death, 1950), (death, 1960), (death, 1960)
-> processing the array: at 1904, one person lives; at 1930, two; at 1940, three (the maximum); at 1950, two; at 1960, none.
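A sketch of those three steps, with the tie-breaking assumption that a birth in a given year is processed before a death in the same year:

def max_alive(persons):
    # persons: list of (birth, death). Build events, sort, sweep: O(n log n).
    events = []
    for birth, death in persons:
        events.append((birth, 0))   # 0 = birth; sorts before a death
        events.append((death, 1))   # 1 = death in the same year
    events.sort()
    alive = best = 0
    best_year = None
    for year, kind in events:
        if kind == 0:
            alive += 1
            if alive > best:
                best, best_year = alive, year
        else:
            alive -= 1
    return best_year, best

print(max_alive([(1904, 1960), (1930, 1960), (1940, 1950)]))  # (1940, 3)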
Lately, I have been learning about various sorting methods, and a lot of them are unstable, e.g. selection sort, quicksort, heap sort.
My question is: What are the general factors that make sorting unstable?
Most of the efficient sorting algorithms are efficient precisely because they move data over long distances, i.e. much closer to the final position with every move. This efficiency is what costs them stability.
For example, when you do a simple sort like bubble sort, you compare and swap neighboring elements. In this case, it is easy not to move elements that are already in the correct order. But in the case of quicksort, the partitioning process might choose to move elements so that the number of swaps is minimal. For example, if you partition the list below on the number 2, the most efficient way is to swap the 1st element with the 4th element and the 2nd element with the 5th:
2 3 1 1 1 4
1 1 1 2 3 4
If you look closely, we have now changed the relative order of the 1's in the list, which makes the sort unstable.
So to sum it up: some algorithms are very suitable for stable sorting (like bubble sort), whereas others like quicksort can be made stable only by carefully selecting a partitioning algorithm, at the cost of efficiency or complexity or both.
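To see the example above concretely, tag each value with its original index and apply the two swaps described:

# Tag each value with its original index so we can watch the 1s move.
a = [(2, 0), (3, 1), (1, 2), (1, 3), (1, 4), (4, 5)]

# The "efficient" partition around 2: swap 1st<->4th and 2nd<->5th.
a[0], a[3] = a[3], a[0]
a[1], a[4] = a[4], a[1]

print(a)  # [(1, 3), (1, 4), (1, 2), (2, 0), (3, 1), (4, 5)]
# The three 1s now appear in original order 3, 4, 2: stability is lost.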
We usually classify an algorithm as stable or not based on its most "natural" implementation.
A sorting algorithm is stable when it uses the original order of elements to break ties in the new ordering. For example, let's say you have records of (name, age) and you want to sort them by age.
If you use a stable sort on (Matt, 50), (Bob, 20), (Alice, 50), then you will get (Bob, 20), (Matt, 50), (Alice, 50). The Matt and Alice records have equal ages, so they compare equal under the sorting criteria. The stable sort preserves their original relative order: Matt came before Alice in the original list, so he comes before Alice in the output.
If you use an unstable sort on the same list, you might get (Bob, 20), (Matt, 50), (Alice, 50) or you might get (Bob, 20), (Alice, 50), (Matt, 50). Elements that compare equal will be grouped together but can come out in any order.
It's often handy to have a stable sort, but a stable sort implementation has to remember information about the original order of the elements while it's reordering them.
In-place array sorting algorithms are designed not to use any extra space to store this kind of information, and they destroy the original ordering while they work. The fast ones like quicksort aren't usually stable, because reordering the array in ways that preserve the original order to break ties is slow. Slow array sorting algorithms like insertion sort or selection sort can usually be written to be stable without difficulty.
Sorting algorithms that copy data from one place to another, or that work with other data structures like linked lists, can be both fast and stable. Merge sort is the most common example.
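For instance, Python's built-in sort is a stable merge-sort variant (Timsort), so the record example above behaves exactly as described:

records = [("Matt", 50), ("Bob", 20), ("Alice", 50)]

# sorted() is stable: records with equal age keep their original order.
print(sorted(records, key=lambda r: r[1]))
# [('Bob', 20), ('Matt', 50), ('Alice', 50)]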
If you have an example input of
1 5 3 7 1
For the sort to be stable, you want the last 1 never to come before the first 1.
Generally: elements with the same value in the input array should not have changed their relative positions once sorted.
The sorted output would then look like:
1(f) 3 5 7 1(l)
f: first, l: last (or second, if there are more than 2).
For example, quicksort uses swaps, and because comparisons are done only with greater-than or less-than, equally valued elements can be swapped past each other while sorting, and hence end up reordered in the output.
I was recently asked this question in an interview. Even though I was able to come up with the O(n²) solution, the interviewer was obsessed with an O(n) solution. I also checked a few other O(n log n) solutions, which I understood, but the O(n) solution, which assumes the appointments are sorted by start time, is still not my cup of tea.
Can anyone explain this?
Problem Statement: You are given n appointments. Each appointment contains a start time and an end time. You have to return all conflicting appointments efficiently.
Person:     1   2   3   4   5
App Start:  2   4  29  10  22
App End:    5   7  34  11  36
Answer: appointment 2 conflicts with 1, and appointment 5 conflicts with 3 (written 2x1, 5x3).
O(n log n) algorithm: separate the start and end points, like this:
2s, 4s, 29s, 10s, 22s, 5e, 7e, 34e, 11e, 36e
then sort all of these points (for simplicity, let's assume each point is unique):
2s, 4s, 5e, 7e, 10s, 11e, 22s, 29s, 34e, 36e
If we have consecutive starts without an end in between, then there is an overlap:
2s, 4s are adjacent, so an overlap is there.
We keep a count of open intervals: each time we encounter an s we add 1, and each time we encounter an e we subtract 1.
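A sketch of that counting sweep (assuming, as above, that all the points are unique):

def has_overlap(starts, ends):
    # Merge start/end points, sort, sweep; an overlap exists iff the
    # running count of open appointments ever exceeds 1. O(n log n).
    points = [(s, 0) for s in starts] + [(e, 1) for e in ends]
    points.sort()
    count = 0
    for _, kind in points:
        if kind == 0:         # an 's'
            count += 1
            if count > 1:
                return True   # two starts without an end in between
        else:                 # an 'e'
            count -= 1
    return False

print(has_overlap([2, 4, 29, 10, 22], [5, 7, 34, 11, 36]))  # True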
The general solution to this problem is not possible in O(n).
At a minimum you need to sort by appointment start time, which requires O(n log n).
There is an O(n) solution if the list is already sorted. The algorithm basically involves checking whether the next appointment is overlapped by any previous one. There is a bit of subtlety, as you actually need two pointers into the list as you run through it (see the sketch after this list):
The current appointment being checked
The appointment with the latest end time encountered so far (which might not be the previous appointment)
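A minimal sketch under the sorted assumption, reporting for each appointment one earlier appointment it conflicts with (listing every conflicting pair can be Θ(n²) output in the worst case):

def conflicts_sorted(apps):
    # apps: list of (start, end), already sorted by start time. One pass,
    # tracking the latest end time seen so far: O(n).
    result = []
    latest_end, latest_idx = apps[0][1], 0
    for i in range(1, len(apps)):
        start, end = apps[i]
        if start < latest_end:
            result.append((latest_idx, i))   # i overlaps an earlier appointment
        if end > latest_end:
            latest_end, latest_idx = end, i  # not necessarily the previous one
    return result

print(conflicts_sorted([(2, 5), (4, 7), (10, 11), (22, 36), (29, 34)]))
# [(0, 1), (3, 4)]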
O(n) solutions for the unsorted case could only exist if you have other constraints, e.g. a fixed number of appointment timeslots. If this were the case, then you could use HashSets to determine which appointment(s) cover each timeslot, with the algorithm roughly as follows (a sketch follows the list):
Create a HashSet for each timeslot - O(1) since timeslot number is a fixed constant
For each appointment, store its ID number in HashSets of slot(s) that it covers - O(n) since updating a constant number of timeslots is O(1) for each appointment
Run through the slots, checking for overlaps - O(1) (or O(n) if you want to iterate over the overlapping appointments to return them as results)
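A sketch with a hypothetical fixed slot set (hours 0..23, half-open appointment ranges):

from collections import defaultdict

def conflicts_by_slot(apps):
    # apps: dict of id -> (first_slot, last_slot), half-open, slots 0..23.
    # The slot count is a fixed constant, so the whole thing is O(n).
    slot_sets = defaultdict(set)
    for app_id, (start, end) in apps.items():
        for slot in range(start, end):   # constant-bounded inner loop
            slot_sets[slot].add(app_id)
    # Any slot covered by two or more appointments is a conflict.
    return {slot: ids for slot, ids in slot_sets.items() if len(ids) > 1}

print(conflicts_by_slot({1: (2, 5), 2: (4, 7), 3: (10, 11)}))  # {4: {1, 2}}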
Assuming you have some constraint on the start and end times, and on the resolution at which you do the scheduling, it seems like it would be fairly easy to turn each appointment into a bitmap of times it does/doesn't use, then do a counting sort (aka bucket sort) on the in-use slots. Since both of those are linear, the result should be linear (though if I'm thinking correctly, it should be linear on the number of time slots rather than the number of appointments).
At least if I asked this as an interview question, the main thing I'd be hoping for is the candidate to ask about those constraints (i.e., whether those constraints are allowed). Given the degree to which it's unrealistic to schedule appointments for 1000 years from now, or schedule to a precision of even a minute (not to mention something like a nanosecond), they strike me as reasonable constraints, but you should ask before assuming them.
A naive approach might be to build two parallel trees, one ordered by the beginning point, and one ordered by the ending point of each interval. This allows discarding half of each tree in O(log n) time, but the results must be merged, requiring O(n) time. This gives us queries in O(n + log n) = O(n).
This is the best I can think of, written up below as rough but runnable Python. I attempted to reduce the problem as much as possible. It is only somewhat better than O(n²), I think.
Note that the output line for a given appointment will not show every appointment it conflicts with, but every conflict is displayed on some line.
Also note that I renamed the appointments numerically in order of starting time.
output would be something like the following:
Appointment 1 conflicts with 2
Appointment 2 conflicts with
Appointment 3 conflicts with
Appointment 4 conflicts with 5
Appointment 5 conflicts with
appt:   1   2   3   4   5
start:  2   4  10  22  29
end:    5   7  11  36  34
In Python:

# Appointments renamed 1..n in order of start time (0-indexed internally).
starts = [2, 4, 10, 22, 29]
ends   = [5, 7, 11, 36, 34]
n = len(starts)

# Each appointment initially "conflicts with" every later appointment.
conflicts = [list(range(i + 1, n)) for i in range(n)]

for i in range(n):
    # Walk backwards from the last appointment; while appointment i ends
    # before the candidate starts, they cannot conflict, so drop it.
    number = n - 1
    while number > i:
        if ends[i] < starts[number]:
            conflicts[i].pop()   # remove the highest-numbered candidate
        else:
            break                # this and all earlier candidates overlap
        number -= 1

for i in range(n):
    print("Appointment", i + 1, "conflicts with:",
          [j + 1 for j in conflicts[i]])
I came across a data structure called an interval tree, with the help of which we can find overlapping intervals in less than O(n log n) time, depending on the data provided.
Every now and then I read conspiracy theories about lotto-based games being controlled: a computer browses through the combinations chosen by the players and determines an unused subset for the draw. It got me thinking: how would such an algorithm have to work in order to determine such subsets really efficiently? Simply finding unused numbers is ruled out, as is finding the least-used numbers, because neither necessarily provides a solution. Going deeper: how could an algorithm efficiently choose a subset that was played at most k times? More formally:
We are given the set of numbers 1 to 50. In a draw, 6 numbers are picked.
INPUT: m subsets, each consisting of 6 distinct numbers from 1 to 50; an integer k (k >= 0), the maximum number of players allowed to have all 6 of their numbers correct.
OUTPUT: subsets which make no more than k players win the jackpot ('winning' means all the numbers a player chose were picked in the draw).
Is there any efficient algorithm which could calculate this without using a terabyte HDD to store the counts for every one of the 50!/(44!*6!) possible draws in the pessimistic case? Honestly, I can't think of one.
If I wanted to run such a conspiracy, I would first acquire the list of submissions by players. Then I would generate random lottery selections, see how many winners each selection would produce, and just choose the random selection most attractive to me. There is little point doing anything more sophisticated, because that is probably already powerful enough to be noticed by statisticians.
If you want to corrupt the lottery, it would probably be easier and safer to select a few competitors you favour and have them win. In (the book) "1984", I think, the state simply announced imaginary lottery winners, with the announcement in each area naming somebody from outside the area. One of the ideas in "The Beckoning Lady" by Margery Allingham is a gang who attempt to set up a racecourse so they can rig races, allowing them to disguise bribes as winnings.
First of all, the total number of combinations (choosing 6 from 50) is not very large: about 16 million, which can be handled easily.
For each combination, keep a count of the number of people who played it. When declaring a winner, choose a combination that has no more than k plays.
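A sketch of that: a hash map keyed by the sorted 6-tuple of a ticket counts plays in O(m) for m players, and checking a candidate draw is O(1):

from collections import Counter

def jackpot_counts(plays):
    # plays: list of 6-number tickets, in any order. O(m) counting.
    return Counter(tuple(sorted(t)) for t in plays)

def is_safe(draw, counts, k):
    # True if drawing these 6 numbers makes at most k players win.
    return counts[tuple(sorted(draw))] <= k

counts = jackpot_counts([[1, 2, 3, 4, 5, 6], [6, 5, 4, 3, 2, 1], [7, 8, 9, 10, 11, 12]])
print(is_safe([1, 2, 3, 4, 5, 6], counts, k=1))        # False: 2 players win
print(is_safe([13, 14, 15, 16, 17, 18], counts, k=0))  # True: nobody played it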
If the numbers within each subset are sorted, then you can treat your subsets as strings: sort them in lexicographical order, and then it is easy to count how many players selected each subset (and which subsets were not selected at all). The time is then proportional to the number of players, not to the number of possible combinations in the lottery.
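The same counting done by sorting instead of hashing, in O(m log m) for m players:

from itertools import groupby

def play_counts_sorted(plays):
    # Normalize each ticket, sort lexicographically, count equal runs.
    tickets = sorted(tuple(sorted(t)) for t in plays)
    return [(t, sum(1 for _ in grp)) for t, grp in groupby(tickets)]

print(play_counts_sorted([[4, 2, 6, 1, 3, 5], [1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]]))
# [((1, 2, 3, 4, 5, 6), 2), ((7, 8, 9, 10, 11, 12), 1)]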