Speeding up LCS algorithm for graph construction - performance

Referencing 2nd question from INOI 2011:
N people live in Sequence Land. Instead of a name, each person is identified by a sequence of integers, called his or her id. Each id is a sequence with no duplicate elements. Two people are said to be each other’s relatives if their ids have at least K elements in common. The extended family of a resident of Sequence Land includes herself or himself, all relatives, relatives of relatives, relatives of relatives of relatives, and so on without any limit.
Given the ids of all residents of Sequence Land, including its President, and the number K, find the number of people in the extended family of the President of Sequence Land.
For example, suppose N = 4 and K = 2. Suppose the President has id (4, 6, 7, 8) and the other three residents have ids (8, 3, 0, 4), (0, 10), and (1, 2, 3, 0, 5, 8). Here, the President is directly related to (8, 3, 0, 4), who in turn is directly related to (1, 2, 3, 0, 5, 8). Thus, the President’s extended family consists of everyone other than (0, 10) and so has size 3.
Limits: 1 <= n <= 300 & 1 <= K <= 300. Number of elements per id: 1-300
Currently, my solution is as follows:
For every person, compare his or her id to every other id using an LCS-style algorithm; it can be tweaked to stop searching early once it is clear K common elements cannot be reached, which improves its average-case performance. Time complexity: O(n^2 * L^2), where L is the maximum id length (up to 300).
Construct an adjacency list from the results of the previous step.
Run BFS from the President and output the result.
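In code, the whole baseline looks roughly like this (a minimal sketch; I count common elements with a set intersection instead of an LCS-style DP, which is valid here because ids contain no duplicate elements, and it already brings the per-pair cost down to about O(L)):

```python
from collections import deque

def president_family_size(ids, k):
    """ids[0] is the President's id; returns the size of the extended family."""
    n = len(ids)
    id_sets = [set(seq) for seq in ids]  # ids have no duplicate elements

    # Steps 1-2: pairwise comparison and adjacency list, O(n^2 * L) with sets
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if len(id_sets[i] & id_sets[j]) >= k:
                adj[i].append(j)
                adj[j].append(i)

    # Step 3: BFS from the President
    seen = {0}
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen)

# Example from the statement; should print 3
print(president_family_size([[4, 6, 7, 8], [8, 3, 0, 4], [0, 10], [1, 2, 3, 0, 5, 8]], 2))
```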
But the overall time complexity of this algorithm is not good enough for the second subtask. I googled around a bit and found most solutions to be similar to mine, and they don't work for the larger subtask either. The only thing close to a good solution was this one -> yes, this question has been asked previously; the reason I'm asking essentially the same question again is that that solution is really tough to understand and implement. Recently, a friend of mine told me about a much better solution he read somewhere.
Can someone help me come up with a better solution?
Even pointers to a better solution would be great.


Choosing permutations with constraints [duplicate]

Possible Duplicate:
How to solve the “Mastermind” guessing game?
I have to choose k items out of n choices, and my selection needs to be in the correct order (i.e. permutation, not combination). After I make a choice, I receive a hint that tells me how many of my selections were correct, and how many were in the correct order.
For example, if I'm trying to choose k=4 out of n=6 items, and the correct ordered set is 5, 3, 1, 2, then an exchange may go as follows:
0,1,2,3
(3, 0) # 3 correct, 0 in the correct position
0,1,2,5
(3, 0)
0,1,5,3
(3, 0)
0,5,2,3
(3,0)
5,1,2,3
(4,1)
5,3,1,2
(4,4)
-> correct order, the game is over
The problem is that I'm only given a limited number of tries to get the order right: if n=6, k=4 then I only get t=6 tries; if n=10, k=5 then t=5; and if n=35, k=6 then t=18.
Where do I start writing an algorithm that solves this? It almost seems like a constraint-solving problem. The hard part is that I only know something for sure if I change just one thing at a time, but the number of guesses that would take is far more than the number of tries I get.
A simple strategy for an algorithm is to come up with a next guess that is consistent with all previous hints. This will eventually lead to the right solution, but most likely not in the lowest possible number of guesses.
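That brute-force version of the strategy might look like the sketch below: it simply scans all ordered k-out-of-n selections and returns the first one consistent with every hint so far. It is workable for the n=6, k=4 case from the question, but far too slow for something like n=35, k=6.

```python
from itertools import permutations

def score(guess, secret):
    """Feedback as in the question: (values in common, values in the correct position)."""
    correct_values = len(set(guess) & set(secret))
    correct_positions = sum(g == s for g, s in zip(guess, secret))
    return correct_values, correct_positions

def next_consistent_guess(n, k, history):
    """history is a list of (guess, feedback) pairs; return any ordered selection
    of k out of n items whose hypothetical feedback matches every recorded hint."""
    for candidate in permutations(range(n), k):
        if all(score(guess, candidate) == feedback for guess, feedback in history):
            return candidate
    return None

# First three exchanges from the example (the secret was 5, 3, 1, 2)
history = [((0, 1, 2, 3), (3, 0)),
           ((0, 1, 2, 5), (3, 0)),
           ((0, 1, 5, 3), (3, 0))]
print(next_consistent_guess(6, 4, history))  # some selection consistent with all three hints
```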
As far as I can see, this is a variation of the Mastermind board game: http://en.m.wikipedia.org/wiki/Mastermind_(board_game)
Also, you can find more details about the problem in this paper
http://arxiv.org/abs/cs.CC/0512049

Sorting Algorithm: output

I came across this problem on a website and I can't quite understand the output; please help me understand it:
Bogosort is a dumb algorithm which shuffles the sequence randomly until it is sorted. But here we have tweaked it a little, so that if after the last shuffle the first several elements end up in the right places we fix them and don't shuffle those elements any further. We do the same for the last elements if they are in the right places. For example, if the initial sequence is (3, 5, 1, 6, 4, 2) and after one shuffle we get (1, 2, 5, 4, 3, 6), we keep 1, 2 and 6 and proceed with sorting (5, 4, 3) using the same algorithm. Calculate the expected number of shuffles for the improved algorithm to sort the sequence of the first n natural numbers, given that no elements are in the right places initially.
Input:
2
6
10
Output:
2
1826/189
877318/35343
For each test case, output the expected number of shuffles needed for the improved algorithm to sort the sequence of the first n natural numbers, as an irreducible fraction. I just can't understand the output.
I assume you found the problem on CodeChef. There is an explanation of the answer to the Bogosort problem here.
OK, I think I found the answer: there is a similar problem here, https://math.stackexchange.com/questions/20658/expected-number-of-shuffles-to-sort-the-cards/21273, and this problem can be thought of as an extension of it.
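If it helps to make sense of those fractions, here is a quick Monte Carlo simulation of the tweaked algorithm (a sketch that assumes the still-unsorted middle segment is reshuffled uniformly at random each time). For n = 6 the average number of shuffles should approach 1826/189, roughly 9.66.

```python
import random

def improved_bogosort_shuffles(n):
    """Simulate the tweaked bogosort once, starting from a random derangement,
    and return the number of shuffles it needed."""
    a = list(range(1, n + 1))
    while any(a[i] == i + 1 for i in range(n)):  # resample until no element is in place
        random.shuffle(a)
    lo, hi, shuffles = 0, n, 0
    while lo < hi:
        segment = a[lo:hi]
        random.shuffle(segment)
        a[lo:hi] = segment
        shuffles += 1
        while lo < hi and a[lo] == lo + 1:    # fix a correct prefix
            lo += 1
        while hi > lo and a[hi - 1] == hi:    # fix a correct suffix
            hi -= 1
    return shuffles

trials = 100_000
print(sum(improved_bogosort_shuffles(6) for _ in range(trials)) / trials)  # close to 9.66
```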

algorithm - Sort an array with LogLogN distinct elements

This is not my school homework; it is my own homework, as I am self-learning algorithms.
In The Algorithm Design Manual, there is the following exercise:
4-25 Assume that the array A[1..n] only has numbers from {1, ..., n^2} but that at most log log n of these numbers ever appear. Devise an algorithm that sorts A in substantially less than O(n log n).
I have two approaches:
The first approach:
Basically I want to do counting sort for this problem. I can first scan the whole array (O(N)) and put all the distinct numbers into an array of size log log N (int[] K).
Then apply counting sort. However, when setting up the counting array (int[] C), I don't need to give it size N^2; instead, I make it of size log log N too.
But this way, when counting the frequency of each distinct number, I have to scan array K to find that element's index (O(N log log N)) and then update array C.
The second approach:
Again, I have to scan the whole array to get a distinct-number array K of size log log N.
Then I do something like quicksort, but the partitioning is always based on the median of the K array (i.e., each pivot is an element of K), recursively.
I think this approach will be the best, at O(N log log log N).
Am I right? Or are there better solutions?
Similar exercises exist in The Algorithm Design Manual, such as:
4-22 Show that n positive integers in the range 1 to k can be sorted in O(n log k) time. The interesting case is when k << n.
4-23 We seek to sort a sequence S of n integers with many duplications, such that the number of distinct integers in S is O(log n). Give an O(n log log n) worst-case time algorithm to sort such sequences.
But basically, for all these exercises my intuition was always to reach for counting sort, since we know the range of the elements and the range is short enough compared to the length of the whole array. But after thinking about it more deeply, I guess what the exercises are really looking for is the second approach, right?
Thanks
We can just create a hash map storing each element as key and its frequency as value.
Sort the keys of this map, which takes O(k log k) time with k = log log n distinct keys (negligible next to n), using any comparison sort.
Now scan the hash map and add elements to the new array frequency number of times. Like so:
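A sketch of that in Python, with collections.Counter standing in for the hash map:

```python
from collections import Counter

def sort_few_distinct(a):
    freq = Counter(a)                          # O(n): element -> frequency
    result = []
    for value in sorted(freq):                 # O(k log k), k = number of distinct values
        result.extend([value] * freq[value])   # O(n) in total across all values
    return result

# e.g. sort_few_distinct([3, 1, 3, 2, 1]) -> [1, 1, 2, 3, 3]
```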
Total time = 2n + k log k = O(n).
Counting sort is one of possible ways:
I will demonstrate this solution on the example 2, 8, 1, 6, 7, 1, 6, where all numbers are <= 3^2 = 9 (so N = 3). I use more than N elements just to make the idea clearer.
First, for each number A[i] compute A[i] / N (integer division). Let's call this number first_part_of_number.
Sort this array using counting sort by first_part_of_number.
The results are in the form (first_part_of_number, value) (example for N = 3):
(0, 2)
(0, 1)
(0, 1)
(2, 8)
(2, 6)
(2, 7)
(2, 6)
Divide them into groups by first_part_of_number.
In this example you will have groups
(0, 2)
(0, 1)
(0, 1)
and
(2, 8)
(2, 6)
(2, 7)
(2, 6)
For each number A[i] compute A[i] modulo N. Let's call it second_part_of_number. Append it to each element:
(0, 2, 2)
(0, 1, 1)
(0, 1, 1)
and
(2, 8, 2)
(2, 6, 0)
(2, 7, 1)
(2, 6, 0)
Sort each group using counting sort by second_part_of_number
(0, 1, 1)
(0, 1, 1)
(0, 2, 2)
and
(2, 6, 0)
(2, 6, 0)
(2, 7, 1)
(2, 8, 2)
Now combine all groups and you have result 1, 1, 2, 6, 6, 7, 8.
Complexity:
We only ever ran counting sort on keys <= N, and each element took part in exactly two such sorts, so the overall complexity is O(N).
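A Python sketch of that two-pass idea (here I run the two stable counting-sort passes low digit first, which gives the same final order as sorting by the high digit and then sorting each group, as described above):

```python
def counting_sort_by(items, key, num_buckets):
    """Stable counting/bucket sort; O(len(items) + num_buckets)."""
    buckets = [[] for _ in range(num_buckets)]
    for item in items:
        buckets[key(item)].append(item)
    return [item for bucket in buckets for item in bucket]

def two_pass_sort(a, n):
    """Sort values in {1, ..., n*n} in O(n + len(a)) total."""
    a = counting_sort_by(a, lambda x: x % n, n)        # by second_part_of_number
    a = counting_sort_by(a, lambda x: x // n, n + 1)   # by first_part_of_number
    return a

print(two_pass_sort([2, 8, 1, 6, 7, 1, 6], 3))  # [1, 1, 2, 6, 6, 7, 8]
```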
I'm going to betray my limited knowledge of algorithmic complexity here, but:
Wouldn't it make sense to scan the array once and build something like a self-balancing tree? As we know the number of nodes in the tree will only grow to (log log n) it is relatively cheap (?) to find a number each time. If a repeat number is found (likely) a counter in that node is incremented.
Then to construct the sorted array, read the tree in order.
Maybe someone can comment on the complexity of this and any flaws.
Update: after I wrote the answer below, @Nabb showed me why it was incorrect. For more information, see Wikipedia's brief entry on Õ and the links therefrom. Because it is still needed to lend context to @Nabb's and @Blueshift's comments, and because the whole discussion remains interesting, my original answer is retained as follows.
ORIGINAL ANSWER (INCORRECT)
Let me offer an unconventional answer: though there is indeed a difference between O(n*n) and O(n), there is no difference between O(n) and O(n*log(n)).
Now, of course, we all know that what I just said is wrong, don't we? After all, various authors concur that O(n) and O(n*log(n)) differ.
Except that they don't differ.
So radical-seeming a position naturally demands justification, so consider the following, then make up your own mind.
Mathematically, essentially, the order m of a function f(z) is such that f(z)/(z^(m+epsilon)) converges while f(z)/(z^(m-epsilon)) diverges for z of large magnitude and real, positive epsilon of arbitrarily small magnitude. The z can be real or complex, though as we said epsilon must be real. With this understanding, apply L'Hospital's rule to a function of O(n*log(n)) to see that it does not differ in order from a function of O(n).
I would contend that the accepted computer-science literature at the present time is slightly mistaken on this point. This literature will eventually refine its position in the matter, but it hasn't done so yet.
Now, I do not expect you to agree with me today. This, after all, is merely an answer on Stack Overflow -- and what is that compared to an edited, formally peer-reviewed, published computer-science book -- not to mention a shelf full of such books? You should not agree with me today, only take what I have written under advisement, mull it over in your mind these coming weeks, consult one or two of the aforementioned computer-science books that take the other position, and make up your own mind.
Incidentally, a counterintuitive implication of this answer's position is that one can access a balanced binary tree in O(1) time. Again, we all know that that's false, right? It's supposed to be O(log(n)). But remember: the O() notation was never meant to give a precise measure of computational demands. Unless n is very large, other factors can be more important than a function's order. But, even for n = 1 million, log(n) is only 20, compared, say, to sqrt(n), which is 1000. And I could go on in this vein.
Anyway, give it some thought. Even if, eventually, you decide that you disagree with me, you may find the position interesting nonetheless. For my part, I am not sure how useful the O() notation really is when it comes to O(log something).
@Blueshift asks some interesting questions and raises some valid points in the comments below. I recommend that you read his words. I don't really have a lot to add to what he has to say, except to observe that, because few programmers have (or need) a solid grounding in the mathematical theory of the complex variable, the O(log(n)) notation has misled probably, literally hundreds of thousands of programmers to believe that they were achieving mostly illusory gains in computational efficiency. Seldom in practice does reducing O(n*log(n)) to O(n) really buy you what you might think that it buys you, unless you have a clear mental image of how incredibly slow a function the logarithm truly is -- whereas reducing O(n) even to O(sqrt(n)) can buy you a lot. A mathematician would have told the computer scientist this decades ago, but the computer scientist wasn't listening, was in a hurry, or didn't understand the point. And that's all right. I don't mind. There are lots and lots of points on other subjects I don't understand, even when the points are carefully explained to me. But this is a point I believe that I do happen to understand. Fundamentally, it is a mathematical point not a computer point, and it is a point on which I happen to side with Lebedev and the mathematicians rather than with Knuth and the computer scientists. This is all.

Dynamic programming question

A circus is designing a tower routine consisting of people standing atop one another's shoulders. For practical and aesthetic reasons, each person must be both shorter and lighter than the person below him or her. Given the heights and weights of each person in the circus, write a method to compute the largest possible number of people in such a tower.
EXAMPLE:
Input (ht, wt): (65, 100) (70, 150) (56, 90) (75, 190) (60, 95) (68, 110)
Output: The longest tower is length 6 and includes from top to bottom: (56, 90) (60,95) (65,100) (68,110) (70,150) (75,190)
Someone suggested the following to me:
It can be done as follows:
1. Sort the input in decreasing order of weight and find the longest decreasing sequence of height.
2. Sort the input in decreasing order of height and find the longest decreasing sequence of weight.
3. Take the max of 1 and 2.
I don't understand why we need to do both steps 1 and 2. Can't we just do 1 and find the answer? If not, please give an example in which doing only step 1 does not give the answer.
The results of 1 and 2 have to be the same. It's not possible for one of them to be shorter, because in a valid tower the elements are descending in both height and weight, so any sequence that satisfies 1 also satisfies 2 (and vice versa). If one were shorter, it wouldn't be the longest.
You might need to say something about the weights & heights all being unique.
Otherwise, if
A is (10, 10) // (w, h)
B is ( 9, 10)
C is ( 9, 8)
Then neither method gets the correct answer!
C obviously can stand on A's shoulders.
Edit:
Neither method is good enough!
Example with all weights & heights unique:
A : (12, 12)
B : (11, 8)
C : (10, 9)
D : ( 9, 10)
E : ( 8, 11)
F : ( 7, 7)
Both methods give an answer of 2; however, the tower can have height at least 3, with several combinations:
A on the bottom,
then any of B, C, D, or E,
then F on top.
I think stricter rules on the input data are needed to make this problem solvable by the given methods.
You're absolutely correct. Doing just one direction is enough.
A proof is easy using the maximality of the subsequence. Assume one side (say the left) of the values is ordered, and take the longest descending subsequence of the right. Now perform the other operation: order the right and take the subsequence from the left.
If we arrive at a list that is longer than the first one we found, we have reached a contradiction, since that subsequence appears in the very same relative order in the first operation, and thus we could have found a longer descending subsequence there, contradicting the assumption that the one we took was maximal. If it's shorter, the argument is symmetric.
We conclude that finding the maximum on just one side will be the same as the maximum of the reverse ordered operation.
Worth noting that I haven't proven this is a solution to the problem, just that the one-sided algorithm is equivalent to the two-sided version. The proof that it is correct is almost identical: assume there exists a longer solution, and it contradicts the maximality of the subsequence. That proves there is nothing longer, and it's trivial to see that every solution the algorithm produces is a valid solution. So the algorithm's result is both >= the solution and <= the solution; therefore it is the solution.
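For concreteness, here is a minimal O(n^2) DP sketch of the sort-then-longest-chain idea. It checks strict decrease in both height and weight explicitly, so the tie cases raised in the earlier answer are handled as well.

```python
def longest_tower(people):
    """people: list of (height, weight) pairs. Returns one longest tower, top to bottom,
    requiring each person to be strictly shorter and lighter than the person below."""
    people = sorted(people, key=lambda p: (-p[1], -p[0]))  # heaviest (bottom candidates) first
    n = len(people)
    best = [1] * n      # best[i]: longest tower with people[i] on top
    prev = [-1] * n     # prev[i]: index of the person directly below people[i]
    for i in range(n):
        for j in range(i):
            if people[j][0] > people[i][0] and people[j][1] > people[i][1] and best[j] + 1 > best[i]:
                best[i], prev[i] = best[j] + 1, j
    i = max(range(n), key=lambda idx: best[idx])
    tower = []
    while i != -1:
        tower.append(people[i])
        i = prev[i]
    return tower  # ordered from the top of the tower down to the bottom

# Example from the question; should list all six people from (56, 90) down to (75, 190)
print(longest_tower([(65, 100), (70, 150), (56, 90), (75, 190), (60, 95), (68, 110)]))
```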
It doesn't make any difference, and it is unnecessary to pre-sort, as you end up with the same graph to search.
As far as I can see, this is the same question as Box stacking problem:
Also: http://people.csail.mit.edu/bdean/6.046/dp/

Bogosort optimization, probability related

I'm coding a question on an online judge for practice. The question is about optimizing Bogosort and involves not shuffling the entire number range every time: if after the last shuffle the first several elements end up in the right places, we fix them and don't shuffle those elements any further. We do the same for the last elements if they are in the right places. For example, if the initial sequence is (3, 5, 1, 6, 4, 2) and after one shuffle Johnny gets (1, 2, 5, 4, 3, 6), he will fix 1, 2 and 6 and proceed with sorting (5, 4, 3) using the same algorithm.
For each test case, output the expected number of shuffles needed for the improved algorithm to sort the sequence of the first n natural numbers, as an irreducible fraction.
A sample input/output says that for n=6, the answer is 1826/189.
I don't quite understand how the answer was arrived at.
This looks similar to 2011 Google Code Jam, Preliminary Round, Problem 4; however, the answer there is n, and I don't know how you get 1826/189.
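For what it's worth, here is one way to reproduce those fractions exactly; this is a sketch of my own reasoning, not an official solution. Treat the process as a chain over the length m of the segment that still gets shuffled: after a uniform shuffle of m elements, the unfixed middle block has length k (with k = 0 or 2 <= k <= m) with probability (m - k + 1) * g(k) / m!, where g(k) = k! - 2*(k-1)! + (k-2)! counts permutations of k elements whose first and last entries are both out of place. Writing E[m] = 1 + sum over k of P(k) * E[k] and solving for E[m] with exact rational arithmetic gives the sample output quoted above (2, 1826/189, 877318/35343 for n = 2, 6, 10).

```python
from fractions import Fraction
from math import factorial

def expected_shuffles(n):
    """Expected shuffles of the improved bogosort on a length-n derangement,
    assuming the still-unsorted middle segment is reshuffled uniformly each time."""
    if n < 2:
        return Fraction(0)

    def g(m):
        # permutations of m elements with both the first and the last element misplaced
        return factorial(m) - 2 * factorial(m - 1) + factorial(m - 2)

    E = {0: Fraction(0)}   # E[k]: expected shuffles to finish a block of length k
    for m in range(2, n + 1):
        total = Fraction(1)
        for k in range(2, m):
            total += Fraction((m - k + 1) * g(k), factorial(m)) * E[k]
        # E[m] = 1 + sum_{k<m} P(k)*E[k] + P(m)*E[m]  =>  isolate E[m]
        E[m] = total / (1 - Fraction(g(m), factorial(m)))
    return E[n]

for n in (2, 6, 10):
    print(expected_shuffles(n))   # sample output: 2, 1826/189, 877318/35343
```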

Resources