Efficient divide-and-conquer algorithm

At a political event, introducing two people tells you whether they represent the same party or not.
Suppose more than half of the n attendees represent the same party. I'm trying to find an efficient algorithm that will identify the representatives of this party using as few introductions as possible.
A brute-force solution would be to maintain two pointers over the array of attendees, introducing each of the n attendees to up to n-1 others, using O(n^2) introductions. I can't figure out how to improve on this.
Edit: Formally,
You are given an integer n. There is a hidden array A of size n, such that more than half of the values in A are the same. This array represents the party affiliation of each person.
You are allowed queries of the form introduce(i, j), where i≠j, and 1 <= i, j <= n, and you will get a boolean value in return: You will get back 1, if A[i] = A[j], and 0 otherwise.
Output: B ⊆ {1, 2, ..., n} where |B| > n/2 and the A-value of every element in B is the same.
Hopefully this explains the problem better.

This can be done in O(n) introductions using the Boyer–Moore majority vote algorithm.
Consider some arbitrary ordering of the attendees: A_1, A_2, ..., A_n. The algorithm maintains a 'stored element', denoted by m, and a counter. For each attendee x in turn: if the counter is zero, store x as m and set the counter to one; otherwise introduce x to m, incrementing the counter if they belong to the same party and decrementing it otherwise. The stored element at the end is guaranteed to be a member of the majority party. Then do another pass over the other n - 1 people, introducing each of them to this known member, which identifies all the members of the majority party.
Thus, the total number of introductions is O(n).
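To make the two passes concrete, here is a minimal Python sketch. The introduce oracle is simulated with a hidden array A purely as a test harness; only the function argument is part of the problem.

def find_majority_party(n, introduce):
    # Pass 1: Boyer-Moore majority vote, at most one introduction per attendee.
    candidate, count = 0, 1
    for x in range(1, n):
        if count == 0:
            candidate, count = x, 1
        elif introduce(candidate, x):  # same party as the stored element
            count += 1
        else:
            count -= 1
    # Pass 2: candidate is a member of the majority party;
    # introduce everyone else to it to collect the whole party.
    return [candidate] + [x for x in range(n)
                          if x != candidate and introduce(candidate, x)]

# Hypothetical test harness with a hidden affiliation array (0-indexed):
A = [1, 2, 1, 1, 3, 1, 1]
print(find_majority_party(len(A), lambda i, j: A[i] == A[j]))  # [2, 0, 3, 5, 6]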


Application of binary indexed tree

I was trying to solve this algorithmic problem and I came across this nice solution:
"The idea is to treat the ai, bi and
ci asymmetrically. The BIT supports minimum
queries for key intervals starting at 1. We use ci as values and
bi as keys. Those are inserted in the order of increasing ai. This
way, for each ai in turn, the data structure allows queries for the
smallest value of cj (possibly ∞) for bj in [1..bi) and
aj < ai. We have cj < ci if and only if contestant i is not
excellent."
source
Now I am having a hard time understanding this solution.
Here's what I understand of this solution: I know that a binary indexed tree is used to answer queries like the sum of an interval of an array, and it also supports point updates, both in O(log n) time. Now the solution says we build the BIT with b_i as keys and c_i as values, so c_i is basically an additional value that goes with each node. Then we insert the elements in order of increasing a_i, and this is where I lose the grip. Why does the order of insertion matter, and what does the statement following that part mean? I have no idea.
Please help me understand what this solution says.
Let's find all non-excellent participants. Another participant j can be better than participant i only if a[j] < a[i]. Thus, we can ignore all participants with a larger value of a. That's why we sort them by a.
This condition is necessary, but it's not sufficient. We also need to check b and c. How can we do that? We need to know whether there is a participant j with a[j] < a[i] (that is, one who comes before i in the sorted order) such that b[j] < b[i] and c[j] < c[i]. We build a BIT to check the last two conditions, here with c[j] as keys and b[j] as values (the roles of b and c are swapped relative to the quoted solution; by symmetry either choice works). Such a j exists if and only if the minimum value stored on the key prefix [0, c[i]) is less than b[i].
To sum up, the idea is as follows: we sort by a[i] and then ignore the values of a. This takes us from a 3-D problem to a 2-D problem, which is simpler to solve (that's why the order matters: a participant with a larger a[i] is never better). We then use a BIT to solve the 2-D problem.
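For concreteness, here is a sketch of that idea in Python, following this answer's convention (c[j] as keys, b[j] as values). The input format and the assumption that all a-values are distinct are mine; ties in a need extra care.

class MinBIT:
    # Fenwick tree for prefix-minimum queries with point updates.
    def __init__(self, size):
        self.tree = [float('inf')] * (size + 1)
    def update(self, i, value):  # position i is 1-indexed
        while i < len(self.tree):
            self.tree[i] = min(self.tree[i], value)
            i += i & (-i)
    def query(self, i):  # minimum over positions [1, i]
        best = float('inf')
        while i > 0:
            best = min(best, self.tree[i])
            i -= i & (-i)
        return best

def non_excellent(contestants):  # list of (a, b, c) triples
    cs = sorted({c for _, _, c in contestants})
    rank = {c: r + 1 for r, c in enumerate(cs)}  # compress c-values to 1..m
    bit = MinBIT(len(cs))
    result = []
    for i in sorted(range(len(contestants)), key=lambda t: contestants[t][0]):
        a, b, c = contestants[i]  # everyone already inserted has a smaller a
        if bit.query(rank[c] - 1) < b:  # some j with c[j] < c[i] and b[j] < b[i]
            result.append(i)
        bit.update(rank[c], b)
    return result

print(non_excellent([(1, 1, 1), (2, 2, 2), (3, 1, 3)]))  # [1]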

Greedy Maximum Flow

The Dining Problem:
Several families go out to dinner together. In order to increase their social interaction, they would like to sit at tables so that no two members of the same family are at the same table. Assume that the dinner contingent has p families and that the ith family has a(i) members. Also, assume there are q tables available and that the jth table has a seating capacity of b(j).
The question is:
What is the maximum number of people we can seat at the tables?
EDIT:
This problem can be solved by building a graph and running a maximum flow algorithm. But with 2*10^3 vertices, Dinic's algorithm gives a global complexity of O(10^6 * 10^6) = O(10^12).
If we always seat the largest groups first, in a greedy manner, the complexity is O(10^6).
So my questions are:
1) Does the greedy approach in this problem work?
2) What is the best algorithm to solve this problem?
Yes, greedily seating the largest families first is a correct solution. We just need to prove that, after we seat the next largest family, there is a way to seat the remaining families correctly.
Suppose that an instance is solvable. We prove by induction that there exists a solution after the greedy algorithm seats the k largest families. The basis k = 0 is obvious, since the hypothesis to be proved is that there exists a solution. Inductively, suppose that there exists a solution that extends greedy's partial assignment for the first k - 1 families. Now greedy extends its partial assignment by seating the kth family. We edit the known solution to restore the inductive hypothesis.
While we still can, find a table T1 where greedy has seated a kth-family member but the known solution has not. If there is space in the known solution at T1, move a kth-family member there from a table where greedy has seated none. Otherwise, the known solution seats at T1 a member of some family outside the k largest. Since that family is smaller than the kth largest, some kth-family member occupies (in the known solution) a table T2 that the smaller family does not; swap these two members.
It is easy to come up with examples where seating everyone is simply impossible, so here's pseudocode for solving the problem assuming that it is solvable:
Sort the families by a(i) in decreasing order
Add each table j to a max-heap with b(j) as the key
For each family i in the sorted list:
    Pop a(i) tables from the max-heap
    Add one member of family i to each popped table
    Add each popped table j back into the max-heap with b(j) = b(j) - 1, if b(j) > 0
Let n = a(1) + a(2) + ... + a(p) (i.e. total number of people)
Assuming a binary heap is used for the max-heap, the time complexities are:
Sorting the families: O(p log p)
Initializing the max-heap of tables: O(q log q)
All pops and pushes to/from the max-heap: O(n log q)
This gives a total time complexity of O(p log p + q log q + n log q), where the O(n log q) term will likely dominate.
Since we are dealing with integers, if we use a 1-D bucket structure for the max-heap, with c the maximum b(j), then we end up with just O(n + c) (assuming the max-heap operations dominate), which may be quicker.
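A runnable Python version of the pseudocode, as a sketch that assumes the instance is solvable (heapq is a min-heap, so capacities are negated):

import heapq

def seat_families(family_sizes, table_caps):
    # seating[j] lists the families with a member at table j.
    heap = [(-cap, j) for j, cap in enumerate(table_caps) if cap > 0]
    heapq.heapify(heap)
    seating = [[] for _ in table_caps]
    # Largest families first.
    for fam in sorted(range(len(family_sizes)),
                      key=lambda i: family_sizes[i], reverse=True):
        # Pop a(i) distinct tables; assumes enough tables remain (solvable).
        popped = [heapq.heappop(heap) for _ in range(family_sizes[fam])]
        for neg_cap, j in popped:
            seating[j].append(fam)
            if neg_cap + 1 < 0:  # push back only if capacity remains
                heapq.heappush(heap, (neg_cap + 1, j))
    return seating

print(seat_families([3, 2, 2], [2, 2, 2, 1]))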
Finally, please up-vote David's answer as the proof was required and is awesome.

How can I find the maximum sum of a sub-sequence using dynamic programming?

I'm re-reading Skiena's Algorithm Design Manual to catch up on some stuff I've forgotten since school, and I'm a little baffled by his descriptions of dynamic programming. I've looked it up on Wikipedia and various other sites, and while the descriptions all make sense, I'm having trouble applying it to specific problems myself. Currently, I'm working on problem 3-5 from the Skiena book (given an array of n real numbers, find the maximum sum in any contiguous subvector of the input). I have an O(n^2) solution, like the one described in this answer, but I'm stuck on the O(n) dynamic programming solution. It's not clear to me what the recurrence relation should be.
I see that the subsequences form a set of sums, like so:
S = {a,b,c,d}
a a+b a+b+c a+b+c+d
b b+c b+c+d
c c+d
d
What I don't get is how to pick which one is the greatest in linear time. I've tried doing things like keeping track of the greatest sum so far, and if the current value is positive, add it to the sum. But when you have larger sequences, this becomes problematic because there may be stretches of negative numbers that would decrease the sum, but a later large positive number may bring it back to being the maximum.
I'm also reminded of summed-area tables. You can calculate all the sums using only the cumulative sums: a, a+b, a+b+c, a+b+c+d, etc. (For example, if you need b+c, it's just (a+b+c) - (a).) But I don't see an O(n) way to get the maximum from them.
Can anyone explain to me what the O(N) dynamic programming solution is for this particular problem? I feel like I almost get it, but that I'm missing something.
You should take a look at this PDF from back in school, at http://castle.eiu.edu; the explanation of the pseudocode is also in the PDF.
There is also a solution like this: first sort the array into some auxiliary memory, then apply the Longest Common Subsequence method to the original array and the sorted array, with the sum (not the length) of the common subsequence as the entry in the table (memoization). This can also solve the problem.
Total running time is O(n log n) + O(n^2) => O(n^2).
Space is O(n) + O(n^2) => O(n^2).
This is not a good solution when memory comes into the picture; it is just to give a glimpse of how problems can be reduced to one another.
My understanding of DP is about "making a table". In fact, the original meaning of "programming" in DP is simply about making tables.
The key is to figure out what to put in the table, or in modern terms: what state to track, or what the vertex key/value in the DAG is (ignore these terms if they sound strange to you).
How about choosing the dp[i] table entry to be the largest sub-sequence sum over the first i + 1 elements of the array; for example, take the array [5, 15, -30, 10].
The second important key is "optimal substructure": we "assume" dp[i-1] already stores the largest sum over the first i elements, which is why the only decision at step i is whether to include a[i] in the sub-sequence or not:
dp[i] = max(dp[i-1], dp[i-1] + a[i])
The first term in max is to "not include a[i]", the second term is to "include a[i]". Notice, if we don't include a[i], the largest sum so far remains dp[i-1], which comes from the "optimal substructure" argument.
So the whole program looks like this (in Python):
a = [5, 15, -30, 10]
dp = [0] * len(a)
dp[0] = max(0, a[0])  # include a[0] or not
for i in range(1, len(a)):
    dp[i] = max(dp[i-1], dp[i-1] + a[i])  # choose to add a[i] or not
print(dp, max(dp))
The result: the largest sub-sequence sum is the largest item in the dp table after the loop finishes. But take a close look at dp: it holds all the information.
Since it only goes through the items of a once, it's an O(n) algorithm.
This problem seems silly because, as long as a[i] is positive, we should always include it in the sub-sequence, since it can only increase the sum. This intuition matches the code
dp[i] = max(dp[i-1], dp[i-1] + a[i])
So the maximum-sum sub-sequence problem is easy and doesn't need DP at all. Simply:
total = 0
for v in a:
    if v > 0:
        total += v
However, what about the largest-sum "contiguous sub-array" problem? All we need to change is a single line of code:
dp[i] = max(dp[i-1]+a[i], a[i])
The first term extends the current contiguous sub-array with a[i]; the second starts a new sub-array at a[i].
In this case, dp[i] is the maximum sum of a contiguous sub-array ending at index i.
This is certainly better than the naive approach of enumerating all O(n^2) sub-arrays (a nested for j in range(0, i) loop inside the i-loop) and summing each, which is O(n^3) overall.
One small caveat: because of the way dp[0] is set, if all items in a are negative, we won't select any element. So for the maximum-sum contiguous sub-array, we change that to
dp[0] = a[0]
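Putting it together, here is the contiguous version (Kadane's algorithm) with the dp[0] = a[0] fix, collapsed to O(1) extra space since only dp[i-1] is ever needed:

def max_subarray_sum(a):
    best = cur = a[0]  # dp[0] = a[0], so all-negative arrays work too
    for x in a[1:]:
        cur = max(cur + x, x)  # extend the current sub-array or start anew
        best = max(best, cur)
    return best

print(max_subarray_sum([5, 15, -30, 10]))  # 20 (the prefix 5 + 15)
print(max_subarray_sum([-3, -1, -2]))      # -1 (all items negative)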

Does a combination of K integers exist, so that their sum is equal to a given number?

I've been breaking a sweat over this question I've been asked to answer (it's technically homework).
I've considered a hashtable, but I'm kind of stuck on the exact specifics of how I'd make this work.
Here's the question:
Given k sets of integers A_1, A_2, ..., A_k of total size O(n), you should determine whether there exist a_1 ∈ A_1, a_2 ∈ A_2, ..., a_k ∈ A_k such that a_1 + a_2 + ... + a_{k-1} = a_k. Your algorithm should run in T_k(n) time, where T_k(n) = O(n^(k/2) * log n) for even k, and O(n^((k+1)/2)) for odd values of k.
Can anyone give me a general direction so that I can come closer to solving this?
Divide the k sets into two groups: for even k, both groups have k/2 sets; for odd k, one group has (k+1)/2 sets and the other has (k-1)/2. Negate the elements of A_k first, so the question becomes whether one element can be chosen from each set so that the total is 0. Compute all possible sums (taking one element from each set) within each group. For even k, you get two arrays of n^(k/2) elements each; for odd k, one array has n^((k+1)/2) elements and the other has n^((k-1)/2). The problem is thus reduced to the standard one: given two arrays, check whether a specified sum can be reached by taking one element from each array. Sorting one array and binary-searching it for each element of the other gives the log n factor for even k; hashing the smaller array avoids the log for odd k (in expectation).
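A small Python sketch of this reduction (exponential in k, so only for illustration; the hash-set lookup stands in for the sort-and-binary-search step):

from itertools import product

def exists_combination(sets):
    # Negate A_k so the question becomes: does some choice of one
    # element per set sum to exactly 0?
    sets = sets[:-1] + [[-x for x in sets[-1]]]
    half = len(sets) // 2
    left = {sum(combo) for combo in product(*sets[:half])}
    right = {sum(combo) for combo in product(*sets[half:])}
    return any(-s in right for s in left)

print(exists_combination([[1, 2], [3, 4], [5, 7]]))  # True: 1 + 4 = 5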

Revisit: 2D Array Sorted Along X and Y Axis

So, this is a common interview question. There's already a topic up, which I have read, but it's dead, and no answer was ever accepted. On top of that, my interests lie in a slightly more constrained form of the question, with a couple practical applications.
Given a two dimensional array such that:
Elements are unique.
Elements are sorted along the x-axis and the y-axis.
Neither sort predominates, so neither sort is a secondary sorting parameter.
As a result, the diagonal is also sorted.
All of the sorts can be thought of as moving in the same direction. That is to say that they are all ascending, or that they are all descending.
Technically, I think as long as you have a >/=/< comparator, any total ordering should work.
Elements are numeric types, with a single-cycle comparator.
Thus, memory operations are the dominating factor in a big-O analysis.
How do you find an element? Only worst case analysis matters.
Solutions I am aware of:
A variety of approaches that are:
O(n log n), where you approach each row separately.
O(n log n) with strong best-case and average-case performance.
One that is O(n+m):
Start in a non-extreme corner; for ascending sorts this is the top right.
Let the target be J. The current position is M.
If M is greater than J, move left.
If M is less than J, move down.
If you can do neither, you are done, and J is not present.
If M is equal to J, you are done.
Originally found elsewhere, most recently stolen from here.
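A Python sketch of that O(n+m) walk, assuming ascending sorts and starting from the top-right corner:

def saddleback_search(matrix, target):
    row, col = 0, len(matrix[0]) - 1
    while row < len(matrix) and col >= 0:
        val = matrix[row][col]
        if val == target:
            return (row, col)  # found
        elif val > target:
            col -= 1  # everything below in this column is even larger
        else:
            row += 1  # everything to the left in this row is even smaller
    return None  # ran off the matrix: target is absent

m = [[1, 2, 4],
     [3, 5, 7],
     [6, 8, 9]]
print(saddleback_search(m, 7))   # (1, 2)
print(saddleback_search(m, 10))  # None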
And I believe I've seen one with a worst case of O(n+m) but an optimal case of nearly O(log n).
What I am curious about:
Right now, I have proved to my satisfaction that the naive partitioning attack always devolves to O(n log n). Partitioning attacks in general appear to have an optimal worst case of O(n+m), and most do not terminate early when the element is absent. I was also wondering whether an interpolation probe might be better than a binary probe, and thus it occurred to me that one might think of this as a set-intersection problem with a weak interaction between the sets. My mind cast immediately towards Baeza-Yates intersection, but I haven't had time to draft an adaptation of that approach. However, given my suspicion that the optimality of an O(n+m) worst case is provable, I thought I'd go ahead and ask here, to see if anyone could bash together a counter-argument, or pull together a recurrence relation for interpolation search.
Here's a proof that it has to be at least Omega(min(n, m)). Let n >= m. Consider the matrix that has 0s at every (i, j) with i + j < m and 2s wherever i + j >= m, except for a single (i, j) with i + j = m, which holds a 1. This is a valid input matrix, and there are m possible placements for the 1. No query into the array (other than at the actual location of the 1) can distinguish among those m possible placements. So you'll have to check all m locations in the worst case, and at least m/2 expected locations for any randomized algorithm.
One of your assumptions was that matrix elements are unique, and this construction doesn't satisfy that. It is easy to fix, however: pick a big number X = n*m, replace the 0s with distinct numbers less than X, the 2s with distinct numbers greater than X, and the 1 with X.
And because it is also Omega(lg n) (by a counting argument), it is Omega(m + lg n) when n >= m.
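For concreteness, with n = m = 4 the construction looks like this; each of these matrices is a valid, fully sorted input, and probes agree everywhere except at the 1's location:

0 0 2 2      0 0 1 2      0 0 2 2
0 1 2 2  or  0 0 2 2  or  0 0 2 2
2 2 2 2      2 2 2 2      1 2 2 2
2 2 2 2      2 2 2 2      2 2 2 2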
An optimal O(m+n) solution is to start at the top-left corner, which holds the minimal value. Move diagonally down and to the right until you hit an element whose value is >= the value of the given element. If the values are equal, return found as true.
Otherwise, from here we can proceed in two ways.
Strategy 1:
Move up in the column and search for the given element until we reach the end. If found, return found as true
Move left in the row and search for the given element until we reach the end. If found, return found as true
return found as false
Strategy 2:
Let i denote the row index and j denote the column index of the diagonal element we have stopped at. (Here, we have i = j, BTW). Let k = 1.
Repeat the steps below while i - k >= 0:
Check whether a[i-k][j] equals the given element; if yes, return found as true.
Check whether a[i][j-k] equals the given element; if yes, return found as true.
Increment k.
For example:
1 2 4 5 6
2 3 5 7 8
4 6 8 9 10
5 8 9 10 11
