Is there a way to reduce the complexity of this program? - algorithm

the problem is as follows:
Consider a table with 1 column as follows:
a,b
b,c
a,c
a,b,d
...
I have to find a list of elements such that exactly 1 element of the list is present in every row of the table.
The only way I can think of is creating every possible subset and checking whether one of them satisfies the condition.
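For reference, the brute-force idea described above could be sketched roughly as follows. This is only a minimal illustration, assuming each row is given as a Python set; the function name exact_hitting_set is made up for this example. It tries smaller subsets first and is exponential in the number of distinct elements.

```python
from itertools import combinations

def exact_hitting_set(rows):
    """Return a set that shares exactly one element with every row, or None.

    rows: list of sets (assumed input format for this sketch).
    """
    # Collect every distinct element that appears anywhere in the table.
    elements = sorted(set().union(*rows))
    # Try candidate subsets from smallest to largest.
    for size in range(1, len(elements) + 1):
        for candidate in combinations(elements, size):
            chosen = set(candidate)
            # Valid only if each row contains exactly one chosen element.
            if all(len(chosen & row) == 1 for row in rows):
                return chosen
    return None
```

For the four rows shown in the question (a,b / b,c / a,c / a,b,d) this sketch finds no valid list, because the first three rows cannot each contain exactly one chosen element.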

Related

matrix problem solving - find the first column that has a value 1 in a matrix of 1's and 0's

Was asked this question in a coding round:
Given a matrix of 0's and 1's where the values in every row are in ascending order, i.e. the 1's always come after the 0's. Consider the example:
0,0,0,1,1
0,0,1,1,1
0,0,0,0,1
1,1,1,1,1
0,0,0,0,0
Find the first column that has a 1 (scanning from left to right).
In this case the first column has a 1 (in row 4).
The answer is 1.
I suggested a column-wise traversal across all rows, exiting when the current column contains a 1 in any of the rows.
Since the worst-case performance is n * n (comparing every element in the matrix), the interviewer wasn't pleased and was looking for an efficient solution. What is an efficient solution here?
Take advantage of the fact that the rows are sorted which is evident from "in any row - the values will be ascending order. i.e 1's are always after the 0's"
Let there be m rows and n columns. Do a binary search on the first row to find its first 1 and store that column index in a variable, say index. (One may think of a better variable name; I am just focused here on solving the problem optimally.) Continue with a binary search on every remaining row, updating index whenever that row's first 1 lies in a column with a smaller index. After every row has been searched, index holds the result.
Time complexity: m rows * log2(n columns) i.e. O(m * log2(n)).
This is the approach I could think of, which is better than the brute force approach having O(mn) time complexity. I don't think there would be a more optimal approach in terms of time and space complexity, as one has to search for the first 1 in every row.
[I don't think I should add the details on how to do a binary search to figure out the first column containing a 1. In case someone isn't very familiar with binary search, I leave this trivial part as an exercise.]
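For readers who do want to see the row-by-row binary search spelled out, here is a minimal sketch using Python's standard bisect module. The function name first_column_with_one is made up for this example, the returned index is 0-based (so the question's answer of 1 corresponds to 0 here), and None is returned when the matrix has no 1 at all. As a small extra beyond the answer's description, each search is limited to the columns left of the best index found so far.

```python
from bisect import bisect_left

def first_column_with_one(matrix):
    """Return the 0-based index of the leftmost column containing a 1, or None.

    Assumes every row is sorted so that all 0's come before all 1's.
    """
    best = None
    for row in matrix:
        # Only columns strictly left of the current best can improve it.
        hi = len(row) if best is None else best
        # bisect_left finds the position of the first 1 within row[0:hi].
        idx = bisect_left(row, 1, 0, hi)
        if idx < hi:
            best = idx
    return best
```

With the example matrix from the question this returns 0, i.e. the first column.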

Assign seats in an Auditorium

I encountered an Interview Question:
There is an event in an auditorium, and the capacity of the auditorium is N x M.
Groups of people have booked tickets and every ticket is sold. Now you have to assign seat numbers to all of them such that the minimum number of groups is split.
So basically a 2-D array is given and we have some groups of certain sizes (different groups may have different sizes). The array needs to be completely filled with the minimum number of groups split.
One brute-force recursive approach I found is: place the first group, then the second group, and so on. Permute this arrangement to find the arrangement with the fewest splits.
One efficient solution I found uses the subset sum problem.
https://en.wikipedia.org/wiki/Subset_sum_problem
I could not understand how the subset sum problem can be used to solve this problem.
Please suggest how I can approach this problem. I am not looking for code; just pseudo-code or an algorithm will suffice.
Firstly, I'm assuming that "group-split" means that some part of the group is in one row and the remainder is in another. If the number of seats in a row is N, and you are given a set containing the sizes of the different groups, you need to find a subset that sums to N. If such a subset is found, that row will be filled without breaking any groups. If no such subset is found, then you will need to break at least 1 group. There can be multiple strategies here.
1) You can pick a group that will be split across 2 rows. This group can be the largest of the remaining, or the smallest or can be picked at random. Once this group is decided, you have 2 rows with less than N empty seats that need to be filled recursively.
2) The strategy can be to find a subset that sums to 2*N - if found, 1 group will be split. If not found, then find a subset that sums to 3*N with 2 group-splits and so on. The maximum number of group-splits will be M-1 for M rows.
Continue 1) or 2) to fill M number of rows in the theatre.
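The subset-sum building block used by both strategies could be sketched as below. This is only an illustration, assuming positive integer group sizes; the function name find_subset_with_sum is made up for this example, and it covers just the "fill one row without a split" step, not the full seat assignment.

```python
def find_subset_with_sum(sizes, target):
    """Return indices of groups whose sizes sum to target, or None.

    sizes: list of positive integers (group sizes).
    target: number of seats to fill, e.g. N for a single row.
    """
    # reachable[s] holds one list of group indices whose sizes add up to s.
    reachable = {0: []}
    for i, size in enumerate(sizes):
        # Iterate over a snapshot so that each group is used at most once.
        for s, chosen in list(reachable.items()):
            t = s + size
            if t <= target and t not in reachable:
                reachable[t] = chosen + [i]
    return reachable.get(target)
```

If the call returns None for target N, no row can be filled from the remaining groups without a split, and one of the strategies above (for instance retrying with target 2*N) would apply.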

Algorithm to determine if a set of sets can “cover” a range

Let's say you have a finite and arbitrary set of sets, and each inner set can contain integers from 1 to 4, not repeating. So a set could be {{1}, {1,4}, {1,4}, {1,2,3,4,4}, {2,3,4}}. And suppose you have a requirement: a set of numbers that have to be found in the inner sets, where each inner set can contribute only one number to the requirement.
That was probably confusing, so let me give an example: say the requirement is {1,2,3,4} and the set is {{1,2,3,4}, {3,4}, {1,2}, {1,2}}. Then it meets the requirement, since you could take 3 from the first inner set, 4 from the second, 1 from the third, and 2 from the last. However, if the set is {{1,2,3,4}, {1,2}, {1,2}, {1,2}} then it does not meet the requirement, since you could get a 3 or a 4 from the first inner set, but could not get the other one from any of the other inner sets.
Note that for the requirements, duplicates are fine: so a requirement of {1,1,3} is allowed.
So my question is: Given a requirement and a set, how would you write an algorithm to determine if the set satisfies the condition?
Thanks for reading this!
Take the cross product of the inner sets, and see if it contains the requirement. (Where by cross-product of sets A and B, I mean all sets that can be derived by taking one element from A and one element from B; if exactly one element is a set, add the other element to that set; if both are sets, take their union.)
Try maximal matching in an unweighted bipartite graph.
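To make the matching suggestion concrete, here is a minimal sketch that treats each required number as a left vertex and each inner set as a right vertex, and then searches for augmenting paths (the simple approach often called Kuhn's algorithm). The function name satisfies and the input format (a list for the requirement, a list of sets for the inner sets) are assumptions made for this example.

```python
def satisfies(requirement, inner_sets):
    """Return True if every required number can come from a distinct inner set.

    requirement: list of numbers (duplicates allowed).
    inner_sets: list of sets; each may contribute at most one number.
    """
    # match[j] is the index of the requirement item assigned to inner set j.
    match = [None] * len(inner_sets)

    def try_assign(i, visited):
        # Look for an inner set for requirement item i, re-assigning
        # previously matched items along an augmenting path if needed.
        for j, s in enumerate(inner_sets):
            if requirement[i] in s and j not in visited:
                visited.add(j)
                if match[j] is None or try_assign(match[j], visited):
                    match[j] = i
                    return True
        return False

    # The requirement is met only if every item gets its own inner set.
    return all(try_assign(i, set()) for i in range(len(requirement)))
```

On the two examples from the question this returns True for {{1,2,3,4}, {3,4}, {1,2}, {1,2}} and False for {{1,2,3,4}, {1,2}, {1,2}, {1,2}}.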

Algorithm X to Solve the Exact Cover: Fat Matrices

As I was reading about Knuth's Algorithm X to solve the exact cover problem, I thought of an edge case that I wanted some clarification on.
Here are my assumptions:
Given a matrix A, Algorithm X's "goal is to select a subset of the rows so that the digit 1 appears in each column exactly once."
If the matrix is empty, the algorithm terminates successfully and the solution is then the subset of rows logged in the partial solution up to that point.
If there is a column of 0's, the algorithm terminates unsuccessfully.
For reference: http://en.wikipedia.org/wiki/Algorithm_X
Consider the matrix A:
[[1 1 0]
[0 1 1]]
Steps I took:
Given Matrix A:
1. Choose a column, c, with the least number of 1's. I choose: column 1
2. Choose a row, r, that contains a 1 in column c. I choose: row 1
3. Add r to the partial solution.
4. For each column j such that A(r, j) = 1:
       For each row i such that A(i, j) = 1:
           delete row i
       delete column j
5. Matrix A is empty. Algorithm terminates successfully and solution is allegedly: {row 1}.
However, this is clearly not the case as row 1 only consists of [1 1 0] and clearly does not cover the 3rd column.
I would assume that the algorithm should at some point reduce the matrix to the point where there is only a single 0 and terminate unsuccessfully.
Could someone please explain this?
I think the confusion here is simply in the use of the term empty matrix. If you read Knuth's original paper (linked on the Wikipedia article you cited), you can see that he was treating the rows and columns as doubly-linked lists. When he says that the matrix is empty, he doesn't mean that it has no entries, he means that all the row and column objects have been deleted.
To clarify, I'll label the rows with lower case letters and the columns with upper case letters, as follows:
  | A | B | C
---------------
a | 1 | 1 | 0
---------------
b | 0 | 1 | 1
The algorithm states that you choose a column deterministically (using any rule you wish), and he suggests choosing a column with the fewest number of 1's. So, we'll proceed as you suggest and choose column A. The only row with a 1 in column A is row a, so we choose row a and add it to the possible solution { a }. Now, row a has 1s in columns A and B, so we must delete those columns, and any rows containing 1s in those columns, that is, rows a and b, just as you did. The resulting matrix has a single column C and no rows:
  | C
-------
This is not an empty matrix (it has a column remaining). However, column C has no 1s in it, so we terminate unsuccessfully, as the algorithm indicates.
This may seem odd, but it is a very important case if we intend to use an incidence matrix for the Exact Cover Problem, because columns represent elements of the set X that we wish to cover and rows represent subsets of X. So a matrix with some columns and no rows represents the exact cover problem where the collection of subsets to choose from is empty (but there are still points to cover).
If this description causes problems for your implementation, there is a simple workaround: just include the empty set in every problem. The empty set (containing no points of X) is represented by a row of all zeros. It is never selected by your algorithm as part of a solution, never collides with any other selected rows, but always ensures that the matrix is nonempty (there is at least one row) until all the columns have been deleted, which is really all you care about since you need to make sure that each column is covered by some row.
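The termination rule discussed above (succeed only when no columns remain, fail when some remaining column has no 1s) can be illustrated with a small recursive sketch. This is not Knuth's linked-list implementation; it is a plain dictionary-based version written for this example, and the name exact_cover and the input format are assumptions.

```python
def exact_cover(columns, rows, partial=()):
    """Yield every list of row labels that covers each column exactly once.

    columns: set of column labels that still need to be covered.
    rows: dict mapping a row label to the set of columns where it has a 1.
    """
    if not columns:
        # No columns left: the partial solution is a valid exact cover.
        yield list(partial)
        return
    # Choose a column with the fewest 1s among the remaining rows.
    c = min(columns, key=lambda col: sum(col in cols for cols in rows.values()))
    # If no remaining row has a 1 in column c, the loop body never runs
    # and this branch terminates unsuccessfully.
    for r, cols in rows.items():
        if c not in cols:
            continue
        # Delete the columns covered by r and every row that clashes with r.
        remaining_columns = columns - cols
        remaining_rows = {k: v for k, v in rows.items() if not (v & cols)}
        yield from exact_cover(remaining_columns, remaining_rows, partial + (r,))
```

For the matrix in the question, list(exact_cover({'A', 'B', 'C'}, {'a': {'A', 'B'}, 'b': {'B', 'C'}})) is empty: after choosing row a, column C is left with no rows, so the search fails rather than reporting {a} as a solution.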

Efficiently determining the relationship between rows in a spreadsheet

This is a problem I've just run into, or rather it's a simplification that captures the core problem.
Imagine I have a spreadsheet containing a number of columns, each of them labeled, and a number of rows.
I want to determine when the value in one column can be inferred from the value in another. For example, we might find that every time a '1' appears in column a, a '5' always appears in column d, but whenever a '2' appears in column a, a '3' always appears in column d. We observe that the value in column a reliably predicts the value in column d.
The goal is to identify all such relationships between columns.
The naive solution is to start with a list of all pairs of columns, (a, b), (a, c), (a, d)... (b, c), (b, d)... and so on. We call these the "eligible" list.
For each of these pairs, we keep track of the value of the first in the pair, and the corresponding value in the second. If we notice that we see the same value for the first of a pair, but a different value for the second of a pair, then this pair is no longer eligible.
Whatever is left at the end of this process is the set of valid relationships.
Unfortunately, this rapidly becomes impractical as the number of columns increases, as the amount of data we must store is in the order of the number of columns squared.
Can anyone think of an efficient way to do this?
I don't think you can improve on O(n^2) for n columns: consider the case where no relationship exists between any pair. The only way to discover this is to test all pairs, which is O(n^2).
I suspect you might be best to build up the relation, rather than whittle it down.
You might well have to store n^2 pieces of information, where you have n columns. For example, if a column never repeats (i.e. its value is different on each row) then that column predicts all others. If every column is like that then every column predicts every other. You could use a two-dimensional table, pred say, indexed by column numbers, with pred(a,b) true if a predicts b. pred(a,b) could have any of 3 values: true, false and unknown.
The predicts relation is transitive, that is, if a predicts b and b predicts c then a predicts c. If the number of rows is large, so that checking whether one column predicts another is expensive, then it might be worth using transitivity to fill out what you can: if you have just computed that pred(a,b) is true and you have already computed pred(b,x) for every x, then you can set pred(a,y) true for every y for which pred(b,y) is true.
To fill out pred(a,.) you could build a temporary array of pairs (value, row-index) from a, and then sort by value; this gives you easy access to the sets of indices where a is constant. If each of these sets is a singleton, then pred(a,b) is true for every b; otherwise, to check if a predicts b (if it's not already known) you need to check that b is constant on each index set (with more than one member) where a is constant.
An optimisation might be that if pred(a,b) is true, and also pred(b,a) is true then for every c, pred(a,c) if and only if pred(b,c); thus in this case if you have already filled out pred(b,.) you can fill out all pred(a,.) by copying.
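As a rough illustration of the pred table, here is a minimal sketch. It uses a dictionary lookup in place of the sort-by-value step described above (both just group the rows on which a is constant), and it skips the transitivity and copying shortcuts. The names predicts and build_pred and the column-oriented input format are assumptions made for this example.

```python
from itertools import permutations

def predicts(col_a, col_b):
    """Return True if each value in col_a always pairs with one value in col_b."""
    seen = {}
    for x, y in zip(col_a, col_b):
        # setdefault records y the first time x appears, then returns the
        # recorded value; a mismatch means x maps to two different values.
        if seen.setdefault(x, y) != y:
            return False
    return True

def build_pred(table):
    """Build pred[(a, b)] for every ordered pair of distinct columns.

    table: dict mapping a column label to its list of values
    (all lists are assumed to have the same length).
    """
    return {(a, b): predicts(table[a], table[b])
            for a, b in permutations(table, 2)}
```

With columns a = [1, 1, 2, 2] and d = [5, 5, 3, 3], pred[('a', 'd')] comes out true, matching the example in the question.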
