I am reading chapter 15 from CLRS and came across this definition of a subsequence:
A subsequence of a given sequence is just the given sequence with zero
or more elements left out.
Later it is said that:
Each subsequence of X corresponds to a subset of the indices {1, 2, ..., m} of X. Because X has 2^m subsequences...
I don't see how X can have 2^m subsequences. From what I understand, if
X = {A, B}, then the subsequences of X can be {A}, {B} and {A, B} so we have 3 subsequences and not 2^2. Could someone please show me what I am missing here?
There is one more: the empty subsequence (leave every element out), which brings the count to 4 = 2^2.
For any set S, the power set of S is the set of all possible subsets of S, and its size is 2^|S|, where |S| is the cardinality of the set, i.e. the number of elements in the set.
In your case the number of possible subsequences equals the size of the power set of the sequence's index set.
In your example X = {A, B}, the possible subsequences are {}, {A}, {B} and {A, B}.
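To make the count concrete, here is a small Python illustration (not from the book, just a check) that enumerates every subsequence of a two-element sequence by choosing subsets of positions:

    from itertools import combinations

    X = ['A', 'B']
    # Each subsequence corresponds to a subset of positions, so there are
    # 2**len(X) of them, including the empty subsequence.
    subsequences = [list(c) for r in range(len(X) + 1) for c in combinations(X, r)]
    print(subsequences)       # [[], ['A'], ['B'], ['A', 'B']]
    print(len(subsequences))  # 4, i.e. 2**2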
The 3-colorful-sets problem: we are given M sets of size three over the elements {1, ..., n}; in other words, sets S1, S2, ..., SM where, for every i, Si = {x, y, z} for some x, y, z ∈ {1, ..., n}. I want to pick a set of elements E ⊆ {1, ..., n} so as to maximize the number of sets that contain exactly one element of E, that is, to maximize |{i : |Si ∩ E| = 1}|. A solution could use a 3-approximation polynomial-time algorithm.
I am thinking of a randomized algorithm that guarantees the approximation ratio, or a deterministic one. I have some ideas but I am not sure how to actually implement them. Any help would be appreciated.
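One direction for the randomized idea (a sketch only, with my own hypothetical names, not a worked-out analysis): include each element independently with probability 1/3. For a size-3 set the probability that exactly one of its elements is picked is 3 * (1/3) * (2/3)^2 = 4/9, so the expected number of satisfied sets is 4M/9, which is at least a third of the optimum since the optimum is at most M. The best-of-several-trials loop below is only a heuristic way to use that observation:

    import random

    def pick_random_E(n, p=1/3):
        """Include each element of {1, ..., n} independently with probability p."""
        return {x for x in range(1, n + 1) if random.random() < p}

    def score(sets, E):
        """Number of sets containing exactly one element of E."""
        return sum(1 for s in sets if len(s & E) == 1)

    # Hypothetical example data: M = 3 sets of size three over {1, ..., 5}.
    sets = [frozenset(s) for s in ({1, 2, 3}, {2, 4, 5}, {1, 3, 5})]
    best_E = max((pick_random_E(5) for _ in range(50)), key=lambda E: score(sets, E))
    print(best_E, score(sets, best_E))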
The famous algorithm for the exact cover problem, given by Donald Knuth, is called Knuth's Algorithm X.
Input: a list of subsets of a universal set
Output: all possible collections of disjoint subsets whose union is the universal set
Suppose the input is {ab, ac, cd, c, d, a, b}. Is it possible to modify Knuth's Algorithm X so that it gives output according to some predefined block sizes? For example, if {2, 2} is the set of block sizes, it should output {ab, cd}; if {2, 1, 1} is the set of block sizes, it should output {ab, c, d}, {ac, b, d} and {cd, b, a}.
You can (optionally) start by removing from your input list all subsets whose size is not in the set of block sizes.
The original Knuth's Algorithm X can be altered to take the set of block sizes (for example {2, 1, 1}) as a restriction; the extensions to the original steps are spelled out below, and a code sketch follows the steps:
If A is empty and the set of block sizes is empty, the problem is solved; terminate successfully.
Otherwise choose a column, c (deterministically).
Choose a row, r, such that A[r, c] = 1 and the number of 1s in row r is in the set of block sizes (nondeterministically).
Include row r in the partial solution.
Remove one occurrence of the number of 1s in row r from the (multi)set of block sizes.
For each j such that A[r, j] = 1,
Delete column j from matrix A;
For each i such that A[i, j] = 1,
Delete row i from matrix A.
Repeat this algorithm recursively on the reduced matrix A and reduced set of block sizes.
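Here is a plain recursive Python sketch of this restricted search, written over sets rather than Knuth's matrix formulation (no Dancing Links); the function name and structure are my own illustration, with "choose a column" corresponding to picking an uncovered element and "choose a row" to picking a candidate subset:

    from collections import Counter

    def exact_cover_with_blocks(universe, subsets, block_sizes):
        """Yield exact covers of `universe` whose multiset of chosen subset
        sizes equals `block_sizes`."""
        sizes = Counter(block_sizes)
        # Optional pruning: drop subsets whose size never appears as a block size.
        subsets = [s for s in subsets if sizes[len(s)] > 0]

        def search(remaining, partial):
            if not remaining:
                if sum(sizes.values()) == 0:   # all block sizes consumed
                    yield list(partial)
                return
            x = min(remaining)                 # "choose a column" deterministically
            for s in subsets:
                # "choose a row": s must cover x, be disjoint from the partial
                # solution, and have a size still available among the block sizes.
                if x in s and s <= remaining and sizes[len(s)] > 0:
                    sizes[len(s)] -= 1
                    partial.append(s)
                    yield from search(remaining - s, partial)
                    partial.pop()
                    sizes[len(s)] += 1

        yield from search(frozenset(universe), [])

    universe = set("abcd")
    candidates = [frozenset(x) for x in ("ab", "ac", "cd", "c", "d", "a", "b")]
    for cover in exact_cover_with_blocks(universe, candidates, [2, 1, 1]):
        print(sorted(''.join(sorted(s)) for s in cover))
    # Prints the three covers from the question: {ab, c, d}, {ac, b, d}, {cd, a, b}.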
I recently thought of this problem, and I thought of an "instinctive" greedy solution but I can't prove its optimality.
You are given N integers, V1, V2, ..., VN and K sets (K < N).
You need to find a way of partitioning the integers into the sets, so that the minimum difference between any two elements in the same set is maximized.
For example, when the integers are 1, 5, 6, 8, 8 and you have 2 sets, an optimal way of partitioning the integers would be
{1, 6, 8}
{5, 8}
So the minimum difference is between 6 and 8, which is 2.
This arrangement is not unique, for example
{1, 5, 8}
{6, 8}
Also gives a minimum difference of 2.
I was wondering whether I could use a greedy algorithm to solve this.
I would sort the values first, and then put V(1), V(1+K), V(1+2K), ... together in one set, V(2), V(2+K), V(2+2K), ... together in another, and so on, as in the sketch below.
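A minimal sketch of this round-robin idea (0-based indexing and my own naming):

    def partition_round_robin(values, k):
        """Sort, then deal element i into set i % k."""
        values = sorted(values)
        sets = [[] for _ in range(k)]
        for i, v in enumerate(values):
            sets[i % k].append(v)
        return sets

    print(partition_round_robin([1, 5, 6, 8, 8], 2))  # [[1, 6, 8], [5, 8]]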
Is there a proof for the optimality of this solution, or a counterexample where this does not work?
Thanks.
Yes, it's optimal. We'll show that if a difference D appears using your process, then for any arrangement of the numbers there's a pair of numbers in the same set which differ by at most D.
To prove it, consider adding the sorted numbers one by one to the K sets. Let's call the sorted numbers x[i]. Suppose we're adding x[n] to one of the sets. The largest value already in that set is x[n-K], with x[n] - x[n-K] = D for some D.
Now, x[n-K], x[n-K+1], ..., x[n] are K+1 numbers, all of which differ from each other by at most D (since the numbers are sorted and x[n] - x[n-K] = D).
By the pigeonhole principle, two of these K+1 numbers must fall in the same set no matter how you arrange them into K sets, so the maximum minimum distance must be at most D.
This proves that if a distance D appears in your process, then the maximum minimum distance achievable is at most D.
Let D_min be the smallest difference between two numbers in the same set using your process. We've just shown that the maximum achievable minimum distance is <= D_min; but also D_min <= the maximum achievable minimum distance (since D_min is itself the minimum distance of one particular arrangement), which shows that D_min is the maximum minimum distance.
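For small inputs the claim can also be sanity-checked by brute force over every possible assignment; a quick sketch (my own helper names) using the question's example:

    from itertools import product

    def min_same_set_diff(values, assignment, k):
        """Smallest |a - b| over pairs of values placed in the same set."""
        groups = [[] for _ in range(k)]
        for v, g in zip(values, assignment):
            groups[g].append(v)
        best = float('inf')
        for grp in groups:
            grp.sort()
            for a, b in zip(grp, grp[1:]):
                best = min(best, b - a)
        return best

    def brute_force_best(values, k):
        """Best achievable minimum difference over all assignments to k sets."""
        return max(min_same_set_diff(values, a, k)
                   for a in product(range(k), repeat=len(values)))

    values, k = [1, 5, 6, 8, 8], 2
    sorted_vals = sorted(values)
    round_robin = [i % k for i in range(len(sorted_vals))]
    print(min_same_set_diff(sorted_vals, round_robin, k))  # 2
    print(brute_force_best(values, k))                     # 2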
I was reading about Big Oh notation in Skiena's Algorithm Design Manual and came across the following explanation of O(2ⁿ):
Exponential functions: Functions like 2ⁿ arise when enumerating all subsets of n items.
What does this mean in terms of a concrete example?
Say I have the set {1,2,3,4} (therefore n = 4). This would mean (according to Skiena's definition) that the number of subsets is 2⁴, which is 16 subsets. I can't figure out what these 16 subsets are.
Does the 2 in 2ⁿ mean that the subsets are restricted to a size of 2 each?
Edit: I guess part of what I'm asking is, why 2ⁿ and not 3ⁿ, for example? This doesn't feel intuitive at all to me.
Here's a list of all the subsets of {1, 2, 3, 4}:
{} 1
{1}, {2}, {3}, {4} + 4
{1,2}, {1,3}, {1,4}, {2,3}, {2,4}, {3,4} + 6
{1,2,3}, {1,2,4}, {1,3,4}, {2,3,4} + 4
{1,2,3,4} + 1
= 16
The reason that the count is 2ⁿ and not 3ⁿ is that to create a subset, you can imagine going through each element and making the decision "is the element in the subset or not?".
That is, you choose between two possibilities (in and out) for each of n elements, so the total number of ways to make this decision (and thus the total number of subsets) is
2 * 2 * 2 * .... * 2
\________ ________/
         \/
      n times
which is 2ⁿ.
One subset of 0 elements: {}
Four subsets of 1 element: {1} {2} {3} {4}
Six subsets of 2 elements: {1,2} {1,3} {1,4} {2,3} {2,4} {3,4}
Four subsets of 3 elements: {1,2,3} {1,2,4} {1,3,4} {2,3,4}
One subset of 4 elements {1,2,3,4}
The total number of subsets is therefore sixteen.
The 2 in 2ⁿ simply means that the "workload" rises in proportion to the exponential function 2ⁿ. This is much worse than even n², where it simply rises with the square.
This set of all subsets of a finite set is known as the power set and, if you really want to know why its size is 2ⁿ, the properties section of that page explains:
We write any subset of S in the format {X₁, X₂, ..., Xₙ} where Xᵢ, 1 ≤ i ≤ n, can take the value 0 or 1. If Xᵢ = 1, the i-th element of S is in the subset; otherwise, the i-th element is not in the subset. Clearly the number of distinct subsets that can be constructed this way is 2ⁿ.
Basically what that means in layman's terms is that, in a given subset, each element can either be there or not there. The number of possibilities is therefore similar to what you see with n-bit binary numbers.
For one bit, there are two possibilities 0/1, equivalent to the set {a} which has subsets {} {a}.
For two bits, four possibilities 00/01/10/11, equivalent to the set {a,b} which has subsets {} {a} {b} {a,b}.
For three bits, eight possibilities 000/001/010/011/100/101/110/111, equivalent to the set {a,b,c} which has subsets {} {a} {b} {c} {a,b} {a,c} {b,c} {a,b,c}.
And so on, including the next step of four elements giving sixteen possibilities as already seen above.
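The same correspondence can be shown in a few lines of Python, where each 4-bit number selects one subset of {1, 2, 3, 4} (an illustration only):

    elements = [1, 2, 3, 4]
    n = len(elements)
    # Bit i of `mask` answers "is element i in the subset or not?".
    for mask in range(2 ** n):
        subset = [elements[i] for i in range(n) if mask & (1 << i)]
        print(format(mask, '04b'), subset)
    # Exactly 2**4 = 16 lines are printed, one per subset.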
I have faced the following problem recently:
We have a sequence A of M consecutive integers, beginning at A[1] = 1:
1,2,...M (example: M = 8 , A = 1,2,3,4,5,6,7,8 )
We have the set T consisting of all possible subsequences made from L_T consecutive terms of A.
(example L_T = 3 , subsequences are {1,2,3},{2,3,4},{3,4,5},...). Let's call the elements of T "tiles".
We have the set S consisting of all possible subsequences of A that have length L_S. ( example L_S = 4, subsequences like {1,2,3,4} , {1,3,7,8} ,...{4,5,7,8} ).
We say that an element s of S can be "covered" by K "tiles" of T if there exist K tiles in T such that the union of their sets of terms contains the terms of s as a subset. For example, the subsequence {1,2,3} can be covered with 2 tiles of length 2 ({1,2} and {3,4}), while the subsequence {1,3,5} cannot be covered with 2 tiles of length 2, but can be covered with 2 tiles of length 3 ({1,2,3} and {4,5,6}).
Let C be the subset of elements of S that can be covered by K tiles of T.
Find the cardinality of C given M, L_T, L_S, K.
Any ideas on how to tackle this problem would be appreciated.
Assume M is divisible by L_T, so that we have an integer number of disjoint tiles covering all elements of the initial set (otherwise the statement is currently unclear).
First, let us count F (P): it will be almost the number of subsequences of length L_S which can be covered by no more than P tiles, but not exactly that.
Formally, F(P) = choose(M/L_T, P) * choose(P*L_T, L_S).
We start by choosing exactly P covering tiles: the number of ways is choose(M/L_T, P).
When the tiles are fixed, we have exactly P*L_T distinct elements available, and there are choose(P*L_T, L_S) ways to choose a subsequence.
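For instance, with the question's example values M = 8, L_T = 2, L_S = 3 and P = 2, this gives F(2) = choose(4, 2) * choose(4, 3) = 6 * 4 = 24.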
Well, this approach has a flaw.
Note that, when we chose a tile but did not use its elements at all, we in fact counted some subsequences more than once.
For example, if we fixed three tiles numbered 2, 6 and 7, but used only 2 and 7, we counted the same subsequences again and again when we fixed three tiles numbered 2, 7 and whatever.
The problem described above can be countered by a variation of the inclusion-exclusion principle.
Indeed, a subsequence which uses only Q tiles out of the P selected tiles is counted choose(M/L_T - Q, P - Q) times instead of only once: Q of the P choices are fixed, but the other ones are arbitrary.
Define G(P) as the number of subsequences of length L_S which can be covered by exactly P tiles.
Then F(P) is the sum, for Q from 0 to P, of the products G(Q) * choose(M/L_T - Q, P - Q).
Working from P = 0 upwards, we can calculate all the values of G by calculating the values of F.
For example, we get G (2) from knowing F (2), G (0) and G (1), and also the equation connecting F (2) with G (0), G (1) and G (2).
After that, the answer is simply sum for P from 0 to K of the values G (P).
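Under the disjoint-tile assumption made above, the whole computation fits in a few lines of Python (a sketch with my own function name, run on the question's example values):

    from math import comb

    def count_coverable(M, L_T, L_S, K):
        n_tiles = M // L_T                     # assumes M is divisible by L_T
        # F(P) = choose(n_tiles, P) * choose(P * L_T, L_S)
        F = [comb(n_tiles, P) * comb(P * L_T, L_S) for P in range(n_tiles + 1)]
        # Invert F(P) = sum_{Q <= P} G(Q) * choose(n_tiles - Q, P - Q),
        # working from P = 0 upwards.
        G = []
        for P in range(n_tiles + 1):
            overcount = sum(G[Q] * comb(n_tiles - Q, P - Q) for Q in range(P))
            G.append(F[P] - overcount)
        # Answer: subsequences coverable by at most K tiles.
        return sum(G[P] for P in range(min(K, n_tiles) + 1))

    print(count_coverable(M=8, L_T=2, L_S=3, K=2))  # 24 for the example values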