I was reading about Big Oh notation in Skiena's Algorithm Design Manual and came across the following explanation of O(2ⁿ):
Exponential functions: Functions like 2ⁿ arise when enumerating all subsets of n items.
What does this mean in terms of a concrete example?
Say I have the set {1,2,3,4} (therefore n = 4). According to Skiena's definition, this would mean that the number of subsets is 2⁴, which is 16 subsets. I can't figure out what these 16 subsets are.
Does the 2 in the relation 2ⁿ mean that the subsets are restricted to a size of 2 each?
Edit: I guess part of what I'm asking is, why 2ⁿ and not, for example, 3ⁿ? This doesn't feel intuitive at all to me.
Here's a list of all the valid subsets of {1, 2, 3, 4}:
{} 1
{1}, {2}, {3}, {4} + 4
{1,2}, {1,3}, {1,4}, {2,3}, {2,4}, {3,4} + 6
{1,2,3}, {1,2,4}, {1,3,4}, {2,3,4} + 4
{1,2,3,4} + 1
= 16
The reason that the count is 2ⁿ and not 3ⁿ is that to create a subset, you can imagine going through each element and making the decision "is the element in the subset or not?".
That is, you choose between two possibilities (in and out) for each of n elements, so the total number of ways to make this decision (and thus the total number of subsets) is
2 * 2 * 2 * .... * 2
\__________________/
       n times
which is 2ⁿ.
One subset of 0 elements: {}
Four subsets of 1 element: {1} {2} {3} {4}
Six subsets of 2 elements: {1,2} {1,3} {1,4} {2,3} {2,4} {3,4}
Four subsets of 3 elements: {1,2,3} {1,2,4} {1,3,4} {2,3,4}
One subset of 4 elements {1,2,3,4}
The total number of subsets is therefore sixteen.
The 2 in 2ⁿ simply means that the "workload" rises as an exponential function of n. This is much worse than even n², where it rises only with the square.
This set of all subsets of a finite set is known as the power set and, if you really want to know why it's 2ⁿ, the properties section of that page explains:
We write any subset of S in the format {X₁, X₂, ..., Xₙ} where Xᵢ, 1 ≤ i ≤ n, can take the value of 0 or 1. If Xᵢ = 1, the i-th element of S is in the subset; otherwise, the i-th element is not in the subset. Clearly the number of distinct subsets that can be constructed this way is 2ⁿ.
Basically what that means in layman's terms is that, in a given subset, each element can either be there or not there. The number of possibilities is therefore similar to what you see with n-bit binary numbers.
For one bit, there are two possibilities 0/1, equivalent to the set {a} which has subsets {} {a}.
For two bits, four possibilities 00/01/10/11, equivalent to the set {a,b} which has subsets {} {a} {b} {a,b}.
For three bits, eight possibilities 000/001/010/011/100/101/110/111, equivalent to the set {a,b,c} which has subsets {} {a} {b} {c} {a,b} {a,c} {b,c} {a,b,c}.
And so on, including the next step of four elements giving sixteen possibilities as already seen above.
Find the only two numbers in an array where one evenly divides the other - that is, where the result of the division operation is a whole number
Input Arrays Output
5 9 2 8 8/2 = 4
9 4 7 3 9/3 = 3
3 8 6 5 6/3 = 2
The brute force approach of having nested loops has time complexity of O(n^2). Is there any better way with less time complexity?
This question is part of advent of code.
Given an array of numbers A, you can identify the denominator by multiplying all the numbers together to give E, then testing each i-th element by dividing E by Aᵢ². If this is a whole number, you have found the denominator, as no other factors can be introduced by multiplication.
Once you have the denominator, it's a simple task to do a second, independent loop searching for the paired numerator.
This eliminates the n² comparisons.
Why does this work? First, we have a collection of n−2 non-divisors: abcde..
To complete the array, we also have numerator x and denominator y.
However, we know that x and only x has a factor of y, so it can be expressed as yz (z being the whole-number quotient of x divided by y)
When we multiply out all the numbers, we end up with xyabcde.., but as x = yz, we can also say y²zabcde..
When we loop through dividing by the squared i'th element from the array, for most of the elements we create a fraction, e.g. for a:
y²zabcde.. / a² = y²zbcde.. / a
However, for y and y only:
y²zabcde.. / y² = zabcde..
Why doesn't this work? The same is true of the other numbers. There's no guarantee that a and b can't produce another common factor when multiplied. Take the example of [9, 8, 6, 4], 9 and 8 multiplied equals 72, but as they both include prime factors 2 and 3, 72 has a factor of 6, also in the array. When we multiply it all out to 1728, those combine with the original 6 so that it can divide soundly by 36.
How might this be fixed? More accurately, if y is a factor of x, then y's prime factors will uniquely be a subset of x's prime factors, so maybe things can be refined along those lines. Obtaining a prime factorization should not scale according to the size of the array, but comparing subsets would, so it's not clear to me if this is at all useful.
I think that O(n^2) is the best time complexity you can get without any assumptions on the data.
If you can't tell anything about the numbers, knowing that x and y do not divide each other tells you nothing about x and z or y and z for any x, y, z. Therefore, in the worst case you must check all pairs of numbers - equal to n Choose 2 = n*(n-1)/2 = O(n^2).
Clearly, we can get O(n * sqrt(m)), where m is the absolute value range, by listing the pairs of divisors of each element against a hash of unique values in the array. This can be more efficient than O(n^2) depending on the input.
5 9 2 8
list divisor pairs (at most sqrt(m) iterations per element)
5 (1,5)
9 (1,9), (3,3)
2 (1,2)
8 (1,8), (2,4) BINGO!
If you prime factorise all the numbers in the array progressively into a tree, when we discover a completely factored number leaf while factoring another number, we know we've found the divisor.
However, given we don't know which number is the divisor, we do need to test all primes up to the divisor's largest prime factor. The largest prime factor we need to trial-divide by for any number m is at most sqrt(m), while the number of primes below m is roughly m / ln(m). This means we will make at most n * (sqrt(m) / ln(sqrt(m))) operations with very basic factorization and no optimization.
To be a little more specific, the algorithm should keep track of four things: a common tree of explored prime factors, the original number from the array, its current partial factorization, and its position in the tree.
For each prime number, we should test all numbers in the array (repeatedly to account for repeated factors). If the number divides evenly, we a) update the partial factorization, b) add/navigate to the corresponding child to the tree, c) if the partial factorization is 1, we have found the last factor and can indicate a leaf by adding the terminating '1' child, and d) if not, we can check for other numbers having left a child '1' to indicate they are completely factored.
When we find a child '1', we can identify the other number by multiplying out the partial factorization (e.g. all the parents up the tree) and exit.
For further optimization, we can cache the factorization (both partial and full) of numbers. We can also stop checking further factors of numbers that have a unique factor, narrowing the field of candidates over time.
I'm trying to divide a linked-list into 2 sublists with equal sum. These sublists do not need to consist of consecutive elements.
I have a linked list as
Eg.1
LinkedList={1,7,5,5,4}
should be divided into
LinkedList1={1,5,5}
LinkedList2={7,4}
Both have the same sum of elements as 11.
Eg.2
LinkedList={42,2,3,2,2,2,5,20,2,20}
This should be divided into two list of equal sum i.e 50.
LinkedList1={42,3,5}
LinkedList2={2,2,2,2,20,2,20}
Can someone provide some pseudocode to solve this problem?
This is what I've thought so far:
Sum the elements of the linked list and divide by 2.
Now, while the sum of linkedlist1 is less than (total sum)/2, keep pushing elements into linkedlist1.
If adding the current element would push linkedlist1 past (total sum)/2, skip it, push it to linkedlist2 instead, and move on to the next element.
But this would only work if the elements are in a particular order.
This is known as the partition problem.
There are a few approaches to solving the problem, but I'll just mention the two most common ones below (see Wikipedia for more details on either approach, or for other approaches).
This can be solved with a dynamic programming approach, which basically comes down to, for each element and value, either including or excluding that element, and looking up whether there's a subset summing to the corresponding value. More specifically, we have the following recurrence relation:
p(i, j) is True if a subset of { x1, ..., xj } sums to i and False otherwise.
p(i, j) is True if either p(i, j − 1) is True or if p(i − xj, j − 1) is True
p(i, j) is False otherwise
Then p(N/2, n) tells us whether a subset exists.
The running time is O(Nn) where n is the number of elements in the input set and N is the sum of elements in the input set.
The "approximate" greedy approach (which doesn't necessarily find an equal-sum partition) is pretty straightforward: it just involves putting each element into the set with the smaller sum. Here's the pseudo-code:
INPUT: A list of integers S
OUTPUT: An attempt at a partition of S into two sets of equal sum
function find_partition( S ):
    A ← {}
    B ← {}
    sort S in descending order
    for i in S:
        if sum(A) <= sum(B)
            add element i to set A
        else
            add element i to set B
    return {A, B}
The running time is O(n log n).
I have faced the following problem recently:
We have a sequence A of M consecutive integers, beginning at A[1] = 1:
1,2,...M (example: M = 8 , A = 1,2,3,4,5,6,7,8 )
We have the set T consisting of all possible subsequences made from L_T consecutive terms of A.
(example L_T = 3 , subsequences are {1,2,3},{2,3,4},{3,4,5},...). Let's call the elements of T "tiles".
We have the set S consisting of all possible subsequences of A that have length L_S. ( example L_S = 4, subsequences like {1,2,3,4} , {1,3,7,8} ,...{4,5,7,8} ).
We say that an element s of S can be "covered" by K "tiles" of T if there exist K tiles in T such that the union of their sets of terms contains the terms of s as a subset. For example, subsequence {1,2,3} can be covered with 2 tiles of length 2 ({1,2} and {3,4}), while subsequence {1,3,5} cannot be "covered" with 2 "tiles" of length 2, but can be covered with 2 "tiles" of length 3 ({1,2,3} and {4,5,6}).
Let C be the subset of elements of S that can be covered by K tiles of T.
Find the cardinality of C given M, L_T, L_S, K.
Any ideas would be appreciated how to tackle this problem.
Assume M is divisible by T = L_T, so that we have an integer number of non-overlapping tiles covering all elements of the initial set (otherwise the statement is currently unclear).
First, let us count F (P): it will be almost the number of subsequences of length L_S which can be covered by no more than P tiles, but not exactly that.
Formally, F (P) = choose (M/T, P) * choose (P*T, L_S).
We start by choosing exactly P covering tiles: the number of ways is choose (M/T, P).
When the tiles are fixed, we have exactly P * T distinct elements available, and there are choose (P*T, L_S) ways to choose a subsequence.
Well, this approach has a flaw.
Note that, when we chose a tile but did not use its elements at all, we in fact counted some subsequences more than once.
For example, if we fixed three tiles numbered 2, 6 and 7, but used only 2 and 7, we counted the same subsequences again and again when we fixed three tiles numbered 2, 7 and whatever.
The problem described above can be countered by a variation of the inclusion-exclusion principle.
Indeed, a subsequence which uses only Q tiles out of the P selected tiles is counted choose (M/T - Q, P - Q) times instead of only once: Q of the P choices are fixed, but the other P - Q tiles are chosen arbitrarily from the remaining M/T - Q.
Define G (P) as the number of subsequences of length L_S which can be covered by exactly P tiles.
Then, F (P) is the sum for Q from 0 to P of the products G (Q) * choose (M/T - Q, P - Q).
Working from P = 0 upwards, we can calculate all the values of G by calculating the values of F.
For example, we get G (2) from knowing F (2), G (0) and G (1), and also the equation connecting F (2) with G (0), G (1) and G (2).
After that, the answer is simply sum for P from 0 to K of the values G (P).
I want to calculate how many pairs of disjoint subsets S1 and S2 (S1 U S2 may not be S) of a set S exists for which sum of elements in S1 = sum of elements in S2.
Say I have calculated all the subset sums for all the possible 2^n subsets.
How do I find how many disjoint subsets have equal sums?
For a sum value A, can we use the count of subsets having sum A/2 to solve this ?
As an example :
S ={1,2,3,4}
Various S1 and S2 sets possible are:
S1 = {1,2} and S2 = {3}
S1 = {1,3} and S2 = {4}
S1 = {1,4} and S2 = {2,3}
Here is the link to the problem :
http://www.usaco.org/index.php?page=viewproblem2&cpid=139
[EDIT: Fixed stupid complexity mistakes. Thanks kash!]
Actually I believe you'll need to use the O(3^n) algorithm described here to answer this question -- the O(2^n) partitioning algorithm is only good enough to enumerate all pairs of disjoint subsets whose union is the entire ground set.
As described at the answer I linked to, for each element you are essentially deciding whether to:
Put it in the first set,
Put it in the second set, or
Ignore it.
Considering every possible way to do this generates a tree where each vertex has 3 children: hence O(3^n) time. One thing to note is that if you generate a solution (S1, S2) then you should not also count the solution (S2, S1): this can be achieved by always maintaining an asymmetry between the two sets as you build them up, e.g. enforcing that the smallest element in S1 must always be smaller than the smallest element in S2. (This asymmetry enforcement has the nice side-effect of halving the execution time :))
A speedup for a special (but perhaps common in practice) case
If you expect that there will be many small numbers in the set, there is another possible speedup available to you: First, sort all the numbers in the list in increasing order. Choose some maximum value m, the larger the better, but small enough that you can afford an m-size array of integers. We will now break the list of numbers into 2 parts that we will process separately: an initial list of numbers that sum to at most m (this list may be quite small), and the rest. Suppose the first k <= n numbers fit into the first list, and call this first list Sk. The rest of the original list we will call S'.
First, initialise a size-m array d[] of integers to all 0, and solve the problem for Sk as usual -- but instead of only recording the number of disjoint subsets having equal sums, increment d[abs(|Sk1| - |Sk2|)] for every pair of disjoint subsets Sk1 and Sk2 formed from these first k numbers, writing |X| for the sum of the elements of X. (Also increment d[0] to count the case when Sk1 = Sk2 = {}.) The idea is that after this first phase has finished, d[i] will record the number of ways that 2 disjoint subsets having a difference of i can be generated from the first k elements of S.
Second, process the remainder (S') as usual -- but instead of only recording the number of disjoint subsets having equal sums, whenever |S1'| - |S2'| <= m, add d[abs(|S1'| - |S2'|)] to the total number of solutions. This is because we know that there are that many ways of building a pair of disjoint subsets from the first k elements having this difference -- and for each of these subset pairs (Sk1, Sk2), we can add the smaller of Sk1 or Sk2 to the larger of S1' or S2', and the other one to the other one, to wind up with a pair of disjoint subsets having equal sum.
Here is a clojure solution.
It defines s to be a set of 1, 2, 3, 4
Then all-subsets is defined to be a list of all subsets of sizes 1 to 3
Once all the subsets are defined, it looks at all pairs of subsets and selects only the pairs that are not equal, do not union to the original set, and whose sum is equal
(require 'clojure.set)
(use 'clojure.math.combinatorics)
(def s #{1, 2, 3, 4})
(def all-subsets (mapcat #(combinations s %) (take 3 (iterate inc 1))))
(for [x all-subsets y all-subsets
:when (and (= (reduce + x) (reduce + y))
(not= s (clojure.set/union (set x) (set y)))
(not= x y))]
[x y])
Produces the following:
([(3) (1 2)] [(4) (1 3)] [(1 2) (3)] [(1 3) (4)])
I know of the algorithm for the general case (e.g. generating all combination of
n elements taken m at a time) but I was wondering if there was a faster one specifically
designed for the case m=n-1. Also, if such an algorithm exists, could anyone point to
a C/C++ implementation?
It is pretty easy: iterate over all the elements using a simple cycle, and in each step construct a new set consisting of all elements but one (the one pointed to by the index in the cycle).
NOTE: a few notes so that you can achieve O(N) work per step (I will use C++ for the example, but you may use any other language with a vector-like container).
In C++: assuming you have a vector<int> a that holds all the numbers:
vector<int> a;
... initialize a ....
vector<int> b(a.begin()+1, a.end()); // Now b will have all elements of a but the first one.
for (int i=0;i<a.size() - 1;++i) {
b.push_back(a[i]);
swap(b[i], b[b.size()-1]);
b.pop_back();
}
Using the code above, b will sequentially hold every combination: before the loop it omits a[0], and after iteration i it omits a[i+1].
If a set has
2 elements {1,2}, its power set has 2^2 elements: {}, {1}, {2}, {1,2}
3 elements {1,2,3}, its power set has 2^3 elements: {}, {1}, {2}, {3}, {1,2}, {1,3}, {2,3}, {1,2,3}
4 elements {1,2,3,4}, its power set has 2^4 elements: {}, {1}, {2}, {3}, {4}, {1,2}, {1,3}, {1,4}, {2,3}, {2,4}, {3,4}, {1,2,3}, {1,2,4}, {1,3,4}, {2,3,4}, {1,2,3,4}
So I think the set above could be generated using 2 loops.
Also, on combinations:
Let U be a set with n elements; we want to count the number of distinct subsets of U that have exactly j elements. This can be written as N! / (J! * (N-J)!).
Note: the empty subset can be removed, and the formula becomes 2^n - 1.
Here is my answer, if it helps:
For N=2: {1}, {2}
For N=3: add 3 at the end of the N=2 elements ({1,3}, {2,3}) and mix in the N=2 elements ({1,2})
For N=4: add 4 at the end of the N=3 elements ({1,3,4}, {2,3,4}, {1,2,4}) and mix in the N=3 elements ({1,2,3})
Hope this helps!!!