Is this a permutation or a combination? - set

I am struggling a bit to understand set theory.
I know that there are combinations and permutations for choosing sets from a given input.
For example, I have 3 buttons A, B, and C to retrieve data from baskets A, B, and C respectively.
If I want to retrieve data from A and B, then I must select buttons A and B.
To retrieve data from A and C, I must select A and C.
To retrieve data from A, B, and C, I must select all three buttons.
Since it is a button to retrieve data, the sequence would not matter such that AB = BA.
Then, I will have
A
B
C
AB
AC
BC
ABC
I thought this was a permutation. But then I am getting a total of 7 instead of 6.
I really feel like I am not getting what combinations and permutations are.
Can someone explain what this is?

A permutation of a set is an ordered arrangement of its elements. All the permutations of the set {A,B,C} are
(A,B,C)
(A,C,B)
(C,A,B)
(C,B,A)
(B,C,A)
(B,A,C)
For a set of N elements there are N! permutations.
In contrast, a combination is a selection of k elements from a set of N elements. Order does not play a role here. k can vary between 0 and N. For a fixed k there are N!/(k!(N-k)!) combinations. Summing all of them up gives a total of 2^N combinations. The eight possible combinations of {A,B,C} are
{}
{A}
{B}
{C}
{A,B}
{A,C}
{B,C}
{A,B,C}
Your list of 7 is exactly the number of non-empty combinations, 2^3 - 1 = 7, so what you were counting are combinations, not permutations.
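The distinction is easy to check with Python's itertools; a quick sketch using the three buttons from the question:

```python
from itertools import combinations, permutations

buttons = ["A", "B", "C"]

# Permutations: orderings of all three buttons -> 3! = 6.
orderings = list(permutations(buttons))
print(len(orderings))  # 6

# Combinations: selections of every size k, order ignored -> 2^3 = 8.
selections = [c for k in range(len(buttons) + 1)
              for c in combinations(buttons, k)]
print(len(selections))  # 8

# Dropping the empty selection leaves the 7 button presses from the question.
print(len(selections) - 1)  # 7
```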

Related

Bit masks for subsets from two different sets

I have two sets -
set1 - {i1,i2,i3...iN1}
set2 - {k1,k2,k3...kN2}
For any single set of n items I can represent all possible subsets using bit masks 0 to 2^n - 1.
Similarly, how can I represent
all possible subsets of set1 and set2 where at least one item comes from each set?
For example,
{i1,i2,k1} is valid,
but {i1,i2} is invalid, as it has no item from set2.
I am trying to come up with two things:
A formula that gives me the count of all such subsets, like we have 2^n subsets for a single set of n items.
A bit encoding/mask with which I can represent the above type of subsets.
This will be easier if we introduce a few extra sets of interest. Let's call the two input sets S1 and S2; we'll define sets L, C, and R (for left, center, and right). Think of these as being the Venn diagram. So, define L = S1 \ S2, the elements in S1 that aren't in S2 at all; C = S1 ∩ S2, the elements that are in both S1 and S2, and R = S2 \ S1. Let's also write l, c, and r for the sizes of these sets, respectively.
Now we're ready to answer question 1: how many subsets of S1 ∪ S2 have an element from S1 and have an element from S2? There are two cases to consider: either we have a subset with an element in C (which satisfies both the "an element from S1" and the "an element from S2" clauses), or we have no elements in C and at least one each from L and R. In the first case, there are (2^c - 1) non-empty subsets of C, and 2^(l+r) subsets of the remainder, so there are (2^c - 1)*2^(l+r) sets in that case. In the second case, there are 2^l - 1 non-empty subsets of L and 2^r - 1 non-empty subsets of R, so there are (2^l - 1) * (2^r - 1) subsets in that case. Adding up the two cases, we have a total of 2^(c+l+r) - 2^l - 2^r + 1 subsets satisfying the condition.
If you have a fancy representation of non-empty subsets, this also immediately suggests a data structure for storing these: a single bit tag for which case you're in, plus the appropriate representations of subsets and non-empty subsets in each case.
But I would probably just use a single bitmask of size c+l+r, even though there are a few "invalid" bitmasks: it's very compact, it's easy to check validity, and there are many cheap operations on bitmasks.
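The case analysis is easy to sanity-check by brute force over small sets (the example sets below are made up):

```python
def count_valid_subsets(s1, s2):
    """Count subsets of s1 | s2 that contain at least one element
    of s1 and at least one element of s2, by brute force."""
    universe = sorted(s1 | s2)
    count = 0
    for mask in range(1 << len(universe)):
        subset = {e for i, e in enumerate(universe) if mask >> i & 1}
        if subset & s1 and subset & s2:
            count += 1
    return count

s1 = {"i1", "i2", "i3"}
s2 = {"i2", "k1", "k2"}
l, c, r = len(s1 - s2), len(s1 & s2), len(s2 - s1)

# Sum of the two cases: (2^c - 1)*2^(l+r) + (2^l - 1)*(2^r - 1).
formula = (2**c - 1) * 2**(l + r) + (2**l - 1) * (2**r - 1)
print(count_valid_subsets(s1, s2), formula)  # 25 25
```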

Minimum edit distance of two anagrams given two swap operations

Given two anagrams S and P, what is the minimum edit distance from S to P when there are only two operations:
swap two adjacent elements
swap the first and the last element
If this question is simplified to only having the first operation (i.e. swap two adjacent elements) then this question is "similar to" the classical algorithm question of "the minimum number of swaps for sorting an array of numbers" (solution link is given below)
Sorting a sequence by swapping adjacent elements using minimum swaps
I mean "similar to" because when the two anagrams have all distinct characters:
S: A B C D
P: B C A D
Then we can define the ordering in P like this
P: B C A D
1 2 3 4
Then based on this ordering the string S becomes
S: A B C D
3 1 2 4
Then we can use the solution given in the link to solve this question.
However, I have two questions:
In the simplified question where we can only swap two adjacent elements, how can we get the minimum number of swaps if the anagrams contain duplicate elements? For example,
S: C D B C D A A
P: A A C D B C D
How to solve the complete question with two swap operations?
One approach is to use http://en.wikipedia.org/wiki/A*_search_algorithm for the search. Your cost function is half of the sum of the shortest distances from each element to the nearest element that could possibly go there. The reason for half is that the absolutely ideal set of swaps will at all points move both elements closer to where they want to go.
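A minimal sketch of that A* search (the function name is mine; the heuristic follows the description above using adjacent-swap distance, and the first/last operation can make it overestimate slightly, so treat this as a sketch rather than a guaranteed optimum):

```python
import heapq

def min_swaps(s, p):
    """A* from s to p with two operations: swap adjacent elements,
    or swap the first and last element."""
    s, p = tuple(s), tuple(p)

    def heuristic(state):
        # Half the sum of each element's distance to the nearest
        # matching character in the target.
        return sum(min(abs(i - j) for j, t in enumerate(p) if t == ch)
                   for i, ch in enumerate(state)) / 2

    def neighbors(state):
        n = len(state)
        for i in range(n - 1):              # adjacent swaps
            nxt = list(state)
            nxt[i], nxt[i + 1] = nxt[i + 1], nxt[i]
            yield tuple(nxt)
        if n > 1:                           # first/last swap
            nxt = list(state)
            nxt[0], nxt[-1] = nxt[-1], nxt[0]
            yield tuple(nxt)

    best = {s: 0}
    heap = [(heuristic(s), 0, s)]
    while heap:
        f, g, state = heapq.heappop(heap)
        if state == p:
            return g
        if g > best.get(state, float("inf")):
            continue  # stale heap entry
        for nxt in neighbors(state):
            if g + 1 < best.get(nxt, float("inf")):
                best[nxt] = g + 1
                heapq.heappush(heap, (g + 1 + heuristic(nxt), g + 1, nxt))
    return None  # unreachable if s and p are true anagrams
```

For instance, min_swaps("BAC", "ABC") is 1 (one adjacent swap), and min_swaps("CBA", "ABC") is also 1 thanks to the first/last swap.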

sets/number theory: number of occurrences of a particular element in the k-subsets of an n-set

Let there be a set S with n distinct elements, namely a1, a2, a3, a4, and so on.
Consider the subsets of S which have exactly k elements.
The number of such subsets = (n choose k), or nCk.
Out of all such subsets with exactly k elements, how many of them will contain a1?
I know you asked this a long time ago, so I'm sorry if this is completely irrelevant now, but maybe it can help someone else.
In theory, consider that if we let a set P be S, but without the element a1, then P has n-1 elements. In this new set P, if we choose subsets with k-1 elements, then there will be a corresponding subset of S containing k elements, one of which is a1. So we're just using a set with one less element (n-1 elements) and choose k-1 elements for the subsets. Our formula is then: (n-1)C(k-1).
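For small values the identity is easy to verify by brute force (the n and k below are arbitrary):

```python
from itertools import combinations
from math import comb

n, k = 5, 3
elements = list(range(1, n + 1))  # stand-ins for a1 .. a5; a1 is 1

# Count the k-subsets containing a1 directly ...
containing_a1 = sum(1 for s in combinations(elements, k) if 1 in s)

# ... and compare with the (n-1)C(k-1) formula.
print(containing_a1, comb(n - 1, k - 1))  # 6 6
```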

Find cardinality of set

I have faced the following problem recently:
We have a sequence A of M consecutive integers, beginning at A[1] = 1:
1,2,...M (example: M = 8 , A = 1,2,3,4,5,6,7,8 )
We have the set T consisting of all possible subsequences made from L_T consecutive terms of A.
(example L_T = 3 , subsequences are {1,2,3},{2,3,4},{3,4,5},...). Let's call the elements of T "tiles".
We have the set S consisting of all possible subsequences of A that have length L_S. ( example L_S = 4, subsequences like {1,2,3,4} , {1,3,7,8} ,...{4,5,7,8} ).
We say that an element s of S can be "covered" by K "tiles" of T if there exist K tiles in T such that the union of their sets of terms contains the terms of s as a subset. For example, subsequence {1,2,3} is possible to cover with 2 tiles of length 2 ({1,2} and {3,4}), while subsequence {1,3,5} is not possible to "cover" with 2 "tiles" of length 2, but is possible to cover with 2 "tiles" of length 3 ({1,2,3} and {4,5,6}).
Let C be the subset of elements of S that can be covered by K tiles of T.
Find the cardinality of C given M, L_T, L_S, K.
Any ideas would be appreciated how to tackle this problem.
Assume M is divisible by T (writing T for L_T), so that we have an integer number of disjoint tiles covering all elements of the initial set (otherwise the statement is currently unclear).
First, let us count F (P): it will be almost the number of subsequences of length L_S which can be covered by no more than P tiles, but not exactly that.
Formally, F (P) = choose (M/T, P) * choose (P*T, L_S).
We start by choosing exactly P covering tiles: the number of ways is choose (M/T, P).
When the tiles are fixed, we have exactly P * T distinct elements available, and there are choose (P*T, L_S) ways to choose a subsequence.
Well, this approach has a flaw.
Note that, when we chose a tile but did not use its elements at all, we in fact counted some subsequences more than once.
For example, if we fixed three tiles numbered 2, 6 and 7, but used only 2 and 7, we counted the same subsequences again and again when we fixed three tiles numbered 2, 7 and whatever.
The problem described above can be countered by a variation of the inclusion-exclusion principle.
Indeed, for a subsequence which uses only Q tiles out of P selected tiles, it is counted choose (M/T - Q, P - Q) times instead of only once: Q of the P choices are fixed, but the other ones are arbitrary among the remaining M/T - Q tiles.
Define G (P) as the number of subsequences of length L_S which can be covered by exactly P tiles.
Then, F (P) is the sum for Q from 0 to P of the products G (Q) * choose (M/T - Q, P - Q).
Working from P = 0 upwards, we can calculate all the values of G by calculating the values of F.
For example, we get G (2) from knowing F (2), G (0) and G (1), and also the equation connecting F (2) with G (0), G (1) and G (2).
After that, the answer is simply sum for P from 0 to K of the values G (P).
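The F/G bookkeeping can be sketched in a few lines of Python (this assumes the tiles are the M//T disjoint aligned blocks, K ≤ M//T, and uses choose(M/T - Q, P - Q) as the overcount coefficient):

```python
from math import comb

def covered_count(M, T, L_S, K):
    """Sum of G(P) for P = 0..K, computed from F(P) by the
    inclusion-exclusion recurrence described above.
    Assumes M % T == 0 and K <= M // T."""
    num_tiles = M // T

    def F(P):
        # Choose P tiles, then any L_S elements among their P*T terms.
        return comb(num_tiles, P) * comb(P * T, L_S)

    G = []
    for P in range(K + 1):
        overcount = sum(G[Q] * comb(num_tiles - Q, P - Q)
                        for Q in range(P))
        G.append(F(P) - overcount)
    return sum(G)

# M = 4, T = 2: tiles {1,2} and {3,4}.  Of the 6 subsequences of
# length 2, only {1,2} and {3,4} are coverable by a single tile,
# while all 6 are coverable by two tiles.
print(covered_count(4, 2, 2, 1))  # 2
print(covered_count(4, 2, 2, 2))  # 6
```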

Finding least number of bit sequence ORs to achieve all 1's?

I'm trying to find anything that may help with this task: I have a variable number of bit sequences (that will all individually be the same length) and I need to find which combination of sequences would OR to all 1's, using as few sequences as possible. I was thinking to start with whichever sequence had the most 1's and try filling in the blanks, but since I haven't worked with bit comparisons really I didn't know if there was some algorithm or property of bit logic that would simplify this. Thanks.
This problem, unfortunately, is NP-hard in the most general case by a reduction from the set cover problem. In the set cover problem, you have a collection of sets of elements, and want to find the smallest number of them whose union contains all of the elements. You can easily reduce the set cover problem to your problem by constructing a bitvector for each set that has a 1 in each position if a given set has that item and a 0 otherwise. The smallest number of bitvectors whose OR gives all 1s is then equivalent to the smallest group of sets whose union contains all elements.
For example, given the sets {a, b, e}, {b, c}, {b, d, f}, and {a, f}, you would get these bitvectors:
{a, b, e} 110010
{b, c} 011000
{b, d, f} 010101
{a, f} 100001
Since the set cover problem is known to be NP-hard, this means that unless P = NP there is no polynomial-time algorithm for your problem. Worse, it is known that you cannot approximate the optimal solution within a factor of O(log n), where n is the number of total elements, in polynomial time. You are probably best off looking for heuristics, or staying content with an O(log n) approximation using the greedy algorithm.
Hope this helps!
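For small inputs the reduction can be explored directly; a brute-force sketch (exponential, so illustration only) over the example bitvectors:

```python
from itertools import combinations

def min_or_cover(vectors, width):
    """Smallest combination of vectors whose OR is all 1s,
    found by trying every combination size in increasing order."""
    target = (1 << width) - 1
    for size in range(1, len(vectors) + 1):
        for combo in combinations(vectors, size):
            acc = 0
            for v in combo:
                acc |= v
            if acc == target:
                return combo
    return None  # no combination ORs to all 1s

# The bitvectors for {a,b,e}, {b,c}, {b,d,f}, {a,f} shown above.
vectors = [0b110010, 0b011000, 0b010101, 0b100001]
cover = min_or_cover(vectors, 6)
print(len(cover))  # 3: no pair of these vectors ORs to 111111
```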
I thought a bit about this problem and here's the idea I came up with:
First, you create a list for every bit, and in every list you put every sequence that has a '1' in that bit position. This takes O(n*m), with n the number of sequences and m the length of a sequence.
Then you count all occurrences of every bit sequence across the lists, throw all these tuples of [sequence, count] into a structure (an AVL tree or heap or whatever you like), and sort them. (I mean: sequence 'a' occurs 15 times over all lists and sequence 'b' 10 times.) This again takes O(n*m), because O(n log n) < O(n*m).
In the next step you take the sequence with the highest count and remove all lists from step one which contain this sequence. Then you go back to step 2 until you have eliminated all lists. In the worst case you'll have to do this m times.
So in total we have a time of O(n * m^2).
Correct me if I misunderstood part of the question or made a mistake ;)
Here is a little example of what I mean:
Bit Strings:
a: 100101
b: 010001
c: 011100
d: 000010
So this will create the Lists:
L1: a
L2: b,c
L3: c
L4: a, c
L5: d
L6: a, b
Then we will count and sort:
a: 3
c: 3
b: 2
d: 1
So we take a into our final list and delete the following lists:
L1, L4, L6
Now we count again:
c: 2
b: 1
d: 1
So we take c into our list and delete:
L2, L3
Now only L5 is left, which contains only d.
So we have found our final minimal set: a, c, d
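The counting-and-elimination steps above amount to the standard greedy heuristic; a sketch using sets of bit positions (the names and representation are mine):

```python
def greedy_cover(seqs, width):
    """Repeatedly pick the sequence covering the most still-uncovered
    bit positions; mirrors the list-counting scheme above."""
    uncovered = set(range(width))
    remaining = dict(seqs)  # name -> set of positions holding a '1'
    chosen = []
    while uncovered and remaining:
        name = max(remaining, key=lambda n: len(remaining[n] & uncovered))
        if not remaining[name] & uncovered:
            return None  # nothing left covers the remaining positions
        chosen.append(name)
        uncovered -= remaining.pop(name)
    return chosen if not uncovered else None

# The example above: a=100101, b=010001, c=011100, d=000010,
# written as sets of 0-indexed positions of the 1 bits.
seqs = {"a": {0, 3, 5}, "b": {1, 5}, "c": {1, 2, 3}, "d": {4}}
print(greedy_cover(seqs, 6))  # ['a', 'c', 'd']
```

Note that, as the first answer points out, this greedy choice is only an approximation in general, even though it happens to find the minimal set here.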
