Find all sets that don't contain any of the given subsets - algorithm

I'm looking for the fastest algorithm to find a set that is a subset of an original set S and doesn't contain any of the given subsets (S1, ..., Sn) of S. The set I want to find can contain some elements of an Si, but not all of them.
For example, original set: S = (1, 2, 3, 4, 5), S1 = (1, 2), S2 = (1, 3)
=> longest set: (2, 3, 4, 5); other sets: (1, 4, 5), (2, 4, 5), (3, 4, 5), (1, 4),...
Can anybody give me a suggestion? Thanks!

Bad news
Consider the problem of choosing which elements to NOT include.
If we choose to NOT include element 1, we satisfy the constraints for S1 and S2.
If we choose to NOT include element 2, we satisfy the constraints for S1.
If we choose to NOT include element 3, we satisfy the constraint for S2.
So 1 gives {S1,S2}, 2 gives {S1}, 3 gives {S2}.
Your problem can be expressed as finding the minimum number of elements to NOT include such that the union of the satisfied sets (e.g. {S1,S2}) covers all of the given sets.
This is exactly the set cover problem, which is NP-complete.
Good news
In practice, you will probably do quite well by simply choosing the elements to NOT include based on whichever ends up covering the most sets.
This is an easy to implement greedy algorithm (although it will not always give the optimal answer).
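A minimal Python sketch of that greedy idea, assuming every forbidden subset is non-empty and its elements are sortable. The function name `greedy_exclusions` and the tie-breaking rule (smallest element first) are illustrative choices, not part of the original answer:

```python
def greedy_exclusions(forbidden):
    """Greedily choose elements to exclude so every forbidden subset loses at least one element.

    Assumes each forbidden subset is non-empty. The result is not guaranteed to be minimal.
    """
    remaining = [set(s) for s in forbidden]
    excluded = set()
    while remaining:
        # Count, for each element, how many still-intact forbidden subsets it appears in.
        counts = {}
        for s in remaining:
            for x in s:
                counts[x] = counts.get(x, 0) + 1
        # Exclude the element that breaks the most subsets (ties: smallest element).
        best = max(sorted(counts), key=counts.get)
        excluded.add(best)
        remaining = [s for s in remaining if best not in s]
    return excluded


S = {1, 2, 3, 4, 5}
excluded = greedy_exclusions([(1, 2), (1, 3)])
result = S - excluded
print(excluded, result)
```

On the example from the question, excluding element 1 breaks both S1 and S2, which leaves (2, 3, 4, 5).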

Related

What is the key difference between Combination Sum IV and No. of ways to make coin change problem?

Combination Sum
Given an array of distinct integers nums and a target integer target, return the number of possible combinations that add up to the target.
Input: nums = [1,2,3], target = 4
Output: 7
Explanation:
The possible combination ways are:
(1, 1, 1, 1)
(1, 1, 2)
(1, 2, 1)
(1, 3)
(2, 1, 1)
(2, 2)
(3, 1)
Note that different sequences are counted as different combinations.
Coin Change
Given an infinite supply of coins of each of the denominations D = {D0, D1, D2, D3, ..., Dn-1}, figure out the total number of ways W in which you can make change for value V using coins of denominations D.
For the same input as in the question above:
The number of ways is 4 in total, i.e. (1, 1, 1, 1), (1, 1, 2), (1, 3) and (2, 2).
I know how to solve Coin Change using the concept of UNBOUNDED KNAPSACK. But how is Combination Sum IV different here? They seem so similar.
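The two counts above (7 and 4) can be reproduced with the same one-dimensional table by swapping the loop order. This is a sketch assuming positive integer denominations; the function names `count_ordered` and `count_unordered` are made up for illustration:

```python
def count_ordered(nums, target):
    """Combination Sum IV: different sequences count as different ways."""
    ways = [1] + [0] * target
    # Outer loop over totals: any number may be the LAST element of a sequence.
    for total in range(1, target + 1):
        for n in nums:
            if n <= total:
                ways[total] += ways[total - n]
    return ways[target]


def count_unordered(coins, value):
    """Coin Change: only the multiset of coins matters, not the order."""
    ways = [1] + [0] * value
    # Outer loop over coins: each coin type is considered once, in a fixed order,
    # so (1, 2, 1) and (2, 1, 1) are never counted separately from (1, 1, 2).
    for c in coins:
        for total in range(c, value + 1):
            ways[total] += ways[total - c]
    return ways[value]


print(count_ordered([1, 2, 3], 4))    # 7
print(count_unordered([1, 2, 3], 4))  # 4
```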

Most common subpath in a collection of paths

There is plenty of literature on the web about the longest common subsequence problem, but I have a slightly different problem and was wondering if anyone knows of a fast algorithm.
Say, you have a collection of paths:
[1,2,3,4,5,6,7], [2,3,4,9,10], [3,4,6,7], ...
We see that subpath [3,4] is the most common.
Know of a neat algorithm to find this? For my case there are tens of thousands of paths!
Assuming that a "path" has to encompass at least two elements, then the most common path will obviously have two elements (although there could also be a path with more than two elements that's equally common -- more on this later). So you can just iterate all the lists and count how often each pair of consecutive numbers appears in the different lists and remember those pairs that appear most often. This requires iterating each list once, which is the minimum amount you'd have to do in any case.
If you are interested in the longest most common path, then you can start the same way, finding the most common 2-segment-paths, but additionally to the counts, also record the position of each of those segments (e.g. {(3,4): [2, 1, 0], ...} in your example, the numbers in the list indicating the position of the segment in the different paths). Now, you can take all the most-common length-2-paths and see if for any of those, the next element is also the same for all the occurrences of that path. In this case you have a most-common length-3-path that is equally common as the prior length-2 path (it can not be more common, obviously). You can repeat this for length-4, length-5, etc. until it can no longer be expanded without making the path "less common". This part requires extra work of n*k for each expansion, with n being the number of candidates left and k how often those appear.
(This assumes that frequency beats length, i.e. if there is a length-2 path appearing three times, you prefer this over a length-3 path appearing twice. The same approach can also be used for a different starting length, e.g. requiring at least length-3 paths, without changing the basic algorithm or the complexity.)
Here's a simple example implementation in Python to demonstrate the algorithm. This only goes up to length-3, but could easily be extended to length-4 and beyond with a loop. Also, it does not check any edge-cases (array-out-of-bounds etc.)
# example data
data = [[1, 2, 4, 5, 6, 7, 9],
        [1, 2, 3, 4, 5, 6, 8, 9],
        [1, 2, 4, 5, 6, 7, 8]]

# step one: count how often and where each pair appears
from collections import defaultdict
pairs = defaultdict(list)
for i, lst in enumerate(data):
    for k, pair in enumerate(zip(lst, lst[1:])):
        pairs[pair].append((i, k))  # (list index, position in that list)

# step two: find most common pair and filter
most = max(len(lst) for lst in pairs.values())
pairs = {k: v for k, v in pairs.items() if len(v) == most}
print(pairs)
# {(1, 2): [(0, 0), (1, 0), (2, 0)], (4, 5): [(0, 2), (1, 3), (2, 2)], (5, 6): [(0, 3), (1, 4), (2, 3)]}

# step three: expand pairs to triplets, triplets to quadruples, etc.
# keep a pair only if the element following it is the same at every occurrence
triples = [pair + (data[occ[0][0]][occ[0][1] + 2],)
           for pair, occ in pairs.items()
           if len(set(data[i][pos + 2] for (i, pos) in occ)) == 1]
print(triples)
# [(4, 5, 6)]
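The extension to length-4 and beyond mentioned above could look roughly like the following sketch. It repeats the pair counting, then keeps growing the candidates to the right for as long as every occurrence agrees on the next element, and it adds the bounds check the example above leaves out. The function name `longest_most_common_subpaths` is an assumption for illustration:

```python
from collections import defaultdict


def longest_most_common_subpaths(paths):
    """Return the longest subpaths that are as frequent as the most common pair."""
    # Record every (path index, start offset) at which each consecutive pair occurs.
    occurrences = defaultdict(list)
    for i, path in enumerate(paths):
        for start, pair in enumerate(zip(path, path[1:])):
            occurrences[pair].append((i, start))
    if not occurrences:
        return []

    most = max(len(occ) for occ in occurrences.values())
    candidates = {sub: occ for sub, occ in occurrences.items() if len(occ) == most}

    while True:
        extended = {}
        for sub, occ in candidates.items():
            end = len(sub)
            # Skip candidates that already reach the end of one of their paths.
            if any(start + end >= len(paths[i]) for i, start in occ):
                continue
            following = {paths[i][start + end] for i, start in occ}
            if len(following) == 1:  # every occurrence continues the same way
                extended[sub + (following.pop(),)] = occ
        if not extended:
            return sorted(candidates)
        candidates = extended


data = [[1, 2, 4, 5, 6, 7, 9],
        [1, 2, 3, 4, 5, 6, 8, 9],
        [1, 2, 4, 5, 6, 7, 8]]
print(longest_most_common_subpaths(data))  # [(4, 5, 6)]
```

Growing only to the right is enough here: if a longer subpath is as frequent as the most common pair, its first two elements already form one of the most common pairs.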

Python: Combinations

I am looking to solve a problem that involves different permutations of an array. I would like a function that checks if the array under scrutiny matches a condition, but if it does not, generates a new permutation to check, and so on. I believe this involves a while statement, so my question lies more in how to create such an algorithm to generate a unique (but not random, so as to avoid duplicates) permutation upon each iteration. There is a restriction: the array will contain at least 2 but no more than 10 elements. Additionally, if the condition matches no permutations, the return value should be False. I have no code thus far, as I cannot come up with the algorithm I would like to pursue yet. Any thoughts would be helpful.
Why do you need to reinvent the wheel? Since you've tagged python, you should know there are a ton of libraries that help you do useful things like this. One such library is itertools, more specifically the itertools.permutations function:
>>> from itertools import permutations
>>> x = [1, 2, 3]
>>> for p in permutations(x):
...     print(p)
...
(1, 2, 3)
(1, 3, 2)
(2, 1, 3)
(2, 3, 1)
(3, 1, 2)
(3, 2, 1)
If you must write an algorithm yourself, then you should learn about the Johnson-Trotter Algorithm for generating permutations. It is quite intuitive, and generates permutations in O(n!) time.
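For the check-and-retry loop described in the question, a small sketch built on `itertools.permutations` might look like this. The function name `first_matching_permutation` and the `condition` callback are hypothetical; the `False` return value follows the question's requirement:

```python
from itertools import permutations


def first_matching_permutation(arr, condition):
    """Return the first permutation of arr that satisfies condition, or False if none does."""
    # permutations() yields each ordering exactly once, so no ordering is tested twice
    # (equal elements at different positions are still treated as distinct).
    for p in permutations(arr):
        if condition(p):
            return list(p)
    return False


print(first_matching_permutation([3, 1, 2], lambda p: list(p) == sorted(p)))  # [1, 2, 3]
print(first_matching_permutation([3, 1, 2], lambda p: sum(p) > 100))          # False
```

With at most 10 elements there are at most 10! = 3,628,800 permutations, so an exhaustive scan is feasible as long as the condition is cheap to evaluate.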

Union of two sets given a certain ordering in O(n) time

[Note: I am hoping this problem can be solved in O(n) time. If not, I am looking for a proof that it cannot be solved in O(n) time. If I get the proof, I'll try to implement a new algorithm to arrive at the union of these sorted sets in a different way.]
Consider the sets:
(1, 4, 0, 6, 3)
(0, 5, 2, 6, 3)
The resultant should be:
(1, 4, 0, 5, 2, 6, 3)
Please note that the problem of union of sorted sets is easy. These are also sorted sets, but the ordering is defined by some other properties from which these indices have been resolved. The ordering (whatever it is) holds in both sets, i.e. for any i, j in set X, if i comes before j, then in any other set Y that contains the same i and j, i also comes before j.
EDIT: I am sorry I have missed something very important that I have covered in one of the comments below — intersection of two sets is not a null set, i.e. the two sets have common elements.
Insert each item in the first set into a hash table.
Go through each item in the second set, looking up that value.
If not found, insert that item into the resulting set.
If found, insert all items from the first set that come after the last item we inserted from it, up to and including this value.
At the end, insert all remaining items from the first set into the resulting set.
Running time
Expected O(n).
Side note
With the constraints given, the union is not necessarily unique.
For example, given (1) and (2), the resulting set can be either (1, 2) or (2, 1).
This answer will pick (2, 1).
Implementation note
Obviously looping through the first set to find the last inserted item is not going to result in an O(n) algorithm. Instead we must keep an iterator into the first set (not the hash table), and then we can simply continue from the last position that iterator had.
Here's some pseudo-code, assuming both sets are arrays (for simplicity):
for i = 0 to input1.length - 1
    hashTable.insert(input1[i])
i = 0 // this will be our 'iterator' into the first set
for j = 0 to input2.length - 1
    if hashTable.contains(input2[j])
        // copy from the first set up to and including the matching item
        do
            output.append(input1[i])
            i++
        while input1[i - 1] != input2[j]
    else
        output.append(input2[j])
// copy whatever is left of the first set
while i < input1.length
    output.append(input1[i])
    i++
The do-while-loop inside the for-loop may look suspicious, but note that each iteration of that loop increases i, so across the whole run it can execute at most input1.length times.
Example
Input:
(1, 4, 0, 6, 8, 3)
(0, 5, 2, 6, 3)
Hash table: (1, 4, 0, 6, 8, 3)
Then, go through the second set.
Look up 0, found, so insert 1, 4, 0 into the resulting set
(no item from first set inserted yet, so insert all items from the start until we get 0).
Look up 5, not found, so insert 5 into the resulting set.
Look up 2, not found, so insert 2 into the resulting set.
Look up 6, found, so insert 6 into the resulting set
(last item inserted from first set is 0, so only 6 needs to be inserted).
Look up 3, found, so insert 8, 3 into the resulting set
(last item inserted from first set is 6, so insert all items from after 6 until we get 3).
Output: (1, 4, 0, 5, 2, 6, 8, 3)
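The same procedure as a runnable Python sketch, assuming both inputs are lists without duplicates and that their shared elements appear in the same relative order (otherwise the inner loop would run past the end of the first list). The function name `ordered_union` is chosen for illustration:

```python
def ordered_union(first, second):
    """Merge two consistently ordered lists in expected O(n) time."""
    in_first = set(first)  # plays the role of the hash table
    output = []
    i = 0                  # 'iterator' into first; it only ever moves forward
    for item in second:
        if item in in_first:
            # Copy from first up to and including the matching item.
            while True:
                output.append(first[i])
                i += 1
                if first[i - 1] == item:
                    break
        else:
            output.append(item)
    # Copy whatever is left of first.
    output.extend(first[i:])
    return output


print(ordered_union([1, 4, 0, 6, 8, 3], [0, 5, 2, 6, 3]))
# [1, 4, 0, 5, 2, 6, 8, 3]
```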
We have two ordered sets of indices, A and B, which are ordered by some function f(). So we know that f(A[i]) < f(A[j]) iff i < j, and the same holds true for set B.
This gives us a mapping to "sorted" linear sets, which reduces the task to the "problem of union of sorted sets".
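This also doesn't have the best space complexity, but you can try:
a = [1, 2, 3, 4, 5]
b = [4, 2, 79, 8]
# a dict used as a set: its keys are the union of a and b
union = {}
for each in a:
    union[each] = 1
for each in b:
    union[each] = 1
# Python 3.7+ dicts iterate in insertion order
for each in union:
    print(each, end=' ')
Output:
1 2 3 4 5 79 8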

Selecting a surviving population in a "voter" Genetic Algorithm

I've been working on a genetic algorithm where there is a population consisting of individuals with a color, and a preference. Preference and color are from a small number of finite states, probably around 4 or 5. (example: 1|1, 5|2, 3|3 etc)
Every individual casts a "vote" for their preference, which assists those individuals with that vote as their color.
My current idea is to cycle through every individual, and calculate the chance that they should survive, based on number of votes, etc. and then roll a die to see if they live.
I'm currently doing it so that if v[x] represents the percent of votes for color x, individual k with color c has v[c] chance of surviving. However, this means that if there are equal numbers of all 5 types of (a|a) individuals, 4/5 of them perish, and that's not good.
Does anyone have any idea of a method of randomness I could use to determine the chance an individual has to survive? For instance, an algorithm that for v votes for c, v individuals with color c survive (on statistical average).
Assign your fitness (likelihood of survival in your case) to each individual as is, then sort them by descending fitness and use binary tournament selection or something similar to sample another population of your chosen size.
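A rough sketch of binary tournament selection for this setting. It assumes individuals are `(color, preference)` tuples and that fitness is the share of votes cast for an individual's color; both the representation and the helper names `vote_share_fitness` and `binary_tournament` are assumptions for illustration. Tournament selection does not need the sorted order, only pairwise comparisons:

```python
import random
from collections import Counter


def vote_share_fitness(population):
    """Build a fitness function: the fraction of all votes cast for an individual's color."""
    votes = Counter(preference for _color, preference in population)
    total = len(population)
    return lambda individual: votes[individual[0]] / total


def binary_tournament(population, fitness, size, rng=random):
    """Sample `size` survivors; each is the fitter of two distinct randomly drawn individuals."""
    survivors = []
    for _ in range(size):
        a, b = rng.sample(population, 2)
        survivors.append(a if fitness(a) >= fitness(b) else b)
    return survivors


population = [(1, 1), (5, 2), (3, 3), (2, 2)]
fitness = vote_share_fitness(population)
print(binary_tournament(population, fitness, size=4, rng=random.Random(0)))
```

Because the new population always has the chosen size, a perfectly balanced population no longer loses 4/5 of its members; individuals with equal fitness simply win their tournaments at equal rates.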
Well, you can weight the probabilities according to the value returned by passing each member of the population to the cost function. That seems to me the most straightforward way, consistent with the genetic meta-heuristic.
More common, though, is to divide the current population into segments based on the value returned from passing them to the cost function.
So, for instance, if each generation consists of 100 members, then the top N members (those with the lowest cost function result; N is just a user-defined parameter, often something like 5-10% of the total) are carried forward to the next generation just as they are (elitism). Perhaps this is what you mean by 'survive.' If so, then again, these 'survivors' are determined by ranking the members of the population according to the cost function value and selecting those members above your defined elitism fraction constant.
The rest (the majority) of the next generation are created either by mutation or cross-over.
mutation:
# one member of the current population:
[4, 5, 1, 7, 4, 2, 8, 9]
# small random change in one member of prior generation, to create mutant that is
# a member of the next generation
[4, 9, 1, 7, 4, 2, 8, 9]
crossover:
# two of the 'top' members of the current generation
[4, 5, 1, 7, 4, 2, 8, 9]
[2, 3, 6, 9, 2, 1, 6, 4]
# offspring is a member of the next generation
[4, 5, 1, 7, 2, 1, 6, 4]
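The two operators illustrated above could be sketched as follows. The gene range passed to `mutate` and the explicit crossover `point` are assumptions; the answer does not say how either is chosen:

```python
import random


def mutate(member, low, high, rng=random):
    """Copy member and overwrite one randomly chosen position with a random value in [low, high]."""
    child = list(member)
    child[rng.randrange(len(child))] = rng.randint(low, high)
    return child


def one_point_crossover(parent_a, parent_b, point):
    """Take the first `point` genes from parent_a and the remaining genes from parent_b."""
    return list(parent_a[:point]) + list(parent_b[point:])


a = [4, 5, 1, 7, 4, 2, 8, 9]
b = [2, 3, 6, 9, 2, 1, 6, 4]
print(one_point_crossover(a, b, 4))  # [4, 5, 1, 7, 2, 1, 6, 4]
print(mutate(a, 1, 9, rng=random.Random(0)))
```

With `point=4` the crossover reproduces the offspring shown above; the mutated child differs from its parent in at most one position.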
