Generating the powerset of a multiset - algorithm

Suppose I have a multiset
{a,a,a,b,c}
from which I can make the following selections:
{}
{a}
{a,a}
{a,a,a}
{a,a,a,b}
{a,a,a,b,c}
{a,a,a,c}
{a,a,b}
{a,a,b,c}
{a,a,c}
{a,b}
{a,b,c}
{a,c}
{b}
{b,c}
{c}
Notice that the number of selections equals 16. The cardinality of the powerset of a multiset, card(P(M)), is defined on OEIS as
card(P(M)) = prod(mult(x) + 1) over all distinct x in M
where mult(x) is the multiplicity of x in M and prod is the product of the terms. So for our example, this would amount to 4 x 2 x 2 = 16.
Let's say, for example, that the multiplicity of these elements is very high:
m(a) = 21
m(b) = 36
m(c) = 44
Then
card(P(M)) = 22 * 37 * 45 = 36630.
But if we were to treat all those elements as distinct - as a set - the cardinality of the powerset would be
card(P(S)) = 2^(21+36+44) = 2535301200456458802993406410752.
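Both figures can be checked with a few lines of Python. This is a minimal sketch: multiset_powerset_size is a hypothetical helper name, and math.prod requires Python 3.8 or later.

```python
import math
from collections import Counter


def multiset_powerset_size(items):
    """card(P(M)) = product of (multiplicity + 1) over the distinct elements of M."""
    return math.prod(m + 1 for m in Counter(items).values())


print(multiset_powerset_size("aaabc"))                         # 4 * 2 * 2 = 16
print(multiset_powerset_size("a" * 21 + "b" * 36 + "c" * 44))  # 22 * 37 * 45 = 36630
print(2 ** (21 + 36 + 44))  # powerset size if all 101 elements were treated as distinct
```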
The "standard" solution for this problem is to compute the powerset of the set in which all elements are treated as distinct, and then prune the results to remove the duplicates. That solution has O(2^n) complexity, where n is the total number of elements.
Does a general algorithm for generating a powerset of a multiset with complexity on the order of card(P(M)) exist?

powerset recipe with itertools
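What you are asking is usually called the powerset, and it is available as an itertools recipe as well as a function in the module more_itertools. Note that both treat every occurrence as a distinct element, so for this multiset they return all 2^5 = 32 combinations, duplicates included, as the outputs below show. See the documentation: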
itertools recipe;
more_itertools.powerset.
multiset = ['a', 'a', 'a', 'b', 'c']
#
# USING ITERTOOLS
#
import itertools
def powerset(iterable):
    "powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)"
    s = list(iterable)
    return itertools.chain.from_iterable(itertools.combinations(s, r) for r in range(len(s)+1))
print(list(powerset(multiset)))
# [(), ('a',), ('a',), ('a',), ('b',), ('c',), ('a', 'a'), ('a', 'a'), ('a', 'b'), ('a', 'c'), ('a', 'a'), ('a', 'b'), ('a', 'c'), ('a', 'b'), ('a', 'c'), ('b', 'c'), ('a', 'a', 'a'), ('a', 'a', 'b'), ('a', 'a', 'c'), ('a', 'a', 'b'), ('a', 'a', 'c'), ('a', 'b', 'c'), ('a', 'a', 'b'), ('a', 'a', 'c'), ('a', 'b', 'c'), ('a', 'b', 'c'), ('a', 'a', 'a', 'b'), ('a', 'a', 'a', 'c'), ('a', 'a', 'b', 'c'), ('a', 'a', 'b', 'c'), ('a', 'a', 'b', 'c'), ('a', 'a', 'a', 'b', 'c')]
#
# USING MORE_ITERTOOLS
#
import more_itertools
print(list(more_itertools.powerset(multiset)))
# [(), ('a',), ('a',), ('a',), ('b',), ('c',), ('a', 'a'), ('a', 'a'), ('a', 'b'), ('a', 'c'), ('a', 'a'), ('a', 'b'), ('a', 'c'), ('a', 'b'), ('a', 'c'), ('b', 'c'), ('a', 'a', 'a'), ('a', 'a', 'b'), ('a', 'a', 'c'), ('a', 'a', 'b'), ('a', 'a', 'c'), ('a', 'b', 'c'), ('a', 'a', 'b'), ('a', 'a', 'c'), ('a', 'b', 'c'), ('a', 'b', 'c'), ('a', 'a', 'a', 'b'), ('a', 'a', 'a', 'c'), ('a', 'a', 'b', 'c'), ('a', 'a', 'b', 'c'), ('a', 'a', 'b', 'c'), ('a', 'a', 'a', 'b', 'c')]
Powerset of a collections.Counter object
In Python, multisets are usually represented with a collections.Counter rather than with a list. The class collections.Counter is a subclass of dict; it implements dictionaries that map elements to counts, as well as several useful methods such as building a Counter by counting occurrences in a sequence.
Taking the powerset of a Counter is the topic of another question on stackoverflow:
How to generate all the subsets of a Counter?
Although I am not aware of an already-implemented method doing this in standard modules, the answer to that question presents one solution using itertools:
import collections
import itertools
multiset = collections.Counter(['a', 'a', 'a', 'b', 'c'])
# Counter({'a': 3, 'b': 1, 'c': 1})
def powerset(multiset):
    range_items = [[(x, z) for z in range(y + 1)] for x, y in multiset.items()]
    products = itertools.product(*range_items)
    return [{k: v for k, v in pairs if v > 0} for pairs in products]
print(powerset(multiset))
# [{}, {'c': 1}, {'b': 1}, {'b': 1, 'c': 1}, {'a': 1}, {'a': 1, 'c': 1}, {'a': 1, 'b': 1}, {'a': 1, 'b': 1, 'c': 1}, {'a': 2}, {'a': 2, 'c': 1}, {'a': 2, 'b': 1}, {'a': 2, 'b': 1, 'c': 1}, {'a': 3}, {'a': 3, 'c': 1}, {'a': 3, 'b': 1}, {'a': 3, 'b': 1, 'c': 1}]
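Building on the same idea, a lazy variant can yield each sub-multiset as a tuple instead of building a list of dicts. This is a sketch, not the linked answer's code, and multiset_subsets is a hypothetical name. Because it picks a count (0 up to the multiplicity) for each distinct element via itertools.product, it never generates a duplicate, so the loop runs exactly card(P(M)) times rather than 2^n times.

```python
import itertools
from collections import Counter


def multiset_subsets(items):
    """Yield every sub-multiset of `items` exactly once, as a tuple."""
    counts = Counter(items)
    elements = list(counts)
    # For each distinct element, choose how many copies to take: 0..multiplicity.
    for choice in itertools.product(*(range(counts[e] + 1) for e in elements)):
        yield tuple(itertools.chain.from_iterable(
            itertools.repeat(e, k) for e, k in zip(elements, choice)))


subsets = list(multiset_subsets(['a', 'a', 'a', 'b', 'c']))
print(len(subsets))  # 16
```

Counting the outputs for the larger example in the question (21 a's, 36 b's, 44 c's) gives 36630, matching the formula.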

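This gives you every distinct combination of lst as a tuple: it generates all combinations and removes duplicates with a set. It relies on equal elements being grouped together in lst (sorting it guarantees this), and it still enumerates all 2^n combinations before deduplicating. Hope this answers your question.
from itertools import combinations
lst = ['a', 'a', 'a', 'b', 'c']
combs = set()
for i in range(len(lst) + 1):
    # combinations() keeps the input order, so equal sub-multisets give identical tuples
    combs.update(combinations(lst, i))
print(combs)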

The way I like to think of it is that you start with the empty set and then, for each element, choose either to add it to each of the existing subsets or not. Since there are 2 choices at each step, a set of n distinct elements has 2^n subsets (for a multiset, this approach produces duplicate subsets). This can be implemented with the following Java code:
public List<List<Integer>> subsets(int[] nums) {
    // list contains all of the subsets (requires java.util.List and java.util.ArrayList)
    List<List<Integer>> list = new ArrayList<>();
    // add the empty set to start with
    list.add(new ArrayList<Integer>());
    for (int i = 0; i < nums.length; i++) {
        // find the current list size
        int length = list.size();
        // Loop through and add the current element to all existing subsets.
        // This represents making the choice of adding the element.
        for (int j = 0; j < length; j++) {
            // make a copy of the current subset
            ArrayList<Integer> temp = new ArrayList<>(list.get(j));
            temp.add(nums[i]);
            list.add(temp);
        }
    }
    return list;
}
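For a multiset, the approach above repeats subsets because it always creates 2^n lists. A common adjustment, sketched here in Python with a hypothetical function name, is to sort the input first and, whenever an element equals the previous one, extend only the subsets created in the previous step.

```python
def subsets_with_duplicates(nums):
    """Iteratively build all distinct sub-multisets of nums (as lists)."""
    nums = sorted(nums)  # equal elements must be adjacent
    result = [[]]
    start = 0  # index of the first subset created in the previous step
    for i, x in enumerate(nums):
        # If x repeats the previous element, extending the older subsets would
        # recreate lists that already exist, so only extend the newest ones.
        begin = start if i > 0 and x == nums[i - 1] else 0
        end = len(result)
        for j in range(begin, end):
            result.append(result[j] + [x])
        start = end
    return result


result = subsets_with_duplicates(['a', 'a', 'a', 'b', 'c'])
print(len(result))  # 16
```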

Related

Ranking algorithm with win-lose records

I am looking for an algorithmic approach to sort elements based on their win-lose records from each pairing.
Please take a look at the sample data
('a', 'b') -> (W, L)
('a', 'c') -> (L, W)
('a', 'd') -> (L, W)
('a', 'e') -> (W, L)
('b', 'c') -> (L, W)
('b', 'd') -> (L, W)
('b', 'e') -> (W, L)
('c', 'd') -> (W, L)
('c', 'e') -> (W, L)
('d', 'e') -> (W, L)
The winner is placed toward the right side of the array.
For example:
c wins over all the other elements
d wins over all the other elements except c
...
Desired result, ordered from loser to winner:
[e, b, a, d, c]
Is there a keyword or approach I can look into to solve this problem?
I would go about it by assigning each result token a value, such as W=2 and L=1 (if draws are possible, give a draw a value in between).
Then map those values onto each player's results and sort accordingly.
So from your example of:
('a', 'b') -> (W, L)
('a', 'c') -> (L, W)
('a', 'd') -> (L, W)
('a', 'e') -> (W, L)
('b', 'c') -> (L, W)
('b', 'd') -> (L, W)
('b', 'e') -> (W, L)
('c', 'd') -> (W, L)
('c', 'e') -> (W, L)
('d', 'e') -> (W, L)
gets mapped to something like this:
('a', 'b') -> (2, 1)
('a', 'c') -> (1, 2)
('a', 'd') -> (1, 2)
('a', 'e') -> (2, 1)
('b', 'c') -> (1, 2)
('b', 'd') -> (1, 2)
('b', 'e') -> (2, 1)
('c', 'd') -> (2, 1)
('c', 'e') -> (2, 1)
('d', 'e') -> (2, 1)
Sum up the values by their key:
a: 6
b: 5
c: 8
d: 7
e: 4
and sort them starting with the lowest:
e: 4
b: 5
a: 6
d: 7
c: 8
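A minimal sketch of this scoring approach in Python, assuming W=2 and L=1 as in the mapped table (the POINTS values are an assumption, not a fixed rule). When every player plays the same number of games, as here, any scoring with W > L gives the same order as counting wins; ties in the total would need a separate tie-breaker.

```python
from collections import defaultdict

# Sample records from the question: (first, second) -> (result of first, result of second)
data = {('a', 'b'): ('W', 'L'), ('a', 'c'): ('L', 'W'), ('a', 'd'): ('L', 'W'),
        ('a', 'e'): ('W', 'L'), ('b', 'c'): ('L', 'W'), ('b', 'd'): ('L', 'W'),
        ('b', 'e'): ('W', 'L'), ('c', 'd'): ('W', 'L'), ('c', 'e'): ('W', 'L'),
        ('d', 'e'): ('W', 'L')}

POINTS = {'W': 2, 'L': 1}  # assumed scoring; a draw could get a value in between

scores = defaultdict(int)
for (p1, p2), (r1, r2) in data.items():
    scores[p1] += POINTS[r1]
    scores[p2] += POINTS[r2]

ranking = sorted(scores, key=scores.get)  # lowest total (weakest record) first
print(dict(scores))  # {'a': 6, 'b': 5, 'c': 8, 'd': 7, 'e': 4}
print(ranking)       # ['e', 'b', 'a', 'd', 'c']
```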
I am not a native English speaker, please forgive me if there is any grammatical error.
Record the number of wins and losses for each letter: the letter that has never lost is the largest, and the letter that has never won is the smallest.
This problem can be transformed into a longest-path problem. Treat each comparison as a directed edge of length 1 from the loser to the winner; the longest path from the smallest letter to the largest letter then gives the ordering.
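In graph terms, ordering players so that every loser comes before its winner is a topological sort of the win graph, which is a useful keyword for this kind of problem. The sketch below swaps in Python's graphlib.TopologicalSorter (Python 3.9+) rather than computing longest paths explicitly. If the records contain a cycle (a beats b, b beats c, c beats a), static_order() raises graphlib.CycleError; if some pairs are missing, several orders may be valid and the one returned is not guaranteed to be any particular one.

```python
from graphlib import TopologicalSorter

data = {('a', 'b'): ('W', 'L'), ('a', 'c'): ('L', 'W'), ('a', 'd'): ('L', 'W'),
        ('a', 'e'): ('W', 'L'), ('b', 'c'): ('L', 'W'), ('b', 'd'): ('L', 'W'),
        ('b', 'e'): ('W', 'L'), ('c', 'd'): ('W', 'L'), ('c', 'e'): ('W', 'L'),
        ('d', 'e'): ('W', 'L')}

# Map each player to the set of players it beat. TopologicalSorter treats the
# values as predecessors, so beaten players are emitted before their winners.
beaten_by = {}
for (p1, p2), (r1, _r2) in data.items():
    winner, loser = (p1, p2) if r1 == 'W' else (p2, p1)
    beaten_by.setdefault(winner, set()).add(loser)
    beaten_by.setdefault(loser, set())

order = list(TopologicalSorter(beaten_by).static_order())
print(order)  # ['e', 'b', 'a', 'd', 'c']
```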
This assumes that the ordering is based on number of wins per item.
# -*- coding: utf-8 -*-
"""
https://stackoverflow.com/questions/67630173/ranking-algorithm-with-win-lose-records
Created on Fri May 21 19:00:33 2021
#author: Paddy3118
"""
data = {('a', 'b'): ('W', 'L'),
('a', 'c'): ('L', 'W'),
('a', 'd'): ('L', 'W'),
('a', 'e'): ('W', 'L'),
('b', 'c'): ('L', 'W'),
('b', 'd'): ('L', 'W'),
('b', 'e'): ('W', 'L'),
('c', 'd'): ('W', 'L'),
('c', 'e'): ('W', 'L'),
('d', 'e'): ('W', 'L'),
}
all_items = set()
for (i1, i2) in data.keys():
    all_items |= {i1, i2}  # Finally = {'a', 'b', 'c', 'd', 'e'}
win_counts = {item: 0 for item in all_items}
for (i1, i2), (r1, r2) in data.items():
    if r1 == 'W':
        win_counts[i1] += 1
    else:
        win_counts[i2] += 1
# win_counts = {'a': 2, 'd': 3, 'b': 1, 'e': 0, 'c': 4}
answer = sorted(all_items, key=lambda i: win_counts[i])
print(answer) # ['e', 'b', 'a', 'd', 'c']

Use dynamic programming to merge two arrays such that the number of repetitions of the same element is minimised

Let's say we have two arrays m and n containing characters from the set {a, b, c, d, e}. Assume each character has a cost associated with it: a=1, b=3, c=4, d=5, e=7.
For example:
m = ['a', 'b', 'c', 'd', 'd', 'e', 'a']
n = ['b', 'b', 'b', 'a', 'c', 'e', 'd']
Suppose we would like to merge m and n to form a larger array s.
An example of s array could be
s = ['a', 'b', 'c', 'd', 'd', 'e', 'a', 'b', 'b', 'b', 'a', 'c', 'e', 'd']
or
s = ['b', 'a', 'd', 'd', 'd', 'b', 'e', 'c', 'b', 'a', 'b', 'a', 'c', 'e']
If two or more identical characters are adjacent to each other, a penalty is applied equal to (number of adjacent characters of the same type) * (the cost of that character). Consider the second example for s above, which contains the sub-array ['d', 'd', 'd']. In this case a penalty of 3*5 = 15 is applied, because the cost associated with d is 5 and d is repeated 3 times.
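As a concrete reading of the penalty rule (a run of k >= 2 identical characters costs k times that character's cost), here is a small Python helper that scores both example arrays; penalty and COST are hypothetical names.

```python
from itertools import groupby

COST = {'a': 1, 'b': 3, 'c': 4, 'd': 5, 'e': 7}


def penalty(s, cost=COST):
    """Sum of run_length * cost over every run of 2+ identical adjacent characters."""
    total = 0
    for ch, run in groupby(s):
        k = len(list(run))
        if k >= 2:
            total += k * cost[ch]
    return total


s1 = ['a', 'b', 'c', 'd', 'd', 'e', 'a', 'b', 'b', 'b', 'a', 'c', 'e', 'd']
s2 = ['b', 'a', 'd', 'd', 'd', 'b', 'e', 'c', 'b', 'a', 'b', 'a', 'c', 'e']
print(penalty(s1))  # 'dd' -> 2*5 plus 'bbb' -> 3*3: 19
print(penalty(s2))  # 'ddd' -> 3*5: 15
```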
Design a dynamic programming algorithm which minimises the cost associated with s.
Does anyone have any resources, papers, or algorithms they could share to help point me in the right direction?
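One possible direction, assuming s must be an interleaving of m and n that keeps the relative order of each array (the question does not say this, and its second example rearranges elements freely, in which case this sketch does not apply): a memoized DP over (elements of m used, elements of n used, last character, current run length capped at 2). The run length only needs to distinguish 1 from 2 or more, because growing a run from 1 to 2 adds 2*cost and each later copy adds 1*cost, which totals k*cost for a run of k. The names are hypothetical.

```python
from functools import lru_cache

COST = {'a': 1, 'b': 3, 'c': 4, 'd': 5, 'e': 7}


def min_merge_penalty(m, n, cost=COST):
    """Minimum penalty over all interleavings of m and n that keep each list's order."""

    @lru_cache(maxsize=None)
    def best(i, j, last, run):
        # i, j: elements of m and n already placed; last: last character placed
        # (None at the start); run: length of the current run of `last`, capped at 2.
        if i == len(m) and j == len(n):
            return 0
        options = []
        for seq, k, ni, nj in ((m, i, i + 1, j), (n, j, i, j + 1)):
            if k == len(seq):
                continue
            c = seq[k]
            if c != last:
                options.append(best(ni, nj, c, 1))
            elif run == 1:
                # a run growing from 1 to 2 starts charging for both copies
                options.append(2 * cost[c] + best(ni, nj, c, 2))
            else:
                # every further copy in an existing run adds one more cost[c]
                options.append(cost[c] + best(ni, nj, c, 2))
        return min(options)

    return best(0, 0, None, 0)


m = ['a', 'b', 'c', 'd', 'd', 'e', 'a']
n = ['b', 'b', 'b', 'a', 'c', 'e', 'd']
print(min_merge_penalty(m, n))  # 0
```

For the example arrays it returns 0, since they can be interleaved as a b c b d b d b a e a c e d with no equal neighbours. The number of states is O(len(m) * len(n) * alphabet size), with constant work per state.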

Python: List all possible paths in graph represented by dictionary

I have a dictionary with keys representing nodes and values representing possible nodes that the key can traverse to.
Example:
dependecyDict = { 'A': ['D'], 'B': ['A', 'E'], 'C': ['B'], 'D': ['C'], 'G':['H']}
I want to create a new dictionary, ChainsDict, that will contain all 'values' that each 'key' can traverse to by means of dependecyDict.
For example, the output of the program with this example will be:
ChainsDict = {'A': ['D', 'C', 'B','E'], 'B':['A','D','C','E'], 'C':['B','A','D','E'], 'D':['C','B','A','E'], 'G': ['H']}
I think using a recursive algorithm is the best way to go about making a solution and I tried modifying a shortest path traversing algorithm as follows:
def helper(dependencyDict, ChainsDict):
    path = []
    for key in dependencyDict:
        path = path + [(recursiveRowGen(dependencyDict, key))]
    for paths in path:
        ChainsDict[paths[0]] = paths[1:]
    print(ChainsDict)
def recursiveRowGen(dependencyDict, key, path=[]):
    path = path + [key]
    if not key in dependencyDict:
        print("no key: ", key)
        return path
    print(dependencyDict[key])
    for blocking in dependencyDict[key]:
        if blocking not in path:
            newpath = recursiveRowGen(dependencyDict, blocking, path)
            if newpath:
                return newpath
    return path
However, this code does not capture the correct output when a key in dependecyDict has more than one value.
I found a hacky solution but it doesn't feel very elegant. Any help is appreciated, thanks!
Here is a recursive solution:
Code
def get_chain_d(argDict):
    def each_path(i, caller_chain):
        a = []
        caller_chain.append(i)
        b = argDict.get(i, [])
        for j in b:
            if j not in caller_chain:
                a.append(j)
                a.extend(each_path(j, caller_chain))
        return a
    return {i: each_path(i, []) for i in argDict}
dependecyDict = { 'A': ['D'], 'B': ['A', 'E'], 'C': ['B'], 'D': ['C'], 'G':['H']}
print(get_chain_d(dependecyDict))
Output:
{'B': ['A', 'D', 'C', 'E'], 'A': ['D', 'C', 'B', 'E'], 'D': ['C', 'B', 'A', 'E'], 'C': ['B', 'A', 'D', 'E'], 'G': ['H']}
This is basically a graph traversal problem. You can represent each of your keys as a node in a graph, and its values are the nodes it is connected to.
You can do either depth-first search or breadth-first search on the graph. Of course, there are both iterative and recursive solutions for each of these methods. Here's an iterative implementation (I added a few conditionals to eliminate loops):
dependencyDict = {'A': ['D'], 'B': ['A', 'E'], 'C': ['B'], 'D': ['C'], 'G': ['H']}
chainsDict = {}
for key in dependencyDict:
    # start from the key's direct dependencies; a FIFO frontier gives breadth-first order
    frontier = list(dependencyDict[key])
    visited = []
    while frontier:
        currKey = frontier.pop(0)
        # skip the starting key and nodes already reached, which eliminates loops
        if currKey == key or currKey in visited:
            continue
        visited.append(currKey)
        # nodes missing from the dict (such as 'E' and 'H') have no further dependencies
        frontier.extend(dependencyDict.get(currKey, []))
    chainsDict[key] = visited
print(chainsDict)
The result looks like:
{'A': ['D', 'C', 'B', 'E'], 'B': ['A', 'E', 'D', 'C'], 'C': ['B', 'A', 'E', 'D'], 'D': ['C', 'B', 'A', 'E'], 'G': ['H']}
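For comparison, the depth-first variant mentioned above can also be written iteratively with an explicit stack. This is a sketch with a hypothetical reachable helper; for this example it reproduces the ChainsDict expected in the question, including the order within each list.

```python
def reachable(graph, start):
    """Nodes reachable from `start` (excluding `start` itself) in depth-first order."""
    order, seen = [], set()
    # Push children in reverse so they are popped in the order they are listed.
    stack = list(reversed(graph.get(start, [])))
    while stack:
        node = stack.pop()
        if node == start or node in seen:
            continue  # skip the start node and anything already visited (breaks cycles)
        seen.add(node)
        order.append(node)
        stack.extend(reversed(graph.get(node, [])))
    return order


dependencyDict = {'A': ['D'], 'B': ['A', 'E'], 'C': ['B'], 'D': ['C'], 'G': ['H']}
chainsDict = {key: reachable(dependencyDict, key) for key in dependencyDict}
print(chainsDict)
# {'A': ['D', 'C', 'B', 'E'], 'B': ['A', 'D', 'C', 'E'], 'C': ['B', 'A', 'D', 'E'], 'D': ['C', 'B', 'A', 'E'], 'G': ['H']}
```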

Algorithm to generate permutations of a list of strings and their substrings

This algorithm has been escaping me for some time now. Let's say I'm given the string "cccaatt". I'm trying to generate all the possible variations of each substring of repeated letters. For example, "cccaatt" as input would return:
cat,
catt,
caat,
caatt,
ccat,
ccatt,
ccaat,
ccaatt,
cccat,
cccatt,
cccaat,
cccaatt
The order of the results does not matter, so long as it returns all of them. Generally, the input is a string, consisting of g groups of repeated letters, each group k_n letters long.
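Since each group of k_n identical letters can be shortened to anywhere from 1 to k_n copies, the number of results is the product of the group lengths; for "cccaatt" that is 3 * 2 * 2 = 12, the length of the list above. A quick check in Python (count_variations is a hypothetical name; like itertools.groupby, it treats a letter that reappears later in the string as a new group):

```python
import math
from itertools import groupby


def count_variations(s):
    """Each group of k identical letters can keep 1..k copies, so multiply the k's."""
    return math.prod(len(list(group)) for _, group in groupby(s))


print(count_variations("cccaatt"))  # 3 * 2 * 2 = 12
```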
My intuition is that this is a recursive algorithm, but the exact structure of it has been difficult to understand.
If you store the alphabet and maximum occurrences of each letter (as awesomely mentioned in the comments) you can do this:
function variations(letter_type, current string) {
    if (letter_type is in the alphabet) {
        while (string has fewer than the max amount of that letter) {
            add one of that letter to current string
            variations(next letter, current string)
        }
    } else {
        print current string // since there are no more letters to add
    }
}
In Java:
public class Variations {
    static String[] alphabet = {"c", "a", "t"};
    static int[] maximums = {3, 2, 2};
    public static void main(String[] args) {
        variations(0, "");
    }
    public static void variations(int letter_type, String curr) {
        if (letter_type < alphabet.length) {
            for (int i = 1; i <= maximums[letter_type]; i++) {
                curr += alphabet[letter_type];
                variations(letter_type + 1, curr);
            }
        } else {
            System.out.println(curr);
        }
    }
}
Decompose the string into a list of (letter, repeat count) pairs, i.e. "cccaatt" => [(c,3), (a,2), (t,2)]. Then the problem can be defined recursively.
Let xs = [(a_1, n_1), (a_2, n_2), (a_3, n_3), ... (a_k, n_k)]
define Perm(xs):
    if len(xs) == 1:
        return all length variations of xs
    else:
        return every sequence in Perm(xs[:-1]) appended with one or more copies of the letter in xs[-1]
I'll have a python example shortly.
> perm("cccaatt")
> ['cat', 'ccat', 'cccat', 'caat', 'ccaat', 'cccaat', 'catt', 'ccatt', 'cccatt', 'caatt', 'ccaatt', 'cccaatt']
Code attached
def perm(xs):
    if not xs:
        return []
    # group them into the correct format, probably should have used groupby + zip
    l = [(xs[0], 1)]
    for x in xs[1:]:
        last, num = l[-1]
        if last == x:
            l[-1] = (last, num + 1)
        else:
            l.append((x, 1))
    # print(l)
    return recurse(l)
# this is where the real work is done.
def recurse(xs):
    if len(xs) == 1:
        return [xs[0][0] * x for x in range(1, xs[0][1] + 1)]
    prev = recurse(xs[:-1])
    char, num = xs[-1]
    return [y + x * char for x in range(1, num + 1) for y in prev]
The Python itertools module has powerful tools to group and then to iterate on members of groups leading to the following program.
I have shown some intermediate results and used the pprint module to prettyprint the answer:
Python 2.7.3 (default, Aug 1 2012, 05:16:07)
[GCC 4.6.3] on linux2
Type "copyright", "credits" or "license()" for more information.
>>> import itertools
>>> instring = "cccaatt"
>>> [(x[0], list(x[1])) for x in itertools.groupby(instring)]
[('c', ['c', 'c', 'c']), ('a', ['a', 'a']), ('t', ['t', 't'])]
>>> xx = [list(x[0]*n for n in range(1, len(list(x[1]))+1)) for x in itertools.groupby(instring)]
>>> xx
[['c', 'cc', 'ccc'], ['a', 'aa'], ['t', 'tt']]
>>> list(itertools.product(*xx))
[('c', 'a', 't'), ('c', 'a', 'tt'), ('c', 'aa', 't'), ('c', 'aa', 'tt'), ('cc', 'a', 't'), ('cc', 'a', 'tt'), ('cc', 'aa', 't'), ('cc', 'aa', 'tt'), ('ccc', 'a', 't'), ('ccc', 'a', 'tt'), ('ccc', 'aa', 't'), ('ccc', 'aa', 'tt')]
>>> from pprint import pprint as pp
>>> pp(list(itertools.product(*xx)))
[('c', 'a', 't'),
('c', 'a', 'tt'),
('c', 'aa', 't'),
('c', 'aa', 'tt'),
('cc', 'a', 't'),
('cc', 'a', 'tt'),
('cc', 'aa', 't'),
('cc', 'aa', 'tt'),
('ccc', 'a', 't'),
('ccc', 'a', 'tt'),
('ccc', 'aa', 't'),
('ccc', 'aa', 'tt')]
>>>
Or as a function:
>>> def stringexpand(instring):
    xx = [list(x[0]*n for n in range(1, len(list(x[1]))+1)) for x in itertools.groupby(instring)]
    return list(itertools.product(*xx))
>>> pp(stringexpand("cccaatt"))
[('c', 'a', 't'),
('c', 'a', 'tt'),
('c', 'aa', 't'),
('c', 'aa', 'tt'),
('cc', 'a', 't'),
('cc', 'a', 'tt'),
('cc', 'aa', 't'),
('cc', 'aa', 'tt'),
('ccc', 'a', 't'),
('ccc', 'a', 'tt'),
('ccc', 'aa', 't'),
('ccc', 'aa', 'tt')]
>>>
You seem to need the strings joined from their parts. This can be done with this slight modification:
def stringexpand(instring):
    xx = [list(x[0]*n for n in range(1, len(list(x[1]))+1)) for x in itertools.groupby(instring)]
    return [''.join(parts) for parts in itertools.product(*xx)]
Which returns:
['cat',
'catt',
'caat',
'caatt',
'ccat',
'ccatt',
'ccaat',
'ccaatt',
'cccat',
'cccatt',
'cccaat',
'cccaatt']

Python: What is the right way to modify list elements?

I have this list of tuples:
l = [('a','b'),('c','d'),('e','f')]
And two parameters: a key value, and a new value to modify. For example,
key = 'a'
new_value= 'B' # it means, modify with 'B' the value in tuples where there's an 'a'
I have these two options (both seem to work):
f = lambda t,k,v: t[0] == k and (k,v) or t
new_list = [f(t,key,new_value) for t in l]
print(new_list)
and
new_list = []
for i in range(len(l)):
    elem = l.pop()  # pop() removes from the end, so new_list is built in reverse order and l is emptied
    if elem[0] == key:
        new_list.append((key,new_value))
    else:
        new_list.append(elem)
print(new_list)
But I'm new to Python and don't know if this is right.
Can you help me? Thank you!
Here is one solution involving altering the items in-place.
def replace(list_, key, new_value):
    for i, (current_key, current_value) in enumerate(list_):
        if current_key == key:
            list_[i] = (key, new_value)
Or, to append if it's not in there,
def replace_or_append(list_, key, new_value):
    for i, (current_key, current_value) in enumerate(list_):
        if current_key == key:
            list_[i] = (key, new_value)
            break
    else:
        list_.append((key, new_value))
Usage:
>>> my_list = [('a', 'b'), ('c', 'd')]
>>> replace(my_list, 'a', 'B')
>>> my_list
[('a', 'B'), ('c', 'd')]
If you want to create a new list, a list comprehension is easiest.
>>> my_list = [('a', 'b'), ('c', 'd')]
>>> find_key = 'a'
>>> new_value = 'B'
>>> new_list = [(key, new_value if key == find_key else value) for key, value in my_list]
>>> new_list
[('a', 'B'), ('c', 'd')]
And if you wanted it to append if it wasn't there,
>>> if all(key != find_key for key, value in my_list):
...     new_list.append((find_key, new_value))
(Note also I've changed your variable name from l; l is too easily confused with I and 1 and is best avoided. Thus saith PEP8 and I agree with it.)
To create a new list, a list comprehension would do:
In [102]: [(key,'B' if key=='a' else val) for key,val in l]
Out[102]: [('a', 'B'), ('c', 'd'), ('e', 'f')]
To modify the list in place:
l = [('a','b'),('c','d'),('e','f')]
for i, elt in enumerate(l):
    key, val = elt
    if key == 'a':
        l[i] = (key, 'B')
print(l)
# [('a', 'B'), ('c', 'd'), ('e', 'f')]
To modify existing list just use list assignment, e.g.
>>> l = [('a','b'),('c','d'),('e','f')]
>>> l[0] = ('a','B')
>>> print(l)
[('a', 'B'), ('c', 'd'), ('e', 'f')]
I would usually prefer to create a new list using comprehension, e.g.
[(key, new_value) if x[0] == key else x for x in l]
But, as the first comment has already mentioned, it sounds like you are trying to make a list do something which you should really be using a dict for instead.
Here's the approach I would use.
>>> l = [('a','b'),('c','d'),('e','f')]
>>> key = 'a'
>>> new_value= 'B'
>>> for pos in (index for index, (k, v) in enumerate(l) if k == key):
...     l[pos] = (key, new_value)
...     break
... else:
...     l.append((key, new_value))
...
>>> l
[('a', 'B'), ('c', 'd'), ('e', 'f')]
This looks an awful lot like an OrderedDict, though: key-value pairs with preserved ordering. You might want to take a look at that and see whether it suits your needs.
Edit: Replaced try:...except StopIteration: with for:...break...else: since that might look a bit less weird.
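A minimal sketch of the dict-based alternative suggested above, assuming each key appears at most once in the list (dict(pairs) keeps only the last value for a repeated key). Since Python 3.7 a plain dict preserves insertion order, so collections.OrderedDict is not strictly needed for this.

```python
pairs = [('a', 'b'), ('c', 'd'), ('e', 'f')]

lookup = dict(pairs)
lookup['a'] = 'B'  # replaces the value and keeps the key's original position
lookup['g'] = 'h'  # a key that was not there yet is appended at the end
print(list(lookup.items()))
# [('a', 'B'), ('c', 'd'), ('e', 'f'), ('g', 'h')]
```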
