Ranking algorithm with win-lose records - algorithm

I am looking for an algorithmic approach to sort elements based on the win-lose record of each pairwise combination.
Please take a look at the sample data:
('a', 'b') -> (W, L)
('a', 'c') -> (L, W)
('a', 'd') -> (L, W)
('a', 'e') -> (W, L)
('b', 'c') -> (L, W)
('b', 'd') -> (L, W)
('b', 'e') -> (W, L)
('c', 'd') -> (W, L)
('c', 'e') -> (W, L)
('d', 'e') -> (W, L)
The winner is placed toward the right side of the array.
For example:
c wins over all the other elements
d wins over all the other elements except c
...
Desired result, ordered from most losses to most wins:
[e, b, a, d, c]
Is there a keyword or approach I can pursue to solve this problem?

I would go about this by assigning each token a value, such as W=2 and L=1 (if draws are possible, you could use W=3, D=2, L=1).
Then you would map those values onto each player's results and sort accordingly.
So from your example of:
('a', 'b') -> (W, L)
('a', 'c') -> (L, W)
('a', 'd') -> (L, W)
('a', 'e') -> (W, L)
('b', 'c') -> (L, W)
('b', 'd') -> (L, W)
('b', 'e') -> (W, L)
('c', 'd') -> (W, L)
('c', 'e') -> (W, L)
('d', 'e') -> (W, L)
gets mapped to something like this:
('a', 'b') -> (2, 1)
('a', 'c') -> (1, 2)
('a', 'd') -> (1, 2)
('a', 'e') -> (2, 1)
('b', 'c') -> (1, 2)
('b', 'd') -> (1, 2)
('b', 'e') -> (2, 1)
('c', 'd') -> (2, 1)
('c', 'e') -> (2, 1)
('d', 'e') -> (2, 1)
Sum up the values by their key:
a: 6
b: 5
c: 8
d: 7
e: 4
and sort the values starting with the lowest:
e: 4
b: 5
a: 6
d: 7
c: 8
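The mapping-and-summing steps above can be sketched in Python roughly as follows. The point values in POINTS and the helper name rank_by_points are assumptions made for illustration; any scheme in which a win scores more than a loss should produce the same order for this data.

```python
from collections import defaultdict

# Assumed point values (not prescribed by the question); a draw could score in between.
POINTS = {"W": 2, "L": 1}

records = {
    ("a", "b"): ("W", "L"), ("a", "c"): ("L", "W"),
    ("a", "d"): ("L", "W"), ("a", "e"): ("W", "L"),
    ("b", "c"): ("L", "W"), ("b", "d"): ("L", "W"),
    ("b", "e"): ("W", "L"), ("c", "d"): ("W", "L"),
    ("c", "e"): ("W", "L"), ("d", "e"): ("W", "L"),
}


def rank_by_points(records):
    """Sum the points of every player and return (ranking, totals)."""
    totals = defaultdict(int)
    for (p1, p2), (r1, r2) in records.items():
        totals[p1] += POINTS[r1]
        totals[p2] += POINTS[r2]
    # Lowest total first; ties are broken alphabetically.
    ranking = sorted(totals, key=lambda p: (totals[p], p))
    return ranking, dict(totals)


ranking, totals = rank_by_points(records)
print(ranking)  # ['e', 'b', 'a', 'd', 'c']
```

Note that summing points only gives a total order when the totals differ; tied players need an extra tie-break rule, here simply the name.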

I am not a native English speaker, please forgive me if there is any grammatical error.
Record the number of wins and losses for each letter: the letter that has never lost is the largest letter, and the letter that has never won is the smallest letter.
This problem can be transformed into a longest-path problem. Think of each comparison as an edge of length 1 from the loser to the winner; the longest path from the smallest letter to the largest letter then gives the result.
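A minimal sketch of the longest-path idea, assuming the results contain no cycles (no a beats b, b beats c, c beats a). The function names rank_by_longest_path and wins_above are made up for this example. Each item is scored by the length of the longest chain of winners above it, and the item with the longest chain is ranked lowest.

```python
from functools import lru_cache

records = {
    ("a", "b"): ("W", "L"), ("a", "c"): ("L", "W"),
    ("a", "d"): ("L", "W"), ("a", "e"): ("W", "L"),
    ("b", "c"): ("L", "W"), ("b", "d"): ("L", "W"),
    ("b", "e"): ("W", "L"), ("c", "d"): ("W", "L"),
    ("c", "e"): ("W", "L"), ("d", "e"): ("W", "L"),
}


def rank_by_longest_path(records):
    """Order items from loser to winner using longest paths in the win graph."""
    # lost_to[x] holds every item that beat x (edges point from loser to winner).
    lost_to = {}
    for (p1, p2), (r1, _r2) in records.items():
        winner, loser = (p1, p2) if r1 == "W" else (p2, p1)
        lost_to.setdefault(winner, set())
        lost_to.setdefault(loser, set()).add(winner)

    @lru_cache(maxsize=None)
    def wins_above(item):
        # Number of edges on the longest path starting at `item`.
        # Assumes the graph is acyclic; a cycle would recurse forever.
        return max((1 + wins_above(w) for w in lost_to[item]), default=0)

    # The longest path starts at the overall loser, so sort descending.
    return sorted(lost_to, key=wins_above, reverse=True)


order = rank_by_longest_path(records)
print(order)  # ['e', 'b', 'a', 'd', 'c']
```

Unlike counting wins, this also works when some pairs were never compared, as long as the recorded results are consistent.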

This assumes that the ordering is based on number of wins per item.
# -*- coding: utf-8 -*-
"""
https://stackoverflow.com/questions/67630173/ranking-algorithm-with-win-lose-records
Created on Fri May 21 19:00:33 2021
@author: Paddy3118
"""
data = {('a', 'b'): ('W', 'L'),
        ('a', 'c'): ('L', 'W'),
        ('a', 'd'): ('L', 'W'),
        ('a', 'e'): ('W', 'L'),
        ('b', 'c'): ('L', 'W'),
        ('b', 'd'): ('L', 'W'),
        ('b', 'e'): ('W', 'L'),
        ('c', 'd'): ('W', 'L'),
        ('c', 'e'): ('W', 'L'),
        ('d', 'e'): ('W', 'L'),
        }
all_items = set()
for (i1, i2) in data.keys():
    all_items |= {i1, i2}  # Finally = {'a', 'b', 'c', 'd', 'e'}
win_counts = {item: 0 for item in all_items}
for (i1, i2), (r1, r2) in data.items():
    if r1 == 'W':
        win_counts[i1] += 1
    else:
        win_counts[i2] += 1
# win_counts = {'a': 2, 'd': 3, 'b': 1, 'e': 0, 'c': 4}
answer = sorted(all_items, key=lambda i: win_counts[i])
print(answer)  # ['e', 'b', 'a', 'd', 'c']

Related

Generating the powerset of a multiset

Suppose I have a multiset
{a,a,a,b,c}
from which I can make the following selections:
{}
{a}
{a,a}
{a,a,a}
{a,a,a,b}
{a,a,a,b,c}
{a,a,a,c}
{a,a,b}
{a,a,b,c}
{a,a,c}
{a,b}
{a,b,c}
{a,c}
{b}
{b,c}
{c}
Notice that the number of selections equals 16. The cardinality of a powerset of a multiset, card(P(M)), is defined on OEIS as
card(P(M)) = prod(mult(x) + 1) for all x in M
where mult(x) is the multiplicity of x in M and prod is the product of the terms. So for our example, this would amount to 4 x 2 x 2 = 16.
Let's say, for example, that the multiplicity of these elements is very high:
m(a) = 21
m(b) = 36
m(c) = 44
Then
card(P(M)) = 22 * 37 * 45 = 36630.
But if we were to treat all those elements as distinct - as a set - the cardinality of the powerset would be
card(P(S)) = 2^(21+36+44) = 2535301200456458802993406410752.
The "standard" solution for this problem suggests to just compute the powerset of the set where all of the elements are treated as distinct, and then prune the results to remove the duplicates. That's a solution with O(2^n) complexity.
Does a general algorithm for generating a powerset of a multiset with complexity on the order of card(P(M)) exist?
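The two cardinalities quoted in the question can be checked with a few lines of Python. This is only a sanity check of the arithmetic; the helper name multiset_powerset_size is an assumption, and math.prod requires Python 3.8 or later.

```python
from collections import Counter
from math import prod


def multiset_powerset_size(multiset):
    """card(P(M)) = product of (multiplicity + 1) over the distinct elements."""
    return prod(m + 1 for m in Counter(multiset).values())


small = multiset_powerset_size("aaabc")
big = Counter(a=21, b=36, c=44)
as_multiset = multiset_powerset_size(big)
as_set = 2 ** sum(big.values())  # every copy treated as a distinct element

print(small)        # 16
print(as_multiset)  # 36630
print(as_set)       # 2535301200456458802993406410752
```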
powerset recipe with itertools
What you are asking is usually called the powerset and is available as an itertools recipe, as well as a function in the module more_itertools. See the documentation:
itertools recipe;
more_itertools.powerset.
multiset = ['a', 'a', 'a', 'b', 'c']
#
# USING ITERTOOLS
#
import itertools
def powerset(iterable):
    "powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)"
    s = list(iterable)
    return itertools.chain.from_iterable(itertools.combinations(s, r) for r in range(len(s)+1))
print(list(powerset(multiset)))
# [(), ('a',), ('a',), ('a',), ('b',), ('c',), ('a', 'a'), ('a', 'a'), ('a', 'b'), ('a', 'c'), ('a', 'a'), ('a', 'b'), ('a', 'c'), ('a', 'b'), ('a', 'c'), ('b', 'c'), ('a', 'a', 'a'), ('a', 'a', 'b'), ('a', 'a', 'c'), ('a', 'a', 'b'), ('a', 'a', 'c'), ('a', 'b', 'c'), ('a', 'a', 'b'), ('a', 'a', 'c'), ('a', 'b', 'c'), ('a', 'b', 'c'), ('a', 'a', 'a', 'b'), ('a', 'a', 'a', 'c'), ('a', 'a', 'b', 'c'), ('a', 'a', 'b', 'c'), ('a', 'a', 'b', 'c'), ('a', 'a', 'a', 'b', 'c')]
#
# USING MORE_ITERTOOLS
#
import more_itertools
print(list(more_itertools.powerset(multiset)))
# [(), ('a',), ('a',), ('a',), ('b',), ('c',), ('a', 'a'), ('a', 'a'), ('a', 'b'), ('a', 'c'), ('a', 'a'), ('a', 'b'), ('a', 'c'), ('a', 'b'), ('a', 'c'), ('b', 'c'), ('a', 'a', 'a'), ('a', 'a', 'b'), ('a', 'a', 'c'), ('a', 'a', 'b'), ('a', 'a', 'c'), ('a', 'b', 'c'), ('a', 'a', 'b'), ('a', 'a', 'c'), ('a', 'b', 'c'), ('a', 'b', 'c'), ('a', 'a', 'a', 'b'), ('a', 'a', 'a', 'c'), ('a', 'a', 'b', 'c'), ('a', 'a', 'b', 'c'), ('a', 'a', 'b', 'c'), ('a', 'a', 'a', 'b', 'c')]
Powerset of a collections.Counter object
In Python, multisets are usually represented with a collections.Counter rather than with a list. The class collections.Counter is a subclass of dict; it implements dictionaries that map elements to counts, as well as several useful methods such as building a Counter by counting occurrences in a sequence.
Taking the powerset of a Counter is the topic of another question on stackoverflow:
How to generate all the subsets of a Counter?
Although I am not aware of an already-implemented method doing this in standard modules, the answer to that question presents one solution using itertools:
import collections
import itertools
multiset = collections.Counter(['a', 'a', 'a', 'b', 'c'])
# Counter({'a': 3, 'b': 1, 'c': 1})
def powerset(multiset):
    range_items = [[(x, z) for z in range(y + 1)] for x, y in multiset.items()]
    products = itertools.product(*range_items)
    return [{k: v for k, v in pairs if v > 0} for pairs in products]
print(powerset(multiset))
# [{}, {'c': 1}, {'b': 1}, {'b': 1, 'c': 1}, {'a': 1}, {'a': 1, 'c': 1}, {'a': 1, 'b': 1}, {'a': 1, 'b': 1, 'c': 1}, {'a': 2}, {'a': 2, 'c': 1}, {'a': 2, 'b': 1}, {'a': 2, 'b': 1, 'c': 1}, {'a': 3}, {'a': 3, 'c': 1}, {'a': 3, 'b': 1}, {'a': 3, 'b': 1, 'c': 1}]
This will give you all the combinations of lst as tuples. Hope this answers your question.
from itertools import combinations
lst = ['a', 'a', 'a', 'b', 'c']
combs = set()
for i in range(len(lst)+1):
    els = [tuple(x) for x in combinations(lst, i)]
    for item in tuple(els):
        combs.add(item)
print(combs)
The best way I like to think of it is that you start out with the empty set and then for each character you are making a choice of either adding it to the current existing sets or not adding it. Since you have 2 choices at each step, the number of total elements in a powerset is 2^n. Implementing this can be done through this Java code:
public List<List<Integer>> subsets(int[] nums) {
    // list variable contains all of the subsets
    List<List<Integer>> list = new ArrayList<>();
    // add the empty set to start with
    list.add(new ArrayList<Integer>());
    for (int i = 0; i < nums.length; i++) {
        // find current list size
        int length = list.size();
        // Loop through and add the current element to all existing subsets.
        // Represents making the choice of adding the element
        for (int j = 0; j < length; j++) {
            // making a copy of current subset list
            ArrayList<Integer> temp = new ArrayList<>(list.get(j));
            temp.add(nums[i]);
            list.add(temp);
        }
    }
    return list;
}
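The same "extend every existing subset" idea can be adapted to a multiset so that duplicates are never generated: instead of a yes/no choice per item, each distinct element offers a choice of 0 up to its multiplicity copies. The following Python sketch is one way to do that; the function name multiset_subsets is made up for this example and this is not the code from the answer above.

```python
from collections import Counter


def multiset_subsets(items):
    """Return every sub-multiset of `items` exactly once, as a list of lists."""
    subsets = [[]]  # start with the empty selection
    for element, multiplicity in Counter(items).items():
        existing = len(subsets)  # subsets built from earlier elements only
        # Choosing 0 copies keeps the existing subsets unchanged;
        # choosing 1..multiplicity copies adds new ones.
        for copies in range(1, multiplicity + 1):
            for j in range(existing):
                subsets.append(subsets[j] + [element] * copies)
    return subsets


result = multiset_subsets(["a", "a", "a", "b", "c"])
print(len(result))  # 16
```

The number of subsets produced is the product of (multiplicity + 1), so the work grows with card(P(M)) rather than with 2^n.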

How To Find All Possible Permutations From A Bag under apache pig

I'm trying to find all possible combinations using Apache Pig. I was able to generate the permutations, but I want to eliminate the duplicated pairs. I wrote this code:
A = LOAD 'data' AS f1:chararray;
DUMP A;
('A')
('B')
('C')
B = FOREACH A GENERATE $0 AS v1;
C = FOREACH A GENERATE $0 AS v2;
D = CROSS B, C;
And the result I obtained looks like this:
DUMP D;
('A', 'A')
('A', 'B')
('A', 'C')
('B', 'A')
('B', 'B')
('B', 'C')
('C', 'A')
('C', 'B')
('C', 'C')
but the result I'm trying to obtain is like below:
DUMP R;
('A', 'A')
('A', 'B')
('A', 'C')
('B', 'B')
('B', 'C')
('C', 'C')
How can I do this? I want to avoid comparing characters, because a string may occur on more than one line.
You can FILTER D to remove the rows you don't want. For example
A = load 'testdata.txt';
B = foreach A generate $0;
C = Cross A, B;
D = filter C by $0 <= $1;
dump D;
which prints out
(C,C)
(B,C)
(B,B)
(A,C)
(A,B)
(A,A)
when 'testdata.txt' has
A
B
C
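For readers who want to check the logic outside Pig, here is a small Python illustration (not Pig code) of why filtering the cross product with $0 <= $1 leaves exactly the unordered pairs. It assumes the input values are distinct, as in the test data above.

```python
from itertools import combinations_with_replacement, product

values = ["A", "B", "C"]

# Cross product followed by the same filter as the Pig script: keep x <= y.
filtered = [(x, y) for x, y in product(values, repeat=2) if x <= y]

# The standard-library way to ask for unordered pairs with repetition.
direct = list(combinations_with_replacement(sorted(values), 2))

print(filtered)
# [('A', 'A'), ('A', 'B'), ('A', 'C'), ('B', 'B'), ('B', 'C'), ('C', 'C')]
```

If the same string can occur on several input lines, the values would need to be de-duplicated first (in Pig, with DISTINCT) for the filter to give each pair once.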

Algorithm to generate permutations of a list of strings and their substrings

This algorithm has been escaping me for some time now. Let's say I'm given the string "cccaatt". I'm trying to generate all the possible variations of each substring of repeated letters. For example, "cccaatt" as an input would return:
cat,
catt,
caat,
caatt,
ccat,
ccatt,
ccaat,
ccaatt,
cccat,
cccatt,
cccaat,
cccaatt
The order of the results does not matter, so long as it returns all of them. Generally, the input is a string, consisting of g groups of repeated letters, each group k_n letters long.
My intuition is that this is a recursive algorithm, but the exact structure of it has been difficult to understand.
If you store the alphabet and maximum occurrences of each letter (as awesomely mentioned in the comments) you can do this:
function variations(letter_type, current_string) {
    if (letter_type is in the alphabet) {
        while (current_string has fewer than the max amount of that letter) {
            add one of that letter to current_string
            variations(next letter, current_string)
        }
    } else {
        print current_string // since there are no more letters to add
    }
}
In Java:
public class Variations {
    static String[] alphabet = {"c","a","t"};
    static int[] maximums = {3, 2, 2};
    public static void main(String[] args) {
        variations(0, "");
    }
    public static void variations(int letter_type, String curr) {
        if (letter_type < alphabet.length) {
            for (int i = 1; i <= maximums[letter_type]; i++) {
                curr += alphabet[letter_type];
                variations(letter_type+1, curr);
            }
        } else {
            System.out.println(curr);
        }
    }
}
Decompose the string into a list of letters and their repeat counts, i.e. "cccaatt" => [(c,3), (a,2), (t,2)]. Then the problem can be defined recursively.
Let xs = [(a_1, n_1), (a_2, n_2), (a_3, n_3), ... (a_k, n_k)]
define Perm(xs):
    if len(xs) == 1:
        return all length variations of xs[0]
    else:
        return every sequence in Perm(xs[:-1]) appended with one or more copies of the letter in xs[-1]
I'll have a Python example shortly.
> perm("cccaatt")
> ['cat', 'ccat', 'cccat', 'caat', 'ccaat', 'cccaat', 'catt', 'ccatt', 'cccatt', 'caatt', 'ccaatt', 'cccaatt']
Code attached
def perm(xs):
    if not xs:
        return []
    # group them into the correct format, probably should have used groupby + zip
    l = [(xs[0],1)]
    for x in xs[1:]:
        last, num = l[-1]
        if last == x:
            l[-1] = (last, num+1)
        else:
            l.append((x, 1))
    # print(l)
    print(recurse(l))
# this is where the real work is done.
def recurse(xs):
    if len(xs) == 1:
        return [ xs[0][0] * x for x in range(1, xs[0][1] + 1) ]
    prev = recurse(xs[:-1])
    char, num = xs[-1]
    return [ y + x * char for x in range(1,num + 1) for y in prev ]
The Python itertools module has powerful tools to group and then to iterate on members of groups leading to the following program.
I have shown some intermediate results and used the pprint module to prettyprint the answer:
Python 2.7.3 (default, Aug 1 2012, 05:16:07)
[GCC 4.6.3] on linux2
Type "copyright", "credits" or "license()" for more information.
>>> import itertools
>>> instring = "cccaatt"
>>> [(x[0], list(x[1])) for x in itertools.groupby(instring)]
[('c', ['c', 'c', 'c']), ('a', ['a', 'a']), ('t', ['t', 't'])]
>>> xx = [list(x[0]*n for n in range(1, len(list(x[1]))+1)) for x in itertools.groupby(instring)]
>>> xx
[['c', 'cc', 'ccc'], ['a', 'aa'], ['t', 'tt']]
>>> list(itertools.product(*xx))
[('c', 'a', 't'), ('c', 'a', 'tt'), ('c', 'aa', 't'), ('c', 'aa', 'tt'), ('cc', 'a', 't'), ('cc', 'a', 'tt'), ('cc', 'aa', 't'), ('cc', 'aa', 'tt'), ('ccc', 'a', 't'), ('ccc', 'a', 'tt'), ('ccc', 'aa', 't'), ('ccc', 'aa', 'tt')]
>>> from pprint import pprint as pp
>>> pp(list(itertools.product(*xx)))
[('c', 'a', 't'),
('c', 'a', 'tt'),
('c', 'aa', 't'),
('c', 'aa', 'tt'),
('cc', 'a', 't'),
('cc', 'a', 'tt'),
('cc', 'aa', 't'),
('cc', 'aa', 'tt'),
('ccc', 'a', 't'),
('ccc', 'a', 'tt'),
('ccc', 'aa', 't'),
('ccc', 'aa', 'tt')]
>>>
Or as a function:
>>> def stringexpand(instring):
        xx = [list(x[0]*n for n in range(1, len(list(x[1]))+1)) for x in itertools.groupby(instring)]
        return list(itertools.product(*xx))
>>> pp(stringexpand("cccaatt"))
[('c', 'a', 't'),
('c', 'a', 'tt'),
('c', 'aa', 't'),
('c', 'aa', 'tt'),
('cc', 'a', 't'),
('cc', 'a', 'tt'),
('cc', 'aa', 't'),
('cc', 'aa', 'tt'),
('ccc', 'a', 't'),
('ccc', 'a', 'tt'),
('ccc', 'aa', 't'),
('ccc', 'aa', 'tt')]
>>>
You seem to need the strings joined from their parts. This can be done in this slight mod:
def stringexpand(instring):
    xx = [list(x[0]*n for n in range(1, len(list(x[1]))+1)) for x in itertools.groupby(instring)]
    return [''.join(parts) for parts in itertools.product(*xx)]
Which returns:
['cat',
'catt',
'caat',
'caatt',
'ccat',
'ccatt',
'ccaat',
'ccaatt',
'cccat',
'cccatt',
'cccaat',
'cccaatt']

Python: What is the right way to modify list elements?

I have this list of tuples:
l = [('a','b'),('c','d'),('e','f')]
And two parameters: a key value, and a new value to modify. For example,
key = 'a'
new_value= 'B' # it means, modify with 'B' the value in tuples where there's an 'a'
I have these two options (both work):
f = lambda t,k,v: t[0] == k and (k,v) or t
new_list = [f(t,key,new_value) for t in l]
print new_list
and
new_list = []
for i in range(len(l)):
    elem = l.pop()
    if elem[0] == key:
        new_list.append((key,new_value))
    else:
        new_list.append(elem)
print new_list
But I'm new to Python and don't know if this is right.
Can you help me? Thank you!
Here is one solution involving altering the items in-place.
def replace(list_, key, new_value):
    for i, (current_key, current_value) in enumerate(list_):
        if current_key == key:
            list_[i] = (key, new_value)
Or, to append if it's not in there,
def replace_or_append(list_, key, new_value):
    for i, (current_key, current_value) in enumerate(list_):
        if current_key == key:
            list_[i] = (key, new_value)
            break
    else:
        list_.append((key, new_value))
Usage:
>>> my_list = [('a', 'b'), ('c', 'd')]
>>> replace(my_list, 'a', 'B')
>>> my_list
[('a', 'B'), ('c', 'd')]
If you want to create a new list, a list comprehension is easiest.
>>> my_list = [('a', 'b'), ('c', 'd')]
>>> find_key = 'a'
>>> new_value = 'B'
>>> new_list = [(key, new_value if key == find_key else value) for key, value in my_list]
>>> new_list
[('a', 'B'), ('c', 'd')]
And if you wanted it to append if it wasn't there,
>>> if find_key not in [key for key, value in my_list]:
...     new_list.append((find_key, new_value))
(Note also I've changed your variable name from l; l is too easily confused with I and 1 and is best avoided. Thus saith PEP8 and I agree with it.)
To create a new list, a list comprehension would do:
In [102]: [(key,'B' if key=='a' else val) for key,val in l]
Out[102]: [('a', 'B'), ('c', 'd'), ('e', 'f')]
To modify the list in place:
l = [('a','b'),('c','d'),('e','f')]
for i,elt in enumerate(l):
    key,val=elt
    if key=='a':
        l[i]=(key,'B')
print(l)
# [('a', 'B'), ('c', 'd'), ('e', 'f')]
To modify existing list just use list assignment, e.g.
>>> l = [('a','b'),('c','d'),('e','f')]
>>> l[0] = ('a','B')
>>> print l
[('a', 'B'), ('c', 'd'), ('e', 'f')]
I would usually prefer to create a new list using comprehension, e.g.
[(key, new_value) if x[0] == key else x for x in l]
But, as the first comment has already mentioned, it sounds like you are trying to make a list do something which you should really be using a dict for instead.
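A rough sketch of what the dict suggestion could look like, assuming the first item of every tuple is unique and hashable. The key 'g' below is a hypothetical extra entry added only to show the "append if missing" case. Plain dicts preserve insertion order from Python 3.7 onward.

```python
pairs = [('a', 'b'), ('c', 'd'), ('e', 'f')]

table = dict(pairs)   # first item of each tuple becomes the key
table['a'] = 'B'      # existing key: the value is replaced in place
table['g'] = 'h'      # hypothetical new key: added at the end

new_list = list(table.items())
print(new_list)
# [('a', 'B'), ('c', 'd'), ('e', 'f'), ('g', 'h')]
```

Both the update and the lookup are constant time on average, whereas every list-based version has to scan the whole list.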
Here's the approach I would use.
>>> l = [('a','b'),('c','d'),('e','f')]
>>> key = 'a'
>>> new_value= 'B'
>>> for pos in (index for index, (k, v) in enumerate(l) if k == key):
...     l[pos] = (key, new_value)
...     break
... else:
...     l.append((key, new_value))
...
>>> l
[('a', 'B'), ('c', 'd'), ('e', 'f')]
This looks an awful lot like an OrderedDict, though; key-value pairs with preserved ordering. You might want to take a look at that and see if it suits your needs
Edit: Replaced try:...except StopIteration: with for:...break...else: since that might look a bit less weird.

find the longest sequence S that is a subsequence of A,B,C string

Give a polynomial time algorithm that takes three strings, A, B and C, as input, and returns the longest sequence S that is a subsequence of A, B, and C.
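Let dp[i, j, k] = length of the longest common subsequence of the prefixes A[1..i], B[1..j], C[1..k], with dp[i, j, k] = 0 whenever i, j or k is 0.
We have:
dp[i, j, k] = dp[i - 1, j - 1, k - 1] + 1                              if A[i] = B[j] = C[k]
dp[i, j, k] = max(dp[i - 1, j, k], dp[i, j - 1, k], dp[i, j, k - 1])   otherwise
Similar to the 2D case, except you have 3 dimensions. The subsequence itself can be recovered by walking back through the table from dp[len A, len B, len C]. Complexity is O(len A * len B * len C).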
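As an illustration of the recurrence above, here is a bottom-up Python sketch for exactly three strings. The function name lcs3 and the backtracking step are my own additions; the table is 0-indexed in code, so dp[i][j][k] refers to the first i, j and k characters.

```python
def lcs3(a, b, c):
    """Return one longest common subsequence of the strings a, b and c."""
    la, lb, lc = len(a), len(b), len(c)
    # dp[i][j][k] = LCS length of a[:i], b[:j], c[:k]; index 0 means empty prefix.
    dp = [[[0] * (lc + 1) for _ in range(lb + 1)] for _ in range(la + 1)]
    for i in range(1, la + 1):
        for j in range(1, lb + 1):
            for k in range(1, lc + 1):
                if a[i - 1] == b[j - 1] == c[k - 1]:
                    dp[i][j][k] = dp[i - 1][j - 1][k - 1] + 1
                else:
                    dp[i][j][k] = max(dp[i - 1][j][k],
                                      dp[i][j - 1][k],
                                      dp[i][j][k - 1])
    # Walk back from the full strings to recover the subsequence itself.
    out = []
    i, j, k = la, lb, lc
    while i and j and k:
        if a[i - 1] == b[j - 1] == c[k - 1]:
            out.append(a[i - 1])
            i, j, k = i - 1, j - 1, k - 1
        elif dp[i - 1][j][k] == dp[i][j][k]:
            i -= 1
        elif dp[i][j - 1][k] == dp[i][j][k]:
            j -= 1
        else:
            k -= 1
    return "".join(reversed(out))


print(lcs3("XMJYAUZ", "MZJAWXU", "AMBCJDEFAGHI"))  # MJA
print(lcs3("abecd", "acbed", "acd"))               # acd
```

Time and memory are both O(len A * len B * len C); when several answers of the same length exist, this sketch returns only one of them.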
Here's a solution in Python for an arbitrary number of sequences. You could use it to test your solution for 2D, 3D cases. It closely follows Wikipedia's algorithm:
#!/usr/bin/env python
import functools
from itertools import starmap
def memoize(func):
    # cache results by argument tuple so each subproblem is solved once
    cache = {}
    @functools.wraps(func)
    def wrapper(*args):
        try: return cache[args]
        except KeyError:
            r = cache[args] = func(*args)
            return r
    return wrapper
@memoize
def lcs(*seqs):
    """Find longest common subsequence of `seqs` sequences.
    Complexity: O(len(seqs)*min(seqs, key=len)*reduce(mul,map(len,seqs)))
    """
    if not all(seqs): return () # at least one sequence is empty
    heads, tails = zip(*[(seq[0], seq[1:]) for seq in seqs])
    if all(heads[0] == h for h in heads): # all seqs start with the same element
        return (heads[0],) + lcs(*tails)
    return max(starmap(lcs, (seqs[:i]+(tails[i],)+seqs[i+1:]
                             for i in range(len(seqs)))), key=len)
Note: without memoization it is an exponential algorithm (wolfram alpha):
$ RSolve[{a[n] == K a[n-1] + K, a[0] = K}, a[n], n]
a(n) = (K^(n + 1) - 1) K/(K - 1)
where K == len(seqs) and n == max(map(len, seqs))
Examples
>>> lcs("agcat", "gac")
('g', 'a')
>>> lcs("banana", "atana")
('a', 'a', 'n', 'a')
>>> lcs("abc", "acb")
('a', 'c')
>>> lcs("XMJYAUZ", "MZJAWXU")
('M', 'J', 'A', 'U')
>>> lcs("XMJYAUZ")
('X', 'M', 'J', 'Y', 'A', 'U', 'Z')
>>> lcs("XMJYAUZ", "MZJAWXU", "AMBCJDEFAGHI")
('M', 'J', 'A')
>>> lcs("XMJYAUZ", "MZJAWXU", "AMBCJDEFAGUHI", "ZYXJAQRU")
('J', 'A', 'U')
>>> lcs() #doctest: +IGNORE_EXCEPTION_DETAIL
Traceback (most recent call last):
...
ValueError:
>>> lcs(*"abecd acbed".split())
('a', 'b', 'e', 'd')
>>> lcs("acd", lcs("abecd", "acbed"))
('a', 'd')
>>> lcs(*"abecd acbed acd".split())
('a', 'c', 'd')
All you have to do is Google "longest subsequence".
This is the top link: http://en.wikipedia.org/wiki/Longest_common_subsequence_problem
If you have any particular problem understanding it then please ask here, preferably with a more specific question.

Resources