Python: What is the right way to modify list elements?

I have this list of tuples:
l = [('a','b'),('c','d'),('e','f')]
And two parameters: a key, and a new value to set. For example,
key = 'a'
new_value = 'B'  # i.e. set the value to 'B' in every tuple whose key is 'a'
I have these two options (both work):
f = lambda t,k,v: t[0] == k and (k,v) or t
new_list = [f(t,key,new_value) for t in l]
print(new_list)
and
new_list = []
for i in range(len(l)):
    elem = l.pop()
    if elem[0] == key:
        new_list.append((key,new_value))
    else:
        new_list.append(elem)
print(new_list)
But I'm new to Python and don't know whether this is right.
Can you help me? Thank you!

Here is one solution involving altering the items in-place.
def replace(list_, key, new_value):
    for i, (current_key, current_value) in enumerate(list_):
        if current_key == key:
            list_[i] = (key, new_value)
Or, to append if it's not in there,
def replace_or_append(list_, key, new_value):
    for i, (current_key, current_value) in enumerate(list_):
        if current_key == key:
            list_[i] = (key, new_value)
            break
    else:
        list_.append((key, new_value))
Usage:
>>> my_list = [('a', 'b'), ('c', 'd')]
>>> replace(my_list, 'a', 'B')
>>> my_list
[('a', 'B'), ('c', 'd')]
If you want to create a new list, a list comprehension is easiest.
>>> my_list = [('a', 'b'), ('c', 'd')]
>>> find_key = 'a'
>>> new_value = 'B'
>>> new_list = [(key, new_value if key == find_key else value) for key, value in my_list]
>>> new_list
[('a', 'B'), ('c', 'd')]
And if you wanted it to append if it wasn't there,
>>> if all(key != find_key for key, value in my_list):
...     new_list.append((find_key, new_value))
(Note also I've changed your variable name from l; l is too easily confused with I and 1 and is best avoided. Thus saith PEP8 and I agree with it.)

To create a new list, a list comprehension would do:
In [102]: [(key,'B' if key=='a' else val) for key,val in l]
Out[102]: [('a', 'B'), ('c', 'd'), ('e', 'f')]
To modify the list in place:
l = [('a','b'),('c','d'),('e','f')]
for i,elt in enumerate(l):
    key,val=elt
    if key=='a':
        l[i]=(key,'B')
print(l)
# [('a', 'B'), ('c', 'd'), ('e', 'f')]

To modify existing list just use list assignment, e.g.
>>> l = [('a','b'),('c','d'),('e','f')]
>>> l[0] = ('a','B')
>>> print l
[('a', 'B'), ('c', 'd'), ('e', 'f')]
I would usually prefer to create a new list using comprehension, e.g.
[(key, new_value) if x[0] == key else x for x in l]
But, as the first comment has already mentioned, it sounds like you are trying to make a list do something which you should really be using a dict for instead.
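The dict suggestion can be sketched as follows. This is a minimal illustration, assuming each key occurs only once and Python 3.7+ (where a plain dict keeps insertion order); the names pairs, table and new_pairs are made up for the example.

```python
pairs = [('a', 'b'), ('c', 'd'), ('e', 'f')]

# Build a dict from the (key, value) pairs; assumes each key occurs once.
table = dict(pairs)

# Replacing a value is a single assignment, with no loop over the list.
table['a'] = 'B'

# Convert back to a list of tuples if the rest of the code needs one.
new_pairs = list(table.items())
print(new_pairs)
```

If the same key can appear in several tuples, this conversion would collapse them into one entry, so the list-based approaches above would be the safer choice in that case.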

Here's the approach I would use.
>>> l = [('a','b'),('c','d'),('e','f')]
>>> key = 'a'
>>> new_value= 'B'
>>> for pos in (index for index, (k, v) in enumerate(l) if k == key):
...     l[pos] = (key, new_value)
...     break
... else:
...     l.append((key, new_value))
...
>>> l
[('a', 'B'), ('c', 'd'), ('e', 'f')]
This looks an awful lot like an OrderedDict, though: key-value pairs with preserved ordering. You might want to take a look at that and see if it suits your needs.
Edit: Replaced try:...except StopIteration: with for:...break...else: since that might look a bit less weird.
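For comparison, here is a rough sketch of the same replace-or-append behaviour built on collections.OrderedDict. The helper name set_pair is hypothetical, and the sketch assumes keys are unique; unlike the loop above it returns a new list instead of modifying the original.

```python
from collections import OrderedDict

def set_pair(pairs, key, new_value):
    """Return a new list of pairs in which key maps to new_value."""
    ordered = OrderedDict(pairs)
    # Assigning to an existing key keeps its position;
    # assigning to a new key adds it at the end.
    ordered[key] = new_value
    return list(ordered.items())

print(set_pair([('a', 'b'), ('c', 'd'), ('e', 'f')], 'a', 'B'))
print(set_pair([('a', 'b'), ('c', 'd'), ('e', 'f')], 'z', 'Z'))
```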

Related

Generating the powerset of a multiset

Suppose I have a multiset
{a,a,a,b,c}
from which I can make the following selections:
{}
{a}
{a,a}
{a,a,a}
{a,a,a,b}
{a,a,a,b,c}
{a,a,a,c}
{a,a,b}
{a,a,b,c}
{a,a,c}
{a,b}
{a,b,c}
{a,c}
{b}
{b,c}
{c}
Notice that the number of selections equals 16. The cardinality of a powerset of a multiset, card(P(M)), is defined on OEIS as
card(P(M)) = prod(mult(x) + 1) for all x in M
where mult(x) is the multiplicity of x in M and prod is the product of the terms. So for our example, this would amount to 4 x 2 x 2 = 16.
Let's say, for example, that the multiplicity of these elements is very high:
m(a) = 21
m(b) = 36
m(c) = 44
Then
card(P(M)) = 22 * 37 * 45 = 36630.
But if we were to treat all those elements as distinct - as a set - the cardinality of the powerset would be
card(P(S)) = 2^(21+36+44) = 2535301200456458802993406410752.
The "standard" solution to this problem is to compute the powerset of the set in which all elements are treated as distinct, and then prune the results to remove the duplicates. That solution has O(2^n) complexity.
Does a general algorithm for generating a powerset of a multiset with complexity on the order of card(P(M)) exist?
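The arithmetic in the question can be checked with a few lines of Python. This is only a sketch of the counting formula, not of the generation algorithm being asked for; the function name multiset_powerset_size is made up for the example.

```python
from collections import Counter

def multiset_powerset_size(items):
    """card(P(M)): the product of (multiplicity + 1) over distinct elements."""
    size = 1
    for multiplicity in Counter(items).values():
        size *= multiplicity + 1
    return size

print(multiset_powerset_size("aaabc"))                          # 16
print(multiset_powerset_size("a" * 21 + "b" * 36 + "c" * 44))   # 36630
print(2 ** (21 + 36 + 44))  # size when all 101 elements are treated as distinct
```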

powerset recipe with itertools
What you are asking is usually called the powerset and is available as an itertools recipe, as well as a function in the module more_itertools. See the documentation:
itertools recipe;
more_itertools.powerset.
multiset = ['a', 'a', 'a', 'b', 'c']
#
# USING ITERTOOLS
#
import itertools
def powerset(iterable):
    "powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)"
    s = list(iterable)
    return itertools.chain.from_iterable(itertools.combinations(s, r) for r in range(len(s)+1))
print(list(powerset(multiset)))
# [(), ('a',), ('a',), ('a',), ('b',), ('c',), ('a', 'a'), ('a', 'a'), ('a', 'b'), ('a', 'c'), ('a', 'a'), ('a', 'b'), ('a', 'c'), ('a', 'b'), ('a', 'c'), ('b', 'c'), ('a', 'a', 'a'), ('a', 'a', 'b'), ('a', 'a', 'c'), ('a', 'a', 'b'), ('a', 'a', 'c'), ('a', 'b', 'c'), ('a', 'a', 'b'), ('a', 'a', 'c'), ('a', 'b', 'c'), ('a', 'b', 'c'), ('a', 'a', 'a', 'b'), ('a', 'a', 'a', 'c'), ('a', 'a', 'b', 'c'), ('a', 'a', 'b', 'c'), ('a', 'a', 'b', 'c'), ('a', 'a', 'a', 'b', 'c')]
#
# USING MORE_ITERTOOLS
#
import more_itertools
print(list(more_itertools.powerset(multiset)))
# [(), ('a',), ('a',), ('a',), ('b',), ('c',), ('a', 'a'), ('a', 'a'), ('a', 'b'), ('a', 'c'), ('a', 'a'), ('a', 'b'), ('a', 'c'), ('a', 'b'), ('a', 'c'), ('b', 'c'), ('a', 'a', 'a'), ('a', 'a', 'b'), ('a', 'a', 'c'), ('a', 'a', 'b'), ('a', 'a', 'c'), ('a', 'b', 'c'), ('a', 'a', 'b'), ('a', 'a', 'c'), ('a', 'b', 'c'), ('a', 'b', 'c'), ('a', 'a', 'a', 'b'), ('a', 'a', 'a', 'c'), ('a', 'a', 'b', 'c'), ('a', 'a', 'b', 'c'), ('a', 'a', 'b', 'c'), ('a', 'a', 'a', 'b', 'c')]
Powerset of a collections.Counter object
In Python, multisets are usually represented with a collections.Counter rather than with a list. The class collections.Counter is a subclass of dict; it implements dictionaries that map elements to counts, as well as several useful methods such as building a Counter by counting occurrences in a sequence.
Taking the powerset of a Counter is the topic of another question on stackoverflow:
How to generate all the subsets of a Counter?
Although I am not aware of an already-implemented method doing this in standard modules, the answer to that question presents one solution using itertools:
import collections
import itertools
multiset = collections.Counter(['a', 'a', 'a', 'b', 'c'])
# Counter({'a': 3, 'b': 1, 'c': 1})
def powerset(multiset):
    range_items = [[(x, z) for z in range(y + 1)] for x,y in multiset.items()]
    products = itertools.product(*range_items)
    return [{k: v for k, v in pairs if v > 0} for pairs in products]
print(powerset(multiset))
# [{}, {'c': 1}, {'b': 1}, {'b': 1, 'c': 1}, {'a': 1}, {'a': 1, 'c': 1}, {'a': 1, 'b': 1}, {'a': 1, 'b': 1, 'c': 1}, {'a': 2}, {'a': 2, 'c': 1}, {'a': 2, 'b': 1}, {'a': 2, 'b': 1, 'c': 1}, {'a': 3}, {'a': 3, 'c': 1}, {'a': 3, 'b': 1}, {'a': 3, 'b': 1, 'c': 1}]

This will give you all the combinations of lst as tuples. Hope this answers your question.
from itertools import combinations
lst = ['a', 'a', 'a', 'b', 'c']
combs = set()
for i in range(len(lst)+1):
    els = [tuple(x) for x in combinations(lst, i)]
    for item in tuple(els):
        combs.add(item)
print(combs)

The best way I like to think of it is that you start out with the empty set and then for each character you are making a choice of either adding it to the current existing sets or not adding it. Since you have 2 choices at each step, the number of total elements in a powerset is 2^n. Implementing this can be done through this Java code:
public List<List<Integer>> subsets(int[] nums) {
    // list holds all of the subsets found so far
    List<List<Integer>> list = new ArrayList<>();
    // add the empty set to start with
    list.add(new ArrayList<Integer>());
    for (int i = 0; i < nums.length; i++) {
        // find current list size
        int length = list.size();
        // Loop through and add the current element to all existing subsets.
        // This represents making the choice of adding the element.
        for (int j = 0; j < length; j++) {
            // make a copy of the current subset
            ArrayList<Integer> temp = new ArrayList<>(list.get(j));
            temp.add(nums[i]);
            list.add(temp);
        }
    }
    return list;
}
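The code above treats every element as distinct, so it produces 2^n lists for a multiset. One possible adaptation, sketched here in Python as an assumption rather than as part of the original answer, is to make the choice once per distinct element: extend every existing subset with 1 up to mult(x) copies of it. The function name multiset_subsets is hypothetical.

```python
from collections import Counter

def multiset_subsets(items):
    """Build each sub-multiset exactly once, as a list."""
    subsets = [[]]  # start with the empty selection
    for element, multiplicity in Counter(items).items():
        existing = list(subsets)  # snapshot of subsets built so far
        for base in existing:
            # Choose how many copies of this element to add (zero is `base` itself).
            for copies in range(1, multiplicity + 1):
                subsets.append(base + [element] * copies)
    return subsets

result = multiset_subsets("aaabc")
print(len(result))  # 16
```

Each distinct element multiplies the number of subsets by mult(x) + 1, so the output size matches card(P(M)) and no pruning step is needed.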

Generate a set of tuples from one tuple in PIG

I couldn't find any solution how to generate in Pig a set of tuples from one tuple according to the rule:
Input:
((1,2,3),(a,b,c),(aaa,bbb,ccc))
Output:
(1,a,aaa)
(2,b,bbb)
(3,c,ccc)
Suppose TOBAG and FLATTEN should be applied, but it seems too tricky.

Use the zip builtin function and argument unpacking ("star" args):
>>> x = ((1,2,3),('a','b','c'),('aaa','bbb','ccc'))
>>> tuple(zip(*x))
((1, 'a', 'aaa'), (2, 'b', 'bbb'), (3, 'c', 'ccc'))
>>> for y in zip(*x):
...     print(y)
...
(1, 'a', 'aaa')
(2, 'b', 'bbb')
(3, 'c', 'ccc')

[tuple(original[i] for original in originals) for i in range(len(originals[0]))]
will give you the list of output tuples if your original tuple of tuples is called originals.

extending permutation algorithm (including duplicates) [duplicate]

This question already has answers here:
Generate all strings under length N in C
(2 answers)
Closed 8 years ago.
I'm looking for an algorithm that gives all possible combinations of letters.
Let me explain better. If I have
base-letters = ["a","b","c"];
depth = 2; //max chars allowed
then the expected result would be these 12 elements (3^1 + 3^2 = 12):
["a", "b", "c", "aa","ab","ac","ba","bb","bc","ca", "cb", "cc"]
If I had a depth value of 3, I would expect (3^1) + (3^2) + (3^3) = 39 elements:
["a", "b", ... , "aa", "ab", ... , "aaa", "aab", ..., "aba", ...]
Now, if I understood correctly, a permutation algorithm is similar, but it doesn't allow repeated letters (like "aa", "bb", "aab", "aba") or a variable depth value (which can differ from the length of base-letters).

You can define a recursive function F(s) which takes a string s of length less than or equal to your maximum length, and start by calling F(s) with s equal to the empty string. The function F computes the length of the string and if it is equal to the maximum length, it prints the string s and returns. If the length of the string is less than the maximum, then F(s) prints out the string s and then iterates over all possible letters in the alphabet and for each letter, it adds the letter to the end of string s to produce a string s' of length one more, and then calls F(s'). This has very low memory usage and is essentially the fastest method possible, at least in asymptotic terms.
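A minimal Python sketch of the recursive function F described above. It is written as a generator instead of printing, and it skips the empty string so that the output matches the question's expected result; both choices, and the name strings_up_to_length, are assumptions made for this example.

```python
def strings_up_to_length(alphabet, max_length, prefix=""):
    """Yield every non-empty string over alphabet with at most max_length letters."""
    if prefix:
        yield prefix  # F "prints" the current string
    if len(prefix) < max_length:
        for letter in alphabet:
            # Append one letter and recurse on the longer string.
            yield from strings_up_to_length(alphabet, max_length, prefix + letter)

print(list(strings_up_to_length("abc", 2)))
# ['a', 'aa', 'ab', 'ac', 'b', 'ba', 'bb', 'bc', 'c', 'ca', 'cb', 'cc']
```

The recursion depth never exceeds max_length, and only the current prefix is held at each level, which is where the low memory usage comes from.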

In Python, use itertools's permutations function (a code recipe is included if you need to translate the code to your native language)
>>> import itertools
>>> base_elements = ['a', 'b', 'cow']
>>> max_depth = 2
>>> result = [''.join(element) for element in itertools.chain.from_iterable([itertools.permutations(base_elements, depth) for depth in range(1, max_depth+1)])]
>>> print(result)
['a', 'b', 'cow', 'ab', 'acow', 'ba', 'bcow', 'cowa', 'cowb']
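Note that permutations never reuses an element, so strings such as 'aa' do not appear in this output. If you want only unique combinations regardless of order, then rather than joining each output element into a string, turn it into a frozenset and collect the results in a set, which removes the duplicates.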
>>> result = frozenset([frozenset(element)
...     for element in itertools.chain.from_iterable(
...         [itertools.permutations(base_elements, depth)
...          for depth in range(1, max_depth+1)]
...     )])
Or more cleanly,
def permutations(base_elements, max_depth):
    result = set()
    for depth in range(1, max_depth+1):
        for element in itertools.permutations(base_elements, depth):
            result.add(frozenset(element))
    return result
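Since the question asks for strings in which letters may repeat, itertools.product with the repeat argument may be a closer fit than permutations. The following is a sketch under that reading of the question; the function name words_up_to is made up for the example.

```python
import itertools

def words_up_to(base_elements, max_depth):
    """All strings of length 1..max_depth, with repeated letters allowed."""
    return [
        ''.join(letters)
        for depth in range(1, max_depth + 1)
        # product(..., repeat=depth) picks one element per position.
        for letters in itertools.product(base_elements, repeat=depth)
    ]

print(words_up_to(['a', 'b', 'c'], 2))
# ['a', 'b', 'c', 'aa', 'ab', 'ac', 'ba', 'bb', 'bc', 'ca', 'cb', 'cc']
```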

It seems this code will give you what you need:
def all_strs(iterable, depth):
    results = []
    if depth==1:
        for item in iterable:
            results.append(str(item))
        return results
    for item in iterable:
        for s in all_strs(iterable, depth-1):
            results.append(str(item) + s)
    return results
if __name__ == "__main__":
    print(all_strs('abc', 2))
    print(all_strs([1, 2, 3], 3))
    s = 'abc'
    results = []
    for i in range(len(s)):
        results += all_strs(s, i+1)
    print(results)
the output is:
['aa', 'ab', 'ac', 'ba', 'bb', 'bc', 'ca', 'cb', 'cc']
['111', '112', '113', '121', '122', '123', '131', '132', '133', '211', '212', '213', '221', '222', '223', '231', '232', '233', '311', '312', '313', '321', '322', '323', '331', '332', '333']
['a', 'b', 'c', 'aa', 'ab', 'ac', 'ba', 'bb', 'bc', 'ca', 'cb', 'cc', 'aaa', 'aab', 'aac', 'aba', 'abb', 'abc', 'aca', 'acb', 'acc', 'baa', 'bab', 'bac', 'bba', 'bbb', 'bbc', 'bca', 'bcb', 'bcc', 'caa', 'cab', 'cac', 'cba', 'cbb', 'cbc', 'cca', 'ccb', 'ccc']

Algorithm to generate permutations of a list of strings and their substrings

This algorithm has been escaping me for some time now. Let's say I'm given the string "cccaatt". I'm trying to generate all the possible variations of each substring of repeated letters. For example, "cccaatt" as an input would return:
cat,
catt,
caat,
caatt,
ccat,
ccatt,
ccaat,
ccaatt,
cccat,
cccatt,
cccaat,
cccaatt
The order of the results does not matter, so long as it returns all of them. Generally, the input is a string, consisting of g groups of repeated letters, each group k_n letters long.
My intuition is that this is a recursive algorithm, but the exact structure of it has been difficult to understand.

If you store the alphabet and maximum occurrences of each letter (as awesomely mentioned in the comments) you can do this:
function variations(letter_type, current string) {
    if (letter_type is in the alphabet) {
        while (string has fewer than the max amount of that letter) {
            add one of that letter to current string
            variations(next letter, current string)
        }
    } else {
        print current string // since there are no more letters to add
    }
}
In Java:
public class Variations {
    static String[] alphabet = {"c","a","t"};
    static int[] maximums = {3, 2, 2};
    public static void main(String[] args) {
        variations(0, "");
    }
    public static void variations(int letter_type, String curr) {
        if (letter_type < alphabet.length) {
            for (int i = 1; i <= maximums[letter_type]; i++) {
                curr += alphabet[letter_type];
                variations(letter_type+1, curr);
            }
        } else {
            System.out.println(curr);
        }
    }
}

Decompose the string into a list of letters and their repeat counts, i.e. "cccaatt" => [(c,3), (a,2), (t,2)]. Then the problem can be defined recursively.
Let xs = [(a_1, n_1), (a_2, n_2), (a_3, n_3), ... (a_k, n_k)]
define Perm(xs):
    if len(xs) == 1:
        return all length variations of xs[0]
    else:
        return every sequence in Perm(xs[:-1]) appended with one or more from xs[-1]
A Python example:
> perm("cccaatt")
> ['cat', 'ccat', 'cccat', 'caat', 'ccaat', 'cccaat', 'catt', 'ccatt', 'cccatt', 'caatt', 'ccaatt', 'cccaatt']
Code attached
def perm(xs):
    if not xs:
        return []
    # group them into the correct format, probably should have used groupby + zip
    l = [(xs[0],1)]
    for x in xs[1:]:
        last, num = l[-1]
        if last == x:
            l[-1] = (last, num+1)
        else:
            l.append((x, 1))
    # print(l)
    print(recurse(l))
# this is where the real work is done.
def recurse(xs):
    if len(xs) == 1:
        return [ xs[0][0] * x for x in range(1, xs[0][1] + 1) ]
    prev = recurse(xs[:-1])
    char, num = xs[-1]
    return [ y + x * char for x in range(1,num + 1) for y in prev ]

The Python itertools module has powerful tools to group and then to iterate on members of groups leading to the following program.
I have shown some intermediate results and used the pprint module to prettyprint the answer:
Python 2.7.3 (default, Aug 1 2012, 05:16:07)
[GCC 4.6.3] on linux2
Type "copyright", "credits" or "license()" for more information.
>>> import itertools
>>> instring = "cccaatt"
>>> [(x[0], list(x[1])) for x in itertools.groupby(instring)]
[('c', ['c', 'c', 'c']), ('a', ['a', 'a']), ('t', ['t', 't'])]
>>> xx = [list(x[0]*n for n in range(1, len(list(x[1]))+1)) for x in itertools.groupby(instring)]
>>> xx
[['c', 'cc', 'ccc'], ['a', 'aa'], ['t', 'tt']]
>>> list(itertools.product(*xx))
[('c', 'a', 't'), ('c', 'a', 'tt'), ('c', 'aa', 't'), ('c', 'aa', 'tt'), ('cc', 'a', 't'), ('cc', 'a', 'tt'), ('cc', 'aa', 't'), ('cc', 'aa', 'tt'), ('ccc', 'a', 't'), ('ccc', 'a', 'tt'), ('ccc', 'aa', 't'), ('ccc', 'aa', 'tt')]
>>> from pprint import pprint as pp
>>> pp(list(itertools.product(*xx)))
[('c', 'a', 't'),
('c', 'a', 'tt'),
('c', 'aa', 't'),
('c', 'aa', 'tt'),
('cc', 'a', 't'),
('cc', 'a', 'tt'),
('cc', 'aa', 't'),
('cc', 'aa', 'tt'),
('ccc', 'a', 't'),
('ccc', 'a', 'tt'),
('ccc', 'aa', 't'),
('ccc', 'aa', 'tt')]
>>>
Or as a function:
>>> def stringexpand(instring):
    xx = [list(x[0]*n for n in range(1, len(list(x[1]))+1)) for x in itertools.groupby(instring)]
    return list(itertools.product(*xx))
>>> pp(stringexpand("cccaatt"))
[('c', 'a', 't'),
('c', 'a', 'tt'),
('c', 'aa', 't'),
('c', 'aa', 'tt'),
('cc', 'a', 't'),
('cc', 'a', 'tt'),
('cc', 'aa', 't'),
('cc', 'aa', 'tt'),
('ccc', 'a', 't'),
('ccc', 'a', 'tt'),
('ccc', 'aa', 't'),
('ccc', 'aa', 'tt')]
>>>
You seem to need the strings joined from their parts. This can be done in this slight mod:
def stringexpand(instring):
    xx = [list(x[0]*n for n in range(1, len(list(x[1]))+1)) for x in itertools.groupby(instring)]
    return [''.join(parts) for parts in itertools.product(*xx)]
Which returns:
['cat',
'catt',
'caat',
'caatt',
'ccat',
'ccatt',
'ccaat',
'ccaatt',
'cccat',
'cccatt',
'cccaat',
'cccaatt']

find the longest sequence S that is a subsequence of A,B,C string

Give a polynomial time algorithm that takes three strings, A, B and C, as input, and returns the longest sequence S that is a subsequence of A, B, and C.

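Let dp[i, j, k] = length of the longest common subsequence of the prefixes A[1..i], B[1..j], C[1..k], with dp[i, j, k] = 0 whenever i, j or k is 0.
We have:
dp[i, j, k] = dp[i - 1, j - 1, k - 1] + 1 if A[i] = B[j] = C[k]
              max(dp[i - 1, j, k], dp[i, j - 1, k], dp[i, j, k - 1]) otherwise
This is the same as the 2D case, except that there are three dimensions. The complexity is O(len A * len B * len C). The subsequence itself can be recovered by walking back through the table from dp[len A, len B, len C].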
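The recurrence can be sketched in Python as follows. This is one possible rendering, using 0-based strings with a 1-based table and a backtracking pass to recover the subsequence; the function name lcs3 is made up for the example.

```python
def lcs3(a, b, c):
    """Longest common subsequence of three strings via a 3D table."""
    la, lb, lc = len(a), len(b), len(c)
    # dp[i][j][k] = LCS length of a[:i], b[:j], c[:k]; index 0 means empty prefix.
    dp = [[[0] * (lc + 1) for _ in range(lb + 1)] for _ in range(la + 1)]
    for i in range(1, la + 1):
        for j in range(1, lb + 1):
            for k in range(1, lc + 1):
                if a[i - 1] == b[j - 1] == c[k - 1]:
                    dp[i][j][k] = dp[i - 1][j - 1][k - 1] + 1
                else:
                    dp[i][j][k] = max(dp[i - 1][j][k],
                                      dp[i][j - 1][k],
                                      dp[i][j][k - 1])
    # Walk back from the full prefixes to rebuild one optimal subsequence.
    out = []
    i, j, k = la, lb, lc
    while i > 0 and j > 0 and k > 0:
        if a[i - 1] == b[j - 1] == c[k - 1]:
            out.append(a[i - 1])
            i, j, k = i - 1, j - 1, k - 1
        elif dp[i - 1][j][k] == dp[i][j][k]:
            i -= 1
        elif dp[i][j - 1][k] == dp[i][j][k]:
            j -= 1
        else:
            k -= 1
    return ''.join(reversed(out))

print(lcs3("XMJYAUZ", "MZJAWXU", "AMBCJDEFAGHI"))  # MJA
```

The table has (len A + 1) * (len B + 1) * (len C + 1) entries, so memory grows as fast as time does; for long inputs that is likely to be the first limit reached.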

Here's a solution in Python for an arbitrary number of sequences. You could use it to test your solution for 2D, 3D cases. It closely follows Wikipedia's algorithm:
#!/usr/bin/env python
import functools
from itertools import starmap
def memoize(func):
    cache = {}
    @functools.wraps(func)
    def wrapper(*args):
        try: return cache[args]
        except KeyError:
            r = cache[args] = func(*args)
            return r
    return wrapper
@memoize
def lcs(*seqs):
    """Find longest common subsequence of `seqs` sequences.
    Complexity: O(len(seqs)*min(seqs, key=len)*reduce(mul,map(len,seqs)))
    """
    if not all(seqs): return () # at least one sequence is empty
    heads, tails = zip(*[(seq[0], seq[1:]) for seq in seqs])
    if all(heads[0] == h for h in heads): # all seqs start with the same element
        return (heads[0],) + lcs(*tails)
    return max(starmap(lcs, (seqs[:i]+(tails[i],)+seqs[i+1:]
                             for i in range(len(seqs)))), key=len)
Note: without memoization it is an exponential algorithm (wolfram alpha):
$ RSolve[{a[n] == K a[n-1] + K, a[0] = K}, a[n], n]
a(n) = (K^(n + 1) - 1) K/(K - 1)
where K == len(seqs) and n == max(map(len, seqs))
Examples
>>> lcs("agcat", "gac")
('g', 'a')
>>> lcs("banana", "atana")
('a', 'a', 'n', 'a')
>>> lcs("abc", "acb")
('a', 'c')
>>> lcs("XMJYAUZ", "MZJAWXU")
('M', 'J', 'A', 'U')
>>> lcs("XMJYAUZ")
('X', 'M', 'J', 'Y', 'A', 'U', 'Z')
>>> lcs("XMJYAUZ", "MZJAWXU", "AMBCJDEFAGHI")
('M', 'J', 'A')
>>> lcs("XMJYAUZ", "MZJAWXU", "AMBCJDEFAGUHI", "ZYXJAQRU")
('J', 'A', 'U')
>>> lcs() #doctest: +IGNORE_EXCEPTION_DETAIL
Traceback (most recent call last):
...
ValueError:
>>> lcs(*"abecd acbed".split())
('a', 'b', 'e', 'd')
>>> lcs("acd", lcs("abecd", "acbed"))
('a', 'd')
>>> lcs(*"abecd acbed acd".split())
('a', 'c', 'd')

All you have to do is Google "longest subsequence".
This is the top link: http://en.wikipedia.org/wiki/Longest_common_subsequence_problem
If you have any particular problem understanding it then please ask here, preferably with a more specific question.
