Default value for optional argument in Python

I have the following method:
def get_data(replace_nan=False):
    if replace_nan is not False:
        data[numpy.isnan(data)] = replace_nan
        return data
    else:
        return data[~numpy.isnan(data)]
So, if replace_nan is False, we return some data array but remove NaNs, and if it's anything else, we replace NaNs with the argument.
Problem is, I may want to replace NaN with False. Or with anything else, for that matter. What's the most Pythonic way to do so? This:
def get_data(**kwargs):
    if "replace_nan" in kwargs:
        ...
works, but is semantically ugly, because we're really only interested in one keyword argument, replace_nan. Any suggestions on how to handle this case?

Usually people use None as the default value and then check for is not None.
If you need to allow None, too, use a dummy object:
__default = object()

def get_data(replace_nan=__default):
    if replace_nan is __default:
        ...
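Putting it together, a minimal self-contained sketch of the sentinel approach applied to the original function (data is passed in explicitly here, since the original snippet relies on it coming from an enclosing scope):

import numpy

__default = object()  # module-private sentinel

def get_data(data, replace_nan=__default):
    if replace_nan is __default:
        # No replacement requested: drop the NaNs.
        return data[~numpy.isnan(data)]
    # Anything else, including False or 0, is a valid replacement.
    data = data.copy()  # avoid mutating the caller's array
    data[numpy.isnan(data)] = replace_nan
    return data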

numpy coerces False inside an array to 0:
>>> np.array([False, True, 2, 3])
array([0, 1, 2, 3])
So this is probably not what you want to happen.
def get_data(replace_nan=False):
    if replace_nan:
        return np.where(np.isnan(data), replace_nan, data)
    else:
        return data[~np.isnan(data)]
The numpy.where function builds a new array, taking replace_nan wherever the entries are NaN and keeping the original entries everywhere else.
From the manual page:
numpy.where(condition[, x, y])
Return elements, either from x or y, depending on condition.
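For example (a quick illustration on a small hand-made array):

>>> import numpy as np
>>> data = np.array([1.0, np.nan, 3.0])
>>> np.where(np.isnan(data), -1.0, data)
array([ 1., -1.,  3.])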

I wanted to put this as a comment below ThiefMaster's answer, but no formatting is allowed in comments, so ...:
If you are concerned about cluttering your namespace you can—with some tricks—del the variable after defining the function.
__default = object()

def get_data(replace_nan=__default, __default=__default):
    if replace_nan is __default:
        ...

del __default
Or:
__default = object()

def get_data(replace_nan=__default):
    if replace_nan is get_data.default_replace_nan:
        ...

get_data.default_replace_nan = __default
del __default

Another way to avoid the cluttering of ThiefMaster's approach is this:
def get_data(replace_nan=object()):
    if replace_nan is get_data.func_defaults[0]:
        ...
But it relies on Python internals, which might not be as portable (PyPy/Stackless/future versions/…).
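For what it's worth, func_defaults is the Python 2 spelling; under Python 3 the same trick would use the __defaults__ attribute:

def get_data(replace_nan=object()):
    if replace_nan is get_data.__defaults__[0]:
        ...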

Related

Returning all the subsets. (Issue with recursion)

class Solution(object):
    lista = []

    def subsets(self, nums):
        """
        :type nums: List[int]
        :rtype: List[List[int]]
        """
        subset = []
        i = 0
        self.helper(nums, subset, i)
        return self.lista

    def helper(self, nums, subset, i):
        if i == len(nums):
            print(self.lista)
            self.lista.append(subset)
            print(subset)
            return
        subset.append(nums[i])
        self.helper(nums, subset, i + 1)
        subset.pop()
        self.helper(nums, subset, i + 1)
So the question is https://leetcode.com/problems/subsets/
Can someone help me understand where I am going wrong? My code only returns empty lists. I understand that the last call of the recursion returns the empty set, but my lista is declared globally (as a class attribute), so whenever I append something in the base case of the recursive function, shouldn't it append to the existing shared list and work properly? Any help is appreciated.
Your logic for generating subsets is fine. The main issue here is that self.lista.append(subset) inserts a reference to subset into lista. You can read more about object references in relation to lists here.
This means that any changes you make to subset will persist in all references of subset in lista. In this case, the final state of subset will be an empty list, hence lista contains a bunch of empty lists [].
One way to fix this would be to make a copy of subset on insertion, i.e. change
self.lista.append(subset)
to
self.lista.append(subset.copy()) (if you're using Python >= 3.3, otherwise you can slice it or use copy).
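A quick demonstration of the aliasing problem in isolation:

>>> subset = []
>>> lista = []
>>> lista.append(subset)  # stores a reference, not a snapshot
>>> subset.append(1)
>>> lista
[[1]]
>>> subset.pop()
1
>>> lista
[[]]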
For this recursion, it can be simplified as follows (you can compare and see the difference):
from typing import List

class Solution:
    def subsets(self, nums: List[int]) -> List[List[int]]:
        if not nums:
            return [[]]
        without = self.subsets(nums[1:])
        return without + [s + [nums[0]] for s in without]
Or a simple, straightforward iterative version can do the work too:
def subsets(self, nums):
    result = [[]]
    for n in nums:
        result += [x + [n] for x in result]
    return result
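For example, calling the iterative version (assuming it is defined on the Solution class; the subset ordering differs slightly between the two implementations):

>>> Solution().subsets([1, 2, 3])
[[], [1], [2], [1, 2], [3], [1, 3], [2, 3], [1, 2, 3]]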

Why are variables set inside Enum.each not saved?

I'm trying to assign a value to a variable inside a function in Enum.each, but at the end of the loop the variable is empty, and I don't understand exactly why.
Code:
base = "master"
candidates = ["stream", "pigeons", "maters"]
return = []

Enum.each(candidates, fn candidate ->
  cond do
    String.length(base) == String.length(candidate) ->
      return = return ++ [candidate]
    true ->
      true
  end
end)

IO.inspect return
In this example, return is expected to be ["stream", "maters"], but instead it is only an empty list: []
My question is why this happens.
When dealing with languages like Elixir, it is better to think in terms of "values" and "names" instead of "variables".
The reason you cannot do what you want is that Elixir has "lexical scoping".
When you assign to a "variable", you create a new value in the inner scope. You never change the "value" of a "name" defined in the outer scope.
(you probably can get what you want with Enum.filter/2, but I'm guessing this is just an illustrative example)
EDIT:
As of today, Elixir will allow you to write something like this:
if condition_that_evals_to_false do
  x = 1
else
  x = 2
end
IO.inspect x # => 2
But this will be deprecated in Elixir 1.3
Any reason why you don't just filter?
Anyway, it seems like you're trying to mutate the value of return, which is not possible in Elixir.
base = "master"
candidates = ["stream", "pigeon", "maters"]

result = Enum.filter(candidates, fn candidate ->
  String.length(candidate) == String.length(base)
end)

IO.inspect result
Edit: I'd also like to add that, based on your logic, all of the candidates in this example would be returned.
Not sure, since I've never worked with the language, but a couple things spring to mind:
String.length(base) == String.length(candidate) can be equivalent to true, which is already a pattern in your set.
It could also be a scope issue with the return variable. It could be that the local return is hiding the global return. You could check this by outputting return every iteration. Each iteration the return should contain a single entry.
This is a bug. From Elixir's documentation:
Note: due to a bug in the 0.12.x series, cond's conditions actually leak bindings to the surrounding scope. This should be fixed in 0.13.1.
You should use filtering, as @Christopher Yammine suggested.

Python, fastest way to iterate over regular expressions but stop on first match

I have a function that returns True if a string matches at least one regular expression in a list and False otherwise. The function is called often enough that performance is an issue.
When running it through cProfile, the function is spending about 65% of its time doing matches and 35% of its time iterating over the list.
I would think there would be a way to use map() or something but I can't think of a way to have it stop iterating after it finds a match.
Is there a way to make the function faster while still having it return upon finding the first match?
def matches_pattern(str, patterns):
    for pattern in patterns:
        if pattern.match(str):
            return True
    return False
The first thing that comes to mind is pushing the loop to the C side by using a generator expression:
def matches_pattern(s, patterns):
    return any(p.match(s) for p in patterns)
Probably you don't even need a separate function for that.
Another thing you should try out is to build a single, composite regex using the | alternation operator, so that the engine has a chance to optimize it for you. You can also create the regex dynamically from a list of string patterns, if this is necessary:
def matches_pattern(s, patterns):
    return re.match('|'.join('(?:%s)' % p for p in patterns), s)
Of course you need to have your regexes in string form for that to work. Just profile both of these and check which one is faster :)
You might also want to have a look at a general tip for debugging regular expressions in Python. This can also help to find opportunities to optimize.
UPDATE: I was curious and wrote a little benchmark:
import timeit

setup = """
import re
patterns = [".*abc", "123.*", "ab.*", "foo.*bar", "11010.*", "1[^o]*"]*10
strings = ["asdabc", "123awd2", "abasdae23", "fooasdabar", "111", "11010100101", "xxxx", "eeeeee", "dddddddddddddd", "ffffff"]*10
compiled_patterns = list(map(re.compile, patterns))

def matches_pattern(str, patterns):
    for pattern in patterns:
        if pattern.match(str):
            return True
    return False

def test0():
    for s in strings:
        matches_pattern(s, compiled_patterns)

def test1():
    for s in strings:
        any(p.match(s) for p in compiled_patterns)

def test2():
    for s in strings:
        re.match('|'.join('(?:%s)' % p for p in patterns), s)

def test3():
    r = re.compile('|'.join('(?:%s)' % p for p in patterns))
    for s in strings:
        r.match(s)
"""

print(timeit.timeit("test0()", setup=setup, number=1000))
print(timeit.timeit("test1()", setup=setup, number=1000))
print(timeit.timeit("test2()", setup=setup, number=1000))
print(timeit.timeit("test3()", setup=setup, number=1000))
The output on my machine:
1.4120500087738037
1.662621021270752
4.729579925537109
0.1489570140838623
So any doesn't seem to be faster than your original approach. Building up a regex dynamically also isn't really fast. But if you can manage to build up a regex upfront and use it several times, this might result in better performance. You can also adapt this benchmark to test some other options :)
The way to do this fastest is to combine all the regexes into one with "|" between them, then make one regex match call. Also, you'll want to compile it once to be sure you're avoiding repeated regex compilation.
For example:
def matches_pattern(s, pats):
    pat = "|".join("(%s)" % p for p in pats)
    return bool(re.match(pat, s))
This is for pats as strings, not compiled patterns. If you really only have compiled regexes, then:
def matches_pattern(s, pats):
    pat = "|".join("(%s)" % p.pattern for p in pats)
    return bool(re.match(pat, s))
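Since the whole point is to avoid repeated compilation, here is a small sketch (names are illustrative, using a non-capturing group as in the earlier answer) that builds the combined regex once and reuses it:

import re

def make_matcher(pats):
    # Compile the alternation once; the returned closure only matches.
    combined = re.compile("|".join("(?:%s)" % p for p in pats))
    return lambda s: combined.match(s) is not None

matches_pattern = make_matcher([r"ab.*", r"123.*"])
matches_pattern("abc")  # True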
Adding to the excellent answers above, make sure you compare the output of re.match with None:
>>> timeit('None is None')
0.03676295280456543
>>> timeit('bool(None)')
0.1125330924987793
>>> timeit('re.match("a","abc") is None', 'import re')
1.0200879573822021
>>> timeit('bool(re.match("a","abc"))', 'import re')
1.134294033050537
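Applied to the original function, that just means making the None check explicit (a micro-optimization at best):

def matches_pattern(s, patterns):
    for pattern in patterns:
        if pattern.match(s) is not None:  # explicit None check
            return True
    return False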
It's not exactly what the OP asked, but this worked well for me as an alternative to long iterative matching.
Here is some example data and code:
import random
import time

mylonglist = [''.join([random.choice("ABCDE") for i in range(50)]) for j in range(3000)]

# check uniqueness
print("uniqueness:")
print(len(mylonglist) == len(set(mylonglist)))

# subsample 1000
subsamp = [mylonglist[x] for x in random.sample(range(3000), 1000)]

# join long string for matching
string = " ".join(subsamp)

# test function 1
def by_string_match(string, mylonglist):
    counter = 0
    t1 = time.time()
    for i in mylonglist:
        if i in string:
            counter += 1
    t2 = time.time()
    print("It took {} seconds to find {} items".format(t2 - t1, counter))

# test function 2
def by_iterative_match(subsamp, mylonglist):
    counter = 0
    t1 = time.time()
    for i in mylonglist:
        if any([i in s for s in subsamp]):
            counter += 1
    t2 = time.time()
    print("It took {} seconds to find {} items".format(t2 - t1, counter))

# test 1:
print("string match:")
by_string_match(string, mylonglist)

# test 2:
print("iterative match:")
by_iterative_match(subsamp, mylonglist)

How to copy a dict and modify it in one line of code

Very often I need to create dicts that differ one from another by an item or two. Here is what I usually do:
setup1 = {'param1': val1,
          'param2': val2,
          'param3': val3,
          'param4': val4,
          'paramN': valN}
setup2 = copy.deepcopy(dict(setup1))
setup2.update({'param1': val10,
               'param2': val20})
The fact that there is a point in the program at which setup2 is an identical copy of setup1 makes me nervous, as I'm afraid that at some point in the program's life the two lines might get separated, which is a slippery slope towards too many bugs.
Ideally I would like to be able to complete this action in a single line of code (something like this):
setup2 = dict(setup1).merge({'param1': val10,
                             'param2': val20})
Of course, I can use a semicolon to squeeze two commands into one physical line, but this looks pretty ugly to me. Are there other options?
The simplest way in my opinion is something like this:
new_dict = {**old_dict, 'changed_val': value, **other_new_vals_as_dict}
You could use keyword arguments in the dictionary constructor for your updates
new = dict(old, a=1, b=2, c=3)
# You can also unpack your modifications
new = dict(old, **mods)
This is equivalent to:
new = old.copy()
new.update({"a": 1, "b": 2, "c": 3})
Source
Notes
dict.copy() creates a shallow copy.
All keys need to be strings since they are passed as keyword arguments.
setup2 = dict(setup1.items() + {'param1': val10, 'param2': val20}.items())
This way, if new keys do not exist in setup1 they get added, otherwise they replace the old key/value pairs. (Note that this relies on items() returning lists, so it only works in Python 2; in Python 3 the dict views would have to be converted to lists first.)
Solution
Build a function for that.
Your intention would be clearer when you use it in the code, and you can handle complicated decisions (e.g., deep versus shallow copy) in a single place.
def copy_dict(source_dict, diffs):
    """Returns a copy of source_dict, updated with the new key-value
    pairs in diffs."""
    result = dict(source_dict)  # Shallow copy, see addendum below
    result.update(diffs)
    return result
And now the copy is atomic, assuming no threads involved:
setup2 = copy_dict(setup1, {'param1': val10, 'param2': val20})
Addendum - deep copy
For primitives (integers and strings), there is no need for deep copy:
>>> d1={1:'s', 2:'g', 3:'c'}
>>> d2=dict(d1)
>>> d1[1]='a'
>>> d1
{1: 'a', 2: 'g', 3: 'c'}
>>> d2
{1: 's', 2: 'g', 3: 'c'}
If you need a deep copy, use the copy module:
result=copy.deepcopy(source_dict) # Deep copy
instead of:
result=dict(setup1) # Shallow copy
Make sure all the objects in your dictionary supports deep copy (any object that can be pickled should do).
setup2 = dict((k, {'param1': val10, 'param2': val20}.get(k, v))
              for k, v in setup1.iteritems())
This only works if all keys of the update dictionary are already contained in setup1.
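Since iteritems() is Python 2 only, the Python 3 equivalent would be a plain dict comprehension (the same caveat about keys applies):

updates = {'param1': val10, 'param2': val20}
setup2 = {k: updates.get(k, v) for k, v in setup1.items()}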
If all your keys are strings, you can also do
setup2 = dict(setup1, param1=val10, param2=val20)
If you just need to create a new dict with items from more than one dict, you can use:
dict(a.items() + b.items())
If both "a" and "b" have some same key, the result will have the value from b.
If you're using Python 3, the concatenation won't work, but you can do the same by freezing the generators to lists, or by using the itertools.chain function.
This is an extension to the nice answer posted by Adam Matan:
def copy_dict(d, diffs={}, **kwargs):
    res = dict(d)
    res.update(diffs)
    res.update(kwargs)
    return res
The only difference is the addition of kwargs.
Now one can write
setup2 = copy_dict(setup1, {'param1': val10, 'param2': val20})
or
setup2 = copy_dict(setup1, param1=val10, param2=val20)
From Python 3.9, you can use the pipe operator (e.g. first_dic | second_dic) to merge dictionaries; it can also be used to return a new, updated dictionary by passing the original dictionary first and the update as the second dictionary:
setup2 = setup1 | {'param1': val10, 'param2': val20}
You can write your own dict subclass (or use the collections.UserDict wrapper) and simply add dicts like
# setup1 is of Dict type (see below)
setup2 = setup1 + {'param1': val10}
All you have to do is:
Define a new class using dict (or collections.UserDict) as the base class.
Implement the __add__ method for it.
Something like:
class Dict(dict):
    def __add__(self, _dict):
        if isinstance(_dict, dict):
            tmpdict = Dict(self)
            tmpdict.update(_dict)
            return tmpdict
        else:
            raise TypeError

    def __radd__(self, _dict):
        return Dict.__add__(self, _dict)
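Usage would then look like this (a small illustrative example):

setup1 = Dict({'param1': 1, 'param2': 2})
setup2 = setup1 + {'param1': 10}  # Dict with {'param1': 10, 'param2': 2}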
I like this line (after from itertools import chain):
d3 = dict(chain(d1.items(), d2.items()))
(Thanks to juanpa.arrivillaga for the improvement!)
Some good answers above. I came here because I had the same issue, and thought the function solution was the most elegant, since the question mentioned "often".
def variant(common, diffs):
    """Create a new dict as a variant of an old one."""
    temp = common.copy()
    temp.update(diffs)
    return temp
To call it, you simply use:
PTX130 = variant(PTX100, {'PA_r': 0.25, 'TX_CAP': 4.2E-10})
which for me says that the PTX130 is a variant of the PTX100 with different PA resistance and TX capacitance.

Is there a more Pythonic way of changing `None` to `[]` than

Is there a more Pythonic way of doing this?:
if self.name2info[name]['prereqs'] is None:
self.name2info[name]['prereqs'] = []
if self.name2info[name]['optionals'] is None:
self.name2info[name]['optionals'] = []
The reason I do this is because I need to iterate over those later. They're None to begin with sometimes because that's the default value. It's my workaround for not making [] a default value.
Thanks.
If you prefer a one-liner:
self.name2info[name]['prereqs'] = self.name2info[name]['prereqs'] or []
If you can't fix the input you could do this (becomes 'better' if you need to add more):
for prop in ['prereqs', 'optionals']:
    if self.name2info[name][prop] is None:
        self.name2info[name][prop] = []
But replacing these values just to iterate over the empty list you added doesn't make a whole lot of sense (unless maybe you're appending something to this list at some point). So maybe you could just move the test for None-ness right before the iteration:
prereqs = self.name2info[name]['prereqs']
if prereqs is not None:
    for prereq in prereqs:
        do_stuff(prereq)
Slightly going off-topic now, but if you ever want to test if an item is iterable at all, a common (pythonic) way would be to write:
try:
    my_iterable_obj = iter(my_obj)
except TypeError:
    # not iterable
    pass
You could do it this way:
if not self.name2info[name]['prereqs']: self.name2info[name]['prereqs'] = []
or this way
self.name2info[name]['prereqs'] = [] if not self.name2info[name]['prereqs'] else self.name2info[name]['prereqs']
Every one of those attribute and dict lookups takes time and processing. It's Pythonic to look up self.name2info[name] just once, and then work with a temporary name bound to that dict:
rec = self.name2info[name]
for key in "prereqs optionals required elective distance".split():
    if key not in rec or rec[key] is None:
        rec[key] = []
Now if need to add another category, like "AP_credit", you just add that to the string of key names.
If you're iterating over them, I assume they're stored in a list, in which case combining some of the above approaches would probably be best.
seq = list(map(lambda x: x or [], seq))
Is a concise way of doing it. To my knowledge conversions in map() are faster than explicit for loops because the loops are run in the underlying C code.
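An explicit comprehension does the same job, and checking is None avoids clobbering other falsy values such as 0 or "", if those can occur:

seq = [x if x is not None else [] for x in seq]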
