pythonic way to do something N times without an index variable? [duplicate]

This question already has answers here: Is it possible to implement a Python for range loop without an iterator variable? (15 answers)
I have some code like:
for i in range(N):
    do_something()
I want to do something N times. The code inside the loop doesn't depend on the value of i.
Is it possible to do this simple task without creating a useless index variable, or in an otherwise more elegant way? How?

A slightly faster approach than looping on xrange(N) is to loop on itertools.repeat, which hands back the same object each time and so avoids manufacturing a fresh integer on every pass:
import itertools
for _ in itertools.repeat(None, N):
    do_something()
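To see the difference on your own machine, a quick self-check with the stdlib timeit module (the numbers will vary; this is just a sketch, not a measurement from this answer):

import timeit

print(timeit.timeit("for _ in range(1000): pass", number=10000))
print(timeit.timeit("for _ in itertools.repeat(None, 1000): pass",
                    setup="import itertools", number=10000))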

Use the _ variable, like so:
# A long way to do integer exponentiation
num = 2
power = 3
product = 1
for _ in range(power):
    product *= num
print(product)

I just use for _ in range(n); it's straight to the point. In Python 2, range() generates the entire list up front, which hurts for huge numbers, but in Python 3 range() is lazy, so it's not a problem.
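If you are stuck on Python 2, the lazy spelling is xrange, which yields values one at a time instead of building the whole list:

# Python 2 only: xrange is lazy, range builds a list
for _ in xrange(n):
    do_something()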

Since functions are first-class citizens, you can write a small wrapper (based on Alex's answer):
import itertools

def repeat(f, N):
    for _ in itertools.repeat(None, N):
        f()
Then you can pass any function in as an argument.
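For example (do_something here is just a stand-in for your own function):

def do_something():
    print("something")

repeat(do_something, 3)  # calls do_something three times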

The _ is the same thing as any other name such as x; it's just a Python idiom used to indicate an identifier you don't intend to use. In Python these identifiers don't take up memory or allocate space the way variables do in some other languages, and it's easy to forget that: they're just names that point to objects, in this case an integer on each iteration.

I found the various answers really elegant (especially Alex Martelli's), but I wanted to quantify performance first-hand, so I cooked up the following script:
from itertools import repeat
N = 10000000
def payload(a):
    pass

def standard(N):
    for x in range(N):
        payload(None)

def underscore(N):
    for _ in range(N):
        payload(None)

def loopiter(N):
    for _ in repeat(None, N):
        payload(None)

def loopiter2(N):
    for _ in map(payload, repeat(None, N)):
        pass

if __name__ == '__main__':
    import timeit
    print("standard: ", timeit.timeit("standard({})".format(N),
          setup="from __main__ import standard", number=1))
    print("underscore: ", timeit.timeit("underscore({})".format(N),
          setup="from __main__ import underscore", number=1))
    print("loopiter: ", timeit.timeit("loopiter({})".format(N),
          setup="from __main__ import loopiter", number=1))
    print("loopiter2: ", timeit.timeit("loopiter2({})".format(N),
          setup="from __main__ import loopiter2", number=1))
I also came up with an alternative solution that builds on Martelli's and uses map() to call the payload function. OK, I cheated a bit in that I took the liberty of making the payload accept a parameter that gets discarded: I don't know if there is a way around this. Nevertheless, here are the results:
standard: 0.8398549720004667
underscore: 0.8413165839992871
loopiter: 0.7110594899968419
loopiter2: 0.5891903560004721
so using map() yields an improvement of approximately 30% over the standard for loop and about 17% over Martelli's repeat variant.

Assume that you've defined do_something as a function, and you'd like to perform it N times.
Maybe you can try the following:
todos = [do_something] * N
for doit in todos:
    doit()
(Note that this builds an N-element list of references up front, so it costs memory for large N.)

What about a simple while loop?
while times > 0:
    do_something()
    times -= 1
You already have the variable; why not use it?

Related

Is there a way to use range with Z3ints in z3py?

I'm relatively new to Z3 and experimenting with it in Python. I've coded a program which returns the order in which different actions are performed, each action represented by a number; Z3 returns an integer representing the second each action starts.
Now I want to look at the model and see if there is an instant of time where nothing happens. To do this I made a list of only 0's, and I want to set the index to 1 at the times where each action is being executed. For instance, if an action starts at the 5th second and takes 8 seconds to execute, indices 5 to 12 would be set to 1. Doing this for all the actions and then looking for 0's in the list would hopefully give me the instants where nothing happens.
The problem is: I would like to write something like this for coding the problem
list_for_check = [0]*total_time
m = s.model()
for action in actions:
    for index in range(m.evaluate(action.number), m.evaluate(action.number) + action.time_it_takes):
        list_for_check[index] = 1
But I get the error:
'IntNumRef' object cannot be interpreted as an integer
I understand that Z3 doesn't return normal ints or bools in its models, but writing
if m.evaluate(action.boolean):
works, so I'm assuming the truth test is overloaded somehow; this doesn't seem to be the case with range, though. So my question is: is there a way to use range with Z3 ints? Or is there another way to do this?
The problem might also be that action.time_it_takes is a plain Python integer, and adding a Z3 int to a "normal" int doesn't work (this happens in the second argument of the range).
I've also tried int(m.evaluate(action.number)), but it doesn't work.
Thanks in advance :)
When you call evaluate it returns an IntNumRef, which is z3's internal representation of an integer. You need to call its as_long() method to convert it to a plain Python number. Here's an example:
from z3 import *

s = Solver()
a = Int('a')
s.add(a > 4)
s.add(a < 7)

if s.check() == sat:
    m = s.model()
    print("a is %s" % m.evaluate(a))
    print("Iterating from a to a+5:")
    av = m.evaluate(a).as_long()
    for index in range(av, av + 5):
        print(index)
When I run this, I get:
a is 5
Iterating from a to a+5:
5
6
7
8
9
which is exactly what you're trying to achieve.
The method as_long() is defined here. Note that there are similar conversion functions for bit-vectors and rationals as well. You can search the z3py API using the interface at: https://z3prover.github.io/api/html/namespacez3py.html
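Applied back to the snippet from the question (same names as there), the loop would become:

list_for_check = [0] * total_time
m = s.model()
for action in actions:
    # as_long() turns the IntNumRef into a plain Python int
    start = m.evaluate(action.number).as_long()
    for index in range(start, start + action.time_it_takes):
        list_for_check[index] = 1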

Parallel Quicksort in Python

I would like to implement the Parallel Quicksort in Python.
I know how Quicksort works: you have to choose a pivot and partition, but how do you spawn the recursive calls as independent tasks in Python?
Here is the pseudocode for it:
QS(A[1:n])
    if n = 1 then return A[1]
    pivot <- any value from A (random)
    L <- A[A[:] < pivot]
    R <- A[A[:] > pivot]
    A(L) <- spawn QS(L)
    A(R) <- QS(R)
    sync
    return A(L) ++ A(R)
You can do it, but it's unlikely to speed up your code. You can use ThreadPoolExecutor to create a thread and get the result back from it. Here is a simple illustration with a function that sums an array:
from concurrent.futures import ThreadPoolExecutor

# Beware: recursively submitting to a bounded pool and then blocking on
# the result can deadlock once every worker is itself waiting on a queued
# child task, so keep max_workers comfortably above the recursion depth.
pool = ThreadPoolExecutor(max_workers=8)

def add(arr):
    if len(arr) < 2:
        return sum(arr)  # cheating a little
    mid = len(arr) // 2
    f = pool.submit(add, arr[:mid])  # sum the left half in another thread
    y = add(arr[mid:])               # sum the right half in this thread
    return y + f.result()
submit() takes the function itself as its first argument, followed by that function's arguments. So for your code it'll be something like f = pool.submit(QS, L).
Remember, though, that with threads Python supports concurrency but not parallelism (only one thread executes Python bytecode at a time), so the code above will effectively run on a single core. You can use ProcessPoolExecutor instead for process parallelism, which Python supports well, but the overhead of the data IO between processes will probably eat up any speedup you gain.
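For completeness, here is a minimal sketch of what a process-based version could look like (seq_qs and parallel_qs are names I made up for illustration). It partitions once in the parent and sorts the two halves in separate processes; worker processes can't re-submit to the pool themselves, so recursion below the top level stays sequential:

import random
from concurrent.futures import ProcessPoolExecutor

def seq_qs(arr):
    # ordinary sequential quicksort, run inside the worker processes
    if len(arr) < 2:
        return arr
    pivot = random.choice(arr)
    left = [x for x in arr if x < pivot]
    mid = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    return seq_qs(left) + mid + seq_qs(right)

def parallel_qs(arr):
    # partition once, then sort each side in its own process
    if len(arr) < 2:
        return arr
    pivot = random.choice(arr)
    left = [x for x in arr if x < pivot]
    mid = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    with ProcessPoolExecutor(max_workers=2) as pool:
        f_left = pool.submit(seq_qs, left)
        f_right = pool.submit(seq_qs, right)
        return f_left.result() + mid + f_right.result()

if __name__ == '__main__':
    data = [random.randint(0, 10**6) for _ in range(10**5)]
    assert parallel_qs(data) == sorted(data)

Whether this beats sorted() depends entirely on how expensive the comparisons are relative to pickling the partitions across process boundaries.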

cython code continues after prange returns value

Suppose i have the following function:
cimport cython
cimport numpy as np
from cython.parallel import prange

@cython.boundscheck(False)
@cython.wraparound(False)
cpdef bint test(np.int_t[:] values):
    cdef Py_ssize_t n_values = len(values)
    cdef int i
    for i in prange(n_values, nogil=True):
        if i == 0:
            return 0
    print 'test'
I run it like so:
In [1]: import algos
In [2]: import numpy as np
In [3]: algos.test(np.array([1,2,3,1,4,5]))
test
Out[3]: False
Why is the function printing when it should have just exited without printing? Is there a way to have the function exit when it reaches the return?
Thank you.
The documentation is clear that this is a bit of a minefield, since there's no guarantee which iteration finishes first. The added complication that it doesn't make clear is that if the number of iterations is small enough (and provided one thread has finished), you can also end up falling through to the code after the prange, which is what you see.
What seems to work for me is to use the else clause of a loop, which only gets executed if it hasn't finished early:
for i in prange(n_values, nogil=True):
    # stuff ...
else:
    with gil:
        print "test"
A quick look at the C code suggests that this is putting appropriate checks in place and it should be reliable.

Python, fastest way to iterate over regular expressions but stop on first match

I have a function that returns True if a string matches at least one regular expression in a list and False otherwise. The function is called often enough that performance is an issue.
When running it through cProfile, the function is spending about 65% of its time doing matches and 35% of its time iterating over the list.
I would think there would be a way to use map() or something, but I can't think of a way to have it stop iterating after it finds a match.
Is there a way to make the function faster while still having it return upon finding the first match?
def matches_pattern(str, patterns):
    for pattern in patterns:
        if pattern.match(str):
            return True
    return False
The first thing that comes to mind is pushing the loop to the C side by using a generator expression:
def matches_pattern(s, patterns):
    return any(p.match(s) for p in patterns)
Probably you don't even need a separate function for that. Note that any() short-circuits, so it still stops at the first match.
Another thing you should try out is to build a single, composite regex using the | alternation operator, so that the engine has a chance to optimize it for you. You can also create the regex dynamically from a list of string patterns, if this is necessary:
def matches_pattern(s, patterns):
    return re.match('|'.join('(?:%s)' % p for p in patterns), s)
Of course you need to have your regexes in string form for that to work. Just profile both of these and check which one is faster :)
You might also want to have a look at a general tip for debugging regular expressions in Python. This can also help to find opportunities to optimize.
UPDATE: I was curious and wrote a little benchmark:
import timeit
setup = """
import re
patterns = [".*abc", "123.*", "ab.*", "foo.*bar", "11010.*", "1[^o]*"]*10
strings = ["asdabc", "123awd2", "abasdae23", "fooasdabar", "111", "11010100101", "xxxx", "eeeeee", "dddddddddddddd", "ffffff"]*10
compiled_patterns = list(map(re.compile, patterns))

def matches_pattern(str, patterns):
    for pattern in patterns:
        if pattern.match(str):
            return True
    return False

def test0():
    for s in strings:
        matches_pattern(s, compiled_patterns)

def test1():
    for s in strings:
        any(p.match(s) for p in compiled_patterns)

def test2():
    for s in strings:
        re.match('|'.join('(?:%s)' % p for p in patterns), s)

def test3():
    r = re.compile('|'.join('(?:%s)' % p for p in patterns))
    for s in strings:
        r.match(s)
"""
print(timeit.timeit("test0()", setup=setup, number=1000))
print(timeit.timeit("test1()", setup=setup, number=1000))
print(timeit.timeit("test2()", setup=setup, number=1000))
print(timeit.timeit("test3()", setup=setup, number=1000))
The output on my machine:
1.4120500087738037
1.662621021270752
4.729579925537109
0.1489570140838623
So any doesn't seem to be faster than your original approach. Building up a regex dynamically also isn't really fast. But if you can manage to build up a regex upfront and use it several times, this might result in better performance. You can also adapt this benchmark to test some other options :)
The way to do this fastest is to combine all the regexes into one with "|" between them, then make one regex match call. Also, you'll want to compile it once to be sure you're avoiding repeated regex compilation.
For example:
def matches_pattern(s, pats):
    pat = "|".join("(%s)" % p for p in pats)
    return bool(re.match(pat, s))
This is for pats as strings, not compiled patterns. If you really only have compiled regexes, then:
def matches_pattern(s, pats):
    pat = "|".join("(%s)" % p.pattern for p in pats)
    return bool(re.match(pat, s))
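Since both versions above rebuild and recompile the combined pattern on every call, one way to follow the "compile it once" advice when the pattern list is fixed is to build the matcher up front. A small sketch (make_matcher and the sample patterns are made up for illustration):

import re

def make_matcher(pats):
    # compile the alternation once; the returned closure reuses it
    combined = re.compile("|".join("(?:%s)" % p for p in pats))
    def matches_pattern(s):
        return combined.match(s) is not None
    return matches_pattern

matches_pattern = make_matcher([".*abc", "123.*", "ab.*"])
print(matches_pattern("abasd"))   # True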
Adding to the excellent answers above, make sure you compare the output of re.match with None:
>>> timeit('None is None')
0.03676295280456543
>>> timeit('bool(None)')
0.1125330924987793
>>> timeit('re.match("a","abc") is None', 'import re')
1.0200879573822021
>>> timeit('bool(re.match("a","abc"))', 'import re')
1.134294033050537
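Applied to the matching function, that means returning the comparison directly rather than wrapping the match object in bool() (a small sketch):

def matches_pattern(s, pat):
    # "is not None" avoids the bool() call overhead measured above
    return pat.match(s) is not None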
It's not exactly what the OP asked, but this worked well for me as an alternative to long iterative matching.
Here is some example data and code:
import random
import time

mylonglist = [ ''.join([ random.choice("ABCDE") for i in range(50) ]) for j in range(3000) ]

# check uniqueness
print "uniqueness:"
print len(mylonglist) == len(set(mylonglist))

# subsample 1000
subsamp = [ mylonglist[x] for x in random.sample(xrange(3000), 1000) ]

# join long string for matching
string = " ".join(subsamp)

# test function 1
def by_string_match(string, mylonglist):
    counter = 0
    t1 = time.time()
    for i in mylonglist:
        if i in string:
            counter += 1
    t2 = time.time()
    print "It took {} seconds to find {} items".format(t2 - t1, counter)

# test function 2
def by_iterative_match(subsamp, mylonglist):
    counter = 0
    t1 = time.time()
    for i in mylonglist:
        if any([ i in s for s in subsamp ]):
            counter += 1
    t2 = time.time()
    print "It took {} seconds to find {} items".format(t2 - t1, counter)

# test 1:
print "string match:"
by_string_match(string, mylonglist)

# test 2:
print "iterative match:"
by_iterative_match(subsamp, mylonglist)

a more pythonic way to express conditionally bounded loop?

I've got a loop that wants to execute to exhaustion or until some user-specified limit is reached. I've got a construct that looks bad, yet I can't seem to find a more elegant way to express it; is there one?
def ello_bruce(limit=None):
    for i in xrange(10**5):
        if predicate(i):
            if not limit is None:
                limit -= 1
                if limit <= 0:
                    break

def predicate(i):
    # lengthy computation
    return True
Holy nesting! There has to be a better way. For purposes of a working example, xrange is used where I normally have an iterator of finite but unknown length (and predicate sometimes returns False).
Maybe something like this would be a little better:
from itertools import ifilter, islice

def ello_bruce(limit=None):
    for i in islice(ifilter(predicate, xrange(10**5)), limit):
        # do whatever you want with i here
        pass
I'd take a good look at the itertools library. Using that, I think you'd have something like...
from itertools import imap, count, islice, ifilter

# From the itertools examples
def tabulate(function, start=0):
    return imap(function, count(start))

def take(n, iterable):
    return list(islice(iterable, n))

# Then something like:
def ello_bruce(limit=None):
    # take the first `limit` truthy results of predicate(0), predicate(1), ...
    # (count() is infinite, so this assumes `limit` true results exist)
    take(limit, ifilter(None, tabulate(predicate)))
I'd start with
if limit is None: return
since nothing can ever happen to limit when it starts as None (if there are no desirable side effects in the iteration and in the computation of predicate -- if there are, then, in this case you can just do for i in xrange(10**5): predicate(i)).
If limit is not None, then you just want to perform max(limit, 1) computations of predicate that are true, so an itertools.islice of an itertools.ifilter would do:
import itertools as it

def ello_bruce(limit=None):
    if limit is None:
        for i in xrange(10**5): predicate(i)
    else:
        for _ in it.islice(
                it.ifilter(predicate, xrange(10**5)),
                max(limit, 1)):
            pass
You should remove the nested ifs:
if predicate(i) and not limit is None:
    ...
What you want to do seems perfectly suited for a while loop:
def ello_bruce(limit=None):
    max = 10**5
    # if you consider 0 to be an invalid value for limit you can also do
    # if limit:
    if limit is None:
        limit = max
    i = 0  # the index that predicate consumes (missing in the original sketch)
    while max and limit:
        if predicate(i):
            limit -= 1
        max -= 1
        i += 1
The loop stops if either max or limit reaches zero.
Um. As far as I understand it, predicate just computes in segments, and you totally ignore its return value, right?
This is another take:
import itertools

def ello_bruce(limit=None):
    if limit is None:
        limiter = itertools.repeat(None)
    else:
        limiter = xrange(limit)

    # since predicate is a Python function,
    # itertools looping won't be faster, so use a plain for.
    # remember to replace the xrange(100000) with your own iterator
    for dummy in itertools.izip(xrange(100000), limiter):
        pass
Also, remove the unneeded return True from the end of predicate.
