Tuple unpacking: dummy variable vs index - coding-style

What is the usual/clearest way to write this in Python?
value, _ = func_returning_a_tuple()
or:
value = func_returning_a_tuple()[0]

value = func_returning_a_tuple()[0] seems clearer and also can be generalized.
What if the function was returning a tuple with more than 2 values?
What if the program logic is interested in the 4th element of an umpteen-element tuple?
What if the size of the returned tuple varies?
None of these questions affects the subscript-based idiom, but they do affect the multiple-assignment idiom.

If you'd appreciate a handy way to do this in Python 3.x, check out PEP 3132, described on the What's New in Python page under
Extended Iterable Unpacking. You can now write things like a, b, *rest = some_sequence, and even *rest, a = stuff. The rest object is always a (possibly empty) list; the right-hand side may be any iterable. Example:
(a, *rest, b) = range(5)
This sets a to 0, b to 4, and rest to [1, 2, 3].

For extracting a single item, indexing is a bit more idiomatic. When you're extracting two or more items, unpacking becomes more idiomatic. That's just an empirical observation on my part; I don't know of any style guides recommending or mandating either choice!-)
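A minimal illustration of the idioms side by side (the helper function here is made up for the example):

    def min_and_max(xs):
        return min(xs), max(xs)

    data = [3, 1, 4, 1, 5]

    # Single item of interest: indexing reads naturally and generalizes to any position.
    smallest = min_and_max(data)[0]

    # Two or more items: unpacking states the intent directly.
    smallest, largest = min_and_max(data)

    # Only the first item of a longer (or variable-length) tuple, Python 3 style.
    smallest, *_ = min_and_max(data)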

For list/generator comprehensions with key/value pairs I think the usage of the dummy variable can be quite neat, especially where the unpacked value needs to be used more than once (avoiding repeated indexing), e.g.:
l = [('a', 1.54), ('b', 4.34), ('c', 3.22), ('d', 6.43)]
s = [x * (1.0 - x) * (2.0 - x) for _, x in l]
versus:
s = [x[0] * (1.0 - x[0]) * (2.0 - x[0]) for x in l]
Another thing to note is that while unpacking and indexing are roughly as expensive as one another, extended unpacking seems to be an order of magnitude slower.
With Python 3.2 using %timeit in IPython:
Regular unpacking:
>>> x = (1, 2)
>>> %timeit y, _ = x
10000000 loops, best of 3: 50 ns per loop
>>> %timeit y, _ = x
10000000 loops, best of 3: 50.4 ns per loop
Extended unpacking:
>>> x = (1, 2, 3)
>>> %timeit y, *_ = x
1000000 loops, best of 3: 1.02 us per loop
>>> %timeit y = x[0]
10000000 loops, best of 3: 68.9 ns per loop

Replace values with for loop

Suppose I have the following function:
function y1(x)
    y = x^(2) - 4
    return y
end
Now, I want to evaluate all the values from this sequence: collect(range(-10,10, 1000))
I tried this
y_1 = zeros(1000);
for x in collect(range(-10, 10, 1000))
    y_1 = y1.(x)
end
Note that I use the broadcast operator to apply the function y1 to every value the iterator takes. But even if I don't use it, I get the same result.
The problem is that all I get as a result is 96.0.
How can I refill the y_1 vector with the for loop, so I get the evaluated values?
The evaluated vector should be of size 1000
Thanks in advance!
Edit:
I found a way to get to my desired result without the for loop:
y_1 = y1.(collect(range(-10, 10, 1000)))
But I still want to know how can I do it in a loop.
The broadcast operator broadcasts the function over the entire iterator by itself, i.e. y1.(arr) will:
call y1 on each of the elements of the array arr,
collect the results of all those calls, and
allocate memory to store those results as an array too.
So the following are all equivalent in terms of functionality:
julia> arr = range(-4, 5, length = 10) #define a simple range
-4.0:1.0:5.0
julia> y1.(arr)
10-element Vector{Float64}:
12.0
5.0
0.0
-3.0
-4.0
-3.0
0.0
5.0
12.0
21.0
julia> [y1(x) for x in arr]
10-element Vector{Float64}:
(same values as above)
julia> map(y1, arr)
10-element Vector{Float64}:
(same values as above)
julia> y_1 = zeros(10);
julia> for (i, x) in pairs(arr)
           y_1[i] = y1(x)
       end
julia> y_1
10-element Vector{Float64}:
(same values as above)
In practice, there may be other considerations, including performance, that decide between these and other choices.
As an aside, note that very often you don't want to collect a range in Julia, i.e. don't think of collect as somehow equivalent to c() in R. For many operations, ranges can be used directly, including for iteration in for loops. collect should only be necessary in the rare cases where an actual Vector is required, e.g. when a value in the middle of the array needs to be changed for some reason. As a general rule, use range results as they are, unless and until you get an error that requires you to change that.
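For what it's worth, the same habit carries over to Python: a range object is iterated lazily, and you only materialize it when you genuinely need a mutable list. A rough analogy:

    # Python analogy: iterate the range directly; materialize only to mutate in place.
    squares = [x**2 - 4 for x in range(-10, 11)]   # no list(...) needed to iterate
    xs = list(range(-10, 11))                      # an actual list, so elements can change
    xs[5] = 0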

permutations without repetition

I would like to know, what is the best approach to solve this problem:
Given integers x and y, and y integers a1, a2, a3, ..., ay, find all combinations such that
a1 ± a2 ± ... ± ay = x, with y < 20.
My current approach is to generate all arrangements of 1s and 0s stored in a table T and then, depending on whether T[i] is 1 or 0, add or subtract ai from the sum. The problem is that there are n! permutations of an n-element array. Hence, for a 20-element array, I would have to check 20! possibilities, most of which are repeated. Could you please suggest a better approach to solving this problem?
There are only 2^20 (just over a million) binary vectors of length 20, rather than the infeasible 20!. You should be able to brute-force that many in less than a second, especially if you use a Gray code, which allows you to pass from one candidate sum to the next in a single step (e.g. to go from a + b - c - d to a + b - c + d, just add 2*d).
The excellent branch and bound idea of @MikeWise would be good if y gets much larger. Generate a tree starting with a root node of 0. Give it children of -a1 and +a1, then 4 grandchildren by adding and subtracting a2, and so on. If you ever get farther from the target x than the sum of the remaining ai, you can prune that branch. In the worst case, this might be slightly worse than the Gray-code based brute force (because you need to do so much more processing at each node), but in the best case you might be able to prune away most possibilities.
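A rough Python sketch of that pruned search (my own illustration of the idea just described, not @MikeWise's code; like the Gray-code version, it treats a1 as signable):

    def signedSumsBB(nums, target):
        n = len(nums)
        # remaining[i] = |a_i| + |a_{i+1}| + ... : the most the rest of the terms
        # can still move the partial sum in either direction.
        remaining = [0] * (n + 1)
        for i in range(n - 1, -1, -1):
            remaining[i] = remaining[i + 1] + abs(nums[i])
        solutions = []
        def walk(i, partial, signs):
            if abs(target - partial) > remaining[i]:
                return  # prune: the target is unreachable from this node
            if i == n:
                solutions.append(signs[:])  # partial == target is guaranteed here
                return
            for s in (1, -1):
                signs.append(s * nums[i])
                walk(i + 1, partial + s * nums[i], signs)
                signs.pop()
        walk(0, 0, [])
        return solutions

For example, signedSumsBB([1,2,3,4,5,9], 6) returns the same four sign patterns (possibly in a different order) as the Gray-code version below.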
On Edit: Here is some Python code. First I define a generator which, given an integer n, successively returns which bit position needs to flip to step through a Gray code:
def grayBit(n):
    code = [0]*n
    odd = True
    done = False
    while not done:
        if odd:
            code[0] = 1 - code[0]  # flip bit
            odd = False
            yield 0
        else:
            i = code.index(1)
            if i == n-1:
                done = True
            else:
                code[i+1] = 1 - code[i+1]
                odd = True
                yield i+1
(This uses an algorithm which I learned years ago in the excellent book "Constructive Combinatorics" by Stanton and White).
Then -- I use this to return all solutions (as lists consisting of the input list of numbers with negative signs inserted as needed). The key point is that I can take the current bit-to-flip and either add or subtract twice the corresponding number:
def signedSums(nums, target):
    n = len(nums)
    patterns = []
    total = sum(nums)
    pattern = [1]*n
    if target == total: patterns.append([x*y for x,y in zip(nums,pattern)])
    deltas = [2*i for i in nums]
    for i in grayBit(n):
        if pattern[i] == 1:
            total -= deltas[i]
        else:
            total += deltas[i]
        pattern[i] = -1 * pattern[i]
        if target == total: patterns.append([x*y for x,y in zip(nums,pattern)])
    return patterns
Typical output:
>>> signedSums([1,2,3,4,5,9],6)
[[1, -2, -3, -4, 5, 9], [1, 2, 3, -4, -5, 9], [-1, 2, -3, 4, -5, 9], [1, 2, 3, 4, 5, -9]]
It only takes about a second to evaluate:
>>> len(signedSums([i for i in range(1,21)],100))
2865
Hence there are 2865 ways to add or subtract the integers in the range 1,2,..,20 to get a net sum of 100.
I assumed that a1 can be either added or subtracted (instead of just added, which is what your question implies if taken literally). Note that if you really want to insist that a1 occurs positively, then you could just subtract it from x and apply the above algorithm to the rest of the list and the adjusted target.
Finally, it is not too hard to see that if you solve the subset-sum problem with the set of weights {2*a1, 2*a2, 2*a3, ..., 2*ay} and with a target sum of x + a1 + a2 + ... + ay, then the subsets selected will correspond exactly to the subsets where the positive signs occur in the solution to the original problem. Thus your problem is easily reducible to the subset-sum problem, and it is therefore NP-complete to determine whether it has any solutions (and NP-hard to list them all).
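To make the reduction concrete, here is a small brute-force check in Python (illustration only, not an efficient subset-sum solver):

    from itertools import combinations

    nums = [1, 2, 3, 4, 5, 9]
    x = 6
    total = sum(nums)
    doubled = [2 * a for a in nums]
    # A subset of the doubled weights sums to x + total exactly when giving those
    # positions a '+' sign (and the rest a '-') solves the original problem.
    for r in range(len(nums) + 1):
        for subset in combinations(range(len(nums)), r):
            if sum(doubled[i] for i in subset) == x + total:
                signs = [a if i in subset else -a for i, a in enumerate(nums)]
                print(signs, sum(signs))   # each printed list sums to 6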
We have the condition:
a1 ± a2 ± ... ± ay = x, y < 20 [1]
First of all, I would generalize condition [1], allowing every 'a' including 'a1' to be ±:
±a1 ± a2 ± ... ± ay = x [2]
If we have a solution for [2], we can easily get a solution for [1].
To solve [2] we can use the following approach:
combinations list x
    | x == 0 && null list = [[]]
    | null list = []
    | otherwise = plusCombinations ++ minusCombinations where
        a = head list
        rest = tail list
        plusCombinations = map (\c -> a:c) $ combinations rest (x-a)
        minusCombinations = map (\c -> -a:c) $ combinations rest (x+a)
Explanation:
The first guard checks whether x has reached zero and all numbers from the list have been used. This means a solution has been found, and we return the single empty solution: [[]]
The second guard checks whether the list is empty; since x is not 0 at that point, no solution can be found, and we return no solutions: []
The third branch means we have two alternatives: use ai with '+' or with '-', so we concatenate the plus and minus combinations
Example output:
*Main> combinations [1,2,3,4] 2
[[1,2,3,-4],[-1,2,-3,4]]
*Main> combinations [1,2,3,4] 3
[]
*Main> combinations [1,2,3,4] 4
[[1,2,-3,4],[-1,-2,3,4]]

Efficient partial permutation sort in Julia

I am dealing with a problem that requires a partial permutation sort by magnitude in Julia. If x is a vector of dimension p, then what I need are the first k indices corresponding to the k components of x that would appear first in a partial sort by absolute value of x.
Refer to Julia's sorting functions. Basically, I want a cross between sortperm and select!. When Julia 0.4 is released, I will be able to obtain the same answer by applying sortperm! to the vector of indices and choosing the first k of them. However, using sortperm! is not ideal here because it will sort the remaining p-k indices of x, which I do not need.
What would be the most memory-efficient way to do the partial permutation sort? I hacked a solution by looking at the sortperm source code. However, since I am not versed in the ordering modules that Julia uses there, I am not sure if my approach is intelligent.
One important detail: I can ignore repeats or ambiguities here. In other words, I do not care about the ordering by abs() of indices for two components 2 and -2. My actual code uses floating point values, so exact equality never occurs for practical purposes.
# initialize a vector for testing
x = [-3,-2,4,1,0,-1]
x2 = copy(x)
k = 3 # num components desired in partial sort
p = 6 # num components in x, x2
# what are the indices that sort x by magnitude?
indices = sortperm(x, by = abs, rev = true)
# now perform partial sort on x2
select!(x2, k, by = abs, rev = true)
# check if first k components are sorted here
# should evaluate to "true"
isequal(x2[1:k], x[indices[1:k]])
# now try my partial permutation sort
# I only need indices2[1:k] at end of day!
indices2 = [1:p]
select!(indices2, 1:k, 1, p, Base.Perm(Base.ord(isless, abs, true, Base.Forward), x))
# same result? should evaluate to "true"
isequal(indices2[1:k], indices[1:k])
EDIT: With the suggested code, we can briefly compare performance on much larger vectors:
p = 10000; k = 100; # asking for largest 1% of components
x = randn(p); x2 = copy(x);
# run following code twice for proper timing results
@time {indices = sortperm(x, by = abs, rev = true); indices[1:k]};
@time {indices2 = [1:p]; select!(indices2, 1:k, 1, p, Base.Perm(Base.ord(isless, abs, true, Base.Forward), x))};
@time selectperm(x,k);
My output:
elapsed time: 0.048876901 seconds (19792096 bytes allocated)
elapsed time: 0.007016534 seconds (2203688 bytes allocated)
elapsed time: 0.004471847 seconds (1657808 bytes allocated)
The following version appears to be relatively space-efficient because it uses only an integer array of the same length as the input array:
function selectperm(x, k)
    if k > 1
        kk = 1:k
    else
        kk = 1
    end
    z = collect(1:length(x))
    return select!(z, kk, by = (i) -> abs(x[i]), rev = true)
end

x = [-3,-2,4,1,0,-1]
k = 3 # num components desired in partial sort
print(selectperm(x, k))
The output is:
[3,1,2]
... as expected.
I'm not sure if it uses less memory than the originally-proposed solution (though I suspect the memory usage is similar) but the code may be clearer and it does produce only the first k indices whereas the original solution produced all p indices.
(Edit)
selectperm() has been edited to deal with the BoundsError that occurs if k=1 in the call to select!().
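As a side note for readers coming from NumPy: the analogous partial permutation sort there is np.argpartition, which also leaves the remaining indices unsorted, e.g.:

    import numpy as np

    x = np.array([-3, -2, 4, 1, 0, -1])
    k = 3
    # Indices of the k components largest in magnitude (unordered among themselves).
    idx = np.argpartition(np.abs(x), -k)[-k:]
    print(sorted(idx.tolist()))   # [0, 1, 2] -> values -3, -2, 4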

numpy: evaluating function in matrix, using previous array as argument in calculating the next

I have an m x n array: a, where the integers m > 1E6, and n <= 5.
I have functions F and G, which are composed like this: F(u, G(u, t)). u is a 1 x n array, t is a scalar, and F and G return 1 x n arrays.
I need to evaluate each row of a in F, using the previously evaluated row as the u-array for the next evaluation. I need to make m such evaluations.
This has to be really fast. I was previously impressed by scitools.std StringFunction evaluation for a whole array, but this problem requires using the previously calculated array as an argument in calculating the next. I don't know if StringFunction can do this.
For example:
from numpy import zeros, asarray, cos

a = zeros((1000000, 4))
a[0] = asarray([1., 69., 3., 4.1])

# A is a float defined elsewhere; h is a function which accepts a float as its
# argument and returns an arbitrary float. h is defined elsewhere.
def G(u, t):
    return asarray([u[0], u[1]*A, cos(u[2]), t*h(u[3])])

def F(u, t):
    return u + G(u, t)

dt = 1E-6
for i in range(1, 1000000):
    a[i] = F(a[i-1], i*dt)
The problem with the above code is that it is slow as hell. I need numpy to get these calculations done in milliseconds.
How can I do what I want?
Thank you for your time.
Kind regards,
Marius
This sort of thing is very difficult to do in numpy. If we look at this by column we see a few simpler solutions.
a[:,0] is very easy:
col0 = np.ones((1000))*2
col0[0] = 1 #Or whatever start value.
np.cumprod(col0, out=col0)
np.allclose(col0, a[:1000,0])
True
As mentioned earlier this will overflow very quickly. a[:,1] can be done much along the same lines.
I do not believe there is a way to do the next two columns inside numpy alone quickly. We can turn to numba for this:
from numba import autojit

def python_loop(start, count):
    out = np.zeros((count), dtype=np.double)
    out[0] = start
    for x in xrange(count-1):
        out[x+1] = out[x] + np.cos(out[x])
    return out
numba_loop = autojit(python_loop)
np.allclose(numba_loop(3,1000),a[:1000,2])
True
%timeit python_loop(3,1000000)
1 loops, best of 3: 4.14 s per loop
%timeit numba_loop(3,1000000)
1 loops, best of 3: 42.5 ms per loop
Although it's worth pointing out that this converges to pi/2 very, very quickly, and there is little point in calculating this recursion past ~20 values for any start value. The following returns the exact same answer to double precision - I didn't bother finding the cutoff, but it is much less than 50:
%timeit tmp = np.empty((1000000)); tmp[:50] = numba_loop(3,50); tmp[50:] = np.pi/2
100 loops, best of 3: 2.25 ms per loop
You can do something similar with the fourth column. Of course you can autojit all of the functions, but this gives you several different options to try out depending on numba usage:
Use cumprod for the first two columns
Use an approximation for column 3 (and possible 4) where only the first few iterations are calculated
Implement columns 3 and 4 in numba using autojit
Wrap everything inside of an autojit loop (the best option; sketched below)
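For that last option, here is a rough sketch of what the whole-loop version could look like (using numba's newer njit decorator rather than the autojit shown above; A and h below are placeholders for the values the question says are defined elsewhere):

    import numpy as np
    from numba import njit

    A = 1.0001            # placeholder constant ("defined elsewhere" in the question)

    @njit
    def h(v):             # placeholder scalar function ("defined elsewhere")
        return 0.5 * v

    @njit
    def fill(a, dt):
        # Each row is computed directly from the previous one: u + G(u, t), per column.
        for i in range(1, a.shape[0]):
            t = i * dt
            a[i, 0] = a[i - 1, 0] + a[i - 1, 0]
            a[i, 1] = a[i - 1, 1] + a[i - 1, 1] * A
            a[i, 2] = a[i - 1, 2] + np.cos(a[i - 1, 2])
            a[i, 3] = a[i - 1, 3] + t * h(a[i - 1, 3])

    a = np.zeros((1000000, 4))
    a[0] = np.asarray([1.0, 69.0, 3.0, 4.1])
    fill(a, 1e-6)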
The way you have presented this, all rows past ~200 will be either np.inf or np.pi/2. Exploit this.
Slightly faster: your first column is basically 2^n, and calculating 2^n for n up to 1000000 is going to overflow; the second column is even worse.
def calc(arr, t0=1E-6):
    u = arr[0]
    dt = 1E-6
    h = lambda x: np.random.random(1)*50.0

    def firstColGen(uStart):
        u = uStart
        while True:
            u += u
            yield u

    def secondColGen(uStart, A):
        u = uStart
        while True:
            u += u*A
            yield u

    def thirdColGen(uStart):
        u = uStart
        while True:
            u += np.cos(u)
            yield u

    def fourthColGen(uStart, h, t0, dt):
        u = uStart
        t = t0
        while True:
            u += h(u) * dt
            t += dt
            yield u

    first = firstColGen(u[0])
    second = secondColGen(u[1], A)
    third = thirdColGen(u[2])
    fourth = fourthColGen(u[3], h, t0, dt)
    for i in xrange(1, len(arr)):
        arr[i] = [first.next(), second.next(), third.next(), fourth.next()]

Slow tail recursion in F#

I have an F# function that returns a list of numbers starting from 0 in the pattern of skip n, choose n, skip n, choose n... up to a limit. For example, this function for input 2 will return [2, 3, 6, 7, 10, 11...].
Initially I implemented this as a non-tail-recursive function as below:
let rec indicesForStep start blockSize maxSize =
    match start with
    | i when i > maxSize -> []
    | _ -> [for j in start .. ((min (start + blockSize) maxSize) - 1) -> j] @ indicesForStep (start + 2 * blockSize) blockSize maxSize
Thinking that tail recursion is desirable, I reimplemented it using an accumulator list as follows:
let indicesForStepTail start blockSize maxSize =
    let rec indicesForStepInternal istart accumList =
        match istart with
        | i when i > maxSize -> accumList
        | _ -> indicesForStepInternal (istart + 2 * blockSize) (accumList @ [for j in istart .. ((min (istart + blockSize) maxSize) - 1) -> j])
    indicesForStepInternal start []
However, when I run this in fsi under Mono with the parameters 1, 1 and 20,000 (i.e. should return [1, 3, 5, 7...] up to 20,000), the tail-recursive version is significantly slower than the first version (12 seconds compared to sub-second).
Why is the tail-recursive version slower? Is it because of the list concatenation? Is it a compiler optimisation? Have I actually implemented it tail-recursively?
I also feel as if I should be using higher-order functions to do this, but I'm not sure exactly how to go about doing it.
As dave points out, the problem is that you're using the @ operator to append lists. This is a more significant performance issue than tail recursion. In fact, tail recursion doesn't really speed the program up much (but it does make it work on large inputs where the stack would otherwise overflow).
The reason your second version is slower is that you're appending a short list (the one generated using [...]) after a longer list (accumList), so the longer list is the first argument of @. This is slower than appending the longer list after the shorter one, because the operation needs to copy its first argument.
You can fix it by collecting the elements in the accumulator in a reversed order and then reversing it before returning the result:
let indicesForStepTail start blockSize maxSize =
    let rec indicesForStepInternal istart accumList =
        match istart with
        | i when i > maxSize -> accumList |> List.rev
        | _ ->
            let acc =
                [for j in ((min (istart + blockSize) maxSize) - 1) .. -1 .. istart -> j]
                @ accumList
            indicesForStepInternal (istart + 2 * blockSize) acc
    indicesForStepInternal start []
As you can see, this has the shorter list (generated using [...]) as the first argument to # and on my machine, it has similar performance to the non-tail-recursive version. Note that the [ ... ] comprehension generates elements in the reversed order - so that they can be reversed back at the end.
You can also write the whole thing more nicely using the F# seq { .. } syntax. You can avoid using the @ operator completely, because it allows you to yield individual elements using yield and perform tail-recursive calls using yield!:
let rec indicesForStepSeq start blockSize maxSize = seq {
    match start with
    | i when i > maxSize -> ()
    | _ ->
        for j in start .. ((min (start + blockSize) maxSize) - 1) do
            yield j
        yield! indicesForStepSeq (start + 2 * blockSize) blockSize maxSize }
This is how I'd write it. When calling it, you just need to add Seq.toList to evaluate the whole lazy sequence. The performance of this version is similar to the first one.
EDIT With the correction from Daniel, the Seq version is actually slightly faster!
In F# the list type is implemented as a singly linked list. Because of this you get different performance for x @ y and y @ x if x and y are of different lengths. That's why you're seeing a difference in performance: (x @ y) has a running time proportional to the length of x.
// e.g.
let x = [1;2;3;4]
let y = [5]
If you did x @ y then x (4 elements) would be copied into a new list and its internal next pointer would be set to the existing y list. If you did y @ x then y (1 element) would be copied into a new list and its next pointer would be set to the existing list x.
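To see the copying concretely, here is a toy singly linked ("cons") list in Python that appends the way F# lists do (purely an illustration, not F#'s actual implementation):

    class Cons:
        def __init__(self, head, tail=None):
            self.head, self.tail = head, tail

    def append(xs, ys):
        # Every cell of xs is copied; ys is shared untouched at the end.
        if xs is None:
            return ys
        return Cons(xs.head, append(xs.tail, ys))

    x = Cons(1, Cons(2, Cons(3, Cons(4))))   # [1;2;3;4]
    y = Cons(5)                              # [5]

    append(x, y)   # copies 4 cells, then points the last one at y
    append(y, x)   # copies 1 cell, then points it at x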
I wouldn't use a higher order function to do this. I'd use list comprehension instead.
let indicesForStepTail start blockSize maxSize =
    [
        for block in start .. (blockSize * 2) .. (maxSize - 1) do
            for i in block .. (block + blockSize - 1) do
                yield i
    ]
It looks like the list append is the problem. Append is basically an O(N) operation in the size of its first argument, so by accumulating on the left, the whole computation takes O(N^2) time.
The way this is typically done in functional code seems to be to accumulate the list in reverse order (by accumulating on the right), then at the end, return the reverse of the list.
The first version you have avoids the append problem, but as you point out, is not tail recursive.
In F#, probably the easiest way to solve this problem is with sequences. It is not very functional looking, but you can easily create an infinite sequence following your pattern, and use Seq.take to get the items you are interested in.
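For comparison, the same idea expressed with a Python generator (an infinite skip-n/choose-n sequence, sliced lazily; the names here are my own):

    from itertools import count, islice

    def indices_for_step(start, block_size):
        # Yield start .. start+block_size-1, skip block_size values, and repeat forever.
        for block in count(start, 2 * block_size):
            for i in range(block, block + block_size):
                yield i

    # First six values of the pattern from the question (start=2, block size 2):
    print(list(islice(indices_for_step(2, 2), 6)))   # [2, 3, 6, 7, 10, 11]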
