How to perform advanced indexing in PyTorch?

Is there a way of doing the following without looping?
S, N, H = 9, 7, 4
a = torch.randn(S, N, H)
# tensor with integer values between 1, S of shape (N,)
lens = torch.randint(1, S + 1, (N,))
res = torch.zeros(N, H)
for i in range(N):
    res[i] = a[lens[i] - 1, i, :]

Yes, I believe this works:
import torch
S, N, H = 9, 7, 4
a = torch.randn(S, N, H)
# tensor of shape (N,) with integer values in [0, S - 1]
lens = torch.randint(0, S, (N,))
# second-dimension index for each row of the result: 0, 1, ..., N - 1
i = torch.arange(N)
# the two index tensors are paired elementwise: res[k] == a[lens[k], k, :]
res = a[lens, i, :]
print(res)
Why did you draw lens from 1 to S and then use lens[i] - 1? For convenience, I changed it so that lens is drawn from 0 to S - 1. If you need lens to run from 1 to S, change
res = a[lens, i, :]
to
res = a[lens - 1, i, :]
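
To make the pairing of the two index tensors explicit, here is a plain-Python sketch that uses nested lists instead of tensors, so it runs without PyTorch. The helper name paired_select is hypothetical (it is not a PyTorch function), and lens is assumed to be 0-based as in the answer above.

```python
import random

def paired_select(a, lens, cols):
    """Mimic a[lens, cols, :] for a nested list `a` of shape (S, N, H).

    The two index lists are walked in lockstep (like zip), so the result
    has one row per pair, not one row per combination of indices.
    """
    return [list(a[s][n]) for s, n in zip(lens, cols)]

S, N, H = 9, 7, 4
a = [[[random.random() for _ in range(H)] for _ in range(N)] for _ in range(S)]
lens = [random.randrange(S) for _ in range(N)]  # 0-based, values in [0, S - 1]
cols = list(range(N))                           # plays the role of torch.arange(N)

res = paired_select(a, lens, cols)

# Reference: the explicit loop from the question, with 0-based lens
expected = [a[lens[k]][k] for k in range(N)]
print(res == expected)
```

The result has N rows of length H, matching the (N, H) shape that the loop in the question fills in.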

Related

Tricks to improve the performance of a custom function in Julia

I am replicating in Julia a sequence of steps originally written in Matlab. In Octave, the procedure takes 1.4582 seconds, while in Julia (using Jupyter) it takes approximately 10 seconds. I'll try to keep the scripts brief. My goal is to match or beat Octave's performance. First, I will describe my variables and functions:
zgrid (double 1x7 size)
kgrid (double 500x1 size)
V0 (double 500x7 size)
P (double 7x7 size) a transition matrix
delta and beta are fixed parameters.
F(z,k) and u(c) are particular functions and are specified in the Julia script.
% Octave script
% V0 is given
[K, Z, K2] = meshgrid(kgrid, zgrid, kgrid);
K = permute(K, [2, 1, 3]);
Z = permute(Z, [2, 1, 3]);
K2 = permute(K2, [2, 1, 3]);
C = max(f(Z,K) + (1-delta)*K - K2,0);
U = u(C);
EV = V0*P';% EV is a 500x7 matrix size
EV = permute(repmat(EV, 1, 1, 500), [3, 2, 1]);
H = U + beta*EV;
[TV, index] = max(H, [], 3);
In Julia, I created a function that replicates this procedure. I used loops, but it takes about 9 times longer.
# Julia script
# parameters
α = 1/3
β = 0.987
δ = 0.012
μ = 2
Kss = 48.1905148382166
kgrid = range(0.75*Kss, stop=1.25*Kss, length=500);
zgrid = [-0.06725382459813659, -0.044835883065424395, -0.0224179415327122, 0 , 0.022417941532712187, 0.04483588306542438, 0.06725382459813657]
F = (z,k) -> exp(z)*(k^α);
u = (c) -> (c^(1-μ) - 1)/(1-μ)
# V0 is the input of my T operator function
V0 = repeat(sqrt.(kgrid), outer = [1,7]);
# P (the 7x7 transition matrix) is assumed to be defined already
function T(V)
    E = V*P'
    T1 = zeros(Float64, 500, 7)
    aux = zeros(Float64, 500)
    for i = 1:7
        for j = 1:500
            for l = 1:500
                c = maximum((F(zgrid[i],kgrid[j]) + (1-δ)*kgrid[j] - kgrid[l], 0))
                aux[l] = u(c) + β*E[l,i]
            end
            T1[j,i] = maximum(aux)
        end
    end
    return T1
end
I would very much like to improve my performance in Julia. I believe there is a way to improve, but I am new in Julia programming.

This code runs for me in 5ms. Note that I have made F and u into proper (not anonymous) functions, F_ and u_, but you could get a similar effect by making the anonymous functions const.
Your main problem is that you have a lot of non-const global variables, and also that your main function is doing unnecessary work multiple times, and creating an unnecessary array, aux.
The performance tips section in the manual is essential reading: https://docs.julialang.org/en/v1/manual/performance-tips/
F_(z,k) = exp(z) * (k^(1/3)); # you can still use α, but it must be const
u_(c) = (c^(1-2) - 1)/(1-2)
function T_(V, P, kgrid, zgrid, β, δ)
    E = V * P'
    T1 = similar(V)
    for i in axes(T1, 2)
        for j in axes(T1, 1)
            temp = F_(zgrid[i], kgrid[j]) + (1-δ)*kgrid[j]
            aux = -Inf
            for l in eachindex(kgrid)
                c = max(0.0, temp - kgrid[l])
                aux = max(aux, u_(c) + β * E[l, i])
            end
            T1[j,i] = aux
        end
    end
    return T1
end
Benchmark:
using BenchmarkTools
# β and δ are the parameters from the question
zgrid = sort!(rand(1, 7); dims=2)
kgrid = sort!(rand(500, 1); dims=1)
V0 = repeat(sqrt.(kgrid), outer = [1,7]);
P = rand(length(zgrid), length(zgrid))
@btime T_($V0, $P, $kgrid, $zgrid, $β, $δ);
# output: 5.126 ms (4 allocations: 54.91 KiB)

The following should perform much better. The most noticeable differences are that it evaluates F 500 times less often and that it takes the arrays and parameters as arguments instead of reading them from global variables.
function T(V, P, kgrid, zgrid, β, δ)
    # P is passed in as well, so no array is read from a non-const global
    E = V*P'
    T1 = zeros(Float64, 500, 7)
    for j = 1:500
        for i = 1:7
            x = F(zgrid[i], kgrid[j]) + (1-δ)*kgrid[j]
            T1[j,i] = maximum(u(max(x - kgrid[l], 0)) + β*E[l,i] for l in 1:500)
        end
    end
    return T1
end

How can I solve this problem using dynamic programming?

Given a list of numbers, say [4 5 2 3], I need to maximize the sum obtained according to the following rules:
I select a number from the list, and that number is removed.
E.g., selecting 2 leaves the list as [4 5 3].
If the removed number has two neighbours, the result of the selection is the product of the selected number with one of its neighbours, plus the other neighbour. E.g., if I select 2, the result of this selection can be 2 * 5 + 3.
If the selected number has only one neighbour, the result is the product of the selected number and that neighbour.
When there is only one number left, it is simply added to the running result.
Following these rules, I need to select the numbers in an order that maximizes the result.
For the above list, if the order of selection is 4->2->3->5, the sum obtained is 53, which is the maximum.
I am including a program that takes the list of elements as input, prints every possible sum, and indicates the maximum sum.
import itertools

l = [int(i) for i in input().split()]
p = itertools.permutations(l)
c, cs = 1, -1  # c: running permutation counter, cs: number of the best permutation
mm = -1        # best sum seen so far
for i in p:
    var, s = l[:], 0
    print(c, ':', i)
    c += 1
    for j in i:
        print(' removing: ', j)
        pos = var.index(j)
        if pos == 0 or pos == len(var) - 1:
            if pos == 0 and len(var) != 1:
                # leftmost element: product with its right neighbour
                s += var[pos] * var[pos + 1]
                var.remove(j)
            elif pos == 0 and len(var) == 1:
                # last remaining element: just add it
                s += var[pos]
                var.remove(j)
            elif pos == len(var) - 1 and pos != 0:
                # rightmost element: product with its left neighbour
                s += var[pos] * var[pos - 1]
                var.remove(j)
        else:
            # two neighbours: multiply by the larger one, add the smaller one
            mx = max(var[pos - 1], var[pos + 1])
            mn = min(var[pos - 1], var[pos + 1])
            s += var[pos] * mx + mn
            var.remove(j)
    if s > mm:
        mm = s
        cs = c - 1
    print(' modified list: ', var, '\n sum:', s)
print('MAX SUM was', mm, ' at', cs)
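
As a quick check of the rules and of the claimed maximum, here is a compact brute-force sketch. The function names removal_score and best_order are hypothetical, and removal orders are given as positions in the original list (an assumption made here so that duplicate values stay unambiguous). For two neighbours it tries both ways of choosing which neighbour to multiply by and keeps the larger result.

```python
from itertools import permutations

def removal_score(values, order):
    """Total obtained by removing elements of `values` in the given order.

    `order` lists positions in the ORIGINAL list.
    """
    remaining = list(range(len(values)))  # original positions still present
    total = 0
    for idx in order:
        pos = remaining.index(idx)
        x = values[idx]
        left = values[remaining[pos - 1]] if pos > 0 else None
        right = values[remaining[pos + 1]] if pos + 1 < len(remaining) else None
        if left is None and right is None:
            total += x                      # last element: just add it
        elif left is None or right is None:
            neighbour = right if left is None else left
            total += x * neighbour          # one neighbour: product
        else:
            # two neighbours: multiply by one, add the other, keep the better choice
            total += max(x * left + right, x * right + left)
        remaining.pop(pos)
    return total

def best_order(values):
    """Return (best score, one order achieving it) by trying every order."""
    return max(
        (removal_score(values, order), order)
        for order in permutations(range(len(values)))
    )

# The order 4 -> 2 -> 3 -> 5 from the question, written as original positions
print(removal_score([4, 5, 2, 3], (0, 2, 3, 1)))
print(best_order([4, 5, 2, 3])[0])
```

This is only practical for short lists, since it tries all n! orders.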

Consider 4 variants of the problem: those where every element gets consumed, and those where either the left, the right, or both the right and left elements are not consumed.
In each case, you can consider the last element to be removed, and this breaks the problem down into 1 or 2 subproblems.
This solves the problem in O(n^3) time. Here's a Python program that does so. The four solve_ variants correspond to neither endpoint, one or the other endpoint, or both endpoints being fixed (not consumed). No doubt this program can be shortened (there's a lot of duplication).
def solve_00(seq, n, m, cache):
    key = ('00', n, m)
    if key in cache:
        return cache[key]
    assert m >= n
    if n == m:
        return seq[n]
    best = -1e9
    for i in range(n, m+1):
        left = solve_01(seq, n, i, cache) if i > n else 0
        right = solve_10(seq, i, m, cache) if i < m else 0
        best = max(best, left + right + seq[i])
    cache[key] = best
    return best

def solve_01(seq, n, m, cache):
    key = ('01', n, m)
    if key in cache:
        return cache[key]
    assert m >= n + 1
    if m == n + 1:
        return seq[n] * seq[m]
    best = -1e9
    for i in range(n, m):
        left = solve_01(seq, n, i, cache) if i > n else 0
        right = solve_11(seq, i, m, cache) if i < m - 1 else 0
        best = max(best, left + right + seq[i] * seq[m])
    cache[key] = best
    return best

def solve_10(seq, n, m, cache):
    key = ('10', n, m)
    if key in cache:
        return cache[key]
    assert m >= n + 1
    if m == n + 1:
        return seq[n] * seq[m]
    best = -1e9
    for i in range(n+1, m+1):
        left = solve_11(seq, n, i, cache) if i > n + 1 else 0
        right = solve_10(seq, i, m, cache) if i < m else 0
        best = max(best, left + right + seq[n] * seq[i])
    cache[key] = best
    return best

def solve_11(seq, n, m, cache):
    key = ('11', n, m)
    if key in cache:
        return cache[key]
    assert m >= n + 2
    if m == n + 2:
        return max(seq[n] * seq[n+1] + seq[n+2], seq[n] + seq[n+1] * seq[n+2])
    best = -1e9
    for i in range(n + 1, m):
        left = solve_11(seq, n, i, cache) if i > n + 1 else 0
        right = solve_11(seq, i, m, cache) if i < m - 1 else 0
        best = max(best, left + right + seq[i] * seq[n] + seq[m], left + right + seq[i] * seq[m] + seq[n])
    cache[key] = best
    return best

for c in [[1, 1, 1], [4, 2, 3, 5], [1, 2], [1, 2, 3], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]]:
    print(c, solve_00(c, 0, len(c)-1, dict()))

How can I find the minimum index of the array in this case?

We are given an array with n values.
Example: [1,4,5,6,6]
For each index i of the array a, we construct an element of a new array b such that
b[i] = [a[i]/1] + [a[i+1]/2] + [a[i+2]/3] + ⋯ + [a[n]/(n−i+1)], where [.] denotes the greatest integer (floor) function.
We are also given an integer k.
We have to find the minimum i such that b[i] ≤ k.
I know the brute-force O(n^2) algorithm (which builds the array b). Can anybody suggest an approach with a better time complexity?
For example, for the input [1,2,3], k=3, the output is 1 (the minimum index).
Here, a[1]=1; a[2]=2; a[3]=3;
Now, b[1] = [a[1]/1] + [a[2]/2] + [a[3]/3] = [1/1] + [2/2] + [3/3] = 3;
b[2] = [a[2]/1] + [a[3]/2] = [2/1] + [3/2] = 3;
b[3] = [a[3]/1] = [3/1] = 3
We have to find the smallest index i such that b[i] <= k. Here k = 3 and b[1] <= 3, so 1 is our answer.
Constraints: time limit 2 seconds, 1 <= a[i] <= 10^5, 1 <= n <= 10^5, 1 <= k <= 10^9

Here's an O(n √A)-time algorithm to compute the b array where n is the number of elements in the a array and A is the maximum element of the a array.
This algorithm computes the difference sequence of the b array (∆b = b[0], b[1] - b[0], b[2] - b[1], ..., b[n-1] - b[n-2]) and derives b itself as the cumulative sums. Since taking differences is a linear operation, we can start with ∆b = 0, 0, ..., 0, loop over each element a[i], and add the difference sequence for [a[i]], [a[i]/2], [a[i]/3], ... at the appropriate spot. The key is that this difference sequence is sparse (fewer than 2√a[i] nonzero elements). For example, for a[i] = 36,
>>> import operator
>>> [36//j for j in range(1,37)]
[36, 18, 12, 9, 7, 6, 5, 4, 4, 3, 3, 3, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
>>> list(map(operator.sub,_,[0]+_[:-1]))
[36, -18, -6, -3, -2, -1, -1, -1, 0, -1, 0, 0, -1, 0, 0, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
We can derive the difference sequence from a subroutine that, given a positive integer r, returns all maximal pairs of positive integers (p, q) such that pq ≤ r.
See complete Python code below.
def maximal_pairs(r):
    p = 1
    q = r
    while p < q:
        yield (p, q)
        p += 1
        q = r // p
    while q > 0:
        p = r // q
        yield (p, q)
        q -= 1

def compute_b_fast(a):
    n = len(a)
    delta_b = [0] * n
    for i, ai in enumerate(a):
        previous_j = i
        for p, q in maximal_pairs(ai):
            delta_b[previous_j] += q
            j = i + p
            if j >= n:
                break
            delta_b[j] -= q
            previous_j = j
    for i in range(1, n):
        delta_b[i] += delta_b[i - 1]
    return delta_b

def compute_b_slow(a):
    n = len(a)
    b = [0] * n
    for i, ai in enumerate(a):
        for j in range(n - i):
            b[i + j] += ai // (j + 1)
    return b

for n in range(1, 100):
    print(list(maximal_pairs(n)))
lst = [1, 34, 3, 2, 9, 21, 3, 2, 2, 1]
print(compute_b_fast(lst))
print(compute_b_slow(lst))
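
Note that compute_b_slow above adds a[i] // (j + 1) to b[i + j], so each element contributes to the entries after it, which is the mirror image of the definition in the question (where b[i] sums over the elements from i onward). The following is a minimal sketch of the same difference-array idea written directly in the question's orientation. The function names compute_b and first_index_at_most are hypothetical, lists are 0-based, and the returned index is 1-based to match the question.

```python
def compute_b(a):
    """b[i] = sum over j >= i of a[j] // (j - i + 1), with 0-based indices."""
    n = len(a)
    diff = [0] * (n + 1)              # difference array over i
    for j, v in enumerate(a):
        d = 1                         # divisor d = j - i + 1, ranging over 1..j+1
        while d <= j + 1:
            q = v // d
            if q == 0:
                break                 # all larger divisors contribute 0
            d_hi = min(v // q, j + 1) # largest divisor with the same quotient
            # divisors d..d_hi correspond to i in [j - d_hi + 1, j - d + 1]
            diff[j - d_hi + 1] += q
            diff[j - d + 2] -= q
            d = d_hi + 1
    b, running = [], 0
    for i in range(n):
        running += diff[i]            # cumulative sum recovers b
        b.append(running)
    return b

def first_index_at_most(b, k):
    """Smallest 1-based i with b[i] <= k, or None if there is none."""
    for i, value in enumerate(b, start=1):
        if value <= k:
            return i
    return None

a = [1, 34, 3, 2, 9, 21, 3, 2, 2, 1]
b = compute_b(a)
print(b)
print(first_index_at_most(b, 20))
```

Each element touches only one difference-array entry pair per distinct quotient, which is where the O(n √A) bound comes from; the final scan for the first b[i] <= k is linear.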

This probably cannot reach the efficiency of David Eisenstat's answer but since I spent quite a long time figuring out an implementation, I thought I'd leave it up anyway. As it is, it seems about O(n^2).
The elements of b[i] may be out of order, but sections of them are not:
[a[1]/1] + [a[2]/2] + [a[3]/3]
           |------ s2_1 -----|
                      |-s1_1-|
[a[2]/1] + [a[3]/2]
|------ s2_2 -----|
           |-s1_2-|
[a[3]/1]
|-s1_3-|
s2_1 < s2_2
s1_1 < s1_2 < s1_3
Binary search for k on s1. Any result with an s1_i greater than k will rule out a section of ordered rows (rows are b_is).
Binary search for k on s2 on the remaining rows. Any result with an s2_i greater than k will rule out a section of ordered rows (rows are b_is).
This wouldn't help much since in the worst case, we'd have O(n^2 * log n) complexity, greater than O(n^2).
But we can also search horizontally. If we know that b_i ≤ k, then it will rule out both all rows with greater or equal length and the need to search smaller s(m)s, not because smaller s(m)s cannot produce a sum >= k, but because they will necessarily produce one with a higher i and we are looking for the minimum i.
JavaScript code:
var sum_width_iterations = 0
var total_width_summed = 0
var sum_width_cache = {}
function sum_width(A, i, width){
let key = `${i},${width}`
if (sum_width_cache.hasOwnProperty(key))
return sum_width_cache[key]
sum_width_iterations++
total_width_summed += width
let result = 0
for (let j=A.length-width; j<A.length; j++)
result += ~~(A[j] / (j + 1 - i))
return sum_width_cache[key] = result
}
function get_b(A){
let result = []
A.map(function(a, i){
result.push(sum_width(A, i, A.length - i))
})
return result
}
function find_s_greater_than_k(A, width, low, high, k){
let mid = low + ((high - low) >> 1)
let s = sum_width(A, mid, width)
while (low <= high){
mid = low + ((high - low) >> 1)
s = sum_width(A, mid, width)
if (s > k)
high = mid - 1
else
low = mid + 1
}
return [mid, s]
}
function f(A, k, l, r){
let n = A.length
if (l > r){
console.log(`l > r: l, r: ${l}, ${r}`)
return [n + 1, Infinity]
}
let width = n - l
console.log(`\n(call) width, l, r: ${width}, ${l}, ${r}`)
let mid = l + ((r - l) >> 1)
let mid_width = n - mid
console.log(`mid: ${mid}`)
console.log('mid_width: ' + mid_width)
let highest_i = n - mid_width
let [i, s] = find_s_greater_than_k(A, mid_width, 0, highest_i, k)
console.log(`hi_i, s,i,k: ${highest_i}, ${s}, ${i}, ${k}`)
if (mid_width == width)
return [i, s]
// either way we need to look left
// and down
console.log(`calling left`)
let [li, ls] = f(A, k, l, mid - 1)
// if i is the highest, width is
// the width of b_i
console.log(`got left: li, ls, i, high_i: ${li}, ${ls}, ${i}, ${highest_i}`)
if (i == highest_i){
console.log(`i == highest_i, s <= k: ${s <= k}`)
// b_i is small enough
if (s <= k){
if (ls <= k)
return [li, ls]
else
return [i, s]
// b_i is larger than k
} else {
console.log(`b_i > k`)
let [ri, rs] = f(A, k, mid + 1, r)
console.log(`ri, rs: ${ri}, ${rs}`)
if (ls <= k)
return [li, ls]
else if (rs <= k)
return [ri, rs]
else
return [i, s]
}
// i < highest_i
} else {
console.log(`i < highest_i: high_i, i, s, li, ls, mid, mid_width, width, l, r: ${highest_i}, ${i}, ${s}, ${li}, ${ls}, ${mid}, ${mid_width}, ${width}, ${l}, ${r}`)
// get the full sum for this b
let b_i = sum_width(A, i, n - i)
console.log(`b_i: ${b_i}`)
// suffix sum is less than k
// so we cannot rule out either side
if (s < k){
console.log(`s < k`)
let ll = l
let lr = mid - 1
let [lli, lls] = f(A, k, ll, lr)
console.log(`ll, lr, lli, lls: ${ll}, ${lr}, ${lli}, ${lls}`)
// b_i is a match so we don't
// need to look to the right
if (b_i <= k){
console.log(`b_i <= k: i, b_i: ${i}, ${b_i}`)
if (lls <= k)
return [lli, lls]
else
return [i, b_i]
// b_i > k
} else {
console.log(`b_i > k: i, b_i: ${i}, ${b_i}`)
let rl = mid + 1
let rr = r
let [rri, rrs] = f(A, k, rl, rr)
console.log(`rl, rr, rri, rrs: ${rl}, ${rr}, ${rri}, ${rrs}`)
// return the best of right
// and left sections
if (lls <= k)
return [lli, lls]
else if (rrs <= k)
return [rri, rrs]
else
return [i, b_i]
}
// suffix sum is greater than or
// equal to k so we can rule out
// this and all higher rows (`b`s)
// that share this suffix
} else {
console.log(`s >= k`)
let ll = l
// the suffix rules out b_i
// and above
let lr = i - 1
let [lli, lls] = f(A, k, ll, lr)
console.log(`ll, lr, lli, lls: ${ll}, ${lr}, ${lli}, ${lls}`)
let rl = highest_i + 1
let rr = r
let [rri, rrs] = f(A, k, rl, rr)
console.log(`rl, rr, rri, rrs: ${rl}, ${rr}, ${rri}, ${rrs}`)
// return the best of right
// and left sections
if (lls <= k)
return [lli, lls]
else if (rrs <= k)
return [rri, rrs]
else
return [i, b_i]
}
}
}
let lst = [1, 2, 3, 1]
// b [3, 3, 3, 1]
lst = [ 1, 34, 3, 2, 9, 21, 3, 2, 2, 1]
// b [23, 41, 12, 13, 20, 22, 4, 3, 2, 1]
console.log(
JSON.stringify(f(lst, 20, 0, lst.length)))
console.log(`sum_width_iterations: ${sum_width_iterations}`)
console.log(`total_width_summed: ${total_width_summed}`)

Why should calculating b[i] lead to O(n²)? If i = 1, it takes n steps. If i = n, it takes one step to calculate b[i]...
You can speed up the calculation by aborting the sum as soon as it exceeds k.
Let a in N^n
Let k in N
for (i1 := 1; i1 <= n; i1++)
    b := 0
    for (i2 := i1; i2 <= n; i2++) // This loop is the calculation of b[i1]
        b := b + floor(a[i2]/(i2 - i1 + 1))
        if (b > k)
            break
    if (b <= k) // the inner loop finished without exceeding k
        return i1
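
The early-exit idea can be sketched in Python as follows. The function name min_index_early_exit is hypothetical, the input list is 0-based, and the returned index is 1-based to match the question. Because every a[i] is at least 1, each term is non-negative, so a partial sum that already exceeds k can be abandoned safely.

```python
def min_index_early_exit(a, k):
    """Smallest 1-based i with b[i] <= k, or None; `a` is a 0-based list."""
    n = len(a)
    for i in range(n):
        total = 0
        for j in range(i, n):
            total += a[j] // (j - i + 1)  # floor division = greatest integer
            if total > k:
                break                     # b[i] already exceeds k
        else:
            return i + 1                  # inner loop finished without a break
    return None

print(min_index_early_exit([1, 2, 3], 3))
print(min_index_early_exit([1, 34, 3, 2, 9, 21, 3, 2, 2, 1], 20))
```

The early exit helps when sums overshoot k quickly, but the worst case remains quadratic in n.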

How can I fix a multiplicity issue in a Mathematica 10.0 loop?

I am working on a project in Mathematica 10, and I think the best way to do it is with a loop such as For or Do. After building it, I obtain the results I am looking for, but with too much multiplicity (too many levels of list nesting). Here is the isolated part of the code:
(*Initializing variables*)
epot[0] = 1; p[0] = 1; \[Psi][0] = HermiteH[0, x] E^(-(x^2/2));
e[n_] := e[n] = epot[n];
(*Defining function*)
\[Psi][n_] := \[Psi][n] = (Sum[p[k]*x^k,{k,0,4*n}]) \[Psi][0];
(*Differential equation*)
S = - D[D[\[Psi][n], x], x] + x^2 \[Psi][n] + x^4 \[Psi][n - 1] - Sum[e[n-k]*\[Psi][k],{k,0,n}];
(*Construction of the loop*)
S1 = Collect[E^(x^2/2) S, x, Simplify];
c = Coefficient[S1, x, 0];
sol = Solve[c == 0, epot[n]]; e[n] = epot[n] /. sol;
For[j = 1, j <= 4 n, j++,
c = Coefficient[S1, x, j];
sol = Solve[c == 0, p[j]];
p[j] = p[j] /. sol;];
(*Results*)
Print[Subscript[e, n], "= ", e[n] // InputForm];
Subscript[e, 1]= {{{3/4}}}
Print[ArrayDepth[e[n]]];
3 (*Multiplicity, it should be 1*)
Print[Subscript[\[Psi], n], "= ", \[Psi][n]];
Subscript[\[Psi], 1]= {{E^(-(x^2/2)) (1-(3 x^2)/8-x^4/8)}}
Print[ArrayDepth[\[Psi][n]]];
2 (*Multiplicity, it should be 1*)
After this calculation, the remaining question is how to substitute these results back into the original functions. Thank you very much.

"Inverted" Selection Sort in Mathematica 8

I'm having trouble with this code. It is meant to implement the Selection Sort algorithm in Mathematica, but inverted: instead of searching for the smallest number and placing it in the first position of the list, I need to search for the biggest one and place it in the last position.
I've written this code, but as I'm new to Mathematica, I can't find the solution. It doesn't sort the list. Thank you very much for reading; your answers will be helpful!
L = {};
n = Input["Input the size of the list (a number): "];
For[i = 1, i <= n, m = Input["Input a number to place in the list:"];
L = Append[L, m]; i++]
SelectSort[L] :=
Module[{n = 1, temp, xi = L, j}, While[n <= Length@L, temp = xi[[n]];
For[j = n, j <= Length@L, j++, If[xi[[j]] < temp, temp = xi[[j]]];];
xi[[n ;;]] = {temp}~Join~
Delete[xi[[n ;;]], First@Position[xi[[n ;;]], temp]];
n++;];
xi]
Print[L]

Here is a working version. In the SelectSort[] function I only had to change the function variable to a pattern variable, i.e. L_. Other than that it seems to work.
(* Function definition *)
SelectSort[L_] := Module[{n = 1, temp, xi = L, j},
  While[n <= Length@L,
    temp = xi[[n]];
    For[j = n, j <= Length@L, j++,
      If[xi[[j]] < temp, temp = xi[[j]]];
    ];
    xi[[n ;;]] = {temp}~Join~
      Delete[xi[[n ;;]], First@Position[xi[[n ;;]], temp]];
    n++;];
  xi]
(* Run section *)
L = {};
n = Input["Input the size of the list (a number): "];
For[i = 1, i <= n, m = Input["Input a number to place in the list:"];
L = Append[L, m]; i++]
SelectSort[L]
Print[L]
{3, 3, 5, 7, 8}
{8, 3, 5, 7, 3}
The output is first the sorted list returned by SelectSort[L], then the original input list, L.
