I have this distribution below:
using Distributions
struct OrthoNNDist <: DiscreteMultivariateDistribution
x0::Vector{Int64}
oc::Array{Int64,2}
x1s::Array
prob::Float64
#return a new uniform distribution with all vectors in x1s orthogonal to oc
function OrthoNNDist(x0::Vector{Int}, oc::Array{Int,2})
x1s = []
for i = 1:size(oc)[2]
x1 = x0 + oc[:, i]
if nonneg(x1)
push!(x1s, x1)
end
x1 = x0 - oc[:, i]
if nonneg(x1)
push!(x1s, x1)
end
end
new(x0, oc, x1s, 1.0/length(x1s))
end
end
Base.length(d::OrthoNNDist) = length(d.x0)
Distributions.rand(d::OrthoNNDist, N::Integer=1) = rand(d.x1s, 1)
Distributions.pdf(d::OrthoNNDist, x::Vector) = x in d.x1s ? D.prob : 0.0
Distributions.pdf(d::OrthoNNDist) = fill(d.prob, size(d.x1s))
Distributions.logpdf(d::OrthoNNDist, x::Vector) = log(PDF(d, x))
and I want to generate random values from it, I don't know how I've tried:rand(OrthoNNDist,1000)and it didn't work, I'm kinda new on probabilistic programming, I don't know how I can do that.
The function nonneg is no longer provided, that is easy enough to remedy:
nonneg(x::Real) = zero(x) <= x
nonneg(x::Vector{<:Real}) = all(nonneg, x)
Almost all of what you wrote is fine:
using Distributions
struct OrthoNNDist <: DiscreteMultivariateDistribution
x0::Vector{Int64}
oc::Array{Int64,2}
x1s::Array
prob::Float64
#return a new uniform distribution with all vectors in x1s orthogonal to oc
function OrthoNNDist(x0::Vector{Int}, oc::Array{Int,2})
x1s = []
for i = 1:size(oc)[2]
x1 = x0 + oc[:, i]
if nonneg(x1)
push!(x1s, x1)
end
x1 = x0 - oc[:, i]
if nonneg(x1)
push!(x1s, x1)
end
end
new(x0, oc, x1s, 1.0/length(x1s))
end
end
Base.length(d::OrthoNNDist) = length(d.x0)
Distributions.pdf(d::OrthoNNDist, x::Vector) = x in d.x1s ? D.prob : 0.0
Distributions.pdf(d::OrthoNNDist) = fill(d.prob, size(d.x1s))
Distributions.logpdf(d::OrthoNNDist, x::Vector) = log(PDF(d, x))
To get rand working .. you are almost there
using Distributions: rand
Distributions.rand(d::OrthoNNDist, n::Int=1) = rand(d.x1s, n)
now with some data
julia> x0 = rand(1:1_000_000,5);
julia> oc = reshape(rand(1:1_000_000,25), (5,5));
julia> dist = OrthoNNDist(x0,oc);
julia> Distributions.rand(dist, 4)
4-element Array{Any,1}:
[1330729, 656190, 927615, 470782, 1435138]
[1382946, 1058057, 778316, 488440, 1304526]
[1330729, 656190, 927615, 470782, 1435138]
[1409093, 353679, 454229, 698320, 1271674]
(thanks to Mohamed Tarek for his direction)
Related
I am running this code but it shows a method error. Can someone please help me?
Code:
function lsmc_am_put(S, K, r, σ, t, N, P)
Δt = t / N
R = exp(r * Δt)
T = typeof(S * exp(-σ^2 * Δt / 2 + σ * √Δt * 0.1) / R)
X = Array{T}(N+1, P)
for p = 1:P
X[1, p] = x = S
for n = 1:N
x *= R * exp(-σ^2 * Δt / 2 + σ * √Δt * randn())
X[n+1, p] = x
end
end
V = [max(K - x, 0) / R for x in X[N+1, :]]
for n = N-1:-1:1
I = V .!= 0
A = [x^d for d = 0:3, x in X[n+1, :]]
β = A[:, I]' \ V[I]
cV = A' * β
for p = 1:P
ev = max(K - X[n+1, p], 0)
if I[p] && cV[p] < ev
V[p] = ev / R
else
V[p] /= R
end
end
end
return max(mean(V), K - S)
end
lsmc_am_put(100, 90, 0.05, 0.3, 180/365, 1000, 10000)
error:
MethodError: no method matching (Array{Float64})(::Int64, ::Int64)
Closest candidates are:
(Array{T})(::LinearAlgebra.UniformScaling, ::Integer, ::Integer) where T at /Volumes/Julia-1.8.3/Julia-1.8.app/Contents/Resources/julia/share/julia/stdlib/v1.8/LinearAlgebra/src/uniformscaling.jl:508
(Array{T})(::Nothing, ::Any...) where T at baseext.jl:45
(Array{T})(::UndefInitializer, ::Int64) where T at boot.jl:473
...
Stacktrace:
[1] lsmc_am_put(S::Int64, K::Int64, r::Float64, σ::Float64, t::Float64, N::Int64, P::Int64)
# Main ./REPL[39]:5
[2] top-level scope
# REPL[40]:1
I tried this code and I was expecting a numeric answer but this error came up. I tried to look it up on google but I found nothing that matches my situation.
The error occurs where you wrote X = Array{T}(N+1, P). Instead, use one of the following approaches if you need a Vector:
julia> Array{Float64, 1}([1,2,3])
3-element Vector{Float64}:
1.0
2.0
3.0
julia> Vector{Float64}([1, 2, 3])
3-element Vector{Float64}:
1.0
2.0
3.0
And in your case, you should write X = Array{T,1}([N+1, P]) or X = Vector{T}([N+1, P]). But since there's such a X[1, p] = x = S expression in your code, I guess you mean to initialize a 2D array and update its elements through the algorithm. For this, you can define X like the following:
X = zeros(Float64, N+1, P)
# Or
X = Array{Float64, 2}(undef, N+1, P)
So, I tried the following in your code:
# I just changed the definition of `X` in your code like the following
X = Array{T, 2}(undef, N+1, P)
#And the result of the code was:
julia> lsmc_am_put(100, 90, 0.05, 0.3, 180/365, 1000, 10000)
3.329213731484463
I have a data cube a of radius w and for every element of that cube, I would like to add the element and all surrounding values within a cube of radius r, where r < w. The result should be returned in an array of the same shape, b.
As a simple example, suppose:
a = numpy.ones(shape=(2*w,2*w,2*w),dtype='float32')
kernel = numpy.ones(shape=(2*r,2*r,2*r),dtype='float32')
b = convolve(a,kernel,mode='constant',cval=0)
then b would have the value (2r)(2r)(2r) for all the indices not on the edge.
Currently I am using a loop to do this and it is very slow, especially for larger w and r. I tried scipy convolution but got little speedup over the loop. I am now looking at numba's parallel computation feature but cannot figure out how to rewrite the code to work with numba. I have a Nvidia RTX card so CUDA GPU calculations are also possible.
Suggestions are welcome.
Here is my current code:
for x in range(0,w*2):
print(x)
for y in range(0,w*2):
for z in range(0,w*2):
if x >= r:
x1 = x - r
else:
x1 = 0
if x < w*2-r:
x2 = x + r
else:
x2 = w*2 - 1
if y >= r:
y1 = y - r
else:
y1 = 0
if y < w*2-r:
y2 = y + r
else:
y2 = w*2 - 1
if z >= r:
z1 = z - r
else:
z1 = 0
if z < w*2-r:
z2 = z + r
else:
z2 = w*2 - 1
b[x][y][z] = numpy.sum(a[x1:x2,y1:y2,z1:z2])
return b
Here is a very simple version of your code so that it works with numba. I was finding speed-ups of a factor of 10 relative to the pure numpy code. However, you should be able to get even greater speed-ups using a FFT convolution algorithm (e.g. scipy's fftconvolve). Can you share your attempt at getting convolution to work?
from numba import njit
#njit
def sum_cubes(a,b,w,r):
for x in range(0,w*2):
#print(x)
for y in range(0,w*2):
for z in range(0,w*2):
if x >= r:
x1 = x - r
else:
x1 = 0
if x < w*2-r:
x2 = x + r
else:
x2 = w*2 - 1
if y >= r:
y1 = y - r
else:
y1 = 0
if y < w*2-r:
y2 = y + r
else:
y2 = w*2 - 1
if z >= r:
z1 = z - r
else:
z1 = 0
if z < w*2-r:
z2 = z + r
else:
z2 = w*2 - 1
b[x,y,z] = np.sum(a[x1:x2,y1:y2,z1:z2])
return b
EDIT: Your original code has a small bug in it. The way numpy indexing works, the final line should be
b[x,y,z] = np.sum(a[x1:x2+1,y1:y2+1,z1:z2+1])
unless you want the cube to be off-centre.
Assuming you do want the cube to be centred, then a much faster way to do this calculation is using scipy's uniform filter:
from scipy.ndimage import uniform_filter
def sum_cubes_quickly(a,b,w,r):
b = uniform_filter(a,mode='constant',cval=0,size=2*r+1)*(2*r+1)**3
return b
A few quick runtime comparisons for randomly generated data with w = 50, r = 10:
Original raw numpy code - 15.1 sec
Numba'd numpy code - 8.1 sec
uniform_filter - 13.1 ms
Does this code have mutation, selection, and crossover, just like the original genetic algorithm.
Since this, a hybrid algorithm (i.e PSO with GA) does it use all steps of original GA or skips some
of them.Please do tell me.
I am just new to this and still trying to understand. Thank you.
%%% Hybrid GA and PSO code
function [gbest, gBestScore, all_scores] = QAP_PSO_GA(CreatePopFcn, FitnessFcn, UpdatePosition, ...
nCity, nPlant, nPopSize, nIters)
% Set algorithm parameters
constant = 0.95;
c1 = 1.5; %1.4944; %2;
c2 = 1.5; %1.4944; %2;
w = 0.792 * constant;
% Allocate memory and initialize
gBestScore = inf;
all_scores = inf * ones(nPopSize, nIters);
x = CreatePopFcn(nPopSize, nCity);
v = zeros(nPopSize, nCity);
pbest = x;
% update lbest
cost_p = inf * ones(1, nPopSize); %feval(FUN, pbest');
for i=1:nPopSize
cost_p(i) = FitnessFcn(pbest(i, 1:nPlant));
end
lbest = update_lbest(cost_p, pbest, nPopSize);
for iter = 1 : nIters
if mod(iter,1000) == 0
parents = randperm(nPopSize);
for i = 1:nPopSize
x(i,:) = (pbest(i,:) + pbest(parents(i),:))/2;
% v(i,:) = pbest(parents(i),:) - x(i,:);
% v(i,:) = (v(i,:) + v(parents(i),:))/2;
end
else
% Update velocity
v = w*v + c1*rand(nPopSize,nCity).*(pbest-x) + c2*rand(nPopSize,nCity).*(lbest-x);
% Update position
x = x + v;
x = UpdatePosition(x);
end
% Update pbest
cost_x = inf * ones(1, nPopSize);
for i=1:nPopSize
cost_x(i) = FitnessFcn(x(i, 1:nPlant));
end
s = cost_x<cost_p;
cost_p = (1-s).*cost_p + s.*cost_x;
s = repmat(s',1,nCity);
pbest = (1-s).*pbest + s.*x;
% update lbest
lbest = update_lbest(cost_p, pbest, nPopSize);
% update global best
all_scores(:, iter) = cost_x;
[cost,index] = min(cost_p);
if (cost < gBestScore)
gbest = pbest(index, :);
gBestScore = cost;
end
% draw current fitness
figure(1);
plot(iter,min(cost_x),'cp','MarkerEdgeColor','k','MarkerFaceColor','g','MarkerSize',8)
hold on
str=strcat('Best fitness: ', num2str(min(cost_x)));
disp(str);
end
end
% Function to update lbest
function lbest = update_lbest(cost_p, x, nPopSize)
sm(1, 1)= cost_p(1, nPopSize);
sm(1, 2:3)= cost_p(1, 1:2);
[cost, index] = min(sm);
if index==1
lbest(1, :) = x(nPopSize, :);
else
lbest(1, :) = x(index-1, :);
end
for i = 2:nPopSize-1
sm(1, 1:3)= cost_p(1, i-1:i+1);
[cost, index] = min(sm);
lbest(i, :) = x(i+index-2, :);
end
sm(1, 1:2)= cost_p(1, nPopSize-1:nPopSize);
sm(1, 3)= cost_p(1, 1);
[cost, index] = min(sm);
if index==3
lbest(nPopSize, :) = x(1, :);
else
lbest(nPopSize, :) = x(nPopSize-2+index, :);
end
end
If you are new to Optimization, I recommend you first to study each algorithm separately, then you may study how GA and PSO maybe combined, Although you must have basic mathematical skills in order to understand the operators of the two algorithms and in order to test the efficiency of these algorithm (this is what really matter).
This code chunk is responsible for parent selection and crossover:
parents = randperm(nPopSize);
for i = 1:nPopSize
x(i,:) = (pbest(i,:) + pbest(parents(i),:))/2;
% v(i,:) = pbest(parents(i),:) - x(i,:);
% v(i,:) = (v(i,:) + v(parents(i),:))/2;
end
Is not really obvious how selection randperm is done (I have no experience about Matlab).
And this is the code that is responsible for updating the velocity and position of each particle:
% Update velocity
v = w*v + c1*rand(nPopSize,nCity).*(pbest-x) + c2*rand(nPopSize,nCity).*(lbest-x);
% Update position
x = x + v;
x = UpdatePosition(x);
This version of velocity updating strategy is utilizing what is called Interia-Weight W, which basically mean we are preserving the velocity history of each particle (not completely recomputing it).
It worth mentioning that velocity updating is done more often than crossover (each 1000 iteration).
I have two lines:
y = -1/3x + 4
y = 3x + 85
The intersection is at [24.3, 12.1].
I have a set of coordinates prepared:
points = [[1, 3], [4, 8], [25, 10], ... ]
#y = -1/3x + b
m_regr = -1/3
b_regr = 4
m_perp = 3 #(1 / m_regr * -1)
distances = []
points.each do |pair|
x1 = pair.first
y2 = pair.last
x2 = ((b_perp - b_regr / (m_regr - m_perp))
y2 = ((m_regr * b_perp) / (m_perp * b_regr))/(m_regr - m_perp)
distance = Math.hypot((y2 - y1), (x2 - x1))
distances << distance
end
Is there a gem or some better method for this?
NOTE: THE ABOVE METHOD DOES NOT WORK. See my answer for a solution that works.
What's wrong with using a little math?
If you have:
y = m1 x + b1
y = m2 x + b2
It's a simple system of linear equations.
If you solve them, your intersection is:
x = (b2 - b1)/(m1 - m2)
y = (m1 b2 - m2 b1)/(m1 - m2)
After much suffering and many different tries, I found a simple algebraic method here that not only works but is dramatically simplified.
distance = ((y - mx - b).abs / Math.sqrt(m**2 + 1))
where x and y are the coordinates for the known point.
For Future Googlers:
def solution k, l, m, n, p, q, r, s
intrsc_x1 = m - k
intrsc_y1 = n - l
intrsc_x2 = r - p
intrsc_y2 = s - q
v1 = (-intrsc_y1 * (k - p) + intrsc_x1 * (l - q)) / (-intrsc_x2 * intrsc_y1 + intrsc_x1 * intrsc_y2);
v2 = ( intrsc_x2 * (l - q) - intrsc_y2 * (k - p)) / (-intrsc_x2 * intrsc_y1 + intrsc_x1 * intrsc_y2);
(v1 >= 0 && v1 <= 1 && v2 >= 0 && v2 <= 1) ? true : false
end
The simplest and cleanest way I've found on the internet.
Problem Hey folks. I'm looking for some advice on python performance. Some background on my problem:
Given:
A (x,y) mesh of nodes each with a value (0...255) starting at 0
A list of N input coordinates each at a specified location within the range (0...x, 0...y)
A value Z that defines the "neighborhood" in count of nodes
Increment the value of the node at the input coordinate and the node's neighbors. Neighbors beyond the mesh edge are ignored. (No wrapping)
BASE CASE: A mesh of size 1024x1024 nodes, with 400 input coordinates and a range Z of 75 nodes.
Processing should be O(x*y*Z*N). I expect x, y and Z to remain roughly around the values in the base case, but the number of input coordinates N could increase up to 100,000. My goal is to minimize processing time.
Current results Between my start and the comments below, we've got several implementations.
Running speed on my 2.26 GHz Intel Core 2 Duo with Python 2.6.1:
f1: 2.819s
f2: 1.567s
f3: 1.593s
f: 1.579s
f3b: 1.526s
f4: 0.978s
f1 is the initial naive implementation: three nested for loops.
f2 is replaces the inner for loop with a list comprehension.
f3 is based on Andrei's suggestion in the comments and replaces the outer for with map()
f is Chris's suggestion in the answers below
f3b is kriss's take on f3
f4 is Alex's contribution.
Code is included below for your perusal.
Question How can I further reduce the processing time? I'd prefer sub-1.0s for the test parameters.
Please, keep the recommendations to native Python. I know I can move to a third-party package such as numpy, but I'm trying to avoid any third party packages. Also, I've generated random input coordinates, and simplified the definition of the node value updates to keep our discussion simple. The specifics have to change slightly and are outside the scope of my question.
thanks much!
**`f1` is the initial naive implementation: three nested `for` loops.**
def f1(x,y,n,z):
rows = [[0]*x for i in xrange(y)]
for i in range(n):
inputX, inputY = (int(x*random.random()), int(y*random.random()))
topleft = (inputX - z, inputY - z)
for i in xrange(max(0, topleft[0]), min(topleft[0]+(z*2), x)):
for j in xrange(max(0, topleft[1]), min(topleft[1]+(z*2), y)):
if rows[i][j] <= 255: rows[i][j] += 1
f2 is replaces the inner for loop with a list comprehension.
def f2(x,y,n,z):
rows = [[0]*x for i in xrange(y)]
for i in range(n):
inputX, inputY = (int(x*random.random()), int(y*random.random()))
topleft = (inputX - z, inputY - z)
for i in xrange(max(0, topleft[0]), min(topleft[0]+(z*2), x)):
l = max(0, topleft[1])
r = min(topleft[1]+(z*2), y)
rows[i][l:r] = [j+(j<255) for j in rows[i][l:r]]
UPDATE: f3 is based on Andrei's suggestion in the comments and replaces the outer for with map(). My first hack at this requires several out-of-local-scope lookups, specifically recommended against by Guido: local variable lookups are much faster than global or built-in variable lookups I hardcoded all but the reference to the main data structure itself to minimize that overhead.
rows = [[0]*x for i in xrange(y)]
def f3(x,y,n,z):
inputs = [(int(x*random.random()), int(y*random.random())) for i in range(n)]
rows = map(g, inputs)
def g(input):
inputX, inputY = input
topleft = (inputX - 75, inputY - 75)
for i in xrange(max(0, topleft[0]), min(topleft[0]+(75*2), 1024)):
l = max(0, topleft[1])
r = min(topleft[1]+(75*2), 1024)
rows[i][l:r] = [j+(j<255) for j in rows[i][l:r]]
UPDATE3: ChristopeD also pointed out a couple improvements.
def f(x,y,n,z):
rows = [[0] * y for i in xrange(x)]
rn = random.random
for i in xrange(n):
topleft = (int(x*rn()) - z, int(y*rn()) - z)
l = max(0, topleft[1])
r = min(topleft[1]+(z*2), y)
for u in xrange(max(0, topleft[0]), min(topleft[0]+(z*2), x)):
rows[u][l:r] = [j+(j<255) for j in rows[u][l:r]]
UPDATE4: kriss added a few improvements to f3, replacing min/max with the new ternary operator syntax.
def f3b(x,y,n,z):
rn = random.random
rows = [g1(x, y, z) for x, y in [(int(x*rn()), int(y*rn())) for i in xrange(n)]]
def g1(x, y, z):
l = y - z if y - z > 0 else 0
r = y + z if y + z < 1024 else 1024
for i in xrange(x - z if x - z > 0 else 0, x + z if x + z < 1024 else 1024 ):
rows[i][l:r] = [j+(j<255) for j in rows[i][l:r]]
UPDATE5: Alex weighed in with his substantive revision, adding a separate map() operation to cap the values at 255 and removing all non-local-scope lookups. The perf differences are non-trivial.
def f4(x,y,n,z):
rows = [[0]*y for i in range(x)]
rr = random.randrange
inc = (1).__add__
sat = (0xff).__and__
for i in range(n):
inputX, inputY = rr(x), rr(y)
b = max(0, inputX - z)
t = min(inputX + z, x)
l = max(0, inputY - z)
r = min(inputY + z, y)
for i in range(b, t):
rows[i][l:r] = map(inc, rows[i][l:r])
for i in range(x):
rows[i] = map(sat, rows[i])
Also, since we all seem to be hacking around with variations, here's my test harness to compare speeds: (improved by ChristopheD)
def timing(f,x,y,z,n):
fn = "%s(%d,%d,%d,%d)" % (f.__name__, x, y, z, n)
ctx = "from __main__ import %s" % f.__name__
results = timeit.Timer(fn, ctx).timeit(10)
return "%4.4s: %.3f" % (f.__name__, results / 10.0)
if __name__ == "__main__":
print timing(f, 1024, 1024, 400, 75)
#add more here.
On my (slow-ish;-) first-day Macbook Air, 1.6GHz Core 2 Duo, system Python 2.5 on MacOSX 10.5, after saving your code in op.py I see the following timings:
$ python -mtimeit -s'import op' 'op.f1()'
10 loops, best of 3: 5.58 sec per loop
$ python -mtimeit -s'import op' 'op.f2()'
10 loops, best of 3: 3.15 sec per loop
So, my machine is slower than yours by a factor of a bit more than 1.9.
The fastest code I have for this task is:
def f3(x=x,y=y,n=n,z=z):
rows = [[0]*y for i in range(x)]
rr = random.randrange
inc = (1).__add__
sat = (0xff).__and__
for i in range(n):
inputX, inputY = rr(x), rr(y)
b = max(0, inputX - z)
t = min(inputX + z, x)
l = max(0, inputY - z)
r = min(inputY + z, y)
for i in range(b, t):
rows[i][l:r] = map(inc, rows[i][l:r])
for i in range(x):
rows[i] = map(sat, rows[i])
which times as:
$ python -mtimeit -s'import op' 'op.f3()'
10 loops, best of 3: 3 sec per loop
so, a very modest speedup, projecting to more than 1.5 seconds on your machine - well above the 1.0 you're aiming for:-(.
With a simple C-coded extensions, exte.c...:
#include "Python.h"
static PyObject*
dopoint(PyObject* self, PyObject* args)
{
int x, y, z, px, py;
int b, t, l, r;
int i, j;
PyObject* rows;
if(!PyArg_ParseTuple(args, "iiiiiO",
&x, &y, &z, &px, &py, &rows
))
return 0;
b = px - z;
if (b < 0) b = 0;
t = px + z;
if (t > x) t = x;
l = py - z;
if (l < 0) l = 0;
r = py + z;
if (r > y) r = y;
for(i = b; i < t; ++i) {
PyObject* row = PyList_GetItem(rows, i);
for(j = l; j < r; ++j) {
PyObject* pyitem = PyList_GetItem(row, j);
long item = PyInt_AsLong(pyitem);
if (item < 255) {
PyObject* newitem = PyInt_FromLong(item + 1);
PyList_SetItem(row, j, newitem);
}
}
}
Py_RETURN_NONE;
}
static PyMethodDef exteMethods[] = {
{"dopoint", dopoint, METH_VARARGS, "process a point"},
{0}
};
void
initexte()
{
Py_InitModule("exte", exteMethods);
}
(note: I haven't checked it carefully -- I think it doesn't leak memory due to the correct interplay of reference stealing and borrowing, but it should be code inspected very carefully before being put in production;-), we could do
import exte
def f4(x=x,y=y,n=n,z=z):
rows = [[0]*y for i in range(x)]
rr = random.randrange
for i in range(n):
inputX, inputY = rr(x), rr(y)
exte.dopoint(x, y, z, inputX, inputY, rows)
and the timing
$ python -mtimeit -s'import op' 'op.f4()'
10 loops, best of 3: 345 msec per loop
shows an acceleration of 8-9 times, which should put you in the ballpark you desire. I've seen a comment saying you don't want any third-party extension, but, well, this tiny extension you could make entirely your own;-). ((Not sure what licensing conditions apply to code on Stack Overflow, but I'll be glad to re-release this under the Apache 2 license or the like, if you need that;-)).
1. A (smaller) speedup could definitely be the initialization of your rows...
Replace
rows = []
for i in range(x):
rows.append([0 for i in xrange(y)])
with
rows = [[0] * y for i in xrange(x)]
2. You can also avoid some lookups by moving random.random out of the loops (saves a little).
3. EDIT: after corrections -- you could arrive at something like this:
def f(x,y,n,z):
rows = [[0] * y for i in xrange(x)]
rn = random.random
for i in xrange(n):
topleft = (int(x*rn()) - z, int(y*rn()) - z)
l = max(0, topleft[1])
r = min(topleft[1]+(z*2), y)
for u in xrange(max(0, topleft[0]), min(topleft[0]+(z*2), x)):
rows[u][l:r] = [j+(j<255) for j in rows[u][l:r]]
EDIT: some new timings with timeit (10 runs) -- seems this provides only minor speedups:
import timeit
print timeit.Timer("f1(1024,1024,400,75)", "from __main__ import f1").timeit(10)
print timeit.Timer("f2(1024,1024,400,75)", "from __main__ import f2").timeit(10)
print timeit.Timer("f(1024,1024,400,75)", "from __main__ import f3").timeit(10)
f1 21.1669280529
f2 12.9376120567
f 11.1249599457
in your f3 rewrite, g can be simplified. (Can also be applied to f4)
You have the following code inside a for loop.
l = max(0, topleft[1])
r = min(topleft[1]+(75*2), 1024)
However, it appears that those values never change inside the for loop. So calculate them once, outside the loop instead.
Based on your f3 version I played with the code. As l and r are constants you can avoid to compute them in g1 loop. Also using new ternary if instead of min and max seems to be consistently faster. Also simplified expression with topleft. On my system it appears to be about 20% faster using with the code below.
def f3b(x,y,n,z):
rows = [g1(x, y, z) for x, y in [(int(x*random.random()), int(y*random.random())) for i in range(n)]]
def g1(x, y, z):
l = y - z if y - z > 0 else 0
r = y + z if y + z < 1024 else 1024
for i in xrange(x - z if x - z > 0 else 0, x + z if x + z < 1024 else 1024 ):
rows[i][l:r] = [j+(j<255) for j in rows[i][l:r]]
You can create your own Python module in C, and control the performance as you want:
http://docs.python.org/extending/