I'm trying to create a template matching program that's using the following formula to determine the fit between the template and the image:
My code is as follows:
Halide::Var x, y, xt, yt;
Halide::RDom r(0, t.width(), 0, t.height());
Halide::Func limit, compare;
limit = Halide::BoundaryConditions::constant_exterior(input,255);
compare(x, y) = limit(x,y);
compare(x, y) = Halide::cast<uint8_t>(Halide::pow(t(0 + r.x, 0 + r.y) - limit(x + r.x, y + r.y),2));
Halide::Image<uint8_t> output(input.width(),input.height());
output = compare.realize(input.width(),input.height());
After executing the code above, the result image is shifted, as in this example:
Original image:
Template:
Result:
How can I prevent the image from shifting?
Not sure what the types of t and input are, so the following might overflow, but I think you want something like:
Halide::Var x, y, xt, yt;
Halide::RDom r(0, t.width(), 0, t.height());
Halide::Func limit, compare;
limit = Halide::BoundaryConditions::constant_exterior(input,255);
compare(x, y) = limit(x,y);
compare(x, y) = Halide::cast<uint8_t>(sum(Halide::pow(t(r.x, r.y) - limit(x + r.x - t.width()/2, y + r.y - t.height()/2),2))/(t.width()*t.height()));
Halide::Image<uint8_t> output(input.width(),input.height());
output = compare.realize(input.width(),input.height());
There is no sum. You are only storing the squared difference of the lower right pixel of the template image. There are some other things too, which I commented on:
Halide::Var x, y, xt, yt;
Halide::RDom r(0, t.width(), 0, t.height());
Halide::Func limit, compare;
// There's no point comparing the template to pixels not in the input.
// If you really wanted to do that, you should start at
// -t.width(), -t.height() and wd, ht will be plus the template size
// instead of minus.
int wd = input.width () - t.width ();
int ht = input.height() - t.height();
// constant_exterior returns a Func.
// We can copy all dimensions with an underscore.
limit(_) = Halide::BoundaryConditions::constant_exterior(input,255)(_) / 255.f;
Halide::Func tf;
tf(_) = t(_) / 255.f;
// Not necessary now and even so, should have been set to undef< uint8_t >().
// compare(x, y) = limit(x,y);
// Expr are basically cut and pasted to where they are used.
Halide::Expr sq_dif = Halide::pow(tf(r.x, r.y) - limit(x + r.x, y + r.y), 2);
Halide::Expr t_count = t.width() * t.height();
Halide::Expr val = Halide::sum(sq_dif) / t_count;
compare(x, y) = Halide::cast<uint8_t>(Halide::clamp(255 * val, 0, 255));
// The size of output is set by realize().
Halide::Image<uint8_t> output;
output = compare.realize(wd, ht);
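To make the role of the sum concrete, here is a plain-Python sketch (not Halide) of the same computation. The name match_scores and the nested-list image layout are assumptions for illustration only. Like the listing above it normalizes pixels to [0, 1] and only visits offsets where the template lies fully inside the image, though it also includes the last valid offset.

```python
def match_scores(image, template):
    """Mean squared difference between the template and each window of the image.

    image and template are lists of rows holding 0..255 values.
    A score of 0.0 means a perfect match at that (x, y) offset.
    """
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    scores = []
    for y in range(ih - th + 1):
        row = []
        for x in range(iw - tw + 1):
            total = 0.0
            # This double loop plays the part of the RDom: every template
            # pixel contributes to the score, not just the last one.
            for ry in range(th):
                for rx in range(tw):
                    diff = (template[ry][rx] - image[y + ry][x + rx]) / 255.0
                    total += diff * diff
            row.append(total / (tw * th))
        scores.append(row)
    return scores
```

Because the window is anchored at its top-left corner, scores[y][x] refers to the template placed with its corner at (x, y), so no shift is introduced.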
I think my code below does not give me the same random distribution for each array.
subroutine trig_random_value()
implicit none
integer :: t, z, y, x
real(real64) :: theta, r
real(real64), parameter :: PI=4.D0*DATAN(1.D0)
integer, dimension(12) :: date_time
integer, dimension(12) :: seed
call date_and_time(values=date_time)
call random_seed
seed = date_time(6) * date_time(7) + date_time(8)
call random_seed(put = seed)
do z = 1, z_size
do y = 1, y_size
do x = 1, x_size
theta = rand()*2*PI
r = 0.1*rand()
l1(1, z, y, x) = r*cos(theta)
l2(1, z, y, x) = r*sin(theta)
theta = rand()*2*PI
r = 0.1*rand()
l1(2, z, y, x) = r*cos(theta)
l2(2, z, y, x) = r*sin(theta)
end do
end do
end do
return
end subroutine trig_random_value
With this code I assign random values to l1(1,:,:,:), l1(2,:,:,:), l2(1,:,:,:) and l2(2,:,:,:), where l(t, x, y, z) is a (3+1)-dimensional array.
Why do I use trigonometric functions for the random values? Because I want a circular distribution. If I plot l1(1,:,:,:) against l2(1,:,:,:), or l1(2,:,:,:) against l2(2,:,:,:), I get a disc-shaped distribution with radius 0.1.
Why do I say that this does not give the same distribution each time? Because I measured the variance of each array and got:
variance_l1_t1 = 1.6670507752921395E-003
variance_l1_t2 = 3.3313151655785292E-003
variance_l2_t1 = 4.9965623815717321E-003
variance_l2_t2 = 6.6641054728288360E-003
Notice that (variance_l1_t2 - variance_l1_t1), (variance_l2_t1 - variance_l1_t2) and (variance_l2_t2 - variance_l2_t1) are all about 0.00166.
That is quite a weird result. I should get almost the same variance for l1(1,:,:,:), l1(2,:,:,:), l2(1,:,:,:) and l2(2,:,:,:) if this is a good random function. Maybe I did something wrong.
How can I solve this problem?
Additional information, as requested:
real(real64) function find_variance(l)
implicit none
real(real64), dimension(z_size, y_size, x_size), intent(in) :: l
integer :: z, y, x
real(real64) :: l_avg = 0
real(real64) :: sum_val = 0
do z = 1, z_size
do y = 1, y_size
do x = 1, x_size
l_avg = l_avg + l(z, y, x)
end do
end do
end do
l_avg = l_avg/(z_size*y_size*x_size)
do z = 1, z_size
do y = 1, y_size
do x = 1, x_size
sum_val = sum_val + (l(z , y, x) - l_avg)**2
end do
end do
end do
find_variance = sum_val/(z_size*y_size*x_size)
return
end function find_variance
In modern Fortran, initializing a variable in its declaration, such as
real(real64) :: sum_val = 0
means that sum_val is a variable with the SAVE attribute (which is similar to static in C), which is initialized only once when the program starts. It is equivalent to
real(real64), save :: sum_val = 0
The value of a SAVEd variable is kept for the entire run, so it is not reset to 0 on later calls. To fix this, replace the initialized declaration with a plain declaration followed by an assignment:
real(real64) :: sum_val !! this is a usual local variable
sum_val = 0 !! or sum_val = real( 0, real64 )
Then I guess it should be fine. The same applies to l_avg, which is initialized in its declaration in the same way. Please see this page for more details.
IMO this is one of the very confusing features of Fortran...
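The effect can be reproduced outside Fortran. The sketch below is a Python emulation, not the original program: SavedVariance, variance and sample_component are hypothetical names, and the class keeps its accumulators between calls the way SAVEd locals do. For r uniform on [0, 0.1) and a uniform angle, the variance of r*cos(theta) should be 0.01/6, about 0.00167, which is close to the first value reported in the question.

```python
import math
import random


class SavedVariance:
    """Emulates find_variance with SAVEd locals: state survives between calls."""

    def __init__(self):
        self.l_avg = 0.0    # set once, like `real(real64) :: l_avg = 0`
        self.sum_val = 0.0  # set once, like `real(real64) :: sum_val = 0`

    def __call__(self, values):
        n = len(values)
        for v in values:
            self.l_avg += v
        self.l_avg /= n
        for v in values:
            self.sum_val += (v - self.l_avg) ** 2
        return self.sum_val / n


def variance(values):
    """Population variance with accumulators reset on every call."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


def sample_component(n, rng):
    """n draws of r*cos(theta), r uniform on [0, 0.1), theta uniform on [0, 2*pi)."""
    return [0.1 * rng.random() * math.cos(2 * math.pi * rng.random())
            for _ in range(n)]
```

Calling one SavedVariance instance on four independent samples returns roughly 1, 2, 3 and 4 times the true variance, the same staircase as in the question, while variance gives about the same value each time.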
Okay, so I ported this bit of code from a JavaScript file to Lua. It's a solution to the bin packing problem. Basically, it is given a target rectangle size, "init(x,y)", and then a table of blocks to fill that rectangle with, "fit(blocks)". However, when I run it I get the error "attempt to index local 'root' (a number value)". What is going wrong here?
I also don't fully understand how this code is working, someone helped me along with the porting process. When I pass a table "blocks" into the fit function, is it adding the attributes of block.fit.x and block.fit.y?
Any help is appreciated.
Edit: Fixed the error by changing "." to ":" when calling a method.
--ported from https://github.com/jakesgordon/bin-packing
local _M = {}
mt = {
init = function(t, x, y) --takes in dimensions of target rect.
t.root = { x = 0, y = 0, x = x, y = y }
end,
fit = function(t, blocks) --passes table "blocks"
local n, node, block
for k, block in pairs(blocks) do
if node == t.findNode(t.root, block.x, block.y) then
block.fit = t.splitNode(node, block.x, block.y)
end
end
end,
findNode = function(t, root, x, y)
if root.used then --if root.used then
return t.findNode(root.right, x, y) or t.findNode(root.down, x, y)
elseif (x <= root.x) and (y <= root.y) then
return root
else
return nil
end
end,
splitNode = function(t, node, x, y)
node.used = true
node.down = { x = node.x, y = node.y + y, x = node.x, y = node.y - y }
node.right = { x = node.x + x, y = node.y, x = node.x - x, y = y }
return node;
end,
}
setmetatable(_M, mt)
-- Let's do the object-like magic
mt.__index = function(t, k)
if nil ~= mt[k] then
return mt[k]
else
return t[k]
end
end
mt.__call = function(t, ...)
local new_instance = {}
setmetatable(new_instance, mt)
new_instance:init(...)
return new_instance
end
return _M
I don't know if this will help, but here is how I would have ported the code over to Lua.
local Packer = {}
Packer.__index = Packer
function Packer:findNode (root, w, h)
if root.used then
return self:findNode(root.right, w, h) or self:findNode(root.down, w, h)
elseif w <= root.w and h <= root.h then
return root
else
return nil
end
end
function Packer:fit (blocks)
local node
for _, block in pairs(blocks) do
node = self:findNode(self.root, block.w, block.h)
if node then
block.fit = self:splitNode(node, block.w, block.h)
end
end
end
function Packer.init (w, h)
local packer = {}
packer.root = {x = 0, y = 0, w = w, h = h}
return setmetatable(packer, Packer)
end
function Packer:splitNode (node, w, h)
node.used = true
node.down = {x = node.x, y = node.y + h, w = node.w, h = node.h - h}
node.right = {x = node.x + w, y = node.y, w = node.w - w, h = h }
return node
end
return Packer
Just place it in a file like packer.lua and import it into your main.lua with local Packer = require "packer"
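As for whether fit adds attributes to the blocks: yes, each block that fits gets a fit field pointing at the node it was placed in, and that node carries the x and y position. The sketch below is a Python translation of the Lua port above, written only to show that behaviour; the dictionary-based nodes and the snake_case method names are assumptions of this sketch.

```python
class Packer:
    """Binary-tree bin packer, translated from the Lua port for illustration."""

    def __init__(self, w, h):
        self.root = {"x": 0, "y": 0, "w": w, "h": h}

    def find_node(self, root, w, h):
        # A used node has been split, so search its two children.
        if root.get("used"):
            return (self.find_node(root["right"], w, h)
                    or self.find_node(root["down"], w, h))
        if w <= root["w"] and h <= root["h"]:
            return root
        return None

    def split_node(self, node, w, h):
        # Mark the node as taken and carve the leftover space into two children.
        node["used"] = True
        node["down"] = {"x": node["x"], "y": node["y"] + h,
                        "w": node["w"], "h": node["h"] - h}
        node["right"] = {"x": node["x"] + w, "y": node["y"],
                         "w": node["w"] - w, "h": h}
        return node

    def fit(self, blocks):
        for block in blocks:
            node = self.find_node(self.root, block["w"], block["h"])
            if node is not None:
                block["fit"] = self.split_node(node, block["w"], block["h"])
```

After fit runs, block["fit"]["x"] and block["fit"]["y"] give the placement; a block that did not fit simply has no "fit" entry.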
So I recently purchased the draw library on Touch Lua! I've started trying to make a simple Tic Tac Toe game. I'm using a simple setup they used to detect clicks on the NumPad default program, so the buttons should work.
The problem is that when you tap a square, the O fills into seemingly-random squares, sometimes more than 1, up to 4+ squares may get filled.
I suspect the problem is the function Picked, which sets the title to X/O and then updates the board.
local Turn = nil
local Move = "O"
local Mode = nil
::ModePick::
print("1 player mode? (y/n)")
local plrs = io.read()
if plrs == "y" then
Mode = 1
goto TurnChoice
elseif plrs == "n" then
Mode = 2
goto Game
else
goto ModePick
end
::TurnChoice::
print("Would you like to go first? (Be O) (y/n)")
do
local pick = io.read()
if pick == "y" then
Turn = 1
elseif pick == "n" then
Turn = 2
else
goto TurnChoice
end
end
::Game::
local Buttons = {}
draw.setscreen(1)
draw.settitle("Tic Tac Toe")
draw.clear()
width, height = draw.getport()
function Picked(b)
for i,v in pairs(Buttons) do
if v == b then
b.title = Move
UpdateBoard()
end
end
--Fill in X/O details
--Detect if there's a tic/tac/toe
--Set winning screen
if Move == "O" then
--Compute Move (1 player)
--Move = "X" (2 player)
else
Move = "O"
end
end
function DrawButton(b)
draw.setfont('Helvetica', 50)
draw.setlinestyle(2, 'butt')
local x1, y1 = b.x, b.y
local x2, y2 = x1+b.width, y1+b.height
draw.rect(x1, y1, x2, y2, b.color)
local w, h = draw.stringsize(b.title)
local x = b.x + (b.width - w)/2
local y = b.y + (b.height - h)/2
draw.string(b.title, x, y, draw.black)
return b
end
function Button(x, y, x2, y2, title, color, action)
local action = action or function() end
local button = {x = x, y = y, width = x2, height = y2, color = color, title = title, action = action}
table.insert(Buttons, button)
return button
end
function LookUpButton(x, y)
for i = 1, #Buttons do
local b = Buttons[i]
if x > b.x and x < b.x+b.width and y > b.y and y < b.y+b.height then
return b
end
end
return nil
end
function TouchBegan(x, y)
local b = LookUpButton(x, y)
if b then
b.action(b)
end
end
function TouchMoved(x, y)
end
function TouchEnded(x, y)
end
draw.tracktouches(TouchBegan, TouchMoved, TouchEnded)
function CreateButton(x,y,x2,y2,txt,col,func)
return DrawButton(Button(x, y, x2, y2, txt, col, func))
end
function UpdateBoard()
draw.clear()
for i = 1,3 do
for ii = 1,3 do
CreateButton(100 * (ii - 1) + 7.5, 100 * (i - 1) + 75, 100, 100, Buttons[i + ii].title, draw.blue, Picked)
end
end
end
for i = 1,3 do
for ii = 1,3 do
CreateButton(100 * (ii - 1) + 7.5, 100 * (i - 1) + 75, 100, 100, "", draw.blue, Picked)
end
end
while true do
draw.doevents()
sleep(1)
end
Note: Sorry if the indentation came out wrong. I pasted all this code in on my iPod, so I had to manually put in 4 spaces at the start of each line.
If anybody could help me out with this small setback I have, I'd love the help, if there's anything I'm missing I'd gladly edit it in just reply in the comments :D
EDIT: I've modified some of the code to stop the table from collecting new buttons on every redraw. This is the code I have now. The problem is the same: buttons are added in the wrong place (and are now getting removed too):
function Button(x, y, x2, y2, title, color, action, prev)
local action = action or function() end
local button = {x = x, y = y, width = x2, height = y2, color = color, title = title, action = action}
if prev then
for i,v in pairs(Buttons) do
if v == prev then
table.remove(Buttons, i)
end
end
end
table.insert(Buttons, button)
return button
end
function CreateButton(x,y,x2,y2,txt,col,func, prev)
return DrawButton(Button(x, y, x2, y2, txt, col, func, prev))
end
function UpdateBoard()
draw.clear()
for i = 1,3 do
for ii = 1,3 do
CreateButton(100 * (ii - 1) + 7.5, 100 * (i - 1) + 75, 100, 100, Buttons[i + ii].title, draw.blue, Picked, Buttons[i + ii])
end
end
end
EDIT: Thanks to Etan I've fixed UpdateBoard, squares are still random:
function UpdateBoard()
draw.clear()
local n = 1
for i = 1,3 do
for ii = 1,3 do
CreateButton(100 * (ii - 1) + 7.5, 100 * (i - 1) + 75, 100, 100, Buttons[n].title, draw.blue, Picked, Buttons[n])
n = n + 1
end
end
end
It took me a while to get around to posting the finished code, but this is what it looks like:
function UpdateBoard(ended)
local ended = ended or false
local Act = nil
if ended == true then
Act = function() end
else
Act = Picked
end
draw.clear()
local Buttons2 = {}
for i,v in pairs(Buttons) do
Buttons2[i] = v
end
Buttons = {}
local n = 1
for i = 1,3 do
for ii = 1,3 do
CreateButton(100 * (ii - 1) + 7.5, 100 * (i - 1) + 75, 100, 100, Buttons2[n].title, draw.blue, Act, Buttons2[n], i, ii)
n = n + 1
end
end
OpenButtons = {}
ClosedButtons = {}
for i,v in pairs(Buttons) do
if v.title == "" then
table.insert(OpenButtons, v)
else
table.insert(ClosedButtons, v)
end
end
end
It may seem a bit complicated for the question, but this is the code after several other changes.
The main differences to note are that it rebuilds the table of buttons from scratch on each update, so old buttons are not counted again alongside the new ones, and that it uses a running counter n to index the buttons instead of i + ii.
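The indexing part of the fix can be checked in isolation. The sketch below is in Python rather than Lua, and the helper names sum_index and linear_index are made up for illustration. It compares the i + ii index from the first UpdateBoard with a row-major index, which takes the same values as the running counter n.

```python
def sum_index(i, ii):
    """Index used by the original UpdateBoard: Buttons[i + ii]."""
    return i + ii


def linear_index(i, ii, columns=3):
    """Row-major index, equal to the running counter n in the fixed version."""
    return (i - 1) * columns + ii


if __name__ == "__main__":
    cells = [(i, ii) for i in range(1, 4) for ii in range(1, 4)]
    print([sum_index(i, ii) for i, ii in cells])
    print([linear_index(i, ii) for i, ii in cells])
```

With i + ii only the indices 2 through 6 are ever used, and most of them more than once, so several squares end up sharing one button's title. The row-major index visits 1 through 9 exactly once.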
Problem
Hey folks. I'm looking for some advice on python performance. Some background on my problem:
Given:
A (x,y) mesh of nodes each with a value (0...255) starting at 0
A list of N input coordinates each at a specified location within the range (0...x, 0...y)
A value Z that defines the "neighborhood" in count of nodes
Increment the value of the node at the input coordinate and the node's neighbors. Neighbors beyond the mesh edge are ignored. (No wrapping)
BASE CASE: A mesh of size 1024x1024 nodes, with 400 input coordinates and a range Z of 75 nodes.
Processing should be O(x*y*Z*N). I expect x, y and Z to remain roughly around the values in the base case, but the number of input coordinates N could increase up to 100,000. My goal is to minimize processing time.
Current results
Between my start and the comments below, we've got several implementations.
Running speed on my 2.26 GHz Intel Core 2 Duo with Python 2.6.1:
f1: 2.819s
f2: 1.567s
f3: 1.593s
f: 1.579s
f3b: 1.526s
f4: 0.978s
f1 is the initial naive implementation: three nested for loops.
f2 replaces the inner for loop with a list comprehension.
f3 is based on Andrei's suggestion in the comments and replaces the outer for with map()
f is Chris's suggestion in the answers below
f3b is kriss's take on f3
f4 is Alex's contribution.
Code is included below for your perusal.
Question
How can I further reduce the processing time? I'd prefer sub-1.0s for the test parameters.
Please, keep the recommendations to native Python. I know I can move to a third-party package such as numpy, but I'm trying to avoid any third party packages. Also, I've generated random input coordinates, and simplified the definition of the node value updates to keep our discussion simple. The specifics have to change slightly and are outside the scope of my question.
thanks much!
f1 is the initial naive implementation: three nested for loops.
def f1(x,y,n,z):
rows = [[0]*x for i in xrange(y)]
for i in range(n):
inputX, inputY = (int(x*random.random()), int(y*random.random()))
topleft = (inputX - z, inputY - z)
for i in xrange(max(0, topleft[0]), min(topleft[0]+(z*2), x)):
for j in xrange(max(0, topleft[1]), min(topleft[1]+(z*2), y)):
if rows[i][j] < 255: rows[i][j] += 1
f2 replaces the inner for loop with a list comprehension.
def f2(x,y,n,z):
rows = [[0]*x for i in xrange(y)]
for i in range(n):
inputX, inputY = (int(x*random.random()), int(y*random.random()))
topleft = (inputX - z, inputY - z)
for i in xrange(max(0, topleft[0]), min(topleft[0]+(z*2), x)):
l = max(0, topleft[1])
r = min(topleft[1]+(z*2), y)
rows[i][l:r] = [j+(j<255) for j in rows[i][l:r]]
UPDATE: f3 is based on Andrei's suggestion in the comments and replaces the outer for with map(). My first hack at this requires several out-of-local-scope lookups, specifically recommended against by Guido: local variable lookups are much faster than global or built-in variable lookups. I hardcoded all but the reference to the main data structure itself to minimize that overhead.
rows = [[0]*x for i in xrange(y)]
def f3(x,y,n,z):
inputs = [(int(x*random.random()), int(y*random.random())) for i in range(n)]
rows = map(g, inputs)
def g(input):
inputX, inputY = input
topleft = (inputX - 75, inputY - 75)
for i in xrange(max(0, topleft[0]), min(topleft[0]+(75*2), 1024)):
l = max(0, topleft[1])
r = min(topleft[1]+(75*2), 1024)
rows[i][l:r] = [j+(j<255) for j in rows[i][l:r]]
UPDATE3: ChristopheD also pointed out a couple of improvements.
def f(x,y,n,z):
rows = [[0] * y for i in xrange(x)]
rn = random.random
for i in xrange(n):
topleft = (int(x*rn()) - z, int(y*rn()) - z)
l = max(0, topleft[1])
r = min(topleft[1]+(z*2), y)
for u in xrange(max(0, topleft[0]), min(topleft[0]+(z*2), x)):
rows[u][l:r] = [j+(j<255) for j in rows[u][l:r]]
UPDATE4: kriss added a few improvements to f3, replacing min/max with the new ternary operator syntax.
def f3b(x,y,n,z):
rn = random.random
rows = [g1(x, y, z) for x, y in [(int(x*rn()), int(y*rn())) for i in xrange(n)]]
def g1(x, y, z):
l = y - z if y - z > 0 else 0
r = y + z if y + z < 1024 else 1024
for i in xrange(x - z if x - z > 0 else 0, x + z if x + z < 1024 else 1024 ):
rows[i][l:r] = [j+(j<255) for j in rows[i][l:r]]
UPDATE5: Alex weighed in with his substantive revision, adding a separate map() operation to cap the values at 255 and removing all non-local-scope lookups. The perf differences are non-trivial.
def f4(x,y,n,z):
rows = [[0]*y for i in range(x)]
rr = random.randrange
inc = (1).__add__
sat = (0xff).__and__
for i in range(n):
inputX, inputY = rr(x), rr(y)
b = max(0, inputX - z)
t = min(inputX + z, x)
l = max(0, inputY - z)
r = min(inputY + z, y)
for i in range(b, t):
rows[i][l:r] = map(inc, rows[i][l:r])
for i in range(x):
rows[i] = map(sat, rows[i])
Also, since we all seem to be hacking around with variations, here's my test harness to compare speeds: (improved by ChristopheD)
def timing(f,x,y,n,z):
fn = "%s(%d,%d,%d,%d)" % (f.__name__, x, y, n, z)
ctx = "from __main__ import %s" % f.__name__
results = timeit.Timer(fn, ctx).timeit(10)
return "%4.4s: %.3f" % (f.__name__, results / 10.0)
if __name__ == "__main__":
print timing(f, 1024, 1024, 400, 75)
#add more here.
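For anyone running these on Python 3, a similar harness can be sketched with a callable instead of a statement string, since timeit.Timer accepts either. This is an assumed adaptation, not the harness used for the numbers above; the repeats keyword is an addition of this sketch.

```python
import timeit


def timing(func, *args, repeats=10):
    """Return 'name: seconds' with the mean wall time of func(*args)."""
    # Passing a callable avoids building a statement string and a
    # "from __main__ import ..." setup line.
    timer = timeit.Timer(lambda: func(*args))
    total = timer.timeit(number=repeats)
    return "%s: %.3f" % (func.__name__, total / repeats)
```

It would be called as print(timing(f, 1024, 1024, 400, 75)), with the arguments in the same (x, y, n, z) order the functions take.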
On my (slow-ish;-) first-day Macbook Air, 1.6GHz Core 2 Duo, system Python 2.5 on MacOSX 10.5, after saving your code in op.py I see the following timings:
$ python -mtimeit -s'import op' 'op.f1()'
10 loops, best of 3: 5.58 sec per loop
$ python -mtimeit -s'import op' 'op.f2()'
10 loops, best of 3: 3.15 sec per loop
So, my machine is slower than yours by a factor of a bit more than 1.9.
The fastest code I have for this task is:
def f3(x=x,y=y,n=n,z=z):
rows = [[0]*y for i in range(x)]
rr = random.randrange
inc = (1).__add__
sat = (0xff).__and__
for i in range(n):
inputX, inputY = rr(x), rr(y)
b = max(0, inputX - z)
t = min(inputX + z, x)
l = max(0, inputY - z)
r = min(inputY + z, y)
for i in range(b, t):
rows[i][l:r] = map(inc, rows[i][l:r])
for i in range(x):
rows[i] = map(sat, rows[i])
which times as:
$ python -mtimeit -s'import op' 'op.f3()'
10 loops, best of 3: 3 sec per loop
so, a very modest speedup, projecting to more than 1.5 seconds on your machine - well above the 1.0 you're aiming for:-(.
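One thing worth checking in the sat step, shown here as a standalone Python 3 sketch with made-up sample values: a bitwise AND with 0xff keeps only the low 8 bits, so a count above 255 wraps around instead of saturating. With the base-case parameters the counts probably stay far below 255, so this should not affect the timings, but it could matter as N grows.

```python
counts = [0, 254, 255, 256, 300]

# Masking keeps the low 8 bits: 256 becomes 0 and 300 becomes 44.
masked = [v & 0xff for v in counts]

# Clamping saturates at 255, which matches the 0...255 range in the question.
clamped = [min(v, 255) for v in counts]

print(masked)   # [0, 254, 255, 0, 44]
print(clamped)  # [0, 254, 255, 255, 255]
```

If saturation is what is wanted, the final pass could clamp with min instead, at some cost in speed that I have not measured.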
With a simple C-coded extension, exte.c:
#include "Python.h"
static PyObject*
dopoint(PyObject* self, PyObject* args)
{
int x, y, z, px, py;
int b, t, l, r;
int i, j;
PyObject* rows;
if(!PyArg_ParseTuple(args, "iiiiiO",
&x, &y, &z, &px, &py, &rows
))
return 0;
b = px - z;
if (b < 0) b = 0;
t = px + z;
if (t > x) t = x;
l = py - z;
if (l < 0) l = 0;
r = py + z;
if (r > y) r = y;
for(i = b; i < t; ++i) {
PyObject* row = PyList_GetItem(rows, i);
for(j = l; j < r; ++j) {
PyObject* pyitem = PyList_GetItem(row, j);
long item = PyInt_AsLong(pyitem);
if (item < 255) {
PyObject* newitem = PyInt_FromLong(item + 1);
PyList_SetItem(row, j, newitem);
}
}
}
Py_RETURN_NONE;
}
static PyMethodDef exteMethods[] = {
{"dopoint", dopoint, METH_VARARGS, "process a point"},
{0}
};
PyMODINIT_FUNC
initexte(void)
{
Py_InitModule("exte", exteMethods);
}
(Note: I haven't checked it carefully. I think it doesn't leak memory, thanks to the correct interplay of reference stealing and borrowing, but it should be code-inspected very carefully before being put in production;-). With that extension built, we could do:
import exte
def f4(x=x,y=y,n=n,z=z):
rows = [[0]*y for i in range(x)]
rr = random.randrange
for i in range(n):
inputX, inputY = rr(x), rr(y)
exte.dopoint(x, y, z, inputX, inputY, rows)
and the timing
$ python -mtimeit -s'import op' 'op.f4()'
10 loops, best of 3: 345 msec per loop
shows an acceleration of 8-9 times, which should put you in the ballpark you desire. I've seen a comment saying you don't want any third-party extension, but, well, this tiny extension you could make entirely your own;-). (Not sure what licensing conditions apply to code on Stack Overflow, but I'll be glad to re-release this under the Apache 2 license or the like, if you need that;-).
1. A (smaller) speedup could definitely be the initialization of your rows...
Replace
rows = []
for i in range(x):
rows.append([0 for i in xrange(y)])
with
rows = [[0] * y for i in xrange(x)]
2. You can also avoid some lookups by moving random.random out of the loops (saves a little).
3. EDIT: after corrections -- you could arrive at something like this:
def f(x,y,n,z):
rows = [[0] * y for i in xrange(x)]
rn = random.random
for i in xrange(n):
topleft = (int(x*rn()) - z, int(y*rn()) - z)
l = max(0, topleft[1])
r = min(topleft[1]+(z*2), y)
for u in xrange(max(0, topleft[0]), min(topleft[0]+(z*2), x)):
rows[u][l:r] = [j+(j<255) for j in rows[u][l:r]]
EDIT: some new timings with timeit (10 runs) -- seems this provides only minor speedups:
import timeit
print timeit.Timer("f1(1024,1024,400,75)", "from __main__ import f1").timeit(10)
print timeit.Timer("f2(1024,1024,400,75)", "from __main__ import f2").timeit(10)
print timeit.Timer("f(1024,1024,400,75)", "from __main__ import f").timeit(10)
f1 21.1669280529
f2 12.9376120567
f 11.1249599457
In your f3 rewrite, g can be simplified. (The same can be applied to f4.)
You have the following code inside a for loop.
l = max(0, topleft[1])
r = min(topleft[1]+(75*2), 1024)
However, it appears that those values never change inside the for loop. So calculate them once, outside the loop instead.
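A sketch of that hoisting, written for Python 3 and with the grid passed in rather than read from a global: the name increment_neighborhood and the cap parameter are assumptions of this sketch, not part of the original functions.

```python
def increment_neighborhood(rows, px, py, z, cap=255):
    """Add 1 (saturating at cap) to every cell within z of (px, py).

    rows is a list of lists indexed as rows[x][y]; cells outside the
    mesh are ignored.
    """
    size_x = len(rows)
    size_y = len(rows[0])
    # The column bounds depend only on py and z, so compute them once
    # here instead of on every pass through the loop below.
    l = max(0, py - z)
    r = min(py + z, size_y)
    for i in range(max(0, px - z), min(px + z, size_x)):
        row = rows[i]
        row[l:r] = [v + (v < cap) for v in row[l:r]]
```

The saving per point is small, two max/min calls per row, which fits the modest differences between f2, f3 and f in the timings.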
Based on your f3 version I played with the code. As l and r are constant for each point, you can avoid recomputing them inside the loop in g1. Using the new ternary if instead of min and max also seems to be consistently faster. I also simplified the expression with topleft. On my system the code below appears to be about 20% faster.
def f3b(x,y,n,z):
rows = [g1(x, y, z) for x, y in [(int(x*random.random()), int(y*random.random())) for i in range(n)]]
def g1(x, y, z):
l = y - z if y - z > 0 else 0
r = y + z if y + z < 1024 else 1024
for i in xrange(x - z if x - z > 0 else 0, x + z if x + z < 1024 else 1024 ):
rows[i][l:r] = [j+(j<255) for j in rows[i][l:r]]
You can create your own Python module in C, and control the performance as you want:
http://docs.python.org/extending/