I'm experiencing a memory leak on Ruby 2.3.1 with gtk3.
On my system (Ubuntu 16.04) the following code consumes approximately 80 MB.
The size of picture.jpg is 289 kB.
require 'gtk3'

def ptest
  i = 0
  j = 0
  loop {
    i += 1
    j += 1
    exit if j == 50
    # image = Gtk::Image.new
    newPixbuf = GdkPixbuf::Pixbuf.new(:file => "picture.jpg")
    # image.pixbuf = newPixbuf
    # image.clear
    # image = nil
    if i == 10
      p "GC"
      GC.start
      i = 0
    end
  }
end

ptest
According to https://sourceforge.net/p/ruby-gnome2/mailman/message/8659687/ this shouldn't happen. What can I do to release the memory?
I'm not into Ruby, but I know some bits of GTK+.
In C, where you must deal with memory allocation yourself, you need to unref the pixbuf.
From the GtkImage documentation:
The GtkImage does not assume a reference to the pixbuf; you still need to unref it if you own references.
So, most probably, if Ruby doesn't implement ARC (automatic reference counting on GObjects), you must do something like newPixbuf.unref (not sure about the Ruby syntax) right after image.pixbuf = newPixbuf.
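In Ruby that would look something like the sketch below (hypothetical: I don't know whether the binding actually exposes unref):

image = Gtk::Image.new
newPixbuf = GdkPixbuf::Pixbuf.new(:file => "picture.jpg")
image.pixbuf = newPixbuf
newPixbuf.unref  # drop our own reference; GtkImage keeps its own (assuming unref is exposed)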
Hope it helps.
Apparently there was a bug in my Ruby gdk3 gem. A gem update solved the problem.
So, for the life of me, I can't figure out which of pool.map() or pool.starmap() should be used to make this work the way I want. The code below executes correctly, firing off (sequentially!) two instances of runSim and appending the results.
results = []
argsToRun = []

for n in range(2):
    random.seed()
    env = simpy.Environment()  # Create the SimPy environment
    env.clockRate = 1e9
    argsToRun.append([env, adist, sdist, tdist, 108, intSpeed, runUntil])

for arg_list in argsToRun:
    line = runSim(*arg_list)
    results.append(line)
I was under the impression that something like the code below would be a simple way to run these instances in parallel (the results are not order-dependent)... but neither of them seems to work. Surely I'm missing something obvious?
with multiprocessing.Pool() as pool:
    results = pool.map(runSim, argsToRun)
    pool.close()
Nor....
with multiprocessing.Pool() as pool:
    results = pool.starmap(runSim, argsToRun)
    pool.close()
After lots of frustration and searching, it turns out that some of my parameters were not pickleable. This, combined with a bug in VSCode that prevented the debugger from running, made it much harder to diagnose than it should have been.
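A quick way to spot the offending arguments is to try pickling each one directly; anything that raises here cannot be shipped to a standard multiprocessing worker (a diagnostic sketch using the argsToRun list from the question):

import pickle

for arg_list in argsToRun:
    for arg in arg_list:
        try:
            pickle.dumps(arg)
        except Exception as exc:
            print("not pickleable:", type(arg), exc)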
Switching to the pathos versions of Pool() solved the problem.
argsToRun = []
for n in range(15):
    random.seed()
    env = simpy.Environment()  # Create the SimPy environment
    argsToRun.append((env, adist, sdist, tdist, 108, intSpeed, runUntil))

with pathos.helpers.mp.Pool() as pool:  # tada!
    results = pool.starmap(runSim, argsToRun)
In an attempt to recreate the getenvironment(..) C function of _winapi.c (direct link) in plain Python using ctypes, I'm wondering how the following C code could be translated:
buffer = PyMem_NEW(Py_UCS4, totalsize);
if (! buffer) {
    PyErr_NoMemory();
    goto error;
}
p = buffer;
end = buffer + totalsize;
for (i = 0; i < envsize; i++) {
    PyObject* key = PyList_GET_ITEM(keys, i);
    PyObject* value = PyList_GET_ITEM(values, i);
    if (!PyUnicode_AsUCS4(key, p, end - p, 0))
        goto error;
    p += PyUnicode_GET_LENGTH(key);
    *p++ = '=';
    if (!PyUnicode_AsUCS4(value, p, end - p, 0))
        goto error;
    p += PyUnicode_GET_LENGTH(value);
    *p++ = '\0';
}
/* add trailing null byte */
*p++ = '\0';
It seems that the function ctypes.create_unicode_buffer(..) (doc, code) is doing something quite close to this, which I could reproduce if only I had access to the Py_UCS4 C type, or could be sure of its mapping to another type accessible from Python through ctypes.
Would c_wchar be a good candidate? It seems I can't make that assumption, as Python 2.7 could be compiled in UCS-2 mode if I'm right (source), and I guess Windows is really expecting UCS-4 there... even if it seems that ctypes.wintypes.LPWSTR is an alias for c_wchar_p in CPython 2.7 (code).
For this question, it is safe to assume that the target platform is Python 2.7 on Windows, if that helps.
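One quick way to see what c_wchar actually maps to on a given interpreter is to check its size (2 bytes means a UTF-16/UCS-2 wchar_t, as on Windows; 4 means UCS-4), along these lines:

import ctypes

# sizeof(c_wchar) follows the platform's wchar_t:
# 2 on Windows (UTF-16/UCS-2), 4 on most Linux builds (UCS-4)
print(ctypes.sizeof(ctypes.c_wchar))

# create_unicode_buffer allocates an array of c_wchar
buf = ctypes.create_unicode_buffer(10)
print(ctypes.sizeof(buf))  # 10 * sizeof(c_wchar)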
Context (if it has some importance):
I'm delving into ctypes for the first time to attempt a plain Python fix for a CPython 2.7 bug affecting the Windows subprocess.Popen(..) implementation. The bug is marked won't-fix and prevents the use of Unicode in command-line calls (as the executable name or arguments). It is fixed in Python 3, so I'm having a go at reimplementing in plain Python the CPython 3 implementation of the required CreateProcess(..) in _winapi.c, which in turn calls getenvironment(..).
This possible workaround was mentioned in the comments of this answer to a question related to subprocess.Popen(..) Unicode issues.
This doesn't answer the part of the title about building a UCS-4 buffer specifically, but it gives a partial answer to the question in bold and manages to create a Unicode buffer that seems to work on my current Python 2.7 on Windows (so maybe UCS-4 is not required).
So we are assuming here that c_wchar is what Windows requires (whether it is UCS-4 or UCS-2 is not yet clear to me, and it might not matter, but I admit to having very little confidence in my knowledge here).
So here is the Python code that reproduces the C code as requested in the question:
# creation of a buffer of size totalsize
wenv = (c_wchar * totalsize)()
wenv.value = (unicode("").join([
    unicode("%s=%s\0") % (k, v)
    for k, v in env.items()])) + "\0"
This wenv can then be fed to CreateProcessW and this seems to work.
I am trying to figure out how to work with parallel computing in Julia. The documentation looks great, even for someone like me who has never worked with parallel computing (and who does not understand most of the concepts behind the documentation ;)).
Just to mention: I am working on a PC with Ubuntu. It has a 4-core processor.
To run the code I describe below I am calling the julia terminal as:
$ julia -p 4
I am following the documentation here. I am facing some problems with the examples described in this section.
I am trying to run the following piece of code:
@everywhere advection_shared_chunk!(q, u) = advection_chunk!(q, u, myrange(q)..., 1:size(q,3)-1)

function advection_shared!(q, u)
    @sync begin
        for p in procs(q)
            @async remotecall_wait(advection_shared_chunk!, p, q, u)
        end
    end
    q
end

q = SharedArray(Float64, (500,500,500))
u = SharedArray(Float64, (500,500,500))

# Run once to JIT-compile
advection_shared!(q, u)
But I am facing the following error:
ERROR: MethodError: `remotecall_wait` has no method matching remotecall_wait(::Function, ::Int64, ::SharedArray{Float64,3}, ::SharedArray{Float64,3})
Closest candidates are:
remotecall_wait(::LocalProcess, ::Any, ::Any...)
remotecall_wait(::Base.Worker, ::Any, ::Any...)
remotecall_wait(::Integer, ::Any, ::Any...)
in anonymous at task.jl:447
...and 3 other exceptions.
in sync_end at ./task.jl:413
[inlined code] from task.jl:422
in advection_shared! at none:2
What am I doing wrong here? As far as I know I am just reproducing the example in the docs... or not?
Thanks for any help,
Thanks @Daniel Arndt, you found the trick! I was looking at the docs at http://docs.julialang.org/en/latest/manual/parallel-computing/. I thought they were for Julia 0.4.x (the latest stable version so far), but it seems they are for Julia 0.5.x (the latest version overall, i.e. the development version).
I made the changes you suggested (changed the argument order and added the functions that were missing) and everything worked like a charm. I will leave the updated code here:
# Here's the kernel
@everywhere function advection_chunk!(q, u, irange, jrange, trange)
    @show (irange, jrange, trange)  # display so we can see what's happening
    for t in trange, j in jrange, i in irange
        q[i,j,t+1] = q[i,j,t] + u[i,j,t]
    end
    q
end

# This function returns the (irange,jrange) indexes assigned to this worker
@everywhere function myrange(q::SharedArray)
    idx = indexpids(q)
    if idx == 0
        # This worker is not assigned a piece
        return 1:0, 1:0
    end
    nchunks = length(procs(q))
    splits = [round(Int, s) for s in linspace(0,size(q,2),nchunks+1)]
    1:size(q,1), splits[idx]+1:splits[idx+1]
end

@everywhere advection_shared_chunk!(q, u) = advection_chunk!(q, u, myrange(q)..., 1:size(q,3)-1)

function advection_shared!(q, u)
    @sync begin
        for p in procs(q)
            @async remotecall_wait(p, advection_shared_chunk!, q, u)
        end
    end
    q
end

q = SharedArray(Float64, (500,500,500))
u = SharedArray(Float64, (500,500,500))

# Run once to JIT-compile
advection_shared!(q, u)
Done!
I don't believe you are doing anything wrong, other than you are likely using a newer version of the docs (or we're seeing different things!).
Let's make sure you're using Julia 0.4.x and these docs: http://docs.julialang.org/en/release-0.4/manual/parallel-computing/
In Julia v0.5.0, the order of the first two parameters of remotecall_wait was changed. Switch the order to remotecall_wait(p, advection_shared_chunk!, q, u) and you should be on to your next error (myrange is not defined, but it can be found earlier in the docs).
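For reference, the two signatures side by side, using the names from the question:

# Signature in the 0.5.x ("latest") docs — fails on Julia 0.4.x:
remotecall_wait(advection_shared_chunk!, p, q, u)

# Signature expected by Julia 0.4.x — use this one:
remotecall_wait(p, advection_shared_chunk!, q, u)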
I want to run two MATLAB scripts in parallel for a project and communicate between them. The purpose is to have one script do image analysis and send the results to the other, which will use them for more calculations (time consuming, but not related to the task of finding stuff in the images). Since both tasks are time consuming, and should preferably be done in real time, I believe that parallelization is necessary.
To get a feel for how this should be done I created a test script to find out how to communicate between the two scripts.
The first script takes a user input using the built-in function input, and then, using labSend, sends it to the other, which receives it and prints it.
function [blarg] = inputStuff(blarg)
    mpiInit(); % added because of error message, but does not work...
    for i = 1:2
        labBarrier; % added because of error message
        inp = input('Enter a number to write');
        labSend(inp);
        if (inp == 0)
            break;
        else
            i = 1;
        end
    end
end
function [ blarg ] = testWrite( blarg )
    mpiInit(); % added because of error message, but does not help
    par = 0;
    if ( blarg == 0)
        par = 1;
    end
    for i = 1:10
        if (par == 1)
            labBarrier
            delta = labReceive();
            i = 1;
        else
            delta = input('Enter number to write');
        end
        if (delta == 0)
            break;
        end
        s = strcat('This lab no', num2str(labindex), '. Delta is = ')
        delta
    end
end
%% This is the file test_parfor.m
funlist = {@inputStuff, @testWrite};
matlabpool(2);
mpiInit(); % added because of error message, but does not help
parfor i = 1:2
    funlist{i}(0);
end
matlabpool close;
Then, when the code is run, the following error message appears:
Starting matlabpool using the 'local' profile ... connected to 2 labs.
Error using parallel_function (line 589)
The MPI implementation has not yet been loaded. Please
call mpiInit.
Error stack:
testWrite.m at 11
Error in test_parfor (line 8)
parfor i=1:2
Calling the method mpiInit does not help... (Called as shown in the code above.)
And none of the examples that MathWorks provides in the documentation, or on their website, show this error or what to do about it.
Any help is appreciated!
You would typically use constructs such as labSend, labReceive and labBarrier within an spmd block, rather than a parfor block.
parfor is intended for implementing embarrassingly parallel algorithms, in other words algorithms that consist of multiple independent tasks that can be run in parallel and do not require communication between tasks.
I'm stretching my knowledge here (perhaps someone more expert can correct me), but as I understand things, parfor does not set up an MPI ring for communication between workers, which is probably the explanation for the (rather uninformative) error message you're getting.
An spmd block enables communication between workers using labSend, labReceive and labBarrier. There are quite a few examples of using them all in the documentation.
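For example, a minimal sketch of worker-to-worker communication inside spmd might look like this (assuming a pool of two workers is already open):

spmd
    if labindex == 1
        labSend(42, 2);      % send the value 42 to worker 2
    elseif labindex == 2
        x = labReceive(1);   % receive it from worker 1
        fprintf('Worker 2 received %d\n', x);
    end
end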
Sam is right that the MPI functionality is not enabled during parfor, only during spmd. You need to do something more like this:
spmd
    funlist{labindex}(0);
end
(Sam is also quite right that the error message you saw is pretty unhelpful)
I am aware of:
https://github.com/lsegal/barracuda
Which hasn't been updated since 01/11
And
http://rubyforge.org/projects/ruby-opencl/
Which hasn't been updated since 03/10.
Are these projects dead? Or have they simply not changed because they still work and OpenCL/Ruby haven't changed since then? Is anybody using these projects? Any luck?
If not, can you recommend another opencl gem for Ruby? Or how is this sort of call done usually? Just call raw C from Ruby?
You can try opencl_ruby_ffi, it's actively developed (by a colleague of mine) and working well with OpenCL version 1.2. OpenCL 2.0 should also be available soon.
sudo gem install opencl_ruby_ffi
On the Khronos forum you can find a quick example that shows how it works:
require 'opencl_ruby_ffi'
# select the first platform/device available
# improve it if you have multiple GPU on your machine
platform = OpenCL::platforms.first
device = platform.devices.first
# prepare the source of GPU kernel
# this is not Ruby but OpenCL C
source = <<EOF
__kernel void addition( float2 alpha, __global const float *x, __global float *y) {\n\
size_t ig = get_global_id(0);\n\
y[ig] = (alpha.s0 + alpha.s1 + x[ig])*0.3333333333333333333f;\n\
}
EOF
# configure OpenCL environment, refer to OCL API if necessary
context = OpenCL::create_context(device)
queue = context.create_command_queue(device, :properties => OpenCL::CommandQueue::PROFILING_ENABLE)
# create and compile the OpenCL C source code
prog = context.create_program_with_source(source)
prog.build
# allocate CPU (=RAM) buffers and
# fill the input one with random values
a_in = NArray.sfloat(65536).random(1.0)
a_out = NArray.sfloat(65536)
# allocate GPU buffers matching the CPU ones
b_in = context.create_buffer(a_in.size * a_in.element_size, :flags => OpenCL::Mem::COPY_HOST_PTR, :host_ptr => a_in)
b_out = context.create_buffer(a_out.size * a_out.element_size)
# create a constant pair of float
f = OpenCL::Float2::new(3.0,2.0)
# trigger the execution of kernel 'addition' on 128 cores
event = prog.addition(queue, [65536], f, b_in, b_out,
                      :local_work_size => [128])
# Or, if you want to be more OpenCL-like:
# k = prog.create_kernel("addition")
# k.set_arg(0, f)
# k.set_arg(1, b_in)
# k.set_arg(2, b_out)
# event = queue.enqueue_NDrange_kernel(k, [65536],:local_work_size => [128])
# tell OCL to transfer the content of the GPU buffer b_out
# to the CPU memory (a_out), but only after `event` (= kernel execution)
# has completed
queue.enqueue_read_buffer(b_out, a_out, :event_wait_list => [event])
# wait for everything in the command queue to finish
queue.finish
# now a_out contains the result of the addition performed on the GPU
# add some cleanup here ...
# verify that the computation went well
diff = (a_in - a_out*3.0)
65536.times { |i|
  raise "Computation error #{i} : #{diff[i]+f.s0+f.s1}" if (diff[i]+f.s0+f.s1).abs > 0.00001
}
puts "Success!"
You may want to package whatever C functionality you would like as a gem. This is pretty straightforward, and this way you can wrap all your C logic in a specific namespace that you can reuse in other projects.
http://guides.rubygems.org/c-extensions/
If you want to do high-speed calculations on the GPU, Cumo / NArray is a good choice. Cumo has the same interface as NArray, although it is CUDA rather than OpenCL.
https://github.com/sonots/cumo
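A minimal sketch of what that looks like, assuming Cumo mirrors the Numo::NArray API as its README describes (class and method names below follow that assumption):

require 'cumo/narray'

a = Cumo::SFloat.new(65536).seq   # array allocated on the GPU, filled with 0, 1, 2, ...
b = a * 2.0 + 1.0                 # element-wise arithmetic runs on the GPU
puts b[0]                         # => 1.0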