Replace For-loop to remove blobs (IDL) - performance

I have the following function to identify blobs in an image and remove them if they are under a certain size.
With the for-loop the removal is of course very slow when there are a lot of blobs. My question: is it possible to replace the for-loop?
function clean_regions, input, max_size
  output = input
  tmp = size(input)

  ; label the connected blobs and count the pixels in each label
  input_labels = LABEL_REGION(input, /ULONG)
  hist = histogram(input_labels, binsize=1, locations=loc, /nan, /l64)

  ; labels whose pixel count is at or below the threshold
  to_remove = loc[where(hist le max_size)]

  ; mask that will be 1 for every pixel belonging to a kept label
  result_map = MAKE_ARRAY(tmp[1:2], /ULONG, VALUE=0)
  to_keep = mg_complement(to_remove, n_elements(loc))
  for i = 0, n_elements(to_keep) - 1 do begin
    result_map[where(input_labels EQ to_keep[i])] = 1
  endfor

  output *= result_map
  output = boolean(output gt 0)
  return, output
END
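For what it's worth, the loop can be replaced by a lookup-table trick: compute a keep/drop flag per label (keep = hist gt max_size) and then index that flag array with the label image itself (result_map = keep[input_labels]), assuming the labels start at 0 so that histogram bin i corresponds to label i. Here is the same idea as a minimal cross-language sketch in Python with NumPy/SciPy, where scipy.ndimage.label plays the role of LABEL_REGION:
import numpy as np
from scipy import ndimage

def clean_regions(image, max_size):
    # remove connected blobs with max_size pixels or fewer
    labels, _ = ndimage.label(image)       # label 0 is the background
    sizes = np.bincount(labels.ravel())    # pixel count per label
    keep = sizes > max_size                # one boolean flag per label
    keep[0] = False                        # never keep the background
    return keep[labels]                    # per-pixel lookup, no loop
The whole cleanup becomes one histogram and one fancy-indexing operation, so the cost no longer grows with the number of kept blobs.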

Related

How to read one number at each iteration of the loop in Ruby?

How do I read one number at each iteration of the loop? Streaming is important here (I do not want to read the entire line once and convert it to an array): at each iteration I want to take one number from the line in the file and work with it. What is the right way to do this?
input.txt :
5
1 7 5 2 3
I am working with the 2nd line of the file.
fin = File.open("input.txt", "r")
fout = File.open("output.txt", "w")
n = fin.readline.to_i
heap_min = Heap.new(:min)
heap_max = Heap.new(:max)
for i in 1..n
  a = fin.read.to_i # code here <--
  heap_max.push(a)
  if heap_max.size > heap_min.size
    tmp = heap_max.top
    heap_max.pop
    heap_min.push(tmp)
  end
  if heap_min.size > heap_max.size
    tmp = heap_min.top
    heap_min.pop
    heap_max.push(tmp)
  end
  if heap_max.size == heap_min.size
    median = heap_max.top > heap_min.top ? heap_min.top : heap_max.top
  else
    median = heap_max.top
  end
  fout.print(median, " ")
end
If you're 100% sure that your file separates numbers with spaces, you can try this:
a = fin.gets(' ', -1).to_i
Read the 2nd line of a file:
line2 = File.readlines('input.txt')[1]
Convert it to an array of integers:
array = line2.split(' ').map(&:to_i)
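As an aside, the loop above is the classic two-heap running-median pattern. Here is the same idea sketched in Python with the standard heapq module (heapq only provides a min-heap, so the max-heap side stores negated values; the input format is assumed to match input.txt above):
import heapq

def running_medians(numbers):
    # lo holds the lower half (max-heap via negation), hi the upper half
    lo, hi = [], []
    for x in numbers:
        heapq.heappush(lo, -x)
        # keep the halves balanced: len(lo) is len(hi) or len(hi) + 1
        if len(lo) > len(hi) + 1:
            heapq.heappush(hi, -heapq.heappop(lo))
        # restore the order invariant: max(lo) <= min(hi)
        if hi and -lo[0] > hi[0]:
            lo_top, hi_top = -heapq.heappop(lo), heapq.heappop(hi)
            heapq.heappush(lo, -hi_top)
            heapq.heappush(hi, lo_top)
        yield -lo[0]  # the (lower) median so far

with open("input.txt") as fin:
    n = int(fin.readline())
    nums = [int(tok) for tok in fin.readline().split()[:n]]
print(*running_medians(nums))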

IDL: How to divide a FOR loop into N parts for parallel execution?

I have a time-consuming loop of length 300 that I would like to execute in parallel.
Pseudocode:
for t=0, 300 do begin
  output_data[t] = function(input_data[t])
endfor
• The function() is exactly the same for each iteration
• The input_data[t] for each iteration is stored in a file
Is it possible to divide the 300 iterations into K parallel processes (where K is the number of CPUs)?
I found split_for.pro, but if I understand it correctly, it divides the work within the same nth cycle of the loop.
How can I do this?
Thank you!
I have some routines in my library that you could use to do something like the following:
pool = obj_new('MG_Pool', n_processes=k)
x = indgen(300)
output_data = pool->map('my_function', x)
Here, my_function would need to accept an argument i, get the data associated with index i, and apply the function to it. The result would then be put into output_data[i].
You can specify the number of processes to use for the pool object with the N_PROCESSES keyword, or it will automatically use the number of cores you have available.
The code is in my library; check the src/multiprocessing directory. See the examples/multiprocessing directory for some examples of using it.
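This is the familiar map-over-a-process-pool idiom. For comparison, a minimal sketch of the same pattern using Python's multiprocessing (load_input and compute are hypothetical placeholders for the question's per-index file read and per-iteration function):
from multiprocessing import Pool

def load_input(i):
    # hypothetical stand-in for reading input_data[t] from its file
    return float(i)

def compute(x):
    # hypothetical stand-in for the real per-iteration function
    return x * 2.0

def my_function(i):
    # one worker task: fetch the data for index i and process it
    return compute(load_input(i))

if __name__ == "__main__":
    with Pool(processes=4) as pool:   # k = 4 worker processes
        output_data = pool.map(my_function, range(300))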
You can use IDL_IDLBridge along with object arrays to create multiple child processes. Child processes do not inherit variables from the main process, so you may want to use SHMMAP, SHMVAR, and SHMUNMAP to share variables across child processes, or use the SETVAR method of IDL_IDLBridge if memory is not a problem.
As an example, below I create 5 child processes to distribute the for-loop:
dim_input = size(input_data, /dim) ; obtain the data dimension
dim_output = size(output_data, /dim)
shmmap, dimension=dim_input, get_name=seg_input ; set the memory segment
shmmap, dimension=dim_output, get_name=seg_output
shared_input = shmvar(seg_input)
shared_output = shmvar(seg_output)
shared_input[0] = input_data ; assign data to the shared variable
; shared_data[0, 0] if data is 2d
shared_output[0] = output_data
; initialize child processes
procs = objarr(5)
for i = 0, 4 do begin
  procs[i] = IDL_IDLBridge(output='')
  procs[i].setvar, 'istart', i*60
  procs[i].setvar, 'iend', (i+1)*60 - 1
  procs[i].setvar, 'seg_input', seg_input
  procs[i].setvar, 'seg_output', seg_output
  procs[i].setvar, 'dim_input', dim_input
  procs[i].setvar, 'dim_output', dim_output
  procs[i].execute, 'shmmap, seg_input, dimension=dim_input'
  procs[i].execute, 'shmmap, seg_output, dimension=dim_output'
  procs[i].execute, 'shared_input = shmvar(seg_input)'
  procs[i].execute, 'shared_output = shmvar(seg_output)'
endfor
; execute the for-loop asynchronously
for i = 0, 4 do begin
  procs[i].execute, 'for t=istart, iend do ' + $
    'shared_output[t] = function(shared_input[t])', $
    /nowait
endfor
; wait until all child processes are idle
repeat begin
  n_idle = 0
  for i = 0, 4 do begin
    case procs[i].status() of
      0: n_idle++
      2: n_idle++
      else:
    endcase
  endfor
  wait, 1
endrep until (n_idle eq 5)
; cleanup child processes
for i = 0, 4 do begin
  procs[i].cleanup
  obj_destroy, procs[i]
endfor
; assign output values back to output_data
; unmap the shared variable
output_data = shared_output[0]
shmunmap, seg_input
shmunmap, seg_output
shared_input = 0
shared_output = 0
You may also want to optimize your function for multiprocessing. Lastly, to prevent concurrent access to the memory segment, you can use the SEM_CREATE, SEM_LOCK, SEM_RELEASE, and SEM_DELETE functions/procedures provided by IDL.
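The same chunked shared-memory pattern can be sketched in Python's multiprocessing.shared_memory, offered here purely as a cross-language reference (a 1-D float64 array of length 300 is assumed, and the doubling function stands in for the real per-element function):
import numpy as np
from multiprocessing import Process, shared_memory

def worker(seg_in, seg_out, n, istart, iend):
    # attach to the memory segments created by the parent process
    shm_in = shared_memory.SharedMemory(name=seg_in)
    shm_out = shared_memory.SharedMemory(name=seg_out)
    data_in = np.ndarray((n,), dtype=np.float64, buffer=shm_in.buf)
    data_out = np.ndarray((n,), dtype=np.float64, buffer=shm_out.buf)
    for t in range(istart, iend + 1):
        data_out[t] = data_in[t] * 2.0  # stand-in for function()
    shm_in.close()
    shm_out.close()

if __name__ == "__main__":
    n = 300
    shm_in = shared_memory.SharedMemory(create=True, size=n * 8)
    shm_out = shared_memory.SharedMemory(create=True, size=n * 8)
    data_in = np.ndarray((n,), dtype=np.float64, buffer=shm_in.buf)
    data_in[:] = np.arange(n)  # stand-in input data
    # 5 children, 60 iterations each, like the IDL example above
    procs = [Process(target=worker,
                     args=(shm_in.name, shm_out.name, n, i * 60, (i + 1) * 60 - 1))
             for i in range(5)]
    for p in procs: p.start()
    for p in procs: p.join()
    output_data = np.ndarray((n,), dtype=np.float64, buffer=shm_out.buf).copy()
    shm_in.close(); shm_in.unlink()
    shm_out.close(); shm_out.unlink()
Like the SEM_* routines in IDL, Python offers multiprocessing.Lock if the workers ever need to touch overlapping parts of the segment; here each child writes a disjoint slice, so no lock is needed.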

Python: Use Pool to create multiple processes, but the results are never produced

All the functions are placed in one class, including the function that creates the processes and the function that runs inside them; another file calls the functions of this class.
from multiprocessing import Pool
import pandas as pd

def initData(self, type):
    # create six processes to deal with the data
    if type == 'train':
        data = pd.read_csv('./data/train_merged_8.csv')
    elif type == 'test':
        data = pd.read_csv('./data/test_merged_2.csv')
    modelvec = allWord2Vec('no').getModel()
    modelvec_all = allWord2Vec('all').getModel()
    modelvec_stop = allWord2Vec('stop').getModel()
    p = Pool(6)
    count = 0
    for i in data.index:
        count += 1
        p.apply_async(self.valueCal, args=(i, data, modelvec, modelvec_all, modelvec_stop))
        if count % 1000 == 0:
            print(str(count // 100) + 'h rows of data has been dealed')
    p.close()
    p.join

def valueCal(self, i, data, modelvec, modelvec_all, modelvec_stop):
    # the function run in each process
    list_con = []
    q1 = str(data.get_value(i, 'question1')).split()
    q2 = str(data.get_value(i, 'question2')).split()
    f1 = self.getF1_union(q1, q2)
    f2 = self.getF2_inter(q1, q2)
    f3 = self.getF3_sum(q1, q2)
    f4_q1 = len(q1)
    f4_q2 = len(q2)
    f4_rate = f4_q1 / f4_q2
    q1 = [','.join(str(ve)) for ve in q1]
    q2 = [','.join(str(ve)) for ve in q2]
    list_con.append('|'.join(q1))
    list_con.append('|'.join(q2))
    list_con.append(f1)
    list_con.append(f2)
    list_con.append(f3)
    list_con.append(f4_q1)
    list_con.append(f4_q2)
    list_con.append(f4_rate)
    f = open('./data/test.txt', 'a')
    f.write('\t'.join(list_con) + '\n')
    f.close()
The output appears very quickly, like below, but I never see the file being created. When I check the task manager, there are indeed six processes created, and they consume a lot of CPU. Yet when the program finishes, the file still has not been created.
How can I solve this problem?
10h rows of data have been dealed
20h rows of data have been dealed
30h rows of data have been dealed
40h rows of data have been dealed
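No answer is preserved above, but two likely culprits stand out, offered here as a hedged sketch rather than a confirmed fix. First, p.join without parentheses only references the method and never calls it, so the parent does not wait for the workers. Second, apply_async silently swallows worker exceptions (for instance, '\t'.join(list_con) raises a TypeError because list_con mixes strings and numbers) unless you call get() on the returned AsyncResult:
results = []
for i in data.index:
    results.append(p.apply_async(self.valueCal,
                                 args=(i, data, modelvec, modelvec_all, modelvec_stop)))
p.close()
p.join()   # note the parentheses: actually wait for the workers
for r in results:
    r.get()  # re-raises any exception that occurred inside a worker
Note also that many processes appending to the same file can interleave writes; returning the rows to the parent and writing them once is the safer design.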

Ruby - How to subtract numbers from two files and save the result at a specified position in one of them?

I have 2 txt files with different strings and numbers in them, separated by ;.
Now I need to subtract:
((number at position 2 in file1) - (number at position 25 in file2)) = result
Then I want to replace the number at position 2 in file1 with the result.
I tried the code below, but it only appends a number at the end of the file, and what gets appended is not the result of the calculation.
def calc
  f1 = File.open("./file1.txt", File::RDWR)
  f2 = File.open("./file2.txt", File::RDWR)
  f1.flock(File::LOCK_EX)
  f2.flock(File::LOCK_EX)
  f1.each.zip(f2.each).each do |line, line2|
    bg = line.split(";").compact.collect(&:strip)
    bd = line2.split(";").compact.collect(&:strip)
    n = bd[2].to_i - bg[25].to_i
    f2.print bd[2] << n
    # puts "#{n}" Only for testing
  end
  f1.flock(File::LOCK_UN)
  f2.flock(File::LOCK_UN)
  f1.close && f2.close
end
Use something like this:
lines1 = File.readlines('file1.txt').map(&:to_i)
lines2 = File.readlines('file2.txt').map(&:to_i)
result = lines1.zip(lines2).map { |value1, value2| value1 - value2 }
File.write('file1.txt', result.join(?\n))
This code loads both files into memory, then calculates the result and writes it to the first file.
FYI: if you want to keep your own approach, just save the result to another file (e.g. result.txt) and copy it over the original file at the end.
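The snippet above subtracts whole lines; the question actually asks for field 2 of file1 minus field 25 of file2, written back into field 2 of file1. Here is a cross-language sketch in Python of that exact field-wise operation (assuming the positions are 0-based indices into the ;-separated fields):
lines1 = open("file1.txt").read().splitlines()
lines2 = open("file2.txt").read().splitlines()

out = []
for l1, l2 in zip(lines1, lines2):
    fields1 = [s.strip() for s in l1.split(";")]
    fields2 = [s.strip() for s in l2.split(";")]
    # replace field 2 of file1 with (file1 field 2) - (file2 field 25)
    fields1[2] = str(int(fields1[2]) - int(fields2[25]))
    out.append(";".join(fields1))

with open("file1.txt", "w") as f:
    f.write("\n".join(out) + "\n")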

Lua: how to use all the tables in a table

positions = {
  -- table 1
  [1] = {pos = {fromPosition = {x=1809, y=317, z=8}, toPosition = {x=1818, y=331, z=8}}, m = {"100 monster"}},
  -- table 2
  [2] = {pos = {fromPosition = {x=1809, y=317, z=8}, toPosition = {x=1818, y=331, z=8}}, m = {"100 monster"}},
  -- table 3
  [3] = {pos = {fromPosition = {x=1809, y=317, z=8}, toPosition = {x=1818, y=331, z=8}}, m = {"100 monster"}}
}
tb = positions[?] -- what do I need to place here?
for _, x in pairs(tb.m) do -- function
  for s = 1, tonumber(x:match("%d+")) do
    pos = {x = math.random(tb.pos.fromPosition.x, tb.pos.toPosition.x), y = math.random(tb.pos.fromPosition.y, tb.pos.toPosition.y), z = tb.pos.fromPosition.z}
    doCreateMonster(x:match("%s(.+)"), pos)
  end
end
Here is the problem: I use tb = positions[1], and that is only for one table in the positions table. How do I apply this function to all the tables in that table?
I don't know Lua very well but you could loop over the table:
for i = 0, table.getn(positions), 1 do
  tb = positions[i]
  ...
end
Sources:
http://lua.gts-stolberg.de/en/schleifen.php and http://www.lua.org/pil/19.1.html
You need to iterate over positions with a numerical for loop.
Note that, unlike Antoine Lassauzay's answer, the loop below starts at 1 rather than 0, and uses the # operator instead of table.getn (deprecated in Lua 5.1, removed in Lua 5.2).
for i = 1, #positions do
  tb = positions[i]
  ...
end
Use the pairs() built-in; there isn't any reason to do a numeric for loop here.
for index, position in pairs(positions) do
  tb = positions[index]
  -- tb is now exactly the same value as variable 'position'
end
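For comparison, the same visit-every-entry pattern in Python, parsing the "100 monster" strings the way the Lua code does (do_create_monster is a hypothetical stand-in for the game's spawn API):
import random
import re

positions = [
    {"from": (1809, 317, 8), "to": (1818, 331, 8), "m": ["100 monster"]},
    {"from": (1809, 317, 8), "to": (1818, 331, 8), "m": ["100 monster"]},
]

def do_create_monster(name, pos):
    # hypothetical stand-in for the game's spawn call
    print("spawn", name, "at", pos)

for entry in positions:              # visit every table in the list
    for spec in entry["m"]:
        count = int(re.match(r"\d+", spec).group())  # "100"
        name = spec.split(" ", 1)[1]                 # "monster"
        for _ in range(count):
            x = random.randint(entry["from"][0], entry["to"][0])
            y = random.randint(entry["from"][1], entry["to"][1])
            z = entry["from"][2]
            do_create_monster(name, (x, y, z))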
