Parallel @simd MUCH slower than serial @simd in Julia
Summary: Scroll down for a reproducible example that should run from scratch in Julia if you have the packages specified in the `using` lines. (Note: the ODE has a complex, re-usable structure, specified in a Gist that is downloaded and included by the script.)
Background: I have to repeatedly solve a large system of ODEs for different initial-condition vectors. In the example below there are 127 states/ODEs, but it could easily be 1000-2000. I will have to run these solves hundreds to thousands of times for inference, so speed is essential.
The Puzzle: The short version is that, for the serial functions, the @simd version is much faster than the "plain", non-@simd version. But for the parallel versions, the @simd version is much slower -- plus, in that case, the answer, sum_of_solutions, is variable and wrong.
I start Julia with JULIA_NUM_THREADS=auto julia, which on my machine gives 8 threads for 8 cores. I then make sure I never have more than 8 jobs spawned at once.
The different calculation times: (runtime, then sum_of_ODE_solutions)
# Output is (runtime, sum_of_solutions)
serial_with_plain_v7(tspan, p_Ds_v7, solve_results1; number_of_solves=number_of_solves)
serial_with_plain_v7(tspan, p_Ds_v7, solve_results1; number_of_solves=number_of_solves)
serial_with_plain_v7(tspan, p_Ds_v7, solve_results1; number_of_solves=number_of_solves)
# (duration, sum_of_solutions)
# (1.1, 8.731365050398926)
# (0.878, 8.731365050398926)
# (0.898, 8.731365050398926)
serial_with_simd_v7(tspan, p_Ds_v7, solve_results1; number_of_solves=number_of_solves)
serial_with_simd_v7(tspan, p_Ds_v7, solve_results1; number_of_solves=number_of_solves)
serial_with_simd_v7(tspan, p_Ds_v7, solve_results1; number_of_solves=number_of_solves)
# (duration, sum_of_solutions)
# (0.046, 8.731365050398928)
# (0.042, 8.731365050398928)
# (0.046, 8.731365050398928)
parallel_with_plain_v5(tspan, p_Ds_v7, solve_results2; number_of_solves=number_of_solves)
# Faster than serial plain version
# (duration, sum_of_solutions)
# (0.351, 8.731365050398926)
# (0.343, 8.731365050398926)
# (0.366, 8.731365050398926)
parallel_with_simd_v7(tspan, p_Ds_v7, solve_results2; number_of_solves=number_of_solves)
# Dramatically slower than serial simd version, plus wrong sum_of_solutions
# (duration, sum_of_solutions)
# (136.966, 9.61313614002137)
# (141.843, 9.616688089683372)
As you can see, while serial @simd gets the calculation down to 0.046 seconds, and while parallel plain is about 2.5 times faster than serial plain, combining parallelization with the @simd function gives runtimes of ~140 seconds, with variable and wrong answers to boot! Literally the only difference between the two parallelizing functions is using core_op_plain versus core_op_simd for the core ODE-solving operation.
It seems like @simd and Threads.@spawn must be conflicting somehow? I have the parallelizing function set up to never employ more than the number of CPU threads available (8 on my machine).
I am still learning Julia, so there is a chance that some smallish change could isolate the @simd calculations and prevent conflicts across threads (if that is what is happening). Any help is very much appreciated!
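To illustrate what I suspect might be happening, here is a toy sketch (NOT my actual ODE code; `deriv!`, `p_shared`, and `p_local` are made-up names): if the @simd derivative function writes into a preallocated scratch buffer that every spawned task shares through the parameter object, the tasks overwrite each other's intermediate results, which would explain variable, wrong sums.

```julia
using Base.Threads

p_shared = (scratch = zeros(4),)          # ONE preallocated workspace

function deriv!(du, u, p, t)
    @inbounds @simd for i in eachindex(u)
        p.scratch[i] = 2.0 * u[i]         # every task writes the SAME array
    end
    du .= p.scratch
    return nothing
end

# Racy: all tasks share p_shared, so results can vary run to run
racy = [Threads.@spawn begin
            du = zeros(4)
            deriv!(du, fill(Float64(k), 4), p_shared, 0.0)
            sum(du)
        end for k in 1:8]

# Safe: each task gets its own private workspace
safe = [Threads.@spawn begin
            p_local = (scratch = zeros(4),)
            du = zeros(4)
            deriv!(du, fill(Float64(k), 4), p_local, 0.0)
            sum(du)
        end for k in 1:8]

fetch.(racy)   # may vary between runs
fetch.(safe)   # deterministic: 8.0, 16.0, ..., 64.0
```

If this is the mechanism, giving each task its own copy of any mutable workspace (e.g. via deepcopy of the parameter object per task) should fix both the wrong answers and, possibly, the slowdown from cache-line contention.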
PS: Reproducible Example. The code below should provide a reproducible example in any Julia session running with multiple threads. I also include my versioninfo() etc.:
versioninfo()
notes="""
My setup is:
Julia Version 1.7.3
Commit 742b9abb4d (2022-05-06 12:58 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin21.4.0)
CPU: Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-12.0.1 (ORCJIT, ivybridge)
"""
# Startup notes
notes="""
# "If $JULIA_NUM_THREADS is set to auto, then the number of threads will be set to the number of CPU threads."
JULIA_NUM_THREADS=auto julia --startup-file=no
Threads.nthreads(): 8 # Number of CPU threads
"""
using LinearAlgebra # for "I" in: Matrix{Float64}(I, 2, 2)
# https://www.reddit.com/r/Julia/comments/9cfosj/identity_matrix_in_julia_v10/
using Sundials # for CVODE_BDF
using Statistics # for mean()
using DataFrames # for e.g. DataFrame()
using Dates # for e.g. DateTime, Dates.now()
using DifferentialEquations # for ODEProblem
using BenchmarkTools # for @benchmark
using Distributed # for workers
# Check that you have multiple threads
numthreads = Base.Threads.nthreads()
# Download & include the pre-saved model structure/rates (all precalculated for speed; 1.8 MB)
#include("/GitHub/BioGeoJulia.jl/test/model_p_object.jl")
url = "https://gist.githubusercontent.com/nmatzke/ed99ab8f5047794eb25e1fdbd5c43b37/raw/b3e6ddff784bd3521d089642092ba1e3830699c0/model_p_object.jl"
download(url, "model_p_object.jl")
include("model_p_object.jl")
# Load the ODE functions
url = "https://gist.githubusercontent.com/nmatzke/f116258c78bd43ab7a448f07c4290516/raw/24a210261fd2e090b8ed27bc64a59a1ff9ec62cd/simd_vs_spawn_setup_v2.jl"
download(url, "simd_vs_spawn_setup_v2.jl")
include("simd_vs_spawn_setup_v2.jl")
#include("/GitHub/BioGeoJulia.jl/test/simd_vs_spawn_setup_v2.jl")
#include("/GitHub/BioGeoJulia.jl/test/simd_vs_spawn_setup_v3.jl")
# Load the pre-saved model structure/rates (all precalculated for speed; 1.8 MB)
p_Es_v5 = load_ps_127();
# Set up output object
numstates = 127
number_of_solves = 10
solve_results1 = Array{Float64, 2}(undef, number_of_solves, numstates)
solve_results1 .= 0.0
solve_results2 = Array{Float64, 2}(undef, number_of_solves, numstates)
solve_results2 .= 0.0
length(solve_results1)
length(solve_results1[1])
sum(sum.(solve_results1))
# Precalculate the Es for use in the Ds
Es_tspan = (0.0, 60.0)
prob_Es_v7 = DifferentialEquations.ODEProblem(Es_v7_simd_sums, p_Es_v5.uE, Es_tspan, p_Es_v5);
sol_Es_v7 = solve(prob_Es_v7, CVODE_BDF(linear_solver=:GMRES), save_everystep=true,
abstol=1e-12, reltol=1e-9);
p_Ds_v7 = (n=p_Es_v5.n, params=p_Es_v5.params, p_indices=p_Es_v5.p_indices, p_TFs=p_Es_v5.p_TFs, uE=p_Es_v5.uE, terms=p_Es_v5.terms, sol_Es_v5=sol_Es_v7);
# Set up ODE inputs
u = collect(repeat([0.0], numstates));
u[2] = 1.0
du = similar(u)
du .= 0.0
p = p_Ds_v7;
t = 1.0
# ODE functions to integrate (single-step; ODE solvers will run this many many times)
@time Ds_v5_tmp(du,u,p,t)
@time Ds_v5_tmp(du,u,p,t)
@time Ds_v7_simd_sums(du,u,p,t)
@time Ds_v7_simd_sums(du,u,p,t)
# @btime Ds_v5_tmp(du,u,p,t)
#   7.819 ms (15847 allocations: 1.09 MiB)
# @btime Ds_v7_simd_sums(du,u,p,t)
#   155.858 μs (3075 allocations: 68.66 KiB)
tspan = (0.0, 1.0)
prob_Ds_v7 = DifferentialEquations.ODEProblem(Ds_v7_simd_sums, p_Ds_v7.uE, tspan, p_Ds_v7);
sol_Ds_v7 = solve(prob_Ds_v7, CVODE_BDF(linear_solver=:GMRES), save_everystep=false, abstol=1e-12, reltol=1e-9);
# This is the core operation; plain version (no @simd)
function core_op_plain(u, tspan, p_Ds_v7)
prob_Ds_v5 = DifferentialEquations.ODEProblem(Ds_v5_tmp, u.+0.0, tspan, p_Ds_v7);
sol_Ds_v5 = solve(prob_Ds_v5, CVODE_BDF(linear_solver=:GMRES), save_everystep=false, abstol=1e-12, reltol=1e-9);
return sol_Ds_v5
end
# This is the core operation; @simd version
function core_op_simd(u, tspan, p_Ds_v7)
prob_Ds_v7 = DifferentialEquations.ODEProblem(Ds_v7_simd_sums, u.+0.0, tspan, p_Ds_v7);
sol_Ds_v7 = solve(prob_Ds_v7, CVODE_BDF(linear_solver=:GMRES), save_everystep=false, abstol=1e-12, reltol=1e-9);
return sol_Ds_v7
end
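(Aside: I pass `u.+0.0` above just to make sure each ODEProblem gets its own copy of the initial condition rather than a reference to the caller's array; `copy(u)` is the more idiomatic spelling of the same thing:)

```julia
u = rand(3)
v = u .+ 0.0    # broadcasting allocates a fresh array
w = copy(u)     # idiomatic equivalent
v === u         # false -- v is new storage
w == u          # true  -- same values, separate array
```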
@time core_op_plain(u, tspan, p_Ds_v7);
@time core_op_plain(u, tspan, p_Ds_v7);
@time core_op_simd(u, tspan, p_Ds_v7);
@time core_op_simd(u, tspan, p_Ds_v7);
function serial_with_plain_v7(tspan, p_Ds_v7, solve_results1; number_of_solves=10)
start_time = Dates.now()
for i in 1:number_of_solves
# Temporary u
solve_results1[i,:] .= 0.0
# Change the ith state from 0.0 to 1.0
solve_results1[i,i] = 1.0
solve_results1
sol_Ds_v7 = core_op_plain(solve_results1[i,:], tspan, p_Ds_v7)
solve_results1[i,:] .= sol_Ds_v7.u[length(sol_Ds_v7.u)]
# print("\n")
# print(round.(sol_Ds_v7[length(sol_Ds_v7)], digits=3))
end
end_time = Dates.now()
duration = (end_time - start_time).value / 1000.0
sum_of_solutions = sum(sum.(solve_results1))
return (duration, sum_of_solutions)
end
function serial_with_simd_v7(tspan, p_Ds_v7, solve_results1; number_of_solves=10)
start_time = Dates.now()
for i in 1:number_of_solves
# Temporary u
solve_results1[i,:] .= 0.0
# Change the ith state from 0.0 to 1.0
solve_results1[i,i] = 1.0
solve_results1
sol_Ds_v7 = core_op_simd(solve_results1[i,:], tspan, p_Ds_v7)
solve_results1[i,:] .= sol_Ds_v7.u[length(sol_Ds_v7.u)]
# print("\n")
# print(round.(sol_Ds_v7[length(sol_Ds_v7)], digits=3))
end
end_time = Dates.now()
duration = (end_time - start_time).value / 1000.0
sum_of_solutions = sum(sum.(solve_results1))
return (duration, sum_of_solutions)
end
# Output is (runtime, sum_of_solutions)
serial_with_plain_v7(tspan, p_Ds_v7, solve_results1; number_of_solves=number_of_solves)
serial_with_plain_v7(tspan, p_Ds_v7, solve_results1; number_of_solves=number_of_solves)
serial_with_plain_v7(tspan, p_Ds_v7, solve_results1; number_of_solves=number_of_solves)
# (duration, sum_of_solutions)
# (1.1, 8.731365050398926)
# (0.878, 8.731365050398926)
# (0.898, 8.731365050398926)
serial_with_simd_v7(tspan, p_Ds_v7, solve_results1; number_of_solves=number_of_solves)
serial_with_simd_v7(tspan, p_Ds_v7, solve_results1; number_of_solves=number_of_solves)
serial_with_simd_v7(tspan, p_Ds_v7, solve_results1; number_of_solves=number_of_solves)
# (duration, sum_of_solutions)
# (0.046, 8.731365050398928)
# (0.042, 8.731365050398928)
# (0.046, 8.731365050398928)
using Distributed
function parallel_with_plain_v5(tspan, p_Ds_v7, solve_results2; number_of_solves=10)
start_time = Dates.now()
number_of_threads = Base.Threads.nthreads()
curr_numthreads = Base.Threads.nthreads()
# Individual ODE solutions will occur over different timeperiods,
# initial values, and parameters. We'd just like to load up the
# cores for the first jobs in the list, then add jobs as earlier
# jobs finish.
tasks = Any[]
tasks_started_TF = Bool[]
tasks_fetched_TF = Bool[]
task_numbers = Any[]
task_inc = 0
are_we_done = false
current_running_tasks = Any[]
# List the tasks
for i in 1:number_of_solves
# Temporary u
solve_results2[i,:] .= 0.0
# Change the ith state from 0.0 to 1.0
solve_results2[i,i] = 1.0
task_inc = task_inc + 1
push!(tasks_started_TF, false) # Add a "false" to tasks_started_TF
push!(tasks_fetched_TF, false) # Add a "false" to tasks_fetched_TF
push!(task_numbers, task_inc)
end
# Total number of tasks
num_tasks = length(tasks_fetched_TF)
iteration_number = 0
while(are_we_done == false)
iteration_number = iteration_number+1
# Launch tasks when thread (core) is available
for j in 1:num_tasks
if (tasks_fetched_TF[j] == false)
if (tasks_started_TF[j] == false) && (curr_numthreads > 0)
# Start a task
push!(tasks, Base.Threads.@spawn core_op_plain(solve_results2[j,:], tspan, p_Ds_v7));
curr_numthreads = curr_numthreads-1;
tasks_started_TF[j] = true;
push!(current_running_tasks, task_numbers[j])
end
end
end
# Check for finished tasks
tasks_to_check_TF = ((tasks_started_TF.==true) .+ (tasks_fetched_TF.==false)).==2
if sum(tasks_to_check_TF .== true) > 0
for k in 1:sum(tasks_to_check_TF)
if (tasks_fetched_TF[current_running_tasks[k]] == false)
if (istaskstarted(tasks[k]) == true) && (istaskdone(tasks[k]) == true)
sol_Ds_v7 = fetch(tasks[k]);
solve_results2[current_running_tasks[k],:] .= sol_Ds_v7.u[length(sol_Ds_v7.u)].+0.0
tasks_fetched_TF[current_running_tasks[k]] = true
current_tasknum = current_running_tasks[k]
deleteat!(tasks, k)
deleteat!(current_running_tasks, k)
curr_numthreads = curr_numthreads+1;
print("\nFinished task #")
print(current_tasknum)
print(", current task k=")
print(k)
break # break out of this loop, since you have modified current_running_tasks
end
end
end
end
are_we_done = sum(tasks_fetched_TF) == length(tasks_fetched_TF)
# Test for concluding the while loop
are_we_done && break
end # END while(are_we_done == false)
end_time = Dates.now()
duration = (end_time - start_time).value / 1000.0
sum_of_solutions = sum(sum.(solve_results2))
print("\n")
return (duration, sum_of_solutions)
end
function parallel_with_simd_v7(tspan, p_Ds_v7, solve_results2; number_of_solves=10)
start_time = Dates.now()
number_of_threads = Base.Threads.nthreads()
curr_numthreads = Base.Threads.nthreads()
# Individual ODE solutions will occur over different timeperiods,
# initial values, and parameters. We'd just like to load up the
# cores for the first jobs in the list, then add jobs as earlier
# jobs finish.
tasks = Any[]
tasks_started_TF = Bool[]
tasks_fetched_TF = Bool[]
task_numbers = Any[]
task_inc = 0
are_we_done = false
current_running_tasks = Any[]
# List the tasks
for i in 1:number_of_solves
# Temporary u
solve_results2[i,:] .= 0.0
# Change the ith state from 0.0 to 1.0
solve_results2[i,i] = 1.0
task_inc = task_inc + 1
push!(tasks_started_TF, false) # Add a "false" to tasks_started_TF
push!(tasks_fetched_TF, false) # Add a "false" to tasks_fetched_TF
push!(task_numbers, task_inc)
end
# Total number of tasks
num_tasks = length(tasks_fetched_TF)
iteration_number = 0
while(are_we_done == false)
iteration_number = iteration_number+1
# Launch tasks when thread (core) is available
for j in 1:num_tasks
if (tasks_fetched_TF[j] == false)
if (tasks_started_TF[j] == false) && (curr_numthreads > 0)
# Start a task
push!(tasks, Base.Threads.@spawn core_op_simd(solve_results2[j,:], tspan, p_Ds_v7))
curr_numthreads = curr_numthreads-1;
tasks_started_TF[j] = true;
push!(current_running_tasks, task_numbers[j])
end
end
end
# Check for finished tasks
tasks_to_check_TF = ((tasks_started_TF.==true) .+ (tasks_fetched_TF.==false)).==2
if sum(tasks_to_check_TF .== true) > 0
for k in 1:sum(tasks_to_check_TF)
if (tasks_fetched_TF[current_running_tasks[k]] == false)
if (istaskstarted(tasks[k]) == true) && (istaskdone(tasks[k]) == true)
sol_Ds_v7 = fetch(tasks[k]);
solve_results2[current_running_tasks[k],:] .= sol_Ds_v7.u[length(sol_Ds_v7.u)].+0.0
tasks_fetched_TF[current_running_tasks[k]] = true
current_tasknum = current_running_tasks[k]
deleteat!(tasks, k)
deleteat!(current_running_tasks, k)
curr_numthreads = curr_numthreads+1;
print("\nFinished task #")
print(current_tasknum)
print(", current task k=")
print(k)
break # break out of this loop, since you have modified current_running_tasks
end
end
end
end
are_we_done = sum(tasks_fetched_TF) == length(tasks_fetched_TF)
# Test for concluding the while loop
are_we_done && break
end # END while(are_we_done == false)
end_time = Dates.now()
duration = (end_time - start_time).value / 1000.0
sum_of_solutions = sum(sum.(solve_results2))
print("\n")
return (duration, sum_of_solutions)
end
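For reference, here is a much simpler spawning pattern I also considered (a sketch only; `parallel_spawn_all` is a made-up name, and it assumes the same `core_op_simd`, `p_Ds_v7`, and result-matrix inputs defined above). Julia's scheduler already load-balances tasks across threads, so the manual bookkeeping in my functions above may be unnecessary:

```julia
function parallel_spawn_all(tspan, p_Ds_v7, solve_results; number_of_solves=10)
    start_time = Dates.now()
    # Spawn one task per solve; the scheduler maps tasks onto threads.
    tasks = map(1:number_of_solves) do i
        u0 = zeros(size(solve_results, 2))
        u0[i] = 1.0                      # ith state starts at 1.0
        Threads.@spawn core_op_simd(u0, tspan, p_Ds_v7)
    end
    # fetch() blocks until each task finishes
    for i in 1:number_of_solves
        sol = fetch(tasks[i])
        solve_results[i, :] .= sol.u[end]
    end
    duration = (Dates.now() - start_time).value / 1000.0
    return (duration, sum(sum.(solve_results)))
end
```

Note this spawns all jobs at once rather than capping them at nthreads(); since tasks are cheap and the scheduler multiplexes them onto the available threads, that cap may not be needed.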
tspan = (0.0, 1.0)
parallel_with_plain_v5(tspan, p_Ds_v7, solve_results2; number_of_solves=number_of_solves)
# Faster than serial plain version
# (duration, sum_of_solutions)
# (0.351, 8.731365050398926)
# (0.343, 8.731365050398926)
# (0.366, 8.731365050398926)
parallel_with_simd_v7(tspan, p_Ds_v7, solve_results2; number_of_solves=number_of_solves)
# Dramatically slower than serial simd version
# (duration, sum_of_solutions)
# (136.966, 9.61313614002137)
# (141.843, 9.616688089683372)
Thanks again, Nick