Parallel simd MUCH slower than serial simd in Julia

Summary: Scroll down for a reproducible example which should run from scratch in Julia, provided you have the packages specified in the using lines. (Note: the ODE has a complex, re-usable structure which is specified in a Gist that is downloaded/included by the script.)
Background: I have to repeatedly solve a large system of ODEs for different initial-condition vectors. In the example below it is 127 states/ODEs, but it could easily be 1000-2000. I will have to run these 100s-1000s of times for inference, so speed is essential.
The Puzzle: The short version is that, for the serial functions, the @simd version is much faster than the plain, non-@simd version. But for the parallel versions, the @simd version is much slower -- and, in this case, the answer, sum_of_solutions, is variable and wrong.
I have this set up so that Julia is started with JULIA_NUM_THREADS=auto julia; in my case this gives 8 threads on my 8-core machine. I then make sure I never have more than 8 jobs spawned at once.
The different calculation times: (runtime, then sum_of_ODE_solutions)
# Output is (runtime, sum_of_solutions)
serial_with_plain_v7(tspan, p_Ds_v7, solve_results1; number_of_solves=number_of_solves)
serial_with_plain_v7(tspan, p_Ds_v7, solve_results1; number_of_solves=number_of_solves)
serial_with_plain_v7(tspan, p_Ds_v7, solve_results1; number_of_solves=number_of_solves)
# (duration, sum_of_solutions)
# (1.1, 8.731365050398926)
# (0.878, 8.731365050398926)
# (0.898, 8.731365050398926)
serial_with_simd_v7(tspan, p_Ds_v7, solve_results1; number_of_solves=number_of_solves)
serial_with_simd_v7(tspan, p_Ds_v7, solve_results1; number_of_solves=number_of_solves)
serial_with_simd_v7(tspan, p_Ds_v7, solve_results1; number_of_solves=number_of_solves)
# (duration, sum_of_solutions)
# (0.046, 8.731365050398928)
# (0.042, 8.731365050398928)
# (0.046, 8.731365050398928)
parallel_with_plain_v5(tspan, p_Ds_v7, solve_results2; number_of_solves=number_of_solves)
# Faster than serial plain version
# (duration, sum_of_solutions)
# (0.351, 8.731365050398926)
# (0.343, 8.731365050398926)
# (0.366, 8.731365050398926)
parallel_with_simd_v7(tspan, p_Ds_v7, solve_results2; number_of_solves=number_of_solves)
# Dramatically slower than serial simd version, plus wrong sum_of_solutions
# (duration, sum_of_solutions)
# (136.966, 9.61313614002137)
# (141.843, 9.616688089683372)
As you can see, while serial @simd gets the calculation time down to 0.046 seconds, and while parallel plain is about 2.5 times faster than serial plain, when I combine parallelization with the @simd function I get runtimes of ~140 seconds -- with variable and wrong answers to boot! Literally the only difference between the two parallelizing functions is using core_op_plain versus core_op_simd for the core ODE-solving operation.
It seems like @simd and @spawn must be conflicting somehow? I have the parallelizing function set up to never employ more than the number of CPU threads available (8 on my machine).
I am still learning Julia, so there is a chance that some smallish change could isolate the @simd calculations and prevent conflicts across threads (if that is what is happening). Any help is very much appreciated!
PS: Reproducible Example. The code below should provide a reproducible example in any Julia session running multiple threads. I also include my versioninfo() etc.:
versioninfo()
notes="""
My setup is:
Julia Version 1.7.3
Commit 742b9abb4d (2022-05-06 12:58 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin21.4.0)
CPU: Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-12.0.1 (ORCJIT, ivybridge)
"""
# Startup notes
notes="""
# "If $JULIA_NUM_THREADS is set to auto, then the number of threads will be set to the number of CPU threads."
JULIA_NUM_THREADS=auto julia --startup-file=no
Threads.nthreads(): 8 # Number of CPU threads
"""
using LinearAlgebra # for "I" in: Matrix{Float64}(I, 2, 2)
# https://www.reddit.com/r/Julia/comments/9cfosj/identity_matrix_in_julia_v10/
using Sundials # for CVODE_BDF
using Statistics # for mean(), max()
using DataFrames # for e.g. DataFrame()
using Dates # for e.g. DateTime, Dates.now()
using DifferentialEquations # for ODEProblem
using BenchmarkTools # for @benchmark
using Distributed # for workers
# Check that you have multiple threads
numthreads = Base.Threads.nthreads()
# Download & include the pre-saved model structure/rates (all precalculated for speed; 1.8 MB)
#include("/GitHub/BioGeoJulia.jl/test/model_p_object.jl")
url = "https://gist.githubusercontent.com/nmatzke/ed99ab8f5047794eb25e1fdbd5c43b37/raw/b3e6ddff784bd3521d089642092ba1e3830699c0/model_p_object.jl"
download(url, "model_p_object.jl")
include("model_p_object.jl")
# Load the ODE functions
url = "https://gist.githubusercontent.com/nmatzke/f116258c78bd43ab7a448f07c4290516/raw/24a210261fd2e090b8ed27bc64a59a1ff9ec62cd/simd_vs_spawn_setup_v2.jl"
download(url, "simd_vs_spawn_setup_v2.jl")
include("simd_vs_spawn_setup_v2.jl")
#include("/GitHub/BioGeoJulia.jl/test/simd_vs_spawn_setup_v2.jl")
#include("/GitHub/BioGeoJulia.jl/test/simd_vs_spawn_setup_v3.jl")
# Load the pre-saved model structure/rates (all precalculated for speed; 1.8 MB)
p_Es_v5 = load_ps_127();
# Set up output object
numstates = 127
number_of_solves = 10
solve_results1 = Array{Float64, 2}(undef, number_of_solves, numstates)
solve_results1 .= 0.0
solve_results2 = Array{Float64, 2}(undef, number_of_solves, numstates)
solve_results2 .= 0.0
length(solve_results1)
length(solve_results1[1])
sum(sum.(solve_results1))
# Precalculate the Es for use in the Ds
Es_tspan = (0.0, 60.0)
prob_Es_v7 = DifferentialEquations.ODEProblem(Es_v7_simd_sums, p_Es_v5.uE, Es_tspan, p_Es_v5);
sol_Es_v7 = solve(prob_Es_v7, CVODE_BDF(linear_solver=:GMRES), save_everystep=true,
abstol=1e-12, reltol=1e-9);
p_Ds_v7 = (n=p_Es_v5.n, params=p_Es_v5.params, p_indices=p_Es_v5.p_indices, p_TFs=p_Es_v5.p_TFs, uE=p_Es_v5.uE, terms=p_Es_v5.terms, sol_Es_v5=sol_Es_v7);
# Set up ODE inputs
u = collect(repeat([0.0], numstates));
u[2] = 1.0
du = similar(u)
du .= 0.0
p = p_Ds_v7;
t = 1.0
# ODE functions to integrate (single-step; ODE solvers will run this many many times)
@time Ds_v5_tmp(du,u,p,t)
@time Ds_v5_tmp(du,u,p,t)
@time Ds_v7_simd_sums(du,u,p,t)
@time Ds_v7_simd_sums(du,u,p,t)
# @btime Ds_v5_tmp(du,u,p,t)
# 7.819 ms (15847 allocations: 1.09 MiB)
# @btime Ds_v7_simd_sums(du,u,p,t)
# 155.858 μs (3075 allocations: 68.66 KiB)
tspan = (0.0, 1.0)
prob_Ds_v7 = DifferentialEquations.ODEProblem(Ds_v7_simd_sums, p_Ds_v7.uE, tspan, p_Ds_v7);
sol_Ds_v7 = solve(prob_Ds_v7, CVODE_BDF(linear_solver=:GMRES), save_everystep=false, abstol=1e-12, reltol=1e-9);
# This is the core operation; plain version (no @simd)
function core_op_plain(u, tspan, p_Ds_v7)
prob_Ds_v5 = DifferentialEquations.ODEProblem(Ds_v5_tmp, u.+0.0, tspan, p_Ds_v7);
sol_Ds_v5 = solve(prob_Ds_v5, CVODE_BDF(linear_solver=:GMRES), save_everystep=false, abstol=1e-12, reltol=1e-9);
return sol_Ds_v5
end
# This is the core operation; @simd version
function core_op_simd(u, tspan, p_Ds_v7)
prob_Ds_v7 = DifferentialEquations.ODEProblem(Ds_v7_simd_sums, u.+0.0, tspan, p_Ds_v7);
sol_Ds_v7 = solve(prob_Ds_v7, CVODE_BDF(linear_solver=:GMRES), save_everystep=false, abstol=1e-12, reltol=1e-9);
return sol_Ds_v7
end
@time core_op_plain(u, tspan, p_Ds_v7);
@time core_op_plain(u, tspan, p_Ds_v7);
@time core_op_simd(u, tspan, p_Ds_v7);
@time core_op_simd(u, tspan, p_Ds_v7);
function serial_with_plain_v7(tspan, p_Ds_v7, solve_results1; number_of_solves=10)
start_time = Dates.now()
for i in 1:number_of_solves
# Temporary u
solve_results1[i,:] .= 0.0
# Change the ith state from 0.0 to 1.0
solve_results1[i,i] = 1.0
solve_results1
sol_Ds_v7 = core_op_plain(solve_results1[i,:], tspan, p_Ds_v7)
solve_results1[i,:] .= sol_Ds_v7.u[length(sol_Ds_v7.u)]
# print("\n")
# print(round.(sol_Ds_v7[length(sol_Ds_v7)], digits=3))
end
end_time = Dates.now()
duration = (end_time - start_time).value / 1000.0
sum_of_solutions = sum(sum.(solve_results1))
return (duration, sum_of_solutions)
end
function serial_with_simd_v7(tspan, p_Ds_v7, solve_results1; number_of_solves=10)
start_time = Dates.now()
for i in 1:number_of_solves
# Temporary u
solve_results1[i,:] .= 0.0
# Change the ith state from 0.0 to 1.0
solve_results1[i,i] = 1.0
solve_results1
sol_Ds_v7 = core_op_simd(solve_results1[i,:], tspan, p_Ds_v7)
solve_results1[i,:] .= sol_Ds_v7.u[length(sol_Ds_v7.u)]
# print("\n")
# print(round.(sol_Ds_v7[length(sol_Ds_v7)], digits=3))
end
end_time = Dates.now()
duration = (end_time - start_time).value / 1000.0
sum_of_solutions = sum(sum.(solve_results1))
return (duration, sum_of_solutions)
end
# Output is (runtime, sum_of_solutions)
serial_with_plain_v7(tspan, p_Ds_v7, solve_results1; number_of_solves=number_of_solves)
serial_with_plain_v7(tspan, p_Ds_v7, solve_results1; number_of_solves=number_of_solves)
serial_with_plain_v7(tspan, p_Ds_v7, solve_results1; number_of_solves=number_of_solves)
# (duration, sum_of_solutions)
# (1.1, 8.731365050398926)
# (0.878, 8.731365050398926)
# (0.898, 8.731365050398926)
serial_with_simd_v7(tspan, p_Ds_v7, solve_results1; number_of_solves=number_of_solves)
serial_with_simd_v7(tspan, p_Ds_v7, solve_results1; number_of_solves=number_of_solves)
serial_with_simd_v7(tspan, p_Ds_v7, solve_results1; number_of_solves=number_of_solves)
# (duration, sum_of_solutions)
# (0.046, 8.731365050398928)
# (0.042, 8.731365050398928)
# (0.046, 8.731365050398928)
using Distributed
function parallel_with_plain_v5(tspan, p_Ds_v7, solve_results2; number_of_solves=10)
start_time = Dates.now()
number_of_threads = Base.Threads.nthreads()
curr_numthreads = Base.Threads.nthreads()
# Individual ODE solutions will occur over different timeperiods,
# initial values, and parameters. We'd just like to load up the
# cores for the first jobs in the list, then add jobs as earlier
# jobs finish.
tasks = Any[]
tasks_started_TF = Bool[]
tasks_fetched_TF = Bool[]
task_numbers = Any[]
task_inc = 0
are_we_done = false
current_running_tasks = Any[]
# List the tasks
for i in 1:number_of_solves
# Temporary u
solve_results2[i,:] .= 0.0
# Change the ith state from 0.0 to 1.0
solve_results2[i,i] = 1.0
task_inc = task_inc + 1
push!(tasks_started_TF, false) # Add a "false" to tasks_started_TF
push!(tasks_fetched_TF, false) # Add a "false" to tasks_fetched_TF
push!(task_numbers, task_inc)
end
# Total number of tasks
num_tasks = length(tasks_fetched_TF)
iteration_number = 0
while(are_we_done == false)
iteration_number = iteration_number+1
# Launch tasks when thread (core) is available
for j in 1:num_tasks
if (tasks_fetched_TF[j] == false)
if (tasks_started_TF[j] == false) && (curr_numthreads > 0)
# Start a task
push!(tasks, Base.Threads.@spawn core_op_plain(solve_results2[j,:], tspan, p_Ds_v7));
curr_numthreads = curr_numthreads-1;
tasks_started_TF[j] = true;
push!(current_running_tasks, task_numbers[j])
end
end
end
# Check for finished tasks
tasks_to_check_TF = ((tasks_started_TF.==true) .+ (tasks_fetched_TF.==false)).==2
if sum(tasks_to_check_TF .== true) > 0
for k in 1:sum(tasks_to_check_TF)
if (tasks_fetched_TF[current_running_tasks[k]] == false)
if (istaskstarted(tasks[k]) == true) && (istaskdone(tasks[k]) == true)
sol_Ds_v7 = fetch(tasks[k]);
solve_results2[current_running_tasks[k],:] .= sol_Ds_v7.u[length(sol_Ds_v7.u)].+0.0
tasks_fetched_TF[current_running_tasks[k]] = true
current_tasknum = current_running_tasks[k]
deleteat!(tasks, k)
deleteat!(current_running_tasks, k)
curr_numthreads = curr_numthreads+1;
print("\nFinished task #")
print(current_tasknum)
print(", current task k=")
print(k)
break # break out of this loop, since you have modified current_running_tasks
end
end
end
end
are_we_done = sum(tasks_fetched_TF) == length(tasks_fetched_TF)
# Test for concluding the while loop
are_we_done && break
end # END while(are_we_done == false)
end_time = Dates.now()
duration = (end_time - start_time).value / 1000.0
sum_of_solutions = sum(sum.(solve_results2))
print("\n")
return (duration, sum_of_solutions)
end
function parallel_with_simd_v7(tspan, p_Ds_v7, solve_results2; number_of_solves=10)
start_time = Dates.now()
number_of_threads = Base.Threads.nthreads()
curr_numthreads = Base.Threads.nthreads()
# Individual ODE solutions will occur over different timeperiods,
# initial values, and parameters. We'd just like to load up the
# cores for the first jobs in the list, then add jobs as earlier
# jobs finish.
tasks = Any[]
tasks_started_TF = Bool[]
tasks_fetched_TF = Bool[]
task_numbers = Any[]
task_inc = 0
are_we_done = false
current_running_tasks = Any[]
# List the tasks
for i in 1:number_of_solves
# Temporary u
solve_results2[i,:] .= 0.0
# Change the ith state from 0.0 to 1.0
solve_results2[i,i] = 1.0
task_inc = task_inc + 1
push!(tasks_started_TF, false) # Add a "false" to tasks_started_TF
push!(tasks_fetched_TF, false) # Add a "false" to tasks_fetched_TF
push!(task_numbers, task_inc)
end
# Total number of tasks
num_tasks = length(tasks_fetched_TF)
iteration_number = 0
while(are_we_done == false)
iteration_number = iteration_number+1
# Launch tasks when thread (core) is available
for j in 1:num_tasks
if (tasks_fetched_TF[j] == false)
if (tasks_started_TF[j] == false) && (curr_numthreads > 0)
# Start a task
push!(tasks, Base.Threads.@spawn core_op_simd(solve_results2[j,:], tspan, p_Ds_v7))
curr_numthreads = curr_numthreads-1;
tasks_started_TF[j] = true;
push!(current_running_tasks, task_numbers[j])
end
end
end
# Check for finished tasks
tasks_to_check_TF = ((tasks_started_TF.==true) .+ (tasks_fetched_TF.==false)).==2
if sum(tasks_to_check_TF .== true) > 0
for k in 1:sum(tasks_to_check_TF)
if (tasks_fetched_TF[current_running_tasks[k]] == false)
if (istaskstarted(tasks[k]) == true) && (istaskdone(tasks[k]) == true)
sol_Ds_v7 = fetch(tasks[k]);
solve_results2[current_running_tasks[k],:] .= sol_Ds_v7.u[length(sol_Ds_v7.u)].+0.0
tasks_fetched_TF[current_running_tasks[k]] = true
current_tasknum = current_running_tasks[k]
deleteat!(tasks, k)
deleteat!(current_running_tasks, k)
curr_numthreads = curr_numthreads+1;
print("\nFinished task #")
print(current_tasknum)
print(", current task k=")
print(k)
break # break out of this loop, since you have modified current_running_tasks
end
end
end
end
are_we_done = sum(tasks_fetched_TF) == length(tasks_fetched_TF)
# Test for concluding the while loop
are_we_done && break
end # END while(are_we_done == false)
end_time = Dates.now()
duration = (end_time - start_time).value / 1000.0
sum_of_solutions = sum(sum.(solve_results2))
print("\n")
return (duration, sum_of_solutions)
end
tspan = (0.0, 1.0)
parallel_with_plain_v5(tspan, p_Ds_v7, solve_results2; number_of_solves=number_of_solves)
# Faster than serial plain version
# (duration, sum_of_solutions)
# (0.351, 8.731365050398926)
# (0.343, 8.731365050398926)
# (0.366, 8.731365050398926)
parallel_with_simd_v7(tspan, p_Ds_v7, solve_results2; number_of_solves=number_of_solves)
# Dramatically slower than serial simd version
# (duration, sum_of_solutions)
# (136.966, 9.61313614002137)
# (141.843, 9.616688089683372)
Thanks again, Nick

Related

Why does the total object count by `ObjectSpace.count_objects` not change?

I get this result (Cf. https://ruby-doc.org/core-2.5.1/ObjectSpace.html#method-c-count_objects):
total = ObjectSpace.count_objects[:TOTAL]
new_object = "tonytonyjan"
ObjectSpace.count_objects[:TOTAL] - total # => 0
total = ObjectSpace.count_objects[:T_STRING]
new_object = "tonytonyjan"
ObjectSpace.count_objects[:T_STRING] - total # => 0
Please explain why the result is zero. Did new_object die just after the initialization?
Rather, rely on each_object to report the status of live objects:
def foo
total = ObjectSpace.each_object(String).count
str = "kiddorails"
puts ObjectSpace.each_object(String).count - total
end
foo
#=> 1
Another thing to note: the above code snippet is not foolproof for counting the new String objects, since GC is enabled and can kick in at any time. I would prefer this:
def foo
GC.enable # enables GC if not enabled
GC.start(full_mark: true, immediate_sweep: true, immediate_mark: false) # perform GC if required on current object space
GC.disable # disable GC to get the right facts below
total = ObjectSpace.each_object(String).count
100.times { "kiddorails" }
puts ObjectSpace.each_object(String).count - total
end
foo #=> 100

scrapy response.xpath() cause memory leaking

I found that the response.xpath() method leaks memory while using scrapy to write a spider. Here is the code:
def extract_data(self, response):
aomen_host_water = None
aomen_pankou = None
aomen_guest_water = None
sb_host_water = None
sb_pankou = None
sb_guest_water = None
# all_trs = response.xpath('//div[@id="webmain"]/table[@id="odds"]/tr')
# for tr in all_trs:
# # cname(company name)
# cname = tr.xpath('td[1]/text()').extract()
# if len(cname) == 0:
# continue
# # remove extra space and other stuff
# cname = cname[0].split(' ')[0]
# if cname == u'澳彩':
# aomen_host_water = tr.xpath('td[9]/text()').extract()
# if len(aomen_host_water) != 0:
# aomen_pankou = tr.xpath('td[10]/text()').extract()
# aomen_guest_water = tr.xpath('td[11]/text()').extract()
# else:
# aomen_host_water = tr.xpath('td[6]/text()').extract()
# aomen_pankou = tr.xpath('td[7]/text()').extract()
# aomen_guest_water = tr.xpath('td[8]/text()').extract()
# elif cname == u'SB':
# sb_host_water = tr.xpath('td[9]/text()').extract()
# if len(sb_host_water) != 0:
# sb_pankou = tr.xpath('td[10]/text()').extract()
# sb_guest_water = tr.xpath('td[11]/text()').extract()
# else:
# sb_host_water = tr.xpath('td[6]/text()').extract()
# sb_pankou = tr.xpath('td[7]/text()').extract()
# sb_guest_water = tr.xpath('td[8]/text()').extract()
# if (aomen_host_water is None) or (aomen_pankou is None) or (aomen_guest_water is None) or \
# (sb_host_water is None) or (sb_pankou is None) or (sb_guest_water is None):
# return None
# if (len(aomen_host_water) == 0) or (len(aomen_pankou) == 0) or (len(aomen_guest_water) == 0) or \
# (len(sb_host_water) == 0) or (len(sb_pankou) == 0) or (len(sb_guest_water) == 0):
# return None
# item = YPItem()
# item['aomen_host_water'] = float(aomen_host_water[0])
# item['aomen_pankou'] = aomen_pankou[0].encode('utf-8') # float(pankou.pankou2num(aomen_pankou[0]))
# item['aomen_guest_water'] = float(aomen_guest_water[0])
# item['sb_host_water'] = float(sb_host_water[0])
# item['sb_pankou'] = sb_pankou[0].encode('utf-8') # float(pankou.pankou2num(sb_pankou[0]))
# item['sb_guest_water'] = float(sb_guest_water[0])
item = YPItem()
item['aomen_host_water'] = 1.0
item['aomen_pankou'] = '111' # float(pankou.pankou2num(aomen_pankou[0]))
item['aomen_guest_water'] = 1.0
item['sb_host_water'] = 1.0
item['sb_pankou'] = '111' # float(pankou.pankou2num(sb_pankou[0]))
item['sb_guest_water'] = 1.0
return item
Here I commented out the useful statements and used fake data; the spider used about 45 MB of memory. When I uncommented those lines, the spider used 100+ MB and the memory usage continuously rose. Has anybody met this kind of problem before?
You might decrease the memory usage by switching to extract_first() instead of extract(), which creates unnecessary lists.
I would also upgrade scrapy and lxml to the latest versions:
pip install --upgrade scrapy
pip install --upgrade lxml
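The behavioural difference is easy to see outside scrapy using only the standard library's ElementTree (a stand-in here, not scrapy's actual selector class): an extract()-style call materialises a list of every match, while an extract_first()-style call only pulls the first one.

```python
import xml.etree.ElementTree as ET

html = "<table><tr><td>a</td><td>b</td><td>c</td></tr></table>"
tree = ET.fromstring(html)

# extract()-style: build a full list of all matching texts
all_cells = [td.text for td in tree.findall(".//td")]  # ['a', 'b', 'c']

# extract_first()-style: stop at the first match, no intermediate list
first_cell = next((td.text for td in tree.iter("td")), None)  # 'a'
```

When you only ever use element zero, the intermediate lists are pure overhead, which is the point of the advice above.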

how to separate this text into a hash ruby

Sorry for my bad English, I'm new.
I have this document.txt:
paul gordon,jin kazama,1277,1268,21-12,21-19
yoshimistu,the rock,2020,2092,21-9,21-23,25-27
... lots more
I mean: how do I strip each line, on the comma separator, into a hash like this?
result = {
line_num: { name1: "paula wood", name2: "sarah carnley", m1: 1277, m2: 1268, sc1: 21, sc2: 12, sc3: 21, sc4: 19 }
}
I tried code like this
(I'm using text2re for the regex here)
doc = File.read("doc.txt")
lines = doc.split("\n")
counts = 0
example = {}
player1 = '((?:[a-z][a-z]+))(.)((?:[a-z][a-z]+))'
player2 = '((?:[a-z][a-z]+))(.)((?:[a-z][a-z]+))'
re = (player1 + player2 )
m = Regexp.new(re, Regexp::IGNORECASE)
lines.each do |line|
re1='((?:[a-z][a-z]+))' # Word 1
re2='(.)' # Any Single Character 1
re3='((?:[a-z][a-z]+))' # Word 2
re4='(.)' # Any Single Character 2
re5='((?:[a-z][a-z]+))' # Word 3
re6='(.)' # Any Single Character 3
re7='((?:[a-z][a-z]+))' # Word 4
re=(re1+re2+re3+re4+re5+re6+re7)
m=Regexp.new(re,Regexp::IGNORECASE);
if m.match(line)
word1=m.match(line)[1];
c1=m.match(line)[2];
word2=m.match(line)[3];
c2=m.match(line)[4];
word3=m.match(line)[5];
c3=m.match(line)[6];
word4=m.match(line)[7];
counts += 1
example[counts] = word1+word2
puts example
end
end
# (/[a-z].?/)
but the output does not match my expectation
1=>"", 2=>"indahdelika", 3=>"masam",
..more
Your data is comma-separated, so use the CSV class instead of trying to roll your own parser. There are dragons waiting for you if you try to split simply using commas.
I'd use:
require 'csv'
data = "paul gordon,jin kazama,1277,1268,21-12,21-19
yoshimistu,the rock,2020,2092,21-9,21-23,25-27
"
hash = {}
CSV.parse(data).each_with_index do |row, i|
name1, name2, m1, m2, sc1_2, sc3_4 = row
sc1, sc2 = sc1_2.split('-')
sc3, sc4 = sc3_4.split('-')
hash[i] = {
name1: name1,
name2: name2,
m1: m1,
m2: m2,
sc1: sc1,
sc2: sc2,
sc3: sc3,
sc4: sc4,
}
end
Which results in:
hash
# => {0=>
# {:name1=>"paul gordon",
# :name2=>"jin kazama",
# :m1=>"1277",
# :m2=>"1268",
# :sc1=>"21",
# :sc2=>"12",
# :sc3=>"21",
# :sc4=>"19"},
# 1=>
# {:name1=>"yoshimistu",
# :name2=>"the rock",
# :m1=>"2020",
# :m2=>"2092",
# :sc1=>"21",
# :sc2=>"9",
# :sc3=>"21",
# :sc4=>"23"}}
Since you're reading from a file, modify the above a bit using the "Reading from a file a line at a time" example in the documentation.
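Putting that file-reading suggestion together, a sketch of the line-at-a-time variant might look like this (the doc.txt filename is the question's; writing the sample data first and the generalized score handling are my additions, not part of the original answer):

```ruby
require 'csv'

# Write the question's sample data to a file, then read it back one line at a time.
File.write('doc.txt', "paul gordon,jin kazama,1277,1268,21-12,21-19\nyoshimistu,the rock,2020,2092,21-9,21-23,25-27\n")

hash = {}
CSV.foreach('doc.txt').with_index do |row, i|
  name1, name2, m1, m2, *scores = row
  # each remaining field is a "21-12" pair; flatten them into single numbers
  pairs = scores.flat_map { |s| s.split('-') }
  hash[i] = { name1: name1, name2: name2, m1: m1.to_i, m2: m2.to_i }
  pairs.each_with_index { |v, j| hash[i][:"sc#{j + 1}"] = v.to_i }
end

hash[0][:sc1]  # => 21
```

Because the score fields are splatted into an array, this also copes with the second line's extra 25-27 set without any special-casing.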
If the numerics need to be integers, tweak the hash definition to:
hash[i] = {
name1: name1,
name2: name2,
m1: m1.to_i,
m2: m2.to_i,
sc1: sc1.to_i,
sc2: sc2.to_i,
sc3: sc3.to_i,
sc4: sc4.to_i,
}
Which results in:
# => {0=>
# {:name1=>"paul gordon",
# :name2=>"jin kazama",
# :m1=>1277,
# :m2=>1268,
# :sc1=>21,
# :sc2=>12,
# :sc3=>21,
# :sc4=>19},
# 1=>
# {:name1=>"yoshimistu",
# :name2=>"the rock",
# :m1=>2020,
# :m2=>2092,
# :sc1=>21,
# :sc2=>9,
# :sc3=>21,
# :sc4=>23}}
# :sc4=>"23"}}
This is another way you could do it. I have made no assumptions about the number of items per line which are to be the values of :namex, :scx or :mx, or the order of those items.
Code
def hashify(str)
str.lines.each_with_index.with_object({}) { |(s,i),h| h[i] = inner_hash(s) }
end
def inner_hash(s)
n = m = sc = 0
s.split(',').each_with_object({}) do |f,g|
case f
when /[a-zA-Z].*/
g["name#{n += 1}".to_sym] = f
when /\-/
g["sc#{sc += 1}".to_sym], g["sc#{sc += 1}".to_sym] = f.split('-').map(&:to_i)
else
g["m#{m += 1}".to_sym] = f.to_i
end
end
end
Example
str = "paul gordon,jin kazama,1277,1268,21-12,21-19
yoshimistu,the rock,2020,2092,21-9,21-23,25-27"
hashify(str)
#=> {0=>{:name1=>"paul gordon", :name2=>"jin kazama",
# :m1=>1277, :m2=>1268,
# :sc1=>21, :sc2=>12, :sc3=>21, :sc4=>19},
# 1=>{:name1=>"yoshimistu", :name2=>"the rock",
# :m1=>2020, :m2=>2092,
# :sc1=>21, :sc2=>9, :sc3=>21, :sc4=>23, :sc5=>25, :sc6=>27}
# }

How to write code ruby to collect data while run loop condition

I am quite new to Ruby and I need your help.
I want to write Ruby code to collect some data while looping.
I have 2 pieces of code for this work.
My objective is to collect the sum of the scores from text that is split from an input file.
-First, run test_dialog.rb
-Second, change the input file to this format
from
AA:0.88:320:800|BB:0.82:1040:1330|CC:0.77:1330:1700 enquire-privilege_card
to
AA 0.88
BB 0.82
CC 0.77
-Then use each separated text to check against the dialog condition. If the data appears in the dialog, store its point, until the end of the text (AA --> BB --> CC)
-Finally, get the average score.
I have a problem with separating the text and collecting the points in the loop at the same time.
Please help.
Best regards.
PS.
A score is returned if the line matches the dialog.
The score of input line 1 should be ((0.88+0.82+0.77)/3) [match condition 1].
If there is no match, no score is returned.
Input data
AA:0.88:320:800|BB:0.82:1040:1330|CC:0.77:1330:1700 enquire-privilege_card
BB:0.88:320:800|EE:0.82:1040:1330|FF:0.77:1330:1700 enquire-privilege_card
EE:0.88:320:800|QQ:0.82:1040:1330|AA:0.77:1330:1700|RR:0.77:1330:1700|TT:0.77:1330:1700 enquire-privilege_card
test_dialog.rb
#!/usr/bin/env ruby
# encoding: UTF-8
#
# Input file:
# hyp(with confidence score), ref_tag
#
# Output:
# hyp, ref_tag, hyp_tag, result
#
require_relative 'dialog'
require_relative 'version'
unless ARGV.length > 0
puts 'Usage: ruby test_dialog.rb FILENAME [FILENAME2...]'
exit(1)
end
counter = Hash.new{|h,k| h[k]=Hash.new{|h2,k2| h2[k2]=Hash.new{|h3,k3| h3[k3]=0}}}
thresholds = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
puts %w(hyp ref_tag hyp_tag result).join("\t")
ARGV.each do |fname|
open(fname, 'r:UTF-8').each do |line|
hyp, ref_tag = line.strip.split(/\t/)
key = if ref_tag == "(reject)"
:reject
else
:accept
end
counter[fname][key][:all] += 1
thresholds.each do |threshold|
hyp_all = get_response_text(hyp, threshold)
hyp_tag = if hyp_all==:reject
"(reject)"
else
hyp_all.split(/,/)[1]
end
result = ref_tag==hyp_tag
counter[fname][key][threshold] += 1 if result
puts [hyp.split('|').map{|t| t.split(':')[0]}.join(' '),
ref_tag, hyp_tag, result].join("\t") if threshold==0.0
end
end
end
STDERR.puts ["Filename", "Result"].concat(thresholds).join("\t")
counter.each do |fname, c|
ca_all = c[:accept].delete(:all)
cr_all = c[:reject].delete(:all)
ca = thresholds.map{|t| c[:accept][t]}.map{|n| ca_all==0 ? "N/A" : '%4.1f' % (n.to_f/ca_all*100) }
cr = thresholds.map{|t| c[:reject][t]}.map{|n| cr_all==0 ? "N/A" : '%4.1f' % (n.to_f/cr_all*100) }
STDERR.puts [fname, "Correct Accept"].concat(ca).join("\t")
STDERR.puts [fname, "Correct Reject"].concat(cr).join("\t")
end
dialog.rb
# -*- coding: utf-8 -*-
#
# text : AA:0.88:320:800|BB:0.82:1040:1330|CC:0.77:1330:1700|DD:0.71:1700:2010|EE:1.00:2070:2390|FF:0.56:320:800|GG:0.12:1330:1700
#
def get_response_text text, threshold, dsr_session_id=nil
# ...
#p "result text >> " + text
# Promotion => detail => rate
# Promotion IR/IDD => high priority than enquire-promotion
# Rate IR/IDD => high priority than enquire-rate
# Problem IR/IDD => high priority than enquire-service_problem
# Internet IR/IDD => high priority than enquire-internet
# Cancel Net => enquire-internet NOT cancel-service
# Lost-Stolen => +Broken
memu = ""
intent = ""
prompt = ""
intent_th = ""
intent_id = ""
# strInput = text.gsub(/\s/,'')
strInput = text.split('|').map{|t| t.split(':')[0]}.join('')
puts ("****strINPUT*****")
puts strInput
scores = text.split('|').map{|t| t.split(':')[1].to_f}
puts ("****SCORE*****")
puts scores
avg_score = scores.inject(0){|a,x| a+=x} / scores.size
puts ("****AVG-Score*****")
puts avg_score
if avg_score < threshold
return :reject
end
# List of Country
country_fname = File.dirname(__FILE__)+"/country_list.txt"
country_list = open(country_fname, "r:UTF-8").readlines.map{|line| line.chomp}
contry_reg = Regexp.union(country_list)
# List of Mobile Type
mobile_fname = File.dirname(__FILE__)+"/mobile_list.txt"
mobile_list = open(mobile_fname, "r:UTF-8").readlines.map{|line| line.chomp}
mobile_reg = Regexp.union(mobile_list)
# List of Carrier
carrier_fname = File.dirname(__FILE__)+"/carrier_list.txt"
carrier_list = open(carrier_fname, "r:UTF-8").readlines.map{|line| line.chomp}
carrier_reg = Regexp.union(carrier_list)
if (strInput =~ /AA|BB/ and strInput =~ /CC/)
intent = "enquire-payment_method"
elsif (strInput =~ /EE/) and ("#{$'}" =~ /QQ|RR/)
intent = "enquire-balance_amount"
elsif (strInput =~ /AA|EE/i) and (strInput =~ /TT/i)
intent = "enquire-balance_unit"
elsif (strInput =~ /DD|BB|/i) and (strInput =~ /FF|AA/i)
intent = "service-balance_amount"
end
Parse as follows:
str = 'AA:0.88:320:800|BB:0.82:1040:1330|CC:0.77:1330:1700 enquire-privilege_card'
str.split( /[:|]/ ).select.with_index {| code, i | i % 4 < 2 ; }.join( ' ' )
# => "AA 0.88 BB 0.82 CC 0.77"
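For the averaging step the question describes (the line's score is the mean of its confidence values), the same splitting can feed a mean directly; avg_score here is a hypothetical helper of mine, not part of the original code:

```ruby
# Hypothetical helper (not in the original answer): average the confidence
# scores, i.e. the second field of each |-separated group in one input line.
def avg_score(line)
  scores = line.split(' ').first               # drop the trailing ref tag
               .split('|')
               .map { |group| group.split(':')[1].to_f }
  scores.sum / scores.size
end

line = 'AA:0.88:320:800|BB:0.82:1040:1330|CC:0.77:1330:1700 enquire-privilege_card'
avg_score(line).round(4)  # => 0.8233
```

This mirrors the avg_score computation already present in dialog.rb's get_response_text, just packaged per line.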

How to get a stopwatch program running?

I borrowed some code from a site, but I don't know how to get it to display.
class Stopwatch
  def start
    @accumulated = 0 unless @accumulated
    @elapsed = 0
    @start = Time.now
    @mybutton.configure('text' => 'Stop')
    @mybutton.command { stop }
    @timer.start
  end
  def stop
    @mybutton.configure('text' => 'Start')
    @mybutton.command { start }
    @timer.stop
    @accumulated += @elapsed
  end
  def reset
    stop
    @accumulated, @elapsed = 0, 0
    @mylabel.configure('text' => '00:00:00.00.000')
  end
  def tick
    @elapsed = Time.now - @start
    time = @accumulated + @elapsed
    h = sprintf('%02i', (time.to_i / 3600))
    m = sprintf('%02i', ((time.to_i % 3600) / 60))
    s = sprintf('%02i', (time.to_i % 60))
    mt = sprintf('%02i', ((time - time.to_i)*100).to_i)
    ms = sprintf('%04i', ((time - time.to_i)*10000).to_i)
    ms[0..0]=''
    newtime = "#{h}:#{m}:#{s}.#{mt}.#{ms}"
    @mylabel.configure('text' => newtime)
  end
end
How would I go about getting this running?
Thanks
Based upon the additional code rkneufeld posted, this class requires a timer that is specific to Tk. To do it on the console, you could just create a loop that calls tick over and over. Of course, you have to remove all the code that was related to the GUI:
class Stopwatch
  def start
    @accumulated = 0 unless @accumulated
    @elapsed = 0
    @start = Time.now
    # @mybutton.configure('text' => 'Stop')
    # @mybutton.command { stop }
    # @timer.start
  end
  def stop
    # @mybutton.configure('text' => 'Start')
    # @mybutton.command { start }
    # @timer.stop
    @accumulated += @elapsed
  end
  def reset
    stop
    @accumulated, @elapsed = 0, 0
    # @mylabel.configure('text' => '00:00:00.00.000')
  end
  def tick
    @elapsed = Time.now - @start
    time = @accumulated + @elapsed
    h = sprintf('%02i', (time.to_i / 3600))
    m = sprintf('%02i', ((time.to_i % 3600) / 60))
    s = sprintf('%02i', (time.to_i % 60))
    mt = sprintf('%02i', ((time - time.to_i)*100).to_i)
    ms = sprintf('%04i', ((time - time.to_i)*10000).to_i)
    ms[0..0]=''
    newtime = "#{h}:#{m}:#{s}.#{mt}.#{ms}"
    # @mylabel.configure('text' => newtime)
  end
end
watch = Stopwatch.new
watch.start
1000000.times do
puts watch.tick
end
You'll end up with output like this:
00:00:00.00.000
00:00:00.00.000
00:00:00.00.000
...
00:00:00.00.000
00:00:00.00.000
00:00:00.01.160
00:00:00.01.160
...
Not particularly useful, but there it is. Now, if you're looking to do something similar in Shoes, try this tutorial that is very similar.
I believe you have found the example on this site
I'm repeating what is already on the site but you are missing:
require 'tk'
as well as initialization code:
def initialize
  root = TkRoot.new { title 'Tk Stopwatch' }
  menu_spec = [
    [
      ['Program'],
      ['Start', lambda { start } ],
      ['Stop', lambda { stop } ],
      ['Exit', lambda { exit } ]
    ],
    [
      ['Reset'], ['Reset Stopwatch', lambda { reset } ]
    ]
  ]
  @menubar = TkMenubar.new(root, menu_spec, 'tearoff' => false)
  @menubar.pack('fill'=>'x', 'side'=>'top')
  @myfont = TkFont.new('size' => 16, 'weight' => 'bold')
  @mylabel = TkLabel.new(root)
  @mylabel.configure('text' => '00:00:00.0', 'font' => @myfont)
  @mylabel.pack('padx' => 10, 'pady' => 10)
  @mybutton = TkButton.new(root)
  @mybutton.configure('text' => 'Start')
  @mybutton.command { start }
  @mybutton.pack('side'=>'left', 'fill' => 'both')
  @timer = TkAfter.new(1, -1, proc { tick })
  Tk.mainloop
end
end
Stopwatch.new
I would suggest reading through the rest of the site to understand what is all going on.
I was searching for a quick and dirty stop watch class to avoid coding such and came upon the site where the original code was posted and this site as well.
In the end, I modified the code until it met what I think that I was originally searching for.
In case anyone is interested, the version that I have ended up with thus far is as follows (albeit that I have yet to apply it in the application that I am currently updating and for which I want to make use of such functionality).
# REFERENCES
# 1. http://stackoverflow.com/questions/858970/how-to-get-a-stopwatch-program-running
# 2. http://codeidol.com/other/rubyckbk/User-Interface/Creating-a-GUI-Application-with-Tk/
# 3. http://books.google.com.au/books?id=bJkznhZBG6gC&pg=PA806&lpg=PA806&dq=ruby+stopwatch+class&source=bl&ots=AlH2e7oWWJ&sig=KLFR-qvNfBfD8WMrUEbVqMbN_4o&hl=en&ei=WRjOTbbNNo2-uwOkiZGwCg&sa=X&oi=book_result&ct=result&resnum=8&ved=0CEsQ6AEwBw#v=onepage&q=ruby%20stopwatch%20class&f=false
# 4. http://4loc.wordpress.com/2008/09/24/formatting-dates-and-floats-in-ruby/
module Utilities
  class StopWatch
    def new()
      @watch_start_time = nil # Time (in seconds) when the stop watch was started (i.e. the start() method was called).
      @lap_start_time = nil   # Time (in seconds) when the current lap started.
    end #def new
    def start()
      myCurrentTime = Time.now() # Current time in (fractional) seconds since the Epoch (January 1, 1970 00:00 UTC)
      if (!running?) then
        @watch_start_time = myCurrentTime
        @lap_start_time = @watch_start_time
      end #if
      myCurrentTime - @watch_start_time
    end #def start
    def lap_time_seconds()
      myCurrentTime = Time.now()
      myLapTimeSeconds = myCurrentTime - @lap_start_time
      @lap_start_time = myCurrentTime
      myLapTimeSeconds
    end #def lap_time_seconds
    def stop()
      myTotalSecondsElapsed = Time.now() - @watch_start_time
      @watch_start_time = nil
      myTotalSecondsElapsed
    end #def stop
    def running?()
      !@watch_start_time.nil?
    end #def
  end #class StopWatch
end #module Utilities
def kill_time(aRepeatCount)
aRepeatCount.times do
#just killing time
end #do
end #def kill_time
elapsed_time_format_string = '%.3f'
myStopWatch = Utilities::StopWatch.new()
puts 'total time elapsed: ' + elapsed_time_format_string % myStopWatch.start() + ' seconds'
kill_time(10000000)
puts 'lap time: ' + elapsed_time_format_string % myStopWatch.lap_time_seconds() + ' seconds'
kill_time(20000000)
puts 'lap time: ' + elapsed_time_format_string % myStopWatch.lap_time_seconds() + ' seconds'
kill_time(30000000)
puts 'lap time: ' + elapsed_time_format_string % myStopWatch.lap_time_seconds() + ' seconds'
puts 'total time elapsed: ' + elapsed_time_format_string % myStopWatch.stop() + ' seconds'
Simple stopwatch script:
# pass the number of seconds as the parameter
seconds = eval(ARGV[0]).to_i
start_time = Time.now
loop do
elapsed = Time.now - start_time
print "\e[D" * 17
print "\033[K"
if elapsed > seconds
puts "Time's up!"
exit
end
print Time.at(seconds - elapsed).utc.strftime('%H:%M:%S.%3N')
sleep(0.05)
end
Run like this in your terminal (to mark a lap, just tap enter):
# 10 is the number of seconds
ruby script.rb 10
# you can even do this:
ruby script.rb "20*60" # 20 minutes
