I'm running some Ruby scripts concurrently using Grosser/Parallel.
During each concurrent test I want to add up the number of times a particular thing has happened, then display that number.
Let's say:
def main
$this_happened = 0
do_this_in_parallel
puts $this_happened
end
def do_this_in_parallel
Parallel.each(...) {
$this_happened += 1
}
end
The final value after do_this_in_parallel has finished will always be 0
I'd like to know why this happens.
How can I get the desired result which would be $this_happenend > 0?
Thanks.
This doesn't work because separate processes have separate memory spaces: setting variables etc in one process has no effect on what happens in the other process.
However you can return a result from your block (because under the hood parallel sets up pipes so that the processes can be fed input/return results). For example you could do this
counts = Parallel.map(...) do
#the return value of the block should
#be the number of times the event occurred
end
Then just sum the counts to get your total count (eg counts.reduce(:+)). You might also want to read up on map-reduce for more information about this way of parallelising work
I have never used parallel but the documentation seems to suggest that something like this might work.
Parallel.each(..., :finish => lambda {|*_| $this_happened += 1}) { do_work }
Related
I have a Ruby Discord (discordrb) bot written to manage D&D characters. I notice when multiple players submit the same command, at the same time, the results they each receive are not independent. The request of one player (assigning a weapon to their character) ends up being assigned to other characters who submitted the same request at the same time. I expected each request to be executed separately, in sequence. How do I prevent crossing requests?
bot.message(contains:"$Wset") do |event|
inputStr = event.content; # this should contain "$Wset#" where # is a single digit
check_user_or_nick(event); pIndex = nil; #fetch the value of #user & set pIndex
(0..(#player.length-1)).each do |y| #find the #player pIndex within the array using 5 char of #user
if (#player[y][0].index(#user.slice(0,5)) == 0) then pIndex = y; end; #finds player Index Value (integer or nil)
end;
weaponInt = Integer(inputStr.slice(5,1)) rescue false; #will detect integer or non integer input
if (pIndex != nil) && (weaponInt != false) then;
if weaponInt < 6 then;
#player[pIndex][1]=weaponInt;
say = #player[pIndex][0].to_s + " weapon damage has be set to " + #weapon[(#player[pIndex][1])].to_s;
else;
say = "Sorry, $Wset requires this format: $Wset? where ? is a single number ( 0 to 5 )";
end;
else
say = "Sorry, $Wset requires this format: $Wset? where ? is a single number ( 0 to 5 )";
end;
event.respond say;
end;
To avoid race conditions in multithreaded code like this, the main thing you want to look for are side effects.
Think about the bot.message(contains:"$Wset") do |event| block as a mini program or a thread. Everything in here should be self contained - there should be no way for it to effect any other threads.
Looking through your code initially, what I'm searching for are any shared variables. These produce a race condition if they are read/written by multiple threads at the same time.
In this case, there are 2 obvious offenders - #player and #user. These should be refactored to local variables rather than instance variables. Define them within the block so they don't affect any other scope, for example:
# Note, for this to work, you will have to change
# the method definition to return [player, user]
player, user = check_user_or_nick(event)
Sometimes, making side effects from threads is unavoidable (say you wanted to make a counter for how many times the thread was run). To prevent race conditions in these scenarios, a Mutex is generally the solution but also sometimes a distributed lock if the code is being run on multiple machines. However, from the code you've shown, it doesn't look like you need either of these things here.
let intArray = [5]
intArray.allSatisfy{$0<0} //false, of course, but why 2 times?
There is no more operators, as far as I can see...
It's not telling you how many times the block was executed, but rather how many outputs were called on that line. Since the block returns false and the function allSatisfy returns false, that's 2 outputs in 1 line. You'll notice that no matter the size of the array you get the same value, and if you expand the code, i.e.
intArray.allSatisfy {
$0 < 0
}
you don't see 2 times.
Playgrounds doesn't seem to give a counter for executions of closures/functions passed as parameters, rather only for regular for-loops instead.
So I'm trying to iterate over the list of partitions of something, say 1:n for some n between 13 and 21. The code that I ideally want to run looks something like this:
valid_num = #parallel (+) for p in partitions(1:n)
int(is_valid(p))
end
println(valid_num)
This would use the #parallel for to map-reduce my problem. For example, compare this to the example in the Julia documentation:
nheads = #parallel (+) for i=1:200000000
Int(rand(Bool))
end
However, if I try my adaptation of the loop, I get the following error:
ERROR: `getindex` has no method matching getindex(::SetPartitions{UnitRange{Int64}}, ::Int64)
in anonymous at no file:1433
in anonymous at multi.jl:1279
in run_work_thunk at multi.jl:621
in run_work_thunk at multi.jl:630
in anonymous at task.jl:6
which I think is because I am trying to iterate over something that is not of the form 1:n (EDIT: I think it's because you cannot call p[3] if p=partitions(1:n)).
I've tried using pmap to solve this, but because the number of partitions can get really big, really quickly (there are more than 2.5 million partitions of 1:13, and when I get to 1:21 things will be huge), constructing such a large array becomes an issue. I left it running over night and it still didn't finish.
Does anyone have any advice for how I can efficiently do this in Julia? I have access to a ~30 core computer and my task seems easily parallelizable, so I would be really grateful if anyone knows a good way to do this in Julia.
Thank you so much!
The below code gives 511, the number of partitions of size 2 of a set of 10.
using Iterators
s = [1,2,3,4,5,6,7,8,9,10]
is_valid(p) = length(p)==2
valid_num = #parallel (+) for i = 1:30
sum(map(is_valid, takenth(chain(1:29,drop(partitions(s), i-1)), 30)))
end
This solution combines the takenth, drop, and chain iterators to get the same effect as the take_every iterator below under PREVIOUS ANSWER. Note that in this solution, every process must compute every partition. However, because each process uses a different argument to drop, no two processes will ever call is_valid on the same partition.
Unless you want to do a lot of math to figure out how to actually skip partitions, there is no way to avoid computing partitions sequentially on at least one process. I think Simon's answer does this on one process and distributes the partitions. Mine asks each worker process to compute the partitions itself, which means the computation is being duplicated. However, it is being duplicated in parallel, which (if you actually have 30 processors) will not cost you time.
Here is a resource on how iterators over partitions are actually computed: http://www.informatik.uni-ulm.de/ni/Lehre/WS03/DMM/Software/partitions.pdf.
PREVIOUS ANSWER (More complicated than necessary)
I noticed Simon's answer while writing mine. Our solutions seem similar to me, except mine uses iterators to avoid storing partitions in memory. I'm not sure which would actually be faster for what size sets, but I figure it's good to have both options. Assuming it takes you significantly longer to compute is_valid than to compute the partitions themselves, you can do something like this:
s = [1,2,3,4]
is_valid(p) = length(p)==2
valid_num = #parallel (+) for i = 1:30
foldl((x,y)->(x + int(is_valid(y))), 0, take_every(partitions(s), i-1, 30))
end
which gives me 7, the number of partitions of size 2 for a set of 4. The take_every function returns an iterator that returns every 30th partition starting with the ith. Here is the code for that:
import Base: start, done, next
immutable TakeEvery{Itr}
itr::Itr
start::Any
value::Any
flag::Bool
skip::Int64
end
function take_every(itr, offset, skip)
value, state = Nothing, start(itr)
for i = 1:(offset+1)
if done(itr, state)
return TakeEvery(itr, state, value, false, skip)
end
value, state = next(itr, state)
end
if done(itr, state)
TakeEvery(itr, state, value, true, skip)
else
TakeEvery(itr, state, value, false, skip)
end
end
function start{Itr}(itr::TakeEvery{Itr})
itr.value, itr.start, itr.flag
end
function next{Itr}(itr::TakeEvery{Itr}, state)
value, state_, flag = state
for i=1:itr.skip
if done(itr.itr, state_)
return state[1], (value, state_, false)
end
value, state_ = next(itr.itr, state_)
end
if done(itr.itr, state_)
state[1], (value, state_, !flag)
else
state[1], (value, state_, false)
end
end
function done{Itr}(itr::TakeEvery{Itr}, state)
done(itr.itr, state[2]) && !state[3]
end
One approach would be to divide the problem up into pieces that are not too big to realize and then process the items within each piece in parallel, e.g. as follows:
function my_take(iter,state,n)
i = n
arr = Array[]
while !done(iter,state) && (i>0)
a,state = next(iter,state)
push!(arr,a)
i = i-1
end
return arr, state
end
function get_part(npart,npar)
valid_num = 0
p = partitions(1:npart)
s = start(p)
while !done(p,s)
arr,s = my_take(p,s,npar)
valid_num += #parallel (+) for a in arr
length(a)
end
end
return valid_num
end
valid_num = #time get_part(10,30)
I was going to use the take() method to realize up to npar items from the iterator but take() appears to be deprecated so I've included my own implementation which I've called my_take(). The getPart() function therefore uses my_take() to obtain up to npar partitions at a time and carry out a calculation on them. In this case, the calculation just adds up their lengths, because I don't have the code for the OP's is_valid() function. get_part() then returns the result.
Because the length() calculation isn't very time-consuming, this code is actually slower when run on parallel processors than it is on a single processor:
$ julia -p 1 parpart.jl
elapsed time: 10.708567515 seconds (373025568 bytes allocated, 6.79% gc time)
$ julia -p 2 parpart.jl
elapsed time: 15.70633439 seconds (548394872 bytes allocated, 9.14% gc time)
Alternatively, pmap() could be used on each piece of the problem instead of the parallel for loop.
With respect to the memory issue, realizing 30 items from partitions(1:10) took nearly 1 gigabyte of memory on my PC when I ran Julia with 4 worker processes so I expect realizing even a small subset of partitions(1:21) will require a great deal of memory. It may be desirable to estimate how much memory would be needed to see if it would be at all possible before trying such a computation.
With respect to the computation time, note that:
julia> length(partitions(1:10))
115975
julia> length(partitions(1:21))
474869816156751
... so even efficient parallel processing on 30 cores might not be enough to make the larger problem solvable in a reasonable time.
Before anything, I have read all the answers of Why doesn't Ruby support i++ or i—? and understood why. Please note that this is not just another discussion topic about whether to have it or not.
What I'm really after is a more elegant solution for the situation that made me wonder and research about ++/-- in Ruby. I've looked up loops, each, each_with_index and things alike but I couldn't find a better solution for this specific situation.
Less talk, more code:
# Does the first request to Zendesk API, fetching *first page* of results
all_tickets = zd_client.tickets.incremental_export(1384974614)
# Initialises counter variable (please don't kill me for this, still learning! :D )
counter = 1
# Loops result pages
loop do
# Loops each ticket on the paged result
all_tickets.all do |ticket, page_number|
# For debug purposes only, I want to see an incremental by each ticket
p "#{counter} P#{page_number} #{ticket.id} - #{ticket.created_at} | #{ticket.subject}"
counter += 1
end
# Fetches next page, if any
all_tickets.next unless all_tickets.last_page?
# Breaks outer loop if last_page?
break if all_tickets.last_page?
end
For now, I need counter for debug purposes only - it's not a big deal at all - but my curiosity typed this question itself: is there a better (more beautiful, more elegant) solution for this? Having a whole line just for counter += 1 seems pretty dull. Just as an example, having "#{counter++}" when printing the string would be much simpler (for readability sake, at least).
I can't simply use .each's index because it's a nested loop, and it would reset at each page (outer loop).
Any thoughts?
BTW: This question has nothing to do with Zendesk API whatsoever. I've just used it to better illustrate my situation.
To me, counter += 1 is a fine way to express incrementing the counter.
You can start your counter at 0 and then get the effect you wanted by writing:
p "#{counter += 1} ..."
But I generally wouldn't recommend this because people do not expect side effects like changing a variable to happen inside string interpolation.
If you are looking for something more elegant, you should make an Enumerator that returns integers one at a time, each time you call next on the enumerator.
nums = Enumerator.new do |y|
c = 0
y << (c += 1) while true
end
nums.next # => 1
nums.next # => 2
nums.next # => 3
Instead of using Enumerator.new in the code above, you could just write:
nums = 1.upto(Float::INFINITY)
As mentioned by B Seven each_with_index will work, but you can keep the page_number, as long all_tickets is a container of tuples as it must be to be working right now.
all_tickets.each_with_index do |ticket, page_number, i|
#stuff
end
Where i is the index. If you have more than ticket and page_number inside each element of all_tickets you continue putting them, just remember that the index is the extra one and shall stay in the end.
Could be I oversimplified your example but you could calculate a counter from your inner and outer range like this.
all_tickets = *(1..10)
inner_limit = all_tickets.size
outer_limit = 5000
1.upto(outer_limit) do |outer_counter|
all_tickets.each_with_index do |ticket, inner_counter|
p [(outer_counter*inner_limit)+inner_counter, outer_counter, inner_counter, ticket]
end
# some conditional to break out, in your case the last_page? method
break if outer_counter > 3
end
all_tickets.each_with_index(1) do |ticket, i|
I'm not sure where page_number is coming from...
See Ruby Docs.
I know this code is not optimal, any ideas on how to improve it?
job_and_cost_code_found = false
timberline_db['SELECT Job, Cost_Code FROM [JCM_MASTER__COST_CODE] WHERE [Job] = ? AND [Cost_Code] = ?', job, clean_cost_code].each do |row|
job_and_cost_code_found = true
end
if job_and_cost_code_found == false then
info = linenum + "," + id + ",,Employees default job and cost code do not exist in timberline. job:#{job} cost code:#{clean_cost_code}"
add_to_exception_output_file(info)
end
You're breaking a lot of simple rules here.
Don't select what you don't use.
You select a number of columns, then completely ignore the result data. What you probably want is a count:
SELECT COUNT(*) AS cost_code_count FROM [JCM_MASTER__COST_CODE] WHERE [Job] = ? AND [Cost_Code] = ?'
Then you'll get one row that will have either a zero or non-zero value in it. Save this into a variable like:
job_and_cost_codes_found = timberline_db[...][0]['cost_code_count']
Don't compare against false unless you need to differentiate between that and nil
In Ruby only two things evaluate as false, nil and false. Most of the time you will not be concerned about the difference. On rare occasions you might want to have different logic for set true, set false or not set (nil), and only then would you test so specifically.
However, keep in mind that 0 is not a false value, so you will need to compare against that.
Taking into account the previous optimization, your if could be:
if job_and_cost_codes_found == 0
# ...
end
Don't use then or other bits of redundant syntax
Most Ruby style-guides spurn useless syntax like then, just as they recommend avoiding for and instead use the Enumerable class which is far more flexible.
Manipulate data, not strings
You're assembling some kind of CSV-like line in the end there. Ideally you'd be using the built-in CSV library to do the correct encoding, and libraries like that want data, not a string they'd have to parse.
One step closer to that is this:
line = [
linenum,
id,
nil,
"Employees default job and cost code do not exist in timberline. job:#{job} cost code:#{clean_cost_code}"
].join(',')
add_to_exception_output_file(line)
You'd presumably replace join(',') with the proper CSV encoding method that applies here. The library is more efficient when you can compile all of the data ahead of time into an array-of-arrays, so I'd recommend doing that if this is the end goal.
For example:
lines = [ ]
# ...
if (...)
# Append an array to the lines to write to the CSV file.
lines << [ ... ]
end
Keep your data in a standard structure like an Array, a Hash, or a custom object, until you're prepared to commit it to its final formatted or encoded form. That way you can perform additional operations on it if you need to do things like filtering.
It's hard to refactor this when I'm not exactly sure what it's supposed to be doing, but assuming that you want to log an error when there's no entry matching a job & code pair, here's what I've come up with:
def fetch_by_job_and_cost_code(job, cost_code)
timberline_db['SELECT Job, Cost_Code FROM [JCM_MASTER__COST_CODE] WHERE [Job] = ? AND [Cost_Code] = ?', job, cost_code]
end
if fetch_by_job_and_cost_code(job, clean_cost_code).none?
add_to_exception_output_file "#{linenum},#{id},,Employees default job and cost code do not exist in timberline. job:#{job} cost code:#{clean_cost_code}"
end