How does limiting background jobs work? - bash

While looking into how to parallelize bash tasks, I stumbled over code like this:
for item in "${items[@]}"
do
    ((i=i%THREADS)); ((i++==0)) && wait
    process_item "$item" &
done
Where process_item is some kind of function/program that works with item, and the THREADS variable contains the maximum number of background processes that can run simultaneously.
Can someone explain to me how this works? I understand that i=i%THREADS ensures that i is between 0 and THREADS-1, and that i++==0 increments i and checks whether it is 0. But is wait bound to all subprocesses? Or how does it know that it has to wait until the previous batch has stopped processing?

It's an obfuscated way of writing
for item in "${items[@]}"
do
    # Every THREADSth job, stop and wait for everything
    # to complete.
    if (( i % THREADS == 0 )); then
        wait
    fi
    ((i++))
    process_item "$item" &
done
It also doesn't actually work terribly well. It doesn't ensure that there are always $THREADS jobs running, only that no more than $THREADS jobs are running at once.
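To illustrate the difference, here is a minimal sketch of one way to keep up to THREADS jobs running at all times instead of working in batches. It assumes bash 4.3 or newer (for `wait -n`), and `process_item`, `items`, and `run_all` are hypothetical stand-ins, not part of the original code.

```shell
#!/usr/bin/env bash
# Sketch: keep up to THREADS jobs running, starting a new one as soon as any job exits.
# Assumes bash 4.3 or newer, where `wait -n` is available.

THREADS=3
items=(a b c d e f g)

# Hypothetical stand-in for the real work.
process_item() {
    sleep 0.1
    echo "done $1"
}

run_all() {
    local item running=0
    for item in "${items[@]}"; do
        if (( running >= THREADS )); then
            wait -n                      # block until any one job exits
            running=$(( running - 1 ))
        fi
        process_item "$item" &
        running=$(( running + 1 ))
    done
    wait                                 # wait for the jobs still running
}

output=$(run_all)
echo "$output"
```

The order of the "done" lines is not deterministic, since the jobs finish independently of each other.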

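i++==0 checks and then increments, not the other way around. wait with no arguments waits for all currently active child processes. So whenever i has wrapped back to 0, which happens on every THREADS-th iteration, the loop first waits for the whole previous batch to finish and then launches a new process. On the very first iteration there are no child processes yet, so wait returns immediately.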
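To see when the one-liner actually calls wait, the following sketch replaces wait and the background job with entries in a trace array. The names `trace` and `n` and the value THREADS=3 are assumptions chosen for illustration.

```shell
#!/usr/bin/env bash
# Trace when the counter trick would call `wait`, without starting any jobs.
# Note: (( expr )) returns a non-zero status when expr evaluates to 0,
# so the original one-liner must not run under `set -e`.
set +e

THREADS=3
i=0
trace=()

for n in 1 2 3 4 5 6 7; do
    # Same counter logic as the original; `wait` is replaced by a trace entry.
    ((i=i%THREADS)); ((i++==0)) && trace+=("wait")
    trace+=("job$n")
done

echo "${trace[*]}"
# prints: wait job1 job2 job3 wait job4 job5 job6 wait job7
```

The first wait is a no-op because nothing has been started yet; each later wait marks the boundary between two batches of THREADS jobs.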

Related

Bash parallelization for double for loop

So I have a function that requests a REST API and takes two arguments: an instance and a date. I am given a list of instances and a range of dates, which need to be iterated over with two for loops. One constraint is that any given instance can only be requested once at a time.
I have tried using & and wait, and my pseudocode looks like this.
for each date:
    for each instance:
        do-something "$date" "$instance" &
    done
    wait
done
This actually works perfectly, since the loop only moves on to the next date when all instances have been processed, so no instance is ever requested twice at the same time.
The problem is that requests for certain instances take a long time, so the instances that finished earlier sit idle. How can I solve this problem?
Define a function which will process a given instance for each date sequentially:
for_each_date () {
    local instance=$1
    shift
    # The remaining arguments are the dates to process, in order.
    for d in "$@"; do
        some_command "$d" "$instance"
    done
}
Now, spawn a background process to run this function for each instance.
dates=(2015-07-21 2015-07-22 2015-07-23) # For example
instances=(inst1 inst2 inst3)
for instance in "${instances[@]}"; do
    for_each_date "$instance" "${dates[@]}" &
done
wait
Each background job runs some_command for a different instance and never runs more than one process at a time, so you meet your first constraint. At the same time, for_each_date starts a new request for its instance as soon as the old one completes, keeping your machine as busy as possible.
With GNU Parallel you would do:
parallel do-something ::: "${dates[@]}" ::: "${instances[@]}"

GUI interaction, waiting until the window is open

Is there a bash command that waits for a window to open? Right now I'm doing something along the lines of:
open-program
sleep 100 # Wait for the program to open
send-keyboard-input
Is there a way to have "send-keyboard-input" wait until open-program finishes, eliminating the sleep 100? The time always varies: sometimes it's 90 seconds, sometimes it's 50 seconds.
Have you tried this?
open-program && send-keyboard-input

Getting Thread not to run until join in ruby

I am getting into Ruby and have been using threads for a little while now without fully understanding them. I notice that when I add a thread to an array, and the thread's first command is sleep(), the thread does not run until I call join, which is mostly what I want. So I have 2 questions.
1. Is that supposed to happen?
2. Is there a better way to do that other than the way I'm doing it? Here is some sample code that shows what I'm talking about.
job = Array.new
10.times do |n|
  job << Thread.new do
    sleep 0.001
    puts "done #{n}"
  end
end
#job.each do |t|
#  t.join
#end
puts "End of script"
Output is
End of script
If I remove the comments output is
done 1
done 0
done 7
done 6
done 5
done 4
done 3
done 2
done 9
done 8
End of script
So I use this now but I don't understand why it does that. Sometimes I notice even doing something like `echo hi` instead of sleep does the trick.
Thanks in advance.
Thread timing isn't defined behavior. Once a thread goes to sleep, it is put in a queue to be run later, and you can't rely on it running at any particular moment.
Your main program doesn't take very long to run, so it is likely to finish before your other threads get picked back up to run again. When the main thread finishes, Ruby exits and kills any threads that are still sleeping, which is why you see no output from them. Really, when you think about it, 0.001 seconds is quite a long time to a computer, so spinning off 10 threads in that time is likely to happen. Even if it takes longer, there is no guarantee a thread will resume immediately after 0.001 seconds. Often there's really no guarantee it won't wake before 0.001 seconds either, but sleep calls usually don't end early.
When you add the join calls, the main thread blocks until each of the other threads has finished, which gives them time to run, so this behavior is expected.

Watir ... difference between sleep and wait

Is there any notable difference between
sleep 10
and
wait_until(10)
They both seem to do the same thing: wait 10 seconds then proceed to the next step
sleep just does nothing for the specified time. wait_until takes a block. It waits until the block evaluates to true or times out. If no block is given they act the same.
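The same distinction can be sketched in shell: a fixed sleep always takes the full time, while a polling helper returns as soon as its condition holds or gives up at the timeout. The `wait_until` function below is a hypothetical shell analogue written for illustration, not Watir's implementation, and the marker file stands in for whatever readiness condition you care about.

```shell
#!/usr/bin/env bash
# wait_until TIMEOUT CMD [ARGS...]
# Poll CMD every 0.1 s until it succeeds (return 0) or TIMEOUT seconds pass (return 1).
# Hypothetical helper; SECONDS has one-second granularity, so the timeout is approximate.
wait_until() {
    local timeout=$1
    shift
    local deadline=$(( SECONDS + timeout ))
    until "$@"; do
        if (( SECONDS >= deadline )); then
            return 1
        fi
        sleep 0.1
    done
    return 0
}

# Demo: a background job creates a marker file after 0.3 s.
tmpdir=$(mktemp -d)
marker="$tmpdir/ready"
( sleep 0.3; : > "$marker" ) &

if wait_until 5 test -e "$marker"; then
    result="ready"
else
    result="timed out"
fi
echo "$result"
wait
rm -rf "$tmpdir"
```

Here the helper returns after roughly 0.3 seconds, whereas `sleep 5` would always block for the full 5 seconds.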

How resque checks when to run a job?

I have found this Resque plugin:
https://github.com/elucid/resque-delayed
And I can see that I can schedule a delayed job. My question is, how does it check for delayed jobs? If I have 5000 delayed jobs scheduled a month from now, I hope it doesn't check all of them every 10 seconds.
So how is it being done?
It does not have to check all the delayed jobs. It maintains a sorted set in Redis, the jobs being sorted by their scheduled time. See the code at:
https://github.com/elucid/resque-delayed/blob/master/lib/resque-delayed/resque-delayed.rb
Each time the daemon awakes, only the first item of the set needs to be checked (using a ZRANGEBYSCORE command). The daemon fetches the relevant jobs one by one, until the polling query returns no result, then it sleeps again.
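The idea of checking only the head of a time-sorted set can be sketched without Redis. In the example below, a list of "score payload" lines sorted by score is a hypothetical in-memory stand-in for the sorted set, and `pop_due` is an illustrative name, not part of resque-delayed. Unlike the real daemon, this sketch only reads the due jobs and does not remove them.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the Redis sorted set: "score payload" lines,
# kept sorted by score (the scheduled time).
queue=$(printf '%s\n' "300 job-c" "100 job-a" "200 job-b" | sort -n)

# pop_due NOW: print the payloads whose score is <= NOW.
# Because the queue is sorted, the scan stops at the first job scheduled
# in the future and never looks at the rest.
pop_due() {
    local now=$1 score payload
    while read -r score payload; do
        if (( score > now )); then
            break
        fi
        echo "$payload"
    done <<< "$queue"
}

due=$(pop_due 250)
echo "$due"
```

With 5000 jobs scheduled far in the future, each poll would inspect only the first entry and stop.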
Performance could be further improved by fetching the jobs n by n. It could be implemented using a server-side Lua script as a polling query:
local res = redis.call('ZRANGEBYSCORE', KEYS[1], '-inf', ARGV[1], 'LIMIT', 0, 10)
if #res > 0 then
    redis.call('ZREMRANGEBYRANK', KEYS[1], 0, #res - 1)
    return res
else
    return false
end
In one round trip, this script gets up to 10 jobs (if available) and deletes them from the zset. That is much better than the 11 ZRANGEBYSCORE and 10 ZREM commands currently required by resque-delayed.
