Ruby Threads: progress bar during copy - ruby

I'm pretty new to Ruby Threads, so could someone let me know what I'm doing wrong here?
require 'fileutils'
require 'zip'
require 'rubygems'
require 'progressbar'
oraclePath = "\\\\server\\Oracle Client\\Oracle_11gR2\\win64_11gR2_client.zip"
begin
tmpDir = Dir.mktmpdir("ora-")
progress = Thread.new(){
Thread.current[:name] = "FileProgress"
sourceFileSize = File.size("#{oraclePath}")
batch_bytes = ( in_size / 100 ).ceil
total = 0
p_bar = ProgressBar.new('Copying', 100)
buffer = "#{oraclePath}".sysread(batch_bytes)
while total < sourceFileSize do
"#{tmpDir}".syswrite(buffer)
p_bar.inc
total += batch_bytes
if (sourceFileSize - total) < batch_bytes
batch_bytes = (sourceFileSize - total)
end
buffer = "#{oraclePath}".sysread(batch_bytes)
end
p_bar.finish
}
progress.run
puts "#{tmpDir}"
FileUtils.cp_r("#{oraclePath}","#{tmpDir}")
Zip::File.open("#{tmpDir}/win64_11gR2_client.zip") do |zipfile|
`unzip -j #{zipfile} -d #{dir}`
#zipfile.each do |file|
#zipfile.extract(file, "#{tmpDir}")
#end
end
ensure
# remove the temp directories
FileUtils.remove_entry_secure tmpDir
end
The copying works, but the thread doesn't - I can't even step into it; it just skips it entirely.

A Ruby Thread will start running the moment it's instantiated with Thread.new, so in your example the copy begins immediately and the line progress.run isn't necessary.
You would only need to call run if the thread itself had stopped (i.e. called stop on itself while waiting for further instructions).
For reference, you can find more information here: http://ruby-doc.org/core-2.0.0/Thread.html
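To make the point about Thread.new concrete, here is a minimal sketch of a chunked copy that reports progress from a thread. It is not the asker's script: copy_with_progress, the 64 KB chunk size, and the temporary file names are hypothetical, and it prints a percentage instead of using the progressbar gem. It is also worth checking (an assumption about the posted code, not something the answer states) whether the thread body raises right after it starts, since in_size is never defined and String has no sysread; on older Rubies an exception inside a thread goes unreported unless the thread is joined.

```ruby
require 'tmpdir'

# Copy src to dst in chunks and yield (bytes_copied, total_bytes) after each chunk.
# Hypothetical helper, not part of the original question.
def copy_with_progress(src, dst, chunk_size = 64 * 1024)
  total = File.size(src)
  copied = 0
  File.open(src, 'rb') do |input|
    File.open(dst, 'wb') do |output|
      # IO#read returns nil at end of file, which ends the loop.
      while (chunk = input.read(chunk_size))
        output.write(chunk)
        copied += chunk.bytesize
        yield copied, total if block_given?
      end
    end
  end
  copied
end

Dir.mktmpdir('copy-demo-') do |dir|
  src = File.join(dir, 'source.bin')
  dst = File.join(dir, 'target.bin')
  File.binwrite(src, 'x' * 300_000)

  # Thread.new starts running the block immediately; no call to #run is needed.
  worker = Thread.new do
    copy_with_progress(src, dst) do |done, total|
      print "\rCopying: #{done * 100 / total}%"
    end
  end

  # Thread#value joins the thread and re-raises any exception it hit.
  copied = worker.value
  puts "\nCopied #{copied} bytes"
end
```

Doing the copy inside the progress loop (rather than calling FileUtils.cp_r separately) means the percentage reflects bytes actually written.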

Related

How to check if Dir size changes during Watir wait.until

I have a method that waits for a chrome download to start, using Watir. However, I'd like to simplify and respecify this to the point where it simply checks if the directory size increases. I'm assuming this is going to require me to save the directory's size at the beginning of the block, and then wait for the Dir size to be equal to that number + 1.
def wait_for_download
dl_dir = Dir["#{Dir.pwd}/downloads/*"].to_s
Watir::Wait.until { !dl_dir.include?(".crdownload") }
end
This is just a couple of functions you can add in your initializers or whatever.
def get_file_size_in_mb(path)
File.size(path).to_f / (1024.0 * 1024.0) # bytes -> megabytes
end
def find_all_files_inside(folder_path)
Dir.glob("#{folder_path}/**/*")
end
def calculate_size_of_folder_contents(folder_path)
mb = 0.0
find_all_files_inside(folder_path).each do |fn|
mb += get_file_size_in_mb(fn)
end# ^ could have used `inject` here
mb
end
def wait_until_folder_size_changes(folder_path, seconds=2)
while true
size0 = calculate_size_of_folder_contents(folder_path)
sleep seconds
size1 = calculate_size_of_folder_contents(folder_path)
break if (size1-size0) > 0
end
end
Haven't tested, but seems functionally sound
You could also easily monkey-patch this into Watir itself.
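A variant of the same idea with a timeout, so the wait cannot loop forever if nothing is ever downloaded. This is a sketch only: folder_size_bytes, wait_until_folder_grows, and the timeout and interval defaults are hypothetical names and values, and Array#sum assumes Ruby 2.4 or later.

```ruby
require 'tmpdir'

# Total size in bytes of all regular files under folder_path.
def folder_size_bytes(folder_path)
  Dir.glob(File.join(folder_path, '**', '*'))
     .select { |path| File.file?(path) }
     # File.size? returns nil if the file vanished in the meantime (e.g. a renamed partial download).
     .sum { |path| File.size?(path).to_i }
end

# Returns true as soon as the folder grows, false if the timeout expires first.
def wait_until_folder_grows(folder_path, timeout: 10, interval: 0.2)
  initial = folder_size_bytes(folder_path)
  deadline = Time.now + timeout
  while Time.now < deadline
    return true if folder_size_bytes(folder_path) > initial
    sleep interval
  end
  false
end
```

Calling wait_until_folder_grows("#{Dir.pwd}/downloads") right after triggering the download would then return true once any bytes arrive.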

Ruby Thread & Mutex: why does my code fail to fetch JSON in sequence?

I wrote a crawler which uses 8 threads to download JSON from the Internet:
#encoding: utf-8
require 'net/http'
require 'sqlite3'
require 'zlib'
require 'json'
require 'thread'
$mutex = Mutex.new # Lock of database and $cnt
$cntMutex = Mutex.new # Lock of $threadCnt
$threadCnt = 0 # number of running threads
$cnt = 0 # number of lines in this COMMIT to database
db = SQLite3::Database.new "price.db"
db.results_as_hash = true
STDOUT.sync = true
start = 10000000
def fetch(http, url, timeout = 10)
# ...
end
def parsePrice( i, db)
ss = fetch(Net::HTTP.start('p.3.cn',80), 'http://p.3.cn/prices/get?skuid=J_'+i.to_s)
doc = JSON.parse(ss)[0]
puts "processing "+i.to_s
STDOUT.flush
begin
$mutex.synchronize {
$cnt = $cnt+1
db.execute("insert into prices (id, price) VALUES (?,?)", [i,doc["p"].to_f])
if $cnt > 20
db.execute('COMMIT')
db.execute('BEGIN')
$cnt = 0
end
}
rescue SQLite3::ConstraintException
warn("duplicate id: "+i.to_s)
$cntMutex.synchronize {
$threadCnt -= 1;
}
Thread.terminate
rescue NoMethodError
warn("Matching failed")
rescue
raise
ensure
end
$cntMutex.synchronize {
$threadCnt -= 1;
}
end
puts "will now start from " + start.to_s()
db.execute("BEGIN")
Thread.new {
for ii in start..12000000 do
sleep 0.1 while $threadCnt > 7
$cntMutex.synchronize {
$threadCnt += 1;
}
Thread.new {
parsePrice( ii, db)
}
end
db.execute('COMMIT')
} . join
Then I created a database named price.db:
sqlite3 > create table prices (id INT PRIMARY KEY, price REAL);
To make my code thread-safe, db, $cnt, $threadCnt are all protected by $mutex or $cntMutex.
However, when I tried to run this script, the following messages were printed:
[lz@lz crawl]$ ruby priceCrawler.rb
will now start from 10000000
http://p.3.cn/prices/get?skuid=J_10000008http://p.3.cn/prices/get?skuid=J_10000008
http://p.3.cn/prices/get?skuid=J_10000008http://p.3.cn/prices/get?skuid=J_10000002http://p.3.cn/prices/get?skuid=J_10000008
http://p.3.cn/prices/get?skuid=J_10000008
http://p.3.cn/prices/get?skuid=J_10000002http://p.3.cn/prices/get?skuid=J_10000002
processing 10000002
processing 10000002processing 10000008processing 10000008processing 10000002
duplicate id: 10000002
duplicate id: 10000002processing 10000008
processing 10000008duplicate id: 10000008
duplicate id: 10000008processing 10000008
duplicate id: 10000008
It seems that this script skipped some ids and called parsePrice with the same id more than once.
So why did this error occur? Any help would be appreciated.
It seems to me that your thread scheduling is wrong. I have modified your code to illustrate some possible race conditions you were triggering.
require 'net/http'
require 'sqlite3'
require 'zlib'
require 'json'
require 'thread'
$mutex = Mutex.new # Lock of database and $cnt
$cntMutex = Mutex.new # Lock of $threadCnt
$threadCnt = 0 # number of running threads
$cnt = 0 # number of lines in this COMMIT to database
db = SQLite3::Database.new "price.db"
db.results_as_hash = true
STDOUT.sync = true
start = 10000000
def fetch(http, url, timeout = 10)
# ...
end
def parsePrice(i, db)
must_terminate = false
ss = fetch(Net::HTTP.start('p.3.cn',80), "http://p.3.cn/prices/get?skuid=J_#{i}")
doc = JSON.parse(ss)[0]
puts "processing #{i}"
STDOUT.flush
begin
$mutex.synchronize {
$cnt = $cnt+1
db.execute("insert into prices (id, price) VALUES (?,?)", [i,doc["p"].to_f])
if $cnt > 20
db.execute('COMMIT')
db.execute('BEGIN')
$cnt = 0
end
}
rescue SQLite3::ConstraintException
warn("duplicate id: #{i}")
must_terminate = true
rescue NoMethodError
warn("Matching failed")
rescue
# Raising here does not prevent ensure from running.
# It will raise after we decrement $threadCnt on
# ensure clause.
raise
ensure
$cntMutex.synchronize {
$threadCnt -= 1;
}
end
# Thread.exit ends the current thread; Thread has no class-level terminate method.
Thread.exit if must_terminate
end
puts "will now start from #{start}"
# This begin makes no sense to me.
db.execute("BEGIN")
for ii in start..12000000 do
should_redo = false
# Instead of sleeping, we acquire the lock and check
# if we can create another thread. If we can't, we just
# release the lock and retry later (using for-redo).
$cntMutex.synchronize{
if $threadCnt <= 7
$threadCnt += 1;
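# Pass ii as an argument: a `for` variable is shared by every block, so a thread could otherwise see a later value.
Thread.new(ii) { |id| parsePrice(id, db) }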
else
# We use this flag since we don't know for sure redo's
# behavior inside a lock.
should_redo = true
end
}
# Will redo this iteration if we can't create the thread.
if should_redo
# Mitigate busy waiting a bit.
sleep(0.1)
redo
end
end
# This commit makes no sense to me.
db.execute('COMMIT')
# Join every other thread; the current thread cannot join itself.
(Thread.list - [Thread.current]).each { |t| t.join }
Also, most databases already implement locking themselves, so you can probably remove the mutex that guards the database. Another piece of advice is to be more consistent with your commits: the code has BEGINs and COMMITs scattered throughout. I suggest that you either perform each operation and commit it immediately, or collect rows in a buffer and commit them all in a single place.
As for the race condition, it seems you were not being careful enough when dealing with $threadCnt. The implementation I gave you makes more sense to me, but I have not tested it.
The redo in the main loop is a form of busy waiting, which is bad for performance, so you should keep a sleep call there. It is essential, however, that both the check and the update of $threadCnt happen inside the lock. The way you implemented it before did not ensure that the check and the update formed one atomic operation.
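The two points above, checking and incrementing the counter under one lock and giving each thread its own copy of the loop value, can be sketched without the network or database parts. Everything here is illustrative: MAX_WORKERS, the 1..40 range, and the sleep standing in for the HTTP fetch are assumptions, not values from the original script. In a `for` loop the variable is shared by every block created inside it, which may also explain the repeated ids in the output, although that is a guess.

```ruby
MAX_WORKERS = 8            # hypothetical limit, mirroring the "8 threads" in the question
count_lock = Mutex.new
running = 0                # number of workers currently alive
peak = 0                   # highest value `running` ever reached
results = Queue.new        # thread-safe collection of processed ids
workers = []

for id in 1..40
  loop do
    started = count_lock.synchronize do
      if running < MAX_WORKERS
        # Check and increment happen under the same lock, so they are atomic.
        running += 1
        peak = [peak, running].max
        # Thread.new(id) copies the current value into the block parameter.
        workers << Thread.new(id) do |my_id|
          begin
            sleep 0.01     # stand-in for the HTTP fetch
            results << my_id
          ensure
            count_lock.synchronize { running -= 1 }
          end
        end
        true
      else
        false
      end
    end
    break if started
    sleep 0.01             # back off briefly instead of spinning
  end
end

workers.each(&:join)
processed = []
processed << results.pop until results.empty?
puts "processed #{processed.size} ids, peak concurrency #{peak}"
```

Keeping the threads in a workers array also avoids Thread.list, which includes the main thread.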

Ruby GC::Profiler no output

I'm running a ruby script and trying to see the GC stats on it, but the output is just empty string. Here are the contents of my script:
class NumberPool
...
attr_accessor :sets
def initialize
@sets = []
end
def allocate
allocated_number = Random.rand(min_bound..max_bound)
sets.each do |set|
next unless set.range.include?(allocated_number)
return set.range.delete(allocated_number)
end
factor = allocated_number / batch_size
min = factor * batch_size
max = min + batch_size
sub = SubPool.new(min, max)
sub.range.delete(allocated_number)
sets.push(sub)
allocated_number
end
...
def run_test
GC::Profiler.enable
a = NumberPool.new
p a.allocate
GC::Profiler.report
end
puts run_test
When I run this, the output is:
$ ruby number_pool.rb
1855532
I expected to see something from the GC report in standard out.
This is a guess, but maybe GC hasn't triggered (no need to collect garbage yet because plenty of free memory).
See what happens if you force GC by adding GC.start (modify code like so):
p a.allocate
GC.start
GC::Profiler.report
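A self-contained sketch of that suggestion, independent of the NumberPool class (the string allocations are just filler to create garbage). It uses GC::Profiler.result, which returns the report as a String, rather than GC::Profiler.report, which writes to standard output and returns nil; that return value is why `puts run_test` in the question can only add a blank line.

```ruby
GC::Profiler.enable

# Allocate some short-lived objects so there is garbage to collect.
100_000.times { 'garbage' * 10 }

# Force a collection so the profiler has at least one run to record.
GC.start

report = GC::Profiler.result   # the same text GC::Profiler.report would print
GC::Profiler.disable

puts report
```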

Webdriver in Ruby, how to run a test case x times

I have a Webdriver test case written in Ruby as below, and I want to make it run for example 100 times:
require 'rubygems'
require 'selenium-webdriver'
$i = 0
$num = 100
while $i<$num do
driver = Selenium::WebDriver.for :firefox
driver.get "https://dev08-olb.nz.thenational.com/ib4b/app/login"
# some other test which require $i to be incremental as unique ID
driver.close
i++
end
It doesn't like it.
Can you show me how to execute it as many time as I want?
Thanks.
Not sure what you are trying to do here, but try this:
num = 100
num.times do |i|
# INSERT YOUR BLOCK TO RUN HERE; i runs from 0 to num - 1
end
If you want to specify a different number of times each time it is called, I would create a method:
def run_times(n)
n.times do |i|
#INSERT YOUR BLOCK TO RUN HERE
end
end
Then you can call it with run_times 100
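As a small illustration of how the block index can serve as the unique ID the question asks for, here is a variant of run_times that yields the index to the caller. The "user-N" naming is hypothetical, and the WebDriver calls are left out so the snippet runs on its own.

```ruby
# Run the given block n times, passing the zero-based iteration index.
def run_times(n)
  n.times do |i|
    yield i
  end
end

ids = []
run_times(3) do |i|
  # In the real test, the driver setup and teardown would go here.
  ids << "user-#{i}"
end

p ids
```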

Detect number of IDLE processors ruby

I work on shared linux machines with between 4 and 24 cores. To make best use of them, I use the following code to detect the number of processors from my ruby scripts:
return `cat /proc/cpuinfo | grep processor | wc -l`.to_i
(perhaps there is a pure-ruby way of doing this?)
But sometimes a colleague is using six or eight of the 24 cores. (as seen via top). How can I get an estimate of the number of currently unused processors that I can use without making anyone upset?
Thanks!
You can use the data in the /proc filesystem to get CPU affinity info for running processes. The following should give you the number of CPUs currently in use (Note: I don't have a Linux or Ruby box handy so this code is untested, but you can get the idea):
def processors_in_use
procs=[]
Dir.glob("/proc/*/stat") {|filename|
next if File.directory?(filename)
this_proc=[]
File.open(filename) {|file| this_proc = file.gets.split.values_at(2,38)}
procs << this_proc[1].to_i if this_proc[0]=="R"
}
procs.uniq.length
end
def num_processors
IO.readlines("/proc/cpuinfo").delete_if{|x| x.index("processor")==nil}.length
end
def num_free_processors
num_processors - processors_in_use
end
def estimate_free_cpus(count, waittime)
results=[]
count.times {
results << num_free_processors
sleep(waittime)
}
sum=0
results.each {|x| sum += x}
(sum.to_f / results.length).round
end
Edit: I verified that the above code works (I was using Ruby 1.9)
Inspired by bta's reply, this is what I'm using:
private
def YWSystemTools.numberOfActiveProcessors # internal
processorForProcs = []
processFiles = Dir.glob("/proc/*/stat")
raise IOError, 'Cannot find /proc/*/stat files. Are you sure this is a linux machine?' if processFiles.empty?
processFiles.each do |filename|
next if File.directory?(filename) # because /proc/net/stat is a directory
next if !File.exist?(filename) # may have disappeared in the meantime
this_proc = []
File.open(filename) { |file| this_proc = file.gets.split.values_at(2,38) }
processorForProcs << this_proc[1].to_i if this_proc[0]=="R"
end
processorsInUse = processorForProcs.uniq
return(processorsInUse.length)
end
public
def YWSystemTools.numberOfAvailableProcessors
numberOfAttempts = 5
$log.info("Will determine number of available processors. Wait #{numberOfAttempts.to_s} seconds.")
# We estimate 5 times because of local fluctuations in processor use. Keep the minimum.
estimationsOfNumberOfActiveProcessors = []
numberOfAttempts.times do
estimationsOfNumberOfActiveProcessors << YWSystemTools.numberOfActiveProcessors
sleep(1)
end
numberOfActiveProcessors = estimationsOfNumberOfActiveProcessors.min
numberOfTotalProcessors = number_of_processors()
raise IOError, '!! # active Processors > # processors' if numberOfActiveProcessors > numberOfTotalProcessors
numberOfAvailableProcessors = numberOfTotalProcessors - numberOfActiveProcessors
$log.info("#{numberOfAvailableProcessors} out of #{numberOfTotalProcessors} are available!")
return(numberOfAvailableProcessors)
end
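On the question of a pure-Ruby processor count: Etc.nprocessors (standard library, Ruby 2.2 and later) avoids shelling out. The sketch below pairs it with a different and cruder technique than the per-process scan in the answers above: subtracting the 1-minute load average read from /proc/loadavg. The helper names are hypothetical, and treating one unit of load as one busy core is an assumption that only roughly holds.

```ruby
require 'etc'

# Total number of processors available, without calling external commands.
def total_processors
  Etc.nprocessors
end

# First field of /proc/loadavg is the 1-minute load average (Linux only).
def one_minute_load
  File.read('/proc/loadavg').split.first.to_f
end

# Rough estimate: treat each unit of load as one busy core, never going below zero.
def estimate_free_processors(total, load_avg)
  [total - load_avg.ceil, 0].max
end

if File.readable?('/proc/loadavg')
  puts "about #{estimate_free_processors(total_processors, one_minute_load)} of #{total_processors} processors look free"
end
```

Load average also counts processes waiting on I/O on Linux, so this estimate tends to be conservative on machines doing heavy disk work.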

Resources