Running ruby scripts in parallel - ruby

Let's say I've got two ruby scripts - a.rb and b.rb. Both are web-scrapers used for different pages. They can work for many, many hours and I would like to run them simultaneously. In order to do that I've tried to run them by third script using 'promise' gem with the following code:
def method_1
require 'path to my file\a'
end
def method_2
require 'path to my file\b'
end
require 'future'
x=future{method_1}
y=future{method_2}
x+y
However this solution throws an error(below) and only one script is executed.
An operation was attempted on something that is not a socket.
(Errno::ENOTSOCK)
I also tried playing with Thread class:
def method_one
require 'path to my file\a'
end
def method_two
require 'path to my file\b'
end
x = Thread.new{method_one}
y = Thread.new{method_two}
x.join
y.join
And it gives me the same error as for 'promise' gem.
I've also run those scripts in separate shells- then they work at the same time, but the performance is much worse (aprox. about 50% slower).
Is it any way to run them at the same time and keep high performance?

You can use concurrent-ruby for this, here is how you can run both your scripts in parallel:
require 'concurrent'
# Create future for running script a
future1 = Concurrent::Promises.future do
require 'path to file\a'
:result
end
# Create future for running script b
future2 = Concurrent::Promises.future do
require 'path to file\b'
:result
end
# Combine both futures to run them in parallel
future = Concurrent::Promises.zip(future1, future1)
# Wait until both scripts are completed
future.value!

Related

Capybara Around Hook to test several envinroments

I'm writing some tests for a webpage that I'd like to run in several environments. The idea is that the test will run in one, then repeat in the next. The two environments are preview and uat.
I've written an Around hook to set the environment variables. Below:
Around do |scenario, block|
def test_envs
chosen_env = ENV['test_env'] || 'preview'
chosen_env.split(',').map(&:strip)
end
test_envs.each do |test_env|
$base_url = "https://#{test_env}.webpage.com"
end
block.call
end
I have then written a method to execute the navigation step:
def navigate_to(path)
visit $base_url + path
end
My Scenario step_definition is:
navigate_to '/login'
The tests will work in either environment, Preview by default or UAT if I set test_env=uat
However, I was aiming to set test_env=preview,uat and have them run consecutively in both environments.
Is there something obvious that I've missed here?
Thanks
If I'm understanding you correctly, it's the 'parallel' aspect that you're asking about.
Rspec can be used with parallel tests (the parallel_tests gem) but I wouldn't be so sure that calling something like 3.times { blk.call } in an around hook will run each block in parallel.
An alternative may be do so some metaprogramming with your example definitions, i.e.
test_envs.each do |env_name|
it "does something in #{env_name}" do
# do something with the specific environment
end
end
Now, I haven't actually used this gem and I don't know for sure it would work. I think the simplest solution may be to just write a wrapper script to call the tests
# run_tests.rb
environments = ENV["TEST_ENV"]&.split(",") || []\
filename = ENV["filename"]
environments.each do |env_name|
Thread.new do
system <<-SH
env TEST_ENV=#{env_name} bundle exec rspec #{filename}
SH
end
end
Running it like env TEST_ENV=foo,bar ruby run_tests.rb would call the following commands in their own threads:
env TEST_ENV=foo bundle exec rspec
env TEST_ENV=bar bundle exec rspec
I like this approach because it means you don't have to touch your existing test code.

Resque tasks always fail - uninitialized job constants?

I've tried using Resque before and was met with unmitigated failure. I'm revisiting it again with the same results...
resque.rake:
require "resque/tasks"
task "resque:setup" => :environment
test.rb:
require 'resque'
class FileWorker
#queue = :save_to_file
def self.perform(str)
File.open('./' + Time.now.to_s + '.txt', 'w+') do |f|
f << "test 123"
end
end
end
Resque.enqueue(FileWorker, "12345567".split('').shuffle.join)
Gemfile:
gem 'resque'
gem 'rake'
It seems like running test.rb on its own successfully queues the job:
However, running rake resque:work QUEUE='*' in the same folder results in a warning,
WARNING: This way of doing signal handling is now deprecated. Please
see http://hone.heroku.com/resque/2012/08/21/resque-signals.html for
more info.
As well as the task being added to "failed" queue with the following reason: "exception":"NameError","error":"uninitialized constant FileWorker"
How do I get this to work? Seems like something quite obvious but there's tons of tutorials about Resque spanning many years - some painfully out of date and none explaining how to run workers so they don't fail.
Thanks in advance.
When you enqueue a task with Resque, what is stored on Redis is just the name of the job class (as a string) along with the arguments (again as strings) in a JSON object.
When a worker then tries to perform the task, it needs to be able to create an instance of the job class. It does this by using const_get and const_missing. This is where the error you are seeing occurs, since the worker does not have the definition of FileWorker available to it.
The error is the same as if you tried to get an unknown constant in irb:
> Object.const_missing "FileWorker"
NameError: uninitialized constant FileWorker
The solution is to make sure the definition of FileWorker is available to your workers. The simplest way to do this would be to just require test.rb from your Rakefile (or resque.rake). In your code this would involve adding another task to the queue, so you might want to move the FileWorker code to its own file where it can be required by both the rake file and the code enqueuing jobs.
test.rb:
require 'resque'
require './file_worker'
Resque.enqueue(FileWorker, "12345567".split('').shuffle.join)
Rakefile (note the :environment task only makes sense if you are using Rails and will give errors otherwise):
require "resque/tasks"
require "./file_worker"
file_worker.rb:
class FileWorker
#queue = :save_to_file
def self.perform(str)
File.open('./' + Time.now.to_s + '.txt', 'w+') do |f|
f << "test 123"
end
end
end
Now the workers will be able to create instances of FileWorker to complete the tasks.
The way to avoid the warning about signals is given in the page linked to in the message. Simply set the environment variable TERM_CHILD when calling rake:
$ rake resque:work QUEUE='*' TERM_CHILD=1

A ruby script to run other ruby scripts

If I want to run a bunch of ruby scripts (super similar, with maybe a number changed as a commandline argument) and still have them output to stdout, is there a way to do this?
i.e a script to run these:
ruby program1.rb input_1.txt
ruby program1.rb input_2.txt
ruby program1.rb input_3.txt
like
(1..3).each do |i|
ruby program1.rb input_#{i}'
end
in another script, so I can just run that script and see the output in a terminal from all 3 runs?
EDIT:
I'm struggling to implement the second highest voted suggested answer.
I don't have a main function within my program1.rb, whereas the suggested answer has one.
I've tried this, for script.rb:
require "program1.rb"
(1..6).each do |i|
driver("cmd_line_arg_#{i}","cmd_line_arg2")
end
but no luck. Is that right?
You can use load to run the script you need (the difference between load and require is that require will not run the script again if it has already been loaded).
To make each run have different arguments (given that they are read from the ARGV variable), you need to override the ARGV variable:
(1..6).each do |i|
ARGV = ["cmd_line_arg_#{i}","cmd_line_arg2"]
load 'program1.rb'
end
# script_runner.rb
require_relative 'program_1'
module ScriptRunner
class << self
def run
ARGV.each do | file |
SomeClass.new(file).process
end
end
end
end
ScriptRunner.run
.
# programe_1.rb
class SomeClass
attr_reader :file_path
def initialize(file_path)
#file_path = file_path
end
def process
puts File.open(file_path).read
end
end
Doing something like the code shown above would allow you to run:
ruby script_runner.rb input_1.txt input_2.txt input_3.txt
from the command line - useful if your input files change. Or even:
ruby script_runner.rb *.txt
if you want to run it on all text files. Or:
ruby script_runner.rb inputs/*
if you want to run it on all files in a specific dir (in this case called 'inputs').
This assumes whatever is in program_1.rb is not just a block of procedural code and instead provides some object (class) that encapsulates the logic you want to use on each file,meaning you can require program_1.rb once and then use the object it provides as many times as you like, otherwise you'll need to use #load:
# script_runner.rb
module ScriptRunner
class << self
def run
ARGV.each do | file |
load('program_1.rb', file)
end
end
end
end
ScriptRunner.run
.
# program_1.rb
puts File.open(ARGV[0]).read

Load a Ruby TestCase Without Running It

I'm trying to write a custom tool that runs ruby unit tests with my customizations.
What I need it to do is to load a certain TestCase from given file(through require or whatever), and then run it after doing some calculations and initializations.
Problem is, the moment I require "test/unit" and a test case, it runs immediately.
What can I do with this?
Thanks.
Since you're running 1.9 and test/unit in 1.9 is merely a wrapper for MiniTest, the following approach should work:
implement your own custom Runner
set MiniTest's runner to your custom runner
Something like (shameless plug from EndOfLine Custom Test Runner, adjusted to Ruby 1.9):
fastfailrunner.rb:
require 'test/unit'
class FastFailRunner19 < MiniTest::Unit
def _run args = []
puts "fast fail runner"
end
end
~
example_test.rb:
require 'test/unit'
class ExampleTest < Test::Unit::TestCase
def test_assert_equal
assert_equal 1, 1
end
def test_lies
assert false
end
def test_exceptions
raise Exception, 'Beware the Jubjub bird, and shun the frumious Bandersnatch!'
end
def test_truth
assert true
end
end
run.rb:
require_relative 'fast_fail_runner'
require_relative 'example_test'
MiniTest::Unit.runner= FastFailRunner19.new
If you run this with
ruby run.rb
the custom FastFailRunner19 will be used, which does nothing.
What about reading file content as a regular text file and doing eval on its content after you initialize/calculate things you say? It may not be sufficient for your needs and may require manual setup and execution of testing framework.
Like that (I put heredoc instead of reading file). Basically content is just a string containing your test case code.
content = <<TEST_CASE
class YourTestCase
def hello
puts 'Hello from eval'
end
end
YourTestCase.new.hello
TEST_CASE
eval content
Note: Altough I'd rather not use eval if there is another way. One should be extra careful when evaling code from string manually in any language.
You could collect the test cases you want to deferred its executions and store them in an array. Afterwards you would create a block execution code. For instance:
test_files = ['test/unit/first_test.rb'] #=> Testcases you want to run
test_block = Proc.new {spec_files.each {|f|load f} } #=> block storing the actual execution of those tests.
Once you're ready to call those testcases you just do test_block.call.
To generalize a bit, when thinking about deferring or delaying code executions, closures are a very elegant and flexible alternative.

Is there a shorter way to require a file in the same directory in ruby?

Is there a shorter way to require a file located in the same directory (as the script being executed)?
require File.expand_path(File.dirname(__FILE__) + '/some_other_script')
I read that require "my_script" and require "./my_script" will actually load the script twice (ruby will not recognize that it is actually the same script), and this is the reason why File.expand_path is recommended: if it is used every time the script is required, then it will only be loaded once.
It seems weird to me that a concise language like Ruby does not seem to have a shorter solution. For example, python simply has this:
import .some_other_module_in_the_same_directory
I guess I could monkey-patch require... but that's just evil! ;-)
Since ruby 1.9 you can use require_relative.
Check the latest doc for require_relative or another version of the Core API.
Just require filename.
Yes, it will import it twice if you specify it as filename and ./filename, so don't do that. You're not specifying the .rb, so don't specify the path. I usually put the bulk of my application logic into a file in lib, and then have a script in bin that looks something like this:
#!/usr/bin/env ruby
$: << File.join(File.dirname(__FILE__), "/../lib")
require 'app.rb'
App.new.run(ARGV)
Another advantage is that I find it easier to do unit testing if the loading the application logic doesn't automatically start executing it.
The above will work even when you're running the script from some other directory.
However, inside the same directory the shorter forms you refer to work as expected and at least for ruby 1.9 won't result in a double-require.
testa.rb
puts "start test A"
require 'testb'
require './testb'
puts "finish test A"
testb.rb
puts "start test B"
puts "finish test B"
running 'ruby testa.rb' will result in:
start test A
start test B
finish test B
finish test A
However, the longer form will work even from another directory (eg. ruby somedir/script.rb)
Put this in a standard library directory (somewhere that's already in your default loadpath $:):
# push-loadpath.rb
if caller.first
$: << File.expand_path(File.dirname(caller.first))
end
Then, this should work
% ls /path/to/
bin.rb lib1.rb lib2.rb #...
% cat /path/to/bin.rb
load 'push-loadpath.rb'
require 'lib1'
require 'lib2'
#...
caller gives you access to the current callstack, and tells you what file and where, so push-loadpath.rb uses that to add the file that load'd it to the loadpath.
Note that you should load the file, rather than require it, so the body can be invoked multiple times (once for each time you want to alter the loadpath).
Alternately, you could wrap the body in a method,
# push-loadpath.rb
def push_loadpath
$: << File.expand_path(File.dirname(caller.first))
end
This would allow you to require it, and use it this way:
% ls /path/to/
bin.rb lib1.rb lib2.rb #...
% cat /path/to/bin.rb
require 'push-loadpath'
push_loadpath
require 'lib1'
require 'lib2'
#...

Resources