How do I efficiently run multiple nested scripts from within a Ruby script - ruby

Update
I added an answer below that I think makes sense. Please feel free to answer or comment if you think I have this wrong.
I have an array of two strings, for example: ["report_a", "report_b"]
I need to generate a database from each of these report names, then once the database is created I need to generate two wave files using the database as a source.
The two databases can be generated independently of each other, and then the two wave files can be generated independently of each other from the same database.
So in the end I need to do something like this:
Generate db_a from report_a
Generate wave_1a from db_a
Generate wave_1b from db_a
Generate db_b from report_b
Generate wave_2a from db_b
Generate wave_2b from db_b
I would like to generate db_a and db_b in parallel and then after a db is created generate corresponding wave files a and b in parallel.
Here is the code I am using, but I feel it is generating too many processes and running the same commands multiple times.
What's the correct way to maximize what gets done in parallel without having some work done multiple times?
report_types.each do |report_type|
  log.puts "starting build for #{report_type} at #{Time.now}"
  Process.fork do
    outfile_prefix = "#{cell}.#{report_type}"
    log.puts "starting db generation for #{report_type} at #{Time.now}"
    log.puts `generate_db.rb #{report_base_name}.#{report_type}.gz #{outfile_prefix}`
    log.puts "finished db generation for #{report_type} at #{Time.now}"
    Process.fork do
      log.puts "starting generation of grf file for #{report_type} at #{Time.now}"
      log.puts `generate_waves.rb #{outfile_prefix}.sqlite3.gz #{outfile_prefix}.grf`
      log.puts "finished generation of grf file for #{report_type} at #{Time.now}"
    end
    Process.fork do
      log.puts "starting generation of trn file for #{report_type} at #{Time.now}"
      log.puts `generate_waves.rb #{outfile_prefix}.sqlite3.gz #{outfile_prefix}.trn`
      log.puts "finished generation of trn file for #{report_type} at #{Time.now}"
    end
  end
  log.puts "finished build for #{report_type} at #{Time.now}"
end

I now think this is doing what I want.
I based my original post on what I observed in the log file I created. For example, I saw 11 copies of the line "starting build for ..." in the log. I think this is because I end up with 11 processes: the original plus 10 clones.
1 (original) + 2 (one clone for each report) + 2x2 (each report clone gets a clone for the grf wave) + 2x2 (each report clone gets cloned again for the trn wave).
So in the end we end up with 11 processes = 1 + 2 + 2x2 + 2x2.
Each process holds an open, unwritten buffer containing the string "starting build for ...". Eventually all these processes end and their buffers are flushed, creating the 11 copies of the string in the output.
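This buffer duplication is easy to reproduce in isolation. A minimal sketch (the log file name here is made up for the demo):

```ruby
# Each fork copies the parent's unflushed IO buffer, so the same line
# gets written once per process when the buffers are finally flushed.
log = File.open("flush_demo.log", "w")
log.puts "starting build"   # sits in this process's buffer, not yet on disk
pid = fork {}               # child inherits a copy of the unflushed buffer
Process.wait(pid)           # child flushes its copy when it exits
log.close                   # parent flushes its copy too
copies = File.readlines("flush_demo.log").grep(/starting build/).count
puts copies                 # the line appears once per process
File.delete("flush_demo.log")
```

Calling `log.flush` before the `fork` empties the parent's buffer, so the child has nothing left over to flush.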
I updated my code to the following, adding a flush before each Process.fork statement, and now I get the expected number of log entries for each statement. For example, I now observe only 2 copies of "starting build for ..." in the log.
report_types.each do |report_type|
  log.puts "starting build for #{report_type} at #{Time.now}"
  log.flush
  Process.fork do
    outfile_prefix = "#{cell}.#{report_type}"
    log.puts "starting db generation for #{report_type} at #{Time.now}"
    log.puts `generate_db.rb #{report_base_name}.#{report_type}.gz #{outfile_prefix}`
    log.puts "finished db generation for #{report_type} at #{Time.now}"
    log.flush
    Process.fork do
      log.puts "starting generation of grf file for #{report_type} at #{Time.now}"
      log.puts `generate_waves.rb #{outfile_prefix}.sqlite3.gz #{outfile_prefix}.grf`
      log.puts "finished generation of grf file for #{report_type} at #{Time.now}"
    end
    log.flush
    Process.fork do
      log.puts "starting generation of trn file for #{report_type} at #{Time.now}"
      log.puts `generate_waves.rb #{outfile_prefix}.sqlite3.gz #{outfile_prefix}.trn`
      log.puts "finished generation of trn file for #{report_type} at #{Time.now}"
    end
  end
  log.puts "finished build for #{report_type} at #{Time.now}"
end
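For comparison, here is a sketch of the same fan-out with explicit waits, so each report child reaps its two wave children and the top-level script reaps the report children before exiting. The file writes are stand-ins for the real generate_db.rb and generate_waves.rb calls:

```ruby
require 'tmpdir'

created = nil
Dir.mktmpdir do |dir|
  report_types = ["report_a", "report_b"]

  report_pids = report_types.map do |report_type|
    fork do
      # stand-in for: generate_db.rb <report>.gz <prefix>
      File.write(File.join(dir, "#{report_type}.sqlite3"), "db")
      wave_pids = ["grf", "trn"].map do |ext|
        # stand-in for: generate_waves.rb <prefix>.sqlite3.gz <prefix>.<ext>
        fork { File.write(File.join(dir, "#{report_type}.#{ext}"), ext) }
      end
      wave_pids.each { |pid| Process.wait(pid) }   # reap both wave children
    end
  end
  report_pids.each { |pid| Process.wait(pid) }     # reap both report children

  created = Dir.children(dir).sort
end
puts created.length   # => 6 (2 dbs + 4 wave files)
```

Both databases build in parallel, each pair of wave files builds in parallel as soon as its database exists, and nothing is run twice.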

Related

Ruby TCPServer fails to work sometimes

I've implemented a very simple kind of server in Ruby, using TCPServer. I have a Server class with a serve method:
def serve
  # Do the actual serving in a child process
  @pid = fork do
    # Trap signal sent by #stop or by pressing ^C
    Signal.trap('INT') { exit }
    # Create a new server on port 2835 (1 ounce = 28.35 grams)
    server = TCPServer.new('localhost', 2835)
    @logger.info 'Listening on http://localhost:2835...'
    loop do
      socket = server.accept
      request_line = socket.gets
      @logger.info "* #{request_line}"
      socket.print "message"
      socket.close
    end
  end
end
and a stop method:
def stop
  @logger.info 'Shutting down'
  Process.kill('INT', @pid)
  Process.wait
  @pid = nil
end
I run my server from the command line, using:
if __FILE__ == $0
  server = Server.new
  server.logger = Logger.new(STDOUT)
  server.logger.formatter = proc { |severity, datetime, progname, msg| "#{msg}\n" }
  begin
    server.serve
    Process.wait
  rescue Interrupt
    server.stop
  end
end
The problem is that, sometimes, when I do ruby server.rb from my terminal, the server starts, but when I try to make a request on localhost:2835, it fails. Only after several requests does it start serving some pages. In other cases, I need to stop and start the server again for it to properly serve pages. Why is this happening? Am I doing something wrong? I find this very weird...
The same thing applies to my specs: I have some specs defined, and some Capybara specs. Before each test I create and start a server, and after each test I stop the server. And the problem persists: tests sometimes pass, sometimes fail because the requested page could not be found.
Is there something fishy going on with my forking?
Would appreciate any answer because I have no more place to look...
Your code is not an HTTP server. It is a TCP server that sends the string "message" over the socket after receiving a newline.
The reason that your code isn't a valid HTTP server is that it doesn't conform to the HTTP protocol. One of the many requirements of the HTTP protocol is that the server respond with a message of the form
HTTP/1.1 <code> <reason>
Where <code> is a number and <reason> is a human-readable "status", like "OK" or "Server Error" or something along those lines. The string "message" obviously does not conform to this requirement.
Here is a simple introduction to how you might build an HTTP server in Ruby: https://practicingruby.com/articles/implementing-an-http-file-server
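A minimal conforming response looks like the sketch below. The port and the "message" body come from the question; the one-shot serve_one helper and the self-test client are illustrative additions:

```ruby
require 'socket'

# Serve exactly one request with a minimal but valid HTTP/1.1 response.
def serve_one(server)
  socket = server.accept
  request_line = socket.gets                          # e.g. "GET / HTTP/1.1"
  body = "message"
  socket.print "HTTP/1.1 200 OK\r\n"                  # status line the protocol requires
  socket.print "Content-Type: text/plain\r\n"
  socket.print "Content-Length: #{body.bytesize}\r\n"
  socket.print "Connection: close\r\n"
  socket.print "\r\n"                                 # blank line ends the headers
  socket.print body
  socket.close
end

server = TCPServer.new('localhost', 2835)
responder = Thread.new { serve_one(server) }

# Exercise it with a plain TCP client standing in for a browser.
client = TCPSocket.new('localhost', 2835)
client.print "GET / HTTP/1.1\r\nHost: localhost\r\n\r\n"
response = client.read                                # read until the server closes
client.close
responder.join
server.close
puts response.lines.first
```

With the status line, headers, and blank separator in place, a browser or HTTP client library will accept the body instead of reporting a malformed response.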

How can I deploy with a comment?

When I do a:
rake deploy
How can I add a comment? Right now the comments are auto generated with the date:
Site updated at 2014-03-30 23:21:00 UTC
You can see where the message is generated in the Rakefile:
multitask :push do
  puts "## Deploying branch to Github Pages "
  puts "## Pulling any updates from Github Pages "
  cd "#{deploy_dir}" do
    system "git pull"
  end
  (Dir["#{deploy_dir}/*"]).each { |f| rm_rf(f) }
  Rake::Task[:copydot].invoke(public_dir, deploy_dir)
  puts "\n## Copying #{public_dir} to #{deploy_dir}"
  cp_r "#{public_dir}/.", deploy_dir
  cd "#{deploy_dir}" do
    system "git add -A"
    puts "\n## Committing: Site updated at #{Time.now.utc}"
    message = "Site updated at #{Time.now.utc}"
    system "git commit -m \"#{message}\""
    puts "\n## Pushing generated #{deploy_dir} website"
    system "git push origin #{deploy_branch}"
    puts "\n## Github Pages deploy complete"
  end
end
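So the message is hard-coded in the :push task. One common tweak (hypothetical, not part of the stock Rakefile) is to let an environment variable override it:

```ruby
# Use a custom commit message when one is supplied, e.g.
#   MESSAGE="Fixed typo on about page" rake deploy
message = ENV["MESSAGE"] || "Site updated at #{Time.now.utc}"
puts "\n## Committing: #{message}"
# system "git commit -m \"#{message}\""   # unchanged from the original task
```

With that in place, rake deploy behaves exactly as before when MESSAGE is unset.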

How to ignore error and continue with rest of the script

Some background: I want to delete an AWS Redshift cluster, and the process takes more than 30 minutes. So this is what I want to do:
1. Start the deletion
2. Every 1 minute, check the cluster status (it should be "deleting")
3. When the cluster is deleted, the command will fail (because it cannot find the cluster anymore). So log some message and continue with the rest of the script
This is the command I run in a while loop to check the cluster status after I start the deletion:
resp = redshift.client.describe_clusters(:cluster_identifier=>"blahblahblah")
The above command reports the cluster status as "deleting" while the deletion process continues. But once the cluster is completely deleted, the command itself fails because it cannot find the cluster blahblahblah.
Here is the error from command once the cluster is deleted:
/var/lib/gems/1.9.1/gems/aws-sdk-1.14.1/lib/aws/core/client.rb:366:in `return_or_raise': Cluster blahblahblah not found. (AWS::Redshift::Errors::ClusterNotFound)
This error is expected, but it makes my script exit abruptly. So I want to log a message saying "The cluster is deleted....continuing" and continue with my script.
I tried the following:
resp = redshift.client.describe_clusters(:cluster_identifier => "blahblahblah") ||
  raise("The cluster is deleted....continuing")
I also tried a couple of suggestions mentioned at https://www.ruby-forum.com/topic/133876
But none of this is working. My script exits once the above command fails to find the cluster.
Questions:
How can I ignore the error, print my own message saying "The cluster is deleted....continuing", and continue with the script?
Thanks.
def delete_clusters(clusters = [])
  clusters.each do |target_cluster|
    puts "will delete #{target_cluster}"
    begin
      while (some_condition) do
        resp = redshift.client.describe_clusters(:cluster_identifier => target_cluster)
        # break condition
      end
    rescue AWS::Redshift::Errors::ClusterNotFound => cluster_exception
      raise("The cluster, #{target_cluster} (#{cluster_exception.id}), is deleted....continuing")
    end
    puts "doing other things now"
    # ....
  end
end
@NewAlexandria, I changed your code to look like below:
puts "Checking the cluster status"
begin
  resp = redshift.client.describe_clusters(:cluster_identifier => "blahblahblah")
rescue AWS::Redshift::Errors::ClusterNotFound => cluster_exception
  puts "The cluster is deleted....continuing"
end
puts "seems like the cluster is deleted and does not exist"
OUTPUT:
Checking the cluster status
The cluster is deleted....continuing
seems like the cluster is deleted and does not exist
I changed the raise to puts in the line that immediately follows the rescue line in your response. This way I got rid of the RuntimeError that I mentioned in my comment above.
I do not know what the implications of this are. I do not even know whether this is the right way to do it. But it shows the message when the cluster is not found and then continues with the script.
Later I read a lot of articles on Ruby exceptions/rescue/raise/throw, but that was just too much for me to understand as I do not come from a programming background at all. So, if you could explain what is going on here, it would really help me gain more confidence in Ruby.
Thanks for your time.
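To make the begin/rescue mechanics concrete without the AWS SDK, here is a self-contained sketch. The ClusterNotFound class and describe_cluster method are stand-ins for the SDK; describe_cluster pretends the cluster disappears on the third poll:

```ruby
# Stand-in for AWS::Redshift::Errors::ClusterNotFound.
class ClusterNotFound < StandardError; end

def describe_cluster(poll)
  # Stand-in for redshift.client.describe_clusters: the cluster
  # "exists" for two polls, then is gone.
  raise ClusterNotFound, "Cluster blahblahblah not found." if poll >= 3
  "deleting"
end

poll = 0
status = nil
loop do
  poll += 1
  begin
    status = describe_cluster(poll)      # raises once the cluster is gone
    puts "poll #{poll}: #{status}"
  rescue ClusterNotFound => e
    # Control jumps here instead of crashing the script.
    puts "The cluster is deleted....continuing (#{e.message})"
    status = "deleted"
    break                                # leave the loop; script continues
  end
end
puts "script continues; final status: #{status}"
```

The key idea: raise inside a rescue block re-raises and still kills the script, while puts (or any ordinary statement) handles the error and lets execution fall through to the code after the begin/end block, which is why your change worked.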

Ruby Threads (Rake) for FTP

I have a rake task that uploads a list of files via ftp. Copying without threading works fine, but it would be faster if I could do multiple concurrent uploads.
(I'm new to Ruby and multithreading, so it's no surprise it didn't work right off the bat.)
I have:
files.each_slice(files.length / max_threads) do |file_set|
  threads << Thread.new(file_set) do |file_slice|
    running_threads += 1
    thread_num = running_threads
    thread_num.freeze
    puts "making thread # #{thread_num}"
    file_slice.each do |file|
      file.freeze
      if File.directory?(file)
      else
        puts file.pathmap("#{$ftpDestination}%p")
        ftp.putbinaryfile(file, file.pathmap("#{$ftpDestination}%p"))
      end
    end
  end
end
My output is:
making thread # 1
/test/./1column-ff-template.aspx
making thread # 2
making thread # 3
/test/./admin/footerContent.aspx
/test/./admin/contentList.aspx
making thread # 4
/test/./3columnTemplate.ascx
making thread # 5
/test/./ascx/dashboard/dash.ascx
making thread # 6
/test/./ascx/Links.ascx
making thread # 7
/test/./bin/App_GlobalResources.dll
making thread # 8
/test/./bin/App_Web__foxtqrr.dll
making thread # 9
/test/./GetPageLink.ascx
So it looks like each thread starts to upload a file and then dies without an error.
What am I doing wrong?
If abort_on_exception is false and the debug flag is not enabled (the default), an unhandled exception silently kills the current thread. You don't even know about it until you issue a join on the thread that raised it. So you can do a join, or change the debug flag, and you should see the exception if one is indeed thrown.
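You can see both behaviors in a few lines: the exception is invisible until join re-raises it in the calling thread.

```ruby
Thread.report_on_exception = false   # silence the stderr warning (Ruby >= 2.5)

t = Thread.new { raise ArgumentError, "boom" }   # thread dies silently
caught = nil
begin
  t.join                             # join re-raises the thread's exception here
rescue ArgumentError => e
  caught = e.message
end
puts caught                          # prints "boom"
```

Without the join, the script would run to completion with no hint that the thread ever failed, which matches the "dies without an error" symptom in the question.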
The root of the problem was fixed by adding:
  threads.each { |t| t.join }
after the file_slice loop ends.
Thanks to JRL for helping me find the Exception!
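The fixed shape looks like the sketch below, with the FTP call replaced by a thread-safe stand-in so it runs anywhere (the file list, thread count, and Queue collector are all illustrative):

```ruby
files = (1..10).map { |i| "file_#{i}.txt" }
max_threads = 3
uploaded = Queue.new   # stand-in for ftp.putbinaryfile; Queue is thread-safe

threads = []
files.each_slice((files.length.to_f / max_threads).ceil) do |file_slice|
  threads << Thread.new(file_slice) do |slice|
    slice.each { |file| uploaded << file }   # "upload" each file in this slice
  end
end
threads.each { |t| t.join }   # the fix: wait for every thread to finish
puts uploaded.size            # => 10
```

Without the join, the rake task's main thread reaches the end of the script and the process exits, tearing down the upload threads mid-transfer. Note also that a real FTP session is not safe to share across threads; one Net::FTP connection per thread is the usual approach.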

Merb & DataMapper - accessing database connection info?

I'm using Merb and DataMapper with a MySQL db. I want to access the database name, user, and password from a Rake task for my Merb app. I guess I could YAML.load() the database.yml, but that seems ugly. Any ideas?
desc "outputs database connection parameters"
task :db_conn => :merb_env do |t|
  puts "Username: #{DataMapper.repository.adapter.uri.user}"
  puts "Password: #{DataMapper.repository.adapter.uri.password}"
  puts "Database: #{DataMapper.repository.adapter.uri.path.split('/').last}"
end
The interesting part there is the => :merb_env bit. That ensures that the "merb_env" task has executed before your task does. This simply loads up the Merb environment, at which point you can proceed to inspect its configuration.
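The adapter URI is just a parsed connection string, so the field access works like ordinary URI handling. A sketch with a made-up connection string (no Merb app needed; a real task would read DataMapper.repository.adapter.uri instead):

```ruby
require 'uri'

# Hypothetical connection string standing in for the adapter's URI.
uri = URI.parse("mysql://app_user:s3cret@localhost/my_app_db")

puts "Username: #{uri.user}"                  # user from the userinfo part
puts "Password: #{uri.password}"              # password from the userinfo part
puts "Database: #{uri.path.split('/').last}"  # db name is the last path segment
```

This is why the path needs the split('/').last step: the URI path is "/my_app_db", with a leading slash that is not part of the database name.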