I have a rake task that uploads a list of files via ftp. Copying without threading works fine, but it would be faster if I could do multiple concurrent uploads.
(I'm new to Ruby and multithreading, so it's no surprise it didn't work right off the bat.)
I have:
threads = []
running_threads = 0

files.each_slice(files.length / max_threads) do |file_set|
  threads << Thread.new(file_set) do |file_slice|
    running_threads += 1
    thread_num = running_threads
    thread_num.freeze
    puts "making thread # #{thread_num}"
    file_slice.each do |file|
      file.freeze
      if File.directory?(file)
      else
        puts file.pathmap("#{$ftpDestination}%p")
        ftp.putbinaryfile(file, file.pathmap("#{$ftpDestination}%p"))
      end
    end
  end
end
My output is:
making thread # 1
/test/./1column-ff-template.aspx
making thread # 2
making thread # 3
/test/./admin/footerContent.aspx
/test/./admin/contentList.aspx
making thread # 4
/test/./3columnTemplate.ascx
making thread # 5
/test/./ascx/dashboard/dash.ascx
making thread # 6
/test/./ascx/Links.ascx
making thread # 7
/test/./bin/App_GlobalResources.dll
making thread # 8
/test/./bin/App_Web__foxtqrr.dll
making thread # 9
/test/./GetPageLink.ascx
So it looks like each thread starts to upload a file and then dies without an error.
What am I doing wrong?
If abort_on_exception is false and the debug flag is not enabled (the default), an unhandled exception simply kills the thread that raised it. You don't even know about it until you call join on that thread. So either join your threads or enable the debug flag, and you should get the exception if one is indeed thrown.
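For example (a minimal sketch), either approach surfaces the hidden exception:

# Option 1: make any unhandled exception in any thread abort the process.
Thread.abort_on_exception = true

# Option 2: join the thread; join re-raises the thread's exception here.
t = Thread.new { raise "boom" }
t.join # => RuntimeError: boom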
The root of the problem was fixed by adding:
threads.each { |t| t.join }
after the each_slice loop ends.
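For reference, a minimal sketch of the corrected loop. Here $ftpHost, $ftpUser, and $ftpPassword are placeholders, the slice size is rounded up so it is never zero, and each thread opens its own connection, since sharing a single Net::FTP session across threads is generally not safe:

require 'net/ftp'

threads = []
slice_size = (files.length / max_threads.to_f).ceil # integer division could yield 0

files.each_slice(slice_size) do |file_slice|
  threads << Thread.new(file_slice) do |slice|
    # One FTP session per thread; a shared session would interleave commands.
    Net::FTP.open($ftpHost, $ftpUser, $ftpPassword) do |ftp|
      slice.each do |file|
        next if File.directory?(file)
        ftp.putbinaryfile(file, file.pathmap("#{$ftpDestination}%p"))
      end
    end
  end
end

threads.each(&:join) # waits for the uploads and re-raises any exception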
Thanks to JRL for helping me find the Exception!
Using Ruby (tested with versions 2.6.9, 2.7.5, 3.0.3, and 3.1.1) and forking processes to handle socket communication, there seems to be a huge difference between macOS and Debian Linux.
While running on Debian, the forked processes get called in a balanced manner; that means: with 10 TCP server forks and 100 client calls, each fork gets 10 calls. The order of the PID call stack is also always the same, even though it is not ordered by PID (caused by load when instantiating the forks).
Doing the same on macOS (Catalina), the forked processes do not get called in a balanced manner; "pid A" might get called 23 or however many times, while e.g. "pid G" is never used.
Sample code (originally from: https://relaxdiego.com/2017/02/load-balancing-sockets.html)
#!/usr/bin/env ruby
# server.rb
require 'socket'

# Open a socket
socket = TCPServer.open('0.0.0.0', 9999)
puts "Server started ..."

# For keeping track of children pids
wpids = []

# Forward any relevant signals to the child processes.
[:INT, :QUIT].each do |signal|
  Signal.trap(signal) {
    wpids.each { |wpid| Process.kill(:KILL, wpid) }
  }
end

5.times {
  wpids << fork do
    loop {
      connection = socket.accept
      connection.puts "Hello from #{ Process.pid }"
      connection.close
    }
  end
}

Process.waitall
Run some netcat to the server on a second terminal:
for i in {1..20}; do nc -d localhost 9999; done
As said: running on Linux, each forked process gets 4 calls; doing the same on macOS, usage per forked process is random.
Is there any solution or correction to make it work in a balanced manner on macOS as well?
The problem is that the default socket backlog size is 5 on macOS and 128 on Linux. You can change the backlog size by calling TCPServer#listen with a larger value:
socket.listen(128)
Or you can use the backlog size from the environment variable SOMAXCONN:
socket.listen(ENV.fetch('SOMAXCONN', '128').to_i)
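If you'd rather not hard-code the number, Ruby also exposes the platform's maximum as Socket::SOMAXCONN, so the server setup could look like this (a small sketch):

require 'socket'

socket = TCPServer.open('0.0.0.0', 9999)
socket.listen(Socket::SOMAXCONN) # raise the accept backlog to the OS maximum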
I have this worker process on Heroku which does some cleaning. It runs every two hours and listens for Heroku's terminate signals. It works fine, but I'm seeing 100% dyno load all the time.
My question is: how do I run this kind of worker process on Heroku without 100% dyno load? The loop causes the load, but what should I use instead of an infinite loop?
# Scheduler here
cleanup = Rufus::Scheduler.new
cleanup.cron '* */2 * * *' do
  do_some_cleaning
end

# Signal trapping
terminate = false # define the flag before the trap so the loop below can see it
Signal.trap("TERM") {
  terminate = true
  shut_down
  exit 0
}

# Infinite loop
while terminate == false
end
It's because you're spinning in an infinite loop with no sleep: you're telling the CPU to re-evaluate the loop condition on every single cycle, as fast as it can.
This will quickly use up your CPU.
Instead, try throwing a sleep statement into your infinite loop; this will pause execution between iterations and bring your usage down to 0% =)
while terminate == false
  sleep 1
end
I should have thought about it sooner. You can actually simply join rufus-scheduler's loop:
cleanup_scheduler = Rufus::Scheduler.new

cleanup_scheduler.cron '* */2 * * *' do
  do_some_cleaning
end

Signal.trap('TERM') do
  shut_down
  exit 0
end

cleanup_scheduler.join
That joins rufus-scheduler's scheduling thread and is roughly equivalent to:
while !terminated
  sleep 0.3
  trigger_schedules_if_any
end
I'm working on a production app that has multiple Rails servers behind an nginx load balancer. We are monitoring Sidekiq processes with monit, and it works just fine: when a Sidekiq process dies, monit starts it right back up.
However, we recently encountered a situation where one of these processes was running and visible to monit, but for some reason not visible to Sidekiq. That resulted in many failed jobs, and it took us some time to notice that we were missing one process in the Sidekiq Web UI, since monit was telling us everything was fine and all processes were running. A simple restart fixed the problem.
And that brings me to my question: how do you monitor your Sidekiq processes? I know I can use something like Rollbar to notify me when jobs fail, but I'd like to know if there is a way to monitor the process count, and preferably send mail when one dies. Any suggestions?
Something that would ping sidekiq/stats and verify the response.
My super simple solution to a similar problem looks like this:
# sidekiq_check.rb
namespace :sidekiq_check do
  task rerun: :environment do
    if Sidekiq::ProcessSet.new.size == 0
      exec 'bundle exec sidekiq -d -L log/sidekiq.log -C config/sidekiq.yml -e production'
    end
  end
end
and then using cron/whenever
# schedule.rb
every 5.minutes do
  rake 'sidekiq_check:rerun'
end
We ran into this problem when our Sidekiq processes had stopped working off jobs overnight and we had no idea. It took us about 30 minutes to integrate http://deadmanssnitch.com by following these instructions.
It's not the prettiest or cheapest option, but it gets the job done (it integrates nicely with PagerDuty) and has saved our butts twice in the last few months.
One of our complaints with the service is that the shortest grace interval we can set is 15 minutes, which is too long for us. So we're evaluating similar services like Healthchecks, etc.
My approach is the following:
create a background job that does something
call the job regularly
check that the thing is being done!
so, using a cron script (or something like whenever), every 5 minutes I run:
CheckinJob.perform_later
It's now up to Sidekiq (or delayed_job, or whatever Active Job backend you're using) to actually run the job.
The job just has to do something which you can check.
I used to have the job update a record in my Status table (essentially a list of key/value records). Then I'd have a /status page which returns a 500 status code if the record hasn't been updated in the last 6 minutes.
(Obviously your timing may vary.)
Then I use a monitoring service to monitor the status page! (something like StatusCake)
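For illustration, a minimal sketch of that setup in a Rails app (CheckinJob, the Status model, and the /status route are names from the description above, not from any library):

# app/jobs/checkin_job.rb
class CheckinJob < ApplicationJob
  def perform
    # Touch the heartbeat record; Status is a simple key/value table.
    status = Status.find_or_create_by(key: 'worker_heartbeat')
    status.update!(value: Time.current.iso8601)
  end
end

# app/controllers/status_controller.rb
class StatusController < ApplicationController
  def show
    value = Status.find_by(key: 'worker_heartbeat')&.value
    healthy = value.present? && Time.zone.parse(value) > 6.minutes.ago
    head healthy ? :ok : :internal_server_error
  end
end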
Nowadays I have a simpler approach: I just get the background job to check in with a cron monitoring service like
IsItWorking
Dead Man's Snitch
Healthchecks
The monitoring service expects your task to check in every X minutes; if your task doesn't check in, the monitoring service will let you know.
Integration is dead simple for all the services. For Is It Working it would be:
IsItWorkingInfo::Checkin.ping(key:"CHECKIN_IDENTIFIER")
Full disclosure: I wrote IsItWorking!
I use the god gem to monitor my Sidekiq processes. God makes sure that your process is always running, and it can also notify you about the process status on various channels.
ROOT = File.dirname(File.dirname(__FILE__))

God.pid_file_directory = File.join(ROOT, "tmp/pids")

God.watch do |w|
  w.env = {'RAILS_ENV' => ENV['RAILS_ENV'] || 'development'}
  w.name = 'sidekiq'
  w.start = "bundle exec sidekiq -d -L log/sidekiq.log -C config/sidekiq.yml -e #{ENV['RAILS_ENV']}"
  w.log = "#{ROOT}/log/sidekiq_god.log"
  w.behavior(:clean_pid_file)
  w.dir = ROOT
  w.keepalive

  w.restart_if do |restart|
    restart.condition(:memory_usage) do |c|
      c.interval = 120.seconds
      c.above = 100.megabytes
      c.times = [3, 5] # 3 out of 5 intervals
    end

    restart.condition(:cpu_usage) do |c|
      c.interval = 120.seconds
      c.above = 80.percent
      c.times = 5
    end
  end

  w.lifecycle do |on|
    on.condition(:flapping) do |c|
      c.to_state = [:start, :restart]
      c.times = 5
      c.within = 5.minutes
      c.transition = :unmonitored
      c.retry_in = 10.minutes
      c.retry_times = 5
      c.retry_within = 1.hour
    end
  end
end
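God can also deliver the notifications mentioned above. A hedged sketch of the email contact setup (addresses and SMTP settings are placeholders; check the god docs for your version):

God::Contacts::Email.defaults do |d|
  d.from_email = 'god@example.com'
  d.from_name = 'god'
  d.delivery_method = :smtp
  d.server_host = 'smtp.example.com'
  d.server_port = 587
end

God.contact(:email) do |c|
  c.name = 'ops'
  c.to_email = 'ops@example.com'
end

# Then, inside any condition in the watch above:
#   c.notify = 'ops'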
Usually, an Ohai plugin runs periodically to collect some host parameters, and some plugins are added to all nodes in the company. This can be sensitive in terms of resource usage, and I wonder how Ohai handles that. So I have two questions here.
The first one is: what will happen if I accidentally put in an infinite loop? Does Ohai/Ruby have some max heap size or any memory limits?
The second question is about shelling out in Ohai. Is it possible to reduce the timeout? Do you know of more protections, just in case?
For now I only use Ruby's Timeout:
require 'timeout'

begin
  status = Timeout::timeout(600) {
    # all code here
  }
rescue Timeout::Error
  puts 'timeout'
end
The chef-client run won't start/succeed if Ohai hangs; you should notice this in some kind of monitoring.
Regarding the timeout part: Searching the source code reveals this:
def shell_out(cmd, **options)
  # unless specified by the caller timeout after 30 seconds
  options[:timeout] ||= 30
  so = Mixlib::ShellOut.new(cmd, options)
So you should be able to set the timeout as you like (2 seconds in this case):
so = shell_out("/bin/your-command", :timeout => 2)
Regarding the third sub-question
Do you know more protections just in case?
you are getting pretty broad. Try to solve the problems that actually occur; stop over-engineering.
Just for the sake of completeness: Chef does not guard against broken or malicious Ohai plugins. If you put sleep 1 while true in your Ohai plugin, it will happily sit there forever.
It seems I have found a solution to limit Ohai resources on Red Hat Linux in terms of CPU, disk space usage, disk I/O, long-run timeout, and heap-size memory limit, so that a plugin cannot affect other components on the host. In an ideal world you write optimised, correct code, but memory leaks are a global problem and can happen, so I think protections are needed, especially when you have an Ohai plugin loaded on hundreds or thousands of production servers.
CPU -
If I'm right, an Ohai plugin runs at the lowest CPU priority (nice 19?). Please confirm this if you know. If so, an Ohai plugin cannot affect your production app in terms of CPU.
Disk space -
Ohai plugin should write to node attributes
Protection for unexpected long run -
require 'timeout'

begin
  status = Timeout::timeout(600) {
    # Ohai plugin code is here
  }
rescue Timeout::Error
  puts 'timeout'
end
Protection for unexpected long run of shell_out:
so = shell_out("/bin/your-command", :timeout => 30)
Memory (RAM) heap size limit -
require "thread"
# This thread is memory watcher. It works separately and does exit if heap size reached.
# rss is checked including childs but excluding shared memory.
# This could be ok for Ohai plugin. I'm assuming memory is not shared.
# Exit - if heap size limit reached (10 000 KB) or any unexpected scenario happened.
Thread.start {
loop do
sleep 1
current_memory_rss = `ps ax -o pid,rss | grep -E "^[[:space:]]*#{$$}"`.strip.split.map(&:to_i)[1].to_i
if current_memory_rss != nil && current_memory_rss > 0 && $$ != nil && $$.to_i > 0
exit if current_memory_rss > 10_000
else
exit
end
end
}
# Your Ohai code begins here
# For testing, any code can be included to make memory growing constantly as infinite loop:
loop do
puts `ps ax -o pid,rss | grep -E "^[[:space:]]*#{$$}"`.strip.split.map(&:to_i)[1].to_s + ' KB'
end
Please let me know if you have better solutions, but this seems to work.
Disk I/O read-heavy usage -
The timeout should help here, but it is recommended to avoid commands like find and similar ones.
I have the following Ruby scripts:
rubyScript.rb:
require "rScript"
t1 = Thread.new{LongRunningOperation(); puts "doneLong"}
sleep 1
shortOperation()
puts "doneShort"
t1.join
rScript.rb:
def LongRunningOperation()
  puts "In LongRunningOperation method"
  for i in 0..100000
  end
  return 0
end

def shortOperation()
  puts "IN shortOperation method"
  return 0
end
The output of the above script (i.e. ruby rubyScript.rb):
1) With the sleep call:
In LongRunningOperation method
doneLong
IN shortOperation method
doneShort
2) Without the sleep call (i.e. removing it):
In LongRunningOperation method
IN shortOperation method
doneShort
doneLong
Why is there a difference in the output? What does sleep do in the above case? Thanks in advance.
The sleep lets the main thread sleep for 1 second.
Your long-running function runs longer than your short-running function, but it still takes less than one second.
If you remove the sleep, your long-running function starts in a new thread and the main thread continues without any wait. It then starts the short-running function, which finishes almost immediately, while the long-running function is still running.
With the sleep in place, it goes as follows:
Your long-running function starts in a new thread and the main thread continues. The main thread then encounters the sleep command and waits for 1 second. In that time the long-running function in the other thread is still running, and it finishes. The main thread continues after its sleep and starts the short-running function.
sleep 1 makes the current thread sleep (i.e. do nothing) for one second. So LongRunningOperation (which, despite being a long-running operation, still takes less than a second) has enough time to finish before shortOperation even starts.
sleep 1
Makes the main thread wait for 1 second, which allows t1 to finish before shortOperation is executed.
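As an aside, if the goal is to guarantee ordering rather than to pause, joining the thread is more direct than guessing a sleep duration (a small sketch reusing the methods above):

t1 = Thread.new { LongRunningOperation(); puts "doneLong" }
t1.join # wait for the long operation to finish, however long it takes
shortOperation()
puts "doneShort"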