Detect number of IDLE processors ruby - ruby

I work on shared linux machines with between 4 and 24 cores. To make best use of them, I use the following code to detect the number of processors from my ruby scripts:
return `cat /proc/cpuinfo | grep processor | wc -l`.to_i
(perhaps there is a pure-ruby way of doing this?)
But sometimes a colleague is using six or eight of the 24 cores. (as seen via top). How can I get an estimate of the number of currently unused processors that I can use without making anyone upset?
Thanks!

You can use the data in the /proc filesystem to get CPU affinity info for running processes. The following should give you the number of CPUs currently in use (Note: I don't have a Linux or Ruby box handy so this code is untested, but you can get the idea):
def processors_in_use
procs=[]
Dir.glob("/proc/*/stat") {|filename|
next if File.directory?(filename)
this_proc=[]
File.open(filename) {|file| this_proc = file.gets.split.values_at(2,38)}
procs << this_proc[1].to_i if this_proc[0]=="R"
}
procs.uniq.length
end
def num_processors
IO.readlines("/proc/cpuinfo").delete_if{|x| x.index("processor")==nil}.length
end
def num_free_processors
num_processors - processors_in_use
end
def estimate_free_cpus(count, waittime)
results=[]
count.times {
results << num_free_processors
sleep(waittime)
}
sum=0
results.each {|x| sum += x}
(sum.to_f / results.length).round
end
Edit: I verified that the above code works (I was using Ruby 1.9)

inspired by bta's reply, this is what i'm using:
private
def YWSystemTools.numberOfActiveProcessors # internal
processorForProcs = []
processFiles = Dir.glob("/proc/*/stat")
raise IOError, 'Cannot find /proc/*/stat files. Are you sure this is a linux machine?' if processFiles.empty?
processFiles.each do |filename|
next if File.directory?(filename) # because /proc/net/stat is a directory
next if !File.exists?(filename) # may have disappeared in the meantime
this_proc = []
File.open(filename) { |file| this_proc = file.gets.split.values_at(2,38) }
processorForProcs << this_proc[1].to_i if this_proc[0]=="R"
end
processorsInUse = processorForProcs.uniq
return(processorsInUse.length)
end
public
def YWSystemTools.numberOfAvailableProcessors
numberOfAttempts = 5
$log.info("Will determine number of available processors. Wait #{numberOfAttempts.to_s} seconds.")
#we estimate 5 times because of local fluctuations in procesor use. Keep minimum.
estimationsOfNumberOfActiveProcessors = []
numberOfAttempts.times do
estimationsOfNumberOfActiveProcessors << YWSystemTools.numberOfActiveProcessors
sleep(1)
end
numberOfActiveProcessors = estimationsOfNumberOfActiveProcessors.min
numberOfTotalProcessors = number_of_processors()
raise IOError, '!! # active Processors > # processors' if numberOfActiveProcessors > numberOfTotalProcessors
numberOfAvailableProcessors = numberOfTotalProcessors - numberOfActiveProcessors
$log.info("#{numberOfAvailableProcessors} out of #{numberOfTotalProcessors} are available!")
return(numberOfAvailableProcessors)
end

Related

Download files asynchronously

I was trying to make a script that downloads all images or videos from a thread in my favourite imageboard: 2ch.hk
I was successful until I wanted to download these files asynchronously (for example, to improve performance)
Here is the code http://ideone.com/k2l4Hm
file = http.get(source).body
require 'net/http'
multithreading = false
Net::HTTP.start("2ch.hk", :use_ssl => true) do |http|
thread = http.get("/b/res/133467978.html").body
sources = []
thread.scan(/<a class="desktop" target="_blank" href=".+">.+<\/a>/).each do |a|
source = "/b#{/<a class="desktop" target="_blank" href="\.\.(.+)">.+<\/a>/.match(a).to_a[1]}"
sources << source
end
i = 0
start = Time.now
if multithreading
threads = []
sources.each do |source|
threads << Thread.new(i) do |j|
file = http.get(source).body #breaks everything
# type = /.+\.(.+)/.match(source)[1]
# open("#{j}.#{type}","wb") { |new_file|
# new_file.write(file)
# }
end
i += 1
end
threads.each do |thr|
thr.join
end
# until downloade=sources.size
#
# end
else
sources.each do |source|
file = http.get(source).body
type = /.+\.(.+)/.match(source)[1]
open("#{i}.#{type}","wb") { |new_file|
new_file.write(file)
}
i += 1
print "#{(((i).to_f / sources.size) * 100).round(2)}% "
end
puts
end
puts "Done. #{i} files were downloaded. It took #{Time.now - start} seconds"
end
I suppose that this line crashes everything.
file = http.get(source).body
Or maybe that's the problem.
threads.each do |thr|
thr.join
end
Error messages are always different, from Bad File Descriptor and IO errors to "You may have encountered a bug in the Ruby interpreter or extension libraries."
If you want to try and run my code, please substitute a link to thread in 4th line with a new thread (from 2ch.hk/b), because the one in my code may be deleted by the time you run my code
Version of ruby: 2.3.1, OS Xubuntu 16.10
You'll probably have much better performance using a ruby http lib that supports parallel requests:
https://github.com/typhoeus/typhoeus
e.g.
hydra = Typhoeus::Hydra.new
10.times.map{ hydra.queue(Typhoeus::Request.new("www.example.com", followlocation: true)) }
hydra.run
The problem with my code is that I can't make multiple requests on a Net::HTTP instance at the same time.
The solution is to open an HTTP connection for each thread.

Increasing Ruby Resolv Speed

Im trying to build a sub-domain brute forcer for use with my clients - I work in security/pen testing.
Currently, I am able to get Resolv to look up around 70 hosts in 10 seconds, give or take and wanted to know if there was a way to get it to do more. I have seen alternative scripts out there, mainly Python based that can achieve far greater speeds than this. I don't know how to increase the number of requests Resolv makes in parallel, or if i should split the list up. Please note I have put Google's DNS servers in the sample code, but will be using internal ones for live usage.
My rough code for debugging this issue is:
require 'resolv'
def subdomains
puts "Subdomain enumeration beginning at #{Time.now.strftime("%H:%M:%S")}"
subs = []
domains = File.open("domains.txt", "r") #list of domain names line by line.
Resolv.new(:nameserver => ['8.8.8.8', '8.8.4.4'])
File.open("tiny.txt", "r").each_line do |subdomain|
subdomain.chomp!
domains.each do |d|
puts "Checking #{subdomain}.#{d}"
ip = Resolv.new.getaddress "#{subdomain}.#{d}" rescue ""
if ip != nil
subs << subdomain+"."+d << ip
end
end
end
test = subs.each_slice(4).to_a
test.each do |z|
if !z[1].nil? and !z[3].nil?
puts z[0] + "\t" + z[1] + "\t\t" + z[2] + "\t" + z[3]
end
end
puts "Finished at #{Time.now.strftime("%H:%M:%S")}"
end
subdomains
domains.txt is my list of client domain names, for example google.com, bbc.co.uk, apple.com and 'tiny.txt' is a list of potential subdomain names, for example ftp, www, dev, files, upload. Resolv will then lookup files.bbc.co.uk for example and let me know if it exists.
One thing is you are creating a new Resolv instance with the Google nameservers, but never using it; you create a brand new Resolv instance to do the getaddress call, so that instance is probably using some default nameservers and not the Google ones. You could change the code to something like this:
resolv = Resolv.new(:nameserver => ['8.8.8.8', '8.8.4.4'])
# ...
ip = resolv.getaddress "#{subdomain}.#{d}" rescue ""
In addition, I suggest using the File.readlines method to simplify your code:
domains = File.readlines("domains.txt").map(&:chomp)
subdomains = File.readlines("tiny.txt").map(&:chomp)
Also, you're rescuing the bad ip and setting it to the empty string, but then in the next line you test for not nil, so all results should pass, and I don't think that's what you want.
I've refactored your code, but not tested it. Here is what I came up with, and may be clearer:
def subdomains
puts "Subdomain enumeration beginning at #{Time.now.strftime("%H:%M:%S")}"
domains = File.readlines("domains.txt").map(&:chomp)
subdomains = File.readlines("tiny.txt").map(&:chomp)
resolv = Resolv.new(:nameserver => ['8.8.8.8', '8.8.4.4'])
valid_subdomains = subdomains.each_with_object([]) do |subdomain, valid_subdomains|
domains.each do |domain|
combined_name = "#{subdomain}.#{domain}"
puts "Checking #{combined_name}"
ip = resolv.getaddress(combined_name) rescue nil
valid_subdomains << "#{combined_name}#{ip}" if ip
end
end
valid_subdomains.each_slice(4).each do |z|
if z[1] && z[3]
puts "#{z[0]}\t#{z[1]}\t\t#{z[2]}\t#{z[3]}"
end
end
puts "Finished at #{Time.now.strftime("%H:%M:%S")}"
end
Also, you might want to check out the dnsruby gem (https://github.com/alexdalitz/dnsruby). It might do what you want to do better than Resolv.
[Note: I've rewritten the code so that it fetches the IP addresses in chunks. Please see https://gist.github.com/keithrbennett/3cf0be2a1100a46314f662aea9b368ed. You can modify the RESOLVE_CHUNK_SIZE constant to balance performance with resource load.]
I've rewritten this code using the dnsruby gem (written mainly by Alex Dalitz in the UK, and contributed to by myself and others). This version uses asynchronous message processing so that all requests are being processed pretty much simultaneously. I've posted a gist at https://gist.github.com/keithrbennett/3cf0be2a1100a46314f662aea9b368ed but will also post the code here.
Note that since you are new to Ruby, there are lots of things in the code that might be instructive to you, such as method organization, use of Enumerable methods (e.g. the amazing 'partition' method), the Struct class, rescuing a specific Exception class, %w, and Benchmark.
NOTE: LOOKS LIKE STACK OVERFLOW ENFORCES A MAXIMUM MESSAGE SIZE, SO THIS CODE IS TRUNCATED. GO TO THE GIST IN THE LINK ABOVE FOR THE COMPLETE CODE.
#!/usr/bin/env ruby
# Takes a list of subdomain prefixes (e.g. %w(ftp xyz)) and a list of domains (e.g. %w(nytimes.com afp.com)),
# creates the subdomains combining them, fetches their IP addresses (or nil if not found).
require 'dnsruby'
require 'awesome_print'
RESOLVER = Dnsruby::Resolver.new(:nameserver => %w(8.8.8.8 8.8.4.4))
# Experiment with this to get fast throughput but not overload the dnsruby async mechanism:
RESOLVE_CHUNK_SIZE = 50
IpEntry = Struct.new(:name, :ip) do
def to_s
"#{name}: #{ip ? ip : '(nil)'}"
end
end
def assemble_subdomains(subdomain_prefixes, domains)
domains.each_with_object([]) do |domain, subdomains|
subdomain_prefixes.each do |prefix|
subdomains << "#{prefix}.#{domain}"
end
end
end
def create_query_message(name)
Dnsruby::Message.new(name, 'A')
end
def parse_response_for_address(response)
begin
a_answer = response.answer.detect { |a| a.type == 'A' }
a_answer ? a_answer.rdata.to_s : nil
rescue Dnsruby::NXDomain
return nil
end
end
def get_ip_entries(names)
queue = Queue.new
names.each do |name|
query_message = create_query_message(name)
RESOLVER.send_async(query_message, queue, name)
end
# Note: although map is used here, the record in the output array will not necessarily correspond
# to the record in the input array, since the order of the messages returned is not guaranteed.
# This is indicated by the lack of block variable specified (normally w/map you would use the element).
# That should not matter to us though.
names.map do
_id, result, error = queue.pop
name = _id
case error
when Dnsruby::NXDomain
IpEntry.new(name, nil)
when NilClass
ip = parse_response_for_address(result)
IpEntry.new(name, ip)
else
raise error
end
end
end
def main
# domains = File.readlines("domains.txt").map(&:chomp)
domains = %w(nytimes.com afp.com cnn.com bbc.com)
# subdomain_prefixes = File.readlines("subdomain_prefixes.txt").map(&:chomp)
subdomain_prefixes = %w(www xyz)
subdomains = assemble_subdomains(subdomain_prefixes, domains)
start_time = Time.now
ip_entries = subdomains.each_slice(RESOLVE_CHUNK_SIZE).each_with_object([]) do |ip_entries_chunk, results|
results.concat get_ip_entries(ip_entries_chunk)
end
duration = Time.now - start_time
found, not_found = ip_entries.partition { |entry| entry.ip }
puts "\nFound:\n\n"; puts found.map(&:to_s); puts "\n\n"
puts "Not Found:\n\n"; puts not_found.map(&:to_s); puts "\n\n"
stats = {
duration: duration,
domain_count: ip_entries.size,
found_count: found.size,
not_found_count: not_found.size,
}
ap stats
end
main

Ruby GC::Profiler no output

I'm running a ruby script and trying to see the GC stats on it, but the output is just empty string. Here are the contents of my script:
class NumberPool
...
attr_accessor :sets
def initialize
#sets = []
end
def allocate
allocated_number = Random.rand(min_bound..max_bound)
sets.each do |set|
next unless set.range.include?(allocated_number)
return set.range.delete(allocated_number)
end
factor = allocated_number / batch_size
min = factor * batch_size
max = min + batch_size
sub = SubPool.new(min, max)
sub.range.delete(allocated_number)
sets.push(sub)
allocated_number
end
...
def run_test
GC::Profiler.enable
a = NumberPool.new
p a.allocate
GC::Profiler.report
end
puts run_test
When I run this, the output is:
$ ruby number_pool.rb
1855532
I expected to see something from the GC report in standard out.
This is a guess, but maybe GC hasn't triggered (no need to collect garbage yet because plenty of free memory).
See what happens if you force GC by adding GC.start (modify code like so):
p a.allocate
GC.start
GC::Profiler.report

Nasty race conditions with Celluloid

I have a script that generates a user-specified number of IP addresses and tries to connect to them all on some port. I'm using Celluloid with this script to allow for reasonable speeds, since scanning 2000 hosts synchronously could take a long time. However, say I tell the script to scan 2000 random hosts. What I find is that it actually only ends up scanning about half that number. If I tell it to scan 3000, I get the same basic results. It seems to work much better if I do 1000 or less, but even if I just scan 1000 hosts it usually only ends up doing about 920 with relative consistency. I realize that generating random IP addresses will obviously fail with some of them, but I find it hard to believe that there are around 70 improperly generated IP addresses, every single time. So here's the code:
class Scan
include Celluloid
def initialize(arg1)
#arg1 = arg1
#host_arr = []
#timeout = 1
end
def popen(host)
addr = Socket.getaddrinfo(host, nil)
sock = Socket.new(Socket.const_get(addr[0][0]), Socket::SOCK_STREAM, 0)
begin
sock.connect_nonblock(Socket.pack_sockaddr_in(22, addr[0][3]))
rescue Errno::EINPROGRESS
resp = IO.select(nil, [sock], nil, #timeout.to_i)
if resp.nil?
puts "#{host}:Firewalled"
end
begin
if sock.connect_nonblock(Socket.pack_sockaddr_in(22, addr[0][3]))
puts "#{host}:Connected"
end
rescue Errno::ECONNREFUSED
puts "#{host}:Refused"
rescue
false
end
end
sock
end
def asynchronous
s = 1
threads = []
while s <= #arg1.to_i do
#host_arr << Array.new(4){rand(254)}.join('.')
s += 1
end
#host_arr.each do |ip|
threads << Thread.new do
begin
popen(ip)
rescue
end
end
end
threads.each do |thread|
thread.join
end
end
end
scan = Scan.pool(size: 100, args: [ARGV[0]])
(0..20).to_a.map { scan.future.asynchronous }
Around half the time I get this:
D, [2014-09-30T17:06:12.810856 #30077] DEBUG -- : Terminating 11 actors...
W, [2014-09-30T17:06:12.812151 #30077] WARN -- : Terminating task: type=:finalizer, meta={:method_name=>:shutdown}, status=:receiving
Celluloid::TaskFiber backtrace unavailable. Please try Celluloid.task_class = Celluloid::TaskThread if you need backtraces here.
and the script does nothing at all. The rest of the time (only if I specify more then 1000) I get this: http://pastebin.com/wTmtPmc8
So, my question is this. How do I avoid race conditions and deadlocking, while still achieving what I want in this particular script?
Starting low-level Threads by yourself interferes with Celluloid's functionality. Instead create a Pool of Scan objects and feed them the IP's all at once. They will queue up for the available
class Scan
def popen
…
end
end
scanner_pool = Scan.pool(50)
resulsts = #host_arr.map { |host| scanner_pool.scan(host) }

Ruby Parallel each loop

I have a the following code:
FTP ... do |ftp|
files.each do |file|
...
ftp.put(file)
sleep 1
end
end
I'd like to run the each file in a separate thread or some parallel way. What's the correct way to do this? Would this be right?
Here's my try on the parallel gem
FTP ... do |ftp|
Parallel.map(files) do |file|
...
ftp.put(file)
sleep 1
end
end
The issue with parallel is puts/outputs can occur at the same time like so:
as = [1,2,3,4,5,6,7,8]
results = Parallel.map(as) do |a|
puts a
end
How can I force puts to occur like they normally would line separated.
The whole point of parallelization is to run at the same time. But if there's some part of the process that you'd like to run some of the code sequentially you could use a mutex like:
semaphore = Mutex.new
as = [1,2,3,4,5,6,7,8]
results = Parallel.map(as, in_threads: 3) do |a|
# Parallel stuff
sleep rand
semaphore.synchronize {
# Sequential stuff
puts a
}
# Parallel stuff
sleep rand
end
You'll see that it prints stuff correctly but not necesarily in the same order. I used in_threads instead of in_processes (default) because Mutex doesn't work with processes. See below for an alternative if you do need processes.
References:
http://ruby-doc.org/core-2.2.0/Mutex.html
http://dev.housetrip.com/2014/01/28/efficient-cross-processing-locking-in-ruby/
In the interest of keeping it simple, here's what I'd do with built-in Thread:
results = files.map do |file|
result = Thread.new do
ftp.put(file)
end
end
Note that this code assumes that ftp.put(file) returns safely. If that isn't guaranteed, you'll have to do that yourself by wrapping calls in a timeout block and have each thread return an exception if one is thrown and then at the very end of the loop have a blocking check to see that results does not contain any exceptions.

Resources