Nasty race conditions with Celluloid - ruby

I have a script that generates a user-specified number of IP addresses and tries to connect to each of them on a given port. I'm using Celluloid with this script to allow for reasonable speeds, since scanning 2000 hosts synchronously could take a long time. However, say I tell the script to scan 2000 random hosts. What I find is that it actually only ends up scanning about half that number. If I tell it to scan 3000, I get the same basic results. It seems to work much better if I scan 1000 or fewer, but even if I just scan 1000 hosts it usually only ends up doing about 920, with relative consistency. I realize that some randomly generated IP addresses will be invalid, but I find it hard to believe that there are around 70 improperly generated addresses every single time. So here's the code:
class Scan
  include Celluloid

  def initialize(arg1)
    @arg1 = arg1
    @host_arr = []
    @timeout = 1
  end

  def popen(host)
    addr = Socket.getaddrinfo(host, nil)
    sock = Socket.new(Socket.const_get(addr[0][0]), Socket::SOCK_STREAM, 0)
    begin
      sock.connect_nonblock(Socket.pack_sockaddr_in(22, addr[0][3]))
    rescue Errno::EINPROGRESS
      resp = IO.select(nil, [sock], nil, @timeout.to_i)
      if resp.nil?
        puts "#{host}:Firewalled"
      end
      begin
        if sock.connect_nonblock(Socket.pack_sockaddr_in(22, addr[0][3]))
          puts "#{host}:Connected"
        end
      rescue Errno::ECONNREFUSED
        puts "#{host}:Refused"
      rescue
        false
      end
    end
    sock
  end

  def asynchronous
    s = 1
    threads = []
    while s <= @arg1.to_i do
      @host_arr << Array.new(4){rand(254)}.join('.')
      s += 1
    end
    @host_arr.each do |ip|
      threads << Thread.new do
        begin
          popen(ip)
        rescue
        end
      end
    end
    threads.each do |thread|
      thread.join
    end
  end
end

scan = Scan.pool(size: 100, args: [ARGV[0]])
(0..20).to_a.map { scan.future.asynchronous }
Around half the time I get this:
D, [2014-09-30T17:06:12.810856 #30077] DEBUG -- : Terminating 11 actors...
W, [2014-09-30T17:06:12.812151 #30077] WARN -- : Terminating task: type=:finalizer, meta={:method_name=>:shutdown}, status=:receiving
Celluloid::TaskFiber backtrace unavailable. Please try Celluloid.task_class = Celluloid::TaskThread if you need backtraces here.
and the script does nothing at all. The rest of the time (only if I specify more than 1000) I get this: http://pastebin.com/wTmtPmc8
So my question is this: how do I avoid race conditions and deadlocks while still achieving what I want in this particular script?

Starting low-level threads yourself interferes with Celluloid's functionality. Instead, create a pool of Scan objects and feed them the IPs all at once; they will queue up for the available workers:
class Scan
  def popen(host)
    …
  end
end

scanner_pool = Scan.pool(size: 50)
futures = @host_arr.map { |host| scanner_pool.future.popen(host) }
results = futures.map(&:value)
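For a fuller picture, here is a minimal end-to-end sketch of this approach. It is untested and makes a few assumptions: Celluloid 0.17+ (`require 'celluloid/current'`; older versions use `require 'celluloid'`), and Ruby 2.0+ for `Socket.tcp` with the `connect_timeout:` keyword, which replaces the manual `connect_nonblock`/`IO.select` dance:

require 'celluloid/current' # 'celluloid' on versions before 0.17
require 'socket'

class Scan
  include Celluloid

  # Try a TCP connection to port 22 and report the outcome as a string.
  def popen(host)
    Socket.tcp(host, 22, connect_timeout: 1) {} # socket is closed for us
    "#{host}: Connected"
  rescue Errno::ECONNREFUSED
    "#{host}: Refused"
  rescue Errno::ETIMEDOUT, Errno::EHOSTUNREACH, Errno::ENETUNREACH, SocketError
    "#{host}: Firewalled"
  end
end

# 1 + rand(254) keeps every octet in 1..254, avoiding invalid 0 octets.
hosts = Array.new(ARGV[0].to_i) { Array.new(4) { 1 + rand(254) }.join('.') }
pool  = Scan.pool(size: 50)

# Hand every host to the pool at once as futures, then collect the results.
pool_futures = hosts.map { |h| pool.future.popen(h) }
pool_futures.each { |f| puts f.value }

Because the results come back through futures rather than `puts` calls inside the workers, the output also avoids the interleaved-printing problem described in the next question.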

Related

Increasing Ruby Resolv Speed

I'm trying to build a subdomain brute-forcer for use with my clients; I work in security/pen testing.
Currently, I am able to get Resolv to look up around 70 hosts in 10 seconds, give or take, and wanted to know if there is a way to make it do more. I have seen alternative scripts out there, mainly Python-based, that can achieve far greater speeds than this. I don't know how to increase the number of requests Resolv makes in parallel, or whether I should split the list up. Please note I have put Google's DNS servers in the sample code, but will be using internal ones for live usage.
My rough code for debugging this issue is:
require 'resolv'

def subdomains
  puts "Subdomain enumeration beginning at #{Time.now.strftime("%H:%M:%S")}"
  subs = []
  domains = File.open("domains.txt", "r") # list of domain names, line by line
  Resolv.new(:nameserver => ['8.8.8.8', '8.8.4.4'])
  File.open("tiny.txt", "r").each_line do |subdomain|
    subdomain.chomp!
    domains.each do |d|
      puts "Checking #{subdomain}.#{d}"
      ip = Resolv.new.getaddress "#{subdomain}.#{d}" rescue ""
      if ip != nil
        subs << subdomain+"."+d << ip
      end
    end
  end
  test = subs.each_slice(4).to_a
  test.each do |z|
    if !z[1].nil? and !z[3].nil?
      puts z[0] + "\t" + z[1] + "\t\t" + z[2] + "\t" + z[3]
    end
  end
  puts "Finished at #{Time.now.strftime("%H:%M:%S")}"
end

subdomains
domains.txt is my list of client domain names, for example google.com, bbc.co.uk, apple.com, and tiny.txt is a list of potential subdomain names, for example ftp, www, dev, files, upload. Resolv will then look up files.bbc.co.uk, for example, and let me know if it exists.
One issue is that you create a Resolv instance with the Google nameservers but never use it; you then create a brand-new Resolv instance for the getaddress call, so that instance is probably using default nameservers rather than the Google ones. You could change the code to something like this:
resolv = Resolv.new(:nameserver => ['8.8.8.8', '8.8.4.4'])
# ...
ip = resolv.getaddress "#{subdomain}.#{d}" rescue ""
In addition, I suggest using the File.readlines method to simplify your code:
domains = File.readlines("domains.txt").map(&:chomp)
subdomains = File.readlines("tiny.txt").map(&:chomp)
Also, you're rescuing a failed lookup and setting ip to the empty string, but on the next line you test for not-nil, so every result passes the check; I don't think that's what you want.
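A quick illustration of that pitfall (the hostname here is hypothetical):

ip = Resolv.new.getaddress("no.such.host.example") rescue ""
ip.nil?   # => false -- the rescue produced "", not nil, so a nil check always passes
ip.empty? # => true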
I've refactored your code, but have not tested it. Here is what I came up with; it may be clearer:
def subdomains
  puts "Subdomain enumeration beginning at #{Time.now.strftime("%H:%M:%S")}"
  domains = File.readlines("domains.txt").map(&:chomp)
  subdomains = File.readlines("tiny.txt").map(&:chomp)
  resolv = Resolv.new(:nameserver => ['8.8.8.8', '8.8.4.4'])

  valid_subdomains = subdomains.each_with_object([]) do |subdomain, valid_subdomains|
    domains.each do |domain|
      combined_name = "#{subdomain}.#{domain}"
      puts "Checking #{combined_name}"
      ip = resolv.getaddress(combined_name) rescue nil
      # push name and IP as separate elements so each_slice(4) below still
      # yields [name, ip, name, ip] groups, as in the original code
      valid_subdomains << combined_name << ip if ip
    end
  end

  valid_subdomains.each_slice(4) do |z|
    if z[1] && z[3]
      puts "#{z[0]}\t#{z[1]}\t\t#{z[2]}\t#{z[3]}"
    end
  end
  puts "Finished at #{Time.now.strftime("%H:%M:%S")}"
end
Also, you might want to check out the dnsruby gem (https://github.com/alexdalitz/dnsruby). It might do what you want to do better than Resolv.
[Note: I've rewritten the code so that it fetches the IP addresses in chunks. Please see https://gist.github.com/keithrbennett/3cf0be2a1100a46314f662aea9b368ed. You can modify the RESOLVE_CHUNK_SIZE constant to balance performance with resource load.]
I've rewritten this code using the dnsruby gem (written mainly by Alex Dalitz in the UK, and contributed to by myself and others). This version uses asynchronous message processing so that all requests are being processed pretty much simultaneously. I've posted a gist at https://gist.github.com/keithrbennett/3cf0be2a1100a46314f662aea9b368ed but will also post the code here.
Note that since you are new to Ruby, there are lots of things in the code that might be instructive to you, such as method organization, use of Enumerable methods (e.g. the amazing 'partition' method), the Struct class, rescuing a specific Exception class, %w, and Benchmark.
NOTE: LOOKS LIKE STACK OVERFLOW ENFORCES A MAXIMUM MESSAGE SIZE, SO THIS CODE IS TRUNCATED. GO TO THE GIST IN THE LINK ABOVE FOR THE COMPLETE CODE.
#!/usr/bin/env ruby

# Takes a list of subdomain prefixes (e.g. %w(ftp xyz)) and a list of domains (e.g. %w(nytimes.com afp.com)),
# creates the subdomains combining them, fetches their IP addresses (or nil if not found).

require 'dnsruby'
require 'awesome_print'

RESOLVER = Dnsruby::Resolver.new(:nameserver => %w(8.8.8.8 8.8.4.4))

# Experiment with this to get fast throughput but not overload the dnsruby async mechanism:
RESOLVE_CHUNK_SIZE = 50

IpEntry = Struct.new(:name, :ip) do
  def to_s
    "#{name}: #{ip ? ip : '(nil)'}"
  end
end

def assemble_subdomains(subdomain_prefixes, domains)
  domains.each_with_object([]) do |domain, subdomains|
    subdomain_prefixes.each do |prefix|
      subdomains << "#{prefix}.#{domain}"
    end
  end
end

def create_query_message(name)
  Dnsruby::Message.new(name, 'A')
end

def parse_response_for_address(response)
  begin
    a_answer = response.answer.detect { |a| a.type == 'A' }
    a_answer ? a_answer.rdata.to_s : nil
  rescue Dnsruby::NXDomain
    return nil
  end
end

def get_ip_entries(names)
  queue = Queue.new
  names.each do |name|
    query_message = create_query_message(name)
    RESOLVER.send_async(query_message, queue, name)
  end

  # Note: although map is used here, the record in the output array will not necessarily correspond
  # to the record in the input array, since the order of the messages returned is not guaranteed.
  # This is indicated by the lack of block variable specified (normally w/map you would use the element).
  # That should not matter to us though.
  names.map do
    _id, result, error = queue.pop
    name = _id
    case error
    when Dnsruby::NXDomain
      IpEntry.new(name, nil)
    when NilClass
      ip = parse_response_for_address(result)
      IpEntry.new(name, ip)
    else
      raise error
    end
  end
end

def main
  # domains = File.readlines("domains.txt").map(&:chomp)
  domains = %w(nytimes.com afp.com cnn.com bbc.com)

  # subdomain_prefixes = File.readlines("subdomain_prefixes.txt").map(&:chomp)
  subdomain_prefixes = %w(www xyz)

  subdomains = assemble_subdomains(subdomain_prefixes, domains)

  start_time = Time.now
  ip_entries = subdomains.each_slice(RESOLVE_CHUNK_SIZE).each_with_object([]) do |ip_entries_chunk, results|
    results.concat get_ip_entries(ip_entries_chunk)
  end
  duration = Time.now - start_time

  found, not_found = ip_entries.partition { |entry| entry.ip }

  puts "\nFound:\n\n"; puts found.map(&:to_s); puts "\n\n"
  puts "Not Found:\n\n"; puts not_found.map(&:to_s); puts "\n\n"

  stats = {
    duration: duration,
    domain_count: ip_entries.size,
    found_count: found.size,
    not_found_count: not_found.size,
  }
  ap stats
end

main

Celluloid output is out of order and formatted erratically

I have a working script that uses Celluloid for network parallelism. It scans a range of IP addresses and tries to connect to each one, outputting either ip_addr: Filtered, Refused, or Connected. The only problem with the script is the way the results are printed. Instead of being in order, like so:
192.168.0.20: Filtered
192.168.0.21: Connected
It outputs like this:
192.168.0.65 Firewalled!
192.168.0.11 Firewalled!192.168.0.183 Firewalled!192.168.0.28 Firewalled!192.168.0.171 Firewalled!192.168.0.228 Firewalled!
192.168.0.238 Firewalled!192.168.0.85 Firewalled!192.168.0.148 Firewalled!192.168.0.154 Firewalled!192.168.0.76 Firewalled!192.168.0.115 Firewalled!
192.168.0.215 Firewalled!
That's what appears in the terminal; as you can see, it's completely erratic. Here's the relevant code:
def connect
  addr = Socket.getaddrinfo(@host, nil)
  sock = Socket.new(Socket.const_get(addr[0][0]), Socket::SOCK_STREAM, 0)
  begin
    sock.connect_nonblock(Socket.pack_sockaddr_in(@port, addr[0][3]))
  rescue Errno::EINPROGRESS
    resp = IO.select(nil, [sock], nil, @timeout.to_i)
    if resp.nil?
      puts "#{@host} Firewalled!"
    end
    begin
      if sock.connect_nonblock(Socket.pack_sockaddr_in(@port, addr[0][3]))
        puts "#{@host} Connected!"
      end
    rescue Errno::ECONNREFUSED
      puts "#{@host} Refused!"
    rescue
      false
    end
  end
  sock
end

range = []
main = Ranger.new(ARGV[0], ARGV[1])
(1..254).each do |oct|
  range << main.strplace(ARGV[0]+oct.to_s)
end
threads = []
range.each do |ip|
  threads << Thread.new do
    scan = Ranger.new(ip, ARGV[1])
    scan.future :connect
  end
end
threads.each do |thread|
  thread.join
end
I think I know what the problem is. You see, puts is not thread-safe. When you call puts, it does two things: a) it prints whatever you want to the screen, and b) it inserts a newline \n at the end. So one thread (thread A) could do a) but then stop, and another thread (thread B) could also do a); then the operating system might switch back to thread A, which will do b), etc., thus producing the output you're seeing.
So the solution would be to replace all instances of puts with "print whatever-you-want \n". For example, this:
puts "#{#host} Firewalled!"
could be converted into:
print "#{#host} Firewalled!\n"
Unlike puts, which writes the string and the newline separately, print with the newline already included in the string issues a single write, so it won't be interleaved mid-message.
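Another option (not from the original answer, just a common pattern): serialize all output through a single mutex, which prevents interleaving regardless of how the underlying IO call is split up. A minimal sketch, where report is a hypothetical helper name:

require 'thread'

OUTPUT_LOCK = Mutex.new

# Every thread prints through this helper, so messages can never interleave.
def report(message)
  OUTPUT_LOCK.synchronize { puts message }
end

# e.g. inside connect:
#   report "#{@host} Firewalled!"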

Put contents of array all at once

I don't understand why this won't do what the title states.
#!/usr/bin/env ruby
require 'socket'
require 'timeout'

class Scanner
  def initialize(host, port)
    @host = host
    @port = port
  end

  def popen
    begin
      array = []
      sock = Socket.new(:INET, :STREAM)
      sockaddr = Socket.sockaddr_in(@port, @host)
      Timeout::timeout(5) do
        array.push("Port #{@port}: Open") if sock.connect(sockaddr)
      end
      puts array
    rescue Timeout::Error
      puts "Port #{@port}: Filtered"
    rescue Errno::ECONNREFUSED
    end
  end
end # end Scanner

def main
  begin
    p = 1
    case ARGV[0]
    when '-p'
      eport = ARGV[1]
      host = ARGV[2]
    else
      eport = 65535
      host = ARGV[0]
    end
    t1 = Time.now
    puts "\n"
    puts "-" * 70
    puts "Scanning #{host}..."
    puts "-" * 70
    while p <= eport.to_i do
      scan = Scanner.new(host, p)
      scan.popen
      p += 1
    end
    t2 = Time.now
    time = t2 - t1
    puts "\nScan completed: #{host} scanned in #{time} seconds."
  rescue Errno::EHOSTUNREACH
    puts "This host appears to be unreachable"
  rescue Interrupt
    puts "Connection terminated."
  end
end

main
What I'm trying to achieve is output similar to nmap, in that it scans everything and then shows all open or closed ports at the end. Instead, what happens is that it prints the ports as it discovers them. I figured pushing the output into an array and then printing the array would achieve this, yet it still prints the ports one at a time. Why is this happening?
Also, I apologize for the formatting, the code tags are a little weird.
Your loop calls popen once per iteration. Your popen method sets array = [] each time it is called, then populates it with one item, then you print it with puts. On the next loop iteration, you reset array to [] and do it all again.
You only asked "why," but you could solve this by creating the array just once in the body of main and passing it to popen (or any number of other ways).
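A minimal, untested sketch of that fix, reusing the requires and the eport/host option parsing from the original script; popen now appends to a caller-supplied array instead of printing:

class Scanner
  def initialize(host, port)
    @host = host
    @port = port
  end

  # Record the result in the caller's array rather than printing immediately.
  def popen(results)
    sock = Socket.new(:INET, :STREAM)
    sockaddr = Socket.sockaddr_in(@port, @host)
    Timeout::timeout(5) do
      results.push("Port #{@port}: Open") if sock.connect(sockaddr)
    end
  rescue Timeout::Error
    results.push("Port #{@port}: Filtered")
  rescue Errno::ECONNREFUSED
  end
end

results = []
(1..eport.to_i).each { |port| Scanner.new(host, port).popen(results) }
puts results # everything at once, nmap-style, after the scan completes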

Ruby 1.9.2 recursive thread lock error

I am working with Ruby 1.9.2-p290; a unit-test script (shown below) throws a ThreadError:
1) Error:
test_orchpr_pass(TC_MyTest):
ThreadError: deadlock; recursive locking
    <internal:prelude>:8:in `lock'
    <internal:prelude>:8:in `synchronize'
    testth.rb:121:in `orchpr_run'
    testth.rb:158:in `test_orchpr_pass'
With Ruby 1.8.7 it gives the error: "Thread tried to join itself."
CODE
def orchpr_run(timeout = 60)
  # used by the update function to signal that a final update was
  # received from all clients
  @update_mutex.lock

  # required since we'll have to act as an observer to the DRb server
  DRb.start_service

  # get configuration objects
  run_config_type = DataLayer.get_run_config_type
  client_daemon = DataLayer.get_client_daemon_by_branch(run_config_type, @branch)
  client_daemon['port_no'] = 9096

  # get the servers for this client_daemon
  servers = DataLayer.get_servers(run_config_type, client_daemon.id)
  servers.each do |server|
    @pr[server.host_name] = OrchestratedPlatformRun.new(run_config_type, server, timeout)
  end
  @pr.each_value { |x| x.add_observer(self) }
  @pr.each_value { |x| x.start(@service_command_pass, true) }

  # wait for update to receive notifications from all servers
  # this is the statement causing the error:
  @update_mutex.synchronize {}
end
Another piece of code throwing the same error:
require "thread"
require "timeout"
def calc_fib(n)
if n == 0
0
elsif n == 1
1
else
calc_fib(n-1) + calc_fib(n-2)
end
end
lock = Mutex.new
threads = 20.times.collect do
Thread.new do
20.times do
begin
Timeout.timeout(0.25) do
lock.synchronize{ calc_fib(1000) }
end
rescue ThreadError => e
puts "#{e.class}: #{e.message}:\n" + e.backtrace.join("\n") + "\n\n"
rescue Timeout::Error => e
#puts e.class
nil
end
end
end
end
threads.each{ |t| t.join }
Commenting out the synchronize block makes the error disappear, but then the threads are not synchronized. I found some discussion online saying this is a bug in Ruby 1.9.2 and that it needs changes in the files prelude.rb and thread.c regarding mutex synchronization.
However, under my Windows installation I am unable to find the file prelude.rb.
If a mutex is locked by a thread, then an error will be raised if you try to lock it again from the same thread.
This is exactly what you are doing, since synchronize is just a convenience method for locking the mutex, yielding to the block, and then releasing the lock. I'm not sure what you're trying to do, but it feels to me like you might be trying to use mutexes for something other than their intended purpose.
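A minimal illustration of the recursive-locking rule:

m = Mutex.new
m.synchronize do
  # The same thread already holds m, so this raises
  # "ThreadError: deadlock; recursive locking" on Ruby 1.9+.
  m.synchronize {}
end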
Using threads and locks well is difficult to get right; you might want to look at Celluloid for a different approach to concurrency.

Odd bug with DataMapper, Mutexes, and Threads?

I have a database full of URLs that I need to test HTTP response time for on a regular basis. I want to have many worker threads combing the database at all times for a URL that hasn't been tested recently and, when they find one, testing it.
Of course, this could cause multiple threads to snag the same URL from the database. I don't want this. So, I'm trying to use Mutexes to prevent this from happening. I realize there are other options at the database level (optimistic locking, pessimistic locking), but I'd at least prefer to figure out why this isn't working.
Take a look at this test code I wrote:
threads = []
mutex = Mutex.new

50.times do |i|
  threads << Thread.new do
    while true do
      url = nil
      mutex.synchronize do
        url = URL.first(:locked_for_testing => false, :times_tested.lt => 150)
        if url
          url.locked_for_testing = true
          url.save
        end
      end
      if url
        # simulate testing the url
        sleep 1
        url.times_tested += 1
        url.save
        mutex.synchronize do
          url.locked_for_testing = false
          url.save
        end
      end
    end
    sleep 1
  end
end

threads.each { |t| t.join }
Of course there is no real URL testing here. But what should happen is at the end of the day, each URL should end up with "times_tested" equal to 150, right?
(I'm basically just trying to make sure the mutexes and worker-thread mentality are working)
But each time I run it, a few odd URLs here and there end up with times_tested equal to a much lower number, say 37, and locked_for_testing frozen on "true".
Now as far as I can tell from my code, if any URL gets locked, it will have to unlock. So I don't understand how some URLs are ending up "frozen" like that.
There are no exceptions and I've tried adding begin/ensure but it didn't do anything.
Any ideas?
I'd use a Queue, with a master thread to feed it the work. If you have a single master, you control what's getting accessed. This isn't perfect, but it's not going to blow up because of concurrency. Remember, if you aren't locking the database, a mutex doesn't really help if something else accesses the db.
code completely untested
require 'thread'

queue = Queue.new
keep_running = true
# trap cntrl_c or something to reset keep_running

master = Thread.new do
  while keep_running
    # check if we need some work to do
    if queue.size == 0
      urls = URL.all(:times_tested.lt => 150)
      urls.each do |u|
        queue << u.id
      end
      # keep from spinning the queue
      sleep(0.1)
    end
  end
end

workers = []
50.times do
  workers << Thread.new do
    while keep_running
      # get an id (Queue#shift blocks while the queue is empty,
      # so workers simply wait for the master to refill it)
      id = queue.shift
      url = URL.get(id)
      # do something with the url
      url.save
      sleep(0.1)
    end
  end
end

workers.each do |w|
  w.join
end
