How to pass data between task in Ruby Rake? - ruby

So how to pass data between task in Ruby Rake?
Believe me I read a lot bout this in the internet and none makes sense.
I found the ENV['some_var'] approach, But I can't share objects without some conversions, unnecessary additional conversions that cost time to me and the processor :(. Additionally: "Come on, is this the best that they made up?"
Somebody said use instance or class variables. It is hacky isn't it? I mean it is semantic fiasco. The modules in Ruby are places to put methods and constants- I read somewhere, which makes sense at some some extend, but class and instance variables in modules? Why classes then?
So how can I share data between two rake tasks without doing some hacking?
For example: How to pass the response object from task get to task ping_server:
require 'net/http'
require 'uri'
namespace :http_request do
desc 'Request server to obtain status, and stores the data in the memcache specified by the environment'
# This task is made generic so it can serve as a low level routine for other tasks.
# Thus avoiding repetitive code.
task :get, [:url] => [:environment] do |t, args|
#configuration
WAIT_RESPONSE_TO_IN_SECONDS = 5
uri = URI.parse(args[:url])
http = Net::HTTP.new(uri.host, uri.port)
# We cannot wait for response forever, therefore provide timeout
http.open_timeout = WAIT_RESPONSE_TO_IN_SECONDS # in seconds
request = Net::HTTP::Get.new(uri.path)
# The response may take too long, or the URI may be bad(invalid)
begin
response = http.request(request)
puts response.code
ENV['req_response'] = {status: "ok", val: response.inspect}.to_s
# Rails.cache.write(args[:name], response.code)
rescue Exception => e
puts "\nRequest filed: #{e}\n"
ENV['req_response'] = {status: e.to_s, val: nil.to_s}.to_s
end
end
end
namespace :server_state do
desc "write cache"
task :ping_server, [:url] => "http_request:get" do
response = eval(ENV['req_response'])
puts "\n\nRESULT = #{response}"
puts "\n\nRESULT = #{response[:val]}"
end
end

Related

Increasing Ruby Resolv Speed

Im trying to build a sub-domain brute forcer for use with my clients - I work in security/pen testing.
Currently, I am able to get Resolv to look up around 70 hosts in 10 seconds, give or take and wanted to know if there was a way to get it to do more. I have seen alternative scripts out there, mainly Python based that can achieve far greater speeds than this. I don't know how to increase the number of requests Resolv makes in parallel, or if i should split the list up. Please note I have put Google's DNS servers in the sample code, but will be using internal ones for live usage.
My rough code for debugging this issue is:
require 'resolv'
def subdomains
puts "Subdomain enumeration beginning at #{Time.now.strftime("%H:%M:%S")}"
subs = []
domains = File.open("domains.txt", "r") #list of domain names line by line.
Resolv.new(:nameserver => ['8.8.8.8', '8.8.4.4'])
File.open("tiny.txt", "r").each_line do |subdomain|
subdomain.chomp!
domains.each do |d|
puts "Checking #{subdomain}.#{d}"
ip = Resolv.new.getaddress "#{subdomain}.#{d}" rescue ""
if ip != nil
subs << subdomain+"."+d << ip
end
end
end
test = subs.each_slice(4).to_a
test.each do |z|
if !z[1].nil? and !z[3].nil?
puts z[0] + "\t" + z[1] + "\t\t" + z[2] + "\t" + z[3]
end
end
puts "Finished at #{Time.now.strftime("%H:%M:%S")}"
end
subdomains
domains.txt is my list of client domain names, for example google.com, bbc.co.uk, apple.com and 'tiny.txt' is a list of potential subdomain names, for example ftp, www, dev, files, upload. Resolv will then lookup files.bbc.co.uk for example and let me know if it exists.
One thing is you are creating a new Resolv instance with the Google nameservers, but never using it; you create a brand new Resolv instance to do the getaddress call, so that instance is probably using some default nameservers and not the Google ones. You could change the code to something like this:
resolv = Resolv.new(:nameserver => ['8.8.8.8', '8.8.4.4'])
# ...
ip = resolv.getaddress "#{subdomain}.#{d}" rescue ""
In addition, I suggest using the File.readlines method to simplify your code:
domains = File.readlines("domains.txt").map(&:chomp)
subdomains = File.readlines("tiny.txt").map(&:chomp)
Also, you're rescuing the bad ip and setting it to the empty string, but then in the next line you test for not nil, so all results should pass, and I don't think that's what you want.
I've refactored your code, but not tested it. Here is what I came up with, and may be clearer:
def subdomains
puts "Subdomain enumeration beginning at #{Time.now.strftime("%H:%M:%S")}"
domains = File.readlines("domains.txt").map(&:chomp)
subdomains = File.readlines("tiny.txt").map(&:chomp)
resolv = Resolv.new(:nameserver => ['8.8.8.8', '8.8.4.4'])
valid_subdomains = subdomains.each_with_object([]) do |subdomain, valid_subdomains|
domains.each do |domain|
combined_name = "#{subdomain}.#{domain}"
puts "Checking #{combined_name}"
ip = resolv.getaddress(combined_name) rescue nil
valid_subdomains << "#{combined_name}#{ip}" if ip
end
end
valid_subdomains.each_slice(4).each do |z|
if z[1] && z[3]
puts "#{z[0]}\t#{z[1]}\t\t#{z[2]}\t#{z[3]}"
end
end
puts "Finished at #{Time.now.strftime("%H:%M:%S")}"
end
Also, you might want to check out the dnsruby gem (https://github.com/alexdalitz/dnsruby). It might do what you want to do better than Resolv.
[Note: I've rewritten the code so that it fetches the IP addresses in chunks. Please see https://gist.github.com/keithrbennett/3cf0be2a1100a46314f662aea9b368ed. You can modify the RESOLVE_CHUNK_SIZE constant to balance performance with resource load.]
I've rewritten this code using the dnsruby gem (written mainly by Alex Dalitz in the UK, and contributed to by myself and others). This version uses asynchronous message processing so that all requests are being processed pretty much simultaneously. I've posted a gist at https://gist.github.com/keithrbennett/3cf0be2a1100a46314f662aea9b368ed but will also post the code here.
Note that since you are new to Ruby, there are lots of things in the code that might be instructive to you, such as method organization, use of Enumerable methods (e.g. the amazing 'partition' method), the Struct class, rescuing a specific Exception class, %w, and Benchmark.
NOTE: LOOKS LIKE STACK OVERFLOW ENFORCES A MAXIMUM MESSAGE SIZE, SO THIS CODE IS TRUNCATED. GO TO THE GIST IN THE LINK ABOVE FOR THE COMPLETE CODE.
#!/usr/bin/env ruby
# Takes a list of subdomain prefixes (e.g. %w(ftp xyz)) and a list of domains (e.g. %w(nytimes.com afp.com)),
# creates the subdomains combining them, fetches their IP addresses (or nil if not found).
require 'dnsruby'
require 'awesome_print'
RESOLVER = Dnsruby::Resolver.new(:nameserver => %w(8.8.8.8 8.8.4.4))
# Experiment with this to get fast throughput but not overload the dnsruby async mechanism:
RESOLVE_CHUNK_SIZE = 50
IpEntry = Struct.new(:name, :ip) do
def to_s
"#{name}: #{ip ? ip : '(nil)'}"
end
end
def assemble_subdomains(subdomain_prefixes, domains)
domains.each_with_object([]) do |domain, subdomains|
subdomain_prefixes.each do |prefix|
subdomains << "#{prefix}.#{domain}"
end
end
end
def create_query_message(name)
Dnsruby::Message.new(name, 'A')
end
def parse_response_for_address(response)
begin
a_answer = response.answer.detect { |a| a.type == 'A' }
a_answer ? a_answer.rdata.to_s : nil
rescue Dnsruby::NXDomain
return nil
end
end
def get_ip_entries(names)
queue = Queue.new
names.each do |name|
query_message = create_query_message(name)
RESOLVER.send_async(query_message, queue, name)
end
# Note: although map is used here, the record in the output array will not necessarily correspond
# to the record in the input array, since the order of the messages returned is not guaranteed.
# This is indicated by the lack of block variable specified (normally w/map you would use the element).
# That should not matter to us though.
names.map do
_id, result, error = queue.pop
name = _id
case error
when Dnsruby::NXDomain
IpEntry.new(name, nil)
when NilClass
ip = parse_response_for_address(result)
IpEntry.new(name, ip)
else
raise error
end
end
end
def main
# domains = File.readlines("domains.txt").map(&:chomp)
domains = %w(nytimes.com afp.com cnn.com bbc.com)
# subdomain_prefixes = File.readlines("subdomain_prefixes.txt").map(&:chomp)
subdomain_prefixes = %w(www xyz)
subdomains = assemble_subdomains(subdomain_prefixes, domains)
start_time = Time.now
ip_entries = subdomains.each_slice(RESOLVE_CHUNK_SIZE).each_with_object([]) do |ip_entries_chunk, results|
results.concat get_ip_entries(ip_entries_chunk)
end
duration = Time.now - start_time
found, not_found = ip_entries.partition { |entry| entry.ip }
puts "\nFound:\n\n"; puts found.map(&:to_s); puts "\n\n"
puts "Not Found:\n\n"; puts not_found.map(&:to_s); puts "\n\n"
stats = {
duration: duration,
domain_count: ip_entries.size,
found_count: found.size,
not_found_count: not_found.size,
}
ap stats
end
main

Creating a Ruby API

I have been tasked with creating a Ruby API that retrieves youtube URL's. However, I am not sure of the proper way to create an 'API'... I did the following code below as a Sinatra server that serves up JSON, but what exactly would be the definition of an API and would this qualify as one? If this is not an API, how can I make in an API? Thanks in advance.
require 'open-uri'
require 'json'
require 'sinatra'
# get user input
puts "Please enter a search (seperate words by commas):"
search_input = gets.chomp
puts
puts "Performing search on YOUTUBE ... go to '/videos' API endpoint to see the results and use the output"
puts
# define query parameters
api_key = 'my_key_here'
search_url = 'https://www.googleapis.com/youtube/v3/search'
params = {
part: 'snippet',
q: search_input,
type: 'video',
videoCaption: 'closedCaption',
key: api_key
}
# use search_url and query parameters to construct a url, then open and parse the result
uri = URI.parse(search_url)
uri.query = URI.encode_www_form(params)
result = JSON.parse(open(uri).read)
# class to define attributes of each video and format into eventual json
class Video
attr_accessor :title, :description, :url
def initialize
#title = nil
#description = nil
#url = nil
end
def to_hash
{
'title' => #title,
'description' => #description,
'url' => #url
}
end
def to_json
self.to_hash.to_json
end
end
# create an array with top 3 search results
results_array = []
result["items"].take(3).each do |video|
#video = Video.new
#video.title = video["snippet"]["title"]
#video.description = video["snippet"]["description"]
#video.url = video["snippet"]["thumbnails"]["default"]["url"]
results_array << #video.to_json.gsub!(/\"/, '\'')
end
# define the API endpoint
get '/videos' do
results_array.to_json
end
An "API = Application Program Interface" is, simply, something that another program can reliably use to get a job done, without having to busy its little head about exactly how the job is done.
Perhaps the simplest thing to do now, if possible, is to go back to the person who "tasked" you with this task, and to ask him/her, "well, what do you have in mind?" The best API that you can design, in this case, will be the one that is most convenient for the people (who are writing the programs which ...) will actually have to use it. "Don't guess. Ask!"
A very common strategy for an API, in a language like Ruby, is to define a class which represents "this application's connection to this service." Anyone who wants to use the API does so by calling some function which will return a new instance of this class. Thereafter, the program uses this object to issue and handle requests.
The requests, also, are objects. To issue a request, you first ask the API-connection object to give you a new request-object. You then fill-out the request with whatever particulars, then tell the request object to "go!" At some point in the future, and by some appropriate means (such as a callback ...) the request-object informs you that it succeeded or that it failed.
"A whole lot of voodoo-magic might have taken place," between the request object and the connection object which spawned it, but the client does not have to care. And that, most of all, is the objective of any API. "It Just Works.™"
I think they want you to create a third-party library. Imagine you are schizophrenic for a while.
Joe wants to build a Sinatra application to list some YouTube videos, but he is lazy and he does not want to do the dirty work, he just wants to drop something in, give it some credentials, ask for urls and use them, finito.
Joe asks Bob to implement it for him and he gives him his requirements: "Bob, I need YouTube library. I need it to do:"
# Please note that I don't know how YouTube API works, just guessing.
client = YouTube.new(api_key: 'hola')
video_urls = client.videos # => ['https://...', 'https://...', ...]
And Bob says "OK." end spends a day in his interactive console.
So first, you should figure out how you are going to use your not-yet-existing lib, if you can – sometimes you just don't know yet.
Next, build that library based on the requirements, then drop it in your Sinatra app and you're done. Does that help?

Suggested Redis driver for use within Goliath?

There seem to be several options for establishing Redis connections for use within EventMachine, and I'm having a hard time understanding the core differences between them.
My goal is to implement Redis within Goliath
The way I establish my connection now is through em-synchrony:
require 'em-synchrony'
require 'em-synchrony/em-redis'
config['redis'] = EventMachine::Synchrony::ConnectionPool.new(:size => 20) do
EventMachine::Protocols::Redis.connect(:host => 'localhost', :port => 6379)
end
What is the difference between the above, and using something like em-hiredis?
If I'm using Redis for sets and basic key:value storage, is em-redis the best solution for my scenario?
We use em-hiredis very successfully inside Goliath. Here's a sample of how we coded publishing:
config/example_api.rb
# These give us direct access to the redis connection from within the API
config['redisUri'] = 'redis://localhost:6379/0'
config['redisPub'] ||= EM::Hiredis.connect('')
example_api.rb
class ExampleApi < Goliath::API
use Goliath::Rack::Params # parse & merge query and body parameters
use Goliath::Rack::Formatters::JSON # JSON output formatter
use Goliath::Rack::Render # auto-negotiate response format
def response(env)
env.logger.debug "\n\n\nENV: #{env['PATH_INFO']}"
env.logger.debug "REQUEST: Received"
env.logger.debug "POST Action received: #{env.params} "
#processing of requests from browser goes here
resp =
case env.params["action"]
when 'SOME_ACTION' then process_action(env)
when 'ANOTHER_ACTION' then process_another_action(env)
else
# skip
end
env.logger.debug "REQUEST: About to respond with: #{resp}"
[200, {'Content-Type' => 'application/json', 'Access-Control-Allow-Origin' => "*"}, resp]
end
# process an action
def process_action(env)
# extract message data
data = Hash.new
data["user_id"], data["object_id"] = env.params['user_id'], env.params['object_id']
publishData = { "action" => 'SOME_ACTION_RECEIVED',
"data" => data }
redisPub.publish("Channel_1", Yajl::Encoder.encode(publishData))
end
end
return data
end
# process anothr action
def process_another_action(env)
# extract message data
data = Hash.new
data["user_id"], data["widget_id"] = env.params['user_id'], env.params['widget_id']
publishData = { "action" => 'SOME_OTHER_ACTION_RECEIVED',
"data" => data }
redisPub.publish("Channel_1", Yajl::Encoder.encode(publishData))
end
end
return data
end
end
Handling subscriptions are left as an exercise for the reader.
what em-synchrony does is patch the em-redis gem to allow using it with fibers which effectively allows it to run in goliath.
Here is a project using Goliath + Redis which can guide you on how to make all this works: https://github.com/igrigorik/mneme
Example with em-hiredis, what goliath do is wrap your request in a fiber so a way to test it is:
require 'rubygems'
require 'bundler/setup'
require 'em-hiredis'
require 'em-synchrony'
EM::run do
Fiber.new do
## this is what you can use in goliath
redis = EM::Hiredis.connect
p EM::Synchrony.sync redis.keys('*')
## end of goliath block
end.resume
end
and the Gemfile I used:
source :rubygems
gem 'em-hiredis'
gem 'em-synchrony'
If you run this example you will get the list of defined keys in your redis database printed on screen.
Without the EM::Synchrony.sync call you would get a deferrable but here the fiber is suspended until the calls return and you get the result.

Ruby library to make multiple HTTP requests simultaneously

I'm looking for an alternate Ruby HTTP library that makes multiple HTTP calls simultaneously and performs better than the core Net::HTTP library.
You are probably looking for Typhoeus.
Typhoeus runs HTTP requests in parallel while cleanly encapsulating handling logic
https://github.com/typhoeus/typhoeus
Why do you need a networking library handle parallelism? That is exactly what threads are for.
require "open-uri"
fetcher = lambda do |uri|
puts "Started fetching #{uri}"
puts open(uri).read
puts "Stopped fetching #{uri}"
end
thread1 = Thread.new("http://localhost:9292", &fetcher)
thread2 = Thread.new("http://localhost:9293", &fetcher)
thread1.join
thread2.join
Also, I don't understand what do you mean by "performs better". Core libraries are usually good enough to be in the core. Do you have any problems with Net::HTTP?
You can use the Parallel gem, it should work with any Ruby HTTP library.
Not sure if it performs better then Typhoeus, BUT you could use Eventmacheine + em-http-request. There is an example for sending multiple requests.
require 'eventmachine'
require 'em-http'
EventMachine.run {
multi = EventMachine::MultiRequest.new
reqs = [
'http://google.com/',
'http://google.ca:81/'
]
reqs.each_with_index do |url, idx|
http = EventMachine::HttpRequest.new(url, :connect_timeout => 1)
req = http.get
multi.add idx, req
end
multi.callback do
p multi.responses[:callback].size
p multi.responses[:errback].size
EventMachine.stop
end
}
https://github.com/igrigorik/em-http-request

Using Open-URI to fetch XML and the best practice in case of problems with a remote url not returning/timing out?

Current code works as long as there is no remote error:
def get_name_from_remote_url
cstr = "http://someurl.com"
getresult = open(cstr, "UserAgent" => "Ruby-OpenURI").read
doc = Nokogiri::XML(getresult)
my_data = doc.xpath("/session/name").text
# => 'Fred' or 'Sam' etc
return my_data
end
But, what if the remote URL times out or returns nothing? How I detect that and return nil, for example?
And, does Open-URI give a way to define how long to wait before giving up? This method is called while a user is waiting for a response, so how do we set a max timeoput time before we give up and tell the user "sorry the remote server we tried to access is not available right now"?
Open-URI is convenient, but that ease of use means they're removing the access to a lot of the configuration details the other HTTP clients like Net::HTTP allow.
It depends on what version of Ruby you're using. For 1.8.7 you can use the Timeout module. From the docs:
require 'timeout'
begin
status = Timeout::timeout(5) {
getresult = open(cstr, "UserAgent" => "Ruby-OpenURI").read
}
rescue Timeout::Error => e
puts e.to_s
end
Then check the length of getresult to see if you got any content:
if (getresult.empty?)
puts "got nothing from url"
end
If you are using Ruby 1.9.2 you can add a :read_timeout => 10 option to the open() method.
Also, your code could be tightened up and made a bit more flexible. This will let you pass in a URL or default to the currently used URL. Also read Nokogiri's NodeSet docs to understand the difference between xpath, /, css and at, %, at_css, at_xpath:
def get_name_from_remote_url(cstr = 'http://someurl.com')
doc = Nokogiri::XML(open(cstr, 'UserAgent' => 'Ruby-OpenURI'))
# xpath returns a nodeset which has to be iterated over
# my_data = doc.xpath('/session/name').text # => 'Fred' or 'Sam' etc
# at returns a single node
doc.at('/session/name').text
end

Resources