Ruby: Decorator pattern slows simple program by a lot - ruby

I recently wrote a program to return a bunch of stocks from the stock market that are unhealthy. The basic algorithm is this:
Look up all the quotes of every stock in an exchange (either NYSE or NASDAQ)
Find the ones that are less than 5 dollars from step 1
Find the ones from step 2 that are down 3 days and have large volume (expensive because I have to make a request for each stock, which is like ~700 currently for nasdaq).
Scan the news for the ones returned by step 3.
I had this all in one file:
Original implementation (https://github.com/EdmundMai/minion/blob/aa14bc3234a4953e7273ec502276c6f0073b459d/lib/minion.rb):
require 'bundler/setup'
require "minion/version"
require "yahoo-finance"
require "business_time"
require 'nokogiri'
require 'open-uri'
module Minion
class << self
def query(exchange)
client = YahooFinance::Client.new
all_companies = CSV.read("#{exchange}.csv")
small_caps = []
ticker_symbols = all_companies.map { |row| row[0] }
ticker_symbols.each_slice(200) do |batch|
data = client.quotes(batch, [:symbol, :last_trade_price, :average_daily_volume])
small_caps << data.select { |stock| stock.last_trade_price.to_f < 5.0 }
end
attractive = []
small_caps.flatten!.each_with_index do |small_cap, index|
begin
data = client.historical_quotes(small_cap.symbol, { start_date: 2.business_days.ago, end_date: Time.now })
closing_prices = data.map(&:close).map(&:to_f)
volumes = data.map(&:volume).map(&:to_i)
negative_3_days_in_a_row = closing_prices == closing_prices.sort
larger_than_average_volume = volumes.reduce(:+) / volumes.count > small_cap.average_daily_volume.to_i
if negative_3_days_in_a_row && larger_than_average_volume
attractive << small_cap.symbol
puts "Qualified: #{small_cap.symbol}, finished with #{index} out of #{small_caps.count}"
else
puts "Not qualified: #{small_cap.symbol}, finished with #{index} out of #{small_caps.count}"
end
rescue => e
puts e.inspect
end
end
final_results = []
attractive.each do |symbol|
rss_feed = Nokogiri::HTML(open("http://feeds.finance.yahoo.com/rss/2.0/headline?s=#{symbol}&region=US&lang=en-US"))
html_body = rss_feed.css('body')[0].text
diluting = false
['warrant', 'cashless exercise'].each do |keyword|
diluting = true if html_body.match(/#{keyword}/i)
end
final_results << symbol if diluting
end
final_results
end
end
end
This was really fast and would finish processing like ~700 stocks in a minute or less.
Then, I tried refactoring and splitting up the algorithm into different classes and files without changing the algorithm at all. I decided on using the decorator pattern since it seems to fit. However when I run the program now, it makes each request really slowly (15+ min). I know this because my puts statements get printed out really slowly.
New and slower implementation (https://github.com/EdmundMai/minion/blob/master/lib/minion.rb)
require 'bundler/setup'
require "minion/version"
require "yahoo-finance"
require "minion/dilution_finder"
require "minion/negative_finder"
require "minion/small_cap_finder"
require "minion/market_fetcher"
module Minion
class << self
def query(exchange)
all_companies = CSV.read("#{exchange}.csv")
all_tickers = all_companies.map { |row| row[0] }
short_finder = DilutionFinder.new(NegativeFinder.new(SmallCapFinder.new(MarketFetcher.new(all_tickers))))
short_finder.results
end
end
end
The part it's lagging at according to my puts:
require "yahoo-finance"
require "business_time"
require_relative "stock_finder"
class NegativeFinder < StockFinder
def results
client = YahooFinance::Client.new
results = []
finder.results.each_with_index do |stock, index|
begin
data = client.historical_quotes(stock.symbol, { start_date: 2.business_days.ago, end_date: Time.now })
closing_prices = data.map(&:close).map(&:to_f)
volumes = data.map(&:volume).map(&:to_i)
negative_3_days_in_a_row = closing_prices == closing_prices.sort
larger_than_average_volume = volumes.reduce(:+) / volumes.count > stock.average_daily_volume.to_i
if negative_3_days_in_a_row && larger_than_average_volume
results << stock
puts "Qualified: #{stock.symbol}, finished with #{index} out of #{finder.results.count}"
else
puts "Not qualified: #{stock.symbol}, finished with #{index} out of #{finder.results.count}"
end
rescue => e
puts e.inspect
end
end
results
end
end
It's lagging on step 3 (making one request for each stock). Not sure what's going on so any advice would be appreciated. If you want to clone the program and run it, just comment in the last line in lib/minion.rb and type ruby lib/minion.rb

After debugging it I figured it out. It was because I was calling finder.results (results being the decorated method) inside of the loop as shown below:
require 'bundler/setup'
require "minion/version"
require "yahoo-finance"
require "minion/dilution_finder"
require "minion/negative_finder"
require "minion/small_cap_finder"
require "minion/market_fetcher"
module Minion
class << self
def query(exchange)
all_companies = CSV.read("#{exchange}.csv")
all_tickers = all_companies.map { |row| row[0] }
short_finder = DilutionFinder.new(NegativeFinder.new(SmallCapFinder.new(MarketFetcher.new(all_tickers))))
short_finder.results
end
end
end
The part it's lagging at according to my puts:
require "yahoo-finance"
require "business_time"
require_relative "stock_finder"
class NegativeFinder < StockFinder
def results
client = YahooFinance::Client.new
results = []
finder.results.each_with_index do |stock, index|
begin
data = client.historical_quotes(stock.symbol, { start_date: 2.business_days.ago, end_date: Time.now })
closing_prices = data.map(&:close).map(&:to_f)
volumes = data.map(&:volume).map(&:to_i)
negative_3_days_in_a_row = closing_prices == closing_prices.sort
larger_than_average_volume = volumes.reduce(:+) / volumes.count > stock.average_daily_volume.to_i
if negative_3_days_in_a_row && larger_than_average_volume
results << stock
// HERE!!!!!!!!!!!!!!!!!!!!!!!!!
puts "Qualified: #{stock.symbol}, finished with #{index} out of #{finder.results.count}" <------------------------------------
else
// AND HERE!!!!!!!!!!!!!!!!!!!!!!!!!
puts "Not qualified: #{stock.symbol}, finished with #{index} out of #{finder.results.count}" <-----------------------------------------------------------
end
rescue => e
puts e.inspect
end
end
results
end
end
This caused a cascade of requests every time I iterated through the loop in NegativeFinder. Removing that call fixed it. Lesson: When using the decorator pattern, either only call the decorated method once, especially when you're doing something expensive in each call. Either that or hold the returned variable in an instance variable so you don't have to calculate it each time.
Also as a side note, I've decided not to go with the decorator pattern because I don't think it applies well here. Something like SmallCapFinder.new(SmallCapFinder.new(MarketFetcher.new(all_tickers))) doesn't add functionality at all (the primary function of using the decorator pattern), so chaining decorators doesn't do anything. Therefore, I'm just going to make them methods instead of adding unnecessary complexity.

There are some thing missing in the code you gave us (Base class StockFinder, MarketFetcher). But I think you are now instantate more than one YahooFinance::Client. Input/Output to other systems is very often the cause for speed problems.
I suggest that you first encapsulate the finance client and access to financial data. This makes it easier when you want to switch your financial data provider, or add another one. Instead of the decorator pattern, I would just use plain old methods for finding small caps, finding negative, etc.

Related

Increasing Ruby Resolv Speed

Im trying to build a sub-domain brute forcer for use with my clients - I work in security/pen testing.
Currently, I am able to get Resolv to look up around 70 hosts in 10 seconds, give or take and wanted to know if there was a way to get it to do more. I have seen alternative scripts out there, mainly Python based that can achieve far greater speeds than this. I don't know how to increase the number of requests Resolv makes in parallel, or if i should split the list up. Please note I have put Google's DNS servers in the sample code, but will be using internal ones for live usage.
My rough code for debugging this issue is:
require 'resolv'
def subdomains
puts "Subdomain enumeration beginning at #{Time.now.strftime("%H:%M:%S")}"
subs = []
domains = File.open("domains.txt", "r") #list of domain names line by line.
Resolv.new(:nameserver => ['8.8.8.8', '8.8.4.4'])
File.open("tiny.txt", "r").each_line do |subdomain|
subdomain.chomp!
domains.each do |d|
puts "Checking #{subdomain}.#{d}"
ip = Resolv.new.getaddress "#{subdomain}.#{d}" rescue ""
if ip != nil
subs << subdomain+"."+d << ip
end
end
end
test = subs.each_slice(4).to_a
test.each do |z|
if !z[1].nil? and !z[3].nil?
puts z[0] + "\t" + z[1] + "\t\t" + z[2] + "\t" + z[3]
end
end
puts "Finished at #{Time.now.strftime("%H:%M:%S")}"
end
subdomains
domains.txt is my list of client domain names, for example google.com, bbc.co.uk, apple.com and 'tiny.txt' is a list of potential subdomain names, for example ftp, www, dev, files, upload. Resolv will then lookup files.bbc.co.uk for example and let me know if it exists.
One thing is you are creating a new Resolv instance with the Google nameservers, but never using it; you create a brand new Resolv instance to do the getaddress call, so that instance is probably using some default nameservers and not the Google ones. You could change the code to something like this:
resolv = Resolv.new(:nameserver => ['8.8.8.8', '8.8.4.4'])
# ...
ip = resolv.getaddress "#{subdomain}.#{d}" rescue ""
In addition, I suggest using the File.readlines method to simplify your code:
domains = File.readlines("domains.txt").map(&:chomp)
subdomains = File.readlines("tiny.txt").map(&:chomp)
Also, you're rescuing the bad ip and setting it to the empty string, but then in the next line you test for not nil, so all results should pass, and I don't think that's what you want.
I've refactored your code, but not tested it. Here is what I came up with, and may be clearer:
def subdomains
puts "Subdomain enumeration beginning at #{Time.now.strftime("%H:%M:%S")}"
domains = File.readlines("domains.txt").map(&:chomp)
subdomains = File.readlines("tiny.txt").map(&:chomp)
resolv = Resolv.new(:nameserver => ['8.8.8.8', '8.8.4.4'])
valid_subdomains = subdomains.each_with_object([]) do |subdomain, valid_subdomains|
domains.each do |domain|
combined_name = "#{subdomain}.#{domain}"
puts "Checking #{combined_name}"
ip = resolv.getaddress(combined_name) rescue nil
valid_subdomains << "#{combined_name}#{ip}" if ip
end
end
valid_subdomains.each_slice(4).each do |z|
if z[1] && z[3]
puts "#{z[0]}\t#{z[1]}\t\t#{z[2]}\t#{z[3]}"
end
end
puts "Finished at #{Time.now.strftime("%H:%M:%S")}"
end
Also, you might want to check out the dnsruby gem (https://github.com/alexdalitz/dnsruby). It might do what you want to do better than Resolv.
[Note: I've rewritten the code so that it fetches the IP addresses in chunks. Please see https://gist.github.com/keithrbennett/3cf0be2a1100a46314f662aea9b368ed. You can modify the RESOLVE_CHUNK_SIZE constant to balance performance with resource load.]
I've rewritten this code using the dnsruby gem (written mainly by Alex Dalitz in the UK, and contributed to by myself and others). This version uses asynchronous message processing so that all requests are being processed pretty much simultaneously. I've posted a gist at https://gist.github.com/keithrbennett/3cf0be2a1100a46314f662aea9b368ed but will also post the code here.
Note that since you are new to Ruby, there are lots of things in the code that might be instructive to you, such as method organization, use of Enumerable methods (e.g. the amazing 'partition' method), the Struct class, rescuing a specific Exception class, %w, and Benchmark.
NOTE: LOOKS LIKE STACK OVERFLOW ENFORCES A MAXIMUM MESSAGE SIZE, SO THIS CODE IS TRUNCATED. GO TO THE GIST IN THE LINK ABOVE FOR THE COMPLETE CODE.
#!/usr/bin/env ruby
# Takes a list of subdomain prefixes (e.g. %w(ftp xyz)) and a list of domains (e.g. %w(nytimes.com afp.com)),
# creates the subdomains combining them, fetches their IP addresses (or nil if not found).
require 'dnsruby'
require 'awesome_print'
RESOLVER = Dnsruby::Resolver.new(:nameserver => %w(8.8.8.8 8.8.4.4))
# Experiment with this to get fast throughput but not overload the dnsruby async mechanism:
RESOLVE_CHUNK_SIZE = 50
IpEntry = Struct.new(:name, :ip) do
def to_s
"#{name}: #{ip ? ip : '(nil)'}"
end
end
def assemble_subdomains(subdomain_prefixes, domains)
domains.each_with_object([]) do |domain, subdomains|
subdomain_prefixes.each do |prefix|
subdomains << "#{prefix}.#{domain}"
end
end
end
def create_query_message(name)
Dnsruby::Message.new(name, 'A')
end
def parse_response_for_address(response)
begin
a_answer = response.answer.detect { |a| a.type == 'A' }
a_answer ? a_answer.rdata.to_s : nil
rescue Dnsruby::NXDomain
return nil
end
end
def get_ip_entries(names)
queue = Queue.new
names.each do |name|
query_message = create_query_message(name)
RESOLVER.send_async(query_message, queue, name)
end
# Note: although map is used here, the record in the output array will not necessarily correspond
# to the record in the input array, since the order of the messages returned is not guaranteed.
# This is indicated by the lack of block variable specified (normally w/map you would use the element).
# That should not matter to us though.
names.map do
_id, result, error = queue.pop
name = _id
case error
when Dnsruby::NXDomain
IpEntry.new(name, nil)
when NilClass
ip = parse_response_for_address(result)
IpEntry.new(name, ip)
else
raise error
end
end
end
def main
# domains = File.readlines("domains.txt").map(&:chomp)
domains = %w(nytimes.com afp.com cnn.com bbc.com)
# subdomain_prefixes = File.readlines("subdomain_prefixes.txt").map(&:chomp)
subdomain_prefixes = %w(www xyz)
subdomains = assemble_subdomains(subdomain_prefixes, domains)
start_time = Time.now
ip_entries = subdomains.each_slice(RESOLVE_CHUNK_SIZE).each_with_object([]) do |ip_entries_chunk, results|
results.concat get_ip_entries(ip_entries_chunk)
end
duration = Time.now - start_time
found, not_found = ip_entries.partition { |entry| entry.ip }
puts "\nFound:\n\n"; puts found.map(&:to_s); puts "\n\n"
puts "Not Found:\n\n"; puts not_found.map(&:to_s); puts "\n\n"
stats = {
duration: duration,
domain_count: ip_entries.size,
found_count: found.size,
not_found_count: not_found.size,
}
ap stats
end
main

Ruby + recursive function + defining global variable

I am pulling bitbucket repo list using Ruby. The response from bitbucket will contain only 10 repositories and a marker for the next page where there will be another 10 repos and so on ... (they call it pagination)
So, I wrote a recursive function which calls itself if the next page marker is present. This will continue until it reaches the last page.
Here is my code:
#!/usr/local/bin/ruby
require 'net/http'
require 'json'
require 'awesome_print'
#repos = Array.new
def recursive(url)
### here goes my net/http code which connects to bitbucket and pulls back the response in a JSON as request.body
### hence, removing this code for brevity
hash = JSON.parse(response.body)
hash["values"].each do |x|
#repos << x["links"]["self"]["href"]
end
if hash["next"]
puts "next page exists"
puts "calling recusrisve with: #{hash["next"]}"
recursive(hash["next"])
else
puts "this is the last page. No more recursions"
end
end
repo_list = recursive('https://my_bitbucket_url')
#repos.each {|x| puts x}
Now, my code works fine and it lists all the repos.
Question:
I am new to Ruby, so I am not sure about the way I have used the global variable #repos = Array.new above. If I define the array in function, then each call to the function will create a new array overwriting its contents from previous call.
So, how do the Ruby programmers use a global symbol in such cases. Does my code obey Ruby ethics or is it something really amateur (yet correct because it works) way of doing it.
The consensus is to avoid global variables as much as possible.
I would either build the collection recursively like this:
def recursive(url)
### ...
result = []
hash["values"].each do |x|
result << x["links"]["self"]["href"]
end
if hash["next"]
result += recursive(hash["next"])
end
result
end
or hand over the collection to the function:
def recursive(url, result = [])
### ...
hash["values"].each do |x|
result << x["links"]["self"]["href"]
end
if hash["next"]
recursive(hash["next"], result)
end
result
end
Either way you can call the function
repo_list = recursive(url)
And I would write it like this:
def recursive(url)
# ...
result = hash["values"].map { |x| x["links"]["self"]["href"] }
result += recursive(hash["next"]) if hash["next"]
result
end

Refactor the Ruby CLI program

I'm new to programming in Ruby.
How do I make the output show Revenue and Profit or Loss?
How can I refactor the following code to look neater? I know it's wrong but I have no idea how to take my if profit out of the initialize method.
class Theater
attr_accessor :ticket_price, :number_of_attendees, :revenue, :cost
def initialize
puts "What is your selling price of the ticket?"
#ticket_price = gets.chomp.to_i
puts "How many audience are there?"
#number_of_attendees = gets.chomp.to_i
#revenue = (#number_of_attendees * #ticket_price)
#cost = (#number_of_attendees * 3) + 180
#profit = (#revenue - #cost)
if #profit > 0
puts "Profit made: $#{#profit}"
else
puts "Loss incurred: $#{#profit.abs}"
end
end
end
theater = Theater.new
# theater.profit
# puts "Revenue for the theater is RM#{theater.revenue}."
# I hope to put my Profit/Loss here
#
# puts theater.revenue
Thanks guys.
Do not initialize the object with input from the user, make your object accept the needed values. Make a method to read the needed input and return you new Theater. Last of all put the if in separate method like #report_profit.
Remember constructors are for setting up the initial state of the object, making sure it is in a valid state. The constructor should not have side effects(in your case system input/output). This is something to be aware for all programming languages, not just ruby.
Try this:
class Theatre
COST = { running: 3, fixed: 180 }
attr_accessor :number_of_audience, :ticket_price
def revenue
#number_of_audience * #ticket_price
end
def total_cost
COST[:fixed] + (#number_of_audience * COST[:running])
end
def net
revenue - total_cost
end
def profit?
net > 0
end
end
class TheatreCLI
def initialize
#theatre = Theatre.new
end
def seek_number_of_attendes
print 'Number of audience: '
#theatre.number_of_audience = gets.chomp.to_i
end
def seek_ticket_price
print 'Ticket price: '
#theatre.ticket_price = gets.chomp.to_i
end
def print_revenue
puts "Revenue for the theatre is RM #{#theatre.revenue}."
end
def print_profit
message_prefix = #theatre.profit? ? 'Profit made' : 'Loss incurred'
puts "#{message_prefix} #{#theatre.net.abs}."
end
def self.run
TheatreCLI.new.instance_eval do
seek_ticket_price
seek_number_of_attendes
print_revenue
print_profit
end
end
end
TheatreCLI.run
Notes:
Never use your constructor (initialize method) for anything other than initial setup.
Try to keep all methods under 5 lines.
Always try to keep each class handle a single responsibility; for instance, printing and formatting output is not something the Theatre class needs to care.
Try extracting all hard coded values; eg see the COST hash.
Use apt variables consistent to the domain. Eg: net instead of profit makes the intent clear.

Calling multiple methods on a CSV object

I have constructed an Event Manager class that performs parsing actions on a CSV file, and produces html letters using erb. It is part of a jumpstart labs tutorial
The program works fine, but I am unable to call multiple methods on an object without the earlier methods interfering with the later methods. As a result, I have opted to create multiple objects to call instance methods on, which seems like a clunky inelegant solution. Is there a better way to do this, where I can create a single new object and call methods on it?
Like so:
eventmg = EventManager.new("event_attendees.csv")
eventmg.print_valid_phone_numbers
eventmg_2 = EventManager.new("event_attendees.csv")
eventmg_2.print_zipcodes
eventmg_3 = EventManager.new("event_attendees.csv")
eventmg_3.time_targeter
eventmg_4 = EventManager.new("event_attendees.csv")
eventmg_4.day_of_week
eventmg_5 = EventManager.new("event_attendees.csv")
eventmg_5.create_thank_you_letters
The complete code is as follows
require 'csv'
require 'sunlight/congress'
require 'erb'
class EventManager
INVALID_PHONE_NUMBER = "0000000000"
Sunlight::Congress.api_key = "e179a6973728c4dd3fb1204283aaccb5"
def initialize(file_name, list_selections = [])
puts "EventManager Initialized."
#file = CSV.open(file_name, {:headers => true,
:header_converters => :symbol} )
#list_selections = list_selections
end
def clean_zipcode(zipcode)
zipcode.to_s.rjust(5,"0")[0..4]
end
def print_zipcodes
puts "Valid Participant Zipcodes"
#file.each do |line|
zipcode = clean_zipcode(line[:zipcode])
puts zipcode
end
end
def clean_phone(phone_number)
converted = phone_number.scan(/\d/).join('').split('')
if converted.count == 10
phone_number
elsif phone_number.to_s.length < 10
INVALID_PHONE_NUMBER
elsif phone_number.to_s.length == 11 && converted[0] == 1
phone_number.shift
phone_number.join('')
elsif phone_number.to_s.length == 11 && converted[0] != 1
INVALID_PHONE_NUMBER
else
phone_number.to_s.length > 11
INVALID_PHONE_NUMBER
end
end
def print_valid_phone_numbers
puts "Valid Participant Phone Numbers"
#file.each do |line|
clean_number = clean_phone(line[:homephone])
puts clean_number
end
end
def time_targeter
busy_times = Array.new(24) {0}
#file.each do |line|
registration = line[:regdate]
prepped_time = DateTime.strptime(registration, "%m/%d/%Y %H:%M")
prepped_time = prepped_time.hour.to_i
# inserts filtered hour into the array 'list_selections'
#list_selections << prepped_time
end
# tallies number of registrations for each hour
i = 0
while i < #list_selections.count
busy_times[#list_selections[i]] += 1
i+=1
end
# delivers a result showing the hour and the number of registrations
puts "Number of Registered Participants by Hour:"
busy_times.each_with_index {|counter, hours| puts "#{hours}\t#{counter}"}
end
def day_of_week
busy_day = Array.new(7) {0}
d_of_w = ["Monday:", "Tuesday:", "Wednesday:", "Thursday:", "Friday:", "Saturday:", "Sunday:"]
#file.each do |line|
registration = line[:regdate]
# you have to reformat date because of parser format
prepped_date = Date.strptime(registration, "%m/%d/%y")
prepped_date = prepped_date.wday
# adds filtered day of week into array 'list selections'
#list_selections << prepped_date
end
i = 0
while i < #list_selections.count
# i is minus one since days of week begin at '1' and arrays begin at '0'
busy_day[#list_selections[i-1]] += 1
i+=1
end
#busy_day.each_with_index {|counter, day| puts "#{day}\t#{counter}"}
prepared = d_of_w.zip(busy_day)
puts "Number of Registered Participants by Day of Week"
prepared.each{|date| puts date.join(" ")}
end
def legislators_by_zipcode(zipcode)
Sunlight::Congress::Legislator.by_zipcode(zipcode)
end
def save_thank_you_letters(id,form_letter)
Dir.mkdir("output") unless Dir.exists?("output")
filename = "output/thanks_#{id}.html"
File.open(filename,'w') do |file|
file.puts form_letter
end
end
def create_thank_you_letters
puts "Thank You Letters Available in Output Folder"
template_letter = File.read "form_letter.erb"
erb_template = ERB.new template_letter
#file.each do |line|
id = line[0]
name = line[:first_name]
zipcode = clean_zipcode(line[:zipcode])
legislators = legislators_by_zipcode(zipcode)
form_letter = erb_template.result(binding)
save_thank_you_letters(id,form_letter)
end
end
end
The reason you're experiencing this problem is because when you apply each to the result of CSV.open you're moving the file pointer each time. When you get to the end of the file with one of your methods, there is nothing for anyone else to read.
An alternative is to read the contents of the file into an instance variable at initialization with readlines. You'll get an array of arrays which you can operate on with each just as easily.
"Is there a better way to do this, where I can create a single new object and call methods on it?"
Probably. If your methods are interfering with one another, it means you're changing state within the manager, instead of working on local variables.
Sometimes, it's the right thing to do (e.g. Array#<<); sometimes not (e.g. Fixnum#+)... Seeing your method names, it probably isn't.
Nail the offenders down and adjust the code accordingly. (I only scanned your code, but those Array#<< calls on an instance variable, in particular, look fishy.)

Nest output of recursive function

I have written this piece of code, which outputs a list of jobdescriptions (in Danish). It works fine, however I would like to alter the output a bit. The function is recursive because the jobs are nested, however the output does not show the nesting.
How do I configure the function to show an output like this:
Job 1
- Job 1.1
- Job 1.2
-- Job 1.2.1
And so on...
require 'nokogiri'
require 'open-uri'
def crawl(url)
basePath = 'http://www.ug.dk'
doc = Nokogiri::HTML(open(basePath + url))
doc.css('.maplist li').each do |listitem|
listitem.css('.txt').each do |txt|
puts txt.content
end
listitem.css('a[href]').each do |link|
crawl(link['href'])
end
end
end
crawl('/Job.aspx')
require 'nokogiri'
require 'open-uri'
def crawl(url, nesting_level = 0)
basePath = 'http://www.ug.dk'
doc = Nokogiri::HTML(open(basePath + url))
doc.css('.maplist li').each do |listitem|
listitem.css('.txt').each do |txt|
puts " " * nesting_level + txt.content
end
listitem.css('a[href]').each do |link|
crawl(link['href'], nesting_level + 1)
end
end
end
crawl('/Job.aspx')
I see two options:
Pass an additional argument to the recursive function to indicate the level you are currently in. Initialize the value to 0 and each time you call the function increment this value. Something like this:
def crawl(url, level)
basePath = 'http://www.ug.dk'
doc = Nokogiri::HTML(open(basePath + url))
doc.css('.maplist li').each do |listitem|
listitem.css('.txt').each do |txt|
puts txt.content
end
listitem.css('a[href]').each do |link|
crawl(link['href'], level + 1)
end
end
end
Make use of the caller array that holds the callstack. Use the size of this array to indicate the level in the recursion you are located in.
def try
puts caller.inspect
end
try
I would personally stick to the fist version as it seems easier to read, but requires you to modify the interface.

Resources