I recently wrote a program that returns a list of unhealthy stocks from the stock market. The basic algorithm is this:
1. Look up the quotes of every stock on an exchange (either NYSE or NASDAQ).
2. From step 1, find the ones trading under 5 dollars.
3. From step 2, find the ones that are down 3 days in a row and have large volume (expensive, because it takes one request per stock, and there are currently ~700 of them for NASDAQ).
4. Scan the news for the ones returned by step 3.
I had this all in one file:
Original implementation (https://github.com/EdmundMai/minion/blob/aa14bc3234a4953e7273ec502276c6f0073b459d/lib/minion.rb):
require 'bundler/setup'
require "minion/version"
require "yahoo-finance"
require "business_time"
require 'nokogiri'
require 'open-uri'
require 'csv'
module Minion
class << self
def query(exchange)
client = YahooFinance::Client.new
all_companies = CSV.read("#{exchange}.csv")
small_caps = []
ticker_symbols = all_companies.map { |row| row[0] }
ticker_symbols.each_slice(200) do |batch|
data = client.quotes(batch, [:symbol, :last_trade_price, :average_daily_volume])
small_caps << data.select { |stock| stock.last_trade_price.to_f < 5.0 }
end
attractive = []
small_caps.flatten!.each_with_index do |small_cap, index|
begin
data = client.historical_quotes(small_cap.symbol, { start_date: 2.business_days.ago, end_date: Time.now })
closing_prices = data.map(&:close).map(&:to_f)
volumes = data.map(&:volume).map(&:to_i)
negative_3_days_in_a_row = closing_prices == closing_prices.sort
larger_than_average_volume = volumes.reduce(:+) / volumes.count > small_cap.average_daily_volume.to_i
if negative_3_days_in_a_row && larger_than_average_volume
attractive << small_cap.symbol
puts "Qualified: #{small_cap.symbol}, finished with #{index} out of #{small_caps.count}"
else
puts "Not qualified: #{small_cap.symbol}, finished with #{index} out of #{small_caps.count}"
end
rescue => e
puts e.inspect
end
end
final_results = []
attractive.each do |symbol|
rss_feed = Nokogiri::HTML(open("http://feeds.finance.yahoo.com/rss/2.0/headline?s=#{symbol}&region=US&lang=en-US"))
html_body = rss_feed.css('body')[0].text
diluting = false
['warrant', 'cashless exercise'].each do |keyword|
diluting = true if html_body.match(/#{keyword}/i)
end
final_results << symbol if diluting
end
final_results
end
end
end
This was really fast; it would finish processing all ~700 stocks in a minute or less.
Then I tried refactoring, splitting the algorithm up into different classes and files without changing the algorithm itself at all. I decided on the decorator pattern, since it seemed to fit. However, when I run the program now, the requests crawl (15+ minutes total). I know this because my puts statements get printed out really slowly.
New and slower implementation (https://github.com/EdmundMai/minion/blob/master/lib/minion.rb):
require 'bundler/setup'
require "minion/version"
require "yahoo-finance"
require "minion/dilution_finder"
require "minion/negative_finder"
require "minion/small_cap_finder"
require "minion/market_fetcher"
module Minion
class << self
def query(exchange)
all_companies = CSV.read("#{exchange}.csv")
all_tickers = all_companies.map { |row| row[0] }
short_finder = DilutionFinder.new(NegativeFinder.new(SmallCapFinder.new(MarketFetcher.new(all_tickers))))
short_finder.results
end
end
end
The part it's lagging at, according to my puts statements:
require "yahoo-finance"
require "business_time"
require_relative "stock_finder"
class NegativeFinder < StockFinder
def results
client = YahooFinance::Client.new
results = []
finder.results.each_with_index do |stock, index|
begin
data = client.historical_quotes(stock.symbol, { start_date: 2.business_days.ago, end_date: Time.now })
closing_prices = data.map(&:close).map(&:to_f)
volumes = data.map(&:volume).map(&:to_i)
negative_3_days_in_a_row = closing_prices == closing_prices.sort
larger_than_average_volume = volumes.reduce(:+) / volumes.count > stock.average_daily_volume.to_i
if negative_3_days_in_a_row && larger_than_average_volume
results << stock
puts "Qualified: #{stock.symbol}, finished with #{index} out of #{finder.results.count}"
else
puts "Not qualified: #{stock.symbol}, finished with #{index} out of #{finder.results.count}"
end
rescue => e
puts e.inspect
end
end
results
end
end
It's lagging on step 3 (making one request per stock). I'm not sure what's going on, so any advice would be appreciated. If you want to clone the program and run it, just uncomment the last line in lib/minion.rb and run ruby lib/minion.rb.
After debugging it, I figured it out. It was because I was calling finder.results (results being the decorated method) inside of the loop, as shown below:
require "yahoo-finance"
require "business_time"
require_relative "stock_finder"
class NegativeFinder < StockFinder
def results
client = YahooFinance::Client.new
results = []
finder.results.each_with_index do |stock, index|
begin
data = client.historical_quotes(stock.symbol, { start_date: 2.business_days.ago, end_date: Time.now })
closing_prices = data.map(&:close).map(&:to_f)
volumes = data.map(&:volume).map(&:to_i)
negative_3_days_in_a_row = closing_prices == closing_prices.sort
larger_than_average_volume = volumes.reduce(:+) / volumes.count > stock.average_daily_volume.to_i
if negative_3_days_in_a_row && larger_than_average_volume
results << stock
# HERE!!! finder.results re-runs the whole decorator chain on every iteration
puts "Qualified: #{stock.symbol}, finished with #{index} out of #{finder.results.count}"
else
# AND HERE!!!
puts "Not qualified: #{stock.symbol}, finished with #{index} out of #{finder.results.count}"
end
rescue => e
puts e.inspect
end
end
results
end
end
This caused a cascade of requests every time I iterated through the loop in NegativeFinder. Removing that call fixed it. Lesson: when using the decorator pattern, call the decorated method only once if each call does something expensive, or cache its return value in an instance variable so it isn't recomputed on every call.
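For illustration, here is a minimal sketch of the memoized approach. The StockFinder base class isn't shown in the post, so its shape here is an assumption, and down_three_days_with_volume? is a hypothetical stand-in for the historical-quotes logic:

class StockFinder
  attr_reader :finder

  def initialize(finder)
    @finder = finder
  end

  # Memoize: compute once, then reuse the cached array on every
  # subsequent call to results.
  def results
    @results ||= compute
  end
end

class NegativeFinder < StockFinder
  private

  # Called at most once per instance via the memoized results above,
  # so the upstream decorator chain also runs at most once.
  def compute
    finder.results.select { |stock| down_three_days_with_volume?(stock) }
  end

  # Hypothetical stand-in for the expensive per-stock request.
  def down_three_days_with_volume?(stock)
    true # placeholder
  end
end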
As a side note, I've decided not to go with the decorator pattern, because I don't think it fits here. Something like SmallCapFinder.new(SmallCapFinder.new(MarketFetcher.new(all_tickers))) doesn't add functionality at all (the primary point of the decorator pattern), so chaining the decorators accomplishes nothing. I'm just going to make them plain methods instead of adding unnecessary complexity.
There are some things missing from the code you gave us (the StockFinder base class, MarketFetcher). But I think you are now instantiating more than one YahooFinance::Client. Input/output to other systems is very often the cause of speed problems.
I suggest you first encapsulate the finance client and the access to financial data. That makes it easier to switch your financial data provider later, or to add another one. Instead of the decorator pattern, I would just use plain old methods for finding small caps, finding negatives, and so on.
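A rough sketch of that shape; FinanceData and the find_* names are mine, not from the repo, and two of the steps are left as placeholders:

require "csv"
require "yahoo-finance"

# A thin wrapper around the data provider: switching away from Yahoo
# later (or adding a second provider) only touches this class.
class FinanceData
  def initialize
    @client = YahooFinance::Client.new # one shared client
  end

  def quotes(symbols, fields)
    @client.quotes(symbols, fields)
  end

  def historical_quotes(symbol, options)
    @client.historical_quotes(symbol, options)
  end
end

module Minion
  class << self
    def query(exchange)
      data = FinanceData.new
      tickers = CSV.read("#{exchange}.csv").map { |row| row[0] }
      find_diluting(find_negative(data, find_small_caps(data, tickers)))
    end

    private

    def find_small_caps(data, tickers)
      tickers.each_slice(200).flat_map do |batch|
        data.quotes(batch, [:symbol, :last_trade_price, :average_daily_volume])
            .select { |stock| stock.last_trade_price.to_f < 5.0 }
      end
    end

    def find_negative(data, stocks)
      stocks # placeholder: the 3-days-down / above-average-volume filter
    end

    def find_diluting(stocks)
      stocks # placeholder: the RSS keyword scan
    end
  end
end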
I am pulling a Bitbucket repo list using Ruby. The response from Bitbucket contains only 10 repositories and a marker for the next page, where there will be another 10 repos, and so on (they call it pagination).
So I wrote a recursive function that calls itself if the next-page marker is present. This continues until it reaches the last page.
Here is my code:
#!/usr/local/bin/ruby
require 'net/http'
require 'json'
require 'awesome_print'
$repos = Array.new
def recursive(url)
### here goes my net/http code, which connects to Bitbucket and pulls
### back the JSON response as response.body; removed for brevity
hash = JSON.parse(response.body)
hash["values"].each do |x|
$repos << x["links"]["self"]["href"]
end
if hash["next"]
puts "next page exists"
puts "calling recusrisve with: #{hash["next"]}"
recursive(hash["next"])
else
puts "this is the last page. No more recursions"
end
end
repo_list = recursive('https://my_bitbucket_url')
$repos.each {|x| puts x}
Now, my code works fine and it lists all the repos.
Question:
I am new to Ruby, so I am not sure about the way I have used the global variable $repos above. If I define the array inside the function, then each call to the function will create a new array, overwriting the contents from the previous call.
So, how do Ruby programmers handle a shared collection in cases like this? Does my code follow Ruby conventions, or is it a really amateur (yet correct, because it works) way of doing it?
The consensus is to avoid global variables as much as possible.
I would either build the collection recursively like this:
def recursive(url)
### ...
result = []
hash["values"].each do |x|
result << x["links"]["self"]["href"]
end
if hash["next"]
result += recursive(hash["next"])
end
result
end
or hand over the collection to the function:
def recursive(url, result = [])
### ...
hash["values"].each do |x|
result << x["links"]["self"]["href"]
end
if hash["next"]
recursive(hash["next"], result)
end
result
end
Either way you can call the function
repo_list = recursive(url)
And I would write it like this:
def recursive(url)
# ...
result = hash["values"].map { |x| x["links"]["self"]["href"] }
result += recursive(hash["next"]) if hash["next"]
result
end
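For completeness, a sketch of what the elided fetch step might look like; this is generic Net::HTTP usage, not the original code, and it ignores whatever authentication the real Bitbucket request needs:

require "net/http"
require "json"
require "uri"

# Hypothetical stand-in for the net/http code removed for brevity:
# GET the URL and parse the JSON body into a hash.
def fetch_json(url)
  response = Net::HTTP.get_response(URI(url))
  JSON.parse(response.body)
end

With something like that, hash = fetch_json(url) would be the first line of each version above.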
I got Ruby to navigate to a web site, iterate through a list of campaigns, and scrape each page for specific data. The problem I have now is getting that data out of the structure Nokogiri gives me and outputting it in a readable form.
require 'watir'
require 'nokogiri'
campaign_list = [1042360, 1042386, 1042365, 992307]
browser = Watir::Browser.new :chrome
browser.goto '<redacted>'
browser.text_field(:id => 'email').set '<redacted>'
browser.text_field(:id => 'password').set '<redacted>'
browser.send_keys :enter
file = File.new('hourlysales.csv', 'w')
data = {}
campaign_list.each do |campaign|
browser.goto "<redacted>"
if browser.text.include? "Application Error"
puts "Error loading page, I recommend restarting script"
# Possibly automatic restart of script
else
hourly_data = Nokogiri::HTML.parse(browser.html).text
# file.write data
puts hourly_data
end
end
This is the output I get:
{"views":[[17,145],[18,165],[19,99],[20,71],[21,31],[22,26],[23,10],[0,15],[1,1], [2,18],[3,19],[4,35],[5,47],[6,44],[7,67],[8,179],[9,141],[10,112],[11,95],[12,46],[13,82],[14,79],[15,70],[16,103]],"orders":[[17,10],[18,9],[19,5],[20,1],[21,1],[22,0],[23,0],[0,1],[1,0],[2,1],[3,0],[4,1],[5,2],[6,1],[7,5],[8,11],[9,6],[10,5],[11,3],[12,1],[13,2],[14,4],[15,6],[16,7]],"conversion_rates":[0.06870229007633588,0.05442176870748299,0.050505050505050504,0.014084507042253521,0.03225806451612903,0.0,0.0,0.06666666666666667,0.0,0.05555555555555555,0.0,0.02857142857142857,0.0425531914893617,0.022727272727272728,0.07462686567164178,0.06134969325153374,0.0425531914893617,0.044642857142857144,0.031578947368421054,0.021739130434782608,0.024390243902439025,0.05063291139240506,0.08571428571428572,0.06741573033707865]}
The arrays hold pairs: "views" is [[hour, # of views], [hour, # of views], ...], and "orders" has the same shape. I don't need the conversion rates.
I also need to add up the values for each key, so that after doing this for 5 pages I have one key for each hour of the day and the total number of views for that hour. I tried a couple of each loops, but couldn't make any progress.
I appreciate any help you guys can give me.
It looks like the output (which from your code I assume is the content of hourly_data) is JSON. In that case, it's easy to parse and add up the numbers. Something like this:
require "json" # at the top of your script
# ...
def sum_hours_values(data, hours_values=nil)
# Start with an empty hash that automatically initializes missing keys to `0`
hours_values ||= Hash.new {|hsh,hour| hsh[hour] = 0 }
# Iterate through the [hour, value] arrays, adding `value` to the running
# count for that `hour`, and return `hours_values`
data.each_with_object(hours_values) do |(hour, value), hsh|
hsh[hour] += value
end
end
# ... Watir/Nokogiri stuff here...
# Initialize these so they persist outside the loop
hours_views, hours_orders = nil
campaign_list.each do |campaign|
browser.goto "<redacted>"
if browser.text.include? "Application Error"
# ...
else
# ...
hourly_data_parsed = JSON.parse(hourly_data)
hours_views = sum_hours_values(hourly_data_parsed["views"], hours_views)
hours_orders = sum_hours_values(hourly_data_parsed["orders"], hours_orders)
end
end
puts "Views by hour:"
puts hours_views.sort.map {|hour_views| "%2i\t%4i" % hour_views }
puts "Orders by hour:"
puts hours_orders.sort.map {|hour_orders| "%2i\t%4i" % hour_orders }
P.S. There's a really nice recursive version of sum_hours_values I didn't include since the iterative version is clearer to most Ruby programmers. If you're into recursion I leave it as an exercise for you. ;)
I am trying to get this script to run: http://dysinger.net/2008/10/13/using-amazon-ec2-metadata-as-a-simple-dns but it doesn't work, because it uses an old Amazon SDK version. I rewrote it to use the new one:
#!/usr/bin/env ruby
require "rubygems"
require "aws-sdk"
%w(optparse rubygems aws-sdk resolv pp).each {|l| require l}
options = {}
parser = OptionParser.new do |p|
p.banner = "Usage: hosts [options]"
p.on("-a", "--access-key USER", "The user's AWS access key ID.") do |aki|
options[:access_key_id] = aki
end
p.on("-s",
"--secret-key PASSWORD",
"The user's AWS secret access key.") do |sak|
options[:secret_access_key] = sak
end
p.on_tail("-h", "--help", "Show this message") {
puts(p)
exit
}
p.parse!(ARGV) rescue puts(p)
end
if options.key?(:access_key_id) and options.key?(:secret_access_key)
puts "127.0.0.1 localhost"
AWS.config(options)
AWS::EC2.new(options)
answer = AWS::EC2::Client.new.describe_instances
answer.reservationSet.item.each do |r|
r.instancesSet.item.each do |i|
if i.instanceState.name =~ /running/
puts(Resolv::DNS.new.getaddress(i.privateDnsName).to_s +
" #{i.keyName}.ec2 #{i.keyName}")
end
end
end
else
puts(parser)
exit(1)
end
What this should do is output a new /etc/hosts file with my EC2 instances in it.
And I get a response =D, but answer is a hash, and therefore I get the error undefined method `reservationSet' for #<Hash:0x7f7573b27880>.
This is my problem, since I don't know Ruby at all (all I was doing was reading the Amazon documentation and playing around until I got a response). Somehow this seemed to work in the original example; I suppose that back then the API did not return a hash. Anyway, how can I iterate through a hash like the one above to get this to work?
This code may help you:
answer = AWS::EC2::Client.new.describe_instances
reservations = answer[:reservation_set]
reservations.each do |reservation|
instances = reservation[:instances_set]
instances.each do |instance|
if instance[:instance_state][:name] == "running"
private_dns_name = instance[:private_dns_name]
key_name = instance[:key_name]
address = Resolv::DNS.new.getaddress(private_dns_name)
puts "{address} #{key_name}.ec2 #{key_name}"
end
end
end
In general, change your code from method-style access, e.g. item.fooBarBaz, to hash access, e.g. item[:foo_bar_baz].
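A tiny illustration with a made-up response fragment:

# Made-up fragment shaped like the new SDK's nested-hash response:
instance = { instance_state: { name: "running" }, key_name: "web-1" }

# Old style (method chain) no longer works: instance.instanceState.name
# New style indexes with snake_case symbol keys:
puts instance[:instance_state][:name] # => "running"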
When you're learning Ruby, "pp" (pretty-print) is very useful for inspecting variables as you go, such as:
pp reservations
pp instances
pp private_dns_name
I have a database full of URLs that I need to test HTTP response time for on a regular basis. I want to have many worker threads combing the database at all times for a URL that hasn't been tested recently, and if it finds one, test it.
Of course, this could cause multiple threads to snag the same URL from the database. I don't want this. So, I'm trying to use Mutexes to prevent this from happening. I realize there are other options at the database level (optimistic locking, pessimistic locking), but I'd at least prefer to figure out why this isn't working.
Take a look at this test code I wrote:
threads = []
mutex = Mutex.new
50.times do |i|
threads << Thread.new do
while true do
url = nil
mutex.synchronize do
url = URL.first(:locked_for_testing => false, :times_tested.lt => 150)
if url
url.locked_for_testing = true
url.save
end
end
if url
# simulate testing the url
sleep 1
url.times_tested += 1
url.save
mutex.synchronize do
url.locked_for_testing = false
url.save
end
end
end
sleep 1
end
end
threads.each { |t| t.join }
Of course there is no real URL testing here. But what should happen is at the end of the day, each URL should end up with "times_tested" equal to 150, right?
(I'm basically just trying to make sure the mutexes and worker-thread mentality are working)
But each time I run it, a few odd URLs here and there end up with times_tested equal to a much lower number, say, 37, and locked_for_testing frozen on "true"
Now as far as I can tell from my code, if any URL gets locked, it will have to unlock. So I don't understand how some URLs are ending up "frozen" like that.
There are no exceptions and I've tried adding begin/ensure but it didn't do anything.
Any ideas?
I'd use a Queue, and a master thread to pull the work you want done. If you have a single master, you control what's getting accessed. This isn't perfect, but it's not going to blow up because of concurrency. Remember: if you aren't locking the database, a mutex doesn't really help if something else accesses the db.
code completely untested
require 'thread'
queue = Queue.new
keep_running = true
# trap cntrl_c or something to reset keep_running
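# For example (an illustrative sketch, not part of the original answer):
Signal.trap("INT") { keep_running = false }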
master = Thread.new do
while keep_running
# check if we need some work to do
if queue.size == 0
urls = URL.all(:times_tested.lt => 150)
urls.each do |u|
queue << u.id
end
end
# keep from spinning the loop
sleep(0.1)
end
end
workers = []
50.times do
workers << Thread.new do
while keep_running
# get an id
id = queue.shift
url = URL.get(id)
#do something with the url
url.save
sleep(0.1)
end
end
end
workers.each do |w|
w.join
end