Best way to DRY up code using procs and blocks and/or dynamic methods - ruby

I am writing a way to parse websites. Each "scraper" has its own way to gather information, but there is plenty of common functionality between the two methods.
Differences:
One scraper uses Nokogiri to open the page and query it via CSS selectors
The other scraper uses an RSS feed to gather information
Similarities:
each scraper creates an "Event" object that has the following attributes:
title
date
description
For the Nokogiri scraper, we do something like this:
event_selector = page.css(".div-class")
event_selector.each_with_index do |event, index|
date = Date.parse(event.text) #code I want to share
end
For the RSS scraper, we do something like this:
open(url) do |rss|
feed = RSS::Parser.parse(rss)
feed.items.each do |event|
description = Sanitize.fragment(event.description)
date = description[/\d{2}-\d{2}-20\d{2}/]
date = Date.strptime(date, '%m-%d-%Y') #code I want to share
end
end
Here the date is grabbed via a regex from the description and then converted into a Date object via the .strptime method.
As you can see, each scraper uses a different method call to find the date. How could I abstract this into a class?
I was thinking of something like this:
class Scrape
  attr_accessor :scrape_url, :title, :description, :date, :url
  def initialize(options = {})
  end
  def find_date(&block)
    # Process the block??
  end
end
and then in each of the scraper methods do something like
scrape = Scrape.new
date_proc = Proc.new {Date.parse(event.text)}
scrape.find_date(&date_proc)
Is this the right way to go about this problem? In short, I want the two website parsers to share common functionality by passing the scraper-specific code into an instance method of a "Scrape" class. I would greatly appreciate any tips to tackle this scenario.
Edit: Maybe it would make more sense if I say that I want to find the "date" of an event, but the way I find it - the behavior - or the specific code that is run, is different.

You could use an Event builder. Something like this:
class Event::Builder
  def date(raw)
    @date = Date.strptime(raw, '%m-%d-%Y')
  end
  # ... more setters (title, description) ...
  def build
    Event.new(date: @date, ... more arguments ..)
  end
end
And then, inside the scraper:
open(url) do |rss|
builder = Event::Builder.new
feed = RSS::Parser.parse(rss)
feed.items.each do |event|
description = Sanitize.fragment(event.description)
date = description[/\d{2}-\d{2}-20\d{2}/]
builder.date(date)
# ... set other attributes ...
event = builder.build
# do something with the event ...
end
end

You should look into the Strategy or Template Method patterns. These are ways of writing code that does different things depending on some state or configuration. Essentially, you'd write a Scraper class and then subclass it as WebScraper and RssScraper. Each subclass would inherit all the common functionality from Scraper and differ only in how it gets the date, description, etc.
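A minimal sketch of the subclassing idea, under some assumptions: Event is a plain Struct, the raw input is a hash standing in for a Nokogiri node or an RSS item, and the hash keys (:title, :date_text, :description) are made up for illustration. Only find_date differs between the two scrapers.

```ruby
require 'date'

# Hypothetical Event object; the attribute names come from the question.
Event = Struct.new(:title, :date, :description)

class Scraper
  # Template method: the shared steps live here.
  def build_event(raw)
    Event.new(find_title(raw), find_date(raw), find_description(raw))
  end

  private

  def find_title(raw)
    raw[:title]
  end

  def find_description(raw)
    raw[:description]
  end

  # The one step that differs; each subclass must supply it.
  def find_date(_raw)
    raise NotImplementedError, "#{self.class} must implement find_date"
  end
end

class WebScraper < Scraper
  private

  # Assumes the page text is something Date.parse understands.
  def find_date(raw)
    Date.parse(raw[:date_text])
  end
end

class RssScraper < Scraper
  private

  # Same regex and format string as in the question.
  def find_date(raw)
    Date.strptime(raw[:description][/\d{2}-\d{2}-20\d{2}/], '%m-%d-%Y')
  end
end

web_event = WebScraper.new.build_event({ title: 'Concert', date_text: '2015-03-14', description: 'Live music' })
rss_event = RssScraper.new.build_event({ title: 'Fair', description: 'Held on 03-14-2015 downtown' })
```

If subclassing feels too heavy, the same split also works by handing the date-finding code to a single Scraper as a block, which is closer to what the question sketched with find_date(&block).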

Related

Ruby mixins looking for a best practice

I'm writing Ruby Gem where I have Connection module for Faraday configuration
module Example
module Connection
private
def connection
Faraday.new(url: 'http://localhost:3000/api') do |conn|
conn.request :url_encoded # form-encode POST params
conn.response :logger # log requests to STDOUT
conn.adapter Faraday.default_adapter # make requests with Net::HTTP
conn.use Faraday::Response::ParseJson
conn.use FaradayMiddleware::RaiseHttpException
end
end
end
end
Second module which makes API requests looks like this:
module Example
module Request
include Connection
def get(uri)
connection.get(uri).body
end
def post(url, attributes)
response = connection.post(url) do |request|
request.body = attributes.to_json
end
end
def self.extended(base)
base.include(InstanceMethods)
end
module InstanceMethods
include Connection
def put(url, attributes)
response = connection.put(url) do |request|
request.body = attributes.to_json
end
end
end
end
end
The Customer class, where I use Request, looks like this:
module Example
class Customer
extend Request
attr_accessor :id, :name, :age
def initialize(attrs)
attrs.each do |key, value|
instance_variable_set("@#{key}", value)
end
end
def self.all
customers = get('v1/customer')
customers.map { |cust| new cust }
end
def save
params = {
id: self.id,
age: self.age,
name: self.name,
}
put("v1/customers/#{self.id}", params)
end
end
end
Here, in the Customer.all class method, I call Request#get, which is available because I extended Request in Customer. I then use the self.extended hook in the Request module to make Request#put available to Customer instances. Is this a good way to use mixins, or do you have any suggestions?
Mixins are a strange beast. Best practices vary depending on who you talk to. As far as reuse goes, you've achieved that here with mixins, and you have a nice separation of concerns.
However, mixins are a form of inheritance (you can take a peek at #ancestors). I would challenge you saying that you shouldn't use inheritance here because a Customer doesn't have an "is-a" relationship with Connection. I would recommend you use composition instead (e.g. pass in Connection/Request) as it makes more sense to me in this case and has stronger encapsulation.
One guideline for writing mixins is to make everything end in "-able", so you would have Enumerable, Sortable, Runnable, Callable, etc. In this sense, mixins are generic extensions that provide helpers which depend on a very specific interface (e.g. Enumerable depends on the class to implement #each).
You could also use mixins for cross-cutting concerns. For example, we've used mixins in the past in our background jobs so that we could add logging without having to touch the source code of the class. In this case, if a new job wants logging, it just mixes in the concern, which is coupled to the framework and will inject itself properly.
My general rule of thumb is: don't use them if you don't have to. They make understanding the code a lot more complicated in most cases.
EDIT: Adding an example of composition. In order to maintain the interface you have above, you'd need to have some sort of global connection state, so it may not make sense. Here's an alternative that uses composition:
class CustomerConnection
  # CustomerConnection is composed of a Connection and retains isolation
  # of responsibilities. It also uses constructor injection (i.e. takes
  # its dependencies in the constructor) which means easy testing.
  def initialize(connection)
    @connection = connection
  end
  def all_customers
    @connection.get('v1/customers').map { |res| Customer.new(res) }
  end
end
connection = Connection.new
CustomerConnection.new(connection).all_customers
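To illustrate the "easy testing" point, here is a rough sketch of swapping in a test double. FakeConnection and the trimmed-down Customer are hypothetical and exist only for this example; the sketch also assumes, like the answer's code, that #get returns an array of attribute hashes.

```ruby
# Trimmed-down Customer for the example; the real class has more attributes.
class Customer
  attr_reader :id, :name

  def initialize(attrs)
    @id = attrs['id']
    @name = attrs['name']
  end
end

class CustomerConnection
  def initialize(connection)
    @connection = connection
  end

  def all_customers
    @connection.get('v1/customers').map { |res| Customer.new(res) }
  end
end

# Hypothetical test double: it answers #get with canned rows and
# remembers which paths were requested, so no network is involved.
class FakeConnection
  attr_reader :requested_paths

  def initialize(rows)
    @rows = rows
    @requested_paths = []
  end

  def get(path)
    @requested_paths << path
    @rows
  end
end

fake = FakeConnection.new([{ 'id' => 1, 'name' => 'Ada' }])
customers = CustomerConnection.new(fake).all_customers
```

Because CustomerConnection only needs something that responds to #get, the real connection and the fake are interchangeable from its point of view.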

AFMotion HTTP GET request syntax for setting variable

My goal is to set an instance variable using AFMotion's AFMotion::HTTP.get method.
I've set up a Post model. I would like to have something like:
class Post
...
def self.all
response = AFMotion::HTTP.get("localhost/posts.json")
objects = JSON.parse(response)
results = objects.map{|x| Post.new(x)}
end
end
But according to the docs, AFMotion requires some sort of block syntax that looks and seems to behave like an async javascript callback. I am unsure how to use that.
I would like to be able to call
@posts = Post.all in the ViewController. Is this just a Rails dream? Thanks!
Yeah, the base syntax is async, so you don't block the UI while you're waiting for the network to respond. The syntax is simple: place all the code that depends on the response inside your block.
class Post
  ...
  def self.all
    AFMotion::HTTP.get("localhost/posts.json") do |result|
      if result.success?
        p "You got JSON data"
        # feel free to parse this data into an instance var
        objects = JSON.parse(result.body)
        @results = objects.map{|x| Post.new(x)}
      elsif result.failure?
        p result.error.localizedDescription
      end
    end
  end
end
Since you mentioned Rails: yes, the logic is a little different here. You'll need to place the code you want to run on completion inside the async block. If that code is going to change often, or has nothing to do with your model, then pass a &block in to your method and call it back when the request is done.
I hope that helps!
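Here is a rough sketch of that &block idea in plain Ruby. FakeHTTP is a made-up stand-in that calls its block from another thread; it is not AFMotion's API, and the JSON payload is invented for the example.

```ruby
require 'json'

# Hypothetical stand-in for an async HTTP client (NOT AFMotion).
# It hands the response body to the block some time later.
module FakeHTTP
  def self.get(_url, &callback)
    Thread.new { callback.call('[{"title":"Hello"},{"title":"World"}]') }
  end
end

class Post
  attr_reader :title

  def initialize(attrs)
    @title = attrs['title']
  end

  # The caller's block receives the posts once the response has arrived.
  def self.all(&block)
    FakeHTTP.get('localhost/posts.json') do |body|
      posts = JSON.parse(body).map { |attrs| Post.new(attrs) }
      block.call(posts)
    end
  end
end

loaded = nil
# join is only here so this script waits for the fake request to finish.
Post.all { |posts| loaded = posts }.join
```

In a view controller, the block passed to Post.all is where you would assign @posts and refresh the view, since that is the first point at which the data exists.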

ruby - Sharing a class across modules

I'm trying to mimic ActiveRecord with a simple set of ruby objects for running raw sql queries. Below is a spike I've been experimenting with:
module Runable
def run
return self::Results.new
end
end
module Query
class Results
def initialize
@results = Object.find_by_sql()
end
def to_a
#code
end
end
end
module Scored
extend Runable
include Query
QUERY = 'a raw sql query string'
end
module Unseen
extend Runable
include Query
QUERY = 'a different raw sql query string'
end
What I want to be able to do is create simple Modules for each type of raw sql query I'm going to run, put them into a file like Scored or Unseen above and call .run on them to get back a results object. So like this:
Scored.run #=> #<Scored::Results:0x0000000000>
Unseen.run #=> #<Unseen::Results:0x0000000000>
but instead I get this...
Scored.run #=> #<Query::Results:0x0000000000>
Unseen.run #=> #<Query::Results:0x0000000000>
I've been doing ruby and rails for over a year but I'm just beginning to get into more advanced ruby usage. This is my first big step into using modules and mixins.
The issue, as far as I can tell, is that module class methods have self scoped to the module they're defined in. So I get Query::Results because the initialize method for Results is defined in the Query module. That make sense?
Thank you for the help!
Update 5/30 16:45
Basically, I want to wrap a handful of raw SQL statements into modules like this:
module ScoredUsers
include Queryable
QUERY="SELECT * FROM users ..."
end
and interact with the queries like this:
r = ScoredUsers.run #=> ScoredUsers::Results
r.ids
r.load_objects
REDIS.zadd "user:5:cache", r.to_a
I want to keep everything in modules and classes, the Ruby way (I think?), so when I want to create a new query object I can simply use a boilerplate module like Scored above.
You get this result because the Results class is created only once. Including Query does not copy it: Scored::Results is resolved through the included module, so it refers to the same class object as Query::Results.
What you need is a new class for each module that includes Query. This is a perfect opportunity to use the included hook:
module Query
  def self.included(mod)
    results = Class.new do
      def initialize
        @results = Object.find_by_sql()
      end
      def to_a
        #code
      end
    end
    mod.const_set('Results', results)
  end
end
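A self-contained sketch to check the behaviour, with the database call left out: the to_a body is a placeholder, and Runable, Scored and Unseen are reduced to what the question showed.

```ruby
module Runable
  def run
    self::Results.new
  end
end

module Query
  # Called every time Query is included; mod is the including module.
  def self.included(mod)
    results = Class.new do
      def to_a
        [] # placeholder; the real version would run the query
      end
    end
    # Assigning the anonymous class to a constant also gives it its name.
    mod.const_set(:Results, results)
  end
end

module Scored
  extend Runable
  include Query
end

module Unseen
  extend Runable
  include Query
end

scored = Scored.run
unseen = Unseen.run
```

Here scored.class.name is "Scored::Results" and unseen.class.name is "Unseen::Results", and the two classes are distinct objects.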
Now of course we are left with the question - do we really need to do this? This depends on how you are planning to use those classes.

How do I access a variable inside the method I'm calling in a block I'm passing to it?

I'm writing a wrapper for an XML API using Nokogiri to build the XML for submission.
In order to keep my code DRY, I'm using custom blocks for the first time and just getting to grips with how to pass variables back and forth and how that works.
What I'm doing at the moment is this:
# Generic action
def action(xml, action_title, test=false)
xml.request do
xml.login do
xml.username("my_user")
xml.password("my_pass")
end
xml.action(action_title)
xml.params do
yield
end
end
end
# Specific action
def get_users(city = "", gender = "")
build = Nokogiri::XML::Builder.new do |xml|
action(xml, "getusers") do
xml.city(city) unless city.blank?
xml.gender(gender) unless gender.blank?
end
end
do_stuff_to(build)
end
Ideally, I'd like the specific action method to look like this:
def get_users(city = "", gender = "")
action("getusers") do |xml|
xml.city(city) unless city.blank?
xml.gender(gender) unless gender.blank?
end
end
In doing so, I'd want the other logic currently in the specific action method to be moved to the generic action method with the generic action method returning the results of do_stuff_to(build).
What I'm struggling with is how to pass the xml object from action() back to get_users(). What should action() look like in order to achieve this?
Turns out this was quite simple. The action method needs to be changed so it looks like this:
def action(action_title)
build = Nokogiri::XML::Builder.new do |xml|
xml.request do
xml.login do
xml.username("my_user")
xml.password("my_pass")
end
xml.action(action_title)
xml.params do
yield xml
end
end
end
do_stuff_to(build)
end
That meant the specific action method could be called like this to the same effect:
def get_users(city = "", gender = "")
action("getusers") do |xml|
xml.city(city) unless city.blank?
xml.gender(gender) unless gender.blank?
end
end
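The same mechanism can be tried without Nokogiri. In this sketch RecordingBuilder is a hypothetical stand-in for the XML builder that only records calls; the part that matters is that yield passes an object to the block and that action returns its own last expression, not the block's.

```ruby
# Hypothetical stand-in for the XML builder: it just records calls.
class RecordingBuilder
  attr_reader :calls

  def initialize
    @calls = []
  end

  def add(name, value)
    @calls << [name, value]
  end
end

def action(action_title)
  builder = RecordingBuilder.new
  builder.add(:action, action_title)
  yield builder  # hand the builder to the caller's block
  builder.calls  # action's return value, in place of do_stuff_to(build)
end

def get_users(city = "", gender = "")
  action("getusers") do |b|
    b.add(:city, city) unless city.empty?
    b.add(:gender, gender) unless gender.empty?
  end
end

result = get_users("Paris")
```

The block still sees city and gender because blocks are closures over the scope they were written in, so only the builder needs to be passed in.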

getting active records to display as a plist

I'm trying to get a list of ActiveRecord results to display as a plist to be consumed by the iPhone. I'm using the plist gem v 3.0.
My model is called Post, and I want Post.all (or any array of Posts) to display correctly as a plist.
I have it working fine for one Post instance:
http://pastie.org/580902
That is correct, and what I would expect. To get that behavior I had to do this:
class Post < ActiveRecord::Base
def to_plist
attributes.to_plist
end
end
However, when I do a Post.all, I can't get it to display what I want. Here is what happens:
http://pastie.org/580909
I get marshalling. I want output more like this:
http://pastie.org/580914
I suppose I could just iterate the result set and append the plist strings. But seems ugly, I'm sure there is a more elegant way to do this.
I am rusty on Ruby right now, so the elegant way isn't obvious to me. Seems like I should be able to override ActiveRecord and make result-sets that pull back more than one record take the ActiveRecord::Base to_plist and make another to_plist implementation. In rails, this would go in environment.rb, right?
I took the easy way out:
private
# pass in posts resultset from finds
def posts_to_plist(posts)
plist_array = []
posts.each do |post|
plist_array << post.attributes
end
plist_array.to_plist
end
public
# GET /posts
# GET /posts.xml
def index
@posts = Post.all
#@posts = [{:a=>"blah"}, {:b=>"blah2"}]
respond_to do |format|
format.html # index.html.erb
format.xml { render :xml => posts_to_plist(@posts) }
end
end
I found this page searching for the same answer. I think you have the right approach, though I'm also a newbie (on Rails) and not sure the right way to do it. I added this to application_helper.rb. Seems to work.
require 'plist'
module ApplicationHelper
class ActiveRecord::Base
public
include Plist::Emit
def to_plist
self.attribute_names.inject({}) do |attrs, name|
value = self.read_attribute(name)
if !value.nil?
attrs[name] = value
end
attrs
end
end
end
end
According to the plist project README, you should implement "to_plist_node", as opposed to "to_plist".
You should also mix Plist::Emit into your ActiveRecord class.

Resources