How this ruby script using twitter api scrapes user ids? - ruby

Hello I have modified this older code to scrape twitter usernames, but for some reason it also scrapes user ids. I dont understand how it does that, since I dont see anywhere in the code "user_id" which you should use to get user ids according to twitter api documentation.
Here is the code
def my_usernames
"UHDTelevisions"
end
def my_userinfo(names)
#client.followers(names)
end
def my_userhash(users)
userhash = {}
users.each do |user|
userhash[user.screen_name] = user.id.to_s
end
return userhash
end
def my_users
my_userhash(my_userinfo(my_usernames))
end
def my_csv(my_users)
CSV.open('./my_users.csv','a+') do |csv|
my_users.each do |k,v|
csv << [k,v]
end
end

Here is the line that builds a hash {name ⇒ id}:
userhash[user.screen_name] = user.id.to_s
Here we already got the user object, that contains id amongst other user params. To return the list of names, one might simply:
#client.followers("UHDTelevisions").map &:screen_name
instead of all the code above.

If you wanted to keep a parallel structure you could change it to have my_userarray (since you only need values, not key-value pairs, I assume)
def my_userarray(users)
userarray = []
users.each do |user|
userarray << user.screen_name
end
return userarray
end
You would need to update the my_users method as well, of course, to reflect the new method name for my_userarray

Related

Metrics/AbcSize Too High: How do I decrease the ABC in this method?

I have recently started using Rubocop to "standardise" my code, and it has helped me optimise a lot of my code, as well as help me learn a lot of Ruby "tricks". I understand that I should use my own judgement and disable Cops where necessary, but I have found myself quite stuck with the below code:
def index
if params[:filters].present?
if params[:filters][:deleted].blank? || params[:filters][:deleted] == "false"
# if owned is true, then we don't need to filter by admin
params[:filters][:admin] = nil if params[:filters][:admin].present? && params[:filters][:owned] == "true"
# if admin is true, then must not filter by owned if false
params[:filters][:owned] = nil if params[:filters][:owned].present? && params[:filters][:admin] == "false"
companies_list =
case params[:filters][:admin]&.to_b
when true
current_user.admin_companies
when false
current_user.non_admin_companies
end
if params[:filters][:owned].present?
companies_list ||= current_user.companies
if params[:filters][:owned].to_b
companies_list = companies_list.where(owner: current_user)
else
companies_list = companies_list.where.not(owner: current_user)
end
end
else
# Filters for deleted companies
companies_list = {}
end
end
companies_list ||= current_user.companies
response = { data: companies_list.alphabetical.as_json(current_user: current_user) }
json_response(response)
end
Among others, the error that I'm getting is the following:
C: Metrics/AbcSize: Assignment Branch Condition size for index is too high. [<13, 57, 16> 60.61/15]
I understand the maths behind it, but I don't know how to simplify this code to achieve the same result.
Could someone please give me some guidance on this?
Thanks in advance.
Well first and foremost, is this code fully tested, including all the myriad conditions? It's so complex that refactoring will surely be disastrous unless the test suite is rigorous. So, write a comprehensive test suite if you don't already have one. If there's already a test suite, make sure it tests all the conditions.
Second, apply the "fat model skinny controller" paradigm. So move all the complexity into a model, let's call it CompanyFilter
def index
companies_list = CompanyFilter.new(current_user, params).list
response = { data: companies_list.alphabetical.as_json(current_user: current_user) }
json_response(response)
end
and move all those if/then/else statements into the CompanyFilter#list method
tests still pass? great, you'll still get the Rubocop warnings, but related to the CompanyFilter class.
Now you need to untangle all the conditions. It's a bit hard for me to understand what's going on, but it looks as if it should be reducible to a single case statement, with 5 possible outcomes. So the CompanyFilter class might look something like this:
class CompanyFilter
attr_accessors :current_user, :params
def initialize(current_user, params)
#current_user = current_user
#params = params
end
def list
case
when no_filter_specified
{}
when user_is_admin
#current_user.admin_companies
when user_is_owned
# etc
when # other condition
# etc
end
end
private
def no_filter_specified
#params[:filter].blank?
end
def user_is_admin
# returns boolean based on params hash
end
def user_is_owned
# returns boolean based on params hash
end
end
tests still passing? perfect! [Edit] Now you can move most of your controller tests into a model test for the CompanyFilter class.
Finally I would define all the different companies_list queries as scopes on the Company model, e.g.
class Company < ApplicationRecord
# some examples, I don't know what's appropriate in this app
scope :for_user, ->(user){ where("...") }
scope :administered_by, ->(user){ where("...") }
end
When composing database scopes ActiveRecord::SpawnMethods#merge is your friend.
Post.where(title: 'How to use .merge')
.merge(Post.where(published: true))
While it doesn't look like much it lets you programatically compose scopes without overelying on mutating assignment and if/else trees. You can for example compose an array of conditions and merge them together into a single ActiveRecord::Relation object with Array#reduce:
[Post.where(title: 'foo'), Post.where(author: 'bar')].reduce(&:merge)
# => SELECT "posts".* FROM "posts" WHERE "posts"."title" = $1 AND "posts"."author" = $2 LIMIT $3
So lets combine that with a skinny controllers approach where you handle filtering in a seperate object:
class ApplicationFilter
include ActiveModel::Attributes
include ActiveModel::AttributeAssignment
attr_accessor :user
def initialize(**attributes)
super()
assign_attributes(attributes)
end
# A convenience method to both instanciate and apply the filters
def self.call(user, params, scope: model_class.all)
return scope unless params[:filters].present?
scope.merge(
new(
permit_params(params).merge(user: user)
).to_scope
)
end
def to_scope
filters.map { |filter| apply_filter(filter) }
.compact
.select {|f| f.respond_to?(:merge) }
.reduce(&:merge)
end
private
# calls a filter_by_foo method if present or
# defaults to where(key => value)
def apply_filter(attribute)
if respond_to? "filter_by_#{attribute}"
send("filter_by_#{attribute}")
else
self.class.model_class.where(
attribute => send(attribute)
)
end
end
# Convention over Configuration is sexy.
def self.model_class
name.chomp("Filter").constantize
end
# filters the incoming params hash based on the attributes of this filter class
def self.permit_params
params.permit(filters).reject{ |k,v| v.blank? }
end
# provided for modularity
def self.filters
attribute_names
end
end
This uses some of the goodness provided by Rails to setup objects with attributes that will dynamically handle filtering attributes. It looks at the list of attributes you have declared and then slices those off the params and applies a method for that filter if present.
We can then write a concrete implementation:
class CompanyFilter < ApplicationFilter
attribute :admin, :boolean, default: false
attribute :owned, :boolean
private
def filter_by_admin
if admin
user.admin_companies
else
user.non_admin_companies
end
end
# this should be refactored to use an assocation on User
def filter_by_owned
case owned
when nil
nil
when true
Company.where(owner: user)
when false
Company.where.not(owner: user)
end
end
end
And you can call it with:
# scope is optional
#companies = CompanyFilter.call(current_user, params), scope: current_user.companies)

Fetching array of data from DB using Rails 3

I want to access multiple columns using Rails 3.But it gave me the following error.
Error:
ArgumentError (wrong number of arguments (2 for 1)):
app/controllers/payments_controller.rb:13:in `check_type'
Check my below code.
payment_controller.rb:
class PaymentsController < ApplicationController
def payment
#payment=Vendor.new
respond_to do |format|
format.html
format.js
end
end
def check_type
if params[:commit]=="submit"
#vendor_type=PaymentVendor.where(:v_name => params[:v_name]).pluck(:type ,:Receipt_No)
#vendor_type.each do |vendor|
end
else
#v_name=Vendor.where(:s_catagory => params[:payment][:s_catagory] ).pluck(:v_name)
end
end
end
Actually i want to retrive data like below format.
#vendor_type=["Receipt_no":"type","Receipt_no":"type",.....]
Once these data will appear,I need how to access row values according to Receipt_No.Please help me to resolve this error.
Thanks to ActiveRecord >= 4 . pluck accepts multiple arguments so in
Rails 4: Your query will work
#vendor_type=PaymentVendor.where(:v_name => params[:v_name]).pluck(:type ,:Receipt_No)
Now as you are using Rails 3 which doesn't support multiple arguments to pluck then we can extend ActiveRecord::Relation itself like this:
put your file under config/initializers
# pluck_all.rb
module ActiveRecord
class Relation
def pluck_all(*args)
args.map! do |column_name|
if column_name.is_a?(Symbol) && column_names.include?(column_name.to_s)
"#{connection.quote_table_name(table_name)}.#{connection.quote_column_name(column_name)}"
else
column_name.to_s
end
end
relation = clone
relation.select_values = args
klass.connection.select_all(relation.arel).map! do |attributes|
initialized_attributes = klass.initialize_attributes(attributes)
attributes.each do |key, attribute|
attributes[key] = klass.type_cast_attribute(key, initialized_attributes)
end
end
end
end
end
Now in your controller you can pass multiple arguments to pluck like this:
# payment_controller.rb:
#vendor_type=PaymentVendor.where(:v_name => params[:v_name]).pluck_all(:type ,:Receipt_No)
Now you can use pluck_all in whole app. Hope this helps ;)
EDIT:
Try below code if plcuk_all not worked:
#vendor_type = PaymentVendor.where(:v_name => params[:v_name]).map{|v|[v.type ,v.Receipt_No]}
Reference for more info: http://meltingice.net/2013/06/11/pluck-multiple-columns-rails/
Your pluck(:type ,:Receipt_No) looks wrong,
pluck have only one argument.
Also your type of data #vendor_type is wrong, Array don't have key, value pair.
Use map like this,
#vendor_type=PaymentVendor.where(:v_name => params[:v_name]).map { |i| [i.Receipt_No] }
In terms of making a rails 3 method that behaves the same as the Rails 4 pluck with multiple columns. This outputs a similar array (rather than a hashed key value collection). This should save a bit of pain if you ever come to upgrade and want to clean up the code.
See this tutorial which outlines a similar method that outputs a hash.
config/initializers/pluck_all.rb
module ActiveRecord
class Relation
def pluck_all(*args)
args.map! do |column_name|
if column_name.is_a?(Symbol) && column_names.include?(column_name.to_s)
"#{connection.quote_table_name(table_name)}.#{connection.quote_column_name(column_name)}"
else
column_name.to_s
end
end
relation = clone
relation.select_values = args
klass.connection.select_all(relation.arel).map! do |attributes|
initialized_attributes = klass.initialize_attributes(attributes)
attributes.map do |key, attribute|
klass.type_cast_attribute(key, initialized_attributes)
end
end
end
end
end
Standing on the shoulders of giants and all

Warming Up Cache Digests Overnight

We have a Rails 3.2 website which is fairly large with thousands of URLs. We implemented Cache_Digests gem for Russian Doll caching. It is working well. We want to further optimize by warming up the cache overnight so that user gets a better experience during the day. I have seen answer to this question: Rails: Scheduled task to warm up the cache?
Could it be modified for warming up large number of URLs?
To trigger cache hits for many pages with expensive load times, just create a rake task to iteratively send web requests to all record/url combinations within your site. (Here is one implementation)
Iteratively Net::HTTP request all site URL/records:
To only visit every page, you can run a nightly Rake task to make sure that early morning users still have a snappy page with refreshed content.
lib/tasks/visit_every_page.rake:
namespace :visit_every_page do
include Net
include Rails.application.routes.url_helpers
task :specializations => :environment do
puts "Visiting specializations..."
Specialization.all.sort{ |a,b| a.id <=> b.id }.each do |s|
begin
puts "Specialization #{s.id}"
City.all.sort{ |a,b| a.id <=> b.id }.each do |c|
puts "Specialization City #{c.id}"
Net::HTTP.get( URI("http://#{APP_CONFIG[:domain]}/specialties/#{s.id}/#{s.token}/refresh_city_cache/#{c.id}.js") )
end
Division.all.sort{ |a,b| a.id <=> b.id }.each do |d|
puts "Specialization Division #{d.id}"
Net::HTTP.get( URI("http://#{APP_CONFIG[:domain]}/specialties/#{s.id}/#{s.token}/refresh_division_cache/#{d.id}.js") )
end
end
end
end
# The following methods are defined to fake out the ActionController
# requirements of the Rails cache
def cache_store
ActionController::Base.cache_store
end
def self.benchmark( *params )
yield
end
def cache_configured?
true
end
end
(If you want to directly include cache expiration/recaching into this task, check out this implementation.)
via a Custom Controller Action:
If you need to bypass user authentication restrictions to get to your pages, and/or you don't want to screw up (too badly) your website's tracking analytics, you can create a custom controller action for hitting cache digests that use tokens to bypass authentication:
app/controllers/specializations.rb:
class SpecializationsController < ApplicationController
...
before_filter :check_token, :only => [:refresh_cache, :refresh_city_cache, :refresh_division_cache]
skip_authorization_check :only => [:refresh_cache, :refresh_city_cache, :refresh_division_cache]
...
def refresh_cache
#specialization = Specialization.find(params[:id])
#feedback = FeedbackItem.new
render :show, :layout => 'ajax'
end
def refresh_city_cache
#specialization = Specialization.find(params[:id])
#city = City.find(params[:city_id])
render 'refresh_city.js'
end
def refresh_division_cache
#specialization = Specialization.find(params[:id])
#division = Division.find(params[:division_id])
render 'refresh_division.js'
end
end
Our custom controller action renders the views of other expensive to load pages, causing cache hits to those pages. E.g. refresh_cache renders the same view page & data as controller#show, so requests to refresh_cache will warm up the same cache digests as controller#show for those records.
Security Note:
For security reasons, I recommend before providing access to any custom refresh_cache controller request that you pass in a token and check it to make sure that it corresponds with a unique token for that record. Matching URL tokens to database records before providing access (as seen above) is trivial because your Rake task has access to the unique tokens of each record -- just pass the record's token in with each request.
tl;dr:
To trigger thousands of site URL's/cache digests, create a rake task to iteratively request every record/url combination in your site. You can bypass your app's user authentication restrictions for this task by creating a a custom controller action that authenticates access via tokens instead.
I realize this question is about a year old, but I just worked out my own answer, after scouring a bunch of partial & incorrect solutions.
Hopefully this will help the next person...
Per my own utility class, which can be found here:
https://raw.githubusercontent.com/JayTeeSF/cmd_notes/master/automated_action_runner.rb
You can simply run this (per it's .help method) and pre-cache your pages, without tying-up your own web-server, in the process.
class AutomatedActionRunner
class StatusObject
def initialize(is_valid, error_obj)
#is_valid = !! is_valid
#error_obj = error_obj
end
def valid?
#is_valid
end
def error
#error_obj
end
end
def self.help
puts <<-EOH
Instead tying-up the frontend of your production site with:
`curl http://your_production_site.com/some_controller/some_action/1234`
`curl http://your_production_site.com/some_controller/some_action/4567`
Try:
`rails r 'AutomatedActionRunner.run(SomeController, "some_action", [{id: "1234"}, {id: "4567"}])'`
EOH
end
def self.common_env
{"rack.input" => "", "SCRIPT_NAME" => "", "HTTP_HOST" => "localhost:3000" }
end
REQUEST_ENV = common_env.freeze
def self.run(controller, controller_action, params_ary=[], user_obj=nil)
success_objects = []
error_objects = []
autorunner = new(controller, controller_action, user_obj)
Rails.logger.warn %Q|[AutomatedAction Kickoff]: Preheating cache for #{params_ary.size} #{autorunner.controller.name}##{controller_action} pages.|
params_ary.each do |params_hash|
status = autorunner.run(params_hash)
if status.valid?
success_objects << params_hash
else
error_objects << status.error
end
end
return process_results(success_objects, error_objects, user_obj.try(:id), autorunner.controller.name, controller_action)
end
def self.process_results(success_objects=[], error_objects=[], user_id, controller_name, controller_action)
message = %Q|AutomatedAction Summary|
backtrace = (error_objects.first.try(:backtrace)||[]).join("\n\t").inspect
num_errors = error_objects.size
num_successes = success_objects.size
log_message = %Q|[#{message}]: Generated #{num_successes} #{controller_name}##{controller_action}, pages; Failed #{num_errors} times; 1st Fail: #{backtrace}|
Rails.logger.warn log_message
# all the local-variables above, are because I typically call Sentry or something with extra parameters!
end
attr_reader :controller
def initialize(controller, controller_action, user_obj)
#controller = controller
#controller = controller.constantize unless controller.respond_to?(:name)
#controller_instance = #controller.new
#controller_action = controller_action
#env_obj = REQUEST_ENV.dup
#user_obj = user_obj
end
def run(params_hash)
Rails.logger.warn %Q|[AutomatedAction]: #{#controller.name}##{#controller_action}(#{params_hash.inspect})|
extend_with_autorun unless #controller_instance.respond_to?(:autorun)
#controller_instance.autorun(#controller_action, params_hash, #env_obj, #user_obj)
end
private
def extend_with_autorun
def #controller_instance.autorun(action_name, action_params, action_env, current_user_value=nil)
self.params = action_params # suppress strong parameters exception
self.request = ActionDispatch::Request.new(action_env)
self.response = ActionDispatch::Response.new
define_singleton_method(:current_user, -> { current_user_value })
send(action_name) # do it
return StatusObject.new(true, nil)
rescue Exception => e
return StatusObject.new(false, e)
end
end
end

Bubblewrap HTTP -> Table View; method returns bubblewrap query instead of response data

I'm trying out Rubymotion and can't seem to do figure how to accomplish what seems like a simple task.
I've set up a UITableView for a directory of people. I've created a rails back end that returns json.
Person model has a get_people class method defined:
def self.get_people
BubbleWrap::HTTP.get("http://myapp.com/api.json") do |response|
#people = BW::JSON.parse(response.body.to_str)
# p #people prints [{"id"=>10, "name"=>"Sam"}, {etc}] to the console
end
end
In the directory_controller I just want to set an instance variable for #data to the array that my endpoint returns such that I can populate the table view.
I am trying to do #data = Person.get_people in viewDidLoad, but am getting an error message that indicates the BW response object is being passed instead: undefined methodcount' for #BubbleWrap::HTTP::Query:0x8d04650 ...> (NoMethodError)`
So if I hard code my array into the get_people method after the BW response block everything works fine. But I find that I am also unable to persist an instance variable through the close of the BW respond block.
def self.get_people
BubbleWrap::HTTP.get("http://myapp.com/api.json") do |response|
#people = BW::JSON.parse(response.body.to_str)
end
p #people #prints nil to the console
# hard coding [{"id"=>10, "name"=>"Sam"}, {etc}] here puts my data in the table view correctly
end
What am I missing here? How do I get this data out of bubblewrap's response object and in to a usable form to pass to my controllers?
As explained in the BW documentation "BW::HTTP wraps NSURLRequest, NSURLConnection and friends to provide Ruby developers with a more familiar and easier to use API. The API uses async calls and blocks to stay as simple as possible."
Due to async nature of the call, in your 2nd snippet you are printing #people before you actually update it. THe right way is to pass the new data to the UI after parsing ended (say for instance #table.reloadData() if #people array is supposed to be displayed in a UITableView).
Here's an example:
def get_people
BubbleWrap::HTTP.get("http://myapp.com/api.json") do |response|
#people = BW::JSON.parse(response.body.to_str)
update_result()
end
end
def update_result()
p #people
# do stuff with the updated content in #people
end
Find a more complex use case with a more elaborate explanation at RubyMotion async programming with BubbleWrap
Personally, I'd skip BubbleWrap and go for something like this:
def self.get_people
people = []
json_string = self.get_json_from_http
json_data = json_string.dataUsingEncoding(NSUTF8StringEncoding)
e = Pointer.new(:object)
hash = NSJSONSerialization.JSONObjectWithData(json_data, options:0, error: e)
hash["person"].each do |person| # Assuming each of the people is stored in the JSON as "person"
people << person
end
people # #people is an array of hashes parsed from the JSON
end
def self.get_json_from_http
url_string = ("http://myapp.com/api.json").stringByAddingPercentEscapesUsingEncoding(NSUTF8StringEncoding)
url = NSURL.URLWithString(url_string)
request = NSURLRequest.requestWithURL(url)
response = nil
error = nil
data = NSURLConnection.sendSynchronousRequest(request, returningResponse: response, error: error)
raise "BOOM!" unless (data.length > 0 && error.nil?)
json = NSString.alloc.initWithData(data, encoding: NSUTF8StringEncoding)
end

Create dynamic variables from th class name in tables, move td values into that row's array or hash?

I'm an amateur programmer wanting to scrape data from a site that is similar to this site: http://www.highschoolsports.net/massey/ (I have permission to scrape the site, by the way.)
The target site has 'th' classes for each 'th' in row[0] but I want to ensure that each 'TD' I pull from each table is somehow linked to that th's class name, because the tables are inconsistent, for example one table might be:
row[0] - >>th.name, th.place, th.team
row[1] - >>td[0], td[1] , td[2]
while another might be:
row[0] - >>th.place, th.team, th.name
row[1] - >>td[0], td[1] , td[2] etc..
My Question: How do I capture the 'th' class name across many hundreds of tables which are inconsistent(in 'th' class order) and create the 10-14 variables(arrays), then link the 'td' corresponding to that column in the table to that dynamic variable? Please let me know if this is confusing.. there are multiple tables on a given page..
Currently my code is something like:
require 'rubygems'
require 'mechanize'
require 'nokogiri'
require 'uri'
class Result
def initialize(row)
#attrs = {}
#attrs[:raw] = row.text
end
end
class Race
def initialize(page, table)
#handle = page
#table = table
#results = []
#attrs = {}
parse!
end
def parse!
#attrs[:name] = #handle.css('div.caption').text
get_rows
end
def get_rows
# get all of the rows ..
#handle.css('tr').each do |tr|
#results << RaceResult.new(tr)
end
end
end
class Event
class << self
def all(index_url)
events = []
ourl = Nokogiri::HTML(open(index_url))
ourl.css('a.event').each do |url|
abs_url = MAIN + url.attributes["href"]
events << Event.new(abs_url)
end
events
end
end
def initialize(url)
#url = url
#handle = nil
#attrs = {}
#races = []
#sub_events = []
parse!
end
def parse!
#handle = Nokogiri::HTML(open(#url))
get_page_meta
if(#handle.css('table.base.event_results').length > 0)
#handle.search('div.table_container.event_results').each do |table|
#races << Race.new(#handle, table)
end
else
#handle.css('div.centered a.obvious').each do |ol|
#sub_events << Event.new(BASE_URL + ol.attributes["href"])
end
end
end
def get_page_meta
#attrs[:name] = #handle.css('html body div.content h2 text()')[0] # event name
#attrs[:date] = #handle.xpath("html/body/div/div/text()[2]").text.strip #date
end
end
A friend has been helping me with this and I'm just starting to get a grasp on OOP but I'm only capture the tables and they're not split into td's and stored into some kind of variable/array/hash etc.. I need help understanding this process or how to do this. The critical piece would be dynamically assigning variable names according to the classes of the data and moving the 'td's' from that column (all td[2]'s for example) into that dynamic variable name. I can't tell you how amazing it would be if someone actually could help me solve this problem and understand how to make this work. Thank you in advance for any help!
It's easy once you realize that the th contents are the keys of your hash. Example:
#items = []
doc.css('table.masseyStyleTable').each do |table|
fields = table.css('th').map{|x| x.text.strip}
table.css('tr').each do |tr|
item = {}
fields.each_with_index do |field,i|
item[field] = tr.css('td')[i].text.strip rescue ''
end
#items << item
end
end

Resources