Why won't array take random, shuffle, limit, etc.?

In my Rails controller code I would like to randomly retrieve three of each content:
@content = Content.includes(:author).find(params[:id])
content_sub_categories = @content.subcategories
related_content = []
content_sub_categories.each do |sub_cat|
related_content << sub_cat.contents
end
@related_content = related_content.rand.limit(3)
rand.limit(3) isn't working, and the errors include:
undefined method `limit' for #<Array:0x007f9e19806bf0>
I'm familiar with Rails but still in the process of learning Ruby. Any help would be incredibly appreciated.
Could it also be because I am rendering the content this way: <%= @related_content %>?
I'm using:
Rails 3.2.14
Ruby 1.9.3

limit is a method on ActiveRecord relations (it adds LIMIT X to the generated SQL). However, you have an array, not a relation, hence the error.
The equivalent array method is take. You can of course combine both the shuffling and the limit into one step by using the sample method.
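As a rough illustration in plain Ruby (the items array is only a stand-in for the records in the question), shuffle followed by take does in two steps what sample does in one:

```ruby
# Stand-in data; in the question this would be an array of Content records.
items = (1..10).to_a

# Two steps: randomize the order, then keep the first three.
two_step = items.shuffle.take(3)

# One step: pick three distinct random elements.
one_step = items.sample(3)

p two_step
p one_step
```

Both results are plain arrays, so neither can be chained with limit afterwards.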

If you want to pick 3 random elements, use Array#sample:
related_content.sample(3)

This should work:
related_content = []
content_sub_categories.each do |sub_cat|
related_content << sub_cat.contents.sample(3) # add 3 random elements
end
@related_content = related_content
Or without temporary variables using map:
@related_content = @content.subcategories.map { |cat| cat.contents.sample(3) }
Note that @related_content is an array of arrays (each with up to 3 elements).
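If a single flat list is wanted instead, flat_map can pool every subcategory's contents into one array before sampling. A minimal sketch in plain Ruby, where the Subcategory struct and its data are hypothetical stand-ins for the ActiveRecord models:

```ruby
# Hypothetical stand-in for the Subcategory model and its contents association.
Subcategory = Struct.new(:name, :contents)

subcategories = [
  Subcategory.new("a", [1, 2, 3, 4]),
  Subcategory.new("b", [4, 5, 6]),
  Subcategory.new("c", [7, 8])
]

# Nested result: one array of up to 3 random items per subcategory.
nested = subcategories.map { |cat| cat.contents.sample(3) }

# Flat result: pool all contents, drop duplicates, then pick 3 overall.
flat = subcategories.flat_map(&:contents).uniq.sample(3)

p nested
p flat
```

With real ActiveRecord associations this loads every record into memory first, so it is only a sketch of the array-side behaviour.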

How about this?
a = (1..10).to_a
p a.sample(3)
# >> [4, 10, 7]

Here's the final answer for finding the content's subcategories, all of those subcategories' contents, and displaying the content without repeats:
def show
  @content = Content.includes(:author).find(params[:id])
  related_content = @content.subcategories.pluck(:id)
  @related_content = Content.joins(:subcategories).published.order('random()').limit(3).where(subcategories: { id: related_content}).where('"contents"."id" <> ?', @content.id)
end

Related

Looking for a cleaner way to scrape from website by avoiding repeating

Hi, I am just doing a bit of refactoring on a small CLI web scraping project I did in Ruby, and I was wondering if there is a cleaner way to write a particular section without repeating the code too much.
Basically, with the code below, I pulled data from a website, but I had to do this per page. You will notice that the two methods differ only in their name and the source.
def self.scrape_first_page
html = open("https://www.texasblackpages.com/united-states/san-antonio")
doc = Nokogiri::HTML(html)
doc.css('div.grid_element').each do |business|
biz = Business.new
biz.name = business.css('a b').text
biz.type = business.css('span.hidden-xs').text
biz.number = business.css('span.sm-block.lmargin.sm-nomargin').text.gsub("\r\n","").strip
end
end
def self.scrape_second_page
html = open('https://www.texasblackpages.com/united-states/san-antonio?page=2')
doc = Nokogiri::HTML(html)
doc.css('div.grid_element').each do |business|
biz = Business.new
biz.name = business.css('a b').text
biz.type = business.css('span.hidden-xs').text
biz.number = business.css('span.sm-block.lmargin.sm-nomargin').text.gsub("\r\n","").strip
end
end
Is there a way for me to streamline this process with just one method pulling from one source, but with the ability to access different pages within the same site, or is this pretty much the best and only way? The owners of the website do not have a public API for me to pull from, in case anyone is wondering.

Remember that in programming you want to steer towards code that follows the Zero, One or Infinity Rule and avoids the dreaded two. In other words, write methods that take no arguments, fixed arguments (one), or an array of unspecified size (infinity).
So the first step is to clean up the scraping function to make it as generic as possible:
require 'nokogiri'
require 'open-uri'

def scrape(page)
  # URI.open comes from open-uri; on Ruby 3.0+ the bare open(page) form no longer fetches URLs
  doc = Nokogiri::HTML(URI.open(page))
  # Use map here to return an array of Business objects
  doc.css('div.grid_element').map do |business|
    Business.new.tap do |biz|
      # Use tap to modify this object before returning it
      biz.name = business.css('a b').text
      biz.type = business.css('span.hidden-xs').text
      biz.number = business.css('span.sm-block.lmargin.sm-nomargin').text.gsub("\r\n","").strip
    end
  end
end
Note that apart from the extraction code, there's nothing specific about this. Takes a URL, returns Business objects in an Array.
In order to generate pages 1..N, consider this:
def pages(base_url, start: 1)
page = start
Enumerator.new do |y|
loop do
y << base_url % page
page += 1
end
end
end
Now that's an infinite series, but you can always cap it to whatever you want with take(n) or by instead looping until you get an empty list:
# Collect all business from each of the pages...
businesses = pages('https://www.texasblackpages.com/united-states/san-antonio?page=%d').lazy.map do |page|
# ...by scraping the page...
scrape(page)
end.take_while do |results|
# ...and iterating until there's no results, as in Array#any? is false.
results.any?
end.to_a.flatten
The .lazy part means "evaluate each part of the chain sequentially" as opposed to the default behaviour of trying to evaluate each stage to completion. This is important or else it will try and download an infinite number of pages before moving to the next test.
The .to_a on the end forces that chain to run to completion. The .flatten squishes all the page-wise results into a single result set.
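To see the lazy behaviour without touching the network, here is a small sketch. The fake_scrape lambda and FAKE_SITE data are hypothetical stand-ins for scrape and the real site; they only record which pages were requested:

```ruby
# Hypothetical stand-in for scrape: pages 1-3 have results, later pages are empty.
FAKE_SITE = { 1 => %w[a b], 2 => %w[c], 3 => %w[d e] }.freeze
fetched = []

fake_scrape = lambda do |page|
  fetched << page           # record which pages were actually requested
  FAKE_SITE.fetch(page, []) # unknown pages return an empty list
end

# Infinite series of page numbers, evaluated lazily one page at a time.
results = (1..Float::INFINITY).lazy
  .map { |page| fake_scrape.call(page) }
  .take_while { |items| items.any? }
  .to_a
  .flatten

p results # => ["a", "b", "c", "d", "e"]
p fetched # => [1, 2, 3, 4]
```

Only four pages are requested: the three with results plus the first empty one, which ends the chain. Without .lazy, the map step would never finish on an infinite range.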
Of course if you want to scrape the first N pages, it's a lot easier:
pages('https://www.texasblackpages.com/.../san-antonio?page=%d').take(n).flat_map do |page|
scrape(page)
end
It's almost no code!
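For reference, a runnable sketch of what the pages helper yields when capped with take. The example.com URL is a placeholder, and moving the page counter inside the Enumerator block is my own adjustment so that each enumeration restarts at start:

```ruby
# Same idea as the pages helper above, restated so the snippet runs on its own.
def pages(base_url, start: 1)
  Enumerator.new do |y|
    page = start # counter lives inside the block, so every enumeration restarts here
    loop do
      y << format(base_url, page) # fill the %d placeholder with the page number
      page += 1
    end
  end
end

# example.com is a placeholder URL, not the site from the question.
series = pages('https://example.com/listing?page=%d')
urls = series.take(3)
puts urls

# Enumerating again starts from page 1 because the counter is local to the block.
again = series.take(2)
puts again
```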

This was suggested by @Todd A. Jacobs:
def self.scrape(url)
  html = open(url)
  doc = Nokogiri::HTML(html)
  doc.css('div.grid_element').each do |business|
    biz = Business.new
    biz.name = business.css('a b').text
    biz.type = business.css('span.hidden-xs').text
    biz.number = business.css('span.sm-block.lmargin.sm-nomargin').text.gsub("\r\n","").strip
  end
end
The downside is that, with no public API, I had to invoke the method as many times as I needed, since the URLs represent different pages within the website. But this is fine, because I was able to get rid of the repeating methods.
def make_listings
Scraper.scrape("https://www.texasblackpages.com/united-states/san-antonio")
Scraper.scrape("https://www.texasblackpages.com/united-states/san-antonio?page=2")
Scraper.scrape("https://www.texasblackpages.com/united-states/san-antonio?page=3")
Scraper.scrape("https://www.texasblackpages.com/united-states/san-antonio?page=4")
end

I once had the same problem, and I solved it with a loop. Usually, if a site supports pagination, the first page can also be requested with the page query parameter.
def self.scrape
  page = 1
  loop do
    url = "https://www.texasblackpages.com/united-states/san-antonio?page=#{page}"
    # parse the page with Nokogiri
    # scrape the data
    page += 1
  end
end
You can break out of the loop when a certain page condition is met.
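One possible break condition is "the page returned no results". A minimal sketch of that, where fetch_page is a hypothetical stand-in for the Nokogiri parsing and scraping steps:

```ruby
# Hypothetical stand-in for fetching and parsing one page of results.
def fetch_page(page)
  page <= 3 ? ["item #{page}"] : []
end

all_items = []
page = 1
loop do
  items = fetch_page(page)
  break if items.empty? # stop at the first page with no results
  all_items.concat(items)
  page += 1
end

p all_items
# => ["item 1", "item 2", "item 3"]
```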

How to get the second element of an array in an array ruby

I have a set as follows:
verb_tag_set = Set.new ["VB", "VBD", "VBG", "VBN", "VBP", "VBZ", "MD"]
My tagged_text array contains the following:
tagged_text = [["VB", "go"], ["VBG", "going"]]
I am trying to get the second element of each array, selecting only those arrays whose first element matches one of the items in verb_tag_set.
verb_tagged_array = tagged_text.select{|el| el[1] if verb_tag_set.include?(el[0])}
verb_tagged_array.map{|row| row[1]}
Although this works, I should be able to get the array in one line.
Any ideas on how to refine this code?
NOOB with ruby so any help appreciated.

It appears you want the following.
require 'set'
verb_tag_set = Set.new ["VB", "VBD", "VBG", "VBN", "VBP", "VBZ", "MD"]
tagged_text = [["VB", "go"], ["VBG", "going"]]
tagged_text.select { |a,b| verb_tag_set.include?(a) }.map(&:last)
#=> ["go", "going"]
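A single-pass variant is also possible, assuming Ruby 2.7 or later, where Enumerable#filter_map is available. The extra ["NN", "dog"] pair is added here only to show a non-matching tag being dropped:

```ruby
require 'set'

verb_tag_set = Set.new ["VB", "VBD", "VBG", "VBN", "VBP", "VBZ", "MD"]
tagged_text = [["VB", "go"], ["NN", "dog"], ["VBG", "going"]]

# filter_map keeps the block's truthy results and drops nil/false,
# so the select and the map happen in one pass.
verbs = tagged_text.filter_map { |tag, word| word if verb_tag_set.include?(tag) }

p verbs
# => ["go", "going"]
```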

require 'set'
verb_tag_set = Set.new ["VB", "VBD", "VBG", "VBN", "VBP", "VBZ", "MD"]
tagged_text = [["VB", "go"], ["VBG", "going"]]
verb_tag_set.map(&tagged_text.to_h.method(:[])).compact
#⇒ [ "go", "going" ]
Here we use the tagged_text as a mapper function for the set, getting rid of nils afterwards.
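Broken into steps, the one-liner looks roughly like this. The intermediate variable names are mine, and the last lines show a caveat of going through to_h: a repeated tag keeps only its last word.

```ruby
require 'set'

verb_tag_set = Set.new ["VB", "VBD", "VBG", "VBN", "VBP", "VBZ", "MD"]
tagged_text = [["VB", "go"], ["VBG", "going"]]

# Step 1: turn the [tag, word] pairs into a tag => word lookup hash.
lookup = tagged_text.to_h

# Step 2: look up every tag in the set; tags without a word give nil.
looked_up = verb_tag_set.map { |tag| lookup[tag] }
p looked_up
# => ["go", nil, "going", nil, nil, nil, nil]

# Step 3: drop the nils.
result = looked_up.compact
p result
# => ["go", "going"]

# Caveat: to_h keeps only the last pair for a repeated tag.
collapsed = [["VB", "go"], ["VB", "run"]].to_h
p collapsed["VB"]
# => "run"
```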

Getting typed results from ActiveRecord raw SQL

In Sequel, I can do:
irb(main):003:0> DB["select false"].get
=> false
Which returns a false boolean. I'd like to be able to do something similar in ActiveRecord:
irb(main):007:0> ActiveRecord::Base.connection.select_value "select false"
=> "f"
As you can see, it returns the string "f". Is there a way to get a false boolean with ActiveRecord? (Similarly, I might be calling a function that returns a timestamptz, an array, etc -- I'd want the returned value to have the correct type)
My use case: I'm calling a database function, want to get back a typed result instead of a string.

While I have no doubt that Björn Nilsson's answer worked when he posted it, it is failing for me with Postgres 9.4 and PG gem version 0.18.2. I have found the following to work after looking through the PG gem documentation:
pg = ActiveRecord::Base.connection
@type_map ||= PG::BasicTypeMapForResults.new(pg.raw_connection)
res = pg.execute("SELECT 'abc'::TEXT AS a, 123::INTEGER AS b, 1.23::FLOAT;")
res.type_map = @type_map
res[0]
# => {"a"=>"abc", "b"=>123, "float8"=>1.23}

Pretty ugly but does what you are asking for:
res = ActiveRecord::Base.connection.
select_all("select 1 as aa, false as aas, 123::varchar, Array[1,2] as xx")
# Breaks unless returned values have unique column names
res.map{|row|row.map{|col,val|res.column_types[col].type_cast val}}
# Don't care about no column names
res.map{|row|
row.values.map.with_index{|val,idx|
res.column_types.values[idx].type_cast val
}
}
gives:
[[1, false, "123", [1, 2]]]
How it works:
res.column_types
returns a hash of column names and PostgreSQL column types.
Here is a pointer to how it works:
https://github.com/rails/docrails/blob/fb8ac4f7b8487e4bb5c241dc0ba74da30f21ce9f/activerecord/lib/active_record/connection_adapters/postgresql/oid/float.rb

I don't have enough reputation points to comment, but Björn's answer and the associated replies are broken in Rails 5. This works:
res = ActiveRecord::Base.connection.select_all(sql)
res.to_a.map{|o| o.each{|k, v| o[k] = res.column_types[k].cast v}}

I don't know if it is the right way, but you can create an ActiveRecord model without a table, using a sort of fake column:
class FunctionValue < ActiveRecord::Base
  def self.columns
    @columns ||= []
  end
  def self.column(name, sql_type = nil, default = nil, null = true)
    columns << ActiveRecord::ConnectionAdapters::Column.new(
      name.to_s,
      default,
      sql_type.to_s,
      null
    )
  end
  column :value, :boolean
end
And then you can run this:
function_value = FunctionValue.find_by_sql('select false as value').first
function_value.value

This works for me in Rails 5:
results = ActiveRecord::Base.connection.select_all(sql)
results.rows.map{ |row| Hash[results.columns.zip(row)] }
Gives nice results
[{"person1"=>563, "person2"=>564, "count"=>1},
{"person1"=>563, "person2"=>566, "count"=>5},
{"person1"=>565, "person2"=>566, "count"=>1}]
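The zip step can be tried without a database. In this sketch the columns and rows arrays are hypothetical stand-ins for results.columns and results.rows:

```ruby
# Hypothetical stand-ins for the columns/rows pair of a select_all result.
columns = ["person1", "person2", "count"]
rows = [[563, 564, 1], [563, 566, 5]]

# Pair each column name with the value at the same position in the row.
hashes = rows.map { |row| columns.zip(row).to_h }

p hashes.first["count"]
# => 1
p hashes.last["person2"]
# => 566
```

Note that this only reshapes the rows; whether the values are typed depends on what the adapter returned.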

In Rails 6, Person.connection.select_all(sql_query).to_a
...will return an array of hashes whose values are typecast. Example:
[{"id"=>12, "name"=>"John Doe", "vip_client"=>false, "foo"=> nil, "created_at"=>2018-01-24 23:55:58 UTC}]
If you prefer an OpenStruct, use Mike's suggestion:
Person.connection.select_all(sql_query).to_a.map {|r| OpenStruct.new(r) }
If you prefer symbols as keys, call map(&:symbolize_keys) after to_a.
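Both conversions can be tried on plain hashes. A sketch with a made-up row, using Hash#transform_keys (Ruby 2.5+) as a plain-Ruby counterpart of ActiveSupport's symbolize_keys:

```ruby
require 'ostruct'

# Made-up row shaped like the array of hashes shown above.
rows = [{ "id" => 12, "name" => "John Doe", "vip_client" => false }]

# Wrap each row so columns read like attributes.
structs = rows.map { |r| OpenStruct.new(r) }
puts structs.first.name

# Convert the string keys to symbols without ActiveSupport.
symbolized = rows.map { |r| r.transform_keys(&:to_sym) }
puts symbolized.first[:id]
```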

calling .uniq on capybara xpath selectors and logically bypassing Capybara::ElementNotFound

I am working on a JavaScript-capable screen scraper using capybara/dsl, Selenium WebDriver, and the spreadsheet gem. I am very close to the desired output; however, two major problems arise:
1. I have not been able to figure out the exact XPath selector to filter out only the elements I'm looking for. To ensure that none are missing, I am using a broad selector that I know will produce duplicate elements. I was planning on just calling .uniq on that selector, but this throws an undefined method error for 'uniq'. Maybe I'm not using it properly: results = all("//a[contains(@onclick, 'analyticsLog')]").uniq. What is the proper way to get the desired filtering? I know that the XPath I have chosen to extract hrefs, //a[contains(@onclick, 'analyticsLog')], matches more nodes than I intended, because using find to inspect the page elements shows 144 rather than the 72 that make up the page results. I have looked for a more specific selector, but I haven't been able to find one that doesn't filter out some desired links, due to the business logic used on the site.
2. My save_item method has two selectors that are not always found within the info results. I would like the script to skip those that aren't found and save only the ones that are; however, my current iteration throws a Capybara::ElementNotFound and exits. How could I configure this to work in the intended way?
Code below:
require "capybara/dsl"
require "spreadsheet"
Capybara.run_server = false
Capybara.default_driver = :selenium
Capybara.default_selector = :xpath
Spreadsheet.client_encoding = 'UTF-8'
class Tomtop
include Capybara::DSL
def initialize
@excel = Spreadsheet::Workbook.new
@work_list = @excel.create_worksheet
@row = 0
end
def go
visit_main_link
end
def visit_main_link
visit "http://www.some.com/clothing-accessories?dir=asc&limit=72&order=position"
results = all("//a[contains(@onclick, 'analyticsLog')]") # I would like to use .uniq here to filter out the duplicates that I know will be delivered by this selector
item = []
results.each do |a|
item << a[:href]
end
item.each do |link|
visit link
save_item
end
@excel.write "inventory.csv"
end
def save_item
data = all("//*[@id='content-wrapper']/div[2]/div/div")
data.each do |info|
@work_list[@row, 0] = info.find("//*[@id='productright']/div/div[1]/h1").text
@work_list[@row, 1] = info.find("//div[contains(@class, 'price font left')]").text
@work_list[@row, 2] = info.find("//*[@id='productright']/div/div[11]").text
@work_list[@row, 3] = info.find("//*[@id='tabcontent1']/div/div").text.strip
@work_list[@row, 4] = info.find("//select[contains(@name, 'options[747]')]//*[@price='0']").text # I'm aware that this will not always be found depending on the item in question, but how do I ensure that it doesn't crash the program?
@work_list[@row, 5] = info.find("//select[contains(@name, 'options[748]')]//*[@price='0']").text # I'm aware that this will not always be found depending on the item in question, but how do I ensure that it doesn't crash the program?
@row = @row + 1
end
end
end
tomtop = Tomtop.new
tomtop.go

For Question 1: Get unique elements
All of the elements returned by all are unique. Therefore, I assume by "unique" elements, you mean that the "onclick" attribute is unique.
The collection of elements returned by Capybara is an enumerable. Therefore, you can convert it to an array and then take the unique elements based on their onclick attribute:
results = all("//a[contains(@onclick, 'analyticsLog')]")
.to_a.uniq{ |e| e[:onclick] }
Note that it looks like the duplicate links are due to one for the image and one for the text below the image. You could scope your search to just one or the other and then you would not need to do the uniq check. To scope to just the text link, use the fact that the link is a child of an h5:
results = all("//h5/a[contains(@onclick, 'analyticsLog')]")
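The block form of uniq is ordinary Array#uniq, so it can be tried without a browser. In this sketch the Link struct and its data are hypothetical stand-ins for Capybara elements, which expose attributes through [] in a similar way:

```ruby
# Hypothetical stand-in for a page link.
Link = Struct.new(:href, :onclick)

links = [
  Link.new("/item/1", "analyticsLog(1)"), # image link
  Link.new("/item/1", "analyticsLog(1)"), # text link for the same item
  Link.new("/item/2", "analyticsLog(2)")
]

# With a block, uniq compares the block's return value and keeps the first match.
unique_links = links.uniq { |link| link[:onclick] }

p unique_links.map(&:href)
# => ["/item/1", "/item/2"]
```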
For Question 2: Capture text if element present
To solve your second problem, you could use first to locate the element. This will return the matching element if one exists and nil if one does not. You could then save the text if the element is found.
For example:
el = info.first("//select[contains(@name, 'options[747]')]//*[@price='0']")
@work_list[@row, 4] = el.text if el
If you want the text of all matching elements, then use all:
options = info.all(".//select[contains(@name, 'options[747]')]//*[@price='0']")
@work_list[@row, 4] = options.collect(&:text).join(', ')
When there are multiple matching options, you will get something like "Green, Pink". If there are no matching options, you will get "".
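Both patterns can be sketched in plain Ruby with a hypothetical Option struct standing in for a matched element. One caveat, as far as I know: since Capybara 3.0, first raises ElementNotFound when nothing matches unless minimum: 0 is passed, so the nil check applies to older versions or to calls with that option.

```ruby
# Hypothetical stand-in for a matched element; only text is modelled.
Option = Struct.new(:text)

# Pattern 1: a lookup that may come back as nil when nothing matches.
found = Option.new("Green")
missing = nil
found_text = found&.text     # safe navigation returns the text
missing_text = missing&.text # and nil instead of raising NoMethodError
p found_text
p missing_text

# Pattern 2: collecting text from zero or more matches.
joined = [Option.new("Green"), Option.new("Pink")].collect(&:text).join(', ')
empty_joined = [].collect(&:text).join(', ')
p joined
p empty_joined
```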

Ruby: Shorthand for associative hash/array from mysql2 result set

So, new to ruby and struggling on this:
images table - 2 columns, filename and md5
using the mysql2 extension
images = Hash.new
results = client.query("SELECT * FROM images").each do |row|
images[row['filename']] = row['md5']
end
I'd like to just do this automatically; it seems pointless to loop through to make a hash. I think that I have missed something?

You can try the following:
images = Hash[*Image.all.map{ |i| [i.filename, i.md5] }.flatten]
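If there is no Image model and only row hashes like those in the question, the same hash can be built with map and to_h. A sketch with an array of made-up rows standing in for the query result:

```ruby
# Stand-in for the query result: row hashes keyed by column name (values are made up).
rows = [
  { 'filename' => 'a.png', 'md5' => '0cc175b9' },
  { 'filename' => 'b.png', 'md5' => '92eb5ffe' }
]

# Build [key, value] pairs and convert them to a hash in one expression.
images = rows.map { |row| [row['filename'], row['md5']] }.to_h

p images['a.png']
# => "0cc175b9"
```

This avoids the flatten and splat, which would misbehave if any value were itself an array.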
