Using Nokogiri to scrape itemprop data - ruby

I have a div which looks like the following and I am trying to scrape the itemprop datetime data but I can't seem to get it to work.
<time itemprop="startDate" datetime="2019-03-28T19:00:00">
Thursday, March 28, 2019
</time>
The script below pulls the text for the date just fine (e.g., Thursday, March 28, 2019), but the time selector throws this error:
undefined method `text' for nil:NilClass (NoMethodError)
I've searched Stack Overflow and tried to map the time data, but nothing works.
require 'rubygems'
require 'nokogiri'
require 'open-uri'
my_local_filename = "C:/data-hold-classes/Santa Fe College" + ".html"
# The "r" mode belongs to File.open, not to Nokogiri::HTML
data = Nokogiri::HTML(File.open(my_local_filename, "r"))
classes = data.css(".col-xs-7")
classes.each do |item|
  # `class` is a reserved word in Ruby, so the variable is named class_name
  class_name = item.at_css("a b").text.strip #=> All details
  date = item.at_css("a > div > time").text.strip #==> Thursday, March 28, 2019
  #time = item.at_css("a datetime").text.strip #==>
  puts class_name
  puts date
  #puts time
  puts " "
end
My goal is to pull the datetime portion of the div so I can format it as time (e.g., 8:00PM)

The line item.at_css("a > div > time") returns the time element.
a > div > time is a nested path to that element. What you want, though, is datetime, which is an attribute of that element rather than an HTML element itself. The selector a datetime therefore matches nothing (there is no datetime element), at_css returns nil, and calling text on nil raises the NoMethodError.
You can read the attribute with:
item.at_css("a > div > time")["datetime"].strip
Hope it helps :D
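Once you have the attribute value, formatting it as a clock time needs only the standard library. The following is a minimal sketch, assuming the string is exactly the ISO 8601 value shown in the question; the variable names are illustrative and not taken from the original script.

```ruby
require 'time'

# Value copied from the datetime attribute in the question's markup.
datetime_attr = "2019-03-28T19:00:00"

# Time.parse understands ISO 8601 strings; with no offset in the string,
# the local time zone is assumed.
start_time = Time.parse(datetime_attr)

# %-I is the 12-hour clock without zero padding, %M the minutes, %p AM/PM.
formatted = start_time.strftime("%-I:%M%p")
puts formatted # prints 7:00PM
```

In the scraping loop, datetime_attr would be whatever item.at_css("a > div > time")["datetime"] returns.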

Related

Ruby - how to retrieve text after a div with Nokogiri

I am trying to retrieve the date and time info from the code below (Target Code). I can pull the class name but not date and time.
class_name = events.at_css('div.classTitle b').text
date = events.at_css('.classTitle') ["eventTime"]
time = events.at_css('.classTime span')
p class_name
p date
p time
I get the class name but nil for date and time
Target code
<div class="classTitle"><b>Astronomy 101</b></div>
<div class="classTime">
Friday, May 3, 2019<span class="smalltype"> at</span> 7:00PM</div>
<br>
You want the Node#content method:
This is index.html:
<div class="classTitle"><b>Astronomy 101</b></div>
<div class="classTime">
Friday, May 3, 2019<span class="smalltype"> at</span> 7:00PM
</div>
<br>
This is test.rb:
require 'nokogiri'
events = Nokogiri::HTML(File.open('index.html'))
# Split on ' at ' (with spaces) so day names containing "at", such as Saturday, stay intact
date, time = events.at_css('div.classTime').content.strip.split(' at ')
puts date #=> Friday, May 3, 2019
puts time #=> 7:00PM
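One thing to watch when splitting the text content: a bare 'at' also occurs inside words such as Saturday. The sketch below uses a hypothetical helper, split_event_time, which splits on the standalone word only; the sample string imitates what content would return for a Saturday event and is an assumption, not data from the question.

```ruby
# Hypothetical helper: splits "date at time" text into its two parts.
# The regex requires whitespace on both sides of "at", and the limit of 2
# keeps everything after the first match together.
def split_event_time(text)
  text.strip.split(/\s+at\s+/, 2)
end

sample = "\nSaturday, May 4, 2019 at 7:00PM\n"
date, time = split_event_time(sample)
puts date
puts time
```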

Cucumber Date Ranges Ruby

I've seen some similar questions on here regarding date ranges, yet none of the solutions seem to work for me. What I'm trying to do is define a date range covering one month and confirm whether today's date falls within that range. Eventually, this will be put into a case statement for every month of the year, as the functionality I'm testing is date specific.
I tried to convert the dates to integer to make the calculation easier (or so I thought) then use between? to check the date range.
This is my code:
today = Time.now.to_i
month_start = Time.parse('1 Jan 2016').to_i
month_end = Time.parse('31 Jan 2016').to_i
if today.between?(month_start,month_end)
#do something
end
When having a puts on each variable, this is the output:
Today = 1468479863
month_start = 1451606400
month_end = 1454198400
As you can see, this should fail as today is not between the date range, it's far outside it. Yet, the tests are going green which would suggest my if statement containing the between? method isn't working.
Is there something blindingly obvious that I'm missing here as I can't see it.
The step in the feature is here:
Then(/^I can see that all results have a statement due this month$/) do
  today = Time.now.to_i
  month_start = Time.parse('1 Jan 2016').to_i
  month_end = Time.parse('31 Jan 2016').to_i
  results_table = all('table#clickable-rows tbody tr')
  if today.between?(month_start,month_end)
    results_table.each do |row|
      within(row.all('td')[3]) do
        statement_date = find('table#clickable-rows tbody tr td:nth-child(4) > span')
        expect(statement_date).to have_text '1 Aug 2016'
      end
    end
  end
end
Your tests pass because, when today.between?(month_start,month_end) returns false, no expectations are run!
You might want to consider changing the if into an expectation in itself:
Then(/^I can see that all results have a statement due this month$/) do
  today = Time.now.to_i
  month_start = Time.parse('1 Jan 2016').to_i
  month_end = Time.parse('31 Jan 2016').to_i
  results_table = all('table#clickable-rows tbody tr')
  expect(today).to be_between(month_start,month_end)
  results_table.each do |row|
    within(row.all('td')[3]) do
      statement_date = find('table#clickable-rows tbody tr td:nth-child(4) > span')
      expect(statement_date).to have_text '1 Aug 2016'
    end
  end
end
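As an aside on the range check itself: Time.parse('31 Jan 2016') is midnight at the start of the 31st, so an integer comparison against it leaves out most of the last day. A minimal sketch of the same check using Date objects follows; in_month? is a hypothetical helper name, not part of the original step.

```ruby
require 'date'

# Hypothetical helper: true when `day` falls in the given calendar month.
def in_month?(day, year, month)
  month_start = Date.new(year, month, 1)
  month_end   = Date.new(year, month, -1) # -1 means the last day of the month
  (month_start..month_end).cover?(day)
end

puts in_month?(Date.new(2016, 1, 31), 2016, 1) # true
puts in_month?(Date.new(2016, 7, 14), 2016, 1) # false
```

In a step definition, the first argument would be Date.today.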
I think you mis-typed this:
month_end = Time.parse('1 Jan 2016').to_i
Shouldn't it be '31 Jan 2016' instead?
Your whole test seems a bit jumbled - I think from the description of the step what you actually want is
Then(/^I can see that all results have a statement due this month$/) do
  expected_date = /\A#{Date.today.strftime('1 %b %Y')}\Z/ # regex needed so it doesn't match 11 Jul 2016 or 21 Jul 2016
  results_table = all('table#clickable-rows tbody tr')
  results_table.each do |row|
    expect(row.find('td:nth-child(4)')).to have_text(expected_date)
  end
end
The results_table.each block could have also been written as
expect(results_table).to RSpec.all(have_selector('td:nth-child(4)', text: expected_date))
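To see why the anchored regex matters, here is a small sketch in plain Ruby, outside Capybara. A fixed date stands in for Date.today so the result is reproducible, and reference_day is an illustrative name. It uses \z, which matches only at the very end of the string, whereas \Z also tolerates a trailing newline.

```ruby
require 'date'

# A fixed date stands in for Date.today so the output does not change.
reference_day = Date.new(2016, 7, 14)
expected_date = /\A#{reference_day.strftime('1 %b %Y')}\z/

puts expected_date.match?('1 Jul 2016')  # true
puts expected_date.match?('11 Jul 2016') # false
puts expected_date.match?('21 Jul 2016') # false
```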

capybara - Find with xPath is leaving the within scope

I am trying to build a date selector with Capybara using the default Rails date, time, and datetime fields. I am using the within method to find the select boxes for the field, but when I use XPath to find the correct box, it leaves the within scope and finds the first occurrence of the element on the page.
Here is the code I am using. The page I am testing has 2 datetime fields, but I can only get it to change the first because of this error. At the moment I have a div container with an id that wraps the datetime field, but I do plan on switching the code to find by the label.
module Marketron
  module DateTime
    def select_date(field, options = {})
      date_parse = Date.parse(options[:with])
      year = date_parse.year.to_s
      month = date_parse.strftime('%B')
      day = date_parse.day.to_s
      within("div##{field}") do
        find(:xpath, "//select[contains(@id, \"_#{FIELDS[:year]}\")]").select(year)
        find(:xpath, "//select[contains(@id, \"_#{FIELDS[:month]}\")]").select(month)
        find(:xpath, "//select[contains(@id, \"_#{FIELDS[:day]}\")]").select(day)
      end
    end
    def select_time(field, options = {})
      require "time"
      time_parse = Time.parse(options[:with])
      hour = time_parse.hour.to_s.rjust(2, '0')
      minute = time_parse.min.to_s.rjust(2, '0')
      within("div##{field}") do
        find(:xpath, "//select[contains(@id, \"_#{FIELDS[:hour]}\")]").find(:xpath, "option[contains(@value, '#{hour}')]").select_option
        find(:xpath, "//select[contains(@id, \"_#{FIELDS[:minute]}\")]").find(:xpath, "option[contains(@value, '#{minute}')]").select_option
      end
    end
    def select_datetime(field, options = {})
      select_date(field, options)
      select_time(field, options)
    end
    private
    FIELDS = {year: "1i", month: "2i", day: "3i", hour: "4i", minute: "5i"}
  end
end
World(Marketron::DateTime)
You should specify in the xpath that you want to start with the current node by adding a . to the start:
find(:xpath, ".//select[contains(@id, \"_#{FIELDS[:year]}\")]")
Example:
I tested an HTML page of this (hopefully not over simplifying your page):
<html>
<div id='div1'>
<span class='container'>
<span id='field_01'>field 1</span>
</span>
</div>
<div id='div2'>
<span class='container'>
<span id='field_02'>field 2</span>
</span>
</div>
</html>
Using the within methods, you can see your problem when you do this:
within("div#div1"){ puts find(:xpath, "//span[contains(@id, \"field\")]").text }
#=> field 1
within("div#div2"){ puts find(:xpath, "//span[contains(@id, \"field\")]").text }
#=> field 1
But you can see that by specifying that the XPath should look within the current node (i.e., using .), you get the results you want:
within("div#div1"){ puts find(:xpath, ".//span[contains(@id, \"field\")]").text }
#=> field 1
within("div#div2"){ puts find(:xpath, ".//span[contains(@id, \"field\")]").text }
#=> field 2

How do I add two weeks to Time.now?

How can I add two weeks to the current Time.now in Ruby? I have a small Sinatra project that uses DataMapper and before saving, I have a field populated with the current time PLUS two weeks, but is not working as needed. Any help is greatly appreciated! I get the following error:
NoMethodError at /
undefined method `weeks' for 2:Fixnum
Here is the code for the Model:
class Job
  include DataMapper::Resource
  property :id, Serial
  property :position, String
  property :location, String
  property :email, String
  property :phone, String
  property :description, Text
  property :expires_on, Date
  property :status, Boolean
  property :created_on, DateTime
  property :updated_at, DateTime
  before :save do
    t = Time.now
    self.expires_on = t + 2.week
    self.status = '0'
  end
end
You don't have such nice helpers in plain Ruby. You can add seconds:
Time.now + (2*7*24*60*60)
But, fortunately, there are many date helper libraries out there (or build your own ;) )
Ruby's Date class has methods to add days and months, in addition to adding seconds to a Time.
An example:
require 'date'
t = DateTime.now
puts t # => 2011-05-06T11:42:26+03:00
# Add 14 days
puts t + 14 # => 2011-05-20T11:42:26+03:00
# Add 2 months
puts t >> 2 # => 2011-07-06T11:42:26+03:00
# And if needed, make Time object out of it
(t + 14).to_time # => 2011-05-20 11:42:26 +0300
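For the model in the question, where expires_on is a Date property, both approaches can be written with the standard library alone. This is a sketch rather than a drop-in fix; days_from and SECONDS_PER_DAY are illustrative names. Note that adding seconds and adding calendar days can differ by an hour across a daylight saving change.

```ruby
require 'date'

SECONDS_PER_DAY = 24 * 60 * 60

# Hypothetical helper: shifts a Time by a whole number of days, in seconds.
def days_from(time, days)
  time + days * SECONDS_PER_DAY
end

puts days_from(Time.now, 14) # a Time two weeks from now
puts Date.today + 14         # a Date two weeks from today
```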
require 'rubygems'
require 'active_support/core_ext/numeric/time'
self.expires = 2.weeks.from_now
Time arithmetic works in seconds, but you can use a helper to get the number of seconds in a given span:
Time.now + 2.week.to_i
EDIT: As mentioned by @iain, you will need Active Support to use 2.week.to_i. If you can't (or don't want to) take on this dependency, you can always use the + operator to add seconds to a Time instance (time + numeric → time):
Time.now + (60 * 60 * 24 * 7 * 2)
I think week/weeks is defined in the active support numeric extension
$ ruby -e 'p Time.now'
2011-05-05 22:27:04 -0400
$ ruby -r active_support/core_ext/numeric -e 'p Time.now + 2.weeks'
2011-05-19 22:27:07 -0400
You can use these 3 patterns
# Without this require you get a NoMethodError (undefined method `weeks')
require 'active_support/all'
# Tue, 28 Nov 2017 11:46:37 +0900
Time.now + 2.weeks
# Tue, 28 Nov 2017 11:46:37 +0900
Time.now + 2.week
# Tue Nov 28 11:48:24 +0900 2017
2.weeks.from_now
<% current_time = Time.now
current_time_s = current_time.strftime('%Y-%m-%d %H:%M:%S') # current date and time
current_time = Time.now + (60 * 60 * 24 * 7 * 2)
current_time_e = current_time.strftime('%Y-%m-%d %H:%M:%S') # date and time two weeks from now
%>
I like mine too :)
require 'date'
def minor?(dob)
  n = DateTime.now
  a = DateTime.parse(dob)
  # >> shifts by months, so this is the 18th birthday compared with now
  a >> 12*18 > n
end
Saves you the trouble of thinking about leap years and seconds. Just works out of the box.

How to check whether a string contains today's date in a specific format

I'm parsing some RSS feeds that aggregate what's going on in a given city. I'm only interested in the stuff that is happening today.
At the moment I have this:
require 'rubygems'
require 'rss/1.0'
require 'rss/2.0'
require 'open-uri'
require 'shorturl'
source = "http://rss.feed.com/example.xml"
content = ""
open(source) do |s| content = s.read end
rss = RSS::Parser.parse(content, false)
t = Time.now
day = t.day.to_s
month = t.strftime("%b")
rss.items.each do |rss|
  if "#{rss.title}".include?(day)&&(month)
    # does stuff with it
  end
end
The title contains the date of the event in the following format: "(2nd Apr 11)". Of course, by checking only whether the title contains the day and the month (e.g. '2' and 'May'), I also get the events that happen on 12th May, 20th May and so on. How can I make it foolproof and only get today's events?
Here's a sample title: "Diggin Deeper @ The Big Chill House (12th May 11)"
today = Time.now.strftime("%d:%b:%y")
if date_string =~ /(\d*).. (.*?) (\d\d)/
  article_date = sprintf("%02i:%s:%s", $1.to_i, $2, $3)
  if today == article_date
    #this is today
  else
    #this is not today
  end
else
  raise("No date found in title.")
end
There could potentially be problems if the title contains other numbers. Does the title have any bounding characters around the date, such as a hyphen before the date or brackets around it? Adding those to the regex could prevent trouble. Could you give us an example title? (An alternative would be to use Time#strftime to create a string which would perfectly match the date as it appears in the title and then just use String#include? with that string, but I don't think there's an elegant way to put the 'th'/'nd'/'rd'/etc on the day.)
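The suffix can be produced with a small helper, which makes the String#include? approach workable. The sketch below assumes the titles always carry the date as "(12th May 11)", brackets included, as in the sample title; ordinal_suffix and title_date are hypothetical helper names.

```ruby
require 'date'

# Hypothetical helper: English ordinal suffix for a day of the month.
def ordinal_suffix(day)
  return 'th' if (11..13).cover?(day % 100)

  case day % 10
  when 1 then 'st'
  when 2 then 'nd'
  when 3 then 'rd'
  else 'th'
  end
end

# Hypothetical helper: builds the date exactly as it appears in the feed
# titles, e.g. "(12th May 11)", including the brackets.
def title_date(date)
  "(#{date.day}#{ordinal_suffix(date.day)} #{date.strftime('%b %y')})"
end

title = "Diggin Deeper @ The Big Chill House (12th May 11)"
puts title.include?(title_date(Date.new(2011, 5, 12))) # true
puts title.include?(title_date(Date.new(2011, 5, 2)))  # false
```

In the feed loop you would pass Date.today instead of a fixed date. Including the opening bracket in the search string is what stops "2nd" from matching inside "12th" or "22nd".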
Use something like this:
def check_day(date)
  t = Time.now
  day = t.day.to_s
  month = t.strftime("%b")
  if date =~ /^#{day}nd\s#{month}\s11/
    puts "today!"
  else
    puts "not today!"
  end
end
# The outputs below assume the script is run on 3 May
check_day "3nd May 11" #=> today!
check_day "13nd May 11" #=> not today!
check_day "30nd May 11" #=> not today!
