I want to download and parse all the event data from a website's public Google Calendar. What would be the best way to do so? I'm considering just downloading the ICS file, or getting the XML data and parsing that myself. I've looked into the Google API, but it looks unnecessarily complex if all I want to do is read the data. I'm a beginner to working with APIs and to programming in general, so I'm having trouble navigating all that documentation, and it doesn't provide many helpful examples.
How about something like this:
require 'ri_cal'
require 'open-uri'
components = nil
# URI.open (Ruby 2.5+): Kernel#open no longer opens URLs as of Ruby 3.0
URI.open("https://www.google.com/calendar/ical/ocs.events%40gmail.com/public/basic.ics") do |cal|
  # RiCal.parse returns an array of calendar components
  components = RiCal.parse(cal)
end
components.each do |calendar|
  calendar.events.each do |event|
    puts "#{event.summary} starts at: #{event.dtstart} and ends at #{event.dtend}"
  end
end
You will need to install the ri_cal gem (gem install ri_cal).
UPDATE: Using iCalendar
require 'icalendar'
require 'open-uri'
calendars = nil
URI.open("https://www.google.com/calendar/ical/ocs.events%40gmail.com/public/basic.ics") do |cal|
  # In icalendar 2.x, Icalendar::Calendar.parse returns an array of calendars
  # (the older Icalendar.parse is deprecated)
  calendars = Icalendar::Calendar.parse(cal)
end
calendars.each do |calendar|
  calendar.events.each do |event|
    puts "#{event.summary} starts at: #{event.dtstart} and ends at #{event.dtend}"
  end
end
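The question also mentions parsing the .ics file yourself. For simple needs that is feasible, because iCalendar (RFC 5545) is a line-based text format. The following is a minimal sketch, not a full parser: parse_ics_events is a hypothetical helper name, and the gems above remain the safer choice once recurring events or time zones matter.

```ruby
# A minimal, dependency-free reader for the VEVENT blocks of an .ics file.
# It unfolds RFC 5545 continuation lines and drops property parameters
# (DTSTART;VALUE=DATE:20240106 -> "20240106"). It does NOT unescape text,
# expand recurring events (RRULE), convert time zones, or handle ':' inside
# quoted parameter values.
def parse_ics_events(ics_text)
  # A line break followed by a single space or tab continues the previous line.
  unfolded = ics_text.gsub(/\r?\n[ \t]/, "")
  events = []
  current = nil
  unfolded.each_line do |raw_line|
    line = raw_line.chomp
    if line == "BEGIN:VEVENT"
      current = {}
    elsif line == "END:VEVENT"
      events << current if current
      current = nil
    elsif current
      name_and_params, value = line.split(":", 2)
      next if value.nil?
      name = name_and_params.split(";", 2).first.to_s.upcase
      current[name] ||= value # keep the first occurrence of each property
    end
  end
  events
end

if __FILE__ == $PROGRAM_NAME
  sample = "BEGIN:VCALENDAR\r\n" \
           "BEGIN:VEVENT\r\n" \
           "SUMMARY:Team meet\r\n ing\r\n" \
           "DTSTART:20240105T150000Z\r\n" \
           "DTEND:20240105T160000Z\r\n" \
           "END:VEVENT\r\n" \
           "BEGIN:VEVENT\r\n" \
           "SUMMARY:Holiday\r\n" \
           "DTSTART;VALUE=DATE:20240106\r\n" \
           "END:VEVENT\r\n" \
           "END:VCALENDAR\r\n"
  parse_ics_events(sample).each do |event|
    puts "#{event['SUMMARY']} starts at: #{event['DTSTART']} and ends at #{event['DTEND']}"
  end
end
```

To read the public calendar from the question, pass it the downloaded text, for example parse_ics_events(URI.open(url, &:read)) after require 'open-uri'. The values stay raw strings such as "20240105T150000Z"; converting them to Time objects is left out on purpose.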
I am trying to scrape the following website:
https://www.cdc.gov/coronavirus/2019-ncov/cases-updates/cases-in-us.html
to get all of the state statistics on coronavirus.
My code below works:
require 'nokogiri'
require 'open-uri'
require 'httparty'
require 'pry'
url = "https://www.cdc.gov/coronavirus/2019-ncov/cases-updates/cases-in-us.html"
# URI.open (Ruby 2.5+): Kernel#open no longer opens URLs as of Ruby 3.0
doc = Nokogiri::HTML.parse(URI.open(url))
total_cases = doc.css("span.count")[0].text
total_deaths = doc.css("span.count")[1].text
new_cases = doc.css("span.new-cases")[0].text
new_deaths = doc.css("span.new-cases")[1].text
However, I am unable to get into the collapsed data/gridcell data.
I have tried searching by the class .aria-label and by the .rt-tr-group class. Any help would be appreciated. Thank you.
Although Layon Ferreira's answer already states the problem, it does not provide the steps needed to load the data.
As already mentioned in the linked answer, the data is loaded asynchronously. This means it is not present in the initial HTML that Nokogiri parses; it is fetched afterwards by JavaScript running in the browser, which Nokogiri does not execute.
Open the browser's developer tools and go to the "Network" tab. Clear out all requests, then refresh the page to see a list of every request made. If you're looking for asynchronously loaded data, the most interesting requests are often those of type "json" or "xml".
When browsing through the requests you'll find that the data you're looking for is located at:
https://www.cdc.gov/coronavirus/2019-ncov/json/us-cases-map-data.json
Since this is JSON, you don't need Nokogiri to parse it.
require 'httparty'
require 'json'
response = HTTParty.get('https://www.cdc.gov/coronavirus/2019-ncov/json/us-cases-map-data.json')
data = JSON.parse(response.body)
When executing the above you'll get the exception:
JSON::ParserError ...
This seems to be caused by a byte order mark (BOM) at the start of the body that HTTParty does not remove, most likely because the response doesn't specify a UTF-8 charset.
response.body[0]
#=> "" (looks empty, but contains the invisible U+FEFF character)
format '%X', response.body[0].ord
#=> "FEFF"
To handle the BOM correctly, Ruby 2.7 added the set_encoding_by_bom method to IO, which is also available on StringIO.
require 'httparty'
require 'json'
require 'stringio'
response = HTTParty.get('https://www.cdc.gov/coronavirus/2019-ncov/json/us-cases-map-data.json')
body = StringIO.new(response.body)
body.set_encoding_by_bom
data = JSON.parse(body.gets(nil))
#=> [{"Jurisdiction"=>"Alabama", "Range"=>"10,001 to 20,000", "Cases Reported"=>10145, ...
If you're not yet using Ruby 2.7, you can strip the BOM with a substitution instead, although the former is probably the safer option:
data = JSON.parse(response.body.force_encoding('utf-8').sub(/\A\xEF\xBB\xBF/, ''))
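On Ruby 2.5 and 2.6, String#delete_prefix gives a slightly more readable version of the same substitution. A minimal sketch; the helper name parse_json_without_bom is hypothetical:

```ruby
require 'json'

UTF8_BOM = "\uFEFF"

# Parse a JSON document that may start with a UTF-8 byte order mark.
# force_encoding relabels the bytes as UTF-8 without changing them, so the
# three BOM bytes EF BB BF become the single character U+FEFF, which
# delete_prefix (Ruby 2.5+) then removes. dup avoids mutating the caller's string.
def parse_json_without_bom(raw_body)
  text = raw_body.dup.force_encoding(Encoding::UTF_8)
  JSON.parse(text.delete_prefix(UTF8_BOM))
end

# Usage with the HTTParty response:
#   data = parse_json_without_bom(response.body)
```

A body without a BOM passes through unchanged, so the helper is safe to use on every response.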
That page is using AJAX to load its data.
In that case, you may use Watir to load the page in a real browser,
as answered here: https://stackoverflow.com/a/13792540/2784833
Another way is to get the data from the API directly.
You can see the other endpoints by checking the Network tab of your browser's developer tools.
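Fetching such an endpoint directly needs only Ruby's standard library. A minimal sketch, assuming the endpoint returns a JSON array of objects with the "Jurisdiction" and "Cases Reported" keys seen in the parsed output earlier in this thread (the CDC URL may no longer be available); fetch_json and cases_by_state are hypothetical helper names:

```ruby
require 'json'
require 'net/http'
require 'uri'

# GET a URL and parse the body as JSON.
# Raises if the server does not answer with a 2xx status.
def fetch_json(url)
  response = Net::HTTP.get_response(URI(url)) # uses TLS for https:// URLs
  unless response.is_a?(Net::HTTPSuccess)
    raise "GET #{url} failed with HTTP #{response.code}"
  end
  # Relabel as UTF-8 and drop a leading byte order mark, if any.
  body = response.body.dup.force_encoding(Encoding::UTF_8)
  JSON.parse(body.delete_prefix("\uFEFF"))
end

# Reduce the records to a { state => reported cases } hash.
def cases_by_state(records)
  records.map { |row| [row["Jurisdiction"], row["Cases Reported"]] }.to_h
end

# Usage (makes a network request):
#   data = fetch_json("https://www.cdc.gov/coronavirus/2019-ncov/json/us-cases-map-data.json")
#   puts cases_by_state(data)["Alabama"]
```

Keeping the download and the reshaping in separate methods lets you test cases_by_state on saved data without hitting the server.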
I replicated your code and found some errors that you might have made:
require 'HTTParty'
will not work. You need to use
require 'httparty'
Second, the value of your url variable needs quotes around it, i.e.
url = "https://www.cdc.gov/coronavirus/2019-ncov/cases-updates/cases-in-us.html"
Other than that, it just worked fine for me.
Also, if you're trying to get COVID-19 data, you might want to use these APIs:
For US Count
For US Daily Count
For US Count - States
You could learn more about the APIs here
I'm trying to make an app that iterates through my own posts and gets a list of the users who favorited each post. Afterwards, I would like the application to follow each of those users if I am not already following them. I am using Ruby for this.
This is my code now:
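require 'twitter'
# config: a hash with your API credentials
# (consumer_key, consumer_secret, access_token, access_token_secret)
@client = Twitter::REST::Client.new(config)
# WARNING: this disables TLS certificate verification for every connection and is unsafe
OpenSSL::SSL::VERIFY_PEER = OpenSSL::SSL::VERIFY_NONE
user = @client.user
tweets = @client.user_timeline(user).take(20)
num_of_tweets = tweets.length
puts "tweets found: #{num_of_tweets}"
tweets.each do |item|
  puts "#{item}" # iterating through my posts here
end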
Any suggestions?
That information isn't exposed in the Twitter API, either through a timeline collection or via the endpoint representing a single tweet. That is why the twitter gem, which provides a usable interface around the REST API, cannot give you what you're after.
Third-party sites such as Favstar do display that information, but as far as I know their own API does not expose the relevant users in any manageable way.
I'm trying to get the following code example from GitHub, which looks to be a dead topic, working on my Linux/Ubuntu install. I have been trying to scrape data from my company intranet using "mechanize" (see my Stack Overflow question for details). Since I'm not smart enough to figure out a way around my login issue, I thought I would try feeding data from an Excel sheet as a workaround until I can figure out the Mechanize route. Once again, I'm not smart enough to get the provided code to work on Linux, because I'm getting the following error:
`kqueue=': kqueue is not supported on this platform (EventMachine::Unsupported)
If I understand the information in the original source correctly, the problem is that kqueue isn't supported on Linux. The OP states that inotify is an alternative, but I've had no luck finding a similar example that uses it to display Excel data in a widget.
Here is the code shown on GitHub; I would like help converting it to work on Linux:
require 'roo'
EM.kqueue = EM.kqueue?
file_path = "#{Dir.pwd}/spreadsheet.xls"
def fetch_spreadsheet_data(path)
s = Roo::Excel.new(path)
send_event('valuation', { current: s.cell(1, 2) })
end
module Handler
def file_modified
fetch_spreadsheet_data(path)
end
end
fetch_spreadsheet_data(file_path)
EM.next_tick do
EM.watch_file(file_path, Handler)
end
Okay, so I was able to get this working and to display my data on a Dashing Dashboard widget by doing the following:
First: I uploaded my spreadsheet.xls to the root directory of my dashboard.
Second: I replaced the /jobs/sample.rb code with:
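#!/usr/bin/env ruby
require 'roo'
file_path = "#{Dir.pwd}/spreadsheet.xls"
# Read cell B49 (column B, row 49) and push it to the widget whose data-id is "valuation".
# Defined once, outside the scheduler block, instead of being redefined on every tick.
def fetch_spreadsheet_data(path)
  s = Roo::Excel.new(path)
  send_event('valuation', { current: s.cell('B', 49) })
end
# The EventMachine Handler module is not needed: the scheduler simply re-reads the file.
SCHEDULER.every '2s' do
  fetch_spreadsheet_data(file_path)
end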
Third: Make sure the /widgets/number widget is in your dashboard (it is part of the sample install).
Fourth: Add the following code to your /dashboards/sample.erb file (also part of the sample install).
<li data-row="1" data-col="1" data-sizex="1" data-sizey="1">
<div data-id="valuation" data-view="Number" data-title="Current Valuation" data-prefix="$"></div>
</li>
I used this source to better understand how Roo works. I tested my widget by changing the values, re-uploading spreadsheet.xls to the server, and watching the changes appear instantly on my dashboard.
Hope this helps someone. I'm still looking for help automating this process by scraping the data; see this question if you can help.
Thanks for sharing this code sample. I did not manage to make it work in my environment (Raspberry Pi/Raspbian), but after some effort I managed to come up with something that works, at least for me ;)
I had never worked with Ruby before this week, so this code may be a bit crappy. Please accept my apologies.
-- Christophe
require 'roo'
require 'rb-inotify'
# Implement INotify::Notifier#watch as described here:
# https://www.go4expert.com/articles/track-file-changes-ruby-inotify-t30264/
file_path = "#{Dir.pwd}/datasheet.csv"
# Read cell A1 and push it to the widget whose data-id is "csvdata"
def fetch_spreadsheet_data(path)
  s = Roo::CSV.new(path)
  send_event('csvdata', { value: s.cell(1, 1) })
end
# Create ONE notifier and register the watch once;
# creating a new notifier on every tick leaks inotify handles.
notifier = INotify::Notifier.new
notifier.watch(file_path, :modify) do |_event|
  # Only :modify events were requested, so every event means the file changed
  fetch_spreadsheet_data(file_path)
end
SCHEDULER.every '5s' do
  # notifier.process blocks until an event arrives,
  # so only call it when events are already pending
  notifier.process if IO.select([notifier.to_io], nil, nil, 0)
end
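If neither kqueue nor inotify is available, a portable fallback is polling: compare the file's modification time on each scheduler tick and re-read the sheet only when it has changed. This is plain polling, not a kernel file-watch API. A minimal sketch; MtimeWatcher is a hypothetical helper name, while SCHEDULER, file_path and fetch_spreadsheet_data are the Dashing-job names used in the jobs above:

```ruby
# Polls a file's modification time and runs a callback when it changes.
# Works on any OS because it only uses File.mtime.
class MtimeWatcher
  def initialize(path, &on_change)
    @path = path
    @on_change = on_change
    @last_mtime = File.mtime(path)
  end

  # Returns true (after running the callback) if the file's mtime changed
  # since the previous check, false otherwise.
  def check
    mtime = File.mtime(@path)
    return false if mtime == @last_mtime
    @last_mtime = mtime
    @on_change.call(@path)
    true
  end
end

# In a Dashing job this could be wired up as (not run here):
#   watcher = MtimeWatcher.new(file_path) { |path| fetch_spreadsheet_data(path) }
#   SCHEDULER.every('2s') { watcher.check }
```

Compared with re-reading on every tick, this keeps the cheap File.mtime call on the schedule and only opens the spreadsheet when it was actually replaced or edited.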
I want to use the API of a website in a Ruby script, and the only thing the API returns is a number over HTTPS: nothing more, not even tags. So I was wondering if there is a way to get that number as a string or integer in my script without using an XML parsing library or gem like REXML, Hpricot or libxml, because the pages I want to parse are, as I said, extremely basic.
If I understand correctly, a request to https://www.website.com/api/getid returns 2.
Then, I guess this would do:
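require 'net/https'
require 'uri'
# Named fetch rather than open so it doesn't shadow Kernel#open
def fetch(url)
  Net::HTTP.get(URI.parse(url))
end
# response is the response body as a String
response = fetch("https://www.website.com/api/getid")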
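The question asks for the number as a string or an integer. Net::HTTP.get returns the body as a String; to turn it into an Integer, one option is Kernel#Integer, which is stricter than String#to_i: to_i quietly returns 0 for a non-numeric body such as an error page, while Integer raises ArgumentError. A minimal sketch with hypothetical helper names:

```ruby
require 'net/http'
require 'uri'

# Convert a plain-text response body such as "2\n" into an Integer.
# Base 10 is passed explicitly so "010" is read as 10, not as octal.
def parse_number(body)
  Integer(body.strip, 10)
end

# Download the body and convert it (makes a network request).
def fetch_number(url)
  parse_number(Net::HTTP.get(URI(url)))
end

# Usage:
#   id = fetch_number("https://www.website.com/api/getid")
```

If you prefer a silent fallback over an exception, body.to_i is the simpler choice.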
EDIT
You'll find many useful examples here.
As mentioned in the link above, HTTParty is quite popular. An example:
require 'httparty'
response = HTTParty.get('http://twitter.com/statuses/public_timeline.json')
puts response.body, response.code, response.message, response.headers.inspect
I have a problem parsing an RSS feed.
When I do this:
feed = getFeed("http://example.com/rss")
If the feed content changes, the result doesn't update.
If I do it like this:
feed = getFeed("http://example.com/rss?" + Random.rand(20).to_s)
It works most of the time but not always.
getFeed() is implemented like this:
require 'open-uri'
def getFeed(url)
  rss_content = ""
  open(url) do |f|
    rss_content = f.read
  end
  return rss_content
end
I'm using this in Sinatra with Ruby 1.9.3, if that makes a difference.
In my opinion it gets cached somewhere, but I have no idea where.
Edit:
Okay, after half a day running on the server, it works without a problem.
This:
feed = getFeed("http://example.com/rss?" + Random.rand(20).to_s)
implies the problem is with caching, but Ruby, OpenURI and Sinatra shouldn't be caching anything. Perhaps your code is running behind a caching device or app that handles outgoing requests as well as incoming ones?
This isn't the fix, but your code can be streamlined greatly:
def getFeed(url)
  # The block form closes the stream after reading; on Ruby 2.5+ prefer URI.open(url, &:read)
  open(url, &:read)
end
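One plausible reason the random suffix only works "most of the time": Random.rand(20) produces just 20 distinct URLs, so a cache along the way can still answer a repeated URL with a stored copy. A minimal sketch that appends a unique query parameter instead (the parameter name "_" and the helper name cache_busted are arbitrary, hypothetical choices) and also sends a Cache-Control: no-cache request header, which asks caches to revalidate with the origin server; neither is guaranteed to defeat a misbehaving proxy:

```ruby
require 'securerandom'
require 'uri'

# Return url with an extra, random query parameter so every request uses
# a URL that no cache has seen before. Existing parameters are kept.
def cache_busted(url)
  uri = URI(url)
  params = URI.decode_www_form(uri.query || "")
  params << ["_", SecureRandom.hex(8)]
  uri.query = URI.encode_www_form(params)
  uri.to_s
end

# Usage with open-uri (string keys in the options hash are sent as request
# headers; URI.open needs Ruby 2.5+, older versions use Kernel#open):
#   require 'open-uri'
#   feed = URI.open(cache_busted("http://example.com/rss"),
#                   "Cache-Control" => "no-cache", &:read)
```

SecureRandom.hex(8) gives 16 hex characters, so repeated URLs are practically impossible, unlike the 20 values of Random.rand(20).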