I'm trying to get an example of the following code from GitHub working on my Linux/Ubuntu install; the project looks to be a dead topic. I have been trying to scrape data from my company intranet using "mechanize" (see my earlier Stack Overflow question for details). Since I haven't figured out a way around my login issue, I thought I would feed the data from an Excel sheet as a workaround until I can get the mechanize route working. Once again I can't get the provided code to work on Linux, because I'm getting the following error:
`kqueue=': kqueue is not supported on this platform (EventMachine::Unsupported)
If I'm understanding the information in the original source correctly, the problem is that kqueue isn't supported on Linux. The OP states that inotify is an alternative, but I've had no luck finding a similar example that uses it to display Excel data in a widget.
Here is the code shown on GitHub; I'd like help converting it to work on Linux:
require 'roo'

EM.kqueue = EM.kqueue?

file_path = "#{Dir.pwd}/spreadsheet.xls"

def fetch_spreadsheet_data(path)
  s = Roo::Excel.new(path)
  send_event('valuation', { current: s.cell(1, 2) })
end

module Handler
  def file_modified
    fetch_spreadsheet_data(path)
  end
end

fetch_spreadsheet_data(file_path)

EM.next_tick do
  EM.watch_file(file_path, Handler)
end
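From what I can tell, the line that raises is the kqueue assignment itself. A hedged workaround (a sketch, not verified across EventMachine versions) is to skip that assignment on platforms that don't report kqueue support, since Linux builds of EventMachine should fall back to inotify for EM.watch_file:

# Sketch: only touch the kqueue setting where EventMachine reports
# support (macOS/BSD); Linux builds should use inotify for
# EM.watch_file, so the rest of the script stays unchanged.
EM.kqueue = true if EM.kqueue?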
Okay, so I was able to get this working and to display my data on a Dashing Dashboard widget by doing the following:
First: I uploaded my spreadsheet.xls to the root directory of my dashboard.
Second: I replaced the /jobs/sample.rb code with:
#!/usr/bin/env ruby
require 'roo'

SCHEDULER.every '2s' do
  file_path = "#{Dir.pwd}/spreadsheet.xls"

  def fetch_spreadsheet_data(path)
    s = Roo::Excel.new(path)
    send_event('valuation', { current: s.cell('B', 49) })
  end

  module Handler
    def file_modified
      fetch_spreadsheet_data(path)
    end
  end

  fetch_spreadsheet_data(file_path)
end
Third: Make sure the /widgets/number widget is in your dashboard (it is part of the sample install).
Fourth: Add the following code to your /dashboards/sample.erb file (also part of the sample install):
<li data-row="1" data-col="1" data-sizex="1" data-sizey="1">
  <div data-id="valuation" data-view="Number" data-title="Current Valuation" data-prefix="$"></div>
</li>
I used this source to help me better understand how Roo works. I tested my widget by changing my values and re-uploading the spreadsheet.xls to the server, and I saw the changes on my dashboard instantly.
Hope this helps someone. I'm still looking for help automating this process by scraping the data; reference this question if you can help.
Thanks for sharing this code sample. I did not manage to make it work in my environment (Raspberry Pi/Raspbian), but after some effort I managed to come up with something that works -- at least for me ;)
I had never worked with Ruby before this week, so this code may be a bit crappy. Please accept my apologies.
-- Christophe
require 'roo'
require 'rubygems'
require 'rb-inotify'

# Implement INotify::Notifier.watch as described here:
# https://www.go4expert.com/articles/track-file-changes-ruby-inotify-t30264/

file_path = "#{Dir.pwd}/datasheet.csv"

def fetch_spreadsheet_data(path)
  s = Roo::CSV.new(path)
  send_event('csvdata', { value: s.cell(1, 1) })
end

SCHEDULER.every '5s' do
  notifier = INotify::Notifier.new
  notifier.watch(file_path, :modify) do |event|
    event.flags.each do |flag|
      # convert the flag symbol to a string
      flag = flag.to_s
      puts case flag
           when 'modify' then fetch_spreadsheet_data(file_path)
           end
    end
  end
  # wait for events from inotify and run the handlers
  notifier.process
end
I started playing around with the blog5 example (https://github.com/voltrb/blog5) from the Volt documentation website and tried to upgrade from Volt 0.9.0 to 0.9.3.
After changing the version number in the Gemfile, the edit functionality seems to be broken: clicking the "Edit" link (see the edit controller below) blocks the app. The same behaviour persists after changing the deprecated _id to id. Can anyone advise what's wrong with the controller below, or what else may have changed between these Volt versions?
def new
  self.model = store._blog_posts.buffer
end

def edit
  self.model = store._blog_posts.where(_id: params._id).fetch_first.then(&:buffer)
end

def show
  self.model = store._blog_posts.where(_id: params._id).fetch_first
end

def post
end

# Save the post
def save
  model.save! do
    redirect_to '/'
  end.fail do |errors|
    flash._errors << errors.to_s
  end
end
Many thanks.
.fetch_first was replaced with just .first (though it should still work as .fetch_first, just with a deprecation warning, so maybe there's another issue).
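For example, the edit action from the question would become something like this (just a sketch, with the _id to id rename applied as well):

def edit
  # .first replaces the deprecated .fetch_first in newer Volt versions
  self.model = store._blog_posts.where(id: params._id).first.then(&:buffer)
end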
I recently discovered SitePrism via the rubyweekly email.
It looks amazing. I can see it's going to be the future.
The examples I have seen are mostly for Cucumber steps.
I am trying to figure out how one would go about using SitePrism with RSpec.
Assuming @home_page for the home page, and @login_page for the login page:
I can understand that
@home_page.load # => visit @home_page.expanded_url
However, the part I am not sure about is this: if I then click on, for example, the "login" link, and the browser in Capybara goes to the login page, how can I then access an instance of the login page without loading it?
@home_page = HomePage.new
@home_page.load

@home_page.login_link.click
# Here I know the login page should be loaded, so I can perhaps do
@login_page = LoginPage.new
@login_page.should be_displayed
@login_page.email_field.set("some@email.com")
@login_page.password_field.set("password")
@login_page.submit_button.click
etc...
That seems like it might work. So, when you know you are supposed to be on a specific page, you create an instance of that page, and somehow the Capybara "page" context, as in page.find("a[href='/sessions/new']"), is transferred to the last SitePrism object?
I just feel like I am missing something here.
I'll play around and see what I can figure out - just figured I might be missing something.
I am looking through the source, but if anyone has figured this out... feel free to share :)
What you've assumed turns out to be exactly how SitePrism works :) Though you may want to check the epilogue of the readme that explains how to save yourself from having to instantiate page objects all over your test code. Here's an example:
# our pages
class Home < SitePrism::Page
  # ...
end

class SearchResults < SitePrism::Page
  # ...
end

# here's the app class that represents our entire site:
class App
  def home
    Home.new
  end

  def results_page
    SearchResults.new
  end
end

# and here's how to use it:
# first line of the test...
@app = App.new
@app.home.load
@app.home.search_field.set "sausages"
@app.home.search_button.click
@app.results_page.should be_displayed
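To connect it back to the RSpec question: a spec using that App class might look like this (a rough sketch; it assumes Capybara is already configured, since that's what SitePrism drives under the hood):

# Sketch: App, Home, and SearchResults are the classes defined above;
# nothing here is SitePrism-specific beyond the page objects themselves.
describe "searching" do
  let(:app) { App.new }

  it "shows the results page" do
    app.home.load
    app.home.search_field.set "sausages"
    app.home.search_button.click
    app.results_page.should be_displayed
  end
end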
I want to download and parse all the event data from a website's public Google Calendar. What would be the best way to do so? I'm considering just downloading the ICS file, or getting XML data and parsing that myself. I've looked into the Google API, but it looks unnecessarily complex if all I want to do is read the data. I'm a beginner to working with APIs and programming in general, so I'm having trouble navigating all that documentation, and they don't provide very many helpful examples.
How about something like this:
require 'ri_cal'
require 'open-uri'

components = nil
open("https://www.google.com/calendar/ical/ocs.events%40gmail.com/public/basic.ics") do |cal|
  components = RiCal.parse(cal)
end

components.each do |calendar|
  calendar.events.each do |event|
    puts "#{event.summary} starts at: #{event.dtstart} and ends at #{event.dtend}"
  end
end
You will need to install the ri_cal gem.
UPDATE: Using iCalendar
require 'icalendar'
require 'open-uri'

calendars = nil
open("https://www.google.com/calendar/ical/ocs.events%40gmail.com/public/basic.ics") do |cal|
  # calendars = RiCal.parse(cal)
  calendars = Icalendar.parse(cal)
end

calendars.each do |calendar|
  calendar.events.each do |event|
    puts "#{event.summary} starts at: #{event.dtstart} and ends at #{event.dtend}"
  end
end
I'm working to do a crawl, but before I crawl an entire website I would like to fire off a test of two or so pages. So I was thinking something like the code below would work, but I keep getting a NoMethodError...
Anemone.crawl(self.url) do |anemone|
  anemone.focus_crawl do |crawled_page|
    crawled_page.links.slice(0..10)
    page = pages.find_or_create_by_url(crawled_page.url)
    logger.debug(page.inspect)
    page.check_for_term(self.term, crawled_page.body)
  end
end
NoMethodError (private method `select' called for true:TrueClass):
app/models/site.rb:14:in `crawl'
app/controllers/sites_controller.rb:96:in `block in crawl'
app/controllers/sites_controller.rb:95:in `crawl'
Basically I want a way to crawl only 10 pages first, but I seem to be not understanding the basics here. Can someone help me out?
Thanks!!
Add this monkeypatch to your crawling file.
module Anemone
  class Core
    def kill_threads
      @tentacles.each { |thread|
        Thread.kill(thread) if thread.alive?
      }
    end
  end
end
Here is an example of how to use it after you've added it to your crawling file: in the file where you run the crawl, add this counter logic to your anemone.on_every_page block.
@counter = 0

Anemone.crawl("http://stackoverflow.com", :obey_robots => true) do |anemone|
  anemone.on_every_page do |page|
    @counter += 1
    if @counter > 10
      anemone.kill_threads
    end
  end
end
Source: https://github.com/chriskite/anemone/issues/24
So I found the :depth_limit param, and that will be OK, but I would rather limit it by the number of links.
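For reference, the NoMethodError above fits focus_crawl using the block's return value as the list of links to follow: in the original snippet the block ends with page.check_for_term(...), which returned true, and Anemone then tried to call select on it. A hedged sketch that makes the sliced link list the return value instead:

# Sketch: focus_crawl uses the block's return value as the links to
# follow, so put the sliced link list last in the block.
Anemone.crawl(self.url) do |anemone|
  anemone.focus_crawl do |crawled_page|
    page = pages.find_or_create_by_url(crawled_page.url)
    page.check_for_term(self.term, crawled_page.body)
    crawled_page.links.slice(0..10) # returned to Anemone
  end
end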
I found your question while I was googling for Anemone.
I had the same problem, and with Anemone, what I did was:
As soon as I reach the URL limit that I want, I raise an exception. The whole Anemone block is inside a begin/rescue block.
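Roughly like this (a sketch; CrawlLimitReached is a made-up error class and the limit of 10 is arbitrary):

# Sketch of the raise-an-exception approach: bail out of the crawl by
# raising once the page counter passes the limit.
class CrawlLimitReached < StandardError; end

begin
  counter = 0
  Anemone.crawl(url) do |anemone|
    anemone.on_every_page do |page|
      counter += 1
      raise CrawlLimitReached if counter > 10
      # ... process the page here ...
    end
  end
rescue CrawlLimitReached
  # the crawl stops here once the limit is hit
end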
In your specific case I would take another approach: I would download the page that you want to parse, and bind it to FakeWeb. I wrote a blog entry about it a long time ago; maybe it will be useful: http://blog.bigrails.com/scraper-guide.html
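The binding step looks roughly like this (a sketch using the fakeweb gem; the URL and fixture path are placeholders):

# Sketch: register a saved copy of the page so HTTP requests to that
# URL are answered locally instead of hitting the real site.
require 'fakeweb'

FakeWeb.register_uri(
  :get,
  "http://example.com/page-to-parse",                 # placeholder URL
  :body => File.read("fixtures/page-to-parse.html")   # saved copy of the page
)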
This might be a similar problem to my earlier two questions (see here and here), but I'm trying to use the _detail command to automatically click the link so I can scrape the details page for each individual event.
The code I'm using is:
require 'rubygems'
require 'scrubyt'

nuffield_data = Scrubyt::Extractor.define do
  fetch 'http://www.nuffieldtheatre.co.uk/cn/events/event_listings.php'

  event do
    title 'The Coast of Mayo'
    link_url
    event_detail do
      dates "1-4 October"
      times "7:30pm"
    end
  end

  next_page "Next Page", :limit => 20
end

nuffield_data.to_xml.write($stdout, 1)
Is there any way to print out the URL that event_detail is trying to access? The error doesn't seem to give me the URL that returned the 404.
Update: I think the link may be a relative link. Could this be causing problems? Any ideas how to deal with that?
I had the same issue with relative links and fixed it like this... you have to set the :resolve param to the correct base URL:
event do
  title 'The Coast of Mayo'
  link_url
  event_detail :resolve => 'http://www.nuffieldtheatre.co.uk/cn/events' do
    dates "1-4 October"
    times "7:30pm"
  end
end
sudo gem install ruby-debug
This will give you access to a nice Ruby debugger. Start the debugger by altering your script:
require 'rubygems'
require 'ruby-debug'

Debugger.start
Debugger.settings[:autoeval] = true if Debugger.respond_to?(:settings)

require 'scrubyt'

nuffield_data = Scrubyt::Extractor.define do
  fetch 'http://www.nuffieldtheatre.co.uk/cn/events/event_listings.php'

  event do
    title 'The Coast of Mayo'
    link_url
    event_detail do
      dates "1-4 October"
      times "7:30pm"
    end
  end

  next_page "Next Page", :limit => 2
end

nuffield_data.to_xml.write($stdout, 1)
Then find out where scrubyt is throwing an exception - in this case:
/Library/Ruby/Gems/1.8/gems/scrubyt-0.3.4/lib/scrubyt/core/navigation/fetch_action.rb:52:in `fetch'
Find the scrubyt gem on your system, and add a rescue clause to the method in question so that the end of the method looks like this:
  if @@current_doc_protocol == 'file'
    @@hpricot_doc = Hpricot(PreFilterDocument.br_to_newline(open(@@current_doc_url).read))
  else
    @@hpricot_doc = Hpricot(PreFilterDocument.br_to_newline(@@mechanize_doc.body))
    store_host_name(self.get_current_doc_url) # in case we're on a new host
  end
rescue
  debugger
  self # the self is here because debugger doesn't like being at the end of a method
end
Now run the script again and you should be dropped into a debugger when the exception is raised. Just try typing this at the debug prompt to see what the offending URL is:
@@current_doc_url
You can also add a debugger statement anywhere in that method if you want to check what is going on; for example, you may want to add one between lines 51 and 52 of this method to check how the URL being fetched changes and why.
This is basically how I figured out the answer to your previous questions.
Good luck.
Sorry, I have no idea why this would be nil; every time I have run this it returns a URL. The method self.fetch requires a URL, which you should be able to access as the local variable doc_url. If this also returns nil, maybe you should post the code where you have included the debugger call.
I've tried to access doc_url but that seems to also return nil. When I have access to my server (later in the day) I'll post the code with the debugging bit in it.