I am working on a way to share a Ruby program. This is my test code.
require 'rubygems'
require 'watir-webdriver'
require 'headless'
require 'google_drive'
@browsertype = ""
@name = ""
@champsname = ""
@date = DateTime.now
file = File.exists?("temp.txt")
if file == true
input = IO.read("temp.txt")
input_hash = Hash[*input.gsub(/"/,"").split(/\s*[\n=]\s*/)]
browsertype = input_hash.shift
name = input_hash.shift
champsname = input_hash.shift
@browsertype = browsertype[1]
@name = name[1]
@champ = champsname[1]
end
if file == false
f = File.new("temp.txt", "w")
# processing
puts "What browser will you be using? "
browsertype = gets
f.write("browser = :")
f.write(@browsertype)
puts "What is your name? "
name = gets
f.write("name = ")
f.write(@name)
puts "what is your champs name"
champsname = gets
f.write("champion = ")
f.write(@champsname)
@browsertype = browsertype
@name = name
@champ = champsname
end
puts @browsertype
puts @name
puts @champ
puts @date
#@b = "Watir::Browser.new :"+@browsertype
agent = Watir::Browser.new
@b = agent+@browsertype
@b = @b
@b.goto 'https://docs.google.com/spreadsheet/ccc?key=0AncPM9_7wL02dHVaSWg5eklfYW5jdTE3NGtJSGJPb3c'
I used the question "Variables magic and read from file" as a reference for building variables from a file. I want my friends to be able to use the finished product by packaging it with ocra (see "Building a Windows executable from my Ruby app?"). The problem I have been encountering is that Watir runs Firefox by default when you use
agent = Watir::Browser.new
But not everyone uses Firefox, which is why I store the browser type in the file. But when I use
@b = "Watir::Browser.new :"+@browsertype
I get an error saying that the + with a String is invalid, and I get the same with a Symbol. Does anyone have suggestions on how I can support a user-defined browser type?
You should pass the browser type when initializing the browser:
@b = Watir::Browser.new @browsertype
This assumes that @browsertype is something like 'firefox'.
Watir accepts a small, specific set of symbols, such as :chrome, for the browser type. If accepting input from a user, I'd use a case statement to set up a variable that contains the specific symbol (:firefox, :chrome, etc.) based on their input, and give feedback to the user if they type a value that doesn't match what you've anticipated.
Alternatively, you can call .to_sym on a string to convert it to a symbol:
@browser_type = "chrome"
@b = Watir::Browser.new @browser_type.to_sym
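The case-statement approach suggested above could be sketched roughly like this. It is only a sketch under assumptions: browser_symbol is a hypothetical helper name, and the accepted symbols listed here are assumed, since the real set depends on the Watir version and the drivers installed.

```ruby
# Hypothetical helper: map free-form user input to a browser symbol.
# The symbols below are an assumption; adjust them to your Watir setup.
def browser_symbol(input)
  # Normalise: drop whitespace/newline from gets, ignore case and a leading ':'
  case input.to_s.strip.downcase.sub(/\A:/, '')
  when 'firefox', 'ff'           then :firefox
  when 'chrome'                  then :chrome
  when 'ie', 'internet explorer' then :ie
  else
    nil # unknown input; the caller decides how to report it
  end
end

choice = browser_symbol(" Chrome\n")
if choice
  puts "Would start Watir::Browser.new #{choice.inspect}"
else
  puts "Unsupported browser, please type firefox, chrome or ie"
end
```

Returning nil for unknown input keeps the feedback to the user in one place instead of letting Watir fail later with a less obvious error.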
Related
I am doing some web scraping with the Kimurai Ruby gem. I have this script that works great:
require 'kimurai'
class SimpleSpider < Kimurai::Base
@name = "simple_spider"
@engine = :selenium_chrome
@start_urls = ["https://apply.workable.com/taxjar/"]
def parse(response, url:, data: {})
# Update response to current response after interaction with a browser
count = 0
# browser.click_button "Show more"
doc = browser.current_response
returned_jobs = doc.css('.careers-jobs-list-styles__jobsList--3_v12')
returned_jobs.css('li').each do |char_element|
# puts char_element
title = char_element.css('a')[0]['aria-label']
link = "https://apply.workable.com" + char_element.css('a')[0]['href']
#click on job link and get description
browser.visit(link)
job_page = browser.current_response
description = job_page.xpath('/html/body/div[1]/div/div[1]/div[2]/div[2]/div[2]').text
puts '*******'
puts title
puts link
puts description
puts count += 1
end
puts "There are #{count} jobs total"
end
end
SimpleSpider.crawl!
However, I want all of this to return an array of objects, or jobs in this case. I'd like to create a jobs array in the parse method, do something like jobs << [title, link, description, company] inside the returned_jobs loop, and have that returned when I call SimpleSpider.crawl!, but that doesn't work.
Any help appreciated.
You can slightly modify your code like this:
class SimpleSpider < Kimurai::Base
@name = "simple_spider"
@engine = :selenium_chrome
@start_urls = ["https://apply.workable.com/taxjar/"]
def parse(response, url:, data: {})
# Update response to current response after interaction with a browser
count = 0
# browser.click_button "Show more"
doc = browser.current_response
returned_jobs = doc.css('.careers-jobs-list-styles__jobsList--3_v12')
jobs = []
returned_jobs.css('li').each do |char_element|
# puts char_element
title = char_element.css('a')[0]['aria-label']
link = "https://apply.workable.com" + char_element.css('a')[0]['href']
#click on job link and get description
browser.visit(link)
job_page = browser.current_response
description = job_page.xpath('/html/body/div[1]/div/div[1]/div[2]/div[2]/div[2]').text
jobs << [title, link, description]
end
puts "There are #{jobs.count} jobs total"
puts jobs
end
end
I am not sure about the company, as I don't see that variable in your code. However, you can see the idea above: build an array inside parse and work with that.
Here is part of output running in terminal:
I also have a blog post here about how to use Kimurai framework from Ruby on Rails application.
It turns out there is a parse! class method that allows a value to be returned. Here is a working example:
require 'open-uri'
require 'nokogiri'
require 'kimurai'
class TaxJar < Kimurai::Base
@name = "tax_jar"
@engine = :selenium_chrome
@start_urls = ["https://apply.workable.com/taxjar/"]
def parse(response, url:, data: {})
jobs = Array.new
doc = browser.current_response
returned_jobs = doc.css('.careers-jobs-list-styles__jobsList--3_v12')
returned_jobs.css('li').each do |char_element|
title = char_element.css('a')[0]['aria-label']
link = "https://apply.workable.com" + char_element.css('a')[0]['href']
#click on job link and get description
browser.visit(link)
job_page = browser.current_response
description = job_page.xpath('/html/body/div[1]/div/div[1]/div[2]/div[2]/div[2]').text
company = 'TaxJar'
puts "title is: #{title}, link is: #{link}, \n description is: #{description}"
jobs << [title, link, description, company]
end
return jobs
end
end
jobs = TaxJar.parse!(:parse, url: "https://apply.workable.com/taxjar/")
puts jobs.inspect
If you are scraping JavaScript-heavy websites, this gem seems pretty robust compared with the others (Watir/Selenium) I have tried.
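Since the question asked for an array of objects rather than an array of arrays, one option is to wrap each row in a Struct once the rows come back. This is a minimal sketch that does not touch Kimurai at all: Job is a hypothetical name, and the rows value below is made-up sample data standing in for whatever parse returns.

```ruby
# Hypothetical value object for one scraped job.
Job = Struct.new(:title, :link, :description, :company)

# Sample data for illustration only; in the spider this would be the
# array of [title, link, description, company] rows built in parse.
rows = [
  ['Engineer', 'https://example.com/j/1', 'Builds things', 'TaxJar']
]

jobs = rows.map { |row| Job.new(*row) }

puts jobs.first.title
puts jobs.first.to_h.inspect
```

A Struct gives named accessors (job.title instead of job[0]) and a to_h method, which is handy if the jobs are later saved or serialized.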
I want to collect the names of users in a particular group, called Nature, in the photo-sharing website Fotolog. This is my code:
require 'rubygems'
require 'mechanize'
require 'csv'
def getInitUser()
agent1 = Mechanize.new
number = 0
while number<=500
address = 'http://http://www.fotolog.com/nature/participants/#{number}/'
logfile2 = File.new("Fotolog/Users.csv","a")
tryConut = 0
begin
page = agent1.get(address)
rescue
tryConut=tryConut+1
if tryConut<5
retry
end
return
end
arrayUsers= []
# search for the users
page.search("a[class=img_border_radius").map do |opt|
link = opt.attributes['href'].text
link = link.gsub("http://www.fotolog.com/","").gsub("/","")
arrayUsers << link
logfile2.print("#{link}\n")
end
number = number+100
end
return arrayUsers
end
arrayUsers = getInitUser()
arrayUsers.each do |user|
getFriend(user)
end
But the Users.csv file I am getting is empty. What's wrong here? I suspect it might have something to do with the "class" tag I am using. But from the inspect element, it seems to be the correct class, isn't it? I am just getting started with web crawling, so I apologise if this is a silly query.
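One thing worth checking in the code above, offered as a possibility rather than a confirmed diagnosis: the address is built with single quotes, and Ruby does not interpolate #{number} inside single-quoted strings. The address also starts with a doubled http://, the selector is missing its closing ], and the bare rescue hides whatever error the request raises. The small sketch below only demonstrates the quoting difference.

```ruby
number = 100

# Single quotes: the text #{number} is kept literally.
single = 'http://www.fotolog.com/nature/participants/#{number}/'

# Double quotes: #{number} is replaced by the value of number.
double = "http://www.fotolog.com/nature/participants/#{number}/"

puts single
puts double
```

Printing the address (and the rescued exception) before retrying is a cheap way to see which of these is actually responsible.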
I have a small class which is for a character and we can assign to it from outside the class.
I need to know how I can dump all the information in that class into another that can be used to create a YAML file.
require "yaml"
module Save
filename = "data.yaml"
character = []
sex = []
race = []
stats = [Str=[], Dex=[], Con=[], Int=[], Wis=[], Cha=[]]
inventory = []
saving_throws = [fortitude=[], reflex=[], will=[]]
#Armor Class, Flat footed Armor Class, and Touch armor Class
armor_class = [ac=[], fac=[], tac=[]]
armor_worn = [head=[], eyes=[], neck=[], shoulders=[], body=[], torso=[], arms_wrists=[], hands=[], ring1=[], ring2=[], waist=[], feet=[]]
money = []
god = []
speciality_school = [] #wizard
companion = [] #also used for familiars and psicrystals
skills = []
class_race_traits = []
feats = []
languages = []
program_data = {
character: character,
sex: sex,
race: race,
stats: stats,
inventory: inventory,
saving_throws: saving_throws,
armor_class: armor_class,
armor_worn: armor_worn,
money: money,
god: god,
speciality_school: speciality_school,
companion: companion,
skills: skills,
class_race_traits: class_race_traits,
feats: feats,
languages: languages
}
File.write filename, YAML.dump(program_data)
end
This is the code I want to use to obtain the user content from the player:
class Character
attr_reader :name, :race, :description
def initialize (name, race, description)
@name = name
@race = race
@description = description
end
end
def prompt
print "Enter Command >"
end
puts "What is your name?"
prompt; name = gets.chomp.downcase
puts "What is your race?"
prompt; race = gets.chomp.downcase
puts "What do you look like?"
prompt; desc = gets.chomp.downcase
player_one = Character.new(name, race, desc)
puts player_one
I'm stuck on how to load it back and refill the character content to make it continue where the player left off.
Meditate on this bit of fictional code:
require 'yaml'
SAVED_STATE_FILE = 'saved_state.yaml'
class User
def initialize(name=nil, address=nil)
@name = name
@address = address
end
def to_h
{
'name' => @name,
'address' => @address
}
end
def save
File.write(SAVED_STATE_FILE, self.to_h.to_yaml)
end
def reload
state = YAML.load_file(SAVED_STATE_FILE)
@name, @address = state.values
end
end
We can create a new user with some properties:
user = User.new('Popeye', '123 E. Main St.')
# => #<User:0x007fe361097058 @name="Popeye", @address="123 E. Main St.">
To write that information to a file, you should probably start with YAML: its output is very readable and many different languages can parse it, which makes the data file reusable. Converting the object to a hash first keeps the YAML simple:
user.to_h
# => {"name"=>"Popeye", "address"=>"123 E. Main St."}
user.to_h.to_yaml
# => "---\nname: Popeye\naddress: 123 E. Main St.\n"
Save the YAML serialized hash:
user.save
Create a new version of the user without any state:
user = User.new
# => #<User:0x007fe361094a88 @name=nil, @address=nil>
Load the saved information from the file back into the blank object:
user.reload
Which results in:
user
# => #<User:0x007fe361094a88 @name="Popeye", @address="123 E. Main St.">
That will give you enough to work from.
Your current code isn't going to work well though; I'd recommend reading some tutorials about Ruby classes and modules, as a Module isn't what you want, at least for your initial code.
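Applied to the Character class from the question, the same pattern might look something like the sketch below. It is an illustration under assumptions, not the only way to do it: the to_h, save and load methods and the character.yaml file name are hypothetical additions, and the sample values are made up.

```ruby
require 'yaml'
require 'tmpdir'

class Character
  attr_reader :name, :race, :description

  def initialize(name = nil, race = nil, description = nil)
    @name = name
    @race = race
    @description = description
  end

  # Plain hash of strings, so the YAML stays readable and portable.
  def to_h
    { 'name' => @name, 'race' => @race, 'description' => @description }
  end

  def save(path)
    File.write(path, to_h.to_yaml)
  end

  # Build a fresh Character from a previously saved file.
  def self.load(path)
    state = YAML.load_file(path)
    new(state['name'], state['race'], state['description'])
  end
end

# Demo in a temporary directory so nothing is left behind.
Dir.mktmpdir do |dir|
  path = File.join(dir, 'character.yaml')
  Character.new('popeye', 'human', 'a sailor').save(path)
  restored = Character.load(path)
  puts restored.name
end
```

Looking values up by key in load, rather than relying on state.values, keeps loading correct even if the order of keys in the file changes.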
I am parsing an online PDF file in order to extract text. Here are the two complete scripts:
First
require 'open-uri'
require "net/http"
require 'pdf/reader'
module OpenSSL
module SSL
remove_const :VERIFY_PEER
end
end
OpenSSL::SSL::VERIFY_PEER = OpenSSL::SSL::VERIFY_NONE
io = open('https://www.mtholyoke.edu/sites/default/files/registrar/bulletin/docs/dept_econ.pdf')
reader = PDF::Reader.new(io)
reader.pages.each do |page|
iso = page.text
$var = iso.scan(/Economics[\s\S]*Overview/)
p $var
end
Second
require 'open-uri'
require "net/http"
require 'pdf/reader'
module OpenSSL
module SSL
remove_const :VERIFY_PEER
end
end
OpenSSL::SSL::VERIFY_PEER = OpenSSL::SSL::VERIFY_NONE
io = open('https://www.mtholyoke.edu/sites/default/files/registrar/bulletin/docs/dept_econ.pdf')
reader = PDF::Reader.new(io)
reader.pages.each do |page|
iso = page.text
$var = iso.scan(/Economics[\s\S]*Overview/)
end
p $var
It appears that when I put p $var after end, the result is truncated, unlike in the first script. Why does putting p $var after end give a different result from putting it before?
In my web app, I need to put it after end and still get the same result as the first script. How can I do so?
tmp = reader.pages.map { |p| p.text.scan(/Economics[\s\S]*Overview/) }
tmp now contains a collection of all the scan results.
puts tmp.join("\n")
Will print them all out with newlines between each match.
Although won't that just print a wad of "Economics Overview"?
If you want to collect the pages themselves it's different code.
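The difference between the two scripts comes down to assignment inside a loop: $var is overwritten on every page, so after the loop it only holds the result for the last page. The sketch below shows this with plain strings instead of a PDF; the pages array is made-up sample data and the variable names are hypothetical.

```ruby
# Stand-ins for page.text values; the last page has no match.
pages = [
  "Economics A Overview",
  "Economics B Overview",
  "no match on this page"
]
pattern = /Economics[\s\S]*Overview/

# Assigning inside the loop keeps only the last page's result.
last = nil
pages.each { |text| last = text.scan(pattern) }

# Collecting keeps every match from every page.
all = pages.flat_map { |text| text.scan(pattern) }

p last
p all
```

Here flat_map is used instead of map so that the result is one flat list of matches rather than one array per page; either works with join.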
I have written a script in Ruby that navigates through a website and gets to a form page. Once the form is filled out, the script hits the submit button, and a dialog box opens asking where to save the file. I am having trouble getting this file. I have searched the web and can't find anything. How would I go about retrieving the file name of the document?
I would really appreciate if someone could help me
My code is below:
browser = Mechanize.new
## CONSTANTS
LOGIN_URL = 'https://business.airtricity.com/ews/welcome.jsp'
HOME_PAGE_URL = 'https://business.airtricity.com/ews/welcome.jsp'
CONSUMPTION_REPORT_URL = 'https://business.airtricity.com/ews/touConsChart.jsp?custid=209495'
LOGIN = ""
PASS = ""
MPRN_GPRN_LCIS = "10000001534"
CONSUMPTION_DATE = "20/01/2013"
END_DATE = "27/01/2013"
DOWNLOAD = "DL"
### Login page
begin
login_page = browser.get(LOGIN_URL)
rescue Mechanize::ResponseCodeError => exception
login_page = exception.page
end
puts "+++++++++"
puts login_page.links
puts "+++++++++"
login_form = login_page.forms.first
login_form['userid'] = LOGIN
login_form['password'] = PASS
login_form['_login_form_'] = "yes"
login_form['ipAddress'] = "137.43.154.176"
login_form.submit
## home page
begin
home_page = browser.get(HOME_PAGE_URL)
rescue Mechanize::ResponseCodeError => exception
home_page = exception.page
end
puts "----------"
puts home_page.links
puts "----------"
# Consumption Report
begin
Report_Page = browser.get(CONSUMPTION_REPORT_URL)
rescue Mechanize::ResponseCodeError => exception
Report_Page = exception.page
end
puts "**********"
puts Report_Page.links
pp Report_Page
puts "**********"
Report_Form = Report_Page.forms.first
Report_Form['entity1'] = MPRN_GPRN_LCIS
Report_Form['start'] = CONSUMPTION_DATE
Report_Form['end'] = END_DATE
Report_Form['charttype'] = DOWNLOAD
Report_Form.submit
## Download Report
begin
browser.pluggable_parser.csv = Mechanize::Download
Download_Page = browser.get('https://business.airtricity.com/ews/touConsChart.jsp?custid=209495/meter_read_download_2013-1-20_2013-1-27.csv').save('Hello')
rescue Mechanize::ResponseCodeError => exception
Download_Page = exception.page
end
http://mechanize.rubyforge.org/Mechanize.html#method-i-get_file
Downloading a file from a URL is pretty straightforward with Mechanize:
browser = Mechanize.new
file_url = 'https://raw.github.com/ragsagar/ragsagar.github.com/c5caa502f8dec9d5e3738feb83d86e9f7561bd5e/.html'
downloaded_file = browser.get_file file_url
File.open('new_file.txt', 'w') { |file| file.write downloaded_file }
I've seen automation fail because of the user agent. Perhaps you could try
browser.user_agent_alias = "Windows Mozilla"