Ruby - Watir Saving File with a custom file name - ruby

If I have the below HTML:
<a class="name" title="file_name" href="/somelink">NAMEOFFILE</a>
<div class="profile">
<img class="FFVAD" decoding="auto" style="" sizes="496px" src="https://websitename.com/054a89a69181e68399c756d746f3b996/followme.jpg">
</div>
How do I use Watir to download the image and save it under the link's title?
So the file followme.jpg would be downloaded and saved as file_name.jpg.

Apologies, I figured this one out in the end. I stored the title of the element into a string, then used the string when calling the file write.
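require 'open-uri'

@image_src = @browser.div(:class => "profile").image(:class => "FFVAD").src
# .text returns the link text; attribute_value("title") would return the title attribute instead
@userimage = @browser.link(:class => "name").text
@filename = "./folder/#{@userimage}.jpg"
# Write in binary mode so the image bytes are not altered
File.open(@filename, 'wb') do |f|
  f.write URI.open(@image_src).read
end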
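The file-naming step can also be sketched on its own, without a browser. This is only a minimal sketch: `filename_for` and `save_image` are hypothetical helper names (not part of Watir), and it assumes the title string and the image bytes have already been fetched. It takes the extension from the image URL and replaces characters that are awkward in file names.

```ruby
require 'fileutils'
require 'uri'

# Hypothetical helper: combine a link title with the extension of the image URL.
def filename_for(title, image_url)
  ext  = File.extname(URI.parse(image_url).path)  # e.g. ".jpg"
  base = title.strip.gsub(/[^\w.-]+/, '_')        # replace spaces, slashes, etc.
  "#{base}#{ext}"
end

# Hypothetical helper: write already-downloaded bytes under the custom name.
def save_image(dir, title, image_url, bytes)
  FileUtils.mkdir_p(dir)                          # create ./folder if it is missing
  path = File.join(dir, filename_for(title, image_url))
  File.binwrite(path, bytes)                      # binary write, like mode 'wb'
  path
end
```

With Watir, `bytes` would come from something like `URI.open(@image_src).read`, and `title` from the link element.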

Related

Why is the following Nokogiri/XPath code removing tags inside the node?

The document going in has a structure like this:
<span class="footnote">Hello there, link</span>
The XPath search is:
@doc = set_nokogiri(html)
footnotes = @doc.xpath(".//span[@class = 'footnote']")
footnotes.each_with_index do |footnote, index|
  puts footnote
end
The above footnote becomes:
<span>Hello there, link</span>
I assume my XPath is wrong but I'm having a hard time figuring out why.
I had the wrong tag in the output and should have been more careful. The point being that the <a> tag is getting stripped but its contents are still included.
I also added the set_nokogiri line in case that's relevant.
I can't duplicate the problem:
require 'nokogiri'
doc = Nokogiri::HTML(<<EOT)
<span class="footnote">Hello there, link</span>
EOT
footnotes = doc.xpath(".//span[@class = 'footnote']")
footnotes.to_xml # => "<span class=\"footnote\">Hello there, link</span>"
footnotes.each do |f|
  puts f
end
# >> <span class="footnote">Hello there, link</span>
An additional problem is that the <a> tag has an invalid href URL.
link
should be:
link

JSON will not display

I have a Sinatra application that should get image URLs from a JSON file and put them into HTML <img> tags.
I can parse through the JSON just fine when I print it to the command line, but when I use ERB to place the data, it won't show.
I put it in <ul> tags and got only the bullet points for every image in the JSON file.
Here is my code:
app.rb:
get "/" do
  file = open("./images.json")
  json = file.read
  @parsed = JSON.parse(json)
  erb :roar
  # @parsed.each do |roar|
  #   p roar["url"]
  # end
end
Roar.erb:
<ul>
<% @parsed.each do |shop| %>
<li> <%shop["url"] %> </li>
<% end %>
</ul>
Aren't you just missing an "=" here? Without it, ERB evaluates the expression but does not output the result:
<li> <%= shop["url"] %> </li>
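The difference between the two tag forms can be checked outside Sinatra with the standard `erb` library. This is a minimal sketch: the `render` helper and the sample URLs are made up for illustration, and it uses a local variable `parsed` where the Sinatra view uses `@parsed`.

```ruby
require 'erb'

# Hypothetical helper: render a template string with `parsed` as a local variable.
def render(template, parsed)
  ERB.new(template).result_with_hash(parsed: parsed)
end

parsed = [{ "url" => "a.png" }, { "url" => "b.png" }]  # made-up sample data

# <% ... %> runs the code but emits nothing
silent  = render('<% parsed.each do |shop| %><li><% shop["url"] %></li><% end %>', parsed)
# <%= ... %> runs the code and inserts the result
visible = render('<% parsed.each do |shop| %><li><%= shop["url"] %></li><% end %>', parsed)

puts silent   # <li></li><li></li>
puts visible  # <li>a.png</li><li>b.png</li>
```

The first result is the "bullet points only" symptom from the question.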
Just some comments on the code in general:
Don't use:
file = open("./images.json")
json = file.read
@parsed = JSON.parse(json)
Instead, use:
json = File.open("./images.json") do |fi|
  fi.read
end
@parsed = JSON.parse(json)
Or:
json = File.open("./images.json") { |fi| fi.read }
@parsed = JSON.parse(json)
Or:
json = File.read("./images.json")
@parsed = JSON.parse(json)
Or:
@parsed = JSON.parse(File.read("./images.json"))
The reasons are:
file = open("./images.json") opens the file but never closes it. That's not good form, and it's also not idiomatic Ruby. The first two replacements pass a block to File.open, which closes the file automatically when the block exits.
File.read returns the contents of the file, just like opening and reading it in separate steps, but does so in a single call. The file is also closed automatically afterwards.
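The closing behaviour can be observed directly. This is a minimal sketch using a temporary file in place of images.json; `read_with_block` and `read_without_block` are hypothetical helper names, and the JSON content is made up.

```ruby
require 'json'
require 'tempfile'

# Block form: returns the contents and whether the handle is closed afterwards.
def read_with_block(path)
  handle = nil
  contents = File.open(path) { |fi| handle = fi; fi.read }
  [contents, handle.closed?]
end

# No block: the handle stays open until something closes it explicitly.
def read_without_block(path)
  file = File.open(path)
  contents = file.read
  still_open = !file.closed?
  file.close  # close it here so the sketch itself does not leak a handle
  [contents, still_open]
end

Tempfile.create(['images', '.json']) do |tmp|
  tmp.write('[{"url": "a.png"}]')  # made-up sample content
  tmp.flush
  p read_with_block(tmp.path).last     # true: closed by the block form
  p read_without_block(tmp.path).last  # true: was still open after reading
  p JSON.parse(File.read(tmp.path))    # one-step read, then parse
end
```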

Clicking the "Show more" link on a LinkedIn group page using Ruby Mechanize

I have logged in to LinkedIn and reached my groups page using Ruby Mechanize. I am also able to retrieve the list of questions on the page. However, I am unable to click the "Show more" link at the bottom so that I can load the entire page and hence all the questions:
require 'rubygems'
require 'mechanize'
require 'open-uri'
a = Mechanize.new { |agent|
  # LinkedIn probably refreshes after login
  agent.follow_meta_refresh = true
}
a.get('http://linkedin.com/') do |home_page|
  my_page = home_page.form_with(:name => 'login') do |form|
    form.session_key = '********' # put your email ID here
    form.session_password = '********' # put your password here
  end.submit
  mygroups_page = a.click(my_page.link_with(:text => /Groups/))
  # puts mygroups_page.links
  link_to_analyse = a.click(mygroups_page.link_with(:text => 'Semantic Web'))
  link_to_test = link_to_analyse.link_with(:text => 'Show more...')
  puts link_to_test.class
  # link_to_analyse.search(".user-contributed .groups a").each do |item|
  #   puts item['href']
  # end
end
Although a link with the text 'Show more...' exists on the page, I am somehow not able to click it: link_to_test.class shows NilClass. What could the problem be?
The part of the page I need to reach is:
<div id="inline-pagination">
<span class="running-count">20</span>
<span class="total-count">1134</span>
<a href="groups?mostPopularList=&gid=49970&split_page=2&ajax=ajax" class="btn-quaternary show-more-comments" title="Show more...">
<span>Show more...</span>
<img src="http://static01.linkedin.com/scds/common/u/img/anim/anim_loading_16x16.gif" width="16" height="16" alt="">
</a>
</div>
I need to click the "Show more..." link. I tried links_with(:href => ..), but it doesn't seem to work.
NEW ANSWER:
I just inspected the page source of the group and it seems that for the "Show more" link they actually use the three full stop characters and not an ellipsis.
Have you tried targeting the link by its title attribute?
link_to_analyse.link_with(:title => 'Show more...')
If that's still not working, have you tried dumping the text of all the links on the page with
link_to_analyse.links.each do |link|
puts link.text
end
---- OLD ANSWER INCORRECT ----
LinkedIn use the "Horizontal Ellipsis" Unicode character (code U+2026) for their links that "look" like they have "..." at the end. So your code is not actually finding the link.
Character you need: http://www.fileformat.info/info/unicode/char/2026/index.htm
Sneaky :)
EDIT: and to get the link, of course, you need to insert the appropriate Unicode character in your link text. Note the double quotes: Ruby does not expand \u escapes in single-quoted strings.
link_to_analyse.link_with(:text => "Show more\u2026")
The tags inside the anchor will create some white space around the anchor text. You can account for that with:
link_to_analyse.link_with :text => /\A\s*Show more\.\.\.\s*\Z/
But it's probably good enough to just do:
link_to_analyse.link_with :text => /Show more\.\.\./
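The string pitfalls discussed in these answers can be checked in plain Ruby, with no Mechanize involved. This is a minimal sketch: `show_more?` is a hypothetical helper, and the idea of accepting either three periods or the U+2026 character is an assumption made for illustration, not something the page is known to require.

```ruby
# Hypothetical helper: true if the text is "Show more" followed by either three
# periods or a single horizontal ellipsis, ignoring surrounding whitespace.
def show_more?(text)
  /\A\s*Show more(?:\.\.\.|\u2026)\s*\z/.match?(text)
end

three_dots = "Show more..."     # three separate period characters
ellipsis   = "Show more\u2026"  # double quotes: one U+2026 character
literal    = 'Show more\u2026'  # single quotes: backslash, "u", "2026" kept as-is

puts three_dots == ellipsis                  # false
puts ellipsis.length                         # 10
puts literal.length                          # 15
puts show_more?("\n  Show more...\n ")       # true
puts show_more?(ellipsis)                    # true
puts show_more?(literal)                     # false
```

A pattern like this could be passed as the `:text` criterion in place of a plain string.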

Parsing webpage with some html tags using Nokogiri

For example:
content=Nokogiri::HTML(open(url)).at_css(".appwindow").text
This example extracts only the text from .appwindow.
How can I get this text together with its <p> tags?
I think you want to find either the full HTML of the first element that has an appwindow class, or perhaps the inner HTML. If so:
require 'nokogiri'
html = Nokogiri::HTML <<ENDHTML
<div id='menu'>menu</div>
<div class='appwindow'><p>Hello <b>World</b>!</p></div>
ENDHTML
puts html.at_css('.appwindow').text
#=> Hello World!
puts html.at_css('.appwindow').to_html
#=> <div class="appwindow"><p>Hello <b>World</b>!</p></div>
puts html.at_css('.appwindow').inner_html
#=> <p>Hello <b>World</b>!</p>
See the list of methods on Nokogiri::XML::Node for other options available to you.

Best way to parse a table in Ruby

I'd like to parse a simple table into a Ruby data structure. The table looks like this:
[Image of the table: http://img232.imageshack.us/img232/446/picture5cls.png]
Edit: Here is the HTML
and I'd like to parse it into an array of hashes. E.g.,:
schedule[0]['NEW HAVEN'] == '4:12AM'
schedule[0]['Travel Time In Minutes'] == '95'
Any thoughts on how to do this? Perl has HTML::TableExtract, which I think would do the job, but I can't find any similar library for Ruby.
You might like to try Hpricot (gem install hpricot, prepend the usual sudo for *nix systems)
I placed your HTML into input.html, then ran this:
require 'hpricot'
doc = Hpricot.XML(open('input.html'))
table = doc/:table
(table/:tr).each do |row|
  (row/:td).each do |cell|
    puts cell.inner_html
  end
end
which, for the first row, gives me
<span class="black">12:17AM </span>
<span class="black">
</span>
<span class="black">1:22AM </span>
<span class="black">
</span>
<span class="black">65</span>
<span class="black">TRANSFER AT STAMFORD (AR 1:01AM & LV 1:05AM) </span>
<span class="black">
N
</span>
So already we're down to the content of the TD tags. A little more work and you're about there.
(BTW, the HTML looks a little malformed: you have <th> tags in <tbody>, which seems a bit perverse: <tbody> is fairly pointless if it's just going to be another level within <table>. It makes much more sense if your <tr><th>...</th></tr> stuff is in a separate <thead> section within the table. But it may not be "your" HTML, of course!)
In case there isn't a library to do that for ruby, here's some code to get you started writing this yourself:
require 'nokogiri'
doc = Nokogiri("<table><tr><th>la</th><th><b>lu</b></th></tr><tr><td>lala</td><td>lulu</td></tr><tr><td><b>lila</b></td><td>lolu</td></tr></table>")
# The first row holds the headers; the remaining rows hold the data
header, *rest = (doc/"tr").map do |row|
  row.children.map do |c|
    c.text
  end
end
header.map! { |str| str.to_sym }
# Build a Struct with one member per header
item_struct = Struct.new(*header)
table = rest.map do |row|
  item_struct.new(*row)
end
table[1].lu #=> "lolu"
This code is far from perfect, obviously, but it should get you started.
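The question asks for an array of hashes keyed by the header text rather than a Struct. Once the header and row cells have been extracted as plain strings by either parser, that last step might look like the sketch below. `rows_to_hashes` is a hypothetical helper, and the two-column sample reuses only the values quoted in the question, not the real table.

```ruby
# Hypothetical helper: pair each row's cells with the header names.
# Cells are stripped because extracted text often carries stray whitespace.
def rows_to_hashes(header, rows)
  keys = header.map(&:strip)
  rows.map { |row| keys.zip(row.map(&:strip)).to_h }
end

# Abbreviated sample data taken from the values quoted in the question
header = ["NEW HAVEN", "Travel Time In Minutes"]
rows   = [["4:12AM ", "95"]]

schedule = rows_to_hashes(header, rows)
puts schedule[0]['NEW HAVEN']               # 4:12AM
puts schedule[0]['Travel Time In Minutes']  # 95
```

String keys are used here because header text such as "Travel Time In Minutes" contains spaces and so cannot serve as a Struct member accessed like `table[1].lu`.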

Resources