Selenium in Ruby. How can I get the text in alt attribute in img tag? - ruby

I am working on a scraping project and I can't get the "alt" text from "img" tags.
The HTML looks like this:
<div class="example">
<span class="on">
<img src="https://www.~~~~~~~~" alt="hello">
</span>
<span class="on">
<img src="https://www.~~~~~~~~" alt="goodbye">
</span>
<span class="on">
<img src="https://www.~~~~~~~~" alt="konichiwa">
</span>
</div>
This is what I have tried:
def fetch_text_in_on_class
  # @driver.find_elements(:class_name, 'on')[2].text or this ↓
  # @driver.find_elements(:css, 'div.pc-only:nth-of-type(3) tr:nth-of-type(3)').first.text
end
and also something like this:
def fetch_text_in_on_class
  e = @driver.find_elements(:class => 'on').first&.attribute("alt")
  e
end
There are a bunch of elements with the "on" class on the page, and I want the alt text from all of them.
I can get the elements that have the "on" class with the code below, but I can't get the alt text.
@driver.find_elements(:class => 'on')
I would really appreciate it if you could help me.
Thank you.

Forgive me if my Ruby syntax is incorrect or I'm not answering your actual question. You want the alt text itself? What if you collect the img elements inside the elements with class "on" as an array, then loop through them to retrieve each alt text? Something like this:
elements = @driver.find_elements(:css => 'span.on > img')
elements.each { |element|
  alt_text = element.attribute("alt")
  # whatever you want to do with the alt text, e.g. store it in an array declared above
}

It looks like the problem is that you are trying to get the alt text from the element with the class "on". In your posted HTML that element is the span, which has no alt attribute. Use the CSS selector span.on > img to get the img tag, then read its alt attribute. An updated version of your code that gets the alt text of the first match should work:
e = @driver.find_elements(:css => 'span.on > img').first&.attribute("alt")

Let's iterate over the collection, save each alt text in an array, and then print it:
elements = driver.find_elements(:css => 'span.on > img').map { |element| element.attribute("alt") }
p elements
Output
["hello", "goodbye", "konichiwa"]
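The answers above can be wrapped in a small helper. This is only a minimal sketch: the method name alt_texts is hypothetical, and it assumes driver is a Selenium::WebDriver session (or anything that responds to find_elements the same way) that has already loaded the page. Images with no alt attribute, for which attribute returns nil, are dropped.

```ruby
# Hypothetical helper: collect the alt text of every <img> that is a
# direct child of a <span class="on">.
# Assumes `driver` responds to find_elements like a Selenium::WebDriver driver.
def alt_texts(driver)
  driver.find_elements(css: 'span.on > img')
        .map { |img| img.attribute('alt') } # nil when the attribute is missing
        .compact                            # drop images without alt text
end

# Usage, assuming @driver already points at the page:
#   alt_texts(@driver)
```

Taking the driver as an argument keeps the helper independent of how the session is created, which also makes it easy to exercise with a stand-in object.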

Related

Excluding contents of <span> from text using Watir

Watir
mytext = browser.element(:xpath => '//*[@id="gold"]/div[1]/h1').text
Html
<h1>
This is the text I want
<span> I do not want this text </span>
</h1>
When I run my Watir code, it selects all the text, including what is in the spans. How do I just get the text "This is the text I want", and no span text?
If your HTML is more complicated, I find it easier to handle this with Nokogiri, which provides more methods for parsing HTML:
require 'nokogiri'
h1 = browser.element(:xpath => '//*[@id="gold"]/div[1]/h1')
doc = Nokogiri::HTML.fragment(h1.html)
mytext = doc.at('h1').children.select(&:text?).map(&:text).join.strip
Ideally start by trying to avoid using XPath. One of the most powerful features of Watir is the ability to create complicated locators without XPath syntax.
The issue is that calling text on a node gets all content within that node. You'd need to do something like:
top_level = browser.element(id: 'gold')
h1_text = top_level.h1.text
span_text = top_level.h1.span.text
desired_text = h1_text.chomp(span_text)
This is useful for getting only the top-level text.
If there is only one h1, you can omit the id:
@b.h1.text.remove(@b.h1.children.collect(&:text).join(' '))
Or specify it if there are more:
@b.h1(id: 'gold').text.remove(@b.h1(id: 'gold').children.collect(&:text).join(' '))
(String#remove comes from ActiveSupport; in plain Ruby use sub(..., '') instead.)
Make it a method and call it from your script with get_top_text(@b.h1):
def get_top_text(el)
  # use the element that was passed in, not a hard-coded @b.h1
  el.text.chomp(el.children.collect(&:text).join(' '))
end
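One caveat with the chomp-based versions: chomp only strips a trailing match, so they work when the unwanted child text comes last. As a rough sketch of a variant that does not depend on position, the helper below (the name top_level_text is hypothetical) removes the first occurrence of each child's text from the parent's text. It assumes el is a Watir element that responds to text and children, and it can still misfire if a child's text also appears verbatim in the parent's own text.

```ruby
# Hypothetical helper: return an element's text with the text of its
# direct children removed.
# Assumes `el` responds to `text` and `children` (as Watir elements do).
def top_level_text(el)
  el.children
    .map(&:text)
    .reject(&:empty?) # skip children with no visible text
    .reduce(el.text) { |remaining, child_text| remaining.sub(child_text, '') }
    .strip
end

# Usage, assuming @b is a Watir::Browser:
#   top_level_text(@b.h1(id: 'gold'))
```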

Can I use Selenium and Nokogiri to locate an element based on a nearby label?

Let's say I want to scrape the "Weight" attribute from the following content on a website:
<div>
<h2>Details</h2>
<ul>
<li><b>Height:</b>6 ft</li>
<li><b>Weight:</b>6 kg</li>
<li><b>Age:</b>6</li>
</ul>
</div>
All I want is "6 kg". But it's not labeled, and neither is anything around it. But I know that I always want the text after "Weight:". Is there a way of selecting an element based on the text near it or in it?
In pseudocode, this is what it might look like:
require 'selenium-webdriver'
require 'nokogiri'
doc = parsed document
div_of_interest = doc.div where text of h2 == "Details"
element_of_interest = <li> element in div_of_interest with content that contains the string "Weight:"
selected_text = (content in element) minus ("<b>Weight:</b>")
Is this possible?
You can write the following code:
p driver.find_elements(xpath: "//li").detect { |li| li.text.include?('Weight') }.text[/:(.*)/, 1]
Output:
"6 kg"
My suggestion is to use Watir, which is a wrapper around the Ruby Selenium bindings, where you can simply write:
p b.li(text: /Weight/).text[/:(.*)/, 1]
Yes.
require 'nokogiri'
Nokogiri::HTML.parse(File.read(path_to_file))
  .css("div > ul > li")
  .children        # child nodes of each 'li': a 'b' element, then its text
  .each_slice(2)   # pair each 'b' element with the text following it
  .find { |b, text| b.text == "Weight:" }
  .last            # extract the text node
  .text
will return
"6 kg"
You can locate the element with pure XPath: use the contains() function, which returns a Boolean indicating whether its second argument is found in the first. Pass it . (the string value of the li, which includes the text inside the b tag) and the target string. Note that text() would not work here, because "Weight:" sits inside the b child rather than directly in the li. Then just take the value after "Weight:".
xpath_locator = '//div/ul/li[contains(., "Weight:")]'
value = driver.find_element(:xpath, xpath_locator).text.partition('Weight:').last
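The string handling shared by these answers can be tried without a browser. Here is a small sketch in plain Ruby; the helper name value_after_label is hypothetical, and the input string is what an li element's text would look like for the HTML in the question.

```ruby
# Hypothetical helper: return whatever follows `label` in `text`,
# or nil when the label does not occur at all.
def value_after_label(text, label)
  _before, match, after = text.partition(label)
  match.empty? ? nil : after.strip
end

puts value_after_label('Weight:6 kg', 'Weight:') # prints 6 kg
```

Returning nil for a missing label makes it easier to tell "label not found" apart from "label found with an empty value", which comes back as an empty string.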

Put the Xpath element's text to array

I am trying to use Selenium. The problem is the following:
The doc structure:
<div class="jsSkills oSkills">
<a class="oTag oTagSmall oSkill" href="/contractors/skill/software-testing/" data-contractor="749244">software-testing</a>
<a class="oTag oTagSmall oSkill" href="/contractors/skill/software-qa-testing/" data-contractor="749244">software-qa-testing</a>
<a class="oTag oTagSmall oSkill" href="/contractors/skill/blog-writing/" data-contractor="749244">blog-writing</a>
</div>
I need to get the text of every a element as an array, like:
{"software-testing", "software-qa-testing", "blog-writing"}
I tried this:
contrSkill = driver.find_element(:xpath, "//div[contains(@class, 'jsSkills')]").text
puts contrSkill
but got this:
"software-testingsoftware-qa-testingblog-writing"
Please explain how to build the array properly.
You should get all of the link elements you want (using find_elements). Then you can iterate over each link and collect its text into an array (Ruby has a collect method that helps with this).
# Get all of the link elements within the div
skill_links = driver.find_elements(:xpath, "//div[contains(@class, 'jsSkills')]/a")
# Create an array of the text of each link
skill_text_array = skill_links.collect(&:text)
p skill_text_array
#=> ["software-testing", "software-qa-testing", "blog-writing"]
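The same idea can be written with a CSS selector instead of XPath. The following is a sketch under assumptions: link_texts is a hypothetical helper name, driver is assumed to be a Selenium::WebDriver session, and the container selector is passed in by the caller. Whitespace is trimmed and links with no text are skipped.

```ruby
# Hypothetical helper: return the text of every <a> inside the container
# matched by `container_css`, as an array of strings.
# Assumes `driver` responds to find_elements like a Selenium::WebDriver driver.
def link_texts(driver, container_css)
  driver.find_elements(css: "#{container_css} a")
        .map { |link| link.text.strip }
        .reject(&:empty?) # ignore links with no visible text
end

# Usage, assuming driver has loaded the page from the question:
#   link_texts(driver, 'div.jsSkills')
```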

Return a map of all hidden hrefs on a page in watir

Is it possible to return a map of hidden links using Watir? I have been trying to find some useful documentation, without success.
I need it to be generic enough to return any link that's hidden on the page, regardless of class, id, etc.:
style=display: none;
This currently returns all visible links:
full_list = @driver.links.map{|a| a.href}
I'd like to do something like this (my syntax is probably way off):
hidden_list = @driver.hiddens.map{:style, a => 'display: none;'}
Please let me know if there is a way!
Thanks!
You could find all the links for which visible? returns false and collect their href attributes.
For example, given the following HTML:
asdf
<a style="display:none;" href="somewhere/invisible">asdf</a>
<a style="display:none;" href="somewhere/invisible2">asdf</a>
You can do:
hidden_list = @driver.links.find_all{ |a| !a.visible? }.collect(&:href)
#=> ["somewhere/invisible", "somewhere/invisible2"]
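A reusable form of the answer above might look like the sketch below. The helper name hidden_hrefs is hypothetical, and it assumes browser is a Watir::Browser. It uses present? rather than visible? on the assumption that your Watir version provides it; check the documentation for the release you have installed. Links without an href are dropped.

```ruby
# Hypothetical helper: return the href of every link that is in the DOM
# but not currently displayed.
# Assumes `browser.links` yields elements responding to present? and href.
def hidden_hrefs(browser)
  browser.links
         .reject(&:present?) # keep only links that are not displayed
         .map(&:href)
         .compact            # drop links with no href attribute
end

# Usage, assuming @driver is a Watir::Browser:
#   hidden_hrefs(@driver)
```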

get node text() with or without anchor tag

I can't figure out how to get a table cell's text() whether or not an anchor tag wraps the text.
WITH:
<td class="c divComms" title="Komentarz|">
<a id="List1_Dividends_ctl01_HyperLink1" target="_blank" href="http://www.attrader.pl/pl/akcje/DRUKPAK/komunikat/EBI/none,20130104_090845_0000041461">uchwalona</a>
<div class="stcm">2013-01-29</div></td>
WITHOUT:
<td class="c divComms" title="Komentarz|Celem...">
proponowana
<div class="stcm">2012-10-05</div>
</td>
When composing the elements of a hash, I would expect
details = rows.collect do |row|
  detail = {}
  [
    [:paystatus, 'td[7]//text()[not(ancestor::div)]'],
    [:paydate, 'td[7]/div/text()'], # the 2013-01-29 or 2012-10-05 above
  ].each do |name, xpath|
    detail[name] = row.at_xpath(xpath).to_s.strip
  end
  detail
end
to catch either uchwalona or proponowana (note: without the date in the trailing div), but as it stands it ignores the a tag's text unless I use td[7]/a/text(), in which case only the anchor's text "uchwalona" is read.
Using the union operator | should work:
[:paystatus, '(td[7]|td[7]/a)/text()']
(I think you won't need the [not(ancestor::div)] part if you don't use a double-slash)
The problem was resolved when I used row.xpath instead of row.at_xpath; with at_xpath the union operator | appeared to have no effect.
So I changed
detail[name] = row.at_xpath(xpath).to_s.strip
to:
detail[name] = row.xpath(xpath).to_s.strip
This meant I also had to tighten a few XPath expressions in my other |name, xpath| pairs, which had been over-matching without my noticing.
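A likely reason for the difference: at_xpath returns only the first node of the result, and with the union expression the first text node in the WITH cell is the whitespace before the a tag, which strips to an empty string. The sketch below makes that explicit by taking the first non-blank text node instead. The helper name first_nonblank_text is hypothetical, and it assumes row is a Nokogiri node (anything whose xpath method returns an enumerable of nodes responding to text).

```ruby
# Hypothetical helper: evaluate `xpath` against `row` and return the first
# matching text that is not just whitespace, or nil if there is none.
# Assumes `row.xpath` returns an enumerable of nodes responding to `text`.
def first_nonblank_text(row, xpath)
  row.xpath(xpath)
     .map { |node| node.text.strip }
     .find { |text| !text.empty? }
end

# Usage inside the loop from the question:
#   detail[:paystatus] = first_nonblank_text(row, '(td[7]|td[7]/a)/text()')
```

Unlike calling to_s on the whole node set, this does not concatenate every matching text node, so a loose XPath expression is less likely to pull in unrelated text.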
