nokogiri + mechanize css selector by text - ruby

I am new to Nokogiri and so far most familiar with CSS selectors. I am trying to parse information from a table; below is a sample of the table and the code I'm using. I'm stuck on the appropriate if statement, as it seems to return the whole contents of the table.
Table:
<div class="holder">
<div class ="row">
<div class="c1">
<!-- Content I Don't need -->
</div>
<div class="c2">
<span class="data">
<!-- Content I Don't Need -->
</span>
</div>
</div>
...
<div class="row">
<div class="c1">
SPECIFIC TEXT
</div>
<div class="c2">
<span class="data">
What I want
</span>
</div>
</div>
</div>
My Script: (if SPECIFIC TEXT is found in the table it returns every "div.c2 span.data" variable - so I've either screwed up my knowledge of do loops or if statements)
data = []
page.agent.get(url)
page.search('div.row').each do |row_data|
if (row_data.search('div.c1:contains("/SPECIFIC TEXT/")').text.strip
temp = row_data.search('div.c2 span.data').text.strip
data << temp
end
end

There's no need to stop and insert Ruby logic when you can extract what you need with a single CSS selector.
data = page.search('div.row > div.c1:contains("SPECIFIC TEXT") + div.c2 span.data')
This returns only the elements that match the selector (i.e. the span.data elements inside the div.c2 that immediately follows a div.c1 containing SPECIFIC TEXT).
Here's where your logic may have gone wrong:
This code
if (row_data.search('div.c1:contains("SPECIFIC TEXT")'...
temp = row_data.search('div.c2 span.data')...
first searches the row for the specific text, then, if it matches, returns ALL elements in that row matching the second query, which has the same starting point. Note also that .text.strip returns a String, and every String in Ruby (even an empty one) is truthy, so the if passes for every row, not just the matching one. The key is the + in the CSS selector above, which matches the element immediately following (i.e. the next sibling element). I'm assuming, of course, that the next element is always what you want.

I'd do
require 'nokogiri'
html = <<_
<div class="holder">
<div class ="row">
<div class="c1">
<!-- Content I Don't need -->
</div>
<div class="c2">
<span class="data">
<!-- Content I Don't Need -->
</span>
</div>
</div>
<div class="row">
<div class="c1">
SPECIFIC TEXT
</div>
<div class="c2">
<span class="data">
What I want
</span>
</div>
</div>
</div>
_
doc = Nokogiri::HTML(html)
css_string = 'div.row > div.c1[text()*="SPECIFIC TEXT"] + div.c2 span.data'
doc.at(css_string).text.strip
# => "What I want"
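How those selectors work here:
[name*="value"] - Selects elements that have the specified attribute with a value containing a given substring. Here text() is used in place of an attribute name, so the substring test is applied to the element's text; this is not standard CSS and relies on Nokogiri accepting it.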
Child Selector (“parent > child”) - Selects all direct child elements specified by "child" of elements specified by "parent".
Next Adjacent Selector (“prev + next”) - Selects all next elements matching "next" that are immediately preceded by a sibling "prev".
Class Selector (“.class”) - Selects all elements with the given class.
Descendant Selector (“ancestor descendant”) - Selects all elements that are descendants of a given ancestor.

Related

XPath - exclude descendants of specific selector

I have following markup (schema.org attributes included):
<body>
<div itemscope itemtype="http://schema.org/Foo">
<div>
<div itemname="name">
Foo scoped name
</div>
</div>
<div>
<div itemscope itemtype="http://schema.org/Bar">
<div>
<div itemname="name">
Bar scoped name
</div>
</div>
</div>
</div>
</div>
</body>
I need to select (presumably by XPath, as CSS selectors won't be enough for the task) the divs that have itemname="name" in the http://schema.org/Foo scope, but not those that are nested inside another element with an itemscope attribute.
So in example provided I need to select only "Foo scoped name", but not "Bar scoped name".
You can use something like:
//div[@itemname="name"][ancestor::div[@itemscope][1][@itemtype="http://schema.org/Foo"]]
This looks for a div element with a specific attribute value (@itemname="name") whose nearest div ancestor with an @itemscope attribute also has a specific @itemtype attribute value (http://schema.org/Foo).
Output : <div itemname="name"> Foo scoped name </div>

Need to select a checkbox, based on text in another div

<div class="bli-category">
<div class="row ng-scope" ng-repeat="placementtrack by $index">
<div class="col-sm-12">
<div class="col-sm-1 bli-category-checkbox">
<input class="bli-check-box ng-valid" type="checkbox" ng-click="addPlacement" ng-checked="checkedPlacementIndex" ng-model="selectedPlacement">
</div>
<div class="col-sm-8 bli-category-content">
<div class="ng-binding" ng-bind="placement.placementName">page_details</div>
</div>
</div>
</div>
I need to select the checkbox in class='bli-check-box ng-valid' for the text in class='ng-binding'
When I try to get the xpath like
//input[@class='bli-check-box ng-valid']
it selects all the 4-5 checkboxes
To select the checkbox in class='bli-check-box ng-valid' with respect to the text in class='ng-binding', i.e. page_details, you can use the following XPath:
//div[@class='bli-category']//div[@class='ng-binding' and contains(.,'page_details')]//preceding::input[@class='bli-check-box ng-valid']
Note: As the element is an Angular element, you have to wait for it to be clickable before attempting to click.
//div[text()='page_details' and @class='ng-binding']/../preceding-sibling::div//input[@class='bli-check-box ng-valid']
The above xpath starts with finding the node which has the custom text that you know. It then traverses to its parent and then its previous sibling which in your case houses your required input node. So after traversing to the div you select its child which is your required input node.

Enter text in input in Watir with same class

I am trying to enter text into an input field and can not successfully get it working. I have two inputs that look like this:
<div class="outerParentClass">
<div class="classLabel">From</div>
<div class="classA classB classD">
<div class="classE">
<div class="classText"> TEXT HERE </div>
<input class="classInputA classInoutB" type="text">
</div>
</div>
</div>
<div class="classLabel">To</div>
<div class="classA classB classD">
<div class="classE">
<div class="classText"> DIFFERENT TEXT HERE </div>
<input class="classInputA classInoutB" type="text">
</div>
</div>
</div>
</div>
Both the inputs are the exact same format as above. There are no Id's and both have the same classes. I am struggling at entering the text into these or even finding them correctly.
When I do this:
browser.text_field(:class => "classInputA").size
It returns 20
When I do this:
browser.text_field(:class => "classInputA")
It returns:
#<Watir::TextField:0x..fbccafb7ed2e9b85e located=false selector={:class=>"classInputA", :tag_name=>"input"}>
Not sure how to locate either of these inputs. Any suggestions?
The text adjacent to the field provides a label and context for the field. As it is likely unique, you can use this to identify the element.
To do this, find the div containing the label text. Then navigate to the adjacent div that contains the text field.
browser.div(text: 'From', class: 'classLabel') # label of interest
.element(xpath: './following-sibling::div[1]') # adjacent div containing text field
.text_field # the text field
Note that in the next release of Watir, .element(xpath: './following-sibling::div[1]') will be replaceable by just .following_sibling.

What will be the xpath?

How can I get the element data using jsoup or XPath?
My requirement is:
If I have selected an element with class='SecondClass', how do I find the "FirstClass" element it belongs to? For example, if I have selected class="SecondClass">yyyyyyyyy, how do I find the
class="FirstClass">Hi element?
<div class="FirstClass">Hello</div>
<div class="SecondClass">xyza</div>
<div class="SecondClass">lllllllll</div>
<div class="FirstClass">Hi</div>
<div class="SecondClass">ooooooooo</div>
<div class="SecondClass">yyyyyyyyy</div>
<div class="SecondClass">ttttttttyt</div>
<div class="FirstClass">HelloHi</div>
<div class="SecondClass">xysefsfza</div>
<div class="SecondClass">hohoho</div>
<div class="SecondClass">xydadaza</div>
<div class="SecondClass">new</div>
You can append this XPath expression to the path of the current element to get the nearest preceding <div> element whose class attribute equals FirstClass:
/preceding-sibling::div[@class='FirstClass'][1]
Given the XML data as posted in the question, and with this as the current element:
<div class="SecondClass">yyyyyyyyy</div>
XPath query above will return this element :
<div class="FirstClass">Hi</div>

xPath strange behaviour - selecting ALL elements even if [1] set

Today I stumbled upon a very interesting case (at least for me). I am experimenting with Selenium and XPath and tried to get some elements, but got strange behaviour:
<div class="resultcontainer">
<div class="info">
<div class="title">
<a>
some text
</a>
</div>
</div>
</div>
<div class="resultcontainer">
<div class="info">
<div class="title">
<a>
some other text
</a>
</div>
</div>
</div>
<div class="resultcontainer">
<div class="info">
<div class="title">
<a>
some even unrelated text
</a>
</div>
</div>
</div>
This is my data.
When I run the following XPath query:
//div[@class="title"][1]/a
I get ALL of them as a result instead of only the first one. But if I query:
//div[@class="resultcontainer"][1]/div[@class="info"]/div[@class="title"]/a
I get only the first, not all.
Is there some divine reason behind that?
Best regards,
bisko
I think you want
(//div[@class="title"])[1]/a
This:
//div[@class="title"][1]/a
selects all (<a> elements that are children of) <div> elements that have a @class of 'title' and that are the first such <div> among their parent's children (in this context). Which means: it selects all of them.
The working XPath selects all <div> elements that have a @class of 'title' - and of those it takes the first one.
The predicates (the expressions in square brackets []) are applied to each element that matched the preceding location step (i.e. "//div") individually. To apply a predicate to a filtered set of nodes, you need to make the grouping clear with parentheses.
Consequently, this:
//div[1][@class="title"]/a
would select every <div> that is the first <div> child of its parent, and then filter those down further by checking the @class value. Also not what you want. ;-)

Resources