I want to find all elements that have an attribute whose name starts with x.
For example,
<a xt="1"> text1 </a>
<a xu="2"> text2 </a>
<a text1="3"> text3 </a>
The XPath should find the first two elements, because they have attributes named xt and xu, respectively. The text1 attribute doesn't start with x, so that element should not be returned.
I tried the starts-with() function, but as far as I understand, it tests values, not attribute names.
Try:
@*[starts-with(name(), 'x')]
I would use:
//*[@*[starts-with(local-name(),'x')]]
Similar to the other answer, but uses local-name() instead of name(). This checks only the attribute's local name and ignores the namespace prefix.
For example, if you use name(), the following will also be matched because the prefix starts with x:
<a xns:text1="3" xmlns:xns="xns"> text3 </a>
If you use local-name(), it will not match because the local name of xns:text1 is text1.
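The difference between name() and local-name() can be illustrated outside XPath. The following is only a sketch using Python's standard xml.dom.minidom, not an XPath engine: it imitates the two predicates by comparing each attribute's qualified name and local name. The root wrapper element and the helper name matching_attribute_names are assumptions made for the example.

```python
from xml.dom import minidom

# Namespace declarations (xmlns:...) are not attributes in XPath, but minidom
# exposes them as attribute nodes, so they are skipped below.
XMLNS_URI = "http://www.w3.org/2000/xmlns/"

# The sample elements from the question and answer, wrapped in a
# hypothetical <root> so that they form one well-formed document.
DOC = """<root>
<a xt="1"> text1 </a>
<a xu="2"> text2 </a>
<a text1="3"> text3 </a>
<a xns:text1="3" xmlns:xns="xns"> text3 </a>
</root>"""


def matching_attribute_names(xml_text, prefix, use_local_name):
    """Return qualified names of attributes whose tested name starts with prefix."""
    doc = minidom.parseString(xml_text)
    found = []
    for elem in doc.getElementsByTagName("*"):
        for attr in elem.attributes.values():
            if attr.namespaceURI == XMLNS_URI:
                continue  # skip xmlns declarations
            # attr.name is the qualified name (like name()),
            # attr.localName drops the prefix (like local-name()).
            tested = attr.localName if use_local_name else attr.name
            if tested.startswith(prefix):
                found.append(attr.name)
    return found


by_name = matching_attribute_names(DOC, "x", use_local_name=False)
by_local_name = matching_attribute_names(DOC, "x", use_local_name=True)
print(by_name)
print(by_local_name)
```

Comparing the two printed lists shows that xns:text1 is picked up only when the qualified name is tested.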
I have a variable e which stores a Nokogiri::XML::Element object.
When I execute puts e, I get the following on the screen:
<h3 class="fixed-recipe-card__h3">
<a href="https://www.allrecipes.com/recipe/21712/chocolate-covered-strawberries/" data-content-provider-id="" data-internal-referrer-link="hub recipe" class="fixed-recipe-card__title-link">
<span class="fixed-recipe-card__title-link">Chocolate Covered Strawberries</span>
</a>
</h3>
I would like to scrape this part: https://www.allrecipes.com/recipe/21712/chocolate-covered-strawberries/
How can I do this using Nokogiri?
If you want to extract the link, you can use:
e.at_css("a").attributes["href"].value
.at_css returns the first element matching the CSS selector (another Nokogiri::XML::Element). To get a list of all matching elements, use .css instead.
.attributes gives you a hash mapping attribute name to Nokogiri::XML::Attr. Once you look up the desired attribute in this hash (href), you can call .value to get the actual text value.
I have this element:
<a href="/s-xQ6qeR/documents/download?revid=28">
<span class="icon icon-file-pdf-o" style="vertical-align: middle"></span> test_upload_uwfacjtn.pdf
</a>
I need to check that this element is on the page.
I tried this:
$fileHref = $this->I->grabAttributeFrom("//a[contains(., 'test_upload_uwfacjtn.pdf')]", 'href');
But I got an error:
Step Grab attribute from "//a[contains(.,
'test_upload_uwfacjtn.pdf')]","href" Fail Element that matches CSS
or XPath element with '//a[contains(., 'test_upload_uwfacjtn.pdf')]'
was not found.
I found two ways to check the text inside an HTML tag:
1. Using the grabTextFrom method and then comparing the result
$fileName = $I->grabTextFrom('//a[@href="/s-xQ6qeR/documents/download?revid=28"]/span');
$I->assertEquals('test_upload_uwfacjtn.pdf', $fileName);
This can be useful if you want to store the result in a variable and use it in other tests later.
2. Using the seeElement method with the text to compare inside your XPath
$I->seeElement('//span[text()="test_upload_uwfacjtn.pdf"]');
How do I write a single XPath for this?
<div class="col-lg-4 col-md-4 col-sm-4 profilesky"> <div class="career_icon">
<span> Boost </span> <br/>
Your Profile </div>
I am able to do it in two lines using the contains() function:
.//*[contains(text(),'Boost')]
.//*[contains(text(),'Your Profile')]
But I want to write the XPath in a single line.
You can try it this way:
.//*[@class='career_icon' and contains(., 'Boost') and contains(., 'Your Profile')]
The XPath above checks whether there is an element whose class attribute equals career_icon and whose body contains both the Boost and Your Profile texts.
Note that text() selects only direct child text nodes, and in XPath 1.0 contains() looks at just the first of them. To check the entire text content of an element, simply use dot (.).
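To make the text() versus dot distinction concrete, here is a small sketch in Python using only the standard xml.etree.ElementTree module. It is not an XPath 1.0 engine; it merely imitates the two notions. The helper names string_value and child_text_nodes are made up for this example, and the outer div is closed here so that the snippet parses.

```python
import xml.etree.ElementTree as ET

# The markup from the question, with the outer <div> closed so it parses.
HTML = """<div class="col-lg-4 col-md-4 col-sm-4 profilesky"><div class="career_icon">
<span> Boost </span> <br/>
Your Profile </div></div>"""


def string_value(element):
    """All descendant text joined together, like the XPath string value of '.'."""
    return "".join(element.itertext())


def child_text_nodes(element):
    """Text that sits directly inside the element, like XPath text()."""
    pieces = [element.text] + [child.tail for child in element]
    return [piece for piece in pieces if piece]


root = ET.fromstring(HTML)
icon = root.find(".//*[@class='career_icon']")

# In XPath 1.0, contains(text(), ...) only looks at the first text node.
first_text_node = child_text_nodes(icon)[0]
whole_text = string_value(icon)

print(repr(first_text_node))
print("Boost" in whole_text and "Your Profile" in whole_text)
```

The first direct text node of the career_icon div is only the whitespace before the span, which is why a contains(text(), ...) test on that div finds neither string, while the full string value holds both.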
You can combine several predicates just by writing them one after another, since they refer to the same element:
.//*[contains(text(),'Boost')][contains(text(),'Your Profile')]
I have the following HTML structure (there are many blocks using the same architecture):
<span id="mySpan">
<i>
Price
<b>
3 900
<small>€</small>
</b>
</i>
</span>
Now, I want to get the content of <b> using XPath, which I tried like so:
//span[@id="mySpan"]/i/node()[1][contains(text(),"Price")]
which does not match anything. How can I match this using the node()[1] text as an anchor?
Regarding the XPath you tried: text() returns the text node children of the context node, and a text node has none. Simply use . instead:
//span[@id="mySpan"]/i/node()[1][contains(.,"Price")]
For the ultimate goal, I'd suggest this XPath:
//span[@id="mySpan"]/i[contains(.,"Price")]/b
or, if you specifically want to match against the first node within <i>:
//span[@id="mySpan"]/i[contains(node(),"Price")]/b
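As a rough illustration of the last expression, the sketch below walks the same structure with Python's standard xml.etree.ElementTree. ElementTree does not support contains(), so the test on the first node is done in plain Python. It assumes the first child node of <i> is a text node, as in the sample markup. The wrapping div and the helper name price_texts are hypothetical.

```python
import xml.etree.ElementTree as ET

# The block from the question, wrapped in a hypothetical <div>.
HTML = """<div>
<span id="mySpan">
<i>
Price
<b>
3 900
<small>€</small>
</b>
</i>
</span>
</div>"""


def price_texts(root, label):
    """Collect the text of <b> for each <i> whose leading text contains label."""
    results = []
    for i_elem in root.findall(".//span[@id='mySpan']/i"):
        # i_elem.text is the text before the first child element, which
        # corresponds to node()[1] when that node is a text node.
        first_node_text = i_elem.text or ""
        if label in first_node_text:
            for b_elem in i_elem.findall("b"):
                # Join all descendant text and collapse runs of whitespace.
                results.append(" ".join("".join(b_elem.itertext()).split()))
    return results


prices = price_texts(ET.fromstring(HTML), "Price")
print(prices)
```

With a label that does not occur in the leading text, the function returns an empty list, which mirrors the predicate filtering out the <i> element.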
I want to get a class name like the following:
class="hostHostGrid0_body"
The integer between hostHostGrid and _body can change, but everything else must stay exactly as shown, in that order.
How can I achieve this?
In XPath 1.0 you can use this:
//*[starts-with(@class,'hostHostGrid') and substring-after(@class,'_') = 'body']
to select any element whose class attribute holds just that one class. It matches elements in any context, including all three elements below:
<div class="hostHostGrid0_body">
<span class="hostHostGrid123_body"/>
<b class="hostHostGrid1_body">xxx</b>
</div>
Limitations: it doesn't restrict what appears between hostHostGrid and _body to a number. It can be anything, including spaces (for example, it will also match class="hostHostGrid xyz abc_body").
This one allows for the class occurring among other classes:
//*[contains(substring-before(@class,'_body'),'hostHostGrid')]
It will match:
<div class="other-class hostHostGrid0_body">
<span class="hostHostGrid123_body other-class"/>
<b class="hostHostGrid1_body">xxx</b>
</div>
(it also has the same limitations - will match anything between 'hostHostGrid' and '_body')
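The limitation can be seen by imitating the first predicate in plain Python and comparing it with a stricter check. This is only a sketch: xpath_like_match mirrors the starts-with / substring-after test with string methods, and strict_match is a hypothetical alternative that uses a regular expression to require digits, which XPath 1.0 itself cannot do this way. Both function names are invented for the example.

```python
import re

# One or more digits are required between the fixed prefix and suffix.
STRICT_TOKEN = re.compile(r"hostHostGrid\d+_body")


def xpath_like_match(class_attr):
    """Mirror starts-with(@class,'hostHostGrid') and substring-after(@class,'_') = 'body'."""
    # str.partition splits at the first '_', like substring-after does.
    return class_attr.startswith("hostHostGrid") and class_attr.partition("_")[2] == "body"


def strict_match(class_attr):
    """True if any whitespace-separated class token is hostHostGrid<digits>_body."""
    return any(STRICT_TOKEN.fullmatch(token) for token in class_attr.split())


for value in ["hostHostGrid0_body", "hostHostGrid xyz abc_body", "other-class hostHostGrid0_body"]:
    print(value, xpath_like_match(value), strict_match(value))
```

If the digits matter, one option is to select candidates with the XPath above and then filter them in the host language along these lines.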