Access the text before a node with XPath

I have HTML like this:
<span class="item-detail">2 <small>hab.</small></span>
<span class="item-detail">64 <small>m²</small></span>
<span class="item-detail">Planta 3ª <small>exterior con ascensor</small></span>
What is the best way to select the 64 (the number of square meters), taking into account that the order of the elements may vary?

I am still not 100% certain of the requirement, but try this:
//span[small/text()="m²"]/text()[1]

You can use the XPath below:
.//span[@class='item-detail' and child::small[contains(.,'m²')]]/text()[1]
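To see the idea behind both answers in action, here is a minimal sketch in Python using only the standard library. It assumes the spans are wrapped in a hypothetical <div> root so that the markup parses as XML. ElementTree implements only a subset of XPath and has no text() step, so the sketch selects the span through its <small> child and then reads the element's .text, which is the text before the first child element.

```python
import xml.etree.ElementTree as ET

# Sample markup from the question, wrapped in a hypothetical <div> root
# (an assumption) so that it parses as a single XML document.
HTML = """<div>
<span class="item-detail">2 <small>hab.</small></span>
<span class="item-detail">64 <small>m²</small></span>
<span class="item-detail">Planta 3ª <small>exterior con ascensor</small></span>
</div>"""


def square_metres(markup):
    """Return the text that precedes <small>m²</small>, or None if absent."""
    root = ET.fromstring(markup)
    # [small='m²'] keeps only spans with a <small> child whose text is "m²",
    # so the position of the span among its siblings does not matter.
    span = root.find(".//span[small='m²']")
    if span is None:
        return None
    # .text is the text before the first child element of the span.
    return (span.text or "").strip()


print(square_metres(HTML))
```

Because the span is picked by the content of its <small> child rather than by position, reordering the three spans does not change the result.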

Related

Select either A or B with Or Operator in XPath

I'm trying to crawl some websites, and the data I want can be found in either of these places, depending on the site:
Page 1:
<div>
<ul>
<li class="asd"> SomeText1 </li>
</ul>
</div>
Page 2:
<div>
<ul>
<li class="dsa"> SomeText2 </li>
</ul>
</div>
I would like an XPath expression which tries to select SomeText1 first, and if it doesn't exist, tries to get SomeText2.
I've tried //li[@class="asd"]/text() or //li[@class="dsa"]/text(), but this doesn't seem to cut it.
Am I using the or operator wrong? If so, how is it supposed to be used?
EDIT
I'm trying to feed a crawler an XPath in order to find information to store in a DB. On a given webpage, the information I'm trying to get can be in two different places.
Which means webpage 1 could be:
<AA>
<BB>
<CC> Test </CC>
</BB>
</AA>
and on another there could be
<DD>
<EE>
<FF> Test </FF>
</EE>
</DD>
How can I construct an XPath expression which can say either do
AA/BB/CC or (if it fails/doesn't exist) DD/EE/FF?
You can shorten it to:
//li[@class = 'asd' or @class = 'dsa']/text()
Having said that, "not working" is never an accurate description of what went wrong. A potential source of error is using double quotes instead of single quotes. If the whole expression is wrapped in double quotes, any quotes inside it must be single.
Am I using the or operator wrong ?
No, your usage of the or operator is fine. Something else went wrong. (To really diagnose your problem, we'd need more context).
Try...
//li[@class="asd" or @class="dsa"]/text()
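If the crawler needs a strict "try A first, then B" order, one option is to do the fallback in the host language instead of inside the XPath. The following is a minimal sketch in Python with the standard library's ElementTree, which supports only a subset of XPath; the names PATHS and first_match_text are illustrative, not part of any crawler API.

```python
import xml.etree.ElementTree as ET

PAGE_1 = '<div><ul><li class="asd"> SomeText1 </li></ul></div>'
PAGE_2 = '<div><ul><li class="dsa"> SomeText2 </li></ul></div>'

# Candidate paths in priority order: the first one that matches wins.
PATHS = [".//li[@class='asd']", ".//li[@class='dsa']"]


def first_match_text(markup, paths=PATHS):
    """Return the stripped text of the first path that matches, else None."""
    root = ET.fromstring(markup)
    for path in paths:
        node = root.find(path)
        # Compare against None explicitly: an element without children
        # is falsy, so "if node:" would skip valid matches.
        if node is not None:
            return (node.text or "").strip()
    return None


print(first_match_text(PAGE_1))
print(first_match_text(PAGE_2))
```

The same pattern covers the AA/BB/CC versus DD/EE/FF case from the question: list both paths in the order in which they should be tried.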

Extracting content between two tags with XPath

I've just started working with XPath and have run into a problem. Here is the code I want to extract from:
<h3>Some Company</h3>
Mainstreet 1234
<br>
98776, Country
<br>
How would I extract the content between the closing h3 and br tag?
Try //h3/following-sibling::text()[following::br]
This could work: h3/following-sibling::node()[not(preceding-sibling::br) and not(self::br)] (returns "Mainstreet 1234" for me).
But I'm afraid your real XML and real needs are more complicated than the provided sample, so you may need to adjust it further to fit your requirements.
If your code was in the block below:
<par>
<h3>Some Company</h3>
Mainstreet 1234
<br>
98776, Country
</br>
</par>
You will need to tell XPath to give you the text inside every par node that is after an h3 node and before a br node.
In XPath terms this translates to:
//par/text()[preceding::*[name()='h3'] and following::*[name()='br']]
The above would search everywhere in the document for a par node. You can get more specific about the content of the h3 and/or br nodes as well:
//par/text()[preceding::*[name()='h3' and text()='Some Company'] and following::*[name()='br']]
Please let me know if the above does not resolve your problem.
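For comparison, here is a minimal sketch of the same extraction in Python with the standard library. It does not use the XPath expressions above: ElementTree has no text() step, so it relies on the .tail attribute, which holds the text between an element's closing tag and the next element. It assumes the markup is well-formed XML, so the <br> tags are written as <br/>, and it reuses the hypothetical <par> wrapper.

```python
import xml.etree.ElementTree as ET

# Well-formed variant of the sample: <br> is written as <br/> (an assumption,
# since an XML parser rejects unclosed tags).
XML = """<par>
<h3>Some Company</h3>
Mainstreet 1234
<br/>
98776, Country
<br/>
</par>"""


def text_after_h3(markup):
    """Return the text between </h3> and the next element, or None."""
    root = ET.fromstring(markup)
    h3 = root.find(".//h3")
    if h3 is None:
        return None
    # .tail is the text that follows the element up to the next sibling tag.
    return (h3.tail or "").strip()


print(text_after_h3(XML))
```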

trouble with xpath query in rapid miner

I'm having trouble using XPath in RapidMiner. Below is a sample of the HTML that I'm trying to pull data from. I'm having trouble getting the number 7001 and California.
I use //h:span[@class='detail-block']//h:/text() and I can get "Number:"
Then I try //h:span[@class='detail-block']/span//h:/text() and get nothing. I tried a bunch of variations of this and still come up with nothing. I'm able to get things to work in Google Spreadsheets with =importXML, but not in RapidMiner.
<div class="information">
<h2 class="underline">Information</h2>
<span class="detail-block"><span class="detail-attribute">Number: </span>
<span>7001</span></span>
<span class="detail-block"><span class="detail-attribute">Location: </span> <span>California</span></span>
I do not see why your "working" example (//h:span[@class='detail-block']//h:/text()) should work. The h: is a namespace prefix and has to be followed by a node name or an attribute.
//h:span[@class='detail-block']//text() will return all descendant text nodes of span[@class='detail-block']: Number: 7001 Location: California
For "Number:" use:
//h:span[@class='detail-block'][1]/h:span[1]/text()
For "7001":
//h:span[@class='detail-block'][1]/h:span[2]//text()
And for "California":
//h:span[@class='detail-block'][2]/h:span[2]//text()
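The role of the h: prefix can be illustrated outside RapidMiner with a minimal Python sketch using the standard library. It assumes the page is XHTML, that h is bound to the XHTML namespace http://www.w3.org/1999/xhtml, and that the sample is closed with a </div> so it parses. Every element step carries the prefix, including the inner span. ElementTree supports only a subset of XPath, so the positional steps [1] and [2] are replaced by list indexing.

```python
import xml.etree.ElementTree as ET

# Sample from the question, closed with </div> and given the XHTML
# default namespace (both are assumptions made so that it parses).
XHTML = """<div xmlns="http://www.w3.org/1999/xhtml" class="information">
<h2 class="underline">Information</h2>
<span class="detail-block"><span class="detail-attribute">Number: </span>
<span>7001</span></span>
<span class="detail-block"><span class="detail-attribute">Location: </span> <span>California</span></span>
</div>"""

# Prefix-to-namespace mapping; "h" is just a local alias.
NS = {"h": "http://www.w3.org/1999/xhtml"}


def detail_blocks(markup):
    """Map each label (without the colon) to the value in the second span."""
    root = ET.fromstring(markup)
    details = {}
    for block in root.findall(".//h:span[@class='detail-block']", NS):
        # Direct child spans only: the first is the label, the second the value.
        spans = block.findall("h:span", NS)
        if len(spans) < 2:
            continue
        label = (spans[0].text or "").strip().rstrip(":")
        details[label] = (spans[1].text or "").strip()
    return details


print(detail_blocks(XHTML))
```

Dropping the prefix from the inner step, as in the /span attempt from the question, matches nothing here either, because an unprefixed name refers to an element in no namespace.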

Unable to find Dynamic Xpath

My XPath is:
//*[@id='form_MenuBar:j_id24']/span
and the number 24 changes, for example:
//*[@id='form_MenuBar:j_id48']/span
I tried this, but it doesn't work:
driver.findElement(By.xpath("//a[contains(@id,'form_MenuBar:j_id$')]/span"));
Source XML:
<li class="ui-menuitem ui-widget ui-corner-all ui-menuitem-active" role="menuitem">
<a id="form_MenuBar:j_id24" class="ui-menuitem-link ui-corner-all ui-state-hover" href="/Demand/j_spring_security_logout">
<span class="ui-menuitem-text">Log off</span>
</a>
</li>
Just try
driver.findElement(By.xpath("//a[contains(@id,'form_MenuBar:j_id')]/span"));
If you are using contains in XPath, there is no need for the '$'.
It appears that you are using Java, so I'll try to answer based on that. I'm not a Java developer, so I apologize if this is not syntactically correct.
If all that is changing is the number within the ID, and you know the number, you could do:
driver.findElement(By.id(String.format("form_MenuBar:j_id%d", the_id)));
Also, I'm not sure about the application that you are testing, but if there are multiple elements that have an id beginning with "form_MenuBar:j_id", then findElement will only find the first one, which might not be the link you are attempting to find.
You could use findElements, which returns a list of all matching elements, and then iterate through them until you find the one you really want.
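The "collect every candidate, then pick the right one" idea can be sketched without a browser. The following Python example uses the standard library's ElementTree rather than Selenium, so it only illustrates the matching logic; the function name find_menu_span_texts is hypothetical, and the class attribute of the link is shortened.

```python
import xml.etree.ElementTree as ET

# Source XML from the question, with the link's class attribute shortened.
XML = """<li class="ui-menuitem" role="menuitem">
<a id="form_MenuBar:j_id24" class="ui-menuitem-link" href="/Demand/j_spring_security_logout">
<span class="ui-menuitem-text">Log off</span>
</a>
</li>"""


def find_menu_span_texts(markup, id_prefix="form_MenuBar:j_id"):
    """Return the span text of every <a> whose id starts with id_prefix."""
    root = ET.fromstring(markup)
    texts = []
    # iter("a") walks all <a> elements in document order, like findElements.
    for link in root.iter("a"):
        if link.get("id", "").startswith(id_prefix):
            span = link.find("span")
            if span is not None:
                texts.append((span.text or "").strip())
    return texts


print(find_menu_span_texts(XML))
```

Matching on the stable prefix keeps working when the numeric suffix changes, and filtering the resulting list by text (for example "Log off") picks out the intended link when several ids share the prefix.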

How to match elements after an element with certain content or attribute?

Simplified example
<td>caption</td>
<a id="tt-1">text1</a>
<a id="tt-2">text2</a>
<td>topics</td>
<a id="tt-3">text3</a>
<a id="tt-4">text4</a>
<a id="tt-5">text5</a>
What I need is to match all a elements below <td>topics</td>.
Note that there are plenty of elements between those elements in example. Also <td> may be enclosed into other elements.
My current real-world XPath expression looks like this
//a[contains(@id,'tt-')]
Updated to be closer to real-world
Another update to clarify.
Based on your statement "What I need is to match all a elements below <td>topics</td>"
//td[.='topics']/a
I'm sure that's not the whole story, though.
Based on your updated example:
//a[starts-with(@id, 'tt-') and preceding-sibling::td[1] = 'topics']
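The condition preceding-sibling::td[1] = 'topics' means "the nearest td before this element, among its siblings, has the text topics". A minimal Python sketch of that logic with the standard library follows. It assumes the elements from the simplified example are siblings inside a hypothetical <tr> wrapper, and it walks the children in document order instead of evaluating the XPath, since ElementTree supports neither starts-with() nor the preceding-sibling axis. Unlike the XPath comparison, it strips surrounding whitespace from the td text.

```python
import xml.etree.ElementTree as ET

# Simplified example from the question inside a hypothetical <tr> wrapper.
XML = """<tr>
<td>caption</td>
<a id="tt-1">text1</a>
<a id="tt-2">text2</a>
<td>topics</td>
<a id="tt-3">text3</a>
<a id="tt-4">text4</a>
<a id="tt-5">text5</a>
</tr>"""


def links_after(markup, heading, id_prefix="tt-"):
    """Return ids of <a> siblings whose nearest preceding <td> reads heading."""
    root = ET.fromstring(markup)
    ids = []
    last_td_text = None
    for child in root:  # direct children, in document order
        if child.tag == "td":
            # Remember the most recent <td>, i.e. preceding-sibling::td[1].
            last_td_text = (child.text or "").strip()
        elif child.tag == "a" and child.get("id", "").startswith(id_prefix):
            if last_td_text == heading:
                ids.append(child.get("id"))
    return ids


print(links_after(XML, "topics"))
```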
