Trouble with XPath query in RapidMiner

I'm having trouble using XPath in RapidMiner. Below is a sample of the HTML that I'm trying to pull data from. I'm having trouble getting the number 7001 and California.
I use //h:span[@class='detail-block']//h:/text() and I can get "Number:".
Then I try //h:span[@class='detail-block']/span//h:/text() and get nothing. I tried a bunch of variations of this and still come up with nothing. I'm able to get things to work in Google Spreadsheet with =importXML, but not in RapidMiner.
<div class="information">
<h2 class="underline">Information</h2>
<span class="detail-block"><span class="detail-attribute">Number: </span>
<span>7001</span></span>
<span class="detail-block"><span class="detail-attribute">Location: </span> <span>California</span></span>

I do not see why your "working" example (//h:span[@class='detail-block']//h:/text()) should work at all. h: is a namespace prefix, and it has to be followed by the name of an element or an attribute.
//h:span[@class='detail-block']//text() will return all descendant text nodes of span[@class='detail-block']: Number: 7001 Location: California
For "Number:" use:
//h:span[@class='detail-block'][1]/h:span[1]/text()
For "7001":
//h:span[@class='detail-block'][1]/h:span[2]//text()
And for "California":
//h:span[@class='detail-block'][2]/h:span[2]//text()
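To check these expressions outside RapidMiner, here is a minimal Java sketch using the JDK's built-in XPath 1.0 engine (javax.xml.xpath). It assumes, as RapidMiner appears to do, that the h prefix is bound to the XHTML namespace http://www.w3.org/1999/xhtml; the class name DetailBlockXPath and the closing </div> are additions for the example. The last line shows why the prefix matters: once the elements are in a namespace, an unprefixed //span matches nothing.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class DetailBlockXPath {
    // Assumption: the h prefix stands for the XHTML namespace.
    static final String XHTML = "http://www.w3.org/1999/xhtml";

    // The question's sample, closed with </div> and placed in the XHTML namespace.
    static final String SAMPLE =
          "<div xmlns='" + XHTML + "' class='information'>\n"
        + "<h2 class='underline'>Information</h2>\n"
        + "<span class='detail-block'><span class='detail-attribute'>Number: </span>\n"
        + "<span>7001</span></span>\n"
        + "<span class='detail-block'><span class='detail-attribute'>Location: </span>"
        + " <span>California</span></span>\n"
        + "</div>";

    /** Evaluates expr against SAMPLE and returns the result as a string. */
    static String eval(String expr) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // prefixed steps such as h:span need this
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(SAMPLE)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override
                public String getNamespaceURI(String prefix) {
                    return "h".equals(prefix) ? XHTML : XMLConstants.NULL_NS_URI;
                }
                @Override
                public String getPrefix(String uri) { return null; }
                @Override
                public Iterator<String> getPrefixes(String uri) {
                    return Collections.emptyIterator();
                }
            });
            // A node-set result is converted to the string value of its first node.
            return xpath.evaluate(expr, doc);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String block = "//h:span[@class='detail-block']";
        System.out.println(eval(block + "[1]/h:span[1]/text()"));  // Number:
        System.out.println(eval(block + "[1]/h:span[2]//text()")); // 7001
        System.out.println(eval(block + "[2]/h:span[2]//text()")); // California
        // Unprefixed names mean "no namespace", so this matches nothing here.
        System.out.println("[" + eval("//span") + "]");            // []
    }
}
```

If RapidMiner binds h differently in your setup, only the XHTML constant would need to change.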


Accessing the text before a node with XPath

I have HTML like this:
<span class="item-detail">2 <small>hab.</small></span>
<span class="item-detail">64 <small>m²</small></span>
<span class="item-detail">Planta 3ª <small>exterior con ascensor</small></span>
What is the best way to select the 64 (which is the number of square meters), taking into account that the order of the elements may vary?
I am still not 100% certain of the requirement, but try this:
//span[small/text()="m²"]/text()[1]
You can use the XPath below:
.//span[@class='item-detail' and child::small[contains(.,'m²')]]/text()[1]
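Both answers can be checked with the JDK's XPath 1.0 engine. In this Java sketch (class name illustrative), the spans are wrapped in a hypothetical <div> root so they parse as XML, and their order is shuffled to mimic the "order may vary" requirement. Note that the selected text node is "64 " with a trailing space; the standard XPath 1.0 function normalize-space() trims it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class TextBeforeSmall {
    // The question's spans under a hypothetical <div> root, deliberately reordered
    // so nothing can rely on position. Unicode escapes keep the source ASCII-only:
    // 00b2 is the superscript two, 00aa the feminine ordinal indicator.
    static final String SAMPLE = "<div>"
        + "<span class='item-detail'>Planta 3\u00aa <small>exterior con ascensor</small></span>"
        + "<span class='item-detail'>64 <small>m\u00b2</small></span>"
        + "<span class='item-detail'>2 <small>hab.</small></span>"
        + "</div>";

    /** Evaluates expr against SAMPLE and returns the result as a string. */
    static String eval(String expr) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(SAMPLE)));
            return XPathFactory.newInstance().newXPath().evaluate(expr, doc);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // First answer: pick the span by its <small> label, then take its first text node.
        System.out.println("[" + eval("//span[small/text()='m\u00b2']/text()[1]") + "]"); // [64 ]
        // Second answer, wrapped in normalize-space() to drop the trailing blank.
        System.out.println("[" + eval("normalize-space(.//span[@class='item-detail'"
                + " and child::small[contains(., 'm\u00b2')]]/text()[1])") + "]"); // [64]
    }
}
```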

Avoid parentheses in path using XPath 1.0

The following XML structure represents a website with many articles. Every article contains, among many other things, the date of its creation and possibly arbitrarily many dates of its modification. I want to get the date of the last access (either the creation or the last modification) of every article using XPath 1.0.
<website>
<article>
<date><strong>22.11.2017</strong></date>
<edits>
<edit><strong>17.12.2017</strong></edit>
</edits>
</article>
<article>
<date><strong>17.4.2016</strong></date>
<edits></edits>
</article>
<article>
<date><strong>3.5.2011</strong></date>
<edits>
<edit><strong>4.5.2011</strong></edit>
<edit><strong>12.8.2012</strong></edit>
</edits>
</article>
<article>
<date><strong>12.2.2009</strong></date>
<edits></edits>
</article>
<article>
<date><strong>23.11.1987</strong></date>
<edits>
<edit><strong>3.4.2001</strong></edit>
<edit><strong>11.5.2006</strong></edit>
<edit><strong>13.9.2012</strong></edit>
</edits>
</article>
</website>
In other words, the expected output is:
<strong>17.12.2017</strong>
<strong>17.4.2016</strong>
<strong>12.8.2012</strong>
<strong>12.2.2009</strong>
<strong>13.9.2012</strong>
So far I've only created this path:
//article/*[self::date or self::edits/edit][last()]
that looks for date and non-empty edits nodes in every article and selects the last of them. But I don't know how to access the latest strong of every such selection, and the naive //strong[last()] appended to the end of the path doesn't work.
I found a solution in XPath 2.0. Either of these paths should work, if I'm not mistaken:
//article/(*[self::date or self::edits/edit][last()]//strong)[last()]
//article/(*//strong)[last()]
Such use of parentheses within a path is invalid in XPath 1.0, though.
This XPath 1.0 expression:
/website/article/descendant::strong[parent::date|parent::edit][last()]
selects the nodes:
<strong>17.12.2017</strong>
<strong>17.4.2016</strong>
<strong>12.8.2012</strong>
<strong>12.2.2009</strong>
<strong>13.9.2012</strong>
Tested in http://www.xpathtester.com/xpath/56d8f7bc4b9c8c064fdad16f22469026
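Note that positional predicates are evaluated against the nodes selected by their own step, separately for each context node, so [last()] here picks the last strong within each article.
Here is a simpler XPath to get your output: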
//article/descendant-or-self::strong[last()]

Xpath how to extract all texts in a scope?

<div class="main">
<p>Peter got some troubles.</p>
<p>I gave him my hand.</p>
<p>But Sam didn't.</p>
</div>
How can I extract all texts in the div.main with xpath?
I've tried string(//div[#class="main"]/p), but it only extracted the first line:
Peter got some troubles.
But I hope I can process all lines like:
Peter got some troubles.
I gave him my hand.
But Sam didn't.
The string value of the div element should give you what you want. In other words, take off the /p at the end of your XPath expression. The problem with your expression is that string() takes only the first node in the nodeset.

Extracting content between two tags with XPath

I've just started working with XPath recently and run into a problem. Here is the code I want to extract from:
<h3>Some Company</h3>
Mainstreet 1234
<br>
98776, Country
<br>
How would I extract the content between the closing h3 and br tag?
Try //h3/following-sibling::text()[following::br]
This could work h3/following-sibling::node()[not(preceding-sibling::br) and not(self::br)] (returns "Mainstreet 1234" for me).
But I'm affraid your real xml and real needs are more complicated than provided sample so it is possible you will need to further adjust it to fit you requirements.
If your code was in the block below:
<par>
<h3>Some Company</h3>
Mainstreet 1234
<br>
98776, Country
</br>
</par>
You will need to tell XPath to give you the text inside every par node that is after an h3 node and before a br node.
In XPath terms this translates to:
//par/text()[preceding::*[name()='h3'] and following::*[name()='br']]
The above would search everywhere in the document for a par node. You can get more specific about the content of the h3 and/or br nodes as well:
//par/text()[preceding::*[name()='h3' and text()='Some Company'] and following::*[name()='br']]
Please let me know if the above does not resolve your problem.

Unable to find Dynamic Xpath

My xpath is :
//*[#id='form_MenuBar:j_id24']/span
and the value # 24 changes.
//*[#id='form_MenuBar:j_id48']/span
I tried but doesn't works.
driver.findElement(By.xpath("//a[contains(#id,'form_MenuBar:j_id$')]/span"));
Source XML:
<li class="ui-menuitem ui-widget ui-corner-all ui-menuitem-active" role="menuitem">
<a id="form_MenuBar:j_id24" class="ui-menuitem-link ui-corner-all ui-state-hover" href="/Demand/j_spring_security_logout">
<span class="ui-menuitem-text">Log off</span>
</a>
</li>
Just try
driver.findElement(By.xpath("//a[contains(#id,'form_MenuBar:j_id')]/span"));
If you are using contains in xpath no need to use '$'.
It appears that you are using java so I'll try to answer it based on that. I'm not a java developer so I apologize if it's not syntactically correct.
If all that is changing is the number within the ID, and you know the ID, you could do:
driver.findElement(By.id(String.format("form_MenuBar:j_id%d", the_id));
Also, I'm not sure about your application that you are testing, but if there are multiple elements that have an id beginning with "form_MenuBar:j_id", then findElement will only find the first one, which might not be the link you are attempting to find.
you could use findElements which will return a list of all elements that match that and then iterate through those until you find the one you really want.

Resources