How to get any text between an opening and closing node with xpath? - xpath

I want to get the text marked in the example below, but strong[3] only returns "Text5:", which is expected since that is the content of the element itself. How can I get the text that follows it (the airport name section) with XPath?
Code:
<tr>
<td>
<strong>Text1 </strong>Text2
<strong> Text3: </strong>Text4
<strong>Text5:</strong> Text_Text_Text_Text_Text
</td>
</tr>
The part that I need:
Text_Text_Text_Text_Text

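The solution is /tr/td/text()[3]. Note that if the parser keeps the whitespace-only text node between <td> and the first <strong>, the index shifts by one; /tr/td/strong[3]/following-sibling::text()[1] avoids counting text nodes altogether.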
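The text that follows an element can also be read without counting text nodes at all. The following is only a minimal sketch, assuming Python's standard xml.etree.ElementTree module (not mentioned in the thread) and that the markup parses as XML. ElementTree implements just a subset of XPath and has no text() step, so the sketch reads the .tail of the third <strong> instead.

```python
import xml.etree.ElementTree as ET

# Markup from the question, parsed as XML.
snippet = """<tr>
<td>
<strong>Text1 </strong>Text2
<strong> Text3: </strong>Text4
<strong>Text5:</strong> Text_Text_Text_Text_Text
</td>
</tr>"""

row = ET.fromstring(snippet)

# ElementTree's XPath subset has no text() step, so read the text
# that follows each <strong> from its .tail attribute instead.
strongs = row.findall("./td/strong")
wanted = (strongs[2].tail or "").strip()

print(wanted)
```

Reading the tail of a known element does not depend on how many whitespace-only text nodes the parser keeps.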

Related

Xpath: Wildcards for descendant nodes not working

Desired output: 3333
<tbody>
<tr>
<td class="name">
<p class="desc">Intel</p>
</td>
</tr>
Other tr tags
<tr>
<td class="tel">
<p class="desc">3333</p>
</td>
</tr>
</tbody>
I want to select the last tr tag after the tr tag that has "Intel" in the p tag
//tbody//tr[td[p[contains(text(),'Intel')]]]/following-sibling::tr[position()=last()]//p/text()
The above works but I don't wish to reference td and p explicitly. I tried wildcards ? or *, but it doesn't work.
//tbody//tr[?[?[contains(text(),'Intel')]]]/following-sibling::tr[position()=last()]//p/text()
"...which contains a text node equal to 'Intel'"
//tbody/tr[.//text() = 'Intel']/following-sibling::tr[last()]/td/p/text()
"...which contains only the string 'Intel', once you remove all insignificant white-space"
//tbody/tr[normalize-space() = 'Intel']/following-sibling::tr[last()]/td/p/text()
I think the key take-away here is that you can use descendant paths (//) and pay attention to context in predicates once you make them relative (.//).
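To make the logic of the second expression concrete, here is a rough sketch in plain Python. It assumes the standard xml.etree.ElementTree module, which supports neither normalize-space() nor the following-sibling axis, so both are imitated by hand. The helper normalized_text and the middle row with 1111 are made up for illustration; the latter stands in for "Other tr tags".

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("""<tbody>
<tr><td class="name"><p class="desc">Intel</p></td></tr>
<tr><td class="other"><p class="desc">1111</p></td></tr>
<tr><td class="tel"><p class="desc">3333</p></td></tr>
</tbody>""")

def normalized_text(elem):
    # Rough equivalent of XPath normalize-space(): join all descendant
    # text and collapse runs of whitespace.
    return " ".join("".join(elem.itertext()).split())

rows = doc.findall("./tr")
# Index of the first row whose whole text is 'Intel', whatever tags wrap it.
# (The XPath would consider every matching row; this sketch takes the first.)
start = next(i for i, tr in enumerate(rows) if normalized_text(tr) == "Intel")
# Rows after it play the role of following-sibling::tr; [-1] is [last()].
following = rows[start + 1:]
result = normalized_text(following[-1]) if following else None

print(result)
```

As in the XPath, neither td nor p is named anywhere: the test looks at the row's text as a whole.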

Html Agility Pack search all nodes and save them

I need to search a whole website for entries containing "00:00-00:01" and replace that part with "", as shown below.
<td id="tb"> Fr, 3.Sep.2021 00:00-00:01 </td>...<td id="tb"> Fr,3.Sep.2021 </td>
or
<td class="tbda">Fr, 3.Sep.2021 00:00-00:01</td>...<td class="tbda">Fr, 3.Sep.2021 </td>
or
<b>Fr, 3.Sep.2021 00:00-00:01</b>...<b>Fr, 3.Sep.2021</b>
A single one is no problem, but how can I find all of them, and how can I save the path to each one?
One way is to use regex:
re.findall(r'<td\s+id="tb">\s*(\w+,\s*\d+\.\w+\.2021\s+[0-9:]{5}-[0-9:]{5})\s*</td>', text)
But you want more details, how it was found and where. So find all matched tags first, then find all content between them, then save it with an html tag. Like below:
<div>
<tr> # this is the start tag
<td id="tb">Fr, 3.Sep.2021 00:00-00:01</td> # this is the content
</tr> # this is the end tag
... more tr ...
</div>
The idea can be found in "How to convert an XML file to nice pandas dataframe?".
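The regex approach can be extended to cover all three tag variants from the question, record where each hit was found, and strip the time range in one pass. This is only a sketch using Python's re module; the variable names and the choice to record tag, attributes, and character offset are assumptions, and a regex like this is fragile on real-world HTML (nested tags, different attribute quoting).

```python
import re

html = (
    '<td id="tb"> Fr, 3.Sep.2021 00:00-00:01 </td>'
    '<td class="tbda">Fr, 3.Sep.2021 00:00-00:01</td>'
    '<b>Fr, 3.Sep.2021 00:00-00:01</b>'
)

# Opening tag (td or b, any attributes), the date, then the time range to drop.
pattern = re.compile(
    r'<(?P<tag>td|b)\b(?P<attrs>[^>]*)>\s*'
    r'(?P<date>\w+,\s*\d+\.\w+\.\d{4})\s+00:00-00:01\s*'
    r'</(?P=tag)>'
)

# Record where each hit was found: tag name, attributes, and character offset.
hits = [(m.group('tag'), m.group('attrs').strip(), m.start())
        for m in pattern.finditer(html)]

# Rebuild each element without the time range.
cleaned = pattern.sub(
    lambda m: f"<{m.group('tag')}{m.group('attrs')}>{m.group('date')}</{m.group('tag')}>",
    html,
)

print(hits)
print(cleaned)
```

The hits list answers the "where was it found" part, while cleaned holds the markup with every "00:00-00:01" removed.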

XPath extract value within attribute

This is my HTML code so far:
<tr valign="top">
<td nowrap="x">Citation(s)</td>
<td>
<span class="pubmed_id" id="26472973">
26472973
</span>
</td>
</tr>
I would like to extract the number 26472973, which is a value that changes for each entry in the database.
It is unclear whether you want the value of the attribute @id or the text of the following a element.
So, for the attribute value, try this XPath:
//tr[@valign='top']/td/span[@class='pubmed_id']/@id
Or, for the text of the a element, use this XPath:
//tr[@valign='top']/td/span[@class='pubmed_id']/a/text()
In both cases the result is 26472973.
In case you just want the 'citations', here is another try:
//tr/td[text()='Citation(s)']/following-sibling::td/span/@id
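For a quick check of the attribute lookup, here is a small sketch with Python's standard xml.etree.ElementTree, which is an assumption since the thread names no library. ElementTree understands attribute predicates such as [@class='pubmed_id'] but cannot end a path with /@id, so the attribute is read with .get() after the element has been selected. The markup is the one shown in the question, which contains no a element, so the sketch reads the span's own text.

```python
import xml.etree.ElementTree as ET

row = ET.fromstring("""<tr valign="top">
<td nowrap="x">Citation(s)</td>
<td>
<span class="pubmed_id" id="26472973">
26472973
</span>
</td>
</tr>""")

# ElementTree cannot end a path with /@id, so select the element with
# an attribute predicate and read the attribute afterwards.
span = row.find("./td/span[@class='pubmed_id']")
pubmed_id = span.get("id")
# The element's own text, with the surrounding whitespace removed.
pubmed_text = span.text.strip()

print(pubmed_id, pubmed_text)
```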

XPath: Getting a node by attribute value of subnode

People, could you please help me with this XPath? Let's say I have the following HTML code:
<table>
<tr>
<td class="clickable">text</td>
<td>value1</td>
</tr>
<tr>
<td>value2</td>
<td>text</td>
</tr>
</table>
I need to build an XPath that will pick each <tr> that has a <td> with the value text AND the attribute class equal to clickable.
I tried the following xpath:
//tr[contains(.,'text')][contains(./td/@class,'clickable')]
//tr[contains(.,'text')][contains(td/@class,'clickable')]
but none of those worked
Any help is appreciated
Thanks
You are almost there:
//tr[contains(td/@class,'clickable') and contains(td, 'text')]
Demo using xmllint:
$ xmllint input.xml --xpath "//tr[contains(td/@class,'clickable') and contains(td, 'text')]"
<tr>
<td class="clickable">text</td>
<td>value1</td>
</tr>
If you want a tr that has a td with the value text and a td (possibly a different one) with the attribute class equal to clickable, use the answer of @alecxe.
If it must be a single td that satisfies both conditions, then use:
//tr[td[.='text' and @class='clickable']]
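The "one td, two conditions" variant can be tried out with Python's standard xml.etree.ElementTree, which is an assumption since the thread only shows xmllint. ElementTree's XPath subset has no "and" operator, so the sketch chains two predicates on the same td, which has the same effect here; the [.='text'] predicate requires Python 3.7 or newer.

```python
import xml.etree.ElementTree as ET

table = ET.fromstring("""<table>
<tr><td class="clickable">text</td><td>value1</td></tr>
<tr><td>value2</td><td>text</td></tr>
</table>""")

# Keep rows that own a single <td> meeting both conditions at once.
# Chained predicates act like "and"; [.='text'] needs Python 3.7 or newer.
rows = [tr for tr in table.findall("./tr")
        if tr.find("td[@class='clickable'][.='text']") is not None]

print([[td.text for td in tr] for tr in rows])
```

Only the first row is kept: the second row also contains a td with the value text, but that td has no class attribute.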

Finding first matching sibling element while traversing the DOM

I am trying to create an xpath expression that will find the first matching sibling 'down' the dom given an initial sibling (note: initial siblings will be Tom and Steve). For example, I want to find 'jerry1' under the 'Tom' tr. I have looked into the following-sibling argument, but I'm not sure that's the best approach for this? Any ideas?
<tr>
<a title="Tom"/>
</tr>
<tr>
<a title="jerry1"/>
</tr>
<tr>
<a title="jerry2"/>
</tr>
<tr>
<a title="jerry3"/>
</tr>
<tr>
<a title="Steve"/>
</tr>
<tr>
<a title="jerry1"/>
</tr>
<tr>
<a title="jerry2"/>
</tr>
<tr>
<a title="jerry3"/>
</tr>
following-sibling will work. This will select the a node with the title "jerry1":
//a[@title='Tom']/../following-sibling::tr[1]/a
The /.. traverses up to Tom's parent <tr>, then following-sibling::tr[1] moves to the next <tr>, and finally /a selects the <a> node within it. Without the [1], the expression would return the <a> nodes of all following rows.
The following XPath worked for me:
(//a[@title='Tom']/parent::*/following-sibling::tr/a[@title='jerry1'])[1]
It returns the first a with the title jerry1 that follows a tr with an a child titled Tom.
Starting at a[@title='Tom'], it goes to the parent tr with /parent::*, then selects all following sibling tr nodes with /following-sibling::tr, and from those the child nodes matching /a[@title='jerry1']. Because this would select two jerry1 nodes and only the first jerry1 following Tom is wanted, the XPath is wrapped in () and the first match is chosen with [1].
The following XPath statement finds the first tr element that has an a with the @title "jerry1" and that is a following sibling of the tr element that has an a with the @title "Tom":
//tr[a/@title='Tom']/following-sibling::tr[a/@title='jerry1'][1]
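The "first matching following sibling" idea can be spelled out in plain Python as well. The sketch below assumes the standard xml.etree.ElementTree module, which has no following-sibling axis, so the walk over the sibling rows is written by hand. The helper first_following and the <table> wrapper are made up for illustration; the wrapper is assumed so that all rows share one parent.

```python
import xml.etree.ElementTree as ET

titles = ["Tom", "jerry1", "jerry2", "jerry3", "Steve", "jerry1", "jerry2", "jerry3"]
# A <table> wrapper is assumed so that the rows share one parent.
table = ET.fromstring(
    "<table>" + "".join(f'<tr><a title="{t}"/></tr>' for t in titles) + "</table>"
)

def first_following(rows, start_title, wanted_title):
    """Return the first row after the start_title row that holds wanted_title."""
    seen_start = False
    for tr in rows:
        if seen_start and tr.find(f"a[@title='{wanted_title}']") is not None:
            return tr
        if tr.find(f"a[@title='{start_title}']") is not None:
            seen_start = True
    return None

rows = table.findall("./tr")
match = first_following(rows, "Steve", "jerry1")
print(rows.index(match))
```

Starting from Tom instead of Steve returns the row at index 1, which mirrors the [1] on the following-sibling step.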
