How to select a row by its values using XPath [duplicate] - xpath

Let's have an example table:
<table>
<tr><td>foo</td><td>bar</td><td>xxx</td></tr>
<tr><td>xxx</td><td>bar</td><td>baz</td></tr>
<tr><td>foo</td><td>bar</td><td>baz</td></tr>
<tr><td>bar</td><td>baz</td><td>foo</td></tr>
<tr><td>foo</td><td>xxx</td><td>baz</td></tr>
</table>
I would like to select the row with the values "foo", "bar", "baz". It's important to select this row by its values and not by an absolute path (the table content will be in a different order each time).

//tr[td/text()='foo' and td/text()='bar' and td/text()='baz']
and, if order is important:
//tr[td[1]/text()='foo' and td[2]/text()='bar' and td[3]/text()='baz']
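To see what each variant matches on the example table, here is a minimal Python sketch. It assumes the standard-library xml.etree.ElementTree module, which implements only a subset of XPath (no "and", no text() tests), so the unordered check is written as chained predicates and the ordered check is done in Python. The helper name cell_texts is made up for illustration.

```python
import xml.etree.ElementTree as ET

TABLE = """
<table>
<tr><td>foo</td><td>bar</td><td>xxx</td></tr>
<tr><td>xxx</td><td>bar</td><td>baz</td></tr>
<tr><td>foo</td><td>bar</td><td>baz</td></tr>
<tr><td>bar</td><td>baz</td><td>foo</td></tr>
<tr><td>foo</td><td>xxx</td><td>baz</td></tr>
</table>
"""

root = ET.fromstring(TABLE.strip())


def cell_texts(tr):
    """Return the text of every <td> in a row (hypothetical helper)."""
    return [td.text for td in tr.findall("td")]


# Any order: each chained predicate requires some <td> child with that text.
unordered = root.findall(".//tr[td='foo'][td='bar'][td='baz']")

# Fixed order: compare the first three cells, mirroring td[1], td[2], td[3].
ordered = [tr for tr in root.iter("tr")
           if cell_texts(tr)[:3] == ["foo", "bar", "baz"]]

print([cell_texts(tr) for tr in unordered])
print([cell_texts(tr) for tr in ordered])
```

On this table the any-order form also matches the "bar baz foo" row, so the positional form is the safer choice when exactly one row is wanted.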

Related

Is it possible to select the properties of a node in XPath?

I have an XML of the form:
<articleslist>
<articles>
<originalId>507948</originalId>
<title>Hogan Lovells Training Contract</title>
<slug>hogan-lovells-training-contract</slug>
<metaTitle>Hogan Lovells Training Contract</metaTitle>
<metaDescription>Find out about the Hogan Lovells Training Contract and Application Process</metaDescription>
<language>en</language>
<disableAds>false</disableAds>
<shortUrl>false</shortUrl>
<category_slug>law</category_slug>
<subcategory_slug>industry</subcategory_slug>
<updatedAt>2021-03-15T18:38:51.058+00:00</updatedAt>
<createdAt>2018-11-29T06:42:51.665+00:00</createdAt>
</articles>
</articleslist>
I'm able to select the row values with the XPATH //articles.
How can I select the child properties of articles (i.e. the column headings), so I get back a list of the form:
originalId
title
slug
etc...
Depends on your XPath version.
In XPath 2.0 it's simply //articles/*/name()
In 1.0 it's not possible because there's no such data type as a "sequence of strings". You would have to return the set of elements as //articles/*, and then extract their names in the calling program.
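For the XPath 1.0 route, extracting the names in the calling program could look like the sketch below. It assumes Python's standard-library xml.etree.ElementTree and a shortened, well-formed copy of the document; other hosts (lxml, Java, .NET) would follow the same two steps.

```python
import xml.etree.ElementTree as ET

XML = """
<articleslist>
<articles>
<originalId>507948</originalId>
<title>Hogan Lovells Training Contract</title>
<slug>hogan-lovells-training-contract</slug>
<language>en</language>
</articles>
</articleslist>
"""

root = ET.fromstring(XML.strip())

# Step 1: select the child elements, the equivalent of //articles/*
children = root.findall(".//articles/*")

# Step 2: read each element's name in the host language.
names = [child.tag for child in children]
print(names)
```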

select element based on class and attribute value

I'm trying to use XPath to select an HTML element based on its text value.
Here is my HTML code:
<span class="yellowbird">Continue</span>
<span class="yellowbird">Stop</span>
I can select the span elements with a specific class value using
//span[contains(@class, 'yellowbird')]
However I'm struggling to select only the element which contains the value "Continue"
This XPath expression will select any span element whose class attribute equals yellowbird and text equals Continue:
//span[@class='yellowbird' and text()='Continue']
Here is the syntax I used to make this work using request.xpath and scrapy
//span[contains(@class, 'yellowbird')][1]//text()='Continue'
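A quick way to check the class-plus-text selection outside Scrapy is sketched below. It assumes the standard-library xml.etree.ElementTree (Python 3.7 or later), which has no "and" or contains(), so the two conditions are chained predicates with an exact class match. The wrapping div is added only to give the snippet a single root.

```python
import xml.etree.ElementTree as ET

HTML = """
<div>
<span class="yellowbird">Continue</span>
<span class="yellowbird">Stop</span>
</div>
"""

root = ET.fromstring(HTML.strip())

# First predicate: class attribute equals 'yellowbird'.
# Second predicate: the element's full text equals 'Continue'.
matches = root.findall(".//span[@class='yellowbird'][.='Continue']")
print([span.text for span in matches])
```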

How to get the raw HTML of an element that was itself found using XPath [duplicate]

I get elements by XPath as follows. This finds all <tr> elements, each of which has some content. How can I get the HTML of a single tr element?
tbody = tbody_element1[0].xpath('.//tbody')
if tbody:
    tr_value = tbody[0].xpath('.//tr')
tr_value is a list of all tr elements inside the tbody element.
To obtain the raw HTML, I use etree.tostring:
from lxml import etree
if tbody:
    tr_value = tbody[0].xpath('.//tr')
    raw_value = etree.tostring(tr_value[0])
Now raw_value holds the HTML of the first tr element, as bytes.
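A self-contained version of the same idea is sketched below. As an assumption, it uses the standard-library xml.etree.ElementTree as a stand-in for lxml, with findall in place of lxml's xpath method; lxml.etree.tostring accepts the same two call forms shown here. The sample table is invented for the demonstration.

```python
import xml.etree.ElementTree as ET

HTML = ("<table><tbody>"
        "<tr><td>a</td><td>b</td></tr>"
        "<tr><td>c</td><td>d</td></tr>"
        "</tbody></table>")

root = ET.fromstring(HTML)
tbody = root.findall(".//tbody")
if tbody:
    tr_value = tbody[0].findall(".//tr")
    # Default call: serialized markup as bytes.
    raw_bytes = ET.tostring(tr_value[0])
    # encoding="unicode": the same markup as a str.
    raw_text = ET.tostring(tr_value[0], encoding="unicode")
    print(raw_bytes)
    print(raw_text)
```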

xpath expression to select attribute value

Is there an xpath way to select a given attribute value?
For example I have an html document and want to select only "?ms=669601" :
<input type="button" value="تفاصيل" onclick="xmlreqGET("?ms=669601","jm1x");">
In your simple example, you could simply select that portion of the onclick attribute in the only input:
substring(input/@onclick, 12, 10)
In more complicated documents, try selecting first by #value (or some other (possibly unique) criteria):
substring(//input[@value='تفاصيل']/@onclick, 12, 10)
Or by targeting the input that contains part of the desired substring:
substring(//input[contains(@onclick, 'xmlreqGET(')]/@onclick, 12, 10)
Selecting the input element itself if its onclick attribute contains the target string:
//input[contains(@onclick, '?ms=669601')]
Note: Your input is not valid XML, due to nested double-quotes.
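The offsets in substring(..., 12, 10) are 1-based, which is easy to get wrong. The sketch below reproduces the same extraction in Python under two assumptions: the attribute is re-quoted with single quotes so the snippet parses as XML, and the standard-library xml.etree.ElementTree is used, which has no substring(), so the slicing is done on the attribute string. The wrapping form element is also an assumption.

```python
import xml.etree.ElementTree as ET

# Assumption: onclick is wrapped in single quotes to make the markup well-formed.
SNIPPET = (
    "<form><input type=\"button\" value=\"تفاصيل\" "
    "onclick='xmlreqGET(\"?ms=669601\",\"jm1x\");'/></form>"
)

root = ET.fromstring(SNIPPET)
button = root.find(".//input[@value='تفاصيل']")
onclick = button.get("onclick")

# XPath substring(s, 12, 10): start at character 12 (1-based), take 10 characters.
start, length = 12, 10
by_offset = onclick[start - 1:start - 1 + length]

# Alternative without hard-coded offsets: take the first double-quoted argument.
by_split = onclick.split('"')[1]

print(by_offset)
print(by_split)
```

The split-based variant keeps working if the query string changes length, which the fixed offsets do not.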

How to get table data from tables using xpath

What XPath query could I use to get the values from the 1st and 3rd <td> of each row in an HTML table?
The XPath query I have used is
/table/tr/td[1]|td[3].
This only returns the values in the first <td> of each row.
EXAMPLE
I would expect to get the values Bob, 19, Jane, 11, Cameron and 32 from the table below, but I am only getting Bob, Jane and Cameron.
<table>
<tr><td>Bob</td><td>Male</td><td>19</td></tr>
<tr><td>Jane</td><td>Female</td><td>11</td></tr>
<tr><td>Cameron</td><td>Male</td><td>32</td></tr>
</table>
@jakenoble's answer:
/table/tr/td[1]|/table/tr/td[3]
is correct.
An equivalent XPath expression that avoids the | (union) operator and may be more efficient is:
/table/tr/td[position() = 1 or position() = 3]
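Try
/table/tr/td[1]|/table/tr/td[3]
Each side of | is a separate path expression, so the /table/tr/ prefix has to be repeated on the right-hand side. I remember doing this in the past and found it rather annoying because it is ugly and long-winded.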
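To check the expected result against the sample table, here is a small Python sketch. It assumes the standard-library xml.etree.ElementTree, which supports positional predicates such as td[1] but not the | operator or "or", so the first and third cells are picked row by row instead.

```python
import xml.etree.ElementTree as ET

TABLE = """
<table>
<tr><td>Bob</td><td>Male</td><td>19</td></tr>
<tr><td>Jane</td><td>Female</td><td>11</td></tr>
<tr><td>Cameron</td><td>Male</td><td>32</td></tr>
</table>
"""

root = ET.fromstring(TABLE.strip())  # root is the <table> element

values = []
for tr in root.findall("./tr"):
    # Positions are 1-based, as in XPath.
    for position in (1, 3):
        td = tr.find(f"./td[{position}]")
        if td is not None:
            values.append(td.text)

print(values)
```

Walking the rows in order keeps the values in document order, which is also what the union expression returns in XPath.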
