Let's have an example table:
<table>
<tr><td>foo</td><td>bar</td><td>xxx</td></tr>
<tr><td>xxx</td><td>bar</td><td>baz</td></tr>
<tr><td>foo</td><td>bar</td><td>baz</td></tr>
<tr><td>bar</td><td>baz</td><td>foo</td></tr>
<tr><td>foo</td><td>xxx</td><td>baz</td></tr>
</table>
I would like to select the row with the values "foo", "bar", "baz". It's important to select this row by its values rather than by an absolute path, because the table content will be in a different order each time.
//tr[td/text()='foo' and td/text()='bar' and td/text()='baz']
and, if order is important:
//tr[td[1]/text()='foo' and td[2]/text()='bar' and td[3]/text()='baz']
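To see how the two expressions differ on the sample table, here is a minimal sketch using Python's standard-library xml.etree.ElementTree (an assumption, since the question does not name a tool). ElementTree implements only a subset of XPath, with no `and` or `text()` inside predicates, so the any-order match is written as chained predicates and the fixed-order match is checked in Python. All variable names are illustrative.

```python
import xml.etree.ElementTree as ET

TABLE = """<table>
<tr><td>foo</td><td>bar</td><td>xxx</td></tr>
<tr><td>xxx</td><td>bar</td><td>baz</td></tr>
<tr><td>foo</td><td>bar</td><td>baz</td></tr>
<tr><td>bar</td><td>baz</td><td>foo</td></tr>
<tr><td>foo</td><td>xxx</td><td>baz</td></tr>
</table>"""

table = ET.fromstring(TABLE)
rows = table.findall("tr")

# Any order: chained predicates stand in for the `and` of full XPath.
unordered = table.findall("tr[td='foo'][td='bar'][td='baz']")

# Fixed order: compare the cell texts of each row against the wanted list.
wanted = ["foo", "bar", "baz"]
ordered = [tr for tr in rows if [td.text for td in tr.findall("td")] == wanted]

print([rows.index(tr) + 1 for tr in unordered])  # [3, 4]
print([rows.index(tr) + 1 for tr in ordered])    # [3]
```

Row 4 holds the same three values in a different order, which is why only the position-based test singles out row 3.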
I have an XML of the form:
<articleslist>
<articles>
<originalId>507948</originalId>
<title>Hogan Lovells Training Contract</title>
<slug>hogan-lovells-training-contract</slug>
<metaTitle>Hogan Lovells Training Contract</metaTitle>
<metaDescription>Find out about the Hogan Lovells Training Contract and Application Process</metaDescription>
<language>en</language>
<disableAds>false</disableAds>
<shortUrl>false</shortUrl>
<category_slug>law</category_slug>
<subcategory_slug>industry</subcategory_slug>
<updatedAt>2021-03-15T18:38:51.058+00:00</updatedAt>
<createdAt>2018-11-29T06:42:51.665+00:00</createdAt>
</articles>
</articleslist>
I'm able to select the row values with the XPath //articles.
How can I select the child elements of articles (i.e. the column headings), so that I get back a list of the form:
originalId
title
slug
etc...
Depends on your XPath version.
In XPath 2.0 it's simply //articles/*/name()
In 1.0 it's not possible because there's no such data type as a "sequence of strings". You would have to return the set of elements as //articles/*, and then extract their names in the calling program.
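The XPath 1.0 route, selecting the elements and reading their names in the calling program, could look like the following sketch. It assumes Python's standard-library xml.etree.ElementTree as the calling program (the question does not say which one is used) and a shortened copy of the sample document.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample document from the question.
XML = """<articleslist>
<articles>
<originalId>507948</originalId>
<title>Hogan Lovells Training Contract</title>
<slug>hogan-lovells-training-contract</slug>
<language>en</language>
</articles>
</articleslist>"""

root = ET.fromstring(XML)

# Select the child elements with a path, then read each tag name in Python.
names = [child.tag for child in root.findall("./articles/*")]
print(names)  # ['originalId', 'title', 'slug', 'language']
```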
I'm trying to use XPath to select an HTML tag based on its value.
Here is my HTML code:
<span class="yellowbird">Continue</span>
<span class="yellowbird">Stop</span>
I can select the span elements with a specific class value using
//span[contains(@class, 'yellowbird')]
However I'm struggling to select only the element which contains the value "Continue"
This XPath expression will select any span element whose class attribute equals yellowbird and text equals Continue:
//span[@class='yellowbird' and text()='Continue']
Here is the syntax I used to make this work using request.xpath and scrapy
//span[contains(@class, 'yellowbird')][1]//text()='Continue'
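For a quick check of the class-plus-text selection outside Scrapy, here is a minimal sketch with Python's standard-library xml.etree.ElementTree (an assumption; it is not the tool used in the thread). Its XPath subset has no `and`, so the two conditions are chained as separate predicates, and the `[.='text']` predicate needs Python 3.7 or later. The wrapping `<div>` is added only so the snippet parses as a single document.

```python
import xml.etree.ElementTree as ET

# The <div> wrapper is not in the original HTML; it provides a single root.
HTML = """<div>
<span class="yellowbird">Continue</span>
<span class="yellowbird">Stop</span>
</div>"""

root = ET.fromstring(HTML)

# First predicate: exact class match. Second predicate: exact text match.
matches = root.findall("span[@class='yellowbird'][.='Continue']")
print([span.text for span in matches])  # ['Continue']
```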
I get elements by XPath as follows. This finds all <tr> tags, and the <tr> elements have some content. How can I get the HTML of a single tr element?
tbody = tbody_element1[0].xpath('.//tbody')
if tbody:
    tr_value = tbody[0].xpath('.//tr')
tr_value is a list of all tr elements inside the tbody element.
To obtain the raw HTML I use etree.tostring:
from lxml import etree

if tbody:
    tr_value = tbody[0].xpath('.//tr')
    raw_value = etree.tostring(tr_value[0])
Now raw_value holds the HTML content of the first tr element (as bytes).
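The same pattern can be tried without lxml. The sketch below swaps in Python's standard-library xml.etree.ElementTree, which also offers a tostring function but uses findall instead of lxml's xpath method. The sample markup is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up sample markup; the original question does not show its HTML.
HTML = ("<table><tbody>"
        "<tr><td>Bob</td><td>19</td></tr>"
        "<tr><td>Jane</td><td>11</td></tr>"
        "</tbody></table>")

root = ET.fromstring(HTML)
tbody = root.findall(".//tbody")
if tbody:
    tr_value = tbody[0].findall(".//tr")
    # encoding="unicode" returns a str instead of bytes.
    raw_value = ET.tostring(tr_value[0], encoding="unicode")
    print(raw_value)  # <tr><td>Bob</td><td>19</td></tr>
```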
Is there an xpath way to select a given attribute value?
For example I have an html document and want to select only "?ms=669601" :
<input type="button" value="تفاصيل" onclick="xmlreqGET("?ms=669601","jm1x");">
In your simple example, you could simply select that portion of the onclick attribute in the only input:
substring(input/@onclick, 12, 10)
In more complicated documents, try selecting first by @value (or some other, possibly unique, criterion):
substring(//input[@value='تفاصيل']/@onclick, 12, 10)
Or by targeting the input that contains part of the desired substring:
substring(//input[contains(@onclick, 'xmlreqGET(')]/@onclick, 12, 10)
Selecting the input element itself if its onclick attribute contains the target string:
//input[contains(@onclick, '?ms=669601')]
Note: Your input is not valid XML, due to nested double-quotes.
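When the attribute is read from a host language instead, the substring step can be done there. The sketch below assumes Python's standard-library xml.etree.ElementTree, a wrapping `<form>` element, and a single-quoted onclick attribute so that the nested double quotes parse; none of these come from the original markup. It shows that the 1-based `substring(s, 12, 10)` corresponds to the 0-based slice `s[11:21]`, plus a regular expression that does not depend on fixed positions.

```python
import re
import xml.etree.ElementTree as ET

# Single quotes around onclick keep the nested double quotes well-formed.
HTML = """<form><input type="button" value="تفاصيل" onclick='xmlreqGET("?ms=669601","jm1x");'/></form>"""

root = ET.fromstring(HTML)
onclick = root.find("input[@value='تفاصيل']").get("onclick")

# XPath substring(s, 12, 10) counts from 1; the Python slice counts from 0.
by_position = onclick[11:21]

# Less brittle: take the first double-quoted argument, whatever its length.
by_pattern = re.search(r'"([^"]*)"', onclick).group(1)

print(by_position)  # ?ms=669601
print(by_pattern)   # ?ms=669601
```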
What XPath query could I use to get the values from the 1st and 3rd <td> tags of each row in an HTML table?
The XPath query I have used is
/table/tr/td[1]|td[3]
This only returns the values in the first <td> tag for each row in a table.
EXAMPLE
I would expect to get the values Bob, 19, Jane, 11, Cameron and 32 from the table below, but I am only getting Bob, Jane and Cameron.
<table>
<tr><td>Bob</td><td>Male</td><td>19</td></tr>
<tr><td>Jane</td><td>Female</td><td>11</td></tr>
<tr><td>Cameron</td><td>Male</td><td>32</td></tr>
</table>
@jakenoble's answer:
/table/tr/td[1]|/table/tr/td[3]
is correct.
An equivalent XPath expression that avoids the | (union) operator and may be more efficient is:
/table/tr/td[position() = 1 or position() = 3]
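The expected output can be reproduced with a short sketch. It assumes Python's standard-library xml.etree.ElementTree, whose XPath subset has neither the `|` operator nor `position()`, so the first and third cells are picked by list index per row instead.

```python
import xml.etree.ElementTree as ET

TABLE = """<table>
<tr><td>Bob</td><td>Male</td><td>19</td></tr>
<tr><td>Jane</td><td>Female</td><td>11</td></tr>
<tr><td>Cameron</td><td>Male</td><td>32</td></tr>
</table>"""

table = ET.fromstring(TABLE)

values = []
for tr in table.findall("tr"):
    cells = tr.findall("td")
    # XPath td[1] and td[3] are 1-based; the Python list is 0-based.
    values.extend([cells[0].text, cells[2].text])

print(values)  # ['Bob', '19', 'Jane', '11', 'Cameron', '32']
```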
Try
/table/tr/td[1]|/table/tr/td[3]
I remember doing this in the past and found it rather annoying because it is ugly and long-winded