how to find the preceding sibling of a link - xpath

I have the following HTML that I am trying to analyse using XPath:
<table>
<tr>
<td>Name</td>
<td>Info</td>
<td>Download</td>
</tr>
<tr>
<td>Name2</td>
<td>Info</td>
<td>Download</td>
</tr>
....
<tr>
..
</tr>
</table>
I have the following xpath to grab the download links
$xpath->query("//a[text()='Download']/@href");
What I am trying to figure out is the query to send to grab the Name of each of the downloads.
The page has no div id markups at all, just plain table, tr, td tags.
I have tried something like
$xpath->query("//preceding-sibling::a[text()='Download']");
Does anyone have any idea on this?

Close!
Given a particular context node (here, the href attribute for a download), you want to find the eldest sibling of the td containing the context node. So your relative path should first ascend to the td and then find the oldest sibling:
parent::a/parent::td/preceding-sibling::td[last()]
or more briefly (and without assuming that there are no elements like p or span intervening between the td and the a):
ancestor::td[1]/preceding-sibling::td[last()]
Some users find the reverse numbering of nodes on the preceding-sibling axis confusing, so it may feel simpler to say that what you really want is the first td child of the smallest containing tr:
ancestor::tr[1]/child::td[1]
If you need in a single pass to pick up all the download links and the textual label for them, then how you do it depends on the context in which you're using XPath. In XSLT, for example, you might write:
<xsl:apply-templates select="//a[text()='Download']/@href"/>
and then fetch the label in the appropriate template:
<xsl:template match="a/@href">
<xsl:value-of select="string(ancestor::tr[1]/td[1])"/>
:
<xsl:value-of select="."/>
</xsl:template>
In other host languages, you will want to do something similar. The key problem is that you have to iterate over the nodes matching your expression for href, and then for each of those nodes you need to move back in the document to pick up the label. How you say "evaluate this second XPath expression based on the current node from the first XPath expression" will vary with your environment.
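As one illustration of that iterate-then-look-back pattern, here is a minimal Python sketch using only the standard library. Its xml.etree.ElementTree module implements only a subset of XPath (no ancestor:: or preceding-sibling:: axes), so the sketch builds a child-to-parent map and climbs by hand. The sample markup, including the `<a href>` links and the file1.zip / file2.zip values, is assumed for illustration and is not taken from the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the <a> elements and href values are invented.
HTML = """
<table>
  <tr><td>Name</td><td>Info</td><td><a href="file1.zip">Download</a></td></tr>
  <tr><td>Name2</td><td>Info</td><td><p><a href="file2.zip">Download</a></p></td></tr>
</table>
"""

def download_labels(markup):
    """Return (label, href) pairs, one per 'Download' link."""
    root = ET.fromstring(markup)
    # ElementTree elements have no parent pointer, so build a lookup table.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    pairs = []
    # First expression: every <a> whose text is exactly 'Download'.
    for link in root.findall(".//a[.='Download']"):
        # Second step, relative to the current node: climb to the nearest
        # enclosing <tr> (the hand-written equivalent of ancestor::tr[1]).
        node = link
        while node is not None and node.tag != "tr":
            node = parent_of.get(node)
        if node is None:
            continue  # this link is not inside a table row
        # Then take the row's first <td> child (the equivalent of td[1]).
        first_cell = node.find("td")
        if first_cell is not None:
            pairs.append((first_cell.text, link.get("href")))
    return pairs

print(download_labels(HTML))
```

With a library that supports full XPath 1.0, the loop body could instead evaluate ancestor::tr[1]/td[1] directly against each link node.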

Related

Choosing multiple elements by using a square bracket at the end vs enclosing the whole xpath in braces before sq brackets?

I've come across cases when there are lots of links on the page and the following xpath works for choosing the first one:
//tag[@...]/div/a[1]
There are other cases when the above xpath doesn't work and then I need to use it the following way:
(//tag[@...]/div/a)[1]
As I write lengthier xpaths to code in business logic for which elements to select, this difference starts getting all the more complicated where the same xpath has multiple combinations of both of these.
What is the difference exactly between writing xpaths in these two ways? I've seen that for any particular occasion one of them works and the other doesn't.
Consider this sample HTML:
<table>
<tbody>
<tr>
<td>1.1</td>
<td>1.2</td>
<td>1.3</td>
</tr>
<tr>
<td>2.1</td>
<td>2.2</td>
<td>2.3</td>
</tr>
<tr>
<td>3.1</td>
<td>3.2</td>
<td>3.3</td>
</tr>
</tbody>
</table>
Here //table/tbody/tr/td[index] applies the index separately inside each row <tr>, so it matches the index-th <td> of every row, and the valid indexes are 1, 2, 3. A call that returns only the first match (for example Selenium's findElement) will then hand back the cell from the first row, which is why it can look as if only the first row is searched.
But you can use (//table/tbody/tr/td)[index] if you want to index into all the <td> elements in the table. Here the parentheses collect every <td> in document order first, and the index is then applied to that whole set. So valid indexes are 1, 2, 3, ... 9.
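A quick way to see how the two forms differ is the Python sketch below, run against the sample table. It uses the standard-library xml.etree.ElementTree, which supports positional predicates but not parenthesised expressions, so indexing the full result list stands in for (...)[index]; that substitution and the two helper names are assumptions of this sketch.

```python
import xml.etree.ElementTree as ET

TABLE = """
<table><tbody>
  <tr><td>1.1</td><td>1.2</td><td>1.3</td></tr>
  <tr><td>2.1</td><td>2.2</td><td>2.3</td></tr>
  <tr><td>3.1</td><td>3.2</td><td>3.3</td></tr>
</tbody></table>
"""

root = ET.fromstring(TABLE)  # root is the <table> element

def nth_cell_per_row(index):
    """Like tbody/tr/td[index]: the predicate is evaluated inside each <tr>."""
    return [td.text for td in root.findall(f"tbody/tr/td[{index}]")]

def nth_cell_overall(index):
    """Like (tbody/tr/td)[index]: gather all cells first, then index the set."""
    cells = root.findall("tbody/tr/td")
    return cells[index - 1].text  # XPath positions are 1-based

print(nth_cell_per_row(2))   # ['1.2', '2.2', '3.2']
print(nth_cell_per_row(4))   # [] because no row has a fourth cell
print(nth_cell_overall(4))   # 2.1
```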

Xpath - Selecting attributes using starts-with

I am trying to write an xpath expression that selects all div tags that have an attribute id that start with CompanyCalendar. Below is a snippet of the HTML that I am looking at:
<td class="some class" align="center" onclick="Calendar_DayClicked(this,'EventCont','Event');">
<span class="Text"></span>
<div id="CompanyCalendar02.21" class="Pop CalendarClick" style="right: 200px; top: 235px;"></div>
There are multiple divs that have an id like CompanyCalendar02.21 but for each new month in the calendar, they change the id. For example, the next month would be CompanyCalendar02.22. I would like to be able to select all of the divs that are equal to CompanyCalendar*
I am rather new at this so I was using some example off the net to try and get my xpath expression to work but to no avail. Any help would be greatly appreciated.
I am trying to write an xpath expression that selects all div tags that have an attribute id that start with CompanyCalendar.
The following expression is perhaps what you are looking for:
//div[starts-with(@id,'CompanyCalendar')]
What it does, in plain English, is
Return all div elements in the XML document that have an attribute id whose attribute value starts with "CompanyCalendar".
When checking this in the browser console with $x(), it worked only after flipping the quotes, i.e. using double quotes inside the starts-with() call, since the expression itself is wrapped in single quotes:
$x('//div[starts-with(@id,"CompanyCalendar")]')
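For hosts whose XPath engine lacks starts-with(), the same filter can be approximated in code. The Python sketch below is one such workaround, assuming the standard-library xml.etree.ElementTree (which does not implement XPath functions): the attribute test stays in the path and the prefix test moves into Python. The second and third div in the sample are invented for the example.

```python
import xml.etree.ElementTree as ET

# The first <div> comes from the question; the other two are hypothetical.
SNIPPET = """
<td class="some class" align="center">
  <span class="Text"></span>
  <div id="CompanyCalendar02.21" class="Pop CalendarClick"></div>
  <div id="CompanyCalendar02.22" class="Pop CalendarClick"></div>
  <div id="OtherWidget01" class="Pop"></div>
</td>
"""

def div_ids_with_prefix(markup, prefix):
    """Approximate //div[starts-with(@id, prefix)] and return the ids."""
    root = ET.fromstring(markup)
    # .//div[@id] is inside ElementTree's XPath subset: divs that have an id.
    candidates = root.findall(".//div[@id]")
    # The starts-with() test itself is done with str.startswith.
    return [div.get("id") for div in candidates
            if div.get("id").startswith(prefix)]

print(div_ids_with_prefix(SNIPPET, "CompanyCalendar"))
```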

Selenium WebDriver and xpath: a more complicated selection

So let's say my structure looks like this at some point:
..........
<td>
[...]
<input value="abcabc">
[...]
</td>
[...]
<td></td>
[...]
<td>
<input id="booboobooboo01">
<div></div> <=======I want to click this!
</td>
.........
I need to click that div, but I need to be sure it's on the same line as the td containing the input with value="abcabc". I also know that the div I need to click (which doesn't have id or any other relevant attribute I can use) is in a td at the same level as the first td, right after the input with id CONTAINING "boo" (dynamically generated, I only know the root part of the id). td's contain nothing relevant I can use.
This is what I tried as far as xpath goes:
//input[@value='abcabc']/../td/input[contains(@id,'boo')]/following-sibling::div
//input[@value='abcabc']/..//td/input[contains(@id,'boo')]/following-sibling::div
None of them worked, of course (element cannot be found).
I want to know if there's a way of selecting that div and how.
EDIT: //input[@value='abcabc']/../../td/input[contains(@id,'boo')]/following-sibling::div is the correct way. This was suggested by the person with the accepted answer. Also note that he offered a slightly different way of doing it. See his answer for details.
Try
//input[@value='abcabc']/ancestor::tr[1]/td/input[contains(@id,'boo')]/following-sibling::div[1]
Note that //input[@value='abcabc']/.. only goes up to the parent <td>, which is why yours did not work.
Another XPath that may work, is a bit more simple:
//input[@id='booboobooboo01']/../div[1]
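To make the steps of the ancestor::tr[1] expression concrete, here is a rough Python sketch that performs the same walk by hand with the standard-library xml.etree.ElementTree (which has no ancestor:: or following-sibling:: axes). It is not Selenium code; the surrounding table, the second row and the div text are assumptions added so the result can be checked.

```python
import xml.etree.ElementTree as ET

# Hypothetical rows built around the structure described in the question.
ROWS = """
<table>
  <tr>
    <td><input value="abcabc"/></td>
    <td></td>
    <td><input id="booboobooboo01"/><div>target</div></td>
  </tr>
  <tr>
    <td><input value="other"/></td>
    <td><input id="booboobooboo02"/><div>wrong row</div></td>
  </tr>
</table>
"""

def find_div(markup, value, id_part):
    """Return the <div> after the matching input in the same row, or None."""
    root = ET.fromstring(markup)
    parent_of = {child: parent for parent in root.iter() for child in parent}
    for anchor in root.findall(f".//input[@value='{value}']"):
        # ancestor::tr[1]: climb until the enclosing row is reached.
        row = anchor
        while row is not None and row.tag != "tr":
            row = parent_of.get(row)
        if row is None:
            continue
        # td/input[contains(@id, id_part)]/following-sibling::div[1]
        for cell in row.findall("td"):
            children = list(cell)
            for i, child in enumerate(children):
                if child.tag == "input" and id_part in child.get("id", ""):
                    for sibling in children[i + 1:]:
                        if sibling.tag == "div":
                            return sibling
    return None

print(find_div(ROWS, "abcabc", "boo").text)
```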

XPath selection while excluding elements having certain attribute values

My first post here - it's a great site and I will certainly do my best to give back as much as I can.
I have seen different manifestations of this following question; however my attempts to resolve don't appear to work.
Consider this simple tree:
<root>
<div>
<p>hello</p>
<p>hello2</p>
<p><span class="bad">hello3</span></p>
</div>
</root>
I would like to come up with an XPath expression that will select all child nodes of "div", except for elements that have their "class" attribute equal to "bad".
Here is what I have tried:
/root/div/node()[not (@class='bad')]
... However this doesn't seem to work.
What am I missing here?
Cheers,
Isaac
When testing your XPath here with the provided XML document, the XPath seems to be indeed selecting all child nodes that do not have an attribute class="bad" - these are all the <p> elements in the document.
You will note that the only child node that has such an attribute is the <span>, which indeed does not get selected.
Are you expecting the p node surrounding your span not to be selected?
I have been working with XPath in a Java program I'm writing. If you want to select every element under the div that doesn't have class="bad" (i.e. drop the <span> itself, but keep the surrounding <p> nodes), you could use:
/root/div/descendant::*[not (@class='bad')]
Otherwise, if you want the parents of the child elements that don't have class='bad', you can use something like the following (note that a <p> with no child elements at all will not match):
/root/div/p/*[not (@class='bad')]/..
the .. part selects the immediate parent node.
The identity transform just matches and copies everything:
<xsl:template match="@*|node()" >
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
But you add a null transform that more specifically matches the pattern you want to exclude:
<xsl:template match="span[@class='bad']" />
(You can also add a priority attribute if you want to be more explicit about which one has precedence.)
Welcome to SO, Isaac!
I'd try this:
/root/div/*[./*[@class != "bad"]]
this ought to select all child elements (*) of the div element that do not have a descendant element with a class attribute that equals bad.
Edit:
As per @Alejandro's comment:
/root/div/*[not(*/@class = "bad")]
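The two readings discussed in this thread (filter on the child's own class versus filter on what the child contains) can be compared with a short Python sketch. It assumes the standard-library xml.etree.ElementTree, which has no not() function, so both predicates are written as Python conditions; it also only sees element children, not the text nodes that node() would include.

```python
import xml.etree.ElementTree as ET

TREE = """
<root>
  <div>
    <p>hello</p>
    <p>hello2</p>
    <p><span class="bad">hello3</span></p>
  </div>
</root>
"""

root = ET.fromstring(TREE)
children = root.findall("div/*")  # the three <p> elements

# Like /root/div/*[not(@class='bad')]: tests the class of each <p> itself.
# No <p> carries the attribute, so all three are kept.
own_class_ok = [p for p in children if p.get("class") != "bad"]

# Like /root/div/*[not(*/@class = "bad")]: drops a <p> when any of its
# child elements has class="bad".
no_bad_child = [p for p in children
                if not any(c.get("class") == "bad" for c in p)]

print(len(own_class_ok))              # 3
print([p.text for p in no_bad_child]) # ['hello', 'hello2']
```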

Xpath query to find elements which contain a certain descendant

I'm using Html Agility Pack to run xpath queries on a web page. I want to find the rows in a table which contain a certain interesting element. In the example below, I want to fetch the second row.
<table name="important">
<tr>
<td>Stuff I'm NOT interested in</td>
</tr>
<tr>
<td>Stuff I'm interested in</td>
<td><interestingtag/></td>
<td>More stuff I'm interested in</td>
</tr>
<tr>
<td>Stuff I'm NOT interested in</td>
</tr>
<tr>
<td>Stuff I'm NOT interested in</td>
</tr>
</table>
I'm looking to do something like this:
//table[@name='important']/tr[has a descendant named interestingtag]
Except with valid xpath syntax. ;-)
I suppose I could just find the interesting element itself and then work my way up the parent chain from the node that's returned, but it seemed like there ought to be a way to do this in one step and I'm just being dense.
"has a descendant named interestingtag" is spelled .//interestingtag in XPath, so the expression you are looking for is:
//table[@name='important']/tr[.//interestingtag]
Actually, you need to look for a descendant, not a child:
//table[@name='important']/tr[descendant::interestingtag]
I know this isn't what the OP was asking, but if you wanted to find an element that had a descendant with a particular attribute, you could do something like this:
//table[@name='important']/tr[.//*[@attr='value']]
I know it is a late answer, but why not go the other way around? Find all the <interestingtag/> elements and then select their ancestor <tr>:
//interestingtag/ancestor::tr
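For comparison, here is a small Python sketch of the tr[.//interestingtag] idea, using the standard-library xml.etree.ElementTree rather than Html Agility Pack. ElementTree predicates cannot contain a nested path such as .//interestingtag, so the descendant test is done with a second find() call on each row. The `<body>` wrapper and the helper name are assumptions of this sketch.

```python
import xml.etree.ElementTree as ET

PAGE = """
<body>
  <table name="important">
    <tr><td>Stuff I'm NOT interested in</td></tr>
    <tr>
      <td>Stuff I'm interested in</td>
      <td><interestingtag/></td>
      <td>More stuff I'm interested in</td>
    </tr>
    <tr><td>Stuff I'm NOT interested in</td></tr>
    <tr><td>Stuff I'm NOT interested in</td></tr>
  </table>
</body>
"""

def rows_containing(markup, tag):
    """Return 1-based positions of rows that have a descendant named tag."""
    root = ET.fromstring(markup)
    rows = root.findall(".//table[@name='important']/tr")
    # tr.find('.//tag') plays the role of the [.//tag] predicate: it returns
    # None when the row has no such descendant.
    return [pos for pos, tr in enumerate(rows, start=1)
            if tr.find(f".//{tag}") is not None]

print(rows_containing(PAGE, "interestingtag"))  # [2]
```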

Resources