How do I select sets of nodes with a single XPath query? - xpath

I'm trying to extract journey and price information from my favorite airline.
I have a search results page that looks like this:
MASwings search results http://img28.imagevenue.com/aAfkjfp01fo1i-2846/loc29/42467_dayview_oneway_122_29lo.jpg
EDIT: Image host might have blocked the hotlink. See the image on this page: http://img28.imagevenue.com/img.php?image=42467_dayview_oneway_122_29lo.jpg
Repro URL for booking query
I can select each row that represents a flight using this XPath selector:
//*[@class="servicecode "]/ancestor::tr[1]
But each flight row is not an independent journey; the flights are really grouped into legs, and these are what I want to select.
The row class alternates for each new leg: the rows of the first leg have class "datarow", and the rows of the next leg have "datarow alt". In Python I can group the nodes selected by the above expression using itertools.groupby, but if there is a way to achieve this purely in XPath, I would prefer it.
An extension to this question: my selector selects all rows, whether the flight is sold out or not. I can select the first flight of every bookable journey using this selector:
//*[contains(@class, "datarow")][.//input]
But if the leg has more than one flight, then I will have to look for following siblings with the same class using another XPath query.
Is there a single XPath query that will return me each bookable leg as a nodeset?
Note: I'm using the Python lxml library, in case that matters.
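The question mentions grouping the selected rows in Python with itertools.groupby as a fallback. Below is a minimal sketch of that fallback plus the bookable-leg filter. It assumes simplified, hypothetical markup (the real results page is not reproduced here) and uses the standard-library xml.etree.ElementTree instead of lxml so it runs without extra packages. ElementTree has no ancestor axis, so rows are selected from the tr downwards instead.

```python
import xml.etree.ElementTree as ET
from itertools import groupby

# Hypothetical, simplified stand-in for the search results table.
PAGE = """<table>
  <tr class="datarow"><td class="servicecode ">F1</td><td><input type="radio"/></td></tr>
  <tr class="datarow"><td class="servicecode ">F2</td><td/></tr>
  <tr class="datarow alt"><td class="servicecode ">F3</td><td>sold out</td></tr>
  <tr class="datarow"><td class="servicecode ">F4</td><td><input type="radio"/></td></tr>
  <tr class="datarow"><td class="servicecode ">F5</td><td/></tr>
</table>"""

def flight_rows(root):
    # Same rows as //*[@class="servicecode "]/ancestor::tr[1]:
    # every <tr> that contains a servicecode cell.
    return [tr for tr in root.iter("tr")
            if tr.find(".//td[@class='servicecode ']") is not None]

def legs(rows):
    # Consecutive rows with the same class form one leg, because the
    # class alternates between "datarow" and "datarow alt" per leg.
    return [list(g) for _, g in groupby(rows, key=lambda tr: tr.get("class"))]

def bookable_legs(all_legs):
    # Keep a leg if any of its rows has an <input>, i.e. the
    # [.//input] test applied to the whole group.
    return [leg for leg in all_legs
            if any(tr.find(".//input") is not None for tr in leg)]

def codes(leg):
    return [tr.findtext("td") for tr in leg]

root = ET.fromstring(PAGE)
all_legs = legs(flight_rows(root))
print([codes(leg) for leg in all_legs])                 # [['F1', 'F2'], ['F3'], ['F4', 'F5']]
print([codes(leg) for leg in bookable_legs(all_legs)])  # [['F1', 'F2'], ['F4', 'F5']]
```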

I can select each row that represents a flight using this XPath selector:
//*[@class="servicecode "]/ancestor::tr[1]
But each flight row is not an independent journey; the flights are really grouped into legs, and these are what I want to select.
The row class alternates for each new leg: the rows of the first leg have class "datarow",
Use:
//tr[@class='datarow'][.//*[@class='servicecode']]
An extension to this question: my selector selects all rows, whether the flight is sold out or not. I can select the first flight of every bookable journey using this selector:
//*[contains(@class, "datarow")][.//input]
But if the leg has more than one flight, then I will have to look for following siblings with the same class using another XPath query.
Is there a single XPath query that will return me each bookable leg as a nodeset?
Yes:
(//tr[@class='datarow'])[1]//input
|
(//tr[@class='datarow'])[1]
//following-sibling::tr[@class='datarow altrow']
[count(preceding-sibling::tr[@class='datarow'])=1]
//input
This XPath expression selects all tr elements that represent each bookable leg (in this case 3 legs) of the first journey.
To get all legs of the second journey, substitute 1 in the above expression with 2.
To get all legs of the k-th journey, substitute 1 in the above expression with the actual value of k.

This does what I want. But is there a more elegant solution?
//*[contains(@class, "columns")]//tr[contains(@class, "datarow")][1]
|
//*[contains(@class, "columns")]//tr[not(contains(@class, "altrow"))]
[preceding-sibling::tr[1]
[contains(@class, "altrow")]]
|
//*[contains(@class, "columns")]//tr[contains(@class,"altrow")]
[preceding-sibling::tr[1]
[not(contains(@class, "altrow"))]]
The second part selects the first row of each run of consecutive rows whose class does not contain "altrow" (a row whose immediately preceding sibling is an "altrow" row).
The third part selects the first row of each run of consecutive rows whose class contains "altrow".
The first part selects the first row of the very first run, because that row has no preceding sibling and so is not selected by the second part. Together, the union returns the first row of every leg.
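As a cross-check of what the union above selects, here is a small Python sketch of the same boundary rule: a row starts a new leg if it is the first row, or if its "altrow" status differs from that of its immediately preceding sibling. The markup, the id attributes and the helper name leg_start_rows are hypothetical, and the sketch assumes all rows are siblings in a single table.

```python
import xml.etree.ElementTree as ET

# Hypothetical rows; ids are only there to make the result readable.
PAGE = """<div class="columns"><table>
  <tr class="datarow" id="r1"/>
  <tr class="datarow" id="r2"/>
  <tr class="datarow altrow" id="r3"/>
  <tr class="datarow" id="r4"/>
  <tr class="datarow" id="r5"/>
  <tr class="datarow altrow" id="r6"/>
</table></div>"""

def leg_start_rows(table):
    starts = []
    prev_is_alt = None  # the first row has no preceding sibling
    for tr in table.findall("tr"):
        # Substring test, like contains(@class, "altrow").
        is_alt = "altrow" in tr.get("class", "")
        if is_alt != prev_is_alt:
            starts.append(tr)
        prev_is_alt = is_alt
    return starts

table = ET.fromstring(PAGE).find(".//table")
print([tr.get("id") for tr in leg_start_rows(table)])  # ['r1', 'r3', 'r4', 'r6']
```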

Related

How can I locate items using xpath from below elements?

I've created some XPath expressions to locate the first items by their "index" after "h4". However, I did something wrong, which is why they don't work at all. I'd like someone to take a look and suggest a workaround.
I tried with:
//div[@id="schoolDetail"][1]/text() --For the Name
//div[@id="schoolDetail"]//br[0]/text() --For the PO Box
Elements within which items I would like the expression to locate is pasted below:
<div id="schoolDetail" style=""><h4>School Detail: Click here to go back to list</h4> GOLD DUST FLYING SERVICE, INC.<br>PO Box 75<br><br>TALLADEGA AL 36260<br> <br>Airport: TALLADEGA MUNICIPAL (ASN)<br>Manager: JEAN WAGNON<br>Phone: 2563620895<br>Email: golddustflyingse@bellsouth.net<br>Web: <br><br>View in AOPA Airports (Opens in new tab) <br><br></div>
By the way, the resulting values should be:
GOLD DUST FLYING SERVICE, INC.
PO Box 75
Try to locate required text nodes by appropriate index:
//div[@id="schoolDetail"]/text()[1] // For "GOLD DUST FLYING SERVICE, INC."
//div[@id="schoolDetail"]/text()[2] // For "PO Box 75"
Locator to get both elements:
//*[@id='schoolDetail']/text()[position()<3]
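For readers working from Python, here is a rough sketch of what /text()[1], /text()[2] and /text()[position()<3] select. It uses the standard-library html.parser, because the markup has unclosed <br> tags that an XML parser would reject. The class DirectTextCollector and the abbreviated HTML string are illustrative assumptions; the class only tracks nesting depth, so it is not a general XPath engine.

```python
from html.parser import HTMLParser

# Void elements have no end tag in HTML, so they must not change the depth.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}

class DirectTextCollector(HTMLParser):
    """Collect the direct child text nodes of the element whose id is
    target_id, roughly like the XPath //*[@id=target_id]/text()."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0  # 0 = outside the target, 1 = directly inside it
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1  # entering a nested element such as <h4>
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 1:  # skip text inside nested elements like <h4>
            self.texts.append(data)

# Abbreviated copy of the markup from the question.
HTML = ('<div id="schoolDetail" style=""><h4>School Detail: Click here to go back to list</h4>'
        ' GOLD DUST FLYING SERVICE, INC.<br>PO Box 75<br><br>TALLADEGA AL 36260<br> <br>'
        'Airport: TALLADEGA MUNICIPAL (ASN)<br></div>')

parser = DirectTextCollector("schoolDetail")
parser.feed(HTML)
parser.close()

name = parser.texts[0].strip()                      # text()[1]
po_box = parser.texts[1].strip()                    # text()[2]
first_two = [t.strip() for t in parser.texts[:2]]   # text()[position()<3]
print(first_two)  # ['GOLD DUST FLYING SERVICE, INC.', 'PO Box 75']
```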
Explanation:
[x] - XPath filters nodes using a predicate in square brackets.
If x is a number, it is compared with the node's position, i.e. [x] is shorthand for [position()=x]:
//div[2] - selects every div that is the 2nd div child of its parent, same as //div[position()=2]
If the predicate [x] is not a number, it is converted to a boolean, and only the nodes for which x is true are returned, for example:
div[position() <= 4] - selects the first four div elements: positions 1 to 4 satisfy position() <= 4, but from the 5th element on the position is greater than 4
Important: try the following locators on this page:
https://www.w3schools.com/tags/ref_httpmessages.asp
//table//tr[1] - returns the 1st row of each table! (12 elements found, the same as the number of tables on the page)
(//table//tr)[1] - returns only the 1st row of the first table (1 element found)
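The per-parent behaviour of [1] can be reproduced with Python's xml.etree.ElementTree, whose limited XPath subset applies a numeric predicate relative to each parent. ElementTree does not support the parenthesized (...)[1] form, so the sketch takes the first item of the full result list instead. The two-table document is hypothetical, not the w3schools page.

```python
import xml.etree.ElementTree as ET

# Hypothetical page with two tables.
PAGE = """<body>
  <table><tr><td>a1</td></tr><tr><td>a2</td></tr></table>
  <table><tr><td>b1</td></tr><tr><td>b2</td></tr><tr><td>b3</td></tr></table>
</body>"""

root = ET.fromstring(PAGE)

# Like //table//tr[1]: the [1] predicate applies per parent,
# so every table contributes its own first row.
per_table_first = root.findall(".//table//tr[1]")

# Like (//table//tr)[1]: collect all rows first, then take the first one.
overall_first = root.findall(".//table//tr")[:1]

print([tr.findtext("td") for tr in per_table_first])  # ['a1', 'b1']
print([tr.findtext("td") for tr in overall_first])    # ['a1']
```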

Xquery not returning desired values

I am trying to return a certain set of values, but the query is not quite returning what I would like. I would like to return records by the author "Hennie J. Steenhagen" grouped by year. However, it returns all records from any year in which Hennie has a record, not only Hennie's.
For example, if we have the records <www><author>Hennie*</author><year>1990</year></www> and <www><author>Derpie</author><year>1990</year></www>, the query will return both records grouped under the year 1990; I would like only Hennie's to be returned.
for $y in /*/*/year where $y/../author ="Hennie J. Steenhagen" return <year-Pub>{$y}{/*/*[year = $y]}</year-Pub>
Your question is quite difficult to understand because your XPath addresses a larger XML node tree than the example XML you have provided. However for the example I will assume that your records are named record. Also your output of your XPath does not make a lot of sense to me, but I will assume that you know what you want!
Given the XML:
<record>
<www>
<author>Hennie J. Steenhagen</author>
<year>1990</year>
</www>
<www>
<author>Derpie</author>
<year>1990</year>
</www>
</record>
If you have an XQuery 3.0 processor, you could use the following:
/record/www[author = "Hennie J. Steenhagen"] ! <year-Pub>{year}{.}</year-Pub>
If you only have access to an XQuery 1.0 processor, then you could fall-back to the following:
for $w in /record/www[author = "Hennie J. Steenhagen"]
return
<year-Pub>{$w/year}{$w}</year-Pub>
Both of my examples use only a single predicate, so the data is filtered once, whereas your self-found solution uses both a predicate and a where clause and therefore has to filter the data twice.
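A Python analogue of the single-predicate approach, as a sketch only: it runs xml.etree.ElementTree's [tag='text'] predicate on the example XML, and the function name year_pubs is hypothetical. It assumes the author name contains no apostrophe, since the name is spliced into the path string.

```python
import copy
import xml.etree.ElementTree as ET

DOC = """<record>
  <www><author>Hennie J. Steenhagen</author><year>1990</year></www>
  <www><author>Derpie</author><year>1990</year></www>
</record>"""

def year_pubs(record, author):
    # One filtering step, like /record/www[author = "..."]: keep <www>
    # elements that have an <author> child with exactly this text.
    pubs = []
    for www in record.findall(f"www[author='{author}']"):
        pub = ET.Element("year-Pub")
        pub.append(copy.deepcopy(www.find("year")))  # {year}
        pub.append(copy.deepcopy(www))                # {.}
        pubs.append(pub)
    return pubs

pubs = year_pubs(ET.fromstring(DOC), "Hennie J. Steenhagen")
for pub in pubs:
    print(ET.tostring(pub, encoding="unicode"))
```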
Fixed it,
for $y in /*/*/year where $y/../author ="Hennie J. Steenhagen" and /*/*[year=$y] return <year-Pub>{$y/../*}</year-Pub>
Thanks to anyone who spent their time looking.

How to reconcile two collections

I have 2 collections of the same type elements. Let's call those elements lettersOfAlphabet. Each letter has ID as int and Name as string.
So
ID Letter
0 A
1 B
2 C
3 D
and so on.
First collection contains the alphabet, second contains selected letters.
In the first step I create a new selectedLettersCollection, and this is easy: I simply add an element from the alphabet collection to selectedLettersCollection and immediately remove it from the source collection (alphabet). So let's say I created a collection of the first 5 letters of the alphabet and saved it to an SQL table. Now the alphabet collection starts at F and contains letters through Z, and selectedLettersCollection contains letters A, B, C, D and E.
Now let's say I want to remove letter C from selectedLettersCollection and move it back to alphabet, and take letter G from alphabet and move it to selectedLettersCollection.
What is the most efficient way to perform this operation in LINQ, generic collections and/or T-SQL?
I would certainly create a new temporary collection and load the selected elements into it. I would then perform my add/remove operations. But so far the only way I can think of to reconcile those collections is to iterate through selectedLettersCollection and the new temporary collection and move elements accordingly. I was wondering if there is a method that does not require iteration, akin to T-SQL's joins that look for NULLs.
I'll use A and B for the table names.
To move the row with Id = 1 from table A to table B use the following:
DELETE A
OUTPUT deleted.Id, deleted.Letter INTO B
WHERE Id = 1;
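For the in-memory side of the question (generic collections), set differences play the role of the anti-join ("join and look for NULLs"). A minimal Python sketch, assuming letters are identified by their integer IDs and using a hypothetical reconcile helper; the set operations still iterate internally, they just avoid a hand-written loop.

```python
def reconcile(alphabet, selected, wanted):
    """Return (new_alphabet, new_selected) so that new_selected == wanted.

    All arguments are sets of letter IDs; every wanted ID must already be
    in either alphabet or selected.
    """
    wanted = set(wanted)
    unknown = wanted - alphabet - selected
    if unknown:
        raise ValueError(f"unknown IDs: {sorted(unknown)}")
    to_select = wanted - selected    # wanted but not yet selected
    to_release = selected - wanted   # selected but no longer wanted
    new_alphabet = (alphabet - to_select) | to_release
    return new_alphabet, wanted

# IDs as in the question: 0 = A, 1 = B, ..., 25 = Z.
alphabet = set(range(5, 26))   # F..Z still available
selected = set(range(0, 5))    # A..E selected
# Move C (2) back to the alphabet and select G (6).
new_alphabet, new_selected = reconcile(alphabet, selected, {0, 1, 3, 4, 6})
print(sorted(new_selected))                  # [0, 1, 3, 4, 6]
print(2 in new_alphabet, 6 in new_alphabet)  # True False
```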

Calculations inside a repeater (new repeat) control

A form in Orbeon Form Builder contains a repeater control (new repeat). Suppose there are three text controls on each row (repeat) of the repeater, and the first two contain numeric values. I want the third text control to show the product of the first two at run time, without any event. The number of rows may increase at run time, and for each row the product of its first two controls must be shown in its third control.
I used the following codes :
if ($quantity castable as xs:double and $price castable as xs:double)
then $quantity * $price
else 'n/a'
It works with this XPath expression when there is only one row in the repeater. But when new rows are added at run time, all results in the third column change to the else value ("n/a"). It only works for a single row, whereas the value must be calculated separately for each row.
Assume this is your node which repeats for each row
<repeater>
<quantity></quantity>
<price></price>
<product></product>
</repeater>
The XPath expression for calculating the product would be:
if(../quantity castable as xs:double and ../price castable as xs:double)
then ../quantity * ../price
else 'N/A'
When this expression is used as the calculate for the <product> node, it produces the product on each row, and no event-based action is required because it is written on the bind definition of the node.
Hope this answers your question.
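To illustrate why the relative paths matter, here is a Python sketch that evaluates the same logic once per row, relative to that row. It is not Orbeon code: the <form> wrapper, the helper names, and the float()-based castability check (which is more lenient than castable as xs:double) are assumptions.

```python
import xml.etree.ElementTree as ET

def castable_as_double(text):
    # Rough stand-in for "castable as xs:double"; Python's float() is more
    # lenient (e.g. it accepts "inf" and "1_000"), so this is approximate.
    try:
        float(text)
        return True
    except (TypeError, ValueError):
        return False

def fill_products(form):
    # Evaluate once per <repeater> row, relative to that row, which is
    # what ../quantity and ../price achieve in the bind on <product>.
    for row in form.iter("repeater"):
        quantity = row.findtext("quantity")
        price = row.findtext("price")
        product = row.find("product")
        if castable_as_double(quantity) and castable_as_double(price):
            product.text = str(float(quantity) * float(price))
        else:
            product.text = "N/A"

FORM = """<form>
  <repeater><quantity>2</quantity><price>3.5</price><product/></repeater>
  <repeater><quantity></quantity><price>4</price><product/></repeater>
  <repeater><quantity>10</quantity><price>10</price><product/></repeater>
</form>"""

form = ET.fromstring(FORM)
fill_products(form)
print([row.findtext("product") for row in form.iter("repeater")])
# ['7.0', 'N/A', '100.0']
```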

XPath 2.0: Finding number of distinct elements before first element with current node's value

Setup: I am using XPath 2.0, but inside Altova StyleVision; see my comment later on.
I have got the following XML structure:
<?xml version="1.0" encoding="UTF-8"?>
<entries>
<bla>
<blub>222</blub>
</bla>
<bla>
<blub>222</blub>
</bla>
<bla>
<blub>123</blub>
</bla>
<bla>
<blub>234</blub>
</bla>
<bla>
<blub>123</blub>
<!--I want to find the number of distinct elements before the first occurrence of a blub element with the same value as the current node - so for this node the result should be one (two times 222 before the first appearance of 123)-->
</bla>
</entries>
When parsing that file, I would like to know at each occurrence of a blub: how many distinct blub values are there before the first occurrence of a blub with the same value as the current node?
So basically, first determine where the first occurrence of a blub with the same value as the current node is, and then figure out the number of distinct blubs before it.
One of my problems is that Altova doesn't support the current() function. Quote: "Note that the current() function is an XSLT function, not an XPath function, and cannot therefore be used in StyleVision's Auto-Calculations and Conditional Templates. To select the current node in an expression use the for expression of XPath 2.0."
So any solution that could do without the current() function would be great ;)
Thanks all!
Stevo
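If you need the first node with the same value, you can always start at the beginning and search for it with /entries/bla[blub=string($c)][1], where $c is the current blub. Note that inside a predicate, string() without an argument refers to the node being filtered, not to the current node, so bind the current node with a for expression first (this also avoids the unsupported current() function).
And then you can insert it in your expression and get
for $c in . return count(distinct-values( /entries/bla[blub=string($c)][1]/preceding-sibling::bla/blub ))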
And if you need it for all blubs you can count it for all of them:
for $x in /entries/bla/blub return count(distinct-values( /entries/bla[blub=string($x)][1]/preceding-sibling::bla/blub ))
Edit: this might be slow, since it involves many loops. If distinct-values in StyleVision preserves the order of the elements (XPath 2.0 does not guarantee this), the number of distinct values before a value is the index of that value in the distinct-values sequence minus one.
So you can get the count for one node with index-of(distinct-values(/entries/bla/blub), string()) - 1 and the count for all nodes with
for $x in /entries/bla/blub return index-of(distinct-values(/entries/bla/blub), $x) - 1
And if it is possible to define new variables you could set $s to distinct-values(/entries/bla/blub) and simplify it to
for $x in /entries/bla/blub return index-of($s, $x) - 1
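To check the expected counts outside StyleVision, here is a small Python sketch of both approaches on the blub values from the question. The function names are hypothetical, and the second version keeps the distinct values in first-seen order explicitly rather than relying on the ordering of distinct-values.

```python
def distinct_before_first_naive(values):
    # Mirrors the first approach: find the first occurrence of each value,
    # then count the distinct values that precede it.
    result = []
    for v in values:
        first = values.index(v)
        result.append(len(set(values[:first])))
    return result

def distinct_before_first_indexed(values):
    # Mirrors index-of(distinct-values(...), $x) - 1, with the distinct
    # values numbered explicitly in first-seen order (0-based).
    position = {}
    for v in values:
        if v not in position:
            position[v] = len(position)
    return [position[v] for v in values]

blubs = ["222", "222", "123", "234", "123"]  # blub values from the question
print(distinct_before_first_naive(blubs))    # [0, 0, 1, 2, 1]
print(distinct_before_first_indexed(blubs))  # [0, 0, 1, 2, 1]
```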
