Efficiently grouping elements that exist in both documents (inner join) in XQuery - xpath

I have the following data:
<Subjects>
<Subject>
<Id>1</Id>
<Name>Maths</Name>
</Subject>
<Subject>
<Id>2</Id>
<Name>Science</Name>
</Subject>
<Subject>
<Id>2</Id>
<Name>Advanced Science</Name>
</Subject>
</Subjects>
and:
<Courses>
<Course>
<SubjectId>1</SubjectId>
<Name>Algebra I</Name>
</Course>
<Course>
<SubjectId>1</SubjectId>
<Name>Algebra II</Name>
</Course>
<Course>
<SubjectId>1</SubjectId>
<Name>Percentages</Name>
</Course>
<Course>
<SubjectId>2</SubjectId>
<Name>Physics</Name>
</Course>
<Course>
<SubjectId>2</SubjectId>
<Name>Biology</Name>
</Course>
</Courses>
I wish to efficiently get the elements from both documents that share the same Ids.
I want to get the result like this:
<Results>
<Result>
<Table1>
<Subject>
<Id>1</Id>
<Name>Maths</Name>
</Subject>
</Table1>
<Table2>
<Course>
<SubjectId>1</SubjectId>
<Name>Algebra I</Name>
</Course>
<Course>
<SubjectId>1</SubjectId>
<Name>Algebra II</Name>
</Course>
<Course>
<SubjectId>1</SubjectId>
<Name>Percentages</Name>
</Course>
</Table2>
</Result>
<Result>
<Table1>
<Subject>
<Id>2</Id>
<Name>Science</Name>
</Subject>
<Subject>
<Id>2</Id>
<Name>Advanced Science</Name>
</Subject>
</Table1>
<Table2>
<Course>
<SubjectId>2</SubjectId>
<Name>Physics</Name>
</Course>
<Course>
<SubjectId>2</SubjectId>
<Name>Biology</Name>
</Course>
</Table2>
</Result>
</Results>
So far I have 2 solutions:
<Results>
{
for $e2 in $t2/Course
let $foriegnId := $e2/SubjectId
group by $foriegnId
let $e1 := $t1/Subject[Id = $foriegnId]
where $e1
return
<Result>
<Table1>
{$e1}
</Table1>
<Table2>
{$e2}
</Table2>
</Result>
}
</Results>
and the other way round:
<Results>
{
for $e1 in $t1/Subject
let $id := $e1/Id
group by $id
let $e2 := $t2/Course[SubjectId = $id]
where $e2
return
<Result>
<Table1>
{$e1}
</Table1>
<Table2>
{$e2}
</Table2>
</Result>
}
</Results>
Is there a more efficient way of doing this?
Perhaps taking advantages of multiple groups?
Update
A major issue with my code at the moment is that its performance is highly dependent on which table is bigger. For example, the 1st solution is better in cases where the 2nd table is bigger, and vice versa.

The solution you have looks reasonable to me. It will perform significantly better on a processor like Saxon-EE that does join optimization than on one (like Saxon-HE) that doesn't. If you want to hand-optimize it, your simplest approach is to switch to using XSLT: use the key() function to replace the filter expression $t1/Subject[Id = $foriegnId] which, in the absence of optimization, scans one file once for each group selected from the other.
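The hand-optimization described above amounts to a hash join: index each document by its key once, then probe one index with the keys of the other. As a rough illustration outside XQuery/XSLT, here is a minimal Python sketch using only the standard library. The function names group_by_key and inner_join_groups are made up for this example, and the extra History subject is a hypothetical row added to show that unmatched keys are dropped.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Sample data modelled on the question; Id 3 is a hypothetical unmatched row.
SUBJECTS = """<Subjects>
  <Subject><Id>1</Id><Name>Maths</Name></Subject>
  <Subject><Id>2</Id><Name>Science</Name></Subject>
  <Subject><Id>2</Id><Name>Advanced Science</Name></Subject>
  <Subject><Id>3</Id><Name>History</Name></Subject>
</Subjects>"""

COURSES = """<Courses>
  <Course><SubjectId>1</SubjectId><Name>Algebra I</Name></Course>
  <Course><SubjectId>1</SubjectId><Name>Algebra II</Name></Course>
  <Course><SubjectId>1</SubjectId><Name>Percentages</Name></Course>
  <Course><SubjectId>2</SubjectId><Name>Physics</Name></Course>
  <Course><SubjectId>2</SubjectId><Name>Biology</Name></Course>
</Courses>"""


def group_by_key(root, child_tag, key_tag):
    """Index child elements by the text of their key element (one pass)."""
    index = defaultdict(list)
    for element in root.findall(child_tag):
        index[element.findtext(key_tag)].append(element)
    return index


def inner_join_groups(subjects_root, courses_root):
    """Build <Results> with one <Result> per key present in both inputs."""
    subjects = group_by_key(subjects_root, "Subject", "Id")
    courses = group_by_key(courses_root, "Course", "SubjectId")
    # Probe with the keys of the smaller index; lookups in the other are O(1).
    if len(subjects) <= len(courses):
        small, large = subjects, courses
    else:
        small, large = courses, subjects
    results = ET.Element("Results")
    for key in small:
        if key in large:
            result = ET.SubElement(results, "Result")
            ET.SubElement(result, "Table1").extend(subjects[key])
            ET.SubElement(result, "Table2").extend(courses[key])
    return results


results = inner_join_groups(ET.fromstring(SUBJECTS), ET.fromstring(COURSES))
print(ET.tostring(results, encoding="unicode"))
```

Because each document is scanned once to build its index, the cost is roughly proportional to the combined size of both inputs, so it no longer matters much which table is bigger.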

Related

Select Xpath element value based on value of another element

I am trying to capture a value using XPath based on value of a different field.
Example XML:
<?xml version="1.0" encoding="UTF-8" ?>
<employees>
<employee>
<id>1</id>
<firstName>Tom</firstName>
<lastName>Cruise</lastName>
<photo>https://jsonformatter.org/img/tom-cruise.jpg</photo>
</employee>
<employee>
<id>2</id>
<firstName>Maria</firstName>
<lastName>Sharapova</lastName>
<photo>https://jsonformatter.org/img/Maria-Sharapova.jpg</photo>
</employee>
<employee>
<id>3</id>
<firstName>Robert</firstName>
<lastName>Downey Jr.</lastName>
<photo>https://jsonformatter.org/img/Robert-Downey-Jr.jpg</photo>
</employee>
</employees>
I am trying to get Xpath expression for value in the firstName field, when id value is 3.
You can locate the parent node based on the known child node and then find the desired child node of that parent, as follows:
//employee[./id='3']/firstName
The expression above returns the desired firstName node itself.
To retrieve its text value, this can be used:
//employee[./id='3']/firstName/text()
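To try the same predicate from code, here is a small sketch using Python's standard xml.etree.ElementTree module, which supports only a limited XPath subset (the [child='text'] predicate works, text() does not, so findtext is used instead). The trimmed sample data and variable names are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example document (photo elements omitted).
EMPLOYEES = """<employees>
  <employee><id>1</id><firstName>Tom</firstName></employee>
  <employee><id>2</id><firstName>Maria</firstName></employee>
  <employee><id>3</id><firstName>Robert</firstName></employee>
</employees>"""

root = ET.fromstring(EMPLOYEES)

# Select the employee whose <id> child has text '3', then read firstName.
first_name = root.findtext("./employee[id='3']/firstName")
print(first_name)
```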

How to compare element position in xpath

I am trying to compare customer account values in XPath so that only distinct values are displayed and duplicates are ignored:
XML code:
<info>
<Customer CustAccount="1"/>
<Customer CustAccount="2"/>
<Customer CustAccount="2"/>
<Customer CustAccount="3"/>
</info>
The result should compare customer 1/2/3 and display:
customer 1
customer 2
customer 3
You can achieve this with the XPath-2.0 expression
for $c in distinct-values(/info/Customer/@CustAccount) return concat('customer ',$c,'&#xa;')
Output is:
customer 1
customer 2
customer 3
If you do not like the newlines, remove the &#xa; from the expression.
There is no pure XPath-1.0 expression achieving this; you could only do this with XSLT-1.0 if XPath-2.0 is unavailable.
Here is the pure xpath 1.0 solution.
Sample xml:
<root >
<info>
<Customer CustAccount="1"/>
<Customer CustAccount="2"/>
<Customer CustAccount="2"/>
<Customer CustAccount="3"/>
</info>
</root>
xpath 1.0:
/root/info/Customer[not(./@CustAccount=preceding::Customer/@CustAccount)]
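If neither XPath 2.0 nor XSLT is at hand, the de-duplication can also be done in the host language. The following is a minimal sketch in Python with the standard library only; the variable names are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

INFO = """<info>
  <Customer CustAccount="1"/>
  <Customer CustAccount="2"/>
  <Customer CustAccount="2"/>
  <Customer CustAccount="3"/>
</info>"""

root = ET.fromstring(INFO)

# Collect every CustAccount value in document order.
accounts = [c.get("CustAccount") for c in root.findall("Customer")]

# dict.fromkeys keeps only the first occurrence of each value, in order.
distinct_accounts = list(dict.fromkeys(accounts))

lines = [f"customer {account}" for account in distinct_accounts]
print("\n".join(lines))
```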

students whose majors are not listed?

<univ>
<student>
<id>1</id>
<major>cs</major>
</student>
<student>
<id>2</id>
</student>
....
</univ>
How do I find the IDs of students that do not have any major?
So the output should be 2.
Guess the one mentioned is wrong.
/univ/student[not(major)]/id/text()
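The not(major) test can be mirrored in code as a check for a missing child element. Here is a short Python sketch using the standard library, whose ElementTree module does not support not() in paths, so the filter is written in Python; the sample data is the fragment from the question without the elided part.

```python
import xml.etree.ElementTree as ET

UNIV = """<univ>
  <student><id>1</id><major>cs</major></student>
  <student><id>2</id></student>
</univ>"""

root = ET.fromstring(UNIV)

# find() returns None when a student has no <major> child.
ids_without_major = [
    student.findtext("id")
    for student in root.findall("student")
    if student.find("major") is None
]
print(ids_without_major)
```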

Linq to XML - get elements that have certain child element

Using LINQ to XML, how do I get a collection of all elements that have a named child element?
For example:
<root>
<Garage>
<Car id="001">
<Price PaymentType="Cash">$100</Price>
</Car>
<Car id="002">
<Price PaymentType="Cash">$200</Price>
</Car>
<Car id="003">
</Car>
</Garage>
</root>
This should return 2 Car elements (#1 and #2), as they have the Price element. It shouldn't return Car #3, as it doesn't have a Price element.
thanks as always
Assuming you have an XDocument object named doc with your example XML loaded into it, you could try something like this (filtering on the Price child by name):
IEnumerable<XElement> elements = doc.Descendants("Garage").Elements().Where(e => e.Elements("Price").Any());

Double node on Xpath for different values

How do I write an XPath for two attributes? E.g. I need to get items whose discount is greater than 20% and whose discount amount is also greater than 200 (without any link to the base value).
You can combine constraints in predicates. E.g.:
from lxml import etree
doc = etree.XML("""<xml>
<items>
<item discount_perc="25" discount_value="250">Something</item>
</items>
</xml>
""")
doc.xpath('items/item[@discount_perc > 20 and @discount_value > 200]')
I will try to answer with a simple example. Imagine you have the following xml:
<?xml version="1.0"?>
<data>
<node value="10" weight="1">foo</node>
<node value="10" weight="2">bar</node>
</data>
Then use this query to select the first <node>'s text:
//node[@value="10" and @weight="1"]/text()
and this for the second:
//node[@value="10" and @weight="2"]/text()
Hope this helps.
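For readers without lxml, here is a rough standard-library sketch of both answers. It assumes Python's xml.etree.ElementTree, whose XPath subset has no "and" operator and no numeric comparisons: chained predicates stand in for "and", and the numeric test is done in Python. The second item element is a hypothetical row added to show a non-match.

```python
import xml.etree.ElementTree as ET

DATA = """<data>
  <node value="10" weight="1">foo</node>
  <node value="10" weight="2">bar</node>
</data>"""

ITEMS = """<items>
  <item discount_perc="25" discount_value="250">Something</item>
  <item discount_perc="10" discount_value="300">Other</item>
</items>"""

data_root = ET.fromstring(DATA)

# Two predicates in a row must both hold, like "and" in full XPath.
second_text = data_root.findtext("./node[@value='10'][@weight='2']")
print(second_text)

items_root = ET.fromstring(ITEMS)

# Numeric comparisons are not available in ElementTree paths, so filter here.
discounted = [
    item.text
    for item in items_root.findall("item")
    if float(item.get("discount_perc")) > 20
    and float(item.get("discount_value")) > 200
]
print(discounted)
```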

Resources