XPath: How to check multiple attributes across similar nodes

If I have some xml like:
<root>
<customers>
<customer firstname="Joe" lastname="Bloggs" description="Member of the Bloggs family"/>
<customer firstname="Joe" lastname="Soap" description="Member of the Soap family"/>
<customer firstname="Fred" lastname="Bloggs" description="Member of the Bloggs family"/>
<customer firstname="Jane" lastname="Bloggs" description="Is a member of the Bloggs family"/>
</customers>
</root>
How do I get, in pure XPath - not XSLT - an xpath expression that detects rows where lastname is the same, but has a different description? So it would pull the last node above?

How do I get, in pure XPath - not XSLT
- an xpath expression that detects rows where lastname is the same, but
has a different description?
Here's how to do this with a single XPath expression:
/*/*/customer
   [@lastname='Bloggs'
   and
    not(@description
       = preceding-sibling::*[@lastname='Bloggs']/@description
       )
   ]
This expression selects every <customer> element whose lastname attribute is "Bloggs" and whose description attribute differs from the description of every preceding sibling that also has lastname "Bloggs".
The selected nodes are:
<customer firstname="Joe" lastname="Bloggs" description="Member of the Bloggs family"/>
<customer firstname="Jane" lastname="Bloggs" description="Is a member of the Bloggs family"/>

/root/customers/customer[@lastname='Bloggs'
and not(@description = preceding-sibling::*[@lastname='Bloggs']/@description)
and not(@description = following-sibling::*[@lastname='Bloggs']/@description)]
It would perform better doing it in steps, though.
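As a rough illustration of doing it in steps, here is a minimal Python sketch. It assumes the standard-library xml.etree.ElementTree module and the sample document from the question; the function name first_with_each_description is made up for this example. A simple path picks the lastname group, and the description comparison happens in the host language, which mirrors the preceding-sibling variant of the expression.

```python
import xml.etree.ElementTree as ET

XML = """<root>
  <customers>
    <customer firstname="Joe" lastname="Bloggs" description="Member of the Bloggs family"/>
    <customer firstname="Joe" lastname="Soap" description="Member of the Soap family"/>
    <customer firstname="Fred" lastname="Bloggs" description="Member of the Bloggs family"/>
    <customer firstname="Jane" lastname="Bloggs" description="Is a member of the Bloggs family"/>
  </customers>
</root>"""


def first_with_each_description(xml_text, lastname):
    """Return customers with the given lastname whose description has not
    already appeared on an earlier customer with that lastname."""
    root = ET.fromstring(xml_text)
    seen = set()
    selected = []
    # Step 1: a simple path expression picks the lastname group.
    for customer in root.findall(f"./customers/customer[@lastname='{lastname}']"):
        description = customer.get("description")
        # Step 2: compare descriptions in Python instead of a nested predicate.
        if description not in seen:
            seen.add(description)
            selected.append(customer)
    return selected


selected = first_with_each_description(XML, "Bloggs")
firstnames = [c.get("firstname") for c in selected]
print(firstnames)
```

Each customer is visited once, so the work grows linearly with the number of rows rather than with the number of sibling pairs.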

Related

How to compare element position in xpath

I am trying to compare customer account values in XPath so that only distinct values are displayed and duplicates are ignored:
XML code:
<info>
<Customer CustAccount="1"/>
<Customer CustAccount="2"/>
<Customer CustAccount="2"/>
<Customer CustAccount="3"/>
</info>
The result should compare customer 1/2/3 and display:
customer 1
customer 2
customer 3
You can achieve this with the XPath-2.0 expression
for $c in distinct-values(/info/Customer/@CustAccount) return concat('customer ',$c,'&#xa;')
Output is:
customer 1
customer 2
customer 3
If you do not like the newlines, remove the &#xa; from the expression.
There is no pure XPath-1.0 expression achieving this; you could only do this with XSLT-1.0 if XPath-2.0 is unavailable.
Here is the pure xpath 1.0 solution.
Sample xml:
<root >
<info>
<Customer CustAccount="1"/>
<Customer CustAccount="2"/>
<Customer CustAccount="2"/>
<Customer CustAccount="3"/>
</info>
</root>
xpath 1.0:
/root/info/Customer[not(./@CustAccount=preceding::Customer/@CustAccount)]
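If neither an XPath 2.0 engine nor XSLT is at hand, the same de-duplication can be done in the calling language. The following is only a sketch, assuming Python's standard-library xml.etree.ElementTree and the sample document from the question; it keeps the first occurrence of each CustAccount value in document order.

```python
import xml.etree.ElementTree as ET

XML = """<info>
  <Customer CustAccount="1"/>
  <Customer CustAccount="2"/>
  <Customer CustAccount="2"/>
  <Customer CustAccount="3"/>
</info>"""

root = ET.fromstring(XML)

# Collect every CustAccount value in document order.
accounts = [customer.get("CustAccount") for customer in root.findall("./Customer")]

# dict.fromkeys keeps only the first occurrence of each key, in insertion order.
distinct_accounts = list(dict.fromkeys(accounts))

lines = [f"customer {account}" for account in distinct_accounts]
print("\n".join(lines))
```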

Efficiently grouping elements that exist in both documents (inner join) in XQuery

I have the following data:
<Subjects>
<Subject>
<Id>1</Id>
<Name>Maths</Name>
</Subject>
<Subject>
<Id>2</Id>
<Name>Science</Name>
</Subject>
<Subject>
<Id>2</Id>
<Name>Advanced Science</Name>
</Subject>
</Subjects>
and:
<Courses>
<Course>
<SubjectId>1</SubjectId>
<Name>Algebra I</Name>
</Course>
<Course>
<SubjectId>1</SubjectId>
<Name>Algebra II</Name>
</Course>
<Course>
<SubjectId>1</SubjectId>
<Name>Percentages</Name>
</Course>
<Course>
<SubjectId>2</SubjectId>
<Name>Physics</Name>
</Course>
<Course>
<SubjectId>2</SubjectId>
<Name>Biology</Name>
</Course>
</Courses>
I wish to efficiently get the elements from both documents that share the same Ids.
I want to get the result like this:
<Results>
<Result>
<Table1>
<Subject>
<Id>1</Id>
<Name>Maths</Name>
</Subject>
</Table1>
<Table2>
<Course>
<SubjectId>1</SubjectId>
<Name>Algebra I</Name>
</Course>
<Course>
<SubjectId>1</SubjectId>
<Name>Algebra II</Name>
</Course>
<Course>
<SubjectId>1</SubjectId>
<Name>Percentages</Name>
</Course>
</Table2>
</Result>
<Result>
<Table1>
<Subject>
<Id>2</Id>
<Name>Science</Name>
</Subject>
<Subject>
<Id>2</Id>
<Name>Advanced Science</Name>
</Subject>
</Table1>
<Table2>
<Course>
<SubjectId>2</SubjectId>
<Name>Physics</Name>
</Course>
<Course>
<SubjectId>2</SubjectId>
<Name>Biology</Name>
</Course>
</Table2>
</Result>
</Results>
So far I have 2 solutions:
<Results>
{
for $e2 in $t2/Course
let $foriegnId := $e2/SubjectId
group by $foriegnId
let $e1 := $t1/Subject[Id = $foriegnId]
where $e1
return
<Result>
<Table1>
{$e1}
</Table1>
<Table2>
{$e2}
</Table2>
</Result>
}
</Results>
and the other way round:
<Results>
{
for $e1 in $t1/Subject
let $id := $e1/Id
group by $id
let $e2 := $t2/Course[SubjectId = $id]
where $e2
return
<Result>
<Table1>
{$e1}
</Table1>
<Table2>
{$e2}
</Table2>
</Result>
}
</Results>
Is there a more efficient way of doing this?
Perhaps taking advantages of multiple groups?
Update
A major issue with my code at the moment is that its performance is highly dependent on which table is bigger. For example, the 1st solution is better in cases where the 2nd table is bigger, and vice versa.
The solution you have looks reasonable to me. It will perform significantly better on a processor like Saxon-EE that does join optimization than on one (like Saxon-HE) that doesn't. If you want to hand-optimize it, your simplest approach is to switch to using XSLT: use the key() function to replace the filter expression $t1/Subject[Id = $foriegnId] which, in the absence of optimization, searches your second file once for each element selected in the first file.
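To illustrate what an index such as key() buys you, here is a hedged Python sketch of a hash join. It is not XQuery or XSLT, just the same idea expressed with the standard-library xml.etree.ElementTree and dictionaries; the function name inner_join is hypothetical. Each document is scanned once to build an index, so the cost no longer depends on which table is larger.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

SUBJECTS = """<Subjects>
  <Subject><Id>1</Id><Name>Maths</Name></Subject>
  <Subject><Id>2</Id><Name>Science</Name></Subject>
  <Subject><Id>2</Id><Name>Advanced Science</Name></Subject>
</Subjects>"""

COURSES = """<Courses>
  <Course><SubjectId>1</SubjectId><Name>Algebra I</Name></Course>
  <Course><SubjectId>1</SubjectId><Name>Algebra II</Name></Course>
  <Course><SubjectId>1</SubjectId><Name>Percentages</Name></Course>
  <Course><SubjectId>2</SubjectId><Name>Physics</Name></Course>
  <Course><SubjectId>2</SubjectId><Name>Biology</Name></Course>
</Courses>"""


def inner_join(subjects_xml, courses_xml):
    """Group Subject and Course elements that share an id (hash join)."""
    subjects = ET.fromstring(subjects_xml)
    courses = ET.fromstring(courses_xml)

    # Index each document once; this plays the role of an XSLT key.
    subjects_by_id = defaultdict(list)
    for subject in subjects.findall("Subject"):
        subjects_by_id[subject.findtext("Id")].append(subject)

    courses_by_id = defaultdict(list)
    for course in courses.findall("Course"):
        courses_by_id[course.findtext("SubjectId")].append(course)

    results = ET.Element("Results")
    for key, subject_group in subjects_by_id.items():
        # Inner join: skip ids that have no matching courses.
        if key not in courses_by_id:
            continue
        result = ET.SubElement(results, "Result")
        ET.SubElement(result, "Table1").extend(subject_group)
        ET.SubElement(result, "Table2").extend(courses_by_id[key])
    return results


results = inner_join(SUBJECTS, COURSES)
summary = [
    (
        [s.findtext("Name") for s in result.findall("Table1/Subject")],
        [c.findtext("Name") for c in result.findall("Table2/Course")],
    )
    for result in results.findall("Result")
]
print(summary)
```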

How to find same elements with xpath

With the following XML, how could I get the list of directors where two directors have the same LastName in one movie?
<MoviesLib>
<Movie Title="Batman" Year="2013">
<Directors>
<Director>
<Name>Robert</Name>
<LastName>Zemeckis</LastName>
</Director>
</Directors>
</Movie>
<Movie Title="Gru" Year="2012">
<Directors>
<Director>
<Name>john</Name>
<LastName>tailer</LastName>
</Director>
<Director>
<Name>Emma</Name>
<LastName>Smith</LastName>
</Director>
<Director>
<Name>Lana</Name>
<LastName>Smith</LastName>
</Director>
</Directors>
</Movie>
</MoviesLib>
For example, in this case the result would be: Emma Smith, Lana Smith
thanks
The following XPath 2.0 expression should work:
for $d in //Director
return $d[../Director[not(. is $d) and LastName = $d/LastName]]
I can't come up with a single XPath 1.0 expression since it doesn't support for expressions (see the question How to get the context of outer predicate? for some background).
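If only XPath 1.0 is available, one workaround is to do the per-movie comparison in the host language. The following is a minimal sketch, assuming Python's standard-library xml.etree.ElementTree and a shortened copy of the sample document; the function name directors_sharing_last_name is made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

XML = """<MoviesLib>
  <Movie Title="Batman" Year="2013">
    <Directors>
      <Director><Name>Robert</Name><LastName>Zemeckis</LastName></Director>
    </Directors>
  </Movie>
  <Movie Title="Gru" Year="2012">
    <Directors>
      <Director><Name>john</Name><LastName>tailer</LastName></Director>
      <Director><Name>Emma</Name><LastName>Smith</LastName></Director>
      <Director><Name>Lana</Name><LastName>Smith</LastName></Director>
    </Directors>
  </Movie>
</MoviesLib>"""


def directors_sharing_last_name(xml_text):
    """Return 'Name LastName' for directors whose LastName is shared with
    another director of the same movie."""
    root = ET.fromstring(xml_text)
    matches = []
    for movie in root.findall("Movie"):
        directors = movie.findall("./Directors/Director")
        # Count last names within this movie only.
        counts = Counter(d.findtext("LastName") for d in directors)
        for d in directors:
            if counts[d.findtext("LastName")] > 1:
                matches.append(f"{d.findtext('Name')} {d.findtext('LastName')}")
    return matches


shared = directors_sharing_last_name(XML)
print(shared)
```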

Sorting XPath results in the same order as multiple select parameters

I have an XML document as follows:
<objects>
<object uid="0" />
<object uid="1" />
<object uid="2" />
</objects>
I can select multiple elements using the following query:
doc.xpath("//object[@uid=2 or @uid=0 or @uid=1]")
But this returns the elements in the same order they're declared in the XML document (uid=0, uid=1, uid=2) and I want the results in the same order as I perform the XPath query (uid=2, uid=0, uid=1).
I'm unsure if this is possible with XPath alone, and have looked into XSLT sorting, but I haven't found an example that explains how I could achieve this.
I'm working in Ruby with the Nokogiri library.
There is no way in XPath 1.0 to specify the order of the selected nodes.
XPath 2.0 allows a sequence of nodes with any specific order:
//object[@uid=2], //object[@uid=1]
evaluates to a sequence in which all object items with @uid=2 precede all object items with @uid=1.
If one doesn't have an XPath 2.0 engine available, it is still possible to use XSLT in order to output nodes in any desired order.
In this specific case the sequence of the following XSLT instructions:
<xsl:copy-of select="//object[@uid=2]"/>
<xsl:copy-of select="//object[@uid=1]"/>
produces the desired output:
<object uid="2" /><object uid="1" />
I am assuming you are using XPath 1.0. The W3C spec says:
The primary syntactic construct in XPath is the expression. An expression matches the production Expr. An expression is evaluated to yield an object, which has one of the following four basic types:
* node-set (an unordered collection of nodes without duplicates)
* boolean (true or false)
* number (a floating-point number)
* string (a sequence of UCS characters)
So I don't think you can re-order simply using XPath. (The rest of the spec defines document order and reverse document order, so if the latter does what you want you can get it using the appropriate axis, e.g. preceding.)
In XSLT you can use <xsl:sort> using the name() of the attribute. The XSLT FAQ is very good and you should find an answer there.
An XSLT example:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:param name="pSequence" select="'2 1'"/>
<xsl:template match="objects">
<xsl:for-each select="object[contains(concat(' ',$pSequence,' '),
concat(' ',@uid,' '))]">
<xsl:sort select="substring-before(concat(' ',$pSequence,' '),
concat(' ',@uid,' '))"/>
<xsl:copy-of select="."/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
Output:
<object uid="2" /><object uid="1" />
I don't think there is a way to do it in xpath but if you wish to switch to XSLT you can use the xsl:sort tag:
<xsl:for-each select="//object[@uid=1 or @uid=2]">
<xsl:sort select="@uid" data-type="number" />
{insert new logic here}
</xsl:for-each>
more complete info here:
http://www.w3schools.com/xsl/el_sort.asp
This is how I'd do it in Nokogiri:
require 'nokogiri'
xml = '<objects><object uid="0" /><object uid="1" /><object uid="2" /></objects>'
doc = Nokogiri::XML(xml)
objects_by_uid = doc.search('//object[@uid="2" or @uid="1"]').sort_by { |n| n['uid'].to_i }.reverse
puts objects_by_uid
Running that outputs:
<object uid="2"/>
<object uid="1"/>
An alternative to the search would be:
objects_by_uid = doc.search('//object[@uid="2" or @uid="1"]').sort { |a,b| b['uid'].to_i <=> a['uid'].to_i }
if you don't like using sort_by with the reverse.
XPath is useful for locating and retrieving the nodes, but often the filtering we want to do gets too convoluted in the accessor, so I let the language do it, whether it's Ruby, Perl or Python. Where I put the filtering logic depends on how big the XML data set is and whether there are a lot of different uid values I'll want to grab. Sometimes letting the XPath engine do the heavy lifting makes sense; other times it's easier to let XPath grab all the object nodes and filter in the calling language.
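The same "let the language do it" approach also covers an arbitrary order such as uid=2, uid=0, uid=1, which a numeric sort cannot express. Below is a minimal sketch in Python rather than Ruby, assuming the standard-library xml.etree.ElementTree and that uid values are unique; the function name select_in_order is made up for illustration.

```python
import xml.etree.ElementTree as ET

XML = '<objects><object uid="0" /><object uid="1" /><object uid="2" /></objects>'


def select_in_order(xml_text, uids):
    """Return object elements in the order the uids were requested."""
    root = ET.fromstring(xml_text)
    # Index the nodes once by uid (assumes each uid occurs only once).
    by_uid = {obj.get("uid"): obj for obj in root.findall("object")}
    # Walk the requested order, skipping uids that are not in the document.
    return [by_uid[str(uid)] for uid in uids if str(uid) in by_uid]


ordered = select_in_order(XML, [2, 0, 1])
ordered_uids = [obj.get("uid") for obj in ordered]
print(ordered_uids)
```

The requested list drives the output order, so no sort key has to be derived from the data.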

Select top 10 events from wevtutil using xpath

I am currently working on a project that uses the Windows event log. I am using wevtutil to get the results from the event logs. I know that wevtutil supports xpath queries, but since I'm new to xpath I don't know whether I can achieve what I'm trying to do.
In SQL, what I would be doing is something like this:
SELECT log.*, COUNT(1) numHits
FROM Application log
GROUP BY Source, Task, Level, Description
ORDER BY numHits DESC
LIMIT 10
Is it possible to do such a thing using xpath?
Edit: Here is a sample Event:
<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'>
<System>
<Provider Name='MSSQL$SQLEXPRESS' />
<EventID Qualifiers='16384'>17403</EventID>
<Level>4</Level>
<Task>2</Task>
<Keywords>0x80000000000000</Keywords>
<TimeCreated SystemTime='2010-10-20T20:06:18.000Z' />
<EventRecordID>9448</EventRecordID>
<Channel>Application</Channel>
<Computer>SHAZTOP</Computer>
<Security />
</System>
<EventData>
<Data>73094</Data>
<Binary>
FB4300000A000000130000005300480041005A0054004F0050005C00530051004C004500580050005200450053005300000000000000</Binary>
</EventData>
</Event>
XPath 1.0 has four data types: string, number, boolean and node set.
The only XPath ordering criterion is document order (in the given axis direction). That is how you can limit any result node set, as @Dimitre and @Welbog have suggested, with fn:position().
But there is no specification that an XPath engine must return a node set in any given order, so you can neither sort nor group in XPath 1.0. You can select the first of each group, but not efficiently. For example:
//Event[not(System/Level = preceding::Level) or
not(System/Task = preceding::Task)]
XPath 2.0 has the sequence data type. A sequence has the explicit order of its construction, so you can group. For example:
for $event in (//Event)[index-of(//Event/System/concat(Level,'++',Task),
System/concat(Level,'++',Task))[1]]
return //Event[System/Level = $event/System/Level]
[System/Task = $event/System/Task]
But because XPath 2.0 has neither a built-in sorting mechanism nor recursion (you could provide an extension function...), you can't sort.
For that you need a language with built-in sorting or a way to express its algorithm. Both XSLT (1.0 or 2.0) and XQuery have these features.
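A general-purpose language works too. The sketch below is only an illustration of the GROUP BY / ORDER BY / LIMIT idea in Python's standard library. It assumes the events have been wrapped in a single hypothetical <Events> root element, groups on Provider Name, Task and Level only (Description is not part of the sample event), and uses made-up sample events. The function name top_groups is an assumption as well.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace taken from the sample event in the question.
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

# Trimmed-down event template; the values filled in below are illustrative.
EVENT = """<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'>
  <System>
    <Provider Name='{source}' />
    <Level>{level}</Level>
    <Task>{task}</Task>
  </System>
</Event>"""


def top_groups(xml_text, limit=10):
    """Count events per (source, task, level) and return the largest groups."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for event in root.findall("e:Event", NS):
        system = event.find("e:System", NS)
        key = (
            system.find("e:Provider", NS).get("Name"),
            system.findtext("e:Task", namespaces=NS),
            system.findtext("e:Level", namespaces=NS),
        )
        counts[key] += 1  # GROUP BY ... COUNT(1)
    # ORDER BY numHits DESC LIMIT n
    return counts.most_common(limit)


events = (
    [EVENT.format(source="MSSQL$SQLEXPRESS", level="4", task="2")] * 3
    + [EVENT.format(source="HypotheticalSource", level="2", task="0")]
    + [EVENT.format(source="AnotherHypotheticalSource", level="3", task="1")] * 2
)
xml_text = "<Events>" + "".join(events) + "</Events>"

top = top_groups(xml_text, limit=2)
print(top)
```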
In SQL, what I would be doing is
something like this:
SELECT log.*, COUNT(1) numHits
FROM Application log
GROUP BY Source, Task, Level, Description
ORDER BY numHits DESC
LIMIT 10
Is it possible to do such a thing
using xpath?
In case no sorting is necessary, one can get the first $n nodes selected by any XPath expression by:
(ExpressionSelectingNodeSet)[not(position() > $n)]
where $n can be substituted by a specific number
If there is a requirement that the nodes be sorted on one or more sort-keys, then this is not possible in pure XPath, but one can easily perform such tasks with XSLT, using the <xsl:sort> instruction and the XPath position() function:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/*">
<nums>
<xsl:for-each select="num">
<xsl:sort data-type="number" order="descending"/>
<xsl:if test="not(position() > 5)">
<xsl:copy-of select="."/>
</xsl:if>
</xsl:for-each>
</nums>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied on the following XML document:
<nums>
<num>01</num>
<num>02</num>
<num>03</num>
<num>04</num>
<num>05</num>
<num>06</num>
<num>07</num>
<num>08</num>
<num>09</num>
<num>010</num>
</nums>
the correct result, containing only the top 5 numbers, is produced:
<nums>
<num>010</num>
<num>09</num>
<num>08</num>
<num>07</num>
<num>06</num>
</nums>
You can use the position() function to limit the results you're getting:
/root/element[position()<=10]
For example, that would select the first ten element elements which are children of the root.
If your structure is more complicated, you can use the position() function in different places. For example, if the element element can exist under more than one parent, but you want the first ten of them regardless of parent, you can do it this way:
(/root/parent1/element | /root/parent2/element)[position()<=10]

Resources