I want to use Saxon for XPath queries, but I don't know how to load multiple XML files.
I'm trying to use Saxon from the command line on Windows.
I read in the Saxon manual that I can use the command:
Query.exe -s:myDataFile.xml -q:myQueryFile -o:myOutputFile
but I don't know how to load multiple XML files rather than just one.
Edit:
I have many XML files (myDataFile1.xml, myDataFile2.xml, myDataFile3.xml, ...) and I want to run the query against all of these files. So I want to load all the files and then query them (I don't want to query every single file and then concatenate the results).
Use the standard XPath 2.0 function collection().
The documentation for the Saxon-specific implementation of collection() is here.
You can use the standard XPath 2.x collection() function, as implemented in Saxon 9.x.
The Saxon implementation allows a search pattern in the string URI argument of the function, so after the directory path you can specify a pattern such as any filename starting with report_ and ending with .xml.
Example:
This XPath expression:
collection('file:///c:/?select=report_*.xml')
selects the document nodes of every XML document residing in c:\ in a file whose name starts with report_, followed by zero or more characters, and ends with .xml.
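Putting that together with the filenames from the question, a query file run over the whole set could look like this (a sketch: the c:/data directory and the someElement name are assumptions for illustration):

```xquery
(: one query over every matching document at once :)
for $doc in collection('file:///c:/data?select=myDataFile*.xml')
return $doc//someElement
```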
I'm embarking on a new project with eXist. We'll be storing a few hundred TEI XML documents that represent manuscripts. A number of things we want to capture are repetitive, mainly people and places. My colleague has asked the TEI community about strategies for representing what we want to capture, and using XInclude has been suggested as a way of reducing duplication.
I've had a quick play with adding an XInclude into a document, and the serialized XML does render the included XML file. However, the included text was missing from an XQuery. I notice in the eXist docs (http://exist-db.org/exist/apps/doc/xinclude.xml) that:
eXist-db expands XIncludes at serialization time, which means that the
query engine will see the XInclude tags before they are expanded. You
therefore cannot query across XIncludes - unless you create your own
code (e.g. an XQuery function) for it. We would certainly like to
support queries over xincluded content in the future though.
What is the best practice for querying files that use XInclude?
I'm wondering whether I should have a 'job' that serializes the source TEI XML files to expand the XIncludes and store these files in a separate collection? In that case, would file:serialize be the correct function for this task?
We are at the start of the project, so any advice appreciated.
Can you describe what kind of query you tried that was missing the text?
Generally, since the files referenced via XInclude are well-formed XML documents, you can use collections (folders) to organise your queries in eXist-db. So instead of for $search in doc("mydoc.xml") you could write for $search in collection('/app/mydata')/*
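As a sketch (the collection path comes from the answer above; the tei:persName search and the person name are assumptions for illustration):

```xquery
declare namespace tei = "http://www.tei-c.org/ns/1.0";
(: search across every document in the collection, not just one doc :)
for $doc in collection('/app/mydata')/*
where $doc//tei:persName = 'Some Person'
return base-uri($doc)
```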
More elaborate answers would follow the href attribute of the unexpanded XInclude statement in the source document and find the matching element in the target, but it's difficult to abstract that without a concrete MWE.
Have you tried creating a temporary, expanded fragment in a let clause and querying that instead of the stored XML?
Beware of namespaces!
Hope this helps, and greetings to Sebastiaan.
I'm trying to find an alternative to a helpful piece of functionality provided by Ant: the <modified> selector.
When specifying a set of files in ant, you can use the <modified> selector to only include files whose content has changed since the last time it was run.
The selector computes a value for a file, compares that to the value stored in a cache and selects the file if these two values differ.
Is there an existing way of doing this in bash? I don't want to use a full blown build tool or similar just to return a list of modified file paths.
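A minimal sketch of the same idea in plain shell, assuming md5sum is available; the cache file name .content.md5 and the *.xml file set are my own choices. Checksum the files, diff against the cached listing, print only the changed paths, then refresh the cache:

```shell
#!/bin/sh
# Emulate Ant's <modified> selector: print files whose content
# changed since the previous run, tracked via a checksum cache.
cache=.content.md5
new=.content.md5.new

# Checksum the files of interest (here: all *.xml in the current dir).
md5sum *.xml 2>/dev/null | sort > "$new"

if [ -f "$cache" ]; then
    # Lines present only in the new listing = new or modified files.
    comm -13 "$cache" "$new" | awk '{print $2}'
else
    # No cache yet: every file counts as modified.
    awk '{print $2}' "$new"
fi

mv "$new" "$cache"
```

Note that, like Ant's selector, this only detects new and changed files; deletions simply drop out of the cache on the next run.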
I have a strange problem. I am generating an XSD-to-XSD mapping in MapForce, and it is valid and producing output. However, when the XSLT is used by our DataPower folks, they say the namespace prefixes in the XSLT prevent the code from finding the nodes in the incoming message.
For example in the XSLT, the select is:
<xsl:for-each select="ns0:costOrderHeaderLookupResponse/return/ns1:Order">
In the incoming message, the namespace prefix is as below:
*snip*
<return>
<ns2:Order BillToID="300850001000" DocumentType="0001"....*snip*>
However MapForce is generating the output just fine with no errors even with the namespace prefix difference.
The DataPower folks are requesting that instead of the namespace prefix I customize MapForce to output the nodes like this:
/*[local-name()='Order']
I read the MapForce documentation and googled for a while, but I am not finding a way to customize the XSLT output like this. It is possible for C/Java/etc., but I am not finding any help on changing how the XSLT is generated.
Create a filter in MapForce and use a boolean function (like core:logical functions:equal) to check whether the node in the select (costOrderHeaderLookupResponse/return/Order) has a local-name equal to a constant string with the value Order. The function for checking the local-name is local-name, in the xslt:xpath functions library.
The filter should replace your connection from the Orders node to whatever node it is mapped to in the second XSD.
To see how filters work (assuming you aren't already using one to get your select) view http://manual.altova.com/Mapforce/mapforcebasic/index.html?mfffilteringdata.htm
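For reference, the namespace-agnostic form the DataPower team is asking for would make the select look roughly like this (a hand-written sketch using the element names from the question, not actual MapForce output):

```xml
<xsl:for-each select="*[local-name()='costOrderHeaderLookupResponse']/*[local-name()='return']/*[local-name()='Order']">
```

Be aware that local-name() tests match those element names in any namespace (or none), which is exactly why they tolerate the ns1/ns2 prefix mismatch between the XSLT and the incoming message.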
The XSLT transformation is done through .NET code using the API provided by Saxon. I am using the Saxon 9 Home Edition API. The XSLT version is 2.0 and it generates XML output. The input file size is 123 KB.
The XSLT adds attributes to the input XML file depending on certain scenarios. A total of 7 modes are used in this XSLT. The value of an attribute generated in one mode is used in another mode, hence the multiple modes.
The output is correctly generated, but it takes around 10 seconds to execute this XSLT. When the same XSLT is executed in 'Altova XMLSpy 2013', it takes around 3-4 seconds.
Is there a way to further reduce this 10-second execution time? What could be the cause of this execution time?
The XSLT is available at below link for download.
XSLT Link
Without having a source document to run this against (and therefore to make measurements) it's very hard to be definitive about where the inefficiencies are, but the most obvious at first glance is the weird testing of element names in patterns like:
match="*[name()='J' or name()='H' or name()='F' or name()='D' or name()='B' or name()='I' or name()='G' or name()='E' or name()='C' or name()='A' or name()='X' or name()='Y' or name()='O' or name()='K' or name()='L' or name()='M' or name()='N']"
which in Saxon would be vastly more efficient if written the natural way as
match="J|H|F|D|B|I|G|E|C|A|X|Y|O|K|L|M|N"
It's also more likely to be correct that way, since comparing name() against a string is sensitive to the chosen prefix, and XSLT code really ought to work whatever namespace prefix the source document author has chosen.
The reason the latter is much more efficient is that Saxon organizes the source tree for rapid matching of elements by name (meaning namespace URI plus local name, excluding prefix). When you match by name in this way, the matching template rule can be found by a quick hash table lookup. If you use predicates that have to be evaluated by expanding the name (with prefix) as a string and comparing the string, not only does each comparison take longer, it can't be optimized with a hash lookup.
I need to write an XML parser using Boost Property Tree that can replace an existing MSXML DOM parser. Basically, my code should return the list of child nodes, the number of child nodes, etc. Can this be achieved using Property Tree? E.g. GetfirstChild(), selectNodes(), Getlength(), etc.
I saw a lot of APIs related to Boost Property Tree, but the documentation seems to be bare-minimum and confusing. As of now, I am able to parse the entire XML using BOOST_FOREACH, but the path to each node is hard-coded, which will not serve my purpose.
boost::property_tree can be used to parse XML, and since it is a tree you can use it as an XML DOM substitute, but the library is not intended to be a fully fledged XML parser and it is not compliant with the XML standard. For instance, it can successfully parse non-well-formed XML input, and it does not support some XML features. So it's your choice: if you want a simple interface to a simple XML configuration, then yes, you should use boost::property_tree.