I've got some XML documents, conforming to a known schema, which include geometries in GML format.
I'm looking to validate the XML using XSD and Schematron, but I'll need some way of performing spatial queries within the XPath language (I presume via extension functions).
I was wondering if anyone is aware of a standard I can use, or indeed if someone has already done this - I've come up empty on Google.
As an example (representative only, and only attempting to demonstrate the XPath part of the question, which is the real question - the fact that I'm aiming to use it in Schematron is moot):
My XML:
<Things>
<Thing type="A">
<Geometry>...GML...</Geometry>
</Thing>
<Thing type="B">
<Geometry>...GML...</Geometry>
</Thing>
</Things>
XPath to return things of type A which spatially intersect with things of type B (again, I'm making up an extension-function namespace and a (pretty dumb) function to give an example of what I'm trying to accomplish):
/Things/Thing[@type='A' and geo:has-intersection(Geometry, /Things/Thing[@type='B']/Geometry)]
As this seems to sit somewhere between development and GIS, I've cross-posted on GIS Stack Exchange and Stack Overflow.
The EXPath Geo Module defines functions on simple OGC geometries. I believe there are several implementations, but the only one I'm familiar with is BaseX.
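For example, the expression from the question might look something like this (an untested sketch: geo:intersects is the module's counterpart of your made-up has-intersection, its namespace is http://expath.org/ns/geo, and I'm assuming each Geometry element wraps a single GML geometry node):
declare namespace geo = "http://expath.org/ns/geo";

/Things/Thing[@type = 'A']
  [some $b in /Things/Thing[@type = 'B']/Geometry/*
   satisfies geo:intersects(Geometry/*, $b)]
In Schematron you would bind the geo prefix with an <sch:ns> declaration and use the expression inside an assert, provided the XPath engine evaluating your schema implements the module.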
Related
When I evaluate this XPath expression: //superhero[n0:name="Superman"]/n1:name on this XML:
<n0:rootElement xmlns:n0='http://example.com' xmlns:n1='http://example.com'>
<superheroes>
<superhero>
<n0:name>Superman</n0:name>
<n1:name>Clark</n1:name>
</superhero>
<superhero>
<n0:name>Spiderman</n0:name>
<n1:name>Peter</n1:name>
</superhero>
</superheroes>
</n0:rootElement>
using an XPath evaluator, I get the expected result.
But when I send it to an XQuery processor, I get an error message saying that
Namespace prefix 'n0' has not been declared. Weird, huh?
It's always the prefix inside the square brackets (a predicate, I believe) that gets the complaint.
I've used http://www.xpathtester.com to verify the difference between XPath and XQuery interpretations.
It works fine with https://codebeautify.org/Xpath-Tester, which is XPath only.
If I replace n0: or n1: with *: it works for XQuery processors, but not for XPath testers.
This is of course a toy example I've written up to clarify my issue. In production I'm calling an external service which I believe is driven by Saxon-HE. I know it accepts XQuery so I'm guessing it is in "XQuery-mode" for XPath expressions.
There isn't much I can do to the xml file since I receive it from another source. Is there a better XQuery expression I can use?
Is this a bug, or by design?
Different XPath engines provide different ways of binding the namespace prefixes used in the expression. Some, I believe, pick up the namespace bindings from the source document. So it's not a non-conformance with the standard, it's the fact that the standard leaves it up to the particular processor how the original context is established.
The underlying problem is that you probably want your query to work regardless what namespace prefixes are used in the source document. Picking up the namespace bindings from the source document is handy for ad-hoc queries, but it means that a query that does the right thing with one document will fail with a different one.
In XQuery you can declare any namespaces you want to use in your query:
declare namespace n0 = 'http://example.com';
declare namespace n1 = 'http://example.com';
//superhero[n0:name="Superman"]/n1:name
https://xqueryfiddle.liberty-development.net/bdxZ8S
See the spec at https://www.w3.org/TR/xquery-31/#id-namespace-declaration
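Alternatively, if you can't add declarations to the expression, XPath 3.0 and XQuery 3.0 accept URI-qualified names, which spell the namespace out inline and need no prefix bindings at all (assuming your processor supports 3.0; recent Saxon releases do):
//superhero[Q{http://example.com}name="Superman"]/Q{http://example.com}name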
I'm looking for a way to define queries on sets independently from a programming language or the kind of sets.
In detail this would be a language definition and implementations for common languages like Java, C++, Python etc.
As commented, I'm not looking for a database or any implementation of a set representation, but only a way to define a query for elements from e.g. a C++ std::set or std::vector, a Python set(), or any linear structure which can be seen as a set.
A close example would be something like jLinq, but without being tied to JSON or JavaScript, and with a well-defined string representation.
Of course, without knowing the kind of data structure, you would have to implement each conditional filter for every problem and every programming language; but the way you construct query strings and how you evaluate them would be well defined, and you would not have to write parsers.
So what I'd like to write in Java or C++ is something like
q = query()
.created_after("14.03.2010")
.and(contains("hello")
.or(contains("hallo")))
.sort("caption")
or written as a string:
"(created_after("14.03.2010") and ( contains("hello") or contains("hallo"))) sort("caption")"
(this is not thought through - just to show what an interface could look like)
A good example for a different problem would be JSON or XML: clear language definition and parsers/tools for any platform or programming language.
I know this is an old question, but I think I know what you mean and I was actually looking for something similar. What you need is a "search query parser".
I found search-query-parser for nodejs (I'm not the author). I haven't tried it yet, but it looks promising. The example in the docs is very illustrative: you would receive an input string from the UI
from:hi@retrace.io,foo@gmail.com to:me subject:vacations date:1/10/2013-15/04/2014 photos
And the library would parse it to a structured json object
{
  from: ['hi@retrace.io', 'foo@gmail.com'],
  to: 'me',
  subject: 'vacations',
  date: {
    from: '1/10/2013',
    to: '15/04/2014'
  },
  text: 'photos'
}
And from that object you could construct and issue a query command to your database. As you can see, it handles lists and ranges. Right away I can't see any boolean operators (AND, OR), but I guess they could be easily implemented.
Hope this helps.
RSQL is a good option these days. There are plenty of parsers available and the queries are URL friendly.
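For example, the query sketched in the question could be written along these lines (a rough sketch; in RSQL ; is AND, , is OR, parentheses group, and comparison operators like =gt= come from FIQL - sorting isn't part of the core grammar and is usually passed as a separate parameter):
created=gt=2010-03-14;(caption==*hello*,caption==*hallo*)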
I'm working on writing a parser for a specific XML-based document, which has a lot of rules and a complicated interface.
I was going to write the parser in Ruby to parse it to JSON. Then I realized that a lot of other people who use different languages would like to use it too. So I'm thinking of somehow creating a central rule system, where each language can wrap it and create its own parser.
Any idea how to go about it?
It's unlikely to be productive for you to write your own XML parser from scratch.
As you anticipated, there has indeed been a need for parsing XML in every major language. You can likely find libraries that implement multiple parsing models in any language you need. Be aware of tree-based models such as DOM, stream-based models such as SAX, and pull-based models such as StAX. Also consider XML processing models above the parsing level: Declarative transformations (eg XSLT) and databinding (eg JAXB).
The "central rule system" you envision has also already been realized in schemas (eg, XSD, RelaxNG, Schematron, ...).
I googled, but I can't find a satisfactory answer. This SO question is related but kinda old, as well as the exact opposite of what I am looking for: I want a way to do screen-scraping using XPath, not CSS selectors.
I've used enlive for some basic screen-scraping but sometimes one needs the power of XPath selectors. So here it is:
Is there any equivalent to Nokogiri or lxml for Clojure (Java)? What is the state of the "pure Java Nokogiri"? Any way to use the library from Clojure? Any better alternatives than this hack?
There are a couple of possibilities here.
Several of these require semi-well-formed XML to work. If you don't have that, I would pair clj-tagsoup with hiccup to produce the XML (parse with clj-tagsoup, which produces a form that hiccup can write back out as XML) and work with that.
First, just use the native JDK capabilities. Assuming the document is well formed enough, try clj-xpath, which provides a wrapper around the native JDK parsing.
If that doesn't suffice, consider taking a more Clojure data structure based route. A simpler path could just use the output of TagSoup and a combination of maps, filters, and nths.
If you need something more advanced, consider using zippers to provide structure around the data, making it easier to manipulate. Use clojure.xml/parse and clojure.zip/xml-zip to produce the zipper, and go from there. An example can be found at http://techbehindtech.com/2010/06/25/parsing-xml-in-clojure/.
Using the native structures is my preferred route for anything complicated, as you can bring the full power of the language to bear.
If you provide a sample of why you need XPath, I can provide some sample code.
So, just as a fun project, I decided I'd write my own XML parser. No, not to parse a specific document, and no, not using an XML parser library. I mean writing code to parse out any XML document into a usable data structure. Just because I like the challenge. :-)
With that said, so far it's proved to be... interesting. It's not as easy to parse (especially when you start taking into account special characters, CDATA, empty tags, comments, etc.) as it initially looked.
Are there any well documented XML parsing algorithms or explanations anywhere that anyone knows of? It seems like there are well-documented Queue and Stack and BTree and etc. etc. etc. implementations everywhere, but I'm not sure I've ever seen a simple, well-documented XML parser algorithm...
I repeat: I am not looking for a pre-built parser library! I am looking for information on how to create my own pre-built parser library! Do not tell me "use expat" or "use SAX" or whatever. That's not what I'm asking for.
ANTLR offers a tutorial on parsing XML. It breaks the process down into phases: lexing, parsing, tree parsing, etc. It looks pretty interesting.
I don't know if it would be "cheating" in your book, but you could try parsing your XML with a ready-built all-purpose language parser like ANTLR. The result would be a list of tokens (if you just use the lexer) or a parse tree (if you include the parser) and you could then re-build the parse tree almost 1:1 into an XML structure.
Maybe. I haven't thought about the ways in which XML might be different from "normal" ANTLR fodder like programming languages, and whether you would be able to define a suitable grammar.
VTD-XML is probably the simplest parsing technique possible...
http://expat.sourceforge.net/
Expat is an XML parser library written in C. It is a stream-oriented parser in which an application registers handlers for things the parser might find in the XML document (like start tags). An introductory article on using Expat is available on xml.com.