Complex SPARQL queries are laborious to construct and difficult to read. Is there a way to "include" sub-queries in SPARQL, such as an "include queryX" line, that would save one from writing out the same simple queries inside each more complex query?
The SPARQL FAQ mentions the following possibility:
A very limited form of subqueries can be accomplished with SPARQL engines that will perform HTTP GETs upon graphs named in FROM or FROM NAMED clauses by creating a URL consisting of an embedded SPARQL CONSTRUCT query submitted to a SPARQL endpoint and supplying this URL as part of the RDF dataset being queried. In practice, this technique is often inefficient and is subject to possible URL-maximum-length restrictions of the involved software.
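For illustration, a query using that technique might look like the following (a sketch; the endpoint URL is hypothetical):

# Works only on engines that dereference graphs named in FROM; the embedded
# CONSTRUCT query in the URL must be URL-encoded, as shown here.
select ?s ?p ?o
from <http://example.org/sparql?query=CONSTRUCT%20%7B%3Fs%20%3Fp%20%3Fo%7D%20WHERE%20%7B%3Fs%20%3Fp%20%3Fo%7D>
where { ?s ?p ?o }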
The W3C wiki mentions some other possible workarounds as well.
I've been using @skip and @include directives in a couple of my queries, and while they work quite well for simply customisable queries, I'm looking for a solution that supports a highly customisable query. I'm talking about ~20 fields, each of them skipped/included by its own individual flag. While passing 20 boolean arguments to the query and using @include(if: $the_flag) 20 times is theoretically possible, I'm looking for a better way of doing it, like passing a configuration object and including some query parts based on its fields, or maybe merging the query from stitches based on each flag.
I've read about @graphql-tools/stitch, but I'm not sure my use case will benefit from this approach with ~20 stitches. Is there any tool, or any easy way, to create a highly customisable query based on multiple conditions on the fly?
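For concreteness, the pattern I am trying to avoid looks like this (field and variable names are made up):

query Profile($withEmail: Boolean!, $withPhone: Boolean!) {
  user {
    id
    email @include(if: $withEmail)
    phone @include(if: $withPhone)
    # ...and ~18 more fields, each guarded by its own Boolean variable
  }
}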
Does AppSearch support searching across distinct engines with the same query (where, for example, two engines have a one-to-many relationship), such that the result set is a combination of both engines, with filters applying to both datasets at the same time?
If this is supported, how would I write a query to do this, and are there special requirements regarding the data structure in the engines?
Or is there perhaps another way to structure the data such that the second engine is not necessary, but the additional data is still queryable?
I am currently working on a project where I have to retrieve some rows from the database based on some filters (I also have to paginate them).
My solution was to write a function that generates the queries and to query the database directly (it works, and it's fast).
When I presented this solution to the senior programmer, he told me it will work, but it's not a long-term solution and I should use Spring Specifications instead.
Now here come my questions:
Why is Spring Specifications better than generating a query?
Is a query generated by Spring Specifications faster than a normal query?
Is it that big of a deal to use hard-coded queries?
Is there a better approach to this problem?
I have to mention that the tables in the database don't store a lot of data; the biggest one (which will be queried the least) has around 134,000 rows one year after the application was launched.
The tables have indexes on the columns that we will use to filter.
A "function that generates the queries" sounds like building query strings by concatenating smaller parts based on conditions. Even presuming this is a JPQL query string and not a native SQL string that would be DB dependent, there are several problems:
you lose the IDE's help if you ever refactor your entities
it is not easy to modularize and reuse parts of the query-generation logic (e.g., if you want to extract a method that adds the same conditions to a bunch of different queries with different joins and table aliases)
it is easy to break the syntax of the query with a typo (e.g., "a=b" + "and c=d" concatenates to "a=band c=d")
more difficult to debug
if your queries are native SQL, then you also become dependent on a particular database (e.g., maybe you want your integration tests to run on an in-memory DB while the production code runs on a regular DB)
if all the queries in your project are generated one way but yours is generated in a different way (without a good reason), then maintenance of the project will be more difficult
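By contrast, a minimal sketch of the Specification approach might look like this (entity, field, and repository names are made up; clientRepository is assumed to extend JpaSpecificationExecutor):

import org.springframework.data.domain.Page;
import org.springframework.data.domain.PageRequest;
import org.springframework.data.jpa.domain.Specification;

public class ClientSpecifications {

    // Each filter is a small, reusable, type-checked building block;
    // returning null means "no restriction", so optional filters compose cleanly.
    public static Specification<Client> nameContains(String name) {
        return (root, query, cb) ->
                name == null ? null : cb.like(root.get("name"), "%" + name + "%");
    }

    public static Specification<Client> inCity(String city) {
        return (root, query, cb) ->
                city == null ? null : cb.equal(root.get("city"), city);
    }
}

// Usage in a service method: combine the parts and let the repository paginate.
Specification<Client> spec = Specification
        .where(ClientSpecifications.nameContains(name))
        .and(ClientSpecifications.inCity(city));
Page<Client> page = clientRepository.findAll(spec, PageRequest.of(pageNumber, pageSize));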
JPA frameworks generate optimized queries for most common use cases, so generally speaking you'll get at least the same speed from a Specification query as you do from a native one. There are times when you need to write native SQL to further optimize a query but these are exceptional cases.
Yes, it's bad practice that makes maintenance a nightmare
I read the GraphQL spec and could not find a way to avoid 1 + N * number_of_nested calls; am I missing something?
For example, a query has a type client, which has nested orders and addresses; if there are 10 clients, it will make 1 call for the 10 clients + 10 calls for client.orders + 10 calls for client.addresses.
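In other words, with naive per-field resolvers a query shaped like this (field names are illustrative) costs 1 + 10 + 10 = 21 calls for 10 clients:

{
  clients {
    id
    orders { id total }       # resolved once per client
    addresses { street city } # resolved once per client
  }
}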
Is there a way to avoid this? Note that it is not the same as caching a UUID of something; those are all different values. And if your GraphQL server points to a database that can do joins, the N+1 pattern would be pretty hard on it, because with joins you could do just 3 queries for any number of clients.
I ask this because I wanted to integrate GraphQL with an API that can fetch nested resources efficiently, and if there were a way to inspect the whole graph before resolving it, it would be nice to try to put some nested fetches into just one call.
Or did I get it wrong, and GraphQL is meant to be used only with microservices?
This is one of the difficulties of GraphQL's "resolver architecture". You must avoid incurring a ton of network latency by doing a lot of I/O in each resolver. Apps using a SQL DBMS will often grapple with the N + 1 problem at first. You need to use some batching and/or caching techniques to get around this.
If you are using Node.js on the server, I have two tools to recommend (a short DataLoader sketch follows the list):
DataLoader - A database-agnostic tool for batching resolvers for each field and caching individual records.
Join Monster - A SQL-tailored tool that reads each query and your schema and compiles a SQL query for you. It leverages JOINs and DataLoader-style batching to fetch the data from your tables in a few SQL queries, or even a single one.
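As a rough sketch of the batching idea with DataLoader (assuming a Postgres-style db handle; the table and column names are made up):

const DataLoader = require('dataloader');

// Collects every client id requested during one tick of the event loop
// and resolves all of them with a single SQL query.
const ordersByClientId = new DataLoader(async (clientIds) => {
  const { rows } = await db.query(
    'SELECT * FROM orders WHERE client_id = ANY($1)',
    [clientIds]
  );
  // DataLoader requires one result per key, in the same order as the keys.
  return clientIds.map((id) => rows.filter((r) => r.client_id === id));
});

// In the resolver, N load() calls collapse into one batched query.
const resolvers = {
  Client: {
    orders: (client) => ordersByClientId.load(client.id),
  },
};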
I assume you're talking about using GraphQL with a SQL database backend. The standard itself is database-agnostic, and it doesn't care how you work around possible N+1 SELECT issues in your code. That said, specific server-side GraphQL implementations introduce many different ways of mitigating that problem:
AFAIK, the Ruby implementation is able to make use of Active Record and gems such as bullet to apply horizontal batching of executed database calls.
The JavaScript implementation may make use of the DataLoader library, which has a similar technique of batching series of executed promises together. You can see it in action here.
The Elixir and Python implementations have a concept of runtime information about executed subqueries, which can be used to determine what data will be needed to execute the GraphQL query, and potentially prefetch it.
The F# implementation works similarly to Elixir's, but the plugin itself can perform live analysis of the execution tree to better determine which fields may be used by the code, allowing an easier split between the GraphQL domain model and the database model.
Many implementations (e.g., PostGraph) tie the underlying database model directly into the GraphQL schema. In this case the GraphQL query is often translated directly into the database query language.
I am a Java developer working with a MarkLogic database. A key function of my code is its capacity to dynamically generate 4-6 SPARQL queries and run them via HTTP GET requests. The results of each are added together and then returned. I now need these results sorted consistently.
Since I am paging the results of each query (using the LIMIT and OFFSET statements) each query has its own ORDER BY statement. Without embedding sorting into the queries the pages of results will be returned out of order.
However, each query returns its own results, which are individually sorted and need to be merged into a single sorted list. My preference would be an alphanumeric sort that considers characters before considering case and that sorts empty and null values to the end. (Example: “0123456789AaBbCc…WwXxYyZz ”)
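For illustration, a simplified Comparator capturing roughly that ordering (not the actual implementation) might be:

import java.util.Comparator;

// Compare characters case-insensitively first, then break ties by codepoint
// (uppercase before lowercase, giving "AaBbCc"), with empty and null values last.
Comparator<String> ordering = Comparator.nullsLast(
        Comparator.comparing((String s) -> s.isEmpty())       // empty values last
                .thenComparing(String.CASE_INSENSITIVE_ORDER) // characters before case
                .thenComparing(Comparator.naturalOrder())     // then case: "A" before "a"
);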
I have already done this in my Java code using a custom compare method, but I recently ran into a problem: my results still aren’t returning sorted. The issue I’m having stems from the fact that my custom ordering scheme is completely separate from the one used by SPARQL, resulting in a decidedly unsorted set of results. While I have considered sorting the results from scratch before returning them instead of assuming MarkLogic is returning sorted results, this seems unnecessarily wasteful and it may not even fix my problem.
In my research I have not been able to find any way to set the collation for SPARQL, nor have I found a way to write a custom collation. The documentation on this page (https://www.w3.org/TR/rdf-sparql-query/#modOrderBy) specifically states that SPARQL's ORDER BY is based on a comparison method driven by XPath's fn:compare. That function references this page (https://www.w3.org/TR/xpath-functions/#collations), which specifically mentions options for specifying the collation as well as using alternative implementations of the Unicode Collation Algorithm. What I can't find is anything detailing how to actually do this.
In short, is there any way for me to manipulate or control how a SPARQL query compares characters to affect the final order?
If I understand what you're asking, you want to use ORDER BY, OFFSET, and LIMIT to select which results you're going to show, and then you want another ORDER BY to determine the order in which you'll show those results (which might be different than the order that you used to select them). You can do that with a nested query:
select ?result {
  { select ?result where {
      #-- ...
    }
    order by #-- ...
    offset #-- ...
    limit #-- ...
  }
}
order by #-- ...
There's not a whole lot of support for custom orderings, but you can use functions in the order expressions, and you can provide multiple expressions to sort first by one thing, then by another. In your case, it looks like you might want something like order by lcase(?value) to order case-insensitively. That won't be perfect, of course; for instance, it's not clear to me whether you want a numeric sort for numeric prefixes or not (e.g., should the order be 1, 10, 2, or 1, 2, 10?).
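For example, a sketch combining both ideas (the (!bound(?value)) key sorts rows with a missing ?value to the end, since true sorts after false):

select ?value {
  #-- ...
}
order by (!bound(?value)) lcase(str(?value))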
I just got a definitive answer from SPARQL implementers.
The SPARQL spec doesn't really address collations. MarkLogic uses Unicode codepoint collation for SPARQL ordering.
HOWEVER, we need to know your requirements. MarkLogic as you know supports all kinds of collations, and that support is built into the code backing SPARQL -- we simply have not exposed an interface as to how to leverage collations from SPARQL.
MarkLogic is watching this thread, so feel free to make that request, perhaps with a suggestion of how you would consider accessing collations from the query, and we'll see it.
I contacted Kevin Morgan from MarkLogic about this, and he was extremely helpful. We had a WebEx meeting yesterday discussing various solutions to the problem and it went very well.
Their engineers confirmed that so far there is no means of forcing SPARQL to use a particular sorting order. They proposed two promising solutions to my problem:
• Embed your triples within your documents and leverage document searches and range indexes: while this works for many system designs, it does not work for ours. Sorting and pagination fall under a product upgrade, and we cannot require our clients to completely re-ingest their data so we can apply this new standard.
• Wrap your SPARQL queries within an XQuery statement: this approach uses SPARQL to determine the entire result set, and then uses a custom collation within the XQuery to handle sorting. Pagination is also handled in the XQuery (for the obvious reason that paginating before sorting breaks both). A sketch of this approach follows.
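For example, a rough sketch of that wrapper (sem:sparql is MarkLogic's XQuery entry point for running SPARQL; the query, collation URI, and page size here are illustrative):

xquery version "1.0-ml";
import module namespace sem = "http://marklogic.com/semantics"
    at "/MarkLogic/semantics.xqy";

(: Let SPARQL produce the full result set, without ORDER BY/LIMIT/OFFSET. :)
let $results := sem:sparql("
  select ?name where { ?s <http://example.org/name> ?name }
")
(: Sort in XQuery with an explicit collation, then paginate afterwards. :)
let $sorted :=
  for $r in $results
  order by string(map:get($r, "name"))
    collation "http://marklogic.com/collation/en"  (: illustrative URI :)
  return $r
return fn:subsequence($sorted, 1, 10)  (: first page of 10 :)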
The second solution seems like it will work for us, but I will need to look into the performance costs before we can seriously consider implementing it. Incidentally, I find it very odd that SPARQL's sorting does not support collations when the XQuery functions it is built upon do. It seems illogical to assume that its users will never want to sort untagged literal values with anything other than basic Unicode codepoint sorting. At what point does it become reasonable for me to take something built upon XQuery and embed it within XQuery because it seems the creators "left something out"?