Sesame caching common queries

I use Sesame in a JSP-based web application, and I would like to know if there is any way to cache the results of queries that are used frequently.

I assume that what you want to "cache" is the query result for a given query with a specific value. You can very easily build such a cache yourself. Just create a class for the general query that internally keeps a reference to a HashMap that maps from a value key (e.g. the placeId in your example query) to a query result:
HashMap<URI, TupleQueryResult> cache = new HashMap<>();
Then all you do is check, for a given place id, whether it is present in the cache. If it is not, you execute the query, get the result back, and materialize it as a MutableTupleQueryResult, which (unlike a normal TupleQueryResult) can be re-read multiple times and can therefore be put in the cache:
if (!cache.containsKey(placeId)) {
    // reuse the prepared query with the specific binding for which we want a result
    preparedQuery.setBinding("placeid", placeId);
    // execute the query and materialize the result so it can be re-read multiple times
    TupleQueryResult result = new MutableTupleQueryResult(preparedQuery.evaluate());
    // put the result in the cache
    cache.put(placeId, result);
}
return cache.get(placeId);
If you want something a bit more sophisticated (e.g. something that evicts cached items after a certain time, or sets a size limit on your cache), have a look at using something like a Guava Cache instead of a simple HashMap; the basic setup remains the same.
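For illustration, here is a minimal sketch of the same lookup built on a Guava Cache, assuming the placeId and preparedQuery variables from the snippet above (the size bound and expiry values are picked arbitrarily):

import java.util.concurrent.TimeUnit;

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

import org.openrdf.model.URI;
import org.openrdf.query.TupleQueryResult;
import org.openrdf.query.impl.MutableTupleQueryResult;

// bounded cache: at most 1,000 entries, each evicted 10 minutes after being written
Cache<URI, TupleQueryResult> cache = CacheBuilder.newBuilder()
        .maximumSize(1000)
        .expireAfterWrite(10, TimeUnit.MINUTES)
        .build();

TupleQueryResult result = cache.getIfPresent(placeId);
if (result == null) {
    preparedQuery.setBinding("placeid", placeId);
    result = new MutableTupleQueryResult(preparedQuery.evaluate());
    cache.put(placeId, result);
}
return result;

Guava handles the eviction and size bookkeeping for you, so the surrounding code keeps the same shape as the HashMap version.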

Related

List firestore collection ids filtered by criteria

In the Go package for Firestore I can easily get a list of IDs by doing something like
client.Collection("mycollection").DocumentRefs()
and with a Query I can easily filter documents before iterating over them:
client.Collection("mycollection").Where("x", "==", "y").Documents()
But Query seems to be missing an option to get just the .DocumentRefs(). Is there some way to get the list of DocumentRefs matching a specific query without actually fetching all the matching documents (incurring read costs for each)?
The bottom line is that after I apply the filtering logic to get a constrained list of doc IDs, I want to run additional regex-based filtering on the values of the IDs, and the list of filtered IDs is my final result; there is no need to fetch the docs.
Firestore queries always return the entire contents of every matching document. There are no "light" queries that return just document IDs or references. This is the case for all of the provided Firestore SDKs, not just Go.
In general, it's advisable not to store data in the ID of a document for the purpose of filtering. Your use case will work better if you can precompute the conditions under which a document should match and put that data in a field of the document. Note also that Firestore doesn't support regex-style queries, as those do not scale the way Firestore requires.
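As a sketch of that advice, here is what the precomputed-field approach could look like. The Java SDK is used purely for illustration (the Go API has the same shape), and the collection name, field names, and the idPrefix field are all assumptions:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;

import com.google.cloud.firestore.Firestore;
import com.google.cloud.firestore.QueryDocumentSnapshot;
import com.google.cloud.firestore.QuerySnapshot;

// At write time, store the part of the ID you would have regex-matched as its
// own field, e.g. { "x": "y", "idPrefix": "region-abc" }, so it can be narrowed
// with indexed equality filters instead of fetching every matching document.
static List<String> matchingIds(Firestore db)
        throws ExecutionException, InterruptedException {
    QuerySnapshot snapshot = db.collection("mycollection")
            .whereEqualTo("x", "y")
            .whereEqualTo("idPrefix", "region-abc") // hypothetical precomputed field
            .get()
            .get(); // ApiFuture.get() blocks until the query completes

    List<String> ids = new ArrayList<>();
    for (QueryDocumentSnapshot doc : snapshot.getDocuments()) {
        ids.add(doc.getId());
    }
    return ids;
}

Any remaining regex filtering over the IDs would still happen client-side, but only over the already-constrained result set rather than the whole collection.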

GraphQL + Relay Connection Optimization

We are using Relay + GraphQL (graphql-relay-js) connections and trying to determine the best way to optimize queries to the data source.
Everything is working, though it is inefficient when connection results are sliced. In the example query below, the resolver on item will obtain 200+ records for sale 727506341339, when in reality we only need 1 to be returned.
I should note that in order to fulfill this request we actually make two db queries:
1. Obtain all item ids associated with a sale.
2. Obtain the item data for each item id.
In testing and reviewing the graphql-relay-js source, it looks like the slice happens in the final connection resolver.
Is there a method provided, short of nesting connections or mutating the sliced results of connectionFromArray, that would allow us to slice the results provided to the connection (the item ids) and then, in the connection resolver, fetch the item details for only the already-sliced id set? This would optimize the second query so that we would only need to query for 1 item's details, not all of them.
Obviously we could implement something custom or nest connections, but this seems like something that would already be available, so I feel like I am missing something here...
Example Query:
query ItemBySaleQuery {
  viewer {
    item(sale: 727506341339) {
      items(first: 1) {
        edges {
          node {
            dateDisplay
            title
          }
        }
      }
    }
  }
}
Unfortunately the solution is not documented in the graphql-relay-js lib...
Connections can use a resolveNode function to work directly on an edge's node, so the full item details are only resolved for the edges that remain after slicing. Example: https://github.com/graphql/graphql-relay-js/blob/997e06993ed04bfc38ef4809a645d12c27c321b8/src/connection/tests/connection.js#L64

Sorting by a non-key (arbitrary) field in CouchDB

I have a fairly large CouchDB database (approximately 3 million documents). I have various view functions returning slices of the data that can't be modified (or at least, should only be modified as a last resort).
I need the ability to sort on an arbitrary field for reporting purposes. For smaller DBs, I return the entire object, decode the JSON in our PHP backend, and sort there. However, we often get Out Of Memory errors when doing this on our largest DBs.
After some research, I'm leaning towards accessing a sort key (via URL parameter) in a list function and doing the sort there. This is an idea I've stolen from here. Excerpt:
function(head, req) {
    // the requested sort field arrives in the URL parameters, e.g. req.query.sortby
    var row;
    var rows = [];
    // buffer every row of the underlying view in memory
    while ((row = getRow())) {
        rows.push(row);
    }
    // sort inside the list function, descending by value
    rows.sort(function(a, b) {
        return b.value - a.value;
    });
    send(JSON.stringify({"rows": rows}));
}
It seems to be working for smaller DBs, but it still needs a lot of work to be production-ready.
Is this:
a) a good solution?
b) going to work with 3, 5, or 10 million rows?
You can't avoid loading everything into memory by using a list function. So with enough data you'll eventually get an out-of-memory error, just as you're getting with PHP.
If you can live within the memory constraints, it's a reasonable solution, with some advantages.
Otherwise, investigate using something like Lucene, Elasticsearch, or Cloudant Search (Clouseau & Dreyfus).
In our environment we have more than 5 million records. The database is designed so that every document has specific fields which distinguish it from the other categories of documents.
For example, there are a number of documents with the field DocumentType set to "User" or DocumentType set to "XXX".
This DocumentType field allows us to sort the various documents into different categories.
So if you have 3 million docs and around 10 categories, each category will have about 300k docs.
Now you can design the system such that you always pass Couch the key you need; that way the lookup is faster.
So the view map function can look like:
function(doc) {
    if (doc.DocumentType === 'XXX' && doc._id) {
        emit(doc.FieldYouWant, doc._id);
    }
}
This is how our backend is designed in production.
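To sketch the query side (the database, design document, and view names here are assumptions), you then request just the slice you need, and the rows come back already sorted by the emitted FieldYouWant key:

GET /mydb/_design/docs/_view/xxx_by_field?limit=100&descending=true

limit and descending are standard CouchDB view parameters, so when the sort field is the view key no list function (and no full in-memory sort) is needed.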

Passing parameters to a couchbase view

I'm looking to search for a particular JSON document in a bucket, and I don't know its document ID; all I know is the value of one of the sub-keys. I've looked through the API documentation but I'm still confused when it comes to my particular use case:
In Mongo I can do a dynamic query like:
bucket.get({ "name" : "some-arbitrary-name-here" })
With Couchbase I'm under the impression that you need to create an index (for example on the name property) and use startKey/endKey, but this feels wrong: couldn't you still end up with multiple documents being returned? It would be nice to be able to pass a parameter to the view so that an exact match could be performed. Also, how would we handle multi-dimensional searches, i.e. name and category?
I'd like to do as much of the filtering as possible on the Couchbase instance, and ideally narrow it down to one record rather than having to filter once it comes back to the app tier. Something like passing a dynamic value to the mapping function and only emitting documents that match.
I know you can use LINQ with Couchbase to filter, but if I've read the docs correctly this filtering is still done client-side. At least if we could narrow the returned dataset down to a sensible subset, client-side filtering wouldn't be such a big deal.
Cheers
So you are correct on one point: you need to create a view (which is indeed an index) to be able to query on the content of the JSON document.
So in your case you have to create a view with this kind of code:
function (doc, meta) {
    if (doc.type == "yourtype") { // just a good practice to type the doc
        emit(doc.name);
    }
}
This will create an index, distributed across all the nodes of your cluster, that you can now use in your application. You can point to a specific value using the "key" parameter.
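For illustration, a minimal sketch of querying that view with the key parameter using the 1.x-era Couchbase Java SDK; the design document and view names are assumptions. setKey gives you the exact match you asked about, and setIncludeDocs pulls the matching documents back in the same call:

import com.couchbase.client.CouchbaseClient;
import com.couchbase.client.protocol.views.Query;
import com.couchbase.client.protocol.views.View;
import com.couchbase.client.protocol.views.ViewResponse;
import com.couchbase.client.protocol.views.ViewRow;

void findByName(CouchbaseClient client) {
    View view = client.getView("docs", "by_name"); // names assumed
    Query query = new Query();
    query.setKey("some-arbitrary-name-here"); // exact match on the emitted key
    query.setIncludeDocs(true);               // return the full documents with the rows
    ViewResponse response = client.query(view, query);
    for (ViewRow row : response) {
        System.out.println(row.getId() + " -> " + row.getDocument());
    }
}

For the multi-dimensional case, emit a composite key from the map function, e.g. emit([doc.name, doc.category], null), and query it with query.setKey(ComplexKey.of("name-value", "category-value")).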

Lucene filter with docIds

I'm trying to do the following: I want to create a set of candidates by querying each field separately and then adding the top k matches to this set. After I'm done with that, I need to run another query on this candidate set.
The way I implemented it right now is using a QueryWrapperFilter with a BooleanQuery that matches the unique id field of each candidate document. However, this means I have to call IndexSearcher.doc().get("docId") for each candidate document before I can add it to my BooleanQuery, which is the major bottleneck. I'm only loading the docId field, via MapFieldSelector("docId").
I wanted to create my own Filter class, but I can't use the internal Lucene doc ids directly because they are specified per segment. Any thoughts on how to approach this?
Instead of reading the stored docId, index the field (it probably already is indexed) and use the FieldCache to retrieve docIds much faster. Then, instead of using the docIds in a BooleanQuery, try using a TermsFilter or a FieldCacheTermsFilter. The latter's documentation describes the performance trade-offs.
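A minimal sketch of the FieldCacheTermsFilter approach, against the 3.x-era Lucene API that MapFieldSelector implies; the "docId" field name and the candidate id values are assumptions carried over from the question:

import java.io.IOException;

import org.apache.lucene.search.FieldCacheTermsFilter;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;

TopDocs queryCandidates(IndexSearcher searcher, Query secondQuery) throws IOException {
    // unique ids collected from the per-field top-k queries (values assumed)
    String[] candidateIds = {"doc-17", "doc-42", "doc-99"};

    // Matches documents whose indexed "docId" term is in the candidate set, using
    // FieldCache ordinals per segment: no stored-field loads, no BooleanQuery clauses.
    FieldCacheTermsFilter filter = new FieldCacheTermsFilter("docId", candidateIds);

    // run the follow-up query restricted to the candidate set
    return searcher.search(secondQuery, filter, 10);
}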
