How to search on FHIR using complex nested queries - hl7-fhir

I haven't really found examples or instructions on what a complex nested query should look like when searching a FHIR resource.
Some examples (pseudo-code):
(name=Mary AND gender=female) OR (address-city=Springfield AND address-state=NY)
((name=Mary AND gender=female) OR (address-city=Springfield AND address-state=NY)) AND active=true
Is that even possible? If yes, how?

FHIR supports quite an elaborate search syntax, but it isn't a query language. The searches you want cannot be done in a single request with it, unless you have access to the server and can implement the queries there yourself.
If you have access/influence server side, you can implement a named query, and then use the _query search parameter to execute that (see http://www.hl7.org/fhir/search.html#query).
If you don't have that access, you can perform your queries in a couple of steps. For example your first one would take 2 queries:
GET [fhir endpoint]/Patient?name=Mary&gender=female
GET [fhir endpoint]/Patient?address-city=Springfield&address-state=NY
Both would give you a Bundle of results. The two Bundles together would be the complete list of matching resources you were looking for.
For the second example query, you would need to supply both GETs with &active=true.
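One detail the two-step approach leaves open is that a patient matching both conditions shows up in both Bundles, so the combined list needs de-duplicating. Below is a minimal sketch of that step, assuming the Bundles have already been fetched and parsed into dicts. The endpoint URL and the sample Bundles are made up for illustration, and paging through Bundle "next" links is not handled.

```python
from urllib.parse import urlencode


def search_url(base, resource, params):
    """Build a FHIR search URL from a dict of search parameters."""
    return f"{base}/{resource}?{urlencode(params)}"


def union_bundles(*bundles):
    """Merge searchset Bundles (as dicts), keeping each resource only once."""
    seen = {}
    for bundle in bundles:
        for entry in bundle.get("entry", []):
            res = entry["resource"]
            # resourceType + id identifies a resource on a single server
            seen.setdefault((res["resourceType"], res["id"]), res)
    return list(seen.values())


# Hypothetical endpoint; active=true is added to BOTH searches (second example).
BASE = "https://fhir.example.org/baseR4"
urls = [
    search_url(BASE, "Patient", {"name": "Mary", "gender": "female", "active": "true"}),
    search_url(BASE, "Patient", {"address-city": "Springfield",
                                 "address-state": "NY", "active": "true"}),
]

# Hand-written sample Bundles standing in for the two GET responses.
bundle_a = {"resourceType": "Bundle", "entry": [
    {"resource": {"resourceType": "Patient", "id": "1"}},
    {"resource": {"resourceType": "Patient", "id": "2"}},
]}
bundle_b = {"resourceType": "Bundle", "entry": [
    {"resource": {"resourceType": "Patient", "id": "2"}},
    {"resource": {"resourceType": "Patient", "id": "3"}},
]}
patients = union_bundles(bundle_a, bundle_b)
```

Here patient "2" matches both searches but ends up in the merged list only once.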

Related

Search algorithm options for ontology querying?

I have developed a tool that enables searching of an ontology I authored. It submits the searches as SPARQL queries.
I have received some feedback that my search implementation is all-or-none, or "binary". In other words, if a user's input doesn't exactly match a term in the ontology, they won't get any hit at all.
I have been asked to add some more flexible, or "advanced" search algorithms. Indexing and bag-of-words searching were suggested.
Can anyone give some examples of implementing search methods on an ontology that don't require a literal match?
First of all, what kind of entities are you trying to match (literals, or string casts of URIs?), and what kind of SPARQL queries are you running now? Something like this?
?term ?predicate "user input" .
If you are searching across literals, you can make the search more flexible right off the bat by using case-insensitive regular expression filtering, although this will probably make your searches slower, and it won't catch cases where some of the word tokens are present but in a different order. In the following example, you should probably constrain the types of ?term and ?predicate first, or even filter on a string datatype on ?someLiteral
?term ?predicate ?someLiteral .
FILTER(regex(?someLiteral, "user input", "i"))
Several triplestores offer support for full-text searching and result scoring. These are often extensions to the SPARQL language.
For example, Virtuoso and some others offer a bif:contains predicate. Virtuoso also offers the faceted search web interface (plus a service, I think.) I have been pleased with the web-based full text search in Blazegraph and Stardog, but I can't say anything at this point about using them with a SPARQL query to get a score on a search pattern. Some (GraphDB) even support explicit integration with Lucene or Solr*, so you may be able to take advantage of their search languages.
Finally... are you using a library like the OWL API or RDF4J to access your ontology? If so, you could certainly save the relationships between your terms and any literals in a Java native data structure, and then directly use a fuzzy search component like Lucene to index each literal as a "document" and then search the user input across the index.
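To give a feel for the "index each literal as a document" idea without pulling in Lucene, here is a rough bag-of-words sketch in plain Python. It is a stand-in for a real search library, not a replacement: it only counts shared word tokens, with no stemming or edit-distance matching. The term identifiers and labels are made up and would in practice be read from the ontology.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase a string and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(labels):
    """Map each token to the set of terms whose label contains it."""
    index = defaultdict(set)
    for term, label in labels.items():
        for token in tokenize(label):
            index[token].add(term)
    return index


def search(index, user_input):
    """Rank terms by how many query tokens their label shares."""
    scores = defaultdict(int)
    for token in set(tokenize(user_input)):
        for term in index.get(token, ()):
            scores[term] += 1
    # Best score first; ties broken by term name for a stable order
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical terms and labels, standing in for literals from the ontology.
labels = {
    "ex:T1": "acute myocardial infarction",
    "ex:T2": "myocardial ischemia",
    "ex:T3": "acute kidney injury",
}
index = build_index(labels)
results = search(index, "Infarction, myocardial")
```

Unlike the regex filter, word order and punctuation in the input don't matter here: the query above still ranks "acute myocardial infarction" first and returns "myocardial ischemia" as a partial hit.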
Why don't you post your ontology and give an example of a search you would like to perform in a non-binary way? I (or someone else) can try to show you a minimal implementation.
*Solr integration only appears to be offered in the commercially-licensed version of GraphDB

a simple filtering language that can be embedded in ruby?

I have a ruby project where part of the operation is to select entities given user-specified constraints. So far, I've been hacking my own filter language, using regular expressions and specifying inclusion/exclusion based on the fields in the entities.
If you are interested in my current approach, here's an example: For instance, given this list of entities:
[{"type":"dog", "name":"joe"}, {"type":"dog", "name":"fuzz"}, {"type":"cat", "name":"meow"}]
A user could specify a filter like so:
{"filter":{
"type":{"included":["dog"] },
"name":{"excluded":["^f.*"] }
}}
This would match all dogs but exclude fuzz.
This is sort of working now. However, I am starting to require more sophisticated selection parameters. I am thinking that rather than continuing to hack on my own filter language, there might be a more general-purpose filter language I can just embed in my application? For instance, is there a parser that can in-app filter using a SQL where clause? Or are there some other general, simple filter languages that I'm not aware of? I would especially like to move away from regexps since I want to do range querying on numbers (like is entity["size"] < 50 ?)
It is a little bit of an extrapolation, but I think you may be looking for a search engine, or at least enough of one that you may as well use one just for the query language.
If so you might want to look at elasticsearch which does have Ruby client bindings, and could be a good fit for what you are trying to do. Especially if you want or need to express the data you want to search as JSON for use by client code, as that format is natively supported by the search engine.
The query language is quite expressive, and there are a variety of built-in and plugin tools available to explore and use it.
In the end, I ended up implementing a Ruby DSL. It's easy, fun, and powerful.
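The DSL itself isn't shown, so the following is only a guess at the general shape, and it is sketched in Python rather than Ruby. It keeps the "included"/"excluded" regex lists from the question and adds comparison operators for range checks. The operator names "lt" and "gte" and the "size" values are assumptions made up for this example.

```python
import operator
import re

# "included"/"excluded" mirror the question; "lt" and "gte" are hypothetical additions.
OPERATORS = {
    "included": lambda value, patterns: any(re.search(p, str(value)) for p in patterns),
    "excluded": lambda value, patterns: not any(re.search(p, str(value)) for p in patterns),
    "lt": operator.lt,
    "gte": operator.ge,
}


def matches(entity, spec):
    """True if the entity satisfies every condition on every field in the spec."""
    return all(
        field in entity and OPERATORS[op](entity[field], arg)
        for field, conditions in spec.items()
        for op, arg in conditions.items()
    )


# The entities from the question, with made-up sizes added.
entities = [
    {"type": "dog", "name": "joe", "size": 40},
    {"type": "dog", "name": "fuzz", "size": 30},
    {"type": "cat", "name": "meow", "size": 10},
    {"type": "dog", "name": "rex", "size": 70},
]
spec = {
    "type": {"included": ["dog"]},
    "name": {"excluded": ["^f.*"]},
    "size": {"lt": 50},
}
selected = [e["name"] for e in entities if matches(e, spec)]
```

Keeping the operators in a lookup table means a new comparison is one more entry rather than another special case in the matching code.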

Retrieving a DBpedia resource by its string name with SPARQL and without knowing its type

As shown in this question which has a similar title, I would like to retrieve a dbpedia resource by knowing a part of its name. I'm a beginner when it comes to SPARQL and such, but the example in the question helped me a lot, as the author searched for "Romania", and the person answering hooked him up with a Sparql request to do the job. That's nice, but here's the thing.
In the example, they already "knew" that Romania is a country, hence the
?c a dbpedia-owl:Country ;
in the WHERE clause. The complete sparql request being
SELECT ?c
WHERE {
?c a dbpedia-owl:Country ;
foaf:name "Romania"@en .
FILTER NOT EXISTS {?c dbpedia-owl:dissolutionYear ?y}
}
But that question doesn't quite cover our need, which is to search for ANY resource by its name, the "name" being the actual name of a resource, or a part of it, regardless of its (rdf:)type. The goal would be to search for "anything", just knowing the name or a part of it.
I've been doing some research before asking you guys this question, and I already know that the "part of the name" problem could be resolved with bif function (the bad way, since it's not sparql compliant), or the CONTAINS clause, but I couldn't find any example showing how to use it.
Let's now suppose that there's a "word" to search for among the dbpedia resources, that word would be an input from some user. And let's call it "INPUT".
The request, I would imagine, would look like :
SELECT ?something WHERE
{
?something a (dbpedia Resource).
CONTAINS(?something,"INPUT")
}
My major question is about two major aspects :
Is there anything that describes the type DBpedia Resource? I don't think it's in the ontology or anything. By knowing that, I would like to search among all the resources to find one matching ...
A specific name I would provide, or some string. I considered the FILTER option, but that would mean getting ALL the resources, and then filtering them by their name after they have been retrieved, which would be, I guess, not so optimal.
So, does anyone know this "Master Query" to get a resource by providing its name, or a part of it? (An example being providing "Obama", and getting results not only for Barack, but for Michelle as well.)
Thank you in advance.
I'm assuming that in your first question you are interested in looking at just instance resources. I don't know if you can explicitly ask just for instance resources in the general case, since in RDF everything is a resource. If you specifically need this for the DBpedia dataset, you can query for resources that have dcterms:subject as a property (in DBpedia only instance resources have a dcterms:subject). So you can have a query like this:
SELECT DISTINCT ?s ?label WHERE {
?s rdfs:label ?label .
FILTER (lang(?label) = 'en').
?label bif:contains "Obama" .
?s dcterms:subject ?sub
}
Similarly for your second question: if you are using just the DBpedia dataset, you might want to use "bif:contains", although it is not SPARQL compliant. I don't think there is another optimal way to do this, and as you said, using FILTER will be sub-optimal, especially if you need to execute queries quickly. I think that keyword search and indexing are handled ad hoc by each triple store; there is not yet a standardized way to do full-text searches.
So to sum up, if you work with dbpedia only just use the features of the store and the specifics of the dataset to solve your problem.
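For completeness, since the question asked for an example of CONTAINS: the portable form uses the standard SPARQL 1.1 CONTAINS function inside a FILTER. As discussed above, it has to test every label, so expect it to be much slower than bif:contains on a dataset the size of DBpedia. The sketch below only builds the query string from user input. The function names and the LIMIT default are assumptions for illustration.

```python
def escape_sparql_string(text):
    """Escape characters that would break out of a double-quoted SPARQL literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def label_search_query(user_input, lang="en", limit=50):
    """Build a query matching any resource whose label contains the input."""
    needle = escape_sparql_string(user_input.lower())
    return f"""PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?s ?label WHERE {{
  ?s rdfs:label ?label .
  FILTER (lang(?label) = "{lang}")
  FILTER (CONTAINS(LCASE(STR(?label)), "{needle}"))
}}
LIMIT {limit}"""


query = label_search_query("Obama")
```

Lowercasing both sides makes the match case-insensitive, and escaping the input keeps a stray quote from changing the query.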

to_tsquery() validation

I'm currently developing a website that allows a search on a PostgreSQL database. The search works with to_tsquery(), and I'm trying to find a way to validate the input before it is sent as a query.
Other than that, I'm also trying to add a phrasing capability, so that if someone searches for HELLO | "I LIKE CATS" it will only find results with "hello" or the entire phrase "i like cats" (as opposed to I & LIKE & CATS, which will find articles that have all 3 words, regardless of where they might appear).
Is there some reason why it's too expensive to let the DB server validate it? It does seem a bit excessive to duplicate the tsquery parsing algorithm in the client.
If the concern is that you don't want it to try running the whole query (which presumably will involve table access) each time it validates, you could use the input in a smaller query, just in pseudocode (which may look a bit like Python, but that's just coincidence):
is_valid_query(input):
    try:
        # to_tsquery() only parses the input here; no table is touched
        execute("SELECT to_tsquery($1)", input)
        return True
    except DatabaseError:
        return False
With regard to phrasing, it's probably easiest to search by the non-phrased query first (using indexes), then filter those for having the phrase. That could be done server side or client side. Depending on the language being parsed, it might be easiest to construct a simple regex of the phrase that deals with repeated whitespace or other ignorable symbols.
Search for to_tsquery('HELLO|(I&LIKE&CATS)'), getting back a list of documents which loosely match.
In the client, filter that to those matching the regex "HELLO|(I\s+LIKE\s+CATS)".
The downside is you do need some additional code for translating your query into the appropriate looser query, and then for translating it into a regex.
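The two translation steps could look roughly like this for the phrase part alone. It is a sketch under simplifying assumptions: the phrase consists of plain words with no tsquery operators or punctuation in them, and the candidate documents are ordinary strings already returned by the loose query. Handling the full HELLO | "..." expression would need a small parser on top of this.

```python
import re


def loose_tsquery(phrase):
    """Turn a phrase into the looser AND query used for the indexed first pass."""
    return "(" + " & ".join(phrase.split()) + ")"


def phrase_pattern(phrase):
    """Compile a case-insensitive regex: the words in order, separated by whitespace."""
    words = [re.escape(word) for word in phrase.split()]
    return re.compile(r"\b" + r"\s+".join(words) + r"\b", re.IGNORECASE)


def filter_phrase(documents, phrase):
    """Keep only the documents that contain the exact phrase."""
    pattern = phrase_pattern(phrase)
    return [doc for doc in documents if pattern.search(doc)]


# Made-up rows standing in for what the loose query returned.
candidates = [
    "I like cats a lot",
    "Cats like me, and I like dogs",
    "i  LIKE\ncats",
]
matches = filter_phrase(candidates, "I LIKE CATS")
```

The second candidate contains all three words, so the loose query would return it, but the regex pass drops it because the words are not adjacent.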
Finally, there might be a technique in PostgreSQL to do proper phrase searching using the lexeme positions that are stored in tsvectors. I'm guessing that phrase searches are one of the intended uses, but I couldn't find an example of it in my cursory search. There's a section on it near the bottom of http://linuxgazette.net/164/sephton.html at least.

How to do SQL IN like query in hibernate search

A sample scenario:
Search for books whose content contains "success" AND whose author is in a list of passed-in names (there could be thousands of them).
I looked into filter:
http://docs.jboss.org/hibernate/stable/search/reference/en-US/html_single/#query-filter
Seems like Hibernate Search has no native support for this.
What is recommended approach for this problem? I think I am not alone.
Thanks for any inputs.
Let me post my current solution.
Get the search results with minimal projections for the keywords, and loop through the results to get only matching ones from the IN list.
I am not using filter.
Open to other alternatives once convinced.
If you look here http://lucene.apache.org/java/2_4_1/queryparsersyntax.html (at the end "Field Grouping"), you can write a query with something like :
content:success AND author:("firstname" "secondname" "thirdname" ...)
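Building that string by hand for a long list is error-prone, so here is a small sketch of generating it. It assumes the keyword is a single plain word and that author names only need quotes and backslashes escaped. The function names are made up. Also keep in mind that Lucene limits the number of clauses in a boolean query (1024 by default, as far as I know), so a list of thousands of names may have to be split into batches or that limit raised.

```python
def quote_term(value):
    """Wrap a value in double quotes, escaping backslashes and embedded quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def in_list_query(keyword, field, values):
    """Build content:<keyword> AND <field>:("v1" "v2" ...) using field grouping."""
    grouped = " ".join(quote_term(v) for v in values)
    return f"content:{keyword} AND {field}:({grouped})"


# Hypothetical author names for illustration.
query = in_list_query("success", "author", ["Jane Austen", "Mark Twain"])
```

With the query parser's default OR operator, the quoted names inside the parentheses are alternatives, which is what gives the IN-like behaviour.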
