FHIR Issue trying to search by date range - hl7-fhir

I'm trying to do an /Observation search by value-quantity. I can do it without problems with the prefixes eq, ne, gt, lt, le, etc., but I can't do it for a range; I would like to search between a minimum and a maximum value.
Is this possible? I have looked into composite search but haven't been able to get it working...
Thanks!

Range searches in FHIR are done using a lower bound combined with an upper bound since multiple conditions are ANDed together. If you wanted to search for Observations with a value between 1 and 10 you could do
Observation?value-quantity=ge1&value-quantity=le10
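As a minimal sketch of issuing that search from Python (the server base URL is illustrative; repeating the parameter is what produces the AND):

import requests

BASE = "http://example.org/fhir"  # illustrative FHIR server base URL

# Repeating value-quantity makes the server AND the two conditions:
# value >= 1 AND value <= 10.
params = [("value-quantity", "ge1"), ("value-quantity", "le10")]
resp = requests.get(BASE + "/Observation", params=params)
bundle = resp.json()
print(bundle.get("total"), "matching Observations")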

Related

Giving weight to SynonymFilterFactory terms

Is there any way in Solr to give weights to synonyms (generated by SynonymFilterFactory)?
Longer version of the question / some background:
We'd like to give a smaller weight to the synonym words/terms injected by SynonymFilterFactory, so that exact matches get a higher score.
The first use case is simply to give one static weight to all synonyms: if a search-time match goes through a synonym, it gets a certain (lower) weight than an exact match.
I can't find this in the documentation.
Is there a way for Solr to assign weights to terms produced by SynonymFilterFactory?
Any pointers highly appreciated.
PS. Another use case is to fine-tune each synonym with its own particular weight (i.e. synonyms="synonyms.txt" would have 3 columns instead of 2). That does not seem to be currently possible, so perhaps just the single static weight for all synonyms described above would do.
As in most cases with Lucene, the solution is to use multiple fields: one field with synonyms expanded and one without. That way you can decide whether to search with synonyms enabled at all, or you can score hits in the different fields with different weights, and you can adjust those weights per query. In Solr you'd use copyField to index the same content into both fields, and you can then adjust the weights when using edismax, e.g. qf=field^5 field_with_synonyms, to score hits without synonyms five times higher than those that only match through synonyms.
If you really want to do it inside one single field, it'll require a far more brittle and custom setup, where you use payloads attached to each token to score each token differently; this is a more advanced use case and won't fit neatly into all other functionality. It will solve your PS use case, though. I'd also recommend checking out one of the presentations from Lucene/Solr Revolution about use cases for payload scoring.
Using two fields is the easy way; using payloads is the more flexible, but also more advanced, way.
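A minimal sketch of the query side of the two-field approach, assuming the fields are called title (no synonym filter) and title_syn (the copyField target with SynonymFilterFactory in its analysis chain) and a local core named mycore; all of these names are illustrative:

import requests

SOLR_URL = "http://localhost:8983/solr/mycore/select"  # assumed core name

params = {
    "defType": "edismax",
    "q": "laptop bag",
    # Hits on the synonym-free field score five times higher than hits that
    # only match through the synonym-expanded field.
    "qf": "title^5 title_syn",
    "fl": "id,title,score",
}
docs = requests.get(SOLR_URL, params=params).json()["response"]["docs"]
print(docs)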
Returns the float value computed from the decoded payloads of the term specified.
The return value is computed using the min, max, or average of the decoded payloads. A special first function can be used instead of the others, to short-circuit term enumeration and return only the decoded payload of the first term.
The field specified must have float or integer payload encoding capability (via DelimitedPayloadTokenFilter or NumericPayloadTokenFilter). If no payload is found for the term, the default value is returned.
payload(field_name,term): default value is 0.0, average function is used.
payload(field_name,term,default_value): default value can be a constant, field name, or another float returning function. average function used.
payload(field_name,term,default_value,function): function values can be min, max, average, or first.
A file used with the DelimitedPayloadTokenFilter is in the format of token|payload and allows you to attach any numeric value as the "payload" for that token.
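And a minimal sketch of the query side of the payload approach, assuming you have already managed to index a field title_payloads with DelimitedPayloadTokenFilter so that synonym tokens carry a lower payload (e.g. term|0.5) than original tokens (term|1.0); the field, core, and term names below are illustrative:

import requests

SOLR_URL = "http://localhost:8983/solr/mycore/select"  # assumed core name

params = {
    "defType": "edismax",
    "q": "laptop",
    "qf": "title_payloads",
    # Multiply each document's score by the payload stored for the matched
    # term, falling back to 1.0 when no payload is found.
    "boost": "payload(title_payloads,laptop,1.0,max)",
    "fl": "id,title,score",
}
print(requests.get(SOLR_URL, params=params).json()["response"]["docs"])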

Find all the rows given the number of matching attributes and the query

Below is my problem definition:
Given a database D in which each row has m categorical attributes, and given a query consisting of a vector of m categorical attributes plus a number of required matches k: how do I efficiently find the ids of all rows whose number of attributes matching the query is greater than or equal to k?
The easier version (I think) is: given a vector of at most m categorical attributes, how do I find the ids of all rows that match those attributes?
In some related questions (e.g. this one), the whole database is scanned every time a query comes in. I think this is not fast enough, although I am not sure about the actual complexity.
If possible, I want to avoid scanning all the rows in the database. Therefore, I am thinking of building some kind of index, but I am wondering whether there is any existing work on this.
In addition, is there a similar, known problem, and what is it called? I would like to take a look at it.
Thank you very much for your help.
(Regarding the coding, I mainly code in Python 2.7 for this.)
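For what it's worth, a minimal sketch of the indexing idea described above (an inverted index from (attribute position, value) to row ids, so only rows sharing at least one attribute value with the query are ever touched); this is an illustration, not a worst-case asymptotic improvement, and the data is made up:

from collections import defaultdict

def build_index(rows):
    # rows: list of tuples, each with m categorical attribute values
    index = defaultdict(set)  # (attribute position, value) -> set of row ids
    for row_id, row in enumerate(rows):
        for pos, value in enumerate(row):
            index[(pos, value)].add(row_id)
    return index

def rows_with_at_least_k_matches(index, query, k):
    counts = defaultdict(int)  # row id -> number of attributes matching the query
    for pos, value in enumerate(query):
        for row_id in index.get((pos, value), ()):
            counts[row_id] += 1
    return sorted(row_id for row_id, c in counts.items() if c >= k)

rows = [("red", "small", "metal"),
        ("red", "large", "wood"),
        ("blue", "small", "wood")]
index = build_index(rows)
print(rows_with_at_least_k_matches(index, ("red", "small", "plastic"), 2))  # [0]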

prefix similarity search

I am trying to find a way to build a fuzzy search where both the text database and the queries may have spelling variants. In particular, the text database is material collected from the web and would likely not benefit from a full-text engine's preparation phase (word stemming).
I could imagine using pg_trgm as a starting point and then validating hits with the Levenshtein distance.
However, people tend to type prefix queries. E.g., in the realm of music, I would expect "beetho symphony" to be a reasonable search term. So, if someone were typing "betho symphony", is there a reasonable way (using PostgreSQL, perhaps with Tcl or Perl scripting) to discover that the "betho" part should be compared with "beetho" (returning an edit distance of 1)?
What I ended up with is a simple modification of the common algorithm: normally you would just pick up the last value from the matrix (or from the final vector pair). Referring to the "iterative" algorithm at http://en.wikipedia.org/wiki/Levenshtein_distance, I pass the string to be probed as the first argument and the query string as the second one. When the algorithm finishes, the minimum value in the result column gives the proper result.
Sample results:
query "fantas", words in database "fantasy", "fantastic" => 0
query "fantas", wor in database "fan" => 3
The inputs to the edit distance are words pre-selected from the word list by trigram similarity.
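A minimal sketch of that modification (the probed database word fills the rows, the query fills the columns, and the answer is the minimum of the last column rather than the bottom-right cell):

def prefix_edit_distance(db_word, query):
    # dp[i][j] = edit distance between db_word[:i] and query[:j]
    m, n = len(db_word), len(query)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if db_word[i - 1] == query[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # delete from db_word
                           dp[i][j - 1] + 1,        # insert into db_word
                           dp[i - 1][j - 1] + cost) # substitute / match
    # The query may match any prefix of the database word, so take the
    # minimum over the last column (full query consumed).
    return min(dp[i][n] for i in range(m + 1))

print(prefix_edit_distance("fantasy", "fantas"))    # 0
print(prefix_edit_distance("fantastic", "fantas"))  # 0
print(prefix_edit_distance("fan", "fantas"))        # 3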
You can modify the edit distance algorithm to give a lower weight to the latter part of the string.
E.g.: Match(i,j) = 1/max(i,j)^2 instead of Match(i,j) = 1 for every i and j (i and j are the positions of the symbols being compared).
What this does is: dist('ABCD', 'ABCE') < dist('ABCD', 'EBCD').
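A minimal sketch of that weighting, with the per-edit cost 1/max(i,j)^2 substituted for the usual constant cost of 1:

def weighted_edit_distance(a, b):
    # Each edit at positions (i, j) costs 1/max(i, j)**2, so differences near
    # the start of the strings are penalised more than differences near the end.
    def cost(i, j):
        return 1.0 / max(i, j) ** 2
    m, n = len(a), len(b)
    dp = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = dp[i - 1][0] + cost(i, 0)
    for j in range(1, n + 1):
        dp[0][j] = dp[0][j - 1] + cost(0, j)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0.0 if a[i - 1] == b[j - 1] else cost(i, j)
            dp[i][j] = min(dp[i - 1][j] + cost(i, j),
                           dp[i][j - 1] + cost(i, j),
                           dp[i - 1][j - 1] + sub)
    return dp[m][n]

print(weighted_edit_distance("ABCD", "ABCE") < weighted_edit_distance("ABCD", "EBCD"))  # True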

Elasticsearch scoring based on how close a number is to a query

I want to score my documents based on how close a number is to the query. Given two documents, document1.field = 1 and document2.field = 10, and a query field = 3, I want document1._score > document2._score. In other words, I want something like a fuzzy query against a number. How would I achieve this? The use case is that I want to support price queries (exact or range) but also rank results that aren't exactly within the boundaries.
You are looking for Decay functions:
Decay functions score a document with a function that decays depending on the distance of a numeric field value of the document from a user given origin. This is similar to a range query, but with smooth edges instead of boxes.
It can be implemented using a custom_score query, where the script determines the boost depending on the absolute value of the difference between the exact price and the desired price. The desired price should be passed to the script as a parameter to avoid script recompilation for every request.
Alternatively, it can be implemented using a custom_filters_score query. The filters here would contain different ranges around the desired price; smaller ranges would get a higher boost and appear higher in the list than larger ranges.
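As a minimal sketch of the decay-function approach mentioned above (decay functions live inside a function_score query; the index name, field name, and gauss parameters are illustrative):

import requests, json

query = {
    "query": {
        "function_score": {
            "query": {"match_all": {}},
            "functions": [{
                "gauss": {
                    "price": {
                        "origin": 3,   # the price the user asked for
                        "scale": 5     # distance at which the score has decayed to ~0.5
                    }
                }
            }]
        }
    }
}
resp = requests.post("http://localhost:9200/products/_search",
                     headers={"Content-Type": "application/json"},
                     data=json.dumps(query))
print(resp.json()["hits"]["hits"])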

Fuzzy Matching on Date-Type values

I don't have a real question; I'm more looking for creative input on a problem.
I want to compare two (most likely unequal) Date values and calculate a ratio of their similarity. So, for example, comparing 08.01.2013 and 10.01.2013 would give a relatively high value, but 08.01.2013 and 17.04.1998 would give a really low one.
But now I'm not sure how exactly I should calculate the similarity. At first I was thinking about turning the Date values into Strings and then using the edit distance on them (the number of single-character operations needed to transform one String into another). This seems like a good idea for some cases and I'll definitely implement it, but I also need an appropriate calculation for something like 31.01.2013 and 02.02.2013.
Why not use the difference in days between the two dates as a starting point?
It is "low" for similar dates and "high" for dissimilar dates; then use some arithmetic to turn it into a "similarity ratio" that matches your requirements.
Consider a fixed reference date "early enough" in the past if you get stuck.
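A minimal sketch of that idea; the scale constant is an arbitrary tuning knob, not anything prescribed:

from datetime import date

def date_similarity(d1, d2, scale=30.0):
    # Map the absolute gap in days to a similarity in (0, 1].
    gap = abs((d1 - d2).days)
    return 1.0 / (1.0 + gap / scale)

print(date_similarity(date(2013, 1, 8), date(2013, 1, 10)))   # close -> high
print(date_similarity(date(2013, 1, 8), date(1998, 4, 17)))   # far   -> low
print(date_similarity(date(2013, 1, 31), date(2013, 2, 2)))   # close despite different-looking strings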
The edit distance can be calculated using the Levenshtein distance.
A change in the year would mean a lot more "distance" than a change in the day.
The usual way to compare dates would be to calculate the distance in days or hours. To do that, you'd convert both dates into a serial day number. Microsoft offers a DateDiff() function for date comparisons and distance calculations.
