How to execute search for FHIR patient with multiple given names? - hl7-fhir

We've implemented the $match operation for patient that takes FHIR parameters with the search criteria. How should this search work when the patient resource in the parameters contains multiple given names? We don't see anything in FHIR that speaks to this. Our best guess is that we treat it as an OR when trying to match on given names in our system.
We do see that composite parameters can be used in the query string as AND or OR, but not sure how this equates when using the $match operation.

$match is intrinsically a 'fuzzy' search. Different servers will implement it differently. Many will allow for alternate spellings, common short names (e.g. 'Dick' for 'Richard'), etc. They may also allow for transposition of month and day and all sorts of similar data entry errors. The 'closeness' of the match is reflected in the score the match is given. It's entirely possible get back a match candidate that doesn't match any of the given names exactly if the score on other elements is high enough.

So technically, I think SEARCH works this way:
AND
/Patient?givenname=John&givenname=Jacob&givenname=Jingerheimer
The above is an AND clause. There is (can be) a person named with multiple given names "John", "Jacob", "Jingerheimer".
Now I realize SEARCH and MATCH are 2 different operations.
But they are loosely related.
But Patient-Matching is an "art". Be careful, a "false positive" (with a high "score") is/could-be a very big deal.
But as mentioned from Lloyd....you have a little more flexibility with your implementation of $match.
I have worked on 2 different "teams".
One team, we never let "out the door" anything that was below a 80% match-score. (How you determine a match-score is a deeper discussion).
Another team, we made $match work with a "IF you give me enough information to find a SINGLE match, I'll give it to you" .. but if not, tell people "not enough info to match a single".
Patient Matching is HARD. Do not let anyone tell you different.
at HIMSS and other events..when people show a demo of moving data, I always ask "how did you match this single person on this side.....as it is that person on the other side?"
As in "without patient matching...alot of work-flows fall a part at the get go"
Side note, I actually reported a bug with the MS-FHIR-Server (which the team fixed very quickly) (for SEARCH) here:
https://github.com/microsoft/fhir-server/issues/760
"name": [
{
"use": "official",
"family": "Kirk",
"given": [
"James",
"Tiberious"
]
},
Sidenote:
The Hapi-Fhir object to represent this is "ca.uhn.fhir.rest.param.TokenAndListParam"
Sidenote:
There is a feature request for Patient Match on the Ms-Fhir-Server github page:
https://github.com/microsoft/fhir-server/issues/943

Related

IFTTT JavaScript filter - How to make case insensitive searches + How to search Include and Exclude sets of terms

First off I'm a total novice for Javascript, so please go gently. I'm aware of how people feel about having to now pay for IFTTT, but it's perfect for what I need.
I am using a more expansive version of this code below to capture certain keywords from Tweets to then generate emails if the search returns a positive result. This search works very nicely, except it is case sensitive which is a problem.
Yes, I know you can manipulate the twitter search to pick up specific words or phrases. I am very proficient in achieving searches this way. I am casting a wide net to pick up approx 120 search words or phrases which is too long to achieve through "OR" Twitter search parameters alone which is why I'm using this.
Q1 - I have tried adding item.toLowerCase() and just .toLowerCase() in various parts of the code so it wouldn't matter if the sentence case of the search term is different to that of the original tweet text case. I just can't get it to work though. I've seen various posts on here but I can't get any of them to work in IFTTT. I believe IFTTT doesn't accept REGEX either, which is annoying.
Any advice of how to get this code running so it's case-insensitive for text within IFTTT?
Q2 - I have approx 120 search terms for the tweet text to return positive results. There is a lot of junk that comes through with that. Does anyone know how to add a second layer of 'and exclude' search terms?
I have something like 300-400 words and specific phrases which would be used to stop the email from being triggered - so it'd be something like "IF tweet text contains a, b, c BUT text ALSO contains x, y, z... do not send the email"
let str=Twitter.newTweetFromSearch.Text;
let searchTerms=[
"Northbound",
"Westbound",
"Southbound",
"Eastbound"
]
let foundOne=0;
if(searchTerms.some(function(v){return str.indexOf(v)>=0;})){
foundOne=1;
}
if(foundOne==0){
Email.sendMeEmail.skip();
}
I have looked at the Twitter API, but that is a step too far for my coding ability which is why I'm using IFTTT.
Any help is very much appreciated
Thank you.
I'm playing with IFTTT Filter myself at the moment, so here are some thoughts about solving your solution.
If you want to do a case insensitive seatch on the original text, convert the original text to lowercase, then have all your search terms in lowercase.
Plus I think you want to iterate over the searchTerms array, and use the includes() method. Ok, just realised that .some() does the iteration for you, but I prefer includes() over indexof().
let str=Twitter.newTweetFromSearch.Text.toLowerCase();
let searchTerms=[
"northbound",
"westbound",
"southbound",
"eastbound"
]
let foundOne=0;
if(searchTerms.some(function(term){return str.includes(term);})){
foundOne=1;
}
if(foundOne==0){
Email.sendMeEmail.skip();
}
Or you could just skip having the foundOne variable, and do the search in the if() statement.
let str=Twitter.newTweetFromSearch.Text.toLowerCase();
let searchTerms=[
"northbound",
"westbound",
"southbound",
"eastbound"
]
if(!searchTerms.some(function(term){return str.includes(term);})){
Email.sendMeEmail.skip();
}

Elasticsearch: how to find out if a value matches any value in a list?

I just start learning Elasticsearch. My data has the company name and its website, and I have a list which contains all the domain aliases of a company. I am trying to write a query which can boost the record with the same website in the list.
My data looks like:
{"company_name": "Kaiser Permanente",
"website": "http://www.kaiserpermanente.org"},
{"company_name": "Kaiser Permanente - Urgent Care",
"website": "http://kp.org"}.
The list of domain aliases is:
["kaiserpermanente.org","kp.org","kpcomedicare.org", "kp.com"]
The actual list is longer than the above example. I've tried this query:
{
"bool": {
"should": {
"terms": {
"website": [
"kaiserpermanente.org",
"kp.org",
"kpcomedicare.org",
"kp.com"
],
"boost": 20
}
}
}
}
The query doesn't return anything because "terms" query is the exact match. The domain in the list and the url is similar but not the same.
What I except is the query should return the two records in my example. I think "match" can work, but I couldn't figure out how to match a value with any similar value in the list.
I found a similar question How to do multiple "match" or "match_phrase" values in ElasticSearch. The solution works but my alias list contains more than 50 elements. It would be very verbose if I wrote multiple "match_phrase" for each element. Is there a more efficient way like "terms" so that I could just pass in a list?
I'd appreciate if anyone can help me out with this, thanks!
What you are observing has been covered in many stackoverflow posts / ES docs - the difference between terms and match. When you store that info, I assume you are using the standard analyzer. This means when you push "http://kp.org", Elasticsearch indexes [ "http", "kp", "org" ] tokens broken out. However, when you use terms, it looks for "kp.org" but there was no such "kp.org" token to find matches for since that was broken down by the analyzer when indexing. match, however, will break down what you query for so that "kp.org" => [ "kp", "org" ] and it is able to find one or both. Phrase matching just requires the tokens to be next to each other which is probably necessary for what you need.
Unfortunately, there does not appear to be such an option that works like match but allows many values to match against like terms. I believe you have three options:
programmatically generate the query as described in the stackoverflow post that you referenced, which you noted would be verbose, but I think this might be just ok unless you have 1k aliases.
analyze the website field so that analysis transforms "http://www.kaiserpermanente.org" => "kaiserpermanente.org" and "http://kp.org" => "kp.org" for indexing. With this index time analysis approach, when querying, you can successfully use the terms filter. This might be fine given urls are structured and the use cases you outline only appear to be concerned with domains. If you do this, use multi fields to analyze one website value in multiple ways. It's nice to have Elasticsearch do this kind of work for you and not worry about it in your own code.
do this processing beforehand (before pushing data to ES) so that when you store data in elasticsearch, you store not only the website field, but also a domain, paths, and whatever else you need that you calculated beforehand. You get control at the cost of effort you have to put in.

Search Console API: Impressions don't add up comparing totals to contains / not contains keywords

We are using the Search Console (webmaster tools) API to download search performance results for our site to compare search performance on people searching using our company name vs non company name searches. We have found a problem where the impressions don't add up when comparing "all search results" to "search results via specific keywords".
For example, if we do a report to show all web results for all devices for our site on a specific date, we get 189,491 impressions. If we then report to show results with the keyword "Our Name" we get 61,046. If we report on "OurName" (same keyword but without spaces) we get 1,086. If we then report not contains "Our Name" and not contains "OurName" we get 65,827, which adds up to 127,959, meaning somewhere we have 61,532 impressions missing.
Interestingly, if we change the filter on not contains to also include device equals DESKTOP, it increases to 65,997, yet I would have expected this to be equal to or less than all device impressions.
From the data we have this seemed to have stopped working on the 27th November 2015 (before this the 3 figures always added up to the total, on this date and afterwards they don't). The impressions add up fine if we only do one contains and one not contains. Clicks always seem to add up correctly, so I'm wondering if one of these queries is excluding data with zero clicks?
We are using the .Net library to access the Search Console data, but we get the same results when using the API Explorer. It is hard to replicate using the search console, as this doesn't allow you to include multi "not contains" keywords. The total figures and the contains "our name" / "ourname" figures match between the API and the search console.
I've found a few other post on here where people are having similar problems but they are dated over a year ago, and we've only just noticed the problem in the last 3 weeks so I don't know if this is a new problem.
The query for the not contains is as follows:
POST https://www.googleapis.com/webmasters/v3/sites/{YOUR_SITE_URL}/searchAnalytics/query?fields=rows&key={YOUR_API_KEY}
{
"startDate": "2015-12-07",
"endDate": "2015-12-07",
"searchType": "web",
"dimensionFilterGroups": [
{
"filters": [
{
"dimension": "query",
"expression": "our name",
"operator": "notContains"
},
{
"dimension": "query",
"expression": "ourname",
"operator": "notContains"
}
]
}
]
}
Many thanks in advance for any help
cross posted from Google Search Console Forum
From the API reference, there is no OR operation available for multiple filter expressions:
"Whether all filters in this group must return true ("and"), or one or more must return true (not yet supported)."
BOTH filters must be passed to get into the total.
Does not contain "our name" AND Does not contain "ourname".
https://developers.google.com/webmaster-tools/v3/searchanalytics/query
Having said that, you probably are even more at a loss to explain some of your results...maybe you have a number of queries that contain both "our name" AND "ourname"??
I working on the same topic at the moment (excluding brand searches); like Google say, they excluding search queries that can contain privat information:
To protect user privacy, Search Analytics doesn't show all data. For example, we might not track some queries that are made a very small number of times or those that contain personal or sensitive information.
https://support.google.com/webmasters/answer/6155685?hl=en#tablegone
With this in mind you have a big block of data with no query information, so if you filter in any way, that whole block isn't included.
For example, we had like 325.000 total impressions on the 01.07., but if I do two separate queries one with including and one with excluding and add the values for clicks and impressions together, I get like the total numbers for that block where my queries living in.
In our case that is around 180.000 impression, so 145k impressions were made with queries I don't know and can't filter them.
In your case the 127,959 could be your total of impressions (depending of your keywords). So your non brand traffic with 65,827 impressions is more like 50% percent than 30%.
I hope it's more or less understandable.

Google Place API street type list

I am using google place to retrieve address, and somehow we want the street(route in google terminology) to be separated into street name and street type. We also want the street type to match an existing column in database.
But things get difficult when google place sometimes use XXXX Street and some times XXXX st
For instance, this is a typical google address
{
administrative_area_level_1: ['short_name', 'VIC'],
locality: ['long_name', 'Carlton'],
postal_code: ['long_name', '3053'],
route: ['long_name', 'Canada Ln'],
street_number: ['short_name', '12'],
subpremise: ['short_name', '13']
}
But it always shows Canada Lane in the suggestion box.
And sometimes even worse when the abbreviation does not match my local data model. For instance we use la instead of ln for short of lane.
It will be appreciated if anyone could tell me where to find a list of street type (and abbreviation) used by google API. Or Is there a way to disable the abbreviation option?
Sounds like you're after "street suffixes". These are complicated.
Not only they change across countries and languages, even within the same country and language they can be used in different ways; abbreviations can have multiple meanings: "St" can be "Street" of "Saint"; abbreviations are used or not depending on subtle rules that also change from place to place.
Same goes for cardinal points (North, South, East, West) that are parts of road / street names: "North St" or "N 11st Street"? It's complicated.
If you already have a good amount of addresses, and you only care about addresses in English, you could take the last word from each street name as the suffix. When matching to your own data, allow for abbreviations when matching, rather than trying to expand them.
For instance, don't try to expand "Canada La" into "Canada Lane" so that it matches "Lane". Instead, expand "Lane" into ["Lane", "La", "Ln"] and match suffixes to all values.
Then you'd need a strategy for "collisions", abbreviations that can mean 2+ suffixes. These seem to be rare, I can't remember any ("St" isn't, because "Saint" isn't a suffix) and USPS' http://pe.usps.gov/text/pub28/28apc_002.htm doesn't seem to have any.

Wiktionary/MediaWiki Search & Suffix Filtering

I'm building an application that will hopefully use Wiktionary words and definitions as a data source. In my queries, I'd like to be able to search for all Wiktionary entries that are similar to user provided terms in either the title or definition, but also have titles ending with a specified suffix (or one of a set of suffixes).
For example, I want to find all Wiktionary entries that contain the words "large dog", like this:
https://en.wiktionary.org/w/api.php?action=query&list=search&srsearch=large%20dog
But further filter the results to only contain entries with titles ending with "d". So in that example, "boarhound", "Saint Bernard", and "unleashed" would be returned.
Is this possible with the MediaWiki search API? Do you have any recommendations?
This is mostly possible with ElasticSearch/CirrusSearch, but disabled for performance reasons. You can still use it on your wiki, or attempt smart search queries.
Usually for Wiktionary I use yanker, which can access the page table of the database. Your example (one-letter suffix) would be huge, but for instance .*hound$ finds:
Afghan_hound
Bavarian_mountain_hound
Foxhound
Irish_Wolfhound
Mahound
Otterhound
Russian_Wolfhound
Scottish_Deerhound
Tripehound
basset_hound
bearhound
black_horehound
bloodhound
boarhound
bookhound
boozehound
buckhound
chowhound
coon_hound
coonhound
covert-hound
covert_hound
coverthound
deerhound
double-nosed_andean_tiger_hound
elkhound
foxhound
gazehound
gorehound
grayhound
greyhound
harehound
heckhound
hell-hound
hell_hound
hellhound
hoarhound
horehound
hound
limehound
lyam-hound
minkhound
newshound
nursehound
otterhound
powder_hound
powderhound
publicity-hound
publicity_hound
rock_hound
rockhound
scent_hound
scenthound
shag-hound
sighthound
sleuth-hound
sleuthhound
slot-hound
slowhound
sluthhound
smooth_hound
smoothhound
smuthound
staghound
war_hound
whorehound
wolfhound

Resources