I see a few posts around nested fields and aggregation, but none of them seem to answer my question. So, pardon me if this is a repeated question and any help would be greatly appreciated.
We've built an index of lectures, and lectures have the following qualities:
A lecture can either be in-person (live) or pre-recorded (online)
Each lecture can have multiple chapters
Each of these chapters can be covered by different lecturers (example: chapter 1 of quantum physics can be covered by five different lecturers, and three of them may be live and the other two may be online)
An online lecture will always have one entry per lecturer per chapter per quality
Roughly the structure is as follows:
{
"topics": [
{
"id": "TOP1",
"chapters": [
{
"chapterId": 12345,
"availability": [
{
"type": "LIVE",
"lecturer": "Dr. Abraham Fisher",
"lectureChapterId": "861731",
"availableFrom": "2017-09-11 13:00:00",
"expiresAt": "2017-09-11 15:00:00",
"lecturerIds": [
"MON121",
"MEL122"
]
},
{
"type": "LIVE",
"lecturer": "Dr. Bob Fisher",
"lectureChapterId": "181751",
"availableFrom": "2017-09-11 20:00:00",
"expiresAt": "2017-09-11 22:00:00",
"lecturerIds": [
"MON122",
"MEL123"
]
},
{
"type": "LIVE",
"lecturer": "Dr. Bob Fisher",
"lectureChapterId": "181751",
"availableFrom": "2017-09-17 20:00:00",
"expiresAt": "2017-09-17 22:00:00",
"lecturerIds": [
"MON122",
"MEL123"
]
},
{
"type": "LIVE",
"lecturer": "Dr. Abraham Fisher",
"lectureChapterId": "861731",
"availableFrom": "2017-09-17 13:00:00",
"expiresAt": "2017-09-17 15:00:00",
"lecturerIds": [
"MON121",
"MEL122"
]
},
{
"type": "ONLINE",
"quality" : "HD",
"price" : 19.99,
"lecturer": "Dr. Catherine Fisher",
"lectureChapterId": "9127312",
"availableFrom": "2017-01-17 00:00:00",
"expiresAt": "2017-12-31 23:59:59",
"lecturerIds": [
"MON120",
"MEL120"
]
},
{
"type": "ONLINE",
"quality" : "SD",
"price" : 10.99,
"lecturer": "Dr. Catherine Fisher",
"lectureChapterId": "9127312",
"availableFrom": "2017-01-17 00:00:00",
"expiresAt": "2017-12-31 23:59:59",
"lecturerIds": [
"MON120",
"MEL120"
]
}
]
}
]
}
]
}
Now, if the requirement is to return only the details of the first available lecture, grouped by chapter and lecturer, for LIVE lectures, and to return all ONLINE lectures (along with other metadata for the lecture topic), what is the best way to do that? In the example above, the lectures by Dr. Abraham Fisher and Dr. Bob Fisher on the 11th of September should be returned.
I tried using inner_hits, but apparently, it doesn't allow aggregations (I get the following error).
"[nested] query does not support [aggs]"
P.S: The aggregation needs to be at a chapter level and not at the lecture topic (root) level.
Can you post your query? For an aggregation of a nested field, you always need to specify the nested path. See https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-nested-aggregation.html
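To make the nested-path point concrete, here is a rough sketch of what such an aggregation body could look like, written as a Python dict. It assumes that both topics.chapters and topics.chapters.availability are mapped as nested and that type and lecturer are keyword fields; none of that is confirmed by the question, the function name is made up for illustration, and I have not run this against a cluster.

```python
import json


def build_first_live_aggs(chapter_path="topics.chapters",
                          availability_path="topics.chapters.availability"):
    """Aggregation body: chapter -> LIVE entries -> lecturer -> earliest entry.

    Both paths are assumed to be mapped as `nested` (an assumption, not
    something the question confirms).
    """
    # Innermost level: keep only the earliest entry in each lecturer bucket.
    first_available = {
        "top_hits": {
            "size": 1,
            "sort": [{f"{availability_path}.availableFrom": {"order": "asc"}}],
        }
    }
    by_lecturer = {
        "terms": {"field": f"{availability_path}.lecturer"},
        "aggs": {"first_available": first_available},
    }
    # Only LIVE entries take part in the "first available" grouping.
    live_only = {
        "filter": {"term": {f"{availability_path}.type": "LIVE"}},
        "aggs": {"by_lecturer": by_lecturer},
    }
    availability = {
        "nested": {"path": availability_path},
        "aggs": {"live_only": live_only},
    }
    by_chapter = {
        "terms": {"field": f"{chapter_path}.chapterId"},
        "aggs": {"availability": availability},
    }
    return {
        "size": 0,  # aggregation buckets only, no top-level hits
        "aggs": {
            "chapters": {
                "nested": {"path": chapter_path},
                "aggs": {"by_chapter": by_chapter},
            }
        },
    }


body = build_first_live_aggs()
request_json = json.dumps(body, indent=2)
```

The aggregation sits at the top level of the search body, next to the query, rather than inside the nested query or inner_hits, which is why the "[nested] query does not support [aggs]" error should not appear with this shape.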
I do not know your full data model, but if you have control over the document structure and you only need it for search, you could try to flatten it a bit more by denormalization.
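As an illustration of the flattening idea, the sketch below turns one topic document into one flat record per availability entry and then picks the first LIVE entry per chapter and lecturer, plus all ONLINE entries. It is plain Python on a trimmed copy of the sample data, not an Elasticsearch feature; the function names and the topicId key are my own, and it relies on the timestamps being in the "YYYY-MM-DD HH:MM:SS" form shown in the question so that string comparison orders them correctly.

```python
def flatten_topic(topic):
    """Yield one flat record per availability entry, copying parent ids down."""
    for chapter in topic["chapters"]:
        for entry in chapter["availability"]:
            yield {"topicId": topic["id"],
                   "chapterId": chapter["chapterId"],
                   **entry}


def first_live_and_all_online(records):
    """Return (earliest LIVE record per chapter and lecturer, all ONLINE records)."""
    first_live = {}
    online = []
    for rec in records:
        if rec["type"] == "ONLINE":
            online.append(rec)
            continue
        key = (rec["chapterId"], rec["lecturer"])
        best = first_live.get(key)
        # Timestamps in this fixed format sort correctly as plain strings.
        if best is None or rec["availableFrom"] < best["availableFrom"]:
            first_live[key] = rec
    return list(first_live.values()), online


# Trimmed version of the sample document from the question.
topic = {
    "id": "TOP1",
    "chapters": [{
        "chapterId": 12345,
        "availability": [
            {"type": "LIVE", "lecturer": "Dr. Abraham Fisher",
             "availableFrom": "2017-09-11 13:00:00"},
            {"type": "LIVE", "lecturer": "Dr. Bob Fisher",
             "availableFrom": "2017-09-11 20:00:00"},
            {"type": "LIVE", "lecturer": "Dr. Bob Fisher",
             "availableFrom": "2017-09-17 20:00:00"},
            {"type": "LIVE", "lecturer": "Dr. Abraham Fisher",
             "availableFrom": "2017-09-17 13:00:00"},
            {"type": "ONLINE", "quality": "HD", "lecturer": "Dr. Catherine Fisher",
             "availableFrom": "2017-01-17 00:00:00"},
            {"type": "ONLINE", "quality": "SD", "lecturer": "Dr. Catherine Fisher",
             "availableFrom": "2017-01-17 00:00:00"},
        ],
    }],
}

live, online = first_live_and_all_online(flatten_topic(topic))
```

If the flat records were indexed as their own documents, the same grouping could presumably be done with ordinary (non-nested) aggregations or field collapsing, at the cost of repeating the topic metadata on every record.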
I'm still learning Elasticsearch and a lot of things are unclear to me, including this example:
Suppose I have a marketplace like Amazon (many products with many sub-options and availability by city), and I want to use Elasticsearch for searching by a single text string only.
For example, I want to search for "lord of the rings compilation in dublin", and Elasticsearch should return only book compilations of Lord of the Rings that are available in Dublin.
I can put documents with any schema into Elasticsearch (it is used only for searching).
So now I have this schema for Elasticsearch (data compiled from the prod database):
[
{
"name": "lord of rings",
"seller": "Home Production",
"availability": [
{
"city": "dublin",
"category": "book",
"types": [
"one book",
"compilation"
]
},
{
"city": "london",
"category": "book",
"types": [
"one book",
"compilation"
]
}
]
},
{
"name": "lord of rings",
"seller": "Some",
"availability": [
{
"city": "dublin",
"category": "book",
"types": [
"one book",
"compilation"
]
},
{
"city": "london",
"category": "book",
"types": [
"one book",
"compilation"
]
},
{
"city": "dublin",
"category": "dvd",
"types": [
"disk"
]
}
]
}
]
This is a very abstract example. We can format the data schema in any way for ease of searching. The search city is always known (it is not part of the text query).
The difficulty is that one seller, for one product, has many cities of availability, and in each city we know the "options" of availability (for example, one book or a whole collection).
I don't know how to describe it in more detail or how to search for it on Google correctly.
I tried multi_match, but it gives wrong answers if I search for 'lord of rings dvd in dublin'.
It ranks the first document as more relevant, although in fact the second document is the correct answer.
Relevance issues are not easy to solve; sometimes you can get better results by boosting the city field in a multi_match query.
Anyway, you need to study your scenario in more depth to make the documents you want rank as more relevant.
I recommend reading the book Relevant Search, which will help you a lot in understanding why some results are not as relevant as you want.
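For what it's worth, boosting the city field in multi_match could look roughly like the body below, built as a Python dict. The field names come from the sample documents in the question; the boost value of 3.0, the extra filter on the known city, and the function name are assumptions for illustration and would need tuning against real data.

```python
def build_product_query(text, city, city_boost=3.0):
    """Search body: multi_match over the sample fields with a boosted city field."""
    return {
        "query": {
            "bool": {
                "must": {
                    "multi_match": {
                        "query": text,
                        "fields": [
                            "name",
                            "availability.category",
                            "availability.types",
                            # "field^boost" raises the weight of city matches.
                            f"availability.city^{city_boost}",
                        ],
                    }
                },
                # The city is known up front, so it can also be a hard filter.
                "filter": {"match": {"availability.city": city}},
            }
        }
    }


body = build_product_query("lord of rings dvd", "dublin")
```

One caveat: with availability mapped as a plain array of objects, Elasticsearch flattens the entries, so city and category are not kept paired per entry. Keeping "dvd" tied to "dublin" would need availability mapped as nested, or one document per product, seller and city.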
There is a LOINC concept 55284-4 that appears in some FHIR resources, for example:
https://syntheticmass.mitre.org/v1/fhir/Patient/0000aa42-c235-4447-8389-8a2640f44466/$everything
This code is for "Blood pressure systolic and diastolic" and is in the OMOP concept table as shown below.
This concept is not a standard concept, is not a valid concept (from OMOP's perspective) and does not have a mapping to any standard concept in OMOP.
This concept appears in the FHIR resource in the snippet shown below (from the synthmass url shown above).
What records should be created in OMOP to represent the snippet shown below?
"resource": {
"category": [
{
"coding": [
{
"code": "vital-signs",
"display": "vital-signs",
"system": "http://hl7.org/fhir/observation-category"
}
]
}
],
"code": {
"coding": [
{
"code": "55284-4",
"display": "Blood Pressure",
"system": "http://loinc.org"
}
],
"text": "Blood Pressure"
},
"component": [
{
"code": {
"coding": [
{
"code": "8462-4",
"display": "Diastolic Blood Pressure",
"system": "http://loinc.org"
}
],
"text": "Diastolic Blood Pressure"
},
"valueQuantity": {
"code": "mmHg",
"system": "http://unitsofmeasure.org",
"unit": "mmHg",
"value": 84.88290301982099
}
},
{
"code": {
"coding": [
{
"code": "8480-6",
"display": "Systolic Blood Pressure",
"system": "http://loinc.org"
}
],
"text": "Systolic Blood Pressure"
},
"valueQuantity": {
"code": "mmHg",
"system": "http://unitsofmeasure.org",
"unit": "mmHg",
"value": 117.69213707496547
}
}
],
"context": {
"reference": "Encounter/52430cb1-a0d3-4d1e-926c-c878c218502e"
},
"effectiveDateTime": "2018-05-06T04:07:24-04:00",
"id": "186e8672-f9ac-4419-8c5d-1e824bf6e536",
"issued": "2018-05-06T04:07:24.278-04:00",
"meta": {
"lastUpdated": "2019-04-09T08:36:08.363897+00:00",
"versionId": "MTU1NDc5ODk2ODM2Mzg5NzAwMA"
},
"resourceType": "Observation",
"status": "final",
"subject": {
"reference": "Patient/0000aa42-c235-4447-8389-8a2640f44466"
}
}
EDIT: ADDITIONAL INFORMATION
It looks like these are probably commonly used concepts for blood pressure in OMOP
select
*
from
concept
where 1=1
and (
lower(concept_name) = 'systolic blood pressure' or
lower(concept_name) = 'diastolic blood pressure')
and domain_id = 'Measurement'
and standard_concept = 'S';
And there are lots of other potential candidates (about 380 of them):
select
*
from
concept
where 1=1
and lower(concept_name) like '%blood pressure%'
and domain_id = 'Measurement'
and standard_concept = 'S';
I would map it the same as if it was LOINC code 85354-9. However, you should file an issue with Synthea because using the code 55284-4 without also having a translation of 85354-9 is non-conformant. (All FHIR conformant systems conveying a combined 'blood pressure' Observation are required to adhere to this profile, which mandates the 85354-9 code.)
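One possible reading of this, and it is only my assumption rather than something the answer spells out, is that each component of the panel ends up as its own row in the OMOP MEASUREMENT table. The sketch below shows that shape in Python. The concept ids passed in are placeholders (look the real ones up in the concept table), person_source_value is a stand-in that still has to be resolved to a person_id, and 0 is used for codes with no mapping.

```python
def components_to_measurements(observation, concept_lookup):
    """Build one MEASUREMENT-like row per Observation.component entry.

    concept_lookup maps a LOINC code to a standard concept_id chosen by the
    caller; codes that are missing fall back to 0 (unmapped).
    """
    person_ref = observation["subject"]["reference"]
    rows = []
    for component in observation.get("component", []):
        coding = component["code"]["coding"][0]
        quantity = component["valueQuantity"]
        rows.append({
            # Stand-in key: resolve to person_id when loading into OMOP.
            "person_source_value": person_ref.split("/", 1)[1],
            "measurement_concept_id": concept_lookup.get(coding["code"], 0),
            "measurement_datetime": observation["effectiveDateTime"],
            "value_as_number": quantity["value"],
            "unit_source_value": quantity["unit"],
            "measurement_source_value": coding["code"],
        })
    return rows


# Trimmed copy of the Observation from the question.
observation = {
    "resourceType": "Observation",
    "effectiveDateTime": "2018-05-06T04:07:24-04:00",
    "subject": {"reference": "Patient/0000aa42-c235-4447-8389-8a2640f44466"},
    "component": [
        {"code": {"coding": [{"code": "8462-4", "system": "http://loinc.org"}]},
         "valueQuantity": {"unit": "mmHg", "value": 84.88290301982099}},
        {"code": {"coding": [{"code": "8480-6", "system": "http://loinc.org"}]},
         "valueQuantity": {"unit": "mmHg", "value": 117.69213707496547}},
    ],
}

# Placeholder ids only, NOT real OMOP concept_ids.
placeholder_concepts = {"8462-4": 1, "8480-6": 2}
rows = components_to_measurements(observation, placeholder_concepts)
```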
I want to look up the county, city, state, phoneAreaCode, latitude, longitude and cityStateKey from two inputs: postal code and country.
I tried going through the Google API documentation, but it is quite overwhelming.
input -
"CountryCode": "US",
"postalCode": "60090"
Output -
"provinceCode": "IL",
"county": "COOK",
"phoneAreaCode": "847/312/224/630/708",
"latitude": 42.124293,
"longitude": -87.924184,
"cityStateKey": "W15521",
"cityName": "WHEELING"
During my research I found that the "Find Place" request from Google's Places API is the best fit for this requirement.
Here is the request and response:
Request:
https://maps.googleapis.com/maps/api/place/findplacefromtext/json?input=06850&inputtype=textquery&fields=formatted_address,name,geometry,types&key=API_KEY
Response:
{
"candidates": [
{
"formatted_address": "New York, NY 10029, USA",
"geometry": {
"location": {
"lat": 40.79164069999999,
"lng": -73.94479939999999
},
"viewport": {
"northeast": {
"lat": 40.79299052989272,
"lng": -73.94344957010728
},
"southwest": {
"lat": 40.79029087010727,
"lng": -73.94614922989271
}
}
},
"name": "New York, NY 10029",
"types": [
"postal_code"
]
}
],
"status": "OK"
}
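A small Python sketch of building that request URL and pulling the useful bits out of the response is below. It only uses the parameters and response fields that appear in the example above; the function names are made up, and the actual HTTP call is left out so that the snippet runs without an API key.

```python
from urllib.parse import urlencode

FIND_PLACE_URL = "https://maps.googleapis.com/maps/api/place/findplacefromtext/json"


def build_find_place_url(postal_code, api_key):
    """Build the Find Place request URL with the same parameters as above."""
    params = {
        "input": postal_code,
        "inputtype": "textquery",
        "fields": "formatted_address,name,geometry,types",
        "key": api_key,
    }
    # safe="," keeps the comma-separated field list readable in the URL.
    return f"{FIND_PLACE_URL}?{urlencode(params, safe=',')}"


def first_candidate(response):
    """Return address and coordinates of the first candidate, or None."""
    if response.get("status") != "OK" or not response.get("candidates"):
        return None
    candidate = response["candidates"][0]
    location = candidate["geometry"]["location"]
    return {
        "address": candidate["formatted_address"],
        "latitude": location["lat"],
        "longitude": location["lng"],
        "types": candidate["types"],
    }


url = build_find_place_url("06850", "API_KEY")
```

Note that the sample response only carries the formatted address, coordinates and types, so values such as county, phoneAreaCode or cityStateKey would have to come from somewhere else.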
I am setting up a LUIS service for Dutch.
I have this sentence:
Hi, ik ben igor -> meaning Hi, I'm igor
Where Hi is a simple entity called Hey that can have multiple values (hey, hello, ...), which I specified in a phrase list.
And igor is a simple entity called Name.
In the dashboard I can see that Igor has been correctly mapped as a Name entity, but the retrieved result is the following:
{
"query": "Hi, ik ben igor",
"topScoringIntent": {
"intent": "Greeting",
"score": 0.462906122
},
"intents": [
{
"intent": "Greeting",
"score": 0.462906122
},
{
"intent": "None",
"score": 0.41605103
}
],
"entities": [
{
"entity": "hi",
"type": "Hey",
"startIndex": 0,
"endIndex": 1,
"score": 0.9947428
}
]
}
Is it possible to solve this? I do not want to make a phrase list of all the names that exist.
I managed to train LUIS to recognize even asdaasdasd:
{
"query": "Heey, ik ben asdaasdasd",
"topScoringIntent": {
"intent": "Greeting",
"score": 0.5320666
},
"intents": [
{
"intent": "Greeting",
"score": 0.5320666
},
{
"intent": "None",
"score": 0.236944184
}
],
"entities": [
{
"entity": "asdaasdasd",
"type": "Name",
"startIndex": 13,
"endIndex": 22,
"score": 0.8811139
}
]
}
To be honest I do not have a great guide on how to do this:
Add multiple example utterances with example entity position
Did this for about 5 utterances
No phrase list necessary
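To show what "example utterances with entity position" means in practice, here is a small Python helper that computes the character positions for a labeled utterance. The end index is inclusive, matching the startIndex/endIndex values in the responses above. The key names (text, intentName, entityLabels, entityName, startCharIndex, endCharIndex) are what I recall from the LUIS example-utterance format and should be treated as an assumption to check against the docs; the names anna and pieter are made-up examples.

```python
def label_entity(text, entity_text, entity_name):
    """Locate entity_text in text and return its inclusive character span."""
    start = text.lower().find(entity_text.lower())
    if start < 0:
        raise ValueError(f"{entity_text!r} not found in {text!r}")
    return {
        "entityName": entity_name,
        "startCharIndex": start,
        # Inclusive end index, as in the query results above.
        "endCharIndex": start + len(entity_text) - 1,
    }


def labeled_utterance(text, intent, entities):
    """Build one labeled example; entities is a list of (name, value) pairs."""
    return {
        "text": text,
        "intentName": intent,
        "entityLabels": [label_entity(text, value, name)
                         for name, value in entities],
    }


examples = [
    labeled_utterance("Hi, ik ben igor", "Greeting",
                      [("Hey", "Hi"), ("Name", "igor")]),
    labeled_utterance("Heey, ik ben anna", "Greeting",
                      [("Hey", "Heey"), ("Name", "anna")]),
    labeled_utterance("Hello, ik ben pieter", "Greeting",
                      [("Hey", "Hello"), ("Name", "pieter")]),
]
```

Varying the name while keeping it in the same position of the sentence is presumably what lets the model learn the slot from context instead of from a list of known names.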
I'm going to accept this as an answer, but once someone explains in-depth and technically what is happening behind the covers, I will accept that answer.
Currently we have a problem designing a mapping (and the queries against it) in Elasticsearch for a relational problem that we could not solve with our non-document-oriented thinking from SQL.
We want to create a many-to-many relation between different Elasticsearch entries. We need this so that we can edit an entry once and keep all usages of it up to date.
To describe the problem, we'll use the following simple data model:
Broadcast       Content
------------    -----------
Id              Id
Day             Title
Contents []     Description
So we have two different types to index, broadcasts and contents.
A broadcast can have many contents and single contents could also be part of different broadcasts (e.g. repetition).
JSON like:
index/broadcasts
{
"Id": "B1",
"Day": "2014-10-15",
"Contents": [
"C1",
"C2"
]
}
{
"Id": "B2",
"Day": "2014-10-16",
"Contents": [
"C1",
"C3"
]
}
index/contents
{
"Id": "C1",
"Title": "Wake up",
"Description": "Morning show with Jeff Bridges"
}
{
"Id": "C2",
"Title": "Have a break!",
"Description": "Everything about Android"
}
{
"Id": "C3",
"Title": "Late Night Disaster",
"Description": "Comedy show"
}
Now we want to rename "Late Night Disaster" to something more precise and keep all references up to date.
How could we approach this? Are there further options in ES, like includes in RavenDB?
Nested objects or parent-child relations haven't helped us so far.
What about denormalizing? It seems difficult when you come from the SQL mindset, but give it a try: even with millions of documents, Lucene indexing can help, and renaming becomes a batch job.
[
{
"Id": "B1",
"Day": "2014-10-15",
"Contents": [
{
"Title": "Wake up",
"Description": "Morning show with Jeff Bridges"
},
{
"Title": "Have a break!",
"Description": "Everything about Android"
}
]
},
{
"Id": "B2",
"Day": "2014-10-16",
"Contents": [
{
"Title": "Wake up",
"Description": "Morning show with Jeff Bridges"
},
{
"Title": "Late Night Disaster",
"Description": "Comedy show"
}
]
}
]
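As a rough illustration of the batch-job idea, the Python sketch below builds the denormalized broadcast documents from the two original collections and then renames a title across all of them, returning the ids of the broadcasts that would need to be reindexed. It works on in-memory dicts only; the function names and the new title are made up for the example, and the actual reindexing into Elasticsearch is left out.

```python
def denormalize(broadcasts, contents):
    """Embed Title/Description of each referenced content into its broadcast."""
    by_id = {content["Id"]: content for content in contents}
    return [
        {
            "Id": broadcast["Id"],
            "Day": broadcast["Day"],
            "Contents": [
                {"Title": by_id[cid]["Title"],
                 "Description": by_id[cid]["Description"]}
                for cid in broadcast["Contents"]
            ],
        }
        for broadcast in broadcasts
    ]


def rename_title(docs, old_title, new_title):
    """Rename a title in place; return ids of broadcasts that changed."""
    changed = []
    for doc in docs:
        touched = False
        for content in doc["Contents"]:
            if content["Title"] == old_title:
                content["Title"] = new_title
                touched = True
        if touched:
            changed.append(doc["Id"])
    return changed


broadcasts = [
    {"Id": "B1", "Day": "2014-10-15", "Contents": ["C1", "C2"]},
    {"Id": "B2", "Day": "2014-10-16", "Contents": ["C1", "C3"]},
]
contents = [
    {"Id": "C1", "Title": "Wake up",
     "Description": "Morning show with Jeff Bridges"},
    {"Id": "C2", "Title": "Have a break!",
     "Description": "Everything about Android"},
    {"Id": "C3", "Title": "Late Night Disaster",
     "Description": "Comedy show"},
]

docs = denormalize(broadcasts, contents)
# "Late Night Comedy" is a made-up new title for illustration.
to_reindex = rename_title(docs, "Late Night Disaster", "Late Night Comedy")
```

Only the broadcasts listed in to_reindex have to be written back, so the cost of a rename grows with the number of broadcasts that use the content rather than with the size of the whole index.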