Elastic search multiple terms in a dictionary - elasticsearch

I have mapping like:
"profile": {
"properties": {
"educations": {
"properties": {
"university": {
"type": "string"
},
"graduation_year": {
"type": "string"
}
}
}
}
}
which obviously holds the educations history of people. Each person can have multiple educations. What I want to do is search for people who graduated from "SFU" in "2012". To do that I am using filtered search:
"filtered": {
"filter": {
"and": [
{
"term": {
"educations.university": "SFU"
}
},
{
"term": {
"educations.graduation_year": "2012"
}
}
]
}
But what this query does is to find the documents who have "SFU" and "2012" in their education, so this document would match, which is wrong:
educations[0] = {"university": "SFU", "graduation_year": 2000}
educations[1] = {"university": "UBC", "graduation_year": 2012}
Is there anyway I could filter both terms on each education?

You need to define nested type for educations and use nested filter to filter it, or Elasticsearch will internally flattens inner objects into a single object, and return the wrong results.
You can refer here for detail explainations and samples:
http://www.elasticsearch.org/blog/managing-relations-inside-elasticsearch/
http://www.spacevatican.org/2012/6/3/fun-with-elasticsearch-s-children-and-nested-documents/

Related

How to search by non-tokenized field length in ElasticSearch

Say I create an index people which will take entries that will have two properties: name and friends
PUT /people
{
"mappings": {
"properties": {
"friends": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
}
and I put two entries, each one of them has two friends.
POST /people/_doc
{
"name": "Jack",
"friends": [
"Jill", "John"
]
}
POST /people/_doc
{
"name": "Max",
"friends": [
"John", "John" # Max will have two friends, but both named John
]
}
Now I want to search for people that have multiple friends
GET /people/_search
{
"query": {
"bool": {
"filter": [
{
"script": {
"script": {
"source": "doc['friends.keyword'].length > 1"
}
}
}
]
}
}
}
This will only return Jack and ignore Max. I assume this is because we are actually traversing the inversed index, and John and John create only one token - which is 'john' so the length of the tokens is actually 1 here.
Since my index is relatively small and performance is not the key, I would like to actually traverse the source and not the inversed index
GET /people/_search
{
"query": {
"bool": {
"filter": [
{
"script": {
"script": {
"source": "ctx._source.friends.length > 1"
}
}
}
]
}
}
}
But according to the https://github.com/elastic/elasticsearch/issues/20068 the source is supported only when updating, not when searching, so I cannot.
One obvious solution to this seems to take the length of the field and store it to the index. Something like friends_count: 2 and then filter based on that. But that requires reindexing and also this appears as something that should be solved in some obvious way I am missing.
Thanks a lot.
There is a new feature in ES 7.11 as runtime fields a runtime field is a field that is evaluated at query time. Runtime fields enable you to:
Add fields to existing documents without reindexing your data
Start working with your data without understanding how it’s structured
Override the value returned from an indexed field at query time
Define fields for a specific use without modifying the underlying schema
you can find more information here about runtime fields, but how you can use runtime fields you can do something like this:
Index Time:
PUT my-index/
{
"mappings": {
"runtime": {
"friends_count": {
"type": "keyword",
"script": {
"source": "doc['#friends'].size()"
}
}
},
"properties": {
"#timestamp": {"type": "date"}
}
}
}
You can also use runtime fields in search time for more information check here.
Search Time
GET my-index/_search
{
"runtime_mappings": {
"friends_count": {
"type": "keyword",
"script": {
"source": "ctx._source.friends.size()"
}
}
}
}
Update:
POST mytest/_update_by_query
{
"query": {
"match_all": {}
},
"script": {
"source": "ctx._source.arrayLength = ctx._source.friends.size()"
}
}
You can update all of your document with query above and adjust your query.
For everyone wondering about the same issue, I think #Kaveh answer is the most likely way to go, but I did not manage to make it work in my case. It seems to me that source is created after the query is performed and therefore you cannot access source for the purposes of filtering query.
This leaves you with two options:
filter the result on the application level (ugly and slow solution)
actually save the filed length in a separate field. Such as friends_count
possibly there is another option I don't know about(?).

Elasticsearch: Search fields containing arrays using arrays

I have an ES index where one of my mappings stores a simple array of named entities pre-set at the point of ingestion.
I'm trying to search my index using a given array of entities, to return documents where containing many of the same entities.
Some code for illustration...
GET /test_data/_search
{
"query": {
"match": {
"entities": ['Trump', 'CNN', 'Oklahoma', 'Tiktok', 'Tulsa']
}
}
}
However, this returns a parse exception -- What would be the best method to search fields containing arrays using another array?
Thanks
If you're looking for exact matches then change match to terms -- this functions as an OR query:
GET /test_data/_search
{
"query": {
"terms": {
"entities": [
"Trump",
"CNN",
"Oklahoma",
"Tiktok",
"Tulsa"
]
}
}
}
otherwise use a bool-should array of match queries:
GET /test_data/_search
{
"query": {
"bool": {
"should": [
{
"match": {
"entities": "Trump"
}
},
{
"match": {
"entities": "CNN"
}
},
{
"match": {
"entities": "Oklahoma"
}
},
...
]
}
}
}
You can define how many of them should match with the minimum_should_match param.

language analyzer not working with span query elasticsearch?

I have an index my_index and a type my_type
{
"mappings": {
"my_type": {
"properties": {
"my_text": {
"type": "text",
"analyzer": "spanish"
}
}
}
}
}
then I want to search for 2 or more words, with, or without gaps but in the specified order.
For that I wrote the following query:
{
"query": {
"span_near": {
"clauses": [
{
"span_term": {
"my_text": "creación"
}
},
{
"span_term": {
"my_text": "fundación"
}
}
],
"slop":4,
"in_order": true
}
}
}
The query is not producing results. But if i search "creacion" and "fundacion" (without diacritics) the query show results.
If i do a phrase_query "creación de una fundación" produces results, so i think there is not support for language analyzers in span queries?
Thanks in advance...
Span queries are NOT analyzed, nor stemmed. They are LOW level queries exposed by the API and you need to know how it is stored before u do the SPAN query.

Boolean AND with exact matches oin Elasticsearch

In our Elasticsearch collection of products, we have an an array of hashes, called "nutrients". A partial example of the data would be:
"_source": {
"quantity": "150.0",
"id": 1001,
"barcode": "7610809001066",
"nutrients": [
{
"per_hundred": "1010.0",
"name_fr": "Énergie",
"per_portion": "758.0",
"name_de": "Energie",
"per_day": "9.0",
"name_it": "Energia",
"name_en": "Energy"
},
{
"per_hundred": "242.0",
"name_fr": "Énergie (kCal)",
"per_portion": "181.0",
"name_de": "Energie (kCal)",
"per_day": "9.0",
"name_it": "Energia (kCal)",
"name_en": "Energy (kCal)"
},
{
"per_hundred": "18.0",
"name_fr": "Matières grasses",
"per_portion": "13.5",
"name_de": "Fett",
"per_day": "19.0",
"name_it": "Grassi",
"name_en": "Fat"
},
In the search, we are trying to bring back the products based on an exact match of two of the fields contained in the nutrients array. What I am finding is the conditions seemed to be OR and not AND.
The two attempts have been:
"query": {
"bool": {
"must": [
{ "match": { "nutrients.name_fr": "Énergie" } },
{ "match": { "nutrients.per_hundred": "242.0" } }
]
}
}
}
and
"query": {
"filtered": {
"filter": {
"and": [
{ "term": { "nutrients.name_fr": "Énergie" } },
{ "term": { "nutrients.per_hundred": "242.0" } }
]
}
}
}
Both of these are in fact bringing back entries with Énergie and 242.0, but are also match on different name_fr, eg:
{
"per_hundred": "242.0",
"name_fr": "Acide folique",
"per_portion": "96.0",
"name_de": "Folsäure",
"per_day": "48.0",
"name_it": "Acido folico",
"name_en": "Folic acid"
},
They are also matching on a non exact match, i.e: matching also on "Énergie (kCal)" when we want to match only on "Énergie"
On your first problem:
You have to make the nutrients field nested, so you can query each object inside it for itself Elasticsearch Nested Objects.

Elasticsearch script query involving root and nested values

Suppose I have a simplified Organization document with nested publication values like so (ES 2.3):
{
"organization" : {
"dateUpdated" : 1395211600000,
"publications" : [
{
"dateCreated" : 1393801200000
},
{
"dateCreated" : 1401055200000
}
]
}
}
I want to find all Organizations that have a publication dateCreated < the organization's dateUpdated:
{
"query": {
"nested": {
"path": "publications",
"query": {
"bool": {
"filter": [
{
"script": {
"script": "doc['publications.dateCreated'].value < doc['dateUpdated'].value"
}
}
]
}
}
}
}
}
My problem is that when I perform a nested query, the nested query does not have access to the root document values, so doc['dateUpdated'].value is invalid and I get 0 hits.
Is there a way to pass in a value into the nested query? Or is my nested approach completely off here? I would like to avoid creating a separate document just for publications if necessary.
Thanks.
You can not access the root values from nested query context. They are indexed as separate documents. From the documentation
The nested clause “steps down” into the nested comments field. It no
longer has access to fields in the root document, nor fields in any
other nested document.
You can get the desired results with the help of copy_to parameter. Another way to do this would be to use include_in_parent or include_in_root but they might be deprecated in future and it will also increase the index size as every field of nested type will be included in root document so in this case copy_to functionality is better.
This is a sample index
PUT nested_index
{
"mappings": {
"blogpost": {
"properties": {
"rootdate": {
"type": "date"
},
"copy_of_nested_date": {
"type": "date"
},
"comments": {
"type": "nested",
"properties": {
"nested_date": {
"type": "date",
"copy_to": "copy_of_nested_date"
}
}
}
}
}
}
}
Here every value of nested_date will be copied to copy_of_nested_date so copy_of_nested_date will look something like [1401055200000,1393801200000,1221542100000] and then you could use simple query like this to get the results.
{
"query": {
"bool": {
"filter": [
{
"script": {
"script": "doc['rootdate'].value < doc['copy_of_nested_date'].value"
}
}
]
}
}
}
You don't have to change your nested structure but you would have to reindex the documents after adding copy_to to publication dateCreated

Resources