I have a dataset like this:
{
"took" : 29,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "kt",
"_type" : "_doc",
"_id" : "3kscgngBKcaOnhm7gU5w",
"_score" : 1.0,
"_source" : {
"authorName" : "Alastair Reynolds",
"bookName" : "Arınma Geçidi",
"publishDate" : "2021",
"book" : [
{
"pageNum" : 1,
"pageContent" : ""
},
{
"pageNum" : 2,
"pageContent" : "© İndie Kitap - 2021 © The Orion Publishing Group Limited - 2019 Yazar:
so, it goes like this page numbers and page content. What I want to do is, for example if I search "spear", I want spears highlighted and also I want the pageNum. This is my mapping:
{
"kt" : {
"mappings" : {
"properties" : {
"authorName" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"book" : {
"type" : "nested",
"properties" : {
"pageContent" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"pageNum" : {
"type" : "long"
}
}
},
"bookName" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"publishDate" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}
before that, "book" was not a nested object and I was getting the highlighted words with phrases but all results was together, like this:
"""Bugün de öyle, “taptığınız <em>kitap</em> tanrısal değildir,
el yapımıdır” diyor birileri.""",
"""“İğrenç ve utanç verici bir <em>kitap</em>” olarak tanımladığı Kuran’ı her-
kesin tanıması için (auff das yderman""",
"""Bu yazı vesilesiyle okuma fırsatı buldum, daha sonra bundan iki ay sürecek
bir tartışma ve bir <em>kitap</em>""",
"""anlamıyorum, siz hep Kemalizm’e dair çok sert eleş-
tiriler yaptınız Yanlış Cumhuriyet diye de bir <em>kitap</em>""",
"""Siz Kemalizm’e dair derin eleştirilerde
bulunuyordunuz, <em>kitap</em> çıktı...""",
"""Tabii ki ben Yanlış Cumhuriyet’in önemli bir <em>kitap</em>
olduğunu düşünüyorum ve özellikle de kendini""",
"""55
These are results coming from different pages, I want to know the pageNum as well. So the result I want is something like this if I search for "Reynolds":
{
"pageNum" : 3,
"pageContent" : "Alastair <em>Reynolds</em> ARINMA GEÇİDİ "
},
{
"pageNum" : 236,
"pageContent" : "ne izin vererek sözlerinin hak ettikleri saygın yeri sağladı. “Peki ya
Thorn’un çocuğuna ne oldu?” Khouri, “O Aura,” dedi. <em>Reynolds</em> için geldiğim çocuk.” "}
What should be my search query and should I change anything in mapping or settings? How can I achieve this? Thanks in advance!
You can highlight the pageContent and show the corresponding pageNum by using highlight query in inner hits
Adding a working example with index data, search query and search result
Index Data:
{
"authorName": "Alastair Reynolds",
"bookName": "Arınma Geçidi",
"publishDate": "2021",
"book": [
{
"pageNum": 1,
"pageContent": ""
},
{
"pageNum": 2,
"pageContent": "Alastair Reynolds ARINMA GEÇİDİ"
}
]
}
{
"authorName": "Alastair Reynolds",
"bookName": "Arınma Geçidi",
"publishDate": "2021",
"book": [
{
"pageNum": 1,
"pageContent": ""
},
{
"pageNum": 2,
"pageContent": "© İndie Kitap - 2021 © The Orion Publishing Group Limited - 2019 Yazar"
}
]
}
Search Query:
{
"query": {
"nested": {
"path": "book",
"query": {
"bool": {
"must": {
"match": {
"book.pageContent": "Reynolds"
}
}
}
},
"inner_hits": {
"highlight": {
"fields": {
"book.pageContent": {}
}
}
}
}
}
}
Search Result:
"inner_hits": {
"book": {
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 0.5619608,
"hits": [
{
"_index": "66868025",
"_type": "_doc",
"_id": "2",
"_nested": {
"field": "book",
"offset": 1
},
"_score": 0.5619608,
"_source": {
"pageNum": 2, // note this
"pageContent": "Alastair Reynolds ARINMA GEÇİDİ"
},
"highlight": {
"book.pageContent": [
"Alastair <em>Reynolds</em> ARINMA GEÇİDİ" // note this
]
}
}
]
}
}
}
}
Related
When I search for multiple keywords, the last term is not highlighted in the result.
This is the index and mapping:
PUT objects
{
"mappings": {
"properties": {
"title": {
"type": "search_as_you_type"
}
}
}
}
And this is my search:
// query
GET objects/_search
{
"query": {
"multi_match": {
"query": "Goldenen Vlies",
"type": "bool_prefix",
"fields": [
"title",
"title._2gram",
"title._3gram",
"title._index_prefix"
]
}
},
"highlight": {
"fields": {
"title": {}
}
},
"_source": false
}
The output I get is the following:
{
"took" : 1,
"timed_out" : false,
"_shards" : {...},
"hits" : {
"total" : {
"value" : 23,
"relation" : "eq"
},
"max_score" : 7.628418,
"hits" : [
{
"_index" : "objects",
"_id" : "AWj1tIEBIysZ6sOt9vqw",
"_score" : 7.628418,
"highlight" : {
"title" : [
"Schwurkreuz des Ordens vom <em>Goldenen</em> Vlies" <-------
]
}
}
]
}
}
However, this would be the expected/desired output:
{
"took" : 1,
"timed_out" : false,
"_shards" : {...},
"hits" : {
"total" : {
"value" : 23,
"relation" : "eq"
},
"max_score" : 7.628418,
"hits" : [
{
"_index" : "objects",
"_id" : "AWj1tIEBIysZ6sOt9vqw",
"_score" : 7.628418,
"highlight" : {
"title" : [
"Schwurkreuz des Ordens vom <em>Goldenen</em> <em<Vlies</em>" <-------
]
}
}
]
}
}
It does work as expected when I add an extra empty space in the query like so: "query": "Goldenen Vlies ", but I want to know if there is a better solution?
Try this way with "best_fields":
{
"query": {
"multi_match": {
"query": "Goldenen Vlies",
"type": "best_fields",
"fields": [
"title",
"title._2gram",
"title._3gram",
"title._index_prefix"
]
}
},
"highlight": {
"fields": {
"title": {}
}
},
"_source": false
}
I am very new to elasticsearch
I work for a dating website that has data as follows:
Single - with fields: name, signUpDate, state, and other data fields.
Encounter - with fields: state, encounterDate, singlesInvolved, and other data fields.
These are my 2 indexes
Now I have to write a query that returns as follows:
For every state, how many singles, how many encounters, the longest time a single has been part of our website, and the average time a single has been part of our website
And also return one result that is that same average for all states
Like this example:
[
{ //this one is the average of all states
"singles": 45,
"dates": 18,
"minWaitingTime": 1644677979530,
"avgWaitingTime": 15603
},
{ //these are the averages of each state
"state": "MA",
"singles": 50,
"dates": 23,
"minWaitingTime": 1644677979530,
"avgWaitingTime": 15603
},
{
"state": "NY",
"singles": 39,
"dates": 13,
"minWaitingTime": 1644850558872,
"avgWaitingTime": 6033
}
]
I've been working on the query for each state individually but i dont know how to get an average of all states
so far what i have is this:
GET /single,encounter/_search
{
"size": 0,
"aggs": {
"bystate": {
"terms": {
"field": "state",
"size": 59
},
"aggs": {
"group-by-index": {
"terms": {
"field": "_index"
}
},
"min_date": {
"min": {
"field": "signedUpAt"
}
},
"avg_date": {
"avg": {
"field": "signedUpAt"
}
}
}
}
}
}
I don't know if there is a better way to do this, likewise I don't know how to calculate the average (singles, encounters, min_date and average_date average) for all states using this result
Every result of the previous query looks like this:
{
"key" : "MA",
"doc_count" : 164,
"avg_date" : {
"value" : 1.6457900076508965E12,
"value_as_string" : "2022-02-25T11:53:27.650"
},
"min_date" : {
"value" : 1.64467797953E12,
"value_as_string" : "2022-02-12T14:59:39.530"
},
"group-by-index" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "single",
"doc_count" : 135
},
{
"key" : "encounter",
"doc_count" : 29
}
]
}
},
I would really appreciate help on this one
Addition: index mapping.
Encounter:
{
"encounter" : {
"aliases" : { },
"mappings" : {
"properties" : {
"_class" : {
"type" : "keyword",
"index" : false,
"doc_values" : false
},
"avgAge" : {
"type" : "integer",
"index" : false,
"doc_values" : false
},
"application" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"createdAt" : {
"type" : "date",
"format" : "date_hour_minute_second_millis"
},
"encounterId" : {
"type" : "keyword"
},
"locationType" : {
"type" : "keyword",
"index" : false,
"doc_values" : false
},
"singleOneId" : {
"type" : "keyword",
"index" : false,
"doc_values" : false
},
"singleTwoId" : {
"type" : "keyword",
"index" : false,
"doc_values" : false
},
"serviceLine" : {
"type" : "keyword"
},
"state" : {
"type" : "keyword"
},
"rating" : {
"type" : "keyword"
}
}
},
"settings" : {
"index" : {
"refresh_interval" : "1s",
"number_of_shards" : "1",
"provided_name" : "encounter",
"creation_date" : "1643704661932",
"number_of_replicas" : "1",
"uuid" : "MliXQL_bRBKDN7_d8G_BYw",
"version" : {
"created" : "7100299"
}
}
}
}
}
And Single:
{
"single" : {
"aliases" : { },
"mappings" : {
"properties" : {
"_class" : {
"type" : "keyword",
"index" : false,
"doc_values" : false
},
"id" : {
"type" : "keyword"
},
"singleId" : {
"type" : "keyword"
},
"state" : {
"type" : "keyword"
},
"preferedGender" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
},
"settings" : {
"index" : {
"refresh_interval" : "1s",
"number_of_shards" : "1",
"provided_name" : "single",
"creation_date" : "1643704662136",
"number_of_replicas" : "1",
"uuid" : "Js_tqZfRRx-IxbjVRRN4wQ",
"version" : {
"created" : "7100299"
}
}
}
}
}
You can use avg bucket aggregation, where you can provide bucket_path and based on value it will calculate avg of entire aggregation.
Below is sample query:
{
"size": 0,
"aggs": {
"bystate": {
"terms": {
"field": "state",
"size": 59
},
"aggs": {
"group-by-index": {
"terms": {
"field": "_index"
}
},
"min_date": {
"min": {
"field": "signedUpAt"
}
},
"avg_date": {
"avg": {
"field": "signedUpAt"
}
}
}
},
"avg_all_state": {
"avg_bucket": {
"buckets_path": "bystate>avg_date"
}
}
}
}
My documents stored in elasticsearch have following structure:
{
"id": 1,
"test": "name",
"rules": [
{
"id": 2,
"name": "rule1",
"ruleDetails": [
{
"id": 3,
"requiredAnswerId": 1
},
{
"id": 4,
"requiredAnswerId": 2
},
{
"id": 5,
"requiredAnswerId": 3
}
]
}
]
}
where, rules property has nested type.
I need to query documents by checking that array of requiredAnswerId passed in the search request (provided terms) contains all rules.ruleDetails.requiredAnswerId stored in the document.
Does anyone know which elasticsearch option I can use to build such specific query? Or maybe, it is better to fetch the whole document and perform filtering on the application level.
UPDATED
Adding mapping
{
"my_index": {
"mappings": {
"properties": {
"id": {
"type": "long"
},
"test": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"rules": {
"type": "nested",
"properties": {
"id": {
"type": "long"
},
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"ruleDetails": {
"properties": {
"id": {
"type": "long"
},
"requiredAnswerId": {
"type": "long"
}
}
}
}
}
}
}
}
}
Mapping:
{
"index4" : {
"mappings" : {
"properties" : {
"id" : {
"type" : "integer"
},
"rules" : {
"type" : "nested",
"properties" : {
"id" : {
"type" : "integer"
},
"name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword"
}
}
},
"ruleDetails" : {
"properties" : {
"id" : {
"type" : "long"
},
"requiredAnswerId" : {
"type" : "long"
}
}
}
}
},
"test" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword"
}
}
}
}
}
}
}
Query: This will need use of scripts which are not good from performance perspective. I am looping through all documents and checking if field is present is passed parameters
{
"query": {
"nested": {
"path": "rules",
"query": {
"script": {
"script": {
"source": "for(a in doc['rules.ruleDetails.requiredAnswerId']){if(!params.Ids.contains((int)a)) return false; } return true;",
"params": {
"Ids": [
1,
2,
3
]
}
}
}
},
"inner_hits": {}
}
}
}
Result:
"hits" : [
{
"_index" : "index4",
"_type" : "_doc",
"_id" : "TxOpvnEBf42mOjxvvLQB",
"_score" : 4.0,
"_source" : {
"id" : 1,
"test" : "name",
"rules" : [
{
"id" : 2,
"name" : "rule1",
"ruleDetails" : [
{
"id" : 3,
"requiredAnswerId" : 1
},
{
"id" : 4,
"requiredAnswerId" : 2
},
{
"id" : 5,
"requiredAnswerId" : 3
}
]
},
{
"id" : 3,
"name" : "rule3",
"ruleDetails" : [
{
"id" : 3,
"requiredAnswerId" : 1
},
{
"id" : 4,
"requiredAnswerId" : 2
}
]
}
]
},
"inner_hits" : {
"rules" : {
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 4.0,
"hits" : [
{
"_index" : "index4",
"_type" : "_doc",
"_id" : "TxOpvnEBf42mOjxvvLQB",
"_nested" : {
"field" : "rules",
"offset" : 0
},
"_score" : 4.0,
"_source" : {
"id" : 2,
"name" : "rule1",
"ruleDetails" : [
{
"id" : 3,
"requiredAnswerId" : 1
},
{
"id" : 4,
"requiredAnswerId" : 2
},
{
"id" : 5,
"requiredAnswerId" : 3
}
]
}
}
]
}
}
}
}
]
EDIT 1
Terms_set can be used as an alternative. It will be faster compared to script query
Returns documents that contain a minimum number of exact terms in a
provided field.
minimum_should_match_script- size of array can be used to match the minimum number of passed values.
Query:
{
"query": {
"nested": {
"path": "rules",
"query": {
"bool": {
"filter": {
"terms_set": {
"rules.ruleDetails.requiredAnswerId": {
"terms": [
1,
2,
3
],
"minimum_should_match_script": {
"source": "doc['rules.ruleDetails.requiredAnswerId'].size()"
}
}
}
}
}
},
"inner_hits": {}
}
}
}
After some time playing with ES and reading its documentation, I found that you should keep in mind that provided script should be compiled and applied for the document, hence it will be slower, if you just know the required number elements that should match in advance.
Therefore, I created a separate field requiredMatches that stores the number of rules.ruleDetails.requiredAnswerId elements for every document and calculate it before indexing document. Then, instead of using minimum_should_match_script in my search query, I am using minimum_should_match_field:
{
"query": {
"nested": {
"path": "rules",
"query": {
"bool": {
"filter": {
"terms_set": {
"rules.ruleDetails.requiredAnswerId": {
"terms": [
1,
2,
3
],
"minimum_should_match_field": "requiredMatches"
}
}
}
}
},
"inner_hits": {}
}
}
}
I used, following example, as a reference
I didn't find any answers how to do simple thing in ElasticSearch 6.8 I need to filter nested objects.
Index
{
"settings": {
"index": {
"number_of_shards": "5",
"number_of_replicas": "1"
}
},
"mappings": {
"human": {
"properties": {
"cats": {
"type": "nested",
"properties": {
"name": {
"type": "text"
},
"breed": {
"type": "text"
},
"colors": {
"type": "integer"
}
}
},
"name": {
"type": "text"
}
}
}
}
}
Data
{
"name": "iridakos",
"cats": [
{
"colors": 1,
"name": "Irida",
"breed": "European Shorthair"
},
{
"colors": 2,
"name": "Phoebe",
"breed": "european"
},
{
"colors": 3,
"name": "Nino",
"breed": "Aegean"
}
]
}
select human with name="iridakos" and cats with breed contains 'European' (ignore case).
Only two cats should be returned.
Million thanks for helping.
For nested datatypes, you would need to make use of nested queries.
Elasticsearch would always return the entire document as a response. Note that nested datatype means that every item in the list would be treated as an entire document in itself.
Hence in addition to return entire document, if you also want to know the exact hits, you would need to make use of inner_hits feature.
Below query should help you.
POST <your_index_name>/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"name": "iridakos"
}
},
{
"nested": {
"path": "cats",
"query": {
"match": {
"cats.breed": "european"
}
},
"inner_hits": {}
}
}
]
}
}
}
Response:
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.74455214,
"hits" : [
{
"_index" : "my_cat_index",
"_type" : "_doc",
"_id" : "1", <--- The document that hit
"_score" : 0.74455214,
"_source" : {
"name" : "iridakos",
"cats" : [
{
"colors" : 1,
"name" : "Irida",
"breed" : "European Shorthair"
},
{
"colors" : 2,
"name" : "Phoebe",
"breed" : "european"
},
{
"colors" : 3,
"name" : "Nino",
"breed" : "Aegean"
}
]
},
"inner_hits" : { <---- Note this
"cats" : {
"hits" : {
"total" : {
"value" : 2, <---- Count of nested doc hits
"relation" : "eq"
},
"max_score" : 0.52354836,
"hits" : [
{
"_index" : "my_cat_index",
"_type" : "_doc",
"_id" : "1",
"_nested" : {
"field" : "cats",
"offset" : 1
},
"_score" : 0.52354836,
"_source" : { <---- First Nested Document
"breed" : "european"
}
},
{
"_index" : "my_cat_index",
"_type" : "_doc",
"_id" : "1",
"_nested" : {
"field" : "cats",
"offset" : 0
},
"_score" : 0.39019167,
"_source" : { <---- Second Document
"breed" : "European Shorthair"
}
}
]
}
}
}
}
]
}
}
Note in your response how the inner_hits section would appear where you would find the exact hits.
Hope this helps!
You could use something like this:
{
"query": {
"bool": {
"must": [
{ "match": { "name": "iridakos" }},
{ "match": { "cats.breed": "European" }}
]
}
}
}
To search on a cat's breed, you can use the dot-notation.
I really think that I'm trying to do is fairly simple. I'm simply trying to query for N tags. A clear example of this was asked and answered over at "Elasticsearch: How to use two different multiple matching fields?". Yet, that solution doesn't seem to work for the latest version of ES (more likely, I'm simply doing it wrong).
To show the current data and to demonstrate a working query, see below:
{
"query": {
"filtered": {
"filter": {
"terms": {
"Price": [10,5]
}
}
}
}
}
Here are the results for this. As you can see, 5 and 10 are showing up (this demonstrates that basic queries do work):
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 6,
"successful" : 6,
"failed" : 0
},
"hits" : {
"total" : 4,
"max_score" : 1.0,
"hits" : [ {
"_index" : "labelsample",
"_type" : "entry",
"_id" : "AVLGnGMYXB5vRcKBZaDw",
"_score" : 1.0,
"_source" : {
"Category" : [ "Medium Signs" ],
"Code" : "a",
"Name" : "Sample 1",
"Timestamp" : 1.455031083799152E9,
"Price" : "10",
"IsEnabled" : true
}
}, {
"_index" : "labelsample",
"_type" : "entry",
"_id" : "AVLGnGHHXB5vRcKBZaDF",
"_score" : 1.0,
"_source" : {
"Category" : [ "Small Signs" ],
"Code" : "b",
"Name" : "Sample 2",
"Timestamp" : 1.45503108346191E9,
"Price" : "5",
"IsEnabled" : true
}
}, {
"_index" : "labelsample",
"_type" : "entry",
"_id" : "AVLGnGILXB5vRcKBZaDO",
"_score" : 1.0,
"_source" : {
"Category" : [ "Medium Signs" ],
"Code" : "c",
"Name" : "Sample 3",
"Timestamp" : 1.455031083530215E9,
"Price" : "10",
"IsEnabled" : true
}
}, {
"_index" : "labelsample",
"_type" : "entry",
"_id" : "AVLGnGGgXB5vRcKBZaDA",
"_score" : 1.0,
"_source" : {
"Category" : [ "Medium Signs" ],
"Code" : "d",
"Name" : "Sample 4",
"Timestamp" : 1.4550310834233E9,
"Price" : "10",
"IsEnabled" : true
}
}]
}
}
As a side note: the following bool query gives the exact same results:
{
"query": {
"bool": {
"must": [{
"terms": {
"Price": [10,5]
}
}]
}
}
}
Notice Category...
Let's simply copy/paste Category into a query:
{
"query": {
"filtered": {
"filter": {
"terms": {
"Category" : [ "Medium Signs" ]
}
}
}
}
}
This gives the following gem:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 6,
"successful" : 6,
"failed" : 0
},
"hits" : {
"total" : 0,
"max_score" : null,
"hits" : [ ]
}
}
Again, here's the bool query version that gives the same 0-hit result:
{
"query": {
"bool": {
"must": [{
"terms": {
"Category" : [ "Medium Signs" ]
}
}]
}
}
}
In the end, I definitely need something similar to "Category" : [ "Medium Signs", "Small Signs" ] working (in concert with other label queries and minimum_should_match as well-- but I can't even get this bare-bones query to work).
I have zero clue why this is. I poured over the docs for houring, trying everything I can see. Do I need to look into debugging various encodings? Is my syntax archaic?
The problem here is that ElasticSearch is analyzing and betokening the Category field, and the terms filter expects an exact match. One solution here is to add a raw field to Category inside your entry mapping:
PUT labelsample
{
"mappings": {
"entry": {
"properties": {
"Category": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
},
"Code": {
"type": "string"
},
"Name": {
"type": "string"
},
"Timestamp": {
"type": "date",
"format": "epoch_millis"
},
"Price": {
"type": "string"
},
"IsEnabled": {
"type": "boolean"
}
}
}
}
}
...and filter on the raw field:
GET labelsample/entry/_search
{
"query": {
"filtered": {
"filter": {
"terms": {
"Category.raw" : [ "Medium Signs" ]
}
}
}
}
}