Elasticsearch query that requires all values in array to be present

Here's a sample query:
{
"query":{
"constant_score":{
"filter":{
"terms":{
"genres_slugs":["simulator", "strategy", "adventure"]
}
}
}
},
"sort":{
"name.raw":{
"order":"asc"
}
}
}
The value mapped to the genres_slugs property is just a simple array.
What I'm trying to do is match all games that contain all of the values in the array: ["simulator","strategy","adventure"]
In other words, each result MUST have all of those values. Instead, the query also returns results that have only one of the values and not the others.
Been going at this for 6 hours now :(

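OK, if every result MUST have all of those values, replace the single terms query (which matches documents that contain any of the listed values) with a bool query that has one term clause per value in its must array: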
{
"query": {
"constant_score": {
"filter": {
"bool": {
"must": [
{ "term": { "genres_slugs": "simulator" } },
{ "term": { "genres_slugs": "strategy" } },
{ "term": { "genres_slugs": "adventure" } }
]
}
}
}
}
}
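Here is a small plain-Java sketch of how the two forms decide whether a single document matches. It does not involve Elasticsearch, and the class and method names are hypothetical. A terms query accepts a document that contains any of the listed values. A bool query with one term clause per value under must requires every value to be present.

```java
import java.util.List;
import java.util.Set;

// Plain-Java illustration of the two matching semantics discussed above.
// It does NOT call Elasticsearch; it only mimics how the clauses combine.
final class TermMatching {

    // Like a single terms query: true if the document contains ANY wanted value (OR).
    static boolean matchesAny(Set<String> docValues, List<String> wanted) {
        for (String value : wanted) {
            if (docValues.contains(value)) {
                return true;
            }
        }
        return false;
    }

    // Like a bool query with one term clause per value under must:
    // true only if the document contains EVERY wanted value (AND).
    static boolean matchesAll(Set<String> docValues, List<String> wanted) {
        return docValues.containsAll(wanted);
    }

    public static void main(String[] args) {
        List<String> wanted = List.of("simulator", "strategy", "adventure");
        Set<String> fullMatch = Set.of("simulator", "strategy", "adventure");
        Set<String> partialMatch = Set.of("strategy", "racing"); // hypothetical document

        System.out.println(matchesAny(partialMatch, wanted)); // true
        System.out.println(matchesAll(partialMatch, wanted)); // false
        System.out.println(matchesAll(fullMatch, wanted));    // true
    }
}
```

With the terms query, the partial document is returned, which is the behaviour described in the question. With bool/must, it is filtered out.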
This returns:
{
"took": 54,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 1,
"hits": [
{
"_index": "try",
"_type": "stackoverflowtry",
"_id": "123",
"_score": 1,
"_source": {
"genres_slugs": [
"simulator",
"strategy",
"adventure"
]
}
},
{
"_index": "try",
"_type": "stackoverflowtry",
"_id": "126",
"_score": 1,
"_source": {
"genres_slugs": [
"simulator",
"strategy",
"adventure"
]
}
}
]
}
}
Docs:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-bool-query.html
https://www.elastic.co/guide/en/elasticsearch/guide/current/_finding_multiple_exact_values.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-common-terms-query.html

Related

Get elasticsearch to ignore diacritics and accents in search hit

I want to search data in Elasticsearch in different languages and have it retrieved regardless of whether it contains diacritics or accents.
For example, I have this data:
POST ابجد/_doc/31
{
"name":"def",
"city":"Tulkarem"
}
POST ابجٌد/_doc/31
{
"name":"def",
"city":"Tulkarem"
}
PUT /abce
{
"settings" : {
"analysis" : {
"analyzer" : {
"default" : {
"tokenizer" : "standard",
"filter" : ["my_ascii_folding"]
}
},
"filter" : {
"my_ascii_folding" : {
"type" : "asciifolding",
"preserve_original" : true
}
}
}
}
}
The only difference between the two index names is the diacritic.
Trying to get the data:
GET ابجد/_search
I need it to return the documents from both indices, but currently it returns only this:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1,
"hits": [
{
"_index": "ابجد",
"_id": "31",
"_score": 1,
"_source": {
"name": "def",
"city": "Tulkarem"
}
}
]
}
}
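Analysis settings such as the asciifolding filter above apply only to analyzed field values, not to index names. The two indices ابجد and ابجٌد therefore stay distinct regardless of the analyzer.

To show what ignoring diacritics means for a field value, the plain-Java sketch below takes a different approach from the filter. It decomposes the text (NFD) and then deletes the combining marks, which removes the Arabic mark in the example. This technique is swapped in for illustration only. The asciifolding filter itself works by mapping characters to their ASCII equivalents, and this is not Elasticsearch's implementation. The helper name is hypothetical.

```java
import java.text.Normalizer;

// Hypothetical helper: folds diacritics by decomposing the text (NFD) and
// deleting every non-spacing combining mark (Unicode category Mn).
// This is NOT how Elasticsearch's asciifolding filter is implemented.
final class DiacriticFolding {

    static String fold(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{Mn}", "");
    }

    public static void main(String[] args) {
        // Unicode escapes avoid source-encoding issues:
        // \u0627\u0628\u062C\u064C\u062F is the dotted name, \u064C being ARABIC DAMMATAN.
        String withMark = "\u0627\u0628\u062C\u064C\u062F";
        String withoutMark = "\u0627\u0628\u062C\u062F";
        System.out.println(fold(withMark).equals(withoutMark)); // true
        System.out.println(fold("caf\u00E9"));                   // cafe
    }
}
```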

Elasticsearch query showing weird behavior: bug?

To sum things up quickly: we are using Elasticsearch 6.8.4 and have documents with fields such as "statutPublicOuInterne" (public or internal status) and "identifiant" (identifier).
I cannot share the whole JSON (_source) for security reasons (corporate restrictions), but it looks like the following:
"_source": {
"dateCreation": "2020-11-05T16:31:28.404+01:00",
"dateDerModif": "2020-11-05T16:31:49.183+01:00",
"contenu": { ... },
"langue": "fr",
"observations": null,
"statutPublicOuInterne": "enAttenteTraitementCommissionTask",
"identifiant": "SFB-20201105-ELUH",
(...)
}
The "statutPublicOuInterne" field can have values such as "enAttenteTraitementCommissionTask" or "enCoursTraitementCommissionTask".
1st question: for some reason, when I search for statutPublicOuInterne=enCoursTraitementCommissionTask, it doesn't work, but if I search for statutPublicOuInterne=enCoursTraitementCommission (without "Task"), it works! That seems so weird to me and I really can't explain it.
2nd question: if I assume I need to search without the "Task" at the end, then searching for statutPublicOuInterne=enCoursTraitementCommission works but statutPublicOuInterne=enAttenteTraitementCommission doesn't work! (nor does statutPublicOuInterne=enAttenteTraitementCommissionTask work)
The query is as follows:
{
"query": {
"bool" : {
"must" : [
{
"match" : {
"statutPublicOuInterne" : {
"query" : "enAttenteTraitementCommission"
}
}
}
]
}
}
}
I just can't understand why it doesn't find anything, because if I search for this document with its "identifiant" field, then it works:
{
"query": {
"bool" : {
"must" : [
{
"match" : {
"identifiant" : {
"query" : "SFB-20201105-ELUH"
}
}
}
]
}
}
}
The response is:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 2.0283146,
"hits": [
{
"_index": "some-index",
"_type": "demandes",
"_id": "SFB-20201105-ELUH",
"_score": 2.0283146,
"_source": {
"dateCreation": "2020-11-05T16:31:28.404+01:00",
"dateDerModif": "2020-11-05T16:31:49.183+01:00",
"contenu": { ... },
"langue": "fr",
"observations": null,
"statutPublicOuInterne": "enAttenteTraitementCommissionTask",
"identifiant": "SFB-20201105-ELUH",
(...)
}
}
]
}
}
We can clearly see "statutPublicOuInterne": "enAttenteTraitementCommissionTask" in the response.
Am I missing something?
Many thanks in advance for your help!
Adding a working example with index data, mapping, search query, and search result
Index Mapping:
{
"mappings": {
"properties": {
"statutPublicOuInterne": {
"type": "text"
}
}
}
}
Index Data:
{
"dateCreation": "2020-11-05T16:31:28.404+01:00",
"dateDerModif": "2020-11-05T16:31:49.183+01:00",
"langue": "fr",
"observations": null,
"statutPublicOuInterne": "enAttenteTraitementCommissionTask",
"identifiant": "SFB-20201105-ELUH"
}
Search Query:
{
"query": {
"bool": {
"must": [
{
"match": {
"statutPublicOuInterne": {
"query": "enAttenteTraitementCommissionTask"
}
}
}
]
}
}
}
Search Result:
"hits": [
{
"_index": "64700803",
"_type": "_doc",
"_id": "1",
"_score": 0.2876821,
"_source": {
"dateCreation": "2020-11-05T16:31:28.404+01:00",
"dateDerModif": "2020-11-05T16:31:49.183+01:00",
"langue": "fr",
"observations": null,
"statutPublicOuInterne": "enAttenteTraitementCommissionTask",
"identifiant": "SFB-20201105-ELUH"
}
}
]
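The working example relies on the match query analyzing the query text with the same analyzer that was applied to the stored value. The plain-Java sketch below roughly approximates the standard analyzer: it splits on non-alphanumeric characters and then lowercases. The real tokenizer follows Unicode word-break rules, and all names here are hypothetical.

Under this model, the full value enAttenteTraitementCommissionTask matches and the truncated enAttenteTraitementCommission does not. If the production index behaves the other way round, the mapping or analyzer of statutPublicOuInterne there may differ from the plain text mapping above. Running the _analyze API against that index would show the actual tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Rough approximation of the standard analyzer: split on anything that is not a
// letter or digit, then lowercase. Illustration only, not Lucene's tokenizer.
final class SimpleAnalysis {

    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}\\p{Nd}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // A match query with the default OR operator hits if any query token
    // equals one of the indexed tokens.
    static boolean matchQueryHits(String indexedValue, String queryText) {
        List<String> indexed = analyze(indexedValue);
        for (String token : analyze(queryText)) {
            if (indexed.contains(token)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String stored = "enAttenteTraitementCommissionTask";
        System.out.println(analyze(stored)); // [enattentetraitementcommissiontask]
        System.out.println(matchQueryHits(stored, "enAttenteTraitementCommissionTask")); // true
        System.out.println(matchQueryHits(stored, "enAttenteTraitementCommission"));     // false
        System.out.println(analyze("SFB-20201105-ELUH")); // [sfb, 20201105, eluh]
    }
}
```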

Elasticsearch order by _score or max_score from SearchResponse Java API

I have an index containing documents that share the same employee name and email address but differ in other information, such as the meeting attended and the amount spent.
{
"emp_name" : "Raju",
"emp_email" : "raju#abc.com",
"meeting" : "World cup 2019",
"cost" : "2000"
}
{
"emp_name" : "Sanju",
"emp_email" : "sanju#abc.com",
"meeting" : "International Academy",
"cost" : "3000"
}
{
"emp_name" : "Sanju",
"emp_email" : "sanju#abc.com",
"meeting" : "School of Education",
"cost" : "4000"
}
{
"emp_name" : "Sanju",
"emp_email" : "sanju#abc.com",
"meeting" : "Water world",
"cost" : "1200"
}
{
"emp_name" : "Sanju",
"emp_email" : "sanju#abc.com",
"meeting" : "Event of Tech",
"cost" : "5200"
}
{
"emp_name" : "Bajaj",
"emp_email" : "bajaju#abc.com",
"meeting" : "Event of Tech",
"cost" : "4500"
}
Now, when I search the emp_name field for something like "raj", I should get one document each for Raju, Sanju and Bajaj, since I am using fuzzy search (fuzziness(auto)).
I am implementing this with the Java High Level REST Client 6.8 API.
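// Bucket the matching documents by employee email (one bucket per employee)
TermsAggregationBuilder termAggregation = AggregationBuilders.terms("employees")
    .field("emp_email.keyword")
    .size(2000);
// Keep only the best-scoring document in each bucket
// (includeFields / excludeFields are String[] arrays defined elsewhere)
TopHitsAggregationBuilder termAggregation1 = AggregationBuilders.topHits("distinct")
    .sort(new ScoreSortBuilder().order(SortOrder.DESC))
    .size(1)
    .fetchSource(includeFields, excludeFields);
// Nest top_hits under the terms aggregation, as in the generated JSON below
termAggregation.subAggregation(termAggregation1);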
With the above code I get distinct documents, but Raju's record is not at the top of the response. Instead, Sanju's document comes first because of its higher document count (the buckets are ordered by _count).
Below is the JSON generated from the search request.
{
"size": 0,
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "raj",
"fields": [
"emp_name^1.0",
"emp_email^1.0"
],
"boost": 1.0
}
}
],
"filter": [
{
"range": {
"meeting_date": {
"from": "2019-12-01",
"to": null,
"boost": 1.0
}
}
}
],
"adjust_pure_negative": true,
"boost": 1.0
}
},
"aggregations": {
"employees": {
"terms": {
"field": "emp_email.keyword",
"size": 2000,
"min_doc_count": 1,
"shard_min_doc_count": 0,
"show_term_doc_count_error": false,
"order": [
{
"_count": "desc"
},
{
"_key": "asc"
}
]
},
"aggregations": {
"distinct": {
"top_hits": {
"from": 0,
"size": 1,
"version": false,
"explain": false,
"_source": {
"includes": [
"all_uid",
"emp_name",
"emp_email",
"meeting",
"country",
"cost"
],
"excludes": [
]
},
"sort": [
{
"_score": {
"order": "desc"
}
}
]
}
}
}
}
}
}
I think that if the buckets were ordered by max_score or _score, Raju's record would be at the top of the response.
Could you please let me know how to order the results by the _score or max_score of the documents returned in the response?
A sample response:
{
"took": 264,
"timed_out": false,
"_shards": {
"total": 3,
"successful": 3,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 232,
"max_score": 0.0,
"hits": [
]
},
"aggregations": {
"sterms#employees": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Sanju",
"doc_count": 4,
"top_hits#distinct": {
"hits": {
"total": 4,
"max_score": 35.71312,
"hits": [
{
"_index": "indexone",
"_type": "employeedocs",
"_id": "1920424",
"_score": 35.71312,
"_source": {
"emp_name": "Sanju",
...
}
}
]
}
}
},
{
"key": "Raju",
"doc_count": 1,
"top_hits#distinct": {
"hits": {
"total": 1,
"max_score": 89.12312,
"hits": [
{
"_index": "indexone",
"_type": "employeedocs",
"_id": "1920424",
"_score": 89.12312,
"_source": {
"emp_name": "Raju",
...
}
}
]
}
}
}
]
}
}
}
Let me know if you have any questions.
Note: I have seen many similar questions, but none of them helped me. Please advise.
Thanks,
Chetan
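The ordering has to happen inside the aggregation, because the generated request sorts the terms buckets by _count descending and then _key ascending. The top_hits reference documentation shows a field-collapsing pattern for this case. Next to the top_hits sub-aggregation, you add a max sub-aggregation whose script is _score, then order the terms aggregation by that sub-aggregation. In JSON, that is "order": { "top_hit": "desc" } together with "top_hit": { "max": { "script": { "source": "_score" } } }.

With the 6.8 High Level REST Client, this should correspond to adding AggregationBuilders.max("top_hit").script(new Script("_score")) via subAggregation(...) and calling termAggregation.order(BucketOrder.aggregation("top_hit", false)). Treat those calls as an untested suggestion.

The plain-Java model below only shows how the two orderings differ and does not talk to Elasticsearch. The Bucket class is hypothetical, and the scores are copied from the sample response.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Plain-Java model of terms buckets; no Elasticsearch classes involved.
final class BucketOrdering {

    // Minimal stand-in for one terms bucket plus its top_hits max_score.
    static final class Bucket {
        final String key;
        final long docCount;
        final double maxScore;

        Bucket(String key, long docCount, double maxScore) {
            this.key = key;
            this.docCount = docCount;
            this.maxScore = maxScore;
        }
    }

    // Current behaviour: _count desc, then _key asc (as in the generated request).
    static List<String> orderByCount(List<Bucket> buckets) {
        List<Bucket> sorted = new ArrayList<>(buckets);
        sorted.sort(Comparator.comparingLong((Bucket b) -> b.docCount).reversed()
                .thenComparing(b -> b.key));
        return keys(sorted);
    }

    // Desired behaviour: highest max score first, which is what ordering by a
    // max sub-aggregation over _score is meant to achieve.
    static List<String> orderByMaxScore(List<Bucket> buckets) {
        List<Bucket> sorted = new ArrayList<>(buckets);
        sorted.sort(Comparator.comparingDouble((Bucket b) -> b.maxScore).reversed());
        return keys(sorted);
    }

    private static List<String> keys(List<Bucket> buckets) {
        List<String> result = new ArrayList<>();
        for (Bucket b : buckets) {
            result.add(b.key);
        }
        return result;
    }

    public static void main(String[] args) {
        // Values taken from the sample response above.
        List<Bucket> buckets = List.of(
                new Bucket("Sanju", 4, 35.71312),
                new Bucket("Raju", 1, 89.12312));
        System.out.println(orderByCount(buckets));    // [Sanju, Raju]
        System.out.println(orderByMaxScore(buckets)); // [Raju, Sanju]
    }
}
```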

Function score ignored

I have two nearly identical documents: one has the fields CONSTRUCTION: 1 and EDUCATION: 0.1, the other has CONSTRUCTION: 0.1 and EDUCATION: 1. I want to be able to sort results by the value of either the CONSTRUCTION or the EDUCATION field.
GET /objects/_search
{
"query": {
"function_score": {
"query": {
"match": {
"name": {
"query": "Monkeys"
}
}
},
"field_value_factor": {
"field" : "CONSTRUCTION",
"missing": 1
}
}
},
"_source": ["name", "CONSTRUCTION", "EDUCATION"]
}
This returns incorrect results:
{
"took": 8,
"timed_out": false,
"_shards": {
"total": 3,
"successful": 3,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 1.7622693,
"hits": [
{
"_index": "objects__feed_id_key_pages__date_2019-12-10__timestamp_1575988952__batch_id_3gpnz7fc__",
"_type": "_doc",
"_id": "dit:greatDomesticUi:KeyPages:12",
"_score": 1.7622693,
"_source": {
"CONSTRUCTION": 0.1,
"name": "Space Monkeys - education",
"EDUCATION": 1
}
},
{
"_index": "objects__feed_id_key_pages__date_2019-12-10__timestamp_1575988952__batch_id_3gpnz7fc__",
"_type": "_doc",
"_id": "dit:greatDomesticUi:KeyPages:11",
"_score": 1.0226655,
"_source": {
"CONSTRUCTION": 1,
"name": "Space Monkeys - construction",
"EDUCATION": 0.1
}
}
]
}
}
This always returns the same results. Indeed, if you misspell the field_value_factor field (e.g. "field_value_factor": { "field" : "WHATEVER",... }), you get the same scores. This suggests the field simply isn't being read.
Dynamic mapping was turned off, so the EDUCATION and CONSTRUCTION fields were never mapped. Mystery solved!
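Roughly, assume field_value_factor is left at its defaults (factor 1, no modifier) and function_score keeps its default boost_mode of multiply. The final score is then the query score times the field value, with missing substituted when a document has no value.

The plain-Java model below explains why an unmapped field leaves the ordering to the match query alone. Its names are hypothetical and the query score is made up. Treating an unmapped field like a missing one is an assumption based on the misspelled-field experiment above.

```java
// Plain-Java model of function_score with field_value_factor
// (factor 1, no modifier) and the default boost_mode "multiply".
final class FieldValueFactorModel {

    // fieldValue is null when the field is missing or, as assumed here for this
    // question, not mapped; the "missing" value is then used instead.
    static double finalScore(double queryScore, Double fieldValue, double missing) {
        double factor = 1.0;
        double functionScore = factor * (fieldValue != null ? fieldValue : missing);
        return queryScore * functionScore; // boost_mode: multiply
    }

    public static void main(String[] args) {
        double q = 1.0; // hypothetical match score, same for both documents
        // CONSTRUCTION mapped: the construction document outranks the education one.
        System.out.println(finalScore(q, 1.0, 1.0)); // 1.0
        System.out.println(finalScore(q, 0.1, 1.0)); // 0.1
        // CONSTRUCTION not mapped: both fall back to missing = 1, so only q counts.
        System.out.println(finalScore(q, null, 1.0)); // 1.0
    }
}
```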

Elasticsearch wildcard query

Can you help me understand why this simple query is not working?
I have a simple index with default settings:
PUT my_index/doc/1
{
"path": "C:\\Windows\\system32\\cmd.exe"
}
Why the following query doesn't return anything?
GET my_index/_search
{
"_source": "path",
"query": {
"query_string": {
"query": "(path: *\\system32\\*.exe)"
}
}
}
You should query the keyword sub-field (path.keyword) and pass it via the fields parameter, like this:
GET sample-index/_search
{
"query": {
"query_string" : {
"fields" : ["path.keyword"],
"query" : """*\\system32\\*.exe"""
}
}
}
The output I got was:
{
"took": 13,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "sample-index",
"_type": "doc",
"_id": "1",
"_score": 1,
"_source": {
"path": """C:\Windows\system32\cmd.exe"""
}
}
]
}
}
Here I have used path.keyword because, when you index a new field without an explicit mapping (as you did in your question), dynamic mapping by default creates it as a text field with a keyword sub-field, and path.keyword holds the whole, unanalyzed value.
Extra tip: you can also use a wildcard pattern in the fields parameter if you want to query multiple fields (e.g. path, path1, pathcc, etc.):
GET sample-index/_search
{
"query": {
"query_string" : {
"fields" : ["path*"],
"query" : """*\\system32\\*.exe"""
}
}
}
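The role of the keyword sub-field can be sketched outside Elasticsearch. A wildcard pattern is compared against each indexed term. path.keyword holds the whole path as one term, whereas the analyzed path field holds several smaller tokens, none of which contains the backslashes in the pattern.

The plain-Java sketch below uses a hypothetical wildcardMatches helper (* means any sequence, ? means any single character) and an assumed, approximate token list for the text field. It is not Lucene's implementation.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: translate a wildcard pattern (* = any sequence,
// ? = any single character) into an anchored Java regex, quoting everything else.
final class WildcardDemo {

    static boolean wildcardMatches(String pattern, String value) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '*' || c == '?') {
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                regex.append(c == '*' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        // Pattern.matches requires the whole value to match, like a wildcard query on one term.
        return Pattern.matches(regex.toString(), value);
    }

    public static void main(String[] args) {
        String pattern = "*\\system32\\*.exe";                  // i.e. *\system32\*.exe
        String keywordValue = "C:\\Windows\\system32\\cmd.exe"; // whole value, as in path.keyword
        // Assumed tokens of the analyzed text field (approximate, for illustration only).
        List<String> textTokens = List.of("c", "windows", "system32", "cmd.exe");

        System.out.println(wildcardMatches(pattern, keywordValue)); // true
        System.out.println(textTokens.stream().anyMatch(t -> wildcardMatches(pattern, t))); // false
    }
}
```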
