Elasticsearch function_score decay not working, always returns 1

I've been trying to fix this for hours, but nothing seems to change the return value of the function_score decay function. It's simply 1 at all times. It looks like it can't read the integer value of the field I'm specifying.
The data model looks like this (obviously fake):
{
"basics": {
"name": "Mr Augustus Flybynight (Jim)",
"name_pref": "Jim",
"location": {
"city": "Melbourne",
"postalCode": "3040",
"meta": {
"country": "Australia"
},
"region": "VIC",
"address": "iytiytiyt, tyiuyti"
},
"email": "augustus.flybynight2#gmail.com",
"applicantNumber": "11882",
"name_first": "Augustus",
"meta": {
"alternateContact": "",
"lastModified": 1473353751,
"alternateName": "",
"notificationType": "-1",
"alternatePhones": [
],
"gender": "M"
},
"name_last": "Flybynight",
"phone": "44556677"
}
}
I have 3 duplicates of this entity whose only difference is their timestamp (basics.meta.lastModified). I'm trying to create a 'closer is better' function score so that the latest comes to the top. We haven't mapped the timestamp as a date yet, but it is mapped as an integer.
When querying with the following:
{
"query": {
"function_score": {
"functions": [
{
"gauss": {
"basics.meta.lastModified": {
"origin": 1474868635, // now
"offset": 86400, // one day
"scale": 604800, // seven days
"decay": 0.5
}
},
"weight": 2
}
],
"query": {
"bool": {
"should": [
{
"match": {
"_all": "augustus flybynight"
}
},
{
"match": {
"basics.all_names.all_names_identifier_whitespace": {
"query": "augustus flybynight",
"boost": 2
}
}
},
{
"match": {
"basics.email.email_identifier_keyword": {
"query": "augustus flybynight",
"boost": 3
}
}
},
{
"match": {
"basics.applicantNumber.applicantNumber_identifier_keyword": {
"query": "augustus flybynight",
"boost": 3
}
}
},
{
"wildcard": {
"basics.email.email_identifier_keyword": {
"wildcard": "augustus flybynight*",
"boost": 2
}
}
},
{
"wildcard": {
"basics.all_names.all_names_identifier_whitespace": {
"wildcard": "augustus flybynight*"
}
}
}
],
"must": []
}
}
}
},
"size": 25,
"from": 0,
"min_score": 0.2
}
But this always returns '1' for the function score, which is then multiplied with the query score and so doesn't affect it. It's the weirdest thing.
When looking at the explanation, this is what's returned:
{
"value": 1,
"description": "min of:",
"details": [
{
"value": 1,
"description": "product of:",
"details": [
{
"value": 1,
"description": "Function for field basics.meta.lastModified:",
"details": [
{
"value": 1,
"description": "max(0.0, ((2.0 - MIN[0.0])/2.0)",
"details": [
]
}
]
},
{
"value": 1,
"description": "weight",
"details": [
]
}
]
},
{
"value": 3.4028235e+38,
"description": "maxBoost",
"details": [
]
}
]
}
It seems like 'MIN[0.0]' is the part that should be returning the timestamp, but it returns 0 instead, making the decay function always 1. If I make the decay parameters stricter, like origin:0, offset:0, scale:1 and decay:0.5, I'd expect the function_score to be close to 0, but it's still 1.
Please help. I've been trying everything and there don't seem to be many examples online. Any suggestions would be welcome.
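For comparison, here is a minimal sketch of what the gauss function should return once the field value is actually readable. It follows the Gaussian decay formula given in the Elasticsearch function_score documentation; the function name gauss_decay is made up for illustration and the code is not taken from Elasticsearch itself.

```python
import math


def gauss_decay(value, origin, scale, offset=0.0, decay=0.5):
    """Gaussian decay as described in the function_score documentation."""
    # Distance beyond the offset; anything inside the offset scores 1.0.
    distance = max(0.0, abs(value - origin) - offset)
    # sigma^2 is chosen so that the score equals `decay` at distance == scale.
    sigma_sq = -(scale ** 2) / (2.0 * math.log(decay))
    return math.exp(-(distance ** 2) / (2.0 * sigma_sq))


# Values taken from the question: lastModified, origin, scale, offset, decay.
score = gauss_decay(1473353751, origin=1474868635, scale=604800,
                    offset=86400, decay=0.5)
print(round(score, 4))
```

With the sample document this comes out at roughly 0.02 before the weight of 2 is applied, so a constant 1 suggests that the field value never reached the function at all.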

For those hitting the same issue, I've finally found the culprit.
It seems that someone didn't set up the mappings properly: the basics.meta property was mapped as a nested type. Since it wasn't populated as such (you'd think that would have caused an issue when indexing data?), any attempt to access the data within it returned MIN[0.0], because it simply couldn't find the value of the property.
So yeah, if you ever hit this issue, look through your mappings thoroughly instead of wasting a whole day like I did :|
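A quick way to catch this is to walk the mapping for the field and flag any ancestor mapped as nested. The sketch below works on the properties dictionary of an index mapping (the structure found under "properties" in the mapping response); the helper name nested_ancestors and the sample mapping are hypothetical and only reproduce the situation described above.

```python
def nested_ancestors(properties, field_path):
    """Return the dotted paths of every ancestor of field_path mapped as nested."""
    found = []
    parts = field_path.split(".")
    current = properties
    # Only ancestors are inspected, so the leaf field itself is skipped.
    for depth, part in enumerate(parts[:-1]):
        node = current.get(part)
        if node is None:
            break
        if node.get("type") == "nested":
            found.append(".".join(parts[: depth + 1]))
        current = node.get("properties", {})
    return found


# Hypothetical mapping fragment: basics.meta was declared as nested.
mapping_properties = {
    "basics": {
        "properties": {
            "meta": {
                "type": "nested",
                "properties": {"lastModified": {"type": "integer"}},
            }
        }
    }
}

print(nested_ancestors(mapping_properties, "basics.meta.lastModified"))
```

A non-empty result means the field lives inside nested documents, which would explain why a top-level decay function cannot see its value.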

Related

Cannot seem to use must and must_not together in an elastic search query

If I run the following query:
{
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "boxing",
"fuzziness": 2,
"minimum_should_match": 2
}
}
],
"must_not": [
{
"terms_set": {
"allowedCountries": {
"terms": ["gb", "mx"],
"minimum_should_match_script": {
"source": "2"
}
}
}
}
],
"filter": [
{
"range": {
"expireTime": {
"gt": 1674061907954
}
}
},
{
"term": {
"region": {
"value": "row"
}
}
},
{
"term": {
"sourceType": {
"value": "article"
}
}
}
]
}
}
}
against an index with articles that look like:
{
"_index": "content-items-v10",
"_type": "_doc",
"_id": "e7hm75ui4dma1mm4j8q5v7914",
"_score": 4.3724976,
"_source": {
"allowedCountries": ["gb", "ie"],
"body": "Both Joshua Buatsi and Craig Richards join The DAZN Boxing Show ahead of their clash at London's O2 Arena. Matchroom's Eddie Hearn also gives his take on the night, as well as Chantelle Cameron previewing her contest with Victoria Noelia Bustos.",
"competitions": [
{
"id": "8lo6205qyio0fksjx9glqbdhj",
"name": "Buatsi v Richards"
}
],
"contestants": [
{
"id": "7rq59j3eiamxlm12vhxcsgujj",
"name": "Joshua Buatsi"
},
{
"id": "boby9oqe23g6qyuwphrxh8su5",
"name": "Craig Richards"
}
],
"countries": [
{
"id": "7yasa43laq1nb2e6f8bfuvxed",
"name": "World"
},
{
"id": "258l9t5sm55592i08mdpqzr3t",
"name": "United Kingdom"
}
],
"dotsLastUpdateTime": 1673979749396,
"expireTime": 4800000000000,
"fixtureDate": {},
"headline": "Buatsi vs. Richards: Preview",
"id": "e7hm75ui4dma1mm4j8q5v7914",
"importance": 0,
"languageKeys": ["en"],
"languages": ["en"],
"lastUpdateTime": {
"ts": 1653088281000,
"iso8601": "2022-05-20T23:11:21.000Z"
},
"promoImageUrl": null,
"publication": {
"typeId": "1plcw0iyhx9vn1fcanbm2ja3rf",
"typeName": "Shoulder"
},
"publishedTime": {
"ts": 1653088281000,
"iso8601": "2022-05-20T23:11:21.000Z"
},
"region": "row",
"shortHeadline": null,
"sourceType": "article",
"sports": [
{
"id": "2x2oqzx60orpoeugkd754ga17",
"name": "Boxing"
}
],
"teaser": "",
"thumbnailImageUrl": "https://images.daznservices.com/di/library/babcock_canada/45/3e/the-dazn-boxing-show-20052022_xc4jbfqi022l1shq9lu641h9e.png?t=-477976832",
"translations": {}
}
}
I get the following validation error from elasticsearch:
{
"ok": false,
"errors": {
"validation": [
{
"message": "\"query.bool.must_not\" is not allowed",
"path": [
"query",
"bool",
"must_not"
],
"type": "object.unknown",
"context": {
"child": "must_not",
"label": "query.bool.must_not",
"value": [
{
"terms_set": {
"allowedCountries": {
"terms": [
"gb",
"mx"
],
"minimum_should_match_script": {
"source": "2"
}
}
}
}
],
"key": "must_not"
}
}
]
},
"correlationId": "d29e9275-9ab3-4ff8-944d-852b98d4b503"
}
And I cannot figure out what the issue might be! From the elastic docs it should be OK.
I'm using ElasticSearch 7.9.3 running in a local docker container.
I'm hoping someone out there will give me a clue!
Cheers!
I would expect this to just work.
I'm trying to filter out articles that have both of the country codes gb and mx in the field allowedCountries.
I can include them easily enough in the results when I add the terms_set query to the bool.must section of the query.
It works well; you just need to enclose your query in the query section:
{
"query": { <--- add this
"bool": { <--- your query starts here
"must": [
...
Thank you for responding!
I was helping with a system I did not have full context on - it turns out there is a proxy in the mix with validation that was blocking the must_not query. So, with the proxy fixed, it now works.
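To make the intended must_not behaviour concrete, here is a small plain-Python model of the terms_set clause from the question. It is only a sketch of the matching rule (at least minimum_should_match of the listed terms must be present in the field), not Elasticsearch code; the function names and the second sample document are invented for illustration.

```python
def terms_set_matches(field_values, terms, minimum_should_match):
    """Model of terms_set: true when enough of `terms` appear in the field."""
    matched = len(set(terms) & set(field_values))
    return matched >= minimum_should_match


def excluded_by_must_not(doc):
    # Mirrors the must_not clause: both "gb" and "mx" must be present.
    return terms_set_matches(doc["allowedCountries"], ["gb", "mx"], 2)


sample = {"allowedCountries": ["gb", "ie"]}        # the article from the question
both = {"allowedCountries": ["gb", "mx", "us"]}    # hypothetical article

print(excluded_by_must_not(sample))  # False: only "gb" is present
print(excluded_by_must_not(both))    # True: both codes are present
```

Under this model the sample article stays in the results, and only articles allowed in both gb and mx are filtered out.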

Multi match query with terms lookup searching multiple indices elasticsearch 6.x

All,
I am working on building a NEST 6.x query that takes a search term and looks in different fields in different indices.
This is what I have so far, but it is not returning the results I am expecting.
Please see the details below
Indices used:
dev-sample-search
user-agents-search
The search should work as follows:
The value in the query field (27921093) is searched against the fields agentNumber, customerName, fileNumber and documentid (these are all analyzed fields).
The search should limit the documents to the agentNumbers the user sampleuser#gmail.com has access to (sample data for user-agents-search is added below).
agentNumber, customerName, fileNumber, documentid and status are part of the index dev-sample-search.
The status field is defined as a keyword.
The fields in the user-agents-search index are all keywords.
Sample user-agents-search index data:
{
"id": "sampleuser#gmail.com",
"user": "sampleuser#gmail.com",
"agentNumber": [
"123.456.789",
"1011.12.13.14"
]
}
Sample dev-sample-search index data:
{
"agentNumber": "123.456.789",
"customerName": "Bank of america",
"fileNumber":"test_file_1123",
"documentid":"1234456789"
}
GET dev-sample-search/_search
{
"from": 0,
"size": 10,
"query": {
"bool": {
"must": [
{
"multi_match": {
"type": "best_fields",
"query": "27921093",
"operator": "and",
"fields": [
"agentNumber",
"customerName",
"fileNumber",
"documentid^10"
]
}
}
],
"filter": [
{
"bool": {
"must": [
{
"terms": {
"agentNumber": {
"index": "user-agents-search",
"type": "_doc",
"user": "sampleuser#gmail.com",
"path": "agentNumber"
}
}
},
{
"bool": {
"must_not": [
{
"terms": {
"status": {
"value": "pending"
}
}
},
{
"term": {
"status": {
"value": "cancelled"
}
}
},
{
"term": {
"status": {
"value": "app cancelled"
}
}
}
],
"should": [
{
"term": {
"status": {
"value": "active"
}
}
},
{
"term": {
"status": {
"value": "terminated"
}
}
}
]
}
}
]
}
}
]
}
}
}
I see a couple of things that you may want to look at:
In the terms lookup query, "user": "sampleuser#gmail.com", should be "id": "sampleuser#gmail.com",.
If at least one should clause in the filter clause should match, set "minimum_should_match" : 1 on the bool query containing the should clause
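As a rough sketch, the filter with both suggestions applied could be assembled like this. The query body is built as a plain Python dictionary so it can be inspected before sending; the function name build_agent_filter is hypothetical. Collapsing the three excluded statuses into a single terms query with a list is my own simplification and not part of the answer.

```python
import json


def build_agent_filter(user_id):
    """Build the bool filter with the two suggested fixes applied."""
    return {
        "bool": {
            "must": [
                {
                    # Terms lookup: read agentNumber values from the user's document.
                    "terms": {
                        "agentNumber": {
                            "index": "user-agents-search",
                            "type": "_doc",
                            "id": user_id,  # was "user" in the question
                            "path": "agentNumber",
                        }
                    }
                },
                {
                    "bool": {
                        # Simplification: one terms query instead of three clauses.
                        "must_not": [
                            {"terms": {"status": ["pending", "cancelled", "app cancelled"]}}
                        ],
                        "should": [
                            {"term": {"status": {"value": "active"}}},
                            {"term": {"status": {"value": "terminated"}}},
                        ],
                        # Require at least one of the should clauses to match.
                        "minimum_should_match": 1,
                    }
                },
            ]
        }
    }


print(json.dumps(build_agent_filter("sampleuser#gmail.com"), indent=2))
```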

Elasticsearch query fails to return results when querying a nested object

I have an object which looks something like this:
{
"id": 123,
"language_id": 1,
"label": "Pablo de la Pena",
"office": {
"count": 2,
"data": [
{
"id": 1234,
"is_office_lead": false,
"office": {
"id": 1,
"address_line_1": "123 Main Street",
"address_line_2": "London",
"address_line_3": "",
"address_line_4": "UK",
"address_postcode": "E1 2BC",
"city_id": 1
}
},
{
"id": 5678,
"is_office_lead": false,
"office": {
"id": 2,
"address_line_1": "77 High Road",
"address_line_2": "Edinburgh",
"address_line_3": "",
"address_line_4": "UK",
"address_postcode": "EH1 2DE",
"city_id": 2
}
}
]
},
"primary_office": {
"id": 1,
"address_line_1": "123 Main Street",
"address_line_2": "London",
"address_line_3": "",
"address_line_4": "UK",
"address_postcode": "E1 2BC",
"city_id": 1
}
}
My Elasticsearch mapping looks like this:
"mappings": {
"item": {
"properties": {
"office": {
"properties": {
"data": {
"type": "nested"
}
}
}
}
}
My Elasticsearch query looks something like this:
GET consultant/item/_search
{
"from": 0,
"size": 24,
"query": {
"bool": {
"must": [
{
"term": {
"language_id": 1
}
},
{
"term": {
"office.data.office.city_id": 1
}
}
]
}
}
}
This returns zero results, however, if I remove the second term and leave it only with the language_id clause, then it works as expected.
I'm sure this is down to a misunderstanding on my part of how the nested object is flattened, but I'm out of ideas - I've tried all kinds of permutations of the query and mappings.
Any guidance hugely appreciated. I am using Elasticsearch 6.1.1.
I'm not sure whether you need the entire record or not; this solution returns every record that has language_id: 1 and an office.data.office.city_id: 1 value.
GET consultant/item/_search
{
"from": 0,
"size": 100,
"query": {
"bool":{
"must": [
{
"term": {
"language_id": {
"value": 1
}
}
},
{
"nested": {
"path": "office.data",
"query": {
"match": {
"office.data.office.city_id": 1
}
}
}
}
]
}
}
}
I put 3 different records in my test index to guard against false hits, one with a different language_id and one with different office ids, and only the matching one was returned.
If you only need the office data, then that's a bit different but still solvable.
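If only the matching office entries are needed, one option is to add inner_hits to the nested query, so that each hit also reports which nested entries matched. This is my assumption; the answer does not say which approach it had in mind. The sketch below builds such a request body as a Python dictionary; the function name office_query is hypothetical.

```python
def office_query(language_id, city_id):
    """Build the search body, asking for the matching nested offices as inner hits."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"term": {"language_id": {"value": language_id}}},
                    {
                        "nested": {
                            "path": "office.data",
                            "query": {
                                "match": {"office.data.office.city_id": city_id}
                            },
                            # An empty object requests inner hits with default settings.
                            "inner_hits": {},
                        }
                    },
                ]
            }
        }
    }


body = office_query(language_id=1, city_id=1)
print(body["query"]["bool"]["must"][1]["nested"]["path"])
```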

Sort Elasticsearch results based on field value

Assuming I have 3 documents (users), each with knowledge of multiple programming languages and an associated score, as described below, how can I search across multiple fields (multi-match, for example) and, if a search keyword hits a language, sort by that language's score?
// user1
{
"name": "John Bayes",
"prog_langs": [
{
"name": "python",
"score": 10
},
{
"name": "java",
"score": 500
}
]
}
// user2
{
"name": "John Russel",
"prog_langs": [
{
"name": "python",
"score": 100
},
{
"name": "PHP",
"score": 200
}
]
}
// user3
{
"name": "Terry Guy",
"prog_langs": [
{
"name": "C++",
"score": 600
},
{
"name": "Javascript",
"score": 200
}
]
}
For example, searching "John python" should return user1 and user2, with user2 showing up first.
I've been trying to use sort and functions, but I think they always use the lowest/highest/average values of score.
Thanks!
[Edit]
In the meantime I got it working in a test without the full-text/multi-match part, and I found out I had to make "prog_langs" nested, so I changed the mapping and it works as expected.
Now I'm only missing the part where a full-text search with multi-match is merged with the current query.
Thanks again!
I managed to fix the query and now it's working as expected.
Before posting my solution, a few things to keep in mind:
I made a new mapping and added some nested objects, so my original query had to change (prog_langs is now of type nested).
I wanted at least two fields to match, with the mandatory ones each matching at least once.
{
"query": {
"bool": {
"must": [
{
"query": {
"match": {
"name": {
"query": "john python",
"boost": 5
}
}
}
},
{
"bool": {
"should": [
{
"nested": {
"path": "prog_langs",
"query": {
"match": {
"prog_langs.name": {
"query": "john python",
"boost": 5
}
}
}
}
}
]
}
}
],
"should": [
{
"function_score": {
"query": {
"match": {
"prog_langs.name": "john python"
}
},
"functions": [
{
"script_score": {
"script": "_score * (1 + doc['prog_langs.score'].value)"
}
}
]
}
}
]
}
},
"highlight": {
"fields": {
"name": {},
"prog_langs.name": {}
}
}
}
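The ordering this query is aiming for can be sketched in plain Python, which also shows why prog_langs has to be nested: the score used for sorting must come from the same entry whose name matched the keyword. This is only a model of the desired result, not of Elasticsearch scoring; the function name rank_users is made up.

```python
users = [
    {"name": "John Bayes", "prog_langs": [
        {"name": "python", "score": 10}, {"name": "java", "score": 500}]},
    {"name": "John Russel", "prog_langs": [
        {"name": "python", "score": 100}, {"name": "PHP", "score": 200}]},
    {"name": "Terry Guy", "prog_langs": [
        {"name": "C++", "score": 600}, {"name": "Javascript", "score": 200}]},
]


def rank_users(users, query):
    """Keep users matching on name and language, sorted by the matched language's score."""
    tokens = set(query.lower().split())
    ranked = []
    for user in users:
        name_hit = any(word in tokens for word in user["name"].lower().split())
        # Only scores from entries whose own name matched a keyword are considered.
        lang_scores = [lang["score"] for lang in user["prog_langs"]
                       if lang["name"].lower() in tokens]
        if name_hit and lang_scores:
            ranked.append((max(lang_scores), user["name"]))
    ranked.sort(reverse=True)
    return [name for _, name in ranked]


print(rank_users(users, "John python"))  # ['John Russel', 'John Bayes']
```

Without the per-entry pairing, John Bayes would be ranked by his java score of 500 and would wrongly come first.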

Bool AND search in properties in ElasticSearch

I've got a very small dataset of documents put in ES :
{"id":1, "name": "John", "team":{"code":"red", "position":"P"}}
{"id":2, "name": "Jack", "team":{"code":"red", "position":"S"}}
{"id":3, "name": "Emily", "team":{"code":"green", "position":"P"}}
{"id":4, "name": "Grace", "team":{"code":"green", "position":"P"}}
{"id":5, "name": "Steven", "team":[
{"code":"green", "position":"S"},
{"code":"red", "position":"S"}]}
{"id":6, "name": "Josephine", "team":{"code":"red", "position":"S"}}
{"id":7, "name": "Sydney", "team":[
{"code":"red", "position":"S"},
{"code":"green", "position":"P"}]}
I want to query ES for people who are in the red team, with position P.
With the request
curl -XPOST 'http://localhost:9200/teams/aff/_search' -d '{
"query": {
"bool": {
"must": [
{
"match": {
"team.code": "red"
}
},
{
"match": {
"team.position": "P"
}
}
]
}
}
}'
I get a wrong result.
ES returns
"name": "John",
"team":
{ "code": "red", "position": "P" }
and
"name": "Sydney",
"team":
[
{ "code": "red", "position": "S"},
{ "code": "green", "position": "P"}
]
For the last entry, ES took code=red from the first record and position=P from the second record.
How can I specify that the search must match both terms in the same record (whether or not it is inside a list of nested records)?
In fact, the only correct answer is document 1, John.
Here is the gist that creates the dataset :
https://gist.github.com/flrt/4633ef59b9b9ec43d68f
Thanks in advance
When you index a document like
{
"name": "Sydney",
"team": [
{"code": "red", "position": "S"},
{"code": "green","position": "P"}
]
}
ES implicitly creates an inner object for your field (team in this example) and flattens it to a structure like
{
'team.code': ['red', 'green'],
'team.position': ['S', 'P']
}
So you lose the pairing between each code and its position. To avoid this, you need to explicitly define a nested mapping, index your documents as usual, and query them with a nested query.
So, this
PUT so/nest/_mapping
{
"nest": {
"properties": {
"team": {
"type": "nested"
}
}
}
}
POST so/nest/
{
"name": "Sydney",
"team": [
{
"code": "red",
"position": "S"
},
{
"code": "green",
"position": "P"
}
]
}
GET so/nest/_search
{
"query": {
"nested": {
"path": "team",
"query": {
"bool": {
"must": [
{
"match": {
"team.code": "red"
}
},
{
"match": {
"team.position": "P"
}
}
]
}
}
}
}
}
will return no hits, because the only indexed document (Sydney) has no single team entry with both code red and position P.
Further reading on relation management: https://www.elastic.co/blog/managing-relations-inside-elasticsearch
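The difference between the flattened object mapping and the nested mapping can be simulated in a few lines of Python using the dataset from the question. This is a simplified model of the matching behaviour described above, not Elasticsearch code; the function names are invented for illustration.

```python
docs = [
    {"id": 1, "name": "John", "team": {"code": "red", "position": "P"}},
    {"id": 2, "name": "Jack", "team": {"code": "red", "position": "S"}},
    {"id": 3, "name": "Emily", "team": {"code": "green", "position": "P"}},
    {"id": 4, "name": "Grace", "team": {"code": "green", "position": "P"}},
    {"id": 5, "name": "Steven", "team": [
        {"code": "green", "position": "S"}, {"code": "red", "position": "S"}]},
    {"id": 6, "name": "Josephine", "team": {"code": "red", "position": "S"}},
    {"id": 7, "name": "Sydney", "team": [
        {"code": "red", "position": "S"}, {"code": "green", "position": "P"}]},
]


def as_list(team):
    """A single object and a list of objects are treated the same way."""
    return team if isinstance(team, list) else [team]


def flat_match(doc, code, position):
    # Flattened view: all codes and all positions are pooled per document.
    teams = as_list(doc["team"])
    codes = {t["code"] for t in teams}
    positions = {t["position"] for t in teams}
    return code in codes and position in positions


def nested_match(doc, code, position):
    # Nested view: both conditions must hold within the same team entry.
    return any(t["code"] == code and t["position"] == position
               for t in as_list(doc["team"]))


print([d["name"] for d in docs if flat_match(d, "red", "P")])    # ['John', 'Sydney']
print([d["name"] for d in docs if nested_match(d, "red", "P")])  # ['John']
```

The flattened model reproduces the unwanted Sydney hit from the question, while the per-entry check returns only John.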
You can use a Nested Query so that your searches happen individually on the subdocuments in the team array, rather than across the entire document.
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "team",
"query": {
"bool": {
"must": [
{ "match": { "team.code": "red" } },
{ "match": { "team.position": "P" } }
]
}
}
}
}
]
}
}
}
