Query string with boost fields in Elastic Search - elasticsearch

I am using a query string with boosted fields in Elasticsearch 1.7. It works fine, but in some scenarios I do not get the expected result.
Query:
{
"from": 0,
"size": 10,
"explain": true,
"query": {
"function_score": {
"query": {
"query_string": {
"query": "account and data",
"fields": [
"title^5",
"authors^4",
"year^5",
"topic^6"
],
"default_operator": "and",
"analyze_wildcard": true
}
},
"score_mode": "sum",
"boost_mode": "sum",
"max_boost": 100
}
}
}
Sample Data :
{
"took": 50,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 4,
"max_score": 12.833213,
"hits": [
{
"_id": "19850",
"_score": 12.833213,
"_source": {
"ID": "19850",
"Year": "2010",
"Title": "account and data :..."
}
},
{
"_id": "16896",
"_score": 11.867042,
"_source": {
"ID": "16896",
"Year": "2014",
"Title": "effectivness of data..."
}
},
{
"_id": "59862",
"_score": 9.706333,
"_source": {
"ID": "59862",
"Year": "2007",
"Title": "best system to..."
}
},
{
"_id": "18501",
"_score": 9.685843,
"_source": {
"ID": "18501",
"Year": "2010",
"Title": "management of..."
}
}
]
}
}
The query above returns this sample data, which is what I expect. But if I increase the weight of year to 100, I expect the 4th result to move to 3rd position and the 3rd result to move to 4th. I have tried many things, but I don't know what I am doing wrong.

A boost only applies when the query matches the boosted field: Elasticsearch multiplies the score it computes for that field by the boost you defined. Your query looks for "account and data", which doesn't match any year, so the boost on year is never used.
Are you trying to take the year into account for ordering? If so, you can try adding field_value_factor to your query like this:
"query" : {
"function_score": {
"query": { <your query goes here> },
"field_value_factor": {
"field": "year"
}
}
}
This multiplies the score Elasticsearch computes by the year, so the year is taken into account without necessarily ordering by year. You can read more about it here: https://www.elastic.co/guide/en/elasticsearch/guide/current/boosting-by-popularity.html
You can always use the explain API to figure out how Elasticsearch came up with the score and therefore why the results were returned in that order: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-explain.html
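As a minimal sketch of the suggestion above, the request body can be assembled in Python before sending it. The helper name with_year_factor and the chosen modifier and factor values are illustrative assumptions, not part of the original answer, and the sketch assumes year is indexed as a numeric field.

```python
import json

def with_year_factor(inner_query, field="year", modifier="none", factor=1.0):
    """Wrap an existing query in function_score with a field_value_factor.

    Hypothetical helper: it only builds the request body as a dict.
    """
    return {
        "query": {
            "function_score": {
                "query": inner_query,
                "field_value_factor": {
                    "field": field,
                    "factor": factor,
                    "modifier": modifier,
                },
                # Multiply the text score by the field-derived value.
                "boost_mode": "multiply",
            }
        }
    }

# The text query from the question, without the unused boost on year.
inner = {
    "query_string": {
        "query": "account and data",
        "fields": ["title^5", "authors^4", "topic^6"],
        "default_operator": "and",
    }
}

# log1p dampens the raw year value; this choice is an assumption.
body = with_year_factor(inner, modifier="log1p")
print(json.dumps(body, indent=2))
```

Because raw years such as 2007 and 2010 differ by well under one percent, a plain multiplication may not be enough to swap two hits with close text scores; the explain output shows the actual contribution.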

Related

Why elasticsearch return irrelevant results with operators "OR"?

Below are the two documents:
Document-1:
{
"type": "document",
"name": "Meter testing practice",
"id": "cd1269",
"tags": [ "METER TESTING PRACTICE" ]
}
Document-2:
{
"type": "document",
"name": "Single phase meter",
"id": "cd1271",
"tags": [ "SINGLE PHASE METER", "SINGLE PHASE METER INSTALLATION",
"TOOLS FOR METER INSTALLATION" ]
}
Query1:
{
"query": {
"match" : {
"tags" : {
"query" : "SINGLE PHASE METER"
}
}
}
}
Executing query1 returns the results below:
Results:
{
"took": 0,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 1.2655861,
"hits": [
{
"_shard": "[document_org4][4]",
"_node": "YgzzS4wzQQKpdHxvsbVzPA",
"_index": "document_org4",
"_type": "document",
"_id": "cd1269",
"_score": 1.2655861,
"_source": {
"tags": [ "METER TESTING PRACTICE" ],
"type": "document",
"name": "Meter testing practice",
"id": "cd1269"
}
},
{
"_shard": "[document_org4][3]",
"_node": "YgzzS4wzQQKpdHxvsbVzPA",
"_index": "document_org4",
"_type": "document",
"_id": "cd1271",
"_score": 0.8617958,
"_source": {
"tags": [ "SINGLE PHASE METER", "SINGLE PHASE METER INSTALLATION", "TOOLS FOR METER INSTALLATION" ],
"type": "document",
"name": "Single phase meter",
"id": "cd1271"
}
}
]
}
}
As we can see, the first document in the results has the highest score, and I don't understand why. The second document looks more relevant than the first.
Query2:
{
"query": {
"match" : {
"tags" : {
"query" : "SINGLE PHASE METER",
"operator": "AND"
}
}
}
}
But executing query2 gives me the result I expected. Can someone please help me out?
It's because the field is shorter. I'd recommend reading up on BM25, which is the current default scoring algorithm for ES.
You can use the explain API to see how the individual components of the algorithm contribute to the score. This will help you figure out why one document appears above another.
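To illustrate the field-length effect, here is a small Python sketch of the textbook BM25 term score. The statistics are made up for illustration (only the token counts 3 and 11 mirror the two tags fields in the question), and the real per-shard numbers come from the explain output, so treat this as a rough model rather than what Elasticsearch computed.

```python
import math

def bm25_term_score(tf, field_len, avg_field_len, doc_count, doc_freq,
                    k1=1.2, b=0.75):
    """Score of one term in one field using the standard BM25 formula."""
    idf = math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
    # Length normalisation: longer-than-average fields are penalised.
    norm = 1 - b + b * (field_len / avg_field_len)
    return idf * (tf * (k1 + 1)) / (tf + k1 * norm)

# Hypothetical statistics: same term frequency in both fields,
# but one field has 3 tokens and the other has 11.
short_field = bm25_term_score(tf=1, field_len=3, avg_field_len=7,
                              doc_count=2, doc_freq=2)
long_field = bm25_term_score(tf=1, field_len=11, avg_field_len=7,
                             doc_count=2, doc_freq=2)

print(short_field, long_field, short_field > long_field)
```

With everything else equal, the shorter field gets the higher score, which is the effect the answer describes.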
I'm assuming that you want to filter out non-matching documents from the entire population, given the use of tags. In this scenario you would be going for an exact match, right?
If that's the case, I suggest you first index your array field as a keyword.
You could then go for a term query:
{
"query":{
"bool":{
"must":{
"match_all":{}
},
"filter":{
"bool":{
"must":[
{
"term": {
"tags.keyword": "single phase meter"
}
}
]
}
}
}
}
}
You might want to normalize your keyword field if you ever want to aggregate or sort on it without encountering odd results. In this example, the field is normalized at index time to lower case.
...
"tags":{
"type":"text",
"fields":{
"keyword":{
"type":"keyword"
}
}
}
...
For this example to work, you need to create a keyword field in your mapping. Remember that the keyword field is case sensitive. You need to have the exact same spelling at query time for it to match. If you don't normalize your input you would need to use the uppercase spelling.
...
"term": {
"tags.keyword": "SINGLE PHASE METER"
}
....
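If lowercase normalization at index time is wanted, one option in Elasticsearch 5.2 and later is a custom normalizer on the keyword sub-field. The sketch below only builds an index-creation body as a Python dict. The helper name and the normalizer name lowercase_normalizer are hypothetical, and the mapping type name document is taken from the results shown in the question.

```python
import json

def tags_index_body(type_name="document",
                    normalizer_name="lowercase_normalizer"):
    """Build index settings and mappings with a lowercased tags.keyword."""
    return {
        "settings": {
            "analysis": {
                "normalizer": {
                    # Custom normalizer that lowercases keyword values.
                    normalizer_name: {
                        "type": "custom",
                        "filter": ["lowercase"],
                    }
                }
            }
        },
        "mappings": {
            type_name: {
                "properties": {
                    "tags": {
                        "type": "text",
                        "fields": {
                            "keyword": {
                                "type": "keyword",
                                "normalizer": normalizer_name,
                            }
                        },
                    }
                }
            }
        },
    }

index_body = tags_index_body()
print(json.dumps(index_body, indent=2))
```

With a mapping like this, the values stored in tags.keyword are lowercased, so the lowercase term "single phase meter" from the first example is the form to query for.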

Elasticsearch query with fuzziness AUTO not working as expected

From the Elasticsearch documentation regarding fuzziness:
AUTO
Generates an edit distance based on the length of the term. Low and high distance arguments may be optionally provided AUTO:[low],[high]. If not specified, the default values are 3 and 6, equivalent to AUTO:3,6 that make for lengths:
0..2: Must match exactly
3..5: One edit allowed
>5: Two edits allowed
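Read literally, the thresholds quoted above amount to a simple mapping from term length to allowed edits. The following Python sketch restates that documented rule; it is not Elasticsearch's implementation, and the function name is made up for illustration.

```python
def auto_fuzziness_edits(term, low=3, high=6):
    """Maximum edit distance for AUTO:[low],[high] per the quoted rule."""
    length = len(term)
    if length < low:
        return 0  # shorter than low: must match exactly
    if length < high:
        return 1  # between low and high: one edit allowed
    return 2      # high or longer: two edits allowed

print(auto_fuzziness_edits("helqo"))         # default AUTO:3,6 -> 1
print(auto_fuzziness_edits("helqo", 7, 10))  # AUTO:7,10 -> 0
```

Under this reading, a five-character term such as "helqo" should require an exact match when AUTO:7,10 is used.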
However, when I specify the low and high distance arguments in the search query, the result is not what I expect.
I am using Elasticsearch 6.6.0 with the following index mapping:
{
"fuzzy_test": {
"mappings": {
"_doc": {
"properties": {
"description": {
"type": "text"
},
"id": {
"type": "keyword"
}
}
}
}
}
}
Inserting a simple document:
{
"id": "1",
"description": "hello world"
}
And the following search query:
{
"size": 10,
"timeout": "30s",
"query": {
"match": {
"description": {
"query": "helqo",
"fuzziness": "AUTO:7,10"
}
}
}
}
I assumed that fuzziness AUTO:7,10 would mean that for an input term with length <= 6, only documents with an exact match are returned. However, here is the result of my query:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.23014566,
"hits": [
{
"_index": "fuzzy_test",
"_type": "_doc",
"_id": "OQtUu2oBABnEwrgM3Ejr",
"_score": 0.23014566,
"_source": {
"id": "1",
"description": "hello world"
}
}
]
}
}
This is strange, but it seems that the bug exists only in Elasticsearch 6.6.0. I've tried 6.4.2 and 6.6.2, and both work just fine.

Elasticsearch OR query with nested objects returns inner_hits not matching the criteria

I'm getting weird results when querying nested objects. Imagine the following structure:
{ owner.name = "fred",
...,
pets [
{ name = "daisy", ... },
{ name = "flopsy", ... }
]
}
If I only have the document shown above, and I search for pets matching these criteria:
pets.name = "daisy" OR
(owner.name = "julie" and pets.name = "flopsy")
I would expect to get only one result ("daisy"), but I'm getting both pet names.
This is one way to reproduce this:
# Create nested mapping
PUT pet-owners
{
"mappings": {
"animals": {
"properties": {
"owner": {"type": "text"},
"pets": {
"type": "nested",
"properties": {
"name": {"type": "text", "fielddata": true}
}
}
}
}
}
}
# Insert nested object
PUT pet-owners/animals/1?op_type=create
{
"owner" : "fred",
"pets" : [
{ "name" : "daisy"},
{ "name" : "flopsy"}
]
}
# Query
GET pet-owners/_search
{ "from": 0, "size": 50,
"query": {
"constant_score": {
"filter": { "bool": {"must": [
{"bool": {"should": [
{"nested": {"query":
{"term": {"pets.name": "daisy"}},
"path":"pets",
"inner_hits": {
"name": "pets_hits_1",
"size": 99,
"_source": false,
"docvalue_fields": ["pets.name"]
}
}},
{"bool": {"must": [
{"term": {"owner": "julie"}},
{"nested": {"query":
{"term": {"pets.name": "flopsy"}},
"path":"pets",
"inner_hits": {
"name": "pets_hits_2",
"size": 99,
"_source": false,
"docvalue_fields": ["pets.name"]
}
}}
]}}
]}}
]}}}},
"_source": false
}
The query returns both pets names (as opposed to the expected one).
Is this behavior normal? Am I doing something wrong, or is my reasoning about the nested structure or the query behavior flawed?
Any help or guidance will be much appreciated.
I'm running this query on Elasticsearch 6.3.x.
EDIT: I'm adding the response received, to better illustrate the case
{
"took": 16,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "pet-owners",
"_type": "animals",
"_id": "1",
"_score": 1,
"inner_hits": {
"pets_hits_1": {
"hits": {
"total": 1,
"max_score": 0.6931472,
"hits": [
{
"_index": "pet-owners",
"_type": "animals",
"_id": "1",
"_nested": {
"field": "pets",
"offset": 0
},
"_score": 0.6931472,
"fields": {
"pets.name": [
"daisy"
]
}
}
]
}
},
"pets_hits_2": {
"hits": {
"total": 1,
"max_score": 0.6931472,
"hits": [
{
"_index": "pet-owners",
"_type": "animals",
"_id": "1",
"_nested": {
"field": "pets",
"offset": 1
},
"_score": 0.6931472,
"fields": {
"pets.name": [
"flopsy"
]
}
}
]
}
}
}
}
]
}
}
So we can see that it's not that the query matches and returns the whole existing document, but that it returns each of the pets independently, one inside each of the inner_hits. It's this result that's surprising to me.
(edited) In summary, this issue is about the context of the 'inner_hits':
It looks like the inner_hits 'pets_hits_2' returns a match because it belongs to the nested query that simply searches the pets field for 'flopsy'.
As an independent query on our single document, that is a valid hit.
However, because that query sits within a list of bool/must queries where other queries do not match our document, you might well expect inner_hits to pick up on this and not return a hit.
I haven't been able to find any docs that clarify whether this is intentional behaviour or not. It might be worth raising with Elastic.
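The behaviour described above can be mimicked with a toy model in Python. This is only an illustration of the reasoning, not how Elasticsearch evaluates nested queries; the helper nested_matches and the variable names are invented for the sketch.

```python
def nested_matches(doc, path, field, value):
    """Return the nested objects under `path` whose `field` equals `value`."""
    return [obj for obj in doc.get(path, []) if obj.get(field) == value]

doc = {"owner": "fred", "pets": [{"name": "daisy"}, {"name": "flopsy"}]}

# Clause 1: pets.name = daisy
hits_1 = nested_matches(doc, "pets", "name", "daisy")

# Clause 2: owner = julie AND pets.name = flopsy
owner_ok = doc["owner"] == "julie"
hits_2 = nested_matches(doc, "pets", "name", "flopsy")

# The parent document matches because clause 1 is satisfied.
doc_matches = bool(hits_1) or (owner_ok and bool(hits_2))

# Observed: each inner_hits block reports its own nested query,
# regardless of whether the surrounding bool/must succeeded.
observed = {"pets_hits_1": hits_1, "pets_hits_2": hits_2}

# Expected by the asker: inner hits gated on the enclosing clause.
expected = {"pets_hits_1": hits_1,
            "pets_hits_2": hits_2 if owner_ok else []}

print(doc_matches, observed, expected)
```

The gap between observed and expected in this model is exactly the extra "flopsy" entry in pets_hits_2.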

Why does elasticsearch filter does not give any results whereas using kibana dasboard gives the result?

I am querying Elasticsearch using Sense. When I use a range filter on a field I get empty hits, but I am able to get results using the Kibana dashboard. Why is the filter not working? My query:
GET _search
{
"query": {
"bool": {
"must": [
{"match": {"field_name1": "value1"}},
{"match": {"file_name2": "value2"}}
]
}
},
"filter": { <- not working (no data, but gets data from kibana)
"range": {
"#timestamp": {
"gte": "2017-02-18"
}
}
},
"sort": [
{
"#timestamp": {
"order": "desc",
"ignore_unmapped" : true
}
}
]
}
When I add the time in the Kibana dashboard, it adds time:(from:'2017-02-18T10:19:08.680Z',mode:absolute,to:'2017-02-19T10:19:08.680Z') and I am able to see results. The dashboard also adds some other things, like metadata and a filter with negate, but I think they do the same. Only the time part seems to be different. So why the difference, and is my query correct? The sample URL:
https://elasticsearch/app/kibana#/discover?
_g=(refreshInterval:(display:Off,pause:!f,value:0),time:(from:'2017-02-18T09:23:41.044Z',mode:absolute,to:'2017-02-19T09:23:41.044Z'))
&_a=(columns:!(description,id),filters:!(('$state':(store:appState),meta:(alias:!n,disabled:!f,index:index-value,key:field_name1,negate:!f,value:value1),query:(match:(field_name2:(query:value2,type:phrase))))),index:index-value,interval:auto,query:(query_string:(analyze_wildcard:!t,query:'*')),sort:!('#timestamp',desc),uiState:(),vis:(aggs:!((params:(field:field_name2,orderBy:'2',size:20),schema:segment,type:terms),(id:'2',schema:metric,type:count)),type:histogram))
&indexPattern=index-value&type=histogram
Thanks.
Sample json response:
{
"took": some_number,
"timed_out": false,
"_shards": {
"total": some_number,
"successful": some_number,
"failed": 0
},
"hits": {
"total": some_number,
"max_score": null,
"hits": [
{
"_index": "index-name",
"_type": "log-1",
"_id": "alphanum",
"_score": null,
"_source": {
"headers": "header-string",
"query_string": "query-string",
"server_variables": "server-variables",
"cookies": "cookies",
"extra_data": "some extra stuff",
"exception_data_obj": {
"stack_trace": "",
"source": "",
"message": "success",
"additional_data": ""
},
"some_id": "211FA1F1-F312-1234-B539-F7AAE23EAA2F",
"level": "Warn",
"description": "Success",
"#timestamp": "2017-01-20T01:33:27.303Z",
"field1": "value1",
"field2": "value2",
"key": {
"key.field1": "key.value1",
"key.field2": "key.value2"
},
"#by": "app-name",
"environment": "env-name"
},
"sort": [
1484876007303
]
},
{}
]
}
}
It's not the same query: in the Sense query you require matches on field1 and field2 through the must clause, but in Kibana you didn't.
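One way to make the two requests easier to compare is to write the Kibana time window as an explicit range clause next to the must clauses. The Python sketch below builds such a body. It assumes Elasticsearch 2.0 or later, where bool accepts a filter clause. The helper name is hypothetical, and field_name2 follows the spelling in the Kibana URL rather than file_name2 from the Sense query.

```python
import json

def sense_query_with_time_window(start, end, timestamp_field="#timestamp"):
    """Build a search body with both match clauses and a bounded time range."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"match": {"field_name1": "value1"}},
                    {"match": {"field_name2": "value2"}},
                ],
                # Same window as the Kibana time picker: from and to.
                "filter": [
                    {"range": {timestamp_field: {"gte": start, "lte": end}}}
                ],
            }
        },
        "sort": [{timestamp_field: {"order": "desc"}}],
    }

search_body = sense_query_with_time_window(
    "2017-02-18T10:19:08.680Z", "2017-02-19T10:19:08.680Z"
)
print(json.dumps(search_body, indent=2))
```

Running both requests with identical clauses makes it clearer whether the difference comes from the match clauses or from the time range.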

Which field matched query in multi_match search in Elasticsearch?

I have query with multi_match in Elasticsearch:
{
"query": {
"multi_match": {
"query": "luk",
"fields": [
"xml_string.autocomplete",
"state"
]
}
},
"size": 10,
"fields": [
"xml_string",
"state"
]
}
It works great, result returns expected value:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.41179964,
"hits": [
{
"_index": "documents",
"_type": "document",
"_id": "11",
"_score": 0.41179964,
"fields": {
"xml_string": "Lukas bla bla bla",
"state": "new"
}
}
]
}
}
I've searched a lot, but I can't find out which field matched the query (whether it was xml_string or state).
I have found a solution: I used the highlight feature and it works great.
This is how my curl request looks:
curl -X GET 'http://xxxxx.com:9200/documents/document/_search?load=false&size=10&pretty' -d '{
"query": {
"multi_match": {
"query": "123",
"fields": ["some_field", "another_field"]
}
},
"highlight": {
"fields": {
"some_field": {},
"another_field": {}
}
},
"size": 10,
"fields": ["field","another_field"]
}'
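Once highlighting is enabled, each hit carries a highlight object keyed by the fields that produced fragments, so the matching fields can be read from the response. Below is a minimal Python sketch of that step. The function name and the sample response are invented for illustration, and the response is trimmed to the parts the function uses.

```python
def matched_fields(response):
    """Map each hit's _id to the sorted list of fields with highlights."""
    result = {}
    for hit in response.get("hits", {}).get("hits", []):
        # A hit without a "highlight" key produced no fragments.
        result[hit["_id"]] = sorted(hit.get("highlight", {}).keys())
    return result

# Hypothetical response, trimmed to the relevant keys.
sample = {
    "hits": {
        "hits": [
            {"_id": "11", "highlight": {"some_field": ["<em>123</em> ..."]}},
            {"_id": "12"},
        ]
    }
}

print(matched_fields(sample))
```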
As far as I know, there is no feature that tells you which field matched the query.
But you can use the explain feature to debug your query. You only have to add the parameter &explain=true to your query. With this parameter you will see, for each field, an explanation of why it is in the result set, and from that you can infer which field matched the query.

Resources