Which field matched the query in a multi_match search in Elasticsearch?

I have a query with multi_match in Elasticsearch:
{
"query": {
"multi_match": {
"query": "luk",
"fields": [
"xml_string.autocomplete",
"state"
]
}
},
"size": 10,
"fields": [
"xml_string",
"state"
]
}
It works great; the result returns the expected value:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.41179964,
"hits": [
{
"_index": "documents",
"_type": "document",
"_id": "11",
"_score": 0.41179964,
"fields": {
"xml_string": "Lukas bla bla bla",
"state": "new"
}
}
]
}
}
I've searched a lot, but I am not able to find out which field matched the query (whether it was xml_string or state).

I have found a solution: I used the highlight feature, and it's working great.
This is what my curl command looks like:
curl -X GET 'http://xxxxx.com:9200/documents/document/_search?load=false&size=10&pretty' -d '{
"query": {
"multi_match": {
"query": "123",
"fields": ["some_field", "another_field"]
}
},
"highlight": {
"fields": {
"some_field": {},
"another_field": {}
}
},
"size": 10,
"fields": ["field","another_field"]
}'
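One way to turn the highlight output into a "which field matched" answer is to read the keys of each hit's highlight object. The sketch below is a minimal illustration in Python, assuming the response has the usual hits.hits[].highlight shape; the helper name matched_fields and the sample response are hand-written for illustration and are not real server output. Treat the result as a heuristic, since a field that yields no highlight fragment will not be listed.

```python
def matched_fields(response):
    """Map each hit's _id to the sorted list of fields that produced highlight fragments."""
    result = {}
    for hit in response.get("hits", {}).get("hits", []):
        # A field shows up under "highlight" only when it yielded a fragment.
        result[hit["_id"]] = sorted(hit.get("highlight", {}))
    return result


# Hand-written sample in the documented response shape (hypothetical data).
sample_response = {
    "hits": {
        "hits": [
            {"_id": "11", "highlight": {"some_field": ["<em>123</em> main street"]}},
            {"_id": "12"},
        ]
    }
}

print(matched_fields(sample_response))
```

A hit without a highlight key simply maps to an empty list, so the caller can tell "no highlighted field" apart from a missing hit.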

As far as I know, there is no feature that tells you which field matched the query.
But you can use the explain feature to debug your query. You only have to add the parameter &explain=true to your request. With this parameter you will see, for each hit, an explanation of why it is in the result set, from which you can work out which field matched the query.

Related

Elasticsearch OR query with nested objects returns inner_hits not matching the criteria

I'm getting weird results when querying nested objects. Imagine the following structure:
{ owner.name = "fred",
...,
pets [
{ name = "daisy", ... },
{ name = "flopsy", ... }
]
}
If I only have the document shown above, and I search pets matching these criteria:
pets.name = "daisy" OR
(owner.name = "julie" and pets.name = "flopsy")
I would expect to only get one result ("daisy"), but I'm getting both pet names.
This is one way to reproduce this:
# Create nested mapping
PUT pet-owners
{
"mappings": {
"animals": {
"properties": {
"owner": {"type": "text"},
"pets": {
"type": "nested",
"properties": {
"name": {"type": "text", "fielddata": true}
}
}
}
}
}
}
# Insert nested object
PUT pet-owners/animals/1?op_type=create
{
"owner" : "fred",
"pets" : [
{ "name" : "daisy"},
{ "name" : "flopsy"}
]
}
# Query
GET pet-owners/_search
{ "from": 0, "size": 50,
"query": {
"constant_score": {
"filter": { "bool": {"must": [
{"bool": {"should": [
{"nested": {"query":
{"term": {"pets.name": "daisy"}},
"path":"pets",
"inner_hits": {
"name": "pets_hits_1",
"size": 99,
"_source": false,
"docvalue_fields": ["pets.name"]
}
}},
{"bool": {"must": [
{"term": {"owner": "julie"}},
{"nested": {"query":
{"term": {"pets.name": "flopsy"}},
"path":"pets",
"inner_hits": {
"name": "pets_hits_2",
"size": 99,
"_source": false,
"docvalue_fields": ["pets.name"]
}
}}
]}}
]}}
]}}}},
"_source": false
}
The query returns both pets names (as opposed to the expected one).
Is this behavior normal? Am I doing something wrong, or is my reasoning about the nested structure or the query behavior flawed?
Any help or guidance will be much appreciated.
I'm running this query under Elasticsearch 6.3.x
EDIT: I'm adding the response received, to better illustrate the case
{
"took": 16,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "pet-owners",
"_type": "animals",
"_id": "1",
"_score": 1,
"inner_hits": {
"pets_hits_1": {
"hits": {
"total": 1,
"max_score": 0.6931472,
"hits": [
{
"_index": "pet-owners",
"_type": "animals",
"_id": "1",
"_nested": {
"field": "pets",
"offset": 0
},
"_score": 0.6931472,
"fields": {
"pets.name": [
"daisy"
]
}
}
]
}
},
"pets_hits_2": {
"hits": {
"total": 1,
"max_score": 0.6931472,
"hits": [
{
"_index": "pet-owners",
"_type": "animals",
"_id": "1",
"_nested": {
"field": "pets",
"offset": 1
},
"_score": 0.6931472,
"fields": {
"pets.name": [
"flopsy"
]
}
}
]
}
}
}
}
]
}
}
So we can see that it's not that the query matches and returns the whole existing document, but that it returns each of the pets independently, one inside each of the inner_hits. It's this result that's surprising to me.
(edited) - in summary this issue is around the context of the 'inner_hits':
It looks like the inner_hits 'pets_hits_2' returns a match because it belongs to the nested query that simply searches the pets field for 'flopsy'.
As an independent query on our single document, that is a valid hit.
However, because that query is within a list of bool/must queries, where other queries will not match on our document, you may well expect that the inner_hits should pick up on this and therefore not return a hit.
I haven't been able to find any docs to clarify whether this is intentional behaviour or not - might be worth raising with elastic ...
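The behaviour described above can be modelled with a small toy in Python. This is only a sketch of the observed semantics, not Elasticsearch's implementation: the function names are made up for illustration, and the "expected" variant encodes what the asker hoped for rather than anything the server does.

```python
DOC = {"owner": "fred", "pets": [{"name": "daisy"}, {"name": "flopsy"}]}


def nested_inner_hits(doc, pet_name):
    """Offsets of nested pets matching pet_name, evaluated on its own."""
    return [i for i, pet in enumerate(doc["pets"]) if pet["name"] == pet_name]


def evaluate(doc):
    hits_1 = nested_inner_hits(doc, "daisy")
    hits_2 = nested_inner_hits(doc, "flopsy")
    owner_is_julie = doc["owner"] == "julie"

    # The document as a whole matches: daisy OR (julie AND flopsy).
    if not (hits_1 or (owner_is_julie and hits_2)):
        return None

    # Observed: each nested clause reports its own inner hits,
    # regardless of whether its sibling "owner" clause matched.
    observed = {"pets_hits_1": hits_1, "pets_hits_2": hits_2}
    # Hoped for: inner hits of a branch whose enclosing must failed are dropped.
    expected = {"pets_hits_1": hits_1, "pets_hits_2": hits_2 if owner_is_julie else []}
    return observed, expected


observed, expected = evaluate(DOC)
print(observed)
print(expected)
```

If the stricter result is needed, one option under this model is to filter the inner hits client-side using the enclosing condition, which requires returning the owner field with the hit.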

Query string with boost fields in Elastic Search

I am using Query String with Boost Fields in Elasticsearch 1.7. It works fine, but in some scenarios I am not getting the expected result.
Query:
{
"from": 0,
"size": 10,
"explain": true,
"query": {
"function_score": {
"query": {
"query_string": {
"query": "account and data",
"fields": [
"title^5",
"authors^4",
"year^5",
"topic^6"
],
"default_operator": "and",
"analyze_wildcard": true
}
},
"score_mode": "sum",
"boost_mode": "sum",
"max_boost": 100
}
}
}
Sample Data :
{
"took": 50,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 4,
"max_score": 12.833213,
"hits": [
{
"_id": "19850",
"_score": 12.833213,
"_source": {
"ID": "19850",
"Year": "2010",
"Title": "account and data :..."
}
},
{
"_id": "16896",
"_score": 11.867042,
"_source": {
"ID": "16896",
"Year": "2014",
"Title": "effectivness of data..."
}
},
{
"_id": "59862",
"_score": 9.706333,
"_source": {
"ID": "59862",
"Year": "2007",
"Title": "best system to..."
}
},
{
"_id": "18501",
"_score": 9.685843,
"_source": {
"ID": "18501",
"Year": "2010",
"Title": "management of..."
}
}
]
}
}
The query returns the sample data above, which is as expected. But if I now increase the weight of year to 100, I expect the 4th result to move to 3rd position and the 3rd result to 4th. I tried many things, but I don't know what I am doing wrong.
The boost is only applied when the query matches the field you are boosting; it multiplies the score Elasticsearch computes for that field by the boost you defined. Your query looks for "account and data", which doesn't match any year, so the boost on year is never used.
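A toy model in Python makes the point concrete. It is a deliberate simplification (real scoring combines fields differently), and the per-field base score of 1.2 is a made-up number used only for illustration.

```python
def toy_score(base_scores, boosts):
    """Sum boost * base score over the fields that actually matched."""
    return sum(boosts.get(field, 1.0) * score for field, score in base_scores.items())


# Hypothetical document: only the title matched "account and data".
matched = {"title": 1.2}

score_year_5 = toy_score(matched, {"title": 5, "year": 5})
score_year_100 = toy_score(matched, {"title": 5, "year": 100})

# The year boost never enters the sum, so both scores are identical.
print(score_year_5 == score_year_100)
```

Because year contributes no base score for this query, raising its boost from 5 to 100 leaves the ranking untouched.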
Are you trying to take the year into account for ordering? If that is the case you can try adding the field_value_factor to your query like this:
"query" : {
"function_score": {
"query": { <your query goes here> },
"field_value_factor": {
"field": "year"
}
}
}
This multiplies the score Elasticsearch computes by the year, so it takes the year into account without necessarily ordering by it. You can read more about it here: https://www.elastic.co/guide/en/elasticsearch/guide/current/boosting-by-popularity.html
You can always use the explain tool to see how Elasticsearch came up with the score and therefore returned the results in that order: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-explain.html
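To get a feel for what field_value_factor would do to the four hits from the question, the multiplication can be replayed in Python. This is a rough sketch that assumes year is indexed as a number and that the function's defaults apply (plain multiplication, no modifier); the scores and years are copied from the sample response above.

```python
# (id, original _score, year) taken from the sample response in the question.
HITS = [
    ("19850", 12.833213, 2010),
    ("16896", 11.867042, 2014),
    ("59862", 9.706333, 2007),
    ("18501", 9.685843, 2010),
]


def rescore(hits):
    """Multiply each score by the year and sort by the new score, highest first."""
    rescored = [(doc_id, score * year) for doc_id, score, year in hits]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


new_order = [doc_id for doc_id, _ in rescore(HITS)]
print(new_order)
```

In this sketch the order stays the same, because the score gap between the third and fourth hits outweighs their three-year difference. If the year should dominate, a larger factor or a different way of combining the two values would be needed.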

Why does an Elasticsearch filter not give any results when the Kibana dashboard does?

I am querying Elasticsearch using Sense. When I use a range filter on a field, I get empty hits, but I am able to get results using the Kibana dashboard. Why is the filter not working? My query:
GET _search
{
"query": {
"bool": {
"must": [
{"match": {"field_name1": "value1"}},
{"match": {"field_name2": "value2"}}
]
}
},
"filter": { <- not working (no data, but gets data from kibana)
"range": {
"#timestamp": {
"gte": "2017-02-18"
}
}
},
"sort": [
{
"#timestamp": {
"order": "desc",
"ignore_unmapped" : true
}
}
]
}
When I set the time range in the Kibana dashboard, it adds time:(from:'2017-02-18T10:19:08.680Z',mode:absolute,to:'2017-02-19T10:19:08.680Z')) and I am able to see results. The dashboard also adds some other things, like metadata and a filter with negate, but I think they do the same thing. Only the time part seems to be different. So why the difference, and is my query correct? The sample URL:
https://elasticsearch/app/kibana#/discover?
_g=(refreshInterval:(display:Off,pause:!f,value:0),time:(from:'2017-02-18T09:23:41.044Z',mode:absolute,to:'2017-02-19T09:23:41.044Z'))
&_a=(columns:!(description,id),filters:!(('$state':(store:appState),meta:(alias:!n,disabled:!f,index:index-value,key:field_name1,negate:!f,value:value1),query:(match:(field_name2:(query:value2,type:phrase))))),index:index-value,interval:auto,query:(query_string:(analyze_wildcard:!t,query:'*')),sort:!('#timestamp',desc),uiState:(),vis:(aggs:!((params:(field:field_name2,orderBy:'2',size:20),schema:segment,type:terms),(id:'2',schema:metric,type:count)),type:histogram))
&indexPattern=index-value&type=histogram
Thanks.
Sample json response:
{
"took": some_number,
"timed_out": false,
"_shards": {
"total": some_number,
"successful": some_number,
"failed": 0
},
"hits": {
"total": some_number,
"max_score": null,
"hits": [
{
"_index": "index-name",
"_type": "log-1",
"_id": "alphanum",
"_score": null,
"_source": {
"headers": "header-string",
"query_string": "query-string",
"server_variables": "server-variables",
"cookies": "cookies",
"extra_data": "some extra stuff",
"exception_data_obj": {
"stack_trace": "",
"source": "",
"message": "success",
"additional_data": ""
},
"some_id": "211FA1F1-F312-1234-B539-F7AAE23EAA2F",
"level": "Warn",
"description": "Success",
"#timestamp": "2017-01-20T01:33:27.303Z",
"field1": "value1",
"field2": "value2",
"key": {
"key.field1": "key.value1",
"key.field2": "key.value2"
},
"#by": "app-name",
"environment": "env-name"
},
"sort": [
1484876007303
]
},
{}
]
}
}
It's not the same query: in the Sense query you required must matches on field1 and field2, but in Kibana you didn't.
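To compare the two requests like for like, it can help to build the Sense body from the same pieces Kibana uses, including both ends of the time window. The Python sketch below is one possible way to assemble such a body. It assumes an Elasticsearch version in which the bool query accepts a filter clause (the thread does not state the version), and the function name build_query is made up for illustration. The field names are the ones written in the question.

```python
import json


def build_query(field_values, timestamp_field, start, end):
    """Build a search body: match clauses plus a closed time range as a bool filter."""
    must = [{"match": {field: value}} for field, value in field_values.items()]
    time_range = {"range": {timestamp_field: {"gte": start, "lte": end}}}
    return {
        "query": {"bool": {"must": must, "filter": time_range}},
        "sort": [{timestamp_field: {"order": "desc"}}],
    }


body = build_query(
    {"field_name1": "value1", "field_name2": "value2"},
    "#timestamp",
    "2017-02-18T10:19:08.680Z",
    "2017-02-19T10:19:08.680Z",
)
print(json.dumps(body, indent=2))
```

Dropping entries from the first argument makes it easy to check whether the match clauses, rather than the time range, are what empties the result set.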

How to perform an exact match query on an analyzed field in Elasticsearch?

This is probably a very commonly asked question, but the answers I've found so far aren't satisfactory.
Problem:
I have an ES index composed of nearly 100 fields. Most of the fields are of string type and set as analyzed. However, a query can be either partial (match) or exact (more like term). So, if my index contains a string field with the value super duper cool pizza, a partial query like duper super should match the document, whereas an exact query like cool pizza should not. On the other hand, an exact query for Super Duper COOL PIzza should still match this document.
So far, the partial match part is easy: I used the AND operator in a match query. However, I can't get the other type done.
I have looked into other posts related to this problem and this post contains the closest solution:
Elasticsearch exact matches on analyzed fields
Out of the three solutions, the first one feels very complex, as I have a lot of fields and I do not use the REST API; I create queries dynamically using QueryBuilders with NativeSearchQueryBuilder from the Java API. It also generates a lot of possible patterns, which I think will cause performance issues.
The second one is a much easier solution, but I would have to maintain a lot more (almost) redundant data, and I don't think term queries are ever going to solve my problem.
The last one has a problem, I think: it will not prevent super duper from matching super duper cool pizza, which is not the output I want.
So is there any other way I can achieve the goal? I can post a sample mapping if that would help clarify the question further. I am already keeping the source as well (in case that can be used). Please feel free to suggest any improvements.
Thanks in advance.
[UPDATE]
Finally, I used multi_field, keeping a raw field for exact queries. When I insert I use some custom modification on data, and during searching, I used the same modification routines on input text. This part is not handled by Elasticsearch. If you want to do that, you have to design appropriate analyzers as well.
Index settings and mapping queries:
PUT test_index
POST test_index/_close
PUT test_index/_settings
{
"index": {
"analysis": {
"analyzer": {
"standard_uppercase": {
"type": "custom",
"char_filter": ["html_strip"],
"tokenizer": "keyword",
"filter": ["uppercase"]
}
}
}
}
}
PUT test_index/doc/_mapping
{
"doc": {
"properties": {
"text_field": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"analyzer": "standard_uppercase"
}
}
}
}
}
}
POST test_index/_open
Inserting some sample data:
POST test_index/doc/_bulk
{"index":{"_id":1}}
{"text_field":"super duper cool pizza"}
{"index":{"_id":2}}
{"text_field":"some other text"}
{"index":{"_id":3}}
{"text_field":"pizza"}
Exact query:
GET test_index/doc/_search
{
"query": {
"bool": {
"must": {
"bool": {
"should": {
"term": {
"text_field.raw": "PIZZA"
}
}
}
}
}
}
}
Response:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1.4054651,
"hits": [
{
"_index": "test_index",
"_type": "doc",
"_id": "3",
"_score": 1.4054651,
"_source": {
"text_field": "pizza"
}
}
]
}
}
Partial query:
GET test_index/doc/_search
{
"query": {
"bool": {
"must": {
"bool": {
"should": {
"match": {
"text_field": {
"query": "pizza",
"operator": "AND",
"type": "boolean"
}
}
}
}
}
}
}
}
Response:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 1,
"hits": [
{
"_index": "test_index",
"_type": "doc",
"_id": "3",
"_score": 1,
"_source": {
"text_field": "pizza"
}
},
{
"_index": "test_index",
"_type": "doc",
"_id": "1",
"_score": 0.5,
"_source": {
"text_field": "super duper cool pizza"
}
}
]
}
}
PS: These are generated queries, that's why there are some redundant blocks, as there would be many other fields concatenated into the queries.
Sad part is, now I need to rewrite the whole mapping again :(
I think this will do what you want (or at least come as close as is possible), using the keyword tokenizer and lowercase token filter:
PUT /test_index
{
"settings": {
"analysis": {
"analyzer": {
"lowercase_analyzer": {
"type": "custom",
"tokenizer": "keyword",
"filter": ["lowercase_token_filter"]
}
},
"filter": {
"lowercase_token_filter": {
"type": "lowercase"
}
}
}
},
"mappings": {
"doc": {
"properties": {
"text_field": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
},
"lowercase": {
"type": "string",
"analyzer": "lowercase_analyzer"
}
}
}
}
}
}
}
I added a couple of docs for testing:
POST /test_index/doc/_bulk
{"index":{"_id":1}}
{"text_field":"super duper cool pizza"}
{"index":{"_id":2}}
{"text_field":"some other text"}
{"index":{"_id":3}}
{"text_field":"pizza"}
Notice we have the outer text_field set to be analyzed by the standard analyzer, then a sub-field raw that's not_analyzed (you may not want this one, I just added it for comparison), and another sub-field lowercase that creates tokens exactly the same as the input text, except that they have been lowercased (but not split on whitespace). So this match query returns what you expected:
POST /test_index/_search
{
"query": {
"match": {
"text_field.lowercase": "Super Duper COOL PIzza"
}
}
}
...
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.30685282,
"hits": [
{
"_index": "test_index",
"_type": "doc",
"_id": "1",
"_score": 0.30685282,
"_source": {
"text_field": "super duper cool pizza"
}
}
]
}
}
Remember that the match query will use the field's analyzer against the search phrase as well, so in this case searching for "super duper cool pizza" would have exactly the same effect as searching for "Super Duper COOL PIzza" (you could still use a term query if you want an exact match).
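The difference between the analyzers can be imitated in a few lines of Python. The functions below are toy stand-ins, not Elasticsearch's real analyzers: standard_like only lowercases and splits on whitespace, and keyword_lowercase keeps the whole input as one lowercased token. All names are made up for illustration.

```python
def standard_like(text):
    """Rough stand-in for the standard analyzer: lowercase, split on whitespace."""
    return text.lower().split()


def keyword_lowercase(text):
    """Stand-in for keyword tokenizer + lowercase filter: one lowercased token."""
    return [text.lower()]


DOCS = {1: "super duper cool pizza", 2: "some other text", 3: "pizza"}


def exact_match(query):
    """Match against the single-token 'lowercase' sub-field."""
    wanted = keyword_lowercase(query)
    return [doc_id for doc_id, text in DOCS.items() if keyword_lowercase(text) == wanted]


def partial_match_and(query):
    """Match against the word-level field, requiring every query token (AND)."""
    wanted = standard_like(query)
    return [
        doc_id
        for doc_id, text in DOCS.items()
        if all(token in standard_like(text) for token in wanted)
    ]


print(exact_match("Super Duper COOL PIzza"))
print(exact_match("cool pizza"))
print(partial_match_and("duper super"))
```

The exact variant matches only when the whole lowercased string is identical, which is why cool pizza finds nothing while the differently cased full phrase still finds document 1.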
It's useful to take a look at the terms generated in each field by the three documents, since this is what your search queries will be working against (in this case raw and lowercase have the same tokens, but that's only because all the inputs were lower-case already):
POST /test_index/_search
{
"size": 0,
"aggs": {
"text_field_standard": {
"terms": {
"field": "text_field"
}
},
"text_field_raw": {
"terms": {
"field": "text_field.raw"
}
},
"text_field_lowercase": {
"terms": {
"field": "text_field.lowercase"
}
}
}
}
...
{
"took": 26,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 0,
"hits": []
},
"aggregations": {
"text_field_raw": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "pizza",
"doc_count": 1
},
{
"key": "some other text",
"doc_count": 1
},
{
"key": "super duper cool pizza",
"doc_count": 1
}
]
},
"text_field_lowercase": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "pizza",
"doc_count": 1
},
{
"key": "some other text",
"doc_count": 1
},
{
"key": "super duper cool pizza",
"doc_count": 1
}
]
},
"text_field_standard": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "pizza",
"doc_count": 2
},
{
"key": "cool",
"doc_count": 1
},
{
"key": "duper",
"doc_count": 1
},
{
"key": "other",
"doc_count": 1
},
{
"key": "some",
"doc_count": 1
},
{
"key": "super",
"doc_count": 1
},
{
"key": "text",
"doc_count": 1
}
]
}
}
}
Here's the code I used to test this out:
http://sense.qbox.io/gist/cc7564464cec88dd7f9e6d9d7cfccca2f564fde1
If you also want to do partial word matching, I would encourage you to take a look at ngrams. I wrote up an introduction for Qbox here:
https://qbox.io/blog/an-introduction-to-ngrams-in-elasticsearch

Elastic Search fulltext search query and filters

I wanna perform a full-text search, but I also wanna use one or many possible filters. The simplified structure of my document, when searching with /things/_search?q=*foo*:
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "things",
"_type": "thing",
"_id": "63",
"_score": 1,
"fields": {
"name": [
"foo bar"
],
"description": [
"this is my description"
],
"type": [
"inanimate"
]
}
}
]
}
}
This works well enough, but how do I combine filters with a query? Let's say I wanna search for "foo" in an index with multiple documents, but I only want to get those with type == "inanimate"?
This is my attempt so far:
{
"query": {
"filtered": {
"query": {
"query_string": {
"query": "*foo*"
}
},
"filter": {
"bool": {
"must": {
"term": { "type": "inanimate" }
}
}
}
}
}
}
When I remove the filter part, it returns an accurate set of document hits. But with this filter-definition it does not return anything, even though I can manually verify that there are documents with type == "inanimate".
Since you have not defined an explicit mapping, the type field is analyzed, while the term query looks for an exact match. You need to add "index": "not_analyzed" to the type field, and then your query will work.
This will give you the correct documents:
{
"query": {
"match": {
"type": "inanimate"
}
}
}
But this is not the real solution; you need an explicit mapping, as I said.
