Find actual matching word when using fuzzy query in elastic search

Find actual matching word when using fuzzy query in elastic search - elasticsearch

I am new to elasticsearch and was looking around fuzzy query search.
I have made a new index products with object/record values like this
{
"_index": "products",
"_type": "product",
"_id": "10",
"_score": 1,
"_source": {
"value": [
"Ipad",
"Apple",
"Air",
"32 GB"
]
}
}
Now when i am performing a fuzzy query search in elasticsearch like
{
query: {
fuzzy: {
value: "tpad"
}
}
}
It returns me the correct record (the product just made above) which is expected.
And i know that the term tpad matches ipad so record was return.
But technically how would i know that it has matched ipad. Elastic search just returns the full record(or records) like this
{
"took": 4,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.61489093,
"hits": [
{
"_index": "products",
"_type": "product",
"_id": "10",
"_score": 0.61489093,
"_source": {
"value": [
"Ipad",
"Apple",
"Air",
"32 GB"
]
}
}
]
}
}
Is there any way in elastic search so that i can know if it has matched tpad against ipad

if you use highlighting, Elasticsearch will show the terms that matched:
curl -XGET http://localhost:9200/products/product/_search?pretty -d '{
"query" : {
"fuzzy" : {
"value" : "tpad"
}
},
"highlight": {
"fields" : {
"value" : {}
}
}
}'
Elasticsearch will return matching documents with the fragment highlighted:
{
"took" : 31,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 0.13424811,
"hits" : [ {
"_index" : "products",
"_type" : "product",
"_id" : "10",
"_score" : 0.13424811,
"_source":{
"value" : ["Ipad",
"Apple",
"Air",
"32 GB"
]
},
"highlight" : {
"value" : [ "<em>Ipad</em>" ]
}
} ]
}
}

if you just want to analyze the result, you could use the Inquisitor plugin.
If you need to do this programmatically, I think the highlighting feature will help you:
Determining which words were matched in a fuzzy search

I know the question is older but I just ran into it. The way I do it is by populating the query name field when building the query. This way it will come back inside the "matchedQuery" field in response. Hope this helps :)

Related

where does the value for the _doc come from when sort in elastic search

I am using 7.10.1, and I have put into the index with the following data:
PUT /lib15/_doc/1
{
"price":32
}
PUT /lib15/_doc/2
{
"price":21
}
PUT /lib15/_doc/3
{
"price":48
}
PUT /lib15/_doc/4
{
"price":40
}
PUT /lib15/_doc/5
{
"price":42
}
Then I do the following query,
GET /lib15/_search
{
"size": 2,
"query": {
"match_all": {}
},
"sort": [
{
"price": "desc"
},
{
"_doc": "desc"
}
]
}
The result is:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 5,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "lib15",
"_type" : "_doc",
"_id" : "3",
"_score" : null,
"_source" : {
"price" : 48
},
"sort" : [
48,
2
]
},
{
"_index" : "lib15",
"_type" : "_doc",
"_id" : "5",
"_score" : null,
"_source" : {
"price" : 42
},
"sort" : [
42,
4
]
}
]
}
}
I would ask 2 and 4 in "sort":[48,2] and "sort":[42,4] come from? Are they _doc value? but they are not equal to the _id.

As mentioned in the official documentation of sort, _doc is used to sort by index order.
it means your document containing price was indexed in 2nd and price containing document was indexed in 4th order.
Update:, I used the same order of insertion which you provided and was able to get the same order which you provided, although in the example we are indexing price:48 in 3rd and price:42 in 5th order, but when you use the GET api with these document-id, it prints the _seq_no which is 2 and 4 as shown below:
GET http://localhost:9900/lib15/_doc/3
{
"_index": "lib15",
"_type": "_doc",
"_id": "3",
"_version": 1,
"_seq_no": 2, // note for id 3, seq_no is 2
"_primary_term": 1,
"found": true,
"_source": {
"price": 48
}
}
And GET http://localhost:9900/lib15/_doc/5
{
"_index": "lib15",
"_type": "_doc",
"_id": "5",
"_version": 1,
"_seq_no": 4, // // note for id 5, seq_no is 4
"_primary_term": 1,
"found": true,
"_source": {
"price": 42
}
}

Named rescore queries

Named queries help me to identify which part of the query hit.
For normal queries this works perfectly.
However for rescore queries named queries don't show up in the response.
Question: Is this a bug or intentional? Is there a workaround?
Update: I raised a feature request
I attached some code to reproduce the problem:
Set up a test index with a single document
PUT /test
{
"mappings": {
"properties": {
"field1": {
"type": "keyword"
},
"field2": {
"type": "keyword"
}
}
}
}
POST /test/_doc/1
{
"field1": "a",
"field2": "b"
}
"Normal" and rescore query.
GET /test/_search
{
"query": {
"term": {
"field1": {
"value": "a",
"_name": "query_field_1"
}
}
},
"rescore": {
"query": {
"rescore_query": {
"term": {
"field2": {
"value": "b",
"_name": "query_field_2"
}
}
}
},
"window_size": 50
}
}
Response: Only the name of the "normal" query shows up in matched_queries.
That "query_field_2" must have also hit can be ensured by comparing the score with and without the rescore query.
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.5753642,
"hits" : [
{
"_index" : "test",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.5753642,
"_source" : {
"field1" : "a",
"field2" : "b"
},
"matched_queries" : [
"query_field_1" <<-----HERE I'D EXPECT query_field_2------
]
}
]
}
}

The naming is perhaps unfortunate but the rescore query just tweaks the scoring and is applied after the query and post_filter phases. So since it's not an actual query, it cannot be _named.
It's certainly worth a feature request though.

elasticsearch - only return specific fields without _source?

I've found some answer like
Make elasticsearch only return certain fields?
But they all need _source field.
In my system, disk and network are both scarce resources.
I can't store _source field and I don't need _index, _score field.
ElasticSearch Version: 5.5
Index Mapping just likes
{
"index_2020-04-08": {
"mappings": {
"type1": {
"_all": {
"enabled": false
},
"_source": {
"enabled": false
},
"properties": {
"rank_score": {
"type": "float"
},
"first_id": {
"type": "keyword"
},
"second_id": {
"type": "keyword"
}
}
}
}
}
}
My query:
GET index_2020-04-08/type1/_search
{
"query": {
"bool": {
"filter": {
"term": {
"first_id": "hello"
}
}
}
},
"size": 1000,
"sort": [
{
"rank_score": {
"order": "desc"
}
}
]
}
The search results I got :
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 2,
"max_score": null,
"hits": [
{
"_index": "index_2020-04-08",
"_type": "type1",
"_id": "id_1",
"_score": null,
"sort": [
0.06621722
]
},
{
"_index": "index_2020-04-08",
"_type": "type1",
"_id": "id_2",
"_score": null,
"sort": [
0.07864579
]
}
]
}
}
The results I want:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 2,
"max_score": null,
"hits": [
{
"_id": "id_1"
},
{
"_id": "id_2"
}
]
}
}
Can I implement it?

To return specific fields in the document, you must do one of the two:
Include the _source field in your documents, which is enabled by default.
Store specific fields with the stored fields feature which must be enabled manually
Because you want pretty much the document Ids and some metadata, you can use the filter_path feature.
Here's an example that's close to what you want (just change the field list):
$ curl -X GET "localhost:9200/metricbeat-7.6.1-2020.04.02-000002/_search?filter_path=took,timed_out,_shards,hits.total,hits.max_score,hits.hits._id&pretty"
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10000,
"relation" : "gte"
},
"max_score" : 1.0,
"hits" : [
{
"_id" : "8SEGSHEBzNscjCyQ18cg"
},
{
"_id" : "8iEGSHEBzNscjCyQ18cg"
},
{
"_id" : "8yEGSHEBzNscjCyQ18cg"
},
{
"_id" : "9CEGSHEBzNscjCyQ18cg"
},
{
"_id" : "9SEGSHEBzNscjCyQ18cg"
},
{
"_id" : "9iEGSHEBzNscjCyQ18cg"
},
{
"_id" : "9yEGSHEBzNscjCyQ18cg"
},
{
"_id" : "-CEGSHEBzNscjCyQ18cg"
},
{
"_id" : "-SEGSHEBzNscjCyQ18cg"
},
{
"_id" : "-iEGSHEBzNscjCyQ18cg"
}
]
}
}

Just to clarify based on the SO question you linked -- you're not storing the _source, you're requesting it from ES. It's usually used to limit what you want to have retrieved, i.e.
...
"_source": ["only", "fields", "I", "need"]
...
_score, _index etc are meta fields that are going to be retrieved no matter what. You can "hack" it a bit by seeting the size to 0 and aggregating, i.e.
{
"size": 0,
"aggs": {
"by_ids": {
"terms": {
"field": "_id"
}
}
}
}
which will save you a few bytes
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"terms" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "Ac76WXEBnteqn982smh_",
"doc_count" : 1
},
{
"key" : "As77WXEBnteqn982EGgq",
"doc_count" : 1
}
]
}
}
}
but performing aggregations has a cost of its own.

MLT (More Like This) elasticsearch query

I'm trying to use elasticsearch MLT (More Like This) query.
Only one doc in store:
{
"_index": "monitors",
"_type": "monitor",
"_id": "AVTnvJ8SancUpEdFLMiq",
"_score": 1,
"_source": {
"ProcessGroup": "test",
"ProcessName": "test",
"OpName": "test",
"Domain": "test",
"LogLevel": "Info",
"StartDateTime": "2016-05-04 04:46:47",
"EndDateTime": "2016-05-04 04:47:47",
"MessageDateTime": "2016-05-04 04:46:47",
"ApplicationCode": "test",
"Status": "10",
}
}
Query:
POST /_search
{
"query": {
"more_like_this" : {
"fields" : ["ProcessName"],
"like" : "test",
"min_term_freq" : 1,
"max_query_terms" : 12
}
}
}
ProcessName is a not analyzed field.
I was expected to get this document as a response, but instead i got nada:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 2,
"successful": 2,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
Why is that ?
Another question:
Suppose I have search engines docs, and I search for "stph". I expect to get "Stephan Curry" suggestion because it's commonly searched. Fuzzy search doesn't fit because distance is greater than 2, so does using MLT query is a good option for this scenario ?

Elasticsearch Prefix query not working on nested documents

I'm using a prefix query for an elasticsearch query. It works fine when using it on top-level data, but once applied to nested data there are no results returned. The data I try to query looks as follows:
Here the prefix query works fine:
Query:
{ "query": { "prefix" : { "duration": "7"} } }
Result:
{
"took": 25, ... },
"hits": {
"total": 6,
"max_score": 1,
"hits": [
{
"_index": "itemresults",
"_type": "itemresult",
"_id": "ITEM_RESULT_7c8649c2-6cb0-487e-bb3c-c4bf0ad28a90_8bce0a3f-f951-4a01-94b5-b55dea1a2752_7c965241-ad0a-4a83-a400-0be84daab0a9_61",
"_score": 1,
"_source": {
"score": 1,
"studentId": "61",
"timestamp": 1377399320017,
"groupIdentifiers": {},
"assessmentItemId": "7c965241-ad0a-4a83-a400-0be84daab0a9",
"answered": true,
"duration": "7.078",
"metadata": {
"Korrektur": "a",
"Matrize12_13": "MA.1.B.1.d.1",
"Kompetenz": "ZuV",
"Zyklus": "Z2",
"Schwierigkeit": "H",
"Handlungsaspekt": "AuE",
"Fach": "MA",
"Aufgabentyp": "L"
},
"assessmentSessionId": "7c8649c2-6cb0-487e-bb3c-c4bf0ad28a90",
"assessmentId": "8bce0a3f-f951-4a01-94b5-b55dea1a2752"
}
},
Now trying to use the prefix query to apply on the nested structure 'metadata' doesn't return any result:
{ "query": { "prefix" : { "metadata.Fach": "M"} } }
Result:
{
"took": 18,
"timed_out": false,
"_shards": {
"total": 15,
"successful": 15,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
What am I doing wrong? Is it at all possible to apply prefix on nested data?

It does not depends whether is nested or not. It depends on your mapping, if you are analyzing the string at index time or not.
I'm going to put an example:
I've created and index with the following mapping:
curl -XPUT 'http://localhost:9200/test/' -d '
{
"mappings": {
"test" : {
"properties" : {
"text_1" : {
"type" : "string",
"index" : "analyzed"
},
"text_2" : {
"index": "not_analyzed",
"type" : "string"
}
}
}
}
}'
Basically 2 text fields, one analyzed and the other not_analyzed. Now I index the following document:
curl -XPUT 'http://localhost:9200/test/test/1' -d '
{
"text_1" : "Hello world",
"text_2" : "Hello world"
}'
text_1 query
As text_1 is analyzed one of the things that elasticsearch does is to convert the field into lower case. So if I make the following query it doesn't find any document:
curl -XGET 'http://localhost:9200/test/test/_search?pretty=true' -d '
{ "query": { "prefix" : { "text_1": "H"} } }
'
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 0,
"max_score" : null,
"hits" : [ ]
}
}
But if I do the trick and use lower case for making the query:
curl -XGET 'http://localhost:9200/test/test/_search?pretty=true' -d '
{ "query": { "prefix" : { "text_1": "h"} } }
'
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [ {
"_index" : "test",
"_type" : "test",
"_id" : "1",
"_score" : 1.0, "_source" :
{
"text_1" : "Hello world",
"text_2" : "Hello world"
}
} ]
}
}
text_2 query
As text_2 is not analyzed, when I make the original query it matches:
curl -XGET 'http://localhost:9200/test/test/_search?pretty=true' -d '
{ "query": { "prefix" : { "text_2": "H"} } }
'
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [ {
"_index" : "test",
"_type" : "test",
"_id" : "1",
"_score" : 1.0, "_source" :
{
"text_1" : "Hello world",
"text_2" : "Hello world"
}
} ]
}
}

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

Find actual matching word when using fuzzy query in elastic search - elasticsearch

if you just want to analyze the result, you could use the Inquisitor plugin. If you need to do this programmatically, I think the highlighting feature will help you: Determining which words were matched in a fuzzy search

I know the question is older but I just ran into it. The way I do it is by populating the query name field when building the query. This way it will come back inside the "matchedQuery" field in response. Hope this helps :)

Related

where does the value for the _doc come from when sort in elastic search

Named rescore queries

elasticsearch - only return specific fields without _source?

MLT (More Like This) elasticsearch query

Elasticsearch Prefix query not working on nested documents

Categories

Resources