I have the following mapping for my phrase suggester:
{
  "settings": {
    "analysis": {
      "analyzer": {
        "suggests_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "asciifolding",
            "shingle_filter"
          ],
          "type": "custom"
        }
      },
      "filter": {
        "shingle_filter": {
          "min_shingle_size": 2,
          "max_shingle_size": 6,
          "type": "shingle"
        }
      }
    }
  },
  "mappings": {
    "sample_data": {
      "properties": {
        "name": {
          "type": "string",
          "analyzer": "suggests_analyzer"
        }
      }
    }
  }
}
I have "lung cancer", "colorectal cancer", "breast cancer" indexed in my index. But when I search with a misspelled phrase where both words are misspelled, like "lhng cancar", I get zero results when I use the collate functionality. My sample query is as follows:
{
  "suggest": {
    "text": "lhng cancar",
    "simple_phrase": {
      "phrase": {
        "field": "name",
        "size": 5,
        "real_word_error_likelihood": 0.95,
        "max_errors": 0.5,
        "direct_generator": [
          {
            "field": "name",
            "suggest_mode": "always",
            "size": 5
          }
        ],
        "collate": {
          "query": {
            "inline": {
              "match_phrase": {
                "{{field_name}}": "{{suggestion}}"
              }
            }
          },
          "params": {
            "field_name": "name"
          },
          "prune": false
        }
      }
    }
  },
  "size": 0
}
The response to the above query is:
{
  "took": 17,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1868381,
    "max_score": 0,
    "hits": []
  },
  "suggest": {
    "simple_phrase": [
      {
        "text": "lhng cancar",
        "offset": 0,
        "length": 11,
        "options": []
      }
    ]
  }
}
What changes do I need to make in the query so that I get the expected result "lung cancer" in the suggestions?
You have to raise max_errors to 0.8 or more. With max_errors at 0.5, at most half of the query terms (here, only one of the two words) may be misspelled, so a phrase where both words are wrong can never produce a suggestion. Values greater than 1 are interpreted as an absolute number of allowed errors rather than a fraction.
Same answer is given here
ElasticSearch - Phrase Suggestor
Raising max_errors to 2 solved my problem.
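For reference, here is a version of the suggest request from the question with only max_errors raised (every other setting unchanged), treating the value 2 as an absolute error count:

```json
{
  "suggest": {
    "text": "lhng cancar",
    "simple_phrase": {
      "phrase": {
        "field": "name",
        "size": 5,
        "real_word_error_likelihood": 0.95,
        "max_errors": 2,
        "direct_generator": [
          {
            "field": "name",
            "suggest_mode": "always",
            "size": 5
          }
        ],
        "collate": {
          "query": {
            "inline": {
              "match_phrase": {
                "{{field_name}}": "{{suggestion}}"
              }
            }
          },
          "params": {
            "field_name": "name"
          },
          "prune": false
        }
      }
    }
  },
  "size": 0
}
```

With both terms now allowed to be corrected, the suggester can propose "lung cancer", and the collate match_phrase check should then let it through.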
When I have a top_hits aggregation nested inside a terms aggregation inside a children aggregation, I get a null_pointer_exception. I expect to get a valid response.
Steps to reproduce:
create mapping
PUT http://localhost:9200/test
{
  "mappings": {
    "doc": {
      "properties": {
        "docType": {
          "type": "text"
        },
        "userId": {
          "type": "long"
        },
        "userName": {
          "type": "text"
        },
        "title": {
          "type": "text"
        },
        "joinField": {
          "type": "join",
          "relations": {
            "post": "comment"
          }
        }
      }
    }
  }
}
insert example post
PUT http://localhost:9200/test/doc/1
{
  "joinField": {
    "name": "post"
  },
  "docType": "post",
  "title": "Example Post"
}
insert comment
PUT http://localhost:9200/test/doc/2?routing=1
{
  "joinField": {
    "name": "comment",
    "parent": "1"
  },
  "userId": 22,
  "userName": "John Doe",
  "title": "Random comment",
  "docType": "comment"
}
Perform search
POST http://localhost:9200/test/doc/_search
{
  "aggs": {
    "to-comment": {
      "children": {
        "type": "comment"
      },
      "aggs": {
        "by-user": {
          "terms": {
            "field": "userId"
          },
          "aggs": {
            "data": {
              "top_hits": {
                "size": 1
              }
            }
          }
        }
      }
    }
  },
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "docType": "post"
          }
        }
      ]
    }
  }
}
Response:
{
  "took": 10,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 4,
    "skipped": 0,
    "failed": 1,
    "failures": [
      {
        "shard": 3,
        "index": "test",
        "node": "0RbF1bIbRO-yN5C1m-HXPA",
        "reason": {
          "type": "null_pointer_exception",
          "reason": null
        }
      }
    ]
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "to-comment": {
      "doc_count": 0,
      "by-user": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": []
      }
    }
  }
}
It works if I remove the query, but in the actual hits I only want to get the posts. It also works if I remove the terms aggregation, but I want to filter the posts by other queries (e.g. a match on title).
It seems that this is a bug in Elasticsearch. The bug has been reported and will hopefully be fixed soon (https://github.com/elastic/elasticsearch/issues/37650).
If you have any alternative solutions on how to build a similar aggregation, please let me know.
Edit: You can use the Painless scripting language for a work-around:
"script": {
  "lang": "painless",
  "source": "params._source.userName"
}
I'm using Elasticsearch 6.2 configured with one cluster of 2 nodes.
GET _cluster/health:
{
  "cluster_name": "cluster_name",
  "status": "green",
  "timed_out": false,
  "number_of_nodes": 2,
  "number_of_data_nodes": 2,
  "active_primary_shards": 47,
  "active_shards": 94,
  "relocating_shards": 0,
  "initializing_shards": 0,
  "unassigned_shards": 0,
  "delayed_unassigned_shards": 0,
  "number_of_pending_tasks": 0,
  "number_of_in_flight_fetch": 0,
  "task_max_waiting_in_queue_millis": 0,
  "active_shards_percent_as_number": 100
}
GET myindex/_settings:
{
  "myindex": {
    "settings": {
      "index": {
        "number_of_shards": "3",
        "analysis": {
          "analyzer": {
            "url_split_analyzer": {
              "filter": "lowercase",
              "tokenizer": "url_split"
            }
          },
          "tokenizer": {
            "url_split": {
              "pattern": "[^a-zA-Z0-9]",
              "type": "pattern"
            }
          }
        },
        "number_of_replicas": "1",
        "version": {
          "created": "6020499"
        }
      }
    }
  }
}
Here is a snapshot of the _mappings structure:
"myindex": {
  "mappings": {
    "mytype": {
      "properties": {
        "#timestamp": {
          "type": "date"
        },
        ............
        "active": {
          "type": "short"
        },
        "id_domain": {
          "type": "short",
          "ignore_malformed": true
        },
        "url": {
          "type": "text",
          "similarity": "boolean",
          "analyzer": "url_split_analyzer"
        }
      }
      .......
I have stumbled upon documents in my index that I cannot find when I query the index by the id_domain property.
For example:
GET /myindex/mytype/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": { "active": 1 }
        }
      ]
    }
  }
}
output example:
{
  "_index": "myindex",
  "_type": "mytype",
  "_id": "myurl",
  "_score": 1,
  "_source": {
    "id_domain": "73993",
    "active": 1,
    "url": "myurl",
    "#timestamp": "2018-05-21T10:55:16.247Z"
  }
}
....
This returns a list of documents containing id_domain values that I then cannot find when querying against that id_domain directly, like this:
GET /myindex/mytype/_search
{
  "query": {
    "match": {
      "id_domain": 73993 // with or without " got the same result
    }
  }
}
output
{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}
I cannot understand why this happens.
I also tried to reindex the index but I got the same result.
I am convinced that I'm missing something.
Is there any reason for this behaviour?
Thank you
In your mapping, id_domain has type short, but your document contains a value that is out of bounds for short values ([-32,768 to 32,767]), i.e. 73993. Because the field also has "ignore_malformed": true, the out-of-range value is silently dropped at index time: it remains visible in _source but is never indexed, which is why the document cannot be found by querying that field.
You need to change the type to integer and all will be fine.
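Since the type of an existing field cannot be changed in place, you would create a new index with the corrected type and reindex into it. As a sketch, the corrected field mapping (field name and ignore_malformed setting taken from the question) would be:

```json
"id_domain": {
  "type": "integer",
  "ignore_malformed": true
}
```

An integer covers the range [-2,147,483,648 to 2,147,483,647], so values like 73993 will be indexed and become searchable.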
I am new to Elasticsearch and am requesting some help.
Basically I have some 2 million documents in my Elasticsearch index, and the documents look like below:
{
  "_index": "flipkart",
  "_type": "PSAD_ThirdParty",
  "_id": "430001_MAM_2016-02-04",
  "_version": 1,
  "_score": 1,
  "_source": {
    "metrics": [
      {
        "id": "Metric1",
        "value": 70
      },
      {
        "id": "Metric2",
        "value": 90
      },
      {
        "id": "Metric3",
        "value": 120
      }
    ],
    "primary": true,
    "ticketId": 1,
    "pliId": 206,
    "bookedNumbers": 15000,
    "ut": 1454567400000,
    "startDate": 1451629800000,
    "endDate": 1464589800000,
    "tz": "EST"
  }
}
I want to write an aggregation query which satisfies below conditions:
1) First, filter the documents by "_index", "_type" and "pliId".
2) Then compute a sum over metrics.value, restricted to entries where metrics.id = "Metric1".
Basically I need to query records based on some fields and sum a particular metric's value based on its metric id.
Please can you help me get my query right?
Your metrics field needs to be of type nested:
"metrics": {
  "type": "nested",
  "properties": {
    "id": {
      "type": "string",
      "index": "not_analyzed"
    }
  }
}
If you want Metric1 to match exactly, upper-case letter included, the id field needs to be not_analyzed, as shown above.
Then, if you only want metrics.id = "Metric1" aggregations, you need something like this:
{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            {
              "term": {
                "pliId": 206
              }
            }
          ]
        }
      }
    }
  },
  "aggs": {
    "by_metrics": {
      "nested": {
        "path": "metrics"
      },
      "aggs": {
        "metric1_only": {
          "filter": {
            "bool": {
              "must": [
                {
                  "term": {
                    "metrics.id": {
                      "value": "Metric1"
                    }
                  }
                }
              ]
            }
          },
          "aggs": {
            "by_metric_id": {
              "terms": {
                "field": "metrics.id"
              },
              "aggs": {
                "total_delivery": {
                  "sum": {
                    "field": "metrics.value"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
Created a new index:
Method: PUT
URL: http://localhost:9200/google/
Body:
{
  "mappings": {
    "PSAD_Primary": {
      "properties": {
        "metrics": {
          "type": "nested",
          "properties": {
            "id": {
              "type": "string",
              "index": "not_analyzed"
            },
            "value": {
              "type": "integer",
              "index": "not_analyzed"
            }
          }
        }
      }
    }
  }
}
Then I inserted some 200 thousand documents and then ran the query, and it worked.
Response:
{
  "took": 34,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "google",
        "_type": "PSAD_Primary",
        "_id": "383701291_MAM_2016-01-06",
        "_score": 1,
        "_source": {
          "metrics": [
            {
              "id": "Metric1",
              "value": 70
            },
            {
              "id": "Metric2",
              "value": 90
            },
            {
              "id": "Metric3",
              "value": 120
            }
          ],
          "primary": true,
          "ticketId": 1,
          "pliId": 221244,
          "bookedNumbers": 15000,
          "ut": 1452061800000,
          "startDate": 1451629800000,
          "endDate": 1464589800000,
          "tz": "EST"
        }
      }
    ]
  },
  "aggregations": {
    "by_metrics": {
      "doc_count": 3,
      "metric1_only": {
        "doc_count": 1,
        "by_metric_id": {
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 0,
          "buckets": [
            {
              "key": "Metric1",
              "doc_count": 1,
              "total_delivery": {
                "value": 70
              }
            }
          ]
        }
      }
    }
  }
}
This is my document/mapping with a nested prices array:
{
  "name": "Foobar",
  "type": 1,
  "prices": [
    {
      "date": "2016-03-22",
      "price": 100.41
    },
    {
      "date": "2016-03-23",
      "price": 200.41
    }
  ]
}
Mapping:
{
  "properties": {
    "name": {
      "index": "not_analyzed",
      "type": "string"
    },
    "type": {
      "type": "byte"
    },
    "prices": {
      "type": "nested",
      "properties": {
        "date": {
          "format": "dateOptionalTime",
          "type": "date"
        },
        "price": {
          "type": "double"
        }
      }
    }
  }
}
I use a top_hits aggregation to get the min price of the nested price array. I also have to filter the prices by date. Here is the query and the response:
POST /index/type/_search
{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggs": {
    "prices": {
      "nested": {
        "path": "prices"
      },
      "aggs": {
        "date_filter": {
          "filter": {
            "range": {
              "prices.date": {
                "gte": "2016-03-21"
              }
            }
          },
          "aggs": {
            "min": {
              "top_hits": {
                "sort": {
                  "prices.price": {
                    "order": "asc"
                  }
                },
                "size": 1
              }
            }
          }
        }
      }
    }
  }
}
Response:
{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "prices": {
      "doc_count": 4,
      "date_filter": {
        "doc_count": 4,
        "min": {
          "hits": {
            "total": 4,
            "max_score": null,
            "hits": [
              {
                "_index": "index",
                "_type": "type",
                "_id": "4225796ALL2016061541031",
                "_nested": {
                  "field": "prices",
                  "offset": 0
                },
                "_score": null,
                "_source": {
                  "date": "2016-03-22",
                  "price": 100.41
                },
                "sort": [
                  100.41
                ]
              }
            ]
          }
        }
      }
    }
  }
}
Is there a way to get the parent source document (or some fields from it) with _id="4225796ALL2016061541031" in the response (e.g. name)? A second query is not an option.
Instead of applying aggregations, use a nested query with inner_hits, like:
{
  "query": {
    "nested": {
      "path": "prices",
      "query": {
        "range": {
          "prices.date": {
            "gte": "2016-03-21"
          }
        }
      },
      "inner_hits": {
        "sort": {
          "prices.price": {
            "order": "asc"
          }
        },
        "size": 1
      }
    }
  }
}
Then fetch the parent document's data from _source and the matching nested price from inner_hits.
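For illustration, each hit in the response then carries the matched nested price under inner_hits; an abridged sketch (using the sample document's values) would look roughly like:

```json
{
  "_source": {
    "name": "Foobar",
    "type": 1
  },
  "inner_hits": {
    "prices": {
      "hits": {
        "hits": [
          {
            "_source": {
              "date": "2016-03-22",
              "price": 100.41
            }
          }
        ]
      }
    }
  }
}
```

So the parent's name is read from _source and the minimum in-range price from inner_hits.prices.hits.hits[0], all in a single query.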
Hope it helps
The following is my query for elasticsearch:
GET index/_search
{
  "size": 0,
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "and": [
          {
            "term": {
              "id_1": "xx"
            }
          },
          {
            "term": {
              "level": "level2"
            }
          },
          {
            "or": [
              {
                "term": {
                  "type": "yyy"
                }
              },
              {
                "term": {
                  "type": "zzzz"
                }
              }
            ]
          }
        ]
      }
    }
  },
  "aggs": {
    "variable": {
      "stats": {
        "field": "score"
      }
    }
  }
}
But the agg result is as follows:
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 68,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "variable": {
      "count": 30,
      "min": 0,
      "max": 0,
      "avg": 0,
      "sum": 0
    }
  }
}
Why are min, max, etc. all 0? The score field does have values like 0.18, 0.25, etc. Also, in the mapping the type for score is long. Please help me solve this. Thanks in advance.
Edit:
value in index:
"score": 0.18
Single document:
{
  "_index": "index",
  "_type": "ppppp",
  "_id": "n0IiTEd2QFCnJUZOSiNu1w",
  "_score": 1,
  "_source": {
    "name_2": "aaa",
    "keyid": "bbbb",
    "qqq": "cccc",
    "level": "level2",
    "type": "kkk",
    "keytype": "Year",
    "org_id": 25,
    "tempid": "113",
    "id_2": "561",
    "name_1": "xxxxx",
    "date_obj": [
      {
        "keyid": "wwwww",
        "keytype": "Year",
        "value": 21.510617952000004,
        "date": "2015",
        "id": "ggggggg",
        "productid": ""
      },
      {
        "keyid": "rrrrrr",
        "keytype": "Year",
        "value": 0.13,
        "date": "2015",
        "id": "iiiiii",
        "productid": ""
      }
    ],
    "date": "2015",
    "ddddd": 21.510617952000004,
    "id_1": "29",
    "leveltype": "nnnn",
    "tttt": 0.13,
    "score": 0.13 ------------------->problem
  }
}
Mapping:
curl -XPUT ip:9200/index -d '{
  "mappings": {
    "places": {
      "properties": {
        "score": { "type": "float" }
      }
    }
  }
}'
The fix should be as simple as changing the type of the score field to float (or double) instead of long. long is an integer type and 0.18 will be indexed as 0 under the hood.
"score": {
  "type": "float",
  "null_value": 0.0
}
Note that you'll need to reindex your data after making the mapping change.
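A minimal sketch of that reindex (the new index name index_v2 is hypothetical, and the _reindex API requires Elasticsearch 2.3 or later; on older versions you would re-ingest from the original source instead):

```json
PUT /index_v2
{
  "mappings": {
    "ppppp": {
      "properties": {
        "score": { "type": "float" }
      }
    }
  }
}

POST /_reindex
{
  "source": { "index": "index" },
  "dest": { "index": "index_v2" }
}
```

After reindexing, the stats aggregation on score should report the real fractional values instead of zeros.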