Completion suggester across multiple indices - elasticsearch

ElasticSearch 6.x does not support multiple type mappings in a single index. So I have for example 2 indices:
location
postcode
I want to have autocomplete functionality (I use completion suggester) using both indices. Is it possible?
This is my example of mapping location index:
{
"location": {
"mappings": {
"location": {
"properties": {
"id": {
"type": "long"
},
"name": {
"type": "text"
},
"suggest": {
"type": "completion",
"analyzer": "simple",
"preserve_separators": false,
"preserve_position_increments": false,
"max_input_length": 50
}
}
}
}
}
}
And postcode index:
{
"postcode": {
"mappings": {
"postcode": {
"properties": {
"code": {
"type": "text"
},
"suggest": {
"type": "completion",
"analyzer": "simple",
"preserve_separators": true,
"preserve_position_increments": true,
"max_input_length": 50
}
}
}
}
}
}
It's possible to do a request if I just skip index name in a request, e.g.
POST _search
{
"suggest": {
"suggestion": {
"prefix": "abc",
"completion": {
"field": "suggest"
}
}
}
}
It searches in both indices but the result is incorrect. For example in the previous request we're searching for values start with abc. If location index contains many documents with values start with abc, e.g. abcd or abcde, then the response won't contain values from postcode index even if it contains exact value abc.
EDITED:
I wasn't right about incorrect behavior across multiple indices. If we use only one index (e.g. location) and PUT one more document with suggest value abc, then we will see the same behavior. It happens because all results have the same score = 1.
So how can I have a bigger score for exact matches?
I found this closed ticket (https://github.com/elastic/elasticsearch/issues/4759) but I don't understand what should I do to achive appropriate behaviour? It does not work out of the box.

Related

How to avoid index explosion in ElasticSearch

I have two docs from the same index that originally look like this (only _source value is shown here)
{
"id" : "3",
"name": "Foo",
"property":{
"schemaId":"guid_of_the_RGB_schema_defined_extenally",
"value":{
"R":255,
"G":100,
"B":20
}
}
}
{
"id" : "2",
"name": "Bar",
"property":{
"schemaId":"guid_of_the_HSL_schema_defined_extenally",
"value":{
"H":255,
"S":100,
"L":20
}
}
}
The schema(used for validation of value) is stored outside of ES since it has nothing to do with the indexing.
If I don't define mapping, the value field will be consider Object mapping. And its subfield will grow once there is a new subfield.
Currently, ElasticSearch supports Flattened mapping https://www.elastic.co/guide/en/elasticsearch/reference/current/flattened.html to prevent this explosion in the index. However it has a limited support for searching for inner field due to its restriction: As with queries, there is no special support for numerics — all values in the JSON object are treated as keywords. When sorting, this implies that values are compared lexicographically.
I need to be able to query the index to find the document match a given doc (e.g. B in the range [10,30])
So far I come up with a solution that structure my doc like this
{
"id":4,
"name":"Boo",
"property":
{
"guid_of_the_normalized_RGB_schema_defined_extenally":
{
"R":0.1,
"G":0.2,
"B":0.5
}
}
Although it does not solve my issue of the explosion in mapping, it mitigates some other issue.
My mapping now will look similar like this for the field property
"property": {
"properties": {
"guid_of_the_RGB_schema_defined_extenally": {
"properties": {
"B": {
"type": "long"
},
"G": {
"type": "long"
},
"R": {
"type": "long"
}
}
},
"guid_of_the_normalized_RGB_schema_defined_extenally": {
"properties": {
"B": {
"type": "float"
},
"G": {
"type": "float"
},
"R": {
"type": "float"
}
},
"guid_of_the_HSL_schema_defined_extenally": {
"properties": {
"B": {
"type": "float"
},
"G": {
"type": "float"
},
"R": {
"type": "float"
}
}
}
}
}
This solve the issue with the case where the field have the same name but different data type.
Can someone suggest me a solution that could solve the explosion of indices with out suffering from the limit that the Flattened has in searching?
To avoid mapping explosion, the best solution is to normalize your data better.
You can set "dynamic": "strict", in your mapping, then a doc will be rejected if it contains a field which is not already in the mapping.
After that, you can still add new fields but you will have to add them explicitly in the mapping before.
You can add a pipeline to clean up and normalize your data before ingestion.
If you don't want, or cannot reindex:
To make your query easy even if you can not know the "middle" part of your key, you can use a multimatch with a star.
GET myindex/_search
{
"query": {
"multi_match": {
"query": 0.5,
"fields": ["property.*.B"]
}
}
}
But you will still not be able to sort it as you want.
For ordering on multiple 'unknown' field names without touching the data, you can use a script: https://www.elastic.co/guide/en/elasticsearch/painless/current/painless-sort-context.html
But maybe you could simplify the whole process by adding a dynamic template to your index.
PUT test/_mapping
{
"dynamic_templates": [
{
"unified_red": {
"path_match": "property.*.R",
"mapping": {
"type": "float",
"copy_to": "unified_color.R"
}
}
},
{
"unified_green": {
"path_match": "property.*.G",
"mapping": {
"type": "float",
"copy_to": "unified_color.G"
}
}
},
{
"unified_blue": {
"path_match": "property.*.B",
"mapping": {
"type": "float",
"copy_to": "unified_color.B"
}
}
}
],
"properties": {
"unified_color": {
"properties": {
"R": {
"type": "float"
},
"G": {
"type": "float"
},
"B": {
"type": "float"
}
}
}
}
}
Then you'll be able to query any value with the same query :
GET test/_search
{
"query": {
"range": {
"unified_color.B": {
"gte": 0.1,
"lte": 0.6
}
}
}
}
For already existing fields, you'll have to add the copy_to by yourself on the mapping, and after that run an _update_by_query to populate them.

How to make Elasticsearch aggregation only create 1 bucket?

I have an Elasticsearch index which contains a field called "host". I'm trying to send a query to Elasticsearch to get a list of all the unique values of host in the index. This is currently as close as I can get:
{
"size": 0,
"aggs": {
"hosts": {
"terms": {"field": "host"}
}
}
}
Which returns:
"buckets": [
{
"key": "04",
"doc_count": 201
},
{
"key": "cyn",
"doc_count": 201
},
{
"key": "pc",
"doc_count": 201
}
]
However the actual name of the host is 04-cyn-pc. My understanding is that it is spliting them up into keywords so I try something like this:
{
"properties": {
"host": {
"type": "text",
"fields": {
"raw": {
"type": "text",
"analyzer": "keyword",
"fielddata": true
}
}
}
}
}
But it returns illegal_argument_exception "reason": "Mapper for [host.raw] conflicts with existing mapping in other types:\n[mapper [host.raw] has different [index] values, mapper [host.raw] has different [analyzer]]"
As you can probably tell i'm very new to Elasticsearch and any help or direction would be awesome, thanks!
Try this instead:
{
"properties": {
"host": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
}
}
}
Elastic automatically indexes string fields as text and keyword type if you do not specify a mapping. In your example if you do not want your field to be analyzed for full text search, you should just define that fields' type as keyword. So you can get rid of burden of analyzed text field. With the mapping below you can easily solve your problem without changing your agg query.
"properties": {
"host": {
"type": "keyword"
}
}

elastic search copy_to field not filled

I'm trying to copy a main title field in Elastic Search 5.6, to an other field with: index:false, so I can use this field to match the exact value.
However. After the reindex, and performed search with _source:["exact_hoofdtitel"], the field "exact_hoofdtitel" is not filled with the value of "hoofdtitel".
PUT producten_prd_5_test
{
"aliases": {},
"mappings": {
"boek": {
"properties": {
"hoofdtitel": {
"type": "text",
"copy_to": [
"suggest-hoofdtitel", "exact_hoofdtitel"
]
},
"suggest-hoofdtitel": {
"type": "completion",
"analyzer": "simple",
"preserve_separators": false,
"preserve_position_increments": true,
"max_input_length": 50
},
"exact_hoofdtitel":{
"type":"text",
"fields":{
"keyword":{
"type":"keyword",
"index":false
}
}
},
}
}
},
"settings": {
"number_of_shards": "1",
"number_of_replicas": "0"
}
}
GET producten_prd_5_test/_search
{
"_source":["hoofdtitel","exact_hoofdtitel"]
}
hits": [
{
"_index": "producten_prd_5_test",
"_type": "boek",
"_id": "9781138340671",
"_score": 1,
"_source": {
"hoofdtitel": "The Nature of the Firm in the Oil Industry"
}
},
I believe that you can achieve what you want without copy_to. Let me show you how and why you don't need it here.
How can I make both full-text and exact match queries on the same field?
This can be done with fields mapping attribute. Basically, with the following piece of mapping:
PUT producten_prd_5_test_new
{
"aliases": {},
"mappings": {
"boek": {
"properties": {
"hoofdtitel": {
"type": "text", <== analysing for full text search
"fields": {
"keyword": {
"type": "keyword" <== analysing for exact match
},
"suggest": {
"type": "completion", <== analysing for suggest
"analyzer": "simple",
"preserve_separators": false,
"preserve_position_increments": true,
"max_input_length": 50
}
}
}
}
}
}
}
you will be telling Elasticsearch to index the same field three times: one for full-text search, one for exact match and one for suggest.
The exact search will be possible to do via a term query like this:
GET producten_prd_5_test_new/_search
{
"query": {
"term": {
"hoofdtitel.keyword": "The Nature of the Firm in the Oil Industry"
}
}
}
Why the field exact_hoofdtitel does not appear in the returned document?
Because copy_to does not change the source:
The original _source field will not be modified to show the copied
values.
It works like _all field, allowing you to concat values of multiple fields in one imaginary field and analyse it in a special way.
Does it make sense to do a copy_to to an index: false field?
With index: false the field will not be analyzed and will not be searchable (like in your example, the field exact_hoofdtitel.keyword).
It may still make sense to do so if you want to do keyword aggregations on that field:
GET producten_prd_5_test/_search
{
"aggs": {
"by copy to": {
"terms": {
"field": "exact_hoofdtitel.keyword"
}
}
}
}
This will return something like:
{
"aggregations": {
"by copy to": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "The Nature of the Firm in the Oil Industry",
"doc_count": 1
}
]
}
}
}

Elasticsearch basic mapping fails

I've installed the Docker containers for Elasticsearch 5.5.2 and Kibana. I started to learn about mapping types, and created an index with the following code through xcurl:
{
"mappings": {
"user": {
"_all": { "enabled": false },
"properties": {
"title": { "type": "text" },
"name": { "type": "text" },
"age": { "type": "integer" }
}
}
}
The index was created successfully and I decided to insert some data. When I try to add a string into an integer field i.e. {"age": "hello"}, Elastic shows an error (this means mappings is working OK). The problem is with other data types:
1.It accepts integers and floats in string fields (I think this could be because of implicit casts).
2.It accepts floats like 22.4 in the agefield (when I search with Kibana or xcurl the agefield content is shown as float and not as an integer, that means is not doing casts from float to integer)
What I'm doing bad?
Have you tried to disable coercion? It can be done at field level:
{
"mappings": {
"user": {
"_all": { "enabled": false },
"properties": {
"title": { "type": "text" },
"name": { "type": "text" },
"age": { "type": "integer",
"coerce": false}
}
}
}
Or at index level for all fields:
"settings": {
"index.mapping.coerce": false
},
"mappings": {
...

Why prefix returns documents without the specific prefix?

I want to return only documents which their name start with "pizza". this is what I've done:
{
"query": {
"filtered": {
"filter": {
"prefix": {
"name": "pizza"
}
}
}
}
}
But I've got these 3 documents:
{
"name": "Viana Pizza",
"city": "Mashhad",
"address": "Vakil abad",
"foods": ["Pizza"],
"salad": true,
"rate": 5.0
}
{
"name": "Pizza Pizza",
"city": "Mashhad",
"address": "Bahar st",
"foods": ["Pizza"],
"salad": true,
"rate": 8.5
}
{
"name": "Reza Pizza",
"city": "Tehran",
"address": "Vali Asr",
"foods": ["Pizza"],
"salad": true,
"rate": 7.5
}
As you can see, Only one of them has "pizza" in the beginning of the name field.
What's wrong?
Probably, the simplest explanation given that you didn't provide the actual mapping, is that you have th e "name" field as "string" and "analyzed" (the default). Which means that "Reza Pizza" will be transformed to "reza" and "pizza" terms.
And your filter will match against terms, not against entire fields. Because ES analyzes the fields and forms terms when the standard mapping is used.
You need to either change your "name" field to "not_analyzed" or add another field to mirror the "name" but this mirror field to be "not_analyzed". Also, for text "pizza" (lowercase) to work in this case you need to create a custom analyzer.
Below you have the solution with the mirror field:
PUT /pizza
{
"settings": {
"analysis": {
"analyzer": {
"my_keyword_lowercase_analyzer": {
"type": "custom",
"tokenizer": "keyword",
"filter": ["lowercase"]
}
}
}
},
"mappings": {
"restaurant": {
"properties": {
"name": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"analyzer": "my_keyword_lowercase_analyzer"
}
}
}
}
}
}
}
And in searching you need to use the mirror field:
GET /pizza/restaurant/_search
{
"query": {
"filtered": {
"filter": {
"prefix": {
"name.raw": "pizza"
}
}
}
}
}
That's all about Elasticsearch analyzers. Let's read the documentation on prefix filter:
Filters documents that have fields containing terms with a specified prefix (not analyzed).
Here we can see that this filter matches terms, not the whole field value. When you index the document, ES splits your field values to terms using analyzers. Default analyzer splits value by whitespace and convert parts to lowercse. So all three results have term pizza in the name field and pizza term perfectly matches pizza prefix. If you want to match field value as is - I'd suggest you to map name field as not_analyzed

Resources