Elasticsearch copy_to field not filled

I'm trying to copy the main title field in Elasticsearch 5.6 to another field with index: false, so I can use that field to match the exact value.
However, after reindexing and searching with _source: ["exact_hoofdtitel"], the field "exact_hoofdtitel" is not filled with the value of "hoofdtitel".
PUT producten_prd_5_test
{
  "aliases": {},
  "mappings": {
    "boek": {
      "properties": {
        "hoofdtitel": {
          "type": "text",
          "copy_to": [
            "suggest-hoofdtitel",
            "exact_hoofdtitel"
          ]
        },
        "suggest-hoofdtitel": {
          "type": "completion",
          "analyzer": "simple",
          "preserve_separators": false,
          "preserve_position_increments": true,
          "max_input_length": 50
        },
        "exact_hoofdtitel": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "index": false
            }
          }
        }
      }
    }
  },
  "settings": {
    "number_of_shards": "1",
    "number_of_replicas": "0"
  }
}
GET producten_prd_5_test/_search
{
  "_source": ["hoofdtitel", "exact_hoofdtitel"]
}
"hits": [
  {
    "_index": "producten_prd_5_test",
    "_type": "boek",
    "_id": "9781138340671",
    "_score": 1,
    "_source": {
      "hoofdtitel": "The Nature of the Firm in the Oil Industry"
    }
  },

I believe you can achieve what you want without copy_to. Let me show you how, and why you don't need it here.
How can I make both full-text and exact-match queries on the same field?
This can be done with the fields mapping attribute. Use the following piece of mapping:
PUT producten_prd_5_test_new
{
  "aliases": {},
  "mappings": {
    "boek": {
      "properties": {
        "hoofdtitel": {
          "type": "text",            <== analyzed for full-text search
          "fields": {
            "keyword": {
              "type": "keyword"      <== analyzed for exact match
            },
            "suggest": {
              "type": "completion",  <== analyzed for suggest
              "analyzer": "simple",
              "preserve_separators": false,
              "preserve_position_increments": true,
              "max_input_length": 50
            }
          }
        }
      }
    }
  }
}
With this you tell Elasticsearch to index the same field three times: once for full-text search, once for exact match, and once for suggestions.
The exact search is then possible via a term query like this:
GET producten_prd_5_test_new/_search
{
  "query": {
    "term": {
      "hoofdtitel.keyword": "The Nature of the Firm in the Oil Industry"
    }
  }
}
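If you want to see what actually gets indexed for each sub-field, the _analyze API can help (a sketch, assuming the producten_prd_5_test_new index above has been created; the sample title is taken from your document):

```json
GET producten_prd_5_test_new/_analyze
{
  "field": "hoofdtitel",
  "text": "The Nature of the Firm in the Oil Industry"
}

GET producten_prd_5_test_new/_analyze
{
  "field": "hoofdtitel.keyword",
  "text": "The Nature of the Firm in the Oil Industry"
}
```

The first request should return individual lowercased tokens (the, nature, ...), while the second should return the whole title as a single token, which is what the term query matches against.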
Why does the field exact_hoofdtitel not appear in the returned document?
Because copy_to does not change the source:
The original _source field will not be modified to show the copied values.
It works like the _all field, allowing you to concatenate the values of multiple fields into one imaginary field and analyze it in a special way.
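Note that even though exact_hoofdtitel does not appear in _source, the copied value is indexed, so you can still query it directly (a sketch against the original producten_prd_5_test index):

```json
GET producten_prd_5_test/_search
{
  "query": {
    "match": {
      "exact_hoofdtitel": "The Nature of the Firm in the Oil Industry"
    }
  }
}
```

The document should be returned: the copy happens at index time in the inverted index, not in the stored source.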
Does it make sense to copy_to an index: false field?
With index: false the field will not be analyzed and will not be searchable (as with the field exact_hoofdtitel.keyword in your example).
It may still make sense to do so if you want to do keyword aggregations on that field:
GET producten_prd_5_test/_search
{
  "aggs": {
    "by copy to": {
      "terms": {
        "field": "exact_hoofdtitel.keyword"
      }
    }
  }
}
This will return something like:
{
  "aggregations": {
    "by copy to": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "The Nature of the Firm in the Oil Industry",
          "doc_count": 1
        }
      ]
    }
  }
}

Related

How do I search documents with their synonyms in Elasticsearch?

I have an index with some documents. These documents have the field name. But now my documents can have several names, and the number of names a document can have is uncertain: a document may have only one name, or there may be 10 names for one document.
The question is: how do I organize my index, documents and query so I can search for one document by its different names?
For example, there's a document with the names "automobile", "automobil" and "自動車". Whenever I query any of these names, I should get this document. Can I create a kind of array of these names and build a query that searches for each one? Or is there a more appropriate way to do this?
Tldr;
It feels like you are looking for something like synonyms.
Solution
In the following example I am creating an index with a specific text analyser.
This analyser handles automobile, automobil and 自動車 as the same token.
PUT /74472994
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "synonym": {
            "tokenizer": "standard",
            "filter": [ "synonym" ]
          }
        },
        "filter": {
          "synonym": {
            "type": "synonym",
            "synonyms": [ "automobile, automobil, 自動車" ]
          }
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "synonym"
      }
    }
  }
}
POST /74472994/_doc
{
  "name": "automobile"
}
which allows me to perform the following requests:
GET /74472994/_search
{
  "query": {
    "match": {
      "name": "automobil"
    }
  }
}
GET /74472994/_search
{
  "query": {
    "match": {
      "name": "自動車"
    }
  }
}
And always get:
{
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1.7198386,
    "hits": [
      {
        "_index": "74472994",
        "_id": "ROfyhoQBcn6Q8d0DlI_z",
        "_score": 1.7198386,
        "_source": {
          "name": "automobile"
        }
      }
    ]
  }
}
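If you want to verify the synonym expansion itself, the _analyze API can show the tokens the filter produces (a sketch against the index created above):

```json
GET /74472994/_analyze
{
  "analyzer": "synonym",
  "text": "automobil"
}
```

It should list automobile, automobil and 自動車 as tokens at the same position, which is why a match query on any of the three names finds the document.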

How to make Elasticsearch aggregation only create 1 bucket?

I have an Elasticsearch index which contains a field called "host". I'm trying to send a query to Elasticsearch to get a list of all the unique values of host in the index. This is currently as close as I can get:
{
  "size": 0,
  "aggs": {
    "hosts": {
      "terms": { "field": "host" }
    }
  }
}
Which returns:
"buckets": [
  {
    "key": "04",
    "doc_count": 201
  },
  {
    "key": "cyn",
    "doc_count": 201
  },
  {
    "key": "pc",
    "doc_count": 201
  }
]
However, the actual name of the host is 04-cyn-pc. My understanding is that it is splitting the name up into keywords, so I tried something like this:
{
  "properties": {
    "host": {
      "type": "text",
      "fields": {
        "raw": {
          "type": "text",
          "analyzer": "keyword",
          "fielddata": true
        }
      }
    }
  }
}
But it returns illegal_argument_exception "reason": "Mapper for [host.raw] conflicts with existing mapping in other types:\n[mapper [host.raw] has different [index] values, mapper [host.raw] has different [analyzer]]"
As you can probably tell, I'm very new to Elasticsearch, and any help or direction would be awesome. Thanks!
Try this instead:
{
  "properties": {
    "host": {
      "type": "text",
      "fields": {
        "raw": {
          "type": "keyword"
        }
      }
    }
  }
}
Elasticsearch automatically indexes string fields as both text and keyword if you do not specify a mapping. In your example, if you do not want the field to be analyzed for full-text search, you should just define the field's type as keyword, which gets rid of the burden of an analyzed text field. With the mapping below you can solve your problem without changing your aggregation query.
"properties": {
  "host": {
    "type": "keyword"
  }
}
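Since you cannot change the mapping of an existing field in place, one way to apply the new mapping is to create a fresh index and copy the data over with the _reindex API (a sketch; my_index and my_index_v2 are hypothetical names for your current and new index, and you may need to wrap the properties in a type name on pre-7.x versions):

```json
PUT my_index_v2
{
  "mappings": {
    "properties": {
      "host": { "type": "keyword" }
    }
  }
}

POST _reindex
{
  "source": { "index": "my_index" },
  "dest": { "index": "my_index_v2" }
}
```

After the reindex completes, the original terms aggregation on host should return a single bucket per host name.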

Completion suggester across multiple indices

Elasticsearch 6.x does not support multiple type mappings in a single index, so I have, for example, two indices:
location
postcode
I want to have autocomplete functionality (I use the completion suggester) using both indices. Is that possible?
This is my example mapping for the location index:
{
  "location": {
    "mappings": {
      "location": {
        "properties": {
          "id": {
            "type": "long"
          },
          "name": {
            "type": "text"
          },
          "suggest": {
            "type": "completion",
            "analyzer": "simple",
            "preserve_separators": false,
            "preserve_position_increments": false,
            "max_input_length": 50
          }
        }
      }
    }
  }
}
And postcode index:
{
  "postcode": {
    "mappings": {
      "postcode": {
        "properties": {
          "code": {
            "type": "text"
          },
          "suggest": {
            "type": "completion",
            "analyzer": "simple",
            "preserve_separators": true,
            "preserve_position_increments": true,
            "max_input_length": 50
          }
        }
      }
    }
  }
}
It's possible to do a request if I just skip the index name in the request, e.g.
POST _search
{
  "suggest": {
    "suggestion": {
      "prefix": "abc",
      "completion": {
        "field": "suggest"
      }
    }
  }
}
It searches in both indices, but the result is incorrect. For example, in the previous request we're searching for values starting with abc. If the location index contains many documents with values starting with abc, e.g. abcd or abcde, then the response won't contain values from the postcode index even if it contains the exact value abc.
EDITED:
I wasn't right about the incorrect behavior across multiple indices. If we use only one index (e.g. location) and PUT one more document with the suggest value abc, we see the same behavior. It happens because all results have the same score = 1.
So how can I get a bigger score for exact matches?
I found this closed ticket (https://github.com/elastic/elasticsearch/issues/4759), but I don't understand what I should do to achieve the appropriate behaviour; it does not work out of the box.

Autocomplete functionality using Elasticsearch

I have an Elasticsearch index with the following documents, and I want to have autocomplete functionality over the specified fields:
mapping: https://gist.github.com/anonymous/0609b1d110d91dceb9a90faa76d1d5d4
Usecase:
My query is of the prefix type, e.g. "sta", "star", "star w", ..., "star wars", etc., with an additional filter tags = "science fiction". These queries could also match other fields like description and actors (in the cast field; note this is nested). I also want to know which field was matched.
I investigated two ways of doing this, but neither seems to address the use case above:
1) Suggester autocomplete:
https://www.elastic.co/guide/en/elasticsearch/reference/1.7/search-suggesters-completion.html
With this it seems I have to add another field called "suggest", replicating the data, which is not desirable.
2) Using a prefix filter/query:
https://www.elastic.co/guide/en/elasticsearch/reference/1.7/query-dsl-prefix-filter.html
This gives the whole document back, not the exact matching terms.
Is there a clean way of achieving this? Please advise.
Don't create the mapping separately; insert data directly into the index and it will create a default mapping. Use the query below for autocomplete.
GET /netflix/movie/_search
{
  "query": {
    "query_string": {
      "query": "sta*"
    }
  }
}
I think a completion suggester would be the cleanest way, but if that is undesirable you could use aggregations on the name field.
This is a sample index (I am assuming you are using ES 1.7 from your question):
PUT netflix
{
  "settings": {
    "analysis": {
      "analyzer": {
        "prefix_analyzer": {
          "tokenizer": "keyword",
          "filter": [
            "lowercase",
            "trim",
            "edge_filter"
          ]
        },
        "keyword_analyzer": {
          "tokenizer": "keyword",
          "filter": [
            "lowercase",
            "trim"
          ]
        }
      },
      "filter": {
        "edge_filter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 20
        }
      }
    }
  },
  "mappings": {
    "movie": {
      "properties": {
        "name": {
          "type": "string",
          "fields": {
            "prefix": {
              "type": "string",
              "index_analyzer": "prefix_analyzer",
              "search_analyzer": "keyword_analyzer"
            },
            "raw": {
              "type": "string",
              "analyzer": "keyword_analyzer"
            }
          }
        },
        "tags": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}
Using multi-fields, the name field is analyzed in different ways. name.prefix uses the keyword tokenizer with an edge ngram filter, so that the string star wars is broken into s, st, sta, etc. But while searching, keyword_analyzer is used so that the search query does not get broken into multiple small tokens. name.raw will be used for aggregation.
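You can check what the prefix analyzer emits with the analyze API (a sketch; in 1.x the analyzer and text are passed as query-string parameters rather than a JSON body):

```
GET /netflix/_analyze?analyzer=prefix_analyzer&text=star+wars
```

It should return the edge ngrams s, st, sta, ... all the way up to star wars as tokens, since the keyword tokenizer keeps the whole string as one token before the edge_ngram filter runs.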
The following query will give top 10 suggestions.
GET netflix/movie/_search
{
  "query": {
    "filtered": {
      "filter": {
        "term": {
          "tags": "sci-fi"
        }
      },
      "query": {
        "match": {
          "name.prefix": "sta"
        }
      }
    }
  },
  "size": 0,
  "aggs": {
    "unique_movie_name": {
      "terms": {
        "field": "name.raw",
        "size": 10
      }
    }
  }
}
The results will look something like:
"aggregations": {
  "unique_movie_name": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
      {
        "key": "star trek",
        "doc_count": 1
      },
      {
        "key": "star wars",
        "doc_count": 1
      }
    ]
  }
}
UPDATE:
You could use highlighting for this purpose, I think. The highlight section will get you the whole word and the field it matched. You can also use inner hits with highlighting inside to get nested docs as well.
{
  "query": {
    "query_string": {
      "query": "sta*"
    }
  },
  "_source": false,
  "highlight": {
    "fields": {
      "*": {}
    }
  }
}

Raw nested aggregation

I would like to create a raw nested aggregation in Elasticsearch, but I'm unable to get it working.
My documents look like this :
{
  "_index": "items",
  "_type": "frame_spec",
  "_id": "19770602001",
  "_score": 1,
  "_source": {
    "item_type_name": "frame_spec",
    "status": "published",
    "creation_date": "2016-02-18T11:19:15Z",
    "last_change_date": "2016-02-18T11:19:15Z",
    "publishing_date": "2016-02-18T11:19:15Z",
    "attributes": [
      { "brand": "Sun" },
      { "model": "Sunglasses1" },
      { "eyesize": "56" },
      { "opc": "19770602001" },
      { "madein": "UNITED KINGDOM" }
    ]
  }
}
What I want to do is aggregate based on one of the attributes. I can't do a normal aggregation on "attributes.brand" (for example) because some of the values contain spaces. So I've tried using the "raw" property, but it appears that ES considers it a normal property and does not return any results.
This is what I've tried :
{
  "size": 0,
  "aggs": {
    "brand": {
      "terms": {
        "field": "attributes.brand.raw"
      }
    }
  }
}
But I get no results.
Do you have any solution for this problem?
You should use a dynamic_template in your mapping that catches all attributes.* string fields and creates a raw sub-field for each of them. For types other than string, you don't really need raw fields. You need to delete your current index and then recreate it with this:
DELETE items

PUT items
{
  "mappings": {
    "frame_spec": {
      "dynamic_templates": [
        {
          "strings": {
            "match_mapping_type": "string",
            "path_match": "attributes.*",
            "mapping": {
              "type": "string",
              "fields": {
                "raw": {
                  "type": "string",
                  "index": "not_analyzed",
                  "ignore_above": 256
                }
              }
            }
          }
        }
      ]
    }
  }
}
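To confirm that the template is picked up, you can index one document and then inspect the generated mapping (a sketch; the sample attribute is taken from your document):

```json
POST /items/frame_spec
{
  "attributes": [
    { "brand": "Sun" }
  ]
}

GET /items/_mapping/frame_spec
```

After the document is indexed, attributes.brand should appear as an analyzed string with a raw not_analyzed sub-field, created by the dynamic template.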
After that, re-populate your index and you'll be able to run this:
POST /items/_search
{
  "size": 0,
  "aggs": {
    "brand": {
      "terms": {
        "field": "attributes.brand.raw"
      }
    }
  }
}