Use Completion Suggester for all the fields of across the Elasticsearch type - elasticsearch

I am implementing Completion Suggester in my application, and here goes my requirement:
I am using Elasticsearch 5.5.3 (which support multiple types). I have around 10 types in my Elasticsearch and each type has around 10 string fields. What I want to do is make a single search box, that would complete the phrase (of any fields of those 10 types) as user starts typing using completion suggester. What could be the best approach to it? Is using _all field a good idea?

Yes, that's perfectly doable using a "custom all field" field of type completion
First, create the index with all the types and make sure to copy each field in a custom field of type completion:
PUT my_index
{
"mappings": {
"type1": {
"properties": {
"field1": {
"type": "text",
"copy_to": "my_all"
},
"field2": {
"type": "text",
"copy_to": "my_all"
},
"my_all": {
"type": "completion"
}
}
},
"type1": {
"properties": {
"field1": {
"type": "text",
"copy_to": "my_all"
},
"field2": {
"type": "text",
"copy_to": "my_all"
},
"my_all": {
"type": "completion"
}
}
}
}
}
Then, you'd query the completion data like this (i.e. without specifying any mapping type and using the common my_all field):
POST my_index/_search
{
"suggest": {
"my-suggest": {
"prefix": "bla",
"completion": {
"field": "my_all"
}
}
}
}

Related

How to rename a field in Elasticsearch?

I have an index in Elasticsearch with the following field mapping:
{
"version_data": {
"properties": {
"title": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
},
"updated_at": {
"type": "date"
},
"updated_by": {
"type": "keyword"
}
}
}
}
I have already created some documents in it and now want to rename version_data field with _version_data.
Is there any way in the Elasticsearch to rename a field within the mapping and in documents?
The closest thing is the alias data type.
In your mapping you could link it from the old to the new name like this:
PUT test/_mapping
{
"properties": {
"_version_data": {
"type": "alias",
"path": "version_data"
}
}
}
BTW I would generally avoid leading underscored since those normally used for internal fields like _id.

kibana keyword occurrency across documents

I have been unable to show words occurrency in kibana inside a full_text field mapped as "type": "keyword" across documents in the index.
My first attempt involved the usage of an analyzer. However I have been unable to change the document in any way, the index mapping relfect the analyzer but no field reflect the analysis.
This is the simplified mapping:
{
"mappings": {
"doc": {
"properties": {
"text": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
},
"analyzed": {
"type": "text",
"analyzer": "rebuilt"
}
}
}
}
}
},
"settings": {
"analysis": {
"analyzer": {
"rebuilt": {
"tokenizer": "standard"
}
}
},
"index.mapping.ignore_malformed": true,
"index.mapping.total_fields.limit": 2000
}
}
but still I'm unable to see the array of words that I expect to be saved under the text.analyzed field, indeed that fields does not exists and I'm wondering why
It seems like settings fielddata=true link, in spite of being heavily discouraged, solved my problem (at least for now), and allows me to visualize in kibana the occurrence (or absolute frequency) of each word in the text field across documents.
The final version of the proposed simplified mapping therefore became:
{
"mappings": {
"doc": {
"properties": {
"text": {
"type": "text",
"analyzer": "rebuilt",
"fielddata": true
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
},
"settings": {
"analysis": {
"analyzer": {
"rebuilt": {
"tokenizer": "standard"
}
}
},
"index.mapping.ignore_malformed": true,
"index.mapping.total_fields.limit": 2000
}
}
Getting rid of the useless analyzed field.
I still have to check the performance of kibana. If someone has a performance safe solution to this problem please do not hesitate.
Thanks.

How to make Elasticsearch aggregation only create 1 bucket?

I have an Elasticsearch index which contains a field called "host". I'm trying to send a query to Elasticsearch to get a list of all the unique values of host in the index. This is currently as close as I can get:
{
"size": 0,
"aggs": {
"hosts": {
"terms": {"field": "host"}
}
}
}
Which returns:
"buckets": [
{
"key": "04",
"doc_count": 201
},
{
"key": "cyn",
"doc_count": 201
},
{
"key": "pc",
"doc_count": 201
}
]
However the actual name of the host is 04-cyn-pc. My understanding is that it is spliting them up into keywords so I try something like this:
{
"properties": {
"host": {
"type": "text",
"fields": {
"raw": {
"type": "text",
"analyzer": "keyword",
"fielddata": true
}
}
}
}
}
But it returns illegal_argument_exception "reason": "Mapper for [host.raw] conflicts with existing mapping in other types:\n[mapper [host.raw] has different [index] values, mapper [host.raw] has different [analyzer]]"
As you can probably tell i'm very new to Elasticsearch and any help or direction would be awesome, thanks!
Try this instead:
{
"properties": {
"host": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
}
}
}
Elastic automatically indexes string fields as text and keyword type if you do not specify a mapping. In your example if you do not want your field to be analyzed for full text search, you should just define that fields' type as keyword. So you can get rid of burden of analyzed text field. With the mapping below you can easily solve your problem without changing your agg query.
"properties": {
"host": {
"type": "keyword"
}
}

Elasticsearch Field Preference for result sequence

I have created the index in elasticsearch with the following mapping:
{
"test": {
"mappings": {
"documents": {
"properties": {
"fields": {
"type": "nested",
"properties": {
"uid": {
"type": "keyword"
},
"value": {
"type": "text",
"copy_to": [
"fulltext"
]
}
}
},
"fulltext": {
"type": "text"
},
"tags": {
"type": "text"
},
"title": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
},
"url": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
}
}
}
}
}
}
While searching I want to set the preference of fields for example if search text found in title or url then that document comes first then other documents.
Can we set a field preference for search result sequence(in my case preference like title,url,tags,fields)?
Please help me into this?
This is called "boosting" . Prior to elasticsearch 5.0.0 - boosting could be applied in indexing phase or query phase( added as part of field mapping ). This feature is deprecated now and all mappings after 5.0 are applied in query time .
Current recommendation is to to use query time boosting.
Please read this documents to get details on how to use boosting:
1 - https://www.elastic.co/guide/en/elasticsearch/guide/current/_boosting_query_clauses.html
2 - https://www.elastic.co/guide/en/elasticsearch/guide/current/_boosting_query_clauses.html

Dynamic Mapping for an object field that unwraps the parent path

I am evaluating whether ElasticSearch can meet the needs of a new system I'm building. It looks amazing, so I'm really hopeful I can figure out a mapping strategy that works.
In this system, administrators can define fields to be associated with documents dynamically. So a given type (in the elasticsearch sense of the word) can have any number of fields, which I do not know the name of ahead of time. And each field can be of any type: int, date, string, etc.
An example document may look like:
{
"name": "bob",
"age": 22,
"title": "Vice Intern",
"tagline": "Ask not what your company can do for you, but..."
}
Notice that there are 2 string fields. Awesome. My problem though is that I want the "tagline" to be analyzed, but I do not want "title" to be analyzed.
Remember I don't know the names of these fields ahead of time. And there could be multiple fields of each type. So there could be 10 string fields of various names, 3 of which should be analyzed and 7 of which should not.
Another requirement I have is that the name the administrator gives the field should also be what they can search by. So, for example, if they want to find all the Vice Interns who have something to say, the lucene query may be:
+title:"Vice Intern" +tagline:"company"
So my thought was that I could define a dynamic mapping. Since I don't know the names of the fields ahead of time, it seems like a great approach. The key though is coming up with a way of differentiating string fields that should be analyzed and ones that shouldn't be!
I thought, hey, I'll just put all the fields that need analyzing into a nested object, like this:
{
"name": "bob",
"age": 22,
"title": "Vice Intern",
"textfields": {
"tagline": "Ask not what your company can do for you, but...",
"somethingelse": "lorem ipsum",
}
}
Then, in my dynamic mapping, I have a way of mapping those fields differently:
{
"mytype": {
"dynamic_templates": {
"nested_textfields": {
"match": "textfields",
"match_mapping_type": "string",
"mapping": {
"index": "analyzed",
"analyzer": "default"
}
}
}
}
}
I know that isn't right, I actually need some kind of nested mapping, but no matter, because if I understand it correctly, even if I got that working, it would mean those fields are searched for (via lucene syntax) like this:
+title:"Vice Intern" +textfields.tagline:"company"
And I don't want the "textfields" prefix. Since I'm the one providing the textfields object that wraps the text fields, I know that the fields within it are still uniquely named across the entire document.
I thought of using a pattern match instead. So instead of wrapping them in a "textfields" object, I could prefix them, like "textfield_tagline". But when doing that, the {name} token in the dynamic mapping includes the prefix, I don't see a way to just pull out the "*" portion.
Any solution which gets me the necessary behavior is a correct answer. Even if that involves nested mapping information into the documents themselves (can you do that? I've seen something like that, I think...).
EDIT:
I've attempted the following dynamic template. I'm trying to use index_name to remove the 'textfields.' in the index. This dynamic template just doesn't seem to match though, because after putting a document and looking at the mapping I see no analyzer specified.
{
"mytype" : {
"dynamic_templates":
[
{
"textfields": {
"path_match": "textfields.*",
"match_mapping_type" : "string",
"mapping": {
"type": "string",
"index": "analyzed",
"analyzer": "default",
"index_name": "{name}",
"fields": {
"sort": {
"type": "string",
"index": "not_analyzed",
"index_name": "{name}_sort"
}
}
}
}
}
]
}
}
I was able to duplicate the results that you asked for specifically with the following index creation (with mappings), document, and search query. The type does vary a bit, but it serves the purpose of the example.
Index Settings
PUT http://localhost:9200/sandbox
{
"settings": {
"index": {
"number_of_shards": 1,
"number_of_replicas": 0
}
},
"mappings": {
"mytype": {
"dynamic_templates": [
{
"indexedfields": {
"path_match": "indexedfields.*",
"match_mapping_type" : "string",
"mapping": {
"type": "string",
"index": "analyzed",
"analyzer": "default",
"index_name": "{name}",
"fields": {
"sort": {
"type": "string",
"index": "not_analyzed",
"index_name": "{name}_sort"
}
}
}
}
},
{
"textfields": {
"path_match": "textfields.*",
"match_mapping_type" : "string",
"mapping": {
"type": "string",
"index": "not_analyzed",
"index_name": "{name}"
}
}
},
{
"strings": {
"path_match": "*",
"match_mapping_type" : "string",
"mapping": {
"type": "string",
"index": "not_analyzed"
}
}
}
]
}
}
}
Document
PUT http://localhost:9200/sandbox/mytype/1
{
"indexedfields":{
"hello":"Hello world",
"message":"The great balls of the world are on fire"
},
"textfields":{
"username":"User Name",
"projectname":"Project Name"
}
}
Search
POST http://localhost:9200/sandbox/mytype/_search
{
"query": {
"query_string": {
"query": "message:\"great balls\""
}
},
"filter":{
"query":{
"query_string":{
"query":"username:\"User Name\""
}
}
},
"from":0,
"size":10,
"sort":[
]
}
The search returns the following response:
{
"took":2,
"timed_out":false,
"_shards":{
"total":1,
"successful":1,
"failed":0
},
"hits":{
"total":1,
"max_score":0.19178301,
"hits":[
{
"_index":"sandbox",
"_type":"mytype",
"_id":"1",
"_score":0.19178301,
"_source":{
"indexedfields":{
"hello":"Hello world",
"message":"The great balls of the world are on fire"
},
"textfields":{
"username":"User Name",
"projectname":"Project Name"
}
}
}
]
}
}

Resources