Specify which fields are indexed in ElasticSearch - elasticsearch

I have a document with a number of fields that I never query on, so I would like to turn off indexing on those fields to save resources. I believe I need to disable the _all field, but how do I specify which fields are indexed then?

By default, all fields are also indexed within the special _all field, which provides the so-called catch-all feature out of the box. However, you can specify for each field in your mapping whether you want to add it to the _all field or not, through the include_in_all option:
"person" : {
"properties" : {
"name" : {
"type" : "string", "store" : "yes", "include_in_all" : false
}
}
}
The above example disables the default behaviour for the name field, which won't be part of the _all field.
Otherwise, if you don't need the _all field at all for a specific type you can disable it like this, again in your mapping:
"person" : {
"_all" : {"enabled" : false},
"properties" : {
"name" : {
"type" : "string", "store" : "yes"
}
}
}
When you disable it, your fields will still be indexed separately, but you won't have the catch-all feature that _all provides. You will then need to query your specific fields instead of relying on the special _all field. In fact, when you query without specifying a field, Elasticsearch queries the _all field under the hood, unless you override the default field to query.
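For example, a rough sketch of the difference (the index name and the search text are made up; the name field comes from the mapping above):
# With _all enabled, a query_string query without an explicit field goes to _all:
GET /myindex/person/_search
{
  "query": { "query_string": { "query": "john" } }
}
# With _all disabled, query the specific field instead:
GET /myindex/person/_search
{
  "query": { "match": { "name": "john" } }
}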

Each string field has an index param in its mapping, which defaults to analyzed. That means that, besides being copied into the _all field, each field is also indexed on its own.
And for the _all field the reference says:
By default, it is enabled and all fields are included in it for ease of use.
So, to completely disable indexing for a field you have to specify (if the _all field is enabled):
"mappings": {
"your_mapping": {
"properties": {
"field_not_to_index": {
"type": "string",
"include_in_all": false,
"index": "no"
}
}
}
}
For the fields that should be queried, you have two options (see the sketch below): either include them in the _all field (and set "index": "no" on them to save resources) if you query through _all, or, if you query those fields directly, set the index param to a positive value (analyzed or not_analyzed) and disable the _all field to save resources.
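A rough sketch of those two alternatives (field names are made up; the first option follows this answer's claim that a field kept out of its own index can still be copied into _all):
# Option 1: query through _all; the field itself is not indexed separately:
"mappings": {
  "your_mapping": {
    "properties": {
      "field_queried_via_all": { "type": "string", "include_in_all": true, "index": "no" }
    }
  }
}
# Option 2: query the field directly and disable _all entirely:
"mappings": {
  "your_mapping": {
    "_all": { "enabled": false },
    "properties": {
      "field_queried_directly": { "type": "string", "index": "not_analyzed" }
    }
  }
}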

The following is an important doc page for understanding index settings in Elasticsearch:
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/mapping-intro.html
For your problem, ideally you should set the "index" flag to no in the field properties.

You can use the enabled setting to disable a particular field or an entire mapping.
ElasticSearch Doc
Disable mapping for a single field (e.g. the session_data field):
{
  "mappings": {
    "_doc": {
      "properties": {
        "session_data": {
          "enabled": false
        }
      }
    }
  }
}
Disable entire mapping
{
  "mappings": {
    "_doc": {
      "enabled": false
    }
  }
}

Set dynamic to false and disable the _all field, then specify only the required fields in the mapping.
https://www.elastic.co/guide/en/elasticsearch/guide/current/dynamic-mapping.html
{
  "mappings":{
    "candidates":{
      "_all":{
        "enabled":false
      },
      "dynamic": "false",
      "properties":{
        "tags":{
          "type":"text"
        },
        "derivedAttributes":{
          "properties":{
            "city":{
              "type":"text"
            },
            "zip5":{
              "type":"keyword"
            }
          }
        }
      }
    }
  }
}

The _all field has been deprecated since Elasticsearch 6.0. Use the following instead:
"mappings": {
"dynamic":"false",
"properties": {
"field_to_index":{"index": true, "type": "text"}
}
According to the ES docs:
Setting dynamic to false doesn’t alter the contents of the _source field at all. The _source will still contain the whole JSON document that you indexed. However, any unknown fields will not be added to the mapping and will not be searchable.
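A small illustration of what that means in practice (the index name and unknown_field are made up; the mapping above with "dynamic": "false" is assumed):
# The document is accepted, and _source still contains it in full:
PUT my-index/_doc/1
{
  "field_to_index": "hello",
  "unknown_field": "world"
}
# But unknown_field was never added to the mapping, so this matches nothing:
GET my-index/_search
{
  "query": { "match": { "unknown_field": "world" } }
}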

Related

Can Elasticsearch index a field without saving the original value, just like Lucene's Field.Store.NO?

I have a large field in MySQL and do not want to save the original value to Elasticsearch. Is there an option just like Lucene's Field.Store.NO?
Thanks.
You just need to define the "store" mapping accordingly, e.g.:
PUT your-index
{
  "mappings": {
    "properties": {
      "some_field": {
        "type": "text",
        "index": true,
        "store": false
      }
    }
  }
}
You may also want to disable the _source field:
The _source field contains the original JSON document body that was passed at index time [...] Though very handy to have around, the source field does incur storage overhead within the index.
For this reason, it can be disabled as follows:
PUT your-index
{
  "mappings": {
    "_source": {
      "enabled": false
    }
  }
}
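Note that combining "store": false with a disabled _source gets you close to Lucene's Field.Store.NO: the field stays searchable, but its original value can no longer be retrieved. A quick sketch (the query text is made up):
# Still matches, because the field is indexed, but the hit comes back
# without _source and without any stored value for some_field:
GET your-index/_search
{
  "query": { "match": { "some_field": "something" } }
}
Keep in mind that with _source disabled, features that depend on it, such as the update and reindex APIs, are no longer available.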

How to specify a field which should not be indexed?

As mentioned in the title, I want to disable indexing for a specific field in Elasticsearch. For example, I have a field named #fields which contains three sub-fields: name, age and salary. Now I do not want to index the field #fields.age in Elasticsearch. How can I achieve that? I have tried to use the include_in_all parameter, but it doesn't work. The mapping configuration looks like:
"mappings": {
"fluentd": {
"properties": {
"#fields": {
"properties": {
"age": {
"type": "text",
"include_in_all": false,
"index": "no"
}
}
}
}
}
}
When I use the mapping configuration above, I can only see #fields.age in the index's mapping, whereas #fields.name and #fields.salary should appear in the index's mapping, not #fields.age. How can this happen? Any answers will be appreciated.

elasticsearch - field filterable but not searchable

Using elastic 2.3.5. Is there a way to make a field filterable, but not searchable? For example, I have a language field, with values like en-US. Setting several filters in query->bool->filter->term, I'm able to filter the result set without affecting the score, for example, searching for only documents that have en-US in the language field.
However, I want a query searching for the term en-US to return no results, since this is not really an indexed field for searching, but just so I can filter.
Can I do this?
Elasticsearch uses an _all field to allow fast full-text search on entire documents. This is why searching for en-US across all fields of all documents returns the documents containing 'language':'en-US'.
https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-all-field.html
You can specify "include_in_all": false in the mapping to prevent a field from being included in _all.
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "title": {
          "type": "string"
        },
        "country": {
          "type": "string"
        },
        "language": {
          "type": "string",
          "include_in_all": false
        }
      }
    }
  }
}
In this example, searching for 'US' across all fields will only return documents containing US in the title or country fields. But you will still be able to filter your query using the language field.
https://www.elastic.co/guide/en/elasticsearch/reference/current/include-in-all.html
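For instance, a hedged sketch of such a query (the search text is made up, and it assumes the language field is mapped so that the exact value en-US is indexed, e.g. not_analyzed): the term filter on language still works, while the full-text part no longer matches on language because it is excluded from _all.
GET my_index/my_type/_search
{
  "query": {
    "bool": {
      "must": { "match": { "_all": "some search terms" } },
      "filter": { "term": { "language": "en-US" } }
    }
  }
}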

Wildcard query over _all field on Elasticsearch

I'm trying to perform wildcard queries over the _all field. An example query could be:
GET index/type/_search
{
  "from" : 0,
  "size" : 1000,
  "query" : {
    "bool" : {
      "must" : {
        "wildcard" : {
          "_all" : "*tito*"
        }
      }
    }
  }
}
The thing is that to use a wildcard query the _all field needs to be not_analyzed, otherwise the query won't work. See ES documentation for more info.
I tried to set the mappings over the _all field using this request:
PUT index
{
  "mappings": {
    "type": {
      "_all" : {
        "enabled" : true,
        "index_analyzer": "not_analyzed",
        "search_analyzer": "not_analyzed"
      },
      "_timestamp": {
        "enabled": "true"
      },
      "properties": {
        "someProp": {
          "type": "date"
        }
      }
    }
  }
}
But I'm getting the error analyzer [not_analyzed] not found for field [_all].
I want to know what I'm doing wrong and whether there is another (better) way to perform this kind of query.
Thanks.
Have you tried removing:
"search_analyzer": "not_analyzed"
Also, I wonder how well a wildcard across all properties will scale. Have you looked into NGrams? See the docs here.
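If you go the nGram route instead of wildcards, a rough sketch of what that might look like (the field, analyzer names and gram sizes are made up, not taken from the question's mapping):
PUT index
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "my_ngram_tokenizer": { "type": "nGram", "min_gram": 3, "max_gram": 4 }
      },
      "analyzer": {
        "my_ngram_analyzer": {
          "type": "custom",
          "tokenizer": "my_ngram_tokenizer",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "type": {
      "properties": {
        "some_text_field": {
          "type": "string",
          "analyzer": "my_ngram_analyzer",
          "search_analyzer": "standard"
        }
      }
    }
  }
}
A plain match query for tito on some_text_field would then find documents containing it as a substring, without resorting to wildcards.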
Most probably you wanted to use the option
"index": "not_analyzed"
The index attribute of a string field (and _all is a string field) determines whether that field is analyzed or not.
"search_analyzer" determines which analyzer should be used for the user-entered query, and only applies if the index attribute is set to analyzed.
"index_analyzer" determines which analyzer should be used for documents, and again only applies if the index attribute is set to analyzed.

How to define a mapping in elasticsearch that doesn't accept fields other that the mapped ones?

OK, in my Elasticsearch I am using the following mapping for an index:
{
  "mappings": {
    "mytype": {
      "type":"object",
      "dynamic" : "false",
      "properties": {
        "name": {
          "type": "string"
        },
        "address": {
          "type": "string"
        },
        "published": {
          "type": "date"
        }
      }
    }
  }
}
It works. In fact, if I put a malformed date in the "published" field, it complains and fails.
Also, I have the following configuration:
...
node.name : node1
index.mapper.dynamic : false
index.mapper.dynamic.strict : true
...
And without the mapping, I can't really use the type. The problem is that if I insert something like:
{
  "name":"boh58585",
  "address": "hiohio",
  "published": "2014-4-4",
  "test": "hophiophop"
}
it will happily accept it, which is not the behaviour I expect, because the field test is not in the mapping. How can I restrict the fields of the document to only those that are in the mapping?
The use of "dynamic": false tells Elasticsearch to never allow the mapping of an index to be changed. If you want an error thrown when you try to index new documents with fields outside of the defined mapping, use "dynamic": "strict" instead.
From the docs:
"The dynamic parameter can also be set to strict, meaning that not only new fields will not be introduced into the mapping, parsing (indexing) docs with such new fields will fail."
Since you've defined this in the settings, I would guess that leaving out the dynamic from the mapping definition completely will default to "dynamic": "strict".
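A minimal sketch of that change, reusing the mapping from the question:
{
  "mappings": {
    "mytype": {
      "dynamic": "strict",
      "properties": {
        "name": {
          "type": "string"
        },
        "address": {
          "type": "string"
        },
        "published": {
          "type": "date"
        }
      }
    }
  }
}
Indexing the example document with the extra test field should then be rejected instead of silently accepted.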
Is your problem with the malformed date field?
I would fix the date issue and continue to use dynamic: false.
You can read about the ways to set up the date field mapping for a custom format here.
Stick the date format string in a {type: date, format: ?} mapping.
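For example, a format that would accept dates like 2014-4-4 from the question might look like this (a sketch; adjust the pattern to your actual data):
"published": {
  "type": "date",
  "format": "yyyy-M-d"
}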
