Fields that need not be searchable in ElasticSearch

I am using ElasticSearch v6 to search my product catalog.
My product has a number of fields, such as title, description, price, etc. One of the fields is photo_path, which contains the location of the product photo on disk.
photo_path does not need to be searchable, but it does need to be retrievable.
Question: Is there a way to mark this field as not searchable/not indexed? And is this a good idea, for example, will I save storage space or processing time by marking this field not searchable?
I have seen this answer and read about _source and _all, but since _all is deprecated in version 6, I am confused about what to do.

If you want a field to be neither indexed nor queryable, set the property "index": false on it. And if you only want the photo_path field returned in the search result, include only this field in _source (this saves disk space and fetches less data from disk). The mapping would look like below:
{
  "mappings": {
    "data": {
      "_source": {
        "includes": [
          "photo_path" // the search result will only contain this field
        ]
      },
      "properties": {
        "photo_path": {
          "type": "keyword",
          "doc_values": false, // set doc_values to false if you don't want to use this field to sort/aggregate
          "index": false // do not index this field
        },
        "title": {
          "type": "..."
        }
      }
    }
  }
}
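For illustration, here is a hedged sketch of how this behaves (the index name products and the sample values are made up; title is assumed to be mapped as a searchable text field). Indexing works as usual, and a query on title still matches, but the hit's _source will contain only photo_path because of the _source.includes setting. A query that targets photo_path itself is expected to be rejected, since the field is not indexed.
// "products" and the field values below are hypothetical
POST products/data/1
{
  "title": "Red running shoes",
  "photo_path": "/images/products/red-shoes.jpg"
}

GET products/data/_search
{
  "query": {
    "match": { "title": "shoes" }
  }
}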

Related

Elastic Search - store only field

Is there an option in Elasticsearch to store values just for the purpose of retrieving them, not for searching? So when indexing we'll index all fields, and when searching we'll search on a single field only, but we need the other data as well.
For example, we'll index products with fields such as Name, SKU, Supplier Name, etc. Of these, only Name needs to be indexed and searched; SKU and Supplier Name are just for storing and retrieving with a search.
Since the _source document is stored anyway, the best way to achieve what you want is to neither store nor index any fields, except the one you're searching on, like this:
PUT my-index
{
  "mappings": {
    "_source": {
      "enabled": true        <--- true by default, but adding for completeness
    },
    "properties": {
      "name": {
        "type": "text",
        "index": true        <--- true by default, but adding for completeness
      },
      "sku": {
        "type": "keyword",
        "index": false,      <--- don't index this field
        "store": false       <--- false by default, but adding for completeness
      },
      "supplier": {
        "type": "keyword",
        "index": false,      <--- don't index this field
        "store": false       <--- false by default, but adding for completeness
      }
    }
  }
}
So to sum up:
the fields you want to search on must have index: true
the fields you don't want to search on must have index: false
store is false by default so you don't need to specify it
_source is enabled by default, so you don't need to specify it
enabled should only be used at the top level or on object fields, so it has no place here
With the above mapping, you can
search on name
retrieve all fields from the _source document since the _source field is stored by default and contains the original document
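As a quick illustration (the document values are made up), a match query on name returns the complete original document from _source, even though sku and supplier are neither indexed nor stored as separate fields; querying sku or supplier directly would fail or match nothing, since they are not indexed.
// document values are illustrative
POST my-index/_doc
{
  "name": "Blue widget",
  "sku": "BW-001",
  "supplier": "Acme"
}

GET my-index/_search
{
  "query": {
    "match": { "name": "widget" }
  }
}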

How to restrict a field in an Elasticsearch document from being updated?

I have an Elasticsearch index and I am saving documents into it.
Is there any way that, when I try to index/save the same document (with the same _id) again with a new/updated value for some field, Elasticsearch throws an exception, but only if that particular field is the one we are trying to update? For other fields it can keep the default behavior.
For example, I have an index template as below:
PUT /_index_template/example_template
{
  "index_patterns": [
    "example*"
  ],
  "priority": 1,
  "template": {
    "aliases": {
      "example": {}
    },
    "mappings": {
      "dynamic": "strict",
      "_source": {
        "enabled": false
      },
      "properties": {
        "SomeID": {
          "type": "keyword"
        },
        "AnotherInfo": {
          "type": "keyword"
        }
      }
    }
  }
}
Then I create an index based on this template:
PUT example01
After that I save a document against this index:
POST example01/_doc/1
{
  "SomeId": "abcdedf",
  "AnotherInfo": "xyze"
}
Now, the next time I try to save the document with a different "SomeId" value:
POST example01/_doc/1
{
  "SomeId": "uiiuiiu",
  "AnotherInfo": "xyze"
}
I want Elasticsearch to say "Sorry, the "SomeId" field cannot be updated". Basically, I want to prevent a document field from getting updated in Elasticsearch.
Thanks in advance!
Elasticsearch supports revisions on documents by default, meaning it traces changes to indexed documents through their generated _id: each time you manipulate the document with, say, id 17, it increases the value of the _version field, so you cannot end up with two duplicate documents with the same id. If you do use custom routing, though, always be careful about duplication of the _id field, because that field is not just an identifier; it also determines which shard the document is located on.
Moreover, I believe Elasticsearch has no way to enforce restrictions at the field level within a document, so you would have to control restrictions on updating fields at the application level, or use field-level security based on roles.
As an example of field-level security, the role definition below grants read access only to the category, @timestamp, and message fields in all the events-* data streams and indices.
POST /_security/role/test_role1
{
  "indices": [
    {
      "names": [ "events-*" ],
      "privileges": [ "read" ],
      "field_security": {
        "grant": [ "category", "@timestamp", "message" ]
      }
    }
  ]
}
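If all you need is to guarantee that an existing _id is never overwritten at all, a document-level (not field-level) option is to index with the create op type, sketched here against the example01 index from the question. The second request below fails with a version-conflict error instead of silently replacing the document; per-field checks would still have to live in your application code.
// both requests use op_type=create; the second one is rejected with a version conflict
PUT example01/_doc/1?op_type=create
{
  "SomeId": "abcdedf",
  "AnotherInfo": "xyze"
}

PUT example01/_doc/1?op_type=create
{
  "SomeId": "uiiuiiu",
  "AnotherInfo": "xyze"
}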

Get array index of matching query in Elasticsearch

I'm storing records in Elasticsearch as:
"mappings": {
"en": {
"_timestamp": {
"enabled": true
},
"_all": {
"enabled": false
},
"properties": {
"id": {
"type": "string",
"index": "analyzed"
},
"text": {
"type": "string",
"index": "analyzed",
"analyzer": "english"
}
}
}
}
Where each Elasticsearch record is actually many records bundled into one, by making the id field an array of ids [id1, id2, id3, ...] and the text field an array of the respective texts ['text 1', 'text 2', 'text 3', ...], so a POST would look something like:
POST my-index/en
{
  "id": ["{doc1-ID}", "{doc2-ID}"],
  "text": ["document 1 text goes here", "document 2 text goes here"]
}
And I'm running a search for text in the text field. This is all OK, except I need the matching document's corresponding id value. I can do this within the app logic itself by iterating through each array item, but that is very costly and inefficient, since each Elasticsearch record will be near the max size of ~2GB, so storing that in memory while I search through it all is simply not an option. I'm trying to find a way to retrieve the array index of the matching text from the text array field, so that I can grab its respective id. Is there a way to get the array index of the matching text using some kind of Elasticsearch script?
NOTE: I'm storing my documents in this seemingly convoluted way for a very good reason; I realize it would obviously be much easier to have one document per Elasticsearch record, as it's designed to be, but this will not work for my requirements.

elasticsearch - field filterable but not searchable

Using Elasticsearch 2.3.5. Is there a way to make a field filterable, but not searchable? For example, I have a language field with values like en-US. By setting several filters in query->bool->filter->term, I'm able to filter the result set without affecting the score, for example searching only for documents that have en-US in the language field.
However, I want a query searching for the term en-US to return no results, since this field is not really indexed for searching; it exists just so I can filter.
Can I do this?
Elasticsearch uses an _all field to allow fast full-text search on entire documents. This is why searching for en-US across all fields of all documents returns the ones containing 'language': 'en-US'.
https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-all-field.html
You can specify "include_in_all": false in the mapping to exclude a field from _all:
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "title": {
          "type": "string"
        },
        "country": {
          "type": "string"
        },
        "language": {
          "type": "string",
          "include_in_all": false
        }
      }
    }
  }
}
In this example, searching for 'US' across all fields will return only documents containing US in title or country, but you will still be able to filter your query using the language field.
https://www.elastic.co/guide/en/elasticsearch/reference/current/include-in-all.html
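To make the difference concrete, here is a sketch against the mapping above (it assumes no other field contains the en-US token, and exact term filtering may additionally require a not_analyzed variant of the field depending on your analysis settings). The first query searches _all and finds nothing for en-US, because language is excluded from _all; the second keeps en-US as a pure filter on the language field, which still works.
// queries assume the my_index mapping above; the language value is as in the question
GET my_index/_search
{
  "query": {
    "match": { "_all": "en-US" }
  }
}

GET my_index/_search
{
  "query": {
    "bool": {
      "filter": {
        "term": { "language": "en-US" }
      }
    }
  }
}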

Boost field on index in Elastic

I'm using Elastic 1.7.3 and I would like to have a boost on some fields in an index, with documents like this fictional example:
{
  "title": "Mickey Mouse",
  "content": "Mickey Mouse is a fictional ...",
  "related_articles": [
    {"title": "Donald Duck"},
    {"title": "Goofy"}
  ]
}
Here, for example, title is really important, content too, and related_articles is a bit less important. My real documents have lots of fields and nested objects.
I would like to give more weight to the title field than to content, and more to content than to related_articles.
I have seen the title^5 way, but I would have to use it in every query, and I would (I guess) have to list all my fields instead of querying _all.
I have searched a lot, but I found many deprecated solutions (_boost, for example).
As I used to work with Sphinx, I'm looking for something like its field weight option, where you can give more weight to the fields that are really important in your index than to the others.
You're right that the _boost meta-field that you could use at the type level has been deprecated.
But you can still use the boost property when defining each field in your mapping, which will boost your field at indexing time.
Your mapping would look like this:
{
  "my_type": {
    "properties": {
      "title": {
        "type": "string", "boost": 5
      },
      "content": {
        "type": "string", "boost": 4
      },
      "related_articles": {
        "type": "nested",
        "properties": {
          "title": {
            "type": "string", "boost": 3
          }
        }
      }
    }
  }
}
You have to be aware, though, that it's not necessarily a good idea to boost your field at index time, because once set, you cannot change it unless you are willing to re-index all of your documents, whereas using query-time boosting achieves the same effect and can be changed more easily.
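For comparison, a sketch of the query-time alternative (the index name my_index and the search text are made up): the per-field ^ boosts live in the query, so they can be tuned without re-indexing. Since related_articles is a nested type, its title has to be boosted inside a nested query rather than listed in the multi_match.
// my_index and the query string are illustrative; boosts mirror the mapping above
GET my_index/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "multi_match": {
            "query": "mickey mouse",
            "fields": [ "title^5", "content^4" ]
          }
        },
        {
          "nested": {
            "path": "related_articles",
            "query": {
              "match": {
                "related_articles.title": { "query": "mickey mouse", "boost": 3 }
              }
            }
          }
        }
      ]
    }
  }
}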
