I have an Elasticsearch index of products. Every product has an article number in the form of a GUID. On the webshop I don't want to show a GUID (too long); I want to show an integer.
So now I have two keys: one to look up the web request (the integer) and one to update the product (the GUID).
I know I can search on a field in Elasticsearch. But is an exact-match search on a field slower than an exact match on the key (_id)? I don't want to do a mapping lookup from one key to the other, because that is another operation.
The _id field is just the primary key for documents, and it is stored separately. Yes, there will be some lag, but you'll find it's not much. If you want a field to be searchable as fast as the _id field, store the field externally in the mapping; refer to the store attribute for a field.
Like other fields, _id is also stored in ES. By default _id is not analyzed. If you define a field as not_analyzed, it is as fast as the _id field. ES indexes every field the same way.
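To make the comparison concrete, here is a minimal sketch of the two request bodies involved. It assumes a hypothetical integer field named web_number (not from the question) and only builds the JSON; sending it with a client is left out. In current Elasticsearch versions the counterpart of a not_analyzed string is the keyword type.

```python
import json


def lookup_by_id(guid):
    """Body for an `ids` query: matches the document whose _id equals `guid`."""
    return {"query": {"ids": {"values": [guid]}}}


def lookup_by_field(field, value):
    """Body for an exact-match `term` query, placed in a filter so no scoring is done."""
    return {"query": {"bool": {"filter": [{"term": {field: value}}]}}}


if __name__ == "__main__":
    # Both values below are made up for illustration.
    print(json.dumps(lookup_by_id("3f2504e0-4f89-11d3-9a0c-0305e82c3301")))
    print(json.dumps(lookup_by_field("web_number", 10042)))
```

Both bodies are single exact-match lookups against one index, so neither needs a second operation to translate one key into the other.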
I'm using Kibana 7.10.1.
I need to use a different time field for each index pattern. Is it possible to set multiple time fields for the same index?
You can pick any date (or date_nanos) field as the primary time field in an index pattern; you choose it on the second page when creating the pattern.
@timestamp is just a convention. You will, though, need to create a different index pattern for each combination of index(es) and primary time field.
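As a rough sketch of "one index pattern per time field", the snippet below builds two saved-object bodies that cover the same indices but use different primary time fields. The title orders-* and the field shipped_at are made-up names, and the assumption is that the bodies would be posted to Kibana's saved objects endpoint for the index-pattern type; creating the patterns through the UI gives the same result.

```python
def index_pattern_body(title, time_field):
    """Attributes of an index-pattern saved object: the index glob and its time field."""
    return {"attributes": {"title": title, "timeFieldName": time_field}}


# Same indices, two different primary time fields (both names are hypothetical).
patterns = [
    index_pattern_body("orders-*", "@timestamp"),
    index_pattern_body("orders-*", "shipped_at"),
]
```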
My question: is there a way to find the most expensive field in an Elasticsearch index?
The aim is to calculate and compare the storage and index size of two fields in an Elasticsearch index.
Also, is it wise to use dual-type fields?
For example, a string in Elasticsearch gets a text field, which is searchable, and a .keyword field, which is aggregatable.
Will that use double the storage and index space?
Is it wise to use dual-type fields, like a string in Elasticsearch having a text field which is searchable and a .keyword field which is aggregatable?
It depends entirely on the use case. Maintain both keyword and text representations of a field value if:
a) You need advanced search capability on the field
b) Your current or future requirements call for sorting or aggregating on the field.
In real life I have seen that for short text fields like 'name', 'business-name', 'tag', etc. it makes sense to maintain both. But for larger texts, e.g. a description, I don't think there are use cases for aggregation and sorting (in general).
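A minimal sketch of that advice as a mapping, built as a plain Python dict: the short name field gets both a text and a keyword representation, while the long description field is text only. The helper string_field and the ignore_above value of 256 are illustrative assumptions.

```python
def string_field(with_keyword=True, ignore_above=256):
    """Mapping for a string: always `text`, optionally with a `.keyword` sub-field."""
    field = {"type": "text"}
    if with_keyword:
        field["fields"] = {
            "keyword": {"type": "keyword", "ignore_above": ignore_above}
        }
    return field


mapping = {
    "mappings": {
        "properties": {
            # Short value: full-text search plus sorting/aggregation on name.keyword.
            "name": string_field(),
            # Long value: full-text search only, no keyword copy is indexed.
            "description": string_field(with_keyword=False),
        }
    }
}
```

Leaving out the keyword sub-field where it is not needed avoids indexing a second representation of the same value.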
I want the search response to contain only documents with specified doc ids. On Stack Overflow I found this question (Lucene filter with docIds), but as far as I understand, it creates an additional field in the document and then searches by that field. Is there another way to deal with it?
Lucene's docids are intended only to be internal keys. You should not be using them as search keys, or storing them for later use. Those ids are subject to change without warning. They will be changed when updating or reindexing documents, and can change at other times, such as segment merges, as well.
If you want your documents to have a unique identifier, you should generate that key separately from the docId, and index it as a field in your document.
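The following toy model illustrates why an application-generated key is safer than an internal position. It is not Lucene's real numbering scheme, only a sketch: documents live in lists ("segments"), a deleted document is represented by None, and the internal id is simply the position across segments.

```python
import uuid


def new_doc(title):
    """Create a document carrying its own application-generated unique key."""
    return {"key": str(uuid.uuid4()), "title": title}


def merge(segments):
    """Merge all segments into one, dropping deleted docs (None) and renumbering."""
    return [[doc for seg in segments for doc in seg if doc is not None]]


def find_by_key(segments, key):
    """Return (internal_position, doc) for the doc whose 'key' field matches."""
    pos = 0
    for seg in segments:
        for doc in seg:
            if doc is not None and doc["key"] == key:
                return pos, doc
            pos += 1
    return None
```

With segments [[a, None], [c]], document c sits at internal position 2; after merge() it sits at position 1, yet find_by_key still returns it through its key field.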
As I understand it, Elasticsearch uses a structure called an inverted index to provide full-text search. It is clear that the inverted index holds terms and the ids of the documents that contain each term. But a document can have any number of fields, and a field name can be used at query time to search only that field. In that case, how does Elasticsearch restrict a search to a particular field? I would like to know whether the inverted index contains field names or field ids along with terms and document ids.
A similar thing happens when you sort on a field. So there could be a way to associate terms with field names. Please help me understand the intricacies involved here.
Thanks in advance.
I would like to know if inverted index contains fields name or field id along with terms and document id.
Quoting from the Lucene docs:
The same string in two different fields is considered a different term. Thus terms are represented as a pair of strings, the first naming the field, and the second naming text within the field.
In that case how elasticsearch restricts/limits search only to a particular field?
Each segment index maintains term vectors: for each field in each document, the term vector is stored. A term vector consists of term text and term frequency.
Hence, the indexes are maintained for each field in each document.
We have an inverted index per field per index.
And there is something called the field data cache (or doc values), which holds the inverted "inverted index". All doc-to-field-value lookups happen here.
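Here is a toy sketch of both ideas, not how Lucene actually lays data out on disk: postings are keyed by a (field, term) pair, so the same string in two fields is a different term, and a separate doc-to-value table plays the role of doc values for sorting. Tokenizing by lowercasing and splitting on whitespace is a simplifying assumption.

```python
from collections import defaultdict


def build(docs):
    """Index a list of dicts; the list position serves as the document id."""
    postings = defaultdict(set)     # (field, term) -> set of doc ids
    doc_values = defaultdict(dict)  # field -> {doc id -> original value}
    for doc_id, doc in enumerate(docs):
        for field, value in doc.items():
            doc_values[field][doc_id] = value
            for term in str(value).lower().split():
                postings[(field, term)].add(doc_id)
    return postings, doc_values


def search(postings, field, term):
    """Doc ids containing `term` in `field` only; other fields are never consulted."""
    return sorted(postings.get((field, term.lower()), set()))


def sort_by(doc_values, field, doc_ids):
    """Order doc ids by a field's value using the doc -> value table."""
    return sorted(doc_ids, key=lambda d: doc_values[field][d])
```

For example, with one document whose title is "red shoes" and another whose color is "red", searching ("title", "red") and ("color", "red") returns different documents because the two terms are distinct keys.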
I had this question too.
I can share my understanding with you.
Elasticsearch creates an inverted index for each full-text field of the document. So if an index has 10 fields that allow full-text search, Elasticsearch will create 10 different inverted indices, one per field, and store the analyzer results for each field in its own inverted index.
Thus, when you perform a search and specify which fields you want to search, Elasticsearch searches only the inverted indices of those specific fields.
To summarize, an inverted index is created at the field level.
I hope that helps.
Thanks
So basically I have an index that I created, and I have set the mapping so that whenever a document is created, the _id of the document is set to the value of one of the document's fields.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-id-field.html
This was easy enough, but I noticed that when I update that field (through the Java API), the _id of the document remains the same, so the field and _id are out of sync.
Is this intended behaviour? If so, does anyone know why, and is it a bad idea to set the _id from a field that may change frequently?
If I wanted the _id and the field to be in sync, is reindexing an option?
Thanks
The _id is extracted and copied from that field at index time.
Also, _id is used as the routing key, and it decides where the document goes in the cluster.
Hence it's not possible to keep _id as a reference to some field; rather, that value is copied to _id before indexing.
If you want to change _id, re-indexing is the only option.
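A simplified sketch of why the routing role matters: the shard is derived from a hash of the routing value (the _id by default) modulo the number of primary shards. The hash below is zlib.crc32 as a stand-in, not the hash Elasticsearch actually uses, and the real formula has extra factors, so treat this purely as an illustration.

```python
import zlib


def shard_for(routing_value, num_primary_shards):
    """Stand-in routing: hash of the routing value modulo the primary shard count."""
    return zlib.crc32(routing_value.encode("utf-8")) % num_primary_shards


if __name__ == "__main__":
    # Hypothetical ids; a new _id can map to a different shard than the old one.
    for doc_id in ("product-1", "product-1-renamed"):
        print(doc_id, "->", shard_for(doc_id, 5))
```

Because a new _id may hash to a different shard, changing it in place would mean moving the document, which is why the document has to be indexed again under the new _id.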