I was wondering if it is possible to modify the behaviour of ES when dynamically mapping a field. In my case I don't want ES to map anything on its own. Most of the fields I have are mapped as text by ES when the field occurs for the first time.
The correct mapping for our application, though, is 99% of the time keyword, since we don't want the tokenizer to run on those fields. Can we modify the behaviour so that new fields are always mapped as keyword (unless defined otherwise in the index mapping, of course)?
Cheers and thanks!
You can use dynamic templates to solve your issue. In fact, the Elasticsearch guide has a snippet that fits your case.
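For reference, a minimal sketch of such a template (the index name my-index is a placeholder, and this is the typeless 7.x-style syntax; older versions need a type name inside mappings). It matches every string field seen for the first time and maps it as keyword, while fields declared explicitly in the mapping keep their declared type:

```
PUT my-index
{
  "mappings": {
    "dynamic_templates": [
      {
        "strings_as_keywords": {
          "match_mapping_type": "string",
          "mapping": {
            "type": "keyword"
          }
        }
      }
    ]
  }
}
```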
When using a SetProcessor to enrich documents with a new field, what is the behavior if strict mapping is used for the index? Does the field being set by the SetProcessor need to be added to the mapping beforehand?
Yes, the new field needs to be added to the mapping prior to the execution of the pipeline. It makes no difference whether you add a new field to your source document yourself or an ingest pipeline creates one out of the blue: strict mapping is strict, no matter what.
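A sketch of what that looks like in practice (pipeline, index, and field names are invented; typeless 7.x-style syntax). The field written by the set processor is declared in the strict mapping even though it never appears in the source documents:

```
PUT _ingest/pipeline/add-env
{
  "processors": [
    {
      "set": {
        "field": "environment",
        "value": "production"
      }
    }
  ]
}

PUT my-index
{
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "message":     { "type": "text" },
      "environment": { "type": "keyword" }
    }
  }
}

PUT my-index/_doc/1?pipeline=add-env
{
  "message": "hello"
}
```

Without the environment entry in the mapping, the last request would be rejected with a strict_dynamic_mapping_exception.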
I'm confused about mapping and indexing. As I understand it, mapping an index means making a kind of schema for its documents.
My point is that when I'm creating a document, there are several ways to go about it:
1) map the index -> index the documents
2) index the documents, and the mapping is created simultaneously
So why do I have to define a mapping in some cases?
Elasticsearch allows you to skip defining a mapping for fields because it has options to detect field types and apply its default mapping. https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-field-mapping.html
It's good practice to always define the mapping explicitly; relying on the ES detection algorithm can cause unpredictable results.
If you really do need dynamic mappings, for example because you don't know all the required fields when defining the mapping, you can use something like dynamic templates or a default mapping.
There are several known limitations with default mappings (for example, strings are mapped as text with a keyword sub-field, and values longer than 256 characters are not indexed in that sub-field). That might work for most cases where the data comes from logs. But defining the mapping yourself gives you more control over what kind of indexing is done and whether a field is indexed at all. Depending on your use case, the preferred option (defining the mapping vs. mapping on the fly) might be different.
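To make that concrete, here is a sketch of an explicit mapping (index and field names are invented). Left to dynamic mapping, sku would become text with a sku.keyword sub-field, and price would become long or float depending on the first value seen; declaring them up front removes the guesswork:

```
PUT products
{
  "mappings": {
    "properties": {
      "name":  { "type": "text" },
      "sku":   { "type": "keyword" },
      "price": { "type": "double" }
    }
  }
}
```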
So I'm working on a system that logs bad data sent to an API, along with what the full request was. I would love to be able to see this in Kibana.
The issue is that the datatypes could be random, so when I send them to the bad_data field, indexing fails if the value doesn't match the original mapping.
Anyone have a suggestion for the right way to handle this?
(ES 2.x is required due to a sub-dependency.)
You could use the ignore_malformed flag in your field mappings. In that case, values in the wrong format will not be indexed, but your document will still be saved.
See the Elastic documentation for more information.
If you want to be able to query such fields as their original text, you could use fields in your mapping for multi-type indexing, to get fast queries on the raw text values.
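A sketch in 2.x syntax (index, type, and field names are placeholders) combining both ideas: ignore_malformed keeps documents whose value doesn't parse as the declared type, and a not_analyzed sub-field makes the raw text queryable as bad_data.raw. Note that a string field still rejects whole JSON objects, so if objects can land in bad_data, serialize them to a string before indexing:

```
PUT badlogs
{
  "mappings": {
    "entry": {
      "properties": {
        "timestamp": {
          "type": "date",
          "ignore_malformed": true
        },
        "bad_data": {
          "type": "string",
          "fields": {
            "raw": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      }
    }
  }
}
```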
We're in the process of setting up Amazon Elasticsearch Service (running Elasticsearch version 2.3).
We have different types of data (that I'm currently thinking of as different document types within the same index).
We have a generic search in an app where we want an inline autocomplete function, that is, a completion suggester returning hits from all different data (document) types. How can that be set up?
When querying suggesters you have to specify an index, which is why I wanted to keep all the data in the same index. According to the documentation, the completion suggester considers all documents in the index.
Setting up the completion suggester for the first document type was pretty straightforward and is working great. However, as far as I can see, you have to specify a suggest field when querying. That would all be fine had it not been for the error message we get when setting up the mapping for the second document type:
Type: illegal_argument_exception Reason: "[suggest] is defined as an object in mapping [name_of_document_type] but this name is already used for a field in other types"
Writing this question, I see that it's possible to specify more than one suggester in a single suggest query. Maybe that is how we have to solve it? (I.e. get X results from Y suggesters and compare the scores to pick the one suggestion we want to present to the user.)
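For what it's worth, a sketch of such a combined request in 2.x syntax (suggester and field names are invented); the scores would then be compared client-side:

```
POST my_index/_suggest
{
  "products": {
    "text": "ki",
    "completion": { "field": "product_suggest" }
  },
  "articles": {
    "text": "ki",
    "completion": { "field": "article_suggest" }
  }
}
```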
One of the core principles of good data design for Elasticsearch (as with many data stores) is to optimise your data storage for ease of reading. Usually, this means embracing duplication.
With this in mind, I'd suggest having a separate autocomplete index with a mapping that's designed specifically for the suggester queries.
Whenever you insert or update one of your other documents, map it to your autocomplete type and add or update it in your autocomplete index at the same time (or, depending on how up to date it needs to be, create an offline process to update your autocomplete index, e.g. every day).
Then, when you do your suggest query, you can just use your autocomplete index and not worry about dealing with different types of documents with different fields.
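As a sketch (all names are invented, 2.x syntax): a single-type autocomplete index with one completion field, fed from your other document types at write time and queried with a single suggester:

```
PUT autocomplete
{
  "mappings": {
    "entry": {
      "properties": {
        "suggest": {
          "type": "completion",
          "payloads": true
        }
      }
    }
  }
}

PUT autocomplete/entry/product-42
{
  "suggest": {
    "input":   ["kibana", "kibana dashboard"],
    "output":  "Kibana dashboard",
    "payload": { "source_type": "product" }
  }
}

POST autocomplete/_suggest
{
  "app-suggest": {
    "text": "kib",
    "completion": { "field": "suggest" }
  }
}
```

The payload is returned with each suggestion, so the app can tell which kind of entity a hit came from.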
I am trying to implement an analyzer (uppercase) and index some documents after that in Elasticsearch. My question is, am I following the correct procedure?
Define your analyzer (specifying the index and type name), which would create the index if it doesn't exist.
Then index the documents with the same index and type name as above, during which the stream of text would pass through the analyzer and then be saved in the index.
Is this the correct way to go about it?
I indexed some documents with and without using analyzers, checked the contents of the index before/after using facets, and they were no different.
The content is not supposed to be different; how it's indexed is. You should recognize the difference because queries would have different results: some documents are found that weren't found without the analyzers, and vice versa.
Try, for instance, a Match Query.
The _score may (and should) also change.
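To see the difference concretely, look at the tokens rather than the stored _source. A sketch in 1.x/2.x-era syntax (index and field names are invented): _analyze shows the uppercased tokens, and a match query matches regardless of case because the query text is run through the field's analyzer, whereas a term query for a lowercase token would find nothing:

```
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "uppercase_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["uppercase"]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "title": {
          "type": "string",
          "analyzer": "uppercase_analyzer"
        }
      }
    }
  }
}

GET my_index/_analyze?analyzer=uppercase_analyzer&text=quick brown fox

GET my_index/_search
{
  "query": { "match": { "title": "quick" } }
}
```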