Kafka Connect Elasticsearch: how to define index mappings

I'm using kafka-connect-elasticsearch with a custom converter, which extends the standard JsonConverter.
I have 250+ topics with different event types, so I'm glad that kafka-connect automatically creates indices for me in Elasticsearch.
However, I'd like to disable all analyzers except the keyword analyzer (I don't need full-text search here).
How can I do that? How and where can I manipulate index mappings?
In my custom converter I infer a schema for my payload, convert it to a kafka-connect-specific schema, and then return a new SchemaAndValue(connectSchema, connectValue) object.
I suppose the Connect-specific schema is then used to generate the mappings; is that true?
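One possible approach, sketched here under assumptions rather than taken from the connector's documentation: since the indices are created automatically, an index template with a dynamic template can make every dynamically mapped string field a keyword. The pattern "events-*" and the function name are hypothetical. The sketch also assumes the connector leaves mapping to Elasticsearch's dynamic mapping instead of pushing a mapping derived from the Connect schema (I believe the connector's schema.ignore setting governs this, but check the connector docs). The body uses the composable template form, which would be sent with PUT _index_template/<name> on Elasticsearch 7.8+; older versions use the legacy _template endpoint with a slightly different layout.

```python
import json


def keyword_only_index_template(index_patterns):
    """Build a composable index template body that maps every
    dynamically detected string field to the keyword type."""
    return {
        # Hypothetical patterns; they must match the auto-created index names.
        "index_patterns": list(index_patterns),
        "template": {
            "mappings": {
                "dynamic_templates": [
                    {
                        "strings_as_keywords": {
                            # Applies to any JSON string seen in a new field.
                            "match_mapping_type": "string",
                            # keyword fields are not analyzed for full-text search.
                            "mapping": {"type": "keyword"},
                        }
                    }
                ]
            }
        },
    }


body = keyword_only_index_template(["events-*"])
print(json.dumps(body, indent=2))
```

The template has to exist before the connector creates the indices, because templates are only applied at index creation time.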

Related

How to create an Elasticsearch index with index sorting via Spring annotation

I'm using Spring Data for Elasticsearch. I need to create an index with index sorting, as described here.
Is there a way to define a POJO field to be used as a sorting field during indexing?
I'm using annotations, and that would be the preferred way, but any other option would be OK too.
Currently this is not possible. Index sorting must be defined when the index is created, and although it is currently possible to define a JSON file with index settings and add it to the entity with @Setting, this fails in this case. The reason is that when index sorting is defined, the corresponding field must be defined in the mappings on index creation as well. Spring Data Elasticsearch first creates the index with the settings and only afterwards writes the mappings, which is then too late.
Please open an issue in the issue tracker requesting that index creation with index sorting be possible; we have to think about how to define the sort fields.
Edit 28.03.2021:
From Spring Data Elasticsearch 4.2.0.RC1 on, index creation is always done in one step together with writing the mapping, so it's possible to provide a settings file that will be used along with the mapping.
It is now also possible to define the index sorting parameters with arguments of the @Setting annotation, so there is no need for a JSON file at all.
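To illustrate why settings and mappings have to arrive together, here is a minimal sketch of the create-index request body that Elasticsearch expects when index sorting is used. It is not what Spring Data Elasticsearch generates internally; the field name "created_at" and the helper name are made up for the example.

```python
import json


def index_with_sorting(sort_field, field_type="date", order="desc"):
    """Build a create-index body where the sort field is declared in the
    mappings of the same request that defines the index sorting."""
    return {
        "settings": {
            "index": {
                "sort.field": sort_field,
                "sort.order": order,
            }
        },
        "mappings": {
            # The sort field must already be mapped at creation time.
            "properties": {sort_field: {"type": field_type}}
        },
    }


# "created_at" is a hypothetical field name.
body = index_with_sorting("created_at")
print(json.dumps(body, indent=2))
```

Sending only the "settings" part first and the "mappings" part later is the two-step sequence that fails, because the sort field is unknown when the index is created.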

Disable type hints in the document generated by Spring Data Elasticsearch 4.x

Is there a way I can disable type hints in the document generated by Spring Data Elasticsearch?
https://docs.spring.io/spring-data/elasticsearch/docs/current/reference/html/#elasticsearch.mapping.meta-model.rules
I have the mapping definition for my Elasticsearch index (7.x) with dynamic mapping set to strict, and when I try to index a document, a field _class is created in the document, which makes the insertion into the Elasticsearch 7.x index fail with the error below:
Elasticsearch exception [type=strict_dynamic_mapping_exception, reason=mapping set to strict, dynamic introduction of [_class] within [_doc] is not allowed]
Currently this is not possible. You can create an issue in Jira to have this implemented as a new feature, but beware that if type hints are not written, you won't be able to properly read collection-like values of generic types.
For example if you have two classes Foo and Bar and in an entity you have a property of type List<Object> which contains Foos and Bars you won't be able to read back such an entity from Elasticsearch, because the type information of the objects would be lost.
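The loss of type information can be shown with a small, language-neutral sketch. This is not Spring Data Elasticsearch's converter; Foo, Bar, the registry, and the helper functions are stand-ins that only mimic the idea of writing a _class hint into each stored object.

```python
from dataclasses import dataclass, asdict


@dataclass
class Foo:
    name: str


@dataclass
class Bar:
    name: str


# Hypothetical lookup from the stored hint back to a class.
REGISTRY = {"Foo": Foo, "Bar": Bar}


def to_doc(obj, with_hint):
    """Serialize an object, optionally adding a _class type hint."""
    doc = asdict(obj)
    if with_hint:
        doc["_class"] = type(obj).__name__
    return doc


def from_doc(doc):
    """Rebuild the original object if a hint is present, else return a plain dict."""
    hint = doc.get("_class")
    if hint is None:
        # Foo and Bar have the same shape, so nothing tells them apart.
        return dict(doc)
    fields = {key: value for key, value in doc.items() if key != "_class"}
    return REGISTRY[hint](**fields)


items = [Foo("a"), Bar("b")]
with_hints = [from_doc(to_doc(item, True)) for item in items]
without_hints = [from_doc(to_doc(item, False)) for item in items]
print(with_hints)
print(without_hints)
```

With hints the list comes back as a Foo and a Bar; without them both entries are indistinguishable plain dictionaries.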

Is there any tool out there for generating an Elasticsearch mapping?

Mostly what I do is assemble the mapping by hand, choosing the correct types myself.
Is there any tool which facilitates this?
For example, one that reads a class (C#, Java, etc.) and chooses the closest ES types accordingly.
I've never seen such a tool; however, I know that Elasticsearch has a REST API over HTTP.
So you can create a simple HTTP request with a JSON body that depicts your object with its fields: field names + types (strings, numbers, booleans), pretty much like the Java/C# class that you've described in the question.
Then you can ask ES to store the data in a non-existing index (to "index" your document, in ES terms). It will index the document, but it will also create the index and, most importantly for your question, create a mapping for you "dynamically", so that later you will be able to query the mapping structure (again via REST).
Here is the link to the relevant chapter about dynamically created mappings in the ES documentation.
And here you can find the API for querying the mapping structure.
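To get a feeling for what the dynamically created mapping will look like without a running cluster, the default rules can be approximated offline. The sketch below is a simplified imitation, not the Elasticsearch implementation: the function names are invented, and date detection is reduced to a single ISO-like pattern, whereas the real defaults accept a few more formats.

```python
import json
import re

# Simplified stand-in for the default date detection (ISO-like dates only).
_DATE_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})?)?$"
)


def infer_field(value):
    """Return an approximate default mapping for one JSON value, or None."""
    if value is None:
        return None  # null values do not add a field
    if isinstance(value, bool):  # checked before int: bool is a subclass of int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "float"}
    if isinstance(value, str):
        if _DATE_RE.match(value):
            return {"type": "date"}
        return {
            "type": "text",
            "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
        }
    if isinstance(value, dict):
        return {"properties": infer_properties(value)}
    if isinstance(value, list):
        # Arrays take the type of their first non-null element.
        for item in value:
            mapped = infer_field(item)
            if mapped is not None:
                return mapped
        return None
    raise TypeError(f"unsupported JSON value: {value!r}")


def infer_properties(doc):
    """Infer the 'properties' section of a mapping from a sample document."""
    props = {}
    for name, value in doc.items():
        mapped = infer_field(value)
        if mapped is not None:
            props[name] = mapped
    return props


sample = {"title": "hello", "count": 3, "author": {"name": "x"}}
print(json.dumps({"mappings": {"properties": infer_properties(sample)}}, indent=2))
```

The result is only a starting point to review and adjust by hand; the authoritative answer is still the mapping that Elasticsearch itself reports after indexing a sample document.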
At the end of the day you'd still want to retain some control over how your mapping is generated. I'd recommend:
1. syncing some sample documents without a mapping,
2. investigating what mapping was auto-generated, and
3. dropping the index and using dynamic_templates to pseudo-auto-generate / update the mapping as new documents come in.
This GUI could help too.
Currently, there is no such tool available to generate the mapping for Elasticsearch.
It is similar to having to design a database schema in MySQL.
If we want to avoid that, we use MongoDB, which requires no predefined schema.
But Elasticsearch comes with a very dynamic feature of its own, which allows us to play around with it. One of the most important features of Elasticsearch is that it tries to get out of your way and lets you start exploring your data as quickly as possible, much like a MongoDB schema, which can be changed dynamically.
To index a document, you don't need to first define a mapping or schema and define your fields along with their data types.
You can just index a document, and the index, type, and fields will be created automatically.
For further details you can go through the documentation below:
Elastic Dynamic Mapping

Is it possible to set the fuzzy parameter for all index searches as an application-wide parameter when a Spring Boot app queries Elasticsearch?

I'd like to have a property that adjusts the fuzziness of Elasticsearch search requests for the whole application, i.e. without changing it per @Query of each individual MyEntitySearchRepository. Is there a way to specify this 1) using some Spring Boot properties that are picked up by Spring Data Elasticsearch, or 2) using ElasticsearchTemplate, prepopulating it with the fuzzy value from a homegrown Spring Boot property, while the rest of the app's queries to Elasticsearch still come from the Spring Data definitions (index names, by/in/like parameters)? Is this possible at all, or is the only way for now to set up individual @Query annotations that form the request JSON containing the fuzzy parameter, as described there, so that I can only paste in the fuzzy value taken from the homegrown Spring Boot property?
This is at the moment not possible, and I'm not sure if I understand you right: You want to define a global fuzzy setting that should be applied to all queries? On which fields of your document? All String fields?
There is no global fuzzy setting in Elasticsearch itself, so it would be necessary to build custom queries internally.
At the moment the only way to go is with @Query-annotated custom repository methods.
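Since the fuzziness has to be part of every query body, one workaround is to build those bodies in a single place that reads the value from application configuration. The sketch below shows the idea outside of Spring: the environment variable APP_SEARCH_FUZZINESS and the helper name are invented for this example, and only the match query with its fuzziness parameter is actual Elasticsearch query DSL.

```python
import json
import os


def fuzzy_match_query(field, text, fuzziness=None):
    """Build a match query whose fuzziness comes from one central setting."""
    if fuzziness is None:
        # Hypothetical application-wide setting; "AUTO" is the fallback.
        fuzziness = os.environ.get("APP_SEARCH_FUZZINESS", "AUTO")
    return {
        "query": {
            "match": {
                field: {"query": text, "fuzziness": fuzziness}
            }
        }
    }


# "title" is a hypothetical field name.
print(json.dumps(fuzzy_match_query("title", "elasticsaerch"), indent=2))
```

Every search that should be fuzzy goes through this one function, so changing the setting changes all of them, which is as close to a global fuzzy setting as the query DSL allows.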

Is it possible to store documents without indexing specific fields in Elasticsearch?

The JSON data is stored in Elasticsearch. However, one specific field of the data does not always have the same type; it can hold two or more types. So when I save a document, I get a type error because data of a different type was stored first.
I want to store the raw data without changing the data type. Is it possible to store the documents while excluding certain fields from indexing?
You can disable the indexing of specific fields with the enabled option. But this option can only be used at the root of the mapping or on "object" fields.
Another way is to set dynamic mapping to false for this index (documentation here), and manually create the mapping for only the fields you want to index.
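Both options can be sketched as create-index bodies. The field names ("payload", "id", "created") and the helper names are made up for the example; "enabled" and "dynamic" are the mapping parameters mentioned above.

```python
import json


def mapping_with_disabled_field(raw_field, indexed_fields):
    """Option 1: keep one field in _source but skip parsing and indexing it."""
    props = {name: {"type": es_type} for name, es_type in indexed_fields.items()}
    # enabled=false is only allowed on object fields (or the mapping root).
    props[raw_field] = {"type": "object", "enabled": False}
    return {"mappings": {"properties": props}}


def mapping_with_dynamic_off(indexed_fields):
    """Option 2: index only the listed fields; unmapped fields stay in _source."""
    props = {name: {"type": es_type} for name, es_type in indexed_fields.items()}
    return {"mappings": {"dynamic": False, "properties": props}}


fields = {"id": "keyword", "created": "date"}  # hypothetical fields
print(json.dumps(mapping_with_disabled_field("payload", fields), indent=2))
print(json.dumps(mapping_with_dynamic_off(fields), indent=2))
```

In both cases the field with varying types is still returned as part of _source, but it cannot be searched or aggregated on.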
