Elasticsearch data streams integration with Spring Data Elasticsearch

We have time series data and are therefore moving to Elasticsearch data streams.
I am using Spring Boot with the following dependency:
<dependency>
    <groupId>org.springframework.data</groupId>
    <artifactId>spring-data-elasticsearch</artifactId>
    <version>4.4.1</version>
</dependency>
It seems Spring Data stopped working, with the following error:
Error Message: cannot create index with name [my_datastream_index_test], because it matches with template [my-datastream-index-template] that creates data streams only, use create data stream api instead
Caused by: ElasticsearchStatusException[Elasticsearch exception [type=illegal_argument_exception, reason=cannot create index with name [my_datastream_index_test], because it matches with template [my-datastream-index-template] that creates data streams only, use create data stream api instead]]
at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:178)
at org.elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:2484)
at org.elasticsearch.client.RestHighLevelClient.parseResponseException(RestHighLevelClient.java:2461)
at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:2184)
at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:2154)
at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:2118)
at org.elasticsearch.client.IndicesClient.create(IndicesClient.java:152)
at org.springframework.data.elasticsearch.core.RestIndexTemplate.lambda$doCreate$0(RestIndexTemplate.java:86)
at org.springf
Does Spring Data Elasticsearch work with data streams? I expected a seamless transition. Am I missing anything?
The repository bean itself is not getting created:
Caused by: org.springframework.beans.BeanInstantiationException: Failed to instantiate [org.springframework.data.elasticsearch.repository.support.SimpleElasticsearchRepository]: Constructor threw exception; nested exception is RestStatusException{status=400} org.springframework.data.elasticsearch.RestStatusException: Elasticsearch exception [type=illegal_argument_exception, reason=cannot create index with name [my_datastream_index_test], because it matches with template [my-datastream-index-template] that creates data streams only, use create data stream api instead]; nested exception is ElasticsearchStatusException[Elasticsearch exception [type=illegal_argument_exception, reason=cannot create index with name [my_datastream_index_test], because it matches with template [my-datastream-index-template] that creates data streams only, use create data stream api instead]]
at org.springframework.beans.BeanUtils.instantiateClass(BeanUtils.java:224)
at org.springframework.data.repository.core.support.RepositoryFactorySupport.lambda$instantiateClass$5(RepositoryFactorySupport.java:579)
at java.util.Optional.map(Optional.java:215)
at org.springframework.data.repository.core.support.RepositoryFactorySupport.instantiateClass(RepositoryFactorySupport.java:579)
at org.springframework.data.repository.core.support.RepositoryFactorySupport.getTargetRepositoryViaReflection(RepositoryFactorySupport.java:544)
at org.springframework.data.elasticsearch.repository.support.ElasticsearchRepositoryFactory.getTargetRepository(ElasticsearchRepositoryFactory.java:74)
at org.springframework.data.repository.core.support.RepositoryFactorySupport.getRepository(RepositoryFactorySupport.java:325)
at org.springframework.data.repository.core.support.RepositoryFactoryBeanSupport.lambda$afterPropertiesSet$5(RepositoryFactoryBeanSupport.java:323)
at org.springframework.data.util.Lazy.getNullable(Lazy.java:231)
at org.springframework.data.util.Lazy.get(Lazy.java:115)
at org.springframework.data.repository.core.support.RepositoryFactoryBeanSupport.afterPropertiesSet(RepositoryFactoryBeanSupport.java:329)
at org.springframework.data.elasticsearch.repository.support.ElasticsearchRepositoryFactoryBean.afterPropertiesSet(ElasticsearchRepositoryFactoryBean.java:69)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeInitMethods(AbstractAutowireCapableBeanFactory.java:1863)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1800)
... 161 more

First, as Val commented, Spring Data Elasticsearch currently creates indices, not data streams.
But second, according to the error message, the data stream would be created by an index template, I assume. You need to keep Spring Data Elasticsearch from creating the index automatically; you do this by setting the createIndex attribute on the @Document annotation of your entity class:
@Document(indexName = "my_datastream_index_test", createIndex = false)
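In context, a minimal sketch (the entity class and its fields are hypothetical; only the annotation values come from the question):

```java
import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;

// Hypothetical entity: createIndex = false keeps Spring Data Elasticsearch
// from trying to create the backing index at repository bootstrap, so the
// data stream set up by the index template is used instead.
@Document(indexName = "my_datastream_index_test", createIndex = false)
public class Measurement {

    @Id
    private String id;

    private String payload;

    // getters and setters omitted
}
```

With this in place the repository bean can be created without Spring Data issuing a create-index call that collides with the data stream template.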

Related

“Error writing to ES after 1 attempt(s). No more attempts allowed” when using the “PubSub to Elasticsearch” streaming template

We ran a Google Dataflow batch job that writes data records to a pubsub pipeline, and have a separate streaming job that pulls data from the pubsub pipeline and writes updates to our elasticsearch index using the “PubSub to Elasticsearch” streaming template. However, we had to terminate the streaming job that writes to Elastic because we encountered multiple “Error writing to ES after 1 attempt(s). No more attempts allowed” errors.
The job that reads from PubSub and writes to Elasticsearch is writing to an index that is not a streaming index. It actually needs to update the same document in the index, potentially multiple times.
Does this job need to write to a streaming index? If not, how can we configure the dataflow job to actually slow down and not overwhelm Elastic?
I have seen that increasing thread_pool.index.bulk.queue_size can help, but it isn't recommended.
Error message from worker: generic::unknown: org.apache.beam.sdk.util.UserCodeException: java.io.IOException: Error writing to ES after 1 attempt(s). No more attempts allowed
org.apache.beam.sdk.util.UserCodeException.wrap(UserCodeException.java:39)
org.apache.beam.sdk.io.elasticsearch.ElasticsearchIO$BulkIO$BulkIOBundleFn$DoFnInvoker.invokeFinishBundle(Unknown Source)
org.apache.beam.fn.harness.FnApiDoFnRunner.finishBundle(FnApiDoFnRunner.java:1751)
org.apache.beam.fn.harness.data.PTransformFunctionRegistry.lambda$register$0(PTransformFunctionRegistry.java:111)
org.apache.beam.fn.harness.control.ProcessBundleHandler.processBundle(ProcessBundleHandler.java:538)
org.apache.beam.fn.harness.control.BeamFnControlClient.delegateOnInstructionRequestType(BeamFnControlClient.java:151)
org.apache.beam.fn.harness.control.BeamFnControlClient$InboundObserver.lambda$onNext$0(BeamFnControlClient.java:116)
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.io.IOException: Error writing to ES after 1 attempt(s). No more attempts allowed
org.apache.beam.sdk.io.elasticsearch.ElasticsearchIO$BulkIO$BulkIOBaseFn.handleRetry(ElasticsearchIO.java:2569)
org.apache.beam.sdk.io.elasticsearch.ElasticsearchIO$BulkIO$BulkIOBaseFn.flushBatch(ElasticsearchIO.java:2519)
org.apache.beam.sdk.io.elasticsearch.ElasticsearchIO$BulkIO$BulkIOBaseFn.flushAndOutputResults(ElasticsearchIO.java:2435)
org.apache.beam.sdk.io.elasticsearch.ElasticsearchIO$BulkIO$BulkIOBaseFn.finishBundle(ElasticsearchIO.java:2396)
From the official docs for the template:
The Dataflow template uses Elasticsearch's data streams feature to store time series data across multiple indices while giving you a single named resource for requests
There are parameters you can use to control this, and it sounds like you will want to change the defaults:
bulkInsertMethod (Optional) Whether to use INDEX (index, allows upserts) or CREATE (create, errors on duplicate _id) with Elasticsearch bulk requests. Default: CREATE.
usePartialUpdate (Optional) Whether to use partial updates (update rather than create or index, allowing partial docs) with Elasticsearch requests. Default: false.
maxRetryAttempts (Optional) Max retry attempts, must be > 0. Default: no retries.
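Under the hood the template writes through Beam's ElasticsearchIO (visible in the stack trace above), which exposes the same knobs directly if you run your own pipeline instead of the template. A sketch, assuming a hypothetical PCollection<String> of JSON documents and placeholder connection details:

```java
import org.apache.beam.sdk.io.elasticsearch.ElasticsearchIO;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

public class EsWriteSketch {

    // Sketch only: host, index name, retry counts and batch size are placeholders.
    static void writeToEs(PCollection<String> jsonDocs) {
        jsonDocs.apply(
            ElasticsearchIO.write()
                .withConnectionConfiguration(
                    ElasticsearchIO.ConnectionConfiguration.create(
                        new String[] {"https://es-host:9200"}, "my-index", "_doc"))
                // partial updates instead of create, so repeated _ids don't fail the bulk
                .withUsePartialUpdate(true)
                // retry failed bulk requests instead of giving up after one attempt
                .withRetryConfiguration(
                    ElasticsearchIO.RetryConfiguration.create(3, Duration.standardMinutes(1)))
                // smaller bulk batches put less pressure on the cluster
                .withMaxBatchSize(500L));
    }
}
```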

Delete data from nested fields in ES via Java Spring Boot

How to delete data based on a condition from an Elasticsearch index using RestHighLevelClient in Spring Boot
The Stack Overflow question above describes how to remove main_phone1=1 from the companyphones[] array inside the ES document via a 'painless' script.
Does anyone know how to remove data from fields[] in an ES document without using a 'painless' script inside the Java code? For example, by constructing a QueryBuilder and using ElasticsearchRestTemplate to invoke the delete?
Currently I am using Spring Boot + the spring-data-elasticsearch jar + Maven.
Thank you

Spring Data Elasticsearch - How to get mapping for a field

Wondering how to do the following in Java using the Spring Data Elasticsearch library:
GET /my-index-000001/_mapping/field/user
This is not supported by Spring Data Elasticsearch. You'll have to get the mapping for the whole index and extract the part you need from the returned Map<String, Object>.
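A sketch of that workaround, assuming Spring Data Elasticsearch 4.x, where IndexOperations#getMapping() returns the index mapping as a Map<String, Object>; the helper class and method names are mine:

```java
import java.util.Map;

public class MappingUtil {

    // Extracts the mapping sub-tree for a single field from the full index
    // mapping, e.g. the Map returned by
    // operations.indexOps(IndexCoordinates.of("my-index-000001")).getMapping().
    @SuppressWarnings("unchecked")
    public static Object fieldMapping(Map<String, Object> indexMapping, String field) {
        Map<String, Object> properties =
                (Map<String, Object>) indexMapping.get("properties");
        return properties == null ? null : properties.get(field);
    }
}
```

For the example above you would pass "user" as the field name and get back that field's mapping definition, or null if the field is not mapped.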

disable TypeHints in the Document Generated for Spring Data ElasticSearch 4.X

Is there a way I can disable type hints in the document generated by Spring Data Elasticsearch?
https://docs.spring.io/spring-data/elasticsearch/docs/current/reference/html/#elasticsearch.mapping.meta-model.rules
I have the mapping definition for my Elasticsearch index (7.x) with dynamic mapping set to strict. When I try to index a document, a _class field is added to the Elasticsearch document, which makes the insertion into the 7.x index fail with the error below:
Elasticsearch exception [type=strict_dynamic_mapping_exception, reason=mapping set to strict, dynamic introduction of [_class] within [_doc] is not allowed]
Currently this is not possible. You can create an issue in Jira to have this implemented as a new feature, but beware that if type hints are not written, you won't be able to properly read collection-like values of generics.
For example, if you have two classes Foo and Bar, and an entity has a property of type List<Object> containing Foos and Bars, you won't be able to read such an entity back from Elasticsearch, because the type information of the elements would be lost.
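To illustrate (Foo, Bar and the wrapper class are hypothetical): without a _class hint both elements below serialize to plain JSON objects, and a reader has no way to decide which class to map each one back to.

```java
import java.util.List;

class Foo { }

class Bar { }

class Wrapper {
    // Both elements become plain JSON objects in Elasticsearch; the _class
    // type hint is what would tell the mapper which element was a Foo and
    // which a Bar when reading the entity back.
    List<Object> items = List.of(new Foo(), new Bar());
}
```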

Guidance on how to index Elasticsearch documents using Spring Data

My application uses both Spring Data JPA and Spring Data Elasticsearch.
I plan to first persist the JPA entities, then map them to a slightly different java class (the Elasticsearch document) and finally index that document into the Elasticsearch index.
However, I have a few questions as to how, where and when to index the documents.
Is indexing a time-consuming process that should be asynchronous?
What design pattern could help me avoid having problematic code such as the following?
saveAdvertisement method from AdvertisementService:
public void saveAdvertisement(Advertisement jpaAdvertisement) {
    jpaAdvertisementRepository.save(jpaAdvertisement);
    // somehow map the jpa entity to the es document
    elasticSearchTemplate.index(esAdvertisement);
}
whereby I have to have two concerns in the same method:
JPA persist
Elasticsearch indexing