Guidance on how to index Elasticsearch documents using Spring Data - spring

My application uses both Spring Data JPA and Spring Data Elasticsearch.
I plan to first persist the JPA entities, then map them to a slightly different java class (the Elasticsearch document) and finally index that document into the Elasticsearch index.
However, I have a few questions as to how, where, and when to index the documents.
Is indexing a time consuming process that should be asynchronous?
What design pattern could help me avoid having problematic code such as the following?
saveAdvertisement method from AdvertisementService:
    public void saveAdvertisement(Advertisement jpaAdvertisement) {
        jpaAdvertisementRepository.save(jpaAdvertisement);
        // somehow map the JPA entity to the ES document
        elasticSearchTemplate.index(esAdvertisement);
    }
whereby I have to have two concerns in the same method:
JPA persist
Elasticsearch indexing
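One way to keep the two concerns apart is to let the service only persist and publish an event, and to let a separate listener do the mapping and indexing, optionally on another thread. The following is a minimal plain-Java sketch of that idea. All class names, the in-memory lists standing in for the JPA repository and the Elasticsearch index, and the toDocument mapping are hypothetical. In a Spring application the same split could probably be built with ApplicationEventPublisher and a listener annotated with @TransactionalEventListener (plus @Async if indexing should not block the caller); that mapping onto Spring is a suggestion, not something stated in the question.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.function.Consumer;

// Hypothetical JPA entity (stand-in, not the poster's real class).
class Advertisement {
    final long id;
    final String title;
    Advertisement(long id, String title) { this.id = id; this.title = title; }
}

// Hypothetical Elasticsearch document with a slightly different shape.
class AdvertisementDocument {
    final String id;
    final String title;
    AdvertisementDocument(String id, String title) { this.id = id; this.title = title; }
}

// Event raised after an advertisement has been persisted.
class AdvertisementSavedEvent {
    final Advertisement advertisement;
    AdvertisementSavedEvent(Advertisement advertisement) { this.advertisement = advertisement; }
}

// Concern 1: persistence only. The service knows nothing about Elasticsearch.
class AdvertisementService {
    private final List<Advertisement> jpaStore; // stand-in for the JPA repository
    private final Consumer<AdvertisementSavedEvent> eventPublisher;

    AdvertisementService(List<Advertisement> jpaStore,
                         Consumer<AdvertisementSavedEvent> eventPublisher) {
        this.jpaStore = jpaStore;
        this.eventPublisher = eventPublisher;
    }

    void saveAdvertisement(Advertisement advertisement) {
        jpaStore.add(advertisement);
        eventPublisher.accept(new AdvertisementSavedEvent(advertisement));
    }
}

// Concern 2: mapping and indexing, executed on whatever Executor is supplied.
class AdvertisementIndexer {
    private final List<AdvertisementDocument> esIndex; // stand-in for the Elasticsearch index;
                                                       // use a thread-safe list with a real thread pool
    private final Executor executor;

    AdvertisementIndexer(List<AdvertisementDocument> esIndex, Executor executor) {
        this.esIndex = esIndex;
        this.executor = executor;
    }

    CompletableFuture<Void> onSaved(AdvertisementSavedEvent event) {
        AdvertisementDocument document = toDocument(event.advertisement);
        return CompletableFuture.runAsync(() -> esIndex.add(document), executor);
    }

    // Hypothetical mapping from the JPA entity to the Elasticsearch document.
    static AdvertisementDocument toDocument(Advertisement advertisement) {
        return new AdvertisementDocument(String.valueOf(advertisement.id), advertisement.title);
    }
}
```

The service would be wired as new AdvertisementService(store, indexer::onSaved). Passing a thread pool as the Executor makes indexing asynchronous, while passing Runnable::run keeps it synchronous, which is convenient in tests.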

Related

Spring Data Elasticsearch - How to get mapping for a field

I am wondering how to do this in Java using the Spring Data Elasticsearch library:
GET /my-index-000001/_mapping/field/user
This is not supported by Spring Data Elasticsearch. You'll have to get the mapping for the index and extract the part you need from the returned Map<String, Object>.
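Extracting one field from the index mapping could look roughly like the sketch below. It assumes the Map<String, Object> has the usual nesting, where every object level keeps its fields under a "properties" key, and it resolves dotted paths such as user.name. The class and method names are hypothetical, and multi-fields declared under "fields" are not handled. The map itself would come from something like IndexOperations.getMapping() in Spring Data Elasticsearch 4.x; in the sketch it is just a parameter.

```java
import java.util.Map;
import java.util.Optional;

class FieldMappingExtractor {

    // Walks "properties" -> part -> "properties" -> part ... for a dotted field path.
    // Returns an empty Optional if any path segment is missing from the mapping.
    @SuppressWarnings("unchecked")
    static Optional<Map<String, Object>> fieldMapping(Map<String, Object> indexMapping,
                                                      String fieldPath) {
        Map<String, Object> current = indexMapping;
        for (String part : fieldPath.split("\\.")) {
            Object properties = current.get("properties");
            if (!(properties instanceof Map)) {
                return Optional.empty();
            }
            Object field = ((Map<String, Object>) properties).get(part);
            if (!(field instanceof Map)) {
                return Optional.empty();
            }
            current = (Map<String, Object>) field;
        }
        return Optional.of(current);
    }
}
```

For a mapping in which user is a keyword field, fieldMapping(mapping, "user") would return the map containing the "type" entry for that field.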

Fetching Index data from elasticsearch DB using Spring Data ElasticSearch

I have Java code that connects to Elasticsearch using Spring Data Elasticsearch and fetches all the index data by calling the repository's findAll() method. The data received from Elasticsearch is processed by a separate application. When new data is inserted into Elasticsearch, I have the following questions:
1. How can I fetch only the newly inserted data programmatically?
2. Apart from using DSL queries, is there a way to asynchronously get the new records as and when they are inserted into Elasticsearch?
I don't want to execute the findAll() method again, because it returns the entire data set (including the previously processed records).
Any help on this is much appreciated.
You will need to add a field (I call it createdAt here) to your entities that contains the timestamp at which your application inserts the record into Elasticsearch. One possibility is to use the auditing support of Spring Data Elasticsearch to have the value set automatically; alternatively, you can set the value in your application. If the data is inserted by some other application, you need to make sure that it contains a timestamp in a format that matches the field type definition of this field in your application.
Then you'd need to define a method in your repository like
SearchHits<T> findByCreatedAtAfter(Timestamp referenceValue);
As for getting a notification in some form when new data is inserted: I'm not aware that Elasticsearch offers something like that. You will probably need to regularly call the method that retrieves the data.
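The regular polling could be sketched as follows. The poller remembers the newest createdAt value it has seen and asks only for records after it. LogEntry, NewEntryPoller, and the use of Instant are hypothetical choices for illustration; the Function parameter stands in for the repository's findByCreatedAtAfter method, so the sketch runs without an Elasticsearch cluster.

```java
import java.time.Instant;
import java.util.List;
import java.util.function.Function;

// Hypothetical entity carrying the insertion timestamp.
class LogEntry {
    final String id;
    final Instant createdAt;
    LogEntry(String id, Instant createdAt) { this.id = id; this.createdAt = createdAt; }
}

class NewEntryPoller {
    // Stand-in for a repository method such as findByCreatedAtAfter(referenceValue).
    private final Function<Instant, List<LogEntry>> findByCreatedAtAfter;
    private Instant lastSeen;

    NewEntryPoller(Function<Instant, List<LogEntry>> findByCreatedAtAfter, Instant start) {
        this.findByCreatedAtAfter = findByCreatedAtAfter;
        this.lastSeen = start;
    }

    // Returns only the entries created after the previous poll and advances the watermark.
    List<LogEntry> poll() {
        List<LogEntry> fresh = findByCreatedAtAfter.apply(lastSeen);
        for (LogEntry entry : fresh) {
            if (entry.createdAt.isAfter(lastSeen)) {
                lastSeen = entry.createdAt;
            }
        }
        return fresh;
    }
}
```

poll() would be called on a fixed schedule. Note that a record arriving late with a timestamp equal to or older than the watermark would be missed by this simple version.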

disable TypeHints in the Document Generated for Spring Data ElasticSearch 4.X

Is there a way I can disable type hints in the document generated by Spring Data Elasticsearch?
https://docs.spring.io/spring-data/elasticsearch/docs/current/reference/html/#elasticsearch.mapping.meta-model.rules
The mapping definition for my Elasticsearch index (7.x) has dynamic mapping set to strict. When I try to index a document, a _class field is created in the document, which causes the insertion into the Elasticsearch 7.x index to fail with the following error:
Elasticsearch exception [type=strict_dynamic_mapping_exception, reason=mapping set to strict, dynamic introduction of [_class] within [_doc] is not allowed]
Currently this is not possible. You can create an issue in Jira to have this implemented as a new feature, but beware that if type hints are not written, you won't be able to properly read collection-like values of generic types.
For example, if you have two classes Foo and Bar, and an entity has a property of type List<Object> that contains Foos and Bars, you won't be able to read such an entity back from Elasticsearch, because the type information of the objects would be lost.
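The loss of type information can be illustrated with a toy round trip. This is not how the Spring Data Elasticsearch converter is implemented; it is a hand-written stand-in, with hypothetical write and read helpers, that stores an element as a map with or without a _class entry and then tries to rebuild it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class Foo {
    final String label;
    Foo(String label) { this.label = label; }
}

class Bar {
    final String label;
    Bar(String label) { this.label = label; }
}

class TypeHintDemo {

    // Toy writer: supports only Foo and Bar elements.
    static Map<String, Object> write(Object element, boolean withTypeHint) {
        Map<String, Object> source = new LinkedHashMap<>();
        if (withTypeHint) {
            source.put("_class", element.getClass().getSimpleName());
        }
        String label = element instanceof Foo ? ((Foo) element).label : ((Bar) element).label;
        source.put("label", label);
        return source;
    }

    // Toy reader: can only rebuild the concrete type when the hint is present.
    static Object read(Map<String, Object> source) {
        String label = (String) source.get("label");
        Object hint = source.get("_class");
        if ("Foo".equals(hint)) {
            return new Foo(label);
        }
        if ("Bar".equals(hint)) {
            return new Bar(label);
        }
        return source; // without a hint there is nothing better than the raw map
    }
}
```

With the hint, read(write(new Foo("x"), true)) yields a Foo again; without it, the reader cannot tell whether the map was a Foo or a Bar and hands back the plain map.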

Elasticsearch - Integration of Elasticsearch with Spring + Mongodb

Currently I am working on an application based on Spring Boot + MongoDB. Although the application is working fine, there are some collections where some fields are static/required and others are totally dynamic.
Example:
    @Document(collection = "yyyEntry")
    public class YYYEntry {
        @Id
        private String id;
        private String yyyId;
        private String yyyNumber;
        private List<KeyValueVo> customFields;
        //...
    }

    public class KeyValueVo {
        private String field;
        private Object value;
    }
Here, field in KeyValueVo can hold any value, and there can be a list of such pairs. Although I can search documents by these fields in MongoDB, the performance is too low (there can be a very large number of such records), and the problem looks very much like a full-text search.
So we decided to integrate Elasticsearch with MongoDB for searching. I have integrated Elasticsearch with Spring Boot and it is working fine, but I found out that Elasticsearch stores the same data in its own format in its own storage rather than using the data in MongoDB (data duplication).
So my questions are:
1. Can we configure Elasticsearch to use the data in MongoDB (removing the data duplication)?
2. If question 1 is impossible, how can I store data in both MongoDB and Elasticsearch at the same time using Spring Boot?
3. Is it good to use Elasticsearch with MongoDB?
4. Is it good to use the Elasticsearch MongoDB River for that purpose?
No need for code implementations; I just need simple guidelines. Any help will be highly appreciated.
Thanks,

Batch indexing Spring Data JPA entries to Elastic through Spring Data ElasticSearch

Our current setup uses MySQL as the main data source through Spring Data JPA, with Hibernate Search to index and search data. We have now decided to move to Elasticsearch for searching to better align with other features; in addition, we need multiple servers to share the indexing and searching.
I was able to set up Elasticsearch easily using Spring Data Elasticsearch for indexing and searching, through ElasticsearchRepository. The challenge now is how to index all the existing MySQL records into Elasticsearch. Hibernate Search provides an API for this, org.hibernate.search.jpa.FullTextEntityManager#createIndexer, which we use all the time, but I cannot find a comparable solution in Spring Data Elasticsearch. I hope somebody can help me out or provide some pointers.
There is a similar question here; however, the solution proposed there doesn't fit my needs well, as I'd prefer to index whole objects whose fields are mapped to multiple DB tables.
So far I haven't found a better solution than writing my own code to index all JPA entries into ES inside my application, and this worked out fine for me:
    // PageRequest.of is the Spring Data 2.x factory; older versions use new PageRequest(0, 100).
    Pageable page = PageRequest.of(0, 100);
    Page<Instance> curPage;
    do {
        curPage = instanceManager.listInstancesByPage(page); // Get data by page from the JPA repo.
        for (Instance instance : curPage.getContent()) {
            instanceElasticSearchRepository.index(instance); // Index one by one into the ES repo.
        }
        page = curPage.nextPageable();
    } while (curPage.hasNext()); // do/while so that the last page is indexed as well
The logic is straightforward, but depending on the quantity of data it might take a while, so breaking the work into batches and logging progress messages can be helpful.
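The batching and progress messages could be structured as in the sketch below. It is written against plain functional interfaces so that it runs on its own: fetchPage stands in for the paged JPA query and bulkIndex for a bulk call such as a repository's saveAll. BatchReindexer and its parameter names are hypothetical.

```java
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Consumer;

class BatchReindexer {

    // Reads pages of pageSize elements until a short or empty page is returned,
    // hands each page to bulkIndex as one batch, and returns the number of records indexed.
    static <T> long reindexAll(BiFunction<Integer, Integer, List<T>> fetchPage,
                               Consumer<List<T>> bulkIndex,
                               int pageSize) {
        long total = 0;
        int pageNumber = 0;
        while (true) {
            List<T> batch = fetchPage.apply(pageNumber, pageSize);
            if (batch.isEmpty()) {
                break;
            }
            bulkIndex.accept(batch); // one bulk request per page instead of one per record
            total += batch.size();
            System.out.println("Indexed " + total + " records so far");
            if (batch.size() < pageSize) {
                break; // a short page means there is nothing left to read
            }
            pageNumber++;
        }
        return total;
    }
}
```

Sending each page as one batch keeps the number of round trips to Elasticsearch proportional to the number of pages rather than the number of records.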
