Spring Data Elasticsearch - migrating documents to a new index

I'm developing a Spring application for search purposes. I use the Spring Data Elasticsearch library for creating indices and managing documents. For querying (searching) I use the regular Elasticsearch client, not the one from Spring Data.
I noticed that Spring Data only creates the index if it is missing in Elasticsearch. Whenever a new field is added to the class annotated with @Document, the mapping is not updated. Thus, searching on a just-added field causes a bad request.
The application is already running in production, and there are multiple instances of it. I would like to change the mapping of the index and keep the existing data.
The solution I found on the internet and in the documentation is to create a new index, copy the data (possibly transforming it on the fly) with the reindex functionality, and switch aliases to the new index.
I implemented a solution with this approach; the sketch below shows the general shape. The migration procedure runs on application startup (if required, decided by an environment parameter).
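For illustration only, a minimal sketch of such a reindex-and-switch-alias step using the Elasticsearch high-level REST client (the index and alias names here are placeholders, not the real ones):

import org.elasticsearch.action.admin.indices.alias.IndicesAliasesRequest;
import org.elasticsearch.action.admin.indices.alias.IndicesAliasesRequest.AliasActions;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.reindex.ReindexRequest;

public class IndexMigration {

    private final RestHighLevelClient client;

    public IndexMigration(RestHighLevelClient client) {
        this.client = client;
    }

    // Copy all documents from the old index into the new one,
    // then repoint the alias the application actually queries.
    public void migrate() throws Exception {
        ReindexRequest reindex = new ReindexRequest();
        reindex.setSourceIndices("products_v1");   // placeholder names
        reindex.setDestIndex("products_v2");
        client.reindex(reindex, RequestOptions.DEFAULT);

        // Both alias actions are applied in a single atomic request.
        IndicesAliasesRequest aliases = new IndicesAliasesRequest();
        aliases.addAliasAction(AliasActions.add().index("products_v2").alias("products"));
        aliases.addAliasAction(AliasActions.remove().index("products_v1").alias("products"));
        client.indices().updateAliases(aliases, RequestOptions.DEFAULT);
    }
}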
However, this approach seems cheap and shoddy to me. Changing documents with a Painless script is error prone. The migration is difficult to test. I need to manually keep track of which environment I am running the migration on and have the proper index name set. During deployment I need to keep an eye on the process to check that everything worked correctly, and some manual changes might be required as well. What if the reindex procedure fails in the middle?
There are a lot of questions bothering me. I was searching for why there isn't a library for this, similar to Flyway. Also, I understand that it is not possible to change the mapping of existing fields, but it is possible to add new fields, and this is not supported by Spring Data Elasticsearch.
Could you please give me some advice on how you tackle such situations?

This is not an answer on how to do these migrations in general, but some clarification of what Spring Data Elasticsearch can do and what it does.
Spring Data Elasticsearch creates an index with the corresponding mapping if you are using a Spring Data Elasticsearch repository for your entity and the index does not exist on application startup. It does not update the mapping of an index by itself.
You can nevertheless update an index mapping from program code; there is IndexOperations.putMapping(java.lang.Class<?>) for that. So if you add a new property to your entity and then on application start call this method with the changed entity class, the index mapping will be updated. This can only add new fields to the mapping, not change existing ones - that is a restriction of Elasticsearch.
If your application is running in multiple instances, it is up to you to synchronize them when updating, or to handle errors correctly.
If you add fields, make sure to update the mapping before adding data; otherwise the new field's type will be autodetected by Elasticsearch and you will have to do a manual reindex.
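A minimal sketch of that startup call, assuming Spring Data Elasticsearch 4.x and a hypothetical Product entity (the names are illustrative only):

import org.springframework.boot.context.event.ApplicationReadyEvent;
import org.springframework.context.event.EventListener;
import org.springframework.data.elasticsearch.core.ElasticsearchOperations;
import org.springframework.data.elasticsearch.core.IndexOperations;
import org.springframework.stereotype.Component;

@Component
public class MappingUpdater {

    private final ElasticsearchOperations operations;

    public MappingUpdater(ElasticsearchOperations operations) {
        this.operations = operations;
    }

    // Runs once the application is up; adds any new @Field properties of
    // the (hypothetical) Product entity to the existing index mapping.
    // Changing the type of an existing field this way is rejected by
    // Elasticsearch.
    @EventListener(ApplicationReadyEvent.class)
    public void updateMapping() {
        IndexOperations indexOps = operations.indexOps(Product.class);
        if (indexOps.exists()) {
            indexOps.putMapping(Product.class);
        }
    }
}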

Related

Spring Data JpaRepository findById() on an entity with @Version persists the record when no data is present for the primary key

I am using Spring Data JPA and Hibernate in a Spring Boot project for persistence. Whenever the findById() method on the repository (a JPA CrudRepository) returns no data for the primary key of an entity that uses the @Version annotation for optimistic locking, it tries to insert the entity into the database.
I can see the insert query generated in the log file.
Has anyone come across such an issue? Please help.
What you describe seems very strange, because it should not happen: findById() is a simple query, and its behavior should not depend on the output of that query. Consider how many different situations you would have to inspect in a very large application to avoid unexpected behaviors like this causing problems.
One of the goals of an ORM (Hibernate, etc.) is to make sure the application does what you expect without such worries.
There may be configuration on the side of your existing application that causes this problem.
In my opinion, to understand the problem, create another simple project with the minimum requirements and try again.
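A minimal reproduction sketch of that idea (the Account entity and repository are hypothetical placeholders): with only these pieces, findById() should issue a single SELECT and never an INSERT.

import java.util.Optional;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Version;
import org.springframework.data.repository.CrudRepository;

@Entity
public class Account {

    @Id
    private Long id;

    @Version
    private Long version;

    // getters and setters omitted for brevity
}

public interface AccountRepository extends CrudRepository<Account, Long> {
}

// In a service or test: a lookup for a missing key should simply
// return an empty Optional, with no write to the database.
Optional<Account> missing = accountRepository.findById(42L);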

Update a specific field in MongoDB of a Spring Boot app using the Spring Data feature

Can we update only a specific field in MongoDB of a Spring Boot app using the Spring Data feature?
Currently, Spring Data provides a save method that both saves and updates a document. If two sets of concurrent updates happen in a single document for different fields, we can lose information. I know we can solve the problem using MongoTemplate. Can we solve these problems using Spring Data?
Thanks
What about the optimistic locking feature?
The @Version annotation provides syntax similar to that of JPA in the context of MongoDB and makes sure updates are only applied to documents with a matching version. Therefore, the actual value of the version property is added to the update query in such a way that the update does not have any effect if another operation altered the document in the meantime. In that case, an OptimisticLockingFailureException is thrown.
See Spring Documentation: https://docs.spring.io/spring-data/mongodb/docs/2.0.9.RELEASE/reference/html/#mongo-template.optimistic-locking
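A minimal sketch of what that looks like, assuming a hypothetical Profile document and an injected MongoTemplate:

import org.springframework.dao.OptimisticLockingFailureException;
import org.springframework.data.annotation.Id;
import org.springframework.data.annotation.Version;
import org.springframework.data.mongodb.core.mapping.Document;

@Document
public class Profile {

    @Id
    private String id;

    @Version
    private Long version;

    private String email;

    // getters and setters omitted for brevity
}

// Usage: a stale save fails instead of silently overwriting
// a concurrent change to the same document.
Profile profile = mongoTemplate.findById(profileId, Profile.class);
profile.setEmail("new@example.com");
try {
    mongoTemplate.save(profile);
} catch (OptimisticLockingFailureException e) {
    // reload the document and retry, or report a conflict
}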
With MongoDB 4.0, ACID transactions have arrived in the document store, enforcing all-or-nothing execution and maintaining data integrity. So let's get straight to it by looking at both the synchronous and the reactive execution models.
You could write code like this:
@Transactional  // both inserts commit or roll back together
void insertDocuments() {
    operations.insert(documentOne);
    operations.insert(documentTwo);
}
The complete walkthrough is in Spring's blog post:
https://spring.io/blog/2018/06/28/hands-on-mongodb-4-0-transactions-with-spring-data
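Note that for @Transactional to work against MongoDB, a transaction manager has to be registered (and MongoDB 4.0 transactions require a replica set). A minimal configuration sketch, assuming the Spring Data MongoDB 2.x line with its MongoDbFactory type:

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.mongodb.MongoDbFactory;
import org.springframework.data.mongodb.MongoTransactionManager;

@Configuration
public class MongoTxConfig {

    // Enables @Transactional for MongoTemplate/MongoOperations calls;
    // in newer versions the factory type is MongoDatabaseFactory instead.
    @Bean
    MongoTransactionManager transactionManager(MongoDbFactory dbFactory) {
        return new MongoTransactionManager(dbFactory);
    }
}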

Cassandra @Table - dynamically change name

I have multiple consumers of an API who post similar data into my API. My API needs to consume this data and persist it into Cassandra tables identified by consumer name, e.g. consumername_tablename.
My Spring Boot entity is annotated with @Table, which doesn't let me change the table name dynamically. Most recommendations online suggest that it's not something we should try to change.
But in my scenario, identifying all consumers and creating the tables in advance doesn't sound right. In the future I want to be able to add consumers to my API seamlessly.
I want to use a variable passed in my API call as the prefix for my Cassandra table names. Is this something I can achieve?
For starters: you cannot change annotations without recompiling; they are baked into the compiled class file. This is not the right approach.
Why not put everything in one table and make the consumer part of the key? This should give you identical functionality without any of the hassle.
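A minimal sketch of that idea with Spring Data Cassandra (the Measurement entity and column names are hypothetical): the consumer name becomes the partition key, so each consumer's rows stay grouped without a table per consumer.

import java.util.UUID;
import org.springframework.data.cassandra.core.cql.PrimaryKeyType;
import org.springframework.data.cassandra.core.mapping.PrimaryKeyColumn;
import org.springframework.data.cassandra.core.mapping.Table;

@Table("measurements")
public class Measurement {

    // Partition key: all rows of one consumer live in one partition.
    @PrimaryKeyColumn(name = "consumer", ordinal = 0, type = PrimaryKeyType.PARTITIONED)
    private String consumer;

    // Clustering column: orders rows within a consumer's partition.
    @PrimaryKeyColumn(name = "id", ordinal = 1, type = PrimaryKeyType.CLUSTERED)
    private UUID id;

    private String payload;

    // getters and setters omitted for brevity
}

Queries then filter by the consumer column (the partition key), which Cassandra handles efficiently, and new consumers need no schema change at all.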

Making changes to the database with Hibernate

I'm quite new to Spring and Hibernate, so I used a feature in MyEclipse called "generate CRUD application" (it uses Spring and Hibernate for the heart of the application and JSF for the presentation objects), which I intend to modify so that I can work with it. After I made the application, which works fine by the way, I discovered that there are fields and probably even tables to be added to the database (an Oracle 11g instance). So my questions are the following:
If I create the new classes and update the existing ones, will the changes be written directly to the database?
If not, is there any way to do it? Because I don't think a direct update in the database would be a good idea.
Thank you in advance.
If I understand correctly, you want to know whether the database schema can be created/updated automatically from your @Entity classes, and how to enable/disable such creation. Yes, it's possible by setting a property. The name of the property depends on the kind of project. For example, in a default Spring Boot application you can have
spring.jpa.hibernate.ddl-auto=update
in application.properties. The value update above will have the schema created automatically on the first run and then updated on subsequent runs. validate instead of update won't alter the schema, but just validate it.
This Stack Overflow post lists the possible values and their behaviour.
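For quick reference, a sketch of the commonly used values of that property (these are the standard Hibernate hbm2ddl settings, annotated with their effect):

# none        - make no changes to the schema
# validate    - verify the schema matches the entities, fail on mismatch
# update      - add missing tables/columns; never drops anything
# create      - drop and recreate the schema on startup
# create-drop - like create, and also drop the schema on shutdown
spring.jpa.hibernate.ddl-auto=validate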

How to combine Neo4j and Elasticsearch

I am developing a question answering application, and for that I need to use Neo4j and Elasticsearch in the same Maven project. I am using Elasticsearch to make my application more robust.
As we know, Neo4j and Elasticsearch work on different versions of Lucene, so whichever version I include in the dependencies, it gives an error.
Here is what I am doing:
First, Elasticsearch indexes the data, and the data and relationships are stored as a graph database using Neo4j. Then the user inputs a query, through which the data is retrieved with the help of the indexes. This data is triggered in the graph database using a trigger score, which is then propagated along the graph database to find relevant results according to the user query.
Is there any way that I can integrate Neo4j and Elasticsearch in the same Maven project, or is there any other way through which these two modules can interact separately?
Thanks
Please check out our integration page:
http://neo4j.com/developer/elastic-search/
It has some discussion and also an example project to get you started:
http://github.com/neo4j-contrib/neo4j-elasticsearch
