Elasticsearch Spring Data saveAll() performance issue - elasticsearch

I am using Spring Boot 2.2 and Elasticsearch 6.4,
with Spring Data Elasticsearch repositories that extend CrudRepository.
The problem is that saveAll() in Spring Data Elasticsearch takes 30 sec on the first iteration to save 500 records, and after some iterations it still takes 14 or 15 sec per 500 records, which is too long. This happens when the objects are processed from the cache; the same 500 records take only 1 sec to save when they are first fetched from Elasticsearch.
The Elasticsearch document has many fields with nested objects.
Elasticsearch document:
@Document(indexName = "example", type = "example1", shards = 1, replicas = 0, refreshInterval = "-1")
public class Example1 {
    ...
    private Example2 example2;
    private Example3 example3;
    private List<Example6> example6List;
    ...
}
@Document(indexName = "example", type = "example2", shards = 1, replicas = 0, refreshInterval = "-1")
public class Example2 {
    ...
    private Example4 example4;
    ...
}
@Document(indexName = "example", type = "example3", shards = 1, replicas = 0, refreshInterval = "-1")
public class Example3 {
    ...
    private Example5 example5;
    private List<Example3> example3List;
}
Elasticsearch repository interface:
public interface Example1Repository extends ElasticsearchRepository<Example1, String> {
}
Service class:
@RequiredArgsConstructor
public class ExampleService {
    private final Example1Repository example1Repository;

    public void exampleMethod1() {
        // read data from cache
        // process the logic and build the list
        // save to Elasticsearch
        example1Repository.saveAll(list); // <-- this takes 30 sec for 500 records
    }

    public void exampleMethod2() {
        // read data from cache
        // process the logic and build the list
        // read data from Elasticsearch, iterate, and set the values from the list above
        // save to Elasticsearch
        example1Repository.saveAll(list); // <-- this takes 1 sec for 500 records
    }
}
How can I improve the performance of the exampleMethod1() approach, and what is causing the performance issue?
Thank you

Spring Boot 2.2 means Spring Data Elasticsearch 3.2; both versions have been out of maintenance for quite a while.
From the Spring Data Elasticsearch side, both calls do the same thing: create a bulk index request, send it to Elasticsearch, and then do a refresh.
The difference between these requests might be - and I cannot see this from the information you provide - that the first time there is no id set in the objects.
I don't know whether Elasticsearch is faster on a bulk index when the documents already have ids, but you should consult the information about bulk requests in the Elasticsearch documentation for the current version.
Have you checked whether there are any relevant messages in the Elasticsearch log?
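One way to test the id hypothesis is to assign ids yourself before calling saveAll(), so the slow path sends the same kind of bulk request (index operations with explicit ids) as the fast path. A minimal plain-Java sketch of that step - the Example1 stand-in class, its id accessors, and the follow-up saveAll() call are assumptions based on the question's code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

public class PresetIdsSketch {

    // Stand-in for the question's Example1 document; only the id matters here.
    public static class Example1 {
        private String id;
        public String getId() { return id; }
        public void setId(String id) { this.id = id; }
    }

    // Give every entity an id before saveAll(), so the bulk request contains
    // index operations with explicit ids, as in the fast exampleMethod2() path.
    public static List<Example1> withIds(List<Example1> list) {
        for (Example1 e : list) {
            if (e.getId() == null) {
                e.setId(UUID.randomUUID().toString());
            }
        }
        return list;
    }

    public static void main(String[] args) {
        List<Example1> list = new ArrayList<>();
        for (int i = 0; i < 500; i++) {
            list.add(new Example1());
        }
        withIds(list);
        // In the real service you would now call: example1Repository.saveAll(list);
        System.out.println(list.get(0).getId() != null); // prints true
    }
}
```

If the timing difference disappears with preset ids, that would suggest the cost is on the Elasticsearch side of bulk indexing without ids, not in Spring Data.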

Related

Recreating Elastic Index with different Field Type

I'm new to ES, currently attempting to use spring-data-elasticsearch 3.2.1.RELEASE in my service.
The design is still in an early phase, so I've had to change/update fields in the Elasticsearch document, which we annotate with @Document.
It looks somewhat like:
@Document(...)
public class MyDocument {
    @Id
    private String id;
    ...
    @Field(type = FieldType.Text, name = "myField")
    private String myField;
}
I had to change the field to an object, for which I simply changed the Java data type and the FieldType attribute to Object.
@Document(...)
public class MyDocument {
    @Id
    private String id;
    ...
    @Field(type = FieldType.Object, name = "myField")
    private Object myField;
}
I deleted all documents from my index on the cluster and attempted to save this document with the new field type, but it looks like it still gives errors due to the previous type being Text.
org.springframework.data.elasticsearch.ElasticsearchException: Bulk indexing has failures.
Use ElasticsearchException.getFailedDocuments() for detailed messages
[
{
XYZ=ElasticsearchException[
Elasticsearch exception [
type=mapper_parsing_exception,
reason=failed to parse field [myField] of type [text] in document with id 'XYZ']
];
nested: ElasticsearchException[
Elasticsearch exception [
type=illegal_state_exception,
reason=Can't get text on a START_OBJECT at 1:296 ]
];
}
]
I'm pretty sure this might not be the best practice for changing field types, but I have tried this with a different indexName and that worked.
As another attempt, deleting this particular index manually and letting Spring Data Elasticsearch create it while doing the bulk indexing does not help; I see the same error.
Could it be because I have more instances (non-local) connected to Elasticsearch, even though they are not doing any operations on this index at the moment?

How to use generic annotations like @Transient in an entity shared between Mongo and Elasticsearch in Spring?

I am using Spring Boot and sharing the same entity between an Elasticsearch database and a MongoDB database. The entity is declared this way:
@Document
@org.springframework.data.elasticsearch.annotations.Document(indexName = "...", type = "...", createIndex = true)
public class ProcedureStep {
    ...
}
Where @Document is from the package org.springframework.data.mongodb.core.mapping.Document.
This works without any issue, but I am not able to use generic annotations to target Elasticsearch only. For example:
@Transient
private List<Point3d> c1s, c2s, c3s, c4s;
This excludes the field from both databases, Mongo and Elastic, whereas my intent was to apply it to Elasticsearch only.
I have no issue using Elastic-specific annotations like this:
@Field(type = FieldType.Keyword)
private String studyDescription;
My question is:
what annotation can I use to exclude a field from Elasticsearch only and keep it in Mongo?
I don't want to rewrite the class, as I don't have a "flat" structure to store (the main class is composed of fields from other classes, which themselves have fields I want to exclude from Elastic).
Many thanks
Assumption: ObjectMapper is used for Serialization/Deserialization
My question is: what annotation can I use to exclude a field from
Elastic Search only and keep it in Mongo? I don't want to rewrite the
class as I don't have a "flat" structure to store (the main class is
composed with fields from other classes, which themselves have fields
I want to exclude from Elastic)
Please understand that this is a problem of selective serialization.
It can be achieved using Jackson's @JsonView.
Example:
Step 1: Define two views, one ES-specific and one Mongo-specific.
class Views {
    public static class MONGO {}
    public static class ES {}
}
Step 2: Annotate the fields as below; description as comments:
@Data
class Product {
    private int id;                    // <======= serialized for both the Mongo & ES contexts
    @JsonView(Views.ES.class)          // <======= serialized for the ES context only
    private float price;
    @JsonView(Views.MONGO.class)       // <======= serialized for the Mongo context only
    private String desc;
}
Step 3: Configure different ObjectMappers for Spring Data Elasticsearch and Mongo.
// ObjectMapper for the Mongo context
ObjectMapper mongoMapper = new ObjectMapper();
mongoMapper.setConfig(mongoMapper.getSerializationConfig().withView(Views.MONGO.class));
// ObjectMapper for the ES context
ObjectMapper esMapper = new ObjectMapper();
esMapper.setConfig(esMapper.getSerializationConfig().withView(Views.ES.class));
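To see the view filtering in action outside Spring, here is a small self-contained Jackson sketch (the Views and Product classes mirror the answer's code; writerWithView is standard Jackson API, and by default fields without a @JsonView annotation, like id, are included in every view):

```java
import com.fasterxml.jackson.annotation.JsonView;
import com.fasterxml.jackson.databind.ObjectMapper;

public class JsonViewDemo {

    public static class Views {
        public static class MONGO {}
        public static class ES {}
    }

    public static class Product {
        public int id;                 // no @JsonView: present in both views
        @JsonView(Views.ES.class)
        public float price;            // ES view only
        @JsonView(Views.MONGO.class)
        public String desc;            // Mongo view only
    }

    // Serialize a sample Product under the given view.
    public static String toJson(Class<?> view) throws Exception {
        Product p = new Product();
        p.id = 1;
        p.price = 9.99f;
        p.desc = "sample";
        return new ObjectMapper().writerWithView(view).writeValueAsString(p);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toJson(Views.ES.class));    // id and price, no desc
        System.out.println(toJson(Views.MONGO.class)); // id and desc, no price
    }
}
```

Wiring these two mappers into the respective Spring Data modules is then a matter of the converter configuration of each store.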

Save composite key data to an Elasticsearch document

I am using Hibernate Envers to audit the data for my tables and save it in an Oracle DB. I read this auditing data and save it to an Elasticsearch index through Java code using Spring Data Elasticsearch. I have a composite key (id and rev) which defines a unique row, but for Elasticsearch I can't provide a composite key: it takes only the rev (identifier) column and replaces the data.
Hibernate Envers background information:
rev is the default identifier that Hibernate Envers provides; for a list of records modified at the same time, it creates the same rev id:
Eg:
id  rev  comments
1   1    newly created
2   1    newly created
1   2    modified
2   2    modified
The first 2 rows were created at the same time; next I modified both rows and saved, so Hibernate Envers created the same rev id for one save.
@Entity
@IdClass(MyEmbeddedId.class)
@Document(indexName = "#{#indexName}", type = "my-document", shards = 1, replicas = 0, refreshInterval = "-1")
@Getter @Setter
public class MyClassAudit {
    @Id
    private Long id;
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    @org.springframework.data.annotation.Id // this is the Elasticsearch _id
    private Long rev;
}
@Getter @Setter
public class MyEmbeddedId implements Serializable {
    private Long id;
    private Long rev;
}
Java code:
List<MyClassAudit> list = repository.findById(id);
elasticSearchRepository.saveAll(list);
Elasticsearch repository interface:
public interface MyElasticSearchRepository extends GenericSearchRepository<MyClassAudit, Long> {}
When I save the data to Elasticsearch, all 4 records should be saved as in the example above, but only 2 records are saved, like below:
_id  id  rev  comments
1    2   1    newly created
2    2   2    modified
This is because rev is taken as the identifier in Elasticsearch, so the second record with the same rev overwrites the first.
How can I make Elasticsearch consider the composite key so that it maintains unique records?
PS: _id is the Elasticsearch identifier. Since rev carries the Spring Data @Id annotation, rev is considered the identifier in Elasticsearch.
Elasticsearch itself has no concept of a composite key. Spring Data Elasticsearch takes the @Id annotated element and uses its toString() method to create the id entry for Elasticsearch (and stores the field in the source as well).
So - although I haven't tried it - you could use your MyEmbeddedId class as a property of your MyClassAudit class and annotate it with @Id. But you have to have this property in your class; it will not be synthesized.
This will probably conflict with the annotations for Hibernate, and I don't think it's a good idea to share one entity between stores with mixed annotations anyway.
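A minimal plain-Java sketch of the idea: the MyEmbeddedId name comes from the question, and overriding toString() is what would make Spring Data Elasticsearch derive a combined _id such as "1_2" if this class were the @Id property. The annotation itself is omitted so the snippet stays framework-free.

```java
import java.io.Serializable;
import java.util.Objects;

public class MyEmbeddedId implements Serializable {
    private final Long id;
    private final Long rev;

    public MyEmbeddedId(Long id, Long rev) {
        this.id = id;
        this.rev = rev;
    }

    // Spring Data Elasticsearch calls toString() on the @Id property,
    // so this value would become the Elasticsearch _id, e.g. "1_2".
    @Override
    public String toString() {
        return id + "_" + rev;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof MyEmbeddedId)) return false;
        MyEmbeddedId other = (MyEmbeddedId) o;
        return Objects.equals(id, other.id) && Objects.equals(rev, other.rev);
    }

    @Override
    public int hashCode() {
        return Objects.hash(id, rev);
    }

    public static void main(String[] args) {
        System.out.println(new MyEmbeddedId(1L, 2L)); // prints 1_2
    }
}
```

With distinct (id, rev) pairs mapping to distinct _id strings, all four Envers rows from the example would survive as separate documents.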

Spring Data Cassandra Pagination

Does anybody know how to achieve pagination in Spring Data Cassandra?
I have tried all the possible alternatives to implement pagination for my table.
One of the Stack Overflow answers says it is not provided directly:
Paging SELECT query results from Cassandra in Spring Boot application
The documentation (https://docs.spring.io/spring-data/cassandra/docs/current-SNAPSHOT/reference/html/#cassandra.repositories.queries) says CrudRepository provides a method with Pageable, but org.springframework.data.repository.CrudRepository does not.
I am using Cassandra 3.11.3 and Spring Boot 1.5.1.RELEASE.
Can anyone provide a simple pagination demo with Spring Data Cassandra?
Pagination in Spring Data Cassandra works on a 'forward only' basis, much like the Iterable interface.
@Autowired
PersonRepository personRepo;

@GetMapping("/all/{page}")
public List<Person> getPaginated(@PathVariable Integer page) {
    int currPage = 0, size = 2;
    Slice<Person> slice = personRepo.findAll(CassandraPageRequest.first(size));
    while (slice.hasNext() && currPage < page) {
        slice = personRepo.findAll(slice.nextPageable());
        currPage++;
    }
    return slice.getContent();
}
Where your repository contains:
public interface PersonRepository extends CrudRepository<Person, Integer> {
    Slice<Person> findAll(Pageable pr);
}
Hence, you keep querying slice by slice until the desired page of the data is reached.
This is the simplest way to do pagination in reactive Cassandra or regular Cassandra with Spring Data.
In your repository class, create custom methods with the @Query annotation. The key point is that the second method takes an offset (the last seen clustering-key value) as a parameter. Assuming date_created is a clustering key and the user table has primary key (user_id, date_created):
Ex:
@Query("select * from user order by date_created asc limit :count")
Flux<UserEntity> findFirstSetOfUsers(@Param("count") int noOfRecords);
Next set of records:
@Query("select * from user where date_created > :dateCreated order by date_created asc limit :count")
Flux<UserEntity> findNextSetOfUsers(@Param("count") int noOfRecords, @Param("dateCreated") Date dateCreated);
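The technique above is keyset pagination: each page starts strictly after the last clustering-key value of the previous page. Here is a framework-free sketch of that logic over an in-memory sorted list (the method names echo the repository methods above but are otherwise illustrative):

```java
import java.util.ArrayList;
import java.util.List;

public class KeysetPaginationSketch {

    // Return the first 'count' values (the findFirstSetOfUsers query).
    public static List<Long> firstPage(List<Long> sorted, int count) {
        return new ArrayList<>(sorted.subList(0, Math.min(count, sorted.size())));
    }

    // Return up to 'count' values strictly greater than 'after'
    // (the findNextSetOfUsers query with the date_created filter).
    public static List<Long> nextPage(List<Long> sorted, long after, int count) {
        List<Long> page = new ArrayList<>();
        for (Long v : sorted) {
            if (v > after) {
                page.add(v);
                if (page.size() == count) break;
            }
        }
        return page;
    }

    public static void main(String[] args) {
        List<Long> created = List.of(10L, 20L, 30L, 40L, 50L);
        List<Long> p1 = firstPage(created, 2);                        // [10, 20]
        List<Long> p2 = nextPage(created, p1.get(p1.size() - 1), 2);  // [30, 40]
        System.out.println(p1 + " " + p2);
    }
}
```

Unlike the Slice-walking approach in the first answer, this never re-reads earlier pages; the caller just has to carry the last seen key forward.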

Spring Data MongoDB @Indexed annotation - create index error (system.indexes) during find query operation

I am facing a weird issue, as this has been working earlier with no problems.
I am using the latest Spring Data MongoDB 1.5.2 release with Mongo Java Driver 2.12.3.
I used the MongoDB ensureIndex command to create an index on a collection field manually through the Mongo shell (the server is running MongoDB 2.4).
I have checked, both with the collection.getIndexes() command and in the system.indexes collection, that the above index was created correctly.
With Spring Data MongoDB, I have also placed the @Indexed annotation on the same field in the domain object.
During a find query operation, I get the Spring Data MongoDB create-index exception below, raised while creating the index in system.indexes and complaining that the document obj size is greater than 16MB.
If I remove the @Indexed annotation from the domain object field and re-run the find query, there are no errors on the application side.
I would like to understand:
1) Why is Spring Data MongoDB trying to create an index in system.indexes during a find query operation when the @Indexed annotation is present?
2) Why does Spring Data MongoDB complain that the obj size is greater than 16 MB during the find query with the @Indexed annotation, when I can run the same find query in the MongoDB shell with no issues?
3) Is the document in the collection corrupt? Do I have to re-import fresh data and test again, since this has been working in the past with no issues from the application side?
4) What is the life cycle of the Spring Data MongoDB @Indexed annotation, and how does it work? Is there any detailed documentation available?
Domain object:
@Document(collection = "Users")
public class Users implements Serializable {
    @Id
    ObjectId id;
    @Indexed
    String appUserId;
    String firstName;
    String lastName;
}
@Repository
public interface UsersRepository extends MongoRepository<Users, String>, UsersRepositoryCustom {
    // default query methods
}

@Repository
public interface UsersRepositoryCustom {
    int findUserCount(String appUserId);
}

@Component
public class UsersRepositoryImpl implements UsersRepositoryCustom {

    @Autowired
    private MongoOperations mongoOperations;

    @Override
    public int findUserCount(String appUserId) {
        DBCursor dbCursor = null;
        int count = 0;
        Query query = new Query();
        query.addCriteria(Criteria.where("appUserId").is(appUserId));
        try {
            DBCollection dbCollection = mongoOperations.getCollection("Users");
            System.out.println("Start time : " + new Date().toString());
            dbCursor = dbCollection.find(query.getQueryObject());
            //while (dbCursor.hasNext()) {
            //    do some processing
            //}
            count = dbCursor.count();
            System.out.println("End time : " + new Date().toString());
        } finally {
            if (dbCursor != null) {
                dbCursor.close();
            }
        }
        return count;
    }
}
Caused by: com.mongodb.WriteConcernException: { "serverUsed" : "XXXXXXXX:11111" , "err" : "BSONObj size: 0 (0x00000000) is invalid. Size must be between 0 and 16793600(16MB) First element: EOO" , "code" : 10334 , "n" : 0 , "connectionId" : 341 , "ok" : 1.0}
at com.mongodb.CommandResult.getWriteException(CommandResult.java:90)
at com.mongodb.CommandResult.getException(CommandResult.java:79)
at com.mongodb.CommandResult.throwOnError(CommandResult.java:131)
at com.mongodb.DBTCPConnector._checkWriteError(DBTCPConnector.java:135)
at com.mongodb.DBTCPConnector.access$000(DBTCPConnector.java:39)
at com.mongodb.DBTCPConnector$1.execute(DBTCPConnector.java:186)
at com.mongodb.DBTCPConnector$1.execute(DBTCPConnector.java:181)
at com.mongodb.DBTCPConnector.doOperation(DBTCPConnector.java:210)
at com.mongodb.DBTCPConnector.say(DBTCPConnector.java:181)
at com.mongodb.DBCollectionImpl.insertWithWriteProtocol(DBCollectionImpl.java:530)
at com.mongodb.DBCollectionImpl.createIndex(DBCollectionImpl.java:369)
at com.mongodb.DBCollection.createIndex(DBCollection.java:564)
at org.springframework.data.mongodb.core.index.MongoPersistentEntityIndexCreator.createIndex(MongoPersistentEntityIndexCreator.java:135)
at org.springframework.data.mongodb.core.index.MongoPersistentEntityIndexCreator.checkForAndCreateIndexes(MongoPersistentEntityIndexCreator.java:129)
at org.springframework.data.mongodb.core.index.MongoPersistentEntityIndexCreator.checkForIndexes(MongoPersistentEntityIndexCreator.java:121)
at org.springframework.data.mongodb.core.index.MongoPersistentEntityIndexCreator.onApplicationEvent(MongoPersistentEntityIndexCreator.java:105)
at org.springframework.data.mongodb.core.index.MongoPersistentEntityIndexCreator.onApplicationEvent(MongoPersistentEntityIndexCreator.java:46)
at org.springframework.context.event.SimpleApplicationEventMulticaster.multicastEvent(SimpleApplicationEventMulticaster.java:98)
at org.springframework.context.support.AbstractApplicationContext.publishEvent(AbstractApplicationContext.java:333)
at org.springframework.data.mapping.context.AbstractMappingContext.addPersistentEntity(AbstractMappingContext.java:307)
at org.springframework.data.mapping.context.AbstractMappingContext.getPersistentEntity(AbstractMappingContext.java:181)
at org.springframework.data.mapping.context.AbstractMappingContext.getPersistentEntity(AbstractMappingContext.java:141)
at org.springframework.data.mapping.context.AbstractMappingContext.getPersistentEntity(AbstractMappingContext.java:67)
at org.springframework.data.mongodb.repository.support.MongoRepositoryFactory.getEntityInformation(MongoRepositoryFactory.java:141)
at org.springframework.data.mongodb.repository.support.MongoRepositoryFactory.getTargetRepository(MongoRepositoryFactory.java:83)
at org.springframework.data.repository.core.support.RepositoryFactorySupport.getRepository(RepositoryFactorySupport.java:158)
at org.springframework.data.repository.core.support.RepositoryFactoryBeanSupport.initAndReturn(RepositoryFactoryBeanSupport.java:224)
at org.springframework.data.repository.core.support.RepositoryFactoryBeanSupport.afterPropertiesSet(RepositoryFactoryBeanSupport.java:210)
at org.springframework.data.mongodb.repository.support.MongoRepositoryFactoryBean.afterPropertiesSet(MongoRepositoryFactoryBean.java:108)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeInitMethods(AbstractAutowireCapableBeanFactory.java:1612)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1549)
