SpringData ElasticSearch insert/update multiple documents

My goal is to insert and update multiple documents in Elasticsearch using ElasticsearchRepository.
public interface EmployeeInfoRepository extends ElasticsearchRepository<EmployeeInfo, String> {
}
However, whenever I call saveAll(entities), the number of documents is unchanged but new indexes are created for those entities.
employeeInfoRepository.saveAll(employeeInfos);
If I insert 1000 elements, at first there are 1000 docs and 1000 indexes, which is what I expected. When I call saveAll two more times, it still has 1000 docs, but the number of indexes increases to 3000.
How can I update it properly?
It would be best if it were as easy as calling saveAll and letting Spring Data Elasticsearch handle the rest.
Update 1:
There is no change in the data, but whenever I run saveAll, the storage_size keeps increasing. I am not sure whether it re-creates the indexes while keeping the old ones.

If you define the id in your document class, Elasticsearch updates/inserts the document with the same document id value.
Here is an example:
@Document(indexName = "example_index")
public class Example {

    @Id
    @Field(type = FieldType.Long)
    private long id;
}
Of course, you have to handle the logic for a unique id in your project.
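Applied to the original question, this means giving EmployeeInfo an @Id field and setting it to the same value on every save, so that repeated saveAll calls update the existing documents instead of adding new ones. A minimal sketch (the employeeNumber field and its getter are assumptions, not from the original post):

@Document(indexName = "employee_info")
public class EmployeeInfo {

    @Id
    private String id;               // document id used by Elasticsearch

    private String employeeNumber;   // hypothetical stable business key

    public void setId(String id) { this.id = id; }
    public String getEmployeeNumber() { return employeeNumber; }
}

// re-saving with the same ids updates the existing documents instead of adding new ones
employeeInfos.forEach(e -> e.setId(e.getEmployeeNumber()));
employeeInfoRepository.saveAll(employeeInfos);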

Related

SpringData JPA: Query with collection of entity as parameter

I have a list of entities on which I want to perform an update. I know I could update the table with a list of String/Integer etc. as the parameter, with something like
@Query("update tableName i set i.isUpdated = true where i.id in :ids")
void markAsUpdated(@Param("ids") List<Integer> itemIds);
I'm trying to avoid repeatedly converting the list of entities to a list of ids just to build the query. I know there are deleteAll and deleteInBatch methods which accept a list of entities as parameter.
How do I do this in a JPA query? I tried the following, but it didn't work:
@Modifying(flushAutomatically = true, clearAutomatically = true)
@Query("update tableName i set i.updated = true where i in :items")
void markAsUpdated(@Param("items") List<Item> items);
The query needs ids; it doesn't know how to deal with entities.
You have multiple options:
Just pass ids to the method; the caller is responsible for extracting them.
Pass entities and use SpEL to extract the ids (see the sketch after this list).
As suggested in the comments, use a default method to offer both APIs and delegate from one to the other.
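A sketch of the SpEL variant, assuming the entity is called Item and has an id property (the entity and property names are assumptions about your mapping; the :#{...} syntax is Spring Data JPA's SpEL support in @Query):

@Modifying(flushAutomatically = true, clearAutomatically = true)
@Query("update Item i set i.updated = true where i.id in :#{#items.![id]}")
void markAsUpdated(@Param("items") List<Item> items);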
As for the question that came up in the comments: You can centralize the id extraction either by having the relevant entities implement an interface similar to this one:
interface WithId {
    Long getId();
}
Or by passing a lambda to the method, doing the conversion for a single entity:
static <E, ID> List<ID> extractIds(List<E> entities, Function<E, ID> extractor) {
    // ...
}
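For completeness, a possible body for that helper, plus how it composes with the id-based query (just the obvious stream mapping; assumes java.util.function.Function and java.util.stream.Collectors are imported):

static <E, ID> List<ID> extractIds(List<E> entities, Function<E, ID> extractor) {
    return entities.stream()
            .map(extractor)
            .collect(Collectors.toList());
}

// usage with the id-based query from the question:
// markAsUpdated(extractIds(items, Item::getId));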

Problem indexing LongField from custom FieldBridge

For a search using Lucene, I made a bridge:
public class EntityIDFieldBridge implements FieldBridge {

    @Override
    public void set(String name, Object value, Document document, LuceneOptions luceneOptions) {
        BaseEntity baseEntity = (BaseEntity) value;
        if (value != null) {
            Field field = new LongField(name, baseEntity.getId(), Field.Store.NO);
            document.add(field);
        }
    }
}
When I search for the value, I don't get the correct documents. When I search term:*, I do get the ones that are not null, so I can see that it is getting indexed. StringField is working fine, but I think it should be a long field. Any ideas?
Based on the little information you provided, I am assuming that you are not trying to get a value which is null.
The Field Bridge documentation provides more information on what it is, what Lucene supports and how it works:
In Lucene all index fields have to be represented as Strings. For this reason all entity properties annotated with @Field have to be indexed in a String form. For most of your properties, Hibernate Search does the translation job for you thanks to a built-in set of bridges. In some cases, though you need a more fine grain control over the translation process.
Also, for null values:
null elements are not indexed. Lucene does not support null elements and this does not make much sense either.
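If the goal is to query the id as a number (range queries, numeric terms), one option is to let Hibernate Search add a numeric field via the LuceneOptions helper instead of constructing the LongField by hand. A sketch, assuming a Hibernate Search version whose LuceneOptions exposes addNumericFieldToDocument:

public class EntityIDFieldBridge implements FieldBridge {

    @Override
    public void set(String name, Object value, Document document, LuceneOptions luceneOptions) {
        if (value == null) {
            return; // null elements are not indexed
        }
        BaseEntity baseEntity = (BaseEntity) value;
        // lets Hibernate Search build the numeric field with the configured store/index options
        luceneOptions.addNumericFieldToDocument(name, baseEntity.getId(), document);
    }
}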

Spring Data Rest - Set request timeout

I have a Visit entity which refers to a Patient entity through a ManyToOne relationship. The repository for Visit is:
@RepositoryRestResource(collectionResourceRel = "visits", path = "visits", excerptProjection = VisitProjection.class)
public interface VisitRepository extends PagingAndSortingRepository<Visit, Long> {

    @RestResource(path = "all")
    List<Visit> findByPatientIdContaining(@Param("keyword") String keyword);
}
When searching visits by patient ID with /visits/search/all?keyword=1, which may return millions of records, the query stays pending forever and never ends. In the console, dozens of Hibernate SQL statements are printed every second. How can I set the request timeout on the server side?
I have tried:
Adding a Transactional annotation with a timeout attribute to the repository method (works a little, but it still takes long to time out):
@RestResource(path = "all")
@Transactional(timeout = 2)
List<Visit> findByPatientIdContaining(@Param("keyword") String keyword);
Adding various timeout properties to application.properties (doesn't work at all):
spring.jpa.properties.hibernate.c3p0.timeout=2
spring.jpa.properties.javax.persistence.query.timeout=2
spring.mvc.async.request-timeout=2
server.connection-timeout=2
rest.connection.connection-request-timeout=2
rest.connection.connect-timeout=2
rest.connection.read-timeout=2
server.servlet.session.timeout=2
spring.session.timeout=2
spring.jdbc.template.query-timeout=2
spring.transaction.default-timeout=2
spring.jpa.properties.javax.persistence.query.timeout=2
javax.persistence.query.timeout=2
server.tomcat.connection-timeout=5
Okay, no one using your API is going to want millions of records in one hit, so use the provided paging functionality to make the result set more manageable:
https://docs.spring.io/spring-data/rest/docs/3.1.6.RELEASE/reference/html/#paging-and-sorting
@RestResource(path = "all")
Page<Visit> findByPatientIdContaining(@Param("keyword") String keyword, Pageable p);
Clients can then specify which page of records they want returned by adding the params:
?page=1&size=5
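If you also want to guard against clients omitting the paging params or requesting huge pages, Spring Boot exposes defaults for Spring Data REST in application.properties (the values below are only illustrative):

# page size used when the client sends no size parameter
spring.data.rest.default-page-size=20
# upper bound on the size parameter a client may request
spring.data.rest.max-page-size=100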

Adding Solr sort criteria to named query in Spring Data

I am trying to add a sort criterion to a Spring Data Solr search.
I have seen methods for sorting search results (sorting a List<> after the data has been received from Solr), but not for specifying the sort order to Solr in the query.
The Solr query should look like:
http://localhost:8088/solr/books/select?fl=fullrecord%2C%20url&q=mod_date%3A%5B2012-04-17T11%3A38%3A15Z%20TO%20NOW%5D&sort=mod_date%20asc
I need to do this in Solr because the request is paged (there are potentially hundreds of thousands of results), so only a limited number of results (one page) is returned for each query.
How can I add the "&sort=mod_date asc" Solr query string?
// Catalog.findByDateChanged=mod_date:[?0 TO ?1]
public interface CatalogRepository extends SolrCrudRepository<CatalogDoc, String> {

    @Query(name = "Catalog.findByDateChanged", fields = { "fullrecord", "url" })
    public Page<CatalogDoc> findByDateChanged(String fromDate, String toDate, Pageable pageable);
}
It seems that the sort can be added to the call:
Page<CatalogDoc> results = solrRepo.findByDateChanged(from, to,
        PageRequest.of(pageNo, perPage, Sort.Direction.ASC, "mod_date"));
Curiously, it adds it twice to the command sent to Solr:
&sort=mod_date+asc,mod_date+asc
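For reference, a sketch of paging through the sorted results with that repository (the names solrRepo, from and to are reused from the question; the page size of 100 is just an illustration):

int pageNo = 0;
Page<CatalogDoc> page;
do {
    page = solrRepo.findByDateChanged(from, to,
            PageRequest.of(pageNo++, 100, Sort.by(Sort.Direction.ASC, "mod_date")));
    for (CatalogDoc doc : page.getContent()) {
        // process each CatalogDoc here
    }
} while (page.hasNext());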

Getting multiple entries from extra lazy loaded collection

Is it possible to somehow get multiple objects by index/key from a one-to-many collection that is marked for extra lazy loading?
I have a big collection where I can't fetch all entries but still want to get multiple objects from it.
For example:
class System
{
    ...

    @OneToMany(mappedBy = "system")
    @MapKey(name = "username")
    @LazyCollection(LazyCollectionOption.EXTRA)
    private Map<String, User> users = new HashMap<>();

    public List<User> getUsers(List<String> usernames)
    {
        // what to do?
    }
}
It's just a simple example, but it portrays my problem.
I know I could just use the Criteria API or (named) queries, but I am trying to keep the logic where it belongs.
Unfortunately, it seems that Hibernate does not support loading multiple entries from a collection inside an entity.
The only ways I found are:
Use eager/lazy loading and get all objects (which won't work if there are many).
Use extra lazy loading and get the objects by retrieving them one by one (which can hurt performance).
Use Session.createFilter, which cannot be called inside an entity (see the sketch below).
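A sketch of the createFilter option, placed in a DAO or repository rather than in the entity itself (the System/User/username names are reused from the question; the getUserMap accessor is an assumption, and how collection filters behave on a map-valued collection may vary by Hibernate version):

// somewhere with access to the Hibernate Session, e.g. a DAO
@SuppressWarnings("unchecked")
public List<User> findUsersByName(Session session, System system, List<String> usernames) {
    return session
            .createFilter(system.getUserMap(), "where this.username in (:usernames)")
            .setParameterList("usernames", usernames)
            .list();
}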
