Get one field from Elasticsearch by Spring Data

I have an ES document like this:
class User {
    String name;
    String describe;
    List<String> items;
}
I'm using Spring Data to talk to ES via the Repository interface:
interface UserRepository extends Repository<User, String> {
}
Now I need to build a REST interface which responds with JSON-formatted data like this:
{"name": String, "firstItem": String}
Because the describe and items fields in User are very large, it's expensive to retrieve all fields from ES.
I know ES has a feature named "response filtering" which fits my requirement, but I can't find a way to use it in Spring Data.
How can I do this in Spring Data?

What you need is a mix of source filtering (to avoid retrieving heavy fields) and response filtering (to avoid returning heavy fields). However, the latter is not supported in Spring Data ES (yet).
For the former, you can leverage NativeSearchQueryBuilder and specify a FetchSourceFilter that will only retrieve the fields you need. Since response filtering isn't available, what you can do is create another field named firstItem in which you store the first element of items, so that you can return it for this query.
private ElasticsearchTemplate elasticsearchTemplate;

String[] includes = new String[] { "name", "firstItem" };
SearchQuery searchQuery = new NativeSearchQueryBuilder()
    .withQuery(matchAllQuery())
    .withSourceFilter(new FetchSourceFilter(includes, null))
    .build();
Page<User> userPage = elasticsearchTemplate.queryForPage(searchQuery, User.class);
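After the page comes back, the entities still have to be shaped into the lean {"name": ..., "firstItem": ...} response. A minimal sketch of that mapping step, using a hypothetical UserSummary DTO (the User class mirrors the one from the question; names are illustrative):

```java
import java.util.List;

public class UserSummaryMapper {

    // Simplified copy of the question's User entity
    public static class User {
        String name;
        String describe;
        List<String> items;
        public User(String name, String describe, List<String> items) {
            this.name = name; this.describe = describe; this.items = items;
        }
    }

    // Hypothetical response DTO matching {"name": ..., "firstItem": ...}
    public static class UserSummary {
        public final String name;
        public final String firstItem;
        public UserSummary(String name, String firstItem) {
            this.name = name; this.firstItem = firstItem;
        }
    }

    // Maps a retrieved User to the lean response shape; guards against empty item lists
    public static UserSummary toSummary(User user) {
        String first = (user.items == null || user.items.isEmpty()) ? null : user.items.get(0);
        return new UserSummary(user.name, first);
    }

    public static void main(String[] args) {
        UserSummary s = toSummary(new User("alice", "very long text ...", List.of("a", "b")));
        System.out.println(s.name + " / " + s.firstItem);
    }
}
```

With the firstItem field stored in the index as the answer suggests, this mapping would not even need the items list at all.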

Related

With spring-data-elasticsearch and searching for similar documents, how to get similarity score?

I am using the latest version of Elasticsearch (in Docker) and a Spring Boot (latest version) app in which I attempt to search for similar documents. My document class has a String field:
@Field(
    name = "description",
    type = FieldType.Text,
    fielddata = true,
    analyzer = "icu_analyzer",
    termVector = TermVector.with_positions_offsets,
    similarity = Similarity.BM25)
private String description;
I get plenty of results for my query when I use the built-in searchSimilar method:
public Page<BookInfo> findSimilarDocuments(final long id) {
    return bookInfoRepository.findById(id)
        .map(bookInfo -> bookInfoRepository.searchSimilar(bookInfo, new String[] { "description" }, pageable))
        .orElse(Page.empty());
}
However, I have no idea how similar the documents are, because it is just a page of my Document object. It would be great to be able to see the similarity score, or to set a similarity threshold when performing the query. Is there something different that I should be doing?
I just had a look: the existing method Page<T> searchSimilar(T entity, @Nullable String[] fields, Pageable pageable) was added to the ElasticsearchRepository interface back in 2013; it just returns a Page<T>, which does not contain any score information.
Since Spring Data Elasticsearch version 4.0 the score information is available, and when you look at the implementation you see that it is stripped from the return value of the method in order to adhere to the method signature from the interface:
public Page<T> searchSimilar(T entity, @Nullable String[] fields, Pageable pageable) {
    Assert.notNull(entity, "Cannot search similar records for 'null'.");
    Assert.notNull(pageable, "'pageable' cannot be 'null'");

    MoreLikeThisQuery query = new MoreLikeThisQuery();
    query.setId(stringIdRepresentation(extractIdFromBean(entity)));
    query.setPageable(pageable);

    if (fields != null) {
        query.addFields(fields);
    }

    SearchHits<T> searchHits = execute(operations -> operations.search(query, entityClass, getIndexCoordinates()));
    SearchPage<T> searchPage = SearchHitSupport.searchPageFor(searchHits, pageable);
    return (Page<T>) SearchHitSupport.unwrapSearchHits(searchPage);
}
You could implement a custom repository fragment (see https://docs.spring.io/spring-data/elasticsearch/docs/4.2.6/reference/html/#repositories.custom-implementations) that provides its own implementation of the method and returns a SearchPage<T>:
public SearchPage<T> searchSimilar(T entity, @Nullable String[] fields, Pageable pageable) {
    Assert.notNull(entity, "Cannot search similar records for 'null'.");
    Assert.notNull(pageable, "'pageable' cannot be 'null'");

    MoreLikeThisQuery query = new MoreLikeThisQuery();
    query.setId(stringIdRepresentation(extractIdFromBean(entity)));
    query.setPageable(pageable);

    if (fields != null) {
        query.addFields(fields);
    }

    SearchHits<T> searchHits = execute(operations -> operations.search(query, entityClass, getIndexCoordinates()));
    SearchPage<T> searchPage = SearchHitSupport.searchPageFor(searchHits, pageable);
    return searchPage;
}
A SearchPage<T> is a page containing SearchHit<T> instances; these contain the entity and the additional information like the score.
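Consuming the returned page then amounts to iterating its hits and reading the score next to each entity. Since running this needs a live cluster, the sketch below models the two accessors involved (getContent() and getScore() on SearchHit in Spring Data Elasticsearch 4.x) with a hypothetical local stand-in class:

```java
import java.util.List;

public class ScoredHitsDemo {

    // Hypothetical stand-in for org.springframework.data.elasticsearch.core.SearchHit<T>,
    // reduced to the two accessors used here: the matched entity and its score
    public static class Hit<T> {
        private final T content;
        private final float score;
        public Hit(T content, float score) { this.content = content; this.score = score; }
        public T getContent() { return content; }
        public float getScore() { return score; }
    }

    // With the fragment returning SearchPage<BookInfo>, a caller can walk
    // searchPage.getSearchHits() and, for example, pick the best-scoring hit:
    public static <T> Hit<T> bestHit(List<Hit<T>> hits) {
        Hit<T> best = null;
        for (Hit<T> hit : hits) {
            if (best == null || hit.getScore() > best.getScore()) {
                best = hit;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Hit<String>> hits = List.of(new Hit<>("book-1", 1.2f), new Hit<>("book-2", 3.4f));
        System.out.println(bestHit(hits).getContent());
    }
}
```

The same loop also gives a place to apply a score threshold client-side, which is what the question asked about.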

How to make a custom converter for a String field using Spring Data Elasticsearch?

I am using Spring Data Elasticsearch in my project. I have a class like this:
@Document(indexName = "example")
class Example {
    @Id
    private String id;
    private String name;
    private String game;
    private String restField;
}
What I need is: whenever I save an Example object to Elasticsearch, I remove a value; but when I get the data back from Elasticsearch, I need that removed value to be appended again.
void save(Example example) {
    example.setRestField(example.getRestField().replace("***", ""));
    exampleRepository.save(example);
}

Example get(String id) {
    Example example = exampleRepository.findById(id);
    example.setRestField(example.getRestField().concat("***"));
    return example;
}
Right now I am doing it the above way. But can we use a custom converter for this? I checked the converter examples for Spring Data Elasticsearch, but those are for whole objects. How can I create a custom converter only for this particular String field restField? I don't want to apply this converter to other String fields.
Currently there is no better solution. The converters registered for Spring Data Elasticsearch convert from a class to a String and back. Registering a converter for your case would convert any String property of every entity.
I had thought about custom converters for properties before; I have created a ticket for this.
Edit 05.11.2021:
Implemented with https://github.com/spring-projects/spring-data-elasticsearch/pull/1953 and available from 4.3 RC1 on.
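Until that release, the remove-on-write / append-on-read logic can at least be isolated in one place. A sketch of the round trip, with a local interface standing in for the kind of per-property converter contract the linked PR introduces (all names here are illustrative, not the final Spring Data API):

```java
public class RestFieldConverterSketch {

    // Illustrative stand-in for a per-property converter contract:
    // write() runs before the value is stored, read() after it is loaded
    public interface PropertyConverter {
        Object write(Object value);
        Object read(Object value);
    }

    // Strips the marker on save and re-appends it on load, as in the question
    public static class RestFieldConverter implements PropertyConverter {
        @Override
        public Object write(Object value) {
            return value.toString().replace("***", "");
        }

        @Override
        public Object read(Object value) {
            return value.toString().concat("***");
        }
    }

    public static void main(String[] args) {
        PropertyConverter converter = new RestFieldConverter();
        Object stored = converter.write("secret***");   // value as persisted in the index
        Object loaded = converter.read(stored);         // value as returned to callers
        System.out.println(stored + " -> " + loaded);
    }
}
```

Keeping the logic behind one interface means that once the 4.3 per-property converter support is available, only the wiring changes, not the transformation itself.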

Indexing problem with Spring Data Elastic migration from 3.x to 4.x

In our monolith application, which used JHipster 6.10.5, we were using Spring Data Elasticsearch 3.3.1 with Elasticsearch 6.8.8. We have multiple @ManyToOne and @OneToMany relationships across more than 100 entities.
In some cases a maximum of 7 entities are referenced from each other (I mean interlinked, not just from one to another).
For Elasticsearch we have been using:
To ignore indexing: @JsonIgnoreProperties(value = { "unwanted fields" }, allowSetters = true) and @JsonIgnore where not needed
To map the relations: on the @ManyToOne sides we use @JsonBackReference with a corresponding @JsonManagedReference on the respective @OneToMany relationships.
Now we are in the process of migrating to JHipster 7.0.1 and started seeing the problems below:
New Spring Data Elasticsearch version: 4.1.6 with Elasticsearch 7.9.3
Now that the Jackson-based mapper is no longer available in Spring Data Elasticsearch, we are seeing multiple StackOverflow errors. Below are the migration changes we made to the annotations:
On the relationships we added @Field(type = FieldType.Nested, ignoreMalformed = true, ignoreFields = {"unwanted fields"}). This stopped the StackOverflow errors at the Spring Data level, but they were still thrown internally at the Elasticsearch rest-client level. So we were forced to use @Transient to exclude all the @OneToMany relations.
Even on @ManyToOne relations with the above-mentioned @Field annotation present, we are facing an ElasticsearchException with "Limit of total fields [1000] in index [] has been exceeded"
I have tried to follow the documentation on Spring Data but could not resolve this.
We have kept the JSON (Jackson) annotations that were generated by JHipster, but they have no effect.
We are stalled at the moment as we are not sure how to resolve these issues. Personally, I found the JSON annotations very convenient and well documented. Being new to both Elasticsearch and Spring Data Elasticsearch (we started using them just 8 months ago), we are not able to figure out how to fix these errors.
Please ask if I missed any information needed; I will share as much as doesn't violate the org policies.
Sample code Repository as requested on gitter: https://gitlab.com/thelearner214/spring-data-es-sample
Thank you in advance
Had a look at the repository you linked on gitter (you might consider adding a link here).
First: the @Field annotation is used to write the index mapping, and the ignoreFields property is needed to break circular references when the mapping is built. It is not used when the entity is written to Elasticsearch.
What happens, for example, with the Address and Customer entities during writing to Elasticsearch: the Customer document has Addresses, so these addresses are converted to subdocuments embedded in the Customer document. But the Address has a Customer, so on writing the address, the Customer is embedded into this Address element, which already is a subdocument of the Customer.
I suppose the Customers should not be stored in the Address and the other way round. So you need to mark these embedded documents as @org.springframework.data.annotation.Transient; you don't need the @Field annotation on them, as you do not want to store them as properties in the index.
Jackson annotations are not used by Spring Data Elasticsearch anymore.
The basic problem of the approach used here is that modelling that comes from a relational world (linking and joining different tables with one-to-one, one-to-many and many-to-many relationships, manifested in a Java object graph by an ORM mapper) is applied to a document-based data store that does not use these concepts.
It used to work in your previous version because the older version of Spring Data Elasticsearch used Jackson as well, so these fields were skipped on writing; now you have to add the @Transient annotation, which is a Spring Data annotation.
But I don't know how @Transient might interfere with Spring Data JPA. That is another point showing that it's not a good idea to use the same Java class for different stores.
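The separate-classes approach alluded to here can stay quite small: a dedicated search document that carries only what the index needs, plus one explicit mapping method. A minimal sketch with hypothetical Customer/CustomerDocument names (the real classes would keep their JPA and Elasticsearch annotations, omitted here):

```java
import java.util.List;

public class SeparateModelSketch {

    // JPA-side entity (annotations omitted): carries the full object graph
    public static class Customer {
        Long id;
        String name;
        List<String> addresses; // simplified; the real entity would hold Address objects
        public Customer(Long id, String name, List<String> addresses) {
            this.id = id; this.name = name; this.addresses = addresses;
        }
    }

    // Elasticsearch-side document: flat, no back references, only searchable fields
    public static class CustomerDocument {
        public final Long id;
        public final String name;
        public final int addressCount;
        public CustomerDocument(Long id, String name, int addressCount) {
            this.id = id; this.name = name; this.addressCount = addressCount;
        }
    }

    // One explicit mapping step replaces the implicit (and cyclic) graph serialization
    public static CustomerDocument toDocument(Customer customer) {
        return new CustomerDocument(customer.id, customer.name,
                customer.addresses == null ? 0 : customer.addresses.size());
    }

    public static void main(String[] args) {
        CustomerDocument doc = toDocument(new Customer(1L, "acme", List.of("a", "b")));
        System.out.println(doc.name + " / " + doc.addressCount);
    }
}
```

Because the document class has no reference back to Customer, the circular-embedding problem described above cannot occur, and the index mapping only ever sees the fields the document declares.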
Here is an approach we are using as a stop-gap arrangement until we rewrite or find a better solution. We can't use separate classes for ES, as @P.J.Meisch advised, because we have a large number of entities to maintain and a "microservice migration program" is already in progress.
Posting here as it might be useful for someone else with a similar issue.
We created a utility to serialize and deserialize the entity in order to get the benefit of the Jackson annotations on the class, e.g. @JsonIgnoreProperties, @JsonIgnore, etc.
This way, we are able to reduce the usage of the @Transient annotation and still get the ID(s) of the related object(s).
package com.sample.shop.service.util;

import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.DeserializationFeature;
import com.fasterxml.jackson.databind.JavaType;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.SerializationFeature;
import com.fasterxml.jackson.datatype.hibernate5.Hibernate5Module;
import org.jetbrains.annotations.NotNull;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.Collections;
import java.util.List;
import java.util.Optional;

public class ESUtils {

    private static final Logger log = LoggerFactory.getLogger(ESUtils.class);

    public static <T> Optional<T> mapForES(Class<T> type, T input) {
        ObjectMapper mapper = getObjectMapper();
        try {
            return Optional.ofNullable(mapper.readValue(mapper.writeValueAsString(input), type));
        } catch (JsonProcessingException e) {
            log.error("Parsing exception {}", e.getMessage());
            return Optional.empty();
        }
    }

    public static <T> List<T> mapListForES(Class<T> type, List<T> input) {
        ObjectMapper mapper = getObjectMapper();
        try {
            JavaType javaType = mapper.getTypeFactory().constructCollectionType(List.class, type);
            String serialText = mapper.writeValueAsString(input);
            return mapper.readValue(serialText, javaType);
        } catch (JsonProcessingException e) {
            log.error("Parsing exception {}", e.getMessage());
            return Collections.emptyList();
        }
    }

    @NotNull
    private static ObjectMapper getObjectMapper() {
        ObjectMapper mapper = new ObjectMapper();
        mapper.configure(SerializationFeature.FAIL_ON_EMPTY_BEANS, false);
        mapper.configure(SerializationFeature.WRITE_SELF_REFERENCES_AS_NULL, true);
        mapper.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);
        Hibernate5Module module = new Hibernate5Module();
        module.disable(Hibernate5Module.Feature.FORCE_LAZY_LOADING);
        module.enable(Hibernate5Module.Feature.SERIALIZE_IDENTIFIER_FOR_LAZY_NOT_LOADED_OBJECTS);
        module.enable(Hibernate5Module.Feature.USE_TRANSIENT_ANNOTATION);
        module.enable(Hibernate5Module.Feature.REPLACE_PERSISTENT_COLLECTIONS);
        mapper.registerModule(module);
        return mapper;
    }
}
Then, to save a single entry, we adjusted the save logic to use the above utility:
// instead of the JHipster-generated categorySearchRepository.save(result), use ESUtils
ESUtils.mapForES(Category.class, category).map(res -> categorySearchRepository.save(res));
And to save a list for bulk reindexing, we use the second utility:
Page<Category> categoryPage = jpaRepository.findAll(page);
List<Category> categoryList = ESUtils.mapListForES(Category.class, categoryPage.getContent());
elasticsearchRepository.saveAll(categoryList);
It might not be the best solution, but it got the work done for our migration.
@Lina Basuni You can use java.util.Collections.emptyList()

How to use generic annotations like #Transient in an entity shared between Mongo and Elastic Search in Spring?

I am using Spring Boot and sharing the same entity between an Elasticsearch database and a MongoDB database. The entity is declared this way:
@Document
@org.springframework.data.elasticsearch.annotations.Document(indexName = "...", type = "...", createIndex = true)
public class ProcedureStep {
    ...
}
Where @Document is from this package: org.springframework.data.mongodb.core.mapping.Document
This works without any issue, but I am not able to use generic annotations to target Elasticsearch only. For example:
@Transient
private List<Point3d> c1s, c2s, c3s, c4s;
This excludes the field from both databases, Mongo and Elastic, whereas my intent was to apply it to Elasticsearch only.
I have no issue using Elastic-specific annotations like this:
@Field(type = FieldType.Keyword)
private String studyDescription;
My question is:
what annotation can I use to exclude a field from Elasticsearch only and keep it in Mongo?
I don't want to rewrite the class, as I don't have a "flat" structure to store (the main class is composed of fields from other classes, which themselves have fields I want to exclude from Elastic)
Many thanks
Assumption: ObjectMapper is used for serialization/deserialization.
Please understand this is a problem of selective serialization.
It can be achieved using JsonViews.
Example:
Step 1: Define two views, one ES-specific and one Mongo-specific.
class Views {
    public static class MONGO {}
    public static class ES {}
}
Step 2: Annotate the fields as below; descriptions are in the comments:
@Data
class Product {
    private int id;                      // serialized for both Mongo & ES contexts
    @JsonView(Views.ES.class)            // serialized for the ES context only
    private float price;
    @JsonView(Views.MONGO.class)         // serialized for the Mongo context only
    private String desc;
}
Step 3: Configure different ObjectMappers for Spring Data ES and Mongo.
// Set the view for Mongo
ObjectMapper mongoMapper = new ObjectMapper();
mongoMapper.setConfig(mongoMapper.getSerializationConfig().withView(Views.MONGO.class));

// Set the view for ES
ObjectMapper esMapper = new ObjectMapper();
esMapper.setConfig(esMapper.getSerializationConfig().withView(Views.ES.class));

How to manage multiple user indexes in spring data elasticsearch

In Spring Data Elasticsearch, one model class/entity represents or maps to an index and type.
e.g.:
@Document(indexName = "myindex", type = "mytype")
public class DocumentModel {
    ......
}
I have a use case in which I need to index data with the same structure into different ES indices. If that's the case, how can I represent all of those indices with this model class?
Spring Data ES supports using SpEL expressions in the index name of the @Document annotation, like this:
@Document(indexName = "myindex-#{userId}", type = "mytype")
public class DocumentModel {
    ......
}
Hence, you have access to the whole context offered by SpEL in order to create your index names.
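As an illustration of what that expression evaluation amounts to at runtime, assuming a userId value supplied by the SpEL context (this plain-Java substitution only models the outcome, not the Spring SpEL machinery itself):

```java
public class IndexNameDemo {

    // Models resolving "myindex-#{userId}" against a concrete userId,
    // which is the result the SpEL evaluation produces at runtime
    public static String resolveIndexName(String template, String userId) {
        return template.replace("#{userId}", userId);
    }

    public static void main(String[] args) {
        System.out.println(resolveIndexName("myindex-#{userId}", "42"));
    }
}
```

Each distinct userId therefore yields its own index (myindex-42, myindex-43, ...), all sharing the mapping defined by the one model class.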
UPDATE
If you're using elasticsearchTemplate, there is a simpler variant; you can do it like this:
IndexQuery indexQuery = new IndexQueryBuilder()
    .withId(docModel.getId())
    .withObject(docModel)
    .withIndex("myindex" + docModel.getUserId())
    .build();
The call to withIndex("...") will override whatever index name you have in the @Document annotation.
