Programmatic mapping with embedded index in Hibernate Search results in unable to find field error - elasticsearch

I have to configure the fields to be indexed with Hibernate Search programmatically.
In the scenario below, using indexEmbedded() results in an "unable to find field" error.
@Entity
public class AT {
@ManyToOne(fetch = FetchType.EAGER)
@JoinColumn(name = "ARRM_IDE", nullable = false)
private A arr;
private Date dateType;
// ... other fields
}
@Entity
public class A {
@Id
@SequenceGenerator(name = "C_SEQUENCE", sequenceName = "S_ARRM_01")
@GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "C_SEQUENCE")
@Column(name = "IDE_ARR")
private Long id;
}
SearchMapping mapping = new SearchMapping();
mapping.entity(AT.class).indexed()
.property("dateType", ElementType.FIELD)
.field()
.store(Store.YES)
.property("arr", ElementType.FIELD)
.indexEmbedded()
.entity(A.class).indexed()
.property("id", ElementType.FIELD).documentId().name("arrId")
.field()
.store(Store.YES)
;
When I create and persist entities (I have integrated Hibernate Search with Elasticsearch), the entities are persisted and the indexes are also created in Elasticsearch.
Contents in Elasticsearch:
"_index" : "com.....at",
"_type" : "com.....AT",
"_id" : "7744",
"_score" : 1.0,
"_source" : {
"dateType" : "2016-06-12T06:08:52.780Z",
"arr" : {
"id" : 6352
}
}
} ]
But when I try to query using the Hibernate Search Lucene query DSL, it fails:
FullTextEntityManager fullTextEntityManager =
org.hibernate.search.jpa.Search.getFullTextEntityManager(em);
QueryBuilder qb = fullTextEntityManager.getSearchFactory()
.buildQueryBuilder().forEntity(AT.class).get();
org.apache.lucene.search.Query luceneQuery = qb.bool()
.must(qb
.range()
.onField("dateType")
.from(parseDate(startDate))
.to(parseDate(endDate)).excludeLimit()
.createQuery())
.must(qb
.keyword()
.onField("arr")
.matching(crsArrId).createQuery())
.createQuery();
Sort sort = null;
if (order == OrderEnum.ASCENDING) {
sort = new Sort(
new SortField("dateType", SortField.Type.STRING));
} else {
sort = new Sort(
new SortField("dateType", SortField.Type.STRING, true));
}
FullTextQuery jpaQuery = fullTextEntityManager.createFullTextQuery(luceneQuery, AT.class);
jpaQuery.setSort(sort);
jpaQuery.setFirstResult(offset);
jpaQuery.setMaxResults(maxReturnedEvents);
return jpaQuery.getResultList();
Error is:
org.hibernate.search.exception.SearchException: Unable to find field arr in com....AT
at org.hibernate.search.engine.spi.DocumentBuilderIndexedEntity.objectToString(DocumentBuilderIndexedEntity.java:977)
at org.hibernate.search.query.dsl.impl.FieldContext.objectToString(FieldContext.java:75)
at org.hibernate.search.query.dsl.impl.ConnectedMultiFieldsTermQueryBuilder.buildSearchTerm(ConnectedMultiFieldsTermQueryBuilder.java:145)
at org.hibernate.search.query.dsl.impl.ConnectedMultiFieldsTermQueryBuilder.createQuery(ConnectedMultiFieldsTermQueryBuilder.java:105)
at org.hibernate.search.query.dsl.impl.ConnectedMultiFieldsTermQueryBuilder.createQuery(ConnectedMultiFieldsTermQueryBuilder.java:67)
Thanks for your help!

You want to search on arr.id, not arr.
Just change .onField("arr") to .onField("arr.id").
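The indexed document above shows the embedded id under arr.id, so that is the field path the query has to target. A minimal corrected keyword clause (a sketch only, assuming crsArrId holds the embedded A id, as in the original query):
.must(qb
.keyword()
.onField("arr.id")
.matching(crsArrId)
.createQuery())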

Related

What is the best way to save jena Result set in database?

I am creating a Spring web application that queries SPARQL endpoints. As a requirement, I'm supposed to save the query and the result for later viewing and editing. So far I have created some entities (QueryInfo, Result, Endpoint) that I use to save the information entered about the Query and the Result. However, I'm having trouble with saving the actual results themselves:
public static List<String> getSelectQueryResult(QueryInfo queryInfo){
Endpoint endpoint = queryInfo.getEndpoint();
Query query = QueryFactory.create(queryInfo.getContent());
List<String> subjectStrings = query.getResultVars();
List<String> list = new ArrayList<>();
RDFConnection conn = RDFConnectionFactory.connect(endpoint.getUrl());
QueryExecution qExec = conn.query(queryInfo.getContent()) ; //SELECT DISTINCT ?s where { [] a ?s } LIMIT 100
ResultSet rs = qExec.execSelect() ;
while (rs.hasNext()) {
QuerySolution qs = rs.next();
System.out.println("qs: "+qs);
RDFNode rn = qs.get(subjectStrings.get(0)) ;
System.out.print(qs.varNames());
if(rn!= null) {
if (rn.isLiteral()) {
Literal literal = qs.getLiteral(subjectStrings.get(0));
list.add(literal.toString());
} else if (rn.isURIResource()) {
Resource subject = qs.getResource(subjectStrings.get(0));
System.out.println("Subject: " + subject.toString());
list.add(subject.toString());
}
}
}
return list;
}
My Result entity looks like this:
@Entity
@Data
@Table(schema = "sparql_tool")
public class Result {
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
private Long id;
@Column(length = 10485760)
private String content;
@OneToOne
@JoinColumn(name = "query_info_id", referencedColumnName = "id")
private QueryInfo queryInfo;
@Column(length = 10485760)
@Convert(converter = StringListConverter.class)
private List<String> contentList;
public Result() {
}
public Result(String content, QueryInfo queryInfo, List<String> list) {
this.content = content;
this.queryInfo = queryInfo;
this.contentList=list;
}
}
So far I save the actual results in the List<String> contentList attribute. However, this only works when the query has only one result variable. If I have multiple result variables, I have a table instead of a list. What is the best way to save this result in the DB?
I'm working with an SQL DB, if that is relevant. Thank you so much in advance!
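Not part of the original question, but a minimal sketch of the "table instead of a list" shape: collect each QuerySolution into one row keyed by variable name; the resulting rows could then be serialized (for example with Jackson) into the existing content column, or mapped to a dedicated child entity per row. The helper below is hypothetical.
// Sketch: keep the tabular shape of multi-variable results.
public static List<Map<String, String>> toRows(ResultSet rs, List<String> resultVars) {
    List<Map<String, String>> rows = new ArrayList<>();
    while (rs.hasNext()) {
        QuerySolution qs = rs.next();
        Map<String, String> row = new LinkedHashMap<>();
        for (String var : resultVars) {
            RDFNode node = qs.get(var);
            // store null for unbound variables so every row has the same columns
            row.put(var, node == null ? null : node.toString());
        }
        rows.add(row);
    }
    return rows;
}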

Hibernate search : Sorting with filter on nested object, how to?

I have to write a Hibernate Search query (for an Elasticsearch backend) that includes a conditional sort of this kind:
Date dateOfBirth = new GregorianCalendar(2000, Calendar.JANUARY, 1).getTime(); // 01/01/2000, may be null
Integer age = 10;
if (dateOfBirth == null) {
//then sort by age
}
else {
//sort by date of birth
}
I found an example of this conditional sort in the Hibernate Search reference documentation; it can be done like this (quoted example):
List<Author> hits = searchSession.search( Author.class )
.where( f -> f.matchAll() )
.sort( f -> f.field( "books.pageCount" )
.mode( SortMode.AVG )
.filter( pf -> pf.match().field( "books.genre" )
.matching( Genre.CRIME_FICTION ) ) )
.fetchHits( 20 );
My problem is that Hibernate Search throws an exception at runtime. My sort filter code:
case DATE_SIGNATURE:
FieldSortOptionsStep bivSortFirst = f.field(Depot_.VENTE + "." + Vente_.DATE_SIGNATURE)
.filter(fa ->
{
PredicateFinalStep a = fa.bool(bo -> bo.must(fa.exists().field(Depot_.VENTE + "." + Vente_.DATE_SIGNATURE)));
return fa.bool(b0 -> b0.must(a));
}
);
FieldSortOptionsStep bivSortSecond = f.field(Depot_.VENTE + "." + Vente_.ACTE + "." + Acte_.SIGNATURE)
.filter(fa ->
{
PredicateFinalStep a = fa.bool(bo -> bo.mustNot(fa.exists().field(Depot_.VENTE + "." + Vente_.DATE_SIGNATURE)));
PredicateFinalStep b = fa.bool(bo -> bo.must(fa.exists().field(Depot_.VENTE + "." + Vente_.ACTE + "." + Acte_.SIGNATURE)));
return fa.bool(b0 -> b0.must(a).must(b));
}
);
sortFieldOrderedList.add(bivSortFirst);
sortFieldOrderedList.add(bivSortSecond);
break;
In the above example, I sort on two fields by priority. The first is comparable to "date of birth" and the second to "age". At runtime, the filters are not accepted by Hibernate Search, which throws an exception as follows.
The error message:
HSEARCH400604: Invalid sort filter: field 'vente.acte.signature' is
not contained in a nested object. Sort filters are only available if
the field to sort on is contained in a nested object. Context: field
'vente.acte.signature'
I read that to do this I need to use an "inner_hits" query in Elasticsearch. But how do I do this with the Hibernate Search API?
Thanks.
EDIT: Hibernate mapping of the classes:
@Entity
@Indexed
public class Depot {
...
@ManyToOne(fetch = FetchType.LAZY)
@JoinColumn(name = "vente_fk")
protected Vente vente;
@IndexedEmbedded(includePaths = {
Vente_.ID,
Vente_.DATE_SIGNATURE,
Vente_.DATE_SIGNATURE_ACTE,
Vente_.ACTE + "." + Acte_.SIGNATURE,
// and much more
})
public Vente getVente() {
return this.vente;
}
...
}
@Entity
public class Vente {
@OneToMany(mappedBy = Depot_.VENTE, fetch = FetchType.LAZY, cascade = CascadeType.ALL)
protected Set<Depot> depot = new HashSet<>();
@ManyToOne(fetch = FetchType.LAZY)
@JoinColumn(name = "acte_fk")
protected Acte acte;
...
@AssociationInverseSide(inversePath = @ObjectPath(@PropertyValue(propertyName = Acte_.VENTE)))
@IndexedEmbedded
public Acte getActe() {
return this.acte;
}
...
}
@Entity
public class Acte {
...
@GenericField(projectable = Projectable.YES, sortable = Sortable.YES, aggregable = Aggregable.YES)
protected Date signature;
@OneToMany(mappedBy = Vente_.ACTE)
protected Set<Vente> vente = new HashSet<>();
public Date getSignature() {
return this.signature;
}
...
}
From what I can see, for each Depot there is at most one Acte and one Vente. So what you're trying to do is a bit exotic, as filtering in sorts is generally used on multi-valued, nested objects.
The reason it's not working is that you didn't mark the @IndexedEmbedded objects (vente, acte) as "nested"; as explained in the documentation, filtering is only available on nested objects. And "nested" has a very precise meaning here; it is not synonymous with "indexed-embedded".
However, I think the whole approach is wrong in this case: you shouldn't use filtering. I'm quite sure that even if you mark the @IndexedEmbedded objects as "nested", you will face other problems, because what you're trying to do isn't the intended purpose of filtering. One of those problems could be performance: nested documents mean runtime joins, and runtime joins aren't cheap.
Instead, consider solving this problem at indexing time. Instead of trying to figure out which date to use for each document when searching, do that when indexing:
@Entity
@Indexed
public class Depot {
//...
@ManyToOne(fetch = FetchType.LAZY)
@JoinColumn(name = "vente_fk")
protected Vente vente;
@IndexedEmbedded(includePaths = {
Vente_.ID,
Vente_.DATE_FOR_SORT, // <================= ADD THIS
Vente_.DATE_SIGNATURE,
Vente_.DATE_SIGNATURE_ACTE,
Vente_.ACTE + "." + Acte_.SIGNATURE,
//and much more
})
public Vente getVente() {
return this.vente;
}
}
@Entity
public class Vente {
@OneToMany(mappedBy = Depot_.VENTE, fetch = FetchType.LAZY, cascade = CascadeType.ALL)
protected Set<Depot> depot = new HashSet<>();
@ManyToOne(fetch = FetchType.LAZY)
@JoinColumn(name = "acte_fk")
protected Acte acte;
//...
@AssociationInverseSide(inversePath = @ObjectPath(@PropertyValue(propertyName = Acte_.VENTE)))
@IndexedEmbedded
public Acte getActe() {
return this.acte;
}
// v================= ADD THIS
@Transient
@GenericField(sortable = Sortable.YES) // the derived date must itself be indexed as a sortable field
@IndexingDependency(derivedFrom = {
@ObjectPath(@PropertyValue(propertyName = Vente_.DATE_SIGNATURE)),
@ObjectPath({ @PropertyValue(propertyName = Vente_.ACTE), @PropertyValue(propertyName = Acte_.SIGNATURE) })
})
public Date getDateForSort() {
if ( getDateSignature() != null ) {
return getDateSignature();
}
else {
return getActe().getSignature();
}
}
// ^================= ADD THIS
//...
}
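With the derived date indexed, the conditional logic moves to indexing time and the query-side sort no longer needs a filter. A sketch of the simplified sort clause, assuming a Vente_.DATE_FOR_SORT constant that mirrors getDateForSort():
// Sketch: plain field sort on the pre-computed date, no sort filter required.
case DATE_SIGNATURE:
    sortFieldOrderedList.add(f.field(Depot_.VENTE + "." + Vente_.DATE_FOR_SORT));
    break;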

ElasticsearchRepository skip null values

I have this repository to interact with the ES index:
@Repository
public interface RegDocumentRepo extends ElasticsearchRepository<RegDocument, String> {
}
The RegDocument class is a POJO for the reg-document index:
@Document(indexName = "reg-document")
@Data
@AllArgsConstructor
@NoArgsConstructor
public class RegDocument {
@Id
String id;
@Field(type = FieldType.Nested, includeInParent = true)
private List<Map<String, Object>> attachments;
private String author;
@Field(type = FieldType.Nested, includeInParent = true)
private List<Map<String, Object>> classification;
private String content;
private String intent;
@Field(type = FieldType.Nested, includeInParent = true)
private List<Map<String, Object>> links;
private String name;
@Field(name = "publication_date")
private String publicationDate;
private Integer raiting;
private Long status;
private String title;
private String type;
private String version;
}
To hide my business logic I use a service:
@RequiredArgsConstructor
@Service
public class SearchServiceImpl {
@Autowired
RegDocumentRepo regDocumentRepo;
public RegDocument updateRating(String uuid, Integer rating) throws IOException {
final RegDocument regDocument = regDocumentRepo
.findById(uuid)
.orElseThrow(() -> new IOException(String.format("No document with %s id", uuid)));
Integer ratingFromDB = regDocument.getRaiting();
ratingFromDB = ratingFromDB == null ? rating : ratingFromDB + rating;
regDocument.setRaiting(ratingFromDB);
final RegDocument save = regDocumentRepo.save(regDocument);
return save;
}
}
So I had the following document in my ES index:
{
"_index" : "reg-document",
"_type" : "_doc",
"_id" : "9wEgQnQBKzq7IqBZMDaO",
"_score" : 1.0,
"_source" : {
"raiting" : null,
"attachments" : null,
"author" : null,
"type" : "answer",
"classification" : [
{
"code" : null,
"level" : null,
"name" : null,
"description" : null,
"id_parent" : null,
"topic_type" : null,
"uuid" : null
}
],
"intent" : null,
"version" : null,
"content" : "В 2019 году размер материнского капитала составляет 453026 рублей",
"name" : "Каков размер МСК в 2019 году?",
"publication_date" : "2020-08-26 06:49:10",
"rowkey" : null,
"links" : null,
"status" : 1
}
}
But after I update the rating score, I have the following structure:
{
"_index" : "reg-document",
"_type" : "_doc",
"_id" : "9wEgQnQBKzq7IqBZMDaO",
"_score" : 1.0,
"_source" : {
"raiting" : 4,
"type" : "answer",
"classification" : [
{
"code" : null,
"level" : null,
"name" : null,
"description" : null,
"id_parent" : null,
"topic_type" : null,
"uuid" : null
}
],
"content" : "В 2019 году размер материнского капитала составляет 453026 рублей",
"name" : "Каков размер МСК в 2019 году?",
"publication_date" : "2020-08-26 06:49:10",
"status" : 1
}
}
As you can see, the Java service skips null values, but when the field is nested, the null values are saved.
Elasticsearch version: 7.8.0
Maven dependency for spring-data:
<dependency>
<groupId>org.springframework.data</groupId>
<artifactId>spring-data-elasticsearch</artifactId>
<version>4.0.0.RELEASE</version>
</dependency>
So how can I save null values instead of skipping them?
UPDATE
I have investigated the spring-data-elasticsearch 4.0.0 dependency and found out, as the accepted answer's author said, that MappingElasticsearchConverter.java has the following methods:
@Override
public void write(Object source, Document sink) {
Assert.notNull(source, "source to map must not be null");
if (source instanceof Map) {
// noinspection unchecked
sink.putAll((Map<String, Object>) source);
return;
}
Class<?> entityType = ClassUtils.getUserClass(source.getClass());
TypeInformation<?> type = ClassTypeInformation.from(entityType);
if (requiresTypeHint(type, source.getClass(), null)) {
typeMapper.writeType(source.getClass(), sink);
}
Optional<Class<?>> customTarget = conversions.getCustomWriteTarget(entityType, Map.class);
if (customTarget.isPresent()) {
sink.putAll(conversionService.convert(source, Map.class));
return;
}
ElasticsearchPersistentEntity<?> entity = type.getType().equals(entityType)
? mappingContext.getRequiredPersistentEntity(type)
: mappingContext.getRequiredPersistentEntity(entityType);
writeEntity(entity, source, sink, null);
}
This method explains why the nested data was saved with null values and wasn't skipped: the whole Map is simply put into the document as-is.
The next method, by contrast, iterates over the entity's properties via the property accessor, and if a value is null it just skips it:
protected void writeProperties(ElasticsearchPersistentEntity<?> entity, PersistentPropertyAccessor<?> accessor,
MapValueAccessor sink) {
for (ElasticsearchPersistentProperty property : entity) {
if (!property.isWritable()) {
continue;
}
Object value = accessor.getProperty(property);
if (value == null) {
continue;
}
if (property.hasPropertyConverter()) {
ElasticsearchPersistentPropertyConverter propertyConverter = property.getPropertyConverter();
value = propertyConverter.write(value);
}
if (!isSimpleType(value)) {
writeProperty(property, value, sink);
} else {
Object writeSimpleValue = getWriteSimpleValue(value);
if (writeSimpleValue != null) {
sink.set(property, writeSimpleValue);
}
}
}
}
There is no official solution, so I have created a Jira ticket.
The null values of the inner objects are stored because this happens whenever a Map with null values for its keys is stored.
Entity properties with a null value are not persisted by Spring Data Elasticsearch, as that would store information that is not needed for saving/retrieving the data.
If you need the null values to be written, that would mean we'd need to add some flag to the @Field annotation for this; can you add an issue in Jira (https://jira.spring.io/projects/DATAES/issues) for this?
Edit: Implemented in versions 4.0.4.RELEASE and 4.1.0.RC1
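For reference, a hedged sketch of how the opt-in looks on versions that ship the fix; storeNullValue is, to the best of my knowledge, the attribute name the feature was released under, but verify against the @Field Javadoc of your Spring Data Elasticsearch version:
// Sketch, assuming Spring Data Elasticsearch 4.0.4+/4.1+:
// opt in per field so null is written instead of being skipped.
@Field(type = FieldType.Integer, storeNullValue = true)
private Integer raiting;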

Spring mongoDB aggregation group on date minute subset

I'm running Spring 2.2.7 with spring-data-mongodb.
I have an entity called BaseSample stored in a samples MongoDB collection, and I want to group records by the minute of the created date and get the average of the collected values. I don't know how to use DateOperators.Minute in the group aggregation operation.
Detailed explanation:
@Data
@Document(collection = "samples")
@EqualsAndHashCode
public class BaseSample extends Message {
private float value;
private UdcUnitEnum unit;
private float accuracy;
public LocalDateTime recorded;
}
that extends a Message class
@Data
@EqualsAndHashCode
public class Message {
@Indexed
private String sensorUuId;
@Indexed
private String fieldUuId;
@Indexed
private String baseStationUuId;
@Indexed
private Date created;
}
on which I want to apply this query
db.samples.aggregate ([
{
$match: {
$and : [ {fieldUuId:"BS1F1"}, {unit: "DGC"}, {created : {$gte : ISODate("2020-05-30T17:00:00.0Z")}}, {created : {$lt : ISODate("2020-05-30T17:15:00.0Z")}}]
}
},
{
$group: {
_id : { $minute : "$created"},
date : {$first : {$dateToString:{date: "$created", format:"%Y-%m-%d"}}},
time : {$first : {$dateToString:{date: "$created", format:"%H:%M"}}},
unit : {$first : "$unit"},
data : { $avg : "$value"}
}
},
{
$sort: {date:1, time:1}
}
])
on the samples collection (excerpt):
{
"_id" : ObjectId("5ed296150af58a1c60c4f154"),
"value" : 90.85242462158203,
"unit" : "HPA",
"accuracy" : 0.6498473286628723,
"recorded" : ISODate("2020-05-30T17:21:25.850Z"),
"sensorUuId" : "458f0ffd-13f9-466d-81a1-8d2e1c808da9",
"fieldUuId" : "BS1F2",
"baseStationUuId" : "BS1",
"created" : ISODate("2020-05-30T17:21:25.777Z"),
"_class" : "org.open_si.udc_common.models.BaseSample"
}
{
"_id" : ObjectId("5ed296150af58a1c60c4f155"),
"value" : 40.84038162231445,
"unit" : "HPA",
"accuracy" : 0.030185099691152573,
"recorded" : ISODate("2020-05-30T17:21:25.950Z"),
"sensorUuId" : "b396264f-fcd5-4653-8ac8-358ca3a4cb87",
"fieldUuId" : "BS2F3",
"baseStationUuId" : "BS2",
"created" : ISODate("2020-05-30T17:21:25.868Z"),
"_class" : "org.open_si.udc_common.models.BaseSample"
}
I coded the following method to get the average value of samples, grouped per minute, for a selected unit type (degrees, ...) in a selected field (a logical group of sensors):
public List aggregateFromField(String fieldUuId, UdcUnitEnum unit, LocalDateTime from, LocalDateTime to, Optional pageNumber, Optional pageSize){
Pageable paging = new PagingHelper(pageNumber, pageSize).getPaging();
MatchOperation fieldMatch = Aggregation.match(Criteria.where("fieldUuId").is(fieldUuId));
MatchOperation unitMatch = Aggregation.match(Criteria.where("unit").is(unit.name()));
MatchOperation fromDateMatch = Aggregation.match(Criteria.where("created").gte(from));
MatchOperation toDateMatch = Aggregation.match(Criteria.where("created").lt(to));
DateOperators.Minute minute = DateOperators.Minute.minuteOf("created");
GroupOperation group = Aggregation.group("created")
.first("created").as("date")
.first("created").as("time")
.first("unit").as("unit")
.avg("value").as("avg")
;
SortOperation sort = Aggregation.sort(Sort.by(Sort.Direction.ASC, "$date")).and(Sort.by(Sort.Direction.ASC, "$time"));
SkipOperation skip = Aggregation.skip(paging.getOffset());
LimitOperation limit = Aggregation.limit(paging.getPageSize());
Aggregation agg = Aggregation.newAggregation(
fieldMatch,
unitMatch,
fromDateMatch,
toDateMatch,
group,
sort,
skip,
limit
);
AggregationResults<SampleAggregationResult> results = mongoTemplate.aggregate(agg, mongoTemplate.getCollectionName(BaseSample.class), SampleAggregationResult.class);
return results.getMappedResults();
}
The result class is:
#Data
public class SampleAggregationResult {
private String date;
private String time;
private String unit;
private float data;
}
Any idea on how to use the DateOperators.Minute type in the aggregation group operation?
Thanks in advance.
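Not an answer from the original thread, but a minimal sketch of the usual pattern: since Aggregation.group(...) groups on field names, the $minute expression is first exposed via a $project stage and the group stage then references that projected field. The alias names (minute, date, time) are my own choice; everything else mirrors the raw aggregation shown above.
// Sketch: project the minute (and the formatted date/time strings), then group on it.
ProjectionOperation projectMinute = Aggregation.project("unit", "value")
    .and(DateOperators.Minute.minuteOf("created")).as("minute")
    .and(DateOperators.DateToString.dateOf("created").toString("%Y-%m-%d")).as("date")
    .and(DateOperators.DateToString.dateOf("created").toString("%H:%M")).as("time");
GroupOperation groupByMinute = Aggregation.group("minute")
    .first("date").as("date")
    .first("time").as("time")
    .first("unit").as("unit")
    .avg("value").as("data");
// Insert projectMinute before groupByMinute in Aggregation.newAggregation(...).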

n-gram implementaion in spring-boot ElasticSearch

I am trying to achieve autocomplete in Elasticsearch, used from within Spring Boot. I have tried a lot of examples from the internet but have not been able to make it work. Below is my code; please help me with this.
Main class:
@SpringBootApplication
@EnableNatsAnnotations
@EnableAutoConfiguration
@EnableConfigurationProperties(ElasticsearchProperties.class)
@EntityScan(basePackages = {
"com.text.model"
})
@ComponentScan(
{
"com.text.elastic",
"com.text.elastic.controller",
"com.text.elastic.service",
"com.text.elastic.service.impl",
"com.text.nats.utils"
}
)
public class ElasticServicesApplication {
public static void main(String[] args) {
SpringApplication.run(ElasticServicesApplication.class, args);
}
}
Bean class:
@Setting(settingPath = "elasticsearch-settings.json")
@Document(indexName = "content", type = "content", shards = 1, replicas = 0, createIndex = true, refreshInterval = "-1")
public class Content {
@Id
private String id;
private Locale locale;
// @Field(type = text, index = true, store = true, analyzer = "standard")
@Field(
type = FieldType.String,
index = FieldIndex.analyzed,
searchAnalyzer = "standard",
//indexAnalyzer = "type_ahead",
analyzer = "standard"
/*,
store = true*/
)
private String contentTitle;
I want to achieve autocomplete on contentTitle.
Mapping
The concise way is to use the annotation:
@CompletionField()
private Completion suggest;
Or the more powerful but tedious way:
{
"content" : {
"properties" : {
"contentTitle" : { "type" : "string" },
"suggest" : { "type" : "completion",
"analyzer" : "simple",
"search_analyzer" : "simple"
}
}
}
}
// Then refer to the mapping with @Mapping:
@Setting(settingPath = "elasticsearch-settings.json")
@Document(indexName = "content", type = "content", shards = 1, replicas = 0, createIndex = true, refreshInterval = "-1")
@Mapping(mappingPath = "/mappings/content-mapping.json")
public class Content {...}
Index
We can index it as we would any common entity:
esTemplate.save(new File(...));
Query
ElasticsearchTemplate has a method for suggest queries:
public SuggestResponse suggest(SuggestBuilder.SuggestionBuilder<?> suggestion, String... indices);
Ref
Blog posts about completion
Official document about completion
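To round this out, a hedged sketch of running the completion suggest through the template. This assumes the 2.x-era client implied by FieldType.String/FieldIndex.analyzed in the question; the suggestion name "content-suggest" and the prefix variable are placeholders, and builder/method names differ in later Elasticsearch versions.
// Sketch for an ES 2.x-style client; adjust for your client version.
CompletionSuggestionBuilder suggestionBuilder = new CompletionSuggestionBuilder("content-suggest")
    .field("suggest")   // the Completion field mapped above
    .text(prefix)       // the user's partial input
    .size(10);
SuggestResponse response = esTemplate.suggest(suggestionBuilder, "content");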
