Hibernate Search: Elasticsearch and Lucene yield different search results - elasticsearch

I am trying to implement basic search functionality for my REST backend using Spring Data REST and Hibernate Search. I would like to allow users to execute arbitrary queries by passing query strings to a search function. To make it easier to run the backend locally, and to avoid having to spin up Elasticsearch to run tests, I would like to be able to work with a local index in these situations.
My problem is that the following code does not yield the same results with a local index as with Elasticsearch. I have tried to limit the code below to what I believe is relevant.
The entity:
@Indexed(index = "MyEntity")
@AnalyzerDef(name = "ngram",
    tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
    filters = {
        @TokenFilterDef(factory = StandardFilterFactory.class),
        @TokenFilterDef(factory = LowerCaseFilterFactory.class),
        @TokenFilterDef(factory = StopFilterFactory.class),
        @TokenFilterDef(factory = NGramFilterFactory.class,
            params = {
                @Parameter(name = "minGramSize", value = "2"),
                @Parameter(name = "maxGramSize", value = "3") })
    }
)
public class MyEntity {
    @NotNull
    @Field(index = Index.YES, analyze = Analyze.YES, store = Store.YES, analyzer = @Analyzer(definition = "ngram"))
    private String name;

    @Field(analyze = Analyze.YES, store = Store.YES)
    @FieldBridge(impl = StringCollectionFieldBridge.class)
    @ElementCollection(fetch = FetchType.EAGER)
    private Set<String> tags = new HashSet<>();
}
application.yml for local index:
spring:
  jpa:
    hibernate:
      ddl-auto: update
    show-sql: false
application.yml for Elasticsearch:
spring:
  jpa:
    hibernate:
      ddl-auto: create-drop
    properties:
      hibernate:
        search:
          default:
            indexmanager: elasticsearch
            elasticsearch:
              host: 127.0.0.1:9200
              required_index_status: yellow
Search endpoint:
private static String[] FIELDS = { "name", "tags" };

@Override
public List<MyEntity> querySearch(String queryString) throws ParseException {
    QueryParser queryParser = new MultiFieldQueryParser(FIELDS, new SimpleAnalyzer());
    queryParser.setDefaultOperator(QueryParser.AND_OPERATOR);
    org.apache.lucene.search.Query query = queryParser.parse(queryString);
    FullTextEntityManager fullTextEntityManager = Search.getFullTextEntityManager(this.entityManager);
    javax.persistence.Query persistenceQuery =
        fullTextEntityManager.createFullTextQuery(query, MyEntity.class);
    return persistenceQuery.getResultList();
}
I create an instance of MyEntity with the following values:
$ curl 'localhost:8086/myentities'
{
  "_embedded" : {
    "myentities" : [ {
      "name" : "Test Entity",
      "tags" : [ "bar", "foobar", "foo" ],
      "_links" : {
        ...
      }
    } ]
  },
  "_links" : {
    ...
  }
}
The following queries work (return that entity) using Elasticsearch:
name:Test
name:Entity
tags:bar
Using a local index, I get the result for "tags:bar", but the queries on the name field return no results. Any ideas why this is the case?

You should make sure that the Elasticsearch mapping is properly created by Hibernate Search. By default, Hibernate Search will only create a mapping if it is missing.
If you launched your application once, then changed the mapping, and launched the application again, it is possible that the name field does not have the correct mapping in Elasticsearch.
In development mode, try this:
spring:
  jpa:
    hibernate:
      ddl-auto: create-drop
    properties:
      hibernate:
        search:
          default:
            indexmanager: elasticsearch
            elasticsearch:
              host: 127.0.0.1:9200
              required_index_status: yellow
              index_schema_management_strategy: drop-and-create-and-drop
See https://docs.jboss.org/hibernate/search/5.11/reference/en-US/html_single/#elasticsearch-schema-management-strategy
Note that documents being indexed successfully is unfortunately not an indication that your mapping is correct: when you index unknown fields, Elasticsearch creates them dynamically and tries to guess their type (generally wrongly, in the case of text fields...). You can use the validate index management strategy to be really sure that, on bootstrap, the Elasticsearch mapping is in sync with Hibernate Search.

Related

Spring Mongo Aggregation give a conversion error

I'm trying to use Mongo aggregation but I receive an error that I don't understand.
This is my domain:
@Document(collection = "tapes")
public class Tape {
    @Id
    private String id;
    private String area;
    private Integer tape;
    private String tapeModel;
    // getters and setters follow
The mongo shell command and the output is the following:
> db.tapes.aggregate([{ $group: { _id: { "area":"$area"}, tapes: {$push: {tape: "$tape"}}}} ])
{ "_id" : { "area" : "free" }, "tapes" : [ { "tape" : 1 }, { "tape" : 2 } ] }
{ "_id" : { "area" : "Qnap" }, "tapes" : [ { "tape" : 3 } ] }
The following is an attempt to re-create the aggregation in Spring:
AggregationOperation group = Aggregation.group("area").push("tape").as("tape");
Aggregation aggregation = Aggregation.newAggregation(group);
AggregationResults<Tape> results = mongoTemplate.aggregate(aggregation, "tapes", Tape.class);
//List<Tape> tapes = mongoTemplate.aggregate(aggregation, mongoTemplate.getCollectionName(Tape.class), Tape.class).getMappedResults();
List<Tape> tapes = results.getMappedResults();
System.out.println(tapes);
But I obtain the following error:
Cannot convert [3] of type class java.util.ArrayList into an instance of class java.lang.Integer! Implement a custom Converter<class java.util.ArrayList, class java.lang.Integer> and register it with the CustomConversions. Parent object was: it.unifi.cerm.cermadminspring.domain.Tape@2b84da07 -> null
org.springframework.data.mapping.MappingException: Cannot convert [3] of type class java.util.ArrayList into an instance of class java.lang.Integer! Implement a custom Converter<class java.util.ArrayList, class java.lang.Integer> and register it with the CustomConversions. Parent object was: it.unifi.cerm.cermadminspring.domain.Tape@2b84da07 -> null
I don't understand why. I searched for aggregation examples and they all look more or less like mine.
Can someone help me?
First, to make life easier, a simpler aggregation can be used:
> db.tapes.aggregate([ {$group: {_id: "$area", tapes: {$push: "$tape"}}} ])
Which should yield:
{ "_id" : "free", "tapes" : [ 1 , 2 ] }
{ "_id" : "Qnap", "tapes" : [ 3 ] }
Which should be matched by a change to the Java group aggregation operation:
AggregationOperation group = Aggregation.group("area").push("tape").as("tapes");
Note that I've changed the alias to the plural: as("tapes").
Then, notice that the aggregation actually returns a document that doesn't have the same structure as the one you've mapped in the Tape class. That document contains two fields: a String id and a List<Integer> tapes.
This is the reason for the shorthand group aggregation I've suggested above; it makes the mapping easier:
public class TapesForArea {
    private String id; // which is the area
    private List<Integer> tapes;
    // getters, setters ...
}
You don't need to map this class using spring-data-mongodb annotations.
Finally, have the aggregation results return the right type:
AggregationResults<TapesForArea> results =
mongoTemplate.aggregate(aggregation, "tapes", TapesForArea.class);
List<TapesForArea> tapes = results.getMappedResults();
BTW, the error comes from the fact that you try to map the single item tape array [ 3 ] into private Integer tape; property of Tape class.
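The shape of the grouped result can be reproduced without MongoDB. The following is a minimal plain-Java sketch, not Spring Data code: the class GroupPushSketch and its nested Tape stand-in are hypothetical names introduced only for illustration. It mimics what a $group with $push does, and shows that every area ends up with a list of tape numbers, even an area that has a single tape.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration (no MongoDB involved) of the result shape of
// {$group: {_id: "$area", tapes: {$push: "$tape"}}}.
final class GroupPushSketch {

    // Hypothetical stand-in for one document of the "tapes" collection.
    static final class Tape {
        final String area;
        final Integer tape;

        Tape(String area, Integer tape) {
            this.area = area;
            this.tape = tape;
        }
    }

    // Groups by area and pushes every tape number into a list,
    // keeping the order in which areas are first seen.
    static Map<String, List<Integer>> groupTapesByArea(List<Tape> docs) {
        Map<String, List<Integer>> grouped = new LinkedHashMap<>();
        for (Tape doc : docs) {
            grouped.computeIfAbsent(doc.area, k -> new ArrayList<>()).add(doc.tape);
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<Tape> docs = Arrays.asList(
                new Tape("free", 1), new Tape("free", 2), new Tape("Qnap", 3));
        // Each value is a list, which is why it cannot be mapped to an Integer field.
        System.out.println(groupTapesByArea(docs));
    }
}
```

The value for "Qnap" is a one-element list rather than a bare number, which is the same mismatch the MappingException reports for the Integer tape property.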

How to use Term Query for nested objects in spring data elasticsearch?

My document looks like this:
class Foo {
    private Integer idDl;
    private String Name;
    private String Add;
    @Field(type = FieldType.Nested)
    private List<Bar> Bar;
}
class Bar {
    private String barId;
    private List<String> barData;
}
and sample response data for Foo looks like this:
{
  "idDl": 123,
  "Name": "ABCD",
  "Add": "FL",
  "Bar": [
    {
      "barId": "A456B",
      "barData": [
        "Bar1",
        "Bar2"
      ]
    },
    {
      "barId": "A985D",
      "barData": [
        "Bar4",
        "Bar5"
      ]
    }
  ]
}
I want to return all Foo objects where Bar.barId matches. I am using the NativeSearchQueryBuilder provided by spring-data-elasticsearch as follows:
String[] includeFields = new String[]{"idDl", "Name"};
String[] excludeFields = new String[]{"Add"}; // to exclude Add field of Foo
Query searchQuery = new NativeSearchQueryBuilder()
.withQuery(termQuery("Bar.barId", "A456B"))
.withSourceFilter(new FetchSourceFilter(includeFields, excludeFields))
.build();
return elasticsearchRestTemplate.queryForList( searchQuery, Foo.class);
We have also tried using nestedQuery as follows:
SearchQuery searchQuery = new NativeSearchQueryBuilder()
.withQuery(nestedQuery("Bar",
boolQuery().must(termQuery("Bar.barId", "A456B")), ScoreMode.Max))
.withIndices(indices)
.withSourceFilter(new FetchSourceFilter(includeFields, excludeFields))
.build();
return elasticsearchRestTemplate.queryForList(searchQuery, Foo.class);
But we get the following exception:
org.elasticsearch.ElasticsearchStatusException: Elasticsearch exception [type=search_phase_execution_exception, reason=all shards failed]
at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:177) ~[elasticsearch-6.8.7.jar:6.8.7]
at org.elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:2053) ~[elasticsearch-rest-high-level-client-6.8.7.jar:6.8.7]
at org.elasticsearch.client.RestHighLevelClient.parseResponseException(RestHighLevelClient.java:2030) ~[elasticsearch-rest-high-level-client-6.8.7.jar:6.8.7]
at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1777) ~[elasticsearch-rest-high-level-client-6.8.7.jar:6.8.7]
at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1734) ~[elasticsearch-rest-high-level-client-6.8.7.jar:6.8.7]
at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1696) ~[elasticsearch-rest-high-level-client-6.8.7.jar:6.8.7]
at org.elasticsearch.client.RestHighLevelClient.search(RestHighLevelClient.java:1092) ~[elasticsearch-rest-high-level-client-6.8.7.jar:6.8.7]
With termQuery as in the first snippet I get no results, but if I use matchQuery("Bar.barId", "A456B") instead, I do get a response. We just want to compare query performance between termQuery and matchQuery. How can I fetch the data using termQuery?
P.S.: we are using spring-boot-starter-data-elasticsearch 2.2.6.RELEASE in our Spring Boot project.
We had a similar requirement and solved it with the snippet below, which I've tried to convert to your requirement. The code is pretty straightforward; let me know if you need further clarification.
BoolQueryBuilder boolQueryBuilder = boolQuery();
BoolQueryBuilder nestedBoolQueryBuilder = boolQuery().must(boolQuery()
.should(termQuery("Bar.barId", barId.toLowerCase()))).minimumNumberShouldMatch(1);
QueryBuilder nestedQueryBuilder = nestedQuery("Bar", nestedBoolQueryBuilder);
boolQueryBuilder = boolQueryBuilder.must(nestedQueryBuilder);
SearchQuery searchQuery = new NativeSearchQueryBuilder()
.withQuery(boolQueryBuilder)
.withPageable(pageable)
.build();
You haven't specified an analyzer, so the default one, the standard analyzer, is used.
Unlike full-text queries, term-level queries do not analyze search terms. Instead, term-level queries match the exact terms stored in a field.
Reference
A term query does not analyze its input, which means it looks for A456B, while the index contains a456b because the standard analyzer includes a lowercase token filter.
A match query, on the other hand, is a full-text query that applies the analyzer at both index time and search time. So at search time the input becomes a456b, which matches the term a456b in the index.
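The difference can be illustrated with a small plain-Java model. This is only a sketch under simplifying assumptions: the analyze method is a crude, hypothetical stand-in for the standard analyzer (split on non-alphanumeric characters, then lowercase), and the class TermVsMatchSketch does not use any Elasticsearch API.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Toy model of term-level vs. full-text matching; no Elasticsearch involved.
final class TermVsMatchSketch {

    // Crude stand-in for the standard analyzer: split on anything that is
    // not a letter or digit, then lowercase each token.
    static Set<String> analyze(String text) {
        Set<String> tokens = new HashSet<>();
        for (String token : text.split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                tokens.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Term query: the input is compared to the indexed terms as-is.
    static boolean termQuery(Set<String> indexedTerms, String input) {
        return indexedTerms.contains(input);
    }

    // Match query: the input is analyzed first, then compared.
    static boolean matchQuery(Set<String> indexedTerms, String input) {
        for (String token : analyze(input)) {
            if (indexedTerms.contains(token)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> indexed = analyze("A456B"); // what index time would store in this model
        System.out.println("term  A456B: " + termQuery(indexed, "A456B"));
        System.out.println("term  a456b: " + termQuery(indexed, "a456b"));
        System.out.println("match A456B: " + matchQuery(indexed, "A456B"));
    }
}
```

In this model the term lookup only succeeds with the lowercased value, which is consistent with the barId.toLowerCase() call in the snippet above.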

Spring Boot + Mongo - com.mongodb.BasicDocument, you can't add a second 'id' criteria

Any ideas why I get this error when making a query:
org.springframework.data.mongodb.InvalidMongoDbApiUsageException: Due to limitations of the com.mongodb.BasicDocument, you can't add a second 'id' criteria. Query already contains '{ "id" : "123"}'
I'm using Spring Boot and Mongo:
fun subGenreNames(subGenreIds: List<String>?): List<String> {
    val results = mutableListOf<String>()
    var query = Query()
    subGenreIds!!.forEach {
        query.addCriteria(Criteria.where("id").`is`(it))
        var subGenreName = mongoTemplate.findById(it, SubGenre::class.java)
        results.add(subGenreName!!.name)
    }
    return results
}
I have the class SubGenre set with:
@Document(collection = "subgenres")
data class SubGenre(
    @Field("id")
    val id: String,
    val name: String
)
Thanks
Based on your code, you need to use either
query.addCriteria(Criteria.where("id").`is`(it))
var subGenreName = mongoTemplate.find(query, SubGenre::class.java)
or
var subGenreName = mongoTemplate.findById(it, SubGenre::class.java)
but not both.
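Note also that in the question the query object is created once, outside the forEach, so the second iteration adds a second "id" criterion to the same query. The following plain-Java sketch models that rule. SingleKeyQuerySketch is a hypothetical miniature written for illustration, not Spring Data's Query class; it only assumes, as the error message states, that a query holds at most one criterion per field name.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical miniature of a query object that keeps one criterion per field.
// It is NOT Spring Data's Query class; it only mimics the "no second criterion
// for the same key" rule described by the error message.
final class SingleKeyQuerySketch {
    private final Map<String, Object> criteria = new LinkedHashMap<>();

    SingleKeyQuerySketch addCriteria(String field, Object value) {
        if (criteria.containsKey(field)) {
            throw new IllegalStateException(
                    "you can't add a second '" + field + "' criteria");
        }
        criteria.put(field, value);
        return this;
    }

    // Reuses one query for every id, like the loop in the question.
    static boolean reuseFails(List<String> ids) {
        SingleKeyQuerySketch shared = new SingleKeyQuerySketch();
        try {
            for (String id : ids) {
                shared.addCriteria("id", id);
            }
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // Builds a fresh query per id, so each query holds a single "id" criterion.
    static int freshQueriesBuilt(List<String> ids) {
        int built = 0;
        for (String id : ids) {
            new SingleKeyQuerySketch().addCriteria("id", id);
            built++;
        }
        return built;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("123", "456");
        System.out.println("shared query fails: " + reuseFails(ids));
        System.out.println("fresh queries built: " + freshQueriesBuilt(ids));
    }
}
```

So if the find(query, ...) variant is chosen, the query would need to be created inside the loop; with findById no query object is needed at all.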

More Like This Query Not Getting Serialized - NEST

I am trying to create an Elasticsearch MLT query using NEST's object initializer syntax. However, when the final query is serialized, only the MLT part is missing. Every other query is present.
When inspecting the query object, the MLT query is present. It's just not getting serialized.
I wonder what I may be doing wrong.
I also noticed that it works when I add Fields. But I don't believe Fields is a mandatory property whose absence should cause the MLT query to be ignored.
The MLT query is initialized like this:
new MoreLikeThisQuery
{
Like = new[]
{
new Like(new MLTDocProvider
{
Id = parameters.Id
}),
}
}
MLTDocProvider implements the ILikeDocument interface.
I expect the serialized query to contain the MLT part, but it is the only part that is missing.
This looks like a bug in the conditionless behaviour of the more like this query in NEST; I've opened an issue to address it. In the meantime, you can get the desired behaviour by marking the MoreLikeThisQuery as verbatim, which will override NEST's conditionless behaviour:
var client = new ElasticClient();
var parameters = new
{
Id = 1
};
var searchRequest = new SearchRequest<Document>
{
Query = new MoreLikeThisQuery
{
Like = new[]
{
new Like(new MLTDocProvider
{
Id = parameters.Id
}),
},
IsVerbatim = true
}
};
var searchResponse = client.Search<Document>(searchRequest);
which serializes as
{
"query": {
"more_like_this": {
"like": [
{
"_id": 1
}
]
}
}
}

Spring boot custom query MongoDB

I have this MongoDb query:
db.getCollection('user').find({
$and : [
{"status" : "ACTIVE"},
{"last_modified" : { $lt: new Date(), $gte: new Date(new Date().setDate(new Date().getDate()-1))}},
{"$expr": { "$ne": ["$last_modified", "$time_created"] }}
]
})
It works in Robo3T, but when I put it into Spring Boot as a custom query, it throws an error on project start.
@Query("{ $and : [ {'status' : 'ACTIVE'}, {'last_modified' : { $lt: new Date(), $gte: new Date(new Date().setDate(new Date().getDate()-1))}}, {'$expr': { '$ne': ['$last_modified', '$time_created']}}]}")
public List<User> findModifiedUsers();
I tried to build the query with Criteria in Spring:
Query query = new Query();
Criteria criteria = new Criteria();
criteria.andOperator(Criteria.where("status").is(UserStatus.ACTIVE), Criteria.where("last_modified").lt(new Date()).gt(lastDay), Criteria.where("time_created").ne("last_modified"));
but it doesn't work: it returns all users, as if the last criterion (last_modified not equal to time_created) were not there.
Does anyone know what the problem could be?
I think that this feature is not yet supported by Criteria; see https://jira.spring.io/browse/DATAMONGO-1845.
One workaround is to pass a raw query via mongoTemplate like this:
BasicDBList expr = new BasicDBList();
expr.addAll(Arrays.asList("$last_modified","$time_created"));
BasicDBList and = new BasicDBList();
and.add(new BasicDBObject("status","ACTIVE"));
and.add(new BasicDBObject("last_modified",new BasicDBObject("$lt",new Date()).append("$gte",lastDate)));
and.add(new BasicDBObject("$expr",new BasicDBObject("$ne",expr)));
Document document = new Document("$and",and);
FindIterable<Document> result = mongoTemplate.getCollection("Users").find(document);
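For reference, the three conditions of the original shell query can be written out in plain Java. This is a sketch that runs against an in-memory list rather than MongoDB; the class ModifiedUsersSketch, its User stand-in, and the way the one-day lower bound is computed are all assumptions made for illustration. It also shows the field-to-field comparison that $expr performs, which is different from Criteria.where("time_created").ne("last_modified"), where the argument is compared as the literal string "last_modified".

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;

// In-memory model of the query's three conditions; no MongoDB involved.
final class ModifiedUsersSketch {

    // Hypothetical stand-in for a document of the user collection.
    static final class User {
        final String status;
        final Instant lastModified;
        final Instant timeCreated;

        User(String status, Instant lastModified, Instant timeCreated) {
            this.status = status;
            this.lastModified = lastModified;
            this.timeCreated = timeCreated;
        }
    }

    static List<User> findModifiedUsers(List<User> users, Instant now) {
        // Lower bound of the window: exactly one day before "now" (assumption).
        Instant lastDay = now.minus(1, ChronoUnit.DAYS);
        List<User> result = new ArrayList<>();
        for (User u : users) {
            boolean active = "ACTIVE".equals(u.status);
            // $gte lastDay and $lt now
            boolean inWindow = !u.lastModified.isBefore(lastDay) && u.lastModified.isBefore(now);
            // $expr $ne: compares the two fields of the same document
            boolean changed = !u.lastModified.equals(u.timeCreated);
            if (active && inWindow && changed) {
                result.add(u);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        List<User> users = new ArrayList<>();
        users.add(new User("ACTIVE", now.minus(2, ChronoUnit.HOURS), now.minus(30, ChronoUnit.DAYS)));
        users.add(new User("ACTIVE", now.minus(2, ChronoUnit.HOURS), now.minus(2, ChronoUnit.HOURS)));
        System.out.println("matching users: " + findModifiedUsers(users, now).size());
    }
}
```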
