Self-join scenario in Spring Data Elasticsearch

How do I search self-referencing documents in Elasticsearch?
public class ProductDocument {
@Id
private String id;
private String title;
private List<String> tags;
//private List<ProductDocument> relatedProducts -- Doesn't work
private List<String> relatedProducts;
}
While searching, I want to perform operations like:
{
"query": {
"multi_match" : {
"query": "cloths",
"fields": [ "title", "tags", "relatedProducts.title", "relatedProducts.tags" ]
}
}
}
Sample dataset:
id title tags relatedProducts
1. Book null null
2. WM. cloths. 1
3. cloths. 2
Expected output:
id title tags relatedProducts
2. WM. cloths. 1
3. cloths. 2
How can this be achieved? I have searched around but found nothing so far. Any help is highly appreciated.

You need to use the nested field type:
public class ProductDocument {
@Id
private String id;
private String title;
private List<String> tags;
// Use the nested datatype for the related products field
@Field(type = FieldType.Nested)
private List<ProductDocument> relatedProducts;
}
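Then, when you perform a search, use a nested query to search the related products field. Keep in mind that Elasticsearch does not resolve references at query time. With the nested type, each document must contain a copy of its related products' title and tags, not just their ids. Inside a nested query, fields are addressed by their full path (relatedProducts.title, not title). To match on a product's own fields as well, combine both clauses in a bool query:
{
  "query": {
    "bool": {
      "should": [
        { "multi_match": { "query": "cloths", "fields": ["title", "tags"] } },
        {
          "nested": {
            "path": "relatedProducts",
            "query": {
              "multi_match": {
                "query": "cloths",
                "fields": ["relatedProducts.title", "relatedProducts.tags"]
              }
            }
          }
        }
      ]
    }
  }
}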
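Elasticsearch cannot follow the ids in relatedProducts at query time, so the "self join" has to happen when the documents are built for indexing. The plain-Java sketch below illustrates that step under stated assumptions. The Product, RelatedCopy and ProductDoc records, the denormalize and search methods, and the sample data are all hypothetical. The in-memory matcher uses exact term equality as a rough stand-in for the bool/nested query, with no analysis and no scoring. Related products are copied one level deep only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class DenormalizeRelated {

    // Row as stored in the source of truth: related products are referenced by id only.
    record Product(String id, String title, List<String> tags, List<String> relatedIds) {}

    // Copy of a related product's searchable fields, embedded in the indexed document
    // (what a FieldType.Nested property would hold).
    record RelatedCopy(String title, List<String> tags) {}

    // Document as it would be sent to Elasticsearch (hypothetical shape).
    record ProductDoc(String id, String title, List<String> tags, List<RelatedCopy> relatedProducts) {}

    // Replaces related ids with copies of the related products, one level deep.
    static List<ProductDoc> denormalize(List<Product> products) {
        Map<String, Product> byId = products.stream()
                .collect(Collectors.toMap(Product::id, Function.identity()));
        List<ProductDoc> docs = new ArrayList<>();
        for (Product p : products) {
            List<RelatedCopy> related = new ArrayList<>();
            for (String rid : p.relatedIds()) {
                Product r = byId.get(rid);
                if (r != null) { // unknown ids are skipped
                    related.add(new RelatedCopy(r.title(), r.tags()));
                }
            }
            docs.add(new ProductDoc(p.id(), p.title(), p.tags(), related));
        }
        return docs;
    }

    // Rough stand-in for the bool/should query: exact term equality on the own fields
    // or on any embedded related copy.
    static boolean matches(ProductDoc d, String term) {
        if (term.equals(d.title()) || d.tags().contains(term)) {
            return true;
        }
        return d.relatedProducts().stream()
                .anyMatch(r -> term.equals(r.title()) || r.tags().contains(term));
    }

    // Returns the ids of matching documents, in input order.
    static List<String> search(List<ProductDoc> docs, String term) {
        return docs.stream().filter(d -> matches(d, term)).map(ProductDoc::id)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical data, loosely modeled on the question's sample.
        List<Product> products = List.of(
                new Product("1", "Book", List.of(), List.of()),
                new Product("2", "WM", List.of("cloths"), List.of("1")),
                new Product("3", "Jacket", List.of(), List.of("2")));
        System.out.println(search(denormalize(products), "cloths")); // [2, 3]
    }
}
```

This approach goes stale when data changes. If a product's title or tags change, every document that embeds a copy of it has to be reindexed too.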

Related

Spring Boot, query Elasticsearch specific fields from already indexed data created by Elastic Stack

The goal is to query specific fields from an index via a Spring Boot app.
My questions are at the end.
The data in Elasticsearch is created by the Elastic Stack (Beats, Logstash, etc.). There is some inconsistency, e.g. some fields may be missing from some hits.
The Spring app does not add the data and has no control over the fields and indexes.
The query I need, using _source,
GET index-2022.07.27/_search
{
"from": 0,
"size": 100,
"_source": ["@timestamp","message", "agent.id"],
"query": {
"match_all": {}
}
}
returns hits like:
{
"_index": "index-2022.07.27",
"_id": "C1zzPoIBgxar5OgxR-cs",
"_score": 1,
"_ignored": [
"event.original.keyword"
],
"_source": {
"agent": {
"id": "ddece977-9fbb-4f63-896c-d3cf5708f846"
},
"@timestamp": "2022-07-27T09:18:27.465Z",
"message": """a message"""
}
},
and with fields instead of _source, the hit is:
{
"_index": "index-2022.07.27",
"_id": "C1zzPoIBgxar5OgxR-cs",
"_score": 1,
"_ignored": [
"event.original.keyword"
],
"fields": {
"@timestamp": [
"2022-07-27T09:18:27.465Z"
],
"agent.id": [
"ddece977-9fbb-4f63-896c-d3cf5708f846"
],
"message": [
"""a message"""
]
}
},
How can I run this query with Spring Boot?
I am relying on StringQuery with the RestHighLevelClient, as below, but can't get it to work:
Query searchQuery = new StringQuery("{\"_source\":[\"@timestamp\",\"message\",\"agent.id\"],\"query\":{\"match_all\":{}}}");
SearchHits<Items> productHits = elasticsearchOperations.search(
searchQuery,
Items.class,
IndexCoordinates.of(CURRENT_INDEX));
What form must Items.class have? What fields?
I just need timestamp, message and agent.id. The latter is optional; it may not exist.
How will the mapping work?
Versions:
Elastic: 8.3.2
Spring boot: 2.6.6
elastic (mvn): 7.15.2
spring-data-elasticsearch (mvn): 4.3.3
The official documentation states that, with the RestHighLevelClient, these versions should be supported:
Support for upcoming versions of Elasticsearch is being tracked and
general compatibility should be given assuming the usage of the
high-level REST client.
You can define an entity class for the data you want to read (note I have a nested class for the agent):
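import java.time.Instant;
import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.DateFormat;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;

// createIndex = false: the index is created by the Elastic Stack, not by this app
@Document(indexName = "index-so", createIndex = false)
public class SO {
    @Id
    private String id;
    // "@timestamp" is not a valid Java identifier, so the field name is mapped explicitly
    @Field(name = "@timestamp", type = FieldType.Date, format = DateFormat.date_time)
    private Instant timestamp;
    @Field(type = FieldType.Object)
    private Agent agent;
    @Field(type = FieldType.Text)
    private String message;
    public String getId() {
        return id;
    }
    public void setId(String id) {
        this.id = id;
    }
    public Instant getTimestamp() {
        return timestamp;
    }
    public void setTimestamp(Instant timestamp) {
        this.timestamp = timestamp;
    }
    public Agent getAgent() {
        return agent;
    }
    public void setAgent(Agent agent) {
        this.agent = agent;
    }
    public String getMessage() {
        return message;
    }
    public void setMessage(String message) {
        this.message = message;
    }
    // Static nested class, so it can be instantiated without an enclosing SO instance
    public static class Agent {
        @Field(name = "id", type = FieldType.Keyword)
        private String id;
        public String getId() {
            return id;
        }
        public void setId(String id) {
            this.id = id;
        }
    }
}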
The query would then be:
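import static org.elasticsearch.index.query.QueryBuilders.matchAllQuery;
import org.springframework.data.elasticsearch.core.query.FetchSourceFilter;
import org.springframework.data.elasticsearch.core.query.NativeSearchQueryBuilder;

var query = new NativeSearchQueryBuilder()
        .withQuery(matchAllQuery())
        // Only return these source fields (equivalent to "_source" in the REST request)
        .withSourceFilter(new FetchSourceFilter(
                new String[]{"@timestamp", "message", "agent.id"},
                new String[]{}))
        .build();
// operations: an injected ElasticsearchOperations
var searchHits = operations.search(query, SO.class);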
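As far as I know, the entity approach above works for hits without an agent because properties absent from _source are simply left null. The sketch below makes that concrete without a cluster. It is a plain-Java illustration, not Spring's converter, and the Hit record and fromSource method are hypothetical. It reads the same three fields from a parsed _source map and tolerates a missing agent. It also shows that the @timestamp values in the question are ISO-8601 instants that java.time.Instant can parse.

```java
import java.time.Instant;
import java.util.Map;

public class SourceMappingSketch {

    // Flat view of one hit; agentId is null when the hit has no "agent" object.
    record Hit(Instant timestamp, String message, String agentId) {}

    // Reads the three wanted fields from a parsed _source map, tolerating missing ones.
    static Hit fromSource(Map<String, Object> source) {
        Object ts = source.get("@timestamp");
        Instant timestamp = ts == null ? null : Instant.parse(ts.toString());
        Object message = source.get("message");
        String agentId = null;
        // "agent" is a nested object in _source; "agent.id" is just its "id" key
        if (source.get("agent") instanceof Map<?, ?> agent && agent.get("id") != null) {
            agentId = agent.get("id").toString();
        }
        return new Hit(timestamp, message == null ? null : message.toString(), agentId);
    }

    public static void main(String[] args) {
        Map<String, Object> withAgent = Map.of(
                "@timestamp", "2022-07-27T09:18:27.465Z",
                "message", "a message",
                "agent", Map.of("id", "ddece977-9fbb-4f63-896c-d3cf5708f846"));
        Map<String, Object> withoutAgent = Map.of(
                "@timestamp", "2022-07-27T09:18:27.465Z",
                "message", "another message");
        System.out.println(fromSource(withAgent));
        System.out.println(fromSource(withoutAgent));
    }
}
```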

Using @Query to access a data object?

I have a Spring Data Elasticsearch repository method that has far too many parameters (11).
I want to access members of class objects to reduce the parameter count. Is there a way to access class members via @Query? For example, I want to access members of the following class:
public class Header {
int version;
String timestamp;
}
and I want to run a query as follows:
@Query("""
{
"bool":{
"must":[
{ "match": { "header_version": {"query": "<header version substitution>"}}},
{ "match": { "header_timestamp": {"query": "<header timestamp substitution>"}}},
{ "match": { "body": {"query": "<body value>"}}}
]
}
}
""")
List<Result> findEntry(Header header, String body);
Where:
public class Result {
int version;
String timestamp;
String body;
}
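Repository @Query strings in Spring Data Elasticsearch bind method arguments by position (?0, ?1, ...). One possible workaround is to move this method into a custom repository implementation, build the query string from the Header yourself, and run it through ElasticsearchOperations with a StringQuery. This is an assumption on my part, not something the question confirms works for this setup. The sketch below covers only the string-building part, in plain Java. The class, record and method names are hypothetical, and esc is a minimal escaper for quotes and backslashes, not a full JSON encoder.

```java
import java.util.Objects;

public class HeaderQuerySketch {

    // Mirrors the question's Header class.
    record Header(int version, String timestamp) {}

    // Escapes backslashes and double quotes so values cannot break the JSON string.
    static String esc(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Builds the bool query from the question with the Header members filled in.
    static String buildQuery(Header header, String body) {
        Objects.requireNonNull(header, "header");
        return """
            {"bool":{"must":[
              {"match":{"header_version":{"query":"%d"}}},
              {"match":{"header_timestamp":{"query":"%s"}}},
              {"match":{"body":{"query":"%s"}}}
            ]}}""".formatted(header.version(), esc(header.timestamp()), esc(body));
    }

    public static void main(String[] args) {
        System.out.println(buildQuery(new Header(2, "2022-07-27T09:18:27Z"), "some body text"));
    }
}
```

Grouping the values in one object keeps the method signature short. The cost is that the query is no longer declared in an annotation.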

Find all embedded documents from manual reference in MongoDB

I use MongoDB and Spring Boot in a project. I used manual references to point to another collection. My structure is as follows:
Reel collection
{
_id : "reel_id_1",
name: "reel 1",
category :[
{
_id : "category_id_1",
name: "category 1",
videos: ["video_id_1","video_id_2"]
}
]
}
Video collection
{
_id: "video_id_1", // first document
name: "mongo"
}
{
_id: "video_id_2", // second document
name: "java"
}
The Java classes are:
@Document
@Data
public class Reel {
@Id
private ObjectId _id;
private String name;
List<Category> category;
}
@Data
public class Category {
@Id
private ObjectId _id=new ObjectId();
private String name;
Video videos;
}
@Document
@Data
public class Video {
@Id
private ObjectId _id = new ObjectId();
private String name;
}
I tried to join both documents via mongoTemplate:
public List<Reel> findById(ObjectId _id) {
LookupOperation lookupOperation = LookupOperation.newLookup()
.from("video")
.localField("category.videos")
.foreignField("_id")
.as("category.videos");
UnwindOperation unwindOperation = Aggregation.unwind("category");
Aggregation agg = newAggregation(unwindOperation,match(Criteria.where("_id").is(_id)),lookupOperation);
Aggregation aggregation = newAggregation(lookupOperation);
List<Reel> results = mongoTemplate.aggregate(aggregation, "reel", Reel.class).getMappedResults();
return results;
}
But it throws an error:
Failed to instantiate java.util.List using constructor NO_CONSTRUCTOR with arguments
Since I use unwind, I created a new entity UnwindReel with Category category instead of List<Category> category, and used:
List<UnwindReel> results = mongoTemplate.aggregate(aggregation, "reel", UnwindReel.class).getMappedResults();
It joins only the first video object (video_id_1). How can I get all the objects in the videos array? Is there a method to fetch them?
The JSON stored in your database has the wrong structure: your Reel class expects a list of Category, but in the database it is stored as a nested object.
You need to add this stage right after the $lookup:
{
"$addFields": {
"category": {
"$map": {
"input": "$category.videos",
"in": {
"videos": "$$this"
}
}
}
}
}
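The two stages can be modeled in plain Java to show why all videos end up joined. A $lookup against an array localField (category.videos) matches every foreign document whose _id appears anywhere in that array, not just the first one. The $map in $addFields then turns the joined array into one {videos: ...} entry per video, which fits the single `Video videos` field in Category. This is an in-memory model, not driver code. The class and method names are hypothetical, and it does not try to reproduce MongoDB's ordering of the joined array.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class LookupMapSketch {

    // Models $lookup with an array localField: every foreign document whose _id
    // appears anywhere in the local array is joined.
    static List<Map<String, String>> lookup(List<String> localIds, List<Map<String, String>> foreign) {
        Set<String> wanted = new HashSet<>(localIds);
        List<Map<String, String>> joined = new ArrayList<>();
        for (Map<String, String> doc : foreign) {
            if (wanted.contains(doc.get("_id"))) {
                joined.add(doc);
            }
        }
        return joined;
    }

    // Models {$map: {input: "$category.videos", in: {videos: "$$this"}}}:
    // wraps each joined document in its own {"videos": doc} object.
    static List<Map<String, Object>> mapToCategories(List<Map<String, String>> joined) {
        List<Map<String, Object>> categories = new ArrayList<>();
        for (Map<String, String> doc : joined) {
            categories.add(Map.of("videos", doc));
        }
        return categories;
    }

    public static void main(String[] args) {
        List<Map<String, String>> videos = List.of(
                Map.of("_id", "video_id_1", "name", "mongo"),
                Map.of("_id", "video_id_2", "name", "java"));
        List<String> categoryVideoIds = List.of("video_id_1", "video_id_2");
        System.out.println(mapToCategories(lookup(categoryVideoIds, videos)).size()); // 2
    }
}
```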
Java code
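import java.util.List;
import org.bson.Document;
import org.springframework.data.mongodb.core.aggregation.Aggregation;
import org.springframework.data.mongodb.core.aggregation.AggregationOperation;
import org.springframework.data.mongodb.core.aggregation.AggregationOperationContext;
import org.springframework.data.mongodb.core.aggregation.AggregationOptions;
import org.springframework.data.mongodb.core.query.Criteria;

public List<Reel> findById(String _id) {
    Aggregation aggregation = Aggregation.newAggregation(
            Aggregation.match(Criteria.where("_id").is(_id)),
            // Join every id in category.videos against the video collection
            Aggregation.lookup(mongoTemplate.getCollectionName(Video.class), "category.videos", "_id", "category.videos"),
            // Custom $addFields stage: one {videos: <video>} category entry per joined video
            new AggregationOperation() {
                @Override
                public Document toDocument(AggregationOperationContext context) {
                    return new Document("$addFields",
                            new Document("category", new Document("$map", new Document("input", "$category.videos")
                                    .append("in", new Document("videos", "$$this")))));
                }
            })
            .withOptions(AggregationOptions.builder().allowDiskUse(Boolean.TRUE).build());
    // Log the pipeline before running it
    LOG.debug(
            aggregation.toString().replaceAll("__collection__", mongoTemplate.getCollectionName(Reel.class)));
    return mongoTemplate.aggregate(aggregation, mongoTemplate.getCollectionName(Reel.class), Reel.class)
            .getMappedResults();
}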
Recommendations
Do not hard-code the collection name; use the mongoTemplate.getCollectionName method instead.
Always log the aggregation pipeline before executing it; it helps with debugging.
If your collection will grow in the future, use the MongoDB {allowDiskUse: true} aggregation option.

Hibernate search and elasticsearch: mapper_parsing_exception + analyzer [...] not found for field [...]

I'm using Hibernate Search to automatically create indexes for a specific entity:
@Entity
@Indexed
public class Entity extends BaseEntity {
private static final long serialVersionUID = -6465422564073923433L;
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
private Long id;
@OneToMany(fetch = FetchType.LAZY, cascade = CascadeType.ALL, orphanRemoval = true)
@Field(bridge = @FieldBridge(impl = PropertyFieldBridge.class))
private List<PropertyValue> properties = new ArrayList<>(); // PropertyValue is an abstract class
}
The field bridge creates string fields whose names follow the pattern pt_[a-zA-Z0-9]+_i18n.
After that, I create a dynamic template to handle the translated fields:
PUT {{elasticsearch}}/com.orm.entity.entity.entity/com.orm.entity.entity.Entity/_mapping
{
"com.gamila.api.orm.entity.entity.Entity": {
"dynamic_templates": [
{
"my_analyzer": {
"match_mapping_type": "string",
"match_pattern": "regex",
"match": "^pt_[a-zA-Z0-9]+_i18n",
"mapping": {
"type": "text",
"analyzer": "portugueseAnalyzer"
}
}
}
]
}
}
But when I create an entity, it always returns an error:
Response: 400 'Bad Request' with body
{
"error": {
"root_cause": [
{
"type": "mapper_parsing_exception",
"reason": "analyzer [portugueseAnalyzer] not found for field [pt_name_i18n]"
}
],
"type": "mapper_parsing_exception",
"reason": "analyzer [portugueseAnalyzer] not found for field [pt_name_i18n]"
},
"status": 400
}
The portugueseAnalyzer is defined through:
public class Analyzer implements ElasticsearchAnalysisDefinitionProvider {
@Override
public void register(ElasticsearchAnalysisDefinitionRegistryBuilder builder) {
builder.analyzer("portugueseAnalyzer")
.withTokenizer("standard")
.withTokenFilters("lowercase", "portugueseStemmer", "portugueseStop", "edge_ngram_3");
builder.tokenFilter("portugueseStemmer")
.type("stemmer").param("language", "portuguese");
builder.tokenFilter("portugueseStop")
.type("stop").param("stopwords", "_portuguese_");
}
}
Can somebody tell me what I'm doing wrong? I've already skimmed through some questions here on Stack Overflow without success.
PS: I'm using Elasticsearch 5.6 on AWS.
Thanks in advance
When Hibernate Search 5 pushes the mapping to Elasticsearch, it only includes definitions for analyzers that are actually used somewhere in its own mapping. In your case, the analyzer is only referenced from the dynamic template you add yourself. As far as Hibernate Search is concerned it is unused, so its definition is never sent.
As a workaround in Hibernate Search 5, you can declare a dummy field that uses your analyzer, but will never be populated. You can find an example of how to do that here.
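The error message itself supports this diagnosis. Presumably it only appears because the dynamic template did match pt_name_i18n and then referenced an analyzer that was never defined in the index. The regex is therefore not the problem. A quick check of the template's pattern with java.util.regex follows; treating it as equivalent to Elasticsearch's regex match_pattern is an approximation, and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class TemplatePatternCheck {

    // The "match" value from the dynamic template in the question.
    static final Pattern TEMPLATE = Pattern.compile("^pt_[a-zA-Z0-9]+_i18n");

    // Full-string match, approximating how a regex dynamic template is applied to a field name.
    static boolean templateApplies(String fieldName) {
        return TEMPLATE.matcher(fieldName).matches();
    }

    public static void main(String[] args) {
        for (String f : new String[] {"pt_name_i18n", "pt_name", "en_name_i18n"}) {
            System.out.println(f + " -> " + templateApplies(f));
        }
    }
}
```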

How to query nested objects from MongoDB using Spring Boot (REST) and TextQuery?

I am writing a RESTful API that searches a MongoDB collection named 'global' by criteria using TextQuery. My problem is that I cannot access nested object fields in the query.
For example, this works:
GET localhost:8080/search?criteria=name:'name'
And this does not:
GET localhost:8080/search?criteria=other.othername:'Other Name'
My MongoDB JSON structure (imported from JSON into the 'global' collection as whole nested objects) is:
[{
"name": "Name",
"desc": "Desc",
"other": {
"othername": "Other Name"
}
},
{
"name": "Name",
"desc": "Desc",
"other": {
"othername": "Other Name"
}
}
]
And classes (with getters & setters & etc):
@Document(collection="global")
public class Global{
@TextIndexed
String name;
@TextIndexed
String desc;
Other other;
...
}
public class Other{
String othername;
...
}
My controller has this method:
@GetMapping("/search")
public Iterable<Global> getByCriteria(@RequestParam("criteria") String criteria) {
...
}
And I am trying to write the text search with:
public Iterable<Global> findByCriteria(String criteria) {
    // Local variable named textCriteria so it does not clash with the criteria parameter
    TextCriteria textCriteria = TextCriteria.forDefaultLanguage().matching(criteria);
    TextQuery query = TextQuery.queryText(textCriteria);
    return mongoTemplate.find(query, Global.class);
}
You need to add @TextIndexed to your other field:
public class Global{
@TextIndexed
String name;
@TextIndexed
String desc;
@TextIndexed
Other other;
...
}
Note: with this, all fields of the nested object are searchable.
Alternatively, you can add @TextIndexed to each field of the nested object:
public class Other {
@TextIndexed
String othername;
...
}
