Elasticsearch search templates - How to construct the search terms in NEST

I have a search template to which I am trying to pass a couple of parameters.
How can I construct my search terms using NEST to get the following result?
Template
PUT _scripts/company-index-template
{
"script": {
"lang": "mustache",
"source": "{\"query\": {\"bool\": {\"filter\":{{#toJson}}clauses{{/toJson}},\"must\": [{\"query_string\": {\"fields\": [\"companyTradingName^2\",\"companyName\",\"companyContactPerson\"],\"query\": \"{{query}}\"}}]}}}",
"params": {
"query": "",
"clauses": []
}
}
}
The DSL query looks as follows:
GET company-index/_search/template
{
"id": "company-index-template",
"params": {
"query": "sky*",
"clauses": [
{
"terms": {
"companyGroupId": [
1595
]
}
},
{
"terms": {
"companyId": [
158,
836,
1525,
2298,
2367,
3176,
3280
]
}
}
]
}
}
I would like to construct the above query in NEST but can't seem to find a good way to generate the clauses value.
This is what I have so far...
var responses = this.client.SearchTemplate<Company>(descriptor =>
descriptor
.Index(SearchConstants.CompanyIndex)
.Id("company-index-template")
.Params(objects => objects
.Add("query", queryBuilder.Query)
.Add("clauses", "*How do I construct this JSON*")));
UPDATE:
This is how I ended up doing it. I just created a list of dictionaries with all my terms in it.
I do think there might be a better way of doing it, but I can't find it.
var filterClauses = new List<Dictionary<string, object>>
{
new() {{"terms", new Dictionary<string, object> {{"companyGroupId", companyGroupId}}}},
new() {{"terms", new Dictionary<string, object> {{"companyId", availableCompanies}}}}
};
And then I had to serialize it when I passed it to the Params method.
var response = this.client.SearchTemplate<Company>(descriptor =>
descriptor.Index(SearchConstants.CompanyIndex)
.Id("company-index-template")
.Params(objects => objects
.Add("query", "*" + query + "*")
.Add("clauses", JsonConvert.SerializeObject(filterClauses))));
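The shape of the clauses parameter is easier to see when the request body is built as plain data. The following is a rough sketch in Python rather than NEST; the helper name build_template_request and the shortened ID list are illustrative assumptions, not part of the original code. It produces a body with the same structure as the GET company-index/_search/template request shown earlier.

```python
import json


def build_template_request(query, terms_by_field):
    """Build the body for a call to the stored search template.

    terms_by_field maps a field name to the list of allowed values;
    each entry becomes one `terms` clause inside the `clauses` parameter.
    (Hypothetical helper, named here only for illustration.)
    """
    clauses = [
        {"terms": {field: list(values)}}
        for field, values in terms_by_field.items()
    ]
    return {
        "id": "company-index-template",
        "params": {"query": query, "clauses": clauses},
    }


body = build_template_request(
    "sky*",
    {"companyGroupId": [1595], "companyId": [158, 836, 1525]},
)
print(json.dumps(body, indent=2))
```

Each filter is one small map nested under "terms", which is the same structure the dictionary-based C# list expresses.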

Related

How to query elastic search with Hashmap

I would like to query the Elastic Search with map of values and retrieve the documents.
Example:
I have indexed the below two documents
1. {
"timestamp": 1601498048,
"props": {
"cp1": "cv1",
"cp2": "cv2"
}
}
2. {
"timestamp": 1601498098,
"props": {
"key1": "v1",
"key2": "v2"
}
}
So, I want to query with the entire props map
"props"
{
"cp1": "cv1",
"cp2": "cv2"
}
and return documents only when all of the map values match. In this case the result would be only the first document, since it matches the given props.
I am able to query with a single map value like below, but I need to search for the entire map.
curl -X GET "localhost:9200/_search?pretty" -H 'Content-Type: application/json' -d'
{
"query": {
"bool" : {
"must" : [
{
"terms" : {
"customProperties.cp1.keyword" : [ "cv1" ]
}
}
]
}
}
}
'
So how do we query for the entire props map and return documents only if all of its key-value pairs match?
Update
Mainly, I need a QueryBuilder that searches with a map of values. I can do it for a set of values like this:
val sampleSet = setOf("foo", "bar")
val query = NativeSearchQueryBuilder()
.withQuery(
QueryBuilders.termsQuery(
"identifiers.endpointId.keyword", sampleSet)
)
.build()
I need a QueryBuilder that searches the ES index with a map of values and returns a document only if all of the map values match.
Suggestions, please.
You need two match clauses, one per key:
{
"query": {
"bool": {
"must": [
{
"match": {
"props.cp1": "cv1"
}
},
{
"match": {
"props.cp2": "cv2"
}
}
]
}
}
}
Or use term queries on the keyword sub-fields:
{
"query": {
"bool": {
"must": [
{
"term": {
"props.cp1.keyword": "cv1"
}
},
{
"term": {
"props.cp2.keyword": "cv2"
}
}
]
}
}
}
This worked. I looped over the props map and added one must clause per entry to the bool query builder.
val builder = QueryBuilders.boolQuery()
for (prop in props) {
builder.must(QueryBuilders.matchQuery("customProperties.${prop.key}", prop.value))
}
val query = NativeSearchQueryBuilder().withQuery(builder).build()
println("results: ${queryForList(query)}")
The query is passed to this function:
internal fun queryForList(query: NativeSearchQuery): List<DocumentType> {
val resp = searchOperations.search(query, type, IndexCoordinates.of(indexName))
return resp.searchHits.map { it.content }
}
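The same loop can be sketched against the raw query DSL. This is a minimal Python illustration, not the Spring Data API; the function name build_all_props_query and the default "props" prefix are assumptions chosen to mirror the sample documents.

```python
def build_all_props_query(props, prefix="props"):
    """Return a bool query with one `match` clause per key-value pair.

    Every clause sits under `must`, so a document is returned only if
    all of the given pairs match. (Hypothetical helper for illustration.)
    """
    must = [
        {"match": {f"{prefix}.{key}": value}}
        for key, value in props.items()
    ]
    return {"query": {"bool": {"must": must}}}


query = build_all_props_query({"cp1": "cv1", "cp2": "cv2"})
print(query)
```

Note that this only requires the listed pairs to match; a document that carries additional keys under props alongside the listed ones would still be returned.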

Elasticsearch Java - use Search Template to query

I have the code below working fine in my Java service.
Query searchQuery = new StringQuery(
"{\"bool\":{\"must\":[{\"match\":{\"id\":\"" + id + "\"}}]}}");
SearchHits<Instance> instanceSearchHits = elasticsearchOperations.search(searchQuery, Instance.class, IndexCoordinates.of("test"));
log.info("hits :: " + instanceSearchHits.getSearchHits().size());
Now I want to save this query as a template in Elasticsearch and just pass the params and the template ID from the Java service to execute the query.
Search template added in Elasticsearch:
PUT _scripts/search-template-1
{
"script": {
"lang": "mustache",
"source": {
"query": {
"bool": {
"must": [
{
"term": {
"id": "{{id}}"
}
}
]
}
}
},
"params": {
"id": "id to search"
}
}
}
Calling this template:
GET test/_search/template
{
"id": "search-template-1",
"params": {
"id": "f52c2c62-e921-4410-847f-25ea0f3eeb40"
}
}
Unfortunately, I am not able to find an API reference for calling this search template from Java (spring-data-elasticsearch).
As mentioned by val, this can be used to call the search template query from Java:
SearchTemplateRequest request = new SearchTemplateRequest();
request.setRequest(new SearchRequest("posts"));
request.setScriptType(ScriptType.STORED);
request.setScript("title_search");
Map<String, Object> params = new HashMap<>();
params.put("field", "title");
params.put("value", "elasticsearch");
params.put("size", 5);
request.setScriptParams(params);
SearchTemplateResponse response = client.searchTemplate(request, RequestOptions.DEFAULT);
SearchResponse searchResponse = response.getResponse();
SearchHits searchHits = searchResponse.getHits();
log.info("max score :: " + searchHits.getMaxScore());
searchHits.forEach(searchHit -> {
log.info("this is the response, " + searchHit.getSourceAsString());
});
This is currently not yet possible. There is an issue for this in Spring Data Elasticsearch.

Elasticsearch merging documents in response

I have data in 3 indexes. I want to generate an invoice report using information from the other indexes. For example, the following are sample documents from each index:
Users index
{
"_id": "userId1",
"name": "John"
}
Invoice index
{
"_id": "invoiceId1",
"userId": "userId1",
"cost": "10000",
"startdate": "",
"enddate": ""
}
Orders index
{
"_id": "orderId1",
"userId": "userId1",
"productName": "Mobile"
}
I want to generate an invoice report by combining information from these three indexes as follows:
{
"_id": "invoiceId1",
"userName": "John",
"productName": "Mobile",
"cost": "10000",
"startdate": "",
"enddate": ""
}
How do I write an Elasticsearch query that returns a response combining information from documents in other indexes?
You cannot do query-time joins in Elasticsearch and will need to denormalize your data in order to efficiently retrieve and group it.
Having said that, you could:
- leverage the multi-target syntax and query multiple indices at once
- use an OR query on the id and userId, since at least one of them appears in each of your docs
- and then trivially join your data through a map/reduce tool called scripted metric aggregations
Quick side note: you won't be able to use the _id keyword inside your docs because it's reserved.
Assuming your docs and indices are structured as follows:
POST users_index/_doc
{"id":"userId1","name":"John"}
POST invoices_index/_doc
{"id":"invoiceId1","userId":"userId1","cost":"10000","startdate":"","enddate":""}
POST orders_index/_doc
{"id":"orderId1","userId":"userId1","productName":"Mobile"}
Here's what the scripted metric aggregation could look like:
POST users_index,invoices_index,orders_index/_search
{
"size": 0,
"query": {
"bool": {
"should": [
{
"term": {
"id.keyword": {
"value": "userId1"
}
}
},
{
"term": {
"userId.keyword": {
"value": "userId1"
}
}
}
]
}
},
"aggs": {
"group_by_invoiceId": {
"scripted_metric": {
"init_script": "state.users = []; state.invoices = []; state.orders = []",
"map_script": """
def source = params._source;
if (source.containsKey("name")) {
// we're dealing with the users index
state.users.add(source);
} else if (source.containsKey("cost")) {
// we're dealing with the invoices index
state.invoices.add(source);
} else if (source.containsKey("productName")) {
// we're dealing with the orders index
state.orders.add(source);
}
""",
"combine_script": """
def non_empty_state = [:];
for (entry in state.entrySet()) {
if (entry != null && entry.getValue().length > 0) {
non_empty_state[entry.getKey()] = entry.getValue();
}
}
return non_empty_state;
""",
"reduce_script": """
def final_invoices = [];
def all_users = [];
def all_invoices = [];
def all_orders = [];
// flatten all resources
for (state in states) {
for (kind_entry in state.entrySet()) {
def map_kind = kind_entry.getKey();
if (map_kind == "users") {
all_users.addAll(kind_entry.getValue());
} else if (map_kind == "invoices") {
all_invoices.addAll(kind_entry.getValue());
} else if (map_kind == "orders") {
all_orders.addAll(kind_entry.getValue());
}
}
}
// iterate the invoices and enrich them
for (invoice_entry in all_invoices) {
def invoiceId = invoice_entry.id;
def userId = invoice_entry.userId;
def userName = all_users.stream().filter(u -> u.id == userId).findFirst().get().name;
def productName = all_orders.stream().filter(o -> o.userId == userId).findFirst().get().productName;
def cost = invoice_entry.cost;
def startdate = invoice_entry.startdate;
def enddate = invoice_entry.enddate;
final_invoices.add([
'id': invoiceId,
'userName': userName,
'productName': productName,
'cost': cost,
'startdate': startdate,
'enddate': enddate
]);
}
return final_invoices;
"""
}
}
}
}
which would return
{
...
"aggregations" : {
"group_by_invoiceId" : {
"value" : [
{
"cost" : "10000",
"enddate" : "",
"id" : "invoiceId1",
"userName" : "John",
"startdate" : "",
"productName" : "Mobile"
}
]
}
}
}
Summing up, there are workarounds to achieve query-time joins. At the same time, scripts like this shouldn't be used in production because they can take a very long time to run.
Instead, this aggregation should be emulated outside of Elasticsearch after the query resolves and returns the index-specific hits.
BTW, I set size: 0 to return just the aggregation results, so increase this parameter if you want to get some actual hits.
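Emulating the join outside of Elasticsearch could look roughly like the sketch below. It assumes the hits have already been fetched and split into three lists of _source dictionaries shaped like the sample documents; the function name build_invoice_report is hypothetical. Like the reduce_script, it takes the first order found for each user, but it yields None instead of failing when a user or order is missing.

```python
def build_invoice_report(users, invoices, orders):
    """Join invoices with user names and product names on the client side.

    All three arguments are lists of plain dicts (the `_source` of each hit).
    (Hypothetical helper; field names follow the sample documents.)
    """
    # Index users by id for constant-time lookups.
    user_names = {user["id"]: user["name"] for user in users}

    # Keep only the first order seen per user, as the reduce_script does.
    product_by_user = {}
    for order in orders:
        product_by_user.setdefault(order["userId"], order["productName"])

    report = []
    for invoice in invoices:
        user_id = invoice["userId"]
        report.append({
            "id": invoice["id"],
            # .get returns None when no matching user or order exists.
            "userName": user_names.get(user_id),
            "productName": product_by_user.get(user_id),
            "cost": invoice["cost"],
            "startdate": invoice["startdate"],
            "enddate": invoice["enddate"],
        })
    return report


users = [{"id": "userId1", "name": "John"}]
invoices = [{"id": "invoiceId1", "userId": "userId1", "cost": "10000",
             "startdate": "", "enddate": ""}]
orders = [{"id": "orderId1", "userId": "userId1", "productName": "Mobile"}]

report = build_invoice_report(users, invoices, orders)
print(report)
```

Building the two lookup dictionaries first avoids rescanning the user and order lists for every invoice.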

Painless scripting initialize new array

I'm trying to add or update a nested object in Elasticsearch using a script. The script below works fine if integrations is already an array, but when it is null, it throws a null pointer exception. How do I initialize ctx._source.integrations to be an empty array if its value is null? (Something like the equivalent of JavaScript's myObject.integrations = myObject.integrations ?? [])
POST /products/_update/VFfrnQrKlC5bwdfdeaQ7
{
"script": {
"source": """
ctx._source.integrations.removeIf(i -> i.id == params.integration.id);
ctx._source.integrations.add(params.integration);
ctx._source.integrationCount = ctx._source.integrations.length;
""",
"params": {
"integration": {
"id": "dVTV8GjHj8pXFnlYUUlI",
"from": true,
"to": false,
"vendor": "sfeZWDpZXlF5Qa8mUsiF",
"targetProduct": {
"id": "nyILhphvCrGYm53cfaOx",
"name": "Test Product",
"categoryIds": []
}
}
}
}
}
OK, I think this does the trick:
if (ctx._source.integrations == null) {
ctx._source.integrations = new ArrayList();
}
Is there a shorthand for this, like in the JS example?

Translate ElasticSearch query to Nest c#

I need some help creating an AggregationDictionary from the following Elasticsearch query:
GET organisations/_search
{
"size": 0,
"aggs": {
"by_country": {
"nested": {
"path": "country"
},
"aggs": {
"by_country2": {
"filter": {
"bool": {
"must": [
{
"term": {
"country.isDisplayed": "true"
}
}
]
}
},
"aggs": {
"by_country3": {
"terms": {
"field": "country.displayName.keyword",
"size": 9999
}
}
}
}
}
}
}
}
I managed to write this horrible piece of code, which I am pretty sure is wrong; I am totally new to this.
AggregationDictionary aggs = new AggregationDictionary()
{
{
"countries_step1",
new NestedAggregation("countries_step1")
{
Path = "country",
Aggregations = new AggregationDictionary()
{
{
"countries_step2",
new FilterAggregation("countries_step2")
{
Filter = new BoolQuery
{
Must = new QueryContainer[] {
new NestedQuery
{
Query = new TermQuery
{
Field = "country.isDisplayed",
Value = true
}
}
}
},
Aggregations = new AggregationDictionary
{
{
"countries_step3",
new TermsAggregation("countries_step3")
{
Field = "country.displayName.keyword",
Size = 9999
}
}
}
}
}
}
}
}
};
Can someone tell me if I am heading in the right direction? I am using NEST 6.6.0. Is there any tool that helps with these translations?
What you have so far is pretty solid, but when you try to execute this aggregation with the following call
var searchAsync = await client.SearchAsync<Document>(s => s.Size(0).Aggregations(aggs));
you will get this error
{
"error" : {
"root_cause" : [
{
"type" : "illegal_argument_exception",
"reason" : "query malformed, empty clause found at [14:22]"
}
],
"type" : "illegal_argument_exception",
"reason" : "query malformed, empty clause found at [14:22]"
},
"status" : 400
}
Checking the request that was sent to Elasticsearch tells us why this happened:
{
"aggs": {
"countries_step1": {
"aggs": {
"countries_step2": {
"aggs": {
"countries_step3": {
"terms": {
"field": "country.displayName.keyword",
"size": 9999
}
}
},
"filter": {}
}
},
"nested": {
"path": "country"
}
}
},
"size": 0
}
The filter clause is empty because you tried to use a nested query but didn't pass the path parameter. We don't need a nested query here (as your example query shows), so we can simplify the whole query to
var aggs = new AggregationDictionary()
{
{
"countries_step1",
new NestedAggregation("countries_step1")
{
Path = "country",
Aggregations = new AggregationDictionary()
{
{
"countries_step2",
new FilterAggregation("countries_step2")
{
Filter = new BoolQuery
{
Must = new QueryContainer[]
{
new TermQuery
{
Field = "country.isDisplayed",
Value = true
}
}
},
Aggregations = new AggregationDictionary
{
{
"countries_step3",
new TermsAggregation("countries_step3")
{
Field = "country.displayName.keyword",
Size = 9999
}
}
}
}
}
}
}
}
};
Now we have a valid request sent to Elasticsearch.
There are a couple of things we can improve here:
1. Remove unnecessary bool query
Filter = new BoolQuery
{
Must = new QueryContainer[]
{
new TermQuery
{
Field = "country.isDisplayed",
Value = true
}
}
},
to
Filter =
new TermQuery
{
Field = "country.isDisplayed",
Value = true
},
2. Replace string field names
Usually, when making calls from .NET, there is a POCO type that helps us write strongly typed requests to Elasticsearch, which keeps the code clean and easier to refactor. With this, we can change the field definition from
"country.displayName.keyword"
to
Infer.Field<Document>(f => f.Country.FirstOrDefault().DisplayName.Suffix("keyword"))
My type definitions:
public class Document
{
public int Id { get; set; }
[Nested]
public List<Country> Country { get; set; }
}
public class Country
{
public bool IsDisplayed { get; set; }
public string DisplayName { get; set; }
}
3. Consider using a fluent syntax
With NEST you can write queries in two ways: using object initializer syntax (which you did) or using the fluent syntax. Writing the above query with the fluent syntax gives something like
var searchResponse = await client.SearchAsync<Document>(s => s
.Size(0)
.Aggregations(a => a.Nested("by_country", n => n
.Path(p => p.Country)
.Aggregations(aa => aa
.Filter("by_country2", f => f
.Filter(q => q
.Term(t => t
.Field(field => field.Country.FirstOrDefault().IsDisplayed)
.Value(true)))
.Aggregations(aaa => aaa
.Terms("by_country3", t => t
.Field(field => field.Country.FirstOrDefault().DisplayName.Suffix("keyword"))
.Size(9999)
)))))));
which I find a little easier to follow and write; maybe it will work better for you as well.
As a final note, have a look at the docs and check how you can debug your queries.
Hope that helps.
