How to query Elasticsearch with a HashMap - elasticsearch

I would like to query Elasticsearch with a map of values and retrieve the matching documents.
Example:
I have indexed the following two documents:
1. {
"timestamp": 1601498048,
"props": {
"cp1": "cv1",
"cp2": "cv2"
}
}
2. {
"timestamp": 1601498098,
"props": {
"key1": "v1",
"key2": "v2"
}
}
I want to query with the entire props map
"props": {
"cp1": "cv1",
"cp2": "cv2"
}
and return only the documents whose props match the entire map. In this case the result would be only the first document, since it matches the given props.
I am able to query with a single map value as shown below, but I need to search for the entire map.
curl -X GET "localhost:9200/_search?pretty" -H 'Content-Type: application/json' -d'
{
"query": {
"bool" : {
"must" : [
{
"terms" : {
"customProperties.cp1.keyword" : [ "cv1" ]
}
}
]
}
}
}
'
So how do I query for the entire props map and return documents only if all key-value pairs match?
Update
Mainly, I need a QueryBuilder that searches with a map of values. I can do it for a set of values like this:
val sampleSet = setOf("foo", "bar")
val query = NativeSearchQueryBuilder()
.withQuery(
QueryBuilders.termsQuery(
"identifiers.endpointId.keyword", sampleSet)
)
.build()
I need a QueryBuilder that searches the ES index with a map of values and returns a document only if all map entries match.
Any suggestions, please?

You need one match clause per key, combined in a bool must:
{
"query": {
"bool": {
"must": [
{
"match": {
"props.cp1": "cv1"
}
},
{
"match": {
"props.cp2": "cv2"
}
}
]
}
}
}
Or use term queries on the keyword sub-fields:
{
"query": {
"bool": {
"must": [
{
"term": {
"props.cp1.keyword": "cv1"
}
},
{
"term": {
"props.cp2.keyword": "cv2"
}
}
]
}
}
}

This worked. I just looped over the props map and added a must clause to the query builder for each entry.
val builder = QueryBuilders.boolQuery()
for (prop in props) {
builder.must(QueryBuilders.matchQuery("customProperties.${prop.key}", prop.value))
}
val query = NativeSearchQueryBuilder().withQuery(builder).build()
println("results: ${queryForList(query)}")
I passed the query to this function:
internal fun queryForList(query: NativeSearchQuery): List<DocumentType> {
val resp = searchOperations.search(query, type, IndexCoordinates.of(indexName))
return resp.searchHits.map { it.content }
}
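The same loop can be sketched outside of Spring Data as plain request-body construction. This is a minimal Python sketch, not the poster's code: the function name `build_map_query` and the `prefix` parameter are hypothetical, and it assumes the map values are indexed with a `.keyword` sub-field as in the term example above.

```python
from typing import Any, Dict, Mapping


def build_map_query(props: Mapping[str, Any], prefix: str = "props") -> Dict[str, Any]:
    """Build a bool/must query with one term clause per map entry.

    `prefix` is the object field that holds the map (assumed to be "props").
    """
    must = [
        # One exact-match clause per key-value pair; all of them must match.
        {"term": {f"{prefix}.{key}.keyword": value}}
        for key, value in props.items()
    ]
    return {"query": {"bool": {"must": must}}}


if __name__ == "__main__":
    import json

    print(json.dumps(build_map_query({"cp1": "cv1", "cp2": "cv2"}), indent=2))
```

Note that a query built this way matches every document that contains at least the given pairs; a document with additional keys in props would also match.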

Related

Elasticsearch search templates - How to construct the search terms in NEST

I have a search template to which I am trying to pass a couple of parameters.
How can I construct my search terms using NEST to get the following result?
Template
PUT _scripts/company-index-template
{
"script": {
"lang": "mustache",
"source": "{\"query\": {\"bool\": {\"filter\":{{#toJson}}clauses{{/toJson}},\"must\": [{\"query_string\": {\"fields\": [\"companyTradingName^2\",\"companyName\",\"companyContactPerson\"],\"query\": \"{{query}}\"}}]}}}",
"params": {
"query": "",
"clauses": []
}
}
}
The DSL query looks as follows:
GET company-index/_search/template
{
"id": "company-index-template",
"params": {
"query": "sky*",
"clauses": [
{
"terms": {
"companyGroupId": [
1595
]
}
},
{
"terms": {
"companyId": [
158,
836,
1525,
2298,
2367,
3176,
3280
]
}
}
]
}
}
I would like to construct the above query in NEST but can't seem to find a good way to generate the clauses value.
This is what I have so far...
var responses = this.client.SearchTemplate<Company>(descriptor =>
descriptor
.Index(SearchConstants.CompanyIndex)
.Id("company-index-template")
.Params(objects => objects
.Add("query", queryBuilder.Query)
.Add("clauses", "*How do I construct this JSON*")));
UPDATE:
This is how I ended up doing it. I just created a list of dictionaries with all my terms in it.
I think there might be a better way of doing it, but I can't find it.
var filterClauses = new List<Dictionary<string, object>>
{
new() {{"terms", new Dictionary<string, object> {{"companyGroupId", companyGroupId}}}},
new() {{"terms", new Dictionary<string, object> {{"companyId", availableCompanies}}}}
};
And then I had to serialize it when I passed it to the Params method.
var response = this.client.SearchTemplate<Company>(descriptor =>
descriptor.Index(SearchConstants.CompanyIndex)
.Id("company-index-template")
.Params(objects => objects
.Add("query", "*" + query + "*")
.Add("clauses", JsonConvert.SerializeObject(filterClauses))));
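To make the target shape of the clauses parameter explicit, here is a language-neutral sketch in Python that builds the same template request as the DSL example above. It is not NEST code; the helper names `terms_clause` and `build_template_request` are hypothetical and only illustrate the data structure that the dictionaries in the update are reproducing.

```python
import json
from typing import Any, Dict, Sequence


def terms_clause(field: str, values: Sequence[Any]) -> Dict[str, Any]:
    """Return a single terms filter clause for one field."""
    return {"terms": {field: list(values)}}


def build_template_request(
    query: str, company_group_id: int, company_ids: Sequence[int]
) -> Dict[str, Any]:
    """Build the body for GET company-index/_search/template."""
    clauses = [
        terms_clause("companyGroupId", [company_group_id]),
        terms_clause("companyId", company_ids),
    ]
    # The template renders `clauses` through {{#toJson}}, so it is passed
    # as a real list of objects rather than as a pre-serialized string.
    return {
        "id": "company-index-template",
        "params": {"query": query, "clauses": clauses},
    }


if __name__ == "__main__":
    body = build_template_request("sky*", 1595, [158, 836, 1525])
    print(json.dumps(body, indent=2))
```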

Filter on elasticsearch field if exists, otherwise ignore the filter

I'm trying to query two different indices in the same Elasticsearch query,
and I would like to filter on a field that does not exist in both indices:
First index:
label: string,
value: string,
isShowable: boolean
Second index:
label: string,
value: string
I want to retrieve all items from the second index, and get only showable items from the first one.
I'm using Elasticsearch DSL in PHP.
Here is what I tried:
Item::search($formattedRequest, function (Client $client, Search $body) {
$dispensationQuery = new TermQuery("isShowable", true);
$bool = new BoolQuery();
$bool->add($dispensationQuery, BoolQuery::SHOULD);
$body->addQuery($bool);
return $client->search(
['index' => firstIndex . "," . secondIndex, 'body' => $body->toArray()],
);
})->get();
But it filters all my items and doesn't retrieve any values from the second index.
How can I manage that?
You can use a should clause to combine a must_not exists check with a term check. It returns a document if either the field does not exist or its value is true:
GET index1,index2/_search
{
"query": {
"bool": {
"should": [
{
"bool": {
"must_not": [
{
"exists": {
"field": "isShowable"
}
}
]
}
},
{
"term": {
"isShowable": {
"value": true
}
}
}
]
}
}
}
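The "missing or equal" pattern above generalizes to any field. The following Python sketch builds the same body; the function names are hypothetical, and the explicit `minimum_should_match` is an addition of this sketch (not part of the answer) so that the clause keeps its meaning if it is later nested next to must or filter clauses.

```python
from typing import Any, Dict


def missing_or_equals(field: str, value: Any) -> Dict[str, Any]:
    """Match documents where `field` is absent OR equals `value`."""
    return {
        "bool": {
            "should": [
                # Branch 1: the field does not exist (e.g. the second index).
                {"bool": {"must_not": [{"exists": {"field": field}}]}},
                # Branch 2: the field exists and has the wanted value.
                {"term": {field: {"value": value}}},
            ],
            # Assumption of this sketch: require at least one branch explicitly.
            "minimum_should_match": 1,
        }
    }


def build_search_body(field: str, value: Any) -> Dict[str, Any]:
    """Wrap the clause in a search body usable against several indices."""
    return {"query": missing_or_equals(field, value)}


if __name__ == "__main__":
    import json

    print(json.dumps(build_search_body("isShowable", True), indent=2))
```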

Elasticsearch merging documents in response

I have data in 3 indexes. I want to generate an invoice report using information from the other indexes. For example, the following are sample documents from each index:
Users index
{
"_id": "userId1",
"name": "John"
}
Invoice index
{
"_id": "invoiceId1",
"userId": "userId1",
"cost": "10000",
"startdate": "",
"enddate": ""
}
Orders index
{
"_id": "orderId1",
"userId": "userId1",
"productName": "Mobile"
}
I want to generate an invoice report by combining information from these three indexes as follows:
{
"_id": "invoiceId1",
"userName": "John",
"productName": "Mobile",
"cost": "10000",
"startdate": "",
"enddate": ""
}
How do I write an Elasticsearch query that returns a response combining information from documents in other indexes?
You cannot do query-time joins in Elasticsearch and will need to denormalize your data in order to efficiently retrieve and group it.
Having said that, you could:
1. leverage the multi-target syntax and query multiple indices at once,
2. use an OR query on the id and userId, since either of those is referenced at least once in any of your docs,
3. and then trivially join your data through a map/reduce tool called scripted metric aggregations.
Quick side note: you won't be able to use the _id keyword inside your docs because it's reserved.
Assuming your docs and indices are structured as follows:
POST users_index/_doc
{"id":"userId1","name":"John"}
POST invoices_index/_doc
{"id":"invoiceId1","userId":"userId1","cost":"10000","startdate":"","enddate":""}
POST orders_index/_doc
{"id":"orderId1","userId":"userId1","productName":"Mobile"}
Here's what the scripted metric aggregation could look like:
POST users_index,invoices_index,orders_index/_search
{
"size": 0,
"query": {
"bool": {
"should": [
{
"term": {
"id.keyword": {
"value": "userId1"
}
}
},
{
"term": {
"userId.keyword": {
"value": "userId1"
}
}
}
]
}
},
"aggs": {
"group_by_invoiceId": {
"scripted_metric": {
"init_script": "state.users = []; state.invoices = []; state.orders = []",
"map_script": """
def source = params._source;
if (source.containsKey("name")) {
// we're dealing with the users index
state.users.add(source);
} else if (source.containsKey("cost")) {
// we're dealing with the invoices index
state.invoices.add(source);
} else if (source.containsKey("productName")) {
// we're dealing with the orders index
state.orders.add(source);
}
""",
"combine_script": """
def non_empty_state = [:];
for (entry in state.entrySet()) {
if (entry != null && entry.getValue().length > 0) {
non_empty_state[entry.getKey()] = entry.getValue();
}
}
return non_empty_state;
""",
"reduce_script": """
def final_invoices = [];
def all_users = [];
def all_invoices = [];
def all_orders = [];
// flatten all resources
for (state in states) {
for (kind_entry in state.entrySet()) {
def map_kind = kind_entry.getKey();
if (map_kind == "users") {
all_users.addAll(kind_entry.getValue());
} else if (map_kind == "invoices") {
all_invoices.addAll(kind_entry.getValue());
} else if (map_kind == "orders") {
all_orders.addAll(kind_entry.getValue());
}
}
}
// iterate the invoices and enrich them
for (invoice_entry in all_invoices) {
def invoiceId = invoice_entry.id;
def userId = invoice_entry.userId;
def userName = all_users.stream().filter(u -> u.id == userId).findFirst().get().name;
def productName = all_orders.stream().filter(o -> o.userId == userId).findFirst().get().productName;
def cost = invoice_entry.cost;
def startdate = invoice_entry.startdate;
def enddate = invoice_entry.enddate;
final_invoices.add([
'id': invoiceId,
'userName': userName,
'productName': productName,
'cost': cost,
'startdate': startdate,
'enddate': enddate
]);
}
return final_invoices;
"""
}
}
}
}
which would return
{
...
"aggregations" : {
"group_by_invoiceId" : {
"value" : [
{
"cost" : "10000",
"enddate" : "",
"id" : "invoiceId1",
"userName" : "John",
"startdate" : "",
"productName" : "Mobile"
}
]
}
}
}
Summing up, there are workarounds to achieve query-time joins. At the same time, scripts like this shouldn't be used in production because they could take forever.
Instead, this aggregation should be emulated outside of Elasticsearch after the query resolves and returns the index-specific hits.
BTW, I set size: 0 to return just the aggregation results, so increase this parameter if you want to get some actual hits.
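As a rough illustration of emulating the join outside of Elasticsearch, the Python sketch below takes the _source documents already fetched from the three indices and enriches each invoice. The function name `build_invoice_report` is hypothetical. Like the reduce_script, it takes the first order found for a user, but as a design choice of this sketch it yields None instead of failing when a user or order is missing.

```python
from typing import Any, Dict, Iterable, List


def build_invoice_report(
    users: Iterable[Dict[str, Any]],
    invoices: Iterable[Dict[str, Any]],
    orders: Iterable[Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Join invoices with user names and product names on userId."""
    users_by_id = {user["id"]: user for user in users}

    # Keep only the first order seen per user, as the reduce_script does.
    first_order_by_user: Dict[str, Dict[str, Any]] = {}
    for order in orders:
        first_order_by_user.setdefault(order["userId"], order)

    report = []
    for invoice in invoices:
        user = users_by_id.get(invoice["userId"], {})
        order = first_order_by_user.get(invoice["userId"], {})
        report.append(
            {
                "id": invoice["id"],
                "userName": user.get("name"),
                "productName": order.get("productName"),
                "cost": invoice["cost"],
                "startdate": invoice["startdate"],
                "enddate": invoice["enddate"],
            }
        )
    return report


if __name__ == "__main__":
    print(
        build_invoice_report(
            [{"id": "userId1", "name": "John"}],
            [{"id": "invoiceId1", "userId": "userId1", "cost": "10000",
              "startdate": "", "enddate": ""}],
            [{"id": "orderId1", "userId": "userId1", "productName": "Mobile"}],
        )
    )
```

The dictionary lookups make the join linear in the number of documents, whereas the stream filter inside the script scans all users and orders for every invoice.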

Translate ElasticSearch query to Nest c#

I need some help in creating an AggregationDictionary from the following elasticsearch query
GET organisations/_search
{
"size": 0,
"aggs": {
"by_country": {
"nested": {
"path": "country"
},
"aggs": {
"by_country2": {
"filter": {
"bool": {
"must": [
{
"term": {
"country.isDisplayed": "true"
}
}
]
}
},
"aggs": {
"by_country3": {
"terms": {
"field": "country.displayName.keyword",
"size": 9999
}
}
}
}
}
}
}
}
I managed to write this horrible piece of code, which I am pretty sure is wrong; I am totally new to this.
AggregationDictionary aggs = new AggregationDictionary()
{
{
"countries_step1",
new NestedAggregation("countries_step1")
{
Path = "country",
Aggregations = new AggregationDictionary()
{
{
"countries_step2",
new FilterAggregation("countries_step2")
{
Filter = new BoolQuery
{
Must = new QueryContainer[] {
new NestedQuery
{
Query = new TermQuery
{
Field = "country.isDisplayed",
Value = true
}
}
}
},
Aggregations = new AggregationDictionary
{
{
"countries_step3",
new TermsAggregation("countries_step3")
{
Field = "country.displayName.keyword",
Size = 9999
}
}
}
}
}
}
}
}
};
Can someone tell me if I am heading in the right direction? I am using NEST 6.6.0. Is there any tool that helps with these translations?
What you have so far is pretty solid, but when you try to execute this aggregation with the following call
var searchAsync = await client.SearchAsync<Document>(s => s.Size(0).Aggregations(aggs));
you will get this error
{
"error" : {
"root_cause" : [
{
"type" : "illegal_argument_exception",
"reason" : "query malformed, empty clause found at [14:22]"
}
],
"type" : "illegal_argument_exception",
"reason" : "query malformed, empty clause found at [14:22]"
},
"status" : 400
}
Checking the request that was sent to Elasticsearch tells us why it happened:
{
"aggs": {
"countries_step1": {
"aggs": {
"countries_step2": {
"aggs": {
"countries_step3": {
"terms": {
"field": "country.displayName.keyword",
"size": 9999
}
}
},
"filter": {}
}
},
"nested": {
"path": "country"
}
}
},
"size": 0
}
The filter clause is empty because you tried to use a nested query but didn't pass the path parameter. We don't need a nested query here (as your example query shows), so we can simplify the whole query to
var aggs = new AggregationDictionary()
{
{
"countries_step1",
new NestedAggregation("countries_step1")
{
Path = "country",
Aggregations = new AggregationDictionary()
{
{
"countries_step2",
new FilterAggregation("countries_step2")
{
Filter = new BoolQuery
{
Must = new QueryContainer[]
{
new TermQuery
{
Field = "country.isDisplayed",
Value = true
}
}
},
Aggregations = new AggregationDictionary
{
{
"countries_step3",
new TermsAggregation("countries_step3")
{
Field = "country.displayName.keyword",
Size = 9999
}
}
}
}
}
}
}
}
};
Now we have a valid request sent to elasticsearch.
There are a couple of things we can improve here:
1. Remove unnecessary bool query
Filter = new BoolQuery
{
Must = new QueryContainer[]
{
new TermQuery
{
Field = "country.isDisplayed",
Value = true
}
}
},
to
Filter =
new TermQuery
{
Field = "country.isDisplayed",
Value = true
},
2. Replace string field names
Usually, when making calls from .NET, there is some kind of POCO type that helps us write strongly-typed requests to Elasticsearch, which keeps the code clean and easier to refactor. With this, we can change the field definition from
"country.displayName.keyword"
to
Infer.Field<Document>(f => f.Country.FirstOrDefault().DisplayName.Suffix("keyword"))
My type definitions:
public class Document
{
public int Id { get; set; }
[Nested]
public List<Country> Country { get; set; }
}
public class Country
{
public bool IsDisplayed { get; set; }
public string DisplayName { get; set; }
}
3. Consider using a fluent syntax
With NEST you can write queries in two ways: using object initializer syntax (which you did) or using fluent syntax. Writing the above query with the fluent syntax, you will get something like
var searchResponse = await client.SearchAsync<Document>(s => s
.Size(0)
.Aggregations(a => a.Nested("by_country", n => n
.Path(p => p.Country)
.Aggregations(aa => aa
.Filter("by_country2", f => f
.Filter(q => q
.Term(t => t
.Field(field => field.Country.FirstOrDefault().IsDisplayed)
.Value(true)))
.Aggregations(aaa => aaa
.Terms("by_country3", t => t
.Field(field => field.Country.FirstOrDefault().DisplayName.Suffix("keyword"))
.Size(9999)
)))))));
which I find a little bit easier to follow and write; maybe it will work better for you as well.
As a final note, have a look at the docs to see how you can debug your queries.
Hope that helps.
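For comparison with the NEST versions, here is a small Python sketch of the request body after improvement 1 (term filter without the wrapping bool), plus a helper that reads the terms buckets from a response. The function names are hypothetical. The response path assumes the standard aggregation response layout, in which each sub-aggregation is nested under its parent's name and terms buckets carry key and doc_count.

```python
from typing import Any, Dict


def build_country_aggregation(size: int = 9999) -> Dict[str, Any]:
    """Nested aggregation -> filter on isDisplayed -> terms on displayName."""
    return {
        "size": 0,
        "aggs": {
            "by_country": {
                "nested": {"path": "country"},
                "aggs": {
                    "by_country2": {
                        # A plain term filter; no bool wrapper is needed.
                        "filter": {"term": {"country.isDisplayed": True}},
                        "aggs": {
                            "by_country3": {
                                "terms": {
                                    "field": "country.displayName.keyword",
                                    "size": size,
                                }
                            }
                        },
                    }
                },
            }
        },
    }


def country_counts(response: Dict[str, Any]) -> Dict[str, int]:
    """Map each display name to its document count."""
    aggs = response["aggregations"]
    buckets = aggs["by_country"]["by_country2"]["by_country3"]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


if __name__ == "__main__":
    import json

    print(json.dumps(build_country_aggregation(), indent=2))
```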

Elasticsearch custom sorting / adding filter clauses scores

I have this simple set of documents:
{
id : 1,
book_ids : [2,3],
collection_ids : ['a','b']
},
{
id : 2,
book_ids : [1,2]
}
If I run this filter query, it will match both documents:
{
bool: {
filter: [
{
bool: {
should: [
{
bool: {
must_not: {
exists: {
field: 'book_ids'
}
}
}
},
{
bool: {
filter: {
term: {
book_ids: 2
}
}
}
}
]
}
},
{
bool: {
should: [
{
bool: {
must_not: {
exists: {
field: 'collection_ids'
}
}
}
},
{
bool: {
filter: {
term: {
collection_ids: 'a'
}
}
}
}
]
}
}
]
}
}
The thing is I want to sort these documents, and I would like the first one (id: 1) to be returned first because it matched both the book_ids value and the collection_ids values provided.
A simple sort clause like this one is not working:
[
'book_ids',
'collection_ids'
]
because it returns document 2 first, due to the first value of its book_ids array.
Edit: this is a simplified example of the problem I am facing, which has N such clauses in the should clause. Moreover, there is an order between the clauses, as I tried to reflect with the sort snippet: results matching the first clause (book_ids) should appear before results matching the second clause (collection_ids). I am really looking for some kind of SQL sort operation that only takes into account the matching value of the field array. A viable option might be to assign decreasing constant_score boosts to each term clause, according to the expected sort order, and have ES sum these sub-scores to compute the final score. But I cannot figure out how to do it, or whether it is even possible.
Bonus question:
Is there any way for Elasticsearch to return some kind of new document with only the matching values? Here is what I would expect as a response to the above filter query:
{
id : 1,
book_ids : [2],
collection_ids : ['a']
},
{
id : 2,
book_ids : [2]
}
I think you're right about the constant score idea. I think you can do it like this:
{
query: {
bool: {
must: [
{
bool: {
should: [
{
bool: {
must_not: {
exists: {
field: 'book_ids'
}
}
}
},
{
constant_score: {
filter: {
term: {
book_ids: 2
}
},
boost: 100
}
}
]
}
},
{
bool: {
should: [
{
bool: {
must_not: {
exists: {
field: 'collection_ids'
}
}
}
},
{
constant_score: {
filter: {
term: {
collection_ids: 'a'
}
},
boost: 50
}
}
]
}
}
]
}
}
}
I think the only thing you were missing with constant score was that the top-level clause needs to be must, not filter. (Filters are not scored; all the scores are 0.)
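For N ordered clauses, the decreasing boosts can be generated instead of written by hand. The Python sketch below is one possible way to do that, with hypothetical function names. Using powers of two is an assumption of this sketch (the query above uses 100 and 50): it makes a match on an earlier clause outweigh all later clauses combined. How the must_not branch itself is scored is not addressed here, so it is worth checking real scores with the explain option.

```python
from typing import Any, Dict, List, Sequence, Tuple


def optional_term_clause(field: str, value: Any, boost: float) -> Dict[str, Any]:
    """Field is missing, OR field contains `value` (scored with `boost`)."""
    return {
        "bool": {
            "should": [
                {"bool": {"must_not": {"exists": {"field": field}}}},
                {
                    "constant_score": {
                        "filter": {"term": {field: value}},
                        "boost": boost,
                    }
                },
            ]
        }
    }


def build_ranked_query(ordered_terms: Sequence[Tuple[str, Any]]) -> Dict[str, Any]:
    """Build a bool/must query whose clauses are boosted by priority.

    The first (field, value) pair gets the highest boost: 2^(N-1), ..., 2, 1.
    """
    count = len(ordered_terms)
    must: List[Dict[str, Any]] = [
        optional_term_clause(field, value, 2 ** (count - 1 - index))
        for index, (field, value) in enumerate(ordered_terms)
    ]
    return {"query": {"bool": {"must": must}}}


if __name__ == "__main__":
    import json

    query = build_ranked_query([("book_ids", 2), ("collection_ids", "a")])
    print(json.dumps(query, indent=2))
```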
An alternative would be to put the filter inside a function_score query (but leave it as a filter), and then compute the score as you want (https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html)
As to the bonus question, it's possible if you use a script field to filter and add a new field like you want (https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-script-fields.html), but it's not possible in a straightforward way. It's probably easier and makes more sense to do that filtering after you receive the result, unless you have very long lists in your values.
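Filtering after the results arrive could look like the Python sketch below. It assumes each hit's _source has already been extracted into a plain dict; the function name `keep_matching_values` and the `wanted` parameter are hypothetical.

```python
from typing import Any, Collection, Dict, Mapping


def keep_matching_values(
    doc: Mapping[str, Any], wanted: Mapping[str, Collection[Any]]
) -> Dict[str, Any]:
    """Return a copy of `doc` whose array fields keep only the searched values.

    `wanted` maps a field name to the values that were searched for.
    Fields not listed in `wanted` are copied through unchanged, and fields
    missing from the document stay missing.
    """
    result = {key: value for key, value in doc.items() if key not in wanted}
    for field, allowed in wanted.items():
        if field in doc:
            result[field] = [value for value in doc[field] if value in allowed]
    return result


if __name__ == "__main__":
    docs = [
        {"id": 1, "book_ids": [2, 3], "collection_ids": ["a", "b"]},
        {"id": 2, "book_ids": [1, 2]},
    ]
    wanted = {"book_ids": {2}, "collection_ids": {"a"}}
    for doc in docs:
        print(keep_matching_values(doc, wanted))
```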

Resources