MongoTemplate grouping and getting count using Spring JPA - spring

I wrote the aggregation below to get the count of applicants per state:
Aggregation aggregation = newAggregation(
group("applicants.state").count().as("total")
);
AggregationResults<Object> groupResults = mongoTemplate.aggregate(aggregation, Applicants.class, Object.class);
return groupResults.getMappedResults();
But for the above aggregation I got the result shown below:
[
{
"_id": "California",
"total": 82
},
{
"_id": "Seattle",
"total": 11298
}
:
:
]
I would like to know how to get the result in the shape shown below. I am not sure whether this is achievable with Spring JPA for MongoDB.
{
"states": {
"California": 82,
"Seattle": 11298
},
"total": 11380
}
Can someone please help me with this?
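One possible approach, sketched here in Python rather than inside the aggregation pipeline, is to fold the grouped rows into the desired shape in application code once the aggregate call has returned. The function name fold_state_counts is hypothetical, and the input is assumed to be the list of _id/total rows shown above.

```python
def fold_state_counts(rows):
    """Fold [{'_id': state, 'total': n}, ...] into {'states': {...}, 'total': sum}."""
    # Each group row carries the state name in _id and its count in total.
    states = {row["_id"]: row["total"] for row in rows}
    return {"states": states, "total": sum(states.values())}


# The two rows visible in the question; the elided rows are left out.
grouped = [
    {"_id": "California", "total": 82},
    {"_id": "Seattle", "total": 11298},
]
result = fold_state_counts(grouped)
print(result)
```

The same fold can be written in Java over the list returned by groupResults.getMappedResults(); the sketch only shows the reshaping step.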

Related

How to get inner hits field values in the Nest or Elastic.Net library? Alternatively, how to specify the output type in the Nest or Elastic.Net library?

I am new to Elasticsearch and I am having trouble with the Nest/Elastic.Net library.
I would like to retrieve only part of a document rather than the entire document. I am able to do it in Postman, but I cannot do it via the Elastic.Net or Nest library.
The document structure looks like the following:
{
"Doc_id": "id_for_cross_refference_with_othersystem",
"Ocr": [
{
"word": "example_word1",
"box": [],
"cord": "some_number"
},
{
"word": "example_word2",
"box": [],
"cord": "some_number2"
}
]
}
The document has a huge number of properties, but I am interested only in Doc_id, ocr.word, ocr.box and ocr.cord.
The following Postman request fully satisfies my needs:
{
"query": {
"bool": {
"must": [
{
"match": {
"doc_id": "2a558865-7dc2-4e4d-ad02-3f683159984e"
}
},
{
"nested": {
"path": "ocr",
"query": {
"match": {
"ocr.word": "signing"
}
},
"inner_hits": {
"_source": {
"includes":[
"ocr.word",
"ocr.box",
"ocr.conf"
]
}
}
}
}
]
}
},
"_source":"false"
}
The result of that request is the following:
{
"took": 9,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 3,
"relation": "eq"
},
"max_score": 18.99095,
"hits": [
{
"_index": "irrelevant",
"_type": "irrelevant",
"_id": "irrelevant",
"_score": 18.99095,
"_source": {},
"inner_hits": {
"ocr": {
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 7.9260864,
"hits": [
{
"_index": "irrelevant",
"_type": "irrelevant",
"_id": "irrelevant",
"_nested": {
"field": "ocr",
"offset": 11
},
"_score": 7.9260864,
"_source": {
"box": [
],
"conf": "96.452858",
"word": "signing"
}
}
]
}
}
}
},
{
"_index": "there_rest _of_object_is_ommited",
},
{
"_index": "there_rest _of_object_is_ommited",
}
]
}
}
However, when I try to convert that request to the Nest Query DSL, I am not able to achieve the same result.
When I use the NEST library, I don't see any way to provide an output result model/type. It looks like the document type has to match the output type, which is not my case.
The query that I am using:
var searchResponse = client2.Search<Model>(s => s
.Query(q1 => q1.Bool(b1 => b1.Must(s1 =>
s1.Match(m => m.Field(f => f.doc_id).Query("2a558865-7dc2-4e4d-ad02-3f683159984e")),
s2 => s2.Nested(n => n.Path("ocr").Query(q2 => q2.Bool(b => b.Must(m => m.Match(m => m.Field(f => f.ocr.First().word).Query("signing")))))
.InnerHits(ih => ih.Source(s => s.Includes(i => i.Field(f => f.ocr.First().word).Field(f => f.ocr.First().conf))))
)
)))
.Source(false)
);
Because the Model type was created for the document and doesn't match the output type, I am getting [null, null, null] as the output.
There is a Hits property on ISearchResponse, but when I look into it I cannot see the values of the fields.
I tried using the low-level client (Elastic.Net) and providing the JSON request as a string, but it looks like there is no way of specifying the output type there either. When I ran my code with the low-level library, it returned 3 objects of class Model with empty fields.
My questions are:
Is it possible to specify an output type different from the document type for the Nest Query DSL or the Elastic.Net library?
Is it possible to get the values of the fields that I specified in the request for inner hits with the help of the Nest or Elastic.Net libraries?
How would you solve such a problem? We have huge documents and we don't want to pass unnecessary information back and forth. The inner hits approach looks like a neat solution for us, but it doesn't look like it works with the recommended libraries, unless I am making some silly mistake.
NOTE: I can achieve the desired result by simply using HTTPClient and doing what I need manually, but I hope to leverage a library that is written for this purpose (Nest or Elastic.Net).
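In the response shown above, the requested values sit under hits.hits[].inner_hits.ocr.hits.hits[]._source, while the top-level _source is empty, which would be consistent with the empty Model objects. As a language-neutral illustration of that path (not NEST or Elastic.Net code), here is a minimal Python sketch that walks a response parsed into plain dictionaries. The function name inner_hit_sources is hypothetical.

```python
def inner_hit_sources(response, name="ocr"):
    """Return the _source of every inner hit called `name`, across all top-level hits."""
    sources = []
    for hit in response.get("hits", {}).get("hits", []):
        inner = hit.get("inner_hits", {}).get(name, {})
        for inner_hit in inner.get("hits", {}).get("hits", []):
            sources.append(inner_hit.get("_source", {}))
    return sources


# Trimmed copy of the response shown above.
response = {
    "hits": {
        "hits": [
            {
                "_source": {},
                "inner_hits": {
                    "ocr": {
                        "hits": {
                            "hits": [
                                {
                                    "_nested": {"field": "ocr", "offset": 11},
                                    "_source": {
                                        "box": [],
                                        "conf": "96.452858",
                                        "word": "signing",
                                    },
                                }
                            ]
                        }
                    }
                },
            },
            {"_index": "irrelevant"},  # a hit without inner_hits is skipped
        ]
    }
}
words = inner_hit_sources(response)
print(words)
```

Whatever client is used, the type needed for deserialization is the shape of the inner hit _source (word, box, conf), not the shape of the full document.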

How to get a distinct list of document fields using NEST?

I have just started with Elasticsearch and am using the NEST API for my .Net application. I have an index and some records inserted. I am now trying to get a distinct list of document field values. I have this working in Postman. I do not know how to port the JSON aggregation body to a NEST call. Here is the call I am trying to port to the NEST C# API:
{
"size": 0,
"aggs": {
"hosts": {
"terms": {
"field": "host"
}
}
}
}
Here is the result, which leads to my next question: how would I parse the result or map it to a POCO? I am only interested in the distinct list of values of the field, in this case 'host'. I really just want an enumerable of strings back. I do not care about the count at this point.
{
"took": 0,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 3,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"aggregations": {
"hosts": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "hoyt",
"doc_count": 3
}
]
}
}
}
I was able to get the results I am after with the following code:
var result = await client.SearchAsync<SyslogEntryIndex>(s => s.Size(0).Aggregations(a => a.Terms("hosts", t => t.Field(f => f.Host))));
List<string> hosts = new List<string>();
foreach (BucketAggregate v in result.Aggregations.Values)
{
foreach (KeyedBucket<object> item in v.Items)
{
hosts.Add((string)item.Key);
}
}
return hosts;
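For comparison, the same extraction can be sketched against the raw JSON response in Python: the distinct values are simply the key of each bucket under aggregations.hosts.buckets. The function name distinct_terms is hypothetical, and the response is assumed to be parsed into plain dictionaries.

```python
def distinct_terms(response, agg_name):
    """Return the bucket keys of a terms aggregation, ignoring doc_count."""
    buckets = response.get("aggregations", {}).get(agg_name, {}).get("buckets", [])
    return [bucket["key"] for bucket in buckets]


# Trimmed copy of the response shown above.
response = {
    "aggregations": {
        "hosts": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [{"key": "hoyt", "doc_count": 3}],
        }
    }
}
hosts = distinct_terms(response, "hosts")
print(hosts)
```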

Is it possible to "join" two indexes in elasticsearch using query

I know that there is no option to join indexes in Elasticsearch, but I need to find a way to solve this problem:
I have 2 indexes, A and B.
IndexA has fields such as field1, field2, field3.
IndexB has field4, field5, field6.
If I search by field5 (in this case "test"), I want to get all the relationships as a tree, like this:
Match all the documents from IndexA whose "field2" equals "field5" from IndexB. For example:
IndexA documents:
5, "test", "test2",
10, "test", "test7"
11, "test10", "test11"
IndexB documents:
1, "test", (...)
2, "test", (...)
3, "test100", (...)
The example response:
for id5 (from indexA) i want to have an object with id's 1 and 2 from indexB like {id:5, responses: {1, 2}}
for id10 (from IndexA) i want to have an object with id's 1 and 2 from indexB like {id:10, responses: {1, 2}}
for id11 there is no match ("test10" != "test") {id:11, responses:{}}
Maybe there is a way to solve this? In the end I need to do this for four indexes (but if it is possible between two, then I can do it for four as well).
I don't think it's possible in elasticsearch, just like you said. You shouldn't create indexes with such relations. It would be better to rethink your model and denormalize the data.
In order to solve this, you'll have to do the processing programmatically in your backend. Pseudocode:
//Get all objects from indexA
const allIndexA = indexA.getAll();
const result = new Array();
//For each object in indexA, select the corresponding object in indexB
allIndexA.forEach((entryA) => {
const entriesB = indexB.get({field5: entryA.field2});
result.push({
entryA,
entriesB
});
});
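The pseudocode above issues one lookup against indexB per document in indexA. A variant that indexes the B documents once by field5 can be sketched as follows, assuming both result sets have already been fetched and fit in memory. The function name join_on_field and the dictionary layout (id plus one field) are assumptions made for illustration, based on the sample documents in the question.

```python
from collections import defaultdict


def join_on_field(docs_a, docs_b, field_a, field_b):
    """For each doc in A, list the ids of docs in B whose field_b equals its field_a."""
    ids_by_value = defaultdict(list)
    for doc in docs_b:
        ids_by_value[doc[field_b]].append(doc["id"])
    # .get() avoids inserting empty entries for values that have no match.
    return [
        {"id": doc["id"], "responses": list(ids_by_value.get(doc[field_a], []))}
        for doc in docs_a
    ]


index_a = [
    {"id": 5, "field2": "test"},
    {"id": 10, "field2": "test"},
    {"id": 11, "field2": "test10"},
]
index_b = [
    {"id": 1, "field5": "test"},
    {"id": 2, "field5": "test"},
    {"id": 3, "field5": "test100"},
]
joined = join_on_field(index_a, index_b, "field2", "field5")
print(joined)
```

Extending this to four indexes would mean repeating the same step with the output of one join as the input of the next.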
I was trying the following:
GET /_msearch
{
"_index": [
"index1",
"index2",
"index3"
]
}
{
"query": {
"bool": {
"should": [
{
"match": {
"index3id": "1" // this field is in the 3rd index, so I only get responses from the 3rd index
}
}
]
}
},
"size": 100,
"aggregations": {
"firstLevel": {
"top_hits": {
"size": 100,
"_source": {
"includes": "index3id"
}
}
}
}
}
The response of the aggregation is here:
"aggregations": {
"firstLevel": {
"hits": {
"total": 2,
"max_score": 1,
"hits": [
{
"_index": "index3",
"_type": "someTypeNotRelevant",
"_id": "81",
"_score": 1,
"_source": {
"index3id": 1
}
},
{
"_index": "index3",
"_type": "someTypeNotRelevant",
"_id": "61",
"_score": 1,
"_source": {
"index3id": 1
}
}
]
}
}
}
Now I just want to run a new query against index2 on some field, using the values that were returned in _source (in this case, all the index3id values). I was thinking about a sub-aggregation under the "firstLevel" aggregation, but one that runs a new query against index2.
There are 2 problems:
1. How to pass these index3id values?
2. After the first query, I only have data from index3, because the query uses index3id.
Anyway, thank you for the advice.
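One way to pass the values along, given that the join has to happen in application code, is to collect them from the first response and build a terms query for the second request. The sketch below only builds the request body and does not send it. The function name follow_up_query and the target field name some_field_in_index2 are hypothetical, since the question does not name the field in index2.

```python
def follow_up_query(first_response, agg_name, source_field, target_field):
    """Collect source_field values from a top_hits aggregation into a terms query."""
    hits = first_response["aggregations"][agg_name]["hits"]["hits"]
    # A set removes duplicates; sorting keeps the request body deterministic.
    values = sorted({hit["_source"][source_field] for hit in hits})
    return {"query": {"terms": {target_field: values}}}


# Trimmed copy of the aggregation response shown above.
first_response = {
    "aggregations": {
        "firstLevel": {
            "hits": {
                "hits": [
                    {"_index": "index3", "_id": "81", "_source": {"index3id": 1}},
                    {"_index": "index3", "_id": "61", "_source": {"index3id": 1}},
                ]
            }
        }
    }
}
body = follow_up_query(first_response, "firstLevel", "index3id", "some_field_in_index2")
print(body)
```

The resulting body would then be sent as a separate search against index2, so the data from index3 and index2 is combined by the caller rather than by Elasticsearch.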

using elasticsearch filter in logstash pipeline

I'm using the elasticsearch filter in my Logstash pipeline. I correctly find the result using:
filter{
if [class] == "DPAPIINTERNAL" {
elasticsearch {
hosts => "10.1.10.16"
index => "dp_audit-2017.02.16"
query_template => "/home/vittorio/Documents/elastic-queries/matching-requestaw.json"
}
}
}
As you can see, I'm using "query_template", which is:
{
"query": {
"query_string": {
"query": "class:DPAPI AND request.aw:%{[aw]}"
}
},
"_source": ["end_point", "vittorio"]
}
This tells Elasticsearch to look up the log with that specific class whose "aw" matches the one in the DPAPIINTERNAL log.
Perfect! But now that I have found the result, I want to take some fields from it and attach them to my DPAPIINTERNAL log. For instance, I want to take "end_point" and add it under the new key "vittorio" inside my log.
This is not happening and I don't understand why.
Here is the log that I am looking up with the query:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "dp_audit-2017.02.16",
"_type": "logs",
"_id": "AVpHoPHPuEPlW12Qu",
"_score": 1,
"_source": {
"svc": "dp-1.1",
"request": {
"method": "POST|PATCH|DELETE",
"aw": "prova",
"end_point": "/bank/6311",
"app_instance": "7D1-D233-87E1-913"
},
"path": "/home/vittorio/Documents/dpapi1.json",
"@timestamp": "2017-02-16T15:53:33.214Z",
"@version": "1",
"host": "Vito",
"event": "bank.add",
"class": "DPAPI",
"ts": "2017-01-16T19:20:30.125+01:00"
}
}
]
}
}
You need to specify the fields parameter in your elasticsearch filter, like this:
elasticsearch {
hosts => "10.1.10.16"
index => "dp_audit-2017.02.16"
query_template => "/home/vittorio/Documents/elastic-queries/matching-requestaw.json"
fields => { "[request][end_point]" => "vittorio" }
}
Note that since end_point is a nested field, you need to modify the _source in your query template like this:
"_source": ["request.end_point"]
The problem is simply that you do not need to specify the "new" field in the query_template.
"_source": ["request"] # here you specify the field you want from the query result.
and then
filter{
if [class] == "DPAPIINTERNAL" {
elasticsearch {
hosts => "10.1.10.16"
index => "dp_audit-2017.02.16"
query_template => "/home/vittorio/Documents/elastic-queries/matching-requestaw.json"
fields => {"request" => "new_key"} # here you map the fields: this tells the elasticsearch filter to put request inside new_key
}
}
}
That worked for me!
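To make the effect of the fields mapping concrete, here is a small Python model of what the two answers describe: a value is read from the matched document and written to the current event under a new key. This is only an illustration of the described behaviour, not the plugin's actual implementation, and the helper names resolve and apply_fields are hypothetical.

```python
import re


def resolve(source, reference):
    """Look up a Logstash-style field reference such as "[request][end_point]"."""
    # A bare name like "request" has no brackets and is used as a single key.
    keys = re.findall(r"\[([^\]]+)\]", reference) or [reference]
    value = source
    for key in keys:
        value = value[key]
    return value


def apply_fields(event, hit_source, fields):
    """Copy each referenced value from the matched document into a copy of the event."""
    enriched = dict(event)
    for reference, new_key in fields.items():
        enriched[new_key] = resolve(hit_source, reference)
    return enriched


event = {"class": "DPAPIINTERNAL", "aw": "prova"}
hit_source = {"class": "DPAPI", "request": {"aw": "prova", "end_point": "/bank/6311"}}

first = apply_fields(event, hit_source, {"[request][end_point]": "vittorio"})
second = apply_fields(event, hit_source, {"request": "new_key"})
print(first["vittorio"], second["new_key"])
```

The first call mirrors the mapping from the first answer (a single nested value), the second mirrors the mapping from the second answer (the whole request object).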

Elastic search Nest TopHits aggregation

I've been struggling with a problem for a while now, so I thought I would run it by Stack Overflow.
My document type has a title, a language field (used to filter) and a grouping id field (I'm leaving out all the other fields to keep this to the point).
When I search for documents, I want to find all documents containing the text in the title, but I only want one document for each unique grouping id.
I've been looking at the top hits aggregation, and from what I can see it should be able to solve my problem.
When running this query against my index:
{
"query": {
"match": {
"title": "dingo"
}
},
"aggs": {
"top-tags": {
"terms": {
"field": "groupId",
"size": 1000000
},
"aggs": {
"top_tag_hits": {
"top_hits": {
"_source": {
"include": [
"*"
]
},
"size": 1
}
}
}
}
}
}
I get the following response (All results are in the same language):
{
"took": 9,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 0,
"hits": []
},
"aggregations": {
"top-tags": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [{
"key": "3044BC9E7C29450AAB2E4B6C9B35AAE2",
"doc_count": 2,
"top_tag_hits": {
"hits": {
"total": 2,
"max_score": 1.4983996,
"hits": [{
"_index": "elasticsearch",
"_type": "productdocument",
"_id": "FB15279FB18E4B34AD66ACAF69B96E9E",
"_score": 1.4983996,
"_source": {
"groupId": "3044BC9E7C29450AAB2E4B6C9B35AAE2",
"title": "wombat, dingo and zetapunga actionfigures",
}
}]
}
}
},
{
"key": "F11799ABD0C14B98ADF2554C84FF0DA0",
"doc_count": 1,
"top_tag_hits": {
"hits": {
"total": 1,
"max_score": 1.30684,
"hits": [{
"_index": "elasticsearch",
"_type": "productdocument",
"_id": "42562A25E4434A0091DE0C79A3E7F3F4",
"_score": 1.30684,
"_source": {
"groupId": "F11799ABD0C14B98ADF2554C84FF0DA0",
"title": "awesome dingo raptor"
}
}]
}
}
}]
}
}
}
This is exactly what I expected (two hits in one bucket, but only one document retrieved for that bucket). However, when I try this in NEST I can't seem to retrieve all of the documents.
My query looks like this:
result = _elasticClient.Search<T>(s => s
.From(skip)
.Filter(fd => fd.Term(f => f.Language, language))
.Size(pageSize)
.SearchType(SearchType.Count)
.Query(
q => q.Wildcard(f => f.Title, query, 2.0)
|| q.Wildcard(f => f.Description, query)
)
.Aggregations(agd =>
agd.Terms("groupId", tagd => tagd
.Field("groupId")
.Size(100000) //We sadly need all products
)
.TopHits("top_tag_hits", thagd => thagd
.Size(1)
.Source(ssd => ssd.Include("*")))
));
var topHits = result.Aggs.TopHits("top_tag_hits");
var documents = topHits.Documents<ProductDocument>(); //contains only one document (I would expect it to contain two, one for each bucket)
Inspecting the aggregations in the debugger reveals a "groupId" aggregation with 2 buckets (matching what I see in my "raw" query against the index), just without any apparent way to retrieve the documents.
So my question is: how do I retrieve the top hit for each bucket? Or am I doing this completely wrong? Is there some other way to achieve what I am trying to do?
EDIT
After the help I received, I was able to retrieve my results with the following:
result = _elasticClient.Search<T>(s => s
.From(skip)
.Filter(fd => fd.Term(f => f.Language, language))
.Size(pageSize)
.SearchType(SearchType.Count)
.Query(
q => q.Wildcard(f => f.Title, query, 2.0)
|| q.Wildcard(f => f.Description, query)
)
.Aggregations(agd =>
agd.Terms("groupId", tagd => tagd
.Field("groupId")
.Size(0)
.Aggregations(tagdaggs =>
tagdaggs.TopHits("top_tag_hits", thagd => thagd
.Size(1)))
)
)
);
var groupIdAggregation = result.Aggs.Terms("groupId");
var topHits =
groupIdAggregation.Items.Select(key => key.TopHits("top_tag_hits"))
.SelectMany(topHitMetric => topHitMetric.Documents<ProductDocument>()).ToList();
Your NEST query tries to run both Terms aggregation and TopHits side by side, while your original query runs Terms first and then for each bucket, you're calling TopHits.
You simply have to move your TopHits agg into Terms in your NEST query to make it work fine.
This should fix it:
.Aggregations(agd =>
agd.Terms("groupId", tagd => tagd
.Field("groupId")
.Size(0)
.Aggregations(tagdaggs =>
tagdaggs.TopHits("top_tag_hits", thagd => thagd
.Size(1)))
)
)
By the way, you don't have to use Include("*") to include all fields; just remove this option. Also, specifying .Size(0) should bring back ALL possible terms for you.
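The nesting matters because, in the raw response, each terms bucket carries its own top_tag_hits entry. As a language-neutral illustration (not NEST code), a minimal Python sketch that collects the top document of every bucket from a response parsed into dictionaries could look like this. The function name top_hit_per_bucket is hypothetical.

```python
def top_hit_per_bucket(response, terms_name, top_hits_name):
    """Return the _source of the top hits found inside every terms bucket."""
    docs = []
    for bucket in response["aggregations"][terms_name]["buckets"]:
        for hit in bucket[top_hits_name]["hits"]["hits"]:
            docs.append(hit["_source"])
    return docs


# Trimmed copy of the raw response shown in the question.
response = {
    "aggregations": {
        "top-tags": {
            "buckets": [
                {
                    "key": "3044BC9E7C29450AAB2E4B6C9B35AAE2",
                    "doc_count": 2,
                    "top_tag_hits": {"hits": {"hits": [
                        {"_source": {"title": "wombat, dingo and zetapunga actionfigures"}}
                    ]}},
                },
                {
                    "key": "F11799ABD0C14B98ADF2554C84FF0DA0",
                    "doc_count": 1,
                    "top_tag_hits": {"hits": {"hits": [
                        {"_source": {"title": "awesome dingo raptor"}}
                    ]}},
                },
            ]
        }
    }
}
titles = [doc["title"] for doc in top_hit_per_bucket(response, "top-tags", "top_tag_hits")]
print(titles)
```

This is the same per-bucket traversal that the Select/SelectMany chain in the EDIT performs on the NEST side.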