Can I get outer document fields from nested top hits aggregation? - elasticsearch

I am making use of the functionality that was added in Elasticsearch 1.5 to allow a top hits aggregation inside a nested aggregation. The problem I have is that once I have my top nested documents I want to be able to also get fields from their outer documents.
My pseudo aggregation structure is:
nested: {
  some_other_aggregation: {
    "top_hits": {
    }
  }
}
The top nested hits include the index, type and id of the outer document, so I could perform a secondary search, but I'd like to avoid that. My other option is to return all of the hits from the query (currently I only return the results of the aggregations) and then match up the documents with the events in my code, but that seems bad from a performance point of view.
Can anyone suggest something better? Thanks.

Related

Is it possible to query by field data type in Elasticsearch?

I need to query Elasticsearch by field data type, but I have not been able to write such a query. I want to be able to (1) specify the type I want to search for in the query, i.e. all fields of {"type": "boolean"}, and (2) get a field and see what its type is.
The reason is to check that each field is designated correctly. Let's say I inserted the following data into this index and now want to see the data types of those fields programmatically. How would I query that?
POST /index_name1/_doc/
{
"field1":"hello_field_2",
"field2":"123456.54321",
"field3.field4": false,
"field3.field5.field10":"POINT(-117.918976 33.812511)",
"field3.field5.field8": "field_of_dragons",
"field3.field5.field9": "2022-05-26T07:47:26.133275Z"
}
I have tried:
GET /index_name1/_search
{
"query":{
"wildcard":{
"field3.field4":{ "type":"*"}
}
}
}
That gives [wildcard] query does not support [type].
I've tried many other queries and searched the documentation and threads, but can't find anything that will do this. It has got to be possible, right?
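Field types live in the index mapping rather than in the documents, so one possible approach is to fetch the mapping (for example with GET /index_name1/_mapping) and inspect it on the client. The sketch below is only an illustration under assumptions: it expects the "properties" object of a typeless (7.x-style) mapping, the sample mapping is hypothetical and not taken from a real cluster, and multi-fields declared under "fields" are ignored.

```python
from typing import Dict, Iterator, List, Tuple


def iter_field_types(properties: Dict, prefix: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (dotted_field_path, type) for each typed field under `properties`."""
    for name, spec in properties.items():
        path = f"{prefix}{name}"
        if "type" in spec:
            yield path, spec["type"]
        # Object and nested fields keep their sub-fields under "properties".
        if "properties" in spec:
            yield from iter_field_types(spec["properties"], prefix=f"{path}.")


def fields_of_type(properties: Dict, wanted: str) -> List[str]:
    """Return the dotted paths of all fields mapped with the given type."""
    return [path for path, ftype in iter_field_types(properties) if ftype == wanted]


# Hypothetical mapping, shaped like response["index_name1"]["mappings"]["properties"].
sample_properties = {
    "field1": {"type": "text"},
    "field2": {"type": "text"},
    "field3": {
        "properties": {
            "field4": {"type": "boolean"},
            "field5": {"properties": {"field9": {"type": "date"}}},
        }
    },
}

boolean_fields = fields_of_type(sample_properties, "boolean")
field_types = dict(iter_field_types(sample_properties))
```

Here boolean_fields lists every boolean field, and field_types["field3.field4"] answers the second part of the question for a single field.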

Navigating terms aggregation in Elastic with very large number of buckets

Hope everyone is staying safe!
I am trying to explore the proper way to tackle the following use case in Elasticsearch.
Let's say that I have about 700000 docs which I would like to bucket on the basis of a field (let's call it primary_id). The same primary_id can appear in more than one doc (usually up to 2-3 docs share a primary_id). In all other cases the primary_id is not repeated in any other doc.
So on average, out of every 10 docs I will have 8 unique primary ids and 1 primary id shared between 2 docs.
To ensure uniqueness I tried using the terms aggregation, and I got buckets in the response to my search request but not for the subsequent scroll requests. Upon googling, I found that scroll queries do not support aggregations.
As a result, I looked for alternative solutions and tried the one in this link as well: https://lukasmestan.com/learn-how-to-use-scroll-elasticsearch-aggregation/
It suggests using multiple search requests, each specifying the partition number to fetch (depending on how many partitions you divide your result into). But I receive client timeouts even with high timeout settings on the client side.
Ideally, I want to know the best way to handle data where the number of distinct values of the bucketing field is almost equal to the number of docs. The SQL equivalent would be select DISTINCT ( primary_id) from .....
But in elasticsearch, distinct things can only be processed via bucketing (terms aggregation).
I also use top hits as a sub aggregation query under terms aggregation to fetch the _source fields.
Any help would be extremely appreciated!
Thanks!
There are 3 ways to paginate aggregations:
Composite aggregation
Partition
Bucket sort
Partition you have already tried.
Composite aggregation: it can combine multiple sources into a single set of buckets and allows pagination and sorting on them. It can only paginate linearly using after_key, i.e. you cannot jump from page 1 to page 3. You fetch "n" buckets, then pass the returned after_key to fetch the next "n" buckets.
GET index22/_search
{
  "size": 0,
  "aggs": {
    "ValueCount": {
      "value_count": {
        "field": "id.keyword"
      }
    },
    "pagination": {
      "composite": {
        "size": 2,
        "sources": [
          {
            "TradeRef": {
              "terms": {
                "field": "id.keyword"
              }
            }
          }
        ]
      }
    }
  }
}
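The linear pagination with after_key can be sketched as a client-side loop. This is a minimal illustration rather than a definitive implementation: search is a hypothetical callable that takes a request body and returns the parsed response (for example a thin wrapper around your Elasticsearch client), and the function name and page_size default are assumptions.

```python
from typing import Callable, Dict, Iterator, List


def iter_composite_buckets(
    search: Callable[[Dict], Dict],
    agg_name: str,
    sources: List[Dict],
    page_size: int = 1000,
) -> Iterator[Dict]:
    """Yield every bucket of a composite aggregation, one page at a time."""
    after = None
    while True:
        composite = {"size": page_size, "sources": sources}
        if after is not None:
            # Resume right after the last bucket of the previous page.
            composite["after"] = after
        body = {"size": 0, "aggs": {agg_name: {"composite": composite}}}
        result = search(body)["aggregations"][agg_name]
        buckets = result["buckets"]
        if not buckets:
            return
        yield from buckets
        after = result.get("after_key")
        if after is None:
            return
```

With the request above, the call would look like iter_composite_buckets(search, "pagination", [{"TradeRef": {"terms": {"field": "id.keyword"}}}]). Each page is a separate request, so only one page of buckets is held in memory at a time.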
Bucket sort
The bucket_sort aggregation, like all pipeline aggregations, is
executed after all other non-pipeline aggregations. This means the
sorting only applies to whatever buckets are already returned from the
parent aggregation. For example, if the parent aggregation is terms
and its size is set to 10, the bucket_sort will only sort over those
10 returned term buckets
So this isn't suitable for your case.
You can increase the result size to a value greater than 10K by updating the setting index.max_result_window. Setting too big a size can cause out-of-memory issues, so you need to test how much your hardware can support.
A better option is to use the scroll API and perform the distinct on the client side.
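The client-side distinct could look like the sketch below. It assumes the hits arrive as an iterable of dicts with a "_source" key (as scroll responses return them) and that the set of seen ids fits in memory, which should be reasonable for about 700000 short ids. The function name distinct_by is hypothetical.

```python
from typing import Dict, Iterable, Iterator


def distinct_by(hits: Iterable[Dict], field: str) -> Iterator[Dict]:
    """Yield only the first hit seen for each value of `field` in _source."""
    seen = set()
    for hit in hits:
        # Hits without the field all share the key None, so only the first is kept.
        key = hit["_source"].get(field)
        if key in seen:
            continue
        seen.add(key)
        yield hit


# Hypothetical hits, as they might come back from successive scroll pages.
sample_hits = [
    {"_id": "1", "_source": {"primary_id": "A"}},
    {"_id": "2", "_source": {"primary_id": "B"}},
    {"_id": "3", "_source": {"primary_id": "A"}},
]
unique_ids = [hit["_id"] for hit in distinct_by(sample_hits, "primary_id")]
```

Because the function is a generator, it can consume scroll pages as they arrive without collecting all hits first.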

Application-side Joins Elasticsearch

I have two indexes in Elasticsearch, a system index, and a telemetry index. I'd like to perform queries and aggregations on the telemetry index using filters from the systems index. The systems index is relatively small and only receives new documents occasionally, but the telemetry index is much larger and is constantly receiving new documents. This seems like an ideal situation for using an application-side join.
I tried emulating the example query at the previous link, but it turns out the filtered query is deprecated as of ES 5.0. (Why is this example in the current documentation?!)
Here are my queries:
GET /system/_search
{
"query": {
"match": {
"name": "George's system"
}
}
}
GET /telemetry/_search
{
"query": {
"bool":{
"must": {
"multi_match": {
"operator": "and",
"fields": ["systemId"]
, [1] }
}
}
}
}
}
The second one fails with a json_parse_exception because for some reason it doesn't like the [ ] characters after "fields".
Can anyone provide a simple example of using application-side joins?
Once such a query is defined (perhaps in Kibana's Dev Tools console) is there a way to visualize it in Kibana?
With Elasticsearch there is no way to execute two nested queries as in a relational database, where one query uses the response of the other. The application-side join example means that you are actually making two queries (two separate requests to Elasticsearch) from the application side.
In the first query you get the list of ids you need to filter on.
In the second query you pass the list of ids that you got to the terms filter.
This works when you have no more than 1024 values for systemId, because the terms query has a limit on the number of terms.
Because this cannot be expressed as a single query, you can't visualize it in Kibana.
In such a case you have to sacrifice a little space and add the systemId to your mapping.
Good Luck!
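The two-step join described in this answer can be sketched on the application side as follows. This is an illustration under assumptions, not the documented example: it assumes that telemetry documents store the _id of their system document in systemId, and the helper name build_telemetry_query is hypothetical. The hits would come from the first request (GET /system/_search), and the returned body would be sent as the second request (GET /telemetry/_search).

```python
from typing import Dict, Iterable


def build_telemetry_query(system_hits: Iterable[Dict], id_field: str = "systemId") -> Dict:
    """Build the second request body from the hits of the first query."""
    # Deduplicate while keeping the order in which the ids were returned.
    ids = list(dict.fromkeys(hit["_id"] for hit in system_hits))
    # A terms clause in filter context matches any of the collected ids.
    return {"query": {"bool": {"filter": [{"terms": {id_field: ids}}]}}}


# Hypothetical hits from the first query on the system index.
system_hits = [{"_id": "sys-1"}, {"_id": "sys-2"}, {"_id": "sys-1"}]
telemetry_body = build_telemetry_query(system_hits)
```

Aggregations on the telemetry index can then be added to telemetry_body under an "aggs" key before sending it.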

Group by field in found document

The best way to explain what I want to accomplish is by example.
Let us say that I have an object with fields name and color and transaction_id. I want to search for documents where name and color match the specified value and that I can accomplish easily with boolean queries.
But I do not want only the documents found by the search query. I also want the whole transaction to which those documents belong, which is identified by transaction_id. For example, if a document has been found with transaction_id equal to 123, I want my query to return all documents with transaction_id equal to 123.
Of course, I can do that with two queries: the first one to fetch all documents that match the criteria, and the second one to return all documents that have one of the transaction_id values found by the first query.
But is there any way to do it in a single query?
You can use a parent-child relationship between the transaction and your object. Or denormalize your data so that the objects are nested inside the transactions. Otherwise you'll have to do an application-side join, meaning 2 queries.
Try an index mapping similar to the following, and include a parent_id in the objects.
{
  "mappings": {
    "transaction": {},
    "object": {
      "_parent": {
        "type": "transaction"
      }
    }
  }
}
Further reading:
https://www.elastic.co/guide/en/elasticsearch/guide/current/parent-child-mapping.html

Elasticsearch highlight with nested objects

I have a question about highlighting nested object fields.
Consider record like this:
_source: {
id: 286
translations: [
{
id: 568
language: lang1
value: foo1 bar1
}
{
id: 569
language: lang2
value: foo2 bar2
}
]
}
If translations.value has an ngram filter, is it possible to highlight matches in a nested object such as this one?
And what would the highlight query look like?
Thanks a lot for response.
Same problem over here. It seems that there is no way to do it in Elasticsearch, and there won't be in the near future.
Developer Shay Banon wrote:
In order to do highlighting based on the nested query, the nested
documents needs to be extracted as well in order to highlight it,
which is more problematic (and less performant).
Also:
His explanation was that this would take a good amount of memory as
there can be a large number of children. And it looks genuine to me as
adding this feature will violate the basic concept of processing only
N number of feeds at a time.
So the only way is to process the result of a query manually in your own program to add the highlights.
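Manual highlighting in your own program could be as simple as the sketch below. It is only an illustration: the function name highlight and the <em> tags are assumptions, matching is done on case-insensitive substrings (which roughly fits ngram-analyzed values), and the input is not HTML-escaped.

```python
import re
from typing import Iterable


def highlight(text: str, terms: Iterable[str], pre_tag: str = "<em>", post_tag: str = "</em>") -> str:
    """Wrap every case-insensitive occurrence of any term in the given tags."""
    terms = [t for t in terms if t]
    if not terms:
        return text
    # Try longer terms first so "foo1" wins over "foo" at the same position.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(t) for t in ordered), re.IGNORECASE)
    return pattern.sub(lambda m: f"{pre_tag}{m.group(0)}{post_tag}", text)


# Applied to one nested translation value from the record above.
highlighted = highlight("foo1 bar1", ["foo1"])
```

You would run this over translations[].value of each returned hit, using the terms from your query.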
Update
I don't know about Tire or ngram filters, but I found a way to retrieve all nested documents matching a filter by using nested facets and facet filters. You need a separate query for highlighting, but it's much faster than browsing through _source, in my case at least.
{
  "query": {"match_all": {}},
  "facets": {
    "matching_translations": {
      "nested": "translations",
      "terms": {"field": "translations.value"},
      "facet_filter": {
        "bool": {"must": [{"terms": {"translations.value": ["foo1"]}}]}
      }
    }
  }
}
You can use the resulting facet terms for highlighting in your program.
For example, I want to highlight links to nested documents (in jQuery):
setHighlights = function(sdata){
  var highlightDocs = [];
  // collect the document ids returned as facet terms
  if(sdata['facets'] && sdata['facets']['docIDs'] && sdata['facets']['docIDs']['terms'] && sdata['facets']['docIDs']['terms'].length > 0){
    for(var i in sdata['facets']['docIDs']['terms']){
      highlightDocs.push(sdata['facets']['docIDs']['terms'][i]['term']);
    }
  }
  // mark every link whose id is among the collected terms
  $('li.document_link').each(function(){
    if($.inArray($(this).attr('id'), highlightDocs) != -1) {
      $(this).addClass('document_selected');
    }
  });
};
I hope that helps a little.
You can use "force_source": true in the highlight fields to cause the document to be highlighted once the nested fields are joined.
