Search document from an index which _ids does not exists in another index - elasticsearch

Is any way to search documents from one index that _ids do not exists in another index? Something like NOT EXISTS in MySQL.

There is no convinient way to do this within elasticsearch but only by using aggregations as a workaround:
GET index-a,index-b/_search
{
"size": 0,
"aggs": {
"group_by_id": {
"terms": {
"field": "_id",
"size": 1000
},
"aggs": {
"containied_in_indices_count": {
"cardinality": {
"field": "_index"
}
},
"filter_only_differences": {
"bucket_selector": {
"buckets_path": {
"count": "containied_in_indices_count"
},
"script": "params.count < 2"
}
}
}
}
}
}
Then you'll need to iterate over all buckets in group_by_id aggregation. Consider using a larger size, as it´s 10 by default and 1000 in my example. If there are more differences in your indicies you need to use bucket partitioning as described here: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#_filtering_values_with_partitions

Related

How to do proportions in Elastic search query

I have a field in my data that has four unique values for all the records. I have to aggregate the records based on each unique value and find the proportion of each field in the data. Essentially, (Number of records in each unique field/total number of records). Is there a way to do this with elastic search dashboards? I have used terms aggregation to aggregate the fields and applied value_count metric aggregation to get the doc_count value. But I am not able to use the bucket script to do the division. I am getting the error ""buckets_path must reference either a number value or a single value numeric metric aggregation, got: [StringTerms] at aggregation [latest_version]""
Below is my code:
{
"size": 0,
"aggs": {
"BAR": {
"date_histogram": {
"field": "timestamp",
"calendar_interval": "day"
},
"aggs": {
"latest_version": {
"filter": {
"match_phrase": {
"log": "main_filter"
}
},
"aggs": {
"latest_version_count": {
"terms": {
"field": "field_name"
},
"aggs": {
"version_count": {
"value_count": {
"field": "field_name"
}
}
}
},
"sum_buckets": {
"sum_bucket": {
"buckets_path": "latest_version_count>_count"
}
}
}
},
"BAR-percentage": {
"bucket_script": {
"buckets_path": {
"eachVersionCount": "latest_version>latest_version_count",
"totalVersionCount": "latest_version>sum_buckets"
},
"script": "params.eachVersionCount/params.totalVersionCount"
}
}
}
}
}
}

Elasticsearch bucket_selector to include parent aggregation size

I found this query in elasticsearch docs
link to docs
GET /seats/_search
{
"size": 0,
"aggs": {
"theatres": {
"terms": {
"field": "theatre",
"size": 10
},
"aggs": {
"max_cost": {
"max": {
"field": "cost"
}
},
"filtering_agg": {
"bucket_selector": {
"buckets_path": {
"max": "max_cost"
},
"script": {
"params": {
"base_cost": 5
},
"source": "params.max + params.base_cost > 10"
}
}
}
}
}
}
}
here size is 10 in theatre parent aggregation and inside child aggregation we have condition on bucket, so if parent aggregation gives 10 documents and bucket_selector filter removes 4 documents then final result has 6 documents, i want final result to have 10 documents as mentioned in parent aggregation size field, is there any way to achieve this. can we include size after bucket_selector and remove size for parent aggregation.

How to get specific _source fields in aggregation

I am exploring ElasticSearch, to be used in an application, which will handle large volumes of data and generate some statistical results over them. My requirement is to retrieve certain statistics for a particular field. For example, for a given field, I would like to retrieve its unique values and document frequency of each value, along-with the length of the value. The value lengths are indexed along-with each document.
So far, I have experimented with Terms Aggregation, with the following query:
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"type_count": {
"terms": {
"field": "val.keyword",
"size": 100
}
}
}
}
The query returns all the values in the field val with the number of documents in which each value occurs. I would like the field val_len to be returned as well. Is it possible to achieve this using ElasticSearch? In other words, is it possible to include specific _source fields in buckets? I have looked through the documentation available online, but I haven't found a solution yet.
Hoping somebody could point me in the right direction. Thanks in advance!
I tried to include _source in the following manners:
"aggs": {
"type_count": {
"terms": {
"field": "val.keyword",
"size": 100
},
"_source":["val_len"]
}
}
and
"aggs": {
"type_count": {
"terms": {
"field": "val.keyword",
"size": 100,
"_source":["val_len"]
}
}
}
But I guess this isn't the right way, because both gave me parsing errors.
You need to use another sub-aggregation called top_hits, like this:
"aggs": {
"type_count": {
"terms": {
"field": "val.keyword",
"size": 100
},
"aggs": {
"hits": {
"top_hits": {
"_source":["val_len"],
"size": 1
}
}
}
}
}
Another way of doing it is to use another avg sub-aggregation so you can sort on it, too
"aggs": {
"type_count": {
"terms": {
"field": "val.keyword",
"size": 100,
"order": {
"length": "desc"
}
},
"aggs": {
"length": {
"avg": {
"field": "val_len"
}
}
}
}
}

Get all documents from elastic search with a field having same value

Say I have documents of type Order and they have a field bulkOrderId. Bulkorderid represents a group or bulk of orders issued at once. They all have the same Id like this :
Order {
bulkOrderId": "bulkOrder:12345678";
}
The id is unique and is generated using UUID.
How do I find groups of orders with the same bulkOrderId from elasticsearch when the bulkOrderId is not known? Is it possible?
You can achieve that using a terms aggregation and a top_hits sub-aggregation, like this:
{
"query": {
"match_all": {}
},
"aggs": {
"bulks": {
"terms": {
"field": "bulkOrderId",
"size": 10
},
"aggs": {
"orders": {
"top_hits": {
"size": 10
}
}
}
}
}
}

How to count number of objects in a nested field in elastic search?

How to count number of objects in a nested filed in elastic search?
Sample mapping :
"base_keywords": {
"type": "nested",
"properties": {
"base_key": {
"type": "text"
},
"category": {
"type": "text"
},
"created_at": {
"type": "date"
},
"date": {
"type": "date"
},
"rank": {
"type": "integer"
}
}
}
I would like to count number of objects in nested filed 'base_keywords'.
You would need to do this with inline script. This is what worked for me: (Using ES 6.x):
GET your-indices/_search
{
"aggs": {
"whatever": {
"sum": {
"script": {
"inline": "params._source.base_keywords.size()"
}
}
}
}
}
Aggs are normally good for counting and grouping, for nested documents you can use nested aggs:
"aggs": {
"MyAggregation1": {
"terms": {
"field": "FieldA",
"size": 0
},
"aggs": {
"BaseKeyWords": {
"nested": { "path": "base_keywords" },
"aggs": {
"BaseKeys": {
"terms": {
"field": "base_keywords.base_key.keyword",
"size": 0
}
}
}
}
}
}
}
You don't specify what you want to count, but aggs are quite flexible for grouping and counting data.
The "doc_count" and "key" behave similar to an sql group by + count()
Updated (This assumes you have a .keyword field create the "keys" values, since a property of type "text" can't be aggregated or counted:
{
"aggs": {
"MyKeywords1Agg": {
"nested": { "path": "keywords1" },
"aggs": {
"NestedKeywords": {
"terms": {
"field": "keywords1.keys.keyword",
"size": 0
}
}
}
}
}
}
For simply counting the number of nested keys you could simply do this:
{
"aggs": {
"MyKeywords1Agg": {
"nested": { "path": "keywords1" }
}
}
}
If you want to get some grouping on the field values on the "main" document or the nested documents, you will have to extend your mapping / data model to include terms that are aggregatable, which includes most data types in elasticsearch except "text", ex.: dates, numbers, geolocations, keywords.
Edit:
Example with aggregating on a unique identifier for each top level document, assuming you have a property on it called "WordMappingId" of type integer
{
"aggs": {
"word_maping_agg": {
"terms": {
"field": "WordMappingId",
"size": 0,
"missing": -1
},
"aggs": {
"Keywords1Agg": null,
"nested": { "path": "keywords1" }
}
}
}
}
If you don't add any properties to the "word_maping" document on the top level there is no way to do an aggregation for each unique document. The builtin _id field is by default not aggregateable, and I suggest you include a unique identifier from the source data on the top level to aggregate on.
Note: the "missing" parameter will put all documents that don't have the WordMappingId property set in a bucked with the supplied value, this makes sure you're not missing any documents in the search results.
Aggs can support a behaviour similar to a group by in SQL, but you need something to actually group it by, and according to the mapping you supplied there are no such fields currently in your index.
I was trying to do similar to understand production data distribution
The following query helped me find top 5
{
"query": {
"match_all": {}
},
"aggs": {
"n_base_keywords": {
"nested": { "path": "base_keywords" },
"aggs": {
"top_count": { "terms": { "field": "_id", "size" : 5 } }
}
}
}
}

Resources