Elasticsearch single request to do Union query Top N

I'm not sure how to do a SQL-like union in Elasticsearch. I tried a bool query, but it doesn't meet my requirement. For example, the document structure is:
{
"id": "123",
"authorId": 28,
"title": "Five Ways to Tap into...",
"byLine": "ashd jsabbdjs international",
"category": "Cat1"
}
I need to find the top 5 matching "title" values in each "category" when the user types something. This can be done with multiple queries to Elasticsearch, but I was wondering whether there is a way to do it in one request.

Use a terms aggregation with a top_hits sub-aggregation:
{
"size": 0,
"query": {"match_all": {}},
"aggs": {
"categories": {
"terms": {
"field": "category",
"size": 10
},
"aggs": {
"top_5": {
"top_hits": {
"size": 5
}
}
}
}
}
}

Here is a query that returns multiple buckets based on "category":
{
"size": 0,
"query": {
"bool": {
"must": [
{
"terms": {
"authorId": [
1,
28
]
}
}
],
"should": [
{
"query_string": {
"query": "*int*",
"fields": [
"title^2",
"byLine^1"
]
}
}
]
}
},
"aggs": {
"categories": {
"terms": {
"field": "category",
"size": 10
},
"aggs": {
"top_5": {
"top_hits": {
"size": 5
}
}
}
}
}
}
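To read the result back, here is a minimal Python sketch. It assumes the response has the usual terms plus top_hits shape (aggregations, then categories.buckets, each bucket carrying top_5.hits.hits). The function name and the sample response are made up for illustration and are not from the thread:

```python
from typing import Any, Dict, List


def top_titles_by_category(response: Dict[str, Any],
                           agg_name: str = "categories",
                           hits_name: str = "top_5") -> Dict[str, List[str]]:
    """Map each category bucket key to the titles of its top hits."""
    result: Dict[str, List[str]] = {}
    buckets = response.get("aggregations", {}).get(agg_name, {}).get("buckets", [])
    for bucket in buckets:
        # Each bucket embeds a regular hits section under the top_hits name.
        hits = bucket[hits_name]["hits"]["hits"]
        result[bucket["key"]] = [hit["_source"]["title"] for hit in hits]
    return result


# Hand-written sample shaped like a search response (illustrative, not real data).
sample_response = {
    "aggregations": {
        "categories": {
            "buckets": [
                {"key": "Cat1", "doc_count": 1, "top_5": {"hits": {"hits": [
                    {"_source": {"title": "Five Ways to Tap into...", "category": "Cat1"}},
                ]}}},
                {"key": "Cat2", "doc_count": 0, "top_5": {"hits": {"hits": []}}},
            ]
        }
    }
}

print(top_titles_by_category(sample_response))
```

Each category comes back as its own bucket with up to 5 hits, which is the closest equivalent to a per-category "union" in a single request.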

Related

How to convert ElasticSearch query to ES7

We are having a tremendous amount of trouble converting an old ElasticSearch query to a newer version of ElasticSearch. The original query for ES 1.8 is:
{
"query": {
"filtered": {
"query": {
"query_string": {
"query": "*",
"default_operator": "AND"
}
},
"filter": {
"and": [
{
"terms": {
"organization_id": [
"fred"
]
}
}
]
}
}
},
"size": 50,
"sort": {
"updated": "desc"
},
"aggs": {
"status": {
"terms": {
"size": 0,
"field": "status"
}
},
"tags": {
"terms": {
"size": 0,
"field": "tags"
}
}
}
}
and we are trying to convert it to ES version 7. Does anyone know how to do that?
The Elasticsearch docs for the Filtered query in 6.8 (the latest version of the docs I can find that has the page) state that you should move the query and filter to the must and filter parameters of the bool query.
Also, the terms aggregation no longer supports setting size to 0 to get Integer.MAX_VALUE. If you really want all the terms, you need to set it to the max value (2147483647) explicitly. However, the documentation for Size recommends using the Composite aggregation and paginating through the terms instead.
Below is the closest query I could make to the original that will work with Elasticsearch 7.
{
"query": {
"bool": {
"must": {
"query_string": {
"query": "*",
"default_operator": "AND"
}
},
"filter": {
"terms": {
"organization_id": [
"fred"
]
}
}
}
},
"size": 50,
"sort": {
"updated": "desc"
},
"aggs": {
"status": {
"terms": {
"size": 2147483647,
"field": "status"
}
},
"tags": {
"terms": {
"size": 2147483647,
"field": "tags"
}
}
}
}
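For the paginated alternative mentioned above, here is a rough Python sketch of a composite aggregation request. The function name, the aggregation name all_values, the page size and the example after_key value are assumptions for illustration. The after value for each following page should be taken from the after_key returned in the previous response:

```python
import json
from typing import Any, Dict, Optional


def composite_terms_page(field: str,
                         page_size: int = 1000,
                         after_key: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build a request body that fetches one page of distinct values of `field`."""
    composite: Dict[str, Any] = {
        "size": page_size,  # buckets per page, not a global cap
        "sources": [{field: {"terms": {"field": field}}}],
    }
    if after_key is not None:
        # Resume after the last bucket of the previous page.
        composite["after"] = after_key
    return {
        "size": 0,  # only the buckets are needed, not the hits
        "query": {"bool": {"filter": {"terms": {"organization_id": ["fred"]}}}},
        "aggs": {"all_values": {"composite": composite}},
    }


first_page = composite_terms_page("status")
# "example-value" stands in for whatever after_key the previous response returned.
next_page = composite_terms_page("status", after_key={"status": "example-value"})
print(json.dumps(next_page, indent=2))
```

You would keep requesting pages until a response comes back without an after_key, which avoids asking for 2147483647 buckets in one go.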

ES query ignoring time range filter

I have mimicked how Kibana does a query search and have come up with the query below. Basically, I'm looking for the last 6 days of data (including days with no data, since I need to feed it to a graph). But the returned buckets cover more than just those days. I would like to understand where I'm going wrong with this.
{
"version": true,
"size": 0,
"sort": [
{
"#timestamp": {
"order": "desc",
"unmapped_type": "boolean"
}
}
],
"_source": {
"excludes": []
},
"aggs": {
"target_traffic": {
"date_histogram": {
"field": "#timestamp",
"interval": "1d",
"time_zone": "Asia/Kolkata",
"min_doc_count": 0,
"extended_bounds": {
"min": "now-6d/d",
"max": "now"
}
},
"aggs": {
"days_filter": {
"filter": {
"range": {
"#timestamp": {
"gt": "now-6d",
"lte": "now"
}
}
},
"aggs": {
"in_bytes": {
"sum": {
"field": "netflow.in_bytes"
}
},
"out_bytes": {
"sum": {
"field": "netflow.out_bytes"
}
}
}
}
}
}
},
"stored_fields": [
"*"
],
"script_fields": {},
"docvalue_fields": [
"#timestamp",
"netflow.first_switched",
"netflow.last_switched"
],
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "( flow.src_addr: ( \"10.5.5.1\" OR \"10.5.5.2\" ) OR flow.dst_addr: ( \"10.5.5.1\" OR \"10.5.5.2\" ) ) AND flow.traffic_locality: \"private\"",
"analyze_wildcard": true,
"default_field": "*"
}
}
]
}
}
}
Because the range filter sits inside the aggregation section and the query itself has no date range, the date_histogram runs over all your data and creates a bucket for every day in it; the filter only limits which documents are summed within each bucket.
The range query on #timestamp should be moved into the query section so that the aggregations are computed only on the data you want, i.e. the last 6 days.
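As a sketch of that change, the Python below builds a trimmed-down request body with the range moved into the bool query. The function name and its parameters are assumptions for illustration. The field names, time zone and bounds are copied from the question, and the display-only parts (stored_fields, docvalue_fields and so on) are left out:

```python
import json
from typing import Any, Dict


def last_days_histogram(time_field: str, query_string: str, days: int = 6) -> Dict[str, Any]:
    """Daily traffic sums restricted to the last `days` days by the query itself."""
    return {
        "size": 0,
        "query": {
            "bool": {
                "must": [{"query_string": {"query": query_string, "analyze_wildcard": True}}],
                # The range now limits which documents reach the aggregations.
                "filter": [{"range": {time_field: {"gt": f"now-{days}d", "lte": "now"}}}],
            }
        },
        "aggs": {
            "target_traffic": {
                "date_histogram": {
                    "field": time_field,
                    # "interval" as in the question; newer versions use calendar_interval.
                    "interval": "1d",
                    "time_zone": "Asia/Kolkata",
                    "min_doc_count": 0,
                    # Still needed so that days without data appear as empty buckets.
                    "extended_bounds": {"min": f"now-{days}d/d", "max": "now"},
                },
                "aggs": {
                    "in_bytes": {"sum": {"field": "netflow.in_bytes"}},
                    "out_bytes": {"sum": {"field": "netflow.out_bytes"}},
                },
            }
        },
    }


body = last_days_histogram("#timestamp", 'flow.traffic_locality: "private"')
print(json.dumps(body, indent=2))
```

With the range in the query, the days_filter wrapper is no longer needed and the two sums can sit directly under the histogram.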

Elasticsearch - adding a separate query for aggregation

Below is the Elasticsearch query I am using to get the results, along with the filter options for those results from the aggregation. The problem is that whenever someone applies a filter, the overall result changes and hence the filter options also change. I do not want the filter options to change unless the query parameter changes. For now I am making two calls:
1. Get all results without aggregation
2. Get all filters by using aggregation and setting the size parameter to 0
This approach uses 2 API requests and hence doubles the time. Can this be done in one request only?
First call: all results without aggregation
{
"query": {
"bool": {
"should": [
{
"match_phrase": {
"title": {
"query": "cooking",
"boost": 2,
"slop": 10
}
}
},
{
"match": {
"title": {
"query": "cooking",
"boost": 1
}
}
}
],
"minimum_should_match": 1,
"filter": [
{
"match": {
"is_paid": false
}
}
]
}
},
"sort": [],
"from": 0,
"size": 15
}
Second call: getting filters
{
"query": {
"bool": {
"should": [
{
"match_phrase": {
"title": {
"query": "cooking",
"boost": 2,
"slop": 10
}
}
},
{
"match": {
"title": {
"query": "cooking",
"boost": 1
}
}
}
],
"minimum_should_match": 1
}
},
"size": 0,
"aggs": {
"courseCount": {
"terms": {
"field": "provider",
"size": 100
}
},
"paidCount": {
"terms": {
"field": "is_paid",
"size": 3
}
},
"subjectCount": {
"terms": {
"field": "subject",
"size": 30
}
},
"levelCount": {
"terms": {
"field": "level",
"size": 4
}
},
"pacingCount": {
"terms": {
"field": "pacing_type",
"size": 4
}
}
}
}
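One possible way to merge the two calls, which is not from the original thread, is the post_filter parameter: aggregations are computed on the main query, and post_filter then narrows only the returned hits. A minimal Python sketch, with a made-up function name and the field names copied from the question:

```python
import json
from typing import Any, Dict, List


def search_with_stable_facets(text: str, user_filters: List[Dict[str, Any]]) -> Dict[str, Any]:
    """One request: facets follow the text query, hits also follow the user's filters."""
    body: Dict[str, Any] = {
        "query": {
            "bool": {
                "should": [
                    {"match_phrase": {"title": {"query": text, "boost": 2, "slop": 10}}},
                    {"match": {"title": {"query": text, "boost": 1}}},
                ],
                "minimum_should_match": 1,
            }
        },
        # Aggregations see every document matching the text query.
        "aggs": {
            "courseCount": {"terms": {"field": "provider", "size": 100}},
            "paidCount": {"terms": {"field": "is_paid", "size": 3}},
            "subjectCount": {"terms": {"field": "subject", "size": 30}},
            "levelCount": {"terms": {"field": "level", "size": 4}},
            "pacingCount": {"terms": {"field": "pacing_type", "size": 4}},
        },
        "sort": [],
        "from": 0,
        "size": 15,
    }
    if user_filters:
        # Applied to the hits only, after the aggregations have been computed.
        body["post_filter"] = {"bool": {"filter": user_filters}}
    return body


body = search_with_stable_facets("cooking", [{"match": {"is_paid": False}}])
print(json.dumps(body, indent=2))
```

The trade-off is that the filters no longer reduce the set of documents the aggregations have to process, so this is worth measuring against the two-call approach.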

elasticsearch aggregation with filter from query

I'm new to Elasticsearch, so forgive me if my question is commonplace. I use Elasticsearch v2.2. With the following query:
{
"query": {
"bool": {
"must": {
"multi_match": {
"query": "nokia",
"fields": [
"*.right",
"*.correct_keyboard_layout"
],
"fuzziness": "AUTO"
}
},
"filter": [
{
"terms": {
"brands": ["Nokia"]
}
},
{
"terms": {
"models_id": ["2432", "5234"]
}
},
{
"terms": {
"stores": ["999"]
}
}
]
}
},
"aggs": {
"filtered": {
"aggs": {
"models_id": {
"terms": {
"field": "models_id",
"size": 0
}
},
"category_id": {
"terms": {
"field": "category_id",
"size": 0
}
}
}
}
}
}
The aggregation results ignore the filters in the request: they are computed over all records that match the query "nokia", so the response lists every model, whereas I only need the models selected by the filters. However, this page
https://www.elastic.co/guide/en/elasticsearch/guide/current/_filtering_queries_and_aggregations.html
says that the filter should be taken out of the request, and I do not understand why this does not work for me.
What am I doing wrong?

sorting elasticsearch top hits results

I am trying to execute a query in Elasticsearch to get the results of specific users from a certain date range. The results should be grouped by userId and sorted on the trackTime field. I am able to group using an aggregation, but I am not able to sort the aggregation buckets on trackTime. I wrote the following query:
GET _search
{
"size": 0,
"query": {
"filtered": {
"query": {
"bool": {
"must": [
{
"range": {
"trackTime": {
"from": "2016-02-08T05:51:02.000Z"
}
}
}
]
}
},
"filter": {
"terms": {
"userId": [
9,
10,
3
]
}
}
}
},
"aggs": {
"by_district": {
"terms": {
"field": "userId"
},
"aggs": {
"tops": {
"top_hits": {
"size": 2
}
}
}
}
}
}
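What else do I need to use to sort the top hits results? Thanks in advance.
You can add a sort to the top_hits aggregation like this, replacing fieldName with the field to sort on (trackTime in your case):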
"aggs": {
"by_district": {
"terms": {
"field": "userId"
},
"aggs": {
"tops": {
"top_hits": {
"sort": [
{
"fieldName": {
"order": "desc"
}
}
],
"size": 2
}
}
}
}
}
Hope it helps
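Applied to the question's fields, a small Python sketch could look like the following. The function name and the latest_track aggregation name are made up. Ordering the buckets themselves by their newest trackTime (via a max sub-aggregation) is an extra beyond the answer above, in case the user buckets should be sorted too:

```python
import json
from typing import Any, Dict


def user_top_hits_by_track_time(per_user: int = 2) -> Dict[str, Any]:
    """Build the aggs section: newest hits per user, users ordered by newest trackTime."""
    return {
        "by_district": {
            "terms": {
                "field": "userId",
                # Order buckets by the single-value metric computed below.
                "order": {"latest_track": "desc"},
            },
            "aggs": {
                "latest_track": {"max": {"field": "trackTime"}},
                "tops": {
                    "top_hits": {
                        # Sort the hits inside each bucket, newest first.
                        "sort": [{"trackTime": {"order": "desc"}}],
                        "size": per_user,
                    }
                },
            },
        }
    }


aggs = user_top_hits_by_track_time()
print(json.dumps({"size": 0, "aggs": aggs}, indent=2))
```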
