Finding the distinct values of a field in an Elasticsearch query

I need the values of only one field and there are duplicate values in it.
POST _search
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "report": {
              "value": "some_value"
            }
          }
        }
      ]
    }
  },
  "fields": [
    "field_name"
  ]
}
I need only the distinct values of field_name.

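Keep your query and add a terms aggregation on the field: each bucket key in the response is one distinct value. By default the terms aggregation returns only the top 10 buckets, so raise its "size" if you expect more values. If you also need a sample document for each value, you can nest a top_hits aggregation inside the terms aggregation.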
"aggs": {
  "values": {
    "terms": {
      "field": "your_field"
    }
  }
}
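A minimal sketch of the whole round trip, written in Python with the request body as a plain dict that you would serialize to JSON or hand to whichever client you use. The helper names, the `max_values` default of 100 and the sample response are hypothetical. It also assumes `field_name` is aggregatable, i.e. a keyword or numeric field; with default dynamic mapping that is usually the `field_name.keyword` sub-field, because a terms aggregation on a plain text field is rejected unless fielddata is enabled.

```python
import json


def distinct_values_body(field, report_value, max_values=100):
    # Hypothetical helper: same filter as the question, but no hits (size 0)
    # and a terms aggregation whose bucket keys are the distinct values.
    return {
        "size": 0,
        "query": {
            "bool": {
                "must": [{"term": {"report": {"value": report_value}}}]
            }
        },
        "aggs": {
            "values": {
                # Only the top 10 buckets are returned unless size is raised.
                "terms": {"field": field, "size": max_values}
            }
        },
    }


def distinct_values(response):
    # Each bucket key is one distinct value of the field.
    return [b["key"] for b in response["aggregations"]["values"]["buckets"]]


body = distinct_values_body("field_name", "some_value")
print(json.dumps(body, indent=2))

# Hand-written response in the shape Elasticsearch returns (illustration only).
sample_response = {
    "aggregations": {
        "values": {
            "buckets": [
                {"key": "alpha", "doc_count": 4},
                {"key": "beta", "doc_count": 1},
            ]
        }
    }
}
print(distinct_values(sample_response))  # ['alpha', 'beta']
```

Setting "size": 0 at the top level skips the hits entirely, because only the bucket keys are needed.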

Related

Query within a known subset of documents

What I want is to search only within a certain subset of documents, not the whole index. I already know the IDs of those documents. For example, I want to match some sentences against the 'pLabel' field, but only among documents whose IDs I obtained through a different process. My attempt is below, but it returns many documents that I did not expect.
For example, given groups of documents with IDs eid1, eid2, and so on, I want the query to return only the matching documents from those groups (eid1, eid2, eid3, ...). The query is shown below.
How do I fix the query to get the right search results?
{
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "default_field": "pLabel",
            "query": "search words here"
          }
        }
      ],
      "must_not": [],
      "should": [
        {
          "term": {
            "eid": "eid1"
          }
        },
        {
          "term": {
            "eid": "eid2"
          }
        }
      ]
    }
  },
  "size": 0,
  "_source": [
    "eid"
  ],
  "aggs": {
    "eids": {
      "terms": {
        "field": "eid",
        "size": 1000
      }
    }
  }
}
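You need to move the should clauses for the document IDs into the must clause.
Right now the query can return any document that matches the query_string clause. Because the bool query already has a must clause, the should clauses are optional and only boost the score of documents that match the IDs.
Also, use a single terms query instead of several term queries: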
{
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "default_field": "pLabel",
            "query": "search words here"
          }
        },
        {
          "terms": {
            "eid": ["eid1", "eid2"]
          }
        }
      ]
    }
  },
  "size": 0,
  "_source": [
    "eid"
  ],
  "aggs": {
    "eids": {
      "terms": {
        "field": "eid",
        "size": 1000
      }
    }
  }
}
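To see why the original should clauses did not restrict the results, here is a toy Python model of bool-query matching (scoring is ignored). It is a simplification and not Elasticsearch's implementation. It encodes the documented default for minimum_should_match: at least one should clause must match only when the bool query has no must or filter clauses. The documents and the stand-in clause functions are hypothetical, and `text_match` just checks for the substring "search" rather than running a real query_string.

```python
def bool_matches(doc, must=(), should=(), filters=(), must_not=()):
    """Simplified model of whether a bool query matches a document (no scoring)."""
    required = list(must) + list(filters)
    if not all(clause(doc) for clause in required):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    # With no must/filter clauses, at least one should clause has to match;
    # otherwise should clauses are optional and only raise the score.
    if should and not required:
        return any(clause(doc) for clause in should)
    return True


def text_match(doc):
    # Hypothetical stand-in for the query_string clause on pLabel.
    return "search" in doc["pLabel"]


def eid_in(*values):
    # Hypothetical stand-in for a term/terms query on the eid field.
    return lambda doc: doc["eid"] in values


docs = [
    {"eid": "eid1", "pLabel": "search words here"},
    {"eid": "eid2", "pLabel": "other text"},
    {"eid": "eid9", "pLabel": "search words here"},
]

# Original query: IDs in should, next to a must clause.
original = [d["eid"] for d in docs
            if bool_matches(d, must=[text_match], should=[eid_in("eid1"), eid_in("eid2")])]
# Fixed query: IDs moved into must as a single terms query.
fixed = [d["eid"] for d in docs
         if bool_matches(d, must=[text_match, eid_in("eid1", "eid2")])]

print(original)  # ['eid1', 'eid9']
print(fixed)     # ['eid1']
```

eid9 is not in the ID list, yet it leaks into the original results because its should clauses are optional. Once the IDs are required, it drops out.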

Ignore a "match" clause of the query in one aggregation

I have a query with aggregations. One of the aggregations is on the field starsCount. The query also has a clause that filters on the starsCount field, along with other match clauses (hidden for clarity).
I want the starsCount aggregation to ignore the starsCount filter, so that its result is the same as if I had run the query without the match clause on starsCount, while the other aggregations keep their current behavior.
Can this be done in a single query, or do I need multiple queries?
Here is the (simplified) query:
{
  [...]
  "aggs": {
    "group_by_service": {
      "comment": "keep current behaviour",
      "terms": {
        "field": "services",
        "size": 46
      }
    },
    "group_by_stars": {
      "comment": "ignore the filter on the starsCount field",
      "terms": {
        "field": "starsCount",
        "size": 100
      }
    }
  },
  "query": {
    "bool": {
      "must": [
        [...] filters on other properties, non-relevant
        {
          "match": {
            "starsCount": {
              "query": "2"
            }
          }
        }
      ]
    }
  }
}
Yes, you can do this in a single query by using post_filter together with filter aggregations.
Build the query in these steps:
1. Remove the starsCount match query from the main query, since it must not affect the group_by_stars aggregation.
2. The starsCount condition should still filter the returned documents, so move it to post_filter. A query inside post_filter filters the hits only after the aggregations have been calculated.
3. Because starsCount is no longer part of the main query, none of the aggregations are affected by it. However, the filter should still affect every aggregation except group_by_stars. To achieve this, wrap each of those other aggregations in a filter aggregation that applies the same condition.
The resulting query is shown below. (Note that I have used a term query instead of a match query. You can still use match, but for an exact value like this, term is the better choice.)
{
  "aggs": {
    "some_other_agg": {
      "filter": {
        "term": {
          "starsCount": "2"
        }
      },
      "aggs": {
        "some_other_agg_filtered": {
          "terms": {
            "field": "some_other_field"
          }
        }
      }
    },
    "group_by_service": {
      "filter": {
        "term": {
          "starsCount": "2"
        }
      },
      "aggs": {
        "group_by_service_filtered": {
          "terms": {
            "field": "services",
            "size": 46
          }
        }
      }
    },
    "group_by_stars": {
      "terms": {
        "field": "starsCount",
        "size": 100
      }
    }
  },
  "query": {
    "bool": {
      "must": [
        {...} //filter on other properties
      ]
    }
  },
  "post_filter": {
    "term": {
      "starsCount": "2"
    }
  }
}
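The order of evaluation is what makes this work, and a small Python simulation shows it. This is a simplified local model, not Elasticsearch code. The documents are hypothetical and are assumed to already match the main query. group_by_stars is counted over all of them, group_by_service is counted only inside the starsCount == 2 filter, and post_filter trims the hits last.

```python
from collections import Counter

# Hypothetical documents that already match the main "query".
docs = [
    {"id": 1, "starsCount": 2, "services": ["wifi", "parking"]},
    {"id": 2, "starsCount": 3, "services": ["wifi"]},
    {"id": 3, "starsCount": 2, "services": ["pool"]},
    {"id": 4, "starsCount": 5, "services": ["wifi", "pool"]},
]


def stars_filter(doc):
    # Same condition as the term query on starsCount.
    return doc["starsCount"] == 2


def run_search(docs):
    # group_by_stars: a plain terms aggregation sees every matched document.
    group_by_stars = Counter(d["starsCount"] for d in docs)
    # group_by_service: the filter aggregation narrows its sub-aggregation.
    group_by_service = Counter(
        s for d in docs if stars_filter(d) for s in d["services"]
    )
    # post_filter: applied to the hits only, after aggregations are computed.
    hits = [d["id"] for d in docs if stars_filter(d)]
    return hits, group_by_stars, group_by_service


hits, by_stars, by_service = run_search(docs)
print(hits)  # [1, 3]
print(dict(by_stars))
print(dict(by_service))
```

The hits and the service counts reflect only 2-star documents, while the stars histogram still shows the 3 and 5 buckets, so a user could switch to those.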

Elasticsearch search in documents with certain values for a field

I have an index whose documents have 5 fields, and I have written the following search query:
{
  "query": {
    "query_string": {
      "fields": [
        "field1.keyword",
        "field2.keyword",
        "field3.keyword"
      ],
      "query": "*abc*"
    }
  },
  "from": 0,
  "size": 1000
}
This works fine, but a new requirement is that I must search only in documents where field4 has one of a given set of values, say (1, 2, 3), and ignore the rest of the documents.
Alternatively, I can obtain the list of field4 values to omit, since they are stored in the database with a skip status.
Please suggest a solution. Thanks in advance.
I suggest putting a terms query in the filter clause of a bool query, so that only documents meeting the condition are searched. A filter clause restricts the results without affecting the score.
{
  "query": {
    "bool": {
      "must": {
        "query_string": {
          "fields": [
            "field1.keyword",
            "field2.keyword",
            "field3.keyword"
          ],
          "query": "*abc*"
        }
      },
      "filter": {
        "terms": {
          "field4.keyword": [1, 2, 3]
        }
      }
    }
  }
}
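Because the asker can also get the list of values to skip, here is a hedged Python sketch that builds either variant as a dict. An allow-list goes into a filter clause and a skip list goes into a must_not clause, both using a terms query on field4.keyword. The helper name `build_search` and the skip values are hypothetical. The values are passed as strings on the assumption that field4.keyword is a keyword sub-field, as the `.keyword` suffix suggests.

```python
import json


def build_search(text, include=None, exclude=None):
    # Hypothetical helper: the question's query_string, restricted by field4.
    body = {
        "query": {
            "bool": {
                "must": {
                    "query_string": {
                        "fields": ["field1.keyword", "field2.keyword", "field3.keyword"],
                        "query": text,
                    }
                }
            }
        },
        "from": 0,
        "size": 1000,
    }
    bool_query = body["query"]["bool"]
    if include:
        # Allow-list: keep only documents whose field4 is one of these values.
        bool_query["filter"] = {"terms": {"field4.keyword": list(include)}}
    if exclude:
        # Skip list: drop documents whose field4 is one of these values.
        bool_query["must_not"] = {"terms": {"field4.keyword": list(exclude)}}
    return body


only_allowed = build_search("*abc*", include=["1", "2", "3"])
without_skipped = build_search("*abc*", exclude=["4", "5"])  # hypothetical skip values
print(json.dumps(only_allowed, indent=2))
```

Like filter, must_not runs in filter context and does not contribute to the score, so either variant leaves the ranking of the query_string matches unchanged.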

Elasticsearch scoped aggregation does not give the desired results

I have the following query, but the aggregation doesn't seem to be acting on the query results.
The query returns 3 results, yet there are 10 buckets in the aggregation. It looks like the aggregation is acting on all results of the query.
Basically, how do I get the aggregation to take the given query's results as its input?
{
  "query": {
    "filtered": {
      "filter": {
        "and": [
          {
            "geo_distance": {
              "coordinates": [-79.3931, 43.6709],
              "distance": "15km"
            }
          },
          {
            "term": {
              "user.type": "2"
            }
          }
        ]
      },
      "query": {
        "match": {
          "user.shoes": "314"
        }
      }
    }
  },
  "aggs": {
    "dedup": {
      "terms": { "field": "user.id" },
      "aggs": {
        "dedup_docs": {
          "top_hits": {
            "size": 1
          }
        }
      }
    }
  }
}
As it turns out, I was expecting the aggregation to act on the paginated results returned by the query, and that's incorrect.
An aggregation takes as input all documents that match the query, not just the page of hits that is returned.
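A tiny Python model makes the distinction concrete. It is a simplification, not Elasticsearch code, and the data is hypothetical: 12 matching documents from 10 distinct users. from/size slice out the hits, but the terms aggregation counts over every matching document.

```python
from collections import Counter


def search(matching_docs, from_=0, size=10):
    # Hits are one page of the matching documents...
    hits = matching_docs[from_:from_ + size]
    # ...but the terms aggregation counts over all matching documents.
    buckets = Counter(doc["user_id"] for doc in matching_docs)
    return hits, buckets


# Hypothetical: 12 matching documents from 10 distinct users.
matching = [{"user_id": f"u{i % 10}"} for i in range(12)]
hits, buckets = search(matching, from_=0, size=3)
print(len(hits), len(buckets))  # 3 10
```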

How to check which values in a list have no matching documents in Elasticsearch

What query would retrieve the values from a list that are not found in the index?
This is my query:
$params['body']['query']["bool"]["filter"]["terms"]["field"] = (list);
I want to find which values in the list have no matching documents.
If my list has (A, B, C), I just need to know that C is not indexed. I don't need A, B, D, E, F, or any of the remaining documents in the index.
You can use a must_not clause to negate your query, as shown below:
GET my-index/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "must_not": {
            "terms": {
              "field": [
                "value-1", "value-2"
              ]
            }
          }
        }
      }
    }
  }
}
Combining must_not with an aggregation gives more detail about the field values you are not expecting:
{
  "_source": false,
  "query": {
    "bool": {
      "must_not": [
        {"term": {"aFieldName": "aFieldValue"}}
      ]
    }
  },
  "aggregations": {
    "byLocation": {
      "terms": {
        "field": "aFieldName"
      }
    }
  }
}
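A must_not query returns documents whose value is outside the list (D, E, F, ...). It does not directly report which listed values are absent. One alternative, sketched here in Python as an assumption rather than something from the answers above, keeps the asker's terms filter and adds a terms aggregation on the same field, with the bucket size set to the list length. Any listed value with no bucket has no documents. The helper names and the sample response are hypothetical. Bucket keys are strings for keyword fields and numbers for numeric fields, so the list should use the same type.

```python
import json


def missing_values_body(field, values):
    # Restrict to documents whose field is in the list, then bucket by that field.
    return {
        "size": 0,
        "query": {"bool": {"filter": {"terms": {field: list(values)}}}},
        "aggs": {
            # One bucket per listed value that has at least one document.
            "found": {"terms": {"field": field, "size": len(values)}}
        },
    }


def missing_values(values, response):
    # Listed values with no bucket have no matching documents.
    found = {b["key"] for b in response["aggregations"]["found"]["buckets"]}
    return [v for v in values if v not in found]


values = ["A", "B", "C"]
body = missing_values_body("field", values)
print(json.dumps(body, indent=2))

# Hand-written response in the shape Elasticsearch returns (illustration only).
sample = {
    "aggregations": {
        "found": {
            "buckets": [
                {"key": "A", "doc_count": 7},
                {"key": "B", "doc_count": 2},
            ]
        }
    }
}
print(missing_values(values, sample))  # ['C']
```

The number of missing values is then simply the length of the returned list.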
