How to select distinct children in Elasticsearch

I have this query:
"query": {
  "bool": {
    "filter": {
      "has_parent": {
        "parent_type": "profiles",
        "query": {
          "query_string": {
            "query": "age:>0 and user:aqwe"
          }
        }
      }
    }
  }
},
"sort": ["user", {"createdAt": "asc"}]
As a result I get multiple items with the same '_id'. I think this is some kind of problem with the join. How can I edit this query to select distinct items?

If you want to return only unique values, you can use a terms aggregation. In your case it would look like this (size is the maximum number of unique ids you want returned):
"query": {
  "bool": {
    "filter": {
      "has_parent": {
        "parent_type": "profiles",
        "query": {
          "query_string": {
            "query": "age:>0 and user:aqwe"
          }
        }
      }
    }
  }
},
"aggs": {
  "unique": {
    "terms": {
      "field": "_id",
      "size": 100
    }
  }
},
"sort": ["user", {"createdAt": "asc"}]
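If the duplicates still show up in the hits list, they can also be dropped on the client. The following is only a minimal sketch of that fallback, not part of the answer above: it assumes the response has already been parsed into Python dictionaries, and `dedupe_hits` and `sample_hits` are hypothetical names used for illustration.

```python
from typing import Any, Dict, List


def dedupe_hits(hits: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep the first hit for each _id, preserving the order of the response."""
    seen = set()
    unique = []
    for hit in hits:
        doc_id = hit["_id"]
        if doc_id in seen:
            continue  # an earlier hit with this _id was already kept
        seen.add(doc_id)
        unique.append(hit)
    return unique


# Hypothetical data shaped like response["hits"]["hits"].
sample_hits = [
    {"_id": "1", "_source": {"user": "aqwe"}},
    {"_id": "1", "_source": {"user": "aqwe"}},
    {"_id": "2", "_source": {"user": "aqwe"}},
]
print([hit["_id"] for hit in dedupe_hits(sample_hits)])
```

Because the first occurrence is kept, the sort order requested in the query is preserved.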

Related

Why am I getting different results for the same exact query when I aggregate it in elasticsearch?

So I have this query and I am trying to aggregate on a certain field, but when I use the same query in the aggregation I don't get the expected results:
{
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "TEST",
"fields": ["TESTFIELD1", "TESTFIELD2"],
"lenient": true,
"default_operator": "OR"
}
}
]
}
},
"aggs": {
"All": {
"global": {},
"aggs": {
"TESTAGG": {
"filter": {
"bool": {
"must": [
{
"query_string": {
"query": "TEST",
"fields": ["TESTFIELD1", "TESTFIELD2"],
"lenient": true,
"default_operator": "OR"
}
}
]
}
},
"aggs": {
"subs": {
"terms": {
"field": "TESTFIELD1",
"size": 100,
"order": { "_term": "asc" }
}
}
}
}
}
}
}
}
The issue is that the aggregation returns values for TESTFIELD1 that don't exist in the hits of the main query, and I am not sure why. Any ideas?

ElasticSearch fetch child from parent

I want to get all children that have the value "sales" in some field and that belong to a parent with a specific unique id.
This is my query:
GET /_search
{
  "query": {
    "bool": {
      "should": [
        {
          "query_string": {
            "query": "sales"
          }
        },
        {
          "has_parent": {
            "type": "role_permission_parent",
            "query": {
              "match": {
                "resourceURI": "urn:module:com.qad.collaboration"
              }
            }
          }
        }
      ]
    }
  },
  "size": 100
}
I get children from different parents, not only from the one I specified. Why? And how do I fix it?
You must use a must clause instead of should:
GET /_search
{
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "query": "sales"
          }
        },
        {
          "has_parent": {
            "type": "role_permission_parent",
            "query": {
              "match": {
                "resourceURI": "urn:module:com.qad.collaboration"
              }
            }
          }
        }
      ]
    }
  },
  "size": 100
}
Otherwise it works like "parent match" OR "query_string match" instead of "parent match" AND "query_string match".
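The difference between the two clauses can be pictured with plain sets. This is only a toy model of the matching logic (scoring is ignored, and it covers a bool query that contains nothing but should clauses or nothing but must clauses); the document ids and the names `bool_should` and `bool_must` are made up for illustration.

```python
def bool_should(*clause_matches):
    """Only should clauses: a document matches if ANY clause matches (union)."""
    return set().union(*clause_matches)


def bool_must(*clause_matches):
    """Only must clauses: a document matches if EVERY clause matches (intersection)."""
    return set.intersection(*map(set, clause_matches))


# Hypothetical child document ids.
text_matches = {"c1", "c2", "c3"}    # children whose text contains "sales"
parent_matches = {"c2", "c3", "c4"}  # children of the requested parent

print(sorted(bool_should(text_matches, parent_matches)))  # children of other parents leak in
print(sorted(bool_must(text_matches, parent_matches)))    # only the requested parent's children
```

In the should case, "c1" is returned even though it belongs to a different parent, which is the behaviour described in the question.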

Get unique values of a field in elasticsearch and get all records

I am using Elasticsearch 2.x.
I want the distinct values of a field, but I am getting only 10 values from the query.
How do I change this to view all distinct records?
This is the query I am using:
GET messages-2017.04*/_search
{
"fields": ["_index"],
"query": {
"bool": {
"must":{
"bool":{
"should": [
{
"match": {
"RouteData": {
"query": "Q25B",
"type": "phrase"
}
}
}
]
}
}
}
}
}
I need to get all distinct _index names from the DB.
You need to use a terms aggregation instead, like this:
POST messages-2017.04*/_search
{
"size": 0,
"query": {
"bool": {
"must":{
"bool":{
"should": [
{
"match": {
"RouteData": {
"query": "Q25B",
"type": "phrase"
}
}
}
]
}
}
}
},
"aggs": {
"all_indexes": {
"terms": {
"field": "_index",
"size": 100
}
}
}
}
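Once the response comes back, the distinct values are the bucket keys of the aggregation. Here is a small sketch of reading them, assuming the response has been parsed into a Python dictionary; `distinct_values`, `is_truncated` and `sample_response` are hypothetical names, and the sample data is invented.

```python
def distinct_values(response, agg_name):
    """Return the bucket keys of a terms aggregation, in response order."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [bucket["key"] for bucket in buckets]


def is_truncated(response, agg_name):
    """True if documents fell into buckets beyond the requested size."""
    return response["aggregations"][agg_name].get("sum_other_doc_count", 0) > 0


# Invented response fragment shaped like a terms aggregation result.
sample_response = {
    "hits": {"total": 42, "hits": []},
    "aggregations": {
        "all_indexes": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": "messages-2017.04.01", "doc_count": 30},
                {"key": "messages-2017.04.02", "doc_count": 12},
            ],
        }
    },
}

print(distinct_values(sample_response, "all_indexes"))
print(is_truncated(sample_response, "all_indexes"))
```

If `is_truncated` returns True, the size of the terms aggregation was too small to cover every distinct value and should be raised.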

Select distinct values of bool query elastic search

I have a query that gets me some user post data from an Elasticsearch index. I am happy with that query, but I need it to return rows with unique usernames. Currently it displays relevant posts by users, but it may display one user twice.
{
"query": {
"bool": {
"should": [
{ "match_phrase": { "gtitle": {"query": "voice","boost": 1}}},
{ "match_phrase": { "gdesc": {"query": "voice","boost": 1}}},
{ "match": { "city": {"query": "voice","boost": 2}}},
{ "match": { "gtags": {"query": "voice","boost": 1} }}
],"must_not": [
{ "term": { "profilepicture": ""}}
],"minimum_should_match" : 1
}
}
}
I have read about aggregations but didn't understand much (I also tried to use aggs, but that didn't work either). Any help is appreciated.
You would need to use a terms aggregation to get all unique users and then a top_hits aggregation to get only one result for each user. This is how it looks:
{
"query": {
"bool": {
"should": [
{
"match_phrase": {
"gtitle": {
"query": "voice",
"boost": 1
}
}
},
{
"match_phrase": {
"gdesc": {
"query": "voice",
"boost": 1
}
}
},
{
"match": {
"city": {
"query": "voice",
"boost": 2
}
}
},
{
"match": {
"gtags": {
"query": "voice",
"boost": 1
}
}
}
],
"must_not": [
{
"term": {
"profilepicture": ""
}
}
],
"minimum_should_match": 1
}
},
"aggs": {
"unique_user": {
"terms": {
"field": "userid",
"size": 100
},
"aggs": {
"only_one_post": {
"top_hits": {
"size": 1
}
}
}
}
},
"size": 0
}
Here the size inside the unique_user aggregation is 100; you can increase it if you have more unique users (the default is 10). The outermost size is zero so that only aggregation results are returned. One important thing to remember is that the aggregation buckets on the exact terms of userid, i.e. ABC and abc will be considered different users; you might have to make your userid field not_analyzed to be sure about that.
Hope this helps!!
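With the outermost size set to zero, the posts have to be read out of the aggregation instead of the hits list. The following is a rough sketch of that step, assuming the response is already a Python dictionary and uses the aggregation names from the query above (unique_user and only_one_post); `one_post_per_user` and the sample data are hypothetical.

```python
def one_post_per_user(response):
    """Collect the single top hit stored in each unique_user bucket."""
    posts = []
    for bucket in response["aggregations"]["unique_user"]["buckets"]:
        top = bucket["only_one_post"]["hits"]["hits"]
        if top:  # a bucket always has documents, but guard against an empty list
            posts.append({"userid": bucket["key"], "post": top[0]["_source"]})
    return posts


# Invented response fragment shaped like a terms + top_hits result.
sample_response = {
    "aggregations": {
        "unique_user": {
            "buckets": [
                {
                    "key": "u1",
                    "doc_count": 2,
                    "only_one_post": {
                        "hits": {"hits": [{"_id": "p1", "_source": {"gtitle": "voice lessons"}}]}
                    },
                },
                {
                    "key": "u2",
                    "doc_count": 1,
                    "only_one_post": {
                        "hits": {"hits": [{"_id": "p7", "_source": {"gtitle": "voice over"}}]}
                    },
                },
            ]
        }
    }
}

for row in one_post_per_user(sample_response):
    print(row["userid"], row["post"]["gtitle"])
```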

How to distinguish hits of several should clauses

I have a query with several "should" clauses:
{
  "query": {
    "filtered": {
      "query": {
        "bool": {
          "should": [
            {
              "query_string": {
                "query": "<condition1>"
              }
            },
            {
              "query_string": {
                "query": "<condition2>"
              }
            }
          ]
        }
      }
    }
  },
  "size": 1000,
  "sort": [
    {
      "#timestamp": {
        "order": "asc"
      }
    }
  ]
}
How can I find out which query results were produced by condition1, and which by condition2? Is it possible to inject a field with different values for different conditions, or distinguish hits in any other way?
You can use named queries to achieve this.
{
  "query": {
    "bool": {
      "should": [
        {
          "query_string": {
            "query": "<condition1>",
            "_name": "sub_query_1"
          }
        },
        {
          "query_string": {
            "query": "<condition2>",
            "_name": "sub_query_2"
          }
        }
      ]
    }
  }
}
Each hit in your result will then contain a matched_queries array (named matched_filters in Elasticsearch versions before 1.0) with either sub_query_1, sub_query_2, or both in it.
Update
Play link: https://www.found.no/play/gist/af1a1fa2b5cf3aa279b1
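On the client, the hits can then be grouped by the names they matched. This is only a sketch under the assumption that each parsed hit carries the list of names under the key matched_queries; the key is a parameter so that a different name, such as matched_filters, can be passed if that is what your version returns. `group_by_matched_query` and `sample_hits` are hypothetical.

```python
from collections import defaultdict


def group_by_matched_query(hits, key="matched_queries"):
    """Map each query name to the ids of the hits that matched it."""
    groups = defaultdict(list)
    for hit in hits:
        for name in hit.get(key, []):
            groups[name].append(hit["_id"])
    return dict(groups)


# Invented hits shaped like response["hits"]["hits"].
sample_hits = [
    {"_id": "a", "matched_queries": ["sub_query_1"]},
    {"_id": "b", "matched_queries": ["sub_query_2"]},
    {"_id": "c", "matched_queries": ["sub_query_1", "sub_query_2"]},
]
print(group_by_matched_query(sample_hits))
```

A hit that satisfies both conditions appears in both groups, which mirrors the "or both" case mentioned in the answer.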
