Select distinct values of bool query elastic search - elasticsearch

I have a query that gets me some user post data from an Elasticsearch index. I am happy with that query, but I need to make it return rows with unique usernames. Currently, it displays relevant posts by users, but it may display one user twice.
{
"query": {
"bool": {
"should": [
{ "match_phrase": { "gtitle": {"query": "voice","boost": 1}}},
{ "match_phrase": { "gdesc": {"query": "voice","boost": 1}}},
{ "match": { "city": {"query": "voice","boost": 2}}},
{ "match": { "gtags": {"query": "voice","boost": 1} }}
],"must_not": [
{ "term": { "profilepicture": ""}}
],"minimum_should_match" : 1
}
}
}
I have read about aggregations but didn't understand much (I also tried to use aggs, but that didn't work either). Any help is appreciated.

You would need to use a terms aggregation to get all unique users and then a top_hits aggregation to get only one result for each user. This is how it looks:
{
"query": {
"bool": {
"should": [
{
"match_phrase": {
"gtitle": {
"query": "voice",
"boost": 1
}
}
},
{
"match_phrase": {
"gdesc": {
"query": "voice",
"boost": 1
}
}
},
{
"match": {
"city": {
"query": "voice",
"boost": 2
}
}
},
{
"match": {
"gtags": {
"query": "voice",
"boost": 1
}
}
}
],
"must_not": [
{
"term": {
"profilepicture": ""
}
}
],
"minimum_should_match": 1
}
},
"aggs": {
"unique_user": {
"terms": {
"field": "userid",
"size": 100
},
"aggs": {
"only_one_post": {
"top_hits": {
"size": 1
}
}
}
}
},
"size": 0
}
Here the size inside the unique_user aggregation is 100; you can increase it if you have more unique users (the default is 10). The outermost size is zero so that only the aggregation results are returned. One important thing to remember is that your user ids have to be unique, i.e. ABC and abc will be considered different users; you might have to make your userid field not_analyzed to be sure about that.
Hope this helps!!
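As a rough sketch of how the response to the aggregation query above could be read on the client side: each bucket of unique_user carries one userid as its key, and its only_one_post sub-aggregation carries a single hit. The Python below assumes the response has already been parsed into a dict; the function name and the sample data are made up for illustration.

```python
def one_post_per_user(response):
    """Return {userid: _source of that user's top post} from a terms + top_hits response."""
    posts = {}
    for bucket in response["aggregations"]["unique_user"]["buckets"]:
        hits = bucket["only_one_post"]["hits"]["hits"]
        if hits:  # top_hits with size 1 yields at most one hit per bucket
            posts[bucket["key"]] = hits[0]["_source"]
    return posts


# Hypothetical, trimmed-down response used only to exercise the function.
sample_response = {
    "hits": {"total": 3, "hits": []},  # empty because the outer size is 0
    "aggregations": {
        "unique_user": {
            "buckets": [
                {
                    "key": "u1",
                    "doc_count": 2,
                    "only_one_post": {"hits": {"hits": [
                        {"_id": "10", "_source": {"userid": "u1", "gtitle": "voice lessons"}}
                    ]}},
                },
                {
                    "key": "u2",
                    "doc_count": 1,
                    "only_one_post": {"hits": {"hits": [
                        {"_id": "11", "_source": {"userid": "u2", "gtitle": "voice coach"}}
                    ]}},
                },
            ]
        }
    },
}

unique_posts = one_post_per_user(sample_response)
```

With size 0 on the outer request, the regular hits list is empty, so the buckets are the only place the posts appear.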

Related

How to return results from elasticsearch after a threshold match

I have two queries as follows:
The first query returns the count of all documents per domain.
The second query returns the count where a field is empty.
Later I filter the results in my backend: a domain is kept only if its count of documents with the missing field value exceeds a specific threshold; otherwise it is ignored. Could these two queries be combined, so that the threshold comparison is done in Elasticsearch and only the matching domains are returned?
The first query is as follows:
GET database/_search
{
"size": 0,
"query": {
"bool": {
"must": [
{
"term": {
"source": {
"value": "Web"
}
}
}
]
}
},
"aggs": {
"domains": {
"terms": {
"field": "domain_id"
}
}
}
}
The second query just applies a should filter as follows:
GET mapachitl/_search
{
"size": 0,
"query": {
"bool": {
"must": [
{
"term": {
"source": {
"value": "Web"
}
}
}
],
"should": [
{
"term": {
"address.city.keyword": {
"value": ""
}
}
},
{
"term": {
"address.zip.keyword": {
"value": ""
}
}
}
],
"minimum_should_match": 1
}
},
"aggs": {
"domains": {
"terms": {
"field": "domain_id"
}
}
}
}
Can I return only those domains where the ratio of documents missing city or zip code is more than 25%? I read about scripting, but I am not sure how I can use it here.
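One possible way to fold both queries into a single request, offered as an untested sketch rather than a confirmed answer: keep only the source filter in the query, count the documents with an empty city or zip in a filter sub-aggregation under each domain bucket, and let a bucket_selector pipeline aggregation drop the domains that do not exceed the threshold. The aggregation names missing_location and over_threshold and the Python helper names are made up for illustration, and the script assumes an Elasticsearch version whose scripts read values through params.

```python
def build_threshold_query(threshold=0.25):
    """Build one request body that keeps only domains above the missing-data ratio."""
    missing_filter = {
        "bool": {
            "should": [
                {"term": {"address.city.keyword": {"value": ""}}},
                {"term": {"address.zip.keyword": {"value": ""}}},
            ],
            "minimum_should_match": 1,
        }
    }
    return {
        "size": 0,
        "query": {"bool": {"must": [{"term": {"source": {"value": "Web"}}}]}},
        "aggs": {
            "domains": {
                "terms": {"field": "domain_id"},
                "aggs": {
                    # Documents in this domain with an empty city or zip.
                    "missing_location": {"filter": missing_filter},
                    # Keep the bucket only if missing / total > threshold.
                    "over_threshold": {
                        "bucket_selector": {
                            "buckets_path": {
                                "missing": "missing_location>_count",
                                "total": "_count",
                            },
                            "script": "params.missing > %s * params.total" % threshold,
                        }
                    },
                },
            }
        },
    }


def domains_over_threshold(buckets, threshold=0.25):
    """Client-side version of the same comparison, for buckets with missing_location."""
    return [
        b["key"]
        for b in buckets
        if b["missing_location"]["doc_count"] > threshold * b["doc_count"]
    ]


body = build_threshold_query()
# Hypothetical buckets: domain 1 is 40% incomplete, domain 2 is 10% incomplete.
sample_buckets = [
    {"key": 1, "doc_count": 10, "missing_location": {"doc_count": 4}},
    {"key": 2, "doc_count": 10, "missing_location": {"doc_count": 1}},
]
kept = domains_over_threshold(sample_buckets)
```

The comparison is written as a multiplication instead of a division so that it does not depend on how the script engine divides the two counts.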

Elasticsearch - adding a separate query for aggregation

Below is the Elasticsearch query I am using to get the results, along with the filter options for those results from the aggregation. The problem is that whenever someone applies a filter, the overall result changes and hence the filter options also change. I do not want the filter options to change unless the query parameter changes. For now I am making two calls:
Get all results without aggregation
Get all filters by using aggregation and setting the size parameter to 0
This approach uses two API requests and hence doubles the time. Can this be done in one request only?
First call: All results without aggregation
{
"query": {
"bool": {
"should": [
{
"match_phrase": {
"title": {
"query": "cooking",
"boost": 2,
"slop": 10
}
}
},
{
"match": {
"title": {
"query": "cooking",
"boost": 1
}
}
}
],
"minimum_should_match": 1,
"filter": [
{
"match": {
"is_paid": false
}
}
]
}
},
"sort": [],
"from": 0,
"size": 15
}
Second call: getting filters
{
"query": {
"bool": {
"should": [
{
"match_phrase": {
"title": {
"query": "cooking",
"boost": 2,
"slop": 10
}
}
},
{
"match": {
"title": {
"query": "cooking",
"boost": 1
}
}
}
],
"minimum_should_match": 1
}
},
"size": 0,
"aggs": {
"courseCount": {
"terms": {
"field": "provider",
"size": 100
}
},
"paidCount": {
"terms": {
"field": "is_paid",
"size": 3
}
},
"subjectCount": {
"terms": {
"field": "subject",
"size": 30
}
},
"levelCount": {
"terms": {
"field": "level",
"size": 4
}
},
"pacingCount": {
"terms": {
"field": "pacing_type",
"size": 4
}
}
}
}
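One way the two calls might be merged, sketched under the assumption that the filter options should reflect the text query only: put the aggregations next to the unfiltered query and move the user-selected filter into post_filter, which narrows the returned hits after the aggregations have been computed. The helper name below is hypothetical, and only two of the five aggregations are shown to keep it short.

```python
def build_single_request(text, selected_filters, size=15, offset=0):
    """Build one body: aggregations see the text query, hits also see the filters."""
    body = {
        "query": {
            "bool": {
                "should": [
                    {"match_phrase": {"title": {"query": text, "boost": 2, "slop": 10}}},
                    {"match": {"title": {"query": text, "boost": 1}}},
                ],
                "minimum_should_match": 1,
            }
        },
        "aggs": {
            "courseCount": {"terms": {"field": "provider", "size": 100}},
            "paidCount": {"terms": {"field": "is_paid", "size": 3}},
        },
        "from": offset,
        "size": size,
    }
    if selected_filters:
        # Applied to the hits only, so the aggregation counts stay unchanged.
        body["post_filter"] = {"bool": {"filter": selected_filters}}
    return body


request = build_single_request("cooking", [{"match": {"is_paid": False}}])
unfiltered = build_single_request("cooking", [])
```

The trade-off is that a post filter does not reduce the set of documents the query and aggregations work on, so it is worth using only when the aggregations really need the unfiltered set.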

Elasticsearch aggregation not being applied to filters

Here is my query. I am trying to get all products that are in both the "men_fashion" and "men_shoes" categories (categories are being used as terms/tags). Then I want to query the whole result set and search for products that have "men boots yellow" in them.
The query below works perfectly fine, but I am not getting the correct aggregation results. It gives me all the brands, whereas I am only interested in the brands of the filtered products.
{
"size": 15,
"from": 0,
"query": {
"query_string": {
"query": "men boots yellow"
}
},
"filter": {
"bool": {
"must": [{
"match": {
"active": 1
}
}, {
"match": {
"category": "men_fashion"
}
}, {
"match": {
"category": "men_shoes"
}
}]
}
},
"aggs": {
"brands": {
"terms": {
"size": 100,
"field": "brand"
}
}
}
}
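I think this might be due to the filter I have applied, but if this is somehow complicated I am OK with using a simple query that would achieve this without the filters.
You're using a post filter instead of a normal query filter. A post filter is applied to the hits only after the aggregations have been computed, so the brands aggregation still sees every document that matches the query. Try it like this instead: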
{
"size": 15,
"from": 0,
"query": {
"bool": {
"must": {
"query_string": {
"query": "men boots yellow"
}
},
"filter": [
{
"match": {
"active": 1
}
},
{
"match": {
"category": "men_fashion"
}
},
{
"match": {
"category": "men_shoes"
}
}
]
}
},
"aggs": {
"brands": {
"terms": {
"size": 100,
"field": "brand"
}
}
}
}

ElasticSearch How to AND a nested query

I am trying to figure out how to AND my Elastic Search query. I've tried a few different variations but I am always hitting a parser error.
What I have is a structure like this:
{
"title": "my title",
"details": [
{ "name": "one", "value": 100 },
{ "name": "two", "value": 21 }
]
}
I have defined details as a nested type in my mappings. What I'm trying to achieve is a query where it matches a part of the title and it matches various details by the detail's name and value.
I have the following query which gets me nearly there but I haven't been able to figure out how to AND the details. As an example I'd like to find anything that has:
detail of one with value less than or equal to 100
AND detail of two with value less than or equal to 25
The following query only allows me to search by one detail name/value:
"query" : {
"bool": {
"must": [
{ "match": {"title": {"query": titleQuery, "operator": "and" } } },
{
"nested": {
"path": "details",
"query": {
"bool": {
"must": [
{ "match": {"details.name" : "one"} },
{ "range": {"details.value" : { "lte": 100 } } }
]
}
}
} // nested
}
] // must
}
}
As a second question, would it be better to query the title and then move the nested part of the query into a filter?
You were so close! Just add another "nested" clause in your outer "must":
POST /test_index/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"title": {
"query": "title",
"operator": "and"
}
}
},
{
"nested": {
"path": "details",
"query": {
"bool": {
"must": [
{"match": {"details.name": "one" } },
{ "range": { "details.value": { "lte": 100 } } }
]
}
}
}
},
{
"nested": {
"path": "details",
"query": {
"bool": {
"must": [
{"match": {"details.name": "two" } },
{ "range": { "details.value": { "lte": 25 } } }
]
}
}
}
}
]
}
}
}
Here is some code I used to test it:
http://sense.qbox.io/gist/1fc30d49a810d22e85fa68d781114c2865a7c92e
EDIT: Oh, the answer to your second question is "yes", though if you're using 2.0 things have changed a little.
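If the list of details to match varies, the request body can be generated instead of written by hand. A minimal Python sketch of the same idea, one nested clause per detail inside the outer must; the helper names are made up for illustration:

```python
def nested_detail_clause(name, max_value):
    """One nested clause: a single details entry must match both name and value."""
    return {
        "nested": {
            "path": "details",
            "query": {
                "bool": {
                    "must": [
                        {"match": {"details.name": name}},
                        {"range": {"details.value": {"lte": max_value}}},
                    ]
                }
            },
        }
    }


def build_query(title, detail_limits):
    """AND the title match with one nested clause per (name, max_value) pair."""
    must = [{"match": {"title": {"query": title, "operator": "and"}}}]
    must.extend(nested_detail_clause(name, limit) for name, limit in detail_limits)
    return {"query": {"bool": {"must": must}}}


query = build_query("title", [("one", 100), ("two", 25)])
```

Each pair gets its own nested clause because a single nested query is evaluated against one details entry at a time, and no single entry can be named both "one" and "two".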

How to Boost a field based on condition in ElasticSearch

I have a query structure like this:
{
"sort": {},
"query": {
"bool": {
"should": [
{
"match_phrase": {
"user_categories": "Grant Writing"
}
},
{
"match_phrase": {
"user_agencies": "Census"
}
},
{
"match_phrase": {
"user_agencies": "MDA"
}
},
{
"match_phrase": {
"user_agencies": "OSD"
}
}
]
}
},
"size": 500,
"from": 0
}
Suppose this returns a list of 10 users.
What I need is for the users with the agency 'Census' to come first in the search results (i.e. boost the results that have Census as an agency). How can we do this?
The following will do it. I converted some of the match_phrase queries to match queries, as they contain only single terms:
{
"sort": {},
"query": {
"bool": {
"should": [
{
"match_phrase": {
"user_categories": "Grant Writing"
}
},
{
"match": {
"user_agencies": {
"query": "Census",
"boost": 3
}
}
},
{
"match": {
"user_agencies": {
"query": "MDA"
}
}
},
{
"match": {
"user_agencies": {
"query": "OSD"
}
}
}
]
}
},
"size": 500,
"from": 0
}
You should boost at query time, and give a big boost to documents with "Census" in the agency field. If the boost is high enough, a document matching "Census" will always be on top, regardless of the values of the other fields.
{
"sort": {},
"query": {
"bool": {
"should": [
{
"match_phrase": {
"user_categories": "Grant Writing"
}
},
{
"match_phrase": {
"user_agencies": {
"query": "Census",
"boost": 10
}
}
},
{
"match_phrase": {
"user_agencies": "MDA"
}
},
{
"match_phrase": {
"user_agencies": "OSD"
}
}
]
}
},
"size": 500,
"from": 0
}
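When the boosted agency changes from request to request, the should clauses can be generated. A small Python sketch, assuming the boost goes inside the field object next to query; the helper names and the boosts mapping are made up for illustration:

```python
def agency_clause(agency, boost=None):
    """match_phrase on user_agencies, with an optional query-time boost."""
    field_options = {"query": agency}
    if boost is not None:
        field_options["boost"] = boost
    return {"match_phrase": {"user_agencies": field_options}}


def build_user_query(category, agencies, boosts, size=500):
    """should = one category clause plus one clause per agency, boosted where requested."""
    should = [{"match_phrase": {"user_categories": category}}]
    should += [agency_clause(agency, boosts.get(agency)) for agency in agencies]
    return {"query": {"bool": {"should": should}}, "size": size, "from": 0}


user_query = build_user_query(
    "Grant Writing", ["Census", "MDA", "OSD"], boosts={"Census": 10}
)
```

A boost only scales the score contribution of its own clause, so how large it must be to keep "Census" on top depends on how much the other clauses can contribute.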
