Exclude documents from aggregation - elasticsearch

I am trying to get a filtered result set from my index.
{"group_id": 123, "type" : 1},
{"group_id": 123, "type" : 3},
{"group_id": 123, "type" : 2},
{"group_id": 423, "type" : 3},
{"group_id": 423, "type" : 1},
{"group_id": 231, "type" : 1}
Now I want to get all documents, but exclude every document whose group_id also appears on a document with type = 2. So, in this case, I want to get all documents with group_id = 423 and group_id = 231, but exclude all documents with group_id = 123.
I was experimenting with filtered bool query:
{
"query": {
"bool": {
"must_not": [
{
"term": {
"type": 2
}
}
]
}
}
}
but that only excludes one document.
Any hints are welcome!

You can achieve this using two Elasticsearch search requests:
First, get all values of "group_id" for which corresponding value of "type" is 2. You need to use Terms Aggregation for this.
POST <index name>/<type name>/_search
{
"size": 0,
"query": {
"filtered": {
"filter": {
"term": {
"type": 2
}
}
}
},
"aggs": {
"group_ids_type_2": {
"terms": {
"field": "group_id",
"size": 0
}
}
}
}
Save the list of values of "group_id" fields received from the above request.
Now, use a query with must_not filter to get all documents such that the value of their "group_id" is not present in the list obtained above. You need to use Terms Filter here.
POST <index name>/<type name>/_search
{
"query": {
"bool": {
"must_not": [
{
"terms": {
"group_id": [
"123" <-- Replace this with a comma separated list of all group_id values received from first search request
]
}
}
]
}
}
}
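The two steps above can be glued together on the client side. The following is only a minimal Python sketch: it builds the second request body from the first response, and the helper names and the sample response are hypothetical (the response is shaped like a terms aggregation result, not taken from a real cluster).

```python
def group_ids_from_agg(response, agg_name="group_ids_type_2"):
    """Collect the bucket keys of a terms aggregation response."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [bucket["key"] for bucket in buckets]


def build_exclusion_query(group_ids):
    """Build the second request: drop every document in the given groups."""
    return {
        "query": {
            "bool": {
                "must_not": [
                    {"terms": {"group_id": list(group_ids)}}
                ]
            }
        }
    }


# Hypothetical response of the first request for the sample documents:
# only group 123 contains a document with type = 2.
first_response = {
    "aggregations": {
        "group_ids_type_2": {
            "buckets": [{"key": 123, "doc_count": 1}]
        }
    }
}

excluded = group_ids_from_agg(first_response)
second_request = build_exclusion_query(excluded)
print(second_request)
```

The printed dictionary can be sent as the body of the second search request with whatever client you already use.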

Related

Elasticsearch query to match on one field but should filter results based on an other field

I'm looking for help formulating a query for the following use case. I have a text field and an integer ID field. I need to search for matching text, but the response should contain only results that match one of a given set of IDs. For example, I have two fields: a product ID, which is text, and an owner ID, which is an integer. Owners must be allowed to view only the products they own. In addition, an owner can have multiple IDs.
Sample records in Elasticsearch:
{
"product": "MKL89ADH12",
"ownerId" : 98765
},
{
"product": "POIUD780H",
"ownerId" : 12345
},
{
"product": "UJK87TG89",
"ownerId" : 98765
},
{
"product": "897596YHJ",
"ownerId" : 98765
},
{
"product": "LKGGN764HH",
"ownerId" : 784512
}
If 98765 and 12345 belong to the same owner, they should be able to view the first 4 products only, and no results should be returned if they search for LKGGN764HH.
I tried following query but it gives me no results.
{
"size": 24,
"query": {
"bool": {
"must":[{
"match" : {
"product": {
"query": "MKL89ADH12"
}
}
},
{
"match" : {
"product": {
"query": "LKGGN764HH"
}
}
},
{
"terms": {
"ownerId": [98765, 12345],
"boost": 1.0
}
}
]
}
}
}
I am expecting the response to contain MKL89ADH12 because I am filtering by the ownerId. Looking for help formulating the right query for my use case.
You need to filter on "ownerId": [98765, 12345] and then use "should" clause to return documents which match any of the text.
{
"size": 24,
"query": {
"bool": {
"filter": [
{
"terms": {
"ownerId": [
98765,
12345
],
"boost": 1
}
}
],
"should": [
{
"match": {
"product": {
"query": "MKL89ADH12"
}
}
},
{
"match": {
"product": {
"query": "LKGGN764HH"
}
}
}
],
"minimum_should_match": 1
}
}
}
The above query translates to
select * from index
where ownerid in (98765, 12345)
AND (product IN ("MKL89ADH12", "LKGGN764HH"))
while yours works like
select * from index
where ownerid in (98765, 12345)
AND product = "MKL89ADH12"
AND product = "LKGGN764HH"
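If the query is generated in application code, the filter plus should pattern can be built from the owner's IDs and the search terms. This is a rough Python sketch: build_owner_query and visible_products are made-up names, and the local check uses exact string equality only as a stand-in for what a match query does on an analyzed field.

```python
def build_owner_query(owner_ids, products, size=24):
    """Filter on ownerId, then require at least one product match."""
    return {
        "size": size,
        "query": {
            "bool": {
                "filter": [{"terms": {"ownerId": list(owner_ids)}}],
                "should": [
                    {"match": {"product": {"query": product}}}
                    for product in products
                ],
                "minimum_should_match": 1,
            }
        },
    }


def visible_products(docs, owner_ids, products):
    """Plain-Python approximation of the same logic (assumes exact matches)."""
    return [
        doc["product"]
        for doc in docs
        if doc["ownerId"] in owner_ids and doc["product"] in products
    ]


sample_docs = [
    {"product": "MKL89ADH12", "ownerId": 98765},
    {"product": "POIUD780H", "ownerId": 12345},
    {"product": "UJK87TG89", "ownerId": 98765},
    {"product": "897596YHJ", "ownerId": 98765},
    {"product": "LKGGN764HH", "ownerId": 784512},
]

query = build_owner_query([98765, 12345], ["MKL89ADH12", "LKGGN764HH"])
hits = visible_products(sample_docs, [98765, 12345], ["MKL89ADH12", "LKGGN764HH"])
print(hits)
```

Because the owner restriction sits in the filter clause, LKGGN764HH is dropped even though it matches one of the should clauses.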

how to use boost or weight in more_like_this

I have the following Elastic Query,
more_like_it_match = {
"min_score": 5,
"query":
{"filtered": {
"query": {
"bool": {
"must": {
"more_like_this": {
"fields": ["title","desc","cat_id","user_id"],
"like": {
"doc": {
"title": item["title"],
"desc": item["desc"],
"cat_id": item["cat_id"],
"user_id": item["user_id"],
},
},
"min_term_freq": 1,
"max_query_terms": 100,
"min_doc_freq": 0
}
}
}
},
"filter": {
"not": {
"term": {
"id": item["id"]
}
}
}
}
}
}
It works correctly, but I'm looking for a way to set a boost or weight for each of the fields. For example, I want to tell Elasticsearch that a match on the title field is three times more important than a match on the category field. Is there any way to achieve this?
Note: I've found the following query as a possible solution, but it is not what I'm looking for.
{
"min_score" : 5,
"query": {
"dis_max": {
"queries": [
{
"more_like_this" : {
"fields" : ["title"],
"like_text" : item["title"],
"min_term_freq" : 1,
"max_query_terms" : 100,
"boost": 100
}
},
{
"more_like_this" : {
"fields" : ["desc"],
"like_text" : item["desc"],
"min_term_freq" : 1,
"max_query_terms" : 100,
"boost": 100,
}
}
]
}
},
"filter":{
"not":{
"term" :{
"id": item["id"]
}
}
}
}
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-dis-max-query.html
Dis Max Query
A query that generates the union of documents produced by its subqueries, and that scores each document with the maximum score for that document as produced by any subquery, plus a tie breaking increment for any additional matching subqueries.
This is useful when searching for a word in multiple fields with different boost factors (so that the fields cannot be combined equivalently into a single search field). We want the primary score to be the one associated with the highest boost, not the sum of the field scores (as Boolean Query would give). If the query is "albino elephant" this ensures that "albino" matching one field and "elephant" matching another gets a higher score than "albino" matching both fields. To get this result, use both Boolean Query and DisjunctionMax Query: for each term a DisjunctionMaxQuery searches for it in each field, while the set of these DisjunctionMaxQuery’s is combined into a BooleanQuery.
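To make the quoted scoring rule concrete, here is a small Python sketch. It assumes one more_like_this clause per field with its own boost, combined in a dis_max, and it assumes a version where the "like" parameter accepts free text. The helper names and the sample item are hypothetical; dis_max_score only mirrors the "maximum score plus tie breaking increment" description above, not Elasticsearch's actual term scoring.

```python
def build_boosted_mlt(item, field_boosts, tie_breaker=0.0):
    """One more_like_this clause per field, each with its own boost."""
    queries = [
        {
            "more_like_this": {
                "fields": [field],
                "like": str(item[field]),  # assumes "like" accepts plain text
                "min_term_freq": 1,
                "max_query_terms": 100,
                "boost": boost,
            }
        }
        for field, boost in field_boosts.items()
    ]
    return {"dis_max": {"queries": queries, "tie_breaker": tie_breaker}}


def dis_max_score(sub_scores, tie_breaker=0.0):
    """Best sub-score plus tie_breaker times the other sub-scores."""
    best = max(sub_scores)
    return best + tie_breaker * (sum(sub_scores) - best)


# Hypothetical item; the title clause weighs three times the desc clause.
item = {"title": "red bicycle", "desc": "a lightly used red bicycle"}
query = build_boosted_mlt(item, {"title": 3.0, "desc": 1.0}, tie_breaker=0.5)

print(query["dis_max"]["queries"][0]["more_like_this"]["boost"])
print(dis_max_score([3.0, 1.0, 0.5], tie_breaker=0.5))
```

With a tie_breaker of 0 only the best-matching field counts; raising it lets the other fields contribute a fraction of their scores.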

ElasticSearch order by number of matches in nested fields

Complete beginner here, quite possibly trying to do the impossible.
I have the following structure that I would like to store in Elasticsearch:
{
"id" : 1,
"code" : "03f3301c-4089-11e7-a919-92ebcb67fe33",
"countries" : [
{
"id" : 1,
"name" : "Netherlands"
},
{
"id" : 2,
"name" : "United Kingdom"
}
],
"tags" : [
{
"id" : 1,
"name" : "Scanned"
},
{
"id" : 2,
"name" : "Secured"
},
{
"id" : 3,
"name" : "Cleared"
}
]
}
I have complete control over how it will be stored, so the structure can change, but it should contain all these fields in some form.
I’d like to be able to query this data over countries and tags in such way that all those items having at least one match are returned, ordered by number of matches. If at all possible I’d prefer not to do a full text search.
For example:
id, code, country ids, tag ids
1, ..., [1, 2, 3], [1]
2, ..., [1], [1, 2, 3]
For the question: "which of these was in country 1 or has tag 1 or has tag 2", should return:
2, ..., [1], [1, 2, 3]
1, ..., [1, 2, 3], [1]
In this order, because the second row matches more sub-queries in the above disjunction.
In essence, I’d like to replicate this SQL query:
SELECT p.id, p.code, COUNT(p.id) FROM packages p
LEFT JOIN tags t ON t.package_id = p.id
LEFT JOIN countries c ON c.package_id = p.id
WHERE t.id IN (1, 2, 3) OR c.id IN (1, 2, 3)
GROUP BY p.id
ORDER BY COUNT(p.id);
I’m using ElasticSearch 2.4.5 if that matters.
Hopefully I was clear enough. Thank you for your help!
You need countries and tags to be of type nested. You also need to take control of the scoring with function_score: give each filter inside function_score a weight of 1, and set boost_mode and score_mode so that the final score is the sum of the weights of the matching filters. In the end you can use this query:
GET /nested/test/_search
{
"query": {
"function_score": {
"query": {
"match_all": {}
},
"functions": [
{
"filter": {
"nested": {
"path": "tags",
"query": {
"term": {
"tags.id": 1
}
}
}
},
"weight": 1
},
{
"filter": {
"nested": {
"path": "tags",
"query": {
"term": {
"tags.id": 2
}
}
}
},
"weight": 1
},
{
"filter": {
"nested": {
"path": "countries",
"query": {
"term": {
"countries.id": 1
}
}
}
},
"weight": 1
}
],
"boost_mode": "replace",
"score_mode": "sum"
}
}
}
For a more complete test case, I am also providing the mapping and test data:
PUT nested
{
"mappings": {
"test": {
"properties": {
"tags": {
"type": "nested",
"properties": {
"name": {
"type": "string",
"index": "not_analyzed"
}
}
},
"countries": {
"type": "nested",
"properties": {
"name": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}
POST nested/test/_bulk
{"index":{"_id":1}}
{"name":"Foo Bar","tags":[{"id":2,"name":"My Tag 5"},{"id":3,"name":"My Tag 7"}],"countries":[{"id":1,"name":"USA"}]}
{"index":{"_id":2}}
{"name":"Foo Bar","tags":[{"id":3,"name":"My Tag 6"}],"countries":[{"id":1,"name":"USA"},{"id":2,"name":"UK"},{"id":3,"name":"UAE"}]}
{"index":{"_id":3}}
{"name":"Foo Bar","tags":[{"id":1,"name":"My Tag 4"},{"id":3,"name":"My Tag 1"}],"countries":[{"id":3,"name":"UAE"}]}
{"index":{"_id":4}}
{"name":"Foo Bar","tags":[{"id":1,"name":"My Tag 1"},{"id":2,"name":"My Tag 4"},{"id":3,"name":"My Tag 2"}],"countries":[{"id":2,"name":"UK"},{"id":3,"name":"UAE"}]}
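If the list of IDs varies per request, the functions array can be generated instead of written by hand. The Python sketch below is only an illustration: build_match_count_query and count_matches are hypothetical helpers, and count_matches recomputes locally the number of matching filters for the test data above (tag 1, tag 2, country 1), without talking to Elasticsearch.

```python
def build_match_count_query(tag_ids, country_ids):
    """One weight-1 nested filter per requested tag or country id."""
    def nested_function(path, value):
        return {
            "filter": {
                "nested": {
                    "path": path,
                    "query": {"term": {path + ".id": value}},
                }
            },
            "weight": 1,
        }

    functions = [nested_function("tags", tag_id) for tag_id in tag_ids]
    functions += [nested_function("countries", c_id) for c_id in country_ids]
    return {
        "query": {
            "function_score": {
                "query": {"match_all": {}},
                "functions": functions,
                "boost_mode": "replace",
                "score_mode": "sum",
            }
        }
    }


def count_matches(doc, tag_ids, country_ids):
    """Number of requested ids present in the document (local check only)."""
    doc_tags = {tag["id"] for tag in doc["tags"]}
    doc_countries = {country["id"] for country in doc["countries"]}
    return len(doc_tags & set(tag_ids)) + len(doc_countries & set(country_ids))


# Ids copied from the bulk test data; names omitted for brevity.
docs = {
    1: {"tags": [{"id": 2}, {"id": 3}], "countries": [{"id": 1}]},
    2: {"tags": [{"id": 3}], "countries": [{"id": 1}, {"id": 2}, {"id": 3}]},
    3: {"tags": [{"id": 1}, {"id": 3}], "countries": [{"id": 3}]},
    4: {"tags": [{"id": 1}, {"id": 2}, {"id": 3}],
        "countries": [{"id": 2}, {"id": 3}]},
}

query = build_match_count_query([1, 2], [1])
counts = {doc_id: count_matches(doc, [1, 2], [1]) for doc_id, doc in docs.items()}
print(counts)
```

Documents 1 and 4 each match two of the three filters, so they should rank ahead of documents 2 and 3.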

elastic search sort aggregation by selected field

How can I sort the output from an aggregation by a field that is in the source data, but not part of the output of the aggregation?
In my source data I have a date field, and I would like the output of the aggregation to be sorted by that field.
Is that possible? I've looked at using "order" within the aggregation, but I don't think it can see that date field to use it for sorting?
I've also tried adding a sub aggregation which includes the date field, but again, I cannot get it to sort on this field.
I'm calculating a hash for each document in my ETL on the way in to elastic. My data set contains a lot of duplication, so I'm trying to use the aggregation on the hash field to filter out duplicates and that works fine. I need the output from the aggregation to retain a date sort order so that I can work with the output in angular.
The documents are like this:
{
"_id": 123,
"_source": {
"hash": "01010101010101",
"user": "1",
"dateTime": "2001/2/20 09:12:21",
"action": "Login"
}
},
{
"_id": 124,
"_source": {
"hash": "01010101010101",
"user": "1",
"dateTime": "2001/2/20 09:12:21",
"action": "Login"
}
},
{
"_id": 132,
"_source": {
"hash": "0202020202020",
"user": "1",
"dateTime": "2001/2/20 09:20:43",
"action": "Logout"
}
},
{
"_id": 200,
"_source": {
"hash": "0303030303030303",
"user": "2",
"dateTime": "2001/2/22 09:32:14",
"action": "Login"
}
}
So I want to use an aggregation on the hash value to remove duplicates from my set and then render the response in date order.
My query:
{
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"action": "Login"
}
}
]
}
}
}
},
"size": 0,
"aggs": {
"md5": {
"terms": {
"field": "hash",
"size": 0
},
"aggs": {
"byDate": {
"terms": {
"field": "dateTime",
"size": 0
}
}
}
}
}
}
Currently the output is ordered on the hash and I need it ordered on the date field within each hash bucket. Is that possible?
If the aggregation on "hash" is just for removing duplicates, it might work for you to simply aggregate on "dateTime" first, followed by the terms aggregation on "hash". For example:
GET my_index/test/_search
{
"query" : {
"filtered" : {
"filter" : {
"bool": {
"must" : [
{ "term": {"action":"Login"} }
]
}
}
}
},
"size": 0,
"aggs": {
"byDate" : {
"terms": {
"field" : "dateTime",
"order": { "_term": "asc" } <---- EDIT: must specify order here
},
"aggs": {
"byHash": {
"terms": {
"field": "hash"
}
}
}
}
}
}
This way, your results would be sorted by "dateTime" first.
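On the client side, the nested buckets can then be flattened into an ordered list of (dateTime, hash) pairs. This is a minimal Python sketch: the function name is made up, and sample_response is a hypothetical response shaped like the byDate/byHash aggregation above for the two distinct "Login" hashes.

```python
def flatten_date_hash_buckets(response, date_agg="byDate", hash_agg="byHash"):
    """Return (date, hash) pairs in the order the date buckets arrive."""
    rows = []
    for date_bucket in response["aggregations"][date_agg]["buckets"]:
        # Prefer the formatted date when the response provides one.
        date_value = date_bucket.get("key_as_string", date_bucket["key"])
        for hash_bucket in date_bucket[hash_agg]["buckets"]:
            rows.append((date_value, hash_bucket["key"]))
    return rows


# Hypothetical response; duplicates collapse into one hash bucket.
sample_response = {
    "aggregations": {
        "byDate": {
            "buckets": [
                {
                    "key": "2001/2/20 09:12:21",
                    "doc_count": 2,
                    "byHash": {
                        "buckets": [{"key": "01010101010101", "doc_count": 2}]
                    },
                },
                {
                    "key": "2001/2/22 09:32:14",
                    "doc_count": 1,
                    "byHash": {
                        "buckets": [{"key": "0303030303030303", "doc_count": 1}]
                    },
                },
            ]
        }
    }
}

rows = flatten_date_hash_buckets(sample_response)
print(rows)
```

Each hash appears once per date bucket, so the flattened list is both de-duplicated and in date order as long as the outer aggregation is ordered by its key.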

Elasticsearch: how to filter by summed values in nested objects?

I have the following products structure in the elasticsearch:
POST /test/products/1
{
"name": "product1",
"sales": [
{
"quantity": 10,
"customer": "customer1",
"date": "2014-01-01"
},
{
"quantity": 1,
"customer": "customer1",
"date": "2014-01-02"
},
{
"quantity": 5,
"customer": "customer2",
"date": "2013-12-30"
}
]
}
POST /test/products/2
{
"name": "product2",
"sales": [
{
"quantity": 1,
"customer": "customer1",
"date": "2014-01-01"
},
{
"quantity": 15,
"customer": "customer1",
"date": "2014-02-01"
},
{
"quantity": 1,
"customer": "customer2",
"date": "2014-01-21"
}
]
}
The sales field is a nested object. I need to filter products like this:
"get all products which have total quantity >= 16 and sales.customer = 'customer1'".
The total quantity is sum(sales.quantity) where sales.customer = 'customer1'.
Therefore the search results should contain only 'product2'.
I tried to use aggs but I didn't understand how to filter in this case.
I haven't found any information about it in the elasticsearch documentation.
Is it possible?
I would welcome any ideas, thanks!
First of all, be clear about what you want as the result: counts or document fields? Aggregations only give counts, whereas to get document fields back you need to use a filter in the query. If you want fields, you can't filter on sum(sales.quantity) >= 16. If you want counts, you can get them with a range aggregation, but even then I think range works only on fields stored in the document, not on computed values.
The nearest solution I can give you is below:
{
"size" : 0,
"query" :{
"filtered" : {
"query" :{ "match_all": {} },
"filter" : {
"nested": {
"path": "sales",
"filter" : {"term" : {"sales.customer" : "customer1"}}
}
}
}
},
"aggregations" :{
"salesNested" : {
"nested" : {"path" : "sales"},
"aggregations" :{
"aggByrange" : {
"numeric_range": {
"field": "sales.quantity",
"ranges": [
{
"from": 16
}]
}
},
"quantityStats" : {
"stats" : { "field" : "sales.quantity" }
}
}
}
}
}
In the above query we are using "field": "sales.quantity". For your solution you would need to replace sales.quantity with the sum computed by the quantityStats aggregation, which I think Elasticsearch doesn't provide.
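One possible workaround, if the candidate set is small enough, is to fetch the products and apply the summed condition in application code. The Python sketch below only illustrates that idea with the two sample products from the question; products_with_min_quantity is a hypothetical helper, not an Elasticsearch feature.

```python
def products_with_min_quantity(products, customer, min_total):
    """Keep products whose summed quantity for one customer reaches min_total."""
    matching = []
    for product in products:
        total = sum(
            sale["quantity"]
            for sale in product["sales"]
            if sale["customer"] == customer
        )
        if total >= min_total:
            matching.append(product["name"])
    return matching


# Sample documents from the question (dates kept for completeness).
products = [
    {"name": "product1", "sales": [
        {"quantity": 10, "customer": "customer1", "date": "2014-01-01"},
        {"quantity": 1, "customer": "customer1", "date": "2014-01-02"},
        {"quantity": 5, "customer": "customer2", "date": "2013-12-30"},
    ]},
    {"name": "product2", "sales": [
        {"quantity": 1, "customer": "customer1", "date": "2014-01-01"},
        {"quantity": 15, "customer": "customer1", "date": "2014-02-01"},
        {"quantity": 1, "customer": "customer2", "date": "2014-01-21"},
    ]},
]

result = products_with_min_quantity(products, "customer1", 16)
print(result)
```

For customer1 the totals are 11 for product1 and 16 for product2, so only product2 passes the threshold of 16.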
