GROUP BY in Elasticsearch

I am trying to write a GROUP BY-style query in Elasticsearch 5.2.
I want to query the data and then limit the results to documents that have a particular tag. In the case below, I want to select items that contain the word "FIY" in the title or content fields, and then narrow that down to only those documents which have the tags "FIY" and "Competition".
The query part is fine, but I am struggling to limit it to the given tag.
This is what I have so far, but it fails with the error:
"reason": "[bool] query does not support [terms]",
GET advice-articles/_search
{
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "FIY",
"fields": ["title", "content"]
}
}
], "filter": {
"bool": {
"terms": {
"tags.tagName": [
"competition"
]
}
}
}
}
}
}
An example document in the index is:
"_index": "advice-articles",
"_type": "article",
"_id": "1460",
"_score": 4.3167734,
"_source": {
"id": "1460",
"title": "Your top FIY tips",
"content": "Fix It Yourself in April 2012.",
"tags": [
{
"tagName": "Fix it yourself"
},
{
"tagName": "customer tips"
},
{
"tagName": "competition"
}
]
The mappings I have are as follows:
{
"advice-articles": {
"mappings": {
"article": {
"properties": {
"content": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"tags": {
"type": "nested",
"properties": {
"tagName": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
}
}
}
}
}
}
}
}

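A bool query is built using one or more boolean clauses, each clause with a typed occurrence. The occurrence types are:
must, must_not, filter, should
A terms query therefore has to sit inside one of these clauses rather than directly under bool, and because tags is mapped as nested it also has to be wrapped in a nested query: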
GET _search
{
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "FIY",
"fields": [
"title",
"content"
]
}
},
{
"nested": {
"path": "tags",
"query": {
"terms": {
"tags.tagName": [
"competition"
]
}
}
}
}
]
}
}
}
Here is how you can use a must clause for your query requirements.
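To make the error message concrete, here is a minimal Python sketch that walks a query body and reports keys a bool query would reject. The function name find_invalid_bool_keys and the ALLOWED_BOOL_KEYS set are assumptions made for illustration (the set is not meant to be exhaustive), and nothing here talks to a cluster.

```python
# Keys a bool query accepts: the four occurrence types plus a few
# common parameters. Illustrative list, not an exhaustive one.
ALLOWED_BOOL_KEYS = {"must", "must_not", "filter", "should",
                     "minimum_should_match", "boost", "_name"}


def find_invalid_bool_keys(node, path="query"):
    """Return dotted paths of keys placed directly under a bool that it would reject."""
    problems = []
    if isinstance(node, dict):
        for key, value in node.items():
            child_path = f"{path}.{key}"
            if key == "bool" and isinstance(value, dict):
                for clause in value:
                    if clause not in ALLOWED_BOOL_KEYS:
                        problems.append(f"{child_path}.{clause}")
            problems.extend(find_invalid_bool_keys(value, child_path))
    elif isinstance(node, list):
        for i, item in enumerate(node):
            problems.extend(find_invalid_bool_keys(item, f"{path}[{i}]"))
    return problems


multi_match = {"multi_match": {"query": "FIY", "fields": ["title", "content"]}}
tag_terms = {"terms": {"tags.tagName": ["competition"]}}

# The shape from the question: terms sits directly under the inner bool.
broken = {"bool": {"must": [multi_match], "filter": {"bool": {"terms": tag_terms["terms"]}}}}

# The shape from this answer: terms is wrapped in a nested query inside must.
fixed = {"bool": {"must": [multi_match,
                           {"nested": {"path": "tags", "query": tag_terms}}]}}

broken_report = find_invalid_bool_keys(broken)
fixed_report = find_invalid_bool_keys(fixed)
```

Running the check on the question's query flags the terms key under the inner bool, which is the same spot the "[bool] query does not support [terms]" error complains about.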

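Inside the filter you don't need to put bool. A bool query only accepts must, must_not, filter and should clauses, which is why a terms query placed directly under bool is rejected.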
POST newindex/test/1460333
{
"title": "Your top FIY tips",
"content": "Fix It Yourself in April 2012.",
"tags": [
{
"tagName": "Fix it yourself"
},
{
"tagName": "customer tips"
},
{
"tagName": "shoud not return"
}
]
}
POST newindex/test/1460
{
"title": "Your top FIY tips",
"content": "Fix It Yourself in April 2012.",
"tags": [
{
"tagName": "Fix it yourself"
},
{
"tagName": "customer tips"
},
{
"tagName": "competition"
}
]
}
Query:
GET newindex/_search
{
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "FIY",
"fields": [
"title",
"content"
]
}
}
],
"filter": {
"terms": {
"tags.tagName": [
"competition"
]
}
}
}
}
}
Result:
{
"took": 4,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.2876821,
"hits": [
{
"_index": "newindex",
"_type": "test",
"_id": "1460",
"_score": 0.2876821,
"_source": {
"title": "Your top FIY tips",
"content": "Fix It Yourself in April 2012.",
"tags": [
{
"tagName": "Fix it yourself"
},
{
"tagName": "customer tips"
},
{
"tagName": "competition"
}
]
}
}
]
}
}
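The test index above has no explicit mapping, so tags is not a nested field there. With the nested mapping from the question, fields under tags are only reachable through a nested query, so the filter would need that wrapper. Below is a small Python sketch that builds either variant; build_tagged_search and its parameters are hypothetical names, and the function only returns the request body as a dict.

```python
def build_tagged_search(text, tags, tag_field="tags.tagName", nested_path=None):
    """Full-text match in `must`, tag restriction in `filter`."""
    tag_filter = {"terms": {tag_field: list(tags)}}
    if nested_path is not None:
        # Fields of a nested type are only visible inside a nested query.
        tag_filter = {"nested": {"path": nested_path, "query": tag_filter}}
    return {
        "query": {
            "bool": {
                "must": [
                    {"multi_match": {"query": text, "fields": ["title", "content"]}}
                ],
                "filter": tag_filter,
            }
        }
    }


# Same body as the query above (tags indexed as plain objects).
flat_body = build_tagged_search("FIY", ["competition"])

# Variant for the mapping in the question, where tags is nested.
nested_body = build_tagged_search("FIY", ["competition"], nested_path="tags")
```

Keeping the tag restriction in filter rather than must means it does not contribute to the score, which is usually what you want for a yes/no condition.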

Related

ElasticSearch: Fetch records from nested Array that "only" include given element/s and filter-out the rest with mixed values

I am stuck on one of my tasks.
Overview:
There are some records in Elasticsearch that include information about candidates and their employment.
One field stores the statuses of the jobs to which the candidate was submitted.
{
"submittedJobs": [
{
"status": "PendingPM", "jobId": "ABC", ...
},
{
"status": "PendingClient", "jobId": "XYZ", ...
},
{
"status": "PendingPM", "jobId": "WXY", ...
},
...
]
}
I want to write an ES query that fetches all the records in which the submittedJobs array contains "only" "PendingPM" statuses and no other statuses.
"query": {
"bool": {
"filter": [
{
"nested": {
"path": "submittedJobs",
"query": {
"bool": {
"must": [
{
"term": {
"submittedJobs.status.keyword": "PendingPM"
}
}
]
}
}
}
}
]
}
}
I tried this query, but it also returns records that include "PendingPM" along with other statuses, as if it used contains() logic.
Here is the mapping:
"submittedJobs": {
"type": "nested",
"properties": {
"statusId": {
"type": "long"
},
"status": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256,
"normalizer": "lowercase_normalizer"
}
}
},
"jobId": {
"type": "keyword"
}
}
}
For example, suppose there are two documents:
document #1:
{
"submittedJobs": [
{
"status": "PendingPM", "jobId": "ABC", ...
},
{
"status": "PendingClient", "jobId": "XYZ", ...
},
{
"status": "PendingPM", "jobId": "WXY", ...
},
...
]
},
document #2:
{
"submittedJobs": [
{
"status": "PendingPM", "jobId": "ABC", ...
},
{
"status": "PendingPM", "jobId": "WXY", ...
},
...
]
}
Only document #2 should be returned, as the entire array contains only "PendingPM" and no other statuses.
Document #1 will be filtered-out since it includes mixed statuses.
Any help will be appreciated.
Try this:
It returns only documents in which every item of the array has the status PendingPM.
{
"query": {
"bool": {
"must_not": [
{
"nested": {
"path": "submittedJobs",
"query": {
"bool": {
"must_not": [
{
"match": {
"submittedJobs.status": {
"query": "PendingPM"
}
}
}
]
}
}
}
}
]
}
}
}
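The idea is a double negative: exclude every parent that has at least one nested object whose status is not the wanted one. Here is a minimal Python sketch of that shape, together with a local predicate that mimics the intended semantics on the two example documents. only_status_query and matches_only are hypothetical helper names, and the predicate is only a stand-in for what the cluster would do.

```python
def only_status_query(path, field, value):
    """Parents that have no nested object whose `field` differs from `value`."""
    return {
        "query": {
            "bool": {
                "must_not": [
                    {
                        "nested": {
                            "path": path,
                            "query": {
                                "bool": {
                                    "must_not": [
                                        {"match": {field: {"query": value}}}
                                    ]
                                }
                            },
                        }
                    }
                ]
            }
        }
    }


def matches_only(doc, value):
    """Local stand-in for the query: no job carries a different status."""
    jobs = doc.get("submittedJobs", [])
    return not any(job["status"].lower() != value.lower() for job in jobs)


doc1 = {"submittedJobs": [{"status": "PendingPM"},
                          {"status": "PendingClient"},
                          {"status": "PendingPM"}]}
doc2 = {"submittedJobs": [{"status": "PendingPM"}, {"status": "PendingPM"}]}

body = only_status_query("submittedJobs", "submittedJobs.status", "PendingPM")
doc1_kept = matches_only(doc1, "PendingPM")
doc2_kept = matches_only(doc2, "PendingPM")
```

Note that a document with an empty or missing submittedJobs array also passes this check. If that is not wanted, adding a positive nested clause for the wanted status next to the must_not should rule those documents out.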
You can use inner_hits along with the nested query to get only the matching nested objects from each document.
Adding a working example:
Index Mapping:
{
"mappings": {
"properties": {
"submittedJobs": {
"type": "nested"
}
}
}
}
Search Query:
{
"query": {
"bool": {
"filter": [
{
"nested": {
"path": "submittedJobs",
"query": {
"bool": {
"must": [
{
"term": {
"submittedJobs.status.keyword": "PendingPM"
}
}
]
}
},
"inner_hits": {}
}
}
]
}
}
}
Search Result would be:
"hits": [
{
"_index": "73062439",
"_id": "1",
"_score": 0.0,
"_source": {
"submittedJobs": [
{
"status": "PendingPM",
"jobId": "ABC"
},
{
"status": "PendingClient",
"jobId": "XYZ"
},
{
"status": "PendingPM",
"jobId": "WXY"
}
]
},
"inner_hits": { // note this
"submittedJobs": {
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": 0.4700036,
"hits": [
{
"_index": "73062439",
"_id": "1",
"_nested": {
"field": "submittedJobs",
"offset": 0
},
"_score": 0.4700036,
"_source": {
"jobId": "ABC",
"status": "PendingPM"
}
},
{
"_index": "73062439",
"_id": "1",
"_nested": {
"field": "submittedJobs",
"offset": 2
},
"_score": 0.4700036,
"_source": {
"jobId": "WXY",
"status": "PendingPM"
}
}
]
}
}
}
}
]
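On the client side, the matching nested objects can be pulled out of the response with a few dictionary lookups. This is a small Python sketch that assumes the response has already been parsed into a dict with the shape shown above; nested_matches is a hypothetical helper name, and the sample below is a trimmed copy of that result.

```python
def nested_matches(response, path):
    """Yield (parent_id, offset, source) for every inner hit under `path`."""
    for hit in response.get("hits", {}).get("hits", []):
        inner = hit.get("inner_hits", {}).get(path, {})
        for sub in inner.get("hits", {}).get("hits", []):
            yield hit["_id"], sub["_nested"]["offset"], sub["_source"]


# Trimmed copy of the search result shown above.
sample = {
    "hits": {
        "hits": [
            {
                "_id": "1",
                "inner_hits": {
                    "submittedJobs": {
                        "hits": {
                            "hits": [
                                {"_nested": {"field": "submittedJobs", "offset": 0},
                                 "_source": {"jobId": "ABC", "status": "PendingPM"}},
                                {"_nested": {"field": "submittedJobs", "offset": 2},
                                 "_source": {"jobId": "WXY", "status": "PendingPM"}},
                            ]
                        }
                    }
                },
            }
        ]
    }
}

rows = list(nested_matches(sample, "submittedJobs"))
```

Keep in mind that inner_hits only changes what is reported per hit. The parent document is still returned as soon as one nested object matches, so this does not by itself exclude documents with mixed statuses.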

Query only those documents where image field is not empty

I have the following mapping (dynamic is strict on the type):
"created": {
"type": "date"
},
"images": {
"properties": {
"checksum": {
"type": "text",
"index": false
},
"path": {
"type": "text",
"index": false
},
"url": {
"type": "text",
"index": false
}
}
},
I want to query documents where an image is present.
I tried a couple of combinations but have had no luck so far.
These are the last ones I tried:
POST catalog/_search
{
"query": {
"script": {
"script": "doc['images'].values.length > 0"
}
}
}
POST catalog/_search
{
"query": {
"script": {
"script": "doc['images.url'].values.length > 0"
}
}
}
But here it says that fielddata is not enabled for text fields. Is there any way I can do this without changing my mapping?
Ideally, this should give me all the records where there are no images, but it returns all records:
POST catalog/_search
{
"query": {
"bool": {
"must_not": [
{
"exists": {
"field": "images"
}
}
]
}
}
}
Here is the example document in which there is an image.
{
"_index": "catalog-2018-03-03",
"_type": "product",
"_id": "151755703145e27e4983a0bd1b70be44",
"_score": 1,
"_source": {
"merchant": {
"link": "http://shophive.com/",
"name": "shophive"
},
"images": [],
"updated": "2018-03-18T13:06:33.583480",
"name": "Plantronics Savi Talk",
"created": "2018-03-18T13:06:33.583459",
"url": "http://www.shophive.com/plantronics-savi-talk",
"price": {
"new": 24999,
"old": 24999,
"discount_percent": 0
},
"category": {
"level_1": {
"url": "computers/tablets/networking",
"name": "Computers/Tablets & Networking "
},
"level_2": {
"url": "tablets/ebook-readers",
"name": "Tablets & eBook Readers"
}
}
}
}
Updated
With the query below, I expect Elasticsearch to return the documents in which the image is missing:
POST catalog/product/_search
{
"query": {
"bool": {
"must_not": [
{
"exists": {
"field": "images"
}
}
]
}
}
}
But the result I receive is all the documents in my index, even though apparently every document has one image. Here is an example document I get with the above query:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 20967,
"max_score": 1,
"hits": [
{
"_index": "catalog-2018-03-03",
"_type": "product",
"_id": "151755703145e27e4983a0bd1b70be44",
"_score": 1,
"_source": {
"merchant": {
"link": "http://shophive.com/",
"name": "shophive"
},
"images": [
{
"url": "http://www.shophive.com/media/catalog/product/cache/1/small_image/165x/9df78eab33525d08d6e5fb8d27136e95/p/l/plantronics_savi_talk.jpg",
"path": "full/8e3587bd2b6107f0beafa9b1ba05f476539be0a8.jpg",
"checksum": "fa74ade23c8e80e9590d48d4e59b6b64"
}
],
"updated": "2018-03-18T13:06:33.583480",
"name": "Plantronics Savi Talk",
"created": "2018-03-18T13:06:33.583459",
"url": "http://www.shophive.com/plantronics-savi-talk",
"price": {
"new": 24999,
"old": 24999,
"discount_percent": 0
},
"category": {
"level_1": {
"url": "computers/tablets/networking",
"name": "Computers/Tablets & Networking "
},
"level_2": {
"url": "tablets/ebook-readers",
"name": "Tablets & eBook Readers"
}
}
}
}
]
}
}
You should leave out the square brackets in the query, as you only have one clause:
POST /catalog/_search
{
"query": {
"bool": {
"must_not": {
"exists": {
"field": "images"
}
}
}
}
}
This returns the docs without images for me. If you need only those that have images:
POST /catalog/_search
{
"query": {
"exists": {
"field": "images"
}
}
}
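Both variants can be generated from one small helper, which keeps the "has images" and "has no images" cases symmetrical. This is a minimal Python sketch; images_query is a hypothetical name, and the function only builds the request body.

```python
def images_query(present, field="images"):
    """Build the body for 'field has a value' or 'field has no value'."""
    exists = {"exists": {"field": field}}
    if present:
        return {"query": exists}
    return {"query": {"bool": {"must_not": exists}}}


with_images = images_query(True)
without_images = images_query(False)
```

For the exists query, an empty array such as "images": [] counts as having no value, so such documents fall into the without_images result.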

Elastic bool query must match not getting considered

I am trying to write a query that returns the document where
school is "holy international" AND grade is "second".
The issue with the current query is that it does not consider the must match part: even though I don't specify the school, it returns this document, which should not be a match.
The query returns all the documents where the grade is "second".
I want only documents where school is "holy international" AND grade is "second".
Also, I have not specified anything in the match query for "schools.school", yet it still returns results.
mapping
{
"settings": {
"analysis": {
"analyzer": {
"my_keyword_lowercase1": {
"tokenizer": "keyword",
"filter": ["lowercase", "my_pattern_replace1", "trim"]
},
"my_keyword_lowercase2": {
"tokenizer": "standard",
"filter": ["lowercase", "trim"]
}
},
"filter": {
"my_pattern_replace1": {
"type": "pattern_replace",
"pattern": ".",
"replacement": ""
}
}
}
},
"mappings": {
"test_data": {
"properties": {
"schools": {
"type": "nested",
"properties": {
"school": {
"type": "string",
"analyzer": "my_keyword_lowercase1"
},
"grade": {
"type": "string",
"analyzer": "my_keyword_lowercase2"
}
}
}
}
}
}
}
data
{
"_index": "data_index",
"_type": "test_data",
"_id": "57a33ebc1d41",
"_version": 1,
"found": true,
"_source": {
"summary": null,
"schools": [{
"school": "little flower",
"grade": "first",
"date": "2007-06-01"
},
{
"school": "holy international",
"grade": "second",
"date": "2007-06-01"
}
],
"first_name": "Adam",
"location": "Kansas City",
"last_name": "Roger",
"country": "US",
"name": "Adam Roger"
}
}
query
{
"_source": ["first_name"],
"query": {
"nested": {
"path": "schools",
"inner_hits": {
"_source": {
"includes": [
"schools.school",
"schools.grade"
]
}
},
"query": {
"bool": {
"must": {
"match": {
"schools.school": {
"query": "" <-----X didnt specify anything
}
}
},
"filter": {
"match": {
"schools.grade": {
"query": "second",
"operator": "and",
"minimum_should_match": "100%"
}
}
}
}
}
}
}
}
result
{
"took": 26,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.2876821,
"hits": [
{
"_index": "data_test",
"_type": "test_data",
"_id": "57a33ebc1d41",
"_score": 0.2876821,
"_source": {
"first_name": "Adam"
},
"inner_hits": {
"schools": {
"hits": {
"total": 1,
"max_score": 0.2876821,
"hits": [
{
"_nested": {
"field": "schools",
"offset": 0
},
"_score": 0.2876821,
"_source": {
"schools": {
"school": "holy international",
"grade": "second"
}
}
}
]
}
}
}
}
]
}
}
Basically, your problem is the analysis step. Once I loaded everything and checked, it became very clear:
This filter completely wipes every string in the schools.school field:
"filter": {
"my_pattern_replace1": {
"type": "pattern_replace",
"pattern": ".",
"replacement": ""
}
}
I think that's happening because . is a regular-expression metacharacter that matches any character, not a literal dot, so every character gets replaced. When I checked it:
POST /_analyze
{
"field": "schools.school",
"text": "holy international"
}
{
"tokens": [
{
"token": "",
"start_offset": 0,
"end_offset": 18,
"type": "word",
"position": 0
}
]
}
That's why you always get a match: every string you pass at index time and at search time becomes "". Some additional info from the Elastic documentation: https://www.elastic.co/guide/en/elasticsearch/reference/5.1/analysis-pattern_replace-tokenfilter.html
After I removed the pattern_replace filter, this query returns everything as expected:
{
"_source": ["first_name"],
"query": {
"nested": {
"path": "schools",
"inner_hits": {
"_source": {
"includes": [
"schools.school",
"schools.grade"
]
}
},
"query": {
"bool": {
"must": {
"match": {
"schools.school": {
"query": "holy international"
}
}
},
"filter": {
"match": {
"schools.grade": {
"query": "second"
}
}
}
}
}
}
}
}
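The effect of the unescaped dot is easy to reproduce outside Elasticsearch. The sketch below uses Python's re module as a stand-in for the Java regex engine behind pattern_replace, which behaves the same way for this pattern; strip_pattern is a hypothetical helper and "st. mary's" is a made-up input.

```python
import re


def strip_pattern(pattern, text):
    """Replace every match of `pattern` in `text` with the empty string."""
    return re.sub(pattern, "", text)


# Unescaped dot: matches every character, so nothing survives.
wiped = strip_pattern(".", "holy international")

# Escaped dot: only literal dots are removed.
kept = strip_pattern(r"\.", "st. mary's")
```

If the intent of my_pattern_replace1 was to drop literal dots, the pattern in the JSON settings would need to be written as "\\." instead of ".".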

Terms filter on a list of not_analyzed strings

Suppose there is a mapping like this:
"keywords": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
},
I want a query like this:
Retrieve all documents that contain (as exact phrases) both "Blah Mlah" AND "Baw".
Following the Elasticsearch documentation for terms filters, I tried this:
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"terms": {
"keywords": ["Blah Mlah", "Baw"],
"execution": "and"
}
}
]
}
}
}
}
but "execution": "and" does not work, and it returns results as if it were an OR.
I also tried this:
[
{
"term": {
"keywords": "Blah Mlah"
}
},
{
"term": {
"keywords": "Baw"
}
}
]
but that does not work on the keywords field when I want to search for two-word keywords like "Blah Mlah". How can I do it?
The query should be against keywords.raw, since that is the multi-field that has not been analyzed.
Example query in 1.x:
put test
put test/test/_mapping
{
"properties": {
"keywords": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
put test/test/1
{
"keywords" : [ "Baw", "Blah Mlah"]
}
put test/test/2
{
"keywords" : ["Baw"]
}
post test/test/_search
{
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"terms": {
"keywords.raw": [
"Blah Mlah",
"Baw"
],
"execution": "and"
}
}
]
}
}
}
}
}
Results:
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "test",
"_type": "test",
"_id": "1",
"_score": 1,
"_source": {
"keywords": [
"Baw",
"Blah Mlah"
]
}
}
]
}
Example query in 2.x:
{
"query": {
"bool": {
"filter": [
{
"terms": {
"keywords.raw": [
"Blah Mlah"
]
}
},
{
"terms": {
"keywords.raw": [
"Baw"
]
}
}
]
}
}
}
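The 2.x form gets its AND behaviour from having one clause per required value inside bool.filter. Here is a short Python sketch that generates those clauses for any list of values. all_terms_filter and has_all are hypothetical names, and has_all is only a local stand-in for what the filter is meant to accept.

```python
def all_terms_filter(field, values):
    """One term clause per value; bool.filter requires all of them to match."""
    return {
        "query": {
            "bool": {
                "filter": [{"term": {field: value}} for value in values]
            }
        }
    }


def has_all(doc_values, required):
    """Local stand-in: does the document carry every required keyword?"""
    return set(required) <= set(doc_values)


body = all_terms_filter("keywords.raw", ["Blah Mlah", "Baw"])
doc1_matches = has_all(["Baw", "Blah Mlah"], ["Blah Mlah", "Baw"])
doc2_matches = has_all(["Baw"], ["Blah Mlah", "Baw"])
```

The sketch uses term instead of a one-element terms list; the two should be equivalent here, since each clause carries exactly one value.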

Filter an array of dictionaries that all must contain all of specified values

Say I had this document:
{
"_index": "food",
"_type": "recipes",
"_id": "AU2LjsMLOuShTUj_LBrT",
"_score": 1,
"_source": {
"name": "granola bars",
"ingredients": [
{
"name": "butter",
"quantity": 4
},
{
"name": "granola",
"quantity": 6
}
]
}
}
Using the following filter matches this document fine:
POST /food/recipes/_search
{
"query": {
"filtered": {
"query": {
"match_all": { }
},
"filter": {
"nested": {
"path": "ingredients",
"filter": {
"bool": {
"must": [
{
"terms": {
"ingredients.name": [
"butter",
"granola"
]
}
}
]
}
}
}
}
}
}
}
However it will also match documents that have additional ingredients.
How can I query so that it will only match documents that only have the ingredients butter and granola?
You need a "double negative", so to speak. You want to match parent documents that have nested docs that match your query, and no nested documents that don't match your query.
To test I set up the following index:
PUT /test_index
{
"settings": {
"number_of_shards": 1
},
"mappings": {
"doc": {
"properties": {
"ingredients": {
"type": "nested",
"properties": {
"name": {
"type": "string"
},
"quantity": {
"type": "long"
}
}
},
"name": {
"type": "string"
}
}
}
}
}
And added these two documents:
PUT /test_index/doc/1
{
"name": "granola bars",
"ingredients": [
{
"name": "butter",
"quantity": 4
},
{
"name": "granola",
"quantity": 6
}
]
}
PUT /test_index/doc/2
{
"name": "granola cookies",
"ingredients": [
{
"name": "butter",
"quantity": 5
},
{
"name": "granola",
"quantity": 7
},
{
"name": "milk",
"quantity": 2
},
{
"name": "sugar",
"quantity": 7
}
]
}
Your query returns both the documents. For the purposes of this question, to make it easier to understand, I first simplified your query a little:
POST /test_index/doc/_search
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"nested": {
"path": "ingredients",
"filter": {
"terms": {
"ingredients.name": [
"butter",
"granola"
]
}
}
}
}
}
}
}
Then I added an outer "bool" with two "nested" filters. One is the filter you originally had inside a "must", and the second is the opposite of the filter you had (so it will match nested documents that do NOT contain those terms), inside a "must_not":
POST /test_index/doc/_search
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"bool": {
"must": [
{
"nested": {
"path": "ingredients",
"filter": {
"terms": {
"ingredients.name": [
"butter",
"granola"
]
}
}
}
}
],
"must_not": [
{
"nested": {
"path": "ingredients",
"filter": {
"not": {
"filter": {
"terms": {
"ingredients.name": [
"butter",
"granola"
]
}
}
}
}
}
}
]
}
}
}
}
}
This returns only the one doc:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "test_index",
"_type": "doc",
"_id": "1",
"_score": 1,
"_source": {
"name": "granola bars",
"ingredients": [
{
"name": "butter",
"quantity": 4
},
{
"name": "granola",
"quantity": 6
}
]
}
}
]
}
}
Here is all the code I used for testing it:
http://sense.qbox.io/gist/e5fd0c35070fb329d40ad342b3198695e6f52d3a
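The filtered query and the not filter used above belong to the 1.x syntax and, as far as I know, were removed in Elasticsearch 5.0. The sketch below is a rough translation of the same double negative into plain bool clauses, written in Python. only_ingredients_query is a hypothetical helper that just returns the request body, and it has not been run against a cluster.

```python
def only_ingredients_query(path, field, allowed):
    """Parents with at least one allowed nested value and no disallowed one."""
    terms = {"terms": {field: list(allowed)}}
    return {
        "query": {
            "bool": {
                # At least one nested object carries an allowed value.
                "filter": [{"nested": {"path": path, "query": terms}}],
                # No nested object carries a value outside the allowed list.
                "must_not": [
                    {
                        "nested": {
                            "path": path,
                            "query": {"bool": {"must_not": [terms]}},
                        }
                    }
                ],
            }
        }
    }


body = only_ingredients_query("ingredients", "ingredients.name", ["butter", "granola"])
```

As with the original filter, this accepts any recipe whose ingredients are a non-empty subset of the list, so a recipe containing only butter would match too. Requiring both ingredients would take one positive nested clause per ingredient.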
