Elastic match documents containing all values in array - elasticsearch

I have documents like the ones below in Elastic. Each document has some information and all the permissions that are mandatory to access the document.
When I query, I would like to pass all user permissions and receive matched documents but am having difficulties with the query.
The documents:
{
"id": 1,
"permissions": ["a", "d"]
}
{
"id": 2,
"permissions": ["a"]
}
{
"id": 3,
"permissions": ["a", "b"]
}
{
"id": 4,
"permissions": ["a", "c"]
}
This is the closest I got:
{
"query": {
"bool": {
"should": [
{
"match_phrase": {
"tags.keyword": "a"
}
},
{
"match_phrase": {
"tags.keyword": "b"
}
}
],
"minimum_should_match": doc.tags.length
}
}
}
// The results are the documents with ids 2 and 3.
I tried extending the "minimum_should_match" with the "script" but without success (apparently it does not support it):
"script" : {
"script" : {
"inline": "doc['permissions'].length",
"lang": "painless"
}
}
In the example above, with the permission array ["a", "b", "c"] passed in, the output should be the documents with ids 2, 3 and 4. ["a"] matches only the document with id 2.
EDIT: Additional information
A document has up to 5 permissions, and so does a user, but the set of possible permissions is quite big (500+), so I am searching for a generic query. The data can also be transformed.
I am using Elastic 7.6
Any help is appreciated!

There is no efficient way to achieve your expected result. Apart from the search query below, you can also get the result using scripting; please refer to this SO answer for details.
You can use the terms query, which returns documents that contain one or more exact terms in a provided field.
Adding a working example with index data (the same as given in the question), index mapping, search query, and search result.
Index Mapping:
{
"mappings": {
"properties": {
"id": {
"type": "integer"
},
"permissions": {
"type": "text"
}
}
}
}
Search Query:
{
"query": {
"constant_score": {
"filter": {
"bool": {
"must": [
{
"terms": {
"permissions": [
"a",
"b",
"c"
]
}
}
],
"must_not": [
{
"terms": {
"permissions": [
"d"
]
}
}
]
}
}
}
}
}
Search Result:
"hits": [
{
"_index": "stof_64081578",
"_type": "_doc",
"_id": "2",
"_score": 1.0,
"_source": {
"id": 2,
"permissions": [
"a"
]
}
},
{
"_index": "stof_64081578",
"_type": "_doc",
"_id": "3",
"_score": 1.0,
"_source": {
"id": 3,
"permissions": [
"a",
"b"
]
}
},
{
"_index": "stof_64081578",
"_type": "_doc",
"_id": "4",
"_score": 1.0,
"_source": {
"id": 4,
"permissions": [
"a",
"c"
]
}
}
]
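The must / must_not filter above can be generated from a user's permission list, provided the application knows the full set of permission names. Here is a minimal Python sketch under that assumption. The names build_permission_query, is_visible, visible_ids and ALL_PERMISSIONS are hypothetical helpers, not part of Elasticsearch, and the local is_visible check only mirrors the intended rule (every permission on the document must be held by the user) for the four sample documents.

```python
from typing import Dict, Iterable, List

# Assumption: the application knows the complete permission vocabulary.
ALL_PERMISSIONS: List[str] = ["a", "b", "c", "d"]

# Sample documents from the question: id -> mandatory permissions.
DOCS: Dict[int, List[str]] = {
    1: ["a", "d"],
    2: ["a"],
    3: ["a", "b"],
    4: ["a", "c"],
}


def build_permission_query(user_permissions: Iterable[str],
                           all_permissions: Iterable[str] = ALL_PERMISSIONS) -> dict:
    """Build the constant_score/bool body: match any held permission,
    exclude documents that carry a permission the user does not hold."""
    held = sorted(set(user_permissions))
    missing = sorted(set(all_permissions) - set(held))
    bool_query: dict = {"must": [{"terms": {"permissions": held}}]}
    if missing:
        bool_query["must_not"] = [{"terms": {"permissions": missing}}]
    return {"query": {"constant_score": {"filter": {"bool": bool_query}}}}


def is_visible(doc_permissions: Iterable[str], user_permissions: Iterable[str]) -> bool:
    """Local mirror of the rule: the document's permissions must be a
    non-empty subset of the user's permissions."""
    required = set(doc_permissions)
    return bool(required) and required <= set(user_permissions)


def visible_ids(user_permissions: Iterable[str]) -> List[int]:
    return [doc_id for doc_id, perms in DOCS.items()
            if is_visible(perms, user_permissions)]


print(build_permission_query(["a", "b", "c"]))
print(visible_ids(["a", "b", "c"]))  # [2, 3, 4]
print(visible_ids(["a"]))            # [2]
```

Note that the must_not list grows with the size of the permission vocabulary, so with 500+ permissions the generated query carries most of them on every request.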

Related

Need Elasticsearch guidance for searching on an array of chemical compounds

I have a list of products and an array of chemical compounds for each product, e.g. ['Sodium', 'Sodium bicarbonate', .....]. In this example 'sodium' and 'sodium bicarbonate' are two different values that can be searched on independently, which complicates things, so using the text keyword field criteria did not help.
I need some guidance on the best method to handle these arrays of strings within Elasticsearch while retaining Elasticsearch's indexing magic. I appreciate any help you can provide.
FYI
I'm currently using Elasticsearch 6.3
You can use the multi_match query, which builds on the match query to allow multi-field queries.
Adding a working example with index data, search query, and search result.
Index Data:
{
"product": "product1",
"compounds": [
"Sodium",
"Sodium bicarbonate"
]
}
{
"product": "product2",
"compounds": [
"Sodium"
]
}
{
"product": "product3",
"compounds": [
"Sodium bicarbonate"
]
}
{
"product": "product4",
"compounds": [
"Chlorine"
]
}
Search Query:
{
"query": {
"multi_match" : {
"query": "Sodium AND Sodium bicarbonate",
"fields": [ "compounds", "compounds.keyword" ]
}
}
}
Search Result:
"hits": [
{
"_index": "65513968",
"_type": "_doc",
"_id": "1",
"_score": 1.0897084,
"_source": {
"product": "product1",
"compounds": [
"Sodium",
"Sodium bicarbonate"
]
}
},
{
"_index": "65513968",
"_type": "_doc",
"_id": "3",
"_score": 1.0659102,
"_source": {
"product": "product3",
"compounds": [
"Sodium bicarbonate"
]
}
},
{
"_index": "65513968",
"_type": "_doc",
"_id": "2",
"_score": 0.7032229,
"_source": {
"product": "product2",
"compounds": [
"Sodium"
]
}
}
]
You can use the terms query if you want to return documents that contain one or more exact terms in a field.
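To illustrate how exact-term matching differs from the analyzed multi_match search, here is a small Python sketch. It builds a terms query body against compounds.keyword (assuming the default dynamic mapping, which adds a .keyword sub-field to text fields) and mimics exact matching locally on the sample products. build_terms_query and exact_match are hypothetical helper names.

```python
from typing import Dict, Iterable, List

# Sample index data from this answer: product -> compounds.
PRODUCTS: Dict[str, List[str]] = {
    "product1": ["Sodium", "Sodium bicarbonate"],
    "product2": ["Sodium"],
    "product3": ["Sodium bicarbonate"],
    "product4": ["Chlorine"],
}


def build_terms_query(values: Iterable[str],
                      field: str = "compounds.keyword") -> dict:
    """Request body for a terms query: a document matches if the field
    contains at least one of the given exact values."""
    return {"query": {"terms": {field: list(values)}}}


def exact_match(products: Dict[str, List[str]], values: Iterable[str]) -> List[str]:
    """Local stand-in for the terms query: whole-value comparison only."""
    wanted = set(values)
    return sorted(name for name, compounds in products.items()
                  if wanted & set(compounds))


print(build_terms_query(["Sodium"]))
print(exact_match(PRODUCTS, ["Sodium"]))  # ['product1', 'product2']
```

With exact matching, "Sodium" no longer pulls in product3, whose only compound is "Sodium bicarbonate".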
A unique list of chemical compounds
To find the unique lists of chemical compounds you can use the terms aggregation.
{
"size": 0,
"aggs": {
"compounds": {
"terms": {
"field": "compounds.keyword"
}
}
}
}
Result:
"aggregations": {
"compounds": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Sodium",
"doc_count": 2
},
{
"key": "Sodium bicarbonate",
"doc_count": 2
},
{
"key": "Chlorine",
"doc_count": 1
}
]
}
}
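The bucket counts above can be cross-checked without a cluster. The Python sketch below builds the same terms aggregation body and counts, per compound, how many sample products contain it. build_terms_agg and count_compounds are hypothetical names, and the explicit size parameter is an addition for illustration (the request above relies on the default).

```python
from collections import Counter
from typing import Dict, List

# Sample index data from this answer: product -> compounds.
PRODUCTS: Dict[str, List[str]] = {
    "product1": ["Sodium", "Sodium bicarbonate"],
    "product2": ["Sodium"],
    "product3": ["Sodium bicarbonate"],
    "product4": ["Chlorine"],
}


def build_terms_agg(field: str = "compounds.keyword", size: int = 10) -> dict:
    """Request body that returns only aggregation buckets (no hits)."""
    return {
        "size": 0,
        "aggs": {"compounds": {"terms": {"field": field, "size": size}}},
    }


def count_compounds(products: Dict[str, List[str]]) -> Counter:
    """Count documents per distinct compound, like doc_count in a bucket."""
    counts: Counter = Counter()
    for compounds in products.values():
        counts.update(set(compounds))  # a document counts once per value
    return counts


print(build_terms_agg())
print(count_compounds(PRODUCTS))
```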

Elastic Search search for all fields and boost exact match

I'm very new to Elasticsearch. How do I write a query which searches for a keyword (e.g. test keyword) in all fields of the document, and boosts for:
1. an exact match for this keyword phrase in all fields
2. occurrences in certain fields (which I have boosted: 5 for A, 3 for B and 1 for C)
I see some documentation on match_phrase, but it doesn't seem to support multiple fields.
{
"query": {
"multi_match": {
"query": "test keyword",
"fields": ["A^5", "B^3", "C^1"]
}
}
}
If you want an exact match for the keyword phrase in all fields along with boosting, try the search query below, where the multi_match query is used with the type parameter set to phrase:
Adding a working example with index data, search query, and search result.
Index data:
{
"A":"test keyword",
"B":"a",
"C":"c"
}
{
"A":"a",
"B":"test keyword",
"C":"c"
}
{
"A":"a",
"B":"b",
"C":"test keyword"
}
Search Query:
{
"query": {
"bool": {
"should": [
{
"multi_match": {
"query": "test keyword",
"fields": [
"A^5",
"B^3",
"C^1"
],
"type": "phrase" // note this
}
}
]
}
}
}
Search Result:
"hits": [
{
"_index": "stof_64266554",
"_type": "_doc",
"_id": "1",
"_score": 16.285465,
"_source": {
"A": "test keyword",
"B": "a",
"C": "c"
}
},
{
"_index": "stof_64266554",
"_type": "_doc",
"_id": "2",
"_score": 8.142733,
"_source": {
"A": "a",
"B": "test keyword",
"C": "c"
}
},
{
"_index": "stof_64266554",
"_type": "_doc",
"_id": "3",
"_score": 1.6285465,
"_source": {
"A": "a",
"B": "b",
"C": "test keyword"
}
}
]
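If the field boosts live in application code, the request body can be assembled from a mapping of field name to boost. The Python sketch below is one way to do that. build_phrase_query and build_boosted_search are hypothetical names. The second helper is an assumption about what the question asks for (match the words in any field, and score higher when the exact phrase occurs); it is not taken from the answer above.

```python
from typing import Dict, List


def boosted_fields(boosts: Dict[str, int]) -> List[str]:
    """Turn {"A": 5} into the "A^5" syntax used by multi_match."""
    return [f"{name}^{boost}" for name, boost in boosts.items()]


def build_phrase_query(text: str, boosts: Dict[str, int]) -> dict:
    """Same shape as the multi_match query above, with type "phrase"."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "fields": boosted_fields(boosts),
                "type": "phrase",
            }
        }
    }


def build_boosted_search(text: str, boosts: Dict[str, int]) -> dict:
    """Assumed variant: a plain multi_match must match, and the phrase
    version in 'should' only adds to the score when it also matches."""
    fields = boosted_fields(boosts)
    return {
        "query": {
            "bool": {
                "must": [{"multi_match": {"query": text, "fields": fields}}],
                "should": [{"multi_match": {"query": text, "fields": fields,
                                            "type": "phrase"}}],
            }
        }
    }


print(build_phrase_query("test keyword", {"A": 5, "B": 3, "C": 1}))
```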

Elasticsearch associating exact match terms

I have a search index of filenames containing over 100,000 entries that share about 500 unique variations of the main filename field. I have recently made some modifications to certain filename values that are being generated from my data. I was wondering if there is a way to link certain queries to return an exact match. In the following query:
"query": {
"bool": {
"must": [
{
"match": {
"filename": "foo-bar"
}
}
]
}
}
How would it be possible to modify the index and associate the results so that the above query also matches foo-bar-baz, but not foo-bar-foo or any other variation?
Thanks in advance for your help
You can use a term query instead of a match query. Perfect to use on a keyword:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-term-query.html
Adding a working example with index data and search query. (Using the default mapping)
Index Data:
{
"fileName": "foo-bar"
}
{
"fileName": "foo-bar-baz"
}
{
"fileName": "foo-bar-foo"
}
Search Query:
{
"query": {
"bool": {
"should": [
{
"match": {
"fileName.keyword": "foo-bar"
}
},
{
"match": {
"fileName.keyword": "foo-bar-baz"
}
}
]
}
}
}
Search Result:
"hits": [
{
"_index": "test",
"_type": "_doc",
"_id": "1",
"_score": 0.9808291,
"_source": {
"fileName": "foo-bar"
}
},
{
"_index": "test",
"_type": "_doc",
"_id": "2",
"_score": 0.9808291,
"_source": {
"fileName": "foo-bar-baz"
}
}
]
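One way to keep the "foo-bar also means foo-bar-baz" association outside the index is a small lookup table in the application that expands a filename into the exact values to search for. The Python sketch below assumes such a table. ASSOCIATIONS, build_exact_filename_query and local_matches are hypothetical names, and the term clauses target fileName.keyword as in the query above.

```python
from typing import Dict, List

# Hypothetical alias table: query value -> exact filenames it should match.
ASSOCIATIONS: Dict[str, List[str]] = {"foo-bar": ["foo-bar", "foo-bar-baz"]}

# Sample index data from this answer.
FILENAMES: List[str] = ["foo-bar", "foo-bar-baz", "foo-bar-foo"]


def expand(name: str) -> List[str]:
    """Return the associated exact values, or just the name itself."""
    return ASSOCIATIONS.get(name, [name])


def build_exact_filename_query(name: str, field: str = "fileName.keyword") -> dict:
    """bool/should of term queries, one per associated exact value."""
    return {
        "query": {
            "bool": {
                "should": [{"term": {field: value}} for value in expand(name)],
                "minimum_should_match": 1,
            }
        }
    }


def local_matches(name: str) -> List[str]:
    """Local stand-in for the query: exact comparison against the table."""
    wanted = set(expand(name))
    return [f for f in FILENAMES if f in wanted]


print(build_exact_filename_query("foo-bar"))
print(local_matches("foo-bar"))  # ['foo-bar', 'foo-bar-baz']
```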

Elastic Search- Fetch Distinct Tags

I have documents of the following format:
{
_id :"1",
tags:["guava","apple","mango", "banana", "gulmohar"]
}
{
_id:"2",
tags: ["orange","guava", "mango shakes", "apple pie", "grammar"]
}
{
_id:"3",
tags: ["apple","grapes", "water", "gulmohar","water-melon", "green"]
}
Now, I want to fetch the unique tag values from the 'tags' field across all documents that start with the prefix g*, so that these unique tags can be displayed by a tag suggester (the Stack Overflow site is an example).
For example, whenever the user types 'g':
"guava", "gulmohar", "grammar", "grapes" and "green" should be returned as the result.
i.e. the query should return the distinct tags with the prefix g*.
I have looked everywhere, browsed the whole documentation, and searched the ES forum, but much to my dismay I didn't find any clue.
I tried aggregations, but an aggregation returns the distinct counts for all words/tokens in the tags field. It does not return the unique list of tags starting with 'g'.
"query": {
"filtered": {
"query": {
"bool": {
"should": [
{
"query_string": {
"allow_leading_wildcard": false,
"fields": [
"tags"
],
"query": "g*",
"fuzziness":0
}
}
]
}
},
"filter": {
//some condition on other field...
}
}
},
"aggs": {
"distinct_tags": {
"terms": {
"field": "tags",
"size": 10
}
}
},
result of above: guava(w), apple(q), mango(1),...
Can someone please suggest the correct way to fetch all the distinct tags with the prefix input_prefix*?
It's a bit of a hack, but this seems to accomplish what you want.
I created an index and added your docs:
DELETE /test_index
PUT /test_index
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 0
}
}
POST /test_index/_bulk
{"index":{"_index":"test_index","_type":"doc","_id":1}}
{"tags":["guava","apple","mango", "banana", "gulmohar"]}
{"index":{"_index":"test_index","_type":"doc","_id":2}}
{"tags": ["orange","guava", "mango shakes", "apple pie", "grammar"]}
{"index":{"_index":"test_index","_type":"doc","_id":3}}
{"tags": ["guava","apple","grapes", "water", "grammar","gulmohar","water-melon", "green"]}
Then I used a combination of prefix query and highlighting as follows:
POST /test_index/_search
{
"query": {
"prefix": {
"tags": {
"value": "g"
}
}
},
"fields": [ ],
"highlight": {
"pre_tags": [""],
"post_tags": [""],
"fields": {
"tags": {}
}
}
}
...
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 1,
"hits": [
{
"_index": "test_index",
"_type": "doc",
"_id": "1",
"_score": 1,
"highlight": {
"tags": [
"guava",
"gulmohar"
]
}
},
{
"_index": "test_index",
"_type": "doc",
"_id": "2",
"_score": 1,
"highlight": {
"tags": [
"guava",
"grammar"
]
}
},
{
"_index": "test_index",
"_type": "doc",
"_id": "3",
"_score": 1,
"highlight": {
"tags": [
"guava",
"grapes",
"grammar",
"gulmohar",
"green"
]
}
}
]
}
}
Here is the code I used:
http://sense.qbox.io/gist/c14675ee8bd3934389a6cb0c85ff57621a17bf11
What you're trying to do amounts to autocomplete, of course, and there are perhaps better ways of going about that than what I posted above (though they are a bit more involved). Here are a couple of blog posts we did about ways to set up autocomplete:
http://blog.qbox.io/quick-and-dirty-autocomplete-with-elasticsearch-completion-suggest
http://blog.qbox.io/multi-field-partial-word-autocomplete-in-elasticsearch-using-ngrams
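The prefix-plus-highlight response shown earlier in this answer still repeats tags across hits, so the distinct list has to be assembled on the client. Here is a minimal Python sketch of that step. distinct_highlighted_tags is a hypothetical helper, and RESPONSE is a trimmed copy of the response above that keeps only the highlight parts.

```python
from typing import Dict, List

# Trimmed copy of the search response above (highlight sections only).
RESPONSE: Dict = {
    "hits": {
        "hits": [
            {"_id": "1", "highlight": {"tags": ["guava", "gulmohar"]}},
            {"_id": "2", "highlight": {"tags": ["guava", "grammar"]}},
            {"_id": "3", "highlight": {"tags": ["guava", "grapes", "grammar",
                                                "gulmohar", "green"]}},
        ]
    }
}


def distinct_highlighted_tags(response: Dict, field: str = "tags") -> List[str]:
    """Collect highlighted values across hits, keeping first-seen order."""
    seen: Dict[str, None] = {}
    for hit in response.get("hits", {}).get("hits", []):
        for tag in hit.get("highlight", {}).get(field, []):
            seen.setdefault(tag, None)  # dict keys keep insertion order
    return list(seen)


print(distinct_highlighted_tags(RESPONSE))
# ['guava', 'gulmohar', 'grammar', 'grapes', 'green']
```

This only sees the hits that the search actually returned, so the list is complete only if the request's size covers every matching document.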
As per @Sloan Ahrens's advice, I did the following:
Updated the mapping:
"tags": {
"type": "completion",
"context": {
"filter_color": {
"type": "category",
"default": "",
"path": "fruits.color"
},
"filter_type": {
"type": "category",
"default": "",
"path": "fruits.type"
}
}
}
Reference: ES API Guide
Indexed these documents:
{
_id :"1",
tags:{"input" :["guava","apple","mango", "banana", "gulmohar"]},
fruits:{color:'bar',type:'alice'}
}
{
_id:"2",
tags:{"input" :["orange","guava", "mango shakes", "apple pie", "grammar"]},
fruits:{color:'foo',type:'bob'}
}
{
_id:"3",
tags:{"input" :["apple","grapes", "water", "gulmohar","water-melon", "green"]},
fruits:{color:'foo',type:'alice'}
}
I didn't need to modify my original index much. I just added input before the tags array.
POST rescu1/_suggest?pretty
{
"suggest": {
"text": "g",
"completion": {
"field": "tags",
"size": 10,
"context": {
"filter_color": "bar",
"filter_type": "alice"
}
}
}
}
gave me the desired output.
I accepted @Sloan Ahrens's answer as his suggestions worked like a charm for me, and he showed me the right direction.

ElasticSearch - prefix with space and filtering

My ElasticSearch server contains documents of the following form:
{
"_index": "xindex",
"_type": "xtype",
"_id": "1100",
"_score": 3.00010,
"_source": {
"_id": "2333345",
"field1": "11111111111111",
"field2": "y",
"name": "hello world"
}
}
I need to get all the documents with the name prefix "hello wo" and field2 "y".
I have tried a lot of queries and none have worked. There are all kinds of solutions for the prefix-with-space issue, but when adding the filtering/another query for field2, the results get corrupted.
Thanks.
You can achieve this in 3 steps:
1. Change the mapping of the name field to not_analyzed.
2. Use a match_phrase_prefix query (documentation here).
3. Filter the query results by wrapping the query in a filtered query and using a term filter on field2 with the value "y".
You can see it working with the following dataset :
PUT test/prefix/_mapping
{
"properties": {
"name":{
"type": "string",
"index": "not_analyzed"
}
}
}
//should match
PUT test/prefix/2333345
{
"field1": "11111111111111",
"field2": "y",
"name": "hello world"
}
//should match
PUT test/prefix/1112223
{
"field1": "22222222222222",
"field2": "y",
"name": "hello wombat"
}
//should not match (field2 value is different)
PUT test/prefix/4445556
{
"field1": "33333333333333",
"field2": "z",
"name": "hello world"
}
//should not match (second word not starting with wo)
PUT test/prefix/5556667
{
"field1": "33333333333333",
"field2": "y",
"name": "hello zombie"
}
Then, the query is :
GET test/prefix/_search
{
"query": {
"filtered": {
"query": {
"match_phrase_prefix" : {
"name" : "hello wo"
}
},
"filter": {
"term": {
"field2": "y"
}
}
}
}
}
which outputs the documents 1112223 and 2333345 as expected :
{
"took": 20,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 1.592944,
"hits": [
{
"_index": "test",
"_type": "prefix",
"_id": "2333345",
"_score": 1.592944,
"_source": {
"field1": "11111111111111",
"field2": "y",
"name": "hello world"
}
},
{
"_index": "test",
"_type": "prefix",
"_id": "1112223",
"_score": 1.592944,
"_source": {
"field1": "22222222222222",
"field2": "y",
"name": "hello wombat"
}
}
]
}
}
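The filtered query used above was removed in Elasticsearch 5.0. On later versions the same combination is usually written as a bool query with a filter clause. The Python sketch below builds such a body, assuming name is mapped as keyword (the successor of a not_analyzed string). It swaps in a term-level prefix query in place of match_phrase_prefix, since prefix works directly on keyword fields. build_prefix_filter_query and local_matches are hypothetical names, and the id of the last sample document is a placeholder.

```python
from typing import Dict, List

# Sample documents from this answer (the last id is a placeholder).
DOCS: List[Dict[str, str]] = [
    {"_id": "2333345", "field2": "y", "name": "hello world"},
    {"_id": "1112223", "field2": "y", "name": "hello wombat"},
    {"_id": "4445556", "field2": "z", "name": "hello world"},
    {"_id": "5556667", "field2": "y", "name": "hello zombie"},
]


def build_prefix_filter_query(prefix: str, field2_value: str) -> dict:
    """bool query: prefix on the un-analyzed name, term filter on field2."""
    return {
        "query": {
            "bool": {
                "must": [{"prefix": {"name": {"value": prefix}}}],
                "filter": [{"term": {"field2": field2_value}}],
            }
        }
    }


def local_matches(prefix: str, field2_value: str) -> List[str]:
    """Local stand-in: whole-string prefix check plus exact field2 match."""
    return [d["_id"] for d in DOCS
            if d["name"].startswith(prefix) and d["field2"] == field2_value]


print(build_prefix_filter_query("hello wo", "y"))
print(local_matches("hello wo", "y"))  # ['2333345', '1112223']
```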
Use simple_query_string. This approach solved my issue:
{
"query": {
"bool": {
"should": [
{
"simple_query_string": {
"fields": [
"name"
],
"default_operator": "and",
"query": "(hello world*)"
}
}
]
}
}
}
