How to search for an exact value in a keyword array field in Elasticsearch?

I have a keyword field that is an array, and I am trying to search for documents that contain one of the values in the array.
A document contains a field called allow_groups:
"allow_groups" : [
  "c4e3f246-0b1f-43cc-831e-37ca620bf083"
],
I have tried a match query:
...
"must": [
  {
    "match": {
      "allow_groups": {
        "query": "c4e3f246-0b1f-43cc-831e-37ca620bf083"
      }
    }
  },
...
This returns results, but even when I change a single character (the 3 at the end of the value to a 4), the documents are still returned. I have to change the value to something much more different before the documents stop being returned.
I have also tried a term query in its place, but I can't get any documents back at all.
I should also mention that, in the end, I am trying to pass an array of values to be matched against the allow_groups keyword array and return a document when there is at least one exact match. I am just testing with one value for now.
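A likely explanation, offered as an assumption because the mapping isn't shown: if `allow_groups` is actually mapped as `text` (for example by dynamic mapping, which also adds an exact-value `allow_groups.keyword` subfield), the standard analyzer splits the UUID at its hyphens, and a `match` query ORs the resulting tokens, so a value that differs in one segment still hits through the other four. A `term` query on that `text` field compares the whole, unanalyzed UUID against the individual segments and finds nothing. The Python sketch below models this with a rough tokenizer (an approximation, not Elasticsearch's real analyzer) and builds a `terms` filter for the final goal of matching any one of several values; all helper names are hypothetical.

```python
import re

# Hypothetical helpers for illustration; not Elasticsearch client code.

def approx_standard_tokens(text: str) -> list[str]:
    # Rough stand-in for the standard analyzer on a UUID: lowercase, then
    # split on anything that is not a letter or digit (e.g. hyphens).
    return re.findall(r"[a-z0-9]+", text.lower())

def match_or_hits(query: str, indexed: str) -> bool:
    # A match query with the default "or" operator hits when ANY query token
    # is among the tokens indexed for the field.
    return bool(set(approx_standard_tokens(query)) & set(approx_standard_tokens(indexed)))

def exact_any_of(field: str, values: list[str]) -> dict:
    # Exact-value filter: a document matches when its keyword field contains
    # at least one of the given values verbatim. If the field was created by
    # dynamic mapping, pass "allow_groups.keyword" as the field instead.
    return {"query": {"bool": {"filter": [{"terms": {field: values}}]}}}

stored = "c4e3f246-0b1f-43cc-831e-37ca620bf083"
typo = "c4e3f246-0b1f-43cc-831e-37ca620bf084"

print(approx_standard_tokens(stored))
print(match_or_hits(typo, stored))  # True: the first four segments still match
print(exact_any_of("allow_groups", [stored, "another-group-id"]))
```

`terms` is used rather than `term` because it already accepts an array, which fits the "at least one exact match" requirement without building a `bool`/`should` by hand.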

Related

Terms query does not work on keyword field which contains an array of values

I am a beginner in Elasticsearch. I recently added a new field, jc_job_meta_field, of keyword type (see image 1 below, where I output the mapping of all my fields); my index is en-gb. I expect it to be an array holding a number of values, and I now have a document with ["Virtual", "Hybrid"] in that field. I want to be able to search for all entries with Virtual in jc_job_meta_field. But when I run a terms query like this
{
  "query": {
    "terms": {
      "jc_job_meta_field": ["Virtual"]
    }
  }
}
Nothing is returned (see image 2 below). Shouldn't it at least return that exact document with ["Virtual", "Hybrid"]? I checked a similar post here, and it seems I am doing exactly what is supposed to work. What went wrong? Thanks in advance!
My Mapping and field values:
My query:
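For reference, a `terms` query on a multi-valued `keyword` field matches a document when any element of the array equals any query value exactly, and, with no normalizer configured, the comparison is case-sensitive. The in-memory Python sketch below (a hypothetical helper, not an Elasticsearch API) models that rule: by it, `["Virtual", "Hybrid"]` should match `["Virtual"]`, so if the real query returns nothing, the cause is probably elsewhere, such as which index is searched or how the document was indexed.

```python
# Hypothetical local model of keyword + terms matching; not Elasticsearch code.

def terms_matches(doc_values: list[str], query_values: list[str]) -> bool:
    # Exact, case-sensitive equality between any stored array element and
    # any requested value; keyword fields are not analyzed.
    wanted = set(query_values)
    return any(value in wanted for value in doc_values)

doc = {"jc_job_meta_field": ["Virtual", "Hybrid"]}
print(terms_matches(doc["jc_job_meta_field"], ["Virtual"]))  # True
print(terms_matches(doc["jc_job_meta_field"], ["virtual"]))  # False: case differs
```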

The boolean fuzzy query in elasticsearch is not returning expected result

I am trying to build a fuzzy bool query on first and last names in Elasticsearch 7.2.0. I have a document with "asim" and "banskota" as the first and last name respectively. But when I query with "asi" or "asimmm" and the exact last name, Elasticsearch returns no results. However, when I query with the exact first name or with "asimm", it returns the intended document.
I also tried a "fuzzy" query instead of "match" and experimented with different fuzziness parameters, but the outcome is the same. Both first and last names are analyzed, and I checked with the Analyze API how 'asim' is analyzed: the document is indexed with 'asim' as a single token by the standard analyzer.
EDIT: It turns out that the fuzzy query works in the 'substitution' case: for example, it returns the result for 'asim' when queried with 'asmi', but not for deletion. This surprises me, as the edit distance in the substitution case is greater than in the deletion case. When the string is longer, for instance the last name 'Banskota', fuzzy matching works in the 'deletion' case as well. What should I do to make fuzzy search work in the 'deletion' case for strings of length 4 or 5?
fuzzy_body = {
    "size": 10,
    "query": {
        "bool": {
            "must": [
                {"match": {"FIRST_NAME_N": {"query": "asi", "fuzziness": "AUTO"}}},
                {"fuzzy": {"LAST_NAME_N": "banskota"}},
            ]
        }
    }
}
It turns out that if the name fields are indexed as keyword type, the query returns the expected results with "AUTO" fuzziness.
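To check the edit-distance reasoning, the Python sketch below computes the edit budget that `"fuzziness": "AUTO"` grants (0 edits for query terms of length 0-2, 1 for 3-5, 2 for longer terms) and an optimal-string-alignment distance in which an adjacent transposition such as "asim" to "asmi" costs a single edit, as with Elasticsearch's default `fuzzy_transpositions: true`. It is a local model, not Lucene's actual automaton, and the function names are hypothetical. By this measure "asi", "asimm", "asimmm" and "asmi" are all within budget of "asim", which suggests the misses came from how `FIRST_NAME_N` was analyzed rather than from the fuzziness setting, consistent with the observation above that mapping the names as keyword fixed it.

```python
# Hypothetical local model of AUTO fuzziness; not Lucene's implementation.

def auto_max_edits(term: str, low: int = 3, high: int = 6) -> int:
    # "AUTO" (same as AUTO:3,6): exact match below `low` characters,
    # one edit below `high` characters, two edits from `high` upward.
    n = len(term)
    if n < low:
        return 0
    return 1 if n < high else 2

def osa_distance(a: str, b: str) -> int:
    # Optimal string alignment distance: insertion, deletion, substitution
    # and adjacent transposition each cost 1.
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]

def within_auto(query_term: str, indexed_term: str) -> bool:
    # The budget is derived from the length of the query term.
    return osa_distance(query_term, indexed_term) <= auto_max_edits(query_term)

for q in ["asi", "asimm", "asimmm", "asmi", "as"]:
    print(q, osa_distance(q, "asim"), auto_max_edits(q), within_auto(q, "asim"))
```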

Elasticsearch how to match documents for which the field tokens are a sub-set of the query tokens

I have a keyword/key-phrase field that I tokenize using the standard analyzer. I want this field to match if the search phrase contains all of the field's tokens.
For example, if the field value is "veni, vidi, vici" and the search phrase is "Ceaser veni,vidi,vici", I want the search phrase to match, but the search phrase "veni, vidi" should not match.
I also need "vidi, veni, vici" (weird!) to match, so the positions and ordering of the terms are not important. I don't think a phrase match would work for me.
I could use a bool query with the "minimum_should_match" parameter for this specific example, but that is not really what I want, as minimum_should_match is about the ratio/number of tokens in the search phrase.
A pure Elasticsearch solution would look like this. You will need two requests.
1) First, pass the user query through the Analyze API to get all the search tokens.
curl -XGET 'localhost:9200/_analyze' -H 'Content-Type: application/json' -d '
{
  "analyzer" : "standard",
  "text" : "Ceaser veni,vidi,vici"
}'
You will get four tokens: ceaser, veni, vidi, vici. Pass these tokens as an array to the next search request.
2) Search for documents whose tokens are a subset of the search tokens.
{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            {
              "query": {
                "match": {
                  "title": "Ceaser veni,vidi,vici"
                }
              }
            },
            {
              "script": {
                "script": "if(search_tokens.containsAll(doc['title'].values)){return true;}",
                "params": {
                  "search_tokens": [
                    "ceaser",
                    "veni",
                    "vidi",
                    "vici"
                  ]
                }
              }
            }
          ]
        }
      }
    }
  }
}
Here, the job of the first match query inside the filter is to narrow down the documents on which the script has to run. The containsAll method checks whether the document's tokens are a subset of the search tokens. This will be slow, but it will do the job with your current setup. One big improvement is to store the tokens as an array so that doc['title'].values can be replaced with that field, which will speed up the script.
Hope this helps!
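The two-step approach above can be modelled locally in Python. This sketch approximates the standard analyzer with a lowercase split on non-alphanumeric characters (an assumption that holds for these simple Latin examples, not the analyzer's full Unicode rules) and applies the same containsAll test the script performs; the helper names are hypothetical.

```python
import re

# Hypothetical helpers; a local model of the analyze + containsAll steps.

def approx_tokens(text: str) -> list[str]:
    # Step 1 stand-in: approximate standard-analyzer tokens for plain text.
    return re.findall(r"[a-z0-9]+", text.lower())

def field_tokens_subset_of_query(field_value: str, search_phrase: str) -> bool:
    # Step 2 stand-in, same test as search_tokens.containsAll(doc['title'].values):
    # every token of the stored field must appear among the search tokens.
    return set(approx_tokens(field_value)) <= set(approx_tokens(search_phrase))

print(approx_tokens("Ceaser veni,vidi,vici"))  # ['ceaser', 'veni', 'vidi', 'vici']
print(field_tokens_subset_of_query("veni, vidi, vici", "Ceaser veni,vidi,vici"))  # True
print(field_tokens_subset_of_query("veni, vidi, vici", "veni, vidi"))  # False
print(field_tokens_subset_of_query("vidi, veni, vici", "Ceaser veni,vidi,vici"))  # True
```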
There is no built-in solution, but this works:
1) Add an extra field with the number of terms in the field for each document. In your "veni, vidi, vici" example, you would have a field like "field_term_count" : 3.
2) Perform a separate match search for each token in the search query.
3) For each document with at least one match, sum the number of searches that matched it (e.g. in a hashtable keyed by document ID with the count as the value).
4) Compare the number of matches from step 3 with the "field_term_count" field of each matching document. If they are equal, the document is a match.
Then "Ceaser veni,vidi,vici" will match, but the search phrase "veni, vidi" will not, as desired. It should be quite fast for reasonable numbers of matches.
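The counting approach can be sketched in Python with a toy inverted index standing in for the per-token searches. Assumptions: `field_term_count` counts distinct terms, duplicate query tokens are searched only once, and tokenization is an approximate lowercase split; the documents and helper names are hypothetical, and none of this is Elasticsearch code.

```python
import re
from collections import Counter, defaultdict

# Hypothetical toy index; approximates one match search per query token.

def approx_tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

docs = {1: "veni, vidi, vici", 2: "vidi, veni, vici", 3: "veni, vidi, vici, amavi"}

# Indexing (step 1): store each document's distinct-term count and build an
# inverted index from term to document IDs.
field_term_count = {doc_id: len(approx_tokens(text)) for doc_id, text in docs.items()}
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in approx_tokens(text):
        index[term].add(doc_id)

def subset_matches(search_phrase: str) -> list[int]:
    hits = Counter()
    # Steps 2-3: one search per distinct query token, counting hits per document.
    for term in approx_tokens(search_phrase):
        for doc_id in index.get(term, ()):
            hits[doc_id] += 1
    # Step 4: a document matches when every one of its terms was hit.
    return sorted(d for d, n in hits.items() if n == field_term_count[d])

print(subset_matches("Ceaser veni,vidi,vici"))  # [1, 2]
print(subset_matches("veni, vidi"))  # []
```

Document 3 is hit by three searches but has four terms, so it is correctly excluded.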

Terms Include elastic search numeric values

My question is whether there is a way to use a terms include on a numeric field in an Elasticsearch aggregation.
I am using a generic query for multiple fields in Elasticsearch, and this is fine because most of my fields are string values and I can specify the unique field with an include. However, one of my fields is numeric, and it throws this error:
"cannot support regular expression style include/exclude settings as they can only be applied to string fields"
So my question is: is there an equivalent of the string-matching include for numeric values? I have tried using a range from, say, 9 to 9, but it returns nothing, and unfortunately the result is keyed by the specified range rather than by the value of the field, which is what I want. Any input would be appreciated!
Thanks!
For exact matches, you can pass the numbers inside an array like this:
{
  "size": 0,
  "aggs": {
    "numeric_agg": {
      "terms": {
        "field": "my_field",
        "include": [1, 2, 3]
      }
    }
  }
}
Hope this helps!
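As a sketch of what the exact-value `include` does, the Python below builds the aggregation body from the answer (field name `my_field` as in the answer) and models locally which buckets survive the filter: only values listed in `include` get a bucket, keyed by the value itself rather than by a range. The helper names are hypothetical, and the bucket simulation is an illustration, not Elasticsearch output (it ignores the terms aggregation's `size` limit and ordering).

```python
from collections import Counter

# Hypothetical helpers; the simulation only mimics include-filtering.

def numeric_terms_agg(field: str, values: list[int]) -> dict:
    # Exact-value include: an array of numbers instead of a regex string.
    return {"size": 0, "aggs": {"numeric_agg": {"terms": {"field": field, "include": values}}}}

def simulate_buckets(field_values: list[int], include: list[int]) -> dict[int, int]:
    # Count documents per value, keeping only the included values.
    allowed = set(include)
    return dict(Counter(v for v in field_values if v in allowed))

print(numeric_terms_agg("my_field", [9]))
print(simulate_buckets([9, 3, 9, 7, 1], [9]))  # {9: 2}
```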

Is it possible to chain fquery filters in elastic search with exact matches?

I have been having trouble writing a method that takes in various search parameters in Elasticsearch. I was working with queries that looked like this:
body:
{query:
  {filtered:
    {filter:
      {and:
        [
          {term: {some_term: "foo"}},
          {term: {is_visible: true}},
          {term: {"term_two": "something"}}
        ]
      }
    }
  }
}
Using this syntax, I thought I could chain these terms together and generate these queries programmatically. I was using simple strings, and if there was a term like "person_name" I could split the query into two, saying "where person_name matches 'JOHN'" and "where person_name matches 'SMITH'", and get accurate results.
However, I just came across the "fquery" upon asking this question:
Escaping slash in elasticsearch
I was not able to use this "and"/"term" filter to search for a value containing slashes, so I learned that I can use fquery to search for the full value, like this:
"fquery": {
  "query": {
    "match": {
      "by_line": "John Smith"
    }
  }
}
But how can I search like this for multiple items? It seems that when I combine fquery with my filtered/filter/and/term queries, my "and" term queries are ignored. What is the best practice for making nested/chained queries in Elasticsearch?
As suggested in the comment below, I can simply add the fquery to the "and" block, like so:
{:filtered=>
  {:filter=>
    {:and=>[
      {:term=>{:is_visible=>true}},
      {:term=>{:is_private=>false}},
      {:fquery=>
        {:query=>{:match=>{:sub_location=>"New JErsey"}}}}
    ]}
  }
}
Why would Elasticsearch also return results with "sub_location" = "new York"? I would like to return only "new jersey" here.
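On the broader question of chaining: the `filtered` query and the `and`/`fquery` filters used here belong to older Elasticsearch versions and have since been removed; in newer versions the same combination is usually written as a `bool` query whose `filter` array mixes `term` and `match` clauses. A hypothetical Python helper that generates such a body from two dictionaries might look like this; it sets `"operator": "and"` on the match clauses for the reason the answer below explains.

```python
# Hypothetical query builder; produces a plain dict to send as the request body.

def build_bool_filter(exact: dict, full_text: dict) -> dict:
    # exact: field -> value, compared verbatim by a term clause.
    # full_text: field -> text, analyzed by a match clause; "and" requires
    # every analyzed term to be present instead of any one of them.
    clauses = [{"term": {field: value}} for field, value in exact.items()]
    clauses += [
        {"match": {field: {"query": text, "operator": "and"}}}
        for field, text in full_text.items()
    ]
    return {"query": {"bool": {"filter": clauses}}}

body = build_bool_filter(
    {"is_visible": True, "is_private": False},
    {"sub_location": "New JErsey"},
)
print(body)
```

Because every clause sits in the same `filter` array, all of them must match, which is the chaining behaviour the `and` filter was meant to provide; new conditions are added by extending the dictionaries rather than by nesting.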
A match query analyzes the input and by default it is a boolean OR query if there are multiple terms after the analysis. In your case, "New JErsey" gets analyzed into the terms "new" and "jersey". The match query that you are using will search for documents in which the indexed value of field "sub_location" is either "new" or "jersey". That is why your query also matches documents where the value of field "sub_location" is "new York" because of the common term "new".
To only match for "new jersey", you can use the following version of the match query:
{
  "query": {
    "match": {
      "sub_location": {
        "query": "New JErsey",
        "operator": "and"
      }
    }
  }
}
This will not match documents where the value of field "sub_location" is "New York". But it will match documents where the value of field "sub_location" is, say, "Jersey New", because the query translates into a boolean query like "new" AND "jersey". If you are fine with this behaviour, well and good; otherwise, read on.
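The difference between the two operators can be modelled in a few lines of Python, assuming the default analyzer reduces these values to lowercase word tokens (approximated here by splitting on non-letters); the helper names are hypothetical and the code is illustrative only. `and` ignores term order, so "Jersey New" matches just like "new jersey".

```python
import re

# Hypothetical local model of match-query operators; not Elasticsearch code.

def words(text: str) -> set[str]:
    # Rough approximation of the standard analyzer for simple place names.
    return set(re.findall(r"[a-z]+", text.lower()))

def match_hits(query: str, value: str, operator: str = "or") -> bool:
    q, v = words(query), words(value)
    # "or": any query term suffices; "and": all query terms must be present.
    return bool(q & v) if operator == "or" else q <= v

for value in ["new jersey", "new York", "Jersey New"]:
    print(value, match_hits("New JErsey", value), match_hits("New JErsey", value, "and"))
```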
All these issues arise because you are using the default analyzer for the field "sub_location", which breaks the text into tokens at word boundaries and indexes them. If you really do not care about partial matches and always want to match the entire string, you can use a custom analyzer with the Keyword Tokenizer and the Lowercase Token Filter. Mind you, this approach will require you to re-index all your documents.
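A sketch of the index settings this answer describes, written as a Python dict to send when creating a new index: a custom analyzer built from the `keyword` tokenizer and the `lowercase` token filter, applied to `sub_location`. The analyzer name `lowercase_keyword` is made up, the mapping uses the typeless `mappings` format of Elasticsearch 7 and later (older versions nest it under a type name), and the small local function only models what the analyzer emits: the entire value, lowercased, as a single token.

```python
# Hypothetical index-creation body; "lowercase_keyword" is an illustrative name.

def lowercase_keyword_index_body() -> dict:
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    # keyword tokenizer keeps the whole string as one token;
                    # lowercase filter makes matching case-insensitive.
                    "lowercase_keyword": {
                        "type": "custom",
                        "tokenizer": "keyword",
                        "filter": ["lowercase"],
                    }
                }
            }
        },
        "mappings": {
            "properties": {
                "sub_location": {"type": "text", "analyzer": "lowercase_keyword"}
            }
        },
    }

def model_lowercase_keyword(text: str) -> list[str]:
    # Local model of the analyzer's output: one lowercased token.
    return [text.lower()]

print(model_lowercase_keyword("New JErsey"))  # ['new jersey']
print(model_lowercase_keyword("New JErsey") == model_lowercase_keyword("new York"))  # False
```

Since the match query analyzes its input with the field's analyzer by default, "New JErsey" becomes the single token "new jersey" and matches only documents whose whole value is "new jersey" in any casing.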

Resources