Elasticsearch wildcard query not honoring the analyzer of the field - elasticsearch

I have a field named "tag" which is analyzed (the default behavior) in Elasticsearch. The "tag" field can hold a single word or a comma-separated string of multiple tags, e.g. "Festive, Fast, Feast".
Now, for example, if a tag is "Festive", I convert it to lowercase before indexing (to avoid case sensitivity) and index it as "festive".
If I search using a match query in all caps, as shown below, I get results as expected.
{
"query": {
"match": {
"tag": "FESTIVE"
}
}
}
But if I do a wildcard query as mentioned below I don't get results :(
{
"query": {
"wildcard": {
"tag": {
"value": "F*"
}
}
}
}
If I change the value field in wildcard search to "f*" instead of "F*" then I get results.
Does anyone know why the wildcard query is behaving case-sensitively?

Wildcard queries fall under term-level queries and hence are not analyzed. From the docs:
Matches documents that have fields matching a wildcard expression (not
analyzed)
You will get the expected results with a query_string query: it lowercases the terms because lowercase_expanded_terms is true by default. Try this:
GET your_index/_search
{
"query": {
"query_string": {
"default_field": "tag",
"query": "F*"
}
}
}
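To see why the case of the pattern matters, here is a rough Python sketch of the two code paths. It is only a simulation, not Elasticsearch code: the `analyze` helper is a hypothetical stand-in for the default analyzer (split on non-alphanumerics, then lowercase), and `fnmatchcase` stands in for comparing an unanalyzed wildcard pattern with the indexed terms.

```python
import re
from fnmatch import fnmatchcase


def analyze(text):
    """Hypothetical stand-in for the default analyzer: tokenize and lowercase."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]


def wildcard_matches(pattern, terms):
    """Compare the pattern with each indexed term as-is (no analysis, case-sensitive)."""
    return [term for term in terms if fnmatchcase(term, pattern)]


# What ends up in the index for the example tag value.
indexed_terms = analyze("Festive, Fast, Feast")

# The wildcard pattern is not analyzed, so "F*" is compared with lowercase terms.
upper_hits = wildcard_matches("F*", indexed_terms)
lower_hits = wildcard_matches("f*", indexed_terms)

print(indexed_terms)
print(upper_hits)
print(lower_hits)
```

The uppercase pattern finds nothing because it is compared unchanged with lowercase terms. Lowercasing the pattern before sending a wildcard query is one way to avoid the mismatch.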
Hope this helps!

Related

Elasticsearch match_phrase_prefix query, with exact prefix match

I have a match_phrase_prefix query which works as expected. But when the user passes any special characters at the end of the keyword, ES ignores these characters and still returns results.
query{ match_phrase_prefix:{ content: { query: searchTerm } } }
I am using this query to search by prefix. If I pass a term like overflow####!!, ES returns all the results with the word overflow in them. Instead, I want an exact prefix match where the special characters are not ignored. The search term could also consist of multiple words, e.g. stack overflow search.
How can I make ES do a prefix match without ignoring the special characters?
You can use the keyword analyzer when defining your query.
{
"query": {
"match_phrase_prefix": {
"content": {
"query": "overflow####!!",
"analyzer": "keyword"
}
}
}
}

Find one result based on a term query or a list of results based on a match query

I have an index of documents, each containing an id and name field. Each document name happens to be unique.
I want to perform a query on the name field that returns one exact result if possible, or falls back to return a list of similar results. For example, if the search term is Acme Incorporated and there is an exact result, return that only. Otherwise return similar matches; e.g: ACME Inc., acme, Ace etc.
I assumed that I need to somehow combine a keyword-based term query for an exact match, and a text-based match query for the similar matches. I am still getting to grips with compound queries so my first attempt was pretty naive:
{
"query": {
"bool": {
"should": [
{
"term": {
"name.exact": "Acme Incorporated"
}
},
{
"match": {
"name": "Acme Incorporated"
}
}
]
}
}
}
This returns a list of similar matches AND an exact match if present, because at least one query should succeed. This is obviously not correct.
In order to facilitate the keyword-based term query above, I added name.exact to my document mapping:
{
"mappings": {
"properties": {
"id": {
"type": "integer"
},
"name": {
"type": "text",
"fields": {
"exact": {
"type": "keyword"
}
}
}
}
}
}
I suppose another approach is to use the Multi Search API to perform the above queries separately. This allows me to look at the responses and decide to use the match query if the term query result set is empty. This will work for my use case, but I suspect that it is not an optimal approach.
I assume this is a common use-case but I am not sure what the solution is.
Edit
My current thinking is to go with a Multi Search request as described above: the first query is the same keyword-based term query, which attempts to find an exact result, and the second is the following compound bool query, which excludes an exact result.
{
"query": {
"bool": {
"must": {
"match": {
"name": "Acme Incorporated"
}
},
"must_not": {
"term": {
"name.exact": "Acme Incorporated"
}
}
}
}
}
In the end, the MultiSearch API suited my use case:
The multi search API executes several searches from a single API request. The format of the request is similar to the bulk API format and makes use of the newline delimited JSON (NDJSON) format.
I used this to perform two queries in one request:
Find any exact results with a keyword-based term query on the document name field.
Find any similar results with a bool query, comprising a match query on the document name field and a must_not of the first query to filter out any exact results.
A Multi Search body is constructed of one or more pairs of an (optionally) empty header and body (a single query) delimited by newlines; e.g:
GET /myindex/_msearch
{}
{"query": {"constant_score": {"filter": {"term": {"name.exact": "Acme Incorporated"}}}}}
{}
{"query": {"bool": {"must": {"match": {"name": "Acme Incorporated"}}, "must_not": {"term": {"name.exact": "Acme Incorporated"}}}}}
The request body is in NDJSON format, in which each line is a valid JSON value. This requires each query to be written on a single line, which is not very readable but is not an issue if you're using a library to construct queries.
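As a sketch of how such a body could be built and the fallback applied on the client side, the Python below serializes the two queries with the standard `json` module. The function names `build_msearch_body` and `pick_hits` are made up for this illustration, and `exact_field` should be whichever keyword sub-field your mapping actually defines (the mapping shown earlier defines name.exact). Sending the request is left to whatever HTTP or Elasticsearch client you use.

```python
import json


def build_msearch_body(name, exact_field, text_field="name"):
    """Return an NDJSON body: an empty header plus a query, twice."""
    exact = {"query": {"constant_score": {"filter": {"term": {exact_field: name}}}}}
    similar = {
        "query": {
            "bool": {
                "must": {"match": {text_field: name}},
                "must_not": {"term": {exact_field: name}},
            }
        }
    }
    lines = [json.dumps({}), json.dumps(exact), json.dumps({}), json.dumps(similar)]
    # The multi search API expects the last line to end with a newline too.
    return "\n".join(lines) + "\n"


def pick_hits(msearch_response):
    """Prefer hits from the exact query; otherwise fall back to the similar ones."""
    exact, similar = msearch_response["responses"]
    exact_hits = exact["hits"]["hits"]
    return exact_hits if exact_hits else similar["hits"]["hits"]


body = build_msearch_body("Acme Incorporated", exact_field="name.exact")
print(body)
```

`pick_hits` assumes the responses come back in the same order as the queries were sent, which is how the multi search response is laid out.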

How to match exact word using query_string

I have an Elasticsearch field with the values slim and extra slim. If I search for slim, documents containing extra slim are included in the results. I want to match the exact word. I used fieldName.keyword while querying, but it didn't work when the field has multiple words.
The query I used is
{"query_string": {"query": "(fit:slim)" } }
How can I match only the specified value using query_string?
When looking for an exact match against a field, use a term query on the keyword field.
Query:
{
"query": {
"term": {
"fit.keyword": "slim"
}
}
}
UPDATE: Via query_string
For exact match using query_string wrap the string to be matched in quotes.
{
"query": {
"query_string": {
"query": "fit.keyword:\"extra slim\""
}
}
}
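When the value comes from user input, the quoting has to happen twice: once for the query_string syntax and once for JSON. The Python sketch below shows one way to do it; `exact_query_string` is a hypothetical helper name, and escaping only the backslash and the double quote inside the phrase is an assumption that should cover typical values.

```python
import json


def exact_query_string(field, value):
    """Build a query_string body that matches `value` as one quoted phrase."""
    # Escape characters that would otherwise end or corrupt the quoted phrase.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return {"query": {"query_string": {"query": f'{field}:"{escaped}"'}}}


body = exact_query_string("fit.keyword", "extra slim")

# json.dumps adds the JSON-level escaping of the inner quotes.
print(json.dumps(body))
```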

Elasticsearch find missing word in phrase

How can I use Elasticsearch to find the missing word in a phrase? For example, I want to find all documents which contain the pattern make * great again. I tried using a wildcard query, but it returned no results:
{
"fields": [
"file_name",
"mime_type",
"id",
"sha1",
"added_at",
"content.title",
"content.keywords",
"content.author"
],
"highlight": {
"encoder": "html",
"fields": {
"content.content": {
"number_of_fragments": 5
}
},
"order": "score",
"tags_schema": "styled"
},
"query": {
"wildcard": {
"content.content": "make * great again"
}
}
}
If I put in a word and use a match_phrase query I get results, so I know I have data which matches the pattern.
Which type of query should I use? Or do I need to add some type of custom analyzer to the field?
Wildcard queries operate on terms, so if you use one on an analyzed field, it will try to match the pattern against each term in that field separately, and no single term contains the whole phrase. In your case, you can create a not_analyzed sub-field (such as content.content.raw) and run the wildcard query on that. Or just map the actual field to not be analyzed, if you don't need to query it in other ways.
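The difference between the two mappings can be imitated in a few lines of Python. This is a simulation under stated assumptions, not Elasticsearch code: the document text is made up for the example, the regex split is a rough stand-in for an analyzer, and `fnmatchcase` plays the role of the wildcard match against a single term.

```python
import re
from fnmatch import fnmatchcase

# Made-up document text and the pattern from the question.
text = "make coffee great again"
pattern = "make * great again"

# Analyzed field: the text is split into separate lowercase terms.
analyzed_terms = re.findall(r"\w+", text.lower())

# not_analyzed field: the whole value is kept as one term.
raw_term = text

# The pattern is tested against one term at a time.
term_hits = [term for term in analyzed_terms if fnmatchcase(term, pattern)]
raw_hit = fnmatchcase(raw_term, pattern)

print(analyzed_terms)
print(term_hits)
print(raw_hit)
```

Note that on a not_analyzed field the pattern has to match the entire stored value, so for longer content it would probably need a * at both ends, and the match is case-sensitive.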

How to deal with punctuation in an ElasticSearch field

I have a field in a document stored in Elasticsearch, which I want to be analyzed as a full text field. In one case, it contains a value for the name field like this:
A&B Corp
I want to be able to search the documents for an auto-complete widget, using a query like this (suppose the user typed A&B into the autocomplete field). The intention is to match documents that contain any terms with the typed prefix.
{ "query": {
"filtered": {
"query": {
"query_string": {
"query": "A&B*",
"fields": [
"firstName",
"lastName",
"name",
"key",
"email"
]
}
},
"filter": {
"terms": {
"environmentId": [
"foo"
]
}
}
}
}
}
My mapping for the name field looks like this:
"name": {
"type": "string"
},
But, I get no results. The query structure works for documents that don't have & in the field, so I'm pretty sure that is part of the problem.
But, I'm not sure how to deal with this. I am pretty sure I still want to analyze the field for full text search.
In addition, if I add a space before the * in the query (ie, "query": "A&B *",) then I get results including A&B, so I don't think it is just discarding the ampersand and treating the A and B as separate terms.
Should I change my mapping? The query?
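The query_string query has a set of reserved characters that need to be escaped.
query_string: read the reserved characters section of the documentation.
So to search for
'A&B' (or) 'A&B Corp' (or) 'A&B....'
escape any reserved characters in the text you want matched literally, but leave the trailing * unescaped so that the query_string parser treats it as the * wildcard operator. An escaped asterisk ("A&B\\*" in JSON) is searched for as a literal character, so it would only match if the asterisk were part of your data.
Currently your query searches for terms starting with "A&B", and the wildcard term is not analyzed the way the indexed text was.
And when you search "A&B *", the whitespace splits the query into separate clauses, so it is now searching for "A&B" (or) "*", and hence you get a match in this case.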
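If the search text comes from user input, the escaping can be done programmatically before the wildcard is appended. The Python helper below is a hypothetical sketch based on the reserved-character list in the query_string documentation; the names `escape_query_string` and `prefix_query` are made up for this example. Note that a single & is not on that list (only && is), so A&B passes through unchanged, and that < and > cannot be escaped at all and are simply removed.

```python
import re

# Reserved in query_string syntax: + - = && || ! ( ) { } [ ] ^ " ~ * ? : \ /
_RESERVED = re.compile(r'(&&|\|\||[+\-=!(){}\[\]^"~*?:\\/])')


def escape_query_string(text):
    """Backslash-escape reserved characters so they are matched literally."""
    # < and > cannot be escaped; removing them is the documented workaround.
    text = text.replace("<", "").replace(">", "")
    return _RESERVED.sub(r"\\\1", text)


def prefix_query(user_input):
    """Escape the user's text, then append an unescaped * as the wildcard."""
    return escape_query_string(user_input) + "*"


print(prefix_query("A&B"))
print(escape_query_string("(1+1)=2"))
```

Whether the resulting prefix finds anything still depends on how the field was analyzed at index time, since the wildcard term is compared with the terms that are actually stored in the index.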
