How to make query_string search exact phrase in ElasticSearch - elasticsearch

I put 2 documents in Elasticsearch:
curl -XPUT "http://localhost:9200/vehicles/vehicle/1" -d'
{
"model": "Classe A"
}'
curl -XPUT "http://localhost:9200/vehicles/vehicle/2" -d'
{
"model": "Classe B"
}'
Why does this query return both documents:
curl -XPOST "http://localhost:9200/vehicles/_search" -d'
{
"query": {
"query_string": {
"query": "model:\"Classe A\""
}
}
}'
And why does this one return only the second document:
curl -XPOST "http://localhost:9200/vehicles/_search" -d'
{
"query": {
"query_string": {
"query": "model:\"Classe B\""
}
}
}'
I want Elasticsearch to match on the exact phrase I pass to the query parameter, WITH the whitespace. How can I do that?

What you need to look at is the analyzer you're using. If you don't specify one, Elasticsearch will use the Standard Analyzer. It works well for the majority of cases with plain text input, but not for the use case you mention.
The standard analyzer splits your string into words and then converts them to lowercase.
If you want to match the whole string "Classe A" and distinguish it from "Classe B", you can use the Keyword Analyzer. This keeps the entire field as one string.
Then you can use the match query, which will return the results you expect.
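To see why the analyzer matters, here is a rough Python sketch that imitates the two analyzers. It is not Elasticsearch code: standard_like and keyword_like are hypothetical helpers, and the optional stop-word removal is an assumption about older default settings, not something stated in this answer.

```python
import re

def standard_like(text, stopwords=frozenset()):
    """Rough imitation of the standard analyzer: split into words, lowercase."""
    tokens = [t.lower() for t in re.findall(r"\w+", text)]
    return [t for t in tokens if t not in stopwords]

def keyword_like(text):
    """Imitation of the keyword analyzer: the whole value is one token."""
    return [text]

docs = {1: "Classe A", 2: "Classe B"}

for doc_id, model in docs.items():
    print(doc_id, standard_like(model), keyword_like(model))

# Assumption: if "a" is treated as a stop word, the phrase "Classe A"
# shrinks to the single token "classe", which both documents contain.
print(standard_like("Classe A", stopwords={"a"}))
```

With word-level tokens both documents share "classe", whereas the keyword-style token only equals the full original value.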
Create the mapping:
PUT vehicles
{
"mappings": {
"vehicle": {
"properties": {
"model": {
"type": "string",
"analyzer": "keyword"
}
}
}
}
}
Perform the query:
POST vehicles/_search
{
"query": {
"match": {
"model": "Classe A"
}
}
}
If you want to use the query_string query instead, you can set the default operator to AND:
POST vehicles/vehicle/_search
{
"query": {
"query_string": {
"query": "Classe B",
"default_operator": "AND"
}
}
}

Additionally, you can use query_string with escaped quotes, which will also match the exact phrase:
POST _search
{
"query": {
"query_string": {
"query": "\"Classe A\""
}
}
}
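When the phrase comes from user input, the quoting can be done in code and the JSON-level escaping left to a JSON library. A minimal Python sketch, where phrase_query is a hypothetical helper name and the field argument is optional:

```python
import json

def phrase_query(phrase, field=None):
    """Build a query_string body that searches for `phrase` as a quoted phrase."""
    # Escape backslashes and double quotes for the query_string syntax.
    escaped = phrase.replace("\\", "\\\\").replace('"', '\\"')
    quoted = f'"{escaped}"'
    if field:
        quoted = f"{field}:{quoted}"
    return {"query": {"query_string": {"query": quoted}}}

body = phrase_query("Classe A", field="model")
print(json.dumps(body, indent=2))  # json.dumps adds the JSON-level escaping
```

The printed body has the same shape as the request above, with the backslashes before the quotes produced by json.dumps.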

Use a match_phrase query, as shown below:
GET /company/employee/_search
{
"query" : {
"match_phrase" : {
"about" : "rock climbing"
}
}
}

It seems that in the latest versions of ES you can just use the .keyword subfield:
POST vehicles/_search
{
"query": {
"term": {
"model.keyword": "Classe A"
}
}
}
It will match exactly the string "Classe A".
Dynamic fields that ES detects as text get a 'keyword' subfield, which is very useful for these cases:
https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-field-mapping.html

Another nice solution is to use match with minimum_should_match (the percentage of the words you want to match). It can be 100%, in which case the results contain at least all the words of the given text.
Note that this approach does NOT take the order of the words into account.
{
"query":{
"bool":{
"should":[
{
"match":{
"my_text":{
"query":"I want to buy a new new car",
"minimum_should_match":"90%"
}
}
}
]
}
}
}
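To get a feel for what a percentage means in practice, here is a small Python sketch of the rounding as I understand it from the minimum_should_match documentation (percentages are rounded down). required_matches is a hypothetical helper, and it assumes each query term becomes one optional clause.

```python
def required_matches(optional_clauses, spec):
    """Number of clauses that must match for a percentage spec such as "90%"."""
    percent = int(spec.rstrip("%"))
    if percent >= 0:
        # Positive percentage: the computed number is rounded down.
        return (optional_clauses * percent) // 100
    # Negative percentage: that share of clauses may be missing, rounded down.
    return optional_clauses - (optional_clauses * -percent) // 100

for n in (3, 4, 8):
    print(n, required_matches(n, "90%"), required_matches(n, "100%"))
```

Because of the rounding down, a high percentage on a short query can still allow one term to be missing, so 100% is the safer choice when every word must be present.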

Related

how to write Elastic search query for exact match for a string

I am using Kibana.
I am trying to put a filter on a field: container_name = "armenian"
but I also have other containers with the following names:
armenian_alpha
armenian_beta
armenian_gama
armenian1
armenian2
After applying the filter, the search query in Kibana becomes:
{
"query": {
"match": {
"container_name": {
"query": "armenian",
"type": "phrase"
}
}
}
}
But the output includes logs for all containers; as far as I can see, the Elasticsearch query is doing pattern matching.
How can I match exactly the string provided and exclude the rest?
You can try a term query. Note that it is case sensitive by default unless you set case_insensitive to true. Also, if your container_name is a text field rather than a keyword field, add .keyword after the field name. Otherwise, omit the .keyword.
Example:
GET /_search
{
"query": {
"term": {
"container_name.keyword": {
"value": "armenian"
}
}
}
}
Link here: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-term-query.html
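The behaviour described above can be imitated in plain Python. This is only an illustration of whole-value comparison, not Elasticsearch code: term_match is a hypothetical helper, and the capitalized "Armenian" entry is added here just to show the case_insensitive effect.

```python
def term_match(stored_value, query_value, case_insensitive=False):
    """Imitate a term query on a keyword field: whole-value comparison."""
    if case_insensitive:
        return stored_value.lower() == query_value.lower()
    return stored_value == query_value

# "Armenian" is a made-up extra value, not from the question.
containers = ["armenian", "armenian_alpha", "armenian_beta",
              "armenian_gama", "armenian1", "armenian2", "Armenian"]

print([c for c in containers if term_match(c, "armenian")])
print([c for c in containers if term_match(c, "armenian", case_insensitive=True)])
```

Only values equal to the whole query string pass, so armenian_alpha, armenian1 and the others are excluded.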
I would recommend using a wildcard, either directly in the match query or with a wildcard query, as follows:
GET /_search
{
"query": {
"match": {
"container_name": {
"query": "*armenian",
"type": "phrase"
}
}
}
}
GET /_search
{
"query": {
"wildcard": {
"container_name": {
"value": "*armenian"
}
}
}
}
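With *armenian you ensure that armenian comes at the end of the value. Note that only the wildcard query treats * as a wildcard; in the match query the text is analyzed first, and the standard analyzer would discard the asterisk.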

Query string query with keyword and text fields in the same search

We are upgrading from Elasticsearch 5.x to 6.x. We make extensive use of query string queries and commonly construct queries that use fields of different types.
In 5.x, the following query worked correctly and without error:
{
"query": {
"query_string": {
"query": "my_keyword_field:\"Exact Phrase Here\" my_text_field:(any words) my_other_text_field:\"Another phrase here\" date_field:[2018-01-01 TO 2018-05-01]",
"default_operator": "AND",
"analyzer": "custom_text"
}
}
}
In 6.x, this query will return the following error:
{
"type": "illegal_state_exception",
"reason": "field:[my_keyword_field] was indexed without position data; cannot run PhraseQuery"
}
If I wrap the phrase in parentheses instead of quotes, the search will return 0 results:
{
"query": {
"query_string": {
"query": "my_keyword_field:(Exact Phrase Here)",
"default_operator": "AND",
"analyzer": "custom_text"
}
}
}
I guess this is because there is a conflict between the way the analyzer stems the incoming query and how the data is stored in the keyword field, but the phrase approach (my_keyword_field:"Exact Phrase Here") did work in 5.x.
Is this no longer supported in 6.x? And if not, what is the migration path and/or a good workaround?
It would be better to restructure the query using the different query types available for each use case. For example, use a term query for an exact search on a keyword field, a range query for ranges, and so on.
You can rephrase query as below:
{
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "my_text_field:(any words) my_other_text_field:\"Another phrase here\"",
"default_operator": "AND",
"analyzer": "custom_text"
}
},
{
"term": {
"my_keyword_field": "Exact Phrase Here"
}
},
{
"range": {
"date_field": {
"gte": "2018-01-01",
"lte": "2018-05-01"
}
}
}
]
}
}
}
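If such queries are assembled in application code, the same split can be done programmatically. A minimal Python sketch, assuming the caller already knows which fields are keyword fields and which are date fields; build_search is a hypothetical helper, not part of any client library.

```python
import json

def build_search(text_query, keyword_filters, date_ranges):
    """Combine a query_string clause with term and range clauses in bool/must."""
    must = [{
        "query_string": {
            "query": text_query,
            "default_operator": "AND",
            "analyzer": "custom_text",
        }
    }]
    # One term clause per keyword field: exact, unanalyzed comparison.
    for field, value in keyword_filters.items():
        must.append({"term": {field: value}})
    # One range clause per date field.
    for field, (start, end) in date_ranges.items():
        must.append({"range": {field: {"gte": start, "lte": end}}})
    return {"query": {"bool": {"must": must}}}

body = build_search(
    'my_text_field:(any words) my_other_text_field:"Another phrase here"',
    {"my_keyword_field": "Exact Phrase Here"},
    {"date_field": ("2018-01-01", "2018-05-01")},
)
print(json.dumps(body, indent=2))
```

Keeping the keyword value out of the query_string text means it never passes through the custom_text analyzer, which is the point of the rewrite above.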

elasticsearch added wildcard fails query

This works as expected:
{
"query": {
"query_string": {
"query": "Hofstetten-Grünau"
}
}
}
But adding a wildcard at the end returns no results, and I wonder why:
{
"query": {
"query_string": {
"query": "Hofstetten-Grünau*"
}
}
}
How can I fix it?
I am using Elasticsearch v5.3.2.
This returns results:
{
"query": {
"query_string": {
"query": "Hofstetten*"
}
}
}
I use a single search field. The end user can freely use wildcards as they see fit. A user might type in:
hofstetten grünau
+ort:hofstetten-grünau
+ort:Hofstetten-G*
so using a match query won't work for me.
I am using Jest (Java annotations) for the mapping, with "default" for this field. My index mapping declares nothing special for the field:
{
"mappings": {
"_default_": {
"date_detection": false,
"dynamic_templates": [{
}]
}
}
}
Adding the wildcard "*" at the end of your query string is causing the query analyzer to interpret the dash between "Hofstetten" and "Grünau" as a logical NOT operator. So you're actually searching for documents that contain Hofstetten but do NOT contain Grünau.
You can verify this by doing the following variations of your search:
"query": "Hofstetten-XXXXX" #should not return results
"query": "Hofstetten-XXXXX*" #should return results
To fix this I would recommend using a match query instead of a query_string query:
{"query": {"match": { "city": "Hofstetten-Grünau" }}}
(with whatever your appropriate field name is in place of city).
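If end users must keep the ability to type wildcards, one option to experiment with is escaping the other reserved characters of the query_string syntax in the value before sending it. The Python sketch below is only an illustration under that assumption: escape_keep_wildcards is a hypothetical helper, it is meant for a bare value rather than a full expression with field names, and whether it resolves the behaviour above depends on the actual cause.

```python
import re

# Characters with special meaning in query_string syntax, except the
# wildcards * and ?, which are deliberately left alone here.
_RESERVED = re.compile(r'([+\-=!(){}\[\]^"~:\\/]|&&|\|\|)')

def escape_keep_wildcards(user_input):
    """Backslash-escape reserved characters but keep * and ? as wildcards."""
    return _RESERVED.sub(r"\\\1", user_input)

print(escape_keep_wildcards("Hofstetten-Grünau*"))
```

The dash is escaped so it cannot be read as an operator, while the trailing * still reaches the query parser as a wildcard.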

Find exact match phrase in ElasticSearch

So I have the following ElasticSearch query:
"query": {
"bool": {
"must": [
{
"nested": {
"path": "specs",
"query": {
"bool": {
"must": [
{
"match": {
"specs.battery": "2 hours"
}
}
],
"minimum_should_match": 1
}
}
}
},
{
"terms": {
"category_ids": [
16405
]
}
}
]
}
}
At the moment it returns all documents that have either 2 or hours in the specs.battery value. How can I modify this query so that it only returns documents that have the exact phrase 2 hours in the specs.battery field? I would also like to be able to search for multiple phrases (2hrs, 2hours, 3 hours, etc.). Is this achievable?
By default, Elasticsearch tokenizes data when you index it. This means that indexing the expression "2 hours" produces 2 tokens mapped to the same document.
However, there will be no single token "2 hours", so a search will match either 2 or hours, or will find nothing at all if you use a filtered query.
To have Elasticsearch treat "2 hours" as one expression, you need to define specs.battery as not_analyzed in your mapping, as follows:
curl -XPOST localhost:9200/your_index -d '{
"mappings" : {
"your_index_type" : {
"properties" : {
...
"battery" : { "type" : "string", "index":"not_analyzed" }
...
}
}
}
}'
Then you can have an exact match using a filtered query as follows:
curl -XGET 'http://localhost:9200/_all/_search?pretty=true' -d '
{
"query": {
"filtered" : {
"filter" : {
"term": {
"battery": "2 hours"
}
}
}
}
}'
Then you'll have an exact match.
More details at: https://www.elastic.co/guide/en/elasticsearch/guide/current/_finding_exact_values.html
If, on the other hand, you absolutely need your field to be analyzed, or you work with an existing index that you can't change, you still have a solution: use the operator "and", as follows:
curl -XGET 'localhost:9200/your_index/_search' -d '
{
"query": {
"match": {
"battery": {
"query": "2 hours",
"operator": "and"
}
}
}
}'
With this last option, as you may have realized already, a document containing "2 hours and something else" will still be matched, so this is not as precise as a "not_analyzed" field.
More details on the last topic at:
https://www.elastic.co/guide/en/elasticsearch/guide/current/match-multi-word.html
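The difference between the two approaches can be imitated in a few lines of Python. This is a simplified model, not Elasticsearch code: tokens is a crude stand-in for an analyzer, and all three function names are hypothetical.

```python
def tokens(text):
    """Very rough stand-in for an analyzer: lowercase and split on whitespace."""
    return text.lower().split()

def exact_match(stored, query):
    """not_analyzed field with a term filter: the whole value must be equal."""
    return stored == query

def all_terms_match(stored, query):
    """Analyzed field with operator "and": every query token must be present."""
    return set(tokens(query)) <= set(tokens(stored))

values = ["2 hours", "2 hours and something else", "3 hours"]
for v in values:
    print(repr(v), exact_match(v, "2 hours"), all_terms_match(v, "2 hours"))
```

Both approaches reject "3 hours", but only the whole-value comparison rejects "2 hours and something else".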

Find misspelled documents in elasticsearch

I have an Author document type in my Elasticsearch index, and a user input that adds new authors to the index.
Before storing a new Author, I want to check whether the Author already exists in the index, even if the name was misspelled the first time.
I'm doing a fuzzy search, which seems to be the way to do this.
Here is the request I'm making:
curl 'http://localhost:9200/my_index/Author/_search?pretty' -d '{
"query":
{
"fuzzy": {
"name": {
"value": "put a name here"
}
}
}
}'
Given that I have an Author named "Daniel Bluefield".
The above request works well when I search for "Danel".
But it does not return any result if I search for the full name.
How can I make a request for "Danel Bluefld" return some results?
Change it to fuzzy_like_this_field; you might need to tweak the fuzziness parameter:
curl 'http://localhost:9200/my_index/Author/_search?pretty' -d '{
"query":
{
"fuzzy_like_this_field" : {
"name" : {
"like_text" : "Danel Bluefld",
"max_query_terms" : 10
}
}
}
}'
Mihai's answer works well; however, I've managed to make it work another way:
{
"min_score": 3,
"query": {
"match": {
"name": {
"query": "danil greenfld",
"fuzziness": "AUTO"
}
}
}
}
But I can't really see the difference between those two queries...
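One way to reason about the original problem: the fuzzy query is a term-level query, so its value is not analyzed and is compared as a whole against each indexed token, whereas the match query analyzes the input first and applies fuzziness to each resulting term. The Python sketch below illustrates this with a plain Levenshtein distance (Elasticsearch also counts transpositions as one edit by default, which this simplification ignores). It assumes the name field was indexed as the lowercase tokens "daniel" and "bluefield"; both helper names are hypothetical.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

def auto_fuzziness(term):
    """Allowed edits under "AUTO" with its default thresholds of 3 and 6."""
    if len(term) < 3:
        return 0
    return 1 if len(term) < 6 else 2

# Assumed result of analyzing "Daniel Bluefield" at index time.
indexed_tokens = ["daniel", "bluefield"]
for query_term in ["danel", "bluefld", "danel bluefld"]:
    distances = {tok: edit_distance(query_term, tok) for tok in indexed_tokens}
    print(query_term, auto_fuzziness(query_term), distances)
```

Taken one word at a time, "danel" and "bluefld" are each within the allowed number of edits of an indexed token, but the unsplit string "danel bluefld" is far from both, which is consistent with the full-name fuzzy query returning nothing.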
