Exact query search in Elasticsearch

I have this query that returns a document if the word "mumbai" appears anywhere in the title.
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"term": {
"title": "mumbai"
}
}
}
}
}
So the result contains...
mumbai
mumbai ports
financial capital mumbai
I need to return only the document whose title is exactly "mumbai", not the documents where the word mumbai appears alongside other words. Only the first result is correct. How do I discard the other results?
Update:
This query works as expected: it lists the sort value 58 (an arbitrary value) if the match is exact.
curl -XPOST "localhost:9200/enwiki_content/page/_search?pretty" -d'
{
"fields": "title",
"query": {
"match": {"title": "Mumbai"}
},
"sort": {
"_script": {
"script": "_source.title == \"Mumbai\" ? \"58\": \"78\";",
"type": "string"
}
}
}'
I need to return only the titles where the match is exactly Mumbai (and hence the sort value is 58). How do I filter on that, or add the script to the "fields" parameter?

To get mumbai to match only a document whose title contains mumbai and nothing else, you'll have to store a token count field for the field you are searching on.
This token count field holds the number of tokens the field contains. Using it, you can match mumbai on your title field and also match the token_count field against the number of tokens in mumbai (which is one).
Note that the token_count field in the other documents will be more than 1.
For reference:
https://www.elastic.co/guide/en/elasticsearch/reference/current/token-count.html
Note: If you are using stopwords, then you need to know about the other caveats related to token count. You can find the information in the above link.
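A minimal sketch of the token count approach, written as Python dicts that mirror the JSON request bodies. The sub-field name length, the standard analyzer, and the typeless mapping syntax of recent Elasticsearch versions are assumptions here, not part of the original answer.

```python
import json

# Hypothetical mapping: "title.length" stores how many tokens the title produced.
mapping = {
    "mappings": {
        "properties": {
            "title": {
                "type": "text",
                "fields": {
                    "length": {"type": "token_count", "analyzer": "standard"}
                },
            }
        }
    }
}


def exact_single_field_query(field, text, token_count):
    """Match `text` on `field`, restricted to documents whose field
    holds exactly `token_count` tokens."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {field: text}}],
                "filter": [{"term": {field + ".length": token_count}}],
            }
        }
    }


# "mumbai" is a single token, so only titles with one token can match.
query = exact_single_field_query("title", "mumbai", 1)
print(json.dumps(query, indent=2))
```

The filter clause does not affect scoring; it only drops titles such as mumbai ports, whose token count is 2.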

Try the term query. It matches the exact term as stored in the index, without analyzing the search text:
{
"query": {
"bool": {
"must": [
{
"term": {
"title": "mumbai"
}
}
]
}
}
}
Note that a term query is case-sensitive: Mumbai and mumbai are treated as different terms.
Second option:
If you can change the mapping, you can set the title field to not_analyzed.
Third option:
Use a match query with the analyzer option:
{
"query": {
"match": {
"title": {
"query": "mumbai",
"analyzer": "keyword"
}
}
}
}
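To see why the mapping matters for the options above, here is a toy model in Python. It is only a rough stand-in for how an analyzed field and a not_analyzed field index the same titles; the function names are hypothetical and this is not how Elasticsearch is implemented internally.

```python
def analyze_standard(text):
    """Rough stand-in for an analyzed field: lowercase, split on whitespace."""
    return text.lower().split()


def analyze_keyword(text):
    """Stand-in for not_analyzed: the whole value is one single term."""
    return [text]


def term_search(docs, analyzer, term):
    """Return the docs whose indexed terms contain `term` exactly.
    The search term itself is not analyzed, as with a term query."""
    return [doc for doc in docs if term in analyzer(doc)]


titles = ["mumbai", "mumbai ports", "financial capital mumbai"]

# Analyzed field: every title contains the token "mumbai".
analyzed_hits = term_search(titles, analyze_standard, "mumbai")

# not_analyzed field: only the title that is exactly "mumbai" matches.
keyword_hits = term_search(titles, analyze_keyword, "mumbai")

# Case matters for a term lookup on a not_analyzed field.
capitalized_hits = term_search(titles, analyze_keyword, "Mumbai")

print(analyzed_hits, keyword_hits, capitalized_hits)
```

In this model the term lookup on the analyzed field returns all three titles, which is the behaviour described in the question, while the not_analyzed variant returns only the exact title.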

Related

Elasticsearch exact search query

I'm using query string to search on documents in my index.
GET my_index/_search
{
"query": {
"bool": {
"must": [{
"query_string": {
"query": "table test",
"default_field": "table.name",
"default_operator":"AND"
}
}]
}
}
}
The problem is that it also returns every longer string that contains the search keywords. I want only the strings that are exactly the phrase.
For example, the documents table test 1, table test 12 and table test are in my index. When I search for table test, I want it to return only table test.
I also tried term, but it could not handle the space character between the words!
How can I handle this?
If your mapping was generated by Elasticsearch, then every text field has a corresponding .keyword field, so you can run a term query against it (note the .keyword suffix in the field name):
{
"query": {
"term": {
"table.name.keyword": {
"value": "table test",
"boost": 1.0
}
}
}
}
If you don't have a .keyword field, you have to create a keyword field and run the term query against it; term queries are meant for exact or keyword searches.
You can use the Match Phrase Query, as Amit suggested in another answer.
Also, if you want to stick with the Query String type of query, you can put your query in double quotes, as shown below:
GET my_index/_search
{
"query": {
"bool": {
"must": [{
"query_string": {
"query": "\"table test\"",
"default_field": "table.name",
"default_operator":"AND"
}
}]
}
}
}
Updated:
If you want an exact match on the entire field, you can go ahead with a term query in Elasticsearch:
{
"query": {
"term": {
"table.name.keyword": {
"value": "table test",
"boost": 1.0
}
}
}
}
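For the case where no .keyword field exists yet, the following is a hedged sketch of what an explicit mapping could look like, again as Python dicts that mirror the JSON bodies. Treating table as an object with a name property, and the typeless mapping syntax of recent Elasticsearch versions, are assumptions.

```python
import json

# Hypothetical explicit mapping: "table.name" stays full-text searchable,
# while "table.name.keyword" keeps the whole value as a single term.
mapping = {
    "mappings": {
        "properties": {
            "table": {
                "properties": {
                    "name": {
                        "type": "text",
                        "fields": {"keyword": {"type": "keyword"}},
                    }
                }
            }
        }
    }
}


def exact_value_query(field, value):
    """Build a term query against the keyword sub-field of `field`."""
    return {"query": {"term": {field + ".keyword": {"value": value}}}}


query = exact_value_query("table.name", "table test")
print(json.dumps(query, indent=2))
```

Because the keyword sub-field is not tokenized, the space in table test is part of the single indexed term, so table test 1 and table test 12 are different terms and do not match.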

Filter results from Elasticsearch if only a specific field matches

I'm using the following query for searching across multiple fields:
{
"query": {
"multi_match": {
"query": "italian sports car",
"fields": ["car_name", "car_brand", "car_description", "car_country"],
"type": "most_fields"
}
}
}
In this example, I'm looking for sports cars made in Italy (hence the car_country field). However, this will return all the cars made in Italy even if they are not sports cars. I want car_country to be just an auxiliary search field, so I don't want hits when the only matched field is car_country. Is this possible? I know I can set a lower score for that field, but I want hits with only this matching field to be completely ignored.
There are different ways to handle this problem, depending on the scoring etc. you require from your results. For instance:
Use a bool query with 2 parts:
Must query - include queries that must match for the document to be in the result set.
Should query - include queries that should match (and impact scoring) but do not decide whether a document is in the result set.
Add the multi-match query without the car_country field to the must query, and a match query on the car_country field to the should query.
{
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "italian sports car",
"fields": [
"car_name",
"car_brand",
"car_description"
],
"type": "most_fields"
}
}
],
"should": [
{
"match": {
"car_country": {
"query": "italian sports car"
}
}
}
]
}
}
}
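The effect of splitting the fields between must and should can be illustrated with a toy model in Python. The sample cars and the scoring (a plain count of matching fields) are made up for illustration and are far cruder than real Elasticsearch scoring.

```python
def tokens(text):
    """Lowercase whitespace tokenizer, a rough stand-in for analysis."""
    return set(text.lower().split())


def field_matches(doc, field, query):
    """True if the field shares at least one token with the query."""
    return bool(tokens(doc.get(field, "")) & tokens(query))


def search(docs, query, must_fields, should_fields):
    """must_fields decide inclusion; should_fields only add to the score."""
    hits = []
    for doc in docs:
        must_count = sum(field_matches(doc, f, query) for f in must_fields)
        if must_count == 0:
            continue  # no must field matched: the document is dropped
        should_count = sum(field_matches(doc, f, query) for f in should_fields)
        hits.append((must_count + should_count, doc["car_name"]))
    return sorted(hits, reverse=True)


# Hypothetical sample documents.
cars = [
    {"car_name": "Ferrari 488", "car_brand": "Ferrari",
     "car_description": "mid-engine sports car", "car_country": "italian"},
    {"car_name": "Fiat Panda", "car_brand": "Fiat",
     "car_description": "small city hatchback", "car_country": "italian"},
    {"car_name": "Porsche 911", "car_brand": "Porsche",
     "car_description": "rear-engine sports car", "car_country": "german"},
]

results = search(
    cars,
    "italian sports car",
    must_fields=["car_name", "car_brand", "car_description"],
    should_fields=["car_country"],
)
print(results)
```

The Fiat matches only on car_country, so it is excluded; the Ferrari ranks above the Porsche because the should clause adds to its score.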

increase score of query where all text match and not repeating words

I'm using the following query, but a field that repeats one of the typed words (and so contains only a subset of them) gets a higher score than a field that matches the entire sentence.
For example:
{
"query": {
"bool": {
"must": {
"multi_match": {
"query": "test in maths",
"fuzziness": "3",
"fields": [
"title"
],
"minimum_should_match": "75%",
"type": "most_fields"
}
}
}
}
}
The field value test test test
gets a higher score than the field value test in maths.
How can I get the higher score for the exact words match and not repeated words?
Thanks in Advance.
If you want to search exact sentences/phrases you should use the match_phrase query (https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query-phrase.html).
You can add a should-clause that contains the match-phrase query to boost the score of exact phrases to your current query.
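A minimal sketch of that should-clause, as a Python dict mirroring the JSON body. The boost value of 2.0 is an arbitrary assumption, and the fuzziness setting from the question is left out for brevity.

```python
import json


def with_phrase_boost(must_clause, field, phrase, boost):
    """Wrap an existing clause in a bool query and add a match_phrase
    should-clause that raises the score of exact phrase matches."""
    return {
        "query": {
            "bool": {
                "must": must_clause,
                "should": [
                    {"match_phrase": {field: {"query": phrase, "boost": boost}}}
                ],
            }
        }
    }


# The multi_match clause from the question (without fuzziness).
must_clause = {
    "multi_match": {
        "query": "test in maths",
        "fields": ["title"],
        "minimum_should_match": "75%",
        "type": "most_fields",
    }
}

query = with_phrase_boost(must_clause, "title", "test in maths", 2.0)
print(json.dumps(query, indent=2))
```

Documents still have to satisfy the must clause; the should clause only adds to the score of titles that contain the whole phrase in order.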
You can use the match_phrase query for an exact match. match_phrase matches only when the query terms occur in the same sequence as in the query provided.
e.g.
{
"query": {
"bool": {
"must": [{
"match_phrase": {
"title": "test in maths"
}
}]
}
}
}
Editing after comment:
Use
PUT my_index
{
"mappings": {
"properties": {
"title": {
"type": "text",
"index_options": "docs"
}
}
}
}
and then you can use a normal match query; Elasticsearch won't take repetitions of a word in the title field into account, because term frequencies are no longer indexed for it.

Elasticsearch query with fields giving subset of results

I am new to Elasticsearch. This is how my documents look:
_source :
{
"name": "this is my title",
"address": "1300 S Belmont Road",
"ID": 54000
}
When I run this query:
Query 1 :
"query": {
"filtered": {
"query": {
"query_string": {
"query": "*Belmont*",
"fields": ["name^5", "address^4","ID^3"]
}
},
"filter": {...}
}
}
I get 51 results
Query 2:
But this one gives 123 results:
"query": {
"filtered": {
"query": {
"query_string": {
"query": "*Belmont*"
}
},
"filter": {...}
}
}
Why do the queries give different results, even though I am running the query on all the fields in Query 1?
Mappings:
Address and Name are both string and "not_analyzed".
This is because of the way the _all field works. Your first query looks for *Belmont* in the specified fields, honoring each field's analyzer. Internally it is converted to a bool query and matched against each field individually.
Since address is not_analyzed, 1300 S Belmont Road is stored as is, whereas the _all field holds the space-delimited words with the standard analyzer applied, such as 1300, s, belmont, etc. From the docs:
The _all field is a special catch-all field which concatenates the
values of all of the other fields into one big string, using space as
a delimiter, which is then analyzed and indexed, but not stored.
So your second query operates on the _all field and gives you more results.
Also, your first query won't match "address" : "1300 S Belmont Road", because by default the wildcard term is lowercased, so it searches for belmont and won't find it. You can change this behavior with lowercase_expanded_terms, which is true by default. Try this:
"query": {
"filtered": {
"query": {
"query_string": {
"query": "*Belmont*",
"fields": ["name^5", "address^4","ID^3"],
"lowercase_expanded_terms" : false
}
},
"filter": {...}
}
}
You might get more results depending on how you have stored names and address.
Hope this helps!
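The interaction between lowercasing and a not_analyzed field can be modelled with a short Python sketch. It is a simplification: the whitespace split stands in for the standard analyzer, and fnmatchcase stands in for wildcard matching against indexed terms.

```python
from fnmatch import fnmatchcase


def wildcard_hits(indexed_terms, pattern, lowercase_expanded_terms=True):
    """Return the indexed terms matching `pattern` (case-sensitive),
    lowercasing the pattern first when lowercase_expanded_terms is on."""
    if lowercase_expanded_terms:
        pattern = pattern.lower()
    return [term for term in indexed_terms if fnmatchcase(term, pattern)]


# not_analyzed field: the whole value is one term, case preserved.
address_terms = ["1300 S Belmont Road"]

# Approximation of _all: lowercased, space-delimited tokens.
all_terms = "1300 S Belmont Road".lower().split()

default_on_address = wildcard_hits(address_terms, "*Belmont*")
default_on_all = wildcard_hits(all_terms, "*Belmont*")
no_lowercase_on_address = wildcard_hits(
    address_terms, "*Belmont*", lowercase_expanded_terms=False
)

print(default_on_address, default_on_all, no_lowercase_on_address)
```

With the default setting the pattern becomes *belmont*, which matches the lowercased token in the _all model but not the case-preserved address term; turning lowercasing off makes the address term match.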

Finding an exact phrase in multiple fields with Elasticsearch

I want to find an exact phrase (for instance, "the quick brown fox") across multiple fields in a document.
Right now, I'm using something like this:
{
"query": {
"filtered": {
"query": {
"multi_match": {
"fields": [
"subject",
"comments"
],
"query": "the quick brown fox"
}
},
"filters": {
"and": [
{
"term": {
"priority": "high"
}
}
...more ands
]
}
}
}
}
The question is, how can I do this correctly? Right now I'm getting the best match first, which tends to be the entire phrase, but I'm getting a load of almost-matches too.
If you are using an Elasticsearch cluster with version >= 1.1.0, you could set the type of your multi-match query to phrase:
...
"query": {
"multi_match": {
"fields": [
"subject",
"comments"
],
"query": "the quick brown fox",
"type": "phrase"
}
...
It will replace the match query generated for each field with a match_phrase one, which will return only the documents containing the full phrase (you can find details in the documentation).
How are you analyzing the subject/comments fields? If you want an exact match, you'll need to use the keyword tokenizer at both index and search time.
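What the phrase type changes can be sketched with a toy matcher in Python: a field matches only if it contains the query words consecutively and in order. The whitespace tokenizer and the sample documents are assumptions for illustration; real phrase matching works on analyzed token positions.

```python
def contains_phrase(text, phrase):
    """True if the words of `phrase` occur consecutively, in order, in `text`."""
    words = text.lower().split()
    target = phrase.lower().split()
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


def multi_field_phrase_match(doc, fields, phrase):
    """True if any of the given fields contains the full phrase."""
    return any(contains_phrase(doc.get(field, ""), phrase) for field in fields)


# Hypothetical sample documents.
full_phrase_doc = {"subject": "The quick brown fox jumps", "comments": "none"}
scrambled_doc = {"subject": "quick fox", "comments": "the brown fox was quick"}

fields = ["subject", "comments"]
full_match = multi_field_phrase_match(full_phrase_doc, fields, "the quick brown fox")
scrambled_match = multi_field_phrase_match(scrambled_doc, fields, "the quick brown fox")

print(full_match, scrambled_match)
```

The second document contains all four words but never as a consecutive run, so it is the kind of almost-match that the phrase type filters out.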

Resources