Elasticsearch query with fields giving subset of results - elasticsearch

I am new to Elasticsearch. This is what my document looks like:
_source :
{
"name": "this is my title",
"address": "1300 S Belmont Road",
"ID": 54000
}
When I run this query:
Query 1 :
"query": {
"filtered": {
"query": {
"query_string": {
"query": "*Belmont*",
"fields": ["name^5", "address^4","ID^3"]
}
},
"filter": {...}
}
}
I get 51 results
Query 2:
But this one gives 123 results :
"query": {
"filtered": {
"query": {
"query_string": {
"query": "*Belmont*"
}
},
"filter": {...}
}
}
Why do the two queries give different results, even though Query 1 runs on all the fields?
Mappings :
address and name are both of type string and "not_analyzed".

This is because of the way the _all field works. Your first query looks for *Belmont* in the specified fields, honoring each field's own analyzer. Internally it is converted to a bool query and matched against each field individually.
Since address is not_analyzed, 1300 S Belmont Road is indexed as is, as one single term, whereas the _all field holds the individual words produced by the standard analyzer, such as 1300, s, belmont, etc. From the docs:
The _all field is a special catch-all field which concatenates the
values of all of the other fields into one big string, using space as
a delimiter, which is then analyzed and indexed, but not stored.
So your second query operates on the _all field and gives you more results.
Also, your first query won't match "address" : "1300 S Belmont Road": by default the wildcard term is lowercased, so the query searches for belmont and does not find it in the not_analyzed value, which keeps its capital B. You can change this behavior with lowercase_expanded_terms, which is true by default. Try this:
"query": {
"filtered": {
"query": {
"query_string": {
"query": "*Belmont*",
"fields": ["name^5", "address^4","ID^3"],
"lowercase_expanded_terms" : false
}
},
"filter": {...}
}
}
You might get more results, depending on how you have stored names and addresses.
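The difference between the two queries can be reproduced with a small Python sketch. It is only a rough model, not Elasticsearch code: `standard_tokens` is a simplified stand-in for the standard analyzer, `wildcard_hits` is a hypothetical helper that matches a wildcard pattern against indexed terms, and the `lowercase_expanded_terms` argument mimics the option of the same name.

```python
import re
from fnmatch import fnmatchcase


def standard_tokens(text):
    # Simplified stand-in for the standard analyzer:
    # split on non-alphanumeric characters and lowercase each token.
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]


def wildcard_hits(pattern, terms, lowercase_expanded_terms=True):
    # Wildcards are matched against whole indexed terms, case-sensitively.
    if lowercase_expanded_terms:
        pattern = pattern.lower()
    return [t for t in terms if fnmatchcase(t, pattern)]


address = "1300 S Belmont Road"

# not_analyzed: the whole value is indexed as one term, case preserved.
not_analyzed_terms = [address]
# _all: the field values are concatenated and then analyzed.
all_field_terms = standard_tokens("this is my title " + address)

hits_default = wildcard_hits("*Belmont*", not_analyzed_terms)
hits_no_lowercase = wildcard_hits(
    "*Belmont*", not_analyzed_terms, lowercase_expanded_terms=False
)
hits_all = wildcard_hits("*Belmont*", all_field_terms)

print(hits_default)       # []
print(hits_no_lowercase)  # ['1300 S Belmont Road']
print(hits_all)           # ['belmont']
```

With the default lowercasing, the pattern becomes *belmont* and cannot match the case-preserved term, while the lowercased token in the _all model matches.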
Hope this helps!

Related

Elasticsearch exact search query

I'm using query string to search on documents in my index.
GET my_index/_search
{
"query": {
"bool": {
"must": [{
"query_string": {
"query": "table test",
"default_field": "table.name",
"default_operator":"AND"
}
}]
}
}
}
The problem is that it also returns longer strings that contain the search keywords. I want only the strings that are exactly the phrase.
For example, the documents table test 1, table test 12 and table test are in my index. When I search for table test, I want it to return just table test.
I also tried term, but it could not handle the space character between the words!
How can I handle this?
If your mapping is generated by Elasticsearch, then for every text field there will be a corresponding .keyword field, and hence (note the .keyword suffix in the field name):
{
"query": {
"term": {
"table.name.keyword": {
"value": "table test",
"boost": 1.0
}
}
}
}
If you don't have a .keyword field, then you have to create a keyword field and use a term query, which is meant for exact (keyword) searches.
You can use Match Phrase Query as Amit suggested in another answer.
Also, if you want to use only the query_string type of query, you can put your query in double quotes as shown below:
GET my_index/_search
{
"query": {
"bool": {
"must": [{
"query_string": {
"query": "\"table test\"",
"default_field": "table.name",
"default_operator":"AND"
}
}]
}
}
}
Updated:
If you want an exact match on the entire field, you can go ahead with a term query in Elasticsearch:
{
"query": {
"term": {
"table.name.keyword": {
"value": "table test",
"boost": 1.0
}
}
}
}
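To see why a quoted phrase and a term query on a keyword field behave differently, here is a rough Python model of the two matching rules. It is a sketch under simplifying assumptions: `tokens` splits on whitespace only, and `phrase_match` and `keyword_term_match` are hypothetical helpers, not Elasticsearch APIs.

```python
docs = ["table test 1", "table test 12", "table test"]


def tokens(text):
    # Simplified analysis: lowercase and split on whitespace.
    return text.lower().split()


def phrase_match(query, doc):
    # Phrase semantics: the query tokens must appear consecutively
    # somewhere in the document, but the document may contain more.
    q, d = tokens(query), tokens(doc)
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


def keyword_term_match(query, doc):
    # Keyword + term semantics: the whole field value is one term
    # and must equal the query exactly.
    return query == doc


phrase_hits = [d for d in docs if phrase_match("table test", d)]
exact_hits = [d for d in docs if keyword_term_match("table test", d)]

print(phrase_hits)  # ['table test 1', 'table test 12', 'table test']
print(exact_hits)   # ['table test']
```

In this model the phrase rule still returns all three documents, so only the exact comparison on the unanalyzed value gives the single result the question asks for.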

Filter results from Elasticsearch if only a specific field matches

I'm using the following query for searching across multiple fields:
{
"query": {
"multi_match": {
"query": "italian sports car",
"fields": ["car_name", "car_brand", "car_description", "car_country"],
"type": "most_fields"
}
}
}
In this example, I'm looking for sports cars made in Italy (hence the car_country field). However, this will return all the cars made in Italy even if they are not sports cars. I want car_country to be just an auxiliary search field, so I don't want hits when the only matched field is car_country. Is this possible? I know I can set a lower score for that field, but I want hits with only this matching field to be completely ignored.
There are different ways to handle this problem, depending on the scoring etc. you require from your results. For instance:
Use a bool query with 2 parts
Must query - include queries that must match for the document to be in the resultset
Should query - include queries that should match(and impact scoring) but do not decide if a document should or should not be in the result set.
Add the multi-match query without the car_country field in must query and a match query for car_country field in should query.
{
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "italian sports car",
"fields": [
"car_name",
"car_brand",
"car_description"
],
"type": "most_fields"
}
}
],
"should": [
{
"match": {
"car_country": {
"query": "italian sports car"
}
}
}
]
}
}
}
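The must/should split can be illustrated with a toy Python model. Everything here is an assumption made for illustration: the sample documents are made up, `field_matches` is a naive word lookup, and the score is a simple count of matching fields, not Elasticsearch's real relevance scoring.

```python
def field_matches(query, value):
    # True if any query word appears among the words of the field value.
    words = set(value.lower().split())
    return any(w in words for w in query.lower().split())


def bool_search(query, docs, must_fields, should_fields):
    hits = []
    for doc in docs:
        must_score = sum(field_matches(query, doc.get(f, "")) for f in must_fields)
        if must_score == 0:
            continue  # must clause did not match: document is excluded
        # should clause only adds to the score, it never excludes.
        should_score = sum(field_matches(query, doc.get(f, "")) for f in should_fields)
        hits.append((must_score + should_score, doc["car_name"]))
    return [name for _, name in sorted(hits, reverse=True)]


# Made-up sample documents.
docs = [
    {"car_name": "A", "car_brand": "BrandA",
     "car_description": "small city hatchback", "car_country": "italian"},
    {"car_name": "B", "car_brand": "BrandB",
     "car_description": "fast sports car", "car_country": "italian"},
    {"car_name": "C", "car_brand": "BrandC",
     "car_description": "classic sports car", "car_country": "german"},
]

results = bool_search(
    "italian sports car",
    docs,
    must_fields=["car_name", "car_brand", "car_description"],
    should_fields=["car_country"],
)
print(results)  # ['B', 'C']
```

Car A matches only on car_country and is dropped, while car B ranks above car C because the should clause adds to its score.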

Different relevance of fields in elasticsearch query

I have about 10 fields in my elastic index. I want to search over all these fields. So I set no fields parameter in my query:
GET /_search
{
"query": {
"query_string": {
"query": "this OR that"
}
}
}
Now I want to make the field "title" more relevant. I know that I can do this with:
"fields": ["title^5"]
My problem is that in this case I only search over the field "title", don't I?
Is there a way to search over all fields but make one of them more relevant?
What I suggest is to specify all the 10 fields you want to search on so you can boost specific ones, like this:
GET /_search
{
"query": {
"query_string": {
"query": "this OR that",
"fields": ["title^5", "field2", "field3", ...]
}
}
}
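If the field list is long, the fields array can be generated rather than typed by hand. The following is a minimal Python sketch; `boosted_fields` is a hypothetical helper, and field2 and field3 are the placeholder names from the answer above, standing in for your real field names.

```python
import json


def boosted_fields(fields, boosts):
    # Append "^boost" to the fields that have a boost, leave the rest as is.
    return [f"{name}^{boosts[name]}" if name in boosts else name for name in fields]


# Placeholder field names; replace them with the fields of your index.
all_fields = ["title", "field2", "field3"]

body = {
    "query": {
        "query_string": {
            "query": "this OR that",
            "fields": boosted_fields(all_fields, {"title": 5}),
        }
    }
}

print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the request body of the search, with every field still searched and only title boosted.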

exact query search in elasticsearch

I have this query that returns documents if the word "mumbai" appears anywhere in the title.
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"term": {
"title": "mumbai"
}
}
}
}
}
So the result contains...
mumbai
mumbai ports
financial capital mumbai
I need to return only the document whose title is exactly "mumbai", and not the other documents where the word mumbai appears together with other words. Only the first result is correct. How do I discard the other results?
update
This query is working as expected and it lists the sort value 58 (random value) if the match is exact.
curl -XPOST "localhost:9200/enwiki_content/page/_search?pretty" -d'
{
"fields": "title",
"query": {
"match": {"title": "Mumbai"}
},
"sort": {
"_script": {
"script": "_source.title == \"Mumbai\" ? \"58\": \"78\";",
"type": "string"
}
}
}'
I need to return the title where match is exact Mumbai (and hence the sort value 58). How do I filter or add the script to "fields" parameter?
To make mumbai match only documents whose title contains mumbai and nothing else, you'll have to store a token count field for the field you are searching on.
This token count field will contain the number of tokens the field contains. Using this field, you can match mumbai on your title field, and match the token_count field against the number of tokens in mumbai (which is one).
Note that the token_count field in the other documents will be more than 1.
For reference:
https://www.elastic.co/guide/en/elasticsearch/reference/current/token-count.html
Note: If you are using stopwords, then you need to know about the other caveats related to token count. You can find the information in the above link.
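The token count idea can be sketched in Python as follows. This is a simplified model, not the real indexing pipeline: `analyze` splits on whitespace only, and the sub-field name title.length is an assumption chosen for illustration. The `query_body` dict shows what a matching request could look like, assuming the mapping defines a token_count sub-field with that name.

```python
def analyze(text):
    # Simplified analysis: lowercase and split on whitespace.
    return text.lower().split()


titles = ["mumbai", "mumbai ports", "financial capital mumbai"]

# At index time, store the number of tokens next to each title.
# "title.length" is a hypothetical name for the token_count sub-field.
index = [{"title": t, "title.length": len(analyze(t))} for t in titles]


def exact_single_match(query, docs):
    # Keep documents that contain every query token AND whose
    # token count equals the number of tokens in the query.
    q = analyze(query)
    return [
        d["title"]
        for d in docs
        if all(tok in analyze(d["title"]) for tok in q)
        and d["title.length"] == len(q)
    ]


exact_titles = exact_single_match("mumbai", index)
print(exact_titles)  # ['mumbai']

# A request body expressing the same two conditions.
query_body = {
    "query": {
        "bool": {
            "must": [
                {"match": {"title": "mumbai"}},
                {"term": {"title.length": 1}},
            ]
        }
    }
}
```

Both conditions are needed: the match clause alone returns all three titles, and the count alone would match any one-word title.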
Try the term query. It does an exact match search:
{
"query": {
"bool": {
"must": [
{
"term": {
"title": "mumbai"
}
}
]
}
}
}
A term query does not treat Mumbai and mumbai as the same; they are counted as different terms.
Second option:
If you can change the mapping, then you can set the title field as not_analyzed.
Third option:
A match query with the analyzer option:
{
"query": {
"match": {
"title": {
"query": "mumbai",
"analyzer": "keyword"
}
}
}
}

AND between tokens in elasticsearch

When I search for documents with this query (the field is indexed with the Standard analyzer):
"query": {
"match": {
"Book": "OG/44"
}
}
I get the terms 'OG' and '44', and the result set contains documents that have either of these terms. What analyzer/tokenizer should I use to get only the results where both terms are present?
You can set operator in the match query (by default it is or):
"query": {
"match": {
"Book": {
"query": "OG/44",
"operator" : "and"
}
}
}
You get two tokens because the standard analyzer splits the value at the slash. If you do not want this behaviour, the field needs an analyzer that keeps OG/44 as a single token, for example whitespace or keyword.
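The effect of the operator setting can be modelled in a few lines of Python. This is only a sketch: `standard_tokens` is a simplified stand-in for the standard analyzer, `match` is a hypothetical helper, and the book codes other than OG/44 are made-up sample values.

```python
import re


def standard_tokens(text):
    # Simplified stand-in for the standard analyzer:
    # split on non-alphanumeric characters (including "/") and lowercase.
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]


def match(query, doc, operator="or"):
    q, d = set(standard_tokens(query)), set(standard_tokens(doc))
    if operator == "and":
        return q <= d      # every query token must be present
    return bool(q & d)     # at least one query token must be present


# Made-up sample values for the Book field.
books = ["OG/44", "OG/17", "XX/44"]

or_hits = [b for b in books if match("OG/44", b)]
and_hits = [b for b in books if match("OG/44", b, operator="and")]

print(standard_tokens("OG/44"))  # ['og', '44']
print(or_hits)                   # ['OG/44', 'OG/17', 'XX/44']
print(and_hits)                  # ['OG/44']
```

With or, sharing either token is enough to match, whereas and keeps only the value that contains both.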
