Elasticsearch queries for n-grams - elasticsearch

I am trying to implement search for the following case. For example, I have these documents:
1) "There are a lot of diesel cars in the city"
2) "Cars have diesel engines"
3) "Bob sold a diesel car"
and I want to find doc 1 and doc 3.
If I write this query:
{
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "should": [
            {
              "query_string": {
                "fields": ["text"],
                "query": "\"diesel car\"~1^5"
              }
            }
          ]
        }
      }
    }
  }
}
I will find doc 1 but not doc 3.
Is it possible that this query will also match doc 3 if I use an n-gram analyzer?
Or maybe there are other solutions?
Proximity search works only for exact phrases: if even one character in a word changes, it no longer matches. Does ES have another solution for that?

I found the solution:
1) Add an English stemmer to the index settings and mapping.
2) Use a simple query like:
(diesel AND car)^5
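For reference, a sketch of what that solution could look like as index settings plus the query, built as plain dicts. The analyzer/filter names and the `text` field are my own assumptions, and the mapping shape shown is the ES 7+ (typeless) form:

```python
# Sketch: English-stemmer analysis chain plus the boosted boolean query.
# Names ("english_stem", "english_stemmer") are illustrative, not from the post.
settings = {
    "settings": {
        "analysis": {
            "filter": {
                "english_stemmer": {"type": "stemmer", "language": "english"}
            },
            "analyzer": {
                "english_stem": {
                    "tokenizer": "standard",
                    "filter": ["lowercase", "english_stemmer"],
                }
            },
        }
    },
    "mappings": {
        "properties": {"text": {"type": "text", "analyzer": "english_stem"}}
    },
}

# With stemming, "car" and "cars" reduce to the same token, so the simple
# boolean query matches docs 1 and 3 without needing an exact phrase.
query = {
    "query": {
        "query_string": {
            "fields": ["text"],
            "query": "(diesel AND car)^5",
        }
    }
}
```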

Related

How to boost individual words in an Elasticsearch match query

Suppose I want to query "Best holiday places to visit during summer" in an Elasticsearch cluster, but I want holiday, visit, and summer to have higher priority than the other words, something like this: Best holiday^4 places to visit^3 during summer^2.
I know about field boosting, but what I want is not achievable with a field boost; basically I want to boost individual words.
Does anyone have an idea how to do this in Elasticsearch 5.6 and above?
You could use query_string to boost individual terms like this:
{
  "query": {
    "query_string": {
      "fields": ["content", "name"],
      "query": "Best holiday^4 places to visit^3 during summer^2"
    }
  }
}
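If the per-word boosts come from data rather than being hard-coded, the query string can be assembled programmatically before being dropped into `query_string`. A small sketch (the helper name and the word-to-boost map are illustrative):

```python
def boost_terms(text: str, boosts: dict) -> str:
    """Append ^N to each whitespace-separated token that has a boost entry."""
    out = []
    for word in text.split():
        weight = boosts.get(word.lower())
        out.append(f"{word}^{weight}" if weight else word)
    return " ".join(out)

q = boost_terms(
    "Best holiday places to visit during summer",
    {"holiday": 4, "visit": 3, "summer": 2},
)
# q == "Best holiday^4 places to visit^3 during summer^2"
```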

Elasticsearch multi_match: Is there a way to search all fields except one?

We have an Elasticsearch query that specifies fields in a multi_match like this:
"multi_match": {
  "query": "find this string",
  "fields": ["*_id^20", "*_name^20", "*"]
}
This works great, except under certain circumstances, such as when the query is "Find NOWAK". This is because "NOW" is a reserved word for date math, and the "*" pattern matches fields that are mapped as dates.
So what I would like to do is ignore fields that match "*_at".
Is there a way to tell Elasticsearch to ignore certain fields in a multi_match query?
If the answer to that is "no", the follow-up question is how to escape the search term so that it won't trigger keywords.
Running version 6.7.
Try this (from "Exclude a field in an Elasticsearch query"); note the explicit fields list added to query_string:
curl -XGET 'localhost:9200/testidx/items/_search?pretty=true' -H 'Content-Type: application/json' -d '{
  "query": {
    "query_string": {
      "fields": ["title", "field2", "field3"],
      "query": "Titulo"
    }
  },
  "_source": {
    "excludes": ["*.body"]
  }
}'
Apparently the answer is "no": there is no way to tell Elasticsearch to ignore certain fields in a multi_match query.
For my particular issue I found an inexpensive way to find the necessary white-listed fields (this is done outside of Elasticsearch, otherwise I would post it here) and to list those in place of "*" when building the query.
I am hopeful someone will tell me I'm wrong, but I don't think I am.
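On the follow-up question about escaping: the query_string documentation lists the reserved characters, and notes that < and > cannot be escaped at all, only removed. A client-side escaper could be sketched like this (the helper name is my own; note that backslash-escaping does not stop reserved words like NOW being parsed as date math against date fields, so whitelisting fields, as above, or quoting the whole phrase is still needed for that case):

```python
# Sketch: escape query_string reserved characters in user input.
# Reserved set per the query_string docs; && and || are handled separately
# after the per-character pass so their added backslashes are not re-escaped.
RESERVED = '+-=!(){}[]^"~*?:\\/'

def escape_query_string(text: str) -> str:
    out = []
    for ch in text:
        if ch in RESERVED:
            out.append("\\" + ch)
        elif ch in "<>":
            out.append(" ")  # < and > cannot be escaped, only removed
        else:
            out.append(ch)
    return "".join(out).replace("&&", r"\&&").replace("||", r"\||")
```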

Elasticsearch multi_match query

I'm trying to create an Elasticsearch query that searches multiple fields. This works fine so far; however, I would like to refine it.
Let's say the word "test" was indexed. When I search for "tes", Elasticsearch does not find that word, but I would like it to be returned already; combining that with my existing query is the challenge.
{
  "multi_match": {
    "query": "*" + query + "*",
    "type": "cross_fields",
    "operator": "and",
    "fields": ["article.number^1", "article.name_de^1", "article.name_en^5", "article.name_fr^5", "article.description^1"],
    "tie_breaker": 0
  }
}
Depending on your constraints, here are your options.
If you wish to use a wildcard before/after your search term, you can use a wildcard query. This has a high processing cost at query time.
If you are fine with the additional storage cost, you can opt to tokenize your input at analysis time; see the ngram tokenizer. Beware that with long strings the storage requirement can quickly explode.
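As a sketch of the second option, an ngram tokenizer configured at index time might look like this. The gram sizes, tokenizer/analyzer names, and the single mapped field are illustrative, and the mapping shape is the ES 7+ form:

```python
# Sketch: trigram analysis so a query for "tes" matches the indexed "test".
# Larger max_gram values (or min_gram=1) widen matching but inflate the index.
ngram_settings = {
    "settings": {
        "analysis": {
            "tokenizer": {
                "trigram_tok": {"type": "ngram", "min_gram": 3, "max_gram": 3}
            },
            "analyzer": {
                "trigram": {"tokenizer": "trigram_tok", "filter": ["lowercase"]}
            },
        }
    },
    "mappings": {
        "properties": {
            "article": {
                "properties": {
                    "name_de": {"type": "text", "analyzer": "trigram"}
                }
            }
        }
    },
}
```

At search time a plain match query on `article.name_de` then works without wildcards, because both the stored text and the query are broken into the same trigrams.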

Matches on different words should score higher than multiple matches on one word in Elasticsearch

In our Elasticsearch index we have some persons, where each person can have multiple taggings.
Take for example two persons (fullname - (taggings)):
Bart Newman - (bart,engineer,ceo)
Bart Holland - (developer,employer)
Our search query:
{
  "multi_match": {
    "type": "most_fields",
    "query": "bart developer",
    "operator": "or",
    "boost": 5,
    "fields": [
      "fullname^5",
      "taggings.tag.name^5"
    ],
    "fuzziness": 0
  }
}
Let's say we are searching for "bart developer". We would expect Bart Holland to come before Bart Newman, but because Bart Newman has bart in his fullname and bart as a tag, he scores higher than Bart Holland does.
Is there a way to configure it so that matches on different words (bart, developer) score higher than multiple matches on one word (bart)?
I already tried the and-operator, without success.
Thanks!
This is expected with the most_fields query: it is field-centric rather than term-centric. From the docs:
most_fields being field-centric rather than term-centric: it looks for the most matching fields, when really what we're interested in is the most matching terms.
Another likely problem in your case is inverse document frequency (IDF). I guess only a few documents have the tag bart, which is why its IDF is very high and it therefore gets a higher score.
As described in the links above, you should look at how documents are scored with validate and explain.
There are a couple of ways to solve this issue:
1) You can use a custom _all field, i.e. copy both the full name and the tag information to a new field with the copy_to parameter and then query on it, but you have to reindex your data for that.
2) I think the better solution is to use cross_fields, which takes a term-centric approach. From the docs:
The cross_fields type first analyzes the query string to produce a list of terms, and then it searches for each term in any field.
It also solves the IDF issue by blending it across all fields.
This should solve your issue:
{
  "query": {
    "multi_match": {
      "type": "cross_fields",
      "query": "bart developer",
      "operator": "or",
      "fields": [
        "fullname",
        "taggings.tag.name"
      ]
    }
  }
}
Hope this helps!
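For completeness, option 1 (the custom _all-style field via copy_to) could be sketched like this. The field names mirror the question; the target name `search_all` is my own choice:

```python
# Sketch: copy fullname and tag names into one combined field at index time,
# then query that single field, which is term-centric by construction.
mapping = {
    "properties": {
        "fullname": {"type": "text", "copy_to": "search_all"},
        "taggings": {
            "properties": {
                "tag": {
                    "properties": {
                        "name": {"type": "text", "copy_to": "search_all"}
                    }
                }
            }
        },
        "search_all": {"type": "text"},
    }
}

# A match on the combined field scores "bart developer" across both sources.
query = {"query": {"match": {"search_all": {"query": "bart developer"}}}}
```

Note that copy_to requires reindexing existing documents, as the answer says, which is why cross_fields is usually the cheaper fix.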

Elasticsearch prefer exact match over partial matches when doing typeahead searches

I have configured ES to do autocomplete, and I can also get the exact match preferred over the suggestions.
For example, if someone types London, the API returns London first, then Londonderry. But if someone types Londo, ES returns Londonderry first, then London. Surely London is a closer match than Londonderry.
The same thing happens with "New York" and York: "New York" is preferred over York when I search for York.
I am using the solution provided here:
Favor exact matches over nGram in elasticsearch
This code was helpful for me:
{
  "query": {
    "match": {
      "message": {
        "query": inputQuery,
        "fuzziness": "AUTO",
        "prefix_length": 2
      }
    }
  }
}
First of all, you should use fuzziness - see the ES documentation.
I hope it helps you too.
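Beyond fuzziness, a common way to get the exact match ranked first in a typeahead is to combine the ngram/autocomplete field with a boosted exact clause in a bool query. A sketch, assuming a keyword sub-field `name.raw` exists; all field names and the boost value are illustrative:

```python
# Sketch: should-clauses so that exact hits ("London" for input "London")
# outrank partial ngram hits ("Londonderry"), while partial input ("Londo")
# still matches via the analyzed field.
def typeahead_query(text: str) -> dict:
    return {
        "query": {
            "bool": {
                "should": [
                    {"match": {"name": text}},  # ngram/edge-ngram analyzed field
                    {
                        "term": {  # exact keyword match, strongly boosted
                            "name.raw": {"value": text, "boost": 10}
                        }
                    },
                ]
            }
        }
    }
```

With this shape, "London" matches both clauses and gets the boost, while "Londonderry" matches only the first, so the shorter exact hit sorts first.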
