increase score of query where all text match and not repeating words

increase score of query where all text match and not repeating words - elasticsearch

I'm using the following query but it gets higher score for words which are repeated and is a subset of the words typed but not the entire sentence match.
For Eg:
{
"query": {
"bool": {
"must": {
"multi_match": {
"query": "test in maths",
"fuzziness": "3",
"fields": [
"title"
],
"minimum_should_match": "75%",
"type": "most_fields"
}
}
}
}
}
If the field value contains : test test test
has higher score than the field value : test in maths
How can I get the higher score for the exact words match and not repeated words?
Thanks in Advance.

If you want to search exact sentences/phrases you should use the match_phrase query (https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query-phrase.html).
You can add a should-clause that contains the match-phrase query to boost the score of exact phrases to your current query.

you can use match_phrase query for an exact match. match_phrase matches for exact occurrence in the sequence of the query provided.
e.g
{
'query': {
'bool': {
'must': [{
'match_phrase': {
'title': 'test in maths'
}
}]
}
}
}
Editing after comment:
Use
PUT my_index
{
"mappings": {
"properties": {
"title": {
"type": "text",
"index_options": "docs"
}
}
}
}
and then you can use normal match type query, the elastisearch won't consider repetition of the words in the index for the title field.

Related

Elasticsearch match string to field with fuzziness

I'm trying to match a string to a field and only want to apply fuzziness.
For example, with these documents:
{ title: "replace oilfilter" }, { title: "replace motoroil" }
The following queries should match only the first document:
"Replace oilfilter", "Replace oilsfilter", "Replaze oilfilter"
The following queries should NOT match any document:
"replace", "oilfilter", "motoroil"
What I got so far is the following:
index
I'm using the keyword analyzer so it sees the (potential) phrase as a single word, this way it does not match a document when searching for "replace" but it does find a document when searching for the exact term "Replace oilfilter".
"mappings": {
"blacklist": {
"properties": {
"title": {
"type": "text",
"analyzer": "keyword"
}
}
}
}
search
I've tried multiple queries to search the documents. I got close with the following query:
"query": {
"query_string": {
"default_field": "title",
"fuzziness": "3",
"query": query
}
}
results
With this query the following are the results:
> "Replace oilfilter" (exact words)
< doc: { title: "replace oilfilter" }, score: 0.5753..
< doc: { title: "replace motoroil" }, score: 0.2876..
> "Replace iolfilter" (typo)
< doc: { title: "replace oilfilter" }, score: 0.2876..
> "oilfilter" (other term)
< doc: { title: "replace oilfilter" }, score: 0.2876..
problem
The results aren't that bad, but I need the scores to be more accurate. The second query with only the simple typo should get a much higher score than the second result in the first query and the only result in the third query.
What I'm trying to achieve is that it matches the whole query against the whole field in the document, that's why I'm using keyword analyzer. On top of that I only want to apply some fuzziness.
Hope someone can shed some light on this issue.
Thanks!

The following search should achieve what you want:
{
"query": {
"bool": {
"must": {
"multi_match": {
"query": "replace oilfliter",
"fuzziness": "3",
"fields": [
"title"
],
"minimum_should_match": "75%",
"type": "most_fields"
}
}
}
}
}
You can increase the minimum_should_match to 100% if you want require a match on all the query terms no matter how long the query string is.

multi_match fuzzy query across multiple fields

I am working to match a 'term' to multi fields (or _all field)
I want to do a fuzzy match on cross_fields but it is not supported.
any ideas how to do it or any other ways to do it ?
query: {
multi_match: {
query: term,
type: "cross_fields",
fields: ['_all']
}
}
when trying the solution here
ElasticSearch multi_match query over multiple fields with Fuzziness
I get this error
[parsing_exception] Fuziness not allowed for type [cross_fields], with
{ line=1 & col=128 }
elasticsearch version 5.0
edit:
here is the query I am building
bool: {
must: [
{
fuzzy: {
_all: term
}
},
{
fuzzy: {
"location.country": country
}
},
{
fuzzy: {
"location.city": city
}
}
]
}

cross_fields works by searching the term on your multiple fields. Since fuzziness isn't supported for cross_fields you have to write the query in a different way.
One possible is: implement your own "cross_fields" with shoulds and add there the fuzziness.
Say your term is: "term1 term2", you can split by word boundary (Regex \b) then should them in this form:
{
{
"query": {
"bool": {
"should": [{
"match": {
"field1": "term",
"fuzziness": 1
}
},{
"match": {
"field1": "term",
"fuzziness": 1
}
},{
"match": {
"field2": "term1",
"fuzziness": 1
}
},{
"match": {
"field2": "term12",
"fuzziness": 1
}
}
]
}
}
}
}
This is probably less the optimal if you have many fields, the query will become a cartesian product of the terms and fields.
Important note You're using _all field which is one field. which all other fields are indexed into. Maybe you don't even need cross_fields?

Elasticsearch with AND query in DSL

this drives me crazy. I have no clue why this elastic search do not return me value.
I put values with this:
PUT /customer/person-test/1?pretty
{
"name": "John Doe",
"personId": 153,
"houseHoldId": 6191136,
"quarter": "2016_Q1"
}
PUT /customer/person-test/2?pretty
{
"name": "John Doe",
"personId": 153,
"houseHoldId": 6191136,
"quarter": "2016_Q2"
}
and when I query like this, it do not returns me value:
GET /customer/person-test/_search
{
"query": {
"bool": {
"must" : [
{
"term": {
"name": "John Doe"
}
},
{
"term": {
"quarter": "2016_Q1"
}
}
]
}
}
}
this query i copied from A simple AND query with Elasticsearch
I just want to get the person with "John Doe" AND "2016_Q1", why this did not work?

You should use match instead of term :
GET /customer/person-test/_search
{
"query": {
"bool": {
"must" : [
{
"match": {
"name": "John Doe"
}
},
{
"match": {
"quarter": "2016_Q1"
}
}
]
}
}
}
Explanation
Why doesn’t the term query match my document ?
String fields can be of type text (treated as full text, like the body
of an email), or keyword (treated as exact values, like an email
address or a zip code). Exact values (like numbers, dates, and
keywords) have the exact value specified in the field added to the
inverted index in order to make them searchable.
However, text fields are analyzed. This means that their values are
first passed through an analyzer to produce a list of terms, which are
then added to the inverted index.
There are many ways to analyze text: the default standard analyzer
drops most punctuation, breaks up text into individual words, and
lower cases them. For instance, the standard analyzer would turn the
string “Quick Brown Fox!” into the terms [quick, brown, fox].
This analysis process makes it possible to search for individual words
within a big block of full text.
The term query looks for the exact term in the field’s inverted
index — it doesn’t know anything about the field’s analyzer. This
makes it useful for looking up values in keyword fields, or in numeric
or date fields. When querying full text fields, use the match query
instead, which understands how the field has been analyzed.
...

its not working because of u r using default standard analyzer link for 'name' and 'quarter' .
You have two more options :-
1)change mapping :-
"name": {
"type": "string",
"index": "not_analyzed"
},
"quarter": {
"type": "string",
"index": "not_analyzed"
}
2)try this , lowercase your value since by default standard analyzer use Lower Case Token Filter :-
{
"query": {
"bool": {
"must" : [
{
"term": {
"name": "john_doe"
}
},
{
"term": {
"quarter": "2016_q1"
}
}
]
}
}
}

exact query search in elasticsearch

I have this query that returns if the word "mumbai" appear anywhere in the title.
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"term": {
"title": "mumbai"
}
}
}
}
}
So the result contains...
mumbai
mumbai ports
financial capital mumbai
I need to return only "mumbai" term and not the other documents where mumbai word is associated with other phrases. Only the first result is correct. How do I discard other results?
update
This query is working as expected and it lists the sort value 58 (random value) if the match is exact.
curl -XPOST "localhost:9200/enwiki_content/page/_search?pretty" -d'
{
"fields": "title",
"query": {
"match": {"title": "Mumbai"}
},
"sort": {
"_script": {
"script": "_source.title == \"Mumbai\" ? \"58\": \"78\";",
"type": "string"
}
}
}'
I need to return the title where match is exact Mumbai (and hence the sort value 58). How do I filter or add the script to "fields" parameter?

To get mumbai to match with doc which contains only mumbai and nothing else, you'll have to store a token count field for the field you are searching on.
This token count field will contain the number of tokens the field contains. Using this field, you can match mumbai on your title field, and match token_count field with the number of tokens in mumbai (which is one).
Note that token_count field in other documents will more than 1.
For reference:
https://www.elastic.co/guide/en/elasticsearch/reference/current/token-count.html
Note: If you are using stopwords, then you need to know about the other caveats related to token count. You can find the information in the above link.

Try the term query. It will do exact match search
{
"query": {
"bool": {
"must": [
{
"term": {
"title": "mumbai"
}
}
]
}
}
}
Term query will not match Mumbai and mumbai, it will be counted as different words
Second Option:
If you can change the mapping then you can set the title field as not_analyzed
Third Option
match query with analyzer option
{
"query": {
"match": {
"title": {
"query": "mumbai",
"analyzer": "keyword"
}
}
}
}

elasticsearch scoring unique terms vs ngram terms

i've figured out how to return results on a partial word result using ngrams. but now i'd like to arrange (score or sort) my results based on the term first and then a partial term.
for example, the user searches a movie db for 'we'. i want 'we are marshall' and similar to show up at the top, and not 'north by northwest'. (the 'we' is in 'northwest').
currently this is my mapping for this title field:
"title": {
"type": "string",
"analyzer": "ngramAnalyer",
"fields": {
"term": {
"type": "string",
"analyzer": "fullTermCaseInsensitive"
},
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
i've created a multifield where ngramAnalyzer is a custom ngram, term is using a keyword tokenizer with a standard filter, and raw is not_indexed.
my query is as follows:
"query": {
"function_score": {
"functions": [
{
"script_score": {
"script": "_score * (1+ (1 / doc['salesrank'].value) )"
}
}
],
"query": {
"bool": {
"must": [
{
"match_phrase": {
"title": {
"query": "we",
"max_expansions": 10
}
}
}
],
"should":{
"term" : {
"title.term" : {
"value" : "we",
"boost" : 10
}
}
}
}
}
}
i'm basically requiring that the ngram must be matched, and the term 'we' should be matched, and if so, boost it.
this isn't working of course.
any ideas?
edit
to add further complexity ... how would i match first on exact title, then on a custom score?
i've taken some stabs at it, but doesn't seem to work.
for example:
input: 'game'
results should be ordered by exact match 'game'
followed by a custom score based on a sales rank (integer)
so that the next results after 'game' might be something like 'hunger games'

what about bool combination of boosting query, where first match about full term with 10x boost factor, and another matches against ngram term with standard boost factor?

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

increase score of query where all text match and not repeating words - elasticsearch

Related

Elasticsearch match string to field with fuzziness

multi_match fuzzy query across multiple fields

Elasticsearch with AND query in DSL

exact query search in elasticsearch

elasticsearch scoring unique terms vs ngram terms

Categories

Resources