Elasticsearch rescore all results ignoring base score

I'm trying to rescore my results with the following query:
```
POST /archive/item/_search
{
  "query": {
    "multi_match": {
      "fields": ["title", "description"],
      "query": "1 złoty",
      "operator": "and"
    }
  },
  "rescore": {
    "window_size": 50,
    "query": {
      "rescore_query": {
        "multi_match": {
          "type": "phrase",
          "fields": ["title", "description"],
          "query": "1 złoty",
          "slop": 10
        }
      },
      "query_weight": 0,
      "rescore_query_weight": 1
    }
  }
}
```
I'm doing this mainly because I want to score by proximity.
Also, I want to ignore the impact of the source fields' length on the score.
Am I doing this right? If not, what's the best practice here?
And the second question: why is window_size needed at all?
I don't want only the top results.
The main query acts like a filter, so all the results it returns are relevant.
I guess something like "window_size": "all" would be perfect, but I couldn't find anything like it in the docs.

To answer your second question: window_size is needed because rescoring is designed for top results only. It's basically a cost issue; the assumption is that the secondary algorithm is more expensive, so it was designed to run only on the top results. There's more discussion about this here:
https://github.com/elasticsearch/elasticsearch/issues/2640
and here:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-rescore.html
Personally, I think the "all" option is a great idea; maybe you should open an issue on GitHub?
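For the other part of the question, ignoring the effect of field length: in Lucene-based scoring, field length enters the score through norms, which can be switched off per field in the mapping. A minimal sketch, assuming Elasticsearch 7.x or later (typeless mappings) and that the `archive` index is being created from scratch; send it as the body of `PUT /archive`:

```json
{
  "mappings": {
    "properties": {
      "title": { "type": "text", "norms": false },
      "description": { "type": "text", "norms": false }
    }
  }
}
```

With norms disabled, a term's contribution no longer depends on how long the field is; term frequency and IDF still apply. If you disable norms on an existing index instead of creating a new one, scores may be inconsistent until the old data is reindexed.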

If you want to score all results returned by some other filter using a proximity match, this should do:
```json
{
  "query": {
    "filtered": {
      "query": {
        "multi_match": {
          "type": "phrase",
          "fields": ["title", "description"],
          "query": "1 złoty",
          "slop": 10
        }
      },
      "filter": {
        "query": {
          "multi_match": {
            "fields": ["title", "description"],
            "query": "1 złoty",
            "operator": "and"
          }
        }
      }
    }
  }
}
```
According to this, the filter is run before the query, so performance shouldn't be bad either. What's more, you don't score twice, because filters don't calculate scores. Another advantage is that filters can be cached, which should speed things up significantly.
Keep in mind that I only ran short tests, mostly focusing on syntax rather than results. You might want to double-check it.
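The `filtered` query was deprecated in Elasticsearch 2.0 and removed in 5.0. Note also that in the query above the phrase query is itself required, so documents that pass the filter but have the terms farther apart than the slop are dropped. If the goal is to keep every document returned by the `and` match and rank them by proximity, one option on newer versions is a `bool` query with the `and` match as a non-scoring `filter` and the sloppy phrase as an optional `should` clause. A sketch, assuming a 7.x+ typeless index, for `POST /archive/_search`:

```json
{
  "query": {
    "bool": {
      "filter": {
        "multi_match": {
          "fields": ["title", "description"],
          "query": "1 złoty",
          "operator": "and"
        }
      },
      "should": {
        "multi_match": {
          "type": "phrase",
          "fields": ["title", "description"],
          "query": "1 złoty",
          "slop": 10
        }
      }
    }
  }
}
```

Because a `filter` clause is present, the `should` clause is optional (minimum_should_match defaults to 0), so documents without a close enough phrase stay in the results with a score of 0, ranked after the phrase matches.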

Related

ElasticSearch: obtaining individual scores from each query inside of a bool query

Assume I have a compound bool query with various "must" and "should" clauses, each of which may include different leaf queries, including "multi_match" and "match_phrase" queries, as shown below.
How can I get the score of each individual query packed into a single query?
I know one way would be to break it down into multiple queries, execute each one, and then aggregate the results at the code level (not the query level). However, I suppose that is less efficient, and I would lose sorting, pagination, and other Elasticsearch features.
I think the Explain API is also not useful for me, since it provides very low-level scoring details (inefficient and hard to parse), while I just need the score of each specific leaf query (which I have already named).
If I'm wrong on any terminology (e.g. compound, leaf), please correct me. The big picture is how to obtain individual scores from each sub-query inside a bool query.
PS: I came across "Different score functions in bool query". However, it does not return the scores. Even if I wrap my queries in "function_score", I want the scoring to stay the default while still getting the individual scores in the response.
Please see the snippet below:
```json
{
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "query": "...",
            "fields": [
              "field1^3",
              "field2^5"
            ],
            "_name": "must1_mm",
            "boost": 3
          }
        }
      ],
      "should": [
        {
          "multi_match": {
            "query": "...",
            "fields": [
              "field3^2",
              "field4^5"
            ],
            "_name": "should1_mm",
            "boost": 2
          }
        },
        {
          "match_phrase": {
            "field5": {
              "_name": "phrase1",
              "boost": 1.5,
              "query": "..."
            }
          }
        },
        {
          "match_phrase": {
            "field6": {
              "_name": "phrase2",
              "boost": 1,
              "query": "..."
            }
          }
        }
      ]
    }
  }
}
```
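Named queries (`_name`) already make each hit list the names of the clauses it matched in `matched_queries`. If I recall correctly, Elasticsearch 8.8 added an `include_named_queries_score` search parameter that also returns each named clause's score, turning `matched_queries` into a name-to-score map; treat the version and the response shape as assumptions to check against the docs for your release. A trimmed sketch, with a hypothetical index `my-index`, sent as `GET /my-index/_search?include_named_queries_score=true`:

```json
{
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "query": "...",
            "fields": ["field1^3", "field2^5"],
            "_name": "must1_mm",
            "boost": 3
          }
        }
      ],
      "should": [
        {
          "match_phrase": {
            "field5": {
              "query": "...",
              "_name": "phrase1",
              "boost": 1.5
            }
          }
        }
      ]
    }
  }
}
```

The top-level `_score` is computed as usual; the parameter only adds the per-clause scores (keyed by names such as `must1_mm`) to each hit, which fits the requirement of keeping the default scoring.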

How to add fuzziness to search query in elasticsearch?

I'm trying to implement fuzziness on a particular field in a cross-fields query. It's a bit difficult though.
So the query should:
Match phrases across fields.
Match an exact match against partNumber and barcode (no fuzziness)
Match fuzzy terms against title and subtitle.
The query I have so far is below; note that the fuzziness isn't working at all in it yet.
This should match one result, with "Amazing t-Shirt" in the title and "Blue" in the subtitle (note the spelling error "Bleu" in the query).
Is it possible to implement the fuzziness at the index mapping level instead? Title and subtitle are quite short in the data set, maybe 30 to 40 characters combined at most.
Otherwise, how can I add fuzziness to the title and subtitle in the query?
```json
{
  "query": {
    "multi_match": {
      "query": "Bleu Amazing T-Shirt",
      "fuzziness": "auto",
      "operator": "and",
      "fields": [
        "identity.partNumber^4",
        "identity.altIdentifier^4",
        "identity.barcode",
        "identity.mpn",
        "identity.ppn",
        "descriptions.title",
        "descriptions.subtitle"
      ],
      "type": "cross_fields"
    }
  },
  "fields": [
    "identity.partNumber",
    "identity.barcode",
    "identity.ppn",
    "descriptions.title",
    "descriptions.subtitle"
  ]
}
```

Well, fuzzy search doesn't seem to be supported with cross_fields; there were a few related issues. So instead of a cross_fields search, I copied the title and subtitle into a new field at index time and split the query as below. It seems to work for my test cases, at least.
```json
{
  "query": {
    "bool": {
      "should": [
        {
          "multi_match": {
            "query": "{{searchTerm}}",
            "operator": "and",
            "fields": [
              "identity.partNumber^4",
              "identity.altIdentifier^4",
              "identity.barcode",
              "identity.mpn",
              "identity.ppn"
            ],
            "type": "best_fields"
          }
        },
        {
          "match": {
            "fuzzyFields": {
              "query": "{{searchTerm}}",
              "operator": "and",
              "fuzziness": "auto"
            }
          }
        }
      ]
    }
  }
}
```
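The answer above relies on a `fuzzyFields` field populated at index time. A minimal sketch of a mapping that does this with `copy_to`, assuming Elasticsearch 7.x or later and a hypothetical index named `products`, sent as `PUT /products`:

```json
{
  "mappings": {
    "properties": {
      "descriptions": {
        "properties": {
          "title": { "type": "text", "copy_to": "fuzzyFields" },
          "subtitle": { "type": "text", "copy_to": "fuzzyFields" }
        }
      },
      "fuzzyFields": { "type": "text" }
    }
  }
}
```

`copy_to` copies the original field values rather than the analyzed terms, so `fuzzyFields` is analyzed with its own analyzer; the copied values are indexed but do not appear in `_source`.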

Finding an exact phrase in multiple fields with Elasticsearch

I want to find an exact phrase (for instance, "the quick brown fox") across multiple fields in a document.
Right now, I'm using something like this:
```
{
  "query": {
    "filtered": {
      "query": {
        "multi_match": {
          "fields": [
            "subject",
            "comments"
          ],
          "query": "the quick brown fox"
        }
      },
      "filter": {
        "and": [
          {
            "term": {
              "priority": "high"
            }
          }
          ...more ands
        ]
      }
    }
  }
}
```
The question is: how can I do this correctly? Right now I get the best match first, which tends to be the entire phrase, but I also get a lot of near matches.

If you are using an Elasticsearch cluster with version >= 1.1.0, you can set the type of your multi_match query to phrase:
```
...
"query": {
  "multi_match": {
    "fields": [
      "subject",
      "comments"
    ],
    "query": "the quick brown fox",
    "type": "phrase"
  }
...
```
This replaces the match query generated for each field with a match_phrase query, which returns only the documents containing the full phrase (you can find details in the documentation).
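On newer versions, where the `filtered` query and the `and` filter from the question no longer exist (both were removed in Elasticsearch 5.0), the same search can be sketched as a `bool` query with the phrase `multi_match` in `must` and the term conditions in the non-scoring `filter` clause. The question's remaining `...more ands` would become further entries in the `filter` array:

```json
{
  "query": {
    "bool": {
      "must": {
        "multi_match": {
          "query": "the quick brown fox",
          "type": "phrase",
          "fields": ["subject", "comments"]
        }
      },
      "filter": [
        { "term": { "priority": "high" } }
      ]
    }
  }
}
```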

How are you analyzing the subject/comments fields? If you want an exact match, you'll need to use the keyword tokenizer for both indexing and search.
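A minimal sketch of that suggestion, assuming Elasticsearch 7.x or later and a hypothetical index named `messages` (sent as `PUT /messages`): a custom analyzer built on the `keyword` tokenizer, plus a `lowercase` filter (an added assumption, so matching is case-insensitive), set as the field `analyzer` so it applies at both index and search time. Keep in mind that this indexes each field value as a single token, so a document matches only when the whole field equals the query, not when the phrase appears inside a longer text.

```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "exact_lowercase": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "subject": { "type": "text", "analyzer": "exact_lowercase" },
      "comments": { "type": "text", "analyzer": "exact_lowercase" }
    }
  }
}
```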

Elasticsearch fuzzy matching: How can I get direct hits first?

I'm using Elasticsearch to search names in a database, and I want it to be fuzzy to allow for minor spelling errors. Based on the advice I've found on the matter, I'm using "match" and "fuzziness" instead of "fuzzy", which definitely seems to be more accurate. This is my query:
```json
{
  "query": {
    "match": {
      "last_name": {
        "query": "Beach",
        "type": "phrase",
        "fuzziness": 2
      }
    }
  }
}
```
However, even though I have numerous results with last_name "Beach" (I know there are at least 100), I also get results with last_name "Beech" and "Berch" in the first 10 hits returned by my query. Can someone help me figure out how to get the exact matches first?

Try changing your query to a bool query with two should clauses. The first is your current query, and the second is a query that only returns exact matches; give that one a big boost (like 10.0).
That should put your exact matches on top while still listing your partial matches.

I tried to edit Constantijn's answer above to include a sample based on it, but the edit is still pending approval, so I will just put a sample here instead:
```json
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "last_name": {
              "query": "Beach",
              "fuzziness": 2,
              "boost": 1
            }
          }
        },
        {
          "match": {
            "last_name": {
              "query": "Beach",
              "boost": 10
            }
          }
        }
      ]
    }
  }
}
```
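A variation on the same idea, assuming a 5.x+ index where `last_name` has a `keyword` subfield (dynamic mapping creates `last_name.keyword` for string fields by default): boosting an exact `term` match on the subfield means only documents whose entire last name is exactly `Beach` receive the big boost. The `term` query is not analyzed, so this exact clause is case-sensitive.

```json
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "last_name": {
              "query": "Beach",
              "fuzziness": 2
            }
          }
        },
        {
          "term": {
            "last_name.keyword": {
              "value": "Beach",
              "boost": 10
            }
          }
        }
      ]
    }
  }
}
```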

Elasticsearch query on parent child using facet count for matching results from both parent and child

The idea is to run a query on everything that matches a basic query statement and return facet counts: the number of matching children of type page (child), and then the number of books (parent). The use case is to show X books on Y pages. These would then have separate links, with additional queries, etc.
I'm brand new to Elasticsearch. What I've gotten into so far is very cool, but I've hit a brick wall with this, so any help would be really useful.
Thank you for your time :)
```json
{
  "query": {
    "has_child": {
      "type": "page",
      "query": {
        "filtered": {
          "query": {
            "query_string": {
              "default_field": "text",
              "query": "some example search query"
            }
          }
        }
      }
    }
  },
  "facets": {},
  "sort": [
    "_score"
  ],
  "from": 0,
  "size": 10
}
```
Yes, I've read the documentation on facets.
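Facets were removed in Elasticsearch 2.0 in favour of aggregations. On 6.x or later with a `join` field, one way to get both numbers from a single request is sketched below; it assumes a hypothetical index named `library` whose join field defines a `book` parent and `page` child relation. The books matching `has_child` are counted by `hits.total`, and the `children` aggregation, narrowed by a `filter` sub-aggregation that repeats the page query, counts the matching pages across those books. Send it as `GET /library/_search`:

```json
{
  "from": 0,
  "size": 10,
  "query": {
    "has_child": {
      "type": "page",
      "query": {
        "query_string": {
          "default_field": "text",
          "query": "some example search query"
        }
      }
    }
  },
  "aggs": {
    "pages": {
      "children": { "type": "page" },
      "aggs": {
        "matching_pages": {
          "filter": {
            "query_string": {
              "default_field": "text",
              "query": "some example search query"
            }
          }
        }
      }
    }
  }
}
```

The page count is then `aggregations.pages.matching_pages.doc_count`. In 7.x and later, `hits.total` is only exact up to 10,000 by default; set `"track_total_hits": true` if you need an exact book count beyond that.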
