Multiple Elasticsearch match queries

Say I have a document with 3 text fields: field_a, field_b and field_c.
Is it possible to do a single query so that we have results in this order:
'match' in field_a
'match' in field_b
'match' in field_c
A 'multi_match' query can return results from different fields mixed together in the result order. What I want is any and all results from field_a first, then any and all results from field_b, and so on.

Even though I find this approach strange in general (I think your problem should be solved in a different way, e.g. with multiple stages of search), I think you could solve it for now in the following manner.
The multi_match query lets you boost individual fields, e.g.:
{
  "query": {
    "multi_match": {
      "query": "this is a test",
      "fields": [ "field_a^1000", "field_b^10", "field_c" ]
    }
  }
}
The ^ sign is a boost operator: it multiplies the score of a match in that field by the given value (1000 in the case of field_a).
However, I would recommend avoiding this sort of approach in general, since:
It's hard to control those boost values.
In some cases it may not behave as expected (imagine a field_b match whose score is high enough, e.g. 1000, that it outranks a weaker field_a match).
If you have many hits, matches in field_c become practically irrelevant, since no user will scroll that far through the search results.
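If the request is assembled in code, the `field^boost` strings can be generated from a plain mapping. A minimal Python sketch; the helper name `boosted_multi_match` is hypothetical and it only builds the request body, without sending it anywhere:

```python
import json


def boosted_multi_match(text, field_boosts):
    """Build a multi_match request body from a {field: boost} mapping.

    A boost of 1 is written as the bare field name, as in the example above.
    """
    fields = [
        name if boost == 1 else f"{name}^{boost}"
        for name, boost in field_boosts.items()  # dicts keep insertion order
    ]
    return {"query": {"multi_match": {"query": text, "fields": fields}}}


body = boosted_multi_match(
    "this is a test", {"field_a": 1000, "field_b": 10, "field_c": 1}
)
print(json.dumps(body, indent=2))
```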
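The second concern can be made concrete. With the default multi_match type (best_fields) and the default tie_breaker of 0, a document's score is essentially its best boosted field score, so a boost only scales scores and does not guarantee a strict field_a > field_b > field_c ordering. A minimal sketch using deliberately exaggerated, hypothetical raw scores (not real Elasticsearch output):

```python
BOOSTS = {"field_a": 1000, "field_b": 10, "field_c": 1}


def best_fields_score(raw_scores, tie_breaker=0.0):
    """Approximate best_fields: the best boosted field score plus
    tie_breaker times the remaining boosted field scores."""
    boosted = sorted(
        (BOOSTS[field] * score for field, score in raw_scores.items()),
        reverse=True,
    )
    if not boosted:
        return 0.0
    return boosted[0] + tie_breaker * sum(boosted[1:])


# Hypothetical raw per-field scores, chosen to show the failure mode.
docs = {
    "doc_a_weak": {"field_a": 0.2},     # weak field_a match -> 200
    "doc_b_strong": {"field_b": 25.0},  # very strong field_b match -> 250
    "doc_c": {"field_c": 3.0},          # -> 3
}
ranking = sorted(docs, key=lambda d: best_fields_score(docs[d]), reverse=True)
print(ranking)  # ['doc_b_strong', 'doc_a_weak', 'doc_c']
```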

Related

Boolean fuzzy query in Elasticsearch not returning the expected result

I am trying to build a fuzzy bool query on first and last names in Elasticsearch 7.2.0. I have a document with "asim" and "banskota" as the first and last name respectively. But when I query with "asi" or "asimmm" and the exact last name, Elasticsearch returns no result. However, when I query with the exact first name or "asimm", it returns the intended document.
I also tried a "fuzzy" query instead of "match" and experimented with different fuzziness parameters, but the outcome is the same. Both the first and last names are analyzed, and I used the _analyze API to check how 'asim' is analyzed: with the standard analyzer, it is indexed as a single token, 'asim'.
EDIT: It turns out that the fuzzy query works in the 'substitution' case: for example, it returns the result for 'asim' when queried with 'asmi', but not in the deletion case. This surprises me, as the edit distance in the substitution case is greater than in the deletion case. When the string is longer, for instance with the last name 'banskota', fuzzy matching works for the deletion case as well. What should I do to make fuzzy search work in the deletion case for strings of length 4 or 5?
fuzzy_body = {
    "size": 10,
    "query": {
        "bool": {
            "must": [
                {"match": {"FIRST_NAME_N": {"query": "asi", "fuzziness": "AUTO"}}},
                {"fuzzy": {"LAST_NAME_N": "banskota"}},
            ]
        }
    },
}
It turns out that if the name fields are indexed as keyword type, the query returns the expected results with "AUTO" fuzziness.
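To put numbers on the reasoning in the EDIT: with fuzziness AUTO (AUTO:3,6 by default), terms of 3 to 5 characters may differ by one edit and longer terms by two, and Elasticsearch by default counts an adjacent transposition such as asim → asmi as a single edit, so that case is really a transposition rather than a substitution. The sketch below (hypothetical helper names) compares the optimal string alignment distance, which treats a transposition as one edit, with the budget AUTO allows. Every failing query falls within its budget, which suggests the problem lay in how the fields were indexed rather than in the edit distance itself, consistent with the answer above.

```python
def auto_fuzziness(term, low=3, high=6):
    """Edits allowed by fuzziness AUTO:low,high for a term of this length."""
    n = len(term)
    if n < low:
        return 0
    if n < high:
        return 1
    return 2


def osa_distance(a, b):
    """Optimal string alignment distance: Levenshtein plus adjacent transpositions."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i  # delete all of a[:i]
    for j in range(len(b) + 1):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]


for query in ["asi", "asmi", "asimm", "asimmm"]:
    # query, distance to the indexed token "asim", edits AUTO allows
    print(query, osa_distance(query, "asim"), auto_fuzziness(query))
# asi 1 1
# asmi 1 1
# asimm 1 1
# asimmm 2 2
```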

Elasticsearch: Constant score applied within match query, but after search terms have been analysed?

Imagine I have some documents with the following values in a text field called name:
Document1: abc xyz group
Document2: group x/group y
Document3: group 1, group 2, group 3, group 4
Now imagine I'm sending a simple match query to ES for the term 'group':
{
  "query": {
    "match": {
      "name": "group"
    }
  }
}
My desired outcome would be that all 3 documents would return with the same score, no matter how often the term appears, where it appears, etc.
Now, I already know that I can do this by wrapping my match with a constant_score, like so:
{
  "query": {
    "constant_score": {
      "filter": {
        "match": {
          "name": "group"
        }
      },
      "boost": 1
    }
  }
}
BUT, say I now want to query using the search term abc group. In this case, I want Document2 and Document3 to return the same score (they match group), but Document1 to score higher, as it matches both abc and group.
With a constant_score wrapping my match query, documents that contain any of the terms return the same score (i.e. Document1, 2 and 3 return the same score for abc group). If I remove the constant_score, then Document3 has the best score, presumably because it contains more matches for the search text (group appears 4 times).
It seems as though I need a way of moving the constant_score query to after the match query has analyzed my search text. Effectively causing a query of abc group to be two constant_score queries - one for abc and one for group.
Does anyone know of a way to achieve this?
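The idea in the question can be sketched client-side: split the search text into tokens, then send one constant_score clause per token inside a bool should, so that each matching clause adds its constant score and a document earns roughly one point per distinct query term it contains. This is a minimal sketch, not the approach used in the answer below; the regex tokenizer is an assumption standing in for the field's real analyzer (the _analyze API would return the real tokens), and `per_term_constant_score` and `simulated_score` are hypothetical helpers. The local scorer only mimics the summing of should clauses.

```python
import re


def tokenize(text):
    """Rough stand-in for the standard analyzer: lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def per_term_constant_score(field, text):
    """One constant_score clause per distinct token, OR-ed together in bool.should."""
    distinct = list(dict.fromkeys(tokenize(text)))  # dedupe, keep order
    return {
        "query": {
            "bool": {
                "should": [
                    {"constant_score": {"filter": {"term": {field: tok}}, "boost": 1}}
                    for tok in distinct
                ]
            }
        }
    }


def simulated_score(doc_text, body):
    """Local approximation: add the boost of every should clause whose term the doc contains."""
    doc_tokens = set(tokenize(doc_text))
    total = 0
    for clause in body["query"]["bool"]["should"]:
        cs = clause["constant_score"]
        (term,) = cs["filter"]["term"].values()
        if term in doc_tokens:
            total += cs["boost"]
    return total


docs = {
    "Document1": "abc xyz group",
    "Document2": "group x/group y",
    "Document3": "group 1, group 2, group 3, group 4",
}
body = per_term_constant_score("name", "abc group")
scores = {name: simulated_score(text, body) for name, text in docs.items()}
print(scores)  # {'Document1': 2, 'Document2': 1, 'Document3': 1}
```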
I've managed to solve this by utilising Elasticsearch's unique token filter: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-unique-tokenfilter.html
I've added that to my name field in the index mappings, and it looks to be retrieving the desired results without having to worry about constant_score.
Note, however, that all this does is stop term frequency from having any effect on the _score; other factors (such as field length) still affect the results. This therefore isn't equivalent to the post-analysis constant_score I hypothesized in the question, but it will suffice for my current requirements.
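For reference, a setup along those lines might look like the sketch below. INDEX_BODY is a hypothetical create-index body using 7.x-style mappings (no type name); the analyzer name name_unique is made up, while standard, lowercase and unique are built-in analysis components. `simulate_unique_analyzer` is only a rough local approximation of that analyzer, showing how repeated tokens collapse so each term is counted at most once per field.

```python
import re

# Hypothetical index body: "name_unique" is an invented analyzer name.
INDEX_BODY = {
    "settings": {
        "analysis": {
            "analyzer": {
                "name_unique": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "unique"],
                }
            }
        }
    },
    "mappings": {
        "properties": {"name": {"type": "text", "analyzer": "name_unique"}}
    },
}


def simulate_unique_analyzer(text):
    """Approximation: lowercase, split into word tokens, then keep only the
    first occurrence of each token, as the unique filter does."""
    tokens = re.findall(r"\w+", text.lower())
    return list(dict.fromkeys(tokens))


print(simulate_unique_analyzer("group 1, group 2, group 3, group 4"))
# ['group', '1', '2', '3', '4']
```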

Nested count queries

I'm looking to add a feature to an existing query. Basically, I run a query that returns, say, 1000 documents. Those documents all have the same structure; only the values of certain fields vary. What I'd like is to not only get the full list as a result, but also count how many results have field X with value Y, how many have the same field X with value Z, etc.
In other words, get all the results plus 4 or 5 "counts" that would act somewhat like an SQL "group by".
The point of this is to allow full-text search over all the clients in our database (without filtering), while showing how many of those are active clients, past clients, active prospects, etc.
Is there any way to do this without running additional/separate queries?
EDIT WITH ANSWER:
Aggregations are the way to go. Here's how I did it; it's so straightforward that I expected much harder work!
{
  "query": {
    "term": {
      "_type": "client"
    }
  },
  "aggregations": {
    "agg1": {
      "terms": {
        "field": "listType.typeRef.keyword"
      }
    }
  }
}
Note that it even works on a field that holds a list of terms rather than a single value; it really was that easy!
I believe what you are looking for is aggregations.
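The same pattern covers the 4 or 5 counts from the question by adding one terms aggregation per field; the hits and the aggregation buckets come back in the same response. A minimal Python sketch, assuming a second, hypothetical field status.keyword and a made-up match query. The sample response is hand-written in the shape of a terms aggregation response (buckets with key and doc_count), and its numbers are invented. A terms aggregation returns only the top 10 buckets by default, so raise size for fields with more distinct values.

```python
def build_counts_query(query, count_fields, size=10):
    """A query plus one terms aggregation per field, to count values per field."""
    return {
        "query": query,
        "aggregations": {
            f"by_{field.replace('.', '_')}": {"terms": {"field": field, "size": size}}
            for field in count_fields
        },
    }


def bucket_counts(response, agg_name):
    """Turn a terms aggregation's buckets into {value: doc_count}."""
    return {
        bucket["key"]: bucket["doc_count"]
        for bucket in response["aggregations"][agg_name]["buckets"]
    }


body = build_counts_query(
    {"match": {"name": "smith"}},                     # hypothetical full-text query
    ["listType.typeRef.keyword", "status.keyword"],  # second field is hypothetical
)

# Hand-written response fragment; the keys and counts are made up.
sample_response = {
    "aggregations": {
        "by_listType_typeRef_keyword": {
            "buckets": [
                {"key": "active_client", "doc_count": 412},
                {"key": "past_client", "doc_count": 230},
            ]
        }
    }
}
print(bucket_counts(sample_response, "by_listType_typeRef_keyword"))
```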
The documentation should be clear enough, but if you struggle please give us your ES query and we will help you from there.

tf/idf boosting within field

My use case is like this:
For the query iphone charger, results named "iphone charger coupons" get higher relevance than results named "iphone charger", possibly because of better matches in the description and other fields. Boosting the name field doesn't help much unless I skew the importance drastically. What I really need is a tf/idf boost within the name field.
To quote the Elasticsearch blog:
the frequency of a term in a field is offset by the length of the field. However, the practical scoring function treats all fields in the same way. It will treat all title fields (because they are short) as more important than all body fields (because they are long).
I need to boost this "more important" effect for one particular field. Can we do this with function_score or some other way?
A one term difference in length is not much of a difference to the scoring algorithm (and, in fact, can vanish entirely due to imprecision on the length norm). If there are hits on other fields, you have a lot of scoring elements to fight against.
A dis_max would probably be a reasonable approach to this. Instead of all the additive scores and coords and such you are trying to overcome, it will simply select the score of the best matching subquery. If you boost the query against title, you can ensure matches there are strongly preferred.
You can then assign a "tie_breaker", so that the score of the description subquery still counts, scaled down (by 0.2 here); in practice it mostly decides the order between documents whose "title" scores are similar.
{
  "dis_max": {
    "tie_breaker": 0.2,
    "queries": [
      {
        "terms": {
          "title": ["iphone", "charger"],
          "boost": 10
        }
      },
      {
        "terms": {
          "description": ["iphone", "charger"]
        }
      }
    ]
  }
}
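To see why this helps, compare how two combination rules treat the same subquery scores: dis_max takes the best subquery score plus tie_breaker times the others, while a bool query with only should clauses adds them all up. A minimal sketch with hypothetical scores, where the title boost of 10 is assumed to be already folded into the title scores; the function names are illustrative only.

```python
def dis_max_score(subquery_scores, tie_breaker=0.0):
    """dis_max: the best subquery score plus tie_breaker times each of the others."""
    if not subquery_scores:
        return 0.0
    best = max(subquery_scores)
    return best + tie_breaker * (sum(subquery_scores) - best)


def bool_should_score(subquery_scores):
    """bool/should without coord: the scores of all matching clauses are added."""
    return sum(subquery_scores)


# Hypothetical [title_score, description_score] pairs (title already boosted).
title_hit = [12.0, 0.5]  # strong title match, weak description match
desc_hit = [9.0, 6.0]    # weaker title match, strong description match

print(dis_max_score(title_hit, 0.2), dis_max_score(desc_hit, 0.2))
print(bool_should_score(title_hit), bool_should_score(desc_hit))
```

With dis_max the title-heavy document wins (about 12.1 vs 10.2), whereas the summed scores favour the description-heavy one (12.5 vs 15.0).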
Another approach, if you specifically want to know when you have an exact match against the entire field, is to separately index an untokenized version of that field and query that field as well. Any match against the untokenized version is an exact match against the entire field contents, so you don't need to rely on the length norm to make that determination.
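A minimal sketch of that idea, assuming Elasticsearch 5.x or later, where the untokenized copy is a keyword sub-field (older versions used a "string" field with "index": "not_analyzed"). The sub-field name raw and the boost of 10 are arbitrary choices for illustration, and `name_query` is a hypothetical helper. Keyword values are stored verbatim, so the term clause only matches when the search text equals the whole field value, including case.

```python
# Hypothetical mapping: "name" stays analyzed for full-text matching, and the
# "name.raw" keyword sub-field holds the whole value as a single term.
MAPPING = {
    "properties": {
        "name": {
            "type": "text",
            "fields": {"raw": {"type": "keyword"}},
        }
    }
}


def name_query(text, exact_boost=10):
    """Full-text match on name, plus a boosted term query on name.raw that
    only matches when the entire field value equals the search text."""
    return {
        "query": {
            "bool": {
                "should": [
                    {"match": {"name": text}},
                    {"term": {"name.raw": {"value": text, "boost": exact_boost}}},
                ]
            }
        }
    }


body = name_query("iphone charger")
```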

Unexpected case sensitivity

I am a noob running Elasticsearch 1.5.9. I want to pull out all of the documents that have the field "PERSON" set to "Johnson" (note the mixed casing). If I look manually in elasticsearch-head, I can see a document with exactly those attributes.
The docs explain that I should construct a filter query to pull out this document. But when I do so, I get some unexpected behavior.
This works. It returns exactly one document with PERSON = "Johnson", as expected:
query = {"filter": {"term" : { "PERSON" : "johnson" }}}
But this does not work
query = {"filter": {"term" : { "PERSON" : "Johnson" }}}
If you look closely, you'll see that the good query is lowercase but the bad query is mixed case -- even though the PERSON field is set to "Johnson".
Adding to the weirdness, I am lowercasing everything that goes into the full_text field ("_source": { "full_text": "all lower case" }), so the full text includes johnson, which I would think is totally independent of the PERSON field.
What's going on? How do I do a mixed case search on the PERSON field?
The term query won't analyze your search text.
This means you need to provide the query already in its analyzed (token) form for the term query to work.
Use a match query instead, and things will work like magic.
When a string like the one below is sent to Elasticsearch, it is tokenized (or rather analyzed) and stored:
"Green Apple" -> ( "green" , "apple")
This is the default behavior of analysis.
Now, when you search using a term query, no analysis happens.
This means that for the word Apple, it searches for the token Apple with its case preserved, and hence fails.
The match query does perform analysis: if you search for Apple, it converts it to apple and then searches, which gives good matches.
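The difference can be mimicked locally. A minimal sketch, assuming a simplified analyzer (split into words, then lowercase) as a rough stand-in for the standard analyzer; the helper names are hypothetical. For the question above, the fix would be to send a match query such as MATCH_PERSON instead of the term filter.

```python
import re

# A match query on the PERSON field; the search text is analyzed before lookup.
MATCH_PERSON = {"query": {"match": {"PERSON": "Johnson"}}}


def standard_like_analyze(text):
    """Rough stand-in for the standard analyzer: word tokens, lowercased."""
    return [token.lower() for token in re.findall(r"\w+", text)]


# What ends up in the index for the field value "Green Apple".
indexed_tokens = set(standard_like_analyze("Green Apple"))


def term_matches(value):
    """term query: the value is looked up as-is, with no analysis."""
    return value in indexed_tokens


def match_matches(text):
    """match query (default OR operator): analyze first, then look up each token."""
    return any(token in indexed_tokens for token in standard_like_analyze(text))


print(term_matches("Apple"), term_matches("apple"), match_matches("Apple"))
# False True True
```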
You can learn more on analysis here.
