ElasticSearch filter boosting based on field value

I have the following query:
{
"from": 0,
"query": {
"custom_filters_score": {
"filters": [
{
"boost": 1.5,
"filter": {
"term": {
"format": "test1"
}
}
},
{
"boost": 1.5,
"filter": {
"term": {
"format": "test2"
}
}
}
],
"query": {
"bool": {
"must": {
"query_string": {
"analyzer": "query_default",
"fields": [
"title^5",
"description^2",
"indexable_content"
],
"query": "blah"
}
},
"should": []
}
}
}
},
"size": 50
}
If I'm reading the documentation correctly, this should boost documents that have {"format":"test1"} in them.
However, explain reports "custom score, no filter match, product of:", and the score of the returned documents that match the filter isn't changed by it.
What am I doing wrong?
Edit: here's the schema:
mapping:
  edition:
    _all: { enabled: true }
    properties:
      title: { type: string, index: analyzed }
      description: { type: string, index: analyzed }
      format: { type: string, index: not_analyzed, include_in_all: false }
      section: { type: string, index: not_analyzed, include_in_all: false }
      subsection: { type: string, index: not_analyzed, include_in_all: false }
      subsubsection: { type: string, index: not_analyzed, include_in_all: false }
      link: { type: string, index: not_analyzed, include_in_all: false }
      indexable_content: { type: string, index: analyzed }
and let's assume a typical document is like:
{
"format": "test1",
"title": "blah",
"description": "blah",
"indexable_content": "keywords here",
"section": "section",
"subsection": "child-section",
"link":"/section/child-section/blah"
}

If it says "no filter match", it means that none of the filters in your query matched. The most likely reason is that the records matching your query don't have the term "test1" in them. Unfortunately, you didn't provide mapping and test data, so it's difficult to tell for sure what's going on.
Try running this query to see if you can actually find any records that match your search criteria and should be boosted:
{
"from": 0,
"query": {
"bool": {
"must": [{
"query_string": {
"analyzer": "query_default",
"fields": ["title^5", "description^2", "indexable_content"],
"query": "blah"
}
}, {
"term": {
"format": "test1"
}
}]
}
},
"size": 50
}
Your query looks fine and based on the provided information, it should work: https://gist.github.com/4448954
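To make that check repeatable, the diagnostic query above can be generated by a small helper. This is only a minimal Python sketch that builds the request body; the function name build_boost_check_query is made up for illustration, and actually sending the body to a cluster is left out.

```python
import json


def build_boost_check_query(text, format_value, size=50):
    """Build a bool query that requires both the text match and the format term.

    If this returns no hits, no document can match the boosting filter either.
    """
    return {
        "from": 0,
        "size": size,
        "query": {
            "bool": {
                "must": [
                    {
                        "query_string": {
                            "analyzer": "query_default",
                            "fields": ["title^5", "description^2", "indexable_content"],
                            "query": text,
                        }
                    },
                    # Same term the boosting filter in the question uses.
                    {"term": {"format": format_value}},
                ]
            }
        },
    }


body = build_boost_check_query("blah", "test1")
print(json.dumps(body, indent=2))
```

Running it once per format value ("test1", "test2") shows which of the filters could ever match.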

Related

ElasticSearch 6.8 doesn't order by exact matches first

I've been looking into this issue for several days and couldn't make it work. I followed steps like this and this, but without success.
So basically, I have the following data on ElasticSearch:
{ title: "Black Dust" },
{ title: "Dust In The Wind" },
{ title: "Gold Dust Woman" },
{ title: "Another One Bites The Dust" }
The problem is that I want to search for the word "Dust" and have the results ordered like this:
{ title: "Dust In The Wind" },
{ title: "Black Dust" },
{ title: "Gold Dust Woman" },
{ title: "Another One Bites The Dust" }
so that the title starting with "Dust" appears at the top of the results.
Posting the mappings and query is probably more useful than explaining the issue any further.
settings: {
analysis: {
normalizer: {
lowercase: {
type: 'custom',
filter: ['lowercase']
}
}
}
},
mappings: {
_doc: {
properties: {
title: {
type: 'text',
analyzer: 'standard',
fields: {
raw: {
type: 'keyword',
normalizer: 'lowercase'
},
fuzzy: {
type: 'text',
},
},
}
}
}
}
and my query is:
"query": {
"bool": {
"must": {
"query_string": {
"fields": [
"title"
],
"default_operator": "AND",
"query": "dust"
}
},
"should": {
"prefix": {
"title.raw": "dust"
}
}
}
}
Can anyone please help me with this?
Thank you!
SOLUTION!
I figured it out and solved it with the following query:
"query": {
"bool": {
"must": {
"bool": {
"should": [
{
"prefix": {
"title.raw": {
"value": "dust",
"boost": 1000000
}
}
},
{
"match": {
"title": {
"query": "dust",
"boost": 50000
}
}
},
{
"match": {
"title": {
"query": "dust",
"boost": 10,
"fuzziness": 1
}
}
}
]
}
}
}
}
However, while writing tests, I found a small issue.
I'm generating a random uuid and adding the following to the database:
{ title: `${uuid} A` }
{ title: `${uuid} W` }
{ title: `${uuid} Z` }
{ title: `A ${uuid}` }
{ title: `z ${uuid}` }
{ title: `Z ${uuid}` }
When I perform the query above looking for the uuid, I get:
uuid Z
uuid A
uuid W
Z uuid
I achieved my first goal, which was having the uuid in first position, but why is Z before A (first and second result)?
When everything else fails you can use a trivial substring position sort like so:
{
"query": {
"bool": {
"must": {
...
},
"should": {
...
}
}
},
"sort": [
{
"_script": {
"script": "return doc['title.raw'].value.indexOf('dust')",
"type": "number",
"order": "asc" <--
}
}
]
}
I've set the order to asc because the lower the substring index, the higher the 'score'.
EDIT
We've gotta account for index == -1 so replace the script above with:
"script": "def pos = doc['title.raw'].value.indexOf('dust'); return pos == -1 ? Integer.MAX_VALUE : pos"

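To see what ordering the substring-position sort produces, the script's logic can be emulated outside Elasticsearch. This is a minimal Python sketch, assuming title.raw holds the lowercased title (because of the lowercase normalizer); the helper name substring_position and the extra non-matching title "Silver Lining" are made up for illustration.

```python
import sys


def substring_position(title, needle):
    """Mirror the sort script: index of the needle, or a huge value when absent."""
    pos = title.lower().find(needle.lower())
    return sys.maxsize if pos == -1 else pos


titles = [
    "Black Dust",
    "Dust In The Wind",
    "Gold Dust Woman",
    "Another One Bites The Dust",
    "Silver Lining",  # no match: sorts last, like the index == -1 case
]

# Ascending order, as in the sort clause: lower index means earlier result.
ranked = sorted(titles, key=lambda t: substring_position(t, "dust"))
for title in ranked:
    print(substring_position(title, "dust"), title)
```

Note that this sort is purely positional: "Gold Dust Woman" (index 5) lands before "Black Dust" (index 6), which is not quite the order asked for in the question.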
elasticsearch nested query returns only last 3 results

We have the following elasticsearch mapping
{
index: 'data',
body: {
settings: {
analysis: {
analyzer: {
lowerCase: {
tokenizer: 'whitespace',
filter: ['lowercase']
}
}
}
},
mappings: {
// used for _all field
_default_: {
index_analyzer: 'lowerCase'
},
entry: {
properties: {
id: { type: 'string', analyzer: 'lowerCase' },
type: { type: 'string', analyzer: 'lowerCase' },
name: { type: 'string', analyzer: 'lowerCase' },
blobIds: {
type: 'nested',
properties: {
id: { type: 'string' },
filename: { type: 'string', analyzer: 'lowerCase' }
}
}
}
}
}
}
}
and a sample document that looks like the following:
{
"id":"5f02e9dae252732912749e13",
"type":"test_type",
"name":"test_name",
"creationTimestamp":"2020-07-06T09:07:38.775Z",
"blobIds":[
{
"id":"5f02e9dbe252732912749e18",
"filename":"test1.csv"
},
{
"id":"5f02e9dbe252732912749e1c",
"filename":"test2.txt"
},
// removed in-between elements for simplicity
{
"id":"5f02e9dbe252732912749e1e",
"filename":"test3.csv"
},
{
"id":"5f02e9dbe252732912749e58",
"filename":"test4.txt"
},
{
"id":"5f02e9dbe252732912749e5a",
"filename":"test5.csv"
},
{
"id":"5f02e9dbe252732912749e5d",
"filename":"test6.txt"
}
]
}
I have the following ES query which is querying documents for a certain timerange based on the creationTimestamp field and then filtering the nested field blobIds based on a user query, that should match the blobIds.filename field.
{
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"range": {
"creationTimestamp": {
"gte": "2020-07-01T09:07:38.775Z",
"lte": "2020-07-07T09:07:40.147Z"
}
}
},
{
"nested": {
"path": [
"blobIds"
],
"query": {
"query_string": {
"fields": [
"blobIds.filename"
],
"query": "*"
}
},
// returns the actual blobId hit
// and not the whole array
"inner_hits": {}
}
},
{
"query": {
"query_string": {
"query": "+type:*test_type* +name:*test_name*"
}
}
}
]
}
}
}
},
"sort": [
{
"creationTimestamp": {
"order": "asc"
},
"id": {
"order": "asc"
}
}
]
}
The above entry is clearly matching the query. However, it seems like there is something wrong with the returned inner_hits, since I always get only the last 3 blobIds elements instead of the whole array that contains 24 elements, as can be seen below.
{
"name": "test_name",
"creationTimestamp": "2020-07-06T09:07:38.775Z",
"id": "5f02e9dae252732912749e13",
"type": "test_type",
"blobIds": [
{
"id": "5f02e9dbe252732912749e5d",
"filename": "test4.txt"
},
{
"id": "5f02e9dbe252732912749e5a",
"filename": "test5.csv"
},
{
"id": "5f02e9dbe252732912749e58",
"filename": "test6.txt"
}
]
}
I find it very strange since I'm only doing a simple * query.
Using elasticsearch v1.7 and cannot update at the moment
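One thing worth checking here: according to the inner_hits documentation, only the top three matching nested hits are returned by default unless a size is given. Below is a minimal Python sketch that sets size on the nested clause from the query above. The helper name with_inner_hits_size is made up, the value 24 is just the array length mentioned in the question, and I haven't verified this against a 1.7 cluster.

```python
import copy


def with_inner_hits_size(nested_clause, size):
    """Return a copy of a nested clause whose inner_hits asks for `size` hits."""
    clause = copy.deepcopy(nested_clause)
    clause["nested"].setdefault("inner_hits", {})["size"] = size
    return clause


# The nested clause from the question, unchanged.
nested_clause = {
    "nested": {
        "path": ["blobIds"],
        "query": {
            "query_string": {
                "fields": ["blobIds.filename"],
                "query": "*",
            }
        },
        "inner_hits": {},
    }
}

sized = with_inner_hits_size(nested_clause, 24)
print(sized["nested"]["inner_hits"])
```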

Elasticsearch multi-match not returning all results when providing empty string

I have a total of 1783 records and I want ES to return all of them in case the multi_match query is not provided (searchObject.query = '')
I manage to do so if I pass an empty array to query.bool.should, so in theory I could update the ES object below based on the searchObject.query value but I'm not sure if that's a good idea.
{
_source: [
'id',
'event',
'description',
'element',
'date'
],
track_total_hits: true,
query: {
bool: {
should: [{
multi_match:{
query: searchObject.query,
fields: ["element","description","nar.*","title","identifier"]
}
}],
filter: []
}
},
highlight: { fields: { '*': {} } },
sort: [],
from: 0,
size: 10
}
Any suggestions?
You can append a match_all to the should:
{
"_source": [
"id",
"event",
"description",
"element",
"date"
],
"track_total_hits": true,
"query": {
"bool": {
"should": [
{
"multi_match": {
"query": "",
"fields": [
"line",
"element",
"description",
"nar.*",
"title",
"identifier"
]
}
},
{
"match_all": {}
}
],
"filter": []
}
},
"highlight": {
"fields": {
"*": {}
}
},
"sort": [],
"from": 0,
"size": 10
}
That's what it's usually for. IMHO the empty string should be checked before you perform the ES request. I'm assuming it's coming from an autocomplete or such.
This is controlled by the match query's zero_terms_query property. Just add this property to your multi_match block: "zero_terms_query": "all".
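The suggestion to check for the empty string before sending the request could look roughly like this. It's a minimal Python sketch that only builds the request body; the function names are made up, and the field list is the one from the question.

```python
FIELDS = ["element", "description", "nar.*", "title", "identifier"]


def build_text_clause(user_query, fields=FIELDS):
    """Use match_all for a blank query, multi_match otherwise."""
    if user_query is None or not user_query.strip():
        return {"match_all": {}}
    return {"multi_match": {"query": user_query, "fields": list(fields)}}


def build_search_body(user_query, size=10):
    """Assemble the request body with the clause chosen above."""
    return {
        "track_total_hits": True,
        "query": {
            "bool": {
                "should": [build_text_clause(user_query)],
                "filter": [],
            }
        },
        "from": 0,
        "size": size,
    }


print(build_search_body(""))
print(build_search_body("some text"))
```

This keeps the decision on the client side, so the request never carries a multi_match with nothing to match.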

Elasticsearch with nested objects query

I have an index with a nested mapping.
I want to perform a query that will return the following: give me all the documents where each word in the search term appears in one or more of the nested documents.
Here is the index:
properties: {
column_values_index_as_objects: {
type: "nested",
properties: {
value: {
ignore_above: 256,
type: 'keyword',
fields: {
word_middle: {
analyzer: "searchkick_word_middle_index",
type: "text"
},
analyzed: {
term_vector: "with_positions_offsets",
type: "text"
}
}
}
}
}
}
Here is the latest query I tried:
nested: {
path: "column_values_index_as_objects",
query: {
bool: {
must: [
{
match: {
"column_values_index_as_objects.value.analyzed": {
query: search_term,
boost: 10 * boost_factor,
operator: "or",
analyzer: "searchkick_search"
}
}
}
]
}
}
}
For example, if I search for the words 'food and water', I want each word to appear in at least one nested document.
The current search returns the document even if only one of the words exists.
Thanks for the help!
Update:
As Cristoph suggested, the solution works. Now I have the following problem.
Here is my index:
properties: {
name: {
type: "text"
},
column_values_index_as_objects: {
type: "nested",
properties: {
value: {
ignore_above: 256,
type: 'keyword',
fields: {
word_middle: {
analyzer: "searchkick_word_middle_index",
type: "text"
},
analyzed: {
term_vector: "with_positions_offsets",
type: "text"
}
}
}
}
}
}
The query I want to perform is: if I search for 'my name is guy', return all the documents where all the words are found, whether in the nested documents or in the name field.
For example, I could have a document with the value 'guy' in the name field and the other words in the nested documents.
To do this, I usually split the search term into words and generate a request like this (foo:bar is another criterion on another field):
{
"bool": {
"must": [
{
"nested": {
"path": "column_values_index_as_objects",
"query": {
"match": {
"column_values_index_as_objects.value.analyzed": {
"query": "food",
"boost": "10 * boost_factor",
"analyzer": "searchkick_search"
}
}
}
}
},
{
"nested": {
"path": "column_values_index_as_objects",
"query": {
"match": {
"column_values_index_as_objects.value.analyzed": {
"query": "and",
"boost": "10 * boost_factor",
"analyzer": "searchkick_search"
}
}
}
}
},
{
"nested": {
"path": "column_values_index_as_objects",
"query": {
"match": {
"column_values_index_as_objects.value.analyzed": {
"query": "water",
"boost": "10 * boost_factor",
"analyzer": "searchkick_search"
}
}
}
}
},
{
"query": {
"term": {
"foo": "bar"
}
}
}
]
}
}
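Generating one nested clause per word, as in the request above, can be sketched as follows. This is a minimal Python sketch that only builds the query; the function names are made up, boost_factor is assumed to be a plain number, and the optional top_level_field argument is my own guess at how the name-field case from the update could be covered (each word may match either a nested document or that field).

```python
PATH = "column_values_index_as_objects"
FIELD = "column_values_index_as_objects.value.analyzed"


def nested_term_clause(term, boost):
    """One nested match clause for a single word."""
    return {
        "nested": {
            "path": PATH,
            "query": {
                "match": {
                    FIELD: {
                        "query": term,
                        "boost": boost,
                        "analyzer": "searchkick_search",
                    }
                }
            },
        }
    }


def build_all_terms_query(search_term, boost_factor=1.0, top_level_field=None):
    """Require every whitespace-separated word to match somewhere."""
    clauses = []
    for term in search_term.split():
        nested = nested_term_clause(term, 10 * boost_factor)
        if top_level_field is None:
            clauses.append(nested)
        else:
            # The word may be in a nested document OR in the top-level field.
            clauses.append({
                "bool": {
                    "should": [nested, {"match": {top_level_field: term}}],
                    "minimum_should_match": 1,
                }
            })
    return {"bool": {"must": clauses}}


nested_only = build_all_terms_query("food and water", boost_factor=2)
with_name = build_all_terms_query("my name is guy", top_level_field="name")
print(len(nested_only["bool"]["must"]), len(with_name["bool"]["must"]))
```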

elasticsearch: combine text match and array contains

We have documents in elastic saved in the following structure:
{
...
name: "name",
ancestors: ["id1", "id2", "id3"],
...
}
I want to create a search query that searches for name: "some name" AND ancestors contains "id1".
I've tried many queries but none seems to work, or return the desired result. If it's of any help, this combined query should only return one entry every time.
Some of the queries I've tried are the following:
filtered: {
query: {
query_string: {
query: "name:name"
},
term: {
ancestors: "id1"
}
}
}
__
match: {
name: "name",
ancestors: "id1"
},
defaultOperator: 'AND'
__
bool: {
must: { term: { name: "name" }},
filter: {
term: { ancestors: "id1" }
}
}
The mappings are the following:
{
"data": {
"mappings": {
"entry": {
"properties": {
"ancestors": {
"type": "string"
},
"id": {
"type": "string"
},
"name": {
"type": "string"
}
}
}
}
}
}
We haven't changed the default mappings, that's why ancestors is of type string, but I don't think this makes any difference
Try this query:
{
"query": {
"bool": {
"must": [
{
"query_string": {
"default_field": "name",
"query": "stripe AND 1"
}
},
{
"match": {
"ancestors": "id1"
}
}
]
}
}
}
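A variant of the same idea, built from code: this minimal Python sketch swaps the query_string for a match query with operator "and" on name and keeps the match on ancestors. The function name is made up, and it assumes the default analyzed string mappings shown in the question.

```python
def build_name_and_ancestor_query(name, ancestor_id):
    """Both clauses sit in bool.must, so a document has to satisfy both."""
    return {
        "query": {
            "bool": {
                "must": [
                    # operator "and": every word of the name has to be present.
                    {"match": {"name": {"query": name, "operator": "and"}}},
                    # Matches when any element of the ancestors array matches.
                    {"match": {"ancestors": ancestor_id}},
                ]
            }
        }
    }


body = build_name_and_ancestor_query("some name", "id1")
print(body)
```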

Resources