Elasticsearch multi-match not returning all results when providing empty string

I have a total of 1783 records, and I want ES to return all of them when no search text is provided to the multi_match query (searchObject.query = '').
I can get that result by passing an empty array to query.bool.should, so in theory I could build the ES object below differently depending on the value of searchObject.query, but I'm not sure that's a good idea.
{
_source: [
'id',
'event',
'description',
'element',
'date'
],
track_total_hits: true,
query: {
bool: {
should: [{
multi_match:{
query: searchObject.query,
fields: ["element","description","nar.*","title","identifier"]
}
}],
filter: []
}
},
highlight: { fields: { '*': {} } },
sort: [],
from: 0,
size: 10
}
Any suggestions?

You can append a match_all to the should:
{
"_source": [
"id",
"event",
"description",
"element",
"date"
],
"track_total_hits": true,
"query": {
"bool": {
"should": [
{
"multi_match": {
"query": "",
"fields": [
"line",
"element",
"description",
"nar.*",
"title",
"identifier"
]
}
},
{
"match_all": {}
}
],
"filter": []
}
},
"highlight": {
"fields": {
"*": {}
}
},
"sort": [],
"from": 0,
"size": 10
}
That's what it's usually for. That said, IMHO the empty string should be checked before you perform the ES request. I'm assuming it comes from an autocomplete or something similar.
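As a rough illustration of checking the empty string before the request is sent, here is a minimal Python sketch that only builds the request body as a dict. The helper name build_query and the FIELDS constant are made up for this example; the field list is copied from the question.

```python
def build_query(user_query, fields):
    """Return the query clause for the given user input (hypothetical helper)."""
    text = (user_query or "").strip()
    if not text:
        # Nothing to search for: fall back to matching every document.
        return {"match_all": {}}
    return {"multi_match": {"query": text, "fields": fields}}


# Field list taken from the question.
FIELDS = ["element", "description", "nar.*", "title", "identifier"]

body = {
    "track_total_hits": True,
    "query": {"bool": {"must": [build_query("", FIELDS)], "filter": []}},
    "from": 0,
    "size": 10,
}
```

With an empty or whitespace-only input the body contains only match_all, so the multi_match clause never reaches Elasticsearch.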

This is controlled by the zero_terms_query parameter of the match query. Just add it to your multi_match block: "zero_terms_query": "all".
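A small sketch of what that looks like when the body is built in Python. The function name is hypothetical, and whether an empty string is treated as "zero terms" should be verified against your Elasticsearch version.

```python
def multi_match_all_on_empty(user_query, fields):
    """Build a multi_match clause that falls back to all documents (hypothetical helper)."""
    return {
        "multi_match": {
            "query": user_query,
            "fields": fields,
            # The default is "none"; "all" makes a query that analyzes
            # to zero terms behave like match_all.
            "zero_terms_query": "all",
        }
    }


clause = multi_match_all_on_empty("", ["element", "description", "title"])
```

The clause can then be placed in query.bool.should exactly where the original multi_match was.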

Related

ElasticSearch query is not working with only 2 characters

I have a field with this mapping definition
identifierNumber: {
type: "keyword",
fields: { text: { type: "text" } },
},
The values of this field look something like this: 22-001, 22-002, etc.
I am making the following query to Elasticsearch:
{
"query": {
"bool": {
"filter": [
{
"term": {
"status": "NEW"
}
}
],
"must": [
{
"simple_query_string": {
"query": "22 22~",
"fields": [
"title^3",
"identifierNumber^2"
],
"lenient": true
}
}
]
}
},
"sort": []
}
This query returns 0 results.
Changing the simple_query_string query to 22001 or 22-001 returns relevant results.
Can someone explain why the original query with only 2 characters does not work?

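I think you need to add the field "identifierNumber.text" to the simple_query_string clause. identifierNumber itself is a keyword field, so each value is indexed as a single exact term (for example 22-001) and a partial value such as 22 cannot match it. The text sub-field is analyzed, so with the default analyzer 22-001 should be split into the tokens 22 and 001: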
"simple_query_string": {
"query": "22 22~",
"fields": [
"title^3",
"identifierNumber.text"
],
"lenient": true
}

ElasticSearch - Get only matching nested objects with All Top level fields in search response

Let's say I have the following document:
{
id: 1,
name: "xyz",
users: [
{
name: 'abc',
surname: 'def'
},
{
name: 'xyz',
surname: 'wef'
},
{
name: 'defg',
surname: 'pqr'
}
]
}
I want to get only the matching nested objects, together with all top-level fields, in the search response.
That is, if I search/filter for users with name 'abc', I want the response below:
{
id: 1,
name: "xyz",
users: [
{
name: 'abc',
surname: 'def'
}
]
}
How can I do that?
Reference : select matching objects from array in elasticsearch

If you're OK with having all root fields except the nested one, plus only the matching inner hits for the nested field, then we can reuse the previous answer by specifying a slightly more involved source filtering parameter. The inner_hits block is where the magic happens:
{
"_source": {
"includes": [ "*" ],
"excludes": [ "users" ]
},
"query": {
"nested": {
"path": "users",
"inner_hits": {
"_source": [
"name", "surname"
]
},
"query": {
"bool": {
"must": [
{
"term": {
"users.name": "abc"
}
}
]
}
}
}
}
}
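With this request the matching nested objects come back under inner_hits rather than inside _source, so the shape asked for in the question has to be reassembled on the client. A minimal Python sketch, assuming the usual response layout (inner_hits, then the nested path, then hits.hits); the helper name and the sample hit are made up for illustration:

```python
def merge_inner_hits(hit, path="users"):
    """Copy the root _source and attach only the matching nested objects."""
    doc = dict(hit["_source"])
    inner = hit.get("inner_hits", {}).get(path, {}).get("hits", {}).get("hits", [])
    doc[path] = [h["_source"] for h in inner]
    return doc


# Hand-written sample of a single search hit, not real server output.
sample_hit = {
    "_id": "1",
    "_source": {"id": 1, "name": "xyz"},
    "inner_hits": {
        "users": {
            "hits": {
                "hits": [
                    {
                        "_nested": {"field": "users", "offset": 0},
                        "_source": {"name": "abc", "surname": "def"},
                    }
                ]
            }
        }
    },
}

merged = merge_inner_hits(sample_hit)
print(merged)
# {'id': 1, 'name': 'xyz', 'users': [{'name': 'abc', 'surname': 'def'}]}
```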

Maybe late, but I use nested sorting to limit the elements of my nested relation. Here is an example:
"sort": {
"ouverture.periodesOuvertures.dateDebut": {
"order": "asc",
"mode": "min",
"nested_filter": {
"range": {
"ouverture.periodesOuvertures.dateFin": {
"gte": "2017-08-29",
"format": "yyyy-MM-dd"
}
}
},
"nested_path": "ouverture.periodesOuvertures"
}
},
Since ES 5.5 (I think) you can use a filter in a nested query.
Here is an example of the nested query filter I use:
{
"nested": {
"path": "ouverture.periodesOuvertures",
"query": {
"bool": {
"must": [
{
"range": {
"ouverture.periodesOuvertures.dateFin": {
"gte": "2017-08-29",
"format": "yyyy-MM-dd"
}
}
},
{
"range": {
"ouverture.periodesOuvertures.dateFin": {
"lte": "2017-09-30",
"format": "yyyy-MM-dd"
}
}
}
],
"filter": [
{
"range": {
"ouverture.periodesOuvertures.dateFin": {
"gte": "2017-08-29",
"format": "yyyy-MM-dd"
}
}
},
{
"range": {
"ouverture.periodesOuvertures.dateFin": {
"lte": "2017-09-30",
"format": "yyyy-MM-dd"
}
}
}
]
}
}
}
}
Hope this helps ;)
Also, if your ES is not on the latest version (5.5 at the time of writing), inner_hits could slow down your query; see "Including inner hits drastically slows down query results".

https://www.elastic.co/guide/en/elasticsearch/reference/5.6/search-request-inner-hits.html#nested-inner-hits-source
"inner_hits": {
"_source" : false,
"stored_fields" : ["name", "surname"]
}
But you may need to change the mapping so that those fields are stored ("store": true); otherwise you can use
"inner_hits": {}
to get a result that is not quite as clean.

You can make such a request, but the response will have internal fields starting with _
{
"_source": {
"includes": [ "*" ],
"excludes": [ "users" ]
},
"query": {
"nested": {
"path": "users",
"inner_hits": {},
"query": {
"bool": {
"must": [
{ "match": { "users.name": "abc" }}
]
}
}
}
}
}

In one of my projects, I needed to retrieve the unique text of conversation messages (inner fields like messages.text) that have specific tags. So instead of using inner_hits, I used an aggregation like the one below:
final NestedAggregationBuilder aggregation = AggregationBuilders.nested("parentPath", "messages").subAggregation(AggregationBuilders.terms("innerPath").field("messages.tag"));
final NativeSearchQuery searchQuery = new NativeSearchQueryBuilder()
.addAggregation(aggregation).build();
final Aggregations aggregations = elasticsearchOperations.search(searchQuery, Conversation.class).getAggregations();
final ParsedNested parentAgg = (ParsedNested) aggregations.asMap().get("parentPath");
final Aggregations childAgg = parentAgg.getAggregations();
final ParsedStringTerms childParsedNested = (ParsedStringTerms) childAgg.asMap().get("innerPath");
// Here you will get unique expected inner fields in key part.
Map<String, Long> agg = childParsedNested.getBuckets().stream().collect(Collectors.toMap(Bucket::getKeyAsString, Bucket::getDocCount));

I use the following body to get that result (I have set the full path to the values):
{
"_source": {
"includes": [ "*" ],
"excludes": [ "users" ]
},
"query": {
"nested": {
"path": "users",
"inner_hits": {
"_source": [
"users.name", "users.surname"
]
},
"query": {
"bool": {
"must": [
{
"term": {
"users.name": "abc"
}
}
]
}
}
}
}
}
There is also another way:
{
"_source": {
"includes": [ "*" ],
"excludes": [ "users" ]
},
"query": {
"nested": {
"path": "users",
"inner_hits": {
"_source": false,
"docvalue_fields": [
"users.name", "users.surname"
]
},
"query": {
"bool": {
"must": [
{
"term": {
"users.name": "abc"
}
}
]
}
}
}
}
}
See the results in the inner_hits section of each result hit.
https://www.elastic.co/guide/en/elasticsearch/reference/7.15/inner-hits.html#nested-inner-hits-source

ElasticSearch: Partial/Exact Scoring with edge_ngram & fuzziness

In ElasticSearch I am trying to get correct scoring using edge_ngram with fuzziness. I would like exact matches to have the highest score and sub matches have lesser scores. Below is my setup and scoring results.
settings: {
number_of_shards: 1,
analysis: {
filter: {
ngram_filter: {
type: 'edge_ngram',
min_gram: 2,
max_gram: 20
}
},
analyzer: {
ngram_analyzer: {
type: 'custom',
tokenizer: 'standard',
filter: [
'lowercase',
'ngram_filter'
]
}
}
}
},
mappings: [{
name: 'voter',
_all: {
'type': 'string',
'index_analyzer': 'ngram_analyzer',
'search_analyzer': 'standard'
},
properties: {
last: {
type: 'string',
required : true,
include_in_all: true,
term_vector: 'yes',
index_analyzer: 'ngram_analyzer',
search_analyzer: 'standard'
},
first: {
type: 'string',
required : true,
include_in_all: true,
term_vector: 'yes',
index_analyzer: 'ngram_analyzer',
search_analyzer: 'standard'
},
}
}]
After indexing a document with the first name "Michael", I run the query below with the variations "Michael", "Michae", "Micha", "Mich", "Mic", and "Mi".
GET voter/voter/_search
{
"query": {
"match": {
"_all": {
"query": "Michael",
"fuzziness": 2,
"prefix_length": 1
}
}
}
}
My score results are:
-"Michael": 0.19535106
-"Michae": 0.2242768
-"Micha": 0.24513611
-"Mich": 0.22340237
-"Mic": 0.21408978
-"Mi": 0.15438235
As you can see, the scores aren't what I expected. I would like "Michael" to have the highest score and "Mi" the lowest.
Any help would be appreciated!

One way to approach this problem would be to add a raw version of the text to your mapping, like this:
last: {
type: 'string',
required : true,
include_in_all: true,
term_vector: 'yes',
index_analyzer: 'ngram_analyzer',
search_analyzer: 'standard',
"fields": {
"raw": {
"type": "string" <--- index with standard analyzer
}
}
},
first: {
type: 'string',
required : true,
include_in_all: true,
term_vector: 'yes',
index_analyzer: 'ngram_analyzer',
search_analyzer: 'standard',
"fields": {
"raw": {
"type": "string" <--- index with standard analyzer
}
}
},
You could also make it exact with index: not_analyzed.
Then you can query like this:
{
"query": {
"bool": {
"should": [
{
"match": {
"_all": {
"query": "Michael",
"fuzziness": 2,
"prefix_length": 1
}
}
},
{
"match": {
"last.raw": {
"query": "Michael",
"boost": 5
}
}
},
{
"match": {
"first.raw": {
"query": "Michael",
"boost": 5
}
}
}
]
}
}
}
Documents that match more clauses will be scored higher.
You could specify boost according to your requirements.

Elasticsearch how to use multi_match with wildcard

I have a User object with properties Name and Surname. I want to search these fields using one query, and I found multi_match in the documentation, but I don't know how to properly use that with a wildcard. Is it possible?
I tried with a multi_match query but it didn't work:
{
"query": {
"multi_match": {
"query": "*mar*",
"fields": [
"user.name",
"user.surname"
]
}
}
}

Alternatively you could use a query_string query with wildcards.
"query": {
"query_string": {
"query": "*mar*",
"fields": ["user.name", "user.surname"]
}
}
This will be slower than using an nGram filter at index-time (see my other answer), but if you are looking for a quick and dirty solution...
Also I am not sure about your mapping, but if you are using user.name instead of name your mapping needs to look like this:
"your_type_name_here": {
"properties": {
"user": {
"type": "object",
"properties": {
"name": {
"type": "string"
},
"surname": {
"type": "string"
}
}
}
}
}

Such a query worked for me:
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"bool": {
"should": [
{"query": {"wildcard": {"user.name": {"value": "*mar*"}}}},
{"query": {"wildcard": {"user.surname": {"value": "*mar*"}}}}
]
}
}
}
}
}
Similar to what you are doing, except that in my case there could be different masks for different fields.
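The filtered query used above was removed in Elasticsearch 5.0, so on newer versions the same idea would have to be written as a plain bool query. A minimal Python sketch that builds such a body, with one mask per field; the helper name wildcard_any is made up for this example:

```python
def wildcard_any(masks):
    """Match documents where at least one field matches its own wildcard mask.

    masks maps a field name to a wildcard pattern (hypothetical helper).
    """
    return {
        "bool": {
            "should": [
                {"wildcard": {field: {"value": pattern}}}
                for field, pattern in masks.items()
            ],
            # Require at least one of the wildcard clauses to match.
            "minimum_should_match": 1,
        }
    }


body = {"query": wildcard_any({"user.name": "*mar*", "user.surname": "*mar*"})}
```

Because each field carries its own pattern, different masks per field only require different values in the dict.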

I just did this now:
GET _search
{
"query": {
"bool": {
"must": [
{
"range": {
"theDate": {
"gte": "2014-01-01",
"lte": "2014-12-31"
}
}
},
{
"match" : {
"Country": "USA"
}
}
],
"should": [
{
"wildcard" : { "Id_A" : "0*" }
},
{
"wildcard" : { "Id_B" : "0*" }
}
],
"minimum_number_should_match": 1
}
}
}

Similar to suggestion above, but this is simple and worked for me:
{
"query": {
"bool": {
"must":
[
{
"wildcard" : { "processname.keyword" : "*system*" }
},
{
"wildcard" : { "username" : "*admin*" }
},
{
"wildcard" : { "device_name" : "*10*" }
}
]
}
}
}

I would not use wildcards; they will not scale well. You are asking a lot of the search engine at query time. You can use the nGram filter to do the processing at index time rather than at search time.
See this discussion on the nGram filter.
After indexing the name and surname correctly (change your mapping, there are examples in the above link) you can use multi_match without wildcards and get the expected results.
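To make the index-time idea concrete, here is a plain Python illustration of what an n-gram filter produces for one token. This is not Elasticsearch code, and the gram sizes are arbitrary assumptions; the order of grams emitted by the real filter may differ.

```python
def ngrams(token, min_gram, max_gram):
    """Return every substring of token whose length is between min_gram and max_gram."""
    grams = []
    for size in range(min_gram, max_gram + 1):
        for start in range(len(token) - size + 1):
            grams.append(token[start:start + size])
    return grams


# Terms that would be indexed for the name "martin" with 3- and 4-grams.
index_terms = set(ngrams("martin", 3, 4))

# A plain, non-wildcard lookup for "mar" now hits an indexed term directly.
print("mar" in index_terms)
# True
```

The cost of finding substrings is paid once when the document is indexed, which is why a regular match or multi_match query can replace *mar* at search time.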

description: {
type: 'keyword',
normalizer: 'useLowercase',
},
product: {
type: 'object',
properties: {
name: {
type: 'keyword',
normalizer: 'useLowercase',
},
},
},
activity: {
type: 'object',
properties: {
name: {
type: 'keyword',
normalizer: 'useLowercase',
},
},
},
query:
query: {
bool: {
must: [
{
bool: {
should: [
{
wildcard: {
description: {
value: `*${value ? value : ''}*`,
boost: 1.0,
rewrite: 'constant_score',
},
},
},
{
wildcard: {
'product.name': {
value: `*${value ? value : ''}*`,
boost: 1.0,
rewrite: 'constant_score',
},
},
},
{
wildcard: {
'activity.name': {
value: `*${value ? value : ''}*`,
boost: 1.0,
rewrite: 'constant_score',
},
},
},
],
},
},
{
match: {
recordStatus: RecordStatus.Active,
},
},
{
bool: {
must_not: [
{
term: {
'user.id': req.currentUser?.id,
},
},
],
},
},
{
bool: {
should: tags
? tags.map((name: string) => {
return {
nested: {
path: 'tags',
query: {
match: {
'tags.name': name,
},
},
},
};
})
: [],
},
},
],
filter: {
bool: {
must_not: {
terms: {
id: existingIds ? existingIds : [],
},
},
},
},
},
},
sort: [
{
updatedAt: {
order: 'desc',
},
},
],

ElasticSearch filter boosting based on field value

I have the following query:
{
"from": 0,
"query": {
"custom_filters_score": {
"filters": [
{
"boost": 1.5,
"filter": {
"term": {
"format": "test1"
}
}
},
{
"boost": 1.5,
"filter": {
"term": {
"format": "test2"
}
}
}
],
"query": {
"bool": {
"must": {
"query_string": {
"analyzer": "query_default",
"fields": [
"title^5",
"description^2",
"indexable_content"
],
"query": "blah"
}
},
"should": []
}
}
}
},
"size": 50
}
Which should be boosting things that have {"format":"test1"} in them, if I'm reading the documentation correctly.
However, using explain tells me that "custom score, no filter match, product of:" is the outcome, and the score of the returned documents that match the filter isn't changed by this filter.
What am I doing wrong?
Edit: here's the schema:
mapping:
edition:
_all: { enabled: true }
properties:
title: { type: string, index: analyzed }
description: { type: string, index: analyzed }
format: { type: string, index: not_analyzed, include_in_all: false }
section: { type: string, index: not_analyzed, include_in_all: false }
subsection: { type: string, index: not_analyzed, include_in_all: false }
subsubsection: { type: string, index: not_analyzed, include_in_all: false }
link: { type: string, index: not_analyzed, include_in_all: false }
indexable_content: { type: string, index: analyzed }
and let's assume a typical document is like:
{
"format": "test1",
"title": "blah",
"description": "blah",
"indexable_content": "keywords here",
"section": "section",
"subsection": "child-section",
"link":"/section/child-section/blah"
}

If it says "no filter match", it means that no filter in your query matched. The most likely reason is that the records matching your query don't contain the term "test1". Unfortunately, you didn't provide the mapping and test data, so it's difficult to tell for sure what's going on.
Try running this query to see if you can actually find any records that match your search criteria and should be boosted:
{
"from": 0,
"query": {
"bool": {
"must": [{
"query_string": {
"analyzer": "query_default",
"fields": ["title^5", "description^2", "indexable_content"],
"query": "blah"
}
}, {
"term": {
"format": "test1"
}
}]
}
},
"size": 50
}
Your query looks fine and based on the provided information, it should work: https://gist.github.com/4448954
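Note that custom_filters_score only exists in old Elasticsearch releases; later versions replaced it with function_score. As a hedged sketch of how the same per-format boost might be expressed there, the Python helper below builds the body as a dict. The helper name boost_formats is made up, and the score_mode choice is an assumption about the desired behaviour.

```python
def boost_formats(base_query, formats, weight=1.5):
    """Wrap base_query so documents with one of the given formats get a weight."""
    return {
        "function_score": {
            "query": base_query,
            "functions": [
                {"filter": {"term": {"format": fmt}}, "weight": weight}
                for fmt in formats
            ],
            # Use the weight of the first function whose filter matches.
            "score_mode": "first",
            # Multiply the original query score by that weight.
            "boost_mode": "multiply",
        }
    }


base = {
    "query_string": {
        "fields": ["title^5", "description^2", "indexable_content"],
        "query": "blah",
    }
}
body = {"from": 0, "size": 50, "query": boost_formats(base, ["test1", "test2"])}
```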
