Calculate counts of hits of several subqueries inside one query to Elasticsearch - elasticsearch

I have 3 fields in a document that I need to match. I'd like to identify which of those 3 fields have any matches.
More specifically, I'd like to find out whether a given wildcard query matches only one field across the document set or matches several fields. If the wildcard query matches only, say, field1, then I can conclude that it applies to field1 alone. If it matches two or three fields, I cannot draw that conclusion, and I'll wait for the user to enter more characters to narrow the search.
I've written the following query that matches all 3 fields:
{
"query": {
"bool": {
"should": [
{"wildcard": { "field1": "*R*" }},
{"wildcard": { "field2": "*R*" }},
{"wildcard": { "field3": "*R*" }}
]
}
},
"size": 0
}
It returns the total count of all documents that match on any of those fields. Now I'd like to know whether it's possible to receive 3 separate counts, one per subquery. This can be achieved by sending 3 separate requests, but I'd like to minimize the number of requests to Elasticsearch.
I've tried bool and dis_max queries but could not find a solution.
UPDATE
Using named queries I've built the following query:
{
"query": {
"bool": {
"should": [
{"wildcard": { "field1": { "value": "*R*", "_name": "query1" }}},
{"wildcard": { "field2": { "value": "*R*", "_name": "query2" }}},
{"wildcard": { "field3": { "value": "*R*", "_name": "query3" }}}
]
}
},
"size": 1
}
This query returns a single result with the best score. By default, the score is higher when more fields are matched in the same document. So if the found document was matched by two or three fields, it already answers my initial question. However, if the found document was matched by a single field, say, field1, it does not guarantee there are no other documents that are matched by field2 or field3, so it's still not a solution.
Do I have to send 3 requests to run searches over each field separately to solve my problem?
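One way to get a separate count per subquery in a single request is a filters aggregation with one named filter per field. The sketch below is not from the original post: the helper names and the aggregation name per_field are made up for illustration, and it only builds the request body and reads the response, so sending it to the cluster is left to whatever client you use.

```python
def build_per_field_count_request(pattern, fields):
    """Build one search body that counts wildcard matches per field.

    The aggregation name "per_field" is an arbitrary label chosen for this sketch.
    """
    per_field = {f: {"wildcard": {f: {"value": pattern}}} for f in fields}
    return {
        "size": 0,  # only the counts are needed, not the hits themselves
        "query": {"bool": {"should": list(per_field.values())}},
        # A keyed filters aggregation returns one bucket (with doc_count) per name.
        "aggs": {"per_field": {"filters": {"filters": per_field}}},
    }


def fields_with_matches(response):
    """Return the names of the fields whose bucket has at least one document."""
    buckets = response["aggregations"]["per_field"]["buckets"]
    return sorted(name for name, b in buckets.items() if b["doc_count"] > 0)
```

If fields_with_matches returns exactly one name, the wildcard applies to that field only; otherwise more input is needed from the user.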

Related

Elasticsearch collapse not working with search_after with single sort field and PIT

I have an Elastic query that initially returns results. When I attempt the query again using search_after for paging, I am getting the error: Cannot use [collapse] in conjunction with [search_after] unless the search is sorted on the same field. Multiple sort fields are not allowed. So far as I can tell, I am sorting and collapsing using just a single field per_id. Is my query structured incorrectly or is there something else I need to do to get this query to run?
GET /_search
{
"query": {
"bool": {
"must": [{
"term": {
"pform": "iphone"
}
}]
}
},
"collapse": {
"field": "per_id"
},
"pit": {
"id": "g-ABCDDEFG12345678ABCDDEFG12345678==",
"keep_alive": "5m"
},
"sort": [
{"per_id": "asc"}
],
"search_after" : [
"ABCDDEFG12345678",
123456
]
}
I needed to exclude the tiebreaker from my search_after. It shouldn't cause duplicates because I am using a PIT and sorting on the collapse field, meaning duplicates shouldn't exist in my result set.
"search_after" : [
"ABCDDEFG12345678"
]
So I needed to remove the tiebreaker returned with the previous result before passing its sort values into the next request.
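As a minimal sketch of that step, assuming the client hands back each hit as a dict with a "sort" list (the helper names here are hypothetical, not part of any client library), the tiebreaker can be dropped like this:

```python
def next_search_after(last_hit, keep=1):
    """Keep only the first `keep` sort values of the last hit.

    With a PIT, the response carries an extra tiebreaker value after the
    requested sort field; collapse only accepts the single collapse field.
    """
    return list(last_hit["sort"][:keep])


def next_page_body(previous_body, last_hit):
    """Copy the previous request body and set search_after for the next page."""
    body = dict(previous_body)  # shallow copy; the original request is left untouched
    body["search_after"] = next_search_after(last_hit)
    return body
```

For a last hit whose sort values are ["ABCDDEFG12345678", 123456], the next request would carry "search_after": ["ABCDDEFG12345678"].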

Elasticsearch: how to write bool query that will contain multiple conditions on the same token?

I have a field with a tokenizer that splits on dots.
On search, the value aaa.bbb will be split into two terms, aaa and bbb.
My question is how to write a bool query that applies multiple conditions to the same term.
For example, I want to get all docs whose field contains a term that matches a fuzzy search for gmail, where that same term must not contain gamil.
Here are some examples of what I want to achieve:
bmail // MATCH: it matches the fuzzy search and is not gamil
gamil.bmail // MATCH: the term bmail matches the fuzzy search and is not gamil
gamil // NO MATCH: it matches the fuzzy search but equals gamil
NOTE: the following query does NOT appear to work. It looks as if a document is considered a hit when one term matches one condition and a different term matches the other.
{
...
"body": {
"query": {
"bool": {
"must": [
{
"fuzzy": {
"my_field": {
"value": "gmail",
"fuzziness": 1,
"max_expansions": 2100000000
}
}
},
{
"bool": {
"must_not": [
{
"query_string": {
"default_field": "my_field",
"query": "*gamil*",
"analyzer": "keyword"
}
}
]
}
}
]
}
}
},
}
I ended up using highlighting: I execute the fuzzy (or any other) query and then programmatically filter the results by the returned highlight object.
Span queries might also be a good option if you don't need regular expressions, or if you can make sure you don't exceed the boolean query limit.
(see more details in the provided link)
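The highlight-based filtering could look roughly like the sketch below. It assumes the default highlight tags (<em> and </em>) and that each highlighted span corresponds to one matched term; the function names and the sample hits are made up for illustration and are not taken from the original answer.

```python
import re

# Default Elasticsearch highlight tags; adjust if pre_tags/post_tags are customised.
EM = re.compile(r"<em>(.*?)</em>")


def highlighted_terms(hit, field):
    """Collect the terms the query actually matched, as marked by the highlighter."""
    fragments = hit.get("highlight", {}).get(field, [])
    return [term for frag in fragments for term in EM.findall(frag)]


def keep_hit(hit, field, banned):
    """Keep a hit if at least one matched term does not contain the banned text."""
    return any(banned not in term.lower() for term in highlighted_terms(hit, field))


# Hand-written sample hits, shaped like search responses with highlighting enabled.
hits = [
    {"_id": "1", "highlight": {"my_field": ["<em>bmail</em>"]}},
    {"_id": "2", "highlight": {"my_field": ["<em>gamil</em>.<em>bmail</em>"]}},
    {"_id": "3", "highlight": {"my_field": ["<em>gamil</em>"]}},
]
kept_ids = [h["_id"] for h in hits if keep_hit(h, "my_field", "gamil")]
```

The filtering happens client-side, so paging and total counts from Elasticsearch will not reflect the hits that were dropped.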

How to perform search query on two different data types?

My query is very simple. To make it even simpler, let's say I only search on two fields, name (text) and age (long):
GET person_db/person/_search
{
"query": {
"bool": {
"should": [
{
"match_phrase_prefix": {
"name": "hank"
}
},
{
"match_phrase_prefix": {
"age": "hank"
}
}
],
"minimum_should_match": 1,
"boost": 1.0
}
}
}
If I search for "23", there is no problem: Elasticsearch knows how to convert it to a number and the query doesn't fail. But if the search input is "john", I get error 400 with "reason": "failed to create query: {\n \"bool\....".
What should I do in this case?
I thought of converting the numeric values to strings before inserting them into Elasticsearch, but I'm trying to avoid that; I think Elasticsearch should have a way to support this.
I'd appreciate any help.
This query works: (thanks to #jmlw)
{
"query": {
"bool": {
"should": [
{
"multi_match": {
"query": "alt",
"type": "phrase_prefix",
"fields": [
"name",
"taxid",
"providers.providerAddress.street"
],
"lenient": true
}
}
],
"minimum_should_match": 1,
"boost": 1.0
}
}
}
Without details of your documents, or your mappings, my first guess is that the age field is interpreted as a numeric field by Elasticsearch. Passing in anything other than a 'number' type, or something that can be converted into a number will cause the query to fail, with some exception reporting a failure to convert your string into a number.
With that said, you may try adding lenient: true to your match_phrase_prefix search term, which will allow Elasticsearch to ignore failures to convert to a numeric type and drop that term from the search.
Another approach is to only allow users to query multiple fields of the same type, or to have them specify which data they'd like to query in which field. For example, a user who wants to find people aged 23 named John would enter the age and the name separately, instead of typing 23 John or similar.
Otherwise, you may need to pre-process the query string, and split search terms and pass them into search clauses individually with lenient: true to attempt searching multiple terms in multiple fields with different data types.
You could also try using a different search type, like a multi_match, query_string, or simple_query_string as these will likely have more flexibility for what you are wanting to do.
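The pre-processing idea can be sketched as follows. This is one possible reading of the suggestion, not the answerer's code: the function name is hypothetical, and instead of relying on lenient it only sends a token to the numeric fields when the token parses as an integer.

```python
def build_mixed_type_query(user_input, text_fields, numeric_fields):
    """Route each search token to the fields whose type it can match."""
    should = []
    for token in user_input.split():
        # Every token is searched as a phrase prefix in the text fields.
        should.append({
            "multi_match": {
                "query": token,
                "type": "phrase_prefix",
                "fields": list(text_fields),
            }
        })
        try:
            number = int(token)
        except ValueError:
            continue  # not numeric: never sent to the numeric fields
        for field in numeric_fields:
            should.append({"term": {field: number}})
    return {"query": {"bool": {"should": should, "minimum_should_match": 1}}}
```

With text_fields=["name"] and numeric_fields=["age"], the input "hank 23" produces a prefix clause for each token on name plus a single term clause for age: 23, while "hank" alone never touches the age field.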

ElasticSearch Ignoring words having one single letter

I'm a beginner with Elasticsearch. I have an application that uses Elasticsearch to look up ingredients in a given food or fruit.
I'm facing a scoring problem when the user types, for example, "Vitamine d".
Elasticsearch gives the best score to the phrase "vitamine", even though the phrase "Vitamine D" exists and should normally have the highest score.
It seems that if the second word ("d" in my case) is just one letter, Elasticsearch ignores it.
I tried another example, "vitamine b12", and got the correct score.
Here is the query that the application sends to the server:
{
"from": 0,
"size": 5,
"query": {
"bool": {
"must": [
{
"match": {
"constNomFr": {
"query": "vitamine d"
}
}
}
],
"should": [
{
"prefix": {
"constNomFr": {
"value": "vitamine d",
"boost": 2
}
}
}
]
}
},
"_source": {
"excludes": [
"alimentDtos"
]
}
}
What could I modify to make it work?
Thank you so much.
If you can identify your ingredients, I recommend indexing them in a separate field, "ingredients", with its type set to keyword. This way you can use a term filter, and you can even run aggregations.
You may already have your documents indexed that way; in that case, if you are using the default mapping, just run your query against your_field_name.keyword.
If you don't have your ingredients indexed as an array, then you should take a look at the Elasticsearch analyzers to choose or build the right one.
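A minimal sketch of querying the keyword sub-field, under the assumption that constNomFr was indexed with the default dynamic mapping and therefore has a constNomFr.keyword sub-field (the original post does not show the mapping, and the function name is made up here):

```python
def build_ingredient_query(text, field="constNomFr"):
    """Keep the analysed match and boost documents whose whole value equals the input."""
    return {
        "size": 5,
        "query": {
            "bool": {
                "must": [{"match": {field: {"query": text}}}],
                # term on a keyword field is an exact, case-sensitive comparison
                "should": [
                    {"term": {field + ".keyword": {"value": text, "boost": 2}}}
                ],
            }
        },
    }
```

Because the term clause is case-sensitive, "vitamine d" would not hit a stored "Vitamine D" unless the input is normalised first or the keyword field is mapped with a lowercase normalizer.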

query_string vs group match in elasticsearch

What is the difference between this query:
"query": {
"bool": {
...
"should": [
{
"match": {
"description": {
"query": "test"
}
}
},
{
"match": {
"address": {
"query": "test"
}
}
},
{
"match": {
"country": {
"query": "test"
}
}
},
{
"match": {
"city": {
"query": "test"
}
}
}
]
}}
and that one:
"query": {
"bool": {
...
"should": [
{
"query_string": {
"query": "test",
"fields": [
"description",
"address",
"country",
"city"
]
}
}
]
}}
Performance, relevance?
Thanks in advance!
The query is analyzed depending on the field analyzer (unless you specify the analyzer in the query itself), thus querying multiple fields with a single query doesn't necessarily mean analyzing the query only once.
Keep in mind that query_string supports the Lucene query syntax: AND and OR operators, querying on specific fields, wildcards, phrase queries, etc. It therefore needs to be parsed, which I don't think makes much difference here in terms of performance, but it is error-prone: malformed input can make the query fail. If you don't need all that power, stick to the match query, and if you want to perform the same query on multiple fields, have a look at the multi_match query, which does what you did with your query_string but translates internally to multiple match queries.
Also, the scores returned if you compare the output of multiple match queries and your query_string might be quite different. Using a bool query you effectively build a lucene boolean query, while the query_string uses by default "use_dis_max":"true", which means it uses internally a dis_max query by default. Same happens using the multi_match query. If you set use_dis_max to false a bool query is going to be used internally instead.
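For comparison, the multi_match form of the same search could be built as in the sketch below. The function name is hypothetical; the two type values are the standard ones, where best_fields (the default) scores via dis_max and most_fields combines the scores of all matching fields, which is closer to the bool/should version.

```python
def build_multi_field_query(text, fields, combine="best_fields"):
    """Build a multi_match query over several fields.

    combine="best_fields": score comes from the best matching field (dis_max).
    combine="most_fields": scores of all matching fields are combined.
    """
    if combine not in ("best_fields", "most_fields"):
        raise ValueError("unsupported type for this sketch: " + combine)
    return {
        "query": {
            "multi_match": {
                "query": text,
                "fields": list(fields),
                "type": combine,
            }
        }
    }
```

Running both variants against the same index and comparing the returned _score values is a quick way to see how much the scoring mode matters for your data.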
In terms of performance, I would say that the second query will have a performance benefit, because the first query requires the query string to be analyzed for each of the four match sections, while in the second there is only one query string that needs to be analyzed.
Apart from that, there are some comparisons done over here that you can look at.
I am not quite sure about the relevance differences, but you can always run both queries and see whether the results differ in relevance.

Resources