Terms query not returning results for list of strings - elasticsearch

I have this Elastic query which fails to return the desired results for terms.letter_score. I'm certain there are matches available in the index. The same query without the letter_score clause returns the expected filtered results, but with letter_score it returns nothing. The only difference I can see is that the cat_id values are a list of integers rather than strings. Any idea what the issue could be here? I'm basically trying to get it to match ANY value from the letter_score list.
Thanks
{
  "size": 10,
  "query": {
    "bool": {
      "filter": [
        {
          "terms": {
            "cat_id": [
              1,
              2,
              4
            ]
          }
        },
        {
          "terms": {
            "letter_score": [
              "A",
              "B",
              "E"
            ]
          }
        }
      ]
    }
  }
}

It sounds like your letter_score field is of type text and has therefore been analyzed, so the tokens A, B and E have been indexed as a, b and e, and the terms query (which looks for exact tokens) won't match them.
Also, if your analyzer removes stop words (the standard analyzer doesn't by default, but language analyzers such as english do), the token a may well have been dropped at indexing time because a is a stop word.
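You can check what was actually indexed with the _analyze API (a quick sanity check; the index name is a placeholder and the field parameter assumes your existing mapping):
GET your-index/_analyze
{
  "field": "letter_score",
  "text": "A B E"
}
If letter_score is a text field with the default analyzer, this returns the lowercase tokens a, b and e, none of which equal the uppercase terms A, B and E you are searching for.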
A first approach is to use a match query instead of terms, like this:
{
  "match": {
    "letter_score": "A B E"
  }
}
If that still doesn't work, I suggest changing the mapping of your letter_score field to keyword (which requires reindexing your data); your query will then work as it is now.
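For the keyword route, here is a minimal mapping sketch (the index name is a placeholder, cat_id's type is assumed from the question, and you'd need to reindex your data into it):
PUT your-index
{
  "mappings": {
    "properties": {
      "letter_score": { "type": "keyword" },
      "cat_id": { "type": "integer" }
    }
  }
}
With letter_score stored as an unanalyzed keyword, the original terms filter on A, B and E matches as expected.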

Related

Terms Set Query's minimum_should_match_field does not behave as expected when the provided field has value zero

I am wondering why, in a "terms set" query, when the field specified by minimum_should_match_field has the value 0, it behaves as if it had the value 1.
To replicate the problem, I take the example from the Elasticsearch docs and go through the three steps below.
Step 1:
Create a new index
PUT /job-candidates
{
  "mappings": {
    "properties": {
      "name": {
        "type": "keyword"
      },
      "programming_languages": {
        "type": "keyword"
      },
      "required_matches": {
        "type": "long"
      }
    }
  }
}
Step 2:
Create two docs with required_matches set to zero
PUT /job-candidates/_doc/1?refresh
{
  "name": "Jane",
  "programming_languages": [ "c++", "java" ],
  "required_matches": 0
}
and also
PUT /job-candidates/_doc/2?refresh
{
  "name": "Ben",
  "programming_languages": [ "python" ],
  "required_matches": 0
}
Step 3:
Search for docs with the following search
GET /job-candidates/_search
{
  "query": {
    "terms_set": {
      "programming_languages": {
        "terms": [ "c++", "java" ],
        "minimum_should_match_field": "required_matches"
      }
    }
  }
}
Expected results: I expect step 3 to return both docs, "Jane" and "Ben".
Actual results: it only returns the "Jane" doc.
I don't understand. If minimum_should_match is 0, doesn't that mean a returned doc does not need to match any of the terms, so the "Ben" doc should also be returned?
Some links I found that still don't answer my question:
minimum_should_match
It looks like minimum_should_match can't be zero, but it does not say how the search works if it is indeed zero or greater than the number of optional clauses.
A discussion of the default value for minimum_should_match
But they didn't discuss the "terms set" query in particular.
Any clarification will be appreciated! Thanks.
When looking at the terms_set source code, we can see that the underlying Lucene query being used is called CoveringQuery.
So the explanation can be found in Lucene's source code of CoveringQuery, whose documentation says
Per-document long value that records how many queries should match. Values that are less than 1 are treated like 1: only documents that have at least one matching clause will be considered matches. Documents that do not have a value for minimumNumberMatch do not match.
And a little further, the code that sets minimumNumberMatch is pretty self-explanatory:
final long minimumNumberMatch = Math.max(1, minMatchValues.longValue());
We can sum it up by saying that it doesn't really make sense to send a terms_set query with minimum_should_match: 0, as that would be equivalent to a match_all query.
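If you do need documents with required_matches: 0 (like "Ben") to come back anyway, one possible workaround, sketched here and not part of the original answer, is to OR the terms_set query with a term query on required_matches: 0 inside a bool should:
GET /job-candidates/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "terms_set": {
            "programming_languages": {
              "terms": [ "c++", "java" ],
              "minimum_should_match_field": "required_matches"
            }
          }
        },
        { "term": { "required_matches": 0 } }
      ],
      "minimum_should_match": 1
    }
  }
}
Documents that require zero matches are then caught by the second clause, while everything else still goes through the terms_set logic.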

Elasticsearch - Edit distance using fuzzy is inaccurate

I am using ES 5.5 and my requirement is to allow up to two edits while matching a field.
In ES I have the value 124456788 and the query comes in as 123456789:
"fuzzy": {
"idkey": {
"value": **"123456789"**,
"fuzziness": "20"
}
}
To my knowledge the edit distance between these two numbers is 2, but it does not match even with the fuzziness property set to 20.
I did an explain API call and here is what I am seeing:
"description": "no match on required clause (((idkey:012345789)^0.7777778 (idkey:012346789)^0.7777778 (idkey:013456789)^0.7777778 (idkey:023456789)^0.8888889 (idkey:102345678)^0.7777778 (idkey:112345678)^0.7777778 (idkey:113456789)^0.8888889 (idkey:120456589)^0.7777778 (idkey:121345678)^0.7777778 (idkey:122345678)^0.7777778 (idkey:122345679)^0.7777778 (idkey:122456789)^0.8888889 (idkey:123006789)^0.7777778 (idkey:123045678)^0.7777778 (idkey:123096789)^0.7777778 (idkey:123106789)^0.7777778 (idkey:123145678)^0.7777778 (idkey:123146789)^0.7777778 (idkey:123226789)^0.7777778 (idkey:123256789)^0.8888889 (idkey:123345678)^0.7777778 (idkey:123345689)^0.7777778 (idkey:123346789)^0.7777778 (idkey:123406784)^0.7777778 (idkey:123415678)^0.7777778 (idkey:123435678)^0.7777778 (idkey:123446789)^0.8888889 (idkey:123453789)^0.8888889 (idkey:123454789)^0.8888889 (idkey:123455789)^0.8888889 (idkey:123456289)^0.8888889 (idkey:123456489)^0.8888889 (idkey:123456709)^0.8888889 (idkey:123456779)^0.8888889 (idkey:123456780)^0.8888889 (idkey:123456781)^0.8888889 (idkey:123456783)^0.8888889 (idkey:123456785)^0.8888889 (idkey:123456786)^0.8888889 (idkey:123456787)^0.8888889 (idkey:123456889)^0.8888889 (idkey:123457789)^0.8888889 (idkey:123466789)^0.8888889 (idkey:123496789)^0.8888889 (idkey:123556789)^0.8888889 (idkey:126456789)^0.8888889 (idkey:223456789)^0.8888889 (idkey:423456789)^0.8888889 (idkey:623456789)^0.8888889 (idkey:723456789)^0.8888889)^5.0)",
The value I am expecting to match is 124456788, but the ES query is internally not including it as one of the candidate terms of the fuzzy query.
Do I need to use a different ES method to make this work?
This is a simple indexing and search.
PUT /myIndex/type1/1
{
  "key": "123456789",
  "name": "test"
}
GET /myIndex/_search
{
  "query": {
    "fuzzy": {
      "key": {
        "value": "124456799",
        "fuzziness": 2
      }
    }
  }
}
It always matches with the given key; fuzziness values of 2 or greater are fine.
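One detail worth checking (an observation, not something confirmed in the original answer): the explain output above lists exactly 50 candidate terms, which matches the fuzzy query's default max_expansions of 50. Terms beyond that cut-off never make it into the rewritten query, so an indexed term such as 124456788 can be skipped when many other terms also fall within the edit-distance budget. Raising max_expansions is one thing to try:
GET /myIndex/_search
{
  "query": {
    "fuzzy": {
      "idkey": {
        "value": "123456789",
        "fuzziness": 2,
        "max_expansions": 500
      }
    }
  }
}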

How to use multifield search in Elasticsearch combining should and must clauses

This may be a repeated question but I'm not finding a good solution.
I'm trying to search Elasticsearch in order to get documents that contain:
- "event":"myevent1"
- "event":"myevent2"
- "event":"myevent3"
A document does not have to contain all of them, but the result should only contain documents with those types of events.
And this is simple, because Elasticsearch helps me with the should clause,
which returns exactly what I want.
But then I want all the documents to satisfy another condition: the field result.example.example = 200 must be in every single document, PLUS the document should have one of the previously described "event" values.
So, for example, one document has "event":"myevent1" and result.example.example = 200, another one has "event":"myevent2" and result.example.example = 200, etc.
I've tried this configuration:
{
  "query": {
    "bool": {
      "must": { "match": { "operation.result.http_status": 200 } },
      "should": [
        {
          "match": {
            "event": "bank.account.patch"
          }
        },
        {
          "match": {
            "event": "bank.account.add"
          }
        },
        {
          "match": {
            "event": "bank.user.patch"
          }
        }
      ]
    }
  }
}
but it is not working, because I also get documents that do not contain any of the should events.
Hope I explained it well.
Thanks in advance!
As is, your query tells ES to look for documents that must have "operation.result.http_status":200 and to boost those that have a matching event type.
You're looking to combine two must queries:
- one that matches one of your event types,
- one for your other condition.
The event clause accepts multiple values and those values are exact matches: you're looking for a terms query.
Try
{
  "query": {
    "bool": {
      "must": [
        { "match": { "operation.result.http_status": 200 } },
        {
          "terms": {
            "event": [
              "bank.account.patch",
              "bank.account.add",
              "bank.user.patch"
            ]
          }
        }
      ]
    }
  }
}
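An alternative that keeps the original should clauses (a sketch, not part of the original answer) is to require at least one of them to match via minimum_should_match:
{
  "query": {
    "bool": {
      "must": [
        { "match": { "operation.result.http_status": 200 } }
      ],
      "should": [
        { "match": { "event": "bank.account.patch" } },
        { "match": { "event": "bank.account.add" } },
        { "match": { "event": "bank.user.patch" } }
      ],
      "minimum_should_match": 1
    }
  }
}
The terms version above is shorter, but note that it matches exact tokens, so it assumes event is mapped as keyword (or has a keyword sub-field).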

Scope Elasticsearch Results to Specific Ids

I have a question about the Elasticsearch DSL.
I would like to do a full text search, but scope the searchable records to a specific array of database ids.
In SQL world, it would be the functional equivalent of WHERE id IN(1, 2, 3, 4).
I've been researching, but I find the Elasticsearch query DSL documentation a little cryptic and devoid of useful examples. Can anyone point me in the right direction?
Here is an example query which might work for you. This assumes that the _all field is enabled on your index (which is the default). It will do a full text search across all the fields in your index. Additionally, with the added ids filter, the query will exclude any document whose id is not in the given array.
{
  "bool": {
    "must": {
      "match": {
        "_all": "your search text"
      }
    },
    "filter": {
      "ids": {
        "values": ["1", "2", "3", "4"]
      }
    }
  }
}
Hope this helps!
As discussed by Ali Beyad, the ids field in the query can do that for you. Just to complement his answer, I am giving a working example, in case anyone needs it in the future.
GET index_name/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "field": "your query"
          }
        },
        {
          "ids": {
            "values": ["0aRM6ngBFlDmSSLpu_J4", "0qRM6ngBFlDmSSLpu_J4"]
          }
        }
      ]
    }
  }
}
You can create a bool query that contains an Ids query in a MUST clause:
https://www.elastic.co/guide/en/elasticsearch/reference/2.0/query-dsl-ids-query.html
By using a MUST clause in a bool query, your search will be further limited by the Ids you specify. I'm assuming here by Ids you mean the _id value for your documents.
According to the ES docs:
Returns documents based on their IDs.
GET /_search
{
  "query": {
    "ids": {
      "values": ["1", "4", "100"]
    }
  }
}
With ElasticaBundle on Symfony 5.2:
$query = new Query();
$IdsQuery = new Query\Ids();
$IdsQuery->setIds($id);
$query->setQuery($IdsQuery);
$this->finder->find($query, $limit);
You have two options.
The ids query:
GET index/_search
{
  "query": {
    "ids": {
      "values": ["1", "2", "3"]
    }
  }
}
or
The terms query:
GET index/_search
{
  "query": {
    "terms": {
      "yourNonPrimaryIdField": ["1", "2", "3"]
    }
  }
}
The ids query targets the document's internal _id field (the primary ID). But it often happens that documents contain secondary (and further) IDs, which you'd target through the terms query.
Note that if your secondary IDs contain uppercase characters and you don't set their field's mapping to keyword, they'll be normalized (and lowercased) and the terms query will appear broken, because it only works with exact matches. More on this here: Only getting results when elasticsearch is case sensitive
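A minimal mapping sketch for that case (the index and field names are placeholders carried over from the example above):
PUT index
{
  "mappings": {
    "properties": {
      "yourNonPrimaryIdField": { "type": "keyword" }
    }
  }
}
With a keyword mapping the IDs are stored verbatim, so the terms query can match them exactly, uppercase included.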

Filter items whose array contains any of the given values

I have a set of documents like
{
tags:['a','b','c']
// ... a bunch properties
}
As stated in the title: is there a way to filter all documents containing any of the given tags using Nest?
For instance, the record above would match ['c','d'].
Or should I build multiple "OR"s manually?
elasticsearch 2.0.1:
There's also the terms query, which should save you some work. Here's an example from the docs:
{
  "terms": {
    "tags": [ "blue", "pill" ],
    "minimum_should_match": 1
  }
}
Under the hood it constructs a boolean should, so it's basically the same thing as above but shorter.
There's also a corresponding terms filter.
So to summarize, your query could look like this:
{
  "filtered": {
    "query": {
      "match": { "title": "hello world" }
    },
    "filter": {
      "terms": {
        "tags": ["c", "d"]
      }
    }
  }
}
With a greater number of tags this could make quite a difference in length.
Edit: The bitset stuff below is maybe an interesting read, but the answer itself is a bit dated. Some of this functionality is changing around in 2.x. Also Slawek points out in another answer that the terms query is an easy way to DRY up the search in this case. Refactored at the end for current best practices. —nz
You'll probably want a Bool Query (or more likely a Filter alongside another query), with a should clause.
The bool query has three main properties: must, should, and must_not. Each of these accepts another query, or an array of queries. The clause names are fairly self-explanatory; in your case, the should clause may specify a list of filters, a match against any one of which will return the document you're looking for.
From the docs:
In a boolean query with no must clauses, one or more should clauses must match a document. The minimum number of should clauses to match can be set using the minimum_should_match parameter.
Here's an example of what that Bool query might look like in isolation:
{
  "bool": {
    "should": [
      { "term": { "tag": "c" } },
      { "term": { "tag": "d" } }
    ]
  }
}
And here's another example of that Bool query as a filter within a more general-purpose Filtered Query:
{
  "filtered": {
    "query": {
      "match": { "title": "hello world" }
    },
    "filter": {
      "bool": {
        "should": [
          { "term": { "tag": "c" } },
          { "term": { "tag": "d" } }
        ]
      }
    }
  }
}
Whether you use Bool as a query (e.g., to influence the score of matches), or as a filter (e.g., to reduce the hits that are then being scored or post-filtered) is subjective, depending on your requirements.
It is generally preferable to use Bool in favor of an Or Filter, unless you have a reason to use And/Or/Not (such reasons do exist). The Elasticsearch blog has more information about the different implementations of each, and good examples of when you might prefer Bool over And/Or/Not, and vice-versa.
Elasticsearch blog: All About Elasticsearch Filter Bitsets
Update with a refactored query...
Now, with all of that out of the way, the terms query is a DRYer version of all of the above. It does the right thing with respect to the type of query under the hood, it behaves the same as the bool + should using the minimum_should_match options, and overall is a bit more terse.
Here's that last query refactored a bit:
{
  "filtered": {
    "query": {
      "match": { "title": "hello world" }
    },
    "filter": {
      "terms": {
        "tag": [ "c", "d" ],
        "minimum_should_match": 1
      }
    }
  }
}
Whilst this is an old question, I ran into this problem myself recently and some of the answers here are now deprecated (as the comments point out). So for the benefit of others who may have stumbled here:
A term query can be used to find the exact term specified in the inverted index:
{
  "query": {
    "term": { "tags": "a" }
  }
}
From the documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-term-query.html
Alternatively you can use a terms query, which will match all documents with any of the items specified in the given array:
{
  "query": {
    "terms": { "tags": ["a", "c"] }
  }
}
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-terms-query.html
One gotcha to be aware of (which caught me out): how you define the document also makes a difference. If the field you're searching in has been indexed as a text type, then Elasticsearch will perform a full-text search (i.e. using an analyzed string).
If you've indexed the field as a keyword, then a keyword search using a 'non-analyzed' string is performed. This can have a massive practical impact, as analyzed strings are pre-processed (lowercased, punctuation dropped, etc.). See https://www.elastic.co/guide/en/elasticsearch/guide/master/term-vs-full-text.html
To avoid these issues, the string field has been split into two new types: text, which should be used for full-text search, and keyword, which should be used for keyword search. (https://www.elastic.co/blog/strings-are-dead-long-live-strings)
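If your tags field was dynamically mapped, it typically gets both a text field and a keyword sub-field named tags.keyword (that is the default dynamic mapping, so treat it as an assumption about your index), which you can target for exact matching:
{
  "query": {
    "terms": { "tags.keyword": ["a", "c"] }
  }
}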
For those looking at this in 2020: you may notice that the accepted answer is deprecated, but there is a similar approach available using the terms_set and minimum_should_match_script combination.
Please see the detailed answer in the SO thread.
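A minimal sketch of that approach (field and tag values taken from the question; params.num_terms is supplied by the terms_set query itself):
{
  "query": {
    "terms_set": {
      "tags": {
        "terms": [ "c", "d" ],
        "minimum_should_match_script": {
          "source": "Math.min(params.num_terms, 1)"
        }
      }
    }
  }
}
Requiring a minimum of one matching term gives the same "contains any of the given tags" behaviour as the terms query.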
You should use a terms query:
{
  "query": {
    "terms": {
      "tags": ["c", "d"]
    }
  }
}
