Filtering Field with multiple values - elasticsearch

How would I approach the following problem:
I want to filter on a field which contains multiple values (e.g. ["value1", "value2", "value3"]).
The filter would also contain multiple values (e.g. ["value1", "value2"]).
I want to get back only the items which have the same field values as the filter, e.g. the field is ["value1", "value2"] and the filter is also ["value1", "value2"].
Any help would be greatly appreciated

I think the somewhat-recently added (v6.1) terms_set query (which Val references on the question he linked in his comment) is what you want.
terms_set, unlike a regular terms query, has a parameter to specify a minimum number of matches that must exist between the search terms and the terms contained in the field.
Given:
PUT my_index/_doc/1
{
  "values": ["living", "in a van", "down by the river"]
}
PUT my_index/_doc/2
{
  "values": ["living", "in a house", "down by the river"]
}
A terms query for ["living", "in a van", "down by the river"] will return you both docs: no good. A terms_set configured to require all three matching terms (the script params.num_terms evaluates to 3) can give you just the matching one:
GET my_index/_search
{
  "query": {
    "terms_set": {
      "values": {
        "terms": ["living", "in a van", "down by the river"],
        "minimum_should_match_script": {
          "source": "params.num_terms"
        }
      }
    }
  }
}
NOTE: While I used minimum_should_match_script in the above example, it isn't a very efficient pattern. The alternative minimum_should_match_field is the better approach, but using it in the example would have meant a couple more PUTs to add the necessary field to the documents, so I went with brevity.
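For reference, here is a rough sketch of the minimum_should_match_field variant. The required_matches field name is just an illustration; the idea is to store the number of terms alongside each document and point the query at that field:
PUT my_index/_doc/3
{
  "values": ["living", "in a van", "down by the river"],
  "required_matches": 3
}
GET my_index/_search
{
  "query": {
    "terms_set": {
      "values": {
        "terms": ["living", "in a van", "down by the river"],
        "minimum_should_match_field": "required_matches"
      }
    }
  }
}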

Related

Search in two fields on elasticsearch with kibana

Assuming I have an index with two fields, title and loc, I would like to search in these two fields and get the "best" match. So if I have three items:
{"title": "castle", "loc": "something"},
{"title": "something castle something", "loc": "something,pontivy,something"},
{"title": "something else", "loc": "something"}
... I would like to get the second one, which has "castle" in its title and "pontivy" in its loc. I tried to simplify the example and the underlying data; in reality it's a bit more complicated. So I tried this query, but it doesn't seem accurate (it's a feeling, not really easy to explain):
GET merimee/_search/?
{
  "query": {
    "multi_match" : {
      "query": "castle pontivy",
      "fields": [ "title", "loc" ]
    }
  }
}
Is this the right way to search in multiple fields and get the item which matches in all the fields?
Not sure my question is clear enough, I can edit if required.
EDIT:
The story is: the user types "castle pontivy" and I want to get the "best" result for this query, which is the second one because it contains "castle" in "title" and "pontivy" in "loc". In other words I want the result that matches best across both fields.
As the other poster suggested, you could use a bool query, but that might not work for your use case since you have a single search box that you want to query against multiple fields.
I recommend looking at a Simple Query String query as that will likely give you the functionality you're looking for. See: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-simple-query-string-query.html
So you could do something similar to this:
{
  "query": {
    "simple_query_string" : {
      "query": "castle pontivy",
      "fields": ["title", "loc"],
      "default_operator": "and"
    }
  }
}
This will give you the best-scoring documents that match both terms, in either of those fields. The default operator is set to AND here because otherwise it is OR, which might not give you the expected results.
It is worthwhile to experiment with the other options available for this query type as well. You might also explore using a Query String query as it gives more flexibility, but the Simple Query String query works very well for most cases.
This can be done by using a bool query and then matching on both fields.
GET _search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "castle" } },
        { "match": { "loc": "pontivy" } }
      ]
    }
  }
}

Custom score for exact, phonetic and fuzzy matching in elasticsearch

I have a requirement where there needs to be custom scoring on a name field. To keep it simple, let's say, if I search for 'Smith' against names in the index, the logic should be:
if input = exact 'Smith' then score = 100%
else
if input = phonetic match then
score = <depending upon fuzziness match of input with name>%
end if
end if;
I'm able to search documents with a fuzziness of 1, but I don't know how to give them a custom score depending on how fuzzy the match is. Thanks!
Update:
I went through a post that had the same requirement as mine, and it was mentioned that the person solved it by using native scripts. My question still remains: how to actually get the score based on the similarity distance so that it can be used in the native scripts.
The post for reference:
https://discuss.elastic.co/t/fuzzy-query-scoring-based-on-levenshtein-distance/11116
The text to look for in the post:
"For future readers I solved this issue by creating a custom score query and
writing a (native) script to handle the scoring."
You can implement this search logic using the function_score query.
Here there is a possible example:
{
  "query": {
    "function_score": {
      "query": {
        "match": { "input": "Smith" }
      },
      "boost": "5",
      "functions": [
        {
          "filter": { "match": { "input.keyword": "Smith" } },
          "random_score": {},
          "weight": 23
        }
      ]
    }
  }
}
In this example we have a mapping with the input field indexed both as text and keyword (input.keyword is used for the exact match). We re-score the documents that exactly match the term "Smith" with a higher score than all the other documents matched by the first query (in the example it is a plain match, but in your case it will be the query with fuzziness).
You can control the re-scoring effect by tuning the weight parameter.
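To make that concrete, here is a hedged sketch of what the fuzzy variant might look like, assuming the field is mapped as text with a keyword sub-field named input.keyword (adjust the names to your actual mapping):
{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "input": {
            "query": "Smith",
            "fuzziness": "AUTO"
          }
        }
      },
      "functions": [
        {
          "filter": { "term": { "input.keyword": "Smith" } },
          "weight": 23
        }
      ]
    }
  }
}
With the default boost_mode, exact matches on input.keyword get their score multiplied by the weight, so they bubble above the purely fuzzy hits.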

Analyzer to find, e.g., "starbucks" when mistakenly querying "star bucks"

How would I define an analyzer so a query recalls a document with term "starbucks" when mistakenly querying "star bucks"?
Or in general: how would I define an analyzer that is able to search for combined terms by omitting term-separators/ spaces, in the supplied query?
N-grams clearly don't work, since you'd have to know to split up the term 'starbucks' at indexing time into 2 separate terms 'star' and 'bucks'. Splitting on syllables might be enough, but I'm not sure if that's possible (or scales).
Thoughts?
You can use Fuzzy Search.
Here is a full working sample:
PUT test1
POST test1/a
{
  "item1": "starbucks"
}
POST test1/a
{
  "item1": "foo"
}
GET test1/a/_search
{
  "query": {
    "fuzzy": {
      "item1": "star bucks"
    }
  }
}
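If you want control over how much fuzziness is tolerated, the expanded form of the fuzzy query takes an explicit fuzziness parameter. A small sketch against the same test1 index (AUTO picks the allowed edit distance from the term length, and "star bucks" is one edit away from "starbucks"):
GET test1/a/_search
{
  "query": {
    "fuzzy": {
      "item1": {
        "value": "star bucks",
        "fuzziness": "AUTO"
      }
    }
  }
}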

Match query: return records only if the query contains all words of the object's field

I read about match and multiword queries but it seems that I need to do something a bit different.
Let's say I have the following query: "this is a test", and I want to find it in one field called "text". I want to get objects which match some of that query (it doesn't matter how many words), but only those objects where the query contains every word of the object's text field.
Example for the query "this is a test", I want to get these objects:
obj1: { "text": "this is a test" }
obj2: { "text": "this is a" }
obj3: { "text": "is a" }
obj4: { "text": "test" }
But if an object has something more in its text field it should not be returned, for example:
obj5: { "text": "this is a test and something more" }
Is it possible to achieve this using Elasticsearch?
It's kind of a hack, but I was able to get it to work with a script filter:
POST /test_index/_search
{
  "query": {
    "match": {
      "text": "this is a test"
    }
  },
  "filter": {
    "script": {
      "script": "for(val in doc[\"text\"].values){ if(!(val in terms)){ return false; }}; return true;",
      "params": {
        "terms": ["this", "is", "a", "test"]
      }
    }
  }
}
I thought there would be a better way to do this, but wasn't immediately able to come up with one. Using scripting can be problematic in production, unless your ES cluster is behind an auth wall of some kind.
Anyway, here's the code I used to test it:
http://sense.qbox.io/gist/3929abc89d71ebf724e6121b1b5ba6da54501088
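On more recent versions (6.1+) you can get the same behaviour without scripting by using a terms_set query, provided each document also stores how many terms its text field contains. A rough sketch, assuming the words are indexed as a keyword array and a hypothetical num_terms field holds the count:
PUT test_index/_doc/1
{
  "text": ["this", "is", "a"],
  "num_terms": 3
}
GET test_index/_search
{
  "query": {
    "terms_set": {
      "text": {
        "terms": ["this", "is", "a", "test"],
        "minimum_should_match_field": "num_terms"
      }
    }
  }
}
A document only matches when at least num_terms of the supplied terms are found in its text field, which is exactly the "query must contain every word of the field" condition.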

Filter items whose array contains any of the given values

I have a set of documents like
{
  tags: ['a', 'b', 'c']
  // ... a bunch of properties
}
As stated in the title: is there a way to filter all documents containing any of the given tags using Nest?
For instance, the record above would match ['c','d'].
Or should I build multiple "OR"s manually?
elasticsearch 2.0.1:
There's also a terms query which should save you some work. Here's an example from the docs:
{
  "terms" : {
    "tags" : [ "blue", "pill" ],
    "minimum_should_match" : 1
  }
}
Under the hood it constructs a boolean should, so it's basically the same thing as above, but shorter.
There's also a corresponding terms filter.
So to summarize, your query could look like this:
{
  "filtered": {
    "query": {
      "match": { "title": "hello world" }
    },
    "filter": {
      "terms": {
        "tags": ["c", "d"]
      }
    }
  }
}
With a greater number of tags this could make quite a difference in length.
Edit: The bitset stuff below is maybe an interesting read, but the answer itself is a bit dated. Some of this functionality is changing around in 2.x. Also Slawek points out in another answer that the terms query is an easy way to DRY up the search in this case. Refactored at the end for current best practices. —nz
You'll probably want a Bool Query (or more likely a Filter alongside another query), with a should clause.
The bool query has three main properties: must, should, and must_not. Each of these accepts another query, or an array of queries. The clause names are fairly self-explanatory; in your case, the should clause may specify a list of filters, a match against any one of which will return the document you're looking for.
From the docs:
In a boolean query with no must clauses, one or more should clauses must match a document. The minimum number of should clauses to match can be set using the minimum_should_match parameter.
Here's an example of what that Bool query might look like in isolation:
{
  "bool": {
    "should": [
      { "term": { "tag": "c" }},
      { "term": { "tag": "d" }}
    ]
  }
}
And here's another example of that Bool query as a filter within a more general-purpose Filtered Query:
{
  "filtered": {
    "query": {
      "match": { "title": "hello world" }
    },
    "filter": {
      "bool": {
        "should": [
          { "term": { "tag": "c" }},
          { "term": { "tag": "d" }}
        ]
      }
    }
  }
}
Whether you use Bool as a query (e.g., to influence the score of matches), or as a filter (e.g., to reduce the hits that are then being scored or post-filtered) is subjective, depending on your requirements.
It is generally preferable to use Bool in favor of an Or Filter, unless you have a reason to use And/Or/Not (such reasons do exist). The Elasticsearch blog has more information about the different implementations of each, and good examples of when you might prefer Bool over And/Or/Not, and vice-versa.
Elasticsearch blog: All About Elasticsearch Filter Bitsets
Update with a refactored query...
Now, with all of that out of the way, the terms query is a DRYer version of all of the above. It does the right thing with respect to the type of query under the hood, behaves the same as the bool + should using the minimum_should_match option, and is overall a bit more terse.
Here's that last query refactored a bit:
{
  "filtered": {
    "query": {
      "match": { "title": "hello world" }
    },
    "filter": {
      "terms": {
        "tag": [ "c", "d" ],
        "minimum_should_match": 1
      }
    }
  }
}
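For readers on newer releases: the filtered query was deprecated in 2.0 and removed in 5.0, so the same search would now be expressed as a bool query with a filter clause. A sketch of the equivalent (in recent versions the terms query no longer takes minimum_should_match; terms_set covers that case):
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "hello world" } }
      ],
      "filter": [
        { "terms": { "tag": ["c", "d"] } }
      ]
    }
  }
}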
Whilst this is an old question, I ran into this problem myself recently and some of the answers here are now deprecated (as the comments point out). So for the benefit of others who may have stumbled here:
A term query can be used to find the exact term specified in the inverted index:
{
  "query": {
    "term" : { "tags" : "a" }
  }
}
From the documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-term-query.html
Alternatively you can use a terms query, which will match all documents with any of the items specified in the given array:
{
  "query": {
    "terms" : { "tags" : ["a", "c"] }
  }
}
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-terms-query.html
One gotcha to be aware of (which caught me out): how you define the document also makes a difference. If the field you're searching in has been indexed as a text type, then Elasticsearch will perform a full-text search (i.e. using an analyzed string).
If you've indexed the field as a keyword, then a keyword search using a 'non-analyzed' string is performed. This can have a massive practical impact, as analyzed strings are pre-processed (lowercased, punctuation dropped, etc.). See https://www.elastic.co/guide/en/elasticsearch/guide/master/term-vs-full-text.html
To avoid these issues, the string field has been split into two new types: text, which should be used for full-text search, and keyword, which should be used for keyword search. (https://www.elastic.co/blog/strings-are-dead-long-live-strings)
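For illustration, a sketch of a mapping (Elasticsearch 7+ syntax; the index name my_index is just an example) that indexes tags as text for full-text search, with a keyword sub-field for exact term/terms matches:
PUT my_index
{
  "mappings": {
    "properties": {
      "tags": {
        "type": "text",
        "fields": {
          "keyword": { "type": "keyword" }
        }
      }
    }
  }
}
With that mapping, a terms query against tags.keyword matches the stored values exactly, while queries against tags go through the analyzer.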
For those looking at this in 2020: you may notice that the accepted answer is now deprecated, but there is a similar approach available using a combination of terms_set and minimum_should_match_script.
Please see the detailed answer here in the SO thread
You should use Terms Query
{
  "query" : {
    "terms" : {
      "tags" : ["c", "d"]
    }
  }
}
