Elasticsearch multi_match: if a field exists, apply a filter, otherwise don't worry about it? - elasticsearch

We have an Elasticsearch instance, and a job requires a "combo search": a single search field with checkboxes for types across a specific index.
This is fine; I simply apply this kind of search to my index (for brevity: /posts):
{
"query": {
"multi_match": {
"query": querystring,
"type":"cross_fields",
"fields":["title","name"]
}
}
}
As you may guess from the need for multi_match here, the schemas of these types differ in one way or another, and that is my challenge right now.
In one of the types, and only one, there is a field that doesn't exist in the other types. It's called active and it's a basic boolean, 0 or 1.
We want to index inactive items in that type for administrative search purposes, but we don't want inactive items of this type to be exposed to the public when searching.
To my understanding, I want to use a filter. But when I supply a filter requiring active to be 1, I only get results from that type and nothing else, because the search now explicitly looks for items that have that field set to 1.
How can I express a conditional "if the field exists, make sure it equals 1, otherwise ignore this condition"? Can this even be achieved?

if field exists, make sure it equals 1, otherwise ignore this condition
I think it can be implemented like this:
{
"query": {
"filtered": {
"filter": {
"bool": {
"should": [
{
"bool": {
"must": [
{
"exists": {
"field": "active"
}
},
{
"term": {
"active": 1
}
}
]
}
},
{
"missing": {
"field": "active"
}
}
]
}
}
}
}
}
and the complete query:
{
"query": {
"filtered": {
"query": {
"multi_match": {
"query": "whatever",
"type": "cross_fields",
"fields": [
"title",
"name"
]
}
},
"filter": {
"bool": {
"should": [
{
"bool": {
"must": [
{
"exists": {
"field": "active"
}
},
{
"term": {
"active": 1
}
}
]
}
},
{
"missing": {
"field": "active"
}
}
]
}
}
}
}
}
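Note that the filtered query and the missing filter used above were removed in Elasticsearch 5.0, so that body only runs on older clusters. A minimal sketch of the same "active or field absent" logic with a bool query follows, written as a Python dict that can be serialized and sent by any client. The function name build_public_search is hypothetical, and the sketch assumes active is mapped as a boolean (hence True rather than 1).

```python
def build_public_search(text, fields=("title", "name")):
    """Build a search body that hides inactive documents but keeps
    documents that have no `active` field at all (hypothetical helper)."""
    active_or_absent = {
        "bool": {
            "should": [
                # Documents that carry the field must have it set to true.
                {"term": {"active": True}},
                # Documents without the field pass unconditionally.
                {"bool": {"must_not": {"exists": {"field": "active"}}}},
            ],
            # At least one of the two alternatives has to hold.
            "minimum_should_match": 1,
        }
    }
    return {
        "query": {
            "bool": {
                # Scored full-text part, same as in the question.
                "must": {
                    "multi_match": {
                        "query": text,
                        "type": "cross_fields",
                        "fields": list(fields),
                    }
                },
                # Unscored yes/no restriction.
                "filter": active_or_absent,
            }
        }
    }
```

The separate exists clause from the answer is dropped here because a matching term clause already implies that the field exists.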

Related

Elasticsearch conditional query for nested array

Using the following document, I'm trying to perform an Elasticsearch keyword query, conditionally excluding field data from the scope of the search. Is this possible?
{
"Name":"doc1",
"UserData":[
{
"EnteredBy":"Eric",
"Description":"Desc entered by Eric, abc"
},
{
"EnteredBy":"Alex",
"Description":"Desc entered by Alex, def"
}
]
}
The Elasticsearch query I need will allow me to search across the whole document, except it should exclude from the search UserData items where EnteredBy does not match the specified user.
The following queries would return results:
User:Eric doc1
User:Eric abc
User:Alex doc1
User:Fred doc1
The following queries would not return results:
User:Eric def
User:Fred def
Everything I've tried so far ends up filtering documents based on the presence of UserData nodes that apply to the specified user. I can't think of a way to specify that a field should be searched only if the EnteredBy field matches.
I could restructure the document, if that would solve the problem.
Edit 1
The index..
PUT index1
{
"settings": {
"number_of_shards": 2,
"number_of_replicas": 0
},
"mappings": {
"properties" : {
"UserData" : {
"type":"nested"
},
"Name": {
"type":"text"
}
}
}
}
Edit 2
The query below provides the results I need, except that for the child entity I have to search in a specific field. If I change the second condition of the nested search to a query_string search, it no longer applies the EnteredBy condition.
GET index1/_search
{
"query": {
"bool": {
"should": [
{
"nested":
{
"path": "UserData",
"query": {
"bool": {
"must": [{
"match": {
"UserData.EnteredBy": "Eric"
}},
{
"match": {
"UserData.Description": "def"
}
}]
}
}
}
},
{
"query_string":
{
"query": "doc1x"
}
}
]
}
}
}
This query appears to be working. I think I answered my own question.
GET index1/_search
{
"query": {
"bool": {
"should": [
{
"nested":
{
"path": "UserData",
"query": {
"bool": {
"must": [{
"match": {
"UserData.EnteredBy": "Eric"
}},
{
"query_string": {
"query": "def"
}
}]
}
}
}
},
{
"query_string":
{
"query": "doc1"
}
}
]
}
}
}
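The working query above can be sketched as a small builder so that the user and the keywords are not hard-coded. This is only an illustration: the function name build_user_scoped_search is hypothetical, and it assumes the index1 mapping shown in Edit 1, where UserData is nested and its text is therefore not visible to the top-level query_string.

```python
def build_user_scoped_search(user, keywords):
    """Search the whole document, but only look inside UserData
    entries entered by `user` (hypothetical helper)."""
    return {
        "query": {
            "bool": {
                "should": [
                    {
                        # Nested part: keywords count only inside
                        # entries whose EnteredBy matches the user.
                        "nested": {
                            "path": "UserData",
                            "query": {
                                "bool": {
                                    "must": [
                                        {"match": {"UserData.EnteredBy": user}},
                                        {"query_string": {"query": keywords}},
                                    ]
                                }
                            },
                        }
                    },
                    # Root part: keywords matched against top-level fields.
                    {"query_string": {"query": keywords}},
                ]
            }
        }
    }
```

Under those assumptions, build_user_scoped_search("Eric", "def") produces the same body as the query above with "def" in both query_string clauses.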

ElasticSearch multimatch substring search

I have to combine two filters to match requirements:
- a specific list of values in r.status field
- one of the multiple text fields contains the value.
The resulting query (built with NEST, but that doesn't matter) looks like this:
{
"query": {
"bool": {
"filter": [
{
"bool": {
"must": [
{
"term": {
"isActive": {
"value": true
}
}
},
{
"nested": {
"query": {
"bool": {
"must": [
{
"terms": {
"r.status": [
"VALUE_1",
"VALUE_2",
"VALUE_3"
]
}
},
{
"bool": {
"should": [
{
"match": {
"r.g.firstName": {
"type": "phrase",
"query": "SUBSTRING_VALUE"
}
}
},
{
"match": {
"r.g.lastName": {
"type": "phrase",
"query": "SUBSTRING_VALUE"
}
}
}
]
}
}
]
}
},
"path": "r"
}
}
]
}
}
]
}
}
}
Also tried with multi_match query:
{
"query": {
"bool": {
"filter": [
{
"bool": {
"must": [
{
"term": {
"isActive": {
"value": true
}
}
},
{
"nested": {
"query": {
"bool": {
"must": [
{
"terms": {
"r.status": [
"VALUE_1",
"VALUE_2",
"VALUE_3"
]
}
},
{
"multi_match": {
"query": "SUBSTRING_VALUE",
"fields": [
"r.g.firstName",
"r.g.lastName"
]
}
}
]
}
},
"path": "r"
}
}
]
}
}
]
}
}
}
FirstName and LastName are configured in index mappings as text:
"firstName": {
"type": "text"
},
"lastName": {
"type": "text"
}
Elasticsearch offers a lot of full-text search options: multi_match, phrase, wildcards, etc. But all of them fail in my case when looking for a substring in my text fields. (The terms query and the isActive one work well; I tried running only them.)
What other options do I have, or where did I make a mistake?
UPD: Combined wildcards worked for me, but such a query looks ugly. I'm looking for a more elegant solution.
The Elasticsearch way is to use an ngram tokenizer.
An ngram analyzer splits your terms with a sliding window. For example, the input "Hello World" will generate the following terms:
Hel
Hell
Hello
ell
ello
...
Wor
Worl
World
orl
...
You can configure the minimum and maximum size of the sliding window (in the example the minimum size is 3). Once the sub-terms are generated, you can use a match query on the subfield.
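To make the sliding window concrete, here is a rough Python sketch that reproduces the terms for a single word, followed by index settings that would declare such a tokenizer. The names ngrams, substring_tokenizer and substring_analyzer are hypothetical, the window of 3 to 5 characters is an assumption, and the max_ngram_diff setting is only needed on recent Elasticsearch versions, which limit the difference between max_gram and min_gram to 1 by default.

```python
def ngrams(word, min_gram=3, max_gram=5):
    """Return every substring of `word` whose length lies between
    min_gram and max_gram, ordered by start position, then length."""
    grams = []
    for start in range(len(word)):
        for size in range(min_gram, max_gram + 1):
            end = start + size
            if end > len(word):
                break  # longer windows would run past the end of the word
            grams.append(word[start:end])
    return grams


# Index settings sketch (names are illustrative, not from the question).
NGRAM_INDEX_SETTINGS = {
    "settings": {
        # Allow max_gram - min_gram to be 2 instead of the default 1.
        "index": {"max_ngram_diff": 2},
        "analysis": {
            "tokenizer": {
                "substring_tokenizer": {
                    "type": "ngram",
                    "min_gram": 3,
                    "max_gram": 5,
                    # Split on anything that is not a letter or digit.
                    "token_chars": ["letter", "digit"],
                }
            },
            "analyzer": {
                "substring_analyzer": {
                    "type": "custom",
                    "tokenizer": "substring_tokenizer",
                    "filter": ["lowercase"],
                }
            },
        },
    }
}
```

Calling ngrams("Hello") yields Hel, Hell, Hello, ell, ello and llo, which matches the first part of the list above.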
Another point: it is unusual to use must inside a filter. If you are interested in the score, use must; otherwise use filter. Read this article for a good understanding.
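As an illustration of that last point, the question's query could list its clauses directly under filter instead of wrapping them in a bool/must. This is a sketch only: the function name build_status_name_filter is hypothetical, the field names are taken from the question, and the sketch by itself does not make substring matching work, which still depends on how firstName and lastName are analyzed.

```python
def build_status_name_filter(statuses, text):
    """Unscored search: active documents whose nested `r` entry has an
    accepted status and a matching first or last name (hypothetical helper)."""
    name_matches = {
        "multi_match": {
            "query": text,
            "fields": ["r.g.firstName", "r.g.lastName"],
        }
    }
    return {
        "query": {
            "bool": {
                # Every clause in a filter list must match; no scoring.
                "filter": [
                    {"term": {"isActive": True}},
                    {
                        "nested": {
                            "path": "r",
                            "query": {
                                "bool": {
                                    "filter": [
                                        {"terms": {"r.status": list(statuses)}},
                                        name_matches,
                                    ]
                                }
                            },
                        }
                    },
                ]
            }
        }
    }
```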

Adding boost to Elasticsearch query

I'm trying to add weight to some results from Elasticsearch.
I'm currently only filtering on an 'active' boolean to grab only the published items:
query: {
filtered: {
query: {
match: {
_all: params[:q]
}
},
filter: {
term: {
active: true
}
}
},
}
I now want to add weight to some of my models. For example, a Market should get a +2 boost. I was trying something like this: (search_type is a field on my results, it's basically the Rails model name)
POST _search
{
"query": {
"function_score": {
"query": {
"match": {
"_all": "hospitality"
}
},
"functions": [
{
"filter": {
"term": {
"active": true
}
}
},
{
"filter": {
"term": {
"search_type": "Market"
}
},
"weight": 2
}
]
}
}
}
However, that does not seem to work: "One entry in functions list is missing a function". So I added "weight": 1 to the active filter, but now it says it can't parse.
I have no experience with ElasticSearch and the docs are quite confusing. I have also tried using a custom_filters_score thing, but that doesn't seem to work for my version of ES (as described here: http://jontai.me/blog/2013/01/advanced-scoring-in-elasticsearch/). Another option I tried was combining a boolean query with must and should, but that returned zero results...
Not sure how to proceed. Some insights would be great.
You should be able to use a filtered query alongside function_score to achieve this.
Example:
{
"query": {
"filtered": {
"query": {
"function_score": {
"query": {
"match": {
"_all": "hospitality"
}
},
"functions": [
{
"filter": {
"term": {
"search_type": "Market"
}
},
"weight": 2
}
]
}
},
"filter": {
"term": {
"active": true
}
}
}
}
}
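The filtered query in the example above was removed in Elasticsearch 5.0, and the _all field is not available in recent versions. A hedged sketch of the same idea for newer clusters moves the active restriction into a bool filter inside function_score. The function name build_boosted_search and the searched field name title are assumptions, not part of the question.

```python
def build_boosted_search(text, boosted_type="Market", weight=2, field="title"):
    """Match `text`, keep only active documents, and weight one
    search_type higher (hypothetical helper; `field` is an assumption)."""
    return {
        "query": {
            "function_score": {
                "query": {
                    "bool": {
                        # Scored full-text match.
                        "must": {"match": {field: text}},
                        # Unscored restriction to published items.
                        "filter": {"term": {"active": True}},
                    }
                },
                "functions": [
                    {
                        # The weight applies only to documents passing this filter.
                        "filter": {"term": {"search_type": boosted_type}},
                        "weight": weight,
                    }
                ],
            }
        }
    }
```

With the default boost_mode of multiply, a weight of 2 doubles the score of matching documents rather than adding 2 to it.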

How to do nested AND and OR filters in ElasticSearch?

My filters are grouped together into categories.
I would like to retrieve documents where a document can match any filter in a category, but if two (or more) categories are set, then the document must match any of the filters in ALL categories.
If written in pseudo-SQL it would be:
SELECT * FROM Documents WHERE (CategoryA = 'A') AND (CategoryB = 'B' OR CategoryB = 'C')
I've tried Nested filters like so:
{
"sort": [{
"orderDate": "desc"
}],
"size": 25,
"query": {
"match_all": {}
},
"filter": {
"and": [{
"nested": {
"path":"hits._source",
"filter": {
"or": [{
"term": {
"progress": "incomplete"
}
}, {
"term": {
"progress": "completed"
}
}]
}
}
}, {
"nested": {
"path":"hits._source",
"filter": {
"or": [{
"term": {
"paid": "yes"
}
}, {
"term": {
"paid": "no"
}
}]
}
}
}]
}
}
But evidently I don't quite understand the ES syntax. Is this on the right track or do I need to use another filter?
This should be it (translated from given pseudo-SQL)
{
"sort": [
{
"orderDate": "desc"
}
],
"size": 25,
"query":
{
"filtered":
{
"filter":
{
"and":
[
{ "term": { "CategoryA":"A" } },
{
"or":
[
{ "term": { "CategoryB":"B" } },
{ "term": { "CategoryB":"C" } }
]
}
]
}
}
}
}
I realize you're not mentioning facets but just for the sake of completeness:
You could also use a filter as the basis (like you did) instead of a filtered query (like I did). The resulting JSON is almost identical, with these differences:
a filtered query filters both the main results and the facets
a filter filters only the main results, NOT the facets.
Lastly, nested filters (which you tried using) have nothing to do with nesting filters inside one another, as you seemed to believe; they are for filtering on nested documents (parent-child).
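The and/or filters and the filtered query used above are not available in Elasticsearch 5.0 and later. A minimal sketch of the same "OR within a category, AND across categories" rule with a bool query follows. The function name build_category_filter is hypothetical, and each category is assumed to map to a single field holding exact (not analyzed) values.

```python
def build_category_filter(selected):
    """`selected` maps a field name to the list of accepted values.
    Values within one field are ORed; the fields themselves are ANDed."""
    clauses = [
        # A terms clause matches when the field holds any listed value.
        {"terms": {field: list(values)}}
        for field, values in selected.items()
        if values  # a category with nothing selected adds no restriction
    ]
    if not clauses:
        return {"query": {"match_all": {}}}
    # All entries of a bool filter list must match.
    return {"query": {"bool": {"filter": clauses}}}
```

For the pseudo-SQL above, the call would be build_category_filter({"CategoryA": ["A"], "CategoryB": ["B", "C"]}).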
Although I have not completely understood your structure, this might be what you need.
You have to think tree-wise. You create a bool in which every embedded bool must (= AND) be fulfilled. Each embedded bool checks whether the field does not exist or else (using should here instead of must) whether the field holds (terms here) one of the values in the list.
I am not sure whether there is a better way, and I do not know how this performs.
{
"sort": [
{
"orderDate": "desc"
}
],
"size": 25,
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"bool": {
"should": [
{
"not": {
"exists": {
"field": "progress"
}
}
},
{
"terms": {
"progress": [
"incomplete",
"completed"
]
}
}
]
}
},
{
"bool": {
"should": [
{
"not": {
"exists": {
"field": "paid"
}
}
},
{
"terms": {
"paid": [
"yes",
"no"
]
}
}
]
}
}
]
}
}
}
}
}

NOT condition in elasticsearch

I am trying to implement a NOT condition in an Elasticsearch query.
Can I implement the filter inside bool, or do I need to write a separate
filter as below? Is there an optimal solution?
{
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "fashion"
}
},
{
"term": {
"post_status": "publish"
}
}
]
}
},
"filter": {
"not": {
"filter": {
"term": {
"post_type": "page"
}
}
}
}
}
You can use a must_not clause:
{
"query": {
"bool": {
"must": [
{
"match": {
"_all": "fashion"
}
},
{
"term": {
"post_status": "publish"
}
}
],
"must_not": {
"term": {
"post_type": "page"
}
}
}
}
}
Also, I'd recommend using a match query instead of query_string, as query_string requires the much stricter Lucene syntax (and is therefore more error prone), whereas match works more like a search box: it will automatically transform a human-readable query into a Lucene query.
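On versions that support a filter clause inside bool (Elasticsearch 2.0 and later), the exact-value conditions can sit next to must_not without affecting the score. A rough sketch, with a hypothetical function name build_published_search and an assumed text field post_title standing in for _all, which is not available in recent versions:

```python
def build_published_search(text, excluded_type="page", field="post_title"):
    """Full-text match on published posts, excluding one post_type
    (hypothetical helper; `field` is an assumption)."""
    return {
        "query": {
            "bool": {
                # Scored: how well the text matches.
                "must": [{"match": {field: text}}],
                # Unscored: the post has to be published.
                "filter": [{"term": {"post_status": "publish"}}],
                # Unscored: the NOT condition.
                "must_not": [{"term": {"post_type": excluded_type}}],
            }
        }
    }
```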
