Elasticsearch: MUST + at least one SHOULD in percolator query

I'm trying to make suggestions to users based on several factors:
• Suggestions MUST only be students from the same college
• Suggestions MUST match at least one other field
I thought I had it, but the problem is that this query returns ALL students from the same school regardless of everything else:
PUT /user/.percolator/4
{
"query": {
"bool": {
"must": [
{ "match": { "college":{
"query" : "Oakland University",
"type" : "phrase"
}}}
],
"should": [
{ "match": { "hashtag": "#chipotle" }},
{ "match": { "hashtag": "#running"}},
{ "match": { "college_course": "ART-160" }}
]
}
}
}
POST /user/stuff/_percolate/
{
"doc":{
"college":"Oakland University",
"college_course": "WRT BIO MTH-400"
}
}

This happens because of how should and must behave in the same bool query. By default, none of the "should" clauses are required to match when a "must" clause is present; only when the bool query contains "should" clauses alone (no must or filter) is at least one of them required to match.
To solve your problem, you just need to add "minimum_should_match" : 1 inside your bool query :)
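To see why the extra parameter changes the result, here is a minimal Python sketch that models only the matching rule of a bool query (no scoring, no text analysis). The helpers `clause_matches` and `bool_matches` are hypothetical names invented for this illustration, and `match` is simplified to exact equality; this is not how Elasticsearch is implemented.

```python
def clause_matches(clause, doc):
    """Toy stand-in for a match clause: exact equality on a single field."""
    field, expected = next(iter(clause["match"].items()))
    if isinstance(expected, dict):  # long form: {"query": ..., "type": ...}
        expected = expected["query"]
    return doc.get(field) == expected


def bool_matches(bool_query, doc):
    """Decide whether doc matches, using must, should and minimum_should_match."""
    must = bool_query.get("must", [])
    should = bool_query.get("should", [])
    # Assumed default: should is optional next to must, otherwise one is required.
    default_min = 1 if (should and not must) else 0
    min_should = bool_query.get("minimum_should_match", default_min)
    if not all(clause_matches(c, doc) for c in must):
        return False
    hits = sum(clause_matches(c, doc) for c in should)
    return hits >= min_should


query = {
    "must": [
        {"match": {"college": {"query": "Oakland University", "type": "phrase"}}}
    ],
    "should": [
        {"match": {"hashtag": "#chipotle"}},
        {"match": {"hashtag": "#running"}},
        {"match": {"college_course": "ART-160"}},
    ],
}
doc = {"college": "Oakland University", "college_course": "WRT BIO MTH-400"}

without_min = bool_matches(query, doc)
with_min = bool_matches({**query, "minimum_should_match": 1}, doc)
print(without_min, with_min)  # True False
```

Without the parameter the document matches on the college alone; with "minimum_should_match": 1 it is rejected because none of the should clauses match.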

Related

Is it possible to access a query term in a script field?

I would like to construct an elasticsearch query in which I can search for a term and on-the-fly compute a new field for each found document, which is calculated based on some existing fields as well as the query term. Is this possible?
For example, let's say in my Elasticsearch query I am searching for documents which have the keyword "amsterdam" in the "text" field.
"filter": [
{
"match_phrase": {
"text": {
"query": "amsterdam"
}
}
}]
Now I would also like to have a script field in my query, which computes some value based on other fields as well as the query.
So far, I have only found how to access the other fields of a document though, using doc['someOtherField'], for example
"script_fields" : {
"new_field" : {
"script" : {
"lang": "painless",
"source": """
if (doc['citizens'].value > 10000) {
return 'large';
}
return 'small';
"""
}
}
}
How can I integrate the query term, e.g. if I wanted to add to the if statement "if the query term starts with a-e"?
You're on the right track but script_fields are primarily used to post-process your documents' attributes — they won't help you filter any docs because they're run after the query phase.
With that being said, you can use scripts to filter your documents through script queries. Before you do that, though, you should explore alternatives.
In other words, scripts should be used when all other mechanisms and techniques have been exhausted.
Back to your example. I see three possibilities off the top of my head.
Match phrase prefix queries as a group of bool-should subqueries:
POST your-index/_search
{
"query": {
"bool": {
"must": [
{
"bool": {
"should": [
{
"match_phrase_prefix": {
"text_field": "a"
}
},
{
"match_phrase_prefix": {
"text_field": "b"
}
},
{
"match_phrase_prefix": {
"text_field": "c"
}
},
... till the letter "e"
]
}
}
]
}
}
}
A regexp query:
POST your-index/_search
{
"query": {
"bool": {
"must": [
{
"regexp": {
"text_field": "[a-e].+"
}
}
]
}
}
}
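One detail worth keeping in mind with the regexp option: the regexp query is a term-level query, and Lucene patterns are always anchored, so the pattern has to match a whole indexed term rather than the whole field value. The Python sketch below imitates that with `re.fullmatch`. The whitespace split is only a stand-in for the analyzer, the helper name is hypothetical, and Python's regex syntax is not identical to Lucene's (this simple pattern behaves the same in both).

```python
import re


def regexp_term_matches(pattern, tokens):
    """Return the tokens that the entire pattern matches (anchored match)."""
    compiled = re.compile(pattern)
    return [token for token in tokens if compiled.fullmatch(token)]


# Assumption: the text field was analyzed into lowercase, whitespace-separated terms.
tokens = "amsterdam is a big city".split()
matched = regexp_term_matches("[a-e].+", tokens)
print(matched)  # ['amsterdam', 'big', 'city']
```

The single-letter term "a" is not matched because ".+" requires at least one more character, and any term starting with a to e qualifies, not only the word that was searched for.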
Script queries using .charAt comparisons:
POST your-index/_search
{
"query": {
"bool": {
"must": [
{
"script": {
"script": {
"source": """
char c = doc['text_field.keyword'].value.charAt(0);
return c >= params.gte.charAt(0) && c <= params.lte.charAt(0);
""",
"params": {
"gte": "a",
"lte": "e"
}
}
}
}
]
}
}
}
If you're relatively new to ES and would love to see real-world examples, check out my recently released Elasticsearch Handbook. One chapter is dedicated to scripting and as it turns out, you can achieve a lot with scripts (if of course executed properly).
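As for the original question: a script field cannot see the query itself, but a script does accept a "params" map, so the client can hand the same term to both the query and the script. The Python sketch below only builds such a request body. The function name `build_search` is hypothetical, the field names come from the question, and the Painless source is an untested assumption modeled on the charAt comparison shown above.

```python
import json


def build_search(term):
    """Build a search body that passes `term` to the query and to the script."""
    painless = (
        "char c = params.term.charAt(0); "
        "boolean early = c >= params.gte.charAt(0) && c <= params.lte.charAt(0); "
        "if (doc['citizens'].value > 10000) { return early ? 'large, a-e' : 'large'; } "
        "return 'small';"
    )
    return {
        "query": {
            "bool": {"filter": [{"match_phrase": {"text": {"query": term}}}]}
        },
        "script_fields": {
            "new_field": {
                "script": {
                    "lang": "painless",
                    "source": painless,
                    # The query term travels to the script as a parameter.
                    "params": {"term": term, "gte": "a", "lte": "e"},
                }
            }
        },
    }


body = build_search("amsterdam")
print(json.dumps(body, indent=2))
```

Building the body in one place keeps the term in the query and the term in the script parameters from drifting apart.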

Search document even when some fields are missing in elastic search

I want to search for students based on centerId, courseId and batchId. For example, I have student data as below.
{
"s1":{
"name":"alex",
"centerId":"N001",
"courseId":"ncjava",
"batchId":"nb1"},
"s2":{
"name":"John",
"centerId":"N001",
"courseId":"nc02",
"batchId":"ncb2"},
"s3":{
"name":"David",
"centerId":"N001",
"courseId":"ncjava"
}
}
Now I want to search for students where centerId, courseId and batchId match, and I also want students that have a matching centerId and courseId but where batchId is missing. I wrote the query below:
{
"query": {
"bool": {"must": [
{
"match": {
"centerId":"N001"
}},
{ "match": {
"courseId": "ncjava"
}}
],
"should":[
{
"match": {
"batchId": "nb1"
}
}
]
}
}
}
This query returns all the students that match centerId and courseId, but it also returns students who have a different 'batchId'. I only want a student when batchId matches or does not exist.
You can nest bool queries to build the "or" logic you want. batchId = X OR batchId is missing can be represented with a should expression (and batchId is missing with a must_not and exists), like this:
{
"query": {
"bool": {
"must": [
{
"match": {
"centerId": "N001"
}
},
{
"match": {
"courseId": "ncjava"
}
},
{
"bool": {
"minimum_should_match": 1,
"should": [
{
"match": {
"batchId": "nb1"
}
},
{
"bool": {
"must_not": {
"exists": {
"field": "batchId"
}
}
}
}
]
}
}
]
}
}
}
You can consider must like "and", and should like "or" (though more flexible than boolean or), and must_not as boolean "not". So, the above query means something like centerId == N001 AND courseId == ncjava AND (batchId == nb1 OR NOT exists batchId).
In this particular context, minimum_should_match actually isn't required (the default behavior is already what you want), but since the behavior is different in different contexts, I like to include it explicitly, in case the query is edited in an unexpected way in the future (then the behavior of the should will remain the same despite the changed context). minimum_should_match of 1 means that at least 1 of the should clauses must match.
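The boolean reading above can be checked with a small Python sketch that evaluates the same query shape against the sample students. The `matches` and `as_list` helpers are hypothetical and cover only match (as exact equality), exists and bool; filter clauses and scoring are ignored. Student s4 is not in the question and is added here purely to show a non-matching batchId being rejected.

```python
def as_list(clauses):
    """Accept a single clause or a list of clauses, as the query DSL does."""
    if clauses is None:
        return []
    return clauses if isinstance(clauses, list) else [clauses]


def matches(query, doc):
    """Toy evaluator for match, exists and bool (must / should / must_not)."""
    kind, body = next(iter(query.items()))
    if kind == "match":  # simplified to exact equality on one field
        field, value = next(iter(body.items()))
        return doc.get(field) == value
    if kind == "exists":
        return body["field"] in doc
    if kind == "bool":
        must = as_list(body.get("must"))
        should = as_list(body.get("should"))
        must_not = as_list(body.get("must_not"))
        default_min = 1 if (should and not must) else 0
        min_should = body.get("minimum_should_match", default_min)
        return (
            all(matches(q, doc) for q in must)
            and not any(matches(q, doc) for q in must_not)
            and sum(matches(q, doc) for q in should) >= min_should
        )
    raise ValueError(f"unsupported query type: {kind}")


students = {
    "s1": {"name": "alex", "centerId": "N001", "courseId": "ncjava", "batchId": "nb1"},
    "s2": {"name": "John", "centerId": "N001", "courseId": "nc02", "batchId": "ncb2"},
    "s3": {"name": "David", "centerId": "N001", "courseId": "ncjava"},
    # Hypothetical extra student: right center and course, wrong batch.
    "s4": {"name": "Eve", "centerId": "N001", "courseId": "ncjava", "batchId": "nb2"},
}

query = {"bool": {"must": [
    {"match": {"centerId": "N001"}},
    {"match": {"courseId": "ncjava"}},
    {"bool": {"minimum_should_match": 1, "should": [
        {"match": {"batchId": "nb1"}},
        {"bool": {"must_not": {"exists": {"field": "batchId"}}}},
    ]}},
]}}

matching = [key for key, doc in students.items() if matches(query, doc)]
print(matching)  # ['s1', 's3']
```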
Here's the docs for each of these components:
bool query
exists query
minimum_should_match

multiple sub query inside one query elasticsearch

I have an index named dictionary, which contains fields like keyword, mapped keyword and category filter.
Keyword   Mapped Keyword   Category
-------   --------------   -------------
apple     apple iphone     smartphones
apple     apple watch      smart watches
apple     apple ipad       tablets
So if a user searches for apple, internally the query will search the mapped keywords with their respective categories, as in the query below.
SELECT * FROM products where (title= "*apple*" AND title="*iphone*" and category="smartphones") OR (title= "*apple*" AND title="*ipad*" and category="tablets") OR (title= "*apple*" AND title="*watch*" and category="smart watches")
Below is the corresponding Elasticsearch query I have written.
{
"query": {
"bool": {
"should": [
{
"bool": {
"must": [
{
"match" : {
"title" : {
"query" : "apple iphone",
"operator" : "and"
}
}
},
{
"term": {
"category.raw": "smartphones"
}
}
]
}
},
{
"bool": {
"must": [
{
"match" : {
"title" : {
"query" : "apple watch",
"operator" : "and"
}
}
},
{
"term": {
"category.raw": "smart watches"
}
}
]
}
},
{
"bool": {
"must": [
{
"match" : {
"title" : {
"query" : "apple ipad",
"operator" : "and"
}
}
},
{
"term": {
"category.raw": "tablets"
}
}
]
}
}
],
"minimum_should_match": 1
}
}
}
Is the above query right? Any changes needed in the above query?
Is there any way to get top 10 results of each sub query in elasticsearch by adding some parameter in this query?
Yes, your query looks fine as far as I can tell. "minimum_should_match": 1 isn't really necessary, that's the default behavior.
You might be able to impose that sort of logic using a function_score query (maybe with a script_score), but I think the better way to do that would be to just execute three different queries, and get the results for each. If you want to execute those multiple queries in one request, you can do that using the Multi Search API.
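As a rough sketch of the Multi Search route, the Python below builds the newline-delimited body that the _msearch endpoint expects: one header line and one body line per search, with a trailing newline, sent with Content-Type application/x-ndjson. The helper names are hypothetical, the index name products is taken from the SQL in the question, the category values come from the table, and "size": 10 is what limits each sub query to its own top 10.

```python
import json


def sub_query(title_terms, category, size=10):
    """One search per mapped keyword, limited to its own top `size` hits."""
    return {
        "size": size,
        "query": {
            "bool": {
                "must": [
                    {"match": {"title": {"query": title_terms, "operator": "and"}}},
                    {"term": {"category.raw": category}},
                ]
            }
        },
    }


def build_msearch_body(index, searches):
    """Serialize searches into the newline-delimited _msearch request body."""
    lines = []
    for search in searches:
        lines.append(json.dumps({"index": index}))  # header line
        lines.append(json.dumps(search))  # body line
    return "\n".join(lines) + "\n"  # the body has to end with a newline


searches = [
    sub_query("apple iphone", "smartphones"),
    sub_query("apple watch", "smart watches"),
    sub_query("apple ipad", "tablets"),
]
payload = build_msearch_body("products", searches)
print(payload)
```

The response contains a responses array with one entry per search, in the same order as the request, so the three top-10 lists stay separate.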

Must match multiple values

I have a query that works fine when I need the property of a document
to match just one value.
However I also need to be able to search with must with two values.
So if a banana has id 1 and a lemon has id 2 and I search for yellow
I will get both if I have 1 and 2 in the must clause.
But if i have just 1 I will only get the banana.
{
"from": 0,
"size": 20,
"query": {
"bool": {
"should": [
{ "match":
{ "fruit.color": "yellow" }}
],
"must" : [
{ "match": { "fruit.id" : "1" } }
]
}
}
}
I haven't found a way to search with two values with must.
Is that possible?
If the document "must" be returned only if the id is 1 or 2, that sounds like another should clause. If I'm understanding your question properly, you want documents with either id 1 OR id 2. Additionally, if the color is yellow, give it a higher score.
Here's one way you might achieve what you're looking for:
{
"query": {
"bool": {
"should": {
"match": {
"fruit.color": "yellow"
}
},
"must": {
"bool": {
"should": [
{
"match": {
"fruit.id": "1"
}
},
{
"match": {
"fruit.id": "2"
}
}
]
}
}
}
}
}
Here I put the two match queries in the should clause of a separate bool query. This achieves the OR behavior you are looking for.
Have another look at the Bool Query documentation and take note of the nuances of should. It behaves differently by default depending on whether or not there is a sibling must clause and whether or not the bool query is being executed in filter context.
Another key option that is adjustable and can help you achieve your expected results is the minimum_should_match parameter. Have a look at this documentation page.
Instead of a match query, you could simply try the terms query for ORing between multiple terms.
Match queries are generally used for analyzed fields. For exact matching, you should use term queries:
{
"from": 0,
"size": 20,
"query": {
"bool": {
"should": [
{ "match": { "fruit.color": "yellow" } }
],
"must" : [
{ "terms": { "fruit.id": ["1","2"] } }
]
}
}
}
A term or terms query is the right way to fetch an exact text or id; a match query analyzes the search input and compares it with the analyzed tokens of the field.
Ex:
id = '4'
id = '44'
Depending on how the field is analyzed (for example with an n-gram tokenizer), a match query for id = 4 can return both 4 and 44, since 4 matches a token in both. This is where the terms query comes into play.
The same search using a terms query on a keyword field will return 4 only.
So the accepted answer is absolutely wrong. Use @Rahul's answer. There is just one more thing you need to do: instead of text, you need to index the field as a keyword.
Example of indexing a field as both text and keyword (the mapping is for a flat field; for a nested field, change it accordingly).
{
"index_patterns": [ "test" ],
"mappings": {
"kb_mapping_doc": {
"_source": {
"enabled": true
},
"properties": {
"id": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
}
}
Using @Rahul's answer as-is may not work, because your field might be analyzed as text.
id - accesses the text field
id.keyword - accesses the keyword field
So the query would be:
{
"from": 0,
"size": 20,
"query": {
"bool": {
"should": [{
"match": {
"color": "yellow"
}
}],
"must": [{
"terms": {
"id.keyword": ["1", "2"]
}
}]
}
}
}
So I would say the accepted answer can return false results. Please use @Rahul's answer with the corresponding mapping.

Elasticsearch DSL query from an SQL statement

I'm new to Elasticsearch. I don't think I fully understand the concept of queries and filters. In my case I just want to use filters, as I don't want to use advanced features like scoring.
How would I convert the following SQL statement into elasticsearch query?
SELECT * FROM advertiser
WHERE company like '%com%'
AND sales_rep IN (1,2)
What I have so far:
curl -XGET 'localhost:9200/advertisers/advertiser/_search?pretty=true' -d '
{
"query" : {
"bool" : {
"must" : {
"wildcard" : { "company" : "*com*" }
}
}
},
"size":1000000
}'
How do I add the OR filters on the sales_rep field?
Thanks
Add a "should" clause after your must clause. Because the bool query also contains a must clause, the should clauses are optional by default, so set "minimum_should_match" (called "minimum_number_should_match" in some older versions) to 1 to require at least one of them to match. Check out the bool query docs.
For your case, this should work.
"should" : [
{
"term" : { "sales_rep_id" : "1" }
},
{
"term" : { "sales_rep_id" : "2" }
}
],
"minimum_should_match" : 1
The same concept works for bool filters. Just change "query" to "filter". The bool filter docs are here.
I come across this post 4 years too late...
Anyways, perhaps the following code could be useful...
{
"query": {
"filtered": {
"query": {
"wildcard": {
"company": "*com*"
}
},
"filter": {
"bool": {
"should": [
{
"terms": {
"sales_rep_id": [ "1", "2" ]
}
}
]
}
}
}
}
}
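Note that the filtered query used above was deprecated in Elasticsearch 2.0 and removed in 5.0. On newer versions, a rough equivalent puts both conditions in the filter clause of a bool query, which also skips scoring as the question asked. The Python sketch below only builds that request body; the field name sales_rep_id follows the answers above (the question calls it sales_rep), and sending the request is left out.

```python
import json

# Both conditions are ANDed; the terms query covers "sales_rep IN (1, 2)".
search_body = {
    "query": {
        "bool": {
            "filter": [
                {"wildcard": {"company": "*com*"}},
                {"terms": {"sales_rep_id": ["1", "2"]}},
            ]
        }
    }
}

print(json.dumps(search_body, indent=2))
```

Clauses under filter must all match and do not contribute to the score, so no minimum_should_match is needed here.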
