Is it possible to access a query term in a script field? - elasticsearch

I would like to construct an elasticsearch query in which I can search for a term and on-the-fly compute a new field for each found document, which is calculated based on some existing fields as well as the query term. Is this possible?
For example, let's say in my ES query I am searching for documents that have the keyword "amsterdam" in the "text" field.
"filter": [
{
"match_phrase": {
"text": {
"query": "amsterdam"
}
}
}]
Now I would also like to have a script field in my query, which computes some value based on other fields as well as the query.
So far, I have only found how to access the other fields of a document, using doc['someOtherField'], for example:
"script_fields" : {
"new_field" : {
"script" : {
"lang": "painless",
"source": """
if (doc['citizens'].value > 10000) {
return "large";
}
return "small";
"""
}
}
}
How can I integrate the query term, e.g. if I wanted to add to the if statement "if the query term starts with a-e"?

You're on the right track but script_fields are primarily used to post-process your documents' attributes — they won't help you filter any docs because they're run after the query phase.
With that being said, you can use scripts to filter your documents through script queries. Before you do that, though, you should explore alternatives.
In other words, scripts should be used when all other mechanisms and techniques have been exhausted.
Back to your example. I see three possibilities off the top of my head.
Match phrase prefix queries as a group of bool-should subqueries:
POST your-index/_search
{
"query": {
"bool": {
"must": [
{
"bool": {
"should": [
{
"match_phrase_prefix": {
"text_field": "a"
}
},
{
"match_phrase_prefix": {
"text_field": "b"
}
},
{
"match_phrase_prefix": {
"text_field": "c"
}
},
... till the letter "e"
]
}
}
]
}
}
}
A regexp query:
POST your-index/_search
{
"query": {
"bool": {
"must": [
{
"regexp": {
"text_field": "[a-e].+"
}
}
]
}
}
}
Script queries using .charAt comparisons:
POST your-index/_search
{
"query": {
"bool": {
"must": [
{
"script": {
"script": {
"source": """
char c = doc['text_field.keyword'].value.charAt(0);
return c >= params.gte.charAt(0) && c <= params.lte.charAt(0);
""",
"params": {
"gte": "a",
"lte": "e"
}
}
}
}
]
}
}
}
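As for the script field in the original question: as far as I know, a Painless script cannot see the query that matched the document, so the usual workaround is to send the term twice, once in the query and once in the script's params. Below is a minimal Python sketch that builds such a request body. The helper name build_search_body is hypothetical, the field names "text" and "citizens" come from the question, and the script assumes a non-empty term; it has not been run against a cluster.

```python
import json


def build_search_body(term: str) -> dict:
    """Build a search body that hands `term` to both the query and the script.

    Hypothetical helper: the script reads the term from `params`,
    because it has no access to the query itself.
    """
    # Painless source; params.term is the same string used in the filter below.
    # The a-e check reuses the charAt comparison from the script query above.
    source = (
        "char c = params.term.charAt(0); "
        "if (c >= params.gte.charAt(0) && c <= params.lte.charAt(0) "
        "&& doc['citizens'].value > 10000) { return 'large'; } "
        "return 'small';"
    )
    return {
        "query": {
            "bool": {
                "filter": [{"match_phrase": {"text": {"query": term}}}]
            }
        },
        "script_fields": {
            "new_field": {
                "script": {
                    "lang": "painless",
                    "source": source,
                    "params": {"term": term, "gte": "a", "lte": "e"},
                }
            }
        },
    }


if __name__ == "__main__":
    # Print the body so it can be pasted after POST your-index/_search
    print(json.dumps(build_search_body("amsterdam"), indent=2))
```

Keeping the term in params rather than concatenating it into the source also means the script text stays identical between requests.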
If you're relatively new to ES and would love to see real-world examples, check out my recently released Elasticsearch Handbook. One chapter is dedicated to scripting and as it turns out, you can achieve a lot with scripts (if of course executed properly).

Related

Elastic Search 7.9: exact on multiple fields

I am using the latest version of Elasticsearch (7.9) and I'm trying to find a good way to do a multiple-term query as an AND.
Essentially what I want to do is:
select * where field1 === 'word' AND field2 === 'different word'
I am currently using term to do an exact keyword match on field1, but adding the second field is causing me some grief.
{
"query": {
"term": {
"field1": {
"value": "word"
}
}
}
}
This is my current query. I have tried using bool. I came across answers for previous versions where I could maybe use a filtered query, but I can't seem to get that to work either.
Can someone help me, please? It's really doing my nut.
EDIT
Here is what I've tried with a bool / must with multiple terms. I get no results, even though I know that in this case the query should return data:
{
"query": {
"bool": {
"must": [
{
"term": {
"field1": {
"value": "word"
}
}
},
{
"term": {
"field2": {
"value": "other word"
}
}
}
]
}
}
}
OK, so, fun fact: the bool / must approach I tried does work. What you need to remember is that keyword matches (which is what I'm using) are case sensitive. ES doesn't analyze or filter the data when a field's type is set to keyword.
It's also possible to use a bool with a filter to get the same results (once you account for the field type):
{
"query": {
"bool": {
"must":
{
"term": {
"field1": {
"value": "word"
}
}
},
"filter":
{
"term": {
"field2": {
"value": "other word"
}
}
}
}
}
}
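Since both conditions are exact matches and scoring is not needed, all of them can also go into the filter list. Here is a small Python sketch that builds such a query from a dict of field/value pairs. The helper name exact_and is made up for illustration, and the values must match the indexed keyword exactly, including case.

```python
def exact_and(conditions: dict) -> dict:
    """Return a bool query in which every field must equal its value exactly.

    Hypothetical helper: each pair becomes one term clause in the filter
    list, and all filter clauses must match (AND semantics).
    """
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {field: {"value": value}}}
                    for field, value in conditions.items()
                ]
            }
        }
    }


if __name__ == "__main__":
    print(exact_and({"field1": "word", "field2": "other word"}))
```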

Elastic search query using python list

How do I pass a list as query string to match_phrase query?
This works:
{"match_phrase": {"requestParameters.bucketName": {"query": "xxx"}}},
This does not:
{
"match_phrase": {
"requestParameters.bucketName": {
"query": [
"auditloggingnew2232",
"config-bucket-123",
"web-servers",
"esbck-essnap-1djjegwy9fvyl",
"tempexpo",
]
}
}
}
match_phrase simply does not support multiple values.
You can either use a should query:
GET _search
{
  "query": {
    "bool": {
      "should": [
        {
          "match_phrase": {
            "requestParameters.bucketName": {
              "query": "auditloggingnew2232"
            }
          }
        },
        {
          "match_phrase": {
            "requestParameters.bucketName": {
              "query": "config-bucket-123"
            }
          }
        }
      ]
    },
    ...
  }
}
or, as @Val pointed out, a terms query:
{
"query": {
"terms": {
"requestParameters.bucketName": [
"auditloggingnew2232",
"config-bucket-123",
"web-servers",
"esbck-essnap-1djjegwy9fvyl",
"tempexpo"
]
}
}
}
that functions like an OR on exact terms.
I'm assuming that 1) the bucket names in question are unique and 2) that you're not looking for partial matches. If that's the case, plus if there are barely any analyzers set on the field bucketName, match_phrase may not even be needed! terms will do just fine. The difference between term and match_phrase queries is nicely explained here.
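Since the question starts from a Python list, both variants can be generated from it. This is a rough sketch; the function names terms_query and any_phrase_query are hypothetical, and the bucket names are the ones from the question.

```python
from typing import List


def terms_query(field: str, values: List[str]) -> dict:
    """Exact OR: the field must equal one of the values as indexed."""
    return {"query": {"terms": {field: list(values)}}}


def any_phrase_query(field: str, phrases: List[str]) -> dict:
    """Analyzed OR: one match_phrase clause per list element.

    A bool query that contains only should clauses requires at least one
    of them to match, so this behaves like an OR.
    """
    return {
        "query": {
            "bool": {
                "should": [
                    {"match_phrase": {field: {"query": phrase}}}
                    for phrase in phrases
                ]
            }
        }
    }


if __name__ == "__main__":
    buckets = ["auditloggingnew2232", "config-bucket-123", "web-servers"]
    print(terms_query("requestParameters.bucketName", buckets))
    print(any_phrase_query("requestParameters.bucketName", buckets))
```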

Elasticsearch : filter results based on the date range

I'm using Elasticsearch 6.6 and trying to extract multiple records based on multiple values (email_address) passed to a bool query, restricted to a date range. For example, I want to extract information about a few employees based on their email_address (annie@test.com, charles@test.com, heman@test.com) and a period given by project_date (2019-01-01).
I used a should clause, but unfortunately it pulls all the records in the date range, i.e. it even pulls other employees' information from project_date 2019-01-01.
{
"query": {
"bool": {
"should": [
{ "match": { "email_address": "annie@test.com" }},
{ "match": { "email_address": "chalavadi@test.com" }}
],
"filter": [
{ "range": { "project_date": { "gte": "2019-08-01" }}}
]
}
}
}
I also tried a must clause but got no results. Could you please help me find employees by their email_address within the date range?
Thanks in advance.
should (OR) clauses are optional
Quoting from this article.
"In a query, if must and filter queries are present, the should query occurrence then helps to influence the score. However, if bool query is in a filter context or has neither must nor filter queries, then at least one of the should queries must match a document."
So in your query, should only influences the score and does not actually filter documents. You must wrap should in must, or move it into filter (if scoring is not required).
GET employeeindex/_search
{
"query": {
"bool": {
"filter": {
"range": {
"projectdate": {
"gte": "2019-01-01"
}
}
},
"must": [
{
"bool": {
"should": [
{
"term": {
"email.raw": "abc@text.com"
}
},
{
"term": {
"email.raw": "efg@text.com"
}
}
]
}
}
]
}
}
}
You can also replace the should clause with a terms clause, as in @AlwaysSunny's answer.
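A third option along the same lines is to keep should next to filter and set minimum_should_match, which makes at least one should clause mandatory again. Here is a Python sketch under that assumption. The helper name emails_in_range is hypothetical, and the field names email.raw and projectdate are taken from the query above.

```python
from typing import List


def emails_in_range(emails: List[str], start_date: str) -> dict:
    """Match documents for any of `emails` dated on or after `start_date`.

    Hypothetical helper. Without minimum_should_match the should clauses
    would only affect scoring, because a filter clause is present.
    """
    return {
        "query": {
            "bool": {
                "should": [{"term": {"email.raw": email}} for email in emails],
                # Require at least one of the should clauses to match.
                "minimum_should_match": 1,
                "filter": [
                    {"range": {"projectdate": {"gte": start_date}}}
                ],
            }
        }
    }


if __name__ == "__main__":
    print(emails_in_range(["abc@text.com", "efg@text.com"], "2019-01-01"))
```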
You can do it more concisely with terms and range inside filter. Your existing query doesn't work as expected because of the should clause: next to a filter it becomes optional and no longer restricts the results. Read more here:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-bool-query.html
{
"query": {
"bool": {
"filter": [
{
"terms": {
"email_address.keyword": [
"annie@test.com", "chalavedi@test.com"
]
}
},
{
"range": {
"project_date": {
"gte": "2019-08-01"
}
}
}
]
}
}
}

Fast elasticsearch CASE WHEN THEN ELSE equivalent?

I need to build an exclusive bucketing aggregation in Elasticsearch (ie. the documents are assigned to the FIRST bucket to meet the criterion, not ALL buckets that meet it as the filters might overlap - this is the same behavior as a CASE WHEN THEN ELSE in SQL environments). Currently I am using a Filters Aggregation coupled with a Bool Query/Filter to achieve what I want. The idea is to use the "must" and "must_not" parts of the "Bool Query" where the "must" is my filter and the "must_not" is the collection of all the other filters that have already been used previously. An example would be:
GET _search
{
"query":{"match_all":{}},
"size":0,
"aggs":{
"bin_1": {
"filter": {
"bool": {
"must": { <filter1> },
"must_not": { <empty> }
}
}
},
"bin_2": {
"filter": {
"bool": {
"must": { <filter2> },
"must_not": { <filter1> }
}
}
},
"bin_3": {
"filter": {
"bool": {
"must": { <filter3> },
"must_not": { <filter1>, <filter2> }
}
}
},
"bin_else": {
"filter": {
"bool": {
"must": { <empty> },
"must_not": { <filter1>, <filter2>, <filter3> }
}
}
}
}
}
In a relational approach, the same would be achieved by the CASE WHEN clause like so:
CASE WHEN <filter1> THEN <bin_1>
WHEN <filter2> THEN <bin_2>
WHEN <filter3> THEN <bin_3>
ELSE <bin_else>
END
The problem with this approach is that it gets slower and slower the more buckets I add (in my real case I even have nested buckets). Is there any language support for exclusive bucketing like this in Elastic? Or any other faster approach that would yield the same results?
Thank you!
I think the solution would be to use script fields. A script uses if/else logic, so no extra conditions are evaluated once a branch matches. I do not know what kind of filter you are using, but I think it should be possible to implement any of them this way. I will write here an equivalent of
SELECT
CASE WHEN <filter1> THEN <bin_1>
WHEN <filter2> THEN <bin_2>
ELSE <bin_else>
END as binning
FROM SOMETHING
Implemented using script fields in painless language. As is described here:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-script-fields.html
and painless here:
https://www.elastic.co/guide/en/elasticsearch/painless/5.6/painless-examples.html
GET _search
{
  "query": { "match_all": {} },
  "script_fields": {
    "binning": {
      "script": {
        "lang": "painless",
        "source": "if (<filter>) {return <bin1>;} else if (<filter2>) {return <bin2>;} else {return <bin3>;}"
      }
    }
  }
}
where the "filter" would be something like: doc['my_field'].value == "value1" where 'my_field' is the field that you use in the filter.
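With many bins, the chain is easier to generate than to write by hand. The following Python sketch builds the script object from an ordered list of (condition, label) pairs. The helper name case_when_script and the params keys labels and default_label are invented for illustration, and the conditions are assumed to be valid Painless expressions such as the one above.

```python
from typing import List, Tuple


def case_when_script(branches: List[Tuple[str, str]], default: str) -> dict:
    """Build a Painless script object with first-match-wins semantics.

    Hypothetical helper. Each branch returns as soon as its condition is
    true, so later conditions are never evaluated for that document,
    which mirrors CASE WHEN ... ELSE ... END.
    """
    statements = [
        f"if ({condition}) {{ return params.labels[{index}]; }}"
        for index, (condition, _label) in enumerate(branches)
    ]
    # Reached only when no condition matched: the ELSE branch.
    statements.append("return params.default_label;")
    return {
        "lang": "painless",
        "source": " ".join(statements),
        # Labels travel as params so they need no escaping in the source.
        "params": {
            "labels": [label for _condition, label in branches],
            "default_label": default,
        },
    }


if __name__ == "__main__":
    script = case_when_script(
        [
            ("doc['my_field'].value == 'value1'", "bin_1"),
            ("doc['my_field'].value == 'value2'", "bin_2"),
        ],
        "bin_else",
    )
    print(script["source"])
```

The returned dict is meant to be placed under "script" in the script_fields entry.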

Must match multiple values

I have a query that works fine when I need the property of a document
to match just one value.
However I also need to be able to search with must with two values.
So if a banana has id 1 and a lemon has id 2 and I search for yellow
I will get both if I have 1 and 2 in the must clause.
But if I have just 1, I will only get the banana.
{
"from": 0,
"size": 20,
"query": {
"bool": {
"should": [
{ "match":
{ "fruit.color": "yellow" }}
],
"must" : [
{ "match": { "fruit.id" : "1" } }
]
}
}
}
I haven't found a way to search with two values with must.
Is that possible?
If the document "must" be returned only if the id is 1 or 2, that sounds like another should clause. If I'm understanding your question properly, you want documents with either id 1 OR id 2. Additionally, if the color is yellow, give it a higher score.
Here's one way you might achieve what you're looking for:
{
"query": {
"bool": {
"should": {
"match": {
"fruit.color": "yellow"
}
},
"must": {
"bool": {
"should": [
{
"match": {
"fruit.id": "1"
}
},
{
"match": {
"fruit.id": "2"
}
}
]
}
}
}
}
}
Here I put the two match queries in the should clause of a separate bool query. This achieves the OR behavior you are looking for.
Have another look at the Bool Query documentation and take note of the nuances of should. It behaves differently by default depending on whether or not there is a sibling must clause and whether or not the bool query is being executed in filter context.
Another key option that is adjustable and can help you achieve your expected results is the minimum_should_match parameter. Have a look at this documentation page.
Instead of a match query, you could simply try the terms query for ORing between multiple terms.
Match queries are generally used for analyzed fields. For exact matching, you should use term queries:
{
"from": 0,
"size": 20,
"query": {
"bool": {
"should": [
{ "match": { "fruit.color": "yellow" } }
],
"must" : [
{ "terms": { "fruit.id": ["1","2"] } }
]
}
}
}
A term or terms query is the right way to fetch an exact text or id. A match query analyzes the search input and compares it with the analyzed tokens of the field, so it can match more than the exact value.
Ex:
id = '4'
id = '44'
A match query for id = 4 can return both 4 and 44 if the field's analyzer emits the token 4 for both values (for example, with an n-gram analyzer). This is where the terms query comes into play.
The same search using a terms query on a keyword field will return 4 only.
So the accepted answer is not reliable for exact matching. Use @Rahul's answer instead, with one more change: the field needs to be indexed as a keyword rather than as text.
Here is an example of indexing a field both as text and as keyword (the mapping is for a flat field; for a nested field, change it accordingly).
{
"index_patterns": [ "test" ],
"mappings": {
"kb_mapping_doc": {
"_source": {
"enabled": true
},
"properties": {
"id": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
}
}
Using @Rahul's answer as-is may not work, because the field might be analyzed as text.
id - accesses the text field
id.keyword - accesses the keyword field
So the query would be:
{
"from": 0,
"size": 20,
"query": {
"bool": {
"should": [{
"match": {
"color": "yellow"
}
}],
"must": [{
"terms": {
"id.keyword": ["1", "2"]
}
}]
}
}
}
So I would say the accepted answer can return false positives. Please use @Rahul's answer with the corresponding mapping.
