Is there a way to build an Elastic query with changing search values? - elasticsearch

I want to use Elastic in PHP to process a search request from my website. For example, I have the search parameters
name
age
height
weight.
But it should not be necessary to always search on all parameters.
So it could be that only (name AND age) have values and (height AND weight) do not.
Is there a way to build one query with flexible/changing input values?
The query below would not work when there are no search values for (height AND weight).
{
"query": {
"bool": {
"should": [
{ "match": { "name.keyword": "Anna" } },
{ "match": { "age": "30" } },
{ "match": { "height": "180" } },
{ "match": { "weight": "70" } }
]
}
}
}

Search templates to the rescue:
POST _scripts/my-search-template
{
"script": {
"lang": "mustache",
"source": """
{
"query": {
"bool": {
"should": [
{{#name}}
{ "match": { "name.keyword": "{{name}}" } },
{{/name}}
{{#age}}
{ "match": { "age": "{{age}}" } },
{{/age}}
{{#height}}
{ "match": { "height": "{{height}}" } },
{{/height}}
{{#weight}}
{ "match": { "weight": "{{weight}}" } },
{{/weight}}
{ "match_none": { } }
]
}
}
}
"""
}
}
Note that, since you don't know in advance how many criteria you will have, the last clause (match_none) never matches and is only there to keep the JSON valid (i.e. so that the comma after the last rendered clause isn't left dangling).
You can then run your query like this:
POST my-index/_search/template
{
"id": "my-search-template",
"params": {
"name": "Anna",
"age": 30
}
}
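To see why the trailing match_none keeps the rendered JSON valid, here is a rough Python imitation of what the template produces. It is only a sketch: render_template and FIELDS are hypothetical names, values are assumed to be simple strings or numbers that need no JSON escaping, and the real rendering is done by Elasticsearch's mustache engine, not by this code.

```python
import json

# Hypothetical (param name, index field) pairs mirroring the template sections.
FIELDS = [("name", "name.keyword"), ("age", "age"),
          ("height", "height"), ("weight", "weight")]


def render_template(params: dict) -> str:
    """Roughly imitate the mustache template: one clause per present param."""
    clauses = ""
    for param, field in FIELDS:
        value = params.get(param)
        if value not in (None, ""):  # approximates a skipped {{#section}}
            # Every clause ends with a comma, exactly like in the template.
            clauses += '{ "match": { "%s": "%s" } },\n' % (field, value)
    # The always-present match_none absorbs the dangling comma.
    clauses += '{ "match_none": { } }'
    return '{ "query": { "bool": { "should": [\n%s\n] } } }' % clauses


rendered = render_template({"name": "Anna", "age": 30})
query = json.loads(rendered)  # parses cleanly: no trailing comma left over
print(json.dumps(query, indent=2))
```

Because match_none never matches, it does not change which documents a should-only bool query returns; if no parameters are supplied at all, the query matches nothing.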

You need to handle this in the application that constructs your Elasticsearch query. That is easy to do there, because the application knows which search parameter values it received from the UI: include a field in the Elasticsearch query only if its value is not null.
Elasticsearch doesn't support if...else-like conditions in the query itself.
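A minimal sketch of that application-side approach, written in Python for brevity (the same logic carries over to PHP). FIELD_MAP and build_query are hypothetical names, and the match_all fallback for an empty search is an assumption, not something this answer prescribes. The clauses are kept in should, as in the question; switch to must if every supplied criterion has to match.

```python
import json
from typing import Any, Dict, Optional

# Hypothetical mapping from UI parameter names to index fields.
FIELD_MAP = {"name": "name.keyword", "age": "age",
             "height": "height", "weight": "weight"}


def build_query(params: Dict[str, Optional[Any]]) -> Dict[str, Any]:
    """Return a search body with one match clause per non-null parameter."""
    clauses = [
        {"match": {FIELD_MAP[key]: value}}
        for key, value in params.items()
        if key in FIELD_MAP and value is not None
    ]
    if not clauses:
        # Assumed fallback: no criteria at all means "return everything".
        return {"query": {"match_all": {}}}
    return {"query": {"bool": {"should": clauses}}}


body = build_query({"name": "Anna", "age": 30, "height": None, "weight": None})
print(json.dumps(body))
```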

TL;DR
There are multiple ways to address your problem in Elasticsearch.
You could play with the minimum_should_match parameter.
You could use search templates with conditions.
You could also write more complex bool queries that enumerate the possible combinations for a match.
You could also use scripts to program the logic you want.
Minimum should match
POST /_bulk
{"index":{"_index":"73121817"}}
{"name": "ana", "age": 1, "height": 180, "weight": 70}
{"index":{"_index":"73121817"}}
{"name": "jack", "height": 180, "weight": 70}
{"index":{"_index":"73121817"}}
{"name": "emma", "age": 1, "weight": 70}
{"index":{"_index":"73121817"}}
{"name": "william", "age": 1, "height": 180}
{"index":{"_index":"73121817"}}
{"name": "jenny", "weight": 70}
{"index":{"_index":"73121817"}}
{"name": "marco", "age": 1}
{"index":{"_index":"73121817"}}
{"name": "giulia", "height": 180}
{"index":{"_index":"73121817"}}
{"name": "paul"}
GET 73121817/_search
{
"query": {
"bool": {
"should": [
{ "match": { "name.keyword": "Anna" } },
{ "match": { "age": "30" } },
{ "match": { "height": "180" } },
{ "match": { "weight": "70" } }
],
"minimum_should_match": 2
}
}
}
With minimum_should_match set to 2, only two documents are returned, ana and jack (both match on height and weight).
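The arithmetic behind that result can be checked with a plain-Python model of the eight documents: count how many of the four should clauses each document satisfies and keep those with at least two. This is a simplification (exact equality stands in for match on numeric fields, and name.keyword is treated as an exact, case-sensitive value), not how Elasticsearch evaluates the query.

```python
# Plain-Python copy of the bulk-indexed documents above (not Elasticsearch).
docs = [
    {"name": "ana", "age": 1, "height": 180, "weight": 70},
    {"name": "jack", "height": 180, "weight": 70},
    {"name": "emma", "age": 1, "weight": 70},
    {"name": "william", "age": 1, "height": 180},
    {"name": "jenny", "weight": 70},
    {"name": "marco", "age": 1},
    {"name": "giulia", "height": 180},
    {"name": "paul"},
]

# The four should clauses; "Anna" never equals "ana" (exact, case-sensitive).
criteria = {"name": "Anna", "age": 30, "height": 180, "weight": 70}


def matching_clauses(doc: dict) -> int:
    """Count how many should clauses a document satisfies."""
    return sum(1 for field, value in criteria.items() if doc.get(field) == value)


minimum_should_match = 2
hits = [d["name"] for d in docs if matching_clauses(d) >= minimum_should_match]
print(hits)  # ['ana', 'jack']
```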
Template queries
Val's answer above is quite complete.
You could also refer to the documentation.
Complex queries
Refer to the Stack Overflow post behind the link.
Scripted queries
GET 73121817/_search
{
"query": {
"bool": {
"filter": {
"script": {
"script": """
return (!doc["name.keyword"].empty && !doc["age"].empty);
"""
}
}
}
}
}

Related

ElasticSearch - score boosting using scripting

We have a specific use-case for our ElasticSearch instance: we store documents which contain proper names, dates of birth, addresses, ID numbers, and other related info.
We use a name-matching plugin which overrides the default scoring of ES and assigns a relevancy score between 0 and 1 based on how closely the name matches.
What we need to do is boost that score by a certain amount if other fields match. I have started to read up on ES scripting to achieve this. I need assistance on the script part of the query. Right now, our query looks like this:
{
"size":100,
"query":{
"bool":{
"should":[
{"match":{"Name":"John Smith"}}
]
}
},
"rescore":{
"window_size":100,
"query":{
"rescore_query":{
"function_score":{
"doc_score":{
"fields":{
"Name":{"query_value":"John Smith"},
"DOB":{
"function":{
"function_score":{
"script_score":{
"script":{
"lang":"painless",
"params":{
"query_value":"01-01-1999"
},
"inline":"if **<HERE'S WHERE I NEED ASSISTANCE>**"
}
}
}
}
}
}
}
}
},
"query_weight":0.0,
"rescore_query_weight":1.0
}
}
The Name field will always be required in a query and is the basis for the score, which is returned in the default _score field. For ease of demonstration, we'll just add one additional field, DOB, which, if matched, should boost the score by 0.1. I believe I'm looking for something along the lines of if (query_value == doc['DOB'].value) add 0.1 to _score.
So, what would be the correct syntax to be entered into the inline row to achieve this? Or, if the query requires other syntax revision, please advise.
EDIT #1 - it's important to highlight that our DOB field is a text field, not a date field.
Splitting this into a separate answer, as it solves the problem differently (i.e. by using script_score, as the OP proposed, instead of rewriting the query to avoid scripts).
Assuming the same mapping and data as the previous answer, a scripted version of the query might look like the following:
POST /employee/_search
{
"size": 100,
"query": {
"bool": {
"should": [
{
"match": {
"Name": "John"
}
},
{
"match": {
"Name": "Will"
}
}
]
}
},
"rescore": {
"window_size": 100,
"query": {
"rescore_query": {
"function_score": {
"query": {
"bool": {
"should": [
{
"match": {
"Name": "John"
}
},
{
"match": {
"Name": "Will"
}
}
]
}
},
"functions": [
{
"script_score": {
"script": {
"source": "double boost = 0.0; if (params['_source']['State'] == 'FL') { boost += 0.1; } if (params['_source']['DOB'] == '1965-05-24') { boost += 0.3; } return boost;",
"lang": "painless"
}
}
}
],
"score_mode": "sum",
"boost_mode": "sum"
}
},
"query_weight": 0,
"rescore_query_weight": 1
}
}
}
Two notes about the script:
The script uses params['_source'][field_name] to access the document, which is the only way to get access to text fields. This is significantly slower, as it requires reading documents directly from disk, though the penalty might not be too bad in the context of a rescore. You could instead use doc[field_name].value if the field were an aggregatable type, such as keyword, date, or something numeric.
DOB here is compared directly to a string. This is possible because we're using the _source field, and the JSON for the documents has the dates specified as strings. This is somewhat brittle, but will likely do the trick.
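A plain-Python model of the scoring arithmetic, applied to the three sample documents from the other answer, may help when tuning the weights. script_boost mirrors the painless source above; name_score is a stand-in for whatever the Name match (or the custom name-matching plugin) produces, and it is assumed that the inner function_score query re-produces that same score. This illustrates the arithmetic only; it is not Elasticsearch code.

```python
# Sample documents from the other answer, as _source dicts.
employees = [
    {"Name": "John Smith", "State": "NY", "DOB": "1970-01-01"},
    {"Name": "John C. Reilly", "State": "CA", "DOB": "1965-05-24"},
    {"Name": "Will Ferrell", "State": "FL", "DOB": "1967-07-16"},
]

QUERY_WEIGHT = 0.0          # "query_weight": 0
RESCORE_QUERY_WEIGHT = 1.0  # "rescore_query_weight": 1


def script_boost(source: dict) -> float:
    """Mirror the script: +0.1 for State FL, +0.3 for the exact DOB string."""
    boost = 0.0
    if source.get("State") == "FL":
        boost += 0.1
    if source.get("DOB") == "1965-05-24":
        boost += 0.3
    return boost


def final_score(name_score: float, source: dict) -> float:
    # boost_mode "sum": the script result is added to the inner query's score.
    rescore = name_score + script_boost(source)
    # Default rescorer combination: weighted original + weighted rescore;
    # query_weight 0 discards the original score entirely.
    return QUERY_WEIGHT * name_score + RESCORE_QUERY_WEIGHT * rescore


for emp in employees:
    print(emp["Name"], round(final_score(0.5, emp), 3))
```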
Assuming static weights per additional field, you can accomplish this without using scripting (though you may need to use script_score for any more complex weighting). To solve your issue of directly adding to a document's original score, your rescoring query will need to be a function score query that:
Composes queries for the additional fields in a should clause of the function score's main query (i.e. it only produces scores for documents matching at least one additional field)
Uses one function per additional field, with the filter set to select documents with some value for that field, and a weight to specify how much the score should increase (or some other scoring function if desired)
Mapping (as template)
Adding a State and DOB field for the sake of example (to make sure multiple additional fields contribute to the score correctly)
PUT _template/employee_template
{
"index_patterns": ["employee"],
"settings": {
"number_of_shards": 1
},
"mappings": {
"_doc": {
"properties": {
"Name": {
"type": "text"
},
"State": {
"type": "keyword"
},
"DOB": {
"type": "date"
}
}
}
}
}
Sample data
POST /employee/_doc/_bulk
{"index":{}}
{"Name": "John Smith", "State": "NY", "DOB": "1970-01-01"}
{"index":{}}
{"Name": "John C. Reilly", "State": "CA", "DOB": "1965-05-24"}
{"index":{}}
{"Name": "Will Ferrell", "State": "FL", "DOB": "1967-07-16"}
Query
EDIT: Updated the query to include the original query in the new function score in an attempt to compensate for custom scoring plugins.
A few notes about the query below:
Setting the rescorer's score_mode to max effectively makes it a replace here, since the newly computed function score should always be greater than or equal to the original score
query_weight and rescore_query_weight are both set to 1 so that the two scores are compared on equal scales during the score_mode: max comparison
In the function_score query:
score_mode: sum will add together all the scores from functions
boost_mode: sum will add the sum of the functions to the score of the query
POST /employee/_search
{
"size": 100,
"query": {
"bool": {
"should": [
{
"match": {
"Name": "John"
}
},
{
"match": {
"Name": "Will"
}
}
]
}
},
"rescore": {
"window_size": 100,
"query": {
"rescore_query": {
"function_score": {
"query": {
"bool": {
"should": [
{
"match": {
"Name": "John"
}
},
{
"match": {
"Name": "Will"
}
}
],
"filter": {
"bool": {
"should": [
{
"term": {
"State": "CA"
}
},
{
"range": {
"DOB": {
"lte": "1968-01-01"
}
}
}
]
}
}
}
},
"functions": [
{
"filter": {
"term": {
"State": "CA"
}
},
"weight": 0.1
},
{
"filter": {
"range": {
"DOB": {
"lte": "1968-01-01"
}
}
},
"weight": 0.3
}
],
"score_mode": "sum",
"boost_mode": "sum"
}
},
"score_mode": "max",
"query_weight": 1,
"rescore_query_weight": 1
}
}
}
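The interplay of the two score_mode settings can be traced with a small plain-Python model of this query on the sample data. It assumes the inner bool query re-produces the original Name score (name_score) and treats the range filter as a string comparison on ISO dates; FUNCTIONS and final_score are hypothetical names, and this is an illustration, not Elasticsearch code.

```python
# Sample documents from this answer (State is a keyword, DOB a date).
employees = [
    {"Name": "John Smith", "State": "NY", "DOB": "1970-01-01"},
    {"Name": "John C. Reilly", "State": "CA", "DOB": "1965-05-24"},
    {"Name": "Will Ferrell", "State": "FL", "DOB": "1967-07-16"},
]

# (filter, weight) pairs mirroring the "functions" array above.
# ISO-8601 date strings compare correctly as plain strings.
FUNCTIONS = [
    (lambda doc: doc["State"] == "CA", 0.1),
    (lambda doc: doc["DOB"] <= "1968-01-01", 0.3),
]
QUERY_WEIGHT = 1.0
RESCORE_QUERY_WEIGHT = 1.0


def final_score(name_score: float, doc: dict) -> float:
    weights = [w for matches, w in FUNCTIONS if matches(doc)]
    if not weights:
        # Excluded by the rescore query's filter: original score is kept.
        return QUERY_WEIGHT * name_score
    # function_score score_mode/boost_mode "sum": query score + matching weights.
    rescore = name_score + sum(weights)
    # Rescorer score_mode "max": the larger weighted score wins.
    return max(QUERY_WEIGHT * name_score, RESCORE_QUERY_WEIGHT * rescore)


for emp in employees:
    print(emp["Name"], round(final_score(0.5, emp), 3))
```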

Elasticsearch query fails to return results when querying a nested object

I have an object which looks something like this:
{
"id": 123,
"language_id": 1,
"label": "Pablo de la Pena",
"office": {
"count": 2,
"data": [
{
"id": 1234,
"is_office_lead": false,
"office": {
"id": 1,
"address_line_1": "123 Main Street",
"address_line_2": "London",
"address_line_3": "",
"address_line_4": "UK",
"address_postcode": "E1 2BC",
"city_id": 1
}
},
{
"id": 5678,
"is_office_lead": false,
"office": {
"id": 2,
"address_line_1": "77 High Road",
"address_line_2": "Edinburgh",
"address_line_3": "",
"address_line_4": "UK",
"address_postcode": "EH1 2DE",
"city_id": 2
}
}
]
},
"primary_office": {
"id": 1,
"address_line_1": "123 Main Street",
"address_line_2": "London",
"address_line_3": "",
"address_line_4": "UK",
"address_postcode": "E1 2BC",
"city_id": 1
}
}
My Elasticsearch mapping looks like this:
"mappings": {
"item": {
"properties": {
"office": {
"properties": {
"data": {
"type": "nested"
}
}
}
}
}
}
My Elasticsearch query looks something like this:
GET consultant/item/_search
{
"from": 0,
"size": 24,
"query": {
"bool": {
"must": [
{
"term": {
"language_id": 1
}
},
{
"term": {
"office.data.office.city_id": 1
}
}
]
}
}
}
This returns zero results, however, if I remove the second term and leave it only with the language_id clause, then it works as expected.
I'm sure this is down to a misunderstanding on my part of how the nested object is flattened, but I'm out of ideas; I've tried all kinds of permutations of the query and mappings.
Any guidance hugely appreciated. I am using Elasticsearch 6.1.1.
I'm not sure whether you need the entire record or not; this solution returns every record that has language_id: 1 and an office.data.office.city_id value of 1.
GET consultant/item/_search
{
"from": 0,
"size": 100,
"query": {
"bool":{
"must": [
{
"term": {
"language_id": {
"value": 1
}
}
},
{
"nested": {
"path": "office.data",
"query": {
"match": {
"office.data.office.city_id": 1
}
}
}
}
]
}
}
}
I put three different records in my test index to guard against false hits, one with a different language_id and one with different office ids, and only the matching one was returned.
If you only need the office data, then that's a bit different but still solvable.
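Why the original top-level term query finds nothing can be shown with a toy Python model: with "type": "nested", each office.data entry is indexed as its own hidden document, and its fields are not copied onto the root document (unless include_in_parent is set in the mapping). split_nested and the trimmed document are illustrative assumptions, not Elasticsearch internals.

```python
# Trimmed copy of the question's document (only the fields the query uses).
doc = {
    "language_id": 1,
    "office": {"data": [
        {"id": 1234, "office": {"city_id": 1}},
        {"id": 5678, "office": {"city_id": 2}},
    ]},
}


def split_nested(source: dict):
    """Model indexing with office.data mapped as nested.

    Root fields stay on the root document; each office.data entry becomes
    its own hidden document, and its fields are not copied to the root.
    """
    root = {"language_id": [source["language_id"]]}
    hidden = [{"office.data.office.city_id": [entry["office"]["city_id"]]}
              for entry in source["office"]["data"]]
    return root, hidden


root, hidden = split_nested(doc)

# Top-level term query: only the root document's fields are searched.
top_level_hit = 1 in root.get("office.data.office.city_id", [])
# nested query on path office.data: a hit if any hidden document matches.
nested_hit = any(1 in h["office.data.office.city_id"] for h in hidden)
print(top_level_hit, nested_hit)  # False True
```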

Sort Elasticsearch results based on field value

Assuming I have 3 documents (users), and they have knowledge of multiple programming languages, with scores associated as described below, how can I search multiple fields (with multi-match, for example) and, if some search keywords hit a language, sort by its score?
// user1
{
"name": "John Bayes",
"prog_langs": [
{
"name": "python",
"score": 10
},
{
"name": "java",
"score": 500
}
]
}
// user2
{
"name": "John Russel",
"prog_langs": [
{
"name": "python",
"score": 100
},
{
"name": "PHP",
"score": 200
}
]
}
// user3
{
"name": "Terry Guy",
"prog_langs": [
{
"name": "C++",
"score": 600
},
{
"name": "Javascript",
"score": 200
}
]
}
For example, searching "John python" should return user1 and user2, with user2 showing up first.
I've been trying to use sort and functions, but I think they always use the lowest/highest/average value of score.
Thanks!
[Edit]
In the meantime I got it working in a test without full-text/multi-match, and found out I had to make "prog_langs" nested, so I changed the mapping and now it works as expected.
Now I'm only missing the part where a full-text multi-match search is merged with the current query.
Thanks again!
I managed to fix the query and now it's working as expected.
Before posting my solution, a few things to keep in mind:
I made a new mapping and added some nested objects, so my original query had to change (prog_langs is now of type nested)
I wanted at least two fields to match, and each of these mandatory fields should match at least once
{
"query": {
"bool": {
"must": [
{
"match": {
"name": {
"query": "john python",
"boost": 5
}
}
},
{
"bool": {
"should": [
{
"nested": {
"path": "prog_langs",
"query": {
"match": {
"prog_langs.name": {
"query": "john python",
"boost": 5
}
}
}
}
}
]
}
}
],
"should": [
{
"function_score": {
"query": {
"match": {
"prog_langs.name": "john python"
}
},
"functions": [
{
"script_score": {
"script": "_score * (1 + doc['prog_langs.score'].value)"
}
}
]
}
}
]
}
},
"highlight": {
"fields": {
"name": {},
"prog_langs.name": {}
}
}
}
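The ordering the question asks for can be written down as a plain-Python model, which may help when checking the query's output: a user qualifies if a keyword hits the name and at least one prog_langs entry, and users are then sorted by the highest score among the matching languages. Taking the maximum is an assumption (with a single matching language it makes no difference), keyword matching is simplified to lower-cased whole words, and none of this is Elasticsearch code.

```python
# The question's three users.
users = {
    "user1": {"name": "John Bayes", "prog_langs": [
        {"name": "python", "score": 10}, {"name": "java", "score": 500}]},
    "user2": {"name": "John Russel", "prog_langs": [
        {"name": "python", "score": 100}, {"name": "PHP", "score": 200}]},
    "user3": {"name": "Terry Guy", "prog_langs": [
        {"name": "C++", "score": 600}, {"name": "Javascript", "score": 200}]},
}


def rank(users: dict, keywords: str) -> list:
    terms = set(keywords.lower().split())
    ranked = []
    for user_id, user in users.items():
        name_hit = bool(terms & set(user["name"].lower().split()))
        # Nested semantics: each prog_langs entry is evaluated on its own,
        # so only the scores of languages that match a keyword count.
        lang_scores = [lang["score"] for lang in user["prog_langs"]
                       if lang["name"].lower() in terms]
        if name_hit and lang_scores:  # both "must" parts satisfied
            ranked.append((max(lang_scores), user_id))
    return [user_id for _, user_id in sorted(ranked, reverse=True)]


print(rank(users, "John python"))  # ['user2', 'user1']
```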

Elasticsearch 2.x, query for tag, and sort results by tag weigth

I am using Elasticsearch 2.3.
I have an index of books. Each book has tags, and each tag has a weight.
I want to get all books that have the requested tags, sorted by tag weight.
For example:
PUT book/book/0
{
"name": "book 0",
"tags": [
{"t": "comedy", "w": 30},
{"t": "drama", "w": 20}
]
}
PUT book/book/1
{
"name": "book 1",
"tags": [
{"t": "comedy", "w": 10},
{"t": "drama","w": 5},
{"t": "other", "w": 50}
]
}
PUT book/book/2
{
"name": "book 2",
"tags": [
{"t": "comedy", "w": 5},
{"t": "drama","w": 30},
]
}
PUT book/book/3
{
"name": "book 3",
"tags": [
{"t": "comedy", "w": 5},
{"t": "other","w": 30},
]
}
I want to search for all books that have the tags comedy and drama.
The expected result order is:
book 0 (20+30)
book 2 (30+5)
book 1 (10+5)
UPDATE:
I want to return only books that match both tags (and sort only by the requested tags). So if I search for 'drama' and 'comedy', only books that have both tags will be returned (in this case book 0, book 1, book 2), sorted by the requested tag weights.
How can I do this? Is there an example query?
Ibrahim's answer is correct if you always want to sum up all weights, even for tags that don't match your query.
If you only want to take into account the weights of the tags you're searching for, you'll have to index tags as a nested object. Otherwise, all t and w values are flattened into separate lists, losing the association between each tag and its weight in the process (described here).
Then you can use a function_score query wrapped in a nested query to sum up only the weights of the matching tags. You will have to enable scripting.
Here is an example:
GET /book/_search
{
"query": {
"nested": {
"path": "tags",
"query": {
"function_score": {
"query": {
"bool": {
"filter": [
{
"terms": {
"tags.t": [
"comedy",
"drama"
]
}
}
]
}
},
"functions": [
{
"script_score": {
"script": "return doc['tags.w'].value"
}
}
],
"boost_mode": "replace"
}
},
"score_mode": "sum"
}
}
}
=== EDIT following @Eyal Ch's comment ===
If only books matching BOTH tags (comedy and drama in the example) are to be returned, it gets a bit more complicated, as each search term needs its own nested query.
Here's an example:
GET /book/_search
{
"query": {
"bool": {
"must":
[
{
"nested": {
"path": "tags",
"query": {
"function_score": {
"query": {
"term": {
"tags.t": {
"value": "comedy"
}
}
},
"functions": [
{
"script_score": {
"script": "return doc['tags.w'].value"
}
}
],
"boost_mode": "replace"
}
}
}
},
{
"nested": {
"path": "tags",
"query": {
"function_score": {
"query": {
"term": {
"tags.t": {
"value": "drama"
}
}
},
"functions": [
{
"script_score": {
"script": "return doc['tags.w'].value"
}
}
],
"boost_mode": "replace"
}
}
}
}
]
}
}
}
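Assuming each book carries a given tag at most once (as in the sample data), the scores produced by the two nested queries above can be modelled in plain Python: every matching tag contributes its own weight, and the bool/must variant additionally drops books that lack one of the requested tags. nested_tag_ranking is a hypothetical helper, and this illustrates the arithmetic only, not Elasticsearch code.

```python
# Sample books from the question; each tag maps to its weight.
books = {
    "book 0": {"comedy": 30, "drama": 20},
    "book 1": {"comedy": 10, "drama": 5, "other": 50},
    "book 2": {"comedy": 5, "drama": 30},
    "book 3": {"comedy": 5, "other": 30},
}


def nested_tag_ranking(wanted: set, require_all: bool) -> list:
    """Sum only the weights of requested tags, as the nested queries do."""
    results = []
    for name, tags in books.items():
        matched = [weight for tag, weight in tags.items() if tag in wanted]
        if not matched:
            continue  # no nested tag matched: the book is not a hit
        if require_all and len(matched) < len(wanted):
            continue  # bool/must variant: every requested tag must match
        results.append((sum(matched), name))
    return [(name, score) for score, name in sorted(results, reverse=True)]


print(nested_tag_ranking({"comedy", "drama"}, require_all=False))
print(nested_tag_ranking({"comedy", "drama"}, require_all=True))
```

With require_all=True the order is book 0 (50), book 2 (35), book 1 (15), matching the order the question expects.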
Try this:
POST book/book/_search
{
"query": {
"match": {
"tags.t": "comedy drama"
}
},
"sort": [
{
"tags.w": {
"order": "desc",
"mode": "sum"
}
}
]
}
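For comparison, a plain-Python model of what this sort computes on the question's sample data. Because tags is not nested here, mode sum adds up all of a book's weights, including tags that were not searched for, so the order differs from the one the question expects. The model ignores relevance scoring and is only an illustration.

```python
# Each book's tags as (tag, weight) pairs, from the question.
books = {
    "book 0": [("comedy", 30), ("drama", 20)],
    "book 1": [("comedy", 10), ("drama", 5), ("other", 50)],
    "book 2": [("comedy", 5), ("drama", 30)],
    "book 3": [("comedy", 5), ("other", 30)],
}
query_terms = {"comedy", "drama"}

# match on "comedy drama" (OR semantics): any book with either tag is a hit.
hits = [name for name, tags in books.items()
        if query_terms & {tag for tag, _ in tags}]
# sort on tags.w with mode "sum": every weight in the book counts.
totals = {name: sum(w for _, w in books[name]) for name in hits}
order = sorted(hits, key=totals.get, reverse=True)
print([(name, totals[name]) for name in order])
```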

Bool AND search in properties in ElasticSearch

I've got a very small dataset of documents in ES:
{"id":1, "name": "John", "team":{"code":"red", "position":"P"}}
{"id":2, "name": "Jack", "team":{"code":"red", "position":"S"}}
{"id":3, "name": "Emily", "team":{"code":"green", "position":"P"}}
{"id":4, "name": "Grace", "team":{"code":"green", "position":"P"}}
{"id":5, "name": "Steven", "team":[
{"code":"green", "position":"S"},
{"code":"red", "position":"S"}]}
{"id":6, "name": "Josephine", "team":{"code":"red", "position":"S"}}
{"id":7, "name": "Sydney", "team":[
{"code":"red", "position":"S"},
{"code":"green", "position":"P"}]}
I want to query ES for people who are in the red team, with position P.
With the request
curl -XPOST 'http://localhost:9200/teams/aff/_search' -d '{
"query": {
"bool": {
"must": [
{
"match": {
"team.code": "red"
}
},
{
"match": {
"team.position": "P"
}
}
]
}
}
}'
I've got a wrong result.
ES gives
"name": "John",
"team":
{ "code": "red", "position": "P" }
and
"name": "Sydney",
"team":
[
{ "code": "red", "position": "S"},
{ "code": "green", "position": "P"}
]
For the last entry, ES took the property code=red in the first record and took the property position=P in the second record.
How can I specify that the search must match the two terms in the same record (whether or not it is inside a list of nested records)?
In fact, the correct answer is only document 1, John.
Here is the gist that creates the dataset :
https://gist.github.com/flrt/4633ef59b9b9ec43d68f
Thanks in advance
When you index document like
{
"name": "Sydney",
"team": [
{"code": "red", "position": "S"},
{"code": "green","position": "P"}
]
}
ES implicitly creates an inner object for your field (team in this example) and flattens it into a structure like
{
'team.code': ['red', 'green'],
'team.position': ['S', 'P']
}
So you lose the association between each code and its position. To avoid this, you need to explicitly set a nested mapping, index your documents as usual, and query them with a nested query.
So, this
PUT so/nest/_mapping
{
"nest": {
"properties": {
"team": {
"type": "nested"
}
}
}
}
POST so/nest/
{
"name": "Sydney",
"team": [
{
"code": "red",
"position": "S"
},
{
"code": "green",
"position": "P"
}
]
}
GET so/nest/_search
{
"query": {
"nested": {
"path": "team",
"query": {
"bool": {
"must": [
{
"match": {
"team.code": "red"
}
},
{
"match": {
"team.position": "P"
}
}
]
}
}
}
}
}
will return empty hits, since none of Sydney's team entries has both code red and position P.
Further reading on relation management: https://www.elastic.co/blog/managing-relations-inside-elasticsearch
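The difference between the two mappings can be reproduced with a small plain-Python model of the question's seven documents: the object mapping behaves as if each team field were collapsed into one list of codes and one list of positions, while the nested mapping checks each team entry on its own. object_match and nested_match are hypothetical helpers that mirror only the matching logic, not Elasticsearch itself.

```python
# The question's documents; single team objects are written as one-item lists.
docs = [
    {"id": 1, "name": "John", "team": [{"code": "red", "position": "P"}]},
    {"id": 2, "name": "Jack", "team": [{"code": "red", "position": "S"}]},
    {"id": 3, "name": "Emily", "team": [{"code": "green", "position": "P"}]},
    {"id": 4, "name": "Grace", "team": [{"code": "green", "position": "P"}]},
    {"id": 5, "name": "Steven", "team": [{"code": "green", "position": "S"},
                                          {"code": "red", "position": "S"}]},
    {"id": 6, "name": "Josephine", "team": [{"code": "red", "position": "S"}]},
    {"id": 7, "name": "Sydney", "team": [{"code": "red", "position": "S"},
                                          {"code": "green", "position": "P"}]},
]


def object_match(doc: dict, code: str, position: str) -> bool:
    """Object mapping: team.code and team.position are flattened into lists."""
    codes = [t["code"] for t in doc["team"]]
    positions = [t["position"] for t in doc["team"]]
    return code in codes and position in positions


def nested_match(doc: dict, code: str, position: str) -> bool:
    """Nested mapping: both conditions must hold within one team entry."""
    return any(t["code"] == code and t["position"] == position
               for t in doc["team"])


print([d["name"] for d in docs if object_match(d, "red", "P")])  # ['John', 'Sydney']
print([d["name"] for d in docs if nested_match(d, "red", "P")])  # ['John']
```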
You can use a Nested Query so that your searches happen individually on the subdocuments in the team array, rather than across the entire document.
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "team",
"query": {
"bool": {
"must": [
{ "match": { "team.code": "red" } },
{ "match": { "team.position": "P" } }
]
}
}
}
}
]
}
}
}
