Elastic Search Relevance for query based on most matches - elasticsearch

I have a following mapping
posts":{
"properties":{
"prop1": {
"type": "nested",
"properties": {
"item1": {
"type": "string",
"index": "not_analyzed"
},
"item2": {
"type": "string",
"index": "not_analyzed"
},
"item3": {
"type": "string",
"index": "not_analyzed"
}
}
},
"name": {
"type": "string",
"index": "not_analyzed"
}
}
}
Consider the objects indexed like following for these mapping
{
"name": "Name1",
"prop1": [
{
"item1": "val1",
"item2": "val2",
"item3": "val3"
},
{
"item1": "val1",
"item2": "val5",
"item3": "val6"
}
]
}
And another object
{
"name": "Name2",
"prop1": [
{
"item1": "val2",
"item2": "val7",
"item3": "val8"
},
{
"item1": "val12",
"item2": "val9",
"item3": "val10"
}
]
}
Now say i want to search documents which have prop1.item1 value to be either "val1" or "val2". I also want the result to be sorted in such a way that the document with both val1 and val2 would have more score than the one with only one of "val1" or "val2".
I have tried the following query but that doesnt seem to score based on number of matches
{
"query": {
"filtered": {
"query": {"match_all": {}},
"filter": {
"nested": {
"path": "prop1",
"filter": {
"or": [
{
"and": [
{"term": {"prop1.item1": "val1"}},
{"term": {"prop1.item2": "val2"}}
]
},
{
"and": [
{"term": {"prop1.item1": "val1"}},
{"term": {"prop1.item2": "val5"}}
]
},
{
"and": [
{"term": {"prop1.item1": "val12"}},
{"term": {"prop1.item2": "val9"}}
]
}
]
}
}
}
}
}
}
Now although it should give both documents, first document should have more score as it contains 2 of the things in the filter whereas second contains only one.
Can someone help with the right query to get results sorted based on most matches ?

The biggest problem you have with your query is that you are using a filter. Therefore no score is calculated. Than you use a match_all query which gives all documents a score of 1. Replace the filtered query with a query and use the bool query instead of the bool filter.
Hope that helps.

Scores aren't calculated on filters use a nested query instead:
{
"query": {
"nested": {
"score_mode": "sum",
"path": "prop1",
"query": {
"bool": {
"should": [{
"bool": {
"must": [{
"match": {
"prop1.item1": "val1"
}
},
{
"match": {
"prop1.item2": "val2"
}
}]
}
},
{
"bool": {
"must": [{
"match": {
"prop1.item1": "val1"
}
},
{
"match": {
"prop1.item2": "val5"
}
}]
}
},
{
"bool": {
"must": [{
"match": {
"prop1.item1": "val12"
}
},
{
"match": {
"prop1.item2": "val9"
}
}]
}
}]
}
}
}
}
}

Related

Elasticsearch query by generic properties: keywords and numeric values

I have this mapping in ES 7.9:
{
"mappings": {
"properties": {
"cid": {
"type": "keyword",
"store": true
},
"id": {
"type": "keyword",
"store": true
},
"a": {
"type": "nested",
"properties": {
"attribute":{
"type": "keyword"
},
"key": {
"type": "keyword"
},
"num": {
"type": "float"
}
}
}
}
}
}
And some documents indexed like:
{
"cid": "177",
"id": "1",
"a": [
{
"attribute": "tags",
"key": [
"heel",
"thong",
"low_heel",
"economic"
]
},
{
"attribute": "weight",
"num": 15
}
]
}
Basically, an object can have multiple attributes (a property array).
Those attributes can be different for each client. In this example, I have 2 types of attributes: tag and weight, however other documents could have other attributes like vendor, size, power, etc., so the model has to be generic enough to support beforehand unknown attributes.
An attribute can be a list of keywords (like tags) or a numeric value (like weight).
I need an ES query to fetch the documents ids with this pseudo-query:
cid="177" and (tag="flat" or tag="heel") and tag="economic" and weight<20
I managed to reach this query that seems to be working as expected:
{
"_source": ["id"],
"query": {
"bool": {
"must" : [
{"term" : { "cid" : "177" }},
{
"nested": {
"path": "a",
"query": {
"bool":{
"must":[
{"term" : { "a.attribute": "tags"}},
{"terms" : { "a.key": ["flat","heel"]}}
]
}
}
}
},
{
"nested": {
"path": "a",
"query": {
"bool":{
"must":[
{"term" : { "a.attribute": "tags"}},
{"term" : { "a.key": "economic"}}
]
}
}
}
},
{
"nested": {
"path": "a",
"query": {
"bool":{
"must":[
{"term" : { "a.attribute": "weight" } },
{"range": { "a.num": {"lt": 20} } }
]
}
}
}
}
]
}
}
}
Is this query correct or I am getting the correct results by chance?
Is the query (or mapping) optimal or I should rethink something?
Can the query be simplified?
The query is correct.
The mapping is great and the query is optimal.
While the query can be simplified:
{
"_source": [
"id"
],
"query": {
"bool": {
"must": [
{
"term": {
"cid": "177"
}
},
{
"nested": {
"path": "a",
"query": {
"query_string": {
"query": "a.attribute:tags AND ((a.key:flat OR a.key:heel) AND a.key:economic)"
}
}
}
},
{
"nested": {
"path": "a",
"query": {
"query_string": {
"query": "a.attribute:weight AND a.num:<20"
}
}
}
}
]
}
}
}
it'd be less optimal due to the fact that these query_strings would still need to be internally compiled into essentially the query DSL that you've got above. Plus you'd still be needing the two separate nested groups so... You're good to roll with what you've got.

Elasticsearch for index array element

Hi i want to search array element from index using elastic search query
{
"name": "Karan",
"address": [
{
"city": "newyork",
"zip": 12345
},
{
"city": "mumbai",
"zip": 23456
}]
}}
when i am trying to search using match query it does not work
{
"query": {
"bool": {
"must": [
{
"match": {
"address.city": "newyork"
}
}
]
}
}
}
when i access simple feild like "name": "Karan" it works, there is only issue for array element.
Because nested objects are indexed as separate hidden documents, we can’t query them directly. Instead, we have to use the nested query to access them:
GET /my_index/blogpost/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"title": "eggs"
}
},
{
"nested": {
"path": "comments",
"query": {
"bool": {
"must": [
{
"match": {
"comments.name": "john"
}
},
{
"match": {
"comments.age": 28
}
}
]
}
}
}
}
]
}}}
See the docs
The way i followed..
Mapping :
{
"mappings": {
"job": {
"properties": {
"name": {
"type": "text"
},
"skills": {
"type": "nested",
"properties": {
"value": {
"type": "text"
}
}
}
}
}
}
Records
[{"_index":"jobs","_type":"job","_id":"2","_score":1.0,"_source":{"name":"sr soft eng","skills":[{"value": "java"}, {"value": "oracle"}]}},{"_index":"jobs","_type":"job","_id":"1","_score":1.0,"_source":{"name":"sr soft eng","skills":[{"value": "java"}, {"value": "oracle"}, {"value": "javascript"}]}},
search Query
{
"query": {
"nested": {
"path": "skills",
"query": {
"bool": {
"must": [
{ "match": {"skills.value": "java"}}
]
}
}
}
}
}

Elasticsearch nested query and sorting

can somebody help me to understand what Elastic means by nested. In documentations https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-sort.html#_nested_sorting_examples is an example which does not show how the document object looks like. It look like I should imagine the mapping from the search query. Query looks like:
POST /_search
{
"query": {
"nested": {
"path": "parent",
"query": {
"bool": {
"must": {"range": {"parent.age": {"gte": 21}}},
"filter": {
"nested": {
"path": "parent.child",
"query": {"match": {"parent.child.name": "matt"}}
}
}
}
}
}
},
"sort" : [
{
"parent.child.age" : {
"mode" : "min",
"order" : "asc",
"nested": {
"path": "parent",
"filter": {
"range": {"parent.age": {"gte": 21}}
},
"nested": {
"path": "parent.child",
"filter": {
"match": {"parent.child.name": "matt"}
}
}
}
}
}
]
}
Can somebody write a document structure on which this query will work?
Something like this.
{
"parent": {
"name": "Elasti Sorch",
"age": 23,
"child": [
{
"name": "Kibana Lion",
"age": 12
},
{
"name": "Matt",
"age": 15
}
]
}
}
In Elastic nested means it's an array of objects. To store an array of objects into a field in elastic search you have to map the field to a nested while creating the index.
PUT parent
{
"mappings": {
"doc":{
"properties": {
"name":{
"type": "text"
},
"age":{
"type": "integer"
},
"child":{
"type": "nested",
"properties": {
"name":{
"type":"text"
},
"age":{
"type":"integer"
}
}
}
}
}
}
}
and a sample nested document cab be inserted like this
POST parent/doc
{
"name":"abc",
"age":50,
"child":[
{
"name":"son1",
"age":25
},
{
"name":"adughter1",
"age":20
}
]
}

Searching in specific fields of types

Consider the following query:
{
"query" : {
"match_phrase" : {
"_all" : "Smith"
}
}
}
How would I specify in which fields of which types it may search, instead of searching in everything? (field names may be non-unique across types)
I've tried the query below, but it didn't work (it doesn't return results, it does when I remove person. from all fields):
{
"query": {
"multi_match": {
"query": "Smith",
"fields": [
"person.first_name",
"person.last_name",
"person.age"
],
"lenient": true
}
}
}
I'm sending these queries to http://localhost:9200/tsf-model/_search.
If you can build your query dynamically, I think you can use a combination of your multi_match query and a type query for each type, in order to achieve what you want:
{
"query": {
"bool": {
"minimum_should_match": 1,
"should": [
{
"bool": {
"filter": [
{
"type": {
"value": "type1"
}
},
{
"multi_match": {
"query": "Smith",
"fields": [
"field1",
"field3",
"field5"
]
}
}
]
}
},
{
"bool": {
"filter": [
{
"type": {
"value": "type2"
}
},
{
"multi_match": {
"query": "Smith",
"fields": [
"field2",
"field4",
"field6"
]
}
}
]
}
}
]
}
}
}

Elasticsearch : search document with conditional filter

I have two documents in my index (same type) :
{
"first_name":"John",
"last_name":"Doe",
"age":"24",
"phone_numbers":[
{
"contract_number":"123456789",
"phone_number":"987654321",
"creation_date": ...
},
{
"contract_number":"123456789",
"phone_number":"012012012",
"creation_date": ...
}
]
}
{
"first_name":"Roger",
"last_name":"Waters",
"age":"36",
"phone_numbers":[
{
"contract_number":"546987224",
"phone_number":"987654321",
"creation_date": ...,
"expired":true
},
{
"contract_number":"87878787",
"phone_number":"55555555",
"creation_date": ...
}
]
}
Clients would like to perform a full text search. Okay no problem here
My problem :
In this full text search, sometimes user will search by phone_number. In this case there is a parameter like expired=true.
Example :
First client search request : "987654321" with expired absent or set to false
--> Result : Only first document
Second client search request : "987654321" with expired set to true
--> Result : The two documents
How can I achieve that ?
Here is my mapping :
{
"user": {
"_all": {
"auto_boost": true,
"omit_norms": true
},
"properties": {
"phone_numbers": {
"type": "nested",
"properties": {
"phone_number": {
"type": "string"
},
"creation_date": {
"type": "string",
"index": "no"
},
"contract_number": {
"type": "string"
},
"expired": {
"type": "boolean"
}
}
},
"first_name":{
"type": "string"
},
"last_name":{
"type": "string"
},
"age":{
"type": "string"
}
}
}
}
Thanks !
MC
EDIT :
I tried this query :
{
"query": {
"filtered": {
"query": {
"query_string": {
"query": "987654321",
"analyze_wildcard": "true"
}
},
"filter": {
"nested": {
"path": "phone_numbers",
"filter": {
"bool": {
"should":[
{
"bool": {
"must": [
{
"term": {
"phone_number": "987654321"
}
},
{
"missing": {
"field": "expired"
}
}
]
}
},
{
"bool": {
"must_not": [
{
"term": {
"phone_number": "987654321"
}
}
]
}
}
]
}
}
}
}
}
}}
But I get the two documents instead of get only the first one
You're very close. Try using a combination of must and should, where the must clause ensures the phone_number matches the search value, and the should clause ensures that either the expired field is missing or set to false. For example:
{
"query": {
"filtered": {
"query": {
"query_string": {
"query": "987654321",
"analyze_wildcard": "true"
}
},
"filter": {
"nested": {
"path": "phone_numbers",
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"phone_number": "987654321"
}
}
],
"should": [
{
"missing": {
"field": "expired"
}
},
{
"term": {
"expired": false
}
}
]
}
}
}
}
}
}
}
}
}
I ran this query using your mapping and sample documents and it returned the one document for John Doe, as expected.

Resources