Function Score On Nested Object - elasticsearch

I have this index blog with the following settings and mappings.
PUT /blog
{
"settings": {
"index": {
"number_of_shards": "1"
}
},
"mappings": {
"post": {
"_all": {
"enabled": false
},
"properties": {
"title": {
"type": "string"
},
"content": {
"type": "string"
},
"visitor": {
"type": "nested",
"properties": {
"id": {
"type": "string",
"index": "not_analyzed"
},
"last_visit": {
"type": "date",
"format": "yyyy-MM-dd"
}
}
}
}
}
}
}
I want to rank my posts based on relevancy and the visitor's last visit. I tried this query without success. It seems like the gauss function cannot read the value of the visitor's last_visit. How can I get this to work?
POST /blog/post/_search
{
"query": {
"function_score": {
"functions": [
{
"gauss": {
"visitor.last_visit": {
"origin": "now/d",
"offset": "3d",
"scale": "4d",
"decay": 0.5
}
},
"filter": {
"nested": {
"path": "visitor",
"query": {
"term": {
"visitor.id": "1"
}
}
}
}
}
]
}
}
}

Here is a query I used for a particular use case: it matches on a name and scores on a nested object. It doesn't involve any date fields, but it combines a text match with distance-based relevancy on a nested object, so it's similar.
I structured my query after the answer to the question "Scoring documents by text match and distance", since it matched what I was trying to do.
GET dev_search_core_data/_search?size=200
{
"query": {
"bool": {
"should": [
{
"match": {
"NAME": "Amy Smith"
}
},
{
"bool": {
"must": [
{
"function_score": {
"query": {
"nested": {
"path": "LOCATION",
"query": {
"term": {
"LOCATION.SOME_IND": {
"value": true
}
}
}
}
},
"functions": [
{
"gauss": {
"LOCATION.COORDINATES": {
"origin": "-118.309, 34.041",
"scale": "50km",
"offset": "10km",
"decay": 0.5
}
}
}
]
}
}
]
}
}
]
}
}
}
I think the problem is with the structure of your query. Whenever I run into problems, I first run this command to validate the query and rule out syntax issues.
GET dev_search_core_data/_validate/query?explain
This was the result:
{
"valid": true,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"explanations": [
{
"index": "dev_search_core_data_b",
"valid": true,
"explanation": "filtered((NAME:amy NAME:smith) (+function score (ToParentBlockJoinQuery (filtered(LOCATION.SOME_IND:true)->random_access(_type:_LOCATION)),function=org.elasticsearch.index.query.functionscore.DecayFunctionParser$GeoFieldDataScoreFunction#274227b9)))->cache(org.elasticsearch.index.search.nested.NonNestedDocsFilter#1012ada6)"
}
]
}
I also looked at the docs for an in-depth explanation of how function_score works. You don't mention your version, but I'm using ES 1.6.
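For reference, here is a minimal sketch of how the gauss parameters from the question (offset, scale, decay) combine, following the decay formula given in the function_score documentation. It is plain Python, not Elasticsearch code; the function name gauss_decay and the use of whole days as the distance unit are assumptions made for illustration.

```python
import math

def gauss_decay(distance, offset, scale, decay):
    """Score multiplier for a value `distance` units away from the origin."""
    # sigma^2 is chosen so that the score equals `decay` at offset + scale.
    sigma_sq = -(scale ** 2) / (2.0 * math.log(decay))
    # Anything within `offset` of the origin keeps the full score of 1.0.
    effective = max(0.0, abs(distance) - offset)
    return math.exp(-(effective ** 2) / (2.0 * sigma_sq))

# Parameters from the question, expressed in days: offset=3d, scale=4d, decay=0.5.
for days_ago in (0, 3, 7, 14):
    print(days_ago, round(gauss_decay(days_ago, 3, 4, 0.5), 4))
```

With these parameters a visit within the last 3 days keeps the full multiplier, and a visit 7 days ago (offset plus scale) is multiplied by the decay value of 0.5.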

Related

Query hashmap structure with elasticsearch

I have two questions regarding mapping and querying a java hashmap in elasticsearch.
Does this mapping make sense in elasticsearch (is it the correct way to map a hashmap)?:
{
"properties": {
"itemsMap": {
"type": "nested",
"properties": {
"key": {
"type": "date",
"format": "yyyy-MM-dd"
},
"value": {
"type": "nested",
"properties": {
"itemVal1": {
"type": "double"
},
"itemVal2": {
"type": "double"
}
}
}
}
}
}
}
Here is some example data:
{
"itemsMap": {
"2021-12-31": {
"itemVal1": 100.0,
"itemVal2": 150.0
},
"2021-11-30": {
"itemVal1": 200.0,
"itemVal2": 50.0
}
}
}
My queries don't seem to work. For example:
{
"query": {
"nested": {
"path": "itemsMap",
"query": {
"bool": {
"must": [
{
"match": {
"itemsMap.key": "2021-11-30"
}
}
]
}
}
}
}
}
Am I doing something wrong? How can I query such a structure? I have the possibility to change the mapping if it's necessary.
Thanks
TL;DR
With the way you are uploading your data, nothing is stored in key.
You will end up with fields named 2021-11-30, 2021-12-31 and so on, while key stays empty.
This format is only viable if the number of distinct dates stays small (fewer than 1000); otherwise it will not hold up in the long run.
If you don't want to change your documents, here is the query:
GET /71525899/_search
{
"query": {
"nested": {
"path": "itemsMap",
"query": {
"bool": {
"must": [
{
"exists": {
"field": "itemsMap.2021-12-31"
}
}
]
}
}
}
}
}
To understand
Inspect the mapping by querying the index:
GET /<index_name>/_mapping
You will see that the number of fields named after your dates keeps growing.
And in all your documents, itemsMap.key is going to be empty (this explains why my previous answer did not work).
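The effect can be sketched in plain Python as a simplified model of which field paths the original document produces. This is not Elasticsearch code, and the helper leaf_field_paths is a hypothetical name introduced for illustration.

```python
def leaf_field_paths(obj, prefix=""):
    """Return the dotted path of every leaf value in a JSON-like dict."""
    paths = []
    for name, value in obj.items():
        path = f"{prefix}.{name}" if prefix else name
        if isinstance(value, dict):
            # Descend into sub-objects, extending the dotted path.
            paths.extend(leaf_field_paths(value, path))
        else:
            paths.append(path)
    return paths

# The example document from the question.
doc = {
    "itemsMap": {
        "2021-12-31": {"itemVal1": 100.0, "itemVal2": 150.0},
        "2021-11-30": {"itemVal1": 200.0, "itemVal2": 50.0},
    }
}

for path in leaf_field_paths(doc):
    print(path)
```

Every date becomes part of a field name, and no path ever starts with itemsMap.key, which is why a query on itemsMap.key finds nothing.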
A more viable option
Keep your mapping and change the shape of your documents.
They will look like this:
{
"itemsMap": [
{
"key": "2021-12-31",
"value": { "itemVal1": 100, "itemVal2": 150 }
},
{
"key": "2021-11-30",
"value": { "itemVal1": 200, "itemVal2": 50 }
}
]
}
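If the documents are produced from an in-memory map, the reshaping could be done before indexing. The following is a minimal Python sketch under that assumption; map_to_entries is a hypothetical helper, not part of any Elasticsearch client.

```python
import json

def map_to_entries(items_map):
    """Turn {"<date>": {...}} into [{"key": "<date>", "value": {...}}, ...]."""
    return [{"key": key, "value": value} for key, value in items_map.items()]

# The original map-shaped document from the question.
original = {
    "itemsMap": {
        "2021-12-31": {"itemVal1": 100.0, "itemVal2": 150.0},
        "2021-11-30": {"itemVal1": 200.0, "itemVal2": 50.0},
    }
}

# Same data, reshaped so that every entry has fixed "key" and "value" fields.
reshaped = {"itemsMap": map_to_entries(original["itemsMap"])}
print(json.dumps(reshaped, indent=2))
```

The field names are now fixed (key, value, itemVal1, itemVal2), so the mapping no longer grows with the number of dates.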
DELETE /71525899
PUT /71525899/
{
"mappings": {
"properties": {
"itemsMap": {
"type": "nested",
"properties": {
"key": {
"type": "date",
"format": "yyyy-MM-dd"
},
"value": {
"type": "nested",
"properties": {
"itemVal1": {
"type": "double"
},
"itemVal2": {
"type": "double"
}
}
}
}
}
}
}
}
POST /_bulk
{"index":{"_index":"71525899"}}
{"itemsMap":[{"key":"2021-12-31", "value": {"itemVal1":100,"itemVal2":150}},{"key":"2021-11-30", "value":{"itemVal1":200,"itemVal2":50}}]}
{"index":{"_index":"71525899"}}
{"itemsMap":[{"key":"2022-12-31", "value": {"itemVal1":100,"itemVal2":150}},{"key":"2021-11-30", "value":{"itemVal1":200,"itemVal2":50}}]}
{"index":{"_index":"71525899"}}
{"itemsMap":[{"key":"2021-10-31", "value": {"itemVal1":100,"itemVal2":150}},{"key":"2021-11-30", "value":{"itemVal1":200,"itemVal2":50}}]}
GET /71525899/_search
{
"query": {
"nested": {
"path": "itemsMap",
"query": {
"bool": {
"must": [
{
"match": {
"itemsMap.key": "2021-12-31"
}
}
]
}
}
}
}
}

Need help combining wildcard search with range query within elasticsearch?

I am trying to combine a wildcard search with a date range in an Elasticsearch query, but the results do not respect both conditions. The response includes items that fall outside the requested date range.
{
"from": 0,
"size": 10,
"query": {
"bool": {
"must": [
{
"bool": {
"should": [
{
"wildcard": {
"hostName": "*abc*"
}
},
{
"range": {
"requestDate": {
"gte": "2019-10-01T08:00:00.000Z"
}
}
}
]
}
}
]
}
}
}
The index mapping looks as below:
{
"index_history": {
"mappings": {
"applications_datalake": {
"properties": {
"query": {
"properties": {
"term": {
"properties": {
"server": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
}
},
"index-data-type": {
"properties": {
"attributes": {
"properties": {
"wwnListForServer": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
},
"hostName": {
"type": "keyword"
},
"requestDate": {
"type": "date"
},
"requestedBy": {
"properties": {
"id": {
"type": "keyword"
},
"name": {
"type": "keyword"
}
}
}
}
}
}
}
}
You missed the minimum_should_match parameter.
Check this out:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-bool-query.html
I think your query should look like this:
{
"from": 0,
"size": 10,
"query": {
"bool": {
"must": [
{
"bool": {
"should": [
{
"wildcard": {
"hostName": "*abc*"
}
},
{
"range": {
"requestDate": {
"gte": "2019-10-01T08:00:00.000Z"
}
}
}
],
"minimum_should_match" : 2
}
}
]
}
}
}
From the documentation :
You can use the minimum_should_match parameter to specify the number
or percentage of should clauses returned documents must match.
If the bool query includes at least one should clause and no must or
filter clauses, the default value is 1. Otherwise, the default value
is 0.
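The rule quoted above can be modelled in a few lines of Python. This is only a sketch of the matching semantics, not Elasticsearch code: clauses are represented as plain predicates, and the function bool_matches and the two sample documents are assumptions made for illustration.

```python
def bool_matches(doc, must=(), filters=(), should=(), minimum_should_match=None):
    """Simplified model of how a bool query decides whether a document matches."""
    if minimum_should_match is None:
        # Default from the documentation: 1 if there are should clauses
        # and no must or filter clauses, otherwise 0.
        minimum_should_match = 1 if should and not (must or filters) else 0
    if not all(clause(doc) for clause in list(must) + list(filters)):
        return False
    matched = sum(1 for clause in should if clause(doc))
    return matched >= minimum_should_match

# Predicates standing in for the wildcard and range clauses of the question.
def host_contains_abc(doc):
    return "abc" in doc["hostName"]

def requested_since_october(doc):
    # ISO timestamps in the same format compare correctly as strings.
    return doc["requestDate"] >= "2019-10-01T08:00:00.000Z"

# Hypothetical documents, each matching only one of the two clauses.
old_abc_host = {"hostName": "xabcx", "requestDate": "2019-01-01T00:00:00.000Z"}
recent_other_host = {"hostName": "xyz", "requestDate": "2019-11-01T00:00:00.000Z"}

clauses = (host_contains_abc, requested_since_october)
print(bool_matches(old_abc_host, should=clauses))
print(bool_matches(old_abc_host, should=clauses, minimum_should_match=2))
```

With the default, a document that satisfies only one of the two should clauses still matches, which is the behaviour described in the question; with minimum_should_match set to 2, both clauses are required.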
According to your mappings, you have to call out the fully qualified property names for the hostName and requestDate fields. Example:
"wildcard": {
"index-data-type.hostName": {
"value": "..."
}
}
You could also consider reducing your compound queries to just the main bool query, using the must clause for the wildcard and a filter for the range. Example:
{
"from": 0,
"size": 20,
"query": {
"bool": {
"must": [
{
"wildcard": {
"index-data-type.hostName": {
"value": "*abc*"
}
}
}
],
"filter": {
"range": {
"index-data-type.requestDate": {
"gte": "2019-10-01T08:00:00.000Z"
}
}
}
}
}
}
The filter context doesn't contribute to the _score, yet it still reduces your number of hits.
Warning:
Using a leading asterisk (*) in a wildcard query can have a severe performance impact on your queries.
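The intuition behind the warning can be sketched with a sorted list of terms in Python. This is a deliberately simplified model and not how Lucene stores its term dictionary; the helper names and the generated host names are assumptions made for illustration.

```python
import bisect

def prefix_lookup(sorted_terms, prefix):
    """Find terms starting with `prefix`; returns (matches, terms_scanned)."""
    # A known prefix lets us jump straight to the first candidate term.
    start = bisect.bisect_left(sorted_terms, prefix)
    matches, scanned = [], 0
    for term in sorted_terms[start:]:
        scanned += 1
        if not term.startswith(prefix):
            break  # sorted order: no later term can match
        matches.append(term)
    return matches, scanned

def contains_lookup(sorted_terms, fragment):
    """Find terms containing `fragment`; every term has to be scanned."""
    matches = [term for term in sorted_terms if fragment in term]
    return matches, len(sorted_terms)

# Hypothetical hostName terms.
terms = sorted([f"host{i:04d}" for i in range(1000)] + ["abc-web-01", "xyzabc01"])

print(prefix_lookup(terms, "abc"))    # models the pattern abc*
print(contains_lookup(terms, "abc"))  # models the pattern *abc*
```

In this model the prefix pattern stops after a couple of terms, while the pattern with a leading asterisk has no starting point and has to look at every term.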

Elasticsearch : search document with conditional filter

I have two documents in my index (same type) :
{
"first_name":"John",
"last_name":"Doe",
"age":"24",
"phone_numbers":[
{
"contract_number":"123456789",
"phone_number":"987654321",
"creation_date": ...
},
{
"contract_number":"123456789",
"phone_number":"012012012",
"creation_date": ...
}
]
}
{
"first_name":"Roger",
"last_name":"Waters",
"age":"36",
"phone_numbers":[
{
"contract_number":"546987224",
"phone_number":"987654321",
"creation_date": ...,
"expired":true
},
{
"contract_number":"87878787",
"phone_number":"55555555",
"creation_date": ...
}
]
}
Clients would like to perform a full-text search, which is no problem.
My problem:
In this full-text search, the user will sometimes search by phone_number. In that case the request carries a parameter like expired=true.
Example :
First client search request : "987654321" with expired absent or set to false
--> Result : Only first document
Second client search request : "987654321" with expired set to true
--> Result : The two documents
How can I achieve that ?
Here is my mapping :
{
"user": {
"_all": {
"auto_boost": true,
"omit_norms": true
},
"properties": {
"phone_numbers": {
"type": "nested",
"properties": {
"phone_number": {
"type": "string"
},
"creation_date": {
"type": "string",
"index": "no"
},
"contract_number": {
"type": "string"
},
"expired": {
"type": "boolean"
}
}
},
"first_name":{
"type": "string"
},
"last_name":{
"type": "string"
},
"age":{
"type": "string"
}
}
}
}
Thanks !
MC
EDIT :
I tried this query :
{
"query": {
"filtered": {
"query": {
"query_string": {
"query": "987654321",
"analyze_wildcard": "true"
}
},
"filter": {
"nested": {
"path": "phone_numbers",
"filter": {
"bool": {
"should":[
{
"bool": {
"must": [
{
"term": {
"phone_number": "987654321"
}
},
{
"missing": {
"field": "expired"
}
}
]
}
},
{
"bool": {
"must_not": [
{
"term": {
"phone_number": "987654321"
}
}
]
}
}
]
}
}
}
}
}
}}
But I get both documents instead of only the first one.
You're very close. Try using a combination of must and should, where the must clause ensures the phone_number matches the search value, and the should clause ensures that either the expired field is missing or set to false. For example:
{
"query": {
"filtered": {
"query": {
"query_string": {
"query": "987654321",
"analyze_wildcard": "true"
}
},
"filter": {
"nested": {
"path": "phone_numbers",
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"phone_number": "987654321"
}
}
],
"should": [
{
"missing": {
"field": "expired"
}
},
{
"term": {
"expired": false
}
}
]
}
}
}
}
}
}
}
}
}
I ran this query using your mapping and sample documents and it returned the one document for John Doe, as expected.
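The intended behaviour for both values of the expired parameter can be written down as a small Python model over the sample documents. This is a sketch of the matching rule only, not an Elasticsearch query; the function phone_matches and its include_expired argument are hypothetical names, and creation_date is left out because its value is elided in the question.

```python
def phone_matches(doc, phone_number, include_expired=False):
    """True if one phone entry has the number and passes the expired rule."""
    for entry in doc["phone_numbers"]:
        if entry["phone_number"] != phone_number:
            continue
        # A missing "expired" field is treated the same as expired=false.
        if include_expired or not entry.get("expired", False):
            return True
    return False

john = {
    "first_name": "John",
    "phone_numbers": [
        {"contract_number": "123456789", "phone_number": "987654321"},
        {"contract_number": "123456789", "phone_number": "012012012"},
    ],
}
roger = {
    "first_name": "Roger",
    "phone_numbers": [
        {"contract_number": "546987224", "phone_number": "987654321", "expired": True},
        {"contract_number": "87878787", "phone_number": "55555555"},
    ],
}

for include_expired in (False, True):
    hits = [d["first_name"] for d in (john, roger)
            if phone_matches(d, "987654321", include_expired)]
    print(include_expired, hits)
```

Both conditions are checked on the same phone entry, which is what the nested filter guarantees. For the expired=true case, one option (an assumption, not something tested above) is to drop the should clauses from the nested filter.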

Facet by objects(tags) in an array

I am running into a query problem with ElasticSearch.
We have objects that look like this:
{
"id":"1234",
"tags":[
{ "tagName": "T1", "tagValue":"V1"},
{ "tagName": "T2", "tagValue":"V2"},
{ "tagName": "T3", "tagValue":"V3"}
]
}
{
"id":"5678",
"tags":[
{ "tagName": "T1", "tagValue":"X1"},
{ "tagName": "T2", "tagValue":"X2"}
]
}
And I would like to get a list of tagValues for tagName=T1, which is "V1" and "X1".
I tried
{
"filter": {
"bool": {
"must": [
{
"term":{
"tags.tagName": "T1"
}
}
]
}
},
"facets": {
"TagValues":{
"filter": {
"term": {
"tags.tagName": "T1"
}
},
"terms": {
"field": "tags.tagValue",
"size": 30
}
}
}
}
It seems like it's returning all tagValues from all tags "T1", "T2", and "T3".
Can someone please help me with this query? How can I get a faceted list for objects that are in an array?
Any help would be appreciated.
Thank you,
The main idea is to use the nested type for your tags field. Here is the mapping you should use:
curl -XPUT localhost:9200/mytags -d '{
"mappings": {
"mytag": {
"properties": {
"id": {
"type": "string"
},
"tags": {
"type": "nested",
"properties": {
"tagName": {
"type": "string",
"index": "not_analyzed"
},
"tagValue": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}'
Then you can reindex your data and run a query like the one below. It first keeps only the documents containing a tagName whose value is T1. Then, using aggregations (don't use facets anymore, as they are deprecated), it again selects only those tags whose tagName is T1 and retrieves the associated tagValue fields. This will get you the expected V1 and X1 values.
curl -XPOST localhost:9200/mytags/mytag/_search -d '{
"size": 0,
"query": {
"filtered": {
"filter": {
"nested": {
"path": "tags",
"query": {
"term": {
"tags.tagName": "T1"
}
}
}
}
}
},
"aggs": {
"tags": {
"nested": {
"path": "tags"
},
"aggs": {
"values": {
"filter": {
"term": {
"tags.tagName": "T1"
}
},
"aggs": {
"values": {
"terms": {
"field": "tags.tagValue"
}
}
}
}
}
}
}
}'
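To see why the original facet returned every tagValue while the nested version returns only V1 and X1, here is a small Python model of the two behaviours. It is a sketch of the idea, not Elasticsearch code, and both function names are assumptions made for illustration.

```python
def flattened_values(doc, tag_name):
    """Plain object array: values are pooled per field, so pairing is lost."""
    names = [tag["tagName"] for tag in doc["tags"]]
    values = [tag["tagValue"] for tag in doc["tags"]]
    # The document matches if ANY tag has the name, and then ALL values count.
    return values if tag_name in names else []

def nested_values(doc, tag_name):
    """Nested type: each tag is matched as a unit, so the pairing is kept."""
    return [tag["tagValue"] for tag in doc["tags"] if tag["tagName"] == tag_name]

docs = [
    {"id": "1234", "tags": [
        {"tagName": "T1", "tagValue": "V1"},
        {"tagName": "T2", "tagValue": "V2"},
        {"tagName": "T3", "tagValue": "V3"},
    ]},
    {"id": "5678", "tags": [
        {"tagName": "T1", "tagValue": "X1"},
        {"tagName": "T2", "tagValue": "X2"},
    ]},
]

print([v for d in docs for v in flattened_values(d, "T1")])
print([v for d in docs for v in nested_values(d, "T1")])
```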

Nested filtering in elasticsearch with more than one term of the same nested type

I'm new to elasticsearch, so maybe my approach is plain wrong, but I want to make an index of recipes and allow the user to filter it down with the aggregated ingredients that are still found in the subset.
Maybe I'm using the wrong language to explain so maybe this example will clarify. I would like to search for recipes with the term salt; which results in three recipes:
1. with ingredients: salt, flour, water
2. with ingredients: salt, pepper, egg
3. with ingredients: water, flour, egg, salt
The aggregate on the results' ingredients returns salt, flour, water, pepper, egg. When I filter with flour I only want recipes 1 and 3 to appear in the search results (and the aggregate on ingredients should only return salt, flour, water and egg). When I add another filter, egg, I want only recipe 3 to appear (and the aggregate should only return water, flour, egg, salt).
I can't get the latter to work: one filter next to the default query does narrow down the results as desired, but when I add the other term (egg) to the terms filter, the results start to include recipe 2 (salt, pepper, egg) again, as if it were an OR filter. Adding AND to the filter execution, however, gives NO results. What am I doing wrong?
My mapping:
{
"recipe": {
"properties": {
"title": {
"analyzer": "dutch",
"type": "string"
},
"ingredients": {
"type": "nested",
"properties": {
"name": {
"type": "string",
"analyzer": "dutch",
"include_in_parent": true,
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}
}
My query:
{
"query": {
"filtered": {
"query": {
"bool": {
"should": [
{
"match": {
"_all": "salt"
}
}
]
}
},
"filter": {
"nested": {
"path": "ingredients",
"filter": {
"terms": {
"ingredients.name": [
"flour",
"egg"
],
"execution": "and"
}
}
}
}
}
},
"size": 50,
"aggregations": {
"ingredients": {
"nested": {
"path": "ingredients"
},
"aggregations": {
"count": {
"terms": {
"field": "ingredients.name.raw"
}
}
}
}
}
}
Why are you using a nested mapping here? Its main purpose is to keep relations between the sub-object attributes, but your ingredients field has just one attribute and can be modeled simply as a string field.
So, if you update your mapping like this :
POST recipes
{
"mappings": {
"recipe": {
"properties": {
"title": {
"type": "string"
},
"ingredients": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}
You can still index your recipes as :
{
"title":"recipe b",
"ingredients":["salt","pepper","egg"]
}
And this query gives you the result you are expecting:
POST recipes/recipe/_search
{
"query": {
"filtered": {
"query": {
"match": {
"_all": "salt"
}
},
"filter": {
"terms": {
"ingredients": [
"flour",
"egg"
],
"execution": "and"
}
}
}
},
"size": 50,
"aggregations": {
"ingredients": {
"terms": {
"field": "ingredients"
}
}
}
}
which is :
{
...
"hits": {
"total": 1,
"max_score": 0.22295055,
"hits": [
{
"_index": "recipes",
"_type": "recipe",
"_id": "PP195TTsSOy-5OweArNsvA",
"_score": 0.22295055,
"_source": {
"title": "recipe c",
"ingredients": [
"salt",
"flour",
"egg",
"water"
]
}
}
]
},
"aggregations": {
"ingredients": {
"buckets": [
{
"key": "egg",
"doc_count": 1
},
{
"key": "flour",
"doc_count": 1
},
{
"key": "salt",
"doc_count": 1
},
{
"key": "water",
"doc_count": 1
}
]
}
}
}
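The difference between the nested and the flat model can be sketched in Python. This is a simplified model of the matching and aggregation, not Elasticsearch code; the function names and the title "recipe a" are assumptions made for illustration ("recipe b" and "recipe c" appear above).

```python
from collections import Counter

recipes = {
    "recipe a": ["salt", "flour", "water"],
    "recipe b": ["salt", "pepper", "egg"],
    "recipe c": ["water", "flour", "egg", "salt"],
}

def nested_and(ingredients, wanted):
    """Nested model: ONE ingredient object would have to equal every wanted term."""
    return any(all(term == ingredient for term in wanted) for ingredient in ingredients)

def flat_and(ingredients, wanted):
    """Flat model: the recipe as a whole must contain every wanted term."""
    return all(term in ingredients for term in wanted)

wanted = ["flour", "egg"]
nested_hits = [title for title, ing in recipes.items() if nested_and(ing, wanted)]
flat_hits = [title for title, ing in recipes.items() if flat_and(ing, wanted)]

# Count the ingredients of the remaining recipes, like the terms aggregation.
buckets = Counter(ing for title in flat_hits for ing in recipes[title])

print(nested_hits)
print(flat_hits)
print(sorted(buckets.items()))
```

In the nested model each ingredient object carries a single name, so no single object can be both flour and egg, which is consistent with the AND execution returning nothing inside the nested filter.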
Hope this helps.
