Elasticsearch: scoring documents liked by similar users higher

In Elasticsearch I have two indexes, places and users. This is the mapping for places:
{
"mappings": {
"properties": {
"location": {
"type": "geo_point"
}
}
}
}
And this is the mapping for users:
{
"mappings": {
"properties": {
"likes": {
"type": "keyword"
},
"seen": {
"type": "keyword"
}
}
}
}
As you can see, a user can like and see different places. Now I want to query the places that a user has not seen or liked yet, and rank first the places liked by users whose likes overlap with the querying user's likes. This is the query I was able to come up with:
POST /places/_search
{
"_source": [
"id"
],
"size": 1,
"query": {
"function_score": {
"query": {
"bool": {
"must_not": [
{
"terms": {
"_id": {
"index": "users",
"id": "vu0E1rjJEqcgyfj29fwZ",
"path": "seen"
}
}
},
{
"terms": {
"_id": {
"index": "users",
"id": "vu0E1rjJEqcgyfj29fwZ",
"path": "likes"
}
}
}
],
"filter": {
"geo_distance": {
"distance": "200km",
"location": {
"lat": 52,
"lon": 13
}
}
}
}
},
"random_score": {},
"boost_mode": "replace"
}
}
}
However, this query currently just assigns a random score to all results. As I'm new to Elasticsearch, I'm struggling to come up with a scoring function that ranks places liked by similar users higher, especially because the data about user likes is stored in a different index than the one I'm actually querying. What would be the best approach to this problem? Is something like this even possible with my current data model?

I think you have to perform two requests:
Get the IDs of all places liked by similar users.
Use those place IDs to rank matching places first, while still excluding the places the user has already liked or seen.
Step 1 query example:
GET users/_search
{
"_source": [
"likes"
],
"query": {
"bool": {
"filter": [
{
"terms": {
"likes": {
"index": "users",
"id": "vu0E1rjJEqcgyfj29fwZ",
"path": "likes"
}
}
}
],
"must_not": [
{
"ids": {
"values": [
"vu0E1rjJEqcgyfj29fwZ"
]
}
}
]
}
}
}
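Between the two requests, the client has to turn the step 1 response into a list of place IDs. Below is a minimal Python sketch of that glue code. It assumes the standard hits.hits[*]._source response shape and that each user document stores place IDs in its likes array; the function name collect_similar_likes is hypothetical. It also counts how many similar users liked each place, which you could use to order or weight the IDs. Keep in mind that step 1 returns only 10 users unless you set size.

```python
from collections import Counter


def collect_similar_likes(step1_response: dict, own_likes: set) -> Counter:
    """Count how many similar users liked each place (hypothetical helper).

    Assumes the usual search response shape: hits.hits[*]._source.likes.
    """
    counts = Counter()
    for hit in step1_response.get("hits", {}).get("hits", []):
        likes = hit.get("_source", {}).get("likes", [])
        # set() so a place listed twice by the same user is counted once
        for place_id in set(likes):
            # the querying user's own likes are excluded in step 2 anyway
            if place_id not in own_likes:
                counts[place_id] += 1
    return counts


# Example with a hand-written response (not real output)
response = {
    "hits": {
        "hits": [
            {"_id": "u2", "_source": {"likes": ["p1", "p2", "p3"]}},
            {"_id": "u3", "_source": {"likes": ["p2", "p4"]}},
        ]
    }
}
counts = collect_similar_likes(response, own_likes={"p1"})
# most liked first; this list is what goes into the step 2 ids clause
similar_place_ids = [place_id for place_id, _ in counts.most_common()]
```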
Step 2 query example:
GET places/_search
{
"_source": [
"id"
],
"size": 1,
"query": {
"function_score": {
"query": {
"bool": {
"should": [
{
"ids": {
"values": [] # Put all the similar user like ids here
}
}
],
"must_not": [
{
"terms": {
"_id": {
"index": "users",
"id": "vu0E1rjJEqcgyfj29fwZ",
"path": "seen"
}
}
},
{
"terms": {
"_id": {
"index": "users",
"id": "vu0E1rjJEqcgyfj29fwZ",
"path": "likes"
}
}
}
],
"filter": {
"geo_distance": {
"distance": "200km",
"location": {
"lat": 52,
"lon": 13
}
}
}
}
},
"random_score": {},
"boost_mode": "sum" # "replace" would discard the score of the ids clause above
}
}
}
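If you build the step 2 body in code, another option is to move the boost out of the query and into a weighted function, so that it survives "boost_mode": "replace". The Python sketch below does that: places whose IDs appear in similar_place_ids get a fixed weight of 10 (an arbitrary value) plus the random score, and all other places get only the random score. The function name and its parameters are hypothetical; the index, field and user names come from the question.

```python
import json


def build_step2_query(user_id: str, similar_place_ids: list, lat: float, lon: float,
                      distance: str = "200km", size: int = 10) -> dict:
    """Exclude seen/liked places and boost places liked by similar users (hypothetical helper)."""

    def lookup(path: str) -> dict:
        # terms lookup: read the IDs from the user's own document in the users index
        return {"terms": {"_id": {"index": "users", "id": user_id, "path": path}}}

    return {
        "_source": ["id"],
        "size": size,
        "query": {
            "function_score": {
                "query": {
                    "bool": {
                        "must_not": [lookup("seen"), lookup("likes")],
                        "filter": {
                            "geo_distance": {
                                "distance": distance,
                                "location": {"lat": lat, "lon": lon},
                            }
                        },
                    }
                },
                "functions": [
                    # adds 10 only for places liked by similar users (arbitrary weight)
                    {"filter": {"ids": {"values": list(similar_place_ids)}}, "weight": 10},
                    # random tie-breaker within each group
                    {"random_score": {}},
                ],
                "score_mode": "sum",      # add the two function scores together
                "boost_mode": "replace",  # ignore the (constant) query score
            }
        },
    }


body = build_step2_query("vu0E1rjJEqcgyfj29fwZ", ["p2", "p3", "p4"], lat=52, lon=13)
print(json.dumps(body, indent=2))
```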

You could use a gauss decay function within your function_score query:
GET /places/_search
{
"size": 5,
"query": {
"function_score": {
"query": {
"bool": {
"must_not": [
{
"terms": {
"_id": {
"index": "users",
"type": "_doc",
"id": "vu0E1rjJEqcgyfj29fwZ",
"path": "seen"
}
}
},
{
"terms": {
"_id": {
"index": "users",
"type": "_doc",
"id": "vu0E1rjJEqcgyfj29fwZ",
"path": "likes"
}
}
}
]
}
},
"functions": [
{
"gauss": {
"location": {
"origin": {
"lat": 52,
"lon": 13
},
"scale": "200km"
}
}
}
],
"boost_mode": "replace"
}
}
}
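To get a feel for how the gauss function turns distance into a score, here is a small Python sketch of the decay formula from the function_score documentation, using the default decay of 0.5 and offset of 0 (gauss_decay is a hypothetical name). With "scale": "200km", a place 200 km away scores 0.5 and the score keeps shrinking beyond that. Note that this ranks places only by distance; it does not use the likes of similar users, and since the geo_distance filter is gone, places further than 200 km are no longer excluded.

```python
import math


def gauss_decay(distance: float, scale: float, offset: float = 0.0, decay: float = 0.5) -> float:
    """Gaussian decay: 1 at the origin, `decay` at distance offset + scale."""
    # sigma^2 is chosen so that the score equals `decay` at distance offset + scale
    sigma_sq = -scale ** 2 / (2 * math.log(decay))
    d = max(0.0, distance - offset)
    return math.exp(-d ** 2 / (2 * sigma_sq))


for km in (0, 100, 200, 400):
    print(f"{km:>3} km -> {gauss_decay(km, scale=200):.4f}")
```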
But I wonder what the current connection between the likes and places is in your data model.

Related

ElasticSearch: Fetch records from nested Array that "only" include given element/s and filter-out the rest with mixed values

I am stuck on one of my tasks.
Overview:
There are some records in Elasticsearch that include information about candidates and their employment.
One field stores the statuses with which the candidate was submitted to jobs.
{
"submittedJobs": [
{
"status": "PendingPM", "jobId": "ABC", ...
},
{
"status": "PendingClient", "jobId": "XYZ", ...
},
{
"status": "PendingPM", "jobId": "WXY", ...
},
...
]
}
I want to write an ES query that fetches all records whose submittedJobs array contains "only" "PendingPM" statuses and no other statuses.
"query": {
"bool": {
"filter": [
{
"nested": {
"path": "submittedJobs",
"query": {
"bool": {
"must": [
{
"term": {
"submittedJobs.status.keyword": "PendingPM"
}
}
]
}
}
}
}
]
}
}
I tried this query, but it returns records that contain "PendingPM" along with other statuses; it seems to use contains() logic.
Here is the mapping:
"submittedJobs": {
"type": "nested",
"properties": {
"statusId": {
"type": "long"
},
"status": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256,
"normalizer": "lowercase_normalizer"
}
}
},
"jobId": {
"type": "keyword"
}
}
}
For example, suppose there are two documents:
document #1:
{
"submittedJobs": [
{
"status": "PendingPM", "jobId": "ABC", ...
},
{
"status": "PendingClient", "jobId": "XYZ", ...
},
{
"status": "PendingPM", "jobId": "WXY", ...
},
...
]
}
document #2:
{
"submittedJobs": [
{
"status": "PendingPM", "jobId": "ABC", ...
},
{
"status": "PendingPM", "jobId": "WXY", ...
},
...
]
}
Only document #2 should be returned, because its entire array contains only "PendingPM" and no other statuses.
Document #1 should be filtered out because it contains mixed statuses.
Any help will be appreciated.
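Try this. It excludes every document that has at least one submittedJobs item whose status is not PendingPM, and the filter requires at least one PendingPM item, so only documents whose array contains nothing but PendingPM are returned:
{
"query": {
"bool": {
"filter": [
{
"nested": {
"path": "submittedJobs",
"query": {
"term": {
"submittedJobs.status.keyword": "PendingPM"
}
}
}
}
],
"must_not": [
{
"nested": {
"path": "submittedJobs",
"query": {
"bool": {
"must_not": [
{
"term": {
"submittedJobs.status.keyword": "PendingPM"
}
}
]
}
}
}
}
]
}
}
}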
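The combination of must_not and nested is easy to get backwards. The plain-Python model below (not Elasticsearch code; all helper names are hypothetical) treats a nested query as "at least one item matches", lowercases statuses to mimic the lowercase_normalizer on status.keyword, and checks the two example documents. It also shows that a must_not clause which only rejects items whose status is neither PendingPM nor PendingClient would still let document #1 through.

```python
def nested_matches(items, predicate) -> bool:
    """A nested query matches the root document if at least one item matches."""
    return any(predicate(item) for item in items)


def is_pending_pm(item) -> bool:
    # status.keyword uses a lowercase normalizer, so compare lowercased values
    return item.get("status", "").lower() == "pendingpm"


def only_pending_pm(doc) -> bool:
    items = doc.get("submittedJobs", [])
    # filter: nested { term PendingPM }
    has_pm = nested_matches(items, is_pending_pm)
    # must_not: nested { bool { must_not term PendingPM } }
    has_other = nested_matches(items, lambda item: not is_pending_pm(item))
    return has_pm and not has_other


def pm_or_client_only(doc) -> bool:
    # must_not: nested { bool { must_not PendingPM, must_not PendingClient } }
    allowed = {"pendingpm", "pendingclient"}
    items = doc.get("submittedJobs", [])
    return not nested_matches(items, lambda item: item.get("status", "").lower() not in allowed)


doc1 = {"submittedJobs": [{"status": "PendingPM", "jobId": "ABC"},
                          {"status": "PendingClient", "jobId": "XYZ"},
                          {"status": "PendingPM", "jobId": "WXY"}]}
doc2 = {"submittedJobs": [{"status": "PendingPM", "jobId": "ABC"},
                          {"status": "PendingPM", "jobId": "WXY"}]}
print(only_pending_pm(doc1), only_pending_pm(doc2), pm_or_client_only(doc1))
```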
You can use inner_hits with the nested query to get only the matching nested items from each document.
Here is a working example:
Index Mapping:
{
"mappings": {
"properties": {
"submittedJobs": {
"type": "nested"
}
}
}
}
Search Query:
{
"query": {
"bool": {
"filter": [
{
"nested": {
"path": "submittedJobs",
"query": {
"bool": {
"must": [
{
"term": {
"submittedJobs.status.keyword": "PendingPM"
}
}
]
}
},
"inner_hits": {}
}
}
]
}
}
}
Search Result would be:
"hits": [
{
"_index": "73062439",
"_id": "1",
"_score": 0.0,
"_source": {
"submittedJobs": [
{
"status": "PendingPM",
"jobId": "ABC"
},
{
"status": "PendingClient",
"jobId": "XYZ"
},
{
"status": "PendingPM",
"jobId": "WXY"
}
]
},
"inner_hits": { // note this
"submittedJobs": {
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": 0.4700036,
"hits": [
{
"_index": "73062439",
"_id": "1",
"_nested": {
"field": "submittedJobs",
"offset": 0
},
"_score": 0.4700036,
"_source": {
"jobId": "ABC",
"status": "PendingPM"
}
},
{
"_index": "73062439",
"_id": "1",
"_nested": {
"field": "submittedJobs",
"offset": 2
},
"_score": 0.4700036,
"_source": {
"jobId": "WXY",
"status": "PendingPM"
}
}
]
}
}
}
}
]
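Note that inner_hits does not remove document #1 from the results; it only tells you which nested items matched, and it returns at most 3 of them per document unless you set a size inside inner_hits. If you go this route, the client has to read the matched items out of each hit. A minimal Python sketch, assuming the response shape shown above (matched_nested_sources is a hypothetical helper):

```python
def matched_nested_sources(response: dict, path: str) -> dict:
    """Map each root document _id to the _source of its matching nested items."""
    result = {}
    for hit in response["hits"]["hits"]:
        inner = hit.get("inner_hits", {}).get(path, {}).get("hits", {}).get("hits", [])
        result[hit["_id"]] = [nested["_source"] for nested in inner]
    return result


# Trimmed-down version of the search result above
response = {"hits": {"hits": [{
    "_id": "1",
    "inner_hits": {"submittedJobs": {"hits": {"hits": [
        {"_nested": {"field": "submittedJobs", "offset": 0},
         "_source": {"jobId": "ABC", "status": "PendingPM"}},
        {"_nested": {"field": "submittedJobs", "offset": 2},
         "_source": {"jobId": "WXY", "status": "PendingPM"}},
    ]}}},
}]}}
print(matched_nested_sources(response, "submittedJobs"))
```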

Elasticsearch nested path query into an object type

I have this template (abbreviated version):
{
"index_patterns": "index_pattern*",
"order": 1,
"version": 1,
"aliases": {
"some_alias": {}
},
"settings": {
"number_of_shards": 5
},
"mappings": {
"dynamic": "false",
"properties": {
"someId": {
"type": "keyword"
},
"audience": {
"type": "object",
"properties": {
....
"ageRanges": {
"type": "nested",
"properties": {
"ageTo": {
"type": "integer"
},
"ageFrom": {
"type": "integer"
}
}
}
}
}
}
}
}
I need to query whether audience.ageRanges does not exist or, if it does exist, apply other filters.
Let's say we want to check whether a document with a specific someId value matches the audience.ageRanges query clauses (removed for clarity).
The document has some audience properties but no ageRanges:
"audience": {
"genders": [
"any"
],
"deviceType": "any"
}
Shouldn't the below query return the specific document?
{
"query": {
"bool": {
"must": [
{
"term": {
"someId": {
"value": "03183f31"
}
}
},
{
"nested": {
"path": "audience.ageRanges",
"query": {
"bool": {
"must_not": [
{
"exists": {
"field": "audience.ageRanges"
}
}
]
}
}
}
}
]
}
}
}
I get 0 results, and I'm a bit confused about how this works.
If I try a document ID that does have audience.ageRanges items and change the must_not inside the nested query to must, I get results.
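Instead of putting must_not inside the nested query, put the nested query inside the must_not. A nested query matches a document only when at least one of its nested objects matches the inner query. Your document has no ageRanges objects at all, so there is nothing for the inner must_not to match, and the document is never returned.
Consider this sample document: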
{
"someId":123,
"audience": {
"genders": [
"any"
],
"deviceType": "any"
}
}
Modify your search query as shown below:
Search Query:
{
"query": {
"bool": {
"must": [
{
"term": {
"someId": {
"value": "123"
}
}
},
{
"bool": {
"must_not": {
"nested": {
"path": "audience.ageRanges",
"query": {
"bool": {
"must": [
{
"exists": {
"field": "audience.ageRanges"
}
}
]
}
}
}
}
}
}
]
}
}
}
Search Result:
"hits": [
{
"_index": "65852173",
"_type": "_doc",
"_id": "1",
"_score": 0.2876821,
"_source": {
"someId": 123,
"audience": {
"genders": [
"any"
],
"deviceType": "any"
}
}
}
]
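Why the placement matters can be seen in a plain-Python model of the two queries (not Elasticsearch code; helper names are hypothetical). It assumes that exists on audience.ageRanges is true for a nested item that has at least one non-null field, and it treats a nested query as matching when at least one nested item matches:

```python
def nested_matches(items, predicate) -> bool:
    """A nested query matches the root document if at least one nested item matches."""
    return any(predicate(item) for item in items)


def field_exists(item) -> bool:
    # assumption: "exists" is true when the nested object has any non-null value
    return any(value is not None for value in item.values())


def age_ranges(doc):
    return doc.get("audience", {}).get("ageRanges", [])


def question_query(doc) -> bool:
    # nested { bool { must_not: exists } }: needs an item for which exists is false
    return nested_matches(age_ranges(doc), lambda item: not field_exists(item))


def answer_query(doc) -> bool:
    # bool { must_not: nested { exists } }: true when no ageRanges item exists
    return not nested_matches(age_ranges(doc), field_exists)


without_ranges = {"someId": 123, "audience": {"genders": ["any"], "deviceType": "any"}}
with_ranges = {"someId": 456, "audience": {"ageRanges": [{"ageFrom": 18, "ageTo": 30}]}}
print(question_query(without_ranges), answer_query(without_ranges))
```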

How can I search across all fields and in a specific field at the same time in Elasticsearch?

I'm very new to Elasticsearch. How do I write a query that searches for one keyword (e.g. "test keyword") in all fields of the document, and for another keyword in a specific field?
This can be done with query_string, but query_string can't search nested fields when the nested field is specified, so I'm using LUQUM to convert the Lucene query to Elasticsearch DSL.
Below is a sample use case:
I have a mapping:
{
"mappings": {
"properties": {
"grocery_name":{
"type": "text"
},
"items": {
"type": "nested",
"properties": {
"name": {
"type": "text"
},
"stock": {
"type": "integer"
},
"category": {
"type": "text"
}
}
}
}
}
}
and the data looks like this:
{
"grocery_name": "Elastic Eats",
"items": [
{
"name": "Red banana",
"stock": "12",
"category": "fruit"
},
{
"name": "Cavendish banana",
"stock": "10",
"category": "fruit"
},
{
"name": "peach",
"stock": "10",
"category": "fruit"
},
{
"name": "carrot",
"stock": "9",
"category": "vegetable"
},
{
"name": "broccoli",
"stock": "5",
"category": "vegetable"
}
]
}
How can I query for all items whose name matches "banana" in the grocery whose grocery_name is "Elastic Eats"?
I tried * and _all, but neither worked.
Example query:
{
"query": {
"bool": {
"must": [
{
"match_phrase": {
"grocery_name": {
"query": "Elastic Eats"
}
}
},
{
"match": {
"*": {
"query": "banana",
"zero_terms_query": "all"
}
}
}
]
}
}
}
I'm sure I'm missing something obvious, but I have read the manual and I'm getting no joy at all.
UPDATE:
Enabling include_in_parent on the nested object makes the query below work, but it duplicates the data internally, which will increase index size and memory usage.
{
"query": {
"bool": {
"must": [
{
"match_phrase": {
"grocery_name": {
"query": "Elastic Eats"
}
}
},
{
"multi_match": {
"query": "banana"
}
}
]
}
}
}
Is there any other way to do this?
You need to wrap the match in a nested query so that it is evaluated against the nested items rather than the root document; inner_hits then returns the individual items that matched.
Search Query
{
"query": {
"bool": {
"filter": [
{
"term": {
"grocery_name": "elastic"
}
},
{
"nested": {
"path": "items",
"query": {
"bool": {
"must": [
{
"match": {
"items.name": "banana"
}
}
]
}
},
"inner_hits": {}
}
}
]
}
}
}
Search Result:
"inner_hits": {
"items": {
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": 0.744874,
"hits": [
{
"_index": "stof_64273970",
"_type": "_doc",
"_id": "1",
"_nested": {
"field": "items",
"offset": 0
},
"_score": 0.744874,
"_source": {
"name": "Red banana",
"stock": "12",
"category": "fruit"
}
},
{
"_index": "stof_64273970",
"_type": "_doc",
"_id": "1",
"_nested": {
"field": "items",
"offset": 1
},
"_score": 0.744874,
"_source": {
"name": "Cavendish banana",
"stock": "10",
"category": "fruit"
}
}
]
}
}
}
Update 1:
Based on your comments, you can use a multi_match query for your use case. From the multi_match documentation:
If no fields are provided, the multi_match query defaults to the index.query.default_field index settings, which in turn defaults to *. (*) extracts all fields in the mapping that are eligible to term queries and filters the metadata fields. All extracted fields are then combined to build a query.
Search Query:
{
"query": {
"bool": {
"filter": [
{
"term": {
"grocery_name": "elastic"
}
},
{
"nested": {
"path": "items",
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "banana" <-- note this
}
}
]
}
},
"inner_hits": {}
}
}
]
}
}
}
Update 2:
To match "banana" in either the top-level fields or the nested items, combine multiple bool queries like this:
{
"query": {
"bool": {
"must": [
{
"match_phrase": {
"grocery_name": {
"query": "Elastic Eats"
}
}
},
{
"bool": {
"should": [
{
"bool": {
"must": [
{
"multi_match": {
"query": "banana"
}
}
]
}
},
{
"bool": {
"must": [
{
"nested": {
"path": "items",
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "banana"
}
}
]
}
},
"inner_hits": {}
}
}
]
}
}
]
}
}
]
}
}
}
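If the keyword and grocery name come from user input, the Update 2 query can be generated in code. A Python sketch (build_grocery_query is a hypothetical name); it drops the single-clause bool wrappers from the query above, which should not change which documents match. Because the inner bool has only should clauses, at least one of them must match.

```python
import json


def build_grocery_query(grocery_name: str, keyword: str, nested_path: str = "items") -> dict:
    """Match keyword in top-level fields OR in nested items, within one grocery."""
    # no "fields": multi_match falls back to index.query.default_field (default *)
    any_field = {"multi_match": {"query": keyword}}
    return {
        "query": {
            "bool": {
                "must": [
                    {"match_phrase": {"grocery_name": {"query": grocery_name}}},
                    {
                        "bool": {
                            # only should clauses here, so at least one must match
                            "should": [
                                any_field,
                                {"nested": {"path": nested_path, "query": any_field, "inner_hits": {}}},
                            ]
                        }
                    },
                ]
            }
        }
    }


print(json.dumps(build_grocery_query("Elastic Eats", "banana"), indent=2))
```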

Filter not working for weighted search

I am pretty new to Elasticsearch and haven't really got the hang of it yet. I have a search whose results are weighted by the weights of their tags, which works fine, but when I introduced a filter, the search started returning empty results. Here is what I have tried:
{
"nested": {
"path": "tags",
"score_mode": "sum",
"query": {
"function_score": {
"query": {
"bool": {
"must": [
{
"match_phrase_prefix": {
"tags.tag": "big"
}
}
],
"filter": {
"term": {
"type.primary": "audio"
}
}
}
},
"field_value_factor": {
"field": "tags.weight"
},
"boost_mode": "multiply",
"boost": 10
}
}
}
}
The example result with the filter should be something like this:
{
"_index": "assets",
"_type": "Asset",
"_id": "5a1dc3c0848662ee49e36f43s",
"_score": 886.8744,
"_source": {
"name": "And Action Breakbeat",
"meta_data": {
"type": "audio/mp3",
"file_name": "music_zapsplat_and_action_breakbeat.mp3"
},
"file_key": "music_zapsplat_and_action_breakbeat.mp3",
"src": {
"url": "https://exapmle.com/music_zapsplat_and_action_breakbeat.mp3"
},
"type": {
"primary": "AUDIO",
"secondary": "mp3"
},
"thumbnail_url": "https://example.com/thumbnail/audio.jpg",
"tags": [
{
"tag": "big",
"weight": 10
},
{
"tag": "beat",
"weight": 5
},
{
"tag": "music",
"weight": 3.3333333333333335
}
],
"isDeleted": false,
}
}
Thank you!
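You cannot match type.primary inside a nested query for tags: a nested query searches only the nested tags objects, and they have no type.primary field, so the filter can never match. Move the filter to the top-level bool instead. Try this query: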
{
"query": {
"bool": {
"filter": {
"term": {
"type.primary": "audio"
}
},
"must": [
{
"nested": {
"path": "tags",
"score_mode": "sum",
"query": {
"function_score": {
"query": {
"match_phrase_prefix": {
"tags.tag": "big"
}
},
"field_value_factor": {
"field": "tags.weight"
},
"boost_mode": "multiply",
"boost": 10
}
}
}
}
]
}
}
}
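A Python sketch that generates this query for any tag prefix and type (build_weighted_tag_search is a hypothetical name). It keeps "score_mode": "sum" on the nested query, as in the question's original query, because that is where it controls how the scores of the matching tags are combined; inside function_score, score_mode only combines multiple functions. One more thing to check: term queries are not analyzed, so whether "audio" matches the stored "AUDIO" depends on how type.primary is mapped (an analyzed text field is lowercased, a plain keyword field is not).

```python
import json


def build_weighted_tag_search(tag_prefix: str, primary_type: str, boost: float = 10) -> dict:
    """Root-level filter plus a nested, tag-weight-based score (hypothetical helper)."""
    return {
        "query": {
            "bool": {
                # type.primary lives on the root document, so it is filtered here,
                # outside the nested query
                "filter": {"term": {"type.primary": primary_type}},
                "must": [
                    {
                        "nested": {
                            "path": "tags",
                            "score_mode": "sum",  # add up the scores of all matching tags
                            "query": {
                                "function_score": {
                                    "query": {"match_phrase_prefix": {"tags.tag": tag_prefix}},
                                    # multiply the text score by the tag's stored weight
                                    "field_value_factor": {"field": "tags.weight"},
                                    "boost_mode": "multiply",
                                    "boost": boost,
                                }
                            },
                        }
                    }
                ],
            }
        }
    }


print(json.dumps(build_weighted_tag_search("big", "audio"), indent=2))
```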

Unsure how to structure ElasticSearch Range Query with nested properties

My object in ES looks like:
{
"_index": "myIndex",
"_type": "myType",
"_id": "75fd98d2-eca7-4a94-9dd8-1cc2c9b1fbbf",
"_version": 2,
"found": true,
"_source": {
"account_id": "100",
"import_ids": [
"4f4eef42-5493-464e-ac08-68a3a25a01fb"
],
"accept": "html",
"deleted_at": null,
"signup_form_ids": [
{
"timestamp": "2015-11-23T20:08:11.604000",
"signup_form_id": "1234"
}
],
"mailing_status": "active",
"group_ids": [
"0eddd2c0-ce70-4eb7-bcd8-9e41e41ac0b3"
],
"confirmed_opt_in_at": null,
"fields": [
{
"text_value": "My Company",
"name": "company"
},
{
"text_value": "Foo",
"name": "first-name"
},
{
"text_value": "Bar",
"name": "last_name"
},
{
"text_value": "555-555-5555",
"name": "phone"
}
],
"created_at": "2015-11-23T19:20:15.889000",
"last_modified_at": "2015-11-23T20:08:11.604000",
"bounce_count": 0,
"opted_out_at": null,
"archived_at": null,
"email": "example#example.com",
"opt_out_mailing_id": "None"
}
}
I am trying to write a query that returns all hits where signup_form_ids.timestamp is lte now-7d/d. I'm looking at https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-range-query.html#ranges-on-dates but am unsure how to structure the query.
This is what I have so far:
{
"query": {
"nested": {
"path": "signup_form_ids",
"bool": {
"must": [
{
"range": {
"timestamp" {
"lte": "now-7d/d"
}
}
}
]
}
},
"bool": {
"must": [
{
"bool": {
"must": []
}
},
{
"match": {
"account_id": "100"
}
},
{
"filtered": {
"filter": {
"missing": {
"field": "deleted_at"
}
}
}
}
]
}
},
"size": 500,
"from": 0
}
There are several things wrong here, and it's not entirely obvious which ones are artifacts of you adjusting your query to post here.
First, you're missing a colon after "timestamp" in your query. Also, you have an empty inner "bool". And your "range" query is inside a needless "bool". Also your "filtered" clause is redundant and you can just use the "filter" inside it.
But the main problems are that 1) your "nested" query needs to be inside your "bool" if you want all the conditions to apply, 2) your "nested" "range" filter needs to specify the full path to "timestamp" and 3) the "bool" inside your "nested" clause needs to be in a "filter".
So, making minimal adjustments to make the query work, the following query returns the document you posted (I changed the "lte" to "gte" so the document you posted would be returned, otherwise it doesn't match the query, yet):
POST /test_index/_search
{
"query": {
"bool": {
"must": [
{
"bool": {
"must": []
}
},
{
"match": {
"account_id": "100"
}
},
{
"filtered": {
"filter": {
"missing": {
"field": "deleted_at"
}
}
}
},
{
"nested": {
"path": "signup_form_ids",
"filter": {
"bool": {
"must": [
{
"range": {
"signup_form_ids.timestamp": {
"gte": "now-7d/d"
}
}
}
]
}
}
}
}
]
}
},
"size": 500,
"from": 0
}
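Switching between lte and gte on now-7d/d matters more than it looks, because of date rounding: according to the range query documentation, /d rounds down to the start of the day for gte but up to the last millisecond of the day for lte. A small Python sketch of the resulting inclusive bounds, assuming UTC (the default time zone for date math) and a timezone-aware now; rounded_bound is a hypothetical helper and the now value is just a sample.

```python
from datetime import datetime, timedelta, timezone


def rounded_bound(now: datetime, days: int, op: str) -> datetime:
    """Inclusive bound for 'now-<days>d/d' used with gte or lte (UTC only)."""
    day_start = (now - timedelta(days=days)).astimezone(timezone.utc).replace(
        hour=0, minute=0, second=0, microsecond=0)
    if op == "gte":
        return day_start  # rounded down to the first millisecond of the day
    if op == "lte":
        # rounded up to the last millisecond of the day
        return day_start + timedelta(days=1) - timedelta(milliseconds=1)
    raise ValueError("only gte and lte are modelled here")


now = datetime(2015, 11, 30, 15, 0, tzinfo=timezone.utc)  # sample value
timestamp = datetime(2015, 11, 23, 20, 8, 11, 604000, tzinfo=timezone.utc)  # from the posted document
print(rounded_bound(now, 7, "gte"), rounded_bound(now, 7, "lte"))
```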
If I clean it up to remove all the redundancies, I end up with:
POST /test_index/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"account_id": "100"
}
},
{
"missing": {
"field": "deleted_at"
}
},
{
"nested": {
"path": "signup_form_ids",
"filter": {
"range": {
"signup_form_ids.timestamp": {
"gte": "now-7d/d"
}
}
}
}
}
]
}
},
"size": 500,
"from": 0
}
Here is some code I used to play around with it:
http://sense.qbox.io/gist/ee96042c0505dfb07199b919d134b2a20c5a66fd
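The queries above use syntax from older Elasticsearch releases: the filtered and missing queries were removed in Elasticsearch 5.0, and current versions expect the nested query's conditions under query rather than filter. Here is a Python sketch that builds an equivalent body for newer versions (build_signup_query and its parameters are hypothetical). A bool must_not with exists stands in for missing; exists treats a null deleted_at as missing, so the posted document still qualifies.

```python
import json


def build_signup_query(account_id: str, date_math: str = "now-7d/d", op: str = "lte",
                       size: int = 500, offset: int = 0) -> dict:
    """Build the search body for one account_id (hypothetical helper)."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"account_id": account_id}}],
                "filter": [
                    {
                        "nested": {
                            "path": "signup_form_ids",
                            # full path to the nested field, under "query"
                            "query": {"range": {"signup_form_ids.timestamp": {op: date_math}}},
                        }
                    }
                ],
                # replaces the removed "missing" query
                "must_not": [{"exists": {"field": "deleted_at"}}],
            }
        },
        "size": size,
        "from": offset,
    }


print(json.dumps(build_signup_query("100"), indent=2))
```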
