Querying and matching on 2 keys in a sub-object - elasticsearch

Hopefully I will be able to explain this issue clearly enough :/
I am trying to run a query that returns a list of users who have liked an artist AND have a score greater than or equal to a given value for that artist. Consider these documents:
{
"profile": 12345,
"artists": [
{
"id": 135,
"score": 10
},
{
"id": 246,
"score": 50
},
{
"id": 1357,
"score": 100
}
]
},
{
"profile": 24680,
"artists": [
{
"id": 135,
"score": 1
},
{
"id": 246,
"score": 500
},
{
"id": 1357,
"score": 77
}
]
},
{
"profile": 13579,
"artists": [
{
"id": 135,
"score": 5
},
{
"id": 246,
"score": 1000
},
{
"id": 1357,
"score": 150
}
]
}
Now, I want to find users who have an artists.id value of 1357 AND a score greater than or equal to 100 for that artist. So I would expect users 12345 and 13579 to be returned. However, if I run the following query:
{
"query": {
"bool": {
"must": [
{
"term": {
"artists.id": "1357"
}
}
],
"filter": {
"range": {
"artists.score": {
"gte": 100
}
}
}
}
}
}
Then all three users are returned. User 24680 has a score of at least 100 for a different artist (500 for artist 246), so he is still treated as a match even though his score for artist 1357 is only 77.
Does anyone know a way to require both conditions to match on the same artist entry, or at least to apply the score filter only to the entries where the id condition matched?
...if that makes any sense
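A minimal sketch of why this happens, using the field names from the sample documents (artists.id, artists.score). With the default object mapping, an array of objects is flattened into one list of ids and one list of scores per document, so each condition can be satisfied by a different array element. The nested_query helper below is a hypothetical name, and the request it builds only works on the assumption that "artists" is mapped as type "nested" (which means changing the mapping and re-indexing).

```python
# Sample documents from the question.
DOCS = [
    {"profile": 12345, "artists": [{"id": 135, "score": 10},
                                   {"id": 246, "score": 50},
                                   {"id": 1357, "score": 100}]},
    {"profile": 24680, "artists": [{"id": 135, "score": 1},
                                   {"id": 246, "score": 500},
                                   {"id": 1357, "score": 77}]},
    {"profile": 13579, "artists": [{"id": 135, "score": 5},
                                   {"id": 246, "score": 1000},
                                   {"id": 1357, "score": 150}]},
]


def flattened_match(doc, artist_id, min_score):
    """Mimics the default object mapping: ids and scores are matched independently."""
    ids = [a["id"] for a in doc["artists"]]
    scores = [a["score"] for a in doc["artists"]]
    return artist_id in ids and any(s >= min_score for s in scores)


def nested_match(doc, artist_id, min_score):
    """Mimics a nested mapping: both conditions must hold on the same element."""
    return any(a["id"] == artist_id and a["score"] >= min_score
               for a in doc["artists"])


def nested_query(artist_id, min_score):
    """Request body sketch; assumes 'artists' is mapped as type 'nested'."""
    return {
        "query": {
            "nested": {
                "path": "artists",
                "query": {
                    "bool": {
                        "filter": [
                            {"term": {"artists.id": artist_id}},
                            {"range": {"artists.score": {"gte": min_score}}},
                        ]
                    }
                },
            }
        }
    }


flat_hits = [d["profile"] for d in DOCS if flattened_match(d, 1357, 100)]
nested_hits = [d["profile"] for d in DOCS if nested_match(d, 1357, 100)]
```

Here flat_hits contains all three profiles, which reproduces the behaviour described above, while nested_hits contains only the two expected ones.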

Related

Filter documents out of the facet count in enterprise search

We use enterprise search indexes to store items that can be tagged by multiple tenants, e.g.:
[
{
"id": 1,
"name": "document 1",
"tags": [
{ "company_id": 1, "tag_id": 1, "tag_name": "bla" },
{ "company_id": 2, "tag_id": 1, "tag_name": "bla" }
]
}
]
I'm looking for a way to retrieve all documents with only the tags of company 1.
This request:
{
"query": "",
"facets": {
"tags": {
"type": "value"
}
},
"sort": {
"created": "desc"
},
"page": {
"size": 20,
"current": 1
}
}
comes back with:
...
"facets": {
"tags": [
{
"type": "value",
"data": [
{
"value": "{\"company_id\":1,\"tag_id\":1,\"tag_name\":\"bla\"}",
"count": 1
},
{
"value": "{\"company_id\":2,\"tag_id\":1,\"tag_name\":\"bla\"}",
"count": 1
}
]
}
],
}
...
Can I modify the request so that I get no tags with "company_id" = 2?
I have a solution that involves modifying the results to strip the extra data after they are retrieved but I'm looking for a better solution.
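For reference, the post-processing fallback mentioned above could look roughly like this. It is only a sketch of the client-side workaround, not a server-side answer: keep_company_tags is a hypothetical helper, and it assumes each facet value is the JSON-encoded tag object shown in the response excerpt.

```python
import json


def keep_company_tags(facet_data, company_id):
    """Return only the facet entries whose JSON-encoded value belongs to company_id."""
    kept = []
    for entry in facet_data:
        tag = json.loads(entry["value"])  # the facet value is a JSON string
        if tag.get("company_id") == company_id:
            kept.append(entry)
    return kept


# The "data" array from the response excerpt in the question.
facet_data = [
    {"value": "{\"company_id\":1,\"tag_id\":1,\"tag_name\":\"bla\"}", "count": 1},
    {"value": "{\"company_id\":2,\"tag_id\":1,\"tag_name\":\"bla\"}", "count": 1},
]

filtered = keep_company_tags(facet_data, company_id=1)
```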

Elasticsearch query fails to return results when querying a nested object

I have an object which looks something like this:
{
"id": 123,
"language_id": 1,
"label": "Pablo de la Pena",
"office": {
"count": 2,
"data": [
{
"id": 1234,
"is_office_lead": false,
"office": {
"id": 1,
"address_line_1": "123 Main Street",
"address_line_2": "London",
"address_line_3": "",
"address_line_4": "UK",
"address_postcode": "E1 2BC",
"city_id": 1
}
},
{
"id": 5678,
"is_office_lead": false,
"office": {
"id": 2,
"address_line_1": "77 High Road",
"address_line_2": "Edinburgh",
"address_line_3": "",
"address_line_4": "UK",
"address_postcode": "EH1 2DE",
"city_id": 2
}
}
]
},
"primary_office": {
"id": 1,
"address_line_1": "123 Main Street",
"address_line_2": "London",
"address_line_3": "",
"address_line_4": "UK",
"address_postcode": "E1 2BC",
"city_id": 1
}
}
My Elasticsearch mapping looks like this:
"mappings": {
"item": {
"properties": {
"office": {
"properties": {
"data": {
"type": "nested",
}
}
}
}
}
}
My Elasticsearch query looks something like this:
GET consultant/item/_search
{
"from": 0,
"size": 24,
"query": {
"bool": {
"must": [
{
"term": {
"language_id": 1
}
},
{
"term": {
"office.data.office.city_id": 1
}
}
]
}
}
}
This returns zero results. However, if I remove the second term and leave only the language_id clause, it works as expected.
I'm sure this is down to a misunderstanding on my part of how the nested object is flattened, but I'm out of ideas. I've tried all kinds of permutations of the query and mappings.
Any guidance hugely appreciated. I am using Elasticsearch 6.1.1.
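I'm not sure whether you need the entire record. This solution returns every record that has language_id: 1 and at least one office.data entry with office.city_id: 1. Because office.data is mapped as nested, its entries are indexed as separate hidden documents, so a plain term query on office.data.office.city_id cannot see them; the clause has to be wrapped in a nested query.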
GET consultant/item/_search
{
"from": 0,
"size": 100,
"query": {
"bool":{
"must": [
{
"term": {
"language_id": {
"value": 1
}
}
},
{
"nested": {
"path": "office.data",
"query": {
"match": {
"office.data.office.city_id": 1
}
}
}
}
]
}
}
}
I put 3 different records in my test index to guard against false hits: the matching one, one with a different language_id and one with different office ids. Only the matching one was returned.
If you only need the office data, that's a bit different but still solvable.
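One possible way to get just the matching office entries is to add inner_hits to the nested clause, which returns the nested objects that matched alongside each hit. A rough sketch of building such a request body follows; consultants_in_city is a hypothetical helper name, and trimming _source to "id" and "label" is my own assumption about which top-level fields are still wanted.

```python
def consultants_in_city(language_id, city_id, size=24):
    """Build a search body that also returns the matching office.data entries."""
    return {
        "from": 0,
        "size": size,
        # Assumption: only these top-level fields are needed in the response.
        "_source": ["id", "label"],
        "query": {
            "bool": {
                "must": [
                    {"term": {"language_id": language_id}},
                    {
                        "nested": {
                            "path": "office.data",
                            "query": {
                                "term": {"office.data.office.city_id": city_id}
                            },
                            # Empty options: return the matching nested entries
                            # with the default inner_hits settings.
                            "inner_hits": {},
                        }
                    },
                ]
            }
        },
    }


body = consultants_in_city(language_id=1, city_id=1)
```

The matching offices then show up under inner_hits in each hit rather than in _source.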

Extract record from multiple arrays based on a filter

I have documents in Elasticsearch with the following structure:
"_source": {
"last_updated": "2017-10-25T18:33:51.434706",
"country": "Italia",
"price": [
"€ 139",
"€ 125",
"€ 120",
"€ 108"
],
"max_occupancy": [
2,
2,
1,
1
],
"type": [
"Type 1",
"Type 1 - (Tag)",
"Type 2",
"Type 2 (Tag)"
],
"availability": [
10,
10,
10,
10
],
"size": [
"26 m²",
"35 m²",
"47 m²",
"31 m²"
]
}
}
Basically, the detail records are split across 5 arrays (price, max_occupancy, type, availability, size), and the fields of one record sit at the same index position in all 5 arrays. I want to extract the record whose max_occupancy is greater than or equal to 2 (if there is no record with 2, take one with 3; if there is none with 3, take one with 4, and so on) and which has the lowest price among those, and place the result into a new JSON object like the following:
{
"last_updated": "2017-10-25T18:33:51.434706",
"country": "Italia",
"price": "€ 125",
"max_occupancy": "2",
"type": "Type 1 - (Tag)",
"availability": 10,
"size": "35 m²"
}
The result should contain the extracted record (in this case the second element of every array) plus the general information (the fields "last_updated" and "country").
Is it possible to extract such a result from Elasticsearch? What kind of query do I need to perform?
Could someone suggest the best approach?
My suggested approach: go nested with the Nested datatype.
Besides easier querying, it is easier to read and understand the connections between values that are currently scattered across different arrays.
Yes, if you choose this approach you will have to edit your mapping and re-index all of your data.
What would the mapping look like? Something like this:
{
"mappings": {
"properties": {
"last_updated": {
"type": "date"
},
"country": {
"type": "keyword"
},
"records": {
"type": "nested",
"properties": {
"price": {
"type": "keyword"
},
"max_occupancy": {
"type": "long"
},
"type": {
"type": "keyword"
},
"availability": {
"type": "long"
},
"size": {
"type": "keyword"
}
}
}
}
}
}
EDIT: New document structure (containing nested documents) -
{
"last_updated": "2017-10-25T18:33:51.434706",
"country": "Italia",
"records": [
{
"price": "€ 139",
"max_occupancy": 2,
"type": "Type 1",
"availability": 10,
"size": "26 m²"
},
{
"price": "€ 125",
"max_occupancy": 2,
"type": "Type 1 - (Tag)",
"availability": 10,
"size": "35 m²"
},
{
"price": "€ 120",
"max_occupancy": 1,
"type": "Type 2",
"availability": 10,
"size": "47 m²"
},
{
"price": "€ 108",
"max_occupancy": 1,
"type": "Type 2 (Tag)",
"availability": 10,
"size": "31 m²"
}
]
}
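The re-indexing step itself is mostly a matter of zipping the five parallel arrays into one list of records. A minimal sketch, assuming every array has the same length and the same ordering (to_nested and FIELDS are names I made up for illustration):

```python
# The five parallel arrays that together describe one record each.
FIELDS = ("price", "max_occupancy", "type", "availability", "size")


def to_nested(source):
    """Turn the parallel-array document into one with a 'records' list."""
    columns = [source[field] for field in FIELDS]
    records = [dict(zip(FIELDS, values)) for values in zip(*columns)]
    # Keep every other top-level field (last_updated, country, ...) as is.
    doc = {key: value for key, value in source.items() if key not in FIELDS}
    doc["records"] = records
    return doc


old_source = {
    "last_updated": "2017-10-25T18:33:51.434706",
    "country": "Italia",
    "price": ["€ 139", "€ 125", "€ 120", "€ 108"],
    "max_occupancy": [2, 2, 1, 1],
    "type": ["Type 1", "Type 1 - (Tag)", "Type 2", "Type 2 (Tag)"],
    "availability": [10, 10, 10, 10],
    "size": ["26 m²", "35 m²", "47 m²", "31 m²"],
}

new_doc = to_nested(old_source)
```

Note that zip stops at the shortest array, so a document with arrays of unequal length would silently lose values; that case would need its own check.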
Now it is easier to query for any specific condition with a nested query and inner hits. For example:
{
"_source": [
"last_updated",
"country"
],
"query": {
"bool": {
"must": [
{
"term": {
"country": "Italia"
}
},
{
"nested": {
"path": "records",
"query": {
"bool": {
"must": [
{
"range": {
"records.max_occupancy": {
"gte": 2
}
}
}
]
}
},
"inner_hits": {
"sort": {
"records.price": "asc"
},
"size": 1
}
}
}
]
}
}
}
Conditions are: country is Italia AND max_occupancy >= 2.
Inner hits: sort by price in ascending order and return the first result.
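One caveat: the price is stored as a string such as "€ 125", so sorting on it is lexicographic, and "€ 99" would sort after "€ 108". Storing a numeric price alongside it would avoid that. As a cross-check of what the query is meant to return, here is a client-side sketch of the same selection that parses the number out of the string (price_value and cheapest_record are hypothetical helpers, and the records are trimmed to three fields for brevity):

```python
import re


def price_value(price):
    """Parse '€ 125' into 125.0; assumes exactly one plain number in the string."""
    return float(re.search(r"\d+(?:\.\d+)?", price).group())


def cheapest_record(doc, min_occupancy=2):
    """Pick the cheapest record with max_occupancy >= min_occupancy, or None."""
    candidates = [r for r in doc["records"]
                  if r["max_occupancy"] >= min_occupancy]
    if not candidates:
        return None
    best = min(candidates, key=lambda r: price_value(r["price"]))
    # Merge the general fields with the chosen record.
    return {"last_updated": doc["last_updated"], "country": doc["country"], **best}


doc = {
    "last_updated": "2017-10-25T18:33:51.434706",
    "country": "Italia",
    "records": [
        {"price": "€ 139", "max_occupancy": 2, "type": "Type 1"},
        {"price": "€ 125", "max_occupancy": 2, "type": "Type 1 - (Tag)"},
        {"price": "€ 120", "max_occupancy": 1, "type": "Type 2"},
        {"price": "€ 108", "max_occupancy": 1, "type": "Type 2 (Tag)"},
    ],
}

result = cheapest_record(doc)
```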
Hope you'll find it useful

Elasticsearch function_score decay not working, always returns 1

I've been trying to fix this for hours, but nothing seems to change the return value of the function_score decay function. It's simply 1 at all times. It looks like it can't read the integer value of the field I'm specifying.
The data model looks like this (obviously fake):
{
"basics": {
"name": "Mr Augustus Flybynight (Jim)",
"name_pref": "Jim",
"location": {
"city": "Melbourne",
"postalCode": "3040",
"meta": {
"country": "Australia"
},
"region": "VIC",
"address": "iytiytiyt, tyiuyti"
},
"email": "augustus.flybynight2#gmail.com",
"applicantNumber": "11882",
"name_first": "Augustus",
"meta": {
"alternateContact": "",
"lastModified": 1473353751,
"alternateName": "",
"notificationType": "-1",
"alternatePhones": [
],
"gender": "M"
},
"name_last": "Flybynight",
"phone": "44556677"
}
}
I have 3 duplicates of this entity whose only difference is their timestamp (basics.meta.lastModified). I'm trying to create a 'closer is better' function score so that the latest comes to the top. We haven't mapped the timestamp as a date yet, but it is mapped as an integer.
When trying to query with the following
{
"query": {
"function_score": {
"functions": [
{
"gauss": {
"basics.meta.lastModified": {
"origin": 1474868635, // now
"offset": 86400, // one day
"scale": 604800, // seven days
"decay": 0.5
}
},
"weight": 2
}
],
"query": {
"bool": {
"should": [
{
"match": {
"_all": "augustus flybynight"
}
},
{
"match": {
"basics.all_names.all_names_identifier_whitespace": {
"query": "augustus flybynight",
"boost": 2
}
}
},
{
"match": {
"basics.email.email_identifier_keyword": {
"query": "augustus flybynight",
"boost": 3
}
}
},
{
"match": {
"basics.applicantNumber.applicantNumber_identifier_keyword": {
"query": "augustus flybynight",
"boost": 3
}
}
},
{
"wildcard": {
"basics.email.email_identifier_keyword": {
"wildcard": "augustus flybynight*",
"boost": 2
}
}
},
{
"wildcard": {
"basics.all_names.all_names_identifier_whitespace": {
"wildcard": "augustus flybynight*"
}
}
}
],
"must": []
}
}
}
},
"size": 25,
"from": 0,
"min_score": 0.2
}
But this always returns '1' for the function score, which is then multiplied with the query score and so doesn't affect it. It's the weirdest thing.
When looking at the explanation, this is what's returned:
{
"value": 1,
"description": "min of:",
"details": [
{
"value": 1,
"description": "product of:",
"details": [
{
"value": 1,
"description": "Function for field basics.meta.lastModified:",
"details": [
{
"value": 1,
"description": "max(0.0, ((2.0 - MIN[0.0])/2.0)",
"details": [
]
}
]
},
{
"value": 1,
"description": "weight",
"details": [
]
}
]
},
{
"value": 3.4028235e+38,
"description": "maxBoost",
"details": [
]
}
]
}
It seems like 'MIN[0.0]' is the part that should be returning the timestamp, but it isn't; it returns 0 instead, which makes the decay function always 1. If I make the decay parameters stricter, like origin: 0, offset: 0, scale: 1 and decay: 0.5, I'd expect the function score to be close to 0, but it's still 1.
Please help. I've been trying everything and there don't seem to be many examples online. Any suggestions would be welcome.
For those hitting the same issue, I've finally found the culprit.
It seems that someone didn't set up the mappings properly: the basics.meta property was mapped as a nested type, but the data wasn't populated as such (you'd think that would have caused an issue when indexing data?). When the decay function tried to access the field inside it, it always got MIN[0.0] because it simply couldn't find the value of the property.
So yeah, if you ever hit this issue, thoroughly look through your mappings instead of wasting a whole day like I did :|
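To sanity-check what the function should return once the field is readable, the gauss decay can be computed by hand. This is a sketch based on my reading of the documented formula (the distance beyond the offset is fed into a Gaussian whose variance is derived from scale and decay); gauss_decay is a hypothetical helper, not an Elasticsearch API.

```python
import math


def gauss_decay(value, origin, scale, offset=0.0, decay=0.5):
    """Gauss decay: 1.0 within `offset` of origin, `decay` at offset + scale."""
    distance = max(0.0, abs(value - origin) - offset)
    # Variance chosen so that the score equals `decay` at distance == scale.
    sigma_sq = -(scale ** 2) / (2.0 * math.log(decay))
    return math.exp(-(distance ** 2) / (2.0 * sigma_sq))


# Parameters from the query in the question.
ORIGIN, OFFSET, SCALE = 1474868635, 86400, 604800

at_origin = gauss_decay(ORIGIN, ORIGIN, SCALE, OFFSET)
at_scale = gauss_decay(ORIGIN - OFFSET - SCALE, ORIGIN, SCALE, OFFSET)
sample = gauss_decay(1473353751, ORIGIN, SCALE, OFFSET)  # lastModified from the sample
```

With these parameters the sample document's timestamp is about 17.5 days before the origin, so its decay value should be roughly 0.02 rather than 1. A constant 1 therefore points at the field value not being read at all.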

ES sort by existence of a specific value in a multi-value field

I need to sort first by the existence of "shop_1" in "available", then by "price" ascending.
{
"pID": 1,
"available": ["shop_1", "shop_3"],
"price": 100
}
{
"pID": 2,
"available": ["shop_2", "shop_4"],
"price": 50
}
{
"pID": 3,
"available": ["shop_1"],
"price": 200
}
{
"pID": 4,
"available": ["shop_4"],
"price": 10
}
So the result would be pID: 1, 3, 4, 2.
I believe something like the query below should work.
If shop_1 is available, we increase the score by 10, which is far higher than anything the reciprocal of the price field can contribute (at most 1 for prices of 1 or more).
Summing the two ensures that we get the order you are looking for.
{
"query": {
"function_score": {
"query": {
"match_all": {}
},
"boost_mode": "replace",
"score_mode": "sum",
"functions": [
{
"filter": {
"term": {
"available": "shop_1"
}
},
"weight": 10
},
{
"field_value_factor": {
"field": "price",
"modifier": "reciprocal"
}
}
]
}
}
}
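A quick way to convince yourself of the ordering is to replay the scoring on the four sample products. This sketch only imitates the query's arithmetic (weight 10 when shop_1 is present, plus 1/price, summed, replacing the query score); it assumes all prices are positive, since a price of 0 has no reciprocal.

```python
PRODUCTS = [
    {"pID": 1, "available": ["shop_1", "shop_3"], "price": 100},
    {"pID": 2, "available": ["shop_2", "shop_4"], "price": 50},
    {"pID": 3, "available": ["shop_1"], "price": 200},
    {"pID": 4, "available": ["shop_4"], "price": 10},
]


def score(product, shop="shop_1", weight=10.0):
    """Sum of the two functions: filter weight plus reciprocal of price."""
    total = 1.0 / product["price"]  # field_value_factor, modifier "reciprocal"
    if shop in product["available"]:
        total += weight  # the filtered function contributes its weight
    return total


# Elasticsearch returns hits by descending score by default.
ranked = sorted(PRODUCTS, key=score, reverse=True)
order = [p["pID"] for p in ranked]
```

The resulting order matches the expected 1, 3, 4, 2. The trick relies on the weight being larger than any possible 1/price, so very small prices (below 0.1 here) would break it.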
