Perform nested sort without inner_hits in ElasticSearch - sorting

I need some help querying records from Elasticsearch (1.7.3). Each record holds a list of the evaluations performed, and we display only the last evaluation done. The list looks like this:
evaluation: [
{
id: 2,
breaches: null
},
{
id: 6,
breaches: null
},
{
id: 7,
breaches: null
},
{
id: 15,
breaches: null
},
{
id: 18,
breaches: [
"rule_one",
"rule_two",
"rule_three"
]
},
{
id: 19,
breaches: [
"rule_one",
"rule_two",
"rule_three"
]
}
]
Now we need to query records on the basis of the latest evaluation performed, that is, to query only the last object of the evaluation array. We found that inner_hits supports sorting and limiting nested records, so we wrote a query that sorts by evaluation id in descending order and limits the size to 1, as shown below:
{
"query": {
"bool": {
"must": {
"nested": {
"path": "evaluation",
"query": {
"bool": {
"must": {
"term": {
"evaluation.breaches": "rule_one"
}
}
}
},
"inner_hits": {
"sort": {
"evaluation.id": {
"order": "desc"
}
},
"size": 1
}
}
}
}
}
}
Please find the mapping below:
evaluation: {
type: "nested",
properties: {
id: {
type: "long"
},
breaches: {
type: "string"
}
}
}
We tried sorting the records this way, but it did not work. Can you suggest another way to search on just the last object of the nested records?
Thanks.

Related

Elasticsearch - add normal field filter to nested field aggregation

I have document structure like below in ES:
{
customer_id: 1,
is_member: true,
purchases: [
{
pur_id: 1,
pur_channel_id: 1,
pur_amount: 100.00,
pur_date: '2021-08-01'
},
{
pur_id: 2,
pur_channel_id: 2,
pur_amount: 100.00,
pur_date: '2021-08-02'
}
]
},
{
customer_id: 2,
is_member: false,
purchases: [
{
pur_id: 3,
pur_channel_id: 1,
pur_amount: 200.00,
pur_date: '2021-07-01'
},
{
pur_id: 4,
pur_channel_id: 3,
pur_amount: 300.00,
pur_date: '2021-07-02'
}
]
}
I want to sum the purchase amounts per purchases.pur_channel_id and, within each channel bucket, add a second sum restricted to documents where is_member is false. I composed the following query:
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"purchases": {
"nested": {
"path": "purchases"
},
"aggs": {
"pur_channel_id": {
"terms": {
"field": "purchases.pur_channel_id",
"size": 10
},
"aggs": {
"none_member": {
"filter": {
"term": {
"is_member": false
}
},
"aggs": {
"none_member_amount": {
"sum": {
"field": "purchases.pur_amount"
}
}
}
},
"pur_channel_amount": {
"sum": {
"field": "purchases.pur_amount"
}
}
}
}
}
}
}
}
The query runs successfully, but I get 0 for every "none_member_amount". I wonder whether a normal (non-nested) field simply cannot be used inside a nested aggregation.
Please help! Thanks.
A nested aggregation runs at the level of the nested documents, so your filter is looking for the is_member field inside the purchases objects, where it does not exist. To join back to the parent document you need to use a reverse_nested aggregation, or you can move the is_member check ahead of the nested aggregation by using a filter aggregation.
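The second option (filter first, then go nested) can be sketched as follows. This is a minimal sketch, not a tested query: the function name `channel_sums_body` and the aggregation names `all_customers` and `non_members` are made up for illustration, and the field names are taken from the question. It only builds the request body as a Python dict.

```python
import json


def channel_sums_body(nested_path="purchases"):
    """Build a search body that sums purchase amounts per channel,
    once for all customers and once for non-members only."""

    def per_channel(sum_name):
        # nested scope -> one bucket per channel id -> sum of amounts
        return {
            "nested": {"path": nested_path},
            "aggs": {
                "pur_channel_id": {
                    "terms": {
                        "field": f"{nested_path}.pur_channel_id",
                        "size": 10,
                    },
                    "aggs": {
                        sum_name: {"sum": {"field": f"{nested_path}.pur_amount"}}
                    },
                }
            },
        }

    return {
        "size": 0,
        "query": {"match_all": {}},
        "aggs": {
            # Unfiltered totals per channel.
            "all_customers": per_channel("pur_channel_amount"),
            # is_member lives on the parent document, so the filter is
            # applied here, before entering the nested scope.
            "non_members": {
                "filter": {"term": {"is_member": False}},
                "aggs": {"purchases": per_channel("none_member_amount")},
            },
        },
    }


print(json.dumps(channel_sums_body(), indent=2))
```

The trade-off of this layout is that the two sums come back in two separate bucket lists, so the client has to match them up by channel id.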

ElasticSearch 6.8 doesn't order by exact matches first

I've been searching for a solution to this issue for some days and couldn't make it work. I followed steps like this and this, but with no success.
So basically, I have the following data on ElasticSearch:
{ title: "Black Dust" },
{ title: "Dust In The Wind" },
{ title: "Gold Dust Woman" },
{ title: "Another One Bites The Dust" }
and the problem is that I want to search for the word "Dust" and have the results ordered like this:
{ title: "Dust In The Wind" },
{ title: "Black Dust" },
{ title: "Gold Dust Woman" },
{ title: "Another One Bites The Dust" }
where the title that starts with "Dust" appears at the top of the results.
Posting the mappings and the query is probably more useful than explaining the issue any further.
settings: {
analysis: {
normalizer: {
lowercase: {
type: 'custom',
filter: ['lowercase']
}
}
}
},
mappings: {
_doc: {
properties: {
title: {
type: 'text',
analyzer: 'standard',
fields: {
raw: {
type: 'keyword',
normalizer: 'lowercase'
},
fuzzy: {
type: 'text',
},
},
}
}
}
}
and my query is:
"query": {
"bool": {
"must": {
"query_string": {
"fields": [
"title"
],
"default_operator": "AND",
"query": "dust"
}
},
"should": {
"prefix": {
"title.raw": "dust"
}
}
}
}
Can anyone please help me in this?
Thank you!
SOLUTION!
I figured it out and solved it with the following query:
"query": {
"bool": {
"must": {
"bool": {
"should": [
{
"prefix": {
"title.raw": {
"value": "dust",
"boost": 1000000
}
}
},
{
"match": {
"title": {
"query": "dust",
"boost": 50000
}
}
},
{
"match": {
"title": {
"query": "dust",
"boost": 10,
"fuzziness": 1
}
}
}
]
}
}
}
}
However, while writing tests, I found a small issue.
I'm generating a random uuid and adding the following to the database:
{ title: `${uuid} A` }
{ title: `${uuid} W` }
{ title: `${uuid} Z` }
{ title: `A ${uuid}` }
{ title: `z ${uuid}` }
{ title: `Z ${uuid}` }
When I perform the query above looking for the uuid, I get:
uuid Z
uuid A
uuid W
Z uuid
I achieved my first goal, which was having the titles that start with the uuid in the first positions, but why is Z before A (first and second result)?
When everything else fails you can use a trivial substring position sort like so:
{
"query": {
"bool": {
"must": {
...
},
"should": {
...
}
}
},
"sort": [
{
"_script": {
"script": "return doc['title.raw'].value.indexOf('dust')",
"type": "number",
"order": "asc" <--
}
}
]
}
I've set the order to asc because the lower the substring index, the higher the 'score'.
EDIT
We've gotta account for index == -1 so replace the script above with:
"script": "def pos = doc['title.raw'].value.indexOf('dust'); return pos == -1 ? Integer.MAX_VALUE : pos"
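To see what ordering this sort key produces, here is a small Python sketch that mirrors the script outside Elasticsearch. It is only an illustration: `substring_position` is a hypothetical helper, the lowercasing stands in for the lowercase normalizer on title.raw, and the "No Match Here" title is added just to exercise the -1 case.

```python
import sys


def substring_position(title, term):
    """Mirror of the sort script: position of term in the lowercased
    title, or a very large number when the term is absent."""
    pos = title.lower().find(term.lower())
    return sys.maxsize if pos == -1 else pos


titles = [
    "Black Dust",
    "Dust In The Wind",
    "Gold Dust Woman",
    "Another One Bites The Dust",
    "No Match Here",
]

# Ascending order: the earlier the term appears, the higher the rank.
ranked = sorted(titles, key=lambda t: substring_position(t, "dust"))
for title in ranked:
    print(title)
```

Note that a pure position sort puts "Gold Dust Woman" (position 5) ahead of "Black Dust" (position 6), which differs slightly from the order asked for in the question.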

elasticsearch nested query returns only last 3 results

We have the following elasticsearch mapping
{
index: 'data',
body: {
settings: {
analysis: {
analyzer: {
lowerCase: {
tokenizer: 'whitespace',
filter: ['lowercase']
}
}
}
},
mappings: {
// used for _all field
_default_: {
index_analyzer: 'lowerCase'
},
entry: {
properties: {
id: { type: 'string', analyzer: 'lowerCase' },
type: { type: 'string', analyzer: 'lowerCase' },
name: { type: 'string', analyzer: 'lowerCase' },
blobIds: {
type: 'nested',
properties: {
id: { type: 'string' },
filename: { type: 'string', analyzer: 'lowerCase' }
}
}
}
}
}
}
}
and a sample document that looks like the following:
{
"id":"5f02e9dae252732912749e13",
"type":"test_type",
"name":"test_name",
"creationTimestamp":"2020-07-06T09:07:38.775Z",
"blobIds":[
{
"id":"5f02e9dbe252732912749e18",
"filename":"test1.csv"
},
{
"id":"5f02e9dbe252732912749e1c",
"filename":"test2.txt"
},
// removed in-between elements for simplicity
{
"id":"5f02e9dbe252732912749e1e",
"filename":"test3.csv"
},
{
"id":"5f02e9dbe252732912749e58",
"filename":"test4.txt"
},
{
"id":"5f02e9dbe252732912749e5a",
"filename":"test5.csv"
},
{
"id":"5f02e9dbe252732912749e5d",
"filename":"test6.txt"
}
]
}
I have the following ES query, which selects documents in a certain time range based on the creationTimestamp field and then filters the nested field blobIds with a user query that should match the blobIds.filename field.
{
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"range": {
"creationTimestamp": {
"gte": "2020-07-01T09:07:38.775Z",
"lte": "2020-07-07T09:07:40.147Z"
}
}
},
{
"nested": {
"path": [
"blobIds"
],
"query": {
"query_string": {
"fields": [
"blobIds.filename"
],
"query": "*"
}
},
// returns the actual blobId hit
// and not the whole array
"inner_hits": {}
}
},
{
"query": {
"query_string": {
"query": "+type:*test_type* +name:*test_name*"
}
}
}
]
}
}
}
},
"sort": [
{
"creationTimestamp": {
"order": "asc"
},
"id": {
"order": "asc"
}
}
]
}
The above entry is clearly matching the query. However, it seems like there is something wrong with the returned inner_hits, since I always get only the last 3 blobIds elements instead of the whole array that contains 24 elements, as can be seen below.
{
"name": "test_name",
"creationTimestamp": "2020-07-06T09:07:38.775Z",
"id": "5f02e9dae252732912749e13",
"type": "test_type",
"blobIds": [
{
"id": "5f02e9dbe252732912749e5d",
"filename": "test4.txt"
},
{
"id": "5f02e9dbe252732912749e5a",
"filename": "test5.csv"
},
{
"id": "5f02e9dbe252732912749e58",
"filename": "test6.txt"
}
]
}
I find it very strange since I'm only doing a simple * query.
I am using Elasticsearch v1.7 and cannot upgrade at the moment.
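One likely explanation is that inner_hits returns only three hits per parent document unless a size is given. The sketch below builds the nested clause from the query above with an explicit size. It is a minimal sketch under that assumption: `nested_filename_filter` and its `max_inner_hits` parameter are hypothetical names, and the value 100 is arbitrary.

```python
import json


def nested_filename_filter(user_query, max_inner_hits=100):
    """Build the nested clause on blobIds with an explicit inner_hits size.

    Without "size", inner_hits falls back to its default of 3 hits,
    which would match the three blobIds seen in the response.
    """
    return {
        "nested": {
            "path": "blobIds",
            "query": {
                "query_string": {
                    "fields": ["blobIds.filename"],
                    "query": user_query,
                }
            },
            # Ask for more than the default number of inner hits.
            "inner_hits": {"size": max_inner_hits},
        }
    }


print(json.dumps(nested_filename_filter("*", max_inner_hits=24), indent=2))
```

The resulting dict would replace the existing nested entry inside the bool must list; the rest of the query stays as it is.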

Full-text search through complex structure Elasticsearch

I have the following issue with full-text search in Elasticsearch. I would like to search across all indexed attributes. However, one of my Project attributes is a very complex array of hashes/objects:
[
{
"title": "Group 1 title",
"name": "Group 1 name",
"id": "group_1_id",
"items": [
{
"pos": "1",
"title": "Position 1 title"
},
{
"pos": "1.1",
"title": "Position 1.1 title",
"description": "<p>description</p>",
"extra_description": {
"rotation": "2 years",
"amount": "1.947m²"
},
"inputs": {
"unit_price": true,
"total_net": true
},
"additional_inputs": [
{
"name": "additonal_input_name",
"label": "Additional input label:",
"placeholder": "Additional input placeholder",
"description": "Additional input description",
"type": "text"
}
]
}
]
}
]
My mappings look like this:
{:title=>{:type=>"text", :analyzer=>"english"},
:description=>{:type=>"text", :analyzer=>"english"},
:location=>{:type=>"keyword"},
:company=>{:type=>"keyword"},
:created_at=>{:type=>"date"},
:due_date=>{:type=>"date"},
:specification=>
{:type=>:nested,
:properties=>
{:id=>{:type=>"keyword"},
:title=>{:type=>"text"},
:items=>
{:type=>:nested,
:properties=>
{:pos=>{:type=>"keyword"},
:title=>{:type=>"text"},
:description=>{:type=>"text", :analyzer=>"english"},
:extra_description=>{:type=>:nested, :properties=>{:rotation=>{:type=>"keyword"}, :amount=>{:type=>"keyword"}}},
:additional_inputs=>
{:type=>:nested,
:properties=>
{:label=>{:type=>"keyword"},
:placeholder=>{:type=>"text"},
:description=>{:type=>"text"},
:type=>{:type=>"keyword"},
:name=>{:type=>"keyword"}
}
}
}
}
}
}
}
The question is how to search through it properly. For non-nested attributes it works like a charm, but when, for instance, I search by title in the specification, no result is returned. I tried both:
query:
{ nested:
{
multi_match: {
query: keyword,
fields: ['title', 'description', 'company', 'location', 'specification']
}
}
}
Or
{
nested: {
path: 'specification',
query: {
multi_match: {
query: keyword
}
}
}
}
Without any result.
Edit:
It's with elasticsearch-ruby for Ruby.
I am trying to query by: MODEL_NAME.all.search(query: with_specification("Group 1 title")) where with_specification is:
def with_specification(keyword)
{
bool: {
should: [
{
nested: {
path: 'specification',
query: {
bool: {
should: [
{
match: {
'specification.title': keyword,
}
},
{
multi_match: {
query: keyword,
fields: [
'specification.title',
'specification.id'
]
}
},
{
nested: {
path: 'specification.items',
query: {
match: {
'specification.items.title': keyword,
}
}
}
}
]
}
}
}
}
]
}
}
end
Querying on multi-level nested documents must follow a certain schema.
You cannot multi-match on nested & non-nested fields at the same time and/or query on nested fields under different paths.
You can wrap your queries in a bool-should but keep the 2 rules above in mind:
GET your_index/_search
{
"query": {
"bool": {
"should": [
{
"nested": {
"path": "specification",
"query": {
"bool": {
"should": [
{
"match": {
"specification.title": "TEXT" <-- standalone match
}
},
{
"multi_match": { <-- multi-match but 1st level path
"query": "TEXT",
"fields": [
"specification.title",
"specification.id"
]
}
},
{
"nested": {
"path": "specification.items", <-- 2nd level path
"query": {
"match": {
"specification.items.title": "TEXT"
}
}
}
}
]
}
}
}
}
]
}
}
}
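The same structure can be generated instead of written by hand. Below is a minimal sketch that wraps a leaf query in one nested clause per path level, following the layout of the query above. The helper names `nested_chain` and `specification_query` are made up for illustration; only the field names come from the mapping in the question.

```python
import json


def nested_chain(path, leaf_query):
    """Wrap leaf_query in one nested clause per path level, outermost
    first, e.g. "a.b" gives nested(a) -> nested(a.b) -> leaf_query."""
    parts = path.split(".")
    query = leaf_query
    for depth in range(len(parts), 0, -1):
        query = {"nested": {"path": ".".join(parts[:depth]), "query": query}}
    return query


def specification_query(text):
    """Search first-level and second-level nested titles separately,
    so that no single clause mixes fields from different paths."""
    return {
        "bool": {
            "should": [
                nested_chain(
                    "specification",
                    {"match": {"specification.title": text}},
                ),
                nested_chain(
                    "specification.items",
                    {"match": {"specification.items.title": text}},
                ),
            ]
        }
    }


print(json.dumps({"query": specification_query("Group 1 title")}, indent=2))
```

Each should clause stays within a single nested path, which is the constraint described above.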

Filtering facets results in nested element with ElasticSearch

I have this mapping:
products: {
product: {
properties: {
id: {
type: "long"
},
name: {
type: "string"
},
tags: {
dynamic: "true",
properties: {
tagId: {
type: "long"
},
tagType: {
type: "long"
}
}
}
}
}
}
I want to create a facet on tag ids, but with tag-type filtering.
I need the filter to only apply on the facet and not the query results.
So here's my request:
{
"from": 0,
"size": 10,
"facets": {
"tags": {
"terms": {
"field": "tags.tagId",
"size": 10
},
"facet_filter": {
"terms": {
"tags.tagType": [
"11",
"19"
]
}
}
}
},
"query": {
"match_all": {}
}
}
The facet filtering does not seem to affect the faceting.
Any ideas?
The filter is applied to the documents, that is, to the parent entity in your example. That means you are filtering the documents on which the facet is built by tags.tagType. Therefore every document that has one of the given tags.tagType values is used to build the facet, with all of its tags, which is not what you want.
This is the use case for nested documents. You can have a look at this nice article too.
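As a rough illustration of the nested approach, the sketch below shows a mapping with tags declared as nested and a request that filters on tagType inside the nested scope. It rests on two assumptions that are not in the original: that the index can be recreated with the new mapping, and that the cluster supports aggregations (the successor to facets). The names `tags_mapping`, `tag_counts_body`, `wanted_types` and `tag_ids` are hypothetical.

```python
import json

# Assumed remapping: tags becomes a nested type so that each tag is
# indexed as its own hidden document.
tags_mapping = {
    "tags": {
        "type": "nested",
        "properties": {
            "tagId": {"type": "long"},
            "tagType": {"type": "long"},
        },
    }
}


def tag_counts_body(tag_types, size=10):
    """Count tag ids, restricted to the given tag types, without
    filtering the search hits themselves."""
    return {
        "from": 0,
        "size": 10,
        "query": {"match_all": {}},
        "aggs": {
            "tags": {
                "nested": {"path": "tags"},
                "aggs": {
                    "wanted_types": {
                        # Evaluated per nested tag, not per product.
                        "filter": {"terms": {"tags.tagType": tag_types}},
                        "aggs": {
                            "tag_ids": {
                                "terms": {"field": "tags.tagId", "size": size}
                            }
                        },
                    }
                },
            }
        },
    }


print(json.dumps(tag_counts_body([11, 19]), indent=2))
```

Because the filter sits inside the aggregation, the query results are left untouched, which matches the requirement stated in the question.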

Resources