Elasticsearch matrix_stats on both nested and non-nested values - elasticsearch

We have a doc structure like this:
[
  {
    score: 10,
    list: [ // nested type
      {
        id: 3,
        value: 10
      },
      {
        id: 4,
        value: 20
      },
      {
        id: 5,
        value: 15
      }
    ]
  },
  {
    score: 1,
    list: [
      {
        id: 3,
        value: 4
      },
      {
        id: 4,
        value: 3
      },
      {
        id: 5,
        value: 2
      }
    ]
  },
  ...
]
We're trying to run a matrix_stats aggregation on the field "score" and the nested field "value", considering only specific ids like 4.
We couldn't find a way to run matrix_stats on both the nested field (list.value) and the non-nested field (score) together.
For example, in the 1st doc the score is 10 and the value of the nested list entry matching id = 4 is 20, and likewise for the other docs.
Are there any possible ways to get the value via scripts, runtime mappings, etc.?
Or is there any way we can access a parent field inside a nested aggregation for the matrix_stats aggregation?
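One direction that might be worth exploring (untested; it assumes score lives on the root document and list is the nested field, and the index name my-index and runtime-field name value_for_id_4 are illustrative): define a root-level runtime field in the search request that pulls the matching nested value out of _source, then hand both fields to matrix_stats:
GET /my-index/_search
{
  "size": 0,
  "runtime_mappings": {
    "value_for_id_4": {
      "type": "double",
      "script": {
        "source": "for (def item : params._source.list) { if (item.id == 4) { emit(item.value); } }"
      }
    }
  },
  "aggs": {
    "score_vs_value": {
      "matrix_stats": {
        "fields": [ "score", "value_for_id_4" ]
      }
    }
  }
}
Because the runtime field is emitted on the parent document, matrix_stats never has to cross the nested boundary; the trade-off is that _source-based runtime scripts are evaluated per document and can be slow on large indices.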

Related

Sorting on nested object in elastic search, failed to find nested object under path

I have the following 2 documents indexed.
{
  region: 'US',
  manager: {
    age: 30,
    name: {
      first: 'John',
      last: 'Smith',
    },
  },
},
{
  region: 'US',
  manager: {
    age: 30,
    name: {
      first: 'John',
      last: 'Cena',
    },
  },
}
I am trying to search and sort them by their last name. I have tried the following query.
{
  sort: [
    {
      'manager.name.first': {
        order: 'desc',
        nested: {
          path: 'manager.name.first',
        },
      },
    },
  ],
  query: {
    match: {
      'manager.name.first': 'John',
    },
  },
}
I am getting the following error in the response. What am I doing wrong here? (I am very new to Elasticsearch, so apologies if this is a very basic thing I am not aware of.)
ResponseError: search_phase_execution_exception: [query_shard_exception] Reason: [nested] failed to find nested object under path [manager.name.first]
I also tried path: 'manager.name', but that also didn't work.
You need to use only manager as the nested path, as that is the only field defined as nested type.
{
  "sort": [
    {
      "manager.name.first.keyword": {
        "order": "desc",
        "nested": {
          "path": "manager"
        }
      }
    }
  ]
}
Use manager.name.first as the field name if it is defined as keyword type; otherwise use manager.name.first.keyword if it is defined as a multi-field with both text and keyword.
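For context, a minimal mapping sketch under the assumptions above (the index name my-index is illustrative): manager is the only field mapped as nested, and name.first is a text field with a keyword sub-field, which is why the nested path is manager and the sort field is manager.name.first.keyword:
PUT /my-index
{
  "mappings": {
    "properties": {
      "region": { "type": "keyword" },
      "manager": {
        "type": "nested",
        "properties": {
          "age": { "type": "integer" },
          "name": {
            "properties": {
              "first": { "type": "text", "fields": { "keyword": { "type": "keyword" } } },
              "last": { "type": "text", "fields": { "keyword": { "type": "keyword" } } }
            }
          }
        }
      }
    }
  }
}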

ElasticSearch filter by value in array

I have a nested array of objects.
How can I filter by a specific value in the array? I want to filter by reg = region1 and val greater than 2.
{
  "region": [
    {
      "reg": "region1",
      "val": 2
    },
    {
      "reg": "region2",
      "val": 6
    }
  ]
}
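A sketch of one way this filter could be expressed, assuming region is mapped as a nested field and reg as a keyword (the index name my-index is illustrative):
GET /my-index/_search
{
  "query": {
    "nested": {
      "path": "region",
      "query": {
        "bool": {
          "filter": [
            { "term": { "region.reg": "region1" } },
            { "range": { "region.val": { "gt": 2 } } }
          ]
        }
      }
    }
  }
}
Note that the sample document above would not match, since its region1 entry has val = 2, which is not greater than 2; the nested query only matches when a single nested object satisfies both conditions.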

Adding additional fields to ElasticSearch terms aggregation

Indexed documents are like:
{
id: 1,
title: 'Blah',
...
platform: {id: 84, url: 'http://facebook.com', title: 'Facebook'}
...
}
What I want is to count documents and output stats by platform.
For counting, I can use a terms aggregation with platform.id as the field to count:
aggs: {
platforms: {
terms: {field: 'platform.id'}
}
}
This way I receive stats as multiple buckets looking like {key: 8, doc_count: 162511}, as expected.
Now, can I somehow also add platform.name and platform.url to those buckets (for pretty output of the stats)? The best I've come up with looks like:
aggs: {
platforms: {
terms: {field: 'platform.id'},
aggs: {
name: {terms: {field: 'platform.name'}},
url: {terms: {field: 'platform.url'}}
}
}
}
Which, in fact, works, and returns a pretty complicated structure in each bucket:
{key: 7,
doc_count: 528568,
url:
{doc_count_error_upper_bound: 0,
sum_other_doc_count: 0,
buckets: [{key: "http://facebook.com", doc_count: 528568}]},
name:
{doc_count_error_upper_bound: 0,
sum_other_doc_count: 0,
buckets: [{key: "Facebook", doc_count: 528568}]}},
Of course, the name and url of the platform could be extracted from this structure (like bucket.url.buckets.first.key), but is there a cleaner and simpler way to do the task?
It seems the best way to express the intention is a top_hits aggregation: "from each aggregated group select only one document", and then extract the platform from it:
aggs: {
  platforms: {
    terms: {field: 'platform.id'},
    aggs: {
      platform: {top_hits: {size: 1, _source: {include: ['platform']}}}
    }
  }
}
This way, each bucket will look like:
{"key": 7,
"doc_count": 529939,
"platform": {
"hits": {
"hits": [{
"_source": {
"platform":
{"id": 7, "name": "Facebook", "url": "http://facebook.com"}
}
}]
}
},
}
Which is kinda too deep (as usual with ES), but clean: bucket.platform.hits.hits.first._source.platform
If you don't necessarily need to get the value of platform.id, you could get away with a single aggregation instead using a script that concatenates the two fields name and url:
aggs: {
platforms: {
terms: {script: 'doc["platform.name"].value + "," + doc["platform.url"].value'}
}
}

How to create readable result of group function in RethinkDB without group and reduction field?

I have a query that uses group() function:
...group('a','b','c','na').count()
Right now the result is returned in the form of group and reduction pairs.
How can I get the result without the group and reduction fields, in the form of:
{
  "na": 1285,
  "c": 487,
  "b": 746,
  "a": 32
}
I'm not sure, but I think you're misunderstanding what group does.
The group command takes a property and groups documents by that property. So, for example, say you want to group documents by the a property and you have the following documents:
{ a: 1 },
{ a: 1 },
{ a: 1 },
{ a: 2 }
Then you would run the following query:
r.table(...).group('a').count().ungroup()
Which would result in:
[
{
"group": 1 ,
"reduction": 3
},
{
"group": 2 ,
"reduction": 1
}
]
By passing multiple arguments to group you are telling it to make distinct groups for all those properties. So say you have the following documents:
[
  { a: 1, b: 1 },
  { a: 1, b: 1 },
  { a: 1, b: 2 },
  { a: 2, b: 1 }
]
And you group them by a and b:
r.table(...).group('a', 'b').count().ungroup()
You will get the following result:
[{
"group": [ 1 , 1 ] ,
"reduction": 2
},
{
"group": [ 1 , 2 ] ,
"reduction": 1
},
{
"group": [ 2 , 1 ] ,
"reduction": 1
}]
So, when you do .group('a','b','c','na').count(), you're grouping them by those 4 properties. If you want the following result:
{
  "na": 1285,
  "c": 487,
  "b": 746,
  "a": 32
}
Then your documents should look something like this:
[{
property: 'a'
}, {
property: 'c'
}, {
property: 'na'
},
...
]
And then you would group them in the following way:
r.table(...).group('property').count().ungroup()

select distinct from elasticsearch

I have a collection of documents which belong to a few authors:
[
{ id: 1, author_id: 'mark', content: [...] },
{ id: 2, author_id: 'pierre', content: [...] },
{ id: 3, author_id: 'pierre', content: [...] },
{ id: 4, author_id: 'mark', content: [...] },
{ id: 5, author_id: 'william', content: [...] },
...
]
I'd like to retrieve and paginate a distinct selection of the best-matching documents based upon the author's id:
[
{ id: 1, author_id: 'mark', content: [...], _score: 100 },
{ id: 3, author_id: 'pierre', content: [...], _score: 90 },
{ id: 5, author_id: 'william', content: [...], _score: 80 },
...
]
Here's what I'm currently doing (pseudo-code):
unique_docs = res.results.to_a.uniq{ |doc| doc.author_id }
The problem is precisely the pagination: how do I select 20 "distinct" documents?
Some people point to terms facets, but I'm not actually building a tag cloud:
Distinct selection with CouchDB and elasticsearch
http://elasticsearch-users.115913.n3.nabble.com/Getting-Distinct-Values-td3830953.html
Thanks,
Adit
As ElasticSearch does not at present provide a group_by equivalent, here's my attempt to do it manually.
While the ES community is working for a direct solution to this problem (probably a plugin) here's a basic attempt which works for my needs.
Assumptions:
- I'm looking for relevant content.
- I've assumed that the first 300 docs are relevant, so I restrict my search to this selection, regardless of how many of these are from the same few authors.
- For my needs I didn't "really" need full pagination; a "show more" button updated through ajax was enough.
Drawbacks:
- Results are not precise: since we take 300 docs at a time, we don't know how many unique docs will come out (it could even be 300 docs from the same author!). You should check whether this fits your average number of docs per author and probably consider a limit.
- You need to do 2 queries (paying the remote-call cost twice), as sketched below: the first query asks for the 300 relevant docs with just these fields: id & author_id; the second query retrieves the full docs for the paginated ids.
Here's some ruby pseudo-code: https://gist.github.com/saxxi/6495116
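A rough sketch of the two requests in query form (field names taken from the question; the sizes, the index name my-index, and the placeholder relevance query are illustrative):
GET /my-index/_search
{
  "size": 300,
  "_source": [ "id", "author_id" ],
  "query": { ... your relevance query ... }
}
Deduplicate by author_id on the client, pick the ids for the page you want, then fetch the full documents:
GET /my-index/_search
{
  "query": {
    "ids": { "values": [ 1, 3, 5 ] }
  }
}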
The 'group_by' issue has since been updated; you can use this feature from Elasticsearch 1.3.0 (#6124).
If you search with the following query,
{
"aggs": {
"user_count": {
"terms": {
"field": "author_id",
"size": 0
}
}
}
}
you will get a result like this:
{
"took" : 123,
"timed_out" : false,
"_shards" : { ... },
"hits" : { ... },
"aggregations" : {
"user_count" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [ {
"key" : "mark",
"doc_count" : 87350
}, {
"key" : "pierre",
"doc_count" : 41809
}, {
"key" : "william",
"doc_count" : 24476
} ]
}
}
}
