I'm having difficulties with Elasticsearch.
Here is what I want to do:
Let's say a document in my index looks like this:
{
transacId: "qwerty",
amount: 150,
userId: "adsf",
client: "mobile",
goal: "purchase"
}
I want to build different types of statistics on this data, and Elasticsearch does it really fast. The problem I have is that in my system a user can add a new field to a transaction on demand. Let's say we have another document in the same index:
{
transacId: "qrerty",
amount: 200,
userId: "adsf",
client: "mobile",
goal: "purchase",
token_1: "game"
}
So now I want to group by token_1.
{
query: {
match: {userId: "asdf"}
},
aggs: {
token_1: {
terms: {field: "token_1"},
aggs: {sumAmt: {sum: {field: "amount"}}}
}
}
}
The problem here is that it will aggregate only documents that have the field token_1. I know there is the missing aggregation, and I can do something like this:
{
query: {
match: {userId: "asdf"}
},
aggs: {
token_1: {
missing: {field: "token_1"},
aggs: {sumAmt: {sum: {field: "amount"}}}
}
}
}
But in this case it will aggregate only documents without the field token_1; what I want is to aggregate both types of documents in one query. I tried this, but it also didn't work for me:
{
query: {
match: {userId: "asdf"}
},
aggs: {
token_1: {
missing: {field: "token_1"},
aggs: {sumAmt: {sum: {field: "amount"}}}
},
aggs: {
token_1: {
missing: {field: "token_1"},
aggs: {sumAmt: {sum: {field: "amount"}}}
}
}
}
}
I think maybe there is something like an OR operator for aggregations, but I couldn't find anything. Help me, please.
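For what it's worth, the terms aggregation itself has a `missing` parameter (available since Elasticsearch 2.0) that assigns documents lacking the field to a default bucket, so one query covers both kinds of documents. A minimal sketch of the request body, built as a plain Python dict with the field names from the question (the "N/A" bucket key is an arbitrary choice):

```python
# Build the aggregation request body as a plain dict.
# The `missing` parameter of the terms aggregation puts documents
# that lack token_1 into a bucket keyed "N/A", so documents with and
# without the field are both aggregated in a single query.
def build_token_agg(user_id, missing_key="N/A"):
    return {
        "query": {"match": {"userId": user_id}},
        "aggs": {
            "token_1": {
                "terms": {"field": "token_1", "missing": missing_key},
                "aggs": {"sumAmt": {"sum": {"field": "amount"}}},
            }
        },
    }

body = build_token_agg("asdf")
```

Documents without token_1 then show up as one extra bucket keyed "N/A", with their own sumAmt, alongside the real token_1 buckets.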
I have the following 2 documents indexed.
{
region: 'US',
manager: {
age: 30,
name: {
first: 'John',
last: 'Smith',
},
},
},
{
region: 'US',
manager: {
age: 30,
name: {
first: 'John',
last: 'Cena',
},
},
}
I am trying to search and sort them by their last name. I have tried the following query.
{
sort: [
{
'manager.name.first': {
order: 'desc',
nested: {
path: 'manager.name.first',
},
},
},
],
query: {
match: {
'manager.name.first': 'John',
},
},
},
I am getting the following error in the response. What am I doing wrong here? (I am very new to Elasticsearch, so apologies if this is a very basic thing I am not aware of.)
ResponseError: search_phase_execution_exception: [query_shard_exception] Reason: [nested] failed to find nested object under path [manager.name.first]
I also tried path: 'manager.name', but that didn't work either.
You need to use only manager as the nested path, since that is the only field defined with the nested type.
{
"sort": [
{
"manager.name.first.keyword": {
"order": "desc",
"nested": {
"path": "manager"
}
}
}
]
}
Use manager.name.first as the field name if it is defined as keyword type; otherwise use manager.name.first.keyword if it is defined as a multi-field with both text and keyword.
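To make the distinction concrete, here is a sketch of a mapping matching the situation the error message implies: only manager is nested, and first/last are text fields with a .keyword sub-field (the exact field types are assumptions, written as a plain Python dict):

```python
# Hypothetical mapping matching the answer above: `manager` is the
# only nested field, and `first`/`last` are text fields with a
# `.keyword` sub-field, so sorting uses manager.name.first.keyword
# with nested path "manager".
mapping = {
    "mappings": {
        "properties": {
            "region": {"type": "keyword"},
            "manager": {
                "type": "nested",
                "properties": {
                    "age": {"type": "integer"},
                    "name": {
                        "properties": {
                            "first": {
                                "type": "text",
                                "fields": {"keyword": {"type": "keyword"}},
                            },
                            "last": {
                                "type": "text",
                                "fields": {"keyword": {"type": "keyword"}},
                            },
                        }
                    },
                },
            },
        }
    }
}
```

The nested path in the sort must name a field mapped with "type": "nested" — here that is manager, not manager.name or manager.name.first.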
I'm currently trying to set up a search where a user can search for room bookings with a degree of fuzziness, i.e. when there are no exact matches, the user sees results that have availability around that date. In my document, a bed has a premises and multiple bookings, like so:
{
"bed": {
....
"premises": {
....
},
"bookings": [
{
date_from: "2020-01-02",
date_to: "2020-02-22"
},
....
]
}
}
I've attempted to add a function_score to my query as follows:
{
gauss: {
'bookings.start_time': {
origin: this.filterArgs.date_from,
scale: '10d',
offset: '2d',
},
'bookings.end_time': {
origin: this.filterArgs.date_to,
scale: '10d',
offset: '2d',
},
},
}
But it seems that this prioritises beds with bookings that match that date. Is there any way to do the inverse, i.e. prioritise beds with no bookings for a given date range?
Any help would be appreciated!
Edit: Here's my index mapping:
{
mappings: {
properties: {
premises: {
properties: {
location: {
type: 'geo_point',
},
},
},
bookings: {
properties: {
start_time: {
type: 'date',
},
end_time: {
type: 'date',
},
},
},
},
},
}
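One possible way to invert the effect, sketched here as a plain Python request-body dict (an assumption, not tested against the poster's index): instead of a decay function centred on bookings, boost beds that have no booking overlapping the requested window. Two intervals overlap exactly when start_time <= date_to and end_time >= date_from, so a must_not over that conjunction matches "no overlapping booking":

```python
# Sketch (assumption): boost beds whose bookings never overlap the
# requested window [date_from, date_to]. A stored booking overlaps
# the window iff start_time <= date_to AND end_time >= date_from,
# so must_not of that conjunction matches beds with no overlap.
def availability_boost_query(date_from, date_to, base_query=None):
    no_overlap = {
        "bool": {
            "must_not": [{
                "bool": {
                    "must": [
                        {"range": {"bookings.start_time": {"lte": date_to}}},
                        {"range": {"bookings.end_time": {"gte": date_from}}},
                    ]
                }
            }]
        }
    }
    return {
        "query": {
            "function_score": {
                "query": base_query or {"match_all": {}},
                "functions": [{"filter": no_overlap, "weight": 10}],
                "boost_mode": "sum",
            }
        }
    }
```

One caveat: with bookings mapped as a plain object (as in the mapping above) rather than nested, the two range clauses can match different bookings on the same bed; mapping bookings as nested and wrapping the inner bool in a nested query would make the overlap test per-booking.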
The following two queries should return the same output, but they don't.
I am trying to load links between users on a map; since the payload is too large, I need to split the loading, so I use this query to load only the links that are necessary.
As mentioned, the issue I am having is that these two queries return different results, which in my opinion they shouldn't. We are using GraphQL with Amplify inside a React application. The data is stored on AWS.
entry in db:
source: "b864749a-c4bf-4c93-93db-dfa868ffc31d"
target: "cf7f4036-2df2-47ee-a3d7-96b77fc7fd1c"
giving no result:
query ListLinks(
$nextToken: String
) {
listLinks(filter: {
or: [{
and: [{
source: { eq: "b864749a-c4bf-4c93-93db-dfa868ffc31d" },
target: { eq: "cf7f4036-2df2-47ee-a3d7-96b77fc7fd1c" }
}],
and: [{
source: { eq: "cf7f4036-2df2-47ee-a3d7-96b77fc7fd1c" },
target: { eq: "b864749a-c4bf-4c93-93db-dfa868ffc31d" }
}]
}]
}, limit: 999, nextToken: $nextToken) {
items {
id
source
target
relation
verified
talentMap {
id
createdAt
updatedAt
}
createdAt
updatedAt
}
nextToken
}
}
giving result:
query ListLinks(
$nextToken: String
) {
listLinks(filter: {
or: [{
and: [{
target: { eq: "b864749a-c4bf-4c93-93db-dfa868ffc31d" },
source: { eq: "cf7f4036-2df2-47ee-a3d7-96b77fc7fd1c" }
}],
and: [{
target: { eq: "cf7f4036-2df2-47ee-a3d7-96b77fc7fd1c" },
source: { eq: "b864749a-c4bf-4c93-93db-dfa868ffc31d" }
}]
}]
}, limit: 999, nextToken: $nextToken) {
items {
id
source
target
relation
verified
talentMap {
id
createdAt
updatedAt
}
createdAt
updatedAt
}
nextToken
}
}
Any idea why this is the case?
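One thing worth checking, offered as a hypothesis: in both filters the or array holds a single object containing the key and twice. Duplicate fields in one input object are not valid GraphQL (the spec's Input Object Field Uniqueness rule), and servers that don't reject them typically keep only one of the two and branches, which would explain the asymmetric results. A filter with each and conjunction in its own object inside or would look like this (sketched as a plain Python dict; the IDs are the ones from the question):

```python
# Hypothetical corrected filter: each and-conjunction lives in its
# own object inside the or list, so neither branch shadows the other.
A = "b864749a-c4bf-4c93-93db-dfa868ffc31d"
B = "cf7f4036-2df2-47ee-a3d7-96b77fc7fd1c"

link_filter = {
    "or": [
        {"and": [{"source": {"eq": A}}, {"target": {"eq": B}}]},
        {"and": [{"source": {"eq": B}}, {"target": {"eq": A}}]},
    ]
}
```

With this shape, the filter reads as (source=A AND target=B) OR (source=B AND target=A), and swapping the field order inside each branch no longer changes the result.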
Indexed documents are like:
{
id: 1,
title: 'Blah',
...
platform: {id: 84, url: 'http://facebook.com', title: 'Facebook'}
...
}
What I want is to count and output stats by platform.
For counting, I can use a terms aggregation with platform.id as the field to count:
aggs: {
platforms: {
terms: {field: 'platform.id'}
}
}
This way I receive stats as multiple buckets looking like {key: 8, doc_count: 162511}, as expected.
Now, can I somehow also add platform.name and platform.url to those buckets (for pretty output of the stats)? The best I've come up with looks like:
aggs: {
platforms: {
terms: {field: 'platform.id'},
aggs: {
name: {terms: {field: 'platform.name'}},
url: {terms: {field: 'platform.url'}}
}
}
}
Which, in fact, works, and returns a pretty complicated structure in each bucket:
{key: 7,
doc_count: 528568,
url:
{doc_count_error_upper_bound: 0,
sum_other_doc_count: 0,
buckets: [{key: "http://facebook.com", doc_count: 528568}]},
name:
{doc_count_error_upper_bound: 0,
sum_other_doc_count: 0,
buckets: [{key: "Facebook", doc_count: 528568}]}},
Of course, the name and url of the platform can be extracted from this structure (like bucket.url.buckets.first.key), but is there a cleaner and simpler way to do the task?
It seems the best way to express the intent is a top_hits aggregation: "from each aggregated group select only one document", and then extract the platform from it:
aggs: {
platforms: {
terms: {field: 'platform.id'},
aggs: {
platform: {top_hits: {size: 1, _source: {include: ['platform']}}}
}
}
}
This way, each bucket will look like:
{"key": 7,
"doc_count": 529939,
"platform": {
"hits": {
"hits": [{
"_source": {
"platform":
{"id": 7, "name": "Facebook", "url": "http://facebook.com"}
}
}]
}
},
}
Which is kinda too deep (as usual with ES), but clean: bucket.platform.hits.hits.first._source.platform
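The digging can at least be wrapped in a tiny helper; a sketch in Python against the response shape shown above (the sample bucket is copied from the answer, not live data):

```python
# Sketch: pull (key, doc_count, platform) out of each top_hits bucket,
# following the path bucket.platform.hits.hits[0]._source.platform.
def platforms_from_buckets(buckets):
    return [
        (b["key"],
         b["doc_count"],
         b["platform"]["hits"]["hits"][0]["_source"]["platform"])
        for b in buckets
    ]

sample = [{
    "key": 7,
    "doc_count": 529939,
    "platform": {"hits": {"hits": [{
        "_source": {"platform": {"id": 7, "name": "Facebook",
                                 "url": "http://facebook.com"}}
    }]}},
}]
```

Calling platforms_from_buckets(sample) yields one (key, count, platform-dict) tuple per bucket, which is usually all the pretty-printing layer needs.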
If you don't necessarily need the value of platform.id, you could get away with a single aggregation instead, using a script that concatenates the two fields name and url:
aggs: {
platforms: {
terms: {script: 'doc["platform.name"].value + "," + doc["platform.url"].value'}
}
}
I am really new to the Elasticsearch world.
Let's say I have a nested aggregation on two fields, field1 and field2:
{
...
aggs: {
field1: {
terms: {
field: 'field1'
},
aggs: {
field2: {
terms: {
field: 'field2'
}
}
}
}
}
}
This piece of code works perfectly and gives me something like this:
aggregations: {
field1: {
buckets: [{
key: "foo",
doc_count: 123456,
field2: {
buckets: [{
key: "bar",
doc_count: 34323
},{
key: "baz",
doc_count: 10
},{
key: "foobar",
doc_count: 36785
},
...
]
},{
key: "fooOO",
doc_count: 423424,
field2: {
buckets: [{
key: "bar",
doc_count: 35
},{
key: "baz",
doc_count: 2435453
},
...
]
},
...
]
}
}
Now, my need is to exclude all aggregation results where doc_count is less than 1000, for instance, and get this instead:
aggregations: {
field1: {
buckets: [{
key: "foo",
doc_count: 123456,
field2: {
buckets: [{
key: "bar",
doc_count: 34323
},{
key: "foobar",
doc_count: 36785
},
...
]
},{
key: "fooOO",
doc_count: 423424,
field2: {
buckets: [{
key: "baz",
doc_count: 2435453
},
...
]
},
...
]
}
}
Is it possible to express this in the query body? Or do I have to perform the filtering in the calling layer (in JavaScript in my case)?
Thanks in advance
Next time, M'sieur Toph': RTFM!!!
I feel really dumb: I found the answer in the manual, 30 seconds after asking.
I won't remove my question because it can help someone, who knows...
Here is the answer:
You can specify the min_doc_count property in the terms aggregation.
It gives you:
{
...
aggs: {
field1: {
terms: {
field: 'field1',
min_doc_count: 1000
},
aggs: {
field2: {
terms: {
field: 'field2',
min_doc_count: 1000
}
}
}
}
}
}
You can also specify a different minimum count for each level of your aggregation.
What else? :)
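As a small illustration of that last point, here is the same request body built as a plain Python dict, with a different min_doc_count per level (the builder function is just for the sketch):

```python
# Sketch: nested terms aggregation with an independent min_doc_count
# threshold at each level, as described above.
def nested_terms(field1, field2, min1, min2):
    return {
        "aggs": {
            field1: {
                "terms": {"field": field1, "min_doc_count": min1},
                "aggs": {
                    field2: {
                        "terms": {"field": field2, "min_doc_count": min2}
                    }
                },
            }
        }
    }

body = nested_terms("field1", "field2", 1000, 500)
```

Here the outer buckets need at least 1000 documents while the inner ones need only 500, showing that the two thresholds are independent.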