I'm working on a Node.js + MongoDB project using Mongoose. I have come across a question I don't know the answer to.
I am using the aggregation framework to get grouped results. The grouping is done on a date field, excluding the time portion, like "2013 02 06". The code looks like this:
MyModel.aggregate([
  // keep only documents inside the date window
  {$match: {$and: [{created_at: {$gte: start_date}}, {created_at: {$lte: end_date}}]}},
  // one bucket per calendar day
  {$group: {
    _id: {
      year: {$year: "$created_at"},
      month: {$month: "$created_at"},
      day: {$dayOfMonth: "$created_at"}
    },
    count: {$sum: 1}
  }},
  {$project: {
    date: {
      year: "$_id.year",
      month: "$_id.month",
      day: "$_id.day"
    },
    count: 1,
    _id: 0
  }}
], callback);
The grouped results are perfect, except that they are not sorted. Here is an example of the output:
[
  {
    count: 1,
    date: {
      year: 2013,
      month: 2,
      day: 7
    }
  },
  {
    count: 1906,
    date: {
      year: 2013,
      month: 2,
      day: 4
    }
  },
  {
    count: 1580,
    date: {
      year: 2013,
      month: 2,
      day: 5
    }
  },
  {
    count: 640,
    date: {
      year: 2013,
      month: 2,
      day: 6
    }
  }
]
I know the sorting is done by adding {$sort: val}, but I'm not sure what val should be so that the results are sorted by date, since my grouping key is an object of three values constructing the date. Does anyone know how this could be accomplished?
EDIT
I have tried this and it worked :)
{$sort: {"date.year":1, "date.month":1, "date.day":1}}
It appears that this question has a very simple answer :) Just need to sort by multiple nested fields like this:
{$sort: {"date.year":1, "date.month":1, "date.day":1}}
I got stuck with the same problem; thanks for your answer.
But I found out that you can get the same result with less code:
MyModel.aggregate([
  {$match: {$and: [{created_at: {$gte: start_date}}, {created_at: {$lte: end_date}}]}},
  {$group: {
    _id: {
      year: {$year: "$created_at"},
      month: {$month: "$created_at"},
      day: {$dayOfMonth: "$created_at"}
    },
    count: {$sum: 1}
  }},
  {$project: {
    date: "$_id", // so this is the shorter way
    count: 1,
    _id: 0
  }},
  {$sort: {"date": 1}} // and this will sort based on your date
], callback);
This works if you are only sorting by date. If you had other columns to sort on, you would need to expand _id into its individual fields, as shown below.
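For example, a sketch (the count ordering is just an illustration, not from the answer):

{$sort: {count: -1, "date.year": 1, "date.month": 1, "date.day": 1}}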
I'm trying to build a grid with months as columns using WebDataRocks, and the problem is that the columns are sorted alphabetically (Apr 2020, Aug 2020, Dec 2020, ...). Is there an option to order the columns by date (Dec 2020, Nov 2020, Oct 2020, ...)?
An example is available here:
https://codesandbox.io/s/nifty-stonebraker-7mf56?file=/src/App.tsx
This is possible by adding an object to your data that defines the data types. Here is an explanation.
In your case, this object would look like this:
{
  "type": {
    "type": "string"
  },
  "value": {
    "type": "number"
  },
  "date": {
    "type": "date string"
  },
  "name": {
    "type": "string"
  }
}, {
  type: "CONTRACT",
  value: 217,
  date: "Dec 2020",
  name: "24"
}, {
  type: "CONTRACT",
  value: 725.84,
  date: "Dec 2020",
  name: "3 "
}, ...
After this, the columns should be ordered by date. Note that the input dates should be formatted properly (compliant with ISO 8601).
The way dates are shown inside WebDataRocks can be modified with the help of datePattern from options.
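For instance, a minimal sketch (the container id and the pattern value are illustrative, not from the answer):

new WebDataRocks({
  container: "#wdr-component", // assumed container element
  report: {
    dataSource: {data: data},
    options: {
      datePattern: "dd/MM/yyyy" // controls how "date string" values are rendered
    }
  }
});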
In Elastic I'd like to sort results by start_date ascending, but with past dates showing up after upcoming dates.
Example desired results:
[
  {id: 5, start_date: '3000-01-01'},
  {id: 7, start_date: '3001-01-01'},
  {id: 8, start_date: '3002-01-01'},
  {id: 1, start_date: '1990-01-01'},
  {id: 4, start_date: '1991-01-01'},
  {id: 3, start_date: '1992-01-01'}
]
Something like this would be possible in SQL:
ORDER BY (start_date > NOW()) DESC, start_date ASC
But I'm not sure how to accomplish this in Elastic. The only thing I can think of would be to set a boolean is_upcoming flag and reindex it every day.
Also, I could be limiting and paginating the number of search results, so fetching them in reverse start_date order and then manipulating the results in my code isn't really doable.
It's perfectly possible using a sort script if your start_date is of type date and its format is yyyy-MM-dd (I found YYYY-... to not work properly).
GET future/_search
{
  "sort": [
    {
      "_script": {
        "type": "number",
        "script": {
          "lang": "painless",
          "source": "return doc['start_date'].value.millis > params.now ? (doc['start_date'].value.millis - params.now) : Long.MAX_VALUE",
          "params": {
            "now": 1594637988236
          }
        },
        "order": "asc"
      }
    },
    {
      "start_date": {
        "order": "asc"
      }
    }
  ]
}
The parametrized now is needed for synchronization reasons as described here.
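For reference, a minimal sketch of issuing this query from Node.js with now computed once per request (the index name and client setup are assumptions, using the official @elastic/elasticsearch client with a 7.x-style body):

const {Client} = require('@elastic/elasticsearch');
const client = new Client({node: 'http://localhost:9200'});

const result = await client.search({
  index: 'future', // assumed index name from the example above
  body: {
    sort: [
      {
        _script: {
          type: 'number',
          script: {
            lang: 'painless',
            source: "return doc['start_date'].value.millis > params.now ? (doc['start_date'].value.millis - params.now) : Long.MAX_VALUE",
            params: {now: Date.now()} // one reference time, shared by all shards
          },
          order: 'asc'
        }
      },
      {start_date: {order: 'asc'}}
    ]
  }
});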
I have the following document:
{
  id: 222,
  email: "user@user.com",
  experiences: [
    {
      id: 3,
      position: "Programmer",
      description: "Programming things",
      init_date: "1990-01-01",
      end_date: "1999-05-11"
    },
    {
      id: 4,
      position: "Full Stack Developer",
      description: "Programming things",
      init_date: "1999-01-01",
      end_date: "2008-05-11"
    },
    {
      id: 7,
      position: "Gardener",
      description: "Taking care of flowers",
      init_date: "2009-01-01",
      end_date: "2015-05-11"
    }
  ]
}
So, I would like to apply the following filter: keyword: programming, experience years: > 3.
The experience years should be the sum of the experiences that match the keyword.
Is it possible to do this in only one query?
At indexing time itself, add one extra field for the experience instead of calculating it with a query. It will be faster and easier to query as well.
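For example, a minimal sketch of that enrichment step (the years and total_experience_years field names are my own, not from the answer):

const MS_PER_YEAR = 365.25 * 24 * 60 * 60 * 1000;

// Precompute per-experience and total durations before sending the
// document to Elasticsearch, so queries become plain range filters.
function enrich(doc) {
  let total = 0;
  for (const exp of doc.experiences) {
    exp.years = (new Date(exp.end_date) - new Date(exp.init_date)) / MS_PER_YEAR;
    total += exp.years;
  }
  doc.total_experience_years = total;
  return doc;
}

This moves the date arithmetic to indexing time; the keyword filter itself stays an ordinary match query.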
Indexed documents are like:
{
  id: 1,
  title: 'Blah',
  ...
  platform: {id: 84, url: 'http://facebook.com', title: 'Facebook'}
  ...
}
What I want is to count and output stats by platform.
For counting, I can use a terms aggregation with platform.id as the field to count:
aggs: {
  platforms: {
    terms: {field: 'platform.id'}
  }
}
This way I receive the stats as multiple buckets looking like {key: 8, doc_count: 162511}, as expected.
Now, can I somehow also add platform.name and platform.url to those buckets (for pretty output of the stats)? The best I've come up with looks like:
aggs: {
  platforms: {
    terms: {field: 'platform.id'},
    aggs: {
      name: {terms: {field: 'platform.name'}},
      url: {terms: {field: 'platform.url'}}
    }
  }
}
Which, in fact, works, and returns a pretty complicated structure in each bucket:
{
  key: 7,
  doc_count: 528568,
  url: {
    doc_count_error_upper_bound: 0,
    sum_other_doc_count: 0,
    buckets: [{key: "http://facebook.com", doc_count: 528568}]
  },
  name: {
    doc_count_error_upper_bound: 0,
    sum_other_doc_count: 0,
    buckets: [{key: "Facebook", doc_count: 528568}]
  }
}
Of course, the name and url of the platform could be extracted from this structure (like bucket.url.buckets.first.key), but is there a cleaner and simpler way to do the task?
It seems the best way to show intentions is the top_hits aggregation: "from each aggregated group select only one document", and then extract the platform from it:
aggs: {
  platforms: {
    terms: {field: 'platform.id'},
    aggs: {
      platform: {top_hits: {size: 1, _source: {include: ['platform']}}}
    }
  }
}
This way, each bucket will look like:
{
  "key": 7,
  "doc_count": 529939,
  "platform": {
    "hits": {
      "hits": [{
        "_source": {
          "platform": {"id": 7, "name": "Facebook", "url": "http://facebook.com"}
        }
      }]
    }
  }
}
Which is kinda too deep (as usual with ES), but clean: bucket.platform.hits.hits.first._source.platform
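In JavaScript, flattening those buckets could look like this (a sketch; response is assumed to be the parsed search response):

const platforms = response.aggregations.platforms.buckets.map(bucket => ({
  count: bucket.doc_count,
  ...bucket.platform.hits.hits[0]._source.platform // {id, name, url}
}));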
If you don't necessarily need to get the value of platform.id, you could get away with a single aggregation instead using a script that concatenates the two fields name and url:
aggs: {
  platforms: {
    terms: {script: 'doc["platform.name"].value + "," + doc["platform.url"].value'}
  }
}
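With this approach, the bucket key comes back as the concatenated string, so it would need splitting client-side (a sketch, assuming neither value contains a comma):

const stats = response.aggregations.platforms.buckets.map(bucket => {
  const [name, url] = bucket.key.split(','); // key looks like "Facebook,http://facebook.com"
  return {name, url, count: bucket.doc_count};
});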
I am really new to the Elasticsearch world.
Let's say I have a nested aggregation on two fields : field1 and field2 :
{
  ...
  aggs: {
    field1: {
      terms: {
        field: 'field1'
      },
      aggs: {
        field2: {
          terms: {
            field: 'field2'
          }
        }
      }
    }
  }
}
This piece of code works perfectly and gives me something like this :
aggregations: {
  field1: {
    buckets: [
      {
        key: "foo",
        doc_count: 123456,
        field2: {
          buckets: [
            {key: "bar", doc_count: 34323},
            {key: "baz", doc_count: 10},
            {key: "foobar", doc_count: 36785},
            ...
          ]
        }
      },
      {
        key: "fooOO",
        doc_count: 423424,
        field2: {
          buckets: [
            {key: "bar", doc_count: 35},
            {key: "baz", doc_count: 2435453},
            ...
          ]
        }
      },
      ...
    ]
  }
}
Now I need to exclude all aggregation results where doc_count is less than 1000, for instance, and get this instead:
aggregations: {
  field1: {
    buckets: [
      {
        key: "foo",
        doc_count: 123456,
        field2: {
          buckets: [
            {key: "bar", doc_count: 34323},
            {key: "foobar", doc_count: 36785},
            ...
          ]
        }
      },
      {
        key: "fooOO",
        doc_count: 423424,
        field2: {
          buckets: [
            {key: "baz", doc_count: 2435453},
            ...
          ]
        }
      },
      ...
    ]
  }
}
Is it possible to express this in the query body, or do I have to perform the filtering in the calling layer (in JavaScript, in my case)?
Thanks in advance
Next time, M'sieur Toph': RTFM!!!
I feel really dumb: I found the answer in the manual, 30 seconds after asking.
I won't remove my question, because it might help someone, who knows...
Here is the answer:
You can specify the min_doc_count property in the terms aggregation.
It gives you:
{
  ...
  aggs: {
    field1: {
      terms: {
        field: 'field1',
        min_doc_count: 1000
      },
      aggs: {
        field2: {
          terms: {
            field: 'field2',
            min_doc_count: 1000
          }
        }
      }
    }
  }
}
You can also specify a different minimum count for each level of your aggregation.
What else? :)