Elasticsearch - How to query top 3 "messages" of distinct "PostId"

Currently, I have a list of messages with the structure below:
PostId - Message type - Message - MessageDate
Ex:
post1 , like , "abc ...", 22 Sep 10:00
post1 , like , "def ...", 22 Sep 10:01
post1 , comment , "xyz ...", 22 Sep 10:05
...
post2 , like , "abc ...", 22 Sep 10:00
post3 , like , "def ...", 22 Sep 11:10
....
postn , comment , "xyz ...", 22 Sep 12:05
My question: how can we write a query to select the top 3 newest messages of each distinct post? With the sample data above, I would like to get:
Post1 and its messages
Post2 and its messages
Post3 and its messages
I am new to ES; please help me.

This can be achieved by combining a terms aggregation with an inner top_hits aggregation. First you bucket all your messages by their postid, and then for each bucket you fetch the top 3 messages sorted by descending timestamp.
GET so-posts/_search
{
  "size": 0,
  "aggs": {
    "by_post_id": {
      "terms": {
        "field": "postid.keyword",
        "size": 10
      },
      "aggs": {
        "latest_messages": {
          "top_hits": {
            "size": 3,
            "sort": [
              {
                "timestamp": {
                  "order": "desc"
                }
              }
            ]
          }
        }
      }
    }
  }
}
For more details, please see my answer to a very similar problem.
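To make the grouping logic concrete, here is a minimal plain-Python sketch of what the terms plus top_hits combination computes. It runs on an in-memory list rather than against Elasticsearch, and it ignores bucket ordering and the bucket size limit of the terms aggregation. The field names, the year in the timestamps, and the extra fourth message for post1 are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Rows from the question. The year 2020 is an arbitrary placeholder (the
# question gives none), and the last post1 row is hypothetical, added only so
# that the limit of 3 has something to cut off.
messages = [
    {"postid": "post1", "type": "like", "message": "abc ...", "timestamp": datetime(2020, 9, 22, 10, 0)},
    {"postid": "post1", "type": "like", "message": "def ...", "timestamp": datetime(2020, 9, 22, 10, 1)},
    {"postid": "post1", "type": "comment", "message": "xyz ...", "timestamp": datetime(2020, 9, 22, 10, 5)},
    {"postid": "post1", "type": "like", "message": "ghi ...", "timestamp": datetime(2020, 9, 22, 10, 7)},
    {"postid": "post2", "type": "like", "message": "abc ...", "timestamp": datetime(2020, 9, 22, 10, 0)},
    {"postid": "post3", "type": "like", "message": "def ...", "timestamp": datetime(2020, 9, 22, 11, 10)},
]


def top_messages_per_post(docs, per_post=3):
    """Group docs by postid, then keep the newest `per_post` docs in each group."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc["postid"]].append(doc)  # the "terms" bucket step
    return {
        post_id: sorted(items, key=lambda d: d["timestamp"], reverse=True)[:per_post]
        for post_id, items in groups.items()  # the "top_hits" step
    }


latest = top_messages_per_post(messages)
for post_id, items in latest.items():
    print(post_id, [m["message"] for m in items])
```

Each key of the returned dictionary plays the role of one bucket of by_post_id, and its list plays the role of the hits under latest_messages.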

Related

RethinkDB Query: How to pluck based on a date range

Given a database called database and a table called alerts, where each
document has an array of objects called history with a date attribute
called updateTime, how can I pluck based on a date range on that attribute?
with the following query,
r.db("database").table("alerts").pluck("history").limit(10000)
I get back something like the following
{
"history": [
{
"text": "text1" ,
"updateTime": Thu Jun 20 2019 01:29:47 GMT+00:00 ,
},
{
"text": "text2" ,
"updateTime": Thu Jun 20 2019 01:24:59 GMT+00:00 ,
},
]
}
{
"history": [
{
"text": "text3" ,
"updateTime": Thu Jun 20 2018 01:29:47 GMT+00:00 ,
},
{
"text": "text4" ,
"updateTime": Thu Jun 20 2018 01:24:59 GMT+00:00 ,
},
]
}
How can I pluck the sub-object called history and return only the history entries whose updateTime falls in a specific range,
for example between Jan 2, 2009 and Jan 3, 2009?
You need to filter on a time range and use pluck on a nested object. Here are some examples of how to do each from the official documentation:
r.table("users").filter(function (user) {
    return user("subscriptionDate").during(
        r.time(2012, 1, 1, 'Z'), r.time(2013, 1, 1, 'Z'));
}).run(conn, callback);
Source: https://www.rethinkdb.com/api/javascript/filter/
r.table('marvel').pluck({'abilities' : {'damage' : true, 'mana_cost' : true}, 'weapons' : true}).run(conn, callback)
Source: https://www.rethinkdb.com/api/javascript/pluck/
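As an illustration of the combined result the question asks for, here is a minimal plain-Python sketch that keeps only the history entries inside a date range. It is not ReQL and does not talk to RethinkDB; it only mirrors the logic on in-memory data. The range bounds are hypothetical, and the half-open interval (start included, end excluded) is chosen to match the default bounds of ReQL's during.

```python
from datetime import datetime, timezone

UTC = timezone.utc

# Documents shaped like the query output shown in the question.
alerts = [
    {"history": [
        {"text": "text1", "updateTime": datetime(2019, 6, 20, 1, 29, 47, tzinfo=UTC)},
        {"text": "text2", "updateTime": datetime(2019, 6, 20, 1, 24, 59, tzinfo=UTC)},
    ]},
    {"history": [
        {"text": "text3", "updateTime": datetime(2018, 6, 20, 1, 29, 47, tzinfo=UTC)},
        {"text": "text4", "updateTime": datetime(2018, 6, 20, 1, 24, 59, tzinfo=UTC)},
    ]},
]


def history_in_range(docs, start, end):
    """Return only the history entries with start <= updateTime < end.

    Documents whose history has no entry in the range are dropped.
    """
    result = []
    for doc in docs:
        kept = [h for h in doc.get("history", []) if start <= h["updateTime"] < end]
        if kept:
            result.append({"history": kept})
    return result


# Hypothetical range: all of 2019.
in_2019 = history_in_range(
    alerts,
    datetime(2019, 1, 1, tzinfo=UTC),
    datetime(2020, 1, 1, tzinfo=UTC),
)
print(in_2019)
```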

Android fitness REST API missing data points

Here is what I am doing:
I installed the Google Fit app on my phone and collected some fitness data.
Then I went to the OAuth 2.0 Playground and tried to read that data with a REST request:
Method: POST
URI: https://www.googleapis.com/fitness/v1/users/me/dataset:aggregate
BODY:
{
  "aggregateBy": [{
    "dataTypeName": "com.google.calories",
    "dataSourceId": "derived:com.google.calories.bmr:com.google.android.gms:merged"
  }],
  "bucketByTime": { "durationMillis": 86400000 },
  "startTimeMillis": 1547232519000,
  "endTimeMillis": 1547837319000
}
What I expected to get:
7 datasets for 7 consecutive days, with one data point in each. Expected values are as follows:
12th January: 0
13th January: 0
14th January: 1688
15th January: 1934
16th January: 844
17th January: 0
18th January: 857
What I actually get is:
All days but 14th (with different start and end time of course):
{
"startTimeMillis": "1547578119000",
"endTimeMillis": "1547664519000",
"dataset": [
{
"dataSourceId":"derived:com.google.calories.bmr.summary:com.google.android.gms:aggregated",
"point": []
}
]
},
14th January:
{
"startTimeMillis": "1547491719000",
"endTimeMillis": "1547578119000",
"dataset": [
{
"dataSourceId": "derived:com.google.calories.bmr.summary:com.google.android.gms:aggregated",
"point": [
{
"startTimeNanos": "1547500395267000000",
"originDataSourceId": "derived:com.google.calories.bmr:com.google.android.gms:from_height&weight",
"endTimeNanos": "1547500402445000000",
"value": [
{
"mapVal": [],
"fpVal": 1688.25
},
{
"mapVal": [],
"fpVal": 1688.25
},
{
"mapVal": [],
"fpVal": 1688.25
}
],
"dataTypeName": "com.google.calories.bmr.summary"
}
]
}
]
},
Does anyone know why I don't get any value for most of the buckets while I do get a value for one of them? And why is the value for the 14th listed 3 times?
(Also I can't force these code blocks to format properly, apologies for that)
Use the current date and time, expressed in epoch milliseconds:
1 Nov 2019: 1572586200, so "startTimeMillis": "1572586200000"
8 Nov 2019: 1573191000, so "endTimeMillis": "1573191000000"
Put these two values in the request body and it should work.
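The two values in this answer are epoch seconds with three zeros appended. A minimal Python sketch of that conversion follows; the helper name to_epoch_millis is hypothetical, and the start time of 05:30 UTC is simply what 1572586200 decodes to, not something the answer states.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(moment):
    """Convert a timezone-aware datetime to whole milliseconds since the Unix epoch."""
    return (moment - EPOCH) // timedelta(milliseconds=1)


# 1572586200 seconds corresponds to 2019-11-01 05:30:00 UTC.
start = datetime(2019, 11, 1, 5, 30, tzinfo=timezone.utc)
end = start + timedelta(days=7)  # seven daily buckets of 86400000 ms each

time_range = {
    "startTimeMillis": to_epoch_millis(start),
    "endTimeMillis": to_epoch_millis(end),
}
print(time_range)
```

The resulting dictionary holds only the two time fields; the aggregateBy and bucketByTime parts of the request body stay as in the question.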

rethinkdb grouping to calculate balance of user funds

I am using RethinkDB with Node.js. I have a funds table and I am trying to calculate the balance of any user as the sum of all credit entries minus the sum of all debit entries. So far I have been able to run the following query.
r.db('testDB').table('funds').filter({userId:'63755d1e-e82e-4072-8312-4fcd88f1dfd3'}).group(function(g){
return g('userId')
})
This produces the following result.
[
{
"group": "63755d1e-e82e-4072-8312-4fcd88f1dfd3" ,
"reduction": [
{
"createdAt": Mon Jun 06 2016 14:17:26 GMT+00:00 ,
"createdBy": "63755d1e-e82e-4072-8312-4fcd88f1dfd3" ,
"credit": 900 ,
"id": "2afaca8e-6b4f-4ed5-a8ef-7fed3ce5ca67" ,
"userId": "63755d1e-e82e-4072-8312-4fcd88f1dfd3"
} ,
{
"createdAt": Fri Jun 17 2016 09:02:19 GMT+00:00 ,
"createdBy": "63755d1e-e82e-4072-8312-4fcd88f1dfd3" ,
"credit": 150 ,
"id": "c023ea2d-0d28-4f4b-ae6c-1c41c49aca08" ,
"userId": "63755d1e-e82e-4072-8312-4fcd88f1dfd3"
} ,
{
"createdAt": Fri Jun 17 2016 08:54:56 GMT+00:00 ,
"createdBy": "63755d1e-e82e-4072-8312-4fcd88f1dfd3" ,
"debit": 50 ,
"id": "89fd4a56-8722-4e86-8409-d42e4041e38d" ,
"userId": "63755d1e-e82e-4072-8312-4fcd88f1dfd3"
}
]
}
]
I tried to use the concatMap function with a branch inside it to check whether an entry is a debit or a credit, but it is not working.
This query throws an error:
r.db('testDB').table('funds').filter({userId:'63755d1e-e82e-4072-8312-4fcd88f1dfd3'}).group(function(g){
return g('userId')
}).ungroup().concatMap(function(m){
//return m('reduction')('credit')
return r.branch (m('reduction')('credit').gt(0), 'c', 'd')
})
e: Cannot convert STRING to SEQUENCE in:
Another approach, using the reduce function, gives me the sum of all the credit entries, but I don't know how to sum all the debits.
r.db('testDB').table('funds').filter({userId:'63755d1e-e82e-4072-8312-4fcd88f1dfd3'}).group(function(g){
return g('userId')
}).ungroup().concatMap(function(m){
return m('reduction')('credit')
// return r.branch (m('reduction')('credit').gt(0), 'c', 'd')
})
.reduce(function(left, right){
return left.add(right);
})
The result is 1050.
You probably want something like this:
r.db('testDB').table('funds').group('userId').map(function(row) {
    return row('credit').default(0).sub(row('debit').default(0));
}).sum()
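The idea in this query is that a missing credit or debit field counts as 0, so each row contributes credit minus debit and the per-user sum is the balance. A minimal plain-Python sketch of the same arithmetic on the three rows from the question (in-memory only, not ReQL; the function name balances is hypothetical):

```python
from collections import defaultdict

USER = "63755d1e-e82e-4072-8312-4fcd88f1dfd3"

# The three rows shown in the question, reduced to the fields that matter.
funds = [
    {"userId": USER, "credit": 900},
    {"userId": USER, "credit": 150},
    {"userId": USER, "debit": 50},
]


def balances(rows):
    """Sum (credit - debit) per user, treating a missing field as 0."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["userId"]] += row.get("credit", 0) - row.get("debit", 0)
    return dict(totals)


print(balances(funds))
```

With these rows the balance is 900 + 150 - 50 = 1000, compared with the 1050 obtained when only credits are summed.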

Count the different variables in an array in my document

I'm trying out RethinkDB and playing around with some queries to see if it could fit my use case. So far, so good. However, I have a question regarding ReQL.
For example, in this case I store analytics events in RethinkDB such as:
[{
"userId": "abdf213",
"timestamp": "Sat Jan 17 2015 00:32:20 GMT+00:00",
"action": "Page"
},
{
"userId": "123abc",
"timestamp": "Sat Jan 17 2015 00:42:20 GMT+00:00",
"action": "Track"
},
{
"userId": "abdf213",
"timestamp": "Sat Jan 17 2015 00:45:20 GMT+00:00",
"action": "Track"
},
{
"userId": "123abc",
"timestamp": "Sat Jan 17 2015 00:44:20 GMT+00:00",
"action": "Page"
},
{
"userId": "123abc",
"timestamp": "Sat Jan 17 2015 00:48:20 GMT+00:00",
"action": "Page"
}]
I'd like the end result of my query to look like this:
{
"group": "123abc",
"reduction": {
"Page": 2,
"Track": 1
}
},
{
"group": "abdf213",
"reduction": {
"Page": 1,
"Track": 1
}
}
Bear in mind that the action names are not known in advance.
TBH, I'm not quite sure how to achieve this with ReQL.
Right now I have this query (using the data explorer):
r.db('test').table('events').group('userId').map(function(event) {
return event('action')
})
which returns documents like this one:
{
"group": "-71omc5zdgdimpuveheqs6dvt5q6xlwenjg7m" ,
"reduction": [
"Identify" ,
"Page" ,
"Track"
]
}
Can anyone point me in the right direction here?
Cheers,
S
Try:
r.table('events').group('userId').map(function(event) {
    return r.object(event('action'), 1);
}).reduce(function(a, b) {
    return a.merge(b.keys().map(function(key) {
        return [key, a(key).default(0).add(b(key))];
    }).coerceTo('object'));
})
Here's my solution:
r.table("events").group("userId", "action").count().ungroup()
.group(r.row("group")(0))
.map([r.row("group")(1), r.row("reduction")])
.coerceTo("object")
ReQL doesn't support nesting groups, but you can group by multiple fields at the same time and then perform further grouping on the output.
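Both answers arrive at the same shape: one object per user that maps each action name to its count, without knowing the action names in advance. A minimal plain-Python sketch of that result on the sample events (in-memory only, not ReQL; timestamps are omitted because they do not affect the counts, and the function name action_counts is hypothetical):

```python
from collections import Counter, defaultdict

# Sample events from the question, without the timestamp field.
events = [
    {"userId": "abdf213", "action": "Page"},
    {"userId": "123abc", "action": "Track"},
    {"userId": "abdf213", "action": "Track"},
    {"userId": "123abc", "action": "Page"},
    {"userId": "123abc", "action": "Page"},
]


def action_counts(rows):
    """Count occurrences of each action per user."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row["userId"]][row["action"]] += 1
    return {user: dict(per_action) for user, per_action in counts.items()}


per_user = action_counts(events)
print(per_user)
```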

Elasticsearch Query - how to?

I have data in the following format in Elasticsearch (from Sense):
POST slots/slot/1
{
locationid:"1",
roomid:"10",
starttime: "08:45"
}
POST slots/slot/2
{
locationid:"1",
roomid:"10",
starttime: "09:00"
}
POST slots/slot/3
{
locationid:"2",
roomid:"100",
starttime: "08:45"
}
POST slots/slot/4
{
locationid:"2",
roomid:"101",
starttime: "09:00"
}
POST slots/slot/5
{
locationid:"3",
roomid:"200",
starttime: "09:30"
}
In short, the data is in the following format.
A location has multiple rooms, and each room has multiple slots of 15 minutes. So slot 1 for room 10 starts at 08:45 and ends at 09:00, and slot 2 for the same room starts at 09:00 and ends at 09:15.
Locationid  RoomId  Starttime
-----------------------------
1           10      08:45
1           10      09:00
2           100     08:45
2           101     09:00
3           200     09:30
I'm trying to write a query/filter that will give me all locations where a room is available for two or three given slots.
For example: find a location that has both the 08:45 slot and the 09:00 slot (configurable).
The answer should be location 1 only.
It should not be location 2, as room 100 has the 08:45 slot but not the 09:00 slot, and room 101 has the 09:00 slot but not the 08:45 slot.
I believe this is not the best approach, but here is my attempt:
POST slots/slot/_search?pretty=true&search_type=count
{
"facets": {
"locationswithslots": {
"terms": {
"field": "locationid",
"script" : "term + \"_\" + _source.roomid",
"size": 10
},
"facet_filter":
{
"terms":
{
"starttime":
[
"08:45",
"09:00"
]
}
}
}
}
}
This gives the answer as below
{
"took": 12,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 5,
"max_score": 0,
"hits": []
},
"facets": {
"locationswithslots": {
"_type": "terms",
"missing": 0,
"total": 4,
"other": 0,
"terms": [
{
"term": "1_10",
"count": 2
},
{
"term": "2_101",
"count": 1
},
{
"term": "2_100",
"count": 1
}
]
}
}
}
Now I need to figure out a way to keep only the facet terms with count 2, since I passed 2 slots in the filter.
Is any other option possible?
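One possible way to do that last step is on the client, after the response arrives. The sketch below is plain Python over the facet terms shown above, not an Elasticsearch feature. It assumes each room has at most one document per start time (so a count equal to the number of requested slots means every slot is present) and that the facet size is large enough to return every location_room term. The function name locations_with_all_slots is hypothetical.

```python
# Facet terms copied from the response above; each term is "<locationid>_<roomid>".
facet_terms = [
    {"term": "1_10", "count": 2},
    {"term": "2_101", "count": 1},
    {"term": "2_100", "count": 1},
]
requested_slots = ["08:45", "09:00"]


def locations_with_all_slots(terms, slots):
    """Return location ids having at least one room that matched every requested slot."""
    wanted = len(slots)
    locations = []
    for entry in terms:
        location_id, _, _room_id = entry["term"].partition("_")
        if entry["count"] == wanted and location_id not in locations:
            locations.append(location_id)
    return locations


print(locations_with_all_slots(facet_terms, requested_slots))
```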
