How to use MongoDB Ruby Driver to do a "Group" (group by)? - ruby

This is related to MongoDB Group using the Ruby driver.
If I want to do something like the following in SQL:
select page_id, count(page_id) from a_table group by page_id
The MongoDB Ruby driver's documentation says
http://api.mongodb.org/ruby/current/Mongo/Collection.html#group-instance_method
group(key, condition, initial, reduce, finalize = nil)
# returns an array
So from the other post, I am using:
Analytic.collection.group( "fucntion (x) return {page_id : x.page_id}",
nil,
{:count => 0},
"function(x, y) { y.count++ }" )
but it actually returns
[{"count"=>47.0}]
which is the total number of records (documents) in the collection. Is something incorrect above? I thought the key might be a static string, like in
http://kylebanker.com/blog/2009/11/mongodb-count-group/
db.pageviews.group(
{
key: {'user.agent': true},
initial: {sum: 0},
reduce: function(doc, prev) { prev.sum += 1}
});
but that is not what the other Stack Overflow post does.
Update: actually, the solution from the link above,
Analytic.collection.group( ['page_id'], nil,
{:count => 0}, "function(x, y) { y.count++ }" )
works, but I just wonder why the first method in this post didn't work.

The reason the first example didn't work is that you misspelled "function" as "fucntion". The following should work:
Analytic.collection.group( "function(x){ return {page_id : x.page_id}; }",
nil,
{ :count => 0 },
"function(x, y){ y.count++; }" )

I finally got it to work with:
Analytic.collection.group( ['myapp_id'], {:page => 'products'},
{:pageviews => 0, :timeOnPage => 0},
"function(x, y) { y.pageviews += x.pageviews; y.timeOnPage += x.timeOnPage }" )
but then I used Map/Reduce afterwards as Map/Reduce seems like a more generic and powerful method.
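For reference, here is a minimal, untested sketch of the equivalent map/reduce count per page_id, assuming the same mongo 1.x driver API and the Analytic model used above:
# Count documents per page_id via map/reduce (sketch only).
map    = BSON::Code.new "function() { emit(this.page_id, {count: 1}); }"
reduce = BSON::Code.new "function(key, values) {
  var total = 0;
  values.forEach(function(v) { total += v.count; });
  return {count: total};
}"
raw = Analytic.collection.map_reduce(map, reduce, :out => {:inline => true}, :raw => true)
raw["results"]  # e.g. [{"_id" => 1.0, "value" => {"count" => 12.0}}, ...]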

Related

RxJS 6 - filter array of objects

I would like to filter an array of objects using the RxJS filter operator.
I have an array of objects, each shaped like this:
{
  id: string,
  count: number
}
I would like to get the objects whose count > 20.
I tried:
getVotes2(): Observable<Vote> {
  return this._http.get<Vote>(url)
    .pipe(
      map( results => results ),
      filter( result => result.count > 20 )
    );
}
Next I tried it without map, and I always get all records.
Any ideas?
---------CORRECT CODE------------
getVotes2(): Observable<Vote[]> {
  return this._http.get<Vote[]>(url)
    .pipe(
      map( results => results.filter( r => r.count > 20 ) )
    );
}
You're confused about the use of the RxJS filter operator.
The RxJS filter operator is NOT the same as the array filter method. The RxJS filter operates on a stream and only lets through the items of THE STREAM that meet a condition; the array filter method operates on an array and returns a new array containing only the items that meet a condition.
What you're currently doing is filtering the stream on an undefined "count" property of the array itself, so you're saying "only let this item through the stream if undefined > 20", and by one of the quirks of JavaScript, undefined is simply not greater than 20 rather than the comparison raising an error.
What you need to do is this:
getVotes2(): Observable<Vote[]> {
  return this._http.get<Vote[]>(url)
    .pipe(
      map( results => results.filter(r => r.count > 20) )
    );
}
This way, you use the RxJS map to perform an operation on the item IN the stream, and use the array filter on that item to filter the array.
Edit: as pointed out, the typing also needs to be correct to let TypeScript know that you're expecting an array of Vote objects rather than a single Vote object.
If the HTTP response you are getting is something like
{
data: {
results: [ {id: 'dd5144s', count: 14}, {id: 'dd51s4s', count: 22}, {id: 'dd5sa44s', count: 8} ]
}
}
Then try this:
return this._http.get<any>(url)
  .pipe(
    switchMap( response => response.data.results ),
    filter( result => result.count > 20 )
  );
Hope this helps.

Bar Chart on Dimension-1 and Stacked by Dimension-2

Summary
I want to display a bar chart whose dimension is days and which is stacked by a different category (i.e. x-axis = days and stack = category-1). I can do this "manually", in that I can write if-thens that either zero out or pass through the quantity, but I'm wondering if there's a systematic way to do this.
JSFiddle https://jsfiddle.net/wostoj/rum53tn2/
Details
I have data with dates, quantities, and other classifiers. For the purpose of this question I can simplify it to this:
data = [
{day: 1, cat: 'a', quantity: 25},
{day: 1, cat: 'b', quantity: 15},
{day: 1, cat: 'b', quantity: 10},
{day: 2, cat: 'a', quantity: 90},
{day: 2, cat: 'a', quantity: 45},
{day: 2, cat: 'b', quantity: 15},
]
I can set up a bar chart, by day, that shows total units and I can manually add the stacks for 'a' and 'b' as follows.
var dayDim = xf.dimension(_ => _.day);
var bar = dc.barChart("#chart");
bar
  .dimension(dayDim)
  .group(dayDim.group().reduceSum(
    _ => _.cat === 'a' ? _.quantity : 0
  ))
  .stack(dayDim.group().reduceSum(
    _ => _.cat === 'b' ? _.quantity : 0
  ));
However, this is easy when my data has only 2 categories, but I'm wondering how I'd scale this to 10 or an unknown number of categories. I'd imagine the pseudo-code I'm trying to do is something like
dc.barChart("#chart")
  .dimension(xf.dimension(_ => _.day))
  .stackDim(xf.dimension(_ => _.cat))
  .stackGroup(xf.dimension(_ => _.cat).group().reduceSum(_ => _.quantity));
I mentioned this in my answer to your other question, but why not expand on it a little bit here.
In the dc.js FAQ there is a standard pattern for custom reductions to reduce more than one value at once.
Say that you have a field named type which determines which type of value is in the row, and the value is in a field named value (in your case these are cat and quantity). Then
var group = dimension.group().reduce(
  function(p, v) { // add
    p[v.type] = (p[v.type] || 0) + v.value;
    return p;
  },
  function(p, v) { // remove
    p[v.type] -= v.value;
    return p;
  },
  function() { // initial
    return {};
  });
will reduce all the rows for each bin to an object where the keys are the types and the values are the sum of values with that type.
The way this works is that when crossfilter encounters a new key, it first uses the "initial" function to produce a new value. Here that value is an empty object.
Then for each row it encounters which falls into the bin labelled with that key, it calls the "add" function. p is the previous value of the bin, and v is the current row. Since we started with a blank object, we have to make sure we initialize each value; (p[v.type] || 0) will make sure that we start from 0 instead of undefined, because undefined + 1 is NaN and we hate NaNs.
We don't have to be as careful in the "remove" function, because the only way a row will be removed from a bin is if it was once added to it, so there must be a number in p[v.type].
Now that each bin contains an object with all the reduced values, the stack mixin has helpful extra parameters for .group() and .stack() which allow us to specify the name of the group/stack, and the accessor.
For example, if we want to pull items a and b from the objects for our stacks, we can use:
.group(group, 'a', kv => kv.value.a)
.stack(group, 'b', kv => kv.value.b)
It's not as convenient as it could be, but you can use these techniques to add stacks to a chart programmatically (see source).

Checking if a ruby hash contains a value greater than x

I have the following object returned from an InfluxDB query, and I want to be able to check whether any of the derivatives are equal to or greater than, say, 100; if so, then do stuff.
I've been trying to use select to check that field, but I don't really understand how to work with a data structure like this. How would I go about iterating through every derivative value in my returned object?
I'm not really seeing an example that's similar to my case in the enumerable documentation.
https://ruby-doc.org/core-2.4.0/Enumerable.html
[{
"name" => "powerdns_value",
"tags" => nil,
"values" => [
{ "time" => "2017-03-21T14:20:00Z", "derivative" => 1},
{ "time" => "2017-03-21T14:30:00Z", "derivative" => 900},
{ "time" => "2017-03-21T14:40:00Z", "derivative" => 0},
{ "time" => "2017-03-21T15:20:00Z", "derivative" => 0}
]
}]
If you just want to know whether any of the hashes in your array meets the condition:
arr.first['values'].any? { |hash| hash['derivative'] >= 100 }
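If the query might return more than one series in that array, here is a small sketch that checks the values of every series (assuming the structure shown above; do_stuff is just a placeholder for whatever should happen):
threshold = 100
exceeded = arr.any? do |series|
  series['values'].any? { |point| point['derivative'] >= threshold }
end
do_stuff if exceeded  # do_stuff is a hypothetical method, not part of the question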

MongoDB and MongoRuby: Sorting on mapreduce

I am currently trying to do a simple mapreduce over some documents stored in MongoDB. I use
map = BSON::Code.new "function() { emit(this.userid, 1); }"
for the mapping and
reduce = BSON::Code.new "function(key, values) {
var sum = 0;
values.forEach(function(value) {
sum += value;
});
return sum;
}"
for the reduction. This works fine when I call map_reduce the following way:
output = col.map_reduce(map, reduce, # col is the collection in mongodb, e.g. db.users
{
:out => {:inline => true},
:raw => true
}
)
Now to the real question: how can I use the above call to map_reduce to enable sorting? The manual says that I must use :sort with an array of [key, direction] pairs. I guessed the following should work, but it doesn't:
output = col.map_reduce(map, reduce,
{
:sort => [["value", Mongo::ASCENDING]],
:out => {:inline => true},
:raw => true
}
)
Do I have to choose another datatype? The option also doesn't work (same error) when using an empty [], although the manual says that is the default for the option. Unfortunately, the error message from MongoDB doesn't help much:
/usr/lib/ruby/gems/1.9.1/gems/mongo-1.3.1/lib/mongo/db.rb:506:in `command': Database command 'mapreduce' failed: {"assertion"=>"sort has to be blank or an Object", "assertionCode"=>13609, "errmsg"=>"db assertion failure", "ok"=>0.0} (Mongo::OperationFailure)
from /usr/lib/ruby/gems/1.9.1/gems/mongo-1.3.1/lib/mongo/collection.rb:576:in `map_reduce'
from ./mapreduce.rb:26:in `<main>'
If you need the full runnable code, please say so in the comments. I've excluded it for now, as it only contains the initialization of a connection to MongoDB and the initialization of the collection col by querying a database.
Use a BSON::OrderedHash and it will work.
sort = BSON::OrderedHash.new
sort["value"] = Mongo::ASCENDING

output = col.map_reduce(map, reduce,
  {
    :sort => sort,
    :out => {:inline => true},
    :raw => true
  }
)
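One thing to keep in mind: the mapReduce sort option sorts the input documents before they are mapped (and the sort key generally needs to be indexed on the collection), so sorting by an output field such as "value" may not do what you expect; you may need to sort the inline results in Ruby afterwards.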

pythonic way to collect specific data from complex dict

I need to collect some data from a complex dict, based on dot-notation key names.
For example, with this sample data:
data = {
'name': {'last': 'smith', 'first': 'bob'},
'address':{'city': 'NY', 'state': 'NY'},
'contact':{'phone':{'self':'1234', 'home':'222'}},
'age':38,
'other':'etc'
}
keys = ['contact.phone.self', 'name.last', 'age']
My logic:
result = []
for rev_key in keys:
    current = data.copy()
    rev_key = rev_key.split('.')
    while rev_key:
        value = rev_key.pop(0)
        current = current[value]
    result.append(current)
Thanks in advance!
[reduce(dict.get, key.split("."), data) for key in keys]
How about this?
def fetch( some_dict, key_iter ):
    for key in key_iter:
        subdict = some_dict
        for field in key.split('.'):
            subdict = subdict[field]
        yield subdict
a_dict = {
'name': {'last': 'smith', 'first': 'bob'},
'address':{'city': 'NY', 'state': 'NY'},
'contact':{'phone':{'self':'1234', 'home':'222'}},
'age':38,
'other':'etc'
}
keys = ['contact.phone.self', 'name.last', 'age']
result = list( fetch( a_dict, keys ) )
Here's my crack at it:
>>> def find(tree, cur):
...     if len(cur) == 1:
...         return tree[cur[0]]
...     else:
...         return find(tree[cur[0]], cur[1:])
>>> print [find(data,k.split(".")) for k in keys]
['1234', 'smith', 38]
Of course this will cause stack overflows if the items are too deeply nested (unless you explicitly raise the recursion depth), and I would use a deque instead of a list if this were production code.
Just write a function that gets one key at a time
def getdottedkey(data, dottedkey):
    for key in dottedkey.split('.'):
        data = data[key]
    return data
print [getdottedkey(data, k) for k in keys]
