MongoDB and MongoRuby: Sorting on mapreduce - ruby

I am currently trying to do a simple mapreduce over some documents stored in MongoDB. I use
map = BSON::Code.new "function() { emit(this.userid, 1); }"
for the mapping and
reduce = BSON::Code.new "function(key, values) {
var sum = 0;
values.forEach(function(value) {
sum += value;
});
return sum;
}"
for the reduction. This works fine when I call map_reduce the following way:
output = col.map_reduce(map, reduce, # col is the collection in mongodb, e.g. db.users
{
:out => {:inline => true},
:raw => true
}
)
Now to the real question: How can I extend the call to map_reduce above to enable sorting? The manual says that I must use sort with an array of [key, direction] pairs. I guessed the following should work, but it doesn't:
output = col.map_reduce(map, reduce,
{
:sort => [["value", Mongo::ASCENDING]],
:out => {:inline => true},
:raw => true
}
)
Do I have to choose another datatype? The option also fails with the same error when I pass an empty [], although the manual says that is the default for the option. Unfortunately, the error message from MongoDB doesn't help much:
/usr/lib/ruby/gems/1.9.1/gems/mongo-1.3.1/lib/mongo/db.rb:506:in `command': Database command 'mapreduce' failed: {"assertion"=>"sort has to be blank or an Object", "assertionCode"=>13609, "errmsg"=>"db assertion failure", "ok"=>0.0} (Mongo::OperationFailure)
from /usr/lib/ruby/gems/1.9.1/gems/mongo-1.3.1/lib/mongo/collection.rb:576:in `map_reduce'
from ./mapreduce.rb:26:in `<main>'
If you need the full runnable code, please say so in the comments. I exclude it for now as it only contains the initialization of a connection to mongodb and initialization of the collection col by querying a database.

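The server expects the sort option to be a document (an object) rather than an array, which is what the assertion "sort has to be blank or an Object" is saying. Use a BSON::OrderedHash and it will work.
output = col.map_reduce(map, reduce,
{
:sort => BSON::OrderedHash["value", Mongo::ASCENDING],
:out => {:inline => true},
:raw => true
}
)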
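The rule behind the assertion can be illustrated with a plain JavaScript sketch of the command document. The helper isValidSort is hypothetical and only mimics what the error message describes; it is not driver or server code. The collection name users is taken from the comment in the question.

```javascript
// Hypothetical helper: mimics the rule stated in the error message,
// i.e. the sort option must be absent or a document, not an array.
function isValidSort(sort) {
  if (sort === undefined || sort === null) return true;
  return typeof sort === "object" && !Array.isArray(sort);
}

const ASCENDING = 1; // same numeric value as Mongo::ASCENDING in the Ruby driver

// Shape of the mapreduce command with sort given as a document.
const command = {
  mapreduce: "users",
  map: "function() { emit(this.userid, 1); }",
  reduce: "function(key, values) { var sum = 0; values.forEach(function(v) { sum += v; }); return sum; }",
  sort: { value: ASCENDING },
  out: { inline: 1 }
};

console.log(isValidSort(command.sort));            // true
console.log(isValidSort([["value", ASCENDING]]));  // false
```

The array of pairs from the question fails the check, while the key/direction document passes it.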

Related

Rxjs6 - filter array of objects

I would like to filter an array of objects using the RxJS filter operator.
I have an array of objects, each shaped like this:
{
id: string,
count: number
}
I want to get the objects whose count > 20.
I tried:
getVotes2(): Observable<Vote> {
return this._http.get<Vote>(url)
.pipe(
map( results => results ),
filter( result => result.count>20 )
);
}
I also tried it without map, and I always get all records.
Any ideas?
---------CORRECT CODE------------
getVotes2(): Observable<Vote[]> {
return this._http.get<Vote[]>(url)
.pipe(
map( results => results.filter( r => r.count > 20) )
)
}
You're confusing two different filters.
The rx filter operator is NOT the same as the array filter method. The rx filter operates on a stream and lets through only the items of THE STREAM that meet a condition; the array filter operates on an array and keeps only the elements of that array that meet a condition.
What you're currently doing is filtering the stream on some undefined "count" property of the array itself, so you're saying "let the item through only if undefined > 20". One of the quirks of JavaScript is that this comparison is not an error; undefined > 20 simply evaluates to false.
What you need to do is this:
getVotes2(): Observable<Vote[]> {
return this._http.get<Vote[]>(url)
.pipe(
map( results => results.filter(r => r.count > 20) )
);
}
This way, you use the rx map operator to perform an operation on the item IN the stream and use the array filter on that item to filter the array.
Edit: as pointed out, the typing also needs to be correct to let TypeScript know that you're expecting an array of Vote objects rather than a single Vote object.
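The difference between testing the whole array and testing each element can be checked in plain JavaScript without RxJS. This is a minimal sketch; the sample votes are made up for illustration.

```javascript
// Sample data shaped like the Vote objects in the question.
const votes = [
  { id: "a", count: 14 },
  { id: "b", count: 22 },
  { id: "c", count: 8 }
];

// An array has no "count" property, so this predicate tests undefined > 20.
const predicateOnWholeArray = votes.count > 20;

// Array.prototype.filter applies the predicate to each element instead.
const popular = votes.filter(v => v.count > 20);

console.log(votes.count);            // undefined
console.log(predicateOnWholeArray);  // false
console.log(popular.map(v => v.id)); // [ 'b' ]
```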
If the HTTP response you are getting is something like
{
data: {
results: [ {id: 'dd5144s', count: 14}, {id: 'dd51s4s', count: 22}, {id: 'dd5sa44s', count: 8} ]
}
}
Then try this:
return this._http.get<{ data: { results: Vote[] } }>(url)
.pipe(
switchMap( response => response.data.results ), // flattens the array into single Vote emissions
filter( result => result.count>20 )
);
Hope this helps.

RxJs Observable: Execute function if empty/filtered

I've got an Observable that listens to some user input from a text box. If the observed string's length is >=3 (filter), it executes some HTTP call (switchMap).
Now I'd like to detect somehow if the user input has been filtered. Reason:
If the HTTP call has been done, it should show the results.
If the user input got filtered (== is invalid), it should clear the results.
Here's the code I'd like to have (see: ifFiltered):
this.userInput.valueChanges
.filter(val => val && val.length >= 3)
.ifFiltered(() => this.results = [])
.switchMap(val => getDataViaHTTP())
.subscribe(val => this.results = val);
I know, I could place that logic within the filter function for this simple example. But what if I have 10 different filters?
Did I miss any method that satisfies my needs?
Thanks in advance!
Either use partition, as described in "RxJS modeling if else control structures with Observables operators".
Or, instead of filter, use map and pass the value through if the former filter condition is true, or null otherwise. You can then catch the null wherever you want in your chain with a filter.
As a last option, call some function in the else branch of the filter function.
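The second option (mapping rejected values to a null marker) can be sketched in plain JavaScript without RxJS. The checks list and the decide function are hypothetical names used only for illustration; in a real chain the first function would go into map and the null test into a later operator.

```javascript
// Hypothetical list of validity checks; more can be appended without
// touching the rest of the chain.
const checks = [
  val => typeof val === "string",
  val => val.length >= 3
];

// Pass the value through when every check succeeds, otherwise mark it with null.
const toValueOrNull = val => (checks.every(check => check(val)) ? val : null);

// Later in the chain the null marker decides which branch runs.
function decide(val) {
  const marked = toValueOrNull(val);
  return marked === null ? "clear results" : "search for " + marked;
}

console.log(decide("ab"));   // clear results
console.log(decide("abc"));  // search for abc
console.log(decide(null));   // clear results
```

Because every stops at the first failing check, later checks can rely on earlier ones (here, length is only read from strings).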
We had a similar case and tried partition as mentioned above, but found it much handier to throw here. So for your code:
this.userInput.valueChanges
.do(val => {
if (!val || val.length < 3) {
throw new ValueTooShortError();
}
})
.switchMap(val => getDataViaHTTP())
.do(val => this.results = val)
.catch((err, caught) => {
this.results = [];
return caught; // resubscribe to the source so later input is still processed
})
.subscribe();
I suggest having a common event stream, creating two filtered streams, and merging the two before subscription:
var o = this.userInput.valueChanges;
var empty = o.filter(t => t.length < 3)
.map(t => []);
var nonempty = o.filter(t => t.length >= 3)
.switchMap(t => getDataViaHTTP());
empty.merge(nonempty).subscribe(val => this.results = val);
I found another nice solution for my use case using Validators.
(I know that this is not a solution using Observables, as the question asked for. Instead it uses Angular 2 features to work around the problem nicely.)
this.userInput.validator = Validators.compose([
Validators.required,
Validators.minLength(3)
]);
this.userInput.valueChanges
.filter(val => this.userInput.valid)
.switchMap(val => getDataViaHTTP())
.subscribe(val => this.results = val);
Now I can use the userInput.valid property and/or the userInput.statusChanges Observable to keep track of the input value.
Maybe it's late, but I wanted to post this for anyone still looking for a cleaner way to handle the IF EMPTY case inside .map:
of(fooBar).pipe(
map(
(val) =>
({
...val,
Foo: (val.Bar
? val.Foo.map((e) => ({
title: e.Title,
link: e.Link,
}))
: []) as fooModal[],
})));
This code returns an empty array if val.Bar is missing, but it's just an example; you can use any validation and expression instead.

How to increase the speed of this MongoDB query?

MongoDB 2.0.7 & PHP 5
I'm trying to count the length of each array. Every document has one array. I want to get the number of elements in each array and the ID of the document. There are no indexes except the one on _id.
Here's my code:
$map = new MongoCode("function() {
emit(this._id,{
'_id':this._id,'cd':this.cd,'msgCount':this.cs[0].msgs.length}
);
}");
$reduce = new MongoCode("function(k, vals) {
return vals[0];
}");
$cmmd = smongo::$db->command(array(
"mapreduce" => "sessions",
"map" => $map,
"reduce" => $reduce,
"out" => "result"));
These are the timings. As you can see, the query is very slow:
Array
(
[result] => result
[timeMillis] => 29452
[counts] => Array
(
[input] => 106026
[emit] => 106026
[reduce] => 0
[output] => 106026
)
[ok] => 1
)
How can I reduce the timings?
If you are going to frequently need the counts for your arrays, a better approach would be to include a count field in your actual documents. Otherwise you are going to be scanning all documents to do the count (as per your Map/Reduce example).
You can use an Atomic Operation such as $inc to increment/decrement this count at the same time as you are updating the arrays.
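As a rough illustration, the update document below pushes a message and increments the counter in one step, using MongoDB's $push and $inc operators. The field names msgs and msgCount are assumptions for this sketch, and applyUpdate is a hypothetical in-memory stand-in for the server that only handles these two operators on top-level fields.

```javascript
// Update document in MongoDB's operator syntax: push a message and
// bump the counter in the same update.
const update = {
  $push: { msgs: "hello" },
  $inc: { msgCount: 1 }
};

// Hypothetical in-memory stand-in for the server, supporting only
// $push and $inc on top-level fields.
function applyUpdate(doc, upd) {
  const next = JSON.parse(JSON.stringify(doc)); // deep copy of plain data
  for (const [field, value] of Object.entries(upd.$push || {})) {
    (next[field] = next[field] || []).push(value);
  }
  for (const [field, amount] of Object.entries(upd.$inc || {})) {
    next[field] = (next[field] || 0) + amount;
  }
  return next;
}

const session = { _id: 1, msgs: ["hi"], msgCount: 1 };
const updated = applyUpdate(session, update);

console.log(updated.msgs.length); // 2
console.log(updated.msgCount);    // 2
```

With the counter stored on the document, reading the count becomes a plain field read instead of a scan over every array.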

How to use MongoDB Ruby Driver to do a "Group" (group by)?

This is related to "MongoDB Group using Ruby driver".
If I want to do something like the following in SQL:
select page_id, count(page_id) from a_table group by page_id
The MongoDB docs say
http://api.mongodb.org/ruby/current/Mongo/Collection.html#group-instance_method
group(key, condition, initial, reduce, finalize = nil)
# returns an array
So from the other post, I am using:
Analytic.collection.group( "fucntion (x) return {page_id : x.page_id}",
nil,
{:count => 0},
"function(x, y) { y.count++ }" )
but it actually returns
[{"count"=>47.0}]
which is the total number of records (documents) in the collection. Is something not correct above? I thought the key might be a static string like in
http://kylebanker.com/blog/2009/11/mongodb-count-group/
db.pageviews.group(
{
key: {'user.agent': true},
initial: {sum: 0},
reduce: function(doc, prev) { prev.sum += 1}
});
but it is not in the other stackoverflow post.
Update: actually, in the link above, the solution like
Analytic.collection.group( ['page_id'], nil,
{:count => 0}, "function(x, y) { y.count++ }" )
works, but I just wonder why the first method in this post didn't work.
The reason the first example didn't work is that you misspelled "function" as "fucntion" (the function body is also missing its braces). The following should work:
Analytic.collection.group( "function(x){ return {page_id : x.page_id}; }",
nil,
{ :count => 0 },
"function(x, y){ y.count++; }" )
I finally got it to work by
Analytic.collection.group( ['myapp_id'], {:page => 'products'},
{:pageviews => 0, :timeOnPage => 0},
"function(x, y) { y.pageviews += x.pageviews; y.timeOnPage += x.timeOnPage }" )
but then I used Map/Reduce afterwards as Map/Reduce seems like a more generic and powerful method.
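To see how the key function, the initial document, and the reduce function fit together, here is a minimal in-memory sketch in JavaScript. The group function below is a hypothetical stand-in written for illustration, not the driver's implementation; the sample documents are made up.

```javascript
// In-memory sketch of group: keyFn picks the grouping key, initial seeds
// each group's accumulator, and reduce(doc, acc) mutates the accumulator.
function group(docs, keyFn, initial, reduce) {
  const groups = new Map();
  for (const doc of docs) {
    const key = keyFn(doc);
    const id = JSON.stringify(key);
    if (!groups.has(id)) groups.set(id, { ...key, ...initial });
    reduce(doc, groups.get(id));
  }
  return [...groups.values()];
}

const pageviews = [
  { page_id: 1 }, { page_id: 2 }, { page_id: 1 }, { page_id: 1 }
];

const result = group(
  pageviews,
  function (x) { return { page_id: x.page_id }; },
  { count: 0 },
  function (x, y) { y.count++; }
);

console.log(result);
// [ { page_id: 1, count: 3 }, { page_id: 2, count: 1 } ]
```

This mirrors the SQL group by page_id from the question: one accumulator per distinct key, each starting from the initial document.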

increment value in a hash

I have a bunch of posts which have category tags in them.
I am trying to find out how many times each category has been used.
I'm using rails with mongodb, BUT I don't think I need to be getting the occurrence of categories from the db, so the mongo part shouldn't matter.
This is what I have so far
@recent_posts = current_user.recent_posts #returns the 10 most recent posts
@categories_hash = {'tech' => 0, 'world' => 0, 'entertainment' => 0, 'sports' => 0}
@recent_posts.each do |cat|
cat.categories.each do |addCat|
@categories_hash.increment(addCat) #obviously this is where I'm having problems
end
end
the structure of the post is
{"_id" : ObjectId("idnumber"), "created_at" : "Tue Aug 03...", "categories" :["world", "sports"], "message" : "the text of the post", "poster_id" : ObjectId("idOfUserPoster"), "voters" : []}
I'm open to suggestions on how else to get the count of categories. I will want to get the count of voters eventually, so it seems to me the best way is to increment the categories_hash and then add the voters.length. But one thing at a time: I'm just trying to figure out how to increment values in the hash.
If you aren't familiar with map/reduce and you don't care about scaling up, this is not as elegant as map/reduce, but should be sufficient for small sites:
@categories_hash = Hash.new(0) # missing keys default to 0
current_user.recent_posts.each do |post|
post.categories.each do |category|
@categories_hash[category] += 1
end
end
If you're using MongoDB, an elegant way to aggregate tag usage is a map/reduce operation. MongoDB supports map/reduce operations written in JavaScript. Map/reduce runs on the db server(s), i.e. your application does not have to retrieve and analyze every document (which wouldn't scale well for large collections).
As an example, here are the map and reduce functions I use in my blog on the articles collection to aggregate the usage of tags (which is used to build the tag cloud in the sidebar). Documents in the articles collection have a key named 'tags' which holds an array of strings (the tags).
The map function simply emits 1 on every used tag to count it:
function () {
if (this.tags) {
this.tags.forEach(function (tag) {
emit(tag, 1);
});
}
}
The reduce function sums up the counts:
function (key, values) {
var total = 0;
values.forEach(function (v) {
total += v;
});
return total;
}
As a result, the database returns a hash that has a key for every tag and its usage count as a value. E.g.:
{ 'rails' => 5, 'ruby' => 12, 'linux' => 3 }
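The two functions can be tried outside the database with a small in-memory harness. This is only a sketch: mapReduce here is a hypothetical helper, emit is passed to map as an argument instead of being a global as in MongoDB, and reduce is called exactly once per key, which the real server does not guarantee. The sample articles are made up.

```javascript
// Minimal in-memory harness. Unlike MongoDB, emit is passed to map as an
// argument here instead of being a global.
function mapReduce(docs, map, reduce) {
  const emitted = new Map();
  const emit = (key, value) => {
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  };
  docs.forEach(doc => map.call(doc, emit));
  const result = {};
  for (const [key, values] of emitted) result[key] = reduce(key, values);
  return result;
}

// Same logic as the map function above: emit 1 for every tag.
function map(emit) {
  if (this.tags) {
    this.tags.forEach(function (tag) { emit(tag, 1); });
  }
}

// Same logic as the reduce function above: sum the emitted counts.
function reduce(key, values) {
  var total = 0;
  values.forEach(function (v) { total += v; });
  return total;
}

const articles = [
  { tags: ["ruby", "rails"] },
  { tags: ["ruby"] },
  { title: "untagged" }
];

console.log(mapReduce(articles, map, reduce)); // { ruby: 2, rails: 1 }
```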
