Mongo Group Query using the Ruby driver

I've got a working Mongo query that I need to translate into Ruby:
var reducer = function(current, result){
    result.loginsCount++;
    result.lastLoginTs = Math.max(result.lastLoginTs, current.timeStamp);
}

var finalizer = function(result){
    result.lastLoginDate = new Date(result.lastLoginTs).toISOString().split('T')[0];
}

db.audit_log.group({
    key : {user : true},
    cond : {events : { $elemMatch : { action : 'LOGIN_SUCCESS'}}},
    initial : {lastLoginTs : -1, loginsCount : 0},
    reduce : reducer,
    finalize : finalizer
})
I'm hitting several sticking points getting this to work in Ruby. I'm not really all that familiar with Mongo, and I'm not sure what to pass as arguments to the method calls. This is my best guess, after connecting to the database and a collection called audit_log:
audit_log.group({
    "key" => {"user" => "true"},
    "cond" => {"events" => { "$elemMatch" => { "action" => "LOGIN_SUCCESS"}}},
    "initial" => {"lastLoginTs" => -1, "loginsCount" => 0},
    "reduce" => "function(current, result){ result.loginsCount += 1 }",
    "finalize" => "function(result){ result.lastLoginDate = new Date(result.lastLoginTs).toISOString().split('T')[0]; }"
})
Or something like that. I've also tried a simpler aggregate operation following the Mongo docs, but I couldn't get that working either; I was only able to get really simple queries to return results. Are those keys (key, cond, initial, etc.) even necessary, or are they only for JavaScript?

This is how the function finally took shape using the 1.10.0 Mongo gem:
db.collection("audit_log").group(
    [:user, :events],
    { 'events' => { '$elemMatch' => { 'action' => 'LOGIN_SUCCESS' } } },
    { 'lastLoginTs' => -1, 'loginsCount' => 0 },
    "function(current, result){ result.loginsCount++; result.lastLoginTs = Math.max(result.lastLoginTs, current.timeStamp); }",
    "function(result){ result.lastLoginDate = new Date(result.lastLoginTs).toISOString().split('T')[0]; }"
)
With the Ruby driver, you leave off the keys ("key", "cond", "initial", "reduce", "finalize") and simply pass the respective values in as positional arguments.
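In context, with connection setup, the call might look like this (a minimal sketch against the 1.10.x gem; the host, port, and database name are placeholders for your environment):

require 'mongo'

# Placeholder connection details; adjust to your environment.
client = Mongo::MongoClient.new('localhost', 27017)
db     = client.db('my_database')

results = db.collection('audit_log').group(
  [:user, :events],                                                     # key
  { 'events' => { '$elemMatch' => { 'action' => 'LOGIN_SUCCESS' } } },  # cond
  { 'lastLoginTs' => -1, 'loginsCount' => 0 },                          # initial
  "function(current, result){ result.loginsCount++; result.lastLoginTs = Math.max(result.lastLoginTs, current.timeStamp); }", # reduce
  "function(result){ result.lastLoginDate = new Date(result.lastLoginTs).toISOString().split('T')[0]; }"                      # finalize
)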
I've linked to two approaches taken by other SO users here and here.
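As an aside, since the question mentions struggling with the aggregation framework: the same grouping can be expressed as a pipeline. A minimal sketch, assuming MongoDB 2.2+ and the gem's Collection#aggregate (the date formatting from the finalize step would have to be done in Ruby afterwards):

# Hypothetical pipeline equivalent of the group() call above;
# field names are taken from the original query.
audit_log.aggregate([
  { '$match' => { 'events' => { '$elemMatch' => { 'action' => 'LOGIN_SUCCESS' } } } },
  { '$group' => {
    '_id'         => '$user',
    'loginsCount' => { '$sum' => 1 },           # one per matching document, like loginsCount++
    'lastLoginTs' => { '$max' => '$timeStamp' }
  } }
])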

Related

Can't select sub aggregation in Nest

I get these results in my Elastic query:
"Results" : {
  "doc_count_error_upper_bound" : 0,
  "sum_other_doc_count" : 0,
  "buckets" : [
    {
      "key" : "73c47133-8656-45e7-9499-14f52df07b70",
      "doc_count" : 1,
      "foo" : {
        "doc_count" : 40,
        "bar" : {
          "doc_count" : 1,
          "customscore" : {
            "value" : 10.496919917864476
          }
        }
      }
    }
  ]
}
I am trying to get a list of anonymous objects with the key field as the key and customscore field as the value.
No matter what I try, I can't seem to write code in Nest that accesses the customscore value. Apparently, I'm the very first person in the world to use nested Aggregations with the Nest library. Either that, or the documentation is very lacking. I can easily reach the Buckets:
response?.Aggregations.Terms("Results").Buckets;
But I have no idea what to do with this object. Buckets contains several objects, which I would assume I could navigate by doing this:
bucketObject["foo"]["bar"]["customscore"]
But apparently not. I have found solutions that use for loops, solutions with long Linq queries, and all of them seem to return null for me. What am I missing?
Assuming the following query, which I think would match the response in the question:
var client = new ElasticClient();

var response = client.Search<object>(s => s
    .Index("some_index")
    .Aggregations(a => a
        .Terms("Results", t => t
            .Field("some_field")
            .Aggregations(aa => aa
                .Filter("foo", f => f
                    .Filter(q => q.MatchAll())
                    .Aggregations(aaa => aaa
                        .Filter("bar", ff => ff
                            .Filter(q => q.MatchAll())
                            .Aggregations(aaaa => aaaa
                                .ValueCount("customscore", vc => vc
                                    .Field("some_other_field")
                                )
                            )
                        )
                    )
                )
            )
        )
    )
);
Getting a collection of anonymous types would be:
var kvs = response.Aggregations.Terms("Results").Buckets
    .Select(b => new
    {
        key = b.Key,
        value = b.Filter("foo").Filter("bar").ValueCount("customscore").Value
    });
.Aggregations exposes methods that convert the IAggregate responses to the expected concrete types.

Change ElasticSearch track_total_hits in NEST

I was running through some Elasticsearch examples and read this link, which says that there is a default limit of 10,000, and that it can be changed on search calls, like in this example:
GET twitter/_search
{
  "track_total_hits": 100,
  "query": {
    "match" : {
      "message" : "Elasticsearch"
    }
  }
}
The problem is, I'm trying to do the same in NEST, but I can't manage to replicate it. The only similar thing I found only accepts a Boolean value, not a number. Is it possible to change this limit through NEST?
Here is the code that I tried:
var results = elasticClient.Search<MyClass>(s => s
    .Query(q => q
        .QueryString(q2 => q2
            .Query(readLine)
            .Fields(f => f.Field(p => p.MyField))))
    .TrackTotalHits(true));
As stated by @russcam here, at the moment you can do it by casting the ISearchRequest to IRequest<SearchRequestParameters>:
var client = new ElasticClient();

var searchResponse = client.Search<Document>(s =>
{
    IRequest<SearchRequestParameters> request = s;
    request.RequestParameters.SetQueryString("track_total_hits", 1000);
    return s;
});
It will apply it as a query string parameter.

Add/modify text between parentheses

I'm trying to make a classified text, and I'm having trouble turning
(class1 (subclass1) (subclass2 item1 item2))
To
(class1 (subclass1 item1) (subclass2 item1 item2))
I have no idea how to turn the text above into the one below without caching subclass1 in memory. I'm using Perl on Linux, so any solution using shell script or Perl is welcome.
Edit: I've tried using grep, saving the whole subclass1 in a variable, then modifying it and exporting it to the list; but the list may get large, and that approach would use a lot of memory.
I have no idea how to turn the text above into the one below
The general approach:
Parse the text.
You appear to have lists of space-separated lists and atoms. If so, the result could look like the following:
{
    type  => 'list',
    value => [
        {
            type  => 'atom',
            value => 'class1',
        },
        {
            type  => 'list',
            value => [
                {
                    type  => 'atom',
                    value => 'subclass1',
                },
            ],
        },
        {
            type  => 'list',
            value => [
                {
                    type  => 'atom',
                    value => 'subclass2',
                },
                {
                    type  => 'atom',
                    value => 'item1',
                },
                {
                    type  => 'atom',
                    value => 'item2',
                },
            ],
        },
    ],
}
It's possible that something far simpler could be generated, but you were light on details about the format.
Extract the necessary information from the tree.
You were light on details about the data format, but it could be as simple as the following if the above data structure was created by the parser:
my $item = $tree->{value}[2]{value}[1]{value};
Perform the required modifications.
Again, you were light on details about the data format, but it could be as simple as the following if the above data structure was created by the parser:
my $new_atom = { type => 'atom', value => $item };
push @{ $tree->{value}[1]{value} }, $new_atom;
Serialize the data structure.
For the above data structure, you could use the following:
sub serialize {
    my ($node) = @_;
    return $node->{type} eq 'list'
        ? "(" . join(" ", map { serialize($_) } @{ $node->{value} }) . ")"
        : $node->{value};
}
Other approaches could be available depending on the specifics.

LINQ SelectMany equivalent in Scala

I have some quite simple .NET logic that I'm transplanting into a Scala codebase, and I don't really know the first thing about Scala. It includes a LINQ query that groups a collection of tagged objects, using an anonymous type projection to flatten and join, followed by grouping, e.g.:
var q = things.SelectMany(t => t.Tags, (t, tag) => new { Thing = t, Tag = tag })
    .GroupBy(x => x.Tag, x => x.Thing);
In Scala it looks like flatMap might be of use, but I can't figure out how to combine it with groupBy via an anonymous type.
Is this kind of thing a lot more complicated in Scala, or am I missing something simple?
UPDATE:
I ended up going with:
things.flatMap(t => t.Tags.map(x => (x,t))).groupBy(x => x._1)
and then of course later on when I access a value in the map I need to do:
.map(x => x._2)
to get the groups out of the tuple.
Simple when you know how!
Seems to me you want to do something like:
case class Tag(tag: String)
case class Thing(Tags: Seq[Tag])

val things: Seq[Thing] = Seq(Thing(Seq(Tag(""))))

val q = things.map { thing =>
    new {
        val Thing = thing
        val Tags  = thing.Tags
    }
}.flatMap { thingAndTags =>
    thingAndTags.Tags.map { tag =>
        new {
            val Thing = thingAndTags.Thing
            val Tag   = tag
        }
    }
}.groupBy { thingAndTag =>
    thingAndTag.Tag
}.map { tagAndSeqOfThingAndTags =>
    tagAndSeqOfThingAndTags._1 -> tagAndSeqOfThingAndTags._2.map(x => x.Thing)
}
But anonymous objects are not really common in Scala; you can use Tuple2[T1,T2] instead of all the new { val ... }s:
val q = things.map { thing =>
    thing -> thing.Tags
}.flatMap { thingAndTags =>
    thingAndTags._2.map { tag =>
        (thingAndTags._1, tag)
    }
}.groupBy { thingAndTag =>
    thingAndTag._2
}.map { tagAndSeqOfThingAndTags =>
    tagAndSeqOfThingAndTags._1 -> tagAndSeqOfThingAndTags._2.map(x => x._1)
}
It's just a little confusing with all the ._1s and ._2s.

Upsert Multiple Records with MongoDb

I'm trying to get MongoDB to upsert multiple records with the following query, ultimately using MongoMapper and the Mongo ruby driver.
db.foo.update({event_id: { $in: [1,2]}}, {$inc: {visit:1}}, true, true)
This works fine if all the records exist, but does not create new records for records that do not exist. The following command has the desired effect from the shell, but is probably not ideal from the ruby driver.
[1,2].forEach(function(id) {db.foo.update({event_id: id}, {$inc: {visit:1}}, true, true) });
I could loop through each id I want to insert from within ruby, but that would necessitate a trip to the database for each item. Is there a way to upsert multiple items from the ruby driver with only a single trip to the database? What's the best practice here? Using mongomapper and the ruby driver, is there a way to send multiple updates in a single batch, generating something like the following?
db.foo.update({event_id: 1}, {$inc: {visit:1}}, true); db.foo.update({event_id: 2}, {$inc: {visit:1}}, true);
Sample Data:
Desired data after command if two records exist.
{ "_id" : ObjectId("4d6babbac0d8bb8238d02099"), "event_id" : 1, "visit" : 11 }
{ "_id" : ObjectId("4d6baf56c0d8bb8238d0209a"), "event_id" : 2, "visit" : 2 }
Actual data after command if two records exist.
{ "_id" : ObjectId("4d6babbac0d8bb8238d02099"), "event_id" : 1, "visit" : 11 }
{ "_id" : ObjectId("4d6baf56c0d8bb8238d0209a"), "event_id" : 2, "visit" : 2 }
Desired data after command if only the record with event_id 1 exists.
{ "_id" : ObjectId("4d6babbac0d8bb8238d02099"), "event_id" : 1, "visit" : 2 }
{ "_id" : ObjectId("4d6baf56c0d8bb8238d0209a"), "event_id" : 2, "visit" : 1 }
Actual data after command if only the record with event_id 1 exists.
{ "_id" : ObjectId("4d6babbac0d8bb8238d02099"), "event_id" : 1, "visit" : 2 }
This, correctly, will not insert any records with event_id 1 or 2 if they do not already exist:
db.foo.update({event_id: { $in: [1,2]}}, {$inc: {visit:1}}, true, true)
This is because the objNew part of the query (see http://www.mongodb.org/display/DOCS/Updating#Updating-UpsertswithModifiers) does not have a value for the event_id field. As a result, you will need at least X+1 trips to the database, where X is the number of event_ids, to ensure that you insert a record if one does not exist for a particular event_id (the +1 comes from the query above, which increments the visit counter for existing records). To put it another way: how would MongoDB know you want to use the value 2 for event_id, and not 1? And why not 6?
Regarding batch insertion with Ruby, I think it is possible, as the following link suggests (although I've only used the Java driver): Batch insert/update using Mongoid?
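For reference, the question's shell loop translated to the 1.x Ruby driver would look roughly like this (a sketch; still one update, and therefore one round trip, per id):

# One upserting update per event_id.
[1, 2].each do |id|
  db.collection('foo').update(
    { 'event_id' => id },
    { '$inc' => { 'visit' => 1 } },
    :upsert => true
  )
end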
What you are after is the Find and Modify command with the upsert option set to true. See the example from the Mongo test suite (same one linked to in the Find and Modify docs) for an example that looks very much like what you describe in your question.
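A minimal sketch of that with the 1.x Ruby driver (field names taken from the question; note this still issues one findAndModify command per event_id, so it does not reduce round trips):

[1, 2].each do |id|
  db.collection('foo').find_and_modify(
    :query  => { 'event_id' => id },
    :update => { '$inc' => { 'visit' => 1 } },
    :upsert => true
  )
end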
I found a way to do this using the eval operator for server-side code execution. Here is the code snippet:
def batchpush(body, item_opts = {})
  @batch << {
    :body          => body,
    :duplicate_key => item_opts[:duplicate_key] || Mongo::Dequeue.generate_duplicate_key(body),
    :priority      => item_opts[:priority] || @config[:default_priority]
  }
end

def batchprocess()
  js = %Q|
    function(batch) {
      var nowutc = new Date();
      var ret = [];
      for(i in batch){
        e = batch[i];
        //ret.push(e);
        var query = {
          'duplicate_key': e.duplicate_key,
          'complete': false,
          'locked_at': null
        };
        var object = {
          '$set': {
            'body': e.body,
            'inserted_at': nowutc,
            'complete': false,
            'locked_till': null,
            'completed_at': null,
            'priority': e.priority,
            'duplicate_key': e.duplicate_key,
            'completecount': 0
          },
          '$inc': {'count': 1}
        };
        db.#{collection.name}.update(query, object, true);
      }
      return ret;
    }
  |
  cmd = BSON::OrderedHash.new
  cmd['$eval']  = js
  cmd['args']   = [@batch]
  cmd['nolock'] = true
  result = collection.db.command(cmd)
  @batch.clear
  # pp result
end
Multiple items are added with batchpush(), and then batchprocess() is called. The data is sent as an array, and the commands are all executed server-side. This code is used in the MongoDequeue gem, in this file.
Only one request is made, and all the upserts happen server-side.
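Worth noting for readers on newer drivers: the 2.x mongo gem exposes a bulk write API that covers exactly this use case in a single round trip. A minimal sketch, assuming a local server and a database named 'test':

require 'mongo'

client = Mongo::Client.new(['localhost:27017'], :database => 'test')

# Build one upserting update_one per event_id, then send them as a single batch.
ops = [1, 2].map do |id|
  { :update_one => {
      :filter => { :event_id => id },
      :update => { '$inc' => { :visit => 1 } },
      :upsert => true
  } }
end

client[:foo].bulk_write(ops, :ordered => false)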
