How do you calculate the average of all entries in a field from a specific collection in Mongo using Ruby

Given the following data:
{
_id: ObjectId("51659dc99d62eedc1a000001"),
type: "image_search",
branch: "qa_media_discovery_feelobot",
time_elapsed: 19000,
test: "1365613930 All Media",
search_term: null,
env: "delta",
date: ISODate("2013-04-10T17:13:45.751Z")
}
I would like to run a command like:
avg_image_search_time = #coll.find("type" => "image_search").avg(:time_elapsed)
How would I accomplish this?

I understand the documentation on this is kind of difficult to follow.
avg_image_search_time = #coll.aggregate([
  { "$group" => { "_id" => "$type", "avg" => { "$avg" => "$time_elapsed" } } },
  { "$match" => { "_id" => "image_search" } }
]).first['avg']
To break this down:
We are grouping the matching documents by the type field and returning the $avg of time_elapsed for each type, naming the resulting average avg. Then, of those groups, we keep only the ones whose group _id matches image_search. Finally, since aggregate returns an array of results, we take the first one (there should be only one) and grab the avg field that we named.
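If it helps, here is a minimal, self-contained sketch of the same idea using the current mongo Ruby driver; the host, database name, and collection name are assumptions for illustration. Matching before grouping lets MongoDB discard non-matching documents before the $group stage.

require 'mongo'

# Assumed connection details; adjust host, database, and collection names.
client = Mongo::Client.new(['127.0.0.1:27017'], database: 'metrics')
coll   = client[:events]

pipeline = [
  { '$match' => { 'type' => 'image_search' } },
  { '$group' => { '_id' => '$type', 'avg' => { '$avg' => '$time_elapsed' } } }
]

result = coll.aggregate(pipeline).first
avg_image_search_time = result && result['avg']
puts avg_image_search_time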

Use the MongoDB aggregation framework: http://docs.mongodb.org/manual/core/aggregation/

Related

ElasticSearch Chaining queries based on result from first query

I have some data and I am looking to implement a search feature that probably requires chaining multiple queries. For example, there are a few people who are part of a group, but each member is stored as a separate document in the database. None of the data is nested.
For example
data = [
  {
    id: '1',
    name: 'abc',
    familyId: '3'
  },
  {
    id: '2',
    name: 'def',
    familyId: '3'
  },
  {
    id: '3',
    name: 'ghi',
    familyId: null
  },
]
So now I am trying to implement a search feature where people can search by name, and if the name matches I want to show that result along with their family members. Each person is a separate document, and there is no connection between them apart from the familyId.
Currently my solution is to search by name first, check whether a familyId is present in the result, and if so make another ES query to get all the members and then show the combined result.
Is there a way I could make it one query that gives me the desired output?
Any suggestion is very much appreciated.
There's no native Elasticsearch functionality for this, unfortunately; your approach is the current best way to do it.
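As a rough sketch of that two-step approach with the elasticsearch Ruby gem (the index name people, the connection URL, and the field handling are assumptions; whether to use match or term on familyId depends on your mapping):

require 'elasticsearch'

client = Elasticsearch::Client.new(url: 'http://localhost:9200')

# Step 1: search by name in the hypothetical 'people' index.
first_resp = client.search(
  index: 'people',
  body: { query: { match: { name: 'abc' } } }
)
hit = first_resp['hits']['hits'].first

results =
  if hit && hit['_source']['familyId']
    # Step 2: fetch every member sharing the same familyId.
    family_resp = client.search(
      index: 'people',
      body: { query: { match: { familyId: hit['_source']['familyId'] } } }
    )
    family_resp['hits']['hits'].map { |h| h['_source'] }
  else
    [hit].compact.map { |h| h['_source'] }
  end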

FaunaDB search document and get its ranking based on a score

I have the following Collection of documents with structure:
type Streak struct {
UserID string `fauna:"user_id"`
Username string `fauna:"username"`
Count int `fauna:"count"`
UpdatedAt time.Time `fauna:"updated_at"`
CreatedAt time.Time `fauna:"created_at"`
}
This looks like the following in FaunaDB Collections:
{
"ref": Ref(Collection("streaks"), "288597420809388544"),
"ts": 1611486798180000,
"data": {
"count": 1,
"updated_at": Time("2021-01-24T11:13:17.859483176Z"),
"user_id": "276989300",
"username": "yodanparry"
}
}
Basically I need a lambda or a function that takes in a user_id and spits out its rank within the collection. Rank is simply the document's position when sorted by the count field. For example, let's say I have the following documents (other fields omitted for simplicity):
user_id | count
--------|------
abc     | 12
xyz     | 10
fgh     | 999
If I throw in fgh as an input for this lambda function, I want it to spit out 1 (or 0 if you start counting from 0).
I already have an index for user_id, so I can query and match a document reference from this index. I also have an index sorted_count that sorts documents by the count field in ascending order.
My current solution is to query all documents via the sorted_count index and then compute the rank by iterating through the result array. I think there should be a better solution for this; I'm just not seeing it.
Please help. Thank you!
Counting things in Fauna isn't as easy as one might expect. But you might still be able to do something more efficient than you describe.
Assuming you have:
CreateIndex(
{
name: "sorted_count",
source: Collection("streaks"),
values: [
{ field: ["data", "count"] }
]
}
)
Then you can query this index like so:
Count(
Paginate(
Match(Index("sorted_count")),
{ after: 10, size: 100000 }
)
)
Which will return an object like this one:
{
before: [10],
data: [123]
}
Which tells you that there are 123 documents with count >= 10, which I think is what you want.
This means that, in order to get a user's rank based on their user_id, you'll need to implement this two-step process:
1. Determine the count of the user in question using your index on user_id.
2. Query sorted_count using the user's count as described above.
Note that, in case your collection has more than 100,000 documents, you'll need your Go code to iterate through all the pages based on the returned object's after field. 100,000 is Fauna's maximum allowed page size. See the Fauna docs on pagination for details.
Also note that this might not reflect whatever your desired logic is for resolving ties.

Restrict a find query MongoDB with ruby driver

I have a mongo query that looks like this:
coll.find({ foo: bar }, {sort: { '$natural' => -1 }, limit: 1})
My problem is that my collection contains a huge number of documents and the find is really slow.
So I was wondering if I could restrict the query to the last x inserted documents?
Something like this :
coll.find({ foo: bar }, {sort: { '$natural' => -1 }, limit: 1, max_doc: 10_000})
Thanks and sorry for my english.
You can't apply that restriction at query time. But you could keep a separate capped collection for this purpose.
You insert into both collections and run this query against the capped one, which will only retain the last N documents.
But this only works if you don't need to update or remove documents from that collection.
http://docs.mongodb.org/manual/core/capped-collections
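A minimal sketch with the current mongo Ruby driver, assuming you can afford to double-write; the collection names, cap size, and 10,000-document max are illustrative only:

require 'mongo'

client = Mongo::Client.new(['127.0.0.1:27017'], database: 'mydb')

# Create the capped collection once; it keeps only the most recent documents.
recent = client[:events_recent, capped: true, size: 10 * 1024 * 1024, max: 10_000]
recent.create unless client.database.collection_names.include?('events_recent')

# Write to both collections; query the capped one for "recent only" lookups.
doc = { foo: 'bar', created_at: Time.now }
client[:events].insert_one(doc)
recent.insert_one(doc)

last_match = recent.find(foo: 'bar').sort('$natural' => -1).limit(1).first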

ElasticSearch Aggregations: subtracting aggregations based upon match

Using a simple albeit somewhat artificial example, let's say that I have several inventory docs stored in ElasticSearch where every document represents either the purchase or the sale of an item:
[
{item_id: "foobar", type: "cost", value: 12.34, timestamp:149382734621},
{item_id: "bizbaz", type: "sale", value: 45.12, timestamp:149383464621},
{item_id: "foobar", type: "sale", value: 32.74, timestamp:149384824621},
{item_id: "foobar", type: "cost", value: 12.34, timestamp:149387435621},
{item_id: "bizbaz", type: "sale", value: 45.12, timestamp:149388434621},
{item_id: "bizbaz", type: "cost", value: 41.23, timestamp:149389424621},
{item_id: "foobar", type: "sale", value: 32.74, timestamp:149389914621},
{item_id: "waahoo", type: "sale", value: 11.23, timestamp:149389914621},
...
]
And for a specified time range I want to calculate the current profit for each item. So for example I would want to return:
foobar_profit = sum(value of all documents item_id="foobar" and type="sale")
-sum(value of all documents item_id="foobar" and type="cost")
bizbaz_profit = sum(value of all documents item_id="bizbaz" and type="sale")
-sum(value of all documents item_id="bizbaz" and type="cost")
...
There are two aspects that I don't yet understand how to achieve.
1. I know how to aggregate over terms, so this would allow me to sum the value of all "foobar" items regardless of type. But I don't know how to sum over all documents that match on two fields. For instance, I want to aggregate the above data set on the compound key (item_id, type). The dataset above would then yield the aggregations:
(foobar,cost)->24.68
(foobar,sale)->65.48
(bizbaz,cost)->41.23
(bizbaz,sale)->90.24
(waahoo,sale)->11.23
2. Presuming I can do #1, I will have aggregations like foobar_cost and foobar_sale. But I don't know how to combine two aggregations so that in this case foobar_profit = foobar_sale - foobar_cost. So the above aggregations would become:
foobar_profit->40.8
bizbaz_profit->49.01
waahoo_profit->11.23
Some final notes:
In the example above, I only list 3 item_ids, but consider that there will be thousands of item_ids, so I can't do special-case queries per item_id.
Also, for a particular item, the cost and sale items will come in at different times, so we can't put the cost and sale price in the same document and diff the fields.
I can send back all the data and do the last step of the aggregations client side, but this might be a ton of data. Really, I need to do it on server side if possible so that I can sort the results by profit and return the top N.
You can just use nested aggregations. See here for a working example: https://gist.github.com/mattweber/71033b1bf2ebed1afd8e
I use a MatchAll Query in this example but you can replace that with a RangeQuery or whatever you need.
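In case the gist link goes stale, here is a sketch of one way to express this with the elasticsearch Ruby gem, using filter sub-aggregations for sale/cost plus a bucket_script pipeline aggregation for the difference; it assumes an index called inventory and that item_id and type are keyword fields, and swaps in match_all where you would put your range query on timestamp:

require 'elasticsearch'

client = Elasticsearch::Client.new(url: 'http://localhost:9200')

response = client.search(
  index: 'inventory',
  body: {
    size: 0,
    query: { match_all: {} },  # replace with a range query on timestamp if needed
    aggs: {
      by_item: {
        terms: { field: 'item_id', size: 10_000 },
        aggs: {
          sales:  { filter: { term: { type: 'sale' } },
                    aggs: { total: { sum: { field: 'value' } } } },
          costs:  { filter: { term: { type: 'cost' } },
                    aggs: { total: { sum: { field: 'value' } } } },
          profit: { bucket_script: {
                      buckets_path: { s: 'sales>total', c: 'costs>total' },
                      script: 'params.s - params.c' } },
          # sort the item buckets by the computed profit and keep the top 10
          top_by_profit: { bucket_sort: { sort: [{ profit: { order: 'desc' } }], size: 10 } }
        }
      }
    }
  }
)

response['aggregations']['by_item']['buckets'].each do |b|
  puts "#{b['key']}: #{b['profit']['value']}"
end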

Getting the highest value of a column in MongoDB

I've been looking for some help on getting the highest value of a field in a MongoDB collection. I can sort it and get the top/bottom, but I'm pretty sure there is a better way to do it.
I tried the following (and different combinations):
transactions.find("id" => x).max({"sellprice" => 0})
But it keeps throwing errors. What's a good way to do it besides sorting and getting the top/bottom?
Thank you!
In Mongo, max() does not work the way you would expect it to from SQL. This may change in future versions, but as of now max and min are meant to be used with indexed keys, primarily internally for sharding.
see http://www.mongodb.org/display/DOCS/min+and+max+Query+Specifiers
Unfortunately for now the only way to get the max value is to sort the collection desc on that value and take the first.
transactions.find("id" => x).sort({"sellprice" => -1}).limit(1).first()
Sorting might be overkill. You can just do a group by
db.messages.group(
{key: { created_at:true },
cond: { active:1 },
reduce: function(obj,prev) { if(prev.cmax<obj.created_at) prev.cmax = obj.created_at; },
initial: { cmax: **any one value** }
});
db.collectionName.aggregate([
  { $group: { _id: "", last: { $max: "$sellprice" } } }
])
Example mongodb shell code for computing aggregates.
See the MongoDB manual entry for $group (it has many applications): http://docs.mongodb.org/manual/reference/aggregation/group/#stage._S_group
In the below, replace the $vars with your collection key and target variable.
db.activity.aggregate(
{ $group : {
_id:"$your_collection_key",
min: {$min : "$your_target_variable"},
max: {$max : "$your_target_variable"}
}
}
)
Use aggregate():
db.transactions.aggregate([
{$match: {id: x}},
{$sort: {sellprice:-1}},
{$limit: 1},
{$project: {sellprice: 1}}
]);
It will work as per your requirement.
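For completeness, roughly the same pipeline through the Ruby driver (the connection details and the example id value are assumptions):

require 'mongo'

client = Mongo::Client.new(['127.0.0.1:27017'], database: 'mydb')
transactions = client[:transactions]

x = 42  # example id value
doc = transactions.aggregate([
  { '$match'   => { 'id' => x } },
  { '$sort'    => { 'sellprice' => -1 } },
  { '$limit'   => 1 },
  { '$project' => { 'sellprice' => 1 } }
]).first

max_sellprice = doc && doc['sellprice']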
If the column's indexed then a sort should be OK, assuming Mongo just uses the index to get an ordered collection. Otherwise it's more efficient to iterate over the collection, keeping note of the largest value seen. e.g.
max = nil
coll.find("id" => x).each do |doc|
if max == nil or doc['sellprice'] > max then
max = doc['sellprice']
end
end
(Apologies if my Ruby's a bit ropey, I haven't used it for a long time - but the general approach should be clear from the code.)
Assuming I was using the Ruby driver (I saw a mongodb-ruby tag on the bottom), I'd do something like the following if I wanted to get the maximum _id (assuming my _id is sortable). In my implementation, my _id was an integer.
result = my_collection.find({}, :sort => ['_id', :desc]).limit(1)
To get the minimum _id in the collection, just change :desc to :asc
The following query does the same thing:
db.student.find({}, {'_id':1}).sort({_id:-1}).limit(1)
For me, this produced following result:
{ "_id" : NumberLong(10934) }
