Sorting values in CouchDB

I'm trying to pull scores back from this view:
"scores": {
"map": "function(doc) { emit(doc.appName, {_id:doc.username, username:doc.username, credits:doc.credits, avatar:doc.avatar}) }"
},
I pass in the appName and it returns the scores. The issue is that the scores won't sort correctly. When I try to sort them, they only sort by the first digit, like so:
1500
50
7
900
As you can see, the first digits are sorted ascending, but the numbers themselves aren't. Is it possible to have CouchDB sort the scores if the appName is the key?

Is doc.appName a string? Turn it into a number:
function (doc) {
  emit(parseInt(doc.appName, 10), {_id: doc.username, username: doc.username, credits: doc.credits, avatar: doc.avatar});
  //   ^^^^^^^^
}

Use a complex key:
emit([doc.appName, doc.score], null)
Then query using a range:
startkey=["app1", 0]&endkey=["app1", {}]

Related

FaunaDB search document and get its ranking based on a score

I have the following Collection of documents with structure:
type Streak struct {
    UserID    string    `fauna:"user_id"`
    Username  string    `fauna:"username"`
    Count     int       `fauna:"count"`
    UpdatedAt time.Time `fauna:"updated_at"`
    CreatedAt time.Time `fauna:"created_at"`
}
This looks like the following in FaunaDB Collections:
{
  "ref": Ref(Collection("streaks"), "288597420809388544"),
  "ts": 1611486798180000,
  "data": {
    "count": 1,
    "updated_at": Time("2021-01-24T11:13:17.859483176Z"),
    "user_id": "276989300",
    "username": "yodanparry"
  }
}
Basically I need a lambda or a function that takes in a user_id and spits out its rank within the collection. Rank is simply determined by the count field. For example, say I have the following documents (other fields omitted for simplicity):
user_id | count
--------|------
abc     | 12
xyz     | 10
fgh     | 999
If I throw in fgh as an input for this lambda function, I want it to spit out 1 (or 0 if you start counting from 0).
I already have an index for user_id, so I can match a document reference from that index. I also have an index sorted_count that sorts documents by the count field in ascending order.
My current solution is to query all documents via the sorted_count index, then find the rank by iterating through the array. I think there should be a better solution than this; I'm just not seeing it.
Please help. Thank you!
Counting things in Fauna isn't as easy as one might expect. But you might still be able to do something more efficient than you describe.
Assuming you have:
CreateIndex({
  name: "sorted_count",
  source: Collection("streaks"),
  values: [
    { field: ["data", "count"] }
  ]
})
Then you can query this index like so:
Count(
  Paginate(
    Match(Index("sorted_count")),
    { after: 10, size: 100000 }
  )
)
Which will return an object like this one:
{
  before: [10],
  data: [123]
}
Which tells you that there are 123 documents with count >= 10, which I think is what you want.
This means that, in order to get a user's rank based on their user_id, you'll need to implement this two-step process:
1. Determine the count of the user in question using your index on user_id.
2. Query sorted_count using the user's count, as described above.
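Combined into one query, that could look roughly like this (a sketch; the user_by_id index name is an assumption, and ties are counted inclusively):
Let(
  {
    // Step 1: look up the user's count (index name assumed)
    count: Select(
      ["data", "count"],
      Get(Match(Index("user_by_id"), "fgh"))
    )
  },
  // Step 2: count the indexed values >= that count
  Count(
    Paginate(
      Match(Index("sorted_count")),
      { after: Var("count"), size: 100000 }
    )
  )
)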
Note that if your collection has more than 100,000 documents, your Go code will need to iterate through all the pages using the returned object's after field; 100,000 is Fauna's maximum allowed page size. See the Fauna docs on pagination for details.
Also note that this might not reflect whatever your desired logic is for resolving ties.

Elasticsearch calculate Max with cutoff

It's a strange requirement.
We need to calculate a MAX value over our dataset; however, some of our data is bad, meaning the raw MAX produces an undesired outcome.
say the values in field "myField" are:
INPUT:
10 30 20 40 1000000
CURRENT OUTPUT:
1000000
DESIRED OUTPUT:
40
{"aggs": {
"aggs": {
"maximum": {
"max": {
"field": "myField"
}
}
}
}
}
I thought of sorting the data, but that would be really slow since the actual data runs to 100K+ documents.
So my question: is there a way to cut off data in aggs so that it ignores the actual MAX and returns the second max? Alternatively, can it ignore, say, the top 10% and return the max of the rest?
Have you thought of using percentiles to eliminate outliers? Maybe run a percentiles aggregation first and then use its result as the bound for a range filter.
The requirement seems a bit blurry to me, so this is just another try to help, not sure if this is what you are after.
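For illustration, that two-step idea might look like this (a sketch; the 90th percentile is an arbitrary cutoff). First, fetch the percentile:
{
  "size": 0,
  "aggs": {
    "cutoff": {
      "percentiles": {
        "field": "myField",
        "percents": [90]
      }
    }
  }
}
Then feed the returned value into a range-filtered max (40 stands in for whatever cutoff the first query actually returns):
{
  "size": 0,
  "query": {
    "range": { "myField": { "lte": 40 } }
  },
  "aggs": {
    "maximum": {
      "max": { "field": "myField" }
    }
  }
}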

MongoDB dynamic ranking

I use MongoDB and have a collection with about 100000 entries.
The entries contain data like that:
{"page": "page1", "user_count": 1400}
{"page": "page2", "user_count": 1100}
{"page": "page3", "user_count": 900}
...
I want to output a ranking of the entries according to the user_count like:
#1 - page1
#2 - page2
#3 - page3
...
...so far so good. I can simply use a loop counter if I just output a sorted list.
But I also have to support various search queries. So for example I get 20 results and want to show on which rank the results are. Like:
#432 - page1232
#32 - page223
#345 - page332
...
What's the best way to do that? I don't really want to store the ranking in the collection since the collection constantly changes. I tried to solve it with a lookup dictionary I have built on the fly but it was really slow. Does MongoDB have any special functionality for such cases that could help?
There's no single command that you can use to do this, but you can do it with count:
var doc = db.pages.findOne(); // Or however you get your document
var n = db.pages.find({user_count : {$gt : doc.user_count}}).count(); // This is the number of documents with a higher user_count
var ranking = n+1; // Your doc is next in a ranking
A separate question is whether you should do this. Consider the following:
You'll need an index on user_count. You may already have this.
You'll need to perform a count query for each record you are displaying. There's no way to batch these up.
Given this, you may impact your performance more than if you stored the ranking in the collection, depending on the CRUD profile of your application; it's up to you to decide which option is best.
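Applied to the question's case of 20 search results, that looks roughly like this in the shell (a sketch; the find filter stands in for whatever your actual search query is):
// One count query per displayed result, as noted above.
db.pages.find({ page: /^page/ }).limit(20).forEach(function (doc) {
  var rank = db.pages.find({ user_count: { $gt: doc.user_count } }).count() + 1;
  print("#" + rank + " - " + doc.page);
});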
There's no simple approach to solve this problem with MongoDB.
If it is possible, I would advise you to look at Redis and its Sorted Sets. As the documentation says:
With Sorted Sets you can: Take a leader board in a massive online game, where every time a new score is submitted you update it using ZADD. You can easily take the top users using ZRANGE, you can also, given an user name, return its rank in the listing using ZRANK. Using ZRANK and ZRANGE together you can show users with a score similar to a given user. All very quickly.
You can easily fetch ranks for arbitrary pages using a MULTI/EXEC block. So I think this is the best approach for your task, and it will be much faster than using MapReduce or re-ranking with MongoDB.
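As a quick illustration in redis-cli (the pages key name is an assumption; ZREVRANK ranks by descending score, starting at 0):
ZADD pages 1400 page1 1100 page2 900 page3
ZREVRANK pages page2
(integer) 1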
Starting in Mongo 5, it's a perfect use case for the new $setWindowFields aggregation operator:
// { page: "page1", user_count: 1400 }
// { page: "page2", user_count: 1100 }
// { page: "page3", user_count: 900 }
db.test.aggregate([
{ $setWindowFields: {
sortBy: { user_count: -1 },
output: { rank: { $rank: {} } }
}},
// { page: "page1", user_count: 1400, rank: 1 }
// { page: "page2", user_count: 1100, rank: 2 }
// { page: "page3", user_count: 900, rank: 3 }
{ $match: { page: "page2" } }
])
// { page: "page2", user_count: 1100, rank: 2 }
The $setWindowFields stage adds the global rank by:
sorting documents by decreasing order of user_count: sortBy: { user_count: -1 }
and adding the rank field in each document (output: { rank: { $rank: {} } })
which is the rank of the document amongst all documents based on the sorting field user_count: rank: { $rank: {} }.
The $match stage is there to simulate your filtering requirement.

How do you calculate the average of all entries in a field from a specific collection in Mongo using Ruby

Given the following data:
{
  _id: ObjectId("51659dc99d62eedc1a000001"),
  type: "image_search",
  branch: "qa_media_discovery_feelobot",
  time_elapsed: 19000,
  test: "1365613930 All Media",
  search_term: null,
  env: "delta",
  date: ISODate("2013-04-10T17:13:45.751Z")
}
I would like to run a command like:
avg_image_search_time = @coll.find("type" => "image_search").avg(:time_elapsed)
How would I accomplish this?
I understand the documentation on this is kind of difficult to follow.
avg_image_search_time = @coll.aggregate([
  { "$group" => { "_id" => "$type", "avg" => { "$avg" => "$time_elapsed" } } },
  { "$match" => { "_id" => "image_search" } }
]).first['avg']
To break this down:
We group the documents by the type field and return the $avg of time_elapsed for each type, naming the resulting average avg. Then, of those groups, we keep only the one whose group _id matches image_search. Finally, since aggregate always returns an array, we take the first result (there should only be one) and grab the avg field we named.
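A variant that filters before grouping avoids computing averages for every type; a minimal sketch, assuming the same collection and field names:
avg_image_search_time = @coll.aggregate([
  { "$match" => { "type" => "image_search" } },
  { "$group" => { "_id" => nil, "avg" => { "$avg" => "$time_elapsed" } } }
]).first['avg']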
Use the mongodb aggregation framework http://docs.mongodb.org/manual/core/aggregation/

Getting the highest value of a column in MongoDB

I've been looking for some help on getting the highest value in a column of a Mongo collection. I can sort it and take the top/bottom, but I'm pretty sure there is a better way to do it.
I tried the following (and different combinations):
transactions.find("id" => x).max({"sellprice" => 0})
But it keeps throwing errors. What's a good way to do it besides sorting and getting the top/bottom?
Thank you!
max() in Mongo does not work the way you would expect it to from SQL. This may change in future versions, but as of now max and min are to be used with indexed keys, primarily internally for sharding.
see http://www.mongodb.org/display/DOCS/min+and+max+Query+Specifiers
Unfortunately for now the only way to get the max value is to sort the collection desc on that value and take the first.
transactions.find("id" => x).sort({"sellprice" => -1}).limit(1).first()
Sorting might be overkill. You can just do a group by (note that db.collection.group() was removed in MongoDB 4.2; on modern servers use the aggregate() examples below):
db.messages.group({
  key: { created_at: true },
  cond: { active: 1 },
  reduce: function (obj, prev) { if (prev.cmax < obj.created_at) prev.cmax = obj.created_at; },
  initial: { cmax: 0 } // seed with any value no larger than the real minimum
});
db.collectionName.aggregate([
  {
    $group: {
      _id: "",
      last: { $max: "$sellprice" }
    }
  }
])
Example mongodb shell code for computing aggregates.
see mongodb manual entry for group (many applications) :: http://docs.mongodb.org/manual/reference/aggregation/group/#stage._S_group
In the below, replace the $vars with your collection key and target variable.
db.activity.aggregate([
  {
    $group: {
      _id: "$your_collection_key",
      min: { $min: "$your_target_variable" },
      max: { $max: "$your_target_variable" }
    }
  }
])
Use aggregate():
db.transactions.aggregate([
{$match: {id: x}},
{$sort: {sellprice:-1}},
{$limit: 1},
{$project: {sellprice: 1}}
]);
It will work as per your requirement.
If the column is indexed then a sort should be OK, assuming Mongo just uses the index to get an ordered collection. Otherwise it's more efficient to iterate over the collection, keeping track of the largest value seen, e.g.:
max = nil
coll.find("id" => x).each do |doc|
  max = doc['sellprice'] if max.nil? || doc['sellprice'] > max
end
(Apologies if my Ruby's a bit ropey, I haven't used it for a long time - but the general approach should be clear from the code.)
Assuming I was using the Ruby driver (I saw a mongodb-ruby tag on the bottom), I'd do something like the following if I wanted to get the maximum _id (assuming my _id is sortable). In my implementation, my _id was an integer.
result = my_collection.find({}, :sort => ['_id', :desc]).limit(1)
To get the minimum _id in the collection, just change :desc to :asc
The following query does the same thing:
db.student.find({}, {'_id':1}).sort({_id:-1}).limit(1)
For me, this produced following result:
{ "_id" : NumberLong(10934) }
