CouchDB Views: remove duplicates *and* order by time - view

Based on a great answer to my previous question, I've partially solved a problem I'm having with CouchDB.
This resulted in a new view.
Now, the next thing I need to do is remove duplicates from this view while ordering by date.
For example, here is how I might query that view:
GET http://scoates-test.couchone.com/follow/_design/asset/_view/by_userid_following?endkey=[%22c988a29740241c7d20fc7974be05ec54%22]&startkey=[%22c988a29740241c7d20fc7974be05ec54%22,{}]&descending=true&limit=3
Resulting in this:
HTTP 200 http://scoates-test.couchone.com/follow/_design/asset/_view/by_userid_following
http://scoates-test.couchone.com > $_.json.rows
[ { id: 'c988a29740241c7d20fc7974be067295'
, key:
[ 'c988a29740241c7d20fc7974be05ec54'
, '2010-11-26T17:00:00.000Z'
, 'clementine'
]
, value:
{ _id: 'c988a29740241c7d20fc7974be062ee8'
, owner: 'c988a29740241c7d20fc7974be05f67d'
}
}
, { id: 'c988a29740241c7d20fc7974be068278'
, key:
[ 'c988a29740241c7d20fc7974be05ec54'
, '2010-11-26T15:00:00.000Z'
, 'durian'
]
, value:
{ _id: 'c988a29740241c7d20fc7974be065115'
, owner: 'c988a29740241c7d20fc7974be060bb4'
}
}
, { id: 'c988a29740241c7d20fc7974be068026'
, key:
[ 'c988a29740241c7d20fc7974be05ec54'
, '2010-11-26T14:00:00.000Z'
, 'clementine'
]
, value:
{ _id: 'c988a29740241c7d20fc7974be063b6d'
, owner: 'c988a29740241c7d20fc7974be05ff71'
}
}
]
As you can see, "clementine" shows up twice.
If I change the view to emit the fruit/asset name as the second key (instead of the time), I can change the grouping depth to collapse these, but that doesn't solve my order-by-time requirement. Similarly, with the above setup, I can order by time, but I can't collapse duplicate asset names into single rows (to allow e.g. 10 assets per page).
Unfortunately, this is not a simple question to explain. Maybe this chat transcript will help a little.
Please help. I'm afraid that what I need to do is still not possible.
S

You can do this using a list function. Here is an example that generates a very simple list of all the owner fields without dupes. You can easily modify it to produce JSON, XML, or anything else you want.
Put it into your assets design doc under lists.nodupes and use it like this:
http://admin:123@127.0.0.1:5984/follow/_design/assets/_list/nodupes/by_userid_following_reduce?group=true
function(head, req) {
  start({
    "headers": {
      "Content-Type": "text/html"
    }
  });
  var row;
  var dupes = []; // values of key[2] that have already been sent
  while (row = getRow()) {
    // only send the first row seen for each distinct key[2]
    if (dupes.indexOf(row.key[2]) == -1) {
      dupes.push(row.key[2]);
      send(row.value[0].owner + "<br>");
    }
  }
}
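The filtering idea can also be tried outside CouchDB. Here is a minimal sketch in plain JavaScript, assuming rows shaped like the view output in the question (key = [userid, time, asset name]) and already sorted newest first. The names uniqueByAsset and sampleRows are made up for illustration and are not part of CouchDB.

```javascript
// Sketch only: keeps the first (newest) row per asset name, preserving order.
// Assumes rows arrive sorted by time descending, as with descending=true.
function uniqueByAsset(rows, limit) {
  const seen = new Set();          // asset names already kept
  const result = [];
  for (const row of rows) {
    const asset = row.key[2];      // third key element is the asset name
    if (seen.has(asset)) continue; // skip older duplicates
    seen.add(asset);
    result.push(row);
    if (result.length === limit) break; // stop once the page is full
  }
  return result;
}

// Hypothetical sample data modelled on the question's output.
const sampleRows = [
  { key: ["u1", "2010-11-26T17:00:00.000Z", "clementine"] },
  { key: ["u1", "2010-11-26T15:00:00.000Z", "durian"] },
  { key: ["u1", "2010-11-26T14:00:00.000Z", "clementine"] },
];

const page = uniqueByAsset(sampleRows, 10);
console.log(page.map((row) => row.key[2]));
```

A Set is used here instead of indexOf on an array so the duplicate check stays cheap as the number of distinct assets grows.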

Ordering by one field and uniquing on another isn't something the basic map reduce can do. All it can do is sort your data, and apply reduce rollups to dynamic key-ranges.
To find the latest entry for each type of fruit, you'd need to query once per fruit.
There are some ways to do this that are kinda sane.
You'll want a view with keys like [fruit_type, date], and then you can query like this:
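for fruit in fruits
  # shown for fruit = "apples"; with descending=true the scan starts at the high end of that fruit's block
  GET /db/_design/foo/_view/bar?startkey=["apples",{}]&endkey=["apples"]&descending=true&limit=1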
This will give you the latest entry for each fruit.
The list operation could also be used to do this: it would just echo the first row from each fruit's block. This would be efficient enough as long as each fruit has a small number of entries. Once there are many entries per fruit, you'll be discarding more data than you echo, so the multi-query approach actually scales better than the list approach on a large data set. Luckily, both can work on the same view index, so switching later won't be a big deal.
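The per-fruit queries can be generated programmatically. This is a minimal sketch, assuming view keys of the form [fruit_type, date] and the placeholder view path from the answer. The helper name latestEntryUrl and the fruit list are hypothetical. It uses the same startkey/endkey pattern as the descending query in the question, where {} sorts after any date string.

```javascript
// Sketch only: builds one "latest entry" query URL per fruit.
// The view path is the placeholder used in the answer (/db/_design/foo/_view/bar).
function latestEntryUrl(viewPath, fruit) {
  const params = [
    // With descending=true the scan starts at the high end of the fruit's block.
    "startkey=" + encodeURIComponent(JSON.stringify([fruit, {}])),
    "endkey=" + encodeURIComponent(JSON.stringify([fruit])),
    "descending=true",
    "limit=1",
  ];
  return viewPath + "?" + params.join("&");
}

const fruits = ["apples", "clementine", "durian"]; // hypothetical list
const urls = fruits.map((fruit) => latestEntryUrl("/db/_design/foo/_view/bar", fruit));
urls.forEach((url) => console.log(url));
```

Each URL would then be fetched with whatever HTTP client the application already uses; the first row of each response is the newest entry for that fruit.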

Related

CouchDB 2.0 - How to autoincrement keys in a View?

In CouchDB 2.0, I'm trying to create an ordered list as the keys from a View, but it doesn't work.
My code for the View document:
var i = 0;
function (doc) {
if (doc.type === "comment") {
emit(i++, doc.webpages);
}
}
The result is that all keys are equal to 0. How can I make it so that each document gets an autoincremented key?
Thanks!
A sequential ID probably isn't the best choice for most real applications. For example, if you were to build a commenting system I would approach it like this (there's a similar example in the couch docs):
Comments would be docs with a structure like this:
{
  "_id": "comment_id",
  "parent": "comment_id, or article_id if a top level comment",
  "timestamp": "iso datetime populated by the server",
  "user_id": "the person who wrote the comment",
  "content": "content of the comment"
}
To display all the top level comments of a given parent (either article or parent comment), you could use a view like this:
function (doc) {
  emit([doc.parent, doc.timestamp, doc.user_id], doc._id);
}
To query this efficiently, you could use the following query options to grab the first twenty:
{
"startkey": ["parent_id"],
"endkey": ["parent_id", {}],
"limit": 20,
"skip": 0,
"include_docs": true
}
The comments will automatically be sorted by the date they were posted because the view is ordered by [parent, timestamp, user]. You don't have to pass a value for anything other than parent in your key to benefit from this.
Another thing of note: by not emitting the content of the comment from the view and using include_docs instead, your index will remain as slim as possible.
To expand on this:
If you want to show replies to a base comment, you can just change the start and end keys to that comment's id.
If you want to show the next 20 comments, just change skip to 20.
If you want more comments shown initially, just up the limit value.
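The three adjustments above can be folded into one small helper. This is only a sketch: commentPageOptions and the sample ids are hypothetical names, and the returned object simply mirrors the query options shown earlier.

```javascript
// Sketch only: builds the view query options for one page of comments.
function commentPageOptions(parentId, page, pageSize) {
  return {
    startkey: [parentId],     // first possible key for this parent
    endkey: [parentId, {}],   // {} sorts after any timestamp string
    limit: pageSize,
    skip: page * pageSize,    // page 0 skips nothing, page 1 skips one page
    include_docs: true,
  };
}

console.log(commentPageOptions("article_1", 0, 20));  // first 20 top-level comments
console.log(commentPageOptions("comment_42", 1, 20)); // next 20 replies to a comment
```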
In answer to your comment, if you had an array of parents in your document like:
"parents" : ["a100", "a101", "a102"]
Everything else would remain the same, except you would emit a row for each parent.
function (doc) {
  doc.parents.forEach(function (parent) {
    emit([parent, doc.timestamp, doc.user_id], doc._id);
  });
}
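To see which rows a map function like this produces, it can be run locally with a stand-in for emit. This is a sketch under assumptions: the emit stub, the mapComment name, and the sample document are all made up for illustration, and CouchDB itself supplies the real emit.

```javascript
// Sketch only: a stand-in for CouchDB's emit() so the map function can run locally.
const emitted = [];
function emit(key, value) {
  emitted.push({ key: key, value: value });
}

// Map function that emits one row per entry in doc.parents.
function mapComment(doc) {
  if (!Array.isArray(doc.parents)) return; // ignore docs without a parents array
  doc.parents.forEach(function (parent) {
    emit([parent, doc.timestamp, doc.user_id], doc._id);
  });
}

// Hypothetical document for illustration.
mapComment({
  _id: "c1",
  parents: ["a100", "a101", "a102"],
  timestamp: "2017-01-01T12:00:00Z",
  user_id: "u7",
});

console.log(emitted);
```

The comment then shows up under each of its parents when the view is queried with that parent's id as the start and end key.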

How to get a list of all keys across all documents in a RethinkDB table?

I have a dynamically populated table in which documents can have different keys that are not known in advance:
Document 1
{
'attribute1': 'foo',
'attribute2': 'bar'
}
Document 2
{
'attribute1': 'foo',
'attribute3': 'baz'
}
How can I get a list of all attributes present in all documents?
attribute1
attribute2
attribute3
I've tried grouping by keys() but I get a list of the possible attribute combinations, not the individual keys.
While this isn't fast if you have a lot of documents, it will eventually finish and won't consume lots of memory:
r.table('table')
  .map(r.row.keys())
  .reduce(function(left, right) {
    return left.setUnion(right)
  })
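For intuition, the same map-then-union idea can be sketched on plain JavaScript objects. This runs locally and does not touch RethinkDB; allKeys and the sample docs are hypothetical.

```javascript
// Sketch only: plain-JavaScript equivalent of map(keys) followed by a set-union reduce.
function allKeys(docs) {
  return docs
    .map((doc) => Object.keys(doc))            // like r.row.keys()
    .reduce((left, right) => {
      const union = new Set(left);
      right.forEach((key) => union.add(key));  // like left.setUnion(right)
      return Array.from(union);
    }, []);
}

const docs = [
  { attribute1: "foo", attribute2: "bar" },
  { attribute1: "foo", attribute3: "baz" },
];
console.log(allKeys(docs));
```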
It will be slow, but you can write something like table.concatMap(function(row) { return row.keys(); }).distinct().
I'm not sure there's a solution more efficient than O(n) unless you maintain some custom metadata on every data update, but anyway, I guess I'd go with:
table.reduce(function(left, right) {
return left.merge(right)
}).keys()
You can simply create a secondary index of type multi:
r.table('foo').indexCreate('all_keys', function(d){
return d.without('id').keys()
}, {multi: true})
And to get all the keys, just run:
r.table('foo').distinct({index: 'all_keys'})
Voila ;-)

Document concurrent update

I have a document like:
{
owner: 'alex',
live: 'some guid'
}
Two or more users can update the live field simultaneously.
How can I make sure that only the first user wins and the other users' updates fail?
You can get the semantics you want if you store a variable like "timesUpdated" in the document. Operations on a single document are atomic, so you can check that the field has the value you expect, and throw an error if it doesn't.
It might look something like:
var timesUpdated = 3
r.table('foo').get(rowId).update(function(row) {
  return r.branch(row('timesUpdated').eq(timesUpdated),
    {
      timesUpdated: row('timesUpdated').add(1),
      live: 'some special value'
    },
    r.error('Someone else updated the live field!')
  );
}, {returnChanges: true})
So if another query comes in before you for timesUpdated = 3, your query will blow up. When do you get timesUpdated? That depends on how your app is designed, and what you're trying to do.
Another thing to note is that adding {returnChanges: true} is really useful because it allows you to get the new value of timesUpdated atomically. You can also see what exactly changed in the updated document.
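The "first writer wins" behaviour can be illustrated with a small in-memory sketch. It is not RethinkDB code: updateLive and the stored object are hypothetical, and in the real query the check and the write happen atomically inside update(), which a plain function like this only imitates.

```javascript
// Sketch only: in-memory illustration of the check-then-update rule.
function updateLive(doc, expectedTimesUpdated, newLive) {
  if (doc.timesUpdated !== expectedTimesUpdated) {
    throw new Error("Someone else updated the live field!");
  }
  return {
    ...doc,
    timesUpdated: doc.timesUpdated + 1, // bump the counter on every successful write
    live: newLive,
  };
}

let stored = { owner: "alex", live: "some guid", timesUpdated: 3 };

// Two clients both read timesUpdated = 3.
stored = updateLive(stored, 3, "first writer"); // succeeds, counter becomes 4

let secondWriterError = null;
try {
  stored = updateLive(stored, 3, "second writer"); // stale counter, rejected
} catch (err) {
  secondWriterError = err.message;
}
console.log(stored, secondWriterError);
```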

backbone.js: Retrieve a smaller version of model building a collection

I'm trying to build an API to create a collection in Backbone. My model is called log and has these (shortened) properties (format for getLog/<id>):
{
'id': string,
'duration': float,
'distance': float,
'startDate': string,
'endDate': string
}
I need to create a collection, because I have many logs and I want to display them in a list. The API for creating the collection (getAllLogs) takes 30 sec to run, which is too slow. It returns the same format as the API getLog/<id>, but in an array, with one element for each log in the database.
To speed things up, I rebuilt the API several times and optimized it to its limits, but I'm now at 30 sec, which is still too slow.
My question is if it is possible to have a collection filled with instances of a model without ALL the information in the model, just a part of it needed to display the list. This will increase the speed of loading the collection and displaying the list, while in the background I could continue loading all other properties, or load them only for the elements I really need.
In my case, the model would load only with this information:
{
'id': string,
'distance': float
}
and all other properties could be loaded later.
How can I do it? Is it a good idea anyway?
thanks.
One way to do this is to use map to get the shortened model. Something like this will convert a Backbone.Collection "collection" with all properties to one with only "id" and "distance":
var shortCollection = new Backbone.Collection(collection.toJSON().map(function(x) {
return { id: x.id, distance: x.distance };
}));
Here's a Fiddle illustration.
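The trimming step itself does not depend on Backbone. Below is a minimal sketch on plain objects, with pickFields and the sample records made up for illustration. Note that trimming on the client after getAllLogs has already returned presumably won't shorten the 30 sec wait; the same field selection would have to happen on the server for that.

```javascript
// Sketch only: keep just the fields the list view needs, then merge details later.
function pickFields(record, fields) {
  const picked = {};
  fields.forEach((field) => {
    if (field in record) picked[field] = record[field];
  });
  return picked;
}

// Hypothetical full records in the getLog/<id> format.
const fullLogs = [
  { id: "a", duration: 12.5, distance: 3.2, startDate: "2013-01-01", endDate: "2013-01-01" },
  { id: "b", duration: 40.0, distance: 9.9, startDate: "2013-01-02", endDate: "2013-01-02" },
];

const summaries = fullLogs.map((log) => pickFields(log, ["id", "distance"]));

// Later, when one item is opened, its remaining properties can be merged in.
const detailed = Object.assign({}, summaries[0], fullLogs[0]);
console.log(summaries, detailed);
```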

Rails 4 and Mongoid: programmatically build query to search for different conditions on the same field

I'm building an advanced search functionality and, thanks to the help of some Ruby fellows on SO, I've already been able to combine AND and OR conditions programmatically on different fields of the same class.
I ended up writing something similar to the accepted answer mentioned above, which I report here:
query = criteria.each_with_object({}) do |(field, values), query|
  field = field.in if(values.is_a?(Array))
  query[field] = values
end
MyClass.where(query)
Now, what might happen is that someone wants to search on a certain field with multiple criteria, something like:
"all the users whose name contains 'abc' but does not contain 'def'"
How would you write the query above?
Please note that I already have the regexes to do what I want to (see below), my question is mainly on how to combine them together.
#contains
Regexp.new('.*' + val + '.*')
#not contains
Regexp.new('^((?!' + val + ').)*$')
Thanks for your time!
* UPDATE *
I was playing with the console and this is working:
MyClass.where(name: /.*abc.*/).and(name: /^((?!def).)*$/)
My question remains: how do I do that programmatically? I shouldn't end up with more than two conditions on the same field but it's something I can't be sure of.
You could use an :$and operator to combine the individual queries:
MyClass.where(:$and => [
{ name: /.*abc.*/ },
{ name: /^((?!def).)*$/ }
])
That would change the overall query builder to something like this:
components = criteria.map do |field, value|
  field = field.in if(value.is_a?(Array))
  { field => value }
end
query = components.length > 1 ? { :$and => components } : components.first
You build a list of the individual components and then, at the end, either combine them with :$and or, if there aren't enough components for :$and, just unwrap the single component and call that your query.
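The shape of the resulting query document can be sketched in plain JavaScript as well. This is not Mongoid code: buildQuery is a hypothetical helper, criteria are assumed to arrive as [field, value] pairs, and arrays are turned into $in conditions to mirror what field.in does above.

```javascript
// Sketch only: builds a MongoDB-style query document from [field, value] pairs.
// Arrays become $in conditions; several conditions are wrapped in $and.
function buildQuery(criteria) {
  const components = criteria.map(([field, value]) => {
    const condition = Array.isArray(value) ? { $in: value } : value;
    return { [field]: condition };
  });
  return components.length > 1 ? { $and: components } : components[0];
}

// Two conditions on the same field, as in the question.
const query = buildQuery([
  ["name", /.*abc.*/],
  ["name", /^((?!def).)*$/],
]);
console.log(query);
```

Keeping each condition in its own component is what lets the same field appear more than once; a single hash keyed by field would silently overwrite the first condition with the second.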
