RethinkDB: order / index by version - build - rethinkdb

I have records with the following field: { version: "2.4.1-5dev" }. I want to order / index them by version. The documents should be ordered by the version / build combination, ascending by the numeric value of each part. Is this possible, and if so, how can I do it?
Edit:
I still can't index / order by version. Again, I want to be able to sort by version even if there are words like "dev" in it.
In Python, there's pkg_resources.parse_version(), which lets you compare two versions with pkg_resources.parse_version(ver1) > pkg_resources.parse_version(ver2), and it works even for somewhat crazy version naming.
Is there any chance I can use pkg_resources.parse_version() as a cmp function for indexing / ordering, or alternatively get the same result within a query when ordering documents by the version field?

You can create an index with a function. In JavaScript it would be something like:
r.table("product").indexCreate("version", function(product) {
  return r.branch(
    product("version").match('dev'),
    null,
    product("version").split('.').concatMap(function(version) {
      return version.split('-')
    }).map(function(num) {
      return num.coerceTo('NUMBER')
    })
  )
})
That works because null is currently not stored in secondary indexes, so documents whose version contains "dev" are left out of the index. This behavior may change, though. See https://github.com/rethinkdb/rethinkdb/issues/1032
The third argument to r.branch splits the value on . and -, then coerces each piece to a number. For example:
r.expr("2.4.1-5").split('.').concatMap(r.row.split('-')).map(r.row.coerceTo('NUMBER'))
// will return [2,4,1,5]
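To make the resulting ordering concrete, here is a plain JavaScript sketch of the same key function, with no RethinkDB involved. The names versionKey and compareKeys are hypothetical helpers, and the element-by-element comparison is my assumption about how array keys sort in an index, so treat it as an illustration rather than the server's actual behavior.

```javascript
// Plain JavaScript stand-in for the ReQL index function (no RethinkDB needed).
// versionKey and compareKeys are hypothetical helper names.
function versionKey(version) {
  // Mirror the r.branch guard: versions containing "dev" get no index key.
  if (version.includes("dev")) return null;
  // Split on "." and "-", then convert every piece to a number.
  return version.split(/[.-]/).map(Number);
}

// Compare two keys element by element (assumed ordering); a key that is a
// prefix of a longer key sorts first.
function compareKeys(a, b) {
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return a.length - b.length;
}

const versions = ["2.10.0-1", "2.4.1-5", "2.4.1-12", "2.4.0"];
const sorted = versions
  .map((v) => ({ v, key: versionKey(v) }))
  .sort((x, y) => compareKeys(x.key, y.key))
  .map((x) => x.v);

console.log(versionKey("2.4.1-5")); // [ 2, 4, 1, 5 ]
console.log(sorted); // [ '2.4.0', '2.4.1-5', '2.4.1-12', '2.10.0-1' ]
```

Note that "2.10.0-1" sorts after "2.4.1-12" here, which a plain string sort would get wrong.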

Related

Proper Upsert (Atomic Update Counter Field or Insert Document) with RethinkDB

After looking at some SO questions and issues on the RethinkDB GitHub, I could not reach a clear conclusion on whether an atomic upsert is possible.
Essentially I would like to perform the same operation as ZINCRBY using Redis.
If member does not exist in the sorted set, it is added with increment
as its score (as if its previous score was 0.0). If key does not
exist, a new sorted set with the specified member as its sole member
is created.
The current implementation appears to differ from almost all databases that I have used, with the data being replaced or inserted rather than updated. This is a simple use case, like updating the last visit, the number of clicks, or a product quantity. So I must be missing something very obvious, because I cannot see a simple way to do this.
Yes, it is possible. After get on the key, perform an atomic replace. Something like this might work:
function set_or_increment_score(player, points){
  return r.table('scores').get(player).replace(
    // the object literal is wrapped in parentheses so the arrow function returns it
    row => ({
      id: player,
      score: r.branch(
        row.eq(null),
        points,
        row('score').add(points))
    }));
}
It has the following behaviour:
> set_or_increment_score("alice", 1).run(conn)
{ inserted: 1 }
> set_or_increment_score("alice", 2).run(conn)
{ replaced: 1 }
It works because get returns null when the document doesn't exist, and a replace on a non-existing document turns into an insert. See the documentation for replace.
So I ended up using the following code to get around the missing update behaviour.
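The insert-or-replace logic can be tried without a server using an in-memory stand-in. This is only a sketch: the Map, the replace helper and setOrIncrementScore are hypothetical names, and it says nothing about atomicity, which in the real query comes from the server applying replace to a single document.

```javascript
// In-memory stand-in for a table keyed by id (hypothetical; not the RethinkDB driver).
const scores = new Map();

// Mimics get(key).replace(fn): fn receives the stored document or null,
// and whatever it returns becomes the new stored document.
function replace(table, key, fn) {
  const existing = table.has(key) ? table.get(key) : null;
  table.set(key, fn(existing));
  return existing === null ? { inserted: 1 } : { replaced: 1 };
}

function setOrIncrementScore(player, points) {
  return replace(scores, player, (row) => ({
    id: player,
    // Same decision as r.branch(row.eq(null), points, row('score').add(points)).
    score: row === null ? points : row.score + points,
  }));
}

const first = setOrIncrementScore("alice", 1);
const second = setOrIncrementScore("alice", 2);
console.log(first, second, scores.get("alice"));
// { inserted: 1 } { replaced: 1 } { id: 'alice', score: 3 }
```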
r.db("test").table("t").insert(
{id:"A", type:"player", species:"warrior", score:0, xp:0, armor:0},
{conflict: function(id, oldDoc, newDoc) {
return newDoc.merge(oldDoc).merge(
{armor: oldDoc("armor").add(1)});
}
}
)
Do you think this is more readable/elegant or do you see any issues with the code compared to your sample?
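For comparison, this is what that conflict function computes, written as a plain JavaScript sketch. resolveConflict and the sample documents are made up for illustration, and object spread is shallow while ReQL merge recurses into nested objects, so the two only agree for flat documents like these.

```javascript
// Hypothetical stand-in for the conflict function passed to insert.
function resolveConflict(oldDoc, newDoc) {
  // newDoc.merge(oldDoc): stored values override the inserted defaults.
  // .merge({armor: ...}): armor is then incremented from the stored value.
  return { ...newDoc, ...oldDoc, armor: oldDoc.armor + 1 };
}

const stored = { id: "A", type: "player", species: "warrior", score: 7, xp: 40, armor: 2 };
const incoming = { id: "A", type: "player", species: "warrior", score: 0, xp: 0, armor: 0 };

const merged = resolveConflict(stored, incoming);
console.log(merged);
// { id: 'A', type: 'player', species: 'warrior', score: 7, xp: 40, armor: 3 }
```

So the zeros in the inserted document act only as defaults for the first insert; on conflict the stored score and xp survive and armor goes up by one.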

rethinkdb - hasFields to find all documents with multiple missing conditions

I found an answer for finding all documents in a table with missing fields in this SO thread RethinkDB - Find documents with missing field, however I want to filter according to a missing field AND a certain value in a different field.
I want to return all documents that are missing field email and whose isCurrent: value is 1. So, I want to return all current clients who are missing the email field, so that I can add the field.
The documentation on rethink's site does not cover this case.
Here's my best attempt:
r.db('client').table('basic_info').filter(function (row) {
return row.hasFields({email: true }).not(),
/*no idea how to add another criteria here (such as .filter({isCurrent:1})*/
}).filter
Actually, you can do it in one filter, which will also be faster than your current solution:
r.db('client').table('basic_info').filter(function (row) {
return row.hasFields({email: true }).not()
.and(row.hasFields({isCurrent: true }))
.and(row("isCurrent").eq(1));
})
or:
r.db('client').table('basic_info').filter(function (row) {
return row.hasFields({email: true }).not()
.and(row("isCurrent").default(0).eq(1));
})
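If it helps to see the predicate on its own, here is the second variant as plain JavaScript over an array. hasField is a hypothetical helper that approximates hasFields (as far as I know, the field must be present and not null), and the sample documents are invented.

```javascript
// Hypothetical helper approximating hasFields: present and not null.
const hasField = (doc, name) => doc[name] !== undefined && doc[name] !== null;

const clients = [
  { id: 1, email: "a@example.com", isCurrent: 1 },
  { id: 2, isCurrent: 1 },
  { id: 3, isCurrent: 0 },
  { id: 4 }, // no email and no isCurrent
];

// Same idea as row.hasFields({email: true}).not()
//   .and(row("isCurrent").default(0).eq(1))
const missingEmailAndCurrent = clients.filter(
  (row) => !hasField(row, "email") && (row.isCurrent ?? 0) === 1
);

console.log(missingEmailAndCurrent.map((row) => row.id)); // [ 2 ]
```

The default of 0 is what keeps document 4 from causing an error over the missing isCurrent field; it just fails the comparison.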
I just realized I can chain multiple .filter commands.
Here's what worked for me:
r.db('client').table('basic_info').filter(function (row) {
return row.hasFields({email: true }).not()
}).filter({isCurrent: 1});
My next quest: put all of these into an array and then feed the email addresses in batch

Searching a MongoDB collection from the end (c#)

I am looking for the most efficient way to get the last elements of a fairly large (> 1 million docs) MongoDB collection.
Specifically, it is the oplog collection and I am looking for all entries after a given timestamp. It makes no sense to search the first million or so entries for a timestamp larger than the current one, since they are all definitely older because the collection is stored in its natural order.
Is there a way to tell MongoDB to search from the end of a collection?
I tried a linq query with Skip(N) but it's very slow. It seems it parses through all documents from the beginning and just doesn't return the first N.
The most efficient way is probably using aggregation. If your collection is sorted, you can get the last Timestamp using this aggregation:
var group = new BsonDocument
{
{
"$group", new BsonDocument
{
{"_id", 0},
{"newestTimeStamp", new BsonDocument { {"$last","$timeStamp"} } }
}
}
};
var pipeline = new[] {group};
var result = _dtCollection.Aggregate(pipeline);
Then you can deserialize the result into a Timestamp class. If you want to get several elements, you could create a similar expression using $match.
Also make sure to add an index to the collection on the TimeStamp field. This will probably make your LINQ query faster if you decide to use that instead.

How do I get unique field values using rethinkdb javascript?

I have a field whose values repeat. For example, {country : 'US'} occurs multiple times in the table, and the same goes for other countries. I want to return an array containing the distinct values of the 'country' field. I am new to creating databases, so this is likely a trivial question, but I couldn't find anything useful in the RethinkDB API. [SOLVED]
Thanks
You can use distinct, but the distinct command was created for short sequences only.
If you have a lot of data, you can use map/reduce:
r.table("data").map(function(doc) {
return r.object(doc("country"), true) // return { <country>: true}
}).reduce(function(left, right) {
return left.merge(right)
}).keys() // return all the keys of the final document
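The same map/reduce shape can be checked locally on a plain array. This is only a sketch with invented sample data; spread stands in for merge, and the key order of the real keys() result may differ.

```javascript
// Invented sample data standing in for the "data" table.
const data = [
  { country: "US" },
  { country: "FR" },
  { country: "US" },
  { country: "IN" },
  { country: "FR" },
];

const countries = Object.keys(
  data
    .map((doc) => ({ [doc.country]: true })) // { <country>: true }
    .reduce((left, right) => ({ ...left, ...right })) // duplicate keys collapse
);

console.log(countries); // [ 'US', 'FR', 'IN' ]
```

Duplicates disappear because an object can hold each key only once, which is the whole trick.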

Rethinkdb: Know which records were updated

Is there any way to update a sequence and know the primary keys of the updated documents?
table.filter({some:"value"}).update({something:"else"})
Then know the primary keys of the records that were updated without needing a second query?
It's currently not possible to return multiple values with {returnVals: true}, see for example https://github.com/rethinkdb/rethinkdb/issues/1382
There is, however, a way to trick the system with forEach:
r.db('test').table('test').filter({some: "value"}).forEach(function(doc) {
return r.db('test').table('test').get(doc('id')).update({something: "else"}, {returnVals: true}).do(function(result) {
return {generated_keys: [result("new_val")]}
})
})("generated_keys")
While it works, it's really really hack-ish. Hopefully with array limits, returnVals will soon be available for range writes.
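As I understand it, the trick relies on forEach folding the per-document write results together, with array fields such as generated_keys being concatenated. A plain JavaScript sketch of that folding, under that assumption and with an invented in-memory table, looks like this. It collects the primary keys rather than the full new_val documents.

```javascript
// Invented in-memory table; not the RethinkDB driver.
const table = [
  { id: 1, some: "value" },
  { id: 2, some: "other" },
  { id: 3, some: "value" },
];

const result = table
  .filter((doc) => doc.some === "value")
  .map((doc) => {
    doc.something = "else"; // the single-document update
    // Smuggle the primary key out through generated_keys.
    return { replaced: 1, generated_keys: [doc.id] };
  })
  .reduce(
    // Assumed folding rule: numbers are summed, arrays are concatenated.
    (acc, r) => ({
      replaced: acc.replaced + r.replaced,
      generated_keys: acc.generated_keys.concat(r.generated_keys),
    }),
    { replaced: 0, generated_keys: [] }
  );

console.log(result); // { replaced: 2, generated_keys: [ 1, 3 ] }
```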
