Proper Upsert (Atomic Update Counter Field or Insert Document) with RethinkDB - rethinkdb

After looking at some SO questions and issues on the RethinkDB GitHub, I have not been able to reach a clear conclusion: is an atomic upsert possible?
Essentially I would like to perform the same operation as ZINCRBY using Redis.
If member does not exist in the sorted set, it is added with increment as its score (as if its previous score was 0.0). If key does not exist, a new sorted set with the specified member as its sole member is created.
The current implementation appears to differ from almost all databases that I have used: the data is replaced or inserted, not updated. This is a simple use case, like updating the last visit, the number of clicks, or a product quantity. So I must be missing something very obvious, because I cannot see a simple way to do this.

Yes, it is possible. After a get on the key, perform an atomic replace. Something like this might work:
function set_or_increment_score(player, points) {
  return r.table('scores').get(player).replace(row => ({
    id: player,
    score: r.branch(
      row.eq(null),              // document doesn't exist yet
      points,                    // insert it with the initial score
      row('score').add(points))  // otherwise increment the existing score
  }));
}
It has the following behaviour:
> set_or_increment_score("alice", 1).run(conn)
{ inserted: 1 }
> set_or_increment_score("alice", 2).run(conn)
{ replaced: 1 }
It works because get returns null when the document doesn't exist, and a replace on a non-existing document turns into an insert. See the documentation for replace.

So I ended up using the following code to work around the lack of an update-style upsert.
r.db("test").table("t").insert(
{id:"A", type:"player", species:"warrior", score:0, xp:0, armor:0},
{conflict: function(id, oldDoc, newDoc) {
return newDoc.merge(oldDoc).merge(
{armor: oldDoc("armor").add(1)});
}
}
)
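For comparison, running that insert twice should behave much like the earlier example (the exact result fields are an assumption based on the usual insert output):
> // first run: no document with id "A" exists yet
{ inserted: 1 }
> // second run: the conflict function fires and armor becomes 1
{ replaced: 1 }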
Do you think this is more readable/elegant or do you see any issues with the code compared to your sample?

Related

How to update item conditionally with branch in RethinkDB

I am trying to do a simple upsert to an array field based on a branch condition. However, branch does not seem to accept a ReQL expression as an argument, and I get the error Expected type SELECTION but found DATUM.
This is probably some obvious thing I've missed, but I can't find a working example anywhere.
Sample source:
var userId = 'userId';
var itemId = 'itemId';

r.db('db').table('items').get(itemId).do(function(item) {
  return item('elements').default([]).contains(function(element) {
    return element('userId').eq(userId);
  }).branch(
    r.expr("Element already exist"),
    // Error: Expected type SELECTION but found DATUM
    item.update({
      elements: item('elements').default([]).append({
        userId: 'userId'
      })
    })
  );
});
The problem here is that item is a datum, not a selection. This happens because you used r.do: the variable doesn't retain information about where the object originally came from.
A solution that might seem to work would be to write a new r.db('db').table('items').get(itemId) expression inside the branch. The problem with that option is that the behavior isn't atomic: two different queries might append the same element to the elements array. Instead, you should write your query in the form r.db('db').table('items').get(itemId).update(function(item) { return <something>; }) so that the update is applied atomically.
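For illustration, here is a sketch of the upsert from the question rewritten in that atomic form (field names follow the question; leaving the document untouched when the element already exists is an assumption about the desired behavior):
// Sketch only: append {userId: userId} to 'elements' unless it is already there.
// Returning an empty object from branch leaves the document unchanged.
r.db('db').table('items').get(itemId).update(function(item) {
  return r.branch(
    item('elements').default([]).contains(function(element) {
      return element('userId').eq(userId);
    }),
    {},  // element already exists: no-op
    {elements: item('elements').default([]).append({userId: userId})}
  );
});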

Document concurrent update

I have a document like:
{
  owner: 'alex',
  live: 'some guid'
}
Two or more users can update the live field simultaneously.
How can I make sure that only the first user wins and the other updates fail?
You can get the semantics you want if you store a field like timesUpdated in the document. Operations on a single document are atomic, so you can check that the field has the value you expect and throw an error if it doesn't.
It might look something like:
var timesUpdated = 3;

r.table('foo').get(rowId).update(function(row) {
  return r.branch(row('timesUpdated').eq(timesUpdated),
    {
      timesUpdated: row('timesUpdated').add(1),
      live: 'some special value'
    },
    r.error('Someone else updated the live field!')
  );
}, {returnChanges: true})
So if another query comes in before yours for timesUpdated = 3, your query will error out. Where you get timesUpdated from depends on how your app is designed and what you're trying to do.
Another thing to note is that adding {returnChanges: true} is really useful, because it lets you read the new value of timesUpdated atomically. You can also see exactly what changed in the updated document.
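For illustration, a sketch of how the caller might inspect that write result, assuming the update expression above is stored in a variable named query and that conn and the promise-style run come from the surrounding app code:
// Sketch: 'errors' is non-zero when r.error fired inside the update function.
query.run(conn).then(function(result) {
  if (result.errors > 0) {
    // someone else updated the document first; decide whether to re-read and retry
    console.log(result.first_error);
  } else {
    // returnChanges: true exposes the new document atomically
    var newTimesUpdated = result.changes[0].new_val.timesUpdated;
  }
});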

RethinkDB: order / index by version - build

I have records with the following field: { version: "2.4.1-5dev" }. I want to order / index them by version. The documents should be ordered by the version / build combination (ascending by their partially-numeric values). Is it possible, and if so, how can I do that?
edit:
I still can't index or order by version. Again, I want to be able to sort by version, even if it contains words like "dev".
In python, there's pkg_resources.parse_version() which helps compare two versions by pkg_resources.parse_version(ver1) > pkg_resources.parse_version(ver2) and it works even for somewhat crazy version naming.
Is there any chance I can use pkg_resources.parse_version() as a cmp function for indexing / ordering, or alternatively, get the same result within a query when trying to order documents by the version field?
You can create an index with a function; in JavaScript it would be something like:
r.table("product").indexCreate("version", function(product) {
return r.branch(
product("version").match('dev'),
null,
product("version").split('.').concatMap(function(version) {
return version.split('-')
}).map(function(num) {
return num.coerceTo('NUMBER')
})
})
That works because null is currently not stored in secondary indexes. This behavior may change, though; see https://github.com/rethinkdb/rethinkdb/issues/1032
The third argument to r.branch splits the value on . and -, then coerces each part to a number. For example:
r.expr("2.4.1-5").split('.').concatMap(r.row.split('-')).map(r.row.coerceTo('NUMBER'))
// will return [2,4,1,5]
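Once the index is ready, ordering by it would look something like this sketch (the index name follows the snippet above; conn is an assumption):
// Sketch: wait for the secondary index to finish building, then order by it.
// Documents whose version matches 'dev' map to null and so never appear in the index.
r.table("product").indexWait("version").run(conn);
r.table("product").orderBy({index: "version"}).run(conn);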

Rethinkdb: Know which records were updated

Is there any way to update a sequence and know the primary keys of the updated documents?
table.filter({some:"value"}).update({something:"else"})
Then know the primary keys of the records that were updated without needing a second query?
It's currently not possible to return multiple values with {returnVals: true}; see for example https://github.com/rethinkdb/rethinkdb/issues/1382
There is, however, a way to trick the system with forEach:
r.db('test').table('test').filter({some: "value"}).forEach(function(doc) {
  return r.db('test').table('test').get(doc('id'))
    .update({something: "else"}, {returnVals: true})
    .do(function(result) {
      return {generated_keys: [result("new_val")]};
    });
})("generated_keys")
While it works, it's really really hack-ish. Hopefully with array limits, returnVals will soon be available for range writes.
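For what it's worth, later RethinkDB releases replaced returnVals with returnChanges, which does work for multi-document writes. Assuming such a version, a sketch without the forEach trick:
// Sketch (newer versions only): returnChanges exposes the changed documents directly,
// so the primary keys can be pulled out of the 'changes' array.
r.db('test').table('test').filter({some: "value"})
  .update({something: "else"}, {returnChanges: true})('changes')
  .map(function(change) { return change('new_val')('id'); })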

Couchdb view filtering by date

I have a simple document structure named Order with the fields id, name, userId and timeScheduled.
What I would like to do is create a view where I can find the document id for those whose userId is some value and whose timeScheduled is after a given date.
My view:
"by_users_after_time": {
"map": "function(doc) { if (doc.userId && doc.timeScheduled) {
emit([doc.timeScheduled, doc.userId], doc._id); }}"
}
If I do
localhost:5984/orders/_design/Order/_view/by_users_after_time?startKey="[2012-01-01T11:40:52.280Z,f98ba9a518650a6c15c566fc6f00c157]"
I get every result back. Is there a way to access key[1] to do something like if doc.userId == key[1], and simply emit on the time?
This would be the SQL equivalent of
select id from Order where userId = "f98ba9a518650a6c15c566fc6f00c157" and timeScheduled > 2012-01-01T11:40:52.280Z;
I did quite a few Google searches, but I can't seem to find a good tutorial on working with multiple keys. It's also possible that my approach is entirely flawed, so any guidance would be appreciated.
You only need to reverse the key, because the userId is known:
function (doc) {
  if (doc.userId && doc.timeScheduled) {
    emit([doc.userId, doc.timeScheduled], 1);
  }
}
Then query with:
?startkey=["f98ba9a518650a6c15c566fc6f00c157","2012-01-01T11:40:52.280Z"]
NOTES:
The query parameter is startkey, not startKey.
The value of startkey is an array, not a string, so the double quotes go around the username and date values, not around the whole array.
I emit 1 as the value, instead of doc._id, to save disk space. Every row of the result already has an id field with the doc._id, so there's no need to repeat it.
Don't forget to set endkey=["f98ba9a518650a6c15c566fc6f00c157",{}], otherwise you get the data of all users sorting after "f98ba9a518650a6c15c566fc6f00c157".
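Putting those notes together, the full request would look something like this sketch (key values taken from the question; the brackets and quotes may still need URL-encoding depending on the client):
localhost:5984/orders/_design/Order/_view/by_users_after_time?startkey=["f98ba9a518650a6c15c566fc6f00c157","2012-01-01T11:40:52.280Z"]&endkey=["f98ba9a518650a6c15c566fc6f00c157",{}]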
The answer actually came from the CouchDB mailing list: essentially, Date.parse() doesn't like the +0000 on the timestamps. By doing a substring and removing the +0000, everything worked.
For the record:
document.write(new Date("2012-02-13T16:18:19.565+0000"));   // Outputs Invalid Date
document.write(Date.parse("2012-02-13T16:18:19.565+0000")); // Outputs NaN
But if you remove the +0000, both lines of code work perfectly.
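A minimal sketch of that substring workaround, assuming the timestamps always carry a fixed-length +0000 suffix:
// Sketch: strip the fixed-length "+0000" offset before parsing.
var raw = "2012-02-13T16:18:19.565+0000";
var trimmed = raw.substring(0, raw.length - 5);
document.write(new Date(trimmed));   // now a valid Date
document.write(Date.parse(trimmed)); // now a timestamp in milliseconds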
