I'm trying to learn how to use a lambda function to resolve conflicts when inserting into a table with RethinkDB and Python, like the last example on this page. I would like to compare a timestamp field between old_doc and new_doc and keep the more recent document:
r.table(import_table).insert(
  documents,
  conflict=lambda id, old_doc, new_doc:
    new_doc if new_doc['timestamp'] > old_doc['timestamp'] else old_doc
).run()
This gives the following error:
rethinkdb.errors.ReqlQueryLogicError: Expected type DATUM but found FUNCTION
I can't find much documentation on this error or using lambda functions for conflict resolution. Running something simpler like:
r.table(import_table).insert(documents, conflict=lambda id, old_doc, new_doc: new_doc).run()
gives the same error, which makes me think that I'm not writing this correctly. Any help would be appreciated.
It seems to me that you are just looking for r.branch:
r.table(import_table).insert(
  documents,
  conflict=lambda id, old_doc, new_doc: r.branch(
    new_doc['timestamp'] > old_doc['timestamp'],
    new_doc,
    old_doc
  )
).run()
In case of conflict, insert expects a datum, but you were returning a function (lambda). Here, the conflict-resolution lambda will return the result of the r.branch comparison, which will be a datum this time.
EDIT:
I just tried it in the Data Explorer, in JavaScript, and it works for me:
r.table('foo').insert(
  [{id: 0, a: 3}, {id: 1, a: 2}],
  {
    conflict: function(id, old_doc, new_doc) {
      return r.branch(
        old_doc('a').lt(new_doc('a')),
        new_doc,
        old_doc
      )
    }
  })
It seems to be the same as the Python syntax I gave above, no?
Conflict functions for insert were implemented in RethinkDB 2.3.0, so this works after updating to that version or newer.
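If you're unsure which version a server is running, one way to check is to read the system tables (a sketch, assuming an open driver connection conn):

// Sketch: returns a string like "rethinkdb 2.3.x ..." for the first server.
r.db('rethinkdb').table('server_status').nth(0)('process')('version').run(conn)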
Related
My GraphQL schema looks like this:
type Todo {
  name: String!
  created_at: Time
}

type Query {
  allTodos: [Todo!]!
  todosByCreatedAtFlag(created_at: Time!): [Todo!]!
}
This query works:
query {
  todosByCreatedAtFlag(created_at: "2017-02-08T16:10:33Z") {
    data {
      _id
      name
      created_at
    }
  }
}
Could anyone point out how I can create a greater-than (or less-than) Time query in GraphQL (using FaunaDB)?
GraphQL range queries are not supported (yet.. they're coming!)
FaunaDB does not provide range queries for its GraphQL out of the box; we are working on these features.
.. but there is a workaround.
That doesn't mean, though, that it can't do range queries: range queries are supported in FQL, and you can always 'escape' from GraphQL to FQL to implement more advanced queries by writing a User Defined Function (UDF).
.. using resolvers
By using the @resolver directive in your schema you can implement GraphQL queries yourself by writing a User Defined Function in FaunaDB in FQL. There are some basic examples in the documentation but I imagine you might need some help, so I'll write you a simple example.
I added your schema and created two example documents. The first thing is that our schema will be extended with the resolver:
type Todo {
  name: String!
  created_at: Time
}

type Query {
  allTodos: [Todo!]!
  todosByCreatedAtFlag(created_at: Time!): [Todo!]!
  todosByCreatedRange(before: Time, after: Time): [Todo!]! @resolver
}
All this does is create a placeholder function for us to implement.
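The auto-generated stub is just a Query wrapping a Lambda that aborts when called; it looks roughly like this (the exact Abort message may differ):

Query(
  Lambda(
    ["before", "after"],
    Abort("Function todosByCreatedRange was not implemented yet.")
  )
)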
If we call it via GraphQL at this point, we get exactly that Abort message, since the function has not been implemented yet. But we can see that the GraphQL query actually calls the function.
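For reference, the GraphQL call that triggers it looks like this (the timestamps are placeholders):

query {
  todosByCreatedRange(before: "2019-01-01T00:00:00Z", after: "2019-12-31T00:00:00Z") {
    name
    created_at
  }
}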
.. UDF implementation
The first thing we will do is add the parameters, which is just a matter of writing a name as the first argument of the Lambda. The Lambda also accepts an array of names in case you need to pass multiple parameters (which I do in the resolver that I defined in the schema).
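A minimal sketch of both parameter shapes (the bodies here are trivial, just to show the forms):

// A single parameter is just a name.
Query(Lambda("before", Var("before")))

// Multiple parameters are an array of names.
Query(Lambda(["before", "after"], [Var("before"), Var("after")]))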
We'll add an index to support our query. Values are what you range over (and what the index returns and sorts by). We'll add created_at so we can range over it, and also add ref, since we'll need it to get the actual document behind the index entry.
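A sketch of such an index; I'm assuming the GraphQL import created a Todo collection, and the index name matches the one used in the queries below:

CreateIndex({
  name: "todosByCreatedAtRange",
  source: Collection("Todo"),
  // values determine what we can Range over and what each entry returns
  values: [
    { field: ["data", "created_at"] },
    { field: ["ref"] }
  ]
})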
We could then start off by just writing a simple function (that won't work yet):
Query(
  Lambda(
    ["before", "after"],
    Paginate(
      Range(Match(Index("todosByCreatedAtRange")), Var("before"), Var("after"))
    )
  )
)
We can test this by calling the function manually via the shell.
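For example, with hypothetical timestamps (a multi-parameter UDF is called with an array of arguments, which bind to ["before", "after"] in order):

Call(
  Function("todosByCreatedRange"),
  [Time("2019-01-01T00:00:00Z"), Time("2019-12-31T00:00:00Z")]
)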
This indeed returns the two objects (range is inclusive).
Of course, there is one problem with this: it does not return the data in the structure that GraphQL expects, so the GraphQL call will fail with errors about the result shape. We can do two things now: either define a type in our schema that fits this shape, or adapt the data the function returns. We'll do the latter and adapt our result to the expected [Todo!]!.
Step one: map over the result. The only things we introduce here are Map and an inner Lambda. We do not do anything special yet; as an example, we just return the reference instead of both the created_at value and the reference.
Query(
  Lambda(
    ["before", "after"],
    Map(
      Paginate(
        Range(
          Match(Index("todosByCreatedAtRange")),
          Var("before"),
          Var("after")
        )
      ),
      Lambda(["created_at", "ref"], Var("ref"))
    )
  )
)
Calling it indeed shows that the function now only returns references.
Let's get the actual documents. I know that FQL is verbose (and with good reasons, although it should become less verbose in the future), so I started adding comments to clarify things:
Query(
  Lambda(
    ["before", "after"],
    Map(
      // This is just the query to get your range
      Paginate(
        Range(
          Match(Index("todosByCreatedAtRange")),
          Var("before"),
          Var("after")
        )
      ),
      // This is a function that will be executed on each result (with the help of Map)
      Lambda(["created_at", "ref"],
        // We'll use Let to structure our queries (allowing us to use variables)
        Let({
          todo: Get(Var("ref"))
        },
        // And then we return something
        Var("todo")))
    )
  )
)
Our function now returns data.. woohoo!
We still need to make sure this data conforms to what GraphQL expects. From the schema we can see that it expects a [Todo!]! (see the docs tab), and a Todo looks like this (see the schema tab):
type Todo {
  _id: ID!
  _ts: Long!
  name: String!
  created_at: Time
}
As you can also see from that docs tab, 'non-resolver' queries are automatically changed to return TodoPages. The function we wrote so far actually returns pages as well.
Option 1: change the schema and turn it into a paginated resolver.
We can fix this by adding the paginated: true option to the resolver. You will have to take into account the extra parameters that will be added to the resolver, as explained here. I haven't tried that myself, so I'm not 100% certain how that would work. The advantage of a paginated resolver is that you immediately get sane pagination in the GraphQL endpoint.
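If you go that route, the resolver line in the schema would become something like:

todosByCreatedRange(before: Time, after: Time): [Todo!]! @resolver(paginated: true)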
Option 2: turn it into a non-paginated result.
A paginated result is a result that looks as follows:
{
  data: [ document1, document2, .. ],
  before: ...,
  after: ..
}
The resolver doesn't accept a page but an array, so I'll change the function and retrieve the data field of the page. And with that, we have our result.
The complete query looks as follows:
Query(
  Lambda(
    ["before", "after"],
    Select(
      ["data"],
      Map(
        Paginate(
          Range(
            Match(Index("todosByCreatedAtRange")),
            Var("before"),
            Var("after")
          )
        ),
        Lambda(
          ["created_at", "ref"],
          Let({ todo: Get(Var("ref")) }, Var("todo"))
        )
      )
    )
  )
)
Disclaimers
Once you go custom, pagination also becomes your responsibility (e.g. pass an extra parameter). You can't fetch relations out of the box anymore as you would normally do by just requesting the relations in the GraphQL body.
Some words on the benefits of UDFs and the hybrid of GraphQL/FQL
Before you shy away from FQL (and yes, we do have to add range queries and are working on that), here is some explanation on the UDF approach in general and why it makes sense to think about it anyway.
You will at some point encounter things in GraphQL that are just impossible (complex conditional transactions, e.g. update a document and update another document only if some condition that results from the previous update is true). Users of other GraphQL implementations typically solve this by writing a serverless function whenever they have to implement advanced logic or transactions.
FaunaDB's answer to this is its User Defined Functions (UDFs). A UDF is not a serverless function; it's a FaunaDB function implemented in FQL, which might seem cumbersome at first, but it's important to realize that it gives you the same benefits (multi-region, strong consistency, scalability, free tier, pay-as-you-go) that FaunaDB provides.
I am trying to do a simple upsert to an array field based on a branch condition. However, branch does not accept a ReQL expression as an argument, and I get the error Expected type SELECTION but found DATUM.
This is probably something obvious I've missed; however, I can't find a working example anywhere.
Sample source:
var userId = 'userId';
var itemId = 'itemId';
r.db('db').table('items').get(itemId).do(function(item) {
  return item('elements').default([]).contains(function(element) {
    return element('userId').eq(userId);
  }).branch(
    r.expr("Element already exist"),
    // Error: Expected type SELECTION but found DATUM
    item.update({
      elements: item('elements').default([]).append({
        userId: 'userId'
      })
    })
  )
})
The problem here is that item is a datum, not a selection. This happens because you used r.do. The variable doesn't retain information about where the object originally came from.
A solution that might seem to work would be to write a new r.db('db').table('items').get(itemId) expression. The problem with that option is that the behavior isn't atomic -- two different queries might append the same element to the 'elements' array. Instead, you should write your query in the form r.db('db').table('items').get(itemId).update(function(item) { return <something>; }) so that the update gets applied atomically.
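A sketch of that shape for this exact case (untested, but it uses only standard ReQL; returning an empty object from the update function leaves the document unchanged):

r.db('db').table('items').get(itemId).update(function(item) {
  return r.branch(
    // Is there already an element for this user?
    item('elements').default([]).contains(function(element) {
      return element('userId').eq(userId);
    }),
    // Yes: change nothing.
    {},
    // No: append the new element, atomically with the check above.
    {elements: item('elements').default([]).append({userId: userId})}
  );
})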
After looking at some SO questions and issues on the RethinkDB GitHub, I failed to come to a clear conclusion: is an atomic upsert possible?
Essentially I would like to perform the same operation as ZINCRBY using Redis.
If member does not exist in the sorted set, it is added with increment as its score (as if its previous score was 0.0). If key does not exist, a new sorted set with the specified member as its sole member is created.
The current implementation appears to differ from almost all databases that I have used, with the data being replaced or inserted, not updated. This is a simple use case: update the last visit, update the number of clicks, update a product quantity. So I must be missing something very obvious, because I cannot see a simple way to do this.
Yes, it is possible. After get on the key, perform an atomic replace. Something like this might work:
function set_or_increment_score(player, points) {
  return r.table('scores').get(player).replace(
    row => ({
      id: player,
      score: r.branch(
        row.eq(null),
        points,
        row('score').add(points))
    }));
}
It has the following behaviour:
> set_or_increment_score("alice", 1).run(conn)
{ inserted: 1 }
> set_or_increment_score("alice", 2).run(conn)
{ replaced: 1 }
It works because get returns null when the document doesn't exist, and a replace on a non-existing document turns into an insert. See the documentation for replace.
So I ended up using the following code to work around the no-update issue.
r.db("test").table("t").insert(
{id:"A", type:"player", species:"warrior", score:0, xp:0, armor:0},
{conflict: function(id, oldDoc, newDoc) {
return newDoc.merge(oldDoc).merge(
{armor: oldDoc("armor").add(1)});
}
}
)
Do you think this is more readable/elegant or do you see any issues with the code compared to your sample?
I have a table "posts" with "timestamp".
Now, for all users that have more than one post, I want to get all their posts EXCEPT the most recent one.
With this query I can successfully check the users who have more than 1 post:
r.table("post")
.group('userId')
.count()
.ungroup()
.filter(r.row("reduction").gt(1))
I can get the last post of a specific user by doing:
r.table("post")
.filter({userId: 'xxx'})
.max('timestamp')
Now I need to tie these together somehow, and then compare the timestamp of each row with the max('timestamp') to see whether they are not equal. The following is what I had, but it's obviously wrong:
.filter(r.row('timestamp').ne(r.row('timestamp').max('timestamp')('timestamp')))
Any advice on how to bring all this together?
Something like this ought to work:
r.table('post')
  .group({index: 'userId'})
  .ungroup()
  .filter(function(doc) {
    return doc('reduction').count().gt(1)
  })
  .group('group')('reduction')
  .nth(0)
  .orderBy(r.desc('timestamp'))
  .skip(1)
With reservations for syntax errors; I built this query in Python and then converted it to JavaScript. I'm especially unsure about the .nth(0) part, which I've never used in JavaScript. In Python it's just [0].
I have records with the following field: { version: "2.4.1-5dev" }. I want to order / index them by version. The documents should be ordered by the version / build combination (ascending by their partially-numeric values). Is it possible, and if so, how can I do that?
edit:
I still can't index/order by version. Again, I want to be able to sort by version, even if there are words like "dev" in them.
In Python, there's pkg_resources.parse_version(), which lets you compare two versions via pkg_resources.parse_version(ver1) > pkg_resources.parse_version(ver2), and it works even for somewhat crazy version naming.
Is there any chance I can use pkg_resources.parse_version() as a cmp function for indexing / ordering, or alternatively, get the same result within a query when trying to order documents by the version field?
You can create an index with a function; in JavaScript it would be something like:
r.table("product").indexCreate("version", function(product) {
return r.branch(
product("version").match('dev'),
null,
product("version").split('.').concatMap(function(version) {
return version.split('-')
}).map(function(num) {
return num.coerceTo('NUMBER')
})
})
That works because null is currently not stored in secondary indexes; this behavior may change, though -- see https://github.com/rethinkdb/rethinkdb/issues/1032
The third argument to r.branch splits the value on . and -, then coerces each part to a number. For example:
r.expr("2.4.1-5").split('.').concatMap(r.row.split('-')).map(r.row.coerceTo('NUMBER'))
// will return [2,4,1,5]
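Once the index is ready, ordering by it is straightforward; note that documents whose version matched 'dev' won't appear in the results, since their index value is null:

r.table("product").orderBy({index: "version"})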