Subqueries to filter out in rethinkdb - rethinkdb

How do I write an equivalent statement in RethinkDB using the Python client driver?
SELECT id fields FROM tasks WHERE id NOT IN (SELECT id FROM finished_tasks)
This is what I tried:
r.table('tasks').filter(lambda row: r.not(row['id'] in r.table('finished_tasks').pluck("id").coerce_to('array').run()

In JavaScript:
r.table("tasks").filter(function(task) {
  return r.expr(r.table("finished_tasks").pluck("id")).map(function(i) {
      return i("id");
    }).coerceTo('array')
    .contains(task("id"))
    .not();
})
In Python it should look much the same.
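A possible Python translation of the JavaScript above, offered as a sketch rather than a tested answer. The helper names `unfinished_task_ids_query` and `not_in` are made up for illustration. The first function takes the driver object `r` as a parameter, so the block needs no server; the second is a plain-Python model of what the query is expected to return.

```python
def unfinished_task_ids_query(r):
    """Build (but do not run) the ReQL query; `r` is the RethinkDB driver object."""
    # IDs of all finished tasks, coerced to an array so contains() can test membership.
    finished_ids = r.table('finished_tasks')['id'].coerce_to('array')
    # Keep tasks whose id is NOT in that array, then return only the id field.
    return r.table('tasks').filter(
        lambda task: finished_ids.contains(task['id']).not_()
    )['id']


def not_in(tasks, finished_tasks):
    """Plain-Python model of the same NOT IN filter, for lists of dicts."""
    finished_ids = {row['id'] for row in finished_tasks}
    return [row['id'] for row in tasks if row['id'] not in finished_ids]


# Hypothetical sample rows.
tasks = [{'id': 1}, {'id': 2}, {'id': 3}]
finished_tasks = [{'id': 2}]
print(not_in(tasks, finished_tasks))  # [1, 3]
```

With a live connection `conn`, the query would be run as `unfinished_task_ids_query(r).run(conn)`. How `r` is imported depends on the driver version.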

I don't have a Python example, so here is a JavaScript one; you can compare it against the API docs to write the Python equivalent.
Assume that id is also the primary key of the finished_tasks table.
r.table('tasks').filter(function(task) {
  return r.table('finished_tasks').get(task('id')).eq(null)
})
If id isn't the primary key of finished_tasks, create a secondary index for it, then use that index in getAll:
// Create index
r.table('finished_tasks').indexCreate('finished_task', r.row('id'))
// Using index for efficient query
r.table('tasks').filter(function(task) {
  return r.table('finished_tasks').getAll(task('id'), {index: 'finished_task'}).count().eq(0)
})
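For the Python driver, the two JavaScript variants above might be written roughly as follows. This is a sketch under assumptions: the function names are hypothetical, `r` is passed in so nothing here touches a server, and the dictionary-based `unfinished` function only models the index lookup in memory.

```python
def unfinished_by_primary_key(r):
    """Variant 1: id is the primary key of finished_tasks."""
    # get() yields null when no finished task has this key.
    return r.table('tasks').filter(
        lambda task: r.table('finished_tasks').get(task['id']).eq(None)
    )


def unfinished_by_secondary_index(r, index_name='finished_task'):
    """Variant 2: look the id up through a secondary index."""
    return r.table('tasks').filter(
        lambda task: r.table('finished_tasks')
        .get_all(task['id'], index=index_name).count().eq(0)
    )


def unfinished(tasks, finished_index):
    """In-memory model: finished_index maps an id to the finished rows with that id."""
    return [t for t in tasks if not finished_index.get(t['id'])]


# Hypothetical sample rows.
tasks = [{'id': 1}, {'id': 2}, {'id': 3}]
finished_index = {2: [{'id': 2}]}
print(unfinished(tasks, finished_index))  # [{'id': 1}, {'id': 3}]
```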

Related

Hasura GraphQL query order by nested array relationships (with only one element)?

According to the Hasura documentation, it is not possible to order by nested array relationships. However, I'm using that relationship to get only one element from the array (e.g. the latest entry in that table). Is there any way to transform that array (with one element) into an object so that I can order by it in the root query? Example:
query GetMachinesQuery {
  machines {
    machine_id
    machine_detail
    last_upgrade: upgrades(order_by: { created_at: desc }, limit: 1) {
      upgrade_state {
        updated_at
        status
      }
    }
  }
}
Is there any way to sort the root query by one of the fields (e.g. status) in last_upgrade? A possible workaround is to create a view (doing the joins to get the latest upgrade info for each machine) and then use an object relationship. Are there any other alternatives with Hasura?
Thank you!

How to use RethinkDB indices in the following scenario?

I'd like to use an index to select all documents that don't have a particular nested field set.
In my situation with the JS-api this works out to this:
r.table('sometable').filter(r.row('_state').hasFields("modifiedMakeRefs").not())
How would I use an index on the above? I.e.: filter doesn't support defining indices afaik?
You would write:
r.table('sometable').indexCreate('idx_name', function(row) {
  return row('_state').hasFields("modifiedMakeRefs");
})
And then:
r.table('sometable').getAll(false, {index: 'idx_name'})
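A rough Python counterpart of the two statements above, plus an in-memory model of what such an index holds. The function names and sample documents are hypothetical, and every sample document has a `_state` object; what happens to documents without `_state` is outside this sketch.

```python
def create_state_index(r):
    """Index each document by whether _state has a non-null modifiedMakeRefs."""
    return r.table('sometable').index_create(
        'idx_name', lambda row: row['_state'].has_fields('modifiedMakeRefs')
    )


def docs_without_field(r):
    """Fetch the documents whose index key is false."""
    return r.table('sometable').get_all(False, index='idx_name')


# In-memory model: the index maps the boolean key to document ids.
docs = [
    {'id': 1, '_state': {'modifiedMakeRefs': ['a']}},
    {'id': 2, '_state': {}},
    {'id': 3, '_state': {'modifiedMakeRefs': None}},
]
index = {True: [], False: []}
for doc in docs:
    # has_fields treats a null value like a missing field.
    key = doc['_state'].get('modifiedMakeRefs') is not None
    index[key].append(doc['id'])

print(index[False])  # [2, 3]
```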

Can you do a join using an embedded array in a document with rethinkdb?

Say I have a user table with a property called favoriteUsers which is an embedded array. i.e.
users
{
  name: 'bob',
  favoriteUsers: ['jim', 'tim'] //can you have an index on an embedded array?
}
user_presence
{
  name: 'jim', //index on name
  online_since: 14440000
}
Can I do an inner or eqJoin against say a 2nd table using the embedded property, or would I have to pull favoriteUsers out of the users table and into a join table like in traditional sql?
r.table('users')
  .getAll('bob', {index:'name'})
  // inner join user_presence on user_presence.name in users.highlights
  .eqJoin("name", r.table('user_presence'), {index:'name'})
Eventually, I'd like to call changes() on the query so that I can get realtime updates when the presence of a user's favorite users changes.
eqJoin can work on an embedded document, but it works by comparing a value that we pick or derive from the embedded document against a secondary index on the right table.
For any more complicated join, I would rather use concatMap together with getAll.
Say we want to fetch a user and the user_presence rows of their favoriteUsers:
r.table('users')
  .getAll('bob', {index: 'name'})
  .concatMap(function(user) {
    return r.table('user_presence').filter(function(presence) {
      return user("favoriteUsers").contains(presence("name"))
    })
  })
So now you get the data and do the join yourself, querying whatever extra data you need. My query may contain syntax errors, but I hope it gives you the idea.
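The concatMap join above could look like this in the Python driver, together with a plain-Python model of the result. Both function names are hypothetical, and the presence rows for `tim` and `amy` are invented sample data.

```python
def favorite_presence_query(r, user_name):
    """Build the concat_map join; `r` is the RethinkDB driver object."""
    return r.table('users').get_all(user_name, index='name').concat_map(
        lambda user: r.table('user_presence').filter(
            lambda presence: user['favoriteUsers'].contains(presence['name'])
        )
    )


users = [
    {'name': 'bob', 'favoriteUsers': ['jim', 'tim']},
    {'name': 'amy', 'favoriteUsers': ['bob']},
]
user_presence = [
    {'name': 'jim', 'online_since': 14440000},
    {'name': 'tim', 'online_since': 14450000},
    {'name': 'amy', 'online_since': 14460000},
]


def favorite_presence(user_name):
    """In-memory model: one presence row per favorite that has one."""
    # The dict stands in for the index on user_presence.name.
    presence_by_name = {p['name']: p for p in user_presence}
    result = []
    for user in users:
        if user['name'] != user_name:
            continue
        for favorite in user['favoriteUsers']:
            if favorite in presence_by_name:
                result.append(presence_by_name[favorite])
    return result


print([p['name'] for p in favorite_presence('bob')])  # ['jim', 'tim']
```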

Rethinkdb - filtering by value in another table

In our RethinkDB database, we have a table for orders, and a separate table that stores all the order items. Each entry in the OrderItems table has the orderId of the corresponding order.
I want to write a query that gets all SHIPPED order items (just the items from the OrderItems table ... I don't want the whole order). But whether the order is "shipped" is stored in the Order table.
So, is it possible to write a query that filters the OrderItems table based on the "shipped" value for the corresponding order in the Orders table?
If you're wondering, we're using the JS version of Rethinkdb.
UPDATE:
OK, I figured it out on my own! Here is my solution. I'm not positive that it is the best way (and certainly isn't super efficient), so if anyone else has ideas I'd still love to hear them.
I did it by running a .merge() to create a new field based on the Order table, then did a filter based on that value.
A semi-generalized query with filter from another table for my problem looks like this:
r.table('orderItems')
  .merge(function(orderItem){
    return {
      orderShipped: r.table('orders').get(orderItem('orderId')).pluck('shipped') // I am plucking just the "shipped" value, since I don't want the entire order
    }
  })
  .filter(function(orderItem){
    return orderItem('orderShipped')('shipped').gt(0) // Filtering based on that new "shipped" value
  })
Filtering directly is much simpler:
r.table('orderItems').filter(function(orderItem){
  return r.table('orders').get(orderItem('orderId'))('shipped').default(0).gt(0)
})
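Adding .default(0) is a good idea so that a null or missing shipped value doesn't break the query.
It's probably better to create a proper index before querying. Without an index, every query has to scan the whole table, and any operation that builds an array fails beyond RethinkDB's default array limit of 100,000 elements.
Also, filter itself doesn't use secondary indexes; only the get inside it benefits from the primary key.
A proper way is to use getAll and concatMap.
First, create the indexes: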
r.table("orderItems").indexCreate("orderId")
r.table("orders").indexCreate("shipStatus", r.row("shipped").default(0).gt(0))
With that index, we can find all shipped orders:
r.table("orders").getAll(true, {index: "shipStatus"})
Now we use concatMap to turn each order into its order items:
r.table("orders")
  .getAll(true, {index: "shipStatus"})
  .concatMap(function(order) {
    return r.table("orderItems").getAll(order("id"), {index: "orderId"}).coerceTo("array")
  })
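To make the two-index approach concrete, here is a small in-memory Python model of it, with dictionaries standing in for the shipStatus and orderId indexes. The sample rows and the name `shipped_items` are invented for illustration; this is not driver code.

```python
from collections import defaultdict

orders = [
    {'id': 'o1', 'shipped': 1},
    {'id': 'o2', 'shipped': 0},
    {'id': 'o3'},  # no "shipped" field: treated as 0, like .default(0)
]
order_items = [
    {'id': 'i1', 'orderId': 'o1'},
    {'id': 'i2', 'orderId': 'o2'},
    {'id': 'i3', 'orderId': 'o1'},
    {'id': 'i4', 'orderId': 'o3'},
]

# Models the "shipStatus" index: key is shipped.default(0).gt(0).
ship_status = defaultdict(list)
for order in orders:
    ship_status[order.get('shipped', 0) > 0].append(order)

# Models the "orderId" index on orderItems.
items_by_order = defaultdict(list)
for item in order_items:
    items_by_order[item['orderId']].append(item)


def shipped_items():
    """Models getAll(true, {index: 'shipStatus'}) followed by concatMap."""
    result = []
    for order in ship_status[True]:
        result.extend(items_by_order[order['id']])
    return result


print([item['id'] for item in shipped_items()])  # ['i1', 'i3']
```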

RethinkDB index for filter + orderby

Let's say a comments table has the following structure:
id | author | timestamp | body
I want to use an index to efficiently execute the following query:
r.table('comments').getAll("me", {index: "author"}).orderBy('timestamp').run(conn, callback)
Is there another efficient method I can use?
It looks like an index currently isn't supported on a filtered result of a table. When I create an index on timestamp and pass it as a hint, orderBy('timestamp', {index: timestamp}), I get the following error:
RqlRuntimeError: Indexed order_by can only be performed on a TABLE. in:
This can be accomplished with a compound index on the "author" and "timestamp" fields. You can create such an index like so:
r.table("comments").index_create("author_timestamp", lambda x: [x["author"], x["timestamp"]])
Then you can use it to perform the query like so:
(r.table("comments")
    .between(["me", r.minval], ["me", r.maxval])
    .order_by(index="author_timestamp"))
The between works like the get_all did in your original query, because it gets only documents that have the author "me" and any timestamp. Then we do an order_by on the same index, which orders by the timestamp (since all of the keys have the same author). The key here is that you can only use one index per table access, so we need to cram all this information into the same index.
It's currently not possible to chain a getAll with an orderBy and use an index for both.
Ordering with an index can be done only on a table right now.
NB: The command to orderBy with an index is orderBy({index: 'timestamp'}) (no need to repeat the key)
The answer by Joe Doliner was selected but it seems wrong to me.
First, in the between command, no index was specified, so between will use the primary index.
Second, between returns a selection:
table.between(lowerKey, upperKey[, {index: 'id', leftBound: 'closed', rightBound: 'open'}]) → selection
and orderBy cannot use an index when run on a selection; only a table can use an index.
table.orderBy([key1...], {index: index_name}) → selection<stream>
selection.orderBy(key1, [key2...]) → selection<array>
sequence.orderBy(key1, [key2...]) → array
You want to create what's called a "compound index." After that, you can query it efficiently.
//create compound index
r.table('comments')
  .indexCreate(
    'author__timestamp', [r.row("author"), r.row("timestamp")]
  )
//the query
r.table('comments')
  .between(
    ['me', r.minval],
    ['me', r.maxval],
    {index: 'author__timestamp'}
  )
  .orderBy({index: r.desc('author__timestamp')}) //or "r.asc"
  .skip(0) //pagi
  .limit(10) //nation!
I like using two underscores for compound indexes. It's just stylistic. Doesn't matter how you choose to name your compound index.
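Why a single compound index can serve both the author filter and the timestamp ordering may be easier to see with a small in-memory model. The sketch below keeps a sorted list of (author, timestamp) keys and uses binary search for the between range; the sample comments and the `page` function are hypothetical, and infinities stand in for r.minval and r.maxval.

```python
import bisect

comments = [
    {'id': 1, 'author': 'me', 'timestamp': 30, 'body': 'c'},
    {'id': 2, 'author': 'you', 'timestamp': 10, 'body': 'x'},
    {'id': 3, 'author': 'me', 'timestamp': 10, 'body': 'a'},
    {'id': 4, 'author': 'me', 'timestamp': 20, 'body': 'b'},
]

# Models the compound index: entries sorted by (author, timestamp).
entries = sorted(((c['author'], c['timestamp']), c['id']) for c in comments)
keys = [key for key, _ in entries]
ids = [doc_id for _, doc_id in entries]


def page(author, skip=0, limit=10):
    """Models between([author, minval], [author, maxval]) + descending order + skip/limit."""
    lo = bisect.bisect_left(keys, (author, float('-inf')))
    hi = bisect.bisect_right(keys, (author, float('inf')))
    newest_first = ids[lo:hi][::-1]  # the range is already sorted by timestamp
    return newest_first[skip:skip + limit]


print(page('me', skip=0, limit=2))  # [1, 4]
```

Because all keys in the range share the same author, walking the range in index order is the same as ordering by timestamp, so no separate sort step is needed.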
Reference: How to use getall with orderby in RethinkDB
