Rethinkdb - filtering by value in another table - rethinkdb

In our RethinkDB database, we have a table for orders, and a separate table that stores all the order items. Each entry in the OrderItems table has the orderId of the corresponding order.
I want to write a query that gets all SHIPPED order items (just the items from the OrderItems table ... I don't want the whole order). But whether the order is "shipped" is stored in the Order table.
So, is it possible to write a query that filters the OrderItems table based on the "shipped" value for the corresponding order in the Orders table?
If you're wondering, we're using the JS version of Rethinkdb.
UPDATE:
OK, I figured it out on my own! Here is my solution. I'm not positive that it is the best way (and certainly isn't super efficient), so if anyone else has ideas I'd still love to hear them.
I did it by running a .merge() to create a new field based on the Order table, then did a filter based on that value.
A semi-generalized query with filter from another table for my problem looks like this:
r.table('orderItems')
.merge(function(orderItem){
return {
orderShipped: r.table('orders').get(orderItem('orderId')).pluck('shipped') // I am plucking just the "shipped" value, since I don't want the entire order
}
})
.filter(function(orderItem){
return orderItem('orderShipped')('shipped').gt(0) // Filtering based on that new "shipped" value
})

it will be much easier.
r.table('orderItems').filter(function(orderItem){
return r.table('orders').get(orderItem('orderId'))('shipped').default(0).gt(0)
})
And it should be better to avoid result NULL, add '.default(0)'

It's probably better to create proper index before any finding. Without index, you cannot find document in a table with more than 100,000 element.
Also, filter is limit for only primary index.
A propery way is to using getAll and map
First, create index:
r.table("orderItems").indexCreate("orderId")
r.table("orders").indexCreate("shipStatus", r.row("shipped").default(0).gt(0))
With that index, we can find all of shipper order
r.table("orders").getAll(true, {index: "shipStatus"})
Now, we will use concatMap to transform the order into its equivalent orderItem
r.table("orders")
.getAll(true, {index: "shipStatus"})
.concatMap(function(order) {
return r.table("orderItems").getAll(order("id"), {index: "orderId"}).coerceTo("array")
})

Related

Is there a way to compare each item to a aggreated value?

I'm new to graphQL and Hasura. I'm trying(in Hasura) to let me users provide custom aggregation (ideally in the form of a normal graphQL query) and have then each item the results compared against the aggreation.
Here's a example. Assume I have this schema:
USERTABLE:
userID
Name
Age
City
Country
Gender
HairColor
INCOMETABLE:
userID
Income
I created a relationship in hasura and I can query the data but my users want to do custom scoring of users' income level. For example, one user may want to query the data broken down by country and gender.
For the first example the result maybe:
{Country : Canada
{ gender : female
{ userID: 1,
Name: Nancy Smith,..
#data below is on aggregated results
rank: 1
%fromAverage: 35%
}...
Where I'm struggling is the data showing the users info relative to the aggregated data.
for Rank, I get the order by sorting but I'm not sure how to display the relative ranking and for the %fromAverage, I'm not sure how to do it at all.
Is there a way to do this in Hasura? I suspected that actions might be able to do this but I'm not sure.
You can use track a Postgres view. Your view would have as many fields as you'd like calculated in SQL and tracked as a separate "table" on your graphql api.
I am giving examples below based on a simplification where you have just table called contacts with just a single field called: id which is an auto-integer. I am just adding the id of the current contact to the avg(id) (a useless endeavor to be sure; just to illustrate...). Obviously you can customize the logic to your liking.
A simple implementation of a view would look like this (make sure to hit 'track this' in hasura:
CREATE OR REPLACE VIEW contact_with_custom AS
SELECT id, (SELECT AVG(ID) FROM contacts) + id as custom FROM contacts;
See Extend with views
Another option is to use a computed field. This is just a postgres function that takes a row as an argument and returns some data and it just adds a new field to your existing 'table' in the Graphql API that is the return value of said function. (you don't 'track this' function; once created in the SQL section of Hasura, you add it as a 'computed field' under 'Modify' for the relevant table) Important to note that this option does not allow you to filter by this computed function, whereas in a view, all fields are filterable.
In the same schema mentioned above, a function for a computed field would look like this:
CREATE OR REPLACE FUNCTION custom(contact contacts)
RETURNS Numeric AS $$
SELECT (SELECT AVG(ID) from contacts ) + contact.id
$$ LANGUAGE sql STABLE;
Then you select this function for your computed field, naming it whatever you'd like...
See Computed fields

Can you do a join using an embedded array in a document with rethinkdb?

Say I have a user table with a property called favoriteUsers which is an embedded array. i.e.
users
{
name:'bob'
favoriteUsers:['jim', 'tim'] //can you have an index on an embedded array?
}
user_presence
{
name:'jim', //index on name
online_since:14440000
}
Can I do an inner or eqJoin against say a 2nd table using the embedded property, or would I have to pull favoriteUsers out of the users table and into a join table like in traditional sql?
r.table('users')
.getAll('bob', {index:'name'})
// inner join user_presence on user_presence.name in users.highlights
.eqJoin("name", r.table('user_presence'), {index:'name'})
Eventually, I'd like to call changes() on the query so that I can get a realtime update of the users favorite users presence changes
eqJoin can works on embedded document, but it works by compare a value which we transform/pick from the embedded document to mark secondary index on right table.
In any other complicated join, I would rather use concatMap together with getAll.
Let's say we can fetch user and user_presence of their favoriteUsers
r.table('users')
.getAll('bob', {index: 'name'})
.concatMap(function(user) {
return r.table('user_presence').filter(function(presence) {
return user("favoriteUsers").contains(presence("name"))
})
)
So ideally, now you get the data and do the join yourself by querying extra data that you need. My query may have some syntax/error but I hope it gives you the idea

How to create an outputschema which has nested bags in pig

I am trying out Pig UDFs and have been reading about it. While the online content was helpful, I am still not sure if I understand how to create a complex output schema which has nested bags.
Please help.The requirement is as follows. Say for example, I am analyzing e-commerce orders data. An order can have multiple products ordered in them.
I have the product level data grouped at an order level. This is the input to my UDF. So each grouped data at an order level containing information about the products in each order is my input.
InputSchema:
(grouped_at_order, {
(input_column_values_at_product1_level),
(input_column_values_at_product2_level)
})
I would be computing metrics both at an order level and at a product level in UDF. For example: sum(products) is an order level metric, color of each product is a product level metric. So, ForEach row grouped at an order level sent to UDF, I want to compute the order level & item level metrics.
Expected OutputSchema:
{
{ (orders, (computed_values_at_order_level)) },
{(productlevel,
{
(computed_values_at_product1_level),
(computed_values_at_product2_level),
(computed_values_at_product3_level)
}
)
}
}
The objective then is to persist the data at order level and product level in two separate output tables from pig.
Is there a better way of doing the same?
As #maxymoo said, before returning nested data from an UDF, I would check first if I really need it.
Anyway, if you do, the solution is not complicated but painfull. You just create schema, add field, then create a schema for the tuple, add the fields or the subbags into, and so on.
#Override
public Schema outputSchema(Schema input) {
Schema statsOrderLevel = new Schema();
statsOrderLevel.add(new FieldSchema("value", DataType.CHARARRAY));
Schema statsOrderLevelTuple = new Schema();
statsOrderLevelTuple.add(new FieldSchema(null, statsOrderLevel, DataType.TUPLE);
Schema statsOrderLevelBag = new Schema();
statsOrderLevelBag.add(new FieldSchema("stats", statsOrderLevelTuple, DataType.BAG));
[...]
}

Rethinkdb: Know which records were updated

Is there any way to update a sequence and know the primary keys of the updated documents?
table.filter({some:"value"}).update({something:"else"})
Then know the primary keys of the records that were updated without needing a second query?
It's currently not possible to return multiple values with {returnVals: true}, see for example https://github.com/rethinkdb/rethinkdb/issues/1382
There's is however a way to trick the system with forEach
r.db('test').table('test').filter({some: "value"}).forEach(function(doc) {
return r.db('test').table('test').get(doc('id')).update({something: "else"}, {returnVals: true}).do(function(result) {
return {generated_keys: [result("new_val")]}
})
})("generated_keys")
While it works, it's really really hack-ish. Hopefully with array limits, returnVals will soon be available for range writes.

RethinkDB index for filter + orderby

Lets say a comments table has the following structure:
id | author | timestamp | body
I want to use index for efficiently execute the following query:
r.table('comments').getAll("me", {index: "author"}).orderBy('timestamp').run(conn, callback)
Is there other efficient method I can use?
It looks that currently index is not supported for a filtered result of a table. When creating an index for timestamp and adding it as a hint in orderBy('timestamp', {index: timestamp}) I'm getting the following error:
RqlRuntimeError: Indexed order_by can only be performed on a TABLE. in:
This can be accomplished with a compound index on the "author" and "timestamp" fields. You can create such an index like so:
r.table("comments").index_create("author_timestamp", lambda x: [x["author"], x["timestamp"]])
Then you can use it to perform the query like so:
r.table("comments")
.between(["me", r.minval], ["me", r.maxval]
.order_by(index="author_timestamp)
The between works like the get_all did in your original query because it gets only documents that have the author "me" and any timestamp. Then we do an order_by on the same index which orders by the timestamp(since all of the keys have the same author.) the key here is that you can only use one index per table access so we need to cram all this information in to the same index.
It's currently not possible chain a getAll with a orderBy using indexes twice.
Ordering with an index can be done only on a table right now.
NB: The command to orderBy with an index is orderBy({index: 'timestamp'}) (no need to repeat the key)
The answer by Joe Doliner was selected but it seems wrong to me.
First, in the between command, no indexer was specified. Therefore between will use primary index.
Second, the between return a selection
table.between(lowerKey, upperKey[, {index: 'id', leftBound: 'closed', rightBound: 'open'}]) → selection
and orderBy cannot run on selection with an index, only table can use index.
table.orderBy([key1...], {index: index_name}) → selection<stream>
selection.orderBy(key1, [key2...]) → selection<array>
sequence.orderBy(key1, [key2...]) → array
You want to create what's called a "compound index." After that, you can query it efficiently.
//create compound index
r.table('comments')
.indexCreate(
'author__timestamp', [r.row("author"), r.row("timestamp")]
)
//the query
r.table('comments')
.between(
['me', r.minval],
['me', r.maxval],
{index: 'author__timestamp'}
)
.orderBy({index: r.desc('author__timestamp')}) //or "r.asc"
.skip(0) //pagi
.limit(10) //nation!
I like using two underscores for compound indexes. It's just stylistic. Doesn't matter how you choose to name your compound index.
Reference: How to use getall with orderby in RethinkDB

Resources