How Should Complex ReQL Queries be Composed? - rethinkdb

Are there any best practices or ReQL features that help with composing complex ReQL queries?
In order to illustrate this, imagine a fruits table. Each document has the following structure.
{
"id": 123,
"name": "name",
"colour": "colour",
"weight": 5
}
If we wanted to retrieve all green fruits, we might use the following query.
r
.db('db')
.table('fruits')
.filter({colour: 'green'})
However, in more complex cases, we might wish to use a variety of complex command combinations. In such cases, bespoke queries could be written for each case, but this could be difficult to maintain and could violate the Don't Repeat Yourself (DRY) principle. Instead, we might wish to write bespoke queries which could chain custom commands, thus allowing complex queries to be composed in a modular fashion. This might take the following form.
r
.db('db')
.table('fruits')
.custom(component)
The component could be a function which accepts the last entity in the command chain as its argument and returns something, as follows.
function component(chain)
{
return chain
.filter({colour: 'green'});
};
This is not so much a feature proposal as an illustration of the problem of complex queries, although such a feature does seem intuitively useful.
Personally, my own efforts in resolving this problem have involved the creation of a compose utility function. It takes an array of functions as its main argument. Each function is called, passed a part of the query chain, and is expected to return an amended version of the query chain. Once the iteration is complete, a composition of the query components is returned. This can be viewed below.
function compose(queries, parameters)
{
    if (queries.length < 2)
    {
        throw new Error('Must be two or more queries.');
    }
    // The first function starts the chain; every later function extends it.
    let composition = queries[0](parameters);
    for (let index = 1; index < queries.length; index++)
    {
        composition = queries[index](composition, parameters);
    }
    return composition;
}
function startQuery()
{
    // Begin from a table so that the later commands have a sequence to act on.
    return RethinkDB.db('db').table('fruits');
}
function filterQuery1(query)
{
    return query.filter({name: 'Grape'});
}
function filterQuery2(query)
{
    return query.filter({colour: 'Green'});
}
function filterQuery3(query)
{
    return query.orderBy(RethinkDB.desc('created'));
}
let composition = compose([startQuery, filterQuery1, filterQuery2, filterQuery3]);
composition.run(connection);
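A more compact way to express the same idea is a reduce over the step functions. The sketch below is only an illustration: pipe and fakeQuery are hypothetical names, and fakeQuery is a stand-in for a chainable query object rather than the RethinkDB driver, so that the example runs without a database.

```javascript
// Apply each step to the result of the previous one, left to right.
function pipe(start, steps) {
  return steps.reduce((query, step) => step(query), start);
}

// Hypothetical stand-in for a chainable query (NOT the RethinkDB driver):
// every command returns a new object that remembers the commands so far.
function fakeQuery(commands = []) {
  return {
    commands,
    filter: (predicate) => fakeQuery([...commands, ['filter', predicate]]),
    orderBy: (key) => fakeQuery([...commands, ['orderBy', key]]),
  };
}

// Reusable query components: each takes a query and returns an extended query.
const onlyGrapes = (query) => query.filter({ name: 'Grape' });
const onlyGreen = (query) => query.filter({ colour: 'green' });
const newestFirst = (query) => query.orderBy('created');

const composed = pipe(fakeQuery(), [onlyGrapes, onlyGreen, newestFirst]);
console.log(composed.commands);
```

Assuming a real driver instance, the same helper would presumably be started from a table, for example pipe(RethinkDB.db('db').table('fruits'), [onlyGrapes, onlyGreen]), and the result run with .run(connection).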
It would be great to know whether something like this exists, whether there are best practices for handling such cases, or whether this is an area where ReQL could benefit from improvements.

The RethinkDB documentation states it clearly: all ReQL queries are chainable.
"Queries are constructed by making function calls in the programming language you already know. You don’t have to concatenate strings or construct specialized JSON objects to query the database. All ReQL queries are chainable. You begin with a table and incrementally chain transformers to the end of the query using the . operator."
You do not need an extra composition layer that only obscures your code; it makes the query harder to read and ends up being unnecessary.
The simple way is to assign the RethinkDB query and the filter to variables. Any time you need more complex logic, add it directly to these variables, then call run() once the query is complete.
Suppose I have to search a list of products with several optional filter inputs and paginate the results. The following JavaScript code illustrates the approach (it is simplified for illustration only).
let sorterDirection = 'asc';
let sorterColumnName = 'created_date';
var buildFilter = r.row('app_id').eq(appId).and(r.row('status').eq('public'))
// if there is no condition to start up, you could use r.expr(true)
// append every filter into the buildFilter var if they are positive
if (escapedKeyword != "") {
buildFilter = buildFilter.and(r.row('name').default('').downcase().match(escapedKeyword))
}
// you may have different filter to add, do the same to append them into buildFilter.
// start to make query
let query = r.table('yourTableName').filter(buildFilter);
query.orderBy(r[sorterDirection](sorterColumnName))
.slice(pageIndex * pageSize, (pageIndex * pageSize) + pageSize).run();
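The slice bounds in the query above follow directly from the page index and the page size. As a minimal sketch, assuming a zero-based pageIndex, the arithmetic can be isolated in a small helper. pageBounds is a hypothetical name, and a plain array stands in for the query result, since Array.prototype.slice uses the same inclusive-start, exclusive-end convention as the default ReQL slice.

```javascript
// Convert a zero-based page index and a page size into the
// [start, end) offsets expected by slice().
function pageBounds(pageIndex, pageSize) {
  if (!Number.isInteger(pageIndex) || pageIndex < 0) {
    throw new RangeError('pageIndex must be a non-negative integer');
  }
  if (!Number.isInteger(pageSize) || pageSize < 1) {
    throw new RangeError('pageSize must be a positive integer');
  }
  const start = pageIndex * pageSize;
  return [start, start + pageSize];
}

// A plain array standing in for 25 ordered rows.
const rows = Array.from({ length: 25 }, (_, i) => i);

// Third page (index 2) with 10 rows per page: only 5 rows are left.
const [start, end] = pageBounds(2, 10);
const page = rows.slice(start, end);
console.log(page); // [ 20, 21, 22, 23, 24 ]
```

Validating the inputs up front keeps a negative or fractional page number from silently producing an unexpected slice.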

Related

How to add List in GraphQL

I am new to GraphQL. My GraphQL query needs the following format: {"offers": false, "new": true, "favourites": true}, where offers, new and favourites are fields of a Filters Java class. A query can have any number of such filters. How do I write a GraphQL query that accepts a list of filters and builds a query like the one above? So far I have only tried writing the hardcoded query above.
This question is similar to yours in that it discusses the concept of filtering with GraphQL. There is also a comment on the first answer that leads to some GraphCool documentation that discusses their approach to filtering. (GraphCool Docs). I'm not super knowledgeable on the Java front so hopefully this is still helpful.
Generally, though, you can include optional params in your query that, if present, filter the result in your resolver.
Resolvers take args as the second argument. Depending on how many potential filters you would like to provide, you can choose the appropriate implementation.
Two options are: 1) have a filter param that takes a list of filters (this is similar to GraphCool); 2) just use variable names as args and filter on those.
For your example, I'm not really sure what your models look like or what type of DB/ORM you are using, but if you went with option 2, you can essentially just filter the result in your resolver (in JS, sorry, I'm not much of a Java expert).
resolve: (_, args, context) => {
    const result = someDbInteractionForResult();
    // Keep only the items that match every supplied filter argument.
    return result.filter((item) => {
        for (const [filter, value] of Object.entries(args)) {
            if (item[filter] !== value) {
                return false;
            }
        }
        return true;
    });
}
This is still similar to passing the filters in the GraphCool manner. The deciding factor is whether you will also include other args. If so, you will want to group your filters into their own object so that it is easier to separate your concerns.
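Option 2 can be tried outside of any GraphQL server. The following is a minimal sketch under stated assumptions: applyFilters and the sample products are hypothetical, and a filter whose value is undefined is treated as not supplied.

```javascript
// Keep only the items whose fields equal every supplied filter value.
// Filters whose value is undefined are treated as "not supplied".
function applyFilters(items, filters) {
  const active = Object.entries(filters)
    .filter(([, value]) => value !== undefined);
  return items.filter((item) =>
    active.every(([field, value]) => item[field] === value));
}

// Hypothetical sample data using the flags from the question.
const products = [
  { name: 'A', offers: false, new: true, favourites: true },
  { name: 'B', offers: true, new: true, favourites: false },
  { name: 'C', offers: false, new: false, favourites: true },
];

const matches = applyFilters(products, { offers: false, favourites: true });
console.log(matches.map((product) => product.name)); // [ 'A', 'C' ]
```

Inside a resolver, args (or a nested filters object taken from args) would be passed as the second parameter.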

How to intercept Linq filters

In LINQ to Objects, is there any way to identify which entities/objects qualified or were disqualified at each filter?
For example, let's say I have an entity called "Product" (Id, Name), and I feed 100 Products into a LINQ query that has 5 "where" conditions and get 20 Products as output.
Is there any way to identify which product was filtered out at which where condition?
This can probably be generalized but you can do this. I just don't see the use case for it.
Use ToLookup() to partition your queries. The "disqualified" items would be lumped under the false group and you can continue your query with the true group.
e.g.,
var numbers = Enumerable.Range(0, 100);
var p1 = numbers.ToLookup(n1 => n1 < 50);
// p1[false] -> [ 50, 51, 52, ... ]
var p2 = p1[true].ToLookup(n2 => n2 % 2 == 0);
// p2[false] -> [ 1, 3, 5, 7, ... ]
var p3 = p2[true]... // and so on
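The partitioning idea is not specific to LINQ. As a language-neutral illustration, here is a minimal JavaScript sketch that runs a list of named predicates in order and records which items each stage rejected. traceFilters and the stage names are hypothetical and not part of any library.

```javascript
// Run the predicates in order; record which items each stage rejects.
// stages is an array of [name, predicate] pairs.
function traceFilters(items, stages) {
  const rejected = {};
  let remaining = items;
  for (const [name, predicate] of stages) {
    const kept = [];
    rejected[name] = [];
    for (const item of remaining) {
      (predicate(item) ? kept : rejected[name]).push(item);
    }
    remaining = kept; // only the survivors reach the next stage
  }
  return { passed: remaining, rejected };
}

const numbers = Array.from({ length: 10 }, (_, i) => i);
const trace = traceFilters(numbers, [
  ['lessThanFive', (n) => n < 5],
  ['isEven', (n) => n % 2 === 0],
]);

console.log(trace.passed);                // [ 0, 2, 4 ]
console.log(trace.rejected.lessThanFive); // [ 5, 6, 7, 8, 9 ]
console.log(trace.rejected.isEven);       // [ 1, 3 ]
```

Because each stage carries a name, the result answers directly which condition removed which item, at the cost of materializing every intermediate list.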
While everybody so far showed you how to do the obvious thing, i.e. group your data into chunks matching one condition, nobody addressed what you're actually asking
identify which product got filtered at which where condition
The tough question is: how are you going to log the where condition?
It may seem trivial to do this, but you're going to fail because you won't be able to obtain a string representation of the Func object that was used. It is a delegate, i.e. compiled code, and you would have to reverse-engineer it at runtime to obtain the source code. This in itself is hard enough, but if the compiler chose to optimize the code, you're lost.
Only if you're willing to create your own extension method(s) that use expressions instead of Funcs will you be able to log the where condition, because an expression consists of tokens that can easily be 'ToString-ed' at runtime.
For instance:
public static IEnumerable<T> WhereEx<T>(this IEnumerable<T> sequence, Expression<Func<T, bool>> condition)
{
var logString = condition.Body.ToString();
foreach (T item in sequence.Where(condition.Compile()))
{
yield return item;
// logging hook here, this one simply dumps in Linqpad.
string.Format("Item '{0}' meets '{1}'", item, logString).Dump();
}
}
Now you're really intercepting the filter. But you have no clue where in the code the filter is applied. If you also want to log stack frames, performance will likely become an issue (reflection!), because it's already under pressure by compiling the expression each time. Then logging should be an async operation involving a thread-safe logging queue.

How to combine collection of linq queries into a single sql request

Thanks for checking this out.
My situation is that I have a system where the user can create custom filtered views which I build into a linq query on the request. On the interface they want to see the counts of all the views they have created; pretty straight forward. I'm familiar with combining multiple queries into a single call but in this case I don't know how many queries I have initially.
Does anyone know of a technique where this loop combines the count queries into a single query that I can then execute with a ToList() or FirstOrDefault()?
//TODO Performance this isn't good...
foreach (IMeetingViewDetail view in currentViews)
{
view.RecordCount = GetViewSpecificQuery(view.CustomFilters).Count();
}
Here is an example of multiple queries combined as I'm referring to. This is two queries which I then combine into an anonymous projection resulting in a single request to the sql server.
IQueryable<EventType> eventTypes = _eventTypeService.GetRecords().AreActive<EventType>();
IQueryable<EventPreferredSetup> preferredSetupTypes = _eventPreferredSetupService.GetRecords().AreActive<EventPreferredSetup>();
var options = someBaseQuery.Select(x => new
{
EventTypes = eventTypes.AsEnumerable(),
PreferredSetupTypes = preferredSetupTypes.AsEnumerable()
}).FirstOrDefault();
Well, for performance reasons, I would change the interface from IEnumerable<T> to a collection that has a Count property. Both IList<T> and ICollection<T> have a Count property.
This way, the collection object is keeping track of its size and you just need to read it.
If you really wanted to avoid the loop, you could redefine the RecordCount to be a lazy loaded integer that calls GetViewSpecificQuery to get the count once.
private int? _recordCount = null;
public int RecordCount
{
get
{
if (_recordCount == null)
_recordCount = GetViewSpecificQuery(view.CustomFilters).Count;
return _recordCount.Value;
}
}

Optimizing away OrderBy() when using Any()

So I have a fairly standard LINQ-to-Object setup.
var query = expensiveSrc.Where(x=> x.HasFoo)
.OrderBy(y => y.Bar.Count())
.Select(z => z.FrobberName);
// ...
if (!condition && !query.Any())
return; // seems to enumerate and sort entire enumerable
// ...
foreach (var item in query)
// ...
This enumerates everything twice. Which is bad.
var queryFiltered = expensiveSrc.Where(x=> x.HasFoo);
var query = queryFiltered.OrderBy(y => y.Bar.Count())
.Select(z => z.FrobberName);
if (!condition && !queryFiltered.Any())
return;
// ...
foreach (var item in query)
// ...
Works, but is there a better way?
Would there be any non-insane way to "enlighten" Any() to bypass the non-required operations? I think I remember this sort of optimisation going into EduLinq.
Why not just get rid of the redundant:
if (!query.Any())
return;
It really doesn't seem to be serving any purpose - even without it, the body of the foreach won't execute if the query yields no results. So with the Any() check in, you save nothing in the fast path, and enumerate twice in the slow path.
On the other hand, if you must know if there were any results found after the end of the loop, you might as well just use a flag:
bool itemFound = false;
foreach (var item in query)
{
itemFound = true;
... // Rest of the loop body goes here.
}
if(itemFound)
{
// ...
}
Or you could use the enumerator directly if you're really concerned about the redundant flag-setting in the loop body:
using (var erator = query.GetEnumerator())
{
    bool itemFound = erator.MoveNext();
    if (itemFound)
    {
        do
        {
            // Do something with erator.Current;
        } while (erator.MoveNext());
    }
    // Do something with itemFound
}
There is not much information that can be extracted from an enumerable, so maybe it's better to turn the query into an IQueryable? This Any extension method walks down its expression tree, skipping all irrelevant operations, then turns the important branch into a delegate that can be called to obtain an optimized IQueryable. The standard Any method is then applied to it explicitly to avoid recursion. I'm not sure about corner cases, and maybe it makes sense to cache compiled queries, but with simple queries like yours it seems to work.
static class QueryableHelper {
public static bool Any<T>(this IQueryable<T> source) {
var e = source.Expression;
while (e is MethodCallExpression) {
var mce = e as MethodCallExpression;
switch (mce.Method.Name) {
case "Select":
case "OrderBy":
case "ThenBy": break;
default: goto dun;
}
e = mce.Arguments.First();
}
dun:
var d = Expression.Lambda<Func<IQueryable<T>>>(e).Compile();
return Queryable.Any(d());
}
}
Queries themselves must be modified like this:
var query = expensiveSrc.AsQueryable()
.Where(x=> x.HasFoo)
.OrderBy(y => y.Bar.Count())
.Select(z => z.FrobberName);
Would there be any non-insane way to "enlighten" Any() to bypass the non-required operations? I think I remember this sort of optimisation going into EduLinq.
Well I'm not going to ignore any question which mentions Edulinq :)
In this case, Edulinq might well be faster than LINQ to Objects, as its OrderBy implementation is as lazy as it can be - it only sorts as much as it needs to in order to retrieve the elements it returns.
However, fundamentally it still has to read the whole sequence in before it returns anything. After all, the last element in the sequence could be the first one which has to be returned.
If you're in control of the whole stack, you could make Any() detect that it's being called on your "known" IOrderedEnumerable implementation, and go straight to the original source. Note that this does create a change in the observed behaviour though - if iterating over the whole sequence throws an exception (or has any other side effect) then that side-effect would be lost by the optimization. You could argue that's okay, of course - what counts as "valid" optimization in LINQ is a decidedly tricky area.
One other possibility which is pretty horrible but which would solve this particular problem would be to make the iterator returned from the IOrderedEnumerable just take the first value of MoveNext() from the source. That's enough for the normal implementation of Any, and at that point we don't need to know what the first element is. We could defer the actual sorting until the first time the Current property is used.
That's a pretty special-case optimization though - and one which I'd be wary to implement. I think Ani's approach is the better one - just use the fact that iterating over query using foreach will never go into the body of the loop if the query results are empty.
Edit (revised): This answer addresses the issue of the query executing twice, which I believe is the key issue. See below for why:
Making Any() smarter is something that only the Linq implementers can do, IMO... Or it would be some dirty adventure using reflection.
Using a class as shown below, you can cache the output of the original enumerable, and let it be enumerated twice:
public class CachedEnumerable<T>
{
public CachedEnumerable(IEnumerable<T> enumerable)
{
_source = enumerable.GetEnumerator();
}
public IEnumerable<T> Enumerate()
{
int itemIndex = 0;
while (true)
{
if (itemIndex < _cache.Count)
{
yield return _cache[itemIndex];
itemIndex++;
continue;
}
if (!_source.MoveNext())
yield break;
var current = _source.Current;
_cache.Add(current);
yield return current;
itemIndex++;
}
}
private List<T> _cache = new List<T>();
private IEnumerator<T> _source;
}
This way you keep the lazy aspect of LINQ and keep the code readable and generic. It will be slower than directly using IEnumerator<>. There are lots of opportunities to extend and optimize this class, such as a policy for discarding old items, getting rid of the coroutine, etc. But that is beside the point of this question, I think.
Oh, and the class is not thread safe as it is now. This wasn't asked, but I can imagine people trying. I think this could be easily added if the source enumerable has no thread affinity.
Why would this be optimal?
Let's consider two possibilities: the enumeration either contains elements or it does not.
If it contains elements, this approach is optimal as the query is only run once.
If it contains no elements, you would be tempted to eliminate the OrderBy and Select parts of your queries, as they add no value. But if there are zero items after the Where() clause, there are zero items to sort, which will cost zero time (well, almost). The same goes for the Select() clause.
What if this is not fast enough yet? In that case my strategy would be to bypass Linq. Now, I really love Linq, but its elegance comes at a price. So for every 100 times of using Linq, there will typically be one or two computations that are important to execute really fast, which I write with good old for loops and lists. Part of mastering a technology is recognizing where it is not appropriate. Linq is no exception to that rule.
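The caching idea in this answer carries over to other languages with lazy iteration. As a rough JavaScript sketch, not a port of the C# class, the hypothetical helper cached below wraps any iterable so that it can be iterated several times while the underlying source is pulled at most once per element.

```javascript
// Wrap an iterable so that repeated iteration replays already-seen
// elements from a cache and pulls new ones from the source lazily.
function cached(source) {
  const cache = [];
  const iterator = source[Symbol.iterator]();
  let exhausted = false;
  return {
    *[Symbol.iterator]() {
      let index = 0;
      while (true) {
        if (index < cache.length) {
          yield cache[index++];      // replay from the cache
        } else if (exhausted) {
          return;                    // nothing left to pull
        } else {
          const next = iterator.next();
          if (next.done) {
            exhausted = true;
          } else {
            cache.push(next.value);  // yielded on the next loop pass
          }
        }
      }
    },
  };
}

// Count how many elements are actually pulled from the source.
let pulls = 0;
function* expensive() {
  for (const n of [3, 1, 2]) {
    pulls++;
    yield n;
  }
}

const items = cached(expensive());
const first = items[Symbol.iterator]().next(); // an "Any()"-style check
const all = [...items];                        // full enumeration
const again = [...items];                      // served from the cache
console.log(first.value, all, again, pulls);
```

Like the C# version, this sketch trades memory for a single pass over the source, and it makes no attempt at eviction or concurrency safety.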
Try this:
var items = expensiveSrc.Where(x=> x.HasFoo)
.OrderBy(y => y.Bar.Count())
.Select(z => z.FrobberName).ToList();
// ...
if (!condition && items.Count == 0)
return; // Just check the count
// ...
foreach (var item in items)
// ...
The query is executed just once.
but I've lost the streaming/lazy loading that's half the point of linq
With lazy loading (deferred execution), two LINQ queries with different results cannot be reduced to a single query execution.
Why are you not using .ToArray()?
var query = expensiveSrc.Where(x=> x.HasFoo)
.OrderBy(y => y.Bar.Count())
.Select(z => z.FrobberName).ToArray();
If there are no elements, sorting and selecting should not add much overhead. If you are sorting, you need a cache to hold the data anyway, so the overhead that .ToArray produces should not be large.
If you decompile the OrderedEnumerable class, you will find that it builds an int[] array containing the references, so by using .ToArray (or .ToList) you just create one more reference array.
BUT
If expensiveSrc comes from a database, other strategies could be better. If the ordering can be done in the database, this approach would give you quite a lot of overhead because the data is stored twice.

Model records ordering in Spine.js

As I can see in the Spine.js sources the Model.each() function returns Model's records in the order of their IDs. This is completely unreliable in scenarios where ordering is important: long person list etc.
Can you suggest a way to keep original records ordering (in the same order as they've arrived via refresh() or similar functions) ?
P.S.
Things are even worse because by default Spine.js internally uses new GUIDs as IDs. So the record order is completely random, which is unacceptable.
EDIT:
Seems that in last commit https://github.com/maccman/spine/commit/116b722dd8ea9912b9906db6b70da7948c16948a
they made it possible, but I have not tested it myself because I switched from Spine to Knockout.
I bumped into the same problem while learning Spine.js. I'm using pure JS, so I had been neglecting the contacts example (http://spinejs.com/docs/example_contacts), which helped with this one. As a matter of fact, you can't really keep the ordering from the server this way, but you can do your own ordering in JavaScript.
Notice that I'm using the Element Pattern here. (http://spinejs.com/docs/controller_patterns)
First you set the function which is gonna do the sorting inside the model:
/*Extending the Student Model*/
Student.extend({
    nameSort: function(a, b) {
        if ((a.name || a.email) > (b.name || b.email))
            return 1;
        else
            return -1;
    }
});
Then, in the students controller you set the elements using the sort:
/*Controller that manages the students*/
var Students = Spine.Controller.sub({
    /*code omitted for simplicity*/
    addOne: function(student){
        var item = new StudentItem({item: student});
        this.append(item.render());
    },
    addAll: function(){
        var sortedByName = Student.all().sort(Student.nameSort);
        var _self = this;
        $.each(sortedByName, function(){_self.addOne(this)});
    },
});
And that's it.
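One detail worth noting about the comparator above is that it never returns 0, so two records with the same key are reported as unequal. A minimal standalone sketch of a comparator that handles ties is shown below. It does not depend on Spine.js; byNameOrEmail is a hypothetical name, and the case-insensitive comparison is an assumption added for illustration.

```javascript
// Compare two records by name, falling back to email, ignoring case.
// Returns -1, 0 or 1 as Array.prototype.sort expects.
function byNameOrEmail(a, b) {
  const keyA = (a.name || a.email || '').toLowerCase();
  const keyB = (b.name || b.email || '').toLowerCase();
  if (keyA < keyB) return -1;
  if (keyA > keyB) return 1;
  return 0; // equal keys keep their relative order in a stable sort
}

// Hypothetical sample records.
const students = [
  { name: 'Zed' },
  { email: 'amy@example.com' },
  { name: 'Bob' },
];

// slice() copies the array so the original order is left untouched.
const sorted = students.slice().sort(byNameOrEmail);
console.log(sorted.map((s) => s.name || s.email));
// [ 'amy@example.com', 'Bob', 'Zed' ]
```

In the controller above, such a function could be passed to sort in place of Student.nameSort.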
