How to cache IQueryable result for paging - caching

What is the best way to cache a Queryable result when every call has to perform a lot of calculation before returning it to the client?
Code Sample
[Queryable]
public IQueryable<Car> Get()
{
    try
    {
        var result = GetCarList();
        // GetCarList() calculation is taking around 1 min
        return result.AsQueryable();
    }
}
GetCarList()
{
    var query = from car in db.CarDetail
                where car.color == "white"
                select car;
    // 10k records of white cars are selected without considering makers
    // white is mandatory
    foreach (var car in query)
    {
        // Processing each record in every call
    }
}
Query sample
First Page
localhost/api/Car?$filter=(make eq 'ford')&$orderby=carid desc&$top=10
Second Page
localhost/api/Car?$filter=(make eq 'ford')&$orderby=carid desc&$top=10&$skip=10
Third Page
localhost/api/Car?$filter=(make eq 'ford')&$orderby=carid desc&$top=10&$skip=20
Each call takes 1 min every time, even though the calculation is the same for the current filter. What is the best way to cache this kind of API call?

As the OP explains in a comment, the object to cache is the list returned by the call to GetCarList(), and the result is always the same.
You can simply store this in the cache; see the docs for the Cache class.
When you need it, check whether it is in the cache. If not, create it and store it in the cache before using it (anywhere you want to use it). Because Cache is thread-safe, you will not have concurrency problems when accessing it from different requests.
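The idea above can be sketched without any web dependencies. This is only an illustration under stated assumptions: the Car class, the CarCache helper and BuildCarList are hypothetical stand-ins, and Lazy<T> is used in place of the ASP.NET Cache class so that the example runs as a console program. Unlike Cache, Lazy<T> never expires its value, so a real API would still need an invalidation policy.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

// Hypothetical model; the real Car type is not shown in the question.
public class Car
{
    public int CarId { get; set; }
    public string Make { get; set; }
}

public static class CarCache
{
    // Counts how often the expensive step actually ran.
    public static int BuildCount;

    // Lazy<T> runs the factory once, even if several requests arrive together.
    private static readonly Lazy<List<Car>> Cars =
        new Lazy<List<Car>>(BuildCarList, LazyThreadSafetyMode.ExecutionAndPublication);

    // Stand-in for the one-minute calculation in GetCarList().
    private static List<Car> BuildCarList()
    {
        Interlocked.Increment(ref BuildCount);
        return Enumerable.Range(1, 100)
            .Select(i => new Car { CarId = i, Make = i % 2 == 0 ? "ford" : "audi" })
            .ToList();
    }

    // $filter, $orderby, $top and $skip would be applied on top of this.
    public static IQueryable<Car> Get()
    {
        return Cars.Value.AsQueryable();
    }
}

public static class CarCacheDemo
{
    public static void Main()
    {
        var fords = CarCache.Get()
            .Where(c => c.Make == "ford")
            .OrderByDescending(c => c.CarId);

        var page1 = fords.Take(10).ToList();
        var page2 = fords.Skip(10).Take(10).ToList();

        Console.WriteLine(page1[0].CarId);      // 100
        Console.WriteLine(page2[0].CarId);      // 80
        Console.WriteLine(CarCache.BuildCount); // 1
    }
}
```

Both pages are served from the same in-memory list, so the expensive step runs once and only the cheap filter, sort and paging run per request.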

Related

GraphQL Relay hasNextPage

How does GraphQL generate hasNextPage if only the "first" parameter is passed?
I am using
return relay.connectionFromPromisedArray(
global.app.get('model__user').getUsers(args),
args
);
and query:
query RootQueryType { viewer { user(id: 1){ id,email,friends(first: 5) {edges {cursor, node { id, email } }, pageInfo { hasNextPage } } } } }
So how can I pass the friends count to GraphQL / Relay so that hasNextPage is generated correctly?
Relay pagination is not page based, but rather cursor based. So you paginate by saying "I want X items after item Y". Item Y is not pointed to as a page number or an offset, but rather as a pointer to that exact object, a so-called cursor. This model of pagination is nice for, for example, infinite scrolling. "Pages" are also stable after adding or removing items, as they don't depend on number of items.
hasNextPage in Relay GraphQL spec just indicates whether there are more items after the last element that has been retrieved. So in your case, it means there are more than 5 elements in total and you'll get more elements if you do
friends(first: 5, after: "CURSOR_TO_THE_LAST_ELEMENT")
You can retrieve cursor from the edges list, it's one of the elements alongside node there.
You can find detailed information on the relay pagination algorithm here: https://facebook.github.io/relay/graphql/connections.htm#sec-Pagination-algorithm.
To answer your specific question about hasNextPage, this is the algorithm:
function hasNextPage(allEdges, before, after, first, last) {
  // If first was not set, return false.
  if (first === null) { return false; }
  // Apply the before & after cursor arguments to the set of edges.
  // i.e. edges is the set of edges between the before and after cursors
  const edges = ApplyCursorsToEdges(allEdges, before, after)
  // If more edges exist between the before & after cursors than
  // you are asking for then there is a next page.
  if (edges.length > first) { return true; }
  return false
}
A quick note on cursor vs page based pagination. It is generally a bad idea to paginate using fixed page sizes. A classic example of this is using the OFFSET keyword in SQL to grab the next page. There are many issues with this approach. For example, what would happen if a new object was inserted while you were in the middle of paginating the set? If the new object was inserted before the page you are currently grabbing and you use a fixed offset you are going to grab an object that you have already grabbed which leads to duplicate data in your presentation layer. Using cursors for pagination fixes this problem by allowing you to keep track of the objects themselves instead of counts of the objects.
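The duplicate-item problem described above can be reproduced with a few lines of LINQ. This is a minimal sketch, not Relay's implementation: PagingDemo and both helper methods are hypothetical, the list of ids stands in for a table sorted newest first, and the cursor is simply the last id seen (real Relay cursors are opaque strings).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PagingDemo
{
    // Offset paging: skip a fixed number of rows, as SQL OFFSET does.
    public static List<int> PageByOffset(IEnumerable<int> idsNewestFirst, int offset, int size)
    {
        return idsNewestFirst.Skip(offset).Take(size).ToList();
    }

    // Cursor paging: continue after the last id the client has seen.
    // Assumes ids only grow, so "older" means "smaller id".
    public static List<int> PageByCursor(IEnumerable<int> idsNewestFirst, int? afterId, int size)
    {
        var rest = afterId.HasValue
            ? idsNewestFirst.Where(id => id < afterId.Value)
            : idsNewestFirst;
        return rest.Take(size).ToList();
    }

    public static void Main()
    {
        var ids = new List<int> { 5, 4, 3, 2, 1 };   // newest first
        var firstPage = PageByOffset(ids, 0, 2);     // 5, 4

        ids.Insert(0, 6);                            // a new item arrives mid-pagination

        var byOffset = PageByOffset(ids, 2, 2);
        var byCursor = PageByCursor(ids, firstPage.Last(), 2);

        Console.WriteLine(string.Join(",", byOffset)); // 4,3  (4 is shown twice)
        Console.WriteLine(string.Join(",", byCursor)); // 3,2
    }
}
```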
One last thing about Relay pagination specifically: I recommend using only (first & after) OR (last & before) at any given time. Using both in the same query can lead to logical, yet unexpected, results.
Best of luck!

Magento search queries yielding empty results in API

I have this chunk of code:
//to-do
public function searchVehicles($terms, $offset=1, $order='ASC')
{
if (trim($terms) == '') {
return array();
}
$query = $this->_getQuery($terms);
$query->setStoreId(1);
if ($query->getId()) {
$query->setPopularity($query->getPopularity()+1);
}
else {
$query->setPopularity(1);
}
$query->prepare();
$query->save();
$collection = Mage::getResourceModel('catalog/product_collection');
$collection->getSelect()->joinInner(
array('search_result' => $collection->getTable('catalogsearch/result')),
$collection->getConnection()->quoteInto(
'search_result.product_id=e.entity_id AND search_result.query_id=?',
$query->getId()
),
array('relevance' => 'relevance')
);
$collection->setStore(1);
//Mage::getSingleton('catalog/product_status')->addVisibleFilterToCollection($collection);
//Mage::getSingleton('catalog/product_visibility')->addVisibleInSearchFilterToCollection($collection);
return $this->_listProductCollection($collection, $offset, $order);
}
Which is inside a Resource class and reachable via SOAP.
Before we start: yes, I remembered to flush the cache and recompile. I mention it because this is a usual issue for newbies like me xDDD.
Now: I can access such method but it returns [].
SPECIAL NOTE: $this->_listProductCollection($collection, $offset, $order); WORKS, since I'm using the same method on other collections fetched by other methods in the same resource, with no trouble at all.
Let me review the intention of my code since I'm a newbie at Magento (I'm using version 1.6.2).
The code is based on the CatalogSearch/ResultController controller's indexAction() method, and tried to learn about it.
An empty query will yield an empty result and will not bother the Magento search engine.
There's only a Store (id = 1) in the site and the search query is created like this:
private function _getQuery($terms)
{
$query = Mage::getModel('catalogsearch/query')->loadByQuery($terms);
if (!$query->getId()) {
$query->setQueryText($terms);
}
return $query;
}
The query increases its popularity (I took this code from the controller. I assume this is for statistical purposes only).
The query is prepared (I think this means: the MySQL internal query is prepared) so I can fetch it later.
The query is saved - AFAIK this means that the query results are iterated and cached so a subsequent same query will only fetch the stored results instead of processing the search again.
At this point the query will have an ID.
I get the whole Product collection and join it with the search result table. It SEEMS that the results table has at least (queryId, matchedProductId). I only keep the products having IDs in the matched results, and from store 1.
I list the products.
Note that the filters are currently commented.
However, the returned list is [] (an empty list) when I hit this API entry point, although searching in the usual search bar gives me the expected result.
Question: What am I missing? What did I misunderstand in the process?

How to combine collection of linq queries into a single sql request

Thanks for checking this out.
My situation is that I have a system where the user can create custom filtered views which I build into a linq query on the request. On the interface they want to see the counts of all the views they have created; pretty straight forward. I'm familiar with combining multiple queries into a single call but in this case I don't know how many queries I have initially.
Does anyone know of a technique where this loop combines the count queries into a single query that I can then execute with a ToList() or FirstOrDefault()?
//TODO Performance this isn't good...
foreach (IMeetingViewDetail view in currentViews)
{
view.RecordCount = GetViewSpecificQuery(view.CustomFilters).Count();
}
Here is an example of multiple queries combined as I'm referring to. This is two queries which I then combine into an anonymous projection resulting in a single request to the sql server.
IQueryable<EventType> eventTypes = _eventTypeService.GetRecords().AreActive<EventType>();
IQueryable<EventPreferredSetup> preferredSetupTypes = _eventPreferredSetupService.GetRecords().AreActive<EventPreferredSetup>();
var options = someBaseQuery.Select(x => new
{
EventTypes = eventTypes.AsEnumerable(),
PreferredSetupTypes = preferredSetupTypes.AsEnumerable()
}).FirstOrDefault();
Well, for performance considerations, I would change the interface from IEnumerable<T> to a collection that has a Count property. Both IList<T> and ICollection<T> have a Count property.
This way, the collection object is keeping track of its size and you just need to read it.
If you really wanted to avoid the loop, you could redefine the RecordCount to be a lazy loaded integer that calls GetViewSpecificQuery to get the count once.
private int? _recordCount = null;
public int RecordCount
{
    get
    {
        // Run the count query only on first access, then reuse the value.
        if (_recordCount == null)
            _recordCount = GetViewSpecificQuery(CustomFilters).Count();
        return _recordCount.Value;
    }
}
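To see what the lazy property buys, here is a small self-contained sketch. MeetingView and the injected query factory are hypothetical stand-ins for the poster's IMeetingViewDetail and GetViewSpecificQuery, which are not shown in the thread. Note that this only avoids repeating a count for the same view; it still issues one query per view and does not combine them into a single SQL request.

```csharp
using System;
using System.Linq;

// Hypothetical view class; the query factory stands in for GetViewSpecificQuery.
public class MeetingView
{
    private readonly Func<IQueryable<int>> _buildQuery;
    private int? _recordCount;

    public MeetingView(Func<IQueryable<int>> buildQuery)
    {
        _buildQuery = buildQuery;
    }

    public int RecordCount
    {
        get
        {
            // The count query runs on first access only.
            if (_recordCount == null)
                _recordCount = _buildQuery().Count();
            return _recordCount.Value;
        }
    }
}

public static class LazyCountDemo
{
    public static void Main()
    {
        int queryCalls = 0;
        var view = new MeetingView(() =>
        {
            queryCalls++;
            return new[] { 1, 2, 3 }.AsQueryable();
        });

        Console.WriteLine(view.RecordCount); // 3
        Console.WriteLine(view.RecordCount); // 3
        Console.WriteLine(queryCalls);       // 1
    }
}
```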

Optimizing away OrderBy() when using Any()

So I have a fairly standard LINQ-to-Objects setup.
var query = expensiveSrc.Where(x=> x.HasFoo)
.OrderBy(y => y.Bar.Count())
.Select(z => z.FrobberName);
// ...
if (!condition && !query.Any())
return; // seems to enumerate and sort entire enumerable
// ...
foreach (var item in query)
// ...
This enumerates everything twice. Which is bad.
var queryFiltered = expensiveSrc.Where(x=> x.HasFoo);
var query = queryFiltered.OrderBy(y => y.Bar.Count())
.Select(z => z.FrobberName);
if (!condition && !queryFiltered.Any())
return;
// ...
foreach (var item in query)
// ...
Works, but is there a better way?
Would there be any non-insane way to "enlighten" Any() to bypass the non-required operations? I think I remember this sort of optimisation going into EduLinq.
Why not just get rid of the redundant:
if (!query.Any())
return;
It really doesn't seem to be serving any purpose - even without it, the body of the foreach won't execute if the query yields no results. So with the Any() check in, you save nothing in the fast path, and enumerate twice in the slow path.
On the other hand, if you must know if there were any results found after the end of the loop, you might as well just use a flag:
bool itemFound = false;
foreach (var item in query)
{
itemFound = true;
... // Rest of the loop body goes here.
}
if(itemFound)
{
// ...
}
Or you could use the enumerator directly if you're really concerned about the redundant flag-setting in the loop body:
using (var erator = query.GetEnumerator())
{
    bool itemFound = erator.MoveNext();
    if (itemFound)
    {
        do
        {
            // Do something with erator.Current;
        } while (erator.MoveNext());
    }
    // Do something with itemFound
}
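The cost difference between the two shapes discussed above can be measured with a counting source. This is a sketch under assumptions: EnumerationDemo and ExpensiveSource are hypothetical stand-ins for expensiveSrc, and the exact number of reads for the Any() variant depends on the LINQ implementation, so only the single-pass figure is stated as certain.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EnumerationDemo
{
    // Number of items pulled from the source so far.
    public static int Reads;

    // Hypothetical stand-in for expensiveSrc that counts every item it yields.
    public static IEnumerable<int> ExpensiveSource()
    {
        foreach (var n in new[] { 3, 1, 2 })
        {
            Reads++;
            yield return n;
        }
    }

    private static IEnumerable<int> BuildQuery()
    {
        return ExpensiveSource().Where(n => n > 0).OrderBy(n => n).Select(n => n * 10);
    }

    // Any() followed by foreach: the query is started twice.
    public static int ReadsWithAnyThenLoop()
    {
        Reads = 0;
        var query = BuildQuery();
        if (query.Any())
        {
            foreach (var item in query) { /* use item */ }
        }
        return Reads;
    }

    // A single foreach with a flag: the query is started once.
    public static int ReadsWithFlag()
    {
        Reads = 0;
        bool itemFound = false;
        foreach (var item in BuildQuery())
        {
            itemFound = true;
        }
        if (!itemFound) Console.WriteLine("no items");
        return Reads;
    }

    public static void Main()
    {
        Console.WriteLine(ReadsWithAnyThenLoop());
        Console.WriteLine(ReadsWithFlag()); // 3
    }
}
```

On the LINQ to Objects implementations I am aware of, the first number should come out as twice the second, because OrderBy has to buffer the whole source before Any() can see a first element.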
There is not much information that can be extracted from an enumerable, so maybe it's better to turn the query into an IQueryable? This Any extension method walks down the expression tree, skipping all irrelevant operations, then turns the important branch into a delegate that can be called to obtain an optimized IQueryable. The standard Any method is applied to it explicitly to avoid recursion. I'm not sure about corner cases, and maybe it makes sense to cache compiled queries, but with simple queries like yours it seems to work.
using System;
using System.Linq;
using System.Linq.Expressions;

static class QueryableHelper {
    public static bool Any<T>(this IQueryable<T> source) {
        var e = source.Expression;
        // Strip trailing operators that cannot change whether the result is empty.
        while (e is MethodCallExpression) {
            var mce = e as MethodCallExpression;
            switch (mce.Method.Name) {
                case "Select":
                case "OrderBy":
                case "ThenBy": break;
                default: goto dun;
            }
            e = mce.Arguments.First();
        }
    dun:
        // Compile what is left and run the standard Any on it.
        var d = Expression.Lambda<Func<IQueryable<T>>>(e).Compile();
        return Queryable.Any(d());
    }
}
Queries themselves must be modified like this:
var query = expensiveSrc.AsQueryable()
.Where(x=> x.HasFoo)
.OrderBy(y => y.Bar.Count())
.Select(z => z.FrobberName);
Would there be any non-insane way to "enlighten" Any() to bypass the non-required operations? I think I remember this sort of optimisation going into EduLinq.
Well I'm not going to ignore any question which mentions Edulinq :)
In this case, Edulinq might well be faster than LINQ to Objects, as its OrderBy implementation is as lazy as it can be - it only sorts as much as it needs to in order to retrieve the elements it returns.
However, fundamentally it still has to read the whole sequence in before it returns anything. After all, the last element in the sequence could be the first one which has to be returned.
If you're in control of the whole stack, you could make Any() detect that it's being called on your "known" IOrderedEnumerable implementation, and go straight to the original source. Note that this does create a change in the observed behaviour though - if iterating over the whole sequence throws an exception (or has any other side effect) then that side-effect would be lost by the optimization. You could argue that's okay, of course - what counts as "valid" optimization in LINQ is a decidedly tricky area.
One other possibility which is pretty horrible but which would solve this particular problem would be to make the iterator returned from the IOrderedEnumerable just take the first value of MoveNext() from the source. That's enough for the normal implementation of Any, and at that point we don't need to know what the first element is. We could defer the actual sorting until the first time the Current property is used.
That's a pretty special-case optimization though - and one which I'd be wary to implement. I think Ani's approach is the better one - just use the fact that iterating over query using foreach will never go into the body of the loop if the query results are empty.
Edit (revised): This answer addresses the issue of the query executing twice, which I believe is the key issue. See below for why:
Making Any() smarter is something that only the Linq implementers can do, IMO... Or it would be some dirty adventure using reflection.
Using a class as shown below, you can cache the output of the original enumerable, and let it be enumerated twice:
public class CachedEnumerable<T>
{
    public CachedEnumerable(IEnumerable<T> enumerable)
    {
        _source = enumerable.GetEnumerator();
    }
    public IEnumerable<T> Enumerate()
    {
        int itemIndex = 0;
        while (true)
        {
            if (itemIndex < _cache.Count)
            {
                yield return _cache[itemIndex];
                itemIndex++;
                continue;
            }
            if (!_source.MoveNext())
                yield break;
            var current = _source.Current;
            _cache.Add(current);
            yield return current;
            itemIndex++;
        }
    }
    private List<T> _cache = new List<T>();
    private IEnumerator<T> _source;
}
This way you keep the lazy aspect of LINQ and keep the code readable and generic. It will be slower than using IEnumerator<> directly. There are lots of opportunities to extend and optimize this class, such as a policy for discarding old items, getting rid of the coroutine, etc. But that is beyond the point of this question, I think.
Oh, and the class is not thread safe as it is now. This wasn't asked, but I can imagine people trying. I think this could be easily added, if the source enumerable has no thread affinity.
Why would this be optimal?
Let's consider two possibilities: the enumeration either contains elements or it does not.
If it contains elements, this approach is optimal, as the query is only run once.
If it contains no elements, you would be tempted to eliminate the OrderBy and Select parts of your query, as they add no value. But if there are zero items after the Where() clause, there are zero items to sort, which will cost zero time (well, almost). The same goes for the Select() clause.
What if this is not fast enough yet? In that case my strategy would be to bypass Linq. Now, I really love Linq, but its elegance comes at a price. So for every 100 uses of Linq, there will typically be one or two computations that are important to execute really fast, which I write with good old for loops and lists. Part of mastering a technology is recognizing where it is not appropriate. Linq is no exception to that rule.
Try this:
var items = expensiveSrc.Where(x=> x.HasFoo)
.OrderBy(y => y.Bar.Count())
.Select(z => z.FrobberName).ToList();
// ...
if (!condition && items.Count == 0)
return; // Just check the count
// ...
foreach (var item in items)
// ...
The query is executed just once.
but I've lost the streaming/lazy loading that's half the point of linq
Lazy loading (deferred execution), and 2 LINQ queries with disparate results cannot be optimized (reduced) to 1 query execution.
Why are you not using .ToArray()?
var query = expensiveSrc.Where(x=> x.HasFoo)
.OrderBy(y => y.Bar.Count())
.Select(z => z.FrobberName).ToArray();
If there are no elements, sorting and selecting should not add much overhead. If you are sorting, you need a buffer to store the data anyway, so the overhead that .ToArray adds should be small.
If you decompile the OrderedEnumerable class, you find that an int[] array containing the references is formed, so by using .ToArray (or .ToList) you just create a new reference array.
BUT
If expensiveSrc comes from a database, other strategies could be better. If the ordering can be done in the database, this would add quite a lot of overhead because the data is stored twice.

How to cache mvc3 webgrid results (so that col sort click doesn't?

Can somebody please tell me how I can cache my webgrid results so that when I sort by column it doesn't re-run my stored procedure query every time?
When clicking on a column link to sort, the stored proc (which is a little slow) that populates the table/grid is re-executed every time and hits the database. Any caching tips and tricks would be greatly appreciated.
Thx!
Well, inside the controller action that calls the repository method which queries the database, you could check whether the cache already contains the results.
Here's a commonly used pattern:
public ActionResult Foo()
{
    // Try fetching the results from the cache
    var results = HttpContext.Cache["results"] as IEnumerable<MyViewModel>;
    if (results == null)
    {
        // the results were not found in the cache => invoke the expensive
        // operation to fetch them
        results = _repository.GetResults();
        // store the results into the cache so that on subsequent calls on this action
        // the expensive operation would not be called
        HttpContext.Cache["results"] = results;
    }
    // return the results to the view for displaying
    return View(results);
}
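One thing the pattern above leaves open is how long the entry lives. The following sketch shows the same check-then-load idea with an absolute expiration, using System.Runtime.Caching.MemoryCache instead of HttpContext.Cache so that it can run outside a web request. It assumes the project references the System.Runtime.Caching assembly; ResultsCache, the "grid-results" key and the five-minute lifetime are hypothetical choices, not something from the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.Caching;

public static class ResultsCache
{
    private static readonly ObjectCache Cache = MemoryCache.Default;

    // Returns the cached rows, or loads and caches them for the given lifetime.
    public static IList<string> GetOrLoad(string key, Func<IList<string>> load, TimeSpan lifetime)
    {
        var cached = Cache.Get(key) as IList<string>;
        if (cached != null)
            return cached;

        // Cache a materialized list, not a deferred query that would re-run later.
        var fresh = load();
        Cache.Set(key, fresh, DateTimeOffset.UtcNow.Add(lifetime));
        return fresh;
    }
}

public static class ResultsCacheDemo
{
    public static void Main()
    {
        int loads = 0;
        Func<IList<string>> load = () =>
        {
            loads++; // stands in for the slow stored procedure call
            return new List<string> { "row1", "row2" };
        };

        ResultsCache.GetOrLoad("grid-results", load, TimeSpan.FromMinutes(5));
        ResultsCache.GetOrLoad("grid-results", load, TimeSpan.FromMinutes(5));

        Console.WriteLine(loads); // 1
    }
}
```

Sorting by a column would then reorder the cached rows in memory rather than hit the database. A single shared key only fits data that is the same for every user; per-user or per-filter results would need the key to include those values.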
