Multiple instances of IDistributedCache? - caching

IDistributedCache is provided as a standard API for accessing distributed caches from within ASP.NET applications. The provided API is very simple, basically presenting a cache as a container of key-value pairs, with DistributedCacheEntryOptions providing per-entry expiry options.
Now let's say within a single app there are lots of different types of data to be cached, some of which we may wish to logically group. Maybe we want some types of data to be grouped so we can e.g. choose to flush it all from the cache without affecting other types of data, or maybe we want the ability to put some types of data in a different cache cluster with high availability, or more resources for better performance, etc.
Given this I am leaning towards having a containing object that holds multiple instances of IDistributedCache, one for each logical grouping. Since this seems like it would be a common requirement, I wonder if there is some standard way of achieving this pattern. Or maybe the advice would be to put everything into a single cache with a compound key (e.g. groupName-key), although I would prefer not to do that as I think it limits the flexibility of the caching layer.
As an aside I noticed that the NCache API provides the ability to optionally assign a groupName and subGroupName to each cache entry, which I think is pretty much what I want. However I would prefer to code against an IDistributedCache (or similar) in order to allow for drop-in alternative caching implementations.
Maybe another option is to create my own interface to provide the abstraction, but then I don't get the choice of using pre-built off-the-shelf IDistributedCache implementations (e.g. from NCache and Redis).

... within a single app there are lots of different types of data to be cached, some of which we may wish to logically group.
You could group your caches with wrapper interfaces like this:
public interface IDistributedCache01 : IDistributedCache { ... }
public interface IDistributedCache02 : IDistributedCache { ... }
Registration of those during startup would look something like this:
services.AddSingleton<IDistributedCache01, SqlServerCache>();
services.AddSingleton<IDistributedCache02, SqlServerCache>();
Then you can ask for specific caches in constructors:
public MyController(IDistributedCache01 cache)
{
    _cache = cache;
}
It's worth looking at the implementation of the built-in service registration methods. They are quite simple. Here they are for AddDistributedRedisCache and AddDistributedSqlServerCache.
When we take out the defensive programming, the registration methods are two lines of code:
services.AddSingleton<IDistributedCache, SqlServerCache>();

Yes, I was also going to suggest that the NCache grouping feature resolves your problem: you can assign groups to multiple items while adding them and then use the Group APIs to manage these items as needed. Another solution could be NCache Tags, which are even more flexible than groups and can be used to achieve the same use case.
However, when using the IDistributedCache interface, you are limited to the cache calls that the interface supports. Although NCache fully supports the IDistributedCache interface, you still don't have the option to use Groups or Tags through it. I suggest the options below for using Groups and Tags through IDistributedCache with NCache.
• Use NCache APIs directly along with NCache IDistributedCache interface. This will allow you to use additional functionality that NCache has and IDistributedCache interface lacks including Groups, Tags and other features. You will have to go outside IDistributedCache interface in this case to achieve the goal.
• Create your own custom extension methods for IDistributedCache and call NCache groups and Tag APIs in the extension method to achieve this. You will stay in your IDistributedCache implementation and will have additional functionality handled through your custom extension methods.

Pass data between pages using delegate

I want to know if it is possible to pass data to another page without Querystrings or Session.
In other words:
Can I do this using delegates or any other way?
You can POST data to another page (this is slightly different than using querystrings but may be too similar for your liking). Any data POSTED to another web form can be read with Request.Form["name_of_control"].
In certain cases I've had to develop my own approach involving generating a GUID and passing that around from page-to-page. Then any page can pull my data structures associated with a given GUID from a static key/value structure... Similar to Sessions I suppose but I had more control over how it worked. It allowed for any user to have multiple simultaneous windows/tabs open to my application and each one would work without affecting or being affected by the others (because each were passing around a different GUID). Whatever approach you choose I do urge you to consider the fact that users may want to use your application via multiple windows/tabs at the same time.
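The GUID approach described above is language-agnostic, so here is a minimal sketch of it in Java rather than ASP.NET. The class name WindowStateStore and its methods are made up for illustration. Each window/tab gets its own token, and every page looks its data up by that token in a static key/value structure:

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical per-window state store (names are assumptions, not a framework API).
final class WindowStateStore {
    // Static key/value structure shared by all pages: token -> that window's data.
    private static final Map<String, Map<String, Object>> STATE = new ConcurrentHashMap<>();

    // Called by the first page; the token is then passed from page to page
    // (for example in a hidden form field).
    static String newToken() {
        String token = UUID.randomUUID().toString();
        STATE.put(token, new ConcurrentHashMap<>());
        return token;
    }

    static void put(String token, String name, Object value) {
        Map<String, Object> bag = STATE.get(token);
        if (bag == null) {
            throw new IllegalArgumentException("unknown token: " + token);
        }
        bag.put(name, value);
    }

    // Returns null when the token or the name is unknown.
    static Object get(String token, String name) {
        Map<String, Object> bag = STATE.get(token);
        return bag == null ? null : bag.get(name);
    }

    // Entries are never dropped automatically in this sketch, so some cleanup is needed.
    static void discard(String token) {
        STATE.remove(token);
    }

    public static void main(String[] args) {
        String tabA = newToken();
        String tabB = newToken();
        put(tabA, "selectedOrder", 17);
        put(tabB, "selectedOrder", 42);
        System.out.println(get(tabA, "selectedOrder")); // prints 17
        System.out.println(get(tabB, "selectedOrder")); // prints 42
    }
}
```

Because the two tabs hold different tokens, neither sees the other's data. A real version would also need expiry, since nothing here removes abandoned tokens.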
The right tool for you depends on your needs. Remember that your challenge lies in making HTTP, which is inherently stateless, more stateful. This thread has a very good discussion on this topic: Best Practices for Passing Data Between Pages

Technology for database access system

I am currently designing a system which should allow access to a database. Assumptions are as follows:
The database should have an access layer. The access layer should provide objects that represent database tables. (This would be done using some ORM framework.)
A client that wants to get data from the database should first get an object from the access layer, and then get data using those objects.
Clients could use Python, Java or C++.
The access layer is based on Java.
There won't be too many clients, but they will be operating on large amounts of data.
The question which is hard for me is what technology should be used for passing objects between the access layer and clients. I am considering ZeroC ICE, Apache Thrift or Google Protocol Buffers.
Does anyone have an opinion on which one is worth using?
This is my research for Protocol Buffers:
Advantages:
simple to use and easy to start
well documented
highly optimized
object data structures are defined in a Java-like language
automatically generates implementations of setters, getters and build methods for Python, Java and C++
open-source bindings for other languages
objects can be extended without affecting old versions of an application
there are many open-source RpcChannel and RpcController implementations (not tested)
Disadvantages:
need to implement object transfer
object structures have to be defined before use, so we can't add fields on the fly (Updated: there are possibilities to do that, see the comments)
if we need to read a single field of an object, we have to parse the whole file (in contrast, in XML we could ignore chosen tags)
if we want to use RPC to invoke object methods, we need to define services and deliver RpcChannel and RpcController implementations
This is my research for Apache Thrift:
Advantages:
provides a compiler that generates source code for the supported languages (classes and all the other important pieces)
allows defining optional fields in structures (when a field has no value set, less data is transferred)
allows marking some methods as "one way" (they return nothing, and after invoking them the client does not wait for the server to report that processing has completed)
supports serialization (and deserialization) of collections (maps, lists, sets), objects and primitives, as well as constants, enumerations and exceptions
most problems and errors are already solved and explained
provides different methods of serialization (TBinaryProtocol...) and different ways of exchanging data (TBufferedTransport, TZlibTransport...)
the compiler produces classes (structures) that we can extend by adding new methods
fields can be added to the protocol (on the server as well as the client) and others removed; old and new code can still interact properly (subject to some rules when updating)
enables asynchronous calls
easy to use
Disadvantages:
the documentation contains some errors, so it is sometimes really hard to find the source of a problem
problems are not always well tagged (when we look for a solution on the Internet)
does not support overloading of service methods
tutorials cover only simple examples of Thrift usage
hard to start
ICE ZeroC:
It looks better than Protocol Buffers, because I wouldn't need to implement object passing myself (e.g. via sockets). ICE also provides ServantLocators, which can manage connections.
The question is whether ICE is much slower and less efficient than Protocol Buffers.

Cache Management with Numerous Similar Database Queries

I'm trying to introduce caching into an existing server application because the database is starting to become overloaded.
Like many server applications we have the concept of a data layer. This data layer has many different methods that return domain model objects. For example, we have an employee data access object with methods like:
findEmployeesForAccount(long accountId)
findEmployeesWorkingInDepartment(long accountId, long departmentId)
findEmployeesBySearch(long accountId, String search)
Each method queries the database and returns a list of Employee domain objects.
Obviously, we want to try and cache as much as possible to limit the number of queries hitting the database, but how would we go about doing that?
I see a couple possible solutions:
1) We create a cache for each method call. E.g. for findEmployeesForAccount we would add an entry with a key account-employees-accountId. For findEmployeesWorkingInDepartment we could add an entry with a key department-employees-accountId-departmentId and so on. The problem I see with this is when we add a new employee into the system, we need to ensure that we add it to every list where appropriate, which seems hard to maintain and bug-prone.
2) We create a more generic query for findEmployeesForAccount (with more joins and/or queries because more information will be required). For other methods, we use findEmployeesForAccount and remove entries from the list that don't fit the specified criteria.
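A rough sketch of option 2, assuming the per-account list is small enough to filter in memory. Everything here is hypothetical: the Employee class is reduced to two fields, and the loader function stands in for the real database query. Only the per-account list is cached, so there is a single entry to invalidate when an employee is added:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongFunction;
import java.util.stream.Collectors;

final class AccountEmployeeCache {
    // Simplified stand-in for the Employee domain object (fields are assumptions).
    static final class Employee {
        final long departmentId;
        final String name;

        Employee(long departmentId, String name) {
            this.departmentId = departmentId;
            this.name = name;
        }
    }

    // Stands in for the broad "all employees of an account" database query.
    private final LongFunction<List<Employee>> loadFromDatabase;
    private final Map<Long, List<Employee>> byAccount = new ConcurrentHashMap<>();

    AccountEmployeeCache(LongFunction<List<Employee>> loadFromDatabase) {
        this.loadFromDatabase = loadFromDatabase;
    }

    // The only method that touches the database, and only on a cache miss.
    List<Employee> findEmployeesForAccount(long accountId) {
        return byAccount.computeIfAbsent(accountId,
                id -> Collections.unmodifiableList(new ArrayList<>(loadFromDatabase.apply(id))));
    }

    // The narrower finders filter the cached account list instead of querying again.
    List<Employee> findEmployeesWorkingInDepartment(long accountId, long departmentId) {
        return findEmployeesForAccount(accountId).stream()
                .filter(e -> e.departmentId == departmentId)
                .collect(Collectors.toList());
    }

    List<Employee> findEmployeesBySearch(long accountId, String search) {
        String needle = search.toLowerCase();
        return findEmployeesForAccount(accountId).stream()
                .filter(e -> e.name.toLowerCase().contains(needle))
                .collect(Collectors.toList());
    }

    // Call after adding, changing or removing an employee of this account.
    void invalidateAccount(long accountId) {
        byAccount.remove(accountId);
    }
}
```

The trade-off is that every narrow query pays for loading the whole account once, which may not be acceptable for very large accounts.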
I'm new to caching so I'm wondering what strategies people use to handle situations like this? Any advice and/or resources on this type of stuff would be greatly appreciated.
I've been struggling with the same question myself for a few weeks now... so consider this a half-answer at best. One bit of advice that has been working out well for me is to use the Decorator Pattern to implement the cache layer. For example, here is an article detailing this in C#:
This allows you to literally "wrap" your existing data access methods without touching them. It also makes it very easy to swap out the cached version of your DAL for the direct access version at runtime (which can be useful for unit testing).
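Since the question is about a Java-style data layer, here is a minimal sketch of that decorator idea in Java. The interface is cut down to one finder plus one write method, and all names (EmployeeDao, CachingEmployeeDao, InMemoryEmployeeDao) are made up for illustration. Employees are plain strings to keep it short:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical DAO interface, reduced to two methods.
interface EmployeeDao {
    List<String> findEmployeesForAccount(long accountId);

    void addEmployee(long accountId, String name);
}

// Stand-in for the real database-backed DAO; counts how often it is queried.
final class InMemoryEmployeeDao implements EmployeeDao {
    private final Map<Long, List<String>> rows = new HashMap<>();
    int queries = 0;

    @Override
    public List<String> findEmployeesForAccount(long accountId) {
        queries++;
        return new ArrayList<>(rows.getOrDefault(accountId, Collections.emptyList()));
    }

    @Override
    public void addEmployee(long accountId, String name) {
        rows.computeIfAbsent(accountId, k -> new ArrayList<>()).add(name);
    }
}

// The decorator: same interface, wraps any EmployeeDao and adds caching on top.
final class CachingEmployeeDao implements EmployeeDao {
    private final EmployeeDao delegate;
    private final Map<Long, List<String>> cache = new ConcurrentHashMap<>();

    CachingEmployeeDao(EmployeeDao delegate) {
        this.delegate = delegate;
    }

    @Override
    public List<String> findEmployeesForAccount(long accountId) {
        // Only a cache miss reaches the wrapped DAO.
        return cache.computeIfAbsent(accountId, delegate::findEmployeesForAccount);
    }

    @Override
    public void addEmployee(long accountId, String name) {
        delegate.addEmployee(accountId, name);
        // Invalidate instead of patching the cached list; the next read reloads it.
        cache.remove(accountId);
    }
}
```

Callers depend only on EmployeeDao, so `new CachingEmployeeDao(new InMemoryEmployeeDao())` and the bare DAO are interchangeable, which is what makes swapping them in tests easy.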
I'm still struggling to manage my cache keys, which seem to spiral out of control when there are numerous parameters involved. Inevitably, something ends up not being properly cleared from the cache and I have to resort to heavy-handed ClearAll() approaches that just wipe out everything. If you find a solution for cache key management, I would be interested, but I hope the decorator pattern layer approach is helpful.

Repository pattern with "modern" data access strategies

So I was searching the web looking for best practices when implementing the repository pattern with multiple data stores when I found my entire way of looking at the problem turned upside down. Here's what I have...
My application is a BI tool pulling data from (as of now) four different databases. Due to internal constraints, I am currently using LINQ-to-SQL for data access but require a design that will allow me to change to Entity Framework or NHibernate or the next data access du jour. I also hold steadfast to decoupled layers in my apps using an IoC framework (Castle Windsor in this case).
As such, I've used the Repository pattern to abstract the actual data access code from my business layer. As a result, my business object is coded against some I<Entity>Repository interface and the IoC Container is used to manage the actual implementation. In this case, I would expect to have a concrete Linq<Entity>Repository that implements the interface using LINQ-to-SQL to do the work. Later I could replace this with an EF<Entity>Repository with no changes required to my business layer.
Also, because I'm coding against the interface, I can easily mock the repository for unit testing purposes.
So the first question that I have as I begin coding the application is whether I should have one repository per DataContext or per entity (as I've typically done)? Let's say one database contains Customers and Sales with the expected relationship. Should I have a single OrderTrackingRepository with methods that work with both entities or have a separate CustomerRepository and a different SalesRepository?
Next, as a BI tool, the primary interface is for reporting, charting, etc and often will require a "mashup" of data across multiple sources. For instance, the reality is that one database contains customer information while another handles sales information and a third holds other financial information but one of my requirements is to display aggregated information that spans all three. Plus, I have to support dynamic filtering in the UI. Obviously working directly against the LINQ-to-SQL or EF DataContext objects (Table<Entity>, for instance) will allow me to pretty much do anything. What's the best approach to expose that same functionality to my business logic when abstracting the DAL with a repository interface?
This article indicates that EF4 has turned this approach around and that the repository is nothing more than an IQueryable returned from the EF DataContext, which brings up a whole other set of questions.
But, I think I've rambled on enough...
UPDATE (Thanks, Steven!)
Okay, let me put a more tangible (for me, at least) example on the table and clarify a few points that will hopefully lead to an approach I can better wrap my head around.
While I understand what Steven has proposed, I have a team of developers I have to consider when implementing such things and I'm afraid they will get lost in the complexity (yes, a real problem here!).
So, let's remove any direct tie-in with Linq-to-Sql because I don't want a solution that is dependent upon the way L2S works - or even EF, for that matter. My intent has been to abstract away the data access technology being used so that I can change it as needed without requiring collateral changes to the consuming code in my business layer. I've accomplished this in the past by presenting the business layer with IRepository interfaces to work against. Perhaps these should have been named IUnitOfWork or, more to my liking, IDataService, but the goal is the same. These interfaces typically exposed methods such as Add, Remove, Contains and GetByKey, for example.
Here's my situation. I have three databases to work with. One is DB2 and contains all of the business information for a customer (franchise) such as their info and their Products, Orders, etc. Another, SQL Server database contains their financial history while a third SQL Server database contains application-specific information. The first two databases are shared by multiple applications.
Through my application, the customer may enter/upload their financial information for a given time period. When entered, I have to perform the following steps:
1. Validate the entered data against a set of static rules. For example, the data must contain a legitimate customer ID value (in the case of an upload). This requires a lookup in the DB2 database to verify that the supplied customer ID exists and is current.
2. Next I have to validate the data against a set of dynamic rules which are contained in the third (SQL Server) database. An example may be that a given value cannot exceed a certain percentage of another value.
3. Once validated, I persist the data to the second SQL Server database containing the financial data.
All the while, my code must have loosely-coupled dependencies so I may mock them in my unit tests.
As part of the analysis, I know that I have three distinct data stores to work with and about a half-dozen or so entities (at this time) that I am working with. In generic terms, I presume that I would have three DataContexts in my application, one per data store, with the entities exposed by the appropriate data context.
I could then create a separate I{repository|unit of work|service} for each entity that would be consumed by my business logic, with a concrete implementation that knows which data context to use. But this seems to be a risky proposition: as the number of entities increases, so does the number of individual repository|UoW|service types.
Then, take the case of my validation logic which works with multiple entities and, thereby, multiple data contexts. I'm not sure this is the most efficient way to do this.
The other requirement that I have yet to mention is on the reporting side where I will need to execute some complex queries on the data stores. As of right now, these queries will be limited to a single data store at a time, but the possibility is there that I might need to have the ability to mash data together from multiple sources.
Finally, I am considering the idea of pulling out all of the data access stuff for the first two (shared) databases into their own project and have been looking at WCF Data Services as a possible approach. This would give me the basis for a consistent approach for any application making use of this data.
How does this change your thinking?
In your case I would recommend returning IEnumerables from the repo for your data queries. I usually aggregate calls from multiple repos through a service class that represents the domain problem and encapsulates my business logic. To keep it clean I try to keep my repos focused on the domain problem. I liken my DataContext to a repo, and extract an interface using a T4 template to make life easier for mocking. But there is nothing stopping you from using a traditional repo that encapsulates your calls. Doing it this way will allow you to switch ORMs at any stage.
I have also done a lot of work in this area, and INITIALLY came to the same conclusion, however it is NOT a good solution. The point of the Repo is to abstract queries into discrete chunks of work. Exposing IQueryable is too ad hoc and raises some issues later down the line. You lose your ability to scale. You lose your ability to optimize queries (let's say I want to move to a highly optimized stored proc). You lose your ability to use IoC for the repo to switch out data access layers (switch the project from SQL to Mongo). You lose your ability to provide effective data caching in the Repo (which is a major strength of the Repo pattern). I would recommend taking a CLOSE look at WHY we have a Repo pattern. It isn't simply an "ORM" mapping layer. What made this really clear to me was the CQRS pattern.
Further to this, allowing the ad hoc nature of IQueryable opens you up to ill-fitting reuse of queries. It is GENERALLY not a good idea to reuse queries, since from query to query you see slight deviations, which ends up with two byproducts: queries become too broad and inefficient, and queries become riddled with unmaintainable IF THEN statements to cater for the deviations.
IQueryable is easy, but opens you up to an unmaintainable mess.
Look at this SO answer. I think it shows a simplified model of what you want. IQueryable<T> is indeed our new Repository :-). DataContext and ObjectContext are our Unit of Work.
Here is a blog post that describes the model you might be looking for.
It would be wise to hide the shared databases behind a service. This will solve several problems:
This will make the database private to the service, which makes it much easier to change the implementation when needed.
You can put the needed validation logic (for database 1) in that service and can create tests for that validation logic in that project.
Clients accessing that service can assume correctness of the service, and its validation logic.
The result of this is that your application will send data to the service to validate it, call the service to fetch data, query its own private database (database 3), and join the data from the three data sources together locally. I've never been a fan of using cross-database or even cross-server (in your situation) database calls and letting the database join everything together. Transactions will be promoted to distributed transactions and it's hard to predict how much data the servers will exchange.
When you abstract the shared databases behind the service, things get easier (at least from your application's point of view). Your application calls services it trusts which limits the amount of code in that application and the amount of tests. You still want to mock the calls to such a service, but that would be pretty easy. It should also solve the problem of validating over multiple data sources.
Validation is always a hard part. I'm very familiar with Validation Application Block, and love it for its flexibility. It isn't an easy framework, however, but you might take a peek at what you can do with it. For instance, I've written several articles about integration with O/RM tools and how to 'embed' a context (context as in DataContext/Unit of Work) in Validation Application Block.
Please have a look at my IRepository pattern implementation using EF 4.0.
My solution has the following features:
Support for connections to multiple databases
One repository per entity
Support for execution of queries
Unit of work pattern implementation
Support for validating entities using VAB guidance
Common operations are kept at base class level. Heavy use of OOP techniques for code reusability and ease of maintenance.

Caching Pattern: What do you call (and how do you replace) OpenSymphony OsCache "group" paradigm

A caching issue for you cache gurus.
We have used OpenSymphony's OsCache for several years and are considering a move to a better/stronger/faster/actively-developed caching product.
We have used OsCache's "group entry" feature and have not found it elsewhere.
In short, OsCache allows you to specify one or more groups at 'entry insertion time'. Later you can invalidate a "group of entries", without knowing the keys for each entry.
OsCache Example
Here is example code using this mechanism:
Object[] groups = {"mammal", "Northern Hemisphere", "cloven-feet"};
myCache.put(myKey, myValue, groups);
// later you can flush all 'mammal' entries
// or flush all 'cloven-foot'
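To make the semantics concrete, here is a small self-contained sketch of a cache with group-based invalidation. This is not the OsCache API. GroupedCache and its method names are invented for illustration, and there is no eviction or expiry. It keeps a reverse index from group name to keys, so flushing a group does not need to scan every entry:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical cache with group invalidation (illustration only, not OsCache).
final class GroupedCache<K, V> {
    private final Map<K, V> entries = new HashMap<>();
    private final Map<K, Set<String>> groupsByKey = new HashMap<>();
    private final Map<String, Set<K>> keysByGroup = new HashMap<>();

    synchronized void put(K key, V value, String... groups) {
        remove(key); // drop any old group memberships of this key first
        entries.put(key, value);
        groupsByKey.put(key, new HashSet<>(Arrays.asList(groups)));
        for (String group : groups) {
            keysByGroup.computeIfAbsent(group, g -> new HashSet<>()).add(key);
        }
    }

    synchronized V get(K key) {
        return entries.get(key);
    }

    synchronized void remove(K key) {
        entries.remove(key);
        Set<String> groups = groupsByKey.remove(key);
        if (groups == null) {
            return;
        }
        for (String group : groups) {
            Set<K> keys = keysByGroup.get(group);
            if (keys != null) {
                keys.remove(key);
                if (keys.isEmpty()) {
                    keysByGroup.remove(group);
                }
            }
        }
    }

    // Removes every entry tagged with the group; returns how many were removed.
    synchronized int flushGroup(String group) {
        Set<K> keys = keysByGroup.get(group);
        if (keys == null) {
            return 0;
        }
        List<K> snapshot = new ArrayList<>(keys); // copy, because remove() mutates the set
        for (K key : snapshot) {
            remove(key);
        }
        return snapshot.size();
    }

    public static void main(String[] args) {
        GroupedCache<String, String> cache = new GroupedCache<>();
        cache.put("elk", "Elk", "mammal", "Northern Hemisphere", "cloven-feet");
        cache.put("bear", "Bear", "mammal");
        cache.put("trout", "Trout", "fish");
        System.out.println(cache.flushGroup("mammal")); // prints 2
        System.out.println(cache.get("trout"));         // prints Trout
    }
}
```

The cost of a flush is proportional to the size of the group rather than the size of the cache, at the price of the extra index maps.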
Alternative: Matcher Mechanism
We use another home-grown cache written by a former team member which uses a 'key matcher' pattern for invalidating entries
In this approach you would define your 'key' and 'matcher' classes as follows:
public class AnimalKey {
    String fRegion;
    String fPhylum;
    String fFootType;
    // ..getters and setters go here
}

public class RegionMatcher implements ICacheKeyMatcher {
    String fRegion;

    public RegionMatcher(String pRegion) {
        fRegion = pRegion;
    }

    public boolean isMatch(Object pKey) {
        boolean bMatch = false;
        if (pKey instanceof AnimalKey) {
            AnimalKey key = (AnimalKey) pKey;
            // match on the region only; the other key fields are ignored
            bMatch = fRegion.equals(key.fRegion);
        }
        return bMatch;
    }
}

myCache.put(new AnimalKey("North America", "mammal", "chews-the-cud"), myValue);
// remove all entries for 'north america'
ICacheKeyMatcher myMatcher = new RegionMatcher("North America");
This mechanism has a simple implementation, but a performance downside: it has to spin through each entry to invalidate a group. (Though it's still faster than spinning through a database.)
The question
(Warning: this may sound stupid) What do you call this functionality? OsCache calls it "cache groups". Neither JBossCache nor EhCache seems to define or implement it. Realm? Region? Kingdom?
Do standard patterns exist for this "cache groups/region" paradigm?
How do rising-star caching products (e.g. ehcache, coherence, jbosscache) handle this problem?
This paradigm isn't in the jcache spec, right? (JSR-107)
How do you handle "mass invalidation"? Caches are great until they grow stale. An API which allows you to invalidate wide swaths is a big help. (E.g. administrator wants to press a button and clear all the cached post entries for, say, a particular forum)
I too implemented a matcher approach when trying to scale a legacy system with an ad hoc invalidation process. The O(n) nature wasn't a problem since the caches were small, the invalidation was performed on a non-user facing thread, and it didn't hold the locks so there wasn't a contention penalty. This was needed for matching against keys that cross cut caches, such as to invalidate all data for a company in caches spread across the application. This was really a problem of having no design centers so the application was monolithic and poorly decomposed.
When we rewrote it based on domain services, I adopted a different strategy. We now had the domain for specific data centralized into specific caches, such as for configurations, so what we wanted became multi-key lookup. In this case we realized that the key was just a subset of the value, so we could extract all of the keys after load from metadata (e.g. annotations). This allowed for fine-grained grouping and a convenient programming model through our cache abstraction. I published the core data structure, IndexMap, in a tutorial on the idea. It's not meant for direct usage outside of an abstraction, but it better solves the grouping problem we faced.
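A rough illustration of the "key is a subset of the value" idea, with the caveat that this is not the IndexMap mentioned above. MultiKeyCache is a made-up name, and plain extractor functions stand in for the annotation metadata. Every lookup key is derived from the value itself, so one value can be found or invalidated through any of its keys:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical multi-key cache: all keys are computed from the stored value.
final class MultiKeyCache<V> {
    // Each extractor derives one lookup key from a value (assumed unique per value).
    private final List<Function<V, String>> keyExtractors;
    private final Map<String, V> index = new HashMap<>();

    MultiKeyCache(List<Function<V, String>> keyExtractors) {
        this.keyExtractors = new ArrayList<>(keyExtractors);
    }

    // Registers the value under every key that can be derived from it.
    synchronized void put(V value) {
        for (Function<V, String> extractor : keyExtractors) {
            index.put(extractor.apply(value), value);
        }
    }

    synchronized V get(String key) {
        return index.get(key);
    }

    // Invalidating through any one key removes the value under all of its keys.
    synchronized void invalidate(String anyKey) {
        V value = index.get(anyKey);
        if (value == null) {
            return;
        }
        for (Function<V, String> extractor : keyExtractors) {
            index.remove(extractor.apply(value), value);
        }
    }
}
```

For example, a user record could be registered with the extractors `u -> "id:" + u.getId()` and `u -> "email:" + u.getEmail()` (hypothetical getters), and then looked up or invalidated by either. The prefixes are only there to stop keys of different kinds from colliding in the shared map.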
