Handling large object in stateless environment - performance

We have various windows services that load up a large amount of data i.e. mostly settings, from a database into an object which is used whenever calls are made to our various .net remoting functions (I know it's old!!). Having this object containing all these settings in memory saves us having the query the database constantly or load the data from a cache whenever queries are executed.
Settings in this "large" object are collections of data, from id, path, text, etc...
We want to move away from .net remoting to wcf and potentially get rid of our windows services and run the lot under IIS (and eventually Azure), but being stateless, I'm wondering how should we handle this?
1) What's the best method you can think of? From experience preferrably.
One suggestion that was made to me was to return all of this to the client, cache it and use only the relevant settings when making a wcf call.
2) Numerous services we have are polling services, constantly monitoring, databases, file locations, ftp locations, etc... How would you recommend to handle this in a stateless environment?? I can't see how this will be handled.
We use SQL Server, but I don't want to rely too heavily on the build-in features as we could potentially have to suppor the likes of mySQL & Oracle.
Thanks.
Thierry

You could store these settings in the AppSettings section of the config file (Web.config for IIS). Using the ConfigurationManager class, you can retrieve the relevant values as needed.
If you prefer to store a static instance of your settings object, suggest implementing a Singleton pattern for the same. Jon Skeet's article is a great starting point.
Hope this helps.

Related

Clarification on database caching

Correct me if I'm wrong, but from my understanding, "database caches" are usually implemented with an in-memory database that is local to the web server (same machine as the web server). Also, these "database caches" store the actual results of queries. I have also read up on the multiple caching strategies like - Cache Aside, Read Through, Write Through, Write Behind, Write Around.
For some context, the Write Through strategy looks like this:
and the Cache Aside strategy looks like this:
I believe that the "Application" refers to a backend server with a REST API.
My first question is, in the Write Through strategy (application writes to cache, cache then writes to database), how does this work? From my understanding, the most commonly used database caches are Redis or Memcached - which are just key-value stores. Suppose you have a relational database as the main database, how are these key-value stores going to write back to the relational database? Do these strategies only apply if your main database is also a key-value store?
In a Write Through (or Read Through) strategy, the cache sits in between the application and the database. How does that even work? How do you get the cache to talk to the database server? From my understanding, the web server (the application) is always the one facilitating the communication between the cache and the main database - which is basically a Cache Aside strategy. Unless Redis has some kind of functionality that allows it to talk to another database, I don't quite understand how this works.
Isn't it possible to mix and match caching strategies? From how I see it, Cache Aside and Read Through are caching strategies for application reads (user wants to read data), while Write Through and Write Behind are caching strategies for application writes (user wants to write data). Couldn't you have a strategy that uses both Cache Aside and Write Through? Why do most articles always seem to portray them as independent strategies?
What happens if you have a cluster of webs servers? Do they each have their own local in-memory database that acts as a cache?
Could you implement a cache using a normal (not in-memory) database? I suppose this would still be somewhat useful since you do not need to make an additional network hop to the database server (since the cache lives on the same machine as the web server)?
Introduction & clarification
I guess you have one misunderstood point, that the cache is NOT expclicitely stored on the same server as the werbserver. Sometimes, not even the database is sperated on it's own server from the webserver. If you think of APIs, like HTTP REST APIs, you can use caching to not spend too many resources on database connections & queries. Generally, you want to use as few database connections & queries as possible. Now imagine the following setting:
You have a werbserver who serves your application and a REST API, which is used by the webserver to work with some resources. Those resources come from a database (lets say a relational database) which is also stored on the same server. Now there is one endpoint which serves e.g. a list of posts (like blog-posts). Every user can fetch all posts (to make it simple in this example). Now we have a case where one can say that this API request could be cached, to not let all users always trigger the database, just to query the same resources (via the REST API) over and over again. Here comes caching. Redis is one of many tools which can be used for caching. Since redis is a simple in-memory key-value storage, you can just put all of your posts (remember the REST API) after the first DB-query, into the cache. All future requests for the posts-list would first check whether the posts are alreay cached or not. If they are, the API will return the cache-content for this specific request.
This is one simple example to show off, what caching can be used for.
Answers on your question
My first question is, why would you ever write to a cache?
To reduce the amount of database connections and queries.
how is writing to these key-value stores going to help with updating the relational database?
It does not help you with updating, but instead it helps you with spending less resources. It also helps you in terms of "temporary backing up" some data - but that only as a very little side effect. For this, out there are more attractive solutions (Since redis is also not persistent by default. But it supports persistence.)
Do these cache writing strategies only apply if your main database is also a key-value store?
No, it is not important which database you use. Whether it's a NoSQL or SQL DB. It strongly depends on what you want to cache and how the database and it's tables are set up. Do you have frequent changes in your recources? Do resources get updated manually or only on user-initiated actions? Those are questions, leading you to the right caching implementation.
Isn't it possible to mix and match caching strategies?
I am not an expert at caching strategies, but let me try:
I guess it is possible but it also, highly depends on what you are doing in your DB and what kind of application you have. I guess if you find out what kind of application you are building up, then you will know, what strategy you have to use - i guess it is also not recommended to mix those strategies up, because those strategies are coupled to your application type - in other words: It will not work out pretty well.
What happens if you have a cluster of webs servers? Do they each have their own local in-memory database that acts as a cache?
I guess that both is possible. Usually you have one database, maybe clustered or synchronized with copies, to which your webservers (e.g. REST APIs) make their requests. Then whether each of you API servers would have it's own cache, to not query the database at all (in cloud-based applications your database is also maybe on another separated server - so another "hop" in terms of networking). OR (what i also can imagine) you have another middleware between your APIs (clusterd up) and your DB (maybe also clustered up) - but i guess that no one would do that because of the network traffic. It would result in a higher response-time, what you usually want to prevent.
Could you implement a cache using a normal (not in-memory) database?
Yes you could, but it would be way slower. A machine can access in-memory data faster then building up another (local) connection to a database and query your cached entries. Also, because your database has to write the entries into files on your machine, to persist the data.
Conclusion
All in all, it is all about being fast in terms of response times and to prevent much network traffic. I hope that i could help you out a little bit.

Do Different CRM Orgs Running On The Same Box Share The Same App Domain?

I'm doing some in memory Caching for some Plugins in Microsoft CRM. I'm attempting to figure out if I need to be concerned about different orgs populating the same cache:
// In Some Plugin
var settings = Singleton.GetCache["MyOrgSpecificSetting"];
// Use Org specific cached Setting:
or do I need to do something like this to be sure I don't cross contaminate settings:
// In Some Plugin
var settings = Singleton.GetCache[GetOrgId() + "MyOrgSpecificSetting"];
// Use Org specific cached Setting:
I'm guessing this would also need to be factored in for Custom Activities in the AsyncWorkflowService as well?
Great question. As far as I understand, you would run into the issue you describe if you set static data if your assemblies were not registered in Sandbox Mode, so you would have to create some way to uniquely qualify the reference (as your second example does).
However, this goes against Microsoft's best practices in Plugin/Workflow Activity development. Every plugin should not rely on state outside of the state that is passed into the plugin. Here is what it says on MSDN found HERE:
The plug-in's Execute method should be written to be stateless because
the constructor is not called for every invocation of the plug-in.
Also, multiple system threads could execute the plug-in at the same
time. All per invocation state information is stored in the context,
so you should not use global variables or attempt to store any data in
member variables for use during the next plug-in invocation unless
that data was obtained from the configuration parameter provided to
the constructor.
So the ideal way to managage caching would be to use either one or more CRM records (likely custom) or use a different service to cache this data.
Synchronous plugins of all organizations within CRM front-end run in the same AppDomain. So your second approach will work. Unfortunately async services are running in separate process from where it would not be possible to access your in-proc cache.
I think it's technically impossible for Microsoft NOT to implement each CRM organization in at least its own AppDomain, let alone an AppDomain per loaded assembly. I'm trying to imagine how multiple versions of a plugin-assembly are deployed to multiple organizations and loaded and executed in the same AppDomain and I can't think of a realistic way. But that may be my lack of imagination.
I think your problem lies more in the concurrency (multi-threading) than in sharing of the same plugin across organizations. #BlueSam quotes Microsoft where they seem to be saying that multiple instances of the same plugin can live in one AppDomain. Make sure multiple threads can concurrently read/write to your in-mem cache and you'll be fine. And if you really really want to be sure, prepend the cache key with the OrgId, like in your second example.
I figure you'll be able to implement a concurrent cache, so I won't go into detail there.

Should I make my CouchDB database server public-facing?

I'm new to CouchDb and am trying to comprehend how to properly make use of it. I'm coming from MongoDB where I would always write a web layer and put it in front of mongo so that I could allow users to access the data inside of it, etc. In fact, this is how I've used all databases for every web site that I've ever written. So, looking at Couch, I see that it's native API is HTTP and that it has built in things like OAuth support, and other features that hint to me that perhaps I should no longer have my code layer sitting in front of Couch, but instead write Views and things and just give out accounts to Couch to my users? I'm thinking in terms of like an HTTP-based API for a site of mine, or something that users would consume my data through. Opening up Couch like this seems odd to me, though. Is OAuth, in Couch's sense, meant more for remote access for software that I'd write and run internal to my own network "officially", or is it literally meant for the end users?
I know there might be things that could only be done through a code layer on top of CouchDB, like if you wanted additional non-database related things to occur during API requests, also. So thinking along those lines I think I will still need a code layer, anyway.
Dealer's choice.
Nodejitsu has a great writeup on this sort of topic here.
Not knowing your application specifics I'll take a broad approach...
Back-end
If you want to prevent users from ever seeing your database then make it back-end. You can pipe everything through something like node.js and present only what the user needs to see and they'll never know anything about the database.
See Resource View Presenter
Front-end
If you are not concerned about data security, you can host an entire app on CouchDB; see CouchApp. This approach has the benefit of using the replication mechanism to control publishing your site/data. The drawback here is that you will almost certainly run into some technical limitations that will require moving CouchDB closer to the backend.
Bl-end
Have the app server present the interface and the client pull the data from the database separately. This gives the most flexibility but can be a bag of hurt because even with good design this could lead to supportability and scalability issues.
My recommendation
Use CouchDB on the backend. If you need mobile clients to synchronize then use a secondary DB publicly exposed for this purpose and selectively sync this data to wherever it needs to go.
Simply put, no.
There's no way to secure Couch properly on a public facing site. There's no way to discriminate access at a fine enough granular level. If someone has access to any of the data, they have access to all of the data.
Not all data on a site is meant for public consumption, save for the most trivial of sites.

What's the best place for a database-backed, memory-resident global cache in an ASP.NET web server?

I have to cache an object hierarchy in-memory for performance reasons, which reflects a simple database table with columns (ObjectID, ParentObjectID, Timestamp) and view CurrentObjectHierarchy. I query the CurrentObjectHierarchy and use a hash table to cache the current parents of each object for quickly looking up the parent object ID, given any object ID. Querying the database table and constructing the cache is a 77ms operation on average, and ideally this refresh occurs only when a method in my database API is called that would change the hierarchy (adding/removing/reparenting an object).
Where is the best place for such a cache, if it must be accessed by multiple ASP.NET web applications, possibly running in different application pools?
Originally, I was storing the cache in a static variable in a C# dll shared by the different web applications. The problem, of course, is that while static variables can be accessed across threads, they cannot be accessed across processes, which is a problem when multiple web-apps are involved (possibly running in separate application pools). As a result... synchronized, thread-safe modifications to the object hierarchy cache in one application are not reflected in other applications, even though they are using the same code-base.
So I need a more global location for this cache. I cannot use static variables (as I just explained), session state (which is basically a per-user store), and application state (needs to be accessible across applications).
Potential places I've been considering are:
Some kind of global object storage within IIS itself, accessible from any thread in any application in any application pool (if such a place exists. Does it?)
A separate, custom web service that manages an exclusive cache.
Right now, I think the BEST solution is SQL CLR integration, because:
I can keep my current design using static variables
It's a separate service that already exists, so I don't have to write a custom one
It will be running in a single process (SQL Server), so the existing lock-based synchronization will work fine
The cache would be setting as close as possible to the data structures it represents!
I would embed the hierarchy-traversing methods in the SQL CLR DLL, so that I could make a single SQL call where I would normally make a regular method call. This all depends on SQL Server running in a single process and the CLR being loaded into that process, which I think is the case. What do you think of this? Can you see anything obviously wrong with this idea that I may be missing? Is this not an awesome idea?
EDIT:
After looking more closely, it seems that different ASP.NET applications actually run in the same process, but are isolated by AppDomains. If I could find a way to share and synchronize data across AppDomains, that would be very very useful. I'm reading about .NET Remoting now.
Microsoft is working on a distributed caching framework: Velocity. However, the latest release is a CTP3 version, so it may not be production ready...

Starting out, any suggestions?

I have started working in C# for almost a few months and I am looking for something more challenging and interesting. I use a media player called media monkey that supports custom vb scripts, well I made one that writes a file to a dir that has the current song playing, and is updated every time a new song is playing by rewriting what was there before.
Now I want to add this information to a database and keep a record of this and possibly add the information on my home page. I know I can hack a way for it to work, but I want to know what would be the "professional way" of doing things.
I came up with the following and got stuck. I would need an ODBC driver to connect to a database which seems messy, would a web service work? How would that work? Can a VbScript call a dll file to call upon a web service to modify data on a seperate server? Is that safe to do?
Many professional C# apps are n-tier. In your case, you would probably layer it like this:
On the server:
-Database Store
-Database Access/Business layer(sometimes two distinct components, depending on how complex the app is)
-Web Service
On the client:
-Web Service Client
-Any other layers to support client functionality.
So the Database Store would be something like some tables in an Oracle or Microsoft SQL Server, and would on your server.
Database Access/Business layer would be your code that retrieves and stores data to/from your database. It might also contain business objects, which are basically classes that have properties representing your data from your database. The benefit of the data access layer is that sometimes reading and writing to a database can require specialized code, and you don't want that code sprinkled throuought your application. So instead you can call functions in your data access layer that loads needed data into objects, so the rest of your application is just interacting with a regular old .NET object/class. These are called POCOs, which stands for something like Plan Old CLR Object. There are lots of variations on this of course, as people have taken different approaches to the problem of isaloting database access. Also it serves the purpose of minimizing breaking changes whenever the database changes. Since the database access logic is not sprinkled throughout the app, then there are fewer places that need to be updated if the database changes (such as adding new columns to a table or changing a name).
Sometimes the business layer will be it's own layer, and would contain most of the "logic" of the application. It would sit between the data access and web service layers. Using concepts from Service Oriented Architecture (SOA), you might have an authentication service, and a web request handling service. These services are a lot like a class that is always instantiated, there waiting to process requests. Your web request handling service would take a request, and maybe first call into the authentication service to verify credentials before honoring the request. SOA is one of those things I think should be used only when appropriate. It some cases just using Object Oriented techniques will give you the same benefits. Not always though. SOA, when done right, is more scalable, so it really depends on whether SOA offers you additional benefits that you need.
The Webservice would be responsible for receiving requests from the web, parsing/interpreting them, and acting on those requests by making calls into your business layer to update or retrieve data.
So the concept here would be that you could have many users of your service who publish their song updates through your service.
Your client would have a "web service client" layer which would be responible for formatting requests into messages, sending them to the web service, and retrieving messages from the web service. You would put very little application "logic" in your web service layer.
Now all this is probably overkill and inefficient for what you are wanting to do since you just want something for yourself, but it's the basic anatomy of a lot of webservice applications and would be a good learning exercise. The whole purpose of the layers is decoupling and simplicity. While more layers/components makes the application overall more complex, it means each component is simpler. This means it's easier to wrap your head around problems when you are only dealing with one component which interacts with only a couple other components(the sourounding layer). So there is a careful balance between few components and many components. Too few and they become monolithic and difficult to manage. Too many, and they become intertwined in complex ways. I have heard it said something along the lines of "If a class is getting too big and too complex, then split it up into a few more classes". In essence, don't start subdividing stuff for the heck of it just because it sounds like the right thing to do. Evaluate how complex your component is going to be before deciding if you want to split it up. Sometimes for simple cases your have a layer serving more than one purpose, for the sake of getting it done faster and making the overall design simpler. The point is, apply these concepts where appropriate. You will learn what is appropriate with experience, and you obviously understand that you can learn the most by "doing".
"Can vbscript call a COM component?" You can compile .NET DLLs with COM support. Many older things can call COM dlls.
I googled: vbscript dll
and got this: VB Script and DLLs
"Is that safe to do?" Your webservice will be where you would be most concerned with security. It's safe only if you design with security in mind and don't screw up. We all screw up sometimes though, which means there is no guarantee of it being perfectly secure.

Resources