Loading data to shared memory from database tables - shared-memory

Any ideas on loading data from database tables into shared memory? The goal is to speed up data retrieval from frequently used tables.

The server will automatically cache frequently used tables, so I would not optimize on the server side. If the client is querying remotely, you might consider copying the data to a local database (like the free SQL Express).

You are talking about a cache.
It is easy to implement,
but there are some things you need to remember:
You will need to log changes to the underlying table and reload the cache when they happen
(for example, by polling a change table).
Some operations might be faster inside the database than in your own memory structure.
(If you are interested in fast data access with no work at all, there are some in-memory databases that can do the trick for you.)
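A minimal sketch of the "poll a change table" idea, using Python's built-in sqlite3 module as a stand-in database. The table names `lookup` and `lookup_changes`, the triggers, and the `TableCache` class are all assumptions made up for this illustration; a real server would have its own change-tracking mechanism.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory database with a lookup table and a change log.
    Both tables and the triggers are hypothetical, for illustration only."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE lookup (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE lookup_changes (
            seq INTEGER PRIMARY KEY AUTOINCREMENT,
            changed_id INTEGER
        );
        CREATE TRIGGER log_insert AFTER INSERT ON lookup
            BEGIN INSERT INTO lookup_changes (changed_id) VALUES (NEW.id); END;
        CREATE TRIGGER log_update AFTER UPDATE ON lookup
            BEGIN INSERT INTO lookup_changes (changed_id) VALUES (NEW.id); END;
        CREATE TRIGGER log_delete AFTER DELETE ON lookup
            BEGIN INSERT INTO lookup_changes (changed_id) VALUES (OLD.id); END;
    """)
    conn.executemany("INSERT INTO lookup VALUES (?, ?)", [(1, "red"), (2, "green")])
    conn.commit()
    return conn


class TableCache:
    """Keeps a copy of `lookup` in memory and reloads it only when the
    change log shows a newer sequence number."""

    def __init__(self, conn):
        self.conn = conn
        self.version = None  # last change-log sequence number we loaded
        self.rows = {}       # id -> name, the in-memory copy

    def _current_version(self):
        row = self.conn.execute("SELECT MAX(seq) FROM lookup_changes").fetchone()
        return row[0] or 0

    def refresh_if_changed(self):
        """Poll the change table; return True if the cache was reloaded."""
        version = self._current_version()
        if version != self.version:
            self.rows = dict(self.conn.execute("SELECT id, name FROM lookup"))
            self.version = version
            return True
        return False

    def get(self, key):
        return self.rows.get(key)
```

Between polls the cache can serve stale rows, so the polling interval is a trade-off between freshness and load on the database.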

Related

Difference between In-Memory cache and In-Memory Database

I was wondering if I could get an explanation of the differences between in-memory caches (redis, memcached), in-memory data grids (gemfire) and in-memory databases (VoltDB). I'm having a hard time distinguishing the key characteristics of the three.
Cache - By definition, it is stored in memory. Any data stored in memory (RAM) for faster access is called a cache. Examples: Ehcache, Memcached. Typically you put an object in the cache with a String as the key and access the cache using that key. It is very straightforward. It is up to the application when to access the cache vs. the database, and no complex processing happens in the cache. If the cache spans multiple machines, it is called a distributed cache. For example, Netflix uses EVCache, which is built on top of Memcached, to store the users' movie recommendations that you see on the home screen.
In-Memory Database - It has all the features of a cache plus some processing/querying capabilities. Redis falls under this category. Redis supports multiple data structures, and you can query the data in Redis (for example, get the last 10 accessed items, get the most used item, etc.). It can span multiple machines, usually performs very well, and also supports persistence to disk if needed. For example, Twitter uses Redis to store timeline information.
I don't know about gemfire and VoltDB, but even memcached and redis are very different. Memcached is really simple caching, a place to store variables in a very uncomplex fashion, and then retrieve them so you don't have to go to a file or database lookup every time you need that data. The types of variable are very simple. Redis on the other hand is actually an in memory database, with a very interesting selection of data types. It has a wonderful data type for doing sorted lists, which works great for applications such as leader boards. You add your new record to the data, and it gets sorted automagically.
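To make the contrast concrete, here is a rough pure-Python sketch of the two styles of store described above. Neither class is the memcached or Redis API; `KeyValueCache` and `Leaderboard` are hypothetical stand-ins that only mimic the behaviour (opaque values by key versus a structure that stays sorted as you add to it, which Redis exposes as its sorted set type).

```python
import bisect


class KeyValueCache:
    """Memcached-style store: opaque values saved and fetched by key."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


class Leaderboard:
    """Sorted-set-style store: members stay ordered by score as they are added."""

    def __init__(self):
        self._scores = {}   # member -> current score
        self._ordered = []  # (score, member) tuples, kept sorted ascending

    def add(self, member, score):
        # If the member already has a score, remove the old entry first.
        if member in self._scores:
            old = (self._scores[member], member)
            self._ordered.pop(bisect.bisect_left(self._ordered, old))
        self._scores[member] = score
        bisect.insort(self._ordered, (score, member))

    def top(self, n):
        """Return up to n (member, score) pairs, highest score first."""
        return [(member, score) for score, member in self._ordered[::-1][:n]]
```

With the key-value store, producing a ranking means fetching the values and sorting them in the application; with the sorted structure, the ranking is a direct read.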
So I wouldn't get too hung up on the categories. You really need to examine each tool differently to see what it can do for you, and the application you're building. It's kind of like trying to draw comparisons on nosql databases - they are all very different, and do different things well.
I would add that things in the "database" category tend to have more features to protect and replicate your data than a simple "cache". A cache is (usually) temporary, whereas database data should be persistent. Many cache solutions I've seen do not persist to disk, so if you lost power to your whole cluster, you'd lose everything in the cache.
But there are some cache solutions that have persistence and replication features too, so the line is blurry.
An in-memory cache is a store for common queries, so it relieves the DB of read workloads. A common example of an in-memory cache is Redis. For instance, a web site might store popular searches made by clients, thereby relieving the DB of some load.
In-memory Cache provides query functionality on top of caching (storing session data in RAM (temporary storage)).
Memcached falls into the temporary-store caching category.

storing data in secondary database

Our application (Java, Spring, Hibernate) uses Postgres to store data.
We are looking to add an analysis engine to the application. I want to explore using a NoSQL DB to run the analysis on. This is partly an attempt to learn NoSQL a bit, and partly to free the main application activity from the performance penalty (as much as possible).
So, I want the data changes to also sync to the NoSQL DB (in addition to Postgres). Any sync mechanism will affect the performance of the main data/transaction activity.
Is it a good idea to push the data changes to a message bus and free the main transaction as early as possible? Can anyone point me to frameworks/technologies/ideas that address this issue of the same data going to two different data stores?
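The "push changes to a message bus" idea can be sketched with an in-process queue and a background worker. This is only an illustration of the decoupling: `ChangePublisher` and `apply_to_secondary` are hypothetical names, and an in-process queue loses events if the process dies, which a real message broker is meant to prevent.

```python
import queue
import threading


class ChangePublisher:
    """Hands change events to a background worker so the caller returns
    as soon as the event is queued, without waiting for the second store."""

    def __init__(self, apply_to_secondary):
        self._queue = queue.Queue()
        self._apply = apply_to_secondary  # hypothetical writer for the second store
        self.failed = []                  # (event, exception) pairs for later retry
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def publish(self, event):
        # Called from the main transaction path, ideally after the primary
        # database commit has succeeded.
        self._queue.put(event)

    def _run(self):
        while True:
            event = self._queue.get()
            try:
                self._apply(event)
            except Exception as exc:
                # Keep the worker alive; a real system would retry or dead-letter.
                self.failed.append((event, exc))
            finally:
                self._queue.task_done()

    def flush(self):
        """Block until every queued event has been processed."""
        self._queue.join()
```

A single worker preserves the order in which changes were published, at the cost of throughput on the secondary store.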
The simplest solution would be sending data to a Postgres read replica and running your analytics queries on that. The performance impact is minimal and this would save a lot of time compared to alternative approaches.
Unless you really know what you are doing, I would avoid NoSQL for this kind of application. If your dataset is too big for a Postgres read replica, you might want to use Redshift, which is a columnar datastore optimized for the types of analytics queries typically performed.

How is memcached updated?

I have never used memcached before and I am confused on the following basic question.
Memcached is a cache, right? And I assume we cache data from a DB for faster access. So when the DB is updated, who is responsible for updating the cache? Our code, or does memcached "understand" when the DB has been updated?
Memcached is a cache, right? And I assume we cache data from a DB for faster access.
Yes, it is a cache, but you have to understand that a cache only speeds up access when you often access the same data. If you access thousands of data items or objects that are all different from each other, a cache doesn't help.
To answer your question:
So when the DB is updated who is responsible to update the cache?
Always you, but you don't have to worry about it if you are doing the right thing.
Our code, or does memcached "understand" when the DB has been updated?
memcached doesn't know about your database (actually, the client doesn't even know about the servers). So when you use an object from your database, you should check whether it is present in the cache; if it is not, you put it in the cache, otherwise you are fine. That is all. When the time comes, memcached will free the memory used by old data, or you can tell memcached to expire data after a time you choose (read the API for details).
You (or some plugin) are responsible for updating the cache.
What happens is that the query is compressed to some key features and these are hashed. This is tested against the cache. If the value is in the cache, the data is returned directly from cache. Otherwise the query is performed, stored in cache and returned to the user.
In pseudo code:
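    # Look the query up in the cache first; fall back to the database on a miss.
    def cached_query(your_sql_query):
        key = query_key(your_sql_query)
        if key in cache:
            return cache.get(key)
        else:
            results = execute(your_sql_query)
            # Store the result so the next identical query is served from cache.
            cache.set(key, results, time_to_live)
            return results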
The cache is cleared once in a while: you can give a key a time to live, after which your cached results are refreshed.
This is the simplest model, but it can cause some inconsistencies.
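The pseudo code above can be turned into a runnable sketch. The `TTLCache` class below is an in-process stand-in for memcached, not its client API, and `execute` is whatever function runs the query against your database; both, along with the SHA-256 key scheme, are assumptions made for illustration.

```python
import hashlib
import time


class TTLCache:
    """Tiny in-process cache where every entry carries an expiry time."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Entry is past its time to live: drop it and report a miss.
            del self._entries[key]
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._entries[key] = (self._clock() + ttl_seconds, value)


def query_key(sql, params=()):
    """Reduce a query and its parameters to a fixed-size cache key."""
    raw = repr((sql, tuple(params))).encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


def cached_query(cache, execute, sql, params=(), ttl_seconds=60):
    """Return cached results if present, otherwise run the query and cache it.
    A result of None is never cached, since None is used to signal a miss."""
    key = query_key(sql, params)
    results = cache.get(key)
    if results is not None:
        return results
    results = execute(sql, params)
    cache.set(key, results, ttl_seconds)
    return results
```

The inconsistency mentioned above shows up here as well: until the time to live runs out, callers keep receiving the old results even if the database has changed.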
One strategy is that if your code is also the only app that updates data, then your code can also refresh memcached as a second step after it has updated the database. Or at least evict the stale data from memcached, so the next time an app wants to read it, it will be forced to re-query the current data from the database and restore that latest data to memcached.
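A small sketch of this update-then-evict strategy, with sqlite3 standing in for the database and a plain dict standing in for memcached. The `users` table, the key format and the function names are hypothetical, chosen only for the example.

```python
import sqlite3


def make_db():
    """Create a throwaway database with one hypothetical `users` row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO users VALUES (1, 'Ada')")
    conn.commit()
    return conn


def cache_key(user_id):
    return f"user:{user_id}:name"


def get_user_name(conn, cache, user_id):
    """Read path: serve from the cache, fall back to the database on a miss."""
    key = cache_key(user_id)
    if key in cache:
        return cache[key]
    row = conn.execute("SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()
    name = row[0] if row else None
    cache[key] = name
    return name


def rename_user(conn, cache, user_id, new_name):
    """Write path: update the database first, then evict the stale entry."""
    conn.execute("UPDATE users SET name = ? WHERE id = ?", (new_name, user_id))
    conn.commit()
    # Evicting (rather than overwriting) forces the next reader to re-query
    # the current data and restore it to the cache.
    cache.pop(cache_key(user_id), None)
```

This only stays consistent if every writer goes through `rename_user`; an update made by another application would leave the old value in the cache.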
Another strategy is to store data in memcached with an expiration time, so memcached automatically purges that data element after a certain time. You pick the expiration time, based on your knowledge of how frequently the data might be updated, and how tolerant your app is of reading stale data.
So ultimately, you are the one responsible for putting data into memcached. Only you know what data is worth storing in the cache, what format you want to store it in, how frequently you expect to query it, and when to refresh it. You make this judgment on a case-by-case basis, because you know better than any automatic system the likely behavior of your data and your app.

memcached, business or data?

Is the cache part of the business or data layer in a simple LAMP stack?
It's a cross-cutting concern that may be applied to every piece of data in the Business, Data or any other layer that contains and works with data.
memcached is not part of a simple LAMP stack. The basic LAMP app takes its data directly from the database and templates it into the view. The simple application (and even many complicated ones) don't need any more than that.
You add memcached to an application because you've got data that is too slow to compute to do it all live on-the-fly. Whilst certainly memcache counts as being in the data layer, when you are relying on memcache you lose the consistency of a database server, which means you are usually going to need to include some application-specific rules for how long data is cached based on the business logic of your app. So sure, it impinges on the business layer. And if the stuff you're caching is pre-populated views (eg HTML), then it's touching the presentation layer too.
This wide-ranging and not-easily-encapsulated nature is why you shouldn't introduce memcache to an application until you really need to. Don't assume that it's a necessary foundation for performance; remember your database also has table and query caches you may be able to leverage without having to give up consistency and add cache expiry complexity.
Memcached sits between a database and a web server. It's a cache, but more importantly it's an explicit cache, so things don't get into it on their own. You have to "put" and "get" from it. The biggest advantage is that it is close to 10 times faster than a database. And if you fetch data from memcached, you won't need to make a SQL call, thus saving your database some cycles to do something more important.
So a book catalog website, with roughly 80% reads and 20% writes, is an ideal candidate.

In Memory Caching of Dataset

I am planning to do some in-memory caching of my data for operations in my web service. This data would basically be lookup values which do not change frequently. I was planning to get all that data in datasets (multiple tables) and store them until the data changes on the DB side. This is because some of my data never changes, whereas some may change quite frequently. Any ideas?
I would probably cache it at the DataTable level; then each table could have its own caching rules (expiration time, last updated, etc.).
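The per-table idea can be sketched as a small registry where each table has its own loader and its own expiration rule. This is written in Python rather than against the .NET DataTable API, and `TableCacheRegistry`, `register` and the loader functions are hypothetical names used only for illustration.

```python
import time


class TableCacheRegistry:
    """Caches whole tables, each with its own time to live."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._tables = {}  # table name -> settings and cached rows

    def register(self, name, loader, ttl_seconds=None):
        """Register a table. ttl_seconds=None means load once, never expire."""
        self._tables[name] = {
            "loader": loader,
            "ttl": ttl_seconds,
            "rows": None,
            "loaded_at": None,
        }

    def get(self, name):
        """Return the cached rows, reloading them if this table's rule says so."""
        entry = self._tables[name]
        never_loaded = entry["loaded_at"] is None
        expired = (
            not never_loaded
            and entry["ttl"] is not None
            and self._clock() - entry["loaded_at"] >= entry["ttl"]
        )
        if never_loaded or expired:
            entry["rows"] = entry["loader"]()
            entry["loaded_at"] = self._clock()
        return entry["rows"]
```

Data that never changes can be registered without a time to live, while frequently changing tables get a short one, so a single slow-changing table does not force reloads of everything else.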
