What are the typical ways to cache the result of a relational database query using Redis?

What do developers commonly use as the key and value to cache the result from a SQL query into Redis? For example, if I have a Users table, and I want to cache the results from the query:
SELECT name, age FROM Users
1) Which Redis data structure should I use? Should I just have a single Key for the query and store the entire object returned by the database as the Value as such:
{ key: { object returned by database } }
Or should I use Redis' List data structure and loop through the rows individually and push them into the List as such:
{ key: [ ... ]}
Wouldn't this add computation time of O(N)? How is this more effective than just simply storing the object returned by the database?
Or should I use Redis' Hash Map data structure and loop through the rows individually and set a unique Key for each row with its corresponding attributes as such:
{ key1: {name: 'Bob', age: 25} }, { key2: {name: 'Sally', age: 15} }, ...
2) What would be a good rule of thumb with regards to the Key? From my understanding, some people just use the SQL query as the Key? But if you do so, does that mean you would have to store the entire object returned by the database as the Value (as per question 1)? Is this the best way to do it? If you are using an ORM, do you still use the SQL query as the key?

This is nicely analyzed in the Database Caching Strategies Using Redis whitepaper, by AWS.
Here are the options discussed in the document. Which one is best is a design decision based on the tradeoffs you have to make for your specific use case.
Cache the Database SQL ResultSet
Cache a serialized ResultSet object that contains the fetched database
row.
Pro: When data retrieval logic is abstracted (e.g., as in a Data Access Object or DAO layer), the consuming code expects only a
ResultSet object and does not need to be made aware of its
origination. A ResultSet object can be iterated over, regardless of
whether it originated from the database or was deserialized from the
cache, which greatly reduces integration logic. This pattern can be
applied to any relational database.
Con: Data retrieval still requires extracting values from the ResultSet object cursor and does not further simplify data access; it
only reduces data retrieval latency.
Cache Select Fields and Values in a Custom Format
Cache a subset of a fetched database row into a custom structure that
can be consumed by your applications.
Pro: This approach is easy to implement. You essentially store specific retrieved fields and values into a structure such as JSON or
XML and then SET that structure into a Redis string. The format you
choose should be something that conforms to your application’s data
access pattern.
Con: Your application is using different types of objects when querying for particular data (e.g., Redis string and database
results). In addition, you are required to parse through the entire
structure to retrieve the individual attributes associated with it.
Cache Select Fields and Values into an Aggregate Redis Data Structure
Cache the fetched database row into a specific data structure that can
simplify the application’s data access.
Pro: When converting the ResultSet object into a format that simplifies access, such as a Redis Hash, your application is able to
use that data more effectively. This technique simplifies your data
access pattern by reducing the need to iterate over a ResultSet object
or by parsing a structure like a JSON object stored in a string. In
addition, working with aggregate data structures, such as Redis Lists,
Sets, and Hashes provide various attribute level commands associated
with setting and getting data, eliminating the overhead associated
with processing the data before being able to leverage it.
Con: Your application is using different types of objects when querying for particular data (e.g., Redis Hash and database results).
Cache Serialized Application Object Entities
Cache a subset of a fetched database row into a custom structure that
can be consumed by your applications.
Pro: Use application objects in their native application state with simple serializing and deserializing techniques. This can
rapidly accelerate application performance by minimizing data
transformation logic.
Con: Advanced application development use case
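As a rough illustration of the second and third options, the sketch below lays out the same rows either as one JSON string under a single key or as one Redis Hash per row. It only builds the key/value layout in plain Python and does not talk to a server. The key names (users:all, user:<id>) and the id column are assumptions made for the example; the comments name the Redis commands (SET, HSET) you would issue with a real client.

```python
import json

# Example rows as they might come back from: SELECT id, name, age FROM Users
rows = [
    {"id": 1, "name": "Bob", "age": 25},
    {"id": 2, "name": "Sally", "age": 15},
]

def as_json_string(key, rows):
    """Custom format: the whole result as one string value (SET key value)."""
    return key, json.dumps(rows, separators=(",", ":"))

def as_hashes(prefix, rows, id_field="id"):
    """Aggregate structure: one hash per row (HSET prefix:<id> field value ...)."""
    return {
        f"{prefix}:{row[id_field]}": {
            field: str(value) for field, value in row.items() if field != id_field
        }
        for row in rows
    }

string_key, string_value = as_json_string("users:all", rows)
hashes = as_hashes("user", rows)
```

With the string layout the reader has to parse the whole value to get one attribute; with the hash layout a single field can be fetched (HGET user:1 name) at the cost of one key per row.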
Regarding 2)
What would be a good rule of thumb with regards to the Key?
Using the SQL query as the key is fine as long as you are sure it is unique. Add prefixes if there is a risk of collisions: you may have other databases with the same table names, leading to identical queries. Also normalize the keys, for example to all lower case or all upper case, because Redis keys are case-sensitive.
But if you do so, does that mean you would have to store the entire object returned by the database as the Value (as per question 1)?
Not necessarily; it comes down to what processing you do with the query result. Some results are probably best stored as the raw object for further processing, some as a JSON-stringified object to return quickly to the client, some as individual rows, and so on. Adapt the format to each case.
Is this the best way to do it?
Not necessarily.
If you are using an ORM, do you still use the SQL query as the key?
You may, if your ORM easily exposes the SQL query programmatically and generates it consistently.
I wouldn't get fixated on the idea of using the SQL query as the key. Use something you can be sure is consistent, that suits your processing, and that gives you clear rules for invalidation. It could be the method call with its parameters, the web API call, etc.
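One possible way to get a consistent, prefixed key is sketched below. The function name cache_key, the namespace argument, and the choice of SHA-256 are assumptions for illustration, not something the answer prescribes. Parameter values are passed separately from the operation text, so that lowercasing the text cannot change the meaning of a string literal.

```python
import hashlib
import json

def cache_key(namespace, operation, params):
    """Build a deterministic key from a namespace, an operation, and its parameters.

    `operation` can be a SQL string, a method name, or an API route.
    """
    # Collapse whitespace and lowercase so formatting differences do not matter.
    normalized = " ".join(operation.split()).lower()
    # sort_keys keeps the payload stable if params contains dictionaries.
    payload = json.dumps({"op": normalized, "params": params}, sort_keys=True)
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    # The namespace prefix separates databases or services that share query text.
    return f"{namespace}:{digest}"

key_a = cache_key("shop", "SELECT name, age FROM Users WHERE age > ?", [18])
key_b = cache_key("shop", "select  name, age\n  from users where age > ?", [18])
key_c = cache_key("crm", "SELECT name, age FROM Users WHERE age > ?", [18])
```

Here key_a and key_b are the same key, while key_c differs because of its prefix. Hashing keeps keys short, but it also makes them unreadable, so a readable prefix helps when invalidating by pattern.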

Related

Is it bad practice to store JSON members with Redis GEOADD?

My application has to handle a lot of entities (100,000 or more) with a location and needs to display only those within a given radius. I basically store everything in SQL but use Redis for caching and optimization (mainly GEORADIUS).
I am adding the entities as in the following example (not exactly this; I use the Laravel framework with its built-in Redis facade, but it does the same thing in the background):
GEOADD k 19.059982 47.494338 {\"id\":1,\"name\":\"Foo\",\"address\":\"Budapest, Astoria\",\"lat\":47.494338,\"lon\":19.059982}
Is this bad practice? Will it have a negative impact on performance? Should I store only IDs as members and run a follow-up query to get the corresponding entities?
This depends on your requirements. There's nothing wrong with storing the raw data as members as long as each member is unique (and it is unique, given the "id" field). In fact, this is both simple and performant, as all data is returned with a single query (assuming that's what is actually needed).
That said, there are at least three considerations for storing the data outside the Geoset and just "referencing" it by having members reflect some form of their key names:
A single data structure, such as a Geoset, is limited by the resources of a single Redis server. Storing a lot of data and members can require more memory than a single server can provide, which would limit the scalability of this approach.
Unless each entry's data is small, it is unlikely that all query types would require all data returned. In such cases, keeping the raw data in the Geoset generates a lot of wasted bandwidth and ultimately degrades performance.
When data needs to be updated, it can become too expensive to update small parts of it (i.e. ZREM and then GEOADD). Having everything outside, perhaps in a Hash (or maybe something like RedisJSON), makes more sense then.
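The "reference" layout described above could look roughly like the sketch below. It only builds the command tuples you would send, without a server, so it can be read and run on its own. The member naming scheme entity:<id> and the function name plan_commands are assumptions for the example.

```python
def plan_commands(geo_key, entity):
    """Return the Redis commands for storing one entity by reference.

    The Geoset member is only a key name; the attributes live in a Hash.
    """
    member = f"entity:{entity['id']}"
    # GEOADD takes longitude first, then latitude, then the member.
    geoadd = ("GEOADD", geo_key, entity["lon"], entity["lat"], member)
    # HSET key field value [field value ...]
    fields = [part for name, value in entity.items() for part in (name, str(value))]
    hset = ("HSET", member, *fields)
    return [geoadd, hset]

entity = {
    "id": 1,
    "name": "Foo",
    "address": "Budapest, Astoria",
    "lat": 47.494338,
    "lon": 19.059982,
}
commands = plan_commands("k", entity)
```

A radius query then returns members such as entity:1, and the application fetches only the fields it needs from the matching Hashes. Updating the address touches the Hash alone and leaves the Geoset untouched.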

Caching Strategy/Design Pattern for complex queries

We have an existing API with a very simple cache-hit/cache-miss system using Redis. It supports lookups by key, so a query that translates to the following is easily cached based on its primary key.
SELECT * FROM [Entities] WHERE PrimaryKeyCol = #p1
Any subsequent request can look up the entity in Redis by its primary key or fall back to the database, and then populate the cache with that result.
We're in the process of building a new API that will allow searches by a lot more params, will return multiple entries in the results, and will be under fairly high request volume (enough so that it will impact our existing DTU utilization in SQL Azure).
Queries will be searchable by several other terms: multiple PKs in one search, various other FK lookup columns, LIKE/CONTAINS statements on text, etc.
In this scenario, are there any design patterns or cache strategies that we could consider? Redis doesn't seem to lend itself particularly well to these types of queries. I'm considering simply hashing the query params, then using that hash as the key and the entire result set as the value.
But this feels like a bit of a naive approach given the key-value nature of Redis, and the fact that one entity might be contained within multiple result sets under multiple query hashes.
(For reference, the source of this data is currently SQL Azure, and we're using Azure's hosted Redis service. We're also looking at alternatives to hitting the DB, including denormalizing the data, ETLing the data to CosmosDB, and hosting the data in Azure Search, but these have other implications, including implementation time, "freshness" of data, etc.)
Personally, I wouldn't try and cache the results, just the individual entities. When I've done things like this in the past, I return a list of IDs from live queries, and retrieve individual entities from my cache layer. That way the ID list is always "fresh", and you don't have nasty cache invalidation logic issues.
If you really do have commonly recurring searches, you can cache the results (as lists of IDs), but you will likely run into issues with pagination and the like. Caching query results can be tricky, as you generally need to cache all the results, not just the first "page" worth. This is generally very expensive, and the transfer costs can exceed the value of the caching.
Additionally, you will absolutely have freshness issues with caching query results. As new records show up, they won't be in the cached list. This is avoided with the entity-only cache, as the list of IDs is always fresh, just the entities themselves can be stale (but that has a much easier cache-expiration methodology).
If you are worried about the staleness of the entities, you can return not only an ID, but also a "Last updated date", which allows you to compare the freshness of each entity to the cache.
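A minimal sketch of this entity-only approach, including the "last updated" comparison, is shown below. A plain dict stands in for the Redis cache layer, and the names fetch_entities, load_entity, and the last_updated field are hypothetical; in a real system the ID list would come from the live search query.

```python
def fetch_entities(id_rows, cache, load_entity):
    """Resolve fresh (id, last_updated) pairs into entities via the cache.

    id_rows     -- pairs returned by the live query, always up to date
    cache       -- dict-like entity cache (stand-in for Redis)
    load_entity -- fallback that reads one entity from the database
    """
    result = []
    for entity_id, last_updated in id_rows:
        cached = cache.get(entity_id)
        # Reload on a miss, or when the cached copy is older than the database row.
        if cached is None or cached["last_updated"] < last_updated:
            cached = load_entity(entity_id)
            cache[entity_id] = cached
        result.append(cached)
    return result

# Tiny in-memory "database" and a cache holding one stale entity.
database = {
    1: {"id": 1, "name": "new name", "last_updated": 2},
    2: {"id": 2, "name": "other", "last_updated": 1},
}
cache = {1: {"id": 1, "name": "old name", "last_updated": 1}}
loads = []

def load_entity(entity_id):
    loads.append(entity_id)
    return database[entity_id]

first = fetch_entities([(1, 2), (2, 1)], cache, load_entity)
second = fetch_entities([(1, 2), (2, 1)], cache, load_entity)
```

The first call reloads entity 1 (stale) and entity 2 (missing); the second call is served entirely from the cache. Timestamps are plain integers here only to keep the example short.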

Storing metadata and raw data separately

Is there an advantage to storing the metadata (or indexing data) for a document/*LOB separately from the raw data?
For instance, having a table/collection/bucket with an index on (name, school):
ID: 123
name: Johny
School: Harvard
Transcript: /*2MB text/binary*/
vs
Metadata
ID: 123
name: Johny
School: Harvard
Data
ID: 123
Transcripts: /*2MB text/binary*/
Let's assume mongodb, although it's really db agnostic perhaps.
db.firstModel.find({},{transcripts:0}) vs db.secondModel.find()
Additionally, if we have aggregation/grouping on the metadata, would the heavy payload in transcripts weigh it down (even though the aggregation is on other fields)? Is it better to aggregate on the metadata collection separately, then retrieve by ID from the data collection? Or is it better to respect the database design (keeping everything coupled in a single document)?
In Couchbase, if it works for your use case, an option might be to give your 2MB document an object ID like harvard::johny::123. Every object ID would follow such a pattern, used consistently in your application, so your application can easily piece the object ID together. Then you do not have to query or use views: you know it is harvard, johny, and his 123rd object, so you can just get it by ID. Since you already know the answer, no querying is needed, and Couchbase will be very fast.
That being said, there may be other metadata that you want to keep in that metadata object and index on, and then yes, in Couchbase it might be better to break out the documents as you suggest. It might even be better to put them in separate buckets so the indexers only look at things they will index.
For an example that may not be entirely applicable to your use case, but should give you an idea of what is possible go here
All of that being said, from experience I do not like keeping larger objects like the ones you describe in a DB long term, regardless of the DB. From an operational perspective it is terrible. You are storing what amounts to static data in a layer that needs to be very performant, usually with expensive storage, and you have to back up those objects over time. They become a boat anchor around your neck after a few months or years. I suggest keeping the metadata in a fast system like Couchbase (cache + persistence with replication, etc.), with a pointer to the large objects stored in something better suited to serving large static objects, like HDFS, Amazon S3, etc.

How to query the session in ASP.NET MVC with a dynamic query

I want to store some user data in memory, like some in-memory noSQL database.
But later on I want to query that data with a dynamic query constructed from the user. That query is stored in a classic DB like a string, so when I need to query the data stored in memory I would like to parse that string and construct the desired query (by some known rules).
I looked at Redis and found out it isn't maintained for Windows anymore. I have also looked at RavenDB, but its main query language is LINQ, although dynamic Lucene queries can be created.
Can you suggest another in-memory DB that works with ASP.NET and can be queried with a dynamically created query? Maybe I haven't seen all the options.
I prefer a name-value or JSON-based NoSQL store so that its schema can be easily modified without the constraints of relational DBs.
I would suggest simply using SQLite. It can easily be used as an in-memory database (just open the database using ":memory:" instead of a file name).
You can use a simple two-column table with a primary key to emulate a key/value store.
Here are a few links you might find helpful:
http://www.sqlite.org/inmemorydb.html
How to create asp.net web application using sqlite
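To make the in-memory SQLite suggestion concrete, here is a small sketch. It is written in Python with the standard sqlite3 module rather than ASP.NET, purely for illustration; the table name kv, the column names, and the helper functions are assumptions. The same ":memory:" database name works from other SQLite bindings.

```python
import sqlite3

# ":memory:" creates a database that lives only as long as this connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT NOT NULL)")

def put(key, value):
    """Insert or overwrite one key/value pair."""
    conn.execute("INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value))

def query(where_clause, params=()):
    """Run a dynamically built filter.

    where_clause must be assembled by the application from its own known
    rules; user-supplied values are always bound through params.
    """
    sql = f"SELECT k, v FROM kv WHERE {where_clause} ORDER BY k"
    return conn.execute(sql, params).fetchall()

put("user:1:name", "Bob")
put("user:1:city", "Boston")
put("user:2:name", "Sally")

user_one = query("k LIKE ?", ("user:1:%",))
sallys = query("v = ?", ("Sally",))
```

The stored query string from the classic DB would be parsed into a where_clause plus a parameter list before being passed to query.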

Serializing objects as BLOBs in Oracle

I have a HashMap that I am serializing and deserializing to an Oracle db, in a BLOB data type field.
I want to perform a query, using this field.
For example, the application will create a new HashMap with some key-value pairs.
I want to query the db to see if a HashMap with this data already exists in the db.
I do not know how to do this. It seems strange if I have to go through every record in the db, deserialize it, and then compare. Does SQL handle comparing BLOBs, so that I could have ... select * from PROCESSES where foo = ? ... where foo is a BLOB type and the ? is an instance of the new HashMap?
Thanks
Here's an article for you to read: Pounding a Nail: Old Shoe or Glass Bottle
I haven't heard much about your application's underlying architecture, but I can tell you immediately that there is never a reason why you should need to use a HashMap in this way. It's a bad technique, plain and simple.
The answer to your question is not a clever Oracle query; it's a redesign of your application's architecture.
For a start, you should not serialize a HashMap to a database (more generally, you shouldn't serialize anything that you need to query against). It's much easier to create a table to represent hashmaps in your application as follows:
HashMaps
--------
MapID (pk int)
Key (pk varchar)
Value
Once you have the content of your hashmaps in your database, it's trivial to query the database to see if the data already exists or to produce any other kind of aggregate data:
SELECT Count(*) FROM HashMaps where MapID = ? AND Key = ?
Storing serialized objects in a database is almost always a bad idea, unless you know ahead of time that you don't need to query against them.
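With the table layout above, checking whether a stored map has exactly the same pairs as a new one becomes a set comparison in SQL. The sketch below uses Python's sqlite3 module as a stand-in for Oracle so that it can run anywhere; the snake_case table and column names, the temporary table wanted, and the function find_equal_maps are assumptions, and the SQL may need adjusting for Oracle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE hash_maps (
        map_id    INTEGER NOT NULL,
        map_key   TEXT    NOT NULL,
        map_value TEXT    NOT NULL,
        PRIMARY KEY (map_id, map_key)
    )""")
conn.execute("CREATE TEMP TABLE wanted (map_key TEXT PRIMARY KEY, map_value TEXT NOT NULL)")
conn.executemany("INSERT INTO hash_maps VALUES (?, ?, ?)", [
    (1, "host", "a"), (1, "port", "80"),
    (2, "host", "a"), (2, "port", "80"), (2, "mode", "fast"),
])

def find_equal_maps(candidate):
    """Return ids of stored maps whose pairs equal the candidate's pairs."""
    conn.execute("DELETE FROM wanted")
    conn.executemany("INSERT INTO wanted VALUES (?, ?)", list(candidate.items()))
    rows = conn.execute("""
        SELECT h.map_id
        FROM hash_maps h
        LEFT JOIN wanted w
               ON w.map_key = h.map_key AND w.map_value = h.map_value
        GROUP BY h.map_id
        -- every stored pair matched, and no wanted pair is missing
        HAVING COUNT(*) = COUNT(w.map_key)
           AND COUNT(*) = (SELECT COUNT(*) FROM wanted)
        ORDER BY h.map_id""").fetchall()
    return [row[0] for row in rows]

exact = find_equal_maps({"host": "a", "port": "80"})
```

Map 2 contains the same two pairs plus a third, so only map 1 is reported as equal.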
How are you serializing the HashMap? There are lots of ways to serialize data and an object like a HashMap. Comparing two maps, especially in serialized form, is not trivial, unless your serialization technique guarantees that two equivalent maps always serialize the same way.
One way you can get around this mess is to use XML serialization for some objects that rarely need to be queried. For example, where I work we have a log table where a certain log message is stored as an XML document in a CLOB field. This XML data represents a serialized Java object. Normally we query against other columns in the record, and only read/write the CLOB in single atomic steps. However, once or twice it was necessary to do some deep inspection of the CLOB, and using XML allowed this to happen (Oracle supports querying XML in varchar2 or CLOB fields as well as native XML objects). It's a useful technique if used sparingly.
Look into dbms_crypto.hash to make a hash of your blob. Store the hash alongside the blob and it will give you something to narrow down the search to something manageable. I'm not recommending storing the hash map, but this is a general technique for searching for an exact match between blobs.
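On the application side, the two points above (a serialization that is stable for equivalent maps, and a stored hash to narrow the search) can be combined as in the sketch below. It is written in Python for brevity; canonical JSON with sorted keys and SHA-256 are assumptions chosen for the example, not what dbms_crypto.hash or any particular Java serializer produces.

```python
import hashlib
import json

def canonical_bytes(mapping):
    """Serialize a map so that equivalent maps always give identical bytes."""
    # sort_keys removes any dependence on insertion order;
    # fixed separators remove any dependence on whitespace.
    text = json.dumps(mapping, sort_keys=True, separators=(",", ":"))
    return text.encode("utf-8")

def map_digest(mapping):
    """Hex digest to store in an indexed column next to the serialized blob."""
    return hashlib.sha256(canonical_bytes(mapping)).hexdigest()

stored = {"host": "a", "port": 80}
incoming = {"port": 80, "host": "a"}   # same pairs, different order
different = {"host": "a", "port": 81}

same_digest = map_digest(stored) == map_digest(incoming)
```

The lookup then becomes an equality search on the digest column, followed by a comparison of the full content for the few rows that match.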
See also SQL - How do you compare a CLOB
I cannot disagree, but I'm being told to do so.
I appreciate your solution, and that's sort of what I had previously.
Thanks
I haven't had the need to compare BLOBs, but it appears that it's supported through the dbms_lob package.
See dbms_lob.compare() at http://www.psoug.org/reference/dbms_lob.html
Oracle lets you define new data types with Java (or .NET on Windows), so you could define a data type for your serialized object and define how queries work on it.
Good luck if you try this...
If you serialize your data to XML and store it in an XML column, you can then use XPath expressions within your SQL query. (Sorry, as I am more of a SQL Server person, I don't know the details of how to do this in Oracle.)
If you EVER need to update only part of the serialized data, don't do this.
Likewise, if any of the data is pointed to by other data or points to other data, don't do this.

Resources