Due to the limitation of not being able to evict entries based on a partial key, I am thinking of a workaround using the cache name as my partial key and evicting all (there would only be one) entries in the cache. For example, let's say there are 2 key-value pairs like so:
"123#name1" -> value1,
"124#name2" -> value2
Ideally, at the time of eviction, I would like to remove all keys that contain the string "123". However, as this is not supported, the workaround I'm thinking of is to have the following:
"123" cache: "name1" -> value1
"124" cache: "name2" -> value2
Then at eviction, I would simply specify to remove all keys in "123" cache
The downside of this of course is that there would be a lot of different caches. Is there any performance penalty to this?
From reading this, it seems Redis at least only uses the cache name as a prefix. So it is not creating multiple separate caches underneath it. But I would like to verify my understanding.
I am also looking to use Redis as my underlying cache provider if that helps.
You can use few approaches to overcome this :
Use grouped data structures like sets, sorted sets and hashes : Each one of them supports really high number of member elements. So you can use them to store your cache items,and do the relevant lookups. However, do go through the performance difference ( would be very small ) on this kind of lookup compared to a direct key-value lookup.
Once you want to evict a group of cache keys which are of similar type, you just remove that data structure key from redis.
Use redis database numbers : You would need to edit redis.conf to increase maximum number of redis database numbers possible. Redis databases are just numbers which provide namespacing in which your key-values can lie. To group similar items, you would put them in the same database number, and empty the database with a single command whenever you want to flush that group of keys.
The caveat here is that, though you would be able to use same redis connection, you would have to switch databases through redis SELECT command
Related
My application should handle a lot of entities (100.000 or more) with location and needs to display them only within a given radius. I basically store everything in SQL but using Redis for caching and optimization (mainly GEORADIUS).
I am adding the entities like the following example (not exactly this, I use Laravel framework with the built-in Redis facade but it does the same as here in the background):
GEOADD k 19.059982 47.494338 {\"id\":1,\"name\":\"Foo\",\"address\":\"Budapest, Astoria\",\"lat\":47.494338,\"lon\":19.059982}
Is it bad practice? Or will it make a negative impact on performance? Should I store only ID-s as member and make a following query to get the corresponding entities?
This is a matter of the requirements. There's nothing wrong with storing the raw data as members as long as it is unique (and it unique given the "id" field). In fact, this is both simple and performant as all data is returned with a single query (assuming that's what actually needed).
That said, there are at least two considerations for storing the data outside the Geoset, and just "referencing" it by having members reflect some form of their key names:
A single data structure, such as a Geoset, is limited by the resources of a single Redis server. Storing a lot of data and members can require more memory than a single server can provide, which would limit the scalability of this approach.
Unless each entry's data is small, it is unlikely that all query types would require all data returned. In such cases, keeping the raw data in the Geoset generates a lot of wasted bandwidth and ultimately degrades performance.
When data needs to be updated, it can become too expensive to try and update (i.e. ZDEL and then GEOADD) small parts of it. Having everything outside, perhaps in a Hash (or maybe something like RedisJSON) makes more sense then.
We have an existing API with a very simple cache-hit/cache-miss system using Redis. It supports being searched by Key. So a query that translates to the following is easily cached based on it's primary key.
SELECT * FROM [Entities] WHERE PrimaryKeyCol = #p1
Any subsequent requests can lookup the entity in REDIS by it's primary key or fail back to the database, and then populate the cache with that result.
We're in the process of building a new API that will allow searches by a lot more params, will return multiple entries in the results, and will be under fairly high request volume (enough so that it will impact our existing DTU utilization in SQL Azure).
Queries will be searchable by several other terms, Multiple PKs in one search, various other FK lookup columns, LIKE/CONTAINS statements on text etc...
In this scenario, are there any design patterns, or cache strategies that we could consider. Redis doesn't seem to lend itself particularly well to these type of queries. I'm considering simply hashing the query params, and then cache that hash as the key, and the entire result set as the value.
But this feels like a bit of a naive approach given the key-value nature of Redis, and the fact that one entity might be contained within multiple result sets under multiple query hashes.
(For reference, the source of this data is currently SQL Azure, we're using Azure's hosted Redis service. We're also looking at alternative approaches to hitting the DB incl. denormalizing the data, ETLing the data to CosmosDB, hosting the data in Azure Search but there's other implications for doing these including Implementation time, "freshness" of data etc...)
Personally, I wouldn't try and cache the results, just the individual entities. When I've done things like this in the past, I return a list of IDs from live queries, and retrieve individual entities from my cache layer. That way the ID list is always "fresh", and you don't have nasty cache invalidation logic issues.
If you really do have commonly reoccurring searches, you can cache the results (of ids), but you will likely run into issues of pagination and such. Caching query results can be tricky, as you generally need to cache all the results, not just the first "page" worth. This is generally very expensive, and has high transfer costs that exceed the value of the caching.
Additionally, you will absolutely have freshness issues with caching query results. As new records show up, they won't be in the cached list. This is avoided with the entity-only cache, as the list of IDs is always fresh, just the entities themselves can be stale (but that has a much easier cache-expiration methodology).
If you are worried about the staleness of the entities, you can return not only an ID, but also a "Last updated date", which allows you to compare the freshness of each entity to the cache.
Trying to build a data set of two cache tables (which are currently stored in SQL Server) - one is the actual cache table (CacheTBL); the other is the staging table (CacheTBL_Staging).
The table structure has two columns - "key", "value"
So I'm wondering how to implement this in Redis as I'm a total noob to this NoSQL stuff. Should I use a SET or LIST? Or something else?
Thank you so much in advance!
You need to decide whether you want separate REDIS keys for all entries using SET and GET, or put them into hashes with HSET and HGET. If you use the first approach, your keys should include a prefix to distinguish between main and staging. If you use hashes, this is not necessary, because the hash name can also be used to distinguish these. You probably also need to decide how you want to check for cache validity, and what your cache flushing strategy should be. This normally requires some additional data structures in REDIS.
I have a scenario where I am dumping huge amount of data from Google Big Query to Redis SET data structure to get better response time. I need SET UNION operation to be done over millions of keys. I have tested with few thousands of keys and working fine. The question is, there is any limit on number of keys can be supplied to a SUNION command at a time? Is it really SUNION Key1 Key2 Key3 ..... KeyN?
Consider I have enough system capacity.
[...] over millions of keys
There's no statement in Redis' documentation about a limitation on how many keys can be provided in a single sunion command.
BTW, I doubt that doing such operation could be a good idea in Redis. Remember that Redis will get blocked until this operation and, and no other operation will be executed until the sunion ends.
My best advise will be you should do it using many sunionstore commands, and later get all results from many sets like if the whole sets would be pages of the result of sunion millions of keys.
sunionstore key:pages:1 key1 keyN
...and later you would use some iterator in your application layer to iterate over all generated pages.
I have a situation where I could really benefit from having system like memcached, but with the ability to store (per each key) sorted list of elements, and modifying the list by addition of values.
For example:
something.add_to_sorted_list( 'topics_list_sorted_by_title', 1234, 'some_title')
something.add_to_sorted_list( 'topics_list_sorted_by_title', 5436, 'zzz')
something.add_to_sorted_list( 'topics_list_sorted_by_title', 5623, 'aaa')
Which I then could use like this:
something.get_list_size( 'topics_list_sorted_by_title' )
// returns 3
something.get_list_elements( 'topics_list_sorted_by_title', 1, 10 )
// returns: 5623, 1234, 5436
Required system would allow me to easily get items count in every array, and fetch any number of values from the array, with the assumption that the values are sorted using attached value.
I hope that the description is clear. And the question is relatively simple: is there any such system?
Take a look at MongoDB. It uses memory mapped files, so is incredibly fast and should perform at a comparative level to MemCached.
MongoDB is a schema-less database that should support what you're looking for (indexing/sorting)
Redis supports both lists and sets. You can disable disk saving and use it like Memcached instead of going for MongoDB which would save data to disk.
MongoDB will fit. What's important it has indexes, so you can add an index by title for topics collection and then retrieve items sorted by the index:
db.topics.ensureIndex({"title": 1})
db.topics.find().sort({"title": 1})
why not just store an array in memcached? at least in python and PHP the memcached APIs support this (i think python uses pickle but i dont recall for sure).
if you need permanent data storage or backup, memcacheDB uses the same API.
basic pseudopython example:
getting stored data
stored=cache.get(storedDataName)
initialize list if you have not stored anything previously
if(stored==None):
stored={}
----------------
finding stored items
try:
alreadyHaveItem=stored[itemKey]
except KeyError:
print 'no result in cached'
----------------
adding new items
for item in newItemsDict:
stored[item]=newItems[item]
----------------
saving the results in cache
cache.set(storedDataName,stored,TTL)