When do we really need a key/value database instead of a key/value cache server? - caching

Most of the time, we just get the result from the database and then save it in the cache server with an expiration time.
When do we need to persist that key/value pair, and what is the significant benefit of doing so?

If you need to persist the data, then you would want a key/value database. In particular, as part of the NoSQL movement, many people have suggested replacing traditional SQL databases with key/value databases - but ultimately, the choice of which paradigm is a better fit for your application remains with you.

Use a key/value database when you are using a key/value cache and you don't need a sql database.
When you use memcached/mysql or similar, you need to write two sets of data access code - one for getting objects from the cache, and another for getting them from the database. If the cache is your database, you only need the one method, and it is usually simpler code.
You do lose some functionality by not using SQL, but in a lot of cases you don't need it. Only the worst applications actually leave constraint checking to the database. Ad-hoc queries become impractical at scale. The occasional lost or inconsistent record simply doesn't matter if you are working with tweets rather than financial data. How do you justify the added complexity of using a SQL database?
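A minimal sketch of that difference in Python with redis-py; the key names and the load_user_from_sql() helper are illustrative assumptions, not anything from the answer above:
import json
import redis

r = redis.Redis(decode_responses=True)  # assumes a local Redis instance

# Cache-aside (memcached + mysql style): two code paths to maintain.
def get_user_cache_aside(user_id):
    cached = r.get(f"user:{user_id}")
    if cached is not None:
        return json.loads(cached)
    user = load_user_from_sql(user_id)                   # hypothetical SQL access layer
    r.set(f"user:{user_id}", json.dumps(user), ex=300)   # expire after 5 minutes
    return user

# Key/value store as the primary database: one code path, no expiration.
def get_user(user_id):
    data = r.get(f"user:{user_id}")
    return json.loads(data) if data is not None else None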

Related

Is it bad practice to store JSON members with Redis GEOADD?

My application should handle a lot of entities (100,000 or more) with locations, and needs to display only those within a given radius. I basically store everything in SQL but use Redis for caching and optimization (mainly GEORADIUS).
I am adding the entities like in the following example (not exactly this; I use the Laravel framework with the built-in Redis facade, but it does the same as here in the background):
GEOADD k 19.059982 47.494338 {\"id\":1,\"name\":\"Foo\",\"address\":\"Budapest, Astoria\",\"lat\":47.494338,\"lon\":19.059982}
Is it bad practice? Or will it have a negative impact on performance? Should I store only IDs as members and make a follow-up query to get the corresponding entities?
This is a matter of the requirements. There's nothing wrong with storing the raw data as members as long as it is unique (and it is unique, given the "id" field). In fact, this is both simple and performant, as all the data is returned with a single query (assuming that's what's actually needed).
That said, there are a few considerations for storing the data outside the Geoset, and just "referencing" it by having members reflect some form of their key names:
A single data structure, such as a Geoset, is limited by the resources of a single Redis server. Storing a lot of data and members can require more memory than a single server can provide, which would limit the scalability of this approach.
Unless each entry's data is small, it is unlikely that all query types would require all data returned. In such cases, keeping the raw data in the Geoset generates a lot of wasted bandwidth and ultimately degrades performance.
When data needs to be updated, it can become too expensive to try and update (i.e. ZREM and then GEOADD) small parts of it. Having everything outside, perhaps in a Hash (or maybe something like RedisJSON), makes more sense in that case.
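A rough sketch of the "reference by ID" alternative in Python with redis-py 4+; the key layout, field names and radius are illustrative assumptions:
import redis

r = redis.Redis(decode_responses=True)

def add_entity(entity):
    # The full entity lives in its own hash...
    r.hset(f"entity:{entity['id']}", mapping={
        "name": entity["name"],
        "address": entity["address"],
        "lat": entity["lat"],
        "lon": entity["lon"],
    })
    # ...and only the id is stored as the geoset member.
    r.geoadd("entities:geo", (entity["lon"], entity["lat"], entity["id"]))

def entities_within(lon, lat, radius_m):
    ids = r.georadius("entities:geo", lon, lat, radius_m, unit="m")
    # A follow-up lookup fetches only the fields each query actually needs.
    return [r.hgetall(f"entity:{i}") for i in ids]

def move_entity(entity_id, lon, lat):
    # Updating a position no longer means rewriting a large JSON member:
    r.geoadd("entities:geo", (lon, lat, entity_id))   # re-adding updates the coordinates
    r.hset(f"entity:{entity_id}", mapping={"lat": lat, "lon": lon})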

How to store two different cache "tables" in Redis under the same database/index?

Trying to build a data set of two cache tables (which are currently stored in SQL Server) - one is the actual cache table (CacheTBL); the other is the staging table (CacheTBL_Staging).
The table structure has two columns - "key", "value"
So I'm wondering how to implement this in Redis as I'm a total noob to this NoSQL stuff. Should I use a SET or LIST? Or something else?
Thank you so much in advance!
You need to decide whether you want separate Redis keys for all entries, using SET and GET, or to put them into hashes with HSET and HGET. If you use the first approach, your keys should include a prefix to distinguish between main and staging. If you use hashes, this is not necessary, because the hash name can be used to distinguish them. You probably also need to decide how you want to check for cache validity, and what your cache flushing strategy should be. This normally requires some additional data structures in Redis.
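Both options sketched in Python with redis-py; the key names and prefixes are just illustrative:
import redis

r = redis.Redis(decode_responses=True)

# Option 1: plain keys, one prefix per "table".
r.set("cache:main:some_key", "some_value")
r.set("cache:staging:some_key", "staged_value")
value = r.get("cache:main:some_key")

# Option 2: one hash per "table"; the hash name plays the role of the table name.
r.hset("CacheTBL", "some_key", "some_value")
r.hset("CacheTBL_Staging", "some_key", "staged_value")
value = r.hget("CacheTBL_Staging", "some_key")

# With hashes, promoting staging to main can be a single rename:
r.rename("CacheTBL_Staging", "CacheTBL")
One design note: a TTL applies to a whole key, so with the hash option you cannot expire individual entries (at least not on older Redis versions), while with prefixed keys each entry can carry its own expiration.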

Searching/selecting query in cache

I have been using a cache for a long time. We store data against some key and fetch it from the cache whenever required. I know that Stack Overflow and many other sites rely heavily on caching. My question is: do they always use a key-value mechanism for caching, or do they form SQL-like queries within the cache? For instance, I want to view last week's report. The report's content will vary each day. Do I need to store a different report against each day (with the day as the key), or can I get this result by forming some query that aggregates results across different keys? Does any caching product (like Redis) provide this functionality?
Thanks In Advance
Cache is always done as a key-value hash table. This is how it stays so fast. If you're doing querying then you're not doing cache.
What you may be trying to ask is... you could have in your database a table that contains aggregated report data, and you could query against that pre-calculated table.
One of the reasons a cache (e.g. memcached) is fast is the simplicity of its data access and querying protocol.
The more functionality you add, the more you will have to trade off on efficiency. A full-fledged SQL engine in a "caching" database is not a good design. You can, however, use a data-structure-oriented database like Redis to design your cache data to suit your querying needs. For example: one set or one hash for each date.
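A minimal sketch of that per-date idea in Python with redis-py; the key layout and metric names are made-up assumptions:
import redis

r = redis.Redis(decode_responses=True)

# Write: one hash per day.
r.hset("report:2024-05-01", mapping={"views": 120, "signups": 7})
r.hset("report:2024-05-02", mapping={"views": 95, "signups": 4})

# Read: "last week's report" is just an aggregation over the seven daily keys.
def weekly_report(dates):
    totals = {}
    for day in dates:
        for metric, value in r.hgetall(f"report:{day}").items():
            totals[metric] = totals.get(metric, 0) + int(value)
    return totals

print(weekly_report(["2024-05-01", "2024-05-02"]))  # {'views': 215, 'signups': 11}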
A step further, you can use databases like MongoDB or MemSQL, which are pretty fast and have rich querying support, so an occasional aggregation report won't be an issue.
However, as a design decision, you will have to accept that their caching throughput will not match that of memcached or Redis.

Best strategy for retrieving large dynamically-specified tables on an ASP.NET page

Looking for a bit of advice on how to optimise one of our projects. We have an ASP.NET/C# system that retrieves data from a SQL Server 2008 database and presents it in a DevExpress ASPxGridView. The data that's retrieved can come from one of a number of databases - all of which are slightly different and are being added and removed regularly. The user is presented with a list of live "companies", and the data is retrieved from the corresponding database.
At the moment, data is being retrieved using a standard SqlDataSource and a dynamically-created SQL SELECT statement. There are a few JOINs in the statement, as well as optional WHERE constraints, again dynamically-created depending on the database and the user's permission level.
All of this works great (honest!), apart from performance. When it comes to some databases, there are several hundred thousand rows, and retrieving and paging through the data is quite slow (the databases are already properly indexed). I've therefore been looking at ways of speeding the system up, and it seems to boil down to two choices: XPO or LINQ.
LINQ seems to be the popular choice, but I'm not sure how easy it will be to implement with a system that is so dynamic in nature - would I need to create "definitions" for each database that LINQ could access? I'm also a bit unsure about creating the LINQ queries dynamically too, although looking at a few examples that part at least seems doable.
XPO, on the other hand, seems to allow me to create a XPO Data Source on the fly. However, I can't find too much information on how to JOIN to other tables.
Can anyone offer any advice on which method - if any - is the best to try and retro-fit into this project? Or is the dynamic SQL model currently used fundamentally different from LINQ and XPO and best left alone?
Before you go and change the whole way that your app talks to the database, have you had a look at the following:
Run your code through a performance profiler (such as Redgate's performance profiler); the results are often surprising.
If you are constructing the SQL string on the fly, are you using .Net best practices such as String.Concat("str1", "str2") instead of "str1" + "str2". Remember, multiple small gains add up to big gains.
Have you thought about having a summary table or database that is periodically updated (say every 15 mins; you might need to run a service to update this data automatically) so that you are only hitting one database? New connections to databases are quite expensive.
Have you looked at the query plans for the SQL that you are running? Today I moved a dynamically created SQL string to a sproc (only 1 param changed) and shaved 5-10 seconds off the running time (it was being called 100-10000 times, depending on some conditions).
Just a warning if you do use LINQ. I have seen some developers who decided to use LINQ write more inefficient code because they did not know what they were doing (pulling 36,000 records when they needed to check for one, for example). These things are very easily overlooked.
Just something to get you started on and hopefully there is something there that you haven't thought of.
Cheers,
Stu
As far as I understand, you are talking about so-called server mode, where all data manipulations are done on the DB server instead of passing the data to the web server and processing it there. In this mode the grid works very fast with data sources that contain hundreds of thousands of records. If you want to use this mode, you should create either the corresponding LINQ classes or XPO classes. If you decide to use LINQ-based server mode, the LINQServerModeDataSource provides the Selecting event, which can be used to set a custom IQueryable and KeyExpression. I would suggest that you use LINQ in your application. I hope this information will be helpful to you.
I guess there are two points where performance might be tweaked in this case. I'll assume that you're accessing the database directly rather than through some kind of secondary layer.
First, you don't say how you're displaying the data itself. If you're loading thousands of records into a grid, that will take time no matter how fast everything else is. Obviously the trick here is to show a subset of the data and allow the user to page, etc. If you're not doing this then that might be a good place to start.
Second, you say that the tables are properly indexed. If this is the case, and assuming that you're not loading 1,000 records into the page at once but are retrieving only subsets at a time, then you should be OK.
But if you're only doing an ExecuteQuery() against a SQL connection to get a dataset back, I don't see how LINQ or anything else will help you. I'd say that the problem is obviously on the DB side.
So to solve the problem with the database you need to profile the different SELECT statements you're running against it, examine the query plan and identify the places where things are slowing down. You might want to start by using the SQL Server Profiler, but if you have a good DBA, sometimes just looking at the query plan (which you can get from Management Studio) is usually enough.
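To make the paging point above concrete: with SQL Server 2008 the usual trick is a ROW_NUMBER() window, so only one page of rows ever leaves the database. A rough sketch, shown in Python/pyodbc for brevity; the connection string, table and ORDER BY column are placeholders:
import pyodbc

conn = pyodbc.connect("DRIVER={SQL Server};SERVER=...;DATABASE=...")  # placeholder connection string
cursor = conn.cursor()

def fetch_page(page, page_size):
    sql = """
        WITH numbered AS (
            SELECT ROW_NUMBER() OVER (ORDER BY c.CompanyName) AS rn, c.*
            FROM Companies AS c   -- placeholder table
        )
        SELECT * FROM numbered WHERE rn BETWEEN ? AND ?;
    """
    first = (page - 1) * page_size + 1
    last = page * page_size
    return cursor.execute(sql, first, last).fetchall()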

Best-performing method for associating arbitrary key/value pairs with a table row in a Postgres DB?

I have an otherwise perfectly relational data schema in place for my Postgres 8.4 DB, but I need the ability to associate arbitrary key/value pairs with several of my tables, with the assigned keys varying by row. Key/value pairs are user-generated, so I have no way of predicting them ahead of time or wrangling orderly schema changes.
I have the following requirements:
Key/value pairs will be read often, written occasionally. Reads must be reasonably fast.
No (present) need to query off of the keys or values. (But it might come in handy some day.)
I see the following possible solutions:
The Entity-Attribute-Value pattern/antipattern. Annoying, but the annoyance would be generally offset by my ORM.
Storing key/value pairs as serialized JSON data on a text column. A simple solution, and again the ORM comes in handy, but I can kiss my future self's need for queries good-bye.
Storing key/value pairs in some other NoSQL db--probably a key/value or document store. ORM is no help here. I'll have to manage the separate queries (and looming data integrity issues?) myself.
I'm concerned about query performance, as I hope to have a lot of these some day. I'm also concerned about programmer performance, as I have to build, maintain, and use the darned thing. Is there an obvious best approach here? Or something I've missed?
That's precisely what the hstore datatype is for in PostgreSQL.
http://www.postgresql.org/docs/current/static/hstore.html
It's really fast (you can index it) and quite easy to handle. The only drawback is that you can only store character data, but you'd have that problem with the other solutions as well.
Indexes support "exists" operator, so you can query quite quickly for rows where a certain key is present, or for rows where a specific attribute has a specific value.
And with 9.0 it got even better because some size restrictions were lifted.
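A small sketch of hstore in practice from Python with psycopg2; the table, column and key names are illustrative, and on modern PostgreSQL the type is enabled with CREATE EXTENSION (on 8.4 it was installed from contrib):
import psycopg2
import psycopg2.extras

conn = psycopg2.connect("dbname=mydb")                  # placeholder DSN
cur = conn.cursor()
cur.execute("CREATE EXTENSION IF NOT EXISTS hstore")    # 9.1+; on 8.4 install from contrib
psycopg2.extras.register_hstore(conn)                   # adapt hstore <-> Python dict

cur.execute("""
    CREATE TABLE items (
        id    serial PRIMARY KEY,
        attrs hstore
    );
    CREATE INDEX items_attrs_idx ON items USING gin (attrs)
""")

# Writing arbitrary, user-generated pairs is just passing a dict.
cur.execute("INSERT INTO items (attrs) VALUES (%s)",
            ({"color": "red", "size": "XL"},))

# Rows where a certain key is present (the "exists" operator)...
cur.execute("SELECT id, attrs FROM items WHERE attrs ? 'color'")

# ...and rows where a specific attribute has a specific value.
cur.execute("SELECT id FROM items WHERE attrs @> %s", ({"color": "red"},))
conn.commit()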
hstore is generally a good solution for that, but personally I prefer to use plain key:value tables: one table with attribute definitions, another table with values, and relations binding the values to their definitions and to the particular record in the other table.
Why am I against hstore? Because it's like the registry pattern, often mentioned as an example of an anti-pattern. You can put anything there, and it's hard to easily validate whether it's still needed. When loading a whole row (especially through an ORM), the whole hstore is loaded, which can contain a lot of junk and very little useful data. Not to mention that the hstore data type has to be converted into your language's types and converted back again on save, so you get some type-conversion overhead.
So I'm actually trying to convert all hstores in the company I'm working for into simple key:value tables. It's not that hard a task, though, because the structures kept in hstore here are huge (or at least big), and reading/writing such an object creates a huge overhead of function calls. That makes even a simple query like "select * from base_product where id = 1;" make the server sweat and hurts performance badly. I want to point out that the performance issue is not because of the db, but because Python has to convert the results received from Postgres several times, while key:value tables do not require such conversion.
As you do not control the data, do not try to overcomplicate this.
create table sometable_attributes (
    sometable_id int not null references sometable(sometable_id),
    attribute_key varchar(50) not null check (length(attribute_key) > 0),
    attribute_value varchar(5000) not null,
    primary key (sometable_id, attribute_key)
);
This is like EAV, but without attribute_keys table, which has no added value if you do not control what will be there.
For speed you should periodically run "cluster sometable_attributes using sometable_attributes_pkey" (the index created by the primary key), so all attributes for one row will be physically close.
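A short usage sketch against that table, again from Python/psycopg2; the connection string and the example row id are placeholders:
import psycopg2

conn = psycopg2.connect("dbname=mydb")   # placeholder DSN
cur = conn.cursor()

# Attach an arbitrary, user-generated attribute to row 1 of sometable.
cur.execute(
    "INSERT INTO sometable_attributes (sometable_id, attribute_key, attribute_value) "
    "VALUES (%s, %s, %s)",
    (1, "color", "red"),
)

# Read all attributes for that row back as a plain dict.
cur.execute(
    "SELECT attribute_key, attribute_value "
    "FROM sometable_attributes WHERE sometable_id = %s",
    (1,),
)
attributes = dict(cur.fetchall())
conn.commit()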
