Organizing memcache keys - caching

Im trying to find a good way to handle memcache keys for storing, retrieving and updating data to/from the cache layer in a more civilized way.
Found this pattern, which looks great, but how do I turn it into a functional part of a PHP application?
The Identity Map pattern: http://martinfowler.com/eaaCatalog/identityMap.html
Thanks!
Update: I have been told about the modified memcache (memcache-tag) that apparently does do a lot of this, but I can't install linux software on my windows development box...

Well, memcache use IS an identity map pattern. You check your cache, then you hit your database (or whatever else you're using). You can go about finding information about the source by storing objects instead of just values, but you'll take a performance hit for that.
You effectively cannot ask the cache what it contains as a list. To mass invalidate, you'll have to keep a list of what you put in and iterate it, or you'll have to iterate every possible key that could fit the pattern of concern. The resource you point out, memcache-tag can simplify this, but it doesn't appear to be maintained inline with the memcache project.
So your options now are iterative deletes, or totally flushing everything that is cached. Thus, I propose a design consideration is the question that you should be asking. In order to get a useful answer for you, I query thus: why do you want to do this?

Related

I want to create a desktop app with database-like search functions but without the SQL database

I know basic SQL, and SQL is all I know when it comes to storing and retrieving data. I want to create 1 .exe and it should contain all ~100,000 key-value pairs (i have the data in .txt files) and maybe an extra attribute for description (this I would add myself - like a note to myself).
I also would like to write it in a new language I don't know yet; like python or C# (I have made desktop apps written in Java & VB.net all with SQL databases). So language will not be an issue and I would appreciate suggestions.
These key-value pairs might not need to be updated and I'm willing to re-compile/repackage the code to make 1 change in the data. The key is 6 letters long and 2 numbers at the end like hxnaaa01. Each of these letters represent or describe something about itself so I would also need to search for a specific letter on a specific position to get exactly what I need.
I know that regex would work well with what I need but all I mentioned is all I know. I don't know enough and I don't know what keywords to google.
I have read about XML and CSV. I don't really know what they are and I'm not sure how all of this would fit in 1 executable.
To summarize, I need:
1 executable (Windows Desktop App)
Search function ~100k KVP+1more attribute (using regex?)
no database
with GUI
ability to add a "note" to each KVP
should be fast and lightweight
1 executable (Windows Desktop App), no database
Data persistence will require either additional files, or a database. It's pretty much unavoidable, you can store data in memory, but it's only persisted for as long as it resides there.
You have another requirement: "fast and lightweight".
To achieve this requirement, you'll need to really think about your solution, what technology you use and how you can improve it in future.
Although searching through data is pretty trivial, an efficient solution is not. It requires upfront research into algorithms, data structures and general practices. (which is a rabbit hole itself).
In the case of JSON [1], you'll need to create an additional file to contain all your key/value pairs, you can use C# to create the extra file (on first launch, for example).
JSON promises to be lightweight, I tend to agree, some may not. When dealing with the filesystem, I think it can be agreed is often far from lightweight solution.
JSON is very readable though:
{
"key": "value",
"comment": "oh this is cool"
}
There's a lot of factors that play into something being fast and lightweight, so there's a need for some research on your part.
Honestly, depending on your experience, I wouldn't focus so much on the fast, I'd focus more on it working, then refactor that into something that's fast if it's too slow. [2]
And again, depending on your experience, I'd stick to opening the file, using a for/loop to find my key and do something with the data found, plus reward myself for having something that works.
TL;DR: you need either a file, or database for truly persistent storage, JSON or a remotely hosted MySQL would work. Try not to focus too much on fast before you have something that works.
https://www.json.org/json-en.html [1]
https://stackoverflow.com/a/5581595/2932298
https://stackify.com/premature-optimization-evil/ [2]

How to keep your distributed cache clean?

In a N-Tier architecture, what would be the best patterns to use so that you can keep your cache clean?
I know it's easy to just set an absolute/sliding timeout, but is there a better mechanism available to allow you to mark your cache as dirty after you update the underlying persistence.
The difficulty I"m trying to wrap my head around is that Cache are usually stored as KVP. But a query is usually a fair bit more complex than that. So how can the gateway service tell the cache store that for such and such query, it needs to refetch from persistence.
I also can't afford to hand-code the cache update per query. I'm looking for a more systematic approach.
Is this just a pipe dream, or is there some way to do this elegantly?
Link/Guide/Post appreciated.
I have worked with AppFabric and I think tried to do what you are asking about. I was working on an auction site and I wanted to pro-actively invalidate items in the cache.
For example, we had listings (things for sale) and they would be present all over the cache (AppFabric). The data that represented a listing was in 10 different places. What I initially wanted was a way to say, "Ok, my listing has changed. Let me go find everywhere it exists in cache, and then update." (I think you say "mark as dirty" in your question)
I found doing this was incredibly difficult. There are tags in AppFabric that I tried to use, so I would mark a given object (or collection of objects) with a tag and that would let me query the cache and remove items. In other words, if an object had a LISTING tag, I would find it and invalidate it.
Eventually I settled on a two-pronged attack.
For 95% of the data I let it expire. It was a happy day when I decided this because everything got much easier to develop. I had to make some concessions in the UI etc., but it was well worth it.
For the last 5% of the data I resolved to only ever store it once. For example, a bid on a listing. Whenever a new bid came in, we'd pro-actively invalidate that object, and then everything that needed that information would be updated as well.

Front and back end techniques to increase performance

What are some of the common and notable performance issues/bottlenecks that are typically encountered in a web application in both, the front-end layer, and the back-end layer?
An example of what I mean in a database is not having something you are querying on be an index. That would slow down the query. On the front-end it might be something funky going on with JavaScript that makes your application seem slow.
What are the general rules of thumb that help navigate such issues? And what are some good to-do's?
Thanks,
Alex
On front-end:
-push all of your assets - css files, images, static content - to a CDN. Edgecast is pretty good and reasonably priced.
-don't use load entire javascript frameworks when you only need a few features from it. only load what's needed.
On back-end
-memcache the results from all database calls by using a hash of the sql query as the key name, and the result set as the value
-make sure you are not making your database tables really 'wide' - tons of columns and column types like 'text' and 'blob'
For the front-end, there are well-known guidelines/rules you can follow, and there are some great tools like YSlow that can help you pinpoint the bottlenecks.
For the back-end, as you've noted, efficient use of indexes is a must. Other optimizations usually involve caching, and basic stuff like avoiding doing stuff within loops that can be done once. I'm sure people here will have suggestions, but remember "premature optimization is the root of all evil!" :-)
Millhouse is on to it. I can also add:
Batch expensive operations up. For example: don't make lots of individual calls to a database if you can do it all in one hit.
Avoid server hops where you can.
Process in parallel if you can (not so common for your 'average' web app but quite possible in larger Enterprise scale apps).
Pre-process: crunching data, pre-puiblishing content etc, the more you can do before it's needed the better.
Use a CQRS-based architecture. CQRS stands for Command/Query Responsability Segregation; it basically means that you have different code (services) for reading from the DB and writing to the DB. A good practice for scalability is to have separate DB's for reading and writing (it actually does make sense, if you read more about CQRS), and you can scale out the reading database by having copies run on multiple servers.
CQRS is not only interesting from a scalability point of view, but also from a code maintenance and clarity point of view. It does take some effort to learn about CQRS and understand it, though.
Check out these links:
http://www.slideshare.net/skillsmatter/ddd-exchange-2010-udi-dahan-on-architectural-innovation-cqrs
http://www.slideshare.net/pjvdsande/rethink-your-architecture-with-cqrs
convert dynamic contents to static contents. regenerate those static contents if their dependent objects changed. I saw one article said that more than 80 percent contents are static on Amazon website.

Cache vs HashMap for simple usecase

This must be a very basic:- Just curious, If I don't need distributed, cache-as-sor models, why do we need third party cache libraries (ehcache, memcached) when all you need (for simple use case) is just a key-value pair holder, something like HashMap ?
A lot of thought goes into producing software, and the more thought and testing by others (and fixes) improves the value of the software and also validates the code as a model (I didn't say a good model).
For the example, above, how would you handle the deleting of "old" cache items? You would have to add more code/features to insure that the cache could be emptied.
Using memcache may be overkill for a simple program, but it's already solved many of the problems that you will have and gives you a bit of extra ability.
I would also use Redis as an example. You can DO a lot of stuff in your own language, but sometimes, Redis would make other items easier.
YMMV!
-daniel

Cache like mechanism (which Data Structure)?

I am fetching some questions from the server (database) and showing it to client (user) in the browser. The client will answer the question and based on his/her answer the next set of questions will be fetched from the database. Now, I want to pre-fetch the next set of questions while the user read the present question so that the waiting time for user to see the next question will be shorter.
My questions is, how to store the pre-fetched questions i.e. which data structure should I use to store the pre-fetched questions in the memory so that I can get better performance? I want a "cache" type of thing. Also once the user hit any question from the cache the question won't be there any more.
PS: Each question has unique Id.
Thanks
Naveen
There are multiple options to go about it. One that makes a big difference, one that makes little.
Little difference would be to fetch questions and store it in user's session. It's basically depends on where your session is stored, could also be database, or a file. This only makes sense if your db tables are very denormalized and it requires lots of joins to get the answer. I doubt that's the case so this won't make much difference for the user no matter which data structure used.
Big difference would make prefetching them with AJAX using javascript straight into the browser. In this case a simple array would suffice. JS gives you flexibility to build any objects with any properties, anything would be good enough. So write a poller in JS which fetches the questions from server while user is looking at the question, return them using JSON for example. JSON will become a simple object. Since each user stores only a couple of questions prefetched in their browser particular data structure choice won't make a difference here either.
Try using LinkedHashMap as You will have LRU algorithm implemented quickly with good performance.
Read this link as well :
LinkedHashMap as cache
First a few questions to adapt to your context :
assuming you use Java ?
using Hibernate also ?
If you want to prefetch in the server, many caching solutions exists.
Taking into account your unique id (see PS), if this ID is database related and you are using Hibernate, the easiest solution would be to configure the Hibernate second-level cache for that entity. Then, your only code would be to run the query in advance....
If theses requisites do not fit, I used EhCache as the caching solutions.
Somehow easy to start using, and it has plenty of features available when you later need them.

Resources