What's the benefit of the client-server model of memcached?

As I understand it, the benefit of using memcached is to shorten the access time to information stored in the database by caching it in memory. But isn't the time overhead of the client-server model, which is based on a network protocol (e.g. TCP), considerable as well? My guess is that it might actually be worse, as network access is generally slower than hardware access. What am I getting wrong?
Thank you!

It's true that caching won't address network transport time. However, what matters to the user is the overall time from request to delivery. If this total time is perceptible, then your site does not seem responsive. Appropriate use of caching can improve responsiveness, even if your overall transport time is out of your control.
Also, caching can be used to reduce overall server load, which will essentially buy you more cycles. Consider the case of a query whose response is the same for all users - for example, imagine that you display some information about site activity or status every time a page is loaded, and this information does not depend on the identity of the user loading the page. Let's imagine also that this information does not change very rapidly. In this case, you might decide to recalculate the information every minute, or every five minutes, or every N page loads, or something of that nature, and always serve the cached version. In this case, you're getting two benefits. First, you've cut out a lot of repeated computation of values that you've decided don't really need to be recalculated, which takes some load off your servers. Second, you've ensured that users are always getting served from the cache rather than from computation, which might speed things up for them if the computation is expensive.
Both of those could - in the right circumstances - lead to improved performance from the user's perspective. But of course, as with any optimization, you need to set up benchmarks and make decisions based on measured data rather than on your perception of what ought to be faster.
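The "recalculate every minute or every five minutes and always serve the cached version" idea can be sketched as follows. This is a minimal in-process illustration, not memcached itself; the names `TimedCache`, `compute`, and `ttl_seconds` are made up for the example, and the clock is injectable only so the behaviour is easy to check.

```python
import time
from typing import Any, Callable, Optional


class TimedCache:
    """Serve a cached value and recompute it at most once per ttl_seconds."""

    def __init__(
        self,
        compute: Callable[[], Any],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._compute = compute          # the expensive calculation (hypothetical)
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Any = None
        self._expires_at: Optional[float] = None

    def get(self) -> Any:
        now = self._clock()
        # Recompute only on the first call or once the cached value has expired.
        if self._expires_at is None or now >= self._expires_at:
            self._value = self._compute()
            self._expires_at = now + self._ttl
        return self._value
```

Every page load would call `get()`; only the first call after each expiry pays for the computation, and all other callers get the stored value.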


Concurrent Product View Counter

I have been studying some system design concepts and stumbled on the question "Show the number of people viewing a product".
There is some leeway on consistency in the design, but it should support a high-traffic e-commerce site.
My approach is to store the last 5-10 minutes of data, timestamped, along with the product-id and user session-id in a distributed cache like Redis. The write should happen when a user views the product. To get the actual count, a separate read API should hit a replica (secondary) cache instance and aggregate the result for that product-id.
I have to keep the computation cyclic so that I don't waste memory on timestamps older than 5-10 minutes.
Do you think I should tweak my read/write strategy to optimise further?
Is the cache option good enough, specifically Redis?
What are the more tried and tested approaches out there?
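For illustration, the approach described in the question could look roughly like this. It is a minimal single-process sketch under stated assumptions, not a tested production design: `ViewerCounter`, `record_view`, and `count` are hypothetical names, and a plain dictionary stands in for the distributed cache. With Redis, one sorted set per product (`ZADD` on a view, `ZREMRANGEBYSCORE` to drop stale sessions, `ZCARD` to count) would be one way to express the same idea.

```python
import time
from collections import defaultdict
from typing import Callable, Dict


class ViewerCounter:
    """Approximate count of sessions that viewed a product within a time window."""

    def __init__(
        self,
        window_seconds: float = 300.0,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._window = window_seconds
        self._clock = clock
        # product_id -> {session_id: timestamp of the most recent view}
        self._last_seen: Dict[str, Dict[str, float]] = defaultdict(dict)

    def record_view(self, product_id: str, session_id: str) -> None:
        # Write path: overwrite the session's timestamp, so each session
        # occupies one entry per product no matter how often it refreshes.
        self._last_seen[product_id][session_id] = self._clock()

    def count(self, product_id: str) -> int:
        # Read path: evict sessions older than the window, then count the rest.
        cutoff = self._clock() - self._window
        sessions = self._last_seen.get(product_id, {})
        stale = [sid for sid, ts in sessions.items() if ts <= cutoff]
        for sid in stale:
            del sessions[sid]
        return len(sessions)
```

Evicting on read keeps memory bounded to the sessions seen within the window, which is the "cyclic" behaviour the question asks for.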

Performance test with cache

We have a service which is heavily CPU bound: it does a lot of calculation for a given parameter. Fortunately, the calculation result can be cached.
For example, a request to /data/{id}.png takes almost 2s the first time, but we cache the response for later use. When the cache is hit, the response time is 200ms (since we do some lightweight operation on the cached data before responding).
Now we want to provide a performance test report for this service, especially for max concurrency and response time. But for a given request (with a given id parameter), there is a huge difference between the cached and uncached cases. That means that if we clear the cache and use randomly generated id parameters during the test, too few requests will hit the cache, which results in a bad report. If we pre-cache most of the responses and run the same test, the report may look good.
So I wonder how to reflect the real performance in this situation?
In order to know the real performance you need to produce a realistic load. Not knowing the details of how your service will be used, it is hard to come up with exact distributions of "cached" and "new" requests, but one thing is clear: a well-behaved load test must represent real-life application usage, otherwise it doesn't make much sense.
So happy path testing would be something like:
Using anticipated distribution of "new" and "cached" requests
Using anticipated number of users of your system
This performance testing type is known as Load Testing. However I wouldn't stop at this stage as load testing doesn't tell the full story.
The next step would be putting your system under a prolonged load (e.g. overnight or over a weekend). You might also want to increase the load above the anticipated value. This testing type is called Soak Testing and it is very good at discovering memory leaks and problems with resources that run out over time, like disk space.
And finally you can check when (and how) your app is going to break. Start with 1 virtual user and gradually increase the load until response time begins exceeding acceptable thresholds or errors start occurring (whichever comes first). At this point you might also learn whether the application recovers back to normal when the load decreases. This testing type is known as Stress Testing and most probably this is how you will find your application's bottleneck.
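One way to drive a test with an "anticipated distribution" of cached and new requests is to generate the id parameters up front. The sketch below is only an illustration: `request_ids`, `warm_ids`, and `hit_ratio` are hypothetical names, and the real hit ratio has to come from your own usage data or estimates.

```python
import itertools
import random
from typing import List, Sequence


def request_ids(
    warm_ids: Sequence[str],
    hit_ratio: float,
    count: int,
    seed: int = 0,
) -> List[str]:
    """Return `count` ids; roughly `hit_ratio` of them come from `warm_ids`.

    `warm_ids` are ids whose responses were pre-cached before the test;
    every other id is generated fresh, so it should miss the cache.
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    rng = random.Random(seed)  # seeded so a test run can be repeated
    fresh = (f"new-{n}" for n in itertools.count())
    ids: List[str] = []
    for _ in range(count):
        if warm_ids and rng.random() < hit_ratio:
            ids.append(rng.choice(warm_ids))   # expected cache hit
        else:
            ids.append(next(fresh))            # expected cache miss
    return ids
```

The resulting list can be fed to whatever load testing tool you use, for example as a data file of ids to substitute into /data/{id}.png.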

Is combining rest api calls to reduce # requests worth doing?

My server used to handle 700+ user burst and now it is failing at around 200 users.
(Users are connecting to the server almost at the same time after clicking a push message)
I think the drop is due to the change in how the requests are made.
Back then, webserver collected all the information in a single response in an html.
Now, each section in a page is making a rest api request resulting in probably 10+ more requests.
I'm considering making an api endpoint to aggregate those requests for pages that users would open when they click on push notification.
Another solution I think of is caching those frequently used rest api responses.
Is it a good idea to combine api calls to reduce api calls ?
It is always a good idea to reduce API calls. The optimal solution is to get all the necessary data in one go without any unused information.
This results in less traffic, fewer requests to (and less load on) the server, less RAM and CPU usage, as well as fewer concurrent DB operations.
Caching is also a great choice. You can consider both caching the entire request and separate parts of the response.
A combined API response means that there will be just one response, which will reduce the pre-execution time (where the app is loading everything), but will increase the processing time, because it's doing everything in one thread. This will result in less traffic, but a slightly slower response time.
From the user's perspective this would mean that if you combine everything, the page will load slower, but when it does it will load up entirely.
It's a matter of finding the balance.
And for the question if it's worth doing - it depends on your set-up. You should measure the start-up time of the application and the execution time and do the math.
Another thing you should consider is the amount of time this might require. There is also the solution of increasing the server power, like creating a clustered cache and using a load balancer to split the load. You should compare the needed time for both tasks and work from there.
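As a rough illustration of an aggregating endpoint, the sketch below combines several per-section loaders into one response. It is an assumption-laden example, not the setup from the question: `aggregate` and the loader names are hypothetical, and it assumes the section loaders are I/O-bound coroutines, in which case running them concurrently with `asyncio.gather` may offset some of the extra processing time of a combined call.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict

# A loader is any zero-argument coroutine function that returns one
# section's data (for example, what one of the former REST calls returned).
SectionLoader = Callable[[], Awaitable[Any]]


async def aggregate(loaders: Dict[str, SectionLoader]) -> Dict[str, Any]:
    """Run all section loaders concurrently and return one combined payload."""
    names = list(loaders)
    # gather() returns results in the same order as the awaitables passed in.
    results = await asyncio.gather(*(loaders[name]() for name in names))
    return dict(zip(names, results))
```

A handler for the page opened from the push notification could call `aggregate` with the handful of loaders that page needs and return the resulting dictionary as a single JSON response.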

In what ways does more RAM and Processing power on my server make my website faster?

I understand that the speed that a website loads is dependent on many things, however I'm interested to know how I can positively impact load speed by increasing the specifications on my dedicated server:
Does this allow my server to handle more requests?
Does this reduce roundtrips?
Does this decrease server response time?
Does this allow my server to generate pages on Wordpress faster?
Does this allow my server to handle more requests?
Requests come in and are essentially put into a queue until the system has time to handle them. With more system resources, that queue can be processed faster, and it can be configured to handle more requests simultaneously, so... yes-ish (note: this is very generalized)
Does this reduce roundtrips?
No, your application design is the only thing that affects this. If your application makes a request to the server, it makes a request (i.e., a "round trip"). If you increase your server resources, you do not in turn decrease the number of requests your application makes.
Does this decrease server response time?
Yes, see the first explanation. It can often decrease response times for the same reasons given there. However, network latency and other factors outside the realm of the server can affect total response times.
Does this allow my server to generate pages on Wordpress faster?
Again, see the first explanation. This can help your server generate pages faster by throwing more power at the processes that generate the pages. However, outside factors aside from the server still apply.
For performance, the two high target areas (assuming you don't have tons and tons of traffic, which most sites do not), are reducing database reads and caching. Caching covers various areas... data caching on the server, page output caching, browser caching for content, images, etc. If you're experiencing less than desirable performance, this is usually a good place to start.

Performance Optimization For Highly Interactive Websites

I recently completed development of a mid-trafficked(?) website (peak 60k hits/hour). However, the site only needs to be updated once a minute, so achieving the required performance can be summed up in a single word: "caching".
For a site like SO where the data feeding the site changes all the time, I would imagine a different approach is required.
Page cache times presumably need to be short or non-existent, and updates need to be propagated across all the webservers very rapidly to keep all users up to date.
My guess is that you'd need a distributed cache to control the serving of data and pages that is updated on the order of a few seconds, with perhaps a distributed cache above the database to mediate writes?
Can those more experienced than I outline some of the key architectural/design principles they employ to ensure highly interactive websites like SO are performant?
The vast majority of sites have many more reads than writes. It's not uncommon to have thousands or even millions of reads to every write.
Therefore, any scaling solution depends on separating the scaling of the reads from the scaling of the writes. Typically scaling reads is really cheap and easy, scaling the writes is complicated and costly.
The most straightforward way to scale reads is to cache entire pages at a time and expire them after a certain number of seconds. If you look at the popular website Slashdot, you can see that this is the way they scale their site. Unfortunately, this caching strategy can result in counter-intuitive behaviour for the end user.
I'm assuming from your question that you don't want this primitive sort of caching. Like you mention, you'll need to update the cache in place.
This is not as scary as it sounds. The key thing to realise is that, from the server's point of view, Stack Overflow does not update all the time. It updates fairly rarely. Maybe once or twice per second. To a computer a second is nearly an eternity.
Moreover, updates tend to occur to items in the cache that do not depend on each other. Consider Stack Overflow as an example. I imagine that each question page is cached separately. Most questions probably have an update per minute on average for the first fifteen minutes and then probably once an hour after that.
Thus, in most applications you barely need to scale your writes. They're so few and far between that you can have one server doing the writes; updating the cache in place is actually a perfectly viable solution. Unless you have extremely high traffic, you're going to get very few concurrent updates to the same cached item.
So how do you set this up? My preferred solution is to cache each page individually to disk and then have many web-heads delivering these static pages from some mutually accessible space.
When a write needs to be done, it is done from exactly one server, and this updates that particular cached HTML page. Each server owns its own subset of the cache so there isn't a single point of failure. The update process is carefully crafted so that a transaction ensures that no two requests are writing to the file at exactly the same time.
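The answer does not say how its transactional update is implemented. As one possible way to update a cached page file safely, the sketch below writes to a temporary file and then swaps it into place with `os.replace`, which is a different technique from a lock or transaction: readers never see a half-written page, and if two writers do race, the last one wins. `write_cached_page` and its parameters are hypothetical names for this example.

```python
import os
import tempfile
from pathlib import Path


def write_cached_page(cache_dir: Path, name: str, html: str) -> Path:
    """Replace the cached page `name` in `cache_dir` without exposing partial writes."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / name
    # The temp file lives in the same directory so the final rename stays
    # on one filesystem, which is what makes the swap atomic.
    fd, tmp_path = tempfile.mkstemp(dir=cache_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(html)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the bytes are on disk first
        os.replace(tmp_path, target)  # atomically swap the new page in
    except BaseException:
        # Clean up the temp file if anything failed before the swap.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return target
```

Web heads serving the static files keep reading the old page until the swap happens, then start reading the new one.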
I've found this design has met all the scaling requirements we have so far required. But it will depend on the nature of the site and the nature of the load as to whether this is the right thing to do for your project.
You might be interested in this article which describes how wikimedia's servers are structured. Very enlightening!
The article links to this pdf - be sure not to miss it.
