How to avoid performance bottlenecks?

A distributed system is described as scalable if it remains effective when there is a significant increase in the number of resources and the number of users. However, these systems sometimes face performance bottlenecks. How can these be avoided?

The question is pretty broad, and depends entirely on what the system is doing.
Here are some things I've seen used in systems to reduce bottlenecks.
Use caches to reduce network and disk bottlenecks. But remember that knowing when to evict from a cache can be a hard problem in some situations.
Use message queues to decouple components in the system (see the sketch at the end of this answer). This way you can add more hardware to the specific parts of the system that need it.
Delay computation when possible (often by using message queues). This takes the heat off the system during high-processing times.
Of course, design the system for parallel processing wherever possible. One host doing all the processing is NOT scalable. Note that most relational databases fall into the one-host bucket, which is one reason NoSQL has suddenly become popular; but it is not always the appropriate choice.
Use eventual consistency if possible. Strong consistency is much harder to scale.
Some people are proponents of CQRS and DDD. Though I have never seen or designed a "CQRS system" nor a "DDD system," those ideas have definitely affected the way I design systems.
There is a lot of overlap in the points above; some of the techniques may make use of the others.
But experience (your own and others') eventually teaches you about scalable systems. I keep up to date by reading about designs from Google, Amazon, Twitter, Facebook, and the like. Another good starting point is the High Scalability blog.
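To make the message-queue point concrete, here is a minimal sketch using Python's standard-library queue module; a production system would use a networked broker such as RabbitMQ or Kafka, and the worker count and job format here are arbitrary assumptions:

```python
import queue
import threading

work_queue = queue.Queue(maxsize=1000)  # bounded, so producers feel back-pressure

def process(job):
    pass  # placeholder for the real, expensive computation

def worker():
    # Consumers drain the queue independently of producers; scaling this
    # stage means adding workers (or hosts, with a networked broker).
    while True:
        job = work_queue.get()
        try:
            process(job)
        finally:
            work_queue.task_done()

# A small, tunable pool of consumers.
for _ in range(4):
    threading.Thread(target=worker, daemon=True).start()

# Producers just enqueue and move on; the heavy work is deferred.
for i in range(100):
    work_queue.put({"job_id": i})

work_queue.join()  # block until every enqueued job has been processed
```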

Just to build on a point discussed in the above post, I would like to add that what you need for your distributed system is a distributed cache. When you intend to scale your application, the distributed cache acts like an "elastic" data fabric, meaning that you can increase the storage capacity of the cache without compromising performance, while at the same time getting a reliable platform that is accessible to multiple applications.
One such distributed caching solution is NCache. Do take a look!

Related

Interview question: How do you scale or optimize a microservice which receives millions of requests?

I was asked a question during an interview:
How do you optimize a microservice which receives millions of requests?
How do you optimize the latency/frequency of a service response which is accessed multiple times?
My answer was:
I would check the DB query that makes the response slow and then configure a cache.
Can anyone let me know what other ways a service can be optimized, besides these? Is there anything on the cloud side?
It is a vast and complex question which can have a lot of different (and very long) answers depending on the context and the structure of your environment.
There are a lot of patterns and concepts which fit different scenarios and architectures.
I would suggest you start here: https://microservices.io/patterns/index.html
The guy behind the site (Chris Richardson) has been advocating microservices for a long time. You can find numerous talks by him on YouTube. It is a great way to start your journey into the microservice world.
And of course: https://martinfowler.com/articles/microservices.html
Here are my ideas on this question:
Caching, as you mentioned, is a good optimization point around the data access layer. I would check to see if there is any opportunity to add a cache without breaking consistency or other hard requirements the application may have.
I would analyze the CPU and memory usage and adjust them progressively while monitoring the latency closely. The objective here is to find the point at which adding more resources no longer decreases the latency significantly.
The above point also has to consider adjusting the number of threads in the application, and making sure the synchronization scheme is optimized.
If the microservice is built in a JVM-based language, the GC is one component that may introduce latency, mostly when it kicks in; so if the latency spikes correlate with GC cycles, I would search for optimizations there.
Making sure the application reuses connections to external services efficiently is another optimization point worth considering (see the sketch below).
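On that last point, connection reuse is often a one-line change. Here is a minimal sketch using Python's requests library (the URL and function are placeholders): a Session pools connections per host and keeps them alive across calls, instead of paying a fresh TCP/TLS handshake on every request.

```python
import requests

# A Session reuses underlying TCP connections (HTTP keep-alive),
# avoiding a new handshake for every call to the same host.
session = requests.Session()

def fetch_user(user_id):
    # Placeholder URL; substitute the real downstream service endpoint.
    resp = session.get(f"https://api.example.com/users/{user_id}", timeout=2)
    resp.raise_for_status()
    return resp.json()
```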

Is there a theorem like CAP for web development?

When you're building something in a web development scenario you're often thinking about costs/resources, and you're often juggling three resources:
CPU (Processing in general)
Memory (Storage in general)
Network/Bandwidth (Or maybe even external/server resources)
The theorem here is simple: you can only choose two of these to be low.
If you want low CPU and Memory, you'll have to ask the server to do the work (high bandwidth usage).
If you want low Memory and Bandwidth, the CPU will have to do extra work to create and recreate things on the go.
If you want low CPU and Bandwidth, the memory will have to store more information and possibly duplicated data (see the sketch below).
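For instance, memoization is the textbook version of that last trade: spend memory to save CPU. A minimal sketch in Python:

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # unbounded cache: memory grows so CPU work shrinks
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

# Without the cache this recursion does exponential work; with it,
# each value is computed once, at the cost of storing every result.
print(fib(200))
```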
My question here is: is there a name for this theorem, or for the managing of these 3 resources? I would like to know more about the theory behind choosing the best option in each scenario and research related to this.
To be honest I don't know if this is the right community for this question, it is mostly a theoretical/academic question.
It doesn't work this way. For all 3 resources you can achieve a low usage percentage by simply adding more resources, in most cases. If you have high CPU usage, then deploy your app to a server with twice as many CPUs. So if you have money, you can achieve low usage levels. In CAP, it doesn't matter how much money you have.
Also from Wikipedia:
CAP is frequently misunderstood as if one has to choose to abandon one of the three guarantees at all times. In fact, the choice is really between consistency and availability only when a network partition or failure happens; at all other times, no trade-off has to be made

When are page frame specific cache management policies useful?

I'm reading the O'Reilly Linux Kernel book, and one of the things pointed out during the chapter on paging is that the Pentium cache lets the operating system associate a different cache management policy with each page frame. So I get that there could be scenarios where a program has very little spatial/temporal locality and memory accesses are random/infrequent enough that the probability of cache hits falls below some threshold.
I was wondering whether this mechanism is actually used in practice today, or whether it is more of a feature that was necessary back when caches were fairly small and not as efficient as they are now. I could see it being useful for an embedded system where system-call overhead needs to be minimal; are there other applications I am missing?
Having multiple cache management policies is widely used, whether by assigning whole regions using MTRRs (fixed/dynamic, as explained in Intel's PRM), by marking MMIO regions, or through special instructions (e.g. streaming loads/stores, non-temporal prefetches, etc.). The use cases also vary a lot: you may be trying to map an external I/O device into virtual memory (and don't want CPU caching to impact its coherence), you may want to define a write-through region for better integrity management of some database, or you may just want plain write-back to maximize the cache-hierarchy capacity and replacement efficiency (which means performance).
These usages often overlap (especially when multiple applications are running), so the flexibility is very much needed; as you said, you don't want data with little to no spatial/temporal locality to thrash out other lines you use all the time.
By the way, caches are never going to be big enough in the foreseeable future (with any known technology), since increasing them requires locating them further away from the core and paying in latency. So cache management is still, and will be for a long while, one of the most important considerations for performance-critical systems and applications.

LAMP stack performance under heavy traffic loads

I know the title of my question is rather vague, so I'll try to clarify as much as I can. Please feel free to moderate this question to make it more useful for the community.
Given a standard LAMP stack with more or less default settings (a bit of tuning is allowed, client-side and server-side caching turned on), running on modern hardware (16 GB RAM, 8-core CPU, unlimited disk space, etc.), deploying a reasonably complicated CMS service (a Drupal or WordPress project, for argument's sake) - what amounts of traffic, SQL queries, and user requests can I reasonably expect to accommodate before I have to start thinking about performance?
NOTE: I know that specifics will greatly depend on the details of the project, i.e. optimizing MySQL queries, indexing stuff, minimizing filesystem hits - assuming the web developers did a professional job - I'm really looking for very rough figures in terms of visits per day, traffic during peak visiting times, how many records before (transactional) MySQL fumbles, and so on.
I know the only way to really answer my question is to run load testing on a real project, and I'm concerned that my question may be treated as partly off-topic.
I would like to get a set of figures from people with first-hand experience, e.g. "we ran such and such a set-up and it handled at least this much load [problems started surfacing after such and such]". I'm also greatly interested in any condensed (I'm short on time atm) reading I can do to get a better understanding of the matter.
P.S. I'm meeting a client tomorrow to talk about his project, and I want to be prepared to reason about performance if his project turns out to be akin to Foursquare.
Very tricky to answer without specifics, as you have noted. If I were tasked with what you have to do, I would take each component in turn (network interface, CPU/memory, physical IO load, SMP locking, etc.), get the maximum capacity available, and divide by a rough estimate of use per request.
For example, network IO. You might have one 1 Gb card, which might achieve maybe 100 Mbytes/sec (I tend to use 80% of the theoretical max). How big will a typical 'hit' be? Perhaps 3 kbytes on average, for HTML, images, etc. That means you can achieve roughly 33k requests per second before you bottleneck at the physical level. These numbers are absolute maximums: depending on tools and skills you might not get anywhere near them, but nobody can exceed them.
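That back-of-the-envelope arithmetic as a quick script, using the rough numbers from the paragraph above (swap in your own):

```python
# Capacity ceiling for a single network interface.
link_gbits = 1.0              # 1 Gbit NIC
usable_fraction = 0.8         # ~80% of the theoretical maximum
avg_hit_kbytes = 3.0          # HTML, images, etc. per request

usable_bytes_per_sec = link_gbits * 1e9 / 8 * usable_fraction   # 100 MB/s
max_requests_per_sec = usable_bytes_per_sec / (avg_hit_kbytes * 1000)

print(f"{max_requests_per_sec:,.0f} requests/sec absolute ceiling")  # ~33,333
```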
Repeat the above for every component, perhaps varying your numbers a little, and you will build a quick picture of what is likely to be a concern. Then consider how you can quickly get more capacity in each component: can you just throw money at it and gain more performance (e.g. use SSDs instead of HDDs)? Or will you hit a limit that cannot be moved without rearchitecting? Also take into account what resources you have available: do you have lots of skilled programmer time, DBAs, or wads of cash? If you have lots of a resource, you can reduce those constraints more easily and quickly as you move along the experience curve.
Do not forget external components too; firewalls may have limits that are lower than expected for sustained traffic.
Sorry I cannot give you real numbers; our workloads use custom servers, high-memory caching, and other tricks, and do not use all the products you list. However, I would concentrate most on IO/SQL queries and possibly network IO, as these tend to be harder limits than CPU/memory, although I'm sure others will have a different opinion.
Obviously, the question is one that does not have a "proper" answer, but I'd like to close it and give some feedback. The client meeting has taken place, performance was indeed a biggie, and their hosting platform turned out to be on the Amazon cloud :)
From research I've done independently:
Memcache is a must;
MySQL (or whatever persistent storage instance you're running) is usually the first to go. Solutions include running multiple virtual instances and replicating data between them, distributing the load;
http://highscalability.com/ is a good read :)

Is implementing a cache considered a difficult problem?

There are many questions on here about caches that don't work properly, or about how to implement caches properly, for all sorts of things from HTTP to SQL queries to L1/L2 memory caching.
Is it generally held to be a difficult problem in computer science terms?
"There are only two hard things in Computer Science: cache invalidation and naming things." — Phil Karlton
I don't know if implementing a cache is really considered to be a difficult thing, but I suspect that most people who do it don't really test it well enough to be sure that it works and that the cache actually does the job it is supposed to do. The thing about implementing a cache is that if you get it wrong, unless you are monitoring cache misses and comparing performance, you won't even know, since a cache failure will just "fail dumb" down to the backing store. So what if the cache has a 100% miss rate? It only makes the default case 1% slower anyway.
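Monitoring the miss rate, as suggested above, is cheap if you use an off-the-shelf cache. In Python, for instance, functools.lru_cache exposes hit/miss counters (the lookup function here is a stand-in):

```python
from functools import lru_cache

@lru_cache(maxsize=256)
def expensive_lookup(key):
    return key * 2  # stand-in for a slow backing-store call

for k in [1, 2, 1, 3, 1]:
    expensive_lookup(k)

# cache_info() shows whether the cache is doing its job; a 100% miss
# rate would otherwise just "fail dumb" through to the backing store.
print(expensive_lookup.cache_info())  # CacheInfo(hits=2, misses=3, maxsize=256, currsize=3)
```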
I think the issue is deciding how you want your cache to operate, e.g. what strategies you use for eviction (least recently used, etc.). Is it going to be distributed? Do you want to optimize cache hits? The complexity is based around these issues (see the sketch below).
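To make the eviction question concrete, here is a minimal least-recently-used cache sketch built on an ordered dictionary; a real implementation would layer thread safety, TTLs, and hit/miss statistics on top of this core:

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: evicts the least recently used entry when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)   # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used
```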
Simple in-process caches can be trivial. Generally I've not had to implement these, since you can get off-the-shelf implementations for most platforms, and I would hesitate to reinvent the wheel. (I presume your question is about the difficulty of implementation, rather than asking whether you should create one yourself.)
I think the difficulty that many programmers have is in restraining themselves from writing caching software for their application. It's good to remember that modern operating systems already do a raft load of caching for you, and that implementing it yourself may well be a pessimisation.
Not until you try it in a nontrivial case.
I think that no matter the task, if there is a need for concurrent access, what is difficult is just that: controlling the concurrent access. That alone can cause serious problems such as deadlock, serialization of access, and limitations on concurrency, among others.
