I am caching opaque responses in SW which bloats CacheStorage exponentially upto order of 6 GB+. At times, I see SW responses slower than that from the browser cache.
Can bloated CacheStorage lead to slow read and hence degraded performance while serving request through SW? How much of this performance is driven by what kind of hard drive the machine is loaded with -- SSD vs HDD?
PS: I am aware that the ideal fix is to either fix the opaque responses or not cache them at all.
The Cache Storage API gives you programmatic control over cache expiration, and allows you to use cached responses within JavaScript to construct complex serving/fallback strategies that work independent of the network, which is generally speaking not possible when just using the browser's HTTP cache.
There is no specific expectation that using the Cache Storage API is going to be faster than the HTTP cache, though.
There's some degree of overhead involved whenever service worker code is run, and the details about that overhead's impact are likely to vary based on storage medium, CPU, browser version, and any number of other criteria. I will say that I don't believe the fact that these are opaque responses comes that heavily into play when it comes to runtime performance, and the additional quota usage is effectively "fictitious". It just translates into a higher number used when calculating available quota, but does not actually result in more data being written to disk.
Related
At the company I work, all our APIs send and expect requests/responses that follow the JSON:API standard, making the structure of the request/response content very regular.
Because of this regularity and the fact that we can have hundreds or thousands of records in one request, I think it would be fairly doable and worthwhile to start supporting compressed requests (every record would be something like < 50% of the size of its JSON:API counterpart).
To make a well informed judgement about the viability of this actually being worthwhile, I would have to know more about the relationship between request size and duration, but I cannot find any good resources on this. Anybody care to share their expertise/resources?
Bonus 1: If you were to have request performance issues, would you look at compression as a solution first, second, last?
Bonus 2: How does transmission overhead scale with size? (If I cut the size by 50%, by what percentage will the transmission overhead be cut?)
Request and response compression adds to a time and CPU penalty on both sender's side and receiver's side. The savings in time is in the transmission.
The weighing of the tradeoff depends a lot on the customers of the API -- when they make requests, how much do they request, what is requested, where they are located, type of device/os and capabilities etc.,
If the data is static -- for eg: a REST query apihost/resource/idxx returning a static resource, there are web standard approaches like caching of static resources that clients / proxies will be able to assist with.
If the data is dynamic -- there are architectural patterns that could be used.
If the data is huge -- eg: big scientific data sets, video etc., almost always you would find them being served statically with a metadata service that provides the dynamic layer. For eg: MPEG-DASH or HLS is just a collection of files.
I would choose compression as a last option relative to the other architectural options.
There are also implementation optimizations that would precede using compression of request/response. For eg:
Are your services using all available resources at disposal (cores, memory, i/o)
Does the architecture allow scale-up and scale-out and can the problem be handled effectively using that (remember the penalties on client side due to compression)
Can you use queueing, caching or other mechanisms to make things appear faster?
If you have explored all these and the answer is your system is optimal and you are looking at the most granular unit of service where data volume is an issue, by all means go after compression. Keep in mind that you need to budget compute resources for compression on the server side as well (for a fixed workload).
Your question#2 on transmission overhead vs size is a question around bandwidth and latency. Bandwidth determines how much you can push through the pipe. Latency governs the perceived response times. Whether the payload is 10 bytes or 10MB, latency for a client across the world encountering multiple hops will be larger relative to a client encountering only one or two hops and is bound by the round-trip time. So, a solution may be to distribute the servers and place them closer to your clients from across the world rather than compressing data. That is another reason why compression isn't the first thing to look at.
Baseline your performance and benchmark your experiments for a representative user base.
I think what you are weighing here is going to be the speed of your processor / cpu vs the speed of your network connection.
Network connection can be impacted by things like distance, signal strength, DNS provider, etc; whereas, your computer hardware is only limited by how much power you've put in it.
I'd wager that compressing your data before you are sending would result in shorter response times, yes, but it's=probably going to be a very small amount. If you are sending json, usually text isn't all that large to begin with, so you would probably only see a change in performance at the millisecond level.
If that's what you are looking for, I'd go ahead and implement it, set some timing before and after, and check your results.
I have a high-performance software server application that is expected to get increased traffic in the next few months.
I was wondering what approach or methodology is good to use in order to gauge if the server still has the capacity to handle this increased load?
I think you're looking for Stress Testing and the scenario would be something like:
Create a load test simulating current real application usage
Start with current number of users and gradually increase the load until
you reach the "increased traffic" amount
or errors start occurring
or you start observing performance degradation
whatever comes the first
Depending on the outcome you either can state that your server can handle the increased load without any issues or you will come up with the saturation point and the first bottleneck
You might also want to execute a Soak Test - leave the system under high prolonged load for several hours or days, this way you can detect memory leaks or other capacity problems.
More information: Why ‘Normal’ Load Testing Isn’t Enough
Test the product with one-tenth the data and traffic. Be sure the activity is 'realistic'.
Then consider what will happen as traffic grows -- with the RAM, disk, cpu, network, etc, grow linearly or not?
While you are doing that, look for "hot spots". Optimize them.
Will you be using web pages? Databases? Etc. Each of these things scales differently. (In other words, you have not provided enough details in your question.)
Most canned benchmarks focus on one small aspect of computing; applying the results to a specific application is iffy.
I would start by collecting base line data on critical resources - typically, CPU, memory usage, disk usage, network usage - and track them over time. If any of those resources show regular spikes where they remain at 100% capacity for more than a fraction of a second, under current usage, you have a bottleneck somewhere. In this case, you cannot accept additional load without likely outages.
Next, I'd start figuring out what the bottleneck resource for your application is - it varies between applications, but in most cases it's the bottleneck resource that stops you from scaling further. Your CPU might be almost idle, but you're thrashing the disk I/O, for instance. That's a tricky process - load and stress testing are the way to go.
If you can resolve the bottleneck by buying better hardware, do so - it's much cheaper than rewriting the software. If you can't buy better hardware, look at load balancing. If you can't load balance, you've got to look at application architecture and implementation and see if there are ways to move the bottleneck.
It's quite typical for the bottleneck to move from one resource to the next - you've got CPU to behave, but now when you increase traffic, you're spiking disk I/O; once you resolve that, you may get another CPU challenge.
Our professor asked us to think of an embedded system design where caches cannot be used to their full advantage. I have been trying to find such a design but could not find one yet. If you know such a design, can you give a few tips?
Caches exploit the fact data (and code) exhibit locality.
So an embedded system wich does not exhibit locality, will not benefit from a cache.
Example:
An embedded system has 1MB of memory and 1kB of cache.
If this embedded system is accessing memory with short jumps it will stay long in the same 1kB area of memory, which could be successfully cached.
If this embedded system is jumping in different distant places inside this 1MB and does that frequently, then there is no locality and cache will be used badly.
Also note that depending on architecture you can have different caches for data and code, or a single one.
More specific example:
If your embedded system spends most of its time accessing the same data and (e.g.) running in a tight loop that will fit in cache, then you're using cache to a full advantage.
If your system is something like a database that will be fetching random data from any memory range, then cache can not be used to it's full advantage. (Because the application is not exhibiting locality of data/code.)
Another, but weird example
Sometimes if you are building safety-critical or mission-critical system, you will want your system to be highly predictable. Caches makes your code execution being very unpredictable, because you can't predict if a certain memory is cached or not, thus you don't know how long it will take to access this memory. Thus if you disable cache it allows you to judge you program's performance more precisely and calculate worst-case execution time. That is why it is common to disable cache in such systems.
I do not know what you background is but I suggest to read about what the "volatile" keyword does in the c language.
Think about how a cache works. For example if you want to defeat a cache, depending on the cache, you might try having your often accessed data at 0x10000000, 0x20000000, 0x30000000, 0x40000000, etc. It takes very little data at each location to cause cache thrashing and a significant performance loss.
Another one is that caches generally pull in a "cache line" A single instruction fetch may cause 8 or 16 or more bytes or words to be read. Any situation where on average you use a small percentage of the cache line before it is evicted to bring in another cache line, will make your performance with the cache on go down.
In general you have to first understand your cache, then come up with ways to defeat the performance gain, then think about any real world situations that would cause that. Not all caches are created equal so there is no one good or bad habit or attack that will work for all caches. Same goes for the same cache with different memories behind it or a different processor or memory interface or memory cycles in front of it. You also need to think of the system as a whole.
EDIT:
Perhaps I answered the wrong question. not...full advantage. that is a much simpler question. In what situations does the embedded application have to touch memory beyond the cache (after the initial fill)? Going to main memory wipes out the word full in "full advantage". IMO.
Caching does not offer an advantage, and is actually a hindrance, in controlling memory-mapped peripherals. Things like coprocessors, motor controllers, and UARTs often appear as just another memory location in the processor's address space. Instead of simply storing a value, those locations can cause something to happen in the real world when written to or read from.
Cache causes problems for these devices because when software writes to them, the peripheral doesn't immediately see the write. If the cache line never gets flushed, the peripheral may never actually receive a command even after the CPU has sent hundreds of them. If writing 0xf0 to 0x5432 was supposed to cause the #3 spark plug to fire, or the right aileron to tilt down 2 degrees, then the cache will delay or stop that signal and cause the system to fail.
Similarly, the cache can prevent the CPU from getting fresh data from sensors. The CPU reads repeatedly from the address, and cache keeps sending back the value that was there the first time. On the other side of the cache, the sensor waits patiently for a query that will never come, while the software on the CPU frantically adjusts controls that do nothing to correct gauge readings that never change.
In addition to almost complete answer by Halst, I would like to mention one additional case where caches may be far from being an advantage. If you have multiple-core SoC where all cores, of course, have own cache(s) and depending on how program code utilizes these cores - caches can be very ineffective. This may happen if ,for example, due to incorrect design or program specific (e.g. multi-core communication) some data block in RAM is concurrently used by 2 or more cores.
I'm developing a client/server application where the server holds large pieces of data such as big images or video files which are requested by the client and I need to create an in-memory client caching system to hold a few of those large data to speed up the process. Just to be clear, each individual image or video is not that big but the overall size of all of them can be really big.
But I'm faced with the "how much data should I cache" problem and was wondering if there are some kind of golden rules on Windows about what strategy I should adopt. The caching is done on the client, I do not need caching on the server.
Should I stay under x% of global memory usage at all time ? And how much would that be ? What will happen if another program is launched and takes up a lot of memory, should I empty the cache ?
Should I request how much free memory is available prior to caching and use a fixed percentage of that memory for my needs ?
I hope I do not have to go there but should I ask the user how much memory he is willing to allocate to my application ? If so, how can I calculate the default value for that property and for those who will never use that setting ?
Rather than create your own caching algorithms why don't you write the data to a file with the FILE_ATTRIBUTE_TEMPORARY attribute and make use of the client machine's own cache.
Although this approach appears to imply that you use a file, if there is memory available in the system then the file will never leave the cache and will remain in memory the whole time.
Some advantages:
You don't need to write any code.
The system cache takes account of all the other processes running. It would not be practical for you to take that on yourself.
On 64 bit Windows the system can use all the memory available to it for the cache. In a 32 bit Delphi process you are limited to the 32 bit address space.
Even if your cache is full and your files to get flushed to disk, local disk access is much faster than querying the database and then transmitting the files over the network.
It depends on what other software runs on the server. I would make it possible to configure it manually at first. Develop a system that can use a specific amount of memory. If you can, build it so that you can change that value while it is running.
If you got those possibilities, you can try some tweaking to see what works best. I don't know any golden rules, but I'd figure you should be able to set a percentage of total memory or total available memory with a specific minimum amount of memory to be free for the system at all times. If you save a miminum of say 500 MB for the server OS, you can use the rest, or 90% of the rest for your cache. But those numbers depend on the version of the OS and the other applications running on the server.
I think it's best to make the numbers configurable from the outside and create a management tool that lets you set the values manually first. Then, if you found out what works best, you can deduct formulas to calculate those values, and integrate them in your management tool. This tool should not be an integral part of the cache program itself (which will probably be a service without GUI anyway).
Questions:
One image can be requested by multiple clients? Or, one image can be requested by multiple times in a short interval?
How short is the interval?
The speed of the network is really high? Higher than the speed of the hard drive?? If you have a normal network, then the harddrive will be able to read the files from disk and deliver them over network in real time. Especially that Windows is already doing some good caching so the most recent files are already in cache.
The main purpose of the computer that is running the server app is to run the server? Or is just a normal computer used also for other tasks? In other words is it a dedicated server or a normal workstation/desktop?
but should I ask the user how much
memory he is willing to allocate to my
application ?
I would definitively go there!!!
If the user thinks that the server application is not a important application it will probably give it low priority (low cache). Else, it it thinks it is the most important running app, it will allow the app to allocate all RAM it needs in detriment of other less important applications.
Just deliver the application with that setting set by default to a acceptable value (which will be something like x% of the total amount of RAM). I will use like 70% of total RAM if the main purpose of the computer to hold this server application and about 40-50% if its purpose is 'general use' computer.
A server application usually needs resources set aside for its own use by its administrator. I would not care about others application behaviour, I would care about being a "polite" application, thereby it should allow memory cache size and so on to be configurable by the administator, which is the only one who knows how to configure his systems properly (usually...)
Defaults values should anyway take into consideration how much memory is available overall, especially on 32 bit systems with less than 4GB of memory (as long as Delphi delivers only 32 bit apps), to leave something free to the operating systems and avoids too frequent swapping. Asking the user to select it at setup is also advisable.
If the application is the only one running on a server, a value between 40 to 75% of available memory could be ok (depending on how much memory is needed beyond the cache), but again, ask the user because it's almost impossible to know what other applications running may need. You can also have a min cache size and a max cache size, start by allocating the lower value, and then grow it when and if needed, and shrink it if necessary.
On a 32 bit system this is a kind of memory usage that could benefit from using PAE/AWE to access more than 3GB of memory.
Update: you can also perform a monitoring of cache hits/misses and calculate which cache size would fit the user needs best (it could be too small but too large as well), and the advise the user about that.
To be honest, the questions you ask would not be my main concern. I would be more concerned with how effective my cache would be. If your files are really that big, how many can you hold in the cache? And if your client server app has many users, what are the chances that your cache will actually cache something someone else will use?
It might be worth doing an analysis before you burn too much time on the fine details.
I want to cache data on the client. What is the best algorithm/data structure that can be employed?
Case 1. The data to be stored requires extremely fast string searching capability.
Case 2. The cached data set can be large. I don't want to explode the client's memory usage and also I don't want to make a network and disk access calls which slows down my processing time on the client side
Solutions:
Case 1: I think suffix tree/Tries provides you with a good solution in this case.
Case 2: The two problems to consider here are:
To store large data with minimum memory consumption
Not to make any network calls to access any data which is not available in the cache.
LRU caching model is one solution I can think of but that does not prevent me from bloating the memory.
Is there any way to write down to a file and access without compromising the data (security aspect)?
Let me know if any point is not clear.
EDIT:
Josh, I know my requirements are non-realistic. To narrow down my requirement, I am looking for something which stores using LRU algorithm. It will be good if we can have dynamic size configuration for this LRU with a maximum limit to it. This will reduce the number of calls going to the network/database and provide a good performance as well.
If this LRU algorithm works on a compressed data which can be interpreted with a slight overhead (but less than a network call), it will be much better.
Check out all the available caching frameworks/libraries - I've found Ehcache to be very useful. You can also have it keep just some (most recent) in memory and failover to disk at a specified memory usage. The disk calls will still be a lot faster then network calls and you avoid taking all the memory.
Ehcache
Unfortunately, I think your expectations are unrealistic.
Keeping memory usage small, but also not making disk access calls means that you have nowhere to store the data.
Furthermore, to answer your question about security, there is no client side data storage (assuming you are talking about a web-application) that is "secure". You could encrypt it, but this will destroy your speed requirements as well as require server-side processing. Everything stored at and sent from the client is suspect.
Perhaps if you could describe the problem in greater detail we can suggest some realistic solutions.