Why is infrequently-accessed Azure blob storage slow? - performance

My Azure cloud service reads and writes to blobs using the .Net storage library (1.7). The blobs are in the same data centre as the service. In my first container, operations are fast (order of 10ms). In my second container they are very slow (typically about 2s or 14s, not much in between). Both are transferring the data using CloudBlob.DownloadToStream() into a MemoryStream. File sizes are typically less than 100kB.
Now I admit I haven't set up a proper test to be able to demonstrate all the above - I'm just going by my log files, so there could be some subtle difference in the way I am accessing the blobs. Apologies if this turns out to be the case.
Anyway, the only relevant difference between these two containers seems to be:
The fast container is accessed frequently (tens of thousands of requests per day), and the slow container quite infrequently (perhaps 200 requests per day).
The fast container typically stores items that are fetched soon afterwards. The slow container is often loading things that might have been stored days ago.
Question: What factors affect blob performance for infrequently-accessed blobs? What can I do to make it faster?
(I don't know how Azure blob storage is implemented, but based on the above I'm going to guess that the data is saved into a storage array and accessed via a dynamically scaling collection of VMs, each of which implements in-memory caching of blobs. Thus the ~14s delay occurs when Azure finds it needs to spin up the VMs. The ~2s delay occurs when a VM is available, but it needs to hunt down the data on a physical disk (seems rather slow), and the 10ms delay occurs when the item is stored in an in-memory cache, or something like that.)

Windows Azure Storage is not architected how you are describing (with an expanding number of cache VMs), so there would be no impact of some data being cached and other data not being cached on the Azure Storage server side. See Windows Azure Storage Architecture Overview for a good overview, or SOSP Paper - Windows Azure Storage: A Highly Available Cloud Storage Service with Strong Consistency for a more in depth look.
To determine why your blob requests are slower, the first thing to do would be to determine if the slow performance is server side or client side. Fortunately Azure Storage makes this easy via the Storage Analytics (Windows Azure Storage Logging: Using Logs to Track Storage Requests) - just compare the End To End latency and the Server Latency. I suspect you will see one of two things:
Low E2E and Low Server. This would indicate that either the request is getting delayed being sent from the client (ie. not enough worker threads), or your logging is providing incorrect data.
High E2E and Low Server. This would indicate a problem on the client side in processing the request (not enough worker threads to process the Response, slow processing of the memory stream, etc).

Related

Azure Redis Memory Usage (w3wp.exe dump) Insight

Can anyone tell me if the type of behavior outlined in the memory dump from Visual Studio
Is normal? for instance does the StackExchange.Redis.PhysicalConnection run that high on inclusive size (bytes)? Or is that really high?
Basically we are experiencing slowness with our web head after converting our code to run on Azure Redis from Session (we are now serializing and deserializing as needed and storing in Redis cache) but overall performance is horrible.
The requests complete but it can take a while, is that due to the single threaded nature of Redis? We are using the configuration outlined as best practice by the Azure Redis team as outlined here https://stackoverflow.com/a/28821220
What else can we look at to help increase the performance as the current performance is not acceptable as a viable replacement for our session based implementation (asp.net webforms/sql server/azure IaaS) we currently have.
PS - Serialization and Deserialization does cause a hit, we understand that IIS spoiled us with its own special memory pool for non-serialized datasets and such, but there is no way that it should cause a 300-500% increase in page loads like it is now for us.
Thoughts appreciated!
#Tim Wieman
How large are your cached objects?
They can range in size, there are some datasets stored in redis.
What type of objects are they?
Most objects are custom objects w/variable number of properties, some even contain collections.
What serializer are you using?
We are using Newtonsoft for anything that doesn't require Rowstate and the required binary serializer for the datasets that do need rowstate.
All serialization, and subsequent deserialization, is done in code before call redis databases StringGet or StringSet.
If appears the memory was in fact extremely high, we were erroneously creating thousands of connections to Redis instead of a singleton instance of the Redis Cache.
The multiple connections were not getting cleaned up by the GC before the CPU would get to 98% and the server would become unresponsive.
We adjusted our code to ensure a single instance of the connection to Azure Redis is used for all Redis calls and have tested thoroughly.
It appears to be resolves as Azure Redis is no longer eating up memory or CPU resources.

Monitoring I/O requests

One of my Railo web applications generates too many I/O requests.
Since it's hosted on an Amazon Ec2 instance, that directly affects my billing badly, because of EBS disk activity (hundreds of milions of operations).
How can I monitor I/O requests? The perfect tool would allow me to find which template/component makes intensive I/O.
I'm already using FusionReactor and that's great for profiling memory spaces and so on, but it doesn't have anything for I/O.
so you could start out by using the operating system monitoring tools to see if you have mainly reads or writes, next step is looking at memory issues despite it being an disk IO issue, maybe your servers are low on memory and thrashing the drives as they are swapping pages in and out of memory.
if you have not done so turn on the template cache this will stop railo checking the file system on every page request (provided you have the memory).
if you have plenty of memory (both for your OS and for the JVM) and you have template caching on start looking for your busy pages in fusion reactor, check for cffile, cfdirectory and other tags in these pages.... good luck.
also use of queries of queries is often a culprit in high disk io as internally a database is used which runs pages to disk on large resultsets if I remeber correctly.

WebApi - Redis cache vs Output cache

I have been studying about Redis (no experience at all - just studied theory), and after doing some research, found out that its also being used as cache. e.g. StackOverfolow it self.
My question is, if I have an asp.net WebApi service, and I use output caching at the WebApi level to cache responses, I am basically storing kind of key/value (request/response) in server's memory to deliver cached responses.
Now as redis is an in memory database, how will it help me to substitute WebApi's output caching with redis cache?
Is there any advantage?
I tried to go through this answer redis-cache-vs-using-memory-directyly, but I guess I didn't got the key line in the answer:
"Basically, if you need your application to scale on several nodes sharing the same data, then something like Redis (or any other remote key/value store) will be required."
I am basically storing kind of key/value (request/response) in server's memory to deliver cached responses.
This means that after a server restart, the server will have to rebuild the cache . That won't be the case with Redis. So one advantage of Redis over a homemade in-memory solution is persistence (only if that's an issue for you and that you did not planned to write persistence yourself).
Then instead of coding your own expiring mechanism, you can use Redis EXPIRE or command EXPIREAT or even simply specifying the expire timestamp when putting the api output string into cache with SETEX.
if you need your application to scale on several nodes sharing the same data
What it means is that if you have multiple instances of the same api servers, putting the cache into redis will allow these servers to share the same cache, thus reducing, for instance, memory consumption (1 cache instead of 3 in-memory cache), and so on...

Best EC2 setup for redis server [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 9 years ago.
Improve this question
We are deploying a large scale web application that uses only redis as a data store. I notice the the benchmark of our redis master is around 8000 transactions per second on EC2, far less than the stated benchmarks on dedicated hardware.
I understand that there is a performance penalty for running Redis on a virtual machine like EC2, but I would love some pointers from people who have deployed Redis in production environments on EC2 on what EC2 setup you have found most effective for getting more out of redis.
Thanks.
EC2 is probably not the best environment to run Redis on virtualized hardware, but it is a popular one, and there are a number of points to know to get the best from Redis on this platform.
I'm one of the authors of http://redis.io/topics/benchmarks and http://redis.io/topics/latency which cover most of the topics I present below. This is just a summary of the main points.
Virtualization toll
It is not specific to EC2, but Redis is significantly slower when running on a VM (in term of maximum supported throughput). This is due to the fact for basic operations, Redis does not add much overhead to the epoll/read/write system calls required to handle client connections (like memcached, or other efficient key/value stores). System calls are typically more expensive on a VM, and they represent a significant part of Redis activity (especially in benchmarks). In that conditions, a 50% decrease in term of maximum throughput compared to bare metal is not uncommon.
Of course, it also depends on the quality of the hypervisor. For EC2, Xen is used.
Benchmarking in good conditions
Benchmarking can be tricky, especially on a platform like EC2. One point often forgotten is to ensure a proper configuration for both the benchmark client and server. For instance, do not run redis-benchmark on a CPU starved micro-instance (which will likely be throttled down by Amazon) while targeting your Redis server. Both machines are equally important to get a good maximum throughput.
Actually, to evaluate Redis performance, you need to:
run redis-benchmark locally (on the same machine than the server), assuming you have more than one vCPU core.
run redis-benchmark remotely (from a different VM), on a machine whose QoS configuration is equivalent to the server machine
So you can evaluate and compare performance of the machines and the network.
On EC2, you will have the best results with second generation M3 instances (or high-memory, or cluster compute instances) so you can benefit of HVM (hardware virtualization) instead of relying on slower para-virtualization.
The fork issue
This is not specific to EC2, but to Xen: forking a large process can be really slow on Xen (it looks better with kvm). For Redis this is a big problem if you plan to use persistence: both persistence options (RDB or AOF) require the main thread to fork and launch background save or rewrite processes.
In some cases, fork latency can freeze Redis event loop for several seconds. The more memory managed by the Redis instance, the more latency.
On EC2, be sure to use a HVM enabled instance (M3, high-memory, cluster), it will mitigate the issue.
Then, if you have large memory requirements, and your application can tolerate it, consider running several smaller Redis instances on the same machine, and shard your data. It can decrease the latency due to fork operations to an acceptable level.
Persistence configuration
This is a key point to get good performance from Redis (both on VM and bare metal). So please take the time to carefully read http://redis.io/topics/persistence
If you use RDB, keep in mind the memory copy-on-write mechanism will start duplicating pages once the save background process has been forked off. So you need to ensure there is enough memory for Redis itself, plus some margin to cope with the COW. the amount of extra memory depends on your workload. The more you write in the instance, the more extra memory you need.
Please note writing a file may also consume some memory (because of the filesystem cache), so during a Redis background save, you need to account for Redis memory, COW overhead, and size of the dump file.
The machine running the Redis server must never swap. If it does, the result will be catastrophic. Contrary to some other stores, Redis is not virtual memory friendly.
With Linux, be sure to set sensible system parameters: vm.overcommit_memory=1 and vm.swappiness=0 (or a very low value anyway). Do not use old kernel versions: they are quite bad at enforcing a low swappiness (resulting in swapping when a large file is written).
If you use AOF, review the fsync options. It is a tradeoff between raw performance and durability of the write operations. You need to make a choice and define a strategy.
You also need to get familiar with the EC2 storage options. On some VM, you have the choice between ephemeral storage and EBS. On some others, you only have EBS.
Ephemeral storage is generally faster, and you will probably get less issues than with EBS, but you can easily loose your data in case of disk failure or reboot of the host, etc ... You can imagine putting RDB snapshots on ephemeral storage, and then copying the resulting files to EBS directories, as a tradeoff between performance and robustness.
EBS is remote storage: it may eat the standard network bandwidth allocated to the VM, and impact the maximum throughput of Redis. If you plan to use EBS, consider selecting the "EBS-optimized" option to establish a QoS between the standard network and storage links.
Finally, a very common setup for performance demanding instances with EC2 is to deactivate persistence on the master, and only activate it on a slave instance. It is probably less safe for the data, but it may prevent a lot of potential latency issues on the master.

how does windows azure platform scale for my app?

Just a question about Azure.
Yes, I know roughly about Azure and cloud computing. I will put it in this way:
say, in normal way, I build a program listening to a TCP port. I run this server program in a server. I also build a client program, which connects to the server through specified port. Once a client is connected, my server program will compute some thing and return to the client.
Above is the normal model, or say my program's model.
Now I want to use Azure. I want to use because my clients are too many, let's say 1 million a day. I don't want to rent 1000 servers and maintain them. ( just a assumption for the number of clients)
I have looked at the Azure pricing plan. It say about CPU and talks about small, median, large instances.
I don't know what they mean. for e.g., in my above assumed case, how many instances do I need? or at most I can get from azure for extra large (8 small instances?)
How does Azure scale for my program? If I choose small instance (my server program is very little, just compute some data and return to clients), will Azure scale for me? or Azure just gives me one virture server and let it overload?
Please consider the CPU only, not storage or network traffic.
You choose two things: what size of VM to run (small, medium, large) and how many of those VMs to run. That means you could choose a small VM (single processor) and run 100 "instances" of it (100 VMs), or you could choose a large VM (eight processors on the same server) and run 10 instances of it (10 VMs).
Today, Windows Azure doesn't automatically adjust your scale, so it's up to you to use the web portal or the Service Management API to increase the number of instances as your need increases.
One factor to consider is if your app can take advantage of multi-core environments - multi-thread, shared memory, etc. to improve its scale. If it can, it may be better to use 5 2x core (i.e. medium) VMs than 10 1x core (small) VMs. You may find in some cases that 2 4x core VMs perform better than 5 2core.
If your app is not parallel/multi-core, then you could just do some 'x' number of small VMs. The charges are linear anyway - i.e. a 2core VM is twice the cost of a single core.
Other factors would include the scratch disk size & memory available in the VM.
One other suggestion - you may want to look into leveraging the Azure queues (i.e. have the client post to the queue and the workers pull from there). This would allow you to transparently (to the client) increase/decrease the workers w/out worrying about connections, etc. Also, if a processing step failed and crashed your instance the message would persist and be picked up by one of the others.
I suggest you also monitor, evaluate, and perfect the results of your Azure configuration.
For "Monitoring Applications in Windows Azure" (and performance) please reference
http://channel9.msdn.com/learn/courses/Azure/Deployment/DeployingApplicationsinWindowsAzure/Exercise-3-Monitoring-Applications-in-Windows-Azure/
There is also a good blog entry titled "Visualizing Windows Azure diagnostic data"
Check out http://www.paraleap.com - simple service for automatically adjusting number of instances that you have according to demand.

Resources