How does the Windows Azure platform scale for my app?

Just a question about Azure.
I know roughly about Azure and cloud computing. Let me put it this way:
Say that, in the normal way, I build a server program that listens on a TCP port and run it on a server. I also build a client program, which connects to the server through the specified port. Once a client is connected, my server program computes something and returns the result to the client.
That is the normal model, or say, my program's model.
Now I want to use Azure because my clients are too many, let's say 1 million a day, and I don't want to rent 1,000 servers and maintain them. (The number of clients is just an assumption.)
I have looked at the Azure pricing plan. It talks about CPU and about small, medium, and large instances.
I don't know what they mean. For example, in the case assumed above, how many instances do I need? And what is the most I can get from Azure, an extra large instance (the equivalent of 8 small instances)?
How does Azure scale my program? If I choose a small instance (my server program is very small; it just computes some data and returns it to clients), will Azure scale for me, or will Azure just give me one virtual server and let it overload?
Please consider the CPU only, not storage or network traffic.

You choose two things: what size of VM to run (small, medium, large) and how many of those VMs to run. That means you could choose a small VM (single processor) and run 100 "instances" of it (100 VMs), or you could choose a large VM (eight processors on the same server) and run 10 instances of it (10 VMs).
Today, Windows Azure doesn't automatically adjust your scale, so it's up to you to use the web portal or the Service Management API to increase the number of instances as your need increases.
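For illustration, here is a minimal sketch of that second option, the (classic) Service Management API's "Change Deployment Configuration" operation, written for Node.js/TypeScript. The subscription ID, service name, and certificate paths are placeholders, and it assumes you have already edited the <Instances count="..."/> value in your .cscfg:

```typescript
import { readFileSync } from "node:fs";
import * as https from "node:https";

// Placeholders - substitute your own values.
const subscriptionId = "<subscription-id>";
const serviceName = "<hosted-service-name>";

// The updated .cscfg (with the new <Instances count="..."/>), base64-encoded.
const cscfg = readFileSync("ServiceConfiguration.cscfg").toString("base64");
const body =
  `<ChangeConfiguration xmlns="http://schemas.microsoft.com/windowsazure">` +
  `<Configuration>${cscfg}</Configuration></ChangeConfiguration>`;

const req = https.request(
  {
    host: "management.core.windows.net",
    path: `/${subscriptionId}/services/hostedservices/${serviceName}/deploymentslots/production/?comp=config`,
    method: "POST",
    // The management API authenticates with the certificate you upload to the portal.
    key: readFileSync("management-cert.key"),
    cert: readFileSync("management-cert.pem"),
    headers: { "x-ms-version": "2010-10-28", "Content-Type": "application/xml" },
  },
  (res) => console.log("status:", res.statusCode)
);
req.end(body);
```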

One factor to consider is whether your app can take advantage of multi-core environments (multi-threading, shared memory, etc.) to improve its scale. If it can, it may be better to use 5 two-core (i.e. medium) VMs than 10 single-core (small) VMs. You may find in some cases that 2 four-core VMs perform better than 5 two-core ones.
If your app is not parallel/multi-core, then you could just use some number x of small VMs. The charges are linear anyway, i.e. a two-core VM is twice the cost of a single-core one.
Other factors would include the scratch disk size & memory available in the VM.
One other suggestion: you may want to look into leveraging Azure queues (i.e. have the client post to the queue and the workers pull from there). This would let you transparently (to the client) increase or decrease the number of workers without worrying about connections, etc. Also, if a processing step failed and crashed your instance, the message would persist and be picked up by one of the others.
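As a rough sketch of the worker side of that pattern (using the current @azure/storage-queue package for Node.js; the connection string, queue name, and handle() function are placeholders):

```typescript
import { QueueClient } from "@azure/storage-queue";

const queue = new QueueClient(
  process.env.AZURE_STORAGE_CONNECTION_STRING!, // placeholder credentials
  "work-items"                                  // placeholder queue name
);

async function handle(text: string): Promise<void> {
  // ... your computation on the message body goes here ...
}

async function workerLoop(): Promise<void> {
  for (;;) {
    const { receivedMessageItems } = await queue.receiveMessages({
      numberOfMessages: 1,
      visibilityTimeout: 60, // hidden while we work; reappears if we crash
    });
    for (const msg of receivedMessageItems) {
      await handle(msg.messageText);
      // Deleting only after success is what makes a crashed worker's
      // message reappear for one of the other workers.
      await queue.deleteMessage(msg.messageId, msg.popReceipt);
    }
  }
}

workerLoop();
```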

I suggest you also monitor and evaluate the results of your Azure configuration, and tune it accordingly.
For "Monitoring Applications in Windows Azure" (and performance), see
http://channel9.msdn.com/learn/courses/Azure/Deployment/DeployingApplicationsinWindowsAzure/Exercise-3-Monitoring-Applications-in-Windows-Azure/
There is also a good blog entry titled "Visualizing Windows Azure diagnostic data"

Check out http://www.paraleap.com - a simple service for automatically adjusting the number of instances you have according to demand.

Related

Live video streaming performance testing for 10k VU

In basic terms, I want to make sure that our livestreaming shows can handle 10,000 viewers at one time without issues, and that the following things work well:
Video Quality
Video Resolution
Video Latency
Can this be done using a local machine? I have read that a local machine cannot produce such a huge number of requests.
Do I need to purchase an additional premium platform, or can it be done using JMeter alone?
Can this be done using a local machine? We don't know; it depends on your machine's hardware specifications. I would take the following steps:
set up monitoring of the machine's resource consumption, like CPU, RAM, etc. If you don't have better alternatives, you can go for the JMeter PerfMon Plugin
make sure to follow JMeter Best Practices
start with 1 virtual user and gradually increase the load up to 10,000, watching resource consumption as you go
when any monitored resource starts exceeding a reasonable threshold, e.g. 90% of total available capacity, stop the test and check how many users were online at that stage via, for example, the Active Threads Over Time plugin
this is how many users you can simulate for this particular test from this particular machine. If it's 10,000, you're good to go with a single machine; if it's less, divide 10,000 by the number of users you were able to mimic, and that is the number of machines of that hardware specification you will need for the test (see the sketch after this list)
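That last division step, as a trivial sketch (the function name is ours):

```typescript
// Given the target concurrency and the number of virtual users one machine
// sustained before hitting the resource threshold, compute how many identical
// load generators the test needs.
function loadGeneratorsNeeded(targetUsers: number, usersPerMachine: number): number {
  return Math.ceil(targetUsers / usersPerMachine);
}

// e.g. one machine maxed out at 2,500 virtual users -> 4 machines for 10,000
console.log(loadGeneratorsNeeded(10_000, 2_500)); // 4
```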
JMeter out of the box can be run in clustered mode, so given you have machines you can use as load generators, there is no need to purchase anything else. If you don't, you can rent VMs from, for example, MS Azure or AWS EC2 or whatever your favorite cloud provider is. In this case you will need to pay for the machine/computing time according to the vendor's price list.
There are companies which offer "JMeter as a service"; normally they charge more than cloud VM vendors, but you won't need to worry about JMeter distributed configuration, results collection, etc. They include BlazeMeter, Flood.io, RedLine13, etc.

NServiceBus: is this an illusion or a best practice?

Given an NServiceBus microservice that uses MSMQ: when I deploy a few instances of that service on the same machine, am I scaling out my application? Am I improving performance, or is one instance enough? Should I instead have a more powerful machine to handle messages?
No, running multiple instances on a single machine will not make things run faster; it only makes execution less efficient.
However, it might be that a single instance isn't giving you the expected performance even though your system monitoring indicates there are plenty of resources available but unused. In that case you might want to tweak the configuration of your NServiceBus endpoint by configuring the amount of parallel message execution it allows.
On the following link you see how you can increase the concurrency:
https://docs.particular.net/nservicebus/operations/tuning
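NServiceBus itself is .NET, but as a language-neutral sketch of what such a concurrency limit does (all names here are illustrative, not NServiceBus API):

```typescript
// Process messages with at most `maxConcurrency` handlers in flight at once.
async function processWithConcurrencyLimit<T>(
  messages: AsyncIterable<T>,
  handler: (msg: T) => Promise<void>,
  maxConcurrency: number
): Promise<void> {
  const inFlight = new Set<Promise<void>>();
  for await (const msg of messages) {
    const task: Promise<void> = handler(msg).finally(() => inFlight.delete(task));
    inFlight.add(task);
    if (inFlight.size >= maxConcurrency) {
      await Promise.race(inFlight); // wait for a slot to free up
    }
  }
  await Promise.all(inFlight); // drain remaining work
}
```

Raising the limit helps exactly when the queue has a backlog while CPU, disk, and network sit mostly idle, i.e. the situation described above.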
You can further scale out by actually using multiple machines, but if all these endpoints share the same central database, your network or database server can easily become the bottleneck. If you consider deploying or scaling out your endpoints across multiple machines, make sure that any storage solutions are also scaled out so they do not become your bottleneck.
Zero downtime upgrades/deployments
The only reason to have multiple instances on the same box is, for example, when deploying a new version: you can temporarily run the current and the new version side by side to achieve zero-downtime deployments.

Using parallel-level in azcopy when transferring data from an Azure VM to an AWS VM

How does one precisely utilize --parallel-level when using azcopy on Ubuntu 14.04 to speed up download performance?
I chose a value of 100, but without any reason (just to see what happens). I can't find related documentation online.
I'm using it to transfer files from an Azure blob container to an AWS EC2 VM. It's just a t2.micro instance; however, I'm using it for testing purposes, and once I get the hang of azcopy, I'm open to using a bigger instance. I have to transfer ~50 GB of data, mostly low-res images (i.e. lots of files).
https://learn.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-linux?toc=%2fazure%2fstorage%2fblobs%2ftoc.json
Option --parallel-level specifies the number of concurrent copy operations. By default, AzCopy starts a certain number of concurrent operations to increase the data transfer throughput. The number of concurrent operations is equal to eight times the number of processors you have. If you are running AzCopy across a low-bandwidth network, you can specify a lower number for --parallel-level to avoid failure caused by resource competition.
In most cases you don't need to specify this option; only when you're running AzCopy across a low-bandwidth network should you specify a lower value for it.
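For example, here is a sketch of invoking the Linux AzCopy from Node.js with an explicit --parallel-level (the account, container, key, and destination path are placeholders, and it assumes azcopy is on the PATH):

```typescript
import { spawn } from "node:child_process";

const azcopy = spawn(
  "azcopy",
  [
    "--source", "https://myaccount.blob.core.windows.net/mycontainer", // placeholder
    "--source-key", process.env.AZURE_STORAGE_KEY ?? "",               // placeholder
    "--destination", "/data/images",                                   // placeholder
    "--recursive",
    // On a 1-vCPU t2.micro the default would be 8 (8 x processors);
    // pick a deliberate value instead of an arbitrary 100.
    "--parallel-level", "8",
  ],
  { stdio: "inherit" }
);

azcopy.on("exit", (code) => process.exit(code ?? 1));
```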

What is ECU (AWS)? What does it mean?

I did a performance test on my server (1 ECU), but my server only reached 1,000 users in testing. How many ECUs do I need for 15,000 users?
The ECU (Elastic Compute Unit) was a unit of measure designed to provide a relative measure of performance between Amazon EC2 instance types. For example, an m1.small instance had 1 ECU, an m1.large had 2 ECUs, etc.
However, it is no longer possible to summarize the power of an instance in a single number. Some instances have more RAM, some have more CPUs or more powerful CPUs, GPUs, enhanced networking and even burst capabilities.
Therefore, the ECU has slowly disappeared from AWS services and documentation. It can still be viewed as an optional column in the Amazon EC2 Launch Instance console.
The ECU is definitely not a good measure of the number of users a system can support. That number is totally dependent upon the application architecture and its system requirements. When testing the number of users a system can support, closely monitor all system components (e.g. CPU load, RAM utilization, disk queues) to identify the bottleneck. You can then try to modify the application or relieve the bottleneck to provide better application performance.

Policy for EC2 and ELB based on number of transcoding processes on each instance

I need to transcode a massive number of audio files on a series of auto-scaling instances behind an ELB. The core of the transcoding script is based on Node.js and FFMPEG. Queuing is impossible because users are not patient! I need to control the number of transcodings on each instance to avoid the 100% CPU problem.
My questions:
A- Is there any way to define a policy for the ELB to control the number of connections to each instance? If not, is there any parameter to control average CPU utilization on each instance and add a new one after a trigger level is reached? (I have found this slide, but it is not complete.) If it adds a new instance on the fly, how long does it take for the new instance to be 100% operative to serve users? (I mean, does auto scaling have long latency?)
B- Is there an alternative architecture to achieve the same transcoding solution? (I have included my current idea in this question as a drawing.) I cannot use third-party solutions like Transcoding.com; I need to have my own native solution.
C- I use node.js on each instance and show progress in the user's browser via a socket. From the browser side I regularly send ajax requests to the node.js side to get the progress information. Does this mechanism have a problem with sticky sessions?
Thank you.
If your scaling needs to take place in response to individual requests on the server (i.e. a single request would require X machines to execute in the desired timeframe), then autoscaling is probably not going to be the answer for you, as you will have a delay while the new instances become active. You will also potentially have a much higher cost to run the service in such a manner, as you could scale up and down a number of times in response to individual requests, being charged for a one-hour minimum each time an instance is started.
If, however, you are concerned with autoscaling in order to, for example, increase your fleet by 50% during peak times when you get request volume spikes (i.e. you already have many servers serving many requests, but you just need to keep latency down during peak hours by adding more instances), then autoscaling should probably work just fine for you.
There are any number of triggers you can configure to control scaling events in such a case.
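For instance, here is a sketch of a CPU-based trigger using today's AWS SDK for JavaScript v3 (which postdates this answer; the group name, region, and target value are placeholders, and the file must run as an ES module for top-level await):

```typescript
import {
  AutoScalingClient,
  PutScalingPolicyCommand,
} from "@aws-sdk/client-auto-scaling";

// Target-tracking policy: keep the group's average CPU near 70%,
// adding or removing transcoding instances as load changes.
const client = new AutoScalingClient({ region: "us-east-1" }); // placeholder region

await client.send(
  new PutScalingPolicyCommand({
    AutoScalingGroupName: "transcoder-asg", // placeholder group name
    PolicyName: "keep-cpu-near-70",
    PolicyType: "TargetTrackingScaling",
    TargetTrackingConfiguration: {
      PredefinedMetricSpecification: {
        PredefinedMetricType: "ASGAverageCPUUtilization",
      },
      TargetValue: 70.0,
    },
  })
);
```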
ELB does support session affinity ("sticky" sessions).
You will want to use an AWS SDK. Normally you'd use one of the official ones for C#, Ruby, etc. Since you're on node.js, try using this SDK on GitHub to monitor, throttle, create instance connection pools, etc.:
https://github.com/awssum/awssum
There's also AWS2JS:
https://github.com/SaltwaterC/aws2js
