Slow download speed from Azure Storage - performance

While testing our storage virtualization solution for a client we found an interesting situation:
When an object is downloaded directly from Azure Blob Storage to a machine outside of Azure, the speed (tested with wget) is around 0.2 to 0.5 MB/s.
When the same object is downloaded to a VM inside Azure, the speed is around 60 MB/s.
It may seem like a network issue, but...
When the same object is downloaded via our proxy, which also runs in the same Azure region and actually fetches the same object from Azure Blob Storage without any caching (it converts to S3, which is irrelevant here), the speed is 30 MB/s and is basically limited by the client's network.
I tried different regions and the results are similar.
Does Azure somehow throttle traffic going from Blob Storage to destinations outside of Azure?

Your proxies/VMs, or whatever else runs in the same datacenter, make network requests that never leave the local network, so the speed depends only on the internal infrastructure (routers, firewalls, cables, etc.).
I'm sure they won't limit speed within their own infrastructure, so services hosted on the same network work at full speed.
When you download from outside the datacenter, it depends on the outside architecture, and your speed then depends on more factors, including your own internet download speed. Maybe they also limit upload speed on their side, but 0.5 MB/s would be very low.
EDIT: even between different regions, you benefit from 2 VMs connected through optic backbones, so high speed is normal. They may also consider the traffic "inside" Azure even when it's not the same datacenter, and so not limit download speed.
Did you try to download from a fast internet connection outside of the datacenter? For example, setting up a VM on DigitalOcean and trying a wget from that machine?
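To make such a comparison concrete, here is a minimal Python sketch that times a download and reports sustained throughput; the blob URL is a placeholder you would replace with your own object:

    import time
    import urllib.request

    # Placeholder URL; substitute your own Azure Blob Storage object.
    URL = "https://myaccount.blob.core.windows.net/container/bigfile.bin"
    CHUNK = 1024 * 1024  # read 1 MB at a time

    start = time.time()
    total = 0
    with urllib.request.urlopen(URL) as resp:
        while True:
            data = resp.read(CHUNK)
            if not data:
                break
            total += len(data)
    elapsed = time.time() - start
    print("%.1f MB in %.1f s -> %.2f MB/s"
          % (total / 1e6, elapsed, total / 1e6 / elapsed))

Running the same script from an Azure VM, from the proxy's host, and from a machine outside Azure would show whether the slowdown is tied to the egress path.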

Related

Where can I host many images for free with a constant URL?

I am looking for a hosting service to host several images. This hosting service must satisfy the following 2 criteria:
1) Be free
2) Constant prefix URL, e.g.:
https://www.hostfiles.com/img/img1.png
https://www.hostfiles.com/img/img2.png
https://www.hostfiles.com/img/img3.png
...
Not indispensable, but preferably:
3) Upload capacity: at least 1 GB/month
4) Transfer capacity: at least 8 GB/month
I have been using Google's Firebase (which meets these criteria) but would like to test other alternatives.
Some Alternatives to Firebase (as of June 2018):
Amazon Web Services: https://aws.amazon.com/mobile/
AWS offers a free tier with 5GB file storage and 50GB data transfer. This is the solution that will likely suit you best.
More details: https://aws.amazon.com/mobile/pricing/
Other Solutions:
Backendless: https://backendless.com
Solution offers 1GB file storage, and 60 API calls per minute. The exact amount of data transfer is not specified.
Google Drive: https://drive.google.com
The solution offers 15GB file storage; the data transfer limit is not specified, but the threshold should be rather large. Note: Google Drive is not a solution that can be used for static website hosting; publishing images, on the other hand, seems doable.
Tutorial: https://www.labnol.org/internet/host-website-on-google-drive/28178/
Kinvey: https://www.progress.com/kinvey/
Solution does not specify the limits in the free tier.
Byet Host: https://byet.host
I have read that you get 1GB file storage and 50GB data transfer; however, I am unable to verify this.
There are multiple services; for example, Stack Overflow uses imgur.com. Depending on your use it may be free or not. To learn more about the limits, check: https://help.imgur.com/hc/en-us/articles/115000083326-What-files-can-I-upload-What-is-the-size-limit-
On top of whichever service you choose, or even if you go with your own homemade solution, you can always benefit from a CDN, for example Cloudflare and its Mirage image optimization.

Hosting Plastic SCM on Amazon?

I'm looking to set up Plastic SCM on a hosted server and am considering an Amazon EC2 instance for this. Any recommendations would be appreciated:
Minimum server specs for good performance
Tips on setup/config
Windows v. Linux
MySQL v. SQL Server v. SQL Express
Thanks!
We have extensively tested Plastic on EC2; in fact, it is one of the main environments where we run Plastic SCM tests.
It all depends on the load that the server needs to handle.
Tiny server for occasional pushing and pulling
For instance, the demo server we use for the evaluation guide runs on a tiny EC2 instance with Linux, MySQL, and a total of 512 MB of RAM. It is good for occasional pushing and pulling, but of course not for heavy load.
Big server for extreme load
On the other hand, we use a more powerful server to run 'load tests' with 300 concurrent bot clients doing about 2000 checkins per minute on a big repository. We detail the specs here. Basically, for higher perf:
20GB RAM
2 x Intel Xeon X5570
4 cores per processor (2 threads per core) at 2.7 GHz, for 16 logical cores; Amazon server running Windows Server 2012 + SQL Server 2012
Central vs distributed development
That being said, remember that if you set up a cloud server, your biggest restriction under heavy load won't be the server itself but the network. If you plan to work in a centralized way (your workspaces connect directly to the cloud server), then the network will definitely be a consideration. Every checkin, every branch creation, every switch to a new branch means connecting to the remote server, and chances are you won't get the same network speed you get on a LAN.
The other option is to work distributed: you keep your own Plastic repositories on the developer machines and just push/pull to the central server. In that case it will work great and the requirements won't be high at all.
Specs for a 15-user team working distributed + Amazon EC2 server
If that's your case I'd go for:
Linux server + MySQL (cheaper than Windows and works great)
Make sure you install the server with the packages we provide. We include our own build of Mono that will work wonders. Remember to set up the Mono server to run with sgen (the latest Mono garbage collector).
Install MySQL (or MariaDB). Follow the instructions we provide here. Remember to configure max_allowed_packet in MySQL so it allows 10 MB packets (we use 4 MB but set it to 10; see the sketch after this list). Everything is explained in the guide.
Use "user/password" security mode. Remember to configure the permissions so only your team can access :-)
For 15 users an m1.small instance will be more than enough (1.75 GB of RAM and a little bit of CPU).
Configure SSL and remove regular TCP so that your server is always secured. Check this.
We added an option in 5.4 that can store all data encrypted, so even if the central repo on Amazon is hacked (unlikely), nobody will be able to access your data.
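As a quick sanity check of the max_allowed_packet setting mentioned above, here is a small sketch (assuming the third-party pymysql package; the host and credentials are placeholders):

    import pymysql  # assumed client library: pip install pymysql

    # Placeholder connection details for the Plastic SCM database server.
    conn = pymysql.connect(host="localhost", user="plastic", password="secret")
    with conn.cursor() as cur:
        cur.execute("SHOW VARIABLES LIKE 'max_allowed_packet'")
        name, value = cur.fetchone()
        mb = int(value) / (1024 * 1024)
        print("%s = %.0f MB" % (name, mb))
        if mb < 10:
            print("increase max_allowed_packet to at least 10 MB")
    conn.close()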
Clients (I'll assume you're using Windows):
Install both client and server (remember we install a server to handle the local replicas of the repos).
Configure it in UP (user/password) mode.
Push and pull from the remote.
Alternatively, you can also configure the SQLite backend (the one I've been using for 4 years now on Windows), which is extremely fast. By default on Windows, SQL Server Compact Edition (embedded) is installed, which is OK too.
Connect to the server using SSL.
Hope it helps :-)

Scaling Tigase XMPP server on Amazon EC2

Does anyone have experience running clustered Tigase XMPP servers on Amazon's EC2? Primarily, I wish to know about anything non-obvious that might trip me up. (For example, apparently running Ejabberd on EC2 can cause issues due to Mnesia.)
Or if you have any general advice on installing and running Tigase on Ubuntu.
Extra information:
The system I’m developing uses XMPP just to communicate (in near real-time) between a mobile app and the server(s).
The number of users will initially be small but hopefully will grow, which is why the system needs to be scalable. Presumably for just a few thousand users you wouldn't need a cc1.4xlarge EC2 instance? (Otherwise this is going to be very expensive to run!)
I plan on using a MySQL database hosted in Amazon RDS for the XMPP server database.
I also plan on creating an external XMPP component written in Python, using SleekXMPP. It will be this external component that does all the ‘work’ of the server, as the application I’m making is quite different from instant messaging. For this part I have not worked out how to connect an external XMPP component written in Python to a Tigase server. The documentation seems to suggest that components are written specifically for Tigase - and not for a general XMPP server, using XEP-0114: Jabber Component Protocol, as I expected.
With this extra information, if you can think of anything else I should know about I’d be glad to know.
Thank you :)
I have lots of experience, and I think there are loads of non-obvious problems. For example, the only instance type reliable enough to run an application like Tigase is cc1.4xlarge. Others cause problems with CPU availability, and it is just a lottery whether you are lucky enough to run your service on a server that is not busy with other people's work.
Also, you need an instance with the highest possible I/O to make sure it can cope with the network traffic. The high I/O requirement applies especially to the database instance.
Not sure if this is obvious or not, but there is a problem with hostnames on EC2: every time you start an instance, the hostname and IP address change, and a Tigase cluster is quite sensitive to hostnames. There is a way to force/change the hostname for an instance, so this might be a way around the problem.
Of course, I am talking about a cluster for millions of online users and really high traffic, 100k XMPP packets per second or more. Generally, for large installations it is way cheaper and more efficient to have dedicated servers.
Generally, Tigase runs very well on Amazon EC2, but you really need the latest SVN code, as it has lots of optimizations added especially after tests in the cloud. If you provide some more details about your service, I may have some more suggestions.
More comments:
When it comes to costs, a dedicated server is always the cheaper option for a constantly running service. Unless you plan to switch servers on and off on an hourly basis, I would recommend going for a dedicated service. Costs are lower and performance is way more predictable.
However, if you really want/need to stick to Amazon EC2, let me give you some concrete numbers. Below is a list of setups and how many online users the cluster was able to reliably handle:
5*cc1.4xlarge - 1.7M online users
1*c1.xlarge - 118k online users
2*c1.xlarge - 127k online users
2*m2.4xlarge (with 5GB RAM for Tigase) - 236k online users
2*m2.4xlarge (with 20GB RAM for Tigase) - 315k online users
5*m2.4xlarge (with 60GB RAM for Tigase) - 400k online users
5*m2.4xlarge (with 60GB RAM for Tigase) - 312k online users
5*m2.4xlarge (with 60GB RAM for Tigase) - 327k online users
5*m2.4xlarge (with 60GB RAM for Tigase) - 280k online users
A few more comments:
Why does the amount of memory matter that much? Because CPU power is very unreliable and inconsistent on all but cc1.4xlarge instances. You have 8 virtual CPUs, but if you look at the top command you often see one CPU working and the rest idle. This insufficient CPU power causes internal queues in Tigase to grow; when the CPU power comes back, Tigase can process the waiting packets. The more memory Tigase has, the more packets can be queued and the better it handles CPU deficiencies.
Why is 5*m2.4xlarge listed 4 times? Because I repeated the tests many times on different days and at different times of day. As you can see, depending on the time and date, the system could handle a different load. I guess this is because the Tigase instance shared CPU power with some other services; when they were busy, Tigase suffered from lack of CPU.
That said, I think with an installation of up to 10k online users you should be fine. However, other factors, such as roster size, matter greatly, as they affect traffic and load. Also, if you have other elements that generate significant traffic, this will put load on your system.
In any case, without some tests it is impossible to tell how your system will really behave or whether it can handle the load.
And for the last question, regarding the component:
Of course Tigase supports XEP-0114 and XEP-0225 for connecting external components, so components written in different languages should not be a problem. On the other hand, I recommend using Tigase's API for writing components. They can be deployed either as internal Tigase components or as external components, and this is transparent to the developer; you do not have to worry about it at development time. This is part of the API and framework.
Also, you can use all the goodies from the Tigase framework: scripting capabilities, monitoring, statistics, and much easier development, as you can easily deploy your code as an internal component for tests.
You really do not have to worry about any XMPP-specific stuff; you just fill in the body of the processPacket(...) method and that's it.
There should be enough online documentation for all of this on the Tigase website.
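For the SleekXMPP route mentioned in the question, a minimal XEP-0114 external component could look like the sketch below; the JID, secret, host, and port are placeholders and must match the external component configuration on the Tigase side:

    from sleekxmpp.componentxmpp import ComponentXMPP

    class EchoComponent(ComponentXMPP):
        def __init__(self, jid, secret, server, port):
            ComponentXMPP.__init__(self, jid, secret, server, port)
            self.add_event_handler("message", self.on_message)

        def on_message(self, msg):
            # Echo each chat message back through the component.
            if msg["type"] in ("chat", "normal"):
                msg.reply("Got: %s" % msg["body"]).send()

    if __name__ == "__main__":
        # Placeholder values; 5270 is a commonly used external component port.
        xmpp = EchoComponent("component.example.com", "secret",
                             "tigase.example.com", 5270)
        if xmpp.connect():
            xmpp.process(block=True)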
Also, I would suggest reading about Python's support for multi-threading and how it behaves under very high load; the global interpreter lock means a single Python process cannot run CPU-bound threads in parallel, which used to make it not so great for this.

Using AWS EC2 as an application server... such poor performance

I am planning to host a server in several countries (US, Southeast Asia, ...).
I'm testing EC2 (EBS-backed, large size) and getting horrible results.
The server just isn't fast enough: CPU, hard drive, round-trip time.
I am comparing the speed with my home Linux box (dual-core i5 CPU, 2 GB memory, SATA).
I feel my home server is about 10 times faster
(comparing compile times of heavy libraries, performing the same DB updates, and so on).
The server application is similar to a web server in what it does (little CPU usage, many DB accesses; MySQL in the EC2 root partition).
Am I missing something obvious? For example, does an EBS-backed EC2 instance take time to stabilize after booting up, or something like that?
Maybe connecting across continents (e.g., from Asia to a US-based EC2 instance) is a no-no in the AWS world?
I hope there are some explanations for why I'm getting such poor performance with a large-size EC2 instance.
I'd also like to ask if my planned usage of AWS is going to work at all, or whether I should look at services other than AWS.
If you want to monitor your EC2 instance, consider using Amazon's CloudWatch service. It can monitor your instance's resources, such as CPU utilization, disk I/O, and network traffic (memory usage requires publishing a custom metric). It's also included in the Amazon free tier.
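As an illustration (using the current boto3 SDK, which postdates this thread; the instance ID and region are placeholders), pulling an hour of CPU utilization looks roughly like this:

    from datetime import datetime, timedelta

    import boto3  # AWS SDK for Python: pip install boto3

    cw = boto3.client("cloudwatch", region_name="us-east-1")
    stats = cw.get_metric_statistics(
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
        StartTime=datetime.utcnow() - timedelta(hours=1),
        EndTime=datetime.utcnow(),
        Period=300,                 # one datapoint per 5 minutes
        Statistics=["Average"],
    )
    for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
        print(point["Timestamp"], "%.1f%%" % point["Average"])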
I know some users report that after switching from Amazon AWS to Rackspace Cloud, their applications ran faster without adding extra expense. You might consider giving Rackspace a try.

100 Mbps Dedicated Server, same download speed as a Shared Host!

I have two specs from two different hosts I am using:
(a) A dedicated server with a full-duplex 100 Mbit/s internet connection ($140 per month)
(b) A shared host on a server that has a 100 Mbit/s internet connection ($7 per month)
I have tested my application, which downloads from other servers and in turn lets users download from my site. I have tested this again and again, and both take the same time to download files! But the dedicated server is much faster in the final download to the client's computer.
Firstly, are there any Linux commands or tools I can use to test bandwidth properly for each server?
Secondly, why on earth do they have the same download speed from other servers?
Please shed some light on this, as I feel I've been wasting money for no reason!
Thanks all
First, you can use iperf to test your network speeds. Second, you're not paying for the speed; you're paying for the power and flexibility of having essentially your own server, configured however you want. With a shared host, your site is most likely on a machine with a hundred other sites, each competing for resources.
Also, the bottleneck is probably not on your end or on your host's end, but rather somewhere between the content you're fetching and your servers.
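If you cannot install iperf on the shared host, the same idea can be approximated with a small Python script; run the receiver on one box and the sender on the other (the port number is arbitrary):

    import socket
    import sys
    import time

    PORT = 5201             # arbitrary test port
    CHUNK = 64 * 1024
    SECONDS = 10            # how long the sender transmits

    def receiver():
        # On the server under test: python bwtest.py recv
        srv = socket.socket()
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind(("", PORT))
        srv.listen(1)
        conn, _ = srv.accept()
        total, start = 0, time.time()
        while True:
            data = conn.recv(CHUNK)
            if not data:
                break
            total += len(data)
        elapsed = time.time() - start
        print("received %.1f MB -> %.1f Mbit/s"
              % (total / 1e6, total * 8 / 1e6 / elapsed))

    def sender(host):
        # On the other machine: python bwtest.py send <host>
        sock = socket.create_connection((host, PORT))
        payload = b"\0" * CHUNK
        deadline = time.time() + SECONDS
        while time.time() < deadline:
            sock.sendall(payload)
        sock.close()

    if __name__ == "__main__":
        if sys.argv[1] == "recv":
            receiver()
        else:
            sender(sys.argv[2])

Running it in both directions separates inbound from outbound bandwidth, which matters for the asymmetry discussed below.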
If I read correctly, your shared server is just as fast as the dedicated one when fetching a file, but much slower when serving it.
I'd say that the box your shared server is on has its "out" bandwidth mostly used by the other clients' slices, while the "in" bandwidth is mostly unused, so you get almost full performance.
That sounds right, since serving files is a much more common task than fetching them.
The big difference between shared hosting and dedicated hosting is that with dedicated hosting, you're the only account using that box. With shared hosting, there could be (and most likely are) thousands of other web sites hosted on it.
If one of those sites goes wonky and takes the whole box with it, your site goes too. On a dedicated box, the only site that's going to go wonky is yours.
With dedicated, you probably also have full admin rights to the box, which you probably don't have with the shared host.
Well, one possibility is that the shared site is on a host that gets very little load from the other shared sites. If all the shared sites are just sitting there getting very few hits, then your site basically gets full use of the box, so it's no different from a dedicated box.
But if those other sites start getting traffic, your site will be impacted.
Not sure if the shared host is full duplex or not, but that doesn't always make a difference (not an expert there).
Perhaps the servers they are downloading from are the bottleneck? You could have a dedicated gigabit pipe, but it won't help if you can only get 10 Mbit/s from the other servers.
Remember, the benefit of a dedicated host is that your performance will not be affected by other processes on the machine. The extra money guarantees you that 100 Mbit, not that you'll see better performance than a shared machine at any given time.
