Accelerated downloads with HTTP byte range headers - ruby

Has anybody got any experience of using HTTP byte ranges across multiple parallel requests to speed up downloads?
I have an app that needs to download fairly large images from a web service (1 MB+) and then send the modified files (resized and cropped) out to the browser. There are many of these images, so caching is likely to be ineffective - i.e. the cache may well be empty. In that case we are hit by fairly large latencies while waiting for the image to download, 500 ms+, which is over 60% of our app's total response time.
I am wondering if I could speed up the download of these images by using a group of parallel HTTP Range requests, e.g. each thread downloads 100kb of data and the responses are concatenated back into a full file.
Does anybody out there have any experience of this sort of thing? Would the overhead of the extra requests negate the speed increase, or might this technique actually work? The app is written in Ruby, but experiences / examples from any language would help.
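To make it concrete, something like this rough (untested) Ruby sketch is what I'm picturing - the chunk size, the thread-per-chunk approach and the URL handling are just placeholders:

```ruby
require 'net/http'
require 'uri'

# Rough sketch: fetch a file in parallel 100 KB chunks using HTTP Range
# requests, then concatenate the pieces in order.
def parallel_download(url, chunk_size: 100 * 1024)
  uri = URI(url)
  use_ssl = uri.scheme == 'https'

  # Find the total size with a HEAD request.
  total = Net::HTTP.start(uri.host, uri.port, use_ssl: use_ssl) do |http|
    http.head(uri.request_uri)['Content-Length'].to_i
  end

  ranges = (0...total).step(chunk_size).map do |first|
    [first, [first + chunk_size - 1, total - 1].min]
  end

  threads = ranges.map do |first, last|
    Thread.new do
      Net::HTTP.start(uri.host, uri.port, use_ssl: use_ssl) do |http|
        req = Net::HTTP::Get.new(uri.request_uri)
        req['Range'] = "bytes=#{first}-#{last}"
        http.request(req).body # expects a 206 Partial Content response
      end
    end
  end

  threads.map(&:value).join # reassemble the chunks in request order
end
```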
A few specifics about the setup:
There are no bandwidth or connection restrictions on the service (it's owned by my company)
It is difficult to pre-generate all the cropped and resized images, there are millions with lots of potential permutations
It is difficult to host the app on the same hardware as the image disk boxes (political!)
Thanks

I found your post by Googling to see if someone had already written a parallel analogue of wget that does this. It's definitely possible and would be helpful for very large files over a relatively high-latency link: I've gotten >10x improvements in speed with multiple parallel TCP connections.
That said, since your organization runs both the app and the web service, I'm guessing your link is high-bandwidth and low-latency, so I suspect this approach will not help you.
Since you're transferring large numbers of small files (by modern standards), I suspect you are actually getting burned by the connection setup more than by the transfer speeds. You can test this by loading a similar page full of tiny images. In your situation you may want to go serial rather than parallel: see if your HTTP client library has an option to use persistent HTTP connections, so that the three-way handshake is done only once per page or less instead of once per image.
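For example, with Ruby's Net::HTTP the connection opened by `start` is reused for every request made inside the block, so a batch of images pays the handshake only once (a sketch; the host and paths are placeholders):

```ruby
require 'net/http'

images = ['/img/1.jpg', '/img/2.jpg', '/img/3.jpg'] # placeholder paths

bodies = Net::HTTP.start('images.example.com', 80) do |http|
  # The TCP connection opened here is an HTTP/1.1 persistent connection and
  # is reused for every request made inside this block.
  images.map { |path| http.get(path).body }
end
```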
If you end up getting really fanatical about TCP latency, it's also possible to cheat, as certain major web services like to.
(My own problem involves the other end of the TCP performance spectrum, where a long round-trip time is really starting to drag on my bandwidth for multi-TB file transfers, so if you do turn up a parallel HTTP library, I'd love to hear about it. The only tool I found, called "puf", parallelizes by files rather than byte ranges. If the above doesn't help you and you really need a parallel transfer tool, likewise get in touch: I may have given up and written it by then.)

I've written the backend and services for the sort of place you're pulling images from. Every site is different so details based on what I did might not apply to what you're trying to do.
Here are my thoughts:
If you have a service agreement with the company you're pulling images from (which you should because you have a fairly high bandwidth need), then preprocess their image catalog and store the thumbnails locally, either as database blobs or as files on disk with a database containing the paths to the files.
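If it does end up being resized on your side, here is a very rough Ruby sketch of the "files on disk, keyed so you can look them up" idea - the cache directory is a placeholder and `resize_and_crop` is a stand-in for whatever image library you use (RMagick, MiniMagick, libvips, ...):

```ruby
require 'digest'
require 'fileutils'
require 'net/http'
require 'uri'

CACHE_DIR = '/var/cache/thumbnails' # placeholder path

# Stand-in for your actual image library; here it just returns the bytes.
def resize_and_crop(image_bytes, _width, _height)
  image_bytes
end

# Return the path of a locally cached thumbnail, generating it on a miss.
def cached_thumbnail(source_url, width, height)
  key  = Digest::SHA1.hexdigest("#{source_url}-#{width}x#{height}")
  path = File.join(CACHE_DIR, "#{key}.jpg")
  return path if File.exist?(path)

  original = Net::HTTP.get(URI(source_url)) # fetch the full-size image once
  FileUtils.mkdir_p(CACHE_DIR)
  File.binwrite(path, resize_and_crop(original, width, height))
  path
end
```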
Doesn't that service already have the images available as thumbnails? They're not going to send a full-sized image to someone's browser either... unless they're crazy or sadistic and their users are crazy and masochistic. We preprocessed our images into three or four different thumbnail sizes so it would have been trivial to supply what you're trying to do.
If your request is something they expect then they should have an API or at least some resources (programmers) who can help you access the images in the fastest way possible. They should actually have a dedicated host for that purpose.
As a photographer I also need to mention that there could be copyright and/or terms-of-service issues with what you're doing, so make sure you're above board by consulting a lawyer AND the site you're accessing. Don't assume everything is ok, KNOW it is. Copyright laws don't fit the general public's conception of what copyrights are, so involving a lawyer up front can be really educational, plus give you a good feeling you're on solid ground. If you've already talked with one then you know what I'm saying.

I would guess that using any P2P network would be useless, as there are more permutations than frequently used files.
Downloading a few parts of a file in parallel only gives an improvement on slow networks (slower than 4-10 Mbps).
To get any improvement from parallel downloads you need to ensure there is enough server capacity. From your current problem (waiting over 500 ms for a connection) I assume you already have a problem with your servers:
you should add/improve load-balancing,
you should think about changing your server software for something with better performance.
And again, if 500 ms is 60% of the total response time then your servers are overloaded; if you think they are not, you should look for the bottleneck in connection/server performance.

Related

Batching requests over HTTP2

Is it possible to get better throughput from our servers if we make one large HTTP request, as opposed to multiple smaller HTTP requests over HTTP/2?
As per my understanding it should not produce any significant difference in performance, since with HTTP/2 we can have multiple requests multiplexed over one TCP connection.
Yes at a network level one large request will be more efficient than multiple small requests. This is due to the overhead of making a network request.
This is also why concatenating CSS and JavaScript and spriting for images were recommended under HTTP/1.1 so the amount of data sent was the same, but the amount of requests was considerably lower. In fact due to the way compression like gzip works the amount of data was often smaller when sending large requests.
HTTP/2 was designed to make the cost of HTTP requests a lot smaller through the reuse of a single TCP connection using multiplexing. In theory this would allow us to give up concatenation and spriting. The reality has been a little less than perfect though - usually due to browser inefficiencies rather than HTTP/2's fault. The bottleneck has just moved and we need to optimise browsers for the new world. So, for now, some level of concatenation and spriting is still recommended.
Getting back to your question: yes, bundling should have similar benefits at that network level, and in fact HTTP/1.1 and HTTP/2 may even be similar in performance if you do this.
However, beyond the network level you may discover other reasons not to bundle into fewer files. If you have one large JavaScript file, for example, then the browser must wait for it all to be downloaded before it can be parsed, compiled and run. You may be better off getting smaller, more important JavaScript downloaded first. Similarly with image spriting, you may be waiting for the entire sprite file to download before a single image is displayed.
Then there are the caching implications. Changing a single line of JS or adding a single image to the sprite requires creating a whole new large file, meaning the old one cannot be used and the whole thing needs to be downloaded again in its entirety.
Plus, large files can be more complicated to implement and manage. They require a build step (maybe not a big deal, as many sites have one), and creating and managing image sprites through CSS is often more difficult.
Also, if you are using this as a reason to stick with HTTP/1.1, then you may be missing out on the other benefits of HTTP/2, including HPACK header compression and HTTP/2 push (though this is also more tricky to get right than initially thought/hoped!).
It’s a fascinating topic that I’ve spent a lot of time on, and best advice (as always!) is to understand the technology and test, test, test!

LAMP stack performance under heavy traffic loads

I know the title of my question is rather vague, so I'll try to clarify as much as I can. Please feel free to moderate this question to make it more useful for the community.
Given a standard LAMP stack with more or less default settings (a bit of tuning is allowed, client-side and server-side caching turned on), running on modern hardware (16 GB RAM, 8-core CPU, unlimited disk space, etc.), deploying a reasonably complicated CMS service (a Drupal or WordPress project, for argument's sake) - what amounts of traffic, SQL queries and user requests can I reasonably expect to accommodate before I have to start thinking about performance?
NOTE: I know that specifics will greatly depend on the details of the project, i.e. optimizing MySQL queries, indexing, minimizing filesystem hits - assuming the web developers did a professional job - I'm really looking for a very rough figure in terms of visits per day, traffic during peak visiting times, how many records before (transactional) MySQL fumbles, and so on.
I know the only way to really answer my question is to run load testing on a real project, and I'm concerned that my question may be treated as partly off-topic.
I would like to get a set of figures from people with first-hand experience, e.g. "we ran such and such a set-up and it handled at least this much load [problems started surfacing after such and such]". I'm also greatly interested in any condensed (I'm short on time atm) reading I can do to get a better understanding of the matter.
P.S. I'm meeting a client tomorrow to talk about his project, and I want to be prepared to reason about performance if his project turns out to be akin to FourSquare.
Very tricky to answer without specifics, as you have noted. If I were tasked with what you have to do, I would take each component in turn (network interface, CPU/memory, physical IO load, SMP locking etc.), get the maximum capacity available, and divide by a rough estimate of use per request.
For example, network IO. You might have 1x 1 Gb card, which might achieve maybe 100 Mbytes/sec (I tend to use 80% of the theoretical max). How big will a typical 'hit' be? Perhaps 3 kbytes on average, for HTML, images etc. That means you can achieve about 33k requests per second before you bottleneck at the physical level. These numbers are absolute maximums; depending on tools and skills you might not get anywhere near them, but nobody can exceed them.
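As a sanity check, here is that back-of-the-envelope network calculation in code (the figures are just the example numbers above):

```ruby
# Back-of-the-envelope capacity estimate for the network interface.
link_bytes_per_sec = 1_000_000_000 / 8 * 0.8 # 1 Gb/s card at ~80% => ~100 MB/s
avg_hit_bytes      = 3_000                   # ~3 KB per request (HTML, images, ...)

max_requests_per_sec = link_bytes_per_sec / avg_hit_bytes
puts format('~%dk requests/sec before the NIC is the bottleneck',
            max_requests_per_sec / 1000) # => ~33k requests/sec
```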
Repeat the above for every component, perhaps varying your numbers a little, and you will build a quick picture of what is likely to be a concern. Then consider how you can quickly get more capacity in each component: can you just throw money at it and gain more performance (e.g. use SSD drives instead of HDs)? Or will you hit a limit that cannot be moved without rearchitecting? Also take into account what resources you have available - do you have lots of skilled programmer time, DBAs, or wads of cash? If you have lots of a resource, you can tend to reduce those constraints more easily and quickly as you move along the experience curve.
Do not forget external components too, firewalls may have limits that are lower than expected for sustained traffic.
Sorry I cannot give you real numbers; our workloads use custom servers, heavy in-memory caching and other tricks, and not all of the products you list. However, I would concentrate most on IO/SQL queries and possibly network IO, as these tend to be harder limits than CPU/memory, although I'm sure others will have a different opinion.
Obviously, the question is such that does not have a "proper" answer, but I'd like to close it and give some feedback. The client meeting has taken place, performance was indeed a biggie, their hosting platform turned out to be on the Amazon cloud :)
From research I've done independently:
Memcache is a must;
MySQL (or whatever persistent storage instance you're running) is usually the first to go. Solutions include running multiple virtual instances and replicating data between them, distributing the load;
http://highscalability.com/ is a good read :)

Gauging a web browser's bandwidth

Is it possible to gauge a web browser's upload and/or download speed by monitoring normal HTTP requests? Ideally a web application would be able to tell the speed of a client without any modifications and without client-side scripting like JavaScript/Java/Flash. So even if a client was accessing the service with a library like curl it would still work. If this is possible, how? If it's not possible, why? How accurate can this method be?
(If it helps assume PHP/Apache, but really this is a platform independent question. Also being able to gauge the upload speed is more important to me.)
Overview
You're asking for what is commonly called "passive" available bandwidth (ABW) measurement along a path (versus measuring a single link's ABW). There are a number of different techniques [1] that estimate bandwidth using passive observation, or low-bandwidth "active" ABW probing techniques. However, the most common algorithms used in production services are active ABW techniques; they observe packet streams from two different end-points.
I'm most familiar with yaz, which sends packets from one side and measures variation in delay on the other side. The one-sided passive path ABW measurement techniques are considered more experimental; there aren't solid implementations of the algorithms AFAIK.
Discussion
The problem with the task you've asked for is that all non-intrusive [2] ABW measurement techniques rely on timing. Sadly, timing is a very tricky thing when working with http...
You have to deal with the reality of object caching (for instance, akamai) and http proxies (which terminate your TCP session prematurely and often spoof the web-server's IP address to the client).
You have to deal with web-hosts which may get intermittently slammed
Finally, active ABW techniques rely on a structured packet stream (wrt packet sizes and timing), unlike what you see in a standard http transfer.
Summary
In summary, unless you set up a dedicated client / server / protocol just for ABW measurement, I think you'll be rather frustrated with the results. You can keep your ABW socket connections on TCP/80, but the tools I have seen won't use http [3].
Editorial note: My original answer suggested that ABW with http was possible. On further reflection, I changed my mind.
END-NOTES:
---
1. See Sally Floyd's archive of end-to-end TCP/IP bandwidth estimation tools.
2. The most common intrusive techniques (such as speedtest.net) use a Flash or Java applet in the browser to send & receive 3-5 parallel TCP streams to each endpoint for 20-30 seconds. Add the streams' average throughput (not including lost packets requiring retransmission) over time, and you get that path's tx and rx ABW. This is obviously pretty disruptive to VoIP calls, or any downloads in progress. Disruptive measurements are called bulk transfer capacity (BTC). See RFC 3148: A Framework for Defining Empirical Bulk Transfer Capacity Metrics. BTC measurements often use HTTP, but BTC doesn't seem to be what you're after.
3. That is good, since it removes the risk of in-line caching by denying http caches an object to cache; although some tools (like yaz) are udp-only.
Due to the way TCP connections adapt to available bandwidth, no, this is not possible. Requests are small and typically fit within one or two packets. You need at least a dozen full-size packets to get even a coarse bandwidth estimate, since TCP first has to scale up to the available bandwidth ("TCP slow start"), and you need to average out jitter effects. If you want any accuracy, you're probably talking hundreds of packets. That's why upload rate measurement scripts typically transfer several megabytes of data.
OTOH, you might be able to estimate round-trip delay from the three-way handshake and the timing of acks. But download speed has at least as much impact as upload speed.
There's no support in JavaScript or any browser component to measure upload performance.
The only way I can think of is if you are uploading to a page/HTTP handler, and the page is receiving the incoming bytes, it can measure how many bytes it is receiving per second. Then store that in some application-wide dictionary keyed by a session ID.
Then from the browser you can periodically poll the server to get the value in the dictionary using the session ID and show it to the user. This way you can tell what the upload speed is.
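A rough Rack sketch of that idea (the endpoint paths and the in-memory hash are made up for illustration; in a multi-process deployment you'd need shared storage such as memcached, and whether `rack.input` is streamed or fully buffered first depends on the web server):

```ruby
require 'rack'

UPLOAD_RATES = {} # session_id => bytes per second (application-wide)

app = lambda do |env|
  req = Rack::Request.new(env)

  case [req.request_method, req.path]
  when ['POST', '/upload']
    started = Time.now
    bytes   = 0
    # Measure the incoming byte rate while the upload body is being read.
    while (chunk = env['rack.input'].read(64 * 1024))
      bytes  += chunk.bytesize
      elapsed = Time.now - started
      UPLOAD_RATES[req.params['session']] = bytes / elapsed if elapsed > 0
    end
    [200, { 'content-type' => 'text/plain' }, ['ok']]
  when ['GET', '/upload_speed']
    # The browser polls this with the same session ID to display the rate.
    [200, { 'content-type' => 'text/plain' }, [UPLOAD_RATES[req.params['session']].to_s]]
  else
    [404, { 'content-type' => 'text/plain' }, ['not found']]
  end
end

# In config.ru: run app
```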
You can use AJAXOMeter, a JavaScript library which measures your up- and download speed. You can see a live demo here.
That is not feasible in general, as in-bound and out-bound bandwidth frequently are not symmetric. Different ISPs have significantly different ratios here, which can vary even by time of day.

How is it possible to limit download speed?

Lately I've asked this question. But the answer doesn't suit my demands, and I know that file hosting providers do manage to limit the speed. So I'm wondering what's the general algorithm/method to do that (I do mean downloading technique) - in particular limiting single connection/user download speed.
#back2dos I want to give a particular user a particular download speed (corresponding to hardware capabilities, of course), or in other words give the user the ability to download some particular file at, let's say, 20 KB/s. Naturally I want the ability to change that to some other value.
You could use a token bucket (http://en.wikipedia.org/wiki/Token_bucket).
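A minimal token-bucket sketch in Ruby, just to show the shape of it (the rate and burst values are placeholders):

```ruby
# Tokens (bytes) accumulate at `rate` per second, up to `burst`; a write is
# allowed only when enough tokens are available. Assumes bytes <= burst.
class TokenBucket
  def initialize(rate:, burst:)
    @rate   = rate.to_f  # bytes added per second
    @burst  = burst.to_f # maximum bucket size in bytes
    @tokens = @burst
    @last   = Time.now
  end

  # Block until `bytes` may be sent at the configured rate.
  def consume(bytes)
    loop do
      refill
      if @tokens >= bytes
        @tokens -= bytes
        return
      end
      sleep((bytes - @tokens) / @rate)
    end
  end

  private

  def refill
    now     = Time.now
    @tokens = [@tokens + (now - @last) * @rate, @burst].min
    @last   = now
  end
end

# Usage sketch: cap one download at ~20 KB/s.
# bucket = TokenBucket.new(rate: 20 * 1024, burst: 20 * 1024)
# while (chunk = file.read(4096))
#   bucket.consume(chunk.bytesize)
#   socket.write(chunk)
# end
```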
Without mention of platform/language, it's difficult to answer, but a "leaky bucket" algorithm would probably be the best fit:
http://en.wikipedia.org/wiki/Leaky_bucket
Well, since this answer is really general, here's a very simple approach for plain TCP:
You put the resource handlers of all download connections into a list, paired with information about what data is requested, and loop through it. For each one you write a chunk of the required data to the socket, maybe about 1.5 KB, which is the most commonly used maximum segment size, as far as I know. When you're at the end of the list, you start over. Before starting over, simply wait as long as needed to hit the desired average bandwidth.
Please note, if too many clients have lower bandwidth than you allow, your TCP buffers are likely to explode. Some TCP bindings permit finding the size of the currently buffered data for one socket; if it exceeds a threshold, you can simply skip the socket.
Also, if too many clients are connected, you will actually not have enough time to write to all the sockets, so after one loop you "have to wait for a negative time". Increasing the chunk size might speed things up in such scenarios, but at some point your server will stop getting faster.
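A rough Ruby sketch of that loop (the chunk size and target rate are placeholders, and the socket bookkeeping is heavily simplified):

```ruby
CHUNK      = 1460      # roughly one TCP segment's worth of payload
TARGET_BPS = 20 * 1024 # desired average bandwidth per connection

downloads = [] # filled elsewhere with [socket, file_io] pairs

loop do
  pass_started = Time.now

  # Write one chunk per connection, dropping transfers that have finished.
  downloads.reject! do |socket, file|
    chunk = file.read(CHUNK)
    if chunk.nil?
      socket.close
      true
    else
      socket.write(chunk)
      false
    end
  end

  # One pass sent CHUNK bytes per connection, so sleep long enough that each
  # connection averages TARGET_BPS. A negative result means we can't keep up.
  sleep_for = (CHUNK.to_f / TARGET_BPS) - (Time.now - pass_started)
  sleep(sleep_for) if sleep_for > 0
end
```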
A simpler approach is to do this on the client side, but this may generate a lot of overhead. The dead simple idea is to have the client request 1 KB every 50 ms (assuming you want 20 KB/s). You can even do that over HTTP, although I strongly suggest a bigger chunk size, since HTTP has enormous overhead.
My guess is the best option is to try to find a web server capable of doing such things out of the box. I think Apache has a number of modules for all kinds of quotas.
greetz
back2dos

How to build an image web server?

I am trying to build a web image server. It serves images to lots of clients (10,000+) simultaneously. (It would be an easier problem if there were fewer clients.) What is a good way to do this, with as small a time delay as possible?
I am new to this field. Any suggestion will be welcomed.
Definitely look around for a good delivery service. Akamai is the best known.
If you really want to do it on your own, forget about Apache/IIS. Much more appropriate are 'light' web servers. Two very good ones are lighttpd and NginX (wiki). NginX in particular has really solid performance.
Edit: Content Distribution Networks (CDNs) have flourished in the last few years, and it's much easier to find simpler and cheaper ones. In particular, it's quite simple to put your static content in Amazon's S3 and use CloudFront.
If you want to design the fastest static file webserver with lowest latency, here's how I would do it.
Use an event loop to detect which sockets are ready
Put those sockets in a queue
Create a stack of threads to deal with sockets (1 for each core). When they finish, put them back on the stack.
Assign work to threads.
Cache all image files in memory.
This is essentially what IO completion ports do, minus the caching of files. This model is available on Windows and Solaris.
http://technet.microsoft.com/en-us/sysinternals/bb963891.aspx
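A much-simplified Ruby sketch of that design - IO.select as the event loop, a queue feeding one worker thread per core, and every file preloaded into memory; real HTTP parsing, error handling and cache invalidation are omitted:

```ruby
require 'socket'

IMAGE_DIR = '/var/www/images' # placeholder
WORKERS   = 8                 # roughly one thread per core

# Cache all image files in memory up front.
CACHE = Dir.glob(File.join(IMAGE_DIR, '*')).to_h do |path|
  ["/#{File.basename(path)}", File.binread(path)]
end

queue  = Queue.new
server = TCPServer.new(8080)

# Worker threads pull accepted sockets off the queue and serve them.
WORKERS.times do
  Thread.new do
    while (client = queue.pop)
      request = client.readpartial(4096) # naive: assumes the request fits
      body    = CACHE[request[/GET (\S+)/, 1]]
      if body
        client.write("HTTP/1.1 200 OK\r\nContent-Length: #{body.bytesize}\r\n\r\n")
        client.write(body)
      else
        client.write("HTTP/1.1 404 Not Found\r\nContent-Length: 0\r\n\r\n")
      end
      client.close
    end
  end
end

# Event loop: wait until the listening socket is ready, then hand new
# connections to the worker pool.
loop do
  ready, = IO.select([server])
  ready.each { queue << server.accept }
end
```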
With that many clients, you may want to look into using a content delivery network (such as Akamai). On the surface it might seem expensive, but if you really look at the cost of maintaining the hardware and particularly the cost of bandwidth, it starts to make economic sense.
How are the images to be served? Are the images generated on the fly? or are they static and stored as .jpg or other format on the file system?
Either way, I'd use ASP.NET .ashx (generic handlers) and use the System.Drawing classes.
You'll also want to set up TCP/IP Network Load Balancing per http://support.microsoft.com/kb/323431
