I plan to build a system for distributing VM images among several stations using the BitTorrent protocol. The current system looks as follows:
                                             |-[room with 20 PCs]-
[srv_with_images]-->--[1Gbps-bottleneck]-->--|
                                             |-[2nd room with 20 PCs]-
Every night all the PCs download images through the 1 Gbps bottleneck at once, which takes a long time. We plan to use BitTorrent to speed up distribution through peer-to-peer exchange between the PCs. However, there is a problem: when an image appears on the origin server, the server acts as a single seed from which all peers download the file simultaneously, so we fall into the bottleneck trap again. To speed up distribution we need (or at least we think we need) an abstract high-level algorithm that:
Ensures that at the beginning, when a new image arrives, only a small portion of the stations download the image from the origin.
Once that small portion starts seeding, lets the rest (or another, bigger portion) of the PCs start peering, or makes them peer only with the PCs in the classroom and not with the origin.
Does not rely on a "static" list of initial peers, as some computers may be offline during the day. We can't assume that any computer will always be up and running, and a peer may be turned off at any time.
Are there any specific algorithms that can help us design this? The most naive way would be to keep a list of active servers somewhere and have a daemon choose the initial peers for each torrent. But maybe there are more elegant ways to do this?
Another option would be to ensure that only some peers can download from the origin, while the rest download from each other (but not from the origin). Is that possible in the BitTorrent protocol?
If you are using BitTorrent, no special coordination is necessary.
Peers behind the bottleneck can directly talk to each other and share the bandwidth. Using the rarest-first piece picking algorithm will mostly ensure that they download different pieces from the server and then share them with each other.
LSD (Local Service Discovery) may help speed up LAN-local discovery, but a normal tracker should work too if no NAT shenanigans are in play.
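To illustrate why rarest-first spreads the load, here is a minimal sketch of the selection step. It is not taken from any real client: the function name, the peer ids and the data layout are made up for the example, and real clients add more rules (random first piece, endgame mode).

```python
import random
from collections import Counter

def pick_rarest_piece(my_pieces, peer_pieces, rng=random):
    """Return the index of a piece we lack that the fewest connected peers have.

    my_pieces: set of piece indices we already hold.
    peer_pieces: dict mapping a (hypothetical) peer id -> set of piece indices.
    Returns None when no connected peer has anything we still need.
    """
    availability = Counter()
    for pieces in peer_pieces.values():
        for index in pieces:
            if index not in my_pieces:
                availability[index] += 1
    if not availability:
        return None
    rarest_count = min(availability.values())
    candidates = sorted(i for i, c in availability.items() if c == rarest_count)
    # Random tie-breaking matters at the start: when only the origin has data,
    # every piece is equally rare, so different PCs ask for different pieces.
    return rng.choice(candidates)

peers = {"origin": {0, 1, 2, 3}, "pc1": {0, 1}, "pc2": {1}}
print(pick_rarest_piece({1}, peers))  # 2 or 3 (both are held only by the origin)
```

Once a PC holds a piece that its neighbours lack, they can fetch it over the LAN, so the 1 Gbps link mostly carries each piece once rather than once per PC.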
This question is for an indication/hunch. I realize that it may have been discussed before and that there is no good, scientific answer; nevertheless I seek experienced/qualified opinions, as there are no definite answers to be found. An indication will be valuable as a clue, hence I ask the community to allow a bit of fuzziness.
Background:
Consider a very-large-area 3D simulation
with n participants (peers, people behind NAT) distributed over multiple cities.
where each participant is seen as one "moving object" in the simulation (hence each moving object is owned by a peer).
where each peer shall see all other moving objects correctly (i.e. positional updates are needed).
(The entire simulation is larger, so we now focus on one single blob, and consider it to be the entire "world").
Scale:
World/blob size 10x10 kilometers (almost flat world).
Object size: Length max 10 meters
(We omit things like occlusion, optimisations, balancing etc. Assume that everything there is needs to be seen and updated).
The nature of "moving object":
it is physically/positionally restless (compare to a boat in big waves).
its movement must be synced to all peers (but an individual sync does not need to be simultaneous with other syncs).
if X sees one but does not own it, it will behave well (deterministically, by X's local physics calculation) for maybe 1 second, but after that it will diverge (due to different frame rates) and needs a positional update (a UDP packet) from its owner.
From a peer's point of view:
He needs to update n-1 other peers
He needs to receive updates from n-1 other peers
The positional updates are the critical ones, so focus only on those. One update is ca 20-30 doubles, ca. 200 bytes. Consider UDP only.
As I see it, there are two options. The first one is serverless, where everything works solely on peer-to-peer communication. The second one is having a server (one, for now) in the middle.
1. Serverless, p2p
Each peer must talk with many other peers. One problem is that "Nagle'ing" is useless. The first reason is that all endpoints are different; the second is that the local data changes from frame to frame, so there is no point in accumulating several frames' data to send in a larger packet less often: the oldest frames' data would be outdated. An advantage, however, is not being dependent on a server.
2. Server-supported
Each peer sends its info to a high-performance, high-bandwidth server which is better able to receive and distribute to all peers at a fast rate. Similarly, any peer would receive all peers' data from one endpoint only, the server.
Naturally, each peer runs a game loop.
Question: Hopefully based on some kind of experience, what would you throw as a maximum functional number of peers for case 1, case 2? Thx.
It is difficult to quantify, but for this level of all-to-all synchronization I would recommend centralized control.
In p2p mode each peer sends n-1 and receives n-1 packets in each pseudo-round. In centralized mode each peer still receives n-1 packets but sends only 1, so it spends less time on this task. Centralized mode therefore seems more scalable.
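The packet counts above can be turned into rough numbers. This is only a back-of-the-envelope sketch: the 200-byte update comes from the question, while the default of one update per second is my assumption (based on the "maybe 1 second" before divergence), UDP/IP header overhead is ignored, and the server is assumed to relay every update individually without batching.

```python
UPDATE_BYTES = 200  # size of one positional update, taken from the question

def per_peer_packets(n, centralized):
    """Packets one peer (sends, receives) per update round."""
    sent = 1 if centralized else n - 1
    received = n - 1
    return sent, received

def server_packets(n):
    """Packets the relay server (receives, sends) per round."""
    return n, n * (n - 1)

def upstream_bytes_per_second(n, centralized, updates_per_second=1.0):
    """Payload bytes one peer uploads per second (headers not counted)."""
    sent, _ = per_peer_packets(n, centralized)
    return sent * UPDATE_BYTES * updates_per_second

for n in (10, 100, 1000):
    print(n,
          upstream_bytes_per_second(n, centralized=False),
          upstream_bytes_per_second(n, centralized=True),
          server_packets(n))
```

With 100 peers this gives 19,800 bytes/s of upload per peer in p2p mode versus 200 bytes/s in centralized mode, but the server then has to send 9,900 packets per round, so the quadratic cost does not disappear; it moves to the machine best equipped to carry it.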
A server can check that update messages are consistent before delivering them. In p2p, each peer would have to deal with unstable or disconnected peers, which a server could manage better.
In centralized mode the update interval has to be chosen more carefully, because clients are likely to experience higher latencies: each packet has to travel to the server and then back out to the clients. Choosing the best server for the clients is one thing to consider.
Combining packets could make the information traverse the network faster, but since stale data is an issue, try to keep each packet as small as possible; transmission time is shorter that way.
I have a shared hosting plan and am designing a single page site which will include a slideshow. The browser typically limits the number of simultaneous requests to a single domain. I don't expect a lot of traffic, but I would like the traffic I do receive to have fast load times. I may be able to add unlimited subdomains, but does that really affect the speed for the customer considering they are probably the only one polling my server and all subdomains point to the same processor? I have already created two versions of every image, one for the slideshow, and one for larger format via AJAX request, but the lag times are still a little long for my taste. Any suggestions?
Before you contrive a bunch of subdomains to maximize parallel connections, you should profile your page load behavior so you know where most of the time is being spent. There might be easier and more rewarding optimizations to make first.
There are several tools that can help with this; use all of them:
https://developers.google.com/speed/pagespeed/
http://developer.yahoo.com/yslow/
http://www.webpagetest.org/
Some important factors to look at are cache optimization and image compression.
If you've done all those things, and you are sure that you want to use multiple (sub)domains, then I would recommend using a content delivery network (CDN) instead of hosting the static files (images) on the same shared server. You might consider Amazon's CloudFront service. It's super easy to set up, and reasonably priced.
Lastly, don't get carried away with too many (sub)domains, because each host name will require a separate DNS lookup; find a balance.
I am implementing my first syncing code. In my case I will have 2 types of iOS clients per user that will sync records to a server using a lastSyncTimestamp, a 64 bit integer representing the Unix epoch in milliseconds of the last sync. Records can be created on the server or the clients at any time and the records are exchanged as JSON over HTTP.
I am not worried about conflicts as there are few updates and always from the same user. However, I am wondering if there are common things that I need to be aware of that can go wrong with a timestamp-based approach, such as syncing during daylight saving time, syncs conflicting with one another, or other gotchas.
I know that git and some other version control systems eschew syncing with timestamps in favor of a content-based negotiation approach. I could imagine such an approach for my apps too, where, using the uuid or hash of the objects, both peers announce which objects they own and then exchange them until both peers have the same sets.
If anybody knows any advantages or disadvantages of content-based syncing versus timestamp-based syncing in general that would be helpful as well.
Edit - Here are some of the advantages/disadvantages that I have come up with for timestamp and content based syncing. Please challenge/correct.
Note - I am defining content-based syncing as simple negotiation of 2 sets of objects such as how 2 kids would exchange cards if you gave them each parts of a jumbled up pile of 2 identical sets of baseball cards and told them that as they look through them to announce and hand over any duplicates they found to the other until they both have identical sets.
Johnny - "I got this card."
Davey - "I got this bunch of cards. Give me that card."
Johnny - "Here is your card. Gimme that bunch of cards."
Davey - "Here are your bunch of cards."
....
Both - "We are done"
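The card exchange above boils down to a set difference in each direction. Here is a minimal in-memory sketch of that negotiation; the `sync` function and the dict-based stores are invented for the example, and in a real app each transfer would be an HTTP request. It assumes ids are unique and that an edited object gets a new id, and it does not handle deletions.

```python
def sync(store_a, store_b):
    """Exchange objects until both stores hold the same set.

    Each store is a dict mapping object id (uuid or content hash) -> object.
    Returns (sent_to_a, sent_to_b): the ids transferred in each direction.
    """
    ids_a, ids_b = set(store_a), set(store_b)
    missing_in_a = ids_b - ids_a  # "I got this bunch of cards"
    missing_in_b = ids_a - ids_b  # "Give me that card"
    for obj_id in missing_in_a:
        store_a[obj_id] = store_b[obj_id]
    for obj_id in missing_in_b:
        store_b[obj_id] = store_a[obj_id]
    return missing_in_a, missing_in_b

johnny = {"card-1": "pitcher", "card-2": "catcher"}
davey = {"card-2": "catcher", "card-3": "shortstop"}
to_johnny, to_davey = sync(johnny, davey)
print(to_johnny, to_davey)  # {'card-3'} {'card-1'}
print(johnny == davey)      # True: "We are done"
```

Running `sync` a second time transfers nothing, which is the well-defined endpoint the approach promises.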
Advantages of timestamp-based syncing
Easy to implement
Single property used for syncing.
Disadvantages of timestamp-based syncing
Time is a relative concept to the observer, and different machines' clocks can be out of sync. There are a couple of ways to solve this. Generate the timestamp on a single machine, which doesn't scale well and represents a single point of failure. Or use logical clocks such as vector clocks. For the average developer building their own system, vector clocks might be too complex to implement.
Timestamp based syncing works for client to master syncing but doesn't work as well for peer to peer syncing or where syncing can occur with 2 masters.
Single point of failure, whatever generates the timestamp.
Time is not really related to the content of what is being synced.
Advantages of content-based syncing
No per peer timestamp needs to be maintained. 2 peers can start a sync session and start syncing based on the content.
Well defined endpoint to sync - when both parties have identical sets.
Allows a peer to peer architecture, where any peer can act as client or server, providing they can host an HTTP server.
Sync works with the content of the sets, not with an abstract concept of time.
Since sync is built around content, sync can be used to do content verification if desired. E.g. a SHA-1 hash can be computed on the content and used as the uuid. It can be compared to what is sent during syncing.
Even further, SHA-1 hashes can be based on previous hashes to maintain a consistent history of content.
Disadvantages of content-based syncing
Extra properties on your objects may be needed to implement.
More logic on both sides compared to timestamp based syncing.
Slightly more chatty protocol (this could be tuned by syncing content in clusters).
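The content verification and hash-history ideas listed under the advantages could look roughly like this. It is a sketch under assumptions: records are JSON-serializable dicts, the canonical form is JSON with sorted keys, and the helper names `content_id`, `verify` and `chained_id` are made up for the example.

```python
import hashlib
import json

def content_id(record):
    """SHA-1 of a canonical JSON encoding of the record, as a hex string."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()

def verify(record, claimed_id):
    """Check that a received record really matches the id it was sent under."""
    return content_id(record) == claimed_id

def chained_id(record, previous_id):
    """Hash that also covers the previous hash, so the history is ordered."""
    payload = previous_id + content_id(record)
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()

note = {"title": "groceries", "body": "milk, eggs"}
note_id = content_id(note)
print(verify(note, note_id))                        # True
print(verify({**note, "body": "milk"}, note_id))    # False
```

Key order does not change the id because of `sort_keys=True`; both peers must agree on the exact canonical encoding, otherwise identical records would hash differently.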
Part of the problem is that time is not an absolute concept. Whether something happens before or after something else is a matter of perspective, not of compliance with a wall clock.
Read up a bit on relativity of simultaneity to understand why people have stopped trying to use wall time for figuring these things out and have moved to constructs that represent actual causality using vector clocks (or at least Lamport clocks).
If you want to use a clock for synchronization, a logical clock will likely suit you best. You will avoid all of your clock sync issues.
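For reference, a Lamport clock is only a few lines. This is a minimal sketch (the class and method names are my own): every local event increments a counter, every message carries the sender's counter, and the receiver jumps past whatever it receives.

```python
class LamportClock:
    """Logical clock: orders events by causality, not by wall time."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """Advance for a local event (e.g. a record is created or edited)."""
        self.time += 1
        return self.time

    def send(self):
        """Timestamp to attach to an outgoing message."""
        return self.tick()

    def receive(self, remote_time):
        """Merge the timestamp carried by an incoming message."""
        self.time = max(self.time, remote_time) + 1
        return self.time

phone, server = LamportClock(), LamportClock()
phone.tick()                    # local edit on the phone
stamp = phone.send()            # phone syncs to the server
print(server.receive(stamp))    # 3: later than anything the phone did before
```

The guarantee is one-directional: if event A caused event B, then A's timestamp is smaller. Two unrelated events can still get timestamps in either order, which is the gap vector clocks close.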
I don't know if it applies in your environment, but you might consider whose time is "right", the client or the server (or if it even matters)? If all clients and all servers are not sync'd to the same time source there could be the possibility, however slight, of a client getting an unexpected result when syncing to (or from) the server using the client's "now" time.
Our development organization actually ran into some issues with this several years ago. Developer machines were not all sync'd to the same time source as the server where the SCM resided (and might not have been sync'd to any time source, thus the developer machine time could drift). A developer machine could be several minutes off after a few months. I don't recall all of the issues, but it seems like the build process tried to get all files modified since a certain time (the last build). Files could have been checked in, since the last build, that had modification times (from the client) that occurred BEFORE the last build.
It could be that our SCM procedures were just not very good, or that our SCM system or build process were unduly susceptible to this problem. Even today, all of our development machines are supposed to sync time with the server that has our SCM system on it.
Again, this was several years ago and I can't recall the details, but I wanted to mention it on the chance that it is significant in your case.
You could have a look at unison. It's file-based but you might find some of the ideas interesting.
How does one determine which of the peers you are connected to has the fastest connection (upload rate)?
Does the actual connection of the peer dominate who is fastest, or will the peer who needs the most chunks upload the fastest, as fewer people are downloading from him?
I want to write an algorithm which takes all the peers in the peer list returned from the tracker and determines which peers are closer, using a ping and timing the response or some other way.
Thanks
A ping (ICMP echo request/reply) will give you the latency of a peer, but not the available bandwidth the peer has. You want the bandwidth since TCP is good at doing bandwidth*delay products and figuring out how to make a connection fast, even if it roundtrips a satellite.
What you do is connect to all of them. Having 40 peers connected is not uncommon. Then you decide which to unchoke based on their current rates towards you (until you become a seeder). It also has to be fairly dynamic, since available bandwidth changes over time. The best advice I can give is to read
http://www.bittorrent.org/bittorrentecon.pdf
which gives the general idea of how to implement the economics. But many clients do different things than the paper, so reading code is another option.
So: You want to measure bandwidth, not latency. Hence, ping is the wrong tool for the job. Measuring bandwidth is most easily done by tracking the rate at which you send packets to a peer.
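As a starting point, rate tracking can be done with a sliding window of byte counts per peer. This is a rough sketch, not how any particular client does it: the 20-second window and the 4 unchoke slots are assumed defaults, the class and function names are mine, and optimistic unchoking is left out entirely. Feed the meter with bytes received from a peer while you are downloading, or bytes sent to it once you are seeding.

```python
import time
from collections import deque

class RateMeter:
    """Average bytes per second over a sliding window."""

    def __init__(self, window=20.0):
        self.window = window
        self.samples = deque()  # (timestamp, byte_count), oldest first

    def add(self, byte_count, now=None):
        now = time.monotonic() if now is None else now
        self.samples.append((now, byte_count))
        self._expire(now)

    def rate(self, now=None):
        now = time.monotonic() if now is None else now
        self._expire(now)
        # Dividing by the full window underestimates brand-new connections.
        return sum(count for _, count in self.samples) / self.window

    def _expire(self, now):
        while self.samples and self.samples[0][0] <= now - self.window:
            self.samples.popleft()

def pick_unchoked(meters, slots=4, now=None):
    """Return the ids of the `slots` peers with the highest current rate."""
    ranked = sorted(meters, key=lambda peer: meters[peer].rate(now), reverse=True)
    return ranked[:slots]
```

Call `add()` from the socket handler whenever a block arrives, and re-run `pick_unchoked()` on a timer so the choice follows changes in available bandwidth.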
I think that the choking/unchoking algorithm and selecting peers to unchoke is one of the hardest parts to get right in a client. It is best solved with pen, paper and brain, not by sitting in front of the computer writing code.
DNS Round Robin (DRR) allows cheap load balancing (distribution is a better term). Its advantage is that it permits unlimited horizontal scaling. Its drawback is that if one of the web servers goes down, some clients continue to use the broken IP for minutes (min TTL 300s) or more, even if the DNS implements fail-over.
A Hardware Load Balancer (HLB) handles such web server failures transparently, but it cannot scale its bandwidth indefinitely. A hot spare is also needed.
A good solution seems to be DRR in front of a group of HLB pairs. An HLB pair never goes down, so DRR never leaves clients stranded. Plus, when bandwidth isn't enough you can add a new HLB pair to the group.
Problem: DRR moves clients randomly between the HLB pairs and therefore (AFAIK) session stickiness cannot work.
I could just do without session stickiness, but it makes better use of caches, so it is something I want to preserve.
Question: does an HLB implementation exist in which an instance can share its (sessionid, webserver) mapping with other instances?
If so, a client would be routed to the same web server regardless of which HLB routed the request.
Thanks in advance.
Modern load balancers have very high throughput capabilities (gigabit). So unless you're running a huuuuuuuuuuge site (e.g. google), adding bandwidth is not why you'll need a new pair of load balancers, especially since most large sites offload much of their bandwidth to CDNs (Content Delivery Networks) like Akamai. If you're pumping a gigabit of un-CDN-able data through your site and don't already have a global load-balancing strategy, you've got bigger problems than cache affinity. :-)
Instead of bandwidth limits, sites tend to add additional LB pairs for geo-distribution of servers at separate data centers to ensure users spread across the world can talk to a server closest to them.
For that latter scenario, load balancer companies offer geo-location solutions, which (at least until a few years ago, when I was following this stuff) were based on custom DNS implementations that looked at client IPs and resolved to the load balancer pair's virtual IP address that is "closest" (in network topology or performance) to the client. These days, CDNs like Akamai also offer global load balancing services (e.g. http://www.akamai.com/html/technology/products/gtm.html). Amazon's EC2 hosting also supports this kind of feature for sites hosted there (see http://aws.amazon.com/elasticloadbalancing/).
Since users tend not to move across continents in the course of a single session, you automatically get affinity (aka "stickiness") with geographic load balancing, assuming your pairs are located in separate data centers.
Keep in mind that geo-location is really hard since you also have to geo-locate your data to ensure your back-end cross-data-center network doesn't get swamped.
I suspect that F5 and other vendors also offer single-datacenter solutions which achieve the same ends, if you're really concerned about the single point of failure of network infrastructure (routers, etc.) inside your datacenter. But router and switch vendors have high-availability solutions which may be more appropriate to address that issue.
Net-net, if I were you I wouldn't worry about multiple pairs of load balancers. Get one pair and, unless you have a lot of money and engineering time to burn, partner with a hoster who's good at keeping their data center network up and running.
That said, if cache affinity is such a big deal for your app that you're thinking about shelling out big $$$ for multiple pairs of load balancers, it may be worth considering some app architecture changes (like using an external caching cluster). Solutions like memcached (for linux) are designed for this scenario. Microsoft also has one coming called "Velocity".
Anyway, hope this is useful info-- it's admittedly been a while since I've been deeply involved in this space (I was part of the team which designed an application load balancing product for a large software vendor) so you might want to double-check my assumptions above with facts you can pull off the web from F5 and other LB vendors.
Ok, this is an ancient question, which I just found through a Google search. But for any future visitors, here are some additional clarifications:
Problem: [DNS Round Robin] moves clients randomly between the HLB pairs and therefore (AFAIK) session stickiness cannot work.
As best I can tell, this premise is not accurate. It seems nobody really knows what old browsers might do, but presumably each browser window will stay on the same IP address as long as it's open. Newer operating systems probably obey the "match longest prefix" rule. Thus there shouldn't be much 'flapping', i.e. randomly switching from one load balancer IP to another.
However, if you're still worried about users getting randomly reassigned to a new load balancer pair, then a small modification of the classic L3/4 & L7 load balancing setup can help:
Publish DNS Round Robin records that go to Virtual high-availability IPs that are handled by L4 load balancers.
Have the L4 load balancers forward to pairs of L7 load balancers based on the origin IP address, i.e. use consistent hashing based on the end user's IP to always route end users to the same L7 load balancer.
Have your L7 load balancers use "sticky sessions" as you want them to.
Essentially this is just a small modification to what Willy Tarreau (the creator of HAProxy) wrote years ago.
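To make the consistent hashing step concrete, here is a small sketch of a hash ring keyed on the client IP. It is illustrative only: the `HashRing` class, the `l7-*` names and the choice of SHA-256 with 100 virtual nodes per balancer are my assumptions, and a real L4 balancer would do this with its own built-in hashing rather than application code.

```python
import bisect
import hashlib

class HashRing:
    """Consistent hash ring mapping client IPs to L7 load balancer pairs."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas  # virtual nodes per balancer, evens out load
        self._ring = []           # sorted list of (hash, node)
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key):
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, node):
        for i in range(self.replicas):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [entry for entry in self._ring if entry[1] != node]

    def lookup(self, client_ip):
        """Return the first node clockwise from the client's hash."""
        if not self._ring:
            raise LookupError("no nodes in ring")
        index = bisect.bisect(self._ring, (self._hash(client_ip), ""))
        return self._ring[index % len(self._ring)][1]

ring = HashRing(["l7-a", "l7-b", "l7-c"])
print(ring.lookup("203.0.113.7") == ring.lookup("203.0.113.7"))  # True
```

Every L4 balancer that builds the ring from the same node list gives the same answer for the same IP, so no shared session table is needed, and removing one L7 pair only remaps the clients that were on that pair.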
thanks for having put things in the right perspective.
I agree with you.
I did some reading and found:
Flickr: http://highscalability.com/flickr-architecture
4 billion queries per day --> about 50000 queries/s
Youtube: http://highscalability.com/youtube-architecture
100 million video views/day --> about 1200 video views/second
PlentyOfFish: http://highscalability.com/plentyoffish-architecture
600 pages/second
200 Mbps used
CDN used
Twitter: http://highscalability.com/scaling-twitter-making-twitter-10000-percent-faster
300 tweets/second
600 req/s
A very top-end LB like this can scale up to:
200,000 SSL handshakes per second
1 million TCP connections per second
3.2 million HTTP requests per second
36 Gbps of TCP or HTTP throughput
Therefore, you are right: an LB could hardly become a bottleneck.
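For anyone who wants to check the per-second figures, the conversion is just a division by the number of seconds in a day. These are daily averages (peak traffic will be higher), and the comparison against the 3.2 million requests/s figure is only a rough sanity check.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def per_second(events_per_day):
    """Average rate per second for a daily total."""
    return events_per_day / SECONDS_PER_DAY

flickr = per_second(4_000_000_000)   # queries per second
youtube = per_second(100_000_000)    # video views per second
print(round(flickr))                 # 46296
print(round(youtube))                # 1157
print(round(3_200_000 / flickr, 1))  # 69.1: LB capacity vs. Flickr's average
```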
Anyway I found this (old) article http://www.tenereillo.com/GSLBPageOfShame.htm
where it is explained that geo-aware DNS could create availability issues.
Could someone comment on that article?
Thanks,
Valentino
So why not keep it simple and have the DNS server give out a certain IP address (or addresses) based on the origin IP address (i.e. use consistent hashing based on the end user's IP to always give end users the same IP address(es))?
I'm aware that this only provides a simple and cheap load distribution mechanism.
I have been looking for this, but haven't found a DNS server which implements it (although BIND has some possibilities with views).