What are the techniques/algorithms used in WAN optimization? I am looking for a reference that gives good theory supported with code examples. I have looked at the Steelhead manual from Riverbed and found the following main techniques used:
SDR (Scalable Data Referencing): breaks TCP data up into unique data chunks, each with a reference number; when the same byte sequence occurs in a future transmission, only the reference number is sent across the WAN instead of the raw data chunk.
Connection pooling: the product creates pools of idle TCP connections (for HTTP, for example); when a client tries to create a new connection to a previously visited target, it uses one from the pool, which in turn avoids the TCP three-way handshake.
Protocol optimization: the product reduces the number of round trips over the WAN for common actions (opening/editing remote shared files and folders); it supports most of the intended protocols: CIFS, MAPI, HTTP, etc.
Data compression.
Through my search I found three open source projects that aim to do WAN optimization:
TrafficSqueezer
WANProxy
OpenNOP
TrafficSqueezer seems to have more features, but the comments on its SourceForge page do not give a good impression of it. I tried to find a document within these projects with good information, but I couldn't.
The techniques that can reduce the amount of traffic the most are, of course, compression and data deduplication. Both WAN optimizers build up the same data store, based on an algorithm, in memory or on disk; as soon as the same traffic pattern appears again, the pattern is replaced with a pointer to the stored data and a length. That way you can save up to 99% when you transfer the same file twice, but even different files have a lot of common data where deduplication can optimize a lot.
(you will find a lot of sources on the web: e.g. http://www.computerweekly.com/feature/How-data-deduplication-works)
In your example this technique is called SDR.
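To make the idea concrete, here is a minimal sender-side sketch of dictionary-based deduplication in Python. The fixed 4 KB chunks, SHA-1 keys and the ('ref'/'raw') records are simplifications of my own for illustration, not Riverbed's actual SDR wire format; both optimizers are assumed to keep the reference dictionary in sync.

import hashlib, os

CHUNK = 4096
seen = {}   # chunk hash -> reference number, kept in sync on both WAN optimizers

def dedupe(stream: bytes):
    """Yield ('ref', n) for chunks both sides already know, ('raw', data) otherwise."""
    for i in range(0, len(stream), CHUNK):
        chunk = stream[i:i + CHUNK]
        key = hashlib.sha1(chunk).digest()
        if key in seen:
            yield ("ref", seen[key])      # only a small reference crosses the WAN
        else:
            seen[key] = len(seen)
            yield ("raw", chunk)          # first sighting: send the data, both sides store it

# Transferring the same payload twice: the second pass is references only.
payload = os.urandom(20000)
first = list(dedupe(payload))    # five ('raw', ...) records: data plus new references
second = list(dedupe(payload))   # five ('ref', n) records: only references cross the WAN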
Riverbed also has a lot of protocol support, which makes protocols such as CIFS, SMB and MAPI more delay aware (e.g. many packets are buffered and sent at once, saving round trips).
F5 also does FTP and HTTP optimisations, for example, to make those protocols perform better.
When there is a lot of delay on the WAN link, you can of course also save time with connection pooling, i.e. pre-established TCP sessions (you save the time that would be needed for a TCP three-way handshake).
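As a rough illustration of connection pooling (plain Python, not any vendor's implementation), a pool that hands back pre-established sockets so repeat requests to a known target skip the three-way handshake:

import socket

class ConnectionPool:
    def __init__(self):
        self._idle = {}   # (host, port) -> list of ready-to-use sockets

    def prewarm(self, host, port, n):
        for _ in range(n):                       # pay the handshakes up front
            self.put(host, port, socket.create_connection((host, port)))

    def get(self, host, port):
        bucket = self._idle.get((host, port))
        if bucket:
            return bucket.pop()                  # reuse: no new three-way handshake
        return socket.create_connection((host, port))   # cold path: pay the handshake now

    def put(self, host, port, sock):
        self._idle.setdefault((host, port), []).append(sock)   # keep it warm for the next request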
So at a glance:
- data deduplication
- connection pooling
- compression
- protocol optimisation
I am sure you can find a lot in the F5 documentation (F5 WOM is the product); Blue Coat offers WAN optimisation as well, and of course Riverbed. Silver Peak might also be worth a try.
For the open source ones I only have experience with TrafficSqueezer, but so far its feature set is not comparable to the commercial products.
I've read some performance claims about how Elixir and Erlang use hardware, and I'm trying to see if I understand their basis. Some background:
First, Erlang supports writing nested lists of immutable strings (iolists) to IO (files, sockets, etc) and uses writev and the strings' memory addresses to do so (see Evan Miller's blog post on this).
Second, the docs for an Erlang web framework called Chicago Boss say:
Erlang Respects Your RAM!
Erlang is different from other platforms because when rendering a server-side template, it doesn't create a separate copy of a web page in memory for each connected client. Instead, it constructs pointers to the same pieces of immutable memory across multiple requests.
So if two people request two different profile pages at the same time, they're actually sent the same chunks of memory for the header, footer, and other shared template snippets. The result is a server that can construct complex, uncached web pages for hundreds of users per second without breaking a sweat.
Third, a book about an Elixir (Erlang VM) web framework called Phoenix says:
Templates are precompiled. Phoenix doesn’t need to copy strings for each rendered template. At the hardware level, you’ll see caching come into play for these strings where it never did before.
From looking at the source, I know that this framework uses iolists to represent a completed response template.
Putting all this together, I think what's being implied is that if a web framework uses writev to tell the OS to send the same header and footer strings from the same memory locations, one web request after another, the hardware will be able to say "oh, I know that value, it's already in CPU cache so I don't have to look in RAM for it."
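As a language-neutral illustration of the writev part (a Python sketch, not what the Erlang VM does internally), a gather write hands the kernel a list of buffers without first concatenating them into a new string, so the shared fragments are reused at the same addresses request after request:

import os, socket

HEADER = b"<html><head><title>profile</title></head><body>"   # shared, immutable fragments
FOOTER = b"</body></html>"                                     # reused for every response

def send_page(conn: socket.socket, body: bytes) -> None:
    # One gather-write system call; HEADER and FOOTER are never copied per request,
    # only `body` is new data for each response.
    os.writev(conn.fileno(), [HEADER, body, FOOTER])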
Is that right? (I have very little understanding of system calls and hardware.) If not, any ideas on how hardware caching is involved?
(Bonus if you can tell me how to see or infer what's happening.)
Yes, it's mostly the processor caches that help you. The time needed to retrieve the data is smaller because it's in faster memory (i.e. the CPU caches).
Some pointers for understanding what the caches are and how they work:
https://www.quora.com/How-does-the-cache-memory-in-a-computer-work
http://www.hardwaresecrets.com/how-the-cache-memory-works/
http://lwn.net/Articles/252125/
To see this, measure how long a request takes (client side) in normal server operation. After that, have a separate process within the same VM that constantly creates and writes to disk a very large string (it probably has to be megabytes in size, whatever the size of the L2/L3 caches on your processor is). Re-measure how long the request takes; if done correctly this should be at least one order of magnitude slower.
I have to model a BitTorrent network, so there are a number of nodes connected to each other. Each node has a download speed, say 600 KBps, and an upload speed, say 130 KBps.
The problem is: how can I model this in OMNeT++? In the NED file I created the network this way, where A and B are nodes:
A.mygate$o++ --> {something} --> B.mygate$i++
B.mygate$o++ --> {something} --> A.mygate$i++
where mygate is an inout gate and $i and $o are the input and output half-channels. The {something} must be a speed, but:
if I set a speed on the first line of code, this is the upload speed of A but also the download speed of B. This is normal, because if I download from a slow server I have a slow download. So how can I model the download speed of a peer in OMNeT++? I cannot understand this. Should I say "allow k simultaneous downloads until I reach the download speed", or is that a bad approach? Can someone suggest the right approach, and whether a built-in OMNeT++ module for this already exists? I have read the manual but it is a bit confusing. Thanks for every reply.
I suggest taking a look at OverSim, which is a peer-to-peer network simulator on top of the INET framework, the framework for simulating internet-related protocols in OMNeT++.
Generally each host should have a queue at the link layer, and the connected interfaces should make sure they do not put further packets on the line until the transmission line is idle again (how long that takes is determined by the data rate of the line and the length of the packet). Once the line becomes idle, the next packet can be sent out. This is how the data rate is limited by the actual channel.
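If it helps to see that mechanism outside the simulator, here is a small conceptual Python sketch (not OMNeT++/NED code) of a line whose data rate bounds throughput because each packet has to wait until the previous one has finished serializing:

class RateLimitedLink:
    def __init__(self, datarate_bps):
        self.datarate = datarate_bps
        self.line_busy_until = 0.0                 # simulated time when the line becomes idle

    def send(self, now, packet_bytes):
        """A packet may only start once the line is idle; returns (start, finish)."""
        start = max(now, self.line_busy_until)
        self.line_busy_until = start + packet_bytes * 8 / self.datarate   # serialization delay
        return start, self.line_busy_until

# A 130 KBps uplink: ten 1500-byte packets offered at t=0 leave back to back.
uplink = RateLimitedLink(130 * 1000 * 8)
for _ in range(10):
    print(uplink.send(0.0, 1500))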
If you don't want to implement this from scratch (there is no reason to do that), take a look at the INET framework. You should place your hosts and connect their PPP interfaces with the asymmetric connection you proposed in your question. The PPP interface in StandardHost will do the queueing for you, so you only have to add some applications that generate your traffic, and you are set.
Still, I would take a look at OverSim, as it gives an even higher-level abstraction on top of INET (I have no experience with it, though).
What are the different options available for fast file transfer? I know fast is a relative term. Basically, one of the components (TIBCO) is used to transfer files (as large as 500 MB) from our local folder to the client folder (B2B zone). This component internally uses the SFTP protocol, and we found the implementation quite slow (it takes 3-4 hours).
What are the alternatives to it?
1) Can ZeroMQ be used to transfer the files?
2) TIBCO has one more product, Managed File Transfer, which internally uses the RocketStream protocol (http://support.rocketstream.com/docs/RS12/server/index.html?the_rocketstream_protocols.html) and which they claim is super fast.
3) Can Apache Camel (FTP component) be used in such cases?
Latency and security are the main concerns.
Have a look at the ZeroMQ FileMQ project; it may fit your needs (https://github.com/zeromq/filemq).
I have written an application that communicates over TCP using a proprietary protocol. I have a test client that starts a thousand threads that each make requests, and I noticed I can only get about 100 requests/second, even with fairly simple operations. These requests are all coming from the same client, so that might be relevant.
I'm trying to understand how I can make things faster. I've read a bit about performance tuning in this area, but I'm trying to understand what I need to know to performance-tune network applications like this. Where should I start? What Linux settings do I need to override, and how do I do it?
Help is greatly appreciated, thanks!
Have you considered using asynchronous methods to do the test instead of trying to spawn lots of threads? Each time one thread stops and another starts on the same CPU core (context switching), there can be very significant overhead. If you want a quick example of networking using asynchronous methods, check out networkComms.net and look at how the NetworkComms.ConnectionListenModeUseSync property is used here. Obviously, if you're running on Linux you would have to use Mono to run networkComms.net.
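For comparison, here is a sketch of an asynchronous load generator using Python asyncio instead of a thousand OS threads; the host, port, framing and numbers are placeholders for whatever your proprietary protocol actually needs:

import asyncio, time

HOST, PORT = "127.0.0.1", 9000          # placeholder target
CONCURRENCY, REQUESTS = 200, 10_000     # tune to taste

async def one_request(payload: bytes) -> bytes:
    reader, writer = await asyncio.open_connection(HOST, PORT)
    writer.write(payload)
    await writer.drain()
    response = await reader.read(4096)   # assumes one response fits in a single read
    writer.close()
    await writer.wait_closed()
    return response

async def worker(queue: asyncio.Queue) -> None:
    while True:
        try:
            payload = queue.get_nowait()
        except asyncio.QueueEmpty:
            return
        await one_request(payload)

async def main() -> None:
    queue = asyncio.Queue()
    for i in range(REQUESTS):
        queue.put_nowait(b"request %d" % i)
    start = time.perf_counter()
    await asyncio.gather(*(worker(queue) for _ in range(CONCURRENCY)))
    print(REQUESTS / (time.perf_counter() - start), "requests/second")

asyncio.run(main())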
Play around with the sysctls and socket options of the TCP stack: see man tcp(7). For example, you can change the TCP send and receive buffers or switch on TCP_NODELAY. To tune the TCP stack itself you should know how TCP works: things like slow start, congestion control, the congestion window, etc. But this is related to transmit/receive performance and the buffers; the bottleneck may also be in how your process handles the connections.
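For example, a few of those knobs set per socket from Python (the buffer sizes are just values to experiment with, not recommendations); the system-wide equivalents live in sysctls such as net.ipv4.tcp_rmem and net.ipv4.tcp_wmem:

import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)        # disable Nagle: send small writes immediately
sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 256 * 1024)  # larger send buffer
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 256 * 1024)  # larger receive buffer
sock.connect(("127.0.0.1", 9000))                                 # placeholder endpoint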
You need to understand the following Linux utility commands:
uptime: tells how long the system has been running. It gives a one-line display of the current time, how long the system has been running, how many users are currently logged on, and the system load averages for the past 1, 5, and 15 minutes.
top: provides an ongoing look at processor activity in real time. It displays a listing of the most CPU-intensive tasks on the system and can provide an interactive interface for manipulating processes.
mpstat: writes to standard output activities for each available processor, processor 0 being the first one. Global average activities among all processors are also reported. mpstat can be used both on SMP and UP machines, but in the latter only global average activities will be printed. If no activity has been selected, the default report is the CPU utilization report.
iostat: used for monitoring system input/output device loading by observing the time the devices are active in relation to their average transfer rates.
vmstat: reports information about processes, memory, paging, block IO, traps, and CPU activity. The first report produced gives averages since the last reboot.
free: displays information about free and used memory on the system.
ping: uses the ICMP protocol's mandatory ECHO_REQUEST datagram to elicit an ICMP ECHO_RESPONSE from a host or gateway.
dstat: lets you view all of your system resources instantly; you can e.g. compare disk usage in combination with interrupts from your IDE controller, or compare the network bandwidth numbers directly with the disk throughput (in the same interval).
Then you need to know how the TCP protocol works: learn how to identify network latency and where the problem is, whether it lies with SYN, SYN-ACK, ACK, data, or connection release.
I would like to make a system whereby users can upload and download files. The system will have a centralized topology but will rely heavily on peers to transfer relevant data through the central node to other peers. Instead of peers holding entire files, I would like them to hold a compressed and encrypted portion of the whole data set.
Some client uploads a file to the server anonymously
I would like for the client to be able to upload using some sort of NAT (random ip), realizing that the server would not be able to send confirmation packets back to the client. Is ensuring data integrity feasible with a header relaying the total content length, and disregarding the entire upload if there is a mismatch?
The server indexes, compresses and splits the data into chunks, adding identifying bytes to each chunk, encrypts them, and distributes them over the network while mapping the locations of each chunk.
The server will also update the file index for peers upon request. As more data is added to the system, I imagine that the compression can become more efficient. I would like to be able to push these new dictionary entries to peers so they can update both their chunks and the decompression system in the client software, without causing overt network strain. If encrypted, the chunks can be large without any client being aware of having part of x file.
Some client requests a file
The central node performs a lookup to determine the location of the chunks within the network and requests these chunks from peers. Once the chunks have been assembled, they are sent (still encrypted and compressed) to the client, who then translates the content into the decompressed file. It would be nice if an encrypted request could be made through a peer and relayed to a server, and onion routed through multiple paths with end-to-end encryption.
In the background, the server will be monitoring the stability and redundancy of the chunks, and if necessary will take on chunks that near extinction, either holding them in its own bank or redistributing them over the network if there are willing clients. In this way, the central node can shrink and grow as appropriate.
The goal is to have a network within which any client can upload or download data with no single other peer knowing who has done either, but with free and open access to all.
The system must be able to handle a massive number of simultaneous connections while managing the peers and the data library without losing its head.
What would be your optimal implementation?
Edit : Bounty opened.
Over the weekend, I implemented a system that does basically the above, minus part 1. For the upload, I just implemented SSL instead of forging the IP address. The system is weak in several areas. Files are split into 1 MB chunks, encrypted, and sent to registered peers at random. The recipients of each chunk are stored in the database. I fear that this will quickly grow too large to be manageable, but I also want to avoid having to flood the network with chunk requests. When a file is requested, the central node informs the peers possessing the chunks that they need to send them to client x (in p2p mode) or to the server (in direct mode), which then transfers the file down. The system is just one big hack, written in Ruby, which I imagine is not really up to the task. For the rewrite, I am considering using C++ with Boost.Asio.
I am looking for general suggestions regarding architecture and system design. I am not at all attached to my current implementation.
Current Topology
Server handling client uploads, indexing, and content propagation
Server handling client requests
Client for uploading files and requesting files
Client server accepting chunks and requests
I would like for the client not to have to have a persistent server running, but I can't think of a good way around it.
I would post some of the code, but it's embarrassing. Thanks. Please ask any questions; the basic idea is to have a decent anonymous file sharing model combining the strengths of both the distributed and the centralized model of content distribution. If you have a totally different idea, please feel free to post it.
"I would like for the client to be able to upload using some sort of NAT (random ip), realizing that the server would not be able to send confirmation packets back to the client. Is ensuring data integrity feasible with a header relaying the total content length, and disregarding the entire upload if there is a mismatch?"
No, that's not feasible. If your packets are 1500 bytes and you have 0.1% packet loss, the chance of a one-megabyte file being uploaded without any lost packets is 0.999 ^ (1048576 / 1500) = 0.497, or under 50%. Further, it's not clear how the client would even know whether the upload succeeded if the server has no way to send acknowledgements back to the client.
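The arithmetic behind that number, in Python, for anyone who wants to rerun it:

packet_size = 1500          # bytes per packet
loss_rate = 0.001           # 0.1% packet loss
file_size = 1048576         # one megabyte
packets = file_size / packet_size
print((1 - loss_rate) ** packets)   # ~0.497: under a 50% chance that no packet is lost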
One way around the acknowledgement issue would be to use a rateless code, which allows the client to compute and send an effectively infinite number of unique blocks, such that any sufficiently large subset is enough to reconstruct the original file. This adds a large amount of complexity to both the client and server, however, and still requires some way to notify the client that the server has received the complete file.
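To get a feel for how a rateless (fountain) code works, here is a deliberately naive Python sketch: fixed 1500-byte blocks, a made-up low-degree distribution and a simple peeling decoder. Real LT/Raptor codes use a soliton degree distribution and decode far more reliably; this is only meant to show that any sufficiently large set of encoded blocks can rebuild the source.

import os, random

BLOCK = 1500   # illustrative block size

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def neighbours(seed, n):
    # Pseudo-random subset of source-block indices, reproducible from the seed alone,
    # so the seed is all the metadata an encoded block needs to carry.
    rng = random.Random(seed)
    degree = min(n, rng.choice([1, 1, 1, 2, 2, 3]))   # naive degree distribution
    return set(rng.sample(range(n), degree))

def encode(blocks, seed):
    # One encoded block = XOR of a seed-chosen subset of source blocks.
    out = bytes(BLOCK)
    for i in neighbours(seed, len(blocks)):
        out = xor(out, blocks[i])
    return seed, out

def decode(symbols, n_blocks):
    # Peeling decoder: resolve degree-1 symbols, subtract what is known, repeat.
    pending = [[neighbours(seed, n_blocks), payload] for seed, payload in symbols]
    recovered, progress = {}, True
    while progress and len(recovered) < n_blocks:
        progress = False
        for entry in pending:
            idxs, payload = entry
            for i in [j for j in idxs if j in recovered]:
                payload = xor(payload, recovered[i])
                idxs.discard(i)
            entry[1] = payload
            if len(idxs) == 1:
                i = next(iter(idxs))
                if i not in recovered:
                    recovered[i] = payload
                    progress = True
    return recovered

data = os.urandom(5000)
blocks = [data[i:i + BLOCK].ljust(BLOCK, b"\0") for i in range(0, len(data), BLOCK)]
symbols = [encode(blocks, seed) for seed in range(4 * len(blocks))]   # any large-enough subset will do
recovered = decode(symbols, len(blocks))
if len(recovered) == len(blocks):
    assert b"".join(recovered[i] for i in range(len(blocks)))[:len(data)] == data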
It seems to me you're confusing several issues here. If your system has a centralized component to which your clients upload, why do you need to do NAT traversal at all?
For parts two and three of your question, you probably want to research Distributed Hash Tables and content-based addressing (but with major caveats explained here). Preventing the nodes from knowing the content of the files they store could be accomplished by, for example, encrypting the files with the first hash of their content, and storing them keyed by the second hash - this means that anyone who knows the hash of the file can retrieve it, but clients cannot decrypt the files they host.
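A minimal sketch of that encrypt-with-the-first-hash, store-under-the-second-hash idea, assuming the third-party Python cryptography package (Fernet). Note that Fernet adds a random IV, so identical chunks do not produce identical ciphertexts; a faithful convergent-encryption scheme would use a deterministic cipher so that duplicate chunks deduplicate.

import base64, hashlib
from cryptography.fernet import Fernet

def prepare_chunk(plaintext: bytes):
    """Encrypt a chunk with the hash of its own content; index it by a second hash."""
    content_key = hashlib.sha256(plaintext).digest()            # first hash: the encryption key
    cipher = Fernet(base64.urlsafe_b64encode(content_key))      # Fernet expects a base64-encoded 32-byte key
    ciphertext = cipher.encrypt(plaintext)
    storage_key = hashlib.sha256(content_key).hexdigest()       # second hash: the lookup key
    return storage_key, ciphertext

def recover_chunk(ciphertext: bytes, content_key: bytes) -> bytes:
    """Only someone who already knows the content hash can decrypt."""
    return Fernet(base64.urlsafe_b64encode(content_key)).decrypt(ciphertext)

# A node holding only (storage_key, ciphertext) can host the chunk without reading it;
# a downloader who knows content_key fetches by storage_key and decrypts.
storage_key, blob = prepare_chunk(b"example chunk of a shared file")
plain = recover_chunk(blob, hashlib.sha256(b"example chunk of a shared file").digest())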
In general, I would suggest starting by writing down a solid list of goals for the system you're designing, then looking for an architecture that suits those goals. In contrast, it sounds like you have some implicit goals, and have already picked a basic system architecture - which may not suit your full goals - based on that.
Sorry for arriving late at the generous 500 reputation party, but even if I am too late I would like to add a little of my research to your discussion.
Yes, such a system would be nice: like BitTorrent but with encrypted files and hashes of the unencrypted data. In BitTorrent you can add encrypted files, of course, but then the hashes would be of the encrypted data, and thus it would not be possible to identify retrieval sources without a centralized queryKey->hashCollection storage, i.e. a server that does all the work of identifying package sources for every client. A similar system was attempted by Freenet (http://freenetproject.org/), although it is more limited than what you attempt.
For the NAT consideration, let's first look at aClient -> yourServer (and aClient -> aClient later).
For the communication between a client and your server, the NATs (and the firewalls that shield the clients) are not an issue. Since the clients initiate the connection to your server (which has either a fixed IP address or a DNS entry (or DynDNS)), you don't even have to think about NATs; the server can respond without a problem because, even if multiple clients are behind a single corporate firewall, the firewall (its NAT) will look up which client the server wants to communicate with and forward accordingly (without you having to tell it to).
Now the "hard" part: client -> client communication through firewalls/NAT: The central technique you can use is hole-punching (http://en.wikipedia.org/wiki/UDP_hole_punching). It works so well it is what Skype uses (from one client behind a corporate firewall to another; (if it does not succeed it uses a mirroring-server)). For this you need both clients to know the address of the other and then shoot some packets at eachother so how do they get eachother's addresses?: Your server gives the addresses to the clients (this requires that not only a requester but also every distributer open a connection to your server periodically).
Before I talk about your concern about data integrity, here is the general distinction between packages and packets that you could (and I guess should) make:
You can separate your (application-domain) packages (large) from the packets used for internet transmission (small, limited by the MTU among other things). It would be slow to make both the same size; the classic minimum IP datagram size every host must accept is 576 bytes, while a typical Ethernet MTU is 1500 bytes (take a look here: http://www.comsci.us/datacom/ippacket.html). You could do some experiments to find a good size for your packages; my best guess is that anything from 50 KB to 1 MB would be fine (but profiling would optimize that, since we don't know whether most of the files you want to distribute are large or small).
About data integrity: for your packages you definitely need a hash, and I would recommend using a cryptographic hash directly, since this prevents tampering in addition to detecting corruption. You don't need to record the size of the packages, because if the hash doesn't match you have to re-transmit the package anyway. Bear in mind that this kind of package corruption is not very frequent if you use TCP/IP for packet transmission (yes, you can use TCP/IP even in your scenario): TCP automatically detects and retransmits damaged or lost segments. The huge advantage is that every computer on the path speaks TCP/IP, the checking happens for free in the transport (and link) layers, and only the affected segment is resent, which makes it fast. A package-integrity check you implement yourself only notices a problem after the whole package has arrived at the destination, so the re-request starts later and costs you the whole package again.
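The per-package hashing described here is only a few lines; a sketch using SHA-256:

import hashlib

def package_id(package: bytes) -> str:
    """Cryptographic hash used both to address the package and to detect tampering."""
    return hashlib.sha256(package).hexdigest()

def verify(package: bytes, expected_id: str) -> bool:
    """Re-request the package from another peer whenever this returns False."""
    return hashlib.sha256(package).hexdigest() == expected_id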
For the next thought, let's call the client which publishes a file the "publisher". I know this is kind of obvious, but it is important to distinguish this from "uploader", since the client does not need to upload the file to your server (just some info about it; see below).
Implementing the central indexing server should be no problem. The problem is that you plan to have it encrypt all the files itself instead of making the publisher do that heavy work (good encryption is very heavy lifting). The only problem with having the publisher (not the server) encrypt the data is that you have to trust the publisher to give you reasonable search keywords: theoretically it could give you a very attractive search keyword every client desires, together with a reference to bogus data (encrypted data is hard to distinguish from random data). The solution to this problem is crowd-sourcing: make your server store a user rating so downloaders can vote on files. The table you need could be a regular old hash table from individual search keywords to the IDs of the clients (see below) who hold that package. The publisher is at first the only client that holds the data, but every client that has downloaded at least one of the packages should then be added to the hash table's entry, so if the publisher goes offline and every package has been downloaded by at least one client, everything continues working. Critically, the mapping client ID -> IP address is non-trivial because it changes often (e.g. every 24 hours for many clients); to compensate, you need another table on your server that holds this mapping, and the clients have to contact the server periodically (e.g. every hour) to report their IP address. I would recommend using a crypto-hash for the client IDs so that it is impossible for one client to trash this table by reporting fake IDs.
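A sketch of the two server-side tables described above (illustrative Python; the 24-hour and one-hour figures come from the paragraph, everything else is an assumption):

import hashlib, time
from collections import defaultdict

keyword_index = defaultdict(set)   # search keyword -> set of client IDs holding the package
client_addresses = {}              # client ID -> (ip, port, last_seen), refreshed by the clients

def client_id(client_secret: bytes) -> str:
    # Crypto-hashed IDs so a client cannot trash the table with fake IDs it does not own.
    return hashlib.sha256(client_secret).hexdigest()

def register_holder(keyword: str, cid: str) -> None:
    keyword_index[keyword].add(cid)     # the publisher first, then every client that downloaded it

def heartbeat(cid: str, ip: str, port: int) -> None:
    client_addresses[cid] = (ip, port, time.time())   # e.g. called by each client every hour

def lookup(keyword: str, max_age=24 * 3600):
    # Return only addresses refreshed recently enough to still be valid.
    now = time.time()
    return [client_addresses[cid][:2]
            for cid in keyword_index.get(keyword, ())
            if cid in client_addresses and now - client_addresses[cid][2] < max_age]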
For any questions and criticism, please comment.
I am not sure having one central point of attack (the central server) is a very good idea. That, of course, depends on the kind of problems you want to be able to handle. It also limits your scalability a lot.