I have quite a bewildering problem.
I'm using a big C++ library for handling some proprietary protocol over UDP on Windows XP/7. It listens on one port throughout the run of the program, and waits for connections from distant peers.
Most of the time, this works well. However, due to some problems I'd experienced, I decided to add a simple debug print directly after the call to WSARecvFrom (the Win32 function the library uses to receive datagrams on the socket of interest and report which IP and port they came from).
Strangely enough, in some cases, I've discovered packets are dropped at the OS level (i.e. I see them in Wireshark, they have the right dst-port, all checksums are correct - but they never appear in the debug prints I've implanted into the code).
Now, I'm fully aware of the fact (which people tend to mention a bit too often) that "UDP doesn't guarantee delivery" - but that is not relevant here, as the packets are received by the machine - I see them in Wireshark.
Also, I'm familiar with OS buffers and their potential to fill up, but here comes the weird part...
I've done some research trying to find out exactly which packets are dropped. What I've discovered is that all dropped packets share two things in common (though some, but definitely not most, of the packets that aren't dropped share these as well):
They are small. Many of the packets in the protocol are large, close to MTU - but all packets that are dropped are under 100 bytes (gross).
They are always one of two: a SYN-equivalent (i.e. the first packet a peer sends to us in order to initiate communications) or a FIN-equivalent (i.e. a packet a peer sends when it is no longer interested in talking to us).
Can either one of these two qualities affect the OS buffers, and cause packets to be randomly (or even more interesting - selectively) dropped?
Any light shed on this strange issue would be very appreciated.
Many thanks.
EDIT (24/10/12):
I think I may have missed an important detail. It seems that the packets dropped before arrival share something else in common: They (and I'm starting to believe, only they) are sent to the server by "new" peers, i.e. peers that it hasn't tried to contact before.
For example, if a syn-equivalent packet arrives from a peer* we've never seen before, it will not be seen by WSARecvFrom. However, if we have sent a syn-equivalent packet to that peer ourselves (even if it didn't reply at the time), and now it sends us a syn-equivalent, we will see it.
(*) I'm not sure whether this is a peer we haven't seen (i.e. ip:port) or just a port we haven't seen before.
Does this help?
Is this some kind of WinSock option I've never heard of? (as I stated above, the code is not mine, so it may be using socket options I'm not aware of)
Thanks again!
The OS has a fixed-size buffer for data that has arrived at your socket but hasn't yet been read by you. When this buffer is exhausted, it'll start to discard data. Debug logging may exacerbate this by slowing the rate at which you pull data from the socket, increasing the chance of overflow.
If this is the problem, you could at least reduce the instances of it by requesting a larger recv buffer.
You can check the size of your socket's recv buffer using
int recvBufSize = 0;
int optLen = sizeof(recvBufSize);
int err = getsockopt(socket, SOL_SOCKET, SO_RCVBUF,
                     (char*)&recvBufSize, &optLen);
and you can set it to a larger size using
int recvBufSize = /* usage specific size */;
int err = setsockopt(socket, SOL_SOCKET, SO_RCVBUF,
(const char*)&recvBufSize, sizeof(recvBufSize));
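Note that the OS may clamp or adjust the value you ask for, so if the exact size matters it's worth reading SO_RCVBUF back with getsockopt afterwards to confirm what you actually got.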
If you still see data being received by the OS but not delivered to your socket client, you could think about different approaches to logging, e.g.:
Log to a RAM buffer and only print it occasionally (at whatever size you profile to be most efficient)
Log from a low-priority thread, either accepting that the memory requirements for this will be unpredictable, or adding code to discard data from the log's buffer when it gets full (see the sketch below)
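For example, here is a minimal sketch combining both ideas - buffer log lines in RAM and drain them from a separate worker thread (which you could additionally lower in priority with SetThreadPriority on Windows). All names and sizes are illustrative, not taken from the library in question:

#include <condition_variable>
#include <cstddef>
#include <deque>
#include <iostream>
#include <mutex>
#include <string>
#include <thread>

class AsyncLogger {
public:
    AsyncLogger() : worker_(&AsyncLogger::run, this) {}
    ~AsyncLogger() {
        { std::lock_guard<std::mutex> lk(m_); done_ = true; }
        cv_.notify_one();
        worker_.join();
    }
    // Called from the hot path (e.g. right after WSARecvFrom): cheap and never does I/O.
    void log(std::string line) {
        std::lock_guard<std::mutex> lk(m_);
        if (q_.size() >= kMaxEntries) return;   // drop rather than stall the receive loop
        q_.push_back(std::move(line));
        cv_.notify_one();
    }
private:
    void run() {                                // worker thread: the only place doing slow I/O
        std::unique_lock<std::mutex> lk(m_);
        for (;;) {
            cv_.wait(lk, [&] { return done_ || !q_.empty(); });
            while (!q_.empty()) {
                std::string line = std::move(q_.front());
                q_.pop_front();
                lk.unlock();
                std::cout << line << '\n';      // printing happens off the hot path
                lk.lock();
            }
            if (done_) return;
        }
    }
    static constexpr std::size_t kMaxEntries = 4096;
    std::mutex m_;
    std::condition_variable cv_;
    std::deque<std::string> q_;
    bool done_ = false;
    std::thread worker_;
};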
I had a very similar issue. After confirming that the receive buffer wasn't causing drops, I found that the cause was a receive timeout set too low, at 1 ms. Setting the socket to non-blocking and not setting a receive timeout fixed the issue for me.
Turn off the Windows Firewall.
Does that fix it? If so, you can likely turn the Firewall back on and just add a rule for your program.
That's my most logical guess based on what you said here in your update:
It seems that the packets dropped before arrival share something else in common: They (and I'm starting to believe, only they) are sent to the server by "new" peers, i.e. peers that it hasn't tried to contact before.
I faced the same kind of problem on Red Hat Linux as well.
It turned out to be a routing issue.
The root cause was as follows:
It's true that the UDP packet reaches the destination machine (you can see it in Wireshark).
However, no route back to the source is found, so no reply is visible in Wireshark.
On some OSes you can see the request packet in Wireshark, yet the OS never actually delivers it to the socket (you can see the socket in netstat -nap).
In such cases, always check the return route with ping (ping <dest ip> -I <source ip>).
Related
I'm looking for some way to get a signal on an I/O completion port when a socket becomes readable/writeable (i.e. the next send/recv will complete immediately). Basically I want an overlapped version of WSASelect.
(Yes, I know that for many applications, this is unnecessary, and you can just keep issuing overlapped send calls. But in other applications you want to delay generating the message to send until the last moment possible, as discussed e.g. here. In these cases it's useful to do (a) wait for socket to be writeable, (b) generate the next message, (c) send the next message.)
So far the best solution I've been able to come up with is to spawn a thread just to call select and then PostQueuedCompletionStatus, which is awful and not particularly scalable... is there any better way?
It turns out that this is possible!
Basically the trick is:
Use the WSAIoctl SIO_BASE_HANDLE operation to peek through any "layered service providers" (a small sketch of this follows below)
Use DeviceIoControl to submit an AFD_POLL request for the base handle, to the AFD driver (this is what select does internally)
There are many, many complications that are probably worth understanding, but at the end of the day the above should just work in practice. This is supposed to be a private API, but libuv uses it, and MS's compatibility policies mean that they will never break libuv, so you're fine. For details, read the thread starting from this message: https://github.com/python-trio/trio/issues/52#issuecomment-424591743
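For the first step, getting the base handle is just a small WSAIoctl call; roughly like this (a sketch only, error handling kept minimal; link against ws2_32.lib):

#include <winsock2.h>
#include <mswsock.h>

// Sketch: unwrap any layered service providers and return the base AFD handle.
// The AFD_POLL half of the trick is not shown here, since it relies on
// undocumented structures (see the linked trio thread for the details).
SOCKET get_base_handle(SOCKET s) {
    SOCKET base = INVALID_SOCKET;
    DWORD bytes = 0;
    if (WSAIoctl(s, SIO_BASE_HANDLE, NULL, 0,
                 &base, sizeof(base), &bytes, NULL, NULL) == SOCKET_ERROR) {
        return s;   // fall back to the original handle if the ioctl fails
    }
    return base;
}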
For detecting that a socket is readable, it turns out that there is an undocumented but well-known piece of folklore: you can issue a "zero byte read", i.e., an overlapped WSARecv with a zero-byte receive buffer, and that will not complete until there is some data to be read. This has been recommended for servers that are trying to do simultaneous reads from a large number of mostly-idle sockets, in order to avoid problems with memory usage (apparently IOCP receive buffers get pinned into RAM). An example of this technique can be seen in the libuv source code. They also have an additional refinement, which is that to use this with UDP sockets, they issue a zero-byte receive with MSG_PEEK set. (This is important because without that flag, the zero-byte receive would consume a packet, truncating it to zero bytes.) MSDN claims that you can't combine MSG_PEEK with overlapped I/O, but apparently it works for them...
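A rough sketch of the zero-byte read, assuming the socket is already associated with an I/O completion port (for a UDP socket you would pass MSG_PEEK in the flags, as libuv does):

#include <winsock2.h>

// Sketch: post a zero-byte overlapped receive; its completion signals that the
// socket has become readable. The WSAOVERLAPPED must stay alive until the
// completion is dequeued from the IOCP.
bool post_zero_byte_read(SOCKET s, WSAOVERLAPPED* ov) {
    WSABUF buf;
    buf.buf = NULL;                 // no receive buffer at all
    buf.len = 0;
    DWORD flags = 0;                // for UDP, libuv passes MSG_PEEK here instead
    int rc = WSARecv(s, &buf, 1, NULL, &flags, ov, NULL);
    if (rc == 0) return true;                        // completed immediately
    return WSAGetLastError() == WSA_IO_PENDING;      // queued; the IOCP will signal completion
}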
Of course, that's only half of an answer, because there's still the question of detecting writability.
It's possible that a similar "zero-byte send" trick would work? (Used directly for TCP, and adding the MSG_PARTIAL flag on UDP sockets, to avoid actually sending a zero-byte packet.) Experimentally I've checked that attempting to do a zero-byte send on a non-writable non-blocking TCP socket returns WSAEWOULDBLOCK, so that's a promising sign, but I haven't tried with overlapped I/O. I'll get around to it eventually and update this answer; or alternatively if someone wants to try it first and post their own consolidated answer then I'll probably accept it :-)
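For what it's worth, the synchronous experiment described above amounts to roughly this (a sketch only; it says nothing yet about the overlapped case):

#include <winsock2.h>
#include <stdio.h>

// Sketch: probe writability of a TCP socket that has already been made
// non-blocking via ioctlsocket(s, FIONBIO, ...), using a zero-byte send.
void probe_writability(SOCKET s) {
    int rc = send(s, "", 0, 0);
    if (rc != SOCKET_ERROR)
        printf("send buffer has room\n");
    else if (WSAGetLastError() == WSAEWOULDBLOCK)
        printf("not writable right now (send buffer full)\n");
    else
        printf("socket error: %d\n", WSAGetLastError());
}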
I am trying to find out what buffer size is used for the UDP receive socket of a running application, and I couldn't find any piece of software able to do it.
The closest I found is TracePlus/Winsock, but it only works with 32-bit applications and mine is 64-bit...
Rather than trying to guess what buffer sizes the app is actually using in its code, I would suggest you instead use a packet sniffer, such as Wireshark, to see the actual size of the packets being transmitted over the wire. The app has to be using buffer sizes that are at least as large as the packets, or else WinSock would report WSAEMSGSIZE errors and data would get truncated/discarded.
Have you tried using hooking techniques? I think Detours can help you.
getsockopt() with the SO_RCVBUF option gives you the size of your socket receive buffer. Not sure that's what you really want though.
I need to get a list of interfaces on my local machine, along with their IP addresses, MACs and a set of QoS measurements (Delay, Jitter, Error rate, Loss rate, Bandwidth)...
I'm writing a kernel module to read this information from the local network devices. So far I've extracted everything mentioned above except for Jitter and Bandwidth...
I'm using Linux kernel 2.6.35
It depends what you mean by bandwidth. In most cases you only get from the PHY something that is better called bitrate. I guess you rather need some kind of information on the available bandwidth at a higher layer, which you can't get without active or passive measurements done, e.g. sending ICMP echo-like probe packets, and investigating replies. You should also make clear what the two points in the network are (both the actual endpoints and the communication layer) between which you would like to measure available bandwidth.
As for jitter you also need to do some kind of measurements, basically the same way as above.
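If you end up computing jitter from your own probe measurements, the usual formula is the RFC 3550 interarrival-jitter estimator. A user-space sketch of it (a kernel-module version would use fixed-point arithmetic instead of doubles):

// Sketch: RFC 3550 interarrival-jitter estimator fed with the transit times
// (receive time minus send time) of successive probe packets, in seconds.
struct JitterEstimator {
    double jitter = 0.0;
    double prev_transit = 0.0;
    bool have_prev = false;

    void add_transit_time(double transit) {
        if (have_prev) {
            double d = transit - prev_transit;
            if (d < 0) d = -d;
            jitter += (d - jitter) / 16.0;   // exponential smoothing per RFC 3550
        }
        prev_transit = transit;
        have_prev = true;
    }
};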
I know this is an old post, but you could at least get jitter by inspecting the RTCP packets, if they're available. They come in on the RTP port + 1 and accompany any RTP stream as far as I've seen. A lot of information can be gotten from RTCP, but for your purposes just the basic sender/receiver reports would do it.
EDIT:
Check out the RTP/RTCP specification (RFC 3550) for the details of the protocol; you can get the jitter pretty easily from an RTCP packet.
Depending on what you're using the RTP stream for, there are a lot of other resources too, like the VoIP Metrics Report Block in the Extended Report (https://www.rfc-editor.org/rfc/rfc3611#page-25).
EDIT:
As per Artem's request, here is the basic flow of how you might do it:
An RTP stream is started on, say, port 16400 (the drivers/mechanism needed for this to happen are most likely already in place).
Tell the kernel to start listening on port 16401 (one above your RTP stream's port) as well; this is where the RTCP packets will start coming in.
As the RTCP packets come in, send them wherever you want to handle them (e.g. to userspace if you want to parse them there).
Parse the packets for the desired data. I'm not aware of a particular library to do this, but it's pretty easy to just point a struct at the data (in C) and dereference it, watching out for endianness (see the sketch below).
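To give an idea of step 4, pulling the interarrival jitter out of the first report block of an RTCP Receiver Report (packet type 201) looks roughly like this; the field offsets follow RFC 3550, the rest is an illustrative sketch:

#include <arpa/inet.h>   // ntohl
#include <stdint.h>
#include <string.h>

// Sketch: 4-byte RTCP header + 4-byte sender SSRC, then 24-byte report blocks
// with the interarrival jitter field at offset 12 inside each block (RFC 3550).
int rtcp_rr_jitter(const uint8_t *pkt, size_t len, uint32_t *jitter_out) {
    if (len < 8 + 24) return -1;            // too short for header + one report block
    if (pkt[1] != 201) return -1;           // packet type 201 = Receiver Report
    if ((pkt[0] & 0x1f) == 0) return -1;    // report count of zero: no blocks
    uint32_t j;
    memcpy(&j, pkt + 8 + 12, sizeof(j));
    *jitter_out = ntohl(j);                 // reported in RTP timestamp units
    return 0;
}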
I would like to make a system whereby users can upload and download files. The system will have a centralized topography but will rely heavily on peers to transfer relevant data through the central node to other peers. Instead of peers holding entire files, I would like for them to hold a compressed and encrypted portion of the whole data set.
Some client uploads file to server anonymously
I would like for the client to be able to upload using some sort of NAT (random ip), realizing that the server would not be able to send confirmation packets back to the client. Is ensuring data integrity feasible with a header relaying the total content length, and disregarding the entire upload if there is a mismatch?
Server indexes, compresses and splits the data into chunks, adding identifying bytes to each chunk, encrypts them, and distributes the data over the network while mapping the locations of each chunk.
The server will also update the file index for peers upon request. As more data is added to the system, I imagine that the compression can become more efficient. I would like to be able to push these new dictionary entries to peers so they can update both their chunks and the decompression system in the client software, without causing overt network strain. If encrypted, the chunks can be large without any client being aware of having part of x file.
Some client requests a file
The central node performs a lookup to determine the location of the chunks within the network and requests these chunks from peers. Once the chunks have been assembled, they are sent (still encrypted and compressed) to the client, who then translates the content into the decompressed file. It would be nice if an encrypted request could be made through a peer and relayed to a server, and onion routed through multiple paths with end-to-end encryption.
In the background, the server will be monitoring the stability and redundancy of the chunks, and if necessary will take on chunks that are near extinction, either holding them in its own bank or redistributing them over the network if there are willing clients. In this way, the central node can shrink and grow as appropriate.
The goal is to have a network within which any client can upload or download data with no single other peer knowing who has done either, but with free and open access to all.
The system must be able to handle a massive number of simultaneous connections while managing the peers and data library without losing its head.
What would be your optimal implementation?
Edit: Bounty opened.
Over the weekend, I implemented a system that does basically the above, minus part 1. For the upload, I just implemented SSL instead of forging the IP address. The system is weak in several areas. Files are split into 1MB chunks, encrypted, and sent to registered peers at random. The recipient(s) for each chunk are stored in the database. I fear that this will quickly grow too large to be manageable, but I also want to avoid having to flood the network with chunk requests. When a file is requested, the central node informs peers possessing the chunks that they need to send the chunks to x client (in p2p mode) or to the server (in direct mode), which then transfers the file down. The system is just one big hack, and written in Ruby, which I imagine is not really up to the task. For the rewrite, I am considering using C++ with Boost.Asio.
I am looking for general suggestions regarding architecture and system design. I am not at all attached to my current implementation.
Current Topography
Server handling client uploads, indexing, and content propagation
Server handling client requests
Client for uploading files and requesting files
Client-side server accepting chunks and requests
I would like for the client not to have to have a persistent server running, but I can't think of a good way around it.
I would post some of the code but it's embarrassing. Thanks. Please ask any questions; the basic idea is to have a decent anonymous file sharing model combining the strengths of both the distributed and centralized models of content distribution. If you have a totally different idea, please feel free to post it if you want.
I would like for the client to be able to upload using some sort of NAT (random ip), realizing that the server would not be able to send confirmation packets back to the client. Is ensuring data integrity feasible with a header relaying the total content length, and disregarding the entire upload if there is a mismatch?
No, that's not feasible. If your packets are 1500 bytes, and you have 0.1% packet loss, the chance of a one-megabyte file being uploaded without any lost packets is 0.999 ^ (1048576 / 1500) = 0.497, or under 50%. Further, it's not clear how the client would even know whether the upload succeeded if the server has no way to send acknowledgements back to the client.
One way around the acknowledgement issue would be to use a rateless code, which allows the client to compute and send an effectively infinite number of unique blocks, such that any sufficiently large subset is enough to reconstruct the original file. This adds a large amount of complexity to both the client and server, however, and still requires some way to notify the client that the server has received the complete file.
It seems to me you're confusing several issues here. If your system has a centralized component to which your clients upload, why do you need to do NAT traversal at all?
For parts two and three of your question, you probably want to research Distributed Hash Tables and content-based addressing (but with major caveats explained here). Preventing the nodes from knowing the content of the files they store could be accomplished by, for example, encrypting the files with the first hash of their content, and storing them keyed by the second hash - this means that anyone who knows the hash of the file can retrieve it, but clients cannot decrypt the files they host.
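A minimal sketch of that "encrypt with the first hash, store under the second hash" idea, using OpenSSL's SHA256 for the hashing; the XOR "cipher" here is only a stand-in so the sketch is self-contained, and a real system would use something like AES-256-GCM instead:

#include <openssl/sha.h>
#include <cstddef>
#include <string>
#include <vector>

// Placeholder cipher so the sketch compiles on its own; swap in a real
// authenticated cipher (e.g. AES-256-GCM via OpenSSL EVP) in practice.
static std::vector<unsigned char> toy_encrypt(const std::vector<unsigned char>& in,
                                              const unsigned char key[SHA256_DIGEST_LENGTH]) {
    std::vector<unsigned char> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
        out[i] = in[i] ^ key[i % SHA256_DIGEST_LENGTH];
    return out;
}

struct StoredChunk {
    std::string storage_key;                 // what the hosting node indexes by
    std::vector<unsigned char> ciphertext;   // what the hosting node actually stores
};

// Anyone who knows hash(content) can both locate and decrypt the chunk, while
// the node hosting it only ever sees ciphertext and the second hash.
StoredChunk prepare_chunk(const std::vector<unsigned char>& content) {
    unsigned char key[SHA256_DIGEST_LENGTH];
    SHA256(content.data(), content.size(), key);    // first hash = encryption key

    unsigned char name[SHA256_DIGEST_LENGTH];
    SHA256(key, sizeof(key), name);                 // second hash = storage key

    StoredChunk out;
    out.storage_key.assign(reinterpret_cast<const char*>(name), sizeof(name));
    out.ciphertext = toy_encrypt(content, key);
    return out;
}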
In general, I would suggest starting by writing down a solid list of goals for the system you're designing, then looking for an architecture that suits those goals. In contrast, it sounds like you have some implicit goals, and have already picked a basic system architecture - which may not suit your full goals - based on that.
Sorry for arriving late at the generous 500 reputation party, but even if I am too late I would like to add a little of my research to your discussion.
Yes, such a system would be nice - like BitTorrent, but with encrypted files and hashes of the unencrypted data. In BitTorrent you can of course add encrypted files, but then the hashes would be of the encrypted data, and thus it would not be possible to identify retrieval sources without a centralized queryKey->hashCollection store, i.e. a server that does all the work of identifying package sources for every client. A similar system was attempted by Freenet (http://freenetproject.org/), although it is more limited than what you attempt.
For the NAT consideration let's first look at: aClient -> yourServer (and aClient->aClient later)
For the communication between a client and your server, the NATs (and the firewalls that shield the clients) are not an issue. Since the clients initiate the connection to your server (which has either a fixed IP address or a DNS entry (or dyndns)), you don't even have to think about NATs; the server can respond without a problem because, even if multiple clients are behind a single corporate firewall, the firewall (its NAT) will look up which client the server wants to communicate with and forward accordingly (without you having to tell it to).
Now the "hard" part: client -> client communication through firewalls/NAT: The central technique you can use is hole-punching (http://en.wikipedia.org/wiki/UDP_hole_punching). It works so well it is what Skype uses (from one client behind a corporate firewall to another; (if it does not succeed it uses a mirroring-server)). For this you need both clients to know the address of the other and then shoot some packets at eachother so how do they get eachother's addresses?: Your server gives the addresses to the clients (this requires that not only a requester but also every distributer open a connection to your server periodically).
Before I talk about your concern about data integrity, here is the general distinction between packages and packets that you could (and I guess should) make:
You can separate your (application-domain) packages (large) from the packets used for internet transmission (small, limited by the MTU among other things). It would be slow to have both be the same size; the classic "safe" maximum for an IP datagram is 576 bytes (minus overhead; take a look here: http://www.comsci.us/datacom/ippacket.html). You could do some experiments to find a good size for your packages; my best guess is that anything from 50k to 1M would be fine (but profiling would optimize that, since we don't know whether most of the files you want to distribute are large or small).
About data integrity: for your packages you definitely need a hash, and I would recommend a cryptographic hash directly, since this prevents tampering (in addition to corruption). You don't need to record the size of the packages, since if the hash is wrong you have to re-transmit the package anyway. Bear in mind that this kind of package corruption is not very frequent if you use TCP/IP for packet transmission (yes, you can use TCP/IP even in your scenario), because it automatically corrects (re-requests) transmission errors. The huge advantage is that this checking and re-requesting happens automatically inside the TCP stack, so a corrupted packet is re-requested without your application having to notice; with a packet-integrity protocol you implement yourself, the re-request can only start after the data has arrived at and been checked by your application at the destination.
For the next thought, let's call the client which publishes a file the "publisher". I know this is kind of obvious, but it is important to distinguish this from "uploader", since the client does not need to upload the file to your server (just some info about it, see below).
Implementing the central indexing server should be no problem. The real problem is that you plan to have it encrypt all the files itself instead of making the publisher do that heavy work (good encryption is very heavy lifting). The only problem with having the publisher (not the server) encrypt the data is that you have to trust the publisher to give you reasonable search keywords: in theory it could attach a very attractive search keyword, one every client desires, to a reference to bogus data (encrypted data is hard to distinguish from random data). The solution to this problem is crowd-sourcing: make your server store user ratings so downloaders can vote on files. The table you need could be implemented as a regular old hash table from individual search keywords to the client IDs (see below) that hold that package. The publisher is at first the only client that holds the data, but every client that has downloaded at least one of the packages should then be added to the hash table's entry, so that if the publisher goes offline and every package has been downloaded by at least one client, everything keeps working. Critically, the mapping from client ID to IP address is non-trivial, because it changes often (e.g. every 24 hours for many clients); to compensate, you need another table on your server that stores this mapping, and the clients must contact the server periodically (e.g. every hour) to report their current IP address. I would recommend using a cryptographic hash for client IDs so that it is impossible for one client to trash this table by reporting fake IDs.
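A sketch of the two server-side tables just described (all type names are illustrative):

#include <chrono>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Illustrative index: keyword -> IDs of clients holding a matching package,
// plus client ID -> last reported address, refreshed by periodic check-ins.
using ClientId = std::string;                 // e.g. a hex-encoded cryptographic hash

struct ClientAddress {
    std::string ip;
    unsigned short port;
    std::chrono::system_clock::time_point last_seen;
};

struct Index {
    std::unordered_map<std::string, std::unordered_set<ClientId>> keyword_to_holders;
    std::unordered_map<ClientId, ClientAddress> client_addresses;

    // Called whenever a client finishes downloading a package found under `keyword`.
    void add_holder(const std::string& keyword, const ClientId& id) {
        keyword_to_holders[keyword].insert(id);
    }

    // Called on each periodic check-in so the ID -> address mapping stays fresh.
    void check_in(const ClientId& id, const std::string& ip, unsigned short port) {
        client_addresses[id] = {ip, port, std::chrono::system_clock::now()};
    }
};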
For any questions and criticism, please comment.
I am not sure having one central point of attack (the central server) is a very good idea. That of course depends on the kinds of problems you want to be able to handle. It also limits your scalability a lot.
I am using Ruby to test a C# networking application, which uses sockets. I open the connection with #socket = TCPSocket.new(IP,PORT) and it works - until the text I want to send is longer than 1024 characters. Then Ruby splits the message into 2 parts. C++ and C# send the message as one packet, so the C# application doesn't need to join the parts.
The messages never get longer than approx. 2000 chars. Is there a possibility to set the packet size for TCPSocket?
EDIT:
All of your answers are correct, but after reading a lot of Ruby socket questions here on SO I found the solution:
socket.send(msg,0x4)
does not split the message. The option for direct send makes the difference.
I don't know if this works over the internet, but it works in my test lab.
Thx for the help.
TCP is a stream protocol. It does not care about application "messages". TCP can theoretically send your 1024 bytes in one packet, or in 1024 packets.
That said, keep in mind that the Ethernet MTU is 1500 bytes. Factor in the IP header, which is normally 20 bytes, and the TCP header, which is at least 20 bytes. Then your 2000-char message will have to be sent in at least two packets. TCP also does flow control, which might be relevant to the issue. The best way to find out what's going on on the wire is to use tcpdump or Wireshark.
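If you do need real message boundaries on top of TCP, you have to add framing yourself; a common approach is a length prefix. A minimal receive-side sketch (shown in C++ for concreteness; the same loop applies in Ruby or C#), assuming the sender prepends a 4-byte big-endian length to each message:

#include <winsock2.h>
#include <stdint.h>
#include <string>
#include <vector>

// Sketch: read exactly `len` bytes, looping because recv() may return less
// than requested regardless of how the sender's writes were split into packets.
static bool recv_exact(SOCKET s, char* out, int len) {
    int got = 0;
    while (got < len) {
        int n = recv(s, out + got, len - got, 0);
        if (n <= 0) return false;        // connection closed or error
        got += n;
    }
    return true;
}

// Sketch: receive one message framed as a 4-byte big-endian length prefix
// followed by the payload.
static bool recv_message(SOCKET s, std::string& msg) {
    unsigned char hdr[4];
    if (!recv_exact(s, reinterpret_cast<char*>(hdr), 4)) return false;
    uint32_t len = (uint32_t(hdr[0]) << 24) | (uint32_t(hdr[1]) << 16) |
                   (uint32_t(hdr[2]) << 8)  |  uint32_t(hdr[3]);
    std::vector<char> buf(len);
    if (len > 0 && !recv_exact(s, buf.data(), (int)len)) return false;
    msg.assign(buf.begin(), buf.end());
    return true;
}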
The number of packets required to transmit your data should have very little effect on the stream in practice. What you might be encountering is a buffering problem in your implementation.
A socket should only be written to when it's in a "writeable" state, otherwise you risk overflowing the output buffer and causing the connection to be dropped by your networking stack.
As a TCP/IP socket functions as a simple stream, where data goes in, and comes out in order, the effect of packet fragmentation should be mostly irrelevant except for extremely time-sensitive applications.
Make sure you flush your output buffer when writing to the socket or you may have some data left waiting to transmit:
#socket.write(my_data)
#socket.flush