Is it possible to know the TCP buffer size dynamically on Windows? I set the TCP buffer sizes using SO_SNDBUF and SO_RCVBUF, and I can check the allocated sizes using getsockopt(). But I want to know how to get the amount of buffer space currently available, so that I can take some action if usage goes beyond a threshold. Any utility or API would be equally useful.
My question is specific to Windows, though if anyone knows how this works on Linux, that would also be useful to me as a parallel.
Buffers are used asynchronously by the kernel; you cannot control them directly. Moreover, the underlying implementation can ignore your SO_SNDBUF/SO_RCVBUF requests, or choose to provide smaller or larger amounts than requested.
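What you can observe is the size the stack actually granted, which may differ from what you asked for. A minimal sketch of querying it on Windows (error handling trimmed; note this is the configured size, not the free space, which as noted above Winsock does not expose directly):

#include <winsock2.h>

/* Query the send-buffer size Windows actually allocated for a socket.
 * This is the configured size, not the amount of free space. */
int query_sndbuf(SOCKET s)
{
    int size = 0;
    int len = sizeof(size);

    if (getsockopt(s, SOL_SOCKET, SO_SNDBUF, (char *)&size, &len) != 0)
        return -1;                      /* see WSAGetLastError() for details */
    return size;
}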
I developed a TCP server in C/C++ which accepts connections by clients. One of the functionalities is reading arbitrary server memory specified by the client.
Note: Security is not a concern here, since the client and server applications are only run locally.
Uncompressed memory sending currently works as follows
The client sends the starting address and end address to the server.
The server replies with the memory read between the received starting and end address chunk-wise each time the sending buffer runs full.
The client reads the expected amount of bytes (length = end address - starting address)
Sending large chunks of memory that potentially contain a lot of zeroed bytes is slow, so using some sort of compression seems like a good idea. This makes the communication quite a bit more complicated.
Compressed memory sending currently works as follows
The client sends the starting address and end address to the server.
The server reads a chunk of memory and compresses it with zlib. If the compressed memory is smaller than the original memory, it keeps the compressed one. The server saves the memory size, whether it's compressed or not and the compressed bytes in the sending buffer. When the buffer is full, it is sent back to the client. The send buffer layout is as follows:
Total bytes remaining in the buffer (int) | memory chunks count (int) | list of chunk sizes (int each) | list of whether a chunk is compressed or not (bool each) | list of the data (variable sizes each)
The client reads an int (the total bytes remaining). Then it reads the remaining buffer size using the remaining byte information. Now the client reads the memory chunks count (another int) to be able to parse the list of chunk sizes and the list of whether they are compressed or not. Then using the sizes as well as the compressed information, the client can access the list of data and apply a decompression if necessary. The raw memory buffer is then assembled from all the decompressed received data. Reading from the server continues till the expected amount of raw bytes is assembled.
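For illustration, the server-side decision for a single chunk is roughly equivalent to the following sketch (CHUNK_SIZE, the helper name and the compression level are simplified placeholders, not my actual code):

#include <stdint.h>
#include <string.h>
#include <zlib.h>

#define CHUNK_SIZE 4096

/* Compress one chunk; keep the compressed form only if it is smaller.
 * Returns the number of payload bytes written to 'out' and sets *compressed. */
static uint32_t pack_chunk(const unsigned char *raw, unsigned char *out,
                           int *compressed)
{
    unsigned char tmp[CHUNK_SIZE + 128];   /* >= compressBound(CHUNK_SIZE) */
    uLongf clen = sizeof(tmp);

    if (compress2(tmp, &clen, raw, CHUNK_SIZE, Z_BEST_SPEED) == Z_OK &&
        clen < CHUNK_SIZE) {
        memcpy(out, tmp, clen);            /* compression helped: keep it  */
        *compressed = 1;
        return (uint32_t)clen;
    }
    memcpy(out, raw, CHUNK_SIZE);          /* otherwise send the raw chunk */
    *compressed = 0;
    return CHUNK_SIZE;
}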
My question is whether this compression approach looks reasonable or whether I'm missing something important. Sending TCP messages is the bottleneck here, so minimizing them while still transmitting the same data should be the key to optimizing performance.
Hi, I will give you a few starting points. Remember, these are only starting points.
First read this paper:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.156.2302&rep=rep1&type=pdf
and this
https://www.sandvine.com/hubfs/downloads/archive/whitepaper-tcp-optimization-opportunities-kpis-and-considerations.pdf
These will give you a hint of what can go wrong, and it is a lot. Basically, my advice is to concentrate on the behavior of the whole server/network system: stress-test it and try to get consistent behavior out of it.
If you get congestion in the system, have a strategy for it. Optimize the buffer sizes for the socket. Research how the ring buffers work for the network drivers. Research whether you can use a jumbo MTU. Test whether jitter is a problem in your system. Often the protocols start behaving erratically because of something outside your control (the OS is busy, or a memory allocation stalls).
Now, most importantly, you need to stress-test every change you make. Have a consistent, reproducible test that you can run at any point.
If you are on Linux, setsockopt is your friend and enemy at the same time. Get to know how it works and what it does.
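For example, a minimal sketch of tuning the socket buffers (the sizes are illustrative; remember the kernel doubles the requested value and caps it according to net.core.wmem_max / net.core.rmem_max):

#include <stdio.h>
#include <sys/socket.h>

/* Illustrative only: request larger socket buffers and read back what the
 * kernel actually granted. */
static void tune_buffers(int fd)
{
    int snd = 1 << 20;                  /* ask for a 1 MiB send buffer    */
    int rcv = 1 << 20;                  /* ask for a 1 MiB receive buffer */
    socklen_t len = sizeof(snd);

    setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &snd, sizeof(snd));
    setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcv, sizeof(rcv));

    getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &snd, &len);
    printf("effective SO_SNDBUF: %d bytes\n", snd);
}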
Define boundaries for what your server must be able to do and what it does not need to do.
I wish you the best of luck. I'm optimizing my system for latency and it's tricky to say the least.
I'm studying blocked sort-based indexing, and the algorithm talks about loading files in blocks of 32 or 64 KB, because disk reads happen block by block, so this is efficient.
My first question is: how am I supposed to load a file by block? A buffered reader with a 64 KB buffer? And if I use a Java input stream, has this optimization already been done so that I can just read the stream?
I actually use Apache Spark, so does sparkContext.textFile() do this optimization? What about Spark Streaming?
I don't think the JVM gives you any direct view onto the file system that would make it meaningful to align reads with block sizes. Also, there are various kinds of drives and many different file systems now, and block sizes will most likely vary or even have little effect on the total I/O time.
The best performance would probably be to use java.nio.FileChannel, and then you can experiment with reading ByteBuffers of given block sizes to see if it makes any performance difference. I would guess the only effect you see is that the JVM overhead for very small buffers matters more (extreme case, reading byte by byte).
You may also use the file-channel's map method to get hold of a MappedByteBuffer.
At the moment, the transmit and receive packet size is defined by a macro
#define PKT_BUF_SZ (VLAN_ETH_FRAME_LEN + NET_IP_ALIGN + 4)
So PKT_BUF_SZ comes to around 1524 bytes, which means the NIC I have can handle incoming packets of up to 1524 bytes. Anything bigger than that causes the system to crash or, worse, reboot. I'm using Linux kernel 2.6.32 on RHEL 6.0 with a custom FPGA NIC.
Is there a way to change PKT_BUF_SZ dynamically, based on the size of the incoming packet reported by the NIC? Would that add overhead? Should the hardware drop oversized packets before they reach the driver?
Any help/suggestion will be much appreciated.
This isn't something that can be answered without knowledge of the specific controller; they all differ in the details.
Some Broadcom NICs, for example, have different-sized pools of buffers from which the controller selects an appropriate one based on the frame size: a pool of small (256-byte) buffers, a pool of standard-size (roughly 1536-byte) buffers, and a pool of jumbo buffers.
Some Intel NICs allow a list of fixed-size buffers together with a maximum frame size, and the controller then pulls as many consecutive buffers as needed (I'm not sure Linux ever supported this mode, though -- it's much more complicated for software to handle).
But the most common model, which most NICs use (and in fact, I believe all of the commercial ones can be operated this way), is that an entire frame must fit in a single buffer, so your buffer size needs to accommodate the largest frame you will receive.
Given that your NIC is a custom FPGA one, only its designers can advise you on the specifics you're asking about. If Linux is crashing when larger packets come through, then most likely either your allocated buffer is not as large as you are telling the NIC it is (leading to an overflow), or the NIC has a bug that causes it to write into some other memory area.
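To make the single-buffer model concrete, here is a hedged sketch of what an RX buffer refill path often looks like in a Linux driver (illustrative only, not your FPGA driver); the key point is that the size given to the skb allocation and the size programmed into the hardware descriptor must agree:

#include <linux/netdevice.h>
#include <linux/skbuff.h>
#include <linux/if_vlan.h>
#include <linux/dma-mapping.h>

#define PKT_BUF_SZ (VLAN_ETH_FRAME_LEN + NET_IP_ALIGN + 4)  /* same macro as above */

/* Allocate one RX buffer and map it for the device.  The hardware must be
 * told it has exactly PKT_BUF_SZ - NET_IP_ALIGN bytes of room, and must
 * truncate or drop anything larger. */
static struct sk_buff *rx_alloc_buffer(struct net_device *dev, dma_addr_t *dma)
{
    struct sk_buff *skb = netdev_alloc_skb(dev, PKT_BUF_SZ);

    if (!skb)
        return NULL;
    skb_reserve(skb, NET_IP_ALIGN);          /* align the IP header */

    *dma = dma_map_single(&dev->dev, skb->data,
                          PKT_BUF_SZ - NET_IP_ALIGN, DMA_FROM_DEVICE);
    return skb;
}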
Is it at all possible to allocate large (i.e. 32mb) physically contiguous memory regions from kernel code at runtime (i.e. not using bootmem)? From my experiments, it seems like it's not possible to get anything more than a 4mb chunk successfully, no matter what GFP flags I use. According to the documentation I've read, __GFP_NOFAIL is supposed to make the kmalloc just wait as long as is necessary to free the requested amount, but from what I can tell it just makes the request hang indefinitely if you request more than is available - it doesn't seem to be actively trying to free memory to fulfil the request (i.e. kswapd doesn't seem to be running). Is there some way to tell the kernel to aggressively start swapping stuff out in order to free the requested allocation?
Edit: So I see from Eugene's response that it's not going to be possible to get a 32mb region from a single kmalloc.... but is there any possibility of getting it done in more of a hackish kind of way? Like identifying the largest available contiguous region, then manually migrating/swapping away data on either side of it?
Or how about something like this:
1) Grab a bunch of 4mb chunks until you're out of memory.
2) Check them all to see if any of them happen to be physically contiguous; if so, combine them.
3) kfree the rest.
4) goto 1)
Might that work, if given enough time to run?
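(For step 2, the adjacency check would boil down to comparing physical addresses, e.g. something like the sketch below; virt_to_phys is valid for kmalloc/lowmem addresses, and for lowmem, physical adjacency also implies virtual adjacency in the direct mapping.)

#include <linux/io.h>
#include <linux/types.h>

/* Two lowmem chunks are physically adjacent when the end of one is the
 * start of the other. */
static bool chunks_adjacent(void *a, void *b, size_t chunk_size)
{
    return virt_to_phys(a) + chunk_size == virt_to_phys(b);
}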
You might want to take a look at the Contiguous Memory Allocator patches. Judging from the LWN article, these patches are exactly what you need.
Mircea's link is one option; if you have an IOMMU on your device you may be able to use that to present a contiguous view over a set of non-contiguous pages in memory as well.
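If the CMA patches end up in your kernel, regions of that size are normally handed out through the regular DMA API. A hedged sketch, assuming your module has a struct device to allocate against and that a large enough CMA area was reserved at boot:

#include <linux/dma-mapping.h>
#include <linux/device.h>
#include <linux/gfp.h>

/* With CMA backing the DMA API, a 32 MB coherent allocation can succeed
 * where kmalloc()/alloc_pages() cannot.  'dev' is whatever struct device
 * your driver already has. */
static void *grab_region(struct device *dev, dma_addr_t *phys)
{
    return dma_alloc_coherent(dev, 32 << 20, phys, GFP_KERNEL);
}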
When recording from a microphone in CoreAudio, what is kAudioDevicePropertyBufferFrameSize for? The docs say it's "A UInt32 whose value indicates the number of frames in the IO buffers". However, this doesn't give any indication of why you would want to set it.
The kAudioDevicePropertyBufferFrameSizeRange property gives you a valid minimum and maximum for the buffer frame size. Does setting the buffer frame size to the maximum slow things down? When would you want to set it to something other than the default?
Here's what they had to say on the CoreAudio list:
An application that is looking for low latency IO should set this value as small as it can keep up with.
On the other hand, apps that don't have large interaction requirements or other reasons for low latency can increase this value to allow the data to be chunked up in larger chunks and reduce the number of times per second the IOProc gets called. Note that this does not necessarily lower the total load on the system. In fact, increasing the IO buffer size can have the opposite impact, as the buffers are larger, which makes them much less likely to fit in caches and what not, which can really sap performance.
At the end of the day, the value an app chooses for its IO size is really something that is dependent on the app and what it does.
Usually you'd leave it at the default, but you might want to change the buffer size if you have an AudioUnit in the processing chain that expects or is optimized for a certain buffer size.
Also, generally, larger buffer sizes result in higher latency between recording and playback, while smaller buffer sizes increase the CPU load of each channel being recorded.
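For completeness, setting the property on a device looks roughly like this (a sketch; deviceID is assumed to have been obtained elsewhere, e.g. the default input device):

#include <CoreAudio/CoreAudio.h>

/* Set the IO buffer size, in frames, for a device. */
OSStatus setBufferFrameSize(AudioObjectID deviceID, UInt32 frames)
{
    AudioObjectPropertyAddress addr = {
        kAudioDevicePropertyBufferFrameSize,
        kAudioObjectPropertyScopeGlobal,
        kAudioObjectPropertyElementMaster
    };

    return AudioObjectSetPropertyData(deviceID, &addr, 0, NULL,
                                      sizeof(frames), &frames);
}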