Purpose of bytes type in GRPC? - protocol-buffers

Can anyone explain in detail what the purpose of the bytes type is in gRPC? When should bytes be used, and in which scenarios does the bytes data type make sense? Can anyone give an example?

Related

Most efficient way of sending (easily compressible) data over a TCP connection

I developed a TCP server in C/C++ which accepts connections by clients. One of the functionalities is reading arbitrary server memory specified by the client.
Note: Security is not a concern here, since the client and server applications are only run locally.
Uncompressed memory sending currently works as follows
The client sends the starting address and end address to the server.
The server replies with the memory read between the received start and end addresses, sending it chunk-wise each time the send buffer runs full.
The client reads the expected number of bytes (length = end address - start address).
Sending large chunks of memory that potentially contain long runs of zeroed bytes is slow, so some sort of compression seems like a good idea. This makes the communication quite a bit more complicated.
Compressed memory sending currently works as follows
The client sends the starting address and end address to the server.
The server reads a chunk of memory and compresses it with zlib. If the compressed memory is smaller than the original memory, it keeps the compressed one. The server saves the memory size, whether it's compressed or not and the compressed bytes in the sending buffer. When the buffer is full, it is sent back to the client. The send buffer layout is as follows:
Total bytes remaining in the buffer (int) | memory chunks count (int) | list of chunk sizes (int each) | list of whether a chunk is compressed or not (bool each) | list of the data (variable sizes each)
The client reads an int (the total bytes remaining), then reads that many remaining bytes of the buffer. Next, the client reads the memory chunk count (another int) so it can parse the list of chunk sizes and the list of compressed-or-not flags. Using the sizes and the compression flags, the client can then access the list of data and decompress where necessary. The raw memory buffer is assembled from all the decompressed received data, and reading from the server continues until the expected number of raw bytes has been assembled.
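For concreteness, here is a minimal sketch of the receive-side parsing described above; it is not the asker's actual code, and it assumes 4-byte ints, 1-byte flags, matching endianness on both ends, zlib for decompression, and an assumed upper bound maxRawChunk on the uncompressed size of a single chunk.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>
#include <zlib.h>

// Parse one received reply buffer into the reassembled raw memory.
std::vector<unsigned char> parseReply(const unsigned char* buf, std::size_t maxRawChunk)
{
    std::size_t off = 0;
    auto readInt = [&](int32_t& v) { std::memcpy(&v, buf + off, sizeof v); off += sizeof v; };

    int32_t totalRemaining = 0, chunkCount = 0;
    readInt(totalRemaining);                        // total bytes remaining in the buffer
    readInt(chunkCount);                            // memory chunks count

    std::vector<int32_t> sizes(chunkCount);         // list of chunk sizes
    for (auto& s : sizes) readInt(s);

    std::vector<uint8_t> isCompressed(chunkCount);  // list of compressed-or-not flags
    for (auto& c : isCompressed) c = buf[off++];

    std::vector<unsigned char> raw;                 // reassembled raw memory
    for (int32_t i = 0; i < chunkCount; ++i) {
        if (isCompressed[i]) {
            std::vector<unsigned char> out(maxRawChunk);
            uLongf outLen = static_cast<uLongf>(out.size());
            uncompress(out.data(), &outLen, buf + off, static_cast<uLong>(sizes[i]));
            raw.insert(raw.end(), out.begin(), out.begin() + outLen);
        } else {
            raw.insert(raw.end(), buf + off, buf + off + sizes[i]);
        }
        off += sizes[i];
    }
    return raw;
}
```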
My question is whether this compression approach is optimal or whether I'm missing something important. Sending TCP messages is the bottleneck here, so minimizing them while still transmitting the same data should be the key to optimizing performance.
Hi, I will give you a few starting points. Remember, those are only starting points.
First read this paper:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.156.2302&rep=rep1&type=pdf
and this
https://www.sandvine.com/hubfs/downloads/archive/whitepaper-tcp-optimization-opportunities-kpis-and-considerations.pdf
This will give you a hint of what can go wrong, and it is a lot. Basically, my advice is to concentrate on the behavior of the server/network system. What I mean is: get it stress tested and aim for consistent behavior.
If you get congestion in the system, have a strategy for it. Optimize the socket buffer sizes. Research how the ring buffers for the network protocols work. Research whether you can use a jumbo MTU. Test whether jitter is a problem in your system. Often, for reasons outside your control, the protocols start behaving erratically (the OS is busy, or some memory allocation kicks in).
Now, most importantly: you need to stress test every move you make, all the time. Have a consistent, reproducible test that you can run at any point.
If you are on Linux, setsockopt is your friend and your enemy at the same time. Get to know how it works and what it does.
Define boundaries: what your server must be able to do and what it does not need to do.
I wish you the best of luck. I'm optimizing my system for latency and it's tricky to say the least.

What do decode surface and output surface mean? How do they impact the decode performance?

I am studying the NVIDIA decode samples. I noticed that there are 2 parameters named ulNumDecodeSurfaces and ulNumOutputSurfaces. The max value of ulNumDecodeSurfaces is 20, and the max value of ulNumOutputSurfaces is 8.
Does anyone know what these 2 parameters mean? Do they impact decode performance, and if so, how?
They are found along with comments in cuviddec.h (available online at https://www.ffmpeg.org/doxygen/3.2/cuviddec_8h_source.html).
ulNumOutputSurfaces is the maximum number of output surfaces that the decoder can be writing each image to.
ulNumDecodeSurfaces has the comment: "Maximum number of internal decode surfaces", which is somewhat more ambiguous. The source code for this library is not available outside nVidia, so we'll have to rely on someone from nVidia responding with an authoritative answer. However, looking at the values that this is set to in the example code makes it look like this is the number of frames in the internal decoding pipeline. Presumably, making this larger increases GPU memory use but provides additional buffering so that the pipeline is less likely to block because the application is not pulling frames off the decoder fast enough. There is a comment elsewhere that indicates that there should always be at least 2 frames in the decode queue to keep all of the decode engines busy.
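For reference, a minimal sketch of where these two fields sit when creating a decoder through the cuvid API; everything other than the two surface counts (codec, chroma format, output format, target size) is a placeholder assumption rather than advice from the samples.

```cpp
#include <cuviddec.h>  // CUVIDDECODECREATEINFO, cuvidCreateDecoder

CUvideodecoder createDecoder(unsigned int width, unsigned int height)
{
    CUVIDDECODECREATEINFO ci = {};
    ci.ulWidth             = width;
    ci.ulHeight            = height;
    ci.ulTargetWidth       = width;
    ci.ulTargetHeight      = height;
    ci.CodecType           = cudaVideoCodec_H264;           // placeholder codec
    ci.ChromaFormat        = cudaVideoChromaFormat_420;
    ci.OutputFormat        = cudaVideoSurfaceFormat_NV12;
    ci.DeinterlaceMode     = cudaVideoDeinterlaceMode_Weave;
    ci.ulNumDecodeSurfaces = 20;  // depth of the internal decode pipeline (more = more GPU memory)
    ci.ulNumOutputSurfaces = 2;   // output surfaces the application can have mapped at once
    CUvideodecoder dec = nullptr;
    cuvidCreateDecoder(&dec, &ci);  // real code would check the returned CUresult
    return dec;
}
```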

Calculate optimal packet size (optimal payload) to maximize network speed

I'm streaming tons of bytes from one PC to a web application (JavaScript) running on another machine/PC/mobile device through a WebSocket.
As far as I know, the data needs to be split into packets to maximize network performance. My questions are:
How can I obtain the optimal payload size in real time to get the maximum speed?
Is this different if I want to use WebRTC?
You should send as many bytes at once as you can. This not only optimizes sending on the network but also the software stack on the sending machine. As for the packet size, you'll find that about 1300-1400 bytes is a good size on most systems and networks, because it's a bit less than one "MTU".
As far as I know, the data needs to be split into packets to maximize network performance
There is indeed, but the TCP layer does that for you. The best you can do is provide TCP with as much data as possible, as quickly as possible, so it has the maximum choice.
How can I obtain the optimal payload size in real time
You can't.
to get the maximum speed?
You can't.
is this different if I want to use WebRTC?
No.

How to avoid buffer overflow on asynchronous non-blocking WSASend calls

This is regarding a TCP server being developed on Windows. The server will have thousands of clients connected, and it will continuously send randomly sized chunks of bytes (let's say anything between 1 byte and 64 KB) to the clients in a non-blocking, asynchronous manner.
Currently I do not have any constraint or condition before I call WSASend. I just call it with whatever buffer I have, of whatever size, and receive a callback (as it is a non-blocking call) once the data has been sent.
The problem is that if one or a few clients are slow to receive data, my server's kernel buffers eventually fill up and I end up getting buffer overflow (WSAENOBUFS) errors.
To avoid that, I plan to do the following:
If the server has a kernel buffer of size (X), and the maximum number of clients that can be connected is (N), then I'll allow only (X)/(N) bytes to be written to each client's socket at a time.
(Thus, for 50K connections and a 128 MB kernel buffer, I'll write at most 2684 bytes at a time to each socket.) Once I receive the callback, I can send the next set of bytes.
This way, even if one or a few clients are slow, their pending data will not occupy the entire kernel buffer.
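A minimal sketch of the proposed per-socket quota, using the figures from the question (the constant names are illustrative):

```cpp
#include <cstddef>

constexpr std::size_t kKernelBufferBytes = 128u * 1024u * 1024u;  // X = 128 MB
constexpr std::size_t kMaxConnections    = 50'000;                // N = 50K connections
constexpr std::size_t kPerSocketQuota    = kKernelBufferBytes / kMaxConnections;  // X / N

static_assert(kPerSocketQuota == 2684, "integer division matches the figure above");
```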
Now the questions are:
Is this the correct approach?
If yes, what values of
a. kernel buffer size (X), and
b. maximum number of connections allowed (N)
would be good to go with for optimal performance?
Note: This is not a duplicate of my previous question on the same issue. It is about validating the solution I came up with after going through that question's answer and the link provided in it.
Don't have multiple WSASend() calls outstanding on the same socket. Maintain your own buffer for outgoing data. When you put data in the buffer, if the buffer was previously empty then pass the current buffer content to WSASend(). Each time WSASend() completes, it tells you how many bytes it sent, so remove that many bytes from the front of the buffer, and if the buffer is not empty then call WSASend() again with the remaining content. While WSASend() is busy, if you need to send more data, just append it to the end of your buffer and let WSASend() see it when it is ready.
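A rough sketch of that pattern, assuming an IOCP-style completion callback; the Connection type, locking, and error handling are simplified placeholders, not a drop-in implementation:

```cpp
#include <winsock2.h>
#include <cstddef>
#include <vector>
#include <mutex>

struct Connection {
    SOCKET            sock = INVALID_SOCKET;
    WSAOVERLAPPED     ov   = {};
    std::vector<char> inFlight;  // handed to WSASend, untouched until completion
    std::vector<char> pending;   // data queued while a send is outstanding
    std::mutex        mtx;
};

static void startSend(Connection& c)  // caller holds c.mtx; c.inFlight is non-empty
{
    c.ov = WSAOVERLAPPED{};
    WSABUF buf;
    buf.buf = c.inFlight.data();
    buf.len = static_cast<ULONG>(c.inFlight.size());
    DWORD ignored = 0;
    // Completion is reported later through the I/O completion port.
    if (WSASend(c.sock, &buf, 1, &ignored, 0, &c.ov, nullptr) == SOCKET_ERROR &&
        WSAGetLastError() != WSA_IO_PENDING) {
        c.inFlight.clear();  // real code would flag/close the connection here
    }
}

void queueData(Connection& c, const char* data, std::size_t len)
{
    std::lock_guard<std::mutex> lock(c.mtx);
    if (!c.inFlight.empty()) {            // a WSASend is already outstanding:
        c.pending.insert(c.pending.end(), data, data + len);  // just buffer it
        return;
    }
    c.inFlight.assign(data, data + len);  // otherwise start a send immediately
    startSend(c);
}

void onSendComplete(Connection& c, DWORD bytesSent)  // from GetQueuedCompletionStatus
{
    std::lock_guard<std::mutex> lock(c.mtx);
    // Drop what was sent; any remainder plus newly queued data goes out next.
    c.inFlight.erase(c.inFlight.begin(), c.inFlight.begin() + bytesSent);
    c.inFlight.insert(c.inFlight.end(), c.pending.begin(), c.pending.end());
    c.pending.clear();
    if (!c.inFlight.empty())
        startSend(c);
}
```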
Let the error tell you. That's what it's for. Just keep sending until you get the error, then stop.

How to know the TCP buffer size dynamically

Is it possible to know the TCP buffer size dynamically on Windows? I set the TCP buffer sizes using SO_SNDBUF and SO_RCVBUF, and I can check the allocated buffer size using getsockopt(). But I want to know how to get the available buffer space, so that if it runs out I can take some action. Any utility or API would be equally useful.
My question is specific to Windows, though if anyone knows anything about Linux it would also be useful to know, so I can draw a parallel.
Buffers are used asynchronously by the kernel; you cannot control them. Moreover, the underlying implementation can ignore your SO_SNDBUF/SO_RCVBUF requests, or choose to provide smaller or larger amounts than requested.
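To illustrate that last point, a small hedged example of requesting a send buffer size on Windows and reading back what the stack actually granted (the requested value is arbitrary):

```cpp
#include <winsock2.h>
#include <cstdio>

void inspectSendBuffer(SOCKET s)
{
    int requested = 256 * 1024;  // arbitrary request
    setsockopt(s, SOL_SOCKET, SO_SNDBUF,
               reinterpret_cast<const char*>(&requested), sizeof requested);

    int granted = 0;
    int len = sizeof granted;
    getsockopt(s, SOL_SOCKET, SO_SNDBUF,
               reinterpret_cast<char*>(&granted), &len);
    // The granted size may be smaller or larger than what was requested.
    std::printf("SO_SNDBUF requested %d, granted %d\n", requested, granted);
}
```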
