I want to create a simple gRPC endpoint to which the user can upload a picture. The protocol buffer declaration is the following:
message UploadImageRequest {
AuthToken auth = 1;
// An enum with either JPG or PNG
FileType image_format = 2;
// Image file as bytes
bytes image = 3;
}
Is this approach of uploading pictures (and receiving pictures) still OK regardless of the warning in the gRPC documentation?
And if not, is the better (standard) approach to upload pictures using a standard form and store only the image file location instead?
For large binary transfers, the standard approach is chunking. Chunking can serve two purposes:
reduce the maximum amount of memory required to process each message
provide a boundary for recovering partial uploads.
For your use case, purpose #2 probably isn't necessary.
In gRPC, a client-streaming call allows for fairly natural chunking since it has flow control and pipelining, and it is easy to maintain context in the client and server code. If you care about recovery of partial uploads, then bidirectional streaming works well, since the server can respond with acknowledgements of progress that the client can use to resume.
Chunking using individual RPCs is also possible, but has more complications. When load balancing, the backend may be required to coordinate with other backends for each chunk. If you upload the chunks serially, then network latency can slow upload speed, as you spend most of the time waiting to receive responses from the server. You then either have to upload in parallel (but how many in parallel?) or increase the chunk size. But increasing the chunk size increases the memory required to process each chunk and coarsens the granularity for recovering failed uploads. Parallel upload also requires the server to handle out-of-order chunks.
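For illustration, here is a minimal Go sketch of such a client-streaming chunked upload. It assumes generated stubs for a client-streaming UploadImage RPC whose request carries a bytes chunk; the pb package, UploadServiceClient, and field names are assumptions for illustration, not part of the question:

package upload

import (
	"context"
	"io"
	"os"

	"google.golang.org/grpc"

	pb "example.com/upload/pb" // hypothetical generated package
)

// uploadImage streams a file to the server in fixed-size chunks over an
// already-dialed connection and waits for the server's Ack.
func uploadImage(ctx context.Context, conn *grpc.ClientConn, path string) error {
	client := pb.NewUploadServiceClient(conn)
	stream, err := client.UploadImage(ctx)
	if err != nil {
		return err
	}

	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()

	buf := make([]byte, 64*1024) // 64 KiB chunks keep per-message memory small
	for {
		n, readErr := f.Read(buf)
		if n > 0 {
			if err := stream.Send(&pb.UploadImageRequest{Image: buf[:n]}); err != nil {
				return err
			}
		}
		if readErr == io.EOF {
			break
		}
		if readErr != nil {
			return readErr
		}
	}

	_, err = stream.CloseAndRecv() // wait for the server's Ack
	return err
}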
The solution provided in the question will not work for large files; it will only work for smaller image sizes.
The better and standard approach is to use chunking. gRPC supports streaming built in, so it is fairly easy to send in chunks.
syntax = "proto3";
message UploadImageRequest{
bytes image = 1;
}
rpc UploadImage(stream UploadImageRequest) returns (Ack);
In the above way we can use streaming for chunking.
For chunking, every language provides its own way to split a file into chunks of a given size.
Things to take care of:
You need to handle the chunking logic yourself; streaming only gives you a natural way to send the chunks.
If you also want to send metadata, there are three approaches.
1: Use the structure below.
message UploadImageRequest{
AuthToken auth = 1;
FileType image_format = 2;
bytes image = 3;
}
rpc UploadImage(stream UploadImageRequest) returns (Ack);
Here the bytes field still carries chunks; send the AuthToken and FileType with the first chunk only, and simply leave those metadata fields unset in all subsequent requests.
2: You can also use oneof, which is much easier to handle (a server-side sketch follows the rpc definition below).
message UploadImageRequest{
oneof test_oneof {
Metadata meta = 2;
bytes image = 1;
}
}
message Metadata{
AuthToken auth = 1;
FileType image_format = 2;
}
rpc UploadImage(stream UploadImageRequest) returns (Ack);
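A hedged Go sketch of the server side for this oneof variant, assuming standard protoc-generated stubs; the service, stream, and Ack names are illustrative only:

package upload

import (
	"io"

	pb "example.com/upload/pb" // hypothetical generated package
)

type uploadServer struct {
	pb.UnimplementedUploadServiceServer // assumes protoc-gen-go-grpc stubs
}

func (s *uploadServer) UploadImage(stream pb.UploadService_UploadImageServer) error {
	var meta *pb.Metadata
	var image []byte
	for {
		req, err := stream.Recv()
		if err == io.EOF {
			break // client finished sending; all chunks received
		}
		if err != nil {
			return err
		}
		if m := req.GetMeta(); m != nil {
			meta = m // typically only the first message carries metadata
		} else {
			image = append(image, req.GetImage()...) // every other message is a chunk
		}
	}
	_ = meta // validate auth / image_format, persist image, etc.
	return stream.SendAndClose(&pb.Ack{})
}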
3: Just use the structure below, send the metadata in the first chunk, and let all other chunks carry data. You need to handle that distinction in code.
syntax = "proto3";
message UploadImageRequest{
bytes message = 1;
}
rpc UploadImage(stream UploadImageRequest) returns (Ack);
Lastly, for auth you can use headers (gRPC metadata) instead of sending it in the message.
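For example, a minimal Go sketch of carrying the token as a header, assuming the conventional "authorization" key (the helper names are illustrative, not part of the question):

package upload

import (
	"context"

	"google.golang.org/grpc/metadata"
)

// withAuth attaches the token to the outgoing context as a gRPC header,
// so it travels with the call instead of inside every message.
func withAuth(ctx context.Context, token string) context.Context {
	return metadata.AppendToOutgoingContext(ctx, "authorization", token)
}

// tokenFromIncoming reads the header back on the server side.
func tokenFromIncoming(ctx context.Context) (string, bool) {
	md, ok := metadata.FromIncomingContext(ctx)
	if !ok {
		return "", false
	}
	vals := md.Get("authorization")
	if len(vals) == 0 {
		return "", false
	}
	return vals[0], true
}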
Is there a good pattern for sending a large dataset using gRPC and protocol buffers that has a mix of some header data, and some large repeated data?
E.g. for a server that accepts a matrix generated by some other process as input, a service and message might look as follows:
service MatrixService {
rpc DoSomething(stream Matrix) returns (stream Matrix) {}
}
message Matrix {
uint32 num_rows = 1;
uint32 num_cols = 2;
string created_by = 3;
string creation_parameters = 4;
repeated float data = 5;
}
It's really only the data field that is large enough to require streaming; the rest of the parameters are just headers that the server only needs to receive once.
Is there some commonly used pattern for efficiently making gRPC request that contains some initial header information, and a large amount of repeated data?
Should oneof be used in cases like this (i.e. splitting the Matrix message into a oneof { MatrixHeader, MatrixData })?
Or is it typically more common to just set the header fields on the first request, and leave them blank by convention in subsequent requests?
Or are there some other solutions that I've not considered?
You might want to consider chunking your data. See gRPC + Image Upload.
Also, note that the maximum receive message size is 4 MB by default. You can increase it by setting the channel argument GRPC_ARG_MAX_RECEIVE_MESSAGE_LENGTH.
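That channel argument name comes from the C core; in Go, for instance, the same limits are raised with server and dial options. A hedged sketch (the address is a placeholder):

package main

import (
	"log"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

func main() {
	const maxMsg = 16 * 1024 * 1024 // raise the limit to 16 MiB (default is 4 MiB)

	// Server side: allow larger incoming messages.
	_ = grpc.NewServer(grpc.MaxRecvMsgSize(maxMsg))

	// Client side: allow larger responses on this connection.
	conn, err := grpc.Dial("localhost:50051", // placeholder address
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithDefaultCallOptions(grpc.MaxCallRecvMsgSize(maxMsg)),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
}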
If I have a basic HTTP handler for POST requests, how can I stop processing if the payload is larger than 100 KB?
From what I understand, in my POST handler the server is streaming the POSTed data behind the scenes. But if I try to access it, it will block, correct?
I want to stop processing if it is over 100 KB in size.
Use http.MaxBytesReader to limit the amount of data read from the client. Execute this line of code
r.Body = http.MaxBytesReader(w, r.Body, 100000)
before calling r.ParseForm, r.FormValue or any other request method that reads the body.
Wrapping the request body with io.LimitedReader limits the amount of data read by the application, but does not necessarily limit the amount of data read by the server on behalf of the application.
Checking the request content length is unreliable because the field is not set to the actual request body size when chunked encoding is used.
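Putting the pieces together, a minimal Go handler sketch for the 100 KB requirement (the route and error text are just placeholders):

package main

import (
	"log"
	"net/http"
)

func postHandler(w http.ResponseWriter, r *http.Request) {
	// Limit how much of the body can be read, before anything parses it.
	r.Body = http.MaxBytesReader(w, r.Body, 100000)

	if err := r.ParseForm(); err != nil {
		// Reads fail once the limit is exceeded, so we stop processing here.
		http.Error(w, "request body too large", http.StatusRequestEntityTooLarge)
		return
	}

	// ... normal processing of r.FormValue(...) goes here ...
	w.WriteHeader(http.StatusOK)
}

func main() {
	http.HandleFunc("/upload", postHandler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}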
I believe you can simply check the http.Request.ContentLength field to learn the size of the posted request before deciding whether to go ahead, or return an error if it is larger than expected.
I am using G-WAN (v4.3.14) and facing a strange issue. I am trying to pass some long text in the query string. I have found that G-WAN does not allow me to pass query parameters beyond a total request size of 537 characters.
It responds with a 400 Bad Request.
An example string is:
http://xxx.xxx.xxx.xxx:yyyy/?t.cpp&c=DbE9kdOJGMm9yr7aypGlQBY1a9rZuiaMDAAnTJSbOBRJZo45YHbpAO5VENLa6IcmlSadZnTucpKBKb0E0G15pFHCgB4oNxqQ3m1K0CX8K15RQkawb8MThuoIHKp02vk9WwJFU5NkBJtwu80onudOkwWPUiGxKKcJiSwJJNcgDY1LQIJ1GnvgRGgomthoxppsZ1cl7zxIf5CjWggzsbUnADDTq5W4pBXveVnugOBHryqdTylhI4tudeae2jUnswezxtQM1qKG3ezGkM2dN68R7YxpCEfZ2N1nXggUkYdGn6em7veq5G5LpTVrdexn0fSozGbeNfHXS2OLjWGhffcEdGeu1dFKnFxNac6IETbIiVvTjv55wcZI7WBiTA0r60KJkUZYNn59W6XhnAwTk0zCYN2Rq8LraOjHzjXHjcyL9Sk6jw4D9K0wWLsiZHDfTOlnPr9jYp2SesyHlUJsCHPiHOR4fCBVwQMwh5YOddcpl2Kbr6CjSjWabaac
The code in my C++ file is:
#include "gwan.h"
#include <iostream>

using namespace std;

int main(int argc, char *argv[])
{
   if(argc) // argc is the number of query parameters passed to the servlet
   {
      cout << argv[0];
      xbuf_cat(get_reply(argv), argv[0]); // echo the first parameter back in the reply
   }
   else
   {
      xbuf_cat(get_reply(argv), "pass something to me to see it on your screen.");
   }
   return 200; // HTTP status code
}
Can someone help me to make G-WAN accept a query parameter of 1000 characters or more?
The error with G-WAN v4.5+ is "414: Request URI too large".
Many production HTTP servers disable PUT/POST Entities to avoid abuse.
G-WAN first used a limit slightly larger than 4 KiB, but most requests do not need so much room, so we have made it possible for developers to decide.
The example below (see entity_size.c for a working example) shows how to modify the G-WAN (server-global) PUT/POST Entity size limit from a servlet but this can also be done in the init() or the main() calls of a connection handler, and from the gwan/init.c script available in v4.10+:
u32 *max_entity_size = (u32*)get_env(argv, MAX_ENTITY_SIZE);
*max_entity_size = 200 * 1024; // new size in bytes (200 KiB)
You can change the limit at any time (even while a given user is connected) by using IP filtering in a connection handler.
Your servlets will decide what to do with the entity anyway so you can dispose or store on disk or do real-time processing, see the entity.c example.
Beyond this, there are a few things to keep in mind:
to avoid DoS attacks that let everybody send huge entities to your server (in the GBs), you might enlarge the request size for authorized users only;
when dealing with requests without a PUT/POST Entity you may also dynamically enlarge the read buffer by allocating more memory to the READ_XBUF using xbuf_growto().
Now you know how to accept requests of any length. Make sure you do it only when needed.
You may want to check other related values like:
KALIVE_TMO // time-out in ms for HTTP keep-alives
REQUEST_TMO // time-out in ms waiting for request
MIN_SEND_SPEED // send rate in bytes/sec (if < close)
MIN_READ_SPEED // read rate in bytes/sec (if < close)
All of them can be set up from the gwan/init.c script, before any request can hit the server. This can also be done from G-WAN handlers and servlets, as shown in the examples cited above.
I have a sender (a message forwarder) which sends fixed-size byte messages at a rate of one message every 5 milliseconds to my receiving program written in VB6. When I run the message forwarder and my receiving program on one machine there is no issue, but when they run on separate machines the receiving program starts to experience some abnormalities.
e.g.:
Private Sub socket_DataArrival(Index As Integer, ByVal dataTotal As Long)
    Dim Data() As Byte
    Length.Text = dataTotal
    socket.GetData Data, vbArray + vbByte
    If Length.Text = "100" Then
        txtOutput.Text = "Message1"
    ElseIf Length.Text = "150" Then
        txtOutput.Text = "Message2"
    End If
End Sub
I will sometimes receive a "2 in 1" message, i.e. it comes in as 250 bytes, or some other unrecognizable byte size, when I should be receiving either 100 or 150 bytes only. If I reduce the sending rate to a slower speed, say one message every 50 milliseconds, then it is fine.
Can anyone provide some advice? Thanks.
When sending data over a network you have to get used to the fact that the packets may arrive out of order, not promptly, not at all, etc.
You need to improve your message protocol to include a header that states which type of message follows. If order is important include a sequence number (I'm assuming you're using UDP). At present you are relying on timing to separate messages, which you cannot rely on over a network.
Buffer all your arriving data and handle it in chunks - the header allows you to tell what chunk size to use. Separate your input buffering from your message handling - use the DataArrival event to add data to the buffer, use a Timer or some other means of polling the buffer to check if it has messages ready to parse. Alas, this is VB6 so threading is not so easy. Take a look at The Common Controls Replacement Project timer object DLL if you need a Timer class that doesn't rely on a UI element being present.
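The framing idea is language-agnostic; as a minimal sketch (shown in Go rather than VB6, and covering only the length-prefix part, not message types or sequence numbers), each message gets a small header stating its length so the receiver can reassemble it from a buffered stream regardless of how the bytes arrive:

package framing

import (
	"bufio"
	"encoding/binary"
	"io"
)

// ReadMessage reads one length-prefixed message: a 2-byte big-endian length
// header followed by that many payload bytes. Because the reader buffers the
// stream, messages are reassembled correctly even if bytes arrive coalesced
// (your 250-byte case) or split across packets.
func ReadMessage(r *bufio.Reader) ([]byte, error) {
	var length uint16
	if err := binary.Read(r, binary.BigEndian, &length); err != nil {
		return nil, err
	}
	payload := make([]byte, length)
	if _, err := io.ReadFull(r, payload); err != nil {
		return nil, err
	}
	return payload, nil
}

// WriteMessage prepends the length header so the receiver knows where each
// message ends.
func WriteMessage(w io.Writer, payload []byte) error {
	if err := binary.Write(w, binary.BigEndian, uint16(len(payload))); err != nil {
		return err
	}
	_, err := w.Write(payload)
	return err
}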
I am looking to send a large message (> 1 MB) through the Windows Sockets send API. Is there an efficient way to do this? I do not want to loop and then send the data in chunks. I have read somewhere that you can increase the socket buffer size and that could help. Could anyone please elaborate on this? Any help is appreciated.
You should, and in fact must, loop to send the data in chunks.
As explained in Beej's networking guide:
"send() returns the number of bytes actually sent out—this might be less than the number you told it to send! See, sometimes you tell it to send a whole gob of data and it just can't handle it. It'll fire off as much of the data as it can, and trust you to send the rest later."
This implies that even if you set the packet size to 1MB, the send() function may not send all of it, and you are forced to loop until the total number of bytes sent by your calls to send() total the number of bytes you are trying to send. In fact, the greater the size of the packet, the more likely it is that send() will not send it all.
Aside from all that, you don't want to send 1MB packets because if they get lost, you will have to transmit the entire 1MB packet again, whereas if you lost a 1K packet, retransmitting it is not a big deal.
In summary, you will have to loop your send() calls, and the receiver will even have to loop their recv() calls too. You will likely need to prepend a small header to each packet to tell the receiver how many bytes are being sent so the receiver can loop the appropriate number of times.
I suggest you take a look at Beej's network guide for more detailed info about send() and recv() and how to deal with this problem. It can be found at http://beej.us/guide/bgnet/output/print/bgnet_USLetter.pdf
Why don't you want to send it in chunks?
That's the way to do it in 99% of the cases.
What makes you think that sending in chunks is inefficient? The OS is likely to chunk large "send" calls anyway, and may coalesce small ones.
Likewise on the receiving side the client should be looping anyway as there's no guarantee of getting all the data in one go.
The Windows Sockets subsystem is not obliged to send the whole buffer you provide anyway. You can't force it, since some network-level protocols have an upper limit on packet size.
As a practical matter, you can actually allocate a large buffer and send in one call using Winsock. If you are not messing with socket buffer sizes, the buffer will generally be copied into kernel mode for sending anyway.
There is a theoretical possibility that it will return without sending everything, however, so you really should loop for correctness. The chunks you send should, however, be large (64k or the ballpark) to avoid repeated kernel transitions.
If you want to do a loop after all, you can use this C++ code:
#include <winsock2.h>
#include <algorithm>
#include <cstring>
#include <string>

#define DEFAULT_BUFLEN 1452

// Sends the whole string over the socket, looping until every byte has been
// handed to send(); returns the number of bytes actually sent.
int SendStr(const SOCKET &ConnectSocket, const std::string &str, int total){
    char sndbuf[DEFAULT_BUFLEN];
    const int sndbuflen = DEFAULT_BUFLEN;
    int iResult;
    int count = 0;
    int len;
    while(count < total){
        // Copy at most one buffer's worth of the remaining data.
        len = (std::min)(total - count, sndbuflen);
        memcpy(sndbuf, str.data() + count, len);
        // Send a buffer; send() may accept fewer bytes than requested.
        iResult = send(ConnectSocket, sndbuf, len, 0);
        if (iResult == SOCKET_ERROR){
            throw WSAGetLastError();
        }
        else{
            if(iResult > 0){
                count += iResult; // advance by the number of bytes actually sent
            }
            else{
                break; // connection closed by the peer
            }
        }
    }
    return count;
}
}