gRPC and schema evolution guarantees?

I am evaluating gRPC. On the subject of compatibility with schema evolution, I did find the information that protocol buffers, which gRPC uses for data serialization, have a format such that different evolutions of a piece of data in protobuf format can stay compatible, as long as the schema evolution follows the compatibility rules.
But that does not tell me whether two iterations of a gRPC client/server will be able to exchange a command that did not change in the schema, regardless of the changes in the rest of the schema.
Does gRPC guarantee that an older generated version of a client or server will always be able to issue/answer a command that has not changed in the schema file, against code generated from any more recent version of the schema on the other side, regardless of the rest of the schema? (Assuming no other breaking changes, such as a non-backward-compatible gRPC version change.)

gRPC keeps a method compatible as long as:
the proto package, service name, and method name are unchanged
the proto request and response messages are still compatible
the cardinality (unary vs. streaming) of the request and response messages remains unchanged
New services and methods can be added at will without impacting the compatibility of preexisting methods. This doesn't get discussed much because those restrictions are mostly what people would expect.
There is actually some wiggle room in changing cardinality, since unary is encoded the same as streaming on the wire, but it's generally better to assume you can't change cardinality and to add a new method instead.
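To make the second rule concrete, here is a minimal Go sketch of why adding a field keeps messages compatible. It uses the low-level google.golang.org/protobuf/encoding/protowire package (not mentioned above, just a convenient way to poke at the wire format): "new" code writes fields 1 and 2, "old" code only understands field 1 and simply skips the tag it does not recognise, which is exactly what generated protobuf code does with unknown fields.

```go
package main

import (
	"fmt"

	"google.golang.org/protobuf/encoding/protowire"
)

func main() {
	// Encode a message as a *newer* schema would: field 1 (string)
	// plus a newly added field 2 (varint).
	var buf []byte
	buf = protowire.AppendTag(buf, 1, protowire.BytesType)
	buf = protowire.AppendString(buf, "hello")
	buf = protowire.AppendTag(buf, 2, protowire.VarintType) // field the old schema never defined
	buf = protowire.AppendVarint(buf, 42)

	// Decode it as an *older* schema would: only field 1 is understood,
	// everything else is consumed and ignored (error handling omitted).
	for len(buf) > 0 {
		num, typ, n := protowire.ConsumeTag(buf)
		buf = buf[n:]
		switch {
		case num == 1 && typ == protowire.BytesType:
			s, m := protowire.ConsumeString(buf)
			fmt.Println("known field 1:", s)
			buf = buf[m:]
		default:
			m := protowire.ConsumeFieldValue(num, typ, buf)
			fmt.Println("skipping unknown field", num)
			buf = buf[m:]
		}
	}
}
```

The method path on the wire is built from the package, service, and method names, which is why the first rule requires those to stay unchanged.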
This topic is discussed in the Modifying gRPC Services over Time talk (PDF slides and YouTube links are available). I'll note that the slides are not intended to stand alone.

Related

grpc and protobuf: roles of a server and a client

I'm new to gRPC and protobuf, and I'm trying to understand whether gRPC can fit my needs. Basically I have a piece of software which can invoke a script (bash or Python) at certain stages and pass the script some parameters (for example, transaction status, some values, etc.), so I'd like to pass these parameters over gRPC, i.e. the gRPC communication has to be initiated by my script.
I know there is a gRPC Python library, so I'd like to take advantage of it in my script. However it isn't quite clear to me whether my script has to act as the gRPC client or the server. The examples I have seen are quite simple, request/reply, where requests are made by a client and the server replies; this is not exactly what I have in mind.
Your question is vague, which makes it difficult to provide guidance.
Stack Overflow prefers developer (coding) questions, and open-ended guidance tends to be discouraged.
A couple of things:
Essentially gRPC is a mechanism by which something calls (invokes) functions|methods on something else. Usually (but not necessarily) the something else is accessed via a network. The basic idea is that you want to be able to call some procedure (function|method), e.g. something of the form add(a,b), but the thing where add is actually implemented|performed isn't your local machine; it is remote. Ergo, Remote Procedure Call (RPC), and the "g" is for (perhaps originally) "Google".
Since gRPC is just (remote) procedure calling, there is often a notion that the caller is the client and the thing being called is the server, but these roles are fluid: a client can also be a server and a server can also be a client, depending on who is initiating the call. The sketch at the end of this answer shows a script acting as the client, because it is the one initiating the call.
gRPC is often (but not necessarily) used instead of REST, GraphQL and (many) others. It's important that you be aware of the "price" you pay for gRPC's benefits. You must define a schema for your messages. Messages are sent (over a network) using an efficient binary format (i.e. not human-readable). gRPC uses HTTP/2. You must have a gRPC implementation for your language (Python is supported; many languages are).
gRPC implementations vary, but the major implementations support synchronous and asynchronous calls, and unary (request/response) as well as client-, server- and bidirectional-streaming methods.
In many cases, REST|HTTP is easier to use because it sends human-readable "messages", there are many tools (e.g. curl) available, and everyone's been using it forever.
I encourage you to read the content on the framework's site.
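Since the script initiates the communication, the script is the gRPC client and the long-running software hosts the server. Here is a minimal Go sketch of that shape (the question uses Python, but the structure is the same in any language). To keep it compilable without any generated code it calls the standard gRPC health-check service and assumes a hypothetical server already listening on localhost:50051; in practice you would define your own .proto method (e.g. something like ReportStatus) and call that instead.

```go
package main

import (
	"context"
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	healthpb "google.golang.org/grpc/health/grpc_health_v1"
)

func main() {
	// The script is the client: it opens the connection and initiates the call.
	conn, err := grpc.Dial("localhost:50051",
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatalf("dial: %v", err)
	}
	defer conn.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	// Stand-in for "pass the transaction status / values over gRPC":
	// a unary call on a service the server side exposes.
	resp, err := healthpb.NewHealthClient(conn).Check(ctx, &healthpb.HealthCheckRequest{})
	if err != nil {
		log.Fatalf("call failed: %v", err)
	}
	log.Printf("server responded: %v", resp.GetStatus())
}
```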

Is protobuf the only technique used in gRPC?

Are there other formats available which we can use instead of protobuf in gRPC?
The gRPC stack has no strict dependency on the marshaller/serializer being used. All that gRPC sees is a binary buffer, with entirely opaque contents (it doesn't even specify a content-type header), sent over HTTP/2 routes.
By convention, a gRPC API is described by a .proto schema, which defines the gRPC methods and the payload messages; binding code is then generated from it, using protocol buffers as the marshaller/serializer.
However, if you're willing to write the binding code yourself (or use a library that does), you can register gRPC endpoints using your own marshaller/serializer. The exact details of how to do this will vary between platforms/languages/libraries, but yes: it is possible. Since no metadata (headers, etc) is used to resolve the marshaller/serializer, both client and server must agree in advance what format is going to be used for the payload.
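As a hedged illustration of what that looks like in one implementation (grpc-go, which the answer above does not name specifically): implement the encoding.Codec interface, register it, and select it by content-subtype on the client. The JSON format and the endpoint address are just placeholders; the server must have a codec registered under the same name.

```go
package main

import (
	"encoding/json"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	"google.golang.org/grpc/encoding"
)

// jsonCodec marshals request/response payloads as JSON instead of protobuf.
type jsonCodec struct{}

func (jsonCodec) Marshal(v interface{}) ([]byte, error)      { return json.Marshal(v) }
func (jsonCodec) Unmarshal(data []byte, v interface{}) error { return json.Unmarshal(data, v) }
func (jsonCodec) Name() string                               { return "json" }

func init() {
	// Make the codec available under the "json" content-subtype in this process.
	encoding.RegisterCodec(jsonCodec{})
}

// Client side: request the "json" codec for every call on this connection.
// The target address is hypothetical.
func dial() (*grpc.ClientConn, error) {
	return grpc.Dial("localhost:50051",
		grpc.WithDefaultCallOptions(grpc.CallContentSubtype("json")),
		grpc.WithTransportCredentials(insecure.NewCredentials()),
	)
}

func main() {
	if _, err := dial(); err != nil {
		panic(err)
	}
}
```

You would still need a way to describe the service (handwritten service descriptors or a code generator for your format); the codec only covers the payload marshalling.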
The gRPC protocol is agnostic to the marshaller/IDL, but Protocol Buffers is the only marshaller directly supported by gRPC.
I'm aware of Flatbuffers and Bond supporting gRPC. There are probably others.
You are free to support your own favorite format. It isn't easy, but it isn't hard; it mainly involves glue code and defining the RPC schema for gRPC to use. The gRPC + JSON post on the gRPC blog walks through the process for grpc-java. Each language is a bit different and has to be supported individually.

Thrift, Avro and ProtoBuf data governance

We have a use case of streaming data from the main transactional system to downstream consumers such as the data analytics and machine learning teams.
One of the requirements is to ensure data governance: the data source can control who can read which column, and potentially the lifecycle of the data, so that data sitting in another domain gets purged should the source remove it. For example, if a user deletes their account, we need to make sure the data in all downstream systems gets removed.
While we are considering Thrift, Avro and ProtoBuf, what are the common frameworks that we can use for such data governance? Do any of these protocols support metadata for data governance around authorization and lifecycle?
Let me get this straight:
Protobuf is not a security device; to someone with the right tools it is just as readable as XML or JSON, with the slight caveat that it can be uncertain how to interpret some values. For example, protoc --decode_raw will dump the field numbers and values of any serialized message without needing the schema.
It's not much different from JSON or XML in that respect. It is just an interface definition language. Sure, its encoding is a bit different and a lot more customizable, but it does not address security in any way. It is up to you to secure the channel between sender and receiver.

How to wrap a proto message in a go model

I am currently working on moving our REST-API-based Go service to gRPC, using protobuf. It's a huge service with a lot of APIs, and it is already in production, so I don't want to make so many changes that I ruin the existing system.
So I want to use my Go models as the source of truth, and to generate the .proto messages I think I can manage with this: Generate proto file from golang struct.
Now my APIs also expect the request and response according to the defined Go models. I will change them to use the .proto models for the request and response, but when a request/response is passed I want to wrap it in my Go models so that the rest of the code doesn't need any changes.
In that case, if the request is small I can simply copy all the fields into my Go model, but in case of big requests or nested models it's a big problem.
1) Am I doing this the right way?
2) If no, what's the right way?
3) If yes, how can I copy the big proto messages into my Go model, and vice versa for the response?
If you want to use the Go models as the source of truth, why do you want to use the .proto-generated ones for the REST request/response? Is it because you'd like to use proteus service generation (and share the code between REST and gRPC)?
Usually, if you wanted to migrate from REST to gRPC, the most common way would probably be to use grpc-gateway (note that since around 1.10.x you can use it in-process without resorting to the reverse proxy), but this would be "gRPC-first", where you derive REST from that, while it seems you want "REST-first", since you have your REST APIs already in production. In fact, for this reason grpc-gateway probably wouldn't be totally suitable, because it could generate slightly different endpoints from your existing ones. It depends on how much you can afford to break backward compatibility (maybe you could generate a "v2" set of APIs and keep the old "v1" around for a little while, giving existing clients time to migrate).
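For question 3), if you do end up copying by hand, a common pattern is a small conversion layer of per-message mapping functions at the API boundary. Here is a minimal sketch with hypothetical types (pbUser standing in for the protoc-generated struct, modelUser for the existing Go model):

```go
package main

import "fmt"

// Stand-ins for the protoc-generated request/response types (hypothetical).
type pbUser struct {
	Id      int64
	Name    string
	Address *pbAddress
}

type pbAddress struct {
	City string
	Zip  string
}

// Stand-ins for the existing domain models used by the rest of the service.
type modelUser struct {
	ID      int64
	Name    string
	Address modelAddress
}

type modelAddress struct {
	City string
	Zip  string
}

// userFromProto wraps an incoming proto message in the domain model.
// Nested messages get their own small mappers, so big messages are handled
// by composing these functions rather than by one giant copy.
func userFromProto(p *pbUser) modelUser {
	u := modelUser{ID: p.Id, Name: p.Name}
	if p.Address != nil {
		u.Address = modelAddress{City: p.Address.City, Zip: p.Address.Zip}
	}
	return u
}

// userToProto does the reverse for the response path.
func userToProto(u modelUser) *pbUser {
	return &pbUser{
		Id:      u.ID,
		Name:    u.Name,
		Address: &pbAddress{City: u.Address.City, Zip: u.Address.Zip},
	}
}

func main() {
	in := &pbUser{Id: 1, Name: "Ada", Address: &pbAddress{City: "Paris", Zip: "75001"}}
	fmt.Printf("%+v\n", userToProto(userFromProto(in)))
}
```

Reflection-based copy libraries exist, but explicit mappers keep the translation easy to audit as the .proto and the Go models drift apart.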

Management layer above Thrift

Thrift sounds awesome, but I can't find some basic stuff I'm used to in RPC frameworks (such as HttpServlet). Examples of the things I can't find: session management, filtering, upload/download progress.
I understand that the missing stuff might be a management layer on top of Thrift. If so, is there any example of such a layer? Perhaps AOP (Aspect Oriented Programming)?
I can't imagine such a layer that compiles to all languages, and that's what I'm missing. Taking session management as an example, there might be several clients that all need to do some authentication and pass the session_id with each RPC. I would expect a similar API for doing so in all languages.
Does anyone know of a management layer for Thrift?
So Thrift itself is not going to help you out a lot here.
I have had similar desires, and have a few suggestions:
1. Put your management objects into the IDL
Simply add an API token or a common transfer-data struct as a parameter to all of your service methods. Set it as parameter id 15 so that it will always be the last parameter, even if you add others in the middle.
As the first step in your handler you can validate/store/do whatever with the extra data.
This has the advantage that it is valid in any platform that thrift supports.
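A sketch of what the handler side of option 1 can look like in Go, with hypothetical types (RequestMeta standing in for the struct you would declare in the IDL and pass as parameter 15, AddHandler for a generated service handler); the Thrift-generated code itself is omitted:

```go
package main

import (
	"context"
	"errors"
)

// RequestMeta stands in for the IDL struct carried as the last parameter
// of every method (API token, tracing id, and so on).
type RequestMeta struct {
	APIToken  string
	RequestID string
}

// AddHandler is a hypothetical handler for an add(a, b, meta) method.
type AddHandler struct{}

// validateMeta is the "first step in your handler" mentioned above:
// check the extra data before doing any real work.
func validateMeta(meta *RequestMeta) error {
	if meta == nil || meta.APIToken == "" {
		return errors.New("missing or invalid api token")
	}
	return nil
}

func (h *AddHandler) Add(ctx context.Context, a, b int64, meta *RequestMeta) (int64, error) {
	if err := validateMeta(meta); err != nil {
		return 0, err
	}
	return a + b, nil
}

func main() {
	h := &AddHandler{}
	_, _ = h.Add(context.Background(), 1, 2, &RequestMeta{APIToken: "secret"})
}
```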
2. Use thrift over http
If you use HTTP as your transport, you can include whatever data you want as HTTP headers, and the Thrift content as the body.
This will often require a custom HTTP client for every platform you use to inject the data, and a custom handler on the server to use the data, but neither of those is prohibitively difficult.
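For example, in Go the Apache Thrift library's THttpClient lets you set arbitrary headers on the transport before handing it to a generated client. The endpoint URL and the calculator client mentioned in the comments are hypothetical, and constructor names vary a little between Thrift releases:

```go
package main

import (
	"log"

	"github.com/apache/thrift/lib/go/thrift"
)

func main() {
	// Thrift over HTTP: the RPC body travels as the HTTP request body...
	trans, err := thrift.NewTHttpClient("http://localhost:9090/thrift")
	if err != nil {
		log.Fatal(err)
	}

	// ...and the management data (session id, auth token, ...) rides in
	// HTTP headers, completely outside the IDL.
	httpTrans := trans.(*thrift.THttpClient)
	httpTrans.SetHeader("X-Session-Id", "abc123")
	httpTrans.SetHeader("Authorization", "Bearer some-token")

	// Wrap the transport in a protocol as usual (Thrift 0.14+ constructor shown;
	// older releases use NewTBinaryProtocolTransport instead).
	proto := thrift.NewTBinaryProtocolConf(trans, nil)

	// A generated client would now be built on top of this protocol, e.g.
	// client := calculator.NewCalculatorClient(thrift.NewTStandardClient(proto, proto))
	// and every call it makes carries the headers set above. The server reads
	// them in its HTTP handler before dispatching to the Thrift processor.
	_ = proto
}
```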
3. Hack the protocol
It is possible to create your own custom protocol that wraps another protocol and injects custom data. Take a look at how the multiplexed protocol works in the Thrift library for most languages:
the C# implementation, for example. It sends the method name across the wire as service:method. The multiplexed processor unwraps this encoding and passes it on to the appropriate processor.
I have used a similar method to encode arbitrary key/value pairs (like HTTP headers) inside the method name.
The downside to this is that you need to write a more complicated extension for each platform you will be using, although only once per platform. How this works varies a bit from language to language, but it is generally simple enough once you have figured it out.
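To show just the encoding idea (independent of any Thrift version), here is a small Go sketch that packs key/value pairs into the method name and unpacks them again. In a real implementation this logic would live in a TProtocol wrapper's WriteMessageBegin/ReadMessageBegin, the same way the multiplexed protocol handles the service name; the separators chosen here are arbitrary.

```go
package main

import (
	"fmt"
	"strings"
)

// wrapMethodName prepends header pairs to the method name,
// producing something like "session=abc123#getUser".
func wrapMethodName(method string, headers map[string]string) string {
	var pairs []string
	for k, v := range headers {
		pairs = append(pairs, k+"="+v)
	}
	return strings.Join(pairs, "&") + "#" + method
}

// unwrapMethodName splits the headers back out on the receiving side.
func unwrapMethodName(wire string) (method string, headers map[string]string) {
	headers = map[string]string{}
	idx := strings.LastIndex(wire, "#")
	if idx < 0 {
		return wire, headers // no headers present, plain method name
	}
	for _, pair := range strings.Split(wire[:idx], "&") {
		if k, v, ok := strings.Cut(pair, "="); ok {
			headers[k] = v
		}
	}
	return wire[idx+1:], headers
}

func main() {
	wire := wrapMethodName("getUser", map[string]string{"session": "abc123"})
	method, headers := unwrapMethodName(wire)
	fmt.Println(wire, method, headers)
}
```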
These are just a few ideas I have had, and I am sure there are others. The nice thing about Thrift is how the individual components are decoupled from each other. If you have special needs, you can swap any of them out as needed to add specific functionality.
