In our Java app we need to accept a (large) gRPC message, extract a field, and then, based on the value of that field, forward the message on to another server.
I'm trying to avoid the overhead of completely deserializing the message before passing it on.
One way to do this would be to send the field as a separate query or header parameter, but gRPC doesn't support them.
Another way would be to extract just the field of interest from the payload, but Protobuf doesn't support partial or selective deserialization.
How else can I do this?
One way to do this is on the sending server's side. When that server is about to send a response, it can extract the field and set it as part of the initial headers. You can do this with a ServerInterceptor that extracts the field you want from the response and adds it to the Metadata.
Aside from that, Protocol buffers currently require that you parse the message before accessing the internal fields.
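That said, the generated classes are not the only way to look at the bytes. The wire format is a flat sequence of (tag, value) pairs, so a top-level scalar field can be read by walking the tags and skipping everything else. Below is a minimal sketch of that idea, assuming the field of interest is a top-level varint field (int32, int64, bool, enum and so on). WireScan and readVarintField are hypothetical names, not part of any protobuf library, and groups (wire types 3 and 4) are deliberately rejected.

```java
import java.util.OptionalLong;

/** Hand-rolled scan of the protobuf wire format; hypothetical helper, not a library API. */
final class WireScan {

    /** Returns the last top-level varint stored under fieldNumber, or empty if absent. */
    static OptionalLong readVarintField(byte[] msg, int fieldNumber) {
        OptionalLong result = OptionalLong.empty();
        int[] pos = {0}; // cursor into msg, shared with readVarint
        while (pos[0] < msg.length) {
            long tag = readVarint(msg, pos);
            int number = (int) (tag >>> 3);   // upper bits: field number
            int wireType = (int) (tag & 7);   // lower 3 bits: wire type
            switch (wireType) {
                case 0: { // varint
                    long value = readVarint(msg, pos);
                    // Keep scanning: for scalar fields the last occurrence wins.
                    if (number == fieldNumber) result = OptionalLong.of(value);
                    break;
                }
                case 1: // fixed 64-bit
                    skip(msg, pos, 8);
                    break;
                case 2: { // length-delimited: strings, bytes, nested messages
                    long len = readVarint(msg, pos);
                    skip(msg, pos, len);
                    break;
                }
                case 5: // fixed 32-bit
                    skip(msg, pos, 4);
                    break;
                default:
                    throw new IllegalArgumentException("unsupported wire type " + wireType);
            }
        }
        return result;
    }

    private static void skip(byte[] buf, int[] pos, long count) {
        if (count < 0 || count > buf.length - pos[0]) {
            throw new IllegalArgumentException("truncated message");
        }
        pos[0] += (int) count;
    }

    private static long readVarint(byte[] buf, int[] pos) {
        long value = 0;
        for (int shift = 0; shift < 64; shift += 7) {
            if (pos[0] >= buf.length) throw new IllegalArgumentException("truncated varint");
            byte b = buf[pos[0]++];
            value |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) return value; // high bit clear: last byte of the varint
        }
        throw new IllegalArgumentException("varint too long");
    }
}
```

Nested messages are skipped as opaque length-delimited blobs, so nothing is allocated for them. The trade-off is that you now own a piece of wire-format code that has to be kept in step with the .proto file by hand.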
Related
What I want to do is to validate the data inside a protobuf message before I send it to an external network. This is providing a security check.
The problem is that protobufs allow sending additional fields using an updated proto file, which allows backwards compatibility.
What this means is when I go to check a message, my autogenerated code parses the object, but drops the unknown fields. So this means the transmitted bytes could have information I don't know about.
A workaround would be to transmit only the version of the data I have parsed and checked, which would mean dropping the new fields. That's the right thing to do for security, but I still won't know that someone is sending me a new version of the message. It would be nice to log that and be told I might need to update. I also want to communicate back to the sender that some of their data is being dropped.
Is there a way to know if the format of the message I received mismatches from the format I expect to receive?
I need a comprehensive list of what is valid and what is invalid when sending a bulk payload to Elasticsearch.
The bulk endpoint is a write endpoint. So at a high level you can only send document write actions to it (index, create, update, delete).
Since it's bulk, a valid request is designed around multiple docs and how they are delimited. E.g. if you are not using an ES client, then you will need to format the payload as NDJSON (newline-delimited JSON), and the last line must end in a newline as well. It's better to use a client, since a client will do all of this for you.
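If you do end up building the body by hand, here is a minimal sketch of the layout: one action line, then one source line per document, each terminated by a newline. BulkBody and build are hypothetical names, the sources are assumed to be single-line JSON already, and the index name and ids are assumed to need no JSON escaping.

```java
import java.util.Map;

/** Hypothetical helper that lays out "index" actions as an NDJSON bulk body. */
final class BulkBody {

    /** idToSource maps a document id to its single-line JSON source. */
    static String build(String index, Map<String, String> idToSource) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> e : idToSource.entrySet()) {
            if (e.getValue().indexOf('\n') >= 0) {
                // A raw newline inside a document would break the line-based framing.
                throw new IllegalArgumentException("source must be single-line JSON");
            }
            // Action line: what to do and where (no escaping of index/id in this sketch).
            body.append("{\"index\":{\"_index\":\"").append(index)
                .append("\",\"_id\":\"").append(e.getKey()).append("\"}}\n");
            // Source line: the document itself, followed by the mandatory newline.
            body.append(e.getValue()).append('\n');
        }
        return body.toString();
    }
}
```

The resulting string is what you would POST to the _bulk endpoint with Content-Type application/x-ndjson. Note that the final newline falls out of the loop naturally, which is the part people most often get wrong when concatenating by hand.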
Apart from the syntax of the payload, you can target an index (and, in older versions, a type) in the URL.
You can also send other parameters such as "refresh" or "wait_for_active_shards" on the request, and "retry_on_conflict" on individual update actions. These parameters control how the requests behave.
Needless to say, the best thing would be to read the docs:
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html
In IBM MQ, I have a requirement where I can get many types of XML from the queue. The XML messages will conform to already specified XSDs (there are, say, 5 XSDs, which means I can get 5 different kinds of XML). When I get a message from the queue, I would like to know the type of the XML (whether it's xsd1 or xsd2 and so on).
The reason I want to know is that I am using a JAXB interface with a SAX implementation, for which I need to give the Java object corresponding to the XML as a parameter. So I have to know which XSD the input conforms to and assign the parameter correspondingly.
One option I have is to set a property in the header of the message, but the party who is dropping the message into MQ is not ready to do that.
What other options do I have? Can I get the file name (of the XML) from MQ and find the XSD based on the name of the file? Or do I have to do SAX parsing, identify the root tag and derive the XSD type from it? Does anybody have a better option in mind?
Think of MQ like the Post Office. When you get a letter, the post office doesn't mess with anything on the inside (the payload) and if it changes the outside, it only changes routing information. If you want to sort incoming mail to different recipients, whoever is sending it has to put the data against which the sort criteria operate on the outside of the envelope. If that doesn't work, you must open the envelope and look for the recipient name, department, or whatever on the papers inside.
Your MQ message is that envelope. The sort criteria can be different queue names, a property of the message, a property of the message header, or something in the payload. But unless the sender explicitly sets the destination queue name based on the selection criteria, or sets the message or header property, your only option is to inspect the payload and figure it out.
If you have to inspect the payload, this is a perfect scenario for IBM Integration Broker. But you can also write an application to perform this function. Very often this is performed by a Dispatch app which gets the message, figures out where it goes, then puts it onto another queue and COMMITs the GET and PUT operations. But if the dispatch app must parse the XML to determine the correct queue, the message has to be parsed twice - once by the dispatcher, once by the receiving app.
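If you go the "open the envelope" route in Java, the first parse does not have to be a full one: the dispatcher can stop at the root element. A minimal sketch follows, assuming the five XSDs have distinct root element names. It uses the JDK's StAX pull parser rather than SAX because stopping early is simpler with a pull API. RootElementPeek and rootElementName are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical helper: reads just far enough into the payload to see the root element. */
final class RootElementPeek {

    static String rootElementName(String xml) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // The payload comes from another party, so DTDs and external entities are disabled.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    // Comments and processing instructions before the root are skipped.
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        return reader.getLocalName(); // stop here, rest stays unparsed
                    }
                }
                throw new IllegalArgumentException("no root element found");
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("payload is not well-formed XML", e);
        }
    }
}
```

The returned name can then be mapped to the matching JAXB class (or to a target queue, in the dispatcher variant). If two XSDs share a root name, reader.getNamespaceURI() at the same point would be the next thing to compare.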
Here is something you could check:
Does the incoming message have the file name at the beginning of the message body? In that case, after receiving the message your application can read the first few bytes to get the file name. Based on the file name, the application can pick the appropriate XSD and pass on the entire message body.
Our server A notifies 3rd party server B with an XML-formatted message, sent as HTTP POST request. It's us who specify the message format and other aspects of interaction.
We can specify that the XML is sent as
a) raw data (just the XML)
b) single POST parameter having some specific name (say, xml=XML)
The question is which way is better for the 3rd party in general, if we don't know the platform and language they are using.
I thought I had seen problems in certain languages with easily parsing nameless raw data, though I don't remember any specific case. My colleague, on the other hand, insists that the parameter name is redundant and that it's really better to send the raw data without any name.
If you don't need to send extra information in other POST parameters, the xml parameter name is redundant and unnecessary, as your teammate said. If the 3rd party expects only XML, send just the raw data in the POST body with the correct MIME type and encoding, and don't complicate things.
Getting at the raw data is easy in most application server containers, so you needn't worry about that; most of them give you a Reader for reading and manipulating the received data.
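For what it's worth, this is roughly what the raw variant looks like from the sending side in Java 11+. It is only a sketch: XmlNotification and build are hypothetical names, and the endpoint is whatever URL the 3rd party gives you.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: the XML is the whole request body, with no parameter name. */
final class XmlNotification {

    static HttpRequest build(URI endpoint, String xml) {
        return HttpRequest.newBuilder(endpoint)
                // MIME type and encoding tell the receiver how to read the body.
                .header("Content-Type", "application/xml; charset=utf-8")
                .POST(HttpRequest.BodyPublishers.ofString(xml, StandardCharsets.UTF_8))
                .build();
    }
}
```

Sending it is then a single call such as HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()). With the xml=XML variant the body would have to be URL-encoded on your side and decoded again on theirs, which is extra work on both ends for no gain.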
I have a client-server application where the server transmits serialized objects in protobuf format to a client, and I would like to retire a required field. Unfortunately I can't change both client and server at the same time to use the new .proto definition.
If I change a required field to be optional, but only for code that serializes messages (i.e. deserializing code has not been rebuilt and still thinks it's a required field), can I continue to publish messages that can be deserialized as long as I populate a value for the now-optional field?
(It appears to be fine to do so, at least for a few trivial cases I experimented with (only using Java), but I'm interested if it's a generally sensible approach, and whether there are any edge cases etc I should worry about).
Motivation: My goal is to retire a required field in a client-server application where the server publishes messages that are deserialized by the client. My intended approach is:
1. Change the required field to optional on the trunk.
2. If it's necessary to deploy new server code (for unrelated features/fixes), ensure that the optional field continues to be populated in the message.
3. Gradually deploy updated code for all clients (this will take time as it requires involvement of other teams with their own release schedules).
4. Confirm that all clients have been updated.
5. Begin to retire (i.e. not populate) the optional field.
According to the encoding format documentation, whether a field is required or not is not encoded in the serialized byte stream itself. That is, optional or required makes no difference to the encoded serialized message.
I've confirmed this in practice, using the Java generated code, by writing serialized messages to disk and comparing the output - using a message containing all of the supported primitive types as well as fields representing other types.
As long as the field is set, using the parseFrom(byte[]) method to deserialize will still work, because the byte[] will be the same.
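To make that concrete, here is a small hand-written encoder for a single varint field. It is a sketch of the wire layout only, not the generated code, and WireEncode and encodeVarintField are hypothetical names. The tag is built from the field number and the wire type and nothing else, so there is simply nowhere for "required" or "optional" to be recorded.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical helper that encodes one varint field the way the wire format lays it out. */
final class WireEncode {

    static byte[] encodeVarintField(int fieldNumber, long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int wireType = 0; // 0 = varint
        // The tag holds only the field number and wire type; no required/optional bit exists.
        writeVarint(out, ((long) fieldNumber << 3) | wireType);
        writeVarint(out, value);
        return out.toByteArray();
    }

    private static void writeVarint(ByteArrayOutputStream out, long v) {
        // Emit 7 bits at a time, least significant group first, high bit set on all but the last.
        while ((v & ~0x7FL) != 0) {
            out.write((int) ((v & 0x7F) | 0x80));
            v >>>= 7;
        }
        out.write((int) v);
    }
}
```

Field number 1 with value 150 comes out as the bytes 08 96 01, which is the same example the encoding documentation uses. The same bytes result whether the .proto declares the field as required or optional.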
However, one would wonder why you would change the field from required to optional until you are ready to allow it to be optional? Basically you are just making it "optional" in the .proto file, but you are enforcing that it is required by always populating it. Just a thought.