Can I write different types of messages to a chronicle-queue?

I would like to write different types of messages to a chronicle-queue, and process messages in consumers depending on their types.
How can I do that?

Chronicle Queue provides low-level building blocks you can use to write any kind of message, so it is up to you to choose the right data structure.
For example, you can prefix the data you write to a queue with a small header containing some metadata, and then use it as a discriminator for data processing.

To achieve this I use Wire:
try (DocumentContext dc = appender.writingDocument())
{
    final Wire wire = dc.wire();
    final ValueOut valueOut = wire.getValueOut();
    valueOut.typePrefix(m.getClass());
    valueOut.marshallable(m);
}
When reading back I do:
try (DocumentContext dc = tailer.readingDocument())
{
    final Wire wire = dc.wire();
    final ValueIn valueIn = wire.getValueIn();
    final Class clazz = valueIn.typePrefix();
    // msgPool is a preallocated map from message class to a reusable instance
    final ReadMarshallable readObject = msgPool.get(clazz);
    valueIn.readMarshallable(readObject);
    // readObject can now be used
}

You can also write/read a generic object. This will be slightly slower than using your own scheme, but it is a simple way to always read back the type you wrote.
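A minimal sketch of that generic approach, assuming Chronicle Wire's object() methods on ValueOut/ValueIn (method names may differ slightly between versions):
// Writing: let Wire record the concrete type together with the data
try (DocumentContext dc = appender.writingDocument())
{
    dc.wire().getValueOut().object(m);
}

// Reading: Wire reconstructs whatever type was written
try (DocumentContext dc = tailer.readingDocument())
{
    final Object msg = dc.wire().getValueIn().object();
    // dispatch on msg.getClass() or with instanceof
}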

Related

Best way to model gRPC messages

I would like to model messages for bidirectional streaming. In both directions I can expect different types of messages, and I am unsure which would be the better practice. The two ideas I have as of now:
message MyMessage {
MessageType type = 1;
string payload = 2;
}
In this case I would have an enum that defines which type of message it is, and a JSON payload that will be serialized and deserialized into models on both the client and server side. The second approach is:
message MyMessage {
oneof type {
A typeA = 1;
B typeB = 2;
C typeC = 3;
}
}
In the second example a oneof is defined such that only one of the message types can be set. On both sides a switch must be made over the cases (A, B, C, or none set).
If you know all of the possible types ahead of time, using oneof would be the way to go here as you have described.
The major reason for using protocol buffers is the schema definition. With the schema definition, you get types, code generation, safe schema evolution, efficient encoding, etc. With the oneof approach, you will get these benefits for your nested payload.
I would not recommend using string payload since using a string for the actual payload removes many of the benefits of using protocol buffers. Also, even though you don't need a switch statement to deserialize the payload, you'll likely need a switch statement at some point to make use of the data (unless you're just forwarding the data on to some other system).
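For illustration, this is roughly what that dispatch looks like with the Java code generated for the oneof above (handleA/handleB/handleC are placeholders for your own handlers):
MyMessage msg = ...; // received from the stream
switch (msg.getTypeCase()) {
    case TYPEA:
        handleA(msg.getTypeA());
        break;
    case TYPEB:
        handleB(msg.getTypeB());
        break;
    case TYPEC:
        handleC(msg.getTypeC());
        break;
    case TYPE_NOT_SET:
        // nothing was set; decide whether that is an error for your protocol
        break;
}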
Alternative option
If you don't know the possible types ahead of time, you can use the Any type to embed an arbitrary protocol buffer message into your message.
import "google/protobuf/any.proto";
message MyMessage {
google.protobuf.Any payload = 1;
}
The Any message is basically your string payload approach, but using bytes instead of string.
message Any {
string type_url = 1;
bytes value = 2;
}
Using Any has the following advantages over string payload:
Encourages the use of protocol buffer to define the dynamic payload contents
Tooling in the protocol buffer library for each language for packing and unpacking protocol buffer messages into the Any type
For more information on Any, see:
https://developers.google.com/protocol-buffers/docs/proto3#any
https://github.com/protocolbuffers/protobuf/blob/master/src/google/protobuf/any.proto
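As a rough Java illustration of that packing/unpacking tooling (Content stands in for whatever concrete payload message you define):
// Packing: wrap any concrete protobuf message into the Any payload
MyMessage request = MyMessage.newBuilder()
        .setPayload(Any.pack(content)) // content is an instance of Content
        .build();

// Unpacking: check the embedded type before unpacking it
Any payload = request.getPayload();
if (payload.is(Content.class)) {
    Content unpacked = payload.unpack(Content.class); // throws InvalidProtocolBufferException
}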

How do I combine a Kafka Processor with a Kafka Streams application?

I am trying to construct a Kafka Streams operator that takes in a Stream<timestamp, data> and outputs another Stream where the timestamps are sorted in ascending order; the purpose is to deal with streams that have "out of order" entries due to delays in the supplier.
At first, I thought about doing this with time-windowed aggregation, but then I happened upon a solution using a Kafka Processor. I figured I could then say something like:
class SortProcessor implements Processor<timestamp,data> ...
class SortProcessorSupplier ...supplies suitably initialized SortProcessor
KStream<timestamp,data> input_stream = ...sourced from "input_topic"
KStream<timestamp,data> output_stream =
input_stream.process( new SortProcessorSupplier(...parameters...) );
However, this doesn't work because KStream.process returns void.
So, my question is: How do I "wrap" the Processor so that I can use it as follows:
KStream<timestamp,data> input_stream = ...sourced from "input_topic"
KStream<timestamp,data> output_stream =
new WrappedSortProcessor( input_stream, ...parameters... )
Instead of a Processor you can use a Transformer, which is very similar to a Processor but is better suited to forwarding results on to the stream. You can then invoke it from the stream using the KStream.transform() method instead of process().
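A rough sketch of that shape, using the Transformer/transform() API (Long stands in for the timestamp type and Data is a placeholder for your value type):
class SortTransformer implements Transformer<Long, Data, KeyValue<Long, Data>> {
    private ProcessorContext context;

    @Override
    public void init(ProcessorContext context) {
        this.context = context; // keep the context if you need state stores or punctuators
    }

    @Override
    public KeyValue<Long, Data> transform(Long timestamp, Data value) {
        // buffer/sort here; return a record to forward it, or null to emit nothing yet
        return KeyValue.pair(timestamp, value);
    }

    @Override
    public void close() { }
}

KStream<Long, Data> outputStream =
        inputStream.transform(SortTransformer::new);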

How to merge a serialized protobuf with a another protobuf without deserializing the first

I'm trying to understand if it's possible to take a serialized protobuf that makes up part of another protobuf and merge them together without having to deserialize the first protobuf.
For example, given a protobuf wrapper:
syntax = "proto2";
import "content.proto";
message WrapperContent {
required string metatData = 1;
required Content content = 2;
}
And then imagine we get a serialized version of the Content below (i.e. a Content that is coming from a remote client):
syntax = "proto2";
message Content {
required string name = 1;
required bytes payload = 2;
}
Do you know of any way I can inject the serialized Content into the WrapperContent without first having to deserialize Content?
The reason I'm trying to inject Content without deserializing it is that I'm trying to save on the overhead of deserializing the message.
If the answer is no, it's not possible, that is still helpful.
Thanks, Mike.
In protobuf, submessages are encoded on the wire in the same way as bytes fields (both are length-delimited).
So you can make a modified copy of your wrapper:
message WrapperContentBytes {
required string metatData = 1;
required bytes content = 2;
}
and write the already serialized content data into the content field.
Decoders can use the unmodified WrapperContent message to decode the submessage as well. The binary data on the wire will be the same, so decoders cannot tell the difference.
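A hedged Java sketch of the idea, using the two message definitions above (the writer uses the bytes variant, the reader the original wrapper):
// Writer side: inject the already-serialized Content without parsing it
byte[] serializedContent = ...; // received from the remote client
WrapperContentBytes wrapper = WrapperContentBytes.newBuilder()
        .setMetatData("some meta data")
        .setContent(ByteString.copyFrom(serializedContent))
        .build();
byte[] wireBytes = wrapper.toByteArray();

// Reader side: the wire format is identical, so the original wrapper parses it
WrapperContent parsed = WrapperContent.parseFrom(wireBytes);
Content content = parsed.getContent(); // deserialized only here, by the reader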

Protobuf: Nesting a message of arbitrary type

In short, is there a way to define a protobuf Message that contains another Message of arbitrary type? Something like:
message OuterMsg {
required int32 type = 1;
required Message nestedMsg = 2; //Any sort of message can go here
}
I suspect that there's a way to do this because in the various protobuf-implementations, compiled messages extend from a common Message base class.
Otherwise I guess I have to create a common base Message for all sorts of messages like this:
message BaseNestedMessage {
extensions 1 to max;
}
and then do
message OuterMessage {
required int32 type = 1;
required BaseNestedMessage nestedMsg = 2;
}
Is this the only way to achieve this?
The most popular way to do this is to make an optional field for each message type:
message UnionMessage
{
optional MsgType1 msg1 = 1;
optional MsgType2 msg2 = 2;
optional MsgType3 msg3 = 3;
}
This technique is also described in the official Google documentation, and is well-supported across implementations:
https://developers.google.com/protocol-buffers/docs/techniques#union
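On the reading side you then check which of the optional fields is actually populated, e.g. with the generated has-accessors in Java (the handle methods are placeholders):
UnionMessage msg = UnionMessage.parseFrom(bytes);
if (msg.hasMsg1()) {
    handle(msg.getMsg1());
} else if (msg.hasMsg2()) {
    handle(msg.getMsg2());
} else if (msg.hasMsg3()) {
    handle(msg.getMsg3());
}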
Not directly, basically; protocol buffers very much wants to know the structure in advance, and the type of the message is not included on the wire. The common Message base-class is an implementation detail for providing common plumbing code - the protocol buffers specification does not include inheritance.
There are, therefore, limited options:
use different field-numbers per message-type
serialize the message separately, and include it as a bytes type, and convey the "what is this?" information separately (presumably a discriminator / enumeration)
I should also note that some implementations may provide more support for this; protobuf-net (C# / .NET) supports (separately) both inheritance and dynamic message-types (i.e. what you have above), but that is primarily intended for use only from that one library to that one library. Because this is all in addition to the specification (remaining 100% valid in terms of the wire format), it may be unnecessarily messy to interpret such data from other implementations.
As an alternative to multiple optional fields, the oneof keyword can be used since v2.6 of Protocol Buffers.
message UnionMessage {
oneof data {
string a = 1;
bytes b = 2;
int32 c = 3;
}
}

Hadoop Custom Input format with the new API

I'm a newbie to Hadoop and I'm stuck with the following problem. What I'm trying to do is to map a shard of the database (please don't ask why I need to do that etc.) to a mapper, then do certain operations on this data, output the results to reducers, and use that output again to do a second-phase map/reduce job on the same data using the same shard format.
Hadoop does not provide any input method to send a shard of the database. You can only send data line by line using LineInputFormat and LineRecordReader. NLineInputFormat doesn't help in this case either. I need to extend FileInputFormat and RecordReader to write my own InputFormat. I have been advised to use LineRecordReader, since the underlying code already deals with the FileSplits and all the problems associated with splitting the files.
All I need to do now is to override the nextKeyValue() method, which I don't exactly know how to do.
for (int i = 0; i < shard_size; i++) {
    if (lineRecordReader.nextKeyValue()) {
        lineValue.append(lineRecordReader.getCurrentValue().getBytes(), 0,
                lineRecordReader.getCurrentValue().getLength());
    }
}
The above code snippet is the one I wrote, but somehow it doesn't work well.
I would suggest putting connection strings and some other indications of where to find the shard into your input files.
The mapper will take this information, connect to the database, and do the job. I would not suggest converting result sets to Hadoop's writable classes - it will hinder performance.
The problem I see to be addressed is having enough splits of this relatively small input.
You can simply create enough small files with a few shard references each, or you can tweak the input format to build small splits. The second way will be more flexible.
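A hedged sketch of the first suggestion, where each input line carries a shard reference and the mapper opens the connection itself (the line format and JDBC details are placeholders):
public class ShardMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // each input line looks like "jdbc:mysql://host/db|shard_42"
        String[] parts = value.toString().split("\\|");
        String connectionString = parts[0];
        String shardId = parts[1];
        try (Connection conn = DriverManager.getConnection(connectionString)) {
            // query the shard and emit results, e.g.:
            // context.write(new Text(shardId), new Text(row));
        } catch (SQLException e) {
            throw new IOException(e);
        }
    }
}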
What I did is something like this: I wrote my own record reader to read n lines at a time and send them to the mappers as input.
public boolean nextKeyValue() throws IOException, InterruptedException {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < 5; i++) {
        if (!lineRecordReader.nextKeyValue()) {
            return false;
        }
        lineKey = lineRecordReader.getCurrentKey();
        lineValue = lineRecordReader.getCurrentValue();
        sb.append(lineValue.toString());
        sb.append(eol);
    }
    lineValue.set(sb.toString());
    return true;
}
