I am working on a project to read from our existing ElasticSearch instance and produce messages in Pulsar. If I do this in a highly multithreaded way without any explicit synchronization, I get many occurances of the following log line:
Message with sequence id X might be a duplicate but cannot be determined at this time.
That is produced from this line of code in the Pulsar Java client:
https://github.com/apache/pulsar/blob/a4c3034f52f857ae0f4daf5d366ea9e578133bc2/pulsar-client/src/main/java/org/apache/pulsar/client/impl/ProducerImpl.java#L653
When I add a synchronized block to my method, synchronizing on the pulsar template, the error disappears, but my publish rate drops substantially.
Here is the current working implementation of my method that sends Protobuf messages to Pulsar:
public <T extends GeneratedMessageV3> CompletableFuture<MessageId> persist(T o) {
var descriptor = o.getDescriptorForType();
PulsarPersistTopicSettings settings = pulsarPersistConfig.getSettings(descriptor);
MessageBuilder<T> messageBuilder = Optional.ofNullable(pulsarPersistConfig.getMessageBuilder(descriptor))
.orElse(DefaultMessageBuilder.DEFAULT_MESSAGE_BUILDER);
Optional<ProducerBuilderCustomizer<T>> producerBuilderCustomizerOpt =
Optional.ofNullable(pulsarPersistConfig.getProducerBuilder(descriptor));
PulsarOperations.SendMessageBuilder<T> sendMessageBuilder;
sendMessageBuilder = pulsarTemplate.newMessage(o)
.withSchema(Schema.PROTOBUF_NATIVE(o.getClass()))
.withTopic(settings.getTopic());
producerBuilderCustomizerOpt.ifPresent(sendMessageBuilder::withProducerCustomizer);
sendMessageBuilder.withMessageCustomizer(mb -> messageBuilder.applyMessageBuilderKeys(o, mb));
synchronized (pulsarTemplate) {
try {
return sendMessageBuilder.sendAsync();
} catch (PulsarClientException re) {
throw new PulsarPersistException(re);
}
}
}
The original version of the above method did not have the synchronized(pulsarTemplate) { ... } block. It performed faster, but generated a lot of logs about duplicate messages, which I knew to be incorrect. Adding the synchronized block got rid of the log messages, but slowed down publishing.
What are the best practices for multithreaded access to the PulsarTemplate? Is there a better way to achieve very high throughput message publishing?
Should I look at using the reactive client instead?
EDIT: I've updated the code block to show the minimum synchronization necessary to avoid the log lines, which is just synchronizing during the .sendAsync(...) call.
Your usage w/o the synchronized should work. I will look into that though to see if I see anything else going on. In the meantime, it would be great to give the Reactive client a try.
This issue was initially tracked here, and the final resolution was that it was an issue that has been resolved in Pulsar 2.11.
Please try updating the Pulsar 2.11.
Related
Given default configuration and this binding
#Bean
public Function<Flux<Message<Input>>, Flux<Message<Output>>> process() {
return input -> input
.map(message -> {
// simplified
return MessageBuilder.build();
});
}
Is there any guarantee that input message offset is commited after output is written to Kafka? I don´t need full Transactions, and I can live with at-least-once delivery and possible duplicates, but I cannot loose output message. I was unable to find this exact scenario in docs, and I believe previous channel-based binding worked as I need it to, since it was blocking by nature, but I am not sure about functional.
I'm using ktor for server side development with websockets.
Documentations shows us this example of using incoming channel:
for (frame in incoming.mapNotNull { it as? Frame.Text }) {
// some
}
But mapNotNull is marked as deprecated in favor of Flow. How should I use this API and what problems could be there? For example, the Flow is a cold stream. It means that the producer function will be called on each collect. How does it work in context of websocket. Will it be reopened on second collect call, or maybe old messages will be delivered once after the next collect? How can I collect N messages, then stop collecting, then collect again?
Thanks in advance :)
How should I use this API and what problems could be there?
What I am using and what I have seen in one of the examples somewhere in the docs is the consumeAsFlow() method called on ReceiveChannel. Here is the entire snippet:
webSocket("/websocket") { //this: DefaultWebSocketServerSession
incoming
.consumeAsFlow()
.map { receive(it) }
.collect()
}
Haven't seen major issues with this approach. One thing you should be aware of (but that goes for the non-flow approach as well) is that if you throw inside your flow, then it will break the WebSocket connection, which is usually not something you'd like to do. It might be worth considering wrapping the entire thing in a try-catch.
Will it be reopened on second collect call, or maybe old messages will be delivered once after the next collect?
You open the websocket before you even start consuming the messages from the flow. You can see that inside webSocket() {} you are in the context of DefaultWebSocketServerSession. This is your connection management. Inside your flow you are simply receiving messages one by one as they arrive (after the connection has been established). If the connection breaks, then you're out of the flow. It needs to be re-established before you can process your messages. This establishing bit is done by the Route.webSocket() method. I do recommend taking a look at its Javadoc.
If you wish to add some clean up after the connection is closed you can add a finally block like so:
webSocket("/chat") {
try {
incoming
.consumeAsFlow()
.map { receive(it, client) }
.collect()
} finally {
// cleanup
}
}
In short: collect is called once per received message. If there is no connection (or it was broken) then collect won't be called.
How can I collect N messages, then stop collecting, then collect again?
What is the use case for this? I don't think you should be doing this with any flow. You can of course take(n) items from a flow, but you won't be able to take any more from it again.
Hello all who read this,
We have written a router function on azure in an app plan that receives messages from iothub
and depending the message type we route our message to another eventhub.
Previously we had 6 out bindings to eventhubs in this function
Recently we added 3 more message type so 3 more out binding to 3 more eventhubs
No processing of the messages happen in this function but what we see now is that we spend 16 times more time in the routing function.
Is there a performance issue about having multiple output bindings.
We don't see an increase in load of the incoming messages.
We are running on azure functions 1.0 (Runtime version: 1.0.12205.0 (~1))
Regards Ben
Simplified Sample code of the routing function
public static class IotHubRouterFunction
{
[FunctionName("IotHubRouterFunction")]
public static void Run([EventHubTrigger("%iothub%", Connection = "IothubRouterListen")]EventData myEventHubData,
[EventHub("%msg1-eventhub%", Connection = "msg1event")] ICollector<EventData> eventHub4Dmsg1Event,
[EventHub("%msg2-eventhub%", Connection = "msg2event")] ICollector<EventData> eventHub4Dmsg2Event,
[EventHub("%msg3-eventhub%", Connection = "msg3event")] ICollector<EventData> eventHub4Dmsg3Event,
//... like 6 more bindings like this
ILogger logger
)
{
try
{
var messageType = GetValue(myEventHubData.Properties, "type");
// routing
switch (messageType)
{
case "msg1event":
{
eventHub4DevicesStatusChanged.Add(eventHub4Dmsg1Event);
break;
}
case "msg2event":
{
eventHub4MeasurementLog.Add(eventHub4Dmsg2Event);
break;
}
case "msg3event":
{
eventHub4DeviceDiscovered.Add(eventHub4Dmsg3Event);
break;
}
//6 more cases like this
default:
{
logger.LogError("Unrouteable message of type: {messageType}", messageType);
break;
}
}
}
catch (Exception ex)
{
//removed
}
}
}
With 6 bindings the message fly through the router function at 50ms
With 9 bindings the message crawl through the router function at 800ms
CPU raised with 30% as well on the applan (we scaled extra so we have it under control but why so much what is causing this)
A little late with the follow up of what happened
In the end we found out what was going on
We have several instances of our app plan
but the old monitoring solution showed the average of the cpu and memory overall the instances of the applan.
Basically with switching to the newer metrics and azure monitoring we were able to drill down in the separate instances of the app plan and the instances of the functions.
We found out that one instance of a function which was running three times two of them norammly but the third function had crashed it's internal apppool and consumed all cpu power it got hold off and did absolutely nothing.
We restarted the function and all issues were gone.
Still wondering if it was something in our code that made it go through the roof
or that something happened in azure that made it go crazy.
:-s
When you are using Azure Function under App service plan then you have to watch out for performance parameters like scaling. Have you investigated your function is not getting overloaded ?
On the other hand , As part of your design this approach is wrong to me. With this many bindings there could be potential performance issues , and what if you are supposed to add more bindings in future ? If you are not performing any operation then you shouldn't be taking overhead of redirecting messages.
Event Grid
We can use event grids for that. Based on topic the IoT hub publishes the event to a topic and events are consumed by subscribers in your case other event hubs. You also get advantage of micro billing (serverless) and auto scaling as well. https://learn.microsoft.com/en-us/azure/event-grid/overview
I've been working with spring-boot 2.0.0.RC1 using the webflux starter (spring-boot-starter-webflux). I created a simple controller that returns a infinite flux. I would like that the Publisher only does its work if there is a client (Subscriber). Let's say I have a controller like this one:
#RestController
public class Demo {
#GetMapping(value = "/")
public Flux<String> getEvents(){
return Flux.create((FluxSink<String> sink) -> {
while(!sink.isCancelled()){
// TODO e.g. fetch data from somewhere
sink.next("DATA");
}
sink.complete();
}).doFinally(signal -> System.out.println("END"));
}
}
Now, when I try to run that code and access the endpoint http://localhost:8080/ with Chrome, then I can see the data. However, once I close the browser the while-loop continues since no cancel event has been fired. How can I terminate/cancel the streaming as soon as I close the browser?
From this answer I quote that:
Currently with HTTP, the exact backpressure information is not
transmitted over the network, since the HTTP protocol doesn't support
this. This can change if we use a different wire protocol.
I assume that, since backpressure is not supported by the HTTP protocol, it means that no cancel request will be made either.
Investigating a little bit further, by analyzing the network traffic, showed that the browser sends a TCP FIN as soon as I close the browser. Is there a way to configure Netty (or something else) so that a half-closed connection will trigger a cancel event on the publisher, making the while-loop stop?
Or do I have to write my own adapter similar to org.springframework.http.server.reactive.ServletHttpHandlerAdapter where I implement my own Subscriber?
Thanks for any help.
EDIT:
An IOException will be raised on the attempt to write data to the socket if there is no client. As you can see in the stack trace.
But that's not good enough, since it might take a while before the next chunk of data will be ready to send and therefore it takes the same amount of time to detect the gone client. As pointed out in Brian Clozel's answer it is a known issue in Reactor Netty. I tried to use Tomcat instead by adding the dependency to the POM.xml. Like this:
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-tomcat</artifactId>
</dependency>
Although it replaces Netty and uses Tomcat instead, it does not seem reactive due to the fact that the browser does not show any data. However, there is no warning/info/exception in the console. Is spring-boot-starter-webflux as of this version (2.0.0.RC1) supposed to work together with Tomcat?
Since this is a known issue (see Brian Clozel's answer), I ended up using one Flux to fetch my real data and having another one in order to implement some sort of ping/heartbeat mechanism. As a result, I merge both together with Flux.merge().
Here you can see a simplified version of my solution:
#RestController
public class Demo {
public interface Notification{}
public static class MyData implements Notification{
…
public boolean isEmpty(){…}
}
#GetMapping(value = "/", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<ServerSentEvent<? extends Notification>> getNotificationStream() {
return Flux.merge(getEventMessageStream(), getHeartbeatStream());
}
private Flux<ServerSentEvent<Notification>> getHeartbeatStream() {
return Flux.interval(Duration.ofSeconds(2))
.map(i -> ServerSentEvent.<Notification>builder().event("ping").build())
.doFinally(signalType ->System.out.println("END"));
}
private Flux<ServerSentEvent<MyData>> getEventMessageStream() {
return Flux.interval(Duration.ofSeconds(30))
.map(i -> {
// TODO e.g. fetch data from somewhere,
// if there is no data return an empty object
return data;
})
.filter(data -> !data.isEmpty())
.map(data -> ServerSentEvent
.builder(data)
.event("message").build());
}
}
I wrap everything up as ServerSentEvent<? extends Notification>. Notification is just a marker interface. I use the event field from the ServerSentEvent class in order to separate between data and ping events. Since the heartbeat Flux sends events constantly and in short intervals, the time it takes to detect that the client is gone is at most the length of that interval. Remember, I need that because it might take a while before I get some real data that can be sent and, as a result, it might also take a while before it detects that the client is gone. Like this, it will detect that the client is gone as soon as it can’t sent the ping (or possibly the message event).
One last note on the marker interface, which I called Notification. This is not really necessary, but it gives some type safety. Without that, we could write Flux<ServerSentEvent<?>> instead of Flux<ServerSentEvent<? extends Notification>> as return type for the getNotificationStream() method. Or also possible, make getHeartbeatStream() return Flux<ServerSentEvent<MyData>>. However, like this it would allow that any object could be sent, which I don’t want. As a consequence, I added the interface.
I'm not sure why this behaves like this, but I suspect it is because of the choice of generation operator. I think using the following would work:
return Flux.interval(Duration.ofMillis(500))
.map(input -> {
return "DATA";
});
According to Reactor's reference documentation, you're probably hitting the key difference between generate and push (I believe a quite similar approach using generate would probably work as well).
My comment was referring to the backpressure information (how many elements a Subscriber is willing to accept), but the success/error information is communicated over the network.
Depending on your choice of web server (Reactor Netty, Tomcat, Jetty, etc), closing the client connection might result in:
a cancel signal being received on the server side (I think this is supported by Netty)
an error signal being received by the server when it's trying to write on a connection that's been closed (I believe the Servlet spec does not provide that that callback and we're missing the cancel information).
In short: you don't need to do anything special, it should be supported already, but your Flux implementation might be the actual problem here.
Update: this is a known issue in Reactor Netty
I have to send a lot of data to I client connected to my server in small blocks.
So, I have something like:
for(;;) {
messageEvent.getChannel().write("Hello World");
}
The problem is that, for some reason, client is receiving dirty data, like Netty buffer is not clear at each iteration, so we got something like "Hello WorldHello".
If I make a little change in my code putting a thread sleep everything works fine:
for(;;) {
messageEvent.getChannel().write("Hello World");
Thread.sleep(1000);
}
As MRAB said, if the server is sending multiple messages on a channel without indicating the end of each message, then client can not always read the messages correctly. By adding sleep time after writing a message, will not solve the root cause of the problem either.
To fix this problem, have to mark the end of each message in a way that other party can identify, if client and server both are using Netty, you can add LengthFieldPrepender and LengthFieldBasedFrameDecoder before your json handlers.
String encodedMsg = new Gson().toJson(
sendToClient,newTypeToken<ArrayList<CoordinateVO>>() {}.getType());
By default, Gson uses html escaping for content, sometime this will lead to wired encoding, you can disable this if required by using a Gson factory
final static GsonBuilder gsonBuilder = new GsonBuilder().disableHtmlEscaping();
....
String encodedMsg = gsonBuilder.create().toJson(object);
In neither case are you sending anything to indicate where one item ends and the next begins, or how long each item is.
In the second case the sleep is getting the channel time out and flush, so the client sees a 'break', which it interprets as the end of the item.
The client should never see this "dirty data". If thats really the case then its a bug. But to be hornest I can't think of anything that could lead to this in netty. As every Channel.write(..) event will be added to a queue which then get written to the client when possible. So every data that is passed in the write(..) method will just get written. There is no "concat" of the data.
Do you maybe have some custom Encoder in the pipeline that buffers the data before sending it to the client ?
It would also help if you could show the complete code that gives this behavoir so we see what handlers are in the pipeline etc.