I got to know Chronicle-Queue from this post:
Implementing a file based queue
Here's my use case:
I have a web server (say Tomcat) which serves HTTP requests.
Processing each request might generate some tracing info.
I'll write this tracing info into a Chronicle-Queue (as byte[]; I'll do the marshalling/unmarshalling on my own, e.g. using protobuf).
I'll have a dedicated thread that uses a tailer to read from the Chronicle-Queue. Each message will be processed ONLY once; if processing fails, I'll have my own retry policy to put the message back on the queue for the next attempt.
Based on the above use case, I have the following questions:
How many appenders should be used? Should multiple threads share one appender, or should each thread have its own?
Is queue.acquireAppender() a heavy operation? Should I cache the appender to avoid calls to acquireAppender()?
If for some reason the server goes down, can the tailer remember the last successfully read entry and continue with the next one? (like a milestone feature)
How can I purge/delete old files? Is there any API to do the purge?
And another unrelated question:
Is it possible to use Chronicle-Queue to implement a file-based BlockingQueue?
Thanks
Leon
How many appenders should be used? Should multiple threads share one appender, or should each thread have its own?
I suggest you use queue.acquireAppender(); it will create appenders as needed.
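A minimal sketch of what that can look like (the queue path and the marshalling call are placeholders, not part of your code; check the builder API against the Chronicle-Queue version you use):

```java
import net.openhft.chronicle.queue.ChronicleQueue;
import net.openhft.chronicle.queue.ExcerptAppender;
import net.openhft.chronicle.queue.impl.single.SingleChronicleQueueBuilder;

// Sketch only: "trace-queue" is a placeholder path.
ChronicleQueue queue = SingleChronicleQueueBuilder.binary("trace-queue").build();

// Inside each request-handling thread:
ExcerptAppender appender = queue.acquireAppender();       // per-thread appender, created on demand
byte[] payload = "some tracing info".getBytes();           // replace with your protobuf marshalling
appender.writeBytes(b -> b.write(payload));                // append the raw bytes as one excerpt
```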
Is queue.acquireAppender() a heavy operation? Should I cache the appender to avoid calls to acquireAppender()?
It's not free, but it only costs on the order of ~100 nanoseconds.
If for some reason the server goes down, can the tailer remember the last successfully read entry and continue with the next one? (like a milestone feature)
We suggest recording the outcomes of processing the first queue to a second queue. In that queue you can record the index the tailer has reached. Supporting this without the need for a second queue is a feature we are considering.
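A rough sketch of that pattern, assuming a second "outcome" queue that just stores the index of each processed entry; readLastRecordedIndex() and process() are hypothetical helpers, and the exact index semantics should be checked against your Chronicle-Queue version:

```java
import net.openhft.chronicle.queue.ChronicleQueue;
import net.openhft.chronicle.queue.ExcerptAppender;
import net.openhft.chronicle.queue.ExcerptTailer;
import net.openhft.chronicle.queue.impl.single.SingleChronicleQueueBuilder;
import net.openhft.chronicle.wire.DocumentContext;

ChronicleQueue trace    = SingleChronicleQueueBuilder.binary("trace-queue").build();
ChronicleQueue outcomes = SingleChronicleQueueBuilder.binary("outcome-queue").build();

ExcerptTailer tailer = trace.createTailer();
ExcerptAppender outcomeAppender = outcomes.acquireAppender();

// On startup: move back to the last index recorded in the outcome queue.
long last = readLastRecordedIndex(outcomes);                  // hypothetical helper that replays the outcome queue
if (last >= 0 && tailer.moveToIndex(last))
    try (DocumentContext skip = tailer.readingDocument()) { } // step over the already-processed entry

// Processing loop in the dedicated reader thread.
for (;;) {
    try (DocumentContext dc = tailer.readingDocument()) {
        if (!dc.isPresent())
            break;                                             // nothing to read right now; poll/park and retry
        long index = dc.index();                               // index of the entry being read
        process(dc.wire().bytes());                            // hypothetical handler; unmarshal your protobuf here
        outcomeAppender.writeBytes(b -> b.writeLong(index));   // record the processed index
    }
}
```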
How can I purge/delete old files? Is there any API to do the purge?
If you set a StoreFileListener on the builder you can be notified when a file isn't needed any more.
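For example, something along these lines (a sketch; deleting immediately in the listener is the simplest policy, and you may prefer to archive instead):

```java
import net.openhft.chronicle.queue.ChronicleQueue;
import net.openhft.chronicle.queue.impl.single.SingleChronicleQueueBuilder;

// Sketch: delete each roll-cycle file once the queue reports it is no longer in use.
ChronicleQueue queue = SingleChronicleQueueBuilder.binary("trace-queue")
        .storeFileListener((cycle, file) -> {
            // onReleased: no appender or tailer needs this cycle file any more
            if (!file.delete())
                System.err.println("Could not delete " + file);
        })
        .build();
```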
I have been starting to make greater use of the message data feature of MassTransit and am getting to the point of needing to manage the message data in the store - i.e. remove old data.
The obvious choice is to have some outside process tidy up the data, but clearly a scheduled (or ad hoc) clean-up could remove data still in use or referenced by error or dead-letter queues.
Ideally I would like to limit stored message data retention to messages only in error or dead-letter queues, and automatically remove data for messages that have been successfully processed.
What would be the best approach to achieve this with MassTransit? Perhaps with a middleware approach or similar, and if so, what is the correct approach?
Manual cleanup is recommended, using whatever makes sense for the repository in use. Because messages may still be in queues, or in error/dead-letter queues as you pointed out, it is really up to development/operations team to know when the right time is to remove older message data.
I'd suggest monitoring and managing the error/dead-letter queues more aggressively, keeping them empty. And then, just figure out a good timeframe to delete old message data - one week, ten days, whatever - and deal with it that way.
I have had a backlog item to come up with a way to automatically manage message data, but since message data can be forwarded (using the same stored data) either via publish or send, there is no good way to track references.
I have a Spring Boot application (let's say it's called app-1) that is connected to a Kafka cluster and that consumes from a specific topic, let's say the topic is called "foo". Topic foo always receives a message when another application (let's say it's called app-2) has imported a new foo-item into the database.
The topic is primarily meant to be used in a third application (let's say it's called app-3) which sends out e-mail notifications to people that may be interested in this new foo-item. App-3 is clustered, meaning there are multiple instances of it running at the same time. Kafka automatically balances the foo-topic messages between all these instances because they use the same consumer id. This is good, and in the case of app-3 it is actually desired.
In the case of app-2, however, the messages from the foo-topic are used for cache eviction. The logic is, basically, that if there is a new foo-item then the currently existing caches should probably be cleared, because their content depends on the foo-items. The issue is that app-2 is also clustered, which means that, by default Kafka logic, every instance will only receive some of the messages sent to the foo-topic. This does not work correctly for this specific app though, because whenever there is a new foo-item, all of the instances need to know about it, because all of them need to clear their local caches.
From what I understand, I have these two options if I want to keep the current logic:
Introduce a distributed cache for all instances of app-2 so that they all share the same cache. Then it does not matter if only one instance receives a foo-item, because the cache eviction will also affect the cache of the other instances, even though they never learned about the foo-item. I would like to avoid this solution, as a distributed cache would add a noticeable amount of complexity and also overhead.
Somehow manage to use a different consumer id for each instance of app-2. Then they would be considered different consumers by Kafka and they would all get every foo-topic message. However, I don't even know how to do this programmatically. The code of the application is not aware of replicated instances; there is no way to access any information about what node it is. If I use a randomly generated string on startup, then each time such an instance restarts it would be considered a new consumer and would have to re-process all previous messages. That would be incorrect behavior as well.
Here is my bottom-line question: Is it possible to make all instances of app-2 receive all messages from the foo-topic without completely breaking the way Kafka is supposed to work? I know that it is probably very unconventional to use Kafka messages for cache eviction, and I am entirely able to find an alternative mechanism for the cache eviction logic that does not depend on Kafka topic messages. However, the applications are for demonstration purposes and I thought it would be cool if more than one app read from this topic. But if I end up having to hack a dirty workaround to make it work, then that's also bad for demonstration purposes, and I would rather implement an alternative way of cache eviction.
As you mentioned, you could use different consumer ids with random strings.
If notifications are being read from the beginning, then you probably have ConsumerConfig.AUTO_OFFSET_RESET_CONFIG set to "earliest" somewhere in your consumer configuration. If this is the case, removing it will probably solve your problem: when the app starts, it will only receive notifications sent after the consumer started listening.
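A sketch of that combination with the plain Kafka consumer API (broker address and group-id prefix are placeholders; with Spring Kafka you would put the same properties on the ConsumerFactory or in the application properties):

```java
import java.util.Collections;
import java.util.Properties;
import java.util.UUID;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");                 // placeholder
// A unique group id per instance => every instance receives every foo-topic message.
props.put(ConsumerConfig.GROUP_ID_CONFIG, "app-cache-eviction-" + UUID.randomUUID());
// "latest" (the default) means a brand-new group id starts at the end of the topic,
// so a restarted instance does not replay the whole history.
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Collections.singletonList("foo"));
```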
I am stuck in a typical use case or scenario where I am not sure what the behavior of Kafka will be.
SCENARIO: I am using Spring Kafka with Spring Boot. In my application I have one REST endpoint which reads all messages from the beginning of a topic to check whether a message is a duplicate, and then writes the message to the topic if it is not.
I am confused about what the behavior of the application will be when multiple instances of the same microservice are deployed and the offset is moved for the seek-from-beginning operation.
A few questions in my mind are:
Does reading from the beginning of a topic (with the help of seek) block the topic?
If yes, then how do I solve this use case, where we have to validate a message for duplication before writing it to the topic?
Using a DB is not a solution because it would be resource intensive and make the application slower.
Thanks everyone in advance
Sounds like you need the Log Compaction feature:
Log compaction ensures that Kafka will always retain at least the last known value for each message key within the log of data for a single topic partition.
Therefore, when you specify a unique message key, you won't have more than one record for that key in the partition. And with that, you don't need to read the topic before storing at all.
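For illustration, a compacted topic can be created roughly like this with the Kafka AdminClient (broker address, topic name, partition count and replication factor are placeholders):

```java
import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

public class CreateCompactedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        try (AdminClient admin = AdminClient.create(props)) {
            // cleanup.policy=compact keeps at least the latest record for every message key
            NewTopic topic = new NewTopic("my-topic", 3, (short) 1)              // placeholders
                    .configs(Map.of(TopicConfig.CLEANUP_POLICY_CONFIG,
                                    TopicConfig.CLEANUP_POLICY_COMPACT));
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }
    }
}
```

The producer then simply writes every message with its unique key; compaction eventually drops older records with the same key, so duplicates never accumulate.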
I am using ActiveMQ and want to generate alerts for messages which have been sitting in the queue for a very long time. I looked at the "Advisory Message" feature but it has no such provision. It is very important for me to use a solution which does not add too much overhead on ActiveMQ.
Note: This requirement is very different from alerts when a message moves to the DLQ after expiry.
The only real means of reviewing what is in a queue is to browse it, and the broker places limitations on how far into the contents of the queue you can browse.
A message broker is not a database and you should not try to treat it as such. If you have concerns about things remaining on a queue for too long, then explicit expiration is your most effective tool.
You can build your own tooling to track the advisories around message enqueue and dequeue, but you'd just end up needing to persist that information to make it effective, so going back and re-evaluating why you need to do this, and what might be a better choice of architecture, might be appropriate.
If you insist on auditing the contents of the queues, then you'd want to look at the configuration for the max browse page size to let you get further into the queue on a browse, but depending on queue depth this probably won't get you everything you want.
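If you do go down the browsing route, a rough JMS sketch looks like this (broker URL, queue name and age threshold are placeholders, and a browse only returns up to the broker's maxBrowsePageSize messages):

```java
import java.util.Enumeration;
import javax.jms.Connection;
import javax.jms.Message;
import javax.jms.QueueBrowser;
import javax.jms.Session;
import org.apache.activemq.ActiveMQConnectionFactory;

public class OldMessageCheck {
    public static void main(String[] args) throws Exception {
        ActiveMQConnectionFactory factory = new ActiveMQConnectionFactory("tcp://localhost:61616");
        Connection connection = factory.createConnection();
        try {
            connection.start();
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            QueueBrowser browser = session.createBrowser(session.createQueue("MY.QUEUE"));
            long maxAgeMs = 60 * 60 * 1000L;                       // alert threshold: one hour
            Enumeration<?> messages = browser.getEnumeration();
            while (messages.hasMoreElements()) {
                Message m = (Message) messages.nextElement();
                long age = System.currentTimeMillis() - m.getJMSTimestamp();
                if (age > maxAgeMs)
                    System.out.println("Stale message " + m.getJMSMessageID() + ", age " + age + " ms");
            }
        } finally {
            connection.close();
        }
    }
}
```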
The application server creates a new transaction before calling the MDB's onMessage method. I am also performing a database update in the onMessage method. Transactions create additional overhead, and processing several messages in one transaction could increase performance.
Is it possible to make the application server use one transaction for several messages? Or are there other approaches to this problem?
And, by the way, I can't use multiple instances, because I need to preserve the sequence order.
I guess you can store the messages in a list and, depending on how many messages you want to process in one transaction, check the size of the list and process the messages.
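As a rough sketch of that idea: with bean-managed transactions the buffered database work can be committed in one transaction once the batch is full. BATCH_SIZE, the missing activation config and the persistence code are all assumptions; it relies on a single MDB instance (which you already require for ordering), and note that the JMS delivery itself is then no longer part of that database transaction.

```java
import java.util.ArrayList;
import java.util.List;
import javax.annotation.Resource;
import javax.ejb.MessageDriven;
import javax.ejb.TransactionManagement;
import javax.ejb.TransactionManagementType;
import javax.jms.Message;
import javax.jms.MessageListener;
import javax.transaction.UserTransaction;

@MessageDriven // activation config (destination etc.) omitted; supply your own
@TransactionManagement(TransactionManagementType.BEAN)
public class BatchingMdb implements MessageListener {

    private static final int BATCH_SIZE = 50;            // assumption: tune to your workload
    private final List<Message> buffer = new ArrayList<>();

    @Resource
    private UserTransaction utx;

    @Override
    public void onMessage(Message message) {
        buffer.add(message);
        if (buffer.size() >= BATCH_SIZE) {
            flush();
        }
    }

    private void flush() {
        try {
            utx.begin();
            for (Message m : buffer) {
                // apply the database update for each buffered message here
            }
            utx.commit();
            buffer.clear();
        } catch (Exception e) {
            try { utx.rollback(); } catch (Exception ignored) { }
            throw new RuntimeException(e);   // decide on your own retry policy
        }
    }
}
```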