I have a simple shell script that connect to GCP and try to pull Pub/Sub messages from a topic.
When launched, it check if any message exist, does a simple action if so, then ack the message and loop .
It looks like that :
while [ 1 ]
do
gcloud pubsub subscriptions pull...
// Do something
gcloud pubsub subscriptions ack ...
done
Randomly it does not pull the messages : they stay in the queue and are not pulled.
So we tried to add a while loop when getting the message with something like 5 re-try in order to avoid those issues work better but not perfectly. I also think that is a bit shabby...
This issue happened on other project that where migrated from a script shell to Java (for some other reasons) where we used a pull subscription and it work perfectly on those projects now !
We must probably do something wrong but I don't know what...
I have read that sometimes gcloud pull less messages than what's really on the pubsub queue :
https://cloud.google.com/sdk/gcloud/reference/pubsub/subscriptions/pull
But it must at least pull one ... In our case no messages are pulled but randomly.
Is there something to improve here ?
In general, relying on a shell script that uses gcloud to retrieve messages and do something with them is not going to be an efficient way to use Cloud Pub/Sub. It is worth noting that the lack of messages being returned in pull is not indicative of a lack of messages; it just means that messages could not be returned before the pull request's deadline. The gcloud subscriptions pull command sets the returnImmediately property (see info in pull documentation) to true, which basically means that if there aren't messages already quickly accessible in memory, then no messages are going to be returned. This property is deprecated and should not be set to true, so that is probably something that we need to explore changing in gcloud.
You would be better off writing a subscriber using the client libraries that sets up a stream and continuously retrieves messages. If your intention is to run this only periodically, then you could write a job that reads messages and waits some time after messages have not been received and shuts down. Again, this would not guarantee that all messages would be consumed that are available, but it would be true in most cases.
A version of this in Java would look like this:
import com.google.cloud.pubsub.v1.AckReplyConsumer;
import com.google.cloud.pubsub.v1.MessageReceiver;
import com.google.pubsub.v1.ProjectSubscriptionName;
import com.google.pubsub.v1.PubsubMessage;
import java.util.concurrent.atomic.AtomicLong;
import org.joda.time.DateTime;
/** A basic Pub/Sub subscriber for purposes of demonstrating use of the API. */
public class Subscriber implements MessageReceiver {
private final String PROJECT_NAME = "my-project";
private final String SUBSCRIPTION_NAME = "my-subscription";
private com.google.cloud.pubsub.v1.Subscriber subscriber;
private AtomicLong lastReceivedTimestamp = new AtomicLong(0);
private Subscriber() {
ProjectSubscriptionName subscription =
ProjectSubscriptionName.of(PROJECT_NAME, SUBSCRIPTION_NAME);
com.google.cloud.pubsub.v1.Subscriber.Builder builder =
com.google.cloud.pubsub.v1.Subscriber.newBuilder(subscription, this);
try {
this.subscriber = builder.build();
} catch (Exception e) {
System.out.println("Could not create subscriber: " + e);
System.exit(1);
}
}
#Override
public void receiveMessage(PubsubMessage message, AckReplyConsumer consumer) {
// Process message
lastReceivedTimestamp.set(DateTime.now().getMillis());
consumer.ack();
}
private void run() {
subscriber.startAsync();
while (true) {
long now = DateTime.now().getMillis();
long currentReceived = lastReceivedTimestamp.get();
if (currentReceived > 0 && ((now - currentReceived) > 30000)) {
subscriber.stopAsync();
break;
}
try {
Thread.sleep(5000);
} catch (InterruptedException e) {
System.out.println("Error while waiting for completion: " + e);
}
}
System.out.println("Subscriber has not received message in 30s. Stopping.");
subscriber.awaitTerminated();
}
public static void main(String[] args) {
Subscriber s = new Subscriber();
s.run();
System.exit(0);
}
}
Related
I have two data sources, each returning a Mono:
class CacheCustomerClient {
Mono<Entity> createCustomer(Customer customer)
}
class MasterCustomerClient {
Mono<Entity> createCustomer(Customer customer)
}
Callers to my application are hitting a Spring WebFlux controller:
#PostMapping
#ResponseStatus(HttpStatus.CREATED)
public Flux<Entity> createCustomer(#RequestBody Customer customer) {
return customerService.createNewCustomer(entity);
}
As long as either data source successfully completes its create operation, I want to immediately return a success response to the caller, however, I still want my service to continue processing the result of the other Mono stream, in the event that an error was encountered, so it can be logged.
The problem seems to be that as soon as a value is returned to the controller, a cancel signal is propagated back through the stream by Spring WebFlux and, thus, no information is logged about a failure.
Here's one attempt:
public Flux<Entity> createCustomer(final Customer customer) {
var cacheCreate = cacheClient
.createCustomer(customer)
.doOnError(WebClientResponseException.class,
err -> log.error("Customer creation failed in cache"));
var masterCreate = masterClient
.createCustomer(customer)
.doOnError(WebClientResponseException.class,
err -> log.error("Customer creation failed in master"));
return Flux.firstWithValue(cacheCreate, masterCreate)
.onErrorMap((err) -> new Exception("Customer creation failed in cache and master"));
}
Flux.firstWithValue() is great for emitting the first non-error value, but then whichever source is lagging behind is cancelled, meaning that any error is never logged out. I've also tried scheduling these two sources on their own Schedulers and that didn't seem to help either.
How can I perform these two calls asynchronously, and emit the first value to the caller, while continuing to listen for emissions on the slower source?
You can achieve that by transforming you operators to "hot" publishers using share() operator:
First subscriber launch the upstream operator, and additional subscribers get back result cached from the first subscriber:
Further Subscriber will share [...] the same result.
Once a second subscription has been done, the publisher is not cancellable:
It's worth noting this is an un-cancellable Subscription.
So, to achieve your requirement:
Apply share() on each of your operators
Launch a subscription on shared publishers to trigger processing
Use shared operators in your pipeline (here firstWithValue).
Sample example:
import java.time.Duration;
import reactor.core.publisher.Mono;
public class TestUncancellableMono {
// Mock a mono successing quickly
static Mono<String> quickSuccess() {
return Mono.delay(Duration.ofMillis(200)).thenReturn("SUCCESS !");
}
// Mock a mono taking more time and ending in error.
static Mono<String> longError() {
return Mono.delay(Duration.ofSeconds(1))
.<String>then(Mono.error(new Exception("ERROR !")))
.doOnCancel(() -> System.out.println("CANCELLED"))
.doOnError(err -> System.out.println(err.getMessage()));
}
public static void main(String[] args) throws Exception {
// Transform to hot publisher
var sharedQuick = quickSuccess().share();
var sharedLong = longError().share();
// Trigger launch
sharedQuick.subscribe();
sharedLong.subscribe();
// Subscribe back to get the cached result
Mono
.firstWithValue(sharedQuick, sharedLong)
.subscribe(System.out::println, err -> System.out.println(err.getMessage()));
// Wait for subscription to end.
Thread.sleep(2000);
}
}
The output of the sample is:
SUCCESS !
ERROR !
We can see that error message has been propagated properly, and that upstream publisher has not been cancelled.
I'm stucked with that kind of a problem. I use kafka as transport between services. Tried to draw sequence diagram
First of all planning service get main task and handling it, planning service pass it to few services then. My main problem is: I musn't pick another main task, until f.e. second service send result to kafka and planning service will process the result.
My main listener have this structure
#KafkaListener(
containerFactory = "genFactory",
topics = "${main}")
public void listenStartGeneratorTopic( GeneratorMessage message, Acknowledgment acknowledgment){
//do some logic
//THEN send message to first service, and then in that listener new task sends to second
sendTaskToQueue(task);
acknowledgment.acknowledge();
log.info("All done in method");
}
As I understood, I need aknowledge() after all my logic with result from second service will be done. So I tried to add boolean flag in CompletableFuture, setting it in true when my planning service get response from second service. And do blocking get() in main listener to continue after.
private CompletableFuture<Boolean> isMessageProcessed = new CompletableFuture<>();
#KafkaListener(topics = "${report}")
public void listenReport(ReportMessage reportMessage) {
isMessageProcessed = CompletableFuture.completedFuture(true);
}
}
#KafkaListener(
containerFactory = "genFactory",
topics = "${main}")
public void listenStartGeneratorTopic( GeneratorMessage message, Acknowledgment acknowledgment){
//do some logic
//THEN send message to first service, and then in that listener new task sends to second
sendTaskToQueue(task);
isMessageProcessed.join();
log.info("message is ready for commit");
acknowledgment.acknowledge();
}
That's looks strange enough and that idea doesn't bring me result.
So, can you give me advice, what can I do in that situation?
Why not using 6 topics? I believe this is better separation of duties and might allow you better scale,
Guess I would check KStream as well in your case...
My idea goes like this:
PLANNING SERVICE read from topic1.start do work send to topic2 ,
FIRST SERVICE read from topic2 do work and send to topic3
PLANNING SERVICE (another instance) read from topic3 do work and write to topic4
SECOND SERVICE reads topic4 do work send to topic5
PLANNING SERVICE (another instance) read from topic5 and write to topic6.done
I'm new to mass transit and have a question regarding how I should solve a failure to consume a message. Given the below code I am consuming INotificationRequestContract's. As you can see the code will break and not complete.
public class NotificationConsumerWorker : IConsumer<INotificationRequestContract>
{
private readonly ILogger<NotificationConsumerWorker> _logger;
private readonly INotificationCreator _notificationCreator;
public NotificationConsumerWorker(ILogger<NotificationConsumerWorker> logger, INotificationCreator notificationCreator)
{
_logger = logger;
_notificationCreator = notificationCreator;
}
public Task Consume(ConsumeContext<INotificationRequestContract> context)
{
try
{
throw new Exception("Horrible error");
}
catch (Exception e)
{
// >>>>> insert code here to put message back for later consumption. <<<<<
_logger.LogError(e, "Failed to consume message");
throw;
}
}
}
How do I best handle a scenario such as this where the consumption fails? In my specific case this is likely to occur if a required external service is unavailable.
I can see two solutions.
If there is a way to put the message back, or cancel the consumption so that it will be tried again.
I could store it locally in a database and create my own re-try method to wrap this (but would prefer not to for sake of simplicity).
The exceptions section of the documentation provides sufficient guidance for dealing with consumer exceptions.
There are two retry approaches, which can be used in combination:
Message Retry, which waits while the message is locked, in-process, for the next retry. Therefore, these should be short, to deal with transient issues.
Message Redelivery, which delays the message using either the broker delayed delivery, or a message scheduler, so that it is redelivered to the receive endpoint at some point in the future.
Once all retry/redelivery attempts are exhausted, the message is moved to the _error queue.
I am using Masstransit with RabbitMQ. As part of some deployment procedure, At some point in time I need my service to disconnect and stop receiving any messages.
Assuming that I won't need the bus until the next restart of the service, will it be Ok to use bus.StopAsync()?
Is there a way to get list of end points and then remove them from listining ?
You should StopAsync the bus, and then when ready, call StartAsync to bring it back up (or start it at the next service restart).
To stop receiving messages without stopping the buss I needed a solution that will avoid the consume message pipeline from consuming any type of message. I tried with observers but unsuccessfully. My solution came up with custom consuming message filter.
The filter part looks like this
public class ComsumersBlockingFilter<T> :
IFilter<ConsumeContext<T>>
where T : class
{
public void Probe(ProbeContext context)
{
var scope = context.CreateFilterScope("messageFilter");
}
public async Task Send(ConsumeContext<T> context, IPipe<ConsumeContext<T>> next)
{
// Check if the service is degraded (true for this demo)
var isServiceDegraded = true;
if (isServiceDegraded)
{
//Suspend the message for 5 seconds
await Task.Delay(TimeSpan.FromMilliseconds(5000), context.CancellationToken);
if (!context.CancellationToken.IsCancellationRequested)
{
//republish the message
await context.Publish(context.Message);
Console.WriteLine($"Message {context.MessageId} has been republished");
}
// NotifyConsumed to avoid skipped message
await context.NotifyConsumed(TimeSpan.Zero, "messageFilter");
}
else
{
//Next filter in the pipe is called
await next.Send(context);
}
}
}
The main idea is to delay with cancellation token and the republish the message. After that call contect.NotifyConsumed to avoid the next pipeline filters and return normally.
I have a business application with the following versions
spring boot(2.2.0.RELEASE) spring-Kafka(2.3.1-RELEASE)
spring-cloud-stream-binder-kafka(2.2.1-RELEASE)
spring-cloud-stream-binder-kafka-core(3.0.3-RELEASE)
spring-cloud-stream-binder-kafka-streams(3.0.3-RELEASE)
We have around 20 batches.Each batch using 6-7 topics to handle the business.Each service has its own state store to maintain the status of the batch whether its running/Idle.
Using the below code to query th store
#Autowired
private InteractiveQueryService interactiveQueryService;
public ReadOnlyKeyValueStore<String, String> fetchKeyValueStoreBy(String storeName) {
while (true) {
try {
log.info("Waiting for state store");
return new ReadOnlyKeyValueStoreWrapper<>(interactiveQueryService.getQueryableStore(storeName,
QueryableStoreTypes.<String, String> keyValueStore()));
} catch (final IllegalStateException e) {
try {
Thread.sleep(1000);
} catch (InterruptedException e1) {
e1.printStackTrace();
}
}
}
When deploying the application in one instance(Linux machine) every thing is working fine.While deploying the application in 2 instance we find the folowing observations
state store is available in one instance and other dosen't have.
When the request is being processed by the instance which has the state store every thing is fine.
If the request falls to the instance which does not have state store the application is waiting in the while loop indefinitley(above code snippet).
While the instance without store waiting indefinitely and if we kill the other instance the above code returns the store and it was processing perfectly.
No clue what we are missing.
When you have multiple Kafka Streams processors running with interactive queries, the code that you showed above will not work the way you expect. It only returns results, if the keys that you are querying are on the same server. In order to fix this, you need to add the property - spring.cloud.stream.kafka.streams.binder.configuration.application.server: <server>:<port> on each instance. Make sure to change the server and port to the correct ones on each server. Then you have to write code similar to the following:
org.apache.kafka.streams.state.HostInfo hostInfo = interactiveQueryService.getHostInfo("store-name",
key, keySerializer);
if (interactiveQueryService.getCurrentHostInfo().equals(hostInfo)) {
//query from the store that is locally available
}
else {
//query from the remote host
}
Please see the reference docs for more information.
Here is a sample code that demonstrates that.