Publish-Subscribe Channels Both Going to Kafka Result in Duplicate KafkaProducerContexts - spring

I am attempting to use Spring Integration to send data from one channel to two different Kafka queues after those same data go through different transformations on the way to their respective queues. The problem is I apparently have duplicate producer contexts, and I don't know why.
Here is my flow configuration:
flow -> flow
    .channel("firstChannel")
    .publishSubscribeChannel(Executors.newCachedThreadPool(), s -> s
        .subscribe(f -> f
            .transform(firstTransformer::transform)
            .channel(MessageChannels.queue(50))
            .handle(Kafka.outboundChannelAdapter(kafkaConfig)
                        .addProducer(firstMetadata(), brokerAddress),
                    e -> e.id("firstKafkaOutboundChannelAdapter")
                        .autoStartup(true)
                        .poller(p -> p.fixedDelay(1000, TimeUnit.MILLISECONDS).receiveTimeout(0).taskExecutor(taskExecutor))
                        .get())
        )
        .subscribe(f -> f
            .transform(secondTransformer::transform)
            .channel(MessageChannels.queue(50))
            .handle(Kafka.outboundChannelAdapter(kafkaConfig)
                        .addProducer(secondMetadata(), brokerAddress),
                    e -> e.id("secondKafkaOutboundChannelAdapter")
                        .autoStartup(true)
                        .poller(p -> p.fixedDelay(1000, TimeUnit.MILLISECONDS).receiveTimeout(0).taskExecutor(taskExecutor))
                        .get())
        ));
The exception is this:
Could not register object [org.springframework.integration.kafka.support.KafkaProducerContext#3163987e] under bean name 'not_specified': there is already object [org.springframework.integration.kafka.support.KafkaProducerContext#15f193b8] bound
I have tried using different kafkaConfig objects, but that hasn't helped. Meanwhile, the ProducerMetadata instances are distinct as you can see from the different first parameters to addProducer. Those provide the names of the respective destination queues among other metadata.
It sounds like there are some implicit bean definitions that are being created that conflict with each other.
How can I resolve this exception with the two KafkaProducerContexts?

You should not use .get() on those KafkaProducerMessageHandlerSpecs; let the framework work out the environment for you.
The issue is that KafkaProducerMessageHandlerSpec implements ComponentsRegistration, and nothing takes care of
public Collection<Object> getComponentsToRegister() {
    this.kafkaProducerContext.setProducerConfigurations(this.producerConfigurations);
    return Collections.<Object>singleton(this.kafkaProducerContext);
}
after a manual .get() invocation.
I agree this is somewhat inconvenient and we should find a better solution for end applications, but there is no choice yet other than to follow the Spec style for the framework components, like Kafka.outboundChannelAdapter().
Hope I am clear.
UPDATE
OK, it's definitely an issue on our side. And we will fix it soon:
https://jira.spring.io/browse/INTEXT-216
https://jira.spring.io/browse/INTEXT-217
Meanwhile, the workaround for you is like this:
KafkaProducerContext kafkaProducerContext =
        (KafkaProducerContext) kafkaProducerMessageHandlerSpec.getComponentsToRegister().iterator().next();
kafkaProducerContext.setBeanName(null);
where you should move
Kafka.outboundChannelAdapter(kafkaConfig)
        .addProducer(firstMetadata(), brokerAddress)
to a separate private method to get access to that kafkaProducerContext.
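Putting that together, a minimal sketch of the workaround for the first adapter (the method name is illustrative; the second adapter would get an equivalent method):
private KafkaProducerMessageHandlerSpec firstKafkaOutboundChannelAdapterSpec() {
    KafkaProducerMessageHandlerSpec spec = Kafka.outboundChannelAdapter(kafkaConfig)
            .addProducer(firstMetadata(), brokerAddress);
    // Pull out the implicitly created KafkaProducerContext and clear its bean name
    // so the two adapters no longer collide under 'not_specified'.
    KafkaProducerContext kafkaProducerContext =
            (KafkaProducerContext) spec.getComponentsToRegister().iterator().next();
    kafkaProducerContext.setBeanName(null);
    return spec;
}
The flow then calls .handle(firstKafkaOutboundChannelAdapterSpec(), e -> e.id("firstKafkaOutboundChannelAdapter") ...) without the manual .get().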

Related

Can I store sensitive data in a Vert.x context in a Quarkus application?

I am looking for a place to store some request scoped attributes such as user id using a Quarkus request filter. I later want to retrieve these attributes in a Log handler and put them in the MDC logging context.
Is Vertx.currentContext() the right place to put such request attributes? Or can the properties I set on this context be read by other requests?
If this is not the right place to store such data, where would be the right place?
Yes ... and no :-D
Vertx.currentContext() can provide two types of objects:
the root context, shared between all the concurrent processing executed on this event loop (so do NOT share data there)
duplicated contexts, which are local to the processing and its continuation (you can share data in these)
In Quarkus 2.7.2, we have done a lot of work to improve our support of duplicated contexts. While before they were only used for HTTP, they are now used for gRPC and @ConsumeEvent. Support for Kafka and AMQP is coming in Quarkus 2.8.
Also, in Quarkus 2.7.2, we introduced two new features that could be useful:
you cannot store data in a root context. We detect that for you and throw an UnsupportedOperationException. The reason is safety.
we introduced a new utility class (io.smallrye.common.vertx.ContextLocals) to access the context locals.
Here is a simple example:
AtomicInteger counter = new AtomicInteger();

public Uni<String> invoke() {
    Context context = Vertx.currentContext();
    ContextLocals.put("message", "hello");
    ContextLocals.put("id", counter.incrementAndGet());

    return invokeRemoteService()
            // Switch back to our duplicated context:
            .emitOn(runnable -> context.runOnContext(x -> runnable.run()))
            .map(res -> {
                // Can still access the context-local data
                String msg = ContextLocals.<String>get("message").orElseThrow();
                Integer id = ContextLocals.<Integer>get("id").orElseThrow();
                return "%s - %s - %d".formatted(res, msg, id);
            });
}
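For the original question (stashing a user id in a request filter and reading it back for the MDC later), here is a hedged sketch built on the same ContextLocals API; it assumes the JAX-RS filter runs on the request's duplicated context (as with RESTEasy Reactive), and the class name, header name, and MDC key are illustrative:
import io.smallrye.common.vertx.ContextLocals;
import javax.ws.rs.container.ContainerRequestContext;
import javax.ws.rs.container.ContainerRequestFilter;
import javax.ws.rs.ext.Provider;
import org.slf4j.MDC;

@Provider
public class RequestAttributesFilter implements ContainerRequestFilter {

    @Override
    public void filter(ContainerRequestContext requestContext) {
        // Request-scoped attribute stored in the duplicated context,
        // not in a field or a ThreadLocal.
        ContextLocals.put("userId", requestContext.getHeaderString("X-User-Id"));
    }

    // Later, wherever the log record is prepared (still on the same duplicated
    // context), the attribute can be read back and pushed into the MDC:
    public static void propagateToMdc() {
        ContextLocals.<String>get("userId").ifPresent(id -> MDC.put("userId", id));
    }
}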

How to handle weird API flow with implicit create step in custom terraform provider

Most Terraform providers demand a predefined flow: Create/Read/Update/Delete/Exists.
I am in a weird situation developing a provider against an API where this behavior diverges a bit.
There are two kinds of resources, Host and Scope. A host can have many scopes. Scopes are updated with configurations.
This generally fits well into the Terraform flow; a full CRUDE flow is possible, except for one instance.
When a new Host is made, it automatically has a default scope attached to it. It is always there, cannot be deleted, etc.
I can't figure out how to have my provider handle this gracefully. I would want Terraform to treat it like any other resource, but it doesn't have an explicit CREATE/DELETE, only READ/UPDATE/EXISTS, while every other scope attached to the host does have CREATE/DELETE.
Importing is not an option due to density; requiring an import for every host would render the entire thing pointless.
I originally was going to attempt to split Scopes and Configurations into separate resources so one could be fulfilled by the Host (the host providing the scope ID for a configuration, and other configurations getting their scope IDs from a scope resource).
However, this approach falls apart because the API for both is the same, unless I add the abstraction of creating an empty scope and then applying a configuration against it, which may not be fully supported. It would essentially be two resources controlling one resource, which could lead to dramatic conflicts.
A paraphrased example of an execution I thought about implementing:
resource "host" "test_integrations" {
  name         = "test.integrations.domain.com"
  account_hash = "${local.integrationAccountHash}"
  services     = [40]
}

resource "configuration" "test_integrations_root_configuration" {
  name         = "root"
  parent_host  = "${host.test_integrations.id}"
  account_hash = "${local.integrationAccountHash}"
  scope_id     = "${host.test_integrations.root_scope_id}"
  hostnames    = ["test.integrations.domain.com"]
}

resource "scope" "test_integrations_other" {
  account_hash = "${local.integrationAccountHash}"
  host_hash    = "${host.test_integrations.id}"
  path         = "/non/root/path"
  name         = "Some Other URI Path"
}

resource "configuration" "test_integrations_other_configuration" {
  name         = "other"
  parent_host  = "${host.test_integrations.id}"
  account_hash = "${local.integrationAccountHash}"
  scope_id     = "${scope.test_integrations_other.id}"
}
In this example flow, a configuration and a scope resource are unfortunately pointing to the same underlying resource, which I worry would cause conflicts or confusion about who is responsible for what, and dramatically confuses the create/delete lifecycle.
But I can't figure out how the Terraform lifecycle would allow for a resource that only does UPDATE/READ/EXISTS if, say, a flag was given (and how state would handle that).
An alternative would be to have just a Configuration resource, but then if it were the root configuration it would need to skip create/delete, as it is inherently tied to the host.
Ideally I'd be able to handle this situation gracefully. I am trying to avoid including the root scope/configuration in the host definition, as it would create a split in how they are written and handled.
The documentation for providers implies you can use a resource AS a schema object in another resource, but does not explain how or why. If it works the way I imagine, it might be possible to create a resource that is only used to inject into the host, but I don't know if that is how it works or how to accomplish it.
I believe I have tentatively found a solution after asking some folks on the Gophers Slack.
Using the AWS provider's Default VPC resource as a reference, I can "clone" the resource into one with a custom Create/Delete lifecycle.
Loose Example:
func defaultResourceConfiguration() *schema.Resource {
    drc := resourceConfiguration()
    drc.Create = resourceDefaultConfigurationCreate
    drc.Delete = resourceDefaultConfigurationDelete
    return drc
}

func resourceDefaultConfigurationCreate(d *schema.ResourceData, m interface{}) error {
    // double-check it exists and update the resource instead
    return resourceConfigurationUpdate(d, m)
}

func resourceDefaultConfigurationDelete(d *schema.ResourceData, m interface{}) error {
    log.Printf("[WARN] Cannot destroy Default Scope Configuration. Terraform will remove this resource from the state file, however resources may remain.")
    return nil
}
This should allow me to provide an identical resource that is designed to interact with the already existing one created by its parent host.

Dealing with parallel flux in Reactor

I have created a parallel flux from an iterable, and for each element I have to make a REST call. But while executing, if any one of the requests fails, all the remaining requests also fail. I want all the requests to be executed irrespective of failure or success.
I am currently using Flux.fromIterable with the runOn operator:
Flux.fromIterable(actions)
    .parallel()
    .runOn(Schedulers.elastic())
    .flatMap(request -> someRemoteCall)
    .sequential()
    .subscribe();
I want all the requests in the iterable to be executed, irrespective of failure or success. But as of now some get executed and some fail.
There are three possible ways I generally use to achieve this:
Use the 3-argument version of flatMap(), the second argument of which is a mapperOnError, e.g. .flatMap(request -> someRemoteCall(), x -> Mono.empty(), null);
Use onErrorResume(x -> Mono.empty()) as a separate call to ignore any error (see the sketch after this list);
Use .onErrorResume(MyException.class, x -> Mono.empty()) to ignore only errors of a certain type.
The second is what I tend to use by default, as I find it the clearest.
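A minimal sketch of that second option applied to the asker's pipeline; the resume is chained onto the inner publisher, so a failed request is replaced by an empty Mono instead of cancelling the whole parallel flux (someRemoteCall(request) here stands in for the asker's remote call and is assumed to return a Mono):
Flux.fromIterable(actions)
    .parallel()
    .runOn(Schedulers.elastic())
    .flatMap(request -> someRemoteCall(request)
        .onErrorResume(x -> Mono.empty()))   // drop only the failing element, keep the rest
    .sequential()
    .subscribe();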
Because of the .parallel().runOn(...) usage, you can't use onErrorContinue as below:
.parallel()
.runOn(...)
.flatMap(request -> someRemoteCall)
.onErrorContinue(...)
but you might be able to use it like this:
.parallel().runOn(...)
.flatMap(request -> someRemoteCall
    .onErrorContinue((t, o) -> log.error("Skipped error: {}", t.getMessage()))
)
provided that someRemoteCall is a Mono or Flux not itself run on .parallel().runOn(...) rails.
But when you don't have a someRemoteCall, you can do the trick below (see NOT_MONO_AND_NOT_FLUX) to ignore errors from the unsafe processing run on the .parallel().runOn(...) rails:
Optional<List<String>> foundImageNames =
        Flux.fromStream(this.fileStoreService.walk(path))
            .parallel(cpus, cpus)
            .runOn(Schedulers.newBoundedElastic(cpus, Integer.MAX_VALUE, "import"), 1)
            .flatMap(NOT_MONO_AND_NOT_FLUX -> Mono
                .just(NOT_MONO_AND_NOT_FLUX)
                .map(p -> sneak(() -> unsafeLocalSimpleProcessingReturningString(p)))
                .onErrorContinue(FileNotFoundException.class,
                    (t, o) -> log.error("File missing:\n{}", t.getMessage()))
            )
            .collectSortedList(Comparator.naturalOrder())
            .blockOptional();
I'm still in the process of learning WebFlux and Reactor, but try onErrorContinue directly after the flatMap (the REST call) to drop (and potentially log) errors.
There are delay error operators in Reactor. You could write your code as follows:
Flux.fromIterable(actions)
    .flatMapDelayError(request -> someRemoteCall(request).subscribeOn(Schedulers.elastic()), 256, 32)
    .doOnNext(System.out::println)
    .subscribe();
Note that this will still fail your flux in case any inner publisher emits an error; however, it will wait for all inner publishers to finish before doing so.
These operators also require you to specify the concurrency and prefetch parameters. In the example I've set them to the default values used by regular flatMap calls (see the sketch below).
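As a point of reference, a sketch of the same call using Reactor's own constants rather than the magic numbers, assuming reactor.util.concurrent.Queues is on the classpath (SMALL_BUFFER_SIZE and XS_BUFFER_SIZE are the 256/32 defaults that plain flatMap uses):
Flux.fromIterable(actions)
    .flatMapDelayError(request -> someRemoteCall(request).subscribeOn(Schedulers.elastic()),
            Queues.SMALL_BUFFER_SIZE,   // concurrency: same default as flatMap
            Queues.XS_BUFFER_SIZE)      // prefetch: same default as flatMap
    .doOnNext(System.out::println)
    .subscribe();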

Project Reactor: possibly misleading documentation about error handling

I am reading the Reactor reference documentation about error handling and something seems wrong. For example, this section about the fallback method:
Flux.just("key1", "key2")
.flatMap(k -> callExternalService(k))
.onErrorResume(e -> getFromCache(k));
But the onErrorResume() lambda takes only one parameter, e (the error Throwable). How is k (the previous value emitted by the flux) referenced here?
There are other similar code snippets in the docs. Am I reading this wrong?
Or, if the documentation is indeed incorrect, how can I actually handle this case: recovering from an error by executing an alternative path with the previous value?
Yes, I think you found a bug in the documentation.
If you want to use k, the call to onErrorResume must happen inside the argument to flatMap, like so:
Flux.just("key1", "key2")
.flatMap(k -> callExternalService(k)
.onErrorResume(e -> getFromCache(k))
);
Regarding your comment: it is not possible to have the value being processed passed to the onErrorXXX methods, because the error in question might not have happened while a value was being processed. It may have happened, for example, while handling backpressure (i.e. requesting more elements) or while subscribing.

Aws integration spring: Extend Visibility Timeout

Is it possible to extend the visibility timeout of a message that is in flight?
See:
http://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/AboutVT.html.
Section: Changing a Message's Visibility Timeout.
http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/sqs/AmazonSQSClient.html#changeMessageVisibility-com.amazonaws.services.sqs.model.ChangeMessageVisibilityRequest-
In summary, I want to be able to extend the initially set visibility timeout for a given message that is in flight.
For example, if 15 seconds have passed, I then want to extend the timeout by another 20 seconds. There is a better example in the Java docs above.
From my understanding of the links above, you can do this on the Amazon side.
Below are my current settings:
SqsMessageDrivenChannelAdapter adapter =
new SqsMessageDrivenChannelAdapter(queue);
adapter.setMessageDeletionPolicy(SqsMessageDeletionPolicy.ON_SUCCESS);
adapter.setMaxNumberOfMessages(1);
adapter.setSendTimeout(2000);
adapter.setVisibilityTimeout(200);
adapter.setWaitTimeOut(20);
Is it possible to extend this timeout?
Spring Cloud AWS supports this starting with version 2.0. Injecting a Visibility parameter in your SQS listener method does the trick:
@SqsListener(value = "my-sqs-queue")
void onMessageReceived(@Payload String payload, Visibility visibility) {
    ...
    var extension = visibility.extend(20);
    ...
}
Note that extend works asynchronously and returns a Future. So if, further down the processing, you want to be sure that the visibility of the message has really been extended on the AWS side of things, either block on the Future using extension.get() or query it with extension.isDone(), as in the sketch below.
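A hedged sketch of the blocking variant, building on the listener above (the queue name, 20-second value, and exception handling are illustrative):
@SqsListener(value = "my-sqs-queue")
void onMessageReceived(@Payload String payload, Visibility visibility) throws Exception {
    // Ask AWS for 20 more seconds before the message becomes visible again.
    Future<?> extension = visibility.extend(20);

    // ... longer-running work ...

    // Block until the ChangeMessageVisibility call has actually completed,
    // so we know the message will not reappear in the meantime.
    extension.get();
}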
OK. Looks like I see your point.
We can change the visibility for a particular message using the API:
AmazonSQS.changeMessageVisibility(String queueUrl, String receiptHandle, Integer visibilityTimeout)
For this purpose, in the downstream flow, you have to get access to (inject) the AmazonSQS bean and extract the special headers from the Message:
@Autowired
AmazonSQS amazonSqs;

@Autowired
ResourceIdResolver resourceIdResolver;

...

MessageHeaders headers = message.getHeaders();
DestinationResolver<String> destinationResolver =
        new DynamicQueueUrlDestinationResolver(this.amazonSqs, this.resourceIdResolver);
String queueUrl = destinationResolver.resolveDestination(headers.get(AwsHeaders.QUEUE, String.class));
String receiptHandle = headers.get(AwsHeaders.RECEIPT_HANDLE, String.class);
amazonSqs.changeMessageVisibility(queueUrl, receiptHandle, YOUR_DESIRED_VISIBILITY_TIMEOUT);
But yeah, I agree that we should provide something on the matter as an out-of-the-box feature. That might even be something similar to QueueMessageAcknowledgment, as a new header. Or even just one more changeMessageVisibility() method on it.
Please raise a GH issue for the Spring Cloud AWS project on the matter, with a link to this SO topic.
