TopicProcessor has been removed as of Reactor 3.4, and I'm struggling to find the easiest way to replace it with the new API. I'd like to keep the implementation as reliable as possible, even if performance gets worse. I don't want to deal with lost events or an overflowing stream; I just want to keep it simple. I also need to buffer data, so I used bufferTimeout for that, and it worked perfectly with TopicProcessor.
How to achieve that with the new API?
I've tried to use Sinks but with a lot of pressure there was an overflow exception and the whole stream was terminated.
Below is my basic implementation before upgrade to Spring Boot 2.4.1.
package com.example.reactorplayground.config;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;
public class TopicProcessor<T> implements Processor<T> {
private final reactor.core.publisher.TopicProcessor<T> processor;
public TopicProcessor() {
this.processor = reactor.core.publisher.TopicProcessor.share("in-memory-topic-thread", 1);
}
@Override
public Mono<T> add(T event) {
return Mono.fromCallable(() -> {
processor.onNext(event);
return event;
});
}
@Override
public Flux<T> consumeWith() {
return processor;
}
}
I have tried to configure an existing Maven project to run using cucumber-junit-platform-engine.
I have used this repo as inspiration.
I added the Maven dependencies needed, as in the linked project using spring-boot-starter-parent version 2.4.5 and cucumber-jvm version 6.10.4.
I set the junit-platform properties as follows:
cucumber.execution.parallel.enabled=true
cucumber.execution.parallel.config.strategy=fixed
cucumber.execution.parallel.config.fixed.parallelism=4
I used the @Cucumber annotation on the runner class and @SpringBootTest on the classes containing step definitions.
It seems to work fine with creating parallel threads, but the problem is it creates all the threads at the start and opens as many browser windows (drivers) as the number of scenarios (e.g. 51 instead of 4).
I am using a CucumberHooks class to add logic before and after scenarios and I'm guessing it interferes with the runner because of the annotations I'm using:
import java.util.List;
import org.openqa.selenium.OutputType;
import org.openqa.selenium.TakesScreenshot;
import org.openqa.selenium.WebDriver;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import io.cucumber.java.After;
import io.cucumber.java.Before;
import io.cucumber.java.Scenario;
import io.cucumber.plugin.ConcurrentEventListener;
import io.cucumber.plugin.event.EventHandler;
import io.cucumber.plugin.event.EventPublisher;
import io.cucumber.plugin.event.TestRunFinished;
import io.cucumber.plugin.event.TestRunStarted;
import io.github.bonigarcia.wdm.WebDriverManager;
public class CucumberHooks implements ConcurrentEventListener {
@Autowired
private ScenarioContext scenarioContext;
@Before
public void beforeScenario(Scenario scenario) {
scenarioContext.getNewDriverInstance();
scenarioContext.setScenario(scenario);
LOGGER.info("Driver initialized for scenario - {}", scenario.getName());
....
<some business logic here>
....
}
@After
public void afterScenario() {
Scenario scenario = scenarioContext.getScenario();
WebDriver driver = scenarioContext.getDriver();
takeErrorScreenshot(scenario, driver);
LOGGER.info("Driver will close for scenario - {}", scenario.getName());
driver.quit();
}
private void takeErrorScreenshot(Scenario scenario, WebDriver driver) {
if (scenario.isFailed()) {
final byte[] screenshot = ((TakesScreenshot) driver).getScreenshotAs(OutputType.BYTES);
scenario.attach(screenshot, "image/png", "Failure");
}
}
@Override
public void setEventPublisher(EventPublisher eventPublisher) {
eventPublisher.registerHandlerFor(TestRunStarted.class, beforeAll);
}
private EventHandler<TestRunStarted> beforeAll = event -> {
// something that needs doing before everything
.....<some business logic here>....
WebDriverManager.getInstance(DriverManagerType.CHROME).setup();
};
}
I tried replacing the @Before annotation from io.cucumber.java with @BeforeEach from org.junit.jupiter.api, but that does not work.
How can I solve this issue?
New answer, JUnit 5 has been improved somewhat.
If you are on Java 9+ you can use the following in junit-platform.properties to enable a custom parallelism.
cucumber.execution.parallel.enabled=true
cucumber.execution.parallel.config.strategy=custom
cucumber.execution.parallel.config.custom.class=com.example.MyCustomParallelStrategy
And you'd implement MyCustomParallelStrategy as:
package com.example;
import org.junit.platform.engine.ConfigurationParameters;
import org.junit.platform.engine.support.hierarchical.ParallelExecutionConfiguration;
import org.junit.platform.engine.support.hierarchical.ParallelExecutionConfigurationStrategy;
import java.util.concurrent.ForkJoinPool;
import java.util.function.Predicate;
public class MyCustomParallelStrategy implements ParallelExecutionConfiguration, ParallelExecutionConfigurationStrategy {
private static final int FIXED_PARALLELISM = 4;
@Override
public ParallelExecutionConfiguration createConfiguration(final ConfigurationParameters configurationParameters) {
return this;
}
@Override
public Predicate<? super ForkJoinPool> getSaturatePredicate() {
return (ForkJoinPool p) -> true;
}
@Override
public int getParallelism() {
return FIXED_PARALLELISM;
}
@Override
public int getMinimumRunnable() {
return FIXED_PARALLELISM;
}
@Override
public int getMaxPoolSize() {
return FIXED_PARALLELISM;
}
@Override
public int getCorePoolSize() {
return FIXED_PARALLELISM;
}
@Override
public int getKeepAliveSeconds() {
return 30;
}
}
On Java 9+ this will limit the max pool size of the underlying ForkJoinPool to FIXED_PARALLELISM, so there should never be more than FIXED_PARALLELISM (here, 4) web drivers active at the same time.
Also, once JUnit5/#3044 is merged, released, and integrated into Cucumber, you can use cucumber.execution.parallel.config.fixed.max-pool-size on Java 9+ to limit the maximum number of concurrent tests.
So as it turns out, parallelism is mostly a suggestion. Cucumber uses JUnit 5's ForkJoinPoolHierarchicalTestExecutorService, which constructs a ForkJoinPool.
From the docs on ForkJoinPool:
For applications that require separate or custom pools, a ForkJoinPool may be constructed with a given target parallelism level; by default, equal to the number of available processors. The pool attempts to maintain enough active (or available) threads by dynamically adding, suspending, or resuming internal worker threads, even if some tasks are stalled waiting to join others. However, no such adjustments are guaranteed in the face of blocked I/O or other unmanaged synchronization.
So within a ForkJoinPool, whenever a thread blocks (for example, because it starts asynchronous communication with the web driver), another thread may be started to maintain the parallelism.
Since all threads end up waiting, more threads are added to the pool and more web drivers are started.
This means that rather than relying on the ForkJoinPool to limit the number of web drivers, you have to do this yourself. You can use a library like Apache Commons Pool or implement a rudimentary pool using a counting semaphore.
@Component
@ScenarioScope
public class ScenarioContext {
private static final int MAX_CONCURRENT_WEB_DRIVERS = 1;
private static final Semaphore semaphore = new Semaphore(MAX_CONCURRENT_WEB_DRIVERS, true);
private WebDriver driver;
public WebDriver getDriver() {
if (driver != null) {
return driver;
}
try {
semaphore.acquire();
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
try {
driver = CustomChromeDriver.getInstance();
} catch (Throwable t){
semaphore.release();
throw t;
}
return driver;
}
public void retireDriver() {
if (driver == null) {
return;
}
try {
driver.quit();
} finally {
driver = null;
semaphore.release();
}
}
}
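To see why the semaphore, and not the thread pool, is what bounds the number of live drivers, here is a minimal JDK-only sketch. It is not tied to Cucumber or Selenium: the "driver" is simulated with a short sleep, and the names BoundedDriverDemo, runScenario and runAll are hypothetical, introduced only for this illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class: simulates scenarios that each need one "driver".
final class BoundedDriverDemo {
    static final int MAX_DRIVERS = 4;
    // Fair semaphore: one permit per driver that may be alive at once.
    static final Semaphore PERMITS = new Semaphore(MAX_DRIVERS, true);
    static final AtomicInteger live = new AtomicInteger();
    static final AtomicInteger peak = new AtomicInteger();

    // Stand-in for "acquire driver, run scenario, quit driver".
    static void runScenario() throws InterruptedException {
        PERMITS.acquire();
        try {
            int now = live.incrementAndGet();
            peak.accumulateAndGet(now, Math::max); // remember the highest concurrency seen
            Thread.sleep(20);                      // simulated browser work
        } finally {
            live.decrementAndGet();
            PERMITS.release();
        }
    }

    // Runs the scenarios on a pool that is deliberately larger than MAX_DRIVERS
    // and returns the peak number of simultaneously "open" drivers.
    static int runAll(int scenarios, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < scenarios; i++) {
            pool.submit(() -> { runScenario(); return null; });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("peak concurrent drivers: " + runAll(51, 16));
    }
}
```

Even with 16 worker threads and 51 scenarios, the peak never exceeds MAX_DRIVERS, because extra threads simply wait in acquire() until a permit is released.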
I'm writing a Spring Boot application that starts up, gathers and converts millions of database entries into a new streamlined JSON format, and then sends them all to a GCP PubSub topic. I'm attempting to use Spring Batch for this, but I'm running into trouble implementing fault tolerance for my process. The database is rife with data quality issues, and sometimes my conversions to JSON will fail. When failures occur, I don't want the job to quit immediately. I want it to continue processing as many records as it can and, before completion, to report exactly which records failed so that my team and I can examine these problematic database entries.
To achieve this, I've attempted to use Spring Batch's SkipListener interface. But I'm also using an AsyncItemProcessor and an AsyncItemWriter in my process, and even though the exceptions are occurring during the processing, the SkipListener's onSkipInWrite() method is catching them - rather than the onSkipInProcess() method. And unfortunately, the onSkipInWrite() method doesn't have access to the original database entity, so I can't store its ID in my list of problematic DB entries.
Have I misconfigured something? Is there any other way to gain access to the objects from the reader that failed the processing step of an AsyncItemProcessor?
Here's what I've tried...
I have a singleton Spring Component where I store how many DB entries I've successfully processed along with up to 20 problematic database entries.
@Component
@Getter //lombok
public class ProcessStatus {
private int processed;
private int failureCount;
private final List<UnexpectedFailure> unexpectedFailures = new ArrayList<>();
public void incrementProgress() { processed++; }
public void logUnexpectedFailure(UnexpectedFailure failure) {
failureCount++;
unexpectedFailures.add(failure);
}
@Getter
@AllArgsConstructor
public static class UnexpectedFailure {
private Throwable error;
private DBProjection dbData;
}
}
I have a Spring batch Skip Listener that's supposed to catch failures and update my status component accordingly:
@AllArgsConstructor
public class ConversionSkipListener implements SkipListener<DBProjection, Future<JsonMessage>> {
private ProcessStatus processStatus;
@Override
public void onSkipInRead(Throwable error) {}
@Override
public void onSkipInProcess(DBProjection dbData, Throwable error) {
processStatus.logUnexpectedFailure(new ProcessStatus.UnexpectedFailure(error, dbData));
}
@Override
public void onSkipInWrite(Future<JsonMessage> messageFuture, Throwable error) {
//This is getting called instead!! Even though the exception happened during processing :(
//But I have no access to the original DBProjection data here, and messageFuture.get() gives me null.
}
}
And then I've configured my job like this:
@Configuration
public class ConversionBatchJobConfig {
@Autowired
private JobBuilderFactory jobBuilderFactory;
@Autowired
private StepBuilderFactory stepBuilderFactory;
@Autowired
private TaskExecutor processThreadPool;
@Bean
public SimpleCompletionPolicy processChunkSize(@Value("${commit.chunk.size:100}") Integer chunkSize) {
return new SimpleCompletionPolicy(chunkSize);
}
@Bean
@StepScope
public ItemStreamReader<DbProjection> dbReader(
MyDomainRepository myDomainRepository,
@Value("#{jobParameters[pageSize]}") Integer pageSize,
@Value("#{jobParameters[limit]}") Integer limit) {
RepositoryItemReader<DbProjection> myDomainRepositoryReader = new RepositoryItemReader<>();
myDomainRepositoryReader.setRepository(myDomainRepository);
myDomainRepositoryReader.setMethodName("findActiveDbDomains"); //A native query
myDomainRepositoryReader.setArguments(new ArrayList<Object>() {{
add("ACTIVE");
}});
myDomainRepositoryReader.setSort(new HashMap<String, Sort.Direction>() {{
put("update_date", Sort.Direction.ASC);
}});
myDomainRepositoryReader.setPageSize(pageSize);
myDomainRepositoryReader.setMaxItemCount(limit);
// myDomainRepositoryReader.setSaveState(false); <== haven't figured out what this does yet
return myDomainRepositoryReader;
}
@Bean
@StepScope
public ItemProcessor<DbProjection, JsonMessage> dataConverter(DataRetrievalSerivice dataRetrievalService) {
//Sometimes throws exceptions when DB data is exceptionally weird, bad, or missing
return new DbProjectionToJsonMessageConverter(dataRetrievalService);
}
@Bean
@StepScope
public AsyncItemProcessor<DbProjection, JsonMessage> asyncDataConverter(
ItemProcessor<DbProjection, JsonMessage> dataConverter) throws Exception {
AsyncItemProcessor<DbProjection, JsonMessage> asyncDataConverter = new AsyncItemProcessor<>();
asyncDataConverter.setDelegate(dataConverter);
asyncDataConverter.setTaskExecutor(processThreadPool);
asyncDataConverter.afterPropertiesSet();
return asyncDataConverter;
}
@Bean
@StepScope
public ItemWriter<JsonMessage> jsonPublisher(GcpPubsubPublisherService publisherService) {
return new JsonMessageWriter(publisherService);
}
@Bean
@StepScope
public AsyncItemWriter<JsonMessage> asyncJsonPublisher(ItemWriter<JsonMessage> jsonPublisher) throws Exception {
AsyncItemWriter<JsonMessage> asyncJsonPublisher = new AsyncItemWriter<>();
asyncJsonPublisher.setDelegate(jsonPublisher);
asyncJsonPublisher.afterPropertiesSet();
return asyncJsonPublisher;
}
@Bean
public Step conversionProcess(SimpleCompletionPolicy processChunkSize,
ItemStreamReader<DbProjection> dbReader,
AsyncItemProcessor<DbProjection, JsonMessage> asyncDataConverter,
AsyncItemWriter<JsonMessage> asyncJsonPublisher,
ProcessStatus processStatus,
@Value("${conversion.failure.limit:20}") int maximumFailures) {
return stepBuilderFactory.get("conversionProcess")
.<DbProjection, Future<JsonMessage>>chunk(processChunkSize)
.reader(dbReader)
.processor(asyncDataConverter)
.writer(asyncJsonPublisher)
.faultTolerant()
.skipPolicy(new MyCustomConversionSkipPolicy(maximumFailures))
// ^ for now this returns true for everything until 20 failures
.listener(new ConversionSkipListener(processStatus))
.build();
}
@Bean
public Job conversionJob(Step conversionProcess) {
return jobBuilderFactory.get("conversionJob")
.start(conversionProcess)
.build();
}
}
This is because the future wrapped by the AsyncItemProcessor is only unwrapped in the AsyncItemWriter, so any exception that might occur at that time is seen as a write exception instead of a processing exception. That's why onSkipInWrite is called instead of onSkipInProcess.
This is actually a known limitation of this pattern which is documented in the Javadoc of the AsyncItemProcessor, here is an excerpt:
Because the Future is typically unwrapped in the ItemWriter,
there are lifecycle and stats limitations (since the framework doesn't know
what the result of the processor is).
While not an exhaustive list, things like StepExecution.filterCount will not
reflect the number of filtered items and
itemProcessListener.onProcessError(Object, Exception) will not be called.
The Javadoc states that the list is not exhaustive, and the side effect regarding the SkipListener that you are experiencing is one of these limitations.
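The mechanism can be illustrated with plain JDK classes, without Spring Batch: an exception thrown inside an asynchronous task is stored in the Future and only surfaces when get() is called, which is the point where the writer unwraps it. The sketch also shows one possible workaround, which is an assumption on my part and not something the framework prescribes: wrap the delegate so that the failing input is recorded while it is still in scope, before rethrowing. The names AsyncFailureDemo, recording, convert and failedInputs are hypothetical.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical demo class, independent of Spring Batch.
final class AsyncFailureDemo {
    // Thread-safe record of inputs whose processing failed.
    static final List<String> failedInputs = new CopyOnWriteArrayList<>();

    // Wraps a delegate so the failing input is recorded before the exception propagates.
    static <I, O> Function<I, O> recording(Function<I, O> delegate) {
        return item -> {
            try {
                return delegate.apply(item);
            } catch (RuntimeException e) {
                failedInputs.add(String.valueOf(item));
                throw e;
            }
        };
    }

    // Stand-in for the real conversion; rejects empty rows.
    static String convert(String row) {
        if (row.isEmpty()) {
            throw new IllegalArgumentException("empty row");
        }
        return "{\"value\":\"" + row + "\"}";
    }

    // Submits the conversion asynchronously and reports where a failure was observed.
    static String whereObserved(String row) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Function<String, String> processor = recording(AsyncFailureDemo::convert);
            Callable<String> task = () -> processor.apply(row);
            Future<String> future = pool.submit(task); // "process" phase: nothing is thrown here
            try {
                future.get();                          // "write" phase: the Future is unwrapped
                return "none";
            } catch (ExecutionException e) {
                return "on get(): " + e.getCause().getMessage();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(whereObserved("ok"));
        System.out.println(whereObserved(""));
        System.out.println("failed inputs recorded: " + failedInputs.size());
    }
}
```

In the question's setup, the equivalent would be to do the recording inside the delegate ItemProcessor (the one wrapped by the AsyncItemProcessor), since that is the last place where the original DbProjection is available.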
Using Spring Boot 2.0.4 and JOOQ 3.11.3.
I have a server endpoint that needs fine-grained control over transaction management; it needs to issue multiple SQL statements before and after an external call and must not keep the DB transaction open while talking to the external site.
In the below code testTransactionV4 is the attempt I like best.
I've looked in the JOOQ manual but the transaction-management section is pretty light-on and seems to imply this is the way to do it.
It feels like I'm working harder than I should be here, which is usually a sign that I'm doing it wrong. Is there a better, "correct" way to do manual transaction management with Spring/JOOQ?
Also, any improvements to the implementation of the TransactionBean would be greatly appreciated (and upvoted).
But the point of this question is really just: "Is this the right way"?
TestEndpoint:
@Role.SystemApi
@SystemApiEndpoint
public class TestEndpoint {
private static Log log = to(TestEndpoint.class);
@Autowired private DSLContext db;
@Autowired private TransactionBean txBean;
@Autowired private Tx tx;
private void doNonTransactionalThing() {
log.info("long running thing that should not be inside a transaction");
}
/** Works; don't like the commitWithResult name but it'll do if there's
no better way. Implementation is ugly too.
*/
@JsonPostMethod("testTransactionV4")
public void testMultiTransactionWithTxBean() {
log.info("start testMultiTransactionWithTxBean");
AccountRecord account = txBean.commitWithResult( db ->
db.fetchOne(ACCOUNT, ACCOUNT.ID.eq(1)) );
doNonTransactionalThing();
account.setName("test_tx+"+new Date());
txBean.commit(db -> account.store() );
}
/** Works; but it's ugly, especially having to work around lambda final
requirements on references. */
@JsonPostMethod("testTransactionV3")
public void testMultiTransactionWithJooqApi() {
log.info("start testMultiTransactionWithJooqApi");
AtomicReference<AccountRecord> account = new AtomicReference<>();
db.transaction( config->
account.set(DSL.using(config).fetchOne(ACCOUNT, ACCOUNT.ID.eq(1))) );
doNonTransactionalThing();
account.get().setName("test_tx+"+new Date());
db.transaction(config->{
account.get().store();
});
}
/** Does not work, there's only one commit that spans over the long operation */
@JsonPostMethod("testTransactionV1")
@Transactional
public void testIncorrectSingleTransactionWithMethodAnnotation() {
log.info("start testIncorrectSingleTransactionWithMethodAnnotation");
AccountRecord account = db.fetchOne(ACCOUNT, ACCOUNT.ID.eq(1));
doNonTransactionalThing();
account.setName("test_tx+"+new Date());
account.store();
}
/** Works, but I don't like defining my tx boundaries this way, readability
is poor (relies on correct bean naming and even then is non-obvious) and is
fragile in the face of refactoring. When explicit TX boundaries are needed
I want them getting in my face straight away.
*/
@JsonPostMethod("testTransactionV2")
public void testMultiTransactionWithNestedComponent() {
log.info("start testTransactionWithComponentDelegation");
AccountRecord account = tx.readAccount();
doNonTransactionalThing();
account.setName("test_tx+"+new Date());
tx.writeAccount(account);
}
@Component
static class Tx {
@Autowired private DSLContext db;
@Transactional
public AccountRecord readAccount() {
return db.fetchOne(ACCOUNT, ACCOUNT.ID.eq(1));
}
@Transactional
public void writeAccount(AccountRecord account) {
account.store();
}
}
}
TransactionBean:
@Component
public class TransactionBean {
@Autowired private DSLContext db;
/**
Don't like the name, but can't figure out how to make it be just "commit".
*/
public <T> T commitWithResult(Function<DSLContext, T> worker) {
// Yuck, at the very least need an array or something as the holder.
AtomicReference<T> result = new AtomicReference<>();
db.transaction( config -> result.set(
worker.apply(DSL.using(config))
));
return result.get();
}
public void commit(Consumer<DSLContext> worker) {
db.transaction( config ->
worker.accept(DSL.using(config))
);
}
public void commit(Runnable worker) {
db.transaction( config ->
worker.run()
);
}
}
Use the TransactionTemplate to wrap the transactional part. Spring Boot provides one out-of-the-box so it is ready for use. You can use the execute method to wrap a call in a transaction.
@Autowired
private TransactionTemplate transaction;
@JsonPostMethod("testTransactionV1")
public void testIncorrectSingleTransactionWithTransactionTemplate() {
log.info("start testIncorrectSingleTransactionWithMethodAnnotation");
AccountRecord account = transaction.execute( status -> db.fetchOne(ACCOUNT, ACCOUNT.ID.eq(1)));
doNonTransactionalThing();
transaction.execute(status -> {
account.setName("test_tx+"+new Date());
account.store();
return null;
});
}
Something like that should do the trick. I'm not sure the lambdas will compile exactly as written (I keep forgetting the syntax of the TransactionCallback).
In the olden days, we had ThreadLocal for programs to carry data along with the request path since all request processing was done on that thread and stuff like Logback used this with MDC.put("requestId", getNewRequestId());
Then Scala and functional programming came along and Futures came along and with them came Local.scala (at least I know the twitter Futures have this class). Future.scala knows about Local.scala and transfers the context through all the map/flatMap, etc. etc. functionality such that I can still do Local.set("requestId", getNewRequestId()); and then downstream after it has travelled over many threads, I can still access it with Local.get(...)
So my question is: in Java, can I do the same thing with the new CompletableFuture, using a LocalContext or some similar object (I'm not sure of the name)? That way I could modify the Logback MDC to store its data in that context instead of a ThreadLocal, so that I don't lose the request ID, and all my logging across thenApply, thenAccept, etc. still works with the %X{requestId} conversion word in the Logback configuration.
EDIT:
As an example: if you have a request come in and you are using Log4j or Logback, you will set MDC.put("requestId", requestId) in a filter, and then in your app you will write many log statements like this:
log.info("request came in for url="+url);
log.info("request is complete");
Now, in the log output it will show this:
INFO {time}: requestId425 request came in for url=/mypath
INFO {time}: requestId425 request is complete
This uses a ThreadLocal trick. At Twitter, we use Scala and Twitter Futures along with a Local.scala class. Local.scala and Future.scala are tied together so that the above scenario still works, which is very nice: all our log statements include the request ID, the developer never has to remember to log it, and you can trace a single customer's request/response cycle with that ID.
I don't see this in Java :( which is very unfortunate as there are many use cases for that. Perhaps there is something I am not seeing though?
If you come across this, just poke the thread here
http://mail.openjdk.java.net/pipermail/core-libs-dev/2017-May/047867.html
to implement something like twitter Futures which transfer Locals (Much like ThreadLocal but transfers state).
See the def respond() method in here and how it calls Local.save() and Local.restore()
https://github.com/simonratner/twitter-util/blob/master/util-core/src/main/scala/com/twitter/util/Future.scala
If Java Authors would fix this, then the MDC in logback would work across all 3rd party libraries. Until then, IT WILL NOT WORK unless you can change the 3rd party library(doubtful you can do that).
My approach (which works on JDK 9+, since a couple of overridable methods were exposed in that version) is to make the complete ecosystem aware of MDC.
For that, we need to address the following scenarios:
Where do we get new instances of CompletableFuture from within this class? → We need to return an MDC-aware version instead.
Where do we get new instances of CompletableFuture from outside this class? → We need to return an MDC-aware version instead.
Which executor is used by the CompletableFuture class, and when? → In all circumstances, we need to make sure that the executor is MDC-aware.
For that, let's create an MDC-aware subclass of CompletableFuture. My version looks like this:
import org.slf4j.MDC;
import java.util.Map;
import java.util.concurrent.*;
import java.util.function.Function;
import java.util.function.Supplier;
public class MDCAwareCompletableFuture<T> extends CompletableFuture<T> {
public static final ExecutorService MDC_AWARE_ASYNC_POOL = new MDCAwareForkJoinPool();
@Override
public <U> CompletableFuture<U> newIncompleteFuture() {
return new MDCAwareCompletableFuture<>();
}
@Override
public Executor defaultExecutor() {
return MDC_AWARE_ASYNC_POOL;
}
public static <T> CompletionStage<T> getMDCAwareCompletionStage(CompletableFuture<T> future) {
return new MDCAwareCompletableFuture<>()
.completeAsync(() -> null)
.thenCombineAsync(future, (aVoid, value) -> value);
}
public static <T> CompletionStage<T> getMDCHandledCompletionStage(CompletableFuture<T> future,
Function<Throwable, T> throwableFunction) {
Map<String, String> contextMap = MDC.getCopyOfContextMap();
return getMDCAwareCompletionStage(future)
.handle((value, throwable) -> {
setMDCContext(contextMap);
if (throwable != null) {
return throwableFunction.apply(throwable);
}
return value;
});
}
}
The MDCAwareForkJoinPool class would look like this (I have skipped the methods with ForkJoinTask parameters for simplicity):
public class MDCAwareForkJoinPool extends ForkJoinPool {
//Override constructors which you need
@Override
public <T> ForkJoinTask<T> submit(Callable<T> task) {
return super.submit(MDCUtility.wrapWithMdcContext(task));
}
@Override
public <T> ForkJoinTask<T> submit(Runnable task, T result) {
return super.submit(wrapWithMdcContext(task), result);
}
@Override
public ForkJoinTask<?> submit(Runnable task) {
return super.submit(wrapWithMdcContext(task));
}
@Override
public void execute(Runnable task) {
super.execute(wrapWithMdcContext(task));
}
}
The wrapping utility methods would look like this:
public static <T> Callable<T> wrapWithMdcContext(Callable<T> task) {
//save the current MDC context
Map<String, String> contextMap = MDC.getCopyOfContextMap();
return () -> {
setMDCContext(contextMap);
try {
return task.call();
} finally {
// once the task is complete, clear MDC
MDC.clear();
}
};
}
public static Runnable wrapWithMdcContext(Runnable task) {
//save the current MDC context
Map<String, String> contextMap = MDC.getCopyOfContextMap();
return () -> {
setMDCContext(contextMap);
try {
task.run();
} finally {
// once the task is complete, clear MDC
MDC.clear();
}
};
}
public static void setMDCContext(Map<String, String> contextMap) {
MDC.clear();
if (contextMap != null) {
MDC.setContextMap(contextMap);
}
}
Below are some guidelines for usage:
Use the class MDCAwareCompletableFuture rather than the class CompletableFuture.
Several methods in the CompletableFuture class instantiate a plain CompletableFuture themselves (new CompletableFuture....); this applies to most of the public static methods. For such methods, use an alternative that gives you an instance of MDCAwareCompletableFuture. For example, rather than using CompletableFuture.supplyAsync(...), you can use new MDCAwareCompletableFuture<>().completeAsync(...).
Convert the instance of CompletableFuture to MDCAwareCompletableFuture by using the method getMDCAwareCompletionStage when you get stuck with one because of say some external library which returns you an instance of CompletableFuture. Obviously, you can't retain the context within that library but this method would still retain the context after your code hits the application code.
When supplying an executor as a parameter, make sure that it is MDC-aware, such as MDCAwareForkJoinPool. You could also create an MDCAwareThreadPoolExecutor by overriding the execute method to serve your use case. You get the idea!
You can find a detailed explanation of all of the above here in a post about the same.
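The core idea, capturing the context on the submitting thread and restoring it on the worker thread, can be tried out with the JDK alone. In the sketch below, a plain ThreadLocal map stands in for SLF4J's MDC so that the example has no dependencies; ContextPropagationDemo, CONTEXT, contextAware and requestIdSeenDownstream are hypothetical names, and the wrapper is a simplified take on the approach rather than the classes above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo class; CONTEXT is a stand-in for the logging MDC.
final class ContextPropagationDemo {
    static final ThreadLocal<Map<String, String>> CONTEXT =
            ThreadLocal.withInitial(HashMap::new);

    // Wraps an executor so every task runs with the submitter's context.
    static Executor contextAware(Executor delegate) {
        return task -> {
            // Captured on the thread that submits the task.
            Map<String, String> captured = new HashMap<>(CONTEXT.get());
            delegate.execute(() -> {
                Map<String, String> previous = CONTEXT.get();
                CONTEXT.set(captured);     // restore on the worker thread
                try {
                    task.run();
                } finally {
                    CONTEXT.set(previous); // do not leak context into the next task
                }
            });
        };
    }

    // Sets a request id on the caller thread and reads it back two async stages later.
    static String requestIdSeenDownstream(String requestId) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Executor executor = contextAware(pool);
            CONTEXT.get().put("requestId", requestId);
            return CompletableFuture
                    .supplyAsync(() -> "payload", executor)
                    .thenApplyAsync(payload -> CONTEXT.get().get("requestId"), executor)
                    .join();
        } finally {
            CONTEXT.get().remove("requestId");
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(requestIdSeenDownstream("requestId425"));
    }
}
```

The second stage still sees the request id because the dependent task is handed to the wrapped executor from a thread whose context has already been restored. Stages that are run without the wrapped executor (for example thenApply on an already-completed future from a third-party library) are not covered by this sketch.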
I am able to use the standard spout/bolt combination to do streaming aggregation, and it works very well in the happy case, using tick tuples to persist data at some interval so that I can make use of batching. Right now I am doing some failure management myself (tracking tuples that were not saved, etc.), i.e. not out of the box from Storm.
But I have read that Trident gives you a higher-level abstraction and better failure management.
What I don't understand is whether there is tick tuple support in Trident. Basically, I would like to batch in memory for the current minute or so and persist any aggregated data for the previous minutes using Trident.
Any pointers or design suggestions would be helpful.
Thanks
Actually, micro-batching is a built-in Trident feature. You don't need any tick tuples for that. When you have something like this in your code:
topology
.newStream("myStream", spout)
.partitionPersist(
ElasticSearchEventState.getFactoryFor(connectionProvider),
new Fields("field1", "field2"),
new ElasticSearchEventUpdater()
);
(I'm using my custom ElasticSearch state/updater here; you might use something else.)
With code like this, Trident groups your stream into batches under the hood and performs the partitionPersist operation on those batches rather than on individual tuples.
If you still need tick tuples for any reason, just create your tick spout, something like this works for me:
public class TickSpout implements IBatchSpout {
public static final String TIMESTAMP_FIELD = "timestamp";
private final long delay;
public TickSpout(long delay) {
this.delay = delay;
}
@Override
public void open(Map conf, TopologyContext context) {
}
@Override
public void emitBatch(long batchId, TridentCollector collector) {
Utils.sleep(delay);
collector.emit(new Values(System.currentTimeMillis()));
}
@Override
public void ack(long batchId) {
}
@Override
public void close() {
}
@Override
public Map getComponentConfiguration() {
return null;
}
@Override
public Fields getOutputFields() {
return new Fields(TIMESTAMP_FIELD);
}
}
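Whichever way the ticks are produced, the per-minute batching described in the question (aggregate in memory for the current minute, persist the previous minutes) can be sketched independently of Storm. The class below is a hypothetical, framework-free illustration: MinuteAggregator, add, drainCompleted and pendingBuckets are names invented here, and where the drained buckets are persisted (for example from a state updater or on a tick) is left open.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: sums amounts into one bucket per minute.
final class MinuteAggregator {
    private static final long MILLIS_PER_MINUTE = 60_000L;
    // Key: minute index since the epoch; value: aggregated amount for that minute.
    private final TreeMap<Long, Long> sumsByMinute = new TreeMap<>();

    // Adds an amount to the bucket of the minute containing the timestamp.
    void add(long timestampMillis, long amount) {
        sumsByMinute.merge(timestampMillis / MILLIS_PER_MINUTE, amount, Long::sum);
    }

    // Removes and returns every bucket strictly older than the current minute.
    // The bucket of the current minute stays in memory and keeps aggregating.
    Map<Long, Long> drainCompleted(long nowMillis) {
        long currentMinute = nowMillis / MILLIS_PER_MINUTE;
        Map<Long, Long> head = sumsByMinute.headMap(currentMinute, false);
        Map<Long, Long> completed = new TreeMap<>(head);
        head.clear(); // clearing the view removes the entries from the backing map
        return completed;
    }

    int pendingBuckets() {
        return sumsByMinute.size();
    }
}
```

A tick (or the arrival of a new batch) would call drainCompleted(System.currentTimeMillis()) and persist the returned buckets, while the bucket for the current minute remains in memory.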