Download and save file from ClientRequest using ExchangeFunction in Project Reactor - spring-boot

I have a problem with correctly saving a file after its download completes in Project Reactor.
class HttpImageClientDownloader implements ImageClientDownloader {

    private final ExchangeFunction exchangeFunction;

    HttpImageClientDownloader() {
        this.exchangeFunction = ExchangeFunctions.create(new ReactorClientHttpConnector());
    }

    @Override
    public Mono<File> downloadImage(String url, Path destination) {
        ClientRequest clientRequest = ClientRequest.create(HttpMethod.GET, URI.create(url)).build();
        return exchangeFunction.exchange(clientRequest)
                .map(clientResponse -> clientResponse.body(BodyExtractors.toDataBuffers()))
                //.flatMapMany(clientResponse -> clientResponse.body(BodyExtractors.toDataBuffers()))
                .flatMap(dataBuffer -> {
                    AsynchronousFileChannel fileChannel = createFile(destination);
                    return DataBufferUtils
                            .write(dataBuffer, fileChannel, 0)
                            .publishOn(Schedulers.elastic())
                            .doOnNext(DataBufferUtils::release)
                            .then(Mono.just(destination.toFile()));
                });
    }

    private AsynchronousFileChannel createFile(Path path) {
        try {
            return AsynchronousFileChannel.open(path, StandardOpenOption.CREATE);
        } catch (Exception e) {
            throw new ImageDownloadException("Error while creating file: " + path, e);
        }
    }
}
So my questions are:
Is DataBufferUtils.write(dataBuffer, fileChannel, 0) blocking? What about when the disk is slow?
And a second question: what happens when an ImageDownloadException occurs? In doOnNext I want to release the given data buffer; is that a good place for this kind of operation?
I think this line could also be blocking:
.map(clientResponse -> clientResponse.body(BodyExtractors.toDataBuffers()))

Here's another (shorter) way to achieve that:
Flux<DataBuffer> data = this.webClient.get()
        .uri("/greeting")
        .retrieve()
        .bodyToFlux(DataBuffer.class);

Path file = Files.createTempFile("spring", null);
WritableByteChannel channel = Files.newByteChannel(file, StandardOpenOption.WRITE);

Mono<File> result = DataBufferUtils.write(data, channel)
        .map(DataBufferUtils::release)
        .then(Mono.just(file));
Now the DataBufferUtils::write operations are not blocking because they use non-blocking IO with channels. Writing to such channels means it'll write whatever it can to the output buffer (i.e. it may write the whole DataBuffer or just part of it).
Using Flux::map or Flux::doOnNext is the right place to do that. But you're right: if an error occurs, you're still responsible for releasing the current buffer (and all the remaining ones). There might be something we can improve here in Spring Framework; please keep an eye on SPR-16782.
I don't see how your last sample shows anything blocking: all methods return reactive types and none are doing blocking I/O.
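For what it's worth, newer Spring Framework versions (5.1+, partly as a result of the SPR-16782 work mentioned above) added a DataBufferUtils.write(Publisher<DataBuffer>, Path, OpenOption...) overload that opens the channel and releases every buffer itself, including on error. A minimal sketch, assuming that overload is available, with webClient, url, and destination as in the snippets above:

Mono<File> result = webClient.get()
        .uri(url)
        .retrieve()
        .bodyToFlux(DataBuffer.class)
        // write(Publisher, Path) handles channel creation and buffer release for us
        .as(body -> DataBufferUtils.write(body, destination,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE))
        .then(Mono.fromCallable(destination::toFile));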

Related

How to use integrationFlows for new files?

I was following a tutorial on how to listen to a folder with spring integration and SseEmitter. I have this code now:
@Bean
IntegrationFlow inboundFlow(@Value("${input-dir:file:C:\\Users\\kader\\Desktop\\Scaned\\}") File in) {
    return IntegrationFlows.from(Files.inboundAdapter(in).autoCreateDirectory(true),
                    poller -> poller.poller(spec -> spec.fixedRate(1000L)))
            .transform(File.class, File::getAbsolutePath)
            .handle(String.class, (path, map) -> {
                sses.forEach((sse) -> {
                    try {
                        sse.send(SseEmitter.event().name("spring").data(path));
                    }
                    catch (IOException e) {
                        throw new RuntimeException(e);
                    }
                });
                return null;
            })
            .get();
}
and it works, but it sends all the files in the specified directory, including the files that already exist. Is there any way to make it ignore them and send only the new files?
Well, actually, since you don't configure any filters on the Files.inboundAdapter(), there is logic like this:
// no filters are provided
else if (Boolean.FALSE.equals(this.preventDuplicates)) {
    filtersNeeded.add(new AcceptAllFileListFilter<File>());
}
else { // preventDuplicates is either TRUE or NULL
    filtersNeeded.add(new AcceptOnceFileListFilter<File>());
}
Therefore an AcceptOnceFileListFilter is applied, and files that have already been polled will not be picked up by subsequent poll tasks.
However, you may really mean something like "after an application restart"; in that case, yes, all the files are going to be polled again.
I believe you need to study what the FileListFilter is and use one appropriate for your use case: https://docs.spring.io/spring-integration/docs/current/reference/html/files.html#file-reading
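For example, to remember already-processed files across restarts, you can plug a persistent accept-once filter backed by a metadata store into the adapter. A minimal sketch, assuming a reasonably recent Spring Integration where PropertiesPersistingMetadataStore implements ConcurrentMetadataStore (the "scanned-" key prefix is arbitrary):

@Bean
public ConcurrentMetadataStore metadataStore() {
    // keeps the "seen files" state in a local .properties file
    return new PropertiesPersistingMetadataStore();
}

@Bean
IntegrationFlow inboundFlow(@Value("${input-dir:file:C:\\Users\\kader\\Desktop\\Scaned\\}") File in,
        ConcurrentMetadataStore metadataStore) {
    return IntegrationFlows.from(Files.inboundAdapter(in)
                    .autoCreateDirectory(true)
                    // accept each file only once, even across application restarts
                    .filter(new FileSystemPersistentAcceptOnceFileListFilter(metadataStore, "scanned-")),
                    poller -> poller.poller(spec -> spec.fixedRate(1000L)))
            .transform(File.class, File::getAbsolutePath)
            .handle(System.out::println)
            .get();
}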

Kafka Streams - The state store may have migrated to another instance

I'm writing a basic application to test the Interactive Queries feature of Kafka Streams. Here is the code:
public static void main(String[] args) {
    StreamsBuilder builder = new StreamsBuilder();
    KeyValueBytesStoreSupplier waypointsStoreSupplier = Stores.persistentKeyValueStore("test-store");
    StoreBuilder waypointsStoreBuilder = Stores.keyValueStoreBuilder(waypointsStoreSupplier, Serdes.ByteArray(), Serdes.Integer());

    final KStream<byte[], byte[]> waypointsStream = builder.stream("sample1");
    final KStream<byte[], TruckDriverWaypoint> waypointsDeserialized = waypointsStream
            .mapValues(CustomSerdes::deserializeTruckDriverWaypoint)
            .filter((k, v) -> v.isPresent())
            .mapValues(Optional::get);

    waypointsDeserialized.groupByKey().aggregate(
            () -> 1,
            (aggKey, newWaypoint, aggValue) -> {
                aggValue = aggValue + 1;
                return aggValue;
            },
            Materialized.<byte[], Integer, KeyValueStore<Bytes, byte[]>>as("test-store")
                    .withKeySerde(Serdes.ByteArray())
                    .withValueSerde(Serdes.Integer())
    );

    final KafkaStreams streams = new KafkaStreams(builder.build(), new StreamsConfig(createStreamsProperties()));
    streams.cleanUp();
    streams.start();

    ReadOnlyKeyValueStore<byte[], Integer> keyValueStore = streams.store("test-store", QueryableStoreTypes.keyValueStore());
    KeyValueIterator<byte[], Integer> range = keyValueStore.all();
    while (range.hasNext()) {
        KeyValue<byte[], Integer> next = range.next();
        System.out.println(next.value);
    }

    Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
}
protected static Properties createStreamsProperties() {
    final Properties streamsConfiguration = new Properties();
    streamsConfiguration.put(StreamsConfig.APPLICATION_ID_CONFIG, "random167");
    streamsConfiguration.put(StreamsConfig.CLIENT_ID_CONFIG, "client-id");
    streamsConfiguration.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    streamsConfiguration.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
    streamsConfiguration.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, Serdes.String().getClass().getName());
    streamsConfiguration.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, Serdes.Integer().getClass().getName());
    //streamsConfiguration.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 10000);
    return streamsConfiguration;
}
So my problem is, every time I run this I get this same error:
Exception in thread "main" org.apache.kafka.streams.errors.InvalidStateStoreException: the state store, test-store, may have migrated to another instance.
I'm running only 1 instance of the application, and the topic I'm consuming from has only 1 partition.
Any idea what I'm doing wrong?
Looks like you have a race condition. The Kafka Streams javadoc for KafkaStreams::start() says:
Start the KafkaStreams instance by starting all its threads. This function is expected to be called only once during the life cycle of the client.
Because threads are started in the background, this method does not block.
https://kafka.apache.org/10/javadoc/index.html?org/apache/kafka/streams/KafkaStreams.html
You're calling streams.store() immediately after streams.start(), but I'd wager you're in a state where the store hasn't fully initialized yet.
Since this code appears to be just for testing, add a Thread.sleep(5000) or similar and give it a go. (This is not a solution for production.) Depending on your input rate into the topic, that should give the store a bit of time to start filling with events, so that your KeyValueIterator actually has something to process/print.
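A slightly less arbitrary variant of the same idea (still not production-grade) is to poll the instance's runtime state, which KafkaStreams#state() exposes, until it reports RUNNING before querying the store:

streams.start();

// busy-wait until the instance has finished starting up
while (streams.state() != KafkaStreams.State.RUNNING) {
    Thread.sleep(100);
}

ReadOnlyKeyValueStore<byte[], Integer> keyValueStore =
        streams.store("test-store", QueryableStoreTypes.keyValueStore());

Note that the store can still migrate afterwards; the retry loop shown further down is the more robust guard.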
Probably not applicable to the OP, but it might help others:
When trying to retrieve a KTable's store, make sure the KTable's topic exists first or you'll get this exception.
In my case, I had failed to call the StoreBuilder before consuming the store.
Typically this happens for two reasons:
1. The local KafkaStreams instance is not yet ready (i.e., not yet in runtime state RUNNING, see Run-time Status Information) and thus its local state stores cannot be queried yet.
2. The local KafkaStreams instance is ready (e.g. in runtime state RUNNING), but the particular state store was just migrated to another instance behind the scenes. This may notably happen during the startup phase of a distributed application or when you are adding/removing application instances.
https://docs.confluent.io/platform/current/streams/faq.html#handling-invalidstatestoreexception-the-state-store-may-have-migrated-to-another-instance
The simplest approach is to guard against InvalidStateStoreException when calling KafkaStreams#store():
// Example: Wait until the store of type T is queryable. When it is, return a reference to the store.
public static <T> T waitUntilStoreIsQueryable(final String storeName,
                                              final QueryableStoreType<T> queryableStoreType,
                                              final KafkaStreams streams) throws InterruptedException {
    while (true) {
        try {
            return streams.store(storeName, queryableStoreType);
        } catch (InvalidStateStoreException ignored) {
            // store not yet ready for querying
            Thread.sleep(100);
        }
    }
}

How to handle empty event in Spring reactor

Well, this sounds counter-intuitive to how reactive programming works, but I am unable to figure out a way to handle empty results and exceptions.
private static class Data {

    public Mono<String> first() {
        return Mono.just("first");
    }

    public Mono<String> second() {
        return Mono.just("second");
    }

    public Mono<String> empty() {
        return Mono.empty();
    }
}
I understand that, fundamentally, unless a publisher publishes an event, a subscriber will not act. So code like this works:
Data data = new Data();
data.first()
    .subscribe(string -> Assertions.assertThat(string).isEqualTo("first"));
And if the first call returns empty, I can do this.
Data data = new Data();
data.empty()
    .switchIfEmpty(data.second())
    .subscribe(string -> Assertions.assertThat(string).isEqualTo("second"));
But how do I handle the case when both calls return empty? (Typically this is an exception scenario that needs to be propagated to the user.)
Data data = new Data();
data.empty()
    .switchIfEmpty(data.empty())
    .handle((string, sink) -> Objects.requireNonNull(string))
    .block();
The handle is never called in the example above, since no event was published.
As JB Nizet pointed out, you can chain in a second switchIfEmpty with a Mono.error.
Or, if you're fine with a NoSuchElementException, you could chain in single(). It enforces a strong contract of exactly one element, otherwise propagating that standard exception.
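Both options in one sketch, reusing the Data class from the question:

Data data = new Data();

// Option 1: an explicit error as the final fallback
data.empty()
    .switchIfEmpty(data.empty())
    .switchIfEmpty(Mono.error(new IllegalStateException("both sources were empty")))
    .block(); // throws IllegalStateException

// Option 2: single() demands exactly one element
data.empty()
    .switchIfEmpty(data.empty())
    .single()
    .block(); // throws NoSuchElementException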

How to perform new operation on @RetryOnFailure by jcabi

I am using jcabi-aspects to retry the connection to my URL http://xxxxxx:8080/hello until the connection comes back. As you know, @RetryOnFailure by jcabi has two fields, attempts and delay.
I want the attempts to be derived as attempts (12) = expiryTime (1 min = 60000 ms) / delay (5 sec = 5000 ms) on jcabi @RetryOnFailure. How do I do this? The code snippet is below.
@RetryOnFailure(attempts = 12, delay = 5)
public String load(URL url) {
    return url.openConnection().getContent();
}
You can combine two annotations:
@Timeable(unit = TimeUnit.MINUTES, limit = 1)
@RetryOnFailure(attempts = Integer.MAX_VALUE, delay = 5)
public String load(URL url) {
    return url.openConnection().getContent();
}
@RetryOnFailure will retry forever, but @Timeable will stop it after a minute.
The library you picked (jcabi) does not have this feature. But luckily the very handy retry policies from Spring Batch have been extracted, so you can use them alone, without the batching:
Spring Retry
One of the many classes you could use from there is TimeoutRetryPolicy:
RetryTemplate template = new RetryTemplate();

TimeoutRetryPolicy policy = new TimeoutRetryPolicy();
policy.setTimeout(30000L);

template.setRetryPolicy(policy);

Foo result = template.execute(new RetryCallback<Foo>() {
    public Foo doWithRetry(RetryContext context) {
        // Do stuff that might fail, e.g. webservice operation
        return result;
    }
});
The whole spring-retry project is very easy to use and full of features, like backOffPolicies, listeners, etc.
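To mirror the original requirement (give up after 1 minute, wait 5 seconds between attempts), you can combine TimeoutRetryPolicy with a FixedBackOffPolicy. A sketch, assuming spring-retry 1.1+ where RetryCallback can be written as a lambda:

RetryTemplate template = new RetryTemplate();

TimeoutRetryPolicy policy = new TimeoutRetryPolicy();
policy.setTimeout(60_000L); // stop retrying after 1 minute
template.setRetryPolicy(policy);

FixedBackOffPolicy backOff = new FixedBackOffPolicy();
backOff.setBackOffPeriod(5_000L); // wait 5 seconds between attempts
template.setBackOffPolicy(backOff);

String content = template.execute(context -> load(url));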

Non-Blocking Endpoint: Returning an operation ID to the caller - Would like to get your opinion on my implementation?

Boot Pros,
I recently started programming in Spring Boot and I stumbled upon a question on which I would like to get your opinion.
What I am trying to achieve:
I created a Controller that exposes a GET endpoint, named nonBlockingEndpoint. This nonBlockingEndpoint executes a pretty long operation that is resource-heavy and can run between 20 and 40 seconds. (In the attached code, it is mocked by a Thread.sleep().)
Whenever the nonBlockingEndpoint is called, the Spring application should register that call and immediately return an operation ID to the caller.
The caller can then use this ID to query the status of that operation on another endpoint, queryOpStatus. At the beginning it will be started, and once the controller is done serving the request it will be set to a code such as SERVICE_OK. The caller then knows that his request was successfully completed on the server.
The solution that I found:
I have the following controller (note that it is explicitly not tagged with @Async)
It uses an APIOperationsManager to register that a new operation was started
I use the CompletableFuture construct to run the long-running code as a new async process via CompletableFuture.supplyAsync(() -> {})
I immediately return a response to the caller, telling it that the operation is in progress
Once the async task has finished, I use cf.thenRun() to update the operation status via the APIOperationsManager
Here is the code:
@GetMapping(path = "/nonBlockingEndpoint")
public @ResponseBody ResponseOperation nonBlocking() {

    // Register a new operation
    APIOperationsManager apiOpsManager = APIOperationsManager.getInstance();
    final int operationID = apiOpsManager.registerNewOperation(Constants.OpStatus.PROCESSING);

    ResponseOperation response = new ResponseOperation();
    response.setMessage("Triggered non-blocking call, use the operation id to check status");
    response.setOperationID(operationID);
    response.setOpRes(Constants.OpStatus.PROCESSING);

    CompletableFuture<Boolean> cf = CompletableFuture.supplyAsync(() -> {
        try {
            // Here we will
            Thread.sleep(10000L);
        } catch (InterruptedException e) {}
        // whatever the return value was
        return true;
    });

    cf.thenRun(() -> {
        // We are done with the super long process, so update our Operations Manager
        APIOperationsManager a = APIOperationsManager.getInstance();
        boolean asyncSuccess = false;
        try {
            asyncSuccess = cf.get();
        } catch (Exception e) {}
        if (asyncSuccess) {
            a.updateOperationStatus(operationID, Constants.OpStatus.OK);
            a.updateOperationMessage(operationID, "success: The long running process has finished and this is your result: SOME RESULT");
        } else {
            a.updateOperationStatus(operationID, Constants.OpStatus.INTERNAL_ERROR);
            a.updateOperationMessage(operationID, "error: The long running process has failed.");
        }
    });

    return response;
}
Here is also the APIOperationsManager.java for completeness:
public class APIOperationsManager {

    private static APIOperationsManager instance = null;
    private Vector<Operation> operations;
    private int currentOperationId;
    private static final Logger log = LoggerFactory.getLogger(Application.class);

    protected APIOperationsManager() {}

    public static APIOperationsManager getInstance() {
        if (instance == null) {
            synchronized (APIOperationsManager.class) {
                if (instance == null) {
                    instance = new APIOperationsManager();
                    instance.operations = new Vector<Operation>();
                    instance.currentOperationId = 1;
                }
            }
        }
        return instance;
    }

    public synchronized int registerNewOperation(OpStatus status) {
        cleanOperationsList();
        currentOperationId = currentOperationId + 1;
        Operation newOperation = new Operation(currentOperationId, status);
        operations.add(newOperation);
        log.info("Registered new Operation to watch: " + newOperation.toString());
        return newOperation.getId();
    }

    public synchronized Operation getOperation(int id) {
        for (Iterator<Operation> iterator = operations.iterator(); iterator.hasNext();) {
            Operation op = iterator.next();
            if (op.getId() == id) {
                return op;
            }
        }
        Operation notFound = new Operation(-1, OpStatus.INTERNAL_ERROR);
        notFound.setCrated(null);
        return notFound;
    }

    public synchronized void updateOperationStatus(int id, OpStatus newStatus) {
        for (Iterator<Operation> iterator = operations.iterator(); iterator.hasNext();) {
            Operation op = iterator.next();
            if (op.getId() == id) {
                op.setStatus(newStatus);
                log.info("Updated Operation status: " + op.toString());
                break;
            }
        }
    }

    public synchronized void updateOperationMessage(int id, String message) {
        for (Iterator<Operation> iterator = operations.iterator(); iterator.hasNext();) {
            Operation op = iterator.next();
            if (op.getId() == id) {
                op.setMessage(message);
                log.info("Updated Operation status: " + op.toString());
                break;
            }
        }
    }

    private synchronized void cleanOperationsList() {
        Date now = new Date();
        for (Iterator<Operation> iterator = operations.iterator(); iterator.hasNext();) {
            Operation op = iterator.next();
            if ((now.getTime() - op.getCrated().getTime()) >= Constants.MIN_HOLD_DURATION_OPERATIONS) {
                log.info("Removed operation from watchlist: " + op.toString());
                iterator.remove();
            }
        }
    }
}
The questions that I have:
Is this concept a valid one that also scales? What could be improved?
Will I run into concurrency issues / race conditions?
Is there a better way to achieve the same thing in Spring Boot that I just haven't found yet? (maybe with the @Async directive?)
I would be very happy to get your feedback.
Thank you so much,
Peter P
It is a valid pattern to submit a long running task with one request, returning an id that allows the client to ask for the result later.
But there are some things I would suggest reconsidering:
Do not use an Integer as the id, as it allows an attacker to guess ids and get the results for those ids. Instead use a random UUID.
If you need to restart your application, all ids and their results will be lost. You should persist them to a database.
Your solution will not work in a cluster with many instances of your application, as each instance would only know its 'own' ids and results. This could also be solved by persisting them to a database or a Redis store.
The way you are using CompletableFuture gives you no control over the number of threads used for the asynchronous operation. It is possible to do this with standard Java, but I would suggest using Spring to configure the thread pool.
Annotating the controller method with @Async is not an option; that simply does not work. Instead put all asynchronous operations into a simple service and annotate that with @Async (see the sketch after this list). This has some advantages:
You can use this service also synchronously, which makes testing a lot easier
You can configure the thread pool with Spring
The /nonBlockingEndpoint should not return just the id, but a complete link to queryOpStatus, including the id. The client can then use this link directly without any additional information.
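Pulling several of those suggestions together, a minimal sketch (all names are hypothetical, not the OP's API): a plain Spring bean instead of a hand-rolled singleton, a ConcurrentHashMap keyed by random UUIDs instead of a Vector of int ids, and the long-running work in an @Async method:

import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

import org.springframework.scheduling.annotation.Async;
import org.springframework.stereotype.Service;

@Service
public class OperationService {

    // thread-safe lookup by id; no iteration needed, unlike the Vector
    private final Map<String, String> statusById = new ConcurrentHashMap<>();

    public String register() {
        String id = UUID.randomUUID().toString(); // not guessable, unlike a counter
        statusById.put(id, "PROCESSING");
        return id;
    }

    public String status(String id) {
        return statusById.getOrDefault(id, "UNKNOWN");
    }

    @Async // runs on the Spring-configured executor (enable with @EnableAsync)
    public void runLongOperation(String id) {
        try {
            Thread.sleep(10_000L); // stand-in for the real 20-40 second workload
            statusById.put(id, "OK");
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            statusById.put(id, "INTERNAL_ERROR");
        }
    }
}

The controller then only calls register(), fires runLongOperation(id), and returns the status link.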
Additionally, there are some low-level implementation issues which you may also want to change:
Do not use Vector; it synchronizes on every operation. Use a List instead. Iterating over a List is also much easier: you can use for-loops or streams.
If you need to look up a value, do not iterate over a Vector or List; use a Map instead.
APIOperationsManager is a singleton. That makes no sense in a Spring application. Make it a normal POJO, create a bean of it, and get it autowired into the controller. Spring beans are singletons by default.
You should avoid doing complicated operations in a controller method. Instead move everything into a service (which may be annotated with @Async). This makes testing easier, as you can test this service without a web context.
Hope this helps.
Do I need to make database access transactional?
As long as you write/update only one row, there is no need to make this transactional, as that is indeed atomic.
If you write/update many rows at once, you should make it transactional to guarantee that either all rows are updated or none.
However, if two operations (maybe from two clients) update the same row, the last one always wins.
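As a sketch of that multi-row case (repository and method names are hypothetical), wrapping both updates in one transaction makes them commit or roll back together:

@Transactional
public void updateOperation(String id, String status, String message) {
    // both updates commit together, or neither does
    operationRepository.updateStatus(id, status);
    operationRepository.updateMessage(id, message);
}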
