Do terminal operations close the stream? (java-8)

dirPath contains 200k files. I want to read them one by one and do some processing. The following snippet causes java.nio.file.FileSystemException: dirPath/file-N: Too many open files. Isn't the terminal operation forEach() supposed to close the open stream (i.e. the open file) before moving on to the next one? In other words, do I have to add try-with-resources for the streamed files?
Files.list(dirPath)
    .forEach(filePath -> {
        Files.lines(filePath).forEach(line -> { ... });
    });

No, forEach does not close the stream (created by Files.list or Files.lines). It is documented in the javadoc, for example for Files.lines:
The returned stream encapsulates a Reader. If timely disposal of file system resources is required, the try-with-resources construct should be used to ensure that the stream's close method is invoked after the stream operations are completed.
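Applied to a single file, the documented pattern is a plain try-with-resources (a minimal sketch; the body is a placeholder):
try (Stream<String> lines = Files.lines(filePath)) {
    lines.forEach(line -> {
        // process the line
    });
}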

A nested forEach is the wrong tool, in most cases.
The code
Files.list(dirPath).forEach(filePath -> Files.lines(filePath).forEach(line -> { ... }));
can and should be replaced by
Files.list(dirPath).flatMap(filePath -> Files.lines(filePath)).forEach(line -> { ... });
or rather, since Files.lines throws a checked IOException which the flatMap function must not, it's not quite that easy in this case:
Files.list(dirPath).flatMap(filePath -> {
    try {
        return Files.lines(filePath);
    } catch (IOException ex) {
        throw new UncheckedIOException(ex);
    }
}).forEach(line -> { ... });
As a side effect, you get the following for free (from the Stream.flatMap(…) javadoc):
Each mapped stream is closed after its contents have been placed into this stream.
So that's the preferred solution. Or, to make it entirely correct:
try (Stream<Path> dirStream = Files.list(dirPath)) {
    dirStream.flatMap(filePath -> {
        try {
            return Files.lines(filePath);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }).forEach(line -> { ... });
}
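For completeness, here is a self-contained sketch of that pattern with the imports filled in (the class name and the line-processing body are placeholders):
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class ReadAllFiles {
    public static void main(String[] args) throws IOException {
        Path dirPath = Paths.get(args[0]); // the directory with the 200k files
        // try-with-resources closes the directory stream; flatMap closes
        // each per-file stream once its lines have been consumed.
        try (Stream<Path> dirStream = Files.list(dirPath)) {
            dirStream.flatMap(filePath -> {
                try {
                    return Files.lines(filePath);
                } catch (IOException ex) {
                    throw new UncheckedIOException(ex);
                }
            }).forEach(line -> {
                // process the line
            });
        }
    }
}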


Webflux Reactor - Checking if all items in the original Flux were successful

I currently have this Reactor code where I'm not sure I'm doing this the idiomatic way.
My requirements are that for a list of accountIds, I make 2 requests which are done one after the other: one to delete the account data, the other to trigger an event afterwards. The second request is only made if the first one succeeds.
At the end, I would like to know if all of the sets of requests were successful. I have achieved this with the code below.
List<String> accountIdsToForget = List.of("accountId", "someOtherAccountId");

Flux.fromIterable(accountIdsToForget)
    .flatMap(accountId -> someWebclient.deleteAccountData(accountId)
        .doOnSuccess(response -> log.info("Delete account data success"))
        .onErrorResume(e -> {
            log.info("Delete account data failure");
            return Mono.empty();
        })
        .flatMap(deleteAccountDataResponse -> eventServiceClient.triggerEvent("deleteAccountEvent")
            .doOnSuccess(response -> log.info("Delete account event success"))
            .onErrorResume(e -> {
                log.info("Delete account event failure");
                return Mono.empty();
            })))
    .count()
    .subscribe(items -> {
        if (items.intValue() == accountIdsToForget.size()) {
            log.info("All accountIds deleted and events triggered successfully");
        } else {
            log.info("Not all accountIds deleted and events triggered successfully");
        }
    });
Is there a better way to achieve this?
As the webclients can return errors for 4xx and 5xx, I am having to swallow those up with onErrorResume in order to prevent the error from bubbling up. Similarly, the only way I have been able to capture whether all of the accountIds have been processed is by checking the size of the resulting Flux against the size of the List it was started with.
Disclaimer: it is a little subjective what counts as a better solution. In this answer, I will present my personal choice of error handling, which, in my opinion, provides the best extensibility and readability.
I would model a result/report object (a bit like Either in the functional paradigm), so that each success or error is sent as a "next signal" downstream.
It requires a little more code/boilerplate, but the benefit is that we end up with a flow of successes and failures produced on the fly. It allows errors to be detected early, and eases both error recovery and pipeline extensibility (for example, it becomes very easy to switch between fail-fast and error-silencing strategies, or to build complex reports from upstream results, etc.).
Let's try to apply this to your example. For simplicity, I will mock the deletion and notification services with two methods that return an empty result on success:
static Mono<Void> delete(String account) {
    if (account.isBlank()) return Mono.error(new IllegalArgumentException("EMPTY ACCOUNT !"));
    else return Mono.empty();
}

static Mono<Void> notify(String event) {
    if (event.isBlank()) return Mono.error(new IllegalArgumentException("UNKNOWN EVENT !"));
    return Mono.empty();
}
I would take the following steps.
First, create a result model:
sealed interface Result { String accountId(); }
sealed interface Error extends Result { Throwable cause(); }
record DeletionError(String accountId, Throwable cause) implements Error {}
record NotifyError(String accountId, Throwable cause) implements Error {}
record Success(String accountId) implements Result {}
Then we can prepare a pipeline that wraps the delete and notify operations to make them produce result objects:
static Flux<Result> deleteAndNotify(Flux<String> accounts) {
    Function<String, Mono<Result>> safeDelete = account ->
            delete(account)
                    .<Result>thenReturn(new Success(account))
                    .onErrorResume(err -> Mono.just(new DeletionError(account, err)));

    Function<Result, Mono<Result>> safeNotify = deletionResult -> deletionResult instanceof Success
            ? notify("deleteAccountEvent")
                    .thenReturn(deletionResult)
                    .onErrorResume(err -> Mono.just(new NotifyError(deletionResult.accountId(), err)))
            : Mono.just(deletionResult);

    return accounts.flatMap(safeDelete)
                   .flatMap(safeNotify);
}
With the code above, you can already receive errors as they arrive. A simple program:
var results = deleteAndNotify(Flux.just("a1", "a2", " ", "a3"));
results.subscribe(System.out::println);
prints:
Success[accountId=a1]
Success[accountId=a2]
DeletionError[accountId= , cause=java.lang.IllegalArgumentException: EMPTY ACCOUNT !]
Success[accountId=a3]
Now it becomes very simple to adapt your flow of control (a sketch of the first bullet follows this list):
- If you want to keep track of errors only, just chain a simple filter: results.filter(it -> it instanceof Error)
- To fail fast, map an error result to a real error: results.flatMap(result -> result instanceof Error err ? Mono.error(err.cause()) : Mono.just(result))
- Want an idea of the flow's throughput? Just time it: results.timed()
- etc.
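For instance, a live error log (a sketch; note that Error here is the sealed interface defined above, not java.lang.Error):
results.filter(result -> result instanceof Error)
       .cast(Error.class)
       .subscribe(err -> System.err.println(err.accountId() + " failed: " + err.cause()));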
And if you want to count, you can now directly count errors and successes on the fly. This provides a few advantages:
- You are not forced to know the number of accounts to delete in advance in order to verify whether any error happened
- You can have live monitoring of the failed/succeeded operations
We can program the counting like this:
record Count(long success, long deleteFailed, long notifyFailed) {
    Count() { this(0, 0, 0); }
    Count newSuccess() { return new Count(success + 1, deleteFailed, notifyFailed); }
    Count newDeletionFailure() { return new Count(success, deleteFailed + 1, notifyFailed); }
    Count newNotifyFailure() { return new Count(success, deleteFailed, notifyFailed + 1); }
}

var counting = results.scanWith(Count::new, (count, result) -> switch (result) {
    case Success s -> count.newSuccess();
    case DeletionError de -> count.newDeletionFailure();
    case NotifyError ne -> count.newNotifyFailure();
});
Subscribing to this counting flow using the same input accounts as above would produce this kind of output:
Count[success=0, deleteFailed=0, notifyFailed=0]
Count[success=1, deleteFailed=0, notifyFailed=0]
Count[success=2, deleteFailed=0, notifyFailed=0]
Count[success=2, deleteFailed=1, notifyFailed=0]
Count[success=3, deleteFailed=1, notifyFailed=0]
If you want only a total count, then either use counting.last() or replace scanWith with the reduceWith operator.
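The total-only variant could look like this (a sketch reusing the Count record above):
Mono<Count> total = results.reduceWith(Count::new, (count, result) -> switch (result) {
    case Success s -> count.newSuccess();
    case DeletionError de -> count.newDeletionFailure();
    case NotifyError ne -> count.newNotifyFailure();
});
total.subscribe(System.out::println); // e.g. Count[success=3, deleteFailed=1, notifyFailed=0]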
I hope this answer helps you better model pipelines/DAGs/flows of operations.

Spring webflux with multiple sequential API call and convert to flux object without subscribe and block

I am working with Spring Reactive and need to make multiple sequential calls to another REST API using WebClient. The issue is that I can make the calls, but I am not able to read the response without subscribe or block. I can't use subscribe or block, as that would break the reactive flow. Is there any way I can merge the responses while reading them and return the result as a Flux?
Below is the piece of code where I am stuck.
private Flux<SeasonsDto> getSeasonsInfo(List<HuntsSeasonsMapping> l2, String seasonsUrl) {
    for (HuntsSeasonsMapping s : l2) {
        List<SeasonsJsonDto> list = huntsSeasonsProcessor.appendSeaosonToJson(s.getSeasonsRef());
        for (SeasonsJsonDto sjdto : list) {
            Mono<SeasonsDto> mono = new SeasonsAdapter("http://localhost:8087/").callToSeasonsAPI(sjdto.getSeasonsRef());
            // Not able to read the stream here without subscribe, and return it as a Flux
        }
    }
}

public Mono<SeasonsDto> callToSeasonsAPI(Long long1) {
    LOGGER.debug("Seasons API call");
    return this.webClient.get()
            .uri("hunts/seasonsInfo/" + long1)
            .header("X-GoHunt-LoggedIn-User", "a4d4b427-c716-458b-9bb5-9917b6aa30ff")
            .retrieve()
            .bodyToMono(SeasonsDto.class);
}
Please help me resolve this.
You need to combine the reactive streams using operators such as map, flatMap and concatMap.
private Flux<SeasonsDto> getSeasonsInfo(List<HuntsSeasonsMapping> l2, String seasonsUrl) {
    List<Mono<SeasonsDto>> monos = new ArrayList<>();
    for (HuntsSeasonsMapping s : l2) {
        List<SeasonsJsonDto> list = huntsSeasonsProcessor.appendSeaosonToJson(s.getSeasonsRef());
        for (SeasonsJsonDto sjdto : list) {
            monos.add(new SeasonsAdapter("http://localhost:8087/").callToSeasonsAPI(sjdto.getSeasonsRef()));
        }
    }
    return Flux.fromIterable(monos).concatMap(mono -> mono);
}
This can further be improved using the Stream API, which I suggest you look into, but I didn't want to change too much of your existing code.
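For reference, a sketch of that Stream-API version (same behavior, loops replaced; it assumes a java.util.stream.Collectors import and the same huntsSeasonsProcessor as above):
private Flux<SeasonsDto> getSeasonsInfo(List<HuntsSeasonsMapping> l2, String seasonsUrl) {
    SeasonsAdapter adapter = new SeasonsAdapter("http://localhost:8087/");
    List<Mono<SeasonsDto>> monos = l2.stream()
            // expand each mapping into its list of season DTOs
            .flatMap(s -> huntsSeasonsProcessor.appendSeaosonToJson(s.getSeasonsRef()).stream())
            .map(sjdto -> adapter.callToSeasonsAPI(sjdto.getSeasonsRef()))
            .collect(Collectors.toList());
    // concatMap subscribes to the monos one at a time, preserving order
    return Flux.fromIterable(monos).concatMap(mono -> mono);
}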
I have figured out how to do this. I completely rewrote the code to be fully reactive, which means all the for loops have been removed. Below is the code; it may help others.
public Flux<SeasonsDto> getAllSeasonDetails(String uuid) {
    return hunterRepository.findByUuidAndIsPrimaryAndDeleted(uuid, true, false)
            .next()
            .flatMapMany(h1 -> huntsMappingRepository.findByHunterIdAndDeleted(h1.getId(), false)
                    .flatMap(k -> huntsMappingRepository.findByHuntReferrenceIdAndDeleted(k.getHuntReferrenceId(), false)
                            .flatMap(l2 -> huntsSeasonsProcessor.appendSeaosonToJsonFlux(l2.getSeasonsDtl())
                                    .flatMap(fs -> seasonsAdapter.callSeasonsAPI(fs.getSeasonsRef(), h1.getId(), uuid)))));
}

How to use integrationFlows for new files?

I was following a tutorial on how to listen to a folder with Spring Integration and SseEmitter. I have this code now:
@Bean
IntegrationFlow inboundFlow(@Value("${input-dir:file:C:\\Users\\kader\\Desktop\\Scaned\\}") File in) {
    return IntegrationFlows.from(Files.inboundAdapter(in).autoCreateDirectory(true),
            poller -> poller.poller(spec -> spec.fixedRate(1000L)))
        .transform(File.class, File::getAbsolutePath)
        .handle(String.class, (path, map) -> {
            sses.forEach(sse -> {
                try {
                    sse.send(SseEmitter.event().name("spring").data(path));
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            });
            return null;
        })
        .get();
}
and it works, but it sends all the files in the specified directory, including the files that already exist. Is there any way to make it ignore them and send only the new files?
Well, actually, since you don't configure any filters on the Files.inboundAdapter(), there is logic like this in the adapter:
// no filters are provided
else if (Boolean.FALSE.equals(this.preventDuplicates)) {
    filtersNeeded.add(new AcceptAllFileListFilter<File>());
}
else { // preventDuplicates is either TRUE or NULL
    filtersNeeded.add(new AcceptOnceFileListFilter<File>());
}
Therefore an AcceptOnceFileListFilter is applied, and files that have already been polled are not going to be picked up by subsequent poll tasks.
However, you may really be asking about what happens after an application restart; in that case, yes, all the files are going to be polled again.
I believe you need to study what the FileListFilter is and use one appropriate for your use case: https://docs.spring.io/spring-integration/docs/current/reference/html/files.html#file-reading
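For example, a sketch with a persistent filter, so that only files not seen before are emitted, even across restarts (the metadata store location and the "scanned-" key prefix are assumptions; adjust them to your setup):
@Bean
public ConcurrentMetadataStore metadataStore() {
    // survives application restarts by flushing the seen-files state to a properties file
    PropertiesPersistingMetadataStore store = new PropertiesPersistingMetadataStore();
    store.setBaseDirectory("C:\\Users\\kader\\.scanned-metadata"); // hypothetical location
    return store;
}

@Bean
IntegrationFlow inboundFlow(@Value("${input-dir:file:C:\\Users\\kader\\Desktop\\Scaned\\}") File in,
                            ConcurrentMetadataStore metadataStore) {
    return IntegrationFlows.from(Files.inboundAdapter(in)
                    .autoCreateDirectory(true)
                    .filter(new FileSystemPersistentAcceptOnceFileListFilter(metadataStore, "scanned-")),
            poller -> poller.poller(spec -> spec.fixedRate(1000L)))
        // ... rest of the flow as in the question ...
        .get();
}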

Download and save file from ClientRequest using ExchangeFunction in Project Reactor

I have a problem with correctly saving a file after its download completes in Project Reactor.
class HttpImageClientDownloader implements ImageClientDownloader {

    private final ExchangeFunction exchangeFunction;

    HttpImageClientDownloader() {
        this.exchangeFunction = ExchangeFunctions.create(new ReactorClientHttpConnector());
    }

    @Override
    public Mono<File> downloadImage(String url, Path destination) {
        ClientRequest clientRequest = ClientRequest.create(HttpMethod.GET, URI.create(url)).build();
        return exchangeFunction.exchange(clientRequest)
                .map(clientResponse -> clientResponse.body(BodyExtractors.toDataBuffers()))
                //.flatMapMany(clientResponse -> clientResponse.body(BodyExtractors.toDataBuffers()))
                .flatMap(dataBuffer -> {
                    AsynchronousFileChannel fileChannel = createFile(destination);
                    return DataBufferUtils
                            .write(dataBuffer, fileChannel, 0)
                            .publishOn(Schedulers.elastic())
                            .doOnNext(DataBufferUtils::release)
                            .then(Mono.just(destination.toFile()));
                });
    }

    private AsynchronousFileChannel createFile(Path path) {
        try {
            return AsynchronousFileChannel.open(path, StandardOpenOption.CREATE);
        } catch (Exception e) {
            throw new ImageDownloadException("Error while creating file: " + path, e);
        }
    }
}
So my questions are:
Is DataBufferUtils.write(dataBuffer, fileChannel, 0) blocking? What about when the disk is slow?
And a second question about what happens when an ImageDownloadException occurs: in doOnNext I want to release the given data buffer; is that a good place for this kind of operation?
I think this line could also be blocking:
.map(clientResponse -> clientResponse.body(BodyExtractors.toDataBuffers()))
Here's another (shorter) way to achieve that:
Flux<DataBuffer> data = this.webClient.get()
        .uri("/greeting")
        .retrieve()
        .bodyToFlux(DataBuffer.class);

Path file = Files.createTempFile("spring", null);
WritableByteChannel channel = Files.newByteChannel(file, StandardOpenOption.WRITE);

Mono<File> result = DataBufferUtils.write(data, channel)
        .map(DataBufferUtils::release)
        .then(Mono.just(file));
The DataBufferUtils::write operations are not blocking because they use non-blocking IO with channels. Writing to such channels means it will write whatever it can to the output buffer (i.e. it may write all of the DataBuffer or just part of it).
Using Flux::map or Flux::doOnNext is the right place to do that. But you're right: if an error occurs, you're still responsible for releasing the current buffer (and all the remaining ones). There might be something we can improve here in Spring Framework; please keep an eye on SPR-16782.
I don't see how your last sample shows anything blocking: all methods return reactive types and none are doing blocking I/O.
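As an aside, if your Spring Framework version has the Path-based overload of DataBufferUtils.write (an assumption to verify against your version), the channel handling disappears entirely; a sketch, reusing the file Path from the snippet above:
Flux<DataBuffer> data = this.webClient.get()
        .uri("/greeting")
        .retrieve()
        .bodyToFlux(DataBuffer.class);

// write(Publisher, Path, OpenOption...) opens and closes the channel itself
// and releases each buffer once it has been written
Mono<File> result = DataBufferUtils
        .write(data, file, StandardOpenOption.CREATE, StandardOpenOption.WRITE)
        .then(Mono.fromCallable(file::toFile));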

struggling with asynchronous patterns using NSURLSession

I'm using Xcode 7 and Swift 2 but my question isn't necessarily code specific, I'll gladly take help of any variety.
In my app I have a list of favorites. Due to the API's TOS I can't store any data, so I just keep a stub I can use for lookups when the user opens the app. I also have to look up each favorite one by one, as there is no batch method. Right now I have something like this:
self.api.loadFavorite(id, completion: { (event, errorMessage) -> Void in
    if errorMessage == "" {
        if let rc = self.refreshControl {
            dispatch_async(dispatch_get_main_queue()) { () -> Void in
                rc.endRefreshing()
            }
        }
        dispatch_async(dispatch_get_main_queue()) { () -> Void in
            self.viewData.append(event)
            self.viewData.sortInPlace({ $0.eventDate.compare($1.eventDate) == NSComparisonResult.OrderedDescending })
            self.tableView.reloadData()
        }
    } else {
        // some more error handling here
    }
})
In api.loadFavorite I'm making a typical urlSession.dataTaskWithURL call, which is itself asynchronous.
You can see what happens here: the results are loaded one by one, and after each one the view refreshes. This does work, but it's not optimal; for long lists you get noticeable "flickering" as the view sorts and refreshes.
I want to get all the results and then refresh just once. I tried putting a dispatch group around the api.loadFavorite calls, but the async calls in dataTaskWithURL don't seem to be bound by that group. I also tried putting the dispatch group around just the dataTaskWithURL, but didn't have any better luck. The dispatch_group_notify always fires before all the data tasks are done.
Am I going at this all wrong? (Probably.) I considered switching to synchronous calls on a background thread, since the API only allows one connection per client anyway, but that just feels like the wrong approach.
I'd love to know how to get async calls that make other async calls grouped up so that I can get a single notification to update my UI.
For the record, I've read about every dispatch-group thread I could find here, and I haven't been able to make any of them work. Most examples on the web are very simple: a series of prints in a dispatch group, with a sleep to prove the case.
Thanks in advance.
If you want to invoke your method loadFavorite asynchronously in a loop for all favorite IDs, executing them in parallel, you can achieve this with a new method as shown below:
func loadFavorites(ids: [Int], completion: ([Event], ErrorType?) -> ()) {
    var count = ids.count
    var events = [Event]()
    if count == 0 {
        dispatch_async(dispatch_get_global_queue(0, 0)) {
            completion(events, nil)
        }
        return
    }
    let sync_queue = dispatch_queue_create("sync_queue",
        dispatch_queue_attr_make_with_qos_class(DISPATCH_QUEUE_SERIAL, QOS_CLASS_USER_INITIATED, 0))
    for i in ids {
        self.api.loadFavorite(i) { (event, message) in
            dispatch_async(sync_queue) {
                if message == "" {
                    events.append(event)
                    if --count == 0 {
                        dispatch_async(dispatch_get_global_queue(0, 0)) {
                            completion(events, nil)
                        }
                    }
                } else {
                    // handle error
                }
            }
        }
    }
}
Note:
- Use a sync queue in order to synchronise access to the shared array events and the counter!
- Use a global dispatch queue where you invoke the completion handler!
Then call it like below:
self.loadFavorites(favourites) { (events, error) in
    if error == nil {
        let sortedEvents = events.sort { $0.eventDate.compare($1.eventDate) == NSComparisonResult.OrderedDescending }
        dispatch_async(dispatch_get_main_queue()) {
            self.viewData = sortedEvents
            self.tableView.reloadData()
        }
    }
    if let rc = self.refreshControl {
        dispatch_async(dispatch_get_main_queue()) {
            rc.endRefreshing()
        }
    }
}
Note also that you need a different approach if you want to ensure that your calls to loadFavorite execute sequentially.
If you need to support cancellation (well, who doesn't?), you might try to cancel the NSURLSession's tasks. However, in this case I would recommend utilising a third-party library which already supports cancellation of network tasks.
Alternatively, and in order to greatly simplify asynchronous problems like these, build your network tasks and any other asynchronous tasks around a general utility class, frequently called Future or Promise. A future represents an eventual result and is quite lightweight. Futures are also "composable": you can define "continuations" which get invoked when the future completes, which in turn return yet another future where you can add more continuations, and so forth. See the wiki page on Futures and Promises.
There are a couple of implementations in Swift and Objective-C. Ideally, these should also support cancellation. Unfortunately, I don't know of any Swift library implementing futures or promises which supports cancellation at this time, except my own library, which is not yet open source.
Another library which helps to solve common and also very complex asynchronous patterns is ReactiveCocoa, though it has a very steep learning curve and adds quite a lot of code to your project.
This is what finally worked for me. Easy once I figured it out. My problem was trying to take Objective-C examples and rework them for Swift.
func migrateFavorites(completion: (error: Bool) -> Void) {
    let migrationGroup = dispatch_group_create()
    let queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0)

    // A lot of other code in here, fetching some Core Data etc.
    // (id and migrationError are set in that elided code)

    dispatch_group_enter(migrationGroup) // ENTER: just before the async call
    self.api.loadFavorite(id, completion: { (event, errorMessage) -> Void in
        if errorMessage == "" {
            if let rc = self.refreshControl {
                dispatch_async(dispatch_get_main_queue()) { () -> Void in
                    rc.endRefreshing()
                }
            }
            dispatch_async(dispatch_get_main_queue()) { () -> Void in
                self.viewData.append(event)
                self.viewData.sortInPlace({ $0.eventDate.compare($1.eventDate) == NSComparisonResult.OrderedDescending })
                self.tableView.reloadData()
            }
        } else {
            // some more error handling here
        }
        dispatch_group_leave(migrationGroup) // LEAVE: last line in the completion handler
    })

    dispatch_group_notify(migrationGroup, queue) { () -> Void in
        NSLog("Migration Queue Complete")
        dispatch_async(dispatch_get_main_queue()) { () -> Void in
            completion(error: migrationError)
        }
    }
}
The key was:
- ENTER the group just before the async call
- LEAVE the group as the last line in the completion handler
As I mentioned, all of this is wrapped up in a function, so I put the function's completion handler inside the dispatch_group_notify. I call this function, and the completion handler only gets invoked when all the async tasks are complete. Back on my main thread I check for the error and refresh the UI.
Hopefully this helps someone with the same problem.
