Is there a way to batch upload a collection of InputStreams to Amazon S3 using the Java SDK? - spring-boot

I am aware of TransferManager and its .uploadFileList() and .uploadFileDirectory() methods; however, they accept java.io.File types as arguments. I have a collection of byte array input streams containing JPEG image data, and I don't want to create in-memory files to store this data before I upload it, either.
So what I need is essentially what the S3 client's PutObjectRequest does, but for a collection of InputStream objects. Also, if one upload fails, I want to abort the whole thing and not upload anything, much like a database transaction reverses its changes if something goes wrong along the way.
Is this possible with the Java SDK?

Before I share an answer, please consider upgrading: TransferManager is deprecated and is now created through TransferManagerBuilder in the AWS SDK for Java, so please consider moving to it if TransferManagerBuilder suits your needs.
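For reference, a minimal sketch of the builder-based setup (AWS SDK for Java v1; the wrapper class name is mine):
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.transfer.TransferManager;
import com.amazonaws.services.s3.transfer.TransferManagerBuilder;

public class TransferManagerSetup {
    public static TransferManager create() {
        // Builds a TransferManager backed by a default S3 client
        return TransferManagerBuilder.standard()
                .withS3Client(AmazonS3ClientBuilder.defaultClient())
                .build();
    }
}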
Now, since you asked about TransferManager, you could either 1) copy the code below and replace the functionality/arguments with your custom in-memory handling of the input stream, handling it in your own function, or 2) try the second sample further below as-is.
This is adapted from a GitHub source, modified to work with an InputStream; the related issue is listed there:
private def uploadFile(is: InputStream, s3ObjectName: String, metadata: ObjectMetadata) = {
  try {
    val putObjectRequest = new PutObjectRequest(bucketName, s3ObjectName, is, metadata)
    // TransferManager supports asynchronous uploads and downloads
    val upload = transferManager.upload(putObjectRequest)
    upload.addProgressListener(ExceptionReporter.wrap(UploadProgressListener(putObjectRequest)))
  } catch {
    case e: Exception => throw new RuntimeException(e)
  }
}
Bonus: a nice custom answer here uses SequenceInputStream to combine several input streams into a single upload:
public void combineFiles() {
    List<String> files = getFiles();
    long totalFileSize = files.stream()
            .map(this::getContentLength)
            .reduce(0L, (f, s) -> f + s);
    try (InputStream partialFile = new SequenceInputStream(getInputStreamEnumeration(files))) {
        ObjectMetadata resultFileMetadata = new ObjectMetadata();
        resultFileMetadata.setContentLength(totalFileSize);
        s3Client.putObject("bucketName", "resultFilePath", partialFile, resultFileMetadata);
    } catch (IOException e) {
        LOG.error("An error occurred while combining files.", e);
    }
}
private Enumeration<? extends InputStream> getInputStreamEnumeration(List<String> files) {
    return new Enumeration<InputStream>() {
        private Iterator<String> fileNamesIterator = files.iterator();

        @Override
        public boolean hasMoreElements() {
            return fileNamesIterator.hasNext();
        }

        @Override
        public InputStream nextElement() {
            try {
                return new FileInputStream(Paths.get(fileNamesIterator.next()).toFile());
            } catch (FileNotFoundException e) {
                System.err.println(e.getMessage());
                throw new RuntimeException(e);
            }
        }
    };
}
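Neither snippet gives you the all-or-nothing behavior you asked about. S3 has no transactions, so the usual workaround is to track what has been uploaded and delete it if a later upload fails. A minimal sketch under that assumption (the class, bucket name, and key layout are mine, not from the question):
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.ObjectMetadata;
import com.amazonaws.services.s3.model.PutObjectRequest;

import java.io.ByteArrayInputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class AllOrNothingUploader {

    // Uploads every image; if any put fails, deletes the keys that already succeeded.
    public static void uploadAllOrNothing(AmazonS3 s3, String bucket, Map<String, byte[]> imagesByKey) {
        List<String> uploadedKeys = new ArrayList<>();
        try {
            for (Map.Entry<String, byte[]> entry : imagesByKey.entrySet()) {
                ObjectMetadata metadata = new ObjectMetadata();
                metadata.setContentType("image/jpeg");
                metadata.setContentLength(entry.getValue().length); // known up front for byte arrays
                s3.putObject(new PutObjectRequest(bucket, entry.getKey(),
                        new ByteArrayInputStream(entry.getValue()), metadata));
                uploadedKeys.add(entry.getKey());
            }
        } catch (RuntimeException e) {
            // Best-effort "rollback": remove whatever made it up before the failure
            for (String key : uploadedKeys) {
                s3.deleteObject(bucket, key);
            }
            throw e;
        }
    }
}
Note this is only best-effort: if the process dies between a failed put and the deletes, already-uploaded objects remain.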

Related

How to transmit when data is ready through a rest call with Spring Boot?

I have an SSH manager to execute (bash) scripts on a server. It contains a method commandWithContinousRead(String command, Consumer<String> consumer). Whenever echo is called in the bash script, its output is consumed by the consumer. I want to extend this with Spring Boot and an HTTP call: when a client sends a request, the server streams the data from the bash script as it becomes ready, and the client can print it out.
I know about Server-Sent Events; however, I feel that is mostly for events and usually involves multiple resources on an API.
Additionally, I tried searching for streaming topics, but had no success. I did find StreamingResponseBody from Spring, but it collects all the data and then sends it all at once.
I used Postman for testing; maybe it cannot handle streaming? If so, how do I test this?
Example:
#!/bin/bash
# Scriptname: stream-this.sh
echo "Starting line"
sleep 4
echo "Middle line"
sleep 4
echo "End line"
The request uses commandWithContinousRead, but everything prints at once after eight seconds.
@RequestMapping(value = "/stream-this", method = RequestMethod.POST,
        produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public ???? streamScript() {
    StreamingResponseBody stream = out -> {
        sshManager.commandWithContinousRead("bash /scripts/stream-this.sh", echo -> {
            try {
                byte[] bytes = echo.getBytes(StandardCharsets.UTF_8);
                out.write(bytes);
                System.out.println(echo);
            } catch (IOException e) {
                e.printStackTrace();
            }
        });
    };
    return new ResponseEntity<>(stream, HttpStatus.OK);
}
Implementation of the commandWithContinousRead function:
public void commandWithContinousRead(String command, Consumer<String> consumer) {
    SSHClient client = buildClient();
    try (Session session = client.startSession()) {
        Session.Command cmd = session.exec(command);
        BufferedReader br = new BufferedReader(
                new InputStreamReader(cmd.getInputStream(), StandardCharsets.UTF_8));
        String line;
        while ((line = br.readLine()) != null) {
            consumer.accept(line);
        }
        br.close();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        try {
            client.disconnect();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Now that you have posted the commandWithContinousRead method, everything looks correct. Also, you've just now stated that you're testing with Postman, and that's definitely a problem: Postman doesn't support streaming responses.
https://github.com/postmanlabs/postman-app-support/issues/5040
It's always a good idea to programmatically unit and integration test your code. A simple unit test doesn't even need Spring, or a real SSH connection (run the bash script locally in the test). The unit test would just exercise the logic of your Consumer, and would tell you that the reading of the output, and the bash script itself, aren't blocking. Ideally you would use JUnit, but here's a simple test class I put together that shows what I mean.
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

public class Test {
    // This would be an @Test instead of a main
    public static void main(String... args) {
        commandWithContinousRead("bash stream-this.sh", echo -> {
            byte[] bytes = echo.getBytes(StandardCharsets.UTF_8);
            // assert statements go here
            System.out.println("In main -- " + echo);
        });
    }

    public static void commandWithContinousRead(String command, Consumer<String> consumer) {
        try {
            Process process = Runtime.getRuntime().exec(command);
            BufferedReader br = new BufferedReader(new InputStreamReader(process.getInputStream()));
            String line;
            while ((line = br.readLine()) != null) {
                consumer.accept(line);
            }
            br.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
An integration test would actually set up Spring, and would go through the endpoint, thereby testing in the same manner that the client/browser would. Commonly this is done using @WebMvcTest and MockMvc's async support. You could choose either to mock the SSH client, or to have a server set up explicitly so your actual SSH client can connect to it (the second option would expose or eliminate issues related to the SSH connection). This kind of test would expose issues with the Spring setup and the streaming response. You would need to set an artificial timeout on your MockMvc after, say, 5 seconds, and use a new MockMvc after 9 seconds. That would allow you to see that after 5 seconds you've received the first echo, and after 9 you have the whole expected response. A good starting point for you would be to look at https://www.tabnine.com/code/java/methods/org.springframework.test.web.servlet.result.RequestResultMatchers/asyncStarted
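A minimal sketch of such a test, assuming JUnit 5 and a controller class named ScriptController exposing POST /stream-this (both names are assumptions):
import static org.hamcrest.Matchers.containsString;
import static org.springframework.test.web.servlet.request.MockMvcRequestBuilders.asyncDispatch;
import static org.springframework.test.web.servlet.request.MockMvcRequestBuilders.post;
import static org.springframework.test.web.servlet.result.MockMvcResultMatchers.content;
import static org.springframework.test.web.servlet.result.MockMvcResultMatchers.request;
import static org.springframework.test.web.servlet.result.MockMvcResultMatchers.status;

import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.autoconfigure.web.servlet.WebMvcTest;
import org.springframework.test.web.servlet.MockMvc;
import org.springframework.test.web.servlet.MvcResult;

@WebMvcTest(ScriptController.class) // the real controller's SSH manager would need a @MockBean here
class StreamScriptIntegrationTest {

    @Autowired
    private MockMvc mockMvc;

    @Test
    void streamsScriptOutput() throws Exception {
        // The request starts async processing; the response body is not complete yet
        MvcResult result = mockMvc.perform(post("/stream-this"))
                .andExpect(request().asyncStarted())
                .andReturn();

        // Dispatch the async result and assert on the streamed body
        mockMvc.perform(asyncDispatch(result))
                .andExpect(status().isOk())
                .andExpect(content().string(containsString("Starting line")));
    }
}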
Having passed those two levels of tests, you would then begin to suspect the client, which in this case is Postman. If possible, try to use the actual browser(s) or clients that will be running your code. It may turn out that streaming is not an option for you.
Please post the implementation of commandWithContinousRead.
It could be a fundamental problem where the script that is echoing and sleeping runs on the same thread as the code that is supposed to read the echoes and print them out. I.e., you're blocking while you wait for the bash script itself to run, which would explain the 8-second delay before getting any output. Also, what type does commandWithContinousRead return? Depending on how you're "reading" the echoes in that method, you could be blocking there too. It's hard to say with 100% certainty without seeing the code for commandWithContinousRead.
Your return type will be a ResponseEntity<StreamingResponseBody> (to fill in the ????).
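So the declaration would look like this (only the signature changes from the code in the question):
@RequestMapping(value = "/stream-this", method = RequestMethod.POST,
        produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public ResponseEntity<StreamingResponseBody> streamScript() {
    // ... body unchanged from the question
}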
Okay, I came up with a solution that worked. As Pickled Brain mentioned, the main problem was Postman not working with streaming. Also, I went back to trying SSE in a single call, and I did it by running the bash script in another thread. Additionally, I created an SSE client in Node.js for testing purposes, and it worked flawlessly.
Function to run the script, placing it in another thread:
private SseEmitter runScript() {
    SseEmitter emitter = new SseEmitter(-1L); // -1L = no timeout
    ExecutorService sseMvcExecutor = Executors.newSingleThreadExecutor();
    sseMvcExecutor.execute(() -> {
        try {
            shellManager.commandWithContinousRead("bash scriptname", s -> {
                SseEmitter.SseEventBuilder event = SseEmitter.event().name("message").data(s);
                try {
                    emitter.send(event);
                    System.out.println(s);
                } catch (IOException e) {
                    e.printStackTrace();
                }
            });
            emitter.send(SseEmitter.event().name("close").data(""));
        } catch (IOException e) {
            e.printStackTrace();
        }
        emitter.complete();
    });
    return emitter;
}
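For completeness, a sketch of how the emitter might be exposed as an endpoint (the mapping path is an assumption; note EventSource issues GET requests):
@GetMapping(value = "/stream-script", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public SseEmitter streamScript() {
    return runScript();
}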
SSE Client:
const EventSource = require('eventsource'); // npm install eventsource
const url = 'yoururl';
var es = new EventSource(url);

es.onopen = function(ev) {
    console.log("OPEN");
    console.log(ev);
};

es.onmessage = function(ev) {
    console.log("MESSAGE");
    console.log(ev.data);
};

es.addEventListener('close', function() {
    es.close();
    console.log('closing!');
});

es.onerror = function(ev) {
    console.log("ERROR");
    console.log(ev);
    es.close();
};

process.on('SIGINT', () => {
    es.close();
    console.log(es.CLOSED);
});

How to read and write files in a reactive way using InputStream and OutputStream

I am trying to read an Excel file, manipulate it or add new data to it, and write it back out. I am also trying to make this a completely reactive process using Flux and Mono. The idea is to return the resulting file or byte array via a web service.
My question is: how do I get an InputStream and an OutputStream in a non-blocking way?
I am using the Apache POI library to read and generate the Excel file.
I currently have a solution based on a mix of Mono.fromCallable() and blocking code to get the input stream.
For example, the web service part is as follows.
@GetMapping(value = API_BASE_PATH + "/download", produces = "application/vnd.ms-excel")
public Mono<ByteArrayResource> download() {
    Flux<TimeKeepingEntry> createExcel = excelExport.createDocument(false);
    return createExcel.then(Mono.fromCallable(() -> {
        ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
        excelExport.getWb().write(outputStream);
        return new ByteArrayResource(outputStream.toByteArray());
    }).subscribeOn(Schedulers.elastic()));
}
And the processing of the file:
public Flux<TimeKeepingEntry> createDocument(boolean all) {
    Flux<TimeKeepingEntry> entries = null;
    try {
        InputStream inputStream = new ClassPathResource("Timesheet Template.xlsx").getInputStream();
        wb = WorkbookFactory.create(inputStream);
        Sheet sheet = wb.getSheetAt(0);
        log.info("Created document");
        if (all) {
            // all entries
        } else {
            entries = service.findByMonth(currentMonthName)
                    .log("Excel Export - retrievedMonths")
                    .sort(Comparator.comparing(TimeKeepingEntry::getDateOfMonth))
                    .doOnNext(timeKeepingEntry -> this.populateEntry(sheet, timeKeepingEntry));
        }
    } catch (IOException e) {
        log.error("Error Importing File", e);
    }
    return entries;
}
This works well enough, but it is not very much in line with Flux and Mono. Some guidance here would be good; I would prefer to have the whole sequence non-blocking.
Unfortunately, the WorkbookFactory.create() operation is blocking, so you have to perform that operation using imperative code. However, fetching each timeKeepingEntry can be done reactively. Your code would look something like this:
public Flux<TimeKeepingEntry> createDocument() {
    return Flux.generate(
            this::getWorkbookSheet,              // state supplier: open the workbook, return the sheet
            (sheet, sink) -> {
                // emit one entry per request; call sink.complete() when the sheet is exhausted
                sink.next(getNextTimeKeepingEntryFrom(sheet));
                return sheet;                    // the generator must return the (possibly updated) state
            },
            this::closeWorkbook);                // state consumer: close the workbook when done
}
This will keep the workbook in memory, but will fetch each entry on demand as the elements of the Flux are requested.
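For the blocking WorkbookFactory.create() call itself, one common pattern is to wrap it in Mono.fromCallable() and shift it onto a scheduler meant for blocking work. A minimal sketch (Schedulers.boundedElastic() replaces the deprecated elastic() in newer Reactor versions):
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;
import org.springframework.core.io.ClassPathResource;
import reactor.core.publisher.Mono;
import reactor.core.scheduler.Schedulers;

public Mono<Workbook> openWorkbook() {
    return Mono.fromCallable(() ->
            WorkbookFactory.create(new ClassPathResource("Timesheet Template.xlsx").getInputStream()))
            .subscribeOn(Schedulers.boundedElastic()); // isolate blocking I/O from the event loop
}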

How to extract and manipulate data within a NiFi processor

I'm trying to write a custom NiFi processor which will take in the contents of the incoming flow file, perform some math operations on it, then write the results into an outgoing flow file. Is there a way to dump the contents of the incoming flow file into a string or something? I've been searching for a while now, and it doesn't seem that simple. If anyone could point me toward a good tutorial that deals with doing something like that, it would be greatly appreciated.
The Apache NiFi Developer Guide documents the process of creating a custom processor very well. In your specific case, I would start with the Component Lifecycle section and the Enrich/Modify Content pattern. Any other processor which does similar work (like ReplaceText or Base64EncodeContent) would be good examples to learn from; all of the source code is available on GitHub.
Essentially you need to implement the #onTrigger() method in your processor class, read the flowfile content and parse it into your expected format, perform your operations, and then re-populate the resulting flowfile content. Your source code will look something like this:
@Override
public void onTrigger(final ProcessContext context, final ProcessSession session) throws ProcessException {
    FlowFile flowFile = session.get();
    if (flowFile == null) {
        return;
    }
    final ComponentLog logger = getLogger();
    AtomicBoolean error = new AtomicBoolean();
    AtomicReference<String> result = new AtomicReference<>(null);
    // This uses a lambda function in place of a callback for InputStreamCallback#process()
    session.read(flowFile, in -> {
        long start = System.nanoTime();
        // Read the flowfile content into a String
        // TODO: May need to buffer this if the content is large
        try {
            final String contents = IOUtils.toString(in, StandardCharsets.UTF_8);
            result.set(new MyMathOperationService().performSomeOperation(contents));
            long stop = System.nanoTime();
            if (logger.isDebugEnabled()) {
                final long durationNanos = stop - start;
                DecimalFormat df = new DecimalFormat("#.###");
                logger.debug("Performed operation in " + durationNanos + " nanoseconds ("
                        + df.format(durationNanos / 1_000_000_000.0) + " seconds).");
            }
        } catch (Exception e) {
            error.set(true);
            logger.error(e.getMessage() + " Routing to failure.", e);
        }
    });
    if (error.get()) {
        session.transfer(flowFile, REL_FAILURE);
    } else {
        // Again, a lambda takes the place of the OutputStreamCallback#process()
        FlowFile updatedFlowFile = session.write(flowFile, (in, out) -> {
            final String resultString = result.get();
            final byte[] resultBytes = resultString.getBytes(StandardCharsets.UTF_8);
            // TODO: This can use a while loop for performance
            out.write(resultBytes, 0, resultBytes.length);
            out.flush();
        });
        session.transfer(updatedFlowFile, REL_SUCCESS);
    }
}
Daggett is right that the ExecuteScript processor is a good place to start, because it shortens the development lifecycle (no building NARs, deploying, and restarting NiFi to use it); once you have the correct behavior, you can easily copy/paste it into the generated skeleton and deploy it once.

Multiple connections on the controller service (Spring)

I have written a controller which takes a domain name as input, crawls the whole site, and gives back the result in JSON format.
http://crawlmysite-tgugnani.rhcloud.com/getUrlCrawlData/www.google.com
This gives the data for Google.
http://crawlmysite-tgugnani.rhcloud.com/getUrlCrawlData/www.yahoo.com
This gives the data for Yahoo.
If I try to run these two URLs simultaneously, I see that I am getting mixed data, and the results of one are affecting the other, even though I try to hit them from different machines.
Here is my controller:
#RequestMapping("/getUrlCrawlData/{domain:.+}")
#ResponseBody
public String registerContact(#PathVariable("domain") String domain) throws HttpStatusException, SQLException, IOException {
List<URLdata> urldata = null;
Gson gson = new Gson();
String json;
urldata = crawlService.crawlURL("http://"+domain);
json = gson.toJson(urldata);
return json;
}
What do I need to modify to allow multiple independent connections?
Update
Following is my crawl service:
public List<URLdata> crawlURL(String domain) throws HttpStatusException, SQLException, IOException {
    testDomain = domain;
    urlList.clear();
    urlMap.clear();
    urldata.clear();
    urlList.add(testDomain);
    processPage(testDomain);
    // Get all pages
    for (int i = 1; i < urlList.size(); i++) {
        if (urlList.size() >= 500) {
            break;
        }
        processPage(urlList.get(i));
        //System.out.println(urlList.get(i));
    }
    // Calculate time
    for (int i = 0; i < urlList.size(); i++) {
        getTitleAndMeta(urlList.get(i));
    }
    return urldata;
}
public static void processPage(String URL) throws SQLException, IOException, HttpStatusException {
    // get useful information
    try {
        Connection.Response response = Jsoup.connect(URL)
                .userAgent("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.21 (KHTML, like Gecko) Chrome/19.0.1042.0 Safari/535.21")
                .timeout(10000)
                .execute();
        Document doc = response.parse();
        // get all links and recursively call the processPage method
        Elements questions = doc.select("a[href]");
        for (Element link : questions) {
            String linkName = link.attr("abs:href");
            if (linkName.contains(testDomain.replaceAll("http://www.", ""))) {
                if (linkName.contains("#")) {
                    linkName = linkName.substring(0, linkName.indexOf("#"));
                }
                if (linkName.contains("?")) {
                    linkName = linkName.substring(0, linkName.indexOf("?"));
                }
                if (!urlList.contains(linkName) && urlList.size() <= 500) {
                    urlList.add(linkName);
                }
            }
        }
    } catch (HttpStatusException e) {
        System.out.println(e);
    } catch (SocketTimeoutException e) {
        System.out.println(e);
    } catch (UnsupportedMimeTypeException e) {
        System.out.println(e);
    } catch (UnknownHostException e) {
        System.out.println(e);
    } catch (MalformedURLException e) {
        System.out.println(e);
    }
}
Each of your requests (http://crawlmysite-tgugnani.rhcloud.com/getUrlCrawlData/www.google.com and http://crawlmysite-tgugnani.rhcloud.com/getUrlCrawlData/www.yahoo.com) is processed in a separate thread. You have two instances of the crawlURL() method working simultaneously, but both methods use the same variables (testDomain, urlList, urlMap and urldata). So they mess up each other's data in these variables.
One way to fix the problem is to declare these variables locally (inside the method). This way, new instances of these variables will be created for each invocation of crawlURL(). Alternatively, you can create a new instance of your CrawlService class for each invocation of the crawlURL() method.
Synchronizing threads would be a bad idea here, because one request would wait for another to complete before it could be processed by crawlURL().
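A minimal sketch of the first fix, with the shared fields turned into locals (the changed processPage and getTitleAndMeta signatures are my assumption; they would need the state passed in as parameters instead of mutating fields):
import java.util.ArrayList;
import java.util.List;

public List<URLdata> crawlURL(String domain) throws HttpStatusException, SQLException, IOException {
    // Local state: each request gets its own copies instead of sharing fields
    List<String> urlList = new ArrayList<>();
    List<URLdata> urldata = new ArrayList<>();
    urlList.add(domain);
    processPage(domain, domain, urlList);   // processPage now receives the state it mutates
    for (int i = 1; i < urlList.size() && urlList.size() < 500; i++) {
        processPage(domain, urlList.get(i), urlList);
    }
    for (String url : urlList) {
        getTitleAndMeta(url, urldata);      // likewise for getTitleAndMeta
    }
    return urldata;
}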
As far as Spring MVC is concerned, every request runs in a separate thread. So I think the problem is in crawlService, which, I suppose, is not stateless (it is singleton-like). Try creating a new crawl service for every request and check whether your data is still mixed. If creating the crawl service is an expensive operation, you should rewrite it to work in a stateless way.
#RequestMapping("/getUrlCrawlData/{domain:.+}")
#ResponseBody
public String registerContact(#PathVariable("domain") String domain) throws HttpStatusException, SQLException, IOException {
Gson gson = new Gson();
List<URLdata> = new CrawlService().crawlURL("http://"+domain);
return gson.toJson(urldata);
}
I think this call to the crawl service is the one affected by multiple requests coming in simultaneously:
urldata = crawlService.crawlURL("http://"+domain);
Check whether crawlService is safe for multithreading, i.e. check whether the crawlURL() method is synchronized; if not, make it synchronized. Or else synchronize the block that calls crawlService inside the controller.

How to send email with attachments

I want to send an email with an image attached. I am using Spring 3 with Velocity templates. I am able to do that, but for some reason, when I add an extension to the image name, the email is not delivered.
Following is the code I am using for it:
private MimeMessage createEmail(Application application, String templatePath, String subject,
        String toEmail, String fromEmail, String fromName) {
    MimeMessage mimeMsg = mailSender.createMimeMessage();
    Map<String, Object> model = new HashMap<String, Object>();
    model.put("application", application);
    String text = VelocityEngineUtils.mergeTemplateIntoString(velocityEngine, templatePath, model);
    text = text.replaceAll("\n", "<br>");
    try {
        MimeMessageHelper helper = new MimeMessageHelper(mimeMsg, true);
        helper.setSubject(subject);
        helper.setTo(toEmail);
        if (fromName == null) {
            helper.setFrom(fromEmail);
        } else {
            try {
                helper.setFrom(fromEmail, fromName);
            } catch (UnsupportedEncodingException e) {
                helper.setFrom(fromEmail);
            }
        }
        helper.setSentDate(application.getDateCreated());
        helper.setText(text, true);
        InputStream inputStream = servletContext.getResourceAsStream("images/formstack1.jpg");
        helper.addAttachment("formstack1", new ByteArrayResource(IOUtils.toByteArray(inputStream)));
    } catch (MessagingException e) {
        throw new RuntimeException(e);
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
    return mimeMsg;
}
Using the code above I can add formstack1 as an attachment, but it has no extension, so I don't get the formstack1.jpg image file. But when I use formstack1.jpg as the name of the resource to be attached in helper.addAttachment("formstack1", new ByteArrayResource(IOUtils.toByteArray(inputStream))) (i.e. formstack1 changed to formstack1.jpg), the email is not even delivered. I am using smtp.gmail.com and port 25. I do get the "email sent successfully" message on the console, but the email is never delivered.
EDIT: If I keep it like helper.addAttachment("formstack1", new ByteArrayResource(IOUtils.toByteArray(inputStream))) and change the extension from nothing to .jpg while downloading the attached image, I do get the desired image.
Could someone help me understand why this is happening, and how to send an email with one or more attachments using Spring 3?
Thanks.
You would be better off using Apache Commons HtmlEmail:
http://commons.apache.org/email/userguide.html
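A minimal sketch of sending an attachment with Commons Email (host, credentials, addresses, port, and the image path are placeholders, not values from the question):
import org.apache.commons.mail.EmailAttachment;
import org.apache.commons.mail.EmailException;
import org.apache.commons.mail.HtmlEmail;

public class AttachmentMailer {
    public static void send() throws EmailException {
        EmailAttachment attachment = new EmailAttachment();
        attachment.setPath("images/formstack1.jpg");           // path to the image on disk
        attachment.setDisposition(EmailAttachment.ATTACHMENT);
        attachment.setName("formstack1.jpg");                  // name the recipient sees, extension included

        HtmlEmail email = new HtmlEmail();
        email.setHostName("smtp.gmail.com");
        email.setSmtpPort(587);                                // placeholder; Gmail typically wants 587 with TLS
        email.setStartTLSEnabled(true);
        email.setAuthentication("user@example.com", "password");
        email.addTo("to@example.com");
        email.setFrom("user@example.com");
        email.setSubject("Application received");
        email.setHtmlMsg("<html><body>See the attached image.</body></html>");
        email.attach(attachment);
        email.send();
    }
}
Alternatively, staying with Spring, MimeMessageHelper has an addAttachment overload that takes an explicit content type (e.g. helper.addAttachment("formstack1.jpg", resource, "image/jpeg")), which is worth trying before switching libraries.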
