How to upload byte array to S3 bucket in Java? - spring-boot

In a spring boot application I read an image file from a remote service, which returns byte array and in headers I can check what is file extension:
ResponseEntity<byte[]> result = restTemplate.exchange(url, HttpMethod.GET, entity, byte[].class);
Now I want to put this byte array in a S3 bucket in a folder which I decide during run time, for example folder name can base don current timestamp.
I checked AmazonS3 class, but it doesnt seem to have any such API which can help me?
How can this be done?

As per example from documentation:
https://docs.aws.amazon.com/sdk-for-java/v2/developer-guide/examples-s3-objects.html#upload-object
// Put Object. here 'bytes' is byte array.
PutObjectResponse response = s3.putObject(PutObjectRequest.builder().bucket(bucketName).key(filePathLocation).build(),RequestBody.fromBytes(bytes));

You can use the MinIO java S3 client. Here you can find the documentation.
The code will look something like the following one:
MinioClient minioClient =
MinioClient.builder()
.endpoint("https://play.min.io")
.credentials("Q3AM3UQ867SPQQA43P2F", "zuf+tfteSlswRu7BJ86wekitnifILbZam1KYY3TG")
.build();
StringBuilder builder = new StringBuilder();
for (int i = 0; i < 1000; i++) {
builder.append(
"Sphinx of black quartz, judge my vow: Used by Adobe InDesign to display font samples. ");
builder.append("(29 letters)\n");
builder.append(
"Jackdaws love my big sphinx of quartz: Similarly, used by Windows XP for some fonts. ");
builder.append("(31 letters)\n");
builder.append(
"Pack my box with five dozen liquor jugs: According to Wikipedia, this one is used on ");
builder.append("NASAs Space Shuttle. (32 letters)\n");
builder.append(
"The quick onyx goblin jumps over the lazy dwarf: Flavor text from an Unhinged Magic Card. ");
builder.append("(39 letters)\n");
builder.append(
"How razorback-jumping frogs can level six piqued gymnasts!: Not going to win any brevity ");
builder.append("awards at 49 letters long, but old-time Mac users may recognize it.\n");
builder.append(
"Cozy lummox gives smart squid who asks for job pen: A 41-letter tester sentence for Mac ");
builder.append("computers after System 7.\n");
builder.append(
"A few others we like: Amazingly few discotheques provide jukeboxes; Now fax quiz Jack! my ");
builder.append("brave ghost pled; Watch Jeopardy!, Alex Trebeks fun TV quiz game.\n");
builder.append("---\n");
// Create a InputStream for object upload.
ByteArrayInputStream bais = new ByteArrayInputStream(builder.toString().getBytes("UTF-8"));
// Create object 'my-objectname' in 'my-bucketname' with content from the input stream.
minioClient.putObject(
PutObjectArgs.builder().bucket("my-bucketname").object("my-objectname").stream(
bais, bais.available(), -1)
.build());
bais.close();
System.out.println("my-objectname is uploaded successfully");
The full code can be found here.

Checkout the AWS JAVA SDK:
Here the getting started section:
https://docs.aws.amazon.com/sdk-for-java/v2/developer-guide/getting-started.html
In order to use in Spring context use the Maven dependency:
https://docs.aws.amazon.com/sdk-for-java/v2/developer-guide/setup-project-maven.html
Uploading an object to S3 Bucket:
https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/examples-s3-objects.html#upload-object
import com.amazonaws.AmazonServiceException;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.s3.AmazonS3;
System.out.format("Uploading %s to S3 bucket %s...\n", file_path, bucket_name);
final AmazonS3 s3 = AmazonS3ClientBuilder.standard().withRegion(Regions.DEFAULT_REGION).build();
try {
s3.putObject(bucket_name, key_name, new File(file_path));
} catch (AmazonServiceException e) {
System.err.println(e.getErrorMessage());
System.exit(1);

Related

How to upload an image in chunks with client-side streaming gRPC using grpcurl

I have been trying to upload an image in chunks with client side streaming using grpcurl. The service is working without error except that at the server, image data received is 0 bytes.
The command I am using is:
grpcurl -proto image_service.proto -v -d # -plaintext localhost:3010 imageservice.ImageService.UploadImage < out
This link mentions that the chunk data should be base64 encode and so the contents of my out file are:
{"chunk_data": "<base64 encoded image data>"}
This is exactly what I am trying to achieve, but using grpcurl.
Please tell what is wrong in my command and what is the best way to achieve streaming via grpcurl.
I have 2 more questions:
Does gRPC handles the splitting of data into chunks?
How can I first send a meta-data chunk (ImageInfo type) and then the actual image data via grpcurl?
Here is my proto file:
syntax = "proto3";
package imageservice;
import "google/protobuf/wrappers.proto";
option go_package = "...";
service ImageService {
rpc UploadImage(stream UploadImageRequest) returns (UploadImageResponse) {}
}
message UploadImageRequest {
oneof data {
ImageInfo info = 1;
bytes chunk_data = 3;
};
}
message ImageInfo {
string unique_id = 1;
string image_type = 2;
}
message UploadImageResponse {
string url = 1;
}
Interesting question. I've not tried streaming messages with (the excellent) grpcurl.
The documentation does not explain how to do this but this issue shows how to stream using stdin.
I recommend you try it that way first to ensure that works for you.
If it does, then bundling various messages into a file (out) should also work.
Your follow-on questions suggest you're doing this incorrectly.
chunk_data is the result of having split the file into chunks; i.e. each of these base64-encoded strings should be a subset of your overall image file (i.e. a chunk).
your first message should be { "info": "...." }, subsequent messages will be { "chunk_data": "<base64-encoded chunk>" } until EOF.

MVStore Online Back Up

The information in the MVStore docs on backing up a database is a little vague, and I'm not familiar with all the concepts and terminology, so I wanted to see if the approach I came up with makes sense.
I'm a Clojure programmer, so please forgive my Java here:
// db is an MVStore instance
FileStore fs = db.getFileStore();
FileOutputStream fos = java.io.FileOutputStream(pathToBackupFile);
FileChannel outChannel = fos.getChannel();
try {
db.commit();
db.setReuseSpace(false);
ByteBuffer bb = fs.readFully(0, fs.size());
outChannel.write(bb);
}
finally {
outChannel.close();
db.setReuseSpace(true);
}
Here's what it looks like in Clojure in case my Java is bad:
(defn backup-db
[db path-to-backup-file]
(let [fs (.getFileStore db)
backup-file (java.io.FileOutputStream. path-to-backup-file)
out-channel (.getChannel backup-file)]
(try
(.commit db)
(.setReuseSpace db false)
(let [file-contents (.readFully fs 0 (.size fs))]
(.write out-channel file-contents))
(finally
(.close out-channel)
(.setReuseSpace db true)))))
My approach seems to work, but I wanted to make sure I'm not missing anything or see if there's a better way. Thanks!
P.S. I used the H2 tag because MVStore doesn't exist and I don't have enough reputation to create it.
The docs currently say:
The persisted data can be backed up at any time, even during write
operations (online backup). To do that, automatic disk space reuse
needs to be first disabled, so that new data is always appended at the
end of the file. Then, the file can be copied. The file handle is
available to the application. It is recommended to use the utility
class FileChannelInputStream to do this.
The classes FileChannelInputStream and FileChannelOutputStream convert a java.nio.FileChannel into a standard InputStream and OutputStream. There is existing H2 code in BackupCommand.java that shows how to use them. We can improve upon it using Java 9 input.transferTo(output); to copy the data:
public void backup(MVStore s, File backupFile) throws Exception {
try {
s.commit();
s.setReuseSpace(false);
try(RandomAccessFile outFile = new java.io.RandomAccessFile(backupFile, "rw");
FileChannelOutputStream output = new FileChannelOutputStream(outFile.getChannel(), false)){
try(FileChannelInputStream input = new FileChannelInputStream(s.getFileStore().getFile(), false)){
input.transferTo(output);
}
}
} finally {
s.setReuseSpace(true);
}
}
Note that when you create the FileChannelInputStream you have to pass false to tell it to not close the underlying file channel when the stream is closed. If you don't do that it will close the file that your FileStore is trying to use. That code uses try-with-resource syntax to make sure that the output file is properly closed.
In order to try this, I checked out the mvstore code then modified the TestMVStore to add a testBackup() method which is similar to the existing testSimple() code:
private void testBackup() throws Exception {
// write some records like testSimple
String fileName = getBaseDir() + "/" + getTestName();
FileUtils.delete(fileName);
MVStore s = openStore(fileName);
MVMap<Integer, String> m = s.openMap("data");
for (int i = 0; i < 3; i++) {
m.put(i, "hello " + i);
}
// create a backup
String fileNameBackup = getBaseDir() + "/" + getTestName() + ".backup";
FileUtils.delete(fileNameBackup);
backup(s, new File(fileNameBackup));
// this throws if you accidentally close the input channel you get from the store
s.close();
// open the backup and verify
s = openStore(fileNameBackup);
m = s.openMap("data");
for (int i = 0; i < 3; i++) {
assertEquals("hello " + i, m.get(i));
}
s.close();
}
With your example, you are reading into a ByteBuffer which must fit into memory. Using the stream transferTo method uses an internal buffer that is currently (as at Java11) set to 8192 bytes.

AWS multipart upload from inputStream has bad offfset

I am using the Java Amazon AWS SDK to perform some multipart uploads from HDFS to S3. My code is the following:
for (int i = startingPart; currentFilePosition < contentLength ; i++)
{
FSDataInputStream inputStream = fs.open(new Path(hdfsFullPath));
// Last part can be less than 5 MB. Adjust part size.
partSize = Math.min(partSize, (contentLength - currentFilePosition));
// Create request to upload a part.
UploadPartRequest uploadRequest = new UploadPartRequest()
.withBucketName(bucket).withKey(s3Name)
.withUploadId(currentUploadId)
.withPartNumber(i)
.withFileOffset(currentFilePosition)
.withInputStream(inputStream)
.withPartSize(partSize);
// Upload part and add response to our list.
partETags.add(s3Client.uploadPart(uploadRequest).getPartETag());
currentFilePosition += partSize;
inputStream.close();
lastFilePosition = currentFilePosition;
}
However, the uploaded file is not the same as the original one. More specifically, I am testing on a test file, which has about 20 MB. The parts I upload are 5 MB each. At the end of each 5MB part, I see some extra text, which is always 96 characters long.
Even stranger, if I add something stupid to .withFileOffset(), for example,
.withFileOffset(currentFilePosition-34)
the error stays the same. I was expecting to get other characters, but I am getting the EXACT 96 extra characters as if I hadn't modified the line.
Any ideas what might be wrong?
Thanks,
Serban
I figured it out. This came from a stupid assumption on my part. It turns out, the file offset in ".withFileOffset(...)" tells you the offset where to write in the destination file. It doesn't say anything about the source. By opening and closing the stream repeatedly, I am always writing from the beginning of the file, but to a different offset. The solution is to add a seek statement after opening the stream:
FSDataInputStream inputStream = fs.open(new Path(hdfsFullPath));
inputStream.seek(currentFilePosition);

PdfBox: PDF/A-1A to PDF/A-3A

i have the following problem:
i want to transform a PDF/A-1A document to a PDF/A-3A.
The original document is validated by Arobat Reader Pro, so i can asume it is PDF/A-1A conform.
I try to convert the PDF metadata with the following code:
private PDDocumentCatalog makeA3compliant(PDDocument doc) throws IOException, TransformerException {
PDDocumentCatalog cat = doc.getDocumentCatalog();
PDMetadata metadata = new PDMetadata(doc);
cat.setMetadata(metadata);
XMPMetadata xmp = new XMPMetadata();
XMPSchemaPDFAId pdfaid = new XMPSchemaPDFAId(xmp);
xmp.addSchema(pdfaid);
XMPSchemaDublinCore dc = xmp.addDublinCoreSchema();
String creator = "TestCr";
String producer = "testPr";
dc.addCreator(creator);
dc.setAbout("");
XMPSchemaBasic xsb = xmp.addBasicSchema();
xsb.setAbout("");
xsb.setCreatorTool(creator);
xsb.setCreateDate(GregorianCalendar.getInstance());
PDDocumentInformation pdi = new PDDocumentInformation();
pdi.setProducer(producer);
pdi.setAuthor(creator);
doc.setDocumentInformation(pdi);
XMPSchemaPDF pdf = xmp.addPDFSchema();
pdf.setProducer(producer);
pdf.setAbout("");
PDMarkInfo markinfo = new PDMarkInfo();
markinfo.setMarked(true);
doc.getDocumentCatalog().setMarkInfo(markinfo);
pdfaid.setPart(3);
pdfaid.setConformance("A");
pdfaid.setAbout("");
metadata.importXMPMetadata(xmp);
return cat;
}
If i try to validate the new file with Acrobat again, i get a validation error:
CIDset in subset font is incomplete (font contains glyphs that are not listed)
if i try to validate the file with this online validator (http://www.pdf-tools.com/pdf/validate-pdfa-online.aspx) it is a valid PDF/A-3A....
am i missing something?
is nobody able to help?
EDIT: Here is the PDF file
This worked for us to be fully PDF/A-3 compliant regarding the CIDset issue:
private void removeCidSet(PDDocumentCatalog catalog) {
COSName cidSet = COSName.getPDFName("CIDSet");
// iterate over all pdf pages
for (Object object : catalog.getAllPages()) {
if (object instanceof PDPage) {
PDPage page = (PDPage) object;
Map<String, PDFont> fonts = page.getResources().getFonts();
Iterator<String> iterator = fonts.keySet().iterator();
// iterate over all fonts
while (iterator.hasNext()) {
PDFont pdFont = fonts.get(iterator.next());
if (pdFont instanceof PDType0Font) {
PDType0Font typedFont = (PDType0Font) pdFont;
if (typedFont.getDescendantFont() instanceof PDCIDFontType2Font) {
PDCIDFontType2Font f = (PDCIDFontType2Font) typedFont.getDescendantFont();
PDFontDescriptor fontDescriptor = f.getFontDescriptor();
if (fontDescriptor instanceof PDFontDescriptorDictionary) {
PDFontDescriptorDictionary fontDict = (PDFontDescriptorDictionary) fontDescriptor;
fontDict.getCOSDictionary().removeItem(cidSet);
}
}
}
}
}
}
}
OK - I think I have an answer on your question from the perspective of the callas and/or Adobe technology (and once more, I'm affiliated with callas and its pdfToolbox technology that is also used inside of Acrobat).
According to my research and the people I consulted, your example PDF document contains a font with a CID character set that is incomplete. Why does pdfToolbox or Acrobat say it's a valid PDF/A-1a file but not a valid PDF/A-3a file? Interesting question:
1) The rules for incomplete CID sets changed between PDF/A-1a and PDF/A-3a. They are stricter in PDF/A-3a.
2) But while in PDF/A-1a a CID set always had to be there, in PDF/A-3a you can have a valid, compliant file, without such a CID set.
So, your PDF file contains a CID set (which makes it valid for PDF/A-1a and A-3a) but while that CID set is fine for A-1a it does not contains all characters to be A-3a compliant.
To test at least part of this theory, I processed your file through pdfToolbox with a fixup entitled "Remove CIDset if incomplete". That correction (as the name implies) removes the CID set from the file but doesn't change anything else. After doing so your file validates as a valid A-3a file.
That leaves the question why the pdftools web site claims this is a valid PDF/A-3a file; according to the people I've spoken to, the result from preflight for this file is correct and there should be an error on this file. So perhaps that's something you need to take up with the pdftools guys (and they possibly with callas to figure out who's finally right).
Feel free to send me a personal message if you want to discuss this further - more discussion on the tools themselves probably becomes off-topic for this public site.

Importing binary data to parse.com

I'm trying to import data to parse.com so I can test my application (I'm new to parse and I've never used json before).
Can you please give me an example of a json file that I can use to import binary files (images) ?
NB : I'm trying to upload my data in bulk directry from the Data Browser. Here is a screencap : i.stack.imgur.com/bw9b4.png
In parse docs i think 2 sections could help you out depend on whether you want to use REST api of the android sdk.
rest api - see section on POST, uploading files that can be upload to parse using REST POST.
SDk - see section on "files"
code for Rest includes following:
use some HttpClient implementation having "ByteArrayEntity" class or something
Map your image to bytearrayEntity and POST it with the correct headers for Mime/Type in httpclient...
case POST:
HttpPost httpPost = new HttpPost(url); //urlends "audio OR "pic"
httpPost.setProtocolVersion(new ProtocolVersion("HTTP", 1,1));
httpPost.setConfig(this.config);
if ( mfile.canRead() ){
FileInputStream fis = new FileInputStream(mfile);
FileChannel fc = fis.getChannel(); // Get the file's size and then map it into memory
int sz = (int)fc.size();
MappedByteBuffer bb = fc.map(FileChannel.MapMode.READ_ONLY, 0, sz);
data2 = new byte[bb.remaining()];
bb.get(data2);
ByteArrayEntity reqEntity = new ByteArrayEntity(data2);
httpPost.setEntity(reqEntity);
fis.close();
}
,,,
request.addHeader("Content-Type", "image/*") ;
pseudocode for post the runnable to execute the http request
The only binary data allowed to be loaded to parse.com are images. In other cases like files or streams .. etc the most suitable solution is to store a link to the binary data in another dedicated storage for such type of information.

Resources