MVStore Online Back Up - h2

The information in the MVStore docs on backing up a database is a little vague, and I'm not familiar with all the concepts and terminology, so I wanted to see if the approach I came up with makes sense.
I'm a Clojure programmer, so please forgive my Java here:
// db is an MVStore instance
FileStore fs = db.getFileStore();
FileOutputStream fos = new FileOutputStream(pathToBackupFile);
FileChannel outChannel = fos.getChannel();
try {
    db.commit();
    db.setReuseSpace(false);
    // readFully takes an int length, so the whole file must fit in one ByteBuffer
    ByteBuffer bb = fs.readFully(0, (int) fs.size());
    outChannel.write(bb);
} finally {
    outChannel.close();
    db.setReuseSpace(true);
}
Here's what it looks like in Clojure in case my Java is bad:
(defn backup-db
  [db path-to-backup-file]
  (let [fs (.getFileStore db)
        backup-file (java.io.FileOutputStream. path-to-backup-file)
        out-channel (.getChannel backup-file)]
    (try
      (.commit db)
      (.setReuseSpace db false)
      (let [file-contents (.readFully fs 0 (.size fs))]
        (.write out-channel file-contents))
      (finally
        (.close out-channel)
        (.setReuseSpace db true)))))
My approach seems to work, but I wanted to make sure I'm not missing anything or see if there's a better way. Thanks!
P.S. I used the H2 tag because MVStore doesn't exist and I don't have enough reputation to create it.

The docs currently say:
The persisted data can be backed up at any time, even during write
operations (online backup). To do that, automatic disk space reuse
needs to be first disabled, so that new data is always appended at the
end of the file. Then, the file can be copied. The file handle is
available to the application. It is recommended to use the utility
class FileChannelInputStream to do this.
The classes FileChannelInputStream and FileChannelOutputStream convert a java.nio.FileChannel into a standard InputStream and OutputStream. The existing H2 code in BackupCommand.java shows how to use them. We can improve on it by using Java 9's input.transferTo(output) to copy the data:
public void backup(MVStore s, File backupFile) throws Exception {
    try {
        s.commit();
        s.setReuseSpace(false);
        try (RandomAccessFile outFile = new RandomAccessFile(backupFile, "rw");
             FileChannelOutputStream output = new FileChannelOutputStream(outFile.getChannel(), false)) {
            try (FileChannelInputStream input = new FileChannelInputStream(s.getFileStore().getFile(), false)) {
                input.transferTo(output);
            }
        }
    } finally {
        s.setReuseSpace(true);
    }
}
Note that when you create the FileChannelInputStream you have to pass false to tell it not to close the underlying file channel when the stream is closed. If you don't do that, it will close the file that your FileStore is trying to use. That code uses try-with-resources syntax to make sure the output file is properly closed.
In order to try this, I checked out the MVStore code and modified TestMVStore to add a testBackup() method similar to the existing testSimple() code:
private void testBackup() throws Exception {
    // write some records like testSimple
    String fileName = getBaseDir() + "/" + getTestName();
    FileUtils.delete(fileName);
    MVStore s = openStore(fileName);
    MVMap<Integer, String> m = s.openMap("data");
    for (int i = 0; i < 3; i++) {
        m.put(i, "hello " + i);
    }
    // create a backup
    String fileNameBackup = getBaseDir() + "/" + getTestName() + ".backup";
    FileUtils.delete(fileNameBackup);
    backup(s, new File(fileNameBackup));
    // this throws if you accidentally close the input channel you get from the store
    s.close();
    // open the backup and verify
    s = openStore(fileNameBackup);
    m = s.openMap("data");
    for (int i = 0; i < 3; i++) {
        assertEquals("hello " + i, m.get(i));
    }
    s.close();
}
With your example, you are reading the whole file into a ByteBuffer, which must fit in memory. The stream transferTo method instead copies through an internal buffer that is currently (as of Java 11) 8192 bytes.
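If you're stuck on Java 8 or earlier, where transferTo is not available, the same streaming copy is easy to hand-roll. This is just a sketch (not part of the H2 API), using the same 8192-byte buffer size:

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Manual equivalent of Java 9's InputStream.transferTo: copies the
// stream in fixed-size chunks so the file never has to fit in memory.
static void copy(InputStream input, OutputStream output) throws IOException {
    byte[] buffer = new byte[8192]; // same size transferTo uses internally
    int n;
    while ((n = input.read(buffer)) != -1) {
        output.write(buffer, 0, n);
    }
}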

Related

How to upload byte array to S3 bucket in Java?

In a Spring Boot application I read an image file from a remote service, which returns a byte array; in the headers I can check the file extension:
ResponseEntity<byte[]> result = restTemplate.exchange(url, HttpMethod.GET, entity, byte[].class);
Now I want to put this byte array into an S3 bucket, in a folder that I decide at run time; for example, the folder name could be based on the current timestamp.
I checked the AmazonS3 class, but it doesn't seem to have any API that can help me.
How can this be done?
As per the example from the documentation:
https://docs.aws.amazon.com/sdk-for-java/v2/developer-guide/examples-s3-objects.html#upload-object
// Put object; here 'bytes' is the byte array.
PutObjectResponse response = s3.putObject(
        PutObjectRequest.builder().bucket(bucketName).key(filePathLocation).build(),
        RequestBody.fromBytes(bytes));
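To tie this back to the run-time folder requirement in the question, here is a minimal sketch against the v2 SDK (software.amazon.awssdk). The region, key layout, and helper name are illustrative assumptions, not part of the documented example:

import java.time.Instant;

import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;

// Hypothetical helper: the folder prefix is chosen at run time
// (here from the current timestamp, as described in the question).
public static void uploadBytes(String bucketName, byte[] bytes, String extension) {
    S3Client s3 = S3Client.builder().region(Region.EU_WEST_1).build();
    String key = "uploads-" + Instant.now().toEpochMilli() + "/image." + extension;
    s3.putObject(
            PutObjectRequest.builder().bucket(bucketName).key(key).build(),
            RequestBody.fromBytes(bytes));
}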
You can use the MinIO Java S3 client. Here you can find the documentation.
The code will look something like the following:
MinioClient minioClient =
    MinioClient.builder()
        .endpoint("https://play.min.io")
        .credentials("Q3AM3UQ867SPQQA43P2F", "zuf+tfteSlswRu7BJ86wekitnifILbZam1KYY3TG")
        .build();

StringBuilder builder = new StringBuilder();
for (int i = 0; i < 1000; i++) {
    builder.append(
        "Sphinx of black quartz, judge my vow: Used by Adobe InDesign to display font samples. ");
    builder.append("(29 letters)\n");
    builder.append(
        "Jackdaws love my big sphinx of quartz: Similarly, used by Windows XP for some fonts. ");
    builder.append("(31 letters)\n");
    builder.append(
        "Pack my box with five dozen liquor jugs: According to Wikipedia, this one is used on ");
    builder.append("NASAs Space Shuttle. (32 letters)\n");
    builder.append(
        "The quick onyx goblin jumps over the lazy dwarf: Flavor text from an Unhinged Magic Card. ");
    builder.append("(39 letters)\n");
    builder.append(
        "How razorback-jumping frogs can level six piqued gymnasts!: Not going to win any brevity ");
    builder.append("awards at 49 letters long, but old-time Mac users may recognize it.\n");
    builder.append(
        "Cozy lummox gives smart squid who asks for job pen: A 41-letter tester sentence for Mac ");
    builder.append("computers after System 7.\n");
    builder.append(
        "A few others we like: Amazingly few discotheques provide jukeboxes; Now fax quiz Jack! my ");
    builder.append("brave ghost pled; Watch Jeopardy!, Alex Trebeks fun TV quiz game.\n");
    builder.append("---\n");
}

// Create an InputStream for object upload.
ByteArrayInputStream bais = new ByteArrayInputStream(builder.toString().getBytes("UTF-8"));

// Create object 'my-objectname' in 'my-bucketname' with content from the input stream.
minioClient.putObject(
    PutObjectArgs.builder().bucket("my-bucketname").object("my-objectname").stream(
            bais, bais.available(), -1)
        .build());
bais.close();
System.out.println("my-objectname is uploaded successfully");
The full code can be found here.
Check out the AWS Java SDK.
Here is the getting started section:
https://docs.aws.amazon.com/sdk-for-java/v2/developer-guide/getting-started.html
To use it in a Spring context, add the Maven dependency:
https://docs.aws.amazon.com/sdk-for-java/v2/developer-guide/setup-project-maven.html
Uploading an object to an S3 bucket (note this example uses the v1 SDK):
https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/examples-s3-objects.html#upload-object
import java.io.File;

import com.amazonaws.AmazonServiceException;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

System.out.format("Uploading %s to S3 bucket %s...\n", file_path, bucket_name);
final AmazonS3 s3 = AmazonS3ClientBuilder.standard().withRegion(Regions.DEFAULT_REGION).build();
try {
    s3.putObject(bucket_name, key_name, new File(file_path));
} catch (AmazonServiceException e) {
    System.err.println(e.getErrorMessage());
    System.exit(1);
}

Bulk loading with LoadIncrementalHFiles and subdirectories

I wrote a Spark application that generates HFiles to be used for bulk loading with the LoadIncrementalHFiles command later. As the source data pool is very big, the input files are split into iterations that are processed one after the other. Each iteration creates its own HFile directory, so my HDFS structure looks like this:
/user/myuser/map_data/hfiles_0
... /hfiles_1
... /hfiles_2
... /hfiles_3
...
There are about 500 files in this map_data directory, so I'm searching for a way to call the LoadIncrementalHFiles function automatically, processing these subdirectories in iterations as well.
The corresponding command would be this:
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles -Dcreate.table=no /user/myuser/map_data/hfiles_0 mytable
I need to change this into an iterative command, as it does not work with subdirectories (it fails when I call it with the /user/myuser/map_data directory)!
I tried to use a Java Process instance to execute the command above automatically, but it doesn't seem to do anything (no output to the console and no new rows in my HBase table).
Using the org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles Java class from my own code doesn't work either; it also doesn't respond!
Does anybody have a working example for me? Or is there a parameter that makes the above hbase command run on the parent directory? I'm working with HBase 1.1.2 on a Hortonworks Data Platform 2.5 cluster.
EDIT I tried to run the LoadIncrementalHFiles command from a Hadoop client Java application, but I'm getting an exception relating to snappy compression, see Run LoadIncrementalHFiles from Java client
The solution was to split the hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles -Dcreate.table=no /user/myuser/map_data/hfiles_0 mytable command into its individual parts (one array element per command token), as in this Java code snippet:
TreeSet<String> subDirs = getHFileDirectories(new Path(HDFS_PATH), hadoopConf);
for (String hFileDir : subDirs) {
    try {
        String pathToReadFrom = HDFS_OUTPUT_PATH + "/" + hFileDir;
        // One array element per command token
        String[] execCode = {"hbase", "org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles",
                "-Dcreate.table=no", pathToReadFrom, hbaseTableName};
        ProcessBuilder pb = new ProcessBuilder(execCode);
        pb.redirectErrorStream(true);
        final Process p = pb.start();
        // Write the output of the process to the console
        new Thread(new Runnable() {
            public void run() {
                BufferedReader input = new BufferedReader(new InputStreamReader(p.getInputStream()));
                String line = null;
                try {
                    while ((line = input.readLine()) != null)
                        System.out.println(line);
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }).start();
        // Wait for the end of the execution
        p.waitFor();
        ...
}
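The getHFileDirectories helper isn't shown in the snippet above; a minimal sketch using the HDFS FileSystem API might look like this (the assumption being that every child directory of the parent path is one hfiles_N iteration directory):

import java.io.IOException;
import java.util.TreeSet;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Collects the names of all subdirectories under the parent path,
// sorted so the iterations are processed in a stable order.
private static TreeSet<String> getHFileDirectories(Path parent, Configuration conf)
        throws IOException {
    TreeSet<String> dirs = new TreeSet<>();
    FileSystem fs = FileSystem.get(conf);
    for (FileStatus status : fs.listStatus(parent)) {
        if (status.isDirectory()) {
            dirs.add(status.getPath().getName());
        }
    }
    return dirs;
}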

PdfBox: PDF/A-1A to PDF/A-3A

I have the following problem:
I want to transform a PDF/A-1A document into a PDF/A-3A document.
The original document is validated by Acrobat Reader Pro, so I can assume it is PDF/A-1A conformant.
I try to convert the PDF metadata with the following code:
private PDDocumentCatalog makeA3compliant(PDDocument doc) throws IOException, TransformerException {
    PDDocumentCatalog cat = doc.getDocumentCatalog();
    PDMetadata metadata = new PDMetadata(doc);
    cat.setMetadata(metadata);

    XMPMetadata xmp = new XMPMetadata();
    XMPSchemaPDFAId pdfaid = new XMPSchemaPDFAId(xmp);
    xmp.addSchema(pdfaid);

    XMPSchemaDublinCore dc = xmp.addDublinCoreSchema();
    String creator = "TestCr";
    String producer = "testPr";
    dc.addCreator(creator);
    dc.setAbout("");

    XMPSchemaBasic xsb = xmp.addBasicSchema();
    xsb.setAbout("");
    xsb.setCreatorTool(creator);
    xsb.setCreateDate(GregorianCalendar.getInstance());

    PDDocumentInformation pdi = new PDDocumentInformation();
    pdi.setProducer(producer);
    pdi.setAuthor(creator);
    doc.setDocumentInformation(pdi);

    XMPSchemaPDF pdf = xmp.addPDFSchema();
    pdf.setProducer(producer);
    pdf.setAbout("");

    PDMarkInfo markinfo = new PDMarkInfo();
    markinfo.setMarked(true);
    doc.getDocumentCatalog().setMarkInfo(markinfo);

    pdfaid.setPart(3);
    pdfaid.setConformance("A");
    pdfaid.setAbout("");
    metadata.importXMPMetadata(xmp);
    return cat;
}
If I try to validate the new file with Acrobat again, I get a validation error:
CIDset in subset font is incomplete (font contains glyphs that are not listed)
If I try to validate the file with this online validator (http://www.pdf-tools.com/pdf/validate-pdfa-online.aspx), it is a valid PDF/A-3A.
Am I missing something?
Is nobody able to help?
EDIT: Here is the PDF file
This worked for us to be fully PDF/A-3 compliant regarding the CIDset issue:
private void removeCidSet(PDDocumentCatalog catalog) {
    COSName cidSet = COSName.getPDFName("CIDSet");
    // iterate over all pdf pages
    for (Object object : catalog.getAllPages()) {
        if (object instanceof PDPage) {
            PDPage page = (PDPage) object;
            Map<String, PDFont> fonts = page.getResources().getFonts();
            Iterator<String> iterator = fonts.keySet().iterator();
            // iterate over all fonts
            while (iterator.hasNext()) {
                PDFont pdFont = fonts.get(iterator.next());
                if (pdFont instanceof PDType0Font) {
                    PDType0Font typedFont = (PDType0Font) pdFont;
                    if (typedFont.getDescendantFont() instanceof PDCIDFontType2Font) {
                        PDCIDFontType2Font f = (PDCIDFontType2Font) typedFont.getDescendantFont();
                        PDFontDescriptor fontDescriptor = f.getFontDescriptor();
                        if (fontDescriptor instanceof PDFontDescriptorDictionary) {
                            PDFontDescriptorDictionary fontDict = (PDFontDescriptorDictionary) fontDescriptor;
                            fontDict.getCOSDictionary().removeItem(cidSet);
                        }
                    }
                }
            }
        }
    }
}
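For completeness, a hypothetical driver combining removeCidSet with the question's makeA3compliant might look like this; the file names are placeholders, not part of either snippet:

// Load the PDF/A-1A source, strip the incomplete CIDSet entries,
// rewrite the XMP metadata for PDF/A-3A, then save the result.
PDDocument doc = PDDocument.load(new File("input-a1a.pdf"));
try {
    removeCidSet(doc.getDocumentCatalog());
    makeA3compliant(doc);
    doc.save("output-a3a.pdf");
} finally {
    doc.close();
}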
OK - I think I have an answer to your question from the perspective of the callas and/or Adobe technology (and once more, I'm affiliated with callas and its pdfToolbox technology, which is also used inside Acrobat).
According to my research and the people I consulted, your example PDF document contains a font with a CID character set that is incomplete. Why does pdfToolbox or Acrobat say it's a valid PDF/A-1a file but not a valid PDF/A-3a file? Interesting question:
1) The rules for incomplete CID sets changed between PDF/A-1a and PDF/A-3a. They are stricter in PDF/A-3a.
2) But while in PDF/A-1a a CID set always had to be there, in PDF/A-3a you can have a valid, compliant file without such a CID set.
So your PDF file contains a CID set (which keeps it valid for both PDF/A-1a and A-3a), but while that CID set is fine for A-1a, it does not contain all the characters required to be A-3a compliant.
To test at least part of this theory, I processed your file through pdfToolbox with a fixup entitled "Remove CIDset if incomplete". That correction (as the name implies) removes the CID set from the file but doesn't change anything else. After doing so your file validates as a valid A-3a file.
That leaves the question why the pdftools web site claims this is a valid PDF/A-3a file; according to the people I've spoken to, the Preflight result for this file is correct and an error should be reported. So perhaps that's something you need to take up with the pdftools guys (and they possibly with callas, to figure out who's finally right).
Feel free to send me a personal message if you want to discuss this further - more discussion on the tools themselves probably becomes off-topic for this public site.

Read both header and body in one call using java

I have a mail reader class which sets the FetchProfile and later calls msg.getContent().
I want to read both the header and the content in one call, basically downloading the full mail at once. I have observed that msg.getContent() makes a call to the server to fetch the body/content, so if we could download the full mail in one call, a round trip to the server would be saved.
Is this possible?
The code is similar to this
inbox.open(Folder.READ_ONLY);
/* Get the messages which are unread in the Inbox */
Message messages[] = inbox.search(new FlagTerm(new Flags(Flag.SEEN), false));
/* Use a suitable FetchProfile */
FetchProfile fp = new FetchProfile();
fp.add(FetchProfile.Item.ENVELOPE);
fp.add(FetchProfile.Item.CONTENT_INFO);
inbox.fetch(messages, fp);
for (int i = 0; i < messages.length; i++) {
    System.out.println("MESSAGE #" + (i + 1) + ":");
    Message message = messages[i];
    // getContent() triggers a separate server fetch for the body
    Object content = message.getContent();
    System.out.println("Content : " + content);
}
Appreciate any help.
Thanks and Regards
Raaghu.K
If you want the entire message in one call, and don't need to use any of the features of the IMAP protocol, you have two choices:
1) Use POP3 instead of IMAP.
2) Use the Message.writeTo method to write the message content to a file or byte array and process it from there, e.g., using the MimeMessage constructor that takes an InputStream. (This makes a local copy of the entire message.)
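A minimal sketch of the second option; here 'message' comes from the question's loop and 'session' is assumed to be the javax.mail.Session used to open the store:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

import javax.mail.Session;
import javax.mail.internet.MimeMessage;

// Pull the whole RFC 822 message down in a single pass, then work on
// the in-memory copy so getContent() needs no further server round trips.
ByteArrayOutputStream bos = new ByteArrayOutputStream();
message.writeTo(bos);
MimeMessage localCopy = new MimeMessage(session,
        new ByteArrayInputStream(bos.toByteArray()));
Object content = localCopy.getContent(); // served from the local copy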

Error Appending to IsolatedStorageFile

I am having some problems with isolated storage. I am trying to append to a file, but when I use the code below, I get an error about invalid arguments on this line:
IsolatedStorageFileStream("Folder\\barcodeinfo.txt", FileMode.Append,
    FileMode.OpenOrCreate, myStore))
I think it has something to do with FileMode.Append; I am trying to append to the file rather than create a new one.
// Obtain the virtual store for the application.
IsolatedStorageFile myStore = IsolatedStorageFile.GetUserStoreForApplication();

// Create a new folder and call it "Folder".
myStore.CreateDirectory("Folder");

// Specify the file path and options.
using (var isoFileStream = new IsolatedStorageFileStream("Folder\\barcodeinfo.txt", FileMode.Append, FileMode.OpenOrCreate, myStore))
{
    // Write the data.
    using (var isoFileWriter = new StreamWriter(isoFileStream))
    {
        isoFileWriter.WriteLine(textBox1.Text);
        isoFileWriter.WriteLine(textBox2.Text);
        isoFileWriter.WriteLine(textBox3.Text);
    }
}
There is no overload that takes two FileModes. It should be:
IsolatedStorageFileStream("Folder\\barcodeinfo.txt", FileMode.Append,
    FileAccess.Write, myStore)
Important thing to note about FileMode.Append is:
[FileMode.Append] Opens the file if it exists and seeks to the end of the file, or
creates a new file. Append can only be used in conjunction with Write.
Attempting to seek to a position before the end of the file will throw
an IOException, and any attempt to read fails and throws a
NotSupportedException.
which is why FileAccess.Write is used.
It looks like you have FileMode.Append, FileMode.OpenOrCreate. That is two FileModes. The first parameter should be a FileMode and the second should be a FileAccess.
That should fix your problem.
