docx4j can't load file, with error java.io.FileNotFoundException - spring-boot

In my Spring Boot project I am using docx4j to load a file from the target folder. The file exists: when I check with System.out.println("exists") it appears in the console, yet loading still fails. Any solution? Here is the code:
public void testDocx4j() throws Docx4JException, FileNotFoundException {
    File file = ResourceUtils.getFile("classpath:compare.docx");
    if (file.exists()) {
        System.out.println("exists !!");
    }
    WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(file);
    MainDocumentPart mainDocumentPart = wordMLPackage.getMainDocumentPart();
}
I was trying to load the file with docx4j.

The following works for me:
import java.io.IOException;
import java.io.InputStream;

import org.docx4j.openpackaging.exceptions.Docx4JException;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart;
import org.docx4j.utils.ResourceUtils;

public class LoadAsResource {

    public static void main(String[] args) throws Docx4JException, IOException {
        InputStream is = ResourceUtils.getResource("sample-docxv2.docx");
        WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(is);
        MainDocumentPart documentPart = wordMLPackage.getMainDocumentPart();
        System.out.println(documentPart.getXML());
    }
}
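Note that this answer avoids java.io.File entirely and reads the resource as a stream. That tends to be the more robust approach for classpath: resources, because they are only plain files while the application runs from an unpacked target/classes folder; once the resource is only reachable through the classloader (for example inside a packaged Spring Boot jar), there is no File to open and a FileNotFoundException is the usual symptom. If you would rather stay with Spring's own resource abstraction, here is a minimal sketch using ClassPathResource; the class name LoadWithSpringResource and the try-with-resources wrapper are my own, while compare.docx comes from the question:

import java.io.IOException;
import java.io.InputStream;

import org.docx4j.openpackaging.exceptions.Docx4JException;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart;
import org.springframework.core.io.ClassPathResource;

public class LoadWithSpringResource {

    public static void main(String[] args) throws Docx4JException, IOException {
        // getInputStream() works whether the resource sits in target/classes or inside the jar
        try (InputStream is = new ClassPathResource("compare.docx").getInputStream()) {
            WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(is);
            MainDocumentPart mainDocumentPart = wordMLPackage.getMainDocumentPart();
            System.out.println(mainDocumentPart.getXML());
        }
    }
}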

Related

malformedURLException : no protocol

I am trying to execute a simple Hadoop program to read the contents of a file and print them to the console.
I am following the Hadoop: The Definitive Guide URLCat example.
I am getting MalformedURLException: no protocol.
When I use -cat with hdfs://localhost/user/training/test.txt the contents are printed out, but when I use the same path while executing the jar I get the mentioned exception.
I have added a static block that sets the URLStreamHandlerFactory.
EDITED :
My Program :
import java.io.InputStream;
import java.net.URL;

import org.apache.hadoop.fs.FsUrlStreamHandlerFactory;
import org.apache.hadoop.io.IOUtils;

// vv URLCat
public class URLCat {

    static {
        URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
    }

    public static void main(String[] args) throws Exception {
        InputStream in = null;
        try {
            in = new URL(args[0]).openStream();
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
        }
    }
}
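The exception itself narrows this down: java.net.URL throws "MalformedURLException: no protocol" when the string it is handed has no scheme at all, so the value reaching new URL(args[0]) when the jar runs is most likely a bare path (for example /user/training/test.txt) rather than the full hdfs://localhost/... URL that worked with -cat. Here is a hedged variation of the same program that surfaces the offending argument before failing; the class name and the message text are illustrative only:

import java.io.InputStream;
import java.net.URL;

import org.apache.hadoop.fs.FsUrlStreamHandlerFactory;
import org.apache.hadoop.io.IOUtils;

public class URLCatChecked {

    static {
        URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
    }

    public static void main(String[] args) throws Exception {
        // "no protocol" means new URL(...) saw a string without a scheme; make that visible.
        if (args.length == 0 || !args[0].contains("://")) {
            System.err.println("Expected a full URL such as hdfs://localhost/user/training/test.txt, got: "
                    + (args.length == 0 ? "<nothing>" : args[0]));
            System.exit(1);
        }
        InputStream in = null;
        try {
            in = new URL(args[0]).openStream();
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
        }
    }
}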

incompatible types: Object cannot be converted to CoreLabel

I'm trying to use the Stanford tokenizer with the following example from their website:
import java.io.FileReader;
import java.io.IOException;
import java.util.List;

import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.ling.HasWord;
import edu.stanford.nlp.process.CoreLabelTokenFactory;
import edu.stanford.nlp.process.DocumentPreprocessor;
import edu.stanford.nlp.process.PTBTokenizer;

public class TokenizerDemo {

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            // option #1: By sentence.
            DocumentPreprocessor dp = new DocumentPreprocessor(arg);
            for (List sentence : dp) {
                System.out.println(sentence);
            }
            // option #2: By token
            PTBTokenizer ptbt = new PTBTokenizer(new FileReader(arg),
                    new CoreLabelTokenFactory(), "");
            for (CoreLabel label; ptbt.hasNext(); ) {
                label = ptbt.next();
                System.out.println(label);
            }
        }
    }
}
and I get the following error when I try to compile it:
TokenizerDemo.java:24: error: incompatible types: Object cannot be converted to CoreLabel
label = ptbt.next();
Does anyone know what the reason might be? In case you are interested, I'm using Java 1.8 and made sure that CLASSPATH contains the jar file.
Try parameterizing the PTBTokenizer class. For example:
PTBTokenizer<CoreLabel> ptbt = new PTBTokenizer<>(new FileReader(arg),
        new CoreLabelTokenFactory(), "");
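For completeness, this is roughly what the token loop looks like once the tokenizer is parameterized; the sentence loop has the analogous fix, since DocumentPreprocessor yields List<HasWord>. The class name TokenizerDemoFixed is only for illustration, everything else follows the question:

import java.io.FileReader;
import java.io.IOException;

import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.process.CoreLabelTokenFactory;
import edu.stanford.nlp.process.PTBTokenizer;

public class TokenizerDemoFixed {

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            // With the type parameter supplied, next() returns CoreLabel and no cast is needed.
            PTBTokenizer<CoreLabel> ptbt = new PTBTokenizer<>(new FileReader(arg),
                    new CoreLabelTokenFactory(), "");
            while (ptbt.hasNext()) {
                CoreLabel label = ptbt.next();
                System.out.println(label);
            }
        }
    }
}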

FileCopyUtils (Spring Framework)

I want to copy a file using Spring's FileCopyUtils.
This is the first time I have used it.
I followed a tutorial and I get this exception.
package com.sctrcd.multidsdemo.integration.repositories.foo;

import java.io.File;
import java.io.IOException;

import org.springframework.util.FileCopyUtils;

public class CopyTest {

    public static void main(String[] args) throws InterruptedException, IOException {
        // NB: the source path string below appears to start with an invisible character
        // (it also shows up just before C: in the exception further down)
        File source = new File("‪C:\\Users\\Momo Kh\\Desktop\\CV.pdf");
        File dest = new File("C:\\Users\\Momo Kh\\Desktop\\Test\\CV.pdf");
        FileCopyUtils.copy(source, dest);
    }
}
And I have this exception:
Exception in thread "main" java.io.FileNotFoundException: ‪C:\Users\Momo Kh\Desktop\CV.pdf (La syntaxe du nom de fichier, de répertoire ou de volume est incorrecte)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:146)
at org.springframework.util.FileCopyUtils.copy(FileCopyUtils.java:63)
at com.sctrcd.multidsdemo.integration.repositories.foo.CopyTest.main(CopyTest.java:15)
Either you don't have the file or you don't have the necessary privileges to touch it. Try some directory like C:\\Momo Kh\\CV.pdf instead. Maybe you can't access stuff under the user folder.
This code works (the same as the last, with a small change).
I think it was a bug.
package com.sctrcd.multidsdemo.integration.repositories.foo;

import java.io.File;
import java.io.IOException;

import org.springframework.util.FileCopyUtils;

public class CopyTest {

    public static void main(String[] args) throws InterruptedException, IOException {
        File source = new File("C:\\Users\\Momo Kh\\Desktop\\CV.pdf");
        File dest = new File("C:\\Users\\Momo Kh\\Desktop\\files\\destfile1.pdf");

        // copy file using Spring FileCopyUtils
        long start = System.nanoTime();
        FileCopyUtils.copy(source, dest);
        long end = System.nanoTime();
        System.out.println("Time taken by Spring FileCopyUtils Copy = " + (end - start));
    }
}
And the result
Time taken by Spring FileCopyUtils Copy = 41100377
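Looking closely at the stack trace, the failing path string appears to begin with an invisible Unicode directional mark (it shows up as a stray character right before C: in the exception message), which fits the French OS error about an incorrect file-name syntax ("La syntaxe du nom de fichier, de répertoire ou de volume est incorrecte"). Retyping the path, as in the working version above, would remove it. A small illustrative check one could run on a suspicious path string; the U+202A literal below is my assumption about which character was picked up:

public class PathCheck {

    public static void main(String[] args) {
        // Assumed offender: a left-to-right embedding mark (U+202A) copied in with the path
        String path = "\u202AC:\\Users\\Momo Kh\\Desktop\\CV.pdf";

        // Report every character outside the printable ASCII range so hidden marks show up
        path.chars()
            .filter(c -> c < 0x20 || c > 0x7E)
            .forEach(c -> System.out.printf("suspicious character U+%04X%n", c));
    }
}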

Hadoop Not Finding Map Class

I am using hadoop-1.2.1 and trying to run a simple RowCount HBase job using ToolRunner. However, no matter what I seem to try, Hadoop cannot find the map class. The jar file is being copied correctly into HDFS, but I can't seem to figure out where it is going wrong. Please help!
Here is the code:
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
public class HBaseRowCountToolRunnerTest extends Configured implements Tool
{
    // What to copy.
    public static final String JAR_NAME = "myJar.jar";
    public static final String LOCAL_JAR = <path_to_jar> + JAR_NAME;
    public static final String REMOTE_JAR = "/tmp/" + JAR_NAME;

    public static void main(String[] args) throws Exception
    {
        Configuration config = HBaseConfiguration.create();
        // All connection configs set here -- omitted to post the code
        config.set("tmpjars", REMOTE_JAR);

        FileSystem dfs = FileSystem.get(config);
        System.out.println("pathString = " + (new Path(LOCAL_JAR)).toString() + " \n");

        // Copy jar file to remote.
        dfs.copyFromLocalFile(new Path(LOCAL_JAR), new Path(REMOTE_JAR));

        // Get rid of jar file when we're done.
        dfs.deleteOnExit(new Path(REMOTE_JAR));

        // Run the job.
        System.exit(ToolRunner.run(config, new HBaseRowCountToolRunnerTest(), args));
    }
    @Override
    public int run(String[] args) throws Exception
    {
        Job job = new RowCountJob(getConf(), "testJob", "myLittleHBaseTable");
        return job.waitForCompletion(true) ? 0 : 1;
    }
    public static class RowCountJob extends Job
    {
        RowCountJob(Configuration conf, String jobName, String tableName) throws IOException
        {
            super(conf, RowCountJob.class.getCanonicalName() + "_" + jobName);
            setJarByClass(getClass());

            Scan scan = new Scan();
            scan.setCacheBlocks(false);
            scan.setFilter(new FirstKeyOnlyFilter());

            setOutputFormatClass(NullOutputFormat.class);
            TableMapReduceUtil.initTableMapperJob(tableName, scan,
                    RowCounterMapper.class, ImmutableBytesWritable.class, Result.class, this);
            setNumReduceTasks(0);
        }
    } // end public static class RowCountJob extends Job
    // Mapper that runs the count
    // TableMapper -- TableMapper<KEYOUT, VALUEOUT> (*OUT by type)
    public static class RowCounterMapper extends TableMapper<ImmutableBytesWritable, Result>
    {
        // Counter enumeration to count the actual rows
        public static enum Counters { ROWS }

        /**
         * Maps the data.
         *
         * @param row The current table row key.
         * @param values The columns.
         * @param context The current context.
         * @throws IOException When something is broken with the data.
         * @see org.apache.hadoop.mapreduce.Mapper#map(KEYIN, VALUEIN,
         *      org.apache.hadoop.mapreduce.Mapper.Context)
         */
        @Override
        public void map(ImmutableBytesWritable row, Result values, Context context) throws IOException
        {
            // Count every row containing data times 2, whether it's in qualifiers or values
            context.getCounter(Counters.ROWS).increment(2);
        }
    } // end public static class RowCounterMapper extends TableMapper<ImmutableBytesWritable, Result>
} // end public class HBaseRowCountToolRunnerTest
OK, I found a workaround to the problem and thought I would share for all others having similar issues...
As it turns out, I abandoned the tmpjars configuration option and just copied the jar file directly into the DistributedCache from the code itself. Here is what it looks like:
// Copy jar file to remote.
FileSystem dfs = FileSystem.get(conf);
dfs.copyFromLocalFile(new Path(LOCAL_JAR), new Path(REMOTE_JAR));

// Get rid of jar file when we're done.
dfs.deleteOnExit(new Path(REMOTE_JAR));

// Place it in the distributed cache (requires: import org.apache.hadoop.filecache.DistributedCache;)
DistributedCache.addFileToClassPath(new Path(REMOTE_JAR), conf, dfs);
Perhaps it doesn't solve what is going on with tmpjars, but it does work.
I got the same problem today. Finally, I found it was because I forgot to insert the following statement in the driver class...
job.setJarByClass(HBaseTestDriver.class);
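In terms of the code in the question, that corresponds to telling the submitted Job which jar carries the mapper before waitForCompletion() runs, for example inside run(). The sketch below is a drop-in replacement for the run() method shown earlier, reusing the class names from the question; the placement is illustrative only:

@Override
public int run(String[] args) throws Exception {
    Job job = new RowCountJob(getConf(), "testJob", "myLittleHBaseTable");
    // Point the job at the jar that contains the mapper, so the task JVMs can load it.
    job.setJarByClass(HBaseRowCountToolRunnerTest.class);
    return job.waitForCompletion(true) ? 0 : 1;
}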

convert MHT files to images

Are there any libraries or APIs available to convert MHT files to images? Can we use Universal Document Converter software to do this? Appreciate any thoughts.
If you really want to do this programmatically:
MHT
Archived Web Page. When you save a Web page as a Web archive in Internet Explorer, the Web page saves this information in Multipurpose Internet Mail Extension HTML (MHTML) format with a .MHT file extension. All relative links in the Web page are remapped and the embedded content is included in the .MHT file.
you can use a JEditorPane to convert this into an image:
import javax.imageio.ImageIO;
import javax.swing.*;
import java.awt.*;
import java.awt.image.BufferedImage;
import java.beans.PropertyChangeEvent;
import java.beans.PropertyChangeListener;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Test {

    private static volatile boolean loaded;

    public static void main(String[] args) throws IOException {
        loaded = false;
        URL url = new URL("http://www.google.com");
        JEditorPane editorPane = new JEditorPane();
        editorPane.addPropertyChangeListener(new PropertyChangeListener() {
            public void propertyChange(PropertyChangeEvent evt) {
                if (evt.getPropertyName().equals("page")) {
                    loaded = true;
                }
            }
        });
        editorPane.setPage(url);
        while (!loaded) {
            Thread.yield();
        }
        File file = new File("out.png");
        componentToImage(editorPane, file);
    }

    public static void componentToImage(Component comp, File file) throws IOException {
        Dimension prefSize = comp.getPreferredSize();
        System.out.println("prefSize = " + prefSize);
        BufferedImage img = new BufferedImage(prefSize.width, comp.getPreferredSize().height,
                BufferedImage.TYPE_INT_ARGB);
        Graphics graphics = img.getGraphics();
        comp.setSize(prefSize);
        comp.paint(graphics);
        ImageIO.write(img, "png", file);
    }
}
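One design note on the example above: the volatile flag plus Thread.yield() is a busy-wait. A CountDownLatch blocks the main thread without spinning and is a drop-in alternative for that part. A minimal sketch of just the waiting logic, with the class name PageLoadWait being illustrative and the rest of the example unchanged:

import java.beans.PropertyChangeEvent;
import java.beans.PropertyChangeListener;
import java.net.URL;
import java.util.concurrent.CountDownLatch;

import javax.swing.JEditorPane;

public class PageLoadWait {

    public static void main(String[] args) throws Exception {
        JEditorPane editorPane = new JEditorPane();
        CountDownLatch loaded = new CountDownLatch(1);
        editorPane.addPropertyChangeListener(new PropertyChangeListener() {
            public void propertyChange(PropertyChangeEvent evt) {
                if ("page".equals(evt.getPropertyName())) {
                    loaded.countDown(); // signal once the asynchronous page load has finished
                }
            }
        });
        editorPane.setPage(new URL("http://www.google.com"));
        loaded.await(); // block here instead of spinning with Thread.yield()
        // ...then render the pane to an image as in componentToImage() above...
    }
}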
