How to fetch data from an HBase table running on Linux from a Java program running on Windows: Could not locate executable null\bin\winutils.exe

This is my code to connect:
Configuration conf = HBaseConfiguration.create();
conf.set("hbase.zookeeper.quorum", "192.168.20.129");
conf.set("hbase.zookeeper.property.clientPort", "2181");
conf.set("hbase.master", "192.168.20.129:60010");

Just add this method and call it before connecting:
private static void workaround() {
    // Workaround for: java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
    // Points hadoop.home.dir at the working directory and creates an empty bin/winutils.exe there.
    File workaround = new File(".");
    System.getProperties().put("hadoop.home.dir", workaround.getAbsolutePath());
    new File("./bin").mkdirs();
    try {
        new File("./bin/winutils.exe").createNewFile();
    } catch (IOException e) {
        logger.error(e);
    }
}
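For completeness, here is a minimal sketch of how the two pieces fit together, assuming an HBase 1.x+ client (older clients would use HTable instead); the table name, row key, and column names are placeholders, not values from the question:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseReadExample {
    public static void main(String[] args) throws Exception {
        workaround(); // create the dummy winutils.exe before any Hadoop code runs

        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "192.168.20.129");
        conf.set("hbase.zookeeper.property.clientPort", "2181");

        // Placeholder table/row/column names for illustration only.
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("mytable"))) {
            Result result = table.get(new Get(Bytes.toBytes("row1")));
            byte[] value = result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("col"));
            System.out.println(Bytes.toString(value));
        }
    }

    // plus the workaround() method shown above
}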

Related

iText keeps .pfm files open on Ubuntu

We have a web app running on Tomcat/Ubuntu that uses iText 7.1.8 to generate PDF documents (invoices). We noticed that our Tomcat crashed many times, and after investigating we found that iText was the problem. Here is the exception:
SEVERE: Socket accept failed
org.apache.tomcat.jni.Error: 24: Too many open files
at org.apache.tomcat.jni.Socket.accept(Native Method)
at org.apache.tomcat.util.net.AprEndpoint$Acceptor.run(AprEndpoint.java:992)
at java.lang.Thread.run(Thread.java:745)
When we run this command: sudo ls -l /proc/Tomcat-PID/fd we notice that most of the open files have the extension .pfm (e.g. /usr/share/fonts/type1/gsfonts/n022004l.pfm) and are never released. This number continues to increase until it reaches the maximum number of open files.
Here is the Java code used to generate the PDF:
public static File convertToPDF(File pdfFile, URL webURL) {
    InputStream htmlStream = null;
    FileOutputStream pdfStream = null;
    try {
        htmlStream = webURL.openStream();
        pdfStream = new FileOutputStream(pdfFile);
        ConverterProperties properties = new ConverterProperties();
        properties.setFontProvider(new DefaultFontProvider(true, true, true));
        HtmlConverter.convertToPdf(htmlStream, pdfStream, properties);
    } catch (MalformedURLException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        try {
            if (htmlStream != null) {
                htmlStream.close();
            }
            if (pdfStream != null) {
                pdfStream.close();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    return pdfFile;
}
Should we use a singleton to avoid multiple instances of this PDF-generation process and the multiple open files?
Environment:
Ubuntu 14.04
Tomcat 7.0.52
Java 1.7.0_80-b15
itext 7.1.8
Thank you
Fixed the issue.
Use a singleton to hold the converter properties:
private static ConverterProperties properties;
private static DefaultFontProvider defaultFontProvider;
...
defaultFontProvider = new DefaultFontProvider(true, true, true);
properties = new ConverterProperties();
properties.setFontProvider(defaultFontProvider);
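As a sketch of what the fixed version could look like (the class and method names here are illustrative, not from the original code): the font provider and converter properties are created once and reused for every conversion, so the font files are only opened a single time.

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import com.itextpdf.html2pdf.ConverterProperties;
import com.itextpdf.html2pdf.HtmlConverter;
import com.itextpdf.html2pdf.resolver.font.DefaultFontProvider;

public class PdfConverter {
    // Built once for the whole JVM; avoids re-opening the .pfm font files on every call.
    private static final DefaultFontProvider FONT_PROVIDER =
            new DefaultFontProvider(true, true, true);
    private static final ConverterProperties PROPERTIES = new ConverterProperties();

    static {
        PROPERTIES.setFontProvider(FONT_PROVIDER);
    }

    public static void convert(InputStream htmlStream, OutputStream pdfStream) throws IOException {
        // Reuse the shared properties instead of building a new font provider per call.
        HtmlConverter.convertToPdf(htmlStream, pdfStream, PROPERTIES);
    }
}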

Can't connect to remote HDFS using Spring Hadoop

I'm just starting with Hadoop and I'm trying to read from a remote HDFS (running in a Docker container, accessible at localhost:32783) through Spring Hadoop, but I get the following error:
org.springframework.data.hadoop.HadoopException:
Cannot list resources Failed on local exception:
java.io.EOFException; Host Details : local host is: "user/127.0.1.1";
destination host is: "localhost":32783;
I'm trying to read the file using the following code:
HdfsClient hdfsClient = new HdfsClient();
Configuration conf = new Configuration();
conf.set("fs.defaultFS","hdfs://localhost:32783");
FileSystem fs = FileSystem.get(conf);
SimplerFileSystem sFs = new SimplerFileSystem(fs);
hdfsClient.setsFs(sFs);
String filePath = "/tmp/tmpTestReadTest.txt";
String output = hdfsClient.readFile(filePath);
What hdfsClient.readFile(filePath) does is the following:
public class HdfsClient {
    private SimplerFileSystem sFs;

    public String readFile(String filePath) throws IOException {
        FSDataInputStream inputStream = this.sFs.open(filePath);
        String output = getStringFromInputStream(inputStream.getWrappedStream());
        inputStream.close();
        return output;
    }
}
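The getStringFromInputStream helper is not shown in the question; something along these lines is assumed (a hypothetical implementation using java.io, not the poster's actual code):

// Hypothetical helper: reads the stream fully and decodes it as UTF-8.
private static String getStringFromInputStream(InputStream in) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    byte[] chunk = new byte[4096];
    int read;
    while ((read = in.read(chunk)) != -1) {
        buffer.write(chunk, 0, read);
    }
    return buffer.toString("UTF-8");
}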
Any guess why I can't read from the remote HDFS? If I remove the conf.set("fs.defaultFS","hdfs://localhost:32783") line I can read, but only from a local file path.
I understand that "hdfs://localhost:32783" is correct, because replacing it with a random URI gives a Connection refused error.
Might there be something wrong in my Hadoop configuration?
Thank you!

How to get rid of NullPointerException in Flume Interceptor?

I have an interceptor written for Flume; the code is below:
public Event intercept(Event event) {
    byte[] xmlstr = event.getBody();
    InputStream instr = new ByteArrayInputStream(xmlstr);
    //TransformerFactory factory = TransformerFactory.newInstance(TRANSFORMER_FACTORY_CLASS, TRANSFORMER_FACTORY_CLASS.getClass().getClassLoader());
    TransformerFactory factory = TransformerFactory.newInstance();
    Source xslt = new StreamSource(new File("removeNs.xslt"));
    Transformer transformer = null;
    try {
        transformer = factory.newTransformer(xslt);
    } catch (TransformerConfigurationException e1) {
        // TODO Auto-generated catch block
        e1.printStackTrace();
    }
    Source text = new StreamSource(instr);
    OutputStream ostr = new ByteArrayOutputStream();
    try {
        transformer.transform(text, new StreamResult(ostr));
    } catch (TransformerException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
    event.setBody(ostr.toString().getBytes());
    return event;
}
I'm removing the namespace from my source XML with the removeNs.xslt file, so that I can store the data in HDFS and later load it into Hive. When my interceptor runs, it throws the error below:
ERROR org.apache.flume.source.jms.JMSSource: Unexpected error processing events
java.lang.NullPointerException
at test.intercepter.App.intercept(App.java:59)
at test.intercepter.App.intercept(App.java:82)
at org.apache.flume.interceptor.InterceptorChain.intercept(InterceptorChain.java:62)
at org.apache.flume.channel.ChannelProcessor.processEventBatch(ChannelProcessor.java:146)
at org.apache.flume.source.jms.JMSSource.doProcess(JMSSource.java:258)
at org.apache.flume.source.AbstractPollableSource.process(AbstractPollableSource.java:54)
at org.apache.flume.source.PollableSourceRunner$PollingRunner.run(PollableSourceRunner.java:139)
at java.lang.Thread.run(Thread.java:745)
Can you suggest what the problem is and where?
I found the solution. The problem was nothing other than new File("removeNs.xslt"): Flume could not find the file, since I was not sure where to keep it. I later found the Flume agent's directory, but as soon as I restart the agent it deletes all the files I kept there. So I changed the code and embedded the file's contents directly in my Java code.
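A minimal sketch of that change: the stylesheet is embedded as a string constant and the Source is built from a StringReader instead of a File. The XSLT shown is a generic namespace-stripping stylesheet, assumed here because the actual contents of removeNs.xslt are not in the question.

// Additional import needed: java.io.StringReader
// Generic namespace-stripping stylesheet (an assumption; substitute the real removeNs.xslt contents).
private static final String REMOVE_NS_XSLT =
        "<?xml version=\"1.0\"?>"
      + "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "  <xsl:template match=\"*\">"
      + "    <xsl:element name=\"{local-name()}\">"
      + "      <xsl:apply-templates select=\"@*|node()\"/>"
      + "    </xsl:element>"
      + "  </xsl:template>"
      + "  <xsl:template match=\"@*|text()|comment()|processing-instruction()\">"
      + "    <xsl:copy/>"
      + "  </xsl:template>"
      + "</xsl:stylesheet>";

// In intercept(), replace new StreamSource(new File("removeNs.xslt")) with:
Source xslt = new StreamSource(new StringReader(REMOVE_NS_XSLT));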

Unable to load OpenNLP sentence model in Hadoop map-reduce job

I'm trying to get OpenNLP integrated into a map-reduce job on Hadoop, starting with some basic sentence splitting. Within the map function, the following code is run:
public AnalysisFile analyze(String content) {
    InputStream modelIn = null;
    String[] sentences = null;
    // references an absolute path to en-sent.bin
    logger.info("sentenceModelPath: " + sentenceModelPath);
    try {
        modelIn = getClass().getResourceAsStream(sentenceModelPath);
        SentenceModel model = new SentenceModel(modelIn);
        SentenceDetectorME sentenceBreaker = new SentenceDetectorME(model);
        sentences = sentenceBreaker.sentDetect(content);
    } catch (FileNotFoundException e) {
        logger.error("Unable to locate sentence model.");
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (modelIn != null) {
            try {
                modelIn.close();
            } catch (IOException e) {
            }
        }
    }
    logger.info("number of sentences: " + sentences.length);
    <snip>
}
When I run my job, I'm getting an error in the log saying "in must not be null!" (source of class throwing error), which means that somehow I can't open an InputStream to the model. Other tidbits:
I've verified that the model file exists in the location sentenceModelPath refers to.
I've added Maven dependencies for opennlp-maxent:3.0.2-incubating, opennlp-tools:1.5.2-incubating, and opennlp-uima:1.5.2-incubating.
Hadoop is just running on my local machine.
Most of this is boilerplate from the OpenNLP documentation. Is there something I'm missing, either on the Hadoop side or the OpenNLP side, that would cause me to be unable to read from the model?
Your problem is the getClass().getResourceAsStream(sentenceModelPath) line. This tries to load a file from the classpath; neither a file in HDFS nor one on the client's local file system is part of the classpath at mapper / reducer runtime, which is why you're seeing the null error (getResourceAsStream() returns null if the resource cannot be found).
To get around this you have a number of options:
Amend your code to load the file from HDFS:
modelIn = FileSystem.get(context.getConfiguration()).open(
new Path("/sandbox/corpus-analysis/nlp/en-sent.bin"));
Amend your code to load the file from the local dir, and use the -files GenericOptionsParser option (which copies to file from the local file system to HDFS, and back down to the local directory of the running mapper / reducer):
modelIn = new FileInputStream("en-sent.bin");
Hard-bake the file into the job jar (in the root dir of the jar), and amend your code to include a leading slash:
modelIn = getClass().getResourceAsStream("/en-sent.bin");</li>
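For illustration, here is a minimal sketch of the first option inside a Mapper (the class name and the output key/value layout are assumptions; the HDFS path is the one used above): the model is loaded once in setup() and the detector is reused for every record.

import java.io.IOException;
import java.io.InputStream;

import opennlp.tools.sentdetect.SentenceDetectorME;
import opennlp.tools.sentdetect.SentenceModel;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class SentenceMapper extends Mapper<LongWritable, Text, Text, Text> {
    private SentenceDetectorME sentenceBreaker;

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // Open the model from HDFS via the job configuration, once per task.
        try (InputStream modelIn = FileSystem.get(context.getConfiguration())
                .open(new Path("/sandbox/corpus-analysis/nlp/en-sent.bin"))) {
            sentenceBreaker = new SentenceDetectorME(new SentenceModel(modelIn));
        }
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Emit each detected sentence; the key/value layout is only an example.
        for (String sentence : sentenceBreaker.sentDetect(value.toString())) {
            context.write(new Text(sentence), new Text(""));
        }
    }
}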

Copying files from HDFS to local file system with Java

I am trying to copy files from HDFS to the local file system for preprocessing. The code below should work according to the documentation. Although it doesn't give any error messages and the MapReduce job runs smoothly, I cannot see any output on my local hard drive. What do you think the problem is? Thanks.
try {
    Path phdfs_input = new Path("hdfs://master:54310/user/hduser/conninput/" + value.toString());
    Path plocal_input = new Path("/home/hduser/Desktop/" + value.toString());
    FileSystem fs = FileSystem.get(context.getConfiguration());
    fs.copyToLocalFile(phdfs_input, plocal_input);
    /* String localoutput_file = "/home/hduser/Desktop/output/" + value.toString();
    String cmd1[] = {"mafia", "-mfi", ".5", "-ascii", "~/Desktop/" + value.toString(), localoutput_file };
    File mafia_dir = new File("/home/hduser/");
    ShellCommandExecutor s = new ShellCommandExecutor(cmd1, mafia_dir); */
} catch (Exception e) {
    e.printStackTrace();
}
Try using /user/hduser/conninput/"+value.toString() in the Path constructor instead of providing the master:54310 part. It should figure out master:54310 from the Configuration.
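A sketch of the suggested change, reusing the variables from the snippet above: the path is resolved against the fs.defaultFS already present in the job Configuration instead of hard-coding hdfs://master:54310.

// Let the Configuration supply the scheme/authority (master:54310) for the HDFS path.
Path phdfs_input = new Path("/user/hduser/conninput/" + value.toString());
Path plocal_input = new Path("/home/hduser/Desktop/" + value.toString());
FileSystem fs = FileSystem.get(context.getConfiguration());
fs.copyToLocalFile(phdfs_input, plocal_input);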
