Adding jar via distributed cache hadoop - hadoop

I am adding a .jar file to the classpath using DistributedCache:
DistributedCache.addFileToClassPath(new Path("binary/tools.jar"), job.getConfiguration());
I am not sure whether addFileToClassPath() is the correct API for adding .jar files to the classpath. When I retrieve the classpath from the mapper, I cannot see the added jar. The classpath contains the working directory for the job (jobcache dir), but it does not include the jar distributed through DistributedCache.
Properties prop = System.getProperties();
System.out.println("The classpath is: " + prop.getProperty("java.class.path", null));
I tried addArchiveToClassPath() too. It did not work.
Am I missing something?
Thanks,

The problem was with the path. addFileToClassPath() and addArchiveToClassPath() take only an absolute path as input. binary/tools.jar is relative and hence did not work. I needed to specify the path as /user/<username>/binary/tools.jar. Now it works fine. Even hdfs://<hostname>:port/user/.. fails.
Thank you all..
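To check from inside a task whether a jar really ended up on the classpath, it can help to split the java.class.path string into entries instead of reading it as one long line. The following is a minimal sketch using only the JDK; the class and method names (ClasspathCheck, entries, containsJar) are made up for illustration and are not part of Hadoop.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class ClasspathCheck {
    // Splits a classpath string on the platform separator (":" on Unix, ";" on Windows).
    static List<String> entries(String classpath) {
        List<String> result = new ArrayList<>();
        if (classpath == null) {
            return result;
        }
        for (String entry : classpath.split(Pattern.quote(File.pathSeparator))) {
            if (!entry.isEmpty()) {
                result.add(entry);
            }
        }
        return result;
    }

    // Returns true if the file name of any classpath entry equals jarName, e.g. "tools.jar".
    static boolean containsJar(String classpath, String jarName) {
        for (String entry : entries(classpath)) {
            if (new File(entry).getName().equals(jarName)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String cp = System.getProperty("java.class.path");
        // Print one entry per line, then report whether the jar is present.
        entries(cp).forEach(System.out::println);
        System.out.println("tools.jar on classpath: " + containsJar(cp, "tools.jar"));
    }
}
```

Calling containsJar() from the mapper's setup code and logging the result gives a quick yes/no answer for a given jar name.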

Is the jar you are adding to the classpath on the local file system, or in HDFS?
DistributedCache expects the path you name to be in HDFS


Give external path in @Value Spring annotation and Resource

In a Spring Boot application, how do I give an external Windows path using the @Value Spring annotation and Resource?
The example below works fine and looks in the resources folder, but I want to give a path outside of the application, like c:\data\sample2.csv
@Value("classpath:/sample2.csv")
private Resource inputResource;
...
@Bean
public FlatFileItemReader<Employee> reader() {
    FlatFileItemReader<Employee> itemReader = new FlatFileItemReader<Employee>();
    itemReader.setLineMapper(lineMapper());
    itemReader.setLinesToSkip(1);
    itemReader.setResource(inputResource);
    return itemReader;
}
And if I want to get the value from a properties file in the annotation, what is the format for the path on Windows?
I tried these; none of them worked:
In code:
@Value("${inputfile}")
in properties file:
inputfile="C:\Users\termine\dev\sample2.csv"
inputfile="\\C:\\Users\\termine\\dev\\sample2.csv"
inputfile="C:/Users/termine/dev/sample2.csv"
inputfile="file:\\C:\Users\termine\dev\sample2.csv"
inputfile="file://C://Users//termine///dev//sample2.csv"
When you use the classpath: prefix, Spring searches the classpath even if you provide a path to a file outside it.
So instead of classpath:, you can use file:
Ex.
@Value("file:/sample2.csv") // provide the full file path, if any
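Besides the prefix, the way the path is written in the properties file matters. The sketch below uses plain java.util.Properties to show how the attempted values are parsed: a single backslash is treated as an escape character, and surrounding quotes are kept as part of the value. That Spring Boot's own properties loader treats backslashes and quotes the same way is an assumption here, and the class name WindowsPathProperties is made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class WindowsPathProperties {
    // Parses one "inputfile=..." line the way java.util.Properties does and returns the value.
    static String parse(String line) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(line));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("inputfile");
    }

    public static void main(String[] args) {
        // Single backslashes in the file: each one is consumed as an escape, and "\t" becomes a tab.
        System.out.println(parse("inputfile=C:\\Users\\termine\\dev\\sample2.csv"));
        // Doubled backslashes in the file: each pair survives as one backslash.
        System.out.println(parse("inputfile=C:\\\\Users\\\\termine\\\\dev\\\\sample2.csv"));
        // Forward slashes need no escaping at all.
        System.out.println(parse("inputfile=file:C:/Users/termine/dev/sample2.csv"));
        // Quotes are not stripped; they end up inside the value.
        System.out.println(parse("inputfile=\"C:/Users/termine/dev/sample2.csv\""));
    }
}
```

Under that assumption, an unquoted value with forward slashes and a file: prefix is the least error-prone form to combine with @Value("${inputfile}").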
Use the key spring.config.location to set the config location. By default, Spring Boot loads properties from the following locations, in this order of precedence:
A /config subdir of the current directory.
The current directory
A classpath /config package
The classpath root
Apart from this, when you start the jar or in application.properties, you can provide the location of the config file, like:
$ java -jar myproject.jar --spring.config.location=classpath:/default.properties,classpath:/override.properties
You can serve static files from the local disk by making the resource "sample2.csv" a static resource. An easy way to do this is to add the spring.resources.static-locations configuration to your application.properties file. Example:
spring.resources.static-locations=file:///C:/Temp/whatever/path/sample2.csv,classpath:/static-files,classpath:/more-static-resource
When I did this in one of my projects, I was able to access the file from the browser using localhost:8080/sample2.csv.

How to run external ruta scripts from a maven project without placing the script or its typesystem in the classpath?

Until now, I had been running Ruta scripts from a Maven project by creating an AnalysisEngine and a CAS, and processing the engine. To do this, I had placed all the scripts and descriptor files (Engine & TypeSystem) in the src/main/resources folder of the Maven project.
Now I want to place the scripts and TypeSystem files in an external path and pass the path dynamically to my Java code that runs the scripts. Is it possible to do this? If so, how?
I simply placed the files (script & descriptor) in an external path and passed the new path to instantiate the AnalysisEngine as below:
final AnalysisEngine engine = AnalysisEngineFactory.createEngine("home/admin/Desktop/TEST_ScriptFolder/com/textjuicer/ruta/date/Dazzle_ChapRef_UpdatedEngine");
Error
org.apache.uima.util.InvalidXMLException: An import could not be resolved. No file with name "home/admin/Desktop/TEST_ScriptFolder/com/textjuicer/ruta/date/Dazzle_ChapRef_UpdatedEngine.xml" was found in the class path or data path. (Descriptor: )
at org.apache.uima.resource.metadata.impl.Import_impl.findAbsoluteUrl(Import_impl.java:117)
at org.apache.uima.fit.factory.AnalysisEngineFactory.createEngineDescription(AnalysisEngineFactory.java:869)
at org.apache.uima.fit.factory.AnalysisEngineFactory.createEngine(AnalysisEngineFactory.java:107)
at com.textjuicer.ruta.date.ArtifactAnnotator.getAllAnnotations(ArtifactAnnotator.java:93)
at ApplyingStyle.XmiTransformer.parseXMI(XmiTransformer.java:33)
at ApplyingStyle.ApplyStyle.applyStyleOnDocx(ApplyStyle.java:76)
There are two layers:
The RutaEngine needs to find the scripts/resources/descriptors
UIMA needs to be able to resolve imports of descriptors
The resource lookup in Ruta has two stages: it first searches for resources in the absolute paths specified in the configuration parameters. If a resource is not found there, it searches the classpath. So you need to set the configuration parameters: scripts are located via scriptPaths, descriptors via descriptorPaths, and wordlists via resourcePaths. See the documentation for further information.
The problems with the imports in descriptors can be solved either by setting the datapath in the UIMA ResourceManager or by changing the import to "location" instead of "name". The datapath can be used as a replacement for the classpath. The Ruta descriptors use import by location if this is specified in the ruta-maven-plugin.
DISCLAIMER: I am a developer of UIMA Ruta
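The two-stage lookup described in this answer (configured directories first, classpath second) can be illustrated with a few lines of plain Java. This is only a sketch of the general strategy, not Ruta's actual implementation; the class TwoStageLookup, its methods, and the file name Main.ruta are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;

final class TwoStageLookup {
    // Stage 1: look for the resource under each configured directory.
    // Stage 2: fall back to the classpath. Returns null if neither stage finds it.
    static URL locate(String relativeName, List<Path> searchDirs, ClassLoader loader) {
        for (Path dir : searchDirs) {
            Path candidate = dir.resolve(relativeName);
            if (Files.isRegularFile(candidate)) {
                try {
                    return candidate.toUri().toURL();
                } catch (MalformedURLException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
        return loader.getResource(relativeName);
    }

    // Creates a temporary directory containing one empty file, for demonstration only.
    static Path createSampleDir(String fileName) {
        try {
            Path dir = Files.createTempDirectory("scripts");
            Files.createFile(dir.resolve(fileName));
            return dir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<Path> dirs = Collections.singletonList(createSampleDir("Main.ruta"));
        ClassLoader loader = ClassLoader.getSystemClassLoader();
        // Found in the external directory (stage 1).
        System.out.println(locate("Main.ruta", dirs, loader));
        // Found in neither place, so the result is null.
        System.out.println(locate("no/such/Script.ruta", dirs, loader));
    }
}
```

The point of the ordering is that an external directory passed at runtime takes priority, while resources bundled in the jar keep working as a fallback.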

Tomcat 7: access properties file outside WEB-INF/classes

I can access the properties file if it is in WEB-INF/classes.
However, if I keep the same file under TOMCAT/conf and update catalina.properties to point to the path, I get an error like "Name not bound".
I have tried almost everything, even an absolute path.
Is your file.properties file in the WEB-INF/classes folder? Then:
InputStream inputStream = this.getClass().getClassLoader().getResourceAsStream("/file.properties");
The only way I could solve it was:
1. Create an env folder under tomcat_home and put config.properties there
2. Update catalina.properties and add the tomcat_home\env path to the common loader
3. Comment out from spring security XML
Once I commented out the above line, it ran fine.

how to give classpath for a property file using spring

Resource resource = new ClassPathResource("classpath:src/main/resources/template/datafields.properties");
Properties props = PropertiesLoaderUtils.loadProperties(resource);
Your problem is that the path you pass does not exist on the application classpath. Looking at your folder paths, I assume that you have a Maven project structure and that your properties file is inside the resources directory. When your project is compiled, everything inside the resources directory ends up at the root of the classpath, along with your compiled Java classes. So you should instead use:
Resource resource = new ClassPathResource("template/datafields.properties");
ClassPathResource loads resources from the application classpath, so you need to know which directories and jar files are on your classpath, and their directory structure, to load resources successfully.
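The effect of "resources end up at the classpath root" can be reproduced without Spring or Maven. The sketch below builds a temporary directory that plays the role of the compiled output folder and loads from it with a plain URLClassLoader; the class ClasspathRootDemo and the sample property name=value are made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

final class ClasspathRootDemo {
    // Builds a throwaway directory standing in for the compiled output folder
    // and places template/datafields.properties directly under its root.
    static Path buildFakeClasspathRoot() {
        try {
            Path root = Files.createTempDirectory("classes");
            Path file = root.resolve("template").resolve("datafields.properties");
            Files.createDirectories(file.getParent());
            Files.write(file, "name=value\n".getBytes(StandardCharsets.ISO_8859_1));
            return root;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Loads a properties file by its classpath-relative name; returns null if it is not found.
    static Properties load(Path classpathRoot, String resourceName) {
        try (URLClassLoader loader =
                 new URLClassLoader(new URL[] { classpathRoot.toUri().toURL() }, null);
             InputStream in = loader.getResourceAsStream(resourceName)) {
            if (in == null) {
                return null;
            }
            Properties props = new Properties();
            props.load(in);
            return props;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path root = buildFakeClasspathRoot();
        // Name relative to the classpath root: found.
        System.out.println(load(root, "template/datafields.properties"));
        // Source-tree path: not on the classpath, so nothing is found.
        System.out.println(load(root, "src/main/resources/template/datafields.properties"));
    }
}
```

The source-tree prefix src/main/resources exists only at build time, which is why the lookup has to use the name relative to the classpath root.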

How does Configuration.addResource() method work in hadoop

Does the Configuration.addResource() method load a resource file the way Java's ClassLoader does, or does it just wrap the ClassLoader class? I find that it cannot take a String like "../resource.xml" as the argument of addResource() to load a resource file from outside the classpath; this behavior is just the same as ClassLoader.
Thx!
Judging from the Javadocs and source code for Configuration, Strings are assumed to be classpath resources (line 1162) rather than paths relative to the file system. You should use URLs to reference files on the local file system, as follows:
conf.addResource(new File("../resource.xml").toURI().toURL());
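The conversion from a relative path to a URL can be tried on its own, without Hadoop. The sketch below is a small variation of the call above that also normalizes the ".." segment, so the resulting URL is easier to read in logs; the class FileUrlDemo and the method toFileUrl are hypothetical helpers, and Hadoop itself is not invoked.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.nio.file.Paths;

final class FileUrlDemo {
    // Resolves a possibly relative path against the current working directory,
    // removes "." and ".." segments, and returns it as a file: URL.
    static URL toFileUrl(String path) {
        try {
            return Paths.get(path).toAbsolutePath().normalize().toUri().toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // The printed URL depends on the directory the program is started from.
        System.out.println(toFileUrl("../resource.xml"));
    }
}
```

The returned URL could then be passed to conf.addResource() in place of the String.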
