Getting StackOverflowError in Hadoop

I am getting a StackOverflowError while accessing a Hadoop file with the following Java code.
import java.io.InputStream;
import java.net.URL;
import org.apache.hadoop.fs.FsUrlStreamHandlerFactory;
import org.apache.hadoop.io.IOUtils;

public class URLCat {
    static {
        URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
    }

    public static void main(String[] args) throws Exception {
        InputStream in = null;
        try {
            in = new URL(args[0]).openStream();
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
        }
    }
}
I used Eclipse to debug this code and found that the line
in = new URL(args[0]).openStream();
produces the error.
I am running this code by passing a Hadoop file path, i.e.
hdfs://localhost/user/jay/abc.txt
Exception (pulled from comments) :
Exception in thread "main" java.lang.StackOverflowError
at java.nio.Buffer.<init>(Buffer.java:174)
at java.nio.ByteBuffer.<init>(ByteBuffer.java:259)
at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:52)
at java.nio.ByteBuffer.wrap(ByteBuffer.java:350)
at java.nio.ByteBuffer.wrap(ByteBuffer.java:373)
at java.lang.StringCoding$StringEncoder.encode(StringCoding.java:237)
at java.lang.StringCoding.encode(StringCoding.java:272)
at java.lang.String.getBytes(String.java:946)
at java.io.UnixFileSystem.getBooleanAttributes0(Native Method)
.. stack trace truncated ..

1) This is caused by a bug in the FsUrlStreamHandlerFactory class provided by Hadoop. Note that the bug is fixed in the latest jar that contains this class.
2) This class is located in hadoop-common-2.0.0-cdh4.2.1.jar. To understand the problem completely we have to understand how the java.net.URL class works.
Working of URL object
When we create a new URL using any of its constructors without passing a URLStreamHandler (either by passing null for its value or by calling a constructor that does not take a URLStreamHandler parameter), it internally calls a method named getURLStreamHandler(). This method returns a URLStreamHandler object and sets a member variable in the URL class.
This object knows how to open a connection for a particular scheme such as "http", "file", and so on. The URLStreamHandler is constructed by a factory called URLStreamHandlerFactory.
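As a rough illustration of this mechanism (not Hadoop's actual factory), a custom URLStreamHandlerFactory is consulted once per protocol; returning null from createURLStreamHandler falls back to the JDK's built-in handlers:

import java.net.URL;
import java.net.URLStreamHandler;
import java.net.URLStreamHandlerFactory;

// Minimal sketch: a factory that only logs which protocol is requested and
// defers to the JDK's default handlers by returning null.
public class LoggingHandlerFactory implements URLStreamHandlerFactory {
    @Override
    public URLStreamHandler createURLStreamHandler(String protocol) {
        System.out.println("Handler requested for protocol: " + protocol);
        return null; // null means "use the built-in handler for this scheme"
    }

    public static void main(String[] args) throws Exception {
        // May be called at most once per JVM, exactly like FsUrlStreamHandlerFactory.
        URL.setURLStreamHandlerFactory(new LoggingHandlerFactory());
        new URL("http://example.org/").openConnection();
    }
}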
3) In the problem example given above, the URLStreamHandlerFactory was set to FsUrlStreamHandlerFactory by calling the following static method.
URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
So when we create a new URL, this FsUrlStreamHandlerFactory is used to create the URLStreamHandler object for the new URL by calling its createURLStreamHandler(protocol) method.
This method in turn calls the loadFileSystems() method of the FileSystem class. The loadFileSystems() method invokes ServiceLoader.load(FileSystem.class), which tries to read the binary names of the FileSystem implementation classes by searching the META-INF/services/*.FileSystem files of all jar files on the classpath and reading their entries.
4) Remember that each jar is handled as a URL object, meaning that for each jar a URL object is created internally by the class loader. The class loader supplies the URLStreamHandler object when constructing the URLs for these jars, so these URLs are not affected by the FsUrlStreamHandlerFactory we set, because each URL already has its URLStreamHandler. Since we are dealing with jar files, the class loader sets the URLStreamHandler to one of type sun.net.www.protocol.jar.Handler.
5) Now, in order to read the entries inside the jar files for the FileSystem implementation classes, sun.net.www.protocol.jar.Handler needs to construct a URL object for each entry by calling the URL constructor without a URLStreamHandler object. Since we already defined the URLStreamHandlerFactory as FsUrlStreamHandlerFactory, it calls the createURLStreamHandler(protocol) method, which recurses indefinitely and leads to the StackOverflowError.
This bug is tracked as HADOOP-9041 by the Hadoop committers. The link is https://issues.apache.org/jira/browse/HADOOP-9041.
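For context, the ServiceLoader mechanism mentioned above simply scans META-INF/services entries on the classpath. Here is a minimal, Hadoop-independent sketch using the JDK's own java.sql.Driver service interface (chosen purely for illustration):

import java.sql.Driver;
import java.util.ServiceLoader;

public class ServiceLoaderDemo {
    public static void main(String[] args) {
        // ServiceLoader reads META-INF/services/java.sql.Driver from every jar
        // on the classpath and instantiates the listed implementation classes.
        ServiceLoader<Driver> drivers = ServiceLoader.load(Driver.class);
        for (Driver d : drivers) {
            System.out.println("Found driver implementation: " + d.getClass().getName());
        }
    }
}

This loop may print nothing if no JDBC driver jars are on the classpath. Hadoop's loadFileSystems() does the same thing with FileSystem.class, and opening each jar entry is what triggers the recursive URL construction described above.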
I know this is somewhat complicated.
So in short the solution to this problem is given below.
1) Use the latest jar hadoop-common-2.0.0-cdh4.2.1.jar which has the fix for this bug
or
2) Put the following call in the static block before setting the URLStreamHandlerFactory. (getFileSystemClass throws IOException, so it has to be wrapped in a try/catch inside the static block.)
static {
    try {
        FileSystem.getFileSystemClass("file", new Configuration());
    } catch (IOException e) {
        throw new ExceptionInInitializerError(e);
    }
    URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
}
Note that the first statement inside the static block no longer depends on FsUrlStreamHandlerFactory and uses the default handler for file:// to read the file entries in the META-INF/services/*.FileSystem files.

I have a workaround.
It would be great if someone more familiar with the current state of the Hadoop world (Jan 2014) would enlighten us and/or explain the behavior.
I encountered the same StackOverflowError when trying to run URLCat from Hadoop: The Definitive Guide, Third Edition, by Tom White.
I have the problem with Cloudera QuickStart 4.4.0 and 4.3.0
Using both jdk1.6.0_32 and jdk1.6.0_45
The problem occurs during initialization/class loading of org.apache.hadoop.fs.FileSystem underneath java.net.URL.
There is some kind of recursive exception handling that is kicking in.
I did the best I could to trace it down.
The path leads to java.util.ServiceLoader which then invokes sun.misc.CompoundEnumeration.nextElement()
Unfortunately, the source for sun.misc.CompoundEnumeration is not included in the jdk src.zip ... perhaps an oversight because it is in java package sun.misc
In an attempt to trigger the error through another execution path I came up with a workaround ...
You can avoid the conditions that lead to StackOverflowError by invoking org.apache.hadoop.fs.FileSystem.getFileSystemClass(String, Configuration) prior to registering the StreamHandlerFactory.
This can be done by modifying the static initialization block (see original listing above):
static {
    Configuration conf = new Configuration();
    try {
        FileSystem.getFileSystemClass("file", conf);
    } catch (Exception e) {
        throw new RuntimeException(e.getMessage());
    }
    URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
}
This can also be accomplished by moving the contents of this static block to your main().
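For illustration, a hedged sketch of that variant (same idea, just ordered explicitly at the top of main) could look like this:

public static void main(String[] args) throws Exception {
    // Force FileSystem's service-loader scan to run with the default URL handlers
    // before FsUrlStreamHandlerFactory is registered.
    FileSystem.getFileSystemClass("file", new Configuration());
    URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());

    InputStream in = null;
    try {
        in = new URL(args[0]).openStream();
        IOUtils.copyBytes(in, System.out, 4096, false);
    } finally {
        IOUtils.closeStream(in);
    }
}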
I found another reference to this error, from Aug 2011, on Stack Overflow involving FsUrlStreamHandlerFactory.
I am quite puzzled that more Hadoop newbies have not stumbled onto this problem ... buy the Hadoop book ... download Cloudera QuickStart ... try a very simple example ... FAIL!?
Any insight from more experienced folks would be appreciated.

Related

In the reference implementation of the JAX-RS Whiteboard for OSGi, what calls createWhiteboard(..)?

The reference implementation of OSGi's JAX-RS Whiteboard is called Aries JAX-RS Whiteboard.
My question is, how and when does the factory method for the Whiteboard.class get called?
public static Whiteboard createWhiteboard(Dictionary<String, ?> configuration) {
    return new Whiteboard(configuration);
}
Like, for example, if I drop the jar into an Apache Felix instance?
I searched the whole project for the createWhiteboard symbol, but I did not find anything calling it. I know it is the OSGi runtime that does this, but how and where?
Ok, so I answered my own question.
The Whiteboard.class is called by a separate "activator" class that implements the standardized OSGi callback interface, BundleActivator: CxfJaxrsBundleActivator at line 76. This is analogous to the entry point of a program. Then, at line 105, the runWhiteboard method is called, which abstracts away the call to createWhiteboard using a method that is perhaps more complicated than it should be, starting at line 198.
The main calls in the stack in bottom-up order would be:
createWhiteboard(configuration)
runWhiteboard(bundleContext, configuration)
start(BundleContext bundleContext) throws Exception
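To illustrate the general pattern (this is a simplified sketch, not the actual Aries CxfJaxrsBundleActivator), the OSGi framework calls start() on the bundle's declared activator when the bundle is started, and that is where a factory method like createWhiteboard would be invoked:

import org.osgi.framework.BundleActivator;
import org.osgi.framework.BundleContext;

// Hypothetical activator sketch; the real class and its configuration handling
// in Aries JAX-RS Whiteboard are considerably more involved.
public class ExampleActivator implements BundleActivator {

    @Override
    public void start(BundleContext context) throws Exception {
        // The framework calls this when the bundle moves to ACTIVE
        // (the Bundle-Activator header in MANIFEST.MF names this class).
        System.out.println("Bundle started: " + context.getBundle().getSymbolicName());
        // ... this is where something like createWhiteboard(configuration) would be called
    }

    @Override
    public void stop(BundleContext context) throws Exception {
        System.out.println("Bundle stopped");
    }
}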

Spring 4 Join point to get method argument names and values

I am using Spring 4.3. Is it possible to get the method parameter names and the values passed to them? I believe this can be done using AOP (before advice); if possible, could you please give me some source code?
The following works as expected (Java 8 + Spring 5.0.4 + AspectJ 1.8.13):
import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;
import org.aspectj.lang.reflect.CodeSignature;
import org.springframework.stereotype.Component;

@Aspect
@Component
public class SomeAspect {

    @Around("@annotation(SomeAnnotation)")
    public Object aroundAdvice(ProceedingJoinPoint joinPoint) throws Throwable {
        CodeSignature codeSignature = (CodeSignature) joinPoint.getSignature();
        System.out.println("First parameter's name: " + codeSignature.getParameterNames()[0]);
        System.out.println("First argument's value: " + joinPoint.getArgs()[0]);
        return joinPoint.proceed();
    }
}
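For completeness, a hedged usage sketch of a method the aspect would intercept; the annotation name SomeAnnotation and the service class are assumptions, not from the original answer, and the pointcut above resolves the unqualified annotation name only if it is in the same package (otherwise use the fully qualified name):

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import org.springframework.stereotype.Service;

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface SomeAnnotation {
}

@Service
class GreetingService {
    // Calling this method through the Spring proxy triggers aroundAdvice,
    // which prints the parameter name ("name") and the argument value.
    @SomeAnnotation
    public String greet(String name) {
        return "Hello, " + name;
    }
}

Note that getParameterNames() only returns real names when the class is compiled with debug information (-g) or the -parameters flag; otherwise it may return null.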
You can get the method signature's parameter names like this:
CodeSignature methodSignature = (CodeSignature) joinPoint.getSignature();
String[] sigParamNames = methodSignature.getParameterNames();
Unfortunately, you can't do this directly. It is a well-known limitation of bytecode: argument names can't always be obtained using reflection, as they are not necessarily stored in the bytecode.
As a workaround, you can add additional annotations like @ParamName(name = "paramName").
You can then get the parameter names in the following way:
MethodSignature.getMethod().getParameterAnnotations()
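A hedged sketch of that workaround; the @ParamName annotation is something you define yourself, as the answer suggests, not a Spring or AspectJ class:

import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import org.aspectj.lang.JoinPoint;
import org.aspectj.lang.reflect.MethodSignature;

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.PARAMETER)
@interface ParamName {
    String name();
}

class ParamNameReader {
    // Reads the declared @ParamName values for each parameter of the advised method
    // and pairs them with the runtime argument values.
    static void printParamNames(JoinPoint joinPoint) {
        MethodSignature signature = (MethodSignature) joinPoint.getSignature();
        Annotation[][] annotations = signature.getMethod().getParameterAnnotations();
        Object[] args = joinPoint.getArgs();
        for (int i = 0; i < annotations.length; i++) {
            for (Annotation a : annotations[i]) {
                if (a instanceof ParamName) {
                    System.out.println(((ParamName) a).name() + " = " + args[i]);
                }
            }
        }
    }
}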
UPDATE
Since Java 8 you can do this
You can obtain the names of the formal parameters of any method or constructor with the method java.lang.reflect.Executable.getParameters. (The classes Method and Constructor extend the class Executable and therefore inherit the method Executable.getParameters.) However, .class files do not store formal parameter names by default. This is because many tools that produce and consume class files may not expect the larger static and dynamic footprint of .class files that contain parameter names. In particular, these tools would have to handle larger .class files, and the Java Virtual Machine (JVM) would use more memory. In addition, some parameter names, such as secret or password, may expose information about security-sensitive methods.
To store formal parameter names in a particular .class file, and thus
enable the Reflection API to retrieve formal parameter names, compile
the source file with the -parameters option to the javac compiler.
https://docs.oracle.com/javase/tutorial/reflect/member/methodparameterreflection.html
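A minimal sketch of that approach; it only prints real names when the class was compiled with javac -parameters, otherwise the names come back as arg0, arg1, and so on:

import java.lang.reflect.Method;
import java.lang.reflect.Parameter;

public class ParameterNamesDemo {
    public void transfer(String accountId, long amountCents) {
    }

    public static void main(String[] args) throws Exception {
        Method m = ParameterNamesDemo.class.getMethod("transfer", String.class, long.class);
        for (Parameter p : m.getParameters()) {
            // isNamePresent() is true only when compiled with -parameters
            System.out.println(p.getName() + " (real name available: " + p.isNamePresent() + ")");
        }
    }
}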
In your AOP advice you can use methods of the JoinPoint to get access to methods and their parameters. There are multiple examples online and at stackoverflow.
Get method arguments using spring aop?
For getting arguments: https://docs.jboss.org/jbossaop/docs/2.0.0.GA/docs/aspect-framework/apidocs/org/jboss/aop/joinpoint/MethodInvocation.html#getArguments()
For getting method details: https://docs.jboss.org/jbossaop/docs/2.0.0.GA/docs/aspect-framework/apidocs/org/jboss/aop/joinpoint/MethodInvocation.html#getMethod%28%29

Where to find details for the API of Context in hadoop?

I have coded some routine Hadoop MapReduce jobs and call the context.write() method based on examples from the Apache Hadoop source code. But copying like that doesn't help me understand the Hadoop API any more deeply.
Therefore, I recently started to read the Hadoop API documentation (https://hadoop.apache.org/docs/r2.7.0/api/) more carefully and tried to figure out whether there are any other methods in Context besides context.write(). For instance, in the teragen example, context.getCounter() is used.
But to my surprise, I couldn't find the Context class documentation at all from the link above.
Where can I find the documentation for the Context class in Hadoop?
You can start to work out what's going on if you dig into the standard Mapper class source (around line 106).
public abstract class Context implements MapContext<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {
}
So this is just an abstract class which implements the MapContext interface found here (Javadoc link).
The concrete implementation is MapContextImpl found here.
It looks like the ContextFactory (source) is responsible for creating the different implementations of Context.
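To make the connection back to the original question concrete, here is a hedged sketch of a Mapper that uses both context.write() and context.getCounter(); the counter group and name strings are made up for illustration:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class LineCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Context gets write() and getCounter() via the MapContext hierarchy,
        // which is why those methods appear in the MapContext Javadoc rather
        // than on a standalone Context page.
        context.getCounter("demo", "lines.seen").increment(1);
        context.write(new Text("line"), ONE);
    }
}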

Unable to load Alfresco bean

I'm hoping to make some calls to Solr using Alfresco's org.alfresco.repo.search.impl.solr.SolrAdminHTTPClient class. However, the bean search.solrAdminHTTPCLient does not seem to be accessible to me from the standard application context. Attempting to add a dependency and property reference for my own bean (via XML) has failed as well. Any reason this is not accessible?
public class MyClass extends DeclarativeWebScript implements ApplicationContextAware{
...
SolrAdminHTTPClient adminClient = (SolrAdminHTTPClient) appContext.getBean("search.solrAdminHTTPCLient");
Would like to avoid creating my own clients for standard solr admin queries.
Judging by the folder tree leading to this file, I would say that the bean is defined in the Search subsystem, which means it lives in a completely different context, a child context in fact.
So you need to look up that context first before trying to retrieve your bean!
UPDATE: I have done some digging, and I guess that your window to that child context is in this particular bean.
So I think you can do the following :
SwitchableApplicationContextFactory search = (SwitchableApplicationContextFactory)applicationContext.getBean("Search");
ApplicationContext searchCtx = search.getApplicationContext();
SolrAdminHTTPClient adminClient = (SolrAdminHTTPClient) searchCtx.getBean("search.solrAdminHTTPCLient");
A friend from the IRC channel has, however, suggested an alternative solution:
Set up a separate ChildApplicationContextFactory for each and every bean you wish to access in your child context, and he suggested you get some inspiration from this.

Which happens first--Dependency Injection from Spring, or the execution of a static block?

I have a class that uses a static block to initialize a static Hashtable. This is done by reading a properties file, parsing the contents of the file, and then setting the appropriate values into the Hashtable.
However, instead of specifying the location of the file, I would like to inject the location instead using Spring, basically to eliminate any hard-coded values in the class. I did see somewhere else that it is in fact possible to inject into a static variable, but that it will involve the use of a non-static setter.
So my question is--will the invocation of the setter happen before the static block is executed, or will the static block execute first before Spring invokes the setter (which will basically cause an exception in my code)?
Thank you!
The static initializer is executed by the classloader as part of initializing the class, before any other code is granted access to it. Since Spring must instantiate the class (which requires loading and initializing it) before it can call setters on that instance, the static initializer block will already have run by the time the setter is invoked.
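A small sketch (the class and property names are made up) that demonstrates the ordering when Spring creates such a bean:

import java.util.Hashtable;

public class LookupTable {
    private static final Hashtable<String, String> TABLE = new Hashtable<>();

    static {
        // Runs once, during class initialization, before Spring can call any setter.
        System.out.println("static block: class initialized");
    }

    // Non-static setter used for Spring property injection; runs after the
    // constructor, which in turn runs after the static block.
    public void setPropertiesLocation(String location) {
        System.out.println("setter called with: " + location);
        // Too late to influence the static block, so load the table here instead.
        TABLE.put("source", location);
    }
}

When Spring instantiates the bean, the static block line prints first, then the setter line. So the usual fix is to move the file-reading logic out of the static block and into the setter (or an @PostConstruct method).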
