ClassNotFoundException on TableMapper when I use my own TableInputFormat - hadoop

I am trying to use my own TableInputFormat for a MapReduce job in the following way:
TableMapReduceUtil.initTableMapperJob("mytable",
        myScan,
        MyMapper.class,
        MyKey.class,
        MyValue.class,
        myJob,
        true,
        MyTableInputFormat.class);
When I run the job, I get a ClassNotFoundException: org.apache.hadoop.hbase.mapreduce.TableMapper - any idea why?
If I do not use the last two parameters of initTableMapperJob, there is no ClassNotFoundException (but that obviously defeats the purpose).
I have been struggling with this for a few days now.
Someone did the same thing here: Extending Hadoop's TableInputFormat to scan with a prefix used for distribution of timestamp keys, but I am not able to ask the question on that thread.
I am working on a Cloudera cluster (CDH 4.3.0) with Hadoop 2.
Adding the stack trace:
java.lang.ClassNotFoundException: org.apache.hadoop.hbase.mapreduce.TableMapper
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.ClassLoader.defineClass1(Native Method) ....
Thanks a lot for helping
Regards

Please see Overriding TableMapper splits. I overrode TableMapReduceUtil and added TableMapper.class to the addDependencyJars call. Then I proceeded in the same way:
MyTableMapReduceUtil.initTableMapperJob("MyTable", // input table
        myScan,
        MyMapper.class,
        MyKey.class,
        MyValue.class,
        myJob,
        true,
        CustomSplitTableInputFormat.class);
where CustomSplitTableInputFormat extends TableInputFormat.
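For reference, a minimal sketch of that workaround. It assumes HBase's standard TableMapReduceUtil methods (initTableMapperJob and the varargs addDependencyJars overload); exact signatures vary a little by HBase version, and MyTableMapReduceUtil is just the name used above.
import java.io.IOException;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.Job;

public class MyTableMapReduceUtil {

    public static void initTableMapperJob(String table, Scan scan,
            Class<? extends TableMapper> mapper,
            Class<?> outputKeyClass, Class<?> outputValueClass,
            Job job, boolean addDependencyJars,
            Class<? extends InputFormat> inputFormatClass) throws IOException {

        // Let HBase do the normal mapper/input-format setup, but skip its own
        // dependency-jar handling so we can add TableMapper's jar ourselves.
        TableMapReduceUtil.initTableMapperJob(table, scan, mapper,
                outputKeyClass, outputValueClass, job, false, inputFormatClass);

        if (addDependencyJars) {
            // Ship the jars containing these classes (including the HBase jar
            // that holds TableMapper) to the cluster via the distributed cache.
            TableMapReduceUtil.addDependencyJars(job.getConfiguration(),
                    TableMapper.class, mapper, inputFormatClass,
                    outputKeyClass, outputValueClass);
        }
    }
}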

Related

No [ManagedType] was found for the key class [nz.cri.gns.mapservice.userdomain.DataUser]

I have two data sources. The moment I add ANY repository to the second data source, this error comes up for whatever entity the repository uses.
I am using Spring config instead of persistence.xml, and EclipseLink JPA. The strange thing is that this is nearly identical to a working project that was used as a template. Different data sources and obviously a different tree scanned, but otherwise the config seems set up exactly the same. What is the Spring Data config equivalent of exclude-unlisted-classes? I will happily put up code, but can anyone give me a hint of where I should start looking?
The stack dump looks like:
Caused by: java.lang.IllegalArgumentException: No [ManagedType] was found for the key class [nz.cri.gns.mapservice.userdomain.DataUser] in the Metamodel - please verify that the [Managed] class was referenced in persistence.xml using a specific <class>nz.cri.gns.mapservice.userdomain.DataUser</class> property or a global <exclude-unlisted-classes>false</exclude-unlisted-classes> element.
at org.eclipse.persistence.internal.jpa.metamodel.MetamodelImpl.entityEmbeddableManagedTypeNotFound(MetamodelImpl.java:177)
at org.eclipse.persistence.internal.jpa.metamodel.MetamodelImpl.managedType(MetamodelImpl.java:519)
at org.springframework.data.jpa.repository.support.JpaMetamodelEntityInformation.(JpaMetamodelEntityInformation.java:68)
at org.springframework.data.jpa.repository.support.JpaEntityInformationSupport.getEntityInformation(JpaEntityInformationSupport.java:67)
at org.springframework.data.jpa.repository.support.JpaRepositoryFactory.getEntityInformation(JpaRepositoryFactory.java:152)
at org.springframework.data.jpa.repository.support.JpaRepositoryFactory.getTargetRepository(JpaRepositoryFactory.java:99)
at org.springframework.data.jpa.repository.support.JpaRepositoryFactory.getTargetRepository(JpaRepositoryFactory.java:81)
at org.springframework.data.repository.core.support.RepositoryFactorySupport.getRepository(RepositoryFactorySupport.java:185)
at org.springframework.data.repository.core.support.RepositoryFactoryBeanSupport.initAndReturn(RepositoryFactoryBeanSupport.java:251)
at org.springframework.data.repository.core.support.RepositoryFactoryBeanSupport.afterPropertiesSet(RepositoryFactoryBeanSupport.java:237)
at org.springframework.data.jpa.repository.support.JpaRepositoryFactoryBean.afterPropertiesSet(JpaRepositoryFactoryBean.java:92)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeInitMethods(AbstractAutowireCapableBeanFactory.java:1637)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1574)
Duh! Make sure everything in setPackagesToScan is spelled correctly! No error results from a typo, but the classes don't go into the metamodel either.
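For what it's worth, a minimal sketch of where that setting lives in a Spring (no persistence.xml) setup; the bean name and vendor adapter here are illustrative, only the package string matters:
import javax.sql.DataSource;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean;
import org.springframework.orm.jpa.vendor.EclipseLinkJpaVendorAdapter;

@Configuration
public class UserJpaConfig {

    @Bean
    public LocalContainerEntityManagerFactoryBean userEntityManagerFactory(DataSource userDataSource) {
        LocalContainerEntityManagerFactoryBean emf = new LocalContainerEntityManagerFactoryBean();
        emf.setDataSource(userDataSource);
        // This plays the role of listing entity classes: a typo in the package name
        // produces no error, the entities are simply missing from the metamodel.
        emf.setPackagesToScan("nz.cri.gns.mapservice.userdomain");
        emf.setJpaVendorAdapter(new EclipseLinkJpaVendorAdapter());
        return emf;
    }
}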

How to set up two different CacheManagers in spring-boot

I'm trying to set up a Spring Boot application with two CacheManagers, with code as below:
@SpringBootApplication
@EnableCaching
public class TestApplication {
    ...
}
@Configuration
public class TestGuavaCacheConfig extends CachingConfigurerSupport {
    ...
}
@Configuration
public class TestRedisCacheConfig extends CachingConfigurerSupport {
    ...
}
But when I start the app, it always fails with the following error:
Caused by: java.lang.IllegalStateException: 2 implementations of CachingConfigurer were found when only 1 was expected. Refactor the configuration such that CachingConfigurer is implemented only once or not at all.
at org.springframework.cache.annotation.AbstractCachingConfiguration.setConfigurers(AbstractCachingConfiguration.java:71) ~[spring-context-4.2.4.RELEASE.jar:4.2.4.RELEASE]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_66]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_66]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_66]
at java.lang.reflect.Method.invoke(Method.java:497) ~[na:1.8.0_66]
at org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor$AutowiredMethodElement.inject(AutowiredAnnotationBeanPostProcessor.java:654) ~[spring-beans-4.2.4.RELEASE.jar:4.2.4.RELEASE]
at org.springframework.beans.factory.annotation.InjectionMetadata.inject(InjectionMetadata.java:88) ~[spring-beans-4.2.4.RELEASE.jar:4.2.4.RELEASE]
at org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor.postProcessPropertyValues(AutowiredAnnotationBeanPostProcessor.java:331) ~[spring-beans-4.2.4.RELEASE.jar:4.2.4.RELEASE]
... 59 common frames omitted
It seems that Spring Boot can't support two CacheManagers. Is this true?
TL;DR: CachingConfigurer is meant to configure the default cache settings.
This has nothing to do with Spring Boot; that interface (and the related exception) comes straight from the Spring Framework.
CachingConfigurer allows you to specify the default CacheManager that your application should be using. As the exception states, you can't have two of them. That does not mean you can't have two cache managers, of course.
What are you trying to do exactly? If you want to define two cache managers and use the cacheManager attribute of the @CacheConfig or @Cacheable annotations, then your (only) CachingConfigurer implementation should define the default one, and you should create the other like any other bean that you'll reference in the annotation.
If you want to switch from one cache to the other, consider implementing a CacheResolver instead and wrapping your CacheManager instances in it. Based on a custom annotation and/or a cache name, you'll be able to return the cache(s) to use with some custom code of yours.
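A minimal sketch of the first option, using simple ConcurrentMapCacheManager instances in place of the Guava/Redis managers from the question; the class, bean, and cache names are made up for illustration:
import org.springframework.cache.CacheManager;
import org.springframework.cache.annotation.CachingConfigurerSupport;
import org.springframework.cache.annotation.EnableCaching;
import org.springframework.cache.concurrent.ConcurrentMapCacheManager;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
@EnableCaching
public class CacheConfig extends CachingConfigurerSupport {

    // The single CachingConfigurer: this becomes the default CacheManager used
    // by @Cacheable when no cacheManager attribute is given.
    @Bean
    @Override
    public CacheManager cacheManager() {
        return new ConcurrentMapCacheManager("localCache");
    }

    // The second manager is just another bean; select it explicitly, e.g.
    // @Cacheable(cacheNames = "sharedCache", cacheManager = "secondCacheManager").
    @Bean
    public CacheManager secondCacheManager() {
        return new ConcurrentMapCacheManager("sharedCache");
    }
}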

java.lang.NullPointerException while doing sentiment analysis with the stanford-nlp API

I am new to the stanford-nlp API. I am trying to do sentiment analysis with the Stanford API, but it throws an exception. Please see the logs below.
Adding annotator tokenize
Adding annotator ssplit
Adding annotator pos
Reading POS tagger model from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [1.4 sec].
Adding annotator lemma
Adding annotator ner
Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [5.3 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [2.3 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [4.7 sec].
Adding annotator parse
Loading parser from serialized file edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz ... done [1.1 sec].
Adding annotator dcoref
Adding annotator sentiment
Exception in thread "main" java.lang.NoClassDefFoundError: org/ejml/simple/SimpleBase
at edu.stanford.nlp.pipeline.SentimentAnnotator.<init> (SentimentAnnotator.java:48)
at edu.stanford.nlp.pipeline.StanfordCoreNLP$14.create(StanfordCoreNLP.java:850)
at edu.stanford.nlp.pipeline.AnnotatorPool.get(AnnotatorPool.java:81)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.construct(StanfordCoreNLP.java:262)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:129)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:125)
at io.stanford.NLP.findSentiment(NLP.java:30)
at io.stanford.TestStanford.main(TestStanford.java:8)
Caused by: java.lang.ClassNotFoundException: org.ejml.simple.SimpleBase
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
... 8 more
With the Stanford CoreNLP 3.5.2 distribution there should be a .jar file called ejml-0.23.jar which contains the missing class; make sure to put this jar in your classpath. In fact, you probably want all of the jars that come with Stanford CoreNLP 3.5.2 in your classpath!
What is the code that produces this output? My strong suspicion is that you have not included the "sentiment" annotator in your annotators list, either in the properties file you are using to run the code or in the Properties object you have passed into the annotation pipeline. Without running the sentiment annotator, the document will not have the sentiment annotations attached, and it will therefore throw a null pointer exception when you try to retrieve them.
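For reference, a minimal sketch of a pipeline whose annotators list does include "sentiment" (property names as in the CoreNLP documentation; the input text and class name are made up):
import java.util.Properties;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;

public class SentimentCheck {
    public static void main(String[] args) {
        Properties props = new Properties();
        // "sentiment" must be listed; it needs the parse annotator before it.
        props.setProperty("annotators",
                "tokenize, ssplit, pos, lemma, ner, parse, dcoref, sentiment");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        // Annotate a document so the sentiment annotations are attached.
        Annotation document = new Annotation("CoreNLP is a nice library.");
        pipeline.annotate(document);
    }
}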

Why does spark throw NotSerializableException org.apache.hadoop.io.NullWritable with sequence files

Why does Spark throw NotSerializableException org.apache.hadoop.io.NullWritable with sequence files? My code (very simple):
import org.apache.hadoop.io.{BytesWritable, NullWritable}
sc.sequenceFile[NullWritable, BytesWritable](in).repartition(1000).saveAsSequenceFile(out, None)
The exception:
org.apache.spark.SparkException: Job aborted: Task 1.0:66 had a not serializable result: java.io.NotSerializableException: org.apache.hadoop.io.NullWritable
at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1028)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1026)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1026)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:619)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:207)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
So it is possible to read non-serializable types into an RDD - i.e. to have an RDD of something that is not serializable (which seems counter-intuitive). But once you wish to perform an operation on that RDD that requires the objects to be serializable, like repartition, they need to be serializable. Moreover, it turns out that those weird SomethingWritable classes, although invented for the sole purpose of serializing things, are not actually serializable :(. So you must map these things to byte arrays and back again:
sc.sequenceFile[NullWritable, BytesWritable](in)
.map(_._2.copyBytes()).repartition(1000)
.map(a => (NullWritable.get(), new BytesWritable(a)))
.saveAsSequenceFile(out, None)
Also see: https://stackoverflow.com/a/22594142/1586965
In Spark, if you try to use a third-party class which is not serializable, it throws a NotSerializableException. This is because of Spark's closure behaviour: whatever instance variables (defined outside the transformation operation) you try to access inside a transformation, Spark tries to serialize, along with all the dependent classes of that object.

Android Application class method onCreate being called multiple times

I've extended the Application class in my Android app and I'm using the ACRA report system.
My app looks like this (real source code here):
public class MyApplication extends Application
{
    @Override
    public void onCreate() {
        ACRA.init( this );
        /*
         * Initialize my singletons etc
         * ...
         * ...
         */
        super.onCreate();
    }
}
As far as I know, the Application object should be created only once, so the onCreate method should be called only once.
The problem is that in my crash reports (from ACRA) I have this:
java.lang.RuntimeException: Unable to create service it.evilsocket.myapp.net.N ...
java.lang.RuntimeException: Unable to create service it.evilsocket.myapp.net.NetworkMonitorService: java.lang.RuntimeException: Unable to create application it.evilsocket.myapp.MyApplication: java.lang.IllegalStateException: ACRA#init called more than once
at android.app.ActivityThread.handleCreateService(ActivityThread.java:2283)
at android.app.ActivityThread.access$1600(ActivityThread.java:127)
at android.app.ActivityThread$H.handleMessage(ActivityThread.java:1212)
at android.os.Handler.dispatchMessage(Handler.java:99)
at android.os.Looper.loop(Looper.java:137)
at android.app.ActivityThread.main(ActivityThread.java:4441)
at java.lang.reflect.Method.invokeNative(Native Method)
at java.lang.reflect.Method.invoke(Method.java:511)
at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:784)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:551)
at dalvik.system.NativeStart.main(Native Method)
Caused by: java.lang.RuntimeException: Unable to create application it.evilsocket.myapp.MyApplication: java.lang.IllegalStateException: ACRA#init called more than once
at android.app.LoadedApk.makeApplication(LoadedApk.java:495)
at android.app.ActivityThread.handleCreateService(ActivityThread.java:2269)
... 10 more
Caused by: java.lang.IllegalStateException: ACRA#init called more than once
at org.acra.ACRA.init(ACRA.java:118)
at it.evilsocket.myapp.MyApplication.onCreate(MyApplication.java:46)
at android.app.Instrumentation.callApplicationOnCreate(Instrumentation.java:969)
at android.app.LoadedApk.makeApplication(LoadedApk.java:492)
... 11 more
java.lang.RuntimeException: Unable to create application it.evilsocket.myapp.MyApplication: java.lang.IllegalStateException: ACRA#init called more than once
at android.app.LoadedApk.makeApplication(LoadedApk.java:495)
at android.app.ActivityThread.handleCreateService(ActivityThread.java:2269)
at android.app.ActivityThread.access$1600(ActivityThread.java:127)
at android.app.ActivityThread$H.handleMessage(ActivityThread.java:1212)
at android.os.Handler.dispatchMessage(Handler.java:99)
at android.os.Looper.loop(Looper.java:137)
at android.app.ActivityThread.main(ActivityThread.java:4441)
at java.lang.reflect.Method.invokeNative(Native Method)
at java.lang.reflect.Method.invoke(Method.java:511)
at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:784)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:551)
at dalvik.system.NativeStart.main(Native Method)
Caused by: java.lang.IllegalStateException: ACRA#init called more than once
at org.acra.ACRA.init(ACRA.java:118)
at it.evilsocket.myapp.MyApplication.onCreate(MyApplication.java:46)
at android.app.Instrumentation.callApplicationOnCreate(Instrumentation.java:969)
at android.app.LoadedApk.makeApplication(LoadedApk.java:492)
... 11 more
java.lang.IllegalStateException: ACRA#init called more than once
at org.acra.ACRA.init(ACRA.java:118)
at it.evilsocket.myapp.MyApplication.onCreate(MyApplication.java:46)
at android.app.Instrumentation.callApplicationOnCreate(Instrumentation.java:969)
at android.app.LoadedApk.makeApplication(LoadedApk.java:492)
at android.app.ActivityThread.handleCreateService(ActivityThread.java:2269)
at android.app.ActivityThread.access$1600(ActivityThread.java:127)
at android.app.ActivityThread$H.handleMessage(ActivityThread.java:1212)
at android.os.Handler.dispatchMessage(Handler.java:99)
at android.os.Looper.loop(Looper.java:137)
at android.app.ActivityThread.main(ActivityThread.java:4441)
at java.lang.reflect.Method.invokeNative(Native Method)
at java.lang.reflect.Method.invoke(Method.java:511)
at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:784)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:551)
at dalvik.system.NativeStart.main(Native Method)
So it seems like the app's onCreate is being called multiple times - any idea why?
NOTES:
In my Android XML manifest, I did NOT use the android:process="string" attribute.
Yes, I'm sure that in my initialization routines I'm not accidentally calling MyApplication.onCreate.
I think that you have an additional process in your application. That is why Application.onCreate is called more than once. Look into your manifest file and try to find an activity or service with something like android:process=. Such an activity/service is started in a second Dalvik VM, and that's why another application instance is created.
If you look at the stack trace, it looks like ACRA.init is calling makeApplication. I suspect there is some sort of code that checks whether the application has been created already and, if not, creates it, and that this is triggered by your calling ACRA.init before super.onCreate. Generally, when overriding onCreate methods (whether in an Application or an Activity), it's recommended to call super.onCreate as the first line of your implementation and do your custom stuff afterwards. I'd give that a shot and see if it fixes things.
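A minimal sketch of that ordering, applied to the class from the question:
public class MyApplication extends Application {
    @Override
    public void onCreate() {
        super.onCreate();   // call through to Application.onCreate() first
        ACRA.init(this);    // then initialize ACRA and the other singletons
    }
}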
I'm also seeing this with ACRA 4.4.0 in the wild.
Perhaps something as simple as this under the init method?
if (mApplication != null) {
    // ACRA currently throws here; it could instead return or otherwise exit gracefully
    throw new IllegalStateException("ACRA#init called more than once");
} else {
    mApplication = app;
    // ...and then continue with the rest of the ACRA init
}
Edit: 12/27/12 - As a follow-up to this, it looks like Kevin has adopted these changes. Details are here: https://github.com/ACRA/acra/commit/cda06f5b803a09e9e7cc7dafae2c65c8fa69b861
I have been looking at the ACRA source code recently and think this issue has been addressed in ACRA in two ways:
1. Instead of onCreate(), ACRA's documentation now recommends initializing ACRA in attachBaseContext, which is called before onCreate() (a minimal sketch follows below).
2. ACRA has some logic to check whether other ACRA instances exist during initialization. If so, ACRA unregisters the existing reporter and bypasses the current one via some proxy treatment. Check the init function in ACRA.kt.
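A minimal sketch of the attachBaseContext approach (annotation-based configuration and other version-specific details omitted; check the ACRA documentation for your version):
import android.app.Application;
import android.content.Context;
import org.acra.ACRA;

public class MyApplication extends Application {
    @Override
    protected void attachBaseContext(Context base) {
        super.attachBaseContext(base);
        // Initialize ACRA here, before onCreate() runs.
        ACRA.init(this);
    }
}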
