HBase client does not work under JBoss AS 7.1

I have a JBoss application that needs to talk remotely to an HBase server. In a simple console project the HBase client works perfectly, but when it is deployed to the JBoss server it looks like the server is not loading the class org.apache.hadoop.hdfs.web.resources.UserProvider.
Can anyone help with a workaround or a fix?
Your replies are much appreciated.
Error message
ERROR [org.apache.catalina.core.ContainerBase.[jboss.web].[default-host].[/HFPlatformWeb]] (http--0.0.0.0-8080-6) StandardWrapper.Throwable: java.lang.RuntimeException: java.lang.ClassNotFoundException: org.apache.hadoop.hdfs.web.resources.UserProvider from ...
List of jars
commons-configuration-1.6.jar
commons-lang-2.5.jar
commons-logging-1.1.1.jar
guava-11.0.2.jar
hadoop-auth-2.0.0-cdh4.4.0.jar
hadoop-common-2.0.0-cdh4.4.0.jar
hadoop-core-2.0.0-mr1-cdh4.4.0.jar
hadoop-hdfs-2.0.0-cdh4.4.0.jar
hbase.jar
log4j-1.2.17.jar
protobuf-java-2.4.0a.jar
slf4j-api-1.6.1.jar
slf4j-log4j12-1.6.1.jar
zookeeper-3.4.5-cdh4.4.0.jar

At least one clue should be in the exception trace. It is strange that you need hdfs.web.resources at all. Compare your exception stack on one side with the Cloudera JARs on the other to see where this class 'lives'.
Do you really have hadoop-hdfs loaded? As far as I remember, it is not a 'fixed' dependency but rather an implementation of the mechanics that handle the HDFS scheme.
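To see which JAR (if any) the class actually comes from, a quick diagnostic along the following lines can help; this is just a sketch, and you can either run it standalone against the same set of JARs or drop the same lookup into a servlet so it uses the web application's class loader. In CDH4 the class presumably lives in the hadoop-hdfs JAR.
```java
// Diagnostic sketch: report where the missing class is loaded from, if at all.
public class ClassOriginCheck {
    public static void main(String[] args) {
        String name = "org.apache.hadoop.hdfs.web.resources.UserProvider";
        try {
            Class<?> clazz = Class.forName(name);
            System.out.println(name + " loaded from "
                    + clazz.getProtectionDomain().getCodeSource().getLocation());
        } catch (ClassNotFoundException e) {
            System.out.println(name + " is not visible to this class loader: " + e);
        }
    }
}
```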
I'd also recommend upgrading the Cloudera cluster to a Cloudera 5 environment. It is a rather big step, starting from HBase 0.96.x and Hadoop 2.3.x, which is a serious advantage. For me, another difference was the YARN infrastructure as the default MapReduce handler. This may not fix your issue, but if you don't do it now, you will face this upgrade complexity soon. It starts with HBase being split into sub-components rather than the single hbase.jar of CDH4; the dependencies look really different.
WARNING: The last point is just a recommendation based on my own experience, assuming your cluster is still in an experimental phase.

Related

How do I programmatically install Maven libraries to a cluster using init scripts?

I have been trying for a while now and I'm sure the solution is simple enough; I'm just struggling to find it. I'm pretty new, so be easy on me!
It's a requirement to do this using a premade init script, which is then selected in the UI when configuring the cluster.
I am trying to install com.microsoft.azure:azure-eventhubs-spark_2.12:2.3.18 to a cluster on Azure Databricks. Following the documentation's example (which installs a PostgreSQL driver), they produce an init script using the following command:
```
dbutils.fs.put("/databricks/scripts/postgresql-install.sh","""
#!/bin/bash
wget --quiet -O /mnt/driver-daemon/jars/postgresql-42.2.2.jar https://repo1.maven.org/maven2/org/postgresql/postgresql/42.2.2/postgresql-42.2.2.jar""", True)
```
My question is, what is the /mnt/driver-daemon/jars/postgresql-42.2.2.jar section of this code? And what would I have to do to make this work for my situation?
Many thanks in advance.
/mnt/driver-daemon/jars/postgresql-42.2.2.jar here is the output path where the jar file will be put. But it makes little sense, because that jar won't end up on the CLASSPATH and won't be found by Spark. Jars need to be put into the /databricks/jars/ directory, where Spark picks them up automatically.
This method of downloading jars also works only for jars without dependencies, and for libraries like the EventHubs connector that is not the case: they won't work unless their dependencies are downloaded as well. Instead, it's better to use the Cluster UI or the Libraries API (or the Jobs API for jobs); with those methods all transitive dependencies are fetched as well.
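For reference, a rough sketch of installing the Maven coordinate through the Libraries API is shown below; this is not from the original answer, and the workspace URL, personal access token, and cluster ID are placeholders you would have to substitute. Databricks then resolves the coordinate and its transitive dependencies for the cluster.
```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class InstallEventHubsLibrary {
    public static void main(String[] args) throws Exception {
        String workspaceUrl = "https://<your-workspace>.azuredatabricks.net"; // placeholder
        String token = System.getenv("DATABRICKS_TOKEN");                     // placeholder PAT
        String clusterId = "<your-cluster-id>";                               // placeholder

        // JSON body asking Databricks to install the Maven coordinate
        // (and its transitive dependencies) on the given cluster.
        String body = "{\"cluster_id\": \"" + clusterId + "\", \"libraries\": ["
                + "{\"maven\": {\"coordinates\": "
                + "\"com.microsoft.azure:azure-eventhubs-spark_2.12:2.3.18\"}}]}";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(workspaceUrl + "/api/2.0/libraries/install"))
                .header("Authorization", "Bearer " + token)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```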
P.S. Really, though, instead of using the EventHubs connector it's better to use the Kafka protocol, which EventHubs also supports. There are several reasons for that:
It's better from a performance standpoint.
It's better from a stability standpoint.
The Kafka connector is included in DBR, so you don't need to install anything extra.
You can read how to use Spark with EventHubs via the Kafka connector in the EventHubs documentation.
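As a rough illustration of that approach (not part of the original answer), reading from an Event Hub over its Kafka-compatible endpoint with the built-in Kafka source could look roughly like this; the namespace, event hub name, and connection string are placeholders:
```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class EventHubsViaKafka {
    public static void main(String[] args) throws Exception {
        // On Databricks this returns the already-running session.
        SparkSession spark = SparkSession.builder().getOrCreate();

        String namespace = "<your-namespace>";                 // placeholder Event Hubs namespace
        String eventHub = "<your-event-hub>";                  // placeholder, used as the Kafka topic
        String connectionString = "<your-connection-string>";  // placeholder

        // Event Hubs exposes its Kafka endpoint over SASL_SSL with the PLAIN mechanism.
        String jaas = "org.apache.kafka.common.security.plain.PlainLoginModule required "
                + "username=\"$ConnectionString\" password=\"" + connectionString + "\";";

        Dataset<Row> events = spark.readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", namespace + ".servicebus.windows.net:9093")
                .option("kafka.security.protocol", "SASL_SSL")
                .option("kafka.sasl.mechanism", "PLAIN")
                .option("kafka.sasl.jaas.config", jaas)
                .option("subscribe", eventHub)
                .load();

        // Print incoming records to the console as a simple smoke test.
        events.writeStream().format("console").start().awaitTermination();
    }
}
```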

Creating thin jar for submitting spark applications

Any insights on how we can use a thin jar to submit Spark applications?
The scenario is that if some specific dependency is not present on the project's classpath, or is specific to a particular distribution (Cloudera or Hortonworks), an exception is thrown when the appropriate versions of the jars are not used.
How can we avoid such scenarios?
The only thin jar you can make is one that doesn't compile the Spark core libraries into the JAR. For example, Spark SQL and Spark Streaming don't need to be included, but unless Spark was compiled with Hive support during installation, you'll still need that one.
You'll need to contact your Hadoop cluster administrators to know what version of Spark is available, how it was built, and what libraries are available in $SPARK_HOME out of the box.
In my experience, I've never run into a dependency that is specific to HDP or CDH; I've run a Spark 2.3 job submitted to YARN fine even though neither vendor officially supports that version. The only thing you need is to match the Spark version with your code, not necessarily the Hadoop/YARN/Hive versions. Kafka, Cassandra, and other connectors are all extra anyway, and they can't be in a thin jar.

usergrid 2.0 database setup error

I am trying to get Usergrid 2.0 running.
I built the sources and deployed to Tomcat. The status shows Usergrid is running.
When I try to set up the database (http://localhost:8080/system/database/setup), it results in an error: "Error migrating Core Persistence".
Error:
{"error":"runtime","timestamp":1234567890,"duration":0,"error_description":"Error migrating Core Persistence","exception":"java.lang.RuntimeException"}
How do I resolve this?
You must be running Cassandra 1.2.1* and the current version of Elasticsearch. Also, you cannot upgrade from a 1.0 cluster.
There could be a variety of things wrong. Maybe your Cassandra and Elasticsearch instances are not available, maybe you have specified the wrong hostnames/ports for them in your usergrid-deployment.properties file, maybe your properties file is not on the Tomcat classpath (or maybe you are hitting a bug in Usergrid).
Since you see a RuntimeException, there is probably a stack trace in Tomcat's catalina.out log file that could provide information to help you diagnose the problem.

How to run titan server in embedded tomcat?

I've been looking around for a way to run a Titan server in Tomcat, but I can't find any information about this.
Does anyone know how this can be done?
Since you are asking about running "Titan Server" in Tomcat, that really just means running Rexster inside of Tomcat. We dropped official support for Tomcat many, many versions ago, but I believe there are still those who have it deployed that way, which means it is in fact possible. I guess this would also only apply to hosting the Jersey-based REST endpoints and not RexPro.
To get started I would simplify the stack and just get Rexster running in Tomcat. I would search around the gremlin-users mailing list for what people have posted on the topic, but I think that this one is the most relevant:
https://groups.google.com/forum/#!msg/gremlin-users/s0g9Sd_xjSw/LQ3_ugL680cJ
If I remember correctly, the key to making things work lies in this Rexster class: RexsterApplicationProvider. Note the class comments with the sample web.xml fragment.
I suspect you just want to fire up an instance of Titan with Cassandra etc. when Tomcat starts?
If this is the case, you can register a listener (e.g. a ServletContextListener) in your web.xml that starts a singleton or service that opens the Titan graph connection, and then you can use it in your other servlets or whatever code base you have running.
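A minimal sketch of that listener approach is shown below; it assumes Titan's TitanFactory API and a properties file describing the Cassandra connection, and the file path and attribute name are placeholders. It is meant only as a starting point, not as the officially supported way.
```java
import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;
import javax.servlet.annotation.WebListener;

import com.thinkaurelius.titan.core.TitanFactory;
import com.thinkaurelius.titan.core.TitanGraph;

// Opens a single TitanGraph when the webapp starts and shuts it down on undeploy.
// Register it in web.xml, or rely on @WebListener on a Servlet 3.0 container.
@WebListener
public class TitanGraphListener implements ServletContextListener {

    private TitanGraph graph;

    @Override
    public void contextInitialized(ServletContextEvent event) {
        // Placeholder properties file holding storage.backend, storage.hostname, etc.
        String config = event.getServletContext()
                .getRealPath("/WEB-INF/titan-cassandra.properties");
        graph = TitanFactory.open(config);
        // Make the graph available to servlets via the ServletContext.
        event.getServletContext().setAttribute("titanGraph", graph);
    }

    @Override
    public void contextDestroyed(ServletContextEvent event) {
        if (graph != null) {
            graph.shutdown();
        }
    }
}
```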

Bare minimum of dependencies to work with HDFS

I need to put some files into HDFS from my client application. I am not planning to schedule a job on Hadoop; I just need to drop something into HDFS.
A Maven dependency on hadoop-core brings in a lot of stuff like jersey-core etc., which I don't need at all.
Is there any simple client library to work with HDFS without getting a full stack of hadoop dependencies? What is the minimal set of maven dependencies I can use?
Is WebHDFS the only option?
They introduced hadoop-client, which is a much better client library than hadoop-core.
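As a rough sketch (assuming the org.apache.hadoop:hadoop-client artifact on the classpath and a reachable NameNode; the URI and paths below are placeholders), copying a local file into HDFS through the FileSystem API looks roughly like this:
```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsPut {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder NameNode URI; normally this comes from core-site.xml on the classpath.
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);

        // Copy a local file into HDFS (placeholder paths).
        fs.copyFromLocalFile(new Path("/tmp/local-file.txt"),
                             new Path("/user/me/local-file.txt"));
        fs.close();
    }
}
```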
