Setting up GeoServer on GeoMesa HBase on AWS S3

I am running GeoMesa HBase on AWS S3. I am able to ingest/export data from inside the cluster with geomesa-hbase ingest / export, but I am trying to access the data remotely. I have installed GeoServer (on the same master node where GeoMesa is running, if that is relevant), but I am having difficulty providing GeoServer with the correct JARs to access GeoMesa. I can find the list of JARs that I should provide to GeoServer here, but I am not sure how or where to collect them. I have tried using the install-hadoop.sh and install-hbase.sh shell scripts in the /opt/geomesa/bin folder to install the HBase, Hadoop and ZooKeeper JARs into GeoServer's WEB-INF/lib folder, but if I change the Hadoop, ZooKeeper and HBase versions in these scripts to match the versions running on my cluster, they do not find any JARs.
I am running everything on an EMR 6.2.0 release (which comes with Hadoop 3.2.1, HBase 2.2.6 and ZooKeeper 3.4.14). On top of the cluster I am running GeoMesa 3.0.0-m0 with GeoServer 2.17, but I have also tried GeoMesa 2.4.0 with GeoServer 2.15. I'm fine with switching either the EMR release version or the GeoMesa/GeoServer versions if that makes things easier.

For posterity, the setup that worked was (a rough shell sketch of these steps follows the list):
GeoMesa 3.1.1
GeoServer 2.17.3
Extract the geomesa-hbase-gs-plugin into GeoServer's WEB-INF/lib directory
Run install-dependencies.sh (without modification) from the GeoMesa binary distribution to copy jars into GeoServer's WEB-INF/lib directory
Copy the hbase-site.xml into GeoServer's WEB-INF/classes directory
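Roughly, those three steps look like the sketch below. The paths, the Scala suffix and the plugin archive name are assumptions based on a typical GeoMesa binary layout and an EMR-style HBase config location, so adjust them to your install.

GEOSERVER_HOME=/opt/geoserver/webapps/geoserver   # wherever the GeoServer webapp lives

# 1) Extract the HBase GeoServer plugin shipped in the GeoMesa binary distribution
tar -xzvf /opt/geomesa/dist/gs-plugins/geomesa-hbase-gs-plugin_2.11-3.1.1-install.tar.gz \
    -C "$GEOSERVER_HOME/WEB-INF/lib"

# 2) Copy/download the Hadoop, HBase and ZooKeeper dependency jars
/opt/geomesa/bin/install-dependencies.sh "$GEOSERVER_HOME/WEB-INF/lib"

# 3) Make the cluster's HBase configuration visible to GeoServer
cp /etc/hbase/conf/hbase-site.xml "$GEOSERVER_HOME/WEB-INF/classes/"

# Restart GeoServer afterwards so the new jars are picked up.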

Related

How to install a jar in Databricks using ADF

We are able to install the jar file using the UI method on a particular cluster, but our requirement is to install it on all the on-demand clusters in the workspace.
We are using the shell command below to download the jar file to DBFS. I am not sure how we can reference/install this jar on all clusters using a global init script:
curl https://repo1.maven.org/maven2/com/databricks/spark-xml_2.12/0.12.0/spark-xml_2.12-0.12.0.jar >/dbfs/FileStore/jars/maven/com/databricks/spark_xml_2_12_0_12_0.jar
Any help would be really appreciated!!
There is an alternate solution for adding a jar library to the job cluster that is called from Azure Data Factory while running your job.
In ADF, while calling the notebook, you have the option to include a jar directory in DBFS, or you can give the Maven coordinates.
[ADF settings screenshot]
Alternatively, in a global init script you can simply download (or copy) this file into the /databricks/jars/ directory; it will then be picked up by the cluster.
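For example, a global init script along these lines should work; the DBFS path matches the one used in the question, while the exact flags and the commented-out direct-download variant are just illustrative:

#!/bin/bash
# Global init script sketch: copy the jar (already downloaded to DBFS by the
# curl command above) into the local classpath directory of every node at startup.
cp /dbfs/FileStore/jars/maven/com/databricks/spark_xml_2_12_0_12_0.jar /databricks/jars/
# Or download it directly instead of staging it in DBFS first:
# curl -sSf https://repo1.maven.org/maven2/com/databricks/spark-xml_2.12/0.12.0/spark-xml_2.12-0.12.0.jar \
#      -o /databricks/jars/spark-xml_2.12-0.12.0.jar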

Apache Ignite: What are the dependencies of IgniteHadoopIgfsSecondaryFileSystem?

I am trying to set up IGFS with Hadoop as the secondary storage. I have set my configuration as shown here, but I keep getting NoClassDefFoundErrors. I have downloaded both binary distributions of Ignite and have also tried building from source, but the dependencies are not included. hadoop-common-2.6.0.jar and ignite-hadoop-1.4.0.jar provided some of the dependencies, but now I am getting a NoClassDefFoundError for org/apache/hadoop/mapred/JobConf, which by my understanding is a deprecated class...
I have been following the instructions on the Apache Ignite website, but this is as far as I've gotten.
What dependencies do I need for IgniteHadoopIgfsSecondaryFileSystem as the secondary storage?
It looks like the problem is that the Ignite node does not have the Hadoop libraries on its classpath. To fix that, please try the following:
1) Use the "Hadoop Accelerator" edition of the Ignite distribution (use -Dignite.edition=hadoop if you're building the distribution yourself).
2) Set the HADOOP_HOME environment variable for the Ignite process if you're using the Apache Hadoop distribution; if you use another distribution (HDP, Cloudera, BigTop, etc.), make sure the /etc/default/hadoop file exists and has appropriate contents.
Alternatively, you can manually add the necessary Hadoop dependencies to the Ignite node classpath: these are the dependencies with groupId "org.apache.hadoop" listed in modules/hadoop/pom.xml. Currently they are:
hadoop-annotations
hadoop-auth
hadoop-common
hadoop-hdfs
hadoop-mapreduce-client-common
hadoop-mapreduce-client-core
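As a sketch of that manual approach, assuming the standard Apache Ignite binary layout (where jars under $IGNITE_HOME/libs are added to the node classpath) and an Apache Hadoop 2.x install at $HADOOP_HOME; both paths are placeholders:

export HADOOP_HOME=/usr/local/hadoop-2.6.0
# Copy the Hadoop dependencies listed above into Ignite's libs directory
cp "$HADOOP_HOME"/share/hadoop/common/hadoop-common-*.jar \
   "$HADOOP_HOME"/share/hadoop/common/lib/hadoop-annotations-*.jar \
   "$HADOOP_HOME"/share/hadoop/common/lib/hadoop-auth-*.jar \
   "$HADOOP_HOME"/share/hadoop/hdfs/hadoop-hdfs-*.jar \
   "$HADOOP_HOME"/share/hadoop/mapreduce/hadoop-mapreduce-client-common-*.jar \
   "$HADOOP_HOME"/share/hadoop/mapreduce/hadoop-mapreduce-client-core-*.jar \
   "$IGNITE_HOME"/libs/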
If you don't want to deal with dependency management yourself - which is a genuinely hard thing to do manually - I'd suggest you look at projects that provide orchestration and deployment services for software stacks. Check out Apache Bigtop (bigtop.apache.org), which provides pre-built Linux packages for Apache Ignite, Hadoop, HDFS and pretty much anything else in this space. You can grab the latest nightly packages from our CI at http://ci.bigtop.apache.org/view/Packages/job/Bigtop-trunk-packages

Configuring Hadoop 2.5 in Eclipse

I'm trying to configure MapReduce in Eclipse Indigo with Hadoop version 2.5. I downloaded the Hadoop 2.5 source and added all the libraries to the Eclipse project.
While trying to run the project, it is showing the following error.
The Java path and classpath were set properly. Please help me!
Is configuring Cygwin SSH mandatory to use Eclipse MapReduce?
I am not sure what you are trying to do here. If you are running the application in Eclipse as a regular, traditional Java program, the following may help.
Hadoop MapReduce programs must be run using the hadoop jar command, usually after using SSH (PuTTY) to connect to the cluster and FTP/SFTP (FileZilla) to transfer the .jar file to the cluster.
Usage: hadoop jar <jar> [mainClass] args…
If you want to debug the application, use java.util.logging.Logger.
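For instance, with placeholder host, jar and class names, the flow from a workstation typically looks like this:

# Build the job jar in Eclipse (File -> Export -> JAR file), then:
scp wordcount.jar user@cluster-master:~/           # or pscp/FileZilla on Windows
ssh user@cluster-master                            # or PuTTY on Windows
hadoop jar wordcount.jar com.example.WordCount /input /output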

How does ZooKeeper determine the 'java.library.path' for a Hadoop job?

I am running Hadoop jobs on a distributed cluster using Oozie, and I set 'oozie.libpath' for the Oozie jobs.
Recently, I deleted a few of my older-version jar files from the library path Oozie uses and replaced them with newer versions. However, when I run my Hadoop job, both the older and the newer versions of the jar files get loaded, and the MapReduce job is still using the older version.
I am not sure where ZooKeeper is loading the jar files from. Are there any default locations it loads jar files from? There is only one library path in my HDFS and it does not contain those jar files.
I have found what was going wrong: the wrong jar was shipped with my job file. I should have checked here first.
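A quick way to see which jars are actually being shipped with the job (the HDFS paths and Oozie URL below are placeholders):

hdfs dfs -ls /user/me/workflows/my-wf/lib      # jars bundled next to workflow.xml are shipped with the job
hdfs dfs -ls /user/me/share/lib                # whatever oozie.libpath points at
oozie admin -oozie http://oozie-host:11000/oozie -shareliblist   # system sharelib, if enabled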

Hadoop Eclipse Plugin Errors in Ubuntu

I am trying to build some programs on Hadoop with Ubuntu. I am able to successfully install and run Hadoop on my machine in pseudo-distributed mode, but when I tried to use the Eclipse plugin to create a project, I ran into several issues. After entering the parameters for connecting to the server in the Eclipse plugin, I get the following error:
1. Error: java.io.IOException: Unknown protocol to JobTracker: org.apache.hadoop.hdfs.protocol.ClientProtocol
I am using Hadoop version 0.20 and the Eclipse plugin is also from the configuration directory. Any suggestions on why these errors appear, and what I can do to build a Hadoop project in Eclipse?
Go to "Edit hadoop location".
Switch Map/Reduce Master port with DFS master port.
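The two ports have to match what the pseudo-distributed cluster is actually configured with; a quick way to check (property names are the Hadoop 0.20-era ones, and the conf/ path assumes a default layout):

grep -A1 "fs.default.name" conf/core-site.xml        # DFS Master, e.g. hdfs://localhost:9000
grep -A1 "mapred.job.tracker" conf/mapred-site.xml   # Map/Reduce Master, e.g. localhost:9001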
