Java SAX parser setup in Maven

I'm having trouble getting to first base with SAX XML parsing. I'm using Maven for my Java coding. I'm trying to parse a mitochondrial haplotree.
I have this dependency in my pom.xml:
<!-- https://mvnrepository.com/artifact/sax/sax -->
<dependency>
    <groupId>sax</groupId>
    <artifactId>sax</artifactId>
    <version>2.0.1</version>
</dependency>
However, in my code I get errors with
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
I've tried several other pom and import options found on the internet, but cannot get the code to recognize the imports.
What am I missing?
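For reference, this is the kind of minimal handler I'm trying to get working (the class name, handler logic, and file name are just illustrative):
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import java.io.File;

// Minimal handler that prints each element name as the parser reaches it.
public class HaplotreeHandler extends DefaultHandler {

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes attributes) throws SAXException {
        System.out.println("Start element: " + qName);
    }

    public static void main(String[] args) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        // "haplotree.xml" is a placeholder file name.
        parser.parse(new File("haplotree.xml"), new HaplotreeHandler());
    }
}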

Related

How to import a Maven dependency using SBT

I am trying to get embedded-cassandra into my Scala/Play project, which uses sbt instead of Maven. (https://github.com/nosan/embedded-cassandra/wiki)
I translated the following Maven dependency into sbt.
<!-- Core API -->
<dependency>
    <groupId>com.github.nosan</groupId>
    <artifactId>embedded-cassandra</artifactId>
    <version>2.0.1</version>
</dependency>
<!-- Test Extensions (Spring, JUnit, etc.) -->
<dependency>
    <groupId>com.github.nosan</groupId>
    <artifactId>embedded-cassandra-test</artifactId>
    <version>2.0.1</version>
    <scope>test</scope>
</dependency>
SBT conversion:
"com.github.nosan" % "embedded-cassandra" % "2.0.1" % "test"
But I am getting a compilation error when I try to import embedded-cassandra in my unit test.
import com.github.nosan.embedded.cassandra.Cassandra
error
Error:(7, 12) object github is not a member of package com
import com.github.nosan.embedded.cassandra.Cassandra
What am I doing wrong?
It turns out the issue was that SBT hadn't downloaded the dependency; re-importing the project made it work. I made one other change: I removed the % "test" from the sbt entry, though to be honest I don't know whether that had any implications.
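For reference, the entry without the test scope would just be (assuming the usual libraryDependencies form):
libraryDependencies += "com.github.nosan" % "embedded-cassandra" % "2.0.1"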

Import of amqp from org.springframework gets an error

I'm working on an existing Scala project which uses the Spring framework, and I need to import org.springframework.amqp, but when I try to build the project I get:
Error:(15, 28) object amqp is not a member of package org.springframework
import org.springframework.amqp
It is really strange, since I can see it on the official website and in a lot of examples on the web.
Any idea what the problem is?
A Maven dependency was missing. This is what I needed to add:
<dependency>
    <groupId>org.springframework.amqp</groupId>
    <artifactId>spring-amqp</artifactId>
    <version>2.1.2.RELEASE</version>
</dependency>
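Once that dependency resolves, a minimal sketch of using one of the org.springframework.amqp classes might look like this (the configuration class and queue name are purely illustrative):
import org.springframework.amqp.core.Queue;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

// Declares a queue bean once spring-amqp is on the classpath.
@Configuration
public class AmqpConfig {

    @Bean
    public Queue exampleQueue() {
        // "example.queue" is a placeholder queue name.
        return new Queue("example.queue");
    }
}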

IntelliJ can't import dependency imported by spring-boot-starter

I want to use org.yaml.snakeyaml.Yaml in my Spring Boot application. Since spring-boot-starter already includes SnakeYAML, I shouldn't need to add it again in my pom.xml file.
When I type import org.yaml.snakeyaml.Yaml in the source code, Eclipse can import it without problems. However, IntelliJ can't import the package.
But when I edit the pom.xml and add the following:
<dependency>
    <groupId>org.yaml</groupId>
    <artifactId>snakeyaml</artifactId>
    <version>1.10</version>
</dependency>
It works fine!
I wonder, is this a bug in IntelliJ?
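For what it's worth, a minimal sketch of using SnakeYAML once it resolves (the inline YAML document and class name are illustrative):
import org.yaml.snakeyaml.Yaml;

import java.util.Map;

public class YamlExample {

    public static void main(String[] args) {
        Yaml yaml = new Yaml();
        // Parse a small inline YAML document into a Map.
        Map<String, Object> data = yaml.load("name: demo\nversion: 1");
        System.out.println(data.get("name"));
    }
}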

Why does spark-submit fail to find kafka data source unless --packages is used?

I am trying to integrate Kafka into my Spark app. Here are the relevant entries from my POM file:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
    <version>${spark.stream.kafka.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka_2.11</artifactId>
    <version>${kafka.version}</version>
</dependency>
Corresponding artifact versions are:
<kafka.version>0.10.2.0</kafka.version>
<spark.stream.kafka.version>2.2.0</spark.stream.kafka.version>
I have been scratching my head over:
Exception in thread "main" java.lang.ClassNotFoundException: Failed to find data source: kafka. Please find packages at http://spark.apache.org/third-party-projects.html
I also tried supplying the jar with the --jars parameter, but it does not help. What am I missing here?
Code:
private static void startKafkaConsumerStream() {
    Dataset<HttpPackage> ds1 = _spark
            .readStream()
            .format("kafka")
            .option("kafka.bootstrap.servers", getProperty("kafka.bootstrap.servers"))
            .option("subscribe", HTTP_FED_VO_TOPIC)
            .load() // Getting the error here
            .as(Encoders.bean(HttpPackage.class));

    ds1.foreach((ForeachFunction<HttpPackage>) req -> System.out.print(req));
}
And _spark is defined as:
_spark = SparkSession
        .builder()
        .appName(_properties.getProperty("app.name"))
        .config("spark.master", _properties.getProperty("master"))
        .config("spark.es.nodes", _properties.getProperty("es.hosts"))
        .config("spark.es.port", _properties.getProperty("es.port"))
        .config("spark.es.index.auto.create", "true")
        .config("es.net.http.auth.user", _properties.getProperty("es.net.http.auth.user"))
        .config("es.net.http.auth.pass", _properties.getProperty("es.net.http.auth.pass"))
        .getOrCreate();
My imports are:
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;
import org.apache.spark.api.java.function.ForeachFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SparkSession;
However, when I run my code as mentioned here, with the package option:
--packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.1.0
it works.
Spark Structured Streaming supports Apache Kafka as the streaming source and sink using the external kafka-0-10-sql module.
The kafka-0-10-sql module is not available by default to Spark applications that are submitted for execution using spark-submit. The module is external, and to have it available you should define it as a dependency.
Unless you use kafka-0-10-sql module-specific code in your Spark application, you don't have to define the module as a dependency in pom.xml. You simply don't need a compilation dependency on the module since no code uses the module's code. You code against interfaces, which is one of the reasons why Spark SQL is so pleasant to use (i.e. it requires very little code to build a fairly sophisticated distributed application).
spark-submit, however, will require the --packages command-line option, which you've reported worked fine.
However, when I run my code as mentioned here, with the package option:
--packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.1.0
The reason it worked fine with --packages is that you have to tell the Spark infrastructure where to find the definition of the kafka format.
That leads us to the other "issue" (or rather a requirement) for running streaming Spark applications with Kafka: you have to specify the runtime dependency on the spark-sql-kafka module.
You specify a runtime dependency using the --packages command-line option (which downloads the necessary jars when you spark-submit your Spark application) or by creating a so-called uber-jar (or fat jar).
That's where pom.xml comes into play (and that's why people offered their help with pom.xml and the module as a dependency).
So, first of all, you have to specify the dependency in pom.xml.
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql-kafka-0-10_2.11</artifactId>
    <version>2.2.0</version>
</dependency>
And last but not least, you have to build an uber-jar, which you configure in pom.xml using the Apache Maven Shade Plugin.
With the Apache Maven Shade Plugin you create an uber-jar that includes all the "infrastructure" for the kafka format to work, inside the Spark application jar file. As a matter of fact, the uber-jar will contain all the necessary runtime dependencies, so you could spark-submit with the jar alone (and no --packages option or similar).
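A minimal sketch of such a Shade plugin configuration might look like this (the plugin version is illustrative; the ServicesResourceTransformer merges META-INF/services entries, which the kafka data source registration relies on):
<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-shade-plugin</artifactId>
            <version>3.2.4</version>
            <executions>
                <execution>
                    <phase>package</phase>
                    <goals>
                        <goal>shade</goal>
                    </goals>
                    <configuration>
                        <transformers>
                            <!-- Merge service provider files so Spark can discover the kafka source -->
                            <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
                        </transformers>
                    </configuration>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>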
Add the dependency below to your pom.xml file.
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql-kafka-0-10_2.11</artifactId>
    <version>2.2.0</version>
</dependency>
Update your dependencies and versions. The dependencies given below should work fine:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.1.1</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.11</artifactId>
    <version>2.1.1</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
    <version>2.1.1</version>
</dependency>
PS: Note the provided scope in the first two dependencies.

Hadoop imports cannot be resolved in Eclipse

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
I am learning Hadoop programming from scratch.
I have written the above lines of code in Eclipse,
and it shows the error "The import org.apache.hadoop.io cannot be resolved."
I have already added the external jar file hadoop-mapreduce-client-core-2.7.3.jar.
Is there anything else to add?
All the dependent jars need to be added; a single jar file won't help.
Try using Maven.
The dependency is available at the link below:
https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-core/1.2.1
1) Convert the project to a Maven project if it isn't one yet: Project > Configure > Convert to Maven Project.
2) Add this dependency:
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-core</artifactId>
    <version>1.2.1</version>
</dependency>
This should resolve your errors. It worked for me!
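Once the dependency resolves, a minimal sketch of a mapper using those imports might look like this (the class name and tokenizing logic are just illustrative):
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

// Tokenizes each input line and emits (word, 1) pairs.
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (String token : value.toString().split("\\s+")) {
            if (!token.isEmpty()) {
                word.set(token);
                context.write(word, ONE);
            }
        }
    }
}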
