How to use Evaluators in Java to score on a PMML using org.apache.spark? - apache-spark-mllib

I've implemented the code for scoring on a provided PMML file and a csv data file (Linear Regression) using Spark and Java. For this I've used jpmml-evaluator-spark and spark-mllib_2.11 maven artifacts, and it works fine.
Now I'd like to replace the jpmml-evaluator-spark library, which is AGPL-licensed, with something similar that is bundled with org.apache.spark (or with any other fully open-source option).
I don't see Evaluators for scoring on a PMML available in org.apache.spark group of dependencies. Please confirm if this is correct and suggest some alternative.
https://github.com/jpmml/jpmml-evaluator-spark
This is the PMML evaluator library for the Apache Spark cluster computing system (http://spark.apache.org/) and is AGPL.
Also refer to: http://spark.apache.org/docs/latest/ml-guide.html
These suggest that what ships with Apache Spark covers the algorithms plus model creation and training, but not scoring against a PMML model; that part is available only through jpmml-evaluator-spark.
import org.apache.spark.ml.Transformer;
import org.apache.spark.sql.Dataset;
import org.jpmml.evaluator.Evaluator;
import org.jpmml.evaluator.EvaluatorBuilder;
import org.jpmml.evaluator.LoadingModelEvaluatorBuilder;
import org.jpmml.evaluator.spark.TransformerBuilder;
import org.jpmml.evaluator.visitors.DefaultVisitorBattery;
...
...
...
EvaluatorBuilder evaluatorBuilder = new LoadingModelEvaluatorBuilder().setLocatable(false)
.setVisitors(new DefaultVisitorBattery()).load(pmmlInputStream);
Evaluator evaluator = evaluatorBuilder.build();
evaluator.verify();
TransformerBuilder pmmlTransformerBuilder = new TransformerBuilder(evaluator).withLabelCol("Predicted_SpeciesCategory").exploded(true);
Transformer pmmlTransformer = pmmlTransformerBuilder.build();
Dataset<?> resultDataset = pmmlTransformer.transform(csvDataset);
...
...
Maven dependencies:
<dependency>
<groupId>org.jpmml</groupId>
<artifactId>jpmml-evaluator-spark</artifactId>
<version>1.2.2</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.11</artifactId>
<version>2.4.3</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-mllib_2.11</artifactId>
<version>2.4.3</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.jpmml</groupId>
<artifactId>jpmml-sparkml</artifactId>
<version>1.5.4</version>
</dependency>
This code still depends on the org.jpmml libraries, which I want to remove. I'm looking for an alternative based on org.apache.spark that achieves similar results.

You could use PMML4S-Spark to evaluate a PMML model in Spark, for example:
import org.pmml4s.spark.ScoreModel
val model = ScoreModel.fromInputStream(pmmlInputStream)
val resultDataset = model.transform(csvDataset)
PMML4S-Spark is just as easy to use from Java, and the code is similar to the Scala version:
import org.pmml4s.spark.ScoreModel;
import org.apache.spark.sql.Dataset;
ScoreModel model = ScoreModel.fromInputStream(pmmlInputStream);
Dataset<?> resultDataset = model.transform(csvDataset);
BTW, PMML4S-Spark is licensed under the Apache License 2.0.
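If the model really is a plain linear regression, as in the question, scoring reduces to the intercept plus the sum of coefficient times input value, so it can also be done with the JDK alone. The following is only a rough sketch under that assumption: LinearPmmlScorer is a hypothetical helper (not part of Spark, JPMML or PMML4S), and it handles only a single RegressionTable with NumericPredictor terms. It ignores transformations, missing-value handling, targets and everything else a full PMML evaluator does.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper: scores a single-table PMML RegressionModel using only the JDK. */
final class LinearPmmlScorer {
    private final double intercept;
    private final Map<String, Double> coefficients = new HashMap<>();

    private LinearPmmlScorer(double intercept) {
        this.intercept = intercept;
    }

    /** Reads the intercept and the NumericPredictor coefficients from the PMML document. */
    static LinearPmmlScorer load(InputStream pmml) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Refuse DOCTYPE declarations so that untrusted files cannot pull in external entities.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            doc = factory.newDocumentBuilder().parse(pmml);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse PMML document", e);
        }
        NodeList tables = doc.getElementsByTagName("RegressionTable");
        if (tables.getLength() != 1) {
            throw new IllegalArgumentException("Expected exactly one RegressionTable, found " + tables.getLength());
        }
        Element table = (Element) tables.item(0);
        if (table.getElementsByTagName("CategoricalPredictor").getLength() > 0
                || table.getElementsByTagName("PredictorTerm").getLength() > 0) {
            throw new IllegalArgumentException("Only NumericPredictor terms are supported by this sketch");
        }
        LinearPmmlScorer scorer = new LinearPmmlScorer(Double.parseDouble(table.getAttribute("intercept")));
        NodeList predictors = table.getElementsByTagName("NumericPredictor");
        for (int i = 0; i < predictors.getLength(); i++) {
            Element predictor = (Element) predictors.item(i);
            String exponent = predictor.getAttribute("exponent"); // empty string when absent
            if (!exponent.isEmpty() && Double.parseDouble(exponent) != 1.0) {
                throw new IllegalArgumentException("Exponents other than 1 are not supported by this sketch");
            }
            scorer.coefficients.put(predictor.getAttribute("name"),
                    Double.parseDouble(predictor.getAttribute("coefficient")));
        }
        return scorer;
    }

    /** Computes intercept + sum(coefficient * value) for one input row. */
    double score(Map<String, Double> row) {
        double y = intercept;
        for (Map.Entry<String, Double> term : coefficients.entrySet()) {
            Double value = row.get(term.getKey());
            if (value == null) {
                throw new IllegalArgumentException("Missing input field: " + term.getKey());
            }
            y += term.getValue() * value;
        }
        return y;
    }

    public static void main(String[] args) {
        // Made-up model for illustration only.
        String pmml = "<PMML xmlns=\"http://www.dmg.org/PMML-4_3\" version=\"4.3\">"
                + "<RegressionModel functionName=\"regression\">"
                + "<RegressionTable intercept=\"1.5\">"
                + "<NumericPredictor name=\"x1\" coefficient=\"2.0\"/>"
                + "</RegressionTable></RegressionModel></PMML>";
        LinearPmmlScorer scorer = load(new ByteArrayInputStream(pmml.getBytes(StandardCharsets.UTF_8)));
        Map<String, Double> row = new HashMap<>();
        row.put("x1", 3.0);
        System.out.println(scorer.score(row));
    }
}
```

To use something like this from Spark you would still have to wrap score in a UDF or a map over the rows yourself. For anything beyond simple regression models, a real evaluator library is the safer choice.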

My answer may not be directly relevant to your question, but I hit a related issue and kept coming back to this question, so I'm posting it to save others the trouble. I eventually found the solution after going through many Stack Overflow posts.
The problem
I was using Spark from Java with this older Maven dependency:
<dependency>
<groupId>org.jpmml</groupId>
<artifactId>pmml-evaluator-metro</artifactId>
<version>1.6.3</version>
</dependency>
I thought this would work, but TransformerBuilder could not be resolved from it.
If your problem is also related to TransformerBuilder, the following dependency should solve it:
<dependency>
<groupId>org.jpmml</groupId>
<artifactId>jpmml-evaluator-spark</artifactId>
<version>1.3.0</version>
</dependency>
That was it. You're welcome in advance 😉

Related

Error while trying to do POStagging: Error while loading a tagger model (probably missing model file)

I am trying to use StanfordNLP for Croatian from the Windows command prompt. I have downloaded the models for this language (hr_set_models), which are .pt files.
I have created the .properties file but I get the following message:
Exception in thread "main" edu.stanford.nlp.io.RuntimeIOException: Error while loading a tagger model (probably missing model file)
There is no problem for the tokenizer model and the file hr_set_tagger.pt is in the folder.
I see that in the model folder there is also a file named hr_set.pretrain.pt, I do not know if I should use it in the .properties file.
Thanks in advance!
Below is the .properties file I have created.
annotators = tokenize, ssplit, pos, lemma, depparse
# tokenize
tokenize.model = hr_set_models/hr_set_tokenizer.pt
# pos
pos.model = hr_set_models/hr_set_tagger.pt
# lemma
lemma.model = hr_set_models/hr_set_lemmatizer.pt
#depparse
depparse.model = hr_set_models/hr_set_parser.pt
You need to use the full Python system. There are no Java models for Croatian, so you shouldn't be using the Stanford CoreNLP server.
There is more documentation here: https://stanfordnlp.github.io/stanfordnlp/pipeline.html
Try using these dependencies:
<dependencies>
<dependency>
<groupId>edu.stanford.nlp</groupId>
<artifactId>stanford-corenlp</artifactId>
<version>3.6.0</version>
</dependency>
<dependency>
<groupId>edu.stanford.nlp</groupId>
<artifactId>stanford-corenlp</artifactId>
<version>3.6.0</version>
<classifier>models</classifier>
</dependency>
</dependencies>

How to add a feature in OpenDaylight Nitrogen?

I am trying to add some features to my OpenDaylight project (e.g. l2switch, dlux, rest, ...).
In the Carbon release I used to edit features.xml and pom.xml to add these features. I am now using the Nitrogen release, and after adding these dependencies to my features pom.xml I still cannot see the features when I log in to Karaf (using feature:install/list).
<dependency>
<groupId>org.opendaylight.netconf</groupId>
<artifactId>features-restconf</artifactId>
<classifier>features</classifier>
<version>${restconf.version}</version>
<type>xml</type>
<scope>runtime</scope>
</dependency>
<dependency>
<groupId>org.opendaylight.dluxapps</groupId>
<artifactId>features-dluxapps</artifactId>
<classifier>features</classifier>
<version>${dluxapps.version}</version>
<type>xml</type>
<scope>runtime</scope>
</dependency>
Am I missing something else? When I try to add repositories, as I previously did in the Carbon release, the feature.xml is automatically regenerated and all my edits are removed.
I am using the Nitrogen release by specifying -DarchetypeVersion=1.4.0 when generating my Maven artifact.
See the upstream configuration management tooling for running-code examples being used constantly in downstreams like OPNFV.
# Configuration of Karaf features to install
file { 'org.apache.karaf.features.cfg':
ensure => file,
path => '/opt/opendaylight/etc/org.apache.karaf.features.cfg',
# Set user:group owners
owner => 'odl',
group => 'odl',
}
$features_csv = join($opendaylight::features, ',')
file_line { 'featuresBoot':
path => '/opt/opendaylight/etc/org.apache.karaf.features.cfg',
line => "featuresBoot=${features_csv}",
match => '^featuresBoot=.*$',
}
puppet-opendaylight, manifests/config.pp, stable/nitrogen
So basically you shouldn't be editing the XML directly, you should edit the configuration that generates the XML. I'm surprised that worked in Carbon.
I recommend directly using upstream configuration management tooling, like puppet-opendaylight or ansible-opendaylight, vs trying to figure out the configuration knobs yourself, duplicating effort. If you're doing a more complex deployment, look at the OPNFV installer scenarios (that build on these ODL tools) vs trying to solve that very hard problem yourself.
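For anyone who wants to see what that Puppet snippet boils down to, here is a rough Java sketch of the same edit: find the line matching ^featuresBoot=.*$ in org.apache.karaf.features.cfg and replace it with the comma-joined feature list, or append the line if none matches. FeaturesBootEditor is a hypothetical helper and the feature names in main are only example values. The sketch replaces every matching line and does not handle values continued across lines with a trailing backslash.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper mirroring the file_line resource above on in-memory lines. */
final class FeaturesBootEditor {

    /** Returns a copy of cfgLines whose featuresBoot line lists exactly the given features. */
    static List<String> setFeaturesBoot(List<String> cfgLines, List<String> features) {
        String newLine = "featuresBoot=" + String.join(",", features);
        List<String> result = new ArrayList<>(cfgLines.size() + 1);
        boolean replaced = false;
        for (String line : cfgLines) {
            if (line.matches("^featuresBoot=.*$")) {
                // Same pattern as the 'match' attribute in the Puppet manifest.
                result.add(newLine);
                replaced = true;
            } else {
                result.add(line);
            }
        }
        if (!replaced) {
            // No existing entry: add one at the end of the file.
            result.add(newLine);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> cfg = Arrays.asList("# example content", "featuresBoot=config,standard");
        List<String> updated = setFeaturesBoot(cfg, Arrays.asList("odl-restconf", "odl-l2switch-switch"));
        updated.forEach(System.out::println);
    }
}
```

In a real deployment you would read and write /opt/opendaylight/etc/org.apache.karaf.features.cfg with java.nio.file.Files around this method, but the tooling mentioned above already does that for you.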

Flink JDBCInputFormat cannot find method 'setRowTypeInfo'

I want to use flink-jdbc to read data from MySQL.
I have seen an example on the Apache Flink website:
// Read data from a relational database using the JDBC input format
DataSet<Tuple2<String, Integer>> dbData =
env.createInput(
JDBCInputFormat.buildJDBCInputFormat()
.setDrivername("org.apache.derby.jdbc.EmbeddedDriver")
.setDBUrl("jdbc:derby:memory:persons")
.setQuery("select name, age from persons")
.setRowTypeInfo(new RowTypeInfo(BasicTypeInfo.STRING_TYPE_INFO, BasicTypeInfo.INT_TYPE_INFO))
.finish()
);
But when I try to write a demo, I can't find the method 'setRowTypeInfo'.
My code looks like this:
import org.apache.flink.api.common.typeinfo.BasicTypeInfo
import org.apache.flink.api.java.ExecutionEnvironment
import org.apache.flink.api.java.io.jdbc.JDBCInputFormat
import org.apache.flink.api.scala._
/**
* Created by lulijun on 17/7/7.
*/
object FlinkJDBC {
def main(args:Array[String]): Unit = {
val env = ExecutionEnvironment.createLocalEnvironment()
val dbData = env.createInput(
JDBCInputFormat.buildJDBCInputFormat
.setDrivername("com.mysql.jdbc.Driver")
.setDBUrl("XXX")
.setUsername("xxx")
.setPassword("XXX")
.setQuery("select name, age from persons")
.setRowTypeInfo(new Nothing(BasicTypeInfo.STRING_TYPE_INFO, BasicTypeInfo.INT_TYPE_INFO))
.finish)
dbData.print()
env.execute()
}
}
The "setRowTypeInfo" method is always shown in red, and IDEA reports
"cannot resolve symbol setRowTypeInfo"
The version of the flink-jdbc jar I used is 1.0.0.
<dependencies>
<!-- Use this dependency if you are using the DataSet API -->
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-scala_2.10</artifactId>
<version>1.3.0</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-clients_2.10</artifactId>
<version>1.3.0</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-jdbc</artifactId>
<version>1.0.0</version>
</dependency>
<dependency>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
<version>5.1.36</version>
</dependency>
</dependencies>
I have searched a lot, and most people use the method exactly as in the official documentation, but no one mentions this problem.
I suspect that I used the wrong version of flink-jdbc, but I cannot find any information about the right way to use flink-jdbc.
If you know what the problem is, please let me know. Thank you.
I changed the flink-jdbc version from 1.0.0 to 1.3.0 and the problem was solved.
When I searched for flink-jdbc on the Maven repository website,
https://mvnrepository.com/search?q=flink-jdbc, I couldn't find the right information in the first few pages, which made me think that the flink-jdbc version did not need to match the other Flink jars.
In fact, flink-jdbc 1.1.3 uses the RowTypeInfo class from the api.table package, while flink-jdbc 1.3.0 uses the RowTypeInfo class from the api.java package, so flink-jdbc is closely tied to the other Flink jars.
The versions must match.
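To make this kind of mismatch easier to spot, here is a small sketch that flags every artifact whose version differs from the one you expect. FlinkVersionCheck is a hypothetical helper, and the artifact list is typed in by hand from the pom above rather than read from Maven. In a real build, a shared version property in the pom achieves the same thing.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: reports artifacts whose version differs from the expected one. */
final class FlinkVersionCheck {

    /** Returns artifactId -> version for every entry that is not on the expected version. */
    static Map<String, String> mismatches(Map<String, String> artifactVersions, String expectedVersion) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : artifactVersions.entrySet()) {
            if (!expectedVersion.equals(entry.getValue())) {
                result.put(entry.getKey(), entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The org.apache.flink dependencies from the pom in the question.
        Map<String, String> flinkArtifacts = new LinkedHashMap<>();
        flinkArtifacts.put("flink-scala_2.10", "1.3.0");
        flinkArtifacts.put("flink-clients_2.10", "1.3.0");
        flinkArtifacts.put("flink-jdbc", "1.0.0");
        System.out.println("Not on 1.3.0: " + mismatches(flinkArtifacts, "1.3.0"));
    }
}
```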

ignite-indexing and H2 version

When I use the spatial index module in Ignite 1.6.0, I found that it depends on H2 version 1.3.175, but I need to use H2 1.4.x.
<dependency>
<groupId>com.h2database</groupId>
<artifactId>h2</artifactId>
<version>1.3.175</version>
<scope>compile</scope>
</dependency>
The method org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing#start calls org.h2.constant.SysProperties and org.h2.util.Utils. The former class does not exist in H2 version 1.3.176 and above, and the latter no longer has the serializer variable.
if (SysProperties.serializeJavaObject) {
U.warn(log, "Serialization of Java objects in H2 was enabled.");
SysProperties.serializeJavaObject = false;
}
if (Utils.serializer != null)
U.warn(log, "Custom H2 serialization is already configured, will override.");
Utils.serializer = h2Serializer();
Is there any way to solve it?
Ignite depends on H2 1.3.175 and you can't use any other version. If you already have some code that depends on 1.4, you should isolate Ignite-related code in a separate module in your project. This way different versions of H2 will coexist.
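If you are not sure which H2 generation a given module actually sees at runtime, a reflection probe can tell you before Ignite fails on startup. This is only a diagnostic sketch: ClasspathProbe is a hypothetical helper, and the class and field names in main are the ones mentioned in the question, so whether they are found depends entirely on the H2 jar on that module's classpath.

```java
/** Hypothetical helper: checks via reflection what the current classpath provides. */
final class ClasspathProbe {

    /** True if the class can be loaded (without initializing it) by this class's loader. */
    static boolean isPresent(String className) {
        try {
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** True if the class is present and declares or inherits a public field with this name. */
    static boolean hasPublicField(String className, String fieldName) {
        try {
            Class<?> type = Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            type.getField(fieldName);
            return true;
        } catch (ClassNotFoundException | NoSuchFieldException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Names taken from the question; results depend on the H2 jar on the classpath.
        System.out.println("org.h2.constant.SysProperties present: "
                + isPresent("org.h2.constant.SysProperties"));
        System.out.println("org.h2.util.Utils.serializer present: "
                + hasPublicField("org.h2.util.Utils", "serializer"));
    }
}
```

Running it from inside the Ignite module and from inside the module that needs H2 1.4 should give different answers once the two are properly isolated.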

eye.candy.sixties not found?

When I try to run my report, I'm getting this exception:
Chart theme 'eye.candy.sixties' not found.
net.sf.jasperreports.engine.JRRuntimeException: Chart theme 'eye.candy.sixties' not found.
Sure enough, I couldn't find the theme defined anywhere in jasper-4.0.2.jar. What library do I need to get the default ireport chart themes?
I had this problem with charts using the 'aegean' theme in a web application.
I copied the jasperreports-chart-themes-4.x.x.jar, e.g.
jasperreports-server-cp-4.0.0/ireport/ireport/modules/ext/jasperreports-chart-themes-4.0.0.jar
into my WEB-INF/lib and the charts worked.
<dependency>
<groupId>net.sf.jasperreports</groupId>
<artifactId>jasperreports-chart-themes</artifactId>
<version>${jasperReport.version}</version>
</dependency>
<dependency>
<groupId>net.sf.jasperreports</groupId>
<artifactId>jasperreports-fonts</artifactId>
<version>${jasperReport.version}</version>
</dependency>
You would have to build a project and a jar with the themes manually. There doesn't seem to be an easy library you could just include.
