How can I know which spark-core version to use? - hadoop

I am using Spark 1.5.2 on HDP, and the Hadoop version is 2.7.1.2.3.4.7-4. When I attempt to add the jar in my Maven POM file like this:
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>1.5.2</version>
</dependency>
I don't know which spark-core variant to pick; there are both 2.11 and 2.10. Any help is appreciated.

The version you are mentioning (the _2.10 suffix) denotes which version of Scala the spark-core artifact was built for.
You need to check the Scala version on your cluster to know whether you need 2.10 or 2.11.
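For example, if your cluster's Spark reports Scala 2.11, the dependency would instead be (same Spark version, different artifact suffix):

```xml
<!-- Use the _2.11 artifact only if your cluster's Spark was built against Scala 2.11 -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.11</artifactId>
  <version>1.5.2</version>
</dependency>
```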

Related

Where should I get the maven dependencies when migrating a mapreduce project from hdp to bigtop?

I am migrating a MapReduce Java project (built using Maven) from Hortonworks to Bigtop.
I am trying to figure out the best way to ensure that the dependency versions in my project match the jar files deployed on the cluster by Bigtop.
We are currently targeting Bigtop 3.2.0.
I am inspecting their BOM file and using those versions in my POM file.
For example, when we were using HDP I had something like:
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.11</artifactId>
  <version>2.3.2.3.1.4.0-315</version>
</dependency>
According to the Bigtop BOM file, the Spark version is 3.2.3 and the Scala library version is 2.12.13. Does that mean the new Maven dependency in our project POM file should be:
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.12</artifactId>
  <version>3.2.3</version>
</dependency>
Is there a place where the exact maven dependencies are listed? Is this the correct way to migrate our project's POM file?
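One way to keep those BOM values in a single place is to factor them into Maven properties, so a later Bigtop upgrade only touches the properties block. This is a sketch: the property names are my own invention, and the values are the Bigtop 3.2.0 ones quoted above.

```xml
<properties>
  <!-- Versions copied from the Bigtop 3.2.0 BOM -->
  <spark.version>3.2.3</spark.version>
  <scala.binary.version>2.12</scala.binary.version>
</properties>

<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_${scala.binary.version}</artifactId>
    <version>${spark.version}</version>
  </dependency>
</dependencies>
```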

Elasticsearch plugin for PySpark 3.1.1

I used Elasticsearch Spark 7.12.0 with PySpark 2.4.5 successfully; both reads and writes worked perfectly. Now that I'm testing the upgrade to Spark 3.1.1, this integration no longer works. There were no PySpark code changes between 2.4.5 and 3.1.1.
Is there a compatible plugin? Has anyone got this to work with PySpark 3.1.1?
The error: java.lang.NoClassDefFoundError: scala/Product$class

Try the package org.elasticsearch:elasticsearch-spark-30_2.12:7.13.1.
The error you're seeing (java.lang.NoClassDefFoundError: scala/Product$class) usually indicates that you are trying to use a package built for an incompatible version of Scala.
If you are using the most recent zip package from Elasticsearch, as of the date of your question it is still built for Scala 2.11, as per the conversation here:
https://github.com/elastic/elasticsearch-hadoop/pull/1589
You can confirm the version of Scala your PySpark was built with by running
spark-submit --version
from the command line. After the Spark logo it will print something like:
Using Scala version 2.12.10
You need to take a look at this page:
https://www.elastic.co/guide/en/elasticsearch/hadoop/current/install.html
On that page you can see the compatibility matrix.
Elastic gives you some info on "installation" for Hadoop here: https://www.elastic.co/guide/en/elasticsearch/hadoop/current/install.html
For Spark, it provides this:
<dependency>
  <groupId>org.elasticsearch</groupId>
  <artifactId>elasticsearch-spark-30_2.12</artifactId>
  <version>7.14.0</version>
</dependency>
Now, if you're using PySpark you may be unfamiliar with Maven, so I can appreciate that it's not that helpful to just be given a Maven dependency.
Here's a minimal way to get Maven to fetch the jar for you, without having to get into the weeds of an unfamiliar tool.
Install Maven (apt install maven)
Create a new directory
In that directory, create a file called pom.xml:
<project>
  <modelVersion>4.0.0</modelVersion>
  <groupId>spark-es</groupId>
  <artifactId>spark-esj</artifactId>
  <version>1</version>
  <dependencies>
    <dependency>
      <groupId>org.elasticsearch</groupId>
      <artifactId>elasticsearch-spark-30_2.12</artifactId>
      <version>7.14.0</version>
    </dependency>
  </dependencies>
</project>
Save that file and create an additional directory called "targetdir" (it could be called anything).
Then run:
mvn dependency:copy-dependencies -DoutputDirectory=targetdir
You'll find your jar (along with its transitive dependencies) in targetdir.
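Once the jars are in targetdir, you can hand the connector straight to PySpark. The exact filename below is an assumption based on version 7.14.0, so check what actually landed in targetdir first:

```shell
# List what Maven downloaded, then pass the connector jar to PySpark
ls targetdir
pyspark --jars targetdir/elasticsearch-spark-30_2.12-7.14.0.jar
```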

How to upgrade Apache POI version in Alfresco 5.2. project?

I have an Alfresco 5.2 project with Apache POI version "3.10.1-20151016-alfresco-patched" and I need to update it to version 5.0.0.
I have added the section:
<dependency>
  <groupId>org.apache.poi</groupId>
  <artifactId>poi</artifactId>
  <version>5.0.0</version>
</dependency>
But as far as I can see, the class XWPFTableCell still resolves to the old version: /home/katya3/.m2/repository/org/apache/poi/poi-ooxml/3.10.1-20151016-alfresco-patched/poi-ooxml-3.10.1-20151016-alfresco-patched-sources.jar!/org/apache/poi/xwpf/usermodel/XWPFTableCell.java
Also, I cannot see the required method setWidth (added in 4.0.0), so the version is still old. How can I upgrade it?
Thank you!
The current POI version supported by Alfresco is 4.1.2. Use this version and do a Maven clean and install; it will work.
<dependency.poi.version>4.1.2</dependency.poi.version>
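In an Alfresco SDK project that version is usually driven by a Maven property rather than a hard-coded <version> tag, so the override would look something like this (a sketch, assuming your project's parent POM actually honours the dependency.poi.version property):

```xml
<properties>
  <!-- Override the POI version inherited from the Alfresco parent POM -->
  <dependency.poi.version>4.1.2</dependency.poi.version>
</properties>
```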

Which libraries are needed for mapreduce on HBase?

I have a very basic question: which libraries do I need for MapReduce on HBase?
I know I must use TableMapper, and I've got hadoop-client 2.2.0 and hbase-client 0.98.2 from Maven, but there's no TableMapper in the API.
Thank you.
In addition to the hbase-client dependency, you need the hbase-server dependency at the same version; it also includes the MapReduce libraries you need.
If you're using Maven, add this to your pom.xml file:
<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-server</artifactId>
  <version>1.1.2</version>
</dependency>
(At the time of writing, the latest version is 1.1.2.)
Good luck!
Before HBase 2.0, you need to add the following dependency to your pom.xml:
<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-server</artifactId>
  <version>1.3.1</version>
</dependency>
From HBase 2.0 on:
<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-mapreduce</artifactId>
  <version>2.0.0</version>
</dependency>

Choosing dependency version in maven and maven plugin

I have a Maven plugin which uses hsqldb 1.8.0.10. In the plugin's pom.xml, it is declared like this:
<dependency>
  <groupId>hsqldb</groupId>
  <artifactId>hsqldb</artifactId>
  <version>1.8.0.10</version>
</dependency>
But if I run that plugin from another Maven project, and that project has a newer version of hsqldb (for instance 1.9.0), how can I configure my plugin so that it will use the newer version of hsqldb, without changing its pom.xml?
And is it possible to do this the other way around as well? If my other Maven project uses hsqldb 1.7.0 (for instance), can the plugin be made to use the 1.8.0.10 version specified in the plugin itself?
I hope someone can answer my question.
Kind regards,
Walle
Your main question is possible, but it might not work properly if the plugin can't handle the newer code for any reason.
A plugin can have its own dependencies section and will use standard Maven dependency resolution, choosing the highest version requested. So, you can do:
<plugin>
  <groupId>some.group.id</groupId>
  <artifactId>some.artifact.id</artifactId>
  <version>someversion</version>
  <dependencies>
    <dependency>
      <groupId>hsqldb</groupId>
      <artifactId>hsqldb</artifactId>
      <version>1.9.0</version>
    </dependency>
  </dependencies>
</plugin>
I don't think going the other way around is possible, though.
Use a properties placeholder for the version, say ${hsqldb.version}, then declare in each project's POM the version you want it to use.
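A minimal sketch of that approach: each consuming project defines the property and the dependency declaration references it, so switching hsqldb versions is a one-line change per project.

```xml
<properties>
  <hsqldb.version>1.9.0</hsqldb.version>
</properties>

<dependencies>
  <dependency>
    <groupId>hsqldb</groupId>
    <artifactId>hsqldb</artifactId>
    <version>${hsqldb.version}</version>
  </dependency>
</dependencies>
```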
