hadoop-2.6.0 Snappy install not working - hadoop

My Hadoop version is hadoop-2.6.0-cdh5.4.5.
My Linux version is CentOS release 6.5 (Final).
My Hadoop home is /home/hadoop/hadoop-2.6.0-cdh5.4.5.
I have installed snappy in /usr/lib64:
libsnappy.so -> libsnappy.so.1.1.4
libsnappy.so.1 -> libsnappy.so.1.1.4
libsnappy.so.1.1.4
I also built hadoop-snappy with mvn and then:
1) copied hadoop-snappy-0.0.1-SNAPSHOT.jar to /home/hadoop/hadoop-2.6.0-cdh5.4.5/lib/
2) cp -r $HADOOP-SNAPPY_CODE_HOME/target/hadoop-snappy-0.0.1-SNAPSHOT/lib/native/Linux-amd64-64 /home/hadoop/hadoop-2.6.0-cdh5.4.5/lib/native/
I symlinked the snappy libraries into /home/hadoop/hadoop-2.6.0-cdh5.4.5/lib/native:
lrwxrwxrwx. 1 root root 23 Jan 17 17:11 libsnappy.so -> /usr/lib64/libsnappy.so
lrwxrwxrwx. 1 root root 25 Jan 17 17:16 libsnappy.so.1 -> /usr/lib64/libsnappy.so.1
I also edited /home/hadoop/hadoop-2.6.0-cdh5.4.5/etc/hadoop/hadoop-env.sh:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HADOOP_HOME/lib/native/Linux-amd64-64/
export JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:$HADOOP_HOME/lib/native/Linux-amd64-64/
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_HOME}/lib/native
But when I check the native libraries:
[hadoop@stormmaster hadoop]$ hadoop checknative
16/01/17 17:52:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Native library checking:
hadoop: false
zlib: false
**snappy: false**
lz4: false
bzip2: false
openssl: false
16/01/17 17:52:20 INFO util.ExitUtil: Exiting with status 1
I tried downloading hadoop-native-64-2.6.0 and putting it in /home/hadoop/hadoop-2.6.0-cdh5.4.5/lib/native:
-rw-r--r--. 1 hadoop hadoop 1119486 Jan 15 15:51 libhadoop.a
-rw-r--r--. 1 hadoop hadoop 1486964 Jan 15 15:51 libhadooppipes.a
lrwxrwxrwx. 1 hadoop hadoop 24 Jan 11 15:11 libhadoopsnappy.so -> libhadoopsnappy.so.0.0.1
lrwxrwxrwx. 1 hadoop hadoop 24 Jan 11 15:11 libhadoopsnappy.so.0 -> libhadoopsnappy.so.0.0.1
-rwxr-xr-x. 1 hadoop hadoop 54437 Jan 15 15:51 libhadoopsnappy.so.0.0.1
-rwxr-xr-x. 1 hadoop hadoop 671189 Jan 15 15:51 libhadoop.so
-rwxr-xr-x. 1 hadoop hadoop 671189 Jan 15 15:51 libhadoop.so.1.0.0
-rw-r--r--. 1 hadoop hadoop 581944 Jan 15 15:51 libhadooputils.a
-rw-r--r--. 1 hadoop hadoop 359458 Jan 15 15:51 libhdfs.a
-rwxr-xr-x. 1 hadoop hadoop 228435 Jan 15 15:51 libhdfs.so
-rwxr-xr-x. 1 hadoop hadoop 228435 Jan 15 15:51 libhdfs.so.0.0.0
and ran hadoop checknative again:
16/01/17 17:55:00 WARN bzip2.Bzip2Factory: Failed to load/initialize native-bzip2 library system-native, will use pure-Java version
16/01/17 17:55:00 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
Native library checking:
**hadoop: true /home/hadoop/hadoop-2.6.0-cdh5.4.5/lib/native/libhadoop.so
zlib: true /lib64/libz.so.1
snappy: false
lz4: true revision:99
bzip2: false
openssl: false Cannot load libcrypto.so (libcrypto.so: cannot open shared object file: No such file or directory)!**
How can I solve this problem and use Snappy in Hadoop? Thanks!
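One detail worth noting from the hadoop-env.sh excerpt above: the snappy libraries live under lib/native/Linux-amd64-64, but the -Djava.library.path in HADOOP_OPTS lists only lib/native, and java.library.path is a single flat value rather than something later exports append to. A sketch of a consolidated setting (paths taken from the question; adjust to your layout):

```shell
# hadoop-env.sh (sketch) -- put every native-library directory into the one
# -Djava.library.path value, since a second -D does not extend the first.
export HADOOP_HOME=${HADOOP_HOME:-/home/hadoop/hadoop-2.6.0-cdh5.4.5}
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_HOME}/lib/native
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$HADOOP_HOME/lib/native:$HADOOP_HOME/lib/native/Linux-amd64-64"
export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native:$HADOOP_HOME/lib/native/Linux-amd64-64:$LD_LIBRARY_PATH
```

Note also that the first checknative run reports hadoop: false, meaning libhadoop.so itself is not loading; until that shows true (as in the second run), no codec, snappy included, can load.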

Related

Not able to configure Databricks with external Hive metastore

I am following this document https://docs.databricks.com/data/metastores/external-hive-metastore.html#spark-configuration-options
to connect to my external Hive metastore. My metastore version is 3.1.0 and I followed the document.
I am getting this error when trying to connect to the external Hive metastore:
org/apache/hadoop/hive/conf/HiveConf when creating Hive client using classpath:
Please make sure that jars for your version of hive and hadoop are included in the paths passed to spark.sql.hive.metastore.jars
spark.sql.hive.metastore.jars=/databricks/hive_metastore_jars/*
When I do an ls on /databricks/hive_metastore_jars/, I can see all the copied files.
Do I need to copy any Hive-specific files into this folder?
I did exactly what was mentioned on the site. These are the contents of my hive_metastore_jars:
total 56K
drwxr-xr-x 3 root root 4.0K Mar 24 05:06 1585025573715-0
drwxr-xr-x 2 root root 4.0K Mar 24 05:06 d596a6ec-e105-4a6e-af95-df3feffc263d_resources
drwxr-xr-x 3 root root 4.0K Mar 24 05:06 repl
drwxr-xr-x 2 root root 4.0K Mar 24 05:06 spark-2959157d-2018-441a-a7d3-d7cecb8a645f
drwxr-xr-x 4 root root 4.0K Mar 24 05:06 root
drwxr-xr-x 2 root root 4.0K Mar 24 05:06 spark-30a72ee5-304c-432b-9c13-0439511fb0cd
drwxr-xr-x 2 root root 4.0K Mar 24 05:06 spark-a19d167b-d571-4e58-a961-d7f6ced3d52f
-rwxr-xr-x 1 root root 5.5K Mar 24 05:06 _CleanRShell.r3763856699176668909resource.r
-rwxr-xr-x 1 root root 9.7K Mar 24 05:06 _dbutils.r9057087446822479911resource.r
-rwxr-xr-x 1 root root 301 Mar 24 05:06 _rServeScript.r1949348184439973964resource.r
-rwxr-xr-x 1 root root 1.5K Mar 24 05:06 _startR.sh5660449951005543051resource.r
Am I missing anything?
Strangely, if I look into the cluster boot logs, here is what I get:
20/03/24 07:29:05 INFO Persistence: Property spark.hadoop.javax.jdo.option.ConnectionDriverName unknown - will be ignored
20/03/24 07:29:05 INFO Persistence: Property spark.hadoop.javax.jdo.option.ConnectionURL unknown - will be ignored
20/03/24 07:29:05 INFO Persistence: Property spark.hadoop.javax.jdo.option.ConnectionUserName unknown - will be ignored
20/03/24 07:29:05 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
20/03/24 07:29:05 INFO Persistence: Property spark.hadoop.javax.jdo.option.ConnectionPassword unknown - will be ignored
20/03/24 07:29:05 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
20/03/24 07:29:05 INFO Persistence: Property datanucleus.schema.autoCreateAll unknown - will be ignored
20/03/24 07:29:09 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
20/03/24 07:29:09 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
I have already set the above configurations, and they show in the logs as well:
20/03/24 07:28:59 INFO SparkContext: Spark configuration:
spark.hadoop.javax.jdo.option.ConnectionDriverName=org.mariadb.jdbc.Driver
spark.hadoop.javax.jdo.option.ConnectionPassword=*********(redacted)
spark.hadoop.javax.jdo.option.ConnectionURL=*********(redacted)
spark.hadoop.javax.jdo.option.ConnectionUserName=*********(redacted)
Also, version information is available in my Hive metastore; I can connect to MySQL and it shows:
SCHEMA_VERSION : 3.1.0
VER_ID = 1
From the output, it looks like the jars are not copied to the /databricks/hive_metastore_jars/ location. As mentioned in the documentation link you shared:
Set spark.sql.hive.metastore.jars to maven
Restart the cluster with the above configuration, and then check the Spark driver logs for the message:
17/11/18 22:41:19 INFO IsolatedClientLoader: Downloaded metastore jars to <path>
From this location, copy the jars to DBFS from the same cluster, and then use an init script to copy the jars from DBFS to /databricks/hive_metastore_jars/.
As I am using Azure MySQL, there is one more step I need to perform:
https://learn.microsoft.com/en-us/azure/databricks/data/metastores/external-hive-metastore
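For reference, a sketch of the cluster Spark configuration once the jars are staged; the property names are from the linked documentation, the version value is an assumption matching the 3.1.0 metastore above, and the driver class echoes the ConnectionDriverName shown in the boot logs:

```
spark.sql.hive.metastore.jars /databricks/hive_metastore_jars/*
spark.sql.hive.metastore.version 3.1.0
spark.hadoop.javax.jdo.option.ConnectionDriverName org.mariadb.jdbc.Driver
```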

Where to see the HDFS data in the Web UI - MapR

I am able to connect to the MapR Control System (MCS) on port 8080 (Web UI), but where can I see the data file I copied from my local file system?
On the MCS navigation pane I can see the tabs below, but I don't see where to navigate to a data folder.
**Cluster
Dashboard
Nodes
Node Heatmap
Jobs
MapR-FS
MapR Tables
Volumes
Mirror Volumes
User Disk Usage
Snapshot
NFS HA
NFS Setup
Alarms
...
...
..
System Settings
-..
..
..
HBASE
...
Resource Manager
Job History Server
..**
**Hadoop Command:**
]$ hadoop fs -ls -R /user
drwxr-xr-x - mapr mapr 1 2019-06-25 12:47 /user/hive
drwxrwxr-x - mapr mapr 0 2019-06-25 12:47 /user/hive/warehouse
drwxrwxrwx - mapr mapr 6 2019-06-26 11:34 /user/mapr
drwxr-xr-x - mapr mapr 0 2019-06-25 12:14 /user/mapr/.sparkStaging
drwxr-xr-x - mapr mapr 0 2019-06-24 15:40 /user/mapr/hadoop
drwxrwxrwx - mapr mapr 1 2019-06-26 11:48 /user/mapr/rajesh
-rwxr-xr-x 3 webload webload 219 2019-06-26 11:48 /user/mapr/rajesh/sample.txt
-rwxr-xr-x 3 mapr mapr 219 2019-06-25 13:46 /user/mapr/sample.txt
drwxr-xr-x - mapr mapr 0 2019-06-25 12:14 /user/mapr/spark-warehouse
drwxr-xr-x - mapr mapr 1 2019-06-24 13:37 /user/mapr/tmp
drwxrwxrwx - mapr mapr 0 2019-06-24 13:37 /user/mapr/tmp/hive

HDInsight Oozie 4.2.0.2.5 Spark2 Action Jackson collision

I was following this guide:
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.0/bk_spark-component-guide/content/ch_oozie-spark-action.html#spark-config-oozie-spark2
This enabled me to configure the following workflow.xml:
<workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.3">
<start to = "Raw-To-Parquet" />
<action name="Raw-To-Parquet">
<spark xmlns="uri:oozie:spark-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<!--
<prepare>
<delete path="[PATH]"/>
<mkdir path="[PATH]"/>
</prepare>
<job-xml>[SPARK SETTINGS FILE]</job-xml>
<configuration>
<property>
<name>[PROPERTY-NAME]</name>
<value>[PROPERTY-VALUE]</value>
</property>
</configuration>
-->
<master>${master}</master>
<!--
<mode>[SPARK MODE]</mode>
-->
<name>Raw-To-Parquet</name>
<class>org.apache.spark.examples.SparkPi</class>
<jar>${nameNode}/spark-examples_2.11-2.0.2.2.5.6.3-5.jar</jar>
<spark-opts>--conf spark.yarn.jars=spark2/* --conf spark.driver.extraJavaOptions=-Dhdp.version=2.5.6.3-5 --conf spark.executor.extraJavaOptions=-Dhdp.version=2.5.6.3-5</spark-opts>
<!--SPARK_JAVA_OPTS="-Dhdp.verion=xxx"
<arg>[ARG-VALUE]</arg>
<arg>[ARG-VALUE]</arg>
-->
</spark>
<ok to="End"/>
<error to="Fail"/>
</action>
<kill name = "Fail">
<message>Job failed</message>
</kill>
<end name = "End" />
</workflow-app>
Job.properties
master=yarn-cluster
nameNode=wasb://hdi-adam-ak@hdiadamakstore.blob.core.windows.net
jobTracker=hn1-hdi-ad.hcgue2snotaezkuexzoymd0nlh.ax.internal.cloudapp.net:8088
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/project-example/oozie
oozie.libpath=${nameNode}/user/oozie/share/lib
oozie.action.sharelib.for.spark=spark2
The workflow starts, but then dies due to a collision between Jackson jars:
18/05/02 12:39:12 ERROR ApplicationMaster: User class threw exception: java.lang.NoSuchMethodError: com.fasterxml.jackson.databind.JavaType.isReferenceType()Z
java.lang.NoSuchMethodError: com.fasterxml.jackson.databind.JavaType.isReferenceType()Z
at com.fasterxml.jackson.databind.ser.BasicSerializerFactory.findSerializerByLookup(BasicSerializerFactory.java:302)
at com.fasterxml.jackson.databind.ser.BeanSerializerFactory._createSerializer2(BeanSerializerFactory.java:218)
at com.fasterxml.jackson.databind.ser.BeanSerializerFactory.createSerializer(BeanSerializerFactory.java:153)
at com.fasterxml.jackson.databind.SerializerProvider._createUntypedSerializer(SerializerProvider.java:1203)
at com.fasterxml.jackson.databind.SerializerProvider._createAndCacheUntypedSerializer(SerializerProvider.java:1157)
at com.fasterxml.jackson.databind.SerializerProvider.findValueSerializer(SerializerProvider.java:481)
at com.fasterxml.jackson.databind.SerializerProvider.findTypedValueSerializer(SerializerProvider.java:679)
at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:107)
at com.fasterxml.jackson.databind.ObjectMapper._configAndWriteValue(ObjectMapper.java:3559)
at com.fasterxml.jackson.databind.ObjectMapper.writeValueAsString(ObjectMapper.java:2927)
at org.apache.spark.rdd.RDDOperationScope.toJson(RDDOperationScope.scala:52)
See below for the contents of my Oozie sharelib:
oozie@hn0-hdi-ad:/quantexa/oozie$ hadoop fs -ls -R /user/oozie/share/lib/lib_20180502121937 | grep jackson
-rw-r--r-- 1 oozie supergroup 38605 2018-05-02 12:20 /user/oozie/share/lib/lib_20180502121937/distcp/jackson-annotations-2.4.0.jar
-rw-r--r-- 1 oozie supergroup 225302 2018-05-02 12:20 /user/oozie/share/lib/lib_20180502121937/distcp/jackson-core-2.4.4.jar
-rw-r--r-- 1 oozie supergroup 1076926 2018-05-02 12:20 /user/oozie/share/lib/lib_20180502121937/distcp/jackson-databind-2.4.4.jar
-rw-r--r-- 1 oozie supergroup 38605 2018-05-02 12:19 /user/oozie/share/lib/lib_20180502121937/hcatalog/jackson-annotations-2.4.0.jar
-rw-r--r-- 1 oozie supergroup 225302 2018-05-02 12:19 /user/oozie/share/lib/lib_20180502121937/hcatalog/jackson-core-2.4.4.jar
-rw-r--r-- 1 oozie supergroup 232248 2018-05-02 12:19 /user/oozie/share/lib/lib_20180502121937/hcatalog/jackson-core-asl-1.9.13.jar
-rw-r--r-- 1 oozie supergroup 1076926 2018-05-02 12:19 /user/oozie/share/lib/lib_20180502121937/hcatalog/jackson-databind-2.4.4.jar
-rw-r--r-- 1 oozie supergroup 780664 2018-05-02 12:19 /user/oozie/share/lib/lib_20180502121937/hcatalog/jackson-mapper-asl-1.9.13.jar
-rw-r--r-- 1 oozie supergroup 38605 2018-05-02 12:20 /user/oozie/share/lib/lib_20180502121937/hive/jackson-annotations-2.4.0.jar
-rw-r--r-- 1 oozie supergroup 225302 2018-05-02 12:20 /user/oozie/share/lib/lib_20180502121937/hive/jackson-core-2.4.4.jar
-rw-r--r-- 1 oozie supergroup 1076926 2018-05-02 12:20 /user/oozie/share/lib/lib_20180502121937/hive/jackson-databind-2.4.4.jar
-rw-r--r-- 1 oozie supergroup 18336 2018-05-02 12:20 /user/oozie/share/lib/lib_20180502121937/hive/jackson-jaxrs-1.9.13.jar
-rw-r--r-- 1 oozie supergroup 27084 2018-05-02 12:20 /user/oozie/share/lib/lib_20180502121937/hive/jackson-xc-1.9.13.jar
-rw-r--r-- 1 oozie supergroup 38605 2018-05-02 12:20 /user/oozie/share/lib/lib_20180502121937/hive2/jackson-annotations-2.4.0.jar
-rw-r--r-- 1 oozie supergroup 225302 2018-05-02 12:20 /user/oozie/share/lib/lib_20180502121937/hive2/jackson-core-2.4.4.jar
-rw-r--r-- 1 oozie supergroup 1076926 2018-05-02 12:20 /user/oozie/share/lib/lib_20180502121937/hive2/jackson-databind-2.4.4.jar
-rw-r--r-- 1 oozie supergroup 38605 2018-05-02 12:19 /user/oozie/share/lib/lib_20180502121937/mapreduce-streaming/jackson-annotations-2.4.0.jar
-rw-r--r-- 1 oozie supergroup 225302 2018-05-02 12:19 /user/oozie/share/lib/lib_20180502121937/mapreduce-streaming/jackson-core-2.4.4.jar
-rw-r--r-- 1 oozie supergroup 1076926 2018-05-02 12:19 /user/oozie/share/lib/lib_20180502121937/mapreduce-streaming/jackson-databind-2.4.4.jar
-rw-r--r-- 1 oozie supergroup 38605 2018-05-02 12:19 /user/oozie/share/lib/lib_20180502121937/oozie/jackson-annotations-2.4.0.jar
-rw-r--r-- 1 oozie supergroup 225302 2018-05-02 12:19 /user/oozie/share/lib/lib_20180502121937/oozie/jackson-core-2.4.4.jar
-rw-r--r-- 1 oozie supergroup 1076926 2018-05-02 12:19 /user/oozie/share/lib/lib_20180502121937/oozie/jackson-databind-2.4.4.jar
-rw-r--r-- 1 oozie supergroup 38605 2018-05-02 12:20 /user/oozie/share/lib/lib_20180502121937/pig/jackson-annotations-2.4.0.jar
-rw-r--r-- 1 oozie supergroup 225302 2018-05-02 12:20 /user/oozie/share/lib/lib_20180502121937/pig/jackson-core-2.4.4.jar
-rw-r--r-- 1 oozie supergroup 232248 2018-05-02 12:20 /user/oozie/share/lib/lib_20180502121937/pig/jackson-core-asl-1.9.13.jar
-rw-r--r-- 1 oozie supergroup 1076926 2018-05-02 12:20 /user/oozie/share/lib/lib_20180502121937/pig/jackson-databind-2.4.4.jar
-rw-r--r-- 1 oozie supergroup 18336 2018-05-02 12:20 /user/oozie/share/lib/lib_20180502121937/pig/jackson-jaxrs-1.9.13.jar
-rw-r--r-- 1 oozie supergroup 780664 2018-05-02 12:20 /user/oozie/share/lib/lib_20180502121937/pig/jackson-mapper-asl-1.9.13.jar
-rw-r--r-- 1 oozie supergroup 27084 2018-05-02 12:20 /user/oozie/share/lib/lib_20180502121937/pig/jackson-xc-1.9.13.jar
-rw-r--r-- 1 oozie supergroup 46983 2018-05-02 12:21 /user/oozie/share/lib/lib_20180502121937/spark2/jackson-annotations-2.6.5.jar
-rw-r--r-- 1 oozie supergroup 258876 2018-05-02 12:21 /user/oozie/share/lib/lib_20180502121937/spark2/jackson-core-2.6.5.jar
-rw-r--r-- 1 oozie supergroup 232248 2018-05-02 12:21 /user/oozie/share/lib/lib_20180502121937/spark2/jackson-core-asl-1.9.13.jar
-rw-r--r-- 1 oozie supergroup 1171380 2018-05-02 12:21 /user/oozie/share/lib/lib_20180502121937/spark2/jackson-databind-2.6.5.jar
-rw-r--r-- 1 oozie supergroup 48418 2018-05-02 12:21 /user/oozie/share/lib/lib_20180502121937/spark2/jackson-dataformat-cbor-2.6.5.jar
-rw-r--r-- 1 oozie supergroup 18336 2018-05-02 12:21 /user/oozie/share/lib/lib_20180502121937/spark2/jackson-jaxrs-1.9.13.jar
-rw-r--r-- 1 oozie supergroup 780664 2018-05-02 12:21 /user/oozie/share/lib/lib_20180502121937/spark2/jackson-mapper-asl-1.9.13.jar
-rw-r--r-- 1 oozie supergroup 41263 2018-05-02 12:21 /user/oozie/share/lib/lib_20180502121937/spark2/jackson-module-paranamer-2.6.5.jar
-rw-r--r-- 1 oozie supergroup 515604 2018-05-02 12:21 /user/oozie/share/lib/lib_20180502121937/spark2/jackson-module-scala_2.11-2.6.5.jar
-rw-r--r-- 1 oozie supergroup 27084 2018-05-02 12:21 /user/oozie/share/lib/lib_20180502121937/spark2/jackson-xc-1.9.13.jar
-rw-r--r-- 1 oozie supergroup 40341 2018-05-02 12:21 /user/oozie/share/lib/lib_20180502121937/spark2/json4s-jackson_2.11-3.2.11.jar
-rw-r--r-- 1 oozie supergroup 1048110 2018-05-02 12:21 /user/oozie/share/lib/lib_20180502121937/spark2/parquet-jackson-1.7.0.jar
-rw-r--r-- 1 oozie supergroup 38605 2018-05-02 12:20 /user/oozie/share/lib/lib_20180502121937/spark_orig/jackson-annotations-2.4.0.jar
-rw-r--r-- 1 oozie supergroup 225302 2018-05-02 12:20 /user/oozie/share/lib/lib_20180502121937/spark_orig/jackson-core-2.4.4.jar
-rw-r--r-- 1 oozie supergroup 232248 2018-05-02 12:20 /user/oozie/share/lib/lib_20180502121937/spark_orig/jackson-core-asl-1.9.13.jar
-rw-r--r-- 1 oozie supergroup 1076926 2018-05-02 12:20 /user/oozie/share/lib/lib_20180502121937/spark_orig/jackson-databind-2.4.4.jar
-rw-r--r-- 1 oozie supergroup 780664 2018-05-02 12:20 /user/oozie/share/lib/lib_20180502121937/spark_orig/jackson-mapper-asl-1.9.13.jar
-rw-r--r-- 1 oozie supergroup 549415 2018-05-02 12:20 /user/oozie/share/lib/lib_20180502121937/spark_orig/jackson-module-scala_2.10-2.4.4.jar
-rw-r--r-- 1 oozie supergroup 39953 2018-05-02 12:20 /user/oozie/share/lib/lib_20180502121937/spark_orig/json4s-jackson_2.10-3.2.10.jar
-rw-r--r-- 1 oozie supergroup 1048110 2018-05-02 12:20 /user/oozie/share/lib/lib_20180502121937/spark_orig/parquet-jackson-1.7.0.jar
-rw-r--r-- 1 oozie supergroup 38605 2018-05-02 12:19 /user/oozie/share/lib/lib_20180502121937/sqoop/jackson-annotations-2.4.0.jar
-rw-r--r-- 1 oozie supergroup 225302 2018-05-02 12:19 /user/oozie/share/lib/lib_20180502121937/sqoop/jackson-core-2.4.4.jar
-rw-r--r-- 1 oozie supergroup 1076926 2018-05-02 12:19 /user/oozie/share/lib/lib_20180502121937/sqoop/jackson-databind-2.4.4.jar
You can then see the contents of the container which YARN creates:
root@wn0-hdi-ad:/mnt/resource/hadoop/yarn/local/usercache/oozie/appcache# ll application_1525249303830_0045/container_1525249303830_0045_01_000001/
total 1020
drwx--x--- 3 yarn hadoop 20480 May 2 12:38 ./
drwx--x--- 16 yarn hadoop 4096 May 2 12:40 ../
//removed due to word count
lrwxrwxrwx 1 yarn hadoop 85 May 2 12:38 __app__.jar -> /mnt/resource/hadoop/yarn/local/filecache/261/spark-examples_2.11-2.0.2.2.5.6.3-5.jar*
lrwxrwxrwx 1 yarn hadoop 75 May 2 12:38 jackson-annotations-2.4.0.jar -> /mnt/resource/hadoop/yarn/local/filecache/498/jackson-annotations-2.4.0.jar*
lrwxrwxrwx 1 yarn hadoop 75 May 2 12:38 jackson-annotations-2.6.5.jar -> /mnt/resource/hadoop/yarn/local/filecache/504/jackson-annotations-2.6.5.jar*
lrwxrwxrwx 1 yarn hadoop 68 May 2 12:38 jackson-core-2.4.4.jar -> /mnt/resource/hadoop/yarn/local/filecache/293/jackson-core-2.4.4.jar*
lrwxrwxrwx 1 yarn hadoop 68 May 2 12:38 jackson-core-2.6.5.jar -> /mnt/resource/hadoop/yarn/local/filecache/304/jackson-core-2.6.5.jar*
lrwxrwxrwx 1 yarn hadoop 73 May 2 12:38 jackson-core-asl-1.9.13.jar -> /mnt/resource/hadoop/yarn/local/filecache/467/jackson-core-asl-1.9.13.jar*
lrwxrwxrwx 1 yarn hadoop 72 May 2 12:38 jackson-databind-2.4.4.jar -> /mnt/resource/hadoop/yarn/local/filecache/507/jackson-databind-2.4.4.jar*
lrwxrwxrwx 1 yarn hadoop 72 May 2 12:38 jackson-databind-2.6.5.jar -> /mnt/resource/hadoop/yarn/local/filecache/514/jackson-databind-2.6.5.jar*
lrwxrwxrwx 1 yarn hadoop 79 May 2 12:38 jackson-dataformat-cbor-2.6.5.jar -> /mnt/resource/hadoop/yarn/local/filecache/357/jackson-dataformat-cbor-2.6.5.jar*
lrwxrwxrwx 1 yarn hadoop 70 May 2 12:38 jackson-jaxrs-1.9.13.jar -> /mnt/resource/hadoop/yarn/local/filecache/509/jackson-jaxrs-1.9.13.jar*
lrwxrwxrwx 1 yarn hadoop 75 May 2 12:38 jackson-mapper-asl-1.9.13.jar -> /mnt/resource/hadoop/yarn/local/filecache/446/jackson-mapper-asl-1.9.13.jar*
lrwxrwxrwx 1 yarn hadoop 80 May 2 12:38 jackson-module-paranamer-2.6.5.jar -> /mnt/resource/hadoop/yarn/local/filecache/373/jackson-module-paranamer-2.6.5.jar*
lrwxrwxrwx 1 yarn hadoop 81 May 2 12:38 jackson-module-scala_2.11-2.6.5.jar -> /mnt/resource/hadoop/yarn/local/filecache/391/jackson-module-scala_2.11-2.6.5.jar*
lrwxrwxrwx 1 yarn hadoop 67 May 2 12:38 jackson-xc-1.9.13.jar -> /mnt/resource/hadoop/yarn/local/filecache/392/jackson-xc-1.9.13.jar*
//removed due to word count
lrwxrwxrwx 1 yarn hadoop 85 May 2 12:38 spark-catalyst_2.11-2.0.2.2.5.6.3-5.jar -> /mnt/resource/hadoop/yarn/local/filecache/296/spark-catalyst_2.11-2.0.2.2.5.6.3-5.jar*
lrwxrwxrwx 1 yarn hadoop 82 May 2 12:38 spark-cloud_2.11-2.0.2.2.5.6.3-5.jar -> /mnt/resource/hadoop/yarn/local/filecache/351/spark-cloud_2.11-2.0.2.2.5.6.3-5.jar*
lrwxrwxrwx 1 yarn hadoop 81 May 2 12:38 __spark_conf__ -> /mnt/resource/hadoop/yarn/local/usercache/oozie/filecache/1630/__spark_conf__.zip/
lrwxrwxrwx 1 yarn hadoop 81 May 2 12:38 spark-core_2.11-2.0.2.2.5.6.3-5.jar -> /mnt/resource/hadoop/yarn/local/filecache/482/spark-core_2.11-2.0.2.2.5.6.3-5.jar*
lrwxrwxrwx 1 yarn hadoop 83 May 2 12:38 spark-graphx_2.11-2.0.2.2.5.6.3-5.jar -> /mnt/resource/hadoop/yarn/local/filecache/295/spark-graphx_2.11-2.0.2.2.5.6.3-5.jar*
lrwxrwxrwx 1 yarn hadoop 81 May 2 12:38 spark-hive_2.11-2.0.2.2.5.6.3-5.jar -> /mnt/resource/hadoop/yarn/local/filecache/453/spark-hive_2.11-2.0.2.2.5.6.3-5.jar*
lrwxrwxrwx 1 yarn hadoop 94 May 2 12:38 spark-hive-thriftserver_2.11-2.0.2.2.5.6.3-5.jar -> /mnt/resource/hadoop/yarn/local/filecache/317/spark-hive-thriftserver_2.11-2.0.2.2.5.6.3-5.jar*
lrwxrwxrwx 1 yarn hadoop 85 May 2 12:38 spark-launcher_2.11-2.0.2.2.5.6.3-5.jar -> /mnt/resource/hadoop/yarn/local/filecache/374/spark-launcher_2.11-2.0.2.2.5.6.3-5.jar*
lrwxrwxrwx 1 yarn hadoop 82 May 2 12:38 spark-mllib_2.11-2.0.2.2.5.6.3-5.jar -> /mnt/resource/hadoop/yarn/local/filecache/359/spark-mllib_2.11-2.0.2.2.5.6.3-5.jar*
lrwxrwxrwx 1 yarn hadoop 88 May 2 12:38 spark-mllib-local_2.11-2.0.2.2.5.6.3-5.jar -> /mnt/resource/hadoop/yarn/local/filecache/291/spark-mllib-local_2.11-2.0.2.2.5.6.3-5.jar*
lrwxrwxrwx 1 yarn hadoop 91 May 2 12:38 spark-network-common_2.11-2.0.2.2.5.6.3-5.jar -> /mnt/resource/hadoop/yarn/local/filecache/433/spark-network-common_2.11-2.0.2.2.5.6.3-5.jar*
lrwxrwxrwx 1 yarn hadoop 92 May 2 12:38 spark-network-shuffle_2.11-2.0.2.2.5.6.3-5.jar -> /mnt/resource/hadoop/yarn/local/filecache/333/spark-network-shuffle_2.11-2.0.2.2.5.6.3-5.jar*
lrwxrwxrwx 1 yarn hadoop 81 May 2 12:38 spark-repl_2.11-2.0.2.2.5.6.3-5.jar -> /mnt/resource/hadoop/yarn/local/filecache/425/spark-repl_2.11-2.0.2.2.5.6.3-5.jar*
lrwxrwxrwx 1 yarn hadoop 83 May 2 12:38 spark-sketch_2.11-2.0.2.2.5.6.3-5.jar -> /mnt/resource/hadoop/yarn/local/filecache/407/spark-sketch_2.11-2.0.2.2.5.6.3-5.jar*
lrwxrwxrwx 1 yarn hadoop 80 May 2 12:38 spark-sql_2.11-2.0.2.2.5.6.3-5.jar -> /mnt/resource/hadoop/yarn/local/filecache/329/spark-sql_2.11-2.0.2.2.5.6.3-5.jar*
lrwxrwxrwx 1 yarn hadoop 86 May 2 12:38 spark-streaming_2.11-2.0.2.2.5.6.3-5.jar -> /mnt/resource/hadoop/yarn/local/filecache/452/spark-streaming_2.11-2.0.2.2.5.6.3-5.jar*
lrwxrwxrwx 1 yarn hadoop 81 May 2 12:38 spark-tags_2.11-2.0.2.2.5.6.3-5.jar -> /mnt/resource/hadoop/yarn/local/filecache/423/spark-tags_2.11-2.0.2.2.5.6.3-5.jar*
lrwxrwxrwx 1 yarn hadoop 83 May 2 12:38 spark-unsafe_2.11-2.0.2.2.5.6.3-5.jar -> /mnt/resource/hadoop/yarn/local/filecache/428/spark-unsafe_2.11-2.0.2.2.5.6.3-5.jar*
lrwxrwxrwx 1 yarn hadoop 81 May 2 12:38 spark-yarn_2.11-2.0.2.2.5.6.3-5.jar -> /mnt/resource/hadoop/yarn/local/filecache/478/spark-yarn_2.11-2.0.2.2.5.6.3-5.jar*
lrwxrwxrwx 1 yarn hadoop 66 May 2 12:38 spire_2.11-0.7.4.jar -> /mnt/resource/hadoop/yarn/local/filecache/300/spire_2.11-0.7.4.jar*
lrwxrwxrwx 1 yarn hadoop 73 May 2 12:38 spire-macros_2.11-0.7.4.jar -> /mnt/resource/hadoop/yarn/local/filecache/297/spire-macros_2.11-0.7.4.jar*
lrwxrwxrwx 1 yarn hadoop 59 May 2 12:38 ST4-4.0.4.jar -> /mnt/resource/hadoop/yarn/local/filecache/289/ST4-4.0.4.jar*
lrwxrwxrwx 1 yarn hadoop 64 May 2 12:38 stax-api-1.0.1.jar -> /mnt/resource/hadoop/yarn/local/filecache/458/stax-api-1.0.1.jar*
lrwxrwxrwx 1 yarn hadoop 64 May 2 12:38 stax-api-1.0-2.jar -> /mnt/resource/hadoop/yarn/local/filecache/404/stax-api-1.0-2.jar*
lrwxrwxrwx 1 yarn hadoop 62 May 2 12:38 stream-2.7.0.jar -> /mnt/resource/hadoop/yarn/local/filecache/345/stream-2.7.0.jar*
lrwxrwxrwx 1 yarn hadoop 70 May 2 12:38 stringtemplate-3.2.1.jar -> /mnt/resource/hadoop/yarn/local/filecache/315/stringtemplate-3.2.1.jar*
lrwxrwxrwx 1 yarn hadoop 65 May 2 12:38 super-csv-2.2.0.jar -> /mnt/resource/hadoop/yarn/local/filecache/469/super-csv-2.2.0.jar*
drwx--x--- 2 yarn hadoop 4096 May 2 12:38 tmp/
lrwxrwxrwx 1 yarn hadoop 73 May 2 12:38 univocity-parsers-2.1.1.jar -> /mnt/resource/hadoop/yarn/local/filecache/416/univocity-parsers-2.1.1.jar*
lrwxrwxrwx 1 yarn hadoop 76 May 2 12:38 validation-api-1.1.0.Final.jar -> /mnt/resource/hadoop/yarn/local/filecache/500/validation-api-1.1.0.Final.jar*
lrwxrwxrwx 1 yarn hadoop 71 May 2 12:38 xbean-asm5-shaded-4.4.jar -> /mnt/resource/hadoop/yarn/local/filecache/493/xbean-asm5-shaded-4.4.jar*
lrwxrwxrwx 1 yarn hadoop 66 May 2 12:38 xercesImpl-2.9.1.jar -> /mnt/resource/hadoop/yarn/local/filecache/476/xercesImpl-2.9.1.jar*
lrwxrwxrwx 1 yarn hadoop 61 May 2 12:38 xmlenc-0.52.jar -> /mnt/resource/hadoop/yarn/local/filecache/491/xmlenc-0.52.jar*
lrwxrwxrwx 1 yarn hadoop 56 May 2 12:38 xz-1.0.jar -> /mnt/resource/hadoop/yarn/local/filecache/456/xz-1.0.jar*
lrwxrwxrwx 1 yarn hadoop 75 May 2 12:38 zookeeper-3.4.6.2.5.6.3-5.jar -> /mnt/resource/hadoop/yarn/local/filecache/449/zookeeper-3.4.6.2.5.6.3-5.jar*
So from the above it seems that, regardless of
oozie.action.sharelib.for.spark=spark2
in job.properties, YARN/Oozie is loading all of the jars, including the old Jackson version, into the container. I am also setting --conf spark.yarn.jars=spark2/* on the Spark job itself.
So I think Oozie spawns a map-reduce launcher job with all of the Oozie sharelib jars; that job then spawns a new container for the Spark action containing all the jars, which causes the collision. I need the Spark container to include only the Spark jars.
I am using an Oozie version < 4.3.1; see the fixes below:
https://issues.apache.org/jira/browse/OOZIE-2606
https://issues.apache.org/jira/browse/OOZIE-2658
https://issues.apache.org/jira/browse/OOZIE-2787
https://issues.apache.org/jira/browse/OOZIE-2802
These fixes ensure that the Spark container only contains the correct jars on the classpath, avoiding the Jackson collision.
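Until an upgrade carrying those fixes is possible, one mitigation sometimes suggested (my assumption, not something taken from the linked JIRAs, and its effect varies by Oozie version) is to enable classloader isolation on the launcher job inside the action, so the merged sharelib jars do not shadow the Spark ones:

```xml
<!-- workflow.xml, inside the <spark> action (sketch of a possible workaround) -->
<configuration>
  <property>
    <name>oozie.launcher.mapreduce.job.classloader</name>
    <value>true</value>
  </property>
</configuration>
```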

How to run pig scripts from HDFS?

I am trying to run a Pig script from HDFS, but it shows an error that the file does not exist.
My HDFS directory:
[cloudera@quickstart ~]$ hdfs dfs -ls /
Found 11 items
drwxrwxrwx - hdfs supergroup 0 2016-08-10 14:35 /benchmarks
drwxr-xr-x - hbase supergroup 0 2017-08-19 23:51 /hbase
drwxr-xr-x - cloudera supergroup 0 2017-07-13 04:53 /home
drwxr-xr-x - cloudera supergroup 0 2017-08-27 07:26 /input
drwxr-xr-x - cloudera supergroup 0 2017-07-30 14:30 /output
drwxr-xr-x - solr solr 0 2016-08-10 14:37 /solr
-rw-r--r-- 1 cloudera supergroup 273 2017-08-27 11:59 /success.pig
-rw-r--r-- 1 cloudera supergroup 273 2017-08-27 12:04 /success.script
drwxrwxrwt - hdfs supergroup 0 2017-08-27 12:07 /tmp
drwxr-xr-x - hdfs supergroup 0 2016-09-28 09:00 /user
drwxr-xr-x - hdfs supergroup 0 2016-08-10 14:37 /var
Command executed
[cloudera@quickstart ~]$ pig -x mapreduce /success.pig
Error Message
log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
2017-08-27 12:34:39,160 [main] INFO org.apache.pig.Main - Apache Pig version 0.12.0-cdh5.8.0 (rexported) compiled Jun 16 2016, 12:40:41
2017-08-27 12:34:39,162 [main] INFO org.apache.pig.Main - Logging error messages to: /home/cloudera/pig_1503862479069.log
2017-08-27 12:34:47,079 [main] ERROR org.apache.pig.Main - ERROR 2997: Encountered IOException. File /success.pig does not exist
Details at logfile: /home/cloudera/pig_1503862479069.log
What am I missing?
You may use the -f <script location> option to run a script located at an HDFS path, but the script location needs to be a fully-qualified absolute path, as in the following syntax and example.
Syntax:
pig -f <fs.defaultFS>/<script path in hdfs>
Example:
pig -f hdfs://Foton/user/root/script.pig
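The scheme-and-authority part of that URI comes from fs.defaultFS, so one way to build the fully-qualified path without hard-coding the cluster name is the sketch below (the fallback value is a made-up placeholder, and hdfs getconf needs a working Hadoop client; the final command is echoed rather than executed):

```shell
# Build a fully-qualified HDFS URI for the script, then show the Pig command.
script=/success.pig
if command -v hdfs >/dev/null 2>&1; then
  defaultfs=$(hdfs getconf -confKey fs.defaultFS)
else
  defaultfs="hdfs://quickstart.cloudera:8020"   # placeholder when no client is present
fi
echo pig -f "${defaultfs}${script}"
```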

Native library lz4 not available for Spark

How do I add the lz4 native libraries for use by Spark workers?
I have tried to add them via both LD_LIBRARY_PATH and, as shown (but with no accepted or even upvoted answer) in Apache Spark Native Libraries, via SPARK_LIBRARY_PATH. Neither works; we get:
java.lang.RuntimeException: native lz4 library not available
at org.apache.hadoop.io.compress.Lz4Codec.getCompressorType(Lz4Codec.java:125)
at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:150)
at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:165)
at org.apache.hadoop.io.SequenceFile$Writer.init(SequenceFile.java:1201)
at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:1094)
at org.apache.hadoop.io.SequenceFile$BlockCompressWriter.<init>(SequenceFile.java:1444)
at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:277)
at BIDMat.HDFSIO.writeThing(HDFSIO.scala:96)
Here is the LD_LIBRARY_PATH:
$echo $LD_LIBRARY_PATH
/usr/local/Cellar/lz4/r131/lib:/usr/local/Cellar/hadoop/2.7.2/libexec/lib:
and the contents of the lz4-related entry:
$ll /usr/local/Cellar/lz4/r131/lib
total 528
-r--r--r-- 1 macuser admin 71144 Sep 21 2015 liblz4.a
drwxr-xr-x 7 macuser admin 238 Sep 21 2015 .
drwxr-xr-x 3 macuser admin 102 Jun 13 10:41 pkgconfig
-r--r--r-- 1 macuser admin 64120 Jun 13 10:41 liblz4.dylib
-r--r--r-- 1 macuser admin 64120 Jun 13 10:41 liblz4.1.dylib
-r--r--r-- 1 macuser admin 64120 Jun 13 10:41 liblz4.1.7.1.dylib
Update your Hadoop jars and it should then work fine.
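One debugging detail: Hadoop's Lz4Codec does not load a standalone liblz4 at all; the lz4 implementation is compiled into libhadoop itself (which is why "hadoop checknative" reports a revision for lz4 rather than a .so path). Pointing LD_LIBRARY_PATH at /usr/local/Cellar/lz4 therefore cannot help. A quick check, sketched and guarded so it is a no-op without a Hadoop install:

```shell
# Put the directory holding libhadoop itself on the native path, then
# ask Hadoop which native codecs it can actually load.
export LD_LIBRARY_PATH="$HADOOP_HOME/lib/native:$LD_LIBRARY_PATH"
if command -v hadoop >/dev/null 2>&1; then
  hadoop checknative -a | grep lz4
fi
```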
