How to run pig scripts from HDFS? - hadoop

I am trying to run pig script from the hdfs but it shows error as the file does not exist.
My hdfs Directory
[cloudera#quickstart ~]$ hdfs dfs -ls /
Found 11 items
drwxrwxrwx - hdfs supergroup 0 2016-08-10 14:35 /benchmarks
drwxr-xr-x - hbase supergroup 0 2017-08-19 23:51 /hbase
drwxr-xr-x - cloudera supergroup 0 2017-07-13 04:53 /home
drwxr-xr-x - cloudera supergroup 0 2017-08-27 07:26 /input
drwxr-xr-x - cloudera supergroup 0 2017-07-30 14:30 /output
drwxr-xr-x - solr solr 0 2016-08-10 14:37 /solr
-rw-r--r-- 1 cloudera supergroup 273 2017-08-27 11:59 /success.pig
-rw-r--r-- 1 cloudera supergroup 273 2017-08-27 12:04 /success.script
drwxrwxrwt - hdfs supergroup 0 2017-08-27 12:07 /tmp
drwxr-xr-x - hdfs supergroup 0 2016-09-28 09:00 /user
drwxr-xr-x - hdfs supergroup 0 2016-08-10 14:37 /var
Command executed
[cloudera#quickstart ~]$ pig -x mapreduce /success.pig
Error Message
log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
2017-08-27 12:34:39,160 [main] INFO org.apache.pig.Main - Apache Pig version 0.12.0-cdh5.8.0 (rexported) compiled Jun 16 2016, 12:40:41
2017-08-27 12:34:39,162 [main] INFO org.apache.pig.Main - Logging error messages to: /home/cloudera/pig_1503862479069.log
2017-08-27 12:34:47,079 [main] ERROR org.apache.pig.Main - ERROR 2997: Encountered IOException. File /success.pig does not exist
Details at logfile: /home/cloudera/pig_1503862479069.log
What am I missing ?

You may use -f <script location> option and option value to run script located at HDFS path. But script location need to be absolute path as given in following syntax and example.
Syntax:
pig -f <fs.defaultFS>/<script path in hdfs>
Example:
pig -f hdfs://Foton/user/root/script.pig

Related

Not able to configure databricks with external hive metastore

I am following this document https://docs.databricks.com/data/metastores/external-hive-metastore.html#spark-configuration-options
to connect to my external hive metastore. My metastore version is 3.1.0 and followed the document.
docs.databricks.comdocs.databricks.com
External Apache Hive metastore — Databricks Documentation
Learn how to connect to external Apache Hive metastores in Databricks.
10:51
I have getting this error when trying to connect to external hive metastore
org/apache/hadoop/hive/conf/HiveConf when creating Hive client using classpath:
Please make sure that jars for your version of hive and hadoop are included in the paths passed to spark.sql.hive.metastore.jars
spark.sql.hive.metastore.jars=/databricks/hive_metastore_jars/*
When I do an ls on /databricks/hive_metastore_jars/, I can see all copied files
10:52
Do I need to copy any hive specific files and upload it in this folder?
I did exactly what was mentioned in the site
These are the contents of my hive_metastore_jars
total 56K
drwxr-xr-x 3 root root 4.0K Mar 24 05:06 1585025573715-0
drwxr-xr-x 2 root root 4.0K Mar 24 05:06 d596a6ec-e105-4a6e-af95-df3feffc263d_resources
drwxr-xr-x 3 root root 4.0K Mar 24 05:06 repl
drwxr-xr-x 2 root root 4.0K Mar 24 05:06 spark-2959157d-2018-441a-a7d3-d7cecb8a645f
drwxr-xr-x 4 root root 4.0K Mar 24 05:06 root
drwxr-xr-x 2 root root 4.0K Mar 24 05:06 spark-30a72ee5-304c-432b-9c13-0439511fb0cd
drwxr-xr-x 2 root root 4.0K Mar 24 05:06 spark-a19d167b-d571-4e58-a961-d7f6ced3d52f
-rwxr-xr-x 1 root root 5.5K Mar 24 05:06 _CleanRShell.r3763856699176668909resource.r
-rwxr-xr-x 1 root root 9.7K Mar 24 05:06 _dbutils.r9057087446822479911resource.r
-rwxr-xr-x 1 root root 301 Mar 24 05:06 _rServeScript.r1949348184439973964resource.r
-rwxr-xr-x 1 root root 1.5K Mar 24 05:06 _startR.sh5660449951005543051resource.r
Am I missing anything?
Strangely If I look into the cluster boot logs here is what I get
20/03/24 07:29:05 INFO Persistence: Property spark.hadoop.javax.jdo.option.ConnectionDriverName unknown - will be ignored
20/03/24 07:29:05 INFO Persistence: Property spark.hadoop.javax.jdo.option.ConnectionURL unknown - will be ignored
20/03/24 07:29:05 INFO Persistence: Property spark.hadoop.javax.jdo.option.ConnectionUserName unknown - will be ignored
20/03/24 07:29:05 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
20/03/24 07:29:05 INFO Persistence: Property spark.hadoop.javax.jdo.option.ConnectionPassword unknown - will be ignored
20/03/24 07:29:05 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
20/03/24 07:29:05 INFO Persistence: Property datanucleus.schema.autoCreateAll unknown - will be ignored
20/03/24 07:29:09 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
20/03/24 07:29:09 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
I have already set the above configurations and it shows in the logs as well
20/03/24 07:28:59 INFO SparkContext: Spark configuration:
spark.hadoop.javax.jdo.option.ConnectionDriverName=org.mariadb.jdbc.Driver
spark.hadoop.javax.jdo.option.ConnectionPassword=*********(redacted)
spark.hadoop.javax.jdo.option.ConnectionURL=*********(redacted)
spark.hadoop.javax.jdo.option.ConnectionUserName=*********(redacted)
Also version information is available in my hive metastore, I can connect to mysql and see it it shows
SCHEMA_VERSION : 3.1.0
VER_ID = 1
From the output, it looks like the jars are not copied to the "/databricks/hive_metastore_jars/" location. As mentioned in the documentation link you shared:
Set spark.sql.hive.metastore.jars set to maven
Restart the cluster with the above configuration and then check in the Spark driver logs for the message :
17/11/18 22:41:19 INFO IsolatedClientLoader: Downloaded metastore jars to <path>
From this location copy the jars to DBFS from the same cluster and then use an init script to copy the jars from DBFS to "/databricks/hive_metastore_jars/"
As I am using azure mysql there is one more step I need to perform
https://learn.microsoft.com/en-us/azure/databricks/data/metastores/external-hive-metastore

Where to use the HDFS data in Web UI - MapR

I am able to connect to Mapr Control System - MCS - port 8080 (Web - UI) but where can i see the data file i copied from my local files system.??
On the MCS Navigation Pane , I can see the below tabs but i dont see where to navigate to data folder.
**Cluster
Dashboard
Nodes
Node Heatmap
Jobs
MapR-FS
MapR Tables
Volumes
Mirror Volumes
USer Disk USage
Snapshot
NFS HA
NFS Setup
Alarms
...
...
..
system Settings
-..
..
..
HBASE
...
Resource Manager
Job History Server
..**
**Hadoop Command:**
]$ hadoop fs -ls -R /user
drwxr-xr-x - mapr mapr 1 2019-06-25 12:47 /user/hive
drwxrwxr-x - mapr mapr 0 2019-06-25 12:47 /user/hive/warehouse
drwxrwxrwx - mapr mapr 6 2019-06-26 11:34 /user/mapr
drwxr-xr-x - mapr mapr 0 2019-06-25 12:14 /user/mapr/.sparkStaging
drwxr-xr-x - mapr mapr 0 2019-06-24 15:40 /user/mapr/hadoop
drwxrwxrwx - mapr mapr 1 2019-06-26 11:48 /user/mapr/rajesh
-rwxr-xr-x 3 webload webload 219 2019-06-26 11:48 /user/mapr/rajesh/sample.txt
-rwxr-xr-x 3 mapr mapr 219 2019-06-25 13:46 /user/mapr/sample.txt
drwxr-xr-x - mapr mapr 0 2019-06-25 12:14 /user/mapr/spark-warehouse
drwxr-xr-x - mapr mapr 1 2019-06-24 13:37 /user/mapr/tmp
drwxrwxrwx - mapr mapr 0 2019-06-24 13:37 /user/mapr/tmp/hive

HDInsight Oozie 4.2.0.2.5 Spark2 Action Jackson collision

I was following the following guide:
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.0/bk_spark-component-guide/content/ch_oozie-spark-action.html#spark-config-oozie-spark2
This enabled me to configure the following workflow.xml
<workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.3">
<start to = "Raw-To-Parquet" />
<action name="Raw-To-Parquet">
<spark xmlns="uri:oozie:spark-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<!--
<prepare>
<delete path="[PATH]"/>
<mkdir path="[PATH]"/>
</prepare>
<job-xml>[SPARK SETTINGS FILE]</job-xml>
<configuration>
<property>
<name>[PROPERTY-NAME]</name>
<value>[PROPERTY-VALUE]</value>
</property>
</configuration>
-->
<master>${master}</master>
<!--
<mode>[SPARK MODE]</mode>
-->
<name>Raw-To-Parquet</name>
<class>org.apache.spark.examples.SparkPi</class>
<jar>${nameNode}/spark-examples_2.11-2.0.2.2.5.6.3-5.jar</jar>
<spark-opts>--conf spark.yarn.jars=spark2/* --conf spark.driver.extraJavaOptions=-Dhdp.version=2.5.6.3-5 --conf spark.executor.extraJavaOptions=-Dhdp.version=2.5.6.3-5</spark-opts>
<!--SPARK_JAVA_OPTS="-Dhdp.verion=xxx"
<arg>[ARG-VALUE]</arg>
<arg>[ARG-VALUE]</arg>
-->
</spark>
<ok to="End"/>
<error to="Fail"/>
</action>
<kill name = "Fail">
<message>Job failed</message>
</kill>
<end name = "End" />
</workflow-app>
Job.properties
master=yarn-cluster
nameNode=wasb://hdi-adam-ak#hdiadamakstore.blob.core.windows.net
jobTracker=hn1-hdi-ad.hcgue2snotaezkuexzoymd0nlh.ax.internal.cloudapp.net:8088
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/project-example/oozie
oozie.libpath=${nameNode}/user/oozie/share/lib
oozie.action.sharelib.for.spark=spark2
Which begins a workflow, but then dies due to a collision with Jackson jars.
18/05/02 12:39:12 ERROR ApplicationMaster: User class threw exception: java.lang.NoSuchMethodError: com.fasterxml.jackson.databind.JavaType.isReferenceType()Z
java.lang.NoSuchMethodError: com.fasterxml.jackson.databind.JavaType.isReferenceType()Z
at com.fasterxml.jackson.databind.ser.BasicSerializerFactory.findSerializerByLookup(BasicSerializerFactory.java:302)
at com.fasterxml.jackson.databind.ser.BeanSerializerFactory._createSerializer2(BeanSerializerFactory.java:218)
at com.fasterxml.jackson.databind.ser.BeanSerializerFactory.createSerializer(BeanSerializerFactory.java:153)
at com.fasterxml.jackson.databind.SerializerProvider._createUntypedSerializer(SerializerProvider.java:1203)
at com.fasterxml.jackson.databind.SerializerProvider._createAndCacheUntypedSerializer(SerializerProvider.java:1157)
at com.fasterxml.jackson.databind.SerializerProvider.findValueSerializer(SerializerProvider.java:481)
at com.fasterxml.jackson.databind.SerializerProvider.findTypedValueSerializer(SerializerProvider.java:679)
at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:107)
at com.fasterxml.jackson.databind.ObjectMapper._configAndWriteValue(ObjectMapper.java:3559)
at com.fasterxml.jackson.databind.ObjectMapper.writeValueAsString(ObjectMapper.java:2927)
at org.apache.spark.rdd.RDDOperationScope.toJson(RDDOperationScope.scala:52)
See the below to see the contents of my oozie sharelibs
oozie#hn0-hdi-ad:/quantexa/oozie$ hadoop fs -ls -R /user/oozie/share/lib/lib_20180502121937 | grep jackson
-rw-r--r-- 1 oozie supergroup 38605 2018-05-02 12:20 /user/oozie/share/lib/lib_20180502121937/distcp/jackson-annotations-2.4.0.jar
-rw-r--r-- 1 oozie supergroup 225302 2018-05-02 12:20 /user/oozie/share/lib/lib_20180502121937/distcp/jackson-core-2.4.4.jar
-rw-r--r-- 1 oozie supergroup 1076926 2018-05-02 12:20 /user/oozie/share/lib/lib_20180502121937/distcp/jackson-databind-2.4.4.jar
-rw-r--r-- 1 oozie supergroup 38605 2018-05-02 12:19 /user/oozie/share/lib/lib_20180502121937/hcatalog/jackson-annotations-2.4.0.jar
-rw-r--r-- 1 oozie supergroup 225302 2018-05-02 12:19 /user/oozie/share/lib/lib_20180502121937/hcatalog/jackson-core-2.4.4.jar
-rw-r--r-- 1 oozie supergroup 232248 2018-05-02 12:19 /user/oozie/share/lib/lib_20180502121937/hcatalog/jackson-core-asl-1.9.13.jar
-rw-r--r-- 1 oozie supergroup 1076926 2018-05-02 12:19 /user/oozie/share/lib/lib_20180502121937/hcatalog/jackson-databind-2.4.4.jar
-rw-r--r-- 1 oozie supergroup 780664 2018-05-02 12:19 /user/oozie/share/lib/lib_20180502121937/hcatalog/jackson-mapper-asl-1.9.13.jar
-rw-r--r-- 1 oozie supergroup 38605 2018-05-02 12:20 /user/oozie/share/lib/lib_20180502121937/hive/jackson-annotations-2.4.0.jar
-rw-r--r-- 1 oozie supergroup 225302 2018-05-02 12:20 /user/oozie/share/lib/lib_20180502121937/hive/jackson-core-2.4.4.jar
-rw-r--r-- 1 oozie supergroup 1076926 2018-05-02 12:20 /user/oozie/share/lib/lib_20180502121937/hive/jackson-databind-2.4.4.jar
-rw-r--r-- 1 oozie supergroup 18336 2018-05-02 12:20 /user/oozie/share/lib/lib_20180502121937/hive/jackson-jaxrs-1.9.13.jar
-rw-r--r-- 1 oozie supergroup 27084 2018-05-02 12:20 /user/oozie/share/lib/lib_20180502121937/hive/jackson-xc-1.9.13.jar
-rw-r--r-- 1 oozie supergroup 38605 2018-05-02 12:20 /user/oozie/share/lib/lib_20180502121937/hive2/jackson-annotations-2.4.0.jar
-rw-r--r-- 1 oozie supergroup 225302 2018-05-02 12:20 /user/oozie/share/lib/lib_20180502121937/hive2/jackson-core-2.4.4.jar
-rw-r--r-- 1 oozie supergroup 1076926 2018-05-02 12:20 /user/oozie/share/lib/lib_20180502121937/hive2/jackson-databind-2.4.4.jar
-rw-r--r-- 1 oozie supergroup 38605 2018-05-02 12:19 /user/oozie/share/lib/lib_20180502121937/mapreduce-streaming/jackson-annotations-2.4.0.jar
-rw-r--r-- 1 oozie supergroup 225302 2018-05-02 12:19 /user/oozie/share/lib/lib_20180502121937/mapreduce-streaming/jackson-core-2.4.4.jar
-rw-r--r-- 1 oozie supergroup 1076926 2018-05-02 12:19 /user/oozie/share/lib/lib_20180502121937/mapreduce-streaming/jackson-databind-2.4.4.jar
-rw-r--r-- 1 oozie supergroup 38605 2018-05-02 12:19 /user/oozie/share/lib/lib_20180502121937/oozie/jackson-annotations-2.4.0.jar
-rw-r--r-- 1 oozie supergroup 225302 2018-05-02 12:19 /user/oozie/share/lib/lib_20180502121937/oozie/jackson-core-2.4.4.jar
-rw-r--r-- 1 oozie supergroup 1076926 2018-05-02 12:19 /user/oozie/share/lib/lib_20180502121937/oozie/jackson-databind-2.4.4.jar
-rw-r--r-- 1 oozie supergroup 38605 2018-05-02 12:20 /user/oozie/share/lib/lib_20180502121937/pig/jackson-annotations-2.4.0.jar
-rw-r--r-- 1 oozie supergroup 225302 2018-05-02 12:20 /user/oozie/share/lib/lib_20180502121937/pig/jackson-core-2.4.4.jar
-rw-r--r-- 1 oozie supergroup 232248 2018-05-02 12:20 /user/oozie/share/lib/lib_20180502121937/pig/jackson-core-asl-1.9.13.jar
-rw-r--r-- 1 oozie supergroup 1076926 2018-05-02 12:20 /user/oozie/share/lib/lib_20180502121937/pig/jackson-databind-2.4.4.jar
-rw-r--r-- 1 oozie supergroup 18336 2018-05-02 12:20 /user/oozie/share/lib/lib_20180502121937/pig/jackson-jaxrs-1.9.13.jar
-rw-r--r-- 1 oozie supergroup 780664 2018-05-02 12:20 /user/oozie/share/lib/lib_20180502121937/pig/jackson-mapper-asl-1.9.13.jar
-rw-r--r-- 1 oozie supergroup 27084 2018-05-02 12:20 /user/oozie/share/lib/lib_20180502121937/pig/jackson-xc-1.9.13.jar
-rw-r--r-- 1 oozie supergroup 46983 2018-05-02 12:21 /user/oozie/share/lib/lib_20180502121937/spark2/jackson-annotations-2.6.5.jar
-rw-r--r-- 1 oozie supergroup 258876 2018-05-02 12:21 /user/oozie/share/lib/lib_20180502121937/spark2/jackson-core-2.6.5.jar
-rw-r--r-- 1 oozie supergroup 232248 2018-05-02 12:21 /user/oozie/share/lib/lib_20180502121937/spark2/jackson-core-asl-1.9.13.jar
-rw-r--r-- 1 oozie supergroup 1171380 2018-05-02 12:21 /user/oozie/share/lib/lib_20180502121937/spark2/jackson-databind-2.6.5.jar
-rw-r--r-- 1 oozie supergroup 48418 2018-05-02 12:21 /user/oozie/share/lib/lib_20180502121937/spark2/jackson-dataformat-cbor-2.6.5.jar
-rw-r--r-- 1 oozie supergroup 18336 2018-05-02 12:21 /user/oozie/share/lib/lib_20180502121937/spark2/jackson-jaxrs-1.9.13.jar
-rw-r--r-- 1 oozie supergroup 780664 2018-05-02 12:21 /user/oozie/share/lib/lib_20180502121937/spark2/jackson-mapper-asl-1.9.13.jar
-rw-r--r-- 1 oozie supergroup 41263 2018-05-02 12:21 /user/oozie/share/lib/lib_20180502121937/spark2/jackson-module-paranamer-2.6.5.jar
-rw-r--r-- 1 oozie supergroup 515604 2018-05-02 12:21 /user/oozie/share/lib/lib_20180502121937/spark2/jackson-module-scala_2.11-2.6.5.jar
-rw-r--r-- 1 oozie supergroup 27084 2018-05-02 12:21 /user/oozie/share/lib/lib_20180502121937/spark2/jackson-xc-1.9.13.jar
-rw-r--r-- 1 oozie supergroup 40341 2018-05-02 12:21 /user/oozie/share/lib/lib_20180502121937/spark2/json4s-jackson_2.11-3.2.11.jar
-rw-r--r-- 1 oozie supergroup 1048110 2018-05-02 12:21 /user/oozie/share/lib/lib_20180502121937/spark2/parquet-jackson-1.7.0.jar
-rw-r--r-- 1 oozie supergroup 38605 2018-05-02 12:20 /user/oozie/share/lib/lib_20180502121937/spark_orig/jackson-annotations-2.4.0.jar
-rw-r--r-- 1 oozie supergroup 225302 2018-05-02 12:20 /user/oozie/share/lib/lib_20180502121937/spark_orig/jackson-core-2.4.4.jar
-rw-r--r-- 1 oozie supergroup 232248 2018-05-02 12:20 /user/oozie/share/lib/lib_20180502121937/spark_orig/jackson-core-asl-1.9.13.jar
-rw-r--r-- 1 oozie supergroup 1076926 2018-05-02 12:20 /user/oozie/share/lib/lib_20180502121937/spark_orig/jackson-databind-2.4.4.jar
-rw-r--r-- 1 oozie supergroup 780664 2018-05-02 12:20 /user/oozie/share/lib/lib_20180502121937/spark_orig/jackson-mapper-asl-1.9.13.jar
-rw-r--r-- 1 oozie supergroup 549415 2018-05-02 12:20 /user/oozie/share/lib/lib_20180502121937/spark_orig/jackson-module-scala_2.10-2.4.4.jar
-rw-r--r-- 1 oozie supergroup 39953 2018-05-02 12:20 /user/oozie/share/lib/lib_20180502121937/spark_orig/json4s-jackson_2.10-3.2.10.jar
-rw-r--r-- 1 oozie supergroup 1048110 2018-05-02 12:20 /user/oozie/share/lib/lib_20180502121937/spark_orig/parquet-jackson-1.7.0.jar
-rw-r--r-- 1 oozie supergroup 38605 2018-05-02 12:19 /user/oozie/share/lib/lib_20180502121937/sqoop/jackson-annotations-2.4.0.jar
-rw-r--r-- 1 oozie supergroup 225302 2018-05-02 12:19 /user/oozie/share/lib/lib_20180502121937/sqoop/jackson-core-2.4.4.jar
-rw-r--r-- 1 oozie supergroup 1076926 2018-05-02 12:19 /user/oozie/share/lib/lib_20180502121937/sqoop/jackson-databind-2.4.4.jar
You can then see the contents of the container which YARN creates:
root#wn0-hdi-ad:/mnt/resource/hadoop/yarn/local/usercache/oozie/appcache# ll application_1525249303830_0045/container_1525249303830_0045_01_000001/
total 1020
drwx--x--- 3 yarn hadoop 20480 May 2 12:38 ./
drwx--x--- 16 yarn hadoop 4096 May 2 12:40 ../
//removed due to word count
lrwxrwxrwx 1 yarn hadoop 85 May 2 12:38 __app__.jar -> /mnt/resource/hadoop/yarn/local/filecache/261/spark-examples_2.11-2.0.2.2.5.6.3-5.jar*
lrwxrwxrwx 1 yarn hadoop 75 May 2 12:38 jackson-annotations-2.4.0.jar -> /mnt/resource/hadoop/yarn/local/filecache/498/jackson-annotations-2.4.0.jar*
lrwxrwxrwx 1 yarn hadoop 75 May 2 12:38 jackson-annotations-2.6.5.jar -> /mnt/resource/hadoop/yarn/local/filecache/504/jackson-annotations-2.6.5.jar*
lrwxrwxrwx 1 yarn hadoop 68 May 2 12:38 jackson-core-2.4.4.jar -> /mnt/resource/hadoop/yarn/local/filecache/293/jackson-core-2.4.4.jar*
lrwxrwxrwx 1 yarn hadoop 68 May 2 12:38 jackson-core-2.6.5.jar -> /mnt/resource/hadoop/yarn/local/filecache/304/jackson-core-2.6.5.jar*
lrwxrwxrwx 1 yarn hadoop 73 May 2 12:38 jackson-core-asl-1.9.13.jar -> /mnt/resource/hadoop/yarn/local/filecache/467/jackson-core-asl-1.9.13.jar*
lrwxrwxrwx 1 yarn hadoop 72 May 2 12:38 jackson-databind-2.4.4.jar -> /mnt/resource/hadoop/yarn/local/filecache/507/jackson-databind-2.4.4.jar*
lrwxrwxrwx 1 yarn hadoop 72 May 2 12:38 jackson-databind-2.6.5.jar -> /mnt/resource/hadoop/yarn/local/filecache/514/jackson-databind-2.6.5.jar*
lrwxrwxrwx 1 yarn hadoop 79 May 2 12:38 jackson-dataformat-cbor-2.6.5.jar -> /mnt/resource/hadoop/yarn/local/filecache/357/jackson-dataformat-cbor-2.6.5.jar*
lrwxrwxrwx 1 yarn hadoop 70 May 2 12:38 jackson-jaxrs-1.9.13.jar -> /mnt/resource/hadoop/yarn/local/filecache/509/jackson-jaxrs-1.9.13.jar*
lrwxrwxrwx 1 yarn hadoop 75 May 2 12:38 jackson-mapper-asl-1.9.13.jar -> /mnt/resource/hadoop/yarn/local/filecache/446/jackson-mapper-asl-1.9.13.jar*
lrwxrwxrwx 1 yarn hadoop 80 May 2 12:38 jackson-module-paranamer-2.6.5.jar -> /mnt/resource/hadoop/yarn/local/filecache/373/jackson-module-paranamer-2.6.5.jar*
lrwxrwxrwx 1 yarn hadoop 81 May 2 12:38 jackson-module-scala_2.11-2.6.5.jar -> /mnt/resource/hadoop/yarn/local/filecache/391/jackson-module-scala_2.11-2.6.5.jar*
lrwxrwxrwx 1 yarn hadoop 67 May 2 12:38 jackson-xc-1.9.13.jar -> /mnt/resource/hadoop/yarn/local/filecache/392/jackson-xc-1.9.13.jar*
//removed due to word count
lrwxrwxrwx 1 yarn hadoop 85 May 2 12:38 spark-catalyst_2.11-2.0.2.2.5.6.3-5.jar -> /mnt/resource/hadoop/yarn/local/filecache/296/spark-catalyst_2.11-2.0.2.2.5.6.3-5.jar*
lrwxrwxrwx 1 yarn hadoop 82 May 2 12:38 spark-cloud_2.11-2.0.2.2.5.6.3-5.jar -> /mnt/resource/hadoop/yarn/local/filecache/351/spark-cloud_2.11-2.0.2.2.5.6.3-5.jar*
lrwxrwxrwx 1 yarn hadoop 81 May 2 12:38 __spark_conf__ -> /mnt/resource/hadoop/yarn/local/usercache/oozie/filecache/1630/__spark_conf__.zip/
lrwxrwxrwx 1 yarn hadoop 81 May 2 12:38 spark-core_2.11-2.0.2.2.5.6.3-5.jar -> /mnt/resource/hadoop/yarn/local/filecache/482/spark-core_2.11-2.0.2.2.5.6.3-5.jar*
lrwxrwxrwx 1 yarn hadoop 83 May 2 12:38 spark-graphx_2.11-2.0.2.2.5.6.3-5.jar -> /mnt/resource/hadoop/yarn/local/filecache/295/spark-graphx_2.11-2.0.2.2.5.6.3-5.jar*
lrwxrwxrwx 1 yarn hadoop 81 May 2 12:38 spark-hive_2.11-2.0.2.2.5.6.3-5.jar -> /mnt/resource/hadoop/yarn/local/filecache/453/spark-hive_2.11-2.0.2.2.5.6.3-5.jar*
lrwxrwxrwx 1 yarn hadoop 94 May 2 12:38 spark-hive-thriftserver_2.11-2.0.2.2.5.6.3-5.jar -> /mnt/resource/hadoop/yarn/local/filecache/317/spark-hive-thriftserver_2.11-2.0.2.2.5.6.3-5.jar*
lrwxrwxrwx 1 yarn hadoop 85 May 2 12:38 spark-launcher_2.11-2.0.2.2.5.6.3-5.jar -> /mnt/resource/hadoop/yarn/local/filecache/374/spark-launcher_2.11-2.0.2.2.5.6.3-5.jar*
lrwxrwxrwx 1 yarn hadoop 82 May 2 12:38 spark-mllib_2.11-2.0.2.2.5.6.3-5.jar -> /mnt/resource/hadoop/yarn/local/filecache/359/spark-mllib_2.11-2.0.2.2.5.6.3-5.jar*
lrwxrwxrwx 1 yarn hadoop 88 May 2 12:38 spark-mllib-local_2.11-2.0.2.2.5.6.3-5.jar -> /mnt/resource/hadoop/yarn/local/filecache/291/spark-mllib-local_2.11-2.0.2.2.5.6.3-5.jar*
lrwxrwxrwx 1 yarn hadoop 91 May 2 12:38 spark-network-common_2.11-2.0.2.2.5.6.3-5.jar -> /mnt/resource/hadoop/yarn/local/filecache/433/spark-network-common_2.11-2.0.2.2.5.6.3-5.jar*
lrwxrwxrwx 1 yarn hadoop 92 May 2 12:38 spark-network-shuffle_2.11-2.0.2.2.5.6.3-5.jar -> /mnt/resource/hadoop/yarn/local/filecache/333/spark-network-shuffle_2.11-2.0.2.2.5.6.3-5.jar*
lrwxrwxrwx 1 yarn hadoop 81 May 2 12:38 spark-repl_2.11-2.0.2.2.5.6.3-5.jar -> /mnt/resource/hadoop/yarn/local/filecache/425/spark-repl_2.11-2.0.2.2.5.6.3-5.jar*
lrwxrwxrwx 1 yarn hadoop 83 May 2 12:38 spark-sketch_2.11-2.0.2.2.5.6.3-5.jar -> /mnt/resource/hadoop/yarn/local/filecache/407/spark-sketch_2.11-2.0.2.2.5.6.3-5.jar*
lrwxrwxrwx 1 yarn hadoop 80 May 2 12:38 spark-sql_2.11-2.0.2.2.5.6.3-5.jar -> /mnt/resource/hadoop/yarn/local/filecache/329/spark-sql_2.11-2.0.2.2.5.6.3-5.jar*
lrwxrwxrwx 1 yarn hadoop 86 May 2 12:38 spark-streaming_2.11-2.0.2.2.5.6.3-5.jar -> /mnt/resource/hadoop/yarn/local/filecache/452/spark-streaming_2.11-2.0.2.2.5.6.3-5.jar*
lrwxrwxrwx 1 yarn hadoop 81 May 2 12:38 spark-tags_2.11-2.0.2.2.5.6.3-5.jar -> /mnt/resource/hadoop/yarn/local/filecache/423/spark-tags_2.11-2.0.2.2.5.6.3-5.jar*
lrwxrwxrwx 1 yarn hadoop 83 May 2 12:38 spark-unsafe_2.11-2.0.2.2.5.6.3-5.jar -> /mnt/resource/hadoop/yarn/local/filecache/428/spark-unsafe_2.11-2.0.2.2.5.6.3-5.jar*
lrwxrwxrwx 1 yarn hadoop 81 May 2 12:38 spark-yarn_2.11-2.0.2.2.5.6.3-5.jar -> /mnt/resource/hadoop/yarn/local/filecache/478/spark-yarn_2.11-2.0.2.2.5.6.3-5.jar*
lrwxrwxrwx 1 yarn hadoop 66 May 2 12:38 spire_2.11-0.7.4.jar -> /mnt/resource/hadoop/yarn/local/filecache/300/spire_2.11-0.7.4.jar*
lrwxrwxrwx 1 yarn hadoop 73 May 2 12:38 spire-macros_2.11-0.7.4.jar -> /mnt/resource/hadoop/yarn/local/filecache/297/spire-macros_2.11-0.7.4.jar*
lrwxrwxrwx 1 yarn hadoop 59 May 2 12:38 ST4-4.0.4.jar -> /mnt/resource/hadoop/yarn/local/filecache/289/ST4-4.0.4.jar*
lrwxrwxrwx 1 yarn hadoop 64 May 2 12:38 stax-api-1.0.1.jar -> /mnt/resource/hadoop/yarn/local/filecache/458/stax-api-1.0.1.jar*
lrwxrwxrwx 1 yarn hadoop 64 May 2 12:38 stax-api-1.0-2.jar -> /mnt/resource/hadoop/yarn/local/filecache/404/stax-api-1.0-2.jar*
lrwxrwxrwx 1 yarn hadoop 62 May 2 12:38 stream-2.7.0.jar -> /mnt/resource/hadoop/yarn/local/filecache/345/stream-2.7.0.jar*
lrwxrwxrwx 1 yarn hadoop 70 May 2 12:38 stringtemplate-3.2.1.jar -> /mnt/resource/hadoop/yarn/local/filecache/315/stringtemplate-3.2.1.jar*
lrwxrwxrwx 1 yarn hadoop 65 May 2 12:38 super-csv-2.2.0.jar -> /mnt/resource/hadoop/yarn/local/filecache/469/super-csv-2.2.0.jar*
drwx--x--- 2 yarn hadoop 4096 May 2 12:38 tmp/
lrwxrwxrwx 1 yarn hadoop 73 May 2 12:38 univocity-parsers-2.1.1.jar -> /mnt/resource/hadoop/yarn/local/filecache/416/univocity-parsers-2.1.1.jar*
lrwxrwxrwx 1 yarn hadoop 76 May 2 12:38 validation-api-1.1.0.Final.jar -> /mnt/resource/hadoop/yarn/local/filecache/500/validation-api-1.1.0.Final.jar*
lrwxrwxrwx 1 yarn hadoop 71 May 2 12:38 xbean-asm5-shaded-4.4.jar -> /mnt/resource/hadoop/yarn/local/filecache/493/xbean-asm5-shaded-4.4.jar*
lrwxrwxrwx 1 yarn hadoop 66 May 2 12:38 xercesImpl-2.9.1.jar -> /mnt/resource/hadoop/yarn/local/filecache/476/xercesImpl-2.9.1.jar*
lrwxrwxrwx 1 yarn hadoop 61 May 2 12:38 xmlenc-0.52.jar -> /mnt/resource/hadoop/yarn/local/filecache/491/xmlenc-0.52.jar*
lrwxrwxrwx 1 yarn hadoop 56 May 2 12:38 xz-1.0.jar -> /mnt/resource/hadoop/yarn/local/filecache/456/xz-1.0.jar*
lrwxrwxrwx 1 yarn hadoop 75 May 2 12:38 zookeeper-3.4.6.2.5.6.3-5.jar -> /mnt/resource/hadoop/yarn/local/filecache/449/zookeeper-3.4.6.2.5.6.3-5.jar*
So from the above it seems that regardless of
oozie.action.sharelib.for.spark=spark2
in the Job.properties, YARN/Oozie is loading all of the jars including the old version of jackson into the container. I am setting --conf spark.yarn.jars=spark2/* on the spark job itself too.
So I think that Oozie is spawning a map-reduce job with all of the oozie sharelib jar. This job then spawns a new container for the Spark action which contains all the jars causing the collision. I need the spark container to only include the spark jars.
I am using an Oozie version < 4.3.1
see below fixes:
https://issues.apache.org/jira/browse/OOZIE-2606
https://issues.apache.org/jira/browse/OOZIE-2658
https://issues.apache.org/jira/browse/OOZIE-2787
https://issues.apache.org/jira/browse/OOZIE-2802
These fixes ensure that the Spark container only contains the correct Jars on the path avoiding the Jackson collision.

Unable to start datanode and file permissions of datanode are changing when start-dfs.sh is started

I was facing issues in deploying local files to hdfs and found that I should have "drwx------" for datanode and namenode.
Initial permission status of datanode and namenode in hdfs.
drwx------ 3 hduser hadoop 4096 Mar 2 16:45 datanode
drwxr-xr-x 3 hduser hadoop 4096 Mar 2 17:30 namenode
Permission of datanode is changed to 755
hduser#pradeep:~$ chmod -R 755 /usr/local/hadoop_store/hdfs/
hduser#pradeep:~$ ls -l /usr/local/hadoop_store/hdfs/
total 8
drwxr-xr-x 3 hduser hadoop 4096 Mar 2 16:45 datanode
drwxr-xr-x 3 hduser hadoop 4096 Mar 2 17:30 namenode
After initiating start-dfs.sh, datanode didn't start and permissions to datanode were restored to original state.
hduser#pradeep:~$ $HADOOP_HOME/sbin/start-dfs.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /usr/local/hadoop/logs/hadoop- hduser-namenode-pradeep.out
localhost: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hduser-datanode-pradeep.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-hduser-secondarynamenode-pradeep.out
hduser#pradeep:~$ jps
4385 Jps
3903 NameNode
4255 SecondaryNameNode
hduser#pradeep:~$ ls -l /usr/local/hadoop_store/hdfs/
total 8
drwx------ 3 hduser hadoop 4096 Mar 2 22:34 datanode
drwxr-xr-x 3 hduser hadoop 4096 Mar 2 22:34 namenode
As datanode is not running i am not able to deploy data to hdfs from local file system. I couldn't understand or find any reason why the file permissions are restored to previous state only for datanode folder.
it appears the name space id generated by the NameNode is different from your DataNode.
Solution:
if you goto the path where your hadoop files are stored on the local file system.
forexample /usr/local/hadoop. go down the path to /usr/local/hadoop/tmp/dfs/name/version. copy the namespaceid and take it to the path /usr/local/hadoop/tmp/dfs/data/version , replace the namespaceid.
i hope this helps.

how to run appmaster.jar in yarn

I want to run some application which is killed or finished (not running application).
I run application like this :
$ yarn jar /usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.7.2.jar -jar /usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.7.2.jar -queue queueTEST01 -shell_command "vmstat 60" -container_memory 2048
and figure out that it saves two files in hdfs.
***
[root#em-name01 logs]# hadoop fs -ls
-rw-r--r-- 3 hadoop supergroup 45579 2016-11-02 11:03 /user/hadoop/DistributedShell/application_1478051489888_0004/AppMaster.jar
-rwx--x--- 3 hadoop supergroup 79 2016-11-02 11:03 /user/hadoop/DistributedShell/application_1478051489888_0004/shellCommands
If the application killed or finished, how can I run the AppMaster.jar again???
Or is here some other way?

Resources