I want to re-run an application that was killed or has finished (i.e. an application that is no longer running).
I run the application like this:
$ yarn jar /usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.7.2.jar -jar /usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.7.2.jar -queue queueTEST01 -shell_command "vmstat 60" -container_memory 2048
and found that it saves two files in HDFS.
***
[root@em-name01 logs]# hadoop fs -ls
-rw-r--r-- 3 hadoop supergroup 45579 2016-11-02 11:03 /user/hadoop/DistributedShell/application_1478051489888_0004/AppMaster.jar
-rwx--x--- 3 hadoop supergroup 79 2016-11-02 11:03 /user/hadoop/DistributedShell/application_1478051489888_0004/shellCommands
If the application has been killed or has finished, how can I run the AppMaster.jar again?
Or is there some other way?
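For reference, the two staged files can be read straight from HDFS (the path is taken from the listing above), e.g.:
hadoop fs -cat /user/hadoop/DistributedShell/application_1478051489888_0004/shellCommands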
I am able to connect to the MapR Control System (MCS) on port 8080 (web UI), but where can I see the data file I copied from my local file system?
In the MCS navigation pane I can see the tabs below, but I don't see where to navigate to the data folder.
**Cluster
Dashboard
Nodes
Node Heatmap
Jobs
MapR-FS
MapR Tables
Volumes
Mirror Volumes
User Disk Usage
Snapshot
NFS HA
NFS Setup
Alarms
...
...
..
System Settings
-..
..
..
HBASE
...
Resource Manager
Job History Server
..**
**Hadoop Command:**
]$ hadoop fs -ls -R /user
drwxr-xr-x - mapr mapr 1 2019-06-25 12:47 /user/hive
drwxrwxr-x - mapr mapr 0 2019-06-25 12:47 /user/hive/warehouse
drwxrwxrwx - mapr mapr 6 2019-06-26 11:34 /user/mapr
drwxr-xr-x - mapr mapr 0 2019-06-25 12:14 /user/mapr/.sparkStaging
drwxr-xr-x - mapr mapr 0 2019-06-24 15:40 /user/mapr/hadoop
drwxrwxrwx - mapr mapr 1 2019-06-26 11:48 /user/mapr/rajesh
-rwxr-xr-x 3 webload webload 219 2019-06-26 11:48 /user/mapr/rajesh/sample.txt
-rwxr-xr-x 3 mapr mapr 219 2019-06-25 13:46 /user/mapr/sample.txt
drwxr-xr-x - mapr mapr 0 2019-06-25 12:14 /user/mapr/spark-warehouse
drwxr-xr-x - mapr mapr 1 2019-06-24 13:37 /user/mapr/tmp
drwxrwxrwx - mapr mapr 0 2019-06-24 13:37 /user/mapr/tmp/hive
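The file itself can also be read straight from the shell, independent of the MCS UI. The first path below comes from the listing above; the second assumes MapR-FS is NFS-mounted under /mapr/<cluster-name>, where my.cluster.com is only a placeholder:
hadoop fs -cat /user/mapr/rajesh/sample.txt
cat /mapr/my.cluster.com/user/mapr/rajesh/sample.txt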
I am trying to run a Pig script from HDFS, but it fails with an error saying the file does not exist.
My HDFS directory:
[cloudera@quickstart ~]$ hdfs dfs -ls /
Found 11 items
drwxrwxrwx - hdfs supergroup 0 2016-08-10 14:35 /benchmarks
drwxr-xr-x - hbase supergroup 0 2017-08-19 23:51 /hbase
drwxr-xr-x - cloudera supergroup 0 2017-07-13 04:53 /home
drwxr-xr-x - cloudera supergroup 0 2017-08-27 07:26 /input
drwxr-xr-x - cloudera supergroup 0 2017-07-30 14:30 /output
drwxr-xr-x - solr solr 0 2016-08-10 14:37 /solr
-rw-r--r-- 1 cloudera supergroup 273 2017-08-27 11:59 /success.pig
-rw-r--r-- 1 cloudera supergroup 273 2017-08-27 12:04 /success.script
drwxrwxrwt - hdfs supergroup 0 2017-08-27 12:07 /tmp
drwxr-xr-x - hdfs supergroup 0 2016-09-28 09:00 /user
drwxr-xr-x - hdfs supergroup 0 2016-08-10 14:37 /var
Command executed
[cloudera@quickstart ~]$ pig -x mapreduce /success.pig
Error Message
log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
2017-08-27 12:34:39,160 [main] INFO org.apache.pig.Main - Apache Pig version 0.12.0-cdh5.8.0 (rexported) compiled Jun 16 2016, 12:40:41
2017-08-27 12:34:39,162 [main] INFO org.apache.pig.Main - Logging error messages to: /home/cloudera/pig_1503862479069.log
2017-08-27 12:34:47,079 [main] ERROR org.apache.pig.Main - ERROR 2997: Encountered IOException. File /success.pig does not exist
Details at logfile: /home/cloudera/pig_1503862479069.log
What am I missing?
You can use the -f <script location> option to run a script stored at an HDFS path, but the script location needs to be an absolute URI, as in the following syntax and example.
Syntax:
pig -f <fs.defaultFS>/<script path in hdfs>
Example:
pig -f hdfs://Foton/user/root/script.pig
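Applied to the listing in the question, that would look something like the following; hdfs://quickstart.cloudera:8020 is the usual fs.defaultFS on the Cloudera QuickStart VM, so adjust it if your configuration differs:
pig -x mapreduce -f hdfs://quickstart.cloudera:8020/success.pig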
I have set up my Hadoop cluster with 1 NameNode and 2 DataNodes, and everything works perfectly :)
Now I want to add a Hadoop edge node (a.k.a. Hadoop gateway). I followed the instructions here, and finally I ran:
hadoop fs -ls /
Unfortunately, where I expected to see the contents of HDFS, I see my local FS instead:
Found 22 items
-rw-r--r-- 1 root root 0 2017-03-30 16:44 /autorelabel
dr-xr-xr-x - root root 20480 2017-03-30 16:49 /bin
...
drwxr-xr-x - root root 20480 2016-07-08 17:31 /home
I think my core-site.xml is configured as needed, with this property:
<property>
<name>fs.default.name</name>
<value>hdfs://hadoopnodemaster1:8020/</value>
</property>
hadoopnodemaster1 is my NameNode and is reachable.
I don't understand why I see my local FS and not HDFS. Thank you :)
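One quick check is to see which default filesystem the client on the edge node actually resolves; if it prints file:/// (or the listing shows the local root), the gateway is not picking up your core-site.xml, for example because HADOOP_CONF_DIR points elsewhere. Note that fs.default.name is the deprecated alias of fs.defaultFS, but it is still honored:
hdfs getconf -confKey fs.defaultFS
hadoop fs -ls hdfs://hadoopnodemaster1:8020/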
I was facing issues copying local files into HDFS and found that the datanode and namenode directories had "drwx------" permissions.
Initial permissions of the datanode and namenode directories under /usr/local/hadoop_store/hdfs:
drwx------ 3 hduser hadoop 4096 Mar 2 16:45 datanode
drwxr-xr-x 3 hduser hadoop 4096 Mar 2 17:30 namenode
I changed the datanode permissions to 755:
hduser@pradeep:~$ chmod -R 755 /usr/local/hadoop_store/hdfs/
hduser@pradeep:~$ ls -l /usr/local/hadoop_store/hdfs/
total 8
drwxr-xr-x 3 hduser hadoop 4096 Mar 2 16:45 datanode
drwxr-xr-x 3 hduser hadoop 4096 Mar 2 17:30 namenode
After running start-dfs.sh, the datanode didn't start, and its permissions were restored to their original state.
hduser@pradeep:~$ $HADOOP_HOME/sbin/start-dfs.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /usr/local/hadoop/logs/hadoop-hduser-namenode-pradeep.out
localhost: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hduser-datanode-pradeep.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-hduser-secondarynamenode-pradeep.out
hduser@pradeep:~$ jps
4385 Jps
3903 NameNode
4255 SecondaryNameNode
hduser@pradeep:~$ ls -l /usr/local/hadoop_store/hdfs/
total 8
drwx------ 3 hduser hadoop 4096 Mar 2 22:34 datanode
drwxr-xr-x 3 hduser hadoop 4096 Mar 2 22:34 namenode
Since the datanode is not running, I am not able to copy data into HDFS from the local file system. I can't understand or find any reason why the permissions are restored to their previous state only for the datanode folder.
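The DataNode's own log usually states exactly why it exited; a quick look, with the path inferred from the start-dfs.sh output above (same name with a .log suffix instead of .out):
tail -n 50 /usr/local/hadoop/logs/hadoop-hduser-datanode-pradeep.log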
It appears the namespaceID recorded by the NameNode is different from the one on your DataNode.
Solution:
Go to the path where your Hadoop data is stored on the local file system, for example /usr/local/hadoop. Go down to /usr/local/hadoop/tmp/dfs/name/current/VERSION, copy the namespaceID, then open /usr/local/hadoop/tmp/dfs/data/current/VERSION and replace the namespaceID there with the NameNode's value.
I hope this helps.
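For the directory layout shown in the question, a rough sketch of that check looks like this; depending on the Hadoop version the field to compare is namespaceID or clusterID, and HDFS should be stopped while the DataNode's VERSION file is edited:
cat /usr/local/hadoop_store/hdfs/namenode/current/VERSION   # note the namespaceID / clusterID
cat /usr/local/hadoop_store/hdfs/datanode/current/VERSION   # compare with the value above
$HADOOP_HOME/sbin/stop-dfs.sh
# edit datanode/current/VERSION so its ID matches the NameNode's, then restart
$HADOOP_HOME/sbin/start-dfs.sh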
I have a question. When I run my Spark task locally with hive-jdbc, I can connect to Hive, but when I run it on the cluster with spark-submit, it fails:
Exception in thread "main" java.sql.SQLException: Could not establish connection to jdbc:hive2://172.16.28.99:10000/vdm_da_dev.db: Required field 'client_protocol' is unset! Struct:TOpenSessionReq(client_protocol:null, configuration:{use:database=vdm_da_dev.db})"
The client is hive-jdbc-1.1.0-cdh5.6.0, and the server is hive-1.1.0-cdh5.6.0.
But Spark is spark-2.0.0, and the hive-jdbc in /opt/spark/jars is hive-jdbc-1.2.1.spark2.jar.
I replaced it with hive-jdbc-1.1.0-cdh5.6.0.jar on all nodes, but I still get the same error.
I packaged the project with its dependencies, but spark-submit did not use them.
How can I fix this?
Please, can anyone help me? Thanks very much.
You can use the guide from Cloudera for additional parameters:
running spark apps on cloudera
In general, spark-submit should look like:
spark-submit --class *class_main* \
--master yarn \
--deploy-mode cluster \
--conf "key=value" \
--files path_to_spark_conf/hive-site.xml \
--jars full_path/additional_dependency_jars \
app_package.jar
In --jars you may also need to provide datanucleus-core, datanucleus-rdbms, and datanucleus-api-jdo in order to work with Hive and the Hive metastore.
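A sketch of how those jars could be located and passed on a CDH parcel install; the parcel path, the class name, and the hive-site.xml location are placeholders to adapt to your cluster:
HIVE_LIB=/opt/cloudera/parcels/CDH/lib/hive/lib              # assumed parcel location
DN_JARS=$(echo "$HIVE_LIB"/datanucleus-*.jar | tr ' ' ',')   # comma-separated list for --jars
spark-submit --class your.main.Class \
  --master yarn --deploy-mode cluster \
  --files /etc/hive/conf/hive-site.xml \
  --jars "$DN_JARS" \
  app_package.jar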
Thanks very much, FaigB!
I did as @FaigB told me, but I found that the dependency jars were not on the dependency classpath.
Then I removed all the Hive dependency jars from /opt/spark/jars/ and copied the matching ones from /opt/cloudera/parcels/CDH-5.6.0-1.cdh5.6.0.p0.45/jars/. That fixed the version mismatch.
Like this:
-rw-r--r--. 1 spark spark 138464 Aug 19 13:55 hive-beeline-1.2.1.spark2.jar_bk
-rw-r--r--. 1 spark spark 37995 Nov 15 01:00 hive-cli-1.1.0-cdh5.6.0.jar
-rw-r--r--. 1 spark spark 40817 Aug 19 13:55 hive-cli-1.2.1.spark2.jar_bk
-rw-r--r--. 1 spark spark 11498852 Aug 19 13:55 hive-exec-1.2.1.spark2.jar_bk
-rw-r--r--. 1 spark spark 95006 Nov 9 14:59 hive-jdbc-1.1.0-cdh5.6.0.jar
-rw-r--r--. 1 spark spark 100680 Aug 19 13:55 hive-jdbc-1.2.1.spark2.jar_bk
-rw-r--r--. 1 spark spark 5505200 Aug 19 13:55 hive-metastore-1.2.1.spark2.jar_bk
-rw-r--r--. 1 spark spark 1874242 Nov 15 01:00 hive-service-1.1.0-cdh5.6.0.jar
-rw-r--r--. 1 spark spark 1033763 Aug 19 13:55 spark-hive_2.11-2.0.0.jar_bk
-rw-r--r--. 1 spark spark 1813851 Aug 19 13:55 spark-hive-thriftserver_2.11-2.0.0.jar_bk
[spark@d2 jars]$ cp hive-service-1.1.0-cdh5.6.0.jar /opt/spark/jars/
[spark@d2 jars]$ cp hive-cli-1.1.0-cdh5.6.0.jar /opt/spark/jars/
[spark@d2 jars]$ cp spark-hive_2.10-1.5.0-cdh5.6.0.jar /opt/spark/jars/
[spark@d2 jars]$ cp hive-exec-1.1.0-cdh5.6.0.jar /opt/spark/jars/
[spark@d2 jars]$ cp hive-metastore-1.1.0-cdh5.6.0.jar /opt/spark/jars/
[spark@d2 jars]$ cp hive-beeline-1.1.0-cdh5.6.0.jar /opt/spark/jars/
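The renaming that produced the *_bk files above is not shown; a rough sketch of the whole swap, with the jar names taken from the listing:
cd /opt/spark/jars
# park the Spark-bundled Hive jars out of the way
for j in hive-beeline-1.2.1.spark2.jar hive-cli-1.2.1.spark2.jar hive-exec-1.2.1.spark2.jar \
         hive-jdbc-1.2.1.spark2.jar hive-metastore-1.2.1.spark2.jar \
         spark-hive_2.11-2.0.0.jar spark-hive-thriftserver_2.11-2.0.0.jar; do
  mv "$j" "${j}_bk"
done
# copy in the CDH builds that match the cluster's HiveServer2
cd /opt/cloudera/parcels/CDH-5.6.0-1.cdh5.6.0.p0.45/jars
cp hive-jdbc-1.1.0-cdh5.6.0.jar hive-service-1.1.0-cdh5.6.0.jar hive-cli-1.1.0-cdh5.6.0.jar \
   hive-exec-1.1.0-cdh5.6.0.jar hive-metastore-1.1.0-cdh5.6.0.jar hive-beeline-1.1.0-cdh5.6.0.jar \
   spark-hive_2.10-1.5.0-cdh5.6.0.jar /opt/spark/jars/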