Is there any way to keep the Job History Server from showing our MR application information? - hadoop

I tried spark.eventlog.dir=false, and then the Spark History Server stopped showing any information.
Is there a similar way to make the Job History Server not show our application information, like the Spark History Server does when spark.eventlog.dir is set to false?
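(For reference, the Spark property that actually switches event logging on and off is spark.eventLog.enabled, note the camel case; spark.eventLog.dir only sets the log location. A minimal spark-defaults.conf sketch to disable logging:)

```
spark.eventLog.enabled  false
```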

This can be done by setting the below highlighted property to any location other than the default one configured in mapred-site.xml:
yarn jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi -D mapreduce.map.memory.mb=512 -D mapreduce.reduce.memory.mb=512 -Dmapreduce.jobhistory.intermediate-done-dir=<<"new-location">> 2 10
When this is done, the Job History Server can't move logs from the intermediate done dir to the done dir, since it reads from the location configured in mapred-site.xml.
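For reference, the property this trick overrides lives in mapred-site.xml alongside the final done dir; a sketch of the two entries (the paths shown are illustrative, not necessarily your cluster's defaults):

```xml
<property>
  <name>mapreduce.jobhistory.intermediate-done-dir</name>
  <value>/mr-history/tmp</value>
</property>
<property>
  <name>mapreduce.jobhistory.done-dir</name>
  <value>/mr-history/done</value>
</property>
```

Because the history server only scans the intermediate-done-dir configured here, pointing a job's -D override somewhere else means the server never picks that job up.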

Related

Apache Spark: History server (logging) + non super-user access (HDFS)

I have a working HDFS and a running Spark framework in a remote server.
I am running SparkR applications and hope to see the logs of the completed UI as well.
I followed all the instructions here: Windows: Apache Spark History Server Config
and was able to start the History Server on the server.
However, the logging to the HDFS path only succeeds, and the Spark History Web UI is only viewable, when the super-user (the person who started the Hadoop name node and the Spark processes) fires a Spark application remotely.
When I run the same application from my own user ID (remotely), port 18080 shows that a History Server is up and running, but it does not log any of my applications.
I have been given read, write and execute access to the folder in HDFS.
The spark-defaults.conf file now looks like this:
spark.eventLog.enabled true
spark.history.fs.logDirectory hdfs://XX.XX.XX.XX:19000/user/logs
spark.eventLog.dir hdfs://XX.XX.XX.XX:19000/user/logs
spark.history.ui.acls.enable false
spark.history.fs.cleaner.enabled true
spark.history.fs.cleaner.interval 1d
spark.history.fs.cleaner.maxAge 7d
Am I missing some permissions or config settings somewhere (Spark? HDFS)?
Any pointers/tips to proceed from here would be appreciated.
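A hedged sketch of one likely fix, assuming the problem is directory permissions on the shared event-log directory: give it mode 1777 (world-writable with the sticky bit, like /tmp), so every user can write their own logs but only owners can delete them. On HDFS that would be `hdfs dfs -chmod 1777 /user/logs` (path taken from the question); the demo below just shows the mode on a local directory:

```shell
# Demo on a local directory; on the cluster you would run the
# HDFS equivalent: hdfs dfs -chmod 1777 /user/logs
mkdir -p /tmp/spark-events-demo
chmod 1777 /tmp/spark-events-demo
stat -c '%a' /tmp/spark-events-demo   # prints 1777
```

With 1777 on the log directory, non-superuser applications can create their event-log files there, which is usually what makes them appear in the History Server.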

spark history server does not show jobs or stages

We are trying to use the Spark History Server to further improve our Spark jobs. The job correctly writes its event log into HDFS, and the Spark History Server can access it: we do see the job in the history server's job listing, but aside from the environment variables and executors everything is empty...
Any ideas on how we can make the Spark History Server show everything (we really want to see the DAG, for instance)?
We are using Spark 1.4.1.
Thanks.
I had a similar issue. I browse the history server through SSH port forwarding. After granting read permission to all the files in the log directory, they appeared in my history server!
cd {SPARK_EVENT_LOG_DIR}
chmod +r *  # grant read permission to all users for all files

Hadoop 2.2.0 Web UI not showing Job Progress

I have installed single-node Hadoop 2.2.0 from this link. When I run a job from the terminal, it works fine and produces output. The web UIs I used:
- Resource Manager : http://localhost:8088
- Namenode Daemon : http://localhost:50070
But from the Resource Manager's web UI (above) I can't see job progress such as Submitted Jobs, Running Jobs, etc.
My /etc/hosts file is as follows:
127.0.0.1 localhost
127.0.1.1 meitpict
My system has IP 192.168.2.96 (I tried removing this IP, but it still didn't work).
The only host:port I mention is in core-site.xml, and that is:
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:54310</value>
</property>
I even get these problems while executing a Map-Reduce job.
The best thing I found: while executing a MapReduce job, the console from which you submitted it prints a link for checking the job's progress, something like this:
http://node1:19888/jobhistory/job/job_1409462238335_0002/mapreduce/app
Here node1 resolves to 192.168.56.101 in my hosts file entry, which is the NameNode box.
So while your MapReduce job is running, you can go to the UI link provided by the framework.
And once it is open, do not close it; there you can also find details about the other jobs, when they started and when they finished, etc.
So next time, check your PuTTY console output after submitting the MapReduce job; you will definitely see a link for checking the current job's status from the browser UI.
In Hadoop 2.x this problem could be related to memory issues; see MapReduce in Hadoop 2.2.0 not working.
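If it is a memory problem, the usual suspects are the YARN container limits; a hedged yarn-site.xml sketch for a small single-node box (the values here are illustrative assumptions, size them to your machine's RAM):

```xml
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>2048</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>256</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>2048</value>
</property>
```

If the map/reduce container requests (mapreduce.map.memory.mb, mapreduce.reduce.memory.mb) exceed the maximum allocation, jobs can hang without ever showing progress in the UI.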

JobHistory server in Hadoop 2 could not load history file from HDFS

Error message looks like this:
Could not load history file hdfs://namenodeha:8020/mr-history/tmp/hdfs/job_1392049860497_0005-1392129567754-hdfs-word+count-1392129599308-1-1-SUCCEEDED-default.jhist
Actually, I know the answer to the problem. The default ownership of the /mr-history files is set with:
hadoop fs -chown -R $MAPRED_USER:$HDFS_USER /mr-history
But when a job runs (under $HDFS_USER), the job file is saved to /mr-history/tmp/hdfs owned by $HDFS_USER:$HDFS_USER and is then not accessible to $MAPRED_USER (the user running the JobHistory server). After changing the permissions back again, the job file can be loaded.
But it happens again with every new job. So can someone tell me the permanent solution to this? Thank you.
I ran into the same problem.
As a workaround I added the $MAPRED_USER user to the $HDFS_USER group; that helped.
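Two hedged sketches of a more permanent fix, using the paths and user variables from the question (run them on the cluster as appropriate superusers; these are cluster-admin commands, not something to run blindly):

```shell
# Option 1: make the intermediate done dir world-writable with the
# sticky bit, which is the usual recommendation for this directory,
# so files written by any job stay readable/movable by the history server
hadoop fs -chmod -R 1777 /mr-history/tmp

# Option 2: make the group workaround permanent by adding the
# JobHistory user to the HDFS user's group on the history-server host
usermod -a -G "$HDFS_USER" "$MAPRED_USER"
```

Option 1 avoids the per-job ownership problem entirely, since permissions on the directory no longer depend on which user submitted the job.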

Oozie job submission fails

I am trying to submit an example MapReduce Oozie job. All the properties are configured properly with regard to the path, name node, job-tracker port, etc. I validated the workflow.xml too. When I deploy the job, I get a job ID, and when I check the status I see KILLED; the details basically say that
/var/tmp/oozie/oozie-oozi7188507762062318929.dir/map-reduce-launcher.jar does not exist.
In order to resolve this error, just create the HDFS folders and give them appropriate permissions.
http://kadirsert.blogspot.com.tr/2014/03/oozie-says-jar-does-not-exist.html
The local file system (not HDFS) should have a '/var/tmp/oozie' directory.
If the directory doesn't exist, create it and restart the Oozie server. Then a lot of files appear under /var/tmp/oozie, including the *-launcher.jar files.
'/var/tmp/oozie' is the value of the -Djava.io.tmpdir variable in the Oozie server start-up command line. You can check the value with 'ps -ef | grep oozie' on the host where the Oozie server is running.
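A small sketch of that check. Since grepping a live Oozie process only works on the host where the server runs, the example below parses a sample command line (the string is hypothetical) the same way you would parse the `ps -ef | grep oozie` output:

```shell
# Hypothetical Oozie server command line, standing in for `ps -ef` output
cmdline='java -Xmx1024m -Djava.io.tmpdir=/var/tmp/oozie -jar oozie.war'
# Split on spaces and pull out the java.io.tmpdir value
tmpdir=$(printf '%s\n' "$cmdline" | tr ' ' '\n' | sed -n 's/^-Djava\.io\.tmpdir=//p')
echo "$tmpdir"   # prints /var/tmp/oozie
```

Whatever directory this prints on your server is the one that must exist on the local file system for the launcher jars to be created.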