How to print logs directly to the console in yarn-cluster mode with Spark on Hadoop

I am new to Spark and I want to print logs to the console when running Apache Spark in yarn-cluster mode.

You need to check the value in the log4j.properties file. In my case I have this file in the /etc/spark/conf.dist directory:
log4j.rootCategory=INFO,console
INFO prints all the logs to the console. You can change the value to ERROR or WARN to limit the information you would like to see on the console, as Spark's logs can be overwhelming.
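For reference, a fuller log4j.properties along these lines might look like the following (a sketch based on the standard Spark template; the logger names quieted at the end are just common noisy packages, not something specific to this setup):
# Send everything WARN and above to the console
log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
# Quiet particularly chatty packages while keeping application logs visible
log4j.logger.org.apache.spark=WARN
log4j.logger.org.spark_project.jetty=ERROR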

Related

Where do YARN application logs get stored in EMR before being sent to S3?

I have a requirement to write YARN application logs from EMR to a source other than S3. Can you please help me find where application logs get saved on the EMR master instance?
If the application is submitted to EMR as a step, then the logs will reside in:
/var/log/hadoop/steps/<<step-id>>/<<log-file>>
Most logs for EMR can be found under the /var/log directory on the master node.
You can also use the YARN CLI to get the application logs and redirect the returned log stream to a file, to do whatever you want with:
yarn logs -applicationId <<application_id>> > application_log_file.log
YARN logs are found at /var/log/hadoop-yarn/, and YARN container logs are found at /var/log/hadoop-yarn/container
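If you do not already know the application id, one way to find it and then save the aggregated logs (a sketch; the application id below is made up):
# List recently finished applications to find the id you want
yarn application -list -appStates FINISHED
# Dump that application's aggregated logs to a local file
yarn logs -applicationId application_1234567890123_0001 > application_log_file.log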
Links:
https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-debugging.html
https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-manage-view-web-log-files.html

Getting "User [dr.who] is not authorized to view the logs for application <AppID>" while running a YARN application

I'm running a custom YARN application using Apache Twill on an HDP 2.5 cluster, but I'm not able to see my own container logs (syslog, stderr and stdout) when I go to my container web page.
Also, the login changes from my Kerberos principal to "dr.who" when I navigate to this page.
But I can see the logs of map-reduce jobs. The Hadoop version is 2.7.3 and the cluster has YARN ACLs enabled.
I had this issue with the Hadoop UI. I found in the documentation that hadoop.http.staticuser.user is set to dr.who by default, and you need to override it in the relevant configuration file (in my case, core-site.xml).
A late answer, but I hope it is useful.
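For reference, the override goes into core-site.xml and looks something like this (a sketch; replace the value with whichever user should be allowed to view the logs on your cluster):
<property>
  <name>hadoop.http.staticuser.user</name>
  <value>your_user_name</value>
</property>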

Apache Spark: History server (logging) + non super-user access (HDFS)

I have a working HDFS and a running Spark framework in a remote server.
I am running SparkR applications and want to see the UI of completed applications as well.
I followed all the instructions here: Windows: Apache Spark History Server Config
and was able to start the History Server on the server.
However, the event logging only succeeds in the HDFS path, and the application is only visible in the Spark History Web UI, when the super-user (the person who started the Hadoop NameNode and the Spark processes) fires a Spark application remotely.
When I run the same application remotely from my own user ID, port 18080 shows that a History Server is up and running, but none of my applications get logged.
I have been given read, write and execute access to the folder in HDFS.
The spark-defaults.conf file now looks like this:
spark.eventLog.enabled true
spark.history.fs.logDirectory hdfs://XX.XX.XX.XX:19000/user/logs
spark.eventLog.dir hdfs://XX.XX.XX.XX:19000/user/logs
spark.history.ui.acls.enable false
spark.history.fs.cleaner.enabled true
spark.history.fs.cleaner.interval 1d
spark.history.fs.cleaner.maxAge 7d
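For reference, this is how the permissions on the event log directory can be inspected from the command line (a sketch; the chmod shown is just one commonly suggested option for shared event log directories, not something I have applied):
# Check ownership and permission bits on the event log directory
hdfs dfs -ls hdfs://XX.XX.XX.XX:19000/user/
# One commonly suggested setup: world-writable with the sticky bit, like /tmp
hdfs dfs -chmod 1777 hdfs://XX.XX.XX.XX:19000/user/logs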
Am I missing out on some permissions or config settings somewhere(Spark? HDFS)?
Any pointers/tips to proceed from here would be appreciated.

Unable to start hive in terminal

I am using Hive for querying and data processing on my Hadoop main node, but I am not able to start Hive in the terminal; it hangs for a long time and does not start, as shown below.
#hive
Logging initialized using configuration in file:/etc/hive/2.3.4.7-4/0/hive-log4j.properties
You can look up the actual problem in the Hive log files.
Hive uses log4j for logging. By default logs are not emitted to the console by the CLI. The default logging level is WARN for Hive releases prior to 0.13.0. Starting with Hive 0.13.0, the default logging level is INFO.
The logs are stored in the directory /tmp/<user.name>:
/tmp/<user.name>/hive.log
Note: In local mode, prior to Hive 0.13.0 the log file name was ".log" instead of "hive.log".
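For example, you can tail the log for your user to see where startup is stuck, or restart the CLI with logging sent to the console (a sketch; paths follow the defaults described above):
# Tail the current user's Hive log while the CLI hangs
tail -f /tmp/$USER/hive.log
# Or start the CLI with log output going to the console instead of the log file
hive --hiveconf hive.root.logger=INFO,console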

I cannot see the running applications in hadoop 2.5.2 (yarn)

I installed Hadoop 2.5.2, and I can run the wordcount sample successfully. However, when I want to see an application running on YARN (a job running), I cannot: the All Applications interface is always empty (shown in the following screen).
Is there anyway to make the jobs visible?
Please try localhost:19888, or check the value of the job history web URL property (mapreduce.jobhistory.webapp.address) configured in your mapred-site.xml.
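For reference, that property lives in mapred-site.xml, and the JobHistory server must actually be running for port 19888 to respond (a sketch; replace localhost with your JobHistory host):
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>localhost:19888</value>
</property>
If the daemon is not running, it can be started with the Hadoop 2.x script:
mr-jobhistory-daemon.sh start historyserver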
