The YARN Timeline Server writes its logs under /var/log/hadoop-yarn. We see two types of log files:
hadoop-yarn-timelineserver-<host_ip>*.log*
hadoop-yarn-timelineserver-<host_ip>*.out*
The disk is filling up because the .out file grows indefinitely, which results in disk-full errors.
Is there any way to rotate the .out file, or to put a size limit on it?
Can you check your DataNode and NameNode logs for any "IllegalAccessException" errors? If so, the issue you are facing is related to a known Hadoop bug involving Jersey 1.9. Temporary workarounds exist, such as upgrading to a newer version of Jersey.
https://issues.apache.org/jira/browse/HADOOP-11461
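For reference, a quick way to check for that exception (a hedged sketch; the log directory and file-name patterns below are assumptions, adjust them to your install):
# search recent NameNode and DataNode logs for the Jersey-related exception
grep -l "IllegalAccessException" /var/log/hadoop-hdfs/hadoop-*-namenode-*.log /var/log/hadoop-hdfs/hadoop-*-datanode-*.log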
Related
I am trying to move my non-HA namenode to HA. After setting up all the configurations for the JournalNodes by following the Apache Hadoop documentation, I was able to bring the namenodes up. However, the namenodes crash immediately, throwing the following error:
ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
java.io.IOException: There appears to be a gap in the edit log. We expected txid 43891997, but got txid 45321534.
I tried to recover the edit logs, initialize the shared edits, etc., but nothing works. I am not sure how to fix this problem without formatting the namenode, since I do not want to lose any data.
Any help is greatly appreciated. Thanks in advance.
The problem was the limit on open files on the Linux machine. I increased the open-file limit, and then the initialization of the shared edits worked.
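For anyone who hits the same thing, this is roughly what was involved (a sketch; the user name and limit values are illustrative and depend on your setup):
# check the current open-file limit for the user running the NameNode/JournalNode
ulimit -n
# raise it persistently by adding lines like these to /etc/security/limits.conf
#   hdfs  soft  nofile  64000
#   hdfs  hard  nofile  64000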
I am using an Apache Hadoop 2.7.1 cluster which consists of 4 datanodes and two namenodes, because it is highly available.
It is deployed on CentOS 7 and has been running since 01-08-2017.
We know that logs are generated for each service, so let's take the current logs as an example:
hadoop-root-datanode-dn1.log
hadoop-root-datanode-dn2.log
where root is the user I am logging in with.
My problem is:
in the dn1 log I can find info from 01-08-2017 until today,
but the dn2 log does not have the complete info; it is emptied every day, so it only contains info related to today.
Is there any property to control this behavior, or is it a CentOS problem?
Any help, please?
By default, the .log files are rotated daily by log4j. This is configurable with /etc/hadoop/conf/log4j.properties.
https://blog.cloudera.com/blog/2009/09/apache-hadoop-log-files-where-to-find-them-in-cdh-and-what-info-they-contain/
Not to suggest you're running a Cloudera cluster, but if you were, those files would not be deleted; they are rolled and renamed.
Oh, and I would suggest not running your daemons as root. Most Hadoop installation guides explicitly have you create an hdfs or hadoop user.
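To make the rotation explicit, the relevant part of /etc/hadoop/conf/log4j.properties usually looks roughly like the lines below (a sketch; appender names and values vary by distribution, and the size-based variant is only an illustration of how to cap how much each .log file can grow):
# daily rolling appender (the stock setup for the .log files)
log4j.appender.DRFA=org.apache.log4j.DailyRollingFileAppender
log4j.appender.DRFA.File=${hadoop.log.dir}/${hadoop.log.file}
log4j.appender.DRFA.DatePattern=.yyyy-MM-dd
log4j.appender.DRFA.layout=org.apache.log4j.PatternLayout
log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
# size-based alternative, if you prefer a hard cap per file
log4j.appender.RFA=org.apache.log4j.RollingFileAppender
log4j.appender.RFA.File=${hadoop.log.dir}/${hadoop.log.file}
log4j.appender.RFA.MaxFileSize=256MB
log4j.appender.RFA.MaxBackupIndex=20
log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n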
I deleted multiple old files (Hive logs / MR-job intermediate files) from the HDFS location /temp/hive-user/hive_2015*.
After that, I noticed that my four-node cluster was responding very slowly and had the following issues.
I restarted my cluster; it worked fine for 3-4 hours and then started showing the same issues again:
The HDFS dfshealth page loads very slowly.
File browsing is very slow.
The NameNode logs are filling up with "does not belong to any file" messages.
All operations on the cluster are slow.
I found that it could be because I deleted HDFS files: according to HDFS JIRAs 7815 and 7480, since I deleted a huge number of files, the NameNode could not delete the blocks properly, as it was busy with multiple deletion tasks. This is a known bug in older versions of Hadoop (older than 2.6.0).
Can anyone please suggest a quick fix that does not require upgrading my Hadoop cluster or installing a patch?
How can I identify those orphaned blocks and delete them from HDFS?
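Not a direct fix, but as a hedged diagnostic sketch while the NameNode works through the deletion backlog: hdfs fsck will not list blocks that no longer belong to any file, but it does confirm whether the files that remain are still healthy.
# overall filesystem health report (missing/corrupt blocks show up in the summary)
hdfs fsck /
# list files that have corrupt blocks, if any
hdfs fsck / -list-corruptfileblocks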
I am installing a new Hadoop cluster (5 nodes in total) using the Ambari dashboard. The cluster deployment fails with disk-space warnings and error messages such as "'/' needs at least 2GB of disk space for the mount", even though I have allocated a total of 50GB of disk to each node. Googling for a solution, I found that I should set diskspacecheck=0 in /etc/yum.conf, as suggested in the link below (point 3.6):
http://docs.hortonworks.com/HDPDocuments/Ambari-2.1.0.0/bk_ambari_troubleshooting/content/_resolving_ambari_installer_problems.html
But I am using an Ubuntu image on the nodes, so there is no yum.conf, and I could not find any file with a "diskspacecheck" parameter. Can anybody tell me how to solve this issue and successfully deploy my cluster?
We are using Cloudera CDH 5.3. I am facing a problem where the size of "/dfs/dn/current/Bp-12345-IpAddress-123456789/dncp-block-verification.log.curr" and "dncp-block-verification.log.prev" keeps increasing to TBs within hours. I read in some blogs that this is an HDFS bug. A temporary workaround is to stop the datanode services and delete these files, but we have observed that the log file starts growing again on one datanode or another (even on the same node after it is deleted), so it requires continuous monitoring.
Does anyone have a permanent solution to this problem?
One solution, although slightly drastic, is to disable the block scanner entirely by setting the HDFS DataNode configuration key dfs.datanode.scan.period.hours to 0 (the default is 504 hours). The negative effect of this is that your DataNodes may not auto-detect corrupted block files (and would have to wait for a future block-reading client to detect them instead); this isn't a big deal if your average replication factor is around 3, but you can treat the change as a short-term one until you upgrade to a release that fixes the issue.
Note that this problem will not happen if you upgrade to CDH 5.4.x or a higher release, which includes the HDFS-7430 rewrite and its associated bug fixes. Those changes did away with the use of such a local file, thereby removing the problem.
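For reference, the property from the first paragraph goes into the DataNode's hdfs-site.xml roughly as below (a sketch; if the cluster is managed by Cloudera Manager, the equivalent is usually set through the hdfs-site.xml safety valve for the DataNode role):
<!-- disable the periodic DataNode block scanner; 0 means never scan (default is 504 hours) -->
<property>
  <name>dfs.datanode.scan.period.hours</name>
  <value>0</value>
</property>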