Continuously shows Capacity Used 90% - hortonworks-data-platform

I have two questions.
How do I mount the directory for Ambari disk usage?
I started running the TeraGen program and it does not go beyond 10% of the map tasks. Ambari continuously shows me the message: Capacity Used: [90.69%, 27.7 GB], Capacity Total: [30.5 GB], path=/usr/hdp. I restarted the cluster and restarted Ambari, but it made no difference.
What is the workaround?

Well,
After some trial and error I found the solution.
You can change the location of the log and local directories to a bigger partition.
Remove the old log files from the Ambari server.
Documented here.
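For anyone hitting the same 90% alert on /usr/hdp, here is a minimal sketch of that clean-up, assuming the default HDP log locations under /var/log; the YARN property names at the end are my assumption about which local/log directories the answer means to repoint:

# see what is actually filling the partition that holds /usr/hdp
df -h /usr/hdp
du -sh /var/log/* 2>/dev/null | sort -h | tail

# remove or archive old, rotated Ambari logs (default locations)
rm -f /var/log/ambari-server/ambari-server.log.*
rm -f /var/log/ambari-agent/ambari-agent.log.*

# then, in Ambari -> YARN -> Configs, point the local and log dirs at a bigger mount,
# e.g. (assumed property names and example paths):
#   yarn.nodemanager.local-dirs = /data/hadoop/yarn/local
#   yarn.nodemanager.log-dirs   = /data/hadoop/yarn/log
# and restart the affected services from Ambari.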

Related

YARN : 1/1 local-dirs are bad Alert

I have an issue where all 3 node managers in my cluster are marked as bad with local dirs bad alerts.
I have seen many answers saying this error is due to YARN reaching its default maximum disk utilization threshold of 90%, but I can assure you I have plenty of space on the YARN disk (only 35% of the disk is used). I suspect the YARN directory is corrupted.
Does anyone know of another cause or solution for this alert, other than YARN reaching its disk threshold value?
I got the solution to this issue.
No user other than the owner had write permission on the folder. I granted the yarn user write permission on the YARN folder and could then run the MapReduce job. All 3 node managers are healthy now.
There are also other scenarios when this can happen (see the checks sketched after this list):
disk is bad or going bad
disk has space, but has exhausted its inodes
disk was mounted read-only
disk is NFS-mounted and the NFS server is down or has lost connectivity
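A rough set of checks for the scenarios above, assuming the NodeManager local dir is /hadoop/yarn/local and the service user/group is yarn:hadoop (adjust to whatever your yarn.nodemanager.local-dirs is set to):

# permissions: the yarn user must be able to write the local dir
ls -ld /hadoop/yarn/local
chown -R yarn:hadoop /hadoop/yarn/local
chmod 755 /hadoop/yarn/local

# free space vs. free inodes (a full inode table also marks the dir bad)
df -h /hadoop/yarn/local
df -i /hadoop/yarn/local

# read-only or NFS mounts: look for 'ro' or 'nfs' in the options
findmnt -T /hadoop/yarn/local
touch /hadoop/yarn/local/.write_test && rm /hadoop/yarn/local/.write_test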

HDFS Showing 0 Blocks after cluster reboot

I've set up a small cluster for testing / academic purposes. I have 3 nodes, one of which is acting both as namenode and datanode (and secondary namenode).
I've uploaded 60 GB of files (about 6.5 million files) and uploads started to get really slow, so I read on the internet that I could stop the secondary namenode service on the main machine; at the time this had no apparent effect on anything.
After I rebooted all 3 computers, two of my datanodes show 0 blocks (despite showing disk usage in the web interface) even with both namenode services running.
One of the nodes with the problem is the one running the namenode as well, so I am guessing it is not a network problem.
Any ideas on how I can get these blocks to be recognized again? (Without starting all over again, which took about two weeks of uploading.)
Update
Half an hour after another reboot, this showed up in the logs:
2018-03-01 08:22:50,212 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Unsuccessfully sent block report 0x199d1a180e357c12, containing 1 storage report(s), of which we sent 0. The reports had 6656617 total blocks and used 0 RPC(s). This took 679 msec to generate and 94 msecs for RPC and NN processing. Got back no commands.
2018-03-01 08:22:50,212 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: IOException in offerService
java.io.EOFException: End of File Exception between local host is: "Warpcore/192.168.15.200"; destination host is: "warpcore":9000; : java.io.EOFException; For more details see: http://wiki.apache.org/hadoop/EOFException
And the EOF stack trace. After searching the web I discovered this [http://community.cloudera.com/t5/CDH-Manual-Installation/CDH-5-5-0-datanode-failed-to-send-a-large-block-report/m-p/34420] but still can't understand how to fix this.
The block report is too big and needs to be split, but I don't know how or where I should configure this. I'm googling...
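For reference, a sketch of the setting that controls this, assuming a Hadoop version that has dfs.blockreport.split.threshold (Hadoop 2.x+/3.x): when a datanode holds more blocks than the threshold, it sends one block report per storage directory instead of one huge RPC. Note this only helps when there is more than one storage directory to split across, which is presumably why the workaround below adds more directories.

# hdfs-site.xml on each datanode (example value, not a recommendation):
#   <property>
#     <name>dfs.blockreport.split.threshold</name>
#     <value>100000</value>
#   </property>
# then restart the datanode so the new value is picked up (Hadoop 3.x syntax):
hdfs --daemon stop datanode
hdfs --daemon start datanode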
The problem seems to be low RAM on my namenode. As a workaround I added more directories to the namenode configuration, as if I had multiple disks, and rebalanced the files manually as instructed in the comments here.
As Hadoop 3.0 reports each disk separately, the datanode was able to send its report and I was able to retrieve the files. This is an ugly workaround, not for production, but good enough for my academic purposes.
An interesting side effect was the datanode reporting the available disk space multiple times, which could lead to serious problems in production.
It seems a better solution is to use HAR to reduce the number of blocks, as described here and here.
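A minimal sketch of the HAR approach mentioned above, with hypothetical paths (/user/test/smallfiles); the archive keeps the data readable while packing the many small files into a handful of large part files:

# pack the small files into a Hadoop Archive (runs a MapReduce job)
hadoop archive -archiveName smallfiles.har -p /user/test smallfiles /user/test/archives

# the archived content stays readable through the har:// scheme
hdfs dfs -ls har:///user/test/archives/smallfiles.har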

Hadoop: Why do I get the "Max Non Heap Memory is -1 B." message in the Namenode Information web UI? What does it mean?

I have a Hadoop 2.6.5 cluster (one master that works as namenode and datanode, and two slaves) that I made using VirtualBox (each node has Xubuntu 16.04 installed).
A priori, the installation is right because I ran a wordcount example and it was OK.
In master:50070 (where I see the namenode information), I get this:
"Max Non Heap Memory is -1 B."
Do you know what it means? I couldn't find the answer, and I want to check it out because after running wordcount I tried to run my own program and it was not successful, despite running OK on my single-node installation of Hadoop.
I hope this is clear; please let me know if you need more information.
Thank you!
Even though I haven't used Hadoop that much, when I go to the web UI I get the same message "Max Non Heap Memory is -1 B." As far as I know, that only means that the maximum non-heap memory is undefined (no limit is configured).
Link to source of my answer
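A quick way to see where the -1 comes from, assuming the default NameNode web port from the question (master:50070): the figure is just the "max" field of the JVM's non-heap MemoryUsage, exposed through the NameNode's /jmx endpoint, and -1 is how the JVM reports that no maximum is defined.

# query the NameNode's JMX servlet for the JVM memory MBean
curl -s 'http://master:50070/jmx?qry=java.lang:type=Memory'
# in the JSON response, NonHeapMemoryUsage looks roughly like:
#   "NonHeapMemoryUsage" : { "committed": ..., "init": ..., "max": -1, "used": ... }
# max = -1 means the JVM enforces no limit on non-heap memory, so the web UI
# renders it as "-1 B"; it is informational, not an error.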

ambari metrics diskspacecheck issue

I am installing a new Hadoop cluster (5 nodes in total) using the Ambari dashboard. While deploying the cluster it fails with disk space warnings and error messages like "'/' needs at least 2GB of disk space for mount". But I have allocated a total of 50 GB of disk to each node. Upon googling for the solution I found that I need to set diskspacecheck=0 in the /etc/yum.conf file, as suggested in the link below (point 3.6):
http://docs.hortonworks.com/HDPDocuments/Ambari-2.1.0.0/bk_ambari_troubleshooting/content/_resolving_ambari_installer_problems.html
But I am using an Ubuntu image on the nodes and there is no yum.conf file, and I didn't find any file with a "diskspacecheck" parameter. Can anybody tell me how to solve this issue and successfully deploy my cluster?
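For reference, a sketch of what point 3.6 of that doc amounts to on a yum-based (RHEL/CentOS) node; this only shows what the suggested change looks like, since Ubuntu uses apt and has no /etc/yum.conf to edit:

# RHEL/CentOS nodes only: add diskspacecheck=0 to the [main] section of yum.conf
# to skip yum's pre-install disk space check
sed -i '/^\[main\]/a diskspacecheck=0' /etc/yum.conf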

Region server going down frequently after system start

I am running HBase on HDP on an Amazon machine.
When I reboot my system and start all HBase services, they start up.
But after some time my region server goes down.
The latest error that I am getting from its log file is:
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /apps/hbase/data/usertable/dd5a251551619e0109349a0dce855e1b/recovered.edits/0000000000000001172.temp could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1657)
Now I am not able to start it.
Any suggestions why this is happening?
Thanks in advance.
Make sure your datanodes are up and running. Also, set "dfs.data.dir" to some permanent location if you haven't done so yet; it defaults to a directory under "/tmp", which gets emptied at each restart. Also make sure that your datanodes are able to talk to the namenode, that there is no network-related issue, and that the datanode machines have enough free space left.
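A minimal sketch of that suggestion, assuming a dedicated data mount at /data/hdfs (hypothetical path) and the hdfs:hadoop service user/group; the property is dfs.data.dir on Hadoop 1.x and dfs.datanode.data.dir on 2.x and later:

# create a permanent data directory owned by the HDFS user
mkdir -p /data/hdfs/dn
chown -R hdfs:hadoop /data/hdfs/dn

# hdfs-site.xml on every datanode -- point block storage away from /tmp:
#   <property>
#     <name>dfs.data.dir</name>   (dfs.datanode.data.dir on Hadoop 2.x+)
#     <value>/data/hdfs/dn</value>
#   </property>
# restart the datanodes afterwards and confirm they register with the namenode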
