Failed to start data node in Hadoop Cluster - hadoop

I am trying to install CDH 4.6 on my cluster, which has 3 nodes.
One of the 3 data nodes is not able to start at all.
I have tried searching and solving this every way I could, but failed.
Please help me solve this.
Below is the log.
5:49:10.708 PM FATAL org.apache.hadoop.hdfs.server.datanode.DataNode
Exception in secureMain
java.io.IOException: the path component: '/' is world-writable. Its permissions are 0777. Please fix this or select a different socket path.
at org.apache.hadoop.net.unix.DomainSocket.validateSocketPathSecurity0(Native Method)
at org.apache.hadoop.net.unix.DomainSocket.bindAndListen(DomainSocket.java:191)
at org.apache.hadoop.hdfs.net.DomainPeerServer.<init>(DomainPeerServer.java:42)
at org.apache.hadoop.hdfs.server.datanode.DataNode.getDomainPeerServer(DataNode.java:603)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initDataXceiver(DataNode.java:570)
at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:741)
at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:344)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1795)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1728)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1751)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1904)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1925)
5:49:10.723 PM INFO org.apache.hadoop.util.ExitUtil
Exiting with status 1
5:49:10.725 PM INFO org.apache.hadoop.hdfs.server.datanode.DataNode
SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at xx.xx.xxx.xxxxx

Have you confirmed that your root filesystem is not set to 777 permissions?
These are the correct permissions for root (/):
[root@server ~]# ls -Ald /
dr-xr-xr-x. 29 root root 4096 Feb 20 13:53 /
If instead you see this, then your root filesystem needs to be chmod 555:
[root@server ~]# ls -Ald /
drwxrwxrwx. 29 root root 4096 Feb 20 13:53 /

Changing the permissions of the root filesystem to 755 will also resolve the issue.
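For reference, a minimal sketch of the fix on the affected data node (either 555, as shown above, or 755 works, since the requirement is only that '/' not be world-writable):
chmod 755 /
ls -Ald /   # verify the mode is no longer 0777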

Related

Not able to configure databricks with external hive metastore

I am following this document https://docs.databricks.com/data/metastores/external-hive-metastore.html#spark-configuration-options
to connect to my external hive metastore. My metastore version is 3.1.0 and I followed the document.
I am getting this error when trying to connect to the external hive metastore:
org/apache/hadoop/hive/conf/HiveConf when creating Hive client using classpath:
Please make sure that jars for your version of hive and hadoop are included in the paths passed to spark.sql.hive.metastore.jars
spark.sql.hive.metastore.jars=/databricks/hive_metastore_jars/*
When I do an ls on /databricks/hive_metastore_jars/, I can see all copied files
Do I need to copy any Hive-specific files and upload them to this folder?
I did exactly what was mentioned on the site.
These are the contents of my hive_metastore_jars:
total 56K
drwxr-xr-x 3 root root 4.0K Mar 24 05:06 1585025573715-0
drwxr-xr-x 2 root root 4.0K Mar 24 05:06 d596a6ec-e105-4a6e-af95-df3feffc263d_resources
drwxr-xr-x 3 root root 4.0K Mar 24 05:06 repl
drwxr-xr-x 2 root root 4.0K Mar 24 05:06 spark-2959157d-2018-441a-a7d3-d7cecb8a645f
drwxr-xr-x 4 root root 4.0K Mar 24 05:06 root
drwxr-xr-x 2 root root 4.0K Mar 24 05:06 spark-30a72ee5-304c-432b-9c13-0439511fb0cd
drwxr-xr-x 2 root root 4.0K Mar 24 05:06 spark-a19d167b-d571-4e58-a961-d7f6ced3d52f
-rwxr-xr-x 1 root root 5.5K Mar 24 05:06 _CleanRShell.r3763856699176668909resource.r
-rwxr-xr-x 1 root root 9.7K Mar 24 05:06 _dbutils.r9057087446822479911resource.r
-rwxr-xr-x 1 root root 301 Mar 24 05:06 _rServeScript.r1949348184439973964resource.r
-rwxr-xr-x 1 root root 1.5K Mar 24 05:06 _startR.sh5660449951005543051resource.r
Am I missing anything?
Strangely, if I look into the cluster boot logs, here is what I get:
20/03/24 07:29:05 INFO Persistence: Property spark.hadoop.javax.jdo.option.ConnectionDriverName unknown - will be ignored
20/03/24 07:29:05 INFO Persistence: Property spark.hadoop.javax.jdo.option.ConnectionURL unknown - will be ignored
20/03/24 07:29:05 INFO Persistence: Property spark.hadoop.javax.jdo.option.ConnectionUserName unknown - will be ignored
20/03/24 07:29:05 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
20/03/24 07:29:05 INFO Persistence: Property spark.hadoop.javax.jdo.option.ConnectionPassword unknown - will be ignored
20/03/24 07:29:05 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
20/03/24 07:29:05 INFO Persistence: Property datanucleus.schema.autoCreateAll unknown - will be ignored
20/03/24 07:29:09 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
20/03/24 07:29:09 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
I have already set the above configurations, and they show up in the logs as well:
20/03/24 07:28:59 INFO SparkContext: Spark configuration:
spark.hadoop.javax.jdo.option.ConnectionDriverName=org.mariadb.jdbc.Driver
spark.hadoop.javax.jdo.option.ConnectionPassword=*********(redacted)
spark.hadoop.javax.jdo.option.ConnectionURL=*********(redacted)
spark.hadoop.javax.jdo.option.ConnectionUserName=*********(redacted)
Also, version information is available in my hive metastore; I can connect to MySQL and see that it shows
SCHEMA_VERSION : 3.1.0
VER_ID = 1
From the output, it looks like the jars are not copied to the "/databricks/hive_metastore_jars/" location. As mentioned in the documentation link you shared:
Set spark.sql.hive.metastore.jars to maven.
Restart the cluster with the above configuration and then check the Spark driver logs for the message:
17/11/18 22:41:19 INFO IsolatedClientLoader: Downloaded metastore jars to <path>
From this location, copy the jars to DBFS from the same cluster, and then use an init script to copy the jars from DBFS to "/databricks/hive_metastore_jars/".
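For reference, a minimal init-script sketch for that last step (the DBFS staging path /dbfs/hive_metastore_jars is an assumption, not from the original post; use whatever path you copied the downloaded jars to):
#!/bin/bash
# Copy the metastore jars staged in DBFS to the local directory that
# spark.sql.hive.metastore.jars points at, on every node at cluster start.
mkdir -p /databricks/hive_metastore_jars
cp -r /dbfs/hive_metastore_jars/* /databricks/hive_metastore_jars/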
As I am using Azure MySQL, there is one more step I need to perform:
https://learn.microsoft.com/en-us/azure/databricks/data/metastores/external-hive-metastore

How to run pig scripts from HDFS?

I am trying to run a Pig script from HDFS, but it fails with an error saying the file does not exist.
My HDFS directory:
[cloudera@quickstart ~]$ hdfs dfs -ls /
Found 11 items
drwxrwxrwx - hdfs supergroup 0 2016-08-10 14:35 /benchmarks
drwxr-xr-x - hbase supergroup 0 2017-08-19 23:51 /hbase
drwxr-xr-x - cloudera supergroup 0 2017-07-13 04:53 /home
drwxr-xr-x - cloudera supergroup 0 2017-08-27 07:26 /input
drwxr-xr-x - cloudera supergroup 0 2017-07-30 14:30 /output
drwxr-xr-x - solr solr 0 2016-08-10 14:37 /solr
-rw-r--r-- 1 cloudera supergroup 273 2017-08-27 11:59 /success.pig
-rw-r--r-- 1 cloudera supergroup 273 2017-08-27 12:04 /success.script
drwxrwxrwt - hdfs supergroup 0 2017-08-27 12:07 /tmp
drwxr-xr-x - hdfs supergroup 0 2016-09-28 09:00 /user
drwxr-xr-x - hdfs supergroup 0 2016-08-10 14:37 /var
Command executed
[cloudera@quickstart ~]$ pig -x mapreduce /success.pig
Error Message
log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
2017-08-27 12:34:39,160 [main] INFO org.apache.pig.Main - Apache Pig version 0.12.0-cdh5.8.0 (rexported) compiled Jun 16 2016, 12:40:41
2017-08-27 12:34:39,162 [main] INFO org.apache.pig.Main - Logging error messages to: /home/cloudera/pig_1503862479069.log
2017-08-27 12:34:47,079 [main] ERROR org.apache.pig.Main - ERROR 2997: Encountered IOException. File /success.pig does not exist
Details at logfile: /home/cloudera/pig_1503862479069.log
What am I missing?
You can use the -f <script location> option to run a script located at an HDFS path, but the script location needs to be a fully qualified HDFS path, as shown in the following syntax and example.
Syntax:
pig -f <fs.defaultFS>/<script path in hdfs>
Example:
pig -f hdfs://Foton/user/root/script.pig
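Applied to the question above, assuming the Cloudera quickstart VM's default fs.defaultFS of hdfs://quickstart.cloudera:8020 (an assumption; check core-site.xml for the actual value), the command would be:
pig -x mapreduce -f hdfs://quickstart.cloudera:8020/success.pig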

Hadoop Edge HDFS points to local FS

I have set up my Hadoop cluster with 1 NameNode and 2 DataNodes, and everything works perfectly :)
Now I want to add a Hadoop edge node (aka Hadoop gateway). I followed the instructions here and finally I execute:
hadoop fs -ls /
Unfortunately, instead of my HDFS contents I see my local FS:
Found 22 items
-rw-r--r-- 1 root root 0 2017-03-30 16:44 /autorelabel
dr-xr-xr-x - root root 20480 2017-03-30 16:49 /bin
...
drwxr-xr-x - root root 20480 2016-07-08 17:31 /home
I think my core-site.xml is configured as needed, with this specific property:
<property>
<name>fs.default.name</name>
<value>hdfs://hadoopnodemaster1:8020/</value>
</property>
hadoopmaster1 is my namenode and is reachable.
I don't understand why I see my local FS and not my HDFS. Thank you :)
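One way to check whether the edge node is actually reading this configuration (a diagnostic sketch, not from the original thread) is to print the effective default filesystem and to address HDFS explicitly:
hdfs getconf -confKey fs.defaultFS          # should print hdfs://hadoopnodemaster1:8020/
hadoop fs -ls hdfs://hadoopnodemaster1:8020/
If the first command prints file:///, the edge node is not loading the core-site.xml shown above (for example, HADOOP_CONF_DIR points somewhere else).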

Unable to start datanode and file permissions of datanode are changing when start-dfs.sh is started

I was facing issues deploying local files to HDFS and found that I should have "drwx------" on the datanode and namenode directories.
Initial permissions of the datanode and namenode directories:
drwx------ 3 hduser hadoop 4096 Mar 2 16:45 datanode
drwxr-xr-x 3 hduser hadoop 4096 Mar 2 17:30 namenode
Permissions of the datanode directory were changed to 755:
hduser@pradeep:~$ chmod -R 755 /usr/local/hadoop_store/hdfs/
hduser@pradeep:~$ ls -l /usr/local/hadoop_store/hdfs/
total 8
drwxr-xr-x 3 hduser hadoop 4096 Mar 2 16:45 datanode
drwxr-xr-x 3 hduser hadoop 4096 Mar 2 17:30 namenode
After running start-dfs.sh, the datanode didn't start and the permissions of the datanode directory were restored to their original state.
hduser@pradeep:~$ $HADOOP_HOME/sbin/start-dfs.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /usr/local/hadoop/logs/hadoop-hduser-namenode-pradeep.out
localhost: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hduser-datanode-pradeep.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-hduser-secondarynamenode-pradeep.out
hduser@pradeep:~$ jps
4385 Jps
3903 NameNode
4255 SecondaryNameNode
hduser@pradeep:~$ ls -l /usr/local/hadoop_store/hdfs/
total 8
drwx------ 3 hduser hadoop 4096 Mar 2 22:34 datanode
drwxr-xr-x 3 hduser hadoop 4096 Mar 2 22:34 namenode
As the datanode is not running, I am not able to deploy data to HDFS from the local file system. I couldn't understand or find any reason why the file permissions are restored to the previous state only for the datanode folder.
It appears the namespace ID generated by the NameNode is different from the one stored by your DataNode.
Solution:
Go to the path where your Hadoop files are stored on the local file system, for example /usr/local/hadoop. Go down to /usr/local/hadoop/tmp/dfs/name/current/VERSION, copy the namespaceID, then open /usr/local/hadoop/tmp/dfs/data/current/VERSION and replace the namespaceID there with that value.
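A minimal shell sketch of that fix, using the storage directories from the question above (the namespaceID value is a placeholder; substitute the one you read from the NameNode's VERSION file):
# Read the NameNode's namespaceID
grep namespaceID /usr/local/hadoop_store/hdfs/namenode/current/VERSION
# e.g. namespaceID=123456789
# Write the same value into the DataNode's VERSION file
sed -i 's/^namespaceID=.*/namespaceID=123456789/' /usr/local/hadoop_store/hdfs/datanode/current/VERSION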
I hope this helps.

Cloudera hdfs another namenode already locked the storage directory

I am running CDH-5.3.2-1.cdh5.3.2.p0.10 with Cloudera Manager on CentOS 6.6.
My HDFS service was working on the cluster, but I wanted to change the mount point for the Hadoop data. That didn't succeed, so I decided to roll back all the changes, but now the previous configuration doesn't work either, which is discouraging.
I have two nodes within the cluster. One data node is bad: DataNodes Health Bad.
In the log I have a few errors:
1:40:10.821 PM ERROR org.apache.hadoop.hdfs.server.common.Storage
It appears that another namenode 931#spark1.xxx.xx has already locked the storage directory
1:40:10.821 PM INFO org.apache.hadoop.hdfs.server.common.Storage
Cannot lock storage /dfs/nn. The directory is already locked
1:40:10.821 PM WARN org.apache.hadoop.hdfs.server.common.Storage
java.io.IOException: Cannot lock storage /dfs/nn. The directory is already locked
1:40:10.822 PM FATAL org.apache.hadoop.hdfs.server.datanode.DataNode
Initialization failed for Block pool <registering> (Datanode Uuid unassigned) service to spark1.xxx.xx/10.10.10.10:8022. Exiting.
java.io.IOException: All specified directories are failed to load.
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:463)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1318)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1288)
at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:320)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:221)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:829)
at java.lang.Thread.run(Thread.java:745)
I have been trying many possible solutions, but without any luck:
formatting the namenode with hadoop namenode -format
stopping the cluster and rm -rf /dfs/* [and reformatting]
some adjustments to the /dfs/nn/current/VERSION file
removing the in_use.lock file and starting only the lacking node
removing a file in /tmp/hsperfdata_hdfs/ with a name like the PID locking the directory
There are files in the directory
[root@spark1 dfs]# ll
total 8
drwxr-xr-x 3 hdfs hdfs 4096 Apr 28 13:39 nn
drwx------ 3 hdfs hadoop 4096 Apr 28 13:40 snn
There is no dn dir, which is a bit interesting.
I perform all operations on HDFS files as the hdfs user.
In the file /etc/hadoop/conf/hdfs-site.xml there is
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///dfs/nn</value>
</property>
Here is a similar thread on the CDH users Google group which might help you: https://groups.google.com/a/cloudera.org/forum/#!topic/cdh-user/FYu0gZcdXuE
Also, did you do the namenode format from Cloudera Manager or from the command line? Ideally you should be doing it through Cloudera Manager and not the command line.
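As a side note, since the log above shows the lock being held by 931#spark1.xxx.xx, one way to check whether that process is still alive before touching in_use.lock (a diagnostic sketch; the PID and path come from the log and configuration above):
cat /dfs/nn/in_use.lock      # the lock file records the holding JVM (pid and host)
ps -p 931 -o pid,cmd         # check whether PID 931 is still a running NameNode
If the process is gone, a stale lock is the likely culprit; if it is alive, two roles are pointing at the same /dfs/nn directory.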
