Hadoop BlockMissingException

I am getting the error below:
Diagnostics: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-467931813-10.3.20.155-1514489559979:blk_1073741991_1167 file=/user/oozie/share/lib/lib_20171228193421/oozie/hadoop-auth-2.7.2-amzn-2.jar
Failing this attempt. Failing the application.
This happens even though I have set a replication factor of 3 for the /user/oozie/share/lib/ directory. Most of the jars under this path are available on 3 datanodes, but a few jars are missing.
Can anybody suggest why this is happening and how to prevent it?

I was getting the same exception while trying to read a file from HDFS. The solution in the section "Clients use Hostnames when connecting to DataNodes" from this link worked for me:
https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/HdfsMultihoming.html#Clients_use_Hostnames_when_connecting_to_DataNodes
I added this XML block to "hdfs-site.xml" and restarted the datanode and namenode servers:
<property>
<name>dfs.client.use.datanode.hostname</name>
<value>true</value>
<description>Whether clients should use datanode hostnames when
connecting to datanodes.
</description>
</property>

Please check the file's owner in the HDFS directory. I hit this issue because the owner was "root"; it was solved when I changed the owner to "your_user".

I got the same error when using Trino to connect to Hive. I tried to connect to HDFS from a Trino worker and found that port 9866 was not open on the HDFS side; opening the port on the HDFS datanode solved the problem. Related documents: https://www.ibm.com/docs/en/spectrum-scale-bda?topic=requirements-firewall-recommendations-hdfs-transparency https://hadoop.apache.org/docs/r3.0.0/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
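If it helps, the port check described above can be scripted. This is only a minimal sketch: the function name port_open and the 3-second timeout are my own choices, and 9866 is the Hadoop 3 default for dfs.datanode.address (your cluster may use a different port).

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

Run it from the machine that actually reads the blocks (for example the Trino worker), e.g. port_open("datanode1.example.com", 9866), where the hostname is a placeholder for one of your datanodes.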

Related

Can't get Master Kerberos principal for use as renewer for Talend Batch Jobs

We are trying to use Talend batch (Spark) jobs to access Hive in a Kerberos cluster, but we are getting the error "Can't get Master Kerberos principal for use as renewer".
With the standard (non-Spark) jobs in Talend we are able to access Hive without any issue.
Our observations:
When we run Spark jobs, Talend is able to connect to the Hive metastore and validate the syntax. For example, if I provide a wrong table name, it returns "table not found".
When we run select count(*) from a table that has no data, it returns "NULL", but if some data is present in HDFS (the table), it fails with the error "Can't get Master Kerberos principal for use as renewer".
I am not sure what exactly is causing the token problem. Could someone help us find the root cause?
One more thing: if I read from / write to HDFS instead of Hive using Spark batch jobs, it works. So the only problem is with Hive and Kerberos.
You should include the hadoop config in the classpath (:/path/hadoop-configuration). You should include all configuration files in that hadoop configuration directory, not only the core-site.xml and hdfs-site.xml files. It happened to me and that solved the problem.
I had the same problem when I started Spark on k8s:
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: java.io.IOException: Can't get Master Kerberos principal for use as renewer
at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:133)
at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:100)
at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:243)
at org.apache.spark.input.WholeTextFileInputFormat.setMinPartitions(WholeTextFileInputFormat.scala:52)
at org.apache.spark.rdd.WholeTextFileRDD.getPartitions(WholeTextFileRDD.scala:54)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:273)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:269)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:269)
I just added yarn-site.xml to the HADOOP_CONF_DIR.
The yarn-site.xml only contains yarn.resourcemanager.principal:
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
<property>
<name>yarn.resourcemanager.principal</name>
<value>yarn/_HOST@DM.COM</value>
</property>
</configuration>
This worked for me.
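To check quickly whether a configuration directory has what the two answers above describe, something like the following might help. It is a rough sketch, not anything from Hadoop or Spark itself: the helper names and the list of expected files are my own assumptions, and the only property it looks for is yarn.resourcemanager.principal.

```python
import os
import xml.etree.ElementTree as ET


def read_hadoop_props(path):
    """Parse a Hadoop *-site.xml file into a {name: value} dict."""
    props = {}
    for prop in ET.parse(path).getroot().iter("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def check_conf_dir(conf_dir,
                   files=("core-site.xml", "hdfs-site.xml", "yarn-site.xml")):
    """Return (missing_files, principal) for a Hadoop configuration directory.

    principal is the value of yarn.resourcemanager.principal if any of the
    files sets it, otherwise None.
    """
    missing, principal = [], None
    for fname in files:
        path = os.path.join(conf_dir, fname)
        if not os.path.isfile(path):
            missing.append(fname)
            continue
        props = read_hadoop_props(path)
        principal = props.get("yarn.resourcemanager.principal", principal)
    return missing, principal
```

If principal comes back as None, the renewer lookup has nothing to work with, which matches the error in the question.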

Access hdfs from outside the cluster

I have a Hadoop cluster on AWS and I am trying to access it from outside the cluster through a Hadoop client. I can successfully run hdfs dfs -ls and see all the contents, but when I try to put or get a file I get this error:
Exception in thread "main" java.lang.NullPointerException
at org.apache.hadoop.fs.FsShell.displayError(FsShell.java:304)
at org.apache.hadoop.fs.FsShell.run(FsShell.java:289)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.fs.FsShell.main(FsShell.java:340)
I have hadoop 2.6.0 installed in both my cluster and my local machine. I have copied the conf files of the cluster to the local machine and have these options in hdfs-site.xml (along with some other options).
<property>
<name>dfs.client.use.datanode.hostname</name>
<value>true</value>
</property>
<property>
<name>dfs.permissions.enable</name>
<value>false</value>
</property>
My core-site.xml contains a single property in both the cluster and the client:
<property>
<name>fs.defaultFS</name>
<value>hdfs://public-dns:9000</value>
<description>NameNode URI</description>
</property>
I found similar questions but wasn't able to find a solution to this.
How about you SSH into that machine?
I know this is a very bad idea, but to get the work done you can first copy the file to the machine using scp, then SSH into the cluster/master and run hdfs dfs -put on the copied local file.
You can also automate this via a script but again, this is just to get the work done for now.
Wait for someone else to answer to know the proper way!
I had a similar issue with my cluster when running hadoop fs -get, and I was able to resolve it. Check whether all your datanodes are resolvable by FQDN (fully qualified domain name) from your local host. In my case the nc command succeeded using the IP addresses of the datanodes but not using their hostnames.
Run the command below:
for i in $(cat /<host list file>); do nc -vz "$i" 50010; done
50010 is the default datanode port.
When you run any hadoop command, it tries to connect to the datanodes using their FQDNs, and that is where it gives this weird NPE.
Run the export below and then run your hadoop command:
export HADOOP_ROOT_LOGGER=DEBUG,console
You will see that this NPE occurs when it is trying to connect to a datanode for data transfer.
I also had Java code doing the equivalent of hadoop fs -get through the APIs, and there the exception was clearer:
java.lang.Exception: java.nio.channels.UnresolvedAddressException
Let me know if this helps you.
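The name-resolution half of that check can also be done without nc. A small sketch follows; the function name and the injectable resolver argument are my own (the latter only there so the function can be tried without real DNS).

```python
import socket


def unresolvable_hosts(hostnames, resolver=socket.getaddrinfo):
    """Return the hostnames that fail to resolve from this machine."""
    failed = []
    for name in hostnames:
        try:
            # Port is None because only name resolution is being tested here.
            resolver(name, None)
        except socket.gaierror:
            failed.append(name)
    return failed
```

Feed it the datanode FQDNs exactly as the namenode reports them. Any name it returns is one the client cannot turn into an address, which is the situation behind the UnresolvedAddressException above.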

How do I run sqlline with Phoenix?

When I try to run Phoenix's sqlline.py localhost command, I get
WARN util.DynamicClassLoader: Failed to identify the fs of dir hdfs://localhost:54310/hbase/lib, ignored
java.io.IOException: No FileSystem for scheme: hdfs
at org.apache.hadoop.fs.FileSystem.getFileSystemClass...
and nothing else happens. I also could not get Squirrel to work (it freezes when I click 'list drivers').
As per these instructions, I have copied phoenix-4.2.1-server.jar to my hbase/lib folder and restarted hbase. I have also copied core-site.xml and hbase-site.xml to my phoenix/bin directory.
I have not added 'the phoenix-[version]-client.jar to the classpath of any Phoenix client'
since I do not know what this refers to.
I am using HBase 0.98.6.1-hadoop2, Phoenix 4.2.1 and hadoop 2.2.0.
I fixed the same issue by adding this setting in
${PHOENIX_HOME}/bin/hbase-site.xml
<property>
<name>fs.hdfs.impl</name>
<value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
</property>

CDH 4.3: exception in the logs after ./start-dfs.sh, datanode and namenode fail to start

Here are the logs from hadoop-datanode-...log:
FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool BP-1421227885-192.168.2.14-1371135284949 (storage id DS-30209445-192.168.2.41-50010-1371109358645) service to /192.168.2.8:8020
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.protocol.DisallowedDatanodeException): Datanode denied communication with namenode: DatanodeRegistration(0.0.0.0, storageID=DS-30209445-192.168.2.41-50010-1371109358645, infoPort=50075, ipcPort=50020, storageInfo=lv=-40;cid=CID-f16e4a3e-4776-4893-9f43-b04d8dc651c9;nsid=1710848135;c=0)
at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.registerDatanode(DatanodeManager.java:648)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:3498)
My mistake: the namenode can start, the datanode can't start.
I saw this once too. The namenode server needs to do a reverse lookup request, so nslookup 192.168.2.41 should return a name. It doesn't, so 0.0.0.0 is recorded instead.
You don't need to hardcode addresses into /etc/hosts if you have DNS working correctly (i.e. the in-addr.arpa file matches the entries in the domain file). But if you don't have DNS, then you need to help Hadoop out.
There seems to be a Name Resolution issue.
Datanode denied communication with namenode:
DatanodeRegistration(0.0.0.0,
storageID=DS-30209445-192.168.2.41-50010-1371109358645,
infoPort=50075, ipcPort=50020,
Here DataNode is identifying itself as 0.0.0.0.
Looks like dfs.hosts enforcement. Can you recheck your NameNode's hdfs-site.xml configs to make sure you are not using a dfs.hosts file?
This error may arise if the datanode that is trying to connect to the namenode is either listed in the file defined by dfs.hosts.exclude, or dfs.hosts is used and the datanode is not listed in that file. Make sure the datanode is not listed in the excludes, and if you are using dfs.hosts, add it to the includes. Restart hadoop after that and run hadoop dfsadmin -refreshNodes.
HTH
Reverse DNS lookup is required when a datanode tries to register with a namenode. I got the same exceptions with Hadoop 2.6.0 because my DNS does not allow reverse lookup.
But you can disable Hadoop's reverse lookup check by setting the configuration "dfs.namenode.datanode.registration.ip-hostname-check" to false in hdfs-site.xml.
I got this solution from here and it solved my problem.
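Before turning the check off, you may want to confirm that reverse lookup is really what fails. Here is a minimal sketch to run on the namenode host; reverse_lookup and its lookup parameter are names I made up, and it relies only on the standard socket.gethostbyaddr call.

```python
import socket


def reverse_lookup(ip, lookup=socket.gethostbyaddr):
    """Return the hostname that ip maps back to, or None if there is none."""
    try:
        hostname, _aliases, _addresses = lookup(ip)
        return hostname
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError.
        return None
```

For the datanode in the question that would be reverse_lookup("192.168.2.41"). A None result means the namenode host cannot map that address back to a name through DNS or /etc/hosts.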

hadoop hdfs points to file:/// not hdfs://

So I installed Hadoop via Cloudera Manager cdh3u5 on CentOS 5. When I run the command
hadoop fs -ls /
I expected to see the contents of hdfs://localhost.localdomain:8020/
However, it returned the contents of file:///
Now, it goes without saying that I can access my hdfs:// through
hadoop fs -ls hdfs://localhost.localdomain:8020/
But when it came to installing other applications such as Accumulo, Accumulo would automatically detect the Hadoop filesystem as file:///
The question is: has anyone run into this issue, and how did you resolve it?
I had a look at "HDFS thrift server returns content of local FS, not HDFS", which was a similar issue, but it did not solve this one.
Also, I do not get this issue with Cloudera Manager cdh4.
By default, Hadoop is going to use local mode. You probably need to set fs.default.name to hdfs://localhost.localdomain:8020/ in $HADOOP_HOME/conf/core-site.xml.
To do this, you add this to core-site.xml:
<property>
<name>fs.default.name</name>
<value>hdfs://localhost.localdomain:8020/</value>
</property>
Accumulo is confused because it uses the same default configuration to figure out where HDFS is... and it's defaulting to file://
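To see what a client will pick up from a given core-site.xml, a sketch like this can be handy. The function name is hypothetical; it checks fs.defaultFS (the newer property name) as well as fs.default.name, and falls back to file:///, which is Hadoop's built-in default when neither is set.

```python
import os
import xml.etree.ElementTree as ET


def default_fs(core_site_path):
    """Return the default filesystem URI from core-site.xml, or file:/// if unset."""
    if not os.path.isfile(core_site_path):
        return "file:///"
    props = {}
    for prop in ET.parse(core_site_path).getroot().iter("property"):
        name = (prop.findtext("name") or "").strip()
        props[name] = (prop.findtext("value") or "").strip()
    # fs.defaultFS is the newer name; fs.default.name is the older one.
    return props.get("fs.defaultFS") or props.get("fs.default.name") or "file:///"
```

If this returns file:/// for the core-site.xml that your hadoop command or Accumulo actually reads, you have reproduced the behaviour in the question.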
We should specify the DataNode data directory and the NameNode metadata directory:
dfs.name.dir,
dfs.namenode.name.dir,
dfs.data.dir,
dfs.datanode.data.dir,
fs.default.name
in the core-site.xml file, and then format the NameNode.
To format the HDFS NameNode:
hadoop namenode -format
Enter 'Yes' to confirm formatting the NameNode. Restart the HDFS service and deploy the client configuration to access HDFS.
If you have already done the above steps, ensure the client configuration is deployed correctly and points to the actual cluster endpoints.

Resources