hdfs get fails on a particular client node

hdfs get fails on a particular client node - hadoop

I have a strange problem with HDFS. While get operations on an existant file work like a charm on all clients accessing a HDFS cluster, it fails on one client:
Working host:
[user#host1]$ hadoop fs -ls /path/to/file.csv
found 1 items
-rw-r--r-- 3 compute supergroup 1628 2013-12-10 12:22 /path/to/file.csv
[user#host1]$ hadoop fs -get /path/to/file.csv /tmp/test.csv
[user#host1]$ cat /tmp/test.csv
48991,24768,2013-12-10 00:00:00,1,0.0001,0.0001
Not working host:
[user#host2]$ hadoop fs -ls /path/to/file.csv
Found 1 items
-rw-r--r-- 3 compute supergroup 1628 2013-12-10 12:22 /path/to/file.csv
[user#host2]$ hadoop fs -get /path/to/file.csv /tmp/test.csv
get: java.lang.NullPointerException
[user#host2]$ cat /tmp/test.csv
cat: /tmp/test.csv: No such file or directory
Using a java hdfs client on working host:
[user#host1]$ java -jar hadoop_get-1.0-SNAPSHOT-jar-with-dependencies.jar hdfs://my.namenode:port /path/to/file.csv
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
48991,24768,2013-12-10 00:00:00,1,0.0001,0.0001
Using a java hdfs client on non working host:
[user#host2]$ java -jar hadoop_get-1.0-SNAPSHOT-jar-with-dependencies.jar hdfs://my.namenode:port /path/to/file.csv
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): java.lang.NullPointerException
at org.apache.hadoop.ipc.Client.call(Client.java:1225)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
at com.sun.proxy.$Proxy9.getBlockLocations(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
at com.sun.proxy.$Proxy9.getBlockLocations(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:154)
at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:957)
at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:947)
at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:171)
at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:138)
at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:131)
at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1104)
at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:246)
at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:79)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:711)
at my.namespace.client.Client.main(Client.java:34)

This was resolved for us by deploying client configurations, refreshing cluster, and restarting HDFS.

Are you using CDH4? We've got the same problem after upgrading from CDH3.
Try researching reverse DNS lookup name for the problem host - we'd found difference with problem host and hosts with no problems only in DNS resolving. After fixing it - all is ok.

Related

Hive does not launch in Ubuntu 20.04 and throws error

I am setting up hive after I setup hadoop on my personal machine. However after I install all the steps to configure hive I am still unable to use hive. I see the following error message:
hadoop#ub:/usr/local/hive/lib$ hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/apache-hive-3.1.2-bin/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop-3.3.0/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Hive Session ID = d7574f3f-4ec1-4fac-bbbd-631c5f698e3c
Exception in thread "main" java.lang.ClassCastException: class jdk.internal.loader.ClassLoaders$AppClassLoader cannot be cast to class java.net.URLClassLoader (jdk.internal.loader.ClassLoaders$AppClassLoader and java.net.URLClassLoader are in module java.base of loader 'bootstrap')
at org.apache.hadoop.hive.ql.session.SessionState.<init>(SessionState.java:413)
at org.apache.hadoop.hive.ql.session.SessionState.<init>(SessionState.java:389)
at org.apache.hadoop.hive.cli.CliSessionState.<init>(CliSessionState.java:60)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:705)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:683)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
When I run jps I get the following:
hadoop#acharya-ub:/usr/local/hive/lib$ jps
4866 Jps
28644 SecondaryNameNode
28117 NameNode
28965 ResourceManager
29353 NodeManager
28349 DataNode
After searching over the internet, I found that running hiverser2 helped them but it hasn't helped me and I still get the same error message even after I start hiveserver2 service.
I am using java 11, hadoop-3.3.0 and apache-hive-3.1.2
I created symbolic links for the hadoop and hive locations:
lrwxrwxrwx 1 root root 23 Sep 1 22:39 hadoop -> /usr/local/hadoop-3.3.0
drwxr-xr-x 11 hadoop hadoop 4096 Sep 1 22:46 hadoop-3.3.0
drwxrwxr-x 10 hadoop hadoop 4096 Sep 1 23:02 apache-hive-3.1.2-bin
lrwxrwxrwx 1 root root 33 Sep 1 23:03 hive -> /usr/local/apache-hive-3.1.2-bin/
I then setup the following in my .bashrc file:
HADOOP_HOME=/usr/local/hadoop
HIVE_HOME=usr/local/hive
Can someone help please?

SparkR Error: The root scratch dir: /tmp/hive on HDFS should be writable

I'm trying to initialize SparkR but I'm getting a permissions error. My Spark Version is spark-2.2.1-bin-hadoop2.6. I have searched for this error and how to solve it and I have found several related topics. However, I'm not able to solve it using the same approach that in those topics, the solution they give (and the one I tried) is giving permisions to the /tmp/hive directory using the following command:
sudo -u hdfs hadoop fs -chmod -R 777 /tmp/hive
Can anyone with enough knowledge give me another possible solution?
The error stacktrace is the following one:
$ sudo ./bin/sparkR
R version 3.4.2 (2017-09-28) -- "Short Summer"
Copyright (C) 2017 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
Launching java with spark-submit command /opt/cloudera/parcels/spark-2.2.1-bin-hadoop2.6/bin/spark-submit "sparkr-shell" /tmp/RtmpecLPo8/backend_port4be122057a03
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/12/19 12:53:17 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/12/19 12:53:17 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
17/12/19 12:53:23 ERROR RBackendHandler: getOrCreateSparkSession on org.apache.spark.sql.api.r.SQLUtils failed
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.api.r.RBackendHandler.handleMethodCall(RBackendHandler.scala:167)
at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:108)
at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:40)
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336)
at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:287)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336)
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336)
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:293)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:267)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1294)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:911)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:643)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:566)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:480)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:442)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionStateBuilder':
at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1062)
at org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:137)
at org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:136)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:136)
at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:133)
at org.apache.spark.sql.api.r.SQLUtils$$anonfun$setSparkContextSessionConf$2.apply(SQLUtils.scala:71)
at org.apache.spark.sql.api.r.SQLUtils$$anonfun$setSparkContextSessionConf$2.apply(SQLUtils.scala:70)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
at org.apache.spark.sql.api.r.SQLUtils$.setSparkContextSessionConf(SQLUtils.scala:70)
at org.apache.spark.sql.api.r.SQLUtils$.getOrCreateSparkSession(SQLUtils.scala:63)
at org.apache.spark.sql.api.r.SQLUtils.getOrCreateSparkSession(SQLUtils.scala)
... 36 more
Caused by: org.apache.spark.sql.AnalysisException: java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rwx------;
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:194)
at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:105)
at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:93)
at org.apache.spark.sql.hive.HiveSessionStateBuilder.externalCatalog(HiveSessionStateBuilder.scala:39)
at org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog$lzycompute(HiveSessionStateBuilder.scala:54)
at org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog(HiveSessionStateBuilder.scala:52)
at org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog(HiveSessionStateBuilder.scala:35)
at org.apache.spark.sql.internal.BaseSessionStateBuilder.build(BaseSessionStateBuilder.scala:289)
at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1059)
... 52 more
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rwx------
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:191)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:264)
at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:362)
at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:266)
at org.apache.spark.sql.hive.HiveExternalCatalog.client$lzycompute(HiveExternalCatalog.scala:66)
at org.apache.spark.sql.hive.HiveExternalCatalog.client(HiveExternalCatalog.scala:65)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply$mcZ$sp(HiveExternalCatalog.scala:195)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:195)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:195)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
... 61 more
Caused by: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rwx------
at org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:612)
at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:554)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508)
... 75 more
Error in handleErrors(returnStatus, conn) :
java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionStateBuilder':
at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1062)
at org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:137)
at org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:136)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:136)
at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:133)
at org.apache.spark.sql.api.r.SQLUtils$$anonfun$setSparkContextSessionConf$2.apply(SQLUtils.scala:71)
at org.apache.spark.sql.api.r.SQLUtils$$anonfun$setSparkContextSessionConf$2.apply(SQLUtils.scala:70)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
at scala.collection.Iterator$class.foreach(Iterator.sca
The result of hadoop fs -ls /tmp:
$ hadoop fs -ls /tmp
Found 5 items
drwxrwxrwx - hdfs supergroup 0 2017-12-19 14:47 /tmp/.cloudera_health_monitoring_canary_files
drwxr-xr-x - yarn supergroup 0 2017-11-07 12:36 /tmp/hadoop-yarn
drwx--x--x - hbase supergroup 0 2017-09-07 10:44 /tmp/hbase-staging
drwx-wx-wx - josholsan supergroup 0 2017-12-19 13:09 /tmp/hive
drwxrwxrwt - mapred hadoop 0 2017-09-12 09:34 /tmp/logs
Thanks you so much in advance!!!

Since your error permissions do not match the output of the file system, sounds like you downloaded Spark but didn't configure it, therefore it's defaulting to local disk
First, try using spark-shell alone from CDH installation to run a smoketest.
I think Cloudera includes SparkR (they just don't officially support it). I don't see a reason why they would remove it from the installation.
My Spark Version is spark-2.2.1-bin-hadoop2.6.
You downloaded the version that includes hadoop (based on the end of the filename). Since you say you set it up on your cluster, you should use the download option without precompiled Hadoop. And unless it's actually a Cloudera parcel, don't place it in that /opt/cloudera/parcels directory.
Then, once you have that, extract it somewhere, and open conf/spark-env.sh (copy the template to this file)
Update the values to at least contain the same information as the other Spark installation that come with CDH
Ensure HADOOP_CONF_DIR points at the configuration directory of Hadoop on your system. /etc/hadoop/conf/

Hadoop\HDFS: "no such file or directory"

I have installed Hadoop 2.2 on a single machine using this tutorial: http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
Some details were changed a little bit - for example, I used java 8, /hadoop root dir etc. Users, SSH, config keys - the same.
Namenode was successfully formatted:
13/12/22 05:42:31 INFO common.Storage: Storage directory /hadoop/tmp/dfs/name has been successfully formatted.
13/12/22 05:42:31 INFO namenode.FSImage: Saving image file /hadoop/tmp/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
13/12/22 05:42:32 INFO namenode.FSImage: Image file /hadoop/tmp/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 198 bytes saved in 0 seconds.
13/12/22 05:42:32 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
13/12/22 05:42:32 INFO util.ExitUtil: Exiting with status 0
13/12/22 05:42:32 INFO namenode.NameNode: SHUTDOWN_MSG:
However, not 'mkdir' neither even 'ls' command worked:
$ /hadoop/hadoop/bin/hadoop fs -ls
Java HotSpot(TM) 64-Bit Server VM warning: You have loaded library /hadoop/hadoop-2.2.0/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
13/12/22 05:39:33 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
ls: `.': No such file or directory
Thanks for any help guys.

Try
hadoop fs -ls /
Tested on hadoop 2.4

In Hadoop 2.4
hdfs dfs -mkdir /input
hdfs dfs -ls /

Worked in my case:
First Get hadoop installed path by :
echo ${HADOOP_INSTALL} //in my case output is : `/user/local/hadoop`
Then create directory at your hadoop installed path, If you know your hadoop installed directory ignore above command
hadoop fs -mkdir -p /user/local/hadoop/your_directory
Here hadoop is directory
Tested on hadoop 2.4

I have verified this worked in Hadoop 2.5
hdfs dfs -mkdir /input
(where /input is the HDFS directory)

Why would Hadoop hftp serve directories but not files?

I'm trying to move files from one cluster to another using distcp, using the hftp protocol as specified in their instructions.
I can read directories over hftp, but when I attempt to get a file I get a 500 (internal server error). To eliminate the possibility of network and firewall issues, I'm using hadoop fs -ls and hadoop fs -cat commands on the source server in order to attempt to figure out this issue.
This provides a directory of the files:
hadoop fs -ls logfiles/day_id=19991231/hour_id=1999123123
-rw-r--r-- 3 username supergroup 812 2012-12-16 17:21 logfiles/day_id=19991231/hour_id=1999123123/000008_0
This gives me a "file not found" error, which it should because the file isn't there:
hadoop fs -cat hftp://hserver.domain.com:50070/user/username/logfiles/day_id=19991231/hour_id=1999123123/000008_0x
cat: `hftp://hserver.domain.com:50070/user/username/logfiles/day_id=19991231/hour_id=1999123123/000008_0x': No such file or directory
This line gives me a 500 internal server error. The file is confirmed on the server.
hadoop fs -cat hftp://hserver.domain.com:50070/user/username/logfiles/day_id=19991231/hour_id=1999123123/000008_0
cat: HTTP_OK expected, received 500
Here is a stack trace of what distcp logs when I attempt this:
java.io.IOException: HTTP_OK expected, received 500
at org.apache.hadoop.hdfs.HftpFileSystem$RangeHeaderUrlOpener.connect(HftpFileSystem.java:365)
at org.apache.hadoop.hdfs.ByteRangeInputStream.openInputStream(ByteRangeInputStream.java:119)
at org.apache.hadoop.hdfs.ByteRangeInputStream.getInputStream(ByteRangeInputStream.java:103)
at org.apache.hadoop.hdfs.ByteRangeInputStream.read(ByteRangeInputStream.java:187)
at java.io.DataInputStream.read(DataInputStream.java:83)
at org.apache.hadoop.tools.DistCp$CopyFilesMapper.copy(DistCp.java:424)
at org.apache.hadoop.tools.DistCp$CopyFilesMapper.map(DistCp.java:547)
at org.apache.hadoop.tools.DistCp$CopyFilesMapper.map(DistCp.java:314)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:393)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:327)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Can someone tell me why hftp is failing to serve files?

I ran into the same issue and eventually found a solution.
Everythning is explained here in details: http://www.swiss-scalability.com/2015/01/hadoop-hftp-returns-error-httpok.html
But in a nutshell, we're probably binding the NameNode RPC on a wildcard address (i.e. dfs.namenode.rpc-address point to the IP of the interface, and not 0.0.0.0).
Does not work with HFTP:
<property>
<name>dfs.namenode.rpc-address</name>
<value>0.0.0.0:8020</value>
</property>
Works with HFTP:
<property>
<name>dfs.namenode.rpc-address</name>
<value>10.0.1.2:8020</value>
</property>

No data nodes are started

I am trying to setup Hadoop version 0.20.203.0 in a pseudo distributed configuration using the following guide:
http://www.javacodegeeks.com/2012/01/hadoop-modes-explained-standalone.html
After running the start-all.sh script I run "jps".
I get this output:
4825 NameNode
5391 TaskTracker
5242 JobTracker
5477 Jps
5140 SecondaryNameNode
When I try to add information to the hdfs using:
bin/hadoop fs -put conf input
I got an error:
hadoop#m1a2:~/software/hadoop$ bin/hadoop fs -put conf input
12/04/10 18:15:31 WARN hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/hadoop/input/core-site.xml could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1417)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:596)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:523)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1383)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1379)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1377)
at org.apache.hadoop.ipc.Client.call(Client.java:1030)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:224)
at $Proxy1.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy1.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3104)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2975)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2255)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2446)
12/04/10 18:15:31 WARN hdfs.DFSClient: Error Recovery for block null bad datanode[0] nodes == null
12/04/10 18:15:31 WARN hdfs.DFSClient: Could not get block locations. Source file "/user/hadoop/input/core-site.xml" - Aborting...
put: java.io.IOException: File /user/hadoop/input/core-site.xml could only be replicated to 0 nodes, instead of 1
12/04/10 18:15:31 ERROR hdfs.DFSClient: Exception closing file /user/hadoop/input/core-site.xml : org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/hadoop/input/core-site.xml could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1417)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:596)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:523)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1383)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1379)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1377)
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/hadoop/input/core-site.xml could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1417)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:596)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:523)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1383)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1379)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1377)
at org.apache.hadoop.ipc.Client.call(Client.java:1030)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:224)
at $Proxy1.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy1.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3104)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2975)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2255)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2446)
I am not totally sure but I believe that this may have to do with the fact that the datanode is not running.
Does anybody know what I have done wrong, or how to fix this problem?
EDIT: This is the datanode.log file:
2012-04-11 12:27:28,977 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG: host = m1a2/139.147.5.55
STARTUP_MSG: args = []
STARTUP_MSG: version = 0.20.203.0
STARTUP_MSG: build = http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20-security-203 -r 1099333; compiled by 'oom' on Wed May 4 07:57:50 PDT 2011
************************************************************/
2012-04-11 12:27:29,166 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2012-04-11 12:27:29,181 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
2012-04-11 12:27:29,183 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2012-04-11 12:27:29,183 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
2012-04-11 12:27:29,342 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
2012-04-11 12:27:29,347 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
2012-04-11 12:27:29,615 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Incompatible namespaceIDs in /tmp/hadoop-hadoop/dfs/data: namenode namespaceID = 301052954; datanode namespaceID = 229562149
at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:232)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:147)
at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:354)
at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:268)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1480)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1419)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1437)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1563)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1573)
2012-04-11 12:27:29,617 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at m1a2/139.147.5.55
************************************************************/

That error you are getting in the DN log is described here: http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/#java-io-ioexception-incompatible-namespaceids
From that page:
At the moment, there seem to be two workarounds as described below.
Workaround 1: Start from scratch
I can testify that the following steps solve this error, but the side effects won’t make you happy (me neither). The crude workaround I have found is to:
Stop the cluster
Delete the data directory on the problematic DataNode: the directory is specified by dfs.data.dir in conf/hdfs-site.xml; if you followed this tutorial, the relevant directory is /app/hadoop/tmp/dfs/data
Reformat the NameNode (NOTE: all HDFS data is lost during this process!)
Restart the cluster
When deleting all the HDFS data and starting from scratch does not sound like a good idea (it might be ok during the initial setup/testing), you might give the second approach a try.
Workaround 2: Updating namespaceID of problematic DataNodes
Big thanks to Jared Stehler for the following suggestion. I have not tested it myself yet, but feel free to try it out and send me your feedback. This workaround is “minimally invasive” as you only have to edit one file on the problematic DataNodes:
Stop the DataNode
Edit the value of namespaceID in /current/VERSION to match the value of the current NameNode
Restart the DataNode
If you followed the instructions in my tutorials, the full path of the relevant files are:
NameNode: /app/hadoop/tmp/dfs/name/current/VERSION
DataNode: /app/hadoop/tmp/dfs/data/current/VERSION
(background: dfs.data.dir is by default set to
${hadoop.tmp.dir}/dfs/data, and we set hadoop.tmp.dir
in this tutorial to /app/hadoop/tmp).
If you wonder how the contents of VERSION look like, here’s one of mine:
# contents of /current/VERSION
namespaceID=393514426
storageID=DS-1706792599-10.10.10.1-50010-1204306713481
cTime=1215607609074
storageType=DATA_NODE
layoutVersion=-13

Okay, I post this once more:
In case someone needs this, for newer version of Hadoop (basically I am running 2.4.0)
In this case stop the cluster sbin/stop-all.sh
Then go to /etc/hadoop for config files.
In the file: hdfs-site.xml Look out for directory paths corresponding to dfs.namenode.name.dir dfs.namenode.data.dir
Delete both the directories recursively (rm -r).
Now format the namenode via bin/hadoop namenode -format
And finally sbin/start-all.sh
Hope this helps.

I had the same issue on pseudo node using hadoop1.1.2
So I ran bin/stop-all.sh to stop the cluster
then saw the configuration of my hadoop tmp directory in hdfs-site.xml
<name>hadoop.tmp.dir</name>
<value>/root/data/hdfstmp</value>
So I went into /root/data/hdfstmp and deleted all the files using command (you may loose ur data)
rm -rf *
and then format namenode again
bin/hadoop namenode -format
and then start the cluster using
bin/start-all.sh
Main reason is bin/hadoop namenode -format didn't remove the old data. So we have to delete it manually.

Do following steps:
1. bin/stop-all.sh
2. remove dfs/ and mapred/ folder of hadoop.tmp.dir in core-site.xml
3. bin/hadoop namenode -format
4. bin/start-all.sh
5. jps

Try formatting your datanode and restart it.

I have been using CDH4 as my version of hadoop and have been having trouble configuring it. Even after trying to reformat my namenode I was still receiving the error.
My VERSION file was located in
/var/lib/hadoop-hdfs/cache/{username}/dfs/data/current/VERSION
You can find the location of the HDFS cache directory by looking for the hadoop.tmp.dir property:
more /etc/hadoop/conf/hdfs-site.xml
I found that by doing
cd /var/lib/hadoop-hdfs/cache/
rm -rf *
and then reformatting the namenode I was finally able to fix the issue. Thanks to the first reply for helping me to figure out what folder I needed to bomb.

I tried with the approach 2 as suggested by Jared Stehler in the Chris Shain answer and i can confirm that after doing these changes , i was able to resolve the above mentioned problem.
I used the same version number for both the name and data VERSION file. Mean to say copied the version number from file VERSION inside (/app/hadoop/tmp/dfs/name/current) to VERSION inside (/app/hadoop/tmp/dfs/data/current) and it worked like charm
Cheers !

I encountered this issue when using an unmodified cloudera quickstart vm 4.4.0-1
For reference, the cloudera manager said my datanode was in good health, even though the error message in the DataStreamer stacktrace said no datanodes were running.
credit goes to workaround #2 from https://stackoverflow.com/a/10110369/249538 but i'll detail my specific experience using the cloudera quickstart vm.
Specifically, I did:
in this order, stop the services hue1, hive1, mapreduce1, hdfs1
via the cloudera manager http://localhost.localdomain:7180/cmf/services/status
found my VERSION files via:
sudo find / -name VERSION
i got:
/dfs/dn/current/BP-780931682-127.0.0.1-1381159027878/current/VERSION
/dfs/dn/current/VERSION
/dfs/nn/current/VERSION
/dfs/snn/current/VERSION
i checked the contents of those files, but they all had a matching namespaceID except one file was just totally missing it. so i added an entry to it.
then i restarted the services in reverse order via the cloudera manager.
now i can -put stuff onto hdfs.

In my case, I wrongly set one destination for dfs.name.dir and dfs.data.dir. The correct format is
<property>
<name>dfs.name.dir</name>
<value>/path/to/name</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/path/to/data</value>
</property>

I have the same problem with datanode missing
and i follow this step that worked for me
1.find the folder that datanode located in.
cd hadoop/hadoopdata/hdfs
2.look in the folder and you will see what file you have in hdfs
ls
3.delete the datanode folder because it is old version of datanode
rm -rf/datanode/*
4. you will get the new version after run the previous command
5. start new datanode
hadoop-daemon.sh start datanode
6. refresh the web services. You will see the lost node appears
my terminal

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio