java.io.IOException: Incomplete HDFS URI - hadoop

I am unable to find the HDFS path and save the Twitter log files. It also prints two warnings:
WARN: HBASE_HOME not found
WARN: HIVE_HOME not found
The error is:
java.io.IOException: Incomplete HDFS URI, no host: hdfs://l27.0.0.1:9000/tweets/movies/2016/01/29/01/FlumeData.1454010974716.tmp
My core-site.xml is
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://127.0.0.1:9000</value>
</property>
</configuration>
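Comparing the two URIs in the question character by character may help here: the host in the error message starts with a lowercase letter l (l27.0.0.1), while core-site.xml uses the digit 1 (127.0.0.1). The sketch below only illustrates that comparison. It assumes the URI in the error comes from some other configuration than core-site.xml (for example the sink's HDFS path, which is not shown in the question), and the helper names are hypothetical.

```python
import ipaddress
from urllib.parse import urlsplit


def authority(uri: str) -> str:
    """Return the host:port part of a URI."""
    return urlsplit(uri).netloc


def is_ipv4(host: str) -> bool:
    """Return True if host is a syntactically valid IPv4 address."""
    try:
        ipaddress.IPv4Address(host)
        return True
    except ValueError:
        return False


# Value of fs.defaultFS from core-site.xml in the question.
configured = "hdfs://127.0.0.1:9000"
# URI copied from the exception message in the question.
reported = "hdfs://l27.0.0.1:9000/tweets/movies/2016/01/29/01/FlumeData.1454010974716.tmp"

conf_host = urlsplit(configured).hostname
err_host = urlsplit(reported).hostname

print("same authority:", authority(configured) == authority(reported))
print("configured host is IPv4:", is_ipv4(conf_host))
print("reported host is IPv4:", is_ipv4(err_host))
```

If the two authorities differ, the place where the failing URI is defined is worth checking before anything else.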

Related

Creating a table in Hive on macOS fails with error: localhost:9000 failed on connection

hive> CREATE SCHEMA IF NOT EXISTS inconv_seql;
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Got exception: java.net.ConnectException Call From User-MacBook-Air.local/127.0.0.1 to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused)
localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused;
The above error occurs because the Hadoop daemons listening on port 9000 are not running on your local machine.
Please start Hadoop and then start Hive by following the steps below.
1. Check that Hadoop is running:
hduser@ubuntu:~$ jps
If you do not see any Hadoop daemons running on your local machine, start Hadoop with the following command:
hduser@ubuntu:~$ $HADOOP_HOME/sbin/start-all.sh
2. Check hive-site.xml and core-site.xml:
hive-site.xml
<property>
<name>hive.metastore.db.type</name>
<value>DERBY</value>
<description> Expects one of [derby, oracle, mysql, mssql, postgres]. Type of database used by the metastore. Information schema &amp; JDBCStorageHandler depend on it. </description>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>hdfs://localhost:8020/user/hive/warehouse</value>
<description>location of default database for the warehouse</description>
</property>
core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/Cellar/hadoop/hdfs/tmp</value>
<description>A base for other temporary directories</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:8020</value>
</property>
</configuration>
Then launch the Hive terminal and proceed.
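Since the error mentions port 9000 while the sample configuration above uses port 8020, it can help to see which port actually accepts connections before starting Hive. The following is a minimal sketch using only the Python standard library; the function name port_is_open is hypothetical, and the two ports are simply the ones that appear in this thread.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    # 9000 comes from the error message, 8020 from the sample configuration.
    for port in (9000, 8020):
        state = "open" if port_is_open("localhost", port) else "closed"
        print(f"localhost:{port} is {state}")
```

If neither port is open, the NameNode is probably not running; if only one is open, the port in fs.default.name and in hive.metastore.warehouse.dir should match it.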

Hadoop ignores filesystem location config on Windows 10

I am trying to install single-node Hadoop on Windows 10.
I have tried various guides without success. The last one I used was https://github.com/MuhammadBilalYar/Hadoop-On-Window/wiki/Step-by-step-Hadoop-2.8.0-installation-on-Window-10
I have configured my DFS as follows:
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///V:/DB/hadoop/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///V:/DB/hadoop/datanode</value>
</property>
Formatting went well.
Unfortunately, when I run start-all, I get the following in one of the windows:
18/04/22 21:36:17 WARN datanode.DataNode: Invalid dfs.datanode.data.dir V:\DB\hadoop\datanode :
org.apache.hadoop.util.DiskChecker$DiskErrorException: Directory is not readable: V:\DB\hadoop\datanode
at org.apache.hadoop.util.DiskChecker.checkAccessByFileMethods(DiskChecker.java:101)
at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:82)
at org.apache.hadoop.hdfs.server.datanode.DataNode$DataNodeDiskChecker.checkDir(DataNode.java:2580)
at org.apache.hadoop.hdfs.server.datanode.DataNode.checkStorageLocations(DataNode.java:2622)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2604)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2497)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2544)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2729)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2753)
18/04/22 21:36:17 ERROR datanode.DataNode: Exception in secureMain
java.io.IOException: All directories in dfs.datanode.data.dir are invalid: "/V:/DB/hadoop/datanode/"
at org.apache.hadoop.hdfs.server.datanode.DataNode.checkStorageLocations(DataNode.java:2631)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2604)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2497)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2544)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2729)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2753)
It looks like it has a problem with the local Windows path specification. What should I do?
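One way to narrow this down is to check the directory outside Hadoop, as the same user that runs the DataNode. The sketch below is not Hadoop's own logic; it only performs existence and permission checks similar in spirit to what the DiskChecker frames in the stack trace suggest. The function name check_storage_dir is hypothetical, and on Windows os.access may not fully reflect NTFS permissions, so treat the result as a hint only.

```python
import os
from typing import List


def check_storage_dir(path: str) -> List[str]:
    """Return a list of problems found with a local storage directory."""
    problems = []
    if not os.path.exists(path):
        problems.append("does not exist")
    elif not os.path.isdir(path):
        problems.append("is not a directory")
    else:
        # Check read, write and traverse permission for the current user.
        if not os.access(path, os.R_OK):
            problems.append("is not readable")
        if not os.access(path, os.W_OK):
            problems.append("is not writable")
        if not os.access(path, os.X_OK):
            problems.append("is not executable")
    return problems


if __name__ == "__main__":
    # Local path taken from the log output in the question.
    target = r"V:\DB\hadoop\datanode"
    issues = check_storage_dir(target)
    print(target, "looks fine" if not issues else issues)
```

Run it from the same account and console that starts the DataNode, since permissions can differ between an elevated and a normal prompt.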

org.apache.hadoop.ipc.RpcException: RPC response exceeds maximum data length

I have set up a Hadoop cluster on 2 machines.
One machine has both master and slave-1.
The 2nd machine has slave-2.
When I started the cluster with start-all.sh, I got the following error in the secondarynamenode's .out file:
java.io.IOException: Failed on local exception: org.apache.hadoop.ipc.RpcException: RPC response exceeds maximum data length; Host Details : local host is: "ip-10-179-185-169/10.179.185.169"; destination host is: "hadoop-master":9000;
The following is my jps output:
98366 Jps
96704 DataNode
97284 NodeManager
97148 ResourceManager
96919 SecondaryNameNode
Can someone help me tackle this error?
I also had this problem.
Please check core-site.xml
(this should be under the dir where you downloaded Hadoop, for me the path is: /home/algo/hadoop/etc/hadoop/core-site.xml)
The file should look like this:
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/algo/hdfs/tmp</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
Solution: use hdfs://localhost:9000 as the ip:port.
It might be a problem with the port number you are using. Try this: https://stackoverflow.com/a/60701948/8504709
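To compare the configured filesystem address with the destination host in the exception (hadoop-master:9000 in the question), the value can be read straight from core-site.xml. This is a small sketch using the Python standard library; the function name default_fs is hypothetical, and the sample text is the configuration from the answer above rather than the asker's actual file.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit


def default_fs(xml_text: str):
    """Return fs.defaultFS (or the older fs.default.name) from core-site.xml text."""
    root = ET.fromstring(xml_text)
    props = {
        prop.findtext("name"): prop.findtext("value")
        for prop in root.iter("property")
    }
    return props.get("fs.defaultFS") or props.get("fs.default.name")


# Configuration copied from the answer above.
sample = """
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/algo/hdfs/tmp</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
"""

uri = default_fs(sample)
parts = urlsplit(uri)
print("default filesystem:", uri)
print("host:", parts.hostname, "port:", parts.port)
```

To check a real installation, read the file with open(path).read() and pass the text to default_fs; on a two-machine cluster the host printed here should be one that both machines can resolve.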

Namenode daemon not starting properly

I have just started learning Hadoop from the book Hadoop: The Definitive Guide.
I followed the tutorial for installing Hadoop in pseudo-distributed mode and enabled passwordless SSH login.
I formatted the HDFS filesystem before using it for the first time, and it started successfully the first time.
After that I copied a text file to HDFS using copyFromLocal and everything went fine. But if I restart the system, start the daemons again, and look at the web UI, only YARN has started successfully.
When I issue the stop-dfs.sh command I get:
Stopping namenodes on [localhost]
localhost: no namenode to stop
localhost: stopping datanode
Stopping secondary namenodes [0.0.0.0]
0.0.0.0: stopping secondarynamenode
If I format the HDFS filesystem again and then start the daemons, they all start successfully.
Here are my configuration files, exactly as given in Hadoop: The Definitive Guide.
hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost/</value>
</property>
</configuration>
mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>localhost</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
This is the error in the namenode log file:
WARN org.apache.hadoop.hdfs.server.common.Storage: Storage directory /tmp/hadoop/dfs/name does not exist
WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception loading fsimage
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /tmp/hadoop/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverStorageDirs(FSImage.java:327)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:215)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:975)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:681)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:585)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:645)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:812)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:796)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1493)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1559)
This is from the mapred log:
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)
at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)
at org.apache.hadoop.ipc.Client.call(Client.java:1451)
... 33 more
I visited the Apache Hadoop wiki page on Connection Refused, which says:
Check that there isn't an entry for your hostname mapped to 127.0.0.1 or 127.0.1.1 in /etc/hosts (Ubuntu is notorious for this).
I found such an entry in my /etc/hosts, but if I remove it, sudo breaks with the error sudo: unable to resolve host . What should I put in /etc/hosts instead, if I cannot remove the hostname mapped to 127.0.1.1?
I cannot work out the root cause of this problem.
Your namenode log shows that the default storage directory for your namenode is under /tmp/hadoop. On some Linux systems the /tmp directory is cleared on reboot, so that is most likely the problem.
You need to change the default namenode and datanode directories in your hdfs-site.xml configuration file.
Add this to your hdfs-site.xml:
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///home/"your-user-name"/hadoop</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///home/"your-user-name"/datanode</value>
</property>
After this, format your namenode with the hdfs namenode -format command.
I think this will solve your problem.
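To see why the namenode directory ends up under /tmp when hdfs-site.xml only sets dfs.replication, the fallback chain can be sketched as follows. The defaults used here (hadoop.tmp.dir as /tmp/hadoop-<user> and dfs.namenode.name.dir as file://<hadoop.tmp.dir>/dfs/name) reflect my understanding of the stock core-default.xml and hdfs-default.xml and should be checked against your Hadoop version; the function names and the user name alice are hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def name_dir(props: dict, user: str) -> PurePosixPath:
    """Resolve the namenode storage directory from explicitly set properties."""
    # Assumed default: hadoop.tmp.dir falls back to /tmp/hadoop-<user>.
    tmp_dir = props.get("hadoop.tmp.dir", f"/tmp/hadoop-{user}")
    # Assumed default: dfs.namenode.name.dir falls back to <tmp>/dfs/name.
    raw = props.get("dfs.namenode.name.dir", f"file://{tmp_dir}/dfs/name")
    path = urlsplit(raw).path if raw.startswith("file:") else raw
    return PurePosixPath(path)


def is_under_tmp(path: PurePosixPath) -> bool:
    """Return True if the path lives below /tmp."""
    return path.parts[:2] == ("/", "tmp")


# Only dfs.replication is set, as in the question's hdfs-site.xml.
before = name_dir({"dfs.replication": "1"}, user="alice")
# With the property from this answer added.
after = name_dir({"dfs.namenode.name.dir": "file:///home/alice/hadoop"}, user="alice")

print(before, "under /tmp:", is_under_tmp(before))
print(after, "under /tmp:", is_under_tmp(after))
```

A directory that resolves under /tmp may disappear on reboot, which would match the "storage directory does not exist" message in the log.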
If the configuration file is not the problem, please try the following:
1. First, delete all contents of the temporary folder:
rm -Rf <tmp dir> (mine was /usr/local/hadoop/tmp)
2. Format the namenode:
bin/hadoop namenode -format
3. Start all processes again:
bin/start-all.sh

Hadoop S3 List Bucket Contents Error

Not sure what I'm missing here. What's the best way to troubleshoot the message below, which I get when trying to list bucket contents from S3 using Hadoop 2.7.1? This should be pretty straightforward: I have my core-site.xml and hdfs-site.xml files as below, and then I run hadoop fs -ls s3a://<bucket_name>
hdfs-site.xml:
<configuration>
<property>
<name>fs.s3a.access.key</name>
<value>key</value>
</property>
<property>
<name>fs.s3a.secret.key</name>
<value>secret</value>
</property>
</configuration>
core-site.xml:
<configuration>
<property>
<name>fs.s3a.impl</name>
<value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
</property>
</configuration>
Error Message:
[root@ip-10-239-197-136 ~]# hadoop fs -ls s3a://<bucket_name>
15/11/29 17:43:11 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
-ls: Fatal internal error
com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 403, AWS Service: Amazon S3, AWS Request ID: ED84F95A33096A67, AWS Error Code: null, AWS Error Message: Forbidden, S3 Extended Request ID: l4FDE3LnYtOSj0TNUrwqv3yX/3x3RgesasBWDo7WcdrS3rkn/6TCz9+Rt6uylVxGcMaztu7gYH8=
at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:798)
at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:421)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:232)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3528)
at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:976)
at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:956)
at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:892)
at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:77)
at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
at org.apache.hadoop.fs.Globber.glob(Globber.java:252)
at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1655)
at org.apache.hadoop.fs.shell.PathData.expandAsGlob(PathData.java:326)
at org.apache.hadoop.fs.shell.Command.expandArgument(Command.java:235)
at org.apache.hadoop.fs.shell.Command.expandArguments(Command.java:218)
at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:201)
at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
at org.apache.hadoop.fs.FsShell.run(FsShell.java:287)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.fs.FsShell.main(FsShell.java:340)

Resources