Hadoop S3 List Bucket Contents Error - hadoop

Not sure what I'm missing here. What's the best way to troubleshoot the below message I get when trying bucket contents from s3 using hadoop 2.7.1. This should be pretty straight forward where I have my core-site.xml and hdfs-site.xml files as below and then when I try to run hadoop fs -ls s3a://<bucket_name>
hdfs-site.xml:
<configuration>
<property>
<name>fs.s3a.access.key</name>
<value>key</value>
</property>
<property>
<name>fs.s3a.secret.key</name>
<value>secret</value>
</property>
</configuration>
core-site.xml:
<configuration>
<property>
<name>fs.s3a.impl</name>
<value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
</property>
</configuration>
Error Message:
[root#ip-10-239-197-136 ~]# hadoop fs -ls s3a://<bucket_name>
15/11/29 17:43:11 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
-ls: Fatal internal error
com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 403, AWS Service: Amazon S3, AWS Request ID: ED84F95A33096A67, AWS Error Code: null, AWS Error Message: Forbidden, S3 Extended Request ID: l4FDE3LnYtOSj0TNUrwqv3yX/3x3RgesasBWDo7WcdrS3rkn/6TCz9+Rt6uylVxGcMaztu7gYH8=
at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:798)
at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:421)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:232)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3528)
at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:976)
at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:956)
at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:892)
at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:77)
at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
at org.apache.hadoop.fs.Globber.glob(Globber.java:252)
at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1655)
at org.apache.hadoop.fs.shell.PathData.expandAsGlob(PathData.java:326)
at org.apache.hadoop.fs.shell.Command.expandArgument(Command.java:235)
at org.apache.hadoop.fs.shell.Command.expandArguments(Command.java:218)
at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:201)
at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
at org.apache.hadoop.fs.FsShell.run(FsShell.java:287)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.fs.FsShell.main(FsShell.java:340)

Related

Unable to copy HDFS data to S3 bucket

I have an issue related to a similar question asked before. I'm unable to copy data from HDFS to an S3 bucket in IBM Cloud.
I use command: hadoop distcp hdfs://namenode:9000/user/root/data/ s3a://hdfs-backup/
I've added extra properties in /etc/hadoop/core-site.xml file:
<property>
<name>fs.s3a.access.key</name>
<value>XXX</value>
</property>
<property>
<name>fs.s3a.secret.key</name>
<value>XXX</value>
</property>
<property>
<name>fs.s3a.endpoint</name>
<value>s3.eu-de.cloud-object-storage.appdomain.cloud</value>
</property>
<property>
<name>fs.s3a.multipart.size</name>
<value>104857600</value>
</property>
I receive following error message:
root#e05ffff9bac9:/etc/hadoop# hadoop distcp hdfs://namenode:9000/user/root/data/ s3a://hdfs-backup/
2021-04-29 13:29:36,723 ERROR tools.DistCp: Invalid arguments:
java.lang.IllegalArgumentException
at java.util.concurrent.ThreadPoolExecutor.<init>(ThreadPoolExecutor.java:1314)
at java.util.concurrent.ThreadPoolExecutor.<init>(ThreadPoolExecutor.java:1237)
at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:280)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3303)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
at org.apache.hadoop.tools.DistCp.setTargetPathExists(DistCp.java:240)
at org.apache.hadoop.tools.DistCp.run(DistCp.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.tools.DistCp.main(DistCp.java:441)
Invalid arguments: null
Connection to S3 bucket with AWS CLI works fine. Thanks in advance for help!

Namenode daemon not starting properly

I have just started learning hadoop from the book Hadoop: The definitive guide.
I followed the tutorial for Hadoop installation in Pseudodistribution mode. I enabled the passwordless login to ssh.
Formatted the hdfs filesystem before using it for the first time. It started successfully for the first time.
After that I copied a text file using copyFromLocal to HDFS and everything went fine. But if I restart the system and start the daemons again and look at the web UI , only YARN is started successfully.
When I issue the stop-dfs.sh commmand I get
Stopping namenodes on [localhost]
localhost: no namenode to stop
localhost: stopping datanode
Stopping secondary namenodes [0.0.0.0]
0.0.0.0: stopping secondarynamenode
If I format the hdfs file system again and then try starting the daemons then they all start successfully.
Here are my configuration files.Exactly as what is told in hadoop definitive guide book.
hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost/</value>
</property>
</configuration>
mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>localhost</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
This is the error in the namenode log file
WARN org.apache.hadoop.hdfs.server.common.Storage: Storage directory /tmp/hadoop/dfs/name does not exist
WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception loading fsimage
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /tmp/hadoop/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverStorageDirs(FSImage.java:327)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:215)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:975)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:681)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:585)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:645)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:812)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:796)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1493)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1559)
This is from mapred log
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)
at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)
at org.apache.hadoop.ipc.Client.call(Client.java:1451)
... 33 more
I visited apache hadoop : connection refused which says
Check that there isn't an entry for your hostname mapped to 127.0.0.1 or 127.0.1.1 in /etc/hosts (Ubuntu is notorious for this).
I found there is an entry in my /etc/hosts, but if I remove it my sudo breaks causing error sudo: unable to resolve host . What should I append in /etc/hosts if not remove my hostname mapped to 127.0.1.1
I cannot understand what is the root cause of this problem.
Well it says in your Namenode log file that default storage of your namenode directory is /tmp/hadoop. The /tmp directory is formatted in linux on reboot by some systems. So it must be the problem.
You need to change your default namenode and datanode directory by changing your hdfs-site.xml configuration file.
Add this in your hdfs-site.xml
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///home/"your-user-name"/hadoop</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///home/"your-user-name"/datanode</value>
</property>
After this format your namenode by hdfs namenode -format command.
I think this will end your problem.
If configuration file is not a problem, please try following:
1.first delete all contents from temporary folder:
rm -Rf <tmp dir> (my was /usr/local/hadoop/tmp)
2.format the namenode:
bin/hadoop namenode -format
3.start all processes again:
bin/start-all.sh

Hadoop: Incorrect configuration

Hi stackoverflow community,
so I've been wanting to install hadoop, but I have come to a problem.
I've looked at other approaches, but I still keep receiving. I am completely new to hadoop, so I don't really know where to go. I am on a macbook pro with El Capitan if relevant. Once I make sbin/start-dfs.sh I receive this:
sbin/start-dfs.sh
16/05/10 11:09:37 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Incorrect configuration: namenode address dfs.namenode.servicerpc-address or dfs.namenode.rpc-address is not configured.
Starting namenodes on []
Password:
localhost: /usr/local/Cellar/hadoop/2.7.2/libexec/sbin/hadoop-daemon.sh: line 69: [: MacBook: integer expression expected
localhost: starting namenode, logging to /usr/local/Cellar/hadoop/2.7.2/libexec/logs/hadoop-name-namenode-name’s
localhost: Error: Could not find or load main class MacBook
The hadoop-daemon.sh is:
The relevant XMLs are as follow:
core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
If anything is wanted I will freely provide. Thank you for all the help and I truly appreciate it, since I really want to start using Hadoop.
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_65.jdk/Contents/Home
export HADOOP_PREFIX=/usr/local/Cellar/hadoop
Hey so this is an update if anyone is considered: I now get this
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [myIP#]
New note: I am redoing the process with and refollowing this guide. Whether or not success is mine, I will post my update here :)!
zhongyaonan.com/hadoop-tutorial/…
Looks like your conf directory is not set properly try following steps
export HADOOP_CONF_DIR = $HADOOP_HOME/etc/hadoop
hdfs namenode -format
hdfs getconf -namenodes
./start-dfs.sh

java.io.IOException: Incomplete HDFS URI

I am unable to find the HDFS path and save the log files of Twitter. Also it gives two warnings.
WARN: HBASE_HOME not found
WARN: HIVE_HOME not found
The error is:
java.io.IOException: Incomplete HDFS URI, no host: hdfs://l27.0.0.1:9000/tweets/movies/2016/01/29/01/FlumeData.1454010974716.tmp
My core-site.xml is
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://127.0.0.1:9000</value>
</property>
</configuration>

How to configure hadoop to use non-default port: "0.0.0.0: ssh: connect to host 0.0.0.0 port 22: Connection refused"

When I run start-dfs I get the below error and it looks like I need to tell hadoop to use a different port since that is what I require when I ssh into localhost. In other words the following works successfully: ssh -p 2020 localhost.
[Wed Jan 06 16:57:34 root#~]# start-dfs.sh
16/01/06 16:57:53 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [localhost]
localhost: namenode running as process 85236. Stop it first.
localhost: datanode running as process 85397. Stop it first.
Starting secondary namenodes [0.0.0.0]
0.0.0.0: ssh: connect to host 0.0.0.0 port 22: Connection refused
16/01/06 16:57:56 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
core-site.xml:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///hadoop/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///hadoop/hdfs/datanode</value>
</property>
</configuration>
If your Hadoop cluster nodes run sshd listening on a non-standard port, then it is possible to tell the Hadoop scripts to initiate ssh connections to that port. In fact, it's possible to customize any of the options passed to the ssh command.
This is controlled by an environment variable named HADOOP_SSH_OPTS. You can edit your hadoop-env.sh file and define it there. (By default this environment variable is not defined.)
For example:
export HADOOP_SSH_OPTS="-p 2020"

Resources