Unable to copy HDFS data to S3 bucket - hadoop

I have an issue related to a similar question asked before. I'm unable to copy data from HDFS to an S3 bucket in IBM Cloud.
I use the command: hadoop distcp hdfs://namenode:9000/user/root/data/ s3a://hdfs-backup/
I've added these extra properties to /etc/hadoop/core-site.xml:
<property>
<name>fs.s3a.access.key</name>
<value>XXX</value>
</property>
<property>
<name>fs.s3a.secret.key</name>
<value>XXX</value>
</property>
<property>
<name>fs.s3a.endpoint</name>
<value>s3.eu-de.cloud-object-storage.appdomain.cloud</value>
</property>
<property>
<name>fs.s3a.multipart.size</name>
<value>104857600</value>
</property>
I receive the following error message:
root@e05ffff9bac9:/etc/hadoop# hadoop distcp hdfs://namenode:9000/user/root/data/ s3a://hdfs-backup/
2021-04-29 13:29:36,723 ERROR tools.DistCp: Invalid arguments:
java.lang.IllegalArgumentException
at java.util.concurrent.ThreadPoolExecutor.<init>(ThreadPoolExecutor.java:1314)
at java.util.concurrent.ThreadPoolExecutor.<init>(ThreadPoolExecutor.java:1237)
at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:280)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3303)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
at org.apache.hadoop.tools.DistCp.setTargetPathExists(DistCp.java:240)
at org.apache.hadoop.tools.DistCp.run(DistCp.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.tools.DistCp.main(DistCp.java:441)
Invalid arguments: null
Connection to the S3 bucket with the AWS CLI works fine. Thanks in advance for the help!
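One way to narrow this down, as a debugging sketch rather than a confirmed fix, is to pass the same S3A properties directly on the distcp command line with Hadoop's generic -D options, which rules out the core-site.xml not being picked up (the key values below are the placeholders from the question):
hadoop distcp \
-Dfs.s3a.access.key=XXX \
-Dfs.s3a.secret.key=XXX \
-Dfs.s3a.endpoint=s3.eu-de.cloud-object-storage.appdomain.cloud \
hdfs://namenode:9000/user/root/data/ s3a://hdfs-backup/
Since the IllegalArgumentException is thrown while S3AFileSystem builds its ThreadPoolExecutor, thread-pool settings such as fs.s3a.threads.max and fs.s3a.max.total.tasks (if they are overridden anywhere) are also worth checking; that is a guess from the stack trace, not a confirmed cause.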

Related

Hadoop ignores filesystem location config on Windows 10

I am trying to install single-node Hadoop on Windows 10.
I have used various guides, but failed. The last one I used was https://github.com/MuhammadBilalYar/Hadoop-On-Window/wiki/Step-by-step-Hadoop-2.8.0-installation-on-Window-10
I have configured my dfs as
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///V:/DB/hadoop/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///V:/DB/hadoop/datanode</value>
</property>
Formatting went well.
Unfortunately, when I run start-all, I get the following in one of the windows:
18/04/22 21:36:17 WARN datanode.DataNode: Invalid dfs.datanode.data.dir V:\DB\hadoop\datanode :
org.apache.hadoop.util.DiskChecker$DiskErrorException: Directory is not readable: V:\DB\hadoop\datanode
at org.apache.hadoop.util.DiskChecker.checkAccessByFileMethods(DiskChecker.java:101)
at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:82)
at org.apache.hadoop.hdfs.server.datanode.DataNode$DataNodeDiskChecker.checkDir(DataNode.java:2580)
at org.apache.hadoop.hdfs.server.datanode.DataNode.checkStorageLocations(DataNode.java:2622)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2604)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2497)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2544)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2729)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2753)
18/04/22 21:36:17 ERROR datanode.DataNode: Exception in secureMain
java.io.IOException: All directories in dfs.datanode.data.dir are invalid: "/V:/DB/hadoop/datanode/"
at org.apache.hadoop.hdfs.server.datanode.DataNode.checkStorageLocations(DataNode.java:2631)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2604)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2497)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2544)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2729)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2753)
It looks like it has problems with the local Windows path specification. What should I do?
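One thing to check, as an assumption rather than a confirmed fix: the "Directory is not readable" message comes from Hadoop's DiskChecker, so the DataNode process must be able to read and write V:\DB\hadoop\datanode. On Windows this is often a permissions issue, which can be inspected and adjusted with the winutils.exe helper that Hadoop-on-Windows setups ship (assuming it is on PATH):
winutils.exe ls V:\DB\hadoop\datanode
winutils.exe chmod -R 755 V:\DB\hadoop\datanode
Running the Hadoop command windows as Administrator is another common suggestion for this class of error.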

Replace HDFS from local disk to S3 getting error (org.apache.hadoop.service.AbstractService)

We are trying to set up Cloudera 5.5 where HDFS will work on S3 only. For that we have already configured the necessary properties in core-site.xml:
<property>
<name>fs.s3a.access.key</name>
<value>################</value>
</property>
<property>
<name>fs.s3a.secret.key</name>
<value>###############</value>
</property>
<property>
<name>fs.default.name</name>
<value>s3a://bucket_Name</value>
</property>
<property>
<name>fs.defaultFS</name>
<value>s3a://bucket_Name</value>
</property>
After setting it up we were able to browse the files of the S3 bucket with the command
hadoop fs -ls /
and it shows only the files available on S3.
But when we start the YARN services, the JobHistory server fails to start with the error below, and on launching Pig jobs we get the same error:
PriviledgedActionException as:mapred (auth:SIMPLE) cause:org.apache.hadoop.fs.UnsupportedFileSystemException: No AbstractFileSystem for scheme: s3a
ERROR org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils
Unable to create default file context [s3a://kyvosps]
org.apache.hadoop.fs.UnsupportedFileSystemException: No AbstractFileSystem for scheme: s3a
at org.apache.hadoop.fs.AbstractFileSystem.createFileSystem(AbstractFileSystem.java:154)
at org.apache.hadoop.fs.AbstractFileSystem.get(AbstractFileSystem.java:242)
at org.apache.hadoop.fs.FileContext$2.run(FileContext.java:337)
at org.apache.hadoop.fs.FileContext$2.run(FileContext.java:334)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
On searching the Internet we found that we need to set the following properties as well in core-site.xml:
<property>
<name>fs.s3a.impl</name>
<value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
<description>The implementation class of the S3A Filesystem</description>
</property>
<property>
<name>fs.AbstractFileSystem.s3a.impl</name>
<value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
<description>The FileSystem for S3A Filesystem</description>
</property>
After setting the above properties we are getting following error
org.apache.hadoop.service.AbstractService
Service org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager failed in state INITED; cause: java.lang.RuntimeException: java.lang.NoSuchMethodException: org.apache.hadoop.fs.s3a.S3AFileSystem.<init>(java.net.URI, org.apache.hadoop.conf.Configuration)
java.lang.RuntimeException: java.lang.NoSuchMethodException: org.apache.hadoop.fs.s3a.S3AFileSystem.<init>(java.net.URI, org.apache.hadoop.conf.Configuration)
at org.apache.hadoop.fs.AbstractFileSystem.newInstance(AbstractFileSystem.java:131)
at org.apache.hadoop.fs.AbstractFileSystem.createFileSystem(AbstractFileSystem.java:157)
at org.apache.hadoop.fs.AbstractFileSystem.get(AbstractFileSystem.java:242)
at org.apache.hadoop.fs.FileContext$2.run(FileContext.java:337)
at org.apache.hadoop.fs.FileContext$2.run(FileContext.java:334)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.fs.FileContext.getAbstractFileSystem(FileContext.java:334)
at org.apache.hadoop.fs.FileContext.getFileContext(FileContext.java:451)
at org.apache.hadoop.fs.FileContext.getFileContext(FileContext.java:473)
at org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils.getDefaultFileContext(JobHistoryUtils.java:247)
The jars needed for this are in place, but we are still getting the error. Any help will be great. Thanks in advance.
Update
I tried removing the property fs.AbstractFileSystem.s3a.impl, but it gives me the same first exception I was getting previously, which is:
org.apache.hadoop.security.UserGroupInformation
PriviledgedActionException as:mapred (auth:SIMPLE) cause:org.apache.hadoop.fs.UnsupportedFileSystemException: No AbstractFileSystem for scheme: s3a
ERROR org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils
Unable to create default file context [s3a://bucket_name]
org.apache.hadoop.fs.UnsupportedFileSystemException: No AbstractFileSystem for scheme: s3a
at org.apache.hadoop.fs.AbstractFileSystem.createFileSystem(AbstractFileSystem.java:154)
at org.apache.hadoop.fs.AbstractFileSystem.get(AbstractFileSystem.java:242)
at org.apache.hadoop.fs.FileContext$2.run(FileContext.java:337)
at org.apache.hadoop.fs.FileContext$2.run(FileContext.java:334)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.fs.FileContext.getAbstractFileSystem(FileContext.java:334)
at org.apache.hadoop.fs.FileContext.getFileContext(FileContext.java:451)
at org.apache.hadoop.fs.FileContext.getFileContext(FileContext.java:473)
The problem is not with the location of the jars.
The problem is with the setting:
<property>
<name>fs.AbstractFileSystem.s3a.impl</name>
<value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
<description>The FileSystem for S3A Filesystem</description>
</property>
This setting is not needed. Because of it, Hadoop searches for the following constructor in the S3AFileSystem class, and there is no such constructor:
S3AFileSystem(URI theUri, Configuration conf);
The following exception clearly tells that it is unable to find a constructor for S3AFileSystem with URI and Configuration parameters:
java.lang.RuntimeException: java.lang.NoSuchMethodException: org.apache.hadoop.fs.s3a.S3AFileSystem.<init>(java.net.URI, org.apache.hadoop.conf.Configuration)
To resolve this problem, remove the fs.AbstractFileSystem.s3a.impl setting from core-site.xml. Just having the fs.s3a.impl setting in core-site.xml should solve your problem.
EDIT:
org.apache.hadoop.fs.s3a.S3AFileSystem just extends FileSystem.
Hence, you cannot set the value of fs.AbstractFileSystem.s3a.impl to org.apache.hadoop.fs.s3a.S3AFileSystem, since org.apache.hadoop.fs.s3a.S3AFileSystem does not extend AbstractFileSystem.
I am using Hadoop 2.7.0, and in this version S3A is not exposed as an AbstractFileSystem.
There is a JIRA ticket, https://issues.apache.org/jira/browse/HADOOP-11262, to implement this, and the fix is available in Hadoop 2.8.0.
Assuming your jar has exposed S3A as an AbstractFileSystem, you need to set the following for fs.AbstractFileSystem.s3a.impl:
<property>
<name>fs.AbstractFileSystem.s3a.impl</name>
<value>org.apache.hadoop.fs.s3a.S3A</value>
</property>
That will solve your problem.
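A quick way to sanity-check the change, assuming the bucket name from the question: the plain FileSystem path only needs fs.s3a.impl, while the JobHistory server goes through FileContext/AbstractFileSystem and therefore needs the org.apache.hadoop.fs.s3a.S3A binding (available from Hadoop 2.8.0).
hadoop fs -ls s3a://bucket_Name/
# exercises fs.s3a.impl via the FileSystem API
# then restart the JobHistory server and watch its log; it goes through
# FileContext and so exercises fs.AbstractFileSystem.s3a.impl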

Hadoop S3 List Bucket Contents Error

Not sure what I'm missing here. What's the best way to troubleshoot the message below, which I get when trying to list bucket contents from S3 using Hadoop 2.7.1? This should be pretty straightforward: I have my core-site.xml and hdfs-site.xml files as below, and then I try to run hadoop fs -ls s3a://<bucket_name>
hdfs-site.xml:
<configuration>
<property>
<name>fs.s3a.access.key</name>
<value>key</value>
</property>
<property>
<name>fs.s3a.secret.key</name>
<value>secret</value>
</property>
</configuration>
core-site.xml:
<configuration>
<property>
<name>fs.s3a.impl</name>
<value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
</property>
</configuration>
Error Message:
[root@ip-10-239-197-136 ~]# hadoop fs -ls s3a://<bucket_name>
15/11/29 17:43:11 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
-ls: Fatal internal error
com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 403, AWS Service: Amazon S3, AWS Request ID: ED84F95A33096A67, AWS Error Code: null, AWS Error Message: Forbidden, S3 Extended Request ID: l4FDE3LnYtOSj0TNUrwqv3yX/3x3RgesasBWDo7WcdrS3rkn/6TCz9+Rt6uylVxGcMaztu7gYH8=
at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:798)
at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:421)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:232)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3528)
at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:976)
at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:956)
at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:892)
at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:77)
at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
at org.apache.hadoop.fs.Globber.glob(Globber.java:252)
at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1655)
at org.apache.hadoop.fs.shell.PathData.expandAsGlob(PathData.java:326)
at org.apache.hadoop.fs.shell.Command.expandArgument(Command.java:235)
at org.apache.hadoop.fs.shell.Command.expandArguments(Command.java:218)
at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:201)
at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
at org.apache.hadoop.fs.FsShell.run(FsShell.java:287)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.fs.FsShell.main(FsShell.java:340)
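This one is left unanswered above; a 403 from getObjectMetadata usually means the request reached S3 but was rejected. As a debugging sketch (not a confirmed fix), it can help to pass the credentials explicitly on the command line, which rules out the XML files not being read from the expected location:
hadoop fs -Dfs.s3a.access.key=key -Dfs.s3a.secret.key=secret -ls s3a://<bucket_name>/
If that still returns 403, the credentials themselves or the bucket policy are the more likely suspects.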

Error: E0902: Exception occured: [User: Root is not allowed to impersonate root]

I am trying to follow the steps given at http://www.rohitmenon.com/index.php/apache-oozie-installation/
Note: I am not using the Cloudera distribution of Hadoop.
The above link is similar to http://oozie.apache.org/docs/4.0.1/DG_QuickStart.html
but more descriptive, it seems to me.
However, while running the below command as the root user I am getting an exception:
./bin/oozie-setup.sh sharelib create -fs
Note: I have two live nodes shown at dfshealth.jsp, and I have updated core-site.xml for all three (including the namenode) with the properties below:
<property>
<name>hadoop.proxyuser.root.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.root.groups</name>
<value>*</value>
</property>
I understand this is the point where I am making a mistake. Could someone please guide me?
Stacktrace
org.apache.oozie.service.HadoopAccessorException: E0902: Exception occured: [User: root is not allowed to impersonate root]
at org.apache.oozie.service.HadoopAccessorService.createFileSystem(HadoopAccessorService.java:430)
at org.apache.oozie.tools.OozieSharelibCLI.run(OozieSharelibCLI.java:144)
at org.apache.oozie.tools.OozieSharelibCLI.main(OozieSharelibCLI.java:52)
Caused by: org.apache.hadoop.ipc.RemoteException: User: root is not allowed to impersonate root
at org.apache.hadoop.ipc.Client.call(Client.java:1107)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
at com.sun.proxy.$Proxy5.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:411)
at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:135)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:276)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:241)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:100)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1411)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1429)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
at org.apache.oozie.service.HadoopAccessorService$2.run(HadoopAccessorService.java:422)
at org.apache.oozie.service.HadoopAccessorService$2.run(HadoopAccessorService.java:420)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1136)
at org.apache.oozie.service.HadoopAccessorService.createFileSystem(HadoopAccessorService.java:420)
... 2 more
--------------------------------------
Note: For "E0902: Exception occured: [User: oozie is not allowed to impersonate oozie]" I have followed this link as well, but I was not able to solve my problem.
If I change core-site.xml as below only for the NameNode,
<property>
<name>hadoop.proxyuser.hadoop.hosts</name>
<value>[NAMENODE IP]</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.groups</name>
<value>hadoop</value>
</property>
I get the exception:
Unauthorized connection for super-user: hadoop
After adding the properties to core-site.xml, restart your Hadoop and try again. If it still does not work, format the namenode and start Hadoop; it should work.
You need to add these properties to core-site.xml for impersonation in order to solve your whitelist error:
<property>
<name>hadoop.proxyuser.oozie.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.oozie.groups</name>
<value>*</value>
</property>
Hope this fixes your issue.
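One detail worth adding, hedged because the exact command depends on the Hadoop version in use: the hadoop.proxyuser.* settings are evaluated by the NameNode, so after editing core-site.xml they only take effect once the NameNode is restarted or told to reload them, for example:
hadoop dfsadmin -refreshSuperUserGroupsConfiguration
On newer releases the equivalent command is hdfs dfsadmin -refreshSuperUserGroupsConfiguration.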
Follow the advice in the article below. Hadoop before 1.1.0 doesn't support wildcards, so you have to explicitly specify the hosts and the groups:
http://mail-archives.apache.org/mod_mbox/oozie-user/201212.mbox/%3CCAOcnVr1TZZ5X0Mrb7fFA8JdW6rO6PgoJ9u0=2UYbfXf_o8r=DA@mail.gmail.com%3E
I solved the problem by adding these lines to the core-site.xml file:
<property>
<name>hadoop.proxyuser.root.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.root.groups</name>
<value>*</value>
</property>
and it works perfectly; all my databases and tables are shown.
./oozie-setup.sh sharelib create -fs hdfs://localhost:9000
Try to run this command using sudo.
Check in HDFS whether this path already exists, i.e. /user/user_name/share/lib; if it exists, remove it using
hadoop fs -rmr /user/user_name
After that run sudo ./oozied.sh; Oozie will be started. Then check localhost:11000.

Error running mapreduce sample in hadoop 0.23.6

I deployed Hadoop 0.23.6 on Ubuntu 12.04 LTS. I am able to copy files across and do file manipulation. I am using YARN for MapReduce.
I am getting the following error when I try to run any MapReduce application using hadoop-mapreduce-examples-0.23.6.jar.
Command used:
bin/hadoop jar hadoop-mapreduce-examples-0.23.6.jar randomwriter -Dmapreduce.randomwriter.mapsperhost=1 -Dmapreduce.job.user.name=$USER -Dmapreduce.randomwriter.bytespermap=10000 -Ddfs.blocksize=536870912 -Ddfs.block.size=536870912 -libjars hadoop-mapreduce-client-app-0.23.6.jar output
Hadoop version: 0.23.6
Container launch failed for container_1364342550899_0001_01_000002 : java.lang.IllegalStateException: Invalid shuffle port number -1 returned for attempt_1364342550899_0001_m_000000_0
Verify your yarn-site.xml configuration. You need to have the properties below configured:
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce.shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
For more details, have a look at the JIRA ticket:
https://issues.apache.org/jira/browse/MAPREDUCE-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
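As a general follow-up rather than something from the original answer: yarn.nodemanager.aux-services is read by the NodeManagers at startup, so the change only takes effect after they are restarted, e.g. with the standard scripts of that install:
sbin/stop-yarn.sh
sbin/start-yarn.sh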
