Unable to upload file or create directory via Hadoop UI - hadoop

I have installed Hadoop 3.2.1 on Ubuntu 18.04 with Java 8. I am able to send files to HDFS using the hadoop fs -put command from the terminal, but when I try to upload files or create a directory via the web UI, I get the following errors:
While uploading a file:
Couldn't upload the file temp.txt
While creating a directory:
Permission denied: user=dr.who, access=WRITE,
inode="/":user1:supergroup:drwxr-xr-x
hdfs-site.xml file:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop_tmp/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop_tmp/hdfs/datanode</value>
</property>
</configuration>

Read about HDFS permissions in the HDFS Permissions Guide. The web UI acts as the static user dr.who by default, and dr.who has no write access to /, which is why the terminal works but the UI does not.
Temporarily, you can turn permissions off completely in hdfs-site.xml:
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
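Disabling permissions is fine for a sandbox (on Hadoop 3 the property is usually spelled dfs.permissions.enabled, though the old name is still accepted), but a less drastic alternative is to make the web UI act as the HDFS user that owns the directory instead of the default static user dr.who. Assuming user1 is the owner shown in the error above, a sketch of that change in core-site.xml would be:
<property>
<name>hadoop.http.staticuser.user</name>
<value>user1</value>
</property>
You could also keep dr.who and instead open up a dedicated upload directory from the terminal, e.g. hdfs dfs -mkdir /uploads followed by hdfs dfs -chmod 777 /uploads. Either way, restart the HDFS daemons after changing hdfs-site.xml or core-site.xml.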

Related

How to access S3 files using an on-prem Hadoop cluster?

I have a Cloudera VM and was able to set up the AWS CLI and keys. But I am not able to read or list S3 files using hadoop fs -ls s3://gft-ri or any other Hadoop command, even though I can see the directories/files using the AWS CLI.
Snapshot of the commands:
(base) [cloudera@quickstart conf]$ aws s3 ls s3://gft-risk-aml-market-dev/
PRE test/
2019-11-27 04:11:26 458 required
(base) [cloudera@quickstart conf]$ hdfs dfs -ls s3://gft-risk-aml-market-dev/
19/11/27 05:30:45 WARN fs.FileSystem: S3FileSystem is deprecated and will be removed in future releases. Use NativeS3FileSystem or S3AFileSystem instead.
ls: `s3://gft-risk-aml-market-dev/': No such file or directory
I have put the following details in core-site.xml:
<property>
<name>fs.s3.impl</name>
<value>org.apache.hadoop.fs.s3.S3FileSystem</value>
</property>
<property>
<name>fs.s3.awsAccessKeyId</name>
<value>ANHS</value>
</property>
<property>
<name>fs.s3.awsSecretAccessKey</name>
<value>EOo</value>
</property>
<property>
<name>fs.s3.path.style.access</name>
<value>true</value>
</property>
<property>
<name>fs.s3.endpoint</name>
<value>s3.us-east-1.amazonaws.com</value>
</property>
<property>
<name>fs.s3.connection.ssl.enabled</name>
<value>false</value>
</property>
Finally: with Cloudera Quickstart V13 and the core-site.xml below, it worked.
<property>
<name>fs.s3a.impl</name>
<value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
</property>
<property>
<name>fs.s3a.awsAccessKeyId</name>
<value>AKIAxxxx</value>
</property>
<property>
<name>fs.s3a.awsSecretAccessKey</name>
<value>Xxxxxx</value>
</property>
<property>
<name>fs.s3a.path.style.access</name>
<value>true</value>
</property>
<property>
<name>fs.AbstractFileSystem.s3a.impl</name>
<value>org.apache.hadoop.fs.s3a.S3A</value>
<description>The implementation class of the S3A AbstractFileSystem.</description>
</property>
<property>
<name>fs.s3a.endpoint</name>
<value>s3.us-east-1.amazonaws.com</value>
</property>
<property>
<name>fs.s3a.connection.ssl.enabled</name>
<value>false</value>
</property>
<property>
<name>fs.s3a.readahead.range</name>
<value>64K</value>
<description>Bytes to read ahead during a seek() before closing and
re-opening the S3 HTTP connection. This option will be overridden if
any call to setReadahead() is made to an open stream.</description>
</property>
<property>
<name>fs.s3a.list.version</name>
<value>2</value>
<description>Select which version of the S3 SDK's List Objects API to use.
Currently support 2 (default) and 1 (older API).</description>
</property>
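With that in place, a quick sanity check from the shell (bucket name taken from the question) would be:
hdfs dfs -ls s3a://gft-risk-aml-market-dev/
One caveat: on most Hadoop releases the S3A credentials are read from fs.s3a.access.key and fs.s3a.secret.key, so if the awsAccessKeyId/awsSecretAccessKey names above are not picked up on your version, those are the property names to try.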
I would use the Linux console to mount the S3 bucket and then move files from there into HDFS. You will probably need to install s3fs-fuse on the Cloudera Quickstart VM first by sudo'ing into root, e.g. sudo yum install s3fs-fuse
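A minimal sketch of that approach, reusing the bucket and the (redacted) credentials from above; the mount point, passwd file, and target HDFS path are placeholders:
echo 'AKIAxxxx:Xxxxxx' > ~/.passwd-s3fs
chmod 600 ~/.passwd-s3fs
mkdir -p ~/s3bucket
s3fs gft-risk-aml-market-dev ~/s3bucket -o passwd_file=${HOME}/.passwd-s3fs
hdfs dfs -put ~/s3bucket/required /user/cloudera/
The file name required comes from the aws s3 ls output in the question.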

Pseudo Distributed Mode Hadoop

I have installed Hadoop 2.7.3 in pseudo-distributed mode on a Mac and did all the configuration specified in the Pluralsight course. I copied a CSV file from local to HDFS, but the next day the file was no longer present in HDFS; it had been removed automatically. Is there any other configuration setting I need so that my files are not lost?
core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
Thanks,
Add these properties to hdfs-site.xml
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/username/hadoop-dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/username/hadoop-dfs/data</value>
</property>
By default, the metadata and data blocks are stored under /tmp, since that is where hadoop.tmp.dir points, and the contents of /tmp are deleted on reboot, which is why your files disappear.
After adding these properties, format the namenode and start the services.
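For reference, that last step would look roughly like this; note that formatting wipes any existing HDFS metadata, so only do it on a fresh or expendable setup:
stop-dfs.sh
hdfs namenode -format
start-dfs.sh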

Getting connection refused while reading file from hdfs using pyspark

I installed Hadoop 2.7, set the paths, and set the configuration in core-site.xml and hdfs-site.xml as follows:
core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://<ip_addr>:9000/</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/kavya/hdfs/data</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/home/kavya/hdfs/name</value>
</property>
</configuration>
hdfs-site.xml:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://<ip_addr>:9000/</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/kavya/hdfs/data</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/home/kavya/hdfs/name</value>
</property>
</configuration>
I also started HDFS using start-dfs.sh. In spite of specifying the IP address in the configuration, I get a connection refused error like:
Call From spark/<ip_addr> to localhost:8020 failed on connection exception: java.net.ConnectException:Connection refused
I stored a file into HDFS from my VM using:
hadoop fs -put /opt/TestLogs/traffic_log.log /usr/local/hadoop/TestLogs
This is part of my PySpark code that reads the file from HDFS and then extracts the fields:
file = sc.textFile("hdfs://<ip_addr>/usr/local/hadoop/TestLogs/traffic_log.log")
result = file.filter(lambda x: len(x)>0)
result = result.map(lambda x: x.split("\n"))
print(result) # PythonRDD[2] at RDD at PythonRDD.scala
lines = result.map(func1).collect() #this is where I get the connection refused error.
print(lines)
func1 is a function containing regular expressions that extract the fields from my logs, and the result is returned to lines. This program works perfectly fine when the text file is read directly from the VM.
Spark version: spark-2.0.2-bin-hadoop2.7
VM: CentOS
How do I resolve this error? Am I missing something?
Two things need to be set:
1) In hdfs-site.xml, make sure permissions are disabled:
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
2) In core-site.xml, set fs.defaultFS to the IP address of the master:
<property>
<name>fs.defaultFS</name>
<value>hdfs://<MASTER IP ADDRESS>:8020</value>
</property>
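To confirm what the client actually resolves, you can ask Hadoop directly (this assumes the Spark machine sees the same configuration files):
hdfs getconf -confKey fs.defaultFS
Then make the URI in sc.textFile(...) use the same host and port, e.g. hdfs://<ip_addr>:8020/usr/local/hadoop/TestLogs/traffic_log.log. The error message points at localhost:8020, which suggests the Spark side is not picking up the intended core-site.xml, so getting fs.defaultFS and the URI to agree (port included) is the quickest way to rule that out.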

Can't impersonate on HiveServer2

I'm trying to connect to HiveServer2 through a JDBC connector, but I'm getting the error:
'user x can't impersonate y'
I added these properties to my core-site.xml file:
<property>
<name>hadoop.proxyuser.hive.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hive.groups</name>
<value>*</value>
</property>
Also, in hive-site.xml I have:
<property>
<name>hive.server2.enable.doAs</name>
<value>true</value>
<description>
Setting this property to true will have HiveServer2 execute
Hive operations as the user making the calls to it.
</description>
</property>
I have my authentication set to none and I am connecting as anonymous. I have restarted my cluster since changing the config files as well as running:
hadoop fs -chmod g+w /user/hive/warehouse
hadoop fs -chmod g+w /tmp
Can anyone suggest why I'm still getting the error?
If you are trying to connect as a user named anonymous, the properties should be:
<property>
<name>hadoop.proxyuser.anonymous.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.anonymous.groups</name>
<value>*</value>
</property>
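After editing core-site.xml, the proxyuser settings can usually be reloaded without a full cluster restart. Run these as the HDFS/YARN admin user (a sketch; adjust for your distribution's management tooling):
hdfs dfsadmin -refreshSuperUserGroupsConfiguration
yarn rmadmin -refreshSuperUserGroupsConfiguration
HiveServer2 itself still needs a restart to pick up changes to hive-site.xml.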

Cannot create directory /home/hadoop/hadoopinfra/hdfs/namenode/current

I get the error
Cannot create directory /home/hadoop/hadoopinfra/hdfs/namenode/current
while trying to install Hadoop on my local Mac.
What could be the reason for this? Just for reference, I'm putting my XML files below:
mapred-site.xml:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>file:///home/hadoop/hadoopinfra/hdfs/namenode </value>
</property>
<property>
<name>dfs.data.dir</name>
<value>file:///home/hadoop/hadoopinfra/hdfs/datanode </value>
</property>
</configuration>
core-site.xml:
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/Cellar/hadoop/hdfs/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
I think my problem lies in my hdfs-site.xml file, but I'm not sure how to pinpoint/change it.
I'm using this tutorial, and "hadoop" in the file path is replaced by my username.
Possible error: misconfiguration of the hdfs-site.xml file
This happened to me when I was following a setup tutorial. The contents of the hdfs-site.xml for me was
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/hadoop/data/nameNode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/hadoop/data/dataNode</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
Only then did I realize that the text hadoop in the above file corresponds to the user name; in my case, it had to be replaced with hduser. When both occurrences of hadoop were replaced with hduser, the hdfs namenode -format command worked fine.
I had this problem too and it was a permission problem. I just did:
sudo chmod 777 /home/hadoop/hadoopinfra/hdfs/namenode/
and it worked!
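A slightly safer variant of the same fix is to give the directory to the user that runs the NameNode rather than opening it to everyone (the path is taken from the question; adjust to your setup):
sudo mkdir -p /home/hadoop/hadoopinfra/hdfs/namenode
sudo chown -R $(whoami) /home/hadoop/hadoopinfra
hdfs namenode -format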
In the step where you verify the Hadoop installation, instead of 'hdfs namenode -format' use '/usr/local/hadoop/bin/hdfs namenode -format'.
Found this answer from:
hadoop java.io.IOException: while running namenode -format
If you are using plain Apache Hadoop rather than a third-party distribution, add the current user to the hadoop group and retry formatting the namenode.
sudo usermod -a -G hadoop <current-username>
If you are using a third-party Hadoop distribution such as Cloudera, Hortonworks, or MapR, switch to the root user and then to the hdfs user; formatting the namenode will then succeed.
$ sudo -i
$ su - hdfs
$ hdfs namenode -format
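For the first (plain Apache Hadoop) case above, keep in mind that the new group membership only takes effect in a fresh login session; either log out and back in, or start a shell with the group active before retrying the format:
newgrp hadoop
id
hdfs namenode -format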
Try the Hadoop command with sudo
