Can't impersonate on HiveServer2 - hadoop

I'm trying to connect to HiveServer2 through a JDBC connector, but I'm getting the error:
'user x cant impersonate y'
I added these properties to my core-site.xml file:
<property>
<name>hadoop.proxyuser.hive.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hive.groups</name>
<value>*</value>
</property>
Also, in hive-site.xml I have:
<property>
<name>hive.server2.enable.doAs</name>
<value>true</value>
<description>
Setting this property to true will have HiveServer2 execute
Hive operations as the user making the calls to it.
</description>
</property>
I have my authentication set to none and I am connecting as anonymous. I have restarted my cluster since changing the config files as well as running:
hadoop fs -chmod g+w /user/hive/warehouse
hadoop fs -chmod g+w /tmp
Can anyone suggest why I'm still getting the error?

If you are trying to connect as a user named anonymous, the properties should be:
<property>
<name>hadoop.proxyuser.anonymous.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.anonymous.groups</name>
<value>*</value>
</property>
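The property names above follow the pattern hadoop.proxyuser.<user>.hosts and hadoop.proxyuser.<user>.groups. As a minimal sketch, the pair can be rendered for any user name in Python. The helper name proxyuser_properties is hypothetical (it is not part of Hadoop or Hive), and which user name belongs in the key depends on your setup.

```python
from xml.sax.saxutils import escape


def proxyuser_properties(user, hosts="*", groups="*"):
    """Render the two hadoop.proxyuser.<user>.* blocks for core-site.xml.

    `user` is an assumption supplied by the caller; this helper only
    formats text and does not check anything against a cluster.
    """
    blocks = []
    for suffix, value in (("hosts", hosts), ("groups", groups)):
        blocks.append(
            "<property>\n"
            f"<name>hadoop.proxyuser.{user}.{suffix}</name>\n"
            f"<value>{escape(value)}</value>\n"
            "</property>"
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    # Prints the same two blocks as in the answer above.
    print(proxyuser_properties("anonymous"))
```

The output still has to be pasted inside the <configuration> element of core-site.xml, and the services restarted, as the question already describes.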

Related

Unable to upload file or create directory via Hadoop UI

I have installed hadoop-3.2.1 on Ubuntu 18.04 with Java 8. I am able to send files to HDFS using the hadoop fs -put command from the terminal. But when I try to upload files or create a directory via the UI, I get the following errors:
While Uploading a file :
Couldn't upload the file temp.txt
While Creating a directory :
Permission denied: user=dr.who, access=WRITE,
inode="/":user1:supergroup:drwxr-xr-x
hdfs-site.xml file :
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop_tmp/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop_tmp/hdfs/datanode</value>
</property>
</configuration>
Read about HDFS permissions in the HDFS Permissions Guide.
Temporarily, you can turn permissions off completely in hdfs-site.xml:
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>

How to access s3 files using on prem hadoop cluster?

I have a Cloudera VM and was able to set up the AWS CLI and its keys. But I am not able to read or access S3 files using hadoop fs -ls s3://gft-ri or any other hadoop command. I can see the directories and files using the AWS CLI.
Snapshot of the commands:
(base) [cloudera@quickstart conf]$ aws s3 ls s3://gft-risk-aml-market-dev/
PRE test/
2019-11-27 04:11:26 458 required
(base) [cloudera@quickstart conf]$ hdfs dfs -ls s3://gft-risk-aml-market-dev/
19/11/27 05:30:45 WARN fs.FileSystem: S3FileSystem is deprecated and will be removed in future releases. Use NativeS3FileSystem or S3AFileSystem instead.
ls: `s3://gft-risk-aml-market-dev/': No such file or directory
These are the relevant properties in my core-site.xml:
<property>
<name>fs.s3.impl</name>
<value>org.apache.hadoop.fs.s3.S3FileSystem</value>
</property>
<property>
<name>fs.s3.awsAccessKeyId</name>
<value>ANHS</value>
</property>
<property>
<name>fs.s3.awsSecretAccessKey</name>
<value>EOo</value>
</property>
<property>
<name>fs.s3.path.style.access</name>
<value>true</value>
</property>
<property>
<name>fs.s3.endpoint</name>
<value>s3.us-east-1.amazonaws.com</value>
</property>
<property>
<name>fs.s3.connection.ssl.enabled</name>
<value>false</value>
</property>
Finally, with Cloudera Quickstart V13, the core-site.xml below worked:
<property>
<name>fs.s3a.impl</name>
<value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
</property>
<property>
<name>fs.s3a.awsAccessKeyId</name>
<value>AKIAxxxx</value>
</property>
<property>
<name>fs.s3a.awsSecretAccessKey</name>
<value>Xxxxxx</value>
</property>
<property>
<name>fs.s3a.path.style.access</name>
<value>true</value>
</property>
<property>
<name>fs.AbstractFileSystem.s3a.impl</name>
<value>org.apache.hadoop.fs.s3a.S3A</value>
<description>The implementation class of the S3A AbstractFileSystem.</description>
</property>
<property>
<name>fs.s3a.endpoint</name>
<value>s3.us-east-1.amazonaws.com</value>
</property>
<property>
<name>fs.s3a.connection.ssl.enabled</name>
<value>false</value>
</property>
<property>
<name>fs.s3a.readahead.range</name>
<value>64K</value>
<description>Bytes to read ahead during a seek() before closing and
re-opening the S3 HTTP connection. This option will be overridden if
any call to setReadahead() is made to an open stream.</description>
</property>
<property>
<name>fs.s3a.list.version</name>
<value>2</value>
<description>Select which version of the S3 SDK's List Objects API to use.
Currently support 2 (default) and 1 (older API).</description>
</property>
I would mount the S3 bucket from the Linux console with s3fs-fuse and then move files from the mount point into HDFS. You will probably need to install s3fs-fuse on the Cloudera quickstart as root first, e.g., sudo yum install s3fs-fuse

Getting connection refused while reading file from hdfs using pyspark

I installed hadoop 2.7, set the paths and set the configurations in core-site.xml and hdfs-site.xml as follows:
core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://<ip_addr>:9000/</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/kavya/hdfs/data</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/home/kavya/hdfs/name</value>
</property>
</configuration>
hdfs-site.xml:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://<ip_addr>:9000/</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/kavya/hdfs/data</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/home/kavya/hdfs/name</value>
</property>
</configuration>
I also started HDFS using start-dfs.sh. In spite of specifying the IP address in the configuration, I get a connection refused error like:
Call From spark/<ip_addr> to localhost:8020 failed on connection exception: java.net.ConnectException:Connection refused
I stored the file on HDFS from my VM using:
hadoop fs -put /opt/TestLogs/traffic_log.log /usr/local/hadoop/TestLogs
This is the part of my PySpark code that reads the file from HDFS and then extracts the fields:
file = sc.textFile("hdfs://<ip_addr>/usr/local/hadoop/TestLogs/traffic_log.log")
result = file.filter(lambda x: len(x)>0)
result = result.map(lambda x: x.split("\n"))
print(result) # PythonRDD[2] at RDD at PythonRDD.scala
lines = result.map(func1).collect() #this is where I get the connection refused error.
print(lines)
func1 is a function containing regular expressions that extract the fields from my logs; its result is returned to lines. This program works perfectly fine when reading a text file directly from the VM.
Spark version: spark-2.0.2-bin-hadoop2.7
VM: CentOS
How do I resolve this error? Am I missing something?
Two things need to be set:
1) In hdfs-site.xml make sure you have permissions disabled:
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
2) In core-site.xml set your IP address to the IP address of the master:
<property>
<name>fs.defaultFS</name>
<value>hdfs://<MASTER IP ADDRESS>:8020</value>
</property>
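On the PySpark side, it may also help to spell out the NameNode host and port in the path passed to sc.textFile, since an hdfs:// URI without a port falls back to the default NameNode port (8020) rather than the one configured in the question (9000). A minimal sketch, where the helper name hdfs_uri and the address 10.0.0.5 are placeholders for illustration only, and the port must match whatever fs.defaultFS is set to on the cluster:

```python
def hdfs_uri(host, port, path):
    """Build a fully qualified hdfs:// URI with an explicit NameNode port."""
    if not path.startswith("/"):
        raise ValueError("path must be absolute, e.g. /usr/local/hadoop/...")
    return f"hdfs://{host}:{port}{path}"


if __name__ == "__main__":
    # 10.0.0.5 is a placeholder for the master's IP address.
    uri = hdfs_uri("10.0.0.5", 8020, "/usr/local/hadoop/TestLogs/traffic_log.log")
    print(uri)
    # In the question's script this would be used as:
    #   file = sc.textFile(uri)
```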

Hadoop - Tables not displayed in Hive

The problem I am facing is:
Every time I log in to the Hive CLI, all the databases and tables I created are gone. I can see them in the warehouse directory in the Hadoop GUI, but they do not show up in the CLI. Please help me resolve the issue.
I am using Hadoop 1.0.4 and Hive 1.2.1.
I have configured the warehouse dir, temp dir, and Derby metastore dir in hive-site.xml as per the documentation.
Properties in hive-site.xml:
<property>
<name>hive.exec.scratchdir</name>
<value>/tmp/hive</value>
<description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission. For each connecting user, an HDFS scratch dir: ${hive.exec.scratchdir}/<username> is created, with ${hive.scratch.dir.permission}.</description>
</property>
<property>
<name>hive.exec.local.scratchdir</name>
<value>/tmp/hadoop/hive</value>
<description>Local scratch space for Hive jobs</description>
</property>
<property>
<name>hive.downloaded.resources.dir</name>
<value>/tmp/hadoop/hive</value>
<description>Temporary local directory for added resources in the remote file system.</description>
</property>
<property>
<name>hive.scratch.dir.permission</name>
<value>700</value>
<description>The permission for the user specific scratch directories that get created.</description>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
<description>location of default database for the warehouse</description>
</property>
<property>
<name>hive.metastore.uris</name>
<value/>
<description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
</property>
<property>
<name>hive.metastore.connect.retries</name>
<value>3</value>
<description>Number of retries while opening a connection to metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:derby:;databaseName=/usr/hadoop/metastore_db;create=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>

Getting E0902: Exception occured: [User: oozie is not allowed to impersonate oozie]

Hi, I am new to Oozie and I am getting the error E0902: Exception occured: [User: pramod is not allowed to impersonate pramod] when I run the following command:
./oozie job -oozie http://localhost:11000/oozie/ -config ~/Desktop/map-reduce/job.properties -run
My Hadoop version is 1.0.3 and my Oozie version is 3.3.2, running in pseudo-distributed mode.
The following is the content of my core-site.xml:
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/pramod/hadoop-${user.name}</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
</property>
<property>
<name>hadoop.proxyuser.${user.name}.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.${user.name}.groups</name>
<value>*</value>
</property>
</configuration>
Can somebody help?
Hadoop 1.0.x does not support wildcards. http://mail-archives.apache.org/mod_mbox/oozie-user/201212.mbox/%3CCAOcnVr1TZZ5X0Mrb7fFA8JdW6rO6PgoJ9u0=2UYbfXf_o8r=DA#mail.gmail.com%3E
So try:
<property>
<name>hadoop.proxyuser.oozie.hosts</name>
<value>localhost</value>
</property>
<property>
<name>hadoop.proxyuser.oozie.groups</name>
<value>oozie,pramod</value>
</property>
One thing missed in the discussion above:
In core-site.xml you need to use the user with which oozie is started, as in the user that invoked the command "bin/oozied.sh start". For example: if you have "hadoop.proxyuser.bob.hosts" along with hadoop.proxyuser.bob.groups, then the user 'bob' would be required to start oozie using "bin/oozied.sh start".
I don't think you can use variables in the key name - you'll need to hardcode the user name rather than ${user.name}.
I assume you have an oozie user (which the oozie server is run as), so basically you want to configure as follows to allow the oozie user to impersonate anyone from any host:
<property>
<name>hadoop.proxyuser.oozie.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.oozie.groups</name>
<value>*</value>
</property>
Make sure you restart your HDFS / MapReduce services for this to take effect.
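To check an existing core-site.xml for the variable-in-key-name problem mentioned above, something like the following sketch could be used. The function name names_with_variables is hypothetical, and the sample is a shortened copy of the configuration from the question.

```python
import xml.etree.ElementTree as ET


def names_with_variables(config_xml):
    """Return property names that contain a ${...} placeholder."""
    root = ET.fromstring(config_xml)
    return [
        name.text
        for name in root.iter("name")
        if name.text and "${" in name.text
    ]


# Shortened copy of the core-site.xml from the question.
SAMPLE = """<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/pramod/hadoop-${user.name}</value>
</property>
<property>
<name>hadoop.proxyuser.${user.name}.hosts</name>
<value>*</value>
</property>
</configuration>"""

if __name__ == "__main__":
    for name in names_with_variables(SAMPLE):
        print("variable in key name:", name)
```

Only <name> elements are inspected, so a ${user.name} inside a <value> (as in hadoop.tmp.dir) is not flagged.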

Resources