Unable to connect hive jdbc through beeline - hadoop

I am new to Hive and wanted to make a connection. I am able to connect using the Hive CLI; now I want to connect to Hive through beeline, but I am getting the error below while connecting.
I also tried to connect with transportMode set to http, but that is not working either:
jdbc:hive2://localhost:10001/default;transportMode=http
Please refer to my hive-site.xml file:
<property>
<name>hive.server2.transport.mode</name>
<value>binary</value>
<description>
Expects one of [binary, http].
Transport mode of HiveServer2.
</description>
</property>
<property>
<name>hive.server2.authentication</name>
<value>NONE</value>
<description>
Expects one of [nosasl, none, ldap, kerberos, pam, custom].
Client authentication types.
NONE: no authentication check
LDAP: LDAP/AD based authentication
KERBEROS: Kerberos/GSSAPI authentication
CUSTOM: Custom authentication provider
(Use with property hive.server2.custom.authentication.class)
PAM: Pluggable authentication module
NOSASL: Raw transport
</description>
</property>
<property>
<name>hive.server2.thrift.http.port</name>
<value>10001</value>
<description>Port number of HiveServer2 Thrift interface when hive.server2.transport.mode is 'http'.</description>
</property>
<property>
<name>hive.server2.thrift.http.path</name>
<value>cliservice</value>
<description>Path component of URL endpoint when in HTTP mode.</description>
</property>
Running the commands below does not return any value:
netstat -an | grep 10000
netstat -an | grep 10001

Beeline requires the HiveServer2 process to be running.
If this is a vanilla installation, you can start HiveServer2 as a background process with this command:
nohup $HIVE_HOME/bin/hiveserver2 &
In addition, you have to add the user hiveuser (or whichever user you connect with via beeline) as a proxy user so it can access HDFS.
Add these properties to the core-site.xml of HDFS and restart the services:
<property>
<name>hadoop.proxyuser.hiveuser.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hiveuser.groups</name>
<value>*</value>
</property>
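Once HiveServer2 is up, a quick way to verify and connect from the shell (a minimal sketch; the user name hiveuser is an assumption):
# confirm the Thrift port is listening (10000 for binary mode, 10001 for http mode)
netstat -an | grep 10000
# connect with beeline; replace hiveuser with the user you configured as proxy user
beeline -u "jdbc:hive2://localhost:10000/default" -n hiveuser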

Related

Set up Hadoop KMS

I tried to set up the Hadoop KMS server and client.
Below is my kms-site.xml:
<property>
<name>hadoop.kms.key.provider.uri</name>
<value>jceks://file@/${user.home}/kms.keystore</value>
<description>
URI of the backing KeyProvider for the KMS.
</description>
</property>
<property>
<name>hadoop.security.keystore.java-keystore-provider.password-file</name>
<value>kms.keystore.password</value>
<description>
If using the JavaKeyStoreProvider, the file name for the keystore password.
</description>
</property>
In core-site.xml I added the following:
<property>
<name>dfs.encryption.key.provider.uri</name>
<value>kms://http@mydomain:16000/kms</value>
</property>
In hdfs-site.xml I added the following:
<property>
<name>dfs.encryption.key.provider.uri</name>
<value>kms://http@mydomain:16000/kms</value>
</property>
Then I restarted Hadoop and used ./kms.sh start to start KMS.
But when I try to generate a key using the command below:
hadoop key create key_demo -size 256
I get the message below. Am I missing anything?
There are no valid (non-transient) providers configured.
No action has been taken. Use the -provider option to specify
a provider. If you want to use a transient provider then you
MUST use the -provider argument.
I am using Hadoop 3.3.1.
This is my kms-site.xml:
<property>
<name>hadoop.kms.key.provider.uri</name>
<value>jceks://file@/${user.home}/kms.keystore</value>
</property>
<property>
<name>hadoop.security.keystore.java-keystore-provider.password-file</name>
<value>kms.keystore.password</value>
</property>
This is my core-site.xml:
<property>
<name>hadoop.security.key.provider.path</name>
<value>kms://http@localhost:9600/kms</value>
<description>
The KeyProvider to use when interacting with encryption keys used
when reading and writing to an encryption zone.
</description>
</property>
Before adding those to my core-site.xml, I also got the same message as yours. I think you are using Hadoop v2, since your key provider port is still 16000; I use v3. I also see that you are still using the JavaKeyStoreProvider like the example in the Hadoop documentation (so am I). If you don't provide the "password file", which is kms.keystore.password, the KMS will terminate immediately after starting up. So you need to place an empty file with that name on your classpath, which is in /hadoop_home/etc/.
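As a minimal sketch of that setup on Hadoop 3 (assuming $HADOOP_HOME points at your installation; the hadoop --daemon command replaces kms.sh in v3):
# create the empty password file named in hadoop.security.keystore.java-keystore-provider.password-file
touch $HADOOP_HOME/etc/hadoop/kms.keystore.password
# start the KMS, then retry the key commands from the question
hadoop --daemon start kms
hadoop key create key_demo -size 256
hadoop key list -metadata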
I know I arrive quite late; I hope it helps.

Spark tries to connect to localhost instead of configured servers

This error information shows up:
Error in query: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got exception: java.net.ConnectException Call From undefined.hostname.localhost/192.168.xx.xxx to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused);
I don't understand why it goes to localhost:9000; the host in my core-site.xml is hdfs://192.168.xx.xx:9000. Why did it visit localhost:9000?
Is that the default host?
Please make sure that hive-site.xml is present in your Spark config directory /etc/spark/conf/ and that the Hive configuration settings are set there.
## core-site.xml
fs.defaultFS
## Hive config
hive.metastore.uris
In hive-site.xml, you can configure these as follows. Please fill in your Hive metastore details:
<property>
<name>fs.defaultFS</name>
<value>hdfs://ip-xx.xx.xx.xx:8020</value>
</property>
<property>
<name>hive.metastore.uris</name>
<value>thrift://ip-xx.xx.xx:9083</value>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost/metastore?createDatabaseIfNotExist=true</value>
<description>metadata is stored in a MySQL server</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>MySQL JDBC driver class</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
<description>user name for connecting to mysql server </description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>123456</value>
<description>password for connecting to mysql server </description>
</property>
</configuration>
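One simple way to make this hive-site.xml visible to Spark is to copy it into the Spark config directory (a minimal sketch; the source path assumes a default Hive install):
cp $HIVE_HOME/conf/hive-site.xml /etc/spark/conf/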
Your error is related to HDFS, not Hive or SparkSQL
You need to ensure that HADOOP_HOME or HADOOP_CONF_DIR is correctly set up in spark-env.sh if you would like to connect to the correct Hadoop environment rather than use the defaults.
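For example, in spark-env.sh (a minimal sketch; the paths are illustrative and depend on your distribution):
# point Spark at the Hadoop client configs that contain your fs.defaultFS
export HADOOP_CONF_DIR=/etc/hadoop/conf
export HADOOP_HOME=/usr/lib/hadoop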
I reset the metastore in MySQL. I was using localhost in my core-site.xml at the time I initialized my metastore, so I reset the metastore and the problem was solved.
First, go to the MySQL command line and drop the database (metastore) which you set in your hive-site.xml.
Then, change directory to $HIVE_HOME/bin and execute schematool -initSchema -dbType mysql, and the problem is solved. The error was due to the metastore in MySQL being stale (I had set up the metastore in a standby environment); I moved to the cluster environment later, but the metastore was the previous one, so I could create tables in Hive but not in SparkSQL.
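A minimal sketch of that reset (the database name metastore and the MySQL user are assumptions; match them to your hive-site.xml):
# drop the stale metastore database, then re-initialize the schema
mysql -u hive -p -e "DROP DATABASE metastore;"
cd $HIVE_HOME/bin
./schematool -initSchema -dbType mysql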
Thanks to everyone who helped me: @Ravikumar, @cricket_007.

What's the best solution for Hive proxy user in HDFS?

I'm very confused by the proxyuser settings in HDFS and Hive. I have the doAs option enabled in hive-site.xml:
<property>
<name>hive.server2.enable.doAs</name>
<value>true</value>
</property>
And proxyuser in core-site.xml
<property>
<name>hadoop.proxyuser.hdfs.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hdfs.groups</name>
<value>*</value>
</property>
But this will cause:
2017-03-29 16:24:59,022 INFO org.apache.hadoop.ipc.Server: Connection from 172.16.0.239:60920 for protocol org.apache.hadoop.hdfs.protocol.ClientProtocol is unauthorized for user hive (auth:PROXY) via hive (auth:SIMPLE)
2017-03-29 16:24:59,023 INFO org.apache.hadoop.ipc.Server: Socket Reader #1 for port 9000: readAndProcess from client 172.16.0.239 threw exception [org.apache.hadoop.security.authorize.AuthorizationException: User: hive is not allowed to impersonate hive]
I didn't set the proxyuser to "hive" like most examples suggest, because core-site.xml is shared by other services and I don't want every service to access HDFS as hive. But I still gave it a try, so that now core-site.xml looks like this:
<property>
<name>hadoop.proxyuser.hive.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hive.groups</name>
<value>*</value>
</property>
I launched beeline again. The login was fine this time, but when a command was run, YARN threw an exception:
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. Permission denied: user=hive, access=WRITE, inode="/user/yarn/hive/.staging":hdfs:supergroup:drwxr-xr-x
proxyuser "hive" has been denied from the staging folder which is owned by "hdfs". I don't think give 777 to the staging folder is a good idea as it makes no sense to give HDFS protection but open the folder to everyone. So my question is what's the best solution to setup the permission between Hive, Hdfs and Yarn?
Hadoop permission is just a nightmare to me, please help.
Adding the proxyuser entries in core-site.xml allows the superuser named hive to connect from any host (as the value is *) and impersonate a user belonging to any group (as the value is *).
<property>
<name>hadoop.proxyuser.hive.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hive.groups</name>
<value>*</value>
</property>
This can be made more restrictive by specifying actual hostnames and group names (refer to Superusers). The access privileges the superuser hive has on the FS will be applicable for the impersonated users.
For a multi-user Hadoop environment, the best practice would be to create dedicated directories for every superuser and configure the associated service to store its files in them, and to create a group supergroup for all these superusers so that group-level access privileges can be given on the files if required.
Add this property to hdfs-site.xml to configure the supergroup:
<property>
<name>dfs.permissions.superusergroup</name>
<value>supergroup</value>
<description>The name of the group of super-users.</description>
</property>
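For example, dedicated per-user directories could be created like this (a minimal sketch run as the HDFS superuser; the paths and the 750 mode are illustrative):
# give each service user its own HDFS home, owned by that user and the supergroup
sudo -u hdfs hdfs dfs -mkdir -p /user/hive /user/yarn
sudo -u hdfs hdfs dfs -chown hive:supergroup /user/hive
sudo -u hdfs hdfs dfs -chown yarn:supergroup /user/yarn
sudo -u hdfs hdfs dfs -chmod 750 /user/hive /user/yarn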

How to get the URL for Hive Web Interface

Sorry, it may be a basic question. I tried to Google it but couldn't find an exact solution.
I am trying to find out the URL for my Hive web interface.
Through this I can check the tables present in it. With the help of the web interface URL I can also access the beeline command line interface.
I am accessing my company's server for the Hadoop interface through PuTTY.
I access hdfs web interface using
http://ibmlnx01:50070/
However, when I try the URLs below, they don't show any web user interface:
http://ibmlnx01:9999/
http://ibmlnx01:10000/
http://0.0.0.0:9999/
http://0.0.0.0:10000
Below is my hive-default.xml.template.
I couldn't copy the whole file, but I copied the main part; I hope it's sufficient.
<property>
<name>hive.hwi.war.file</name>
<value>lib/hive-hwi-0.12.0.war</value>
<description>This sets the path to the HWI war file, relative to ${HIVE_HOME}. </description>
</property>
<property>
<name>hive.hwi.listen.host</name>
<value>0.0.0.0</value>
<description>This is the host address the Hive Web Interface will listen on</description>
</property>
<property>
<name>hive.hwi.listen.port</name>
<value>9999</value>
<description>This is the port the Hive Web Interface will listen on</description>
</property>
Below is the code for hive-site.xml
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://volgalnx03.ad.infosys.com/metastore_db?createDatabaseIfNotExist=true</value>
<description>metadata is stored in a MySQL server</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>MySQL JDBC driver class</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hiveuser</value>
<description>user name for connecting to mysql server </description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>1234</value>
<description>password for connecting to mysql server </description>
</property>
</configuration>
I connect the PuTTY terminal through the IP address 10.66.82.52, if this is of any help.
Hue (Hadoop User Experience) is the web interface for analyzing data with Hadoop (and not just with Hive).
Hue's standard port is 8888.
However, that value may be different in your installation.
Look for the http_port entry in /etc/hue/conf/hue.ini if 8888 doesn't work for you.
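For example (using the path mentioned above):
# show the configured Hue web port
grep -n http_port /etc/hue/conf/hue.ini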
Just appending /hwi helped me ;)
http://host:9999/hwi
Make sure you executed the line below first:
$HIVE_HOME/bin/hive --service hwi
At first I tried to get the UI using the URL http://host:9999/ and it returned 404.
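Putting it together (a minimal sketch; the host name is taken from the question):
# start HWI in the background and confirm the listener before browsing http://ibmlnx01:9999/hwi
$HIVE_HOME/bin/hive --service hwi &
netstat -an | grep 9999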
Double-check the installation steps in www.tutorialspoint.com/hive/hive_quick_guide.htm.
Then, make sure you have a JDK on your server. Finally, you should run the command:
hive --service hwi
In addition, here is a link about Gareth's personal experience with the same situation:
https://dzone.com/articles/hadoop-hive-web-interface

CDH 4.1.1. Hive JDBC connection

I have a pretty strange problem. I have a CDH 4.1.1 cluster with Cloudera Manager Free installed.
Beeswax works fine, and the hive CLI utility works fine.
But nothing listens on port 10000 (the default port for Hive JDBC connectivity). I have a standalone test CDH 4.1.1 VMware image (you can download it from the Cloudera site). It is the same situation, but there port 10000 is open and I can query Hive.
What am I doing wrong? Why is 10000 closed? What do I have to do to make it work on my cluster?
OK, so the CDH 4.1.1 demo image does have the Hive Thrift server running as a service. If we are working with a CDH cluster which is under control of Cloudera Manager Free, we have to:
go to Cloudera Manager Free (CMFree)
Services »
Service hue1 »
fill in the field: Hive Configuration Safety Valve
<property>
<name>hive.server2.thrift.min.worker.threads</name>
<value>5</value>
</property>
<property>
<name>hive.server2.thrift.max.worker.threads</name>
<value>100</value>
</property>
<property>
<name>hive.server2.thrift.port</name>
<value>10000</value>
</property>
<property>
<name>hive.server2.thrift.bind.host</name>
<value>10.66.48.23</value>
</property>
<property>
<name>hive.server2.authentication</name>
<value>NONE</value>
</property>
<!-- TODO: add concurrency support-->
<property>
<name>hive.support.concurrency</name>
<value>false</value><!-- should be true -->
</property>
<!-- do it later
<property>
<name>hive.zookeeper.quorum</name>
<description>Zookeeper quorum used by Hive's Table Lock Manager</description>
<value>zk1.yoyodyne.com,zk2.yoyodyne.com,zk3.yoyodyne.com</value>
</property>
-->
Restart the service.
Then start the Hive Thrift server:
[devops@cdh-1 ~]$ sudo -u hdfs /usr/bin/hive --service hiveserver
Here you can see how it can be "daemonized" on CentOS:
http://blog.milford.io/2010/06/daemonizing-the-apache-hive-thrift-server-on-centos/
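Once the service is running, a quick check (the port matches the safety valve above):
# the Thrift port should now be listening
netstat -an | grep 10000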
