Connection refused when I run a Hive select query - hadoop

I am having trouble running a Hive select query.
I have created a database (mydb) in Hive; as soon as I run a query on one of its tables, I get the error below.
Failed with exception java.io.IOException:java.net.ConnectException: Call From oodles-Latitude-3540/127.0.1.1 to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
The configuration of my Hadoop core-site.xml file is shown below:
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://192.168.0.114:9000</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation.
</description>
</property>
</configuration>
And the configuration of my mapred-site.xml.template file is:
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>192.168.0.114:8021</value>
<description>The host and port that the MapReduce job tracker runs at.</description>
</property>
</configuration>
If I change the hostname in both files from 192.168.0.114 to localhost, the Hive query works fine, but it does not work with 192.168.0.114.
Why does Hive always point to localhost:9000? Can't we change it to point at my preferred location (192.168.0.114:9000)?
How can I fix the Hive select query so that it returns results with the above Hadoop configuration?
I hope my question is clear!

I found the problem that was causing this error:
Failed with exception java.io.IOException:java.net.ConnectException: Call From oodles-Latitude-3540/127.0.1.1 to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
By default, Hive creates tables under the warehouse location derived from your NameNode configuration, i.e.
hdfs://localhost:9000/user/hive/warehouse.
If you later change the NameNode configuration, i.e. change the fs.default.name property to
hdfs://hostname:9000
in core-site.xml (and in hive-site.xml) and then try to access a table with a select query, Hive still looks for it at the previous location, hdfs://localhost:9000/user/hive/warehouse, which your NameNode no longer serves. Your NameNode is now at hdfs://hostname:9000, which is why you get
Call From oodles-Latitude-3540/127.0.1.1 to localhost:9000 failed on connection exception
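If you already have tables whose metadata still points at the old hdfs://localhost:9000 root, one possible way to rewrite those locations in the metastore is Hive's metatool service (a sketch only; check that your Hive version ships it before relying on it):
hive --service metatool -listFSRoot          # list the filesystem root currently recorded in the metastore
hive --service metatool -updateLocation hdfs://hostname:9000 hdfs://localhost:9000   # rewrite the old NameNode URI to the new one in the stored locations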
In my case, I changed my hive-site.xml file like this:
<configuration>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/new/user/hive/warehouse</value>
<description>location of default database for the warehouse</description>
</property>
<property>
<name>hadoop.embedded.local.mode</name>
<value>false</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://hostname:9000</value>
</property>
<property>
<name>mapred.job.tracker</name>
<value>hostname:8021</value>
</property>
</configuration>
Now, when Hive creates a table, it will pick up the value of fs.default.name from here. (hostname is my IP, which I map in /etc/hosts as shown below.)
127.0.0.1 localhost
127.0.1.1 oodles-Latitude-3540
192.168.0.113 hostname
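To double-check that the name resolves to the IP you expect and that the new NameNode URI is actually reachable, a couple of quick sanity checks (generic commands, nothing Hive-specific):
getent hosts hostname                 # should print the IP you mapped in /etc/hosts
hadoop fs -ls hdfs://hostname:9000/   # should list the HDFS root without a connection-refused error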
My core-site.xml file:
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://hostname:9000</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation.
</description>
</property>
</configuration>
My mapred-site.xml file:
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>hostname:8021</value>
</property>
</configuration>
Now, if your table location matches your NameNode, i.e.
fs.default.name = hdfs://hostname:9000
then it will give you no error.
You can check the location of your table by executing this query:
show create table <table name>
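For example (the table name below is just a placeholder), the LOCATION line in the output should show the new NameNode URI, and the directory should be listable over HDFS:
hive -e "show create table mytable;" | grep -i location   # expect something like hdfs://hostname:9000/new/user/hive/warehouse/mytable
hadoop fs -ls /new/user/hive/warehouse/mytable            # confirm the table directory is reachable through the new NameNode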
Any questions? Feel free to ask!

Since it's working fine with localhost, I would suggest adding your IP address to the /etc/hosts file. Define all the cluster nodes' DNS names as well as their IPs.
Hope this helps.
--UPDATE--
A sample hosts mapping:
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.7.192.56 hostname

Related

unable to connect to hdfs on localhost

I am unable to connect to HDFS on port 9000; I keep getting this error:
localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused
hdfs-site.xml file is this:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/localhdfs/datanode</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>dfs.namenode.rpc-bind-host</name>
<value>0.0.0.0</value>
</property>
</configuration>
and core-site.xml file is this:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
I have restarted the cluster multiple times, and I keep getting connection errors.
This is what my /etc/hosts file looks like:
127.0.0.1 localhost
What am I missing?
Why are you using port 9000 in your config? fs.defaultFS should contain something like: hdfs://nameofcluster
Is this a single node instance? Sandbox? Are you running the command hdfs dfs -ls /?
I would first check (a few generic commands for these checks are sketched after this list):
Remove port from fs.default.name
iptables or Firewalls
hadoop.proxyuser.hdfs.hosts
hadoop.proxyuser.hdfs.groups
Ranger
Logs are exceeding 80% of disk
/etc/hosts file contains the FQDN and your machine's public IP.
Get the IP with the ip a command and set it like: 192.166.6.6 abc.xxx.com
Delete fs.default.name property from hdfs-site.xml
Configure a single-node cluster
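A few generic commands that help with these checks (a sketch only; adjust ports and paths to your setup):
jps                                    # is a NameNode process actually running?
netstat -tlnp | grep -E ':8020|:9000'  # which RPC port is the NameNode listening on, and on which interface?
hdfs getconf -confKey fs.defaultFS     # which URI do the client configs resolve to?
sudo iptables -L -n                    # any firewall rules that could block the port?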

Spark tries to connect to localhost instead of configured servers

This error information shows up:
Error in query: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got exception: java.net.ConnectException Call From undefined.hostname.localhost/192.168.xx.xxx to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused);
I don't understand why it connects to localhost:9000. The host in my core-site.xml is hdfs://192.168.xx.xx:9000, so why does it go to localhost:9000?
Is it the default host?
Please make sure that hive-site.xml is present in your Spark config directory (/etc/spark/conf/) and configure the Hive settings there.
## core-site.xml
fs.defaultFS
## Hive config
hive.metastore.uris
In hive-site.xml, you can configure these as follows. Please fill in your Hive metastore details.
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://ip-xx.xx.xx.xx:8020</value>
</property>
<property>
<name>hive.metastore.uris</name>
<value>thrift://ip-xx.xx.xx:9083</value>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost/metastore?createDatabaseIfNotExist=true</value>
<description>metadata is stored in a MySQL server</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>MySQL JDBC driver class</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
<description>user name for connecting to mysql server </description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>123456</value>
<description>password for connecting to mysql server </description>
</property>
</configuration>
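One common way to make hive-site.xml visible to Spark is simply to copy it into the Spark conf directory (the Hive path below is illustrative; adjust it to your installation):
sudo cp /usr/local/hive/conf/hive-site.xml /etc/spark/conf/hive-site.xml   # assumes Hive is installed under /usr/local/hive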
Your error is related to HDFS, not Hive or SparkSQL.
You need to ensure that HADOOP_HOME or HADOOP_CONF_DIR is correctly set up in spark-env.sh if you want to connect to the correct Hadoop environment rather than use the defaults.
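For example (paths are illustrative), in $SPARK_HOME/conf/spark-env.sh:
export HADOOP_CONF_DIR=/etc/hadoop/conf   # directory containing core-site.xml and hdfs-site.xml
export HADOOP_HOME=/usr/local/hadoop      # adjust to your Hadoop installation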
I reset the metastore in MySQL. I had been using localhost in my core-site.xml at the time I initialized my metastore, so I reset the metastore and the problem was solved.
First, go to the MySQL command line and drop the database (metastore) that you set in your hive-site.xml.
Then change directory to $HIVE_HOME/bin and execute schematool -initSchema -dbType mysql; the problem is solved. The error was due to the metastore in MySQL being stale (I had set up the metastore in a standby environment and later moved to the cluster environment, but the metastore was still the old one), so I could create tables in Hive but not in SparkSQL.
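A sketch of those two steps, assuming the metastore database is named metastore and the MySQL user is hive (dropping the database deletes all existing Hive metadata):
mysql -u hive -p -e "DROP DATABASE metastore;"   # drop the stale metastore database
cd $HIVE_HOME/bin
./schematool -initSchema -dbType mysql           # recreate the metastore schema against the current configuration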
Thanks to everyone who helped me. #Ravikumar, #cricket_007

Getting connection refused while reading file from hdfs using pyspark

I installed hadoop 2.7, set the paths and set the configurations in core-site.xml and hdfs-site.xml as follows:
core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://<ip_addr>:9000/</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/kavya/hdfs/data</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/home/kavya/hdfs/name</value>
</property>
</configuration>
hdfs-site.xml:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://<ip_addr>:9000/</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/kavya/hdfs/data</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/home/kavya/hdfs/name</value>
</property>
</configuration>
I also started HDFS using start-dfs.sh. In spite of specifying the IP address in the configuration, I get a connection refused error like:
Call From spark/<ip_addr> to localhost:8020 failed on connection exception: java.net.ConnectException:Connection refused
I stored a file onto HDFS from my VM using:
hadoop fs -put /opt/TestLogs/traffic_log.log /usr/local/hadoop/TestLogs
This is part of my PySpark code to read the file from HDFS and then extract the fields:
file = sc.textFile("hdfs://<ip_addr>/usr/local/hadoop/TestLogs/traffic_log.log")
result = file.filter(lambda x: len(x)>0)
result = result.map(lambda x: x.split("\n"))
print(result) # PythonRDD[2] at RDD at PythonRDD.scala
lines = result.map(func1).collect() #this is where I get the connection refused error.
print(lines)
func1 is a function containing regular expressions to extract the fields from my logs, and the result is returned to lines. This program works perfectly fine when reading a text file directly from the VM.
Spark version:spark-2.0.2-bin-hadoop2.7
VM: CentOS
How to resolve this error? Am I missing out something?
Two things need to be set:
1) In hdfs-site.xml make sure you have permissions disabled:
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
2) In core-site.xml, set the filesystem URI to the IP address of the master:
<property>
<name>fs.defaultFS</name>
<value>hdfs://<MASTER IP ADDRESS>:8020</value>
</property>
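Before re-running the PySpark job, a quick way to confirm which NameNode URI the client configuration resolves to and that the path is reachable (the placeholder is kept as in the answer above):
hdfs getconf -confKey fs.defaultFS                                       # should no longer fall back to localhost
hdfs dfs -ls hdfs://<MASTER IP ADDRESS>:8020/usr/local/hadoop/TestLogs   # should list traffic_log.log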

Issues in saving bulk data in HBase in Pseudo-distributed mode

I am setting up CDH4 in pseudo-distributed mode.
I have set up Hadoop and, as suggested in the CDH4 installation guide, have also completed the HDFS demo successfully.
I have also set up Hive and HBase.
To populate the data in HBase, I have written a Java client, which loads the bulk data into HBase (around 1M rows in each of 4 tables).
Now I am facing two issues:
When the Java client is running to load the dummy data into HBase, the RegionServer shuts down after around 450,000 rows of data have been inserted in total.
Using Hive, I am not able to access the tables created in HBase, or worse, I cannot even create tables from the Hive shell, although the HBase shell shows me the data/table structure (whatever was generated before the RegionServer shut down).
I have seen other posts regarding the same. It seems that the second issue is related to my /etc/hosts or hive-site.xml, so I am pasting the contents of both.
/etc/hosts
198.251.79.225 u17162752.onlinehome-server.com u17162752
198.251.79.225 default-domain.com
198.251.79.225 hbase.zookeeper.quorum localhost
198.251.79.225 cloudera-vm # Added by NetworkManager
127.0.0.1 localhost.localdomain localhost
127.0.1.1 cloudera-vm-local localhost
hive-site.xml
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/metastore</value>
<description>the URL of the MySQL database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>mypassword</value>
</property>
<property>
<name>datanucleus.autoCreateSchema</name>
<value>false</value>
</property>
<property>
<name>datanucleus.fixedDatastore</name>
<value>true</value>
</property>
<property>
<name>hive.metastore.uris</name>
<value>thrift://127.0.0.1:9083</value>
<description>IP address (or fully-qualified domain name) and port of the metastore host</description>
</property>
<property>
<name>hive.support.concurrency</name>
<description>Enable Hive's Table Lock Manager Service</description>
<value>true</value>
</property>
<property>
<name>hive.zookeeper.quorum</name>
<description>Zookeeper quorum used by Hive's Table Lock Manager</description>
<value>zk1.myco.com,zk2.myco.com,zk3.myco.com</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<description>Zookeeper quorum used by Hive's Table Lock Manager</description>
<value>zk1.myco.com,zk2.myco.com,zk3.myco.com</value>
</property>
<property>
<name>hive.server2.authentication</name>
<value>NOSASL</value>
</property>
</configuration>
These issues are keeping me from accomplishing the task I am supposed to do.
Thanks in advance,
Abhiskek
PS: This is my first post to this forum, so apologies for anything inappropriate you might have found! Thanks for bearing with me.
Hi Tariq, thanks for your response. I have somehow managed to get past this. Now I am facing another issue.
I already have 4 tables in HBase for which I want to create external tables in the Hive shell, but running the create external table commands in the Hive shell gives the following error:
'ERROR: org.apache.hadoop.hbase.client.NoServerForRegionException: No server address listed in -ROOT- for region .META.,,1.1028785192 containing row'
Also, this error appears when I do anything in the HBase shell.
The other error that accompanies the former one in the HBase shell is related to ZooKeeper. Stack trace:
'WARN zookeeper.ZKUtil: catalogtracker-on- org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation#6a9a56bf- 0x1413718482c0010 Unable to get data of znode /hbase/unassigned/1028785192
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/unassigned/1028785192'
Please help. Thanks!

MR scratch error on hive/hbase integration

I'm running Hive and HBase on a 2-node Hadoop cluster.
I'm using hadoop-0.20.205.0, hive-0.9.0, hbase-0.92.0, and zookeeper-3.4.2.
Hive and HBase work fine separately. Then I followed this manual to integrate Hive and HBase:
https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration
Hive started without errors, and I created the sample table:
CREATE TABLE hbase_table_1(key int, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "xyz");
show tables in Hive and list or scan in HBase work well.
But when I run select * from hbase_table_1; in Hive, I get errors:
2012-09-12 11:25:56,975 ERROR ql.Driver (SessionState.java:printError(400)) - FAILED: Hive Internal Error: java.lang.RuntimeException(Error while making MR scratch directory - check filesystem config (null))
java.lang.RuntimeException: Error while making MR scratch directory - check filesystem config (null)
...
Caused by: java.lang.IllegalArgumentException: Wrong FS: hdfs://10.10.10.15:54310/tmp/hive-hadoop/hive_2012-09-12_11-25-56_602_1946700606338541381, expected: hdfs://hadoop01:54310
It says the FS is wrong, but I don't think it's right to configure the FS to such a path. Where should I configure it?
Here are my config files. The IP address of hadoop01 is 10.10.10.15.
hbase-site.xml
<configuration>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2222</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>10.10.10.15</value>
<description>The directory shared by RegionServers.
</description>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/hadoop/datas/zookeeper</value>
<description>Property from ZooKeeper's config zoo.cfg.
The directory where the snapshot is stored.
</description>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://hadoop01:54310/hbase</value>
<description>The directory shared by RegionServers.
</description>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
<description>The mode the cluster will be in. Possible values are
false: standalone and pseudo-distributed setups with managed Zookeeper
true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)
</description>
</property>
</configuration>
Can anyone help, please?
I solved it myself.
Modify $HADOOP_HOME/conf/core-site.xml and change fs.default.name from the IP to the hostname, like this:
<property>
<name>fs.default.name</name>
<value>hdfs://hadoop01:54310/</value>
</property>
Make sure that both this property and the hbase.rootdir property in hbase-site.xml use the same hostname or IP.
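A quick way to confirm that the two values agree (a sketch; the paths assume HADOOP_HOME and HBASE_HOME point at your installations):
grep -A 1 fs.default.name $HADOOP_HOME/conf/core-site.xml   # should show hdfs://hadoop01:54310
grep -A 1 hbase.rootdir $HBASE_HOME/conf/hbase-site.xml     # should show the same hostname and port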
