How to use Hadoop from Spark Thrift Server?

Please consider the following setup.
hadoop version 2.6.4
spark version 2.1.0
OS CentOS Linux release 7.2.1511 (Core)
All software is installed on a single machine as a single-node cluster; Spark is installed in standalone mode.
I am trying to use Spark Thrift Server.
To start the spark thrift server I run the shell script
start-thriftserver.sh
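For reference, start-thriftserver.sh accepts the usual spark-submit options plus --hiveconf, so in a standalone setup like this the master URL and Thrift port can also be given explicitly; the host and port below are assumptions matching the defaults:
./sbin/start-thriftserver.sh \
  --master spark://localhost:7077 \
  --hiveconf hive.server2.thrift.port=10000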
After starting the Thrift Server, I can run the beeline command line tool and issue the following commands, which all run successfully:
!connect jdbc:hive2://localhost:10000 user_name '' org.apache.hive.jdbc.HiveDriver
create database testdb;
use testdb;
create table names_tab(a int, name string) row format delimited fields terminated by ' ';
My first question is: where on Hadoop is the underlying file/folder for this table/database created?
The problem is that even if Hadoop is stopped using stop-all.sh, the create table/database commands still succeed,
which makes me think that the table is not created on Hadoop at all.
My second question is: how do I tell Spark where Hadoop is installed,
and how do I ask Spark to use Hadoop as the underlying data store for all queries run from beeline?
Am I supposed to install Spark in some other mode?
Thanks in advance.

My objective was to get the beeline command line utility to work through Spark Thrift Server, using Hadoop as the underlying data store, and I got it to work. My setup was like this:
Hadoop <--> Spark <--> SparkThriftServer <--> beeline
I wanted to configure Spark in such a manner that it uses Hadoop for all queries run from the beeline command line utility.
The trick was to specify the following property in spark-defaults.conf:
spark.sql.warehouse.dir hdfs://localhost:9000/user/hive/warehouse
By default Spark uses Derby for the metadata and a local directory for the data itself (called the warehouse in Spark).
In order to have Spark use Hadoop (HDFS) as the warehouse, I had to add this property.
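The same setting can also be passed at launch time instead of editing the defaults file; a minimal sketch, assuming the NameNode listens on localhost:9000 as above:
./sbin/start-thriftserver.sh \
  --conf spark.sql.warehouse.dir=hdfs://localhost:9000/user/hive/warehouse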
Here is a sample output
./beeline
Beeline version 1.0.1 by Apache Hive
beeline> !connect jdbc:hive2://localhost:10000 abbasbutt '' org.apache.hive.jdbc.HiveDriver
Connecting to jdbc:hive2://localhost:10000
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/abbasbutt/Projects/hadoop_fdw/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/abbasbutt/Projects/hadoop_fdw/apache-hive-1.0.1-bin/lib/hive-jdbc-1.0.1-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Connected to: Spark SQL (version 2.1.0)
Driver: Hive JDBC (version 1.0.1)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://localhost:10000>
0: jdbc:hive2://localhost:10000>
0: jdbc:hive2://localhost:10000>
0: jdbc:hive2://localhost:10000> create database my_test_db;
+---------+--+
| Result |
+---------+--+
+---------+--+
No rows selected (0.379 seconds)
0: jdbc:hive2://localhost:10000> use my_test_db;
+---------+--+
| Result |
+---------+--+
+---------+--+
No rows selected (0.03 seconds)
0: jdbc:hive2://localhost:10000> create table my_names_tab(a int, b string) row format delimited fields terminated by ' ';
+---------+--+
| Result |
+---------+--+
+---------+--+
No rows selected (0.11 seconds)
0: jdbc:hive2://localhost:10000>
Here are the corresponding files in Hadoop:
[abbasbutt@localhost test]$ hadoop fs -ls /user/hive/warehouse/
17/01/19 10:48:04 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 4 items
drwxrwxr-x - abbasbutt supergroup 0 2017-01-18 23:45 /user/hive/warehouse/fdw_db.db
drwxrwxr-x - abbasbutt supergroup 0 2017-01-18 23:23 /user/hive/warehouse/my_spark_db.db
drwxrwxr-x - abbasbutt supergroup 0 2017-01-19 10:47 /user/hive/warehouse/my_test_db.db
drwxrwxr-x - abbasbutt supergroup 0 2017-01-18 23:45 /user/hive/warehouse/testdb.db
[abbasbutt@localhost test]$ hadoop fs -ls /user/hive/warehouse/my_test_db.db/
17/01/19 10:50:52 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
drwxrwxr-x - abbasbutt supergroup 0 2017-01-19 10:50 /user/hive/warehouse/my_test_db.db/my_names_tab
[abbasbutt@localhost test]$
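To confirm that data also lands in HDFS, one can load a local file into the table from beeline and list the table directory again; a minimal sketch, where the local file name and its contents are assumptions:
0: jdbc:hive2://localhost:10000> load data local inpath '/tmp/names.txt' into table my_names_tab;
[abbasbutt@localhost test]$ hadoop fs -ls /user/hive/warehouse/my_test_db.db/my_names_tab/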

Related

Ambari Hive UTF-8 problems

I have a problem with Cyrillic symbols in Hive tables. Installed versions:
ambari-server 2.4.2.0-136
hive-2-5-3-0-37 1.2.1000.2.5.3.0-37
Ubuntu 14.04
What's the problem:
Set locale to ru_RU.UTF-8:
spark@hadoop:~$ locale
LANG=ru_RU.UTF-8
LANGUAGE=ru_RU:ru
LC_CTYPE="ru_RU.UTF-8"
LC_NUMERIC="ru_RU.UTF-8"
LC_TIME="ru_RU.UTF-8"
LC_COLLATE="ru_RU.UTF-8"
LC_MONETARY="ru_RU.UTF-8"
LC_MESSAGES="ru_RU.UTF-8"
LC_PAPER="ru_RU.UTF-8"
LC_NAME="ru_RU.UTF-8"
LC_ADDRESS="ru_RU.UTF-8"
LC_TELEPHONE="ru_RU.UTF-8"
LC_MEASUREMENT="ru_RU.UTF-8"
LC_IDENTIFICATION="ru_RU.UTF-8"
LC_ALL=ru_RU.UTF-8
Connect to hive and create test table:
spark@hadoop:~$ beeline -n spark -u jdbc:hive2://spark@hadoop.domain.com:10000/
Connecting to jdbc:hive2://spark@hadoop.domain.com:10000/
Connected to: Apache Hive (version 1.2.1000.2.5.3.0-37)
Driver: Hive JDBC (version 1.2.1000.2.5.3.0-37)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.2.1000.2.5.3.0-37 by Apache Hive
0: jdbc:hive2://spark@hadoop.domain.com> CREATE TABLE `test`(`name` string) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ( 'serialization.encoding'='UTF-8');
No rows affected (0,127 seconds)
Insert cyrillic symbols:
0: jdbc:hive2://spark@hadoop.domain.com> insert into test values('привет');
INFO : Tez session hasn't been created yet. Opening session
INFO : Dag name: insert into test values('привет')(Stage-1)
INFO :
INFO : Status: Running (Executing on YARN cluster with App id application_1490211406894_2481)
INFO : Map 1: -/-
INFO : Map 1: 0/1
INFO : Map 1: 0(+1)/1
INFO : Map 1: 1/1
INFO : Loading data to table default.test from hdfs://hadoop.domain.com:8020/apps/hive/warehouse/test/.hive-staging_hive_2017-03-23_13-41-46_215_3133047104896717605-116/-ext-10000
INFO : Table default.test stats: [numFiles=1, numRows=1, totalSize=7, rawDataSize=6]
No rows affected (6,652 seconds)
Select from table:
0: jdbc:hive2://spark@hadoop.domain.com> select * from test;
+------------+--+
| test.name |
+------------+--+
| ?#825B |
+------------+--+
1 row selected (0,162 seconds)
I've read a lot of bug reports for Apache Hive and tested Unicode, UTF-8, UTF-16, and some ISO encodings, with no luck.
Can somebody help me with that?
Thanks!
The guys from Hortonworks helped me with that issue. It seems that it's a bug:
https://community.hortonworks.com/answers/90989/view.html
https://issues.apache.org/jira/browse/HIVE-13983
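A quick way to tell a storage problem from a display problem is to look at the raw bytes that landed in HDFS; a minimal check, assuming the warehouse path from the INFO line above and a shell glob for the data file name:
hadoop fs -cat '/apps/hive/warehouse/test/*' | hexdump -C
A correctly stored UTF-8 'привет' would show the byte sequence d0 bf d1 80 d0 b8 d0 b2 d0 b5 d1 82.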

Getting permission denied error when executing Hive query

I'm getting the following error when executing a select count(*) from tablename query while connected through beeline.
ERROR : Job Submission failed with exception 'org.apache.hadoop.security.AccessControlException(Permission denied
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkOwner(FSPermissionChecker.java:201)
I can execute show tables; successfully but get this error any time I execute a query. I am logged in as the hadoop user, which has access to both Hadoop and Hive.
I've granted full permissions on the folder where the tables reside:
drwxr-xr-x - hadoop supergroup 0 2015-06-03 15:44 /data1
drwxrwxrwx - hadoop hadoop 0 2015-06-05 15:23 /tmp
drwxrwxrwx - hadoop supergroup 0 2015-06-05 15:24 /user
The table is in the user directory.
Environment details:
OS: CentOS
Hadoop: HW 2.6.0
Hive: 1.2
Any help would be greatly appreciated.
Is this a Hive managed table? In that case, could you print what you get when you do:
hadoop fs -ls /user
hadoop fs -ls /user/hive
hadoop fs -ls /user/hive/warehouse
The error suggests that you are accessing a table as a user who is not the owner, and it seems that this user does not have read and execute access.
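If it is indeed an ownership problem, the usual fix is to hand the table directory back to the querying user or relax its permissions; a minimal sketch, where the warehouse path is an assumption to be replaced by whatever the ls commands above report:
hadoop fs -chown -R hadoop:supergroup /user/hive/warehouse/tablename
hadoop fs -chmod -R 755 /user/hive/warehouse/tablename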

Not able to read hdfs files through pig on pseudo node cluster

I have this very basic test (immediately after installing both Hadoop 2.7 and Pig 0.14).
The file exists in HDFS:
hdfs://master:50070/user/raghav/family<r 2> 32
hdfs://master:50070/user/raghav/nsedata <dir>
However, when I run the following:
A = LOAD 'family';
dump A;
I get the following error message:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
2.7.0 0.14.0 raghav 2015-05-19 21:38:35 2015-05-19 21:38:41 UNKNOWN
Failed!
Failed Jobs:
JobId Alias Feature Message Outputs
job_1432066972596_0002 A MAP_ONLY Message: Job failed! hdfs://master:50070/tmp/temp-1977333348/tmp-1065056833,
Input(s):
Failed to read data from "hdfs://master:50070/user/raghav/family"
Output(s):
Failed to produce result in "hdfs://master:50070/tmp/temp-1977333348/tmp-1065056833"
Further investigation reveals a bit more. As indicated, I can see the file on HDFS (from within Pig through the ls command) and also from the shell prompt using hadoop fs commands. However, neither Pig nor Hive is able to see the files on HDFS.
I also tried playing around with the namenode ports (tried different values: 8020, 9000, 50070) but the behaviour remains the same. I tried looking through the namenode and datanode logs too, but couldn't find anything more...
Serious help required!
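As a sanity check on the ports: the URI Pig resolves against should be whatever fs.defaultFS (fs.default.name in older configs) points to, not the NameNode web UI port 50070. It can be printed with:
hdfs getconf -confKey fs.defaultFS
and the load can then be written against that URI explicitly in the Grunt shell (the 9000 below is an assumption; use whatever the command above reports):
A = LOAD 'hdfs://master:9000/user/raghav/family';
dump A;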
Answers to some questions
myhost raghav$ hdfs dfs -ls /user/raghav/family
15/05/20 08:03:47 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
-rw-r--r-- 2 raghav supergroup 32 2015-05-15 01:01 /user/raghav/family
myhost raghav$ hdfs dfs -ls /user/raghav/
15/05/20 08:04:06 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
-rw-r--r-- 2 raghav supergroup 32 2015-05-15 01:01 /user/raghav/family
drwxr-xr-x - raghav supergroup 0 2015-05-15 00:25 /user/raghav/nsedata
myhost raghav$ hadoop fs -ls /
15/05/20 08:04:24 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
drwxr-xr-x - raghav supergroup 0 2015-05-19 23:06 /tmp
drwxr-xr-x - raghav supergroup 0 2015-05-20 07:30 /user
myhost raghav$
Further tests reveal that Hive is able to use HDFS, but Pig still can't. I could create an external table in Hive, successfully pointing to the example file 'family':
create external table xfamily(name STRING, age INT)
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
> STORED AS TEXTFILE
> LOCATION '/user/raghav';
OK
Time taken: 0.023 seconds
hive> select * from xfamily;
xxxxxx - expected data shows up.

Hive 0.9, Hbase 0.98.5 and Hadoop 1.2.1

I have a single-node Hadoop system; the installed version is 1.2.1. I have installed HBase 0.98.5 and then Hive 0.9.
All the processes are running on my node.
Process details:
[root@localhost /]# jps
24396 SecondaryNameNode
24152 NameNode
23954 Jps
24274 DataNode
24488 JobTracker
24607 TaskTracker
1282 HQuorumPeer
2429 HMaster
2589 HRegionServer
From HBase shell, I am able to retrieve my table:
--------Hbase Shell-----------------
hbase(main):001:0> scan 'nancy'
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hbase-0.98.5-hadoop1/lib/slf4j-log4j12- 1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop/lib/slf4j-log4j12- 1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
ROW COLUMN+CELL
2 column=cf:name, timestamp=1410461757734, value=Test
1 row(s) in 0.2220 seconds
I am also able to retrieve table list from Hive:
----------Hive Shell----------------------------------
hive> SHOW TABLES;
OK
pokes
Time taken: 3.195 seconds
I am able to create and populate tables in HBase and also in Hive, but I am unable to integrate Hive with HBase.
When I try to register a table in Hive, I get the following error message:
hive> CREATE EXTERNAL TABLE hbase_table_2(key int,name string) STORED BY
'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH
SERDEPROPERTIES ("hbase.columns.mapping"=":key,cf:name")
TBLPROPERTIES("hbase.table.name"="nancy");
FAILED: Error in metadata: java.lang.IllegalArgumentException: Not a host:port pair: PBUF localhost��ɞ�ކ)
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
After following various websites on how to resolve the error, I have also made the following changes.
I have moved hbase-client*.jar, hbase-server*.jar, hbase-common*.jar, hbase-protocol*.jar, zookeeper*.jar, and guava*.jar from $HBASE_HOME/lib to the $HIVE_HOME/lib directory.
Copied the HBase configuration files from $HBASE_HOME/conf and the Hadoop configuration files from $HADOOP_HOME/conf to $HIVE_HOME/conf.
Copied hive-hbase-handler-0.9.0.jar and hive-common-0.9.0.jar from $HIVE_HOME/lib to $HADOOP_HOME/lib and $HBASE_HOME/lib.
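For reference, the copy steps above look roughly like this in shell form (exact jar file names vary between builds, so the wildcards and chosen config files are assumptions):
cp $HBASE_HOME/lib/{hbase-client,hbase-server,hbase-common,hbase-protocol,zookeeper,guava}*.jar $HIVE_HOME/lib/
cp $HBASE_HOME/conf/hbase-site.xml $HADOOP_HOME/conf/*-site.xml $HIVE_HOME/conf/
cp $HIVE_HOME/lib/hive-hbase-handler-0.9.0.jar $HIVE_HOME/lib/hive-common-0.9.0.jar $HADOOP_HOME/lib/
cp $HIVE_HOME/lib/hive-hbase-handler-0.9.0.jar $HIVE_HOME/lib/hive-common-0.9.0.jar $HBASE_HOME/lib/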
Kindly give me some information on how to resolve the issue.

Hadoop/HDFS: "no such file or directory"

I have installed Hadoop 2.2 on a single machine using this tutorial: http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
Some details were changed a little bit: for example, I used Java 8, /hadoop as the root dir, etc. Users, SSH, and config keys are the same.
Namenode was successfully formatted:
13/12/22 05:42:31 INFO common.Storage: Storage directory /hadoop/tmp/dfs/name has been successfully formatted.
13/12/22 05:42:31 INFO namenode.FSImage: Saving image file /hadoop/tmp/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
13/12/22 05:42:32 INFO namenode.FSImage: Image file /hadoop/tmp/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 198 bytes saved in 0 seconds.
13/12/22 05:42:32 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
13/12/22 05:42:32 INFO util.ExitUtil: Exiting with status 0
13/12/22 05:42:32 INFO namenode.NameNode: SHUTDOWN_MSG:
However, neither the 'mkdir' nor even the 'ls' command worked:
$ /hadoop/hadoop/bin/hadoop fs -ls
Java HotSpot(TM) 64-Bit Server VM warning: You have loaded library /hadoop/hadoop-2.2.0/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
13/12/22 05:39:33 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
ls: `.': No such file or directory
Thanks for any help guys.
Try
hadoop fs -ls /
Tested on hadoop 2.4
In Hadoop 2.4
hdfs dfs -mkdir /input
hdfs dfs -ls /
Worked in my case:
First, get the Hadoop install path:
echo ${HADOOP_INSTALL}    // in my case the output is: /user/local/hadoop
Then create a directory under your Hadoop install path (if you already know your Hadoop install directory, ignore the command above):
hadoop fs -mkdir -p /user/local/hadoop/your_directory
Here your_directory is the directory created in HDFS.
Tested on hadoop 2.4
I have verified this worked in Hadoop 2.5
hdfs dfs -mkdir /input
(where /input is the HDFS directory)
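The bare ls fails in the question above because, with no path argument, hadoop fs -ls lists the current user's HDFS home directory (/user/<username>), which does not exist yet on a freshly formatted NameNode. A minimal sketch of creating it, with the username taken from the shell:
hdfs dfs -mkdir -p /user/$(whoami)
hdfs dfs -ls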
