User xxxx not authorized to view the data (state=,code=0) in Spark SQL and Hive

I am able to create a table in spark-sql using Beeline, but when I try to run a query on the created table I get the error
"user not authorized to view the data".
Below are the steps I performed:
$SPARK_HOME/bin/beeline
!connect jdbc:hive2://server:10000 username password
CREATE EXTERNAL TABLE IF NOT EXISTS tablename(Name STRING,count INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE location '/user/xxxx/';
When I run SHOW TABLES I can see that the table has been created, but querying the table fails with the error above.
Please help.

Related

Inserting local csv to a Hive table from Qubole

I have a CSV file on my local machine, and I access Hive through the Qubole web console. I am trying to upload the CSV as a new table, but I couldn't figure out how. I have tried the following:
LOAD DATA LOCAL INPATH <path> INTO TABLE <table>;
I get an error saying "No files matching path file".
I am guessing that the CSV has to be on the remote server where Hive is actually running, not on my local machine. The solutions I found don't explain how to handle this. Can someone help me with this?
Qubole lets you define Hive external or managed tables over data sitting in your cloud storage (S3 or Azure Storage), so LOAD DATA LOCAL from your local machine won't work. You will have to upload the file to your cloud storage and then define an external table against it:
CREATE External TABLE orc1ext(
`itinid` string, itinid1 string)
stored as ORC
LOCATION
's3n://mybucket/def.us.qubole.com/warehouse/testing.db/orc1';
INSERT INTO TABLE orc1ext SELECT itinid, itinid
FROM default.default_qubole_airline_origin_destination LIMIT 5;
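A minimal Python sketch of the "upload, then define an external table" step, assuming the CSV has a header row and that every column can be declared as STRING. The function name `external_csv_ddl` and the bucket path are hypothetical; the sketch only generates DDL text, which you would run in the Qubole Hive console after uploading the file. Note that LOCATION points at the directory holding the file, not at the file itself.

```python
import csv
import io


def external_csv_ddl(table, csv_text, location):
    """Build a Hive CREATE EXTERNAL TABLE statement for an uploaded CSV.

    Assumptions (illustrative only): the first CSV row is a header, every
    column is declared as STRING, and `location` is the cloud-storage
    directory the file was uploaded to.
    """
    header = next(csv.reader(io.StringIO(csv_text)))
    cols = ",\n".join(f"  `{name.strip()}` STRING" for name in header)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n{cols}\n)\n"
        "ROW FORMAT DELIMITED\n"
        "FIELDS TERMINATED BY ','\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{location}'\n"
        # Tell Hive to skip the header row when reading the file.
        "TBLPROPERTIES ('skip.header.line.count'='1');"
    )


# Hypothetical usage: the header comes from the local CSV you uploaded.
sample = "name,count\nalice,3\nbob,5\n"
print(external_csv_ddl("my_upload", sample, "s3://mybucket/uploads/my_upload/"))
```

Declaring everything as STRING keeps the sketch simple; you can cast columns in queries or edit the generated types before running the statement.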
First, create a table in Hive using the field names present in your CSV file. The syntax you are using seems correct.
Use the syntax below to create the table:
CREATE TABLE foobar(key string, stats map<string, bigint>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '|'
MAP KEYS TERMINATED BY ':' ;
and then load the data using the format below, making sure the path name is correct:
LOAD DATA LOCAL INPATH '/yourfilepath/foobar.csv' INTO TABLE foobar;
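For reference, a line in foobar.csv such as `row1,clicks:10|views:250` is split by the delimiters declared above roughly as the following Python sketch does. The sample row and the function name are made up for illustration, and escaping and NULL handling are left out.

```python
def parse_foobar_row(line):
    """Split one text line the way the foobar table's delimiters describe.

    Fields are separated by ',', map entries by '|', and each map key from
    its value by ':'. Extra fields beyond the two columns are ignored.
    """
    fields = line.rstrip("\r\n").split(",")
    key = fields[0]
    stats_text = fields[1] if len(fields) > 1 else ""
    stats = {}
    if stats_text:
        for item in stats_text.split("|"):
            k, v = item.split(":", 1)
            stats[k] = int(v)  # map<string, bigint> values
    return key, stats


print(parse_foobar_row("row1,clicks:10|views:250"))
```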

Hive error: SELECT * FROM table;

I created an external table in Hive, and it was created successfully.
create external table load_tweets(id BIGINT,text STRING)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION '/user/cloudera/data/tweets_raw';
But when I ran:
hive> select * from load_tweets;
I got the error below:
Failed with exception java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException: org.codehaus.jackson.JsonParseException: Unexpected character ('O' (code 79)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
at [Source: java.io.ByteArrayInputStream@5dfb0646; line: 1, column: 2]
Please suggest how to fix this. Is the Twitter output file created by Flume corrupted, or is it something else?
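To check whether the Flume output itself is the problem, one could copy a file locally (for example with `hdfs dfs -get`) and test it with a Python sketch like the one below. Hive's JsonSerDe on a text table expects one complete JSON object per line. One possible cause of an unexpected 'O' as the first character, assumed here rather than confirmed by the question, is that the file is an Avro container file, whose first bytes are `Obj` followed by a byte with value 1. The function name is hypothetical.

```python
import json

# First four bytes of an Avro object container file.
AVRO_MAGIC = b"Obj\x01"


def diagnose_json_lines(data: bytes):
    """Report why a file might not be readable by Hive's JsonSerDe.

    Returns a list of (line_number, message) problems; an empty list means
    every non-blank line parsed as a JSON object.
    """
    if data.startswith(AVRO_MAGIC):
        return [(1, "file is an Avro container, not line-delimited JSON")]
    problems = []
    text = data.decode("utf-8", errors="replace")
    for lineno, raw in enumerate(text.split("\n"), start=1):
        line = raw.strip()
        if not line:
            continue  # blank lines (e.g. a trailing newline) are skipped
        try:
            obj = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((lineno, exc.msg))
            continue
        if not isinstance(obj, dict):
            problems.append((lineno, "line is valid JSON but not an object"))
    return problems
```

A pretty-printed file (one object spread over several lines) also shows up here, because each partial line fails to parse on its own.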
You'll need to do two additional things:
1) Put data into the file (perhaps using INSERT), unless it's already there.
2) From Hive, run msck repair table load_tweets;
For Hive tables, the schema and other metadata about the data are stored in what's called the Hive Metastore, which is actually a relational database under the covers. When you perform operations on Hive tables created without the LOCATION keyword (that is, internal rather than external tables), Hive automatically updates the metastore.
But most Hive use cases involve data that is appended to files by other processes, so external tables are common. If new partitions are created externally, you need to force the metastore to sync with the current state of the data using msck repair table <tablename>; before you can query them with Hive.
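For a partitioned table, the idea behind `msck repair table` can be modeled in a few lines of Python: walk the `key=value` directories under the table location and report the partitions the metastore does not yet know about. This is a hypothetical in-memory model (the function name and the `dt`/`hour` layout are made up), not Hive's implementation; the real command scans the storage under the table's LOCATION and registers the missing partitions in the metastore itself.

```python
def partitions_to_add(dir_paths, partition_keys, known_partitions):
    """Return partition specs present on storage but missing from the metastore.

    dir_paths: directory paths relative to the table LOCATION,
        e.g. "dt=2015-01-01/hour=00".
    partition_keys: the table's partition columns, in order.
    known_partitions: set of value tuples already in the metastore.
    """
    found = set()
    for path in dir_paths:
        parts = [p for p in path.strip("/").split("/") if p]
        if len(parts) != len(partition_keys):
            continue  # not a leaf partition directory
        values = []
        for key, part in zip(partition_keys, parts):
            name, sep, value = part.partition("=")
            if not sep or name != key:
                break  # directory name does not match key=value layout
            values.append(value)
        else:
            found.add(tuple(values))
    return sorted(found - set(known_partitions))


listing = ["dt=2015-01-01/hour=00", "dt=2015-01-01/hour=01", "_tmp/x"]
print(partitions_to_add(listing, ["dt", "hour"], {("2015-01-01", "00")}))
```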

WHY does this simple Hive table declaration work? As if by magic

The following HQL works to create a Hive table in HDInsight that I can successfully query. But I have several questions about WHY it works:
My data rows are in fact terminated by carriage return + line feed, so why does 'COLLECTION ITEMS TERMINATED BY \002' work? And what is \002 anyway? Also, no location for the blob is specified, so again, why does this work?
All attempts at creating the same table with "CREATE EXTERNAL TABLE...LOCATION '/user/hive/warehouse/salesorderdetail'" have failed. The table is created but no data is returned. Leave off EXTERNAL, don't specify any location, and suddenly it works. Wtf?
CREATE TABLE IF NOT EXISTS default.salesorderdetail(
SalesOrderID int,
ProductID int,
OrderQty int,
LineTotal decimal
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '\002'
MAP KEYS TERMINATED BY '\003'
STORED AS TEXTFILE
Any insights are greatly appreciated.
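The `\002` question can be illustrated with a small Python sketch (the sample rows are made up). In HiveQL, `'\002'` is an octal escape for the single character with code 2 (Ctrl-B), which is also Hive's default collection-item delimiter, just as `'\003'` (Ctrl-C) is the default map-key delimiter. These delimiters are only consulted for ARRAY, MAP and STRUCT columns, and this table has none, so the two clauses have no effect here. Rows are split by the line reader, not by any of these delimiters; Hadoop's default text line reader accepts `\n`, `\r` and `\r\n` endings, which is why a CRLF file reads cleanly. The sketch mimics that behavior and is not Hive's actual code.

```python
import re

FIELD_DELIM = ","          # FIELDS TERMINATED BY ','
COLLECTION_DELIM = "\x02"  # COLLECTION ITEMS TERMINATED BY '\002' (Ctrl-B)
MAP_KEY_DELIM = "\x03"     # MAP KEYS TERMINATED BY '\003' (Ctrl-C)

# Line endings accepted by Hadoop's default text line reader.
LINE_END = re.compile(r"\r\n|\r|\n")


def split_rows(text):
    """Split text into rows and fields as a scalar-only table sees it.

    COLLECTION_DELIM and MAP_KEY_DELIM are never used, because no column
    is an ARRAY, MAP or STRUCT.
    """
    return [line.split(FIELD_DELIM) for line in LINE_END.split(text) if line]


print(split_rows("43659,776,1,2024.99\r\n43659,777,3,6074.98\r\n"))
```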
UPDATE: Thanks for the help so far. Here's the exact syntax I'm using to attempt the external table creation. (I've only changed the storage account name.) I don't see what I'm doing wrong.
drop table default.salesorderdetailx;
CREATE EXTERNAL TABLE default.salesorderdetailx(SalesOrderID int,
ProductID int,
OrderQty int,
LineTotal decimal)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '\002'
MAP KEYS TERMINATED BY '\003'
STORED AS TEXTFILE
LOCATION 'wasb://mycn-1@my.blob.core.windows.net/mycn-1/hive/warehouse/salesorderdetailx';
When you create your cluster in HDInsight, you have to specify the underlying blob storage, and Hive assumes you are referencing that blob storage. You don't need to specify a location because your query creates an internal table (see the next paragraph), which is created at a default location. External tables need to specify a location in Azure blob storage (outside the cluster) so that the data in the table is not deleted when the cluster is dropped. See the Hive DDL documentation for more information.
By default, tables are created as internal; you have to specify EXTERNAL to make them external tables.
Use EXTERNAL tables when:
Data is used outside Hive
You need data to be updateable in real time
Data is needed when you drop the cluster or the table
Hive should not own data and control settings, directories, etc.
Use INTERNAL tables when:
You want Hive to manage the data and storage
Short term usage (like a temp table)
Creating a table based on an existing table (CREATE TABLE ... AS SELECT)
Does the container "user/hive/warehouse/salesorderdetail" exist in your blob storage? If not, that might explain why your external table query is failing.
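One way to check which blob directory an external table actually points at is to split its LOCATION into container, storage account and path. The Python sketch below assumes the standard `wasb[s]://<container>@<account>.blob.core.windows.net/<path>` form, where the path is relative to the container root; the function name is hypothetical. Applied to the URI in the update, it shows that Hive would read `mycn-1/hive/warehouse/salesorderdetailx` inside container `mycn-1`, so it is worth confirming that this folder (including the repeated `mycn-1/` prefix) is where the files really are.

```python
from urllib.parse import urlsplit


def parse_wasb(uri):
    """Split a wasb:// or wasbs:// URI into (container, account, path).

    Expected form: wasb[s]://<container>@<account>.blob.core.windows.net/<path>
    The returned path is relative to the container root.
    """
    parts = urlsplit(uri)
    if parts.scheme not in ("wasb", "wasbs"):
        raise ValueError(f"not a wasb URI: {uri}")
    container, sep, host = parts.netloc.partition("@")
    if not sep:
        raise ValueError(f"missing <container>@<account> in: {uri}")
    account = host.split(".", 1)[0]
    return container, account, parts.path.lstrip("/")


print(parse_wasb(
    "wasb://mycn-1@my.blob.core.windows.net/mycn-1/hive/warehouse/salesorderdetailx"
))
```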

Sqoop - Create empty hive partitioned table based on schema of oracle partitioned table

I have an Oracle table which has 80 columns and is partitioned on the state column. My requirement is to create a Hive table with a schema similar to the Oracle table, partitioned on state.
I tried using the Sqoop --create-hive-table option, but I keep getting an error:
ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.IllegalArgumentException: Partition key state cannot be a column to import.
I understand that in Hive the partition column should not be in the table's column definition, but then how do I get around this issue?
I do not want to write the CREATE TABLE commands manually, as I have 50 such tables to import and would like to use Sqoop.
Any suggestions or ideas?
Thanks
There is a workaround for this.
Below is the procedure I follow:
On Oracle, run a query to get the schema for a table and store it in a file.
Move that file to Hadoop.
On Hadoop, create a shell script which constructs an HQL file.
That HQL file contains the Hive CREATE TABLE statement along with the columns. For this, we can use the Oracle schema file copied to Hadoop in the previous step.
For this script to run, you just need to pass the Hive database name, table name, partition column name, path, etc., depending on your level of customization. At the end of the shell script, add "hive -f <HQL filename>".
If everything is ready, it takes only a couple of minutes per table.
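The HQL-generation step above can be sketched in Python instead of shell (a swapped-in language, not the answerer's script). It assumes the Oracle schema file has one `COLUMN_NAME,DATA_TYPE` line per column, for example spooled from `SELECT column_name, data_type FROM all_tab_columns WHERE table_name = '...' ORDER BY column_id`. The type mapping, function names, and paths are hypothetical simplifications. The key point is that the partition column is removed from the column list and declared in PARTITIONED BY, which is what Hive requires.

```python
import csv
import io

# Simplified, assumed Oracle -> Hive type mapping; adjust to your data.
TYPE_MAP = {
    "VARCHAR2": "STRING",
    "NVARCHAR2": "STRING",
    "CHAR": "STRING",
    "CLOB": "STRING",
    "NUMBER": "DOUBLE",
    "DATE": "STRING",
}


def hive_type(oracle_type):
    base = oracle_type.split("(")[0].strip().upper()
    if base.startswith("TIMESTAMP"):
        return "STRING"
    return TYPE_MAP.get(base, "STRING")  # unknown types fall back to STRING


def build_hql(db, table, schema_csv, partition_col, location):
    """Turn 'COLUMN_NAME,DATA_TYPE' lines into a partitioned CREATE TABLE."""
    columns, part_type = [], None
    for row in csv.reader(io.StringIO(schema_csv)):
        if not row:
            continue  # skip blank lines
        name, oracle_type = row[0].strip().lower(), row[1]
        if name == partition_col.lower():
            part_type = hive_type(oracle_type)  # goes into PARTITIONED BY
            continue
        columns.append(f"  `{name}` {hive_type(oracle_type)}")
    if part_type is None:
        raise ValueError(f"partition column {partition_col!r} not in schema")
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {db}.{table} (\n"
        + ",\n".join(columns)
        + f"\n)\nPARTITIONED BY (`{partition_col.lower()}` {part_type})\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        f"STORED AS TEXTFILE\nLOCATION '{location}';\n"
    )


schema = "ID,NUMBER\nNAME,VARCHAR2\nSTATE,VARCHAR2\n"
print(build_hql("staging", "customers", schema, "state", "/data/customers"))
```

The generated text can be written to a file and run with `hive -f`, exactly as in the shell-based procedure.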

Creating External Table in Hive using HIVE JDBC : Not Possible?

External table creation via Hive JDBC isn't reflected in the Hive data warehouse, whereas normal table creation inside the Hive warehouse works without any issue.
After creating the table via Hive JDBC with
stmt.executeQuery("create external table trial (TOPIC STRING) row format delimited fields terminated by '' STORED as TEXTFILE LOCATION '/user/ranjitha/trial'");
no error is returned.
But when I try to retrieve data from the table trial, nothing is returned.
This link, https://groups.google.com/a/cloudera.org/forum/?fromgroups#!topic/cdh-user/YTekdFtbelE, says external table creation is not possible using Hive JDBC.
It would be really helpful if someone could guide me on this. Is this not possible with JDBC, or is there an alternative?
Thanks
