How can I extract the column names and data types from each table in the DDL file below?
-- Create statement 1
-- Some text
Create table Test1(
id int,
name string
);
Create table Test2(
id int,
city string
)
The output should look like the below for each table:
[{"column":"id","dataType":"int"},
{"column":"name","dataType":"string"}]
[{"column":"id","dataType":"int"},
{"column":"city","dataType":"string"}]
There is a Hive table with 2 string columns and one partition column, "cmd_out".
I'm trying to rename both columns ('col1', 'col2') using REPLACE COLUMNS:
Alter table table_test replace columns(
col22 String,
coll33 String
)
But I receive the following exception:
Partition column name 'cmd_out' conflicts with table columns.
When I include the partition column in the query:
Alter table table_test replace columns(
cmd_out String,
col22 String,
coll33 String
)
I receive:
Duplicate column name cmd_out in the table definition
If you want to rename a column, you need to use ALTER TABLE ... CHANGE.
Here is the syntax:
alter table mytab change col1 new_col1 string;
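Applied to the question's table (assuming the original column names are col1 and col2, as stated), this would be:

```sql
ALTER TABLE table_test CHANGE col1 col22 STRING;
ALTER TABLE table_test CHANGE col2 coll33 STRING;
```

The partition column cmd_out is left untouched; CHANGE only operates on the data columns, which avoids both of the errors above.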
First of all, I created the table "emp" in Hive using the commands below:
create table emp (id INT, name STRING, address STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t';
Then I loaded the data into the "emp" table with the command below:
LOAD DATA LOCAL INPATH '/home/cloudera/Desktop/emp.txt' overwrite into table emp;
When I select the data from the "emp" table, the first field shows NULL.
You have a header row in your file, and the first value, "id", cannot be converted to INT, so it is replaced by NULL.
Add tblproperties ("skip.header.line.count"="1") to your table definition.
For an existing table -
alter table emp set tblproperties ("skip.header.line.count"="1");
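For a new table, the same property can go directly into the DDL (a sketch based on the emp definition above):

```sql
CREATE TABLE emp (id INT, name STRING, address STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
TBLPROPERTIES ("skip.header.line.count"="1");
```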
I have a CSV file:
Name,Age,City,Country
SACHIN,44,PUNE,INDIA
TENDULKAR,45,MUMBAI,INDIA
SOURAV,45,NEW YORK,USA
GANGULY,45,CHICAGO,USA
I created a HIVE table and loaded the data into it.
I found that the above file is wrong; the corrected file is below:
Name,Age,City,Country
SACHIN,44,PUNE,INDIA
TENDULKAR,45,MUMBAI,INDIA
SOURAV,45,NEW JERSEY,USA
GANGULY,45,CHICAGO,USA
I need to update my main table with correct file.
I have tried below approaches.
1- Created the main table as a partitioned table on City and dynamically loaded the first file.
Step1- Creating a temp table and loading the old.csv file as-is, without partitioning. I am doing this step so that I can insert the data into the main table dyn dynamically, without creating separate input files per partition.
create table temp(
name string,
age int,
city string,
country string)
row format delimited
fields terminated by ','
stored as textfile;
Step2- Loaded old file into temporary table.
load data local inpath '/home/test_data/old.csv' into table temp;
Step3- Creating the main partitioned table.
create table dyn(
name string,
age int)
partitioned by(city string,country string)
row format delimited
fields terminated by ','
stored as textfile;
Step4- Dynamically inserting the old.csv data from the temporary table into the partitioned table.
insert into table dyn
partition(city,country)
select name,age,city,country from temp;
The old records are now dynamically inserted into the main table. In the next steps I am trying to correct the main table dyn from the old.csv data to the new.csv data.
Step5- Creating another temporary table with new and correct input file.
create table temp1(
name string,
age int,
city string,
country string)
row format delimited
fields terminated by ','
stored as textfile;
Step6- Loading the new and correct input file into a second temp table, which will then be used to overwrite the main table, but only for the row whose data was wrong in old.csv; that is, SOURAV,45,NEW YORK,USA should become SOURAV,45,NEW JERSEY,USA.
load data local inpath '/home/test_data/new.csv' into table temp1;
Final overwrite Step7 attempt 1-
insert overwrite table dyn partition(country='USA' , city='NEW YORK') select city,country from temp1 t where t.city='NEW JERSEY' and t.country='USA';
Result: NULL was inserted in the Name column:
NEW JERSEY NULL NEW YORK USA
Final overwrite Step7 attempt 2-
insert overwrite table dyn partition(country='USA' , city='NEW YORK') select name,age from temp1 t where t.city='NEW JERSEY' and t.country='USA';
Result: no change in the dyn table; same as before. NEW YORK did not update to NEW JERSEY.
Final overwrite Step7 attempt3 -
insert overwrite table dyn partition(country='USA' , city='NEW YORK') select * from temp1 t where t.city='NEW JERSEY' and t.country='USA';
Error:- FAILED: SemanticException [Error 10044]: Line 1:23 Cannot Insert into target table because column number/types are different. Table insclause-0 has 2 columns,but query has 4 columns
What is the correct approach for this problem?
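One possible approach (a sketch, not the only option): since the wrong value ('NEW YORK') is in the partition column itself, the row cannot be fixed in place; write the corrected rows into their proper partition and then drop the stale one.

```sql
-- Write the corrected row(s) into the NEW JERSEY partition
-- (dynamic partitioning may require:
--  set hive.exec.dynamic.partition.mode=nonstrict;)
INSERT OVERWRITE TABLE dyn PARTITION (city, country)
SELECT name, age, city, country
FROM temp1 t
WHERE t.city = 'NEW JERSEY' AND t.country = 'USA';

-- Remove the partition that still holds the obsolete NEW YORK row
ALTER TABLE dyn DROP IF EXISTS PARTITION (city = 'NEW YORK', country = 'USA');
```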
I'm completely new to Hive and Stack Overflow. I'm trying to create a table with complex data type "STRUCT" and then populate it using INSERT INTO TABLE in Hive.
I'm using the following code:
CREATE TABLE struct_test
(
address STRUCT<
houseno: STRING
,streetname: STRING
,town: STRING
,postcode: STRING
>
);
INSERT INTO TABLE struct_test
SELECT NAMED_STRUCT('123', 'GoldStreet', 'London', 'W1a9JF') AS address
FROM dummy_table
LIMIT 1;
I get the following error:
Error while compiling statement: FAILED: SemanticException [Error 10044]: Cannot insert into target because column number/types are different 'struct_test': Cannot convert column 0 from struct to array>.
I was able to use similar code with success to create and populate a data type Array but am having difficulty with Struct. I've tried lots of code examples I've found online but none of them seem to work for me... I would really appreciate some help on this as I've been stuck on it for quite a while now! Thanks.
Your SQL has an error. You should use this SQL:
INSERT INTO TABLE struct_test
SELECT NAMED_STRUCT('houseno','123','streetname','GoldStreet', 'town','London', 'postcode','W1a9JF') AS address
FROM dummy_table LIMIT 1;
You cannot insert complex data types directly in Hive. For inserting structs, you have the function named_struct. You need to create a dummy table with the data that you want inserted into the struct column of the desired table.
As in your case, create a dummy table:
CREATE TABLE dummy (
houseno STRING,
streetname STRING,
town STRING,
postcode STRING);
Then, to insert into the desired table, do:
INSERT INTO struct_test SELECT named_struct('houseno',houseno,'streetname'
,streetname,'town',town,'postcode',postcode) from dummy;
No need to create any dummy table; just use:
insert into struct_test
select named_struct("houseno","house_number","streetname","xxxy","town","town_name","postcode","postcode_name");
It is possible, but you must give the column names in the NAMED_STRUCT call and select from dummy or another table:
INSERT INTO TABLE struct_test
SELECT NAMED_STRUCT('houseno','123','streetname','GoldStreet', 'town','London', 'postcode','W1a9JF') AS address
FROM dummy
Or
INSERT INTO TABLE struct_test
SELECT NAMED_STRUCT('houseno',tb.col1,'streetname',tb.col2, 'town',tb.col3, 'postcode',tb.col4) AS address
FROM table1 as tb
CREATE TABLE IF NOT EXISTS sunil_table(
id INT,
name STRING,
address STRUCT<state:STRING,city:STRING,pincode:INT>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '.';
INSERT INTO sunil_table 1,"name" SELECT named_struct(
"state","haryana","city","fbd","pincode",4500);???
How do I insert both (normal and complex) data into the table?
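One way to do it (a sketch): select the scalar values and the struct together in a single SELECT. On older Hive versions that insist on a FROM clause, select from any one-row table instead.

```sql
INSERT INTO sunil_table
SELECT 1, 'name',
       named_struct('state', 'haryana', 'city', 'fbd', 'pincode', 4500);
```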
I want to read a .tsv file from HBase into Hive. The file has a column family with 3 columns inside: news, social and all. The aim is to store these columns in a table that has the columns news, social and all.
CREATE EXTERNAL TABLE IF NOT EXISTS topwords_logs (
key STRING,
columnfamily STRING,
wort STRING,
col STRING,
occurance INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/home/hfu/Testdaten';
load data local inpath '/home/hfu/Testdaten/part-r-00000.tsv' into table topwords_logs;
CREATE TABLE newtopwords (columnall int, columnsocial int , columnnews int) PARTITIONED BY(wort STRING) STORED AS SEQUENCEFILE;
Here I created an external table which contains the data from HBase. Further on, I created a table with the 3 columns.
What I have tried so far is this:
insert overwrite table newtopwords partition(wort)
select occurance, '1', '1', wort from topwords_logs;
This code works fine, but I need an extra WHERE clause for each column. How can I insert data like this?
insert overwrite table newtopwords partition(wort)
values (columnall, (select occurance from topwords_logs where col = 'all')),
       (columnnews, (select occurance from topwords_logs where col = 'news')),
       (columnsocial, (select occurance from topwords_logs where col = 'social')),
       (wort, (select wort from topwords_logs));
This code isn't working; it fails with a NoViableAltException.
In every example I just see code where they insert data without a WHERE clause. How can I insert data with a WHERE clause?
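One way to avoid per-column subqueries is a pivot via conditional aggregation (a sketch, assuming at most one row per (wort, col) pair in topwords_logs):

```sql
INSERT OVERWRITE TABLE newtopwords PARTITION (wort)
SELECT max(CASE WHEN col = 'all'    THEN occurance END) AS columnall,
       max(CASE WHEN col = 'social' THEN occurance END) AS columnsocial,
       max(CASE WHEN col = 'news'   THEN occurance END) AS columnnews,
       wort
FROM topwords_logs
GROUP BY wort;
```

Each CASE picks out the occurance value for one col, and the GROUP BY collapses the three rows per wort into a single output row.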