Cannot use a "." in a Hive table column name - hadoop

I'm using Hive 2.1.1 and I'm attempting to create a table with . in a column name:
CREATE TABLE `test_table`(
`field.with.dots` string
);
When I do so I get:
FAILED: ParseException line 4:0 Failed to recognize predicate ')'. Failed rule: '[., :] can not be used in column name in create table statement.' in column specification
I must be doing something wrong because the hive documentation says:
In Hive release 0.13.0 and later, by default column names can be specified within backticks (`) and contain any Unicode character (HIVE-6013)
. is a unicode character. And idea what I might be doing?
To give you more context this is on an Amazon EMR 5.5.0 cluster. Thanks!

Source code: HiveParser
...
private char [] excludedCharForColumnName = {'.', ':'};
...
private CommonTree throwColumnNameException() throws RecognitionException {
throw new FailedPredicateException(input, Arrays.toString(excludedCharForColumnName) + " can not be used in column name in create table statement.", "");
}
Jira ticket :Disallow create table with dot/colon in column name
Please note the motivation:
Since we don't allow users to query column names with dot in the
middle such as emp.no, don't allow users to create tables with such
columns that cannot be queried
It seems create table was handled, but not CTAS nor ALTER TABLE...
hive> create table t as select 1 as `a.b.c`;
OK
hive> desc t;
OK
col_name data_type comment
a.b.c int
Time taken: 0.441 seconds, Fetched: 1 row(s)
hive> select * from t;
FAILED: RuntimeException java.lang.RuntimeException: cannot find field a from [0:a.b.c]
hive> create table t (i int);
OK
hive> alter table t change column i `a.b.c` int
hive> select * from t;
Error while compiling statement: FAILED: RuntimeException java.lang.RuntimeException: cannot find field a from [0:a.b.c]
P.s.
I have updated the documentation (look for colon)
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL

Related

Alter Table Add Column if Not Present | Beeline CLI

I was looking for way to add column in Hive table via Beeline interface only when its not present.
create table employee(ename string , eid string);
alter table employee add columns (eid string);
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Duplicate column name: eid (state=08S01,code=1)
is there any way to ignore error or not try adding a column if already present ?
Thanks
Shashi

add partition in hive table based on a sub query

I am trying to add partition to a hive table (partitioned by date)
My problem is that the date needs to be fetched from another table.
My query looks like :
ALTER TABLE my_table ADD IF NOT EXISTS PARTITION(server_date = (SELECT max(server_date) FROM processed_table));
When i run the query hive throws the following error:
Error: Error while compiling statement: FAILED: ParseException line 1:84 cannot recognize input near '(' 'SELECT' 'max' in constant (state=42000,code=40000)
Hive does not allow to use functions/UDF's for the partition column.
Approach 1:
To achieve this you can run the first query and store the result in one variable and then execute the query.
server_date=$(hive -e "set hive.cli.print.header=false; select max(server_date) from processed_table;")
hive -hiveconf "server_date"="$server_date" -f your_hive_script.hql
Inside your script you can use the following statement:
ALTER TABLE my_table ADD IF NOT EXISTS PARTITION(server_date =${hiveconf:server_date});
For more information on the hive variable substitution, you can refer link
Approach 2:
In this approach, you will need to create a temporary table if the partition data you are expecting is already not loaded in any other partitioned table.
Considering your data doesn't have the server_date column.
Load the data into temporary table
set hive.exec.dynamic.partition=true;
Execute the below query:
INSERT OVERWRITE TABLE my_table PARTITION (server_date)
SELECT b.column1, b.column2,........,a.server_date as server_date FROM (select max(server_date) as server_date from ) a, my_table b;

Hive - Insert data into partitioned table: partition not found

I'm having issue while trying to inserting new data in Hive external partitioned table.
Table is partitioned by day, the error I got is:
FAILED: SemanticException [Error 10006]: Line 1:51 Partition not found ''18102016''
My query is as following:
ALTER TABLE my_source_table RECOVER PARTITIONS;
INSERT OVERWRITE TABLE my_dest_table PARTITION (d = '18102016')
SELECT
'III' AS primary_alias_type,
iii_id AS primary_alias_id,
FROM
my_source_table
WHERE
d = '18102016'
The my_dest_table has been created as:
CREATE EXTERNAL TABLE my_dest_table (
primary_alias_type string,
primary_alias_id
) PARTITIONED BY (d string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 's3://my_bucket/my_external_tables/'
Any idea on what I'm doing wrong? Thanks!
I believe you should ALTER TABLE my_source_table RECOVER PARTITIONS; do this for your destination table.
ALTER TABLE my_dest_table RECOVER PARTITIONS;
try this.
Note: Of course you should remove the extra comma what Alex L mentioned. Which will give other parsing error.

Hive: Insert into hive table with column using select 1

Let's say I have a hive table test_entry with column called entry_id.
hive> desc test_entry;
OK
entry_id int
Time taken: 0.4 seconds, Fetched: 1 row(s)
Suppose I need to insert one row into this above table using select 1 (which returns 1). For example: A syntax which looks like the below:
hive> insert into table test_entry select 1;
But I get the below error:
FAILED: NullPointerException null
So effectively, I would like to insert one row for entry)id whose value will be 1 with such a select statement(without referring another table).
How can this be done?
Hive does not support what you're trying to do. Inserts to ORC based tables was introduced in Hive 0.13.
Prior to that, you have to specify a FROM clause if you're doing INSERT .. SELECT
A workaround might be to create an external table with one row and do the following
INSERT .. SELECT 1 FROM table_with_one_row

Data not getting loaded into Partitioned Table in Hive

I am trying to create partition for my Table inorder to update a value.
This is my sample data
1,Anne,Admin,50000,A
2,Gokul,Admin,50000,B
3,Janet,Sales,60000,A
I want to update Janet's Department to B.
So for doing that I created a table with Department as partition.
create external table trail (EmployeeID Int,FirstName
String,Designation String,Salary Int) PARTITIONED BY (Department
String) row format delimited fields terminated by "," location
'/user/sreeveni/HIVE';
But while doing the above command.
No data are inserted into trail table.
hive>select * from trail;
OK
Time taken: 0.193 seconds
hive>desc trail;
OK
employeeid int None
firstname string None
designation string None
salary int None
department string None
# Partition Information
# col_name data_type comment
department string None
Am I doing anything wrong?
UPDATE
As suggested I tried to insert data into my table
load data inpath '/user/aibladmin/HIVE' overwrite into table trail
Partition(Department);
But it is showing
FAILED: SemanticException [Error 10096]: Dynamic partition strict mode
requires at least one static partition column. To turn this off set
hive.exec.dynamic.partition.mode=nonstrict
After setting set hive.exec.dynamic.partition.mode=nonstrict also didnt work fine.
Anything else to do.
Try both below properties
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
And while writing insert statement for a partitioned table make sure that you specify the partition columns at the last in select clause.
You cannot directly insert data(Hdfs File) into a Partitioned hive table.
First you need to create a normal table, then you will insert that table data into partitioned table.
set hive.exec.dynamic.partition.mode=strict means when ever you are populating hive table it must have at least one static partition column.
set hive.exec.dynamic.partition.mode=nonstrict In this mode you don't need any static partition column.
Try the following:
Start by creating the table:
create external table test23 (EmployeeID Int,FirstName String,Designation String,Salary Int) PARTITIONED BY (Department String) row format delimited fields terminated by "," location '/user/rocky/HIVE';
Create a directory in hdfs with partition name :
$ hadoop fs -mkdir /user/rocky/HIVE/department=50000
Create a local file abc.txt by filtering records having department equal to 50000:
$ cat abc.txt
1,Anne,Admin,50000,A
2,Gokul,Admin,50000,B
Put it into HDFS:
$ hadoop fs -put /home/yarn/abc.txt /user/rocky/HIVE/department=50000
Now alter the table:
ALTER TABLE test23 ADD PARTITION(department=50000);
And check the result:
select * from test23 ;
just set those 2 properties BEFORE you getOrCreate() the spark session:
SparkSession
.builder
.config(new SparkConf())
.appName(appName)
.enableHiveSupport()
.config("hive.exec.dynamic.partition","true")
.config("hive.exec.dynamic.partition.mode", "nonstrict")
.getOrCreate()
I ran into the same problem and yes these two properties are needed. However, I used JDBC driver with Scala to set these properties before executing Hive statements. The problem, however, was that I was executing a bunch of properties (SET statements) in one execution statement like this
conn = DriverManager.getConnection(conf.get[String]("hive.jdbc.url"))
conn.createStatement().execute(
"SET spark.executor.memory = 2G;
SET hive.exec.dynamic.partition.mode = nonstrict;
SET hive.other.statements =blabla ;")
For some reason, the driver was not able to interpret all these as separate statements, so I needed to execute each one of them separately.
conn = DriverManager.getConnection(conf.get[String]("hive.jdbc.url"))
conn.createStatement().execute("SET spark.executor.memory = 2G;")
conn.createStatement().execute("SET hive.exec.dynamic.partition.mode=nonstrict;")
conn.createStatement().execute("SET hive.other.statements =blabla ;")
Can you try running
MSCK REPAIR TABLE table_name;
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RecoverPartitions(MSCKREPAIRTABLE)

Resources