Spark SQL throwing error "java.lang.UnsupportedOperationException: Unknown field type: void"

I am getting the below error in Spark (1.6) SQL while creating a table with a column value defaulted to NULL. Example: create table test as select column_a, NULL as column_b from test_temp;
The same statement works in Hive and creates the column with data type "void".
For now I am using an empty string instead of NULL to avoid the exception, but then the new column gets the string data type.
Is there a better way to insert null values into a Hive table using Spark SQL?
2017-12-26 07:27:59 ERROR StandardImsLogger$:177 - org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.UnsupportedOperationException: Unknown field type: void
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:789)
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:746)
at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$createTable$1.apply$mcV$sp(ClientWrapper.scala:428)
at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$createTable$1.apply(ClientWrapper.scala:426)
at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$createTable$1.apply(ClientWrapper.scala:426)
at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$withHiveState$1.apply(ClientWrapper.scala:293)
at org.apache.spark.sql.hive.client.ClientWrapper.liftedTree1$1(ClientWrapper.scala:239)
at org.apache.spark.sql.hive.client.ClientWrapper.retryLocked(ClientWrapper.scala:238)
at org.apache.spark.sql.hive.client.ClientWrapper.withHiveState(ClientWrapper.scala:281)
at org.apache.spark.sql.hive.client.ClientWrapper.createTable(ClientWrapper.scala:426)
at org.apache.spark.sql.hive.execution.CreateTableAsSelect.metastoreRelation$lzycompute$1(CreateTableAsSelect.scala:72)
at org.apache.spark.sql.hive.execution.CreateTableAsSelect.metastoreRelation$1(CreateTableAsSelect.scala:47)
at org.apache.spark.sql.hive.execution.CreateTableAsSelect.run(CreateTableAsSelect.scala:89)
at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:56)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:56)
at org.apache.spark.sql.DataFrame.withCallback(DataFrame.scala:153)
at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:145)
at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:52)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:829)

I couldn't find much information regarding the data type void, but it looks like it is somewhat equivalent to the Any data type we have in Scala.
The table at the end of this page explains that a void can be cast to any other data type.
Here are some JIRA issues that are similar to the problem you are facing:
HIVE-2901
HIVE-747
So, as mentioned in the comment, instead of a bare NULL you can cast it explicitly to the data type you want:
select cast(NULL as string) as column_b
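The root cause is reproducible outside Spark: in a CREATE TABLE … AS SELECT, each new column's type is inferred from its expression, and a bare NULL carries no usable type. As an illustration only (using Python's sqlite3 as a stand-in engine, since Spark 1.6 can't be embedded here; SQLite records an empty declared type rather than throwing), an explicit cast is what gives the column a concrete type:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE test_temp (column_a TEXT)")

# A bare NULL expression has no type, so the CTAS column gets no declared type.
cur.execute("CREATE TABLE t_bare AS SELECT column_a, NULL AS column_b FROM test_temp")

# Casting the NULL gives the engine a concrete type to record for the column.
cur.execute("CREATE TABLE t_cast AS SELECT column_a, CAST(NULL AS TEXT) AS column_b FROM test_temp")

def column_type(table, column):
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk)
    for row in cur.execute("PRAGMA table_info(%s)" % table):
        if row[1] == column:
            return row[2]

print(repr(column_type("t_bare", "column_b")))  # no declared type
print(repr(column_type("t_cast", "column_b")))  # declared as TEXT
```

The same principle is what makes cast(NULL as string) work in Spark: the cast pins the column to a type Hive's metastore understands.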

I started to get a similar issue. I boiled the code down to this example:
WITH DATA
AS (
SELECT 1 ISSUE_ID,
DATE(NULL) DueDate,
MAKE_DATE(2000,01,01) DDate
UNION ALL
SELECT 1 ISSUE_ID,
MAKE_DATE(2000,01,01),
MAKE_DATE(2000,01,02)
)
SELECT ISNOTNULL(lag(IT.DueDate, 1) OVER (PARTITION by IT.ISSUE_ID ORDER BY IT.DDate ))
AND ISNULL(IT.DueDate)
FROM DATA IT
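For what it's worth, here is a small Python sketch (names and data mirror the CTE above) of what that window expression computes: for each ISSUE_ID, ordered by DDate, it flags rows whose previous DueDate was set but whose current DueDate is NULL. The DATE(NULL) in the first branch is what produces the void-typed column:

```python
from datetime import date

# Rows of (issue_id, due_date, d_date), mirroring the CTE above.
rows = [
    (1, None, date(2000, 1, 1)),
    (1, date(2000, 1, 1), date(2000, 1, 2)),
]

def lag_flags(rows):
    """Emulate: isnotnull(lag(DueDate, 1) over (partition by ISSUE_ID order by DDate))
    and isnull(DueDate)."""
    flags = []
    prev_by_issue = {}
    for issue_id, due, _ in sorted(rows, key=lambda r: (r[0], r[2])):
        prev = prev_by_issue.get(issue_id)   # lag(DueDate, 1); None on the first row
        flags.append(prev is not None and due is None)
        prev_by_issue[issue_id] = due
    return flags

print(lag_flags(rows))  # [False, False]
```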

Related

Insert data into a column with user defined datatype in DB2

I created a user defined datatype using the following query
CREATE TYPE test3.data_type1 AS
(xmldata clob(204800) logged NOT compact)
INSTANTIABLE
inline LENGTH 195
MODE DB2SQL;
I then created a table using the following query:
CREATE TABLE test3.test_table
(col1 varchar(256),
col2 test3.data_type1,
col3 varchar(254),
col4 varchar(128),
col5 varchar(128) WITH DEFAULT USER,
col6 timestamp WITH DEFAULT,
col7 varchar(128),
col8 timestamp)
;
I am now trying to insert data into this table, test_table using:
INSERT INTO test3.test_table (col2)
VALUES
(data_type1('aaaa'));
The error I get is as follows:
SQL Error [42884]: No authorized routine named "DATA_TYPE1" of type "FUNCTION" having compatible arguments was found.. SQLCODE=-440, SQLSTATE=42884, DRIVER=4.26.14
I also tried multiple variations of this insert statement, but with no luck. I also tried creating a function as mentioned in this post: INSERT into DB2 with a user defined type column
But I get an error while creating that function as well. Here is the function, as per that post:
CREATE FUNCTION data_type1 (xmldata clob(204800))
RETURNS data_type1
BEGIN ATOMIC
DECLARE t data_type1;
SET t = data_type1();
SET t..xmldata = xmldata;
RETURN t;
END
The error I get with this is as follows:
SQL Error [42704]: "DATA_TYPE1" is an undefined name.. SQLCODE=-204, SQLSTATE=42704, DRIVER=4.26.14
DB2 version - 11.5 LUW
Any pointers in the right direction would help.

query to change the date format in the column

I have a report to create, but there's a problem I can't solve: the date column I generate has differing values, and I use it in a subquery. My question is: can I apply a format so I can control the value of the column? Please see the table below for reference.
My column(date) contains
date_columns
2019-06-20T11:09:15.674+00:00
2019-06-20T11:09:15.674+00:00
2019-06-20T11:09:15.674+00:00
2019-06-20T11:09:15.673+00:00
Now, my problem is that it returns ORA-01427: single-row subquery returns more than one row because of that 2019-06-20T11:09:15.673+00:00. Can I apply a format to make it look like 2019-06-20T11:09:15?
I tried the queries below, but nothing changed; they return the same error.
select distinct to_date(substr(dar.last_update_date,1,15),'YYYY-MM-DD HH:MI:SS')
select distinct to_date(dar.last_update_date,1,15,'YYYY-MM-DD HH:MI:SS')
Thanks!
2019-06-20T11:09:15.673+00:00 appears to be a datetime rendered as a string in the official XML (ISO 8601) representation. We can turn it into an actual timestamp using to_timestamp_tz() and then cast the timestamp to a date:
select cast(
         to_timestamp_tz('2019-06-20T11:09:15.673+00:00', 'YYYY-MM-DD"T"HH24:MI:SS.FFTZH:TZM')
       as date)
from dual;
However, I'm not sure how this will resolve the ORA-01427: single-row subquery returns more than one row error. This exception occurs when we use a subquery like this …
where empno = ( select empno
from emp
where deptno = 30
and sal > 2300 )
… and the subquery returns more than one row because the WHERE clause is too lax. The solution is to fix the subquery's WHERE clause so it returns only one row (or use distinct in the subquery's projection if that's not possible).
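As an aside, the same truncate-to-whole-seconds idea can be sanity-checked with Python's standard library (purely illustrative, not Oracle code):

```python
from datetime import datetime

raw = "2019-06-20T11:09:15.673+00:00"

# fromisoformat handles the 'T' separator, fractional seconds, and the +00:00 offset
ts = datetime.fromisoformat(raw)

# Drop the fractional seconds, like casting the Oracle timestamp down to DATE
truncated = ts.replace(microsecond=0)

print(truncated.isoformat())  # 2019-06-20T11:09:15+00:00
```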

how to insert uniontype in hive

I read the famous example about the union type in Hive:
CREATE TABLE union_test(foo UNIONTYPE<int, double, array<string>, struct<a:int,b:string>>);
SELECT foo FROM union_test;
{0:1}
{1:2.0}
{2:["three","four"]}
{3:{"a":5,"b":"five"}}
{2:["six","seven"]}
{3:{"a":8,"b":"eight"}}
{0:9}
OK, great.
What is the syntax in Hive SQL to insert these rows?
I tried insert into union_test values (1.0) and got:
SemanticException [Error 10044]: Line 1:12 Cannot insert into target table because column number/types are different 'union_test': Cannot convert column 0 from string to uniontype<int,double,array<string>,struct<a:int,b:string>>.
On the other hand, if I create a table with a double column, how can I feed it from the union_test table?
Surely there is a trick.
Thanks
Did you consider looking at the documentation?
Straight from Hive Language Manual UDF, under "Complex type constructors"...
create_union (tag, val1, val2, ...)
Creates a union type with the value that is being pointed to by the tag parameter.
OK, that explanation about the "tag parameter" is very cryptic.
For examples, just look at the bottom of that blog post and/or at the answer to that question on the HortonWorks forum.
CREATE table testtable5(c1 integer,c2 uniontype<integer,string>);
INSERT INTO testtable5 VALUES(5,create_union(0,1,'testing'));
SELECT create_union(if(c1<0,0,1),c1,c2) from testtable5;
INSERT INTO testtable5 VALUES(1,create_union(1,1,'testing'));
SELECT create_union(if(c1<0,0,1),c1,c2) from testtable5;
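Conceptually, create_union(tag, val1, val2, …) builds a tagged value: the integer tag selects which of the following arguments is the live one, exactly like the {0:1} / {1:2.0} rows shown in the question. A rough Python sketch of that semantics (the helper is my own model, not a Hive API):

```python
def create_union(tag, *values):
    """Model Hive's create_union: keep only the value the tag points to,
    rendered the way Hive prints union values, e.g. {0: 1} or {1: 'testing'}."""
    if not 0 <= tag < len(values):
        raise ValueError("tag out of range for this union type")
    return {tag: values[tag]}

# Mirrors create_union(0, 1, 'testing') -> the int branch is live
print(create_union(0, 1, "testing"))   # {0: 1}
# Mirrors create_union(1, 1, 'testing') -> the string branch is live
print(create_union(1, 1, "testing"))   # {1: 'testing'}
```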

How do you insert data into complex data type "Struct" in Hive

I'm completely new to Hive and Stack Overflow. I'm trying to create a table with complex data type "STRUCT" and then populate it using INSERT INTO TABLE in Hive.
I'm using the following code:
CREATE TABLE struct_test
(
address STRUCT<
houseno: STRING
,streetname: STRING
,town: STRING
,postcode: STRING
>
);
INSERT INTO TABLE struct_test
SELECT NAMED_STRUCT('123', 'GoldStreet', 'London', 'W1a9JF') AS address
FROM dummy_table
LIMIT 1;
I get the following error:
Error while compiling statement: FAILED: SemanticException [Error 10044]: Cannot insert into target because column number/types are different 'struct_test': Cannot convert column 0 from struct to array.
I was able to use similar code with success to create and populate a data type Array but am having difficulty with Struct. I've tried lots of code examples I've found online but none of them seem to work for me... I would really appreciate some help on this as I've been stuck on it for quite a while now! Thanks.
Your SQL has an error. You should use:
INSERT INTO TABLE struct_test
SELECT NAMED_STRUCT('houseno','123','streetname','GoldStreet', 'town','London', 'postcode','W1a9JF') AS address
FROM dummy_table LIMIT 1;
You cannot insert a complex data type directly in Hive. For inserting structs you have the function named_struct. You need to create a dummy table with the data that you want inserted into the struct column of the desired table.
Like in your case, create a dummy table:
CREATE TABLE DUMMY ( houseno STRING
,streetname STRING
,town STRING
,postcode STRING);
Then, to insert into the desired table, do:
INSERT INTO struct_test SELECT named_struct('houseno',houseno,'streetname'
,streetname,'town',town,'postcode',postcode) from dummy;
No need to create any dummy table; just use:
insert into struct_test
select named_struct("houseno","house_number","streetname","xxxy","town","town_name","postcode","postcode_name");
It is possible:
you must give the column names in a SELECT from the dummy (or another) table.
INSERT INTO TABLE struct_test
SELECT NAMED_STRUCT('houseno','123','streetname','GoldStreet', 'town','London', 'postcode','W1a9JF') AS address
FROM dummy
Or
INSERT INTO TABLE struct_test
SELECT NAMED_STRUCT('houseno',tb.col1,'streetname',tb.col2, 'town',tb.col3, 'postcode',tb.col4) AS address
FROM table1 as tb
CREATE TABLE IF NOT EXISTS sunil_table(
id INT,
name STRING,
address STRUCT<state:STRING,city:STRING,pincode:INT>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '.';
INSERT INTO sunil_table 1,"name" SELECT named_struct(
"state","haryana","city","fbd","pincode",4500);???
How do I insert both normal and complex data into the table?

Apache Hive - Single Insert Date Value

I'm trying to insert a date into a date column using Hive. So far, here's what I've tried:
INSERT INTO table1 (EmpNo, DOB)
VALUES ('Clerk#0008000', cast(substring(from_unixtime(unix_timestamp(cast('2016-01-01' as string), 'yyyy-MM-dd')),1,10) as date));
AND
INSERT INTO table table1 values('Clerk#0008000', cast(substring(from_unixtime(unix_timestamp(cast('2016-01-01' as string), 'yyyy-MM-dd')),1,10) as date));
AND
INSERT INTO table1 SELECT
'Clerk#0008000', cast(substring(from_unixtime(unix_timestamp(cast('2016-01-01' as string), 'yyyy-MM-dd')),1,10) as date);
But I still get:
FAILED: SemanticException [Error 10293]: Unable to create temp file for insert values Expression of type TOK_FUNCTION not supported in insert/values
OR
FAILED: ParseException line 2:186 Failed to recognize predicate '<EOF>'. Failed rule: 'regularBody' in statement
Hive ACID has been enabled on the ORC-based table, and simple inserts without dates are working.
I think I'm missing something really simple, but I can't put my finger on it.
OK, I found it. I feel like a doofus now.
It was as simple as:
INSERT INTO table1 values ('Clerk#0008000', '2016-01-01');
