How to insert data into an array column programmatically? - jdbc

I have a Hive table with the following columns:
create table pageAds(id int,pageid STRING,adid_list Array<string>)
and I can insert into it directly from the Beeline console:
Connected to: Apache Hive (version 3.1.2)
Driver: Hive JDBC (version 2.3.7)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 2.3.7 by Apache Hive
0: jdbc:hive2://hadoop1.data.com:10000>
insert into pageAds select 1, 'front_page', array('1','2','3');
How can I insert data into it programmatically using JDBC?
From the documentation I know that the following PreparedStatement method is not supported:
setArray(int parameterIndex, Array x)
Is there some other way to insert data into a Hive array column programmatically?
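One possible workaround, sketched here rather than taken from the original thread, is to avoid setArray altogether and let the array(...) constructor from the Beeline statement do the work, with one ordinary string parameter per element. This assumes the Hive JDBC driver accepts ? placeholders inside array(...), which should hold as far as I know because the driver substitutes parameters into the SQL text on the client side. The class name HiveArrayInsert and both method names are hypothetical, and the sketch assumes the list has at least one element.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Collections;
import java.util.List;

class HiveArrayInsert {

    // Builds: insert into <table> select ?, ?, array(?,?,...)
    // with one placeholder per array element.
    static String buildInsertSql(String table, int arraySize) {
        if (arraySize < 1) {
            // Assumption: empty arrays are out of scope for this sketch.
            throw new IllegalArgumentException("need at least one array element");
        }
        String placeholders = String.join(",", Collections.nCopies(arraySize, "?"));
        return "insert into " + table + " select ?, ?, array(" + placeholders + ")";
    }

    // Hypothetical helper: binds id, pageid, then each array element in order.
    static void insertPageAds(Connection conn, int id, String pageId, List<String> adIds)
            throws SQLException {
        String sql = buildInsertSql("pageAds", adIds.size());
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setInt(1, id);
            ps.setString(2, pageId);
            for (int i = 0; i < adIds.size(); i++) {
                ps.setString(3 + i, adIds.get(i)); // parameters 3..n are the array items
            }
            ps.execute();
        }
    }
}
```

With a connection opened against the jdbc:hive2://hadoop1.data.com:10000 URL shown above, calling insertPageAds(conn, 1, "front_page", Arrays.asList("1", "2", "3")) would send the same statement as the Beeline example.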

Related

Spark : How to alter oracle session (nls_date_format) from spark?

I have a dataframe that contains a Timestamp column (2021-01-19 13:00:30).
When I write this dataframe to an existing table in Oracle (19c) using Spark 2 (Scala) over JDBC, it inserts 2021-01-19 13:00:30.000000 even though the column in Oracle is TIMESTAMP(0).
Example:
df.write.mode(SaveMode.Append).jdbc(url, tableName, connectionProperties)
I tried to alter the session from Spark before sending the data, by creating a connection and then executing this code:
connection.setAutoCommit(true)
val statement: Statement = connection.createStatement()
statement.executeQuery("alter session set nls_date_format='YYYY-MM-DD HH24:MI:SS'")
But it doesn't actually alter the session (and I get no error).
Should I specify the pattern of my Timestamp in df.write ...? If not, am I altering the session correctly from Spark?

alter table/add columns in non native table in hive

I created a Hive table with a storage handler, and now I want to add a column to that table, but it gives me the error below:
[Code: 10134, SQL State: 42000] Error while compiling statement: FAILED:
SemanticException [Error 10134]: ALTER TABLE can only be used for [ADDPROPS,
DROPPROPS] to a non-native table
As per the Hive documentation, any Hive table you create with a storage handler is a non-native table.
Here's the link: https://cwiki.apache.org/confluence/display/Hive/StorageHandlers
There is an open JIRA enhancement request with Apache for this:
https://issues.apache.org/jira/browse/HIVE-1240
For example, I am using the Druid storage handler in my case.
I created a hive table using:
CREATE TABLE druid_table_1
(`__time` TIMESTAMP, `dimension1` STRING, `metric1` int)
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler';
and then I am trying to add a column:
ALTER TABLE druid_table_1 ADD COLUMNS (`dimension2` STRING);
With the above approach I get the error.
Is there any other way to add a column to a non-native table in Hive without recreating it?
A patch is available in HDP 2.5+ from Hortonworks. Support for ADD COLUMNS has been added to the ALTER TABLE statement.
A column can be added to a Druid table using the ALTER TABLE DDL in Hive:
ALTER TABLE table_name ADD COLUMNS (col_name data_type)
There is no need to specify a partition spec, because these are Druid-backed Hive tables and partitioning/storage is maintained by Druid.

Can't read data in Presto - can in Hive

I have a Hive DB in which I created a table compatible with the Parquet file format.
CREATE EXTERNAL TABLE `default.table`(
`date` date,
`udid` string,
`message_token` string)
PARTITIONED BY (
`dt` date)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
's3://Bucket/Folder'
I added partitions to this table, but I can't query the data.
In Hive: I can see the partitions when using "Show partitions from default.table", and I get the row count when using "Select count(*) from default.table".
In Presto: I can see the partitions when using "Show partitions from default.table", but when I try to query the data itself - it looks like there's no data - empty return with "select *", and 0 when trying "select count(*)".
Hive cluster is AWS EMR, version: emr-5.9.0, Applications: Hive 2.3.0, Presto 0.184, instance type: r3.2xlarge.
Does someone know why I get these differences between Hive and Presto?
Thanks!

Spark Sql 1.5 dataframe saveAsTable how to add hive table properties

I am running Spark SQL on Hive. I need to add the auto.purge table property while creating a new Hive table. I tried the code below to add options when calling the saveAsTable method:
inputDF.write.option("auto.purge", "true").saveAsTable(hiveTableName)
The line above added the property under WITH SERDEPROPERTIES of the table.
I need to add this property under the TBLPROPERTIES section of the Hive DDL.
I finally found a solution, though I am not sure it is the best one.
Unfortunately, the Spark 1.5 SQL saveAsTable method doesn't accept table properties as input. It creates a new tableProperties map before the Hive table is created.
See the code here:
https://github.com/apache/spark/blob/v1.5.0/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala
To add table properties to an existing Hive table, use the ALTER TABLE command:
ALTER TABLE table_name SET TBLPROPERTIES ('auto.purge'='true');
The command above adds the table property to the Hive metastore.
To drop an existing table inside an encryption zone, run the command above before the DROP command.
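If the property has to be set from code right after saveAsTable, the ALTER TABLE statement can be generated and then submitted as ordinary SQL. The following is only a sketch of the string-building part. TblPropertiesDdl and its methods are hypothetical names, and the quoting rule (backslash-escaping quotes and backslashes) is my assumption about what is sufficient for HiveQL string literals.

```java
import java.util.Map;
import java.util.StringJoiner;

class TblPropertiesDdl {

    // Wraps a value in single quotes, escaping backslashes and single quotes.
    static String quote(String s) {
        return "'" + s.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    // Builds: ALTER TABLE <table> SET TBLPROPERTIES ('k1'='v1', 'k2'='v2')
    // The table name is inserted as-is, so it must come from trusted code.
    static String setTblProperties(String table, Map<String, String> props) {
        StringJoiner pairs = new StringJoiner(", ", "(", ")");
        for (Map.Entry<String, String> e : props.entrySet()) {
            pairs.add(quote(e.getKey()) + "=" + quote(e.getValue()));
        }
        return "ALTER TABLE " + table + " SET TBLPROPERTIES " + pairs;
    }
}
```

The returned string could then be passed to the HiveContext sql(...) method once the table exists, which keeps the workaround in the same job as the write.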

Is it possible to join a hive table with oracle table?

I have a problem writing a query using HiveQL.
Is it possible to join a Hive table with an Oracle table?
If yes, how?
If no, why not?
To access data stored in your Hive tables from Oracle, including joining on them, you will need the Oracle Big Data connector.
From the documentation:
Using Oracle SQL Connector for HDFS, you can use Oracle Database to access and analyze data residing in HDFS files or a Hive table. You can also query and join data in HDFS or a Hive table with other database-resident data. If required, you can also load data into the database using SQL.
You first access Hive tables from Oracle Database via external tables. The external table definition is generated automatically from the Hive table definition. Hive table data can be accessed by querying this external table. The data can be queried with Oracle SQL and joined with other tables in the database.
In other words, the data stays in the Hive table, and you access that Hive table from Oracle Database.
