Error using PolyBase to load Parquet file: class java.lang.Integer cannot be cast to class parquet.io.api.Binary

I have a snappy.parquet file with a schema like this:
{
  "type": "struct",
  "fields": [{
    "name": "MyTinyInt",
    "type": "byte",
    "nullable": true,
    "metadata": {}
  }
  ...
  ]
}
Update: parquet-tools reveals this:
############ Column(MyTinyInt) ############
name: MyTinyInt
path: MyTinyInt
max_definition_level: 1
max_repetition_level: 0
physical_type: INT32
logical_type: Int(bitWidth=8, isSigned=true)
converted_type (legacy): INT_8
When I try to run a stored procedure in Azure Data Studio to load this into an external staging table with PolyBase, I get the error:
11:16:21 Started executing query at Line 113
Msg 106000, Level 16, State 1, Line 1
HdfsBridge::recordReaderFillBuffer - Unexpected error encountered filling record reader buffer: ClassCastException: class java.lang.Integer cannot be cast to class parquet.io.api.Binary (java.lang.Integer is in module java.base of loader 'bootstrap'; parquet.io.api.Binary is in unnamed module of loader 'app')
The load into the external table works fine when the columns are all varchars:
CREATE EXTERNAL TABLE [domain].[TempTable]
(
    ...
    MyTinyInt tinyint NULL,
    ...
)
WITH
(
    LOCATION = ''' + @Location + ''',
    DATA_SOURCE = datalake,
    FILE_FORMAT = parquet_snappy
)
The data will eventually be merged into a Synapse data warehouse table, and in that table the column will have to be of type tinyint.
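For context, that merge step would be along these lines; a sketch with a placeholder target name, where an explicit CAST keeps the warehouse column tinyint even if the staging column ends up wider:
-- Sketch only: [dw].[TargetTable] is a placeholder, not from the original post.
-- The CAST keeps the warehouse column tinyint even if the staging column
-- is widened (e.g. to int) to sidestep the reader's cast error.
INSERT INTO [dw].[TargetTable] (MyTinyInt)
SELECT CAST(MyTinyInt AS tinyint)
FROM [domain].[TempTable];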

I had the same issue and a good support plan in Azure, so I got an answer from Microsoft:
There is a known bug in ADF for this particular scenario: the date type in Parquet should be mapped to the data type date in SQL Server; however, ADF incorrectly converts this type to datetime2, which causes a conflict in PolyBase. I have confirmation from the core engineering team that this will be rectified with a fix by the end of November and will be published directly into the ADF product.
In the meantime, as a workaround:
Create the target table with data type DATE as opposed to DATETIME2
Configure the Copy Activity Sink settings to use Copy Command as opposed to PolyBase
But even the Copy command doesn't work for me, so the only workaround left is to use Bulk Insert. Bulk Insert is extremely slow, though, so on big datasets it would be a problem.
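For reference, the Copy Command path suggested in that workaround looks roughly like this in Synapse; a sketch only, with a placeholder storage URL and credential, and as noted above it may still fail for this particular scenario:
-- Sketch: COPY INTO as an alternative to PolyBase (URL and credential are placeholders)
COPY INTO [domain].[TempTable]
FROM 'https://mystorageaccount.dfs.core.windows.net/container/path/'
WITH (
    FILE_TYPE = 'PARQUET',
    CREDENTIAL = (IDENTITY = 'Managed Identity')
);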

Related

Function not found when generating code with DDLDatabase - jOOQ

I have a Spring Boot Gradle project with a MySQL database. Previously, under jOOQ version 3.13.6, my SQL was parsed without errors. When updating to a higher jOOQ version (3.14.x and 3.15.x) and generating/parsing the migrations with jOOQ, I get the following output:
SEVERE DDLDatabase Error: Your SQL string could not be parsed or
interpreted. This may have a variety of reasons, including:
The jOOQ parser doesn't understand your SQL
The jOOQ DDL simulation logic (translating to H2) cannot simulate your SQL
org.h2.jdbc.JdbcSQLSyntaxErrorException: Function "coalesce" not found;
A basic SQL example where the error occurs is given below. Parsing the same view worked with jOOQ 3.13.6.
DROP VIEW IF EXISTS view1;
CREATE VIEW view1 AS
SELECT COALESCE(SUM(table1.col1), 0) AS 'sum'
FROM table1;
I am currently lost here. I don't see any related changes in the jOOQ changelog.
Any help or pointers on where to look further are highly appreciated.
Extended Stacktrace:
11:10:30 SEVERE DDLDatabase Error : Your SQL string could not be parsed or interpreted. This may have a variety of reasons, including:
- The jOOQ parser doesn't understand your SQL
- The jOOQ DDL simulation logic (translating to H2) cannot simulate your SQL
If you think this is a bug or a feature worth requesting, please report it here: https://github.com/jOOQ/jOOQ/issues/new/choose
As a workaround, you can use the Settings.parseIgnoreComments syntax documented here:
https://www.jooq.org/doc/latest/manual/sql-building/dsl-context/custom-settings/settings-parser/
11:10:30 SEVERE Error while loading file: /Users/axel/projects/service/./src/main/resources/db/migration/V5__create_view1.sql
11:10:30 SEVERE Error in file: /Users/axel/projects/service/build/tmp/generateJooq/config.xml. Error : Error while exporting schema
org.jooq.exception.DataAccessException: Error while exporting schema
at org.jooq.meta.extensions.AbstractInterpretingDatabase.connection(AbstractInterpretingDatabase.java:103)
at org.jooq.meta.extensions.AbstractInterpretingDatabase.create0(AbstractInterpretingDatabase.java:77)
at org.jooq.meta.AbstractDatabase.create(AbstractDatabase.java:332)
at org.jooq.meta.AbstractDatabase.create(AbstractDatabase.java:322)
at org.jooq.meta.AbstractDatabase.setConnection(AbstractDatabase.java:312)
at org.jooq.codegen.GenerationTool.run0(GenerationTool.java:531)
at org.jooq.codegen.GenerationTool.run(GenerationTool.java:237)
at org.jooq.codegen.GenerationTool.generate(GenerationTool.java:232)
at org.jooq.codegen.GenerationTool.main(GenerationTool.java:204)
Caused by: org.jooq.exception.DataAccessException: SQL [create view "view1" as select "coalesce"("sum"("table1"."col1"), 0) "sum" from "table1"]; Function "coalesce" not found; SQL statement:
create view "view1" as select "coalesce"("sum"("table1"."col1"), 0) "sum" from "table1" [90022-200]
at org.jooq_3.15.5.H2.debug(Unknown Source)
at org.jooq.impl.Tools.translate(Tools.java:2988)
at org.jooq.impl.DefaultExecuteContext.sqlException(DefaultExecuteContext.java:639)
at org.jooq.impl.AbstractQuery.execute(AbstractQuery.java:349)
at org.jooq.meta.extensions.ddl.DDLDatabase.load(DDLDatabase.java:183)
at org.jooq.meta.extensions.ddl.DDLDatabase.lambda$export$0(DDLDatabase.java:156)
at org.jooq.FilePattern.load0(FilePattern.java:307)
at org.jooq.FilePattern.load(FilePattern.java:287)
at org.jooq.FilePattern.load(FilePattern.java:300)
at org.jooq.FilePattern.load(FilePattern.java:251)
at org.jooq.meta.extensions.ddl.DDLDatabase.export(DDLDatabase.java:156)
at org.jooq.meta.extensions.AbstractInterpretingDatabase.connection(AbstractInterpretingDatabase.java:100)
... 8 more
Caused by: org.h2.jdbc.JdbcSQLSyntaxErrorException: Function "coalesce" not found; SQL statement:
create view "view1" as select "coalesce"("sum"("table1"."col1"), 0) "sum" from "table1" [90022-200]
at org.h2.message.DbException.getJdbcSQLException(DbException.java:576)
at org.h2.message.DbException.getJdbcSQLException(DbException.java:429)
at org.h2.message.DbException.get(DbException.java:205)
at org.h2.message.DbException.get(DbException.java:181)
at org.h2.command.Parser.readJavaFunction(Parser.java:3565)
at org.h2.command.Parser.readFunction(Parser.java:3770)
at org.h2.command.Parser.readTerm(Parser.java:4305)
at org.h2.command.Parser.readFactor(Parser.java:3343)
at org.h2.command.Parser.readSum(Parser.java:3330)
at org.h2.command.Parser.readConcat(Parser.java:3305)
at org.h2.command.Parser.readCondition(Parser.java:3108)
at org.h2.command.Parser.readExpression(Parser.java:3059)
at org.h2.command.Parser.readFunctionParameters(Parser.java:3778)
at org.h2.command.Parser.readFunction(Parser.java:3772)
at org.h2.command.Parser.readTerm(Parser.java:4305)
at org.h2.command.Parser.readFactor(Parser.java:3343)
at org.h2.command.Parser.readSum(Parser.java:3330)
at org.h2.command.Parser.readConcat(Parser.java:3305)
at org.h2.command.Parser.readCondition(Parser.java:3108)
at org.h2.command.Parser.readExpression(Parser.java:3059)
at org.h2.command.Parser.parseSelectExpressions(Parser.java:2931)
at org.h2.command.Parser.parseSelect(Parser.java:2952)
at org.h2.command.Parser.parseQuerySub(Parser.java:2817)
at org.h2.command.Parser.parseSelectUnion(Parser.java:2649)
at org.h2.command.Parser.parseQuery(Parser.java:2620)
at org.h2.command.Parser.parseCreateView(Parser.java:6950)
at org.h2.command.Parser.parseCreate(Parser.java:6223)
at org.h2.command.Parser.parsePrepared(Parser.java:903)
at org.h2.command.Parser.parse(Parser.java:843)
at org.h2.command.Parser.parse(Parser.java:815)
at org.h2.command.Parser.prepareCommand(Parser.java:738)
at org.h2.engine.Session.prepareLocal(Session.java:657)
at org.h2.engine.Session.prepareCommand(Session.java:595)
at org.h2.jdbc.JdbcConnection.prepareCommand(JdbcConnection.java:1235)
at org.h2.jdbc.JdbcStatement.executeInternal(JdbcStatement.java:212)
at org.h2.jdbc.JdbcStatement.execute(JdbcStatement.java:201)
at org.jooq.tools.jdbc.DefaultStatement.execute(DefaultStatement.java:102)
at org.jooq.impl.SettingsEnabledPreparedStatement.execute(SettingsEnabledPreparedStatement.java:227)
at org.jooq.impl.AbstractQuery.execute(AbstractQuery.java:414)
at org.jooq.impl.AbstractQuery.execute(AbstractQuery.java:335)
... 16 more
> Task :generateJooq FAILED
jOOQ configuration:
jooq {
    version = "3.15.5"
    edition = JooqEdition.OSS
    configurations {
        main {
            generationTool {
                generator {
                    name = 'org.jooq.codegen.KotlinGenerator'
                    strategy {
                        name = 'org.jooq.codegen.DefaultGeneratorStrategy'
                    }
                    generate {
                        relations = true
                        deprecated = false
                        records = true
                        immutablePojos = true
                        fluentSetters = true
                        daos = false
                        pojosEqualsAndHashCode = true
                        javaTimeTypes = true
                    }
                    target {
                        packageName = 'de.project.service.jooq'
                    }
                    database {
                        name = 'org.jooq.meta.extensions.ddl.DDLDatabase'
                        properties {
                            property {
                                key = 'scripts'
                                value = 'src/main/resources/db/migration/*.sql'
                            }
                            property {
                                key = 'sort'
                                value = 'semantic'
                            }
                            property {
                                key = 'unqualifiedSchema'
                                value = 'none'
                            }
                            property {
                                key = 'defaultNameCase'
                                value = 'lower'
                            }
                        }
                    }
                }
            }
        }
    }
}
You probably have the following configuration set:
<property>
<key>defaultNameCase</key>
<value>lower</value>
</property>
In jOOQ 3.15, this transforms all identifiers to lower case and quotes them before handing the SQL statement to H2 behind the scenes for DDL simulation, in order to emulate e.g. PostgreSQL behaviour, where unquoted identifiers are lower case, not upper case as in many other RDBMS.
There's a bug in the current implementation, which also quotes built-in functions, not just user defined objects. See:
https://github.com/jOOQ/jOOQ/issues/9931 (general problem related to "system names")
https://github.com/jOOQ/jOOQ/issues/12752 (DDLDatabase specific problem)
The only workaround I can think of would be to turn off that property again and manually quote all identifiers in lower case, as sketched below. Alternatively, instead of using the DDLDatabase, you can always connect to an actual database, e.g. by using Testcontainers. That will be much more robust than the DDLDatabase in many ways, anyway.
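For illustration, here is what that first workaround would look like for the view from the question: a sketch assuming the defaultNameCase property is removed from the configuration, with the user-defined identifiers quoted in lower case by hand while the built-in functions stay unquoted so H2 can resolve them.
DROP VIEW IF EXISTS "view1";
CREATE VIEW "view1" AS
SELECT COALESCE(SUM("table1"."col1"), 0) AS "sum"
FROM "table1";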
In any case, this is quite a frequent problem, so I've fixed it for the upcoming jOOQ 3.16. The above setting will no longer quote "system names", which are well-known identifiers of built-in functions.

Unable to load data into parquet file format?

I am trying to parse log data into Parquet file format in Hive; the separator used is "||-||".
A sample row is:
"b8905bfc-dc34-463e-a6ac-879e50c2e630||-||syntrans1||-||CitBook"
After performing the data staging I am able to get the result
"b8905bfc-dc34-463e-a6ac-879e50c2e630 syntrans1 CitBook ".
While converting the data to Parquet file format I got this error:
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2185)
at org.apache.hadoop.hive.ql.plan.PartitionDesc.getDeserializer(PartitionDesc.java:137)
at org.apache.hadoop.hive.ql.exec.MapOperator.getConvertedOI(MapOperator.java:297)
... 24 more
This is what I have tried:
create table log (a String, b String, c String)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe'
WITH SERDEPROPERTIES (
    "field.delim"="||-||",
    "collection.delim"="-",
    "mapkey.delim"="#"
);

create table log_par(
    a String,
    b String,
    c String
) stored as PARQUET;

insert into log_par select * from log;
Aman kumar,
To resolve this issue, run the Hive query after adding the following jar:
hive> add jar hive-contrib.jar;
To add the jar permanently, do the following:
1. On the Hive Server host, create a /usr/hdp//hive/auxlib directory.
2. Copy /usr/hdp//hive/lib/hive-contrib-.jar to /usr/hdp//hive/auxlib.
3. Restart the HS2 server.
For further reference, see:
https://community.hortonworks.com/content/supportkb/150175/errororgapachehadoophivecontribserde2multidelimits.html
https://community.hortonworks.com/questions/79075/loading-data-to-hive-via-pig-orgapachehadoophiveco.html
Let me know if you face any issues.
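Putting the pieces together, a session sketch; the jar path is an assumption for an HDP install, so adjust it to your environment. Note the insert reads from the staging table log into the Parquet table log_par:
-- register the contrib SerDe jar for this session (path is an assumption)
hive> add jar /usr/hdp/current/hive-client/lib/hive-contrib.jar;
-- load from the staging table into the Parquet-backed table
hive> insert into log_par select * from log;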

HbaseStorageHandler plugin in Drill

I am able to query Hive and HBase individually using Drill. Now I am trying to query HBaseStorageHandler-type tables in Hive. For this, I added these properties to the Hive storage plugin in Drill:
{
  "type": "hive",
  "enabled": true,
  "configProps": {
    "hive.metastore.uris": "thrift://trinitybdClusterM02.trinitymobility.local:9083",
    "javax.jdo.option.ConnectionURL": "jdbc:mysql://localhost:3306/metastore?createDatabaseIfNotExist=true",
    "hive.metastore.warehouse.dir": "/tmp/drill_hive_wh",
    "fs.default.name": "hdfs://trinitybdClusterM02.trinitymobility.local:9000",
    "hive.metastore.sasl.enabled": "false",
    "hbase.zookeeper.quorum": "localhost",
    "hbase.zookeeper.property.clientPort": "2181"
  }
}
I tried to query like this:
0: jdbc:drill:zk=localhost> use hive.test;
0: jdbc:drill:zk=localhost> select * from twitter_test_nlp limit 1;
It gives this error:
Error: SYSTEM ERROR: NoSuchMethodError: org.apache.hadoop.hbase.client.Scan.setAttribute(Ljava/lang/String;[B)V
Fragment 0:0
[Error Id: fc3994f4-7d7e-475e-870b-259ac91ea81a on trinitybdClusterM02.trinitymobility.local:31010] (state=,code=0)
If anybody is using this setup, please share which properties I have to add to query HBaseStorageHandler tables of Hive.
This problem is resolved in Drill 1.9. Drill 1.9 directly supports HBaseStorageHandler tables (Hive and HBase integrated tables) through the Hive storage plugin, and it also directly supports spatial queries such as st_contains(). So if you have these kinds of requirements, use Drill 1.9.0.
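If that is right, the exact queries from the question should then work unchanged on Drill 1.9, with no extra plugin properties (a sketch based on the answer's claim, not verified here):
0: jdbc:drill:zk=localhost> use hive.test;
0: jdbc:drill:zk=localhost> select * from twitter_test_nlp limit 1;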

SPARK SQL (1.5.1) connect to Oracle and write to Avro

I am using spark-sql to connect to an Oracle database and getting the data as DataFrames. I would like to write this retrieved data to an Avro file. While writing to Avro I am seeing multiple issues; could you help?
Here is the code -
// import needed for the df.write.avro(...) syntax mentioned below
import com.databricks.spark.avro._

val df = sqlContext.read.format("jdbc")
  .options(Map(
    "driver" -> "oracle.jdbc.driver.OracleDriver",
    "url" -> "jdbc:oracle:thin:user/password@host/service",
    "numPartitions" -> "1",
    "dbtable" -> "(Select * from schema.table WHERE STAGE_NUM <= 39 and guid='I284ba1f9cdba11dea82ab9f4ee295c21')"
  ))
  .load()

df.write.format("com.databricks.spark.avro").save("Outputfile")
Dependencies that are there in my project -
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.10</artifactId>
  <version>1.5.1</version>
</dependency>
<dependency>
  <groupId>com.databricks</groupId>
  <artifactId>spark-avro_2.10</artifactId>
  <version>2.0.1</version>
</dependency>
<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro</artifactId>
  <version>1.7.7</version>
</dependency>
<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-mapred</artifactId>
  <version>1.7.7</version>
</dependency>
Here is the exception information -
java.lang.RuntimeException: com.databricks.spark.avro.DefaultSource does not allow create table as select
If I use df.write.avro("headnotes") instead, I get the following exception:
java.lang.IllegalAccessError: tried to access class org.apache.avro.SchemaBuilder$FieldDefault from class com.databricks.spark.avro.SchemaConverters$$anonfun$convertStructToAvro$1

Need javax.jdo.option.ConnectionURL for cassandra

Are the below properties in hive-site.xml correct for Hive access to Cassandra?
(I have copied the entire hive-default.xml content but have changed only the properties below.)
javax.jdo.option.ConnectionURL: cassandra://localhost:9160
javax.jdo.option.ConnectionDriverName: org.apache.cassandra.cql.jdbc.CassandraDriver
hive.stats.dbclass: jdbc:cassandra
hive.stats.jdbcdriver: org.apache.cassandra.cql.jdbc.CassandraDriver
hive.stats.dbconnectionstring: jdbc:cassandra:;databaseName=TempStatsStore;create=true
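Spelled out in hive-site.xml syntax, those settings would look like this (a direct rendering of the values above, not a claim that they are correct):
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>cassandra://localhost:9160</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>org.apache.cassandra.cql.jdbc.CassandraDriver</value>
</property>
<property>
  <name>hive.stats.dbclass</name>
  <value>jdbc:cassandra</value>
</property>
<property>
  <name>hive.stats.jdbcdriver</name>
  <value>org.apache.cassandra.cql.jdbc.CassandraDriver</value>
</property>
<property>
  <name>hive.stats.dbconnectionstring</name>
  <value>jdbc:cassandra:;databaseName=TempStatsStore;create=true</value>
</property>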
I am running a 1-node Cassandra, but would later make it a cluster of at least 2 nodes.
When I run the below table creation command I get an error:
CREATE EXTERNAL TABLE MyHiveTable
(m string, n string, o string, p string)
STORED BY 'org.apache.hadoop.hive.cassandra.cql3.CqlStorageHandler'
TBLPROPERTIES ( "cassandra.ks.name" = "cql3ks",
"cassandra.cf.name" = "test",
"cassandra.cql3.type" = "text, text, text, text");
Error:
FAILED: Error in metadata: javax.jdo.JDOFatalInternalException: Error creating transactional connection factory
NestedThrowables:
java.lang.reflect.InvocationTargetException
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
I don't know about the JDO settings, but you could try this link, which is a far better option for integrating Hive with Cassandra:
https://github.com/milliondreams/hive/tree/cas-support-cql/cassandra-handler