Apache Drill 1.2 and Oracle JDBC

Using Apache Drill v1.2 and Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit in embedded mode.
I'm curious if anyone has had any success connecting Apache Drill to an Oracle DB. I've updated drill-override.conf with the following configuration (per the documentation):
drill.exec: {
  cluster-id: "drillbits1",
  zk.connect: "localhost:2181",
  drill.exec.sys.store.provider.local.path = "/mypath"
}
and placed the ojdbc6.jar in \apache-drill-1.2.0\jars\3rdparty. I can successfully create the storage plug-in:
{
  "type": "jdbc",
  "driver": "oracle.jdbc.driver.OracleDriver",
  "url": "jdbc:oracle:thin:@<IP>:<PORT>:<SID>",
  "username": "USERNAME",
  "password": "PASSWORD",
  "enabled": true
}
but when I issue a query such as:
select * from <storage_name>.<schema_name>.`dual`;
I get the following error:
Query Failed: An Error Occurred
org.apache.drill.common.exceptions.UserRemoteException: VALIDATION ERROR: From line 1, column 15 to line 1, column 20: Table '<storage_name>.<schema_name>.dual' not found [Error Id: 57a4153c-6378-4026-b90c-9bb727e131ae on <computer_name>:<PORT>].
I've tried to query other schemas/tables and get a similar result. I've also tried connecting to Teradata and get the same error. Does anyone have suggestions, or has anyone run into similar issues?

It's working with Drill 1.3 (released on 23-Dec-2015)
Plugin name: oracle
{
  "type": "jdbc",
  "driver": "oracle.jdbc.driver.OracleDriver",
  "url": "jdbc:oracle:thin:user/password@192.xxx.xxx.xxx:1521:orcl",
  "enabled": true
}
Query:
select * from <plugin-name>.<user-name>.<table-name>;
Example:
select * from oracle.USER.SAMPLE;
Check Drill's documentation for more details.
Note: Make sure you add ojdbc7.12.1.0.2.jar (the version recommended in the docs) to apache-drill-1.3.0/jars/3rdparty.

It kind of works in Apache Drill 1.3.
The strange thing is that I can only query the tables for which synonyms have been created...
In the command line try:
use <storage_name>;
show tables;
This will give you a list of objects that you can query - dual is not on that list ;-).

I'm using apache-drill-1.9.0 and it seems that the schema name is interpreted case-sensitively and must therefore be in upper case.
For a table user1.my_tab (which Oracle creates in upper case by default),
this works in Drill (the plugin name is oracle):
SELECT * FROM oracle.USER1.my_tab;
But this triggers an error
SELECT * FROM oracle.user1.my_tab;
SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: Table 'oracle.user1.my_tab' not found
An alternative approach is to set the plugin name and the schema name with USE (the owner must be in upper case here as well):
0: jdbc:drill:zk=local> use oracle.USER1;
+-------+-------------------------------------------+
|  ok   |                  summary                  |
+-------+-------------------------------------------+
| true  | Default schema changed to [oracle.USER1]  |
+-------+-------------------------------------------+
1 row selected (0,169 seconds)
0: jdbc:drill:zk=local> select * from my_tab;
+------+
|  X   |
+------+
| 1.0  |
| 1.0  |
+------+
2 rows selected (0,151 seconds)

Related

Numeric Timestamp in Kafka Oracle JDBC Connector

I'm currently trying to set up a JDBC connector to read data from an Oracle DB and push it to a Kafka topic using Kafka Connect. I wanted to use the "timestamp" mode:
timestamp: use a timestamp (or timestamp-like) column to detect new and modified rows. This assumes the column is updated with each write, and that values are monotonically incrementing, but not necessarily unique.
https://docs.confluent.io/kafka-connectors/jdbc/current/source-connector/source_config_options.html#mode
{
  "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
  "dialect.name": "OracleDatabaseDialect",
  "connection.url": "XXX",
  "connection.user": "XXX",
  "connection.password": "XXX",
  "mode": "timestamp",
  "quote.sql.identifiers": "never",
  "timestamp.column.name": "LAST_UPDATE_DATE",
  "query": "select a1.ID, a1.LAST_UPDATE_DATE, b1.CODE
            from a1
            left join b1 on a1.ID = b1.ID
            ...
}
My problem is that the timestamp column in the database is defined as NUMBER(15), e.g. 20221220145930000. The connector appends a WHERE clause to the end of my defined query, like
where a1.LAST_UPDATE_DATE > :1 and a1.LAST_UPDATE_DATE < :2 order by a1.LAST_UPDATE_DATE asc
This leads to an error message: ORA-00932: inconsistent datatypes: expected NUMBER got TIMESTAMP
Unfortunately, the database is not under my control (proprietary software). I have only read permissions.
Is there a way to set the timestamp type in this connector? I already tried using the to_timestamp() function directly in the SQL statement and an SMT (TimestampConverter), without success.
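For illustration, a subquery-based variant of that to_timestamp() attempt might look like the sketch below. This is only a sketch: the LAST_UPDATE_TS alias is hypothetical, the format mask assumes values like 20221220145930000, and it is not verified against this connector version. The idea is that, with timestamp.column.name pointed at the alias, the WHERE/ORDER BY clause the connector appends operates on a real TIMESTAMP.
-- Sketch only: LAST_UPDATE_TS is a hypothetical alias; the format mask matches
-- values such as 20221220145930000. The connector's appended WHERE/ORDER BY would
-- then hit the converted TIMESTAMP column of the inline view.
SELECT *
FROM (
    SELECT a1.ID,
           TO_TIMESTAMP(TO_CHAR(a1.LAST_UPDATE_DATE), 'YYYYMMDDHH24MISSFF3') AS LAST_UPDATE_TS,
           b1.CODE
    FROM a1
    LEFT JOIN b1 ON a1.ID = b1.ID
)
-- with "timestamp.column.name" set to "LAST_UPDATE_TS" in the connector config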

Error using Polybase to load Parquet file: class java.lang.Integer cannot be cast to class parquet.io.api.Binary

I have a snappy.parquet file with a schema like this:
{
  "type": "struct",
  "fields": [{
      "name": "MyTinyInt",
      "type": "byte",
      "nullable": true,
      "metadata": {}
    }
    ...
  ]
}
Update: parquet-tools reveals this:
############ Column(MyTinyInt) ############
name: MyTinyInt
path: MyTinyInt
max_definition_level: 1
max_repetition_level: 0
physical_type: INT32
logical_type: Int(bitWidth=8, isSigned=true)
converted_type (legacy): INT_8
When I try and run a stored procedure in Azure Data Studio to load this into an external staging table with PolyBase I get the error:
11:16:21 Started executing query at Line 113
Msg 106000, Level 16, State 1, Line 1
HdfsBridge::recordReaderFillBuffer - Unexpected error encountered filling record reader buffer: ClassCastException: class java.lang.Integer cannot be cast to class parquet.io.api.Binary (java.lang.Integer is in module java.base of loader 'bootstrap'; parquet.io.api.Binary is in unnamed module of loader 'app')
The load into the external table works fine with only varchars
CREATE EXTERNAL TABLE [domain].[TempTable]
(
    ...
    MyTinyInt tinyint NULL,
    ...
)
WITH
(
    LOCATION = ''' + @Location + ''',
    DATA_SOURCE = datalake,
    FILE_FORMAT = parquet_snappy
)
The data will eventually be merged into a Data Warehouse Synapse table. In that table the column will have to be of type tinyint.
I had the same issue and a good support plan in Azure, so I got an answer from Microsoft:
There is a known bug in ADF for this particular scenario: the date type in Parquet should be mapped as data type date in SQL Server; however, ADF incorrectly converts this type to Datetime2, which causes a conflict in PolyBase. I have confirmation from the core engineering team that this will be rectified with a fix by the end of November and will be published directly into the ADF product.
In the meantime, as a workaround:
Create the target table with data type DATE as opposed to DATETIME2
Configure the Copy Activity Sink settings to use Copy Command as opposed to PolyBase
But even the Copy command doesn't work for me, so the only remaining workaround is to use Bulk insert; Bulk insert is extremely slow, though, and would be a problem on big datasets.
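For reference, the first workaround step above might look like the sketch below; the table and column names are hypothetical, and it only illustrates declaring the affected column as DATE instead of DATETIME2.
-- Sketch only: hypothetical target table in which the Parquet date column is
-- declared as DATE rather than DATETIME2.
CREATE TABLE [dbo].[StagingTarget]
(
    Id       int  NULL,
    LoadDate date NULL
);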

Issues using Kafka KSQL AVRO table as a source for a Kafka Connect JDBC Sink

I've been struggling with this for about a week now, trying to get a simple (3 fields) AVRO-formatted KSQL table working as a source for a JDBC connector sink (MySQL).
I am getting the following errors (after the INFO line):
[2018-12-11 18:58:50,678] INFO Setting metadata for table "DSB_ERROR_TABLE_WINDOWED" to Table{name='"DSB_ERROR_TABLE_WINDOWED"', columns=[Column{'MOD_CLASS', isPrimaryKey=false, allowsNull=true, sqlType=VARCHAR}, Column{'METHOD', isPrimaryKey=false, allowsNull=true, sqlType=VARCHAR}, Column{'COUNT', isPrimaryKey=false, allowsNull=true, sqlType=BIGINT}]} (io.confluent.connect.jdbc.util.TableDefinitions)
[2018-12-11 18:58:50,679] ERROR WorkerSinkTask{id=dev-dsb-errors-mysql-sink-0} Task threw an uncaught and unrecoverable exception. Task is being killed and will not recover until manually restarted. (org.apache.kafka.connect.runtime.WorkerSinkTask)
org.apache.kafka.connect.errors.ConnectException: No fields found using key and value schemas for table: DSB_ERROR_TABLE_WINDOWED
at io.confluent.connect.jdbc.sink.metadata.FieldsMetadata.extract(FieldsMetadata.java:127)
at io.confluent.connect.jdbc.sink.metadata.FieldsMetadata.extract(FieldsMetadata.java:64)
at io.confluent.connect.jdbc.sink.BufferedRecords.add(BufferedRecords.java:79)
at io.confluent.connect.jdbc.sink.BufferedRecords.add(BufferedRecords.java:124)
at io.confluent.connect.jdbc.sink.JdbcDbWriter.write(JdbcDbWriter.java:63)
at io.confluent.connect.jdbc.sink.JdbcSinkTask.put(JdbcSinkTask.java:75)
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:564)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:322)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:225)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:193)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:175)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:219)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
I can tell that the sink is doing something properly as the schema is pulled (see just before the error above) and the table is created successfully in the database with the proper schema:
MariaDB [dsb_errors_ksql]> describe DSB_ERROR_TABLE_WINDOWED;
+-----------+--------------+------+-----+---------+-------+
| Field     | Type         | Null | Key | Default | Extra |
+-----------+--------------+------+-----+---------+-------+
| MOD_CLASS | varchar(256) | YES  |     | NULL    |       |
| METHOD    | varchar(256) | YES  |     | NULL    |       |
| COUNT     | bigint(20)   | YES  |     | NULL    |       |
+-----------+--------------+------+-----+---------+-------+
3 rows in set (0.01 sec)
And here is the KTABLE definition:
ksql> describe extended DSB_ERROR_TABLE_windowed;
Name : DSB_ERROR_TABLE_WINDOWED
Type : TABLE
Key field : KSQL_INTERNAL_COL_0|+|KSQL_INTERNAL_COL_1
Key format : STRING
Timestamp field : Not set - using <ROWTIME>
Value format : AVRO
Kafka topic : DSB_ERROR_TABLE_WINDOWED (partitions: 4, replication: 1)
 Field     | Type
---------------------------------------
 ROWTIME   | BIGINT           (system)
 ROWKEY    | VARCHAR(STRING)  (system)
 MOD_CLASS | VARCHAR(STRING)
 METHOD    | VARCHAR(STRING)
 COUNT     | BIGINT
---------------------------------------
Queries that write into this TABLE
-----------------------------------
CTAS_DSB_ERROR_TABLE_WINDOWED_37 : create table DSB_ERROR_TABLE_windowed with (value_format='avro') as select mod_class, method, count(*) as count from DSB_ERROR_STREAM window session ( 60 seconds) group by mod_class, method having count(*) > 0;
There is an entry auto generated in the schema registry for this table (but no key entry):
{
  "subject": "DSB_ERROR_TABLE_WINDOWED-value",
  "version": 7,
  "id": 143,
  "schema": "{\"type\":\"record\",\"name\":\"KsqlDataSourceSchema\",\"namespace\":\"io.confluent.ksql.avro_schemas\",\"fields\":[{\"name\":\"MOD_CLASS\",\"type\":[\"null\",\"string\"],\"default\":null},{\"name\":\"METHOD\",\"type\":[\"null\",\"string\"],\"default\":null},{\"name\":\"COUNT\",\"type\":[\"null\",\"long\"],\"default\":null}]}"
}
and here is the Connect Worker definition:
{ "name": "dev-dsb-errors-mysql-sink",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"topics": "DSB_ERROR_TABLE_WINDOWED",
"connection.url": "jdbc:mysql://os-compute-d01.maeagle.corp:32692/dsb_errors_ksql?user=xxxxxx&password=xxxxxx",
"auto.create": "true",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url": "http://kafka-d01.maeagle.corp:8081",
"key.converter": "org.apache.kafka.connect.storage.StringConverter"
}
}
My understanding (which could be wrong) is that KSQL should be creating the appropriate AVRO schemas in the Schema Registry and Kafka Connect should be able to read those back properly. As I noted above, something is working, as the appropriate table is being generated in MySQL, although I am surprised that there is no key field created...
Most of the posts and examples are using JSON as opposed to AVRO so they haven't been particularly useful.
It seems to fail in the deserialization portion of reading and inserting the topic record...
I am at a loss at this point and could use some guidance.
I have also opened a similar ticket on GitHub:
https://github.com/confluentinc/ksql/issues/2250
Regards,
--John
As John says above, the key of the topic's records is not a plain string, but a string postfixed with a single Java-serialized 64-bit integer representing the window start time.
Connect does not come with an SMT that can handle the windowed key format. However, it would be possible to write one that strips off the integer and returns just the natural key. You could then include it on the classpath and update your Connect config.
If you require the window start time in the database, then you can update your ksqlDB query to include the window start time as a field in the value.
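For the second option, a sketch of such a query is shown below. It assumes a KSQL version that supports WindowStart() in windowed aggregations; otherwise it mirrors the CTAS statement from the question.
-- Sketch only: same aggregation as the original CTAS, with the window start time
-- added to the value (assumes WindowStart() is available in this KSQL version).
CREATE TABLE DSB_ERROR_TABLE_WINDOWED WITH (value_format='avro') AS
  SELECT mod_class,
         method,
         WindowStart() AS window_start_ms,
         COUNT(*) AS count
  FROM DSB_ERROR_STREAM
  WINDOW SESSION (60 SECONDS)
  GROUP BY mod_class, method
  HAVING COUNT(*) > 0;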

Sonar 5.6 : cannot update issue

I am trying to update an issue in the UI (assign / set severity / open), but nothing happens.
When I look at the network exchange I see a 404:
{"errors":[{"msg":"Issue with key '76b53a17-fa8f-4d04-b999-1fd5e401fee0' does not exist"}]}
But I can find the issue in my database (MySQL):
mysql> select kee from issues where kee='76b53a17-fa8f-4d04-b999-1fd5e401fee0';
+--------------------------------------+
| kee                                  |
+--------------------------------------+
| 76b53a17-fa8f-4d04-b999-1fd5e401fee0 |
+--------------------------------------+
1 row in set (0.00 sec)
We tried to find the query executed by Sonar. We only found it in the head version of Sonar (IssueFinder and the iBatis config), and it works:
select i.id,
i.kee as kee,
i.rule_id as ruleId,
i.severity as severity,
i.manual_severity as manualSeverity,
i.message as message,
i.line as line,
i.locations as locations,
i.gap as gap,
i.effort as effort,
i.status as status,
i.resolution as resolution,
i.checksum as checksum,
i.assignee as assignee,
i.author_login as authorLogin,
i.tags as tagsString,
i.issue_attributes as issueAttributes,
i.issue_creation_date as issueCreationTime,
i.issue_update_date as issueUpdateTime,
i.issue_close_date as issueCloseTime,
i.created_at as createdAt,
i.updated_at as updatedAt,
r.plugin_rule_key as ruleKey,
r.plugin_name as ruleRepo,
r.language as language,
p.kee as componentKey,
i.component_uuid as componentUuid,
p.module_uuid as moduleUuid,
p.module_uuid_path as moduleUuidPath,
p.path as filePath,
root.kee as projectKey,
i.project_uuid as projectUuid,
i.issue_type as type
from issues i
inner join rules r on r.id=i.rule_id
inner join projects p on p.uuid=i.component_uuid
inner join projects root on root.uuid=i.project_uuid
where i.kee='76b53a17-fa8f-4d04-b999-1fd5e401fee0';
It returns one row.
What can I do? Is it a bug?
The ES folder is probably corrupted. Here are the steps to clean it up:
Stop the SonarQube server
Remove the {SONARQUBE_INSTALLATION}/data/es folder
Restart the server

HbaseStorageHandler plugin in Drill

I am able to query Hive and HBase individually using Drill. Now I am trying to query HBaseStorageHandler-type tables in Hive. For this, I added these properties to Drill's Hive storage plugin:
{
  "type": "hive",
  "enabled": true,
  "configProps": {
    "hive.metastore.uris": "thrift://trinitybdClusterM02.trinitymobility.local:9083",
    "javax.jdo.option.ConnectionURL": "jdbc:mysql://localhost:3306/metastore?createDatabaseIfNotExist=true",
    "hive.metastore.warehouse.dir": "/tmp/drill_hive_wh",
    "fs.default.name": "hdfs://trinitybdClusterM02.trinitymobility.local:9000",
    "hive.metastore.sasl.enabled": "false",
    "hbase.zookeeper.quorum": "localhost",
    "hbase.zookeeper.property.clientPort": "2181"
  }
}
I tried to query it like this:
0: jdbc:drill:zk=localhost> use hive.test;
0: jdbc:drill:zk=localhost> select * from twitter_test_nlp limit 1;
It gives the following error:
Error: SYSTEM ERROR: NoSuchMethodError: org.apache.hadoop.hbase.client.Scan.setAttribute(Ljava/lang/String;[B)V
Fragment 0:0
[Error Id: fc3994f4-7d7e-475e-870b-259ac91ea81a on trinitybdClusterM02.trinitymobility.local:31010] (state=,code=0)
If anybody is using this setup, please share which properties I have to add to query HBaseStorageHandler tables of Hive.
In Drill 1.9 this problem has been resolved. Drill 1.9 directly supports HBaseStorageHandler tables (Hive and HBase integrated tables) through the Hive storage plugin, and it also directly supports spatial queries such as st_contains(). So if anybody needs these kinds of capabilities, use Drill 1.9.0.
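For illustration, the kind of spatial predicate mentioned above might look like the sketch below; the table, columns, and coordinates are made up, and it assumes the GIS functions referenced above (st_contains() and friends) are available in your Drill build.
-- Hypothetical example only: made-up table, columns, and polygon; assumes
-- st_contains, st_geomfromtext, and st_point are available.
SELECT name
FROM hive.test.`places`
WHERE ST_Contains(
        ST_GeomFromText('POLYGON((77.5 12.9, 77.7 12.9, 77.7 13.1, 77.5 13.1, 77.5 12.9))'),
        ST_Point(longitude, latitude));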
