How to set the key of the JDBC source connector (Kafka)? - apache-kafka-connect

I'm reading data from a mysql database table using a Kafka Source JDBC connector and publishing it to the topic test-mysql-petai.
The database table has 2 fields, where id is the primary key:
+-------+-------------+------+-----+---------+----------------+
| Field | Type        | Null | Key | Default | Extra          |
+-------+-------------+------+-----+---------+----------------+
| id    | int(11)     | NO   | PRI | NULL    | auto_increment |
| name  | varchar(20) | YES  |     | NULL    |                |
+-------+-------------+------+-----+---------+----------------+
I need the value of the id field to be the key of the topic messages. I tried adding transformations to the JDBC connector properties.
JDBCConnector.properties:
name=jdbc-source-connector
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
tasks.max=1
connection.url=jdbc:mysql://127.0.0.1:3306/test?user=dins&password=pw&serverTimezone=UTC
table.whitelist=petai
mode=incrementing
incrementing.column.name=id
schema.pattern=""
transforms=createKey,extractInt
transforms.createKey.type=org.apache.kafka.connect.transforms.ValueToKey
transforms.createKey.fields=id
transforms.extractInt.type=org.apache.kafka.connect.transforms.ExtractField$Key
transforms.extractInt.field=id
topic.prefix=test-mysql-jdbc-
But when I read the keys and values using a consumer, I get the following:
Key = {"schema":{"type":"int32","optional":false},"payload":61}
Value ={"id":61,"name":"ttt"}
I need to get the following:
Key = 61
Value ={"id":61,"name":"ttt"}
What am I doing wrong? Any help is appreciated.
Thank you.

If you don't want to include a schema with the keys, you can tell Kafka Connect by setting key.converter.schemas.enable=false.
For a detailed explanation, see Kafka Connect Deep Dive – Converters and Serialization Explained by Robin Moffatt.
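For example, a minimal sketch assuming the JSON converter is used for keys (these lines are not from the question; they would go in the worker properties, or in the connector configuration to override the worker defaults):
key.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=false
With the key schema dropped, the consumer should see Key = 61 instead of the schema-wrapped JSON object.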

Related

Unable to query/select data those inserted through Spark SQL

I am trying to insert data into a Hive Managed table that has a partition.
Show create table output for reference.
CREATE TABLE `part_test08`(
  `id` string,
  `name` string,
  `baseamount` double,
  `billtoaccid` string,
  `extendedamount` double,
  `netamount` decimal(19,5),
  `netunitamount` decimal(19,5),
  `pricingdate` timestamp,
  `quantity` int,
  `invoiceid` string,
  `shiptoaccid` string,
  `soldtoaccid` string,
  `ingested_on` timestamp,
  `external_id` string)
PARTITIONED BY (
  `productid` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'wasb://blobrootpath/hive/warehouse/db_103.db/part_test08'
TBLPROPERTIES (
  'bucketing_version'='2',
  'transactional'='true',
  'transactional_properties'='default',
  'transient_lastDdlTime'='1549962363')
I'm trying to execute the following SQL statement to insert records into the partitioned table:
sparkSession.sql("INSERT INTO TABLE db_103.part_test08 PARTITION(ProductId) SELECT reflect('java.util.UUID', 'randomUUID'),stg_name,stg_baseamount,stg_billtoaccid,stg_extendedamount,stg_netamount,stg_netunitamount,stg_pricingdate,stg_quantity,stg_invoiceid,stg_shiptoaccid,stg_soldtoaccid,'2019-02-12 09:06:07.566',stg_id,stg_ProductId FROM tmp_table WHERE part_id IS NULL");
Without the insert statement, running just the select query returns the data below.
+-----------------------------------+--------+--------------+--------------------+------------------+-------------+-----------------+-------------------+------------+-------------+--------------------+--------------------+-----------------------+------+-------------+
|reflect(java.util.UUID, randomUUID)|stg_name|stg_baseamount| stg_billtoaccid|stg_extendedamount|stg_netamount|stg_netunitamount| stg_pricingdate|stg_quantity|stg_invoiceid| stg_shiptoaccid| stg_soldtoaccid|2019-02-12 09:06:07.566|stg_id|stg_ProductId|
+-----------------------------------+--------+--------------+--------------------+------------------+-------------+-----------------+-------------------+------------+-------------+--------------------+--------------------+-----------------------+------+-------------+
| 4e0b4331-b551-42d...| OLI6| 16.0|2DD4E682-6B4F-E81...| 34.567| 1166.74380| 916.78000|2018-10-18 05:06:22| 13| I1|2DD4E682-6B4F-E81...|2DD4E682-6B4F-E81...| 2019-02-12 09:06:...| 6| P3|
| 8b327a8e-dd3c-445...| OLI7| 16.0|2DD4E682-6B4F-E81...| 34.567| 766.74380| 1016.78000|2018-10-18 05:06:22| 13| I6|2DD4E682-6B4F-E81...|2DD4E682-6B4F-E81...| 2019-02-12 09:06:...| 7| P4|
| c0e14b9a-8d1a-426...| OLI5| 14.6555| null| 34.56| 500.87000| 814.65000|2018-10-11 05:06:22| 45| I4|29B73C4E-846B-E71...|29B73C4E-846B-E71...| 2019-02-12 09:06:...| 5| P1|
+-----------------------------------+--------+--------------+--------------------+------------------+-------------+-----------------+-------------------+------------+-------------+--------------------+--------------------+-----------------------+------+-------------+
Earlier I was getting an error while inserting into the managed table. After restarting the Hive & Thrift services the job now runs without errors, but I am not able to see the inserted data when running a select query through beeline or from the program. I can see that the partition with delta files was written under hive/warehouse (screenshot omitted here).
Also, I can see the following warning; I'm not sure whether it is related to the problem:
Cannot get ACID state for db_103.part_test08 from null
One more note: if I use an external table instead, it works fine and I can view the data as well.
We are using an Azure HDInsight Spark 2.3 (HDI 4.0 Preview) cluster with the following service stack:
HDFS: 3.1.1
Hive: 3.1.0
Spark2: 2.3.1
Have you added the SET commands below while trying to insert the data?
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
SET hive.support.concurrency=true;
SET hive.enforce.bucketing=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
I have faced a similar issue where I was not allowed to do any read/write operations; after adding the above properties I was able to query the table.
Since you have not faced any issue with the external table, I'm not sure whether this will solve your problem.
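For reference, a rough sketch of applying those properties from the same Spark session used in the question (this only mirrors the SET commands above; whether Spark 2.3 fully honors them for ACID writes is a separate question):
sparkSession.sql("SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager");
sparkSession.sql("SET hive.support.concurrency=true");
sparkSession.sql("SET hive.enforce.bucketing=true");
sparkSession.sql("SET hive.exec.dynamic.partition.mode=nonstrict");
// then run the INSERT ... PARTITION(ProductId) statement from the question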

Can't load data into a Hive table created with PARTITIONED BY; fails with 'hive.PARTITIONS doesn't exist'

When I load data into Hive, it fails with the error 'hive.PARTITIONS doesn't exist'.
CentOS 7.2
Hive 0.12
Hadoop 2.7.6
I created the table as an external table (the metastore is in MySQL). The details:
create external table people(id int,name string)
partitioned by (logdate string,hour string) row format delimited
fields terminated by ',';
But it goes wrong when I load data:
java.sql.SQLSyntaxErrorException: Table 'hive.PARTITIONS' doesn't exist
So I checked the databases and tables in MySQL. The hive database exists, but a PARTITIONS table does not.
Then I found a table with a similar name: PARTITION_KEYS.
mysql> show tables;
+---------------------------+
| Tables_in_hive |
+---------------------------+
| BUCKETING_COLS |
| CDS |
| COLUMNS_V2 |
| DATABASE_PARAMS |
| DBS |
| PARTITION_KEYS |
| SDS |
| SD_PARAMS |
Then I described and selected from that table (PARTITION_KEYS):
mysql> describe PARTITION_KEYS;
+--------------+---------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------+---------------+------+-----+---------+-------+
| TBL_ID | bigint(20) | NO | PRI | NULL | |
| PKEY_COMMENT | varchar(4000) | YES | | NULL | |
| PKEY_NAME | varchar(128) | NO | PRI | NULL | |
| PKEY_TYPE | varchar(767) | NO | | NULL | |
| INTEGER_IDX | int(11) | NO | | NULL | |
+--------------+---------------+------+-----+---------+-------+
5 rows in set (0.00 sec)
mysql>
mysql> select * from PARTITION_KEYS;
+--------+--------------+-----------+-----------+-------------+
| TBL_ID | PKEY_COMMENT | PKEY_NAME | PKEY_TYPE | INTEGER_IDX |
+--------+--------------+-----------+-----------+-------------+
| 1 | NULL | hour | string | 1 |
| 1 | NULL | logdate | string | 0 |
| 6 | NULL | hour | string | 1 |
| 6 | NULL | logdate | string | 0 |
+--------+--------------+-----------+-----------+-------------+
4 rows in set (0.00 sec)
My metadata is in it.
But I still can't load data into the Hive table I made. How can I do that?
I also found that I couldn't rename the table to PARTITION, because PARTITION is also a reserved keyword in MySQL.
I need help; it has taken me a lot of time to deal with this. Thanks.
---------- update: my hive-site.xml ----------
This post can't hold more than 30,000 characters, and I just use the default configuration (cp -r hive-default.xml hive-site.xml), so I only made the following changes to that default file:
<property>
<name>hive.metastore.schema.verification</name>
<value>false</value>
</property>
<property>
<name>hive.server2.thrift.sasl.qop</name>
<value>auth</value>
<description>Sasl QOP value; Set it to one of following values to enable higher levels of
protection for hive server2 communication with clients.
"auth" - authentication only (default)
"auth-int" - authentication plus integrity protection
"auth-conf" - authentication plus integrity and confidentiality protection
This is applicable only hive server2 is configured to use kerberos authentication.
</description>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true&useSSL=false</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
The only other changes are the connection user name and password; nothing else in hive-site.xml was touched.

Hive: Error while fetching data

I queried Hive with the following:
select * from some-table where yyyy = 2018 and mm = 01 and dd = 05
The query ran successfully.
After adding one more filter on a string column, the following error is generated:
java.io.IOException:java.lang.ClassCastException:
org.apache.hadoop.hive.serde2.io.DateWritable cannot be cast to
org.apache.hadoop.io.Text
The error is generated by the serializer/deserializer (SerDe).
Root cause: when you created the table, you probably didn't define the STORED AS clause. Describe your table using desc <table name> and you may see something like this:
| # Storage Information | NULL | NULL |
| SerDe Library: | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | NULL |
| InputFormat: | org.apache.hadoop.hive.ql.io.orc.OrcInputFormat | NULL |
| OutputFormat: | org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat | NULL |
This is not good practice: the table is using the default LazySimpleSerDe even though its input/output formats are ORC. Create the table using STORED AS ORC, then describe it again; the result should look different this time:
| # Storage Information | NULL | NULL |
| SerDe Library: | org.apache.hadoop.hive.ql.io.orc.OrcSerde | NULL |
| InputFormat: | org.apache.hadoop.hive.ql.io.orc.OrcInputFormat | NULL |
| OutputFormat: | org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat | NULL |
Try this and you may be able to resolve the issue.
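For illustration, a minimal DDL sketch with the storage format declared explicitly (the table and column names are placeholders, loosely matching the filters in the question):
CREATE TABLE some_table (
  id STRING,
  name STRING
)
PARTITIONED BY (yyyy INT, mm INT, dd INT)
STORED AS ORC;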

Will Eloquent automatically add this foreign key to the primary key's index?

Let's say I have a parent and a child table. In Laravel, the migration for my orders table would look like this:
public function up()
{
    Schema::create('orders', function (Blueprint $table) {
        $table->integer('customer_id')->unsigned();
        $table->foreign('customer_id')->references('id')->on('customers')->onDelete('cascade');
        $table->increments('id');
        // ...
    });
}
I know Eloquent would consider id to be the primary key of orders, so an index would be automatically created on that primary key.
What should I do to make sure that customer_id is part of the primary key's index, set up in this order:
1. customer_id
2. id
Example of tables
Customer
+-------------+
| id | --> primary key
|- - - - - - -|
| name |
| address |
+-------------+
|
|
|
A
Order
+------------------+
| customer_id (fk) | --- primary key
| id | --- primary key
|- - - - - - - - - |
| date |
+------------------+
Will Eloquent automatically add this foreign key to the primary key's index?
Well, it's not automatic, but it's very easy.
To specify a custom primary key, you can call the primary() method of the Blueprint class through $table, i.e. $table->primary().
For a single primary key, it accepts a string with the name of the column to be made primary.
For a composite key, you can pass an array of the column names to be made primary. In your case:
$table->primary(['id', 'customer_id']);
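A minimal sketch of how that call could sit in the orders migration from the question (increments() is avoided here because it already declares id as a single-column primary key; whether id should still auto-increment is a separate decision):
Schema::create('orders', function (Blueprint $table) {
    // both columns are plain unsigned integers; the composite key is declared explicitly
    $table->integer('customer_id')->unsigned();
    $table->integer('id')->unsigned();
    $table->primary(['id', 'customer_id']);
    $table->foreign('customer_id')->references('id')->on('customers')->onDelete('cascade');
});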
I decided to try this out and see what happens.
Starting with the customers table, I ran this statement...
CREATE TABLE customers (
id INT(11) UNSIGNED NOT NULL AUTO_INCREMENT,
`name` VARCHAR(255),
created_at DATETIME,
updated_at DATETIME,
PRIMARY KEY (id)
);
I then created the following...
CREATE TABLE orders (
id INT(11) UNSIGNED NOT NULL AUTO_INCREMENT,
customer_id INT(11) UNSIGNED,
created_at DATETIME,
updated_at DATETIME,
PRIMARY KEY (id, customer_id)
);
Note that if you use PRIMARY KEY (customer_id, id) here, it results in a SQL error (with InnoDB, the AUTO_INCREMENT column must be the first column of a key). This makes me believe the DB2 functionality you are trying to replicate will not work exactly the same on MySQL, and we actually need a foreign key.
Then after filling these tables with test data, I ran the following...
EXPLAIN SELECT *
FROM customers
INNER JOIN orders ON customers.id = orders.customer_id;
This results in
+------+-------------+-----------+------+---------------+------+---------+------+--------+-------------------------------------------------+
| id | select_type | TABLE | TYPE | possible_keys | KEY | key_len | ref | ROWS | Extra |
+------+-------------+-----------+------+---------------+------+---------+------+--------+-------------------------------------------------+
| 1 | SIMPLE | customers | ALL | PRIMARY | NULL | NULL | NULL | 4 | |
| 1 | SIMPLE | orders | ALL | NULL | NULL | NULL | NULL | 262402 | USING WHERE; USING JOIN buffer (flat, BNL JOIN) |
+------+-------------+-----------+------+---------------+------+---------+------+--------+-------------------------------------------------+
I then added the foreign key...
ALTER TABLE orders ADD FOREIGN KEY customer_id (customer_id) REFERENCES customers (id) ON DELETE CASCADE ON UPDATE CASCADE;
And running the same exact explain query before with the same exact data, I now get the results...
+------+-------------+-----------+------+---------------+-------------+---------+-----------------------+-------+-------+
| id | select_type | TABLE | TYPE | possible_keys | KEY | key_len | ref | ROWS | Extra |
+------+-------------+-----------+------+---------------+-------------+---------+-----------------------+-------+-------+
| 1 | SIMPLE | customers | ALL | PRIMARY | NULL | NULL | NULL | 4 | |
| 1 | SIMPLE | orders | ref | customer_id | customer_id | 4 | cookbook.customers.id | 43751 | |
+------+-------------+-----------+------+---------------+-------------+---------+-----------------------+-------+-------+
As you can see, far fewer rows are examined when I add the foreign key, which is exactly what we are looking for. Surprisingly for me (probably because I'm not a DBA), running the following produces the same results...
EXPLAIN SELECT * FROM orders WHERE customer_id = 4;
Even in this case, the composite primary key isn't doing anything for you; the foreign key, however, is helping immensely.
With all that said, I think it's safe to forgo the composite primary key, set id up as the primary key, and set customer_id up as a foreign key. This also gives you the benefit of being able to cascade deletes and updates.

Relation between 3 Models in Laravel 5.1 ("like many-to-many-through")

I'm developing a simple Schools management app for an exam and have so far built a simple Many-To-Many relation between a School model and the Field model and of course a pivot table.
Now I want to relate my User model to them so that I can express, for example:
{user} is studying {field_of_study} at {school}
In the end I want to be able to query, for example:
how many users are studying at School XY, or
how many are studying Field Z, or
how many are studying Field Z at School XY.
Furthermore I want to be able to query all fields of study for a given school, and vice versa.
My tables so far
Table users:
+------------+---------------------+
| Field | Type |
+------------+---------------------+
| id | bigint(20) unsigned |
| username | varchar(60) |
| password | varchar(60) |
+------------+---------------------+
Table schools:
+------------+---------------------+
| Field | Type |
+------------+---------------------+
| id | bigint(20) unsigned |
| name | varchar(60) |
+------------+---------------------+
Table fields:
+------------+---------------------+
| Field | Type |
+------------+---------------------+
| id | bigint(20) unsigned |
| name | varchar(60) |
+------------+---------------------+
Table schools_pivot:
+------------+---------------------+
| Field | Type |
+------------+---------------------+
| id | bigint(20) unsigned |
| school_id | bigint(20) |
| field_id | bigint(20) |
+------------+---------------------+
Unfortunately I have absolutely no clue how to relate three Eloquent models in this way and couldn't find anything on the web (I probably searched for the wrong terms).
I'm pretty new to Laravel and Eloquent, so please be kind with me ;)
It's better to define the relationship between two tables and then relate the third one to them. You can read this article: three-way-pivot-table-in-eloquent. Maybe it will help you.
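As a rough sketch of that idea (assuming the existing schools_pivot table gains a user_id column; the relation and variable names below are illustrative, not from the question):

use Illuminate\Database\Eloquent\Model;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;
use Illuminate\Support\Facades\DB;

// Hypothetical: add user_id to the pivot so one row means "{user} studies {field} at {school}"
Schema::table('schools_pivot', function (Blueprint $table) {
    $table->bigInteger('user_id')->unsigned()->nullable();
});

class User extends Model
{
    // schools the user studies at, with the studied field available on the pivot row
    public function schools()
    {
        return $this->belongsToMany(School::class, 'schools_pivot')->withPivot('field_id');
    }

    // fields the user studies, with the school available on the pivot row
    public function fields()
    {
        return $this->belongsToMany(Field::class, 'schools_pivot')->withPivot('school_id');
    }
}

// Example count: how many users are studying Field Z at School XY
$count = DB::table('schools_pivot')
    ->where('school_id', $schoolId)
    ->where('field_id', $fieldId)
    ->distinct()
    ->count('user_id');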
