How to create KSQL table from a topic with composite key? - apache-kafka-connect

I have some topic data with the fields stringA and stringB, and I am trying to use those two fields together as the key when creating a KSQL table from the topic.

Here's an example. First, I'll create and populate a test stream:
ksql> CREATE STREAM TEST (STRINGA VARCHAR,
STRINGB VARCHAR,
COL3 INT)
WITH (KAFKA_TOPIC='TEST',
PARTITIONS=1,
VALUE_FORMAT='JSON');
Message
----------------
Stream created
----------------
ksql> INSERT INTO TEST (STRINGA, STRINGB, COL3) VALUES ('A','B',1);
ksql> INSERT INTO TEST (STRINGA, STRINGB, COL3) VALUES ('A','B',2);
ksql> INSERT INTO TEST (STRINGA, STRINGB, COL3) VALUES ('C','D',3);
ksql>
ksql> SET 'auto.offset.reset' = 'earliest';
Successfully changed local property 'auto.offset.reset' to 'earliest'. Use the UNSET command to revert your change.
ksql> SELECT * FROM TEST EMIT CHANGES LIMIT 3;
+--------------+--------+---------+----------+------+
|ROWTIME |ROWKEY |STRINGA |STRINGB |COL3 |
+--------------+--------+---------+----------+------+
|1578569329184 |null |A |B |1 |
|1578569331653 |null |A |B |2 |
|1578569339177 |null |C |D |3 |
Note that ROWKEY is null.
Now I'll create a new stream, populated from the first, that builds the composite column and sets it as the key. I'm also including the original fields themselves, but this is optional if you don't need them:
ksql> CREATE STREAM TEST_REKEY AS
SELECT STRINGA + STRINGB AS MY_COMPOSITE_KEY,
STRINGA,
STRINGB,
COL3
FROM TEST
PARTITION BY MY_COMPOSITE_KEY ;
Message
------------------------------------------------------------------------------------------
Stream TEST_REKEY created and running. Created by query with query ID: CSAS_TEST_REKEY_9
------------------------------------------------------------------------------------------
Now you have a stream of data with the key set to your composite key:
ksql> SELECT ROWKEY , COL3 FROM TEST_REKEY EMIT CHANGES LIMIT 3;
+---------+-------+
|ROWKEY |COL3 |
+---------+-------+
|AB |1 |
|AB |2 |
|CD |3 |
Limit Reached
Query terminated
You can also inspect the underlying Kafka topic to verify the key:
ksql> PRINT TEST_REKEY LIMIT 3;
Format:JSON
{"ROWTIME":1578569329184,"ROWKEY":"AB","MY_COMPOSITE_KEY":"AB","STRINGA":"A","STRINGB":"B","COL3":1}
{"ROWTIME":1578569331653,"ROWKEY":"AB","MY_COMPOSITE_KEY":"AB","STRINGA":"A","STRINGB":"B","COL3":2}
{"ROWTIME":1578569339177,"ROWKEY":"CD","MY_COMPOSITE_KEY":"CD","STRINGA":"C","STRINGB":"D","COL3":3}
ksql>
With this done, we can now declare a table on top of the re-keyed topic:
CREATE TABLE TEST_TABLE (ROWKEY VARCHAR KEY,
COL3 INT)
WITH (KAFKA_TOPIC='TEST_REKEY', VALUE_FORMAT='JSON');
From this table we can query the state. Note that the composite key AB only shows the latest value, which is part of the semantics of a table (compare to the stream above, in which you see both values; both the stream and the table are backed by the same Kafka topic):
ksql> SELECT * FROM TEST_TABLE EMIT CHANGES;
+----------------+---------+------+
|ROWTIME |ROWKEY |COL3 |
+----------------+---------+------+
|1578569331653 |AB |2 |
|1578569339177 |CD |3 |

Just an update to @Robin Moffatt's answer:
Use the below
CREATE STREAM TEST_REKEY AS
SELECT STRINGA + STRINGB AS MY_COMPOSITE_KEY,
STRINGA,
STRINGB,
COL3
FROM TEST
PARTITION BY STRINGA + STRINGB ;
instead of
CREATE STREAM TEST_REKEY AS
SELECT STRINGA + STRINGB AS MY_COMPOSITE_KEY,
STRINGA,
STRINGB,
COL3
FROM TEST
PARTITION BY MY_COMPOSITE_KEY ;
NOTE: column ordering matters
Worked for me! (CLI v0.10.1, Server v0.10.1)
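As a further hedged alternative (assuming a recent ksqlDB, roughly 0.10+; not verified on the exact versions above), you can skip the explicit re-key stream and declare the table directly with an aggregation, since LATEST_BY_OFFSET keeps the most recent value per key:
CREATE TABLE TEST_TABLE_AGG AS
SELECT STRINGA + STRINGB AS MY_COMPOSITE_KEY,
LATEST_BY_OFFSET(COL3) AS COL3
FROM TEST
GROUP BY STRINGA + STRINGB;
The table name TEST_TABLE_AGG is arbitrary here, chosen so it does not collide with TEST_TABLE from the walkthrough.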

Related

How to call Oracle stored procedure using multiple param values from table

I have a requirement to call a stored procedure whose input parameter values come from a table.
For example:
sp(p1, p2, p3, ...);
Table looks like this:
Col1 | Col2
-----+-------
p1 | val1
p2 | Val2
p3 | val3
How can we achieve this in a PL/SQL block?
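One hedged sketch of an approach (the table name params_table is hypothetical, and a fixed three-parameter signature for sp is assumed): pivot the name/value rows into local variables, then call the procedure once.
DECLARE
  v_p1 varchar2(4000);
  v_p2 varchar2(4000);
  v_p3 varchar2(4000);
BEGIN
  -- pivot the Col1/Col2 name-value rows into scalar variables
  SELECT MAX(CASE col1 WHEN 'p1' THEN col2 END),
         MAX(CASE col1 WHEN 'p2' THEN col2 END),
         MAX(CASE col1 WHEN 'p3' THEN col2 END)
    INTO v_p1, v_p2, v_p3
    FROM params_table;  -- hypothetical name for the question's table
  sp(v_p1, v_p2, v_p3); -- sp is the procedure from the question
END;
/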

How can I avoid db_file_sequential_read on an index during inserts

I have a table with two indexes. I'm running inserts one at a time, committing every 1000 rows. I can't do bulk inserts because the business logic requires checking the updated data row by row.
My index is causing very high db_file_sequential_read waits. I can order the incoming data to avoid these on this index, but then I suffer the same penalty on a different index.
The actual table is too large for dropping and subsequently recreating the indexes to be practical.
This shows the slowness I am suffering from. The first set of numbers is from a staging server using an encrypted tablespace. The second set of numbers is from a production server using a non-encrypted tablespace.
-- create random test data in foo
create table foo as (
select dbms_random.random() id, dbms_random.string('U', 25) val
from dual connect by level <= 100000
);
create index foo_id_idx on foo (id, val);
-- create data table in bar
create table bar as (
select * from foo where 0 = 1
);
-- populate bar with unordered data (3.12s / 1.22s)
insert into bar select * from foo; commit;
-- add id index
create index bar_id_idx on bar (id);
-- populate indexed bar with unordered data (36.73s / 2.24s)
truncate table bar;
insert into bar select * from foo; commit;
-- populate indexed bar with id ordered data (4.84s / 0.6s)
truncate table bar;
insert into bar select * from foo order by id; commit;
-- add val index (actual production setup)
create index bar_val_idx on bar (val);
-- populate multi-indexed bar with unordered data (84.482s / 3.1s)
truncate table bar;
insert into bar select * from foo; commit;
-- populate multi-indexed bar with id ordered data (50.641s / 2.631s)
truncate table bar;
insert into bar select * from foo order by id; commit;
-- alter index on foo to support order by clause
drop index foo_id_idx;
create index foo_val_idx on foo (val, id);
-- populate multi-indexed bar with val ordered data (37.31s / 2.66s)
truncate table bar;
insert into bar select * from foo order by val; commit;
This seems like such a huge penalty for the second index to go from 5s to 84s. Of course I can bypass most of the penalty for one index by ordering the data, but not for both. Should I be looking at buffer, cache, memory or something else to help avoid the disk IO, or should I be looking at some other strategy like index organized tables?
EDIT 1: Added numbers from a production box & wait information.
In one hour on production with the actual insert process (not the simplified example above):
Executions 56,715
Rows Processed 56,715
Parses 1
Disk Reads 36,958
Sorts 0
Buffer Gets 754,970
db_file_sequential_read wait 323s
memory/cpu wait 26s
You may have the same data stored in two different orders, if you store two copies:
- a materialized view
- backed by an IOT with a (val, id) key
- refreshed on commit
- with advanced query rewrite enabled, so that the mview index is usable by queries against the original table
The mview is updated in bulk on commit, which can pay off its own maintenance cost. Keep your fingers crossed for advanced query rewrite to work properly.
An executable example for the mview follows.
If, and only if, the row-by-row maintenance of the second index is the problem, then this has a slim chance of performing faster, thanks to the bulk update of the mview on commit.
set timing on
create table foo (
id number,
val varchar2(30)
) pctfree 0;
create index foo_id_idx on foo (id, val);
alter table foo
add constraint pk_foo primary key (id) using index foo_id_idx;
create materialized view log on foo with primary key;
create table mv_foo (
id number,
val varchar2(30),
constraint pk_mv_foo primary key (val, id)
) organization index;
create materialized view mv_foo
on prebuilt table
refresh fast on commit with primary key
enable query rewrite
as
select id, val
from foo;
begin
-- to reset mview staleness
dbms_mview.refresh('mv_foo', method => 'c');
end;
/
insert into foo(id, val)
select dbms_random.random(), dbms_random.string('U', 25)
from dual connect by level <= 10000;
commit;
begin
dbms_stats.gather_table_stats(user, 'foo');
dbms_stats.gather_table_stats(user, 'mv_foo');
end;
/
explain plan for
select /*+ first_rows */
id, val
from foo
order by id, val;
select * from table(dbms_xplan.display);
explain plan for
select /*+ first_rows */
id, val
from foo
order by val, id;
select * from table(dbms_xplan.display);
Note how the mview IOT is transparently used when FOO is filtered by VAL in the second statement:
explain plan for select * from foo where id = :x
explain plan succeeded.
6ms elapsed
select * from table(dbms_xplan.display)
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------
Plan hash value: 2466643623
-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 33 | 2 (0)| 00:00:01 |
|* 1 | INDEX RANGE SCAN| FOO_ID_IDX | 1 | 33 | 2 (0)| 00:00:01 |
-------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("ID"=TO_NUMBER(:Z))
explain plan for select * from foo where val = :x
explain plan succeeded.
25ms elapsed
select * from table(dbms_xplan.display)
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------
Plan hash value: 386525678
------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 33 | 2 (0)| 00:00:01 |
|* 1 | INDEX RANGE SCAN| PK_MV_FOO | 1 | 33 | 2 (0)| 00:00:01 |
------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("MV_FOO"."VAL"=:X)
You need the QUERY REWRITE system privilege for that to work.
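For example, a DBA could grant it like this (the username is illustrative):
grant query rewrite to app_user;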

oracle query with pivot and row totals

Here is my entire requirement along with create and insert commands.
create table sample
( schedule_id varchar2(256),
schedule_details varchar2(256),
pod_name varchar2(256),
milestone varchar2(256),
metric_column varchar2(256),
metric_value varchar2(4000));
insert into sample values ('123456','details','ABCD','M1','status','DONE');
insert into sample values ('123456','details','ABCD','M1','progress','GREEN');
insert into sample values ('123456','details','ABCD','M1','last_reported','2016-03-18 04:18:56');
insert into sample values ('123456','det','ABCD','M2','status','DONE');
insert into sample values ('123456','det','ABCD','M2','progress','RED');
insert into sample values ('123456','det','ABCD','M2','last_reported','2016-03-19 04:18:56');
insert into sample values ('123456','det','ABCD','M3','status','SKIP');
insert into sample values ('123456','det','ABCD','M3','progress','YELLOW');
insert into sample values ('123456','det','ABCD','M3','last_reported','2016-03-20 04:18:56');
insert into sample values ('123456','det','EFGH','M1','status','DONE');
insert into sample values ('123456','det','EFGH','M1','progress','GREEN');
insert into sample values ('123456','det','EFGH','M1','last_reported','2016-03-10 04:18:56');
insert into sample values ('123456','det','EFGH','M2','status','DONE');
insert into sample values ('123456','det','EFGH','M2','progress','RED');
insert into sample values ('123456','det','EFGH','M2','last_reported','2016-03-12 04:18:56');
insert into sample values ('123456','det','EFGH','M3','status','SKIP');
insert into sample values ('123456','det','EFGH','M3','progress','YELLOW');
insert into sample values ('123456','det','EFGH','M3','last_reported','2016-03-11 04:18:56');
insert into sample values ('7890','det','ABCD','M1','status','DONE');
insert into sample values ('7890','det','ABCD','M1','progress','GREEN');
insert into sample values ('7890','det','ABCD','M1','last_reported','2016-04-18 04:18:56');
insert into sample values ('7890','det','ABCD','M2','status','DONE');
insert into sample values ('7890','det','ABCD','M2','progress','RED');
insert into sample values ('7890','det','ABCD','M2','last_reported','2016-04-19 04:18:56');
insert into sample values ('7890','det','ABCD','M3','status','FAILED');
insert into sample values ('7890','det','ABCD','M3','progress','GREEN');
insert into sample values ('7890','det','ABCD','M3','last_reported','2016-04-20 04:18:56');
insert into sample values ('7890','det','PQRS','M1','status','DONE');
insert into sample values ('7890','det','PQRS','M1','progress','GREEN');
insert into sample values ('7890','det','PQRS','M1','last_reported','2016-04-10 04:18:56');
insert into sample values ('7890','det','PQRS','M2','status','DONE');
insert into sample values ('7890','det','PQRS','M2','progress','RED');
insert into sample values ('7890','det','PQRS','M2','last_reported','2016-04-11 04:18:56');
insert into sample values ('7890','det','PQRS','M3','status','SKIP');
insert into sample values ('7890','det','PQRS','M3','progress','GREEN');
insert into sample values ('7890','det','PQRS','M3','last_reported','2016-04-12 04:18:56');
My expected output is:
SCHEDULE_ID | STATUS_DONE | STATUS_SKIP | STATUS_FAILED | STATUS_TOTAL | PROGRESS_RED | PROGRESS_GREEN | PROGRESS_YELLOW | PROGRESS_TOTAL
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
123456 | 1 | 1 | 0 | 2 | 1 | 0 | 1 | 2
7890 | 0 | 2 | 0 | 2 | 0 | 1 | 1 | 2
The query needs to count the distinct pod_name values for each schedule_id. For each pod it needs to take the status and progress metric values of the milestone that has the latest last_reported. So the value of STATUS_DONE is the number of POD_NAME values whose latest milestone (based on last_reported) has status DONE.
For example, for schedule ID 123456 and pod ABCD, the status and progress of milestone M3 need to be taken, as M3 has the latest last_reported compared to M1 and M2.
I have a query that counts distinct pod names but does not take the latest milestone into consideration (a sketch addressing that follows after the query):
select *
from (SELECT SCHEDULE_ID, SCHEDULE_DETAILS, POD_NAME,
             CASE WHEN GROUPING(metric_value) = 0
                  THEN metric_value
                  ELSE CASE WHEN METRIC_COLUMN = 'status'
                            THEN 'STATUS_TOTAL'
                            ELSE 'PROGRESS_TOTAL'
                       END
             END value
      FROM sample
      WHERE metric_column IN ('status','progress')
      GROUP BY SCHEDULE_ID, SCHEDULE_DETAILS, POD_NAME, METRIC_COLUMN, ROLLUP(metric_value))
PIVOT (COUNT(DISTINCT POD_NAME) FOR value IN ('DONE' AS "DONE",'SKIP' AS "SKIP",'FAILED' AS "FAILED",'STATUS_TOTAL' AS "STATUS_TOTAL",'GREEN' AS "GREEN",'RED' AS "RED",'YELLOW' AS "YELLOW",'PROGRESS_TOTAL' AS "PROGRESS_TOTAL"));
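A hedged sketch of one way to bring the latest milestone into play (untested beyond the sample data above): rank the milestones per schedule/pod by their last_reported value, keep only the top-ranked one, and count conditionally. Since last_reported is stored as 'YYYY-MM-DD HH24:MI:SS' text, it sorts correctly without date conversion.
WITH latest AS (
  SELECT schedule_id, pod_name, milestone,
         ROW_NUMBER() OVER (
           PARTITION BY schedule_id, pod_name
           ORDER BY MAX(CASE WHEN metric_column = 'last_reported'
                             THEN metric_value END) DESC
         ) rn
    FROM sample
   GROUP BY schedule_id, pod_name, milestone
)
SELECT s.schedule_id,
       COUNT(CASE WHEN s.metric_column = 'status' AND s.metric_value = 'DONE' THEN 1 END) AS status_done,
       COUNT(CASE WHEN s.metric_column = 'status' AND s.metric_value = 'SKIP' THEN 1 END) AS status_skip,
       COUNT(CASE WHEN s.metric_column = 'status' AND s.metric_value = 'FAILED' THEN 1 END) AS status_failed,
       COUNT(CASE WHEN s.metric_column = 'status' THEN 1 END) AS status_total,
       COUNT(CASE WHEN s.metric_column = 'progress' AND s.metric_value = 'RED' THEN 1 END) AS progress_red,
       COUNT(CASE WHEN s.metric_column = 'progress' AND s.metric_value = 'GREEN' THEN 1 END) AS progress_green,
       COUNT(CASE WHEN s.metric_column = 'progress' AND s.metric_value = 'YELLOW' THEN 1 END) AS progress_yellow,
       COUNT(CASE WHEN s.metric_column = 'progress' THEN 1 END) AS progress_total
  FROM sample s
  JOIN latest l
    ON l.schedule_id = s.schedule_id
   AND l.pod_name = s.pod_name
   AND l.milestone = s.milestone
   AND l.rn = 1
 WHERE s.metric_column IN ('status', 'progress')
 GROUP BY s.schedule_id;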

Oracle Temporary table columns and data from another table's rows

Much like these questions:
MSSQL Create Temporary Table whose columns names are obtained from row values of another table
Create a table with column names derived from row values of another table
I need to do the same thing with Oracle, but I also need to fill this new table with data from another table, which is organized in a particular way. An example:
Table Users
|id|name |
----------
|1 |admin|
|2 |user |
Table user_data_cols
|id|field_name|field_description|
---------------------------------
|1 |age |Your age |
|2 |children |Your children |
Table user_data_rows
|id|user_id|col_id|field_value|
-------------------------------
|1 |1 |1 |32 |
|2 |1 |2 |1 |
|3 |2 |1 |19 |
|4 |2 |2 |0 |
What I want is to create, using only SQL, a table like this:
|user_id|age|children|
----------------------
|1 |32 |1 |
|2 |19 |0 |
Starting from the data in the other tables (which might change over time, so I'll need to create a new table if a new field is added).
Is such a thing even possible? I feel this might be against a lot of good practices, but it can't be helped...
For comparison purposes here is the answer from your link:
Create a table with column names derived from row values of another table
SELECT
CONCAT(
'CREATE TABLE Table_2 (',
GROUP_CONCAT(DISTINCT
CONCAT(nameCol, ' VARCHAR(50)')
SEPARATOR ','),
');')
FROM
Table_1
INTO @sql;
PREPARE stmt FROM @sql;
EXECUTE stmt;
Now let's do the same in Oracle 11g:
DECLARE
stmt varchar2(8000);
BEGIN
SELECT 'CREATE TABLE Table_2 ('||
(SELECT
LISTAGG(nameCol, ', ') WITHIN GROUP (ORDER BY nameCol) "cols"
FROM
(SELECT DISTINCT nameCol||' VARCHAR2(50)' nameCol FROM table_1) table_x)||
')'
INTO stmt
FROM DUAL;
EXECUTE IMMEDIATE stmt;
END;
/
You asked if you can do this 'using only SQL'; my answer uses a PL/SQL block.
You can fill such a table with data using a similar strategy. As others have noted, you must know the columns at parse time; to get around that restriction, you can follow a strategy such as the one used here (a sketch of the fill step follows the link):
Dynamic Pivot
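To sketch the fill step against the question's schema (user_data_cols / user_data_rows), assuming Table_2 was created with user_id followed by one VARCHAR2 column per field_name in id order, something like this hedged PL/SQL block could work:
DECLARE
  cols varchar2(4000);
  stmt varchar2(8000);
BEGIN
  -- build one MAX(CASE ...) expression per dynamic column,
  -- ordered to match the column order of Table_2
  SELECT LISTAGG('MAX(CASE WHEN col_id = ' || id ||
                 ' THEN field_value END)', ', ')
         WITHIN GROUP (ORDER BY id)
    INTO cols
    FROM user_data_cols;
  stmt := 'INSERT INTO Table_2 SELECT user_id, ' || cols ||
          ' FROM user_data_rows GROUP BY user_id';
  EXECUTE IMMEDIATE stmt;
END;
/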

Oracle partition key

I have many tables with large amounts of data. The PK is the column TAB_ID, which has data type RAW(16). I created hash partitions with TAB_ID as the partition key.
My issue is that the SQL statement (select * from my_table where tab_id = 'aas1df') does not use partition pruning. If I change the column datatype to varchar2(32), partition pruning works.
Why does partition pruning not work with a partition key of datatype RAW(16)?
I'm just guessing: try select * from my_table where 'aas1df' = tab_id.
Probably the datatype conversion works the other way than expected. Anyway, you should use the function UTL_RAW.CAST_TO_RAW.
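For instance (a hedged sketch reusing the question's sample literal, which is the character string from the predicate, not a full 16-byte value):
select * from my_table
where tab_id = utl_raw.cast_to_raw('aas1df');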
Edited:
Is your table partitioned by TAB_ID? If yes, then there is something wrong with your design; you usually partition a table by some useful business value, NOT by a surrogate key.
If you know the PK value you do not need partition pruning at all. When Oracle traverses the PK index it gets a ROWID value. This ROWID contains the file number, block ID, and row number within the block, so Oracle can access the row directly.
HEXTORAW enables partition pruning.
In the sample below the Pstart and Pstop are literal numbers, implying partition pruning occurs.
create table my_table
(
TAB_ID raw(16),
a number,
constraint my_table_pk primary key (tab_id)
)
partition by hash(tab_id) partitions 16;
explain plan for
select *
from my_table
where tab_id = hextoraw('1BCDB0E06E7C498CBE42B72A1758B432');
select * from table(dbms_xplan.display(format => 'basic +partition'));
Plan hash value: 1204448714
--------------------------------------------------------------------------
| Id | Operation | Name | Pstart| Pstop |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | |
| 1 | TABLE ACCESS BY GLOBAL INDEX ROWID| MY_TABLE | 2 | 2 |
| 2 | INDEX UNIQUE SCAN | MY_TABLE_PK | | |
--------------------------------------------------------------------------
