I am using a Kafka JDBC source connector to produce data from a source table (MariaDB) into a Kafka topic. For some tables, I am using mode=timestamp in the connector config.
I have two fields, created_timestamp and updated_timestamp, and I want to produce records based on the updated_timestamp field. But there is a chance that updated_timestamp might contain NULL values.
Currently, I am using IFNULL in the query parameter to select created_timestamp whenever updated_timestamp is NULL, but records are not flowing from the source table to the topic.
Kafka source connector configuration
curl -k -X POST https://:8083/connectors -H "Content-Type: application/json" -d '{
  "name": "TEST_SOURCECONNECT",
  "config": {
    "topic.prefix": "topic-name",
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:mysql://db ip",
    "connection.user": "",
    "connection.password": "",
    "mode": "timestamp",
    "query": "SELECT t.* from (SELECT *, IFNULL(updated_timestamp, created_timestamp) as custom_timestamp FROM TABLE) t",
    "timestamp.column.name": "custom_timestamp",
    "validate.non.null": "false",
    "poll.interval.ms": 60000
  }
}'
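For reference, in timestamp mode the JDBC source connector appends its own incremental filter and ordering to the configured query, so what actually runs against MariaDB is roughly the sketch below (the exact column quoting depends on the dialect). This is why the computed alias has to be exposed by the outer select:
SELECT t.* from (SELECT *, IFNULL(updated_timestamp, created_timestamp) as custom_timestamp FROM TABLE) t
WHERE custom_timestamp > ? AND custom_timestamp < ?
ORDER BY custom_timestamp ASC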
Currently, I have an endpoint that allows the update of multiple records. Before saving the changes to the database I need to validate the data that is being sent from the front end.
I am having an issue with some kinds of validations that require checks against all the other records in the database table (e.g. date interval overlaps/gaps, unique pair checks).
In these cases, when I try to do the validations, I have two sets of data:
The data sent from the front end that are stored in memory/variable.
The data on the database.
For the validations to run correctly, I need a way to merge the data in memory (the updated records) with the data in the database (the original records plus other data that is not currently being updated).
Is there a good way of doing this that does not require loading everything into memory and merging both datasets there?
Another idea I am considering is to open a database transaction, write the new data to the database, and then use a dirty read when executing the gap/overlap check queries. I don't know if this is a good approach though.
Extra notes:
I am using Oracle as a database and Dapper to communicate with it.
The tables that need validation usually hold millions of records.
The same issue exists for the create endpoint.
Another example
I am trying to create entities. The create endpoint is called with this data in the body (date format dd/mm/yyyy):

StartDate     EndDate
01/01/2022    05/01/2022
10/01/2022    11/01/2022
12/01/2022    15/01/2022
In the database I have these records saved:

Id    StartDate     EndDate
1     06/01/2022    09/01/2022
2     16/01/2022    20/01/2022
I need to check if there are any gaps between the dates. If there are, I need to send a warning to the user (the data in the database can be invalid; the application has old data and I can't do anything about that at the moment).
The way I check for this right now is by using the SQL below:
WITH CTE_INNERDATA AS (
    SELECT s.STARTDATE, s.ENDDATE
    FROM mytable s
    WHERE FK = :somefkvalue
    UNION ALL
    -- this row contains the data from one of the rows that came from the front-end
    SELECT :StartDt AS STARTDATE, :EndDt AS ENDDATE FROM DUAL
),
CTE_DATA AS (
    SELECT ctid.STARTDATE, ctid.ENDDATE,
           LAG(ctid.ENDDATE, 1) OVER (ORDER BY ctid.STARTDATE) AS PREV_ENDDATE
    FROM CTE_INNERDATA ctid
)
SELECT COUNT(1) FROM CTE_DATA
WHERE PREV_ENDDATE IS NOT NULL
AND PREV_ENDDATE < STARTDATE
Using this SQL query, when validating the third row (12/01/2022 - 15/01/2022) in isolation, a gap is reported between 09/01/2022 and 12/01/2022, even though the second incoming row (10/01/2022 - 11/01/2022) falls inside that range.
This issue would be fixed if, instead of a union with a single row, the union included all the rows sent from the front-end, but I can't figure out a way to do something like that (see the sketch after the update below).
Update:
I iterate through the records the frontend sent and call this method to check for gaps.
private async Task ValidateIntervalGaps(int someFkValue, DateTime startDate, DateTime endDate)
{
    var connection = _connectionProvider.GetOpenedConnection();
    var gapsCount = await connection.QueryFirstAsync<int>(@"<<Query from above>>",
        new { StartDt = startDate, EndDt = endDate, somefkvalue = someFkValue });
    if (gapsCount > 0)
    {
        // Add warning message here
    }
}
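One way to avoid the per-row round trips would be to bind every incoming row into the inner CTE in a single call. A minimal sketch of the SQL, assuming bind names like :StartDt1/:EndDt1 are generated in a loop over the incoming records (the names and row count are illustrative):
WITH CTE_INNERDATA AS (
    SELECT s.STARTDATE, s.ENDDATE
    FROM mytable s
    WHERE FK = :somefkvalue
    -- one UNION ALL branch per row sent from the front-end
    UNION ALL SELECT :StartDt1, :EndDt1 FROM DUAL
    UNION ALL SELECT :StartDt2, :EndDt2 FROM DUAL
    UNION ALL SELECT :StartDt3, :EndDt3 FROM DUAL
),
CTE_DATA AS (
    SELECT ctid.STARTDATE, ctid.ENDDATE,
           LAG(ctid.ENDDATE, 1) OVER (ORDER BY ctid.STARTDATE) AS PREV_ENDDATE
    FROM CTE_INNERDATA ctid
)
SELECT COUNT(1) FROM CTE_DATA
WHERE PREV_ENDDATE IS NOT NULL
AND PREV_ENDDATE < STARTDATE
The UNION ALL branches (and the matching entries in the Dapper parameter object) would be built from the incoming list, so the whole validation runs as one query instead of one query per row.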
Can someone explain or show how NiFi's ExecuteSQLRecord would work with parameters? The documentation says:
If it is triggered by an incoming FlowFile, then attributes of that FlowFile will be available when evaluating the select query, and the query may use the ? to escape parameters. In this case, the parameters to use must exist as FlowFile attributes with the naming convention sql.args.N.type and sql.args.N.value,
where N is a positive integer. The sql.args.N.type is expected to be a number indicating the JDBC Type.
I've been able to use HandleHttpRequest and ExtractText to make this query work: curl -d "select * from MY_TABLE WHERE NAME = '1234'" http://localhost:5555
I'm unsure how I would update the ExecuteSQLRecord to make it work with parameters and avoid SQL injection.
Would I replace the '1234' with a ? and extract the attributes with another processor? I wish there was an example.
The query should be select * from MY_TABLE where NAME = ? (note that the ? is a JDBC bind marker and must not be quoted), and then incoming flowfiles will need to have the following attributes (from your example):
sql.args.1.type: varchar
sql.args.1.value: 1234
For multiple parameters, it would follow this general pattern:
Query: select * from MY_TABLE where NAME = ? and OTHER_COL = ? ...
Flowfile attributes:
sql.args.1.type: varchar
sql.args.1.value: First Last
sql.args.2.type: integer
sql.args.2.value: 1234
...
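One caveat: the documentation quoted above says sql.args.N.type is a number indicating the JDBC type, so if the plain type names are rejected by your NiFi version, the numeric java.sql.Types codes should work instead (12 is VARCHAR, 4 is INTEGER):
sql.args.1.type: 12
sql.args.1.value: First Last
sql.args.2.type: 4
sql.args.2.value: 1234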
I have the following json on a topic that the JDBC connector publishes to
{"APP_SETTING_ID":9,"USER_ID":10,"APP_SETTING_NAME":"my_name","SETTING_KEY":"my_setting_key"}
Here's my connector file
name=data.app_setting
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
poll.interval.ms=500
tasks.max=4
mode=timestamp
query=SELECT APP_SETTING_ID, USER_ID, APP_SETTING_NAME, SETTING_KEY FROM MY_TABLE with (nolock)
timestamp.column.name=LAST_MOD_DATE
topic.prefix=data.app_setting
key.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=false
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=false
I now want to insert a key to this message by multiplying the two integer fields - APP_SETTING_ID and USER_ID. So the key for this message becomes 9*10 = 90
Is this transformation possible through Connect, and if so, could someone please shed some light on it?
I would try seeing how far you can get with
query=SELECT APP_SETTING_ID, APP_SETTING_NAME, SETTING_KEY, (APP_SETTING_ID*USER_ID) as _key FROM MY_TABLE with (nolock)
Then add an ExtractKey transform
transforms=AddKeys,ExtractKey
# this makes a map of the listed fields
transforms.AddKeys.type=org.apache.kafka.connect.transforms.ValueToKey
transforms.AddKeys.fields=_key
# this gets one field from the map
transforms.ExtractKey.type=org.apache.kafka.connect.transforms.ExtractField$Key
transforms.ExtractKey.field=_key
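If the helper _key column should not remain in the message value as well, a ReplaceField transform could in principle be chained on to drop it. An untested sketch (depending on your Connect version the option is named blacklist or exclude):
transforms=AddKeys,ExtractKey,DropHelperField
# AddKeys and ExtractKey configured as above
transforms.DropHelperField.type=org.apache.kafka.connect.transforms.ReplaceField$Value
transforms.DropHelperField.blacklist=_key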
I need to sort the table data based on the date (by combining 3 fields) and limit the records.
We can do it by writing a raw query like the one below:
SELECT * FROM employee
order by STR_TO_DATE(CONCAT(emp_date_day, '/', emp_date_month, '/', emp_date_year), '%d/%m/%Y') desc
limit 2;
But I would like to implement the same thing using JPA.
How can I do it in Spring Data JPA?
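Since a property-path Sort cannot express STR_TO_DATE(CONCAT(...)), the most direct route in Spring Data JPA is a native query on the repository. A minimal sketch, assuming an Employee entity mapped to the employee table (the repository and method names are illustrative):
import java.util.List;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Query;

public interface EmployeeRepository extends JpaRepository<Employee, Long> {

    // Native query: the ORDER BY expression is passed through to MySQL/MariaDB as-is.
    @Query(value = "SELECT * FROM employee "
            + "ORDER BY STR_TO_DATE(CONCAT(emp_date_day, '/', emp_date_month, '/', emp_date_year), '%d/%m/%Y') DESC "
            + "LIMIT 2",
            nativeQuery = true)
    List<Employee> findLatestTwoByCombinedDate();
}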
I have a BIRT dataset for a DB2 query. My query works fine without parameters:
with params as (SELECT '2014-02-16' enddate,'1' locationid FROM sysibm.sysdummy1)
select
t.registerid
from (
select
...
FROM params, mytable sos
WHERE sos.locationid=params.locationid
AND sos.repositorytype ='xxx'
AND sos.repositoryaccountability='xxx'
AND sos.terminalid='xxx'
AND DATE(sos.balanceDate) between date(params.enddate)-6 DAY and date(params.enddate)
GROUP BY sos.terminalid,sos.balancedate,params.enddate) t
GROUP BY
t.registerid
WITH UR
But when I change the top line to ...
with params as (SELECT ? enddate,? locationid FROM sysibm.sysdummy1)
And make the two input parameters string datatype, I get DB2 error SQLCODE -418. But I know it is not my query itself, because the query works without parameters.
What is the right way for me to set up the parameters so there is no error?
thanks
I'm not familiar with DB2 programming, but on Oracle the ? works anywhere in the query.
Have you looked at http://publib.boulder.ibm.com/infocenter/dzichelp/v2r2/index.jsp?topic=%2Fcom.ibm.db2z9.doc.codes%2Fsrc%2Ftpc%2Fn418.htm?
Seems that on DB2 it's a bit more complicated and you should use "typed parameter markers".
The doc says:
Typed parameter marker
A parameter marker that is specified with its target data type. A typed parameter marker has the general form:
CAST(? AS data-type)
This invocation of a CAST specification is a "promise" that the data type of the parameter at run time will be of the data type that is specified or some data type that is assignable to the specified data type.
Apart from that, always ensure that your date strings are in the format the DB expects, and use explicit format masks in the date function, like this:
with params as (
    SELECT cast(? as varchar(10)) enddate,
           cast(? as varchar(80)) locationid
    FROM sysibm.sysdummy1
)
select
...
from params, ...
where ...
AND DATE(sos.balanceDate) between date(XXX(params.enddate))-6 DAY and date(XXX(params.enddate))
...
Unfortunately I cannot tell you how the XXX function should look on DB2.
On Oracle, an example would be
to_date('2014-02-18', 'YYYY-MM-DD')
On DB2, see Converting a string to a date in DB2
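For what it's worth, recent DB2 versions also provide TO_DATE (a synonym for TIMESTAMP_FORMAT) with an Oracle-like signature, so the XXX placeholder above might simply become the following, though I haven't verified it on DB2:
TO_DATE(params.enddate, 'YYYY-MM-DD')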
In addition to hvb's answer, I see two options:
Option 1 you could use a DB2 stored procedure instead of a plain SQL query. Thus there won't be these limitations you face to, due to JDBC query parameters.
Option 2: we should be able to remove the first line of the query ("with params as") and replace the references with question marks within the query:
select
    t.registerid
from (
    select
        sos.terminalid, sos.balancedate, max(sos.balanceDate) as maxdate
    FROM mytable sos
    WHERE sos.locationid = ?
    AND sos.repositorytype = 'xxx'
    AND sos.repositoryaccountability = 'xxx'
    AND sos.terminalid = 'xxx'
    AND DATE(sos.balanceDate) between date(?)-6 DAY and date(?)
    GROUP BY sos.terminalid, sos.balancedate) t
GROUP BY
    t.registerid
A minor drawback is that this time we need to declare 3 dataset parameters in BIRT instead of 2. More significantly, I removed params.endDate from the GROUP BY and replaced it with max(sos.balanceDate) in the select clause. This is very nearly but not strictly equivalent. If that is not acceptable in your context, a stored procedure might be the best option.
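For completeness, option 2 could also be combined with the typed parameter markers from hvb's answer, which addresses the -418 directly without removing the markers. An untested sketch of the relevant WHERE clause:
WHERE sos.locationid = CAST(? AS VARCHAR(80))
AND sos.repositorytype = 'xxx'
AND sos.repositoryaccountability = 'xxx'
AND sos.terminalid = 'xxx'
AND DATE(sos.balanceDate) between date(CAST(? AS VARCHAR(10)))-6 DAY and date(CAST(? AS VARCHAR(10)))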