How to pass a dynamic SQL statement to the JdbcIO connector in Apache Beam? - jdbc

Apache Beam provides the JdbcIO connector to connect to Cloud SQL PostgreSQL. My job reads an event from Pub/Sub. The event body looks like this:
tableName,
list<value>
I need to write to the table whose name arrives in the message.
JdbcIO offers a prepared statement which lets me parameterize the values in my insert query, but I need to generate the insert query itself dynamically based on the information present in the event.
pipeline
    .apply(PubsubIO.readStrings().fromSubscription())
    .apply(convertToKV())
    .apply(JdbcIO.<KV<Integer, String>>write()
        .withDataSourceConfiguration(JdbcIO.DataSourceConfiguration.create(
                "com.mysql.jdbc.Driver", "jdbc:mysql://hostname:3306/mydb")
            .withUsername("username")
            .withPassword("password"))
        .withStatement("insert into Person values(?, ?)")
        .withPreparedStatementSetter(new JdbcIO.PreparedStatementSetter<KV<Integer, String>>() {
            public void setParameters(KV<Integer, String> element, PreparedStatement query)
                    throws SQLException {
                // intent of the original pseudocode: bind each value from the event
                query.setInt(1, element.getKey());
                query.setString(2, element.getValue());
            }
        })
    );
I should be able to create the SQL statement dynamically based on the input event from the PCollection.
The insert statement should be generated dynamically based on the list values and the table name. Please let me know whether this is possible.
Update:
I'm trying to call the JDBC driver manually inside a ParDo function but I get the error below.
No suitable driver found for jdbcURL.
Please let me know if I'm missing anything:
@Setup
public void doAnyRequiredSetup() throws SQLException
{
    LoggingContextUtil.installContext(loggingContext);
    connection = DriverManager.getConnection(JdbcUrl, user, password);
    statement = connection.createStatement();
    if (LOGGER.isDebugEnabled()) {
        LOGGER.debug("In doAnyRequiredSetup logging Context is now set and JDBC connection is .");
    }
}

@SuppressWarnings("unchecked")
@ProcessElement
public void processElement(ProcessContext context)
{
    JsonNode element = context.element();
    try {
        String query = formatQuery(baseQuery);
        boolean result = statement.execute(query);
        if (LOGGER.isDebugEnabled()) {
            LOGGER.debug("Executed query : " + query + " and the result is " + result);
        }
    } catch (IllegalArgumentException | SQLException e) {
        ErrorMessage em = new ErrorMessage(element.toString(), "Insert Query Failed", e.getMessage());
        context.output(ValidateTagHelper.FAILURE_TAG, em);
    }
}

You cannot have dynamic queries on JdbcIO based on the input elements.
A ParDo, however, is yours to write as you like, so you can write your own ParDo in which you call the JDBC driver manually.
Another workaround: you can split the input PCollection into multiple outputs. That will work if your use case is limited to a predefined set of queries that you can choose from based on the input. This way you split the input into multiple PCollections and then attach differently configured IOs to each.
https://cloud.google.com/blog/products/gcp/guide-to-common-cloud-dataflow-use-case-patterns-part-1
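Regarding the update: "No suitable driver found" usually means the driver class was never registered (or the JDBC URL string is malformed). Below is a minimal sketch of such a manual-JDBC DoFn, assuming the events have already been converted to a KV of table name to value list and that the driver jar is on the worker classpath; the driver class, URL and credentials are placeholders (swap in com.mysql.jdbc.Driver and a jdbc:mysql URL if you are on MySQL as in the snippet above):
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Collections;
import java.util.List;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.values.KV;

// Sketch only: builds the INSERT per element from the table name and value list.
class DynamicInsertFn extends DoFn<KV<String, List<String>>, Void> {

    private transient Connection connection;

    @Setup
    public void setup() throws Exception {
        // Registering the driver class explicitly avoids "No suitable driver found"
        // when the jar is present but not auto-loaded.
        Class.forName("org.postgresql.Driver");
        connection = DriverManager.getConnection(
                "jdbc:postgresql://hostname:5432/mydb", "username", "password");
    }

    @ProcessElement
    public void processElement(ProcessContext c) throws SQLException {
        String table = c.element().getKey();           // table name from the event
        List<String> values = c.element().getValue();  // values from the event
        String placeholders = String.join(",", Collections.nCopies(values.size(), "?"));
        String sql = "insert into " + table + " values (" + placeholders + ")";
        try (PreparedStatement ps = connection.prepareStatement(sql)) {
            for (int i = 0; i < values.size(); i++) {
                ps.setString(i + 1, values.get(i)); // JDBC parameters are 1-based
            }
            ps.executeUpdate();
        }
    }

    @Teardown
    public void teardown() throws SQLException {
        if (connection != null) {
            connection.close();
        }
    }
}
You would attach it after your conversion step with .apply(ParDo.of(new DynamicInsertFn())).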

You can try reading the Pub/Sub messages with attributes; in the attributes you can pass the table name and the values as key-value pairs.
PCollection<PubsubMessage> pubsubMessage = pipeline
    .apply(PubsubIO.readMessagesWithAttributes().fromSubscription(""));
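Downstream you could then pull the table name out of the attributes in a ParDo; a rough sketch, where the attribute key "tableName" is just an assumed convention:
// Sketch: read the target table from a message attribute ("tableName" is an assumed key).
PCollection<KV<String, String>> tableAndPayload = pubsubMessage
    .apply(ParDo.of(new DoFn<PubsubMessage, KV<String, String>>() {
        @ProcessElement
        public void processElement(ProcessContext c) {
            PubsubMessage msg = c.element();
            String tableName = msg.getAttribute("tableName");
            String payload = new String(msg.getPayload(), java.nio.charset.StandardCharsets.UTF_8);
            c.output(KV.of(tableName, payload));
        }
    }));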

Related

In Spring Boot, how do I get the column names from a StoredProcedureQuery result?

I am working on creating a simple utility that allows our users to execute a pre-selected list of stored procedures that return a simple list result set as a JSON string. The result set varies based on the selected procedure. I am able to get the results easily enough (and pass back as JSON as required), but the results don't include the column names.
The most common answer I found online is to use ResultSetMetaData or NativeQuery, but I couldn't figure out how to extract the metadata or transform the query properly using a StoredProcedureQuery object. How do I get the column names from a StoredProcedureQuery result?
Here is my code:
@SuppressWarnings("unchecked")
public String executeProcedure(String procedure, String jsonData) {
    // Set up a call to the stored procedure
    StoredProcedureQuery query = entityManager.createStoredProcedureQuery(procedure);
    // Register and set the parameters
    query.registerStoredProcedureParameter(0, String.class, ParameterMode.IN);
    query.setParameter(0, jsonData);

    String jsonResults = "[{}]";
    try {
        // Execute the query and store the results
        query.execute();
        List list = query.getResultList();
        jsonResults = new Gson().toJson(list);
    } finally {
        try {
            // Cleanup
            query.unwrap(ProcedureOutputs.class).release();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
    return jsonResults;
}
The challenge is to get hold of a ResultSet, because column names are metadata and you need a ResultSet to access that metadata, like this:
ResultSetMetaData resultSetMetaData = resultSet.getMetaData();
System.out.println("Column name: "+resultSetMetaData.getColumnName(1));
System.out.println("Column type: "+resultSetMetaData.getColumnTypeName(1));
You can't get a ResultSet (or its metadata) from javax.persistence.StoredProcedureQuery, nor from Spring Data JPA (see the open issue "Support JPA 2.1 stored procedures returning result sets").
You can with low-level JDBC as follows:
CallableStatement stmnt = conn.prepareCall("{call demoSp(?, ?)}");
stmnt.setString(1, "abcdefg");
ResultSet resultSet1 = stmnt.executeQuery();
resultSet1.getMetaData(); // etc
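If the end goal is a JSON string keyed by column name (as in the original method), a rough sketch building on the CallableStatement above, reusing Gson from the question (names are illustrative):
// Sketch: map each row to columnName -> value using the metadata, then serialize with Gson.
ResultSetMetaData md = resultSet1.getMetaData();
List<Map<String, Object>> rows = new ArrayList<>();
while (resultSet1.next()) {
    Map<String, Object> row = new LinkedHashMap<>();
    for (int i = 1; i <= md.getColumnCount(); i++) {
        row.put(md.getColumnName(i), resultSet1.getObject(i));
    }
    rows.add(row);
}
String jsonResults = new Gson().toJson(rows); // e.g. [{"col1":"a","col2":1}, ...]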

A way to update multiple records together?

I am trying to see if there is a way to improve the way data is inserted and updated.
I am using an Oracle DB with JDBC.
The way I currently do it is to update, e.g., customer records in a for loop after checking whether toUpdate is true, as in the sample code below, calling an existing DAO update() for each record. But this does not allow multiple records to be upserted together.
Is there a better way to upsert multiple records together?
if (toUpdate) {
    for (Customer customerRec : customerRecList)
        customerRecDAO.update(customerRec);
}
Yes, you can use batching:
public <T> int saveInBatch(List<T> records, String sql, Function<T, MapSqlParameterSource> paramFn) {
    try {
        MapSqlParameterSource[] params = records.stream().map(paramFn).toArray(MapSqlParameterSource[]::new);
        // batchUpdate returns one count per statement; this assumes a NamedParameterJdbcTemplate
        int[] rowCounts = jdbcTemplate.batchUpdate(sql, params);
        return Arrays.stream(rowCounts).sum();
    } catch (Exception e) {
        // exception handling; rethrow so the method always returns or fails explicitly
        throw new RuntimeException(e);
    }
}
paramFn is a function that maps a record to its parameter values. An example could be:
record -> new MapSqlParameterSource("username", record.getUsername()); // just an example
(See the Spring documentation on why MapSqlParameterSource is used here.)
You can call saveInBatch in such a way that you pass smaller or customized batches of records. Suppose you have a million records; then you may want to update only 200-400 records at a time, so you can do something like the following:
private <T> int saveRecords(List<T> records, String sql, Function<T, MapSqlParameterSource> paramFn) throws Exception {
    return Lists.partition(records, 300).stream().map(batch -> saveInBatch(batch, sql, paramFn)).mapToInt(Integer::intValue).sum();
}
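A possible call site, assuming a hypothetical CUSTOMER table and that Customer exposes getId()/getName(); exception handling is omitted:
// Sketch only: table, column and accessor names are made up for illustration.
String sql = "UPDATE CUSTOMER SET NAME = :name WHERE ID = :id";
int updated = saveRecords(customerRecList, sql,
        c -> new MapSqlParameterSource()
                .addValue("name", c.getName())
                .addValue("id", c.getId()));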
Note: the above is not especially well optimized and the streams are not used to their best, but it is working code I tried ages back :).

Astyanax/Cassandra - Getting "Re-preparing already prepared query" warning with caching enabled

I'm trying to insert some data into Cassandra with Astyanax, but I'm getting a lot of "Re-preparing already prepared query" warnings even though I have caching enabled:
22:08:03,703 WARN Cluster:1702 - Re-preparing already prepared query INSERT INTO test.test (key,c0,c1,c2,c3,c4,c5,c6,c7,c8,c9) VALUES (?,?,?,?,?,?,?,?,?,?,?) . Please note that preparing the same query more than once is generally an anti-pattern and will likely affect performance. Consider preparing the statement only once.
22:08:03,707 WARN Cluster:1702 - Re-preparing already prepared query INSERT INTO test.test (key,c0,c1,c2,c3,c4,c5,c6,c7,c8,c9) VALUES (?,?,?,?,?,?,?,?,?,?,?) . Please note that preparing the same query more than once is generally an anti-pattern and will likely affect performance. Consider preparing the statement only once.
22:08:03,708 WARN Cluster:1702 - Re-preparing already prepared query INSERT INTO test.test (key,c0,c1,c2,c3,c4,c5,c6,c7,c8,c9) VALUES (?,?,?,?,?,?,?,?,?,?,?) . Please note that preparing the same query more than once is generally an anti-pattern and will likely affect performance. Consider preparing the statement only once.
Source code:
Connect: (executed once)
@Override
public void connect() throws ClientException {
    AstyanaxContext<Keyspace> context = new AstyanaxContext.Builder()
            .forCluster(clusterName)
            .forKeyspace(keyspaceName)
            .withHostSupplier(new Supplier<List<Host>>() {
                @Override
                public List<Host> get() {
                    return Collections.singletonList(new Host(host, 9160));
                }
            })
            .withAstyanaxConfiguration(
                    new AstyanaxConfigurationImpl().setDiscoveryType(NodeDiscoveryType.DISCOVERY_SERVICE)
                            .setDiscoveryDelayInSeconds(60000))
            .withConnectionPoolConfiguration(new JavaDriverConfigBuilder().build())
            .buildKeyspace(CqlFamilyFactory.getInstance());
    context.start();
    keyspace = context.getClient();
    columnFamilyTemplate = new ColumnFamily<String, String>(columnFamily,
            StringSerializer.get(), StringSerializer.get());
    try {
        columnFamilyTemplate.describe(keyspace);
    } catch (ConnectionException e) {
        throw new ClientException(e);
    }
    insert = keyspace.prepareMutationBatch().withCaching(true);
}
Insert: (executed multiple times)
insert.discardMutations();
final ColumnListMutation<String> row = insert.withRow(columnFamilyTemplate, key);
for (Map.Entry<String, String> pair : columnValues.entrySet()) {
    final String column = pair.getKey();
    final String value = pair.getValue();
    row.putColumn(column, value, null);
}
try {
    insert.withCaching(true).execute();
} catch (ConnectionException e) {
    throw new ClientException(e);
}
The warning message suggests that the caching is not actually working. Any idea how to fix it?

Spring JdbcTemplate select query execution is taking a long time

I am executing one select query using the Spring JdbcTemplate, and it returns nearly 1000 ids as a set, but it takes 10 minutes to execute.
In Toad, the same query executes within seconds.
Can anyone please help me with this?
I am using the code below:
return (HashSet) this.jdbcTemplate.query(
        (String) sqlMap.get("SQL_NRChargePromoApIDList"), new Object[] {}, new DataMapperAPID());

public Object mapRow(ResultSet rs, int rowNum) throws SQLException {
    HashSet compList = new HashSet();
    compList.add(rs.getString("ap_id"));
    while (rs.next()) {
        compList.add(rs.getString("ap_id"));
    }
    return compList;
}
You don't need to call rs.next(): the whole point of the RowMapper you pass to JdbcTemplate.query() is that it automatically iterates across all the rows in the result set and inserts each mapped object into a list, which it then returns. The row mapper should simply extract the ap_id and return it. Upon completion, you will get back a List.
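A minimal sketch of what the corrected call could look like, letting the template collect one id per row and building the set afterwards:
// Sketch: one String per row; JdbcTemplate iterates the ResultSet for you.
List<String> ids = this.jdbcTemplate.query(
        (String) sqlMap.get("SQL_NRChargePromoApIDList"),
        (rs, rowNum) -> rs.getString("ap_id"));
Set<String> idSet = new HashSet<>(ids);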

Resultset Metadata from Spring JDBCTemplate Query methods

Is there any way I can get a ResultSet object from one of the JdbcTemplate query methods?
I have code like
List<ResultSet> rsList = template.query(finalQuery, new RowMapper<ResultSet>() {
    public ResultSet mapRow(ResultSet rs, int rowNum) throws SQLException {
        return rs;
    }
});
I want to execute the SQL statement stored in the finalQuery String and get the ResultSet. The query is a complex join across 6 to 7 tables, I am selecting 4-5 columns from each table, and I want the metadata of those columns in order to transform data types and data for downstream systems.
If it were a simple query fetching from only one table, I could use RowMapper#mapRow and inside that mapRow method call ResultSetExtractor.extractData to get a list of results; but in this case I have complex joins in my query, and I am trying to get the ResultSet object and from that the ResultSet metadata...
The code above is not good because it returns the same ResultSet object for each row, and I don't want to store them in a list...
One more thing: if mapRow is called for each result from my query, will JdbcTemplate close the ResultSet and connection even though my list holds a reference to the ResultSet object?
Is there any simple method like jdbcTemplate.queryForResultSet(sql)?
Now I have implemented my own ResultSetExtractor to process and insert data into downstream systems:
sourceJdbcTemplate.query(finalQuery, new CustomResultSetProcessor(targetTable, targetJdbcTemplate));
This CustomResultSetProcessor implements ResultSetExtractor, and in the extractData method I call 3 different methods: the first gets the column types from rs.getMetaData(), the second gets the column types of the target metadata by running
SELECT NAME, COLTYPE, TBNAME FROM SYSIBM.SYSCOLUMNS WHERE TBNAME ='TABLENAME' AND TABCREATOR='TABLE CREATOR'
and the third builds the (prepared) insert statement from the target column types and finally executes it using
new BatchPreparedStatementSetter() {
    @Override
    public void setValues(PreparedStatement insertStmt, int i) throws SQLException { }
}
Hope this helps others...
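For readers who want the shape of that extractor, a rough skeleton under the same assumptions (the poster did not share the full class, so everything beyond the ResultSetExtractor contract below is illustrative):
// Skeleton only: the three steps described above, details elided.
public class CustomResultSetProcessor implements ResultSetExtractor<Integer> {

    private final String targetTable;
    private final JdbcTemplate targetJdbcTemplate;

    public CustomResultSetProcessor(String targetTable, JdbcTemplate targetJdbcTemplate) {
        this.targetTable = targetTable;
        this.targetJdbcTemplate = targetJdbcTemplate;
    }

    @Override
    public Integer extractData(ResultSet rs) throws SQLException, DataAccessException {
        ResultSetMetaData sourceMeta = rs.getMetaData();  // 1. source column types
        // 2. target column types, e.g. via the SYSIBM.SYSCOLUMNS query shown above
        // 3. build the prepared INSERT for targetTable and run it with
        //    targetJdbcTemplate.batchUpdate(insertSql, new BatchPreparedStatementSetter() { ... })
        return 0; // e.g. number of rows written downstream
    }
}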
Note that the whole point of the Spring JdbcTemplate is that it automatically closes all resources, including the ResultSet, after execution of the callback method. Therefore it is better to extract the necessary data inside the callback method and let Spring close the ResultSet afterwards.
If result of data extraction is not a List, you can use ResultSetExtractor instead of RowMapper:
SomeComplexResult r = template.query(finalQuery,
        new ResultSetExtractor<SomeComplexResult>() {
            public SomeComplexResult extractData(ResultSet rs) throws SQLException, DataAccessException {
                // do complex processing of the ResultSet and return its result as SomeComplexResult
            }
        });
Something like this would also work:
Connection con = DataSourceUtils.getConnection(dataSource); // your datasource
Statement s = con.createStatement();
ResultSet rs = s.executeQuery(query); // your query
ResultSetMetaData rsmd = rs.getMetaData();
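One caveat with that raw approach: you bypass the template's resource management, so cleanup is yours (a short sketch using the same Spring helpers as in the answer below):
// Release everything manually when done, otherwise the connection leaks.
JdbcUtils.closeResultSet(rs);
JdbcUtils.closeStatement(s);
DataSourceUtils.releaseConnection(con, dataSource);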
Although I agree with @axtavt that ResultSetExtractor is preferred in a Spring environment, it does force you to execute the query.
The code below does not require you to do so, so the client code does not have to provide actual arguments for the query parameters:
public SomeResult getMetadata(String querySql) throws SQLException {
    Assert.hasText(querySql);
    DataSource ds = jdbcTemplate.getDataSource();
    Connection con = null;
    PreparedStatement ps = null;
    try {
        con = DataSourceUtils.getConnection(ds);
        ps = con.prepareStatement(querySql);
        ResultSetMetaData md = ps.getMetaData(); // <-- the query is compiled, but not executed
        return processMetadata(md);
    } finally {
        JdbcUtils.closeStatement(ps);
        DataSourceUtils.releaseConnection(con, ds);
    }
}
