I am trying to see if there is a way to improve the way data is inserted and updated.
I am using ORACLE DB with JDBC.
Currently I update, for example, customer records by looping over them after checking whether toUpdate is true, as in the sample code below, and calling an existing DAO update() for each record. But this does not allow me to UPSERT multiple records together.
Is there a better way to UPSERT multiple records together?
if (toUpdate) {
    for (Customer customerRec : customerRecList) {
        customerRecDAO.update(customerRec);
    }
}
Yes, you can use batching:
public <T> int saveInBatch(List<T> records, String sql, Function<T, MapSqlParameterSource> paramFn) {
    try {
        // map each record to its named parameters
        MapSqlParameterSource[] params = records.stream().map(paramFn).toArray(MapSqlParameterSource[]::new);
        // jdbcTemplate here is a NamedParameterJdbcTemplate; batchUpdate returns the affected row count per statement
        int[] rowCounts = jdbcTemplate.batchUpdate(sql, params);
        return Arrays.stream(rowCounts).sum();
    } catch (Exception e) {
        // exception handling
        return 0;
    }
}
paramFn is a function (typically a lambda) that maps a record to its parameter values. An example could be:
(record) -> {
    return new MapSqlParameterSource("username", record.getUsername()); // just an example; getUsername() is illustrative
}
See also: why we use MapSqlParameterSource.
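Since the question asks about upserting into Oracle, the sql passed to saveInBatch would typically be a MERGE statement. A minimal sketch, assuming a CUSTOMER table with CUSTOMER_ID and USERNAME columns (the table and column names are made up for illustration):

// Hypothetical Oracle MERGE used as the batched upsert statement;
// the named parameters must match the keys in the MapSqlParameterSource
String sql =
    "MERGE INTO customer c " +
    "USING (SELECT :customerId AS customer_id, :username AS username FROM dual) src " +
    "ON (c.customer_id = src.customer_id) " +
    "WHEN MATCHED THEN UPDATE SET c.username = src.username " +
    "WHEN NOT MATCHED THEN INSERT (customer_id, username) " +
    "VALUES (src.customer_id, src.username)";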
You can call saveInBatch in such a way that you pass smaller, customized batches of records. Suppose you have a million records; you may want to update only 200-400 records at a time, so you can do something like below:
private <T> int saveRecords(List<T> records, String sql, Function<T, MapSqlParameterSource> paramFn) throws Exception {
    // Lists.partition is from Guava; it splits the list into chunks of 300 records
    return Lists.partition(records, 300).stream()
            .map(batch -> saveInBatch(batch, sql, paramFn))
            .mapToInt(Integer::intValue)
            .sum();
}
Note: the above is not perfectly optimized and the streams are not used to their best, but it is working code I used a while back :).
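Putting it together with the Customer list from the question, a call could look like this (the getter names and parameter keys are assumptions for illustration):

// Hypothetical wiring of the pieces above; getCustomerId()/getUsername() are assumed accessors on Customer
int affected = saveRecords(
        customerRecList,
        sql, // e.g. the MERGE statement sketched above
        c -> new MapSqlParameterSource()
                .addValue("customerId", c.getCustomerId())
                .addValue("username", c.getUsername()));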
I have already gone through many links like Spring Batch - Skip Record On Process, and I am simply looking to validate the records in the processor before writing them to MongoDB.
I have 500 records in the Oracle DB. On the 162nd record, line 1 of the code below is satisfied, and after that no other records are considered for writing. Instead of 500 records I expect to get 480 records; the 20 records whose EFFECTIVE_DATE is null are the ones I want to skip and not write.
public class StudentRowMapper implements RowMapper<Student> {
    @Override
    public Student mapRow(ResultSet rs, int rowNum) throws SQLException {
        if (rs.getString("EFFECTIVE_DATE") == null) { // Line-1
            return null;
        }
        else {
            Student student = new Student();
            student.setRowIdObject(rs.getInt("PK_ID"));
            // ... remaining fields mapped here ...
            return student;
        }
    }
}
Agreed with @Mahmoud. You can also:
Add this filter on the query of your MongoDB reader: "{ EFFECTIVE_DATE: null }"
Return null in your processor (see the sketch below)
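A minimal sketch of the second option, assuming the Student item exposes a getEffectiveDate() accessor (the accessor name is an assumption, not from the original post):

import org.springframework.batch.item.ItemProcessor;

public class EffectiveDateFilteringProcessor implements ItemProcessor<Student, Student> {
    @Override
    public Student process(Student student) {
        // returning null tells Spring Batch to filter the item out; it never reaches the writer
        return student.getEffectiveDate() == null ? null : student;
    }
}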
"simply looking to validate the records in the processor before writing it to the MongoDB."
ValidatingItemProcessor is what you are looking for. It allows you to validate items and skip them or filter them (see filter parameter) before passing them to the writer.
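A minimal sketch of that approach, again assuming a getEffectiveDate() accessor on Student (an assumption, not from the original post):

import org.springframework.batch.item.validator.ValidatingItemProcessor;
import org.springframework.batch.item.validator.ValidationException;

public ValidatingItemProcessor<Student> studentProcessor() {
    ValidatingItemProcessor<Student> processor = new ValidatingItemProcessor<>(student -> {
        if (student.getEffectiveDate() == null) {
            throw new ValidationException("EFFECTIVE_DATE is null");
        }
    });
    processor.setFilter(true); // filter = true drops invalid items instead of failing the step
    return processor;
}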
I am new to Scala and I am looking at a bit of Scala code which involves a Spring JDBC template and a RowMapper:
It is something like this:
val db = jdbcTemplate.queryForObject(QUERY, new RowMapper[SomeObject]() {
  def mapRow(rs: ResultSet, rowNum: Int): SomeObject = {
    val s = new SomeObject()
    s.setParam1(rs.getDouble("columnName"))
    return s
  }
})
db
I am writing this from memory so I have just used generic names.
I was wondering why db is written at the end. I can't think of what purpose it serves.
Also, if I had several JDBC templates and an object like s in the example, where I wanted to populate its data with the output from the several JDBC templates, is it possible to do this in one function? Is it possible to have a mapRow function which doesn't return anything, so that I could maybe have an array of templates and loop through them?
Thanks
The db at the end means return db, with the return keyword omitted: the value of the last expression in a block is the block's result. This is a standard convention in Scala. It looks like your code is the body of a function which is supposed to return db; the first statement simply assigns the result of the query to db.
The RowMapper interface can be replaced, via an implicit conversion, with a function of type (ResultSet, Int) => SomeObject, which takes two parameters (a ResultSet and an Int) and returns a result of type SomeObject.
I am executing one select query using the Spring JdbcTemplate and it returns nearly 1000 ids as a set, but the execution takes 10 minutes through the Spring JDBC template.
In Toad, the same query executes within seconds.
Can anyone please help me with this?
I am using the code below:
return (HashSet) this.jdbcTemplate.query(
        (String) sqlMap.get("SQL_NRChargePromoApIDList"), new Object[] {}, new DataMapperAPID());

public Object mapRow(ResultSet rs, int rowNum) throws SQLException {
    HashSet compList = new HashSet();
    compList.add(rs.getString("ap_id"));
    while (rs.next()) {
        compList.add(rs.getString("ap_id"));
    }
    return compList;
}
You don't need to call rs.next(): the whole point of the RowMapper that you pass in to JdbcTemplate.query() is that it automatically iterates across all the rows in the result set and inserts each mapped object into a list, which it then returns. The row mapper should simply extract the ap_id for the current row and return it. Upon completion, you will get back a List.
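A minimal sketch of a mapper along those lines (jdbcTemplate and the sqlMap lookup are taken from the question; everything else is standard Spring JDBC):

import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import org.springframework.jdbc.core.RowMapper;

public class DataMapperAPID implements RowMapper<String> {
    @Override
    public String mapRow(ResultSet rs, int rowNum) throws SQLException {
        // map exactly one row to one ap_id; JdbcTemplate collects the results into a List
        return rs.getString("ap_id");
    }
}

// usage: let the template iterate the ResultSet, then build the set from the returned list
List<String> apIds = jdbcTemplate.query(
        (String) sqlMap.get("SQL_NRChargePromoApIDList"), new DataMapperAPID());
Set<String> apIdSet = new HashSet<>(apIds);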
Is there any way I can get a ResultSet object from one of the JdbcTemplate query methods?
I have code like:
List<ResultSet> rsList = template.query(finalQuery, new RowMapper<ResultSet>() {
    public ResultSet mapRow(ResultSet rs, int rowNum) throws SQLException {
        return rs;
    }
});
I want to execute my SQL statement stored in the finalQuery String and get the ResultSet. The query is a complex join on 6 to 7 tables and I am selecting 4-5 columns from each table; I want the metadata of those columns to transform data types and data for downstream systems.
If it were a simple query fetching from only one table, I could use RowMapper#mapRow and inside that mapRow method call ResultSetExtractor.extractData to get the list of results; but in this case my query has complex joins, and I am trying to get the ResultSet object and, from that ResultSet, the metadata...
The above code is not good because it returns the same ResultSet object for each result, and I don't want to store them in a list...
One more thing: if mapRow is called for each result from my query, will JdbcTemplate close the ResultSet and connection even though my list holds references to the ResultSet object?
Is there any simple method like jdbcTemplate.queryForResultSet(sql)?
Now I have implemented my own ResultSetExtractor to process and insert the data into the downstream systems:
sourceJdbcTemplate.query(finalQuery, new CustomResultSetProcessor(targetTable, targetJdbcTemplate));
This CustomResultSetProcessor implements ResultSetExtractor, and in its extractData method I call 3 different methods: the first gets the column types from rs.getMetaData(), the second gets the column types of the target table metadata by running
SELECT NAME, COLTYPE, TBNAME FROM SYSIBM.SYSCOLUMNS WHERE TBNAME ='TABLENAME' AND TABCREATOR='TABLE CREATOR'
and the third builds the (prepared) insert statement from the target column types, which I finally execute using
new BatchPreparedStatementSetter()
{
    @Override
    public void setValues(PreparedStatement insertStmt, int i) throws SQLException { /* set values for row i */ }

    @Override
    public int getBatchSize() { return batchSize; } // batchSize: number of rows to insert
}
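For illustration, the batch insert on the target template could be wired up roughly like this (targetJdbcTemplate, insertSql and rows are assumed names, not from the original post):

import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;
import org.springframework.jdbc.core.BatchPreparedStatementSetter;
import org.springframework.jdbc.core.JdbcTemplate;

// Hypothetical helper: rows holds one Object[] of column values per insert, built in extractData
static int[] insertRows(JdbcTemplate targetJdbcTemplate, String insertSql, final List<Object[]> rows) {
    return targetJdbcTemplate.batchUpdate(insertSql, new BatchPreparedStatementSetter() {
        @Override
        public void setValues(PreparedStatement ps, int i) throws SQLException {
            Object[] row = rows.get(i);
            for (int col = 0; col < row.length; col++) {
                ps.setObject(col + 1, row[col]); // JDBC parameter indexes are 1-based
            }
        }
        @Override
        public int getBatchSize() {
            return rows.size();
        }
    });
}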
Hope this helps to others...
Note that the whole point of the Spring JDBC template is that it automatically closes all resources, including the ResultSet, after execution of the callback method. Therefore it is better to extract the necessary data inside the callback method and allow Spring to close the ResultSet afterwards.
If the result of data extraction is not a List, you can use ResultSetExtractor instead of RowMapper:
SomeComplexResult r = template.query(finalQuery,
        new ResultSetExtractor<SomeComplexResult>() {
            public SomeComplexResult extractData(ResultSet rs) throws SQLException, DataAccessException {
                // do complex processing of the ResultSet and return its result as SomeComplexResult
            }
        });
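For the metadata use case in the question, the extractor can read the ResultSetMetaData inside the callback, before Spring closes the ResultSet. A minimal sketch (template and finalQuery come from the question; the "name:type" formatting is just for illustration):

import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import org.springframework.dao.DataAccessException;
import org.springframework.jdbc.core.ResultSetExtractor;

List<String> columnInfo = template.query(finalQuery, new ResultSetExtractor<List<String>>() {
    @Override
    public List<String> extractData(ResultSet rs) throws SQLException, DataAccessException {
        ResultSetMetaData md = rs.getMetaData();
        List<String> columns = new ArrayList<String>();
        for (int i = 1; i <= md.getColumnCount(); i++) {
            // collect "name:type" for every column of the joined result
            columns.add(md.getColumnName(i) + ":" + md.getColumnTypeName(i));
        }
        return columns;
    }
});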
Something like this would also work:
Connection con = DataSourceUtils.getConnection(dataSource); // your datasource
Statement s = con.createStatement();
ResultSet rs = s.executeQuery(query); // your query
ResultSetMetaData rsmd = rs.getMetaData();
Although I agree with @axtavt that ResultSetExtractor is preferred in a Spring environment, it does force you to execute the query.
The code below does not require you to do so, so the client code does not have to provide the actual arguments for the query parameters:
public SomeResult getMetadata(String querySql) throws SQLException {
    Assert.hasText(querySql);
    DataSource ds = jdbcTemplate.getDataSource();
    Connection con = null;
    PreparedStatement ps = null;
    try {
        con = DataSourceUtils.getConnection(ds);
        ps = con.prepareStatement(querySql);
        ResultSetMetaData md = ps.getMetaData(); // <-- the query is compiled, but not executed
        return processMetadata(md);
    } finally {
        JdbcUtils.closeStatement(ps);
        DataSourceUtils.releaseConnection(con, ds);
    }
}
In trying to solve:
Linq .Contains with large set causes TDS error
I think I've stumbled across a solution, and I'd like to see if it's a kosher way of approaching the problem.
(short summary) I'd like to linq-join against a list of record IDs that aren't (wholly or at least easily) generated in SQL. It's a big list and frequently blows past the 2100-item limit for the TDS RPC call. So what I'd have done in SQL is thrown them in a temp table, and then joined against that when I needed them.
So I did the same in Linq.
In my MyDB.dbml file I added:
<Table Name="#temptab" Member="TempTabs">
<Type Name="TempTab">
<Column Name="recno" Type="System.Int32" DbType="Int NOT NULL"
IsPrimaryKey="true" CanBeNull="false" />
</Type>
</Table>
Opening the designer and closing it added the necessary entries there, although for completeness, I will quote from the MyDB.designer.cs file:
[Table(Name="#temptab")]
public partial class TempTab : INotifyPropertyChanging, INotifyPropertyChanged
{
    private static PropertyChangingEventArgs emptyChangingEventArgs = new PropertyChangingEventArgs(String.Empty);
    private int _recno;

    #region Extensibility Method Definitions
    partial void OnLoaded();
    partial void OnValidate(System.Data.Linq.ChangeAction action);
    partial void OnCreated();
    partial void OnrecnoChanging(int value);
    partial void OnrecnoChanged();
    #endregion

    public TempTab()
    {
        OnCreated();
    }

    [Column(Storage="_recno", DbType="Int NOT NULL", IsPrimaryKey=true)]
    public int recno
    {
        get
        {
            return this._recno;
        }
        set
        {
            if ((this._recno != value))
            {
                this.OnrecnoChanging(value);
                this.SendPropertyChanging();
                this._recno = value;
                this.SendPropertyChanged("recno");
                this.OnrecnoChanged();
            }
        }
    }

    public event PropertyChangingEventHandler PropertyChanging;
    public event PropertyChangedEventHandler PropertyChanged;

    protected virtual void SendPropertyChanging()
    {
        if ((this.PropertyChanging != null))
        {
            this.PropertyChanging(this, emptyChangingEventArgs);
        }
    }

    protected virtual void SendPropertyChanged(String propertyName)
    {
        if ((this.PropertyChanged != null))
        {
            this.PropertyChanged(this, new PropertyChangedEventArgs(propertyName));
        }
    }
}
Then it simply became a matter of juggling around some things in the code. Where I'd normally have had:
MyDBDataContext mydb = new MyDBDataContext();
I had to get it to share its connection with a normal SqlConnection so that I could use the connection to create the temporary table. After that it seems quite usable.
string connstring = "Data Source.... etc..";
SqlConnection conn = new SqlConnection(connstring);
conn.Open();
SqlCommand cmd = new SqlCommand("create table #temptab " +
"(recno int primary key not null)", conn);
cmd.ExecuteNonQuery();
MyDBDataContext mydb = new MyDBDataContext(conn);
// Now insert some records (1 shown for example)
TempTab tt = new TempTab();
tt.recno = 1;
mydb.TempTabs.InsertOnSubmit(tt);
mydb.SubmitChanges();
And using it:
// Through normal SqlCommands, etc...
cmd = new SqlCommand("select top 1 * from #temptab", conn);
Object o = cmd.ExecuteScalar();
// Or through Linq
var t = from tx in mydb.TempTabs
from v in mydb.v_BigTables
where tx.recno == v.recno
select tx;
Does anyone see a problem with this approach as a general-purpose solution for using temporary tables in joins in Linq?
It solved my problem wonderfully, as now I can do a straightforward join in Linq instead of having to use .Contains().
Postscript:
The one problem I do have is that mixing LINQ and regular SqlCommands on the table (where one is reading/writing and so is the other) can be hazardous. Always using SqlCommands to insert into the table, and then LINQ commands to read it, works out fine. Apparently LINQ caches results; there's probably a way around it, but it wasn't obvious.
I don't see a problem with using temporary tables to solve your problem. As far as mixing SqlCommands and LINQ, you are absolutely correct about the hazard factor. It's so easy to execute your SQL statements using a DataContext, I wouldn't even worry about the SqlCommand:
private string _ConnectionString = "<your connection string>";

public void CreateTempTable()
{
    using (MyDBDataContext dc = new MyDBDataContext(_ConnectionString))
    {
        dc.ExecuteCommand("create table #temptab (recno int primary key not null)");
    }
}

public void DropTempTable()
{
    using (MyDBDataContext dc = new MyDBDataContext(_ConnectionString))
    {
        dc.ExecuteCommand("DROP TABLE #TEMPTAB");
    }
}

public void YourMethod()
{
    CreateTempTable();
    using (MyDBDataContext dc = new MyDBDataContext(_ConnectionString))
    {
        // ... do whatever you want (within reason) ...
    }
    DropTempTable();
}
We have a similar situation, and while this works, the issue becomes that you aren't really dealing with Queryables, so you cannot easily use this "with" LINQ. This isn't a solution that works with method chains.
Our final solution was just to throw what we want in a stored procedure, and write selects in that procedure against the temp tables when we want those values. It is a compromise, but both are workarounds. At least with the stored proc the designer will generate the calling code for you, and you have a black boxed implementation so if you need to do further tuning you can do so strictly within the procedure, without a recompile.
In a perfect world, there will be some future support for writing LINQ to SQL statements that allow you to dictate the use of temp tables within your queries, avoiding the nasty SQL IN statement for complex scenarios like this one.
As a "general-purpose solution", what if you code is run in more than one threads/apps? I think big-list solution is always related to the problem domain. It's better to use a regular table for the problem you are working on.
I once created a "generic" list table in a database. The table was created with three columns: int, uniqueidentifier and varchar, along with other columns to manage each list. I was thinking: "it ought to be enough to handle many cases". But soon I received a task that required a join to be performed with a list on three integers. After that, I never tried to create a "generic" list table again.
Also, it's better to create a stored procedure to insert multiple items into the list table in each database call. You can easily insert ~2000 items in fewer than 2 DB round trips. Of course, depending on what you are doing, performance may not matter.
EDIT: I forgot it is a temporary table, and a temporary table is per connection, so my previous argument about multiple threads was not proper. But it is still not a general solution, because it enforces a fixed schema.
Would the solution offered by Neil actually work? If it's a temporary table, and each of the methods is creating and disposing its own data context, I don't think the temporary table would still be there after the connection was dropped.
Even if it was there, I think this would be an area where you are assuming some functionality of how queries and connections end up being rendered, and that's one of the big issues with LINQ to SQL: you just don't know what might happen down the track as the engineers come up with better ways of doing things.
I'd do it in a stored proc. You can always return the result set into a pre-defined table if you wish.