Cassandra insert preparedStatement or mapper? - performance

We are inserting a few million records in one go into a Cassandra 3.0 database. The question is: which gives better performance: using the mapper (annotating our objects 'JPA' style) or using a prepared statement, which is prepared only once and then bound for every insert?
I read here that the mapper does an implicit prepared statement in the background, so performance should not differ. But I don't understand where it would keep that prepared statement. Or is the statement prepared for every insert, which would take away the advantage of preparing it at all? So the question is: mapper (JPA style) or PreparedStatement (JDBC style :-) )?

The Mapper keeps prepared statements in the Mapper instance, and instances of the Mapper class are kept in the MappingManager, so if you're recreating the MappingManager all the time, you're losing your prepared statements and getting worse performance...
If you go with prepared statements directly, then you need to keep them somewhere, together with the Session instance, which you should create only once and reuse.
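To make the trade-off concrete, here is a minimal sketch of both approaches with the DataStax Java driver 3.x. The contact point, keyspace, table, users collection, and User entity class are hypothetical placeholders:

import com.datastax.driver.core.*;
import com.datastax.driver.mapping.*;

// Create these once at application startup and reuse them.
Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
Session session = cluster.connect("my_ks");

// Option 1: prepare once, bind per insert.
PreparedStatement ps = session.prepare(
    "INSERT INTO users (id, name) VALUES (?, ?)");
for (User u : users) {
    session.execute(ps.bind(u.getId(), u.getName()));
}

// Option 2: mapper. The MappingManager caches Mapper instances,
// and each Mapper caches the prepared statements it generates.
MappingManager manager = new MappingManager(session);
Mapper<User> mapper = manager.mapper(User.class);
for (User u : users) {
    mapper.save(u);
}

Either way, the key point is that the Cluster, Session, and MappingManager live for the lifetime of the application.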

Related

Spring JdbcTemplate execute vs update

What is the difference between execute(String sql) and update(String sql) in JdbcTemplate?
If my statement is a straight CRUD and not an object creation DDL (as the execute javadoc implies), does it make sense to use execute vs the seemingly more lightweight update?
The method execute(String sql) returns void; a call simply completes if it succeeds without errors (see the execute(..) JavaDoc). As in plain JDBC, it should/can be used to define database schema elements (DDL), for instance with CREATE TABLE... statements.
By contrast, update(String sql) is typically used for DML statements which correspond to SQL INSERT/UPDATE/DELETE operations. In these data-manipulating cases, it is important from a programmer's perspective to know how many rows have been added/changed/deleted by the respective DML operation.
For this reason, the update(...) method returns a non-negative int value to let you know:
Returns:
the number of rows affected
As the JavaDoc indicates by using the term "typically" in its description, you could, however, use execute(String sql) to manipulate data without the need to use the returned int value. In theory, and for some DBMS implementations, this call could be some nanoseconds quicker, as no return value needs to be transferred.
Yet, from my personal and a programmer's perspective, you should use both operations with the DDL vs. DML distinction in mind, as by its nature update signals that a data manipulation operation is being conducted.
Hope it helps.
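As a concrete illustration of the DDL vs. DML split described above, a minimal JdbcTemplate sketch (the table and column names are made up, and dataSource is assumed to be configured elsewhere):

import org.springframework.jdbc.core.JdbcTemplate;

JdbcTemplate jdbc = new JdbcTemplate(dataSource);

// DDL: there is no meaningful row count, so execute(String) returning void fits.
jdbc.execute("CREATE TABLE person (id INT PRIMARY KEY, name VARCHAR(100))");

// DML: update(String) returns the number of affected rows.
int inserted = jdbc.update("INSERT INTO person (id, name) VALUES (1, 'Alice')");
int deleted = jdbc.update("DELETE FROM person WHERE id = 1");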

jdbc Statement or PreparedStatement without where

If I don't have a WHERE clause in a query, should I use Statement or PreparedStatement? Which one will be more efficient?
For example:
SELECT ID, NAME FROM PERSON
A prepared statement is precompiled to enhance efficiency. Also, the database caches the statement, which improves performance on later executions. Both can be of use even if you don't have variables in your statement, especially if the statement is executed often.
If it is executed once or very seldom, I'd say a normal Statement is fine. Otherwise I would use a PreparedStatement. But there's no way of being sure without benchmarking.
It depends on the implementation of the JDBC driver. Some vendors cache the statement, regardless of whether it is an instance of java.sql.Statement or java.sql.PreparedStatement. For simplicity, you could use java.sql.Statement. On the other hand, if you plan to add a parameter and execute the statement several times (in the same connection), use an instance of java.sql.PreparedStatement.
The javadoc for java.sql.PreparedStatement says:
This object can then be used to efficiently execute this statement multiple times.
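To illustrate the "multiple times" point, a minimal sketch of preparing the statement once and reusing it across executions (the connection setup is omitted):

// Prepared once; the driver (and possibly the server) can cache the plan
// and reuse it for every subsequent execution of this statement.
PreparedStatement ps = connection.prepareStatement("SELECT ID, NAME FROM PERSON");
for (int i = 0; i < 100; i++) {
    try (ResultSet rs = ps.executeQuery()) {
        while (rs.next()) {
            // process rs.getLong("ID") and rs.getString("NAME")
        }
    }
}
ps.close();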
Apart from what stonedsquirrel has mentioned, another point is that if you want to add a WHERE condition in the future, the change is easy; all you need to add in your code is the following:
PreparedStatement ps = con.prepareStatement("SELECT ID, NAME FROM PERSON WHERE NAME= ?");
ps.setString(1, getName(""));
....
...
However, if you are using Statement, you need to make more changes in your code.
So with PreparedStatement you make minimal changes when you need to add WHERE conditions.
On the other hand, with Statement it is quite easy to log or print the SQL query, whereas with PreparedStatement, logging or printing the final SQL statement is quite difficult and there is no direct, portable approach available.
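A common workaround for that logging drawback is to keep the SQL text and the bound values yourself and log them side by side; the SLF4J-style logger here is just an assumption about your setup:

String sql = "SELECT ID, NAME FROM PERSON WHERE NAME = ?";
String name = getName(""); // hypothetical helper, as in the snippet above
PreparedStatement ps = con.prepareStatement(sql);
ps.setString(1, name);
// Log the template and the parameters together, since PreparedStatement
// itself has no portable way to print the final statement.
logger.debug("Executing: {} with [{}]", sql, name);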

best way to kill thread that populates Java ResultSet object from Oracle DB

I have a background thread that is querying an Oracle database via a Select statement. The statement is populating a ResultSet Java object. If the query returns a lot of rows, the ResultSet object might get very large. If it's too large, I want to both cancel the background thread, but more importantly I want to cancel the thread that is creating the ResultSet object and eating up a lot of Java memory.
From what I have read so far online, java.sql.Statement's cancel() seems to be the best way to get this done. Can anyone confirm this? Is there a better way?
java.sql.Statement's close() also works; I could probably catch the ExhaustedResultset exception, but maybe that's not safe.
To clarify, I do not want the ResultSet or the thread - I want to discard both completely from memory.
This depends on the JDBC implementation: Statement.cancel() is a request to the JDBC driver class that may or may not do what you need or expect.
However, seeing as you are performing a SELECT (normally non-transactional), and seeing as the default row prefetch for the Oracle JDBC driver is 10, this should probably do the trick. See this answer for similar/related information:
When I call PreparedStatement.cancel() in a JDBC application, does it actually kill it in an Oracle database?
Canceling the thread doesn't solve your problem, if you really need the query results.
If you are concerned about using up too much memory, you can set the fetch size on the ResultSet, which limits the number of rows fetched at a time. You then have to consume the ResultSet as you go; if the data pile up in the structure you're copying the rows into, you're back to eating up memory.
Oracle has a great document on memory management depending on your driver version.
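A minimal sketch of both suggestions: a watcher thread calling cancel() on the running Statement, plus a bounded fetch size. The timeout, query, and connection setup are made-up placeholders:

final Statement stmt = connection.createStatement();
stmt.setFetchSize(100); // fetch at most 100 rows per round trip

// cancel() is specified to be callable from another thread while
// executeQuery() is still running on this one.
Thread watcher = new Thread(() -> {
    try {
        Thread.sleep(30_000); // hypothetical time limit
        stmt.cancel();        // asks the driver to abort the query
    } catch (SQLException | InterruptedException ignored) {
    }
});
watcher.start();

try (ResultSet rs = stmt.executeQuery("SELECT * FROM big_table")) {
    while (rs.next()) {
        // consume rows as you go instead of collecting them all in memory
    }
} catch (SQLException e) {
    // a cancelled statement typically surfaces here as an SQLException
}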

Spring JDBCTemplate vs Plain JDBC for inserting large numbers of records

We have to insert 2 million records across multiple tables, and right now we are writing to a CSV file and using db2 import to load it into the database.
We want to change this logic to some kind of JDBC. While looking into the options, I am confused by the choice between the Spring JDBC template and plain JDBC.
Let us say I want to insert 1 million records into 10 tables, 100 thousand each, all as simple JDBC statements (not prepared statements, because I don't know which table I am dealing with at runtime).
Whatever system we choose will need to handle inserting up to 15 million records for a peak request.
Which framework will be better?
If you want to move a lot of data, then using JDBC (or any library building on top of JDBC) may be a bad choice, compared to using bulk copy tools (like db2import). JDBC is going to be orders of magnitude slower, because
JDBC is a very chatty protocol, and
usually bulk copy tools relax constraints during the copy process.
The difference in time can be extreme: what takes the bulk copy tool 10 minutes can take hours using JDBC. You'll want to create a prototype and do some timings and be certain about what kind of performance you'll get before you commit to something like this.
If you're already using Spring, then you may as well use JdbcTemplate. It makes things a bit easier, and in some simple cases means you need not use the JDBC API directly yourself. Essentially, JdbcTemplate is a very thin wrapper around JDBC that removes some of your annoying boilerplate code.
As skaffman said, if you are already using Spring then your choice is probably JdbcTemplate. Specifically you may want to look at the batchUpdate() method. Here is a pretty good example of how it works. I've used it to insert a couple hundred thousand rows quickly with great success.
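For reference, a minimal batchUpdate() sketch of that pattern (the table is hypothetical, and dataSource is assumed to be configured elsewhere):

import java.util.ArrayList;
import java.util.List;
import org.springframework.jdbc.core.JdbcTemplate;

JdbcTemplate jdbc = new JdbcTemplate(dataSource);

String sql = "INSERT INTO foo (number) VALUES (?)";
List<Object[]> batch = new ArrayList<>();
for (int i = 0; i < 100_000; i++) {
    batch.add(new Object[] { i });
}
// One statement, many parameter sets: the driver can send these in
// batches instead of paying one round trip per row.
int[] counts = jdbc.batchUpdate(sql, batch);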
Consider JdbcSession from jcabi-jdbc. It's as simple as JDBC should be, for example (inserting a million records):
JdbcSession session = new JdbcSession(source);
for (int i = 0; i < 1000000; ++i) {
    session.sql("INSERT INTO foo (number) VALUES (?)")
        .set(i)
        .insert(new VoidHandler());
}
That's it.

Entity framework and performance

I am trying to develop my first web project using the Entity Framework. While I love the way you can use LINQ instead of writing SQL, I have some severe performance issues. I have a lot of unhandled data in a table which I would like to do a few transformations on and then insert into another table. I run through all the objects and then insert them into my new table. I need to do some small comparisons (which is why I need to insert the data into another table), but for the performance tests I have removed them. The following code (with approximately 12-15 properties to set) took 21 seconds, which is quite a long time. Is it usually this slow, and what might I be doing wrong?
DataLayer.MotorExtractionEntities mee = new DataLayer.MotorExtractionEntities();
List<DataLayer.CarsBulk> carsBulkAll = ((from c in mee.CarsBulk select c).Take(100)).ToList();
foreach (DataLayer.CarsBulk carBulk in carsBulkAll)
{
    DataLayer.Car car = new DataLayer.Car();
    car.URL = carBulk.URL;
    car.color = carBulk.SellerCity.ToString();
    // ... more properties (car.year, etc.) are set the same way
    mee.AddToCar(car);
}
mee.SaveChanges();
You cannot create batch updates using Entity Framework.
Imagine you need to update rows in a table with a SQL statement like this:
UPDATE table SET col1 = @a WHERE col2 = @b
Using SQL this is just one roundtrip to the server. Using the Entity Framework, you have (at least) one roundtrip to the server to load all the data; then you modify the rows on the client, and then it sends them back row by row.
This will slow things down, especially if your network connection is limited and if you have more than just a couple of rows.
So for this kind of updates a stored procedure is still a lot more efficient.
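For comparison, a single-round-trip version of that update, written in plain JDBC to match the rest of this page (the column values are placeholders):

// One statement, one round trip: the database updates the rows in place
// instead of loading them to the client and writing them back row by row.
try (PreparedStatement ps = connection.prepareStatement(
        "UPDATE table SET col1 = ? WHERE col2 = ?")) {
    ps.setString(1, aValue);
    ps.setString(2, bValue);
    int updated = ps.executeUpdate();
}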
I have been experimenting with the entity framework quite a lot and I haven't seen any real performance issues.
Which line of your code is causing the big delay? Have you tried debugging it and measuring which method takes the most time?
Also, the complexity of your database structure could slow the entity framework down a bit, but not to the degree you are describing. Are there some 'infinite loops' in your DB structure? Without the DB structure it is really hard to say what's wrong.
Can you try the same in straight SQL?
The problem might be related to your database and not to the Entity Framework. For example, if you have massive indexes and lots of check constraints, inserting can become slow.
I've also seen problems at insert time with databases which had never been backed up. The transaction log could not be reclaimed and was growing insanely, causing a single insert to take a few seconds.
Trying this in SQL directly would tell you if the problem is indeed with EF.
I think I solved the problem. I had been running the app locally while the database is in another country (a neighboring one, but nevertheless). I tried loading the application onto the server and running it from there, and it then took only 2 seconds instead of 20. I also tried transferring 1000 records, which took 26 seconds; that is quite an increase, though I don't know if this is the "regular" speed for saving 1000 records to the database.
