JOOQ vs JDBC+tests - jdbc

What advantages does jOOQ have over JDBC+tests?
In JDBC you can write SQL queries directly in code; with jOOQ we call methods, so jOOQ is slower by default.
In jOOQ it is harder to make mistakes, but not impossible. These mistakes can be caught in tests, and with jOOQ you should also write these tests, so there is no advantage here for jOOQ.

I completely agree with you. It's always better to have tests, regardless of whether you're using a "dynamic language" (SQL as an external DSL, e.g. JDBC) or a "static language" (SQL as an internal DSL, e.g. jOOQ).
But there's much more than that:
You seem to have scratched only the surface of what jOOQ can do for you. Sure, type safe, embedded SQL is a great feature, but once you have that, you get for free (list is far from exhaustive):
Active records: With JDBC, you're back to spelling out each individual boring INSERT, UPDATE, DELETE statement manually. jOOQ's UpdatableRecord greatly simplifies this, while offering things like:
RecordListener for record lifecycle management
Optimistic locking
Batching of inserts, updates, deletes
Dynamic SQL is very easy. Instead of that JDBC string concatenation mess that you'd be getting otherwise, you can just dynamically add clauses to your SQL statements.
Multi tenancy: You can easily switch schema references and/or table references at runtime in order to run the same query against a different schema.
Standardisation: The same jOOQ query runs on up to 21 RDBMS because the jOOQ API standardises the generated SQL. This can be seen in the jOOQ manual's section about the LIMIT clause, for instance - one of SQL's most poorly standardised clauses
Query Lifecycle: There's a simple SPI called ExecuteListener that allows you to hook into the various JDBC interaction steps, including:
SQL generation
Prepared statement creation
Variable binding
Execution
Result fetching
Exceptions
SQL transformation: The VisitListener SPI allows you to intercept the SQL generation at any arbitrary position in your query expression tree. This can be very useful, e.g. to implement powerful things like row level security.
Stored procedures: These are rather tedious to bind to with JDBC, especially if you're using more advanced features like:
Oracle's TABLE and OBJECT types (imagine implementing SQLData et al.)
Oracle's PL/SQL types
Implicit cursors
Table-valued functions
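To make the "string concatenation mess" from the dynamic SQL point above concrete, here is a minimal sketch of what hand-built dynamic SQL tends to look like in plain Java. The class name, table and column names (book, title, published_year) are made up for illustration; nothing here is jOOQ or JDBC API, it only builds the string and the bind list that would then be passed to a PreparedStatement.

```java
import java.util.List;

class DynamicSqlSketch {

    /**
     * Builds a SELECT with optional filters by hand.
     * Bind values are appended to 'binds' in placeholder order.
     * (Hypothetical table and column names.)
     */
    static String buildSql(String title, Integer year, List<Object> binds) {
        StringBuilder sql = new StringBuilder("SELECT id, title FROM book");
        // The separator has to be tracked manually: the first predicate
        // needs WHERE, every later one needs AND.
        String separator = " WHERE ";
        if (title != null) {
            sql.append(separator).append("title LIKE ?");
            binds.add(title);
            separator = " AND ";
        }
        if (year != null) {
            sql.append(separator).append("published_year = ?");
            binds.add(year);
        }
        return sql.toString();
    }
}
```

Every optional clause adds another branch that has to keep the SQL text and the bind list in sync, which is the bookkeeping a query-building API takes over.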
Of course, you get the compile-time type safety that you've mentioned (and IDE autocompletion) for free. And if you don't want to go all in on the internal DSL that jOOQ is offering, you can still just use plain SQL and then use jOOQ's API in a non-type safe way (which still has tons of features):
// Just one example: CSV exports
String csvExport =
    ctx.fetch("SELECT * FROM my_table WHERE id = ?", 3)
       .formatCSV();
TL;DR:
JDBC is a wire protocol abstraction API
jOOQ is a SQL API
Disclaimer:
(Of course, this answer is biased as I work for the company behind jOOQ)

Related

Is better Linq or SQL query for complex calculations and aggregations?

We must create and show at runtime (ASP.NET MVC) some complex reports from Oracle table data with millions of records. The report data must be obtained from groupings and slightly complex calculations.
So is it better for performance and code maintainability to do these groupings and calculations via a SQL query (PL/SQL) or via LINQ?
Thanks for your kind reply
So is it better for performance and code maintainability to do these groupings and calculations via a SQL query (PL/SQL) or via LINQ?
It depends on what you mean by "via LINQ". If you mean that you fetch the complete table to local memory and then use LINQ statements to extract the result that you want, then of course SQL statements are faster.
However, if you mean that you use Entity Framework, or something similar, then the answer is not as easy to give.
If you use Entity Framework (or some clone), your tables will be represented by IQueryable<...> instead of IEnumerable<...>. An IQueryable has an Expression and a Provider. The Expression represents the query that must be performed. The Provider knows which system must execute the query (usually a Database Management System) and how to communicate with this system. When the query must be executed, it is the task of the Provider to translate the Expression into the language that the system knows (usually something SQL-like) and to execute the SQL-query.
There are two kinds of IQueryable LINQ statements: those that return an IQueryable<...> of something, and those that return a TResult. The ones that return IQueryable only change the Expression. They are functions that use deferred execution.
Functions that do not return an IQueryable are ToList(), FirstOrDefault(), Any(), Max(), etc. Internally they call functions that call GetEnumerator() (usually via a foreach), which orders the Provider to translate the Expression and execute the query.
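The deferred-execution idea is not specific to .NET. As a rough analogy only (this is Java's Stream API, not LINQ, and it works on in-memory data rather than translating to SQL), the sketch below shows that building a pipeline does no work until a terminal operation asks for the result. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class DeferredExecutionSketch {

    /** Returns a log of the order in which things actually happen. */
    static List<String> run() {
        List<String> log = new ArrayList<>();

        // Intermediate operation: only describes the query, nothing runs yet.
        Stream<Integer> pipeline = Stream.of(1, 2, 3, 4)
            .filter(n -> {
                log.add("filter " + n);
                return n % 2 == 0;
            });
        log.add("pipeline built");

        // Terminal operation: this is the point where the filter is executed.
        List<Integer> result = pipeline.collect(Collectors.toList());
        log.add("result " + result);
        return log;
    }
}
```

The first log entry is "pipeline built" and the filter entries only appear afterwards, which mirrors how an IQueryable is only executed once ToList() or a similar function is called.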
Back to your question
So which one is more efficient, entity framework or SQL? Efficiency is not only the time to perform the queries, it is also the development/testing time, for the first version and for future changes in the software.
If you use an entity-framework (-clone), the SQL-queries created from the Expressions are pretty efficient, depending on the framework manufacturer. If you look at the code, then sometimes the SQL query is not the optimal one, although you'll have to be a pretty good SQL-programmer to improve most queries.
The big advantage of using Entity Framework and LINQ queries over SQL statements is that development times will be shorter. The syntax of LINQ statements is checked at compile time, that of SQL statements at run time. Development and test periods will be shorter.
It is easy to reuse LINQ statements, while SQL statements almost always have to be written especially for the query you want to execute. LINQ statements can be tested without a database on any sequence of items that represent your tables.
My Advice
For most queries you won't notice any difference in execution time between the entity framework query or the SQL query.
If you expect complicated queries and future changes, I'd go for Entity Framework. The main arguments are the shorter development time, the better testing possibilities, and the better maintainability.
If you detect some queries where you notice that the execution time is too long, you can always decide to bypass entity framework by executing a SQL query instead of using LINQ.
If you've wrapped your DbContext in a proper repository, where you hide the use cases from their implementations, the users of your repository won't notice the difference.

Repository Pattern Contestation

According to Martin Fowler:
... "Client objects construct query specifications declaratively and submit them to Repository for satisfaction" ...
Why? What are the advantages at that point?
I see one disadvantage: database queries are spread and hidden across tiers. That makes it harder to debug.
The advantage is that the "what" (the declarative specification) is separated from the "how" or implementation details. So the client doesn't need to know whether it's querying a relational database, a Web service, an object database (eg Mongo), an XML data store, etc.
Let's assume you're using an RDBMS. Even so, the client is isolated from needing to know whether the database is Oracle, MS SQL, SQLite, MySQL, PostgreSQL, etc. This will save you a lot of headaches when the commandment "thou shalt (not) use MS SQL" (or whatever) comes down from the mountain.
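A minimal sketch of that separation, under simplifying assumptions: the specification is modelled as a plain java.util.function.Predicate, and the only implementation shown is an in-memory one. All names (Customer, CustomerRepository, InMemoryCustomerRepository, namesIn) are hypothetical. A database-backed implementation would need a specification type it can translate into a query, which an opaque Predicate does not allow.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class RepositorySketch {

    static final class Customer {
        final String name;
        final String country;

        Customer(String name, String country) {
            this.name = name;
            this.country = country;
        }
    }

    /** The "what": clients hand over a specification and get matches back. */
    interface CustomerRepository {
        List<Customer> matching(Predicate<Customer> specification);
    }

    /** One possible "how": a list in memory, handy for tests. */
    static final class InMemoryCustomerRepository implements CustomerRepository {
        private final List<Customer> rows;

        InMemoryCustomerRepository(List<Customer> rows) {
            this.rows = new ArrayList<>(rows);
        }

        @Override
        public List<Customer> matching(Predicate<Customer> specification) {
            List<Customer> result = new ArrayList<>();
            for (Customer c : rows) {
                if (specification.test(c)) {
                    result.add(c);
                }
            }
            return result;
        }
    }

    /** Client code: depends only on the interface, not on the storage. */
    static List<String> namesIn(CustomerRepository repo, String country) {
        List<String> names = new ArrayList<>();
        for (Customer c : repo.matching(c -> c.country.equals(country))) {
            names.add(c.name);
        }
        return names;
    }
}
```

namesIn would work unchanged against any other CustomerRepository implementation, which is the isolation described above.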
The additional layer does introduce some overhead. But (1) ORM tools like (N)Hibernate are quite good at optimizing the generated queries for whatever back-end you're using, and (2) the overhead is generally negligible compared to the cost of a database read, let alone a web service call.
We're converting from LINQ to NHibernate right now to avoid the "N+1" problem (i.e. one query/hit for the list of "master" database records, plus one more query/hit per master record to load its "child" records).
And BTW ... there is such a thing as LINQ to NHibernate.
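The "N+1" effect is easy to see by counting round trips. The sketch below is a plain Java simulation with a made-up FakeDb class that only counts calls; it is not NHibernate, LINQ, or any real data access API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class NPlusOneSketch {

    /** Stand-in for a database: every method call counts as one query. */
    static final class FakeDb {
        int queries = 0;
        private final Map<Integer, List<String>> childrenByMaster = new LinkedHashMap<>();

        FakeDb() {
            childrenByMaster.put(1, Arrays.asList("a", "b"));
            childrenByMaster.put(2, Arrays.asList("c"));
            childrenByMaster.put(3, Arrays.asList("d", "e"));
        }

        List<Integer> selectMasterIds() {
            queries++;
            return new ArrayList<>(childrenByMaster.keySet());
        }

        List<String> selectChildrenOf(int masterId) {
            queries++;
            return childrenByMaster.get(masterId);
        }

        /** Simulates a single joined query returning masters with children. */
        Map<Integer, List<String>> selectMastersWithChildren() {
            queries++;
            return new LinkedHashMap<>(childrenByMaster);
        }
    }

    /** One query for the masters, then one per master: N+1 queries. */
    static int lazyLoad(FakeDb db) {
        for (int id : db.selectMasterIds()) {
            db.selectChildrenOf(id);
        }
        return db.queries;
    }

    /** Everything fetched in one round trip. */
    static int eagerLoad(FakeDb db) {
        db.selectMastersWithChildren();
        return db.queries;
    }
}
```

With the three master rows above, lazyLoad issues 4 queries and eagerLoad issues 1; with thousands of master rows the difference becomes the dominant cost.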

ORM for Oracle pl/sql

I am developing enterprise software for a big company using Oracle. The major processing unit is planned to be developed in PL/SQL. I am wondering whether there is any ORM like Hibernate for Java, but for PL/SQL. I have some ideas on how to build such a framework using PL/SQL and Oracle system tables, but I am curious: why has no one done this before? Do you think it would be efficient in speed and memory consumption? Why?
ORMs exist to provide an interface between a database-agnostic language like Java and a DBMS like Oracle. PL/SQL in contrast knows the Oracle DBMS intimately and is designed to work with it (and a lot more efficiently than Java + ORM can). So an ORM between PL/SQL and the Oracle DBMS would be both superfluous and unhelpful!
Take a read through these two articles - they contain some interesting points
Ask Tom - Relational VS Object Oriented Database Design
Ask Tom - Object relational impedance mismatch
As Tony pointed out ORMs really serve as helper between the App and Db context boundaries.
If you are looking for an additional level of abstraction at the database layer you might want to look into table encapsulation. This was a big trend back in the early 2000s. If you search you will find a ton of whitepapers on this subject.
Plsqlintgen still seems to be around at http://sourceforge.net/projects/plsqlintgen/
This answer has some relevant thoughts on the pros and cons of wrapping your tables in pl/sql TAPIs (Table APIs) for CRUD operations.
Understanding the differences between Table and Transaction API's
There was also a good panel discussion on this at last year's UK Oracle User Group - the overall conclusion was against using table APIs and for transaction APIs, for much the same reason - the strength of pl/sql is the procedural control of SQL statements, while TAPIs push you away from writing set-based SQL operations and towards row-by-row processing.
The argument for TAPI is where you may want to enforce some kind of access policy, but Oracle offers a lot of other ways to do this (fine-grained access control, constraints, triggers on insert/update/etc can be used to populate defaults and enforce that the calling code is passing a valid request).
I would definitely advise against wrapping tables in PL/SQL object types.
A lot of the productivity with pl/sql comes from the fact that you can easily define things in terms of the underlying database structure - a row record type can be declared simply with %ROWTYPE, and it automatically picks up changes to the table structure.
-- declaration section: the record mirrors the columns of myTable
myRec myTable%ROWTYPE;
-- executable section: insert the whole record in one statement
INSERT INTO myTable VALUES myRec;
This also applies to collections based over these types, and there are powerful bulk operations that can be used to fetch & insert whole collections.
On the other hand, object types must be changed explicitly each time - every table change would require the object type to be updated and released, doubling your work.
It can also be difficult to release changes if you are using inheritance and collections of types (you can 'replace' a package, but cannot replace a type once it is used by another type).
This isn't putting OO PL/SQL down - there are places where it definitely simplifies code (i.e. avoiding code duplication, anywhere you would clearly benefit from polymorphism) - but it is best to understand and play to the strengths of the language, and the main strength is that the language is tightly-coupled to the underlying DB.
That said, I do often find myself creating procedures to construct a default record, insert a record, etc - often enough to have editor macros for it - but I've never found a good argument for automatically generating this code for all tables (a good way to create a lot of unused code??)
Oracle is a relational database that can also work as an object-oriented database. It does this by building an abstraction layer (fairly automatically) on top of the relational structure. This would seemingly eliminate the need for any "tool" as it is already built-in.

What are the advantages of LINQ to SQL?

I've just started using LINQ to SQL on a mid-sized project, and would like to increase my understanding of what advantages L2S offers.
One disadvantage I see is that it adds another layer of code, and my understanding is that it has slower performance than using stored procedures and ADO.Net. It also seems that debugging could be a challenge, especially for more complex queries, and that these might end up being moved to a stored proc anyway.
I've always wanted a way to write queries in a better development environment, are L2S queries the solution I've been looking for? Or have we just created another layer on top of SQL, and now have twice as much to worry about?
Advantages L2S offers:
No magic strings, like you have in SQL queries
Intellisense
Compile check when database changes
Faster development
Unit of work pattern (context)
Auto-generated domain objects that are usable in small projects
Lazy loading.
Learning to write LINQ queries/lambdas is a must for .NET developers.
Regarding performance:
Most likely performance is not going to be a problem in most solutions. Optimizing prematurely is an anti-pattern. If you later see that some areas of the application are too slow, you can analyze these parts, and in some cases even swap some LINQ queries for stored procedures or ADO.NET.
In many cases the lazy loading feature can speed up performance, or at least simplify the code a lot.
Regarding debugging:
In my opinion debugging Linq2Sql is much easier than both stored procedures and ADO.NET. I recommend that you take a look at Linq2Sql Debug Visualizer, which enables you to see the query, and even trigger an execute to see the result when debugging.
You can also configure the context to write all SQL queries to the console window, more information here.
Regarding another layer:
Linq2Sql can be seen as another layer, but it is purely a data access layer. Stored procedures are also another layer of code, and I have seen many cases where part of the business logic has been implemented in stored procedures. This is much worse in my opinion because you are then splitting the business layer into two places, and it will be harder for developers to get a clear view of the business domain.
Just a few quick thoughts.
LINQ in general
Query in-memory collections and out-of-process data stores with the same syntax and operators
A declarative style works very well for queries - it's easier to both read and write in very many cases
Neat language integration allows new providers (both in and out of process) to be written and take advantage of the same query expression syntax
LINQ to SQL (or other database LINQ)
Writing queries where you need them rather than as stored procs makes development a lot faster: there are far fewer steps involved just to get the data you want
Far fewer strings (stored procs, parameter names or just plain SQL) involved where typos can be irritating; the other side of this coin is that you get Intellisense for your query
Unless you're going to work with the "raw" data from ADO.NET, you're going to have an object model somewhere anyway. Why not let LINQ to SQL handle it for you? I rather like being able to just do a query and get back the objects, ready to use.
I'd expect the performance to be fine - and where it isn't, you can tune it yourself or fall back to straight SQL. Using an ORM certainly doesn't remove the need for creating the right indexes etc, and you should usually check the SQL being generated for non-trivial queries.
It's not a panacea by any means, but I vastly prefer it to either making SQL queries directly or using stored procs.
I must say they are what you have been looking for. It takes some time getting used to it, but once you do you can't think of going back (at least for me).
Regarding LINQ vs. stored procedures, you can have poor performance with either if you build it wrong. I moved some awfully coded stored procedures of a client to LINQ to SQL, and the time dropped from 20 secs (totally unacceptable for a web app) to < 1 sec, with much, much less code than the stored procedure solution.
Update 1: Also you get a lot of flexibility, as you can limit the columns of what you are getting and it will actually only retrieve that. On the stored procedure solution you have to define a procedure for each column set you are getting, even if the underlying queries are the same.
Just as an update, here are some links on the future of LINQ to SQL:
What is the Future of Linq to SQL
Has Microsoft confirmed their stance on LINQ to SQL end-of-life?
Is LINQ to SQL Dead or Alive?
As a comment in the last link states, LINQ to SQL isn't going to go away, just not "improved upon" at least by Microsoft. Take these comments and posts as you will, just be cautious in your development plans.
We switched over to LINQ2Entity over the Entity Framework environment recently. Before, we had basic SQLadapters. Since the database we are working with is rather small, I cannot comment on the performance of LINQ.
I must admit though, writing queries has become a lot easier, and the addition of Entities does enable strong typing.

Is Hibernate good for batch processing? What about memory usage?

I have a daily batch process that involves selecting out a large number of records and formatting up a file to send to an external system. I also need to mark these records as sent so they are not transmitted again tomorrow.
In my naive JDBC way, I would prepare and execute a statement and then begin to loop through the recordset. As I only go forwards through the recordset there is no need for my application server to hold the whole result set in memory at one time. Groups of records can be fed across from the database server.
Now, let's say I'm using Hibernate. Won't I end up with a bunch of objects representing the whole result set in memory at once?
Hibernate also iterates over the result set, so only one row is kept in memory. This is the default. If you want it to load greedily, you must tell it so.
Reasons to use Hibernate:
"Someone" was "creative" with the column names (PRXFC0315.XXFZZCC12)
The DB design is still in flux and/or you want one place where column names are mapped to Java.
You're using Hibernate anyway
You have complex queries and you're not fluent in SQL
Reasons not to use Hibernate:
The rest of your app is pure JDBC
You don't need any of the power of Hibernate
You have complex queries and you're fluent in SQL
You need a specific feature of your DB to make the SQL perform
Hibernate offers some possibilities to keep the session small.
You can use Query.scroll(), Criteria.scroll() for JDBC-like scrolling. You can use Session.evict(Object entity) to remove entities from the session. You can use a StatelessSession to suppress dirty-checking. And there are some more performance optimizations, see the Hibernate documentation.
Hibernate, like any ORM framework, is intended for developing and maintaining systems based on object-oriented programming principles. But most databases are relational and not object-oriented, so an ORM is always a trade-off between convenient OOP programming and optimized, most effective DB access.
I wouldn't use ORM for specific isolated tasks, but rather as an overall architectural choice for application persistence layer.
In my opinion I would NOT use Hibernate, since it makes your application a whole lot bigger and less maintainable and you do not really have a chance of optimizing the generated sql-scripts in a quick way.
Furthermore you could use all the SQL functionality the JDBC-bridge supports and are not limited to the hibernate functionality. Another thing is that you have the limitations too that come along with each layer of legacy code.
But in the end it is a philosophical question and you should do it the way that fits your way of thinking best.
If there are possible performance issues then stick with the JDBC code.
There are a number of well known pure SQL optimisations which would be very difficult to do in Hibernate.
Only select the columns you use! (No "select *" stuff).
Keep the SQL as simple as possible, e.g. don't include small reference tables like currency codes in the join. Instead load the currency table into memory and resolve currency descriptions with a program lookup.
Depending on the DBMS, minor re-ordering of the SQL where predicates can have a major effect on performance.
If you are updating/inserting, only commit every 100 to 1000 updates, i.e. do not commit every unit of work but keep some counter so you commit less often.
Take advantage of the aggregate functions of your database. If you want totals by DEPT code then do it in the SQL with " SUM(amount) ... GROUP BY DEPT ".
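The "commit every 100 to 1000 updates" advice from the list above can be sketched as a simple counter. To keep the example runnable without a database, the update and the commit are passed in as callbacks; the class and method names are made up, and in real JDBC code the commit callback would wrap Connection.commit() with auto-commit switched off.

```java
import java.util.List;
import java.util.function.Consumer;

class CommitIntervalSketch {

    /**
     * Applies 'update' to every row and calls 'commit' once per
     * 'commitEvery' rows, plus once at the end for any remainder.
     * Returns the number of commits issued.
     */
    static <T> int processAll(List<T> rows, int commitEvery,
                              Consumer<T> update, Runnable commit) {
        if (commitEvery <= 0) {
            throw new IllegalArgumentException("commitEvery must be positive");
        }
        int commits = 0;
        int pending = 0; // rows updated since the last commit
        for (T row : rows) {
            update.accept(row);
            pending++;
            if (pending == commitEvery) {
                commit.run();
                commits++;
                pending = 0;
            }
        }
        if (pending > 0) { // flush the final partial batch
            commit.run();
            commits++;
        }
        return commits;
    }
}
```

For 250 rows and commitEvery = 100 this issues 3 commits instead of 250. The trade-off is that a failure mid-batch rolls back up to commitEvery - 1 already processed rows, so the job needs to be restartable.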

Resources