There are many relational algebra packages (arel, axiom, alf) which generate SQL from an abstract representation of a query.
Are there any libraries that allow you to go the other way - from SQL to a relational algebra?
No, I wouldn't count on it.
SQL is a horrendous language, parsing it is an immense task, and parsing it for the purpose of capturing the original algebraic intent is considered infeasible by just about the whole world, as far as I know.
And then I haven't even begun to mention the various ways in which vendors turn it into something that is actually no less than a completely proprietary language, despite a possible superficial resemblance to what is supposed to be a standard.
And even if such a package existed, what would you do with the output you obtained from it?
Apache Calcite might be what you're looking for.
Which paradigm is better for design and analysis of algorithms?
Which is faster? Because I have a subject called Design and Analysis of Algorithms in university and have a time limit for programs. Is OOP slower than Procedure programming? Or the time difference is not big?
Object-Oriented programming isn't particularly relevant to algorithms. Procedural programming you will need, but as far as algorithms are concerned, object-oriented programming is just another way to package up procedural programming. You have methods instead of functions and classes instead of records/structs, but the only relevant difference is run-time dispatch, and that's just a declarative way to handle a run-time decision that could have been handled some other way.
Object-Oriented programming is more relevant to the larger scale - design patterns etc - whereas algorithms are more relevant to the smaller scale involving a small number (often just one) of procedures.
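To illustrate the point about packaging, here is a minimal sketch in Python (my choice of language, the question doesn't name one). The names `binary_search` and `SortedList` are made up for the example. The algorithm lives in the loop; the class only wraps it.

```python
# Procedural: a plain function operating on a sorted list.
def binary_search(items, target):
    """Return the index of target in sorted items, or -1 if absent."""
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1


# Object-oriented: the same algorithm exposed as a method.
# The class adds packaging (it owns the sorted data), not a new algorithm.
class SortedList:
    def __init__(self, items):
        self._items = sorted(items)

    def index_of(self, target):
        # Delegates to the procedural function above.
        return binary_search(self._items, target)
```

Both versions do the same O(log n) work; any difference comes down to call overhead, not the algorithm.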
IMO algorithms exist separately from the OO or PP issue.
Neither OO nor PP is 'slow', in either design time or program performance; they are different approaches.
I would think that Functional Programming would produce cleaner implementation of algorithms.
Having said that, you shouldn't see much of a difference whatever approach you take. An algorithm can be expressed in any language or development paradigm.
Update: (following comments)
Apparently functional programming does not lend itself to implementing algorithms as well as I thought it might. It has other strengths and I mostly mentioned it for completeness' sake, as the question only mentioned OOP (object oriented programming) and PP (procedural programming).
The weak link is likely to be your knowledge: which language and paradigm are you most comfortable with? Use that.
For design, analysis and development: definitely OOP. It was invented solely for the benefit of designers and developers.
For program runtime execution: sometimes PP is more efficient, but often OOP gets reduced to plain PP by the compiler, making them equivalent.
The difference (in execution time) is marginal at best.
Note that there is a more important factor than sheer performance: OOP provides the programmer with better means to organize code, which results in programs that are well structured, understandable, and more reliable (fewer bugs).
Object oriented programming abstracts many low level details from the programmer. It is designed with the goal
to make it easier to write and read (and understand) programs
to make programs look closer to the real world (and hence, easier to understand).
Procedural programming does not have many abstractions like objects, methods, virtual functions etc.
So, talking about speed: a seasoned expert who knows the internals of how an object oriented system will work can write a program that runs just as fast.
That being said, the speed advantage achieved by using PP over OOP will be very marginal. It boils down to which way you can write programs comfortably.
EDIT:
An interesting anecdote comes to my mind: in the Microsoft Foundation Classes, message passing from one object to the other was implemented using macros that looked like BEGIN_MESSAGE_MAP() and END_MESSAGE_MAP(), and the reason was that it was faster than using virtual functions.
This is one case where the library developers have used OOP, but have knowingly sidestepped a performance bottleneck.
My guess is that the difference is not big enough to worry about, and the time limit should allow using a slower language, since the algorithm used would be what's important.
The purpose of the time limit should IMO be to get you to avoid using, for example, an O(n^3) algorithm when there is an O(n log n) one.
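As a concrete case of that gap, here is a sketch in Python using the maximum subarray problem (my own choice of example, not from the question). The function names are made up. The first version is the O(n^3) brute force, the second is the textbook divide-and-conquer in O(n log n). Both assume a non-empty list.

```python
def max_subarray_cubic(a):
    """Brute force: try every (i, j) pair and sum the slice. O(n^3)."""
    best = a[0]
    n = len(a)
    for i in range(n):
        for j in range(i, n):
            best = max(best, sum(a[i:j + 1]))
    return best


def max_subarray_dc(a, lo=0, hi=None):
    """Divide and conquer over a[lo..hi] inclusive. O(n log n)."""
    if hi is None:
        hi = len(a) - 1
    if lo == hi:
        return a[lo]
    mid = (lo + hi) // 2

    # Best sum of a run that ends exactly at mid, scanning leftwards.
    left_best = total = a[mid]
    for i in range(mid - 1, lo - 1, -1):
        total += a[i]
        left_best = max(left_best, total)

    # Best sum of a run that starts exactly at mid + 1, scanning rightwards.
    right_best = total = a[mid + 1]
    for i in range(mid + 2, hi + 1):
        total += a[i]
        right_best = max(right_best, total)

    crossing = left_best + right_best
    return max(max_subarray_dc(a, lo, mid),
               max_subarray_dc(a, mid + 1, hi),
               crossing)
```

On a few thousand elements the cubic version is already painfully slow in any language or paradigm, which is the kind of thing a time limit is meant to catch.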
To make writing code easy and less error prone, you need a language that supports Generics - such as C++ with STL or Java with the Java Collections Framework. If you are implementing an algorithm against a deadline, you may be able to save time by not providing your algorithm with a nice O-O or Generic interface, so making the code you write yourself entirely procedural.
For run time efficiency, you would probably be best writing everything in procedural C - see e.g. the examples in "The Practice of Programming" - but it will take a lot longer to write, and you are more likely to make mistakes. This also assumes that all the building blocks you need are available in their most up to date and efficient form in procedural C as well, which is quite an assumption these days. Most likely making use of the STL or the JCF will in practice save you CPU time as well as development time.
As for functional languages, I remember hearing functional programming enthusiasts point out how much easier to use their languages were than the competition, and then observing that those members of the class who chose a functional language were still struggling when those who wrote in Fortran 77 had finished and gone on to draw graphs of the performance of their program. I see that the claims of the functional programming community have not changed. I do not know if the underlying reality has.
Steve314 said it well. OOP is more about the design patterns and organization of large applications. It also lets you deal with unknowns better, which is great for user apps. However, for analyzing algorithms, most likely you are going to be thinking functionally about what you want to do. In that case, I'd stick to more simple PP and not try to create a fully OO design, when you care about the algorithm. I'd want to work with C or Matlab (depending on how math intensive the algorithm is). Just my opinion on it.
I once adapted the Knuth-Morris-Pratt string search algorithm so that I could have an object that would take a character at a time and return a match/no-match status. It wasn't a straight-forward translation.
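For anyone curious what such an object might look like, here is a rough sketch in Python. It is not the code I wrote back then; the class name `StreamingMatcher` and the method `feed` are made up for this example. The main change from the usual formulation is that the loop index over the text becomes state stored in the object.

```python
class StreamingMatcher:
    """Knuth-Morris-Pratt matcher that consumes one character at a time."""

    def __init__(self, pattern):
        if not pattern:
            raise ValueError("pattern must be non-empty")
        self.pattern = pattern
        self.failure = self._build_failure(pattern)
        self.state = 0  # number of pattern characters currently matched

    @staticmethod
    def _build_failure(pattern):
        # failure[i] = length of the longest proper prefix of pattern[:i + 1]
        # that is also a suffix of it.
        failure = [0] * len(pattern)
        k = 0
        for i in range(1, len(pattern)):
            while k > 0 and pattern[i] != pattern[k]:
                k = failure[k - 1]
            if pattern[i] == pattern[k]:
                k += 1
            failure[i] = k
        return failure

    def feed(self, ch):
        """Consume one character; return True if a match ends at it."""
        while self.state > 0 and ch != self.pattern[self.state]:
            self.state = self.failure[self.state - 1]
        if ch == self.pattern[self.state]:
            self.state += 1
        if self.state == len(self.pattern):
            # Fall back so that overlapping matches are still found.
            self.state = self.failure[self.state - 1]
            return True
        return False
```

Usage is just `m = StreamingMatcher("aba")` followed by `m.feed(c)` for each incoming character.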
At what point does LINQ become too terse and procedural techniques resorted to?
Terseness is in the eye of the beholder. When you're not comfortable with the code anymore, then it's time to refactor it a bit. The refactoring could be swapping to some procedural bits, or breaking your linq queries apart, or whatever it takes to make it understandable again. As long as the intent of the code is obvious, it shouldn't matter how terse it is or what techniques are used to achieve the end goal :-)
Any language construct, not just LINQ, is too terse when the majority of people on your group cannot quickly understand what a line of code is doing.
When you can no longer do what is required to be done (easily).
Given a straightforward user-driven, high traffic web application (no fancy reporting/BI):
If my utmost goal is performance (not ease of maintainability, ease of queryability, etc.) I would surmise that in most cases, a roll-your-own DAL would be the best choice.
However, if I were to choose Linq2SQL or NHibernate, roughly what kind of performance hit would we be talking about? 10%? 20%? 200%? Of the two, which would be faster?
Does anyone have any real world numbers that could shed some light on this? (and yes, I know Stackoverflow runs on Linq2SQL..)
If you know your stuff (esp. in SQL and ADO.NET), then yes - most likely, you'll be able to create a highly tweaked, highly optimized custom DAL for your particular interest and be faster overall than a general-purpose ORM like Linq-to-SQL or NHibernate.
As to how much - that's really, really hard to say without knowing your concrete table structure, data and usage patterns. I remember Rico Mariani did some Linq-to-SQL vs. raw SQL comparisons, and his end result was that Linq-to-SQL achieves over 90% of the performance of a highly skilled SQL programmer.
See: http://blogs.msdn.com/ricom/archive/2007/07/05/dlinq-linq-to-sql-performance-part-4.aspx
Not too shabby in my book, especially if you factor in the productivity gains you get - but that's the big trade-off always: productivity vs. raw performance.
Here's another blog post on Entity Framework and Linq-to-SQL compared to DataReader and DataTable performance.
I don't have any such numbers for NHibernate, unfortunately.
In two high traffic web apps, refactoring an ORM call to use a stored procedure from ADO.NET only got us about a 1-2% change in CPU and time.
Going from an ORM to a custom DAL is an exercise in micro optimization.
I'm starting work on a program which is perhaps most naturally described as a batch of calculations on database tables, and will be executed once a month. All input is in Oracle database tables, and all output will be to Oracle database tables. The program should stay maintainable for many years to come.
It seems straight-forward to implement this as a series of stored procedures, each performing a sensible transformation, for example distributing costs among departments according to some business rules. I can then write unit tests to check if the output of each transformation is as I expected.
Is it a bad idea to do this all in PL/SQL? Would you rather do heavy batch calculations in a typical object oriented programming language, such as C#? Isn't it more expressive to use a database centric programming language such as PL/SQL?
You describe the following requirements
a) Must be able to implement Batch Processing
b) Result must be maintainable
My Response:
PL/SQL was designed to achieve just what you describe. It's also important to note that there are efficiencies in PL/SQL that are not available in other tools. A stored procedure language puts the processing next to the data - which is where batch processing ought to sit.
It is easy enough to write poorly maintainable code in any language.
Having said the above, your implementation will depend on the available skills, a proper design and adherence to good quality processes.
To be efficient your implementation must process data in batches ( select in batches and insert/update in batches ). The danger with an OO approach is that it is easy to be led towards a design that processes data row by row. This type of approach contains unnecessary overhead, and will be significantly less efficient than a design that processes data in batches of rows.
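To make the contrast concrete, here is a small sketch in Python. It uses the standard library's `sqlite3` with an in-memory database purely as a stand-in for Oracle, and the tables `costs` and `dept_totals` are hypothetical. It shows the shape of the two designs, not a benchmark.

```python
import sqlite3


def setup():
    """Create an in-memory database with a few hypothetical cost rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE costs (dept TEXT, amount REAL)")
    conn.execute("CREATE TABLE dept_totals (dept TEXT PRIMARY KEY, total REAL)")
    conn.executemany(
        "INSERT INTO costs VALUES (?, ?)",
        [("A", 10.0), ("A", 5.0), ("B", 7.5)],
    )
    return conn


def row_by_row(conn):
    """Pull every row to the client, aggregate there, insert one row at a time."""
    totals = {}
    for dept, amount in conn.execute("SELECT dept, amount FROM costs"):
        totals[dept] = totals.get(dept, 0.0) + amount
    for dept, total in totals.items():
        conn.execute("INSERT INTO dept_totals VALUES (?, ?)", (dept, total))


def set_based(conn):
    """Let the database aggregate and insert in a single statement."""
    conn.execute(
        "INSERT INTO dept_totals "
        "SELECT dept, SUM(amount) FROM costs GROUP BY dept"
    )


def totals(conn):
    return conn.execute(
        "SELECT dept, total FROM dept_totals ORDER BY dept"
    ).fetchall()
```

Both functions leave the same rows in `dept_totals`. The difference is how many statements and how much data cross the boundary between the program and the database.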
It is possible to use both approaches successfully.
Mathew Butler
Something for other commenters to note - the question is about PL/SQL, not about SQL. Some of the answers have obviously been about SQL, not PL/SQL. PL/SQL is a fully functional database language, and it's mature as well. There are some shortcomings, but for the type of thing the poster wants to do, it's very good.
No, it isn't necessarily a bad idea. If the solution seems straightforward to you and allows you to test and verify each process, it sounds like it could be a good idea. OO platforms can be (though they don't have to be) bad for large data sets, as object creation and overhead can kill performance.
Oracle designed PL/SQL with problems like yours in mind. If there is sufficient corporate knowledge of the database and PL/SQL, this seems like a reasonable solution. Keep large batch sets in mind, as each call from PL/SQL to the actual SQL engine is a context switch, so single record processes should be batched together where possible to improve performance.
Just make sure you somehow log what is happening while it's working. Otherwise you'll have a black box and if it gets stuck somewhere for hours, you'll be wondering whether to stop it or let it work 'a little bit more'.
PL/SQL is a mature language that integrates well with SQL. With each version of Oracle it becomes more and more powerful.
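Also, starting from Oracle 11g, PL/SQL can be compiled natively to machine code without an external C compiler. Note that this is not the default: PLSQL_CODE_TYPE has to be set to NATIVE, otherwise the code is still interpreted.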
Normally I say put as little in PL/SQL as possible - it is typically a lot less maintainable - at one of my last jobs I really saw how messy and hard to work with it could get.
However, since it is batch processing - and since the input and output are both the DB - it makes good sense to put the logic into PL/SQL - to minimize "moving parts". However, if it were business logic - or components used by other pieces of your system - I would say don't do it..
I wrote a huge amount of batch processing and report generation programs in both PL/SQL and Pro*C for one project. They generally preferred I write in PL/SQL as their own developers who would maintain in the future found that easier to understand than Pro*C code.
It ended up being only the really funky processing or reports that ended up being written in Pro*C.
It is not necessary to write these as stored procedures as other people have alluded to; they can be just script files that are run as necessary, kind of like a shell script. That makes source code revision control and migration between test and production systems a heck of a lot easier, too.
As long as the calculations you need to perform can be adequately AND readably captured in PL/SQL, then using only PL/SQL would make the most sense.
The real catch is maintainability -- it's very easy to write unmaintainable SQL, if only because every RDBMS has a different syntax and different function set once you step outside of simple SQL DML, and no real standards for formatting, commenting, etc.
I've created batch programs using C# and SQL.
Pros of C#:
You've got the full library of .NET and all the power of an OO language.
Cons of C#:
Batch program and db separate - this means, you'll have to manage your batch program separate from the database.
You need to escape all that dang sql code
Pros of SQL:
Integrates nicely with the DBMS. If this job only manipulates the database, it would make sense to include it with the database. You end up with a single db and all of its components in one package.
No need to escape sql code
keeping it real - you are programming in your problem domain
Cons of SQL:
It's SQL and I personally just don't know it as well as C#.
In general, I would stick with using SQL because of the Pros outlined above.
This is a loaded question :)
There are a couple of database programming architecture designs you should know of, along with their costs and benefits.
2 Tier generally means you have a client connecting to a DB, issuing direct SQL calls.
3 Tier generally means you have an "application server" that is issuing direct SQL calls to the DB, but the client is talking to the app server. Generally, this affords "scaling out".
Finally, you have 2 1/2 tiered apps that employ a 2 Tier like format, only the work is compartmentalized within stored procedures.
Your process sounds like a "back office" kind of thing, and clients/processes just need results that are being aggregated and cached on a once a month basis.
That is, there is no agent that connects, and connects often, and says "do these calculations". Instead you allude to a process that happens once in a while, and you can get away with non-real time.
Therefore, given those requirements, I'd say that generally, it will be faster to be closer to the data, and let SQL server do all the calculations.
I think you'll find that proximity to the data will serve you well.
However, in performing these calculations, you may find that some calculations are not amenable to SQL Servers. Take for example calculating the accrued interest of a bond, or any fixed income instrument. Not very pretty in SQL, and much more suited for a richer programming language. However, if you just have simple averages and other relatively sane aggregates, I'd stick to stored procedures, on the SQL side.
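To give a feel for the bond example, here is a rough sketch in Python of accrued interest under a simplified 30/360 day count. The function names are made up, and the day count rule here ignores the end-of-February adjustments that real conventions include, so treat it as an illustration rather than something to price with.

```python
from datetime import date


def days_30_360(start, end):
    """Simplified 30/360 day count (no end-of-February handling)."""
    d1 = min(start.day, 30)
    d2 = end.day
    # A 31st at the end is treated as the 30th only when the start
    # is already on (or clamped to) the 30th.
    if d2 == 31 and d1 == 30:
        d2 = 30
    return (360 * (end.year - start.year)
            + 30 * (end.month - start.month)
            + (d2 - d1))


def accrued_interest(face, annual_coupon_rate, last_coupon, settlement):
    """Interest accrued since the last coupon date, on a 360-day year."""
    days = days_30_360(last_coupon, settlement)
    return face * annual_coupon_rate * days / 360
```

For instance, a 1000 face bond with a 6% annual coupon, three 30-day months after its last coupon date, accrues 1000 * 0.06 * 90 / 360 = 15. Writing the date adjustments in pure SQL is doable but quickly gets ugly, which is the point above.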
So again, there's not enough information as to the nature of your calculations, or what your house mandates in terms of SQL capabilities of devs for support, or what your boss says...but since I know my way around SQL, and like to stay close to the data, I'd stay pure SQL/Stored Procedures for a task like this.
YMMV :)
It's not usually more expressive because most stored procedure languages suck by design. But it will probably run faster than in an external app.
I guess it boils down to how familiar you are with PL/SQL, how much time you have to write this, how important is performance and if you can reasonably expect maintainers to be familiar enough with PL/SQL to maintain a big program written in it.
If speed is not relevant and maintainers will probably not be proficient in PL/SQL, you might be better off using a 'traditional' language.
You could also use a hybrid approach, where you use PL/SQL to generate intermediate data (say, table joins and sums or whatever) and a separate application to control flow and check values and errors.