How to run statistical analyses on Oracle server using SAS

In order to take advantage of an Oracle server's vastly larger disk space and RAM, is it possible to run a SAS procedure (e.g., PROC GLIMMIX or PROC NLMIXED) on a dataset stored on the server using the ODBC interface?
Or am I limited to extracting datasets to my PC via ODBC, rather than actually manipulating or analyzing the data with SAS while it resides on the server?

At the end of the day, some of the work will have to be done by SAS on your PC, assuming you're doing anything complicated (as GLIMMIX would be). SAS (9.3 or newer in particular) is pretty smart about making the database do as much work as possible; for example, some PROC MEANS calls may execute fully on the database side.
However, this is only true to the extent that the procedure can be translated into database functionality without extraordinary measures. SAS isn't likely to perform a regression on the database side, since that isn't native Oracle functionality. The data has to make its way across the (likely limited) bandwidth to some extent.
You can certainly do a lot to limit what you have to do in SAS. Any presummarization can be done in Oracle; any other data prep work prior to the actual PROC GLIMMIX can likely be done in Oracle. You can certainly give it a shot by simply using libname connections and doing something like
proc glimmix data=oracle.table ... options ... ;
run;
and seeing what happens - maybe it'll surprise you, or even me, in how much it handles in-database. It might bring it over locally, it might not.
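To make the presummarization idea concrete, here is a minimal sketch of the kind of aggregation you could push to the Oracle side before SAS ever sees a row; the patient_visits table and its columns are hypothetical, and you would point the libname (or a pass-through view) at the result rather than at the raw table:
-- Hypothetical Oracle-side presummarization: collapse raw rows to one row
-- per subject/treatment so only the summary crosses the network to SAS.
CREATE VIEW visit_summary AS
SELECT subject_id,
       treatment,
       COUNT(*)         AS n_visits,
       AVG(response)    AS mean_response,
       STDDEV(response) AS sd_response
FROM patient_visits
GROUP BY subject_id, treatment;
A SAS libname pointed at visit_summary then only has to move the summarized rows, which is usually a small fraction of the raw table.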
You may want to consider asking a question with a simplified version of what you're doing, including example data, and simply asking if anyone has any ideas for improving performance. There's a lot of tweaking that can be done, and perhaps some of us here can help.

Related

query for a few hundred .db SQLite databases

I have been asked to create a simple program to submit user defined queries to SQLite databases (.db). I have not worked with the offline databases before and have a question about optimizing performance.
There are a few hundred .db files that I need to query. Is it quicker to attach them all to a single query using ATTACH, or to join them all into a single database and work from there? My thought is that there is a trade-off between initial set-up time and query speed. Is there perhaps a different method that would give better performance?
I don't think it will matter, but this will be written in C# for a Windows desktop application.
Thanks!
The documentation says:
The number of simultaneously attached databases is limited to SQLITE_MAX_ATTACHED which is set to 10 by default. [...] The number of attached databases cannot be increased above 62.
So attaching a few hundred databases will be very quick because outputting an error message can be done really fast. ☺
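Given that limit, the practical route is the second option from the question: fold everything into one combined database up front. A minimal sketch, assuming each file holds a table called measurements (file and table names are hypothetical), which the C# program would repeat in a loop over the few hundred files:
-- Attach one source file, copy its rows into the combined database, detach.
ATTACH DATABASE 'part_001.db' AS src;
INSERT INTO main.measurements SELECT * FROM src.measurements;
DETACH DATABASE src;
Once merged, a single set of indexes on the combined table can serve the user-defined queries, which hundreds of separate files never could.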

SQL Server Express performance problems

Situation
I have a SQL Server Express 2008 R2 instance running. Ten users read from and write to the same tables via stored procedures, and they do this day and night.
Problem
Stored procedure performance drops steadily as the database grows.
A stored procedure call takes about 10 ms on average when the database is around 200 MB.
The same call takes about 200 ms on average when the database is around 3 GB.
So we have to clean up the database once a month.
We have already optimized the indexes on some tables with positive effect, but the problem still exists.
Finally, I'm not a SQL Server expert. Could you give me some hints on where to start getting rid of this performance problem?
Download and read Waits and Queues
Download and follow the Troubleshooting SQL Server 2005/2008 Performance and Scalability Flowchart
Read Troubleshooting Performance Problems in SQL Server 2005
The SQL Server Express Edition limitations (1 GB buffer pool memory, only one CPU socket used, 10 GB database size) are unlikely to be the issue. Application design, bad queries, excessive lock contention and poor indexing are more likely to be the problem. The linked articles (especially the first one) include a methodology for identifying the bottleneck(s).
This is most likely simply a programmer mistake; it sounds like you have one or more of the following:
Improper indexing on some tables. This is NOT "optimization" - bad indexes are like broken HTML for web people. If you have no index, you are basically not using SQL the way it is meant to be used; you should always have proper indexes.
Not enough hardware, such as RAM. Yes, Express can manage a 10 GB database, but if your hot set (the stuff accessed all the time) is 2 GB and you only have 1 GB of buffer pool, it WILL hit disk more often than it needs to.
Slow disks, which is a particular Express problem because most people do not bother with a proper disk layout. They run a SQL database against a slow 200 IOPS end-user disk, where - depending on need - a SQL database wants MANY spindles or an SSD (a typical SSD these days does 40,000 IOPS).
That is about it - plus possibly really bad SQL. A typical filter error is someFormula(field) LIKE value, which means "forget your index, please do a table scan and calculate someFormula(field) for every row before checking".
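To make that last point concrete, here is a hedged sketch of the non-sargable pattern and one common fix on SQL Server (table and column names are invented): a persisted computed column that can carry an index.
-- Non-sargable: the function wrapped around the column forces a scan,
-- so any index on order_ref is useless for this filter.
SELECT * FROM orders WHERE UPPER(order_ref) LIKE 'AB%';

-- One fix: persist the computed value and index it.
ALTER TABLE orders ADD order_ref_upper AS UPPER(order_ref) PERSISTED;
CREATE INDEX ix_orders_order_ref_upper ON orders (order_ref_upper);

-- The same filter can now seek on the new index.
SELECT * FROM orders WHERE order_ref_upper LIKE 'AB%';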
First, SQL Server Express is not the best edition for your requirement. Get a Developer Edition for testing; it is exactly like Enterprise but free as long as you don't use it in production.
As for performance, there are many things involved, and improvements can range from indexing all the way to partitioning. We need more info to provide real help.
Before optimizing your SQL queries, you need to find the hot spots among them. Normally you would use SQL Profiler for this on SQL Server, but there is no such tool for the Express edition. You can work around that with a few queries against the DMVs:
Return all recent queries, most expensive first:
SELECT *
FROM sys.dm_exec_query_stats
ORDER BY total_worker_time DESC;
Return only the top time-consuming queries, with their SQL text:
SELECT total_worker_time, execution_count, last_worker_time, dest.TEXT
FROM sys.dm_exec_query_stats AS deqs
CROSS APPLY sys.dm_exec_sql_text(deqs.sql_handle) AS dest
ORDER BY total_worker_time DESC;
Now you should know which query needs to be optimized.
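If you want to measure a suspect statement before and after a change, SET STATISTICS IO/TIME works on the Express edition too; the query below is just a placeholder for whatever the DMVs pointed you at:
-- Show logical reads and CPU/elapsed time for one statement.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT COUNT(*) FROM dbo.orders;  -- placeholder for the suspect query

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;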
Possible causes: poor indexes, poor database design, missing normalization, unwanted column indexes, or poorly written queries that take a long time to execute.
SQL Server Express is built for testing purposes and its performance is deliberately limited by Microsoft. If you use it in a production environment, you may want to get a license for the full SQL Server.
Have a look here: SQL Express for production?

Oracle PL/SQL stored procedure compiler vs PostgreSQL PGSQL stored procedure compiler

I noticed that Oracle takes a while to compile a stored procedure but it runs much faster than its PostgreSQL PGSQL counterpart.
With PostgreSQL, the same procedure (i.e. it's all in SQL-92 format with functions in the select and where clauses) takes less time to compile but longer to run.
Is there a metrics web site where I can find side by side Oracle vs PostgreSQL performance metrics on stored procedures, SQL parsing, views and ref_cursors?
Is the PostgreSQL PGSQL compiler lacking in optimization routines? What are the main differences between how Oracle handles stored procedures and how PostgreSQL handles them?
I'm thinking of writing a PGSQL function library that would allow PL/SQL code to be compiled and run in PGSQL (i.e. DECODE, NVL, GREATEST, TO_CHAR, TO_NUMBER, all the PL/SQL functions), but I don't know if this is feasible.
This is a hard question to answer fully because it runs pretty deep, but I'll add my 2 cents as a high-level contribution. First off, I really like PostgreSQL and I really like Oracle, but PL/SQL is a much deeper language and environment than PL/pgSQL - or, really, than any other database engine's procedure language I have run into. Since at least 10g, Oracle has used an optimizing compiler for PL/SQL, which most likely contributes to why it compiles more slowly in your case. PL/SQL also offers native compilation: with a simple compiler directive you can compile PL/SQL code down to machine code, which is good for computation-intensive logic rather than SQL logic. My point is that Oracle has spent a lot of resources making PL/SQL a real treat from both a functionality and a performance standpoint, and those are only two of many examples. It sums up to this: PL/SQL is light years ahead of PL/pgSQL, and as nice as PL/pgSQL is, I don't imagine it catching up to Oracle any time soon.
I doubt you will find a side-by-side comparison, though I think that would be really nice; the effort to produce one probably wouldn't be worth most people's time.
Also, I wouldn't rewrite what is already out there:
http://www.pgsql.cz/index.php/Oracle_functionality_(en)
There is no official benchmark for stored procedures like TPC for SQL (see tpc.org). I'm also not aware of any database application with specific PL/SQL and PL/pgSQL implementations that could be used as a benchmark.
Both languages are compiled and optimized into intermediate code and then run by an interpreter. PL/SQL can be compiled to machine code, which doesn't improve overall performance as much as one might think, because the interpreter is quite efficient and typical applications spend most of their time in the SQL engine rather than in the procedural code (see the AskTom article).
When procedural code calls SQL, it happens just like in any other program: with statements and bind parameters for input and output. Oracle is able to keep these SQL statements "prepared", which means the cursors are ready to be reused without an additional SQL "soft parse" (a SQL "hard parse" usually happens only when the database runs a statement for the first time since it was started).
When functions are used in select or where clauses, the database has to switch back and forth between the SQL and procedural engines. This can consume more processing time than the code itself.
A major difference between the two compilers is that Oracle maintains a dependency tree, which causes PL/SQL to be automatically recompiled when underlying objects change, and compilation errors are detected without actually running the code; this is not the case with Postgres (see Documentation).
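As a small hedged illustration of that engine switching (the table and function below are invented), compare a filter that calls a PL/SQL function per row with the same logic expressed directly in SQL:
-- Per-row PL/SQL call: the runtime switches between the SQL and PL/SQL
-- engines for every candidate row.
CREATE OR REPLACE FUNCTION add_tax(p_amount NUMBER) RETURN NUMBER IS
BEGIN
  RETURN p_amount * 1.2;
END;
/

SELECT order_id FROM orders WHERE add_tax(amount) > 100;

-- Keeping the expression in plain SQL avoids the context switches entirely.
SELECT order_id FROM orders WHERE amount * 1.2 > 100;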
Is there a metrics web site where I can find side by side Oracle vs PostgreSQL performance metrics on stored procedures, SQL parsing, views and ref_cursors?
Publicly benchmarking Oracle performance is probably a breach of your licensing terms. If you're a corporate user, make sure legal checks it out before you do anything like that.
I'm thinking of writing a PGSQL function library that will allow PL/SQL code to be compiled and run in PGSQL (i.e. DECODE, NVL, GREATEST, TO_CHAR, TO_NUMBER, all PL/SQL functions) but I don't know if this is feasible.
Do check the manual and make sure you need all of these, since some are already implemented as built-in functions. Also, I seem to recall an add-on pack of functions to improve Oracle compatibility.
Lastly, don't forget PostgreSQL offers choices on how to write functions. Simple ones can be written in pure SQL (and automatically inlined in some cases) and for heavy lifting, I prefer to use Perl (you might like Python or Java).
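On the compatibility-library idea: GREATEST, TO_CHAR and TO_NUMBER already exist in PostgreSQL, DECODE maps onto CASE, and an Oracle-style NVL can be a one-line SQL function that the planner can inline (the add-on pack mentioned above is most likely the orafce extension). A minimal sketch; the usage example table is hypothetical:
-- Oracle-style NVL shim; COALESCE does the real work. With anyelement both
-- arguments must resolve to the same type (orafce, or anycompatible on
-- newer PostgreSQL versions, is more forgiving about mixed types).
CREATE OR REPLACE FUNCTION nvl(anyelement, anyelement)
RETURNS anyelement
LANGUAGE sql
IMMUTABLE
AS $$ SELECT COALESCE($1, $2) $$;

-- Hypothetical usage: SELECT nvl(commission, 0.00) FROM emp;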

Transfer large amount of data from DB2 to Oracle?

Every day I need to transfer a large amount of data (several million records) from a DB2 database to an Oracle database. Could you suggest the best-performing method for doing that?
DB2 will allow you to select Oracle as a replication target. This is probably the most efficient and easiest way to do it every day, it also removes the "intermediate container" objection that you have.
See this introduction (and more from the documentation online) for more.
If you're only talking speed then do this.
Time how long it takes to dump the DB2 data to a flatfile.
Time how long it takes to suck that flatfile into Oracle.
There's your baseline, and it's free. If you can beat that with an ETL tool, then decide whether the cost of the tool is worth it.
For a simple ETL like this, there's little that I've found that can beat this on time.
The downside of this is just general file manipulation BS...
how do you know when to read from the file
how do you know that you got all the rows
how do you resume when something breaks
All those little "niceties" usually get paid for with speed. Of course, I'm joking a bit. They aren't always a little nicety. They are often essential for a smooth running process.
Dump the data out to a delimited file and load it into Oracle via a DIRECT-path sqlldr (SQL*Loader) job. Not sexy, but fast. If you can be on the same subnet, that would be best (pushing data across the network is not what you want). Set this up on a cron job and add email alerts on errors.
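A hedged sketch of what that sqlldr job could look like; the file, table and column names are invented, and the delimiter and date mask have to match whatever the DB2 export actually produces:
-- load_orders.ctl : direct-path load of the daily DB2 export (hypothetical)
OPTIONS (DIRECT=TRUE, ERRORS=1000)
LOAD DATA
INFILE 'db2_export.dat'
APPEND
INTO TABLE orders
FIELDS TERMINATED BY '|'
TRAILING NULLCOLS
(
  order_id,
  customer_id,
  order_date  DATE "YYYY-MM-DD",
  amount
)
Invoke it from the cron job with something like sqlldr userid=loader control=load_orders.ctl log=load_orders.log, and alert on a non-zero exit code or on rows landing in the .bad file.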

oracle metrics monitoring and reporting in real time

I am stress testing a database table.
I am looking for any software that can connect to my database and show me some metrics like the number of rows in a table, time per insert, inserts per unit time, table fragmentation (logical/physical), etc.
It would be great if the reporting tool can do the following:
1] Reports in real time, or at least at some interval, so that I do not have to wait for the test to finish to get a first look at the data
2] Lets me do things with the data later, like getting the 99.99th percentile, averages, etc.
3] Is mostly freely available :)
Does anyone have any suggestions for something I can use with my Oracle table? Any pointers would be great.
I could write scripts to log things like select count(*), etc., but then I would have to spend a lot of time parsing and reshaping the data for reporting rather than running the tests.
I figure something intelligent might already be out there?
Thanks
Edit:
I am looking at a design for a new architecture. The tests are "comparison" tests between different designs, so as long as I run them on the same hardware, same schema, etc., they are comparable to some granularity. I want to monitor index fragmentation, response times, etc. If you think there are other things that can change, please let me know. I am rolling the table back to a particular state (basically truncating it) for each new iteration of the test.
First, Oracle has built-in functionality for telling you the number of rows in a table (either use count(*) or search 'gather statistics oracle' for another option).
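For example, either of these works (SCOTT.BIG_TABLE is a placeholder name; EXEC is SQL*Plus shorthand):
-- Exact count (scans the table):
SELECT COUNT(*) FROM scott.big_table;

-- Or gather optimizer statistics and read the stored row count:
EXEC DBMS_STATS.GATHER_TABLE_STATS(ownname => 'SCOTT', tabname => 'BIG_TABLE');
SELECT num_rows, blocks, avg_row_len
FROM all_tables
WHERE owner = 'SCOTT' AND table_name = 'BIG_TABLE';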
But "stress testing a table" sounds to me like you're going down the wrong path. Most of the metrics you're mentioning ("time for inserts , inserts/time, table fragmentation[logical/physical] etc") are highly dependent on many factors:
what OS Oracle's running on
how the OS is tuned (i.e. other services running)
how the specific Oracle instance is configured
what underlying storage architecture Oracle's using (and how tablespaces are configured)
what other queries are being executed in the database at the exact same time as your test
But NONE of them would be related to the table design itself.
Now, if you're wondering if your normalized (or de-normalized) table schema is hurting your application, that's another matter. As is performance being degraded by improper/unneeded/missing indexes, triggers, or a host of other problems.
If you really want an app that will give you real-time monitoring, check out Quest Software's Spotlight on Oracle; it's definitely not free, though.
Just to add to the other comments, I believe what you really want is to stress test the queries you're running and not the table. The table is just a bunch of data blocks on a disk and the query is what will make the difference in performance as far as development is concerned. That will tell you if you need different indexes or need to redesign the query.
On the other hand, if you're looking at it as a DBA or system administrator, you're probably more interested in OS level statistics especially disk latency, memory paging, and CPU utilization.
All of this is available in Enterprise Manager, which is my primary tuning tool for both development and DBA work. If you don't have that, read up on using sql_trace to profile your queries, and check your OS-specific documentation for how to get those stats.
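If you go the sql_trace route, a minimal session-level recipe looks something like this (the trace file name is whatever your diagnostic directory actually produces; tkprof runs on the database server):
-- In the session running the test workload:
ALTER SESSION SET tracefile_identifier = 'stress_test';
ALTER SESSION SET sql_trace = TRUE;
-- ... run the inserts / queries under test ...
ALTER SESSION SET sql_trace = FALSE;

-- Then format the raw trace with tkprof, sorted by elapsed execution time:
-- tkprof orcl_ora_12345_stress_test.trc stress_test.txt sort=exeela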
