We have created some new SSAS Tabular models which fetch data directly from Oracle. After testing with real customer data (a few million rows), processing times are close to 4 hours. Our goal is to keep them under about 15 minutes, due to existing system performance requirements. We fetch from plain Oracle tables, so query performance is not the bottleneck.
Are there any general design guides/best practices to handle such a scenario?
Check your application-side array fetch size, as you could be experiencing network latency.
** Array fetch size note:
As per the Oracle documentation, the Fetch Buffer Size is an application-side memory setting that affects the number of rows returned by a single fetch. Generally, you balance the number of rows returned with a single fetch (a.k.a. the array fetch size) against the number of rows that need to be fetched.
A low array fetch size compared to the number of rows to be returned will manifest as delays from the increased network and client-side processing needed for each fetch (i.e. the high cost of each network round trip [SQL*Net protocol]).
If this is the case, on the Oracle side you will likely see very high waits on “SQL*Net message from client”. [This wait event is posted by the session when it is waiting for a message from the client to arrive. Generally, this means that the session is just sitting idle; however, in a client/server environment it could also mean that either the client process is running slow or there are network latency delays. Database performance is not degraded by high wait times for this wait event.]
As I like to say, “SQL*Net is a chatty protocol”; so even though Oracle may be done with its processing of the query, excessive network round trips result in slower response times on the client side. A low array fetch size is likely contributing to the slowness if the elapsed time to get the data into the application is much longer than the elapsed time for the DB to run the SQL; in this case, application-side processing time can also be a factor [you can look into app-specific ways to troubleshoot/tune app-side processing].
Array fetch size is not an attribute of the Oracle account nor is it an Oracle side session setting. Array fetch size can only be set at the client; there is no DB setting for the array fetch size the client will use. Every client application has a different mechanism for specifying the array fetch size:
Informatica: ?? config. file param ??? setting at the connection or result set level??
Cognos: http://www-01.ibm.com/support/docview.wss?uid=swg21981559
SQL*Plus: set arraysize n
Java/JDBC: setFetchSize(int rows), a method on the Statement, PreparedStatement, CallableStatement, and ResultSet objects; or the “defaultRowPrefetch” connection property set via the Properties object's put method (see the JDBC sketch after this list).
http://download.oracle.com/otn_hosted_doc/jdeveloper/905/jdbc-javadoc/oracle/jdbc/OracleDriver.html (another link on the Oracle JDBC defaultRowPrefetch)
http://www.oracle.com/technetwork/database/enterprise-edition/jdbc-faq-090281.html
.NET: per the Oracle .NET Developer's Guide, the FetchSize property represents the total memory size in bytes that ODP.NET allocates to cache the data fetched from a database round trip. The FetchSize property can be set on the OracleCommand, OracleDataReader, or OracleRefCursor object, depending on the situation, and it controls the fetch size for filling a DataSet or DataTable using an OracleDataAdapter.
ODBC driver: ?? something like SetRowsetSize
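For the JDBC case, a minimal sketch of both approaches might look like the following (the connect string, credentials, and table name are placeholders, not taken from any of the posts here): the “defaultRowPrefetch” connection property sets a connection-wide default, and setFetchSize() overrides it per statement.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Properties;

public class FetchSizeDemo {
    public static void main(String[] args) throws SQLException {
        Properties props = new Properties();
        props.setProperty("user", "scott");              // placeholder credentials
        props.setProperty("password", "tiger");
        props.setProperty("defaultRowPrefetch", "500");  // connection-wide rows per round trip

        try (Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@//dbhost:1521/ORCL", props); // placeholder connect string
             Statement stmt = conn.createStatement()) {

            stmt.setFetchSize(1000); // per-statement override of the array fetch size

            try (ResultSet rs = stmt.executeQuery("SELECT * FROM some_big_table")) { // placeholder table
                while (rs.next()) {
                    // process each row; the driver refills its buffer every 1000 rows
                }
            }
        }
    }
}

As a rough sense of scale: at the driver default of 10 rows per fetch, a million-row result needs on the order of 100,000 round trips; at 1,000 rows per fetch it needs about 1,000.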
I am curious about ways to better tune bulk inserts for speed via Apache NiFi, and whether a different driver or other configuration could speed up the process. Any input or references to resources would be greatly appreciated!
This is my current flow, with configurations included in the pictures. The source DB is Oracle, the destination DB is IBM Db2 for z/OS:
I think you have a few things working against you:
You probably have low concurrency set on the PutDatabaseRecord processor.
You have a very large fetch size.
You have a very large record-per-flowfile count.
From what I've read in the past, the fetch size controls how many records are pulled from the query's remote result set in each iteration. So in your case, it has to pull 100k records before it even registers data as being ready. Try dropping the fetch size down to 1k records and experiment with 100-1000 records per flowfile.
If you're bulk inserting that flowfile, you're also sending over 100k inserts at once.
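For intuition, the batching trade-off can be sketched in plain JDBC (the table and columns below are hypothetical, and this only illustrates batch sizing, not NiFi's internals): sending everything as one giant batch means one huge burst of statements and a large memory footprint, while a moderate batch size keeps each round trip bounded.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class BatchInsertDemo {
    // Inserts rows in batches of batchSize instead of one enormous batch.
    static void insertInBatches(Connection conn, List<String[]> rows, int batchSize)
            throws SQLException {
        String sql = "INSERT INTO target_table (col1, col2) VALUES (?, ?)"; // hypothetical table
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            int pending = 0;
            for (String[] row : rows) {
                ps.setString(1, row[0]);
                ps.setString(2, row[1]);
                ps.addBatch();
                if (++pending == batchSize) {
                    ps.executeBatch(); // one round trip per batch
                    pending = 0;
                }
            }
            if (pending > 0) {
                ps.executeBatch();     // flush the remainder
            }
        }
    }
}

Batch sizes in the few-hundred to few-thousand range are usually a reasonable starting point for experiments.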
I was reading some interesting things about the JDBC pre-fetch size, but I cannot find answers to a few questions:
The Java app I'm working on is designed to fetch rows from cursors opened and returned by functions within PL/SQL packages. I was wondering whether the pre-fetch default setting of the JDBC driver actually affects the fetching process or not, given that the SQL statements are parsed and the cursors opened within the Oracle database. I tried setting the fetch size in the JBoss configuration file and printing the value reported back after calling setFetchSize(). The new value (100, just for testing purposes) was returned, but I see no difference in how the application performs.
I also read that pre-fetching enhances performance by reducing the number of round trips between the client and the database server, but how can I measure the number of round trips in order to verify and quantify the actual benefit I could get by tuning the pre-fetch size?
Yes, the Oracle JDBC thin driver will use the configured prefetch size when fetching from any cursor, whether the cursor was opened by the client or from within a stored procedure.
The easiest way to count the round trips is to look at the SQL*Net trace. You can turn on SQL*Net tracing on the server side by adding trace_level_server = 16 to the sqlnet.ora file (on the server, since JDBC thin doesn't use sqlnet.ora on the client). Each foreground process will then dump its network traffic to a trace file, so you can see the network packets exchanged with the client and count the round trips. By default the driver fetches rows 10 by 10, but since you have increased the fetch size to 100 it should fetch up to that number of rows in a single round trip.
Note that unless your client is far away from your server (significant ping time), the cost of a round trip won't be high, and unless you're fetching a very large number of rows (tens of thousands) you won't see much difference in performance from increasing the fetch size. The default of 10 usually works fine for most OLTP applications. If your client is far away, you can also consider increasing the SDU size (the maximum size of a SQL*Net packet). The default is 8k, but you can increase it up to 2MB in 12.2.
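To make the stored-procedure case concrete, here is a minimal sketch of setting the fetch size on a ResultSet obtained from a PL/SQL-opened ref cursor; the package/function name and connect string are hypothetical, and it assumes the Oracle JDBC driver is on the classpath.

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;

import oracle.jdbc.OracleTypes; // from the Oracle JDBC driver

public class RefCursorFetchDemo {
    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@//dbhost:1521/ORCL", "scott", "tiger"); // hypothetical URL/credentials
             CallableStatement cs = conn.prepareCall("{ ? = call my_pkg.get_rows() }")) { // hypothetical function

            cs.registerOutParameter(1, OracleTypes.CURSOR);
            cs.execute();

            try (ResultSet rs = (ResultSet) cs.getObject(1)) {
                rs.setFetchSize(100); // rows fetched per round trip from the PL/SQL-opened cursor
                while (rs.next()) {
                    // process each row
                }
            }
        }
    }
}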
I'm using DBeaver to connect to an Oracle database. The database connection and the table properties view work fine without any delay, but fetching table data is too slow (sometimes around 50 seconds).
Any settings to speed up fetching table data in DBeaver?
Changing the following setting in your Oracle DB connection makes fetching table data faster than when it is not set.
Right click on your db connection --> Edit Connection --> Oracle properties --> tick on 'Use RULE hint for system catalog queries'
(by default this is not set)
UPDATE
In the newer version (21.0.0) of DBeaver, many more performance options appear here. Turning them on significantly improved performance for me.
I've never used DBeaver, but I often see applications that use too small an "array fetch size"**, which can cause slow fetches.
** Array fetch size note:
As per the Oracle documentation, the Fetch Buffer Size is an application-side memory setting that affects the number of rows returned by a single fetch. Generally, you balance the number of rows returned with a single fetch (a.k.a. the array fetch size) against the number of rows that need to be fetched.
A low array fetch size compared to the number of rows to be returned will manifest as delays from the increased network and client-side processing needed for each fetch (i.e. the high cost of each network round trip [SQL*Net protocol]).
If this is the case, you will likely see very high waits on “SQL*Net message from client” [in gv$session or elsewhere].
SQL*Net message from client
This wait event is posted by the session when it is waiting for a message from the client to arrive. Generally, this means that the session is just sitting idle; however, in a client/server environment it could also mean that either the client process is running slow or there are network latency delays. Database performance is not degraded by high wait times for this wait event.
We have a TDBGrid connected to a TClientDataSet via a TDataSetProvider in Delphi 7, against an Oracle database.
It works fine for showing the contents of small tables, but the program hangs when you try to open a table with many rows (for example, 2 million rows), because TClientDataSet tries to load the whole table into memory.
I tried setting "FetchOnDemand" to True for our TClientDataSet and "poFetchDetailsOnDemand" to True in the Options of the TDataSetProvider, but it does not help to solve the problem. Any ideas?
Update:
My solution is:
TClientDataSet.FetchOnDemand = T
TDataSetProvider.Options.poFetchDetailsOnDemand = T
TClientDataSet.PacketRecords = 500
I succeeded in solving the problem by setting the "PacketRecords" property of TCustomClientDataSet. This property indicates the number or type of records in a single data packet. PacketRecords defaults to -1, meaning that a single packet should contain all records in the dataset, but I changed it to 500 rows.
When working with an RDBMS, and especially with large datasets, trying to access a whole table is exactly what you shouldn't do. That's a typical newbie mistake, or a habit borrowed from old file-based desktop database engines.
When working with an RDBMS, you should load only the rows you're interested in, display/modify/update/insert them, and send the changes back to the database. That means a SELECT with a proper WHERE clause and also an ORDER BY; remember that row ordering is never assured when you issue a SELECT without an ORDER BY, since a database engine is free to retrieve rows in whatever order it sees fit for a given query.
If you have to perform bulk changes, you need to do them in SQL and have them processed on the server, not load a whole table client side, modify it, and send changes row by row to the database.
Loading large datasets client side may fail for several reasons: lack of memory (especially in 32-bit applications), memory fragmentation, and so on. You will probably flood the network with data you don't need, force the database to perform a full scan, maybe flood the database cache as well, and so on.
Client datasets are simply not designed to handle millions or billions of rows. They are designed to cache the rows you need client side and then apply changes back to the remote data. You need to change your application logic.
I am running queries against an Oracle 10g database with JDBC (using the latest drivers and UCP as the DataSource) in order to retrieve CLOBs (avg. 20k characters). However, performance seems to be pretty bad: the batch retrieval of 100 LOBs takes 4 s on average. Judging from my observations, the operation is neither I/O nor CPU nor network bound.
My test setup looks like this:
import java.sql.ResultSet;
import java.sql.SQLException;

import oracle.ucp.jdbc.PoolDataSource;
import oracle.ucp.jdbc.PoolDataSourceFactory;

import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.jdbc.core.RowCallbackHandler;
import org.springframework.jdbc.support.lob.LobHandler;
import org.springframework.jdbc.support.lob.OracleLobHandler;

PoolDataSource dataSource = PoolDataSourceFactory.getPoolDataSource();
dataSource.setConnectionFactoryClassName("...");
dataSource.setConnectionPoolName("...");
dataSource.setURL("...");
dataSource.setUser("...");
dataSource.setPassword("...");
// row and LOB prefetch sizes, set as Oracle connection properties
dataSource.setConnectionProperty("defaultRowPrefetch", "1000");
dataSource.setConnectionProperty("defaultLobPrefetchSize", "500000");

final LobHandler handler = new OracleLobHandler();
JdbcTemplate j = new JdbcTemplate(dataSource);

j.query("SELECT bigClob FROM ...",
        new RowCallbackHandler() {
            public void processRow(final ResultSet rs) throws SQLException {
                // materialize each CLOB as a String
                String result = handler.getClobAsString(rs, "bigClob");
            }
        });
I experimented with the fetch sizes but to no avail. Am I doing something wrong? Is there a way to speed up CLOB retrieval when using JDBC?
The total size of the result set is in the tens of thousands of rows; measured over the span of the whole retrieval, the initial costs are negligible.
Is there an Order By in the query? 10K rows is quite a lot if it has to be sorted.
Also, retrieving the PK is not a fair test versus retrieving the entire CLOB. Oracle stores the table rows with probably many per block, but each of the CLOBs (if they are > 4K) will be stored out of line, each in a series of blocks. Scanning the list of PKs is therefore going to be fast. Also, there is probably an index on the PK, so Oracle can just quickly scan the index blocks without even accessing the table.
4 seconds does seem a little high, but it is 2 MB that possibly needs to be read from disk and transported over the network to your Java program. Network could be an issue. If you perform an SQL trace of the session, it will point you to exactly where the time is being spent (disk reads or network).
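If you want to drive that trace from the very session you are testing, one way (assuming the account is allowed to ALTER SESSION and you can reach the server's trace directory) is to enable a level-8 10046 trace from JDBC before running the query; this sketch is an illustration, not something from the original post.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

public class TraceDemo {
    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@//dbhost:1521/ORCL", "scott", "tiger"); // hypothetical URL/credentials
             Statement stmt = conn.createStatement()) {

            // Tag the trace file so it is easy to find on the database server
            stmt.execute("ALTER SESSION SET tracefile_identifier = 'clob_test'");
            // Level 8 includes wait events (db file reads, SQL*Net waits), which is
            // what shows where the elapsed time actually goes
            stmt.execute("ALTER SESSION SET EVENTS '10046 trace name context forever, level 8'");

            // ... run the CLOB query on this same connection, then format the
            // resulting trace file with tkprof on the database server ...
        }
    }
}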
My past experience of using Oracle LOB columns to store large data has not been good. It is fine while the data is under 4k, since it is stored inline like a varchar2; once it goes over 4k you start to see performance degrade. Perhaps things have improved since I last tried it a couple of years ago, but here are the things I found in the past, for your information:
Since clients need to get LOBs via the Oracle server, you may run into the following situations.
LOB data will compete for the limited SGA cache with other data types if Oracle decides to cache it. As CLOB data is generally big, it may push other data out of the cache.
LOB data gets poor disk-read performance if Oracle decides not to cache it and instead streams the data to the client.
Fragmentation is probably something you haven't encountered yet. You will see it if your applications delete LOBs and Oracle tries to reuse the space. I don't know if Oracle supports online defragmentation of LOB storage (it does for indexes, but it took a long time when we tried it previously).
You mentioned 4 s for 100 LOBs of 20k average size, so that's 40 ms per LOB. Remember that each LOB needs to be retrieved via a separate LOB locator (it is not in the result set by default). That is an additional round trip for each LOB, I assume (I am not 100% sure on this since it was a while ago). If that is the case, I assume it will add at least 5 ms per round trip, in serial order, right? If so, your performance is already limited primarily by sequential LOB fetches. You should be able to verify this by tracking the time spent in SQL execution vs. LOB content fetching, or by excluding the LOB column as suggested by the previous answer, which should tell you whether it is LOB-related.
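One way to separate the two, sketched below with a hypothetical table name plus the bigClob column from the question, is to time the fetch loop once while skipping the CLOB content and once while materializing it:

import java.sql.Clob;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class LobTimingDemo {
    // Runs the query and optionally reads the CLOB content for every row,
    // returning the elapsed wall-clock time in milliseconds.
    static long timeFetch(Connection conn, boolean readClob) throws SQLException {
        long start = System.nanoTime();
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT id, bigClob FROM some_table")) { // hypothetical table
            while (rs.next()) {
                rs.getLong("id");
                if (readClob) {
                    Clob clob = rs.getClob("bigClob");
                    // materialize the content so the per-LOB round trips actually happen
                    clob.getSubString(1, (int) clob.length());
                }
            }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }
}

If the readClob run is dramatically slower than the plain run, the per-LOB round trips, rather than the query itself, are the bottleneck.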
Good luck
I had a similar issue and found that the JDBC LOBs were making a network call whenever the LOB content was accessed.
As of the Oracle 11.2 JDBC driver you can use a prefetch for LOBs.
This sped up access by a factor of 10...
statement1.setFetchSize(1000);
if (statement1 instanceof OracleStatement) {
    // oracle.jdbc.OracleStatement: prefetch LOB content along with the row data
    ((OracleStatement) statement1).setLobPrefetchSize(250000);
}
Thanks for all the helpful suggestions. Despite this being flagged as the answer to the problem, my answer is that there seems to be no good solution. I tried using parallel statements, different storage characteristics, presorted temporary tables, and other things. The operation does not seem to be bound by any characteristic visible through traces or explain plans. Even query parallelism seems to be sketchy when CLOBs are involved.
Undoubtedly there would be better options for dealing with large CLOBs (especially compression) in an 11g environment, but at the moment I am stuck with 10g.
I have opted now for an additional round trip to the database, in which I'll preprocess the CLOBs into a size-optimized binary RAW. In previous deployments this has always been a very fast option, and it will likely be worth the trouble of maintaining an offline computed cache. The cache will be invalidated and updated using a persistent process and AQ, until someone comes up with a better idea.