I am using VS 2017 Enterprise for development and an internal Azure runtime environment to run the SSIS packages. I am reading a ~1.5 million row Oracle table and writing it into an MS SQL database (DAN). Can anyone explain the performance characteristics I describe below? It appears that the OLE DB Oracle driver is very efficient on its own, but it runs terribly with the OLE DB SSIS task. Here are the run-time characteristics:
ADO.NET task with the Oracle ADO.NET driver: MS SQL table takes ~32 minutes to load
ADO.NET task with an Oracle OLE DB driver: MS SQL table takes ~5 minutes to load (the winner)
OLE DB task with an Oracle OLE DB driver: MS SQL table takes ~20 minutes to load
OK, I think I figured this out. I had been taking the MS SQL write side for granted, but in my environment there are two providers you can specify in an OLE DB Destination task for MS SQL: SQLOLEDB.1 and SQLNCLI11.1. The job that writes in 5 minutes uses SQLOLEDB.1 to write to MS SQL. The third job, the one that takes ~20 minutes, uses SQLNCLI11.1. Ironically, SQLOLEDB.1 is the older provider and many organizations are deprecating it, even though here it is much faster. I will be able to prove my suspicion soon and will report the final result.
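For reference, the only difference between the two destination connection managers is the Provider keyword in the connection string; the server and database names below are placeholders, not the actual environment:

Provider=SQLOLEDB.1;Data Source=MyServer;Initial Catalog=MyDb;Integrated Security=SSPI;
Provider=SQLNCLI11.1;Data Source=MyServer;Initial Catalog=MyDb;Integrated Security=SSPI;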
Related
We have a legacy process that runs on SSIS 2016 on Windows Server 2016; it executes custom queries against databases on remote servers, pulls the results (thousands or sometimes millions of records), and stores them in a local SQL Server database. The remote databases are DB2 and Oracle 19c.
This process has always connected using an OLE DB driver and a data flow with OLE DB source and destination components. It has also always been slow.
Because of an article we read recently claiming that OLE DB transfers only one record at a time while ADO.NET can do the network transfer in batches (is this even true?), we decided to try an ADO.NET driver to connect to DB2 and to replace the OLE DB source and destination components with ADO.NET components.
The transfer we used as a test case, which involved 46 million records, basically flew; we could see it bring down around 10K records at a time. Something that used to run in 13 hours ran in 24 minutes with no other changes. Some small tweaks to the query brought that time even lower, to 11 minutes.
This is obviously major, and we want to replicate it with our Oracle data sources. Network bandwidth seems to have been the main issue, so we want to transfer data from Oracle 19c to our SQL Server 2016 databases using SSIS in batches, but we want to ask the experts what the best/fastest way to do this is.
Is the Microsoft Connector for Oracle the way to go as far as the driver? Since we're not on SQL Server 2019, this article says we also need to install the Oracle Client and Microsoft Connector Version 4.0 for Oracle by Attunity. What exactly is the Oracle Client? Is it one of these? If so, which one, based on our setup?
Also, should we use ADO.NET components in the data flow just like we did with DB2? In other words, is the single-record vs. record-batch difference driven by the driver used to connect, by the type of components in the data flow, or do both need to go hand in hand for this to work?
Thanks in advance for your responses!
OLE DB connections are not slow by themselves - it's a matter of what features the driver has available. It sounds like the ADO.NET driver for DB2 allows bulk insert and the OLE DB one does not.
Regarding Oracle, the Attunity driver is the way to go. You'll need to install the Oracle client as well. The links that you have look correct to me, but I don't have access to test.
Also, please note that data flows batch data by default in increments of the buffer size - 10K rows, for example.
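For reference, two data flow task properties control this batching; the values shown are, to the best of my knowledge, the SSIS defaults, and raising DefaultBufferMaxRows is a common first tuning step:

DefaultBufferMaxRows = 10000     -- maximum rows per buffer (default)
DefaultBufferSize    = 10485760  -- buffer size in bytes, 10 MB (default)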
Situation
The same query with the same data volume, on servers with the same hardware specs (processors, RAM, SSD disks, etc.), runs in 8 seconds on SQL Server 2016 and takes more than 3 hours on SQL Server 2019.
Step by step
Installed a new SQL Server 2019 database on a new server, to be the new production environment. Same number of processors, same memory, SSD disks, data on one disk, logs on another, etc.
Migrated the tables, views, stored procedures, data, and indexes, and rebuilt all the indexes.
Executed the ETL, reading from the production source, and all was OK; execution times were within parameters.
Configured the reporting tool (which generates SQL over the database); all OK.
Hit problems with some reports.
Copied the SQL to Management Studio to debug. Just generating the explain plan for this query takes 8 seconds on SQL Server 2016, but several minutes on SQL Server 2019 (after 5 minutes, I cancelled the request).
Why?
Then I:
checked the memory ("Available physical memory is high")
rebuilt the indexes
confirmed that the disks were SSD
executed the explain plan and checked whether the CPUs were being used (monitor)
updated the statistics (exec sp_updatestats)
installed CU9 and restarted the SQL Server 2019 instance (not the server)
cut the query down to be able to generate the explain plan on both servers
compared explain plans (between 2016 and 2019) and changed "Cost Threshold for Parallelism" and "Max Degree of Parallelism" to 0, because 2016 used parallelism and 2019 did not (see the sketch after this list). Same problem.
used a HINT to force parallelism, but got the same execution times again
then, out of nowhere and without the HINT, it started using parallelism on the short explain plan, but I was still unable to generate the complete explain plan
the query was reading from ## (global temporary) tables, so I created normal tables in the database; same problem
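For reference, the parallelism changes from the list above boil down to these sp_configure calls (a sketch of what was run; both are advanced options):

EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
-- 0 = every plan is eligible for a parallel plan
EXEC sp_configure 'cost threshold for parallelism', 0;
RECONFIGURE;
-- 0 = let SQL Server use all available processors
EXEC sp_configure 'max degree of parallelism', 0;
RECONFIGURE;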
Bottom line
To me, the strange part is the amount of time SQL Server 2019 needs to generate the explain plan, while SQL Server 2016 needs only a couple of seconds.
How can I troubleshoot this?
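A minimal first check is to confirm that the time really goes into parse/compile rather than execution; SET STATISTICS TIME reports the compile cost separately (the SELECT and table name below are placeholders for the actual slow query):

SET STATISTICS TIME ON;
SELECT COUNT(*) FROM dbo.SomeLargeTable;  -- placeholder: run the actual slow query here
SET STATISTICS TIME OFF;
-- the Messages tab then shows "SQL Server parse and compile time"
-- separately from the execution time, isolating the plan-generation cost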
I have experienced a very similar problem with SQL Server 2019 (RTM-CU16-GDR) on Windows 2019.
The query was a simple one like "select count(*) from Schema1.Table1 where report_date='2022-01-23' and type = 2 and DueDate='2022-03-18'". I just tried to view the estimated execution plan, but it took 3 minutes. When I dug into the details, I realized that a statistic had been created on DueDate automatically. Once the statistic existed, plan generation took just a few seconds. When I removed the statistic, it again took 3 minutes. When I created the statistic on DueDate manually, plan generation took a few seconds, which was very good indeed.
To find a solution I turned AUTO_CREATE_STATISTICS off and back on, and then it behaved normally; plan generation took a few seconds. Here is the script.
ALTER DATABASE [DbName] SET AUTO_CREATE_STATISTICS OFF
GO
ALTER DATABASE [DbName] SET AUTO_CREATE_STATISTICS ON
GO
After this simple, silly turning OFF and ON, even after removing the specific statistic on the column, the estimated plan was generated in seconds instead of minutes.
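For anyone who wants to reproduce the check: creating the statistic by hand and then verifying which statistics exist on the table can be done like this (the statistic name is arbitrary; Schema1.Table1 is the example table from above):

CREATE STATISTICS ST_Table1_DueDate ON Schema1.Table1 (DueDate);
-- list the statistics on the table and whether they were auto-created
SELECT s.name AS stat_name, s.auto_created, c.name AS column_name
FROM sys.stats AS s
JOIN sys.stats_columns AS sc
  ON s.object_id = sc.object_id AND s.stats_id = sc.stats_id
JOIN sys.columns AS c
  ON sc.object_id = c.object_id AND sc.column_id = c.column_id
WHERE s.object_id = OBJECT_ID('Schema1.Table1');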
I am in a situation where I have to check and confirm whether the SSAS partition queries run in parallel while processing the SSAS cube via an SSIS job. The SSIS job/package uses an 'Analysis Services Processing Task' to process the cube by selecting each object (dimensions and partitions) individually instead of selecting the SSAS database directly.
Can anyone please guide me on how to check parallelism using SQL Profiler?
Also, can anyone point out why cube processing done this way takes longer than cube processing by an SSIS job in which the 'Analysis Services Processing Task' selects the SSAS database name directly?
Please help with any comments/suggestions.
Many Thanks
Regards,
Update: the end database from which the partitions fetch the data is Oracle.
I think there is an easier way than using SQL Profiler: you can benefit from the excellent stored procedure sp_whoisactive to check which queries are currently running on the server (the data source SQL Server Database Engine) while the Analysis Services Processing Task is processing.
Just download the stored procedure and create it in your master database.
sp_whoisactive homepage
How to Log Activity Using sp_whoisactive
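Assuming the procedure was created in master as described, the simplest check is to call it repeatedly while the processing task runs (this is the bare invocation; sp_whoisactive also takes many optional parameters):

EXEC sp_whoisactive;
-- each call returns a snapshot of the requests currently running;
-- several partition queries appearing at once means the partitions are processed in parallel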
Hint: In SQL Server Management Studio, go to the data source properties and check the maximum allowed connections property, since a low value may prevent parallel query execution.
If you are looking for an answer using SQL Profiler, you can simply create a trace to monitor the SQL Server that contains the data sources used by the partitions. If many SQL queries execute in parallel while the partitions are processed, then parallelism is being achieved.
If you are new to SQL Profiler you can refer to the following links to learn how to monitor only T-SQL commands:
How to monitor just t-sql commands in SQL Profiler?
Use SQL Server Profiler to trace database calls from third party applications
But if you are looking for a simpler solution, the other answer is what you are looking for.
We had one SSIS package using the Oracle 11 client, and our daily query would run in 30 minutes to 1 hour.
We had to upgrade our Oracle clients because one of our other Oracle sources was upgraded.
After the upgrade to Oracle 12c, our daily job's run time increased.
Our Oracle DBA said it is not running in parallel, as it occupies only one processor.
When we run the same query from SQL Developer or Toad, it runs in parallel, but if we run it from the SSIS OLE DB Source component it does not run in parallel.
I'm clueless about this behavior. Any solution will be helpful.
Ask me for more clarification if required.
Trying to figure out the issue
I tried to search on this topic; I didn't find a lot of information, but I think it comes down to the OLE DB connection string provided in the OLE DB connection manager.
Check the following Oracle documentation; it may give you some insights:
Features of OraOLEDB
In the link above, in the Distributed Transactions part, they mention that:
The DistribTX attribute specifies whether sessions are enabled to enlist in distributed transactions. Valid values are 0 (disabled) and 1 (enabled). The default is 1 which indicates that sessions are enabled for distributed transaction enlistments.
Sessions enabled for distributed transaction enlistments cannot run statements that use the direct path load and parallel DML capabilities of the Oracle database. Such statements are executed as conventional path serial statements.
I am not sure if this will help, but it is worth a try.
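Based on the documentation quoted above, the attribute can be turned off directly in the connection string; this is only a sketch, not a tested fix, and the TNS alias and credentials below are placeholders:

Provider=OraOLEDB.Oracle;Data Source=MyTnsAlias;User ID=MyUser;Password=MyPassword;DistribTX=0;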
Oracle Attunity Connectors
Instead of using an OLE DB Source to read from Oracle, it is better to use the Oracle Attunity connectors for SSIS, which provide higher performance than the OLE DB Source:
Microsoft Connectors By Attunity
Attunity's high speed connectors for Oracle and Teradata have been selected by Microsoft to be included with SQL Server Integration Services (SSIS).
When I use the OLE DB Destination Editor in SSIS to connect to an Oracle database, it takes a long time (15-30 minutes) to retrieve the list of tables in the database.
The database is not that big (400 tables and 50 views) and is Oracle 12.
I have built SSIS packages connected to other Oracle databases before, and it has never taken this long. What can I look at to determine the issue, and how can I solve it?
Thanks,