Exporting data to a CSV file: sqlplus vs sqlcl parallel spool vs PL/SQL UTL_FILE with DBMS_PARALLEL_EXECUTE, which one is faster?

In my last project we worked on a requirement where a huge amount of data (40 million rows) had to be read, and for each row we needed to trigger a process. As part of the design we used multithreading, where each thread fetches the data for a given partition using a JDBC cursor with a configurable fetch size. However, when we ran the job in the Prod environment, we observed that it was slow because most of the time was spent querying data from the database.
As we had a very tight timeline for completing the job, we came up with a workaround: the data was exported from SQL Developer in CSV format and split into small files, which were then fed to the job. This improved the job's performance significantly and helped us complete it on time.
As mentioned above, exporting the data to files is currently a manual step. If this step needs to be automated, for instance by triggering the export from the Java application, which of the options below (suggested on the web) would be the fastest?
sqlplus (Java making a native call to SQL*Plus)
sqlcl parallel spool
a PL/SQL procedure using UTL_FILE with DBMS_PARALLEL_EXECUTE
The link below gives some details on these options but does not include any performance numbers.
https://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:9536328100346697722
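For reference, here is a rough sketch of how I understand option 3 would be wired up. The table BIG_TABLE, its numeric key column ID, the column list, the directory object EXPORT_DIR and the chunk/parallel settings are just placeholders, and I have not been able to run this myself:

    -- Helper that writes one chunk of rows to its own CSV file via UTL_FILE
    -- (assumes a writable Oracle directory object EXPORT_DIR).
    CREATE OR REPLACE PROCEDURE export_chunk (p_start_id IN NUMBER,
                                              p_end_id   IN NUMBER) AS
      l_file UTL_FILE.FILE_TYPE;
    BEGIN
      l_file := UTL_FILE.FOPEN('EXPORT_DIR',
                               'big_table_' || p_start_id || '_' || p_end_id || '.csv',
                               'w', 32767);
      FOR r IN (SELECT id, col1, col2
                  FROM big_table
                 WHERE id BETWEEN p_start_id AND p_end_id)
      LOOP
        UTL_FILE.PUT_LINE(l_file, r.id || ',' || r.col1 || ',' || r.col2);
      END LOOP;
      UTL_FILE.FCLOSE(l_file);
    END;
    /

    -- Split the table into chunks on the key column and export them in parallel
    -- (running chunks in parallel requires the CREATE JOB privilege).
    DECLARE
      l_task VARCHAR2(30) := 'export_big_table';
    BEGIN
      DBMS_PARALLEL_EXECUTE.CREATE_TASK(l_task);
      DBMS_PARALLEL_EXECUTE.CREATE_CHUNKS_BY_NUMBER_COL(
          task_name    => l_task,
          table_owner  => USER,
          table_name   => 'BIG_TABLE',
          table_column => 'ID',
          chunk_size   => 100000);
      DBMS_PARALLEL_EXECUTE.RUN_TASK(
          task_name      => l_task,
          sql_stmt       => 'BEGIN export_chunk(:start_id, :end_id); END;',
          language_flag  => DBMS_SQL.NATIVE,
          parallel_level => 8);
      DBMS_PARALLEL_EXECUTE.DROP_TASK(l_task);
    END;
    /

Each chunk would produce its own CSV file, which matches the split-files approach that worked for us.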
Please note that I currently don't have access to this Oracle environment, so I could not test this from my side. Also, I am an application developer and don't have much expertise on the DB side.
So I am looking for advice from someone who has worked on a similar use case before or has relevant expertise.
Thanks in advance.

Related

How to check whether SSAS partitions are running in parallel using SQL Profiler?

I am in a situation where I have to check and confirm whether the SSAS partition queries run in parallel while the SSAS cube is processed by an SSIS job. The SSIS job/package uses the 'Analysis Services Processing Task' to process the cube by selecting each object (dimensions and partitions) individually, instead of selecting the SSAS database directly.
Can anyone please guide me on how to check this parallelism using SQL Profiler?
Also, can anyone point out why processing the cube this way takes longer than processing it with an SSIS job whose 'Analysis Services Processing Task' selects the SSAS database directly?
Please help with any comments or suggestions. Many thanks.
Update: The source database from which the partitions fetch their data is Oracle.
I think there is an easier way than using SQL Profiler: you can benefit from the excellent stored procedure sp_whoisactive to check which queries are currently running on the server (the data source's SQL Server Database Engine) while the Analysis Services Processing Task is executing.
Just download the stored procedure and create it in your master database.
sp_whoisactive homepage
How to Log Activity Using sp_whoisactive
Hint: In SQL Server Management Studio, go to the data source properties and check the maximum number of connections property, since a low value may prevent queries from executing in parallel.
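As a minimal check (assuming sp_whoisactive has been created in master as described above), running the procedure with its defaults while the processing task executes is enough to see whether the partition queries overlap:

    -- Run on the relational SQL Server instance that hosts the partitions' data sources
    -- while the Analysis Services Processing Task is executing.
    -- If several partition SELECT statements show up at the same time,
    -- the partitions are being read in parallel.
    EXEC sp_whoisactive;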
If you are looking for an answer using SQL Profiler, you can simply create a trace to monitor the SQL Server instance that hosts the data sources used by the partitions. If many SQL queries execute at the same time while the partitions are being processed, then parallelism is being achieved.
If you are new to SQL Profiler you can refer to the following links to learn how to monitor only T-SQL commands:
How to monitor just t-sql commands in SQL Profiler?
Use SQL Server Profiler to trace database calls from third party applications
But if you want a simpler solution, then the other answer is what you are looking for.

Take Data From Oracle to Cassandra Every Day

We want to copy tables from Oracle to Cassandra every day, because the tables are updated in Oracle daily. When I searched for this, I found these options:
Extract the Oracle tables as files, then write them to Cassandra
Use Sqoop to get the tables from Oracle, write a MapReduce job, and insert into Cassandra
I am not sure which way is appropriate. Are there any other options?
Thank you.
Option 1
Extracting Oracle tables as files and then writing them to Cassandra manually every day can be a tiresome process unless you schedule it as a cron job. I have tried this before, but if the process fails, logging it can be an issue. If you use this approach, exporting to CSV and then writing to Cassandra, I would suggest using the Cassandra bulk loader (https://github.com/brianmhess/cassandra-loader).
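The Oracle side of this option can be as simple as spooling a CSV from a script. A minimal sketch, assuming SQLcl and a hypothetical table my_schema.customers (on SQL*Plus 12.2+ the equivalent of "set sqlformat csv" is "set markup csv on"):

    -- extract_customers.sql: hypothetical SQLcl script producing a CSV for the bulk loader.
    set sqlformat csv
    set feedback off
    spool /tmp/customers.csv
    select * from my_schema.customers;
    spool off
    exit

The resulting file can then be loaded with cassandra-loader as mentioned above, and the whole script scheduled from cron.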
Option 2
I haven't worked with this, so can't speak about this.
Option 3 (I use this)
I use an open source tool, Pentaho Data Integration (Spoon) (https://community.hitachivantara.com/docs/DOC-1009855-data-integration-kettle), to solve this problem. It's a fairly simple process in Spoon. You can automate it by using a Carte server (Spoon server), which has logging capabilities as well as automatic restarting if the process fails partway through.
Let me know if you found any other solution that worked for you.

Oracle application - migration to Exadata server

We have an upcoming migration of our Oracle database to an Exadata server. I want to clarify some issues I have thought of:
Will there be any issues with the code, i.e. performance issues? Exadata has another type of optimizer: it doesn't use indexes and has a columnar optimizer, if I'm not mistaken.
Currently some import/export files are generated on the database server (accessed via FileZilla). I understand that with Exadata the database server is inaccessible, and I suspect that either:
• we will have to move those files to another server - Oracle knows only FTP (whose ports are closed at our client) -> how do we write/read from another server? (as far as I understand, they would like to put all the files on the WAS server)
• or we will need to import the files into tables using the Java application and process them from there (and the same with the exported files).
Can files that come automatically from other applications be written to the database server? Or do we have the same problems as with the manual part?
We have plenty of database jobs that run KSH scripts on the database server - is there a problem with them? I understand they should also be moved to the WAS server, but I do not know how Oracle would call them from there.
Will there be any problems with Jenkins deployments? Does anything change? Here we save the SQL/PL/SQL sources in XML files, from which the whole application is restored (packages, configuration tables, nomenclatures ...), with the exception of the working data; the XML files are read through a procedure from an Oracle directory.
If you can think of any other issues concerning this migration, any problems you have encountered during or after the migration to Exadata, please share!
Thank you,
Step by step:
On Exadata you are going to have the same optimizer behaviour, with some improvements, because Exadata can improve full table scan performance thanks to smart scans: it is able to avoid retrieving data blocks during full table scans when it knows in advance that they do not contain the needed data (a quick way to verify this is sketched after these points).
On Exadata you can expose DBFS file systems to external servers; these can be useful for external tables, imports/exports and so on.
You can write such files to a DBFS file system that you configure.
You could use your DBFS here too, if you want the KSH scripts to be accessible from outside the Exadata.
Let your Oracle directory point to a directory in the DBFS file system where you put your XML files, and you are done.
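For the first and last points, two quick sketches (the directory name, grantee and mount path are hypothetical, and they assume a DBFS file system already mounted on the database servers plus SELECT privileges on the V$ views):

    -- Point the directory used by the XML-loading procedure at a path on the mounted
    -- DBFS file system (name, path and grantee are placeholders).
    CREATE OR REPLACE DIRECTORY xml_import_dir AS '/dbfs_direct/FS1/xml';
    GRANT READ, WRITE ON DIRECTORY xml_import_dir TO app_user;

    -- After running a representative full-scan query, these statistics show whether the
    -- storage cells actually offloaded work via smart scans.
    SELECT name, value
      FROM v$sysstat
     WHERE name LIKE 'cell physical IO%';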

Stored Procedures Overwhelming Oracle.EXE On Oracle 11g On Windows

Until very recently we ran a 3rd party HR database on an Oracle Unix environment. I have additionally set up various web services that hit stored procedures to carry out a few bespoke processes for our users, and all ran well for years.
However, now that we have moved to Oracle on a Windows environment there is suddenly a big problem.
The best example I have is a VB.Net solution that reads a 2000-row CSV of employees into a DataTable, runs a couple of stored procedures to bring back Post Id etc., populates a database table with the results, then feeds it all back out into a new CSV. This process used to take 1-2 minutes to complete on Unix. It now takes well over 2 hours and kills the server!
The problem manifests as overwhelming CPU usage on the database server. Any stored procedure call sends Oracle.EXE into overdrive, completely maxing out the CPU core it's using, such that no other stored procedures can run and everything grinds to a halt.
We have run Oracle Enterprise Manager, which suggested the creation of some indexes etc, but nothing will improve the issue. Like I say, the SQL ran fine and swiftly for years, and it hasn't changed at all.
Does anybody know what could be causing this? I am completely at a loss.
The way I see it, it must either be:
1. A CPU/hardware issue (but we have investigated, added extra cores etc to no avail)
2. An Oracle configuration issue; or
3. An issue with the 3rd party database (which is supposedly identical to what it was on Unix).
Thanks to anyone who read this far.
P.S. I've had a Stack Overflow user account for years but can't get logged into it any more. Back to noobie status for me!

utPLSQL and existing data?

I'm struggling to deploy utPLSQL to improve quality in my current project. The problem is that there are currently almost 1000 database tables and nearly 800 PL/SQL packages. Also, I'm very new to the utPLSQL framework, though I have some experience with SQL and PL/SQL.
I cannot rely on the existing data staying the same during and between test runs (which would be needed to produce the same test results), since there are dozens of developers changing the data constantly. What I'm looking for is to create temporary test tables in the tester schema based on the existing production tables, fill them with test data, and make the PL/SQL code use those test tables when running the tests. Is that even possible? If not, what approach should I use?
I've been reading Kevin McCormack's article Continuous Integration with Oracle PL/SQL, utPLSQL and Hudson, but the problem is that I cannot spend too much time reading and looking for a solution before the idea of using the utPLSQL framework gets mothballed by the organization I'm working for.
Any help would be most appreciated.
When using utPLSQL I have each test create any data it needs, execute the test against the created data, and then roll back the transaction at the end, which effectively removes the test data from the database. This takes extra time because I have to figure out what data actually needs to be created, but it ensures that the data exists when it's needed and doesn't hang around when it isn't, and the tests don't depend on data which may or may not exist. YMMV.
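As an illustration of that pattern, here is a minimal sketch using the annotation syntax of utPLSQL v3 (the tables, the package under test and the expected value are all hypothetical); by default utPLSQL wraps each test in a savepoint and rolls it back afterwards, so the inserted rows never persist:

    create or replace package test_order_totals as
      --%suite(Order total calculation)

      --%test(Sums the line amounts for one order)
      procedure sums_line_amounts;
    end test_order_totals;
    /

    create or replace package body test_order_totals as
      procedure sums_line_amounts is
        l_total number;
      begin
        -- Arrange: create exactly the data this test needs (hypothetical tables).
        insert into orders (order_id, customer_id) values (-1, -1);
        insert into order_lines (order_id, amount) values (-1, 10);
        insert into order_lines (order_id, amount) values (-1, 15);

        -- Act: call the code under test (hypothetical function).
        l_total := order_pkg.get_order_total(p_order_id => -1);

        -- Assert.
        ut.expect(l_total).to_equal(25);

        -- No cleanup needed: the default --%rollback(auto) behaviour rolls the
        -- transaction back to a savepoint taken before the test started.
      end sums_line_amounts;
    end test_order_totals;
    /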
