JDBC with Oracle DB: out of memory

I've written some simple code that reads a table from an Oracle DB.
When I run it on a very big table, it consumes a huge amount of memory.
I thought that setting the fetch size would keep memory usage down (that is what happens with SQL Server), but it didn't. I tried various values, from 10 to 100000.
I can't see how to perform what should be a simple task: exporting a very big Oracle table to a CSV file.
I use ojdbc6.jar as the driver.
also I use
Any idea?

Seems like creating the statement with ResultSet.TYPE_FORWARD_ONLY solved this problem.
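For reference, here is a minimal sketch of the export described in the question, combining the forward-only statement with a fetch size. The class and method names (OracleCsvExport, csvField, writeRows, export) are made up for illustration, and the CSV quoting rules are an assumption, since the question does not say which CSV dialect is needed.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper, not part of any driver or library.
final class OracleCsvExport {

    // Quote a field only when it contains a comma, a quote, or a line break.
    // NULL values become empty fields (an assumption about the wanted format).
    static String csvField(String value) {
        if (value == null) {
            return "";
        }
        boolean needsQuotes = value.contains(",") || value.contains("\"")
                || value.contains("\n") || value.contains("\r");
        if (!needsQuotes) {
            return value;
        }
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    // Write each row as soon as it is read, so no rows are kept by this code.
    static long writeRows(ResultSet rs, Writer out) throws SQLException, IOException {
        ResultSetMetaData meta = rs.getMetaData();
        int columns = meta.getColumnCount();
        long rows = 0;
        while (rs.next()) {
            StringBuilder line = new StringBuilder();
            for (int i = 1; i <= columns; i++) {
                if (i > 1) {
                    line.append(',');
                }
                line.append(csvField(rs.getString(i)));
            }
            out.write(line.append('\n').toString());
            rows++;
        }
        return rows;
    }

    // tableName is concatenated into the SQL, so it must come from a trusted source.
    static long export(Connection conn, String tableName, Path target, int fetchSize)
            throws SQLException, IOException {
        try (Statement stmt = conn.createStatement(
                ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
            stmt.setFetchSize(fetchSize); // rows fetched per round trip
            try (ResultSet rs = stmt.executeQuery("SELECT * FROM " + tableName);
                 BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
                return writeRows(rs, out);
            }
        }
    }
}
```

The point of the sketch is that a forward-only, read-only result set never needs to keep earlier rows around for scrolling, so memory use should depend on the fetch size rather than on the table size.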


Impala query with LIMIT 0

As a production support team member, I investigate issues with various Impala queries. While researching an issue, I noticed that a team submits an Impala query with LIMIT 0, which obviously does not return any rows, and then submits it again without LIMIT 0, which gives them the result. I guess they submit these queries from IBM DataStage. Before I ask them why they do so, I wanted to check what reason someone could have to run a query with LIMIT 0. Is it just to check the syntax or the connection to Impala? I see a similar question discussed here in the context of SQL, but thought I'd ask anyway from an Impala perspective. Thanks, Neel
I think you are partially correct.
Please note that Impala processes all the data and then applies the LIMIT clause.
LIMIT 0 is mostly used:
To check that the syntax of the SQL is correct. Impala still fetches all the records before applying the limit, so the SQL is completely validated. Some systems may use this to check SQL they generated automatically before actually running it on the server.
To avoid fetching lots of rows from a huge table or data set every time you run a SQL statement.
To create an empty table using the structure of some other tables when you do not want to copy the storage format, configuration, etc.
To avoid burdening Hue or any other interface that interacts with Impala. All data will be processed but will not be returned.
For performance tests: this gives you a rough idea of the run time of the SQL. I say "rough" because it is not the actual time to complete but an estimated time to complete the SQL.
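To illustrate the "check generated SQL first" case, here is a rough sketch of how a client tool could probe a query with LIMIT 0 over JDBC and read back only the column names. This is a guess at what such a tool might do, not a description of what DataStage actually does. LimitZeroProbe, wrap and columnNames are hypothetical names, and wrapping the query as an aliased subquery is an assumption of this sketch.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only.
final class LimitZeroProbe {

    // Strip trailing semicolons and wrap the query so that no rows come back.
    static String wrap(String sql) {
        String trimmed = sql.trim();
        while (trimmed.endsWith(";")) {
            trimmed = trimmed.substring(0, trimmed.length() - 1).trim();
        }
        return "SELECT * FROM (" + trimmed + ") probe LIMIT 0";
    }

    // Run the wrapped query; a syntax or analysis error surfaces as an SQLException,
    // otherwise the (empty) result set still carries the column metadata.
    static List<String> columnNames(Connection conn, String sql) throws SQLException {
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(wrap(sql))) {
            ResultSetMetaData meta = rs.getMetaData();
            List<String> names = new ArrayList<>();
            for (int i = 1; i <= meta.getColumnCount(); i++) {
                names.add(meta.getColumnLabel(i));
            }
            return names;
        }
    }
}
```

A tool that works this way would naturally submit the same query twice: once as the probe, and once for real.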

How to implement ORACLE to VERTICA replication?

I am in the process of building an Oracle-to-Vertica replication process.
We are looking to create a Vertica DB that will run heavy reports. So far it is all good: Vertica is fast and its space usage is great. All is well until we get to the main part: getting the data from Oracle to Vertica.
The initial load is fine: dump to CSV from Oracle, then load into Vertica. Load times are so short that everybody thinks it is a bad joke or that there is some magic going on. It is simply fast.
Now the bad part: both databases (Oracle and Vertica) are up and running, and data keeps getting altered in Oracle, so I need to replicate those changes in Vertica. What now?
From my tests and from what I understand about Vertica, inserts and updates are not to be used beyond maybe 20 per second at most, so real-time replication is out of the question.
So I was thinking of reading the archive logs from Oracle and running ETL on them to create CSV data with the new, altered, and deleted rows, and then applying it to Vertica, but I cannot get a list like this:
Because explicit data changes in Vertica lead to slow performance.
So I am looking for ideas on how to solve this, given that I cannot:
Alter my Oracle production structure.
Use Oracle environment resources for filtering the data.
Use insert, update, or delete statements in my Vertica load process.
Things I depend on:
The use of the COPY command
Data consistency
A maximum window of 60 minutes (every 60 minutes, new/altered data needs to go to Vertica).
I have looked at Continuent's data replication, but it seems that nobody wants to sell their product; I cannot get in touch with them.
Would loading the whole data set into a new table and then swapping it with the old one be acceptable?
copy new() ...
-- you can swap tables in one command:
alter table old,new,swap rename to swap,old,new;
truncate new;
Extract the data from Oracle (in .csv format) and load it using the Vertica COPY command. Write a simple shell script to automate this process.
I used to use Talend (ETL), but it was very slow, so I moved to this conventional process, and it has really worked for me. I am currently processing 18M records, and my entire process takes less than 2 minutes.

What can I do to enhance the performance of bulk data loading using Derby?

I am using a Derby in-memory DB. I need to load some data from CSV files at startup. For now, it takes about 25 seconds to load all the CSV files into their tables. I hope the time can be reduced, because the data files are actually not very large.
What I have done is use the built-in procedure from Derby:
{CALL SYSCS_UTIL.SYSCS_IMPORT_TABLE (?,?,?,',','"','UTF-8',1 )} or
The only special thing is that sometimes the data for one table is split into many small CSV files, so I have to load them one by one. I have tested that if I combine them, loading takes only 16 seconds. However, I cannot remove this feature because the user needs it.
Is there anything I can do to reduce the data loading time? Should I disable logging, write a user-defined function/procedure, or is there any other tuning that can be done? Any advice is welcome.
Use H2 instead of Derby, and use the CSVREAD feature. If that's still too slow, see the fast import optimization, or use the CSV tool directly (without using a database). Disclaimer: I wrote the CSV support for H2.
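If staying with Derby, one option suggested by the 16-second measurement in the question is to keep the split files as the user-facing input but concatenate them into a temporary file at load time, then call the import procedure once per table. Below is a minimal sketch of that idea. DerbyCsvLoader, mergeParts and importMerged are hypothetical names, and the sketch assumes the part files have no header lines and share the same column layout.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;
import java.util.List;

// Hypothetical helper for illustration only.
final class DerbyCsvLoader {

    // Concatenate the part files in order, making sure each part ends with a newline
    // so the last row of one file does not run into the first row of the next.
    static Path mergeParts(List<Path> parts) throws IOException {
        Path merged = Files.createTempFile("merged-", ".csv");
        try (OutputStream out = Files.newOutputStream(merged)) {
            for (Path part : parts) {
                byte[] bytes = Files.readAllBytes(part);
                out.write(bytes);
                if (bytes.length > 0 && bytes[bytes.length - 1] != '\n') {
                    out.write('\n');
                }
            }
        }
        return merged;
    }

    // One import call per table instead of one per part file.
    // The call text is copied from the question; its final argument (1) puts Derby
    // in replace mode, so existing rows in the table are replaced.
    static void importMerged(Connection conn, String schema, String table, List<Path> parts)
            throws SQLException, IOException {
        Path merged = mergeParts(parts);
        try (CallableStatement call = conn.prepareCall(
                "{CALL SYSCS_UTIL.SYSCS_IMPORT_TABLE (?,?,?,',','\"','UTF-8',1 )}")) {
            call.setString(1, schema);
            call.setString(2, table);
            call.setString(3, merged.toString());
            call.execute();
        } finally {
            Files.deleteIfExists(merged);
        }
    }
}
```

Reading each part fully into memory is fine here only because the question says the files are small; for large parts, Files.copy(part, out) would be the safer choice.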

How can I speed up loading data in Oracle tables?

I have some very large tables (to me anyway), as in millions of rows. I am loading them from a legacy system and it is taking forever. Assume the hardware is fine, that is, fast. How can I speed this up? I have tried exporting from one system into CSV and using SQL*Loader: slow. I have also tried a direct link from one system to the other so there is no intermediate CSV file, just unloading from one and loading into the other.
One person said something about pre-staging tables and that somehow could make things faster. I don't know what that is or if it could help. I was hoping for input. Thank you.
Oracle 11g is what is being used.
update: my database is clustered so I don't know if I can do anything to speed things up.
What you can try:
disabling all constraints and only enabling them after the load process
CTAS (create table as select)
What you really should do: understand what your bottleneck is. Is it the network, file I/O, constraint checking...? Then fix that problem. For me, looking at the explain plan is most of the time the first step.
As Jens Schauder suggested, if you can connect to your source legacy system via DB link, CTAS would be the best compromise between performance and simplicity, as long as you don't need any joins on the source side.
Otherwise, you should consider using SQL*Loader and tweaking some settings. Using direct path, I was able to load 100M records (~10 GB) in 12 minutes on a 6-year-old ProLiant.
EDIT: I used the data format defined for the Datamation sort benchmark. The generator for it is available in the Apache Hadoop distribution. It generates records with fixed width fields with 99 bytes of data plus a newline character per line of file. The SQL*Loader control file I used for the numbers quoted above was:
INFILE 'rec100M.txt' "FIX 99"
What is the configuration you are using?
Does the database into which the data is imported have something like a standby database coupled to it? If so, it very likely has force_logging enabled.
You can check this using
SELECT force_logging FROM v$database;
It can also be enabled at the tablespace level:
SELECT tablespace_name, force_logging FROM dba_tablespaces;
If your database is running with force_logging, or your tablespace has force_logging, this will have an impact on the import speed.
If this is not the case, check whether archivelog mode is enabled.
SELECT log_mode FROM v$database;
If so, it could be that the archives are not written fast enough. In that case, increase the size of the online redo log files.
If the database is not running in archivelog mode, it still has to write to the redo files unless direct path inserts are used. In that case, check how quickly the redo can be written. Normally, 200 GB/h is quite achievable when indexes do not play a role.
The important thing is to find which link is causing the lack of performance. It could be the input, it could be the output. Here I focused on the output.

Hibernate with Oracle JDBC issue

I have a select query that takes 10 minutes to complete, as it runs through 10M records. When I run it through TOAD or a program using a plain JDBC connection, I get the results back, but a job that uses Hibernate as its ORM does not return any results. It just hangs, even after 45 minutes. Please help.
Are you saying you are trying to retrieve 10M records using an ORM like Hibernate?
If this is the case, you have one big problem: you need to redesign your application, because this is not going to work. As for why it hangs, I bet it is because it runs out of memory.
Have you enabled SQL output for Hibernate? You need to set hibernate.show_sql to true in order to do that.
Once that's done, compare the generated SQL with the one you've been running through TOAD. Are they exactly the same or not?
I'm going to venture a guess here and say they're not because once SQL is generated Hibernate does nothing fancy - connection is taken from a pool; prepared statement is created and executed - so it should be no different from JDBC.
Thus the question most likely is how can your HQL be optimized. If you need any help with that you'll have to post the HQL in question as well as appropriate mappings / table schemas. Running explain on query would help as well.
