I want to load about 2 million rows from a CSV file into a database, run some SQL statements for analysis, and then remove the data. The file is 2 GB in size, and the data is web server log messages.
I did some research and found that the H2 in-memory database seems to be faster, since it keeps the data in memory. When I tried to load the data, I got an OutOfMemoryError because of 32-bit Java. I am planning to try 64-bit Java.
I am looking for all the optimization options to load the data quickly and run the SQL.
test.sql
CREATE TABLE temptable (
f1 varchar(250) NOT NULL DEFAULT '',
f2 varchar(250) NOT NULL DEFAULT '',
f3 varchar(250) NOT NULL DEFAULT ''
) as select * from CSVREAD('log.csv');
Running it like this with 64-bit Java:
java -Xms256m -Xmx4096m -cp h2*.jar org.h2.tools.RunScript -url 'jdbc:h2:mem:test;LOG=0;CACHE_SIZE=65536;LOCK_MODE=0;UNDO_LOG=0' -script test.sql
If any other database is available for AIX, please let me know.
Thanks
If the CSV file is 2 GB, it will need more than 4 GB of heap memory when using a pure in-memory database. The exact memory requirements depend a lot on how redundant the data is: if the same values appear over and over again, the database will need less memory, because common objects are re-used (whether it's a string, long, timestamp, and so on).
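To get a rough idea of how redundant the log data is before choosing a mode, one option is to sample the CSV and measure how often each column's values repeat. This is a minimal Python sketch, not anything provided by H2; the function name `column_redundancy` is hypothetical, and the result is only a hint, since actual memory use also depends on data types and per-row overhead.

```python
import csv
import io
from typing import Iterable, List


def column_redundancy(lines: Iterable[str]) -> List[float]:
    """Per column, return the fraction of values that repeat an earlier value.

    Close to 1.0: mostly repeated values (likely cheaper if common objects
    are re-used). Close to 0.0: mostly unique values.
    """
    seen: List[set] = []
    counts: List[int] = []
    for row in csv.reader(lines):
        # Grow the per-column bookkeeping if this row has more columns.
        while len(seen) < len(row):
            seen.append(set())
            counts.append(0)
        for i, value in enumerate(row):
            seen[i].add(value)
            counts[i] += 1
    return [1.0 - len(s) / n if n else 0.0 for s, n in zip(seen, counts)]


if __name__ == "__main__":
    sample = io.StringIO("GET,/index.html\nGET,/about.html\nGET,/index.html\nPOST,/login\n")
    print(column_redundancy(sample))  # [0.5, 0.25]
```

For the real file, passing `itertools.islice(open("log.csv", newline=""), 100_000)` samples only the first 100,000 lines so the sets of distinct values stay small; a header line, if present, is counted as an ordinary row.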
Please note that LOCK_MODE=0, UNDO_LOG=0, and LOG=0 are not needed when using CREATE TABLE AS SELECT. In addition, CACHE_SIZE does not help when using the mem: prefix (but it helps for in-memory file systems).
I suggest trying the in-memory file system first (memFS: instead of mem:), which is slightly slower than mem: but usually needs less memory:
jdbc:h2:memFS:test;CACHE_SIZE=65536
If this is not enough, try the compressed in-memory mode (memLZF:), which is again slower but uses even less memory:
jdbc:h2:memLZF:test;CACHE_SIZE=65536
If this is still not enough, I suggest trying the regular persistent mode and seeing how fast it is:
jdbc:h2:~/data/test;CACHE_SIZE=65536
My Oracle 11.2.0.3 full database Data Pump export is very slow. When I query V$SESSION_LONGOPS:
SELECT USERNAME,OPNAME,TARGET_DESC,SOFAR,TOTALWORK,MESSAGE,SYSDATE,ROUND(100*SOFAR/TOTALWORK,2)||'%' COMPLETED FROM V$SESSION_LONGOPS
where TOTALWORK > 0 and SOFAR <> TOTALWORK
it shows me 2 records: in OPNAME, one contains SYS_EXPORT_FULL_XX and the other "Rowid Range Scan". The MESSAGE for the latter is:
Rowid Range Scan : MY_SCHEMA.BIG_TABLE: 28118329 out of 30250532 Blocks done
and it takes hours and hours.
Note: MY_SCHEMA.BIG_TABLE is a 220 GB table with 2 CLOB columns.
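The COMPLETED column in the query above is just 100*SOFAR/TOTALWORK. As a quick worked check on the message (a Python sketch; both helper names are hypothetical), the scan is about 93% done with roughly 2.1 million blocks left. The linear estimate assumes the remaining blocks go at the same average rate as the ones already scanned; V$SESSION_LONGOPS also exposes ELAPSED_SECONDS and TIME_REMAINING columns that can feed or replace it.

```python
def percent_complete(sofar: int, totalwork: int) -> float:
    """Mirror ROUND(100*SOFAR/TOTALWORK, 2), refusing TOTALWORK <= 0."""
    if totalwork <= 0:
        raise ValueError("totalwork must be positive")
    return round(100 * sofar / totalwork, 2)


def linear_time_remaining(sofar: int, totalwork: int, elapsed_seconds: float) -> float:
    """Estimate seconds left, assuming the same average rate as so far."""
    if sofar <= 0:
        raise ValueError("no progress yet; cannot estimate a rate")
    rate = sofar / elapsed_seconds  # blocks per second so far
    return (totalwork - sofar) / rate


if __name__ == "__main__":
    sofar, total = 28118329, 30250532
    print(percent_complete(sofar, total))  # 92.95
    print(total - sofar)                   # 2132203 blocks still to scan
```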
If you have CLOBs in the table, it will take a long time to export, because that won't parallelize. Exactly which phase are you stuck in? Could you paste the last lines from the log file or get a status from Data Pump?
There are some best practices that you could try out:
SecureFile LOBs can be faster than BasicFile LOBs. That is yet another reason for going to SecureFile LOBs.
You could try increasing STREAMS_POOL_SIZE to (at least) 256 MB, although I don't think that is the cause.
Use the PARALLEL option and set it to 2 x CPU cores. Never export statistics; it is better to either export them using DBMS_STATS or regather them at the target database.
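The PARALLEL and statistics advice can be captured in a Data Pump parameter file. This is a small Python sketch that generates one; the function name, the DATA_PUMP_DIR directory object, and the file names are assumptions for illustration. Pass the database server's core count explicitly, because os.cpu_count() only reflects the machine the script runs on.

```python
import os
from typing import Optional


def build_expdp_parfile(cores: Optional[int] = None,
                        directory: str = "DATA_PUMP_DIR",
                        dumpfile: str = "full_%U.dmp",
                        logfile: str = "full_export.log") -> str:
    """Build a Data Pump parameter file for a full export.

    PARALLEL is 2 x CPU cores and statistics are excluded. %U in DUMPFILE
    lets each parallel worker write to its own dump file.
    """
    if cores is None:
        cores = os.cpu_count() or 1  # only correct if run on the DB server
    lines = [
        "FULL=Y",
        f"DIRECTORY={directory}",
        f"DUMPFILE={dumpfile}",
        f"LOGFILE={logfile}",
        f"PARALLEL={2 * cores}",
        "EXCLUDE=STATISTICS",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_expdp_parfile(cores=8), end="")
```

The output could be saved as, say, full.par and used with `expdp system PARFILE=full.par`.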
Regards,
Daniel
Well, for 11g and 12cR1, Streams AQ Enqueue is a common culprit for this as well. Running ALTER SYSTEM SET EVENTS 'IMMEDIATE TRACE NAME MMAN_CREATE_DEF_REQUEST LEVEL 6' will help if the issue is the very common Streams AQ Enqueue.
I'm using the oracle_fdw extension in my PostgreSQL database. I'm trying to copy the data of many tables in the Oracle database into my PostgreSQL tables, which I do by running insert into local_postgresql_temp select * from remote_oracle_table. The performance of this operation is very slow, so I tried to find the reason and maybe choose a different alternative.
1) First method: insert into local_postgresql_table select * from remote_oracle_table generated a total disk write of 7 MB/s and an actual disk write of 4 MB/s (iotop). For a 32 GB table it took 2 hours and 30 minutes.
2) Second method: copy (select * from oracle_remote_table) to '/tmp/dump' generates a total disk write of 4 MB/s and an actual disk write of 100 KB/s. COPY is supposed to be very fast, but it seems very slow.
- When I run COPY from the local dump, the reading is very fast (300 MB/s).
- I created a 32 GB file on the Oracle server and used scp to copy it, and it took a few minutes.
- The WAL directory is located on a different file system.
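As a sanity check on the first method's numbers, 32 GB in 2.5 hours averages about 3.6 MB/s, which lines up with the roughly 4 MB/s actual write rate from iotop. Given that scp and local COPY are both much faster, this suggests the limit is how fast rows arrive through the foreign data wrapper rather than how fast the disk can write. A tiny Python sketch of the arithmetic (the helper name is hypothetical; 1 GB is taken as 1024 MB):

```python
def throughput_mb_per_s(size_gb: float, seconds: float) -> float:
    """Average throughput in MB/s, treating 1 GB as 1024 MB."""
    return size_gb * 1024 / seconds


if __name__ == "__main__":
    # 32 GB copied in 2 h 30 min with INSERT ... SELECT through oracle_fdw
    elapsed = 2 * 3600 + 30 * 60  # 9000 s
    print(f"{throughput_mb_per_s(32, elapsed):.2f} MB/s")  # 3.64 MB/s
```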
The parameters I assigned :
min_parallel_relation_size = 200MB
max_parallel_workers_per_gather = 5
max_worker_processes = 8
effective_cache_size = 12GB
work_mem = 128MB
maintenance_work_mem = 4GB
shared_buffers = 2000MB
RAM: 16 GB
CPU cores: 8
How can I increase the writes? How can I get the data from the Oracle database into my PostgreSQL database faster?
I ran perf on this entire process and the results are:
I'd concentrate on the bits you said were fast.
-I created a 32G file on the oracle server and used scp to copy it and it took me a few minutes.
-When I run copy from the local dump, the reading is very fast 300 M/s.
I'd suggest you combine these two. Use a dump tool (or SQL*Plus) to export the data from Oracle into a file that PostgreSQL's COPY command can read. You could generate the binary file format directly, but that's a bit tricky; generating a CSV version shouldn't be too hard. An example of that is at "How do I spool to a CSV formatted file using SQLPLUS?".
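A minimal Python sketch of the CSV side, assuming the rows have already been fetched from Oracle (for example through a Python Oracle driver or a SQL*Plus spool; not shown). The function names are hypothetical. The main subtlety is that, with default settings, COPY in CSV format reads an unquoted empty field as NULL and a quoted "" as an empty string, so the two must be written differently.

```python
import io
from typing import Iterable, Optional, Sequence, TextIO

_NEEDS_QUOTES = (",", '"', "\n", "\r")


def pg_csv_field(value: Optional[object]) -> str:
    """Encode one value for COPY ... WITH (FORMAT csv) using default settings.

    Unquoted empty field -> NULL; quoted empty string "" -> empty string.
    """
    if value is None:
        return ""
    s = str(value)
    # Quote empty strings, the end-of-data marker \., and anything containing
    # the delimiter, the quote character, or a line break.
    if s == "" or s == "\\." or any(c in s for c in _NEEDS_QUOTES):
        return '"' + s.replace('"', '""') + '"'
    return s


def write_pg_csv(rows: Iterable[Sequence[Optional[object]]], out: TextIO) -> None:
    """Write rows (e.g. tuples fetched from Oracle) as PostgreSQL-readable CSV."""
    for row in rows:
        out.write(",".join(pg_csv_field(v) for v in row) + "\n")


if __name__ == "__main__":
    buf = io.StringIO()
    write_pg_csv([(1, None, "", 'say "hi", bye')], buf)
    print(buf.getvalue(), end="")  # 1,,"","say ""hi"", bye"
```

When writing to a real file, opening it with `open(path, "w", encoding="utf-8", newline="")` keeps Python from translating line endings inside quoted values. The file can then be loaded with `COPY local_postgresql_table FROM '/tmp/dump.csv' WITH (FORMAT csv);` for a server-side path, or with psql's `\copy` for a client-side file.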
I tested Hive with the following queries:
create table test (key string, value string) stored as orc;
insert into table test values ('a','a'), ('b','b');
select key, count(*) from test group by key;
And I got the out-of-memory error:
Error: Failure while running task:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:157)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
I have searched online, but people usually get this error when they are working with bigger files. In my case, the file only has two rows, and my computer has 14 GB of memory.
I have set HADOOP_HEAPSIZE to 1024 in /etc/hadoop/conf/hadoop-env.sh. It does not work.
First I increased tez.runtime.io.sort.mb, but I got this error instead: tez.runtime.io.sort.mb should be larger than 0 and should be less than the available task memory
Then I increased hive.tez.java.opts (and some other parameters) as suggested by @Hellmar Becker. That fixed the problem.
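The fix came down to keeping the Tez memory settings consistent with each other: the sort buffer has to fit inside the task's heap. A minimal Python sketch that derives them from a container size, assuming the commonly cited rule of thumb of a JVM heap around 80% of the container and a sort buffer around 40% of it (these ratios are assumptions, not values from the answer above). The property names are real Hive/Tez settings; the function name is hypothetical.

```python
def tez_memory_settings(container_mb: int,
                        heap_fraction: float = 0.8,
                        sort_fraction: float = 0.4) -> dict:
    """Derive mutually consistent Hive-on-Tez memory settings (assumed ratios)."""
    heap_mb = int(container_mb * heap_fraction)
    sort_mb = int(container_mb * sort_fraction)
    # Tez rejects a sort buffer that is <= 0 or not below the task memory.
    if not 0 < sort_mb < heap_mb:
        raise ValueError("sort buffer must be > 0 and smaller than the heap")
    return {
        "hive.tez.container.size": str(container_mb),
        "hive.tez.java.opts": f"-Xmx{heap_mb}m",
        "tez.runtime.io.sort.mb": str(sort_mb),
    }


if __name__ == "__main__":
    # Emit statements that can be pasted into a Hive session.
    for key, value in tez_memory_settings(2048).items():
        print(f"SET {key}={value};")
```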
I got the same error while creating the truck table as ORC in this Hadoop Hello World tutorial. You can try reducing the ORC compression buffer size (orc.compress.size, in bytes) using:
CREATE TABLE XXX STORED AS ORC TBLPROPERTIES ("orc.compress.size"="1024");
I hope this helps (for me, it worked).
They also agree that it's an issue in the Sandbox:
https://community.hortonworks.com/questions/34426/failure-to-execute-hive-query-from-lab-2.html#comment-35900
I tried many solutions and none of them worked. For the time being, I am using this workaround:
CREATE TABLE avg_mileage (truckid STRING,avgmpg BIGINT ) STORED AS ORC;
I am seeing poor performance in Oracle (11g) when trying to copy CLOBs from one database to another. I have tried several things, but haven't been able to improve this.
The CLOBs are used for gathering report data, which can be quite large on a per-record basis. I am calling a procedure on the remote databases (across a WAN) to build the data, then copying the results back to the database at corporate headquarters for comparison. The general format is:
CREATE TABLE my_report(the_db VARCHAR2(30), object_id VARCHAR2(30),
final_value CLOB, CONSTRAINT my_report_pk PRIMARY KEY (the_db, object_id));
To gain performance, I accumulate the results for remote sites into remote copies of the table. At the end of the procedure run, I try to copy the data back. This query is very simple:
INSERT INTO my_report SELECT * FROM my_report@europe;
The performance I am seeing is around 9 rows per second, with an average CLOB size of 3500 bytes. (I am using CLOBs because this size often goes above 4000 bytes, the VARCHAR2 limit.) For 70,000 records (not uncommon), this takes around 2 hours to transfer. I have tried the CREATE TABLE AS SELECT method, but it gets the same performance. I also spent more than a few hours tuning SQL*Net, but saw no improvement. Changing the arraysize does not improve the performance (though reducing the value can make it worse).
I am able to get a copy over using the old exp/imp method (export the table from the remote database, import it back in), which runs much faster, but this is fairly manual for my automated report. I have considered writing a pipelined function to select this data from, using it to split the CLOBs into BYTE/VARCHAR2 chunks (with an additional chunk number column), but didn't want to do this if someone had tried it and found a problem.
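The figures in the question can be cross-checked with some quick arithmetic. A small Python sketch (the helper name is hypothetical) shows about 2.16 hours for 70,000 rows, roughly 245 MB of CLOB data, and an effective rate of only about 31.5 kB/s. That rate is low for most WAN links, which hints that per-row (per-LOB) round trips rather than raw bandwidth dominate, consistent with exp/imp being much faster.

```python
def transfer_stats(rows: int, rows_per_second: float, avg_bytes: int) -> dict:
    """Summarize a row-by-row transfer: total time, data volume, bandwidth."""
    seconds = rows / rows_per_second
    total_bytes = rows * avg_bytes
    return {
        "hours": seconds / 3600,
        "total_mb": total_bytes / 1_000_000,                  # decimal MB
        "kb_per_second": rows_per_second * avg_bytes / 1000,  # decimal kB/s
    }


if __name__ == "__main__":
    stats = transfer_stats(rows=70_000, rows_per_second=9, avg_bytes=3500)
    print(f"{stats['hours']:.2f} h, {stats['total_mb']:.0f} MB, "
          f"{stats['kb_per_second']:.1f} kB/s")  # 2.16 h, 245 MB, 31.5 kB/s
```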
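The chunking idea itself is simple. Here it is in Python purely to illustrate the logic; in Oracle it would live in the PL/SQL pipelined function (for example using DBMS_LOB.SUBSTR). The sketch assumes the 4000-byte SQL VARCHAR2 limit that applies in 11g and splits by encoded bytes rather than characters, so a multi-byte character set cannot overflow a chunk. Function names are hypothetical.

```python
from typing import Iterable, List, Tuple


def split_clob(text: str, max_bytes: int = 4000,
               encoding: str = "utf-8") -> List[Tuple[int, str]]:
    """Split text into (chunk_number, chunk) pairs of at most max_bytes each.

    Splits only between characters, so multi-byte characters are never cut.
    """
    chunks: List[Tuple[int, str]] = []
    current: List[str] = []
    size = 0
    for ch in text:
        n = len(ch.encode(encoding))
        if size + n > max_bytes and current:
            chunks.append((len(chunks) + 1, "".join(current)))
            current, size = [], 0
        current.append(ch)
        size += n
    if current or not chunks:
        chunks.append((len(chunks) + 1, "".join(current)))
    return chunks


def join_clob(chunks: Iterable[Tuple[int, str]]) -> str:
    """Reassemble chunks in chunk_number order (rows may arrive unordered)."""
    return "".join(piece for _, piece in sorted(chunks))


if __name__ == "__main__":
    report = "é" * 3000  # 6000 bytes in UTF-8
    parts = split_clob(report)
    print([(n, len(p.encode("utf-8"))) for n, p in parts])  # [(1, 4000), (2, 2000)]
    print(join_clob(reversed(parts)) == report)             # True
```

On the receiving side, the same ordering by (primary key, chunk number) is what makes reassembly safe.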
Thanks for your help.
I was able to get better performance by increasing arraysize to 1500 or higher. See also this document: http://www.fors.com/velpuri2/PERFORMANCE/SQLNET.pdf
Perhaps this is normal, but in my Oracle 11g database I am seeing programmers who use Oracle's SQL Developer regularly consume more than 100 MB of combined UGA and PGA memory. I'd like to know if this is normal and what can be done about it. Our database is on the 32-bit version of Windows 2008, so memory limitations are an increasing concern. I am using the following query to show the memory usage:
SELECT e.SID, e.username, e.status, b.PGA_MEMORY
FROM v$session e
LEFT JOIN
(select y.SID, y.value pga,
TO_CHAR(ROUND(y.value/1024/1024),'99999999') || ' MB' PGA_MEMORY
from v$sesstat y, v$statname z
where y.STATISTIC# = z.STATISTIC# and z.NAME = 'session pga memory') b
ON e.sid=b.sid
WHERE (PGA)/1024/1024 > 20
ORDER BY b.pga DESC;
It seems that the resource usage goes up any time a table is opened in SQL Developer, and even when it is closed, the memory does not go away. The problem is worse if the table is sorted while it is open, as that seems to use even more memory. I understand how this would use memory while sorting, and perhaps even while the table is still open, but using memory after it is closed seems wrong to me. Can anyone confirm this?
Update:
I discovered that my numbers were off due to not understanding that the UGA is stored in the PGA under dedicated server mode. This makes the numbers lower than they were, but the problem still remains that SQL Developer seems to use excessive PGA.
Perhaps SQL Developer doesn't close the cursors it had opened.
So if you run a query which sorts a million rows and SQL Developer fetches only first 20 rows from there, it needs to keep the cursor open should you want to scroll down and fetch more.
So, it needs to keep some of the PGA memory associated with the cursor's sort area still allocated (it's called retained sort area) as long as the cursor is open and hasn't reached EOF (end-of-fetch).
Pick a session and run:
select sql_id,operation_type,actual_mem_used,max_mem_used,tempseg_size
from v$sql_workarea_active
where sid = &SID_OF_INTEREST
This should show whether some cursors are still kept open with their memory...
Are you using Automatic Memory Management? If yes, I would not worry about the PGA memory used.
See docs:
Automatic Memory Management: http://download.oracle.com/docs/cd/B28359_01/server.111/b28310/memory003.htm#ADMIN11011
MEMORY_TARGET: http://download.oracle.com/docs/cd/B28359_01/server.111/b28320/initparams133.htm
Is there a reason you are using 32-bit Oracle? Most recent hardware supports 64-bit.
Oracle, especially with AMM, will use every bit of memory on the machine you give it. If it doesn't have a reason to de-allocate memory it will not do so. It is the same with storage space: if you delete 20 GB of user data that space is not returned to the OS. Oracle will hold on to it unless you explicitly compact the tablespaces.
I believe a simple test should relieve your concerns. If it's 32-bit and each SQL Developer session is using 100 MB+ of RAM, you'd only need a few hundred sessions open to cause a low-memory problem, if there really is one.