I am new to Hibernate JPA. I am working on an Oracle to Postgres migration, and we are not using the AWS DMS service for data migration. We would like to use Java to copy tables that have more than 1 million records. I have a problem with the scenario below.
Table A - Oracle
Table B - Postgres
I am extracting records from Oracle using ScrollableResults. Once I have the data from Oracle, I need to look up a value in the Postgres database for each Oracle row before inserting it into Postgres.
I first thought @ColumnTransformer would help, but it does not, because I don't know how to reference the Oracle data in the ColumnTransformer expression.
So I ended up writing a plain INSERT query with values and a subquery for the lookup. I also set hibernate.jdbc.batch_size to 100.
Run this way, the program took 5 minutes for 10k records, which I feel is slow.
Is there any other solution to this problem that would improve performance?
Thanks for all your help.
I found the solution. I solved it by loading the Postgres lookup table into a list object and searching that list before performing each insert. Now the speed is good.
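The idea of caching the lookup table in memory can be sketched roughly as follows. This is only an illustration in Python, not the Hibernate code from the question: SQLite stands in for Postgres, the source rows are a plain list instead of Oracle ScrollableResults, and the table and column names (`lookup`, `table_b`, `code`, `lookup_id`) are made up. It also uses a dict instead of a list, so each lookup is a hash access rather than a scan.

```python
import sqlite3


def load_lookup(target_conn):
    """Read the whole lookup table once into a dict keyed by code."""
    cur = target_conn.execute("SELECT code, id FROM lookup")
    return dict(cur.fetchall())


def copy_rows(source_rows, target_conn, lookup, batch_size=100):
    """Resolve each row against the in-memory lookup, then insert in batches."""
    sql = "INSERT INTO table_b (name, lookup_id) VALUES (?, ?)"
    batch = []
    inserted = 0
    for name, code in source_rows:
        # lookup.get returns None when the code has no match in the lookup table
        batch.append((name, lookup.get(code)))
        if len(batch) >= batch_size:
            target_conn.executemany(sql, batch)
            inserted += len(batch)
            batch.clear()
    if batch:  # flush the last partial batch
        target_conn.executemany(sql, batch)
        inserted += len(batch)
    target_conn.commit()
    return inserted


# Demo with an in-memory SQLite database standing in for Postgres (assumption).
target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE lookup (code TEXT PRIMARY KEY, id INTEGER)")
target.execute("CREATE TABLE table_b (name TEXT, lookup_id INTEGER)")
target.executemany("INSERT INTO lookup VALUES (?, ?)", [("A", 1), ("B", 2)])

# Hypothetical rows as they might come from the source table.
source_rows = [("row1", "A"), ("row2", "B"), ("row3", "A")]

lookup = load_lookup(target)
count = copy_rows(source_rows, target, lookup, batch_size=2)
result = target.execute(
    "SELECT name, lookup_id FROM table_b ORDER BY name"
).fetchall()
```

This only works when the lookup table fits comfortably in memory, which was apparently the case here.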
Related
Is there a way to replicate data (with triggers or jobs, for example) from Oracle tables to Postgres tables and vice versa (for a different set of tables) without using external tools? Only one-way replication is needed in both scenarios.
Just a hint:
You could create a DB link from Oracle to Postgres. This is called heterogeneous connectivity, and it makes it possible to select data from Postgres with a SELECT statement in Oracle.
Then use materialized views to schedule and store the results of those selects.
I only suggest this because you don't want to use any external tool; otherwise the solution would be much simpler.
For 20 tables I need to replicate data from Oracle to Postgres. For 40 different tables, I need to replicate from Postgres to Oracle.
I could imagine the following setup:
For the Oracle tables that need to be accessible from Postgres, simply create foreign tables inside the Postgres server. They appear to be "local" tables in the Postgres server, but the FDW ("foreign data wrapper") will forward any request to the Oracle server, so no replication is required. Whether or not this will be fast enough depends on how you access the tables. Some operations (WHERE clause, ORDER BY etc.) can be pushed down to the Oracle server; others will be done by the Postgres server after all the rows have been fetched.
For the Postgres tables that need to be replicated to Oracle you could have a similar setup: create a foreign table that points to the target table in Oracle. Then create triggers on the Postgres table that will update the foreign table, thus sending the changes to Oracle.
This could all be managed on the Postgres side.
I am using a loop in Scala to query an Oracle table every 10 seconds, since the Oracle table receives inserts continuously. I run a select request, then build n JSON strings from the n rows returned by Oracle and push them into Elasticsearch. After that I run a delete request to remove from the Oracle table the n rows that I have inserted into ES. This is a complete beginner's approach. Can you suggest a better approach to load data from Oracle to ES in real time or in micro-batches, and then delete it from Oracle? I have heard about Logstash and StreamSets. Do you have any ideas? Thanks
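For reference, one pass of the select / push / delete loop described above could look something like the sketch below. It is written in Python rather than Scala, SQLite stands in for Oracle (so the `LIMIT` syntax would differ on Oracle), the `events` table and its columns are hypothetical, and `push_bulk` is a placeholder for whatever call sends documents to Elasticsearch. The one point it tries to show is that the delete targets exactly the ids that were fetched, and only runs if the push did not raise.

```python
import json
import sqlite3


def drain_once(conn, push_bulk, batch_size=500):
    """Fetch one batch, push it, then delete exactly the rows that were pushed."""
    rows = conn.execute(
        "SELECT id, payload FROM events ORDER BY id LIMIT ?", (batch_size,)
    ).fetchall()
    if not rows:
        return 0
    docs = [json.dumps({"id": r[0], "payload": r[1]}) for r in rows]
    # push_bulk must raise on failure so that nothing is deleted in that case
    push_bulk(docs)
    conn.executemany("DELETE FROM events WHERE id = ?", [(r[0],) for r in rows])
    conn.commit()
    return len(rows)


# Demo: in-memory SQLite as the source, a Python list as the fake sink.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
source.executemany(
    "INSERT INTO events VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")]
)

sink = []
moved = drain_once(source, sink.extend, batch_size=2)
remaining = source.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

Rows inserted between the select and the delete are left untouched, because the delete is keyed on the ids that were read rather than on a condition that might match new rows.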
Can anyone suggest how to convert an Oracle CONNECT BY query for Greenplum? Greenplum doesn't support recursive queries, so we cannot use WITH RECURSIVE. Is there an alternative way to rewrite the query below?
SELECT child_id, Parnet_id, LEVEL , SYS_CONNECT_BY_PATH (child_id,'/') as HIERARCHY
FROM pathnode
START WITH Parnet_id = child_id
CONNECT BY NOCYCLE PRIOR child_id = Parnet_id;
There are ways to do this but it will be a one-off per query. You will need to create a function that loops through your pathnode table and "return next" to return each row. You can search on this site to find examples of doing this with PostgreSQL 8.2.
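To make the looping idea concrete, here is a rough sketch of the traversal such a function would have to perform. It is plain Python rather than a database function, so it only illustrates the logic (start rows, walking children, level, path, cycle guard); the function name `connect_by` and the in-memory row list are assumptions, and the NOCYCLE handling is an approximation of Oracle's behavior.

```python
from collections import defaultdict


def connect_by(rows):
    """rows: list of (child_id, parent_id) pairs.

    Returns (child_id, parent_id, level, path) tuples in depth-first order,
    starting from rows where parent_id == child_id.
    """
    children = defaultdict(list)
    for child_id, parent_id in rows:
        children[parent_id].append(child_id)

    # START WITH Parnet_id = child_id: self-referencing rows are the roots
    roots = [c for c, p in rows if c == p]

    out = []
    # Each stack entry: (child_id, parent_id, level, path, ids already on the path)
    stack = [(r, r, 1, f"/{r}", frozenset([r])) for r in reversed(roots)]
    while stack:
        child_id, parent_id, level, path, seen = stack.pop()
        out.append((child_id, parent_id, level, path))
        for nxt in reversed(children[child_id]):
            if nxt in seen:
                # Rough NOCYCLE equivalent: never revisit an id on the current path
                continue
            stack.append((nxt, child_id, level + 1, f"{path}/{nxt}", seen | {nxt}))
    return out


# Small hypothetical pathnode table: 1 is a root, 2 and 4 are under 1, 3 is under 2.
pathnode = [(1, 1), (2, 1), (3, 2), (4, 1)]
hierarchy = connect_by(pathnode)
```

In the database, the same loop would emit each tuple with "return next" instead of appending to a list.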
Work is happening to rebase Greenplum to PostgreSQL 8.3, 8.4, and so on. Those later PostgreSQL versions support "with recursive" which is the ANSI SQL way to write your SQL but Greenplum doesn't support it yet. When it does get supported by Greenplum, I don't think it will perform all that well. The query will force looping and individual row lookups. This works great in an OLTP database but not so well for an MPP database.
I suggest you transform your data in Oracle with a VIEW and then just dump the view to a file to load into Greenplum. The DDL of having a self-referencing, N-level table will never be a good idea in an MPP database.
Hey EXPERIENCED SSIS DEVELOPERS, I need your help.
High-Level Requirements
Query a SQL Server table (on a different server than my SSIS server), resulting in a result set of about 200-300k records.
Use three output columns from each row to look up date in the Oracle database.
Insert or Update SQL Server table with results.
Use SSIS.
SQL Server 2008
Sounds easy, right?
Here is what I have done:
1. Created a Control Flow Execute SQL Task that gets a recordset from SQL Server. Very fast, easy query, like select field1, field2, field3 from table where condition > 0. That's it. Takes less than a second.
2. Created a variable (evaluated as expression) for the Oracle query that uses the result set from the above in the WHERE clause.
3. Created a ForEachLoop Container that takes the results (from #1 above) for each row in the recordset and runs them through a Data Flow that uses the Oracle query (from #2 above) with Data access mode: SQL command from variable against an Oracle data source. Fast, simple query with only about 6 columns returned.
4. Data Conversion - obvious reasons - changing 3 columns from Oracle data types to SQL Server data types.
5. OLE DB Destination to insert to SQL Server using Fast Load to a staging table.
It works perfectly! Hooray! Bad news - it is very, very slow. When I say slow, I mean it processes 3000 records per hour. Holy moly - so freaking slow.
Question: am I missing a way to speed it up? It seems like the ForEachLoop Container is the bottleneck. Growl.
Important Points:
- I have NO write access in Oracle environment, so don't even suggest a potential solution that requires it. Not a possibility. At all.
- Oracle sources do not allow for direct parameter definition. So no SELECT FIELD FROM TABLE WHERE ?. Don't suggest it - doesn't work.
Ideas
- Should I find a way to break down the results of the Execute SQL task and send them through several ForEachLoop Containers for faster processing?
- Is there another design that is more appropriate?
- Is there a script I can use that is faster?
- Would it be faster to create a temporary table in memory and populate it - then use the results to bulk insert to SQL Server? Does this work when using an Oracle data source?
ANY OTHER IDEAS?
Visual Studio 2008
Oracle DB 11.1.0.7
Oracle client for .NET
I have a relatively simple query that selects rows from across multiple tables (up to 4) using joins. OracleDataAdapter returns no rows for the dataset's only table, but if I copy and paste that query into SQLDeveloper, I get the desired results.
I can get data from other tables using the adapter with no problem, but it seems to struggle with this slightly longer select query (the string length is ~300, which is not much at all).
The connection string for the connection is 100% correct.
Any ideas? Thank you...
Check that you are using the same Oracle user to connect to the database. Maybe FGAC hides the data.
Check that there are no temporary tables in your query.
Solution by OP.
The problem was that after I imported data into one of the involved tables in SQLDeveloper, the change had not been committed automatically, as I had falsely presumed. I figured this out after I edited some data in the same table within SQLDeveloper and it failed with a message that the edit operation on an uncommitted action is not allowed. So the headache was caused by SQLDeveloper, not the DataAdapter.