Data Transfer from one table to other table using WSO2 EI - wso2-enterprise-integrator

I am trying to copy data from one table to another identical table using the data service concept in WSO2 EI.
The source table has 99997 records, so I am pulling the data in batches with a limit, like [select * from table between 1 to 750], but each result set takes around 3 minutes. If I continue with this logic, the whole copy would take hours, which is not practical.
Can anyone help with how this can be achieved using WSO2 EI?

Since this is a database-related requirement, you don’t need to use EI. The best method would be to take a DB dump and restore it in the second database.
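As a minimal sketch of the dump-and-restore idea, the snippet below uses Python's built-in sqlite3 module as a stand-in for the real databases. The table name records, its columns and the 1000 sample rows are assumptions made only for illustration; a production database would normally use its own dump and restore tooling instead.

```python
import sqlite3

# Hypothetical source database with a small stand-in table.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, payload TEXT)")
source.executemany(
    "INSERT INTO records (id, payload) VALUES (?, ?)",
    [(i, f"row-{i}") for i in range(1, 1001)],
)
source.commit()

# "Dump": serialize the schema and the data as SQL statements.
dump_sql = "\n".join(source.iterdump())

# "Restore": replay the dump into a second, empty database.
target = sqlite3.connect(":memory:")
target.executescript(dump_sql)

copied = target.execute("SELECT COUNT(*) FROM records").fetchone()[0]
print(copied)
```

The point of the sketch is that the copy happens inside the database tooling in one pass, rather than as many small result sets pulled through an integration layer.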

Related

Data Readiness Check

Let's say there is a job A which executes a Python script to connect to Oracle, fetch the data from Table A and load it into Snowflake once a day. Application A, which depends on Table A in Snowflake, can simply depend on the success of job A for further processing; this is easy.
But if the data movement is via replication (Change Data Capture from Oracle moves to S3 using GoldenGate, pipes push into a stage, and a stream loads into the target using a Task every few minutes), what is the best way to let Application A know that the data is ready? How can it check whether the data is ready? Is there something available in Oracle, like a table-level marker, that can be moved over to Snowflake? Tables in Oracle cannot be modified to add anything new, and marker rows cannot be added either; these are impractical. But something that Oracle provides implicitly and that can be moved over to Snowflake, or some SCN-like number at the table level that can be compared every few minutes, could be a solution. I am eager to hear any approaches.

ORDER BY makes the application very slow when using Oracle

In my application I need to generate a report of the transaction history of all clients. I use Oracle 12c for my application and have 300k clients. The client table is related to the client details and transaction history tables. I have written a query that shows the transaction history per month. It returns nearly 20 million records.
SELECT C.CLIENT_ID, CD.CLIENT_NAME, ...... FROM CLIENT C, CLIENT_DETAILS CD,
TRANSACTION_HISTORY TH
--Condition part
ORDER BY C.CLIENT_ID
These 3 tables have the right indexes, which work fine. But when the data is fetched with ORDER BY to show the customers in order, the query takes 8 hours to execute in the batch process.
I have analysed the cost of the query. The cost is 80085, but when I remove the ORDER BY from the query the cost drops to 200, so I have removed the ORDER BY for now. But I need to show the customers in order, and I cannot use a limit. Is there any way to overcome this?
You can try indexing the client ID column in the table, which can speed up fetching the data in that order.
You can use the link for reference: link
Hope this helps.
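To illustrate how an index on the sort column can remove a separate sort step, here is a small sketch that uses SQLite (through Python's sqlite3 module) as a stand-in. The table layout, the generated rows and the index name idx_th_client are assumptions for illustration only, and Oracle's optimizer may well choose a different plan for the real query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified stand-in for the transaction history table.
conn.execute(
    "CREATE TABLE transaction_history ("
    "id INTEGER PRIMARY KEY, client_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO transaction_history (client_id, amount) VALUES (?, ?)",
    [((i * 7919) % 1000, float(i)) for i in range(5000)],
)

QUERY = "SELECT client_id, amount FROM transaction_history ORDER BY client_id"


def plan(connection, sql):
    """Return the detail text of each EXPLAIN QUERY PLAN row."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[-1] for row in rows]


# Plan before the index exists: SQLite has to sort the rows itself.
plan_without_index = plan(conn, QUERY)

# Hypothetical index on the ORDER BY column.
conn.execute("CREATE INDEX idx_th_client ON transaction_history (client_id)")

# Plan after the index exists: rows can be read in index order.
plan_with_index = plan(conn, QUERY)

print(plan_without_index)
print(plan_with_index)
```

Comparing the two plans before and after creating the index is the same kind of check as comparing the query cost with and without ORDER BY.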

Query returning a single record takes too much time in an EJB-Hibernate application with Oracle DB

I am working on an EJB (3.0) and Hibernate (3) project with an Oracle 11g DB.
First of all, for security reasons I am unable to share my code; I am really sorry for that.
The issue is:
In my application, the DB is called from different places to retrieve, persist and merge records, which involves a number of tables in the DB.
But one particular retrieve query (a select query which fetches only a single record by putting a primary key value in the where clause) takes too much time (almost 4 minutes) to get the response from the DB (the response is correct, with a single record).
I can track the time by debugging from the calling point to the DB inside the application, and from the DB response back to my application.
So I want to know why fetching a single record takes so much time when other queries return within seconds or microseconds.
I also want to know how to track the timestamp of [the query request from the application hitting the database after connecting through the Hibernate layer], and what is going on inside the DB for this flow.
Please give me some advice or suggestions from your work experience if you have faced this kind of issue, and help me with how to track the whole flow:
Application <-> Hibernate Layer <-> Database
Thanks in advance!!!

How to do table operations in Google BigQuery?

Wanted some advice on how to deal with table operations (rename column) in Google BigQuery.
Currently, I have a wrapper to do this. My tables are partitioned by date, e.g. if I have a table named fact, I will have several tables named:
fact_20160301
fact_20160302
fact_20160303... etc
My rename column wrapper generates aliased queries. ie. if I want to change my table schema from
['address', 'name', 'city'] -> ['location', 'firstname', 'town']
I do batch query operation:
select address as location, name as firstname, city as town
and do a WRITE_TRUNCATE on the parent tables.
My main issue lies with the fact that BigQuery only supports 50 concurrent jobs. This means that when I submit my batch request, I can only do around 30 partitions at a time, since I'd like to reserve 20 spots for ETL jobs that are running.
Also, I haven't found a way to do a poll_job on a batch operation to see whether or not all jobs in a batch have completed.
If anyone has some tips or tricks, I'd love to hear them.
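The wrapper described in the question could be sketched roughly as below. This is plain Python that only builds the aliased query strings and groups them into batches of 30; it does not call BigQuery. The function names build_rename_query and in_batches, and the generated March 2016 table names, are hypothetical and used only for illustration.

```python
from typing import Dict, Iterator, List

MAX_JOBS_PER_BATCH = 30  # leaves 20 of the 50 concurrent job slots for ETL


def build_rename_query(table: str, renames: Dict[str, str]) -> str:
    """Build a SELECT that aliases each old column name to its new name."""
    columns = ", ".join(f"{old} AS {new}" for old, new in renames.items())
    return f"SELECT {columns} FROM {table}"


def in_batches(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


renames = {"address": "location", "name": "firstname", "city": "town"}

# Hypothetical date-suffixed tables for one month (31 days).
tables = [f"fact_201603{day:02d}" for day in range(1, 32)]

queries = [build_rename_query(table, renames) for table in tables]
batches = list(in_batches(queries, MAX_JOBS_PER_BATCH))

print(queries[0])
print([len(batch) for batch in batches])
```

Each batch would then be submitted and waited on before the next one starts, which keeps the number of concurrent jobs under the limit mentioned above.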
I can propose two options:
Using a View
Creating views is very simple to script and execute; it is fast and free compared with the cost of scanning the whole table with the select-into approach.
You can create a view using the Tables: insert API with the type property set properly
Using Jobs: insert EXTRACT and then LOAD
Here you can extract the table to GCS and then load it back into GBQ with the adjusted schema
The above approach will a) eliminate the cost of querying (scanning) the tables and b) can help with limitations. But it might not, depending on the actual volume of the tables and other requirements you might have
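The view option can be illustrated with a small sketch. It uses SQLite through Python's sqlite3 module purely as a stand-in, since the real view would be created through the BigQuery API; the view name fact_20160301_renamed and the sample row are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in for one date-suffixed table with the old column names.
conn.execute("CREATE TABLE fact_20160301 (address TEXT, name TEXT, city TEXT)")
conn.execute("INSERT INTO fact_20160301 VALUES ('1 Main St', 'Ada', 'Leeds')")

# The view exposes the new column names without rewriting the stored data.
conn.execute(
    "CREATE VIEW fact_20160301_renamed AS "
    "SELECT address AS location, name AS firstname, city AS town "
    "FROM fact_20160301"
)

cursor = conn.execute("SELECT * FROM fact_20160301_renamed")
view_columns = [description[0] for description in cursor.description]
row = cursor.fetchone()

print(view_columns)
print(row)
```

Because the view is only a stored query, creating it does not scan or rewrite the underlying table.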
The best way to manipulate a schema is through the Google BigQuery API.
Use the tables get API to retrieve the existing schema for your table. https://cloud.google.com/bigquery/docs/reference/v2/tables/get
Manipulate your schema file, renaming columns etc.
Again using the API, perform an update on the schema, setting it to your newly modified version. This should all occur in one job: https://cloud.google.com/bigquery/docs/reference/v2/tables/update

select & update in both live & archive tables in the same schema

The application that I am working on currently has an archive logic where all the records older than 6 months will be moved to history tables in the same schema, but on a different table space. This is achieved using a stored procedure which is being executed daily.
For ex. TABLE_A (live, latest 6 months) ==> TABLE_A_H (archive, older than 6 months, up to 8 years).
So far no issues. Now the business has come up with a new requirement where the archived data should also be available for selects and updates. The updates can happen even for year-old data.
selects could be direct like,
select * from TABLE_A where id = 'something'
Or it could be open-ended query like,
select * from TABLE_A where created_date < 'XYZ'
Updates are usually for specific records.
These queries are exposed as REST services to the clients. There are possibilities of junk/null values (no way the application can sanitize the input).
The current snapshot of the DB is
PARENT_TABLE (10M records, 10-15K for each record)
CHILD_TABLE_ONE (28M records, less than 1K for each record)
CHILD_TABLE_TWO (25M records, less than 1K for each record)
CHILD_TABLE_THREE (46M records, less than 1K for each record)
CHILD_TABLE_FOUR (57M records, less than 1K for each record)
Memory is not a constraint - I can procure additional 2 TB of space if needed.
The problem is: how do I keep the response time low when it accesses the archive tables?
What are all the aspects that I should consider when building a solution?
Solution1: For direct select/update, check if the records are available in live tables. If present, perform the operation on the live tables. If not, perform the operation on the archive tables.
For open ended queries, use UNION ???
Solution2: Use month-wise partitions and keep all 8 years of data in a single set of tables? Does Oracle handle 150+ million records in a single table efficiently for select/update?
Solution3: Use NoSQL like Couchbase? Not a feasible solution at the moment because of the infra/cost involved.
Solution4: ???
Tech Stack: Oracle 11G, J2EE Application using Spring/Hibernate (Java 1.6) hosted on JBoss.
Your response will be very much appreciated.
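For reference, Solution 1 from the list above could be sketched as follows. SQLite (via Python's sqlite3 module) stands in for Oracle, and the status column, the sample rows and the helper names find_by_id, update_status and find_created_before are assumptions for illustration only.

```python
import sqlite3

TABLES = ("TABLE_A", "TABLE_A_H")  # live table first, then archive

conn = sqlite3.connect(":memory:")
for table in TABLES:
    conn.execute(
        f"CREATE TABLE {table} (id TEXT PRIMARY KEY, created_date TEXT, status TEXT)"
    )
# Made-up sample rows: one live record and one archived record.
conn.execute("INSERT INTO TABLE_A VALUES ('live-1', '2024-05-01', 'NEW')")
conn.execute("INSERT INTO TABLE_A_H VALUES ('old-1', '2019-01-15', 'NEW')")


def find_by_id(connection, record_id):
    """Look in the live table first and fall back to the archive table."""
    for table in TABLES:
        row = connection.execute(
            f"SELECT id, created_date, status FROM {table} WHERE id = ?",
            (record_id,),
        ).fetchone()
        if row is not None:
            return table, row
    return None, None


def update_status(connection, record_id, status):
    """Update the record wherever it lives; return the table that was hit."""
    for table in TABLES:
        cursor = connection.execute(
            f"UPDATE {table} SET status = ? WHERE id = ?", (status, record_id)
        )
        if cursor.rowcount > 0:
            return table
    return None


def find_created_before(connection, cutoff):
    """Open-ended query across both tables using UNION ALL."""
    sql = (
        "SELECT id, created_date, status FROM TABLE_A WHERE created_date < ? "
        "UNION ALL "
        "SELECT id, created_date, status FROM TABLE_A_H WHERE created_date < ?"
    )
    return connection.execute(sql, (cutoff, cutoff)).fetchall()


print(find_by_id(conn, "old-1"))
print(update_status(conn, "old-1", "DONE"))
print(len(find_created_before(conn, "2025-01-01")))
```

UNION ALL is used instead of UNION here on the assumption that a record lives in exactly one of the two tables, so no duplicate elimination is needed.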
If I were you, I'd go with Solution 2, and ensure that you have the relevant indexes available for the types of queries you expect to be run.
Partitioning by month means that you can take advantage of partition pruning, assuming that the queries involve the column that's being partitioned on.
It also means that your existing code does not need to be amended in order to select or update the archived data.
You'll have to set up housekeeping to add new partitions though - unless you go for interval partitioning, but that has its own set of gotchas.
