Spring batch my sql dump parsing issue - spring

Problem statement
I need to replace certain dates in my sql dump file and create another dump file. First I need to parse the create table definition and store information about date type columns. I may need to skip certain columns. After that I need to parser the "Insert into table" statements. Break this statement into rows and then into columns and then replace the dates.
Solution I am using Spring batch where reader is Composite Reader. I read the entire table definition (Drop statement, Create statement and Insert statements) in memory and then pass to processor to replace the dates. During reading I also split the insert statements into Rows and Column.
Problem This solution works fine for small dump but getting out of memory for large dumps. e.g. There is one table having long blobs and size is 2GB.
Any idea how can I fix the problem or Spring batch is not a right solution for this. Any help will be highly appreciated.

Related

Creating txt file using Pentaho

I'm currently trying to create txt files from all tables in the dbo schema
I have like 200s-300s tables there, so it would takes up too much times to create it manually..
I was thinking for creating a loop.
so as example (using AdventureWorks2019) :
select t.name as table_name
from sys.tables t
where schema_name(t.schema_id) = 'Person'
order by table_name;
This would get all the table name within the Person schema.
So I would loop :
Table input : select * from ${table_name}
But then i realized that for txt files, i need to declare all the field and their data types in pentaho, so it would become a problems.
Any ideas how to do this "backup" txt files?
Using Metadata Injection and more queries to the schema catalog tables in SQL Server. You not only need to retrieve the table name, you would need to afterwards retrieve the columns in that table and the data types, and inject that information (metadata) to the text output step.
You have in the samples directory of your spoon installation an example on how to use Metadata Injection, use it, along with the documentation, to build a simple example (the check to generate a transformation with the metadata you have injected is of great use to debug)
I have something similar to copy data from one database to another, both in Oracle, but with SQL Server you have similar catalog tables as in Oracle to retrieve the information you need. I created a simple, almost empty transformation to read one table and write to another. This transformation has almost no information, only the database origin in the Table Input step and the target database in the Table Output step:
And then I have a second transformation where I fill up all the information (metadata) to inject: The query to perform in the Table Input step, and all the data I need in the Table Output: Target table, if I need to truncate before inserting, the columns from (stream field) and to (Table field):

How to tune Oracle's SQL*Loader append?

I am writing a Java program that creates a CSV file with 6,800,000 records conforming to specific distribution parameters and populates a table using Oracle's SQL*Loader.
I am testing my program using different sizes of records (50,000 and 500.000). The CSV File generation by itself is quite fast, using concurrency it takes miliseconds to create and insert these records into a file.
Inserting said records, on the other hand, is taking too long. Reading the log file generated by SQL*Loader, it takes 00:00:32.90 seconds to populate the table with 50,000 records and 00:07:58.83 minutes to populate it with 500,000.
SQL*Loader benchmarks I've googled show much better perfomances, such as 2 million rows in less than 2 minutes. I've followed this tutorial to improve the time, but it barely changed at all. There's obviously something wrong here, but I don't know what.
Here's my control file:
OPTIONS (SILENT=ALL, DIRECT=TRUE, ERRORS=50, COLUMNARRAYROWS=50000, STREAMSIZE=500000)
UNRECOVERABLE LOAD DATA
APPEND
INTO TABLE MY_TABLE
FIELDS TERMINATED BY ","
TRAILING NULLCOLS
...
Another important info: I've tried using PARALLEL=TRUE, but I get the ORA-26002 error (Table MY_TABLE has index defined upon it). Unfortunatly, running with skip_index_maintenance renders the index UNUSABLE.
What am I doing wrong?
Update
I have noticed that soon after running the program (less than a second), all rows are already present in the database. Yet, SQL*Loader is still busy and only finishes after 32-45 seconds.
What could it be doing?
One thought would be to create an external table and set the name to the csv file. Then after creating the file you can run a sql script inside Oracle to process the data directly.
Or, look at the following (copied from here:)
This issue is caused when using the bulk load option in parallel to load an Oracle target that has an index on it. An Oracle limitation.
To resolve this issue do one of the following:
· Change the target load option to Normal.
· Disable the enable parallel mode option in relational connection browser.
· Drop the indexes before loading.
· Or create a pre- and post-session sql to drop and create indexes and key constraints

How to update/insert a table without creating a new table (temporary or otherwise)

Background: My team has an etl job that updates an aggregate table. Each row contains data for a particular date, but this row can and will get updated after the row date (which means any row can contain data from multiple jobs). This ETL job missed some data for one day last week and now I need to backfill it.
Problem: I have the missing data, and what I was planning on doing was dumping that data into a temporary table and then merging it with the agg table. That way I can deal with whether the ETL job already contains a row for that data (update) or whether a new row needs to be added (insert), but I don't have sufficient permissions to create a temp table, and I'd prefer not to involve the DBA.
Question: Can I do an insert/update sort of behavior without creating a temporary table (this is Oracle SQL by the way).
Edit: The data is coming from a tsv file.
Why do you want to avoid involving the DBA? The DBA should have full knowledge of what's going on in the database, as they are ultimately responsible for the condition of the data within it. So you shouldn't be playing sneaky commando with them.
As you have a file of missing data, the easiest way to present it to the database is with an external table. This requires the creation of the table and probably a directory object as well. You will need the DBA's help with this task.
The only way to avoid creating database objects is to convert your TSV file into a series of DML statements. An IDE which supports regex and/or records macros will prove invaluable here. I like TextPad; other editors are available.
The DML statement for doing upserts in Oracle is the MERGE statement. The one thing you need to watch for is recency. Your missing data comes from last week. If a row exists it may have have been added or amended in the intervening period. You must write your MERGE statement so it does not overwrite more recent data with the older stuff. Hopefully your table has useful metadata columns such as DATE_CREATED and LAST_UPDATED.

Attempting to use SQL-Developer to analyze a system table dump created with 'exp'

I'm attempting to recover the data from a specific table that exists in a system table dump I performed earlier. I would like to append the rows existing in the dump to any rows that may exist in the active table. The problem is, it's likely that the name of the table in the dump is not the same as what exists in the database currently (They're dynamically created with a prefix of ARC_TREND_). In addition, I don't know the name of the table as it exists in the dump, I was hoping to use SQL Developer to analyze the dump file as I can recognize the correct table by it's columns and it's existing rows.
While i'm going on blind faith that SQL Developer can work with my dump file, when attempting to open it, i'm getting a Java Heap OutOfMemory exception raised. I've adjusted the maximum heap size from 640m to 1024m in both sqldeveloper.bat and in sqldeveloper.conf, but to no avail.
Can someone recommend a course of action for me to take to recover the data from a table which exists in a exp created dump file? A graphical tool would be nice, but I'm no stranger to the command line. I need to analyze the tables that exist in the dump in order to pick the correct one out. Then I assume I can use imp TABLE= to bring it back into the active instance. It likely won't match the existing table name, so I will use SQL Developer to copy the rows from the imported table to the table where I need them to be.
The dump was taken from a Linux server running 10g, and will be imported to (the same server & database instance, upgraded) an 11g instance of the same database.
Thanks
Since you're referring to imp rather than impdp, I assume this wasn't exported with data pump. Either way, I doubt you'll get anything useful through SQL Developer.
Fortunately most of what you're trying to do is quite easy from the command line; just run imp with the INDEXFILE parameter, which will give you a text file containing all the table (commented out with REM) and index creation commands. From that you should be able to spot the table from its column names.
You can't really see any row data though, so if there's more than one possible match you might need to import several tables and inspect the data in them in the database to see which one you really want.

What is the fastest way to insert data into an Oracle table?

I am writing a data conversion in PL/SQL that processes data and loads it into a table. According to the PL/SQL Profiler, one of the slowest parts of the conversion is the actual insert into the target table. The table has a single index.
To prepare the data for load, I populate a variable using the rowtype of the table, then insert it into the table like this:
insert into mytable values r_myRow;
It seems that I could gain performance by doing the following:
Turn logging off during the insert
Insert multiple records at once
Are these methods advisable? If so, what is the syntax?
It's much better to insert a few hundred rows at a time, using PL/SQL tables and FORALL to bind into insert statement. For details on this see here.
Also be careful with how you construct the PL/SQL tables. If at all possible, prefer to instead do all your transforms directly in SQL using "INSERT INTO t1 SELECT ..." as doing row-by-row operations in PL/SQL will still be slower than SQL.
In either case, you can also use direct-path inserts by using INSERT /*+APPEND*/, which basically bypasses the DB cache and directly allocates and writes new blocks to data files. This can also reduce the amount of logging, depending on how you use it. This also has some implications, so please read the fine manual first.
Finally, if you are truncating and rebuilding the table it may be worthwhile to first drop (or mark unusable) and later rebuild indexes.
Regular insert statements are the slowest way to get data in a table and not meant for bulk inserts. The following article references a lot of different techniques for improving performance: http://www.dba-oracle.com/oracle_tips_data_load.htm
Drop the index, then insert the rows, then re-create the index.
If dropping the index doesn't speed things up enough, you need the Oracle SQL*Loader:
http://www.oracle.com/technology/products/database/utilities/htdocs/sql_loader_overview.html
Suppose you have taken eid,ename,sal,job. So create a table first as:
SQL>create table tablename(eid number, ename varchar2(20),sal number,job char(10));
Now insert data:-
SQL>insert into tablename values(&eid,'&ename',&sal,'&job');
Check this link
http://www.dba-oracle.com/t_optimize_insert_sql_performance.htm
main points to consider for your
case is to use Append hint as this
will directly append into the table
instead of using freelist. If you can afford to turn off logging than use append with nologging hint to do it
Use a bulk insert instead instead of iterating in PL/SQL
Use sqlloaded to load the data directly into the table if you are getting data from a file feed
Here are my recommendations on fast insert.
Trigger - Disable any triggers associated with a table. Enable after Inserts are complete.
Index - Drop Index and re-create it after your Inserts are complete.
Stale stats - Re-analyze table and index stats.
Index de-fragmentation - Rebuild Index if needed
Use No Logging -Insert using INSERT APPEND (Oracle only). This approach is very risky approach, no redo logs are generated therefore you can’t do a rollback - make a backup of table before you start and don't try on live tables. Check if your db has similar option
Parallel Insert: Running parallel insert will get the job faster.
Use Bulk Insert
Constraints - Not much overhead during inserts but still a good idea to check, if it is still slow after even after step 1
You can learn more on http://www.dbarepublic.com/2014/04/slow-insert.html
Maybe one of your best option is to avoid Oracle as much as possible actually.
I've been baffled by this myself, but very often a Java process can outperform many of the Oracle's utilities which either use OCI (read: SQL Plus) or will take up so much of your time to get right (read: SQL*Loader).
This doesn't prevent you to use specific hints either (like /APPEND/).
I've been pleasantly surprised each time I've turned to that kind of solution.
Cheers,
Rollo

Resources