Very slow UPDATE with PutDatabaseRecord processor - oracle

I have 2 PutDatabaseRecord processors which are writing to an Oracle DB.
This is part of the schema:
The red square there isn't an error but informational messages; that is why I set the processor to INFO mode. It reports fetching the schema and "Commit because of batch size" together with the SQL UPDATE command.
The problem is that the first of them, ml_task, processes several million records in a few minutes, but the other one, ml_sales, gets stuck on 1000 records for several hours... It gets stuck both at night, when the DB is heavily loaded, and during the day.
At the same time, update goods.ml_[name] set load_date = sysdate statements take the same time via the SQL Developer interface for both ml_task and ml_sales: around 10 minutes at night and several minutes during the day.
Both of them work with the same connection pooling service.
This is the configuration of the bottom part of the service:
Both processors have the same configuration except for the table name and update keys.
I tried setting Max Batch Size to zero; it had no effect.
Both processors are configured to run in one thread, but I also tried 10 threads - there was no effect.
Also, there is no lack of connections to the DB at all: I have around 10 processors, each using one thread, against 50 available connections, which I think is enough.
They are processing around 5 million records.
This is JSON of ml_task:
[{"DOC_ID": 1799041400,"LINE_D":694098344,"LOAD_DATE":"16-Jul-21"} ... ]
This is roughly the UPDATE that NiFi issues for ml_task:
update goods.ml_task
set load_date = sysdate
where doc_id = ?
and line_id = ?;
This is the table:
Name          Null?    Type
------------- -------- ------
DOC_ID        NOT NULL NUMBER
LINE_ID       NOT NULL NUMBER
ORG_ID        NOT NULL NUMBER
NMCL_ID       NOT NULL NUMBER
ASSORTMENT_ID NOT NULL NUMBER
START_DATE    NOT NULL DATE
END_DATE               DATE
ITEMS_QNT              NUMBER
MODIFY_DATE            DATE
LOAD_DATE              DATE
This is the JSON for ml_sales:
[{"REP_DATE":"06-Jul-21","NMCL_ID":336793,"ASSORT_ID":7,"RTT_ID":92,"LOAD_DATE":"16-Jul-21"} ... ]
The query for ml_sales looks like this:
update goods.ml_sales set load_date = sysdate
where nmcl_id = ?
and assort_id = ?
and rtt_id = ?
and rep_date = ?;
And the table:
Name        Null?    Type
----------- -------- ----------
REP_DATE    NOT NULL DATE
NMCL_ID     NOT NULL NUMBER(38)
ASSORT_ID   NOT NULL NUMBER
RTT_ID      NOT NULL NUMBER(38)
OUT_ITEMS            NUMBER
MODIFY_DATE          DATE
LOAD_DATE            DATE
What can be the reason for such a slow update of ml_sales?
UPDATE
I stopped all the flows except the problematic one, and I committed all sessions in SQL Developer... and it is the same result... very slow...
UPDATE
As I mentioned above, I actually copied the ml_sales table to another schema and this didn't influence the result.

I guess the problem was a lack of indexes in Oracle. I asked our DBAs to check the problem on the DB side and they fixed it; now the UPDATE works as fast as for the ml_task table. Unfortunately I still don't know what exactly the problem was, because our DBA left for vacation. So, again, it's very possible that it was an index problem.
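For reference, a fix of that kind would typically be a composite index covering the UPDATE keys of ml_sales, so each row can be located without a full table scan per statement. This is only a sketch of what the DBAs may have done; the index name and column order are assumptions:

-- Hypothetical composite index matching the WHERE clause of the ml_sales UPDATE
create index goods.ml_sales_upd_keys_ix
  on goods.ml_sales (nmcl_id, assort_id, rtt_id, rep_date);

Without an index on those columns, every single-row UPDATE has to scan the whole table, which would explain why ml_sales crawled while ml_task did not.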

Related

Best way to create tables for huge data using oracle

Functional requirement
We work with devices. Each device, roughly speaking, has a unique identifier, an IP address, and a type.
I have a routine that pings all devices that have an IP address.
This routine is nothing more than a C# console application, which runs every 3 minutes trying to ping the IP address of each device.
I need to store the result of the ping in the database, as well as the date of the check (regardless of the result of the ping).
Then we got into the technical side.
Technical part:
Assuming my ping routine and database structure are ready as of 01/06/2016, I need to do two things:
Daily extraction
Extraction in real time (last 24 hours)
Both should return the same thing:
Devices that are unavailable for more than 24 hours.
Devices that are unavailable for more than 7 days.
A device is considered unavailable if it was pinged AND did not respond.
A device is considered available if it was pinged AND responded successfully.
What I have today, and it works very badly:
A table with the following structure:
create table history (id_device number, response number, date date);
This table has a large amount of data (60 million rows now, and the trend is to keep growing exponentially).
Here are the questions:
How to achieve these objectives without encountering problems of slowness in queries?
How to create a table structure that is prepared to receive millions / billions of records within my corporate world?
Partition the table based on date.
For the partitioning strategy, consider performance vs. maintenance.
For easy maintenance use automatic INTERVAL partitions by month or week.
You can even do it by day or manually pre-define 2-day intervals.
Your query only needs 2 calendar days.
select id_device,
min(case when response is null then 'N' else 'Y' end),
max(case when response is not null then date end)
from history
where date > sysdate - 1
group by id_device
having min(case when response is null then 'N' else 'Y' end) = 'N'
and sysdate - max(case when response is not null then date end) > ?;
If for missing responses you write a default value instead of NULL, you may try building it as an index-organized table.
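As an illustration only, an index-organized version could look like the sketch below. The check_date name stands in for the question's date column (DATE is a reserved word in Oracle), and the primary key and default value are assumptions:

-- Hypothetical index-organized variant; an IOT needs a primary key,
-- so device id plus ping date identify a row.
create table history_iot (
  id_device  number,
  check_date date,
  response   number default 0 not null,
  constraint history_iot_pk primary key (id_device, check_date)
) organization index;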
You need to read about Oracle partitioning.
This statement will create your HISTORY table partitioned by calendar day.
create table history (id_device number, response number, date date)
PARTITION BY RANGE (date)
INTERVAL (NUMTODSINTERVAL(1, 'DAY'))
( PARTITION p0 VALUES LESS THAN (TO_DATE('24-05-2016', 'DD-MM-YYYY')),
  PARTITION p1 VALUES LESS THAN (TO_DATE('25-05-2016', 'DD-MM-YYYY')) );
All your old data will be in the P0 partition.
Starting 5/24/2016 a new partition will be automatically created each day.
HISTORY is now a single logical object, but physically it is a collection of identical tables stacked on top of each other.
Because each partition's data is stored separately, when a query asks for one day's worth of data, only a single partition needs to be scanned.
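A quick way to confirm the pruning is to check the execution plan for a one-day query and look at the Pstart/Pstop columns. A hedged sketch, again using check_date in place of the reserved column name, with placeholder dates:

explain plan for
  select id_device, response
    from history
   where check_date >= to_date('25-05-2016', 'DD-MM-YYYY')
     and check_date <  to_date('26-05-2016', 'DD-MM-YYYY');

select * from table(dbms_xplan.display);
-- The plan should show a single partition (e.g. PARTITION RANGE SINGLE) being read.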

Get the recent N records from a very huge database in oracle

I am working on a very large Oracle database with around 90 million records, and I wanted to get the most recent 100 records for UI purposes. I was trying to achieve this using an ORDER BY clause on the date column in the schema; I am able to get the recent records, but it takes around 20-25 minutes of processing.
E.g. Schema
message_id varchar2,
message_head varchar2,
message_body varchar2,
----------
----------
message_date date
I was using the message_date to sort for the recent messages.
Could anyone please help me find a way to get the latest 300 messages in less time (say, less than 1 minute)? I am also wondering how big data-driven companies like Facebook and Twitter manage to show the latest posts and tweets within seconds.
Thanks in advance
1) Create an index on message_date
2) Add a sequence column (that is indexed also) and retrieve the last X records using that column
Note that if you're working on a 'live' database, the last X records are changing all the time.
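A minimal sketch of option 1, assuming the table is called messages (the question never names it) and an Oracle version that supports FETCH FIRST (12c or later):

create index messages_date_ix on messages (message_date);

-- With message_date indexed (and NULLs excluded), Oracle can walk the index
-- backwards and stop after 100 rows instead of sorting 90 million of them.
select message_id, message_head, message_body, message_date
  from messages
 where message_date is not null
 order by message_date desc
 fetch first 100 rows only;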
Another approach, or maybe even complementary...
Use the FIRST_ROWS(n) optimizer hint. That will instruct the optimizer to optimize for fast retrieval of the first records.
SELECT /*+ FIRST_ROWS(10) */ employee_id, last_name, salary, job_id
FROM employees
WHERE department_id = 20;
will optimize for retrieving the first 10 rows. In your case you would use 100.

How to guarantee order of primary key and timestamp in oracle database

I am creating some records which have id, ts, ... So first I call a select to get ts and id:
select SEQ_table.nextval, CURRENT_TIMESTAMP from dual
and then I call insert
insert into table ...id, ts ...
This works fine in 99% of cases, but sometimes under heavy load the order of the records is wrong: I need record.id < (record+1).id and record.ts < (record+1).ts, but this condition is not met. How can I solve this problem? I am using an Oracle database.
You should not use the result of a sequence for ordering. This might look strange, but think about how sequences are cached and think about RAC. Every instance has its own sequence cache... For performance you need big caches. Sequences are better thought of as random unique key generators that happen to work sequentially most of the time.
The timestamp format has a time resolution up to the microsecond level. As hardware becomes quicker and load increases, you may get multiple rows with the same timestamp. There is not much you can do about that until Oracle takes the resolution a step further again.
Use an INSERT trigger to populate the id and ts columns.
create table sotest
(
id number,
ts timestamp
);
create sequence soseq;
CREATE OR REPLACE TRIGGER SOTEST_BI_TRIG BEFORE
INSERT ON SOTEST REFERENCING NEW AS NEW FOR EACH ROW
BEGIN
:new.id := soseq.nextval;
:new.ts := CURRENT_TIMESTAMP;
END;
/
PHIL#PHILL11G2 > insert into sotest values (NULL,NULL);
1 row created.
PHIL#PHILL11G2 > select * from sotest;
ID TS
---------- ----------------------------------
1 11-MAY-12 13.29.33.771515
PHIL#PHILL11G2 >
You should also pay attention to the other answer provided. Is id meant to be a meaningless primary key (it usually is in apps - it's just a key to join on)?

select only new row in oracle

I have table with "varchar2" as primary key.
It has about 1 000 000 Transactions per day.
My app wakes up every 5 minute to generate text file by querying only new record.
It will remember last point and process only new records.
Do you have idea how to query with good performance?
I am able to add new column if necessary.
What do you think this process should do by?
plsql?
java?
Everyone here is really really close. However:
Scott Bailey's wrong about using a bitmap index if the table's under any sort of continuous DML load. That's exactly the wrong time to use a bitmap index.
Everyone else's answer about the PROCESSED CHAR(1) check in ('Y','N') column is right, but is missing how to index it; you should use a function-based index like this:
CREATE INDEX MY_UNPROCESSED_ROWS_IDX ON MY_TABLE
(CASE WHEN PROCESSED_FLAG = 'N' THEN 'N' ELSE NULL END);
You'd then query it using the same expression:
SELECT * FROM MY_TABLE
WHERE (CASE WHEN PROCESSED_FLAG = 'N' THEN 'N' ELSE NULL END) = 'N';
The reason to use the function-based index is that Oracle doesn't write index entries for entirely NULL values being indexed, so the function-based index above will only contain the rows with PROCESSED_FLAG = 'N'. As you update your rows to PROCESSED_FLAG = 'Y', they'll "fall out" of the index.
Well, if you can add a new column, you could create a Processed column, which will indicate processed records, and create an index on this column for performance.
Then the query should only be for those rows that have been newly added, and not processed.
This should be easily done using sql queries.
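A minimal sketch of that approach (the table name, flag column, and default are assumptions):

alter table my_table add (processed char(1) default 'N' not null);
create index my_table_processed_ix on my_table (processed);

-- Pick up the unprocessed rows for the text file...
select * from my_table where processed = 'N';

-- ...then mark them once the file has been written.
update my_table set processed = 'Y' where processed = 'N';
commit;

Note that the function-based index shown in the previous answer is usually the better way to index such a low-cardinality flag.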
Ah, I really hate to add another answer when the others have come so close to nailing it. But
As Ponies points out, Oracle does have a pseudocolumn (ORA_ROWSCN, based on the System Change Number) that can pinpoint when each row was modified. Unfortunately, the default is that it gets the information from the block instead of storing it with each row, and changing that behavior would require you to rebuild a really large table. So while this answer is good for quieting the SQL Server fella, I'd not recommend it.
Astander is right there, but needs a few caveats. Add a new column needs_processed CHAR(1) DEFAULT 'Y' and add a BITMAP index. For low-cardinality columns ('Y'/'N') the bitmap index will be faster. Once you have that, the rest is pretty easy. But you've got to be careful how you select the new rows, process them, and mark them as processed. Otherwise, rows could be inserted while you are processing that will get marked as processed even though they have not been.
The easiest way would be to use PL/SQL to open a cursor that selects unprocessed rows, processes them, and then updates each row as processed. If you have an aversion to walking cursors, you could collect the PKs or rowids into a nested table, process them, and then update using the nested table.
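A rough PL/SQL sketch of that cursor walk (the table name, flag column, and processing step are all placeholders):

declare
  cursor c_unprocessed is
    select t.*
      from my_table t
     where processed_flag = 'N'
       for update;
begin
  for r in c_unprocessed loop
    -- ... write the row to the text file here ...
    update my_table
       set processed_flag = 'Y'
     where current of c_unprocessed;
  end loop;
  commit;  -- commit only after the loop; committing inside would invalidate the FOR UPDATE cursor
end;
/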
In the MS SQL Server world where I work, we have a 'version' column of type 'timestamp' on our tables.
So, to answer #1, I would add a new column.
To answer #2, I would do it in PL/SQL for performance.
Mark
"astander" pretty much did the work for you. You need to ALTER your table to add one more column (lets say PROCESSED)..
You can also consider creating an INDEX on the PROCESSED ( a bitmap index may be of some advantage, as the possible value can be only 'y' and 'n', but test it out ) so that when you query it will use INDEX.
Also, since you query only every 5 minutes, check whether you can add another column of TIMESTAMP type and partition the table by it (not sure, check it out).
I would also think about writing a job or something similar that writes the file using UTL_FILE and shows it on the front end, if that is possible.
If performance is really a problem and you want to create your file asynchronously, you might want to use Oracle Streams, which will actually get modification data from your redo log without affecting the performance of the main database. You may not even need a separate job, as you can configure Oracle Streams to do asynchronous replication of the changes, through which you can trigger the file creation.
Why not create an extra table that holds two columns: the ID column and a processed-flag column. Have an insert trigger on the original table place its ID in this new table. Your logging process can then select records from this new table and mark them as processed. Finally, delete the processed records from this table.
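A rough sketch of that side-table idea (all object names are made up for illustration; the main table's varchar2 primary key is assumed to be called id):

create table new_rows_queue (
  source_id varchar2(100) not null,
  processed char(1) default 'N' not null
);

create or replace trigger main_table_ai_trg
  after insert on main_table
  for each row
begin
  insert into new_rows_queue (source_id) values (:new.id);
end;
/

-- The export job then reads from the small queue table and cleans up afterwards:
select source_id from new_rows_queue where processed = 'N';
delete from new_rows_queue where processed = 'Y';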
I'm pretty much in agreement with Adam's answer. But I'd want to do some serious testing compared to an alternative.
The issue I see is that you need to not only select the rows, but also do an update of those rows. While that should be pretty fast, I'd like to avoid the update. And avoid having any large transactions hanging around (see below).
The alternative would be to add CREATE_DATE date default sysdate. Index that. And then select records where create_date >= (start date/time of your previous select).
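Sketched out (the table, column, and index names are assumptions), that alternative looks like:

alter table my_table add (create_date date default sysdate not null);
create index my_table_create_date_ix on my_table (create_date);

-- :last_run holds the start date/time of the previous extraction run.
select *
  from my_table
 where create_date >= :last_run;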
But I don't have enough data on the relative costs of setting a sysdate default vs. setting a value of 'Y', of maintaining the function-based index vs. the date index, and of doing a range select on the date vs. a select on a single value of the flag. You'll probably want to preserve stats or hint the query to use the index on the Y/N column, and definitely want to use a hint on the date column -- the stats on the date column will almost certainly be old.
If data are also being added to the table continuously, including during the period when your query is running, you need to watch out for transaction control. After all, you don't want to read 100,000 records that have the flag = Y, then do your update on 120,000 of them, including the 20,000 that arrived while your query was running.
In the flag case, there are two easy ways: SET TRANSACTION before your select and commit after your update, or start by doing an update from Y to Q, then do your select for those that are Q, and then update to N. Oracle's read consistency is wonderful but needs to be handled with care.
For the date column version, if you don't mind a risk of processing a few rows more than once, just update your table that has the last processed date/time immediately before you do your select.
If there's not much information in the table, consider making it Index Organized.
What about using Materialized view logs? You have a lot of options to play with:
SQL> create table test (id_test number primary key, dummy varchar2(1000));
Table created
SQL> create materialized view log on test;
Materialized view log created
SQL> insert into test values (1, 'hello');
1 row inserted
SQL> insert into test values (2, 'bye');
1 row inserted
SQL> select * from mlog$_test;
ID_TEST SNAPTIME$$ DMLTYPE$$ OLD_NEW$$ CHANGE_VECTOR$$
---------- ----------- --------- --------- ---------------------
1 01/01/4000 I N FE
2 01/01/4000 I N FE
SQL> delete from mlog$_test where id_test in (1,2);
2 rows deleted
SQL> insert into test values (3, 'hello');
1 row inserted
SQL> insert into test values (4, 'bye');
1 row inserted
SQL> select * from mlog$_test;
ID_TEST SNAPTIME$$ DMLTYPE$$ OLD_NEW$$ CHANGE_VECTOR$$
---------- ----------- --------- --------- ---------------
3 01/01/4000 I N FE
4 01/01/4000 I N FE
I think this solution should work.
You need to do the following steps.
For the first run, you will have to copy all records. In the first run you need to execute the following query:
insert into new_table (max_rowid) select max(rowid) from yourtable;
The next time you want to get only the newly inserted values, you can do it by executing the following command:
Select * from yourtable where rowid > (select max_rowid from new_table);
Once you are done processing the above query, simply truncate new_table and insert max(rowid) from yourtable again.
I think this should work and would be the fastest solution.

History records, missing records, filling in the blanks

I have a table that contains a history of costs by location. These are updated on a monthly basis.
For example
Location1, $500, 01-JAN-2009
Location1, $650, 01-FEB-2009
Location1, $2000, 01-APR-2009
If I query for March 1, I want to return the value for Feb 1, since March 1 does not exist.
I've written a query using an Oracle analytic function, but it takes too much time (it would be fine for a report, but we are using this to let the user see the data visually through the front end and switch dates; requerying takes too long, as the table is something like 1 million rows).
So, the next thought I had was to simply update the table with the missing data. In the case above, I'd simply add in a record identical to 01-FEB-2009 except set the date to 01-MAR-2009.
I was wondering if you all had thoughts on how to best do this.
My plan had been to simply create a cursor for a location, fetch the first record, then fetch the next, and if the next record was not for the next month, insert a record for the missing month.
A little more information:
CREATE TABLE MAXIMO.FCIHIST_BY_MONTH
(
LOCATION VARCHAR2(8 BYTE),
PARKALPHA VARCHAR2(4 BYTE),
LO2 VARCHAR2(6 BYTE),
FLO3 VARCHAR2(1 BYTE),
REGION VARCHAR2(4 BYTE),
AVG_DEFCOST NUMBER,
AVG_CRV NUMBER,
FCIDATE DATE
)
And then the query I'm using (the system will pass in the date and the parkalpha). The table is approx 1 million rows, and, again, while it takes a reasonable amount of time for a report, it takes way too long for an interactive display
select location, avg_defcost, avg_crv, fcimonth, fciyear,fcidate from
(select location, avg_defcost, avg_crv, fcimonth, fciyear, fcidate,
max(fcidate) over (partition by location) my_max_date
from FCIHIST_BY_MONTH
where fcidate <='01-DEC-2008'
and parkalpha='SAAN'
)
where fcidate=my_max_date;
The best way to do this is to create a PL/SQL stored procedure that works backwards from the present and runs queries that fail to return data. Each month that it fails to return data it inserts a row for the missing data.
create or replace PROCEDURE fill_in_missing_data IS
  cursor have_data_on_date is
    select location, trunc(date_field) have_date
      from the_table
     group by location, trunc(date_field)
     order by 2 desc
  ;
  a_date date;
  n_days_to_insert number;
BEGIN
  a_date := trunc(sysdate);
  for r1 in have_data_on_date loop
    if r1.have_date < a_date then
      -- insert dates in a loop
      n_days_to_insert := a_date - r1.have_date; -- Might be off by 1, need to test.
      for day_offset in 1 .. n_days_to_insert loop
        -- insert missing day
        insert into the_table ( location, the_date, amount )
        values ( r1.location, a_date - day_offset, 0 );
      end loop;
    end if;
    a_date := r1.have_date;
    -- this is a little tricky - I am going to test this and update it in a few minutes
  end loop;
END;
Filling in the missing data will (if you are careful) make the queries much simpler and run faster.
I would also add a flag to the table to indicate that a row is filled-in missing data, so that if
you need to remove it (or create a view without it) later, you can.
I have filled in missing data, and also filled in dummy data so that outer joins were not necessary, to improve query performance a number of times. It is not "clean" or "perfect", but I follow Leflar's #1 Law: "Always go with what works."
You can create a job in Oracle that will automatically run at off-peak times to fill in the missing data. Take a look at this question on Stack Overflow about creating jobs.
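A minimal scheduling sketch, assuming the fill_in_missing_data procedure above is used and that 2 AM counts as off-peak:

begin
  dbms_scheduler.create_job(
    job_name        => 'FILL_IN_MISSING_DATA_JOB',
    job_type        => 'STORED_PROCEDURE',
    job_action      => 'FILL_IN_MISSING_DATA',
    repeat_interval => 'FREQ=DAILY; BYHOUR=2',
    enabled         => TRUE);
end;
/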
What is your precise use case underlying this request?
In every system I have worked on, if there is supposed to be a record for MARCH and there isn't a record for MARCH, the users would like to know that fact. Apart from anything else, they might want to investigate why the MARCH record is missing.
Now if this is basically a performance issue, then you ought to tune the query. Or if it is a presentation issue - you want to generate a matrix of twelve rows, and that is difficult if a month doesn't have a record for some reason - then that is a different matter, with a variety of possible solutions.
But seriously, I think it is a bad practice for the database to invent replacements for missing records.
edit
I see from your recent comment on your question that it did turn out to be a performance issue - indexes fixed the problem. So I feel vindicated.
