I have a simple database on which I want to do a simple query.
These are the columns for my database table:
external_id
TimeStamp
Value
Validation
Reason
Datum
Uur
Afronden
What each column means:
external_id is the id of a certain meter
TimeStamp is not actually used
Value is the value of that meter
Validation is just yes or no
Reason holds a varchar value with a reason
Datum is the date
Uur is the hour of that day
Afronden is a column needed for rounding
The goal of the query is to get the highest and the lowest daily sum of Value.
As you can see, each day is divided into hours, so I have to check whether Datum stays the same or changes, and build up the day's total accordingly.
This is my query:
Declare @totaal bigint          -- running day total ("totaal" = total)
Declare @tussentotaal bigint    -- current row's value ("tussentotaal" = subtotal)
Declare @Datum varchar          -- current date ("datum" = date)
Declare @datumverschil varchar  -- next row's date, to detect a day change
Declare @hoogste bigint         -- highest day total ("hoogste" = highest)
Declare @laagste bigint         -- lowest day total ("laagste" = lowest)
Declare @teller bigint          -- row counter ("teller" = counter)
Declare @tellettotaal bigint    -- total row count
set @tellettotaal = (select count(*) from cresent_opdracht_de_proost_wim.dbo.[test])
set @teller = 1
set @Datum = (select top(1) datum
              from cresent_opdracht_de_proost_wim.dbo.[test]
              order by afronden asc)
set @datumverschil = @Datum
set @tussentotaal = 0
set @totaal = 0
set @hoogste = 1775000006856
set @laagste = 1775000006856
while @teller <= @tellettotaal
begin
    if @teller = 1
    begin
        set @tussentotaal = (select top(1) value
                             from cresent_opdracht_de_proost_wim.dbo.[test]
                             order by afronden asc)
        if @tussentotaal != 0
        begin
            set @tussentotaal = @tussentotaal / 100
        end
    end
    else
    begin
        set @tussentotaal = (select top(1) value
                             from (select top (@teller) *
                                   from cresent_opdracht_de_proost_wim.dbo.[test]) q
                             order by afronden desc)
        set @tussentotaal = @tussentotaal / 100
    end
    if @tussentotaal != 0
    begin
        set @totaal = @totaal + @tussentotaal
    end
    set @teller = @teller + 1
    set @datumverschil = (select top(1) datum
                          from (select top (@teller) *
                                from cresent_opdracht_de_proost_wim.dbo.[test]) q
                          order by afronden desc)
    if @Datum != @datumverschil
    begin
        if @totaal >= @hoogste
        begin
            set @hoogste = @totaal
        end
        if @totaal <= @laagste
        begin
            if @totaal != 0
            begin
                set @laagste = @totaal
            end
        end
        set @Datum = @datumverschil
        set @totaal = 0
        select @teller as teller
    end
end
select @hoogste as hoogste
select @laagste as laagste
After 22 minutes, only 44,000 rows had been processed.
Does anybody know how I can optimise this query?
Hm, have you ever thought that it might be an idea not to use a broken procedural approach?
First, your logic may be broken - the runtime sounds far too long, so maybe you just have a dead loop. Also, sorry, but it is very hard for most people to understand fringe languages - i.e. anything that is not English - so your table and field names make no sense to us. Names one does not understand are hardly debuggable.
THAT SAID: I think you can get by with a LOT fewer queries in total by not running a senseless loop over TOP 1 - instead get everything ordered descending. Brutally speaking: get rid of the loop and write ONE statement that compiles your result set. Seriously, that should be possible. Do not try to outsmart the query optimizer by going back to writing procedural database code the way many people did 20 years ago in dBase times. In general, your query is linear, which means no parallelization, and your micro-managing of conditions means no query optimizer can do anything smart (optimization happens statement by statement). Define the result set in one statement and you may be surprised how efficient it is.
I am quite sure that in the end you will find you are basically trying to outsmart something that is better at writing queries than you are. Use SQL - define the result set.
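For illustration, a minimal set-based sketch of that idea, using the table and column names from the question and assuming (as the loop suggests) that Value has to be divided by 100:
-- One statement instead of the loop: sum per day, then take the extremes.
SELECT MAX(d.dagtotaal) AS hoogste,
       MIN(d.dagtotaal) AS laagste
FROM (SELECT Datum, SUM(Value / 100) AS dagtotaal
      FROM cresent_opdracht_de_proost_wim.dbo.[test]
      GROUP BY Datum) AS d;
If days with a zero total must be ignored for the minimum (as the loop's @totaal != 0 check suggests), add HAVING SUM(Value) <> 0 to the inner query.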
And finally, without knowing your data distribution and - ouch - indexes, we cannot really help. Maybe you are simply missing a sensible index? Who knows - you don't tell us anything (no query plan output, no table definitions, no index definitions).
It would possibly also help not to abuse data types: Declare @Datum varchar - OUCH. Ever heard of the DATE data type in SQL Server?
Related
I have created a stored procedure which is taking too much time to update the columns of a table: about 3 hours to update 2.5k records out of 43k.
How can I reduce the time it takes to update the records? Below is my logic.
procedure UPDATE_MST_INFO_BKC
(
P_SAPID IN NVARCHAR2
)
as
v_cityname varchar2(500):='';
v_neid varchar2(500):='';
v_latitude varchar2(500):='';
v_longitude varchar2(500):='';
v_structuretype varchar2(500):='';
v_jc_name varchar2(500):='';
v_jc_code varchar2(500):='';
v_company_code varchar2(500):='';
v_cnt number :=0;
begin
select count(*) into v_cnt from structure_enodeb_mapping where RJ_SAPID=P_SAPID and rownum=1;
if v_cnt > 0 then
begin
select RJ_CITY_NAME, RJ_NETWORK_ENTITY_ID,LATITUDE,LONGITUDE,RJ_STRUCTURE_TYPE,RJ_JC_NAME,RJ_JC_CODE,'6000'
into v_cityname,v_neid,v_latitude, v_longitude, v_structuretype,v_jc_name,v_jc_code,v_company_code from structure_enodeb_mapping where RJ_SAPID=P_SAPID and rownum=1;
update tbl_ipcolo_mast_info set
CITY_NAME = v_cityname,
NEID = v_neid,
FACILITY_LATITUDE = v_latitude,
FACILITY_LONGITUDE = v_longitude,
RJ_STRUCTURE_TYPE = v_structuretype,
RJ_JC_NAME = v_jc_name,
RJ_JC_CODE = v_jc_code,
COMPANY_CODE = v_company_code
where SAP_ID=P_SAPID;
end;
end if;
end UPDATE_MST_INFO_BKC;
What adjustments can I make to this?
As far as I understand your code, it updates TBL_IPCOLO_MAST_INFO for SAP_ID = P_SAPID. That means it updates one record per call, and you must be calling the procedure once for each record.
It is better practice to call the procedure once and update all the records in one go (in your case, all 2.5k records would be handled by a single call).
For your requirement, I have rewritten the procedure to execute a single MERGE statement, which does the same work as the multiple SQL statements in your question for a single P_SAPID.
PROCEDURE UPDATE_MST_INFO_BKC (
P_SAPID IN NVARCHAR2
) AS
BEGIN
MERGE INTO TBL_IPCOLO_MAST_INFO I
USING (
SELECT
RJ_CITY_NAME,
RJ_NETWORK_ENTITY_ID,
LATITUDE,
LONGITUDE,
RJ_STRUCTURE_TYPE,
RJ_JC_NAME,
RJ_JC_CODE,
'6000' AS COMPANY_CODE,
RJ_SAPID
FROM
STRUCTURE_ENODEB_MAPPING
WHERE
RJ_SAPID = P_SAPID
AND ROWNUM = 1
)
O ON ( I.SAP_ID = O.RJ_SAPID )
WHEN MATCHED THEN
UPDATE SET I.CITY_NAME = O.RJ_CITY_NAME,
I.NEID = O.RJ_NETWORK_ENTITY_ID,
I.FACILITY_LATITUDE = O.LATITUDE,
I.FACILITY_LONGITUDE = O.LONGITUDE,
I.RJ_STRUCTURE_TYPE = O.RJ_STRUCTURE_TYPE,
I.RJ_JC_NAME = O.RJ_JC_NAME,
I.RJ_JC_CODE = O.RJ_JC_CODE,
I.COMPANY_CODE = O.COMPANY_CODE;
END UPDATE_MST_INFO_BKC;
Cheers!!
3 hours? That's way too much. Are the sap_id columns indexed? Even if they aren't, a data set of 43K rows is just too small for that.
How do you call that procedure? Is it part of other code, perhaps some unfortunate loop which does something row-by-row (which is, in turn, slow-by-slow)?
A few objections:
are all those variables' datatypes really varchar2(500)? Consider declaring them so that they take the table column's datatype, e.g. v_cityname structure_enodeb_mapping.rj_city_name%type;. Also, there's no need to explicitly say that their value is null (:= ''); it is so by default
the select statement which checks whether there's something in the table for that parameter's value should be rewritten to use EXISTS, as it should perform better than the rownum = 1 condition you used
also, consider using exception handlers (no_data_found if there's no row for a certain ID; too_many_rows if there are two or more rows) - a sketch of both follows this list
the select statement that collects data into variables has the same condition; do you really expect more than a single row for each ID (passed as a parameter)?
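A rough sketch of those two suggestions, using the names from the question (the local p_sapid below stands in for the procedure parameter; treat this as illustrative, not as the finished procedure):
declare
    p_sapid    nvarchar2(50) := 'SOME_SAP_ID';  -- stand-in for the parameter
    v_exists   number;
    v_cityname structure_enodeb_mapping.rj_city_name%type;
begin
    -- EXISTS instead of count(*) ... rownum = 1
    select count(*) into v_exists
    from dual
    where exists (select null
                  from structure_enodeb_mapping
                  where rj_sapid = p_sapid);

    -- fetch guarded by exception handlers instead of a pre-check
    select rj_city_name
    into v_cityname
    from structure_enodeb_mapping
    where rj_sapid = p_sapid;
exception
    when no_data_found then
        null;  -- no row for this ID; handle as appropriate
    when too_many_rows then
        null;  -- two or more rows for this ID; handle as appropriate
end;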
Anyway, the whole procedure's code can be shortened to a single update statement:
update tbl_ipcolo_mst_info t set
(t.city_name, t.neid, ...) = (select s.rj_city_name,
s.rj_network_entity_id, ...
from structure_enodeb_mapping s
where s.rj_sapid = t.sap_id
)
where t.sap_id = p_sapid;
If there is something to be updated, it will be. If there's no matching t.sap_id, nothing will happen.
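One caveat worth hedging: if tbl_ipcolo_mst_info has the sap_id but structure_enodeb_mapping does not, the correlated subquery returns NULL and the columns would be set to NULL. A guarded variant of the same statement (with the same elisions as above):
update tbl_ipcolo_mst_info t set
  (t.city_name, t.neid, ...) = (select s.rj_city_name,
                                       s.rj_network_entity_id, ...
                                from structure_enodeb_mapping s
                                where s.rj_sapid = t.sap_id
                               )
where t.sap_id = p_sapid
  and exists (select null from structure_enodeb_mapping s
              where s.rj_sapid = t.sap_id);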
I will try to present my problem as simplified as possible.
Assume that we have 3 tables in Oracle 11g.
Persons (person_id, name, surname, status, etc )
Actions (action_id, person_id, action_value, action_date, calculated_flag)
Calculations (calculation_id, person_id,computed_value,computed_date)
What I want is: for each person that meets certain criteria (let's say status = 3),
I should get the sum of action_value from the Actions table where calculated_flag = 0 (something like: select sum(action_value) from Actions where calculated_flag = 0 and person_id = current_id).
Then I shall use that sum in some kind of formula and update the Calculations table for that specific person_id.
update Calculations set computed_value=newvalue, computed_date=sysdate
where person_id=current_id
After that, calculated_flag for the participating rows will be set to 1.
update Actions set calculated_flag=1
where calculated_flag=0 and person_id=current_id
Now this can be easily done sequentially, by creating a cursor that will run through Persons table and then execute each action needed for the specific person.
(I don't provide the code for the sequential solution as the above is just an example that resembles my real-world setup.)
The problem is that we are talking about quite a large amount of data, and the sequential approach seems like a waste of computational time.
It seems to me that this task could be performed in parallel for a number of person_ids.
So the question is:
Can this kind of task be performed using parallelization in PL/SQL?
What would the solution look like? That is, what special packages (e.g. DBMS_PARALLEL_EXECUTE), keywords (e.g. bulk collect), methods should be used and in what manner?
Also, should I have any concerns about partial failure of parallel updates?
Note that I am not quite familiar with parallel programming with PL/SQL.
Thanks.
Edit 1.
Here is my pseudo code for the sequential solution:
procedure sequential_solution is
cursor persons_of_interest is
select person_id from persons
where status = 3;
tempvalue number;
newvalue number;
begin
for person in persons_of_interest
loop
begin
savepoint personsp;
--step 1
select sum(action_value) into tempvalue
from actions
where calculated_flag = 0
and person_id = person.person_id;
newvalue := dosomemorecalculations(tempvalue);
--step 2
update calculations set computed_value = newvalue, computed_date = sysdate
where person_id = person.person_id;
--step 3
update actions set calculated_flag = 1
where calculated_flag = 0 and person_id = person.person_id;
--step 4 (didn't mention this step before - sorry)
insert into actions
( person_id, action_value, action_date, calculated_flag )
values
( person.person_id, 100, sysdate, 0 );
exception
when others then
rollback to personsp;
-- this call is defined with pragma AUTONOMOUS_TRANSACTION:
log_failure(person.person_id);
end;
end loop;
end;
Now, how would I speed up the above, either with forall and bulk collect or with parallel programming, under the following constraints:
proper memory management (taking into consideration the large amount of data)
for a single person, if one step of the sequence fails, all steps should be rolled back and the failure logged
I can propose the following. Let's say you have 1,000,000 rows in the persons table and you want to process 10,000 persons per iteration. You can do it this way:
declare
id_from persons.person_id%type;
id_to persons.person_id%type;
calc_date date := sysdate;
begin
for i in 1 .. 100 loop
id_from := (i - 1) * 10000 + 1;  -- +1 so that consecutive chunks do not overlap
id_to := i * 10000;
-- Updating Calculations table, errors are logged into err$_calculations table
merge into Calculations c
using (select p.person_id, sum(action_value) newvalue
from Actions a join persons p on p.person_id = a.person_id
where a.calculated_flag = 0
and p.status = 3
and p.person_id between id_from and id_to
group by p.person_id) s
on (s.person_id = c.person_id)
when matched then update
set c.computed_value = s.newvalue,
c.computed_date = calc_date
log errors into err$_calculations reject limit unlimited;
-- updating actions table only for those person_id which had no errors:
merge into actions a
using (select distinct p.person_id
       from persons p join Calculations c on p.person_id = c.person_id
       where c.computed_date = calc_date
         and p.person_id between id_from and id_to) s
on (s.person_id = a.person_id)
when matched then update
set a.calculated_flag = 1;
-- inserting a new action row for each person for whom calculations were successful
insert into actions (person_id, action_value, action_date, calculated_flag)
select distinct p.person_id, 100, calc_date, 0
from persons p join Calculations c on p.person_id = c.person_id
where c.computed_date = calc_date
and p.person_id between id_from and id_to;
commit;
end loop;
end;
How it works:
You split the data in the persons table into chunks of about 10,000 rows (this depends on gaps in the ID numbering; the maximum value of i * 10000 should be known to exceed the maximum person_id)
You make a calculation in the MERGE statement and update the Calculations table
The LOG ERRORS clause prevents the statement from failing on exceptions. If an error occurs, the offending row is not updated but is inserted into an error-logging table, and execution is not interrupted. To create this table, execute:
begin
DBMS_ERRLOG.CREATE_ERROR_LOG('CALCULATIONS');
end;
The table err$_calculations will be created. For more information about the DBMS_ERRLOG package, see the documentation.
The second MERGE statement sets calculated_flag = 1 only for rows where no errors occurred. The INSERT statement inserts these rows into the actions table. These rows can be found simply by selecting from the Calculations table.
Also, I added the variables id_from and id_to to compute the ID range to update, and the variable calc_date to make sure that all rows updated in the first MERGE statement can be found later by date.
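The question also asks about DBMS_PARALLEL_EXECUTE, so here is a hedged sketch of that route. The chunk logic above would move into a procedure (process_chunk(:start_id, :end_id) below is hypothetical); the package then handles the chunking and the parallel scheduling:
begin
    dbms_parallel_execute.create_task(task_name => 'calc_task');
    -- chunk the persons table by person_id, roughly 10000 ids per chunk
    dbms_parallel_execute.create_chunks_by_number_col(
        task_name    => 'calc_task',
        table_owner  => user,
        table_name   => 'PERSONS',
        table_column => 'PERSON_ID',
        chunk_size   => 10000);
    -- run up to 4 jobs in parallel; each job receives one chunk's id range
    dbms_parallel_execute.run_task(
        task_name      => 'calc_task',
        sql_stmt       => 'begin process_chunk(:start_id, :end_id); end;',
        language_flag  => dbms_sql.native,
        parallel_level => 4);
    dbms_parallel_execute.drop_task(task_name => 'calc_task');
end;
Each chunk commits (or rolls back) independently, which fits the per-person rollback-and-log constraint as long as process_chunk handles its own exceptions.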
I am converting some code from Access over to Oracle, and one of the queries in Access uses a table that I am unable to use in Oracle. I am not allowed to create new tables, so I am trying to figure out a way to reproduce the logic behind the table in the FROM section of my select.
The logic of the table is similar to:
FOR i = 1 To 100
number = number + 1
.AddNew
!tbl_number = number
NEXT i
I'm trying to convert this to oracle, and so far I have:
FOR i in 1 .. 100 LOOP
number := number + 1;
--This is where I am stuck; How do I simulate the table part
END LOOP;
I was thinking a cursor or a record would be the answer, but I can't seem to figure out how to implement that. In the end I basically want to have:
SELECT
table.number
FROM
(
--My for loop logic
) table
EDIT
The calculation is a bit more complicated; that was just an example. The values aren't actually sequential, and there isn't really a pattern to the rows.
EDIT
Here is a more complicated version of the for loop which is closer to what I'm actually doing:
FOR i in 1 .. 100 LOOP
number1 := number1 + 7;
number2 := (number2 + 8) / number1;
--This is where I am stuck; How do I simulate the table part
END LOOP;
You could use a recursive query (assuming you are on Oracle 11gR2 or later):
with example(idx, number1, number2) as (
-- Anchor Section
select 1
, 1 -- initial value
, 2 -- initial value
from dual
union all
-- Recursive Section
select prev.idx + 1
, prev.number1 + 7
, (prev.number2 + 8) / prev.number1
from example prev
where prev.idx < 100 -- The Guard
)
select * from example;
In the Anchor section, set all the values for your first record. Then in the Recursive section, set up the logic that determines the next record's values as a function of the prior record's values.
The Anchor section could select the initial values from some other table rather than being hard coded as in my example.
The recursive section needs to select from the named subquery (in this case example) but may also join to other tables as needed.
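For instance, a hedged variant that seeds the anchor from a table instead of hard-coded literals (seed_values with columns start1 and start2 is hypothetical; substitute your own source):
with example(idx, number1, number2) as (
    -- Anchor Section: initial values now come from a table
    select 1, s.start1, s.start2
    from seed_values s
    union all
    -- Recursive Section: unchanged
    select prev.idx + 1
         , prev.number1 + 7
         , (prev.number2 + 8) / prev.number1
    from example prev
    where prev.idx < 100
)
select * from example;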
You need to generate a set of sequential integer numbers. Maybe you can use this (for Oracle 10g and above):
SELECT
ROWNUM NUM
FROM
DUAL D1,
DUAL D2
CONNECT BY
(D1.DUMMY = D2.DUMMY AND ROWNUM <= 100)
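The same row generator is more commonly written against a single DUAL (also 10g and above):
SELECT LEVEL AS NUM
FROM DUAL
CONNECT BY LEVEL <= 100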
We have two tables, Customer and CustomerEvent, both containing a few million rows. On SQL Server 2000, we deployed a UDF called fn_CustomerEvent which returns 'TRUE' or 'FALSE' based on two parameters, CustomerID and EventCode, e.g.
SELECT dbo.fn_CustomerEvent(1345678, 'Music')
The UDF code is:
CREATE FUNCTION [dbo].[fn_CustomerEvent](@CustomerID INT, @EviCode NVARCHAR(10))
RETURNS NVARCHAR(10)
AS
BEGIN
    DECLARE @List NVARCHAR(10)
    SELECT @List = CASE
                       WHEN COUNT(*) > 0 THEN 'TRUE'
                       ELSE 'FALSE'
                   END
    FROM CustomerEvent
    WHERE CustomerID = @CustomerID
      AND EviCode = @EviCode
    RETURN @List
END
The performance on SQL Server 2000 was great: it returned the TOP 5000 rows within 3 seconds. For example,
SELECT TOP 5000
CustomerID, dbo.fn_CustomerEvent(1345678, 'Music')
FROM [Table1]
But now we are moving to SQL Server 2005. Same code, same UDF, but the performance drops dramatically, from 3 seconds to 1 minute 20 seconds.
Can anyone point me in the right direction on where I should start optimizing the performance?
The scalar UDF is evaluated for each row (i.e. 5000 times). You could either call it once and store the result in a variable:
DECLARE @Result nvarchar(10)
SELECT @Result = dbo.fn_CustomerEvent(1345678, 'Music')
SELECT TOP 5000
    CustomerID, @Result
FROM [Table1]
or you can use an inline TVF (and I would also use EXISTS instead of COUNT):
-- Named differently from the CustomerEvent table, since a function and a
-- table cannot share a name within the same schema:
CREATE FUNCTION fn_CustomerEventTVF (@CustomerID INT,
                                     @EviCode NVARCHAR(10))
RETURNS TABLE
AS
RETURN
    (SELECT CASE
                WHEN EXISTS(SELECT *
                            FROM CustomerEvent
                            WHERE CustomerID = @CustomerID
                              AND EviCode = @EviCode) THEN 'TRUE'
                ELSE 'FALSE'
            END AS Result)
See Scalar functions, inlining, and performance: An entertaining title for a boring post for more about this technique.
There is one big problem with UDFs: they don't work with indexes. If you want code reuse while maintaining performance, I would normally build either a computed column (which can be indexed) or a view.
-- Again named to avoid clashing with the CustomerEvent table; TOP (1) guards
-- against the subquery returning more than one row:
CREATE FUNCTION fn_CustomerEventTVF2 (@CustomerID INT,
                                      @EviCode NVARCHAR(10))
RETURNS TABLE
AS
RETURN
    (SELECT COALESCE((SELECT TOP (1) 'TRUE'
                      FROM CustomerEvent
                      WHERE CustomerID = @CustomerID
                        AND EviCode = @EviCode)
                     , 'FALSE') AS Result)
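To make the computed-column suggestion concrete: a computed column can only reference values in its own row, so it suits a fixed event code rather than the parameterized lookup. A hedged sketch (the column and index names are made up for illustration):
-- Illustrative only: flag rows for one specific EviCode, then index the flag.
ALTER TABLE CustomerEvent
    ADD IsMusic AS (CASE WHEN EviCode = N'Music' THEN 1 ELSE 0 END) PERSISTED;

CREATE INDEX IX_CustomerEvent_IsMusic
    ON CustomerEvent (IsMusic, CustomerID);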
Check your indexes, rebuild them, and update your statistics.
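For example (standard T-SQL maintenance commands, shown here against the CustomerEvent table):
-- Rebuild all indexes on the table, then refresh its statistics.
ALTER INDEX ALL ON dbo.CustomerEvent REBUILD;
UPDATE STATISTICS dbo.CustomerEvent WITH FULLSCAN;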
The query below takes 20 seconds to run. user_table has 40,054 records; other_table has 14,000 records.
select count(a.user_id) from user_table a, other_table b
where a.user_id = b.user_id;
Our restriction is that any query running more than 8 seconds gets killed... >_< I've run explain plans and asked questions here, but given our restrictions I cannot get this query to run in less than 8 seconds. So I made a loop out of it:
begin
    FOR i IN role_user_rec.FIRST .. role_user_rec.LAST LOOP
        SELECT COUNT(a.user_id)
        INTO v_into
        FROM user_table a
        WHERE TRIM(role_user_rec(i).user_id) = TRIM(a.user_id);
        v_count := v_count + v_into;
    END LOOP;
end;
I know restrictions suck, and this is not an efficient way to do things, but is there any other way to make this loop run faster?
Can you get around the loop? I agree with Janek: if the query itself takes too long, you may have to use a different method to get the result. And I agree with Mark: if you can do it in one query, then by all means do so. But if you cannot, drop the loop and try something like this:
/*
--set up for demo/test
Create Type Testusertype As Object(User_Id Number , User_Name Varchar2(500));
CREATE TYPE TESTUSERTYPETABLE IS TABLE OF TESTUSERTYPE;
*/
declare
    tutt       testusertypetable;
    totalcount number;
begin
    select testusertype(object_id, object_name)
    bulk collect into tutt
    from user_objects;

    dbms_output.put_line(tutt.count);

    select count(*)
    into totalcount
    from user_objects uu
    inner join table(tutt) t on t.user_id = uu.object_id;

    dbms_output.put_line(tutt.count);
    dbms_output.put_line(totalcount);
end;