I am trying to list all the possible sequential (contiguous, forward-direction only) sum combinations within the same subject, listing the row_id and the number of rows involved in each sum.
Sample:
Input (Source Table):
DLID Subject Total
1 Science 70
2 Science 70
3 Science 70
4 Science 70
5 Maths 80
6 Maths 80
7 English 90
8 English 90
9 English 90
10 Science 75
Expected Result:
ID Number of Rows Subject Total
1 1 Science 70
2 1 Science 70
3 1 Science 70
4 1 Science 70
5 1 Maths 80
6 1 Maths 80
7 1 English 90
8 1 English 90
9 1 English 90
10 1 Science 75
1 2 Science 140
2 2 Science 140
3 2 Science 140
5 2 Maths 160
7 2 English 180
8 2 English 180
1 3 Science 210
2 3 Science 210
7 3 English 270
1 4 Science 280
VBScript Code:
' myarray - reads the entire table from the Access database
' "i" is the total number of rows read
' "j" is used to access each row one by one
' "m" is the number of subsequent rows with the same subject that we are trying to check
' "n" is a counter to start from each row and check up to m-1 rows for the same subject
' "k" is used to store the results into "resultarray"
' myarray(0,j) holds the row_id
' myarray(1,j) holds the subject
' myarray(2,j) holds the score
' myarray(3,j) to myarray(6,j) hold other details
' i is the total number of rows - around 80,000
' There can be as many as 700-800 continuous records from the same subject
' m is the number of rows matching / the number of rows contributing to the sum
For m = 1 To 700
    For j = 0 To i - m
        matchcount = 1
        For n = 1 To m - 1
            If myarray(1, j) = myarray(1, j + n) Then
                matchcount = matchcount + 1
            Else
                Exit For
            End If
        Next
        If matchcount = m Then
            resultarray(2, k) = 0
            For o = 0 To m - 1
                resultarray(2, k) = CDbl(resultarray(2, k)) + CDbl(myarray(2, j + o))
                resultarray(1, k) = m
                resultarray(0, k) = myarray(0, j)
                resultarray(3, k) = myarray(3, j)
                resultarray(4, k) = myarray(4, j)
                resultarray(5, k) = myarray(1, j)
                resultarray(7, k) = myarray(5, j)
                resultarray(8, k) = myarray(6, j)
            Next
            resultarray(2, k) = Round(resultarray(2, k), 0)
            k = k + 1
            ReDim Preserve resultarray(8, k)
        End If
    Next
Next
The code works correctly, but it is very slow.
I am dealing with 80,000 rows, with anywhere from 5 to 900 continuous rows of the same subject, so the number of combinations runs into a few million.
One set of 80,000 rows takes a few hours, and I have to process many sets daily.
Please suggest how to speed this up: a better algorithm, code improvements, or a different language to code in.
Please assist.
Here are the building blocks for a "real" Access (SQL) solution.
Observation #1
It seems to me that a good first step would be to add two Numeric (Long Integer) columns to the [SourceTable]:
[SubjectBlock] will number the "blocks" of rows where the subject is the same
[SubjectBlockSeq] will sequentially number the rows within each block
They both should be indexed (Duplicates OK). The code to populate these columns would be...
Public Sub UpdateBlocksAndSeqs()
Dim cdb As DAO.Database, rst As DAO.Recordset
Dim BlockNo As Long, SeqNo As Long, PrevSubject As String
Set cdb = CurrentDb
Set rst = cdb.OpenRecordset("SELECT * FROM [SourceTable] ORDER BY [DLID]", dbOpenDynaset)
PrevSubject = "(an impossible value)"
BlockNo = 0
SeqNo = 0
DBEngine.Workspaces(0).BeginTrans ''speeds up bulk updates
Do While Not rst.EOF
If rst!Subject <> PrevSubject Then
BlockNo = BlockNo + 1
SeqNo = 0
End If
SeqNo = SeqNo + 1
rst.Edit
rst!SubjectBlock = BlockNo
rst!SubjectBlockSeq = SeqNo
rst.Update
PrevSubject = rst!Subject
rst.MoveNext
Loop
DBEngine.Workspaces(0).CommitTrans
rst.Close
Set rst = Nothing
End Sub
...and the updated SourceTable would be...
DLID Subject Total SubjectBlock SubjectBlockSeq
1 Science 70 1 1
2 Science 60 1 2
3 Science 75 1 3
4 Science 70 1 4
5 Maths 80 2 1
6 Maths 90 2 2
7 English 90 3 1
8 English 80 3 2
9 English 70 3 3
10 Science 75 4 1
(Note that I tweaked your test data to make it easier to verify the results below.)
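If you prefer DDL to the table designer, the two new columns and their indexes could be added with Access SQL along these lines (the LONG type and the index names are my assumptions; run each statement as its own query, or via CurrentDb.Execute from VBA):
ALTER TABLE SourceTable ADD COLUMN SubjectBlock LONG;
ALTER TABLE SourceTable ADD COLUMN SubjectBlockSeq LONG;
CREATE INDEX idxSubjectBlock ON SourceTable (SubjectBlock);
CREATE INDEX idxSubjectBlockSeq ON SourceTable (SubjectBlockSeq);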
Now as we iterate through the ever-increasing "length of sequence to be included in the total" we can quickly identify the "blocks" that are of interest simply by using a query like...
SELECT SubjectBlock FROM SourceTable WHERE SubjectBlockSeq=3
...which will return...
1
3
...indicating that when calculating the totals for a "run of 3" we won't need to look at blocks 2 ("Maths") and 4 (the last "Science" one) at all.
Observation #2
The first time through, when NumRows=1, is a special case: it just copies the rows from [SourceTable] into the [Expected Results] table. We can save time by doing that with a single query:
INSERT INTO ExpectedResult ( DLID, NumRows, Subject, Total, SubjectBlock, NextSubjectBlockSeq )
SELECT SourceTable.DLID, 1 AS Expr1, SourceTable.Subject, SourceTable.Total,
SourceTable.SubjectBlock, SourceTable.SubjectBlockSeq+1 AS Expr2
FROM SourceTable;
You may notice that I have added two columns to the [ExpectedResult] table: [SubjectBlock] (as before) and [NextSubjectBlockSeq] (which is just [SubjectBlockSeq]+1). Again, they should both be indexed, allowing duplicates. We'll use them below.
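For completeness, an [ExpectedResult] table with those columns could be created with DDL along these lines (column types and index names are my assumptions; run each statement separately):
CREATE TABLE ExpectedResult (
    DLID LONG,
    NumRows LONG,
    Subject TEXT(50),
    Total DOUBLE,
    SubjectBlock LONG,
    NextSubjectBlockSeq LONG
);
CREATE INDEX idxERSubjectBlock ON ExpectedResult (SubjectBlock);
CREATE INDEX idxERNextSeq ON ExpectedResult (NextSubjectBlockSeq);
CREATE INDEX idxERNumRows ON ExpectedResult (NumRows);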
Observation #3
As we continue looking for longer and longer "runs" to sum, each run is really just an earlier (shorter) run with an additional row tacked onto the end. If we write our results to the [ExpectedResults] table as we go along, we can re-use those values and not bother going back and adding up the individual values for the entire run.
When NumRows=2, the "add-on" rows are the ones where SubjectBlockSeq>=2...
SELECT SourceTable.*
FROM SourceTable
WHERE (((SourceTable.SubjectBlockSeq)>=2))
ORDER BY SourceTable.DLID;
...that is...
DLID Subject Total SubjectBlock SubjectBlockSeq
2 Science 60 1 2
3 Science 75 1 3
4 Science 70 1 4
6 Maths 90 2 2
8 English 80 3 2
9 English 70 3 3
...and the [ExpectedResult] rows with the "earlier (shorter) run" onto which we will be "tacking" the additional row are the ones
from the same [SubjectBlock],
with [NumRows]=1, and
with [ExpectedResult].[NextSubjectBlockSeq] = [SourceTable].[SubjectBlockSeq]
so we can get the new totals and append them to [ExpectedResult] like this
INSERT INTO ExpectedResult ( DLID, NumRows, Subject, Total, SubjectBlock, NextSubjectBlockSeq )
SELECT SourceTable.DLID, 2 AS Expr1, SourceTable.Subject,
[ExpectedResult].[Total]+[SourceTable].[Total] AS NewTotal,
SourceTable.SubjectBlock, [SourceTable].[SubjectBlockSeq]+1 AS Expr2
FROM SourceTable INNER JOIN ExpectedResult
ON (SourceTable.SubjectBlockSeq = ExpectedResult.NextSubjectBlockSeq)
AND (SourceTable.SubjectBlock = ExpectedResult.SubjectBlock)
WHERE (((SourceTable.SubjectBlockSeq)>=2) AND (ExpectedResult.NumRows=1));
The rows appended to [ExpectedResult] are
DLID NumRows Subject Total SubjectBlock NextSubjectBlockSeq
2 2 Science 130 1 3
3 2 Science 135 1 4
4 2 Science 145 1 5
6 2 Maths 170 2 3
8 2 English 170 3 3
9 2 English 150 3 4
Now we're cookin'...
Using the same logic as before, we can now process for NumRows=3. The only differences are that we will be inserting the value 3 into NumRows, and our selection criteria will be
WHERE (((SourceTable.SubjectBlockSeq)>=3) AND (ExpectedResult.NumRows=2))
The complete query is
INSERT INTO ExpectedResult ( DLID, NumRows, Subject, Total, SubjectBlock, NextSubjectBlockSeq )
SELECT SourceTable.DLID, 3 AS Expr1, SourceTable.Subject,
[ExpectedResult].[Total]+[SourceTable].[Total] AS NewTotal,
SourceTable.SubjectBlock, [SourceTable].[SubjectBlockSeq]+1 AS Expr2
FROM SourceTable INNER JOIN ExpectedResult
ON (SourceTable.SubjectBlockSeq = ExpectedResult.NextSubjectBlockSeq)
AND (SourceTable.SubjectBlock = ExpectedResult.SubjectBlock)
WHERE (((SourceTable.SubjectBlockSeq)>=3) AND (ExpectedResult.NumRows=2));
and the rows appended to [ExpectedResult] are
DLID NumRows Subject Total SubjectBlock NextSubjectBlockSeq
3 3 Science 205 1 4
4 3 Science 205 1 5
9 3 English 240 3 4
Parameterization
Since each successive query is so similar, it would be awfully nice if we could just write it once and use it repeatedly. Fortunately, we can, if we turn it into a "Parameter Query":
PARAMETERS TargetNumRows Long;
INSERT INTO ExpectedResult ( DLID, NumRows, Subject, Total, SubjectBlock, NextSubjectBlockSeq )
SELECT SourceTable.DLID, [TargetNumRows] AS Expr1, SourceTable.Subject,
[ExpectedResult].[Total]+[SourceTable].[Total] AS NewTotal,
SourceTable.SubjectBlock, [SourceTable].[SubjectBlockSeq]+1 AS Expr2
FROM SourceTable INNER JOIN ExpectedResult
ON (SourceTable.SubjectBlock = ExpectedResult.SubjectBlock)
AND (SourceTable.SubjectBlockSeq = ExpectedResult.NextSubjectBlockSeq)
WHERE (((SourceTable.SubjectBlockSeq)>=[TargetNumRows])
AND ((ExpectedResult.NumRows)=[TargetNumRows]-1));
Create a new Access query, paste the above into the SQL pane, and then save it as pq_appendToExpectedResult. (The "pq_" is just a visual cue that it's a Parameter Query.)
Invoking a Parameter Query from VBA
You can invoke (execute) a Parameter Query in VBA via a QueryDef object:
Dim cdb As DAO.Database, qdf As DAO.QueryDef
Set cdb = CurrentDb
Set qdf = cdb.QueryDefs("pq_appendToExpectedResult")
qdf!TargetNumRows = 4 '' parameter value
qdf.Execute
Set qdf = Nothing
When to stop
Now you can see that it's simply a matter of incrementing NumRows and re-running the Parameter Query, but when to stop? That's easy:
After incrementing your NumRows variable in VBA, test
DCount("DLID", "SourceTable", "SubjectBlockSeq=" & NumRows)
If it comes back 0 then you're done.
Show me (all) the code
Sorry, not right away. ;) Play around with this and let us know how it goes.
Your question is:
"Please suggest how to speed this up. Better Algorithm / Code Improvements / Different Language to code Please assist."
I can answer part of your question quickly and in short: "Different Language to code" == SQL.
In detail:
Whatever it is you're trying to achieve looks dataset intensive. I'm almost certain this processing would be handled more efficiently within the DBMS that houses your data, as the DBMS is able to take a (reasonably well written) SQL query and optimise it based on its own knowledge of the data you are interrogating, and perform aggregation over large sets/sub-sets of data very quickly and efficiently.
Iterating over large datasets row-by-row to accumulate values is rarely (dare I say never) going to yield acceptable performance, which is why DBMSes don't do this natively (unless you force them to with iterative code, or code that needs to inspect each row, such as your VB code).
Now, for the implementation of Better Algorithm / Code Improvements / Different Language.
I've done this in SQL, but regardless of whether you use my solution or not, I would still highly recommend you migrate your data to e.g. MS SQL Server, Oracle, or MySQL if you find that your use of MS Access is binding you to iterative approaches (which is not to suggest it is doing that... I don't know whether this is the case or not).
But if this is genuinely not homework, and/or you are genuinely tied to MS Access, then perhaps an investment of effort to convert this to MS Access might be fruitful in terms of performance. The principles should all be the same - it's a relational database and this is all fairly standard SQL, so I would've thought there'd be Access equivalents for what I've done here.
Failing that, you should be able to "point" an MSSQL instance at the MS Access file, as a linked server via an Access provider. If you'd like advice on this, let me know.
There's some code here that is procedural by nature, in order to set up some "helper" tables that will allow the heavy-lifting aggregation on your sequences to be done using set-based operations.
I've called the source table "Your_Source_Table". Do a search-replace on all instances to rename as whatever you've called it.
Note also that I haven't set up indexes on anything... you should do this. Indexes should be created for all the columns involved in joins, I expect. Checking the execution plan to ensure there's no unnecessary table scans would be wise.
I used the following to create Your_Source_Table:
-- Create Your apparent table structure
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[Your_Source_Table]') AND type in (N'U'))
DROP TABLE [dbo].[Your_Source_Table]
GO
CREATE TABLE [dbo].[Your_Source_Table](
[DLID] [int] NOT NULL,
[Subject] [nchar](10) NOT NULL,
[Total] [int] NOT NULL
) ON [PRIMARY]
GO
And populated it as:
DLID Subject Total
----------- ---------- -----------
1 Science 70
2 Science 70
3 Science 70
4 Science 70
5 Maths 80
6 Maths 80
7 English 90
8 English 90
9 English 90
10 Science 75
Then, I created the following "helpers". Explanations in code.
-- Set up helper structures.
-- Build a number table
if object_id('tempdb..##numbers') is not null
BEGIN DROP TABLE ##numbers END
SELECT TOP 10000 IDENTITY(int,1,1) AS Number -- Can be 700, 800, or 900 contiguous rows, depending on which comment I read, so 10,000 gives plenty of headroom :-)
INTO ##numbers
FROM sys.objects s1
CROSS JOIN sys.objects s2
ALTER TABLE ##numbers ADD CONSTRAINT PK_numbers PRIMARY KEY CLUSTERED (Number)
-- Determine where each block starts.
if object_id('tempdb..#tempGroups') is not null
BEGIN DROP TABLE #tempGroups END
GO
CREATE TABLE #tempGroups (
[ID] [int] NOT NULL IDENTITY,
[StartID] [int] NULL,
[Subject] [nchar](10) NULL
) ON [PRIMARY]
INSERT INTO #tempGroups (StartID, Subject)
SELECT t.DLID, t.Subject FROM Your_Source_Table t WHERE DLID=1
UNION
SELECT
t.DLID, t.Subject
FROM
Your_Source_Table t
INNER JOIN Your_Source_Table t2 ON t.DLID = t2.DLID+1 AND t.subject != t2.subject
-- Determine where each block ends
if object_id('tempdb..##groups') is not null
BEGIN DROP TABLE ##groups END
CREATE TABLE ##groups (
[ID] [int] NOT NULL,
[Subject] [nchar](10) NOT NULL,
[StartID] [int] NOT NULL,
[EndID] [int] NOT NULL
) ON [PRIMARY]
INSERT INTO ##groups
SELECT
g1.id as ID,
g1.subject,
g1.startID as startID,
CASE
WHEN g2.id is not null THEN g2.startID-1
ELSE (SELECT max(dlid) FROM Your_Source_Table) -- Boundary case when there is no following group (ie return the last row)
END as endID
FROM
#tempGroups g1
LEFT JOIN #tempGroups g2 ON g1.id = g2.id-1
DROP TABLE #tempGroups;
GO
-- We now have a helper table called ##groups, that identifies the subject, start DLID and end DLID of each continuous block of a particular subject in your dataset.
-- So now, we can build up the possible sequences within each group, by joining to a number table.
if object_id('tempdb..##sequences') is not null
BEGIN DROP TABLE ##sequences END
CREATE TABLE ##sequences (
[seqID] [int] NOT NULL IDENTITY,
[groupID] [int] NOT NULL,
[start_of_sequence] [int] NOT NULL,
[end_of_sequence] [int] NOT NULL
) ON [PRIMARY]
INSERT INTO ##sequences (groupID, start_of_sequence, end_of_sequence)
SELECT
g.id,
ns.number start_of_sequence,
ne.number end_of_sequence
FROM
##groups g
INNER JOIN ##numbers ns
ON ns.number <= (g.endid - g.startid + 1) -- number is in range for this block
INNER JOIN ##numbers ne
ON ne.number <= (g.endid - g.startid + 1) -- number is in range for this block
and ne.number >= ns.number -- end after start
ORDER BY
1,2,3
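As noted earlier, no indexes have been created here. Before running the final aggregation below, it is probably worth adding some on the join columns; a possible starting point (index names and column choices are my guesses - check the execution plan to confirm they actually help):
CREATE CLUSTERED INDEX IX_groups_id ON ##groups (ID);
CREATE INDEX IX_groups_range ON ##groups (StartID, EndID);
CREATE INDEX IX_sequences_group ON ##sequences (groupID, start_of_sequence, end_of_sequence);
CREATE INDEX IX_source_dlid ON [dbo].[Your_Source_Table] (DLID);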
Then, the results you're after can be achieved with a single set-based operation:
-- By joining groups to your dataset we can add a group identity to each record.
-- By joining sequences we can generate copies of the rows for aggregation into each sequence.
select
min(t.dlid) as ID, -- equals (s.start_of_sequence + g.startid - 1) (sequence positions offset by group start position)
count(t.dlid) as number_of_rows,
g.subject,
sum(t.total) as total
--select *
from
Your_Source_Table t
inner join ##groups g
on t.dlid >= g.startid and t.dlid <= g.endid -- grouping rows into each group.
inner join ##sequences s
on s.groupid = g.id -- get the sequences for this group.
and t.dlid >= (s.start_of_sequence + g.startid - 1) -- include the rows required for this sequence (sequence positions offset by group start position)
and t.dlid <= (s.end_of_sequence + g.startid - 1)
group by
g.subject,
s.seqid
order by 2, 1
BUT NOTE:
This result is NOT exactly the same as the "Expected Result" as originally posted.
The originally posted expected output included a duplicate instance of the 1-row sequence starting at row 1 (for Science, sum total 1*70 = 70), but did not include the 4-row sequence starting at row 1 (for Science, sum total 4*70 = 280).
The correct results, IMHO are:
ID number_of_rows subject total
----------- -------------- ---------- -----------
1 1 Science 70 <-- The original post had this row twice.
2 1 Science 70
3 1 Science 70
4 1 Science 70
5 1 Maths 80
6 1 Maths 80
7 1 English 90
8 1 English 90
9 1 English 90
10 1 Science 75
1 2 Science 140
2 2 Science 140
3 2 Science 140
5 2 Maths 160
7 2 English 180
8 2 English 180
1 3 Science 210
2 3 Science 210
7 3 English 270
1 4 Science 280 <-- This row was missing from the original post.
(20 row(s) affected)
Related
Is it possible to access the previous values that have not yet been stored in the database?
I have a table related to a particular module (MOD) which I will call table XA.
Multiple records can be inserted into XA simultaneously; this is how they are going to be inserted and I cannot change that fact.
For example, the following data is inserted in XA
ID | ParentId | Type | Name | Value
1 | 1 | 5 | Cost | 20000
2 | 1 | 9 | Risk | 10000
In this case I need to insert/update a record in this same table: a calculated value.
At the moment the trigger executes, the value named Cost, for example, is inserted first, and then the value of Risk.
When evaluating the Risk, I must be able to know what the Cost value is, so I can make the calculation and insert the calculated record.
I tried to create a Package to which I would feed the data, but I still have the same problem.
create or replace PACKAGE GLOBAL
IS
PRAGMA SERIALLY_REUSABLE;
TYPE arr IS TABLE OF VARCHAR2 (32)
INDEX BY VARCHAR2 (50);
NUMB arr;
END GLOBAL;
-- Used in the trigger:
GLOBAL.NUMB (:NEW.ID || '-' || :NEW.ParentId) := :NEW.Value;
BEGIN
   IF :NEW.Type = 9 AND GLOBAL.NUMB (5 || '-' || :NEW.ParentId) IS NOT NULL
   THEN
      -- calculate and insert the record
   ELSIF :NEW.Type = 5 AND GLOBAL.NUMB (9 || '-' || :NEW.ParentId) IS NOT NULL
   THEN
      -- calculate and insert the record
   END IF;
EXCEPTION
   WHEN NO_DATA_FOUND
   THEN
      -- do not have two inserts for the same record
END;
Values 5 and 9 are for reference.
Both records are not always inserted; one or more may be inserted, and even the calculated value can be entered manually, but it must then be replaced by the calculation.
And I can't create a view since there is an internal process that depends on this particular table.
Do you really, really have to store the calculated value in a table? That's usually not the best idea, as you have to maintain it in every possible case (inserts, updates, deletes).
Therefore, another suggestion: a view. Here's an example; my "calculation" is simple - I'm just subtracting cost - risk, as I don't know what you really do. If the calculation is very complex and has to be run every time over a very large data set, then yes, performance might suffer.
Anyway, here you go; see if it helps.
Sample data:
SQL> select * From xa order by parentid, name;
ID PARENTID TYPE NAME VALUE
---------- ---------- ---------- ---- ----------
1 1 5 Cost 20000
2 1 9 Risk 10000
5 4 5 Cost 4000
7 4 9 Risk 800
A view:
SQL> create or replace view v_xa as
2 select id,
3 parentid,
4 type,
5 name,
6 value
7 from xa
8 union all
9 select 0 id,
10 parentid,
11 99 type,
12 'Calc' name,
13 sum(case when type = 5 then value
14 when type = 9 then -value
15 end) value
16 from xa
17 group by parentid;
View created.
What does it contain?
SQL> select * from v_xa
2 order by parentid, type;
ID PARENTID TYPE NAME VALUE
---------- ---------- ---------- ---- ----------
1 1 5 Cost 20000
2 1 9 Risk 10000
0 1 99 Calc 10000
5 4 5 Cost 4000
7 4 9 Risk 800
0 4 99 Calc 3200
6 rows selected.
SQL>
I have a problem with how to get my Hive code optimized.
I have a huge table as follows:
Customer_id Product_id Date Value
1 1 02/28 100.0
1 2 02/02 120.0
1 3 02/10 144.0
2 2 02/15 120.0
2 3 02/28 144.0
... ... ... ...
I want to build a network in which products are linked through the customers who bought them. The graph is undirected, and I have to count the number of links between each pair of products.
In the end I need this:
Product_x Product_y amount
1 2 1
1 3 1
2 3 2
Can anyone help me with this?
I need an optimized way to do this; a plain self-join of the table with itself is not the solution I'm after. I really need an optimal approach here =/
CREATE TABLE X AS
SELECT
a.product_id as product_x,
b.product_id as product_y,
count(*) as amount
FROM table as a
JOIN table as b
ON a.customer_id = b.customer_id
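-- the filter below keeps each unordered pair exactly once, since the graph is undirected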
WHERE a.product_id < b.product_id
GROUP BY a.product_id, b.product_id;
I have been working with an application that is integrated with Spring and Hibernate 4.x, with transactions managed by JTA on a WebLogic application server. After 3 years there are about 40 million records in just one of the 100 tables in my DB. The DB is Oracle 11g. The response time of one query is about 5 minutes because of the growing number of records in this table.
I tuned the query, put it into SQL Developer, and ran the query advisor, which suggested some indexes. After doing all of this, the response time was reduced to 2 minutes, but even that does not satisfy the customer. To clarify, here is the query:
select *
from (select (count(storehouse0_.ID) over()) as col_0_0_,
storehouse3_.storeHouse_ID as col_1_0_,
(DBPK_PUB_STOREHOUSE.get_Storehouse_Title(storehouse5_.id, 1)) as col_2_0_,
storehouse5_.Organization_Code as col_3_0_,
publicgood1_.Goods_Item_Id as col_4_0_,
storehouse0_.storeHouse_Inventory_Id as col_5_0_,
storehouse0_.Id as col_6_0_,
storehouse3_.samapel_Item_Id as col_7_0_,
samapelite10_.MAINNAME as col_8_0_,
publicgood1_.serial_Number as col_9_0_,
publicgood1_1_.production_Year as col_10_0_,
samapelpar2_.ID_SourceInfo as col_11_0_,
samapelpar2_.Pn as col_12_0_,
storehouse3_.expire_Date as col_13_0_,
publicgood1_1_.Status_Id as col_14_0_,
baseinform12_.Topic as col_15_0_,
publicgood1_.public_Num as col_16_0_,
cast(publicgood1_1_.goods_Status as number(10, 0)) as col_17_0_,
publicgood1_1_.goods_Status as col_18_0_,
publicgood1_1_.deleted as col_19_0_
from amd.Core_StoreHouse_Inventory_Item storehouse0_,
amd.Core_STOREHOUSE_INVENTORY storehouse3_,
amd.Core_STOREHOUSE storehouse5_,
amd.SMP_SAMAPEL_CODE samapelite10_
cross join amd.Core_Goods_Item_Public publicgood1_
inner join amd.Core_Goods_Item publicgood1_1_
on publicgood1_.Goods_Item_Id = publicgood1_1_.Id
left outer join amd.SMP_SOURCEINFO samapelpar2_
on publicgood1_1_.Samapel_Part_Number_Id =
samapelpar2_.ID_SourceInfo, amd.App_BaseInformation
baseinform12_
where not exists
(select ssec.samapelITem_id
from core_security_samapelitem ssec
inner join core_goods_item g
on ssec.samapelitem_id = g.samapel_item_id
where not exists (SELECT aa.groupid
FROM app_actiongroup aa
where aa.groupid in
(select au.groupid
from app_usergroup au
where au.userid = 1)
and aa.actionid = 9054)
and ssec.isenable = 1
and storehouse0_.goods_Item_ID = g.id)
and not exists
(select *
from CORE_POWER_SECURITY cps
where not exists (SELECT aa.groupid
FROM app_actiongroup aa
where aa.groupid in
(select au.groupid
from app_usergroup au
where au.userid = 1)
and aa.actionid = 9055)
and cps.inventory_id =
storehouse0_.storeHouse_Inventory_Id
and cps.goodsitemtype = 6)
and storehouse0_.storeHouse_Inventory_Id = storehouse3_.Id
and storehouse3_.storeHouse_ID = storehouse5_.Id
and storehouse3_.samapel_Item_Id = samapelite10_.MAINCODE
and publicgood1_1_.Status_Id = baseinform12_.ID
and 1 <> 2
and storehouse0_.goods_Item_ID = publicgood1_.Goods_Item_Id
and publicgood1_1_.edited = 0
and publicgood1_1_.deleted = 0
and (exists (select storehouse13_.Id
from amd.Core_STOREHOUSE storehouse13_
cross join amd.core_power power16_
cross join amd.core_power power17_
where storehouse5_.powerID = power16_.Id
and storehouse13_.powerID = power17_.Id
and (storehouse13_.Id in (741684217))
and storehouse13_.storeHouseType = 2
and (power16_.hierarchiCode like
power17_.hierarchiCode || '%')) or
(storehouse3_.storeHouse_ID in (741684217)) and
storehouse5_.storeHouseType = 1)
and (storehouse5_.storeHouse_Status not in (2, 3))
order by storehouse3_.samapel_Item_Id)
where rownum <= 10
[Note: This query is generated by Hibernate].
It is clear that ordering 40 million rows takes a long time.
I found the main issue with this query: when I omitted the "order by" and ran the query, its response time dropped to about 5 seconds. I wondered why the "order by" affected the response time so much.
(Somebody may suggest that partitioning this table, or using some other Oracle facility, might give a better response time. That may be true, but my emphasis here is on the "order by" performance.) In any case I cannot omit the "order by", because the customer needs the ordering and it is necessary for paging. I found a solution, explained by example below, in which I order only the records that are actually needed. It is clear that when Oracle has to sort 40 million records it naturally takes a long time, so I effectively replaced the "order by" with a "where clause". With this replacement the response time dropped from 2 minutes to about 5 seconds, which is very exciting for me.
I explain my solution via an example below. I would like anyone who reads this post to tell me whether this solution is good, or whether there is a better one that I do not know about, so I can decide whether to use it.
Here is my solution:
Let's assume there are two tables, as below:
Post table
Id Others fields
1
2
3
4
5
… …
Post_comment table
Id post_id
1 5
2 5
3 5
4 5
6 5
7 2
8 2
9 2
10 3
11 1
12 1
13 1
14 1
15 1
16 1
17 1
18 1
19 1
20 1
21 1
22 1
23 1
24 1
25 1
26 4
27 4
There is a form that shows the result of a join between the POST table and the POST_COMMENT table.
I show both queries: one that orders all records of the table, and one that orders only the specific records that are needed. The results of the two queries are exactly the same, but the response time of the second approach is better.
Assume that the page size is 10 and you are on page 3.
The first query, which orders all records of the table:
select *
from (Select res.*, rownum as rownum_
from (Select * from POST_COMMENT Order by post_id asc) res
Where rownum <= 30)
where rownum_ > 20
The second solution:
Before executing the main query, I run the query below:
select *
from (select post_id, count(id) from POST_COMMENT group by post_id)
order by post_id asc
Its result is:
Post_id Count(id) Sum(count(id))
1 15 15
2 3 18
3 1 19
4 2 21
5 5 26
Note that the third column, "Sum(count(id))", is calculated after running that query; each entry in this column is the running total over all preceding rows.
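For what it's worth, that running total can also be produced directly by the query instead of being calculated afterwards, using an analytic SUM over the grouped counts; a minimal sketch against the same POST_COMMENT table:
select post_id,
       count(id) as cnt,
       sum(count(id)) over (order by post_id) as running_total
from POST_COMMENT
group by post_id
order by post_id;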
So there is a formula that specifies which post_id values must be selected. The formula is below:
pageSize = 10, pageNumber = 3
from: (pageNumber - 1) * pageSize = 2 * 10 = 20
to:   (pageNumber - 1) * pageSize + pageSize = 20 + 10 = 30
So I need the posts whose Sum(count(id)) falls in the range (20, 30]. According to this, I need only the two post_id values 4 and 5, and the main query of the second approach becomes:
select *
from (select rownum as rownum_, res.*
from (select *
from (select * from POST_COMMENT where post_id in (4, 5))
order by post_id asc) res
where rownum <= 30)
where rownum_ > 20
If you look at both queries, you will see the biggest difference: the second query selects only the POST_COMMENT records whose post_id is 4 or 5, and only then orders those records, rather than all records of the table.
After posting this, I kept searching and was eventually redirected to HERE . I reached a response time that is very exciting for me: it dropped from 3 minutes to less than 3 seconds. Note that I used only one tip from all of the query optimization guidelines on that site: duplicate constant conditions for different tables whenever possible.
Note: before applying this tip, there were already indexes on the fields used in the where clause and the order by.
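To illustrate that tip with a made-up example (the table and column names here are mine, not taken from the query above): when a constant restricts one table in a join, repeating the same constant condition against the joined table gives the optimizer an extra access path:
-- before: the constant restricts only t1
select t1.col_a, t2.col_b
from t1
join t2 on t1.key_id = t2.key_id
where t1.key_id = 100;

-- after: the same constant condition is duplicated for t2 as well
select t1.col_a, t2.col_b
from t1
join t2 on t1.key_id = t2.key_id
where t1.key_id = 100
  and t2.key_id = 100;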
This is a very complicated situation for me and I was wondering if someone can help me with it:
Here is my table:
Record_no Type Solde SQLCalculatedPmu DesiredValues
------------------------------------------------------------------------
2570088 Insertion 60 133 133
2636476 Insertion 67 119,104 119,104
2636477 Insertion 68 117,352 117,352
2958292 Insertion 74 107,837 107,837
3148350 Radiation 73 107,837 107,83 <---
3282189 Insertion 80 98,401 98,395
3646066 Insertion 160 49,201 49,198
3783510 Insertion 176 44,728 44,725
3783511 Insertion 177 44,475 44,472
4183663 Insertion 188 41,873 41,87
4183664 Insertion 189 41,651 41,648
4183665 Radiation 188 41,651 41,64 <---
4183666 Insertion 195 40,156 40,145
4183667 Insertion 275 28,474 28,466
4183668 Insertion 291 26,908 26,901
4183669 Insertion 292 26,816 26,809
4183670 Insertion 303 25,842 25,836
4183671 Insertion 304 25,757 25,751
In my table, every value in the SQLCalculatedPmu column or DesiredValues column is calculated based on the preceding value.
As you can see, I have calculated the SQLCalculatedPmu column by rounding to 3 decimals. The issue is that on each Radiation line, the client wants the next calculation to start from the value on 2 decimals instead of 3 (represented in the DesiredValues column), and the following values must then be recalculated. For example, line 6 will change because the value in line 5 is now on 2 decimals. I could handle this if there were a single Radiation, but in my case there are many Radiations, and each one changes all the following values based on the 2-decimal calculation.
In summary, here are the steps:
1 - round the value of the row preceding a Radiation and put it in the Radiation row.
2 - recalculate all following Insertion rows.
3 - when we reach another Radiation, we redo steps 1 and 2, and so on.
I am using an Oracle DB and I am the owner, so I can create procedures and run inserts, updates, and selects, but I am not familiar with procedures or loops.
For information, the formula for SQLCalculatedPmu uses two additional columns, price and number, and is calculated cumulatively on every line for each investor:
(price * number) + (cumulative (price * number) of the preceding lines)
I tried something like this :
update PMUTemp
set SQLCalculatedPmu =
case when Type = 'Insertion' then
(number*price)+lag(SQLCalculatedPmu ,1) over (partition by investor
order by Record_no)/
(number+lag(solde,1) over (partition by investor order by Record_no))
else
TRUNC(lag(SQLCalculatedPmu,1) over (partition by investor order by Record_no))
end;
but it gave me this error (I think it's because I'm looking at the preceding line, which is itself being modified during the same SQL statement):
ORA-30486: window functions are allowed only in the SELECT list of a query.
I was wondering whether creating a procedure that is called as many times as there are Radiations would do the job, but I'm really not good with procedures.
Any help?
Regards,
Just to make my need simpler: all I want is to derive the DesiredValues column starting from the SQLCalculatedPmu column. The steps are:
1 - on a Radiation row, the value becomes trunc(preceding value, 2)
2 - recalculate all following Insertion rows this way: (price * number) + (cumulative (price * number) of the preceding lines); as the Radiation value has changed, the following lines must be recalculated based on it
3 - when we reach another Radiation, we redo steps 1 and 2, and so on
Kindest regards
You should not need a procedure here -- a SQL update of the Radiation rows in the table would do this quicker and more reliably.
Something like ..
update my_table t1
set (column_1, column_2) =
(select round(column_1,2), round(column_2,2)
from my_table t2
where t2.type = 'Insertion' and
t2.record_no = (select max(t3.record_no)
from my_table t3
where t3.type = 'Insertion' and
t3.record_no < t1.record_no ))
where t1.type = 'Radiation'
I'm attempting to perform a large query where I want to:
Query the detail rows, then
Perform aggregations based on the results returned
Essentially, I want to perform my data-intensive query ONCE, and derive both summary and detail values from the one query, as the query is pretty intensive. I'm SURE there is a better way to do this using the frontend application (e.g. detail rows in the SQL, aggregate in front-end?), but I want to know how to do this all in PL/SQL using essentially one select against the db (for performance reasons, I don't want to call essentially the same large Select twice)(and at this point, my reasons for wanting to do it in one query might be called stubborn... i.e. even if there's a better way, I'd like to know if it can be done).
I know how to get the basic "detail-level" resultset. That query would return data such as:
UPC-Region-ProjectType-TotalAssignments-IncompleteAssignments
So say I have these records:
10-A-X-20-10
11-B-X-10-5
12-C-Y-30-15
13-C-Z-20-10
14-A-Y-10-5
15-B-X-30-15
16-C-Z-20-10
17-B-Y-10-5
18-C-Z-30-15
19-A-X-20-10
20-B-X-10-5
I want to be able to perform the query, then perform aggregations on that resultset, such as:
Region A Projects: 3
Region A Total Assign: 50
Region A Incompl Assign: 25
Region B...
Region C...
Project Type X Projects: 5
Project Type X Total Assign: 90
Project Type X Incompl Assign: 45
Project Type Y...
Project Type Z...
And then return both resultsets (Summary + Detail) to the calling application.
I guess the idea would be running the Details query into a Temp Table, and then selecting/performing aggregation on it there to build the second "summary level" query. then passing the two resultsets back as two refcursors.
But I'm open to ideas...
My initial attempts have been:
type rec_projects is record
(record matching my DetailsSQL)
/* record variable */
project_resultset rec_projects;
/* cursor variable */
OPEN cursorvar1 FOR
select
upc,
region,
project_type,
tot_assigns,
incompl_assigns
...
Then I:
loop
fetch cursorvar1 into project_resultset;
exit when cursorvar1%NOTFOUND;
/* perform row-by-row aggregations into variables */
If project_resultset.region = 'A'
then
numAProj := numAProj + 1;
numATotalAssign := numATotalAssign + project_resultset.Totassigns;
numAIncomplAssign := numAIncomplAssign + project_resultset.Incomplassigns;
and so on...
end loop;
Followed by opening another refcursor var - selecting the variables from DUAL:
open cursorvar2 for
select
numAProj, numATotalAssign, numAIncomplAssign, etc, etc from dual;
Lastly:
cur_out1 := cursorvar1;
cur_out2 := cursorvar2;
Not working... cursorvar1 seems to load fine, and I get into the loop, but I'm not ending up with anything in cursorvar2, and I just feel I'm probably on the wrong path here (that there is a better way to do it).
Thanks for your help.
I prefer doing all calculations on the server side.
Both types of information (detail + summary) can be fetched through a single cursor:
with
DET as (
-- your details subquery here
select
UPC,
Region,
Project_Type,
Total_Assignments,
Incomplete_Assignments
from ...
)
select
UPC,
Region,
Project_Type,
Total_Assignments,
Incomplete_Assignments,
null as projects_ctr
from DET
union all
select
null as UPC,
Region,
null as Project_Type,
sum(Total_Assignments) as Total_Assignments,
sum(Incomplete_Assignments) as Incomplete_Assignments,
count(0) as projects_ctr
from DET
group by Region
union all
select
null as UPC,
null as Region,
Project_Type,
sum(Total_Assignments) as Total_Assignments,
sum(Incomplete_Assignments) as Incomplete_Assignments,
count(0) as projects_ctr
from DET
group by Project_Type
order by UPC nulls first, Region, Project_Type
Result:
UPC Region Project_Type Total_Assignments Incomplete_Assignments Projects_Ctr
------ ------ ------------ ----------------- ---------------------- ------------
(null) A (null) 50 25 3
(null) B (null) 60 30 4
(null) C (null) 100 50 4
(null) (null) X 90 45 5
(null) (null) Y 50 25 3
(null) (null) Z 70 35 3
10 A X 20 10 (null)
11 B X 10 5 (null)
12 C Y 30 15 (null)
13 C Z 20 10 (null)
14 A Y 10 5 (null)
15 B X 30 15 (null)
16 C Z 20 10 (null)
17 B Y 10 5 (null)
18 C Z 30 15 (null)
19 A X 20 10 (null)
20 B X 10 5 (null)
If you are going to be creating these reports regularly, it might be better to create a global temporary table to store the results of your initial query:
CREATE GLOBAL TEMPORARY TABLE MY_TEMP_TABLE
ON COMMIT PRESERVE ROWS
AS
SELECT
UPC,
Region,
ProjectType,
TotalAssignments,
IncompleteAssignments
FROM WHEREVER
;
You can then run a series of follow-up queries to calculate the various statistics values for your report and output them in a format other than a large text table.
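For example, the region and project-type summaries from the question could then be derived with follow-up queries along these lines (column names assumed to match the temporary table above):
SELECT Region,
       COUNT(*) AS Projects,
       SUM(TotalAssignments) AS Total_Assign,
       SUM(IncompleteAssignments) AS Incompl_Assign
FROM MY_TEMP_TABLE
GROUP BY Region;

SELECT ProjectType,
       COUNT(*) AS Projects,
       SUM(TotalAssignments) AS Total_Assign,
       SUM(IncompleteAssignments) AS Incompl_Assign
FROM MY_TEMP_TABLE
GROUP BY ProjectType;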