Use LAG in the next line after its line has been executed - Oracle

This is a very complicated situation for me and I was wondering if someone can help me with it:
Here is my table:
Record_no Type Solde SQLCalculatedPmu DesiredValues
------------------------------------------------------------------------
2570088 Insertion 60 133 133
2636476 Insertion 67 119,104 119,104
2636477 Insertion 68 117,352 117,352
2958292 Insertion 74 107,837 107,837
3148350 Radiation 73 107,837 107,83 <---
3282189 Insertion 80 98,401 98,395
3646066 Insertion 160 49,201 49,198
3783510 Insertion 176 44,728 44,725
3783511 Insertion 177 44,475 44,472
4183663 Insertion 188 41,873 41,87
4183664 Insertion 189 41,651 41,648
4183665 Radiation 188 41,651 41,64 <---
4183666 Insertion 195 40,156 40,145
4183667 Insertion 275 28,474 28,466
4183668 Insertion 291 26,908 26,901
4183669 Insertion 292 26,816 26,809
4183670 Insertion 303 25,842 25,836
4183671 Insertion 304 25,757 25,751
In my table, every value in the SQLCalculatedPmu column or DesiredValues column is calculated based on the preceding value.
As you can see, I have calculated the SQLCalculatedPmu column by rounding to 3 decimals. The issue is that on each Radiation line, the client wants to start the next calculation based on 2 decimals instead of 3 (represented in the DesiredValues column), and the following values must then be recalculated. For example, line 6 will change because the value in line 5 is now on 2 decimals. I could handle this if there were one single Radiation, but in my case I have a lot of Radiations, and each of them changes all the values that follow, based on the 2-decimal calculation.
In summary, here are the steps:
1 - round the value of the preceding row of a Radiation and put it in the Radiation row.
2 - calculate all the next Insertion rows.
3 - when we reach another Radiation, we redo steps 1 and 2, and so on.
I'm using an Oracle DB and I'm the owner, so I can create procedures and run inserts, updates and selects.
But I'm not familiar with procedures or loops.
For information, the formula for SQLCalculatedPmu uses two additional columns, price and number, and is calculated cumulatively line by line for each investor:
(price * number) + (cumulative (price * number) of the preceding lines)
I tried something like this:
update PMUTemp
set SQLCalculatedPmu =
  case when Type = 'Insertion' then
    ((number * price) + lag(SQLCalculatedPmu, 1) over (partition by investor order by Record_no))
    / (number + lag(solde, 1) over (partition by investor order by Record_no))
  else
    trunc(lag(SQLCalculatedPmu, 1) over (partition by investor order by Record_no), 2)
  end;
but it gave me this error (I think it's because I'm looking at the preceding line, which is itself modified during the SQL statement):
ORA-30486: window functions are allowed only in the SELECT list of a query.
I was wondering if creating a procedure that would be called as many times as there are Radiations would do the job, but I'm really not good with procedures.
Any help?
Regards,
Just to make my need simpler: all I want is to obtain the DesiredValues column starting from the SQLCalculatedPmu column. The steps are:
1 - on a Radiation, the value becomes trunc(preceding value, 2).
2 - calculate all the next Insertion rows this way: (price * number) + (cumulative (price * number) of the preceding lines). As the Radiation value has changed, I need to recalculate the next lines based on it.
3 - when we reach another Radiation, we redo steps 1 and 2, and so on.
Kindest regards

You should not need a procedure here -- a SQL update of the Radiation rows in the table would do this quicker and more reliably.
Something like ..
update my_table t1
set (column_1, column_2) =
    (select round(column_1, 2), round(column_2, 2)
     from my_table t2
     where t2.type = 'Insertion' and
           t2.record_no = (select max(t3.record_no)
                           from my_table t3
                           where t3.type = 'Insertion' and
                                 t3.record_no < t1.record_no))
where t1.type = 'Radiation'
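That handles the Radiation rows themselves. If the cascading recalculation of the following Insertion rows also has to happen in SQL, note that each row needs the freshly computed value of the previous row, which LAG cannot see inside a single UPDATE; a recursive subquery factoring clause can carry the running value forward instead. The following is only a sketch, assuming Oracle 11gR2 or later, a single investor, and reading the attempted formula as a single numerator over a single denominator, i.e. (this line's cost + previous value) / (number + previous solde); the quantity column is renamed num here because NUMBER is a reserved word:
with ordered as (
  select record_no, type, solde, price, num, SQLCalculatedPmu,
         row_number() over (order by record_no) as rn
  from PMUTemp
),
calc (rn, record_no, solde, pmu) as (
  -- anchor: the first row keeps its original value
  select rn, record_no, solde, SQLCalculatedPmu
  from ordered
  where rn = 1
  union all
  -- each step reuses the value just computed for the previous row
  select o.rn, o.record_no, o.solde,
         case when o.type = 'Radiation'
              then trunc(c.pmu, 2)
              else ((o.num * o.price) + c.pmu) / (o.num + c.solde)
         end
  from ordered o
  join calc c on o.rn = c.rn + 1
)
select record_no, pmu as DesiredValue
from calc
order by record_no;
The result can then be written back to the table with a MERGE if the table itself has to change.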

Related

Oracle - determine and return the specific hour of data with the highest sum of the values

I think I can do this in a more roundabout way using arrays, scripting, etc... BUT is it possible to sum up (aggregate) all the values for each "hour" of data in a database for a given field? Basically, I am trying to determine which hour in a day's worth of data had the highest sum... preferably without having to loop through 24 times for each day I want to look at. For example... let's say I have a table called "table" that contains columns for times and values as follows:
Time Value
00:00 1
00:15 1
00:30 2
00:45 2
01:00 1
01:15 1
01:30 1
01:45 1
If I summed up by hand, I would get the following
Sum for 00 Hour = 6
Sum for 01 Hour = 4
So, in this example 00 Hour would be my "largest sum" hour. I'd like to end up returning simply which hour had the highest sum, and what that value was...the other hours don't matter in this case.
Can this all be done in a single Oracle query, or does it need to be done outside the query with some scripting, working with the times and values separately? If not in a single query, maybe I could just grab the sum for each hour and run multiple queries - one for each hour - then push each hour to an array and just use the max of that array? I know there is a SUM() function in Oracle, but how to tell it to "sum all the hours and just return the hour with the highest sum" escapes me. Hope all this makes sense. lol
Thanks for any advice to make this easier. :-)
The following query should do what you are looking for:
SELECT SUBSTR(time, 1, 2) AS HOUR,
SUM(amount) AS TOTAL_AMOUNT
FROM test_data
GROUP BY SUBSTR(time, 1, 2)
ORDER BY TOTAL_AMOUNT DESC
FETCH FIRST ROW WITH TIES;
The query uses the SUM function, grouping by the hour part of your time column. It then orders the results by the summed amounts descending and returns only the top row (including ties).
Here is a DBFiddle showing the query in use (LINK)
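Note that FETCH FIRST ... WITH TIES requires Oracle 12c or later. On an older version, a RANK() over the same aggregate should be equivalent (a sketch, using the same assumed test_data table and columns as above):
SELECT hour, total_amount
FROM (SELECT SUBSTR(time, 1, 2) AS hour,
             SUM(amount) AS total_amount,
             RANK() OVER (ORDER BY SUM(amount) DESC) AS rnk
      FROM test_data
      GROUP BY SUBSTR(time, 1, 2))
WHERE rnk = 1;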

Hive SELECT col, COUNT(*) mismatch

Let me start by saying, I am very new to Hive, so I'm not sure what information folks will need to help me out. Please let me know what information would be useful. Also, while I'd usually create a small dataset to recreate the problem with, I think this problem has to do with the scale of my dataset, because I can't seem to recreate the problem on a smaller dataset. Let me know if you have suggestions to make this easier to answer.
Okay, now that that's out of the way, here's my problem. I have a huge dataset, partitioned by month, with about 500 million rows per month. I have a column with an ID number in it (I'll call it idcol), and I want to closely examine a couple of examples where there's a high number of repeated IDs and a very low number. So, I used this:
SELECT idcol, COUNT(*) FROM table WHERE month = 7 GROUP BY idcol LIMIT 10;
And got:
000005185884381 13
000035323848000 24
000017027256315 531
000010121767109 54
000039844553332 3
000013731352481 309
000024387407996 3
000028461234451 67
000016564844672 1
000032933040806 17
So, I went to investigate the first idcol value with a count of 3, with:
SELECT * FROM table WHERE month = 7 AND idcol = '000039844553332';
I expected to see just 3 rows, but ended up with 469 rows found! That was strange enough, but then I just happened to run the original line of code above but with LIMIT 5 instead and ended up with:
000005185884381 13
000017027256315 75
000010121767109 25
000013731352481 59
000024387407996 1
And, it may be hard to see because the idcol is so long, but idcol 000017027256315 ended up with a count of 531 when I did LIMIT 10 and just 75 when I did LIMIT 5.
What am I missing?! How can I get a correct count of just a small number of values so I can investigate further?!
BTW my first thought was to make the counting part a sub-query, but that didn't change a thing. I used:
SELECT * FROM (SELECT idcol, COUNT(*) FROM table WHERE month = 7 GROUP BY idcol) x LIMIT 10;
...same EXACT results
Most likely the counts are being computed from statistics. See here for the bug and the related discussion.
Try disabling the use of statistics:
hive.compute.query.using.stats = false
If this doesn't fix it, try running the ANALYZE command before the COUNT(*):
ANALYZE TABLE table_name PARTITION(month) COMPUTE STATISTICS;
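For illustration, the whole session might look like this (table and column names are the placeholders from the question; the SET affects only the current session):
-- stop Hive answering aggregates from stored statistics
SET hive.compute.query.using.stats=false;
-- optionally refresh the statistics for all partitions
ANALYZE TABLE table_name PARTITION(month) COMPUTE STATISTICS;
-- re-run the aggregate; the counts should now come from a real scan
SELECT idcol, COUNT(*) FROM table WHERE month = 7 GROUP BY idcol LIMIT 10;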

Split amount into multiple rows if amount>=$10M or <=$-10B

I have a table in an Oracle database which may contain amounts >= $10M or <= $-10B.
If the value is greater than or equal to $10M, I need to break it into one or more 99999999.99 chunks and also include the remainder.
If the value is less than or equal to $-10B, I need to break it into one or more 999999999.99 chunks and also include the remainder.
Your question is somewhat unreadable and you did not provide examples, but here is something for a start, which may help you or someone with a similar problem.
Let's say you have this data and you want to divide amounts into chunks not greater than 999:
id amount
-- ------
1 1500
2 800
3 2500
This query:
select id, amount,
case when level=floor(amount/999)+1 then mod(amount, 999) else 999 end chunk
from data
connect by level<=floor(amount/999)+1
and prior id = id and prior dbms_random.value is not null
...divides the amounts; the last row for each id contains the remainder. (The prior dbms_random.value is not null condition is the standard trick that stops Oracle from flagging prior id = id as a connect-by loop.) Output is:
ID AMOUNT CHUNK
------ ---------- ----------
1 1500 999
1 1500 501
2 800 800
3 2500 999
3 2500 999
3 2500 502
SQLFiddle demo
Edit: full query according to additional explanations:
select id, amount,
case
when amount>=0 and level=floor(amount/9999999.99)+1 then mod(amount, 9999999.99)
when amount>=0 then 9999999.99
when level=floor(-amount/999999999.99)+1 then -mod(-amount, 999999999.99)
else -999999999.99
end chunk
from data
connect by ((amount>=0 and level<=floor(amount/9999999.99)+1)
or (amount<0 and level<=floor(-amount/999999999.99)+1))
and prior id = id and prior dbms_random.value is not null
SQLFiddle
Please adjust numbers for positive and negative borders (9999999.99 and 999999999.99) according to your needs.
There are more possible solutions (recursive CTE query, PLSQL procedure, maybe others), this hierarchical query is one of them.
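For completeness, a minimal recursive CTE version of the simple 999-chunk example above (same data table, positive amounts assumed, Oracle 11gR2+):
with chunks (id, amount, remaining) as (
  select id, amount, amount from data
  union all
  select id, amount, remaining - 999
  from chunks
  where remaining > 999
)
select id, amount,
       case when remaining > 999 then 999 else remaining end chunk
from chunks
order by id, chunk desc;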

Oracle Moving Average without Current Row

Would like to calculate a moving average in Oracle over the last 30 records, excluding the current row.
select crd_nb, flng_month, curr_0380,
avg(curr_0380) over (partition by crd_nb order by flng_month ROWS 30 PRECEDING) mavg
from us_ordered
The above SQL calculates the moving average over the current row and the 30 preceding rows.
Is there any way to calculate the moving average excluding the current row?
select
crd_nb,
flng_month,
curr_0380,
avg(curr_0380) over (
partition by crd_nb
order by flng_month
ROWS between 30 PRECEDING and 1 preceding
) mavg
from us_ordered
#be_here_now's answer is clearly superior. I'm leaving mine in place nonetheless, as it's still functional, if needlessly complex.
One answer would be to get the sum and count individually, subtract out the current row, then divide the two results. It's a little ugly, but it should work:
SELECT crd_nb,
flng_month,
curr_0380,
( SUM (curr_0380)
OVER (PARTITION BY crd_nb
ORDER BY flng_month ROWS 30 PRECEDING)
- curr_0380)
/ ( COUNT (curr_0380)
OVER (PARTITION BY crd_nb
ORDER BY flng_month ROWS 30 PRECEDING)
- 1)
mavg
FROM us_ordered
If curr_0380 can be null, you'd have to tweak this slightly so that the current row is removed only if it's not null.
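For illustration, a sketch of that tweak (same us_ordered table assumed): COUNT(curr_0380) already ignores nulls, so only the subtraction of the current row has to be made conditional, and NULLIF guards against dividing by zero on the first row of each partition:
SELECT crd_nb,
       flng_month,
       curr_0380,
       ( SUM (curr_0380)
           OVER (PARTITION BY crd_nb
                 ORDER BY flng_month ROWS 30 PRECEDING)
         - NVL (curr_0380, 0))
       / NULLIF ( COUNT (curr_0380)
                    OVER (PARTITION BY crd_nb
                          ORDER BY flng_month ROWS 30 PRECEDING)
                  - CASE WHEN curr_0380 IS NOT NULL THEN 1 ELSE 0 END,
                  0)
         mavg
FROM us_ordered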

SQL / VBScript / Intelligent Algorithm to find sum combinations quickly

Am trying to list out all the possible sequential (continuous and forward direction only) sum combinations, within the same subject.
Listing out the row_id and the number of rows involved in the sum.
Sample:
Input (Source Table):
DLID Subject Total
1 Science 70
2 Science 70
3 Science 70
4 Science 70
5 Maths 80
6 Maths 80
7 English 90
8 English 90
9 English 90
10 Science 75
Expected Result :
ID Number of Rows Subject Total
1 1 Science 70
2 1 Science 70
3 1 Science 70
4 1 Science 70
5 1 Maths 80
6 1 Maths 80
7 1 English 90
8 1 English 90
9 1 English 90
10 1 Science 75
1 2 Science 140
2 2 Science 140
3 2 Science 140
5 2 Maths 160
7 2 English 180
8 2 English 180
1 3 Science 210
2 3 Science 210
7 3 English 270
1 4 Science 280
VBScript Code:
' myarray - reads the entire table from the Access database
' "i" is the total number of rows read
' "j" is used to access each row one by one
' "m" is the number of subsequent rows with the same subject that we are trying to check
' "n" is a counter to start from each row and check up to m - 1 rows for the same subject
' "k" is used to store the results into "resultarray"
' myarray(0,j) holds the row_id
' myarray(1,j) holds the subject
' myarray(2,j) holds the score
' myarray(3,j) to myarray(6,j) hold other details
' i is the total number of rows - around 80,000
' There can be continuous records from the same subject, as many as 700 - 800
' m is the number of rows matching / the number of rows leading to the sum
For m = 1 To 700
    For j = 0 To i - m
        matchcount = 1
        For n = 1 To m - 1
            If myarray(1, j) = myarray(1, j + n) Then
                matchcount = matchcount + 1
            Else
                Exit For
            End If
        Next
        If matchcount = m Then
            resultarray(2, k) = 0
            For o = 0 To m - 1
                resultarray(2, k) = CDbl(resultarray(2, k)) + CDbl(myarray(2, j + o))
            Next
            resultarray(1, k) = m
            resultarray(0, k) = myarray(0, j)
            resultarray(3, k) = myarray(3, j)
            resultarray(4, k) = myarray(4, j)
            resultarray(5, k) = myarray(1, j)
            resultarray(7, k) = myarray(5, j)
            resultarray(8, k) = myarray(6, j)
            resultarray(2, k) = Round(resultarray(2, k), 0)
            k = k + 1
            ReDim Preserve resultarray(8, k)
        End If
    Next
Next
The code works perfectly, but it is very slow.
Am dealing with 80,000 rows, and from 5 to 900 continuous rows of the same subject.
So the number of combinations comes to a few millions.
It takes a few hours for one set of 80,000 rows, and I have to do many sets daily.
Please suggest how to speed this up.
Better Algorithm / Code Improvements / Different Language to code
Please assist.
Here are the building blocks for a "real" Access (SQL) solution.
Observation #1
It seems to me that a good first step would be to add two Numeric (Long Integer) columns to the [SourceTable]:
[SubjectBlock] will number the "blocks" of rows where the subject is the same
[SubjectBlockSeq] will sequentially number the rows within each block
They both should be indexed (Duplicates OK). The code to populate these columns would be...
Public Sub UpdateBlocksAndSeqs()
Dim cdb As DAO.Database, rst As DAO.Recordset
Dim BlockNo As Long, SeqNo As Long, PrevSubject As String
Set cdb = CurrentDb
Set rst = cdb.OpenRecordset("SELECT * FROM [SourceTable] ORDER BY [DLID]", dbOpenDynaset)
PrevSubject = "(an impossible value)"
BlockNo = 0
SeqNo = 0
DBEngine.Workspaces(0).BeginTrans ''speeds up bulk updates
Do While Not rst.EOF
If rst!Subject <> PrevSubject Then
BlockNo = BlockNo + 1
SeqNo = 0
End If
SeqNo = SeqNo + 1
rst.Edit
rst!SubjectBlock = BlockNo
rst!SubjectBlockSeq = SeqNo
rst.Update
PrevSubject = rst!Subject
rst.MoveNext
Loop
DBEngine.Workspaces(0).CommitTrans
rst.Close
Set rst = Nothing
End Sub
...and the updated SourceTable would be...
DLID Subject Total SubjectBlock SubjectBlockSeq
1 Science 70 1 1
2 Science 60 1 2
3 Science 75 1 3
4 Science 70 1 4
5 Maths 80 2 1
6 Maths 90 2 2
7 English 90 3 1
8 English 80 3 2
9 English 70 3 3
10 Science 75 4 1
(Note that I tweaked your test data to make it easier to verify the results below.)
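(As an aside: on a DBMS with window functions, these same two columns fall out of the classic gaps-and-islands pattern in a single statement. This is not valid Access SQL - just a sketch for comparison:)
WITH marked AS (
    SELECT DLID, Subject, Total,
           CASE WHEN Subject = LAG(Subject) OVER (ORDER BY DLID)
                THEN 0 ELSE 1 END AS is_start
    FROM SourceTable
), blocks AS (
    SELECT DLID, Subject, Total,
           SUM(is_start) OVER (ORDER BY DLID) AS SubjectBlock
    FROM marked
)
SELECT DLID, Subject, Total, SubjectBlock,
       ROW_NUMBER() OVER (PARTITION BY SubjectBlock ORDER BY DLID) AS SubjectBlockSeq
FROM blocks;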
Now as we iterate through the ever-increasing "length of sequence to be included in the total" we can quickly identify the "blocks" that are of interest simply by using a query like...
SELECT SubjectBlock FROM SourceTable WHERE SubjectBlockSeq=3
...which will return...
1
3
...indicating that when calculating the totals for a "run of 3" we won't need to look at blocks 2 ("Maths") and 4 (the last "Science" one) at all.
Observation #2
The first time through, when NumRows=1, is a special case: it just copies the rows from [SourceTable] into the [Expected Results] table. We can save time by doing that with a single query:
INSERT INTO ExpectedResult ( DLID, NumRows, Subject, Total, SubjectBlock, NextSubjectBlockSeq )
SELECT SourceTable.DLID, 1 AS Expr1, SourceTable.Subject, SourceTable.Total,
SourceTable.SubjectBlock, SourceTable.SubjectBlockSeq+1 AS Expr2
FROM SourceTable;
You may notice that I have added two columns to the [ExpectedResult] table: [SubjectBlock] (as before) and [NextSubjectBlockSeq] (which is just [SubjectBlockSeq]+1). Again, they should both be indexed, allowing duplicates. We'll use them below.
Observation #3
As we continue looking for longer and longer "runs" to sum, each run is really just an earlier (shorter) run with an additional row tacked onto the end. If we write our results to the [ExpectedResults] table as we go along, we can re-use those values and not bother going back and adding up the individual values for the entire run.
When NumRows=2, the "add-on" rows are the ones where SubjectBlockSeq>=2...
SELECT SourceTable.*
FROM SourceTable
WHERE (((SourceTable.SubjectBlockSeq)>=2))
ORDER BY SourceTable.DLID;
...that is...
DLID Subject Total SubjectBlock SubjectBlockSeq
2 Science 60 1 2
3 Science 75 1 3
4 Science 70 1 4
6 Maths 90 2 2
8 English 80 3 2
9 English 70 3 3
...and the [ExpectedResult] rows with the "earlier (shorter) run" onto which we will be "tacking" the additional row are the ones
from the same [SubjectBlock],
with [NumRows]=1, and
with [ExpectedResult].[NextSubjectBlockSeq] = [SourceTable].[SubjectBlockSeq]
so we can get the new totals and append them to [ExpectedResult] like this
INSERT INTO ExpectedResult ( DLID, NumRows, Subject, Total, SubjectBlock, NextSubjectBlockSeq )
SELECT SourceTable.DLID, 2 AS Expr1, SourceTable.Subject,
[ExpectedResult].[Total]+[SourceTable].[Total] AS NewTotal,
SourceTable.SubjectBlock, [SourceTable].[SubjectBlockSeq]+1 AS Expr2
FROM SourceTable INNER JOIN ExpectedResult
ON (SourceTable.SubjectBlockSeq = ExpectedResult.NextSubjectBlockSeq)
AND (SourceTable.SubjectBlock = ExpectedResult.SubjectBlock)
WHERE (((SourceTable.SubjectBlockSeq)>=2) AND (ExpectedResult.NumRows=1));
The rows appended to [ExpectedResult] are
DLID NumRows Subject Total SubjectBlock NextSubjectBlockSeq
2 2 Science 130 1 3
3 2 Science 135 1 4
4 2 Science 145 1 5
6 2 Maths 170 2 3
8 2 English 170 3 3
9 2 English 150 3 4
Now we're cookin'...
Using the same logic as before, we can now process for NumRows=3. The only differences are that we will be inserting the value 3 into NumRows, and our selection criteria will be
WHERE (((SourceTable.SubjectBlockSeq)>=3) AND (ExpectedResult.NumRows=2))
The complete query is
INSERT INTO ExpectedResult ( DLID, NumRows, Subject, Total, SubjectBlock, NextSubjectBlockSeq )
SELECT SourceTable.DLID, 3 AS Expr1, SourceTable.Subject,
[ExpectedResult].[Total]+[SourceTable].[Total] AS NewTotal,
SourceTable.SubjectBlock, [SourceTable].[SubjectBlockSeq]+1 AS Expr2
FROM SourceTable INNER JOIN ExpectedResult
ON (SourceTable.SubjectBlockSeq = ExpectedResult.NextSubjectBlockSeq)
AND (SourceTable.SubjectBlock = ExpectedResult.SubjectBlock)
WHERE (((SourceTable.SubjectBlockSeq)>=3) AND (ExpectedResult.NumRows=2));
and the rows appended to [ExpectedResult] are
DLID NumRows Subject Total SubjectBlock NextSubjectBlockSeq
3 3 Science 205 1 4
4 3 Science 205 1 5
9 3 English 240 3 4
Parameterization
Since each successive query is so similar, it would be awfully nice if we could just write it once and use it repeatedly. Fortunately, we can, if we turn it into a "Parameter Query":
PARAMETERS TargetNumRows Long;
INSERT INTO ExpectedResult ( DLID, NumRows, Subject, Total, SubjectBlock, NextSubjectBlockSeq )
SELECT SourceTable.DLID, [TargetNumRows] AS Expr1, SourceTable.Subject,
[ExpectedResult].[Total]+[SourceTable].[Total] AS NewTotal,
SourceTable.SubjectBlock, [SourceTable].[SubjectBlockSeq]+1 AS Expr2
FROM SourceTable INNER JOIN ExpectedResult
ON (SourceTable.SubjectBlock = ExpectedResult.SubjectBlock)
AND (SourceTable.SubjectBlockSeq = ExpectedResult.NextSubjectBlockSeq)
WHERE (((SourceTable.SubjectBlockSeq)>=[TargetNumRows])
AND ((ExpectedResult.NumRows)=[TargetNumRows]-1));
Create a new Access query, paste the above into the SQL pane, and then save it as pq_appendToExpectedResult. (The "pq_" is just a visual cue that it's a Parameter Query.)
Invoking a Parameter Query from VBA
You can invoke (execute) a Parameter Query in VBA via a QueryDef object:
Dim cdb As DAO.Database, qdf As DAO.QueryDef
Set cdb = CurrentDb
Set qdf = cdb.QueryDefs("pq_appendToExpectedResult")
qdf!TargetNumRows = 4 '' parameter value
qdf.Execute
Set qdf = Nothing
When to stop
Now you can see that it's simply a matter of incrementing NumRows and re-running the Parameter Query, but when to stop? That's easy:
After incrementing your NumRows variable in VBA, test
DCount("DLID", "SourceTable", "SubjectBlockSeq=" & NumRows)
If it comes back 0 then you're done.
Show me (all) the code
Sorry, not right away. ;) Play around with this and let us know how it goes.
Your question is:
"Please suggest how to speed this up. Better Algorithm / Code Improvements / Different Language to code Please assist."
I can answer part of your question quickly: "Different Language to code" == SQL.
In detail:
Whatever it is you're trying to achieve looks dataset intensive. I'm almost certain this processing would be handled more efficiently within the DBMS that houses your data, as the DBMS is able to take a (reasonably well written) SQL query and optimise it based on its own knowledge of the data you are interrogating, and perform aggregation over large sets/sub-sets of data very quickly and efficiently.
Iterating over large datasets row-by-row to accumulate values is rarely (dare I say never) going to yield acceptable performance. Which is why DBMSes don't do this natively (if you don't force them to by using iterative code, or code that needs to investigate each row, such as your VB code).
Now, for the implementation of Better Algorithm / Code Improvements / Different Language.
I've done this in SQL, but regardless of whether you use my solution or not, I would still highly recommend you migrate your data to e.g. MS SQL Server, Oracle, MySQL, etc. if you find that your use of MS Access is binding you to iterative approaches (which is not to suggest it is doing that... I don't know if this is the case or not).
But if this is genuinely not homework, and/or you are genuinely tied to MS Access, then perhaps an investment of effort to convert this to MS Access might be fruitful in terms of performance. The principles should all be the same - it's a relational database and this is all fairly standard SQL, so I would've thought there'd be Access equivalents for what I've done here.
Failing that, you should be able to "point" an MSSQL instance at the MS Access file, as a linked server via an Access provider. If you'd like advice on this, let me know.
There's some code here that is procedural by nature, in order to set up some "helper" tables that will allow the heavy-lifting aggregation on your sequences to be done using set-based operations.
I've called the source table "Your_Source_Table". Do a search-replace on all instances to rename as whatever you've called it.
Note also that I haven't set up indexes on anything... you should do this. Indexes should be created for all the columns involved in joins, I expect. Checking the execution plan to ensure there's no unnecessary table scans would be wise.
I used the following to create Your_Source_Table:
-- Create Your apparent table structure
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[Your_Source_Table]') AND type in (N'U'))
DROP TABLE [dbo].[Your_Source_Table]
GO
CREATE TABLE [dbo].[Your_Source_Table](
[DLID] [int] NOT NULL,
[Subject] [nchar](10) NOT NULL,
[Total] [int] NOT NULL
) ON [PRIMARY]
GO
And populated it as:
DLID Subject Total
----------- ---------- -----------
1 Science 70
2 Science 70
3 Science 70
4 Science 70
5 Maths 80
6 Maths 80
7 English 90
8 English 90
9 English 90
10 Science 75
Then, I created the following "helpers". Explanations in code.
-- Set up helper structures.
-- Build a number table
if object_id('tempdb..##numbers') is not null
BEGIN DROP TABLE ##numbers END
SELECT TOP 10000 IDENTITY(int,1,1) AS Number -- Can be 700, 800, or 900 contiguous rows, depending on which comment I read. So I'll run with 10000 to be sure :-)
INTO ##numbers
FROM sys.objects s1
CROSS JOIN sys.objects s2
ALTER TABLE ##numbers ADD CONSTRAINT PK_numbers PRIMARY KEY CLUSTERED (Number)
-- Determine where each block starts.
if object_id('tempdb..#tempGroups') is not null
BEGIN DROP TABLE #tempGroups END
GO
CREATE TABLE #tempGroups (
[ID] [int] NOT NULL IDENTITY,
[StartID] [int] NULL,
[Subject] [nchar](10) NULL
) ON [PRIMARY]
INSERT INTO #tempGroups
SELECT t.DLID, t.Subject FROM Your_Source_Table t WHERE DLID=1
UNION
SELECT
t.DLID, t.Subject
FROM
Your_Source_Table t
INNER JOIN Your_Source_Table t2 ON t.DLID = t2.DLID+1 AND t.subject != t2.subject
-- Determine where each block ends
if object_id('tempdb..##groups') is not null
BEGIN DROP TABLE ##groups END
CREATE TABLE ##groups (
[ID] [int] NOT NULL,
[Subject] [nchar](10) NOT NULL,
[StartID] [int] NOT NULL,
[EndID] [int] NOT NULL
) ON [PRIMARY]
INSERT INTO ##groups
SELECT
g1.id as ID,
g1.subject,
g1.startID as startID,
CASE
WHEN g2.id is not null THEN g2.startID-1
ELSE (SELECT max(dlid) FROM Your_Source_Table) -- Boundary case when there is no following group (ie return the last row)
END as endID
FROM
#tempGroups g1
LEFT JOIN #tempGroups g2 ON g1.id = g2.id-1
DROP TABLE #tempGroups;
GO
-- We now have a helper table called ##groups, that identifies the subject, start DLID and end DLID of each continuous block of a particular subject in your dataset.
-- So now, we can build up the possible sequences within each group, by joining to a number table.
if object_id('tempdb..##sequences') is not null
BEGIN DROP TABLE ##sequences END
CREATE TABLE ##sequences (
[seqID] [int] NOT NULL IDENTITY,
[groupID] [int] NOT NULL,
[start_of_sequence] [int] NOT NULL,
[end_of_sequence] [int] NOT NULL
) ON [PRIMARY]
INSERT INTO ##sequences
SELECT
g.id,
ns.number start_of_sequence,
ne.number end_of_sequence
FROM
##groups g
INNER JOIN ##numbers ns
ON ns.number <= (g.endid - g.startid + 1) -- number is in range for this block
INNER JOIN ##numbers ne
ON ne.number <= (g.endid - g.startid + 1) -- number is in range for this block
and ne.number >= ns.number -- end after start
ORDER BY
1,2,3
Then, the results you're after can be achieved with a single set-based operation:
-- By joining groups to your dataset we can add a group identity to each record.
-- By joining sequences we can generate copies of the rows for aggregation into each sequence.
select
min(t.dlid) as ID, -- equals (s.start_of_sequence + g.startid - 1) (sequence positions offset by group start position)
count(t.dlid) as number_of_rows,
g.subject,
sum(t.total) as total
--select *
from
Your_Source_Table t
inner join ##groups g
on t.dlid >= g.startid and t.dlid <= g.endid -- grouping rows into each group.
inner join ##sequences s
on s.groupid = g.id -- get the sequences for this group.
and t.dlid >= (s.start_of_sequence + g.startid - 1) -- include the rows required for this sequence (sequence positions offset by group start position)
and t.dlid <= (s.end_of_sequence + g.startid - 1)
group by
g.subject,
s.seqid
order by 2, 1
BUT NOTE:
This result is NOT exactly the same as your "Expected Result".
You've incorrectly included a duplicate instance of the 1 row sequence starting at row 1 (for science, sum total 1*70=70), but not included the 4 row sequence starting at row 1 (for science, sum total 4*70 = 280).
The correct results, IMHO are:
ID number_of_rows subject total
----------- -------------- ---------- -----------
1 1 Science 70 <-- You've got this row twice.
2 1 Science 70
3 1 Science 70
4 1 Science 70
5 1 Maths 80
6 1 Maths 80
7 1 English 90
8 1 English 90
9 1 English 90
10 1 Science 75
1 2 Science 140
2 2 Science 140
3 2 Science 140
5 2 Maths 160
7 2 English 180
8 2 English 180
1 3 Science 210
2 3 Science 210
7 3 English 270
1 4 Science 280 <-- You don't have this row.
(20 row(s) affected)
