MDX COUNT ON FILTERED SET - filter

I have an employee dimension (22k rows) with the attributes [country] and [region]. My fact table [LastAccess] is simple: it just logs employee access, and the measure [RowCount] is a row count. I want to count how many employees logged in to the site twice [Two Visits], five times [Five Visits] and more than five times [More Visits] during a period of time, grouped by region and country. In SQL, I used a temp table to count the visits for each employee, then used a query like this to display the counts:
SELECT ED.[BusinessRegion], ED.[CountryCode],
SUM(IIF(EVPM.[Visits]=2,1,0)) AS [TwoVisits],
SUM(IIF(EVPM.[Visits]=5,1,0)) AS [FiveVisits],
SUM(IIF(EVPM.[Visits]>5,1,0)) AS [MoreVisits]
FROM #EmployeeVisits EVPM
INNER JOIN [dbo].[tblEmployeeData] ED ON EVPM.[AltEmployeeId] = ED.[AltEmployeeId]
GROUP BY ED.[BusinessRegion], ED.[CountryCode]
The result should be like this:
Region Country TwoVisits FiveVisits MoreVisits
1 US 1261 1054 913
2 IN 1829 1576 1441
3 GB 344 281 237
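A minimal sketch of the two-step SQL approach described above, run against an in-memory SQLite database from Python so it can be executed as-is. The table and column names follow the question, but the sample rows are made up, and CASE WHEN stands in for T-SQL's IIF so the query also runs on SQLite.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE tblEmployeeData (AltEmployeeId INTEGER, BusinessRegion INTEGER, CountryCode TEXT);
    CREATE TABLE LastAccess (AltEmployeeId INTEGER);  -- hypothetical: one row per login
""")
con.executemany("INSERT INTO tblEmployeeData VALUES (?, ?, ?)",
                [(1, 1, "US"), (2, 1, "US"), (3, 2, "IN"), (4, 2, "IN"), (5, 3, "GB")])
logins = [1, 1, 2, 2, 2, 2, 2, 3, 3, 4, 4, 4, 4, 4, 4, 5, 5]  # made-up sample data
con.executemany("INSERT INTO LastAccess VALUES (?)", [(e,) for e in logins])

# Step 1: visits per employee (the #EmployeeVisits temp table in the question)
con.execute("""
    CREATE TEMP TABLE EmployeeVisits AS
    SELECT AltEmployeeId, COUNT(*) AS Visits FROM LastAccess GROUP BY AltEmployeeId
""")

# Step 2: count employees per visit bucket, grouped by region and country
result = con.execute("""
    SELECT ED.BusinessRegion, ED.CountryCode,
           SUM(CASE WHEN EV.Visits = 2 THEN 1 ELSE 0 END) AS TwoVisits,
           SUM(CASE WHEN EV.Visits = 5 THEN 1 ELSE 0 END) AS FiveVisits,
           SUM(CASE WHEN EV.Visits > 5 THEN 1 ELSE 0 END) AS MoreVisits
    FROM EmployeeVisits EV
    JOIN tblEmployeeData ED ON EV.AltEmployeeId = ED.AltEmployeeId
    GROUP BY ED.BusinessRegion, ED.CountryCode
    ORDER BY ED.BusinessRegion
""").fetchall()

for row in result:
    print(*row)
```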
I came up with a query like this:
WITH
MEMBER [Measures].[twoVisits] AS
case when ([Measures].[Last Access 1 Count]) =2
then 1
else 0
end
MEMBER [Measures].[twoVisitsCount] AS
SUM([Employee].[Alt Employee Id].[Alt Employee Id].Members, [Measures].[twoVisits])
SELECT
{[Measures].[twoVisitsCount]}ON 0,
NON EMPTY ({[Employee].[Business Region].&[1],[Employee].[Business Region].&[2],[Employee].[Business Region].&[3]},
{[Employee].[Country Code].[Country Code].Members}) ON 1
FROM
(
select(
{[Employee].[Status Id].&[1],[Employee].[Status Id].&[3],[Employee].[Status Id].&[4]},
{[Employee].[Employee Type].&[13],[Employee].[Employee Type].&[29],
[Employee].[Employee Type].&[9],[Employee].[Employee Type].&[25],
[Employee].[Employee Type].&[5],[Employee].[Employee Type].&[1],
[Employee].[Employee Type].&[14],[Employee].[Employee Type].&[30],
[Employee].[Employee Type].&[10],[Employee].[Employee Type].&[26],
[Employee].[Employee Type].&[6],[Employee].[Employee Type].&[2]},
{[Date].[Date Key].&[20160801]:[Date].[Date Key].&[20160803]}
) on 0
FROM [OLAP Prep]
)
But after running for a long time, it returned this:
twoVisitsCount
1 CA 36651
1 CL 36651
1 CO 36651
1 CR 36651
1 DO 36651
1 EC 36651
1 GT 36651
1 HN 36651
1 KY 36651
2 AU 36651
2 BD 36651
2 CN 36651
2 HK 36651
2 ID 36651
2 IN 36651
2 JP 36651
2 KR 36651
3 99 36651
3 AE 36651
3 AL 36651
All the breakdown counts are the same.

Try changing this calculation as follows.
MEMBER [Measures].[twoVisitsCount] AS
SUM(Existing [Employee].[Alt Employee Id].[Alt Employee Id].Members, [Measures].[twoVisits])
Your calculation loops across all employees in the dimension, so every row gets the same total. Adding the keyword Existing restricts the loop to the employees in the current row's business region and country.
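To make the difference concrete, here is a small Python model of the two calculations. It is plain Python, not MDX, and the employees and logins are hypothetical. Without Existing, every employee in the dimension is summed regardless of the row's region and country, so each row gets the same grand total (the symptom in the question); with Existing, only the employees belonging to the current row are summed.

```python
from collections import Counter

# Hypothetical sample: employee id -> (business region, country code)
EMPLOYEES = {1: (1, "US"), 2: (1, "US"), 3: (2, "IN"), 4: (2, "IN"), 5: (3, "GB")}
LOGINS = [1, 1, 2, 3, 3, 4, 4, 4, 4, 4, 5, 5]  # one entry per fact row
VISITS = Counter(LOGINS)  # plays the role of [Measures].[Last Access 1 Count]

def two_visits(emp):
    # [Measures].[twoVisits]: 1 if the employee has exactly two visits, else 0
    return 1 if VISITS[emp] == 2 else 0

def two_visits_count_all(region, country):
    # Like SUM([Alt Employee Id].[Alt Employee Id].Members, ...):
    # every employee is visited, so region/country on the row axis have no effect.
    return sum(two_visits(emp) for emp in EMPLOYEES)

def two_visits_count_existing(region, country):
    # Like SUM(Existing [Alt Employee Id].[Alt Employee Id].Members, ...):
    # only employees that exist with the current row's region/country are visited.
    return sum(two_visits(emp) for emp, loc in EMPLOYEES.items()
               if loc == (region, country))

for region, country in [(1, "US"), (2, "IN"), (3, "GB")]:
    print(region, country,
          two_visits_count_all(region, country),
          two_visits_count_existing(region, country))
```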


Get Total count

I want to merge two columns (Sender and Receiver), get the count of each transaction Type, and then join another table on the Sender/Receiver number as the key.
Sender Receiver Type Amount Date
773787639 777611388 1 300 2/1/2019
773631898 776806843 4 450 8/20/2019
773761571 777019819 6 369 2/11/2019
774295511 777084440 34 1000 1/22/2019
774263079 776816905 45 678 6/27/2019
774386894 777202863 12 2678 2/10/2019
773671537 777545555 14 38934 9/29/2019
774288117 777035194 18 21 4/22/2019
774242382 777132939 21 1275 9/30/2019
774144715 777049859 30 6309 7/4/2019
773911674 776938987 10 3528 5/1/2019
773397863 777548054 15 35892 7/6/2019
776816905 772345091 6 1234 7/7/2019
777035194 775623065 4 453454 7/20/2019
Second Table
Mobile_number Age
773787639 34
773787632 23
774288117 65
I am trying to get a table like this:
Sender/Receiver Type_1 Type_4 Type_12...... Type_45 Age
773787639 3 2 0 0 23
773631898 1 0 1 2 56
773397863 2 2 0 0 65
772345091 1 1 0 3 32
OK, I have seen your previous question; you just need an inner join in the subquery, as follows:
SELECT
SenderReceiver,
COUNT(CASE WHEN Type = 1 THEN 1 END) AS Type_1,
COUNT(CASE WHEN Type = 2 THEN 1 END) AS Type_2,
COUNT(CASE WHEN Type = 3 THEN 1 END) AS Type_3,
...
COUNT(CASE WHEN Type = 45 THEN 1 END) AS Type_45,
Age -- changes here
FROM
( SELECT sr.SenderReceiver, sr.Type, st.Age from -- changes here
(SELECT Sender AS SenderReceiver, Type FROM yourTable
UNION ALL
SELECT Receiver, Type FROM yourTable) sr
join <second_table> st on st.Mobile_number = sr.SenderReceiver -- changes here
) t
GROUP BY
SenderReceiver,
Age; -- changes here
Changes to your previous query are marked with the comment -- changes here.
Please replace <second_table> with the actual name of your second table.
Cheers!!
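A quick way to sanity-check the query is to run it on a few of the question's rows in an in-memory SQLite database from Python. Only Type_1, Type_4 and Type_18 are spelled out here (the full query needs one COUNT(CASE ...) per type), and the Amount and Date columns are left out because the query does not use them. Note that, because of the inner join, a number with no row in the second table (777035194 below) is dropped; a LEFT JOIN would keep it with a NULL Age.

```python
import sqlite3

def type_counts(transactions, ages):
    """Run the answer's UNION ALL + JOIN + conditional COUNT query on sample data."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE yourTable (Sender INTEGER, Receiver INTEGER, Type INTEGER)")
    con.execute("CREATE TABLE second_table (Mobile_number INTEGER, Age INTEGER)")
    con.executemany("INSERT INTO yourTable VALUES (?, ?, ?)", transactions)
    con.executemany("INSERT INTO second_table VALUES (?, ?)", ages)
    query = """
        SELECT SenderReceiver,
               COUNT(CASE WHEN Type = 1 THEN 1 END)  AS Type_1,
               COUNT(CASE WHEN Type = 4 THEN 1 END)  AS Type_4,
               COUNT(CASE WHEN Type = 18 THEN 1 END) AS Type_18,
               Age
        FROM (SELECT sr.SenderReceiver, sr.Type, st.Age
              FROM (SELECT Sender AS SenderReceiver, Type FROM yourTable
                    UNION ALL
                    SELECT Receiver, Type FROM yourTable) sr
              JOIN second_table st ON st.Mobile_number = sr.SenderReceiver) t
        GROUP BY SenderReceiver, Age
        ORDER BY SenderReceiver
    """
    rows = con.execute(query).fetchall()
    con.close()
    return rows

# A subset of the question's rows: (Sender, Receiver, Type)
TRANSACTIONS = [(773787639, 777611388, 1), (773631898, 776806843, 4),
                (774288117, 777035194, 18), (777035194, 775623065, 4)]
AGES = [(773787639, 34), (773787632, 23), (774288117, 65)]

result = type_counts(TRANSACTIONS, AGES)
print(result)
```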

Complex Networks in Hive - Optimization Code

I need help optimizing my Hive code.
I have a huge table as follows:
Customer_id Product_id Date Value
1 1 02/28 100.0
1 2 02/02 120.0
1 3 02/10 144.0
2 2 02/15 120.0
2 3 02/28 144.0
... ... ... ...
I want to create a complex network where I link the products through the buyers. The graph can be undirected, and I need to count the number of links between each pair of products.
In the end I need this:
Product_x Product_y amount
1 2 1
1 3 1
2 3 2
Can anyone help me with this?
I need an optimized way to do this. Joining the table with itself is not the solution; I really need an optimal way to do this =/
CREATE TABLE X AS
SELECT
a.product_id AS product_x,
b.product_id AS product_y,
count(*) AS amount
FROM your_table a -- replace your_table with the name of your purchases table
JOIN your_table b
ON a.customer_id = b.customer_id
WHERE a.product_id < b.product_id
GROUP BY a.product_id, b.product_id;
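To see what the self-join computes, here is the same pair counting done per customer in plain Python on the question's sample rows. It illustrates the semantics rather than being a Hive optimization. Each customer contributes one link for every unordered pair of products they bought, so the work per customer grows roughly with the square of their number of products, which is also what the join produces before the GROUP BY. One difference to be aware of: the sketch deduplicates products per customer, which corresponds to count(DISTINCT a.customer_id) rather than count(*) when a customer buys the same product more than once.

```python
from collections import Counter, defaultdict
from itertools import combinations

# The question's sample rows: (customer_id, product_id, date, value)
PURCHASES = [(1, 1, "02/28", 100.0), (1, 2, "02/02", 120.0), (1, 3, "02/10", 144.0),
             (2, 2, "02/15", 120.0), (2, 3, "02/28", 144.0)]

def product_links(purchases):
    """Count, for each unordered product pair (x < y), the customers who bought both."""
    products_by_customer = defaultdict(set)
    for customer_id, product_id, _date, _value in purchases:
        products_by_customer[customer_id].add(product_id)
    links = Counter()
    for products in products_by_customer.values():
        # sorted() + combinations() emits each pair once with x < y,
        # like the WHERE a.product_id < b.product_id filter in the join
        for x, y in combinations(sorted(products), 2):
            links[(x, y)] += 1
    return links

for (x, y), amount in sorted(product_links(PURCHASES).items()):
    print(x, y, amount)
```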

Oracle reduce result set on field duplication

I have the following result set from a SELECT in Oracle (12c):
GROUP_ID NAME ORDERING
1 AA 0
1 AA 1
1 AB 2
1 AC 3
2 BA 1
2 BA 2
2 BB 3
2 BC 4
I do not know how I could reduce the result set to remove rows based on one column while keeping the other fields. The expected outcome looks like the following:
GROUP_ID NAME ORDERING
1 AA 1
1 AB 2
1 AC 3
2 BA 2
2 BB 3
2 BC 4
I tried to solve it with GROUP BY, but that got rid of the ORDERING field I need. I am not an expert on window functions, but I think using one could be a valid approach.
From your data, it seems that you only need:
select group_id, name, max(ordering) as ordering
from yourTable
group by group_id, name
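A quick check of this query on the question's data, using Python's built-in sqlite3 module as a stand-in for Oracle (the GROUP BY / MAX syntax is the same here). It keeps, for each (group_id, name) pair, the highest ordering value, which matches the expected outcome. If you later need other columns from that same row, a window function such as ROW_NUMBER() would typically be the next step.

```python
import sqlite3

# The question's result set: (group_id, name, ordering)
ROWS = [(1, "AA", 0), (1, "AA", 1), (1, "AB", 2), (1, "AC", 3),
        (2, "BA", 1), (2, "BA", 2), (2, "BB", 3), (2, "BC", 4)]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE yourTable (group_id INTEGER, name TEXT, ordering INTEGER)")
con.executemany("INSERT INTO yourTable VALUES (?, ?, ?)", ROWS)

# One row per (group_id, name), keeping the highest ordering value
reduced = con.execute("""
    SELECT group_id, name, MAX(ordering) AS ordering
    FROM yourTable
    GROUP BY group_id, name
    ORDER BY group_id, name
""").fetchall()

for row in reduced:
    print(*row)
```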

Using loop to insert values from one table to another

I have the following table named screening_plan:
plan_id movie_id plan_start_day plan_end_day plan_min_start_hh24 plan_max_start_hh24 screenings
1 1 1/06/2015 28/06/2015 9 17 5
2 2 1/06/2015 28/06/2015 9 22 4
3 3 1/06/2015 28/06/2015 9 22 5
4 4 1/06/2015 28/06/2015 9 17 4
And another table, theatre:
THEATRE_ID THEATRE_DESCRIPTION THEATRE_TOTAL_ROWS
1 2
2 2
3 3
4 2
There is a total of 18 screenings per day. I have to insert the details in the screening table as follows:
screening_id plan_id theatre_id screening_date screening_start_hh24 screening_start_mm60
1 1 3 1/06/2015 9 0
2 1 3 1/06/2015 11 30
3 1 3 1/06/2015 14 0
4 1/06/2015
In screening, plan_id is a foreign key referencing screening_plan, and theatre_id is a foreign key referencing theatre.
Each movie should be screened the number of times defined in the screenings column of screening_plan.
There is a 30-minute break between two consecutive screenings in the same theatre.
The screening_start_hh24 should be less than plan_max_start_hh24.
Please note that the screenings for the first movie won't all fit into the provided time interval, so the second screening should be held in an alternate theatre (preferably theatre_id=2, starting at 11:30).
Each movie is 2 hours long.
I have been stuck on this since yesterday. I tried an IF-ELSE block, but that requires spelling out every condition. How can I do this with a loop? Please help.
My code (I have skipped the declarations here):
BEGIN
SELECT plan_id INTO s_plan_id FROM screening_plan WHERE plan_id=1;
SELECT theatre_id INTO s_theatre_id FROM theatre WHERE theatre_id=1;
SELECT PLAN_START_DATE INTO s_screening_date FROM screening_plan WHERE plan_id=1;
SELECT Count(*) INTO s_count_theatre_id FROM screening;
IF s_count_theatre_id = 0
THEN
s_screening_start_hh24:=9;
s_screening_start_mm60:=0;
ELSIF s_count_theatre_id >0 AND s_count_theatre_id <=4
THEN
s_screening_start_hh24:=11 ;
s_screening_start_mm60:=30 ;
ELSE
Dbms_Output.put_line('---');
END IF;
INSERT INTO screening (plan_id, theatre_id, screening_date, screening_start_hh24, screening_start_mm60)
VALUES( s_plan_id,
s_theatre_id,
s_screening_date,
s_screening_start_hh24,
s_screening_start_mm60);
END;
There should be a total of 18 records in the table screening: 5 for movie_id=1, 4 for movie_id=2, 5 for movie_id=3 and 4 for movie_id=4.
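The loop structure the question asks about can be sketched as follows. This is a greedy version in Python rather than PL/SQL, for a single day, assuming each slot takes 150 minutes (a 2-hour movie plus the 30-minute break), that any theatre can show any movie, and that plans are handled in plan_id order, filling one theatre's free slots before spilling into the next. PLANS, THEATRES and schedule_day are hypothetical names, and the sketch does not try to honour the "preferably theatre_id=2 from 11:30" preference. In PL/SQL, the same shape would be a cursor FOR loop over screening_plan with an inner loop per theatre.

```python
# Hypothetical in-memory copy of screening_plan:
# (plan_id, plan_min_start_hh24, plan_max_start_hh24, screenings)
PLANS = [(1, 9, 17, 5), (2, 9, 22, 4), (3, 9, 22, 5), (4, 9, 17, 4)]
THEATRES = [1, 2, 3, 4]
SLOT_MINUTES = 120 + 30  # 2-hour movie plus the 30-minute break

def schedule_day(plans, theatres, day_start_hh=9):
    """Greedy single-day schedule: returns (plan_id, theatre_id, hh24, mm60) rows."""
    next_free = {t: day_start_hh * 60 for t in theatres}  # minutes after midnight
    rows = []
    for plan_id, min_hh, max_hh, screenings in plans:
        remaining = screenings
        for theatre_id in theatres:
            if remaining == 0:
                break
            start = max(next_free[theatre_id], min_hh * 60)
            used = False
            # Keep filling this theatre while the start hour is below the plan's maximum
            while remaining > 0 and start // 60 < max_hh:
                rows.append((plan_id, theatre_id, start // 60, start % 60))
                remaining -= 1
                start += SLOT_MINUTES
                used = True
            if used:
                next_free[theatre_id] = start
        if remaining:
            raise ValueError(f"plan {plan_id}: {remaining} screening(s) do not fit")
    return rows

for row in schedule_day(PLANS, THEATRES):
    print(*row)
```

For the sample plans this yields 18 rows (5, 4, 5 and 4 per plan); wrapping the call in a loop over the dates from plan_start_day to plan_end_day would cover the whole period.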

SQL / VBScript / Intelligent Algorithm to find sum combinations quickly

I am trying to list all possible sequential (contiguous, forward-direction-only) sum combinations within the same subject,
listing the row_id and the number of rows involved in each sum.
Sample
Input (source table):
DLID Subject Total
1 Science 70
2 Science 70
3 Science 70
4 Science 70
5 Maths 80
6 Maths 80
7 English 90
8 English 90
9 English 90
10 Science 75
Expected result:
ID Number of Rows Subject Total
1 1 Science 70
2 1 Science 70
3 1 Science 70
4 1 Science 70
5 1 Maths 80
6 1 Maths 80
7 1 English 90
8 1 English 90
9 1 English 90
10 1 Science 75
1 2 Science 140
2 2 Science 140
3 2 Science 140
5 2 Maths 160
7 2 English 180
8 2 English 180
1 3 Science 210
2 3 Science 210
7 3 English 270
1 4 Science 280
VBScript code:
' myarray - reads the entire table from the Access database
' "i" is the total number of rows read
' "j" is used to access each row one by one
' "m" is the number of subsequent rows with the same subject that we are trying to check
' "n" is a counter to start from each row and check up to m - 1 rows for the same subject
' "k" is used to store the results in "resultarray"
' myarray(0,j) holds the row_id
' myarray(1,j) holds the subject
' myarray(2,j) holds the score
' myarray(3,j) to myarray(6,j) hold other details
' i is the total number of rows, around 80,000
' There can be as many as 700 - 800 continuous records with the same subject
' m is the number of matching rows, i.e. the number of rows leading to the sum
For m = 1 to 700
For j = 0 to i-m
matchcount = 1
For n = 1 to m-1
if myarray(1,j) = myarray (1,j+n) Then
matchcount = matchcount + 1
Else
Exit For
End If
Next
If matchcount = m Then
resultarray(2,k) = 0
For o = 0 to m - 1
resultarray(2,k) = CDbl(resultarray(2,k)) + CDbl (myarray (2,j+o))
resultarray(1,k) = m
resultarray(0,k) = ( myarray (0,j) )
resultarray(3,k) = ( myarray (3,j) )
resultarray(4,k) = ( myarray (4,j) )
resultarray(5,k) = ( myarray (1,j) )
resultarray(7,k) = ( myarray (5,j) )
resultarray(8,k) = ( myarray (6,j) )
Next
resultarray(2,k) = round(resultarray(2,k),0)
k = k + 1
ReDim Preserve resultarray(8,k)
End If
Next
Next
The code works correctly, but it is very slow.
I am dealing with 80,000 rows, with anywhere from 5 to 900 continuous rows of the same subject,
so the number of combinations runs into a few million.
One set of 80,000 rows takes a few hours, and I have to process many sets daily.
Please suggest how to speed this up.
Better Algorithm / Code Improvements / Different Language to code
Please assist.
Here are the building blocks for a "real" Access (SQL) solution.
Observation #1
It seems to me that a good first step would be to add two Numeric (Long Integer) columns to the [SourceTable]:
[SubjectBlock] will number the "blocks" of rows where the subject is the same
[SubjectBlockSeq] will sequentially number the rows within each block
They both should be indexed (Duplicates OK). The code to populate these columns would be...
Public Sub UpdateBlocksAndSeqs()
Dim cdb As DAO.Database, rst As DAO.Recordset
Dim BlockNo As Long, SeqNo As Long, PrevSubject As String
Set cdb = CurrentDb
Set rst = cdb.OpenRecordset("SELECT * FROM [SourceTable] ORDER BY [DLID]", dbOpenDynaset)
PrevSubject = "(an impossible value)"
BlockNo = 0
SeqNo = 0
DBEngine.Workspaces(0).BeginTrans ''speeds up bulk updates
Do While Not rst.EOF
If rst!Subject <> PrevSubject Then
BlockNo = BlockNo + 1
SeqNo = 0
End If
SeqNo = SeqNo + 1
rst.Edit
rst!SubjectBlock = BlockNo
rst!SubjectBlockSeq = SeqNo
rst.Update
PrevSubject = rst!Subject
rst.MoveNext
Loop
DBEngine.Workspaces(0).CommitTrans
rst.Close
Set rst = Nothing
End Sub
...and the updated SourceTable would be...
DLID Subject Total SubjectBlock SubjectBlockSeq
1 Science 70 1 1
2 Science 60 1 2
3 Science 75 1 3
4 Science 70 1 4
5 Maths 80 2 1
6 Maths 90 2 2
7 English 90 3 1
8 English 80 3 2
9 English 70 3 3
10 Science 75 4 1
(Note that I tweaked your test data to make it easier to verify the results below.)
Now as we iterate through the ever-increasing "length of sequence to be included in the total" we can quickly identify the "blocks" that are of interest simply by using a query like...
SELECT SubjectBlock FROM SourceTable WHERE SubjectBlockSeq=3
...which will return...
1
3
...indicating that when calculating the totals for a "run of 3" we won't need to look at blocks 2 ("Maths") and 4 (the last "Science" one) at all.
Observation #2
The first time through, when NumRows=1, is a special case: it just copies the rows from [SourceTable] into the [ExpectedResult] table. We can save time by doing that with a single query:
INSERT INTO ExpectedResult ( DLID, NumRows, Subject, Total, SubjectBlock, NextSubjectBlockSeq )
SELECT SourceTable.DLID, 1 AS Expr1, SourceTable.Subject, SourceTable.Total,
SourceTable.SubjectBlock, SourceTable.SubjectBlockSeq+1 AS Expr2
FROM SourceTable;
You may notice that I have added two columns to the [ExpectedResult] table: [SubjectBlock] (as before) and [NextSubjectBlockSeq] (which is just [SubjectBlockSeq]+1). Again, they should both be indexed, allowing duplicates. We'll use them below.
Observation #3
As we continue looking for longer and longer "runs" to sum, each run is really just an earlier (shorter) run with an additional row tacked onto the end. If we write our results to the [ExpectedResult] table as we go along, we can reuse those values instead of going back and adding up the individual values for the entire run.
When NumRows=2, the "add-on" rows are the ones where SubjectBlockSeq>=2...
SELECT SourceTable.*
FROM SourceTable
WHERE (((SourceTable.SubjectBlockSeq)>=2))
ORDER BY SourceTable.DLID;
...that is...
DLID Subject Total SubjectBlock SubjectBlockSeq
2 Science 60 1 2
3 Science 75 1 3
4 Science 70 1 4
6 Maths 90 2 2
8 English 80 3 2
9 English 70 3 3
...and the [ExpectedResult] rows with the "earlier (shorter) run" onto which we will be "tacking" the additional row are the ones
from the same [SubjectBlock],
with [NumRows]=1, and
with [ExpectedResult].[NextSubjectBlockSeq] = [SourceTable].[SubjectBlockSeq]
so we can get the new totals and append them to [ExpectedResult] like this
INSERT INTO ExpectedResult ( DLID, NumRows, Subject, Total, SubjectBlock, NextSubjectBlockSeq )
SELECT SourceTable.DLID, 2 AS Expr1, SourceTable.Subject,
[ExpectedResult].[Total]+[SourceTable].[Total] AS NewTotal,
SourceTable.SubjectBlock, [SourceTable].[SubjectBlockSeq]+1 AS Expr2
FROM SourceTable INNER JOIN ExpectedResult
ON (SourceTable.SubjectBlockSeq = ExpectedResult.NextSubjectBlockSeq)
AND (SourceTable.SubjectBlock = ExpectedResult.SubjectBlock)
WHERE (((SourceTable.SubjectBlockSeq)>=2) AND (ExpectedResult.NumRows=1));
The rows appended to [ExpectedResult] are
DLID NumRows Subject Total SubjectBlock NextSubjectBlockSeq
2 2 Science 130 1 3
3 2 Science 135 1 4
4 2 Science 145 1 5
6 2 Maths 170 2 3
8 2 English 170 3 3
9 2 English 150 3 4
Now we're cookin'...
Using the same logic as before, we can now process for NumRows=3. The only differences are that we will be inserting the value 3 into NumRows, and our selection criteria will be
WHERE (((SourceTable.SubjectBlockSeq)>=3) AND (ExpectedResult.NumRows=2))
The complete query is
INSERT INTO ExpectedResult ( DLID, NumRows, Subject, Total, SubjectBlock, NextSubjectBlockSeq )
SELECT SourceTable.DLID, 3 AS Expr1, SourceTable.Subject,
[ExpectedResult].[Total]+[SourceTable].[Total] AS NewTotal,
SourceTable.SubjectBlock, [SourceTable].[SubjectBlockSeq]+1 AS Expr2
FROM SourceTable INNER JOIN ExpectedResult
ON (SourceTable.SubjectBlockSeq = ExpectedResult.NextSubjectBlockSeq)
AND (SourceTable.SubjectBlock = ExpectedResult.SubjectBlock)
WHERE (((SourceTable.SubjectBlockSeq)>=3) AND (ExpectedResult.NumRows=2));
and the rows appended to [ExpectedResult] are
DLID NumRows Subject Total SubjectBlock NextSubjectBlockSeq
3 3 Science 205 1 4
4 3 Science 205 1 5
9 3 English 240 3 4
Parameterization
Since each successive query is so similar, it would be awfully nice if we could just write it once and use it repeatedly. Fortunately, we can, if we turn it into a "Parameter Query":
PARAMETERS TargetNumRows Long;
INSERT INTO ExpectedResult ( DLID, NumRows, Subject, Total, SubjectBlock, NextSubjectBlockSeq )
SELECT SourceTable.DLID, [TargetNumRows] AS Expr1, SourceTable.Subject,
[ExpectedResult].[Total]+[SourceTable].[Total] AS NewTotal,
SourceTable.SubjectBlock, [SourceTable].[SubjectBlockSeq]+1 AS Expr2
FROM SourceTable INNER JOIN ExpectedResult
ON (SourceTable.SubjectBlock = ExpectedResult.SubjectBlock)
AND (SourceTable.SubjectBlockSeq = ExpectedResult.NextSubjectBlockSeq)
WHERE (((SourceTable.SubjectBlockSeq)>=[TargetNumRows])
AND ((ExpectedResult.NumRows)=[TargetNumRows]-1));
Create a new Access query, paste the above into the SQL pane, and then save it as pq_appendToExpectedResult. (The "pq_" is just a visual cue that it's a Parameter Query.)
Invoking a Parameter Query from VBA
You can invoke (execute) a Parameter Query in VBA via a QueryDef object:
Dim cdb As DAO.Database, qdf As DAO.QueryDef
Set cdb = CurrentDb
Set qdf = cdb.QueryDefs("pq_appendToExpectedResult")
qdf!TargetNumRows = 4 '' parameter value
qdf.Execute
Set qdf = Nothing
When to stop
Now you can see that it's simply a matter of incrementing NumRows and re-running the Parameter Query, but when to stop? That's easy:
After incrementing your NumRows variable in VBA, test
DCount("DLID", "SourceTable", "SubjectBlockSeq=" & NumRows)
If it comes back 0 then you're done.
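The whole procedure (Observations #1 to #3 plus the stopping rule) can be sketched compactly in Python to check the logic against the tables above. This is an in-memory illustration, not the Access implementation, and number_blocks and run_totals are hypothetical names. Like the [ExpectedResult] rows above, results are keyed by the DLID of the last row in each run.

```python
# The answer's tweaked test data: (DLID, Subject, Total)
SOURCE = [(1, "Science", 70), (2, "Science", 60), (3, "Science", 75), (4, "Science", 70),
          (5, "Maths", 80), (6, "Maths", 90), (7, "English", 90), (8, "English", 80),
          (9, "English", 70), (10, "Science", 75)]

def number_blocks(rows):
    """Observation #1: add SubjectBlock and SubjectBlockSeq to each row."""
    numbered, block, seq, prev_subject = [], 0, 0, None
    for dlid, subject, total in sorted(rows):  # ORDER BY DLID
        if subject != prev_subject:
            block, seq = block + 1, 0
        seq += 1
        numbered.append((dlid, subject, total, block, seq))
        prev_subject = subject
    return numbered

def run_totals(rows):
    """Return {(last_dlid, num_rows): total} for every run within a block."""
    numbered = number_blocks(rows)
    # Observation #2: NumRows = 1 is a straight copy; shorter runs are looked up
    # by (SubjectBlock, NextSubjectBlockSeq), as in the append query's join.
    results = {(dlid, 1): total for dlid, _, total, _, _ in numbered}
    previous = {(block, seq + 1): total for _, _, total, block, seq in numbered}
    num_rows = 1
    while True:
        num_rows += 1
        add_on = [r for r in numbered if r[4] >= num_rows]
        if not add_on:  # equivalent to DCount(... "SubjectBlockSeq=" & NumRows) = 0
            break
        current = {}
        for dlid, _, total, block, seq in add_on:
            # Observation #3: extend the shorter run ending just before this row
            new_total = previous[(block, seq)] + total
            results[(dlid, num_rows)] = new_total
            current[(block, seq + 1)] = new_total
        previous = current
    return results

totals = run_totals(SOURCE)
print(totals[(2, 2)], totals[(9, 3)])
```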
Show me (all) the code
Sorry, not right away. ;) Play around with this and let us know how it goes.
Your question is:
"Please suggest how to speed this up. Better Algorithm / Code Improvements / Different Language to code Please assist."
I can answer part of your question quickly: "Different Language to code" == SQL.
In detail:
Whatever it is you're trying to achieve looks dataset intensive. I'm almost certain this processing would be handled more efficiently within the DBMS that houses your data, as the DBMS is able to take a (reasonably well written) SQL query and optimise it based on its own knowledge of the data you are interrogating, and perform aggregation over large sets/sub-sets of data very quickly and efficiently.
Iterating over large datasets row by row to accumulate values is rarely (dare I say never) going to yield acceptable performance, which is why DBMSes don't work that way natively, unless you force them to with iterative code or code that has to inspect each row, such as your VB code.
Now, for the implementation of Better Algorithm / Code Improvements / Different Language.
I've done this in SQL, but whether or not you use my solution, I would still highly recommend migrating your data to e.g. MS SQL Server, Oracle or MySQL if you find that MS Access is binding you to iterative approaches (which is not to suggest it is doing that... I don't know whether that is the case).
But if this is genuinely not homework, and/or you are genuinely tied to MS Access, then perhaps an investment of effort to convert this to MS Access might be fruitful in terms of performance. The principles should all be the same - it's a relational database and this is all fairly standard SQL, so I would've thought there'd be Access equivalents for what I've done here.
Failing that, you should be able to "point" an MSSQL instance at the MS Access file, as a linked server via an Access provider. If you'd like advice on this, let me know.
There's some code here that is procedural by nature, in order to set up some "helper" tables that will allow the heavy-lifting aggregation on your sequences to be done using set-based operations.
I've called the source table "Your_Source_Table". Do a search-replace on all instances to rename as whatever you've called it.
Note also that I haven't set up indexes on anything... you should do this. I expect indexes should be created on all the columns involved in joins. Checking the execution plan to ensure there are no unnecessary table scans would be wise.
I used the following to create Your_Source_Table:
-- Create Your apparent table structure
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[Your_Source_Table]') AND type in (N'U'))
DROP TABLE [dbo].[Your_Source_Table]
GO
CREATE TABLE [dbo].[Your_Source_Table](
[DLID] [int] NOT NULL,
[Subject] [nchar](10) NOT NULL,
[Total] [int] NOT NULL
) ON [PRIMARY]
GO
And populated it as:
DLID Subject Total
----------- ---------- -----------
1 Science 70
2 Science 70
3 Science 70
4 Science 70
5 Maths 80
6 Maths 80
7 English 90
8 English 90
9 English 90
10 Science 75
Then, I created the following "helpers". Explanations in code.
-- Set up helper structures.
-- Build a number table
if object_id('tempdb..##numbers') is not null
BEGIN DROP TABLE ##numbers END
SELECT TOP 10000 IDENTITY(int,1,1) AS Number -- Can be 700, 800, or 900 contiguous rows, depending on which comment I read. So I'll run with 10000 to be sure :-)
INTO ##numbers
FROM sys.all_objects s1
CROSS JOIN sys.all_objects s2
ALTER TABLE ##numbers ADD CONSTRAINT PK_numbers PRIMARY KEY CLUSTERED (Number)
-- Determine where each block starts.
if object_id('tempdb..#tempGroups') is not null
BEGIN DROP TABLE #tempGroups END
GO
CREATE TABLE #tempGroups (
[ID] [int] NOT NULL IDENTITY,
[StartID] [int] NULL,
[Subject] [nchar](10) NULL
) ON [PRIMARY]
INSERT INTO #tempGroups
SELECT t.DLID, t.Subject FROM Your_Source_Table t WHERE DLID=1
UNION
SELECT
t.DLID, t.Subject
FROM
Your_Source_Table t
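INNER JOIN Your_Source_Table t2 ON t.DLID = t2.DLID+1 AND t.subject != t2.subject
ORDER BY DLID -- identity values must follow DLID order, because ##groups pairs each block with the next one by ID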
-- Determine where each block ends
if object_id('tempdb..##groups') is not null
BEGIN DROP TABLE ##groups END
CREATE TABLE ##groups (
[ID] [int] NOT NULL,
[Subject] [nchar](10) NOT NULL,
[StartID] [int] NOT NULL,
[EndID] [int] NOT NULL
) ON [PRIMARY]
INSERT INTO ##groups
SELECT
g1.id as ID,
g1.subject,
g1.startID as startID,
CASE
WHEN g2.id is not null THEN g2.startID-1
ELSE (SELECT max(dlid) FROM Your_Source_Table) -- Boundary case when there is no following group (ie return the last row)
END as endID
FROM
#tempGroups g1
LEFT JOIN #tempGroups g2 ON g1.id = g2.id-1
DROP TABLE #tempGroups;
GO
-- We now have a helper table called ##groups, that identifies the subject, start DLID and end DLID of each continuous block of a particular subject in your dataset.
-- So now, we can build up the possible sequences within each group, by joining to a number table.
if object_id('tempdb..##sequences') is not null
BEGIN DROP TABLE ##sequences END
CREATE TABLE ##sequences (
[seqID] [int] NOT NULL IDENTITY,
[groupID] [int] NOT NULL,
[start_of_sequence] [int] NOT NULL,
[end_of_sequence] [int] NOT NULL
) ON [PRIMARY]
INSERT INTO ##sequences
SELECT
g.id,
ns.number start_of_sequence,
ne.number end_of_sequence
FROM
##groups g
INNER JOIN ##numbers ns
ON ns.number <= (g.endid - g.startid + 1) -- number is in range for this block
INNER JOIN ##numbers ne
ON ne.number <= (g.endid - g.startid + 1) -- number is in range for this block
and ne.number >= ns.number -- end after start
ORDER BY
1,2,3
Then, the results you're after can be achieved with a single set-based operation:
-- By joining groups to your dataset we can add a group identity to each record.
-- By joining sequences we can generate copies of the rows for aggregation into each sequence.
select
min(t.dlid) as ID, -- equals (s.start_of_sequence + g.startid - 1) (sequence positions offset by group start position)
count(t.dlid) as number_of_rows,
g.subject,
sum(t.total) as total
--select *
from
Your_Source_Table t
inner join ##groups g
on t.dlid >= g.startid and t.dlid <= g.endid -- grouping rows into each group.
inner join ##sequences s
on s.groupid = g.id -- get the sequences for this group.
and t.dlid >= (s.start_of_sequence + g.startid - 1) -- include the rows required for this sequence (sequence positions offset by group start position)
and t.dlid <= (s.end_of_sequence + g.startid - 1)
group by
g.subject,
s.seqid
order by 2, 1
BUT NOTE:
This result is NOT exactly the same as your "Expected Result".
You've incorrectly included a duplicate instance of the 1 row sequence starting at row 1 (for science, sum total 1*70=70), but not included the 4 row sequence starting at row 1 (for science, sum total 4*70 = 280).
The correct results, IMHO are:
ID number_of_rows subject total
----------- -------------- ---------- -----------
1 1 Science 70 <-- You've got this row twice.
2 1 Science 70
3 1 Science 70
4 1 Science 70
5 1 Maths 80
6 1 Maths 80
7 1 English 90
8 1 English 90
9 1 English 90
10 1 Science 75
1 2 Science 140
2 2 Science 140
3 2 Science 140
5 2 Maths 160
7 2 English 180
8 2 English 180
1 3 Science 210
2 3 Science 210
7 3 English 270
1 4 Science 280 <-- You don't have this row.
(20 row(s) affected)
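The same 20 rows can also be produced in a single in-memory pass, which may be a useful cross-check or a fallback if moving to a server DBMS is not an option. This is a Python sketch, not part of the answer above: it assumes the rows fit in memory and are ordered by DLID, finds each contiguous block of one subject with itertools.groupby, and uses prefix sums so every run total costs one subtraction instead of re-adding up to m rows as the VBScript loop does.

```python
from itertools import groupby

# The question's sample input: (DLID, Subject, Total)
SOURCE = [(1, "Science", 70), (2, "Science", 70), (3, "Science", 70), (4, "Science", 70),
          (5, "Maths", 80), (6, "Maths", 80), (7, "English", 90), (8, "English", 90),
          (9, "English", 90), (10, "Science", 75)]

def sequential_sums(rows):
    """Return (start_id, number_of_rows, subject, total) for every contiguous,
    forward-only run of rows with the same subject."""
    out = []
    # groupby over rows sorted by DLID yields each contiguous block of one subject
    for subject, group in groupby(sorted(rows), key=lambda r: r[1]):
        block = list(group)
        prefix = [0]
        for _, _, total in block:
            prefix.append(prefix[-1] + total)  # prefix[k] = sum of the first k totals
        for length in range(1, len(block) + 1):
            for start in range(len(block) - length + 1):
                run_total = prefix[start + length] - prefix[start]
                out.append((block[start][0], length, subject, run_total))
    # same ordering as "order by 2, 1" in the T-SQL query
    out.sort(key=lambda r: (r[1], r[0]))
    return out

for row in sequential_sums(SOURCE):
    print(*row)
```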
