Oracle: Validate input data with given data structure rules

I have a table called data rules. It is shown below, followed by some explanation.
Data|GroupNum|GroupType|GroupMinOcc|GroupMaxOcc|DataStatus|DataMinOccWithinGroup|DataMaxOccurenceWithinGroup|IDX
ABC |GroupA  |Mandatory|1          |1          |Mandatory |1                    |1                          |1
DEF |GroupB  |Mandatory|1          |1          |Mandatory |1                    |1                          |2
GHI |GroupC  |Mandatory|1          |1          |Mandatory |1                    |1                          |3
JKL |GroupD  |Optional |0          |1          |Optional  |0                    |1                          |4
FFF |Group1  |Optional |0          |1          |Mandatory |1                    |1                          |5
RRR |Group1  |Optional |0          |1          |Optional  |0                    |2                          |6
MMM |Group2  |Optional |0          |2          |Mandatory |1                    |1                          |7
PPP |Group2  |Optional |0          |2          |Optional  |0                    |1                          |8
CCC |Group3  |Optional |0          |2          |Optional  |0                    |2                          |9
SSS |Group4  |Mandatory|1          |2          |Mandatory |1                    |1                          |10
TTT |Group4  |Mandatory|1          |2          |Mandatory |0                    |2                          |11
Let me explain these data rules first.
1) A group can have multiple data records.
Here, as you can see, GroupA has only the ABC data, while Group1 has the FFF and RRR data.
2) A group can be mandatory or optional. A mandatory group must appear in the input. Within a group, each data record also has its own mandatory or optional status.
For example, look at Group4.
This group is mandatory and its first data record, SSS, is also mandatory, so whenever the group occurs, SSS must occur too. The second data record in this group, TTT, is optional: even though the group is mandatory, TTT can occur from 0 to 2 times within it.
Let's say this group occurs two times. It would look like this:
Group4 Example: Valid
SSS
TTT
TTT
SSS
TTT
Invalid Group4 occurrence
SSS
SSS
TTT
TTT
TTT
It is invalid because in the second occurrence of the group TTT occurs 3 times, but it must not occur more than 2 times.
3) If a group is optional, it may or may not appear.
As you can see, GroupD, Group1, Group2 and Group3 are optional, so Group4 data can come directly after GroupC in the input data, like this:
ABC
DEF
GHI
SSS
TTT
If the input data does not follow the rules in the data rules table, I want to capture the exact IDX number of the violated rule, taken from the row of the respective group in that table.
For example, 1st input data example:
ABC
DEF
GHI
JKL
JKL
SSS
As you can see, JKL is optional data in an optional group. But if this optional group occurs, JKL should come only one time. Here it came twice, so I want to return IDX number 4.
2nd input data example:
ABC
DEF
GHI
TTT
Here it should return IDX number 10, because the mandatory data SSS is missing from the mandatory Group4, and its IDX in the data rules is 10.
3rd input data example:
ABC
DEF
GHI
SSS
SSS
TTT
Here too, the IDX returned should be 10, for SSS, because it occurred twice. As you can see in the data rules, the whole Group4 can repeat only one time, and whenever it occurs SSS must come only one time. So that's an error.
Many errors can occur at the same time as well, so the IDX numbers need to be returned only from the respective group data in the data rules table.
The input data will have only one column, containing the data records.
Note: Group data will appear only in the order given in the data rules, from top to bottom, and may or may not appear depending on the definition in the data rules table.
Any suggestions on how I can achieve this?
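One possible way to approach this, sketched in Python rather than Oracle SQL purely to illustrate the logic. It rests on several assumptions that are not stated in the post: every input value exists in the rules table; a new occurrence of a group starts when the group changes or when the data goes back to an earlier row of the same group (so SSS SSS is read as one occurrence with SSS repeated, as in the 3rd example); and a group-level violation is reported against the group's mandatory rows. It does not check that groups arrive in table order. The names RULES and validate are made up for this sketch.

```python
from collections import Counter

# (data, group, group_min, group_max, data_min, data_max, idx), copied from the rules table.
RULES = [
    ("ABC", "GroupA", 1, 1, 1, 1, 1),
    ("DEF", "GroupB", 1, 1, 1, 1, 2),
    ("GHI", "GroupC", 1, 1, 1, 1, 3),
    ("JKL", "GroupD", 0, 1, 0, 1, 4),
    ("FFF", "Group1", 0, 1, 1, 1, 5),
    ("RRR", "Group1", 0, 1, 0, 2, 6),
    ("MMM", "Group2", 0, 2, 1, 1, 7),
    ("PPP", "Group2", 0, 2, 0, 1, 8),
    ("CCC", "Group3", 0, 2, 0, 2, 9),
    ("SSS", "Group4", 1, 2, 1, 1, 10),
    ("TTT", "Group4", 1, 2, 0, 2, 11),
]


def validate(records):
    """Return the sorted IDX values of every rule the input breaks."""
    position = {rule[0]: i for i, rule in enumerate(RULES)}
    occurrences = {}  # group name -> list of Counter, one per occurrence
    prev_group, prev_pos = None, -1
    for value in records:
        pos = position[value]  # assumption: every value exists in RULES
        group = RULES[pos][1]
        # Assumption: a new occurrence starts when the group changes or when
        # the data goes back to an earlier row of the same group.
        if group != prev_group or pos < prev_pos:
            occurrences.setdefault(group, []).append(Counter())
        occurrences[group][-1][value] += 1
        prev_group, prev_pos = group, pos

    errors = set()
    for group in dict.fromkeys(rule[1] for rule in RULES):  # table order
        rows = [rule for rule in RULES if rule[1] == group]
        occs = occurrences.get(group, [])
        g_min, g_max = rows[0][2], rows[0][3]
        if not g_min <= len(occs) <= g_max:
            # Assumption: group-level errors are reported against the
            # mandatory rows of the group (or its first row if none).
            mandatory = [rule for rule in rows if rule[4] >= 1] or rows[:1]
            errors.update(rule[6] for rule in mandatory)
        for counts in occs:
            for data, _, _, _, d_min, d_max, idx in rows:
                if not d_min <= counts[data] <= d_max:
                    errors.add(idx)
    return sorted(errors)


example_1 = validate(["ABC", "DEF", "GHI", "JKL", "JKL", "SSS"])
example_2 = validate(["ABC", "DEF", "GHI", "TTT"])
example_3 = validate(["ABC", "DEF", "GHI", "SSS", "SSS", "TTT"])
```

Under these assumptions the three input examples give IDX 4, 10 and 10 respectively. The same counting could probably be expressed in Oracle with analytic functions over the input joined to the rules table, but that is not shown here.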

Related

Google Sheets Query get Even/Odd rows in grouped results

I have a long list of rows with dates on the side, and a text field after
01/01/2019 | ABC | ...
The list is ordered by date, and may have between 1 and 4 rows per date
01/01/2019 | ABC | ...
01/01/2019 | DEF | ...
05/01/2019 | ABC | ...
05/01/2019 | DEF | ...
05/01/2019 | ABC | ...
05/01/2019 | GHI | ...
10/01/2019 | ABC | ...
10/01/2019 | XYZ | ...
I can happily run a QUERY() which groups by the date and COUNT()s the number of rows matching that date
01/01/2019 | 2
05/01/2019 | 4
10/01/2019 | 2
I'm trying to use a series of functions in acceptable Google Sheets format which will group the items by date, and then only return the Nth rows. I'm also happy with EVEN/ODD rows here.
Importantly, I don't want the EVEN/ODD based on the actual spreadsheet ROW(), but I need the EVEN/ODD/Nth based on the number of matching rows in the aggregated group, if that makes sense.
So I would like this output:
EVENS
01/01/2019 | DEF | (row 2 in group)
05/01/2019 | DEF | (row 2 in group)
05/01/2019 | GHI | (row 4 in group)
10/01/2019 | XYZ | (row 2 in group)
ODDS
01/01/2019 | ABC | (row 1 in group)
05/01/2019 | ABC | (row 1 in group)
05/01/2019 | ABC | (row 3 in group)
10/01/2019 | ABC | (row 1 in group)
Ultimately, my aim is to count all the occurrences of the text field (ABC/DEF/GHI/etc) that happen as the FIRST or SECOND or THIRD or FOURTH event for any particular day, then sort descending, but only include them (for example) if ABC was an EVEN row of that group, or if XYZ was an ODD row within that group (eg row 2 of the group, ignoring the fact in the whole spreadsheet it happens to be on row 35)
ABC | 156
DEF | 30
GHI | 10
JKL | 8
MNO | 7
XYZ | 1
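You could do it with one formula if you wanted to. For the even rows of each group:
=filter(A2:B,ISEVEN(row(A2:A)-match(A2:A,A2:A,0)))
and for the odd rows:
=filter(A2:B,ISODD(row(A2:A)-match(A2:A,A2:A,0)))
assuming the data starts in row 2, so that row minus match gives the 1-based position within the group.
If the data started in a different row, you could do a lookup on the row. That gives a 0-based position within the group, so the tests swap. For the even rows:
=filter(A2:B,ISODD(row(A2:A)-vlookup(A2:A,{A2:A,row(A2:A)},2,false)))
and for the odd rows:
=filter(A2:B,ISEVEN(row(A2:A)-vlookup(A2:A,{A2:A,row(A2:A)},2,false)))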
You can add a helper column that numbers each row within its date group, like:
=ARRAYFORMULA(IF(LEN(A1:A), COUNTIFS(A1:A, A1:A, ROW(A1:A), "<="&ROW(A1:A)), ))
and then filter for even and odd like:
=FILTER(A1:B, ISEVEN(C1:C))
=FILTER(A1:B, ISODD(C1:C))
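To make the idea behind these formulas concrete, here is a minimal sketch in Python (not Sheets) of numbering rows within each date group and then splitting on even and odd positions. The function name number_within_group is hypothetical, and the rows are the sample rows from the question.

```python
from collections import Counter, defaultdict


def number_within_group(rows):
    """Attach a 1-based position within its date group to every (date, text) row."""
    seen = defaultdict(int)
    numbered = []
    for day, text in rows:
        seen[day] += 1
        numbered.append((day, text, seen[day]))
    return numbered


rows = [
    ("01/01/2019", "ABC"), ("01/01/2019", "DEF"),
    ("05/01/2019", "ABC"), ("05/01/2019", "DEF"),
    ("05/01/2019", "ABC"), ("05/01/2019", "GHI"),
    ("10/01/2019", "ABC"), ("10/01/2019", "XYZ"),
]
numbered = number_within_group(rows)
evens = [(day, text) for day, text, n in numbered if n % 2 == 0]
odds = [(day, text) for day, text, n in numbered if n % 2 == 1]
# Count how often each text value lands on an even position, most frequent first.
even_counts = Counter(text for _, text in evens).most_common()
```

The position counter plays the role of the helper column, and the final Counter corresponds to the descending count the question is ultimately after.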

MySQL - Top 5 rank best seller plans or courses

I sell subscriptions to my online courses, as well as individual courses at retail.
I would like to retrieve the "top 5" best-selling plans / courses. For this, I have a table called "subscriptionPlan", which stores the purchased plan ID, or in the case of a course, the course ID, and the amount spent on this transaction. Example:
table subscriptionPlan
sbpId | subId | plaId | couId | sbpAmount
1 | 1 | 1 | 1 | 499.99
2 | 2 | 1 | 2 | 499.99
3 | 3 | 2 | 0 | 899.99
4 | 4 | 1 | 1 | 499.99
For context, plaId = 1 is a plan called "Single Sale" that I created to maintain the integrity of the DB. When couId isn't empty, the customer bought a separate course, not a plan that gives access to any course.
My need is: list the top 5 sales. If it is a plan, display the plan name (plan table, column plaTitle). If it is a course, display its name (course table, column couTitle). This is the logic that I can't code. I was able to rank a top 5 of PLANS, but it lumps the courses together, since the GROUP BY is on the plan ID. I believe the trick is here, maybe an IF / ELSE in this GROUP BY, but I don't know how to do this.
The query that I wrote to rank my top 5 plans is:
SELECT sp.plaId, sp.couId, p.plaTitle, p.plaPermanent, c.couTitle, SUM(sbpAmount) AS sbpTotalAmount
FROM subscriptionPlan sp
LEFT JOIN plan p ON sp.plaId = p.plaId
LEFT JOIN course c ON sp.couId = c.couId
GROUP BY sp.plaId
ORDER BY sbpTotalAmount DESC
LIMIT 5
The result that I expected was:
plaId | couId | plaTitle | couTitle | plaPermanent | sbpTotalAmount
1 | 1 | Venda avulsa | Curso 01 | true | 999.98
2 | 0 | Acesso total | null | false | 899.99
3 | 2 | Venda avulsa | Curso 02 | true | 499.99
How could I change the query to get this result?
When grouping you can use:
Simple columns, or
Any [complex] expression.
In your case, it seems you need to group by an expression, such as:
GROUP BY CASE WHEN sp.plaId = 1 THEN -1 ELSE sp.couId END
In this case I chose -1 as the grouping for the "Single Plan". You can replace it with any other value that doesn't match any couId.
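As a quick way to experiment with grouping keys, the sketch below loads the sample rows into an in-memory SQLite database from Python (SQLite, not MySQL) and groups on the pair (plaId, couId). That pair is one assumed reading of the expected result, where single-sale rows are split per course and plans stay per plan. It is not the expression from the answer above. The plan and course table layouts and titles are inferred from the expected output in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Table layouts are assumptions based on the columns used in the question.
conn.executescript("""
CREATE TABLE "plan" (plaId INTEGER, plaTitle TEXT, plaPermanent TEXT);
CREATE TABLE course (couId INTEGER, couTitle TEXT);
CREATE TABLE subscriptionPlan (
    sbpId INTEGER, subId INTEGER, plaId INTEGER, couId INTEGER, sbpAmount REAL
);
INSERT INTO "plan" VALUES (1, 'Venda avulsa', 'true'), (2, 'Acesso total', 'false');
INSERT INTO course VALUES (1, 'Curso 01'), (2, 'Curso 02');
INSERT INTO subscriptionPlan VALUES
    (1, 1, 1, 1, 499.99),
    (2, 2, 1, 2, 499.99),
    (3, 3, 2, 0, 899.99),
    (4, 4, 1, 1, 499.99);
""")

# Group on both IDs so each single-sale course and each plan gets its own total.
rows = conn.execute("""
SELECT sp.plaId, sp.couId, p.plaTitle, c.couTitle,
       ROUND(SUM(sp.sbpAmount), 2) AS sbpTotalAmount
FROM subscriptionPlan sp
LEFT JOIN "plan" p ON sp.plaId = p.plaId
LEFT JOIN course c ON sp.couId = c.couId
GROUP BY sp.plaId, sp.couId
ORDER BY sbpTotalAmount DESC
LIMIT 5
""").fetchall()
conn.close()
```

With the four sample rows this yields three groups: course 1 under the single-sale plan, plan 2 with no course title, and course 2 under the single-sale plan. A CASE expression in the GROUP BY can achieve the same split when the two IDs cannot simply be listed side by side.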

In SCD Type2 how to find latest record

I have a table. I ran the job as an SCD Type 2 load, and it loaded the data below:
no | name | loc |
-----------------
1 | abc | hyd |
-----------------
2 | def | bang |
-----------------
3 | ghi | chennai |
Then I ran a second load, which loaded the data given below:
no | name | loc |
-----------------
1 | abc | hyd |
-----------------
2 | def | bang |
-----------------
3 | ghi | chennai |
--------------------
1 | abc | bang |
There are no dates, flags, or run IDs here.
How can I find the second, updated record in this situation?
Thanks
I don't think you'll be able to distinguish between the updated record and the original record.
A dimension table using Type 2 SCD requires additional columns that describe the period in which the record is valid (or current), exactly for this reason.
The solution is to ensure your dimension table has these columns (Typically ValidFrom and ValidTo dates or date/times, and sometimes an IsCurrent flag for good measure). Your ETL process would then populate these columns as part of making the Type 2 updates.
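As a rough illustration of what populating those columns could look like, here is a minimal Python sketch of a Type 2 update step. The column names valid_from, valid_to and is_current, the function apply_scd2, and the load dates are all hypothetical and chosen for this example. A real ETL tool would do the equivalent in its own way.

```python
from datetime import date


def apply_scd2(dimension, incoming, load_date):
    """Close changed rows and append new versions, keeping full history."""
    current = {row["no"]: row for row in dimension if row["is_current"]}
    for rec in incoming:
        old = current.get(rec["no"])
        if old is not None and (old["name"], old["loc"]) == (rec["name"], rec["loc"]):
            continue  # nothing changed for this key
        if old is not None:
            # Expire the previous version instead of overwriting it.
            old["valid_to"] = load_date
            old["is_current"] = False
        dimension.append(
            {**rec, "valid_from": load_date, "valid_to": None, "is_current": True}
        )
    return dimension


first_run = [
    {"no": 1, "name": "abc", "loc": "hyd"},
    {"no": 2, "name": "def", "loc": "bang"},
    {"no": 3, "name": "ghi", "loc": "chennai"},
]
second_run = [
    {"no": 1, "name": "abc", "loc": "bang"},
    {"no": 2, "name": "def", "loc": "bang"},
    {"no": 3, "name": "ghi", "loc": "chennai"},
]
dim = apply_scd2([], first_run, date(2019, 1, 1))
dim = apply_scd2(dim, second_run, date(2019, 2, 1))
latest = [row for row in dim if row["no"] == 1 and row["is_current"]]
```

With these columns in place, the latest record for a key is simply the row where is_current is true (or valid_to is empty), which is exactly what cannot be determined from the table in the question.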

Emit a group only when condition meets

I have a requirement to emit all records corresponding to a group only when a condition is met. Below is the sample data set, with the alias name "SAMPLE_DATA".
Col-1 | Col-2 | Col-3
-------------------------
2 | 4 | 1
2 | 5 | 2
3 | 3 | 1
3 | 2 | 2
4 | 5 | 1
4 | 6 | 2
SAMPLE_DATA_GRP = GROUP SAMPLE_DATA BY Col-1;
RESULT = FOREACH SAMPLE_DATA_GRP {
max_value = MAX(SAMPLE_DATA.Col-2);
IF(max_value >= 5)
GENERATE ALL RECORDS IN THAT GROUP;
}
RESULT should be:
Col-1 | Col-2 | Col-3
-------------------------
2 | 4 | 1
2 | 5 | 2
---- ---- ---
4 | 5 | 1
4 | 6 | 2
Two groups got generated. The first group is generated because the max of 4 and 5 is 5 (which meets our condition >= 5). Same for the second group (6 >= 5).
As I would be performing this operation on a big dataset, operations like distinct and join would be overkill. For this reason I have come up with pseudo code that uses one grouping to perform this operation.
Hope I have provided enough information. Thanks in advance.
Please try the code below. This solution is a little lengthy, but it will work:
-- Load the comma-separated input as three int columns.
numbers = LOAD '/home/user/inputfiles/c1.txt' USING PigStorage(',') AS (c1:int, c2:int, c3:int);
num_grp = GROUP numbers BY c1;
-- For each group, flag whether its maximum c2 reaches the threshold.
num_each = FOREACH num_grp {
    max_each = MAX(numbers.c2);
    GENERATE FLATTEN(group) AS temp_c1, (max_each >= 5 ? 1 : 0) AS indicator;
};
num_each_filtered = FILTER num_each BY indicator == 1;
-- Join back to the original rows to keep only the qualifying groups.
num_joined = JOIN numbers BY c1, num_each_filtered BY temp_c1;
num_output = FOREACH num_joined GENERATE c1, c2, c3;
DUMP num_output;
Output:
Col-1 | Col-2 | Col-3
-------------------------
2 | 4 | 1
2 | 5 | 2
---- ---- ---
4 | 5 | 1
4 | 6 | 2
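The group-then-filter logic itself can be sketched outside Pig. The Python below is only an illustration of the idea with the sample rows from the question: group once, keep the groups whose maximum second column reaches the threshold, and flatten them back into rows. The function name emit_qualifying_groups is made up for this sketch and says nothing about how Pig would execute the job.

```python
from collections import defaultdict


def emit_qualifying_groups(rows, threshold=5):
    """Return all rows of every group (keyed on column 1) whose max column 2 >= threshold."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[0]].append(row)
    result = []
    for members in groups.values():  # groups come out in first-seen order
        if max(c2 for _, c2, _ in members) >= threshold:
            result.extend(members)
    return result


sample_data = [
    (2, 4, 1), (2, 5, 2),
    (3, 3, 1), (3, 2, 2),
    (4, 5, 1), (4, 6, 2),
]
result = emit_qualifying_groups(sample_data)
```

Here the filter is applied directly to each group, so no join back to the original rows is needed.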

Group Unique ID

In Stata, if I have a list of groups:
XYZ
ABC
ABC
BCH
JSA
BCH
XYZ
How do I get each group to have a unique ID in a second column after sorting, for example:
ABC 1
BCH 2
JSA 3
XYZ 4
You need sort, then group(), which is part of egen.
sysuse auto,clear
sort make
egen make_gp = group(make)
This yields:
. list make make_gp in 1/5
     +-------------------------+
     | make            make_gp |
     |-------------------------|
  1. | AMC Concord           1 |
  2. | AMC Pacer             2 |
  3. | AMC Spirit            3 |
  4. | Buick Century         7 |
  5. | Buick Electra         8 |
     +-------------------------+
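For readers who want to see what such a grouping amounts to, here is a minimal Python sketch of the same idea using the list from the question: number the distinct values in sorted order and attach that number to every row. The function name group_ids is hypothetical, and this only mimics the numbering, not egen itself.

```python
def group_ids(values):
    """Map each value to a 1-based ID assigned in sorted order of the distinct values."""
    ids = {value: i for i, value in enumerate(sorted(set(values)), start=1)}
    return [(value, ids[value]) for value in values]


groups = ["XYZ", "ABC", "ABC", "BCH", "JSA", "BCH", "XYZ"]
labelled = group_ids(groups)
unique_sorted = sorted(set(labelled))
```

Every repeat of a value gets the same ID, so the distinct pairs are ABC 1, BCH 2, JSA 3 and XYZ 4, as in the question.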
