Issue with OBIEE 11G: records being counted twice (Oracle)

My development team is working with OBIEE 11G to make an analysis as follows:
Discover which policyholders are in alert. An alert is raised when a policyholder's quantity of claims exceeds a certain threshold. If the policyholder has at least one alert on one of its claims, then the policyholder is in alert. The problem is that those thresholds are defined for a particular key (a combination of type of client, age range, type of pathology and other attributes), and a policyholder can have many keys, with a threshold for each key, so the quantity of claims varies. Something like this:
Policyholder  Key  #Claims  Threshold
ABC123        XYZ  3        4
ABC123        WQE  3        2
EFG456        ABC  1        2
The ABC123 policyholder has 6 claims in total: 3 for key XYZ (which has a threshold of 4) and 3 for key WQE (which has a threshold of 2). On the other hand, the EFG456 policyholder has 1 claim for key ABC, which has a threshold of 2. So in this case the ABC123 policyholder should be in alert, because its quantity of claims for key WQE is greater than that key's threshold.
So, in OBIEE 11G my team added two columns, one to mark the records in alert and one to mark the records which are not in alert. Like this:
Policyholder  Key  #Claims  Threshold  Alert  notAlert
ABC123        XYZ  3        4          0      1
ABC123        WQE  3        2          1      0
EFG456        ABC  1        2          0      1
You see the problem now? OBIEE 11G does not treat policyholder ABC123 as a unit, and marks it both as in alert and not in alert, which is wrong. The correct info should be:
Policyholder  Key  #Claims  Threshold  Alert  notAlert
ABC123        XYZ  3        4          0      0
ABC123        WQE  3        2          1      0
EFG456        ABC  1        2          0      1
It doesn't matter that the policyholder did not reach the alert threshold for key XYZ: once an alert is discovered, the complete file of the policyholder is examined to resolve the alert.
Is there any way of telling this to OBIEE 11G?
Please help!

I think this is a dimensional modeling problem instead of an OBIEE one:
In order to help I will make a few assumptions:
PolicyHolder and Key are separate dimensions:
Although the "key" dimension contains some attributes of the policyholder,
such as type of client and age group, it also combines other entities, like pathology, and to me
that is enough to consider it at least a mini-dimension.
The "Is in alert flag" can be modeled as a factless fact table:
It looks like you only need to know whether a particular policyholder is in alert;
there is no metric associated with the event, and you only need a flag that is either 0 or 1. This can be solved with a simple table that includes at least 3 columns: FK_POLICYHOLDER, FK_DATE and the flag. You already have a flag, but it is included in the claims table as a calculated column; if you model this flag as a separate table, you will have control over the dimensionality and granularity of the alert. See What dw model is appropriate when there's no measure?.
The metric "number of claims" has a different dimensionality than the alert flag.
I think the crux of the problem is that flags are calculated at the key level but for reporting purposes are only needed at the Policyholder level.
If you want alerts to be assigned to a PolicyHolder "as a unit", then you need a fact table that is linked to the PolicyHolder dimension and NOT linked to the Key dimension.
Concretely:
Create a separate dimension table for your "Key" entity (type of client, pathology, etc.)
Create a new factless fact table that contains alerts for a policyholder; this table should not link to the Key dimension.
Change the "Alert" column in your report; you should get that value from the flag counter of your new factless fact table. A sketch in SQL follows.
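A minimal sketch of that factless fact table in Oracle SQL. All names here are illustrative assumptions, not from the original post; in particular, FACT_CLAIMS is assumed to hold one row per policyholder/key/date with a CLAIM_COUNT and its THRESHOLD:
-- Factless fact table at the policyholder grain: no FK to the Key dimension.
CREATE TABLE FACT_POLICYHOLDER_ALERT (
    FK_POLICYHOLDER NUMBER    NOT NULL,
    FK_DATE         NUMBER    NOT NULL,
    ALERT_FLAG      NUMBER(1) DEFAULT 1 NOT NULL,
    CONSTRAINT PK_POLICYHOLDER_ALERT PRIMARY KEY (FK_POLICYHOLDER, FK_DATE)
);

-- Load it by rolling the per-key comparison up to the policyholder:
-- a policyholder is in alert if ANY of its keys exceeds its threshold.
INSERT INTO FACT_POLICYHOLDER_ALERT (FK_POLICYHOLDER, FK_DATE, ALERT_FLAG)
SELECT FK_POLICYHOLDER, FK_DATE, 1
FROM   FACT_CLAIMS
GROUP  BY FK_POLICYHOLDER, FK_DATE
HAVING MAX(CASE WHEN CLAIM_COUNT > THRESHOLD THEN 1 ELSE 0 END) = 1;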

Firstly, the ALERT columns seem redundant. It's an incredibly simple calculation that would be better done by OBI dynamically. That way you can check for policy holders in alert (on the aggregate of their keys) or for each key.
If I wanted to fix that calculation in OBI I would do it with a calculated logical column in the BMM (based on other logical columns) simply evaluating CLAIMS against THRESHOLD:
CASE WHEN CLAIMS >= THRESHOLD THEN 1 ELSE 0 END
That way the flag can work at multiple levels (either for POLICYHOLDER or KEY). But it seems a very simple calculation that could just be done in the Analysis as a filter (or selection step).
Even simpler though (assuming you have CLAIMS and THRESHOLD as measure columns with a SUM aggregation, and POLICYHOLDER and KEY as dimension columns) would be to ignore any sort of alert column altogether. If you don't bring KEY into the Analysis, OBI will give you each policy holder, their total claims and total threshold. You could then use selection steps or a filter in the criteria to remove those not over the threshold.
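For reference, the plain-SQL equivalent of rolling that flag up to the policyholder level would look something like this (a sketch only; CLAIMS_FACT and its column names are assumptions, not from the post):
-- A policyholder is in alert when at least one of its keys
-- meets or exceeds that key's threshold.
SELECT POLICYHOLDER
FROM   CLAIMS_FACT
GROUP  BY POLICYHOLDER
HAVING MAX(CASE WHEN CLAIMS >= THRESHOLD THEN 1 ELSE 0 END) = 1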

Related

I would like to create an efficient Bigtable row key

I would like to create an optimal row key in Bigtable. I have a table channel_data with 3 columns: channel_id, date, fan_count.
channel_id  date        fan_count
1           2022-03-01  5000
1           2022-03-02  6000
2           2022-03-01  200
2           2022-03-02  300
3           2022-03-03  1000
Users of our application can set up brands/buckets by adding multiple channels. Users can choose any random channel_id.
I want to design an efficient row key to fetch aggregated fan_count in a date range for a brand.
Let's say the user creates a brand with channel_ids 1 and 3 and wishes to see the sum of all fans for the time period 2022-03-01 to 2022-03-03.
The result should be 5000+6000+1000=12000
You have a few options here. Because you're looking to do queries based on date, you should probably make the date the end part of your rowkey, so you can scope down by channel first. You could also use timestamped cells to store multiple values for each channel, perhaps a week or month of data, so it is grouped together in that way, but this isn't necessary.
Perhaps a rowkey like channel_id/yyyy-mm-dd is what you'd want. You can choose to store the date and channel info in the table, but it isn't necessary since you'd have it in your ids. You can just treat Bigtable like a key/value store in this instance which might be more optimal depending on your scenario.
If you choose to store a month of data per row, you would just make the rowkey something like channel_id/yyyy-mm and just timestamp each value for the day.
Either way for your queries, if you need multiple channels, then you could just do multiple reads or a multi-prefix scan. Let me know if this helps clarify the schema design and if you have more questions.

Google Sheets: Data Validation - Unique row values across multiple columns

Good day,
I have seen from here a solution to control duplicate entries in a single column. A Data Validation rule with this custom formula works well for one column.
I would like to achieve the same effect over multiple columns, i.e. unique row entries across multiple columns. Take for example the three columns A-C below. Only when the values {1,2,1} are entered for a second time should the input be rejected.
A  B  C
1  1  1
1  2  1
1  2  2
2  2  2
1  2  1   X  entry should be rejected
Is there a quick way to do this using Data Validation - custom formulae?
Use this custom formula for data validation:
=INDEX(COUNTIF($A$1:$A&"×"&$B$1:$B&"×"&$C$1:$C, $A1&"×"&$B1&"×"&$C1)<2)
It concatenates each row's A, B and C values with a "×" delimiter, counts how many rows share that combination, and rejects the entry once the combination appears more than once.

PowerBI filter table based on value of measure_A OR measure_B [duplicate]

We are trying to implement a dashboard that displays various tables, metrics and a map where the dataset is a list of customers. The primary filter condition is the disjunction of two numeric fields. We want the user to be able to select a threshold for [field 1] and a separate threshold for [field 2], and then impose the condition [field 1] >= <threshold> OR [field 2] >= <threshold>.
After that, we want to also allow various other interactive slicers so the user can restrict the data further, e.g. by country or account manager.
Power BI naturally imposes AND between all filters and doesn't have a neat way to specify OR. Can you suggest a way to define a calculation using the two numeric fields that is then applied as a filter within the same interactive dashboard screen? Alternatively, is there a way to first prompt the user for the two threshold values before the dashboard is displayed -- so when they click Submit on that parameter-setting screen they are then taken to the main dashboard screen with the disjunction already applied?
Added in response to a comment:
The data can be quite simple: no complexity there. The complexity is in getting the user interface to enable a disjunction.
Suppose the data was a list of customers with customer id, country, gender, total value of transactions in the last 12 months, and number of purchases in last 12 months. I want the end-user (with no technical skills) to specify a minimum threshold for total value (e.g. $1,000) and number of purchases (e.g. 10) and then restrict the data set to those where total value of transactions in the last 12 months > $1,000 OR number of purchases in last 12 months > 10.
After doing that, I want to allow the user to see the data set on a dashboard (e.g. with a table and a graph) and from there select other filters (e.g. gender=male, country=Australia).
The key here is to create separate parameter tables and combine conditions using a measure.
Suppose we have the following Sales table:
Customer Value Number
-----------------------
A 568 2
B 2451 12
C 1352 9
D 876 6
E 993 11
F 2208 20
G 1612 4
Then we'll create two new tables to use as parameters. You could do a calculated table like
Number = VALUES(Sales[Number])
Or something more complex like
Value = GENERATESERIES(0, ROUNDUP(MAX(Sales[Value]),-2), ROUNDUP(MAX(Sales[Value]),-2)/10)
Or define the table manually using Enter Data or some other way.
In any case, once you have these tables, name their columns what you want (I used MinCount and MinValue) and write your filtering measure:
Filter = IF(MAX(Sales[Number]) > MIN(Number[MinCount]) ||
MAX(Sales[Value]) > MIN('Value'[MinValue]),
1, 0)
Then put your Filter measure as a visual-level filter where Filter is not 0, and use the MinCount and MinValue columns as slicers.
If you select 10 for MinCount and 1000 for MinValue, then your table should look like this:
Customer Value Number
-----------------------
B 2451 12
C 1352 9
E 993 11
F 2208 20
G 1612 4
Notice that C, E and G only exceed one of the thresholds, and that A and D are excluded.
To my knowledge, there is no such built-in slicer feature in Power BI at the time of writing. There is, however, a suggestion in the Power BI forum that requests functionality like this. If you'd be willing to use the Power Query Editor, it's easy to obtain the values you're looking for, but only with hard-coded values for your limits or thresholds.
Let me show you how for a synthetic dataset that should fit the structure of your description:
Dataset:
CustomerID,Country,Gender,TransactionValue12,NPurchases12
51,USA,M,3516,1
58,USA,M,3308,12
57,USA,M,7360,19
54,USA,M,2052,6
51,USA,M,4889,5
57,USA,M,4746,6
50,USA,M,3803,3
58,USA,M,4113,24
57,USA,M,7421,17
58,USA,M,1774,24
50,USA,F,8984,5
52,USA,F,1436,22
52,USA,F,2137,9
58,USA,F,9933,25
50,Canada,F,7050,16
56,Canada,F,7202,5
54,Canada,F,2096,19
59,Canada,F,4639,9
58,Canada,F,5724,25
56,Canada,F,4885,5
57,Canada,F,6212,4
54,Canada,F,5016,16
55,Canada,F,7340,21
60,Canada,F,7883,6
55,Canada,M,5884,12
60,UK,M,2328,12
52,UK,M,7826,1
58,UK,M,2542,11
56,UK,M,9304,3
54,UK,M,3685,16
58,UK,M,6440,16
50,UK,M,2469,13
57,UK,M,7827,6
Desktop table:
Here you see an Input table and a subset table using two Slicers. If the forum suggestion gets implemented, it should hopefully be easy to change a subset like below to an "OR" scenario:
Transaction Value > 1000 OR Number of purchases > 10 using Power Query:
If you use Edit Queries > Advanced filter you can set it up like this:
The last step under Applied Steps will then contain this formula:
= Table.SelectRows(#"Changed Type2", each [NPurchases12] > 10 or [TransactionValue12] > 1000)
Now your original Input table will look like this:
Now, if only we were able to replace the hardcoded 10 and 1000 with a dynamic value, for example from a slicer, we would be fine! But no...
I know this is not what you were looking for, but it was the best 'negative answer' I could find. I guess I'm hoping for a better solution just as much as you are!

Sort by shared variable

My report group displays the net cost of the top 10 clients. {#NetCost} is a shared variable that calculates sales cost minus all taxes and all discounts:
if travelType="OW" then
salesCost-Tax1-Tax2-Tax3-Discount1-Discount2
else
(salesCost-Tax1-Tax2-Tax3-Discount1-Discount2)/2
shared currencyvar netCostTotal:=netCostTotal+{#netCost};
I know how to sort by one database field, but how can I sort by a running shared variable?
   client#  client name  Net Cost
1  1234     XXXXX        150.22
2  2345     XXXXX        140.11
3  4567     XXXXX        120.00
Shared variables open up a lot of room for error, but it should still be possible to sort by their results: create a new formula field, and in its formula just put the shared netCost variable. From there you can insert a Group based on this new formula field. Depending on the timing of events in the report, you might need to use WhilePrintingRecords in this formula.
In this particular case, sorting on a Summary of the netCost value turned out to be the solution, which earned the correct answer.

Oracle PL/SQL with automatic values

Hi folks, I have a question regarding calculating the value of a column in Oracle.
So I have this table
NAME   PROCESS1  PROCESS2  WEIGHT  TOTAL_WEIGHT
ITEM1  0         0         10
ITEM2  1         1         10
ITEM3  1         1         15
So what I am trying to do here is generate the value of TOTAL_WEIGHT from PROCESS1 and PROCESS2 in PL/SQL, because later on I need to show the sum of the total weight on a PHP page. So for ITEM2 the total weight should be 20, and for ITEM3 it should be 30. Should I use a procedure to generate the value of the total weight? I want the value to be updated when the user changes the value of PROCESS1 or PROCESS2. Please help me, I am kind of a newbie here.
select name, process1, process2, weight, (process1 + process2) * weight as total_weight
from my_table
I see no reason that PL/SQL has to play a role in this, unless I misunderstood the requirement.
Declaring VIRTUAL (Computed) COLUMNS in Oracle Table Design
I'll agree with most of what has been said so far, with some additional elaboration. My starting table design looks similar, but it too is inaccurate for some use cases, as explained below.
CREATE TABLE "PROCESSED_PRODUCT_WEIGHT" (
"PRODUCT_NAME" VARCHAR2(40) NOT NULL,
"PROCESS1" NUMBER,
"PROCESS2" NUMBER,
"WEIGHT" NUMBER,
"TOTAL_WEIGHT" NUMBER GENERATED ALWAYS AS
((PROCESS1 + PROCESS2)*WEIGHT) VIRTUAL,
"RECORDED_DATE" DATE,
CONSTRAINT "PROCESSED_PRODUCT_WEIGHT_PK"
PRIMARY KEY ("PRODUCT_NAME", "RECORDED_DATE")
)
/
Previous Suggestions and Assumptions
Table Bound Attribute Properties: The table construct used by @Bob Jarvis is also known as a VIRTUAL COLUMN. It works well because the definition of TOTAL_WEIGHT is entirely dependent on other values contained within the same table.
SQL Query Associated Calculation: On the other hand, @Nishanthi Grashia and @OldProgrammer both recommend computing the value within each SQL query executed against the database.
BOTH cases may work, assuming that the mass per unit of the product does not change during the lifetime of the production cycle.
An example where this assumption does not hold is when the products consist of units of varying mass per unit volume.
Since it was not mentioned in the OP, consider this possibility:
Products ITEM1, ITEM2 and ITEM3 have variable weights per unit.
They are all produced in a coffee packaging plant.
Each item can be a type of coffee bean and its source.
"Processes" could be bean "treatments" such as decaffeination, roasting type or flavor infusion.
The "units" could be packaging of varying sizes. This would mean that package volumes would have a direct effect on the mass (called "weight") per product unit counted.
Test Cases for Identifying the Effect of Changing Unit Sizes
Each test case shows how a virtual column does not satisfy the possibility of variations in the unit sizes and masses of each product over time.
Test Case One: production observations made 2/14/2015.
Test Case Two: the mass per unit processed on 3/14/2014 is increased only, skewing the total mass produced, since the item quantities made previously are multiplied by a larger value through the virtual column definition.
Test Case Three:
Data Output and Results
The results of all three test cases show values that are not correct for the use cases created. They demonstrate that, for a changing weight value, the virtual/calculated column formula and approach gives incorrect results.
A Discussion of Alternate Solutions
The trigger approach may work for maintaining calculated values of TOTAL_WEIGHT: incremental changes (updates) are applied to the current, existing value as each component varies.
Force all DML through a single operation contained in a CRUD package. The problem with enforcing requirements through an embedded SQL statement is that other processes and their developers would need to know what your isolated PHP form/page does within your app in order to duplicate it for their own operations.
If there is a concern about overhead or possible locking of the main table, then consider introducing a composite key: PRODUCT_NAME + WEIGHT. This covers the problem so that quantities of the same product name are multiplied by their correct weight, and values already calculated remain unchanged even if the weight multiplier is modified (see the sketch after this list).
SOMETIMES, ALWAYS, NEVER... are popular assumptions thrown around in developers' project circles. How likely is this to happen at all? It depends... if you're a coffee bean packaging outfit, I'd say it's quite possible.
Onward!
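A rough sketch of that composite-key idea against the table defined above. This is one possible reading of the suggestion; the exact key columns are an assumption:
-- Re-key so each (product, weight, date) combination is its own row;
-- historical rows keep the weight they were recorded with, so their
-- virtual TOTAL_WEIGHT is unaffected by later changes to the unit weight.
ALTER TABLE PROCESSED_PRODUCT_WEIGHT
    DROP CONSTRAINT PROCESSED_PRODUCT_WEIGHT_PK;
ALTER TABLE PROCESSED_PRODUCT_WEIGHT
    ADD CONSTRAINT PROCESSED_PRODUCT_WEIGHT_PK
    PRIMARY KEY (PRODUCT_NAME, WEIGHT, RECORDED_DATE);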
From what I understand, you want the field TOTAL_WEIGHT to be updated whenever the value of PROCESS1 or PROCESS2 changes, so ideally you would use a TRIGGER for this.
TRIGGERS are used to fire an action based on an initial event.
So, for your case, the initial event is "the value of PROCESS1 or PROCESS2 changes" and the expected action is "automatically update the TOTAL_WEIGHT field based on the changed values."
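For illustration, a minimal version of such a trigger might look like this (a sketch assuming the question's table is named MY_TABLE and TOTAL_WEIGHT is a stored, non-virtual column):
CREATE OR REPLACE TRIGGER TRG_MY_TABLE_TOTAL_WEIGHT
BEFORE INSERT OR UPDATE OF PROCESS1, PROCESS2, WEIGHT ON MY_TABLE
FOR EACH ROW
BEGIN
    -- Recompute the stored total whenever any of its inputs change.
    :NEW.TOTAL_WEIGHT := (:NEW.PROCESS1 + :NEW.PROCESS2) * :NEW.WEIGHT;
END;
/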
But, for your requirement, a trigger is overkill and adds unnecessary overhead. Instead of having an additional field in the table, use a SELECT query as below, which calculates the value at run time and displays the real-time value:
SELECT NAME,
PROCESS1,
PROCESS2,
WEIGHT ,
(WEIGHT * (PROCESS1 + PROCESS2)) AS TOTAL_WEIGHT
FROM MY_TABLE
The output would be:
NAME | PROCESS1 | PROCESS2 | WEIGHT | TOTAL_WEIGHT
------------------------------------------------------------
ITEM1 | 0 | 0 | 10 | 0
ITEM2 | 1 | 1 | 10 | 20
ITEM3 | 1 | 1 | 15 | 30
You can then read TOTAL_WEIGHT with something like resultSet.getLong("TOTAL_WEIGHT");
Or, if you are very particular about having the field, then you can modify your update query to include:
UPDATE MY_TABLE SET FIELD1=VALUE1, FIELD2=VALUE2, ... ,
TOTAL_WEIGHT = (WEIGHT * (PROCESS1 + PROCESS2))
WHERE SOME_CONDITION;
If you're using 11g or later, the safest way to handle this would be to make TOTAL_WEIGHT a computed column. The CREATE TABLE statement would then become something like
CREATE TABLE MY_TABLE
(PROCESS1 NUMBER,
PROCESS2 NUMBER,
WEIGHT NUMBER,
TOTAL_WEIGHT NUMBER GENERATED ALWAYS AS (NVL((PROCESS1+PROCESS2)*WEIGHT, 0)));
Done this way applications don't need to know how to compute TOTAL_WEIGHT - it's always done correctly.
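A quick usage sketch (illustrative values; a virtual column is derived by Oracle and cannot be written directly):
INSERT INTO MY_TABLE (PROCESS1, PROCESS2, WEIGHT) VALUES (1, 1, 10);
UPDATE MY_TABLE SET PROCESS2 = 2 WHERE WEIGHT = 10;
SELECT TOTAL_WEIGHT FROM MY_TABLE;  -- returns (1 + 2) * 10 = 30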
SQLFiddle here.
Share and enjoy.
