Oracle PL/SQL with automatic values

Hi folks, I have a question about calculating the value in a column in Oracle.
I have this table:
NAME   PROCESS1  PROCESS2  WEIGHT  TOTAL_WEIGHT
ITEM1  0         0         10
ITEM2  1         1         10
ITEM3  1         1         15
What I am trying to do here is generate the value in TOTAL_WEIGHT based on PROCESS1 and PROCESS2 in PL/SQL, because later on I need to show the sum of the total weight on a PHP page. So for ITEM2 the total weight should be 20 and for ITEM3 it should be 30. Should I use a procedure to generate the value in TOTAL_WEIGHT? I want the value to be updated whenever a user changes the value in PROCESS1 or PROCESS2. Please help me, I am kind of a newbie here.

select name, process1, process2, weight, (process1+process2)*weight total_weight
from table
I see no reason that PL/SQL has to play a role in this, unless I misunderstood the requirement.

Declaring VIRTUAL (Computed) COLUMNS in Oracle Table Design
I'll agree with most of what has been said so far, with some additional elaboration. My starting table design looks similar, but it too is inaccurate for some use cases, as explained below.
CREATE TABLE "PROCESSED_PRODUCT_WEIGHT" (
"PRODUCT_NAME" VARCHAR2(40) NOT NULL,
"PROCESS1" NUMBER,
"PROCESS2" NUMBER,
"WEIGHT" NUMBER,
"TOTAL_WEIGHT" NUMBER GENERATED ALWAYS AS
((PROCESS1 + PROCESS2)*WEIGHT) VIRTUAL,
"RECORDED_DATE" DATE,
CONSTRAINT "PROCESSED_PRODUCT_WEIGHT_PK"
PRIMARY KEY ("PRODUCT_NAME", "RECORDED_DATE")
)
/
Previous Suggestions and Assumptions
Table Bound Attribute Properties: The table construct used by @Bob Jarvis is also known as a VIRTUAL COLUMN. It works well because the definition of TOTAL_WEIGHT is entirely dependent on other values contained within the same table.
SQL Query Associated Calculation: On the other hand, @Nishanthi Grashia and @OldProgrammer both recommend calculating the value within each SQL query executed against the database.
Both cases may work, assuming that the mass per unit of the product does not change during the lifetime of the production cycle.
This assumption breaks down, for example, if the products consist of units of varying mass per unit volume.
Since it was not mentioned in the OP, consider this possibility:
Products ITEM1, ITEM2 and ITEM3 have variable weights per unit.
They are all produced in a coffee packaging plant.
Each item can be a type of coffee bean and its source.
"Processes" could be bean "treatments" such as decaffeination, roasting type or flavor infusion.
The "units" could be packaging of varying sizes. This would mean that package volumes would have a direct effect on the mass (called "weight") per product unit counted.
Test Cases for Identifying the Effect of Changing Unit Sizes
Each test case shows how a virtual column does not satisfy the possibility of variations in the unit sizes and masses of each product over time.
Test Case One:
For production observations made 2/14/2015.
Test Case Two:
The mass per unit processed on 3/14/2014 is increased only, skewing the total mass produced, since the item quantities recorded previously are multiplied by the larger value through the virtual column definition.
Test Case Three:
Data Output and Results
The test results associated with all three test cases show that the resulting values are not correct for the use cases created. They demonstrate that, for a changing weight value, the virtual/calculated column formula and approach gives incorrect results.
A Discussion of Alternate Solutions
The trigger approach may work for maintaining the calculated TOTAL_WEIGHT values: incremental changes (updates) are applied to the current, existing value as each component varies.
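A minimal sketch of what such a trigger could look like, assuming TOTAL_WEIGHT is an ordinary stored column (not the virtual column above) and using the question's column names; the table and trigger names here are hypothetical:
-- Illustrative only: keep TOTAL_WEIGHT in sync whenever its inputs change.
CREATE OR REPLACE TRIGGER TRG_MY_TABLE_TOTAL_WEIGHT
  BEFORE INSERT OR UPDATE OF PROCESS1, PROCESS2, WEIGHT ON MY_TABLE
  FOR EACH ROW
BEGIN
  :NEW.TOTAL_WEIGHT := (NVL(:NEW.PROCESS1, 0) + NVL(:NEW.PROCESS2, 0))
                       * NVL(:NEW.WEIGHT, 0);
END;
/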
Alternatively, force all DML through a single operation contained in a CRUD package. The problem with enforcing the requirement through an embedded SQL statement is that other processes and their developers will need to be familiar with what your isolated PHP form/page does within your app in order to duplicate it for their own operations.
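A hedged sketch of what such a CRUD entry point might look like (the package, procedure, table and parameter names are all hypothetical):
-- Illustrative only: one place that performs the update and maintains TOTAL_WEIGHT.
CREATE OR REPLACE PACKAGE PRODUCT_WEIGHT_API AS
  PROCEDURE SET_PROCESSES(P_NAME IN VARCHAR2, P_PROCESS1 IN NUMBER, P_PROCESS2 IN NUMBER);
END PRODUCT_WEIGHT_API;
/
CREATE OR REPLACE PACKAGE BODY PRODUCT_WEIGHT_API AS
  PROCEDURE SET_PROCESSES(P_NAME IN VARCHAR2, P_PROCESS1 IN NUMBER, P_PROCESS2 IN NUMBER) IS
  BEGIN
    UPDATE MY_TABLE
       SET PROCESS1     = P_PROCESS1,
           PROCESS2     = P_PROCESS2,
           TOTAL_WEIGHT = (P_PROCESS1 + P_PROCESS2) * WEIGHT
     WHERE NAME = P_NAME;
  END SET_PROCESSES;
END PRODUCT_WEIGHT_API;
/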
If there is a concern about overhead or possible locking of the main table, then consider introducing a composite key: PRODUCT_NAME + WEIGHT. This covers the problem so that quantities of the same product name are multiplied by their correct weight, and values already calculated remain unchanged even if the weight multiplier is modified.
SOMETIMES, ALWAYS, NEVER... are popular assumptions thrown around in developers' project circles. How likely is this to happen at all? It depends... if you're a coffee bean packaging outfit, I'd say it's quite possible.
Onward!

From what I understand, you want the TOTAL_WEIGHT field to be updated whenever the value in PROCESS1 or PROCESS2 changes. Ideally, you would use a TRIGGER for this.
TRIGGERS are used to fire an action based on an initial event.
So, for your case, the initial event is "the value of PROCESS1 or PROCESS2 changes" and the expected action is "automatically update the TOTAL_WEIGHT field based on the changed values."
For your requirement, however, a trigger is unnecessary overhead. Instead of having an additional field in the table, use a SELECT query like the one below, which calculates the value at run time and displays the real-time result.
SELECT NAME,
PROCESS1,
PROCESS2,
WEIGHT ,
(WEIGHT * (PROCESS1 + PROCESS2)) AS TOTAL_WEIGHT
FROM MY_TABLE
The output would be:
NAME  | PROCESS1 | PROCESS2 | WEIGHT | TOTAL_WEIGHT
------+----------+----------+--------+-------------
ITEM1 |    0     |    0     |   10   |      0
ITEM2 |    1     |    1     |   10   |     20
ITEM3 |    1     |    1     |   15   |     30
You can then read TOTAL_WEIGHT from the result set, for example resultSet.getLong("TOTAL_WEIGHT") in JDBC (or the equivalent fetch call in PHP).
Or, if you are very particular about having the field stored, you can modify your UPDATE query to include:
UPDATE MY_TABLE SET FIELD1=VALUE1, FIELD2=VALUE2, ... ,
TOTAL_WEIGHT = (WEIGHT * (PROCESS1 + PROCESS2))
WHERE SOME_CONDITION;

If you're using 11g or later, the safest way to handle this would be to make TOTAL_WEIGHT a computed column. The CREATE TABLE statement would then become something like
CREATE TABLE MY_TABLE
(PROCESS1 NUMBER,
PROCESS2 NUMBER,
WEIGHT NUMBER,
TOTAL_WEIGHT NUMBER GENERATED ALWAYS AS (NVL((PROCESS1+PROCESS2)*WEIGHT, 0)));
Done this way, applications don't need to know how to compute TOTAL_WEIGHT; it's always done correctly.
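For example (a minimal sketch against the table definition above; the sample values are illustrative):
-- TOTAL_WEIGHT is omitted from the INSERT and maintained by the database.
INSERT INTO MY_TABLE (PROCESS1, PROCESS2, WEIGHT) VALUES (1, 1, 15);
SELECT PROCESS1, PROCESS2, WEIGHT, TOTAL_WEIGHT FROM MY_TABLE;
-- The row comes back with TOTAL_WEIGHT = 30, i.e. (1 + 1) * 15.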
SQLFiddle here.
Share and enjoy.

Related

Tableau - Aggregate and non-aggregate error for divide formula

I used COUNT(CUST_ID) as a measure value to come up with [Total No of Customer]. When I created a new measure for [Average Profit per customer] with the formula [Total Profit] / [Total No of Customer], the aggregate and non-aggregate error was prompted.
DB level:
Cust ID   Profit
123       100
234       500
345       350
567       505
You must be looking for the AVG aggregate function.
SELECT cust_id, AVG(profit)
FROM your_table
GROUP BY cust_id;
Cheers!!
In your database table, you appear to have one data row per customer. Customer ID is serving as a unique primary key. The level of detail (or granularity) of the database table is the customer.
Given that, the simplest solution to your question is to display AVG([Profit]) -- without having [Cust ID] in the view (i.e. not on any shelf)
If the assumptions mentioned above are not correct, then you may need to employ other methods depending on how you define your question. I suggest making sure you understand what COUNT() actually does compared to COUNTD(). The behavior is not what people tend to assume. LOD calculations may prove useful. All described in the online help.
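In SQL terms (an illustrative analogue only; the table name here is hypothetical), COUNT() counts rows with a non-null value while COUNTD() corresponds to a distinct count:
-- COUNT vs. distinct COUNT, the SQL equivalent of Tableau's COUNT() vs. COUNTD()
SELECT COUNT(cust_id)          AS row_count,
       COUNT(DISTINCT cust_id) AS distinct_customers
FROM   customer_profit;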
Put the calculations directly in the calculated field as:
SUM([Profit])/COUNT([CUST_ID])
This gives you an aggregate divided by an aggregate, which avoids the error.
If you want to show average profit using a key like [CUST_ID], you can use an LOD expression:
{FIXED [CUST_ID]: AVG([Profit])}

PowerBI filter table based on value of measure_A OR measure_B [duplicate]

We are trying to implement a dashboard that displays various tables, metrics and a map where the dataset is a list of customers. The primary filter condition is the disjunction of two numeric fields. We want the user to be able to select a threshold for [field 1] and a separate threshold for [field 2] and then impose the condition [field 1] >= <threshold> OR [field 2] >= <threshold>.
After that, we want to also allow various other interactive slicers so the user can restrict the data further, e.g. by country or account manager.
Power BI naturally imposes AND between all filters and doesn't have a neat way to specify OR. Can you suggest a way to define a calculation using the two numeric fields that is then applied as a filter within the same interactive dashboard screen? Alternatively, is there a way to first prompt the user for the two threshold values before the dashboard is displayed -- so when they click Submit on that parameter-setting screen they are then taken to the main dashboard screen with the disjunction already applied?
Added in response to a comment:
The data can be quite simple: no complexity there. The complexity is in getting the user interface to enable a disjunction.
Suppose the data was a list of customers with customer id, country, gender, total value of transactions in the last 12 months, and number of purchases in last 12 months. I want the end-user (with no technical skills) to specify a minimum threshold for total value (e.g. $1,000) and number of purchases (e.g. 10) and then restrict the data set to those where total value of transactions in the last 12 months > $1,000 OR number of purchases in last 12 months > 10.
After doing that, I want to allow the user to see the data set on a dashboard (e.g. with a table and a graph) and from there select other filters (e.g. gender=male, country=Australia).
The key here is to create separate parameter tables and combine conditions using a measure.
Suppose we have the following Sales table:
Customer Value Number
-----------------------
A 568 2
B 2451 12
C 1352 9
D 876 6
E 993 11
F 2208 20
G 1612 4
Then we'll create two new tables to use as parameters. You could do a calculated table like
Number = VALUES(Sales[Number])
Or something more complex like
Value = GENERATESERIES(0, ROUNDUP(MAX(Sales[Value]),-2), ROUNDUP(MAX(Sales[Value]),-2)/10)
Or define the table manually using Enter Data or some other way.
In any case, once you have these tables, name their columns what you want (I used MinCount and MinValue) and write your filtering measure
Filter = IF(MAX(Sales[Number]) > MIN(Number[MinCount]) ||
MAX(Sales[Value]) > MIN('Value'[MinValue]),
1, 0)
Then put your Filter measure as a visual-level filter where Filter is not 0, and use the MinCount and MinValue columns as slicers.
If you select 10 for MinCount and 1000 for MinValue then your table should look like this:
Notice that E and G only exceed one of the thresholds and that A and D are excluded.
To my knowledge, there is no such built-in slicer feature in Power BI at the time of writing. There is, however, a suggestion in the Power BI forum that requests functionality like this. If you're willing to use the Power Query Editor, it's easy to obtain the values you're looking for, but only with hard-coded values for your limits or thresholds.
Let me show you how for a synthetic dataset that should fit the structure of your description:
Dataset:
CustomerID,Country,Gender,TransactionValue12,NPurchases12
51,USA,M,3516,1
58,USA,M,3308,12
57,USA,M,7360,19
54,USA,M,2052,6
51,USA,M,4889,5
57,USA,M,4746,6
50,USA,M,3803,3
58,USA,M,4113,24
57,USA,M,7421,17
58,USA,M,1774,24
50,USA,F,8984,5
52,USA,F,1436,22
52,USA,F,2137,9
58,USA,F,9933,25
50,Canada,F,7050,16
56,Canada,F,7202,5
54,Canada,F,2096,19
59,Canada,F,4639,9
58,Canada,F,5724,25
56,Canada,F,4885,5
57,Canada,F,6212,4
54,Canada,F,5016,16
55,Canada,F,7340,21
60,Canada,F,7883,6
55,Canada,M,5884,12
60,UK,M,2328,12
52,UK,M,7826,1
58,UK,M,2542,11
56,UK,M,9304,3
54,UK,M,3685,16
58,UK,M,6440,16
50,UK,M,2469,13
57,UK,M,7827,6
Desktop table:
Here you see an Input table and a subset table using two Slicers. If the forum suggestion gets implemented, it should hopefully be easy to change a subset like below to an "OR" scenario:
Transaction Value > 1000 OR Number of purchases > 10 using Power Query:
If you use Edit Queries > Advanced filter you can set it up like this:
The last step under Applied Steps will then contain this formula:
= Table.SelectRows(#"Changed Type2", each [NPurchases12] > 10 or [TransactionValue12] > 1000)
Now your original Input table will look like this:
Now, if only we were able to replace the hardcoded 10 and 1000 with a dynamic value, for example from a slicer, we would be fine! But no...
I know this is not what you were looking for, but it was the best 'negative answer' I could find. I guess I'm hoping for a better solution just as much as you are!

Issue with OBIEE 11G. Records being counted twice

My development team is working with OBIEE 11G to make an analysis as follows:
Discover which policyholders are in alert. An alert is defined as follows: the quantity of claims of a policyholder exceeds a certain threshold. If the policyholder has at least one alert in one of its claims, then the policyholder is in alert. The problem is that those thresholds are defined for a particular key (a combination of type of client, age range, type of pathology and other attributes), and a policyholder can have many keys and a threshold for each key, so the quantity of claims varies. Something like this:
Policyholder   Key   #Claims   Threshold
ABC123         XYZ   3         4
               WQE   3         2
EFG456         ABC   1         2
The ABC123 policyholder has 6 claims in total, 3 for the key XYZ (which has a threshold of 4) and 3 for the key WQE (which has a threshold of 2). On the other hand, the EFG456 policyholder has 1 claim for the key ABC that has a threshold of 2. So in this case, ABC123 policyholder should be in alert because the quantity of claims for the key WQE is greater than the threshold.
So, in OBIEE 11G my team added two columns, one to mark the records in alert and one to mark the records which are not in alert. Like this:
Policyholder   Key   #Claims   Threshold   Alert   notAlert
ABC123         XYZ   3         4           0       1
               WQE   3         2           1       0
EFG456         ABC   1         2           0       1
You see the problem now? OBIEE 11G does not see policyholder ABC123 as a unit and marks it both as in alert and not in alert, which is wrong. The correct info should be:
Policyholder   Key   #Claims   Threshold   Alert   notAlert
ABC123         XYZ   3         4           0       0
               WQE   3         2           1       0
EFG456         ABC   1         2           0       1
Because, it doesn't matter if the policyholder did not reach the alert for key XYZ. If an alert is discovered, the complete file of the policyholder is examined to resolve the alert.
Is there any way of telling this to OBIEE 11G?
Please help!!
I think this is a dimensional modeling problem instead of an OBIEE one:
In order to help I will make a few assumptions:
PolicyHolder and Key are separate dimensions: Although the "key" dimension contains some attributes from the policyholder, such as type of client and age group, it also combines other entities like pathology, and to me that is enough to consider it at least a mini-dimension.
The "is in alert" flag can be modeled as a factless fact table: It looks like you only need to know if a particular policyholder is in alert; there is no metric associated with the event and you only need a flag that is either 0 or 1. This can be solved with a simple table that includes at least 3 columns: FK_POLICYHOLDER, FK_DATE and the flag. You already have a flag, but it is included in the claims table as a calculated column; if you model this flag as a separate table you will have control of the dimensionality and granularity of the alert. See What dw model is appropriate when there's no measure?.
The metric "number of claims" has a different dimensionality than the alert flag: I think the crux of the problem is that flags are calculated at the key level but for reporting purposes are only needed at the policyholder level. If you want alerts to be assigned to a policyholder "as a unit", then you need a fact table that is linked to the PolicyHolder dimension and NOT LINKED to the Key dimension.
Concretely:
Create a separate dimension table for your "Key" entity (type of client, pathology, etc.)
Create a new factless fact table that contains alerts for a policyholder; this table should not link to the Key dimension (see the sketch after this list).
Change the "Alert" column in your report, you should get that value from the flag counter of your new factless fact table.
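A hedged DDL sketch of what that factless fact table might look like (table and column names are illustrative, not taken from your model):
-- One row per policyholder per date on which an alert was raised.
CREATE TABLE FACT_POLICYHOLDER_ALERT (
  FK_POLICYHOLDER  NUMBER    NOT NULL,  -- FK to the PolicyHolder dimension
  FK_DATE          NUMBER    NOT NULL,  -- FK to the Date dimension
  ALERT_FLAG       NUMBER(1) DEFAULT 1 NOT NULL,
  CONSTRAINT FACT_PH_ALERT_PK PRIMARY KEY (FK_POLICYHOLDER, FK_DATE)
);
Note there is deliberately no foreign key to the Key dimension, so an alert is counted once per policyholder regardless of how many keys triggered it.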
Firstly, the ALERT columns seem redundant. It's an incredibly simple calculation that would be better done by OBI dynamically. That way you can check for policy holders in alert (on the aggregate of their keys) or for each key.
If I wanted to fix that calculation in OBI I would do it with a calculated logical column in the BMM (based on other logical columns) simply evaluating CLAIMS against THRESHOLD:
CASE WHEN CLAIMS >= THRESHOLD THEN 1 ELSE 0 END
That way the flag can work at multiple levels (either for POLICYHOLDER or KEY). But it seems a very simple calculation that could just be done in the Analysis as a filter (or selection step).
Even simpler though (assuming you have CLAIMS and THRESHOLD as measure columns with a SUM aggregation, and POLICYHOLDER and KEY as dimension column) would be to ignore any sort of alert column altogether. If you don't bring KEY into the Analysis OBI would give you each policy holder, their total claims and total threshold. You could then use selection steps or a filter in the criteria to remove those not over the threshold.

Sqlite view vs plain select statement performance

I have a simple table (with about 8 columns and a LOT of rows) in a SQLite database. There is a single program that runs as a service and performs selects, updates and inserts on the table quite often (approximately every 5 minutes). The selects are used only to determine which rows are to be updated, and they are based on a column that holds boolean values (probably translated to integer internally by SQLite).
There is also a web application that performs selects (always with a GROUP BY clause) whenever a web user wishes to view part of the data.
There are two ways to ask for data through the web application: (a) predefined filters (i.e. the WHERE clause has specific conditions on 3 specific columns) and (b) custom filters (i.e. the user chooses the values for the conditions, but the columns participating in the WHERE clause are the same as in (a)). As mentioned, in both cases there is a GROUP BY operation.
I am wondering whether using a view or a custom function might increase the performance. Currently, a "custom" select may take more than 30 seconds to complete - and that's before any data has been sent back to the user.
EDIT:
Using EXPLAIN QUERY PLAN on a "predefined" select statement yields only one row:
0|0|TABLE mytable
Using EXPLAIN on the same query, yields the following:
0|OpenVirtual|1|4|keyinfo(2,-BINARY,BINARY)
1|OpenVirtual|2|3|keyinfo(1,BINARY)
2|MemInt|0|5|
3|MemInt|0|4|
4|Goto|0|27|
5|MemInt|1|5|
6|Return|0|0|
7|IfMemPos|4|9|
8|Return|0|0|
9|AggFinal|0|0|count(0)
10|AggFinal|2|1|sum(1)
11|MemLoad|0|0|
12|MemLoad|1|0|
13|MemLoad|2|0|
14|MakeRecord|3|0|
15|MemLoad|0|0|
16|MemLoad|1|0|
17|Sequence|1|0|
18|Pull|3|0|
19|MakeRecord|4|0|
20|IdxInsert|1|0|
21|Return|0|0|
22|MemNull|1|0|
23|MemNull|3|0|
24|MemNull|0|0|
25|MemNull|2|0|
26|Return|0|0|
27|Gosub|0|22|
28|Goto|0|82|
29|Integer|0|0|
30|OpenRead|0|2|
31|SetNumColumns|0|9|
32|Rewind|0|48|
33|Column|0|8|
34|String8|0|0|123456789
35|Le|356|39|collseq(BINARY)
36|Column|0|3|
37|Integer|180|0|
38|Gt|100|42|collseq(BINARY)
39|Column|0|7|
40|Integer|1|0|
41|Ne|356|47|collseq(BINARY)
42|Column|0|6|
43|Sequence|2|0|
44|Column|0|3|
45|MakeRecord|3|0|
46|IdxInsert|2|0|
47|Next|0|33|
48|Close|0|0|
49|Sort|2|69|
50|Column|2|0|
51|MemStore|7|0|
52|MemLoad|6|0|
53|Eq|512|58|collseq(BINARY)
54|MemMove|6|7|
55|Gosub|0|7|
56|IfMemPos|5|69|
57|Gosub|0|22|
58|AggStep|0|0|count(0)
59|Column|2|2|
60|Integer|30|0|
61|Add|0|0|
62|ToReal|0|0|
63|AggStep|2|1|sum(1)
64|Column|2|0|
65|MemStore|1|1|
66|MemInt|1|4|
67|Next|2|50|
68|Gosub|0|7|
69|OpenPseudo|3|0|
70|SetNumColumns|3|3|
71|Sort|1|80|
72|Integer|1|0|
73|Column|1|3|
74|Insert|3|0|
75|Column|3|0|
76|Column|3|1|
77|Column|3|2|
78|Callback|3|0|
79|Next|1|72|
80|Close|3|0|
81|Halt|0|0|
82|Transaction|0|0|
83|VerifyCookie|0|1|
84|Goto|0|29|
85|Noop|0|0|
The SELECT I used was the following:
SELECT
COUNT(*) as number,
field1,
SUM(CAST(filter2 +30 AS float)) as column2
FROM
mytable
WHERE
(filter1 > '123456789' AND filter2 > 180)
OR filter3=1
GROUP BY
field1
ORDER BY
number DESC, field1;
Whenever you're going to be doing comparisons on a non-primary-key field, it's a good design idea to add an index on the field(s). Too many, however, can cause INSERTs to crawl, so plan accordingly.
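For instance, a hedged sketch using the filter columns from the query above (the index names are made up; verify the benefit against your own workload):
-- Covers the WHERE clause of the "custom" selects.
CREATE INDEX idx_mytable_filters ON mytable (filter1, filter2, filter3);
-- Helps the GROUP BY on field1.
CREATE INDEX idx_mytable_field1 ON mytable (field1);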
Also, if you have simple fields such as ones that only hold a boolean value, you may want to consider declaring it as an INTEGER instead of whatever you declared it as. Declaring it as any type not specifically defined by SQLite will cause it to default to a NUMERIC type which will take longer to compare values because it will store it internally as a double and will use the floating-point math processor instead of the integer math processor.
IMO, the GROUP BY sorting directive is sometimes a dead giveaway to an unoptimized query; its methodology involves eliminating redundant data which could have been eliminated beforehand if it hadn't been pulled out of the database to begin with.
EDIT:
I saw your query, and there are some simple things you can do to optimize it (a combined sketch follows the list):
SUM(CAST(filter2 +30 AS float)) is inefficient; why are you casting it as a float? Why not just SUM it then add 30 * the COUNT?
filter1 > '123456789' - Why the string comparison? Why not just use integer comparison?
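Putting both suggestions together might look like this (an untested sketch; it assumes filter1 holds numeric data, and it drops the float cast since the added constant can be applied after summing):
SELECT
    COUNT(*)                     AS number,
    field1,
    SUM(filter2) + 30 * COUNT(*) AS column2   -- add 30 per row after summing, no CAST needed
FROM
    mytable
WHERE
    (filter1 > 123456789 AND filter2 > 180)   -- integer comparison instead of string
    OR filter3 = 1
GROUP BY
    field1
ORDER BY
    number DESC, field1;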

Having more than 50 column in a SQL table

I have designed my database in such a way that one of my tables contains 52 columns. All the attributes are tightly associated with the primary key attribute, so there is no scope for further normalization.
Please let me know: if the same kind of situation arises and you don't want to keep so many columns in a single table, what are the other options?
It is not odd in any way to have 50 columns. ERP systems often have 100+ columns in some tables.
One thing you could look into is to ensure most columns have valid default values (NULL, today, etc.). That will simplify inserts.
Also ensure your code always specifies the columns (i.e. no SELECT *). Any kind of future optimization will include indexes with a subset of the columns.
One approach we used once is to split the table into two tables. Both of these tables get the primary key of the original table. In the first table, you put your most frequently used columns, and in the second table you put the lesser-used columns. Generally the first one should be smaller. You can now speed up things in the first table with various indices. In our design, we even had the first table running on the memory engine (RAM), since we only had read queries. If you need the combination of columns from table1 and table2, you join both tables on the primary key.
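A hedged sketch of that split (the table and column names are invented purely for illustration):
-- "Hot" columns that are queried constantly.
CREATE TABLE product_hot (
  product_id INT PRIMARY KEY,
  status     VARCHAR(20),
  total      DECIMAL(12,2)
);
-- Rarely used columns, sharing the same primary key.
CREATE TABLE product_cold (
  product_id INT PRIMARY KEY REFERENCES product_hot (product_id),
  notes      VARCHAR(4000),
  legacy_ref VARCHAR(100)
);
-- Recombine only when both sets of columns are needed:
-- SELECT h.*, c.notes FROM product_hot h JOIN product_cold c ON c.product_id = h.product_id;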
A table with fifty-two columns is not necessarily wrong. As others have pointed out many databases have such beasts. However I would not consider ERP systems as exemplars of good data design: in my experience they tend to be rather the opposite.
Anyway, moving on!
You say this:
"All the attributes are tightly associated with the primary key
attribute"
Which means that your table is in third normal form (or perhaps BCNF). That being the case it's not true that no further normalisation is possible. Perhaps you can go to fifth normal form?
Fifth normal form is about removing join dependencies. All your columns are dependent on the primary key, but there may also be dependencies between columns: e.g., there are multiple values of COL42 associated with each value of COL23. A join dependency means that when we add a new value of COL23 we end up inserting several records, one for each value of COL42. The Wikipedia article on 5NF has a good worked example.
I admit not many people go as far as 5NF, and it might well be that even with fifty-two columns your table is already in 5NF. But it's worth checking, because if you can break out one or two subsidiary tables you'll have improved your data model and made your main table easier to work with.
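A hedged illustration of what breaking out such a subsidiary table could look like, using the hypothetical COL23/COL42 columns mentioned above:
-- The COL23/COL42 pairing lives in its own table instead of being repeated
-- across every wide row.
CREATE TABLE col23_col42 (
  col23 INT NOT NULL,
  col42 INT NOT NULL,
  PRIMARY KEY (col23, col42)
);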
Another option is the "item-result pair" (IRP) design over the "multi-column table" (MCT) design, especially if you'll be adding more columns from time to time.
MCT_TABLE
---------
KEY_col(s)
Col1
Col2
Col3
...
IRP_TABLE
---------
KEY_col(s)
ITEM
VALUE
select * from IRP_TABLE;
KEY_COL ITEM VALUE
------- ---- -----
1 NAME Joe
1 AGE 44
1 WGT 202
...
IRP is a bit harder to use, but much more flexible.
I've built very large systems using the IRP design and it can perform well even for massive data. In fact, it behaves somewhat like a column-organized DB, as you only pull in the rows you need (i.e. less I/O) rather than an entire wide row when you only need a few columns (i.e. more I/O).
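If you do need the wide shape back, a conditional-aggregation pivot over the IRP rows is one way to do it (a sketch based on the example rows above):
-- Rebuild one wide row per key from the item/value pairs.
SELECT key_col,
       MAX(CASE WHEN item = 'NAME' THEN value END) AS name,
       MAX(CASE WHEN item = 'AGE'  THEN value END) AS age,
       MAX(CASE WHEN item = 'WGT'  THEN value END) AS wgt
FROM   irp_table
GROUP  BY key_col;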
