BIRT: How to include NULL values in crosstab average function

When creating aggregations on crosstabs, BIRT ignores NULL values, which leads to incorrect values for AVERAGE. How can I replace NULL with zero? My data comes from a stored procedure query. Thank you!
Leo

So you want the average of [4, null, null, null] to be 1? I think BIRT's result of 4 is correct, not that.
But anyway, you could add a computed output column XXX_nvl0 to your dataset, computed as
( row["XXX"] == null ? 0.0 : row["XXX"] )
Then compute the average of XXX_nvl0 instead of XXX.
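Alternatively, since the data comes from a stored procedure, you could replace the NULLs in the query itself. A minimal sketch, assuming a hypothetical AMOUNT column in the procedure's result set (COALESCE is standard SQL; Oracle's NVL works the same way):
-- Hypothetical column and table names; substitute the procedure's actual output.
SELECT COALESCE(amount, 0) AS amount_nvl0
FROM   your_source_table;
Then aggregate amount_nvl0 in the crosstab instead.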

Related

Perform analysis on an array/collection type

In Oracle PL/SQL, if I have an array or collection type filled with numbers, how can I get the average, stddev, or the result of any other mathematical operation?
I can figure out a horrible way of doing the average by looping through and storing the running total, count, and average, but is there something like
avg := my_collection.avg;
std := my_collection.stddev;
?
You can cast the collection into a TABLE:
select *
from TABLE ( cast( some_data as myTableType ) )
And then use the normal aggregate and analytic functions.
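A minimal end-to-end sketch, reusing the names from the snippet above (myTableType, some_data), and assuming a schema-level SQL collection type; the TABLE() trick requires a SQL type, not a PL/SQL-only type:
CREATE TYPE myTableType AS TABLE OF NUMBER;
/
DECLARE
  some_data myTableType := myTableType(1, 2, 3, 4);
  v_avg     NUMBER;
  v_stddev  NUMBER;
BEGIN
  -- COLUMN_VALUE is the implicit column name for a collection of scalars
  SELECT AVG(column_value), STDDEV(column_value)
    INTO v_avg, v_stddev
    FROM TABLE(CAST(some_data AS myTableType));
  DBMS_OUTPUT.PUT_LINE('avg=' || v_avg || ' stddev=' || v_stddev);
END;
/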

How to filter by measure values in MDX while having dimension members on both axes

I'm developing an application that uses a tabular database to show some business data.
I need to provide some basic filtering over measure values (equal to, greater than, less than, etc.) and I'm currently analyzing the proper way to generate the MDX.
Looking at some documentation (and other threads on this site), I found that the most efficient approach would be to use the FILTER or HAVING functions to filter out undesired values.
Unfortunately, all examples normally put measures on one axis and dimension members on the other, but I potentially have dimension members on both axes and can't find a proper way to use those functions to filter by measure value.
What have I done so far?
To make it easier to explain, let's say we want to get the yearly sales quantities by product class, filtering for quantity > 1.3 million.
Trying the HAVING and FILTER functions, the MDX I came up with is
SELECT
NON EMPTY {[YearList].[Year].[Year].MEMBERS * [Measures].[Qty]}
HAVING [Measures].[Qty] > 1.3e6 ON COLUMNS,
NON EMPTY {[Classes].[cClass].[cClass].MEMBERS}
HAVING [Measures].[Qty] > 1.3e6 ON ROWS
FROM [Model]
or
SELECT
NON EMPTY FILTER({[YearList].[Year].[Year].MEMBERS * [Measures].[Qty]},
[Measures].[Qty] > 1.3e6) ON COLUMNS,
NON EMPTY FILTER({[Classes].[cClass].[cClass].MEMBERS} ,
[Measures].[Qty] > 1.3e6) ON ROWS
FROM [Model]
But this of course leads to unexpected results for the end user, because the filter is applied to the quantity aggregated over the dimension on that axis only, which is greater than 1.3M.
The only way I have found so far to achieve what I need is to define a calculated member with an IIF statement:
WITH
MEMBER [Measures].[FilteredQty] AS
IIF ( [Measures].[Qty] > 1.3e6, [Measures].[Qty], NULL)
SELECT
NON EMPTY {[YearList].[Year].[Year].MEMBERS * [Measures].[FilteredQty]} ON COLUMNS,
NON EMPTY {[Classes].[cClass].[cClass].MEMBERS} ON ROWS
FROM [Model]
The result is the one expected.
Is this the best approach, or should I keep using the FILTER and HAVING functions? Is there an even better approach I'm still missing?
Thanks
This is the best approach. You need to consider how MDX resolves the result. In the example above it is a coincidence that your valid data sits in a contiguous region, the first four columns of the first row. Let's relax the filtering clause and make it > 365000. Now take a look at the last row of the result: the first two columns and the last column are eligible cells, but the third and fourth columns are not. Your query will still report them as null, and NON EMPTY will not help, because NON EMPTY only removes a row when the entire row is null.
Now, why is FILTER not eliminating the cell? FILTER eliminates a whole row or column based on the value aggregated over the other axis: if the filter is on columns, the sum over the rows of that column has to satisfy the criterion. Take a look at the sample below; as soon as you remove the comment (--+0.05), the last column will be removed.
select
non empty
  filter(
    ( [Measures].[Internet Sales Amount]
    , {[Date].[Calendar Year].&[2013], [Date].[Calendar Year].&[2014]}
    , [Date].[Calendar Quarter of Year].[Calendar Quarter of Year]
    ),
    ( [Date].[Calendar Year].currentmember
    , [Date].[Calendar Quarter of Year].currentmember
    , [Product].[Subcategory].currentmember
    , [Measures].[Internet Sales Amount]
    ) > 45694.70 --+0.05
  ) on columns,
non empty
  [Product].[Subcategory].members on rows
from [Adventure Works]
Edit: another sample added.
with
member [Measures].[Internet Sales AmountTest] as
  iif(
    ( [Date].[Calendar Year].currentmember
    , [Date].[Calendar Quarter of Year].currentmember
    , [Product].[Subcategory].currentmember
    , [Measures].[Internet Sales Amount]
    ) > 9000,
    ( [Date].[Calendar Year].currentmember
    , [Date].[Calendar Quarter of Year].currentmember
    , [Product].[Subcategory].currentmember
    , [Measures].[Internet Sales Amount]
    ),
    null
  )
select
non empty
  ( {[Measures].[Internet Sales Amount], [Measures].[Internet Sales AmountTest]}
  , {[Date].[Calendar Year].&[2013]}
  , [Date].[Calendar Quarter of Year].[Calendar Quarter of Year]
  ) on columns,
non empty
  [Product].[Subcategory].[Subcategory] on rows
from [Adventure Works]

Decimal precision in hive

I want the decimal precision to be preserved in the displayed values.
Example:
select 1*1.00000;
output: 1.0
I even tried with cast:
select cast(cast(1*1.0000 as double) as decimal(5,2))
output: 1
I want the results to be displayed as 1.000. Is there any way to do so in hive?
Create a table and test it. It works if the value fits the exact precision and scale declared in the DECIMAL type.
create table test1_decimal (b decimal(5,3));
INSERT INTO test1_decimal values (1.000);     -- displays as 1: the trailing zeros are trimmed
INSERT INTO test1_decimal values (123.12345); -- results in NULL: the value exceeds the declared precision/scale
INSERT INTO test1_decimal values (12.123);    -- fits the declared precision and scale; outputs 12.123
So if the value matches the DECIMAL definition it displays correctly; otherwise it is trimmed or converted to NULL.
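If creating a table is overkill, a hedged sketch for the original expression; on newer Hive versions (2.0+, which pad decimal values out to their declared scale) casting directly should be enough:
select cast(1 * 1.00000 as decimal(5,3)); -- 1.000
On older versions that trim trailing zeros, formatting the number as a string is a workaround:
select format_number(1 * 1.00000, 3);     -- '1.000'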

Oracle SQL Query Performance, Function based Indexes

I have been trying to fine-tune a SQL query that takes 1.5 hours to process approximately 4,000 error records. The run time increases with the number of rows.
I figured out that one condition in my SQL is actually causing the issue:
AND (DECODE (aia.doc_sequence_value,
NULL, DECODE(aia.voucher_num,
NULL, SUBSTR(aia.invoice_num, 1, 10),
aia.voucher_num) ,
aia.doc_sequence_value) ||'_' ||
aila.line_number ||'_' ||
aida.distribution_line_number ||'_' ||
DECODE (aca.doc_sequence_value,
NULL, DECODE(aca.check_voucher_num,
NULL, SUBSTR(aca.check_number, 1, 10),
aca.check_voucher_num) ,
aca.doc_sequence_value)) = " P_ID"
(P_ID is a value from the first cursor's SQL.)
(Note that these are standard Oracle Applications (ERP) invoice tables.)
The P_ID column in the staging table is derived in the same way as the expression above, and it is compared here again in the second SQL to get the latest data for that record (basically reprocessing the error records; the value of P_ID is something like "999703_1_1_9995248").
Q1) Can I create a function based index on the whole left side derivation? If so what is the syntax.
Q2) Would it be okay or against the oracle standard rules, to create a function based index on standard Oracle tables? (Not creating directly on the table itself)
Q3) If NOT what is the best approach to solve this issue?
Briefly, no, you can't place a function-based index on that expression, because the input values are derived from four different tables (or table aliases).
What you might look into is a materialised view, but that's a big, and potentially difficult, way to solve a single-query optimisation problem.
You might investigate decomposing that string "999703_1_1_9995248" and applying the relevant parts to the separate expressions:
DECODE(aia.doc_sequence_value,
NULL,
DECODE(aia.voucher_num,
NULL, SUBSTR(aia.invoice_num, 1, 10),
aia.voucher_num) ,
aia.doc_sequence_value) = '999703' and
aila.line_number = '1' and
aida.distribution_line_number = '1' and
DECODE (aca.doc_sequence_value,
NULL,
DECODE(aca.check_voucher_num,
NULL, SUBSTR(aca.check_number, 1, 10),
aca.check_voucher_num) ,
aca.doc_sequence_value) = '9995248'
Then you can use indexes on the expressions and columns.
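For Q1's syntax question, a hedged sketch of what a single-table function-based index on the first expression could look like, assuming the alias aia refers to the standard AP_INVOICES_ALL table (the index name is illustrative):
-- DECODE and SUBSTR are deterministic built-ins, so they are allowed in the index expression.
CREATE INDEX ap_invoices_docseq_fbi ON ap_invoices_all (
  DECODE(doc_sequence_value,
         NULL, DECODE(voucher_num,
                      NULL, SUBSTR(invoice_num, 1, 10),
                      voucher_num),
         doc_sequence_value)
);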
You could separate the four components of the P_ID value using regular expressions, or a combination of InStr() and SubStr().
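A minimal sketch of the regular-expression route, assuming P_ID always has the underscore-delimited four-part format:
-- Split '999703_1_1_9995248' into its four components.
SELECT REGEXP_SUBSTR(p_id, '[^_]+', 1, 1) AS part1,  -- 999703
       REGEXP_SUBSTR(p_id, '[^_]+', 1, 2) AS part2,  -- 1
       REGEXP_SUBSTR(p_id, '[^_]+', 1, 3) AS part3,  -- 1
       REGEXP_SUBSTR(p_id, '[^_]+', 1, 4) AS part4   -- 9995248
FROM   (SELECT '999703_1_1_9995248' AS p_id FROM dual);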
Ad 1) Based on the SQL you've posted, you cannot create a function-based index on that. The reason is that a function-based index must be:
Deterministic - i.e. the function used in the index definition always returns the same result for given input arguments, and
Based only on columns from the table the index is created on. In your case - judging by the aliases you're using - you have four tables (aia, aila, aida, aca).
Requirement #2 makes it impossible to build a function-based index for that expression.

MDX Replace Range With Filter

While looking at the following answer I wanted to replace the range with a filter.
MDX - Concurrent calculations based on a "record time range"?
The MDX is:
with member [Measures].[TotalUsed] as sum({[Date].[YQM].&[20090501]:[Date].[YQM].&[20090907]}, [Measures].[Used])
select {[Measures].[TotalUsed]} on columns,
{[Color].[Colors].[All].MEMBERS} on rows
from [Cube]
I'm trying to replace the date range with a filter, like this:
with member [Measures].[TotalUsed] as sum(FILTER([Date].[YQM], [Date].[YQM] < [Date].[YQM].&[20090907]), [Measures].[Used])
select {[Measures].[TotalUsed]} on columns,
{[Color].[Colors].[All].MEMBERS} on rows
from [Cube]
What is the conditional statement looking for in terms of comparing values? Any help would be great!
Thanks!
The FILTER statement needs a SET and then an EXPRESSION to filter on. You can drop this right inside your SUM function. The expression part of the filter can be almost anything, but it has to evaluate to true/false for each cell in the SET.
-- FILTER ( SET, EXPRESSION )
It's a bit tough not knowing how your data is structured, but your statement would probably end up like the following, filtering out time periods with less than 50 'UnUsed' and then summing the rest, as an example.
WITH MEMBER [Measures].[TotalUsed] AS
  SUM( FILTER( [Date].[YQM].MEMBERS, [Measures].[UnUsed] > 50 ),
       [Measures].[Used] )
SELECT {[Measures].[TotalUsed]} ON COLUMNS,
       {[Color].[Colors].[All].MEMBERS} ON ROWS
FROM [Cube]
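And to keep the spirit of the original date comparison, a hedged sketch, assuming the YQM members carry a numeric key like 20090907 that MEMBERVALUE can return:
-- Filter members whose key value falls before the cutoff date key.
WITH MEMBER [Measures].[TotalUsed] AS
  SUM( FILTER( [Date].[YQM].MEMBERS,
               [Date].[YQM].CURRENTMEMBER.MEMBERVALUE < 20090907 ),
       [Measures].[Used] )
SELECT {[Measures].[TotalUsed]} ON COLUMNS,
       {[Color].[Colors].[All].MEMBERS} ON ROWS
FROM [Cube]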
