I've some clinical data that needs some tricky calculations.
Screenshot here
I have a column called "LabCode" and another "Days" which is my aggregate data/Measure.The third column is my expected output after the aggregations.
The labcode are the high (H) or Low (L) dosage from the test results from an individual on the trial.
The doctors want to know how many days the person was on low dosage before high dosage and vice versa.
I want OVER Function expression that can show the results in yellow highlights.
Thanks for your help.
Related
I'm trying to figure out the difference between the in-application modifier as_rate() and the rollup function per_second().
I want a table with two columns: the left column shows the total number of events submitted to a Distribution (in query-speak: count:METRIC{*} by {tag}), and the right column shows the average rate of events per second. The table visualization applies a sum rollup on left column, and an average rollup on the right column, so that the left column should equal the right column multiplied by the total number of seconds in the selected time period.
From reading the docs I expected either of these queries to work for the right column:
count:DISTRIBUTION_METRIC{*} by {tag}.as_rate()
per_second(count:DISTRIBUTION_METRIC{*} by {tag})
But, it turns out that these two queries are not the same. as_rate() is the only one that finds the expected average rate where left = right * num_seconds. In fact, the per_second() rollup does this extra weird thing where metrics with lower total events have higher average rates.
Is someone able to clarify why these two functions are not synonymous and what per_second() does differently?
I have a data set similar to the table below (simplified for brevity)
I need to calculate the total spend per conversion per team for every month, with ability to plot this as time based line chart being an additional nicety. The total spend is equal to the sum of Phone Expenditure, Travel allowance & Misc. Allowance, this can be a calculated field.
I cannot add a calculated field for the ratio, as for some sales person, the number of conversion can be 0 for a given month. So, averaging over team is not option. How can I go about this?
Thanks for help and suggestions in advance!
I've discussed the question with the Harish offline. I've learned that he is trying to calculate ratio per group, not per row.
To perform calculations per group, users can add calculated fields inside a QuickSight analysis and use level aware aggregation expressions. (Note that level aware aggregations can only be used in an analysis, not in the data prep view). Here is a link to the documentation about level aware aggregations if you want to learn more about this area https://docs.aws.amazon.com/quicksight/latest/user/level-aware-aggregations.html
I have revenue data that has a natural credit balance (less than zero or negative) and expense data that has a natural debit balance (greater than zero or positive). Revenue accounts are tagged as reporting code=400 and all other items are some other number other than 400.
I would like to create a variance column between two measures where the idea is to display whether it is favorable to profit or not. Creating the two measures is not the problem...it is that the variance calculation itself is different for revenue than expenses. Current-LastPeriod=Variance will work for Revenue, while (Current-LastPeriod)*-1=Variance will work for Expenses.
How can I construct a DAX calculation that handles this type of situation?
I tried IF statements but received an error saying a single value for my reportingcodeid column could not be determined. It makes sense as it is a measure. Here is what I tried.
Variance:=IF(TrialBalance_View[ReportingCode1ID]=400,[MonthlyAmount]-[MonthlyAmount_PY],[MonthlyAmount]-[MonthlyAmount_PY]*-1)
This is an issue I am sure a lot of Finance/Accounting people have and I appreciate any help I can get! Thank you!!
Without seeing your data model and code of the measures, I suggest this approach:
Variance =
SUMX(
VALUES(TrialBalance_View[ReportingCode1ID]),
VAR Account_Variance = [MonthlyAmount]-[MonthlyAmount_PY]
RETURN IF(TrialBalance_View[ReportingCode1ID] = 400, Account_Variance, -Account_Variance)
)
Here, we first create a list of reporting codes visible in a current context using VALUES function. Then, SUMX iterates these codes one by one. For each code, it computes variance and stores it in a variable. Then, if code is 400, it takes variance, else it takes negative variance.
If it does not work, please add your data model diagram, and post DAX code for your measures.
I have been using the Query object for some time and I can not find how to show the percentage of a value over the total of the row.
On this Genexus Query Usage Example, it shows totals by Continent, Country and Attraction.
How this query must be configured to show the attractions percentage over row totals to get a result like this:
Those numbers in the image you posted come from a QueryViewer example. They represent the total sales of Soy, Wheat and Rice per Continent and Country. There are three different measures in this query, and you cannot make calculations among measures in the QueryViewer, so you can't achieve what you want with these numbers.
But you can modify or make your query so that there is only one measure, let's say "Sales amount", and the Crop can be one dimension:
Dimensions: Continent, Country and Crop
Measures: Sales amount
Once the data is presented like this, the percentage will show exactly what you want.
PS: By "dimension" I mean attributes without aggregation and by "measure" I mean those attributes that have any aggregation function (Sum, Average, etc.)
I have an application that collects performance metrics and stores them in a datamart. I then use Mondrian to enable analysis and ad-hoc exploration of the data. I'm collecting about 5e6 rows per day and total size of the METRIC table is about 300M rows.
We "color" our data based on the metrics comparison to an SLA. There are exactly 5 distinct values for color. When we do simple MDX queries to get, for example, a color distribution of the data for a specific date range, say 1 day, we see queries like below:
2014-06-11 23:17:08,042 DEBUG [sql] - 223: SqlTupleReader.readTuples
[[Color].[Color]]: executing sql [select "METRIC"."COLOR" as "c0"
from "METRIC" "METRIC" group by "METRIC"."COLOR" order by
"METRIC"."COLOR" ASC NULLS LAST] 2014-06-11 23:17:58,747 DEBUG [sql] -
223: , exec 50704 ms
In order to improve performance, the datamart includes aggregate tables at the hour and day levels, and both aggregate tables include the COLOR column.
I understand that Mondrian is very dependent on the underlying database performance, but there is really no way to tune this. I can create an index on COLOR (because a full scan of the index will be marginally faster than a full scan of the table), but it seems silly to create an index with 5 distinct value on a 300M row table. The day aggregate table has about 500K rows and would be significantly faster executing virtually the same query against this table, but Mondrian always seems to go to the base fact table for these dimension queries.
My question is, is there some way to avoid this query? If I can't avoid it, is it possible to get Mondrian to use the aggregate tables for this type of query? I have specified approxRowCount in the single level of this dimension/hierarchy and that eliminated the similar query to get the count of values. I haven't dug into the source of Mondrian yet to determine if there is a possibility of using the aggregate table or if there is some configuration on my part that is preventing it.
Edit for Clarification:
I probably didn't do a good job of asking my question-let me try and clarify. My MDX query looks something like:
select [Color].[Color].Members on columns,
{[Measures].[Metric Value], [Measures].[Count]} on rows
from [Metric]
where [Time].[2014].[June].[11]
I can look at this and hand write a SQL query that answers this query
select COLOR, avg(VALUE), sum(FACT_COUNT)
from AGG_DAY_METRIC
where YEAR = 2014
and MONTH = 6
and DAY_OF_MONTH = 11
group by COLOR
The database answers this query in about 100ms scanning approx 4K rows. It takes Mondrian several minutes to answer the
query because it does several queries that don't answer the MDX query directly, but rather get information about the
dimension. In the case above, the database has to scan 300M rows, taking 50 seconds, to return that there are 5 possible
colors. If color was in a normal dimension table there would only be 5 rows, but in a degenerate dimension there can be 100s
of millions of rows.
So my questions are:
a) Is there a way to tell Mondrian the values of a degenerate dimension and avoid these queries?
b) Is there a way to have Mondrian answer these queries from aggregate tables?
This problem was solved, not by modifying anything in the Mondrian schema or the application, but the database. The database in this case was Oracle and we were able to create a materialized view with query rewrite enabled.
The materialized view is created from the exact query issued by Mondrian. Since the color values don't change very frequently (almost never in our case), the materialized view does a full refresh once a day.
In this case the queries went from taking minute(s) to milliseconds. If your facing an issue like this and your database is Oracle this is a good approach to speeding up the tuples resolution for degenerate dimensions with low cardinality.
It's hard to give any specific directions without knowing more about your schema, but it looks to me you have to make sure that the number of rows with certain colours (count) has to be marked defined as an aggregate measure (Count or Max Number).
Please note that these aggregates are not calculated continuously (I think it would be to heavy for the backing data-store, and Mondrian won't keep a flowing set in memory for incoming facts).
The aggregation can be specified to be ran/rebuilt at specific times (nightly, hourly...). This would make Mondrian a bit unsuitable for real-time analysis, but you should be able to do almost instant queries on historical data.
If your dimension has 5 distinct values in a 300M fact table it should not be a degenerate dimension. It should be in a separate dimension table. A degenerate dimension should ONLY be used if its cardinality is close to the full fact table row count, making a separate table pointless, as there would be no significant storage savings and joining the dimension results in a lot of data being read;
If you put the colors on a separate dim table, any "Read Tuples" query will return results in a few ms, and your problem is solved.
However, more to the point of your question, Mondrian should be able to pick the dim values from the agg tables. Unless you have distinct-count aggregators in the cube, in which case you're in a tricky situation (unless there's an agg table that exactly matches the level of detail you need, Mondrian will very likely scan the fact table).
You should also set the highCardinality attribute of this degenerate dimension to True. Even with only 5 distinct values, having highCardinality=false tells Mondrian it's safe to scan the whole dimension to populate the list of members. Setting it to true stops this scan.
You should also add an index to this column. It's always a good idea to add indexes to every key and degenerate dimension column in a fact table. With an index the DB should answer much faster that SQL query.
Finally, you have a 300M row fact table. What DBMS are you using? Is it a Column oriented DB? If not, you should try them as a possible alternative to your data store. Column oriented DB have a significant performance increase over Row oriented DBs for Mondrian-like queries. There are a few good options out there, you should test drive them.