I've read through similar questions and they don't seem to quite fit my issue or they're in a different environment.
I'm working in MS-Access 2016.
I have a customer complaints report which has fields: year, month, count([complaint #]), complaint_desc.
(complaint # is the literal ID number we assign to each complaint entered into the table)
I grouped the report by year and month, then by complaint_desc, and for each desc did a count of complaint #. I then did a second count of complaint # to add up the total complaints for the month and put it in the month footer, which gives a result something like this:
2020 03 <= (this is the month group header)
complaint desc | count of complaints/desc
---------------------------------------------
electrical | 2 {This section is
cosmetic | 6 {in the Complaint_desc
mechanical | 1 {group footer
---------------------------------------------
9 <= (this is month group footer)
repeating the group for each month
This is all good. What I want to do is to sort the records within the complaint desc group in descending order of count(complaint#) so that it looks like:
2020 03
complaint desc | count of complaints/category
---------------------------------------------
cosmetic | 6
electrical | 2
mechanical | 1
---------------------------------------------
9
However, nothing I do seems to work. The desc group's built-in sort ("A on top") overrides any sorting in the query, and adding a sort by complaint# is ignored as well. I tried adding a sort by count(complaint#), and Access told me I can't have an aggregate function in an ORDER BY (though I suspect it would have been overridden anyway). I also tried grouping by count(complaint#), which was likewise rejected as an aggregate in a GROUP BY. I then tried moving complaint_desc and count(complaint#) to the complaint# group header, but that broke the total count in the month footer and also split up the complaint descs, defeating their original purpose...
I really didn't think this change was going to be a big deal, but a solution has evaded me for a while now. I've read similar questions and tried to follow examples but they didn't lead to my intended result.
Any ideas?
I figured it out! Thank you to @UnhandledException, who got me thinking on the right track.
So here's what I did:
The original query the report was based on contained the following:
Design mode:
Field | Year | Month | Complaint_Desc | Complaint# |
Total | Group By | Group By | Group By | Group By |
Sort | | | | |
or in SQL:
SELECT Year, Month, [tbl Failure Mode].[Code description], [Complaint Data Table].[Complaint #]
FROM [tbl Failure Mode] RIGHT JOIN [Complaint Data Table] ON [tbl Failure Mode].[ID code] = [Complaint Data Table].[Failure Mode]
GROUP BY Year, Month, [tbl Failure Mode].[Code description], [Complaint Data Table].[Complaint #];
And then I was using the report's group and sort functions to make it show how I wanted except for the hiccup I mentioned.
I made another query based upon that query:
Design mode:
Field | Year | Month | Complaint_Desc | Complaint# |
Total | Group By | Group By | Group By | Count |
Sort | Descending | Descending | | Descending |
or in SQL:
SELECT [qry FailureMode].Year, [qry FailureMode].Month, [qry FailureMode].[Code description], Count([qry FailureMode].[Complaint #]) AS [CountOfComplaint #]
FROM [qry FailureMode]
GROUP BY [qry FailureMode].Year, [qry FailureMode].Month, [qry FailureMode].[Code description]
ORDER BY [qry FailureMode].Year DESC, [qry FailureMode].Month DESC, Count([qry FailureMode].[Complaint #]) DESC;
Then I changed the report structure:
I eliminated the Complaint_Desc group and moved complaint_desc and CountOfComplaint# (which is now not a function but its own calculated field from my new query) to the DETAIL section of the report. Then I deleted the second count(complaint#) that had served as the monthly total in the month footer and replaced it with "AccessTotalsCountOfComplaint #", which is =Sum([CountOfComplaint #]). I had Access auto-create that by right-clicking CountOfComplaint# in the detail section, scrolling to "Total", and clicking "Sum". (I then deleted the extra AccessTotalsCountOfComplaint# controls that appeared outside the month group footer, which is the only place I needed one.)
Et voilà!
I hope this helps someone else, and thank you again to Unhandled Exception who pointed me in the right direction.
I have a simple table with data in the following format:
category | score | date
cat1 | 3 | 31/3/2019
cat2 | 9 | 31/3/2019 Q1 data
cat3 | 7 | 31/3/2019
...
cat1 | 6 | 30/6/2019
cat2 | 4 | 30/6/2019 Q2 data etc.
cat3 | 1 | 30/6/2019
Basically, I have many rows of quarterly data (scores for different categories), where the date column references the actual quarter. I have a chart showing the values from the latest quarter (most recent data), but I need a column giving me the previous quarter's score. I found out about PREVIOUSQUARTER, which looked like an easy trick, but it returns blanks.
prevQtr = CALCULATE(SUM(data[score]), PREVIOUSQUARTER(data[date]))
Can someone tell me what I'm doing wrong? I tried creating a date table with continuous dates between the first and last date of my column, but it didn't help. No other time intelligence function returns anything either, so I guess the problem is something generic. I checked the documentation, but it doesn't mention any limitation. What I'm looking for is:
category | score | date | prevQtr
cat1 | 3 | 31/3/2019 |
cat2 | 9 | 31/3/2019 |
cat3 | 7 | 31/3/2019 |
...
cat1 | 6 | 30/6/2019 | 3
cat2 | 4 | 30/6/2019 | 9
cat3 | 1 | 30/6/2019 | 7
Thanks
Okay, three things here.
1. You need to reference your date dimension's date column for the built-in time intelligence functions. (Note, this also implies you should only use date fields from the date dimension. Hide 'data'[date] in your fact table so you aren't tempted to use it.)
2. Since you're adding this as a calculated column, you're going to have to do some extra context manipulation.
3. Since there is no context being contributed by the date dimension here (you're only in row context of the fact table), you can't use a built-in time intelligence function effectively.
So! My suggestion is to define your PrevQtr as a measure, rather than a column, in which case it will work with the small refactoring below:
PrevQtr =
CALCULATE (
SUM ( 'Data'[Score] ),
PREVIOUSQUARTER ( 'dateTable'[Date] ) // just referencing the date table here
)
If you must create a calculated column for this, and your data all follows the pattern in your sample of having values on the last day of the quarter, then I think this is the best bet for it:
PrevQtrColumn =
VAR CurrentRowDate = 'Data'[Date]
VAR PriorQuarterEnd = EOMONTH ( CurrentRowDate, -3 )
RETURN
CALCULATE (
SUM ( 'Data'[Score] ),
ALLEXCEPT ( 'Data', 'Data'[Category] ),
'Data'[Date] = PriorQuarterEnd
)
The big gotcha there is covered by the ALLEXCEPT. CALCULATE converts all row context to filter context, so the current row's column values all become filters on 'Data'. ALLEXCEPT clears that, preserving only the filter on [Category].
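If it helps to see the column's logic outside DAX, here is a rough Python sketch of what the calculated column computes (the data mirrors the question's sample; `eomonth` and `prev_qtr` are hypothetical helpers for illustration, not part of any library):

```python
from datetime import date
import calendar

# Sample rows mirroring the question's table: (category, score, date)
rows = [
    ("cat1", 3, date(2019, 3, 31)),
    ("cat2", 9, date(2019, 3, 31)),
    ("cat3", 7, date(2019, 3, 31)),
    ("cat1", 6, date(2019, 6, 30)),
    ("cat2", 4, date(2019, 6, 30)),
    ("cat3", 1, date(2019, 6, 30)),
]

def eomonth(d, months):
    """End of the month `months` away from d, analogous to DAX EOMONTH."""
    m = d.month - 1 + months
    y = d.year + m // 12
    m = m % 12 + 1
    return date(y, m, calendar.monthrange(y, m)[1])

def prev_qtr(category, d):
    """Score for the same category at the prior quarter-end, or None if absent.

    Mirrors the calculated column: filter to the category (ALLEXCEPT) and to
    the prior quarter-end date, then sum the scores.
    """
    target = eomonth(d, -3)
    total = sum(s for c, s, dt in rows if c == category and dt == target)
    # Note: a genuine score of 0 would also come back as None in this sketch.
    return total or None
```

For the cat1 row dated 30/6/2019, `eomonth` lands on 31/3/2019 and the lookup returns 3, matching the desired output table in the question.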
I would recommend just using a measure to achieve this, BUT... your measure:
prevQtr = CALCULATE(SUM(data[score]), PREVIOUSQUARTER(data[date]))
Needs to reference your date table:
prevQtr = CALCULATE(SUM(data[score]), PREVIOUSQUARTER('dateTable'[date]))
This modification should do it.
In HiveQL, what is the most elegant and performant way of calculating an average value when there are 'gaps' in the data, with implicit repeated values between them? Consider a table with the following data:
+----------+----------+----------+
| Employee | Date | Balance |
+----------+----------+----------+
| John | 20181029 | 1800.2 |
| John | 20181105 | 2937.74 |
| John | 20181106 | 3000 |
| John | 20181110 | 1500 |
| John | 20181119 | -755.5 |
| John | 20181120 | -800 |
| John | 20181121 | 1200 |
| John | 20181122 | -400 |
| John | 20181123 | -900 |
| John | 20181202 | -1300 |
+----------+----------+----------+
If I try to calculate a simple average of the November rows, it returns ~722.78, but the average should take into account that the days not shown carry the same balance as the previous entry. In the data above, John had 1800.2 from 20181101 through 20181104, for example.
Assume the table always has exactly one row per date/balance change, and that I cannot change how this data is stored (and probably shouldn't, since writing rows for days with unchanged balances would waste storage). So far I've been tinkering with a select with subqueries for all the days in the queried month, returning NULL for the absent days, and then using CASE to pull the balance from the previous available date in reverse order. All of this just to avoid writing temporary tables.
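To pin down the target number, here is a small Python sketch (not Hive) of the carried-forward daily average described above, using John's balances from the sample table. The naive average of the eight November rows is ~722.78; the gap-aware average works out to ~922.77:

```python
from datetime import date, timedelta

# John's balance changes from the question (date -> new balance)
changes = {
    date(2018, 10, 29): 1800.2,
    date(2018, 11, 5): 2937.74,
    date(2018, 11, 6): 3000.0,
    date(2018, 11, 10): 1500.0,
    date(2018, 11, 19): -755.5,
    date(2018, 11, 20): -800.0,
    date(2018, 11, 21): 1200.0,
    date(2018, 11, 22): -400.0,
    date(2018, 11, 23): -900.0,
    date(2018, 12, 2): -1300.0,
}

def monthly_avg(year, month):
    """Average daily balance for a month, carrying the last known balance forward.

    Assumes at least one balance change exists on or before the month's start.
    """
    end = date(year + (month == 12), month % 12 + 1, 1) - timedelta(days=1)
    balance, total, days = None, 0.0, 0
    d = min(changes)
    while d <= end:
        balance = changes.get(d, balance)  # forward-fill the last known balance
        if d.year == year and d.month == month:
            total += balance
            days += 1
        d += timedelta(days=1)
    return total / days
```

With this fill, November has 30 daily values (four days at 1800.2, then the listed changes, ending with eight days at -900), averaging ~922.77 rather than ~722.78.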
Step 1: Original Data
The 1st step is to recreate a table with the original data. Let's say the original table is called daily_employee_balance.
daily_employee_balance
use default;
drop table if exists daily_employee_balance;
create table if not exists daily_employee_balance (
employee_id string,
employee string,
iso_date date,
balance double
);
Insert Sample Data in original table daily_employee_balance
insert into table daily_employee_balance values
('103','John','2018-10-25',1800.2),
('103','John','2018-10-29',1125.7),
('103','John','2018-11-05',2937.74),
('103','John','2018-11-06',3000),
('103','John','2018-11-10',1500),
('103','John','2018-11-19',-755.5),
('103','John','2018-11-20',-800),
('103','John','2018-11-21',1200),
('103','John','2018-11-22',-400),
('103','John','2018-11-23',-900),
('103','John','2018-12-02',-1300);
Step 2: Dimension Table
You will need a dimension table containing a calendar (a table with all possible dates); call it dimension_date. Having a calendar table is a normal industry standard, and you can probably download sample data for one off the internet.
use default;
drop table if exists dimension_date;
create external table dimension_date(
date_id int,
iso_date string,
year string,
month string,
month_desc string,
end_of_month_flg string
);
Insert some sample data for entire month of Nov 2018:
insert into table dimension_date values
(6880,'2018-11-01','2018','2018-11','November','N'),
(6881,'2018-11-02','2018','2018-11','November','N'),
(6882,'2018-11-03','2018','2018-11','November','N'),
(6883,'2018-11-04','2018','2018-11','November','N'),
(6884,'2018-11-05','2018','2018-11','November','N'),
(6885,'2018-11-06','2018','2018-11','November','N'),
(6886,'2018-11-07','2018','2018-11','November','N'),
(6887,'2018-11-08','2018','2018-11','November','N'),
(6888,'2018-11-09','2018','2018-11','November','N'),
(6889,'2018-11-10','2018','2018-11','November','N'),
(6890,'2018-11-11','2018','2018-11','November','N'),
(6891,'2018-11-12','2018','2018-11','November','N'),
(6892,'2018-11-13','2018','2018-11','November','N'),
(6893,'2018-11-14','2018','2018-11','November','N'),
(6894,'2018-11-15','2018','2018-11','November','N'),
(6895,'2018-11-16','2018','2018-11','November','N'),
(6896,'2018-11-17','2018','2018-11','November','N'),
(6897,'2018-11-18','2018','2018-11','November','N'),
(6898,'2018-11-19','2018','2018-11','November','N'),
(6899,'2018-11-20','2018','2018-11','November','N'),
(6900,'2018-11-21','2018','2018-11','November','N'),
(6901,'2018-11-22','2018','2018-11','November','N'),
(6902,'2018-11-23','2018','2018-11','November','N'),
(6903,'2018-11-24','2018','2018-11','November','N'),
(6904,'2018-11-25','2018','2018-11','November','N'),
(6905,'2018-11-26','2018','2018-11','November','N'),
(6906,'2018-11-27','2018','2018-11','November','N'),
(6907,'2018-11-28','2018','2018-11','November','N'),
(6908,'2018-11-29','2018','2018-11','November','N'),
(6909,'2018-11-30','2018','2018-11','November','Y');
Step 3: Fact Table
Create a fact table from the original table. In normal practice, you ingest the data into HDFS/Hive, then process the raw data and create a table of historical data that you keep appending to incrementally. You can look into data warehousing for the proper definition, but I call this a fact table: f_employee_balance.
This will re-create the original table with missing dates and populate the missing balance with earlier known balance.
--inner query to get all the possible dates
--outer self join query will populate the missing dates and balance
drop table if exists f_employee_balance;
create table f_employee_balance
stored as orc tblproperties ("orc.compress"="SNAPPY") as
select q1.employee_id, q1.iso_date,
nvl(last_value(r.balance, true) --initial dates to be populated with 0 balance
over (partition by q1.employee_id order by q1.iso_date rows between unbounded preceding and current row),0) as balance,
month, year from (
select distinct
r.employee_id,
d.iso_date as iso_date,
d.month, d.year
from daily_employee_balance r, dimension_date d )q1
left outer join daily_employee_balance r on
(q1.employee_id = r.employee_id) and (q1.iso_date = r.iso_date);
Step 4: Analytics
The query below will give you the true average per month:
select employee_id, monthly_avg, month, year from (
select employee_id,
row_number() over (partition by employee_id,year,month) as row_num,
avg(balance) over (partition by employee_id,year,month) as monthly_avg, month, year from
f_employee_balance)q1
where row_num = 1
order by year, month;
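Incidentally, since the fact table holds exactly one row per employee per day, the row_number trick can be replaced by a plain GROUP BY aggregation. A quick sqlite3 sketch (sqlite happens to share this window-function syntax; the toy data here is made up for the check) showing the two forms agree:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table f_employee_balance"
            " (employee_id text, iso_date text, balance real, month text, year text)")
# Three filled daily rows for one employee, one month
rows = [("103", f"2018-11-{d:02d}", b, "2018-11", "2018")
        for d, b in [(1, 10.0), (2, 20.0), (3, 30.0)]]
con.executemany("insert into f_employee_balance values (?,?,?,?,?)", rows)

# Step 4 form: window average + row_number to keep one row per group
step4 = """
select employee_id, monthly_avg, month, year from (
  select employee_id,
         row_number() over (partition by employee_id, year, month) as row_num,
         avg(balance) over (partition by employee_id, year, month) as monthly_avg,
         month, year
  from f_employee_balance) q1
where row_num = 1
"""

# Equivalent plain aggregation
plain = """
select employee_id, avg(balance) as monthly_avg, month, year
from f_employee_balance
group by employee_id, year, month
"""
```

Both queries return one row per employee per month with the same average; the GROUP BY form is simpler and lets the engine aggregate without a second pass over the window output.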
Step 5: Conclusion
You could have just combined steps 3 and 4, which would save you from creating an extra table. In the big data world you don't worry much about wasting extra disk space or development time: you can easily add another disk or node and automate the process using workflows. For more information, look into data warehousing concepts and Hive analytical queries.