I have created a matrix report that has a row group of Person and a column group of Subject, with child groups of SubmittedBy and Grading Name. The data being pulled through is a mixture of numerical and alphabetic data. I need to count, per row, how many times a certain value appears. For example, I want to count by person how many 1s they have achieved, or how many A* grades they have achieved (this is all grading data from a school). Is there a way to do this with matrix data?
If there is, can you then get a count using combined criteria? For example, where Grading Name = Focus, count how many WB grades there are per person (row)?
This is the Design
This shows a subsection of data (as it is a very large dataset), but with SchoolID, Forename and Surname cut off to preserve anonymity.
A simple way would be to add a Total to the Subject column group and use the Count() function.
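To count only values that match certain criteria, swap the plain Count() for a conditional sum in that total cell. A sketch, assuming the grade value is in a field called Grade and the grading name in a field called GradingName (substitute your dataset's actual field names):

=Sum(IIf(Fields!Grade.Value = "A*", 1, 0))

And for the combined criteria (WB grades where Grading Name is Focus):

=Sum(IIf(Fields!GradingName.Value = "Focus" And Fields!Grade.Value = "WB", 1, 0))

Placed in the row total cell, the expression is evaluated once per Person row, so it counts matches across that person's whole row.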
I have a spreadsheet of all my personal budgeting, and I've allocated different amounts to different categories (e.g. power bills, cars, etc.). I want to have a cell that shows how much is remaining in each individual category.
For example, say I have 1000 allocated for power bills per year. I want a cell that has the formula of something along the lines of:
=1000-IFS(F2="Power",E2)
Where column F is the category of bill in text, and column E is the amount each bill was.
I want this formula to apply to the whole column, instead of me needing to enter conditions in each individual cell. Can someone help please?
I have tried:
=1000-IFS(F:F="Power",E:E)
But this only comes up with error messages.
Try this in row 1:
=INDEX(1000-IF(F:F="Power", E:E, ))
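That fills the whole column with the per-row difference your IFS attempt was going for. If what you actually want is a single cell showing the total remaining for a category, a SUMIF-based sketch (assuming the same columns and a 1000 allocation):

=1000-SUMIF(F:F, "Power", E:E)

SUMIF adds up only the rows in column E whose column F entry is "Power", so the result is the unspent balance for that category.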
I'm trying to create a calculated column based on a derived measure in an SSAS cube. The measure counts the number of cases per order, so an order with 3 cases gets the value 3.
Now I'm trying to create a bucket attribute with the values 1caseOrder, 2caseOrder, 3caseOrder, 3+caseOrder. I tried the following:
IF([nrofcase] = 1, "nrofcase[1]",
   IF([nrofcase] = 2, "nrofcase[2]",
      IF([nrofcase] = 3, "nrofcase[3]", "nrofcase[>3]")))
But it doesn't work as expected: when the level of the report is changed from quarter to week, it is supposed to recalculate at the different level.
Please let me know whether this can work.
Calculated columns are static. When the column is added and when the table is processed, the value is calculated and stored. The only way for the value to change is to reprocess the model. If the formula refers to a DAX measure, it will use the measure without any of the context from the report (e.g. no row filters or slicers).
Think of it this way:
Calculated column is a fact about a row that doesn't change. It is known just by looking at a single row. An example of this is Cost = [Quantity] * [Unit Price]. Cost never changes and is known by looking at the Quantity and Unit Price columns. It doesn't matter what filters or context are in the report. Cost doesn't change.
A measure is a fact about a table. You have to look at multiple rows to calculate its value. An example is Total Cost = SUM(Sales[Cost]). You want this value to change depending on the context of time, region, product, etc., so its value is not stored but calculated dynamically in the report.
It sounds like for your data, there are multiple rows that tell you the number of cases per order, so this is a measure. Use a measure instead of a calculated column.
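A minimal sketch of that approach, assuming a Cases table in which each row is one case (the table and measure names here are placeholders, not your model's):

nrofcase := COUNTROWS ( Cases )

CaseBucket :=
SWITCH (
    TRUE (),
    [nrofcase] = 1, "nrofcase[1]",
    [nrofcase] = 2, "nrofcase[2]",
    [nrofcase] = 3, "nrofcase[3]",
    "nrofcase[>3]"
)

Because both are measures, the bucket is re-evaluated in whatever filter context the report supplies, so it recalculates as you move between quarter and week levels.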
I have a matrix - 1,172 words down column A, then the same 1,172 words across row 1. Each word is cross-referenced with all the other words to give a similarity score (this is already done).
In another sheet, I want to look up a word and return all the words with which it has a certain similarity score - in this case, greater than or equal to 0.33. I attach an MWE, in which I give an idea of the answer I am looking for by working it out manually.
I think it's some sort of reverse lookup. As in, instead of finding the value corresponding to a particular row and a particular column, it's finding the column based on the row and the value in the main sheet. I'm just really stuck at this point and would massively appreciate some help. Thanks! MWE here
If your words on the second sheet are in the same order then:
=IFERROR(TEXTJOIN(", ",,FILTER(Scores!B$1:W$1,(Scores!B2:W2>=0.33)*(Scores!B2:W2<1))),"-")
Drag down.
Explanation:
Filter the values from row 1 according to the similarity score condition, using FILTER (the additional <1 test excludes the word's match with itself, whose score is 1).
Concatenate the filtered values using TEXTJOIN.
I have transactional data which contains customer information as well as stores they shopped from. I can count the number of different stores each customer used by a simple DISTINCTCOUNT([Site Name]) measure.
There are millions of customers and I want to make a simple summary table which shows the sum of # customers who visited X number of stores. Like a histogram. Maximum stores they visited is 6, minimum is 1.
I know there are multiple ways to do this, but I am new to DAX and can't yet turn what I have in mind into working measures.
The easiest way:
Assuming your DISTINCTCOUNT([Site Name]) measure is called CustomerStoreCount ...
Add a new dimension table, StoreCount, to your model containing a single column, StoreCount. Populate it with the values 1,2,3,4,5,6 (up to the maximum number of stores).
Create a measure, ThisStoreCount := MAX(StoreCount[StoreCount]).
Create a base customer count measure, TotalCustomers:=DISTINCTCOUNT(CustomerTable[Customer])
Create a contextual measure, CustomersWhoVisitedXNumberOfStores := CALCULATE ( TotalCustomers, FILTER(VALUES(CustomerTable[Customer]), ThisStoreCount = CustomerStoreCount) )
On your pivot table / reporting tool, etc. use StoreCount[StoreCount] on the axes and CustomersWhoVisitedXNumberOfStores as the measure.
So basically: walk through the customer list (since there's no relationship between StoreCount and CustomerTable), compare each customer's CustomerStoreCount with the maximum StoreCount[StoreCount] value, which for each StoreCount[StoreCount] value is ... drum roll ... itself. If it matches, keep the customer; otherwise filter them out. You end up with a count of customers whose store-visit count equals the value of StoreCount[StoreCount].
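Putting it together in DAX (a sketch; Transactions stands in for your transactional table, and the relationships between the tables are assumed to be in place):

CustomerStoreCount := DISTINCTCOUNT ( Transactions[Site Name] )
ThisStoreCount := MAX ( StoreCount[StoreCount] )
TotalCustomers := DISTINCTCOUNT ( CustomerTable[Customer] )

CustomersWhoVisitedXNumberOfStores :=
CALCULATE (
    [TotalCustomers],
    FILTER (
        VALUES ( CustomerTable[Customer] ),
        [ThisStoreCount] = [CustomerStoreCount]
    )
)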
And of course the more general modeling hint: when you want to display a metric by something (i.e. customer count by number of stores visited), that something is an attribute, not a metric.
We have a scenario like this:
Millions of records (Record 1, Record 2, Record 3...)
Partitioned into millions of small non-intersecting groups (Group A, Group B, Group C...)
Membership gradually changes over time, i.e. a record may be reassigned to another group.
We are redesigning the data schema, and one use case we need to support is given a particular record, find all other records that belonged to the same group at a given point in time. Alternatively, this can be thought of as two separate queries, e.g.:
To which group did Record 15544 belong, three years ago? (Call this Group g).
What records belonged to Group g, three years ago?
Supposing we use a relational database, the association between records and groups is easily modelled using a two-column table of record id and group id. A common approach for allowing historical queries is to add a timestamp column. This allows us to answer the question above as follows:
Find the row for Record 15544 with the most recent timestamp prior to the given date. This tells us Group g.
Find all records that have at any time belonged to Group g.
For each of these records, find the row with the most recent timestamp prior to the given date. If this indicates that the record was in Group g at that time, then add it to the result set.
This is not too bad (assuming the table is separately indexed by both record id and group id), and may even be the optimal algorithm for the naive table structure just described, but it does cost an index lookup for every record found in step 2. Is there an alternative data structure that would answer the query more efficiently?
ETA: This is only one of several use cases for the system, so we don't want to speed up this query at the expense of making queries about current groupings slower, nor do we want to pay a huge price in space consumption, etc.
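For concreteness, the naive structure just described might look like this (illustrative SQL; all names are assumed):

CREATE TABLE membership (
    record_id  BIGINT    NOT NULL,
    group_id   BIGINT    NOT NULL,
    changed_at TIMESTAMP NOT NULL
);
CREATE INDEX membership_by_record ON membership (record_id, changed_at);
CREATE INDEX membership_by_group  ON membership (group_id, changed_at);

-- Step 1: which group did record 15544 belong to at time :t?
SELECT group_id
FROM membership
WHERE record_id = 15544 AND changed_at <= :t
ORDER BY changed_at DESC
LIMIT 1;

-- Steps 2 and 3 combined: records whose latest assignment at or before :t was group :g
SELECT m.record_id
FROM membership m
WHERE m.group_id = :g
  AND m.changed_at = (SELECT MAX(m2.changed_at)
                      FROM membership m2
                      WHERE m2.record_id = m.record_id
                        AND m2.changed_at <= :t);

The correlated subquery in the second statement is the per-record index lookup whose cost the question is trying to avoid.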
How about creating two tables:
(recordID, time -> groupID) - key is (recordID, time), sorted by recordID and secondarily by time (let that be map1)
(groupID, time -> List<recordID>) - key is (groupID, time), sorted by groupID and secondarily by time (let that be map2)
At each record change:
Retrieve the current groupID of the record you are changing
set t <- current time
Create a new entry in map2 for the old group: (oldGroupID, t, list') - where list' is the old group's current list without the record you just moved out.
Add a new entry in map2 for the new group: (newGroupID, t, list'') - where list'' is the new group's current list with the moved record added to it.
Add a new entry (recordID, t, newGroupID) to map1.
During query:
You need to find the entry in map1 that is 'closest' to but smaller than (recordID, desired_time) - this is a classic O(logN) operation in a sorted data structure.
This will give you the group g the element belonged to at the desired time.
Now, look in map2 similarly for the entry with key closest to but smaller than (g, desired_time). The value is the list of all records that were in the group at the desired time.
This requires quite a bit more space (by a constant factor, though...), but every operation is O(logN) - where N is the number of record changes.
An efficient sorted data structure for entries that are mostly stored on disk is a B+ tree, which is also what many relational database implementations use for their indexes.
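In relational terms, the two maps might be sketched like this (names assumed; the member list is stored as a serialized snapshot per change):

-- map1: (recordID, time) -> groupID
CREATE TABLE record_history (
    record_id  BIGINT    NOT NULL,
    changed_at TIMESTAMP NOT NULL,
    group_id   BIGINT    NOT NULL,
    PRIMARY KEY (record_id, changed_at)
);

-- map2: (groupID, time) -> list of member recordIDs
CREATE TABLE group_history (
    group_id   BIGINT    NOT NULL,
    changed_at TIMESTAMP NOT NULL,
    members    TEXT      NOT NULL,  -- snapshot of the member list at this change
    PRIMARY KEY (group_id, changed_at)
);

-- 'closest entry at or before :t' is a descending index scan:
SELECT group_id FROM record_history
WHERE record_id = :r AND changed_at <= :t
ORDER BY changed_at DESC LIMIT 1;

SELECT members FROM group_history
WHERE group_id = :g AND changed_at <= :t
ORDER BY changed_at DESC LIMIT 1;

Both lookups are satisfied by a single descent of the composite index, which is the O(logN) behaviour described above.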