Power Pivot - Aggregate within groups to determine max value - dax

I'm looking for a DAX formula (for Power Pivot) that aggregates within certain groups and across other groups to determine the maximum.
Here's my data table:
State  Customer  Fruit   Qty
NY     A         Apple   5
NY     A         Orange  1
NY     A         Pear    5
NY     B         Apple   1
NY     B         Orange  6
NY     C         Apple   2
NY     C         Orange  2
NY     C         Pear    5
CA     D         Orange  4
CA     D         Pear    2
I want to determine the most popular fruit by State (ignoring Customer). In NY, there are a total of 8 apples, 9 oranges, and 10 pears. So the formula should return Pear.
Resulting in a table like this:
State  Dominant Fruit
NY     Pear
CA     Orange
What is the Power Pivot formula I need for that Dominant Fruit column on the resulting table? Thanks

You can create a measure to rank the amount of fruits per state like so:
Ranking = RANKX( ALLEXCEPT( 'Table','Table'[Customer],'Table'[State] ) , CALCULATE( SUM( 'Table'[Qty] ) ) )
This measure ranks the fruits by quantity, so the "Dominant Fruit" gets rank 1.
You can then add a filter on the visual to show only values where the rank is 1.
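The logic the ranking has to reproduce (sum Qty per State and Fruit while ignoring Customer, then keep the top fruit in each State) can be cross-checked outside Power Pivot. The following is a minimal Python sketch of that logic, not DAX; the variable names are illustrative, and ties are resolved in favour of the fruit seen first, which is an assumption the question does not address.

```python
from collections import defaultdict

# Rows from the question: (State, Customer, Fruit, Qty)
rows = [
    ("NY", "A", "Apple", 5), ("NY", "A", "Orange", 1), ("NY", "A", "Pear", 5),
    ("NY", "B", "Apple", 1), ("NY", "B", "Orange", 6),
    ("NY", "C", "Apple", 2), ("NY", "C", "Orange", 2), ("NY", "C", "Pear", 5),
    ("CA", "D", "Orange", 4), ("CA", "D", "Pear", 2),
]

# Step 1: sum Qty per (State, Fruit), ignoring Customer
totals = defaultdict(int)
for state, _customer, fruit, qty in rows:
    totals[(state, fruit)] += qty

# Step 2: within each state, keep the fruit with the largest total (rank 1)
dominant = {}
for (state, fruit), qty in totals.items():
    if state not in dominant or qty > totals[(state, dominant[state])]:
        dominant[state] = fruit

print(dominant)  # {'NY': 'Pear', 'CA': 'Orange'}
```

This matches the expected result table in the question: 8 apples, 9 oranges and 10 pears in NY.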

Related

Add multiple custom row measures in PowerBI

I have a Power BI matrix and I'm trying to add 3 custom rows at the end of each group, but can't figure out how to do so. Below is what the matrix looks like.
Salesperson   Total Units Sold
John
  Apples      10
  Oranges     5
  Spoilage    2
Katie
  Mangoes     12
  Apples      9
  Pears       15
  Spoilage    1
And I'm trying to get a Total, Net and Percentage into the matrix as shown below. Total Fruits is a summation of all the rows above except the spoilage row. Net is the summation of all above including the Spoilage and Percentage (Pct) is Spoilage divided by Total Fruits.
Salesperson       Total Units Sold
John
  Apples          10
  Oranges         5
  Total Fruits    15
  Spoilage        2
  Net             13
  Pct             13.3%
Katie
  Mangoes         12
  Apples          9
  Pears           15
  Total Fruits    36
  Spoilage        1
  Net             35
  Pct             2.9%
I have a fact table that records each fruit sold by the product code and the salesperson id and dimension tables for the salesperson and the products.
I'm new to PowerBI and so I would appreciate all the details to make this work.
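The arithmetic behind the three requested rows can be sketched in a few lines. This is a minimal Python sketch of the calculation only, not a Power BI measure; `summarize` is a hypothetical helper, and Net is assumed to subtract Spoilage, as the expected output (15 - 2 = 13) implies.

```python
# Units per salesperson, copied from the question's matrix
sales = {
    "John": {"Apples": 10, "Oranges": 5, "Spoilage": 2},
    "Katie": {"Mangoes": 12, "Apples": 9, "Pears": 15, "Spoilage": 1},
}

def summarize(items):
    """Return the three custom rows for one salesperson (hypothetical helper)."""
    spoilage = items.get("Spoilage", 0)
    # Total Fruits: every row except Spoilage
    total_fruits = sum(q for name, q in items.items() if name != "Spoilage")
    # Net: Total Fruits minus Spoilage (assumed from 15 - 2 = 13 in the question)
    net = total_fruits - spoilage
    # Pct: Spoilage divided by Total Fruits
    pct = spoilage / total_fruits if total_fruits else 0.0
    return {"Total Fruits": total_fruits, "Net": net, "Pct": pct}

summary = {person: summarize(items) for person, items in sales.items()}
for person, s in summary.items():
    print(person, s["Total Fruits"], s["Net"], f"{s['Pct']:.1%}")
```

With the stated definition, Katie's Pct comes out as 1/36, about 2.8%, slightly different from the 2.9% shown in the question's table.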

How to create means in panel data for specific years?

I need help in a particular issue with Stata. I have a panel dataset by id year from 1996 to 2018.
The panel data is a combination of world countries and regions, yearly observations, for 7 different crops, area cultivated.
I would like to create a mean around years 2000, 2010 and 2018, so that mean(year2000)= mean of (1999+2000+2001), mean(year2010)=mean from (2009+2010+2011) and mean(year2018)= mean from (2016+2017+2018) for every crop from my 7 crops selection.
The problem becomes more complicated when I need to combine some countries to form sub-regions: say I need the sub-region RUS1 = Russia + Ukraine. How can I create another variable that shows, on a yearly basis, the total of crop1 area cultivated in Russia plus crop1 area cultivated in Ukraine? That is, another variable that shows these sums for each year, to be used with the above means.
I've tried by id year: egen area_rus1=total(area) if area=="Russia" & area=="Ukraine"
but nothing works.
The names in area are strings, so I used encode (area), gen (area2), and Stata automatically generates a number.
To create a panel dataset I've used gen id=area2+itemcode
The panel data looks like this after sort year
Please be aware that the period is 1996-2018. The example above shows only year 1996.
This didn't get much of a response, for several reasons:
You didn't show very much code.
You didn't show data in a form that is especially useful. An image can't be copied and pasted easily into someone's Stata to allow experiment. In fact your image shows variables that are irrelevant and variables that are different versions of each other and so is much more complicated than we need.
You escalated the question to ask the most complicated version of what you want to know.
There is a problem you should have explained better. area is string and so totals can't be calculated at all and area2 is just arbitrary integers so totals can be calculated but don't make sense. "nothing works" is not informative as a problem report. The only totals that make sense to me are totals of value.
You need to simplify your problem first and then build up.
The essence seems to be as follows:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str2 country str6 item float year str1 region float value
"A" "barley" 1999 "X" 1
"B" "barley" 1999 "X" 2
"C" "barley" 1999 "Y" 3
"A" "barley" 2000 "X" 4
"B" "barley" 2000 "X" 5
"C" "barley" 2000 "Y" 6
"A" "barley" 2001 "X" 7
"B" "barley" 2001 "X" 8
"C" "barley" 2001 "Y" 9
end
* means by countries: similar variables for other periods
egen mean_9901_c = mean(cond(inrange(year, 1999, 2001), value, .)), by(country item)
* aggregation to regions, but ensure that you don't double count
egen value_region = total(value), by(region item year)
egen tag = tag(region item year)
* means by regions: similar variables for other periods
egen mean_9901_r = mean(cond(tag == 1 & inrange(year, 1999, 2001), value_region, .)), by(region item)
list, sepby(year)
     +---------------------------------------------------------------------------------+
     | country     item   year   region   value   mean_9~c   value_~n   tag   mean_9~r |
     |---------------------------------------------------------------------------------|
  1. |       A   barley   1999        X       1          4          3     1          9 |
  2. |       B   barley   1999        X       2          5          3     0          9 |
  3. |       C   barley   1999        Y       3          6          3     1          6 |
     |---------------------------------------------------------------------------------|
  4. |       A   barley   2000        X       4          4          9     1          9 |
  5. |       B   barley   2000        X       5          5          9     0          9 |
  6. |       C   barley   2000        Y       6          6          6     1          6 |
     |---------------------------------------------------------------------------------|
  7. |       A   barley   2001        X       7          4         15     1          9 |
  8. |       B   barley   2001        X       8          5         15     0          9 |
  9. |       C   barley   2001        Y       9          6          9     1          6 |
     +---------------------------------------------------------------------------------+
The example shows just one item, but the code should work for several.
The example shows fake data for just three years, but means for other periods can be constructed similarly.
Results are repeated for all observations to which they apply. To see or use results just once, use if. For example the means over 1999 to 2001 are shown for each of those years (and others) but if year == 1999 would be a way to see results just once.
See also help collapse, help egen for its tag() function and this paper.
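For readers who want to check the numbers outside Stata, the same two steps (window means per country, then yearly region totals followed by window means per region) can be reproduced in a minimal Python sketch. This is only a cross-check of the listing above, not a replacement for the egen code; `window_mean` is a hypothetical helper name.

```python
from collections import defaultdict
from statistics import mean

# Same fake data as the Stata example: (country, item, year, region, value)
data = [
    ("A", "barley", 1999, "X", 1), ("B", "barley", 1999, "X", 2), ("C", "barley", 1999, "Y", 3),
    ("A", "barley", 2000, "X", 4), ("B", "barley", 2000, "X", 5), ("C", "barley", 2000, "Y", 6),
    ("A", "barley", 2001, "X", 7), ("B", "barley", 2001, "X", 8), ("C", "barley", 2001, "Y", 9),
]

def window_mean(values_by_year, first, last):
    """Mean of the yearly values whose year lies in [first, last]."""
    return mean(v for y, v in values_by_year.items() if first <= y <= last)

# Yearly values per (country, item)
country_year = defaultdict(dict)
# Yearly totals per (region, item): one total per year, so nothing is double counted
region_year = defaultdict(lambda: defaultdict(int))
for country, item, year, region, value in data:
    country_year[(country, item)][year] = value
    region_year[(region, item)][year] += value

mean_9901_c = {k: window_mean(v, 1999, 2001) for k, v in country_year.items()}
mean_9901_r = {k: window_mean(v, 1999, 2001) for k, v in region_year.items()}

print(mean_9901_c)
print(mean_9901_r)
```

Keeping one total per region, item and year plays the same role as the tag variable in the Stata code: each yearly total enters the regional mean exactly once.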
What was wrong with your code
Your problems start with
if area=="Russia" & area=="Ukraine"
which selects observations for which it is true that area is both "Russia" and "Ukraine" in the same observation, which is impossible. You need the | (or) operator there, not the & operator, or to approach the problem in another way.
The prefix id is wrong too. Using by id: enforces separate calculations for different values of id and is going to make the combinations of identifiers impossible.

How to find average of tuples in relational algebra calculator

The problem is to use the group-by operator to find the average number of books checked out by students of one specific department only. However, it keeps outputting the average of all checked-out books across all students.
What I have so far:
γ avg(Books_Quantity) -> y (Student) ⨝ (σ Department = 'Computer_Science' (Student))
The output should be 1.75, but is instead outputting the average for all the departments.
y    Student.Student_ID  Student.Student_Name  Student.Department  Student.Books_Quantity
1.5  1                   John                  Computer_Science    2
1.5  2                   Lisa                  Computer_Science    1
1.5  5                   Xina                  Computer_Science    3
1.5  7                   Chang                 Computer_Science    1
I found the answer. The selection has to be applied to the relation that the group-by operates on, instead of being joined to the result afterwards. Like so:
γ avg(Books_Quantity) -> y (σ Department = 'Computer_Science' (Student))
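The difference between the two expressions can be mimicked in a minimal Python sketch. The four Computer_Science rows are taken from the output above; the other four rows (names, departments and quantities) are invented for illustration so that the overall average is 1.5. The reading of the first expression (aggregate over all of Student, then join) is inferred from the output shown, not from the calculator's documentation.

```python
from statistics import mean

# (Student_ID, Student_Name, Department, Books_Quantity)
# Non-Computer_Science rows are hypothetical, chosen so the overall mean is 1.5.
students = [
    (1, "John", "Computer_Science", 2),
    (2, "Lisa", "Computer_Science", 1),
    (3, "Ann", "Physics", 1),
    (4, "Omar", "Physics", 2),
    (5, "Xina", "Computer_Science", 3),
    (6, "Ravi", "History", 1),
    (7, "Chang", "Computer_Science", 1),
    (8, "Mia", "History", 1),
]

# sigma Department = 'Computer_Science' (Student)
cs = [s for s in students if s[2] == "Computer_Science"]

# First attempt: aggregate over the whole relation, then join with the
# filtered rows, so every row carries the global average.
y_all = mean(s[3] for s in students)
wrong = [(y_all, *s) for s in cs]

# Corrected version: select first, then aggregate.
y_cs = mean(s[3] for s in cs)

print(y_all, y_cs)
```

The first attempt attaches the global average to each filtered row, which is exactly the shape of the unwanted output; selecting before aggregating gives the expected 1.75.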

Sort on a separate table without joining result in pandas?

I have the following data:
fruit = pd.DataFrame({'fruit': ['apple', 'orange', 'apple', 'blueberry'],
'colour': ['red', 'orange', 'green', 'black']})
costs = pd.DataFrame({'fruit': ['apple', 'orange', 'blueberry'],
'cost': [1.7, 1.4, 2.1]})
I want a copy of the fruit table sorted by cost from the costs table, but without the cost column included. What's the best way to do this? It's fine if there's a join in an intermediate step - I'm mostly worried about long-term memory waste.
I would do a left merge and then argsort:
In [11]: fruit.merge(costs, how="left")
Out[11]:
colour fruit cost
0 red apple 1.7
1 orange orange 1.4
2 green apple 1.7
3 black blueberry 2.1
Note that if fruit had a different index, it will be ignored and replaced with range(0, len(fruit)).
In [12]: fruit.merge(costs, how="left")["cost"].argsort()
Out[12]:
0 1
1 0
2 2
3 3
Name: cost, dtype: int64
Now reorder using iloc (by position) rather than loc (by label).
In [13]: fruit.iloc[fruit.merge(costs, how="left")["cost"].argsort()]
Out[13]:
colour fruit
1 orange orange
0 red apple
2 green apple
3 black blueberry
Note: It's important to left merge as an ordinary merge will change the order (!!). It's also more efficient.
An alternative, cleaner, but less efficient way:
In [21]: fruit.merge(costs).sort("cost").loc[:, fruit.columns]
Out[21]:
colour fruit
2 orange orange
0 red apple
1 green apple
3 black blueberry
Note: In newer pandas versions, DataFrame.sort has been removed; use sort_values instead.
Why don't you merge the tables and then drop the unneeded column?
pd.merge(fruit, costs).sort_values(by='cost').drop('cost', axis=1)
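For current pandas versions, where `DataFrame.sort` and `sort_index(by=...)` are no longer available, the same idea can be sketched without building a merged frame at all. This is a minimal sketch assuming each fruit appears at most once in costs; it swaps the merge for a `Series.map` lookup, which is a different technique from the answers above.

```python
import numpy as np
import pandas as pd

fruit = pd.DataFrame({'fruit': ['apple', 'orange', 'apple', 'blueberry'],
                      'colour': ['red', 'orange', 'green', 'black']})
costs = pd.DataFrame({'fruit': ['apple', 'orange', 'blueberry'],
                      'cost': [1.7, 1.4, 2.1]})

# Look up each row's cost; only a single temporary Series is created
row_cost = fruit['fruit'].map(costs.set_index('fruit')['cost'])

# Stable argsort keeps the original order among rows with equal cost
order = np.argsort(row_cost.to_numpy(), kind='stable')

# Reorder by position; the cost column never becomes part of the result
sorted_fruit = fruit.iloc[order]
print(sorted_fruit)
```

Fruits that are missing from costs get NaN as their cost, and NumPy's sort places NaN values last.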

How to optimize Cartesian product

Is there a better way to compute a Cartesian product? Since the details differ from case to case, I think I need to explain what I am trying to achieve and why I ended up computing a Cartesian product. Please tell me whether a Cartesian product is the only solution for my problem and, if so, how to improve the performance.
Background:
We are trying to help customers to buy products cheaper.
Let's say a customer ordered 5 products (prod1, prod2, prod3, prod4, prod5).
Each ordered product has been offered by different vendors.
Representation Format 1:
Vendor 1 - offers prod1, prod2, prod4
vendor 2 - offers prod1, prod5
vendor 3 - offers prod1, prod2, prod5
vendor 4 - offers prod1
vendor 5 - offers prod2
vendor 6 - offers prod3, prod4
In other words
Representation Format 2:
Prod 1 - offered by vendor1, vendor2, vendor3, vendor4
Prod 2 - offered by vendor5, vendor3, vendor1
prod 3 - offered by vendor6
prod 4 - offered by vendor1, vendor6
prod 5 - offered by vendor3, vendor2
Now to choose the best vendor based on the price. We can sort the products by price and take the first one.
In that case we choose
prod 1 from vendor 1
prod 2 from vendor 5
prod 3 from vendor 6
prod 4 from vendor 1
prod 5 from vendor 3
Complexity:
Since we chose 4 unique vendors, we need to pay 4 shipping prices.
Also each vendor has a minimum purchase order. If we don't meet it, then we end up paying that charge as well.
In order to choose the best combination of vendors, we have to compute the Cartesian product of the offers and the total price of each combination.
total price computation algorithm:
foreach unique vendor
    if (sum (product price offered by specific vendor * quantity) < minimum purchase order limit specified by specific vendor)
        totalprice += sum (product price * quantity) + minimum purchase charge + shipping price
    else
        totalprice += sum (product price * quantity) + shipping price
end foreach
In our case
{vendor1, vendor2, vendor3, vendor4}
{vendor1, vendor3, vendor5}
{vendor6}
{vendor1, vendor6}
{vendor2, vendor3}
4 * 3 * 1 * 2 * 2 = 48 combinations need to be computed to find the best combination.
{vendor1,vendor1, vendor6, vendor1, vendor2} = totalprice1,
{vendor1, vendor3, vendor6, vendor1, vendor2} = totalprice2,
*
{vendor4, vendor5, vendor6, vendor6, vendor3} = totalprice48
Now sort the computed total price to find the best combination.
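The brute-force approach described so far can be sketched in Python as follows. The vendor lists come from the question; all prices, the quantities, the flat shipping price, the minimum order limit and the minimum purchase charge are hypothetical placeholders, since the question gives no numbers.

```python
from itertools import product

# Vendors offering each ordered product (from the question)
offers = {
    "prod1": ["vendor1", "vendor2", "vendor3", "vendor4"],
    "prod2": ["vendor1", "vendor3", "vendor5"],
    "prod3": ["vendor6"],
    "prod4": ["vendor1", "vendor6"],
    "prod5": ["vendor2", "vendor3"],
}

# Hypothetical unit prices, keyed by (vendor, product)
price = {
    ("vendor1", "prod1"): 10, ("vendor2", "prod1"): 11,
    ("vendor3", "prod1"): 12, ("vendor4", "prod1"): 9,
    ("vendor1", "prod2"): 8, ("vendor3", "prod2"): 7, ("vendor5", "prod2"): 6,
    ("vendor6", "prod3"): 15,
    ("vendor1", "prod4"): 5, ("vendor6", "prod4"): 6,
    ("vendor2", "prod5"): 4, ("vendor3", "prod5"): 5,
}
quantity = {p: 1 for p in offers}   # hypothetical: one unit of each product
SHIPPING = 5                        # hypothetical flat shipping price per vendor
MIN_ORDER = 20                      # hypothetical minimum purchase order limit
MIN_ORDER_CHARGE = 3                # hypothetical charge when below the limit

def total_price(assignment):
    """Total price for one combination; assignment maps product -> vendor."""
    subtotal = {}
    for prod, vendor in assignment.items():
        subtotal[vendor] = subtotal.get(vendor, 0) + price[(vendor, prod)] * quantity[prod]
    total = 0
    for vendor, amount in subtotal.items():
        total += amount + SHIPPING          # one shipping price per unique vendor
        if amount < MIN_ORDER:
            total += MIN_ORDER_CHARGE       # penalty for missing the minimum order
    return total

products = list(offers)
combos = [dict(zip(products, choice))
          for choice in product(*(offers[p] for p in products))]
best = min(combos, key=total_price)

print(len(combos))   # 48
print(best, total_price(best))
```

Taking the minimum directly avoids sorting all totals, but the number of combinations still grows exponentially with the number of products.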
Actual problem:
If the customer orders more than 15 products and we assume each product is offered by 8 unique vendors, then we end up computing 8^15 = 35184372088832 combinations, which takes more than a couple of hours. If the customer orders more than 20 products, it takes more than a couple of days.
Is there a solution to approach this problem in a different angle?
Your problem can get even more complex. A simple example:
Product 1 2 3
Vendor 1 10 20 40
Vendor 2 20 10 40
--------------------------
needed cnt 100 100 25
You need 100 units of P1, 100 of P2, and 25 of P3.
P1 can be purchased for 1000 at V1, P2 for 1000 at V2, and P3 for 1000 at V1 or V2.
Shipping is free if you purchase for at least 1500 at a vendor; otherwise it costs 200 at that vendor.
So if you order everything at V1, you would pay 4000:
1000+2000+1000+0 (shipping) = 4000, or the same sum,
2000+1000+1000+0 = 4000, at V2. Or, split by product,
1000+0+0+200 = 1200 at V1 plus
0+1000+1000+0 = 2000 at V2,
which sums up to 3200 and could be found by your method.
But you could split the purchase of product 3 this way:
1000+0+500+0 = 1500 at V1 plus
0+1000+500+0 = 1500 at V2
which only sums up to 3000 and would not be found by your method.
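The three totals can be verified with a minimal Python sketch. Prices, quantities, the 1500 threshold and the 200 shipping fee come from the example; `order_cost` is a hypothetical helper, and splitting 25 units of P3 evenly means 12.5 units per vendor, so the product is assumed to be divisible.

```python
# Unit prices per vendor and product, from the example
prices = {"V1": {"P1": 10, "P2": 20, "P3": 40},
          "V2": {"P1": 20, "P2": 10, "P3": 40}}
FREE_SHIPPING_FROM = 1500   # shipping is free from this purchase total
SHIPPING = 200              # otherwise this is charged per vendor

def order_cost(plan):
    """plan maps vendor -> {product: quantity bought at that vendor}."""
    total = 0
    for vendor, items in plan.items():
        goods = sum(prices[vendor][p] * q for p, q in items.items())
        if goods == 0:
            continue  # nothing ordered at this vendor, so no shipping either
        total += goods + (0 if goods >= FREE_SHIPPING_FROM else SHIPPING)
    return total

all_at_v1 = {"V1": {"P1": 100, "P2": 100, "P3": 25}}
one_vendor_per_product = {"V1": {"P1": 100}, "V2": {"P2": 100, "P3": 25}}
split_p3 = {"V1": {"P1": 100, "P3": 12.5}, "V2": {"P2": 100, "P3": 12.5}}

print(order_cost(all_at_v1))               # 4000
print(order_cost(one_vendor_per_product))  # 3200
print(order_cost(split_p3))                # 3000.0
```

Only the last plan splits a single product across vendors, which is why an enumeration that assigns each product to exactly one vendor cannot reach it.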
AFAIK, there is established research on such topics; the keywords are matrices and systems of equations.
You can describe your problem as
f(c11, p11) + f(c12, p12) + f(c13, p13) = c1 => dc1
f(c21, p21) + f(c22, p22) + f(c23, p23) = c2 => dc2
...
f(c31, p31) + f(c32, p32) + f(c33, p33) = c3 => dc3
where cij is the count of product j at vendor i and pij is the price of product j at vendor i, but f(c11,p11) is not just count*price, but a function of count and price, since there might be a quantity discount. The right side is the purchase total for vendor i.
This is without the purchase discount, which has to be modeled on top. If the discount on shipping depends only on the total cost per vendor, it can be modeled just from ci => dci.
You would try to minimize sum (dc1+dc2+...+dcm).
