Calculate balance in Power Query with M - powerquery

I have two tables:
The first holds the refunds sent to customers:
| Customer ID | Refund ID | Total refunded |
|---|---|---|
| 1 | Ref1 | $100.00 |
| 1 | Ref2 | $150.00 |
| 1 | Ref3 | $200.00 |
| 2 | Ref4 | $300.00 |
and the second links the above table with the relevant credit notes:
| Customer ID | Credit note Allocation ID | Refund Amount Allocated | Refund ID |
|---|---|---|---|
| 1 | CnAll1 | $100.00 | Ref1 |
| 1 | CnAll2 | $100.00 | Ref2 |
| 1 | CnAll3 | $50.00 | Ref2 |
| 1 | CnAll4 | $100.00 | Ref3 |
What I want to achieve is a third table that combines the two and shows, for each refund, the refund balance not allocated:
| Customer ID | Refund ID | Total refunded | Credit note Allocation ID | Refund Amount Allocated | Balance not allocated to any Credit note |
|---|---|---|---|---|---|
| 1 | Ref1 | $100.00 | CnAll1 | $100.00 | $- |
| 1 | Ref2 | $150.00 | CnAll2 | $100.00 | $- |
| 1 | Ref2 | $150.00 | CnAll3 | $50.00 | $- |
| 1 | Ref3 | $200.00 | CnAll4 | $100.00 | $100.00 |
| 2 | Ref4 | $300.00 | | $- | $300.00 |
Refund Ref1 is matched in full with the relevant credit note: the credit note value equals the refund, so there is no unallocated balance.
Refund Ref2 is also matched in full, against two credit notes: their combined value equals the refund, so there is no unallocated balance.
Refund Ref3 is matched for only $100 because the credit note is worth $100. The unallocated balance is $200 - $100 = $100.
Refund Ref4 is not matched at all, so the unallocated balance is $300.
I cannot find a way to calculate the correct balance in all of the scenarios above.

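It's mostly a question of doing a few merges, then offsetting the Refund ID column by one row so that each row can be compared with the next one.
Load your second table as Table1 into Power Query. The code below assumes its columns are named "Allocated Refund ID" and "Refund Amount", so adjust those names to match your headers.
Load your first table into Power Query with this code: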
let Source = Excel.CurrentWorkbook(){[Name="Table2"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Customer ID", Int64.Type}, {"Refund ID", type text}, {"Total refunded", Int64.Type}}),
#"Merged Queries" = Table.NestedJoin(#"Changed Type", {"Customer ID", "Refund ID"}, Table1, {"Customer ID", "Allocated Refund ID"}, "Table1", JoinKind.LeftOuter),
#"Expanded Table1" = Table.ExpandTableColumn(#"Merged Queries", "Table1", {"Credit note Allocation ID", "Refund Amount"}, {"Credit note Allocation ID", "Refund Amount"}),
RefundByID=Table.Group(Table1, {"Allocated Refund ID"}, {{"Refund Amount", each List.Sum([Refund Amount]), type nullable number}}),
#"Merged Queries2" = Table.NestedJoin(#"Expanded Table1", {"Refund ID"},RefundByID, {"Allocated Refund ID"}, "Table1", JoinKind.LeftOuter),
#"Expanded Table2" = Table.ExpandTableColumn(#"Merged Queries2", "Table1", {"Refund Amount"}, {"Refund Amount.1"}),
#"Replaced Value" = Table.ReplaceValue(#"Expanded Table2",null,0,Replacer.ReplaceValue,{"Refund Amount.1"}),
//shift Refund ID up one row so that each row also carries the next row's Refund ID
shiftedList = List.RemoveFirstN(#"Replaced Value"[Refund ID],1) & {null},
custom1 = Table.ToColumns(#"Replaced Value") & {shiftedList},
custom2 = Table.FromColumns(custom1,Table.ColumnNames(#"Replaced Value") & {"Next Row"}),
#"Added Custom" = Table.AddColumn(custom2, "Balance", each if [Refund ID]=[Next Row] then 0 else [Total refunded]-[Refund Amount.1]),
#"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Refund Amount.1", "Next Row"})
in #"Removed Columns"
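
For readers who want to check the logic outside Power Query, here is a minimal Python sketch of the same idea, using only the standard library and the sample rows from the question. It is not the M query above translated line by line: it groups allocations by (customer, refund) instead of comparing each row with the next one, and the variable names are made up for illustration.

```python
from collections import defaultdict

# Sample rows copied from the question: (customer_id, refund_id, total_refunded)
refunds = [
    (1, "Ref1", 100.0),
    (1, "Ref2", 150.0),
    (1, "Ref3", 200.0),
    (2, "Ref4", 300.0),
]
# (customer_id, allocation_id, amount_allocated, refund_id)
allocations = [
    (1, "CnAll1", 100.0, "Ref1"),
    (1, "CnAll2", 100.0, "Ref2"),
    (1, "CnAll3", 50.0, "Ref2"),
    (1, "CnAll4", 100.0, "Ref3"),
]

# Index allocations by (customer, refund) so each refund can look up its matches.
by_refund = defaultdict(list)
for customer_id, allocation_id, amount, refund_id in allocations:
    by_refund[(customer_id, refund_id)].append((allocation_id, amount))

result = []
for customer_id, refund_id, total in refunds:
    # Left-join behaviour: a refund with no allocation still yields one row.
    matches = by_refund.get((customer_id, refund_id)) or [(None, 0.0)]
    balance = total - sum(amount for _, amount in matches)
    for position, (allocation_id, amount) in enumerate(matches):
        is_last = position == len(matches) - 1
        # Report the balance once, on the last row of each refund.
        result.append((customer_id, refund_id, total, allocation_id, amount,
                       balance if is_last else 0.0))

for row in result:
    print(row)
```

As in the M version, the balance appears only on the last row of each refund, so summing the balance column does not double count refunds that are split across several credit notes.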

Related

PowerPivot: get latest (date-based) record in table

I am trying to do some analysis on PowerPivot.
I have mainly 2 tables.
List of Companies and Total number of employees
| Company | Total number of employees |
|---|---|
| Company A | 50 |
| Company B | 10 |
The second is a "Transaction" table about a migration. It lists how many employees had been migrated as of a specific date, always as a running total: on 10.10.22, 1 employee of Company A had been migrated; on 11.10.22, 2 in total. The total can also drop: on 16.10.22 it was 4, one less than on 15.10.22 (yes, that can happen).
| Company | Total Migrated employees | Date |
|---|---|---|
| Company A | 1 | 10.10.22 |
| Company A | 2 | 11.10.22 |
| Company A | 5 | 15.10.22 |
| Company A | 4 | 16.10.22 |
| Company B | 1 | 15.10.22 |
| Company B | 2 | 16.10.22 |
At the end I want a table showing the progress in %, so something like this
| Company | Total number of employees | Total Migrated (last day) | % |
|---|---|---|---|
| Company A | 50 | 4 | 8% |
| Company B | 10 | 2 | 20% |
Any direction really appreciated.
I am thinking about Calculate and some kind of oldest function...
Thanks a lot for your help
Unclear to me if you want powerquery or powerpivot since you tagged both of them, but this is powerquery (M)
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Added Custom" = Table.AddColumn(Source,"Total Migrated (last day)",(i)=>Table.Sort(Table.SelectRows(Table2, each [Company]=i[Company]),{{"Date", Order.Descending}})[Total Migrated employees]{0},type number),
#"Added Custom1" = Table.AddColumn(#"Added Custom", "%", each [#"Total Migrated (last day)"]/[Total number of employees],Percentage.Type)
in #"Added Custom1"
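
The "take the row with the latest date per company, then divide" logic can also be checked with a short Python sketch. This is an illustration only, assuming the dates in the question are dd.mm.yy in 2022; the variable names are made up and it does not cover a company with no migration rows at all.

```python
from datetime import date

# Sample data from the question; dates are assumed to be dd.mm.yy in 2022.
employees = {"Company A": 50, "Company B": 10}
migrations = [
    ("Company A", 1, date(2022, 10, 10)),
    ("Company A", 2, date(2022, 10, 11)),
    ("Company A", 5, date(2022, 10, 15)),
    ("Company A", 4, date(2022, 10, 16)),
    ("Company B", 1, date(2022, 10, 15)),
    ("Company B", 2, date(2022, 10, 16)),
]

# Keep, per company, the row with the most recent date.
latest = {}
for company, migrated, day in migrations:
    if company not in latest or day > latest[company][1]:
        latest[company] = (migrated, day)

# Combine with the head count: (total employees, migrated on last day, share).
progress = {
    company: (total, latest[company][0], latest[company][0] / total)
    for company, total in employees.items()
}
for company, (total, migrated, share) in progress.items():
    print(f"{company}: {migrated}/{total} = {share:.0%}")
```

Note that it picks the value on the latest date (4 for Company A), not the maximum value (5), which matches the expected table in the question.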
Found this - which is just what I needed:
https://www.youtube.com/watch?v=hidJ5T_DYQ0

Googlesheet Query Row Values in a Column by specific values in another

The sheet contains a table and a set of 9 query problems.
https://docs.google.com/spreadsheets/d/1-YW7prEz2rkCKangks2CeDK2spCidGQa8hJJhXWmGiI/edit?usp=sharing
Trying to solve for each customer request:
1 For Mike, into one cell, list Title and Rating for Title that is specifically "Where is the Sun?" List only those with Rental Date that is not TODAY(), which is 9/1/2022
2 For Lizzy, into one cell with results sorted by latest Rental Date, find all Rental Dates, Ratings, and Elements, specifically by her Favorite Title: Where is the Venus?
3 For Ed, his favorite element is specifically "A", query Rental Date, Source, Title, into one Cell AND/OR bring back Rental Date, Source, Title for rating that is exactly 10.00%
4 For John, find Source, Title, Rating into one cell, sorted by oldest Rental Date, that contain "A" in element but those elements can never contain "K"
5 For Mona, find the Rental Date, Title, and Elements for the lowest rating. In another cell, do the same but find for the highest rating.
6 For Claire, find the Title, Rental Date, and Elements for highest rated title that is "Where is Venus"
7 For Frank, find the Title, Rental Date, and Elements whose Rating is closest to the Average Rating. In another cell, do the same for the Rating that is farthest from the Average Rating.
8 For Jack, find the Rental Date and Title whose Rating is the biggest outlier in the range of Rating?
9 For Mina, list all Titles that have duplicate Rental Dates. Group the titles and state their duplicate Rental Dates.
your A2 cell is 9/1/2002 not 9/1/2022 (TODAY)
1:
=ARRAYFORMULA(TRIM(QUERY(CHAR(10)&FLATTEN(QUERY(TRANSPOSE(QUERY(A:E, "select C,D where C='Where is the Sun?' and A < date '"&TEXT(TODAY(), "e-m-d")&"'", 0)),,9^9)),,9^9)))
2:
=ARRAYFORMULA(TRIM(QUERY(CHAR(10)&FLATTEN(QUERY(TRANSPOSE(QUERY(A:E, "select A,D,E where C='Where is Venus?' order by A desc", 0)),,9^9)),,9^9)))
3:
=ARRAYFORMULA(TRIM(QUERY(CHAR(10)&FLATTEN(QUERY(TRANSPOSE(QUERY(A:E, "select A,B,C where E matches '.*(A,|, ?A$).*' and D = 0.1", 0)),,9^9)),,9^9)))
4:
=ARRAYFORMULA(TRIM(QUERY(CHAR(10)&FLATTEN(QUERY(TRANSPOSE(QUERY(A:E, "select B,C,D where E matches '.*(A,|, ?A$).*' and not E matches '.*(K,|, ?K$).*' order by A asc", 0)),,9^9)),,9^9)))
5:
=ARRAYFORMULA(TRIM({QUERY(CHAR(10)&FLATTEN(QUERY(TRANSPOSE(QUERY(A:E, "select A,C,E where D ="&MIN(D2:D), 0)),,9^9)),,9^9), QUERY(CHAR(10)&FLATTEN(QUERY(TRANSPOSE(QUERY(A:E, "select A,C,E where D ="&MAX(D2:D), 0)),,9^9)),,9^9)}))
6:
=ARRAYFORMULA(TRIM(QUERY(CHAR(10)&FLATTEN(QUERY(TRANSPOSE(QUERY(A:E, "select C,A,E where C = 'Where is Venus?' order by D desc limit 1", 0)),,9^9)),,9^9)))
7:
=ARRAYFORMULA(TRIM({QUERY(CHAR(10)&FLATTEN(QUERY(TRANSPOSE(QUERY({A2:E, ABS(D2:D-AVERAGE(D2:D))&""}, "select Col3,Col1,Col5 where Col6 ='"&MIN(ABS(D2:D-AVERAGE(D2:D)))&"'", 0)),,9^9)),,9^9), QUERY(CHAR(10)&FLATTEN(QUERY(TRANSPOSE(QUERY({A2:E, ABS(D2:D-AVERAGE(D2:D))&""}, "select Col3,Col1,Col5 where Col6 ='"&MAX(ABS(D2:D-AVERAGE(D2:D)))&"'", 0)),,9^9)),,9^9)}))
8:
no idea what "biggest outlier in the range of Rating" is. if you want that negative value just say so...
9:
kindly provide an example of the desired result in your sample sheet

Power query - Group a table based on the date and the hourly interval

I need to group a table based on the date and the hourly interval, using the Sum:
Date
Interval: from 8am today to <8am today+1
Previously I was using MS Access and a query to create it. Now I need to go through Power Query in MS Excel.
That was the SQL Query used before:
SELECT switch(Tbl_Prod_Chat.[Interval]>=8,Tbl_Prod_Chat.[Date],Tbl_Prod_Chat.[Interval]<8,Tbl_Prod_Chat.[Date]-1) AS LINK_DATE, Tbl_Prod_Chat.Agent, Sum(Tbl_Prod_Chat.ProdChat) AS Prod_Chat
FROM Tbl_Prod_Chat
GROUP BY Switch(Tbl_Prod_Chat.[Interval]>=8,Tbl_Prod_Chat.[Date],Tbl_Prod_Chat.[Interval]<8,Tbl_Prod_Chat.[Date]-1), Tbl_Prod_Chat.Agent;
The table is built as:
Field 1 "Date" (type/format: mm/dd/yyyy)
Field 2 "Interval" (type: whole number): 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 0
Field 3 "Volume of contact" (type: whole number)
The new table would be:
Field 1 "Date"
Field 2 "Total Volume" (sum over 24h, from 8am today to <8am today+1).
Can you please help me on this?
Thanks
Seb
Sounds like you just need to add a single custom column
add column .. custom column...
= if [Interval] >7 or [Interval]=0 then [Date] else Date.AddDays([Date],-1)
or
= if [Interval] <8 and [Interval] > 0 then Date.AddDays([Date],-1) else [Date]
Each resulting date D then covers hours [8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,0] recorded on D plus hours [1,2,3,4,5,6,7] recorded on the next date, D+1.
Then right click ... Group By .. on that new custom column and do operation Sum on Column: Volume of Contact, with whatever name you want in New Column Name
sample full code
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Date", type date}, {"Interval", Int64.Type}, {"Volume of contact", Int64.Type}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each if [Interval] >7 or [Interval]=0 then [Date] else Date.AddDays([Date],-1), type date),
#"Grouped Rows" = Table.Group(#"Added Custom", {"Custom"}, {{"Volume of Contact", each List.Sum([Volume of contact]), type number}})
in #"Grouped Rows"
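
To sanity-check the date shift before grouping, the same rule can be sketched in a few lines of Python. The sample rows and the function name `production_day` are hypothetical; only the rule itself comes from the custom column above.

```python
from collections import defaultdict
from datetime import date, timedelta

def production_day(day, interval):
    """Map a (date, hour interval) pair to the 8am-to-8am day it belongs to.

    Mirrors the custom column above: hours 8..23 and 0 stay on their date,
    hours 1..7 are moved to the previous date.
    """
    if interval > 7 or interval == 0:
        return day
    return day - timedelta(days=1)

# Hypothetical sample rows: (date, interval, volume of contact)
rows = [
    (date(2022, 9, 1), 9, 10),
    (date(2022, 9, 1), 23, 5),
    (date(2022, 9, 2), 3, 7),   # before 8am, so it counts towards 1 Sep
    (date(2022, 9, 2), 8, 2),
]

# Group by the shifted date and sum the volume.
totals = defaultdict(int)
for day, interval, volume in rows:
    totals[production_day(day, interval)] += volume

for day, volume in sorted(totals.items()):
    print(day, volume)
```

One thing to double check against your data: the original Access query sends every interval below 8, including 0, to the previous date, while the custom column keeps interval 0 on its own date. If 0 should follow the Access behaviour, drop the `interval == 0` test (and the `or [Interval]=0` part in M).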

Insert rows for missing dates in Power Query

The starting point is the following table in which entries are made for events on specific days (journal).
| Entity | Event | Date | Amount |
|---|---|---|---|
| 0123 | acquisition | 05.05.2015 | 10,000.00 |
| 0123 | capital increase | 30.11.2015 | 1,000.00 |
| 0123 | write-off | 31.12.2017 | -4,000.00 |
| 0123 | write-up | 31.12.2019 | 3,000.00 |
This journal is loaded into Power Query to be enhanced with additional information from other sources.
The goal is a Power Pivot table in which the amounts are summarized as at 31.12. of each year (Subtotals).
| Year | Entity | Event | Date | Amount |
|---|---|---|---|---|
| 2015 | 0123 | acquisition | 05.05.2015 | 10,000.00 |
| 2015 | 0123 | capital increase | 30.11.2015 | 1,000.00 |
| 2015 Subtotal | 0123 | | | 11,000.00 |
| 2016 Subtotal | 0123 | | | 11,000.00 |
| 2017 | 0123 | write-off | 31.12.2017 | -4,000.00 |
| 2017 Subtotal | 0123 | | | 7,000.00 |
| 2018 Subtotal | 0123 | | | 7,000.00 |
| 2019 | 0123 | write-up | 31.12.2019 | 3,000.00 |
| 2019 Subtotal | 0123 | | | 10,000.00 |
| 2020 Subtotal | 0123 | | | 10,000.00 |
The question is how to insert rows in Power Query for years where no activity (event) has occurred (no entry in the journal) so that a subtotal can be shown in Power Pivot as of 31.12. of each year.
I hope I could explain my issue in an understandable way. Thanks in advance for your help!
Kind regards,
Joerg
See if something like this works for you. There are shorter, but more confusing, ways to do it.
Get the minimum and maximum year across all the data and create a table of all combinations of years and entities. Check which combinations are already used; for those that are not, append a row for that year and entity to the original table, dated 31 December.
There is a bit of self-merging, so paste this into Home > Advanced Editor, since not all of it can be done in the user interface.
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Entity", Int64.Type}, {"Event", type text}, {"Date", type date}, {"Amount", Int64.Type}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Year", each Date.Year([Date])),
// Create table of all possible Entities and Years
DateList = {Date.Year(List.Min(#"Added Custom"[Date])) .. Date.Year(List.Max(#"Added Custom"[Date]))},
Entities = Table.AddColumn(Table.Distinct(Table.SelectColumns(#"Added Custom",{"Entity"})),"Year", each DateList),
#"Expanded Year" = Table.ExpandListColumn(Entities, "Year"),
// Find unique Data and merge into original data set
#"Merged Queries" = Table.NestedJoin(#"Expanded Year",{"Year", "Entity"},#"Added Custom",{"Year", "Entity"},"Table2",JoinKind.LeftOuter),
#"Expanded Table2" = Table.ExpandTableColumn(#"Merged Queries", "Table2", {"Date"}, {"Date2"}),
#"Filtered Rows" = Table.SelectRows(#"Expanded Table2", each ([Date2] = null)),
#"Added Custom1" = Table.AddColumn(#"Filtered Rows", "Date", each #date([Year],12,31), type date),
#"Removed Columns" = Table.RemoveColumns(#"Added Custom1",{"Date2", "Year"}),
#"Appended Query" = Table.Combine({#"Changed Type", #"Removed Columns" })
in #"Appended Query"
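
The gap-filling step can be illustrated with a small Python sketch over the journal rows from the question. It is a sketch of the same approach, not a replacement for the query; the names `fillers` and `completed` are made up here.

```python
from datetime import date

# Journal rows from the question: (entity, event, date, amount)
journal = [
    ("0123", "acquisition", date(2015, 5, 5), 10000.0),
    ("0123", "capital increase", date(2015, 11, 30), 1000.0),
    ("0123", "write-off", date(2017, 12, 31), -4000.0),
    ("0123", "write-up", date(2019, 12, 31), 3000.0),
]

# Every year between the first and last journal entry, for every entity.
years = range(min(r[2].year for r in journal), max(r[2].year for r in journal) + 1)
entities = sorted({r[0] for r in journal})
used = {(r[0], r[2].year) for r in journal}

# One filler row dated 31.12. for every (entity, year) without a journal entry.
fillers = [
    (entity, None, date(year, 12, 31), None)
    for entity in entities
    for year in years
    if (entity, year) not in used
]
completed = sorted(journal + fillers, key=lambda r: (r[0], r[2]))

for row in completed:
    print(row)
```

Like the M query, this only covers years up to the last year present in the data (2019 here). To also get a row for 2020, as in the desired output, the upper end of the year range would have to come from somewhere else, for example the current year.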

SQL sample groups

I have a sqlite database that I can read as:
In [42]: df = pd.read_sql("SELECT * FROM all_vs_all", engine)
In [43]:
In [43]: df.head()
Out[43]:
user_data user_model \
0 037d05edbbf8ebaf0eca#172.16.199.165 037d05edbbf8ebaf0eca#172.16.199.165
1 037d05edbbf8ebaf0eca#172.16.199.165 060210bf327a3e3b4621#172.16.199.33
2 037d05edbbf8ebaf0eca#172.16.199.165 1141259bd36ba65bef02#172.21.44.180
3 037d05edbbf8ebaf0eca#172.16.199.165 209627747e2af1f6389e#172.16.199.181
4 037d05edbbf8ebaf0eca#172.16.199.165 303a1aff4ab6e3be82ab#172.21.112.182
score Class time_secs model_name bin_id
0 0.283141 0 1514764800 Flow 0
1 0.999300 1 1514764800 Flow 0
2 1.000000 1 1514764800 Flow 0
3 0.206360 1 1514764800 Flow 0
4 1.000000 1 1514764800 Flow 0
As the table is too big, rather than reading the full table I select a random subset of rows.
This can be done very quickly as:
random_query = "SELECT * FROM all_vs_all WHERE abs(CAST(random() AS REAL))/9223372036854775808 < %f AND %s" % (ratio, time_condition)
df = pd.read_sql(random_query, engine)
The problem is that for each triplet [user_data, user_model, time_secs] I want to get all the rows containing that triplet. Each triplet appears 1 or 2 times.
A possible way to do it is to firstly sample a random set of triplets and then get all the rows that have one of the selected triplets but this seems to be too slow.
Is there an efficient way to do it?
EDIT: If I could load all the data in pandas I would have done something like:
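selected_groups = []
# groupby yields (key, sub-frame) pairs, so unpack and keep only the sub-frame
for _, group in df.groupby(['user_data', 'user_model', 'time_secs']):
    # keep each whole group with probability `ratio`
    if np.random.uniform(0, 1) < ratio:
        selected_groups.append(group)
res = pd.concat(selected_groups)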
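
One way to do the "sample triplets first, then fetch their rows" idea entirely inside SQLite is sketched below with the standard `sqlite3` module. It is only a sketch under assumptions: the temp table name `sampled_keys`, the helper `sample_groups` and the in-memory demo rows are made up, and whether it is fast enough on the real table is untested (an index on the three key columns may help, but that is a guess).

```python
import sqlite3

KEYS = "user_data, user_model, time_secs"

def sample_groups(conn, ratio):
    """Return every row of a random subset of (user_data, user_model, time_secs) groups."""
    threshold = float(ratio)  # formatted into the SQL as a plain numeric literal
    conn.execute("DROP TABLE IF EXISTS temp.sampled_keys")
    # Sample on the grouped keys, so a triplet is either kept or dropped as a whole.
    conn.execute(
        f"""
        CREATE TEMP TABLE sampled_keys AS
        SELECT {KEYS}
        FROM all_vs_all
        GROUP BY {KEYS}
        HAVING abs(CAST(random() AS REAL)) / 9223372036854775808 < {threshold!r}
        """
    )
    # Fetch all rows whose triplet was sampled.
    return conn.execute(
        f"SELECT a.* FROM all_vs_all AS a JOIN sampled_keys AS s USING ({KEYS})"
    ).fetchall()

# Tiny in-memory stand-in for the real table (hypothetical values).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE all_vs_all (user_data TEXT, user_model TEXT, time_secs INTEGER, score REAL)"
)
conn.executemany(
    "INSERT INTO all_vs_all VALUES (?, ?, ?, ?)",
    [
        ("u1", "m1", 1514764800, 0.28),
        ("u1", "m1", 1514764800, 0.99),
        ("u1", "m2", 1514764800, 1.00),
        ("u2", "m1", 1514764800, 0.21),
        ("u2", "m1", 1514764800, 0.75),
    ],
)
sample = sample_groups(conn, 0.5)
```

With pandas, the final `SELECT` could be passed to `pd.read_sql` instead of calling `fetchall()`, after the temp table has been created on the same connection.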
A few sample joins and SQL queries:
currently admitted :
Select p.patient_no, p.pat_name,p.date_admitted,r.room_extension,p.date_discharged FROM Patient p JOIN room r ON p.room_location = r.room_location where p.date_discharged IS NULL ORDER BY p.patient_no,p.pat_name,p.date_admitted,r.room_extension,p.date_discharged;
vacant rooms:
SELECT r.room_location, r.room_accomadation, r.room_extension FROM room r where r.room_location NOT IN (Select p.room_location FROM patient p where p.date_discharged IS NULL) ORDER BY r.room_location, r.room_accomadation, r.room_extension;
no charges yet :
SELECT p.patient_no, p.pat_name, COALESCE (b.charge,0.00) charge FROM patient p LEFT JOIN billed b on p.patient_no = b.patient_no WHERE p.patient_no NOT IN (SELECT patient_no FROM billed) group by p.patient_no ORDER BY p.patient_no, p.pat_name,charge;
max salarised :
SELECT phy_id,phy_name, salary FROM physician where salary in (SELECT MAX(salary) FROM physician) UNION
SELECT phy_id,phy_name, salary FROM physician where salary in (SELECT MIN(salary) FROM physician) ORDER BY phy_id,phy_name, salary;
various item consumed by:
select p.pat_name, i.discription, count(i.item_code) as item_code from patient p join billed b on p.patient_no = b.patient_no join item i on b.item_code = i.item_code group by p.patient_no, i.item_code order by..
patients who have not received treatment:
SELECT p.patient_no,p.pat_name FROM patient p where p.patient_no NOT IN (SELECT t.patient_no FROM treats t)
ORDER BY p.patient_no,p.pat_name;
2 high paid :
SELECT phy_id, phy_name, date_of_joining, salary FROM physician
ORDER BY salary DESC LIMIT 2;
over 200:
select patient_no, sum(charge) as total_charge from billed group by patient_no having sum(charge) > 200 order by patient_no, total_charge

Resources