Power Query: how to extract data from a text file

I have a text file (in fact a report) that has several pages, each with a header and a footer. The header contains a string that indicates the topic covered in the body of the page. I would like to extract the body of the pages that relate to a specific topic. Headers and footers have the same number of lines on every page, and the body has the same structure, as shown in the example at the bottom of this note. How can I extract the information about claims type BBB only?
The number of rows to skip at the top of the report is unknown, as is the number of rows to drop at the bottom. Could somebody point me in the right direction? Thank you.
Page 1
Claims type: AAA
Claim # Amount $
11111 10
11112 20
.....
End of Page 1
Page 2
Claims type : AAA
...etc.
End of Page 2
Page 3
Claims type : BBB
Claim # Amount $
21111 100
21112 200
.....
End of Page 3
Page 4
Claims type : CCC

You can do it with the UI only:
let
Source= Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
AddCustom = Table.AddColumn(Source, "Custom", each if Text.Start([Column1],6)="Claims" then Text.End([Column1],3) else if Text.Start([Column1],6)="End of" then "Trash" else null),
ReplErrs = Table.ReplaceErrorValues(AddCustom, {{"Custom", null}}),
FillDown = Table.FillDown(ReplErrs,{"Custom"}),
FilterBBB = Table.SelectRows(FillDown, each ([Custom] = "BBB")),
Rem1st = Table.Skip(FilterBBB,1),
Promoted = Table.PromoteHeaders(Rem1st)
in
Promoted
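If you want to sanity-check the tag, fill-down, and filter logic outside Power Query, here is a minimal Python sketch of the same idea. It assumes the report has already been read into a list of lines; the function name `extract_topic` and the shortened sample report are only illustrative, not part of the original answer.

```python
def extract_topic(lines, topic):
    """Return the rows of every page whose 'Claims type' line ends with `topic`."""
    label = None  # plays the role of the filled-down "Custom" column
    rows = []
    for line in lines:
        text = line.strip()
        if text.startswith("Claims"):
            label = text[-3:]      # like Text.End([Column1], 3)
        elif text.startswith("End of"):
            label = "Trash"        # footer; the next "Page N" line inherits it
        if label == topic:
            rows.append(text)
    return rows[1:]                # like Table.Skip(..., 1): drop the "Claims type" line


# Shortened version of the sample report from the question
report = [
    "Page 1", "Claims type: AAA", "Claim # Amount $", "11111 10", "11112 20", "End of Page 1",
    "Page 3", "Claims type : BBB", "Claim # Amount $", "21111 100", "21112 200", "End of Page 3",
    "Page 4", "Claims type : CCC",
]

print(extract_topic(report, "BBB"))
# ['Claim # Amount $', '21111 100', '21112 200']
```

As in the M query, the first remaining row is the column header line, which is what the header promotion step then consumes.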

I don't think there's a way to do this purely through the UI. You'll want to use the Table.PositionOf and List.PositionOf functions.
Here is what I have:
let
Source = Table, // however you get the table
#"Position of Claims" = Table.PositionOf(Source, [Column1 = "Claims type : BBB", Column2 = null]),
// Remove entries above the table belonging to Claims type BBB.
#"Remove Top Rows" = Table.Skip(Source, #"Position of Claims" + 2),
// Flag the row that has the "End of Page" tag
#"Added Custom" = Table.AddColumn(#"Remove Top Rows", "Custom", each if [Column1] is text and Text.StartsWith([Column1], "End of Page") then 1 else 0),
#"Position of End of Page" = List.PositionOf(#"Added Custom"[Custom], 1),
// Remove rows that don't belong to this page's table
#"Remove Bottom Rows" = Table.FirstN(#"Added Custom", #"Position of End of Page"),
// Remove the column that told us which row had End of Page on it
#"Removed Columns" = Table.RemoveColumns(#"Remove Bottom Rows",{"Custom"})
in
#"Removed Columns"
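The position-based idea (find the topic line, skip two rows, cut at the next footer) can be sketched in Python as follows. This is only an illustration under assumptions: the report is a list of lines, the header text matches exactly, and the helper name `body_between` is made up here. Like the query above, it returns only the first page for the topic.

```python
def body_between(lines, header, footer_prefix="End of Page"):
    """Return the rows between the page header block and the next footer line."""
    # Skip everything up to and including the topic line and the column-header line,
    # mirroring Table.Skip(Source, position + 2).
    start = lines.index(header) + 2
    rest = lines[start:]
    # Position of the first footer line after the header, like List.PositionOf(..., 1).
    end = next(i for i, s in enumerate(rest) if s.startswith(footer_prefix))
    return rest[:end]              # like Table.FirstN(..., position)


# Shortened version of the sample report from the question
report = [
    "Page 1", "Claims type: AAA", "Claim # Amount $", "11111 10", "11112 20", "End of Page 1",
    "Page 3", "Claims type : BBB", "Claim # Amount $", "21111 100", "21112 200", "End of Page 3",
    "Page 4", "Claims type : CCC",
]

print(body_between(report, "Claims type : BBB"))
# ['21111 100', '21112 200']
```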

Related

Power Query: Extracting value from nested lists

Hello, I hope someone can assist me with a Power Query problem I'm having trouble with. I'm brand new to Power Query and the M language, and while I do have some coding background, coding is not my day job. I'm pulling data from a web page, and the data in one column is a list nested in a list.
This is a clip of what I see from the query initially:
I then drill down on the list and see this for all of the rows:
I then drill down again in that list and get this:
At this level I get at least one row, but there could be many rows.
What I want is to take all of the values and combine them into one cell as a bulleted list like this:
Any assistance on how to do this would be appreciated.
I've tried looking at some of the examples in other threads, but I only get errors when I try them.
You didn't really provide enough detail here, but it looks like a bunch of lists within lists.
You can run them through something like this to expand them all. If the results don't look like what you want, provide more information and sample data we can reproduce.
let Source = <<copy whatever your source is here>>,
//Marcel Beug 2017
TableSchema = Table.Schema(Source),
ColumnNames = Table.SelectColumns(TableSchema,{"Name"}),
IsListColumn = Table.AddColumn(ColumnNames, "IsListColumn?", each List.AllTrue(List.Transform(Table.Column(Source,[Name]), each _ is list))),
NonListColumns = Table.SelectRows(IsListColumn, each ([#"IsListColumn?"] = false)),
NonListColumnNames = Table.RemoveColumns(NonListColumns,{"IsListColumn?"})[Name],
SelectNonListColumns = Table.SelectColumns(Source,NonListColumnNames),
ListColumns = Table.SelectRows(IsListColumn, each ([#"IsListColumn?"] = true)),
ListColumnNames = Table.RemoveColumns(ListColumns,{"IsListColumn?"})[Name],
SelectListColumns = Table.SelectColumns(Source,ListColumnNames),
TableFromLists = Table.AddColumn(SelectListColumns, "TableFromLists", each Table.FromColumns(Record.FieldValues(_))),
ListTables = Table.SelectColumns(TableFromLists,{"TableFromLists"}),
Custom1 = Table.FromColumns({Table.ToRecords(SelectNonListColumns),Table.ToRecords(ListTables)}),
#"Expanded Column1" = Table.ExpandRecordColumn(Custom1, "Column1", Table.ColumnNames(#table(List.Min({1,List.Count(NonListColumnNames)}),{})), NonListColumnNames),
#"Expanded Column2" = Table.ExpandRecordColumn(#"Expanded Column1", "Column2", {"TableFromLists"}, {"TableFromLists"}),
#"Expanded TableFromLists" = Table.ExpandTableColumn(#"Expanded Column2", "TableFromLists", Table.ColumnNames(#table(List.Count(ListColumnNames),{})), ListColumnNames),
#"Reordered Columns" = Table.ReorderColumns(#"Expanded TableFromLists",ColumnNames[Name])
in #"Reordered Columns"
EDIT for specific website clarification
let Source = Web.Page(Web.Contents("https://ised-isde.canada.ca/site/high-speed-internet-canada/en/universal-broadband-fund/selected-universal-broadband-fund-projects")),
#"Expanded Data" = Table.ExpandTableColumn(Source, "Data", {"Location of project", "Number of Households to be served / Number of kilometers to be covered (mobile projects)", "Funding recipient", "Funding amountFootnote *"}, {"Location of project", "Number of Households to be served / Number of kilometers to be covered (mobile p", "Funding recipient", "Funding amountFootnote *"}),
#"Added Custom" = Table.AddColumn(#"Expanded Data", "Location of Project2", each Text.Combine([Location of project]{1},"#(lf)")),
#"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Source", "ClassName", "Id", "Location of project"}),
#"Filtered Rows" = Table.SelectRows(#"Removed Columns", each ([Caption] <> "Document")),
#"FundingToAmount" = Table.TransformColumns(#"Filtered Rows",{{"Funding amountFootnote *", each Number.From(Text.Select(_,{"0".."9",".","$"})), type number}})
in #"FundingToAmount"
Then, in Excel, format that column with wrap text turned on (Format Cells > Alignment > Text control > Wrap text).

Power query, iterate over the column records to apply a custom cumulative calculation

Using Power Query in Excel. I am trying to implement a custom column that would iteratively calculate the row based on the previous row's value of the same column.
I have a 3 column table and the 4th column will be the calculation column that I am failing to implement.
The calculation is very easy to apply in Excel which goes as follows:
Formula in cell D3: =IF(A3=1,C3+6.4,IF(C3+D2>=12.8,12.8,IF(C3+D2<=1.28,1.28,C3+D2)))
The same formula is applied to the whole column by dragging.
The idea behind it:
For each category, I have an index column starting from 1,
If Index = 1, then Calculation is Value + 6.4,
else if Value + Value(previous row Custom cumulative) >= 12.8 then 12.8
else if Value + Value(previous row Custom cumulative) <= 1.28 then 1.28
else Value + Value(previous row Custom cumulative)
So, the calculation is a cumulative sum with an upper and lower cap built into it.
How can I implement this in Power Query and M-Language?
I really appreciate your help!
I have tried using List.Generate and List.Accumulate, but I got stuck creating records that hold values from multiple columns.
Try this
(edited to make more efficient with single pass process)
let Source = Excel.CurrentWorkbook(){[Name="Table15"]}[Content],
process = (zzz as list) => let x= List.Accumulate( zzz,{0},( state, current ) =>
if List.Last(state) =0 then List.Combine ({state,{6.4+current}}) else
if List.Last(state)+current >=12.8 then List.Combine ({state,{12.8}}) else
if List.Last(state)+current <=1.28 then List.Combine ({state,{1.28}}) else
List.Combine ({state,{List.Last(state)+current}})
) in x,
#"Grouped Rows" = Table.Group(Source, {"Category"}, {{"data", each
let a=process(_[Values])
in Table.AddColumn(_, "Custom Cumulative", each a{[Index]}), type table }}),
#"Expanded data" = Table.ExpandTableColumn(#"Grouped Rows", "data", {"Index", "Values", "Custom Cumulative"}, {"Index", "Values", "Custom Cumulative"})
in #"Expanded data"
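To check the numbers the query should produce for one category, the capped running total can be written as a short Python sketch. The sample values are invented for illustration, and `capped_cumulative` is a hypothetical helper; in the M query the same logic runs once per Category group.

```python
def capped_cumulative(values, start=6.4, low=1.28, high=12.8):
    """Running total where the first row is value + start and later rows are clamped."""
    out = []
    for i, v in enumerate(values):
        if i == 0:
            total = v + start                           # Index = 1: Value + 6.4, no cap applied
        else:
            total = min(high, max(low, v + out[-1]))    # clamp to [1.28, 12.8]
        out.append(total)
    return out


# Invented sample values for a single category
sample = [1, 5, 3, -20, 2]
print([round(x, 2) for x in capped_cumulative(sample)])
# [7.4, 12.4, 12.8, 1.28, 3.28]
```

One design difference from the List.Accumulate version: this sketch detects the first row by position rather than by testing whether the previous total equals 0, so it does not rely on the running total never being exactly 0.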

Duplicate row for each item in column

I can do things in PowerQuery but I can't find how to achieve the following result:
Before:
After:
The goal is to duplicate the last row (filtered with Project Code=null) for each item in Project Code column. I think duplicating the row as is is important to keep the Metadata Table and expand it later.
Thank you very much for your help.
Try this
Grab all rows except null row
Get unique values of Project column in a list
Grab null row
Create row with the list and expand it
Put the two tables back together.
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
AllButNull=Table.SelectRows( Source, each ([Project Code] <> null)),
UniqueProjects=List.Distinct(AllButNull[Project Code]),
OnlyNull = Table.SelectRows(Source, each ([Project Code] = null)),
#"Replaced Value" = Table.ReplaceValue(OnlyNull,null, UniqueProjects,Replacer.ReplaceValue,{"Project Code"}),
#"Expanded Project Code" = Table.ExpandListColumn(#"Replaced Value" , "Project Code"),
combined = AllButNull & #"Expanded Project Code"
in combined
Alternately, grab the last row instead of the null row:
Grab all rows except last
Get unique values of Project column in a list
Grab last row
Create row with the list and expand it
Put the two tables back together.
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
AllButLast=Table.RemoveLastN(Source ,1),
UniqueProjects=List.Distinct(AllButLast[Project Code]),
OnlyBottomRow = Table.LastN(Source, 1),
#"Replaced Value" = Table.ReplaceValue(OnlyBottomRow,null, UniqueProjects,Replacer.ReplaceValue,{"Project Code"}),
#"Expanded Project Code" = Table.ExpandListColumn(#"Replaced Value", "Project Code"),
combined = AllButLast & #"Expanded Project Code"
in combined
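The five steps can also be sketched in Python to confirm the expected shape of the result. This assumes each row is a dict and the template row has None in Project Code; the `Item` column and the helper name are hypothetical, since the original before/after tables were not included.

```python
def duplicate_template_row(rows, key="Project Code"):
    """Copy every row whose key is None once per distinct non-null key value."""
    keep = [r for r in rows if r[key] is not None]        # all rows except the null row
    templates = [r for r in rows if r[key] is None]       # the null row(s)
    codes = list(dict.fromkeys(r[key] for r in keep))     # distinct codes, original order
    expanded = [{**t, key: c} for t in templates for c in codes]
    return keep + expanded                                # put the two tables back together


# Hypothetical sample table
rows = [
    {"Project Code": "P1", "Item": "a"},
    {"Project Code": "P2", "Item": "b"},
    {"Project Code": "P1", "Item": "c"},
    {"Project Code": None, "Item": "meta"},
]

for row in duplicate_template_row(rows):
    print(row)
```

Here the null row is replaced by one copy per distinct project code (P1 and P2), with every other column carried over unchanged.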

Remove specific character in position if len > 10 power query editor

Using Power Query Editor, I am trying to remove the third character from the right if the length (LEN) of the string is > 11.
This is what I am working with now: ' = Table.RenameColumns(#"Merged Columns",{{"Merged", "oe_nosuf"}})'
Example of a current value: "65507129-02"
If the value is "65507129-002", then I want to remove the extra "0" from the rightmost 3 characters.
Any help is appreciated.
Here are two ways to do this: add a new column based on Column1, or replace the current value of Column1 in place.
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Added Custom" = Table.AddColumn(Source, "Custom", each if Text.Length([Column1]) = 12 then Text.Start([Column1],9) & Text.End([Column1],2) else [Column1]),
#"Modify" = Table.TransformColumns(#"Added Custom",{{"Column1", each if Text.Length(_) = 12 then Text.Start(_,9) & Text.End(_,2) else _ , type text}})
in #"Modify"
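For reference, the string rule used in both steps (keep the first 9 characters and the last 2 when the length is exactly 12) looks like this as a Python sketch; `fix_suffix` is a made-up name, and the rule assumes the 8-digit prefix shown in the question.

```python
def fix_suffix(value):
    """Drop the 10th character when the value is exactly 12 characters long."""
    if len(value) == 12:
        # Same as Text.Start(value, 9) & Text.End(value, 2)
        return value[:9] + value[-2:]
    return value


print(fix_suffix("65507129-002"))  # 65507129-02
print(fix_suffix("65507129-02"))   # 65507129-02 (unchanged)
```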

How to tell Power BI that it should use a filter inside the calculate formula and use it as filter for other visualizations?

I am creating a loan process approval report in Power BI. One of the visuals ('funnel') displays the total count of applications, the count of applications approved at the applicant level, the count of applications approved at the product level, and the count of applications that are approved at both levels.
These measures are calculated like this:
Approved applicants =
CALCULATE(
COUNT(ApplicationDecision[applicantEligibility]);
ApplicationDecision[applicantEligibility] = 1)
I.e., it counts fields in the specific column that are equal to 1 and leaves out the residual 'zero' fields.
What I need is that this funnel visualization works as a filter, i.e. when I click the 'Approved applicants' panel, all other visualizations will be filtered by the condition 'ApplicationDecision[applicantEligibility] = 1'.
Is there a way to tell the report that it should take the filter for 'calculate' and make it work as a report-level filter when clicked on?
Thank you very much for any hint, hope I was specific enough!
Edit:
Here is the data example:
applicationUniqueId | applicantEligibility | productEligibility | applicationEligibility
A0001 1 1 1
A0002 1 0 0
A0003 0 1 0
A0004 1 1 1
A0005 0 0 0
A0006 1 0 0
And for these data, the visual would show me:
Applications: 6
Approved applicants: 4
Approved products: 3
Approved applications: 2
What I need is that when I click e.g. on row 'Approved applicants', the whole report will filter based on the condition:
[applicantEligibility]='1'
My first thought is to unpivot the data. Then all the funnel types end up in one variable, which you can easily slice on.
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WcjQwMDBU0lFC4FgdiKgRVMQAjGGixlARQxRRE6wmmMJ1I6s1QzU3FgA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [#"applicationUniqueId " = _t, #"applicantEligibility " = _t, #"productEligibility " = _t, applicationEligibility = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"applicationUniqueId ", type text}, {"applicantEligibility ", Int64.Type}, {"productEligibility ", Int64.Type}, {"applicationEligibility", Int64.Type}}),
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Changed Type", {"applicationUniqueId "}, "Attribute", "Value"),
#"Renamed Columns" = Table.RenameColumns(#"Unpivoted Other Columns",{{"Attribute", "variable"}})
in
#"Renamed Columns"
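To see what the unpivoted table looks like for the sample data in the question, here is a small Python sketch of the same reshaping. The `unpivot` helper is hypothetical; the column names `variable` and `Value` follow the query above.

```python
def unpivot(rows, id_col):
    """Turn every non-id column into its own (id, variable, Value) row."""
    return [
        {id_col: r[id_col], "variable": col, "Value": val}
        for r in rows
        for col, val in r.items()
        if col != id_col
    ]


cols = ["applicationUniqueId", "applicantEligibility", "productEligibility", "applicationEligibility"]
data = [
    ("A0001", 1, 1, 1),
    ("A0002", 1, 0, 0),
    ("A0003", 0, 1, 0),
    ("A0004", 1, 1, 1),
    ("A0005", 0, 0, 0),
    ("A0006", 1, 0, 0),
]
rows = [dict(zip(cols, rec)) for rec in data]
long_rows = unpivot(rows, "applicationUniqueId")

# Count approvals per funnel stage from the single "variable" column
counts = {}
for r in long_rows:
    counts[r["variable"]] = counts.get(r["variable"], 0) + r["Value"]
print(counts)
# {'applicantEligibility': 4, 'productEligibility': 3, 'applicationEligibility': 2}
```

The counts match the funnel values listed in the question (4, 3, and 2), and because each stage is now a value of one column, selecting a stage amounts to filtering on that column.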
