Dynamically rename a set of columns using Power Query

I am trying to dynamically rename a set of columns in Power Query, List1 being the original column names and List2 the new column names. I think I need to merge List1 and List2 into a single list of pairs, but can't figure out the correct syntax.
Many thanks!
let
//list of original column names
List1= {"Name1","Name2","Name3","Name4"},
//Create test table
Source = Table.FromRows({{1231,1233,4121,5232},{3546,3426,1246,3464}} , List1),
//list of new column names
List2 = {"NewName 1","NewName 2","NewName 3","NewName 4"},
//Rename columns (in practice, the two lists of names will be dynamic, not hard coded as below)
Result = Table.RenameColumns(Source, {
{"Name1","NewName 1"},
{"Name2","NewName 2"},
{"Name3","NewName 3"},
{"Name4", "NewName 4"}})
in
Result
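The "single list of pairs" the question asks for is a zip of the two lists. In M, List.Zip({List1, List2}) should produce {{"Name1","NewName 1"}, {"Name2","NewName 2"}, ...}, which has the same shape as the hard-coded argument to Table.RenameColumns. The sketch below models just that pairing step in Java, outside Power Query. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that models List.Zip({List1, List2}) for two lists of column names.
class RenamePairs {
    // Pairs the i-th original name with the i-th new name.
    // This sketch rejects lists of different length; it does not try to mimic M's behaviour in that case.
    static List<List<String>> zip(List<String> oldNames, List<String> newNames) {
        if (oldNames.size() != newNames.size()) {
            throw new IllegalArgumentException("name lists differ in length");
        }
        List<List<String>> pairs = new ArrayList<>();
        for (int i = 0; i < oldNames.size(); i++) {
            pairs.add(List.of(oldNames.get(i), newNames.get(i)));
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String> list1 = List.of("Name1", "Name2", "Name3", "Name4");
        List<String> list2 = List.of("NewName 1", "NewName 2", "NewName 3", "NewName 4");
        // [[Name1, NewName 1], [Name2, NewName 2], [Name3, NewName 3], [Name4, NewName 4]]
        System.out.println(zip(list1, list2));
    }
}
```

In the query itself, this would presumably become Result = Table.RenameColumns(Source, List.Zip({List1, List2})).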

If you have a table with old and new names, you can use the following pattern:
let
rename_list = Table.ToColumns(Table.Transpose(Table2)),
result = Table.RenameColumns(Table1, rename_list, MissingField.Ignore)
in result
where Table2 is the "rename table" (one row per pair of old and new names) and Table1 is the initial table with data.
This idea is described in detail here:
https://bondarenkoivan.wordpress.com/2015/04/17/dynamic-table-headers-in-power-query-sap-bydesign-odata/

If you already have the list of new column names, it seems you could convert Source to a list of rows and then call Table.FromRows with List2 as the column names:
let
//list of original column names
List1= {"Name1","Name2","Name3","Name4"},
//Create test table
Source = Table.FromRows({{1231,1233,4121,5232},{3546,3426,1246,3464}} , List1),
//list of new column names
List2 = {"NewName 1","NewName 2","NewName 3","NewName 4"},
Result = Table.FromRows(Table.ToRows(Source), List2)
in
Result
(This assumes that, e.g., Name2 will always be the second column.)
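To see why the column-order assumption matters, here is a small Java sketch contrasting positional renaming (what Table.FromRows(Table.ToRows(Source), List2) amounts to) with renaming by old name. The header model and class name are hypothetical. If the source happens to deliver Name2 before Name1, the positional version labels Name2's data as "NewName 1".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical model of a table header, used to contrast positional and by-name renaming.
class PositionalRename {
    // Like Table.FromRows(Table.ToRows(Source), List2): rows are untouched, the header is replaced by position.
    static List<String> renameByPosition(List<String> header, List<String> newNames) {
        if (header.size() != newNames.size()) {
            throw new IllegalArgumentException("column count mismatch");
        }
        return new ArrayList<>(newNames);
    }

    // Like renaming with old/new pairs: each column is looked up by its own name, wherever it sits.
    static List<String> renameByName(List<String> header, Map<String, String> mapping) {
        List<String> result = new ArrayList<>();
        for (String name : header) {
            result.add(mapping.getOrDefault(name, name));
        }
        return result;
    }

    public static void main(String[] args) {
        // Suppose the source delivers Name2 before Name1.
        List<String> header = List.of("Name2", "Name1");
        System.out.println(renameByPosition(header, List.of("NewName 1", "NewName 2"))); // [NewName 1, NewName 2]
        System.out.println(renameByName(header, Map.of("Name1", "NewName 1", "Name2", "NewName 2"))); // [NewName 2, NewName 1]
    }
}
```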

Here is the original problem restated using Ivan's solution. Carl's approach gives the same result and is a little simpler for the example I gave. However, my situation benefits from having the rename pairs set out explicitly in a table (i.e. Table2). In addition, passing MissingField.Ignore to Table.RenameColumns means that any pair whose original column doesn't exist in my production query is skipped instead of raising an error. As a result, only the columns I want to rename are changed and the rest remain unchanged.
let
//list of original column names
List1= {"Name1","Name2","Name3","Name4"},
//Create test table
Source = Table.FromRows({{1231,1233,4121,5232},{3546,3426,1246,3464}} , List1),
//list of new column names
List2 = {"NewName 1","NewName 2","NewName 3","NewName 4"},
//Rename columns (in practice, the two lists of names will be dynamic, not hard coded as above)
//Bring List1 and List2 together as rows in a table
Table2 = Table.FromRows({List1,List2}),
//Create a list of rename pairs
RenameList = Table.ToColumns(Table2),
//Call to Table.RenameColumns
Result = Table.RenameColumns(Source, RenameList, MissingField.Ignore)
in
Result
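The effect of MissingField.Ignore can be modeled with a short Java sketch. This rests on an assumption: as I understand it, without that option Table.RenameColumns raises an error when a pair names a column that isn't in the table. Columns that no pair mentions keep their names either way. The class name and the boolean flag are hypothetical stand-ins for the M function and its option.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of Table.RenameColumns applied to a list of column names.
class RenameWithIgnore {
    // Each pair is {oldName, newName} and is matched against the ORIGINAL names.
    // ignoreMissing = true plays the role of MissingField.Ignore; false raises an error instead.
    static List<String> rename(List<String> columns, List<List<String>> pairs, boolean ignoreMissing) {
        List<String> result = new ArrayList<>(columns);
        for (List<String> pair : pairs) {
            int index = columns.indexOf(pair.get(0));
            if (index < 0) {
                if (ignoreMissing) {
                    continue; // skip pairs whose old name is not in the table
                }
                throw new IllegalArgumentException("column not found: " + pair.get(0));
            }
            result.set(index, pair.get(1));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> columns = List.of("Name1", "Name2", "Other");
        List<List<String>> pairs = List.of(List.of("Name1", "NewName 1"), List.of("Name9", "NewName 9"));
        // Name9 is absent and skipped; Name2 and Other have no pair and keep their names.
        System.out.println(rename(columns, pairs, true)); // [NewName 1, Name2, Other]
    }
}
```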

Finally figured it out, using the following function:
Table.TransformColumnNames(table as table, nameGenerator as function, optional options as nullable record) as table
First, create a nameGenerator function (e.g. MyFuncRenameColumns) that returns the new column name for any original column name passed to it.
In my example, here's the code for MyFuncRenameColumns:
let
MyFunctionSwitchColumnName = (originalColumnName) as text =>
let
//list of original column names
List1= {"Name1","Name2","Name3","Name4"},
//list of new column names
List2 = {"NewName 1","NewName 2","NewName 3","NewName 4"},
//Create a record mapping each name in List1 to the corresponding new name in List2
CreateRecord = Record.FromList(List2,List1),
ConvertedtoTable = Record.ToTable(CreateRecord),
//Filter table to just the row where the input originalColumnName matches
ReduceExcess = Table.SelectRows(ConvertedtoTable, each [Name] = originalColumnName),
//Return the matching result in the [Value] column (or give the original column name if there was no valid match)
NewColumnName = try ReduceExcess{0}[Value] otherwise originalColumnName
in
NewColumnName
in
MyFunctionSwitchColumnName
Here's where you use it, as the nameGenerator argument of Table.TransformColumnNames:
let
//list of original column names
List1= {"Name1","Name2","Name3","Name4"},
//Create table
Source = Table.FromRows({{1231,1233,4121,5232},{3546,3426,1246,3464}} , List1),
RenameColumns = Table.TransformColumnNames(Source, MyFuncRenameColumns)
in
RenameColumns
Hope that helps someone!
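The nameGenerator idea amounts to a lookup with a fall-back to the original name, applied to every column. The Java sketch below models that behaviour. The class and method names are hypothetical, and the lookup is built once rather than on every call. In M, a possibly shorter body for the function is Record.FieldOrDefault(CreateRecord, originalColumnName, originalColumnName) in place of the table filter plus try/otherwise; treat that as an untested suggestion.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical model of the nameGenerator approach used with Table.TransformColumnNames.
class NameGenerator {
    // Builds a generator from two parallel name lists; unknown names are returned unchanged,
    // like the "try ... otherwise originalColumnName" fallback in MyFuncRenameColumns.
    static Function<String, String> fromLists(List<String> oldNames, List<String> newNames) {
        Map<String, String> lookup = new HashMap<>();
        for (int i = 0; i < oldNames.size(); i++) {
            lookup.put(oldNames.get(i), newNames.get(i));
        }
        return name -> lookup.getOrDefault(name, name);
    }

    // Applies the generator to every column name, as Table.TransformColumnNames does.
    static List<String> transformColumnNames(List<String> columns, Function<String, String> generator) {
        List<String> result = new ArrayList<>();
        for (String column : columns) {
            result.add(generator.apply(column));
        }
        return result;
    }

    public static void main(String[] args) {
        Function<String, String> gen = fromLists(
                List.of("Name1", "Name2", "Name3", "Name4"),
                List.of("NewName 1", "NewName 2", "NewName 3", "NewName 4"));
        System.out.println(transformColumnNames(List.of("Name1", "Name3", "Extra"), gen)); // [NewName 1, NewName 3, Extra]
    }
}
```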

Related

Powerquery: passing column value to custom function

I'm struggling to pass a column value to a formula. I've tried many different combinations, but it only works when I hard-code the column:
(tbl as table, col as list) =>
let
avg = List.Average(col),
sdev = List.StandardDeviation(col)
in
Table.AddColumn(tbl, "newcolname" , each ([column] - avg)/sdev)
I'd like to replace [column] with a variable. It's the same column I use for the average and the standard deviation.
Any help would be appreciated.
Thank you
This probably does what you want; call it as x = fctn(Source, "ColumnA").
It computes the average and standard deviation of ColumnA in the Source table and then applies the calculation to ColumnA on each row:
(tbl as table, col as text) =>
let
avg = List.Average(Table.Column(tbl,col)),
sdev = List.StandardDeviation(Table.Column(tbl,col))
in Table.AddColumn(tbl, "newcolname" , each (Record.Field(_, col) - avg)/sdev)
Alternatively, you may want this version. It computes the average and standard deviation on the list provided (which can come from any table) and then applies the calculation to the named column of the table passed in.
Call it as x = fctn(Source, "ColumnNameInSource", SomeSource[SomeColumn]):
(tbl as table, cname as text, col as list) =>
let
avg = List.Average(col),
sdev = List.StandardDeviation(col)
in Table.AddColumn(tbl, "newcolname" , each (Record.Field(_, cname) - avg)/sdev)
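The core point of both variants is that the column is chosen by name at run time: Table.Column(tbl, col) for the statistics and Record.Field(_, col) inside the row function. The Java sketch below models that with rows as maps from column name to value. The class and method names are hypothetical. It uses a sample (n − 1) standard deviation, on the assumption that this matches List.StandardDeviation, which I believe returns a sample-based estimate.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the answer's functions; a table is a list of rows (column name -> value).
class ZScoreColumn {
    // Like Table.Column(tbl, col): pulls one column out as a list.
    static List<Double> column(List<Map<String, Double>> rows, String name) {
        List<Double> values = new ArrayList<>();
        for (Map<String, Double> row : rows) {
            values.add(row.get(name));
        }
        return values;
    }

    static double mean(List<Double> values) {
        double sum = 0;
        for (double v : values) sum += v;
        return sum / values.size();
    }

    // Sample standard deviation (divides by n - 1); assumed to match List.StandardDeviation.
    static double sampleStdDev(List<Double> values) {
        double avg = mean(values);
        double squares = 0;
        for (double v : values) squares += (v - avg) * (v - avg);
        return Math.sqrt(squares / (values.size() - 1));
    }

    // Statistics come from `stats`; the formula is applied to column `name` of each row
    // (like Record.Field(_, cname)) and stored under "newcolname".
    static List<Map<String, Double>> addZScores(List<Map<String, Double>> rows, String name, List<Double> stats) {
        double avg = mean(stats);
        double sdev = sampleStdDev(stats);
        List<Map<String, Double>> result = new ArrayList<>();
        for (Map<String, Double> row : rows) {
            Map<String, Double> copy = new HashMap<>(row);
            copy.put("newcolname", (row.get(name) - avg) / sdev);
            result.add(copy);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map<String, Double>> source = List.of(Map.of("ColumnA", 1.0), Map.of("ColumnA", 2.0), Map.of("ColumnA", 3.0));
        // First variant: the same column supplies the statistics and receives the formula.
        for (Map<String, Double> row : addZScores(source, "ColumnA", column(source, "ColumnA"))) {
            System.out.println(row.get("newcolname")); // -1.0, 0.0, 1.0
        }
    }
}
```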

How can I expand all lists in a row of lists at the same time without repeating values?

In response to this question about how to expand all lists in a row of lists at the same time, @Carl Walsh kindly provided this succinct and helpful code:
let
Source = #table({"A", "B"}, {{ {1,2}, {3,4}} }),
Expanded = List.Accumulate(
Table.ColumnNames(Source),
Source,
(state, column) => Table.ExpandListColumn(state, column))
in
Expanded
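Which yields every combination of the list items, i.e. the rows (A, B) = (1, 3), (1, 4), (2, 3), (2, 4).
I would like to get this result instead, with items at the same position kept on the same row: (1, 3), (2, 4).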
I don't want values repeated in previously processed columns as each follow-on column is processed.
Is there a simple modification to Carl's code that will get me there?
Maybe not really simple, but effective. The code below combines the columns with a customized combiner that zips the lists and turns each resulting inner list into a record. The list column is then expanded, giving a column of nested records, which are in turn expanded into columns A and B.
Unfortunately, you can't use Merge Columns in the UI on nested lists, so I first created some dummy text columns and merged them to generate the base code. I then adjusted that code and removed the steps with the dummy columns.
let
Source = #table({"A", "B"}, {{ {1,2}, {3,4}} }),
#"Merged Columns" = Table.CombineColumns(Source,{"A", "B"}, each List.Transform(List.Zip(_), each Record.FromList(_,{"A","B"})),"Merged"),
#"Expanded Merged" = Table.ExpandListColumn(#"Merged Columns", "Merged"),
#"Expanded Merged1" = Table.ExpandRecordColumn(#"Expanded Merged", "Merged", {"A", "B"}, {"A", "B"})
in
#"Expanded Merged1"
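The difference between the two results comes down to cross product versus zip. The Java sketch below models the single source row {A = {1, 2}, B = {3, 4}}. The class and method names are hypothetical. Expanding one list column after another pairs every item of A with every item of B. Zipping first keeps positions aligned.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of expanding the single row {A = {1, 2}, B = {3, 4}}.
class ExpandLists {
    // Expanding A and then B, one column at a time, pairs every item of A with every item of B.
    static List<List<Integer>> expandSequentially(List<Integer> a, List<Integer> b) {
        List<List<Integer>> rows = new ArrayList<>();
        for (Integer x : a) {
            for (Integer y : b) {
                rows.add(List.of(x, y));
            }
        }
        return rows;
    }

    // Zipping first (as the List.Zip combiner does) keeps items at the same position on one row.
    // This sketch requires equal lengths rather than modelling what M does otherwise.
    static List<List<Integer>> expandZipped(List<Integer> a, List<Integer> b) {
        if (a.size() != b.size()) {
            throw new IllegalArgumentException("lists differ in length");
        }
        List<List<Integer>> rows = new ArrayList<>();
        for (int i = 0; i < a.size(); i++) {
            rows.add(List.of(a.get(i), b.get(i)));
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(expandSequentially(List.of(1, 2), List.of(3, 4))); // [[1, 3], [1, 4], [2, 3], [2, 4]]
        System.out.println(expandZipped(List.of(1, 2), List.of(3, 4)));       // [[1, 3], [2, 4]]
    }
}
```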

Filter inner bag in Pig

The data looks like this:
22678, {(112),(110),(2)}
656565, {(110), (109)}
6676, {(2),(112)}
This is the data structure:
(id:chararray, event_list:{innertuple:(innerfield:chararray)})
I want to keep only the rows where event_list contains 2. My initial idea was to flatten the data and then filter the rows that have 2, but somehow flatten doesn't work on this dataset.
Can anyone please help?
There might be a simpler way of doing this (a bag lookup, for example). Otherwise, with basic Pig, one way to achieve it is:
data = load 'data.txt' AS (id:chararray, event_list:bag{});
-- flatten bag, in order to transpose each element to a separate row.
flattened = foreach data generate id, flatten(event_list);
-- keep only those rows where the value is 2.
filtered = filter flattened by (int) $1 == 2;
-- keep only distinct ids.
dist = distinct (foreach filtered generate $0 as (id:chararray));
-- join distinct ids back to the original relation.
jnd = join data by id, dist by id;
-- remove extra fields, keep original fields.
result = foreach jnd generate data::id, data::event_list;
dump result;
(22678,{(112),(110),(2)})
(6676,{(2),(112)})
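The flatten, filter, distinct and join steps can be modeled in Java to show what each one contributes. This is a hypothetical sketch, not Pig code. The distinct step is what stops an id whose bag contains 2 more than once from being joined back twice.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical Java model of the flatten -> filter -> distinct -> join pipeline (not Pig code).
class FlattenFilterJoin {
    static Map<String, List<Integer>> keepRowsContaining(Map<String, List<Integer>> data, int target) {
        // flatten: one (id, value) pair per bag element
        List<Map.Entry<String, Integer>> flattened = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> row : data.entrySet()) {
            for (Integer value : row.getValue()) {
                flattened.add(new SimpleEntry<>(row.getKey(), value));
            }
        }
        // filter on the value, then keep distinct ids
        Set<String> dist = new LinkedHashSet<>();
        for (Map.Entry<String, Integer> pair : flattened) {
            if (pair.getValue() == target) {
                dist.add(pair.getKey());
            }
        }
        // join the distinct ids back to the original rows
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        for (String id : dist) {
            result.put(id, data.get(id));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> data = new LinkedHashMap<>();
        data.put("22678", List.of(112, 110, 2));
        data.put("656565", List.of(110, 109));
        data.put("6676", List.of(2, 112));
        System.out.println(keepRowsContaining(data, 2)); // {22678=[112, 110, 2], 6676=[2, 112]}
    }
}
```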
You can filter the bag inside a nested FOREACH and project a boolean that says whether 2 is present in it. Then filter the rows on that boolean:
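-- 'input' and 'output' are reserved words in Pig, so the relations use other names
data_in = LOAD 'data.txt' AS (id:chararray, event_list:bag{innertuple:(innerfield:chararray)});
data_flagged = FOREACH data_in {
    -- keep only the inner tuples whose value is '2'
    bag_filter = FILTER event_list BY innerfield == '2';
    GENERATE
        id,
        event_list,
        (IsEmpty(bag_filter) ? false : true) AS is_2_present:boolean;
};
data_out = FILTER data_flagged BY is_2_present;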
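Unlike the flatten-and-join answer, this approach decides each row on its own, so no distinct or join step is needed. Here is a hypothetical Java model of the same per-row logic (not Pig code). Note that the comparison is an exact match, so "112" does not count as containing "2".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical Java model of the nested FOREACH approach (not Pig code).
class BagFlagFilter {
    static final class Row {
        final String id;
        final List<String> eventList;

        Row(String id, List<String> eventList) {
            this.id = id;
            this.eventList = eventList;
        }
    }

    static List<Row> keepRowsWith(List<Row> input, String target) {
        List<Row> output = new ArrayList<>();
        for (Row row : input) {
            // like the nested FILTER: keep only the bag elements equal to target
            List<String> bagFilter = row.eventList.stream()
                    .filter(target::equals)
                    .collect(Collectors.toList());
            // like (IsEmpty(bag_filter) ? false : true): the flag is true when something matched
            boolean isPresent = !bagFilter.isEmpty();
            if (isPresent) {
                output.add(row);
            }
        }
        return output;
    }

    public static void main(String[] args) {
        List<Row> input = List.of(
                new Row("22678", List.of("112", "110", "2")),
                new Row("656565", List.of("110", "109")),
                new Row("6676", List.of("2", "112")));
        for (Row row : keepRowsWith(input, "2")) {
            System.out.println(row.id + " " + row.eventList); // 22678 [112, 110, 2], then 6676 [2, 112]
        }
    }
}
```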

Linq: the best overloaded match has some invalid arguments

I run a LINQ to SQL query to get a list:
var list1 = db.Table1.Where(a => a.item == "Widgets").ToList();
Now I want to get a list from another table using the results of list above:
var list2 = db.Table2.Where(a => list1.Contains(a.GUID)).ToList();
So far this all works as expected.
Now I want to query another DB table for all rows that have GUIDs from my list2:
var list3 = db.MyTable.Where(a => list2.Contains(a.GUID)).ToList();
The data types are the same in all three tables, so I know those match. But I get "the best overloaded match has some invalid arguments". Why?
You are missing the Where-clause in your third line:
var list3 = db.MyTable.Where(a => list2.Contains(a.GUID)).ToList();
EDIT: Okay, this was only a typo and the question has been edited; see the new answer below.
Looking at your exception:
System.Collections.Generic.List.Contains(Test1.Data.Models.Table1)' has some invalid arguments
We can see that list2 is of type List<Test1.Data.Models.Table1>, yet you try to run list2.Contains(long). You have to change
var list2 = db.Table2.Where(a => list1.Contains(a.GUID)).ToList();
to
var list2 = db.Table2.Where(a => list1.Contains(a.GUID)).Select(a => a.GUID).ToList();
Then list2 should be of type List<long>.
I am personally not a big fan of var, because you cannot see the exact type of a variable from the source code. If you change your vars to explicit types, you may spot the problem far more easily.
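The fix is to project to the keys before calling Contains. Here is a hypothetical in-memory Java analog; it says nothing about how LINQ to SQL translates Contains into SQL. The contrast is useful: Java's List.contains takes an Object, so passing a long to a list of entities would compile and silently return false. C#'s List<T>.Contains(T) is strongly typed, which is why the compiler reported invalid arguments instead.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical Java analog of the LINQ queries: project entities to their keys before calling contains.
class ProjectBeforeContains {
    static final class Entity {
        final long guid;
        final String item;

        Entity(long guid, String item) {
            this.guid = guid;
            this.item = item;
        }
    }

    // Analog of .Select(a => a.GUID): keep only the keys.
    static Set<Long> keys(List<Entity> entities) {
        return entities.stream().map(e -> e.guid).collect(Collectors.toSet());
    }

    // Analog of .Where(a => keys.Contains(a.GUID)).
    static List<Entity> whereKeyIn(List<Entity> table, Set<Long> keys) {
        return table.stream().filter(e -> keys.contains(e.guid)).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Entity> list2 = List.of(new Entity(1L, "Widgets"), new Entity(2L, "Widgets"));
        List<Entity> myTable = List.of(new Entity(2L, "a"), new Entity(3L, "b"), new Entity(1L, "c"));
        List<Entity> list3 = whereKeyIn(myTable, keys(list2));
        list3.forEach(e -> System.out.println(e.guid + " " + e.item)); // 2 a, then 1 c
    }
}
```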

Hbase - get column names for row by column name prefix

I have an HBase table with the following layout.
For a row key, my columns have the form a_1, a_2, a_3, b_1, c_1, C_2 and so on (a compound key format).
Suppose one of my rows is:
row key - row1
column family - c1
columns - a_1, a_2,a_3,b_1,b_2,c_1,C_2,d_9,d_99
Can I retrieve a, b, c, d as the columns corresponding to row1 with some single operation? I don't care about the suffixes of a, b, c, ...
I can get all column names for a given row, split each column name to get its first part, add those parts to a set and emit the set. Is there a better way of doing this, with filters or some other HBase mechanism? Please comment.
You can use ColumnPrefixFilter for that. See the following code:
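import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.filter.ColumnPrefixFilter;
import org.apache.hadoop.hbase.util.Bytes;

// Runs inside a method declared "throws IOException"; uses the Connection/Table API instead of the deprecated HTable constructor.
Configuration hadoopConf = HBaseConfiguration.create();
hadoopConf.set("hbase.zookeeper.quorum", "localhost");
hadoopConf.set("hbase.zookeeper.property.clientPort", "2181");
Scan scan = new Scan();
// Return only cells whose qualifier starts with "a" (qualifiers are case-sensitive)
scan.setFilter(new ColumnPrefixFilter(Bytes.toBytes("a")));
try (Connection connection = ConnectionFactory.createConnection(hadoopConf);
     Table table = connection.getTable(TableName.valueOf("KunderaExamples"));
     ResultScanner scanner = table.getScanner(scan))
{
    for (Result result : scanner)
    {
        for (Cell cell : result.rawCells())
        {
            System.out.println(Bytes.toString(CellUtil.cloneRow(cell)));
            System.out.println(Bytes.toString(CellUtil.cloneQualifier(cell)));
            System.out.println(Bytes.toString(CellUtil.cloneValue(cell)));
        }
    }
}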
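ColumnPrefixFilter narrows which columns come back, but it does not reduce them to distinct prefixes. Some client-side step like the one described in the question therefore still seems necessary. Below is a minimal sketch of that step. It assumes the qualifier names have already been read from row1's cells as strings, and the class name is hypothetical. Because qualifiers are compared as raw bytes, C_2 yields a separate "C" prefix.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for the approach described in the question: reduce qualifiers to their prefix.
class QualifierPrefixes {
    // Returns the distinct parts before the first '_' (a qualifier without '_' is kept whole).
    static Set<String> prefixes(List<String> qualifiers) {
        Set<String> result = new TreeSet<>();
        for (String qualifier : qualifiers) {
            int cut = qualifier.indexOf('_');
            result.add(cut >= 0 ? qualifier.substring(0, cut) : qualifier);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> row1 = List.of("a_1", "a_2", "a_3", "b_1", "b_2", "c_1", "C_2", "d_9", "d_99");
        System.out.println(prefixes(row1)); // [C, a, b, c, d]
    }
}
```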
