Power Query: Expand all columns that have records in them

Using Power Query in Microsoft Excel 2013, I created a table that looks like this:
// To insert this in Power Query, prepend a '=' before the 'Table.FromRows'
Table.FromRows(
    {
        {"0", "Tom", null, null},
        {"1", "Bob", [ name="Berlin" , street="BarStreet" ], [ name="Mary", age=25 ]},
        {"2", "Jim", [ name="Hamburg", street="FooStreet" ], [ name="Marta", age=30 ]}
    },
    {"ID", "Name", "Address", "Wife"}
)
Now, I want to expand the columns Address and Wife by using the name attribute
on both records. Manually, I would do it like this:
// To insert this in Power Query, prepend a '=' before the 'let'
let
    t = Table.FromRows(
        {
            {"0", "Tom", null, null},
            {"1", "Bob", [ name="Berlin" , street="BarStreet" ], [ name="Mary", age=25 ]},
            {"2", "Jim", [ name="Hamburg", street="FooStreet" ], [ name="Marta", age=30 ]}
        },
        {"ID", "Name", "Address", "Wife"}
    ),
    expAddress = Table.ExpandRecordColumn(t, "Address", {"name"}, {"Address → name"}),
    expWife = Table.ExpandRecordColumn(expAddress, "Wife", {"name"}, {"Wife → name"})
in
    expWife
Background
Whenever I have data tables that have a different layout, I need to rewrite the
query. In a fantasy world, you could expand all columns that have Records in
them using a specific key. Ideally, you would have the following library
functions:
// Returns a list with the names of the columns that match the specified type.
// Will also try to infer the type of a column if the table is untyped.
Table.ColumnsOfTypeInfer(
    table as table,
    listOfTypes as list
) as list
// Expands a column of records into columns with each of the values.
Table.ExpandRecordColumnByKey(
    table as table,
    columns as list,
    key as text
) as table
Then, I could call
// To insert this in Power Query, prepend a '=' before the 'let'
let
    t = Table.FromRows(
        {
            {"0", "Tom", null, null},
            {"1", "Bob", [ name="Berlin" , street="BarStreet" ], [ name="Mary", age=25 ]},
            {"2", "Jim", [ name="Hamburg", street="FooStreet" ], [ name="Marta", age=30 ]}
        },
        {"ID", "Name", "Address", "Wife"}
    ),
    recordColumns = Table.ColumnsOfTypeInfer(t, {type record}),
    expAll = Table.ExpandRecordColumnByKey(t, recordColumns, "name")
in
    expAll
Question
Can you get a list of columns with a specific type that is not specified in the table, aka infer it?
Can you make that record expansion generic?
Edit: Added row #0 with two null cells.

(First off, thanks for the clear explanation and sample data and suggestions!)
1) There's no way in M code to do type inference. This limitation might almost be considered a "feature", because if the source data changes in a way that causes the inferred type to be different, it will almost certainly break your query.
Once you load your untyped data, it should be quick to use the Detect Data Type button to generate the M for this. Or if you are reading data from JSON it should be mostly typed enough already.
If you have a specific scenario where this doesn't work, would you update your question? :)
2) It's very possible, and only a little convoluted, to make the record expansion generic, as long as the cell values of the table are records. The code below finds the columns where every row is either null or a record and expands the name field of each.
Here are some simple implementations you can add to your library:
let
    t = Table.FromRows(
        {
            {"0", "Tom", null, null},
            {"1", "Bob", [ name="Berlin" , street="BarStreet" ], [ name="Mary", age=25 ]},
            {"2", "Jim", [ name="Hamburg", street="FooStreet" ], [ name="Marta", age=30 ]}
        },
        {"ID", "Name", "Address", "Wife"}
    ),
    Table.ColumnsOfAllRowType = (table as table, typ as type) as list => let
        ColumnNames = Table.ColumnNames(table),
        ColumnsOfType = List.Select(ColumnNames, (name) =>
            List.AllTrue(List.Transform(Table.Column(table, name), (cell) => Type.Is(Value.Type(cell), typ))))
    in
        ColumnsOfType,
    Table.ExpandRecordColumnByKey = (table as table, columns as list, key as text) as table =>
        List.Accumulate(columns, table, (state, columnToExpand) =>
            Table.ExpandRecordColumn(state, columnToExpand, {key}, { columnToExpand & " → " & key })),
    recordColumns = Table.ColumnsOfAllRowType(t, type nullable record),
    expAll = Table.ExpandRecordColumnByKey(t, recordColumns, "name")
in
    expAll
If a new library function can be implemented in just M we're less likely to add it to our standard library, but if you feel it is missing feel free to suggest it at: https://ideas.powerbi.com/forums/265200-power-bi/
You might have a good argument for adding something like Table.ReplaceTypeFromFirstRow(table as table) as table, because constructing the type with M is very messy.
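For anyone who wants to sanity-check the logic outside Power Query, here is a minimal sketch of the same two steps in Python rather than M. It is only a model: rows are plain dicts, records are nested dicts, null is None, and the function names columns_of_all_row_type and expand_record_columns_by_key are made up for this illustration. It assumes that a null cell expands to null, as in the M version above.

```python
SEP = " \u2192 "  # the arrow used in the expanded column names


def columns_of_all_row_type(rows, typ):
    """Return the column names whose cells are all None or instances of typ."""
    if not rows:
        return []
    return [
        name
        for name in rows[0]
        if all(row[name] is None or isinstance(row[name], typ) for row in rows)
    ]


def expand_record_columns_by_key(rows, columns, key):
    """Replace each listed record column with one 'column -> key' column."""
    expanded = []
    for row in rows:
        new_row = {}
        for name, cell in row.items():
            if name in columns:
                # A null cell stays null after expansion.
                new_row[name + SEP + key] = None if cell is None else cell.get(key)
            else:
                new_row[name] = cell
        expanded.append(new_row)
    return expanded


table = [
    {"ID": "0", "Name": "Tom", "Address": None, "Wife": None},
    {"ID": "1", "Name": "Bob",
     "Address": {"name": "Berlin", "street": "BarStreet"},
     "Wife": {"name": "Mary", "age": 25}},
    {"ID": "2", "Name": "Jim",
     "Address": {"name": "Hamburg", "street": "FooStreet"},
     "Wife": {"name": "Marta", "age": 30}},
]

record_columns = columns_of_all_row_type(table, dict)
result = expand_record_columns_by_key(table, record_columns, "name")
```

Note that, like the M version, this treats a column made up entirely of nulls as a record column, since every cell passes the "null or record" check.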

Sorry to come to this a bit late, but I just had a similar challenge. I tried using Chris Webb's ExpandAll function:
http://blog.crossjoin.co.uk/2014/05/21/expanding-all-columns-in-a-table-in-power-query/
... which only works on Table-type columns, not Record-type columns. I have managed to hack it for that purpose, though. I duplicated Chris' function as "ExpandAllRecords" and made 3 edits:
replaced each if _ is table then Table.ColumnNames(_) with each if _ is record then Record.FieldNames(_)
replaced Table.ExpandTableColumn with Table.ExpandRecordColumn
replaced ExpandAll with ExpandAllRecords
I tried getting both tables and records expanding in one function, but I kept getting type errors.
Anyway, with that in place, the final query is just:
let
t = Table.FromRows(
{
{"1", "Tom", null, [ name="Jane", age=35 ]},
{"2", "Bob", [ name="Berlin" , street="BarStreet" ], [ name="Mary", age=25 ]},
{"3", "Jim", [ name="Hamburg", street="FooStreet" ], [ name="Marta", age=30 ]}
},
{"ID", "Name", "Address", "Wife"}
),
Output = ExpandAllRecords(t)
in
Output
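To make the idea behind those three edits concrete, here is a rough model of "expand every record column into column.field columns, and repeat until nothing is left to expand", written in Python rather than M. It is a sketch under assumptions: rows are dicts, records are nested dicts, and the names expand_all_records and _expand_row are invented for this example. Unlike the M function, it silently turns a non-record value in a record column into None instead of raising an error.

```python
def _expand_row(row, fields):
    """Expand the dict-valued columns of one row into 'column.field' columns."""
    new_row = {}
    for name, cell in row.items():
        if name in fields:
            for field in fields[name]:
                # Null (or non-record) cells yield None for every expanded field.
                value = cell.get(field) if isinstance(cell, dict) else None
                new_row[name + "." + field] = value
        else:
            new_row[name] = cell
    return new_row


def expand_all_records(rows):
    """Keep expanding record columns until no cell holds a record any more."""
    while True:
        # For each column, gather the distinct field names seen in its records.
        fields = {}
        for row in rows:
            for name, cell in row.items():
                if isinstance(cell, dict):
                    seen = fields.setdefault(name, [])
                    for field in cell:
                        if field not in seen:
                            seen.append(field)
        if not fields:
            return rows
        rows = [_expand_row(row, fields) for row in rows]


table = [
    {"ID": "1", "Name": "Tom", "Address": None,
     "Wife": {"name": "Jane", "age": 35}},
    {"ID": "2", "Name": "Bob",
     "Address": {"name": "Berlin", "street": "BarStreet"},
     "Wife": {"name": "Mary", "age": 25}},
    {"ID": "3", "Name": "Jim",
     "Address": {"name": "Hamburg", "street": "FooStreet"},
     "Wife": {"name": "Marta", "age": 30}},
]

output = expand_all_records(table)
```

Collecting the field names across all rows mirrors the List.Distinct(List.Combine(...)) step in the original function, so a null in the first row does not hide the fields found in later rows.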
Edit:
Out of concern that the great snippet by Chris Webb (mentioned by @MikeHoney) will one day disappear, I'll mirror the entire code here:
let
//Define function taking two parameters - a table and an optional column number
Source = (TableToExpand as table, optional ColumnNumber as number) =>
let
//If the column number is missing, make it 0
ActualColumnNumber = if (ColumnNumber=null) then 0 else ColumnNumber,
//Find the column name relating to the column number
ColumnName = Table.ColumnNames(TableToExpand){ActualColumnNumber},
//Get a list containing all of the values in the column
ColumnContents = Table.Column(TableToExpand, ColumnName),
//Iterate over each value in the column and then
//If the value is of type table get a list of all of the columns in the table
//Then get a distinct list of all of these column names
ColumnsToExpand = List.Distinct(List.Combine(List.Transform(ColumnContents,
each if _ is table then Table.ColumnNames(_) else {}))),
//Append the original column name to the front of each of these column names
NewColumnNames = List.Transform(ColumnsToExpand, each ColumnName & "." & _),
//Is there anything to expand in this column?
CanExpandCurrentColumn = List.Count(ColumnsToExpand)>0,
//If this column can be expanded, then expand it
ExpandedTable = if CanExpandCurrentColumn
then
Table.ExpandTableColumn(TableToExpand, ColumnName,
ColumnsToExpand, NewColumnNames)
else
TableToExpand,
//If the column has been expanded then keep the column number the same, otherwise add one to it
NextColumnNumber = if CanExpandCurrentColumn then ActualColumnNumber else ActualColumnNumber+1,
//If the column number is now greater than the number of columns in the table
//Then return the table as it is
//Else call the ExpandAll function recursively with the expanded table
OutputTable = if NextColumnNumber>(Table.ColumnCount(ExpandedTable)-1)
then
ExpandedTable
else
ExpandAll(ExpandedTable, NextColumnNumber)
in
OutputTable
in
Source
You can then use this function on the XML file as follows:
let
//Load XML file
Source = Xml.Tables(File.Contents("C:\Users\Chris\Documents\PQ XML Expand All Demo.xml")),
ChangedType = Table.TransformColumnTypes(Source,{{"companyname", type text}}),
//Call the ExpandAll function to expand all columns
Output = ExpandAll(ChangedType)
in
Output
(Source and downloadable example: Chris Webb's BI Blog, 2014-05-21)

Related

Fix an empty list in PowerQuery to a RESTFUL API

Forgive me, I am very much a novice. I am experimenting with PQ in Excel to pull some sales data from a REST API: https://docs.vendhq.com/reference/2/spec/sales/listsales
My query looks like the below
let
MaxVersion = Excel.CurrentWorkbook(){[Name="Versions"]}[Content]{0}[Sales],
Source = Json.Document(Web.Contents("https://*****.vendhq.com/api/2.0/sales?after=" &Text.From(MaxVersion) , [Headers=[Authorization="Bearer **********************************"]])),
data = Source[data],
#"Converted to Table" = Table.FromList(data, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
#"Expanded Column1" = Table.ExpandRecordColumn(#"Converted to Table", "Column1", {"id", "outlet_id", "register_id", "user_id", "customer_id", "invoice_number", "source", "source_id", "status", "note", "short_code", "return_for", "total_price", "total_tax", "total_loyalty", "created_at", "updated_at", "sale_date", "deleted_at", "line_items", "payments", "adjustments", "version", "receipt_number", "total_price_incl", "taxes"}, {"Column1.id", "Column1.outlet_id", "Column1.register_id", "Column1.user_id", "Column1.customer_id", "Column1.invoice_number", "Column1.source", "Column1.source_id", "Column1.status", "Column1.note", "Column1.short_code", "Column1.return_for", "Column1.total_price", "Column1.total_tax", "Column1.total_loyalty", "Column1.created_at", "Column1.updated_at", "Column1.sale_date", "Column1.deleted_at", "Column1.line_items", "Column1.payments", "Column1.adjustments", "Column1.version", "Column1.receipt_number", "Column1.total_price_incl", "Column1.taxes"}),
#"Expanded Column1.line_items" = Table.ExpandListColumn(#"Expanded Column1", "Column1.line_items"),
#"Expanded Column1.line_items1" = Table.ExpandRecordColumn(#"Expanded Column1.line_items", "Column1.line_items", {"id", "product_id", "tax_id", "discount_total", "discount", "price_total", "price", "cost_total", "cost", "tax_total", "tax", "quantity", "loyalty_value", "note", "price_set", "status", "sequence", "gift_card_number", "tax_components", "promotions", "total_tax", "total_cost", "total_discount", "total_loyalty_value", "total_price", "unit_cost", "unit_discount", "unit_loyalty_value", "unit_price", "unit_tax", "is_return"}, {"Column1.line_items.id", "Column1.line_items.product_id", "Column1.line_items.tax_id", "Column1.line_items.discount_total", "Column1.line_items.discount", "Column1.line_items.price_total", "Column1.line_items.price", "Column1.line_items.cost_total", "Column1.line_items.cost", "Column1.line_items.tax_total", "Column1.line_items.tax", "Column1.line_items.quantity", "Column1.line_items.loyalty_value", "Column1.line_items.note", "Column1.line_items.price_set", "Column1.line_items.status", "Column1.line_items.sequence", "Column1.line_items.gift_card_number", "Column1.line_items.tax_components", "Column1.line_items.promotions", "Column1.line_items.total_tax", "Column1.line_items.total_cost", "Column1.line_items.total_discount", "Column1.line_items.total_loyalty_value", "Column1.line_items.total_price", "Column1.line_items.unit_cost", "Column1.line_items.unit_discount", "Column1.line_items.unit_loyalty_value", "Column1.line_items.unit_price", "Column1.line_items.unit_tax", "Column1.line_items.is_return"})
in
#"Expanded Column1.line_items1"
The query works well once: it passes a parameter from the Excel workbook to use as the version for pulling in sales made after the last refresh (the parameter gets populated by a second query; I'm not sure whether it's possible to combine the queries).
The other problem is that if the refresh returns no data (no new sales), or if the data needs to be paginated, I can't quite work out the syntax for calling this iteratively as a function with an escape clause for when the returned data is empty.
Seems appropriate to use List.Generate here.
I can't test the code below (as I don't have the API credentials or know the exact endpoint), so you'll need to check if it returns all/expected pages.
let
getSalesData = (maxVersion as number) =>
let
url = "https://YOUR_DOMAIN_PREFIX.vendhq.com/api/2.0/sales",
requestOptions = [
Query = [after = Number.ToText(maxVersion)],
Headers = [Authorization = "Bearer YOUR_API_KEY"]
],
response = Web.Contents(url, requestOptions),
deserialised = Json.Document(response)
in deserialised,
allPagesOfSalesData = List.Generate(
() => getSalesData(0),
each not List.IsEmpty([data]),
each getSalesData([version][max])
)
in
allPagesOfSalesData
I pass in 0 as an argument for the first API call even though the documentation (https://docs.vendhq.com/reference/introduction/pagination#first-page) states:
"By default, the value of the after parameter will be assumed as equal 0 so it’s not necessary to use it on the first page."
The list allPagesOfSalesData stops being generated when the data field contains an empty list, which I think is what the documentation (https://docs.vendhq.com/reference/introduction/pagination#subsequent-pages) suggests:
"This should be repeated until an empty collection is returned. This will mean that all items of the collection have been returned."
But I've assumed that "empty collection" refers to the data field (in the JSON response).
You should get a list of response records (for allPagesOfSalesData), one per page. Once you have converted each page's data field into a table, you can combine the pages into a single table (using Table.Combine).
You might want to use Table.FromRecords (instead of the Table.FromList and Table.ExpandRecordColumn) when transforming the response.
Hopefully it works, but I can't test it.
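If it helps to see the stopping rule in isolation, here is a small Python model of the same "keep requesting pages until the data list is empty" loop that List.Generate expresses above. It makes no network calls: fake_get_page and FAKE_SALES are hypothetical stand-ins for the real API, and the response shape (a data list plus a version record with a max field) is assumed from the M code, not verified against the service.

```python
def fetch_all_pages(get_page, start_version=0):
    """Collect pages until one comes back with an empty 'data' list."""
    pages = []
    page = get_page(start_version)
    while page["data"]:
        pages.append(page)
        # The next request asks for everything after the highest version seen.
        page = get_page(page["version"]["max"])
    return pages


# Hypothetical stand-in for the API: three sales, served two per page.
FAKE_SALES = [
    {"id": "a", "version": 1},
    {"id": "b", "version": 2},
    {"id": "c", "version": 3},
]


def fake_get_page(after, page_size=2):
    items = [sale for sale in FAKE_SALES if sale["version"] > after][:page_size]
    if items:
        version = {"min": items[0]["version"], "max": items[-1]["version"]}
    else:
        version = {"min": None, "max": None}
    return {"data": items, "version": version}


pages = fetch_all_pages(fake_get_page)
all_sales = [sale for page in pages for sale in page["data"]]
```

The empty page is never added to the result, which is also why a refresh with no new sales simply yields an empty list rather than an error.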

Index duplicates from the list using Linq

Suppose I have a list of strings
var data = new List<string>{"fname", "phone", "lname", "home", "home", "company", "phone", "phone"};
I would like to list all values and add index to duplicates like this
fname,
phone,
lname,
home,
home[1],
company,
phone[1],
phone[2]
or like this
fname,
phone[0],
lname,
home[0],
home[1],
company,
phone[1],
phone[2]
Either solution would work for me.
Is that possible with Linq?
You can use LINQ GroupBy to gather the matches, and then the counting version of Select to append the indexes.
var ans = data.GroupBy(d => d).SelectMany(dg => dg.Select((d, n) => n == 0 ? d : $"{d}[{n}]"));
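One thing to be aware of: as far as I can tell, the GroupBy approach emits the items group by group (fname, phone, phone[1], phone[2], lname, ...), so the original order of the list is not kept. If the order matters, a running count per value does the job. The sketch below shows that idea in Python rather than C#; index_duplicates is a name made up for this example, and the same loop translates directly to a Dictionary<string, int> in C#.

```python
def index_duplicates(values):
    """Append [n] to the n-th repeat of a value, keeping the original order."""
    seen = {}
    result = []
    for value in values:
        count = seen.get(value, 0)
        # The first occurrence stays as-is; later ones get their repeat index.
        result.append(value if count == 0 else f"{value}[{count}]")
        seen[value] = count + 1
    return result


data = ["fname", "phone", "lname", "home", "home", "company", "phone", "phone"]
indexed = index_duplicates(data)
```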

RethinkDB filter array, return only matched values

I have a table like this
{
dummy: [
"new val",
"new val 2",
"new val 3",
"other",
]
}
I want to get only the values that match "new". I am using a query like this:
r.db('db').table('table')('dummy').filter(function (val) {
return val.match("^new")
})
but it's giving an error:
e: Expected type STRING but found ARRAY in
What is wrong with the query? If I remove .match("^new"), it returns all values.
Thanks
The reason you're getting Expected type STRING but found ARRAY in is that the value of the dummy field is itself an array, and you cannot apply match to arrays.
Although the filter you tried may look reasonable, you have to rethink your query a bit: map over each document's dummy array and, in an inner expression, filter its elements down to those that match ^new.
For example:
r.db('db')
.table('table')
.getField('dummy')
.map((array) => array.filter((element) => element.match("^new")))
Output:
[
"new val" ,
"new val 2" ,
"new val 3"
]
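The difference between filtering the documents and filtering inside each document's array can be shown without a database. This is only a Python model of the two levels, with docs standing in for the table contents (a hypothetical single-document table); it is not RethinkDB driver code.

```python
import re

# Stand-in for the table: one document whose 'dummy' field is an array.
docs = [{"dummy": ["new val", "new val 2", "new val 3", "other"]}]

# Outer level: one result per document (the map step).
# Inner level: keep only the elements matching ^new (the inner filter step).
matched = [
    [element for element in doc["dummy"] if re.search("^new", element)]
    for doc in docs
]
```

In this model the result holds one filtered array per document, so a table with several documents would give several arrays.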

How do I dynamically name a collection?

Pseudo-code: collect(n) AS :Label
The primary purpose of this is for easy reading of the properties in the API Server (node application).
Verbose example:
MATCH (user:User)--(n)
WHERE n:Movie OR n:Actor
RETURN user,
CASE
WHEN n:Movie THEN "movies"
WHEN n:Actor THEN "actors"
END as type, collect(n) as :type
Expected output in JSON:
[{
"user": {
....
},
"movies": [
{
"_id": 1987,
"labels": [
"Movie"
],
"properties": {
....
}
}
],
"actors": [ .... ]
}]
The closest I've gotten is:
[{
"user": {
....
},
"type": "movies",
"collect(n)": [
{
"_id": 1987,
"labels": [
"Movie"
],
"properties": {
....
}
}
]
}]
The goal is to be able to read the JSON result with ease like so:
neo4j.cypher.query(statement, function(err, results) {
for (var result of results) {
var user = result.user
var movies = result.movies
}
})
Edit:
I apologize for any confusion in my inability to correctly name database semantics.
I'm wondering if it's enough just to output the user and their lists of both actors and movies, rather than trying to do a more complicated means of matching and combining both.
MATCH (user:User)
OPTIONAL MATCH (user)--(m:Movie)
OPTIONAL MATCH (user)--(a:Actor)
RETURN user, COLLECT(DISTINCT m) as movies, COLLECT(DISTINCT a) as actors
This query should return each User and his/her related movies and actors (in separate collections):
MATCH (user:User)--(n)
WHERE n:Movie OR n:Actor
RETURN user,
REDUCE(s = {movies:[], actors:[]}, x IN COLLECT(n) |
CASE WHEN x:Movie
THEN {movies: s.movies + x, actors: s.actors}
ELSE {movies: s.movies, actors: s.actors + x}
END) AS types;
As for a dynamic solution to your question, one that will work with any node connected to your user, there are a few options. I don't believe you can make the column names dynamic like this, or even the names of the collections returned, but we can associate each collection with its type.
MATCH (user:User)--(n)
WITH user, LABELS(n) as type, COLLECT(n) as nodes
WITH user, {type:type, nodes:nodes} as connectedNodes
RETURN user, COLLECT(connectedNodes) as connectedNodes
Or, if you prefer working with multiple rows, one row each per node type:
MATCH (user:User)--(n)
WITH user, LABELS(n) as type, COLLECT(n) as collection
RETURN user, {type:type, data:collection} as connectedNodes
Note that LABELS(n) returns a list of labels, since nodes can be multi-labeled. If you are guaranteed that every node of interest has exactly one label, then you can use the first element of the list rather than the list itself. Just use LABELS(n)[0] instead.
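Since the column names can't be made dynamic in the query, the renaming can be done on the client after the rows come back. Here is a minimal sketch of that step in Python (the same fold is easy to write in the Node application). It assumes rows shaped like the "one row per node type" query above, with a user map and a connectedNodes map holding type and data; the helper name group_rows_by_user, the _id lookup and the naive "lowercase label plus s" pluralisation are all assumptions made for this example.

```python
def group_rows_by_user(rows):
    """Fold one-row-per-label results into one entry per user."""
    grouped = {}
    for row in rows:
        user = row["user"]
        entry = grouped.setdefault(user["_id"], {"user": user})
        # Assumes every connected node carries exactly one label.
        label = row["connectedNodes"]["type"][0]
        key = label.lower() + "s"  # "Movie" -> "movies", "Actor" -> "actors"
        entry.setdefault(key, []).extend(row["connectedNodes"]["data"])
    return list(grouped.values())


rows = [
    {"user": {"_id": 1, "name": "Ann"},
     "connectedNodes": {"type": ["Movie"], "data": [{"_id": 1987}]}},
    {"user": {"_id": 1, "name": "Ann"},
     "connectedNodes": {"type": ["Actor"], "data": [{"_id": 7}, {"_id": 8}]}},
]

results = group_rows_by_user(rows)
```

Each entry then exposes result["movies"] and result["actors"] directly, which is the shape the question asks for.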
You can dynamically sort nodes by label, and then convert to the map using the apoc library:
WITH ['Actor','Movie'] as LBS
// What are the nodes we need:
MATCH (U:User)--(N) WHERE size(filter(l in labels(N) WHERE l in LBS))>0
WITH U, LBS, N, labels(N) as nls
UNWIND nls as nl
// Combine the nodes on their labels:
WITH U, LBS, N, nl WHERE nl in LBS
WITH U, nl, collect(N) as RELS
WITH U, collect( [nl, RELS] ) as pairs
// Convert pairs "label - values" to the map:
CALL apoc.map.fromPairs(pairs) YIELD value
RETURN U as user, value

how to insert an array of id's into linq-to-sql query line in .net mvc 3

I want to get an array of IDs, like ["15", "26", "37", "48", "90"], and then get the items from my remaining table that don't include these supplier IDs.
Here is what I've done so far:
string[] arrgroupdetails;
arrgroupdetails = dataContext.GroupDetails.Select(c => c.supplier_id).ToArray();
var items = from thingies in dataContext.remainings where thingies.supplier_id.ToString() != arrgroupdetails.Any().ToString() select thingies;
So how can I achieve this?
From memory, so do check the syntax, but something like this should work:
var items = from thingies in dataContext.remainings
where !arrgroupdetails.Contains(thingies.supplier_id.ToString())
select thingies;
