I am trying to combine many tables whose names match a pattern.
So far, I have extracted the table names from #shared into a list.
What I haven't been able to do is loop over this list and transform it into a list of tables that can be combined.
e.g. Name is the list with the table names:
Source = Table.Combine( { List.Transform(Name, each #shared[_] )} )
The error is:
Expression.Error: We cannot convert a value of type List to type Text.
Details:
Value=[List]
Type=[Type]
I have tried many ways but I am missing some kind of type transformation.
I was able to transform this list of table names into a list of tables with:
T1 = List.Transform(Name, each Expression.Evaluate(_, #shared))
However, Expression.Evaluate feels like an ugly hack. Is there a better way to do this transformation?
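I also considered a plain record lookup on #shared, which would avoid parsing an expression (a sketch, untested):
T1 = List.Transform(Name, each Record.Field(#shared, _))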
With this list of tables, I tried to combine them with:
Source = Table.Combine(T1)
But I got the error:
Expression.Error: A cyclic reference was encountered during evaluation.
If I extract a table from the list by index (e.g. T1{2}) it works. So, following this line of thinking, I would need some kind of loop to append them.
Steps illustrating the problem. The objective is to append (Table.Combine) every table named T_\d+_Mov:
1. Filter the matching table names into a table.
2. Convert that table to a list.
3. Convert the names in the list to the actual tables.
Now I just need to combine them, and this is where I am stuck.
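A sketch of those steps in M (the pattern check is simplified and the step names are assumptions):
// every query and table the workbook can see
SharedTable = Record.ToTable(#shared),
// keep only names that look like T_<digits>_Mov (simplified check)
Filtered = Table.SelectRows(SharedTable, each Text.StartsWith([Name], "T_") and Text.EndsWith([Name], "_Mov")),
// the list of matching names
Name = Filtered[Name],
// the list of actual tables
T1 = List.Transform(Name, each Expression.Evaluate(_, #shared))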
It is important to note that I don't want to use VBA for this.
It would be easier to recreate the query from VBA by scanning ThisWorkbook.Queries(), but that would not give a clean reload when adding or removing tables.
The final solution, as suggested by @Michal Palko, was:
CT1 = Table.FromList(T1, Splitter.SplitByNothing(), {"Name"}, null, ExtraValues.Ignore),
EC1 = Table.ExpandTableColumn(CT1, "Name", Table.ColumnNames(CT1{0}[Name]) )
where T1 was the previous step.
The only caveat is that the first table must have all the columns, or the missing ones will be skipped.
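If not every table has every column, a hedged variant is to union all the column names first and pass that full list to the expand:
CT1 = Table.FromList(T1, Splitter.SplitByNothing(), {"Name"}, null, ExtraValues.Ignore),
// collect every column name that appears in any of the tables
AllCols = List.Union(List.Transform(T1, Table.ColumnNames)),
EC1 = Table.ExpandTableColumn(CT1, "Name", AllCols)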
I think there might be an easier way, but given your approach, try converting your list to a table (column) and then expanding that column:
Alternatively, use Table.Combine(YourList).
Related
I'm trying to clear the entire cell if it doesn't contain a given keyword.
I've managed to do this for one column:
Table.ReplaceValue(#"PrevStep",each [#"My Column"], each if Text.PositionOf([#"My Column"],"keyword")>-1 then [#"My Column"] else null,Replacer.ReplaceValue,{"My Column"})
The problem is I need to iterate/repeat that step for a number of columns; the number of columns may vary, and the column names may be different every time. I can put all those column names into a list, but I'm not able to use it.
The solution I'm looking for might look like this:
for each ColNam in MyColumnsList
Table.ReplaceValue(#"PrevStep",each [#"ColNam"], each if Text.PositionOf([#"ColNam"],"keyword")>-1 then [#"ColNam"] else null,Replacer.ReplaceValue,MyColumnsList)
next
but this is Power Query M, not VBA code - and of course the problem is with #"PrevStep", since each pass would need to reference the previous one, like a recursion... again, I don't know how to handle that.
Is the path I'm following correct, or should it be done some other way?
Thanks
Andrew
Unpivot your columns to turn them all into two columns (attribute and value). Apply your replacement to the single value column, then pivot back into the original format.
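A minimal sketch of that unpivot/replace/pivot round trip, assuming the affected columns are text and #"PrevStep" is the preceding step:
let
    // add an index so rows can be matched up again after pivoting back
    Indexed = Table.AddIndexColumn(#"PrevStep", "RowId", 0, 1),
    // turn every original column into Attribute/Value pairs
    Unpivoted = Table.Unpivot(Indexed, Table.ColumnNames(#"PrevStep"), "Attribute", "Value"),
    // clear any value that doesn't contain the keyword
    Replaced = Table.TransformColumns(Unpivoted, {{"Value", each if _ <> null and Text.PositionOf(_, "keyword") > -1 then _ else null}}),
    // restore the original layout (column order may differ) and drop the helper index
    Pivoted = Table.Pivot(Replaced, List.Distinct(Replaced[Attribute]), "Attribute", "Value"),
    Result = Table.RemoveColumns(Pivoted, {"RowId"})
in
    Result
Unpivoting drops null cells, but Table.Pivot brings them back as nulls when the rows are reassembled.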
If I want to expand this embedded table...
...and I click on the expand button, I'm presented with the dropdown to select which columns I want to expand:
However, if I choose '(Select All Columns)' to expand them all, Power Query turns that into hard-coded column names of all the columns at the time I do that. Like this:
= Table.ExpandTableColumn(Source, "AllData", {"Column1", "Column2", "Column3", "Column4", "Custom"}, {"Column1", "Column2", "Column3", "Column4", "Custom"})
After that, if the underlying embedded table's columns change, the hard-coded column names will no longer be relevant and the query will "break."
So how can I tell it to dynamically identify and extract all of the current columns of the embedded table?
You can do something like this to get the list of column names:
List.Accumulate(Source[AllData], {}, (state, current) => List.Union({state, Table.ColumnNames(current)}))
This goes through each cell in the column, gets the column names from the table in that cell, and adds the new names to the result. It's easier to store this in a new step and then reference that in your next step.
Keep in mind that this method can be much slower than passing in the list of names you know about because it has to scan through the entire table to get the column names. You may also have problems if you use this for the third parameter in Table.ExpandTableColumn because it could use a column name that already exists.
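For example (step and column names are assumptions), with the name list stored in its own step:
ColNames = List.Accumulate(Source[AllData], {}, (state, current) => List.Union({state, Table.ColumnNames(current)})),
// beware: any of these names may collide with a column that already exists in Source
Expanded = Table.ExpandTableColumn(Source, "AllData", ColNames)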
Try using Table.Join which joins and expands the second table in one step.
"Merged Queries" = Table.Join(Source,{"Index.1"},Table2,{"Index.2"},JoinKind.LeftOuter)
You just need to make sure that the column names between the two tables are unique.
Use Table.PrefixColumns to ensure column names are unique
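A sketch of that combination (table and key names are assumptions):
// prefix the second table's columns so nothing collides with Source
Prefixed = Table.PrefixColumns(Table2, "T2"),
// after prefixing, the key column is called "T2.Index.2"
#"Merged Queries" = Table.Join(Source, {"Index.1"}, Prefixed, {"T2.Index.2"}, JoinKind.LeftOuter)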
I have a table called profile, and I want to order the rows by how filled out they are. Each of the columns is either a JSONB or a TEXT column. I don't need this to be exact, so I've typically ordered as follows:
SELECT * FROM profile ORDER BY LENGTH(CONCAT(profile.*)) DESC;
However, this is slow, so I want to create an index. This does not work:
CREATE INDEX index_name ON profile (LENGTH(CONCAT(*))
Nor does
CREATE INDEX index_name ON profile (LENGTH(CONCAT(CAST(* AS TEXT))))
Can't say I'm surprised. What is the right way to declare this index?
To measure the size of the row in text representation you can just cast the whole row to text, which is much faster than concatenating individual columns:
SELECT length(profile::text) FROM profile;
But there are 3 (or 4) issues with this expression in an index:
1. The syntax shorthand profile::text is not accepted in CREATE INDEX; you need to add extra parentheses or fall back to the standard syntax cast(profile AS text).
2. Still the same problem that @jjanes already discussed: only IMMUTABLE functions are allowed in index expressions, and casting a row type to text does not meet this requirement. You could build a fake IMMUTABLE wrapper function, like Jeff outlined.
3. There is an inherent ambiguity (which applies to Jeff's answer as well!): if you have a column name that's the same as the table name (a common case), you cannot reference the row type in CREATE INDEX, since the identifier always resolves to the column name first.
4. Minor difference from your original: this adds column separators, row decorators and possibly escape characters to the text representation. It shouldn't matter much for your use case.
However, I would suggest a more radical alternative as a crude indicator of row size: pg_column_size(). It is even shorter and faster and avoids issues 1, 3 and 4:
SELECT pg_column_size(profile) FROM profile;
Issue 2 remains, though: pg_column_size() is also only STABLE. You can create a simple and cheap SQL wrapper function:
CREATE OR REPLACE FUNCTION pg_column_size(profile)
RETURNS int LANGUAGE sql IMMUTABLE AS
'SELECT pg_catalog.pg_column_size($1)';
and then proceed like @jjanes outlined (see the sketch below). More details:
Does PostgreSQL support "accent insensitive" collations?
Note that I created the function with the row type profile as parameter. Postgres allows function overloading, which is why we can use the same function name. Now, when we feed the matching row type to pg_column_size() our custom function matches more closely according to function type resolution rules and is picked instead of the polymorphic system function. Alternatively, use a separate name and possibly make the function polymorphic as well ...
Related:
Is there a way to disable function overloading in Postgres
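With the wrapper in place, a sketch of the index and a query that can use it (the index name is made up):
CREATE INDEX profile_size_idx ON profile (pg_column_size(profile));
-- same expression in ORDER BY, so the planner can walk the index
SELECT * FROM profile ORDER BY pg_column_size(profile) DESC;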
You can declare a function which is falsely marked "immutable" and build an index on that.
CREATE OR REPLACE FUNCTION len_immut(record)
RETURNS int
LANGUAGE plperl
IMMUTABLE
AS $function$
## This function lies about its immutability.
## Use it with care. It is useful for indexing
## entire table rows.
return length(join ",", values %{$_[0]});
$function$
and then
create index on profile (len_immut(profile));
SELECT * FROM profile ORDER BY len_immut(profile) DESC;
Since the function is falsely labelled as immutable, the index may become out of date if you do things like add or drop columns on the table, or change the types of columns.
I'm trying to get the following done: using Altova MapForce, I use an XML file with a schema as a source. I want to map it to exactly the same output, but add data to one field.
The value of the field (it's Tax) is determined using a two-table SQL join with a WHERE clause over both tables. The tables are joined using foreign keys; the relation is recognized by MapForce.
The first field of the WHERE clause comes from the first table (a header-type table), the second and third fields from the second table (a lines-type table).
However, I cannot seem to create the logical and correct equivalent of what I am describing here. I've tried complex AND constructions, but then it inserts the one field I need multiple times. I've tried WHERE clauses, but they fail, as they never supply both tables at the same time, and there seems to be no way to use a pre-specified JOIN of two tables as a source. The WHERE clause then recognizes only the fields from the first table, not the second.
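In plain SQL, the lookup being described would look something like this (all table and column names are hypothetical):
SELECT l.tax
FROM   header_table h
JOIN   lines_table  l ON l.header_id = h.id  -- the foreign-key relation MapForce recognizes
WHERE  h.field1 = ?   -- first field, from the header table
  AND  l.field2 = ?   -- second and third fields, from the lines table
  AND  l.field3 = ?;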
Is there an example for this? Joining two (or more) tables, using WHERE to determine the exact row, then using a value from that row?
Best wishes.
Imagine I have the following table available to me:
A: { x: int, y: int, z: int, ...99 other columns... }
I now want to transform this, such that z is set to NULL where x > y, with the resulting dataset stored as B,
and I want to do it without having to explicitly mention all the other columns, as this becomes a maintenance nightmare.
Is there a simple solution?
This issue is tracked in this JIRA:
PIG-1693 There needs to be a way in foreach to indicate "and all the rest of the fields"
Currently I don't know of anything simpler than doing what you describe, or not loading z and adding a new column z alongside the star expression.
I was able to drop some of the column bloat by nesting them in single-row bags and flattening afterwards.
Still, it feels like a bit of a hack. So I'm also investigating cascading to see if it's a better fit for my scenario.
A feature to facilitate your scenario was added in Pig 0.9. The new project-range operator (..) allows you to express a range of fields by indicating the starting and/or ending field names as in this example:
result = FOREACH someInput GENERATE field1, field2, null as field3, field4 .. ;
In the example above, field1 through field4 are actual field names. One of the fields is set to null while the others are kept intact.
More details in this "New Apache Pig 0.9 Features – Part 3" article: http://hortonworks.com/blog/new-apache-pig-0-9-features-part-3-additional-features/
To solve your specific problem, you probably want to do a FILTER and a UNION to combine the results, as sketched below.
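A hedged sketch of that route, assuming x, y and z are the first three fields and col4 stands for the first of the remaining columns:
-- rows where x or y is null fail both filters and are dropped;
-- widen the conditions if those rows must be kept
above = FILTER A BY x > y;
other = FILTER A BY NOT (x > y);
nulled = FOREACH above GENERATE x, y, null AS z, col4 ..;
B = UNION nulled, other;  -- use UNION ONSCHEMA if the schemas don't line up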
Of course you can select columns by column number, but that can easily become a nightmare if you change anything at all. I have found column names to be much more stable, so I recommend the following solution:
Update mycol when it is between two known columns
You can use .. to indicate leading or trailing columns (or in-between columns). Here is how that would work if you want to change the value of 'MyCol' to 'updatedvalue':
aliasAfter = FOREACH aliasBefore GENERATE
.. colBeforeMyCol, updatedvalue, colAfterMyCol ..;