Power Query - conditional replace/clear entire cell in multiple columns - powerquery

I'm trying to clear the entire cell if it doesn't contain a given keyword.
I've managed to do this for one column:
Table.ReplaceValue(#"PrevStep",each [#"My Column"], each if Text.PositionOf([#"My Column"],"keyword")>-1 then [#"My Column"] else null,Replacer.ReplaceValue,{"My Column"})
The problem is I need to iterate/repeat that step for a number of columns... the number of columns may vary and column names also may be different every time. I can have all those column names put into a list but I'm not able to use it.
The solution I'm looking for may look like this
for each ColNam in MyColumnsList
Table.ReplaceValue(#"PrevStep",each [#"ColNam"], each if Text.PositionOf([#"ColNam"],"keyword")>-1 then [#"ColNam"] else null,Replacer.ReplaceValue,MyColumnsList)
next
but this is Power Query M, not VBA. The problem is #"PrevStep": each iteration would have to take the result of the previous iteration as its input, which looks like recursion to me, and I don't know how to express that.
Is the path I'm following correct, or should it be done some other way?
Thanks
Andrew

Unpivot your columns so that they all collapse into two columns (attribute and value). Apply your replacement to the single value column, then pivot it back into the original format.
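To make the unpivot / replace / pivot idea concrete, here is a minimal sketch of the logic in plain Python rather than Power Query M. The key column `id`, the sample table, and the helper names are all hypothetical, made up for illustration; in Power Query the same three stages would be separate query steps.

```python
# Illustration only: plain Python, not Power Query M.
# "id" is a hypothetical key column that identifies each row.

def unpivot(rows, key, columns):
    """Turn each (row, column) cell into one {key, attribute, value} record."""
    return [
        {"key": row[key], "attribute": col, "value": row[col]}
        for row in rows
        for col in columns
    ]

def clear_unless_contains(records, keyword):
    """Keep a value only if it is text containing the keyword; otherwise None."""
    return [
        {**rec, "value": rec["value"]
            if isinstance(rec["value"], str) and keyword in rec["value"]
            else None}
        for rec in records
    ]

def pivot(records, key):
    """Rebuild one row per key from the attribute/value records."""
    rows = {}
    for rec in records:
        rows.setdefault(rec["key"], {key: rec["key"]})[rec["attribute"]] = rec["value"]
    return list(rows.values())

# Hypothetical sample data and column list.
table = [
    {"id": 1, "A": "has keyword here", "B": "nothing"},
    {"id": 2, "A": "plain", "B": "keyword"},
]
my_columns_list = ["A", "B"]

unpivoted = unpivot(table, "id", my_columns_list)
cleaned = clear_unless_contains(unpivoted, "keyword")
result = pivot(cleaned, "id")
```

Because the replacement runs once over a single value column, it does not matter how many columns are in the list or what they are called.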

Related

BIRT suppress multiple duplicate columns

I am working on a BIRT report. Its records are grouped on the basis of the status column. I was looking for an option in the Eclipse BIRT tool by which I can hide combinations of multiple columns in a row which are repeating. I have attached screenshots for both the current report and the expected report structure.
I tried the "suppress duplicate" option but that is limited to a single column. I am not able to apply this on multiple columns together. I couldn't figure out any other option. Please suggest any solution in the tool or do I need to change my query to return the result in the expected format?
Actual Result: [screenshot not included]
Expected Result: [screenshot not included]
There are three obvious ways to hide duplicate values. All of them have to be configured per column (BTW, I don't understand why you consider this to be a problem).
1) As you already did: use "suppress duplicates" at the column level.
2) Add more groups to your table after the existing group, e.g. one group for the first column (whatever that is). Then you can choose "Drop" "detail" in the properties of the corresponding group header cell. It's a bit difficult to get the layout right this way.
3) In your data set, if it's SQL, you can use a small construct with CASE and the LAG analytic function to compare the column value to that of the previous row and return NULL if they are equal (pure SQL solution).
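The "compare with the previous row and return NULL when equal" idea can be sketched as follows. This is plain Python standing in for the CASE/LAG construct, not BIRT or SQL code, and the column names and sample rows are hypothetical.

```python
def suppress_repeats(rows, columns):
    """Blank out (None) each listed column's value when it equals the previous row's value."""
    out = []
    previous = None
    for row in rows:
        shown = dict(row)
        if previous is not None:
            for col in columns:
                if row[col] == previous[col]:
                    shown[col] = None
        out.append(shown)
        previous = row  # compare against the original values, not the blanked ones
    return out

# Hypothetical report rows, already sorted the way the report displays them.
rows = [
    {"status": "Open", "owner": "Ann", "item": "A1"},
    {"status": "Open", "owner": "Ann", "item": "A2"},
    {"status": "Open", "owner": "Bob", "item": "B1"},
    {"status": "Closed", "owner": "Bob", "item": "B2"},
]
shown = suppress_repeats(rows, ["status", "owner"])
```

Each column is compared independently here, so `owner` is blanked in the last row even though `status` changed. If the repeat should only be hidden when the whole combination repeats, the comparison would have to cover all the chosen columns at once.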

How to get the sum of values of a column in tmap?

I have 2 columns - Matches (Integer) and Accounts_type (String). I want to create a third column containing the proportion of matches played by each account type. I am new to Talend and have been stuck on this for the past 2 days; I did a lot of research, but to no avail. Please help.
You can do it like this:
You need to read your source data twice (I used tFixedFlowInput_1 and tFixedFlowInput_2 with the same data). The idea is to calculate the total of your matches in tAggregateRow_1, it simply does a sum of all Matches without a group by column, then use that as a lookup.
The tMap then joins your source data with the calculated total. Since the total will always be one record, you don't need any join column. You then simply divide Matches by Total as required.
This assumes you have unique values in Accounts_type; if you don't, you need to add another tAggregateRow between your source and tMap_1 to get the sum of Matches for each Accounts_type (group by Accounts_type).
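The arithmetic behind that job can be sketched in a few lines of plain Python (not Talend code). The sample rows are hypothetical; the per-type sum plays the role of the grouped tAggregateRow and the overall sum plays the role of the ungrouped one used as the lookup.

```python
from collections import defaultdict

def match_proportions(rows):
    """Return {account_type: share of all matches}, summing per type first.

    Assumes the overall total is greater than zero.
    """
    per_type = defaultdict(int)
    for account_type, matches in rows:
        per_type[account_type] += matches   # sum of Matches grouped by type
    total = sum(per_type.values())          # single overall total, no grouping
    return {t: m / total for t, m in per_type.items()}

# Hypothetical source rows: (Accounts_type, Matches)
rows = [("free", 30), ("premium", 60), ("free", 10)]
proportions = match_proportions(rows)
```

With these rows, "free" accounts for 40 of 100 matches and "premium" for 60 of 100.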

ArrayFormula column disappears when sorting in a filter view in Google Sheets

I'd like to use ArrayFormula to populate a column in spreadsheet, but when I Sort A->Z in a filter view, the ArrayFormula column vanishes. In some cases, the column includes a #REF! error about the range, and in some cases the column is just blank after the Sort. The following is a simplified version of what I'm trying to do (in my actual application, I'm doing a Vlookup to another sheet):
https://docs.google.com/spreadsheets/d/1XbqqedOjuSKuE-ZLIHNw59-r01EsNMpx7YVqOoxSOR4/edit?usp=sharing
The column 3 header uses an ArrayFormula to copy from column 1. If you go to the Filter 1 filter view, you'll note that column 3 is blank except for an error. This happens after I try to Sort Z->A on column 2. In my more complicated use-case, involving a Vlookup, after a Sort the column disappears entirely (leaving no #REF! error). Before sorting in both cases, everything is fine.
How do I make ArrayFormula values persist in filter views after sorting?
Thanks for your help!
I'm guessing that, because your references are normal (relative, not anchored/absolute), the range A2:A10 after sorting down turns into something absurd, like A7:A4, depending on actual sorted values.
Also, if you hover with your mouse on the #REF error, what does it tell you?
Anyway, try using absolute references in your formula:
=arrayformula({"Column 3"; A$2:A$10})
Edit
Fascinating. It's the first time I've seen this type of error. Taking it at face value, it seems to be a limitation of Google Spreadsheets: you cannot use an ARRAYFORMULA spanning multiple rows inside a sorted filter view because, as I sort of guessed, sorting messes up the ARRAYFORMULA's range (as indicated by the fact that the formula is now in C4 instead of C1).
But that gives you also the solution: do not include the cell with the arrayformula in the filter view. Instead of making your filter view's target range A1:C20, make it A1:B20. Then the arrayformula in C1 will be untouched by the filter and will indeed continue to work.
I have found a solution for my use case; in your case, it could be:
=arrayformula(if(row(C:C)=1;"Column 3";A:A))
But you'll need to consider the whole columns in your formulas.
Example
Have you tried A2:A?
If you leave out the ending row, the range extends to the end of the column.
It worked for me.
Cheers

Gnumeric Sort function

Can someone please direct me to a detailed explanation (link) of the Gnumeric sort function? The Gnumeric manual is abbreviated and has no examples. I haven't been able to find any appropriate info through the search engines and even Stackoverflow only has half a dozen questions on it which don't suit.
My problem is:
I have a table with rows of dates, names, and columns of data. (pretty straightforward stuff).
I want to sort ALL columns by the NAME column.
That is: keep each row intact for data but move them in the table up or down so that the order is alphabetic by name.
I can do this easily with LibreOffice Calc but prefer the feel and simplicity of Gnumeric, yet I have never been able to understand from the drop-down sort menu how to get this done. I can sort any column fine by itself, but can't seem to lock the other data in the row so that it is moved along with it.
This is such a frequent function I'm surprised it's not made clearer in the drop-down menu. That is: Order by column x
The only way to sort with Gnumeric, apparently, is to move the key column (in my case the NAME column) to be the left-most column (column A) of the table, sort, and then move the columns back into the layout I want (date and time in the first column). This seems very clumsy to me, and I wondered if there is an easier way to order a table in any format (e.g. just as it is imported from the csv file) by simply selecting the column to sort by, wherever it is in the table, as can be done in LibreOffice Calc.
1) Select ALL the columns you want to sort, then open:
menu > data > sort
2) In the sort specification, keep only the column with the NAMEs as the sort key and remove the rest of the columns.
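What the two steps above achieve (whole rows move, ordered by a single key column that can sit anywhere in the table) can be sketched in plain Python. This is not Gnumeric code; the sample rows and the column position are hypothetical.

```python
# Hypothetical table: (date, name, value). The key column is not the first one.
rows = [
    ("2024-01-03", "Carol", 7),
    ("2024-01-01", "alice", 3),
    ("2024-01-02", "Bob", 5),
]
NAME = 1  # index of the key column

# Each row is moved as a unit; only the NAME column decides the order.
# casefold() makes the ordering case-insensitive.
sorted_rows = sorted(rows, key=lambda row: row[NAME].casefold())
```

The selection in step 1 corresponds to `rows` (everything that should move together), and the sort specification in step 2 corresponds to the `key` function.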

Can I compare values in the same column in adjacent rows in PowerPivot?

I have a PowerPivot table for which I need to be able to determine how long an item was in an Error state. My data set looks something like this:
What I need to be able to do is to look at the values in the ID and State columns, and see if the value in the previous row is ERROR in the State column, and the same in the ID column. If it is, I then need to calculate the difference between the Changed Date values in those two rows.
So, for example, when I got to row 4, I would see that the value in the State column for Row 3, the previous row, is ERROR, and that the value in the ID column in the previous row is the same as the current row, so I would then calculate the difference between the Changed Date values in Row 3 and Row 4 (I don't care about the values in any of the other columns for this particular requirement).
Is there a way to do this in PowerPivot? I've done a fair amount of Internet searching, and it looks like if it can be done, it would use the EARLIER or EARLIEST DAX functions, but I can't find anything that tells me how, or even if, this can be done.
Thanks.
Chris,
I have had similar requirements many times and after a really long time of trial-and-error, I finally understood how EARLIER works. It can be very powerful, but also very slow so always check for the performance of your calculations.
To answer your question, you will need to create 4 calculated columns:
1) Item Rank - used for ranking the issues with same Item ID
=COUNTROWS(FILTER('ID', EARLIER([Item ID]) = [Item ID] && EARLIER([Date]) >= [Date]))
2) Follows Error - to easily find the issue that follows an EROR issue
=IF([State] = "EROR",[Item Rank]+1)
3) Time of Following Issue - a simple lookup so that you can calculate the difference
=IF([Follows Error]>0,
LOOKUPVALUE([Date], [User], [User], [Item Rank], [Follows Error]),
BLANK()
)
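4) Time Diff - calculation of the time difference for the specific issue (note that DAY() returns the day of the month, so this formula is only correct when both dates fall in the same month)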
=IF([State]="EROR",
DAY([Time of Following Issue])-DAY([Date]),
BLANK()
)
With those calculated columns, you can then easily create a PowerPivot table: drag State and Item Id onto the ROWS pane and add Time Diff to Values. You will get an overview of the issues in the "EROR" state and the time it took to resolve them.
This is what it looks like in PowerPivot window:
And the resulting Pivot table:
You can download my Excel file here (2013).
As I mentioned, be careful with the performance as the calculated columns with nested EARLIER and IF conditions might be a bit too performance-demanding. If there is a smarter way, I would be very happy to see it, but for now this works for me just fine.
Also, keep in mind that all calculated columns could be nested into 1, but I kept them separated to make it easier to understand the formulas.
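For readers who want to check the logic outside PowerPivot, here is a minimal sketch of the same idea in plain Python (not DAX): order the rows by ID and date, pair each row with the one before it, and report the time difference whenever the previous row has the same ID and is in the ERROR state. The field names and sample rows are hypothetical, and the state is spelled as in the question.

```python
from datetime import datetime, timedelta

def error_durations(rows):
    """Return (id, time spent in ERROR) for each row that directly follows an ERROR row of the same ID."""
    ordered = sorted(rows, key=lambda r: (r["id"], r["changed"]))
    results = []
    for prev, cur in zip(ordered, ordered[1:]):
        if prev["id"] == cur["id"] and prev["state"] == "ERROR":
            # Full date subtraction, so month boundaries are handled correctly.
            results.append((cur["id"], cur["changed"] - prev["changed"]))
    return results

# Hypothetical data set.
rows = [
    {"id": 1, "state": "NEW", "changed": datetime(2024, 1, 1)},
    {"id": 1, "state": "ERROR", "changed": datetime(2024, 1, 30)},
    {"id": 1, "state": "DONE", "changed": datetime(2024, 2, 2)},
    {"id": 2, "state": "ERROR", "changed": datetime(2024, 1, 5)},
]
durations = error_durations(rows)
```

ID 2 produces no result because nothing follows its ERROR row yet.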
Hope this helps :-)
