Say I have data like this in one column of a Google Sheet:
abc
abc
def
def
abc
abc
xyz
I want something that will give me an output treating non-consecutive duplicates as unique, i.e.:
abc
def
abc
xyz
Using =UNIQUE() gives me only one instance of abc.
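Use FILTER to drop every cell that equals the one above it. The range starts at A2 because each cell is compared with its predecessor, so A1 itself (a header, or the first value) is never part of the output.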
=filter(A2:A6, A2:A6 <> A1:A5)
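If it helps to see the idea outside of Sheets, here is a minimal Python sketch of the same "collapse consecutive duplicates" logic. It is only an illustration: the function name collapse_runs is made up for this example, and the list is the sample data from the question.

```python
from itertools import groupby


def collapse_runs(values):
    """Keep one item per run of consecutive equal values."""
    # groupby starts a new group every time the value changes,
    # so non-consecutive repeats land in separate groups.
    return [key for key, _ in groupby(values)]


column = ["abc", "abc", "def", "def", "abc", "abc", "xyz"]
print(collapse_runs(column))  # ['abc', 'def', 'abc', 'xyz']
```

Unlike the FILTER formula, this keeps the first value of the column as well.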
You can use this simple formula (I'm sorry if it is too simple):
=if(A1=A2,"",A1)
This spreadsheet may help you see the formulas: "Treat non-consecutive duplicates in a spreadsheet column as unique".
I hope it helps!
I am having some trouble with a query in Crystal Reports 2008. I have two tables with loosely related columns; both contain addresses. One column is just a street name, while the other is a street name plus some additional info. I want to find all records where the two share the same street name and show only those. Example below:
Address | AddressB
123 St  | 123 St, ABC City
123 St  | 345 St, ABC City
I have tried using a formula such as the one below:
if({AddressB} startswith {Address}) then {AddressB} else 'ERROR'
I have also tried this with LIKE as well as with * wildcards. Nothing seems to work. I will admit I am pretty amateurish with SQL and Crystal, so formulas are a new frontier for me when writing reports. I should also note that the tables are linked appropriately with inner joins.
Any help would be greatly appreciated!
This should work. Perhaps your {Address} column is padded with spaces, so try:
IF ({AddressB} startswith Trim({Address})) THEN {AddressB} ELSE 'ERROR'
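To show why trailing spaces would break the comparison, here is a small Python sketch of the same test. It is not Crystal syntax; the function name matching_address and the padded sample values are assumptions made up for the illustration.

```python
def matching_address(address, address_b):
    """Return address_b if it starts with the trimmed street name, else 'ERROR'."""
    street = address.strip()  # plays the role of Trim({Address})
    return address_b if address_b.startswith(street) else "ERROR"


# Hypothetical rows where the Address column is padded with spaces.
rows = [
    ("123 St   ", "123 St, ABC City"),
    ("123 St   ", "345 St, ABC City"),
]

for address, address_b in rows:
    # Without trimming, the padded value is never a prefix of address_b.
    print(address_b.startswith(address), matching_address(address, address_b))
```

With the padding left in place the prefix test fails for both rows; after trimming, only the first row matches.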
Test the effect of replacing the reference to the column name with the static text value that you "think" is in that column.
If you get a different behavior, what you think is in that column is not what is actually in that column. For example, the column might contain non-printable characters. You can get rid of those using the Replace function.
If you don't get a different behavior, then show us the expression with the static text values. That would allow us to replicate the behavior and understand the situation.
Note: the problem might be in your table join logic. If you have no join condition, then all records in TableA would join to all the records in TableB. In that case, you need to place the fields in the detail section to get a proper sense of what is being compared to what. Or rethink your join logic. Perhaps you should move one table to a subreport, or a SQL Expression instead of trying to include both tables in the main report.
I'm trying to clear the entire cell if it doesn't contain a given keyword.
I've managed to do this for one column:
Table.ReplaceValue(#"PrevStep",each [#"My Column"], each if Text.PositionOf([#"My Column"],"keyword")>-1 then [#"My Column"] else null,Replacer.ReplaceValue,{"My Column"})
The problem is I need to iterate/repeat that step for a number of columns... the number of columns may vary and column names also may be different every time. I can have all those column names put into a list but I'm not able to use it.
The solution I'm looking for might look like this:
for each ColNam in MyColumnsList
Table.ReplaceValue(#"PrevStep",each [#"ColNam"], each if Text.PositionOf([#"ColNam"],"keyword")>-1 then [#"ColNam"] else null,Replacer.ReplaceValue,MyColumnsList)
next
but this is Power Query M, not VBA, and of course the problem is with #"PrevStep": I see it as a kind of recursion, and again I do not know how to handle it.
Is the path I am following correct, or should it be done some other way?
Thanks
Andrew
Unpivot your columns so that they collapse into two columns (attribute and value). Apply your replacement to the single value column, then pivot back into the original format.
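The shape of that approach can be sketched in plain Python. This is not M code, just the unpivot / replace / pivot idea on a list of dicts; the function name clear_without_keyword and the sample rows are made up for the illustration.

```python
def clear_without_keyword(rows, columns, keyword):
    """Set each cell in `columns` to None unless it contains `keyword`."""
    # Unpivot: one (row index, column name, value) triple per targeted cell.
    long_form = [
        (i, col, row[col]) for i, row in enumerate(rows) for col in columns
    ]
    # One replacement rule, applied to the single value column.
    cleaned = [
        (i, col, val if isinstance(val, str) and keyword in val else None)
        for i, col, val in long_form
    ]
    # Pivot back: write the cleaned values into copies of the original rows.
    result = [dict(row) for row in rows]
    for i, col, val in cleaned:
        result[i][col] = val
    return result


table = [
    {"id": 1, "a": "has keyword", "b": "nope"},
    {"id": 2, "a": "x", "b": "keyword!"},
]
print(clear_without_keyword(table, ["a", "b"], "keyword"))
```

The list of column names is an ordinary argument here, so it can change from run to run without touching the replacement rule.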
I hope to explain clearly what I'm trying to do.
I have some data in column B, grouped as shown in column A.
I'd like to count the unique values for each group present in column A, not taking into account the unique values already counted in the previous group(s).
For instance, I'd like to:
count unique values in 'proyecto2' NOT COUNTING the unique values already present in 'proyecto1'.
count unique values in 'proyecto3' NOT COUNTING the unique values already present in 'proyecto1' and 'proyecto2'.
count unique values in 'proyecto4' NOT COUNTING the unique values already present in 'proyecto1', 'proyecto2' and 'proyecto3'.
and so on...
Below you can find a Google Sheet with the solution I found, even if I'm not very happy with it, to show easily what I mean.
https://docs.google.com/spreadsheets/d/1x8S76_6dUnHr1NtUbzNzpLTpQtqan6_ohemcrGsrpC0/edit#gid=0
Basically, columns A:B hold the INPUT DATA. You can add data in columns A and B to see how it works (my method, at the moment, only works if you add one of these groups in column A: 'proyecto1', 'proyecto2', 'proyecto3', 'proyecto4', 'proyecto5' and 'proyecto6').
In column D:E, we have the output data, basically, the unique values counted by the group.
In column G:W, the formula to process the data.
Clearly, my method is working up to 'proyecto6' since in the "processing columns" I'm taking into account formula only up to 'proyecto6'.
Everything is working, but my question is: could you suggest a more dynamic way of achieving what I'm trying to do? Or is the only way to write some code?
delete everything in range D:Z
paste to D2 cell:
=UNIQUE(A2:A)
paste to E2 cell:
=ARRAYFORMULA(IF(LEN(D2:D);
MMULT(IFERROR(LEN(G2:Z)/LEN(G2:Z); 0); TRANSPOSE(COLUMN(G2:Z2)^0)); ))
paste to G2 cell:
=TRANSPOSE(UNIQUE(FILTER(B$2:B; A$2:A=D2)))
paste to G3 cell and drag down:
=TRANSPOSE(UNIQUE(FILTER(FILTER(B$2:B; A$2:A=D3);
NOT(COUNTIF(INDIRECT("G2:"&ROW()-1); FILTER(B$2:B; A$2:A=D3))))))
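Since the question also asks whether writing code is the only option: if you do go that way, the logic is short. Here is a rough Python sketch, assuming the data arrives as (group, value) pairs in sheet order; the function name count_new_uniques and the sample values are made up for the illustration.

```python
def count_new_uniques(pairs):
    """Per group, count distinct values not already seen in earlier groups."""
    # Collect distinct values per group; dicts keep first-appearance order,
    # which corresponds to what UNIQUE(A2:A) returns.
    values_by_group = {}
    for group, value in pairs:
        values_by_group.setdefault(group, set()).add(value)

    seen = set()
    counts = {}
    for group, values in values_by_group.items():
        fresh = values - seen  # values no earlier group has contributed
        counts[group] = len(fresh)
        seen |= fresh
    return counts


data = [
    ("proyecto1", "a"), ("proyecto1", "b"), ("proyecto1", "a"),
    ("proyecto2", "b"), ("proyecto2", "c"),
    ("proyecto3", "a"), ("proyecto3", "d"), ("proyecto3", "e"),
]
print(count_new_uniques(data))
# {'proyecto1': 2, 'proyecto2': 1, 'proyecto3': 2}
```

This works for any number of groups and any group names, because nothing is tied to 'proyecto1' through 'proyecto6'.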
I have a list of names (never over 100 names) with a value for each of them, either 3 or 4 digits.
john2E=1023
mary2E=1045
fred2E=968
And so on... They're formatted exactly like that in the .txt file. I have Python and Excel, also willing to download whatever I need.
What I want to do is sort all the names by their values in descending order, so the highest is on top. I've tried to use Excel by replacing the '2E=' with ',' so I have name,value, then importing the data so each is in a separate column, but I still couldn't sort them any way other than A to Z.
Help is much appreciated, I did take my time to look around before posting this.
Replace the "2E=" with a tab character so that the data is displayed in Excel in two columns. Then sort on the value column, largest to smallest.
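Since you also have Python, here is a minimal sketch of the same thing done there. It assumes every non-empty line looks exactly like name2E=value, as in the question; the function name sort_by_value is made up for the example, and reading the lines from your .txt file is left out.

```python
def sort_by_value(lines):
    """Parse 'name2E=value' lines and return (name, value) pairs, highest first."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the last "2E=" so the number is always the right-hand part.
        name, _, value = line.rpartition("2E=")
        # Convert to int so 968 sorts below 1023 (as text it would not).
        entries.append((name, int(value)))
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


sample = ["john2E=1023", "mary2E=1045", "fred2E=968"]
for name, value in sort_by_value(sample):
    print(f"{name}\t{value}")
```

For the three sample lines this prints mary, john, fred in that order. The int conversion is the important part: if the values are compared as text, 3-digit and 4-digit numbers do not sort correctly.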
Imagine I have the following table available to me:
A: { x: int, y: int, z: int, ...99 other columns... }
I now want to transform this such that z is set to NULL where x > y, with the resulting dataset stored as B, and I want to do it without explicitly mentioning all the other columns, as that becomes a maintenance nightmare.
Is there a simple solution?
This issue is tracked in this JIRA:
PIG-1693 There needs to be a way in foreach to indicate "and all the rest of the fields"
Currently I don't know anything simpler than doing what you say or not loading Z and adding a new column Z with the star expression.
I was able to drop some of the column bloat by nesting them in single-row bags and flattening afterwards.
Still, it feels like a bit of a hack, so I'm also investigating Cascading to see if it's a better fit for my scenario.
A feature to facilitate your scenario was added in Pig 0.9. The new project-range operator (..) allows you to express a range of fields by indicating the starting and/or ending field names as in this example:
result = FOREACH someInput GENERATE field1, field2, null as field3, field4 .. ;
In the example above field1/2/3/4 are actual field names. One of the fields is set to null while the other fields are kept intact.
More details in this "New Apache Pig 0.9 Features – Part 3" article: http://hortonworks.com/blog/new-apache-pig-0-9-features-part-3-additional-features/
To solve your specific problem you probably want to do a FILTER and a UNION to combine the results.
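For reference, the transformation being asked for is easy to state when each row carries its own field names. The sketch below is plain Python, not Pig, and only illustrates the intended result; the function name null_z_where_x_gt_y and the extra column "other" are made up for the example.

```python
def null_z_where_x_gt_y(rows):
    """Return copies of the rows with z set to None wherever x > y."""
    # {**row, "z": None} copies every field and overrides only z,
    # so none of the other columns has to be named.
    return [
        {**row, "z": None} if row["x"] > row["y"] else dict(row)
        for row in rows
    ]


a = [
    {"x": 5, "y": 1, "z": 7, "other": "kept"},
    {"x": 1, "y": 5, "z": 9, "other": "kept"},
]
b = null_z_where_x_gt_y(a)
print(b)
```

Only the first row has x > y, so only its z becomes None; every other field passes through untouched.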
Of course you can select columns by column number, but that can easily become a nightmare if you change anything at all. I have found column names to be much more stable, and therefore I recommend the following solution:
Update mycol when it is between two known columns
You can use .. to indicate leading or trailing columns (or the columns in between). Here is how that would work out if you want to change the value of 'MyCol' to 'updatedvalue'.
aliasAfter = FOREACH aliasBefore GENERATE
.. colBeforeMyCol, 'updatedvalue' AS MyCol, colAfterMyCol ..;