How to filter one list of items from another list of items?

I have a huge list of items in Column A (1,000 items) and a smaller list of items in Column B (510 items).
I want to put a formula in Column C to show only the Column A items not in Column B.
How to achieve this through a formula, preferably a FILTER formula?

Select the list in column A
Right-Click and select Name a Range...
Enter "ColumnToSearch"
Click cell C1
Enter this formula: =MATCH(B1,ColumnToSearch,0)
Drag the formula down for all items in B
If the formula fails to find a match, it will be marked "#N/A", otherwise it will be a number.
If you'd like it to be TRUE for match and FALSE for no match, use this formula instead:
=ISNUMBER(MATCH(B1,ColumnToSearch,0))
If you'd like to return the unfound value, and an empty string for found values, use:
=IF(ISNUMBER(MATCH(B1,ColumnToSearch,0)),"",B1)

An alternative method is simply:
=FILTER(A1:A,IF(COUNTIF(B1:B,A1:A),0,1))
It's much more efficient.
It uses COUNTIF to count, for each value in A, how many times it occurs in B, returned as an array. IF then turns every non-zero count into 0 and every zero count into 1, so the flags mark the values that are missing from B rather than the ones that are present. FILTER then keeps only the flagged rows.
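To sanity-check what the FILTER/COUNTIF formula should return, the same logic can be modeled outside the spreadsheet. This is a minimal Python sketch, not Sheets code; the function name `items_not_in` and the sample lists are illustrative assumptions.

```python
def items_not_in(column_a, column_b):
    """Return the items of column_a that do not occur in column_b, in order."""
    # COUNTIF(B1:B, A1:A): for each A value, how often it occurs in B.
    counts = [column_b.count(a) for a in column_a]
    # IF(count, 0, 1): flip the count into a keep-flag (1 when the count is 0).
    keep = [0 if c else 1 for c in counts]
    # FILTER: keep only the rows whose flag is non-zero.
    return [a for a, k in zip(column_a, keep) if k]


column_a = [1, 2, 3, 4, 5]  # hypothetical sample data
column_b = [2, 5]
print(items_not_in(column_a, column_b))  # [1, 3, 4]
```

If the sketch and the sheet disagree on a small sample, stray whitespace or mixed text/number types in the sheet are worth checking first.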
Columns look like this
A B
1 2
2 5
3
4
5

Formulae for the items in A that ARE in B:
=FILTER(A1:A, MATCH(A1:A, B1:B, 0))
=FILTER(A1:A, COUNTIF(B1:B, A1:A))
Formulae for the items in A that ARE NOT in B:
=FILTER(A1:A, ISNA(MATCH(A1:A, B1:B, 0)))
=FILTER(A1:A, NOT(COUNTIF(B1:B, A1:A)))
In your case:
=FILTER(A1:A; ISNA(MATCH(A:A; B:B; )))
If you face a mismatch of ranges, see: https://stackoverflow.com/a/54795616/5632629

Related

Most common "denominators" in a two column list in Google Sheets

How can I find the most commonly found 'Code' (Col B) associated with each unique 'Name' in (Col A) and find the closest value if the 'Code' in Col B is unique?
The image below shows the shared Google Sheet, with the starting data in columns A and B and the desired output in columns C and D. Each unique Name has associated codes. Column D displays the most commonly occurring Code for each unique name. For example, Buick La Sabre 1 has 3 associated codes in B3, B4 and B5, but D3 shows only 98761 because it appears in B2:B more frequently than the other 2 codes do. I will explain what I mean by the closest value below.
The Codes that have a count of 1 are unique, so the output in column D tries to find the closest match.
However, when the count of the code in B2:B is greater than 1, the output in column D equals the most frequent code associated with the Name.
Approach when there are 2 or more of the same values in column B
Query
I thought I might use a QUERY with an ORDER BY count(B) DESC LIMIT 2, in a fashion similar to this working formula:
QUERY($A$1:$D$25,"SELECT A, B ORDER BY B DESC Limit 2",1)
but I could not get it to work when I substituted in the Count function.
SORT & INDEX OR VLOOKUP
If the query function can't be fixed to work, then I thought another approach might be to combine a Vlookup/Index after sorting column B in a descending order.
UNIQUE(sort($B$3:$B,if(len($B$3:$B),countif($B$3:$B,$B$3:$B),),0,1,1))
Since a VLOOKUP or INDEX using multiple criteria just pulls the first value it finds, looking up against the range sorted by descending frequency would return the most frequent value.
Approach when there are fewer than 2 of the same values in column B
This is a little more complicated since the values can be numbers and letters.
A solution like that seen in the image below could be used if everything were a number. In our case, there will usually be a 3 to 5 character alphanumeric code consisting of 0 or 1 leading letters followed by numbers. I'm not sure what the best way to match a code like A1234 would be. I imagine a solution might be to SPLIT off the letters and try to match those first. For example, A1234 would be split into A | 1234, then matched on the closest letter and then the closest number. But I really am not sure what the best solution might be within the constraints of Google Sheets.
In the event that a number is equidistant between two numbers, the lower number should be chosen. For example, if 8 is the number and the closest match would be 6 or 10, then 6 should be selected.
In the event that a letter is being used, it should work in a similar fashion. For example, thinking of {A, B, C} as {1, 2, 3}, B should preferentially match to A since it comes before C.
In summary, I'm looking for a way to find the most frequent code in col B associated with each unique name in col A in this sheet and, in the event that none of the codes in B2:B repeat, a formula that will find the closest match for a number or alphanumeric code.
You can use this formula:
=QUERY({range of numerators & denominators}, "select Col2, count(Col2) group by Col2 label Col2 'Denominator', count(Col2) 'Count'")
That outputs something like this:
Denominator  Count
Den 1        Count 1
Den 2        Count 2
use:
=ARRAY_CONSTRAIN(SORTN(QUERY({A3:B},
"select Col1,Col2,count(Col2)
where Col1 is not null
group by Col1,Col2
order by count(Col2) desc,Col2 asc
label count(Col2)''"), 9^9, 2, 1, 1), 9^9, 2)
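The QUERY + SORTN combination above can be read as: count each (name, code) pair, order by count descending and then code ascending, and keep the first row per name. The Python sketch below models that reading only. The function name and sample rows are hypothetical, codes are treated as strings, and it assumes SORTN keeps the first row it meets for each name.

```python
from collections import Counter


def most_frequent_code(rows):
    """Return (name, code) pairs: each name with its most frequent code."""
    # select Col1, Col2, count(Col2) where Col1 is not null group by Col1, Col2
    counts = Counter((name, code) for name, code in rows if name)
    # order by count(Col2) desc, Col2 asc
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0][1]))
    first_per_name = {}
    for (name, code), _count in ordered:
        # SORTN in mode 2: keep only the first row seen for each name.
        first_per_name.setdefault(name, code)
    # SORTN also sorts its output by column 1, ascending.
    return sorted(first_per_name.items())


rows = [  # hypothetical sample data
    ("Buick", "98761"), ("Buick", "12345"), ("Buick", "98761"),
    ("Ford", "A1234"),
]
print(most_frequent_code(rows))  # [('Buick', '98761'), ('Ford', 'A1234')]
```

This covers only the "most frequent code" half of the question. The closest-match rule for codes that occur once is not modeled here.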

Array Formula For Maxifs

I feel like my question should be easy to figure out, but I've looked around and can't seem to find out how to get a basic array spill function that produces the max value. Here's my simplified data set:
Col A   Col B
Apple   864
Carrot  189
Pear    256
Apple   975
Pear    873
Carrot  495
Apple   95
Pear    36
Carrot  804
My objective is to have a unique list of food (from Col A), that returns the max corresponding Value from Col B. The formula for unique list from Col A is easy... =UNIQUE(filter(A:A,A:A<>"")), what I'm struggling with is getting a dynamic maxifs to align with this.
To illustrate, if I put the UNIQUE function in cell D2 (so it would spill to D4, as shown below in blue), a correct corresponding non-array formula would be =MAXIFS(B:B,A:A,D2) (shown in column E). I could drag this down the remaining rows, but I would like this to be dynamic, as there may be more food in my data set in the future.
What I would EXPECT to work is =filter(MAXIFS(B:B,A:A,D2:D),D2:D<>""), but this returns #VALUE!. By comparison, if I use SUMIF/AVERAGE, e.g. =filter(SUMIF(A:A,D2:D,B:B),D2:D<>""), I get what I WOULD expect (which really confuses me).
Is there a way to get a dynamic maxifs (or any function that produces an equal value in column E) that would spill based on unique values in column D?
try:
=QUERY({A:B}, "select Col1,max(Col2) where Col2 is not null group by Col1 label max(Col2)''")
bonus:
=QUERY({A:B}, "select Col1,max(Col2),sum(Col2) where Col2 is not null group by Col1 label max(Col2)'',sum(Col2)''")
bonus 2:
=SORTN(SORT(A1:B, 2, ), 9^9, 2, 1, 1)
2 - sort the second column of range A1:B
<empty> - or 0 or FALSE = "in descending order"
9^9 - output all rows
2 - 2nd mode of SORTN = "group by..."
1 - 1st column
1 - in ascending order
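The SORTN(SORT(...)) trick works because, once the rows are sorted by value in descending order, the first row met for each food is its maximum. A minimal Python sketch of that idea, using the data from the question; the function name `max_per_group` is an illustrative assumption:

```python
def max_per_group(rows):
    """Return (key, max value) pairs, one per key, sorted by key."""
    # SORT(A1:B, 2, FALSE): order rows by the value column, largest first.
    by_value_desc = sorted(rows, key=lambda r: r[1], reverse=True)
    best = {}
    for key, value in by_value_desc:
        # Keep only the first (largest) value met for each key.
        best.setdefault(key, value)
    # SORTN's last two arguments sort the result by column 1, ascending.
    return sorted(best.items())


data = [
    ("Apple", 864), ("Carrot", 189), ("Pear", 256),
    ("Apple", 975), ("Pear", 873), ("Carrot", 495),
    ("Apple", 95), ("Pear", 36), ("Carrot", 804),
]
print(max_per_group(data))
# [('Apple', 975), ('Carrot', 804), ('Pear', 873)]
```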
Adding a clearer, simpler answer for others who land here looking for the same thing:
The easiest way to accomplish this is with an array formula such as:
=MAX(IF($A$1:$A$7="Apple",$B$1:$B$7)) followed by CTRL-SHIFT-ENTER

How do I do fill down with formula in reverse order in Google Sheet

When I fill down with a formula starting with, say, =A9, it will auto fill with =A10, =A11, =A12, in incrementing order. But I want it to fill down in reverse order as =A8, =A7, =A6. How do I do that? In Excel, I can select 2 cells, =A10 and =A9, and Excel will know to fill in reverse order.
You could use the SEQUENCE function within the formula
=SEQUENCE(7,1,11,-1)
EDIT
If you need to reverse the order of already existing values in cells you can use:
=SORT(A1:A9,ROW(A1:A9),0)
or even the following to exclude empty rows
=SORT(A1:A9,ROW(A1:A9)*N(A1:A9<>""),0)
try like this:
=SORT(ROW(A:A), 1, 0)
or if you want from 10 to 5
=SORT(ROW(A5:A10), 1, 0)
or to flip the column values:
=SORT(A9:A15, ROW(A9:A15)*N(A9:A15<>""), 0)
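The flip formula sorts by a key that equals the row number for filled cells and 0 for empty ones, in descending order, so filled cells come out reversed and blanks sink to the bottom. A small Python sketch of that key, as a model of the logic rather than Sheets code; the function name `flip_column` and the sample values are assumptions:

```python
def flip_column(values, first_row=1):
    """Reverse the filled cells of a column, pushing empty cells to the end."""
    keyed = []
    for offset, value in enumerate(values):
        row = first_row + offset
        # ROW(range) * N(range <> ""): the row number if filled, else 0.
        key = row * (1 if value != "" else 0)
        keyed.append((key, value))
    # SORT(range, key, 0): sort by the key in descending order.
    keyed.sort(key=lambda kv: kv[0], reverse=True)
    return [value for _key, value in keyed]


print(flip_column(["a", "b", "", "c"], first_row=9))  # ['c', 'b', 'a', '']
```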

how to ARRAY specific cells based on rules?

Is there a way (ideally a single formula) to pick all green cells in a row (but only those that contain numbers, excluding 0) and list them in an array for that corresponding row?
Example: cell AO1 will contain a formula that lists these results:
AO1 = 647
AP1 = 2806
AQ1 = 15490
AR1 = 32105
AS1 = 33808
Something like an array of constants, but where each constant is a cell reference... I can only think of a hard way to do it, like making a table/grid of all green cells and then putting them in an array, but I'm not sure how I could exclude things from the array (things like: skip empty cells and skip cells that are "<1").
Edit: in other words, cell AO1: =arrayformula({$p$1;$r$1;$t$1;$v$1;$x$1;$z$1;$ab$1;$ad$1;$af$1;$ah$1;$aj$1;$al$1};and dont array empty and "<1" cells)
If the cells are fixed, you could simply use FILTER on all of them,
like this (I used the range you gave in your question):
=FILTER(
{$p$1;$r$1;$t$1;$v$1;$x$1;$z$1;$ab$1;$ad$1;$af$1;$ah$1;$aj$1;$al$1};
{$p$1;$r$1;$t$1;$v$1;$x$1;$z$1;$ab$1;$ad$1;$af$1;$ah$1;$aj$1;$al$1}>0)
To add the " K" suffix and UNIQUE, you can wrap it like this:
=ARRAYFORMULA(UNIQUE(FILTER(
{$p$1;$r$1;$t$1;$v$1;$x$1;$z$1;$ab$1;$ad$1;$af$1;$ah$1;$aj$1;$al$1};
{$p$1;$r$1;$t$1;$v$1;$x$1;$z$1;$ab$1;$ad$1;$af$1;$ah$1;$aj$1;$al$1}>0))&" K")

Stack multiple columns into one

I want to do a simple task but somehow I'm unable to do it. Assume that I have one column like:
a
z
e
r
t
How can I create a new column with the same value twice with the following result:
a
a
z
z
e
e
r
r
t
t
I've already tried doubling my column and doing something like:
=TRANSPOSE(SPLIT(JOIN(";",A:A,B:B),";"))
but it creates:
a
z
e
r
t
a
z
e
r
t
I was inspired by this answer so far.
Try this:
=SORT({A1:A5;A1:A5})
Here we use:
sort
{} to combine data
Taking your comment into account, you may use this formula:
=QUERY(SORT(ArrayFormula({row(A1:A5),A1:A5;row(A1:A5),A1:A5})),"select Col2")
The idea is to add a column of row numbers, sort by row number, then use QUERY to return only the values.
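The row-number approach can be pictured as tagging each value with its row, stacking two copies, and sorting on the tag so the copies land next to each other in the original order. A minimal Python sketch of that idea; the function name `duplicate_in_place` is hypothetical:

```python
def duplicate_in_place(values):
    """Repeat every value twice while preserving the original order."""
    # {row, value; row, value}: two stacked copies, each tagged with its row.
    tagged = [(row, value) for row, value in enumerate(values, start=1)]
    stacked = tagged + tagged
    # SORT on the row column. Python's sort is stable, and the two copies of a
    # row are identical, so their relative order does not matter.
    stacked.sort(key=lambda rv: rv[0])
    # "select Col2": drop the helper column.
    return [value for _row, value in stacked]


print(duplicate_in_place(["a", "z", "e", "r", "t"]))
# ['a', 'a', 'z', 'z', 'e', 'e', 'r', 'r', 't', 't']
```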
And join→split method will do the same:
=TRANSPOSE(SPLIT(JOIN(",",ARRAYFORMULA(CONCAT(A1:A5&",",A1:A5))),","))
Here the range is referenced only twice, so this is easier to use. Also see Concat + ArrayFormula sample.
A few hundred rows is nothing :)
I created index from 1 to n, then pasted it twice and sorted by index. But it's obviously fancier to do it with a formula :)
Assuming your list is in column A and (for now) the number of repeats is in C1 (it can be replaced by a number in the formula), something simple like this will do (starting in B1):
=INDEX(A:A,INT((ROW()-1)/$C$1)+1)
Simply copy down as you need it (will give just 0 after the last item). No sorting. No array. No sheets/excel problems. No heavy calculations.
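The row arithmetic behind the INDEX approach is: take the zero-based output row, integer-divide it by the repeat count, and add 1 to get the source row. A short Python sketch of that mapping, assuming output starts in row 1; `repeat_each` is an illustrative name:

```python
def repeat_each(items, times):
    """Repeat every item `times` times, in order, via index arithmetic."""
    result = []
    for output_row in range(1, len(items) * times + 1):
        # INT((ROW()-1)/times)+1: the 1-based source row for this output row.
        source_row = (output_row - 1) // times + 1
        result.append(items[source_row - 1])
    return result


print(repeat_each(["a", "z", "e", "r", "t"], 2))
# ['a', 'a', 'z', 'z', 'e', 'e', 'r', 'r', 't', 't']
```

Because each output row is computed independently, no sorting or joining is needed, which is why the formula can simply be copied down.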

Resources