I have a sheet that uses the QUERY function to fetch the matching search results row by row. I need to turn this into an ARRAYFORMULA so the search runs automatically for every row. How should I do this?
Expected Result
Check phrase | Result 1 | Result 2 | Result 3 | Result 4
Apple | Apple | Ice Apple | Custard apple/Sugar apple/Sweetsop | Rose apple/Water apple
berry | Cape gooseberry/Inca berry/Physalis
man | Mango | Mangosteen
mom
fruit | Dragon fruit | Egg fruit | Passion fruit | Black sapote/Chocolate pudding fruit
j | Jackfruit | Jujube | Jenipapo
nake | Snake fruit/Salak
me | Horned Melon | Honeydew melon | Medlar fruit | Mouse melon
Currently
Check phrase | Result 1 | Result 2 | Result 3 | Result 4
Apple | Apple | Ice Apple
berry | Apple | Ice Apple
man | Apple | Ice Apple
mom | Apple | Ice Apple
fruit | Apple | Ice Apple
j | Apple | Ice Apple
nake | Apple | Ice Apple
me | Apple | Ice Apple
What I have so far works only for a single row:
=IF(LEN(F2:F)=0, IFERROR(1/0), IF(LEN(F2:F)>0, Query(TRANSPOSE(QUERY(Fruits!B:B, "select B where B contains '" & F2:F & "'")),"select * limit 12")))
How should I do this? Please advise. My file is linked here:
[My Google Sheet file](https://docs.google.com/spreadsheets/d/1QDfruKtwJjmRQWqTlO3sBM-e9vp9QKwmla23ss0U1sY/edit#gid=1411907513)
use:
=ARRAY_CONSTRAIN(LAMBDA(a, b, BYROW(a, LAMBDA(x,
TRANSPOSE(IFNA(FILTER(b, SEARCH(IF(x="", "×", x), b)))))))
(F2:INDEX(F:F, MAX(ROW(F:F)*(F:F<>""))), Fruits!B2:B), 9^9, 12)
=LAMBDA(PHRASES,FRUITS,
BYROW(PHRASES,LAMBDA(FRUIT,
TRANSPOSE(FILTER(FRUITS,REGEXMATCH(FRUITS,FRUIT)))
))
)(QUERY({Current!F2:F},"WHERE Col1 IS NOT NULL"),QUERY({Fruits!B:B},"WHERE Col1 IS NOT NULL"))
Put this formula into G2; the result should match your expected output.
What we are doing here is:
1. Use QUERY to get rid of blanks in the range Current!F2:F, and name the array PHRASES with LAMBDA.
2. Use QUERY to get rid of blanks in the range Fruits!B:B, and name the array FRUITS with LAMBDA.
3. Use BYROW to work on the single-column array PHRASES value by value, with...
4. a LAMBDA inside BYROW that names the value of each row FRUIT (it holds the current check phrase).
5. Use FILTER to filter the array FRUITS.
6. Use REGEXMATCH as the condition of the FILTER function in step 5; it returns TRUE for each string that matches.
7. TRANSPOSE the result of each filter to meet your display format.
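For readers who find the nested LAMBDA hard to follow, the BYROW / FILTER / REGEXMATCH logic above can be sketched outside of Sheets. This is only a rough Python illustration, not the formula itself: the function name `matches_per_phrase` and the sample lists are made up, and `re.search` stands in for REGEXMATCH.

```python
import re


def matches_per_phrase(phrases, fruits):
    """Return one list of matching fruits per phrase (one output row per phrase)."""
    rows = []
    for phrase in phrases:
        # Like REGEXMATCH, the phrase is treated as a regular expression that
        # may match anywhere in the fruit name, and the test is case-sensitive.
        rows.append([fruit for fruit in fruits if re.search(phrase, fruit)])
    return rows


# Hypothetical sample data, loosely based on the question.
fruits = ["Apple", "Ice Apple", "Mango", "Mangosteen", "Snake fruit/Salak"]
phrases = ["Apple", "man", "nake", "mom"]

for phrase, row in zip(phrases, matches_per_phrase(phrases, fruits)):
    print(phrase, row)
```

Because the match is case-sensitive, "man" finds nothing here ("Mango" starts with a capital M). In the sheet, a row with no match shows #N/A unless it is wrapped in IFNA().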
The FILTER can also be replaced by another QUERY function if you want; the output should be identical in this case.
=LAMBDA(PHRASES,FRUITS,
BYROW(PHRASES,LAMBDA(FRUIT,
TRANSPOSE(QUERY(FRUITS,"WHERE Col1 CONTAINS '"&FRUIT&"'"))
))
)(QUERY({Current!F2:F},"WHERE Col1 IS NOT NULL"),QUERY({Fruits!B:B},"WHERE Col1 IS NOT NULL"))
According to your request in the comments, this is the updated code:
to make it case-insensitive, apply UPPER() to Col1 and PHRASE inside the transposed query,
to show blank instead of #N/A when there is no output on that row, apply IFNA() to the whole QUERY() inside the TRANSPOSE(),
to limit the length of the output array, wrap the TRANSPOSE() in ARRAY_CONSTRAIN().
=LAMBDA(NOTNULL,LAMBDA(PHRASES,FRUITS,
BYROW(PHRASES,LAMBDA(PHRASE,
ARRAY_CONSTRAIN(
TRANSPOSE(IFNA(
QUERY(FRUITS,"WHERE UPPER(Col1) CONTAINS '"&UPPER(PHRASE)&"'"),
"")),
1,12)
))
)(QUERY({Current!F2:F},NOTNULL),QUERY({Fruits!B:B},NOTNULL)))("WHERE Col1 IS NOT NULL")
The code leaves an empty row when no match is found, which is what you asked for in your comment: "Show blank if no valid return (instead of #N/A)".
What do you mean by "When there is no phrase match, that row skipped"?
It isn't skipped in my test environment.
But if you mean that leaving part of the 'Check phrase' column empty breaks the calculation, it does: the question never mentioned that the Check phrase column may contain blanks, so I simply didn't handle it.
If that is the case, you should always include such conditions in the sample data you provide at the very beginning. Otherwise this is a separate issue, and it may be better to open another question after trying to work it out on your own.
Anyway, this is a quick solution if you need to handle blanks in the Check phrase column:
=LAMBDA(NOTNULL,LAMBDA(PHRASES,FRUITS,
BYROW(PHRASES,LAMBDA(PHRASE,
ARRAY_CONSTRAIN(TRANSPOSE(IFNA(IF(PHRASE="","",QUERY(FRUITS,"WHERE UPPER(Col1) CONTAINS '"&UPPER(PHRASE)&"'")),"")),1,12)
))
)({Current!F2:F},QUERY({Fruits!B:B},NOTNULL)))("WHERE Col1 IS NOT NULL")
The output shifts upward when there are blanks in the 'Check phrase' column because, as I said, I use QUERY to remove extra blanks from the two source ranges. This helps speed things up a bit, but blanks between array values are removed as well, which shortens the reference array.
The easiest solution is to leave the blanks in place instead of removing them and, inside IFNA(), use an IF() to skip any empty PHRASE by returning nothing, which leaves a blank row.
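The row-shifting effect is easy to reproduce outside of Sheets. The following Python sketch is only an illustration under assumptions: `search_rows` and the sample lists are hypothetical, the substring test stands in for the case-insensitive QUERY, and `limit` plays the role of ARRAY_CONSTRAIN.

```python
def search_rows(phrases, fruits, limit=12):
    """Case-insensitive search that keeps one output row per input row."""
    rows = []
    for phrase in phrases:
        if phrase == "":
            # Keep an empty row so that later results stay aligned.
            rows.append([])
            continue
        needle = phrase.upper()
        hits = [fruit for fruit in fruits if needle in fruit.upper()]
        rows.append(hits[:limit])  # cap the row length
    return rows


fruits = ["Apple", "Ice Apple", "Mango", "Mangosteen", "Snake fruit/Salak"]
phrases = ["man", "", "nake"]

aligned = search_rows(phrases, fruits)
# Dropping blanks first (as the earlier formulas did) loses one row,
# so the result for "nake" moves up from row 3 to row 2.
shifted = search_rows([p for p in phrases if p], fruits)
print(aligned)
print(shifted)
```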
We have a church death insurance program, and of course we keep a list of the names. When a member signs up, there's an "add" next to the name. If that member opts out, we add a row below, paste the name, and replace "add" with "drop". So there's a duplicate, but technically not, since one row has "add" and the other has "drop". If the member dies, we do the same thing: add a row, paste the name, then put [deceased].
The format of the information inside each cell is: Last name, First name, status, name of church.
ABA, ADONIS Add Upper Sumilop Church
ANG, NICK Add Upper Sumilop Church
ANG, NICK Drop Upper Sumilop Church
CAW, DERNA Add Vetarba/Talagutong Church
CAW, DERNA Deceased Vetarba/Talagutong Church
I'd like to make a single list of members who are both 'active' (didn't drop) and 'alive' (didn't die). Based on the example above, the list should show only Aba, Adonis.
I was able to make a list with the dropped members in one column and the deceased members in another. These columns are adjacent to each other but not adjacent to the column of the original list.
I wasn't part of the program from the start; I joined after the current system was already established.
*The statuses are enclosed in brackets instead of quotation marks.
query it:
=QUERY({A:A}; "where not lower(Col1) contains 'deceased'"; )
=QUERY({A:A};
"where not lower(Col1) contains 'deceased'
and not lower(Col1) contains 'drop'"; )
or:
=QUERY({A:A}; "where lower(Col1) contains 'add'"; )
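Note that these queries filter individual rows, so a member's "Add" row can still appear even when a later "Drop" or "Deceased" row exists for the same name. If the goal is a list of names with no such later row, the logic could look like the Python sketch below. It is an illustration only: `active_members` and the regular expression are assumptions about the cell layout (name, then a status word that may or may not be in brackets, then the church), not something taken from the sheet.

```python
import re

# Assumed layout: "<Last, First> <status> <church>", status optionally in brackets.
ENTRY_RE = re.compile(
    r"^(?P<name>.+?)\s+\[?(?P<status>add|drop|deceased)\]?\s+(?P<church>.+)$",
    re.IGNORECASE,
)


def active_members(entries):
    """Return names that were added and never dropped or marked deceased."""
    added = []       # keeps first-seen order
    removed = set()
    for entry in entries:
        match = ENTRY_RE.match(entry.strip())
        if match is None:
            continue  # skip rows that do not follow the assumed layout
        name = match.group("name")
        if match.group("status").lower() == "add":
            if name not in added:
                added.append(name)
        else:
            removed.add(name)
    return [name for name in added if name not in removed]


entries = [
    "ABA, ADONIS Add Upper Sumilop Church",
    "ANG, NICK Add Upper Sumilop Church",
    "ANG, NICK Drop Upper Sumilop Church",
    "CAW, DERNA Add Vetarba/Talagutong Church",
    "CAW, DERNA Deceased Vetarba/Talagutong Church",
]
print(active_members(entries))
```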
In our schools, we have books of the same title by the same author but different ISBN #s. I am working on an inventory list so that we can scan the different ISBNs and then find out what is on hand for a title.
Here is my working spreadsheet demo. The live version will be split into incoming data (columns A-D), which arrives on another sheet (possibly via Google Forms), and a separate sheet (F-J) that does all the math. For convenience and testing, they are all on one sheet.
Essentially, in column F, I would like to sum all the quantities in A where the ISBN's in C match any of the values of G and place it in F.
The formula I am using in F doesn't seem to completely work:
=SUMIF(C:C,arrayformula(split(G2,",")),A:A)
It captures the first match but ignores / doesn't loop over the rest. I have looked at Sumifs and Match and I cannot seem to get any closer with the syntax. I would greatly appreciate if anyone can help me solve this dilemma.
Additionally, I know how to do this with a custom script but I need to avoid that as end users break things for one reason or another and I can't handle the debugging load the way this could possibly be deployed.
Thanks in advance for anyone willing to take a look at this!
~Allan
Try in F2
=sum(query(A:D,"select A where C matches '"& textjoin("|",,split(G2,",")) &"' ",0))
Delete everything in F2:F and J2:J, then use this in F2:
=INDEX(IF(G2:G="",,MMULT(IFERROR(VLOOKUP(SPLIT(G2:G, ","), {C:C, A:A}, 2, ), 0),
SEQUENCE(COLUMNS(SPLIT(G2:G, ",")), 1, 1, ))))
in J2 use:
=ARRAYFORMULA(IF(G2:G="",,F2:F*I2:I))
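To make the intended calculation explicit, here is a small Python sketch of what the F column is meant to hold: the sum of the quantities whose ISBN appears in the comma-separated list in G. The function name `total_on_hand` and the sample ISBNs are hypothetical; the sketch assumes each scanned row is a (quantity, ISBN) pair.

```python
def total_on_hand(rows, isbn_list):
    """Sum the quantities of all rows whose ISBN is in the comma-separated list."""
    wanted = {isbn.strip() for isbn in isbn_list.split(",") if isbn.strip()}
    return sum(quantity for quantity, isbn in rows if isbn in wanted)


# Hypothetical scanned data: (quantity, ISBN).
rows = [
    (3, "9780000000001"),
    (2, "9780000000002"),
    (5, "9780000000001"),
    (7, "9780000000003"),
]
print(total_on_hand(rows, "9780000000001, 9780000000002"))
```

Unlike the original SUMIF attempt, every ISBN in the list contributes to the total, not just the first one.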
How can you force an alphanumeric string to lowercase (or uppercase) in Power Query?
I have a series of attribute codes coming into Power Query, but the codes contain variations of upper-case and lower-case text. In practice these items would be considered duplicates, but Power Query is case-sensitive. I've tried using Text.Lower / Text.Upper, but these require the data to be of type text. My data is alphanumeric (123abc, 111, aaa), and text functions do not work on data of type any.
Suggestions?
Description below:
Activity | Activity ID
Apple | 1CA11
Apple | 1ca11
Orange | 2dp23
Orange | 2DP23
This should become:
Apple | 1ca11
Orange | 2dp23
You could ignore case of just the Activity ID field in Table.Distinct operations
= Table.Distinct(Source,{{"Activity", Comparer.Ordinal}, {"Activity ID", Comparer.OrdinalIgnoreCase}} )
or ignore case in all columns in the Table.Distinct
= Table.Distinct(Source, Comparer.OrdinalIgnoreCase)
Thanks Bryan Rock and Horsey Ride. I goofed up: the issue was that the change-type step was not applied first (it looks like the original step got deleted).
Thanks for the help!
Changing the column type to text before forcing lower/uppercase solves the problem.
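The order of operations (convert to text first, then change case, then remove duplicates) can be shown with a short sketch. This is Python rather than M, and only an analogy: `dedupe_ignore_case` and the sample rows are made up, with `str()` standing in for the change-type step.

```python
def dedupe_ignore_case(rows):
    """Lower-case the ID column (after converting it to text) and drop duplicates."""
    seen = set()
    result = []
    for activity, activity_id in rows:
        # Convert to text first: a numeric ID such as 111 has no .lower() method.
        normalized = str(activity_id).lower()
        key = (activity, normalized)
        if key not in seen:
            seen.add(key)
            result.append((activity, normalized))
    return result


rows = [
    ("Apple", "1CA11"),
    ("Apple", "1ca11"),
    ("Orange", "2dp23"),
    ("Orange", "2DP23"),
    ("Pear", 111),  # a purely numeric ID, not stored as text
]
print(dedupe_ignore_case(rows))
```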
I'm trying to collect a dataset that could be used for automatically generating baseball articles.
I have play-by-play records of MLB games from retrosheet.org that I would like to be written out to plain text, as those that could possibly appear as part of a recap news article.
Here are some examples of the play-by-play records:
play,2,0,semim001,32,.CBFFFBBX,9/F
play,2,0,phegj001,01,FX,S7/G
play,2,0,martn003,01,CX,3/G
play,2,1,youne003,00,,NP
The following is what I would like to achieve:
For the first example
play,2,0,semim001,32,.CBFFFBBX,9/F,
I want it to be written out as something like:
"semim001 (Marcus Semien) was on three balls and two strikes in the second inning as the away player. He hit the ball into play after one called strike, one ball, three fouls, and another two balls. The fly ball was caught by the right outfielder."
The plays are formatted in the following way:
The first field is the inning, an integer starting at 1.
The second field is either 0 (for visiting team) or 1 (for home team).
The third field is the Retrosheet player id of the player at the plate.
The fourth field is the count on the batter when this particular event (play) occurred. Most Retrosheet games do not have this information, and in such cases, "??" appears in this field.
The fifth field is of variable length and contains all pitches to this batter in this plate appearance and is described below. If pitches are unknown, this field is left empty, nothing is between the commas.
The sixth field describes the play or event that occurred.
Explanations for all the symbols in the fifth and sixth field can be found on this Retrosheet page.
With Python 3, I've been able to format all the info of invariable length into a formatted sentence, which is all but the last two fields. I'm having difficulty in thinking of an efficient way to unparse (correct me if this is the wrong term to use here) the fifth and sixth fields, the pitches and the events that occurred, due to their variable length and wide variety of things that can occur.
I think I could write out all the rules based on the info on the Retrosheet website, but I'm looking for suggestions for a smarter way to do this. I tagged this question natural language processing, hoping this is a trivial problem in that field. Any pointers will be greatly appreciated!
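For reference, the rule-based approach mentioned above could start from something like the following Python sketch. It is deliberately partial: the pitch-code table only covers the four codes explained by the worked example in this question (C, B, F, X), any other symbol (including the leading ".") is skipped, and the event field is passed through untouched. The names `PITCH_NAMES`, `describe_pitches` and `parse_play` are made up for illustration.

```python
from itertools import groupby

# Only the codes explained by the question's example; the full Retrosheet table is larger.
PITCH_NAMES = {
    "C": "called strike",
    "B": "ball",
    "F": "foul",
    "X": "ball put into play",
}


def describe_pitches(pitches):
    """Turn a pitch string into text, collapsing runs of the same pitch."""
    known = [code for code in pitches if code in PITCH_NAMES]
    parts = []
    for code, run in groupby(known):
        parts.append(f"{len(list(run))} x {PITCH_NAMES[code]}")
    return ", ".join(parts)


def parse_play(line):
    """Split one 'play' record into named fields."""
    _, inning, side, batter, count, pitches, event = line.rstrip(",").split(",")[:7]
    known_count = len(count) == 2 and count.isdigit()  # "??" means unknown
    return {
        "inning": int(inning),
        "team": "visiting" if side == "0" else "home",
        "batter": batter,
        "balls": int(count[0]) if known_count else None,
        "strikes": int(count[1]) if known_count else None,
        "pitches": describe_pitches(pitches),
        "event": event,  # left undecoded in this sketch
    }


play = parse_play("play,2,0,semim001,32,.CBFFFBBX,9/F,")
print(play["pitches"])
```

A dictionary of templates keyed on the decoded fields could then produce the sentences; decoding the event field would need its own, larger rule table.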
My company has a client that tracks prices for products from different companies at different locations. This information goes into a database.
These companies email the prices to our client each day, and of course the emails are all formatted differently. It is impossible to have any of the companies change their format - they will not do it.
Some look sort of like this:
This is example text that could be many lines long...
Location 1
Product 1 Product 2 Product 3
$20.99 $21.99 $33.79
Location 2
Product 1 Product 2 Product 3
$24.99 $22.88 $35.59
Others look sort of like this:
PRODUCT PRICE + / -
------------ -------- -------
Location 1
1 2007.30 +048.20
2 2022.50 +048.20
Maybe some multiline text here about a holiday or something...
Location 2
1 2017.30 +048.20
2 2032.50 +048.20
Currently we have individual parsers written for each company's email format. But these formats change slightly pretty frequently. We can't count on the prices being on the same row or column each time.
It's trivial for us to look at the emails and determine which price goes with which product at which location. But not so much for our code. So I'm trying to find a more flexible solution and would like your suggestions about what approaches to take. I'm open to anything from regex to neural networks - I'll learn what I need to to make this work, I just don't know what I need to learn. Is this a lex/parsing problem? More similar to OCR?
The code doesn't have to figure out the formats all on its own. The emails fall into a few main 'styles' like the ones above. We really need the code to just be flexible enough that a new product line or whitespace or something doesn't make the file unparsable.
Thanks for any suggestions about where to start.
I think this problem is well suited to a proper parser generator. Regular expressions are too difficult to test and debug when they go wrong. However, I would pick a parser generator that is as simple to use as if it were part of the language.
For this type of task I would go with pyparsing: it builds a full recursive-descent parser without requiring a difficult grammar definition, and it has very good helper functions. The code is easy to read too.
from pyparsing import Group, OneOrMore, SkipTo, Word, alphas, nums

aaa = """ This is example text that could be many lines long...
another line
Location 1
Product 1 Product 2 Product 3
$20.99 $21.99 $33.79
stuff in here you want to ignore
Location 2
Product 1 Product 2 Product 3
$24.99 $22.88 $35.59 """

# "Location" could be replaced by any other kind of match, such as a Regex.
result = (
    SkipTo("Location").suppress()           # ignore everything before a section
    + OneOrMore(Word(alphas) + Word(nums))  # "Location 1", "Product 1", ...
    + OneOrMore(Word(nums + "$."))          # the prices
)
all_results = OneOrMore(Group(result))
parsed = all_results.parseString(aaa)
for block in parsed:
    print(block)
This returns a list of lists.
['Location', '1', 'Product', '1', 'Product', '2', 'Product', '3', '$20.99', '$21.99', '$33.79']
['Location', '2', 'Product', '1', 'Product', '2', 'Product', '3', '$24.99', '$22.88', '$35.59']
You can group things as you want but for simplicity I have just returned lists. Whitespace is ignored by default which makes things a lot simpler.
I do not know if there are equivalents in other languages.
You have given two pattern samples for text files.
I think these can be handled with scripting.
Something like: AWK, sed, grep with bash scripting.
One pattern in the first sample,
Section starts with keyword Location [Number]
second line of section has columns describing product names
third line of section has columns with prices for the products
There can be variable number of products per section.
There can be variable number of sections per file.
Products and prices are always on their designated lines of a section.
Whitespace separation identifies the (product,price) column-association.
Number of products in a section matches the number of prices in that section.
The collected data would probably be assimilated in a database.
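The rules listed above translate fairly directly into code. The sketch below uses Python instead of AWK/sed, purely for illustration, and rests on one extra assumption that is not in the question: columns on the product and price lines are separated by two or more spaces, so that names such as "Product 1" stay intact. The function name `parse_sections` is hypothetical.

```python
import re

COLUMN_GAP = re.compile(r"\s{2,}")  # assumed column separator: 2+ whitespace characters


def parse_sections(text):
    """Return (location, product, price) tuples for every well-formed section."""
    lines = [line.strip() for line in text.splitlines()]
    records = []
    for index, line in enumerate(lines):
        # A section starts with "Location <number>" and needs two more lines.
        if not re.fullmatch(r"Location \d+", line) or index + 2 >= len(lines):
            continue
        products = COLUMN_GAP.split(lines[index + 1])
        prices = COLUMN_GAP.split(lines[index + 2])
        if len(products) != len(prices):
            continue  # rule violated: leave the section for manual review
        for product, price in zip(products, prices):
            records.append((line, product, float(price.lstrip("$"))))
    return records


sample = (
    "This is example text that could be many lines long...\n"
    "Location 1\n"
    "Product 1  Product 2  Product 3\n"
    "$20.99  $21.99  $33.79\n"
    "stuff to ignore\n"
    "Location 2\n"
    "Product 1  Product 2\n"
    "$24.99  $22.88\n"
)
for record in parse_sections(sample):
    print(record)
```

Sections whose product and price counts disagree are skipped rather than guessed at, which makes format drift visible instead of silently producing wrong rows.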
The one thing I know I would use here is regular expressions. Three or four expressions could drive the parse logic for each e-mail format.
Trying to write the parse engine more generally than that would, I think, be skirting the edge of overprogramming it.