How can I force an alphanumeric string to lowercase (or uppercase) in Power Query?
I have a series of attribute codes coming into Power Query, but the codes contain a mix of upper-case and lower-case text. In practice these items would be considered duplicates, but Power Query is case sensitive. I've tried using Text.Lower / Text.Upper, but these require the data to be type text. My data is alphanumeric (123abc, 111, aaa), and text functions do not work on data of type any.
Suggestions?
description below:
Activity  Activity ID
Apple     1CA11
Apple     1ca11
Orange    2dp23
Orange    2DP23

This should become:

Apple     1ca11
Orange    2dp23
You could ignore case of just the Activity ID field in Table.Distinct operations
= Table.Distinct(Source,{{"Activity", Comparer.Ordinal}, {"Activity ID", Comparer.OrdinalIgnoreCase}} )
or ignore case in all columns in the Table.Distinct
= Table.Distinct(Source, Comparer.OrdinalIgnoreCase)
Thanks Bryan Rock and Horsey Ride. I made a total goof. The issue was that there was no change-type step before the case conversion (it looks like the original step got deleted).
Thanks for the help!
Changing the column type to text before forcing lowercase/uppercase solves the problem.
I have a sheet that uses the QUERY function to fetch the matching search results row by row, but I need to apply ARRAYFORMULA to automate this search for the whole column. How should I do this?
Expected Result
Check phrase | Result 1 | Result 2 | Result 3 | Result 4
Apple | Apple | Ice Apple | Custard apple/Sugar apple/Sweetsop | Rose apple/Water apple
berry | Cape gooseberry/Inca berry/Physalis
man | Mango | Mangosteen
mom
fruit | Dragon fruit | Egg fruit | Passion fruit | Black sapote/Chocolate pudding fruit
j | Jackfruit | Jujube | Jenipapo
nake | Snake fruit/Salak
me | Horned Melon | Honeydew melon | Medlar fruit | Mouse melon
Currently
Check phrase | Result 1 | Result 2 | Result 3 | Result 4
Apple | Apple | Ice Apple
berry | Apple | Ice Apple
man | Apple | Ice Apple
mom | Apple | Ice Apple
fruit | Apple | Ice Apple
j | Apple | Ice Apple
nake | Apple | Ice Apple
me | Apple | Ice Apple
What I currently achieve is for single row using this:
=IF(LEN(F2:F)=0, IFERROR(1/0), IF(LEN(F2:F)>0, Query(TRANSPOSE(QUERY(Fruits!B:B, "select B where B contains '" & F2:F & "'")),"select * limit 12")))
How should I do this? Please advise. My file is linked here.
[My Google Sheet file]
(https://docs.google.com/spreadsheets/d/1QDfruKtwJjmRQWqTlO3sBM-e9vp9QKwmla23ss0U1sY/edit#gid=1411907513)
use:
=ARRAY_CONSTRAIN(LAMBDA(a, b, BYROW(a, LAMBDA(x,
TRANSPOSE(IFNA(FILTER(b, SEARCH(IF(x="", "×", x), b)))))))
(F2:INDEX(F:F, MAX(ROW(F:F)*(F:F<>""))), Fruits!B2:B), 9^9, 12)
=LAMBDA(PHRASES,FRUITS,
BYROW(PHRASES,LAMBDA(FRUIT,
TRANSPOSE(FILTER(FRUITS,REGEXMATCH(FRUITS,FRUIT)))
))
)(QUERY({Current!F2:F},"WHERE Col1 IS NOT NULL"),QUERY({Fruits!B:B},"WHERE Col1 IS NOT NULL"))
Put this formula into G2.
What we are doing here is:
1. use QUERY to get rid of blanks in range Current!F2:F, and name the array PHRASES with LAMBDA,
2. use QUERY to get rid of blanks in range Fruits!B:B, and name the array FRUITS with LAMBDA,
3. use BYROW to work on the single-column array PHRASES value by value, with...
4. LAMBDA inside BYROW to name the value of each row FRUIT,
5. use FILTER to filter the array FRUITS,
6. use REGEXMATCH to set the condition of the FILTER function in step 5, which returns TRUE for string matches,
7. TRANSPOSE the result of each filter to meet your display format.
The FILTER can also be replaced by another QUERY function if you want; the outputs should be identical in this case.
=LAMBDA(PHRASES,FRUITS,
BYROW(PHRASES,LAMBDA(FRUIT,
TRANSPOSE(QUERY(FRUITS,"WHERE Col1 CONTAINS '"&FRUIT&"'"))
))
)(QUERY({Current!F2:F},"WHERE Col1 IS NOT NULL"),QUERY({Fruits!B:B},"WHERE Col1 IS NOT NULL"))
According to your request in the comments, this is the updated code:
to make it case insensitive, apply UPPER() to Col1 and FRUIT inside the transposed query,
to show blank instead of #N/A when there is no output on that row, apply IFNA() to the whole QUERY() inside the TRANSPOSE(),
to limit the length of the output array, wrap the TRANSPOSE() with ARRAY_CONSTRAIN().
=LAMBDA(NOTNULL,LAMBDA(PHRASES,FRUITS,
BYROW(PHRASES,LAMBDA(PHRASE,
ARRAY_CONSTRAIN(
TRANSPOSE(IFNA(
QUERY(FRUITS,"WHERE UPPER(Col1) CONTAINS '"&UPPER(PHRASE)&"'"),
"")),
1,12)
))
)(QUERY({Current!F2:F},NOTNULL),QUERY({Fruits!B:B},NOTNULL)))("WHERE Col1 IS NOT NULL")
The code will leave an empty row if no match is found, which is what you asked for in your comment: "Show blank if no valid return (instead of #N/A)".
What do you mean by "when there is no phrase match, that row is skipped"?
It isn't skipped in my test environment.
But if you mean that leaving part of the 'Check phrase' column empty breaks the calculation, it does, because you never mentioned that the 'Check phrase' column may contain blanks, so I simply didn't handle that case.
If that is the case, you should always include such conditions in the sample data you provide at the very beginning. Otherwise this is a separate issue, and it may be better to open another question about it after trying to work it out on your own.
Anyway, this is a quick solution if you need to handle blanks in Check phrase column:
=LAMBDA(NOTNULL,LAMBDA(PHRASES,FRUITS,
BYROW(PHRASES,LAMBDA(PHRASE,
ARRAY_CONSTRAIN(TRANSPOSE(IFNA(IF(PHRASE="","",QUERY(FRUITS,"WHERE UPPER(Col1) CONTAINS '"&UPPER(PHRASE)&"'")),"")),1,12)
))
)({Current!F2:F},QUERY({Fruits!B:B},NOTNULL)))("WHERE Col1 IS NOT NULL")
The output shifts upward when there are blanks in the 'Check phrase' column because, as I said, I use QUERY to get rid of extra blanks in the two source ranges. This helps speed things up a bit, but blanks between array values are also removed, which shortens the reference array.
The easiest way to handle this is to leave the blanks in place instead of removing them and, inside IFNA(), use an IF() to skip any empty PHRASE by doing nothing, which results in a blank row.
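The alignment reasoning above can be illustrated outside Sheets. The following is only a rough Python sketch of the same logic, not a translation of the formula: `search_rows` and the sample lists are made-up names and data for illustration. Blank phrases are kept as empty rows, so every result row stays level with its phrase.

```python
MAX_RESULTS = 12  # mirrors the 12-column limit given to ARRAY_CONSTRAIN


def search_rows(phrases, fruits, limit=MAX_RESULTS):
    """Return one result row per phrase; blank phrases yield an empty row."""
    rows = []
    for phrase in phrases:
        if phrase == "":
            # Keep the blank in place so later rows do not shift upward.
            rows.append([])
            continue
        needle = phrase.upper()  # case-insensitive, like UPPER() in the query
        matches = [fruit for fruit in fruits if needle in fruit.upper()]
        rows.append(matches[:limit])
    return rows


# Illustrative data only (a small subset of the fruit names in the question).
fruits = ["Apple", "Ice Apple", "Mango", "Mangosteen", "Jackfruit"]
phrases = ["Apple", "", "man", "mom"]
result = search_rows(phrases, fruits)
```

Because no phrase is dropped, `result` always has as many rows as `phrases`, which is the property the blank-handling formula restores.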
I'm trying to collect a dataset that could be used for automatically generating baseball articles.
I have play-by-play records of MLB games from retrosheet.org that I would like to be written out to plain text, as those that could possibly appear as part of a recap news article.
Here are some examples of the play-by-play records:
play,2,0,semim001,32,.CBFFFBBX,9/F
play,2,0,phegj001,01,FX,S7/G
play,2,0,martn003,01,CX,3/G
play,2,1,youne003,00,,NP
The following is what I would like to achieve:
For the first example
play,2,0,semim001,32,.CBFFFBBX,9/F,
I want it to be written out as something like:
"semim001 (Marcus Semien) was on three balls and two strikes in the second inning as the away player. He hit the ball into play after one called strike, one ball, three fouls, and another two balls. The fly ball was caught by the right outfielder."
The plays are formatted in the following way:
The first field is the inning, an integer starting at 1.
The second field is either 0 (for visiting team) or 1 (for home team).
The third field is the Retrosheet player id of the player at the plate.
The fourth field is the count on the batter when this particular event (play) occurred. Most Retrosheet games do not have this information, and in such cases, "??" appears in this field.
The fifth field is of variable length and contains all pitches to this batter in this plate appearance and is described below. If pitches are unknown, this field is left empty, nothing is between the commas.
The sixth field describes the play or event that occurred.
Explanations for all the symbols in the fifth and sixth field can be found on this Retrosheet page.
With Python 3, I've been able to format all the info of invariable length into a formatted sentence, which is all but the last two fields. I'm having difficulty in thinking of an efficient way to unparse (correct me if this is the wrong term to use here) the fifth and sixth fields, the pitches and the events that occurred, due to their variable length and wide variety of things that can occur.
I think I could write out all the rules based on the info on the Retrosheet website, but I'm looking for suggestions for a smarter way to do this. I wrote natural language processing as tags, hoping this could be a trivial problem in that field. Any pointers will be greatly appreciated!
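One way to keep the rule-based approach manageable is a lookup table plus run-length grouping, so that each new code is one more table entry rather than another branch. The sketch below handles only the pitch field and only a few codes (C, B, F, S and X); the full code list on the Retrosheet page is longer, and the names `PITCH_NAMES` and `describe_pitches` are hypothetical.

```python
from itertools import groupby

# Partial table (an assumption for illustration): code -> (singular, plural).
PITCH_NAMES = {
    "C": ("called strike", "called strikes"),
    "B": ("ball", "balls"),
    "F": ("foul", "fouls"),
    "S": ("swinging strike", "swinging strikes"),
}
NUMBER_WORDS = {1: "one", 2: "two", 3: "three", 4: "four", 5: "five"}


def describe_pitches(pitches):
    """Turn a pitch string such as '.CBFFFBBX' into a short English phrase."""
    parts = []
    in_play = False
    for code, run in groupby(pitches):
        count = len(list(run))
        if code == "X":  # ball put into play by the batter
            in_play = True
            continue
        if code not in PITCH_NAMES:
            continue  # skip markers such as '.' and codes missing from the table
        singular, plural = PITCH_NAMES[code]
        word = NUMBER_WORDS.get(count, str(count))
        parts.append(f"{word} {singular if count == 1 else plural}")
    summary = ", ".join(parts) if parts else "no recorded pitches"
    return f"{summary}, then put the ball in play" if in_play else summary


description = describe_pitches(".CBFFFBBX")
```

The event field could be handled the same way, with one table for fielder numbers and another for modifiers such as /F and /G, but that part is not sketched here.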
I am writing an external app in Python that uses the message system in Odoo.
So I need to use the mail_message and mail_notification tables.
I tried inserting rows individually via INSERT, filling in the fields needed to make this work, and it works perfectly: the messages appear in the Odoo message "inbox" and the notification appears correctly.
But checking the rest of the fields in this table, I see that message_id has a tag format (between <>): a series of numbers (in which I haven't found any pattern) followed by "-openerp-'res_id'-'model'-#'company'".
I don't know how to fill this field. My tests show that it is not a required field, but in a serious implementation I don't know whether leaving it empty can cause issues.
Can anyone explain the purpose of this field and how to fill it?
Thanks
You can check the code in tools/mail.py and do something similar
import random
import socket
import time


def generate_tracking_message_id(res_id):
    """Returns a string that can be used in the Message-ID RFC822 header field

    Used to track the replies related to a given object thanks to the "In-Reply-To"
    or "References" fields that Mail User Agents will set.
    """
    try:
        # Prefer the OS-level random source when it is available.
        rnd = random.SystemRandom().random()
    except NotImplementedError:
        rnd = random.random()
    # Keep only the 15 digits after "0." of the random float.
    rndstr = ("%.15f" % rnd)[2:]
    return "<%.15f.%s-openerp-%s#%s>" % (time.time(), rndstr, res_id, socket.gethostname())
My SQL query returns over 10,000 rows, each containing client id, name, address and so forth. In my PowerBuilder 10.5 window I have a DataWindow that retrieves this data using id as the retrieval argument. I have a SingleLineEdit (sle_id) in which the user can type an id and search by it. What I've figured out is that all of my clients' ids are 8 characters long and start with either "46XXXXXXXX" or "7052XXXX". So, to optimize my retrieve time, I want to write code in the clicked event of the "Start" button in my PowerBuilder window that first checks whether the id starts with one of those two options: "46..." or "7052...". I assume I'd need to use the length of the characters? For example, this is what I'd want...
IF sle_id.text STARTS with 46 or 7052 THEN retrieve
ELSE MessageBox ("INFO", "Your id must begin with either 46 or 7052")
END IF;
Of course, I need something better than "Starts with". Much obliged for all the help!
There are some string functions in PowerBuilder. I think you need this:
IF Left(sle_id.text, 2) = "46" OR Left(sle_id.text, 4) = "7052" THEN
Best Regards
Gábor
I think you're trying to solve the wrong problem. Your database should have an index on client id. If the client id is unique use a unique index.
iPhone has a pretty good telephone number splitting function, for example:
Singapore mobile: +65 9852 4135
Singapore resident line: +65 6325 6524
China mobile: +86 135-6952-3685
China resident line: +86 10-65236528
HongKong: +886 956-238-82
USA: +1 (732) 865-3286
Notice the nice features here:
- the splitting of country code, area code, and the rest is automatic;
- the delimiter is also nicely adapted to different countries, e.g. "()", "-" and space.
Note that the parsing logic itself is doable for me; however, I don't know where to find the telephone number formats of most countries.
Where could I find such knowledge, or open source code that implements it?
You can get similar functionality with the libphonenumber code library.
Interestingly enough, you cannot use an NSNumberFormatter for this, but you can write your own custom class for it. Just create a new class, set properties such as countryCode, areaCode and number, and then create a method that formats the number based on the countryCode.
Here's a great example: http://the-lost-beauty.blogspot.com/2010/01/locale-sensitive-phone-number.html
As an aside: a friend told me about a gigantic regular expression he had to maintain that could pick telephone numbers out of intercepted communications from hundreds of countries around the world. It was very non-trivial.
Thankfully your problem is easier, as you can just have a table with the per-country formats:
format[usa] = "+d (ddd) ddd-dddd";
format[hk] = "+ddd ddd-ddd-dd";
format[china_mobile] = "+dd ddd-dddd-dddd";
...
Then when you're printing, you simply output one digit from the phone number string in each d spot as needed. This assumes you know the country, which is a safe enough assumption for telephone devices -- pick "default" formats for the few surrounding countries.
Since some countries have different formats with different lengths you might need to store your table with additional information:
format[germany][10] = "..."
format[germany][11] = "....."
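The table-driven idea above could be sketched in Python roughly as follows. This is a minimal illustration, not a complete formatter: the table keys, the `FORMATS` layout (country, then digit count) and the `format_phone` helper are assumptions, and only the three patterns listed above are included.

```python
# Hypothetical table: country key -> {number of digits: pattern}.
# Each "d" in a pattern is replaced by the next digit of the number.
FORMATS = {
    "usa": {11: "+d (ddd) ddd-dddd"},
    "hk": {11: "+ddd ddd-ddd-dd"},
    "china_mobile": {13: "+dd ddd-dddd-dddd"},
}


def format_phone(country, digits):
    """Format a string of digits using the pattern for its country and length."""
    by_length = FORMATS.get(country)
    if by_length is None:
        raise ValueError(f"no formats known for country {country!r}")
    pattern = by_length.get(len(digits))
    if pattern is None or pattern.count("d") != len(digits):
        raise ValueError(f"no {len(digits)}-digit format for {country!r}")
    remaining = iter(digits)
    # Copy delimiters as-is and consume one digit per "d" placeholder.
    return "".join(next(remaining) if ch == "d" else ch for ch in pattern)


formatted = format_phone("usa", "17328653286")
```

Keying the inner dictionary by digit count is what allows one country to carry several patterns of different lengths.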