Power Query in Excel and pattern recognition - powerquery

Since Power Query does not support RegEx, how can I extract an 8-digit number from a string?
The number can appear anywhere in the string.
I have tried to use M but didn't find any complete solution. I also tried the option "Column From Examples" but the complexity of the cases does not let me have a reliable result either.
I googled it, but it seems there are no examples suitable for Power Query/M. Expected results:
abc123456 --> null
abc12345678 --> 12345678
abc 12345678 --> 12345678
abc 12345678aaaa --> 12345678
abc 1111 ddd12345678aaaa 3467 --> 12345678
abc 1111 ddd123456aaaa 3467 --> null
etc.
EDIT
I have added the following code as suggested by Alexis Olson
= Table.AddColumn(
    Sheet1_Sheet,
    "PatternMatched",
    (r) =>
        List.First(
            List.Select(
                List.Transform(
                    {0..Text.Length(r[String]) - 8},
                    each Text.Range(r[String], _, 8)
                ),
                each _ = Text.Select(_, {"0".."9"})
            )
        )
)
The code differs only in using Sheet1_Sheet instead of #"Previous Step", because that is the step holding the data, but I get the following error message: "Expression.Error: The field 'String' of the record wasn't found. Details: Column1=abc123456". What am I doing wrong?
EDIT2
I have found the error in my formula. Here is the correct one:
Table.AddColumn(
    #"Renamed Columns",
    "PatternMatched",
    (r) =>
        List.First(
            List.Select(
                List.Transform(
                    {0..Text.Length(r[Column]) - 8},
                    each Text.Range(r[Column], _, 8)
                ),
                each _ = Text.Select(_, {"0".."9"})
            )
        )
)
I simply had to replace the following in Alexis Olson's answer:
#"Previous Step" with #"Renamed Columns", because that is the previous step name in the "Applied Steps" section of the "Query Settings" in PQ
r[String] with r[Column], because "Column" is the name of the column containing the data in which I want to find the pattern

It's not simple, but you could add a custom column that does this:
Table.AddColumn(
    #"Previous Step",
    "PatternMatched",
    (r) =>
        List.First(
            List.Select(
                List.Transform(
                    {0..Text.Length(r[String]) - 8},
                    each Text.Range(r[String], _, 8)
                ),
                each _ = Text.Select(_, {"0".."9"})
            )
        )
)
Let's take a look at what's happening here with the example r[String] = abc12345678.
Since Text.Length(r[String]) = 11, we can look at substring ranges of length 8 starting at index 0 through index 11 - 8 = 3.
So List.Transform({0,1,2,3}, each Text.Range("abc12345678", _, 8)) transforms the list {0,1,2,3} into a list of 8-long substring ranges: {"abc12345", "bc123456", "c1234567", "12345678"}.
Now for each of these substrings, check if it consists of only digits by comparing each string to a version of itself containing only digits using Text.Select(_, {"0".."9"}) to strip all but the digit characters. Then List.Select each substring where this condition is true.
The result is a list of substrings that are a length of 8 and contain only digits (empty if none exist). Use List.First to return the first string from this list.
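For readers who find the M expression hard to follow, the same sliding-window idea can be sketched in Python. This is only an illustration of the logic described above, not Power Query code; the function name first_digit_run and its width parameter are made up for the example.

```python
def first_digit_run(text, width=8):
    """Return the first window of `width` characters that is all digits, or None."""
    # Start indexes 0 .. len(text) - width, like {0..Text.Length(s) - 8} in M.
    for start in range(len(text) - width + 1):
        window = text[start:start + width]
        # Strip everything except the digits 0-9, like Text.Select(_, {"0".."9"}).
        digits_only = "".join(ch for ch in window if ch in "0123456789")
        # The window is all digits exactly when stripping changed nothing.
        if window == digits_only:
            return window
    # No window qualified (or the text is shorter than `width`).
    return None


if __name__ == "__main__":
    for sample in ["abc123456", "abc12345678", "abc 1111 ddd12345678aaaa 3467"]:
        print(sample, "-->", first_digit_run(sample))
```

Like the M version, this returns the first 8 consecutive digits it finds, so a longer run of digits would also produce a match.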

Related

Convert Excel formula to DAX

Can anyone please help me convert this formula to DAX?
=IF(COUNTIFS(D:D,D2,L:L,"Y")>1,"Only 1 Y contact","OK")
It checks whether there is more than one Y in a column (L:L) per patient ID (D:D).
I don't know of any DAX function that counts the number of occurrences of a substring in a string, but one can use SUBSTITUTE() to remove each occurrence of the substring from the string, then use LEN() to compute the difference in length from the original string:
IF(
    LEN('Table'[Column]) -
        LEN(SUBSTITUTE('Table'[Column], "Y", "")) > 1,
    "Only 1 Y contact",
    "OK"
)
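The length-difference trick used in that expression can be checked outside DAX. Below is a minimal Python sketch of the same idea; the function names are hypothetical, and the division by the needle length is an added generalisation for substrings longer than one character (the DAX above does not need it for "Y").

```python
def count_occurrences(text, needle):
    """Count non-overlapping occurrences of `needle` via the length difference."""
    # Remove every occurrence, like SUBSTITUTE(text, needle, "").
    stripped = text.replace(needle, "")
    # Each removed occurrence shortens the text by len(needle) characters.
    return (len(text) - len(stripped)) // len(needle)


def contact_flag(text):
    """Mirror the IF(...) above: flag strings with more than one "Y"."""
    return "Only 1 Y contact" if count_occurrences(text, "Y") > 1 else "OK"


if __name__ == "__main__":
    print(contact_flag("YNY"))
    print(contact_flag("NNY"))
```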

convert word to ascii and return to word adding some value

I am working with PL/SQL. The developer set the Oracle password in the following way:
input word => converted to ASCII => 2 added to each letter => converted back to a word
Example: the input password is "admin".
admin is split into characters (a, d, m, i, n).
Each character is converted to its ASCII code, 2 is added, and the result is converted back to a character:
a=97 97+2 = 99 = c
d=100 100+2=102 = f
m=109 109+2=111 = o
i=105 105+2=107 = k
n=110 110+2=112 = p
What I did is:
$pass=str_split('admin');
foreach($pass as $password){
$new_password[]=chr(ord($password)+2);
}
$final= $new_password[0].$new_password[1].$new_password[2].$new_password[3].$new_password[4]; //the values 0-4 is set manually
echo $final;
result: cfokp
But I could not work out how to run the result string in a command and match the Oracle password against the retrieved one.
Another way in SQL is to split the characters, add 2 to the ascii value, and aggregate the string.
Of course, it won't be faster than the TRANSLATE approach. But, for a single or small set of values it shouldn't matter much.
For example,
WITH data AS
  (SELECT 'admin' str FROM dual
  )
SELECT str, LISTAGG(CHR(ASCII(REGEXP_SUBSTR(str, '\w', 1, LEVEL)) + 2), '') WITHIN GROUP(
  ORDER BY LEVEL) str_new
FROM data
CONNECT BY LEVEL <= LENGTH(str)
-- str is selected next to the aggregate, so it has to be grouped
GROUP BY str;

STR    STR_NEW
------ -------
admin  cfokp
The above SQL does the following important tasks:
Split the string into characters using REGEXP_SUBSTR and ROW GENERATOR technique
Add value 2 to the ascii value of each character.
Convert back the modified ascii into characters.
Aggregate the string using LISTAGG
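The split, shift and aggregate steps listed above can be sketched in a few lines of Python to check the expected result. This is only an illustration of the logic, not Oracle code; shift_chars is a made-up name.

```python
def shift_chars(word, offset=2):
    """Add `offset` to the character code of every character and rejoin."""
    # ord() plays the role of ASCII(), chr() of CHR(), "".join() of LISTAGG.
    return "".join(chr(ord(ch) + offset) for ch in word)


if __name__ == "__main__":
    print(shift_chars("admin"))
```

Note that, like the SQL, this does no wrap-around: letters near the end of the alphabet are shifted into non-letter characters.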
This is probably easier to do with translate:
select translate('admin',
'abcdefghijklmnopqrstuvwxyz',
'cdefghijklmnopqrstuvwxyzab'
)
from dual;
I'm not sure what you want to do with "y" and "z". This maps them back to "a" and "b". You can extend this to upper case letters and other characters if you need.
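The translation-table approach, including the wrap-around of "y" and "z", has a close Python analogue. The sketch below is an assumption-laden illustration (lowercase ASCII letters only; SHIFT_TABLE and shift_with_wrap are hypothetical names), not a replacement for the SQL.

```python
import string

# Map a..z onto c..z followed by a, b, like the two TRANSLATE strings above.
LOWER = string.ascii_lowercase
SHIFT_TABLE = str.maketrans(LOWER, LOWER[2:] + LOWER[:2])


def shift_with_wrap(word):
    """Shift lowercase letters by 2 with wrap-around; leave other characters alone."""
    return word.translate(SHIFT_TABLE)


if __name__ == "__main__":
    print(shift_with_wrap("admin"))
    print(shift_with_wrap("xyz"))
```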

ReportViewer Expressions , character check

I'd like to know if there is a way to check whether the !field.Value contains a comma (,).
I want to make these conversions:
10,5 -> 10,50
900 -> 900,00
To do that, I need to know whether there is a comma in the field value and how many characters come after the comma. Is it possible?
Look at InStr(), Len(), and IIF(), I think they will get you what you want.
I don't have a way to test this where I am, but basically I think this expression will get you there:
=IIF(InStr(Fields!MyField.Value, ",") > 0,
Fields!MyField.Value & Left("00", Math.Max(0, 2 - (Len(Fields!MyField.Value) - InStr(Fields!MyField.Value, ",")))),
Fields!MyField.Value & ",00")
Here's the basic idea of the expression:
If there is a comma in the field,
then add x number of 0s onto the end of the field,
where x is 2 - (the length of the field - the position of the ',' in the string), and never less than 0
else just return the field + ",00"
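Since the expression could not be tested by its author, here is a small Python sketch of the padding idea that can be run to check the two sample conversions. It assumes the field is already a string; pad_decimals is a hypothetical name, and the zero count is taken as two minus the number of characters after the comma, never below zero.

```python
def pad_decimals(value, places=2):
    """Pad a comma-decimal string with zeros up to `places` decimals."""
    # 1-based position of the comma, 0 when absent (same convention as InStr).
    pos = value.find(",") + 1
    if pos > 0:
        # Characters after the comma = len(value) - pos.
        missing = max(0, places - (len(value) - pos))
        return value + "0" * missing
    # No comma at all: append the separator and all the zeros.
    return value + "," + "0" * places


if __name__ == "__main__":
    print(pad_decimals("10,5"))
    print(pad_decimals("900"))
```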

xquery- how to select value from a specific element even when that element has null values/multiple return-separated values

Please consider the following XML--
<table class="rel_patent"><tbody>
<tr><td>Name</td><td>Description</td></tr>
<tr><td>A</td><td>Type-A</td></tr>
<tr><td>B</td><td>Type-B</td></tr>
<tr><td>C</td><td>Type-C</td></tr>
<tr><td>AC</td><td>Type-C
Type-A</td></tr>
<tr><td>D</td><td></td></tr>
</tbody></table>
Now I want to select and display all values of "Name" with the corresponding values of "Description", even when the Description element is empty (as for the element with Name = D). Also, when the Description element has values separated by a line break, I want those values in separate rows (Type-C and Type-A for the element with Name = AC).
This is the type of query I have written--
let $rows_data:= $doc//table[@class="rel_patent"]/tbody/tr[1]/following-sibling::tr
for $data_single_row in $rows_data
return
let $cited_name:= $data_single_row/td[1]
let $original_types_w_return:= $data_single_row/td[4]
let $original_types_list:= tokenize($original_types_w_return, '(\r?\n|\r)$')
for $cited_type_each at $pos2 in $original_types_list
return concat( $cited_name, '^', $original_type_each, '^', $pos2)
However, I am getting the following type of response--
A^Type-A^1
B^Type-B^1
C^Type-C^1
AC^Type-C
Type-A^1
Now I need to correct the following in the above code and response:
(1) The data for "AC" should be 2 separate rows, with "Type-C" and "Type-A" each in its own row, and the last field of the rows should be 1 and 2 respectively (because there are 2 values).
(2) The data for "D" is not shown at all.
How do I correct the above code to meet these 2 requirements?
This works:
for $data_single_row in $rows_data
return
let $cited_name:= $data_single_row/td[1]
let $original_types_w_return:= $data_single_row/td[2]
let $original_types_list:= tokenize(concat($original_types_w_return, " "), '(\r?\n|\r)')
for $cited_type_each at $pos2 in $original_types_list
return concat( $cited_name, '^', normalize-space($cited_type_each), '^', $pos2)
(The first change was to replace $original_type_each with $cited_type_each and [4] with [2] which may ).
The first problem can be solved by removing the $ at the end of the tokenize pattern, since in the default mode $ only matches the end of the string.
The second one is solved by appending a space to $original_types_w_return, so it is not empty and tokenize returns something, and then removing it again with normalize-space (in XQuery 3.0 it could probably be solved by using 'allowing empty' in the for expression).
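Both fixes can be illustrated with a rough Python model. This is not XQuery: xq_tokenize is a hypothetical helper that only mimics the fact that fn:tokenize returns an empty sequence for a zero-length string, and the whitespace clean-up is a loose stand-in for normalize-space.

```python
import re


def xq_tokenize(text, pattern):
    """Rough model of fn:tokenize: a zero-length input yields no tokens at all."""
    if text == "":
        return []
    return re.split(pattern, text)


def rows(name, description):
    """Build one 'name^type^position' row per line of the description."""
    # Appending a space guarantees at least one token, even for an empty cell.
    tokens = xq_tokenize(description + " ", r"\r?\n|\r")
    # " ".join(t.split()) trims and collapses whitespace, removing the extra space.
    return [f"{name}^{' '.join(t.split())}^{pos}"
            for pos, t in enumerate(tokens, start=1)]


if __name__ == "__main__":
    print(rows("AC", "Type-C\nType-A"))
    print(rows("D", ""))
    # With the pattern anchored by $, the inner line break is never split on:
    print(xq_tokenize("Type-C\nType-A", r"(?:\r?\n|\r)$"))
```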

Regular expression to match some conditions given a formatted file name?

(Sorry for the bad title, any suggestion appreciated) ;-)
Well, consider those strings:
first = "SC/SCO_160ZA206_T_mlaz_kdiz_nziizjeij.ext"
second = "MLA/SA2_jkj15PO_B_lkazkl lakzlk-akzl.oxt"
third = "A12A/AZD_KZALKZL_F_LKAZ_AZ__azaz___.ixt"
I'm looking for a regular expression allowing me to get arrays like this (in ruby):
first_array = ['SCO', '160ZA206', 'T', 'mlaz_kdiz_nziizjeij']
second_array = ['SA2', 'jkj15PO', 'B', 'lkazkl lakzlk-akzl']
third_array = ['AZD', 'KZALKZL', 'F', 'LKAZ_AZ__azaz___']
The first match must be anything right after the / and before the first _
The second match must be anything between the first and the second _
The third match must be anything between the second and the third _
The last match must be anything between the third _ and the last .
I can't get it: [^\/].?([A-Z]*)_(.*)_(.*)[\.$] :-(
You're super close. Just add a question mark to the second matcher to make it lazy (otherwise, it won't stop at the first underscore), and then duplicate that matcher.
[^\/].?([A-Z]*)_(.*?)_(.*?)_(.*)[\.$]
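One caveat: the first group [A-Z]* only accepts uppercase letters, while the second sample has a digit there (SA2). A possible variant, sketched here in Python rather than Ruby and using negated character classes, is shown below; NAME_PATTERN and split_name are hypothetical names, and the pattern is an assumption about the intended file-name format, not a tested drop-in for the regex above.

```python
import re

# After the slash: three underscore-free fields, then everything up to the last dot.
NAME_PATTERN = re.compile(r"/([^_/]+)_([^_]+)_([^_]+)_(.*)\.[^.]*$")


def split_name(path):
    """Return the four captured parts as a list, or None when nothing matches."""
    match = NAME_PATTERN.search(path)
    return list(match.groups()) if match else None


if __name__ == "__main__":
    for sample in [
        "SC/SCO_160ZA206_T_mlaz_kdiz_nziizjeij.ext",
        "MLA/SA2_jkj15PO_B_lkazkl lakzlk-akzl.oxt",
        "A12A/AZD_KZALKZL_F_LKAZ_AZ__azaz___.ixt",
    ]:
        print(split_name(sample))
```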
Following up on @fge's split suggestion:
str = "SC/SCO_160ZA206_T_mlaz_kdiz_nziizjeij.ext"
p str[(str.index('/')+1)...str.rindex('.')].split( '_', 4)
#=> ["SCO", "160ZA206", "T", "mlaz_kdiz_nziizjeij"]
It splits on _ for max 4 elements (the fourth element is the remainder).
