Data Masking in TEXT Column - user-defined-functions

I have PII in a TEXT field that needs to be masked/scrubbed in my Snowflake DB. I was able to achieve this using JavaScript, but I need to implement the same thing as a SQL UDF.
Example:
I'm John, this is my SSN 111-11-1111
Output :
I'm XXXX, this is my XXX XXX-XX-XXXX

If you want to replace any letter or digit with a *, I think you can use something like this:
case
when current_role() in ('ADMIN') then val
else regexp_replace(val, '[A-Za-z0-9]', '*')
end;
More info in this article:
https://docs.snowflake.com/en/user-guide/security-column-ddm-use.html
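To see what this kind of character-class replacement does to the sample sentence, here is a minimal Python sketch of the same idea. The function name mask_alnum is hypothetical and this is not Snowflake code; it only illustrates the regex behaviour. Note that it masks every letter and digit, not just the name and SSN, so it is broader than the output requested in the question.

```python
import re


def mask_alnum(text: str, mask_char: str = "*") -> str:
    """Replace every ASCII letter or digit with mask_char.

    Punctuation and whitespace are left untouched, so the shape of the
    original text (word lengths, dashes in the SSN) stays visible.
    """
    return re.sub(r"[A-Za-z0-9]", mask_char, text)


masked = mask_alnum("I'm John, this is my SSN 111-11-1111")
print(masked)
```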


Regular Expression to match both first characters and last character in oracle

I have a table with a column with the structure:
Table name : re_result
res_id
--------------
PSI8765450
PSIRRRRTY781
ABCD000001
I want to fetch the values starting with PSI and ending with 1. My expected output is PSIRRRRTY781.
I am using query
Select * from re_result
Where regexp_like(^PSI*1)
But I am not getting the output. I am getting both PSIRRRRTY781 and ABCD000001.
Please help.
You do not need regular expressions; a simpler LIKE may do the work:
select res_id
from re_result
where res_id like 'PSI%1'
The same thing can be done with regexp:
where regexp_like(res_id, '^PSI(.*)1$')
This matches 'PSI' in the beginning of the string and '1' as last character, just before the end of string ($).
Here you find something more on regexp in Oracle
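As a quick sanity check of the anchored pattern, here is a small Python sketch that runs an equivalent regex over the three sample values from the question. Python's re module stands in for Oracle's regexp_like here, which is an assumption made purely for illustration; the two engines agree on this simple pattern.

```python
import re

# Same idea as '^PSI(.*)1$': starts with PSI, ends with 1.
PATTERN = re.compile(r"^PSI.*1$")

res_ids = ["PSI8765450", "PSIRRRRTY781", "ABCD000001"]

# Keep only the values that satisfy both anchors.
matches = [res_id for res_id in res_ids if PATTERN.search(res_id)]
print(matches)
```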
Another way to handle your query.
SELECT res_id FROM re_result WHERE UPPER(res_id) like UPPER('PSI%1')

SAS format procedure, invalue statement ,UPCASE option does not work

I need to create SAS informat that will change all case versions of 'Male' and 'Female' to digits.
I found in the documentation that there is an UPCASE option that does the job: "converts all raw data values to uppercase before they are compared to the possible ranges. If you use UPCASE, then make sure the values or ranges you specify are in uppercase"
Unfortunately, after adding the UPCASE option, none of the input values are read properly.
The SAS version is 9.2.
My code is below.
options fmtsearch=(WORK);
proc format lib=WORK;
invalue gender UPCASE
MALE = 1
FEMALE = 2
;run;
data _null_;
q='MALE';
x=input(q,gender.);
put q=;
put x=;
run;
The log is:
NOTE: Invalid argument to function INPUT at line 186 column 7.
q=MALE
x=.
q=MALE x=. _ERROR_=1 _N_=1
What is the proper usage of this option?
Very simple: informat options go in parentheses after the informat name, so write invalue gender (UPCASE) instead of invalue gender UPCASE.

Split characters inside Pig field

I have a text input with '|' separator as
0.0000|25000| |BM|BM901002500109999998|SZ
which I split using PigStorage
A = LOAD '/user/hue/data.txt' using PigStorage('|');
Now I need to split the field BM901002500109999998 into separate fields based on position, e.g. positions 0-2 = BM as Field1, and likewise for the rest.
So after this step I should get BM, 90100, 2500, 10, 9999998.
Is there any way to achieve this in a Pig script? Otherwise I plan to write a UDF that puts a separator at the required positions.
Thanks.
You are looking for SUBSTRING:
A = LOAD '/user/hue/data.txt' using PigStorage('|');
B = FOREACH A GENERATE SUBSTRING($4,0,2) AS FIELD_1, SUBSTRING($4,2,7) AS FIELD_2, SUBSTRING($4,7,11) AS FIELD_3, SUBSTRING($4,11,13) AS FIELD_4, SUBSTRING($4,13,20) AS FIELD_5;
The output would be:
dump B;
(BM,90100,2500,10,9999998)
You can find more info about this function here.
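The offsets used above follow the usual start-inclusive, end-exclusive convention. A minimal Python sketch of the same slicing (the names BOUNDS and split_fixed are hypothetical, and this is not Pig code) makes the boundaries easy to check:

```python
# (start, end) pairs, start inclusive and end exclusive,
# matching the SUBSTRING arguments used in the Pig script.
BOUNDS = [(0, 2), (2, 7), (7, 11), (11, 13), (13, 20)]


def split_fixed(value: str, bounds: list[tuple[int, int]]) -> tuple[str, ...]:
    """Cut a fixed-width string into fields at the given offsets."""
    return tuple(value[start:end] for start, end in bounds)


fields = split_fixed("BM901002500109999998", BOUNDS)
print(fields)
```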
I think that it will be much more efficient to use the built in UDF REGEX_EXTRACT_ALL.
You can get some idea of how to use this UDF from:
http://pig.apache.org/docs/r0.8.1/piglatin_ref2.html#REGEX_EXTRACT_ALL
STRSPLIT and REGEX_EXTRACT_ALL in PigLatin
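The regex-based approach amounts to one pattern with a capture group per fixed-width field. The following Python sketch shows the idea with re.fullmatch; the pattern and the helper name extract_all are assumptions for illustration, not taken from the Pig documentation.

```python
import re
from typing import Optional

# One capture group per field: widths 2, 5, 4, 2 and 7 characters.
FIELD_PATTERN = re.compile(r"(.{2})(.{5})(.{4})(.{2})(.{7})")


def extract_all(value: str) -> Optional[tuple[str, ...]]:
    """Return all captured fields, or None if the value has the wrong length."""
    match = FIELD_PATTERN.fullmatch(value)
    return match.groups() if match else None


groups = extract_all("BM901002500109999998")
print(groups)
```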

Fetch the substring between the last '/' and the last '.' from a column in an Oracle images table

I want to fetch the substring of a column value between the last '/' and the last '.'.
Here is sample data for the IMAGE_PATH column:
sph/images/30_Fairhall_Court.jpeg
sph/images/9_Pennethorne_House.jpeg
rbkc/images/TAVISTOCK_CRESCENT.jpeg
haringey/images/399932thumb.jpg
urbanchoice/images/18190862.jpg
wandle/images/f13c10d2-2692-457d-a208-8bb9e10b27dc.png
housingmoves/images/No14_Asterid Heights_DS37620.jpg
wandle/images/f13c10d2-2692-457d-a208-8bb9e10b27dc.png
So the required output is like
30_Fairhall_Court
9_Pennethorne_House
TAVISTOCK_CRESCENT
399932thumb
18190862
f13c10d2-2692-457d-a208-8bb9e10b27dc
No14_Asterid Heights_DS37620
f13c10d2-2692-457d-a208-8bb9e10b27dc
Please suggest how to fetch this. I need to update another blank column in the table with this value. The table has around 10 lakh (1 million) records.
One possible solution is to use the substr() and instr() functions, passing a negative third parameter to instr() so that it searches backwards from the end of the string:
select image_path,
       substr(image_path,
              instr(image_path, '/', -1) + 1,
              instr(image_path, '.', -1) - instr(image_path, '/', -1) - 1) img
from test
SQL Fiddle
Results:
IMAGE_PATH                                               IMG
-------------------------------------------------------- -------------------------------------
sph/images/30_Fairhall_Court.jpeg                        30_Fairhall_Court
sph/images/9_Pennethorne_House.jpeg                      9_Pennethorne_House
rbkc/images/TAVISTOCK_CRESCENT.jpeg                      TAVISTOCK_CRESCENT
haringey/images/399932thumb.jpg                          399932thumb
urbanchoice/images/18190862.jpg                          18190862
wandle/images/f13c10d2-2692-457d-a208-8bb9e10b27dc.png   f13c10d2-2692-457d-a208-8bb9e10b27dc
housingmoves/images/No14_Asterid Heights_DS37620.jpg     No14_Asterid Heights_DS37620
wandle/ima.ges/f13c10d2-2692-457d-a208-8bb9e10b27dc.png  f13c10d2-2692-457d-a208-8bb9e10b27dc
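The same "search backwards for the last slash and the last dot" logic can be sketched in Python with str.rfind, which plays the role of instr() with a negative position. The function name file_stem is hypothetical, and the sketch assumes every path contains a dot after its last slash, as in the sample data.

```python
def file_stem(path: str) -> str:
    """Return the text between the last '/' and the last '.'.

    Assumes the path contains a '.' after its last '/'.
    """
    start = path.rfind("/") + 1  # first character after the last slash
    end = path.rfind(".")        # position of the last dot (exclusive end)
    return path[start:end]


samples = [
    "sph/images/30_Fairhall_Court.jpeg",
    "housingmoves/images/No14_Asterid Heights_DS37620.jpg",
    "wandle/ima.ges/f13c10d2-2692-457d-a208-8bb9e10b27dc.png",
]
stems = [file_stem(p) for p in samples]
print(stems)
```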
This regex works with the sample data you provided:
select regexp_substr(image_path
, '(/)([a-z0-9_ \-]+)(\.)([a-z]+)$'
, 1
, 1
, 'i'
, 2)
from t23
/
We have to include all the optional parameters after pattern so we can use the subexpr parameter to select just the filename element. Find out more.
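For readers who want to experiment with the pattern outside the database, here is a rough Python equivalent. Python's re module is assumed to be close enough to Oracle's regex engine for this pattern; re.IGNORECASE stands in for the 'i' match parameter and group(2) for the subexpr argument. The helper name filename_part is hypothetical.

```python
import re
from typing import Optional

# Same pattern as in the query; group 2 is the filename without its extension.
FILENAME = re.compile(r"(/)([a-z0-9_ \-]+)(\.)([a-z]+)$", re.IGNORECASE)


def filename_part(path: str) -> Optional[str]:
    """Return capture group 2, or None when the pattern does not match."""
    match = FILENAME.search(path)
    return match.group(2) if match else None


names = [
    filename_part("rbkc/images/TAVISTOCK_CRESCENT.jpeg"),
    filename_part("wandle/images/f13c10d2-2692-457d-a208-8bb9e10b27dc.png"),
    filename_part("housingmoves/images/No14_Asterid Heights_DS37620.jpg"),
]
print(names)
```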
As far as the updating goes, a million row table isn't that big. Given that you have to update all the rows there's not much you can do to tune it. Just issue the UPDATE statement and let it rip.
"its not working"
Hmmm, here's a SQL Fiddle which proves it does work. You've probably introduced a typo.
"The regexp looks unnecessary complex. Why not simply"
Perhaps it is too complicated. However your simplified version doesn't produce the correct result if there's more than one dot in the IMAGE_PATH. If that's never going to happen then your solution works just fine.

What's the XPath equivalent of the SQL IN query?

I would like to know what the XPath equivalent of the SQL IN query is. Basically, in SQL I can do this:
select * from tbl1 where Id in (1,2,3,4)
so I want something similar in XPath/XSL,
i.e.
//*[@id= IN('51417','1121','111')]
Please advise.
In XPath 2.0, the = operator is a general comparison: it is true if any item on the left equals any item in the sequence on the right, so it already works like IN.
I.e. you can use
//*[@id = ('51417','1121','111')]
A solution is to write out the options as separate conditions:
//*[(@id = '51417') or (@id = '1121') or (@id = '111')]
Another, slightly less verbose solution, though it looks a bit like a hack, would be to use the contains function:
//*[contains('-51417-1121-111-', concat('-', @id, '-'))]
Literally, this means you're checking whether the value of the id attribute (preceded and followed by a delimiter character) is a substring of -51417-1121-111-. Note that I am using a hyphen (-) as the delimiter between the allowable values; you can replace it with any character that will not appear in the id attribute.
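The reason for wrapping both the list and the candidate value in delimiters is easier to see in a small Python sketch of the same substring test. The helper name in_list is hypothetical and this is plain string handling, not an XPath evaluator: without delimiters, a value such as 141 would be found inside the concatenated ids even though it is not one of them.

```python
def in_list(value: str, allowed: list[str], delim: str = "-") -> bool:
    """Substring test with delimiters, mirroring contains(..., concat(...))."""
    haystack = delim + delim.join(allowed) + delim  # e.g. "-51417-1121-111-"
    return f"{delim}{value}{delim}" in haystack


ids = ["51417", "1121", "111"]

print(in_list("1121", ids))  # a listed id
print(in_list("141", ids))   # not listed, and the delimiters prevent a partial hit
print("141" in "".join(ids)) # naive test without delimiters gives a false positive
```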
