Looking for Special characters in MonetDB table columns

I am new to MonetDB and would like to check whether any of the columns in a table contain special characters.
For example, I have a test database with a table named lmr. How can I check whether any of the columns in lmr contain special characters?
The query I tried:
SELECT jk
FROM lmr
WHERE jk like '%[^a-Z0-9]%'
I have multiple columns, so is there any way where I can check all the columns with a special character at once?

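The LIKE and ILIKE operators use PCRE internally, but do not expose the same expressive power. Basically you can only use the wildcards % (any sequence of characters) and _ (any single character); bracket expressions such as [^...] are not supported.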
Luckily, MonetDB already provides wrappers to the PCRE library. For some reason that I am not aware of, they are not made available by default at the SQL layer.
In order to do that, you only need to create SQL function signatures that link to the code that is already available:
CREATE OR REPLACE FUNCTION pcre_match(s string, pattern string) RETURNS boolean EXTERNAL NAME pcre."match";
CREATE OR REPLACE FUNCTION pcre_imatch(s string, pattern string) RETURNS boolean EXTERNAL NAME pcre."imatch";
CREATE OR REPLACE FUNCTION pcre_replace(s string, pattern string, repl string, flags string) RETURNS string EXTERNAL NAME pcre."replace";
CREATE OR REPLACE FUNCTION pcre_replacefirst(s string, pattern string, repl string, flags string) RETURNS string EXTERNAL NAME pcre."replace_first";
After that (to be done only once in a database), you can do:
SELECT jk
FROM lmr
WHERE pcre_imatch(jk,'[^a-z0-9]');
The second parameter is a regular PCRE pattern.
Mind that you had an error in your example. The range a-Z does not exist, because a comes after Z.
In my example I used the i (ignore case) variant of the function, and only used range a-z.
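To see why a-Z is rejected, here is a small sketch in Python. Python's re module is not PCRE and is only standing in for it here; the helper names is_valid_pattern and has_special_chars are made up for this illustration. The point is just that a character range must run from a lower code point to a higher one.

```python
import re

# Uppercase letters sort before lowercase ones in ASCII and Unicode,
# so "a-Z" would be a range that runs backwards.
print(ord("Z"), ord("a"))  # 90 97


def is_valid_pattern(pattern: str) -> bool:
    """Return True if Python's regex engine accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


def has_special_chars(value: str) -> bool:
    """True if value contains anything other than ASCII letters or digits."""
    # Same idea as pcre_imatch(jk, '[^a-z0-9]'): case-insensitive, range a-z only.
    return re.search(r"[^a-z0-9]", value, re.IGNORECASE) is not None
```

With this, is_valid_pattern("[^a-Z0-9]") is False while is_valid_pattern("[^a-z0-9]") is True.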
If you want you can also use Unicode categories and rewrite your example to match everything that is not a letter or a number as:
SELECT jk
FROM lmr
WHERE pcre_imatch(jk,'[^\\p{L}\\p{N}]');
Mind that you need to escape each \, which becomes then \\.
About checking multiple columns at once, assuming that you want to return the rows where the condition is satisfied on any of the given columns, you could do this (for 3 columns here):
SELECT col1,col2,col3
FROM lmr
WHERE pcre_imatch(col1 || col2 || col3,'[^\\p{L}\\p{N}]');
where || is string concatenation.
The problem with this is that it first needs to concatenate all columns together. Because MonetDB is a column-store, it will do this for all rows at once. So it will first materialize in memory (and/or disk) all columns for all rows. I'm not sure how much data you have, but that is potentially very big.
The other approach is of course:
SELECT col1,col2,col3
FROM lmr
WHERE pcre_imatch(col1,'[^\\p{L}\\p{N}]')
OR pcre_imatch(col2,'[^\\p{L}\\p{N}]')
OR pcre_imatch(col3,'[^\\p{L}\\p{N}]');
I think I would choose the second approach, as it definitely has a much smaller memory footprint.
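If you have many columns, you may not want to type the OR chain by hand. A minimal sketch of generating it, assuming the pcre_imatch function from above has already been created in the database; build_special_char_query is a hypothetical helper, not part of MonetDB or any client library:

```python
# Pattern exactly as it appears inside the SQL string literal above
# (backslashes doubled).
PATTERN = r"[^\\p{L}\\p{N}]"


def build_special_char_query(table, columns, pattern=PATTERN):
    """Build a query returning rows where ANY listed column matches pattern.

    Table and column names are interpolated as-is, so pass trusted
    identifiers only (for example names read from the catalog).
    """
    if not columns:
        raise ValueError("at least one column is required")
    select_list = ", ".join(columns)
    predicates = "\n   OR ".join(
        f"pcre_imatch({col}, '{pattern}')" for col in columns
    )
    return f"SELECT {select_list}\nFROM {table}\nWHERE {predicates};"


print(build_special_char_query("lmr", ["col1", "col2", "col3"]))
```

This produces the second (per-column) form, so each column is tested on its own and nothing has to be concatenated first.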

Related

sqlldr WHEN clause

I am trying to code a sqlldr.ctl file WHEN Clause to limit the records imported to those matching a portion of the current Schema's name.
The code I have (which does NOT work) is:
LOAD DATA
TRUNCATE INTO TABLE TMP_PRIM_ACCTS
when REGION_NUM = substr(user,-3,3)
Fields terminated by "|" Optionally enclosed by '"'
Trailing NULLCOLS
( PORTFOLIO_ACCT,
PRIMARY_ACCT_ID NULLIF (PRIMARY_ASSET_ID="NULL"),
REGION_NUM NULLIF (PARTITION_NUM="NULL")
)
sqlldr returns:
SQL*Loader-350: Syntax error at line 3.
Expecting quoted string or hex identifier, found "substr".
when REGION_NUM = substr(user,-3,3)
I cannot put single quotes around "user", because that turns it into the literal string "user". Can anyone explain how I can reference the "active" User in this WHEN Clause?
Thank you!

Can you try something like this? (I can't test it with SQL*Loader right now, but this is the syntax I have used for changing values):
when REGION_NUM = "substr(:user,-3,3)"

It doesn't look like you can. The documentation only shows fixed values:
If you try to use an expression in that when clause (or in nullif; I thought I'd see whether you could cause a rejection based on a null PK value), you just see the literal value in the log:
Table TMP_PRIM_ACCTS, loaded when REGION_NUM = 0X73756273747228757365722c2d332c3329(character 'substr(user,-3,3)')
which is sort of what you referred to when you said you couldn't quote user, but you'd have to quote the whole thing anyway. Using :user doesn't work either: the colon is treated as just another character, and SQL*Loader doesn't try to find a column called user instead.
The simplest approach may be to pre-process the data file and remove any rows which don't match the pattern (e.g. via a regex). That would actually be slightly easier if you used an external table instead of SQL*Loader.
Alternatively, generate your control file and embed the correct literal value based on the user you'll connect as.
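Both workarounds can be scripted. The sketch below is only an illustration under some assumptions: the data file has exactly the three pipe-delimited fields from the question, the NULLIF clauses are left out for brevity, and the names region_suffix, render_control_file and filter_rows are made up here.

```python
import csv

# Control file from the question with the WHEN value as a placeholder.
CTL_TEMPLATE = """LOAD DATA
TRUNCATE INTO TABLE TMP_PRIM_ACCTS
WHEN REGION_NUM = '{region}'
FIELDS TERMINATED BY "|" OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
( PORTFOLIO_ACCT,
  PRIMARY_ACCT_ID,
  REGION_NUM
)
"""


def region_suffix(user):
    """Last three characters of the schema name, like substr(user,-3,3)."""
    suffix = user[-3:]
    if len(user) < 3 or not suffix.isalnum():
        raise ValueError(f"cannot derive a region from {user!r}")
    return suffix


def render_control_file(user):
    """Embed the region as a fixed literal, which WHEN does accept."""
    return CTL_TEMPLATE.format(region=region_suffix(user))


def filter_rows(lines, user):
    """Pre-processing variant: keep only rows whose third field matches."""
    region = region_suffix(user)
    for row in csv.reader(lines, delimiter="|", quotechar='"'):
        if len(row) >= 3 and row[2] == region:
            yield row
```

For a user named SCHEMA_042, render_control_file produces a control file containing WHEN REGION_NUM = '042', which you would write to disk before calling sqlldr.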

SQL Loader incompatible length

This is my control file
FIELDS (
dummy1 filler terminated by "cid=",
address enclosed by "<address>" and "</address>"
...
The address column in the table is varchar(10).
If the address in the file is over 10 characters then SQL*Loader cannot load it.
How can I load the address truncated to 10 characters?

The documentation has a section on applying SQL operators to fields.
A wide variety of SQL operators can be applied to field data with the SQL string. This string can contain any combination of SQL expressions that are recognized by the Oracle database as valid for the VALUES clause of an INSERT statement. In general, any SQL function that returns a single value that is compatible with the target column's datatype can be used.
In this case you can use the substr() function on the value from the file:
...
dummy filler terminated by "cid=",
address enclosed by "<address>" and "</address>" "substr(:address, 1, 10)"
...
The quoted "substr(:address, 1, 10)" passes the initial value from the file through the function before inserting the resulting 10 character (maximum) value, however long the original value in the file was. Note the colon before the name in that function call.
If your file is XML then you might be better off loading it as an external table and then using the built-in XML query tools to extract the data you want, rather than trying to parse it through delimited field definitions.

Different matches when using prepared statements on CHAR(3) column

I had to make a CHAR(1 CHAR) column wider and I forgot to change the column type to VARCHAR2:
DUPLICADO CHAR(3 CHAR)
I noticed the error when my PHP app would no longer find exact matches, e.g.:
SELECT *
FROM NUMEROS
WHERE DUPLICADO = :foo
... with :foo being #4 didn't find the 3-char padded #4 value. However, I initially hit a red herring while debugging the query in SQL Developer because injecting raw values into the query would find matches!
SELECT *
FROM NUMEROS
WHERE DUPLICADO = '#4'
Why do I get matches with the second query? Why do prepared statements make a difference?

To expand a little on my comments, I found a section in the documentation that explains the difference between blank-padded and nonpadded comparison:
http://docs.oracle.com/database/121/SQLRF/sql_elements002.htm#BABJBDGB
If both values in your comparison (the two sides of the equal sign) have datatype CHAR or NCHAR or are literal strings, then Oracle chooses blankpadded comparison. That means that if the lengths are different, then it pads the short one with blanks until they are the same length.
With the column DUPLICADO being a CHAR(3), the value '#4' is stored in the column as three characters '#4 ' (note the blank as third character.) When you do DUPLICADO = '#4' the rule states Oracle will use blankpadded comparison and therefore blankpad the literal '#4' until it has the same length as the column. So it actually becomes DUPLICADO = '#4 '.
But when you do DUPLICADO = :foo, it will depend on the datatype of the bind variable. If the datatype is CHAR, it will also perform blankpadded comparison. But if the datatype is VARCHAR2, then Oracle will use non-padded comparison and then it will be up to you to ensure to do blankpadding where necessary.
Depending on client or client language you may be able to specify the datatype of the bind variable and thereby get blankpadded or nonpadded comparison as needed.
SQL Developer may be a special case that does not let you specify the datatype; it may simply default to treating bind variables as VARCHAR2. I don't know enough about SQL Developer to be certain about that ;-)
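The rule can be modelled in a few lines of Python. This is a simplified sketch of the behaviour described above, not Oracle code; the function oracle_equals and the type labels are invented for the illustration.

```python
# Operand kinds that take part in blank-padded comparison.
PADDED_TYPES = {"CHAR", "NCHAR", "LITERAL"}


def oracle_equals(left, left_type, right, right_type):
    """Model how the comparison semantics depend on both operand types."""
    if left_type in PADDED_TYPES and right_type in PADDED_TYPES:
        # Blank-padded: pad the shorter value with spaces, then compare.
        width = max(len(left), len(right))
        return left.ljust(width) == right.ljust(width)
    # Nonpadded: trailing blanks are significant.
    return left == right


stored = "#4".ljust(3)  # what a CHAR(3) column holds: '#4 '

print(oracle_equals(stored, "CHAR", "#4", "LITERAL"))    # True
print(oracle_equals(stored, "CHAR", "#4", "VARCHAR2"))   # False
print(oracle_equals(stored, "CHAR", "#4 ", "VARCHAR2"))  # True
```

The second call corresponds to the PHP bind variable case: the value is compared as VARCHAR2, so it only matches once you pad it yourself.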

Why are there only two query types in the Go Database library?

From what I can tell, there are only two types of results the Go database/sql interface library expects back - a row or an array of rows. However, there is at least one more type of result - a single column.
DB.column('SELECT COUNT(*) FROM `user` WHERE `banned` IS NOT NULL')
Is there any way to handle this - or do I just have to fetch a row and then access the COUNT(*) from that?

Yes, you fetch a one-column row, but is that so hard?
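var count int
row := db.QueryRow("SELECT COUNT(*) FROM `user` WHERE `banned` IS NOT NULL")
// Scan copies the single column of the returned row into count.
if err := row.Scan(&count); err != nil {
    // handle the error (for example, return it to the caller)
}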
Note that this may be compacted if you find it too verbose (you may remove the row variable).
I think that similar libraries in other languages (for example JDBC) don't offer this shortcut natively either.
I find it easier to work with an API that I can memorize and browse than with one that includes every utility I might want just to save one line of code.

For the record, a SQL Server stored procedure returns all of the following (at the same time):
an integer return code
zero or more messages (often warnings or errors) containing text and two integer codes
zero or more named, typed scalar output parameters
zero or more "rowsets", each of which is an ordered list of zero or more rows.
Within a rowset, all rows have the same number (one or more) of named, typed columns. The column names do not have to be distinct within a rowset.
SQL Server does not recognize any special cases, like a single rowset with a single row or a single column; or a single output parameter.
Other database systems are slightly different.

Swap fields inside an Oracle record using couple separator * and elements separator |

In an Oracle table, I have records made of couples (string, number) separated like this:
Abc|3456*Def|7890*Ghi|9430*Jkl|3534
In the previous example, the couples are:
(Abc,3456)
(Def,7890)
(Ghi,9430)
(Jkl,3534)
I would like to modify each record swapping the order of every couple (first the number, then the string):
3456|Abc*7890|Def*9430|Ghi*3534|Jkl
The separator of the two elements of a couple is pipe (|).
The SEPARATOR BETWEEN COUPLES is asterisk (*).
How can I achieve my objective to swap the order of every couple?
Thank you in advance for your kind cooperation!

Try using regular expressions...now you've got two problems:
select
cola,
regexp_replace(cola, '([^*|]*)\|([^*|]*)(\*|$)','\2|\1\3') as swapped_col
from (
select 'Abc|3456*Def|7890*Ghi|9430*Jkl|3534' cola from dual
)
Basically the regex says: search for everything that isn't a | or a * until you find a |, then find everything that isn't a | or a * until you find a * or the end of the string. Then swap the two parts and terminate them with the character that was found as the final separator (either * or the end of the string). The parts to be swapped are grouped by the round brackets, and in the replacement string the numbers denote which group goes where: the contents of the second set of brackets come first, then a vertical bar, then the first set of brackets, then the third.
By default, REGEXP_REPLACE replaces every occurrence of the pattern that it finds.
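If you want to try the pattern outside the database, the same regex works unchanged in Python's re module, which also replaces every occurrence by default. This is only a sketch for experimenting; swap_couples is a made-up name.

```python
import re

# Group 1: first element, group 2: second element,
# group 3: the couple separator (*) or the end of the string.
PAIR = re.compile(r"([^*|]*)\|([^*|]*)(\*|$)")


def swap_couples(record: str) -> str:
    """Swap the two elements of every couple in the record."""
    return PAIR.sub(r"\2|\1\3", record)


print(swap_couples("Abc|3456*Def|7890*Ghi|9430*Jkl|3534"))
# 3456|Abc*7890|Def*9430|Ghi*3534|Jkl
```

Applying swap_couples twice gives back the original string, which is a handy sanity check.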
