I came across the following Oracle check constraint while converting an Oracle 11g schema to PostgreSQL 11:
ADD CONSTRAINT valid_session_time_zone CHECK (regexp_instr(trim(both session_time_zone),'(\+|\-|\s)?(0?[0-9]|1[01234])(:[0-5]\d)')=1);
So I translated it into a PostgreSQL 11 DOMAIN:
CREATE DOMAIN chk_time_zone AS VARCHAR CHECK ( VALUE ~* '(\+|\-|\s)?(0?[0-9]|1[01234])(:[0-5]\d)');
But how can I trim both sides of the string beforehand in my domain expression?
Perhaps it's just my bad eyesight, but it looks to me that the original code has a problem. The original regexp requires that the matched string start with one of +, -, or blank, but the TRIM call would remove any leading blanks. Thus if the string to be matched starts with a blank which doesn't have a + or - after it, the TRIM call will remove the blank, and thus the pattern won't match. My recommendation is to just ignore the fact that the TRIM call is there, because it appears to be a potential bug, and proceed with the match as ~* '(\+|\-|\s)?(0?[0-9]|1[01234])(:[0-5]\d)'.
Your Oracle check doesn't manipulate the data entered into the table, it just checks that a trimmed version of it matches a pattern, so clearly it's OK to insert leading/trailing whitespace, or something elsewhere trims them off...
Thus I think you can ask PGSQL to disregard leading/trailing whitespace by permitting it in the pattern:
CREATE DOMAIN chk_time_zone AS VARCHAR CHECK
( VALUE ~* '\s*(\+|\-|\s)?(0?[0-9]|1[01234])(:[0-5]\d)\s*');
Note that the regex as entered doesn't contain any anchors to the start or the end, so the pattern could occur anywhere in the data for a match. Add ^ at the start and $ at the end of the regex if the input must match the pattern entirely.
For example: 'abcdef' matches '[cd]+' but not '^[cd]+$'
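The effect of anchoring can be tried outside the database. Below is a minimal sketch in Ruby, assuming Ruby's regex engine treats this particular pattern the same way PostgreSQL does. The constant and method names are made up for illustration, and Ruby spells the whole-string anchors \A and \z where the PostgreSQL pattern would use ^ and $.

```ruby
# Time-zone offset pattern from the question, kept as a plain string.
OFFSET = '(\+|\-|\s)?(0?[0-9]|1[01234])(:[0-5]\d)'

# Unanchored: the pattern may match anywhere inside the value.
UNANCHORED = Regexp.new(OFFSET)

# Anchored: the whole value must match; \s* on both sides tolerates
# the leading/trailing whitespace that TRIM would have removed.
ANCHORED = Regexp.new('\A\s*' + OFFSET + '\s*\z')

def matches?(regex, value)
  regex.match?(value)
end

samples = ['+05:30', '  -11:00 ', 'junk 05:30 junk', '25:99']
samples.each do |s|
  puts format('%-20s unanchored=%-5s anchored=%s',
              s.inspect, matches?(UNANCHORED, s), matches?(ANCHORED, s))
end
```

The third sample is the interesting one: it passes the unanchored check because the offset appears somewhere in the middle, but fails once the anchors are added.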
The regexp_instr function is available out of the box in PostgreSQL 15 and later.
https://www.depesz.com/2021/11/26/waiting-for-postgresql-15-add-assorted-new-regexp_xxx-sql-functions/#more-4025
https://www.postgresql.org/docs/15/functions-matching.html#FUNCTIONS-POSIX-REGEXP
Let's say you have a table called Employee and one of the employee names begins with or includes the [$] or [#] sign within the string, like Hel$lo or #heLLo. How can you select the value?
Is it possible to select and trim the value in a single command?
Kind regards
If you want to select the names, but with special characters $ and # removed, you can use the TRANSLATE function. Add more characters to the list if you need to.
select translate(name, 'A$#', 'A') from employee;
The function will "translate" the character 'A' to itself, '$' and '#' to nothing (simply removing them from the string), and it will leave all other characters - other than A, $ and # - unchanged. It may seem odd that you need the 'A' in this whole business; and you really don't need 'A' specifically, but you do need some character that you want to keep. The reason for that is Oracle's idiotic handling of null; don't worry about the reason, just remember the technique.
You may need to remove characters but you don't know in advance what they will be. That can be done too, but you need to be careful not to remove legitimate characters, like the dot (A. C. Green), dash (John Connor-Smith), apostrophe (Betty O'Rourke) etc. You can then do it either with regular expressions (easy to write, but not the most efficient) or with TRANSLATE as above (it looks uglier, but it will run faster). Something like this:
select regexp_replace(name, '[^[:alpha:].''-]') from employee
This will replace any character that is not "alpha" (letters) or one of the characters specifically enumerated (dot, apostrophe, dash) with nothing, effectively removing them. Note that dash has a special meaning in character classes, so it must be the first or the last one in the enumeration. The apostrophe is doubled only because it sits inside a SQL string literal.
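To experiment with the two approaches outside Oracle, here is a rough Ruby sketch of the same ideas. The method names are hypothetical, and Ruby's [[:alpha:]] is assumed to behave like Oracle's for the names involved. Note that the whitelist as written also removes spaces, so add a space to the bracket expression if names like "A. C. Green" must keep theirs.

```ruby
# Whitelist approach: keep letters plus dot, apostrophe and dash,
# drop everything else. The dash is last so it is taken literally.
def clean_name(name)
  name.gsub(/[^[:alpha:].'-]/, '')
end

# Blacklist approach: remove only the known unwanted characters,
# comparable to TRANSLATE(name, 'A$#', 'A') in the answer.
def strip_dollar_hash(name)
  name.delete('$#')
end

puts strip_dollar_hash('Hel$lo')      # blacklist keeps everything else
puts clean_name("Betty O'Rourke#42")  # whitelist also drops the space and digits
```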
If you need to make the changes in the table itself, you can use an update statement, using TRANSLATE or REGEXP_REPLACE as shown above.
I'm trying to add some constraints on database creation command in PostgreSQL.
Currently, I could do
psql -c "CREATE database \" x y\"\"z' \""
Then, I will get a database named literally " x y"z' " (without the double-quotes boundary).
It seems that PostgreSQL supports any character in its database names, which is cool.
But it gives me headaches when I am doing automation with bash scripts.
Yes, some additional work could be done to handle these cases in the script. But I think these kinds of names are actually meaningless (at least in my situation :), so is there a way to add some constraints on database naming? For example, only allow [a-zA-Z0-9_.]+.
Just do not use double quotes, which you should avoid anyway if at all possible. See Documentation:
SQL identifiers and key words must begin with a letter (a-z, but also
letters with diacritical marks and non-Latin letters) or an underscore
(_). Subsequent characters in an identifier or key word can be
letters, underscores, digits (0-9), or dollar signs ($). Note that
dollar signs are not allowed in identifiers according to the letter of
the SQL standard, so their use might render applications less
portable. The SQL standard will not define a key word that contains
digits or starts or ends with an underscore, so identifiers of this
form are safe against possible conflict with future extensions of the
standard. ... There is a second kind of identifier: the delimited
identifier or quoted identifier. It is formed by enclosing an
arbitrary sequence of characters in double-quotes ("). A delimited
identifier is always an identifier, never a key word. ... Quoted
identifiers can contain any character, except the character with code
zero. (To include a double quote, write two double quotes.) This
allows constructing table or column names that would otherwise not be
possible, such as ones containing spaces or ampersands.
Without the double quotes, the names in your examples are invalid, and Postgres has no problem telling you so. So just do not use them.
Alternatively, you could create an event trigger. Within it you can restrict object names as needed, which is especially useful if you have strict naming standards. This would allow database enforcement of those standards:
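create function app_validate_table_name()
returns event_trigger
language plpgsql
as $$
declare
  obj record;
begin
  -- pg_event_trigger_ddl_commands() returns one row per object touched by the command
  for obj in select * from pg_event_trigger_ddl_commands()
  loop
    -- object_identity is schema-qualified (e.g. public.my_table), hence the optional schema part
    if obj.object_identity !~ '^([A-Za-z$_][A-Za-z0-9$_]{0,62}\.)?[A-Za-z$_][A-Za-z0-9$_]{0,62}$'
    then
      raise exception 'App Error: Request Name (%) is invalid for <Your App Name here>', obj.object_identity;
    end if;
  end loop;
end;
$$;
create event trigger app_table_event_trigger on ddl_command_end
when tag in ('ALTER TABLE', 'CREATE TABLE')
execute procedure app_validate_table_name();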
While the same can be applied to other objects, it unfortunately does not seem to apply to creating a database itself.
Disclaimer: The above has NOT been tested.
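Since the trigger cannot cover database creation, one possible workaround is to validate the name in the automation script before it ever reaches psql. The sketch below is only an illustration in Ruby; the constant and method names are hypothetical, and the pattern is the one suggested in the question. Names that contain a dot or start with a digit would still need double quotes in the CREATE DATABASE statement.

```ruby
# Whitelist taken from the question: letters, digits, underscore and dot.
# \A and \z anchor the check to the whole name.
VALID_DB_NAME = /\A[a-zA-Z0-9_.]+\z/

def valid_db_name?(name)
  VALID_DB_NAME.match?(name)
end

['sales_2024', %q( x y"z' ), ''].each do |candidate|
  verdict = valid_db_name?(candidate) ? 'ok' : 'rejected'
  puts "#{candidate.inspect}: #{verdict}"
end
```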
Unable to trim the non-breakable space in the middle of a field in Oracle
'766195491 572'
I tried the method below, but it works only when the non-breakable space is present on the sides.
select length(trim(replace('766195491 572',chr(49824),''))) from dual;
it works only when non breakable space is present on the sides
That’s what the trim() function is supposed to do:
TRIM enables you to trim leading or trailing characters (or both) from a character string
“leading or trailing” means “at the sides”. It is not supposed to have any effect on appearances of the characters anywhere else in the source string.
You need to use the replace() or translate() functions instead; or for more complicated scenarios, regular expression functions.
If the input value is in a column named input_str, then:
translate(input_str, chr(49824), chr(32))
will replace every non-breakable space in the input string with a regular (breakable) space.
If you simply want to remove all non-breakable spaces and don't want to replace them with anything, then
replace(input_str, chr(49824))
(if you omit the third argument, the result is simply removing all occurrences of the second argument).
Perhaps the requirement is more complicated though; find all occurrences of one or more consecutive non-breaking spaces and replace each such occurrence with exactly one standard space. That is more easily achieved with a regular expression function:
regexp_replace(input_str, chr(49824) || '+', chr(32))
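The three variants above (remove, replace one for one, collapse runs) can be compared side by side. This is a small Ruby sketch rather than Oracle SQL, with hypothetical method names; it assumes the input is a UTF-8 string, where the non-breaking space is U+00A0 (bytes C2 A0, which is where the number 49824 = 0xC2A0 comes from).

```ruby
NBSP = "\u00A0" # non-breaking space

# Like replace(input_str, chr(49824)): drop every non-breaking space.
def remove_nbsp(str)
  str.delete(NBSP)
end

# Like translate(input_str, chr(49824), chr(32)): one regular space per NBSP.
def nbsp_to_space(str)
  str.tr(NBSP, ' ')
end

# Like regexp_replace(input_str, chr(49824) || '+', chr(32)):
# each run of NBSPs becomes exactly one regular space.
def collapse_nbsp(str)
  str.gsub(/\u00A0+/, ' ')
end

input = "766195491#{NBSP}#{NBSP}572"
puts remove_nbsp(input).length
puts nbsp_to_space(input).inspect
puts collapse_nbsp(input).inspect
```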
Try CHR(32) instead of CHR(49824)
select length(replace('766195491 572',chr(32),'')) from dual;
If it does not work, use something like this.
select length(regexp_replace('766195491 572','[^-a-zA-Z0-9]','') ) from dual;
I have a string like this:
"Jim-Bob's email ###hl###address###endhl### is: jb#example.com"
I want to replace all non-word characters (symbols and whitespace), except the ### delimiters.
I'm currently using:
str.gsub(/[^\w#]+/, 'X')
which yields:
"JimXBobXsXemailX###hl###address###endhl###XisXjb#exampleXcom"
In practice, this is good enough, but it offends me for two reasons:
The # in the email address is not replaced.
The use of [^\w] instead of \W feels sloppy.
How do I replace all non-word characters, unless those characters make up the ###hl### or ###endhl### delimiter strings?
str.gsub(/(###.*?###|\w+)|./) { $1 || "X" }
# => "JimXBobXsXemailX###hl###address###endhl###XisXXjbXexampleXcom"
This approach uses the fact that alternations work like a case structure: the first alternative that matches consumes the corresponding string, and no further matching is done on it. Thus, ###.*?### will consume a marker (like ###hl###); nothing else will be matched inside it. We also match any sequence of word characters. If either of those is captured, we can just return it as-is ($1). If not, then we match any other character (i.e. not inside a marker, and not a word character) and replace it with "X".
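Wrapped in a method, the same idea might look like the sketch below. The method name and the replacement parameter are made up for illustration; the regex is the one from the answer.

```ruby
# Replace every character that is neither a word character nor part of a
# ###...### marker with the given replacement string.
def mask_non_word(str, replacement = 'X')
  # Alternation order matters: markers first, then runs of word characters
  # (both captured in group 1), then any single leftover character.
  str.gsub(/(###.*?###|\w+)|./) { Regexp.last_match(1) || replacement }
end

input = "Jim-Bob's email ###hl###address###endhl### is: jb#example.com"
puts mask_non_word(input)
```

A lone ### with no closing counterpart falls through to the last alternative, so each of its # characters is replaced like any other symbol.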
Regarding your second point, I think you are asking too much; there is no simple way to avoid that.
Regarding the first point, a simple way is to temporarily replace "###" with a character that you will never use (let's say your input never contains "\r"; then we can use that character as a temporary replacement).
"Jim-Bob's email ###hl###address###endhl### is: jb#example.com"
.gsub("###", "\r").gsub(/[^\w\r]/, "X").gsub("\r", "###")
# => "JimXBobXsXemailX###hl###address###endhl###XisXXjbXexampleXcom"
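The placeholder trick silently breaks if the input already contains the placeholder, so a guarded version may be safer. This is only a sketch: the method name, the keyword arguments, and the choice to raise an error are assumptions, not part of the original answer.

```ruby
# Swap the marker for a placeholder, mask everything that is neither a word
# character nor the placeholder, then swap the marker back in.
def mask_with_placeholder(str, marker: '###', placeholder: "\r", replacement: 'X')
  # Refuse to run if the placeholder is not actually unused.
  raise ArgumentError, 'placeholder already occurs in input' if str.include?(placeholder)

  keep = Regexp.escape(placeholder)
  str.gsub(marker, placeholder)
     .gsub(/[^\w#{keep}]/, replacement)
     .gsub(placeholder, marker)
end

input = "Jim-Bob's email ###hl###address###endhl### is: jb#example.com"
puts mask_with_placeholder(input)
```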
My script downloads files from the net and then it saves them under the name taken from the same web server. I need a filter/remover of invalid characters for file/folder names under Windows NTFS.
I would be happy for multi platform filter too.
NOTE: something like htmlentities would be great....
Like Geo said, by using gsub you can easily convert all invalid characters to a valid character. For example:
file_names.map! do |f|
f.gsub(/[<invalid characters>]/, '_')
end
You need to replace <invalid characters> with all the possible characters that your file names might have in them that are not allowed on your file system. In the above code each invalid character is replaced with a _.
Wikipedia tells us that the following characters are not allowed on NTFS:
U+0000 (NUL)
/ (slash)
\ (backslash)
: (colon)
* (asterisk)
? (question mark)
" (quote)
< (less than)
> (greater than)
| (pipe)
So your gsub call could be something like this:
file_names.map! { |f| f.gsub(/[\x00\/\\:\*\?\"<>\|]/, '_') }
which replaces all the invalid characters with an underscore.
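For a single name, the same substitution can be wrapped in a small helper. This is a sketch that covers only the characters listed above; the constant and method names are hypothetical.

```ruby
# Characters from the list above: NUL, slash, backslash, colon, asterisk,
# question mark, double quote, less than, greater than, pipe.
NTFS_INVALID = /[\x00\/\\:*?"<>|]/

def sanitize_filename(name, replacement = '_')
  name.gsub(NTFS_INVALID, replacement)
end

puts sanitize_filename('report: "Q1/Q2"?.txt')
```

Keep in mind that two different original names can collapse to the same sanitized name, for example a:b and a?b.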
filename_string.gsub(/[^\w\.]/, '_')
Explanation: Replace everything except word-characters (letter, number, underscore) and dots
I think your best bet would be gsub on the filename. One of the things I know you'll need to delete/replace is :.
I don't know how you plan to use those files later, but probably the most reliable solution would be to keep the original filenames in a db table (or an otherwise serialized hash), and name the physical files after a unique ID that you (or the database) generate.
PS Another advantage of this approach is that you don't have to worry about the files with the same names (or different names that filter to same names).
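A minimal in-memory sketch of that idea follows, assuming a plain Hash stands in for the db table and a random UUID serves as the unique ID. The class and method names are invented for illustration.

```ruby
require 'securerandom'

# Maps generated IDs (safe to use as physical file names) to the original,
# possibly invalid, file names taken from the web server.
class FileNameRegistry
  def initialize
    @original_names = {}
  end

  # Returns a fresh ID to use as the name of the file on disk.
  def register(original_name)
    id = SecureRandom.uuid
    @original_names[id] = original_name
    id
  end

  # Looks up the original name; raises KeyError for an unknown ID.
  def original_name(id)
    @original_names.fetch(id)
  end
end

registry = FileNameRegistry.new
first = registry.register('what?.txt')
second = registry.register('what?.txt')
puts first != second # same original name, distinct physical names
```

Because the IDs contain only hexadecimal digits and dashes, no filtering of the physical name is needed on any of the file systems discussed here.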