Let's say you have a table called Employee and one of the employee names begins or includes the [$] or [#] sign within the String, like Hel$lo or like #heLLo. How can you call the value?
Is it possible to select and trim the value in a single command?
Kind regards
If you want to select the names, but with special characters $ and # removed, you can use the TRANSLATE function. Add more characters to the list if you need to.
select translate(name, 'A$#', 'A') from employee;
The function will "translate" the character 'A' to itself, '$' and '#' to nothing (simply removing them from the string), and it will leave all other characters - other than A, $ and # - unchanged. It may seem odd that you need the 'A' in this whole business; and you really don't need 'A' specifically, but you do need some character that you want to keep. The reason for that is Oracle's idiotic handling of null; don't worry about the reason, just remember the technique.
You may need to remove characters but you don't know in advance what they will be. That can be done too, but you need to be careful not to remove legitimate characters, like the dot (A. C. Green), dash (John Connor-Smith), apostrophe (Betty O'Rourke) etc. You can then do it either with regular expressions (easy to write, but not the most efficient) or with TRANSLATE as above (it looks uglier, but it will run faster). Something like this:
select regexp_replace(name, [^[:alpha:].'-]) from employee
This will replace any character that is not "alpha" (letters) or one of the characters specifically enumerated (dot, apostrophe, dash) with nothing, effectively removing them. Note that dash has a special meaning in character classes, so it must be the last one in the enumeration.
If you need to make the changes in the table itself, you can use an update statement, using TRANSLATE or REGEXP_REPLACE as shown above.
Related
I'm trying to add some constraints on database creation command in PostgreSQL.
Currently, I could do
psql -c "CREATE database \" x y\"\"z' \""
Then, I will get a database named literally " x y"z' " (without the double-quotes boundary).
It seems that pgsql supports any characters in it's database name, which is cool.
But it leads me headaches when I am doing automation stuff with bash script.
Yes, some additional work could be done to handle these cases in script. But I think these kind of names are actually meaningless (at least in my situation :), so, is there a way to add some constraints on database naming. For example, only allow [a-zA-Z0-9_.]+.
Just do not use double quotes, which you should avoid anyway if at all possible. See Documentation:
SQL identifiers and key words must begin with a letter (a-z, but also
letters with diacritical marks and non-Latin letters) or an underscore
(_). Subsequent characters in an identifier or key word can be
letters, underscores, digits (0-9), or dollar signs ($). Note that
dollar signs are not allowed in identifiers according to the letter of
the SQL standard, so their use might render applications less
portable. The SQL standard will not define a key word that contains
digits or starts or ends with an underscore, so identifiers of this
form are safe against possible conflict with future extensions of the
standard. ... There is a second kind of identifier: the delimited
identifier or quoted identifier. It is formed by enclosing an
arbitrary sequence of characters in double-quotes ("). A delimited
identifier is always an identifier, never a key word. ... Quoted
identifiers can contain any character, except the character with code
zero. (To include a double quote, write two double quotes.) This
allows constructing table or column names that would otherwise not be
possible, such as ones containing spaces or ampersands.
Not doubling quoting in you examples makes those names invalid and Postgres has no problem telling about it. So just do not use them.
Alternately you could create an event trigger. Within there you can restrict object names as needed, esp useful if you have strict naming standards. This would allow for database enforcement of those standards;
create function app_validate_table_name()
returns event_trigger
language 'plpgsql'
as $$
begin
if obj.object_identity ~! '[A-Za-z$_][[A-Za-z0-9$_]{0,62}'
then
raise exception 'App Error: Request Name (%) is invalid for <Your App Name here>',obj.object_identity;
end if
return;
end ;
$$;
create event trigger app_table_event_trigger on ddl_command_end
when tag in ('ALTER TABLE', 'CREATE TABLE')
execute procedure app_validate_table_name();
While the same can be applied to other objects it unfortunately does not seem to apply to creating a database itself.
Disclamer: The above has NOT been tested.
Unable to trim the non breakable space in the middle of a filed in oracle
'766195491 572'
Tried the below method it works only when non breakable space is present on the sides.
select length(trim(replace('766195491 572',chr(49824),''))) from dual;
it works only when non breakable space is present on the sides
That’s what the trim() function is supposed to do:
TRIM enables you to trim leading or trailing characters (or both) from a character string
“leading or trailing” means “at the sides”. It is not supposed to have any effect on appearances of the characters anywhere else in the source string.
You need to use the replace() or translate() functions instead; or for more complicated scenarios, regular expression functions.
If the input value is in a column named input_str, then:
translate(input_str, chr(49824), chr(32))
will replace every non-breakable space in the input string with a regular (breakable) space.
If you simply want to remove all non-breakable spaces and don't want to replace them with anything, then
replace(input_str, chr(49824))
(if you omit the third argument, the result is simply removing all occurrences of the second argument).
Perhaps the requirement is more complicated though; find all occurrences of one or more consecutive non-breaking spaces and replace each such occurrence with exactly one standard space. That is more easily achieved with a regular expression function:
regexp_replace(input_str, chr(49824) || '+', chr(32))
Try CHR(32) instead of CHR(49824)
select length(replace('766195491 572',chr(32),'')) from dual;
If it does not work, use something like this.
select length(regexp_replace('766195491 572','[^-a-zA-Z0-9]','') ) from dual;
DEMO
I have a string like this:
"Jim-Bob's email ###hl###address###endhl### is: jb#example.com"
I want to replace all non-word characters (symbols and whitespace), except the ### delimiters.
I'm currently using:
str.gsub(/[^\w#]+/, 'X')
which yields:
"JimXBobXsXemailX###hl###address###endhl###XisXjb#exampleXcom"
In practice, this is good enough, but it offends me for two reasons:
The # in the email address is not replaced.
The use of [^\w] instead of \W feels sloppy.
How do I replace all non-word characters, unless those characters make up the ###hl### or ###endhl### delimiter strings?
str.gsub(/(###.*?###|\w+)|./) { $1 || "X" }
# => "JimXBobXsXemailX###hl###address###endhl###XisXXjbXexampleXcom"
This approach uses the fact that alternations work like case structure: the first matching one consumes the corresponding string, then no further matching is done on it. Thus, ###.*?### will consume a marker (like ###hl###; nothing else will be matched inside it. We also match any sequence of word characters. If any of those are captured, we can just return them as-is ($1). If not, then we match any other character (i.e. not inside a marker, and not a word character) and replace it with "X".
Regarding your second point, I think you are asking too much; there is no simple way to avoid that.
Regarding the first point, a simple way is to temporarily replace "###" with a character that you will never use (let's say you are using a system without "\r", so that that character is not used; we can use that as a temporal replacement).
"Jim-Bob's email ###hl###address###endhl### is: jb#example.com"
.gsub("###", "\r").gsub(/[^\w\r]/, "X").gsub("\r", "###")
# => "JimXBobXsXemailX###hl###address###endhl###XisXXjbXexampleXcom"
In general I want to use smartypants to convert straight quotes, and -- into en-dashes, etc. But occasionally I want two hyphens, not in a literal string (between backticks) I tried with -\- or -\ -, but it still gets converted into an en-dash. Is there any way I can force two consecutive hyphens without disabling smartypants? (It must work inside inline formatting, by the way).
Example:
Some times I want an en-dash -- like here -- but not when
I must explicitly write **two hyphens (--)**
The last -- should stay as is.
Use a backslash:
\--
0123456789
I have a string seperated by \t and ,, but the number of \t is not fixed, for example :
a=["seg1\tseg2\t\tseg3,seg4"]
seg2 and seg3 is seperated by two \t.
So I try to split them by
a.split(/\t+|,/)
it print the right anwser :
["seg1", "seg2", "seg3", "seg4"]
And I also try this
a.split(/[\t+,]/)
but the answer is
["seg1", "seg2", "", "seg3", "seg4"]
Why ruby print different results?
Because \t+ inside [] does not mean "one or more tabs", it means "a tab or a plus". Since it finds two consecutive tabs, it splits twice, and the string in the middle becomes empty.
Most special characters, like . + * ? etc, when placed in an interval become "regular" characters. There are some exceptions, like ^ (which negates the interval when placed at the beginning), the \ (that escapes the next character(s), just like it does outside intervals) and the ] (that closes the interval; another [ is also disallowed there). So, [\t+,] actually means '\t' or '+' or ','.
Unfortunatly, I don't know any reference for the full set of characters that need or don't need escaping inside an interval. In doubt, I tend to escape just to be sure. In any case, an interval will always match a single character only, if you want something different you must put your quantifier outside the interval. (For example: [\t,]+, if you also admit two commas in a row; otherwise, your first regex is really the correct one)