Trimming chr(49824) in the middle of a field in Oracle

I am unable to trim the non-breaking space in the middle of a field in Oracle:
'766195491 572'
I tried the method below, but it works only when the non-breaking space is at the sides.
select length(trim(replace('766195491 572',chr(49824),''))) from dual;

it works only when non breakable space is present on the sides
That’s what the trim() function is supposed to do:
TRIM enables you to trim leading or trailing characters (or both) from a character string
“leading or trailing” means “at the sides”. It is not supposed to have any effect on appearances of the characters anywhere else in the source string.
You need to use the replace() or translate() functions instead; or for more complicated scenarios, regular expression functions.

If the input value is in a column named input_str, then:
translate(input_str, chr(49824), chr(32))
will replace every non-breakable space in the input string with a regular (breakable) space.
If you simply want to remove all non-breakable spaces and don't want to replace them with anything, then
replace(input_str, chr(49824))
(if you omit the third argument, the result is simply removing all occurrences of the second argument).
Perhaps the requirement is more complicated though; find all occurrences of one or more consecutive non-breaking spaces and replace each such occurrence with exactly one standard space. That is more easily achieved with a regular expression function:
regexp_replace(input_str, chr(49824) || '+', chr(32))
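The three operations above can also be sketched outside the database. The following is Ruby, not Oracle code, and is only an illustration: it assumes a UTF-8 character set, in which chr(49824) is the two-byte sequence 0xC2 0xA0, that is, the character U+00A0. The variable names are made up for the example.

```ruby
NBSP = "\u00A0" # non-breaking space, U+00A0

# The UTF-8 bytes of U+00A0, read as one big-endian integer, give 49824.
nbsp_code = NBSP.bytes.inject(0) { |acc, byte| acc * 256 + byte }

# Sample value with two consecutive non-breaking spaces in the middle.
input = "766195491#{NBSP}#{NBSP}572"

# Like translate(input_str, chr(49824), chr(32)): one-for-one substitution.
translated = input.tr(NBSP, " ")

# Like replace(input_str, chr(49824)): remove every occurrence.
removed = input.delete(NBSP)

# Like regexp_replace(input_str, chr(49824) || '+', chr(32)):
# each run of non-breaking spaces becomes exactly one regular space.
collapsed = input.gsub(/\u00A0+/, " ")

puts nbsp_code
puts translated.inspect
puts removed.inspect
puts collapsed.inspect
```

Only the last variant changes the number of separators, which is why it needs a regular expression rather than a plain character substitution.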

Try CHR(32) instead of CHR(49824)
select length(replace('766195491 572',chr(32),'')) from dual;
If it does not work, use something like this.
select length(regexp_replace('766195491 572','[^-a-zA-Z0-9]','') ) from dual;

Related

Calling a value starting with $ character in Oracle

Let's say you have a table called Employee and one of the employee names begins or includes the [$] or [#] sign within the String, like Hel$lo or like #heLLo. How can you call the value?
Is it possible to select and trim the value in a single command?
Kind regards
If you want to select the names, but with special characters $ and # removed, you can use the TRANSLATE function. Add more characters to the list if you need to.
select translate(name, 'A$#', 'A') from employee;
The function will "translate" the character 'A' to itself, '$' and '#' to nothing (simply removing them from the string), and it will leave all other characters - other than A, $ and # - unchanged. It may seem odd that you need the 'A' in this whole business; and you really don't need 'A' specifically, but you do need some character that you want to keep. The reason for that is Oracle's idiotic handling of null; don't worry about the reason, just remember the technique.
You may need to remove characters but you don't know in advance what they will be. That can be done too, but you need to be careful not to remove legitimate characters, like the dot (A. C. Green), dash (John Connor-Smith), apostrophe (Betty O'Rourke) etc. You can then do it either with regular expressions (easy to write, but not the most efficient) or with TRANSLATE as above (it looks uglier, but it will run faster). Something like this:
select regexp_replace(name, '[^[:alpha:].''-]') from employee;
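This will replace any character that is not "alpha" (letters) or one of the characters specifically enumerated (dot, apostrophe, dash) with nothing, effectively removing them. The pattern is a string literal, so the apostrophe inside it must be doubled. Note that dash has a special meaning in character classes, so it must be the first or the last character in the enumeration. Spaces are not in the list either, so they are removed as well; add a space to the bracket expression if the names may contain them.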
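The effect of the bracket expression is easy to check outside the database. Here is a small Ruby sketch (not Oracle code; the sample names are the ones mentioned above, and the variable names are invented for the example). It also shows what happens to spaces when they are not listed.

```ruby
names = ["A. C. Green", "John Connor-Smith", "Betty O'Rourke", 'Hel$lo', '#heLLo']

# Known unwanted characters: delete them directly (similar in spirit to TRANSLATE).
known_removed = names.map { |n| n.delete('$#') }

# Unknown unwanted characters: keep only letters, dot, apostrophe and dash.
# The dash comes last so that it is taken literally.
strict = names.map { |n| n.gsub(/[^[:alpha:].'-]/, "") }

# Same bracket expression with a space added, so multi-word names survive.
with_space = names.map { |n| n.gsub(/[^[:alpha:] .'-]/, "") }

puts known_removed.inspect
puts strict.inspect
puts with_space.inspect
```

The strict variant turns "A. C. Green" into "A.C.Green", which is probably not what you want for names.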
If you need to make the changes in the table itself, you can use an update statement, using TRANSLATE or REGEXP_REPLACE as shown above.

Convert Oracle function regexp_instr to PostgreSQL

I came across the following Oracle function while converting an Oracle 11g schema to PostgreSQL 11:
ADD CONSTRAINT valid_session_time_zone CHECK (regexp_instr(trim(both session_time_zone),'(\+|\-|\s)?(0?[0-9]|1[01234])(:[0-5]\d)')=1);
So I translated that into a PostgreSQL 11 DOMAIN:
CREATE DOMAIN chk_time_zone AS VARCHAR CHECK ( VALUE ~* '(\+|\-|\s)?(0?[0-9]|1[01234])(:[0-5]\d)');
But how do I trim both sides of the string beforehand in my domain expression?
Perhaps it's just my bad eyesight, but it looks to me that the original code has a problem. The original regexp requires that the matched string start with one of +, -, or blank, but the TRIM call would remove any leading blanks. Thus if the string to be matched starts with a blank which doesn't have a + or - after it, the TRIM call will remove the blank, and thus the pattern won't match. My recommendation is to just ignore the fact that the TRIM call is there, because it appears to be a potential bug, and proceed with the match as ~* '(\+|\-|\s)?(0?[0-9]|1[01234])(:[0-5]\d)'.
Your Oracle check doesn't manipulate the data entered into the table, it just checks that a trimmed version of it matches a pattern, so clearly it's OK to insert leading/trailing whitespace, or something elsewhere trims them off...
Thus I think you can ask PGSQL to disregard leading/trailing whitespace by permitting it in the pattern:
CREATE DOMAIN chk_time_zone AS VARCHAR CHECK
( VALUE ~* '\s*(\+|\-|\s)?(0?[0-9]|1[01234])(:[0-5]\d)\s*');
Note that the regex as entered doesn't contain any anchors to the start or the end, so the pattern could occur anywhere in the data for a match. Add ^ at the start and $ at the end of the regex if the input must match the pattern entirely.
For example: 'abcdef' matches '[cd]+' but not '^[cd]+$'
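The difference between an anchored and an unanchored pattern can be tried out quickly in Ruby. This is only a sketch of the regex behaviour, not PostgreSQL code: Ruby's ^ and $ match at line boundaries, so the sketch uses \A and \z for "whole string", whereas PostgreSQL's ^ and $ match at the ends of the string by default. The constant names and sample inputs are invented for the example.

```ruby
# The small example from above: unanchored vs. anchored.
loose_hit  = "abcdef".match?(/[cd]+/)       # the pattern occurs somewhere inside
strict_hit = "abcdef".match?(/\A[cd]+\z/)   # the whole string must be c's and d's

# The time zone pattern, as entered and with anchors plus optional
# surrounding whitespace.
TZ_LOOSE  = /(\+|\-|\s)?(0?[0-9]|1[01234])(:[0-5]\d)/
TZ_STRICT = /\A\s*(\+|\-|\s)?(0?[0-9]|1[01234])(:[0-5]\d)\s*\z/

results = {
  loose_with_junk:    "junk +05:30 junk".match?(TZ_LOOSE),
  strict_with_junk:   "junk +05:30 junk".match?(TZ_STRICT),
  strict_with_spaces: "  +05:30 ".match?(TZ_STRICT)
}

puts loose_hit, strict_hit
puts results.inspect
```

Without anchors, the check accepts any value that merely contains something that looks like an offset.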
The regexp_instr function itself is available out of the box in PostgreSQL 15 and later.
https://www.depesz.com/2021/11/26/waiting-for-postgresql-15-add-assorted-new-regexp_xxx-sql-functions/#more-4025
https://www.postgresql.org/docs/15/functions-matching.html#FUNCTIONS-POSIX-REGEXP

Add spaces before Capital Letters in Oracle

I am trying to insert a space before the capital letters in oracle. I thought it would be easy using a regexp_replace, but I can't seem to get a proper back reference to the character I am replacing.
select trim(regexp_replace ('FreddyFox', '[A-Z]', ' \1' )) from dual;
Result: '\1reddy \1ox'
I have tried multiple variants of a back reference but I can't seem to find something that satisfies Oracle.
I did look at multiple SO answers but I could not figure out what is wrong.
e.g. regexp_replace: insert a space in a string if not already present
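The back reference \1 needs a capture group to refer to, so wrap the character class in parentheses:
select trim(regexp_replace('FreddyFox', '([A-Z])', ' \1')) from dual;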
TRIM enables you to trim leading or trailing characters (or both) from a character string. If trim_character or trim_source is a character literal, then you must enclose it in single quotes. Default is both.
regexp_replace ('FreddyFox', '^([A-Z])', ' \1')
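The role of the capture group can be seen in a quick Ruby sketch. It is Ruby rather than Oracle, so it only illustrates the regex idea; both use \1 in the replacement string for the first parenthesised group. The variable names are made up for the example.

```ruby
# With a capture group, \1 stands for the capital letter that was matched.
spaced = "FreddyFox".gsub(/([A-Z])/, ' \1').strip

# Anchoring the pattern at the start of the string limits the replacement
# to a leading capital, so only one space is inserted (at the front).
leading_only = "FreddyFox".sub(/\A([A-Z])/, ' \1')

puts spaced.inspect
puts leading_only.inspect
```

The trim (strip in Ruby) is needed because the first capital also gets a space in front of it.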

Replace non-word characters, unless given sequence matches

I have a string like this:
"Jim-Bob's email ###hl###address###endhl### is: jb#example.com"
I want to replace all non-word characters (symbols and whitespace), except the ### delimiters.
I'm currently using:
str.gsub(/[^\w#]+/, 'X')
which yields:
"JimXBobXsXemailX###hl###address###endhl###XisXjb#exampleXcom"
In practice, this is good enough, but it offends me for two reasons:
The # in the email address is not replaced.
The use of [^\w] instead of \W feels sloppy.
How do I replace all non-word characters, unless those characters make up the ###hl### or ###endhl### delimiter strings?
str.gsub(/(###.*?###|\w+)|./) { $1 || "X" }
# => "JimXBobXsXemailX###hl###address###endhl###XisXXjbXexampleXcom"
This approach uses the fact that alternations work like a case structure: the first alternative that matches consumes the corresponding string, and no further matching is done on it. Thus, ###.*?### will consume a marker (like ###hl###); nothing else will be matched inside it. We also match any sequence of word characters. If either of those is captured, we can just return it as-is ($1). If not, then we match any other character (i.e. not inside a marker, and not a word character) and replace it with "X".
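To make the behaviour easier to poke at, the same expression can be wrapped in a small method. This is just a sketch; the method name mask_outside_markers and the filler parameter are invented here and are not part of the original answer.

```ruby
# Replace every character that is neither a word character nor part of a
# ###...### marker with the filler string.
def mask_outside_markers(str, filler = "X")
  # First alternative (captured): a whole marker or a run of word characters.
  # Second alternative (not captured): any other single character.
  str.gsub(/(###.*?###|\w+)|./) { $1 || filler }
end

sample = "Jim-Bob's email ###hl###address###endhl### is: jb#example.com"
masked = mask_outside_markers(sample)

# A lone ### with no closing ### is not a marker, so each # is replaced.
unpaired = mask_outside_markers("a ### b")

puts masked
puts unpaired
```

Note that anything between a pair of ### is kept untouched, whatever it contains.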
Regarding your second point, I think you are asking too much; there is no simple way to avoid that.
Regarding the first point, a simple way is to temporarily replace "###" with a character that you will never use (let's say your data never contains "\r"; then we can use that character as a temporary replacement).
"Jim-Bob's email ###hl###address###endhl### is: jb#example.com"
.gsub("###", "\r").gsub(/[^\w\r]/, "X").gsub("\r", "###")
# => "JimXBobXsXemailX###hl###address###endhl###XisXXjbXexampleXcom"
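The placeholder trick only works if the placeholder really never occurs in the input. A hedged sketch that makes this assumption explicit (the method name, the PLACEHOLDER constant and the guard are additions for illustration, not part of the answer):

```ruby
PLACEHOLDER = "\r" # assumed never to appear in real input

def mask_with_placeholder(str, filler = "X")
  # Refuse input that would make the final substitution ambiguous.
  raise ArgumentError, "input contains the placeholder" if str.include?(PLACEHOLDER)

  str.gsub("###", PLACEHOLDER)    # hide the delimiters
     .gsub(/[^\w\r]/, filler)     # replace every other non-word character
     .gsub(PLACEHOLDER, "###")    # restore the delimiters
end

sample = "Jim-Bob's email ###hl###address###endhl### is: jb#example.com"
masked = mask_with_placeholder(sample)
puts masked
```

Without the guard, an input such as "a\rb" would come back as "a###b", because the stray carriage return is indistinguishable from a hidden delimiter.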

What's the difference between /\t+|,/ and /[\t+,]/ when splitting a string using Ruby?

I have a string separated by \t and ,, but the number of \t is not fixed, for example:
a = "seg1\tseg2\t\tseg3,seg4"
seg2 and seg3 are separated by two \t.
So I tried to split them with
a.split(/\t+|,/)
which prints the right answer:
["seg1", "seg2", "seg3", "seg4"]
I also tried this:
a.split(/[\t+,]/)
but the answer is
["seg1", "seg2", "", "seg3", "seg4"]
Why does Ruby print different results?
Because \t+ inside [] does not mean "one or more tabs", it means "a tab or a plus". Since it finds two consecutive tabs, it splits twice, and the string in the middle becomes empty.
Most special characters, like . + * ? etc, when placed in an interval (a character class, written with square brackets) become "regular" characters. There are some exceptions, like ^ (which negates the interval when placed at the beginning), the \ (that escapes the next character(s), just like it does outside intervals) and the ] (that closes the interval; another [ is also disallowed there). So, [\t+,] actually means '\t' or '+' or ','.
Unfortunately, I don't know any reference for the full set of characters that need or don't need escaping inside an interval. When in doubt, I tend to escape just to be sure. In any case, an interval will always match a single character only; if you want something different you must put your quantifier outside the interval. (For example: [\t,]+, if you also admit two commas in a row; otherwise, your first regex is really the correct one.)
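The three patterns discussed here can be compared side by side. A short sketch, using the string from the question plus two extra inputs that were made up to show where the patterns differ:

```ruby
line = "seg1\tseg2\t\tseg3,seg4"

alternation   = line.split(/\t+|,/)   # one or more tabs, or a single comma
plus_in_class = line.split(/[\t+,]/)  # a single tab, plus sign or comma
plus_outside  = line.split(/[\t,]+/)  # one or more tabs/commas in a row

# Inside the brackets the + is a literal character, so it acts as a separator.
literal_plus = "a+b".split(/[\t+,]/)

# The first and third patterns differ on consecutive commas.
commas_alternation  = "a,,b".split(/\t+|,/)
commas_plus_outside = "a,,b".split(/[\t,]+/)

puts alternation.inspect, plus_in_class.inspect, plus_outside.inspect
puts literal_plus.inspect
puts commas_alternation.inspect, commas_plus_outside.inspect
```

Which of the first and third patterns is right depends on whether an empty field between two commas should be preserved.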

Resources