Format string in Oracle - oracle

I'm building a string in Oracle, where I get a number from a column and left-pad it to 12 digits with the LPAD function, so its length is now 12.
Example: LPad(nProjectNr,12,'0') and I get 000123856812 (for example).
Now I want to split this string into groups of 3 digits, each prefixed with a "\", so that the result looks like this: \000\123\856\812.
How can I achieve this in a select statement? What function can accomplish this?

Assuming strings of 12 digits, regexp_replace could be a way:
select regexp_replace('000123856812', '(.{3})', '\\\1') from dual
The regexp matches sequences of 3 characters and adds a \ as a prefix
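For comparison, the same substitution can be sketched with Python's re module. This is an illustration outside Oracle, and the function name prefix_groups is hypothetical. As in the Oracle call, the replacement is an escaped backslash followed by the \1 backreference.

```python
import re

def prefix_groups(digits: str, size: int = 3) -> str:
    """Insert a backslash before every group of `size` characters."""
    # In the replacement, r"\\" is a literal backslash and r"\1" is the captured group.
    return re.sub(r"(.{%d})" % size, r"\\\1", digits)

print(prefix_groups("000123856812"))  # \000\123\856\812
```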

It is much easier to do this using TO_CHAR(number) with the proper format model. Suppose we use \ as the thousands separator.... (alas we can't start a format model with a thousands separator - not allowed in TO_CHAR - so we still need to concatenate a \ to the left):
See also edit below
select 123856812 as n,
'\' || to_char(123856812, 'FM000G000G000G000', 'nls_numeric_characters=.\') as str
from dual
;
N STR
--------- ----------------
123856812 \000\123\856\812
Without the FM format model modifier, TO_CHAR will add a leading space (placeholder for the sign, plus or minus). FM means "shortest possible string representation consistent with the model provided" - that is, in this case, no leading space.
Edit - it just crossed my mind that we can exploit TO_CHAR() even further and not need to concatenate the first \. The thousands separator, G, may not be the first character of the string, but the currency symbol, placeholder L, can!
select 123856812 as n,
to_char(123856812, 'FML000G000G000G000',
'nls_numeric_characters=.\, nls_currency=\') as str
from dual
;

SUBSTR returns a substring of a string passed as the first argument. You can specify where the substring starts and how many characters it should be.
Try
SELECT '\'||SUBSTR('000123856812', 1,3)||'\'||SUBSTR('000123856812', 4,3)||'\'||SUBSTR('000123856812', 7,3)||'\'||SUBSTR('000123856812', 10,3) FROM dual;

Related

Regular expression to remove a portion of text from each entry in a comma-separated list

I have a string of comma separated values, that I want to trim down for display purpose.
The string is a comma separated list of values of varying lengths and number of list entries.
Each entry in the list is formatted as a five character pattern in the format "##-NX" followed by some text.
e.g., "01-NX sometext, 02-NX morertext, 09-NX othertext, 12-NX etc..."
Is there a regular expression function I can use to remove the text after the 5 character prefix portion of each entry in the list, returning "01-NX, 02-NX, 09-NX, 12-NX,..."?
I am a novice with regular expressions and I haven't been able to figure out how to code the pattern.
I think what you need is
regexp_replace(regexp_replace(mystring, '(\d{2}-NX)(.*?)(,)', '\1\3'), '(\d{2}.*NX).*', '\1')
The inner REGEXP_REPLACE looks for a pattern like nn-NX (two numeric characters followed by "-NX") and any number of characters up to the next comma, then replaces it with the first and third term, dropping the "any number of characters" part.
The outer REGEXP_REPLACE looks for a pattern like two numeric characters followed by any number of characters up to the last NX, and keeps that part of the string.
Here is the Oracle code I used for testing:
with a as (
select '01-NX sometext, 02-NX morertext, 09-NX othertext, 12-NX etc.' as myString
from dual
)
select mystring
, regexp_replace(regexp_replace(mystring, '(\d{2}-NX)(.*?)(,)', '\1\3'), '(\d{2}.*NX).*', '\1') as output
from a
This alternative calls REGEXP_REPLACE() once.
Match 2 digits, a dash and 'NX', followed by zero or more characters (non-greedy), up to a comma or the end of the string. Replace the match with the first and third groups; the third group is either the comma or the end of the string.
EDIT: Took dougp's advice and eliminated the RTRIM by adding the 3rd capture group. Thanks for that!
WITH tbl(str) AS (
SELECT '01-NX sometext, 02-NX morertext, 09-NX othertext, 12-NX etc.' FROM dual
)
SELECT
REGEXP_REPLACE(str, '(\d{2}-NX)(.*?)(,|$)', '\1\3') str
from tbl;
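The single-pass pattern behaves the same way in Python's re module, which makes it easy to experiment with outside the database. This is only a sketch for checking the pattern; the function name keep_prefixes is hypothetical.

```python
import re

def keep_prefixes(entries: str) -> str:
    # Group 1: the ##-NX prefix; group 2: the trailing text (lazy match);
    # group 3: the comma or end of string that terminates the entry.
    return re.sub(r"(\d{2}-NX)(.*?)(,|$)", r"\1\3", entries)

print(keep_prefixes("01-NX sometext, 02-NX morertext, 09-NX othertext, 12-NX etc."))
# 01-NX, 02-NX, 09-NX, 12-NX
```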

Oracle Contains statement with special characters

I would like to search for this string 'A&G BROS, INC.' using oracle contains statement
FROM contact
WHERE CONTAINS
(name, 'A&G BROS, INC.') > 0
But I do not get accurate results: I get over 300,000 records, basically anything containing INC.
I tried escaping the & char using
FROM contact
WHERE CONTAINS
(name, 'A&' || 'G BROS, INC.') > 0
I still get the same massive result set.
Any idea how to run this query with these special characters? I want to narrow the results down so I can at least get results that start with "A&G". Note: "LIKE" and "INSTR" cannot be used.
Another way to deal with the special characters is to use the function CHR(n), where n is the ASCII value of the special character. For &, it is 38, so instead of
'A&G BROS, INC.' you can use 'A'||CHR(38)||'G BROS, INC.'
Using these special characters directly in literals can be tricky, because they can behave differently in different environments.
You can find the ASCII value of a character using the ASCII function, like this:
select ascii('&') from dual;
ASCII('&')
38
The & is AND, but the , is also ACCUM. The behaviour of those operators explains what you are seeing.
You need to escape those characters:
To query on words or symbols that have special meaning in query expressions such as and & or| accum, you must escape them. There are two ways to escape characters in a query expression...
So you could do:
FROM contact
WHERE CONTAINS
(name, 'A\&G BROS\, INC.') > 0
or
FROM contact
WHERE CONTAINS
(name, 'A{&}G BROS{,} INC.') > 0
or
FROM contact
WHERE CONTAINS
(name, '{A&G BROS, INC.}') > 0
If you can't stop your client prompting for substitution variables - which is really a separate issue to the contains escapes - then you could combine this with your original approach:
FROM contact
WHERE CONTAINS
(name, '{A&' || 'G BROS, INC.}') > 0

Why doesn't it pad the space characters correctly when there are foreign characters?

My goal is to export a file with fixed-width columns. I have the following HQL:
insert overwrite table destination_table
select concat(rpad(p.artist_name,40," "),rpad(p.release_name,40," "))
from source_table p;
"destination_table" is an external table which writes to a file. When artist_name and release_name contains normal English characters, no problem, the result is the following:
paulo kuong[29 space characters]I am terribly stuck album
I get 40-character fixed-width columns. However, when the strings are not English, I get:
장재인[31 space characters]다른 누구도 아닌 너에게
There are supposed to be 37 space characters. RPAD does not seem to pad the spaces correctly. When I do "length(장재인)" it returns 3 characters, so there is something weird going on with lpad and rpad in Hive.
Any idea?
I think rpad works as expected. According to the documentation,
rpad(string str, int len, string pad)
#Returns str, right-padded with pad to a length of len
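So, in your case the length of 장재인[31 space characters] should be 40.
In short, the length of 장재인 is being counted as 9, which is its length in UTF-8 bytes (three characters of three bytes each).
I did a check in Python (Python 2, where a plain string literal is a byte string), and the length of 장재인 is indeed 9.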
>>> a = '장재인'
>>> len(a)
9
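The two counts can be compared side by side in Python 3, where len() counts characters and the UTF-8 byte length has to be requested explicitly. The sketch below only illustrates the arithmetic (31 spaces when counting bytes, 37 when counting characters); it makes no claim about Hive's internals, and both helper names are hypothetical.

```python
def rpad_chars(s: str, width: int, pad: str = " ") -> str:
    """Pad (or truncate) to `width` characters."""
    return s[:width].ljust(width, pad)

def rpad_bytes(s: str, width: int, pad: str = " ", encoding: str = "utf-8") -> str:
    """Pad until the encoded form is `width` bytes long (no truncation here)."""
    missing = width - len(s.encode(encoding))
    return s + pad * max(missing, 0)

name = "장재인"
print(len(name))                       # 3 characters
print(len(name.encode("utf-8")))       # 9 bytes
print(rpad_chars(name, 40).count(" ")) # 37 spaces
print(rpad_bytes(name, 40).count(" ")) # 31 spaces
```

If the consumer of the file counts characters, the character-based variant is the one to match; if it counts bytes, the byte-based one is.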

Oracle Pattern matching

In Oracle I want to check whether a string has an '=' sign at the end. Could you please let me know how to check it? If the string has an '=' sign at the end, I need to trim that trailing '=' sign.
For example:
varStr VARCHAR2(20);
varStr := 'abcdef='; -- the trailing '=' sign needs to be trimmed
I don't think you need "pattern matching" here. Just check if the last character is the =
where substr(varstr, -1, 1) = '='
substr when called with a negative position will work from the end of the string, so substr(varstr,-1,1) extracts the last character of the given string.
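The same check, followed by dropping the character when it matches, can be sketched in Python. This is an illustration of the logic only, not Oracle code, and the function name is hypothetical.

```python
def strip_trailing_equals(s: str) -> str:
    # s[-1:] is the last character (or "" for an empty string),
    # which plays the role of substr(varstr, -1, 1).
    return s[:-1] if s[-1:] == "=" else s

print(strip_trailing_equals("abcdef="))  # abcdef
print(strip_trailing_equals("abcdef"))   # abcdef
```

Only a single trailing '=' is removed, so 'a==' becomes 'a='.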
Use the REGEXP_LIKE function. I'm giving a SQL statement since you didn't specify the context in your question:
select *
from someTable
where regexp_like( someField, '=$' );
The pattern $ means that the preceding character must be at the end of the string.
see it here on sql fiddle: http://sqlfiddle.com/#!4/d8afd/3
It seems that substr is the way to go, at least with my sample data: on about 400K address lines, this returns 1043 entries that end in 'r' in an average of 0.2 seconds.
select count(*) from addrline where substr(text, -1, 1) = 'r';
On the other hand, the following returns the same results but takes 1.1 seconds.
select count(*) from addrline where regexp_like(text, 'r$' );

Oracle Builtin String Character Classes

Does Oracle have built-in string character class constants (digits, letters, alphanum, upper, lower, etc)?
My actual goal is to efficiently return only the digits [0-9] from an existing string.
Unfortunately, we still use Oracle 9, so regular expressions are not an option here.
Examples
The field should contain zero to three letters, 3 or 4 digits, then zero to two letters. I want to extract the digits.
String --> Result
ABC1234YY --> 1234
D456YD --> 456
455PN --> 455
No string constants, but you can do:
select translate
( mystring
, '0'||translate (mystring, 'x0123456789', 'x')
, '0'
)
from mytable;
For example:
select translate
( mystring
, '0'||translate (mystring, 'x0123456789', 'x')
, '0'
)
from
( select 'fdkhsd1237ehjsdf7623A#L:P' as mystring from dual);
TRANSLAT
--------
12377623
If you want to do this often you can wrap it up as a function:
create function only_digits (mystring varchar2) return varchar2
is
begin
return
translate
( mystring
, '0'||translate (mystring, 'x0123456789', 'x')
, '0'
);
end;
/
Then:
SQL> select only_digits ('fdkhsd1237ehjsdf7623A#L:P') from dual;
ONLY_DIGITS('FDKHSD1237EHJSDF7623A#L:P')
-----------------------------------------------------------------
12377623
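The nested TRANSLATE can be easier to follow when split into its two steps: first remove the digits to find out which characters are unwanted, then remove exactly those characters from the original string. The Python sketch below mirrors that logic; it is an illustration rather than Oracle code, and it reuses the only_digits name from the function above.

```python
DIGITS = "0123456789"

def only_digits(s: str) -> str:
    # Step 1 (inner TRANSLATE): drop the digits, leaving every unwanted character.
    non_digits = s.translate({ord(d): None for d in DIGITS})
    # Step 2 (outer TRANSLATE): drop all of those unwanted characters from the input.
    return s.translate({ord(c): None for c in non_digits})

print(only_digits("fdkhsd1237ehjsdf7623A#L:P"))  # 12377623
print(only_digits("ABC1234YY"))                  # 1234
```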
You can check the list of predefined datatypes in Oracle here, but you are not going to find what you are looking for.
To extract the digits from a string you can use some combination of these functions:
TO_NUMBER, to convert a string to a number.
REPLACE, to remove occurrences.
TRANSLATE, to convert characters.
If you provide a more concise example, it will be easier to give you a detailed solution.
If you are able to use PL/SQL here, another approach is to write your own regular expression matcher function. One starting point is Rob Pike's elegant, very tiny regular expression matcher in Chapter 1 of Beautiful Code. One of the exercises for the reader is to add character classes. (You'd first need to translate his 30 lines of C code into PL/SQL.)
