I have a cust_nm column in a database. It holds the last name, then a comma and a space, then the first name, then a space and the middle initial:
TUNGESVIK, MARK M
I want to run an Oracle query to output this format.
If all your names are really in that exact format, you can do something like this:
with x as (
  select 'TUNGESVIK, MARK M' cust_nm from dual
)
select substr( cust_nm, 1, instr(cust_nm, ', ')-1 ) last_name,
       substr( cust_nm, instr(cust_nm, ', ')+2, instr(cust_nm, ' ', -1) - instr(cust_nm, ', ')-2) first_name,
       substr( cust_nm, instr(cust_nm, ' ', -1)+1, length(cust_nm) ) middle_initial
from x;

LAST_NAME FIRS M
--------- ---- -
TUNGESVIK MARK M
Once you include people who have no middle initial (or several), people with spaces in their last or first name, and the likelihood that at least some names are in a different format altogether, things get a lot more challenging. There are software products whose only purpose is to take incoming name data, parse it, scrub it, and standardize it. Writing your own code to handle every corner case is likely to take far more time than you expect.
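As a minimal sketch of just one of those corner cases (a missing middle initial), the query below uses REGEXP_SUBSTR with its subexpression argument, which assumes Oracle 11g or later. The second sample row, 'SMITH, JOHN', is made up for illustration. It still assumes a single-word first name and a ', ' separator, so it is not a general name parser.

```sql
with x as (
  select 'TUNGESVIK, MARK M' cust_nm from dual union all
  select 'SMITH, JOHN' from dual          -- hypothetical row without a middle initial
)
select regexp_substr(cust_nm, '^[^,]+')                       last_name,      -- everything before the comma
       regexp_substr(cust_nm, ', ([^ ]+)', 1, 1, null, 1)     first_name,     -- first word after ', '
       regexp_substr(cust_nm, ', [^ ]+ (.+)$', 1, 1, null, 1) middle_initial  -- rest of the string, NULL if absent
from x;
```

For the row without a middle initial the last pattern does not match, so middle_initial comes back as NULL instead of repeating the first name.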
How can I check in Oracle whether the fifth position is a letter, and thus not a number?
My last try was using the following statement:
REGEXP_LIKE (table_column, '([abcdefghijklmnopqrstuvxyz])');
Perhaps you'd rather check whether 5th position contains a number (which means that it is not something else), i.e. do the opposite of what you're doing now.
Why? Because a "letter" isn't only ASCII; have a look at the 4th row in my example - it contains Croatian characters and these aren't between [a-z] (nor [A-Z]).
with test (col) as
  (select 'abc_3def' from dual union all
   select 'A435D887' from dual union all
   select '!#$%&/()' from dual union all
   select 'ASDĐŠŽĆČ' from dual
  )
select col,
       case when regexp_like(substr(col, 5, 1), '\d+') then 'number'
            else 'not a number'
       end result
from test;

COL           RESULT
------------- ------------
abc_3def      number
A435D887      not a number
!#$%&/()      not a number
ASDĐŠŽĆČ      not a number
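Anchor the pattern to the start of the string, or you may get unexpected results. The query below correctly returns no rows, because the fifth character of 'abcd4fg' is a digit. Remove the caret (the start-of-string anchor) and it returns 'TRUE', because the pattern can then match starting at the second character ('bcd4' followed by 'f'). Note that it uses the case-insensitive flag 'i'.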
select 'TRUE'
from dual
where regexp_like('abcd4fg', '^.{4}[A-Z]', 'i');
Yet another way to do it:
regexp_like(table_column, '^....[[:alpha:]]')
The character class [[:alpha:]] picks up all letters (upper case, lower case, accented, and so on) but ignores digits, punctuation and whitespace characters.
If what you care about is that the character is not a number, then use
not regexp_like(table_column, '^....[[:digit:]]')
or
not regexp_like(table_column, '^....\d')
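To compare the two checks side by side, here is a small sketch that reuses sample values from the earlier answer. The column aliases is_letter and is_digit are illustrative names only. Whether the accented characters count as [[:alpha:]] can depend on the database character set, so treat the last row as something to verify on your own system.

```sql
with test (col) as (
  select 'abc_3def' from dual union all
  select 'A435D887' from dual union all
  select '!#$%&/()' from dual union all
  select 'ASDĐŠŽĆČ' from dual
)
select col,
       -- fifth character is a letter
       case when regexp_like(col, '^.{4}[[:alpha:]]') then 'Y' else 'N' end is_letter,
       -- fifth character is a digit
       case when regexp_like(col, '^.{4}[[:digit:]]') then 'Y' else 'N' end is_digit
from test;
```

The row '!#$%&/()' is the interesting one: its fifth character is neither a letter nor a digit, so "is a letter" and "is not a number" are not interchangeable tests.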
Try:
REGEXP_LIKE (table_column, '^....[a-z]')
Or:
SUBSTR (table_column, 5, 1 ) BETWEEN 'a' AND 'z'
I have a string 'MCDONALD_YYYYMMDD.TXT'. I need to use regular expressions to append '**' after the letter 'D' in the given string, i.e. at position 9 I need to insert '*' characters, with the count taken from a column value 'star_len':
if star_len = 2, the output is 'MCDONALD??_YYYYMMDD.TXT'
if star_len = 1, the output is 'MCDONALD?_YYYYMMDD.TXT'
with
inputs ( filename, position, symbol, len ) as (
select 'MCDONALD_20170812.TXT', 9, '*', 2 from dual
)
-- End of simulated inputs (for testing purposes only, not part of the solution).
-- SQL query begins BELOW THIS LINE.
select substr(filename, 1, position - 1) || rpad(symbol, len, symbol)
|| substr(filename, position) as new_str
from inputs
;
NEW_STR
-----------------------
MCDONALD**_20170812.TXT
select regexp_replace('MCDONALD_YYYYMMDD.TXT','MCDONALD','MCDONALD' ||
decode(star_len,1,'*',2,'**'))
from dual
This is how you could do it, with star_len coming from your own table rather than dual. I don't think you need a regular expression, though, if the prefix is always going to be "MCDONALD".
EDIT: If you need to be providing the position in the string as well, I think a regular old substring should work.
select substr('MCDONALD_YYYYMMDD.TXT',1,position-1) ||
decode(star_len,1,'*',2,'**') || substr('MCDONALD_YYYYMMDD.TXT',position)
from dual
Here position and star_len are both columns in some table you provide (instead of dual).
EDIT2: Just to be more clear, here is another example using a with clause so that it runs without adding a table in.
with testing as
(select 'MCDONALD_YYYYMMDD.TXT' filename,
9 positionnum,
2 star_len
from dual)
select substr(filename,1,positionnum-1) ||
decode(star_len,1,'*',2,'**') ||
substr(filename,positionnum)
from testing
For the fun of it, here's a regexp_replace solution. I went with a star, since that's what your variable was called, even though your example used a question mark. The regex captures the filename string in two parts: the first runs from the start up to one character before the position value, and the second is the rest of the string. The replacement puts the captured parts back together with the stars in between.
with tbl(filename, position, star_len ) as (
select 'MCDONALD_20170812.TXT', 9, 2 from dual
)
select regexp_replace(filename,
'^(.{'||(position-1)||'})(.*)$', '\1'||rpad('*', star_len, '*')||'\2') as fixed
from tbl;
This should be a pretty simple question. I have two fields: a year field and a month field. The month field is an integer, so a single-digit month such as 6 for June has no leading zero. I want to concatenate the two fields to get 201406, not the 20146 that plain concatenation gives me now. I tried
year||to_char(month,'09') but the field is being displayed as 2014 06 with a space in-between the year and month. Is there a way to do this without a space?
If your output contains a space, then either your year or your month column contains a space. To get rid of these, you can use TRIM:
with v_data(year, month) as (
select '2015 ', ' 1' from dual union all
select ' 2014 ', ' 12 ' from dual union all
select '2014', '3' from dual
)
select trim(year) || lpad(trim(month), 2, '0')
from v_data
(this assumes that you really have two string columns - if you indeed have two date columns, please add example input to your question)
UPDATE
If you want to use to_char() instead, you should use the FM format modifier to get rid of the space:
select trim(year) || trim(to_char(month, 'FM09'))
from v_data
The issue is that, by default, to_char leaves a space in front of a positive formatted number, so that they line up well with negative numbers. To prevent this, use to_char(month,'fm09').
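A quick way to see the difference is to run the variants side by side. This sketch assumes numeric year and month values; the CTE name v_data is borrowed from the earlier answer and the column aliases are illustrative.

```sql
with v_data (year, month) as (
  select 2014, 6  from dual union all
  select 2014, 12 from dual
)
select year || to_char(month, '09')   as with_space,     -- leading blank reserved for the sign
       year || to_char(month, 'FM09') as without_space,  -- FM suppresses the padding
       year || lpad(month, 2, '0')    as using_lpad      -- month is implicitly converted to a string
from v_data;
```

For month 6 the first column shows '2014 06', while the other two show '201406'.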
I have two rows whose varchar column values differ according to a Java .equals(). I can't easily change or debug the Java code that's running against this particular database, but I can run queries directly against the database using SQL Developer. The fields look the same to me (they are street addresses with two lines separated by either a line feed or a carriage return/line feed combination).
Is there a way to see all of the hidden characters in the result of a query? I'd like to avoid having to use the ascii() function with substr() on each of the rows to figure out which hidden character is different.
I'd also accept some query that shows me which character is the first difference between the two fields.
Try
select dump(column_name) from table
More information is in the documentation.
As for finding the position where the character differs, this might give you an idea:
create table tq84_compare (
id number,
col varchar2(20)
);
insert into tq84_compare values (1, 'hello world');
insert into tq84_compare values (2, 'hello' || chr(9) || 'world');
with c as (
select
(select col from tq84_compare where id = 1) col1,
(select col from tq84_compare where id = 2) col2
from
dual
),
l as (
select
level l from dual
start with 1=1
connect by level < (select length(c.col1) from c)
)
select
max(l.l) + 1 position
from c,l
where substr(c.col1,1,l.l) = substr(c.col2,1,l.l);
SELECT DUMP('€ÁÑ', 1016)
FROM DUAL
... will print something like:
Typ=96 Len=3 CharacterSet=WE8MSWIN1252: 80,c1,d1
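For the line-separator case described in the question, a minimal sketch is to DUMP two literals that look identical on screen but use different line endings. The literals and the aliases crlf and lf are made up for illustration.

```sql
select dump('a' || chr(13) || chr(10) || 'b') as crlf,  -- carriage return + line feed
       dump('a' || chr(10) || 'b')            as lf     -- line feed only
from dual;
```

In an ASCII-compatible character set the first byte list should read 97,13,10,98 and the second 97,10,98, so the extra 13 (carriage return) stands out immediately.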
Please look at the data below first. I need to calculate the columns HUB_NM, PRODUCT_NM and STRIP_NM from the first two columns, as described.
DEAL_ORIGINATION  EXCH_SYMBOL                                        HUB_NM      PRODUCT_NM             STRIP_NM
----------------  -------------------------------------------------  ----------  ---------------------  ------------
TT_ICE            IPE e-Gas Oil DEC 2010                                         IPE e-Gas Oil          DEC 2010
GLOBEX            HO DEC 2010                                                    HO                     DEC 2010
ICE               NG Firm Phys, ID, GDD - Transco-45 - Next Day Gas  Transco-45  NG Firm Phys, ID, GDD  Next Day Gas
STUSCO_ICE        Brent Crude Futures - North Sea - Dec12                        Brent Crude Futures    DEC12
I can't work out how to do it. I know I should use SUBSTR and INSTR but I can't figure it out.
A) How to get HUB_NM column value from EXCH_SYMBOL?
If T.DEAL_ORIGINATION = 'ICE'
then
Find 1st space dash space
Find 2nd space dash space
Display the word in between, no space at the end
elsif T.DEAL_ORIGINATION in ('GLOBEX', 'TT_ICE', 'STUSCO_ICE')
then
null;
end if;
B) How to get PRODUCT_NM column value from EXCH_SYMBOL?
If T.DEAL_ORIGINATION in ( 'ICE', 'STUSCO_ICE')
then
Display from 1st character to the 1st dash, no space at the end
elsif T.DEAL_ORIGINATION in ('GLOBEX', 'TT_ICE')
then
Remove the last 9 characters from the end of the string and display the front part, no space at the end
end if;
C) How to get STRIP_NM column value from EXCH_SYMBOL?
If T.DEAL_ORIGINATION in ( 'ICE', 'STUSCO_ICE')
then
Find the 2nd space dash space
Display from then on to the end of the word, no space at the end
elsif T.DEAL_ORIGINATION in ('GLOBEX', 'TT_ICE')
then
Display the last 8 characters of the string, no space at the end
end if;
Let's start with some statements to create sample data.
CREATE TABLE mytab
(
DEAL_ORIGINATION VARCHAR2(100),
EXCH_SYMBOL VARCHAR2(100),
HUB_NM VARCHAR2(100),
PRODUCT_NM VARCHAR2(100),
STRIP_NM VARCHAR2(100)
);
INSERT INTO mytab (DEAL_ORIGINATION, EXCH_SYMBOL, HUB_NM, PRODUCT_NM, STRIP_NM)
VALUES ('TT_ICE', 'IPE e-Gas Oil DEC 2010', null, 'IPE e-Gas Oil', 'DEC 2010' );
INSERT INTO mytab (DEAL_ORIGINATION, EXCH_SYMBOL, HUB_NM, PRODUCT_NM, STRIP_NM)
VALUES ('GLOBEX', 'HO DEC 2010',null, 'HO', 'DEC 2010' );
INSERT INTO mytab (DEAL_ORIGINATION, EXCH_SYMBOL, HUB_NM, PRODUCT_NM, STRIP_NM)
VALUES ('ICE', 'NG Firm Phys, ID, GDD - Transco-45 - Next Day Gas', 'Transco-45', 'NG Firm Phys, ID, GDD', 'Next Day Gas');
INSERT INTO mytab (DEAL_ORIGINATION, EXCH_SYMBOL, HUB_NM, PRODUCT_NM, STRIP_NM)
VALUES ('STUSCO_ICE', 'Brent Crude Futures - North Sea - Dec12', null, 'Brent Crude Futures', 'DEC12');
Working out what the INSTR and SUBSTR results will be takes some effort, and you are unlikely to get it right just by thinking it through or by writing out piles of nested parentheses.
My advice is to write a temporary select instruction with partial results, like the following:
SELECT deal_origination, exch_symbol,
INSTR(exch_symbol,' - ')+3 as string_start,
INSTR( SUBSTR(EXCH_SYMBOL,INSTR(exch_symbol,' - ')+3) , ' - ')-1 string_length ,
SUBSTR(exch_symbol, INSTR(exch_symbol,' - ')+3, INSTR( SUBSTR(EXCH_SYMBOL,INSTR(exch_symbol,' - ')+3) , ' - ')-1 ) as RESULT
FROM mytab
Note that the RESULT column is built from the same expressions as the string_start and string_length columns.
This also answers question A.
The query shows you the intermediate results, so you can see what happens inside the expression. Then put everything into a DECODE expression.
Example 2:
decode ( DEAL_ORIGINATION,
'ICE', 'results in case of ICE',
'GLOBEX', 'results in case of GLOBEX',
null)
-- the last null is the default condition
Finally, to remove 9 characters from the end of a word, use the LENGTH function.
Example 3:
-- this removes the last 6 characters from the 'hello world' string, leaving 'hello'
select substr ( 'hello world', 1, length('hello world') - 6 ) from dual;
Apologies: I was unable to test this Oracle code on an actual machine.
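Putting the pieces together, the rules A, B and C from the question could be sketched as a single query. This version swaps DECODE for CASE expressions (because the rules use IN lists) and uses the occurrence argument of INSTR to find the second ' - ' directly. The inline sample rows stand in for the real table, and the name sample_data is an assumption for illustration.

```sql
with sample_data (deal_origination, exch_symbol) as (
  select 'TT_ICE',     'IPE e-Gas Oil DEC 2010'                            from dual union all
  select 'GLOBEX',     'HO DEC 2010'                                       from dual union all
  select 'ICE',        'NG Firm Phys, ID, GDD - Transco-45 - Next Day Gas' from dual union all
  select 'STUSCO_ICE', 'Brent Crude Futures - North Sea - Dec12'           from dual
)
select deal_origination,
       exch_symbol,
       -- A) text between the 1st and 2nd ' - ', for ICE only (NULL otherwise)
       case when deal_origination = 'ICE'
            then substr(exch_symbol,
                        instr(exch_symbol, ' - ') + 3,
                        instr(exch_symbol, ' - ', 1, 2) - instr(exch_symbol, ' - ') - 3)
       end as hub_nm,
       -- B) text before the 1st ' - ', or everything except the last 9 characters
       case when deal_origination in ('ICE', 'STUSCO_ICE')
            then substr(exch_symbol, 1, instr(exch_symbol, ' - ') - 1)
            when deal_origination in ('GLOBEX', 'TT_ICE')
            then substr(exch_symbol, 1, length(exch_symbol) - 9)
       end as product_nm,
       -- C) text after the 2nd ' - ', or the last 8 characters
       case when deal_origination in ('ICE', 'STUSCO_ICE')
            then substr(exch_symbol, instr(exch_symbol, ' - ', 1, 2) + 3)
            when deal_origination in ('GLOBEX', 'TT_ICE')
            then substr(exch_symbol, -8)
       end as strip_nm
from sample_data;
```

The STUSCO_ICE row yields 'Dec12' as written in the source string; wrapping that branch in UPPER() would be needed to get the 'DEC12' shown in the question's expected output.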