Suppose this is my table:
ID STRING
1 'ABC'
2 'DAE'
3 'BYYYYYY'
4 'H'
I want to select all rows that have at least one of the characters in the STRING column somewhere in another row's STRING variable.
For example, 1 and 2 have an A in common and 1 and 3 have a B in common, but 4 does not have any characters in common with any of the other rows. So my query should return only the first three rows.
I don't need to know with which line it matched.
Thanks!
@A.B.Cade: Good solution, but it can be done without DISTINCT or a join.
SELECT * FROM test t1
WHERE EXISTS
(
SELECT * FROM test t2
WHERE t1.id<>t2.id AND
regexp_like(t1.string, '['|| replace(t2.string, '.[]', '\.\[\]')||']')
)
The query doesn't compare the string against every other row: EXISTS stops the comparison as soon as one match is found for the current row.
See fiddle.
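As a rough illustration of the short-circuit behaviour described above, here is a Python sketch of the same logic (not Oracle SQL). The sample data comes from the question; the function name shares_char_with_another_row is made up for this example, and set intersection stands in for the regexp character class.

```python
# Sample rows from the question: id -> string.
rows = {1: "ABC", 2: "DAE", 3: "BYYYYYY", 4: "H"}


def shares_char_with_another_row(row_id, rows):
    """Mimic WHERE EXISTS: any() stops at the first other row sharing a character."""
    chars = set(rows[row_id])
    return any(
        other_id != row_id and chars & set(other)
        for other_id, other in rows.items()
    )


# Keep only the ids for which at least one other row shares a character.
matching_ids = [row_id for row_id in rows if shares_char_with_another_row(row_id, rows)]
print(matching_ids)
```

Because any() returns on the first truthy element, no row is compared against more rows than necessary, which is the point the answer makes about EXISTS.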
@GolezTrol's answer is a good one, but here is another approach:
select distinct t1."ID", t1."STRING"
from table1 t1, table1 t2
where t1."ID" <> t2."ID"
and regexp_like(t1."STRING", '['|| t2."STRING"||']')
First, take a Cartesian product of the table with itself.
Then make sure you're not comparing a string to itself.
Then build a regexp from one string to compare against the other: [<string1>] means that the string must contain at least one of the characters inside the [ ], which all come from string1.
Here is a fiddle
Like this:
select distinct
id, name
from
(select distinct
x.id,
x.NAME,
length(x.NAME) as leng,
substr(x.name, level, 1) as namechar
from
YourTable x
start with
level = 0
connect by
level <= length(x.name)) y
where
exists
(select
'x'
from
YourTable z
where
instr(z.name, y.namechar) > 0 and
z.id <> y.id)
order by
id
What it does:
First (inner select), use the table with a number generator that returns a number for each letter in the name. Each record in YourTable is now returned Length(Name) times, each time with a different number. That generated number is used to isolate a single letter (substr).
Then (subselect in the top-level where clause), check whether records exist that contain that isolated letter. Distinct is needed because records are returned more than once if more than one letter matches. You could add namechar to the outer select field list to see the letters that match.
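The three steps can be sketched in Python to make the data flow visible. This is only a model of the idea, not the Oracle query: the list comprehension stands in for the CONNECT BY number generator, and the sample data is taken from the question rather than from YourTable.

```python
# Sample rows from the question: id -> string.
rows = {1: "ABC", 2: "DAE", 3: "BYYYYYY", 4: "H"}

# Step 1 (inner select): one (id, character) pair per position in the string.
exploded = [(row_id, ch) for row_id, s in rows.items() for ch in s]

# Step 2 (EXISTS subselect): keep pairs whose character occurs in some other row.
hits = [
    (row_id, ch)
    for row_id, ch in exploded
    if any(other_id != row_id and ch in other for other_id, other in rows.items())
]

# Step 3 (outer DISTINCT): a row with several matching letters is reported once.
matching_ids = sorted({row_id for row_id, _ in hits})
print(hits)
print(matching_ids)
```

Row 1 appears twice in hits (once for A, once for B), which is why the final de-duplication step is needed.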
I need help.
I have the records 123, 456 and 789 in rows. This query works:
select * from table1 where num1 in('123','456')
but when I execute
select * from table1 where num1 in(select value from table2)
no result set is returned. Why?
Check the data type: VARCHAR2 or NUMBER.
Try:
select * from table1 where num1 in(select to_char(value) from table2)
Storing comma-separated values could be the cause of the problem.
You can try using regexp_substr to split on the comma.
First and foremost, an important thing to remember: do not store numbers in character datatypes. Use NUMBER or INTEGER. Secondly, always prefer the VARCHAR2 datatype over CHAR if you wish to store strings longer than one character.
You said in one of your comments that the num1 column is of type char(4). The problem with the CHAR datatype is that if your string is 3 characters wide, it is stored with 1 extra space character to make it 4 characters. VARCHAR2 values only store as many characters as you pass when inserting/updating and are not blank-padded.
To verify that you may run select length(any_char_col) from t;
Coming to your problem, the IN condition is never satisfied because what's actually being compared is
WHERE 'abc ' = 'abc' - note the extra space in the left-side operand.
To fix this, one good option is to pad the right-side expression with as many spaces as required to make the comparison work. The function RPAD( string1, padded_length [, pad_string] ) could be used for this purpose. So, your query should look something like this.
select * from table1 where num1 IN (select rpad(value,4) from table2);
This will likely utilise an index on the column num1 if it exists.
The other option is to use RTRIM on the left-hand side, which can only use an index if there's a function-based index on RTRIM(num1).
select * from table1 where RTRIM(num1) in(select value from table2);
So, the takeaway from all these examples is always use NUMBER types to store numbers and prefer VARCHAR2 over CHAR for strings.
See Demo to fully understand what's happening.
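To see the padding problem outside the database, here is a small Python model of it. It assumes the behaviour described above (CHAR(4) values are blank-padded, the subquery values are not); as_char is a hypothetical helper, ljust stands in for RPAD (unlike RPAD it never truncates) and rstrip stands in for RTRIM.

```python
def as_char(value, width):
    """Model a CHAR(width) column: values are blank-padded on the right."""
    return value.ljust(width)


num1_values = [as_char(v, 4) for v in ("123", "456", "789")]  # table1.num1 as CHAR(4)
table2_values = ["123", "456"]  # table2.value, not padded

# Plain comparison: '123 ' never equals '123', so nothing matches.
plain = [v for v in num1_values if v in table2_values]

# Option 1: pad the right-hand side, like RPAD(value, 4).
padded_rhs = [t.ljust(4) for t in table2_values]
padded = [v for v in num1_values if v in padded_rhs]

# Option 2: trim the left-hand side, like RTRIM(num1).
trimmed = [v.rstrip() for v in num1_values if v.rstrip() in table2_values]

print(plain, padded, trimmed)
```

Both fixes find the same two rows; they differ only in which side of the comparison is modified, which is what decides whether an index on num1 can be used.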
EDIT: It seems you are storing comma-separated numbers. You could do something like this.
SELECT *
FROM table1 t1
WHERE EXISTS (
SELECT 1
FROM table2 t2
WHERE ',' ||t2.value|| ',' LIKE '%,' || rtrim(t1.num1) || ',%'
);
See Demo2
Storing comma-separated values is bound to cause problems; better to change it.
First of all, the values you have stored in table2 are comma-separated,
so they cannot be matched directly against the values in table1.
That's why you did not get any rows in the result set.
I found the solution using a string array:
SELECT T.* FROM TABLE1 T,
(SELECT TRIM(VALUE)AS VAL FROM TABLE2)TABLE2
WHERE
TRIM(NUM1) IN (SELECT COLUMN_VALUE FROM TABLE(FUNC_GETSTRING_ARRAY(TABLE2.VAL)))
thanks
I have (and don't own, so I can't change) a table with a layout similar to this.
ID | CATEGORIES
---------------
1 | c1
2 | c2,c3
3 | c3,c2
4 | c3
5 | c4,c8,c5,c100
I need to return the rows that contain a specific category id. I started by writing the queries with LIKE statements, because the values can be anywhere in the string
SELECT id FROM table WHERE categories LIKE '%c2%';
Would return rows 2 and 3
SELECT id FROM table WHERE categories LIKE '%c3%' and categories LIKE '%c2%';
Would again get me rows 2 and 3, but not row 4
SELECT id FROM table WHERE categories LIKE '%c3%' or categories LIKE '%c2%';
Would get me rows 2, 3, and 4
I don't like all the LIKE statements. I've found FIND_IN_SET() in the Oracle documentation but it doesn't seem to work in 10g. I get the following error:
ORA-00904: "FIND_IN_SET": invalid identifier
00904. 00000 - "%s: invalid identifier"
when running this query: SELECT id FROM table WHERE FIND_IN_SET('c2', categories); (example from the docs) or this query: SELECT id FROM table WHERE FIND_IN_SET('c2', categories) <> 0; (example from Google)
I would expect it to return rows 2 and 3.
Is there a better way to write these queries instead of using a ton of LIKE statements?
You can, using LIKE. You don't want to match for partial values, so you'll have to include the commas in your search. That also means that you'll have to provide an extra comma to search for values at the beginning or end of your text:
select
*
from
YourTable
where
',' || CommaSeparatedValueColumn || ',' LIKE '%,SearchValue,%'
But this query will be slow, as will all queries using LIKE, especially with a leading wildcard.
And there's always a risk. If there are spaces around the values, or values can contain commas themselves in which case they are surrounded by quotes (like in csv files), this query won't work and you'll have to add even more logic, slowing down your query even more.
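The delimiter-wrapping trick is easy to check outside SQL. Below is a Python sketch of the same comparison, using the sample data from the question; contains_value is a name invented for this example, and a plain substring test stands in for LIKE (so % and _ in the search value are not treated as wildcards here).

```python
def contains_value(csv_text, search_value):
    """Model of: ',' || csv_text || ',' LIKE '%,' || search_value || ',%'."""
    return f",{search_value}," in f",{csv_text},"


categories = {1: "c1", 2: "c2,c3", 3: "c3,c2", 4: "c3", 5: "c4,c8,c5,c100"}

with_c2 = [i for i, c in categories.items() if contains_value(c, "c2")]
with_c1 = [i for i, c in categories.items() if contains_value(c, "c1")]

# Without the extra commas (LIKE '%c1%'), 'c100' is matched by mistake.
naive_c1 = [i for i, c in categories.items() if "c1" in c]

# The risk mentioned above: a space after the comma breaks the match.
spaced = contains_value("c2, c3", "c3")

print(with_c2, with_c1, naive_c1, spaced)
```

The naive search returns row 5 for c1 because of c100, while the wrapped version does not; the last line shows how stray spaces defeat the wrapped version.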
A better solution would be to add a child table for these categories. Or better still, a separate table for the categories, and a table that cross-links them to YourTable.
You can write a PIPELINED table function that returns a one-column table. Each row is a value from the comma-separated string. Use something like this to pop a string from the list and put it as a row into the table:
PIPE ROW(ltrim(rtrim(substr(l_list, 1, l_idx - 1),' '),' '));
Usage:
SELECT * FROM MyTable
WHERE 'c2' IN (SELECT COLUMN_VALUE FROM TABLE(Util_Pkg.split_string(categories)));
See more here: Oracle docs
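A pipelined function hands back rows one at a time, much like a generator. The Python sketch below mirrors that shape under that analogy only; it is not PL/SQL, and split_string here is a stand-in for the Util_Pkg.split_string named in the usage example, whose real body is not shown in the answer.

```python
def split_string(text, delimiter=","):
    """Yield one trimmed token at a time, like PIPE ROW inside a loop."""
    remaining = text
    while remaining:
        idx = remaining.find(delimiter)
        if idx == -1:
            # Last token: nothing left to pop after this one.
            yield remaining.strip(" ")
            return
        # Pop the text before the delimiter, trimmed like ltrim(rtrim(...)).
        yield remaining[:idx].strip(" ")
        remaining = remaining[idx + len(delimiter):]


categories = {1: "c1", 2: "c2,c3", 3: "c3,c2", 4: "c3", 5: "c4,c8,c5,c100"}

# Equivalent of: WHERE 'c2' IN (values produced by the split function)
with_c2 = [i for i, c in categories.items() if "c2" in split_string(c)]
tokens = list(split_string("c4, c8 ,c5,c100"))
print(with_c2, tokens)
```

Since each token is trimmed as it is produced, spaces around the commas do not affect the match.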
Yes and No...
"Yes":
Normalize the data (strongly recommended), i.e. split the categories column so that you have each category in a separate... then you can just query it in the normal fashion...
"No":
As long as you keep this "pseudo-structure" there will be several issues (performance and others) and you will have to do something similar to:
SELECT * FROM MyTable WHERE categories LIKE 'c2,%' OR categories = 'c2' OR categories LIKE '%,c2,%' OR categories LIKE '%,c2'
If you absolutely must, you could define a function named FIND_IN_SET like the following:
CREATE OR REPLACE Function FIND_IN_SET
( vSET IN varchar2, vToFind IN VARCHAR2 )
RETURN number
IS
rRESULT number;
BEGIN
rRESULT := -1;
SELECT COUNT(*) INTO rRESULT FROM DUAL WHERE vSET LIKE ( vToFind || ',%' ) OR vSET = vToFind OR vSET LIKE ('%,' || vToFind || ',%') OR vSET LIKE ('%,' || vToFind);
RETURN rRESULT;
END;
You can then use that function like:
SELECT * FROM MyTable WHERE FIND_IN_SET (categories, 'c2' ) > 0;
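The four conditions in that function cover the value at the start, alone, in the middle and at the end of the list. A Python sketch of the same checks, keeping the argument order of the function above (set first, value second); startswith, endswith and substring tests stand in for the LIKE patterns.

```python
def find_in_set(v_set, v_to_find):
    """Return 1 if v_to_find is a whole element of the comma-separated v_set, else 0."""
    found = (
        v_set.startswith(v_to_find + ",")        # LIKE 'x,%'   : first element
        or v_set == v_to_find                    # = 'x'        : only element
        or ("," + v_to_find + ",") in v_set      # LIKE '%,x,%' : middle element
        or v_set.endswith("," + v_to_find)       # LIKE '%,x'   : last element
    )
    return 1 if found else 0


categories = {1: "c1", 2: "c2,c3", 3: "c3,c2", 4: "c3", 5: "c4,c8,c5,c100"}

with_c2 = [i for i, c in categories.items() if find_in_set(c, "c2") > 0]
print(with_c2)
```

Note that the partial value c1 is not found inside c100, which is the reason for spelling out all four cases instead of using a single LIKE '%c1%'.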
For the sake of future searchers, don't forget the regular expression way:
with tbl as (
select 1 ID, 'c1' CATEGORIES from dual
union
select 2 ID, 'c2,c3' CATEGORIES from dual
union
select 3 ID, 'c3,c2' CATEGORIES from dual
union
select 4 ID, 'c3' CATEGORIES from dual
union
select 5 ID, 'c4,c8,c5,c100' CATEGORIES from dual
)
select *
from tbl
where regexp_like(CATEGORIES, '(^|\W)c3(\W|$)');
ID CATEGORIES
---------- -------------
2 c2,c3
3 c3,c2
4 c3
This matches on a word boundary, so even if the comma was followed by a space it would still work. If you want to be more strict and match only where a comma separates values, replace the '\W' with a comma. At any rate, read the regular expression as:
match a group of either the beginning of the line or a word boundary, followed by the target search value, followed by a group of either a word boundary or the end of the line.
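The same pattern can be tried in Python's re module, which also understands \W. This is only a way to experiment with the expression; Oracle's regular expression engine is a different implementation, so treat agreement between the two as likely rather than guaranteed.

```python
import re

# Same pattern as in the query: start-of-string or non-word char, 'c3', non-word char or end.
pattern = re.compile(r"(^|\W)c3(\W|$)")

categories = {1: "c1", 2: "c2,c3", 3: "c3,c2", 4: "c3", 5: "c4,c8,c5,c100"}

with_c3 = [i for i, c in categories.items() if pattern.search(c)]

# A space after the comma still matches; a longer id such as 'c30' does not.
spaced_ok = pattern.search("c2, c3") is not None
partial = pattern.search("c30,c1") is not None

print(with_c3, spaced_ok, partial)
```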
As long as the comma-delimited list is 512 characters or less, you can also use a regular expression in this instance (Oracle's regular expression functions, e.g., REGEXP_LIKE(), are limited to 512 characters):
SELECT id, categories
FROM mytable
WHERE REGEXP_LIKE('c2', '^(' || REPLACE(categories, ',', '|') || ')$', 'i');
In the above I'm replacing the commas with the regular expression alternation operator |. If your list of delimited values is already |-delimited, so much the better.
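For experimenting with the alternation idea, here is a Python sketch. It differs from the query in one deliberate way: each list element is passed through re.escape, an addition of mine, so that values containing regex metacharacters are treated literally. The helper name in_list is made up for this example.

```python
import re


def in_list(value, delimited, delimiter=","):
    """Turn 'a,b,c' into the pattern (?:a|b|c) and test value against it, ignoring case."""
    alternatives = "|".join(re.escape(part) for part in delimited.split(delimiter))
    return re.fullmatch(f"(?:{alternatives})", value, flags=re.IGNORECASE) is not None


categories = {1: "c1", 2: "c2,c3", 3: "c3,c2", 4: "c3", 5: "c4,c8,c5,c100"}

with_c2 = [i for i, c in categories.items() if in_list("c2", c)]
print(with_c2)
```

fullmatch plays the role of the ^ and $ anchors in the query, and re.IGNORECASE corresponds to the 'i' match parameter.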
I used a query to create a table, which has a SET in one of its columns.
T1:
serial _c3
1 193748 ["special","normal","normal"]
2 263565 ["normal","normal"]
Then I have another Table with serials only.
T2:
serial
1 193748
2 263565
3 636474
4 928396
I want a query that produces serials from T2 IF they are NOT in T1 or if T1's _c3 data has the word "special" in it. I also want a boolean value that indicates if T1 is in T2.
So using above example, I want:
T3:
serial in_t1
1 193748 1
3 636474 0
4 928396 0
Here is my query so far:
SELECT
T2.serial,
array_contains(T1._c3, 'special') as in_t1
FROM T2 LEFT OUTER JOIN T1 ON T1.serial = T2.serial
WHERE T1.serial is NULL OR array_contains(T1._c3, 'special')
LIMIT 50;
So for array_contains in select line I get this error message:
Error while compiling statement: FAILED: cannot recognize input near 'T1' '.' '_c3' in select expression.
When I remove that line from select and just run:
SELECT
T2.serial
FROM T2 LEFT OUTER JOIN T1 ON T1.serial = T2.serial
WHERE T1.serial is NULL OR array_contains(T1._c3, 'special')
LIMIT 50;
I get the same error but in the WHERE clause line now: cannot recognize input near 'T1' '.' '_c3' in select expression
Could you point me in the right direction? Thank you!
_c3 is an illegal alias/column name because its first character is an underscore.
If you want to use it, wrap it in backticks (`).
Another option would be to rename the column.
The cleanest solution would have been to alias the expression in the first place.
A super simple example of my script looks as follows:
-- Report Name: "Report_1"
col letters new_value p_letters
SELECT letters
FROM param_table
WHERE report_name = 'Report_1';
CREATE TABLE temp_table_1
(letter varchar2(1));
INSERT INTO temp_table_1(letter)
SELECT DISTINCT letter
FROM table_alphabet
WHERE '&&p_letters' = '' OR letter IN (&&p_letters);
For some reason, our system has a table called param_table: users enter parameters through the UI, the parameters entered are written to param_table, and then my script pulls the user's parameters from param_table.
As far as I understand, the first SELECT statement selects the letters column from param_table and makes its values accessible in '&&p_letters'. In my INSERT INTO statement, when my WHERE clause looks like this...
WHERE letter IN (&&p_letters);
...and the user enters letters in single quotes, e.g. ('A', 'B', 'C'), the script runs fine. I want to make the parameter optional, so I adjusted the WHERE clause like this:
WHERE '&&p_letters' = '' OR letter IN (&&p_letters);
In my output file, I get the following error:
WHERE (('' = '') OR letter IN ()) *
ERROR at line ...:
ORA-00936: missing expression
The compiler has evaluated the substitution variable correctly as '', but I'm getting an error.
Any idea what I could be doing wrong here?
The ORA-00936 is because IN () is not valid: there is no expression inside the parentheses. That is what it is complaining about, not the '' = '' part, though that comparison never evaluates to true either, because Oracle treats '' as null. You can check both conditions:
SQL> select * from dual where '' = '';
no rows selected
SQL> select * from dual where dummy in ();
select * from dual where dummy in ()
*
ERROR at line 1:
ORA-00936: missing expression
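The "no rows selected" result for '' = '' follows from Oracle treating the empty string as null, so the comparison is unknown rather than true. A tiny Python model of that rule, with None standing for unknown; oracle_eq and where_keeps_row are names invented for this sketch and the model covers only string equality.

```python
def oracle_eq(left, right):
    """Model of '=' when '' is treated as NULL: returns True, False, or None (unknown)."""
    left = None if left == "" else left
    right = None if right == "" else right
    if left is None or right is None:
        return None  # any comparison with NULL is unknown
    return left == right


def where_keeps_row(condition):
    """A WHERE clause keeps a row only when the condition is strictly true."""
    return condition is True


print(oracle_eq("", ""), where_keeps_row(oracle_eq("", "")))
print(oracle_eq("A", "A"), where_keeps_row(oracle_eq("A", "A")))
```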
If you set verify on you can see how the substitution is handled. For your original query you'd see:
old:INSERT INTO temp_table_1(letter)
SELECT DISTINCT letter
FROM table_alphabet
WHERE letter IN (&&p_letters)
new:INSERT INTO temp_table_1(letter)
SELECT DISTINCT letter
FROM table_alphabet
WHERE letter IN ('A','B','C')
3 rows inserted.
You can see that the post-substitution statement looks, and is, valid.
With your modified query you'd see:
old:INSERT INTO temp_table_1(letter)
SELECT DISTINCT letter
FROM table_alphabet
WHERE '&&p_letters' = '' OR letter IN (&&p_letters)
new:INSERT INTO temp_table_1(letter)
SELECT DISTINCT letter
FROM table_alphabet
WHERE ''A','B','C'' = '' OR letter IN ('A','B','C')
which generates an ORA-00920 because of the messed-up single quotes in the first expression. With no value from letters you'd instead see:
old:INSERT INTO temp_table_1(letter)
SELECT DISTINCT letter
FROM table_alphabet
WHERE '&&p_letters' = '' OR letter IN (&&p_letters)
new:INSERT INTO temp_table_1(letter)
SELECT DISTINCT letter
FROM table_alphabet
WHERE '' = '' OR letter IN ()
which is the error you saw, ORA-00936.
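What the old/new output shows is plain text replacement: the client pastes the variable's value into the statement before the database ever parses it. A Python sketch of that step, using only the WHERE line from the script; substitute is a hypothetical helper, not something SQL*Plus exposes.

```python
template = "WHERE '&&p_letters' = '' OR letter IN (&&p_letters)"


def substitute(statement, value):
    """Replace the &&p_letters placeholder with raw text; no quoting or parsing happens."""
    return statement.replace("&&p_letters", value)


with_letters = substitute(template, "'A','B','C'")
without_letters = substitute(template, "")

print(with_letters)
print(without_letters)
```

Neither result is valid SQL: the first has unbalanced-looking quotes around the list, and the second ends in an empty IN ().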
I'd be tempted to do this with a collection type, either your own, or if you're comfortable with it then a built-in one:
INSERT INTO temp_table_1(letter)
SELECT DISTINCT letter
FROM table_alphabet
WHERE SYS.DBMS_DEBUG_VC2COLL(&&p_letters) IS EMPTY
OR letter MEMBER OF SYS.DBMS_DEBUG_VC2COLL(&&p_letters);
That works with your three comma-separated values, or null, since an empty collection is allowed. Read more about is empty and member of.
It would be better, of course, to not store comma-separated lists in a single column value anyway, and to change your data model so this kind of manipulation and reliance on client behaviour isn't necessary.
Assuming you're stuck with the data model, you could at least avoid the client reliance by tokenizing the string (I'm using one common approach below) and looking for matches. However, you also need to account for either the report name not being in the table at all or the report existing with no letters value, both of which are handled by the max(letters) .. is null check - which makes it a bit ugly.
It's all in one statement though, with no need for a separate query to get the parameters and no need for substitution variables. (And there may be better ways to do it!)
INSERT INTO temp_table_1 (letter)
SELECT DISTINCT letter
FROM table_alphabet
WHERE (
SELECT MAX(letters)
FROM param_table
WHERE report_name = 'Report_2'
) IS NULL
OR letter IN (
SELECT TRIM(q'[']' FROM REGEXP_SUBSTR(letters, '[^,]+', 1, LEVEL))
FROM param_table
WHERE report_name = 'Report_2'
CONNECT BY REGEXP_SUBSTR(letters, '[^,]+', 1, level) IS NOT NULL
);
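The intent of that statement, stripped of the SQL mechanics, is: return everything when no letters are stored, otherwise split the stored string on commas, strip the quotes and filter. A Python sketch of that logic; select_letters and its arguments are made up for this example, and None models both a missing report row and a null letters value.

```python
def select_letters(alphabet, letters_param):
    """Return every letter when no parameter is stored, else only the listed ones."""
    if not letters_param:
        # Covers both cases handled by the MAX(letters) IS NULL check.
        return sorted(set(alphabet))
    # Tokenize on commas, then strip spaces and the surrounding single quotes.
    wanted = {token.strip().strip("'") for token in letters_param.split(",")}
    return sorted(letter for letter in set(alphabet) if letter in wanted)


print(select_letters("ABCDE", "'A','B','C'"))
print(select_letters("ABCDE", None))
```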
The requirement may seem a bit odd, but bear with me: Let's say I have a list of my employees like this:
pid name
-------------------------
1 Smith-Gordon
2 Hansen
3 Simpson
And a table of previous names (if e.g. Mrs Smith-Gordon and Mr Hansen had one or more different names before they were married, respectively), employeehist:
pid oldname
-------------------------
1 Smith
2 Taylor
2 Baker
What I want now is to be able to search for names and get results from both tables like this:
a) Search for "Simpson%" -> Get a result like "3, Simpson"
b) Search for "Hansen%" -> Get a result like "2, Hansen"
c) Search for "Taylor%" -> Get a result like "2, Hansen, matched on previous Taylor"
d) Search for "Smith%" -> Get a result like "1, Smith-Gordon"
In other words, I want the current record, plus the old name if that was where the pertinent match occurred.
What I tried so far:
1) Naively join the history to the current employees: The searches b), c) and d) will always contain something in the oldname column, so I can't tell where the match occurred. I also get duplicate hits for Mr Hansen.
2) I tried to UNION a first select on employees (containing a dummy NULL AS oldname) with a second select joining employeehist with employees which will return me a nice hit for search b) without an oldname and one with an oldname for c), but now I predictably get duplicates in d).
Any thoughts?
You can use the following query with a parameter:
SELECT e.pid,
CASE
WHEN e.name LIKE :search_key THEN e.name
WHEN eh.oldname LIKE :search_key THEN e.name || ' matched on previous ' || eh.oldname
END
FROM employees e
LEFT JOIN employeehist eh on (e.pid = eh.pid)
WHERE e.name LIKE :search_key OR eh.oldname LIKE :search_key
I have come up with this solution:
SELECT * FROM ( /* (3) outer filter query */
SELECT e.pid, e.name, /* (1) query combining current and matching old names */
CASE
WHEN e.name LIKE :search_key THEN 'Y'
ELSE 'N'
END AS primary_match,
(
SELECT oldname /* (2) subquery that gives me one or no matching old name */
FROM employeehist eh
WHERE eh.pid = e.pid
AND eh.oldname LIKE :search_key
AND ROWNUM=1
) AS oldname
FROM employees e
) combined
WHERE combined.primary_match = 'Y' OR combined.oldname IS NOT NULL;
There's one primary select (1) that gets me all current ids and names, and adds a CASE column saying whether the name matched. Additionally, it runs a subquery (2) that gets me one matching old name (even if there are several, or none if there are none). With that in hand I can use an outer select (3) that filters away rows with no matches.
This would return e.g. for search key "Smith%"
pid | name | primary_match | oldname
1 | Smith-Gordon | Y | Smith
or for "Taylor%"
pid | name | primary_match | oldname
2 | Hansen | N | Taylor
I'm not sure how elegant it is, but it works as I want:
I get one result per matching current pid, no matter how many old names that pid has, matching or not. No duplicates.
I can distinguish between results that matched on the current name and those that ("only" or "also") matched on old names.
I don't need to define my matching condition twice because it gets rolled into that CASE column and I can filter on that.
There's obviously room for improvement: The subquery (2) could be made to return an aggregate of all matching old names (or the newest or oldest, I have a column for that).
But this works for me.
I have found a better solution than my previous one. My problem was that I couldn't GROUP BY pid and "squash" differing oldname rows. I'm quite sure I remember that this was possible in MySQL, but Oracle only ever gave me "ORA-00979: not a GROUP BY expression". Strict but fair.
The solution is apparently to provide Oracle with a strategy how to deal with those rows:
SELECT pid, name,
MIN(oldname) KEEP (DENSE_RANK FIRST ORDER BY oldname NULLS FIRST) as oldname
/*(3) outer select combines current and old hits, and "squashes" duplicates, preferring current hits where available*/
FROM (
SELECT e.pid, e.name, null AS oldname /*(1) hits in current names*/
FROM employees e
WHERE e.name LIKE :search_key
UNION ALL
SELECT e.pid, e.name, eh.oldname /* (2) hits in old names*/
FROM employeehist eh
JOIN employees e ON e.pid = eh.pid
WHERE eh.oldname LIKE :search_key
) combined
GROUP BY pid, name;
The idea is simple: Run a query (1) that gives all matches in current names (plus a dummy "oldname" column with NULLs), then a query (2) that gives all matches in old names (complete with their joined current names to display). Then simply combine those, and remove the duplicates by pid (and name, because Oracle, but that's identical by definition) giving preference to rows where oldname is NULL.
This would return e.g. for search key "Smith%"
pid | name | oldname
1 | Smith-Gordon | NULL
which is exactly what I want. If there's a pid with a current and an old match, I don't care about the old one. Or for "Taylor%":
pid | name | oldname
2 | Hansen | Taylor
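For anyone who wants to play with the idea without a database, here is a Python sketch of the three steps using the sample data from the question. It is a model only: like() is a minimal stand-in for SQL LIKE, the final loop imitates MIN(oldname) KEEP (DENSE_RANK FIRST ... NULLS FIRST), and the result is sorted by pid for readability although the query has no ORDER BY.

```python
import re

employees = {1: "Smith-Gordon", 2: "Hansen", 3: "Simpson"}
employeehist = [(1, "Smith"), (2, "Taylor"), (2, "Baker")]


def like(value, pattern):
    """Minimal SQL LIKE: '%' matches any run of characters, '_' exactly one."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch) for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def search(search_key):
    # (1) hits in current names, with a dummy old name of None
    combined = [(pid, name, None) for pid, name in employees.items() if like(name, search_key)]
    # (2) hits in old names, joined to the current name
    combined += [(pid, employees[pid], old) for pid, old in employeehist if like(old, search_key)]
    # (3) one row per pid: a current-name hit (None) wins, otherwise the smallest old name
    best = {}
    for pid, name, old in combined:
        if pid not in best:
            best[pid] = (pid, name, old)
        else:
            kept_old = best[pid][2]
            if kept_old is not None and (old is None or old < kept_old):
                best[pid] = (pid, name, old)
    return [best[pid] for pid in sorted(best)]


print(search("Smith%"))
print(search("Taylor%"))
```

For "Smith%" both the current and the old name match, and the row with no old name is the one that survives, as in the query.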
This query also appears to be roughly 10 times faster than my other solution - I guess because it avoids subqueries that depend on the current pid.
So the only odd thing is that I need to use MIN(oldname) instead of some form of identity. I get that Oracle needs an aggregate function here, but the whole point of the KEEP ... FIRST exercise is to only have one row anyway, no?
But it works, and it's fast, so I won't complain.