How to get multiple values in same cell in Oracle - oracle

I have a table in Oracle where there are two columns. In the first column, sometimes there are duplicate values that corresspond to a different value in the second column. How can I write a query that shows only unique values of the first column and all possible values from the second column?
The table looks somewhat like below
COLUMN_1 | COLUMN_2
NUMBER_1 | 4
NUMBER_2 | 4
NUMBER_3 | 1
NUMBER_3 | 6
NUMBER_4 | 3
NUMBER_4 | 4
NUMBER_4 | 5
NUMBER_4 | 6

You can use listagg() if you are using Oracle 11G or higher like
SELECT
COLUMN_1,
LISTAGG(COLUMN_2, '|') WITHIN GROUP (ORDER BY COLUMN_2) "ListValues"
FROM table1
GROUP BY COLUMN_1
Else, see this link for an alternative for lower versions
Oracle equivalent of MySQL group_concat

Related

Utilize function-based spatial index in SELECT list

I have an Oracle 18c table called LINES with 1000 rows. The DDL for the table can be found here: db<>fiddle.
The data looks like this:
create table lines (shape sdo_geometry);
insert into lines (shape) values (sdo_geometry(2002, 26917, null, sdo_elem_info_array(1, 2, 1), sdo_ordinate_array(574360, 4767080, 574200, 4766980)));
insert into lines (shape) values (sdo_geometry(2002, 26917, null, sdo_elem_info_array(1, 2, 1), sdo_ordinate_array(573650, 4769050, 573580, 4768870)));
insert into lines (shape) values (sdo_geometry(2002, 26917, null, sdo_elem_info_array(1, 2, 1), sdo_ordinate_array(574290, 4767090, 574200, 4767070)));
insert into lines (shape) values (sdo_geometry(2002, 26917, null, sdo_elem_info_array(1, 2, 1), sdo_ordinate_array(571430, 4768160, 571260, 4768040)));
...
I've created a function that's intentionally slow — for testing purposes. The function takes the SDO_GEOMETRY lines and outputs a SDO_GEOEMTRY point.
create or replace function slow_function(shape in sdo_geometry) return sdo_geometry
deterministic is
begin
return
--Deliberately make the function slow for testing purposes...
-- ...convert from SDO_GEOMETRY to JSON and back, several times, for no reason.
sdo_util.from_json(sdo_util.to_json(sdo_util.from_json(sdo_util.to_json(sdo_util.from_json(sdo_util.to_json(sdo_util.from_json(sdo_util.to_json(sdo_util.from_json(sdo_util.to_json(
sdo_lrs.geom_segment_start_pt(shape)
))))))))));
end;
As an experiment, I want to create a function-based spatial index, as a way to pre-compute the result of the slow function.
Steps:
Create an entry in USER_SDO_GEOM_METADATA:
insert into user_sdo_geom_metadata (table_name, column_name, diminfo, srid)
values (
'lines',
'infrastr.slow_function(shape)',
-- 🡅 Important: Include the function owner.
sdo_dim_array (
sdo_dim_element('X', 567471.222, 575329.362, 0.5), --note to self: these coordinates are wrong.
sdo_dim_element('Y', 4757654.961, 4769799.360, 0.5)
),
26917
);
commit;
Create a function-based spatial index:
create index lines_idx on lines (slow_function(shape)) indextype is mdsys.spatial_index_v2;
Problem:
When I use the function in the SELECT list of a query, the index isn't being used. Instead, it's doing a full table scan...so the query is still slow when I select all rows (CTRL+ENTER in SQL Developer).
You might ask, "Why select all rows?" Answer: That's how mapping software often works...you display all (or most) of the points in the map — all at once.
explain plan for
select
slow_function(shape)
from
lines
select * from table(dbms_xplan.display);
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 34 | 7 (0)| 00:00:01 |
| 1 | TABLE ACCESS FULL| LINES | 1 | 34 | 7 (0)| 00:00:01 |
---------------------------------------------------------------------------
Likewise, in my mapping software (ArcGIS Desktop 10.7.1), the map doesn't utilize the index either. I can tell, because the points are slow to draw in the map.
I'm aware that it's possible to create a view, and then register that view in USER_SDO_GEOM_METADATA (in addition to registering the index). And use that view in the map. I've tried that, but the mapping software still doesn't use the index.
I've also tried an SQL hint, but no luck — I don't think the hint is being used:
create or replace view lines_vw as (
select
/*+ INDEX (lines lines_idx) */
cast(rownum as number(38,0)) as objectid, --the mapping software needs a unique ID column
slow_function(shape) as shape
from
lines
where
slow_function(shape) is not null --https://stackoverflow.com/a/59581129/5576771
)
Question:
How can I utilize the function-based spatial index in the SELECT list in a query?
A spatial index is invoked only by the WHERE clause, not the SELECT list. A function in the SELECT list is invoked for every row returned by the WHERE clause, which in your case is SDO_ANYINTERACT( ) returning all rows.
You don't appear to be firing the index; just adding the function call as an attribute is insufficient
select
slow_function(shape)
from
lines
Should be....
select slow_function(shape)
from lines
Where sdo_anyinteract(slow_function(shape),sdo_geometry(2003, 26917,null,sdo_elem_info_array(1,1003,3),sdo_ordinate_array(1,2,3,4)) = 'TRUE'
Where 1,2,3,4 are the values of an optimized rectangle.
I tried using sdo_anyinteract() in the WHERE clause, as #SimonGreener suggested.
Unfortunately, the query still seems to be doing a full table scan (in addition to using the index). I was hoping to only use the index.
select
slow_function(shape) as shape
from
lines
where
sdo_anyinteract(slow_function(shape),
mdsys.sdo_geometry(2003, 26917, null, mdsys.sdo_elem_info_array(1, 1003, 1), mdsys.sdo_ordinate_array(573085.8702, 4771088.3813, 566461.6349, 4768833.3225, 570335.0629, 4757455.1278, 576959.2982, 4759710.1866, 573085.8702, 4771088.3813))
) = 'TRUE'
---------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 46 | 1 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID | LINES | 1 | 46 | 1 (0)| 00:00:01 |
|* 2 | DOMAIN INDEX (SEL: 0.000000 %)| LINES_IDX | | | 1 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
PLAN_TABLE_OUTPUT
-----------------------------------
2 - access("MDSYS"."SDO_ANYINTERACT"("INFRASTR"."SLOW_FUNCTION"("SHAPE"),"MDSYS"."
SDO_GEOMETRY"(2003,26917,NULL,"MDSYS"."SDO_ELEM_INFO_ARRAY"(1,1003,1),"MDSYS"."SDO_OR
DINATE_ARRAY"(573085.8702,4771088.3813,566461.6349,4768833.3225,570335.0629,4757455.1
278,576959.2982,4759710.1866,573085.8702,4771088.3813)))='TRUE')
I played around with a SQL hint: /*+ INDEX (lines lines_idx) */. But that didn't seem to make a difference.
The function takes the SDO_GEOMETRY lines and outputs a SDO_GEOEMTRY point.
A possible alternative might be:
Instead of returning/indexing a geometry column, maybe I could return/index X&Y numeric columns (using a regular non-spatial index). And then either:
A) Convert the XY columns to SDO_GEOMETRY after-the-fact in a query on-the-fly, or...
B) Use GIS software to display the XY data as points in a map. For example, in ArcGIS Pro, create an "XY Event Layer".
That technique seemed to work ok here: Improve performance of startpoint query (ST_GEOMETRY). I was able to utilize the function-based index (non-spatial) in a SELECT clause — making my query significantly faster.
Of course, that technique would work best for points — since converting XYs after-the-fact to point geometries is easy/performant/practical. Whereas converting lines or polygons (maybe from WKT?) to geometries after-the-fact likely wouldn't make much sense. Even if that were possible, it would likely be too slow, and defeat the purpose of precomputing the data in the function-based index in the first place.

Oracle - Column Histograms Showing NONE even after GATHER_TABLE_STATS

I am trying to do performance tuning on a SQL query in Oracle 12c which is using a window partition. There's an index created on HUB_POL_KEY, PIT_EFF_START_DT on the table PIT. While running the explain plan with /*+ gather_plan_statistics */ hint, I observed there's a Window Sort Step in the Explain Plan which is having an Estimated Row Count of 5000K and an Actual Row Count of 1100. I executed DBMS_STATS.GATHER_TABLE_STATS on the table. When I checked in USER_TAB_COLUMNS table, I see there's no histogram generated for HUB_POL_KEY, PIT_EFF_START_DT. However, there's histogram existing for all other columns.
SQL Query
SELECT
PIT.HUB_POL_KEY,
NVL(LEAD(PIT.PIT_EFF_START_DT) OVER (PARTITION BY PIT.HUB_POL_KEY ORDER BY PIT.PIT_EFF_START_DT) ,TO_DATE('31.12.9999', 'DD.MM.YYYY')) EFF_END_DT
FROM PIT
1st Try:
EXEC DBMS_STATS.GATHER_TABLE_STATS('stg','PIT');
2nd Try:
EXEC DBMS_STATS.GATHER_TABLE_STATS('stg','PIT', method_opt=>('FOR COLUMNS SIZE 254 (HUB_POL_KEY,PIT_EFF_START_DT)'));
Checking Histogram:
SELECT HISTOGRAM FROM USER_TAB_COLUMNS
WHERE TABLE_NAME = 'PIT'
AND COLUMN_NAME IN ('HUB_POL_KEY','PIT_EFF_START_DT') --NONE
Table Statistics:
SELECT COUNT(*) FROM PIT --5570253
SELECT COLUMN_NAME,NUM_DISTINCT,NUM_BUCKETS,HISTOGRAM FROM USER_TAB_COL_STATISTICS
WHERE TABLE_NAME = 'PIT'
AND COLUMN_NAME IN ('HUB_POL_KEY','PIT_EFF_START_DT')
+------------------+--------------+-------------+-----------+
| COLUMN_NAME | NUM_DISTINCT | NUM_BUCKETS | HISTOGRAM |
+------------------+--------------+-------------+-----------+
| HUB_POL_KEY | 4703744 | 1 | NONE |
| PIT_EFF_START_DT | 154416 | 1 | NONE |
+------------------+--------------+-------------+-----------+
What am I missing here? Why is the bucket size 1 even when I am running the gather_table_stat procedure with method_opt specifying a size?
The correct syntax as per Oracle documentation should be method_opt=>('FOR COLUMNS (HUB_POL_KEY,PIT_EFF_START_DT) SIZE 254'). Trying it did not create the histogram stats as expected thought (maybe a bug ¯_(ツ)_/¯).
On the other side using method_opt=>('FOR ALL COLUMNS SIZE 254') or method_opt=>('FOR COLUMNS <column_name> SIZE 254') is working fine.
Probably a workaround would be then to gather stats for columns separately:
EXEC DBMS_STATS.GATHER_TABLE_STATS('stg','PIT', method_opt=>('FOR COLUMNS HUB_POL_KEY SIZE 254'));
EXEC DBMS_STATS.GATHER_TABLE_STATS('stg','PIT', method_opt=>('FOR COLUMNS PIT_EFF_START_DT SIZE 254'));

Oracle Identify not unique values in a clob column of a table

I want to identify all rows whose content in a clob column is not unique.
The query I use is:
select
id,
clobtext
from
table t
where
(select count(*) from table innerT where dbms_lob.compare(innerT.clobtext, t.clobtext) = 0)>1
However this query is very slow. Any suggestions to speed it up? I already tried to use the dbms_lob.getlength function to eliminate more elements in the subquery but I didn't really improve the performance (feels the same).
To make it more clear an example:
table
ID | clobtext
1 | a
2 | b
3 | c
4 | d
5 | a
6 | d
After running the query. I'd like to get (order doesn't matter):
1 | a
4 | d
5 | a
6 | d
In the past I've generated checksums (in my C# code) for each clob.
Whilst this will inccur a one off increase in io (to generate the checksum)
subsequent scans will be quicker, and you can index the value too
TK has a good PL\SQL example here:
Ask Tom

find string from delimited values from delimited values (Oracle PL-SQL)

How can write a query for Oracle database such that I can find a comma delimited list of values from a column that contains comma delimited list of values. The :parameter passed to sql statement is also a comma delimited values that user selected.
For e.g
We have a column in tables that contains
1 | 'A','B','C'
2 | 'C','A'
3 | 'A','B'
on the web application interface we have multi select box that shows
A
B
C
and allows user to to select one or more items.
I want rows 1 and 2 to show up if they select A and B, If they select A only the all three should show up b/c all rows 1 to 3 have 'A' value in it.
This example will hopefully help and it matches the values irrespective of which order they appear in the string in the DB record.
Create example table:
CREATE TABLE t
(val VARCHAR2(100));
Insert records:
INSERT INTO t VALUES
('1|''A'',''B'',''C''');
INSERT INTO t VALUES
('2|''C'',''A''');
INSERT INTO t VALUES
('3|''A'',''B''');
Check values:
SELECT * FROM t;
1|'A','B','C'
2|'C','A'
3|'A','B'
Check solution for 'A':
SELECT val
FROM t
WHERE REGEXP_LIKE(val, '(A)');
1|'A','B','C'
2|'C','A'
3|'A','B'
Check solution for A and B
SELECT val
FROM t
WHERE REGEXP_LIKE(val, '(A|B).*(A|B)');
1|'A','B','C'
3|'A','B'
If you want to make sure the 1| part of the result isn't matched by anything then you could query using:
SELECT val
FROM t
WHERE REGEXP_LIKE(val, '(.\|.*)(A)');
and
SELECT val
FROM t
WHERE REGEXP_LIKE(val, '(.\|.*)(A|B).*(A|B)');
Hope this helps...
You could use a where clause with varying number of bind values, depending on the number of selected options:
TEST#PRJ> create table t (c varchar2(100));
TEST#PRJ> insert into t values ('2 | ''C'',''A''');
TEST#PRJ> insert into t values ('3 | ''A'',''B''');
TEST#PRJ> select * from t where c like '%''A''%' and c like '%''B''%';
C
----------------------------------------------------------------------------------------------------------------
1 | 'A','B','C'
3 | 'A','B'
TEST#PRJ> select * from t where c like '%''A''%';
C
----------------------------------------------------------------------------------------------------------------
1 | 'A','B','C'
2 | 'C','A'
3 | 'A','B'
If the values are stored in order you could use a single bind value:
TEST#PRJ> select * from t where c like '%''A''%''B''%';
C
-----------------------------------------------------------------------------------------
1 | 'A','B','C'
3 | 'A','B'

Use Oracle unnested VARRAY's instead of IN operator

Let's say users have 1 - n accounts in a system. When they query the database, they may choose to select from m acounts, with m between 1 and n. Typically the SQL generated to fetch their data is something like
SELECT ... FROM ... WHERE account_id IN (?, ?, ..., ?)
So depending on the number of accounts a user has, this will cause a new hard-parse in Oracle, and a new execution plan, etc. Now there are a lot of queries like that and hence, a lot of hard-parses, and maybe the cursor/plan cache will be full quite early, resulting in even more hard-parses.
Instead, I could also write something like this
-- use any of these
CREATE TYPE numbers AS VARRAY(1000) of NUMBER(38);
CREATE TYPE numbers AS TABLE OF NUMBER(38);
SELECT ... FROM ... WHERE account_id IN (
SELECT column_value FROM TABLE(?)
)
-- or
SELECT ... FROM ... JOIN (
SELECT column_value FROM TABLE(?)
) ON column_value = account_id
And use JDBC to bind a java.sql.Array (i.e. an oracle.sql.ARRAY) to the single bind variable. Clearly, this will result in less hard-parses and less cursors in the cache for functionally equivalent queries. But is there anything like general a performance-drawback, or any other issues that I might run into?
E.g: Does bind variable peeking work in a similar fashion for varrays or nested tables? Because the amount of data associated with every account may differ greatly.
I'm using Oracle 11g in this case, but I think the question is interesting for any Oracle version.
I suggest you try a plain old join like in
SELECT Col1, Col2
FROM ACCOUNTS ACCT
TABLE TAB,
WHERE ACCT.User = :ParamUser
AND TAB.account_id = ACCT.account_id;
An alternative could be a table subquery
SELECT Col1, Col2
FROM (
SELECT account_id
FROM ACCOUNTS
WHERE User = :ParamUser
) ACCT,
TABLE TAB
WHERE TAB.account_id = ACCT.account_id;
or a where subquery
SELECT Col1, Col2
FROM TABLE TAB
WHERE TAB.account_id IN
(
SELECT account_id
FROM ACCOUNTS
WHERE User = :ParamUser
);
The first one should be better for perfomance, but you better check them all with explain plan.
Looking at V$SQL_BIND_CAPTURE in a 10g database, I have a few rows where the datatype is VARRAY or NESTED_TABLE; the actual bind values were not captured. In an 11g database, there is just one such row, but it also shows that the bind value is not captured. So I suspect that bind value peeking essentially does not happen for user-defined types.
In my experience, the main problem you run into using nested tables or varrays in this way is that the optimizer does not have a good estimate of the cardinality, which could lead it to generate bad plans. But, there is an (undocumented?) CARDINALITY hint that might be helpful. The problem with that is, if you calculate the actual cardinality of the nested table and include that in the query, you're back to having multiple distinct query texts. Perhaps if you expect that most or all users will have at most 10 accounts, using the hint to indicate that as the cardinality would be helpful. Of course, I'd try it without the hint first, you may not have an issue here at all.
(I also think that perhaps Miguel's answer is the right way to go.)
For medium sized list (several thousand items) I would use this approach:
First:generate a prepared statement with an XMLTABLE in join with your main table.
For instance:
String myQuery = "SELECT ...
+" FROM ACCOUNTS A,"
+ "XMLTABLE('tab/row' passing XMLTYPE(?) COLUMNS id NUMBER path 'id') t
+ "WHERE A.account_id = t.id"
then loop through your data and build a StringBuffer with this content:
StringBuffer idList = "<tab><row><id>101</id></row><row><id>907</id></row> ...</tab>";
eventually, prepare and submit your statement, then fetch the results.
myQuery.setString(1, idList);
ResultSet rs = myQuery.executeQuery();
while (rs.next()) {...}
Using this approach is also possible to pass multi-valued list, as in the select statement
SELECT * FROM TABLE t WHERE (t.COL1, t.COL2) in (SELECT X.COL1, X.COL2 FROM X);
In my experience performances are pretty good, and the approach is flexible enough to be used in very complex query scenarios.
The only limit is the size of the string passed to the DB, but I suppose it is possible to use CLOB in place of String for arbitrary long XML wrapper to the input list;
This binding a variable number of items into an in list problem seems to come up a lot in various form. One option is to concatenate the IDs into a comma separated string and bind that, and then use a bit of a trick to split it into a table you can join against, eg:
with bound_inlist
as
(
select
substr(txt,
instr (txt, ',', 1, level ) + 1,
instr (txt, ',', 1, level+1) - instr (txt, ',', 1, level) -1 )
as token
from (select ','||:txt||',' txt from dual)
connect by level <= length(:txt)-length(replace(:txt,',',''))+1
)
select *
from bound_inlist a, actual_table b
where a.token = b.token
Bind variable peaking is going to be a problem though.
Does the query plan actually change for larger number of accounts, ie would it be more efficient to move from index to full table scan in some cases, or is it borderline? As someone else suggested, you could use the CARDINALITY hint to indicate how many IDs are being bound, the following test case proves this actually works:
create table actual_table (id integer, padding varchar2(100));
create unique index actual_table_idx on actual_table(id);
insert into actual_table
select level, 'this is just some padding for '||level
from dual connect by level <= 1000;
explain plan for
with bound_inlist
as
(
select /*+ CARDINALITY(10) */
substr(txt,
instr (txt, ',', 1, level ) + 1,
instr (txt, ',', 1, level+1) - instr (txt, ',', 1, level) -1 )
as token
from (select ','||:txt||',' txt from dual)
connect by level <= length(:txt)-length(replace(:txt,',',''))+1
)
select *
from bound_inlist a, actual_table b
where a.token = b.id;
----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 840 | 2 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | | | | |
| 2 | NESTED LOOPS | | 10 | 840 | 2 (0)| 00:00:01 |
| 3 | VIEW | | 10 | 190 | 2 (0)| 00:00:01 |
|* 4 | CONNECT BY WITHOUT FILTERING| | | | | |
| 5 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
|* 6 | INDEX UNIQUE SCAN | ACTUAL_TABLE_IDX | 1 | | 0 (0)| 00:00:01 |
| 7 | TABLE ACCESS BY INDEX ROWID | ACTUAL_TABLE | 1 | 65 | 0 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------
Another option is to always use n bind variables in every query. Use null for m+1 to n.
Oracle ignores repeated items in the expression_list. Your queries will perform the same way and there will be fewer hard parses. But there will be extra overhead to bind all the variables and transfer the data. Unfortunately I have no idea what the overall affect on performance would be, you'd have to test it.

Resources