From a database composed of two tables, EMP and DEP, I need to list all locations (DEPT.LOC ),how many departments are in this location and how many employees (EMP.EMPNO) are in this departments.
Create a view listing all locations (Loc) along with the number of departments in that location and the number of employees employed in those departments.
I wrote the below query but I get the error: missing expression.
SELECT D1.LOC, COUNT(D1.DEPTNO),COUNT(SELECT * FROM EMP E,DEPT D WHERE E.DEPTNO = D.DEPTNO AND D.LOC = D1.LOC ) FROM DEPT D1 GROUP BY D1.LOC
The way you tried is almost right (although it is not the most natural way to do it) - and it can be salvaged.
Your inner select returns several rows per department. COUNT(...) applies to a single expression - it is over a column, but it counts individual expressions, not multi-row inputs.
Instead, in the inner select you should select count(*), not *; and then the outer aggregate should be sum(). Something like this:
SELECT D1.LOC, COUNT(D1.DEPTNO),
sum((SELECT count(*) FROM EMP E, DEPT D
WHERE E.DEPTNO = D.DEPTNO AND D.LOC = D1.LOC ))
FROM DEPT D1 GROUP BY D1.LOC
Note also the two sets of parentheses used with the sum() function. (This is, in fact, what caused the immediate error you reported.) You are summing numbers, and the outer parentheses are part of the function call. But the numbers you are summing are the result of a "scalar subquery" (a subquery that returns a single row and single column, which happens to be a numeric value over which you can sum; in this case, that numeric value is a count, of employees by department). A subquery, scalar or otherwise, must always also be enclosed in its own set of parentheses, even if it appears within a function call, or other kinds of parentheses.
Now: the more natural way to do the same thing is with a join; not sure if you are far enough in your intro course to have studied this though.
select d.loc, count(distinct d.deptno) as departments, count(e.empno) as employees
from dept d left outer join emp e on d.deptno = e.deptno
group by d.loc
;
The outer join is needed so that departments are counted (for each location) even if they don't have any employees. In the SCOTT schema there is, in fact, a department with no employees - and it is the only department in its location. If you used an inner join above, that location wouldn't even appear in the output.
NOTE - the homework problem asks you to create a view, not a SELECT query. That part is trivial - I assume you can do it by yourself.
Related
my current problem is in 11g, but I am also interested in how this might be solved smarter in later versions.
I want to join two tables. Table A has 10 million rows, Table B is huge and has a billion of records across about a thousand partitions. One partition has around 10 million records. I am not joining on the partition key. For most rows of Table A, one or more rows in Table B will be found.
Example:
select * from table_a a
inner join table_b b on a.ref = b.ref
The above will return about 50 million rows, whereas the results come from about 30 partitions of table b. I am assuming a hash join is the correct join here, hashing table a and FTSing/index-scanning table b.
So, 970 partitions were scanned for no reason. And, I have a third query that could tell oracle which 30 partitions to check for the join.
Example of third query:
select partition_id from table_c
This query gives exactly the 30 partitions for the query above.
To my question:
In PL/SQL one can solve this by
select the 30 partition_ids into a variable (be it just a select listagg(partition_id,',') ... into v_partitions from table_c
Execute my query like so:
execute immediate 'select * from table_a a
inner join table_b b on a.ref = b.ref
where b.partition_id in ('||v_partitions||')' into ...
Let's say this completes in 10 minutes.
Now, how can I do this in the same amount of time with pure SQL?
Just simply writing
select * from table_a a
inner join table_b b on a.ref = b.ref
where b.partition_id in (select partition_id from table_c)
does not do the trick it seems, or I might be aiming at the wrong plan.
The plan I think I want is
hash join
table a
nested loop
table c
partition pruning here
table b
But, this does not come back in 10 minutes.
So, how to do this in SQL and what execution plan to aim at? One variation I have not tried yet that might be the solution is
nested loop
table c
hash join
table a
partition pruning here (pushed predicate from the join to c)
table b
Another feeling I have is that the solution might lie in joining table a to table c (not sure on what though) and then joining this result to table b.
I am not asking you to type everything out for me. Just a general concept of how to do this (getting partition restriction from a query) in SQL - what plan should I aim at?
thank you very much! Peter
I'm not an expert at this, but I think Oracle generally does the joins first, then applies the where conditions. So you might get the plan you want by moving the partition pruning up into a join condition:
select * from table_a a
inner join table_b b on a.ref = b.ref
and b.partition_id in (select partition_id from table_c);
I've also seen people try to do this sort of thing with an inline view:
select * from table_a a
inner join (select * from table_b
where partition_id in (select partition_id from table_c)) b
on a.ref = b.ref;
thank you all for your discussions with me on this one. In my case this was solved (not by me) by adding a join-path between table_c and table_a and by overloading the join conditions as below. In my case this was possible by adding column partition_id to table_a:
select * from
table_c c
JOIN table_a a ON (a.partition_id = c.partition_id)
JOIN table_b b ON (b.partition_id = c.partition_id and b.partition_id = a.partition_id and b.ref = a.ref)
And this is the plan you want:
leading(c,b,a) use_nl(c,b) swap_join_inputs(a) use_hash(a)
So you get:
hash join
table a
nested loop
table c
partition list iterator
table b
I have two tables on Oracle database, one is named departments_table and the other is locations_table. The departments.table has dep_id, dep_name, location_id, staff_id, employer_id. The locations table consists of location_id, city_id, streetname_id and postcode_id. How do I calculate the number of departments that each location has?
This is the code below is what I have tried to replicate but have been unsuccessful. The error message below that is what shows once the code has submitted.
SELECT dep_name, location_id,
COUNT(*)
FROM departments_table
WHERE location_id => 1
GROUP BY dep_name;
The results of this is an error, " not a single group function "
If you want to count how many departments are in each location, then you must group by location, not by department name, right? Let's start with that.
Then, you don't need ANYTHING about the individual departments in the output of the query, do you? You just need the location id and the count of departments.
select location_id, count(*) as cnt
from departments_table
group by location_id
;
This does most of the work. You may want to add the location name (city, address, etc.), which is/are stored elsewhere - in the locations_table. So you will need a join. And there may be locations in that table that are not, in fact, the location of any department (their id doesn't appear in the departments_table at all). If so, you would need an OUTER join. Also for those departments you probably want to show a count of 0 (rather than null) - you can "fix" that with the nvl() function. So you will end up with something like
select l.*, nvl(g.cnt, 0) as department_count
from locations_table l
left outer join
( select location_id, count(*) as cnt
from departments_table
group by location_id
) g
on l.location_id = g.location_id
;
SELECT l.location_id, l.city, COUNT(d.DEPARTMENT_ID)
FROM OEHR_LOCATIONS l, OEHR_DEPARTMENTS d WHERE l.location_id = d.location_id
GROUP BY l.location_id, l.city ORDER BY l.city;
This method works. I created aliases and made minor changes. OEHR stands for the table names so ignore that.
I was just wondering how you select multiple records from a table column. Please see below the query.
SELECT DISTINCT DEPARTMENT_NAME, CITY, COUNTRY_NAME
FROM OEHR_DEPARTMENTS
NATURAL JOIN OEHR_EMPLOYEES
NATURAL JOIN OEHR_LOCATIONS
NATURAL JOIN OEHR_COUNTRIES
WHERE JOB_ID = 'SA_MAN' AND JOB_ID = 'SA_REP'
;
Basically, I want to be able to select records from the table column I have, however when you use AND it only displays SA_MAN and not SA_REP. I have also tried to use OR and it displays no rows selected. How would I actually be able to select both Job ID's without it just displaying one or the other.
Sorry this may sound like a stupid question (and probably not worded right), but I am pretty new to Oracle 11g SQL.
For your own comfort while debugging, I suggest you to use inner joins instead of natual joins.
That where clause is confusing, if not utterly wrong, because you don't make clear which tables' JOB_ID should be filtered. Use inner joins, give aliases to tables, and refer to those aliases in the where clause.
select distinct DEPARTMENT_NAME, CITY, COUNTRY_NAME
from OEHR_DEPARTMENTS t1
join OEHR_EMPLOYEES t2
on ...
join OEHR_LOCATIONS t3
on ...
join OEHR_COUNTRIES t4
on ...
where tn.JOB_ID = 'SA_MAN' AND tm.JOB_ID = 'SA_REP'
After rephrasing your query somehow like this, you'll have a clearer view on the logical operator you'll have to use in the where clause, which I bet will be an OR.
EDIT (after more details were given)
To list the departments that employ staff with both 'SA_MAN' and 'SA_REP' job_id, you have to join the departments table with the employees twice, once with the filter job_id='SA_MAN' and once with job_id='SA_REP'
select distinct DEPARTMENT_NAME, CITY, COUNTRY_NAME
from OEHR_DEPARTMENTS t1
join OEHR_EMPLOYEES t2
on t1.department_id = t2.department_id --guessing column names
join OEHR_EMPLOYEES t3
on t1.department_id = t3.department_id --guessing column names
join OEHR_LOCATIONS t4
on t1.location_id = t4.location_id --guessing column names
join OEHR_COUNTRIES t5
on t4.country_id = t5.country_id --guessing column names
where t2.job_id = 'SA_MAN' and t3.job_id = 'SA_REP'
order by 1, 2, 3
I've faced with a weird problem now. The query itself is huge so I'm not going to post it here (I could post however in case someone needs to see). Now I have a table ,TABLE1, with a CHAR(1) column, COL1. This table column is queried as part of my query. When I filter the recordset for this column I say:
WHERE TAB1.COL1=1
This way the query runs and returns a very big resultset. I've recently updated one of the subqueries to speed up the query. But after this when I write WHERE TAB1.COL1=1 it does not return anything, but if I change it to WHERE TAB1.COL1='1' it gives me the records I need. Notice the WHERE clause with quotes and w/o them. So to make it more clear, before updating one of the sub-queries I did not have to put quotes to check against COL1 value, but after updating I have to. What feature of Oracle is it that I'm not aware of?
EDIT: I'm posting the tw versions of the query in case someone might find it useful
Version 1:
SELECT p.ssn,
pss.pin,
pd.doc_number,
p.surname,
p.name,
p.patronymic,
to_number(p.sex, '9') as sex,
citiz_c.short_name citizenship,
p.birth_place,
p.birth_day as birth_date,
coun_c.short_name as country,
di.name as leg_city,
trim( pa.settlement
|| ' '
|| pa.street) AS leg_street,
pd.issue_date,
pd.issuing_body,
irs.irn,
irs.tpn,
irs.reg_office,
to_number(irs.insurer_type, '9') as insurer_type,
TO_CHAR(sa.REG_CODE)
||CONVERT_INT_TO_DOUBLE_LETTER(TO_NUMBER(SUBSTR(TO_CHAR(sa.DOSSIER_NR, '0999999'), 2, 3)))
||SUBSTR(TO_CHAR(sa.DOSSIER_NR, '0999999'), 5, 4) CONVERTED_SSN_DOSSIER_NR,
fa.snr
FROM
(SELECT pss_t.pin,
pss_t.ssn
FROM EHDIS_INSURANCE.pin_ssn_status pss_t
WHERE pss_t.difference_status < 5
) pss
INNER JOIN SSPF_CENTRE.file_archive fa
ON fa.ssn = pss.ssn
INNER JOIN SSPF_CENTRE.persons p
ON p.ssn = fa.ssn
INNER JOIN
(SELECT pd_2.ssn,
pd_2.type,
pd_2.series,
pd_2.doc_number,
pd_2.issue_date,
pd_2.issuing_body
FROM
--The changed subquery starts here
(SELECT ssn,
MIN(type) AS type
FROM SSPF_CENTRE.person_documents
GROUP BY ssn
) pd_1
INNER JOIN SSPF_CENTRE.person_documents pd_2
ON pd_2.type = pd_1.type
AND pd_2.ssn = pd_1.ssn
) pd
--The changed subquery ends here
ON pd.ssn = p.ssn
INNER JOIN SSPF_CENTRE.ssn_archive sa
ON p.ssn = sa.ssn
INNER JOIN SSPF_CENTRE.person_addresses pa
ON p.ssn = pa.ssn
INNER JOIN
(SELECT i_t.irn,
irs_t.ssn,
i_t.tpn,
i_t.reg_office,
(
CASE i_t.insurer_type
WHEN '4'
THEN '1'
ELSE i_t.insurer_type
END) AS insurer_type
FROM sspf_centre.irn_registered_ssn irs_t
INNER JOIN SSPF_CENTRE.insurers i_t
ON i_t.irn = irs_t.new_irn
OR i_t.old_irn = irs_t.old_irn
WHERE irs_t.is_registration IS NOT NULL
AND i_t.is_real IS NOT NULL
) irs ON irs.ssn = p.ssn
LEFT OUTER JOIN SSPF_CENTRE.districts di
ON di.code = pa.city
LEFT OUTER JOIN SSPF_CENTRE.countries citiz_c
ON p.citizenship = citiz_c.numeric_code
LEFT OUTER JOIN SSPF_CENTRE.countries coun_c
ON pa.country_code = coun_c.numeric_code
WHERE pa.address_flag = '1'--Here's the column value with quotes
AND fa.form_type = 'Q3';
And Version 2:
SELECT p.ssn,
pss.pin,
pd.doc_number,
p.surname,
p.name,
p.patronymic,
to_number(p.sex, '9') as sex,
citiz_c.short_name citizenship,
p.birth_place,
p.birth_day as birth_date,
coun_c.short_name as country,
di.name as leg_city,
trim( pa.settlement
|| ' '
|| pa.street) AS leg_street,
pd.issue_date,
pd.issuing_body,
irs.irn,
irs.tpn,
irs.reg_office,
to_number(irs.insurer_type, '9') as insurer_type,
TO_CHAR(sa.REG_CODE)
||CONVERT_INT_TO_DOUBLE_LETTER(TO_NUMBER(SUBSTR(TO_CHAR(sa.DOSSIER_NR, '0999999'), 2, 3)))
||SUBSTR(TO_CHAR(sa.DOSSIER_NR, '0999999'), 5, 4) CONVERTED_SSN_DOSSIER_NR,
fa.snr
FROM
(SELECT pss_t.pin,
pss_t.ssn
FROM EHDIS_INSURANCE.pin_ssn_status pss_t
WHERE pss_t.difference_status < 5
) pss
INNER JOIN SSPF_CENTRE.file_archive fa
ON fa.ssn = pss.ssn
INNER JOIN SSPF_CENTRE.persons p
ON p.ssn = fa.ssn
INNER JOIN
--The changed subquery starts here
(SELECT ssn,
type,
series,
doc_number,
issue_date,
issuing_body
FROM
(SELECT ssn,
type,
series,
doc_number,
issue_date,
issuing_body,
ROW_NUMBER() OVER (partition BY ssn order by type) rn
FROM SSPF_CENTRE.person_documents
)
WHERE rn = 1
) pd --
--The changed subquery ends here
ON pd.ssn = p.ssn
INNER JOIN SSPF_CENTRE.ssn_archive sa
ON p.ssn = sa.ssn
INNER JOIN SSPF_CENTRE.person_addresses pa
ON p.ssn = pa.ssn
INNER JOIN
(SELECT i_t.irn,
irs_t.ssn,
i_t.tpn,
i_t.reg_office,
(
CASE i_t.insurer_type
WHEN '4'
THEN '1'
ELSE i_t.insurer_type
END) AS insurer_type
FROM sspf_centre.irn_registered_ssn irs_t
INNER JOIN SSPF_CENTRE.insurers i_t
ON i_t.irn = irs_t.new_irn
OR i_t.old_irn = irs_t.old_irn
WHERE irs_t.is_registration IS NOT NULL
AND i_t.is_real IS NOT NULL
) irs ON irs.ssn = p.ssn
LEFT OUTER JOIN SSPF_CENTRE.districts di
ON di.code = pa.city
LEFT OUTER JOIN SSPF_CENTRE.countries citiz_c
ON p.citizenship = citiz_c.numeric_code
LEFT OUTER JOIN SSPF_CENTRE.countries coun_c
ON pa.country_code = coun_c.numeric_code
WHERE pa.address_flag = 1--Here's the column value without quotes
AND fa.form_type = 'Q3';
I've put separating comments for the changed subqueries and the WHERE clause in both queries. Both versions of the subqueries return the same result, one of them is just slower, which is why I decided to update it.
With the most simplistic example I can't reproduce your problem on 11.2.0.3.0 or 11.2.0.1.0.
SQL> create table tmp_test ( a char(1) );
Table created.
SQL> insert into tmp_test values ('1');
1 row created.
SQL> select *
2 from tmp_test
3 where a = 1;
A
-
1
If I then insert a non-numeric value into the table I can confirm Chris' comment "that Oracle will rewrite tab1.col1 = 1 to to_number(tab1.col1) = 1", which implies that you only have numeric characters in the column.
SQL> insert into tmp_test values ('a');
1 row created.
SQL> select *
2 from tmp_test
3 where a = 1;
ERROR:
ORA-01722: invalid number
no rows selected
If you're interested in tracking this down you should gradually reduce the complexity of the query until you have found a minimal, reproducible, example. Oracle can pre-compute a conversion to be used in a JOIN, which as your query is complex seems like a possible explanation of what's happening.
Oracle explicitly recommends against using implicit conversion so it's wiser not to use it at all; as you're finding out. For a start there's no guarantees that your indexes will be used correctly.
Oracle recommends that you specify explicit conversions, rather than rely on implicit or automatic conversions, for these reasons:
SQL statements are easier to understand when you use explicit data type conversion functions.
Implicit data type conversion can have a negative impact on performance, especially if the data type of a column value is converted to that of a constant rather than the other way around.
Implicit conversion depends on the context in which it occurs and may not work the same way in every case. For example, implicit conversion from a datetime value to a VARCHAR2 value may return an unexpected year depending on the value of the NLS_DATE_FORMAT
parameter.
Algorithms for implicit conversion are subject to change across software releases and among Oracle products. Behavior of explicit conversions is more predictable.
If you do only have numeric characters in the column I would highly recommend changing this to a NUMBER(1) column and I would always recommend explicit conversion to avoid a lot of pain in the longer run.
It's hard to tell without the actual query. What I would expect is that TAB1.COL1 is in some way different before and after the refactoring.
Candidates differences are Number vs. CHAR(1) vs. CHAR(x>1) vs VARCHAR2
It is easy to introduce differences like this with subqueries where you join two tables which have different types in the join column and you return different columns in your subquery.
To hunt that issue down you might want to check the exact datatypes of your query. Not sure how to do that right now .. but an idea would be to put it in a view and use sqlplus desc on it.
Recently I fixed the some bug: there was rownum in the join condition.
Something like this: left join t1 on t1.id=t2.id and rownum<2. So it was supposed to return only one row regardless of the “left join”.
When I looked further into this, I realized that I don’t understand how Oracle evaluates rownum in the "left join" condition.
Let’s create two sampe tables: master and detail.
create table MASTER
(
ID NUMBER not null,
NAME VARCHAR2(100)
)
;
alter table MASTER
add constraint PK_MASTER primary key (ID);
prompt Creating DETAIL...
create table DETAIL
(
ID NUMBER not null,
REF_MASTER_ID NUMBER,
NAME VARCHAR2(100)
)
;
alter table DETAIL
add constraint PK_DETAIL primary key (ID);
alter table DETAIL
add constraint FK_DETAIL_MASTER foreign key (REF_MASTER_ID)
references MASTER (ID);
prompt Disabling foreign key constraints for DETAIL...
alter table DETAIL disable constraint FK_DETAIL_MASTER;
prompt Loading MASTER...
insert into MASTER (ID, NAME)
values (1, 'First');
insert into MASTER (ID, NAME)
values (2, 'Second');
commit;
prompt 2 records loaded
prompt Loading DETAIL...
insert into DETAIL (ID, REF_MASTER_ID, NAME)
values (1, 1, 'REF_FIRST1');
insert into DETAIL (ID, REF_MASTER_ID, NAME)
values (2, 1, 'REF_FIRST2');
insert into DETAIL (ID, REF_MASTER_ID, NAME)
values (3, 1, 'REF_FIRST3');
commit;
prompt 3 records loaded
prompt Enabling foreign key constraints for DETAIL...
alter table DETAIL enable constraint FK_DETAIL_MASTER;
set feedback on
set define on
prompt Done.
Then we have this query :
select * from master t
left join detail d on d.ref_master_id=t.id
The result set is predictable: we have all the rows from the master table and 3 rows from the detail table that matched this condition d.ref_master_id=t.id.
Result Set
Then I added “rownum=1” to the join condition and the result was the same
select * from master t
left join detail d on d.ref_master_id=t.id and rownum=1
The most interesting thing is that I set “rownum<-666” and got the same result again!
select * from master t
left join detail d on d.ref_master_id=t.id and rownum<-666.
Due to the result set we can say that this condition was evaluated as “True” for 3 rows in the detail table. But if I use “inner join” everything goes as supposed to be.
select * from master t
join detail d on d.ref_master_id=t.id and rownum<-666.
This query doesn’t return any row,because I can't imagine rownum to be less then -666 :-)
Moreover, if I use oracle syntax for outer join, using “(+)” everything goes well too.
select * from master m ,detail t
where m.id=t.ref_master_id(+) and rownum<-666.
This query doesn’t return any row too.
Can anyone tell me, what I misunderstand with outer join and rownum?
ROWNUM is a pseudo-attribute of result sets, not of base tables. ROWNUM is defined after rows are selected, but before they're sorted by an ORDER BY clause.
edit: I was mistaken in my previous writeup of ROWNUM, so here's new information:
You can use ROWNUM in a limited way in the WHERE clause, for testing if it's less than a positive integer only. See ROWNUM Pseudocolumn for more details.
SELECT ... WHERE ROWNUM < 10
It's not clear what value ROWNUM has in the context of a JOIN clause, so the results may be undefined. There seems to be some special-case handling of expressions with ROWNUM, for instance WHERE ROWNUM > 10 always returns false. I don't know how ROWNUM<-666 works in your JOIN clause, but it's not meaningful so I would not recommend using it.
In any case, this doesn't help you to fetch the first detail row for each given master row.
To solve this you can use analytic functions and PARTITION, and combine it with Common Table Expressions so you can access the row-number column in a further WHERE condition.
WITH numbered_cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY t.id ORDER BY d.something) AS rn
FROM master t LEFT OUTER JOIN detail d ON d.ref_master_id = t.id
)
SELECT *
FROM numbered_cte
WHERE rn = 1;
if you want to get the first three values from the join condition change the select statement like this.
select *
from (select *
from master t left join detail d on d.ref_master_id=t.id)
where rownum<3;
You will get the required output. Take care on unambigiously defined column names when using *
Let me give an absolute answer which u can run directly with out making any changes to the code.
select *
from (select t.id,t.name,d.id,d.ref_master_id,d.name
from master t left join detail d on d.ref_master_id=t.id)
where rownum<3;
A ROWNUM filter doesn't make any sense in a join, but it isn't being rejected as invalid.
The explain plan will either include the ROWNUM filter or exclude it. If it includes it, it will apply the filter to the detail table after applying the other join condition(s). So if you put in ROWNUM=100 (which will never be satisfied) all the detail rows are excluded and then the outer join kicks in.
If you put in ROWNUM=1 it seems to drop the filter.
And if you query
with
a as (select rownum a_val from dual connect by level < 10),
b as (select rownum*2 b_val from dual connect by level < 10)
select * from a left join b on a_val < b_val and rownum in (1,3);
you get something totally weird.
It probably should be rejected as an error, so expect nonsensical things to happen