I am using nested queries to achieve this.
Basically, I have these tables:
employee table:
employee_id, locale
audience table
employee_id
country table
country_name,country_code
country_language
country_code, geo
I need employee_id, audience_id, country_name and locale from these tables for the rows that fall under the "APAC" geo.
I have this query:
SELECT employee_id
FROM audience
WHERE employee_id IN
(SELECT employee_id
FROM employee
WHERE LOCALE IN
(SELECT LOCALE
FROM COUNTRY_LANGUAGE
WHERE COUNTRY_CODE IN
(SELECT COUNTRY_CODE
FROM COUNTRY
WHERE GEO='apac')
)
)
ORDER BY employee_id);
This throws the error "SQL command not properly ended".
Also, will this query produce the right results once it runs? If not, can you suggest something else?
I also tried it with joins, but that did not return anything:
select a.employee_id,
a.locale,
b.audience_id,
c.LOCALE_CODE,
d.COUNTRY_NAME
from employee a,
audience b,
country_language c,
country d
where
a.employee_id=b.employee_ID
and d.geo='apac'
and d.country_code=c.country_code
and a.locale=c.LOCALE_CODE;
You can try to use UNION SELECT
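A join is one way to get all four columns in a single result. The nested version above also has one more closing parenthesis than opening ones (the `)` after `ORDER BY employee_id`), which is a likely cause of the "not properly ended" error. The sketch below uses Python's sqlite3 module as a stand-in for Oracle. The column layout and the sample rows are assumptions pieced together from the two queries in the question, and the upper-case 'APAC' value is a guess at why the join version returned nothing, since string comparison is case-sensitive.

```python
import sqlite3

# In-memory SQLite database standing in for the Oracle schema.
# Column layout and sample rows are assumptions, not the asker's real data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (employee_id INTEGER, locale TEXT);
CREATE TABLE audience (audience_id INTEGER, employee_id INTEGER);
CREATE TABLE country_language (country_code TEXT, locale_code TEXT);
CREATE TABLE country (country_code TEXT, country_name TEXT, geo TEXT);

INSERT INTO employee VALUES (1, 'en_IN'), (2, 'de_DE');
INSERT INTO audience VALUES (10, 1), (20, 2);
INSERT INTO country_language VALUES ('IN', 'en_IN'), ('DE', 'de_DE');
-- geo is stored in upper case here, unlike the 'apac' literal in the question
INSERT INTO country VALUES ('IN', 'India', 'APAC'), ('DE', 'Germany', 'EMEA');
""")

QUERY = """
SELECT e.employee_id, a.audience_id, c.country_name, e.locale
FROM employee e
JOIN audience a          ON a.employee_id = e.employee_id
JOIN country_language cl ON cl.locale_code = e.locale
JOIN country c           ON c.country_code = cl.country_code
WHERE {geo_filter}
ORDER BY e.employee_id
"""

# Exact, case-sensitive match on the lower-case literal finds nothing.
exact_rows = conn.execute(QUERY.format(geo_filter="c.geo = 'apac'")).fetchall()
# Folding the stored value to upper case makes the comparison case-insensitive.
folded_rows = conn.execute(QUERY.format(geo_filter="UPPER(c.geo) = 'APAC'")).fetchall()

print(exact_rows)
print(folded_rows)
```

If the real data stores the geo in lower case, the original filter is fine and the empty result has another cause, for example locale values that do not match between employee and country_language.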
Let me explain the question.
I have two tables, each with 3 columns of the same data types. Together the 3 columns form a key/ID, if you like, but the column names differ between the tables.
Now I am writing queries that use these 3 columns for both tables. I've managed to get these results independently.
For example:
SELECT ID, FirstColumn, sum(SecondColumn)
FROM (SELECT ABC||DEF||GHI AS ID, FirstTable.*
FROM FirstTable
WHERE ThirdColumn = *1st condition*)
GROUP BY ID, FirstColumn
;
SELECT ID, SomeColumn, sum(AnotherColumn)
FROM (SELECT JKM||OPQ||RST AS ID, SecondTable.*
FROM SecondTable
WHERE AlsoSomeColumn = *2nd condition*)
GROUP BY ID, SomeColumn
;
So I run two very similar queries against two different tables. I know the results share a certain number of rows with the same ID attribute, the one I've just created in the queries. I need to check which rows in one result are not in the other query's result, and vice versa.
Do I have to make temporary tables or views from the queries? Or maybe join the two tables in a specific way and run only one query on them?
As a beginner I don't have any experience with using results as input for the next query. I'm interested in the cleanest, most elegant way to do this.
No, you most probably don't need any "temporary" tables. The WITH clause (subquery factoring) would help.
Here's an example:
with
first_query as
(select id, first_column, ...
from (select ABC||DEF||GHI as id, ...)
),
second_query as
(select id, some_column, ...
from (select JKM||OPQ||RST as id, ...)
)
select id from first_query
minus
select id from second_query;
For another result you'd just switch the tables, e.g.
with ... <the same as above>
select id from second_query
minus
select id from first_query
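The two-way comparison can be tried out end to end with a small sketch. It uses Python's sqlite3 module as a stand-in, so EXCEPT takes the place of Oracle's MINUS. The table names, the three sample rows and the omission of the aggregation step are assumptions made to keep the example short.

```python
import sqlite3

# In-memory SQLite database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE first_table (abc TEXT, def TEXT, ghi TEXT);
CREATE TABLE second_table (jkm TEXT, opq TEXT, rst TEXT);
INSERT INTO first_table VALUES ('A', '1', 'x'), ('B', '2', 'y');
INSERT INTO second_table VALUES ('B', '2', 'y'), ('C', '3', 'z');
""")

# Both named subqueries build the concatenated ID, as in the answer above.
CTES = """
WITH first_query AS (SELECT abc || def || ghi AS id FROM first_table),
     second_query AS (SELECT jkm || opq || rst AS id FROM second_table)
"""

# EXCEPT is SQLite's spelling of Oracle's MINUS.
only_in_first = [row[0] for row in conn.execute(
    CTES + "SELECT id FROM first_query EXCEPT SELECT id FROM second_query ORDER BY id")]
only_in_second = [row[0] for row in conn.execute(
    CTES + "SELECT id FROM second_query EXCEPT SELECT id FROM first_query ORDER BY id")]

print(only_in_first)
print(only_in_second)
```

Note that both MINUS and EXCEPT remove duplicates, so each unmatched ID is reported once no matter how often it occurs.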
I am trying to use the SAMPLE clause together with GROUP BY and HAVING clauses.
My requirement is to get a Company ID that has only one record. I have tried many ways; below is one of the methods I have tried.
With TESTA AS(
select TA.COMPANY_ID from(
select t1.COMPANY_ID from Table11 t1
where t1.COMPANY_ID in (select t2.COMPANY_ID from Table22 t2)
group by t1.COMPANY_ID having count(t1.COMPANY_ID)=1)TA )
select * FROM TESTA Sample(1);
When I execute the above query, it throws the error below.
ORA-01446: cannot select ROWID from, or sample, a view with DISTINCT,
GROUP BY, etc.
01446. 00000 - "cannot select ROWID from, or sample, a view with DISTINCT, GROUP BY, etc."
*Cause:
*Action:
But when I execute the query below, I get results, although they do not satisfy my requirement.
select COMPANY_ID from Table11 Sample(10)
where COMPANY_ID in (select company_id from Table22 )
Group by COMPANY_ID HAVING Count(COMPANY_ID)=1
It processes all the possible records. Can someone help me with this, please?
I am using CDH-5.4.4 (Cloudera edition). I have a CSV file in an HDFS location, and my requirement is to run real-time SQL queries in a Hadoop environment (OLTP).
So I decided to go with Impala. I created a MetaStore table over the CSV file and then executed the query in the Impala editor (within the HUE application).
When I execute the query below, I get an error like
"AnalysisException: all DISTINCT aggregate functions need to have the
same set of parameters as count(DISTINCT City); deviating function:
count(DISTINCT Country)".
CSV File
OrderID,CustomerID,City,Country
Ord01,Cust01,Aachen,Germany
Ord02,Cust01,Albuquerque,USA
Ord03,Cust01,Aachen,Germany
Ord04,Cust02,Arhus,Denmark
Ord05,Cust02,Arhus,Denmark
Problematic Query
Select CustomerID,Count(Distinct City),Count(Distinct Country) From CustomerOrders Group by CustomerID
Problem:
I am unable to execute an Impala query with more than one DISTINCT aggregate in it. I have searched the internet, and the NDV() function is suggested as a workaround, but NDV() only returns an approximate count of distinct values. I need an exact unique count for more than one field.
Expectation:
What is the best way to get an exact unique count for more than one field? Kindly modify the above query to work with Impala.
Note: This is not my original table; I have replicated it for the forum question.
I've the same problem in Impala. Here is my workaround:
SELECT CustomerID
,sum(nr_of_cities)
,sum(nr_of_countries)
FROM (
SELECT CustomerID
,Count(DISTINCT City) AS nr_of_cities
,0 AS nr_of_countries
FROM CustomerOrders
GROUP BY CustomerID
UNION ALL
SELECT CustomerID
,0 AS nr_of_cities
,Count(DISTINCT Country) AS nr_of_countries
FROM CustomerOrders
GROUP BY CustomerID
) AS aa
GROUP BY CustomerID
I think this can be done cleaner (untested):
WITH
countries AS
(
SELECT CustomerID
,COUNT(DISTINCT Country) AS nr_of_countries
FROM CustomerOrders
GROUP BY 1
)
,
cities AS
(
SELECT CustomerID
,COUNT(DISTINCT City) AS nr_of_cities
FROM CustomerOrders
GROUP BY 1
)
SELECT CustomerID
,nr_of_cities
,nr_of_countries
FROM cities INNER JOIN countries USING (CustomerID)
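Both workarounds can be checked against the sample CSV rows from the question. The sketch below runs them in SQLite through Python's sqlite3 module, which is only a stand-in for Impala, so it shows that the two query shapes agree on this data rather than anything about Impala's behaviour. In the join version the countries subquery counts Country.

```python
import sqlite3

# Load the five sample rows from the question into an in-memory SQLite table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CustomerOrders (OrderID TEXT, CustomerID TEXT, City TEXT, Country TEXT)")
conn.executemany("INSERT INTO CustomerOrders VALUES (?, ?, ?, ?)", [
    ("Ord01", "Cust01", "Aachen", "Germany"),
    ("Ord02", "Cust01", "Albuquerque", "USA"),
    ("Ord03", "Cust01", "Aachen", "Germany"),
    ("Ord04", "Cust02", "Arhus", "Denmark"),
    ("Ord05", "Cust02", "Arhus", "Denmark"),
])

# Workaround 1: one DISTINCT aggregate per branch, padded with 0 and summed.
union_rows = conn.execute("""
SELECT CustomerID, SUM(nr_of_cities), SUM(nr_of_countries)
FROM (
    SELECT CustomerID, COUNT(DISTINCT City) AS nr_of_cities, 0 AS nr_of_countries
    FROM CustomerOrders GROUP BY CustomerID
    UNION ALL
    SELECT CustomerID, 0, COUNT(DISTINCT Country)
    FROM CustomerOrders GROUP BY CustomerID
) AS aa
GROUP BY CustomerID
ORDER BY CustomerID
""").fetchall()

# Workaround 2: one DISTINCT aggregate per named subquery, joined on CustomerID.
join_rows = conn.execute("""
WITH countries AS (
    SELECT CustomerID, COUNT(DISTINCT Country) AS nr_of_countries
    FROM CustomerOrders GROUP BY CustomerID
),
cities AS (
    SELECT CustomerID, COUNT(DISTINCT City) AS nr_of_cities
    FROM CustomerOrders GROUP BY CustomerID
)
SELECT CustomerID, nr_of_cities, nr_of_countries
FROM cities INNER JOIN countries USING (CustomerID)
ORDER BY CustomerID
""").fetchall()

print(union_rows)
print(join_rows)
```

For the sample data, Cust01 has two distinct cities and two distinct countries, and Cust02 has one of each.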
I have 2 tables, COMPANY and EMPLOYEE.
COMPANY_ID is the primary key of the COMPANY table and a foreign key in the EMPLOYEE table. The COMPANY_ID is a 10-digit number. We generate a 3-digit combination and query the database with it.
The select statement uses a regex to bulk load the companies based on COMPANY_ID. The query is executed multiple times with different patterns,
i.e.
regexp_like(COMPANY_ID, '^(000|001|002|003|004|005|006|007|008|009)') .
Existing query looks something like this -
select *
from COMPANY company
where regexp_like(company.COMPANY_ID, '^(000|001|002|003|004|005|006|007|008|009)')
The new requirement is to retrieve the company information along with the employee count. For example if a company has 10 employees, then the query should return all the columns of the COMPANY table, along with employee count i.e. 10
This is the select statement that I came up with -
select
nvl(count_table.cont_count, 0), company.*
from
COMPANY company,
(select company.COMPANY_ID, count(company.COMPANY_ID) as cont_count
from COMPANY company, EMPLOYEE employee
where regexp_like(company.COMPANY_ID, '^(000|001|002|003|004|005|006|007|008|009)')
and company.CONTACT_ID = employee.CONTACT_ID
group by (company.COMPANY_ID)) count_table
where
regexp_like(company.COMPANY_ID, '^(000|001|002|003|004|005|006|007|008|009)')
and count_table.COMPANY_ID(+)= company.COMPANY_ID
The above query works, but it takes twice as long as the previous statement. Is there a better way to retrieve the employee count?
Note: Oracle database is in use.
You don't need to execute that expensive REGEXP_LIKE twice:
select nvl(count_table.cont_count,0),company.*
from COMPANY company
,( select employee.COMPANY_ID, count(employee.COMPANY_ID) as cont_count
from EMPLOYEE employee
group by (employee.COMPANY_ID)
) count_table
where regexp_like(company.COMPANY_ID, '^(000|001|002|003|004|005|006|007|008|009)')
and count_table.COMPANY_ID(+)= company.COMPANY_ID
Or you could use a scalar subquery:
select c.*
, (select count(*)
from employee e
where e.company_id = c.company_id
)
from COMPANY c
where regexp_like(c.COMPANY_ID, '^(000|001|002|003|004|005|006|007|008|009)')
And personally I would ditch the slow REGEXP_LIKE for something like:
where substr(c.company_id,1,3) between '000' and '009'
The derived table does not add value, so I would get rid of it and use a scalar subquery (because I do not know all of the columns in your company table, I cannot write a proper GROUP BY):
select c.*,
nvl(
(select count(1)
from employee emp
where emp.company_id = c.company_id
),0) employee_count
from company c
where regexp_like(c.company_id, '^(000|001|002|003|004|005|006|007|008|009)')
Also, if performance is still an issue, I would consider modifying your WHERE clause so that it does not use a regexp.
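A minimal sketch of the scalar subquery combined with a prefix filter instead of the regexp is given below. It runs in SQLite through Python's sqlite3 module as a stand-in for Oracle. The name column, the emp_id column and the sample rows are hypothetical, and COMPANY_ID is assumed to be stored as a string so that leading zeros survive. Because COUNT over an empty set returns 0 rather than NULL, no NVL wrapper is needed here.

```python
import sqlite3

# In-memory SQLite database; columns other than company_id are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE company (company_id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE employee (emp_id INTEGER, company_id TEXT);

INSERT INTO company VALUES
    ('0001234567', 'Acme'),
    ('0051234567', 'Beta'),
    ('0101234567', 'Gamma');
INSERT INTO employee VALUES
    (1, '0001234567'), (2, '0001234567'), (3, '0101234567');
""")

rows = conn.execute("""
SELECT c.company_id,
       c.name,
       -- correlated scalar subquery: evaluated once per company row
       (SELECT COUNT(*) FROM employee e WHERE e.company_id = c.company_id) AS employee_count
FROM company c
-- prefix range check replacing the ^(000|...|009) regular expression
WHERE substr(c.company_id, 1, 3) BETWEEN '000' AND '009'
ORDER BY c.company_id
""").fetchall()

print(rows)
```

The company with no employees still appears with a count of 0, and the one whose ID starts with 010 is filtered out by the prefix range.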
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Addendum
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I see that the question explicitly identifies that the employee table has company_id as a foreign key. Since this is clarified, I am removing this statement:
The data model for these tables is not intuitive (would you not have
company_id as a foreign key in the employees table?).
I have the following query
select a.empid, a.age, a.city, b.name
join supervisor b on a.supervisorid = b.empid
There is a chance that entries in the "Supervisor" table are not present in the "Employee" table as employees.
After forming the above query, I want the "b.supervisorname" field to be null if "b.supervisorid" is not in the "a.empid" column.
EMPLOYEE TABLE:
EMPID--AGE--CITY--SUPERVISOR
1--12--A--123
2--21--B--1
3--23--C--2
Supervisor Table:
EMPID--NAME
123--ABC
1--EFG
2--HIJ
OUTPUT:
EMPID--AGE--CITY--NAME
1--12--A--null
2--21--B--ABC
3--23--C--EFG
I don't want to use
select a.empid, a.age, a.city, b.name
from employee a
join supervisor b on a.supervisorid =
(select empid
from supervisor
where empid in (select empid from employee))
as this kind of query hurts performance.
Is there a shorter way to do it?
You should always use explicit joins; they keep the join conditions clear and help avoid accidental cross joins. Also, a query needs a FROM clause, which yours is missing.
The query below should work for you:
select
e.empid,
e.age,
e.city,
s.name
FROM
employee e
LEFT OUTER JOIN
supervisor s
on e.supervisor = s.empid
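The LEFT OUTER JOIN above keeps every employee but still returns the name of a supervisor who is not an employee. If the goal is a null in that case, one option is to add that check to the join condition. The sketch below loads the sample rows from the question into SQLite through Python's sqlite3 module. The extra IN condition is an assumption about the intended rule, and each name is taken from the supervisor row whose EMPID equals the employee's SUPERVISOR value.

```python
import sqlite3

# Sample rows from the question, loaded into an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (empid INTEGER, age INTEGER, city TEXT, supervisor INTEGER);
CREATE TABLE supervisor (empid INTEGER, name TEXT);
INSERT INTO employee VALUES (1, 12, 'A', 123), (2, 21, 'B', 1), (3, 23, 'C', 2);
INSERT INTO supervisor VALUES (123, 'ABC'), (1, 'EFG'), (2, 'HIJ');
""")

# Plain LEFT JOIN: every employee is kept and any matching supervisor name is shown.
plain_rows = conn.execute("""
SELECT e.empid, e.age, e.city, s.name
FROM employee e
LEFT OUTER JOIN supervisor s ON e.supervisor = s.empid
ORDER BY e.empid
""").fetchall()

# Extra join condition: a supervisor only matches if it also exists as an employee.
filtered_rows = conn.execute("""
SELECT e.empid, e.age, e.city, s.name
FROM employee e
LEFT OUTER JOIN supervisor s
       ON e.supervisor = s.empid
      AND s.empid IN (SELECT empid FROM employee)
ORDER BY e.empid
""").fetchall()

print(plain_rows)
print(filtered_rows)
```

With the extra condition, employee 1 gets a null name because supervisor 123 has no row in the employee table, while the other two rows are unchanged.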