Display all the fields associated with the record using Impala - hadoop

Suppose I have a student table with some fields in Impala. One field is called total_marks, and I need to find the details of the student with the maximum marks in each department.
My table is like this:
In this table I have to get the details of the student with the maximum marks from each department.
My query is like this:
select id,max(total_marks) from student_details group by department;
But with this query I can get only the id and total_marks. Since there can be students with the same name and age, I can't group by fields like age or name.
So how should I query the table to get all the details of the top student from each department?
Thanks in advance.

You can join the table to a subquery that finds the maximum marks per department:
select stu.*
from student_details stu
join
( select department,max(total_marks) as max
from student_details
group by department
) rank
on stu.department=rank.department and stu.total_marks=rank.max;
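To try the join idea without an Impala cluster, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in. The sample rows, the name and age columns, and the aliases dept_max and max_marks are my own assumptions, not from the question. Note that a tie on the top mark returns every tied student.

```python
import sqlite3

# In-memory SQLite database as a stand-in for Impala (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student_details ("
    "id INTEGER, name TEXT, age INTEGER, department TEXT, total_marks INTEGER)"
)
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO student_details VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Anu", 20, "CS", 450),
        (2, "Ben", 21, "CS", 480),
        (3, "Anu", 20, "EC", 470),
        (4, "Dev", 22, "EC", 470),  # deliberate tie with id 3
        (5, "Eva", 21, "ME", 390),
    ],
)

# Join each student to the maximum mark of their department.
top_students = conn.execute(
    """
    SELECT stu.*
    FROM student_details stu
    JOIN (
        SELECT department, MAX(total_marks) AS max_marks
        FROM student_details
        GROUP BY department
    ) dept_max
      ON stu.department = dept_max.department
     AND stu.total_marks = dept_max.max_marks
    ORDER BY stu.department, stu.id
    """
).fetchall()

for row in top_students:
    print(row)
```

All columns of each top student come back, and department EC yields two rows because of the tie.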

Related

Oracle: select rows from a query which do not exist in another query

Let me explain the question.
I have two tables, each with 3 columns of the same data types. The 3 columns together form a key/ID if you like, but the column names differ between the tables.
Now I am creating queries with these 3 columns for both tables. I've managed to get these results independently.
For example:
SELECT ID, FirstColumn, sum(SecondColumn)
FROM (SELECT ABC||DEF||GHI AS ID, FirstTable.*
FROM FirstTable
WHERE ThirdColumn = *1st condition*)
GROUP BY ID, FirstColumn
;
SELECT ID, SomeColumn, sum(AnotherColumn)
FROM (SELECT JKM||OPQ||RST AS ID, SecondTable.*
FROM SecondTable
WHERE AlsoSomeColumn = *2nd condition*)
GROUP BY ID, SomeColumn
;
So I run very similar queries on two different tables. I know the two results share a certain number of rows with the same ID attribute, the one I've just created in the queries. I need to check which rows in one result are not in the other query's result, and vice versa.
Do I have to make temporary tables or views from the queries? Maybe join the two tables in a specific way and only run one query on them?
As a beginner I don't have any experience how to use results as an input for the next query. I'm interested what is the cleanest, most elegant way to do this.
No, you most probably don't need any "temporary" tables. A WITH clause (subquery factoring) would help.
Here's an example:
with
first_query as
(select id, first_column, ...
from (select ABC||DEF||GHI as id, ...)
),
second_query as
(select id, some_column, ...
from (select JKM||OPQ||RST as id, ...)
)
select id from first_query
minus
select id from second_query;
For another result you'd just switch the tables, e.g.
with ... <the same as above>
select id from second_query
minus
select id from first_query
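A runnable sketch of the same pattern, using Python's sqlite3 module as a stand-in for Oracle. SQLite spells the set difference EXCEPT rather than MINUS; the lowercase table names and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE first_table  (abc TEXT, def TEXT, ghi TEXT);
    CREATE TABLE second_table (jkm TEXT, opq TEXT, rst TEXT);
    -- Hypothetical rows: B2y and C3z appear in both tables.
    INSERT INTO first_table  VALUES ('A','1','x'), ('B','2','y'), ('C','3','z');
    INSERT INTO second_table VALUES ('B','2','y'), ('C','3','z'), ('D','4','w');
    """
)

# Both named subqueries build the concatenated ID, as in the question.
CTES = """
WITH first_query AS (
    SELECT abc || def || ghi AS id FROM first_table
),
second_query AS (
    SELECT jkm || opq || rst AS id FROM second_table
)
"""

# EXCEPT is SQLite's counterpart of Oracle's MINUS.
only_in_first = [r[0] for r in conn.execute(
    CTES + "SELECT id FROM first_query EXCEPT SELECT id FROM second_query"
)]
only_in_second = [r[0] for r in conn.execute(
    CTES + "SELECT id FROM second_query EXCEPT SELECT id FROM first_query"
)]

print(only_in_first, only_in_second)
```

Swapping the two operands of the set difference gives the rows missing in the other direction.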

Oracle SQL - How to add a filter on a case field that I created?

I'm relatively new to Oracle SQL and have run into an issue where I'm trying to filter a report to only return records logged by a specific list of user names.
They are currently stored in the system in the fields user.first_name and user.surname, and I've created the following CAST expression in the code to join the two together:
CAST(USER.FIRST_NAME||' '||USER.SURNAME as VARCHAR (25)) as CUSTOMER
What I want to do now, though, is restrict it so that my query only returns records where the customer is in a predetermined list that I can hard-code into the SQL.
E.g. I only want to see records for:
Joe Bloggs,
John Doe,
A Nother
How do I do this in Oracle SQL?
Thanks
One option (as you said you hardcoded it) is
select *
from your_table
where customer in ('Joe Bloggs', 'John Doe', 'A Nother');
A better one is to store those customers into a separate table and join it with your_table:
insert into separate_table (name) values ('Joe Bloggs'); -- do the same for the rest
select *
from your_table y join separate_table s on s.name = y.first_name ||' '||y.surname;
Even better, use their IDs, because there could be two people named John Doe; which one would you take?
insert into separate_table (id) values (1123); -- this is Joe Bloggs
select *
from your_table y join separate_table s on s.id = y.id;
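To see why IDs are safer than names, here is a small sketch in Python with sqlite3 standing in for Oracle. The sample rows, the note column and the second John Doe are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE your_table (id INTEGER, first_name TEXT, surname TEXT, note TEXT);
    CREATE TABLE separate_table (id INTEGER);
    -- Hypothetical data: two different people are called John Doe.
    INSERT INTO your_table VALUES
        (1123, 'Joe',  'Bloggs', 'wanted'),
        (2001, 'John', 'Doe',    'wanted'),
        (2002, 'John', 'Doe',    'namesake, not wanted'),
        (3000, 'Jane', 'Roe',    'not wanted');
    INSERT INTO separate_table VALUES (1123), (2001);
    """
)

# Filtering on the concatenated name also matches the namesake.
by_name = conn.execute(
    "SELECT id FROM your_table "
    "WHERE first_name || ' ' || surname IN (?, ?) ORDER BY id",
    ("Joe Bloggs", "John Doe"),
).fetchall()

# Joining on IDs returns exactly the listed people.
by_id = conn.execute(
    "SELECT y.id FROM your_table y "
    "JOIN separate_table s ON s.id = y.id ORDER BY y.id"
).fetchall()

print(by_name, by_id)
```

The name filter picks up both John Does, while the ID join returns only the two rows that were listed.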

SQL Query Performance with count

I have 2 tables, COMPANY and EMPLOYEE.
COMPANY_ID is the primary key of the COMPANY table and a foreign key in the EMPLOYEE table. The COMPANY_ID is a 10-digit number. We generate 3-digit combinations and query the database with them.
The select statement has regex to bulk load the company based on COMPANY_ID. The query is executed multiple times with different patterns
i.e.
regexp_like(COMPANY_ID, '^(000|001|002|003|004|005|006|007|008|009)') .
Existing query looks something like this -
select *
from COMPANY company
where regexp_like(company.COMPANY_ID, '^(000|001|002|003|004|005|006|007|008|009)')
The new requirement is to retrieve the company information along with the employee count. For example if a company has 10 employees, then the query should return all the columns of the COMPANY table, along with employee count i.e. 10
This is the select statement that I came up with -
select
nvl(count_table.cont_count, 0), company.*
from
COMPANY company,
(select company.COMPANY_ID, count(company.COMPANY_ID) as cont_count
from COMPANY company, EMPLOYEE employee
where regexp_like(company.COMPANY_ID, '^(000|001|002|003|004|005|006|007|008|009)')
and company.CONTACT_ID = employee.CONTACT_ID
group by (company.COMPANY_ID)) count_table
where
regexp_like(company.COMPANY_ID, '^(000|001|002|003|004|005|006|007|008|009)')
and count_table.COMPANY_ID(+)= company.COMPANY_ID
Above query works, but it takes double the time compared to the previous statement. Is there a better way to retrieve the employee count?
Note: Oracle database is in use.
You don't need to execute that expensive REGEXP_LIKE twice:
select nvl(count_table.cont_count,0),company.*
from COMPANY company
,( select employee.COMPANY_ID, count(employee.COMPANY_ID) as cont_count
from EMPLOYEE employee
group by (employee.COMPANY_ID)
) count_table
where regexp_like(company.COMPANY_ID, '^(000|001|002|003|004|005|006|007|008|009)')
and count_table.COMPANY_ID(+)= company.COMPANY_ID
Or you could use a scalar subquery:
select c.*
, (select count(*)
from employee e
where e.company_id = c.company_id
)
from COMPANY c
where regexp_like(c.COMPANY_ID, '^(000|001|002|003|004|005|006|007|008|009)')
And personally I would ditch the slow REGEXP_LIKE for something like:
where substr(c.company_id,1,3) between '000' and '009'
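Putting the scalar subquery and the substr filter together, here is a sketch in Python with sqlite3 as a stand-in for Oracle. I'm assuming company_id is stored as text so the leading zeros survive (the question describes it as a number), and the sample companies are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- company_id kept as TEXT so leading zeros are preserved (assumption).
    CREATE TABLE company  (company_id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE employee (employee_id INTEGER, company_id TEXT);
    INSERT INTO company VALUES
        ('0001234567', 'Acme'),
        ('0091234567', 'Globex'),
        ('0101234567', 'Initech');
    INSERT INTO employee VALUES
        (1, '0001234567'), (2, '0001234567'), (3, '0101234567');
    """
)

# One pass over company; the employee count is a correlated scalar subquery.
rows = conn.execute(
    """
    SELECT c.company_id,
           c.name,
           (SELECT COUNT(*) FROM employee e
             WHERE e.company_id = c.company_id) AS employee_count
    FROM company c
    WHERE substr(c.company_id, 1, 3) BETWEEN '000' AND '009'
    ORDER BY c.company_id
    """
).fetchall()

for row in rows:
    print(row)
```

COUNT(*) over an empty set is already 0, so a company without employees shows 0 without any NVL wrapper.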
The derived table does not add value, thus I would get rid of it and use a scalar query (because I do not know all of your columns in the company table to properly do a group by):
select c.*,
nvl(
(select count(1)
from employee emp
where emp.company_id = c.company_id
),0) employee_count
from company c
where regexp_like(c.company_id, '^(000|001|002|003|004|005|006|007|008|009)')
Also, if performance is still an issue, I would consider changing your WHERE clause so that it does not use a regexp.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Addendum
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I see that the question explicitly identifies that the employee table has company_id as a foreign key. Since this is clarified, I am removing this statement:
The data model for these tables is not intuitive (would you not have
company_id as a foreign key in the employees table?).

Create view with an avg column

I need to create a view UOS_VU_STUDENT_AVERAGE in which one of the columns holds the average GRADE. My SQL so far:
CREATE VIEW UOS_VU_STUDENT_AVERAGE AS
SELECT STUDENT.FIRST_NAME, STUDENT.LAST_NAME, STUDENT_MODULE.GRADE
FROM STUDENT, STUDENT_MODULE
WHERE STUDENT_ID<120000001
How can I average the grade in this SQL?
Try
CREATE VIEW UOS_VU_STUDENT_AVERAGE AS
SELECT STUDENT.FIRST_NAME, STUDENT.LAST_NAME, avg(STUDENT_MODULE.GRADE)
FROM STUDENT, STUDENT_MODULE
WHERE STUDENT_ID<120000001
group by STUDENT.FIRST_NAME, STUDENT.LAST_NAME
As zerkms commented, there is no join condition, so you probably need something like this:
CREATE VIEW UOS_VU_STUDENT_AVERAGE AS
SELECT STUDENT.FIRST_NAME, STUDENT.LAST_NAME, avg(STUDENT_MODULE.GRADE)
FROM STUDENT join STUDENT_MODULE on student_module.STUDENT_ID = student.id
WHERE STUDENT_ID<120000001
group by STUDENT.FIRST_NAME, STUDENT.LAST_NAME
(I'm just guessing that the FK is on student_module.STUDENT_ID = student.id)
You have to use GROUP BY whenever you select ordinary columns alongside an aggregate function such as
AVG ( [ ALL | DISTINCT ] expression )
For example:
SELECT id, AVG(salary) FROM tablename GROUP BY id
You'll need to do something like the following. I've made a few assumptions on the structure of the tables.
CREATE VIEW uos_vu_student_average AS
SELECT first_name, last_name, AVG(grade) avg_grade
FROM student, student_module
WHERE student.student_id = student_module.student_id
AND student.student_id < 120000001
GROUP BY first_name, last_name;
In your example, and as was pointed out by a previous poster, you don't have a join in your original example so it would just average all the grades together, regardless of the student_id. The result would show everyone with the same average grade.
When doing aggregate functions inside a view you also need to assign the resulting column an alias, so you can reference it in some manner when performing DML against the view. In this case I assigned it avg_grade.
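The effect of the missing join is easy to reproduce. Below is a sketch in Python with sqlite3 standing in for Oracle; the table layout follows the assumptions in the answer above, and the two sample students are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE student (student_id INTEGER, first_name TEXT, last_name TEXT);
    CREATE TABLE student_module (student_id INTEGER, grade REAL);
    INSERT INTO student VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray');
    INSERT INTO student_module VALUES (1, 80), (1, 90), (2, 50), (2, 60);

    -- View with a join condition and an alias for the aggregate column.
    CREATE VIEW uos_vu_student_average AS
    SELECT first_name, last_name, AVG(grade) AS avg_grade
    FROM student
    JOIN student_module ON student.student_id = student_module.student_id
    WHERE student.student_id < 120000001
    GROUP BY first_name, last_name;
    """
)

with_join = conn.execute(
    "SELECT * FROM uos_vu_student_average ORDER BY first_name"
).fetchall()

# Same aggregate without a join condition: every student is paired
# with every grade, so all averages collapse to the overall mean.
without_join = conn.execute(
    """
    SELECT first_name, last_name, AVG(grade)
    FROM student, student_module
    GROUP BY first_name, last_name
    ORDER BY first_name
    """
).fetchall()

print(with_join)
print(without_join)
```

With the join each student gets their own average; without it both students show the mean of all four grades.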

Getting latest record for each userid in rails 3.2

I have user with name, location, created_at as important fields in table.
I want to retrieve for each user the latest location,i.e, I want something like this:
username location created_at
abc New York 2012-08-18 16:18:57
xyz Mexico city 2012-08-18 16:18:57
abc Atlanta 2012-08-11 16:18:57
The only input is UId(1,2), an array of user ids. Please help me accomplish this. I just want to know how to write the query using the Active Record query interface.
Generally, this should be a standard way to solve this kind of problem:
SELECT l1.user, l1.location
FROM locations l1
LEFT JOIN locations l2 ON l1.user = l2.user AND l2.created_at > l1.created_at
WHERE l2.id IS NULL
The idea is to join the table with itself, and find those rows which don't have any row with the same user and greater created_at.
Of course, you should have (user, created_at) index on your table.
Now you need to work out how to express that in the Active Record interface.
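Before translating it to Active Record, the self-join can be checked on its own. Here is a minimal sketch in Python with sqlite3; the locations table layout is assumed from the query above, and the rows are the ones from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE locations (
        id INTEGER PRIMARY KEY, user TEXT, location TEXT, created_at TEXT);
    -- Index on (user, created_at), as suggested above.
    CREATE INDEX idx_locations_user_created ON locations (user, created_at);
    INSERT INTO locations (user, location, created_at) VALUES
        ('abc', 'Atlanta',     '2012-08-11 16:18:57'),
        ('abc', 'New York',    '2012-08-18 16:18:57'),
        ('xyz', 'Mexico city', '2012-08-18 16:18:57');
    """
)

# Keep only rows for which no later row of the same user exists.
latest = conn.execute(
    """
    SELECT l1.user, l1.location
    FROM locations l1
    LEFT JOIN locations l2
      ON l1.user = l2.user AND l2.created_at > l1.created_at
    WHERE l2.id IS NULL
    ORDER BY l1.user
    """
).fetchall()

print(latest)
```

The timestamps are stored as ISO-style text here, so plain string comparison orders them correctly. If two rows of one user share the same latest created_at, both are returned.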
When u_id is the array of user ids, then
u_id.map{|i| User.find(i).location}
should be an array of the users' locations.
You can use
User.where(:uid => [1,2,3]).maximum('location')
which will create something like
SELECT MAX(`users`.`location`) AS max_id FROM `users` WHERE `users`.`id` IN (1, 2,3)

Resources