Is it possible to select only distinct combinations of multiple columns?
E.g. only the distinct combinations of customers and the dates they placed orders (as a representation of only days they placed orders)?
What you're looking for are groups of data: grouping on several columns returns each distinct combination of their values exactly once, which is what the GROUP BY clause does.
SELECT customer_id, date
FROM orders
GROUP BY customer_id, date;
SELECT DISTINCT
customerId, orderDate
FROM
table;
Or, in MySQL, where DISTINCTROW is a synonym for DISTINCT:
SELECT DISTINCTROW
customerId, orderDate
FROM
table;
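To check that GROUP BY and SELECT DISTINCT return the same combinations, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The orders table, its column names, and the sample rows are made up for illustration. SQLite does not support DISTINCTROW, so only the first two forms are compared.

```python
import sqlite3

# Hypothetical sample data: an in-memory "orders" table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (customer_id INTEGER, order_date TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "2024-01-05", 10.0),
        (1, "2024-01-05", 25.0),  # same customer, same day: collapses to one row
        (1, "2024-01-06", 5.0),
        (2, "2024-01-05", 7.5),
    ],
)

# Distinct combinations via GROUP BY.
grouped = conn.execute(
    "SELECT customer_id, order_date FROM orders "
    "GROUP BY customer_id, order_date "
    "ORDER BY customer_id, order_date"
).fetchall()

# Distinct combinations via SELECT DISTINCT.
distinct = conn.execute(
    "SELECT DISTINCT customer_id, order_date FROM orders "
    "ORDER BY customer_id, order_date"
).fetchall()

print(grouped)
print(grouped == distinct)
```

Both queries collapse the two orders that customer 1 placed on the same day into a single row.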
Related
Does anyone know how to write the SELECT DISTINCT statement so that the rows circled in blue are treated as duplicates? At the date level they are duplicates, but they differ in their time components.
SELECT ID, PHN_NO, DATE_CREATED, DATE_MODIFIED FROM USER_PHONE_HISTORY
WHERE PHONE_NUMBER = '1234567890'
ORDER BY START_DATE DESC;
-- 12 RECORDS
SELECT DISTINCT ID, PHN_NO, DATE_CREATED, DATE_MODIFIED FROM USER_PHONE_HISTORY
WHERE PHONE_NUMBER = '1234567890'
ORDER BY START_DATE DESC;
-- 12 RECORDS
If I understand correctly, you probably want something like
select distinct id, phn_no, trunc(date_created) as date_created,
trunc(date_modified) as date_modified
from user_phone_history
where .......
order by .......
or some simple modification thereof (it's not clear which date you must handle - this handles both).
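The idea can be tried out with a small sketch in Python's built-in sqlite3 module. SQLite has no TRUNC function for dates, so this example swaps in SQLite's date() function, which likewise drops the time portion; the table contents and the created_day alias are invented for illustration.

```python
import sqlite3

# Hypothetical sample data: timestamps stored as ISO-8601 text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_phone_history (id INTEGER, phn_no TEXT, date_created TEXT)"
)
conn.executemany(
    "INSERT INTO user_phone_history VALUES (?, ?, ?)",
    [
        (1, "1234567890", "2024-03-01 09:15:00"),
        (1, "1234567890", "2024-03-01 17:40:00"),  # same day, different time
        (1, "1234567890", "2024-03-02 08:00:00"),
    ],
)

# DISTINCT on the full timestamp: every row is unique.
with_time = conn.execute(
    "SELECT DISTINCT id, phn_no, date_created FROM user_phone_history"
).fetchall()

# DISTINCT after stripping the time: same-day rows become duplicates.
# date() is SQLite's stand-in here for Oracle's TRUNC on a DATE.
by_day = conn.execute(
    "SELECT DISTINCT id, phn_no, date(date_created) AS created_day "
    "FROM user_phone_history ORDER BY created_day"
).fetchall()

print(len(with_time), len(by_day))
```

DISTINCT compares the values in the select list, so the rows only collapse once the time portion has been removed from the selected expression.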
I am not sure why you want to do this, but I assume you have your reasons...
Have the following data tables:
menu_items:
item_id,
item_name,
price,
sales:
item_id,
customer_id,
employee_id,
date
I am attempting to join the tables on item_id. I want to display the item_name, the number of each item sold, and the date, grouped by date. How should I adjust the code below to make the query work?
select item_name, count(item_name), date
from menu_items join sales
on item_id = item_id
group by date
As you probably found out, it won't work: every non-aggregated column in the select list must also appear in the GROUP BY clause. Also, you should always use table aliases; without them, item_id = item_id is ambiguous.
select s.date_col,
i.item_name,
count(*) number_of_items_sold
from menu_items i join sales s on s.item_id = i.item_id
group by s.date_col, i.item_name
order by s.date_col, i.item_name;
If it is not what you wanted, please, post some sample data and desired output; it might be easier to answer, then.
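For anyone who wants to try the query, here is a small self-contained sketch using Python's built-in sqlite3 module. The sample rows are invented, and the date column is named date_col as in the answer's query.

```python
import sqlite3

# Hypothetical sample data matching the table layouts in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE menu_items (item_id INTEGER, item_name TEXT, price REAL)")
conn.execute(
    "CREATE TABLE sales (item_id INTEGER, customer_id INTEGER, "
    "employee_id INTEGER, date_col TEXT)"
)
conn.executemany(
    "INSERT INTO menu_items VALUES (?, ?, ?)",
    [(1, "Coffee", 2.5), (2, "Bagel", 3.0)],
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        (1, 10, 100, "2024-01-05"),
        (1, 11, 100, "2024-01-05"),
        (2, 10, 101, "2024-01-05"),
        (1, 12, 100, "2024-01-06"),
    ],
)

# Every non-aggregated column (date_col, item_name) is listed in GROUP BY.
rows = conn.execute(
    "SELECT s.date_col, i.item_name, COUNT(*) AS number_of_items_sold "
    "FROM menu_items i JOIN sales s ON s.item_id = i.item_id "
    "GROUP BY s.date_col, i.item_name "
    "ORDER BY s.date_col, i.item_name"
).fetchall()

for row in rows:
    print(row)
```

Each output row is one item on one date together with the number of sales rows for that pair.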
I have a table in Hive with around 80 columns. I need to apply DISTINCT to some of the columns and also get the first values from the other columns. Below is a representation of what I am trying to achieve.
select distinct(col1,col2,col3),col5,col6,col7
from abc where col1 = 'something';
All the columns mentioned above are text columns. So I cannot apply group by and aggregate functions.
You can use the row_number function to solve the problem.
create table temp as
select *, row_number() over (partition by col1,col2,col3) as rn
from abc
where col1 = 'something';
select *
from temp
where rn=1
You can also sort the rows within each partition, which controls which row gets rn = 1.
row_number() over (partition by col1,col2,col3 order by col4 asc) as rn
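The pattern can be sketched outside Hive as well. The example below uses Python's built-in sqlite3 module and assumes the bundled SQLite is version 3.25 or later, which is when window functions were added. The table is cut down to five columns and the sample rows are invented; a subquery stands in for the temp table.

```python
import sqlite3

# Hypothetical, reduced version of the "abc" table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE abc (col1 TEXT, col2 TEXT, col3 TEXT, col4 INTEGER, col5 TEXT)"
)
conn.executemany(
    "INSERT INTO abc VALUES (?, ?, ?, ?, ?)",
    [
        ("something", "a", "x", 2, "second"),
        ("something", "a", "x", 1, "first"),
        ("something", "b", "y", 5, "only"),
        ("other", "a", "x", 0, "filtered out"),
    ],
)

# Number the rows inside each (col1, col2, col3) group, lowest col4 first,
# then keep only the first row of every group.
rows = conn.execute(
    """
    SELECT col1, col2, col3, col4, col5
    FROM (
        SELECT abc.*,
               row_number() OVER (
                   PARTITION BY col1, col2, col3
                   ORDER BY col4 ASC
               ) AS rn
        FROM abc
        WHERE col1 = 'something'
    ) AS t
    WHERE rn = 1
    ORDER BY col2
    """
).fetchall()

for row in rows:
    print(row)
```

Without the ORDER BY inside the window, which row ends up with rn = 1 is not guaranteed.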
DISTINCT is the most overused and least understood keyword in SQL. It is not a function that takes a column list: it is applied to your entire result set after the select list has been evaluated, and it removes duplicates using ALL columns in your select. You can GROUP BY string columns; in fact, that is the answer here:
SELECT col1,col2,col3,COLLECT_SET(col4),COLLECT_SET(col5),COLLECT_SET(col6)
FROM abc WHERE col1 = 'something'
GROUP BY col1,col2,col3;
Now that I re-read your question though, I'm not really sure what you are after. You might have to join the table to an aggregate of itself.
I'm using this query:
SELECT *
FROM HISTORY
LEFT JOIN CUSTOMER ON CUSTOMER.CUST_NUMBER = HISTORY.CUST_NUMBER
LEFT JOIN (
Select LOAN_DATE, CUST_NUMBER, ACCOUNT_NUMBER, STOCK_NUMBER, LOC_SALE
From LOAN
WHERE ACCOUNT_NUMBER != 'DD'
ORDER BY LOAN_DATE DESC
) LOAN ON LOAN.CUST_NUMBER = HISTORY.CUST_NUMBER
order by DATE desc
But I want only the top result from the loan table to be joined (Most recent by Loan_date). For some reason, it's getting three records (one for each loan on the customer I'm looking at). I'm sure I'm missing something simple?
If you're after joining the latest loan row per cust_number, then this ought to do the trick:
select *
from history
left join customer on customer.cust_number = history.cust_number
left join (select loan_date,
cust_number,
account_number,
stock_number,
loc_sale
from (select loan_date,
cust_number,
account_number,
stock_number,
loc_sale,
row_number() over (partition by cust_number
order by loan_date desc) rn
from loan
where account_number != 'DD')
where rn = 1) loan on loan.cust_number = history.cust_number
order by date desc;
If there are two rows with the same loan_date per cust_number and you want to retrieve both, then change the row_number() analytic function to rank().
If you only want to retrieve one row, then you'd have to add additional columns to the order by so that tied rows always sort in the same order; otherwise you could find that different rows are returned on subsequent runs of the query.
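The difference between the two functions on tied dates can be seen in a small sketch with Python's built-in sqlite3 module (assuming SQLite 3.25 or later for window functions). The loan rows are invented, and account_number is used as the tie-breaker purely as an example of an additional order by column.

```python
import sqlite3

# Hypothetical loan data: customer 1 has two loans on the same latest date.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE loan (cust_number INTEGER, loan_date TEXT, account_number TEXT)"
)
conn.executemany(
    "INSERT INTO loan VALUES (?, ?, ?)",
    [
        (1, "2024-02-01", "A1"),
        (1, "2024-02-01", "A2"),  # tie on loan_date with A1
        (1, "2024-01-01", "A0"),
        (2, "2024-01-15", "B1"),
    ],
)

# rank(): tied rows share rank 1, so both latest loans are kept.
rank_rows = conn.execute(
    """
    SELECT cust_number, account_number
    FROM (
        SELECT cust_number, account_number,
               rank() OVER (
                   PARTITION BY cust_number ORDER BY loan_date DESC
               ) AS rk
        FROM loan
    ) AS t
    WHERE rk = 1
    ORDER BY cust_number, account_number
    """
).fetchall()

# row_number() with a tie-breaker: exactly one row per customer, and the
# same one on every run.
rn_rows = conn.execute(
    """
    SELECT cust_number, account_number
    FROM (
        SELECT cust_number, account_number,
               row_number() OVER (
                   PARTITION BY cust_number
                   ORDER BY loan_date DESC, account_number DESC
               ) AS rn
        FROM loan
    ) AS t
    WHERE rn = 1
    ORDER BY cust_number
    """
).fetchall()

print(rank_rows)
print(rn_rows)
```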
I have a table that stores employee job names. It has the following columns:
id; date_from; date_to; emp_id; jobname_id; grade;
Each emp_id can have many consecutive records with the same jobname_id due to many grade changes.
How can I select an employee's previous, different jobname_id values, omitting those that are the same as the most current one?
This solution uses the FIRST_VALUE() analytic function to identify each employee's current job. It then filters for all the jobs which don't match that one:
select distinct emp_id
, jobname_id
from ( select emp_id
, jobname_id
, first_value(jobname_id) over (partition by emp_id
order by date_from desc) as current_job
from employee
where emp_id = 1234 )
where jobname_id != current_job
order by emp_id, jobname_id
/
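A minimal sketch of the FIRST_VALUE() approach, using Python's built-in sqlite3 module (assuming SQLite 3.25 or later). It partitions by emp_id and orders by date_from, following the column list in the question; the sample rows are invented and date_to is left out for brevity.

```python
import sqlite3

# Hypothetical job history for one employee; the last two rows share a
# jobname_id because only the grade changed.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, date_from TEXT, "
    "emp_id INTEGER, jobname_id INTEGER, grade INTEGER)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2020-01-01", 1234, 10, 1),
        (2, "2021-01-01", 1234, 20, 1),
        (3, "2022-01-01", 1234, 30, 1),
        (4, "2023-01-01", 1234, 30, 2),  # current job, new grade
    ],
)

# first_value() over the newest-first ordering tags every row with the
# employee's current job; rows matching it are then filtered out.
previous_jobs = conn.execute(
    """
    SELECT DISTINCT emp_id, jobname_id
    FROM (
        SELECT emp_id, jobname_id,
               first_value(jobname_id) OVER (
                   PARTITION BY emp_id ORDER BY date_from DESC
               ) AS current_job
        FROM employee
        WHERE emp_id = 1234
    ) AS t
    WHERE jobname_id != current_job
    ORDER BY emp_id, jobname_id
    """
).fetchall()

print(previous_jobs)
```

The partition has to be per employee: partitioning by the row's own primary key would put every row in its own partition, so each row's job would always equal its "current" job and nothing would be returned.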
Will this work for your issue?
SELECT DISTINCT
e1.emp_id,
e1.jobname_id
FROM employee e1
WHERE NOT EXISTS
(SELECT 1
FROM employee e2
WHERE e1.emp_id = e2.emp_id
AND e1.jobname_id = e2.jobname_id
AND SYSDATE BETWEEN e2.date_from
AND NVL(e2.date_to, SYSDATE + 1));
(This assumes your table is named "employee" and emp_id is the PK value.)
It selects unique emp_id, jobname_id values where the emp_id, jobname_id values are not current.
EDIT: I agree with Chin Boon that fundamentally this is a design issue and perhaps that should be addressed rather than working around the problem.