Is this possible?
I have a sample query:
select vehicle_name
from vehicles
where vehicle_name in ('TOYO', 'HOND');
I may have many vehicle_name values in the IN clause. The result it returns is shown in the first screenshot below (screenshot 1).
What I want is shown in the second screenshot (screenshot 2): the first row should be HOND and the second TOYO (in alphabetical order), the third HOND and the fourth TOYO, and so on. In other words, two HOND rows or two TOYO rows should not follow one another until the end of the result, where no alternate vehicle_name is left.
Thanks,
You could use the analytic function ROW_NUMBER() to generate sequence numbers separately for each vehicle_name and use that for ordering. You don't need to add the function to SELECT - you can use it directly in ORDER BY.
SELECT vehicle_name
FROM vehicles
WHERE vehicle_name in ('HOND', 'TOYO')
ORDER BY row_number() over (partition by vehicle_name order by null), vehicle_name
;
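To see how the per-name sequence number produces the alternating order, here is a small sketch using Python's built-in sqlite3 module. It is only an illustration under assumptions: the sample rows are made up, SQLite (3.25 or later, for window functions) stands in for Oracle, and the sequence number is computed in a subquery and ordered by rowid instead of by null.

```python
import sqlite3

def interleaved_names(names):
    """Return the HOND/TOYO rows ordered so that the two names alternate."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE vehicles (vehicle_name TEXT)")
    conn.executemany("INSERT INTO vehicles VALUES (?)", [(n,) for n in names])
    query = """
        SELECT vehicle_name
        FROM (
            SELECT vehicle_name,
                   -- 1, 2, 3, ... restarted for each distinct vehicle_name
                   row_number() OVER (PARTITION BY vehicle_name ORDER BY rowid) AS rn
            FROM vehicles
            WHERE vehicle_name IN ('HOND', 'TOYO')
        )
        ORDER BY rn, vehicle_name
    """
    result = [name for (name,) in conn.execute(query)]
    conn.close()
    return result

print(interleaved_names(["TOYO", "TOYO", "HOND", "TOYO", "HOND", "FORD"]))
# ['HOND', 'TOYO', 'HOND', 'TOYO', 'TOYO']
```

Once one name runs out, the remaining rows of the other name simply follow each other, which matches the behaviour asked for in the question.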
I have a question about "Subquery in Order by clause". The request below returns an error. Does it mean that a subquery in the ORDER BY clause must be scalar?
select *
from employees
order by (select * from employees where first_name ='Steven' and last_name='King');
Error:
ORA-00913: too many values
00913. 00000 - "too many values"
Yes, it means that if you use a subquery in ORDER BY it must be scalar.
With select * your subquery returns multiple columns and the DBMS would not know which of these to use for the sorting. And if you selected one column only, you would still have to make sure you only select one row of course. (The difference is that Oracle sees the too-many-columns problem immediately, but detects too many rows only when fetching the data.)
This would be allowed:
select * from employees
order by (select birthdate from employees where employee_id = 12345);
This is a scalar query, because it returns only one value (one column, one row). But of course this still makes as little sense as your original query, because the subquery result is independent from the main query, i.e. it returns the same value for every row in the table and thus no sorting takes effect.
A last remark: a subquery in ORDER BY very seldom makes sense, because it would mean you order by something you don't display. The exception is when looking up a sort key, e.g.:
select *
from products p
where type = 'shirt' and color = 'blue' and size in ('S', 'M', 'L', 'XL')
order by (select sortkey from sizes s where s.size = p.size);
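The sort-key lookup can be tried out with a short sqlite3 sketch. The table contents and the `name` column are made up for illustration, and SQLite stands in for Oracle; the point is only that a correlated scalar subquery in ORDER BY yields one value per outer row.

```python
import sqlite3

def shirts_by_size():
    """Order products by a sort key looked up in a separate sizes table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE sizes (size TEXT PRIMARY KEY, sortkey INTEGER);
        INSERT INTO sizes VALUES ('S', 1), ('M', 2), ('L', 3), ('XL', 4);
        CREATE TABLE products (name TEXT, size TEXT);
        INSERT INTO products VALUES ('shirt-a', 'XL'), ('shirt-b', 'S'),
                                    ('shirt-c', 'L'), ('shirt-d', 'M');
    """)
    query = """
        SELECT p.name, p.size
        FROM products p
        -- scalar subquery: one column, one row for each product
        ORDER BY (SELECT s.sortkey FROM sizes s WHERE s.size = p.size)
    """
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows

print(shirts_by_size())
# [('shirt-b', 'S'), ('shirt-d', 'M'), ('shirt-c', 'L'), ('shirt-a', 'XL')]
```

Sorting on the size column itself would give L, M, S, XL, which is why the lookup is useful here.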
It means that the valid options for the ORDER BY clause are
expression,
position or
column alias
A subquery that returns more than one column or row is none of these; only a scalar subquery counts as an expression.
These are the field (crane_no) values to be sorted
QC11, QC10, QC9
I tried the following query:
select * from table order by crane_no DESC
but the results do not come back in numeric order because the field mixes a string and a number (example: QC12).
I get following results for above query:
QC9, QC11, QC10
I want the results to be in order (QC9, QC10, QC11). Thanks
If the data isn't huge, I'd use a regex order by clause:
select
  crane_no
from your_table
order by
  regexp_substr(crane_no, '^\D*') nulls first,
  to_number(regexp_substr(crane_no, '\d+'))
This looks for the numbers in the string, so rows like 'QCC20', 'DCDS90' are ordered properly; it also takes care of nulls.
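The same idea (split the value into a non-digit prefix and a number, then sort on both) can be sketched outside the database. This is a plain-Python illustration, not the Oracle implementation: `crane_sort_key` is a hypothetical helper, and placing values without digits before numbered ones is an assumption.

```python
import re

def crane_sort_key(crane_no):
    """Split a code such as 'QC12' into a (prefix, number) sort key."""
    # (\D*) captures the leading non-digits, (\d*) the digits right after them.
    match = re.match(r"(\D*)(\d*)", crane_no)
    prefix, digits = match.group(1), match.group(2)
    # Assumption: a value with no digits sorts before any numbered value.
    return (prefix, int(digits) if digits else -1)

codes = ["QC11", "QC10", "QC9", "QCC20", "DCDS90"]
print(sorted(codes, key=crane_sort_key))
# ['DCDS90', 'QC9', 'QC10', 'QC11', 'QCC20']
```

Comparing the number as an integer is what puts QC9 before QC10; as text, '9' sorts after '1'.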
One approach is to extract the numeric portion of the crane_no column using SUBSTR(), cast it to an integer, and order descending by this value.
SELECT *
FROM yourTable
ORDER BY CAST(SUBSTR(crane_no, 3) AS INT) DESC
Note that I assume in my answer that every entry in crane_no is prefixed with the fixed width QC. If not, then we would have to do more work to identify the numerical component.
select ...
order by to_number( substr( crane_no,3 )) desc
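For the fixed-width case, a quick sqlite3 sketch shows the effect of sorting on the numeric suffix. It assumes every value starts with a two-character prefix, uses a made-up `cranes` table, and sorts ascending to get the QC9, QC10, QC11 order requested in the question; SQLite's CAST stands in for Oracle's to_number.

```python
import sqlite3

def cranes_in_numeric_order(crane_numbers):
    """Sort crane codes by the number that follows a two-character prefix."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cranes (crane_no TEXT)")
    conn.executemany("INSERT INTO cranes VALUES (?)", [(c,) for c in crane_numbers])
    query = """
        SELECT crane_no
        FROM cranes
        -- SUBSTR is 1-based: position 3 skips the 'QC' prefix
        ORDER BY CAST(SUBSTR(crane_no, 3) AS INTEGER)
    """
    result = [c for (c,) in conn.execute(query)]
    conn.close()
    return result

print(cranes_in_numeric_order(["QC11", "QC10", "QC9"]))
# ['QC9', 'QC10', 'QC11']
```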
I have a table with 5 columns:
table1: ID, CODE, RESULT, RESULT2, RESULT3
I have this SAS code:
data table1;
set table1;
BY ID CODE;
IF FIRST.CODE and RESULT='A' THEN OUTPUT;
ELSE IF LAST.CODE and RESULT NE 'A' THEN OUTPUT;
RUN;
So we are grouping the data by ID and CODE, and then writing to the dataset if certain conditions are met. I want to write a hive query to replicate this. This is what I have:
proc sql;
create table temp as
select *, row_number() over (partition by ID, CODE) as rowNum
from table1;
create table temp2 as
select a.ID, a.CODE, a.RESULT, a.RESULT2, a.RESULT3
from temp a
inner join (select ID, CODE, max(rowNum) as maxRowNum
from temp
group by ID, CODE) b
on a.ID=b.ID and a.CODE=b.CODE
where (a.rowNum=1 and a.RESULT='A') or (a.rowNum=b.maxRowNum and a.RESULT NE 'A');
quit;
There are two issues I see with this.
1) The row that is first or last in each BY group is entirely dependent on the order of rows in table1 in SAS; we aren't ordering by anything. I don't think row order is preserved when translating to a Hive query.
2) The SAS code is taking the first row in each BY GROUP or the last, not both. I think that my HIVE query is taking both, resulting in more rows than I want.
Any suggestions or insight on how to improve my query is appreciated. Is it even possible to replicate this SAS code in HIVE?
The SAS code has a BY statement (BY ID CODE;), which tells SAS that the input dataset is sorted by those variables. So first. and last. are not a random selection.
That said, we can replicate this in HIVE by using the first_value and last_value window functions.
FIRST.CODE should replicate to
first_value(code) over (partition by Id order by code)fcode
Similarly, LAST.CODE would be
last_value(code) over (partition by Id order by code rows between unbounded preceding and unbounded following) lcode
Once you have the fcode and lcode columns, use case when statements for the result column criteria. Like,
case when (code=fcode and result='A') or (code=lcode and result<>'A')
then 1 else 0 end as op_flag
Then fetch the rows from the table with where op_flag = 1
SAMPLE
select id, code, result from (
select *,
first_value(code) over (partition by id order by code) fcode,
last_value(code) over (partition by id order by code rows between unbounded preceding and unbounded following) lcode
from footab) f
where (code=fcode and result='A') or (code=lcode and result<>'A')
Regarding point 1): BY-group processing requires the input data to be sorted or indexed on the BY variables, so although the code contains no ordering, the source data is processed in order. If the input data were not indexed or sorted, SAS would throw an error.
In this respect, differences can arise among rows that share the same values of the BY variables, especially when their RESULT values differ.
In SAS, I would pre-sort data by ID, CODE, RESULT, then use BY ID CODE in order to not be influenced by order of rows.
Regarding 2) FIRST and LAST can be both true in SAS. Since your condition for first and last on RESULT is different, I guess this is not a source of differences.
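To make the FIRST./LAST. behaviour concrete, here is a rough plain-Python model of the DATA step from the question. It is a sketch under assumptions, not SAS itself: rows are (id, code, result) tuples with made-up values, and, as in BY-group processing, the input is expected to be sorted by id and code already.

```python
from itertools import groupby

def data_step(rows):
    """Emulate the IF FIRST.CODE / ELSE IF LAST.CODE logic on sorted rows."""
    output = []
    # groupby only groups adjacent rows, which mirrors the sorted-input requirement.
    for _, group in groupby(rows, key=lambda r: (r[0], r[1])):
        members = list(group)
        for position, row in enumerate(members):
            first = position == 0
            last = position == len(members) - 1  # both are True for a one-row group
            if first and row[2] == "A":
                output.append(row)
            elif last and row[2] != "A":
                output.append(row)
    return output

rows = [
    (1, "X", "A"), (1, "X", "B"),  # first is 'A' and last is not 'A': both rows kept
    (1, "Y", "B"), (1, "Y", "A"),  # neither condition holds: nothing kept
    (2, "X", "B"),                 # single row, first and last at once: kept
]
print(data_step(rows))
# [(1, 'X', 'A'), (1, 'X', 'B'), (2, 'X', 'B')]
```

A group can therefore contribute zero, one, or two rows, so getting both the first and the last row of a group is consistent with the SAS logic.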
I guess you could add another field as
row_number() over (partition by ID, CODE desc) as rowNumDesc
to detect last row with rowNumDesc = 1 (so that you skip the join).
EDIT:
I think the two programs above both include random selection of rows for groups with same values of ID and CODE variables, especially with same values of RESULT. But you should get same number of rows from both. If not, just debug it.
However, the random aspect in the SAS code/storage is based on the physical order of rows, while ROW_NUMBER's randomness within a group depends on how the engine implements the function.
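One way to take the randomness out is to number the rows on an explicit ordering column, once ascending and once descending, which also avoids the join. The sketch below uses sqlite3 as a stand-in for Hive; the `seq` column is hypothetical (the original table has no such column) and the data is made up.

```python
import sqlite3

def first_last_rows(rows):
    """rows are (seq, id, code, result); seq fixes the order inside each group."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table1 (seq INTEGER, id INTEGER, code TEXT, result TEXT)"
    )
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT id, code, result
        FROM (
            SELECT t.*,
                   row_number() OVER (PARTITION BY id, code ORDER BY seq)      AS rn_first,
                   row_number() OVER (PARTITION BY id, code ORDER BY seq DESC) AS rn_last
            FROM table1 t
        )
        WHERE (rn_first = 1 AND result = 'A')   -- first row of the group
           OR (rn_last = 1 AND result <> 'A')   -- last row of the group
        ORDER BY seq
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

sample = [
    (1, 1, "X", "A"), (2, 1, "X", "B"),
    (3, 1, "Y", "B"), (4, 1, "Y", "A"),
    (5, 2, "X", "B"),
]
print(first_last_rows(sample))
# [(1, 'X', 'A'), (1, 'X', 'B'), (2, 'X', 'B')]
```

Without a column like `seq`, which row counts as first or last inside a group is up to the engine.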
I have a select statement that includes a call to the row_number() function, which technically gives me a unique id per row that is returned.
SELECT f.*, row_number() OVER (ORDER BY f.name) AS row_id
FROM widgets f
It'd be kinda cool if I could somehow use this row_id to sort the table. I'd now like to use the row_number to sort like so:
table.sort(mytable, function(a,b) return a.row_id< b.row_id end)
I'm just trying to save myself from having to loop through the results to add a unique id and then sort it but maybe it's not possible.
I don't know how you would express this in Lua, but in Postgres you can order by a column alias. The Postgres query would be:
SELECT f.*, row_number() OVER (ORDER BY f.name) as row_id
FROM widgets f
ORDER BY row_id;
The fact that you don't want to order by f.name suggests that you have duplicates. Do note that ordering in SQL is not guaranteed to be stable. That is, duplicate names could be in different orders. If you have a way of making the ordering stable (i.e. by uniquely identifying each row), you can use those columns in the order by.
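A minimal sketch of the tie-breaker idea, using sqlite3 in place of Postgres: adding a unique column to the window's ORDER BY makes the numbering repeatable even when names are duplicated. The `id` column and the sample rows are assumptions for illustration.

```python
import sqlite3

def numbered_widgets(widgets):
    """widgets are (id, name); id is unique and breaks ties between equal names."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE widgets (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO widgets VALUES (?, ?)", widgets)
    query = """
        SELECT f.id, f.name,
               row_number() OVER (ORDER BY f.name, f.id) AS row_id
        FROM widgets f
        ORDER BY row_id
    """
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows

print(numbered_widgets([(3, "bolt"), (1, "bolt"), (2, "axle")]))
# [(2, 'axle', 1), (1, 'bolt', 2), (3, 'bolt', 3)]
```

With ORDER BY f.name alone, the two 'bolt' rows could receive row_id 2 and 3 in either order.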
I can find the empirical distribution this way:
select command_type, duration, round(percentage, 2)
from (select distinct command_type, duration,
percent_rank() over(partition by command_type order by duration) percentage
from command_durations
order by 1, 2)
The question is how to do the same using the Oracle MODEL clause. I have started with this:
select command_type,duration,dur_count from command_durations
model UNIQUE SINGLE REFERENCE
partition by (command_type)
dimension by ( duration)
measures(0 dur_count)
rules(
dur_count[duration]=count(1)[cv(duration)]
)
order by command_type,duration
But now I need to make the records distinct in order to proceed with finding the empirical distribution.
How can I make the records distinct in the MODEL clause?
If you want to take that query and apply DISTINCT to it, one method is to wrap it in a subquery in the FROM clause and select DISTINCT from that. For instance:
Select Distinct command_type, duration, dur_count
From (
[Your Code]
)
Let me know if that works.
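The wrap-and-DISTINCT pattern can be tried with a small sqlite3 sketch. SQLite has no MODEL clause, so a windowed count(*) is used here as a stand-in for the inner query: like the MODEL query, it returns one row per source row and therefore repeats the same count. The sample durations are made up.

```python
import sqlite3

def distinct_duration_counts(rows):
    """rows are (command_type, duration); returns one row per distinct pair."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE command_durations (command_type TEXT, duration INTEGER)")
    conn.executemany("INSERT INTO command_durations VALUES (?, ?)", rows)
    query = """
        SELECT DISTINCT command_type, duration, dur_count
        FROM (
            -- stand-in for the MODEL query: duplicates survive in this inner result
            SELECT command_type, duration,
                   count(*) OVER (PARTITION BY command_type, duration) AS dur_count
            FROM command_durations
        )
        ORDER BY command_type, duration
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

print(distinct_duration_counts([("a", 1), ("a", 1), ("a", 2), ("b", 5)]))
# [('a', 1, 2), ('a', 2, 1), ('b', 5, 1)]
```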