Column that sums values once per unique ID, while filtering on type (Oracle Fusion Transportation Intelligence) - oracle

I realize that this has been discussed before but haven't seen a solution in a simple CASE expression for adding a column in Oracle FTI - which is as far as my experience goes at the moment unfortunately. My end goal is to have an total Weight for each Category only counting the null type entries and only one Weight per ID (Don't know why null was chosen as the default Type). I need to break the data apart by Type for a total Cost column which is working fine so I didn't include that in the example data below, but because I have to break the data up by Type, I am having trouble eliminating redundant values in my Total Weight results.
My original column which included redundant weights was as follows:
SUM(CASE Type
WHEN null
THEN 'Weight'
ELSE null
END)
Some additional info:
Each ID can have multiple Types (additionally each ID may not always have A or B but should always have null)
Each ID can only have one Weight (But when broken apart by type the value just repeats and messes up results)
Each ID can only have one Category (This doesn't really matter since I already separate this out with a Category column in the results)
Example Data:
ID |Categ. |Type | Weight
1 | Old | A | 1600
1 | Old | B | 1600
1 | Old |(null) | 1600
2 | Old | B | 400
2 | Old |(null) | 400
2 | Old |(null) | 400
3 | New | A | 500
3 | New | B | 500
3 | New |(null) | 500
4 | New | A | 500
4 | New |(null) | 500
4 | New |(null) | 500
Desired Results:
Categ. | Total Weight
Old | 2000
New | 1000
I was trying to figure out how to include a DISTINCT based on ID in the column, but when I put DISTINCT in front of CASE it just eliminates redundant weights so I would just get 500 for Total Weight New.
Additionally, I thought it would be possible to divide the weight by the count of weights before aggregating them, but this didn't seem to work either:
SUM(CASE Type
WHEN null
THEN 'Weight'/COUNT(CASE Type
WHEN null
THEN 'Weight'
ELSE null
END)
ELSE null
END)
I am very appreciative of any help that can be offered, please let me know if there is a simple way to create a column that achieves the desired results. As it may be apparent, I am pretty new to Oracle, so please let me know if there is any additional information that is needed.
Thanks,
Will

You don't need a case statement here. You were on the right track with distinct, but you also need to use an inline view (a subquery in the from the caluse).
The subquery in the from clause, selecting all distinct combinations of (id, categ, weight), allows you to then select from the result set, whereby you select only categ, sum of weight, grouping by categ. The subquery in the from clause has no repeated weights for a given id (unlike the table itself, which is why this is needed).
This would have to be done a little differently if an id were ever to have more than one category, but you noted that an id only ever has one category.
select categ,
sum(weight)
from (select distinct id,
categ,
weight
from tbl)
group by categ;
Fiddle: http://sqlfiddle.com/#!4/11a56/1/0

Related

How to select nth row in CockroachDB?

If I use something like a SERIAL (which is a random number) for my table's primary key, how can I select a numbered row from my table? In MySQL, I just use the auto incremented ID to select a specific row, but not sure how to approach the problem with an arbitrary numbering sequence.
For reference, here is the table I'm working with:
+--------------------+------+-------+
| id | name | score |
+--------------------+------+-------+
| 235451721728983041 | ABC | 1000 |
| 235451721729015809 | EDF | 1100 |
| 235451721729048577 | GHI | 1200 |
| 235451721729081345 | JKL | 900 |
+--------------------+------+-------+
Using the LIMIT and OFFSET clauses will return the nth row. For example SELECT * FROM tbl ORDER BY col1 LIMIT 1 OFFSET 9 returns the 10th row.
Note that it’s important to include the ORDER BY clause here because you care about the order of the results (if you don’t include ORDER BY, it’s possible that the results are arbitrarily ordered).
If you care about the order in which things were inserted, you could ORDER BY the SERIAL column (id in your case), though it’s not always the case because transaction contention and other things could cause the generated SERIAL values to not be strictly ordered.

ORDER BY subquery and ROWNUM goes against relational philosophy?

Oracle 's ROWNUM is applied before ORDER BY. In order to put ROWNUM according to a sorted column, the following subquery is proposed in all documentations and texts.
select *
from (
select *
from table
order by price
)
where rownum <= 7
That bugs me. As I understand, table input into FROM is relational, hence no order is stored, meaning the order in the subquery is not respected when seen by FROM.
I cannot remember the exact scenarios but this fact of "ORDER BY has no effect in the outer query" I have read more than once. Examples are in-line subqueries, subquery for INSERT, ORDER BY of PARTITION clause, etc. For example in
OVER (PARTITION BY name ORDER BY salary)
the salary order will not be respected in outer query, and if we want salary to be sorted at outer query output, another ORDER BY need to be added in the outer query.
Some insights from everyone on why the relational property is not respected here and order is stored in the subquery ?
The ORDER BY in this context is in effect Oracle's proprietary syntax for generating an "ordered" row number on a (logically) unordered set of rows. This is a poorly designed feature in my opinion but the equivalent ISO standard SQL ROW_NUMBER() function (also valid in Oracle) may make it clearer what is happening:
select *
from (
select ROW_NUMBER() OVER (ORDER BY price) rn, *
from table
) t
where rn <= 7;
In this example the ORDER BY goes where it more logically belongs: as part of the specification of a derived row number attribute. This is more powerful than Oracle's version because you can specify several different orderings defining different row numbers in the same result. The actual ordering of rows returned by this query is undefined. I believe that's also true in your Oracle-specific version of the query because no guarantee of ordering is made when you use ORDER BY in that way.
It's worth remembering that Oracle is not a Relational DBMS. In common with other SQL DBMSs Oracle departs from the relational model in some fundamental ways. Features like implicit ordering and DISTINCT exist in the product precisely because of the non-relational nature of the SQL model of data and the consequent need to work around keyless tables with duplicate rows.
Not surprisingly really, Oracle treats this as a bit of a special case. You can see that from the execution plan. With the naive (incorrect/indeterminate) version of the limit that crops up sometimes, you get SORT ORDER BY and COUNT STOPKEY operations:
select *
from my_table
where rownum <= 7
order by price;
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 3 (34)| 00:00:01 |
| 1 | SORT ORDER BY | | 1 | 13 | 3 (34)| 00:00:01 |
|* 2 | COUNT STOPKEY | | | | | |
| 3 | TABLE ACCESS FULL| MY_TABLE | 1 | 13 | 2 (0)| 00:00:01 |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(ROWNUM<=7)
If you just use an ordered subquery, with no limit, you only get the SORT ORDER BY operation:
select *
from (
select *
from my_table
order by price
);
-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 3 (34)| 00:00:01 |
| 1 | SORT ORDER BY | | 1 | 13 | 3 (34)| 00:00:01 |
| 2 | TABLE ACCESS FULL| MY_TABLE | 1 | 13 | 2 (0)| 00:00:01 |
-------------------------------------------------------------------------------
With the usual subquery/ROWNUM construct you get something different,
select *
from (
select *
from my_table
order by price
)
where rownum <= 7;
------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 3 (34)| 00:00:01 |
|* 1 | COUNT STOPKEY | | | | | |
| 2 | VIEW | | 1 | 13 | 3 (34)| 00:00:01 |
|* 3 | SORT ORDER BY STOPKEY| | 1 | 13 | 3 (34)| 00:00:01 |
| 4 | TABLE ACCESS FULL | MY_TABLE | 1 | 13 | 2 (0)| 00:00:01 |
------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM<=7)
3 - filter(ROWNUM<=7)
The COUNT STOPKEY operation is still there for the outer query, but the inner query (inline view, or derived table) now has a SORT ORDER BY STOPKEY instead of the simple SORT ORDER BY. This is all hidden away in the internals so I'm speculating, but it looks like the stop key - i.e. the row number limit - is being pushed into the subquery processing, so in effect the subquery may only end up with seven rows anyway - though the plan's ROWS value doesn't reflect that (but then you get the same plan with a different limit), and it still feels the need to apply the COUNT STOPKEY operation separately.
Tom Kyte covered similar ground in an Oracle Magazine article, when talking about "Top- N Query Processing with ROWNUM" (emphasis added):
There are two ways to approach this:
- Have the client application run that query and fetch just the first N rows.
- Use that query as an inline view, and use ROWNUM to limit the results, as in SELECT * FROM ( your_query_here ) WHERE ROWNUM <= N.
The second approach is by far superior to the first, for two reasons. The lesser of the two reasons is that it requires less work by the client, because the database takes care of limiting the result set. The more important reason is the special processing the database can do to give you just the top N rows. Using the top- N query means that you have given the database extra information. You have told it, "I'm interested only in getting N rows; I'll never consider the rest." Now, that doesn't sound too earth-shattering until you think about sorting—how sorts work and what the server would need to do.
... and then goes on to outline what it's actually doing, rather more authoritatively than I can.
Interestingly I don't think the order of the final result set is actually guaranteed; it always seems to work, but arguably you should still have an ORDER BY on the outer query too to make it complete. It looks like the order isn't really stored in the subquery, it just happens to be produced like that. (I very much doubt that will ever change as it would break too many things; this ends up looking similar to a table collection expression which also always seems to retain its ordering - breaking that would stop dbms_xplan working though. I'm sure there are other examples.)
Just for comparison, this is what the ROW_NUMBER() equivalent does:
select *
from (
select ROW_NUMBER() OVER (ORDER BY price) rn, my_table.*
from my_table
) t
where rn <= 7;
-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 52 | 4 (25)| 00:00:01 |
|* 1 | VIEW | | 2 | 52 | 4 (25)| 00:00:01 |
|* 2 | WINDOW SORT PUSHED RANK| | 2 | 26 | 4 (25)| 00:00:01 |
| 3 | TABLE ACCESS FULL | MY_TABLE | 2 | 26 | 3 (0)| 00:00:01 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("RN"<=7)
2 - filter(ROW_NUMBER() OVER ( ORDER BY "PRICE")<=7)
Adding to sqlvogel's good answer :
"As I understand, table input into FROM is relational"
No, table input into FROM is not relational. It is not relational because "table input" are tables and tables are not relations. The myriads of quirks and oddities in SQL eventually all boil down to that simple fact : the core building brick in SQL is the table, and a table is not a relation. To sum up the differences :
Tables can contain duplicate rows, relations cannot. (As a consequence, SQL offers bag algebra, not relational algebra. As another consequence, it is as good as impossible for SQL to even define equality comparison for its most basic building brick !!! How would you compare tables for equality given that you might have to deal with duplicate rows ?)
Tables can contain unnamed columns, relations cannot. SELECT X+Y FROM ... As a consequence, SQL is forced into "column identity by ordinal position", and as a consequence of that, you get all sorts of quirks, e.g. in SELECT A,B FROM ... UNION SELECT B,A FROM ...
Tables can contain duplicate column names, relations cannot. A.ID and B.ID in a table are not distinct column names. The part before the dot is not part of the name, it is a "scope identifier", and that scope identifier "disappears" once you're "outside the SELECT" it appears/is introduced in. You can verify this with a nested SELECT : SELECT A.ID FROM (SELECT A.ID, B.ID FROM ...). It won't work (unless your particular implementation departs from the standard in order to make it work).
Various SQL constructs leave people with the impression that tables do have an ordering to rows. The ORDER BY clause, obviously, but also the GROUP BY clause (which can be made to work only by introducing rather dodgy concepts of "intermediate tables with rows grouped together"). Relations simply are not like that.
Tables can contain NULLs, relations cannot. This one has been beaten to death.
There should be some more, but I don't remember them off the tip of the hat.

whats the purpose of using over and rank keywords in hive sql?

What is the meaning/purpose of using over and rank keywords in hive sql?
select rank() over (order by net_worth desc) as rank, name, net_worth from wealth order by rank, name;
+------+---------+---------------+
| rank | name | net_worth |
+------+---------+---------------+
| 1 | Solomon | 2000000000.00 |
| 2 | Croesus | 1000000000.00 |
| 2 | Midas | 1000000000.00 |
| 4 | Crassus | 500000000.00 |
| 5 | Scrooge | 80000000.00 |
+------+---------+---------------+
The OVER clause is powerful in that you can have aggregates over different ranges ("windowing"), whether you use a GROUP BY or not
OVER clause defines a window or user-specified set of rows within a query result set. A window function then computes a value for each row in the window. You can use the OVER clause with functions to compute aggregated values such as moving averages, cumulative aggregates, running totals, or a top N per group results
Over clause can be used in association with aggregate function and ranking function. The over clause determine the partitioning and ordering of the records before associating with aggregate or ranking function.
suppose you use only rank() function then how sql will understand on which bases rank will be calculated. example table have 3 columns name, net_worth and net_profit. Name with highest net_profit will be first rank. so you have to tell the sql that calculate rank on the bases of highest net_profit.
over() works on a "window" of attributes.
In your example, select rank() over (order by net_worth desc), you have instructed to rank the table with net_worth column in descending order. Due to that reason, ranking is done on descending order of net_worth.
over() is powerful it was used along with partition by.
Have a look at this article, which provides good examples to understand the concepts.
If you have sales table with Territory & Sales Amount, you can provide rank on order of Sales Amount Or create a partition for Territory and rank the Sales amount with in a Territory.
Have a look at this article to get understanding on WindowingAndAnalytics. It will explain how to use aggregate functions in HiveQL.

Oracle Identify not unique values in a clob column of a table

I want to identify all rows whose content in a clob column is not unique.
The query I use is:
select
id,
clobtext
from
table t
where
(select count(*) from table innerT where dbms_lob.compare(innerT.clobtext, t.clobtext) = 0)>1
However this query is very slow. Any suggestions to speed it up? I already tried to use the dbms_lob.getlength function to eliminate more elements in the subquery but I didn't really improve the performance (feels the same).
To make it more clear an example:
table
ID | clobtext
1 | a
2 | b
3 | c
4 | d
5 | a
6 | d
After running the query. I'd like to get (order doesn't matter):
1 | a
4 | d
5 | a
6 | d
In the past I've generated checksums (in my C# code) for each clob.
Whilst this will inccur a one off increase in io (to generate the checksum)
subsequent scans will be quicker, and you can index the value too
TK has a good PL\SQL example here:
Ask Tom

Will this type of pagination scale?

I need to paginate on a set of models that can/will become large. The results have to be sorted so that the latest entries are the ones that appear on the first page (and then, we can go all the way to the start using 'next' links).
The query to retrieve the first page is the following, 4 is the number of entries I need per page:
SELECT "relationships".* FROM "relationships" WHERE ("relationships".followed_id = 1) ORDER BY created_at DESC LIMIT 4 OFFSET 0;
Since this needs to be sorted and since the number of entries is likely to become large, am I going to run into serious performance issues?
What are my options to make it faster?
My understanding is that an index on 'followed_id' will simply help the where clause. My concern is on the 'order by'
Create an index that contains these two fields in this order (followed_id, created_at)
Now, how large is the large we are talking about here? If it will be of the order of millions.. How about something like the one that follows..
Create an index on keys followed_id, created_at, id (This might change depending upon the fields in select, where and order by clause. I have tailor-made this to your question)
SELECT relationships.*
FROM relationships
JOIN (SELECT id
FROM relationships
WHERE followed_id = 1
ORDER BY created_at
LIMIT 10 OFFSET 10) itable
ON relationships.id = itable.id
ORDER BY relationships.created_at
An explain would yield this:
+----+-------------+---------------+------+---------------+-------------+---------+------+------+-----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+------+---------------+-------------+---------+------+------+-----------------------------------------------------+
| 1 | PRIMARY | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Impossible WHERE noticed after reading const tables |
| 2 | DERIVED | relationships | ref | sample_rel2 | sample_rel2 | 5 | | 1 | Using where; Using index |
+----+-------------+---------------+------+---------------+-------------+---------+------+------+-----------------------------------------------------+
If you examine carefully, the sub-query containing the order, limit and offset clauses will operate on the index directly instead of the table and finally join with the table to fetch the 10 records.
It makes a difference when at one point your query makes a call like limit 10 offset 10000. It will retrieve all the 10000 records from the table and fetch the first 10. This trick should restrict the traversal to just the index.
An important note: I tested this in MySQL. Other database might have subtle differences in behavior, but the concept holds good no matter what.
you can index these fields. but it depends:
you can assume (mostly) that the created_at is already ordered. So that might by unnecessary. But that more depends on you app.
anyway you should index followed_id (unless its the primary key)

Resources