Determine table to select from based on indicator - oracle

I have a situation where there are three tables with the same design (same columns, same primary key, same foreign keys) that store different values for the same date; only one row is stored per day in each table. Each table has a flag column (Y or N) that determines which table's value should be reported on a specific day. Only one table's value will be flagged Y for a given CODE on a given day.
Example:
Table 1:
Date   | Code | Value | Flag
01-DEC | ABC  | 111   | N
Table 2:
Date   | Code | Value | Flag
01-DEC | ABC  | 222   | N
Table 3:
Date   | Code | Value | Flag
01-DEC | ABC  | 333   | Y
Referring to the example above, the value of code ABC for date 01-DEC should be 333, as it is flagged Y.
What should the SQL look like?

First off, I would strongly question the data model that has three identical tables that seem to store essentially the same information and serve the same purpose.
You could, assuming that there will be only one Y row, do
SELECT *
FROM (SELECT * FROM table1
      UNION ALL
      SELECT * FROM table2
      UNION ALL
      SELECT * FROM table3)
WHERE flag = 'Y'
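If the tables are large, a variant that pushes the flag (and any code/date) predicates into each branch may let each table use an index. A rough sketch; the dt and code column names and the bind variables are assumptions, not from the original question:
-- assumed column names (dt, code) and binds; adjust to the real schema
SELECT * FROM table1 WHERE flag = 'Y' AND code = :code AND dt = :dt
UNION ALL
SELECT * FROM table2 WHERE flag = 'Y' AND code = :code AND dt = :dt
UNION ALL
SELECT * FROM table3 WHERE flag = 'Y' AND code = :code AND dt = :dt;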

Related

Oracle: Update values in table with aggregated values from same table

I am looking for a possibly better approach to this.
I have created a temporary table in Oracle 11.2 that I'm using to pre-calculate values that I will need in other selects, instead of always generating them again with each select.
create global temporary table temp_foo (
DT timestamp(6), --only the date part will be used in this example but for later things I will need the time
Something varchar2(100),
Customer varchar2(100),
MinDate timestamp(6),
MaxDate timestamp(6),
Filecount int,
Errorcount int,
AvgFilecount int,
constraint PK_foo primary key (DT, Customer)
) on commit preserve rows;
I then first insert some fixed values for everything except AvgFilecount. AvgFilecount should contain the average of the Filecount for the 3 previous records (going by the date in DT). It doesn't matter that the result will be converted to an int; I don't need the decimal places.
DT         | Customer | Filecount | AvgFilecount
2019-04-30 | x        | 10        | avg(2+3+9)
2019-04-29 | x        | 2         | based on values before this
2019-04-28 | x        | 3         | based on values before this
2019-04-27 | x        | 9         | based on values before this
I thought about using a normal UPDATE statement, as this should be faster than looping through the values. I should mention that there are no gaps in the DT field, but obviously there is a first record for which I won't find any previous records. If I looped through, I could easily calculate AvgFilecount as (the record before the previous record/2 + the previous record)/3, which I cannot do with UPDATE, as I cannot guarantee the order in which the rows are processed. So I'm fine with just taking the last 3 records (going by DT) and calculating it from there.
What I thought would be an easy update is giving me headaches. I mostly work with SQL Server, where I would just join the 3 other records, but it seems to be a bit different in Oracle. I found https://stackoverflow.com/a/2446834/4040068 and wanted to use the second approach in that answer.
update
  (select curr.DT, curr.Customer, curr.Filecount, curr.AvgFilecount as OLD,
          (coalesce(Minus1.Filecount, 0) + coalesce(Minus2.Filecount, 0)
           + coalesce(Minus3.Filecount, 0)) / 3 as NEW
     from temp_foo curr
     left join temp_foo Minus1 on Minus1.Customer = curr.Customer and trunc(Minus1.DT) = trunc(curr.DT-1)
     left join temp_foo Minus2 on Minus2.Customer = curr.Customer and trunc(Minus2.DT) = trunc(curr.DT-2)
     left join temp_foo Minus3 on Minus3.Customer = curr.Customer and trunc(Minus3.DT) = trunc(curr.DT-3)
    order by 1, 2
  )
set OLD = NEW;
Which gives me an
ORA-01779: cannot modify a column which maps to a non key-preserved table
01779. 00000 - "cannot modify a column which maps to a non key-preserved table"
*Cause: An attempt was made to insert or update columns of a join view which
map to a non-key-preserved table.
*Action: Modify the underlying base tables directly.
I thought this should work, as both join conditions are on the primary key and thus unique. (Presumably the trunc() around the joined DT columns prevents Oracle from using the primary key to prove the view is key-preserved.) I am currently implementing the first approach from the above-mentioned answer, but it is getting quite big, and it feels like there should be a better solution to this.
Other things I thought about trying:
using a nested subselect (nested because Oracle doesn't have top(n) and I need to sort the subselect) to select the previous 3 records ordered by DT, and then the outer select with rownum <= 3, and then I could just use AVG(). However, I was told subselects can be quite slow and that joins perform better in Oracle; I don't know if that is really the case, I haven't done any testing.
Edit: My insert right now looks like this. I am already aggregating the Filecount for a day as there can be multiple records per DT per Customer per Something.
insert into temp_foo (DT, Something, Customer, Filecount)
select dates.DT, tbl1.Something, tbl1.Customer, coalesce(sum(tbl3.Filecount), 0)
from table(Function_Returning_Daterange(NULL, NULL)) dates
cross join
     (select Something,
             Customer,  -- needed for the join below
             Code,
             Value
        from Table2 tbl2
       where Something = 'Value') tbl1
left outer join Table3 tbl3
  on tbl3.Customer = tbl1.Customer
 and trunc(tbl3.MinDate) = trunc(dates.DT)
group by dates.DT, tbl1.Something, tbl1.Customer;
You could use an analytic average with a window clause:
select dt, customer, filecount,
avg(filecount) over (partition by customer order by dt
rows between 3 preceding and 1 preceding) as avgfilecount
from tmp_foo
order by dt desc;
DT CUSTOMER FILECOUNT AVGFILECOUNT
---------- -------- ---------- ------------
2019-04-30 x 10 4.66666667
2019-04-29 x 2 6
2019-04-28 x 3 9
2019-04-27 x 9
and then do the update part with a merge statement:
merge into tmp_foo t
using (
select dt, customer,
avg(filecount) over (partition by customer order by dt
rows between 3 preceding and 1 preceding) as avgfilecount
from tmp_foo
) s
on (s.dt = t.dt and s.customer = t.customer)
when matched then update set t.avgfilecount = s.avgfilecount;
4 rows merged.
select dt, customer, filecount, avgfilecount
from tmp_foo
order by dt desc;
DT CUSTOMER FILECOUNT AVGFILECOUNT
---------- -------- ---------- ------------
2019-04-30 x 10 4.66666667
2019-04-29 x 2 6
2019-04-28 x 3 9
2019-04-27 x 9
You haven't shown your original insert statement; it might be possible to add the analytic calculation to that, and avoid the separate update step.
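For illustration, a minimal sketch of folding the analytic calculation into the insert from the edit (the inner query is copied from the question and assumed unchanged; note it must also expose Customer):
insert into temp_foo (DT, Something, Customer, Filecount, AvgFilecount)
select DT, Something, Customer, Filecount,
       avg(Filecount) over (partition by Customer order by DT
                            rows between 3 preceding and 1 preceding) as AvgFilecount
from (
      -- the aggregated select from the question's edit
      select dates.DT, tbl1.Something, tbl1.Customer,
             coalesce(sum(tbl3.Filecount), 0) as Filecount
      from table(Function_Returning_Daterange(NULL, NULL)) dates
      cross join (select Something, Customer, Code, Value
                    from Table2
                   where Something = 'Value') tbl1
      left outer join Table3 tbl3
        on tbl3.Customer = tbl1.Customer
       and trunc(tbl3.MinDate) = trunc(dates.DT)
      group by dates.DT, tbl1.Something, tbl1.Customer
     );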
Also, if you want the first two date values to be calculated as if the 'missing' extra days before them had zero counts, you could use sum and division instead of avg:
select dt, customer, filecount,
sum(filecount) over (partition by customer order by dt
rows between 3 preceding and 1 preceding)/3 as avgfilecount
from tmp_foo
order by dt desc;
DT CUSTOMER FILECOUNT AVGFILECOUNT
---------- -------- ---------- ------------
2019-04-30 x 10 4.66666667
2019-04-29 x 2 4
2019-04-28 x 3 3
2019-04-27 x 9
It depends what you expect those last calculated values to be.

In HiveQL, what is the most elegant/performant way of calculating an average value if some of the data is implicitly not present?

In HiveQL, what is the most elegant and performant way of calculating an average value when there are 'gaps' in the data, with implicit repeated values between them? I.e., consider a table with the following data:
+----------+----------+----------+
| Employee | Date | Balance |
+----------+----------+----------+
| John | 20181029 | 1800.2 |
| John | 20181105 | 2937.74 |
| John | 20181106 | 3000 |
| John | 20181110 | 1500 |
| John | 20181119 | -755.5 |
| John | 20181120 | -800 |
| John | 20181121 | 1200 |
| John | 20181122 | -400 |
| John | 20181123 | -900 |
| John | 20181202 | -1300 |
+----------+----------+----------+
If I try to calculate a simple average of the November rows, it will return ~722.78, but the average should take into account that the days that are not shown have the same balance as the previous record. In the data above, John had a balance of 1800.2 between 20181101 and 20181104, for example.
Assuming that the table always has exactly one row per date/balance change, and given that I cannot change how this data is stored (and probably shouldn't, since it would be a waste of storage to write rows for days with unchanged balances), I've been tinkering with getting the average from a select with subqueries for all the days in the queried month, returning NULL for the absent days, and then using a case expression to get the balance from the previous available date in reverse order. All of this just to avoid writing temporary tables.
Step 1: Original Data
The 1st step is to recreate a table with the original data. Let's say the original table is called daily_employee_balance.
use default;
drop table if exists daily_employee_balance;
create table if not exists daily_employee_balance (
employee_id string,
employee string,
iso_date date,
balance double
);
Insert Sample Data in original table daily_employee_balance
insert into table daily_employee_balance values
('103','John','2018-10-25',1800.2),
('103','John','2018-10-29',1125.7),
('103','John','2018-11-05',2937.74),
('103','John','2018-11-06',3000),
('103','John','2018-11-10',1500),
('103','John','2018-11-19',-755.5),
('103','John','2018-11-20',-800),
('103','John','2018-11-21',1200),
('103','John','2018-11-22',-400),
('103','John','2018-11-23',-900),
('103','John','2018-12-02',-1300);
Step 2: Dimension Table
You will need a dimension table containing a calendar (a table with all possible dates); call it dimension_date. Having a calendar table is standard industry practice, and you can probably download sample data for one from the internet.
use default;
drop table if exists dimension_date;
create external table dimension_date(
date_id int,
iso_date string,
year string,
month string,
month_desc string,
end_of_month_flg string
);
Insert some sample data for entire month of Nov 2018:
insert into table dimension_date values
(6880,'2018-11-01','2018','2018-11','November','N'),
(6881,'2018-11-02','2018','2018-11','November','N'),
(6882,'2018-11-03','2018','2018-11','November','N'),
(6883,'2018-11-04','2018','2018-11','November','N'),
(6884,'2018-11-05','2018','2018-11','November','N'),
(6885,'2018-11-06','2018','2018-11','November','N'),
(6886,'2018-11-07','2018','2018-11','November','N'),
(6887,'2018-11-08','2018','2018-11','November','N'),
(6888,'2018-11-09','2018','2018-11','November','N'),
(6889,'2018-11-10','2018','2018-11','November','N'),
(6890,'2018-11-11','2018','2018-11','November','N'),
(6891,'2018-11-12','2018','2018-11','November','N'),
(6892,'2018-11-13','2018','2018-11','November','N'),
(6893,'2018-11-14','2018','2018-11','November','N'),
(6894,'2018-11-15','2018','2018-11','November','N'),
(6895,'2018-11-16','2018','2018-11','November','N'),
(6896,'2018-11-17','2018','2018-11','November','N'),
(6897,'2018-11-18','2018','2018-11','November','N'),
(6898,'2018-11-19','2018','2018-11','November','N'),
(6899,'2018-11-20','2018','2018-11','November','N'),
(6900,'2018-11-21','2018','2018-11','November','N'),
(6901,'2018-11-22','2018','2018-11','November','N'),
(6902,'2018-11-23','2018','2018-11','November','N'),
(6903,'2018-11-24','2018','2018-11','November','N'),
(6904,'2018-11-25','2018','2018-11','November','N'),
(6905,'2018-11-26','2018','2018-11','November','N'),
(6906,'2018-11-27','2018','2018-11','November','N'),
(6907,'2018-11-28','2018','2018-11','November','N'),
(6908,'2018-11-29','2018','2018-11','November','N'),
(6909,'2018-11-30','2018','2018-11','November','Y');
Step 3: Fact Table
Create a fact table from the original table. In normal practice, you ingest the data into HDFS/Hive, then process the raw data and create a table of historical data that you keep inserting into incrementally. You can look into data warehousing to get the proper definitions, but I call this a fact table: f_employee_balance.
This will re-create the original table with the missing dates and populate the missing balance with the earlier known balance.
--inner query to get all the possible dates
--outer self join query will populate the missing dates and balance
drop table if exists f_employee_balance;
create table f_employee_balance
stored as orc tblproperties ("orc.compress"="SNAPPY") as
select q1.employee_id, q1.iso_date,
       nvl(last_value(r.balance, true)  --initial dates to be populated with 0 balance
           over (partition by q1.employee_id order by q1.iso_date
                 rows between unbounded preceding and current row), 0) as balance,
       q1.month, q1.year
from (
      select distinct
             r.employee_id,
             d.iso_date as iso_date,
             d.month, d.year
      from daily_employee_balance r, dimension_date d) q1
left outer join daily_employee_balance r
  on (q1.employee_id = r.employee_id) and (q1.iso_date = r.iso_date);
Step 4: Analytics
The query below will give you the true average by month:
select employee_id, monthly_avg, month, year from (
select employee_id,
row_number() over (partition by employee_id,year,month) as row_num,
avg(balance) over (partition by employee_id,year,month) as monthly_avg, month, year from
f_employee_balance)q1
where row_num = 1
order by year, month;
Step 5: Conclusion
You could have just combined steps 3 and 4; this would save you from creating the extra table. When you are in the big data world, you don't worry much about wasting extra disk space or development time: you can easily add another disk or node and automate the process using workflows. For more information, look into data warehousing concepts and Hive analytical queries.
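As a rough sketch, combining steps 3 and 4 into a single query (same table and column names as above; a plain GROUP BY over the reconstructed daily rows gives the same monthly average without materializing the ORC table):
select q2.employee_id, q2.year, q2.month,
       avg(q2.balance) as monthly_avg
from (
      -- step 3: reconstruct one balance per employee per day
      select q1.employee_id, q1.iso_date, q1.month, q1.year,
             nvl(last_value(r.balance, true)
                 over (partition by q1.employee_id order by q1.iso_date
                       rows between unbounded preceding and current row), 0) as balance
      from (select distinct r.employee_id, d.iso_date, d.month, d.year
            from daily_employee_balance r, dimension_date d) q1
      left outer join daily_employee_balance r
        on q1.employee_id = r.employee_id and q1.iso_date = r.iso_date
     ) q2
group by q2.employee_id, q2.year, q2.month
order by q2.year, q2.month;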

sequence generator in plsql code in oracle apps

I have a sample table like below
PO_HEADER | ITEM | LINE_NUM
1         | X    |
1         | Y    |
1         | Z    |
1         | A    |
1         | B    |
I want to update the sequence numbers in the LINE_NUM column, like 1..5, and when I enter another line, the next sequence number (6) should be generated automatically in the LINE_NUM column.
I want to write code that updates the sequence numbers in LINE_NUM and also captures the next sequence number, so that when I enter a new line I get the next number.
Demo: http://sqlfiddle.com/#!4/4d6ff/1
CREATE TABLE Table1
("PO_HEADER" int, "ITEM" varchar2(1), "LINE_NUM" int)
;
INSERT ALL
INTO Table1 ("PO_HEADER", "ITEM", "LINE_NUM")
VALUES (1, 'X', NULL)
INTO Table1 ("PO_HEADER", "ITEM", "LINE_NUM")
VALUES (1, 'Y', NULL)
INTO Table1 ("PO_HEADER", "ITEM", "LINE_NUM")
VALUES (1, 'Z', NULL)
INTO Table1 ("PO_HEADER", "ITEM", "LINE_NUM")
VALUES (1, 'A', NULL)
INTO Table1 ("PO_HEADER", "ITEM", "LINE_NUM")
VALUES (1, 'B', NULL)
SELECT * FROM dual
;
create sequence alamakota;
and now:
update table1 set LINE_NUM = alamakota.nextval;
select * from table1;
| PO_HEADER | ITEM | LINE_NUM |
|-----------|------|----------|
| 1 | X | 1 |
| 1 | Y | 2 |
| 1 | Z | 3 |
| 1 | A | 4 |
| 1 | B | 5 |
People usually use a sequence (an Oracle object), which provides uniqueness but is (generally speaking) not gapless. A sequence is easy to implement; you'd either call it during the insert, or create a database trigger that sets the column value to the next sequence number.
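For illustration, a minimal sketch of the trigger variant (the sequence and trigger names are made up):
create sequence line_num_seq;

create or replace trigger trg_table1_line_num
  before insert on table1
  for each row
begin
  :new.line_num := line_num_seq.nextval;  -- direct assignment works in 11g and later
end;
/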
If you insist on the "MAX + 1" option, note that it is bound to fail in a multi-user environment: two or more users fetch the same MAX value and, depending on the COMMIT moment and column uniqueness (primary/unique key, unique index), the first one will succeed while everyone else's insert fails.
There's a workaround: an additional table whose sequence number is maintained through an autonomous-transaction function.
If you're on 12c, use the identity column. Otherwise, I'd suggest a sequence.
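For reference, a 12c identity column looks like this (hypothetical table name; note that an identity, like a sequence, numbers rows globally rather than per PO_HEADER):
create table po_lines (
  po_header number,
  item      varchar2(10),
  line_num  number generated always as identity
);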
I would not use a sequence for this.
If there is a parent table, to which PO_HEADER is a foreign key, grab a lock on the parent row. Then, SELECT MAX(line_num)+1... for that PO and use it. Do not release the lock on the parent table until you commit your insert to TABLE1.
If you do not have a parent table, you can use DBMS_LOCK to accomplish the same thing. (Allocate a user lock representing the PO and lock it in lieu of a parent table).
Since your application is well designed and all TABLE1 inserts happen through this code, you know it will work. The lock on the parent table ensures two sessions do not get the same next line number because the lock forces them to act one after the other.
If your application is not so well designed, this won't help you.
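A minimal sketch of the parent-row approach (the po_headers table and the bind names are assumptions; the DBMS_LOCK variant would replace the FOR UPDATE with dbms_lock.allocate_unique and dbms_lock.request):
declare
  l_dummy     number;
  l_next_line table1.line_num%type;
begin
  -- lock the parent row; a concurrent session blocks here until we commit
  select po_header into l_dummy
    from po_headers
   where po_header = :po
     for update;

  select nvl(max(line_num), 0) + 1
    into l_next_line
    from table1
   where po_header = :po;

  insert into table1 (po_header, item, line_num)
  values (:po, :item, l_next_line);

  commit;  -- releases the parent-row lock
end;
/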

find string from delimited values (Oracle PL/SQL)

How can I write a query for an Oracle database that finds a comma-delimited list of values in a column that itself contains a comma-delimited list of values? The :parameter passed to the SQL statement is also a comma-delimited list of the values the user selected.
For example, we have a column in a table that contains:
1 | 'A','B','C'
2 | 'C','A'
3 | 'A','B'
On the web application interface we have a multi-select box that shows
A
B
C
and allows the user to select one or more items.
I want rows 1 and 3 to show up if they select A and B. If they select only A, then all three rows should show up, because rows 1 to 3 all contain the value 'A'.
This example will hopefully help; it matches the values irrespective of the order in which they appear in the string in the DB record.
Create example table:
CREATE TABLE t
(val VARCHAR2(100));
Insert records:
INSERT INTO t VALUES
('1|''A'',''B'',''C''');
INSERT INTO t VALUES
('2|''C'',''A''');
INSERT INTO t VALUES
('3|''A'',''B''');
Check values:
SELECT * FROM t;
1|'A','B','C'
2|'C','A'
3|'A','B'
Check solution for 'A':
SELECT val
FROM t
WHERE REGEXP_LIKE(val, '(A)');
1|'A','B','C'
2|'C','A'
3|'A','B'
Check solution for A and B
SELECT val
FROM t
WHERE REGEXP_LIKE(val, '(A|B).*(A|B)');
1|'A','B','C'
3|'A','B'
If you want to make sure the 1| part of the result isn't matched by anything then you could query using:
SELECT val
FROM t
WHERE REGEXP_LIKE(val, '(.\|.*)(A)');
and
SELECT val
FROM t
WHERE REGEXP_LIKE(val, '(.\|.*)(A|B).*(A|B)');
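One caveat: these patterns match bare substrings, so with multi-character values (say 'AB') a pattern like (A) would also match. Anchoring on the surrounding quotes is one hedge, for example:
SELECT val
FROM t
WHERE REGEXP_LIKE(val, '''A''');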
Hope this helps...
You could use a where clause with varying number of bind values, depending on the number of selected options:
TEST#PRJ> create table t (c varchar2(100));
TEST#PRJ> insert into t values ('1 | ''A'',''B'',''C''');
TEST#PRJ> insert into t values ('2 | ''C'',''A''');
TEST#PRJ> insert into t values ('3 | ''A'',''B''');
TEST#PRJ> select * from t where c like '%''A''%' and c like '%''B''%';
C
----------------------------------------------------------------------------------------------------------------
1 | 'A','B','C'
3 | 'A','B'
TEST#PRJ> select * from t where c like '%''A''%';
C
----------------------------------------------------------------------------------------------------------------
1 | 'A','B','C'
2 | 'C','A'
3 | 'A','B'
If the values are stored in order you could use a single bind value:
TEST#PRJ> select * from t where c like '%''A''%''B''%';
C
-----------------------------------------------------------------------------------------
1 | 'A','B','C'
3 | 'A','B'

Select all rows from SQL based upon existence of multiple rows (sequence numbers)

Let's say I have table data similar to the following:
123456 John Doe 1 Green 2001
234567 Jane Doe 1 Yellow 2001
234567 Jane Doe 2 Red 2001
345678 Jim Doe 1 Red 2001
What I am attempting to do is isolate only the records for Jane Doe, based upon the fact that she has more than one row in this table (more than one sequence number).
I cannot isolate based upon ID, names, colors, years, etc...
The number 1 in the sequence tells me that it is the first record, and I need to be able to display that record as well as the number 2 record, the change record.
If the table is called users and the fields are called ID, fname, lname, seq_no, color, and date, how would I write the code to select only records that have more than one row in this table? For example, I want the query to display the following, based only upon the existence of the multiple rows:
234567 Jane Doe 1 Yellow 2001
234567 Jane Doe 2 Red 2001
In PL/SQL
First, to find the IDs for records with multiple rows you would use:
SELECT ID FROM table GROUP BY ID HAVING COUNT(*) > 1
So you could get all the records for all those people with
SELECT * FROM table WHERE ID IN (SELECT ID FROM table GROUP BY ID HAVING COUNT(*) > 1)
If you know that the second sequence ID will always be "2" and that the "2" record will never be deleted, you might find something like:
SELECT * FROM table WHERE ID IN (SELECT ID FROM table WHERE SequenceID = 2)
to be faster, but you better be sure the requirements are guaranteed to be met in your database (and you would want a compound index on (SequenceID, ID)).
Try something like the following. It's a single table scan, as opposed to the two scans the other approaches need.
SELECT * FROM (
  SELECT t1.*, COUNT(*) OVER (PARTITION BY ID) mycount FROM users t1
)
WHERE mycount > 1;
Using a JOIN:
SELECT u1.ID, u1.fname, u1.lname, u1.seq_no, u1.color, u1.date
FROM users u1 JOIN users u2 ON (u1.ID = u2.ID AND u2.seq_no = 2)
Using a WHERE clause:
SELECT u1.ID, u1.fname, u1.lname, u1.seq_no, u1.color, u1.date
FROM users u1, users u2
WHERE u1.ID = u2.ID
  AND u2.seq_no = 2
Check out the HAVING clause for a summary query. You can specify stuff like
HAVING COUNT(*) >= 2
and so forth.
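A minimal sketch against the users table from the question, assuming ID identifies a person:
-- list only the IDs that appear on more than one row
SELECT ID, COUNT(*) AS row_count
FROM users
GROUP BY ID
HAVING COUNT(*) >= 2;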
