What is the best way to work around missing data? - Oracle

I am trying to display data as follows:
In our database we have events (each with a unique ID) and a start date. The events do not overlap, and each one starts on the date the last one ended. However, we don't have an 'end date' column in the database.
I have to feed the data into another system so that it shows event ID, start date, and end date (which is just the next start date).
I want to avoid creating a custom view as that's really frowned upon here for this database. So I'm wondering if there's a good way to do this in a query.
Essentially it would be:
EventA | Date1 | Date2
EventB | Date2 | Date3
EventC | Date3 | Date4
The events are planned years in advance, and I only need to pull the next few months for this query, so there's no worry about running out of 'next event start dates'. In case it matters, this query will be part of a webservice call.
The basic pseudo code for event and date would be:
select Event.ID, Event.StartDate
from Event
where Event.StartDate > sysdate and Event.StartDate < sysdate+90
Essentially I want to take the next row's Event.StartDate and make it the current row's Event.EndDate

Use the LEAD analytic function:
Oracle Setup:
A table with 10 rows:
CREATE TABLE Event ( ID, StartDate ) AS
SELECT LEVEL, TRUNC( SYSDATE ) + LEVEL
FROM DUAL
CONNECT BY LEVEL <= 10;
Query:
SELECT ID,
       StartDate,
       LEAD( StartDate ) OVER ( ORDER BY StartDate ) AS EndDate
FROM   Event
WHERE  StartDate > SYSDATE
AND    StartDate < SYSDATE + 90
Output:
ID | STARTDATE | ENDDATE
-: | :-------- | :--------
1 | 22-JUN-19 | 23-JUN-19
2 | 23-JUN-19 | 24-JUN-19
3 | 24-JUN-19 | 25-JUN-19
4 | 25-JUN-19 | 26-JUN-19
5 | 26-JUN-19 | 27-JUN-19
6 | 27-JUN-19 | 28-JUN-19
7 | 28-JUN-19 | 29-JUN-19
8 | 29-JUN-19 | 30-JUN-19
9 | 30-JUN-19 | 01-JUL-19
10 | 01-JUL-19 | null
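One subtlety: because LEAD only sees rows that survive the WHERE clause, the last event inside the 90-day window gets a NULL EndDate even when a later event exists in the table. A minimal sketch that computes LEAD before filtering (same table and columns as above):
-- Compute each row's successor over the full table, then apply the window.
SELECT ID, StartDate, EndDate
FROM (
  SELECT ID,
         StartDate,
         LEAD( StartDate ) OVER ( ORDER BY StartDate ) AS EndDate
  FROM   Event
)
WHERE StartDate > SYSDATE
  AND StartDate < SYSDATE + 90;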

Related

Populating future dates in an Oracle table

I have attached two tables, product and date.
Let's say my product table has data up to yesterday, i.e. 05/31/2018.
I am trying to populate a season table where I can do the calculation up to 5/31/2018, where value = (value on same day last year / previous day last year) with Ch(a) and P(pen); however, the data set only goes up to 5/31/2018. My aim is to get the data/calculation for 06/01/2018 through 12/31/2018 as well. How do I get the data for these future dates, given that I have the data needed to calculate them in the product table?
I'd appreciate any help.
Thank you!
You can generate a series of dates using a CONNECT BY subquery like this one:
SELECT Start_date + level - 1 as my_date
FROM (
SELECT date '2018-01-01' as Start_date FROM dual
)
CONNECT BY Start_date + level - 1 <= date '2018-01-05'
Demo: http://www.sqlfiddle.com/#!4/072359/1
| MY_DATE |
|----------------------|
| 2018-01-01T00:00:00Z |
| 2018-01-02T00:00:00Z |
| 2018-01-03T00:00:00Z |
| 2018-01-04T00:00:00Z |
| 2018-01-05T00:00:00Z |
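To apply this to the question, a hedged sketch that joins each generated future date back to last year's rows so the ratio can be computed (the product table's day_date and value columns, and the date range, are assumptions):
-- For each future date, look up the same day last year and the day before it.
SELECT d.my_date,
       ly.value / lyp.value AS season_value
FROM (
  SELECT DATE '2018-06-01' + LEVEL - 1 AS my_date
  FROM dual
  CONNECT BY DATE '2018-06-01' + LEVEL - 1 <= DATE '2018-12-31'
) d
LEFT JOIN product ly  ON ly.day_date  = ADD_MONTHS(d.my_date, -12)
LEFT JOIN product lyp ON lyp.day_date = ADD_MONTHS(d.my_date, -12) - 1;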

Insert value based on min value greater than value in another row

It's difficult to explain the question well in the title.
I am inserting 6 values from (or based on values in) one row.
I also need to insert a value from a second row where:
The values in one column (ID) must be equal
The value in column (CODE) in the main source row must be IN (100, 200), whereas the other row must have a value of 300 or 400
The value in another column (OBJID) in the secondary row must be the lowest value above that in the primary row.
Source Table looks like:
OBJID | CODE | ENTRY_TIME | INFO | ID | USER
---------------------------------------------
1 | 100 | x timestamp| .... | 10 | X
2 | 100 | y timestamp| .... | 11 | Y
3 | 300 | z timestamp| .... | 10 | F
4 | 100 | h timestamp| .... | 10 | X
5 | 300 | g timestamp| .... | 10 | G
So, to provide an example:
In my second table I want to insert OBJID, OBJID2, CODE, ENTRY_TIME, substr(INFO(...)), ID, USER
i.e. from my example a line inserted in the second table would look like:
OBJID | OBJID2 | CODE | ENTRY_TIME | INFO | ID | USER
-----------------------------------------------------------
1 | 3 | 100 | x timestamp| substring | 10 | X
4 | 5 | 100 | h timestamp| substring2| 10 | X
My insert for everything that just comes from one row works fine.
INSERT INTO TABLE2
  (ID, OBJID, INFO, USER, ENTRY_TIME)
SELECT ID,
       OBJID,
       DECODE(CODE, 100, SUBSTR(INFO, 12, LENGTH(INFO) - 27),
                    600, 'CREATE') INFO,
       USER,
       ENTRY_TIME
FROM TABLE1
WHERE CODE IN (100, 200);
I'm aware that I'll need to use an alias on TABLE1, but I don't know how to get the rest to work, particularly in an efficient way. There are 2 million rows right now, but there will be closer to 20 million once I start using production data.
You could try this:
-- p = the main (code 100/200) row, s = the candidate secondary row.
-- PRIMARY is a reserved word in Oracle, so short aliases are used instead.
select p.*,
       (select min(objid)
        from table1 s
        where p.objid < s.objid
          and s.code in (300,400)
          and p.id = s.id
       ) objid2
from table1 p
where p.code in (100,200);
Ok, I've come up with:
select OBJID,
min(case when code in (300,400) then objid end)
over (partition by id order by objid
range between 1 following and unbounded following
) objid2,
CODE, ENTRY_TIME, INFO, ID, USER1
from table1;
So you need an INSERT ... SELECT wrapping the above query, with WHERE objid2 IS NOT NULL AND code IN (100, 200), as sketched below.
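A hedged sketch of that final statement (column names are taken from the question; TABLE2's exact column list is an assumption, and the DECODE/SUBSTR handling of INFO from the earlier insert would slot into the inner select):
INSERT INTO table2
  (objid, objid2, code, entry_time, info, id, user1)
SELECT objid, objid2, code, entry_time, info, id, user1
FROM (
  select OBJID,
         min(case when code in (300,400) then objid end)
           over (partition by id order by objid
                 range between 1 following and unbounded following) objid2,
         CODE, ENTRY_TIME, INFO, ID, USER1
  from table1
)
WHERE objid2 IS NOT NULL
  AND code IN (100, 200);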

Hive query GROUP BY error; Invalid table alias or column reference

Kindest,
I am trying to extend some working Hive queries, but seem to fall short. I just want to test the GROUP BY function, which is common to a number of queries that I need to complete. Here is the query that I am trying to execute:
DROP table CurrentCostDataSamples_MySQL_Dump_Last_1_Hour_Summary;
CREATE EXTERNAL TABLE IF NOT EXISTS CurrentCostDataSamples_MySQL_Dump_Last_1_Hour_Summary ( messageRowID STRING, payload_sensor INT, messagetimestamp BIGINT, payload_temp FLOAT, payload_timestamp BIGINT, payload_timestampmysql STRING, payload_watt INT, payload_wattseconds INT )
STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
WITH SERDEPROPERTIES ( "cassandra.host" = "127.0.0.1",
"cassandra.port" = "9160",
"cassandra.ks.name" = "EVENT_KS",
"cassandra.ks.username" = "admin",
"cassandra.ks.password" = "admin",
"cassandra.cf.name" = "currentcost_stream",
"cassandra.columns.mapping" = ":key, payload_sensor, Timestamp, payload_temp, payload_timestamp, payload_timestampmysql, payload_watt, payload_wattseconds" );
select messageRowID, payload_sensor, messagetimestamp, payload_temp, payload_timestamp, payload_timestampmysql, payload_watt, payload_wattseconds, hour(from_unixtime(payload_timestamp)) AS hourly
FROM CurrentCostDataSamples_MySQL_Dump_Last_1_Hour_Summary
WHERE payload_timestamp > unix_timestamp() - 3024*60*60
GROUP BY hourly;
This yields the following error:
ERROR: Error while executing Hive script.Query returned non-zero code:
10, cause: FAILED: Error in semantic analysis: Line 1:320 Invalid
table alias or column reference 'hourly': (possible column names are:
messagerowid, payload_sensor, messagetimestamp, payload_temp,
payload_timestamp, payload_timestampmysql, payload_watt,
payload_wattseconds)
The intention is to end up with a time-bound query (say, the last 24 hours) broken down by SUM() on payload_wattseconds etc. To get started on the summary tables, I began building a GROUP BY query that would derive the hourly anchor for the select query.
The problem, though, is the error above. I would greatly appreciate any pointers to what is wrong here; I can't seem to find it myself, but then again I am a newbie with Hive.
Thanks in advance.
UPDATE: I revised the query. Here is what I just tried to run:
DROP table CurrentCostDataSamples_MySQL_Dump_Last_1_Hour_Summary;
CREATE EXTERNAL TABLE IF NOT EXISTS CurrentCostDataSamples_MySQL_Dump_Last_1_Hour_Summary ( messageRowID STRING, payload_sensor INT, messagetimestamp BIGINT, payload_temp FLOAT, payload_timestamp BIGINT, payload_timestampmysql STRING, payload_watt INT, payload_wattseconds INT )
STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
WITH SERDEPROPERTIES ( "cassandra.host" = "127.0.0.1",
"cassandra.port" = "9160",
"cassandra.ks.name" = "EVENT_KS",
"cassandra.ks.username" = "admin",
"cassandra.ks.password" = "admin",
"cassandra.cf.name" = "currentcost_stream",
"cassandra.columns.mapping" = ":key, payload_sensor, Timestamp, payload_temp, payload_timestamp, payload_timestampmysql, payload_watt, payload_wattseconds" );
select messageRowID, payload_sensor, messagetimestamp, payload_temp, payload_timestamp, payload_timestampmysql, payload_watt, payload_wattseconds, hour(from_unixtime(payload_timestamp))
FROM CurrentCostDataSamples_MySQL_Dump_Last_1_Hour_Summary
WHERE payload_timestamp > unix_timestamp() - 3024*60*60
GROUP BY hour(from_unixtime(payload_timestamp));
That, however, gives another error:
ERROR: Error while executing Hive script.Query returned non-zero code: 10, cause: FAILED: Error in semantic analysis: Line 1:7 Expression not in GROUP BY key 'messageRowID'
Thoughts?
UPDATE #2) The following is a quick dump of a few samples that land in the EVENT_KS CF in WSO2BAM. The last column is #watt_seconds, calculated in the Perl daemon; it will be used in a query to calculate the aggregate sum, totalled into kWh, which will then be dumped into MySQL tables for sync to the application that holds the UI/UX layer.
[12:03:00] [jskogsta@enterprise ../Product Centric Opco Modelling]$ ~/local/apache-cassandra-2.0.8/bin/cqlsh localhost 9160 -u admin -p admin --cqlversion="3.0.5"
Connected to Test Cluster at localhost:9160.
[cqlsh 4.1.1 | Cassandra 1.2.13 | CQL spec 3.0.5 | Thrift protocol 19.36.2]
Use HELP for help.
cqlsh> use "EVENT_KS";
cqlsh:EVENT_KS> select * from currentcost_stream limit 5;
key | Description | Name | Nick_Name | StreamId | Timestamp | Version | payload_sensor | payload_temp | payload_timestamp | payload_timestampmysql | payload_watt | payload_wattseconds
-------------------------------------------+---------------------------+--------------------+----------------------+---------------------------+---------------+---------+----------------+--------------+-------------------+------------------------+--------------+---------------------
1403365575174::10.11.205.218::9443::9919 | Sample data from CC meter | currentcost.stream | Currentcost Realtime | currentcost.stream:1.0.18 | 1403365575174 | 1.0.18 | 1 | 13.16 | 1403365575 | 2014-06-21 23:46:15 | 6631 | 19893
1403354553932::10.11.205.218::9443::2663 | Sample data from CC meter | currentcost.stream | Currentcost Realtime | currentcost.stream:1.0.18 | 1403354553932 | 1.0.18 | 1 | 14.1 | 1403354553 | 2014-06-21 20:42:33 | 28475 | 0
1403374113341::10.11.205.218::9443::11852 | Sample data from CC meter | currentcost.stream | Currentcost Realtime | currentcost.stream:1.0.18 | 1403374113341 | 1.0.18 | 1 | 10.18 | 1403374113 | 2014-06-22 02:08:33 | 17188 | 154692
1403354501924::10.11.205.218::9443::1894 | Sample data from CC meter | currentcost.stream | Currentcost Realtime | currentcost.stream:1.0.18 | 1403354501924 | 1.0.18 | 1 | 10.17 | 1403354501 | 2014-06-21 20:41:41 | 26266 | 0
1403407054092::10.11.205.218::9443::15527 | Sample data from CC meter | currentcost.stream | Currentcost Realtime | currentcost.stream:1.0.18 | 1403407054092 | 1.0.18 | 1 | 17.16 | 1403407054 | 2014-06-22 11:17:34 | 6332 | 6332
(5 rows)
cqlsh:EVENT_KS>
What I will be trying to do is issue a query against this table (actually several, depending on the various presentation aggregations required) and present a view based on hourly sums, 10-minute sums, daily sums, monthly sums, etc. Depending on the query, the GROUP BY was intended to give this 'index', so to speak. Right now I'm just testing this, so we'll see how it ends up. Hope this makes sense?! A sketch of one such bucket follows below.
So I'm not trying to remove duplicates...
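For illustration, a hedged sketch of a 10-minute bucket in the same style (same table as above; the bucketing expression is an assumption, not code from the original post):
-- Bucket epoch seconds into 10-minute windows, then sum watt-seconds per bucket.
SELECT from_unixtime(floor(payload_timestamp / 600) * 600) AS bucket_start,
       SUM(payload_wattseconds) AS watt_seconds
FROM CurrentCostDataSamples_MySQL_Dump_Last_1_Hour_Summary
GROUP BY from_unixtime(floor(payload_timestamp / 600) * 600);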
UPDATE 3) I was going about this all wrong, and thought a bit more about the tip given below; simplifying the whole query gave the right results. The following query gives the total kWh on an hourly basis for the WHOLE dataset. With this, I can create the various iterations of kWh spent over various time periods, like:
Hourly over the last 24 hours
Daily over the last year
Minute over the last hour
.. etc. etc.
Here is the query:
DROP table CurrentCostDataSamples_MySQL_Dump_Last_1_Hour_Summary;
CREATE EXTERNAL TABLE IF NOT EXISTS CurrentCostDataSamples_MySQL_Dump_Last_1_Hour_Summary ( messageRowID STRING, payload_sensor INT, messagetimestamp BIGINT, payload_temp FLOAT, payload_timestamp BIGINT, payload_timestampmysql STRING, payload_watt INT, payload_wattseconds INT )
STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
WITH SERDEPROPERTIES ( "cassandra.host" = "127.0.0.1",
"cassandra.port" = "9160",
"cassandra.ks.name" = "EVENT_KS",
"cassandra.ks.username" = "admin",
"cassandra.ks.password" = "admin",
"cassandra.cf.name" = "currentcost_stream",
"cassandra.columns.mapping" = ":key, payload_sensor, Timestamp, payload_temp, payload_timestamp, payload_timestampmysql, payload_watt, payload_wattseconds" );
select hour(from_unixtime(payload_timestamp)) AS hourly, (sum(payload_wattseconds)/(60*60)/1000)
FROM CurrentCostDataSamples_MySQL_Dump_Last_1_Hour_Summary
GROUP BY hour(from_unixtime(payload_timestamp));
This query yields the following based on the sample data:
hourly _c1
0 16.91570472222222
1 16.363228888888887
2 15.446414166666667
3 11.151388055555556
4 18.10564666666667
5 2.2734924999999997
6 17.370668055555555
7 17.991484444444446
8 38.632728888888884
9 16.001440555555554
10 15.887023888888889
11 12.709341944444445
12 23.052629722222225
13 14.986092222222222
14 16.182284722222224
15 5.881564999999999
18 2.8149172222222223
19 17.484405
20 15.888274166666665
21 15.387210833333333
22 16.088641666666668
23 16.49990916666667
Which is the aggregate kWh per hourly timeframe over the entire dataset.
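From here, the period-bound variations listed above are just a WHERE clause; for example, hourly over the last 24 hours (a sketch, same table as the query above):
SELECT hour(from_unixtime(payload_timestamp)) AS hourly,
       (sum(payload_wattseconds) / (60*60) / 1000) AS kwh
FROM CurrentCostDataSamples_MySQL_Dump_Last_1_Hour_Summary
WHERE payload_timestamp > unix_timestamp() - 24*60*60
GROUP BY hour(from_unixtime(payload_timestamp));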
So, now on to the next problem. ;-)

Oracle query to get max hour every day, and corresponding row values

I'm having a hard time creating a query to do the following:
I have this table, called LOG:
ID | INSERT_TIME | LOG_VALUE
----------------------------------------
1 | 2013-04-29 18:00:00.000 | 160473
2 | 2013-04-29 21:00:00.000 | 154281
3 | 2013-04-30 09:00:00.000 | 186552
4 | 2013-04-30 14:00:00.000 | 173145
5 | 2013-04-30 14:30:00.000 | 102235
6 | 2013-05-01 11:00:00.000 | 201541
7 | 2013-05-01 23:00:00.000 | 195234
What I want to do is build a query that returns, for each day, the last values inserted (using the max value of INSERT_TIME). I'm only interested in the date part of that column, and in the column LOG_VALUE. So, this would be my resultset after running the query:
2013-04-29 154281
2013-04-30 102235
2013-05-01 195234
I guess that I need to use GROUP BY over the INSERT_TIME column, along with the MAX() function, but by doing that, I can't seem to get the LOG_VALUE. Can anyone help me with this, please?
(I'm on Oracle 10g)
SELECT trunc(insert_time),
log_value
FROM (
SELECT insert_time,
log_value,
rank() over (partition by trunc(insert_time)
order by insert_time desc) rnk
FROM log)
WHERE rnk = 1
is one option. This uses the analytic function rank to identify the row with the latest insert_time on each day.
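On 10g you could also collapse this into a single aggregation with KEEP (DENSE_RANK LAST), which picks the log_value of the latest insert_time in each day; a sketch against the same LOG table:
-- One pass: for each day, keep the log_value of the row ranked last by insert_time.
SELECT trunc(insert_time) AS log_date,
       MAX(log_value) KEEP (DENSE_RANK LAST ORDER BY insert_time) AS log_value
FROM log
GROUP BY trunc(insert_time);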

Create a table of dates in rows

I have an Oracle database,
and I want to create a table with two columns: one containing an id and the other containing incremented dates in rows.
I want to specify the limit dates in my PL/SQL code, and the code will generate the rows between the two limit dates (from and to).
This is an example output result :
+-----+--------------------+
| id |dates |
+-----+--------------------+
| 1 |01/02/2011 04:00:00 |
+-----+--------------------+
| 2 |01/02/2011 05:00:00 |
+-----+--------------------+
| 3 |01/02/2011 06:00:00 |
+-----+--------------------+
| 4 |01/02/2011 07:00:00 |
+-----+--------------------+
| 5 |01/02/2011 08:00:00 |
....
...
..
| 334 |05/03/2011 23:00:00 |
+-----+--------------------+
You haven't exactly deluged us with details, but this is the sort of construct you want:
select level as id
, &&start_date + ((level-1) * (1/24)) as dates
from dual
connect by level <= ((&&end_date - &&start_date)*24)
/
This assumes your input values are whole days. You will need to adjust the maths if your start or end date contains a time component.
You would need to start with a date baseline:
vBaselineDate := TRUNC(SYSDATE);
OR
vBaselineDate := TO_DATE('28-03-2013 12:00:00', 'DD-MM-YYYY HH:MI:SS');
Then increment the baseline by adding fractions of a day, depending on how finely you want the range divided, e.g. 1 minute, 1 hour, etc. Wrapped as a complete anonymous block:
DECLARE
  vBaselineDate DATE := TRUNC(SYSDATE);  -- or any baseline from above
BEGIN
  FOR i IN 1..334 LOOP
    INSERT INTO mytable (id, dates)  -- assumes mytable(id NUMBER, dates DATE) exists
    VALUES (i, vBaselineDate + i/24);
  END LOOP;
  COMMIT;
END;
/
1/24 = 1 hour; 1/1440 = 1 minute.
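For completeness, a set-based sketch that builds the whole table in one statement (the bounds are lifted from the example output and are assumptions; mytable is the table from the loop above):
CREATE TABLE mytable AS
SELECT LEVEL AS id,
       TO_DATE('01/02/2011 04:00:00', 'DD/MM/YYYY HH24:MI:SS') + (LEVEL - 1) / 24 AS dates
FROM dual
CONNECT BY TO_DATE('01/02/2011 04:00:00', 'DD/MM/YYYY HH24:MI:SS') + (LEVEL - 1) / 24
           <= TO_DATE('05/03/2011 23:00:00', 'DD/MM/YYYY HH24:MI:SS');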
Hope this helps.
