BIRT: Computing the average of a time interval

I have the following database table consisting of a series of tasks:
id BIGINT NOT NULL,
time_in timestamp(0) with time zone,
time_out timestamp(0) with time zone,
user_name text NOT NULL
I compute the duration of a task as (time_out - time_in). I need to first compute the average task duration for a given user, and afterwards for all users.
I do the first within the SQL query used to obtain the data set. The result of this is an INTERVAL, but BIRT sees it as a String (perhaps because BIRT has no INTERVAL data type).
So now I have a set of Strings, each representing an INTERVAL. How do I compute their average?
P.S. I first thought I should do this by using the BIRT 'Time' data type. However, 'Time' represents a time of day, not an interval. There seems to be no equivalent for INTERVAL in BIRT.

Have you tried the avg aggregate function?
SELECT avg(time_out - time_in) as user_avg FROM tasks WHERE user_name='bob';
SELECT avg(time_out - time_in) as all_avg FROM tasks;
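If BIRT insists on treating the interval result as a String, one workaround (a sketch, assuming PostgreSQL) is to have the query return plain seconds via EXTRACT(EPOCH FROM ...), which BIRT can handle as a numeric column:
SELECT EXTRACT(EPOCH FROM avg(time_out - time_in)) AS avg_seconds
FROM tasks
WHERE user_name = 'bob';
The seconds can then be formatted back into hours and minutes in the report itself.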

Not familiar with birt, but I'd think the best way would be to let PostgreSQL do it. Define a view that does your averaging, and just SELECT from it.
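For example, a sketch of such a view (the name task_avg is made up):
CREATE VIEW task_avg AS
SELECT user_name, avg(time_out - time_in) AS user_avg
FROM tasks
GROUP BY user_name;
-- per-user average, already computed by PostgreSQL
SELECT user_avg FROM task_avg WHERE user_name = 'bob';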

Related

How Should I Represent Time of Day in an Oracle database?

I would like to store time of day - e.g. 18:00 in an Oracle database, and would like to do comparison queries with it.
What is the recommended way to represent a time of day column?
You can use a column of datetime type, so you can perform comparisons on the datetime column and show the time with to_char(datetime).
http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions180.htm
e.g.:
SELECT TO_CHAR(your_col, 'HH24:MI')
FROM your_table;
for format models see http://docs.oracle.com/cd/B19306_01/server.102/b14200/sql_elements004.htm
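For the comparison queries, you can compare the formatted time portion directly; a sketch (your_col and your_table are placeholders):
SELECT *
FROM your_table
WHERE TO_CHAR(your_col, 'HH24:MI') = '18:00';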

What datatype to use for daytime (hour, minute) in Oracle? [duplicate]

In one field I need to store not a date-time pair, i.e. a standard Oracle DATE:
01/10/2009 22:10:39
but the time only:
22:10:39
I think that would save disk space (I have 2 million rows) or provide faster processing.
You could try the INTERVAL DAY TO SECOND data type but it won't save you any disk space ... it is very suitable for this purpose though.
create table t1 (time_of_day interval day (0) to second(0));
insert into t1 values (TO_DSINTERVAL('0 23:59:59'));
select date '2009-05-13'+time_of_day
from t1;
It still takes 11 bytes, though.
Your best bet would probably be storing "seconds since midnight" as a number field.
SELECT to_char( SYSDATE, 'SSSSS' ) FROM dual;
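Going the other way, a stored seconds-since-midnight value can be rendered as a time of day again by adding it to a truncated date (66639 is an arbitrary sample value):
SELECT TO_CHAR(TRUNC(SYSDATE) + 66639 / 86400, 'HH24:MI:SS') FROM dual;  -- 18:30:39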
You can extract the time from a date as a string like this:
to_char(sysdate,'HH.MI.SS')
but there is no time-only data type that will help you save space.
you can use:
TO_CHAR(<DATE_COLUMN>, '<TIME_FORMAT>');
example
TO_CHAR(SYSDATE, 'HH24:MI:SS');
for the time format models, you can check the Oracle documentation
You would save a few Mb of disk space (which is nothing nowadays) and you would gain next to nothing in performance.
You could use a column of NUMBER type for storing the number of seconds since midnight as suggested, just don't forget about the constraints.
(You'd probably use NUMBER(5, 0) which uses 1-3 bytes depending on the stored value, instead of a constant 7 bytes used by a DATE column)
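A minimal sketch of such a column with its range constraint spelled out (table and column names are made up):
CREATE TABLE t2 (
  time_of_day NUMBER(5, 0) NOT NULL
    CHECK (time_of_day BETWEEN 0 AND 86399)  -- seconds since midnight
);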

complex sql query for finding the average working time of a person in office

I need to retrieve, using an Oracle query, the working time of an employee, i.e. the time coming in and the time going out, based on his transactions in the transaction table.
The transaction table has a date-time field which can be used for this purpose.
The steps involved would be as follows:
1) find the first transaction on a date and the last transaction on the same date - that would be his time-in and time-out for that date
2) calculate the overall time-in as the avg of all time-ins on each date, similarly do for time-out
Transaction table is as follows: transctn(transid, resourceid, event, currentdate)
The second requirement is to find the average number of transactions performed each day, which is basically the count of transids for each day and then the average of that.
The final answer should be, when a userid is provided to the query, a returned result like:
frequent working time (based on average): 9:43 am - 6:45 pm
average transactions performed/day = 43
How do I write the above requirement in Oracle SQL, or more smartly using Hibernate if Transctn is my domain class?
I have something like this:
select 'frequent working time: '
||(select rtrim(to_char(min(currentdate),'hh:mi pm')) from transctn)
||' - '||(select rtrim(to_char(max(currentdate),'hh:mi pm')) from transctn)
||', average transactions performed/day = '
||(select rtrim(to_char(count(distinct transid)/
count(distinct to_char(currentdate,'rrmmdd')))) from transctn)
from dual;
Firstly, your query has lots of selects from the same table stuck together, which isn't efficient, and makes it harder to read than necessary. And the rtrim isn't doing anything as you've already dictated the format. What you have can be rewritten as:
select 'frequent working time: '|| to_char(min(currentdate),'hh:mi pm')
||' - '|| to_char(max(currentdate),'hh:mi pm'),
'average transactions performed/day = '
|| to_char(count(distinct transid)
/count(distinct to_char(currentdate,'rrmmdd')))
from transctn;
But this isn't averaging properly, and isn't for a specific user. I'm going to assume this is homework, and in the usual spirit of the site I'll try to give you pointers rather than the complete answer...
1) find the first transaction on a date and the last transaction on
the same date - that would be his time in and timeout for that date
You're not far off here, but you need to break it down by date. To get the time-in and time-out for every user, for each day, you could use:
select resourceid, trunc(currentdate), min(currentdate), max(currentdate)
from transctn
group by resourceid, trunc(currentdate)
order by resourceid, trunc(currentdate);
2) calculate the overall time-in as the avg of all time-ins on each
date, similarly do for time-out
You can't average dates directly in Oracle; you'd get ORA-00932: inconsistent datatypes: expected NUMBER got DATE. There are various ways to achieve the effect, you just need to figure out a safe way to treat the date - or more specifically here the time part - as a number. For example, you could treat the time portion as the difference between the actual time and the start of the day, which Oracle returns as a number:
select avg(min(currentdate) - trunc(currentdate)),
avg(max(currentdate) - trunc(currentdate))
from transctn
group by trunc(currentdate);
But you then have to translate that fractional number back into something recognisable. One way of doing that is to add the number to an arbitrary fixed date, and then just extract the time part as a string as you were already doing:
select to_char(date '2000-01-01' + avg(min(currentdate) - trunc(currentdate)),
'HH:MI pm') as avg_time_in,
to_char(date '2000-01-01' + avg(max(currentdate) - trunc(currentdate)),
'HH:MI pm') as avg_time_out
from transctn
group by trunc(currentdate);
This looks messy and you might find a better way to do it. If it is homework then I would assume you've been taught methods for doing this sort of thing, or something that can be adapted to be applicable.
This is still for all resources, so you'll need to add a filter to restrict to one user ID. Hopefully this gives you some ideas for tackling the second requirement as well.

Real Time issues: Oracle Performance tuning (types / indexes / plsql / queries)

I am looking for a real-time solution...
Below are my DB columns. I am using Oracle 10g. Please help me in defining table types / indexes and a tuned PL/SQL procedure / query (both) for the updates and insertions.
The insert and update queries are simple, but we need to take care of performance, because my system will execute such statements 200 times per second.
Let me know... should I use procedures or simple queries? I need tuned PL/SQL and queries with proper DB table types / indexes.
I would really like to see how my system performs under a continuous 200 updates per second.
DB table columns (I can change the structure if required, so please let me know...):
Play ID - ID
Type - Song or Message
Count - Summation of total plays
Retries - Summation of total plays that failed
Duration - Total Duration
Last Updated - Last Updated Date Time
Thanks in advance ... let me know in case of any confusion...
You've not really given a lot of detail about WHAT you are updating etc.
As a basis for writing your update statements: don't use PL/SQL unless you cannot achieve what you want in plain SQL, as the context switching alone will hurt your performance before you even get round to processing any records.
If you are able to create indexes specifically for the update then index the columns that will appear in your update statement's WHERE clause so the records can be found quickly before being updated.
As for inserting, look up the benefits of the /*+ append */ hint for inserting records to see if it will benefit your particular case.
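For reference, in Oracle 10g the hint only takes effect on direct-path INSERT ... SELECT statements; a sketch (play_stats and staging_plays are assumed names, not from your post):
INSERT /*+ APPEND */ INTO play_stats (play_id, play_type, play_count, retries, duration, last_updated)
SELECT play_id, play_type, play_count, retries, duration, last_updated
FROM staging_plays;
Note that a direct-path insert locks the table exclusively until commit, so test whether it fits 200 operations per second.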
Finally, the table structure you will use will depend on many factors that you haven't even begun to touch on with the details you've supplied. I suggest you either do some research on DB structure or ask your DBAs for a 101 class in it.
Best of luck...
EDIT:
In response to:
Play ID - ID (here id would be a song name like abc.wav or something, so maybe VARCHAR2, though not yet decided... what's your opinion? Is it fine if the primary key is of type VARCHAR2? Any suggestions are most welcome...)
Type - Song or Message (VARCHAR2)
Count - Summation of total plays (INTEGER)
Retries - Summation of total plays that failed (INTEGER)
Duration - Total Duration (INTEGER)
Last Updated - Last Updated Date Time (DateTime)
There is nothing wrong with having a PRIMARY KEY as a VARCHAR2 data type (though there is often debate about the value of a non-specific PK, i.e. a sequence). You must, however, ensure your PK is unique; if you can't guarantee this, then it would be worth having a sequence as your PK over having to introduce another column to maintain uniqueness.
As for declaring your table columns as INTEGER, they eventually will be resolved to NUMBER anyway so I'd just create the table column as a number (unless you have a very specific reason for creating them as INTEGER).
Finally, the DATETIME column: you only need to declare it as a DATE datatype, unless you need real precision in the time portion, in which case declare it as a TIMESTAMP datatype.
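Putting those three points together, a sketch of the DDL (all names and sizes here are assumptions, not from your post):
CREATE TABLE play_stats (
  play_id      VARCHAR2(100) PRIMARY KEY,  -- e.g. 'abc.wav'
  play_type    VARCHAR2(10),               -- 'Song' or 'Message'
  play_count   NUMBER,                     -- summation of total plays
  retries      NUMBER,
  duration     NUMBER,
  last_updated DATE
);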
As for helping you with the structure of the table itself (i.e. which columns you want etc.) then that is not something I can help you with as I know nothing of your reporting requirements, application requirements or audit requirements, company best practice, naming conventions etc. I'm afraid that is something for you to decide for yourself.
For performance though, keep indexes to a minimum (i.e. only index columns that will aid your UPDATE's WHERE clause search), only update the minimum data possible and, as suggested before, research the APPEND hint for inserts; it may help in your case but you will have to test it for yourself.
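As an illustration of the indexing point: if your updates locate rows by Play ID alone, the primary key index already covers them; a secondary index (hypothetical example below) is only worth adding for another column that actually appears in your WHERE clauses.
CREATE INDEX play_stats_upd_idx ON play_stats (last_updated);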

Querying a data warehouse data involving time dimension

I have two tables for time dimension
date (unique row for each day)
time of the day (unique row for each minute in a day)
Given this schema, what would a query look like to retrieve facts for the last X hours, where X can be any number greater than 0?
Things start to become tricky when the start time and end time happen to fall on two different days of the year.
EDIT: My Fact table does not have a time stamp column
Fact tables do have (and should have) an original timestamp, in order to avoid weird by-time queries which happen over the boundary of a day. Weird means having some type of complicated date-time function in the WHERE clause.
In most DWs these type of queries are very rare, but you seem to be streaming data into your DW and using it for reporting at the same time.
So I would suggest:
1) Introduce the full timestamp in the fact table.
2) For the old records, re-create the timestamp from the Date and Time keys (a sketch of this follows below).
3) DW queries are all about not having any functions in the WHERE clause, or if a function has to be used, making sure it is SARGABLE.
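A sketch of the back-fill in point 2, assuming Oracle, a yyyymmdd integer date key and a minutes-since-midnight time key (fact_table, date_key, time_key and event_ts are placeholders; your key formats may well differ):
UPDATE fact_table
SET event_ts = TO_DATE(TO_CHAR(date_key), 'YYYYMMDD')
             + NUMTODSINTERVAL(time_key, 'MINUTE');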
You would probably be better served by converting the Start Date and End Date columns to TIMESTAMP and populating them.
Slicing the table would require taking the appropriate interval BETWEEN Start Date AND End Date. In Oracle the interval would be something along the lines of SYSDATE - (4/24) or SYSDATE - NUMTODSINTERVAL(4, 'HOUR')
This could also be rewritten as:
Start Date <= (SYSDATE - (4/24)) AND End Date >= (SYSDATE - (4/24))
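Put together, a slice for "four hours ago" might look like this (a sketch; fact_table and the column names are placeholders):
SELECT *
FROM fact_table
WHERE start_date <= SYSDATE - (4/24)
  AND end_date >= SYSDATE - (4/24);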
It seems to me that given the current schema you have, that you will need to retrieve the appropriate time IDs from the time dimension table which meet your search criteria, and then search for matching rows in the fact table. Depending on the granularity of your time dimension, you might want to check the performance of doing either (SQL Server examples):
A subselect:
SELECT X FROM FOO
WHERE TIMEID IN (SELECT ID FROM DIMTIME
                 WHERE HOUR >= DATEPART(HOUR, CURRENT_TIMESTAMP))
  AND DATEID IN (SELECT ID FROM DIMDATE
                 WHERE DATE = CAST(GETDATE() AS date));
An inner join:
SELECT X FROM FOO
INNER JOIN DIMTIME ON FOO.TIMEID = DIMTIME.ID
INNER JOIN DIMDATE ON FOO.DATEID = DIMDATE.ID
WHERE DIMTIME.HOUR >= DATEPART(HOUR, CURRENT_TIMESTAMP)
  AND DIMDATE.DATE = CAST(GETDATE() AS date);
Neither of these are truly attractive options.
Have you considered that you may be querying against a cube that is intended for roll-up analysis and not necessarily for "last X" analysis?
If this is not a "roll-up" cube, I would agree with the other posters in that you should re-stamp your fact tables with better keys, and if you do in fact intend to search off of hour frequently, you should probably include that in the fact table as well, as any other attempt will probably make the query non-sargable (see What makes a SQL statement sargable?).
Microsoft recommends at http://msdn.microsoft.com/en-us/library/aa902672%28v=sql.80%29.aspx that:
In contrast to surrogate keys used in other dimension tables, date and time dimension keys should be "smart." A suggested key for a date dimension is of the form "yyyymmdd". This format is easy for users to remember and incorporate into queries. It is also a recommended surrogate key format for fact tables that are partitioned into multiple tables by date.
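With such smart keys, a date slice can then be written straight against the key; a sketch assuming an integer yyyymmdd key named date_key:
SELECT *
FROM fact_table
WHERE date_key BETWEEN 20090101 AND 20090131;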
Best of luck!
