Hibernate + Postgres: simple query with bad performance

In my Java EE web application, I am querying a Postgres database to get the data of an employee's holidays.
The table holiday has the following columns:
id bigint NOT NULL
start timestamp without time zone NOT NULL
end timestamp without time zone NOT NULL
employee_id bigint (FK referencing PK of table employee, indexed)
some other columns that do not seem to be related to my issue. Not much data in here.
Goal: Find every holiday entry of a specific employee that overlaps a given time interval (not necessarily completely; e.g. a holiday from 2009-12-30 to 2010-01-05 is a valid result when searching for holidays from 2010-01-01 to 2010-01-31).
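In plain SQL, that overlap test could be sketched roughly as follows (assuming the columns above; end is quoted because it is a reserved word; the named query below uses BETWEEN on both endpoints instead):
SELECT id
FROM holiday
WHERE employee_id = :employeeIdParam
  AND start <= :toParam
  AND "end" >= :fromParam;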
My named query is:
SELECT h
FROM Holiday h
WHERE h.employee.id = :employeeIdParam
  AND (h.end BETWEEN :fromParam AND :toParam
       OR h.start BETWEEN :fromParam AND :toParam)
Query execution using Hibernate takes ~400 ms and returns 100 rows out of only 2000 rows in total. Much too long, isn't it?
(Using pgAdmin, it takes ~12 ms.)
What's the issue here?
(Running on the local machine - it's not a connection problem.)
Using WildFly 9.0, Postgres 9.4, Hibernate 4.3.1.
Update
Database Log:
select holiday0_.id as id1_70_, holiday0_.end as end3_70_, holiday0_.employee_id as emplo10_70_, holiday0_.start as star6_70_
from holiday holiday0_ where holiday0_.employee_id=$1 and (holiday0_.end between $2 and $3 or holiday0_.start between $4 and $5)
Parameters: $1 = '2757', $2 = '2003-07-01 00:00:00', $3 = '2015-08-11 23:59:59', $4 = '2003-07-01 00:00:00', $5 = '2015-08-11 23:59:59'

Related

Oracle SQL Developer: get table rows older than n months

In Oracle SQL Developer, I have a table called t1 that has two columns: col1 defined as NUMBER(19,0) and col2 defined as TIMESTAMP(3).
I have these rows:
col1 col2
1 03/01/22 12:00:00,000000000
2 03/01/22 13:00:00,000000000
3 26/11/21 10:27:11,750000000
4 26/11/21 10:27:59,606000000
5 16/12/21 11:47:04,105000000
6 16/12/21 12:29:27,101000000
My sysdate looks like this:
select sysdate from dual;
SYSDATE
03/03/22
I want to create a stored procedure (SP) which will delete rows older than 2 months and display a message "n rows deleted".
But when I execute this statement:
select * from t1 where to_date(TRUNC(col2), 'DD/MM/YY') < add_months(sysdate, -2);
I don't get the first 2 rows of my t1 table (I get more than 2 rows back, but not these two):
1 03/01/22 12:00:00,000000000
2 03/01/22 13:00:00,000000000
How can I get these rows and delete them, please?
In Oracle, a DATE data type is a binary data type consisting of 7 bytes (century, year-of-century, month, day, hour, minute and second). It ALWAYS has all of those components and it is NEVER stored with a particular formatting (such as DD/MM/RR).
Your client application (i.e. SQL Developer) may choose to DISPLAY the binary DATE value in a human readable manner by formatting it as DD/MM/RR but that is a function of the client application you are using and not the database.
When you show the entire value:
SELECT TO_CHAR(ADD_MONTHS(sysdate, -2), 'YYYY-MM-DD HH24:MI:SS') AS dt FROM DUAL;
Then it outputs (depending on time zone):
DT
2022-01-03 10:11:28
If you compare that to your values then you can see that 2022-01-03 12:00:00 is not "more than 2 months ago" so it will not be matched.
What you appear to want is not "more than 2 months ago" but "equal to or more than 2 months ago, ignoring the time component", which you can get using:
SELECT *
FROM t1
WHERE col2 < add_months(TRUNC(sysdate), -2) + INTERVAL '1' DAY;
or
SELECT *
FROM t1
WHERE TRUNC(col2) <= add_months(TRUNC(sysdate), -2);
(Note: the first query would use an index on col2 but the second query would not; it would require a function-based index on TRUNC(col2) instead.)
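For completeness, such a function-based index would look something like this (the index name is made up):
CREATE INDEX t1_trunc_col2_ix ON t1 (TRUNC(col2));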
Also, don't use TO_DATE on a column that is already a DATE or TIMESTAMP data type. TO_DATE takes a string as its first argument, not a DATE or TIMESTAMP, so Oracle will perform an implicit conversion using TO_CHAR; if the format models do not match, you will introduce errors (and since any user can set their own date format in their session parameters at any time, you may get errors for one user that are not present for other users, which is very hard to debug).
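To illustrate that pitfall with the table from the question (the session setting below is purely hypothetical):
-- TO_DATE(TRUNC(col2), 'DD/MM/YY') really means
-- TO_DATE(TO_CHAR(TRUNC(col2), <NLS_DATE_FORMAT>), 'DD/MM/YY'),
-- so the session's date format silently takes part in the conversion.
ALTER SESSION SET NLS_DATE_FORMAT = 'MM/DD/RR';
SELECT TO_DATE(TRUNC(col2), 'DD/MM/YY') FROM t1 WHERE col1 = 1;
-- 3 Jan 2022 comes back as 1 Mar 2022: day and month are silently swapped.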
db<>fiddle here
Perhaps just:
select *
from t1
where col2 < add_months(sysdate, -2);

Why is my total session count (aggregated using EXTRACT MONTH) less than the total sessions when broken down by date?

I'm trying to generate my total sessions by month. I've tried two different ways:
1. Using the date field for the first column
2. Using a month field that is extracted from the date field using EXTRACT(MONTH FROM date) AS month
I tried the code below for the 1st one:
with session1 as (
  select date, session_id
  from table
  where date >= '2019-05-20' AND date <= '2019-05-21'
)
SELECT date, COUNT(DISTINCT session_id) AS sessions
FROM session1
GROUP BY 1
For the 2nd one I tried using this code:
with session1 as (
  select date, session_id
  from table
  where date >= '2019-05-20' AND date <= '2019-05-21'
)
SELECT EXTRACT(MONTH FROM date) AS month, COUNT(DISTINCT session_id) AS sessions
FROM session1
GROUP BY 1
As a result, I got the output below:
20 May: 1,548 Sessions; 21 May: 1,471 Sessions; Total: 3,019
May: 2,905
So there's a discrepancy of 114 sessions, and I'd like to know why.
Thank you in advance.
For simplicity's sake, let's say there is only one session spanning two consecutive days. If you count by day and then sum the results, you get 2 sessions, whereas if you count distinct sessions across the whole two days, you get just 1 session.
Hope this shows you the reason: you are counting some sessions twice on different days, presumably when they run over the end of one day into the start of the next.
The following query should show you which session_ids occur on both dates.
select session_id, count(distinct date) as num_dates
from table
where date >= '2019-05-20' AND date <= '2019-05-21'
group by 1
having num_dates > 1
This is either a data processing issue, or your session definition is allowed to span multiple days. Google Analytics, for example, traditionally ends a session and begins a new session at midnight. Other sessionization schemes might not impose this restriction.
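If the goal is for the per-day numbers to add up to the monthly total, one option is to attribute each session to exactly one date (for instance its first date) before aggregating; a sketch reusing the placeholder table and columns from the question:
with per_session as (
  select session_id, min(date) as first_date
  from table
  where date >= '2019-05-20' AND date <= '2019-05-21'
  group by session_id
)
select first_date, count(*) as sessions
from per_session
group by 1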

DAX - Filter by value then count occurrences in table

I am working with an employee table and was wondering if I could get some help.
My data table has rows with start and end date values. I filter these rows down using Filter(data table, [start date] <= [Measure.MaxMonth] && [end date] >= [Measure.MaxMonth]). [Measure.MaxMonth] is a measure that sits on a disconnected date table and functions like a parameter.
Here is a formula that I have been testing but have not been getting the desired results:
Measure.DirectReports = CALCULATE(COUNTROWS('Data Table'), FILTER('Data Table', 'Data Table'[Mgr ID] = 'Data Table'[Emp ID] && 'Data Table'[Start Date] <= [Meas.LastMonth] && 'Data Table'[End Date] >= [Meas.LastMonth]))
Measure.LastMonth = max(EOM[End of Month]) -> this value can equal any month end between July 2005 and July 2017. EOM is a date table with a row for each month end: 7/31/2005, 8/31/2005, ..., 7/31/2017.
This gives me a table structured like this:
Emp ID,Emp Attr,Mgr ID,Start Date,End Date
1,B,4,10/1/2013,10/6/2013
1,B,4,10/7/2013,12/29/2013
1,B,4,12/30/2013,12/28/2014
1,B,8,12/29/2014,10/4/2015
1,B,8,10/5/2015,12/27/2015
1,B,12,12/28/2015,5/15/2016
1,B,12,5/16/2016,10/2/2016
1,B,12,10/3/2016,12/25/2016
2,B,4,12/1/2014,12/28/2014
2,B,4,12/29/2014,12/27/2015
2,B,4,12/28/2015,2/7/2016
2,B,4,2/8/2016,3/6/2016
2,B,8,3/7/2016,6/1/2016
3,B,6,7/1/2015,12/27/2015
3,B,8,12/28/2015,6/30/2016
3,B,6,7/1/2016,9/4/2016
3,B,6,9/5/2016,12/25/2016
3,B,6,12/26/2016,5/7/2017
3,B,4,5/8/2017,6/11/2017
3,B,4,6/12/2017,6/25/2017
3,B,4,6/26/2017,7/9/2017
3,B,19,7/10/2017,12/31/9999
4,A,,7/1/1996,4/2/2006
4,A,,4/3/2006,12/31/2007
4,A,,1/1/2008,5/22/2011
4,A,,5/23/2011,11/16/2014
4,A,,11/17/2014,6/11/2017
4,A,,6/12/2017,6/25/2017
4,A,,6/26/2017,12/31/9999
5,B,4,11/8/2010,1/2/2011
5,B,4,1/3/2011,5/22/2011
5,B,4,5/23/2011,1/1/2012
5,B,4,1/2/2012,5/31/2012
5,B,4,6/1/2012,7/1/2012
5,B,4,7/2/2012,9/7/2012
6,B,4,1/3/2011,5/22/2011
6,B,4,5/23/2011,9/5/2011
6,B,4,9/6/2011,1/1/2012
6,B,4,1/2/2012,12/30/2012
6,B,4,12/31/2012,12/29/2013
6,B,4,12/30/2013,5/18/2014
6,B,4,5/19/2014,11/16/2014
6,B,4,11/17/2014,12/28/2014
6,B,4,12/29/2014,3/22/2015
6,B,4,3/23/2015,12/27/2015
6,B,4,12/28/2015,3/6/2016
6,B,4,3/7/2016,8/21/2016
6,B,4,8/22/2016,10/30/2016
6,B,4,10/31/2016,12/25/2016
6,B,4,12/26/2016,1/8/2017
6,B,4,1/9/2017,5/7/2017
6,B,4,5/8/2017,6/11/2017
6,B,4,6/12/2017,6/25/2017
6,B,4,6/26/2017,12/31/9999
7,B,4,1/2/2012,12/30/2012
7,B,4,12/31/2012,12/29/2013
7,B,4,12/30/2013,5/18/2014
7,B,4,5/19/2014,11/16/2014
7,B,4,11/17/2014,12/28/2014
7,B,4,12/29/2014,3/8/2015
7,B,4,3/9/2015,1/18/2016
7,B,4,1/19/2016,2/19/2016
8,B,6,12/31/2012,11/3/2013
8,B,4,11/4/2013,12/29/2013
8,B,4,12/30/2013,1/26/2014
8,B,4,1/27/2014,5/18/2014
8,B,4,5/19/2014,11/16/2014
8,B,4,11/17/2014,12/28/2014
8,B,4,12/29/2014,3/22/2015
8,B,4,3/23/2015,12/27/2015
8,B,4,12/28/2015,7/1/2016
10,B,4,10/3/2011,12/18/2011
10,B,4,12/19/2011,12/30/2012
10,B,4,12/31/2012,2/23/2013
10,B,4,2/24/2013,11/20/2014
11,B,4,2/1/2011,2/27/2011
11,B,4,2/28/2011,5/1/2011
12,B,4,9/15/2012,12/31/2012
12,B,4,9/15/2012,12/31/2012
12,B,4,1/1/2013,12/31/2013
12,B,4,1/1/2013,4/30/2014
12,B,4,1/1/2014,4/30/2014
12,B,4,5/1/2014,11/16/2014
12,B,4,5/1/2014,12/28/2014
12,B,4,11/17/2014,11/30/2014
12,B,4,12/1/2014,12/28/2014
12,B,4,12/29/2014,12/27/2015
12,B,4,12/29/2014,12/30/2016
12,B,4,12/28/2015,12/30/2016
12,B,4,12/31/2016,12/31/2016
12,B,4,1/1/2017,6/11/2017
12,B,4,6/12/2017,6/25/2017
12,B,4,6/26/2017,7/9/2017
12,B,19,7/10/2017,12/31/9999
13,B,4,12/28/2015,9/4/2016
13,B,4,9/5/2016,12/25/2016
13,B,4,12/26/2016,6/11/2017
13,B,4,6/12/2017,6/25/2017
13,B,4,6/26/2017,12/31/9999
14,B,4,1/12/2015,12/27/2015
14,B,4,12/28/2015,12/25/2016
14,B,4,12/26/2016,6/11/2017
14,B,4,6/12/2017,6/25/2017
14,B,4,6/26/2017,12/31/9999
16,B,4,9/14/2015,10/19/2015
17,B,6,8/22/2016,12/25/2016
17,B,6,12/26/2016,5/7/2017
17,B,4,5/8/2017,6/11/2017
17,B,4,6/12/2017,6/25/2017
17,B,4,6/26/2017,7/9/2017
17,B,19,7/10/2017,12/31/9999
18,B,6,9/12/2016,12/25/2016
18,B,6,12/26/2016,5/7/2017
18,B,13,5/8/2017,6/11/2017
18,B,13,6/12/2017,6/25/2017
18,B,13,6/26/2017,7/9/2017
18,B,19,7/10/2017,12/31/9999
19,B,4,7/10/2017,12/31/9999
Emp ID is a unique employee number. Mgr ID references the Emp ID of the employee's manager at the desired point in time (Measure.LastMonth). Emp Attr is an attribute of the employee, such as level.
Does anyone have any ideas on how to create a measure that will count the occurrences of Emp ID in Mgr ID? Ideally, if I am creating a visual in Power BI and filter based on Emp Attr = "A", can the resulting measure value give me the result 3 (the Emp ID "1" occurs 3 times in the Mgr ID column)?
I need this to be a measure and not a calculated column so that I can trend the results over time (end of month on X axis in trend visual).
Thanks for the help and let me know if you have any questions!
Edit - Updating basically the entire answer due to new information about the problem.
I feel that your core problem is trying to create a relationship between two tables based on the evaluation of an expression (End of Month between Start Date and End Date). From my experience, the easiest way to workaround this is to CROSSJOIN the two tables and then filter it down based on whatever expression you would like. For this, I created a new table with this formula.
Results = DISTINCT(
SELECTCOLUMNS(
FILTER(
CROSSJOIN('Data Table', EOM),
'Data Table'[Start Date] <= EOM[End of Month] && 'Data Table'[End Date] >= EOM[End of Month]
),
"Emp ID", [Emp ID],
"End of Month", [End of Month]
)
)
One more piece before finally making the relationships, we need a list of unique employee IDs. That can easily be obtained by creating a new table with this formula.
Employees = DISTINCT(
SELECTCOLUMNS('Data Table',
"Emp ID", 'Data Table'[Emp ID]
)
)
From here, create relationships between all of the tables as shown in the image below.
I know you asked for a measure, but bear with me as I feel confident that this will get you what you want. Add a new column to the Results table with this formula.
DirectReports = CALCULATE(
COUNTROWS('Data Table'),
FILTER(ALL('Data Table'),
'Data Table'[Mgr ID] = EARLIER(Results[Emp ID]) &&
'Data Table'[Start Date] <= EARLIER(Results[End of Month]) &&
'Data Table'[End Date] >= EARLIER(Results[End of Month])
)
)
At this point, I would hide the Emp ID and End of Month from the results table and any other field(s) you desire. From there, make your visuals. For example, you said you wanted to show direct report count over time, so I made this simple line chart and card.

PostgreSQL - How to decrease select statement execution time

My Postgres version: "PostgreSQL 9.4.1, compiled by Visual C++ build 1800, 32-bit"
The table I am dealing with contains the columns
eventtime - timestamp without time zone
serialnumber - character varying(32)
sourceid - integer
and 4 other columns
Here is my select statement:
SELECT eventtime, serialnumber
FROM t_el_eventlog
WHERE
eventtime at time zone 'CET' > CURRENT_DATE and
sourceid = '14';
The execution time for the above query is 59647 ms.
In my R script I have 5 queries of this kind (execution time = 59647 ms * 5).
Without at time zone 'CET' the execution time is much lower, but in my case I must use time zone 'CET', and if I am right the high execution time is because of the time zone conversion.
my query plan
text query
explain analyze of the query (without time zone)
Is there any way I can decrease the query execution time for my select statement?
Since the distribution of the values is unknown to me, there is no clear way of solving the problem.
But one problem is obvious: There is an index for the eventtime column, but since the query operates with a function over that column, the index can't be used.
eventtime at time zone 'UTC' > CURRENT_DATE
Either the index has to be dropped and recreated with that function or the query has to be rewritten.
Recreate the index (example):
CREATE INDEX ON t_el_eventlog (timezone('UTC'::text, eventtime));
(this is the same as eventtime at time zone 'UTC')
This matches the filter to the function, so the index can be used.
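With such a functional index in place, the query itself has to use the same expression so the planner can match them; roughly (a sketch, using the 'UTC' example above; substitute 'CET' for the question's case):
SELECT eventtime, serialnumber
FROM t_el_eventlog
WHERE timezone('UTC'::text, eventtime) > CURRENT_DATE
  AND sourceid = 14;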
I suspect that sourceid does not have a great distribution (not very many distinct values). In that case, dropping the index on sourceid AND dropping the index on eventtime, then creating a new index over eventtime and sourceid, could be an idea:
CREATE INDEX ON t_el_eventlog (timezone('UTC'::text, eventtime), sourceid);
This is what the theory tells us. I made a few tests around that, with a table of around 10 million rows, an eventtime distribution within 36 hours and only 20 different sourceids (1..20); the distribution is very random. The best results came from an index over (eventtime, sourceid) (no function index) and adjusting the query.
CREATE INDEX ON t_el_eventlog (eventtime, sourceid);
-- make sure there is no index on sourceid; we need to force Postgres to use this index
-- make sure Postgres learns about our index
ANALYZE; VACUUM;
-- use the timezone function on the current date (guessing the time zone is CET)
SELECT * FROM t_el_eventlog
WHERE eventtime > timezone('CET',CURRENT_DATE) AND sourceid = 14;
With the table having 10,000,000 rows, this query returns about 500,000 rows in only 400 ms (instead of about 1400 up to 1700 ms with all other combinations).
Finding the best match between the indexes and the query is the quest. I suggest some research; a good resource is http://use-the-index-luke.com
This is what the query plan looks like with the last approach:
Index Only Scan using evlog_eventtime_sourceid_idx on evlog (cost=0.45..218195.13 rows=424534 width=0)
Index Cond: ((eventtime > timezone('CET'::text, (('now'::cstring)::date)::timestamp with time zone)) AND (sourceid = 14))
As you can see, this is a perfect match.
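As a quick sanity check on any of these variants, comparing the plans with and without the conversion on the column shows immediately whether the index is used; a sketch:
EXPLAIN (ANALYZE, BUFFERS)
SELECT eventtime, serialnumber FROM t_el_eventlog
WHERE eventtime AT TIME ZONE 'CET' > CURRENT_DATE AND sourceid = 14;
EXPLAIN (ANALYZE, BUFFERS)
SELECT eventtime, serialnumber FROM t_el_eventlog
WHERE eventtime > timezone('CET', CURRENT_DATE) AND sourceid = 14;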

How to avoid expensive Cartesian product using row generator

I'm working on a query (Oracle 11g) that does a lot of date manipulation. Using a row generator, I'm examining each date within a range of dates for each record in another table. Through another query, I know that my row generator needs to generate 8500 dates, and this amount will grow by 365 days each year. Also, the table that I'm examining has about 18000 records, and this table is expected to grow by several thousand records a year.
The problem comes when joining the row generator to the other table to get the range of dates for each record. SQLTuning Advisor says that there's an expensive Cartesian product, which makes sense given that the query currently could generate up to 8500 x 18000 records. Here's the query in its stripped down form, without all the date logic etc.:
with n as (
select level n
from dual
connect by level <= 8500
)
select t.id, t.origdate + n origdate
from (
select id, origdate, closeddate
from my_table
) t
join n on origdate + n - 1 <= closeddate -- here's the problem join
order by t.id, t.origdate;
Is there an alternate way to join these two tables without the Cartesian product?
I need to calculate the elapsed time for each of these records, excluding weekends and federal holidays, so that I can sort on the elapsed time. Also, the pagination for the table is done server-side, so we can't just load everything into the table and sort client-side.
The maximum age of a record in the system right now is 3656 days, and the average is 560, so it's not quite as bad as 8500 x 18000; but it's still bad.
I've just about resigned myself to adding a field to store the opendays, computing it once and storing the elapsed time, and creating a scheduled task to update all open records every night.
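If it comes to that, the nightly job could be a single UPDATE driven by a calendar table; a hypothetical sketch (business_calendar, is_workday and opendays are invented names, not objects from the question):
UPDATE my_table t
SET opendays = (
  SELECT COUNT(*)
  FROM business_calendar c       -- one row per calendar date, flagged as workday or not
  WHERE c.cal_date >= t.origdate
    AND c.cal_date <= NVL(t.closeddate, TRUNC(SYSDATE))
    AND c.is_workday = 'Y'
)
WHERE t.closeddate IS NULL;      -- only open records need refreshing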
I think that you would get better performance if you rewrite the join condition slightly:
with n as (
select level n
from dual
connect by level <= 8500
)
select t.id, t.origdate + n origdate
from (
select id, origdate, closeddate
from my_table
) t
join n on n <= closeddate - origdate + 1 -- you could even create a function-based index
order by t.id, t.origdate;
