Hibernate + Postgres: simple query with bad performance
In my Java EE web application, I am querying a Postgres database to get the holiday data of an employee.
The table holiday has the following columns:
id bigint NOT NULL
start timestamp without time zone NOT NULL
end timestamp without time zone NOT NULL
employee_id bigint (FK referencing PK of table employee, indexed)
some other columns that do not seem to be related to my issue; not much data in them.
Goal: Find every holiday entry of a specific employee that touches a given time interval (it does not have to lie completely inside it, i.e. a holiday from 2009-12-30 to 2010-01-05 is a valid result when searching for holidays from 2010-01-01 to 2010-01-31).
With that in mind, my named query is:
SELECT
h
FROM
Holiday h
WHERE
h.employee.id = :employeeIdParam
AND (
h.end BETWEEN :fromParam AND :toParam
OR
h.start BETWEEN :fromParam AND :toParam
)
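(A side note on the predicate itself: the two BETWEEN checks catch holidays whose start or end falls inside the search window, but not a holiday that starts before :fromParam and ends after :toParam. If that case matters, a minimal sketch of the usual interval-overlap condition, using the same parameters, would be:

h.employee.id = :employeeIdParam
AND h.start <= :toParam
AND h.end >= :fromParam

This matches every holiday that overlaps the window in any way.)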
Query execution through Hibernate takes ~400 ms and returns 100 rows out of only 2000 rows in total. Much too long, isn't it?
(Running the same statement in pgAdmin takes ~12 ms.)
What's the issue here?
(Everything runs on the local machine, so it's not a connection problem.)
Using WildFly 9.0, Postgres 9.4 and Hibernate 4.3.1.
Update
Database Log:
select holiday0_.id as id1_70_, holiday0_.end as end3_70_, holiday0_.employee_id as emplo10_70_, holiday0_.start as star6_70_
from holiday holiday0_ where holiday0_.employee_id=$1 and (holiday0_.end between $2 and $3 or holiday0_.start between $4 and $5)
Parameters: $1 = '2757', $2 = '2003-07-01 00:00:00', $3 = '2015-08-11 23:59:59', $4 = '2003-07-01 00:00:00', $5 = '2015-08-11 23:59:59'
Related
Oracle SQL Developer get table rows older than n months
In Oracle SQL Developer, I have a table called t1 which has two columns: col1 defined as NUMBER(19,0) and col2 defined as TIMESTAMP(3). I have these rows:

col1  col2
1     03/01/22 12:00:00,000000000
2     03/01/22 13:00:00,000000000
3     26/11/21 10:27:11,750000000
4     26/11/21 10:27:59,606000000
5     16/12/21 11:47:04,105000000
6     16/12/21 12:29:27,101000000

My sysdate looks like this:

select sysdate from dual;

SYSDATE
03/03/22

I want to create a stored procedure (SP) which deletes rows older than 2 months and displays the message "n rows are deleted". But when I execute this statement:

select * from t1 where to_date(TRUNC(col2), 'DD/MM/YY') < add_months(sysdate, -2);

the result does not include the first 2 rows of my t1 table:

1 03/01/22 12:00:00,000000000
2 03/01/22 13:00:00,000000000

How can I get these rows and delete them, please?
In Oracle, a DATE data type is a binary data type consisting of 7 bytes (century, year-of-century, month, day, hour, minute and second). It ALWAYS has all of those components and it is NEVER stored with a particular formatting (such as DD/MM/RR). Your client application (i.e. SQL Developer) may choose to DISPLAY the binary DATE value in a human-readable manner by formatting it as DD/MM/RR, but that is a function of the client application you are using and not of the database.

When you show the entire value:

SELECT TO_CHAR(ADD_MONTHS(sysdate, -2), 'YYYY-MM-DD HH24:MI:SS') AS dt FROM DUAL;

it outputs (depending on time zone):

DT
2022-01-03 10:11:28

If you compare that to your values, you can see that 2022-01-03 12:00:00 is not "more than 2 months ago", so it will not be matched. What you appear to want is not "more than 2 months ago" but "equal to or more than 2 months ago, ignoring the time component", which you can get using:

SELECT * FROM t1 WHERE col2 < add_months(TRUNC(sysdate), -2) + INTERVAL '1' DAY;

or

SELECT * FROM t1 WHERE TRUNC(col2) <= add_months(TRUNC(sysdate), -2);

(Note: the first query would use an index on col2, but the second query would not; it would require a function-based index on TRUNC(col2) instead.)

Also, don't use TO_DATE on a column that is already a DATE or TIMESTAMP data type. TO_DATE takes a string as its first argument, not a DATE or TIMESTAMP, so Oracle will perform an implicit conversion using TO_CHAR, and if the format models do not match you will introduce errors (and since any user can set their own date format in their session parameters at any time, you may get errors for one user that are not present for other users, which is very hard to debug).

db<>fiddle here
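For the second variant, the function-based index mentioned in the note would look roughly like this (the index name is only an illustration):

-- hypothetical index so that a predicate on TRUNC(col2) can use an index
CREATE INDEX t1_trunc_col2_idx ON t1 (TRUNC(col2));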
Perhaps just:

select * from t1 where col2 < add_months(sysdate, -2);
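Since the end goal in the question is a stored procedure that deletes the old rows and reports how many were removed, a rough sketch could look like the following (the procedure name is made up, and the cut-off predicate is the index-friendly one from the first answer):

CREATE OR REPLACE PROCEDURE delete_old_t1_rows AS
BEGIN
  -- delete rows whose col2 is at least 2 months old, ignoring the time of day
  DELETE FROM t1
  WHERE col2 < add_months(TRUNC(sysdate), -2) + INTERVAL '1' DAY;

  -- SQL%ROWCOUNT holds the number of rows affected by the last statement
  DBMS_OUTPUT.PUT_LINE(SQL%ROWCOUNT || ' rows are deleted');

  COMMIT;
END;
/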
Why is my total session count (aggregated using EXTRACT MONTH) less than the total sessions broken down by date?
I'm trying to generate my total sessions by month, and I've tried two different ways: using the date field for the first column, and using a month field that is extracted from the date field with EXTRACT(MONTH FROM date) AS month.

For the first one I tried this code:

with session1 as(
select date, session_id
from table
where date >= '2019-05-20' AND date <= '2019-05-21')

SELECT date, COUNT(DISTINCT session_id) AS sessions
from session1
GROUP BY 1

For the second one I tried this code:

with session1 as(
select date, session_id
from table
where date >= '2019-05-20' AND date <= '2019-05-21')

SELECT EXTRACT(MONTH FROM date) AS month, COUNT(DISTINCT session_id) AS sessions
from session1
GROUP BY 1

I got this output:

20 May: 1,548 sessions
21 May: 1,471 sessions
Total: 3,019

May: 2,905

So there's a 114-session discrepancy and I'd like to know why. Thank you in advance.
For simplicity's sake, let's say there is only one session and it spans two consecutive days. If you count by day and then sum the results, you get 2 sessions, while if you count distinct sessions across the whole two days, you get just 1 session.

Hopefully this shows you the reason: you are counting some sessions twice on different days, presumably when they run over the end of one day and into the start of the next.
The following query should show you which session_ids occur on both dates.

select session_id, count(distinct date) as num_dates
from table
where date >= '2019-05-20' AND date <= '2019-05-21'
group by 1
having num_dates > 1

This is either a data processing issue, or your session definition is allowed to span multiple days. Google Analytics, for example, traditionally ends a session and begins a new session at midnight. Other sessionization schemes might not impose this restriction.
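To see both numbers side by side, a quick sketch along the same lines (keeping the question's table placeholder and date range) compares the sum of the per-day distinct counts with the distinct count over the whole range; the gap between them is the number of sessions counted on both days:

with daily as (
  select date, count(distinct session_id) as daily_sessions
  from table
  where date >= '2019-05-20' AND date <= '2019-05-21'
  group by 1
)
select
  sum(daily_sessions) as summed_by_day,  -- 3,019 in the question
  (select count(distinct session_id)
   from table
   where date >= '2019-05-20' AND date <= '2019-05-21') as distinct_over_range  -- 2,905
from daily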
DAX - Filter by value then count occurrences in table
I am working with an employee table and was wondering if I could get some help. My data table has rows with start and end date values. I filter these rows down using:

Filter(data table, [start date]<=[Measure.MaxMonth] && [end date]>=[Measure.MaxMonth])

[Measure.MaxMonth] is a measure that sits on a disconnected date table and functions like a parameter. Here is a formula that I have been testing but have not been getting the desired results with:

Measure.DirectReports =
CALCULATE(COUNTROWS(Data Table),
    filter(Data Table,
        Data Table[Mgr ID]=Data Table[Emp ID]
        && Data Table[Start Date]<=[Meas.LastMonth]
        && Data Table[End Date]>=[Meas.LastMonth]))

Measure.LastMonth = max(EOM[End of Month]) -> this value can equal any month end between July 2005 and July 2017. EOM is a date table with a row for each month end: 7/31/2005, 8/31/2005, ..., 7/31/2017.

This gives me a table structured like this:

Emp ID,Emp Attr,Mgr ID,Start Date,End Date
1,B,4,10/1/2013,10/6/2013
1,B,4,10/7/2013,12/29/2013
1,B,4,12/30/2013,12/28/2014
1,B,8,12/29/2014,10/4/2015
1,B,8,10/5/2015,12/27/2015
1,B,12,12/28/2015,5/15/2016
1,B,12,5/16/2016,10/2/2016
1,B,12,10/3/2016,12/25/2016
2,B,4,12/1/2014,12/28/2014
2,B,4,12/29/2014,12/27/2015
2,B,4,12/28/2015,2/7/2016
2,B,4,2/8/2016,3/6/2016
2,B,8,3/7/2016,6/1/2016
3,B,6,7/1/2015,12/27/2015
3,B,8,12/28/2015,6/30/2016
3,B,6,7/1/2016,9/4/2016
3,B,6,9/5/2016,12/25/2016
3,B,6,12/26/2016,5/7/2017
3,B,4,5/8/2017,6/11/2017
3,B,4,6/12/2017,6/25/2017
3,B,4,6/26/2017,7/9/2017
3,B,19,7/10/2017,12/31/9999
4,A,,7/1/1996,4/2/2006
4,A,,4/3/2006,12/31/2007
4,A,,1/1/2008,5/22/2011
4,A,,5/23/2011,11/16/2014
4,A,,11/17/2014,6/11/2017
4,A,,6/12/2017,6/25/2017
4,A,,6/26/2017,12/31/9999
5,B,4,11/8/2010,1/2/2011
5,B,4,1/3/2011,5/22/2011
5,B,4,5/23/2011,1/1/2012
5,B,4,1/2/2012,5/31/2012
5,B,4,6/1/2012,7/1/2012
5,B,4,7/2/2012,9/7/2012
6,B,4,1/3/2011,5/22/2011
6,B,4,5/23/2011,9/5/2011
6,B,4,9/6/2011,1/1/2012
6,B,4,1/2/2012,12/30/2012
6,B,4,12/31/2012,12/29/2013
6,B,4,12/30/2013,5/18/2014
6,B,4,5/19/2014,11/16/2014
6,B,4,11/17/2014,12/28/2014
6,B,4,12/29/2014,3/22/2015
6,B,4,3/23/2015,12/27/2015
6,B,4,12/28/2015,3/6/2016
6,B,4,3/7/2016,8/21/2016
6,B,4,8/22/2016,10/30/2016
6,B,4,10/31/2016,12/25/2016
6,B,4,12/26/2016,1/8/2017
6,B,4,1/9/2017,5/7/2017
6,B,4,5/8/2017,6/11/2017
6,B,4,6/12/2017,6/25/2017
6,B,4,6/26/2017,12/31/9999
7,B,4,1/2/2012,12/30/2012
7,B,4,12/31/2012,12/29/2013
7,B,4,12/30/2013,5/18/2014
7,B,4,5/19/2014,11/16/2014
7,B,4,11/17/2014,12/28/2014
7,B,4,12/29/2014,3/8/2015
7,B,4,3/9/2015,1/18/2016
7,B,4,1/19/2016,2/19/2016
8,B,6,12/31/2012,11/3/2013
8,B,4,11/4/2013,12/29/2013
8,B,4,12/30/2013,1/26/2014
8,B,4,1/27/2014,5/18/2014
8,B,4,5/19/2014,11/16/2014
8,B,4,11/17/2014,12/28/2014
8,B,4,12/29/2014,3/22/2015
8,B,4,3/23/2015,12/27/2015
8,B,4,12/28/2015,7/1/2016
10,B,4,10/3/2011,12/18/2011
10,B,4,12/19/2011,12/30/2012
10,B,4,12/31/2012,2/23/2013
10,B,4,2/24/2013,11/20/2014
11,B,4,2/1/2011,2/27/2011
11,B,4,2/28/2011,5/1/2011
12,B,4,9/15/2012,12/31/2012
12,B,4,9/15/2012,12/31/2012
12,B,4,1/1/2013,12/31/2013
12,B,4,1/1/2013,4/30/2014
12,B,4,1/1/2014,4/30/2014
12,B,4,5/1/2014,11/16/2014
12,B,4,5/1/2014,12/28/2014
12,B,4,11/17/2014,11/30/2014
12,B,4,12/1/2014,12/28/2014
12,B,4,12/29/2014,12/27/2015
12,B,4,12/29/2014,12/30/2016
12,B,4,12/28/2015,12/30/2016
12,B,4,12/31/2016,12/31/2016
12,B,4,1/1/2017,6/11/2017
12,B,4,6/12/2017,6/25/2017
12,B,4,6/26/2017,7/9/2017
12,B,19,7/10/2017,12/31/9999
13,B,4,12/28/2015,9/4/2016
13,B,4,9/5/2016,12/25/2016
13,B,4,12/26/2016,6/11/2017
13,B,4,6/12/2017,6/25/2017
13,B,4,6/26/2017,12/31/9999
14,B,4,1/12/2015,12/27/2015
14,B,4,12/28/2015,12/25/2016
14,B,4,12/26/2016,6/11/2017
14,B,4,6/12/2017,6/25/2017
14,B,4,6/26/2017,12/31/9999
16,B,4,9/14/2015,10/19/2015
17,B,6,8/22/2016,12/25/2016
17,B,6,12/26/2016,5/7/2017
17,B,4,5/8/2017,6/11/2017
17,B,4,6/12/2017,6/25/2017
17,B,4,6/26/2017,7/9/2017
17,B,19,7/10/2017,12/31/9999
18,B,6,9/12/2016,12/25/2016
18,B,6,12/26/2016,5/7/2017
18,B,13,5/8/2017,6/11/2017
18,B,13,6/12/2017,6/25/2017
18,B,13,6/26/2017,7/9/2017
18,B,19,7/10/2017,12/31/9999
19,B,4,7/10/2017,12/31/9999

Empl ID is a unique employee number. Mgr ID references the Empl ID of the employee's manager at the desired point in time (Measure.LastMonth). Emp Attr is an attribute of the employee, such as level.

Does anyone have any ideas on how to create a measure that will count the occurrences of Empl ID in Mgr ID? Ideally, if I am creating a visual in Power BI and I filter based on Empl Attr ="A", can the resulting measure value give me the result = 3 -> the empl id "1" occurs 3 times in the mgr id column. I need this to be a measure and not a calculated column so that I can trend the results over time (end of month on the X axis in a trend visual). Thanks for the help and let me know if you have any questions!
Edit - Updating basically the entire answer due to new information about the problem.

I feel that your core problem is trying to create a relationship between two tables based on the evaluation of an expression (End of Month between Start Date and End Date). From my experience, the easiest way to work around this is to CROSSJOIN the two tables and then filter the result down based on whatever expression you would like. For this, I created a new table with this formula:

Results =
DISTINCT(
    SELECTCOLUMNS(
        FILTER(
            CROSSJOIN('Data Table', EOM),
            'Data Table'[Start Date] <= EOM[End of Month] &&
            'Data Table'[End Date] >= EOM[End of Month]
        ),
        "Emp ID", [Emp ID],
        "End of Month", [End of Month]
    )
)

One more piece before finally making the relationships: we need a list of unique employee IDs. That can easily be obtained by creating a new table with this formula:

Employees =
DISTINCT(
    SELECTCOLUMNS('Data Table',
        "Emp ID", 'Data Table'[Emp ID]
    )
)

From here, create relationships between all of the tables as shown in the image below. I know you asked for a measure, but bear with me, as I feel confident that this will get you what you want. Add a new column to the Results table with this formula:

DirectReports =
CALCULATE(
    COUNTROWS('Data Table'),
    FILTER(ALL('Data Table'),
        'Data Table'[Mgr ID] = EARLIER(Results[Emp ID]) &&
        'Data Table'[Start Date] <= EARLIER(Results[End of Month]) &&
        'Data Table'[End Date] >= EARLIER(Results[End of Month])
    )
)

At this point, I would hide the Emp ID and End of Month columns from the Results table and any other field(s) you desire. From there, make your visuals. For example, you said you wanted to show direct report count over time, so I made this simple line chart and card.
PostgreSQL - How to decrease select statement execution time
My Postgres version: "PostgreSQL 9.4.1, compiled by Visual C++ build 1800, 32-bit"

The table I am dealing with contains the columns

eventtime - timestamp without time zone
serialnumber - character varying(32)
sourceid - integer

and 4 other columns.

Here is my select statement:

SELECT eventtime, serialnumber
FROM t_el_eventlog
WHERE eventtime at time zone 'CET' > CURRENT_DATE
  and sourceid = '14';

The execution time for the above query is 59647 ms, and in my R script I have 5 of these kinds of queries (execution time = 59647 ms * 5). Without using time zone 'CET' the execution time is much lower, but in my case I must use time zone 'CET', and if I am right the high execution time is because of the time zone conversion.

my query plan
text query
explain analyze query (without timezone)

Is there any way that I can decrease the execution time of my select statement?
Since the distribution of the values is unknown to me, there is no clear way of solving the problem, but one problem is obvious: there is an index on the eventtime column, but since the query applies a function to that column, the index can't be used:

eventtime at time zone 'UTC' > CURRENT_DATE

Either the index has to be dropped and recreated with that function, or the query has to be rewritten. Recreate the index (example):

CREATE INDEX ON t_el_eventlog (timezone('UTC'::text, eventtime));

(this is the same as eventtime at time zone 'UTC'). This matches the filter with the function, so the index can be used.

I suspect that sourceid does not have a great distribution, i.e. not very many different values. In that case, dropping the index on sourceid AND dropping the index on eventtime, then creating a new index over eventtime and sourceid, could be an idea:

CREATE INDEX ON t_el_eventlog (timezone('UTC'::text, eventtime), sourceid);

This is what the theory tells us. I ran a few tests around that, with a table of around 10 million rows, eventtime distributed over 36 hours and only 20 different sourceids (1..20), with a very random distribution. The best results came from an index over eventtime, sourceid (no function index) and adjusting the query:

CREATE INDEX ON t_el_eventlog (eventtime, sourceid);
-- make sure there is no index on sourceid; we need to force Postgres to use this index.
-- make sure Postgres learns about our index
ANALYZE; VACUUM;
-- use the timezone function on the current date (guessing the time zone is CET)
SELECT * FROM t_el_eventlog
WHERE eventtime > timezone('CET', CURRENT_DATE)
  AND sourceid = 14;

With the table having 10,000,000 rows, this query returns about 500,000 rows in only 400 ms (instead of about 1400 up to 1700 ms in all the other combinations). Finding the best match between the indexes and the query is the quest; I suggest some research, a recommendation is http://use-the-index-luke.com

This is what the query plan looks like with the last approach:

Index Only Scan using evlog_eventtime_sourceid_idx on evlog (cost=0.45..218195.13 rows=424534 width=0)
  Index Cond: ((eventtime > timezone('CET'::text, (('now'::cstring)::date)::timestamp with time zone)) AND (sourceid = 14))

As you can see, this is a perfect match...
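A quick way to verify that the rewritten query and the new index actually match is to look at the plan of the rewritten statement; a sketch using the question's table and the CET variant of the filter:

EXPLAIN (ANALYZE, BUFFERS)
SELECT eventtime, serialnumber
FROM t_el_eventlog
WHERE eventtime > timezone('CET', CURRENT_DATE)
  AND sourceid = 14;

If the output still shows a sequential scan instead of an index (or index-only) scan, the index definition and the WHERE clause do not match yet.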
How to avoid expensive Cartesian product using row generator
I'm working on a query (Oracle 11g) that does a lot of date manipulation. Using a row generator, I'm examining each date within a range of dates for each record in another table. Through another query, I know that my row generator needs to generate 8500 dates, and this amount will grow by 365 days each year. Also, the table that I'm examining has about 18000 records, and it is expected to grow by several thousand records a year.

The problem comes when joining the row generator to the other table to get the range of dates for each record. SQL Tuning Advisor says that there's an expensive Cartesian product, which makes sense given that the query currently could generate up to 8500 x 18000 records. Here's the query in its stripped-down form, without all the date logic etc.:

with n as (
  select level n
  from dual
  connect by level <= 8500
)
select t.id, t.origdate + n origdate
from (
  select id, origdate, closeddate
  from my_table
) t
join n on origdate + n - 1 <= closeddate -- here's the problem join
order by t.id, t.origdate;

Is there an alternate way to join these two tables without the Cartesian product?

I need to calculate the elapsed time for each of these records, disallowing weekends and federal holidays, so that I can sort on the elapsed time. Also, the pagination for the table is done server-side, so we can't just load into the table and sort client-side.

The maximum age of a record in the system right now is 3656 days, and the average is 560, so it's not quite as bad as 8500 x 18000; but it's still bad. I've just about resigned myself to adding a field to store the open days, computing the elapsed time once, and creating a scheduled task to update all open records every night.
I think that you would get better performance if you rewrite the join condition slightly, so that the expression over the table's columns stands alone on one side of the comparison:

with n as (
  select level n
  from dual
  connect by level <= 8500
)
select t.id, t.origdate + n origdate
from (
  select id, origdate, closeddate
  from my_table
) t
join n on n <= Closeddate - Origdate + 1 -- you could even create a function-based index
order by t.id, t.origdate;
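The function-based index that the comment hints at would be an index on that same expression; a sketch, with an index name chosen only for illustration:

-- hypothetical expression index on the duration used in the rewritten join condition
CREATE INDEX my_table_duration_idx ON my_table (closeddate - origdate + 1);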