Queries running very slow on first load on PostgreSQL - performance

We are using a PostgreSQL 9.4 database on Amazon EC2. All of our queries run very slowly on the first try; once the data is cached they are quite quick, but that is little consolation since it still slows down the page load.
Here is one of the queries we use:
SELECT HE.fs_perm_sec_id,
       HE.TICKER_EXCHANGE,
       HE.proper_name,
       OP.shares_outstanding,
       (SELECT factset_industry_desc
        FROM factset_industry_map AS fim
        WHERE fim.factset_industry_code = HES.industry_code) AS industry,
       (SELECT SUM(POSITION) AS ST_HOLDINGS
        FROM OWN_STAKES_HOLDINGS S
        WHERE S.POSITION > 0
          AND S.fs_perm_sec_id = HE.fs_perm_sec_id
        GROUP BY FS_PERM_SEC_ID) AS stake_holdings,
       (SELECT SUM(CURRENT_HOLDINGS)
        FROM (SELECT CURRENT_HOLDINGS
              FROM OWN_INST_HOLDINGS IHT
              WHERE FS_PERM_SEC_ID = HE.FS_PERM_SEC_ID
              ORDER BY CURRENT_HOLDINGS DESC
              LIMIT 10) A) AS top_10_inst_holdings,
       (SELECT SUM(OIH.current_holdings)
        FROM own_inst_holdings OIH
        WHERE OIH.fs_perm_sec_id = HE.fs_perm_sec_id) AS inst_holdings
FROM own_prices OP
JOIN h_security_ticker_exchange HE ON OP.fs_perm_sec_id = HE.fs_perm_sec_id
JOIN h_entity_sector HES ON HES.factset_entity_id = HE.factset_entity_id
WHERE HE.ticker_exchange = 'PG-NYS'
ORDER BY OP.price_date DESC
LIMIT 1
We ran EXPLAIN ANALYZE and got the following results:
QUERY PLAN
Limit (cost=223.39..223.39 rows=1 width=100) (actual time=2420.644..2420.645 rows=1 loops=1)
-> Sort (cost=223.39..223.39 rows=1 width=100) (actual time=2420.643..2420.643 rows=1 loops=1)
Sort Key: op.price_date
Sort Method: top-N heapsort Memory: 25kB
-> Nested Loop (cost=0.26..223.39 rows=1 width=100) (actual time=2316.169..2420.566 rows=36 loops=1)
-> Nested Loop (cost=0.17..8.87 rows=1 width=104) (actual time=3.958..5.084 rows=36 loops=1)
-> Index Scan using h_sec_exch_factset_entity_id_idx on h_security_ticker_exchange he (cost=0.09..4.09 rows=1 width=92) (actual time=1.452..1.454 rows=1 loops=1)
Index Cond: ((ticker_exchange)::text = 'PG-NYS'::text)
-> Index Scan using alex_prices on own_prices op (cost=0.09..4.68 rows=33 width=23) (actual time=2.496..3.592 rows=36 loops=1)
Index Cond: ((fs_perm_sec_id)::text = (he.fs_perm_sec_id)::text)
-> Index Scan using alex_factset_entity_idx on h_entity_sector hes (cost=0.09..4.09 rows=1 width=14) (actual time=0.076..0.077 rows=1 loops=36)
Index Cond: (factset_entity_id = he.factset_entity_id)
SubPlan 1
-> Index Only Scan using alex_factset_industry_code_idx on factset_industry_map fim (cost=0.03..2.03 rows=1 width=20) (actual time=0.006..0.007 rows=1 loops=36)
Index Cond: (factset_industry_code = hes.industry_code)
Heap Fetches: 0
SubPlan 2
-> GroupAggregate (cost=0.08..2.18 rows=2 width=17) (actual time=0.735..0.735 rows=1 loops=36)
Group Key: s.fs_perm_sec_id
-> Index Only Scan using own_stakes_holdings_perm_position_idx on own_stakes_holdings s (cost=0.08..2.15 rows=14 width=17) (actual time=0.080..0.713 rows=39 loops=36)
Index Cond: ((fs_perm_sec_id = (he.fs_perm_sec_id)::text) AND ("position" > 0::numeric))
Heap Fetches: 1155
SubPlan 3
-> Aggregate (cost=11.25..11.26 rows=1 width=6) (actual time=0.166..0.166 rows=1 loops=36)
-> Limit (cost=0.09..11.22 rows=10 width=6) (actual time=0.081..0.150 rows=10 loops=36)
-> Index Only Scan Backward using alex_current_holdings_idx on own_inst_holdings iht (cost=0.09..194.87 rows=175 width=6) (actual time=0.080..0.147 rows=10 loops=36)
Index Cond: (fs_perm_sec_id = (he.fs_perm_sec_id)::text)
Heap Fetches: 288
SubPlan 4
-> Aggregate (cost=194.96..194.96 rows=1 width=6) (actual time=66.102..66.102 rows=1 loops=36)
-> Index Only Scan using alex_current_holdings_idx on own_inst_holdings oih (cost=0.09..194.87 rows=175 width=6) (actual time=0.060..65.209 rows=2505 loops=36)
Index Cond: (fs_perm_sec_id = (he.fs_perm_sec_id)::text)
Heap Fetches: 33453
Planning time: 1.581 ms
Execution time: 2420.830 ms
If we remove the three SELECT SUM() aggregates the query speeds up considerably, but that defeats the point of having a relational DB.
We connect and run the queries from Node.js using the pg module (https://www.npmjs.com/package/pg).
How can we speed up the queries? What additional steps could we take? We have already indexed the DB and all the relevant fields seem to be indexed properly, but it is still not fast enough.
Any help, comments, and/or suggestions are appreciated.
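Since the slowness only affects the first, cold-cache run, one stopgap is to warm the relevant relations right after a restart. A minimal sketch using the pg_prewarm extension that ships with 9.4 (relation names taken from the query and plan above; untested):
CREATE EXTENSION IF NOT EXISTS pg_prewarm;
-- load the hot tables and the index the plan leans on into shared_buffers
SELECT pg_prewarm('own_inst_holdings');
SELECT pg_prewarm('own_stakes_holdings');
SELECT pg_prewarm('alex_current_holdings_idx');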

Nested loops with aggregates are generally a bad thing. The query below should avoid them. (Untested; a SQLFiddle would have been helpful.) Give it a spin and let me know; I'm curious how the engine handles the window-function filter.
WITH security AS (
    SELECT HE.fs_perm_sec_id
        , HE.ticker_exchange
        , HE.proper_name
        , OP.shares_outstanding
        , OP.price_date
    FROM own_prices AS OP
    JOIN h_security_ticker_exchange AS HE
        ON OP.fs_perm_sec_id = HE.fs_perm_sec_id
    JOIN h_entity_sector AS HES
        ON HES.factset_entity_id = HE.factset_entity_id
    WHERE HE.ticker_exchange = 'PG-NYS'
)
SELECT SE.fs_perm_sec_id
    , SE.ticker_exchange
    , SE.proper_name
    , SE.shares_outstanding
    , S.stake_holdings
    , IHT.top_10_inst_holdings
    , OIH.inst_holdings
FROM security AS SE
JOIN (
    SELECT S.fs_perm_sec_id
        , SUM(S.position) AS stake_holdings
    FROM own_stakes_holdings AS S
    WHERE S.fs_perm_sec_id IN (SELECT fs_perm_sec_id FROM security)
        AND S.position > 0
    GROUP BY S.fs_perm_sec_id
) AS S
    ON SE.fs_perm_sec_id = S.fs_perm_sec_id
JOIN (
    -- window functions are not allowed in WHERE, so rank in a subquery first
    SELECT ranked.fs_perm_sec_id
        , SUM(ranked.current_holdings) AS top_10_inst_holdings
    FROM (
        SELECT IHT.fs_perm_sec_id
            , IHT.current_holdings
            , ROW_NUMBER() OVER (
                  PARTITION BY IHT.fs_perm_sec_id
                  ORDER BY IHT.current_holdings DESC
              ) AS rn
        FROM own_inst_holdings AS IHT
        WHERE IHT.fs_perm_sec_id IN (SELECT fs_perm_sec_id FROM security)
    ) AS ranked
    WHERE ranked.rn <= 10
    GROUP BY ranked.fs_perm_sec_id
) AS IHT
    ON SE.fs_perm_sec_id = IHT.fs_perm_sec_id
JOIN (
    SELECT OIH.fs_perm_sec_id
        , SUM(OIH.current_holdings) AS inst_holdings
    FROM own_inst_holdings AS OIH
    WHERE OIH.fs_perm_sec_id IN (SELECT fs_perm_sec_id FROM security)
    GROUP BY OIH.fs_perm_sec_id
) AS OIH
    ON SE.fs_perm_sec_id = OIH.fs_perm_sec_id
ORDER BY SE.price_date DESC
LIMIT 1
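One more thing worth checking: the original plan shows Heap Fetches: 33453 on the index-only scan in SubPlan 4, which suggests a stale visibility map. Vacuuming should let those index-only scans skip the heap; a quick sketch:
-- refresh the visibility map so index-only scans stop fetching heap pages
VACUUM (ANALYZE) own_inst_holdings;
VACUUM (ANALYZE) own_stakes_holdings;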

Related

Query runs much slower using JDBC

I have two different queries that take about the same amount of time to execute when I time them with Adminer or DBeaver.
Query one
select * from state where state_name = 'Florida';
When I run the query above in Adminer it takes anywhere from
0.032 s to 0.058 s
EXPLAIN ANALYZE
Seq Scan on state (cost=0.00..3981.50 rows=1 width=28) (actual time=1.787..15.047 rows=1 loops=1)
Filter: (state_name = 'Florida'::citext)
Rows Removed by Filter: 50
Planning Time: 0.486 ms
Execution Time: 15.779 ms
Query two
select
    property.id as property_id,
    full_address,
    street_address,
    street.street,
    city.city as city,
    state.state_code as state_code,
    zipcode.zipcode as zipcode
from property
inner join street on street.id = property.street_id
inner join city on city.id = property.city_id
inner join state on state.id = property.state_id
inner join zipcode on zipcode.id = property.zipcode_id
where full_address = '139-Skillman-Ave-Apt-5C-Brooklyn-NY-11211';
The above query takes from
0.025 s to 0.048 s
EXPLAIN ANALYZE
Nested Loop (cost=29.82..65.96 rows=1 width=97) (actual time=0.668..0.671 rows=1 loops=1)
-> Nested Loop (cost=29.53..57.65 rows=1 width=107) (actual time=0.617..0.620 rows=1 loops=1)
-> Nested Loop (cost=29.25..49.30 rows=1 width=120) (actual time=0.582..0.585 rows=1 loops=1)
-> Nested Loop (cost=28.97..41.00 rows=1 width=127) (actual time=0.532..0.534 rows=1 loops=1)
-> Bitmap Heap Scan on property (cost=28.54..32.56 rows=1 width=131) (actual time=0.454..0.456 rows=1 loops=1)
Recheck Cond: (full_address = '139-Skillman-Ave-Apt-5C-Brooklyn-NY-11211'::citext)
Heap Blocks: exact=1
-> Bitmap Index Scan on property_full_address (cost=0.00..28.54 rows=1 width=0) (actual time=0.426..0.426 rows=1 loops=1)
Index Cond: (full_address = '139-Skillman-Ave-Apt-5C-Brooklyn-NY-11211'::citext)
-> Index Scan using street_pkey on street (cost=0.42..8.44 rows=1 width=28) (actual time=0.070..0.070 rows=1 loops=1)
Index Cond: (id = property.street_id)
-> Index Scan using city_id_pk on city (cost=0.29..8.30 rows=1 width=25) (actual time=0.047..0.047 rows=1 loops=1)
Index Cond: (id = property.city_id)
-> Index Scan using state_id_pk on state (cost=0.28..8.32 rows=1 width=19) (actual time=0.032..0.032 rows=1 loops=1)
Index Cond: (id = property.state_id)
-> Index Scan using zipcode_id_pk on zipcode (cost=0.29..8.30 rows=1 width=22) (actual time=0.048..0.048 rows=1 loops=1)
Index Cond: (id = property.zipcode_id)
Planning Time: 5.473 ms
Execution Time: 1.601 ms
I have the following methods, which use JdbcTemplate to execute the same queries.
Query one
public void performanceTest(String str) {
    template.queryForObject(
        "select * from state where state_name = ?",
        new Object[] { str }, (result, rowNum) -> {
            return result.getObject("state_name");
        });
}
time: 140ms, which is 0.14 seconds
Query two
public void performanceTest(String str) {
    template.queryForObject(
        "SELECT property.id AS property_id, full_address, street_address, street.street, city.city as city, state.state_code as state_code, zipcode.zipcode as zipcode FROM property INNER JOIN street ON street.id = property.street_id INNER JOIN city ON city.id = property.city_id INNER JOIN state ON state.id = property.state_id INNER JOIN zipcode ON zipcode.id = property.zipcode_id WHERE full_address = ?",
        new Object[] { str }, (result, rowNum) -> {
            return result.getObject("property_id");
        });
}
The time it takes to execute the method above is
time: 828 ms, which is 0.828 seconds
I am timing the method's execution using the code below:
long startTime1 = System.nanoTime();
propertyRepo.performanceTest(address); //or "Florida" depending which query I'm testing
long endTime1 = System.nanoTime();
long duration1 = TimeUnit.MILLISECONDS.convert((endTime1 - startTime1), TimeUnit.NANOSECONDS);
System.out.println("time: " + duration1);
Why is query two so much slower when I run it from JDBC compared to when I run it from Adminer? Anything I can do to improve the performance for query two?
EDIT:
I created two PHP scripts containing the two queries respectively. They take about the same amount of time in PHP, so I assume the problem has something to do with JDBC. Below is the result of the PHP scripts. The time PHP takes with Query one is higher than what Java takes, since I am not using any connection pooling, but both queries take pretty much the same amount of time to execute. Something is causing a delay with Query two on JDBC.
EDIT:
When I run the query using a PreparedStatement it's slow, but it's fast when I run it with a plain Statement. I ran EXPLAIN ANALYZE for both cases.
preparedStatement explain analyze
Nested Loop (cost=1.27..315241.91 rows=1 width=97) (actual time=0.091..688.583 rows=1 loops=1)
-> Nested Loop (cost=0.98..315233.61 rows=1 width=107) (actual time=0.079..688.571 rows=1 loops=1)
-> Nested Loop (cost=0.71..315225.26 rows=1 width=120) (actual time=0.069..688.561 rows=1 loops=1)
-> Nested Loop (cost=0.42..315216.95 rows=1 width=127) (actual time=0.057..688.548 rows=1 loops=1)
-> Seq Scan on property (cost=0.00..315208.51 rows=1 width=131) (actual time=0.032..688.522 rows=1 loops=1)
Filter: ((full_address)::text = '139-Skillman-Ave-Apt-5C-Brooklyn-NY-11211'::text)
Rows Removed by Filter: 8790
-> Index Scan using street_pkey on street (cost=0.42..8.44 rows=1 width=28) (actual time=0.019..0.019 rows=1 loops=1)
Index Cond: (id = property.street_id)
-> Index Scan using city_id_pk on city (cost=0.29..8.30 rows=1 width=25) (actual time=0.010..0.010 rows=1 loops=1)
Index Cond: (id = property.city_id)
-> Index Scan using state_id_pk on state (cost=0.28..8.32 rows=1 width=19) (actual time=0.008..0.008 rows=1 loops=1)
Index Cond: (id = property.state_id)
-> Index Scan using zipcode_id_pk on zipcode (cost=0.29..8.30 rows=1 width=22) (actual time=0.010..0.010 rows=1 loops=1)
Index Cond: (id = property.zipcode_id)
Planning Time: 2.400 ms
Execution Time: 688.674 ms
statement explain analyze
Nested Loop (cost=29.82..65.96 rows=1 width=97) (actual time=0.232..0.235 rows=1 loops=1)
-> Nested Loop (cost=29.53..57.65 rows=1 width=107) (actual time=0.220..0.223 rows=1 loops=1)
-> Nested Loop (cost=29.25..49.30 rows=1 width=120) (actual time=0.211..0.213 rows=1 loops=1)
-> Nested Loop (cost=28.97..41.00 rows=1 width=127) (actual time=0.198..0.200 rows=1 loops=1)
-> Bitmap Heap Scan on property (cost=28.54..32.56 rows=1 width=131) (actual time=0.175..0.177 rows=1 loops=1)
Recheck Cond: (full_address = '139-Skillman-Ave-Apt-5C-Brooklyn-NY-11211'::citext)
Heap Blocks: exact=1
-> Bitmap Index Scan on property_full_address (cost=0.00..28.54 rows=1 width=0) (actual time=0.162..0.162 rows=1 loops=1)
Index Cond: (full_address = '139-Skillman-Ave-Apt-5C-Brooklyn-NY-11211'::citext)
-> Index Scan using street_pkey on street (cost=0.42..8.44 rows=1 width=28) (actual time=0.017..0.017 rows=1 loops=1)
Index Cond: (id = property.street_id)
-> Index Scan using city_id_pk on city (cost=0.29..8.30 rows=1 width=25) (actual time=0.010..0.010 rows=1 loops=1)
Index Cond: (id = property.city_id)
-> Index Scan using state_id_pk on state (cost=0.28..8.32 rows=1 width=19) (actual time=0.007..0.007 rows=1 loops=1)
Index Cond: (id = property.state_id)
-> Index Scan using zipcode_id_pk on zipcode (cost=0.29..8.30 rows=1 width=22) (actual time=0.010..0.010 rows=1 loops=1)
Index Cond: (id = property.zipcode_id)
Planning Time: 2.442 ms
Execution Time: 0.345 ms
It's because of the connection pool that is used by the different clients.
You can set up a fast connection pool like HikariCP for JDBC like this:
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

import java.sql.Connection;
import java.sql.SQLException;

public class HikariCPDataSource {

    private static HikariConfig config = new HikariConfig();
    private static HikariDataSource ds;

    static {
        config.setJdbcUrl("jdbc:h2:mem:test");
        config.setUsername("user");
        config.setPassword("password");
        // cache prepared statements on the connection
        config.addDataSourceProperty("cachePrepStmts", "true");
        config.addDataSourceProperty("prepStmtCacheSize", "250");
        config.addDataSourceProperty("prepStmtCacheSqlLimit", "2048");
        ds = new HikariDataSource(config);
    }

    public static Connection getConnection() throws SQLException {
        return ds.getConnection();
    }

    private HikariCPDataSource() {}
}
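For what it's worth, the two plans above differ in the comparison itself: the PreparedStatement plan filters on ((full_address)::text = '...'::text), while the plain statement compares citext values and can use the property_full_address index. You can reproduce that in psql by changing the declared parameter type; a sketch, with table and column names taken from the question:
-- parameter declared as text: full_address gets cast to text and the citext index is skipped
PREPARE by_addr_text(text) AS
SELECT id FROM property WHERE full_address = $1;
EXPLAIN ANALYZE EXECUTE by_addr_text('139-Skillman-Ave-Apt-5C-Brooklyn-NY-11211');

-- parameter declared as citext: the index on full_address can be used
PREPARE by_addr_ci(citext) AS
SELECT id FROM property WHERE full_address = $1;
EXPLAIN ANALYZE EXECUTE by_addr_ci('139-Skillman-Ave-Apt-5C-Brooklyn-NY-11211');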

PostgreSQL 9.6 performance issue due to sequential scan instead of index

I am encountering a performance issue with PostgreSQL 9.6.
I have a table:
CREATE TABLE public.cdrsinfo (
    recordid bigint NOT NULL,
    billid integer,
    CONSTRAINT cdrsinfo_pkey PRIMARY KEY (recordid)
);
And indexes created on it:
CREATE UNIQUE INDEX cdrsinfo_pkey ON cdrsinfo USING btree (recordid);
CREATE INDEX indx_cdrsinfo_billid ON cdrsinfo USING btree (billid);
The table has around 3M records and I have run ANALYZE on it. When running the following queries with EXPLAIN I get some strange results:
SELECT max(recordid) FROM cdrsinfo WHERE billid = 535;
Result (cost=26631.27..26631.28 rows=1 width=8)
InitPlan 1 (returns $0)
-> Limit (cost=0.57..26631.27 rows=1 width=8)
-> Index Scan Backward using cdrsinfo_pkey on cdrsinfo (cost=0.57..2291944283.82 rows=86064 width=8)
Index Cond: (recordid IS NOT NULL)
Filter: (billid = 535)
SELECT max(recordid) FROM cdrsinfo WHERE billid < 535;
Aggregate (cost=725.85..725.86 rows=1 width=8)
-> Index Scan using indx_cdrsinfo_billid on cdrsinfo (cost=0.57..725.37 rows=192 width=8)
Index Cond: (billid < 535)
If I count all the rows that have billid = 535 I get 44. My question is: why doesn't the query planner use indx_cdrsinfo_billid in the first example?
I get huge performance issues because of this: the first SQL takes ~2 hours to complete and the second one ~170 ms.
I forgot to mention a third index that I have on the table:
CREATE INDEX indx_cdrsinfo_billid_recordid ON cdrsinfo USING btree (recordid, billid);
As I mentioned, the table was analyzed before running the query. Now when I execute EXPLAIN with ANALYZE, VERBOSE and BUFFERS, I get a very good time on the same query where billid = 535:
Result (cost=0.85..0.86 rows=1 width=8) (actual time=0.034..0.034 rows=1 loops=1)
Output: $0
Buffers: shared hit=5
InitPlan 1 (returns $0)
-> Limit (cost=0.57..0.85 rows=1 width=8) (actual time=0.031..0.031 rows=1 loops=1)
Output: cdrsinfo.recordid
Buffers: shared hit=5
-> Index Only Scan Backward using indx_cdrsinfo_billid_recordid on public.cdrsinfo (cost=0.57..24041.88 rows=89007 width=8) (actual time=0.022..0.022 rows=1 loops=1)
Output: cdrsinfo.recordid
Index Cond: ((cdrsinfo.billid = 535) AND (cdrsinfo.recordid IS NOT NULL))
Heap Fetches: 0
Buffers: shared hit=5
Planning time: 0.177 ms
Execution time: 0.056 ms
The index was there in the past too; I don't understand why the query planner decided to use it now and not this morning.
Another strange thing: when I got that awful execution time, I tried rewriting the query in several ways to inspect the plan, and when I write it like this:
SELECT max(recordid) FROM cdrsinfo WHERE recordid IN (SELECT recordid FROM cdrsinfo WHERE billid = 535 OFFSET 0)
Because of the OFFSET 0 the query planner used indx_cdrsinfo_billid, the one that I would expect to be used.
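The OFFSET 0 works because it acts as an optimization fence. Another common rewrite for this pitfall, where max() tempts the planner into a backward scan of the whole primary key, is to express the maximum as ORDER BY ... LIMIT 1; a sketch, untested against the real data:
SELECT recordid
FROM cdrsinfo
WHERE billid = 535
ORDER BY recordid DESC
LIMIT 1;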

Slow performance after upgrading PostgreSQL from 9.1 to 9.4

I'm getting extremely slow performance after upgrading Postgres 9.1 to 9.4. Here is an example of two queries which are running significantly more slowly.
Note: I realize that these queries could probably be rewritten to work more efficiently; however, the main thing I'm concerned about is that after upgrading to a newer version of Postgres they are suddenly running 100x more slowly! I'm hoping there's a configuration variable someplace I've overlooked.
While doing the upgrade I used the pg_upgrade command with the --link option. The configuration file is the same between 9.4 and 9.1. It's not running on the exact same hardware, but they're both running on a Linode and I've tried using 3 different Linodes now for the new server, so I don't think this is a hardware issue.
It seems like in both cases, 9.4 is using different indexes than 9.1?
9.1:
EXPLAIN ANALYZE SELECT "id", "title", "timestamp", "parent", "deleted", "sunk", "closed", "sticky", "lastupdate", "views", "oldid", "editedon", "devpost", "hideblue", "totalvotes", "statustag", "forum_category_id", "account_id" FROM "forum_posts" WHERE "parent" = 882269 ORDER BY "timestamp" DESC LIMIT 1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=63.87..63.87 rows=1 width=78) (actual time=0.020..0.020 rows=0 loops=1)
-> Sort (cost=63.87..63.98 rows=45 width=78) (actual time=0.018..0.018 rows=0 loops=1)
Sort Key: "timestamp"
Sort Method: quicksort Memory: 17kB
-> Index Scan using index_forum_posts_parent on forum_posts (cost=0.00..63.65 rows=45 width=78) (actual time=0.013..0.013 rows=0 loops=1)
Index Cond: (parent = 882269)
Total runtime: 0.074 ms
(7 rows)
9.4:
EXPLAIN ANALYZE SELECT "id", "title", "timestamp", "parent", "deleted", "sunk", "closed", "sticky", "lastupdate", "views", "oldid", "editedon", "devpost", "hideblue", "totalvotes", "statustag", "forum_category_id", "account_id" FROM "forum_posts" WHERE "parent" = 882269 ORDER BY "timestamp" DESC LIMIT 1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.42..63.48 rows=1 width=1078) (actual time=920.484..920.484 rows=0 loops=1)
-> Index Scan Backward using forum_posts_timestamp_index on forum_posts (cost=0.42..182622.07 rows=2896 width=1078) (actual time=920.480..920.480 rows=0 loops=1)
Filter: (parent = 882269)
Rows Removed by Filter: 1576382
Planning time: 0.166 ms
Execution time: 920.521 ms
(6 rows)
9.1:
EXPLAIN ANALYZE SELECT "user_library_images"."id", "user_library_images"."imgsrc", "user_library_images"."library_image_id", "user_library_images"."type", "user_library_images"."is_user_uploaded", "user_library_images"."credit", "user_library_images"."orig_dimensions", "user_library_images"."account_id" FROM "user_library_images" INNER JOIN "image_tags" ON "user_library_images"."id" = "image_tags"."user_library_image_id" WHERE ("user_library_images"."account_id" = 769718 AND "image_tags"."tag" ILIKE '%stone%') GROUP BY "user_library_images"."id", "user_library_images"."imgsrc", "user_library_images"."library_image_id", "user_library_images"."type", "user_library_images"."is_user_uploaded", "user_library_images"."credit", "user_library_images"."orig_dimensions", "user_library_images"."account_id" ORDER BY "user_library_images"."id";
Group (cost=2015.46..2015.49 rows=1 width=247) (actual time=0.629..0.652 rows=6 loops=1)
-> Sort (cost=2015.46..2015.47 rows=1 width=247) (actual time=0.626..0.632 rows=6 loops=1)
Sort Key: user_library_images.id, user_library_images.imgsrc, user_library_images.library_image_id, user_library_images.type, user_library_images.is_user_uploaded, user_library_images.credit, user_library_images.orig_dimensions, user_library_images.account_id
Sort Method: quicksort Memory: 19kB
-> Nested Loop (cost=0.00..2015.45 rows=1 width=247) (actual time=0.283..0.603 rows=6 loops=1)
-> Index Scan using index_user_library_images_account on user_library_images (cost=0.00..445.57 rows=285 width=247) (actual time=0.076..0.273 rows=13 loops=1)
Index Cond: (account_id = 769718)
-> Index Scan using index_image_tags_user_library_image on image_tags (cost=0.00..5.50 rows=1 width=4) (actual time=0.020..0.021 rows=0 loops=13)
Index Cond: (user_library_image_id = user_library_images.id)
Filter: (tag ~~* '%stone%'::text)
Total runtime: 0.697 ms
(11 rows)
9.4:
Group (cost=166708.13..166709.46 rows=59 width=1241) (actual time=9677.052..9677.052 rows=0 loops=1)
Group Key: user_library_images.id, user_library_images.imgsrc, user_library_images.library_image_id, user_library_images.type, user_library_images.is_user_uploaded, user_library_images.credit, user_library_images.orig_dimensions, user_library_images.account_id
-> Sort (cost=166708.13..166708.28 rows=59 width=1241) (actual time=9677.049..9677.049 rows=0 loops=1)
Sort Key: user_library_images.id, user_library_images.imgsrc, user_library_images.library_image_id, user_library_images.type, user_library_images.is_user_uploaded, user_library_images.credit, user_library_images.orig_dimensions, user_library_images.account_id
Sort Method: quicksort Memory: 17kB
-> Hash Join (cost=10113.22..166706.39 rows=59 width=1241) (actual time=9677.035..9677.035 rows=0 loops=1)
Hash Cond: (image_tags.user_library_image_id = user_library_images.id)
-> Seq Scan on image_tags (cost=0.00..156488.85 rows=11855 width=4) (actual time=0.301..9592.048 rows=63868 loops=1)
Filter: (tag ~~* '%stone%'::text)
Rows Removed by Filter: 9370406
-> Hash (cost=10045.97..10045.97 rows=5380 width=1241) (actual time=0.047..0.047 rows=4 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 1kB
-> Bitmap Heap Scan on user_library_images (cost=288.12..10045.97 rows=5380 width=1241) (actual time=0.027..0.037 rows=4 loops=1)
Recheck Cond: (account_id = 769718)
Heap Blocks: exact=4
-> Bitmap Index Scan on index_user_library_images_account (cost=0.00..286.78 rows=5380 width=0) (actual time=0.019..0.019 rows=4 loops=1)
Index Cond: (account_id = 769718)
Planning time: 0.223 ms
Execution time: 9677.109 ms
(19 rows)
====
After running the analyze script (see the answer below), the problem was solved. For reference, here's the new ANALYZE output (for 9.4):
Group (cost=2062.82..2062.91 rows=4 width=248) (actual time=8.775..8.801 rows=7 loops=1)
Group Key: user_library_images.id, user_library_images.imgsrc, user_library_images.library_image_id, user_library_images.type, user_library_images.is_user_uploaded, user_library_images.credit, user_library_images.orig_dimensions, user_library_images.account_id
-> Sort (cost=2062.82..2062.83 rows=4 width=248) (actual time=8.771..8.780 rows=7 loops=1)
Sort Key: user_library_images.id, user_library_images.imgsrc, user_library_images.library_image_id, user_library_images.type, user_library_images.is_user_uploaded, user_library_images.credit, user_library_images.orig_dimensions, user_library_images.account_id
Sort Method: quicksort Memory: 19kB
-> Nested Loop (cost=0.87..2062.78 rows=4 width=248) (actual time=4.156..8.685 rows=7 loops=1)
-> Index Scan using index_user_library_images_account on user_library_images (cost=0.43..469.62 rows=304 width=248) (actual time=0.319..2.528 rows=363 loops=1)
Index Cond: (account_id = 769718)
-> Index Scan using index_image_tags_user_library_image on image_tags (cost=0.43..5.23 rows=1 width=4) (actual time=0.014..0.014 rows=0 loops=363)
Index Cond: (user_library_image_id = user_library_images.id)
Filter: (tag ~~* '%stone%'::text)
Rows Removed by Filter: 2
Planning time: 2.956 ms
Execution time: 8.907 ms
(14 rows)
Limit (cost=65.81..65.81 rows=1 width=77) (actual time=0.256..0.256 rows=0 loops=1)
-> Sort (cost=65.81..65.92 rows=47 width=77) (actual time=0.252..0.252 rows=0 loops=1)
Sort Key: "timestamp"
Sort Method: quicksort Memory: 17kB
-> Index Scan using index_forum_posts_parent on forum_posts (cost=0.43..65.57 rows=47 width=77) (actual time=0.211..0.211 rows=0 loops=1)
Index Cond: (parent = 882269)
Planning time: 2.978 ms
Execution time: 0.380 ms
(8 rows)
pg_upgrade does not copy (or migrate) statistics for your database.
So you need to analyze your tables in order to update the statistics in the migrated database. pg_upgrade creates a shell script (batch file on Windows) named analyze_new_cluster that can be used for that.
Alternatively, you can run VACUUM ANALYZE manually to achieve the same thing.
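For example, a minimal sketch using the table names from the plans above:
-- refresh statistics for the whole database, or per table:
ANALYZE;
VACUUM ANALYZE forum_posts;
VACUUM ANALYZE image_tags;
VACUUM ANALYZE user_library_images;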
The missing statistics can be detected by looking at the execution plan: the difference between the expected number of rows and the actual number is too high:
(cost=0.00..286.78 rows=5380 width=0) (actual time=0.019..0.019 rows=4 loops=1)
==> 5380 vs. 4 rows
or
(cost=0.00..156488.85 rows=11855 width=4) (actual time=0.301..9592.048 rows=63868 loops=1)
==> 11855 vs. 63868 rows

PostgreSQL Query not executing or takes too much time

I am using PostgreSQL 9.2 on Windows, and while executing this query:
SELECT
    gtab16.VrId,
    coalesce((select (gtab09.TdAmt + gtab09.CdAmt)
              from gtab09
              where gtab09.JrmId = gtab16.JrmId
              limit 1), 0::money) as DisAmount,
    gtab02.VrName,
    gtab02.Vrlongname,
    gtab16.AcYrId,
    CASE WHEN gtab16.vrid = 28 THEN gtab16.RefNo ELSE gtab16.VrNo END AS VrNo,
    gtab16.RefNo,
    CASE WHEN gtab16.VrId = 10
         THEN cast((select coalesce(PBillDate, null)
                    from gtab09
                    where gtab09.JrmId = gtab16.JrmId) as bpchar)
         ELSE ''
    END AS BillDate,
    cast(gtab16.VrDate as timestamp) AS VrDate,
    gtab16.AgeDate,
    (SELECT coalesce(sum(gtab18.ageamt), 0::money)
     FROM (gtab16 AS A INNER JOIN gtab17 AS B ON A.jrmId = B.JrmId)
     INNER JOIN gtab18 ON B.JrDetId = gtab18.crjrdetid
     WHERE gtab18.drjrdetid = Gtab17.JrDetId
       AND A.AgeDate <= '2014-07-09') AS AgedAmt,
    case when gtab17.dr > 0::money then gtab17.Dr else gtab17.Cr end AS VrAmt,
    gtab17.AcId,
    gtab12.AcName,
    gtab12.AcShortName,
    gtab12.PhoneOff,
    case when gtab17.cr > 0::money then 1 else 0 end AS Receipt,
    gtab47.AreaName,
    gtab16.JrMId,
    gtab17.JrDetId,
    date_part('day', '2014-07-09' - Gtab16.agedate) as DayCount,
    (SELECT coalesce(sum(chqAmt), 0::money)
     FROM gtab19
     WHERE PartyAcId = gtab17.acid AND vrid = 19 AND Pdc = 1) as PDCCheq,
    30 AS Span1,
    60 AS Span2,
    90 AS Span3
FROM (gtab16
      INNER JOIN (gtab17
                  INNER JOIN gtab12 ON gtab17.AcId = gtab12.acid)
          ON gtab16.jrmId = gtab17.JrmId)
INNER JOIN gtab02 ON gtab16.VrId = gtab02.vrId
INNER JOIN gtab47 ON gtab12.AreaId = gtab47.AreaId
WHERE
    gtab16.BranchID = 1
    AND gtab17.Dr > 0::money
    AND case when gtab16.AcYrid = 2 then 1 else gtab16.VrId end <> 6
    AND date_part('day', '2014-07-09' - Gtab16.agedate) >= 0
    AND (gtab12.AcGrCode = '204' OR gtab12.AcGrCode = '103')
    AND gtab47.AreaId IN (7)
    AND date_part('day', '2014-07-09' - gtab16.AgeDate) >= 0
    AND (gtab17.Dr - (SELECT coalesce(sum(gtab18.ageamt), 0::money)
                      FROM (gtab16 AS A
                            INNER JOIN gtab17 AS B ON A.jrmId = B.JrmId)
                      INNER JOIN gtab18 ON B.JrDetId = gtab18.crjrdetid
                      WHERE gtab18.drjrdetid = Gtab17.JrDetId
                        AND A.AgeDate <= '2014-07-09')) > 0::money
    AND gtab16.VrDate BETWEEN '2014-07-01' AND '2014-07-09'
it takes a long time. Here is the EXPLAIN ANALYZE output:
"Nested Loop (cost=0.00..98913858.59 rows=9 width=363) (actual time=302403.378..302628.382 rows=71 loops=1)"
" -> Seq Scan on gtab47 (cost=0.00..1.30 rows=1 width=122) (actual time=0.006..0.010 rows=1 loops=1)"
" Filter: (areaid = 7)"
" Rows Removed by Filter: 23"
" -> Nested Loop (cost=0.00..98908508.69 rows=9 width=249) (actual time=302400.148..302405.795 rows=71 loops=1)"
" Join Filter: (gtab16.vrid = gtab02.vrid)"
" Rows Removed by Join Filter: 3834"
" -> Seq Scan on gtab02 (cost=0.00..1.55 rows=55 width=150) (actual time=0.004..0.052 rows=55 loops=1)"
" -> Materialize (cost=0.00..98908499.74 rows=9 width=103) (actual time=5380.762..5498.218 rows=71 loops=55)"
" -> Nested Loop (cost=0.00..98908499.70 rows=9 width=103) (actual time=295941.855..302398.524 rows=71 loops=1)"
" Join Filter: (gtab17.jrmid = gtab16.jrmid)"
" Rows Removed by Join Filter: 1886191"
" -> Nested Loop (cost=0.00..98897543.98 rows=2015 width=69) (actual time=7.437..299102.826 rows=2037 loops=1)"
" Join Filter: (gtab17.acid = gtab12.acid)"
" Rows Removed by Join Filter: 12893055"
" -> Seq Scan on gtab17 (cost=0.00..98819605.03 rows=29138 width=28) (actual time=2.974..276230.715 rows=68228 loops=1)"
" Filter: ((dr > (0)::money) AND ((dr - (SubPlan 5)) > (0)::money))"
" Rows Removed by Filter: 111761"
" SubPlan 5"
" -> Aggregate (cost=548.98..549.00 rows=1 width=8) (actual time=3.131..3.132 rows=1 loops=88001)"
" -> Nested Loop (cost=0.71..548.98 rows=1 width=8) (actual time=2.707..3.126 rows=0 loops=88001)"
" -> Nested Loop (cost=0.42..548.64 rows=1 width=12) (actual time=2.701..3.119 rows=0 loops=88001)"
" -> Seq Scan on gtab18 gtab18_1 (cost=0.00..540.19 rows=1 width=12) (actual time=2.693..3.109 rows=0 loops=88001)"
" Filter: (drjrdetid = gtab17.jrdetid)"
" Rows Removed by Filter: 28575"
" -> Index Scan using gtab17_pkey on gtab17 b_1 (cost=0.42..8.44 rows=1 width=8) (actual time=0.005..0.006 rows=1 loops=28574)"
" Index Cond: (jrdetid = gtab18_1.crjrdetid)"
" -> Index Scan using gtab16_pkey on gtab16 a_1 (cost=0.29..0.33 rows=1 width=4) (actual time=0.004..0.005 rows=1 loops=28574)"
" Index Cond: (jrmid = b_1.jrmid)"
" Filter: (agedate <= '2014-07-09 00:00:00'::timestamp without time zone)"
" -> Materialize (cost=0.00..140.94 rows=178 width=45) (actual time=0.001..0.160 rows=189 loops=68228)"
" -> Seq Scan on gtab12 (cost=0.00..140.05 rows=178 width=45) (actual time=0.057..0.927 rows=189 loops=1)"
" Filter: ((areaid = 7) AND (((acgrcode)::text = '204'::text) OR ((acgrcode)::text = '103'::text)))"
" Rows Removed by Filter: 2385"
" -> Materialize (cost=0.00..3037.42 rows=262 width=38) (actual time=0.006..0.788 rows=926 loops=2037)"
" -> Seq Scan on gtab16 (cost=0.00..3036.11 rows=262 width=38) (actual time=10.342..13.037 rows=926 loops=1)"
" Filter: ((vrdate >= '2014-07-01 00:00:00'::timestamp without time zone) AND (vrdate <= '2014-07-09 00:00:00'::timestamp without time zone) AND (branchid = 1) AND (CASE WHEN (acyrid = 2) THEN 1 ELSE vrid END <> 6) AND (date_p (...)"
" Rows Removed by Filter: 58837"
" SubPlan 1"
" -> Limit (cost=0.29..8.31 rows=1 width=16) (actual time=0.011..0.012 rows=1 loops=71)"
" -> Index Scan using gtab09_jrmid_idx on gtab09 (cost=0.29..8.31 rows=1 width=16) (actual time=0.007..0.007 rows=1 loops=71)"
" Index Cond: (jrmid = gtab16.jrmid)"
" SubPlan 2"
" -> Index Scan using gtab09_jrmid_idx on gtab09 gtab09_1 (cost=0.29..8.31 rows=1 width=8) (never executed)"
" Index Cond: (jrmid = gtab16.jrmid)"
" SubPlan 3"
" -> Aggregate (cost=548.98..549.00 rows=1 width=8) (actual time=2.975..2.975 rows=1 loops=71)"
" -> Nested Loop (cost=0.71..548.98 rows=1 width=8) (actual time=2.968..2.970 rows=0 loops=71)"
" -> Nested Loop (cost=0.42..548.64 rows=1 width=12) (actual time=2.965..2.966 rows=0 loops=71)"
" -> Seq Scan on gtab18 (cost=0.00..540.19 rows=1 width=12) (actual time=2.959..2.960 rows=0 loops=71)"
" Filter: (drjrdetid = gtab17.jrdetid)"
" Rows Removed by Filter: 28575"
" -> Index Scan using gtab17_pkey on gtab17 b (cost=0.42..8.44 rows=1 width=8) (actual time=0.005..0.006 rows=1 loops=1)"
" Index Cond: (jrdetid = gtab18.crjrdetid)"
" -> Index Scan using gtab16_pkey on gtab16 a (cost=0.29..0.33 rows=1 width=4) (actual time=0.007..0.009 rows=1 loops=1)"
" Index Cond: (jrmid = b.jrmid)"
" Filter: (agedate <= '2014-07-09 00:00:00'::timestamp without time zone)"
" SubPlan 4"
" -> Aggregate (cost=28.63..28.64 rows=1 width=8) (actual time=0.130..0.131 rows=1 loops=71)"
" -> Seq Scan on gtab19 (cost=0.00..28.62 rows=1 width=8) (actual time=0.124..0.124 rows=0 loops=71)"
" Filter: ((partyacid = gtab17.acid) AND (vrid = 19) AND (pdc = 1))"
" Rows Removed by Filter: 607"
"Total runtime: 302628.704 ms"
PostgreSQL server-side config:
max_connections = 10000
max_stack_depth = 2MB
shared_buffers = 1GB
temp_buffers = 1GB
I am new to this, so any suggestions or advice are much appreciated.
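As an aside, those settings look risky on their own: temp_buffers is allocated per session, so 10000 connections could each claim up to 1 GB. A more conventional starting point might be the sketch below (an assumption for a modest dedicated server; tune to your RAM):
max_connections = 100   # use a connection pooler instead of thousands of backends
shared_buffers = 1GB
temp_buffers = 8MB      # per-session allocation, keep it small
work_mem = 16MB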
Paste your EXPLAIN output into http://explain.depesz.com/ and you will see where your query is slow.
But before doing query optimisation, read a tutorial about it so you understand what a seq scan, a nested loop, a cast, and so on actually are, and please try to rewrite your query to make it simpler! I think you can.
Also read what @wildplasser said: Select coalesce(PBillDate, null) from ... is a pointless use of coalesce, since Select PBillDate from ... is exactly the same; don't do unnecessary processing.
To help you further, when I'm in this situation I strip the SELECT list down to a single column and keep only the joins that column needs, then run the query. Re-add the select columns one by one with the matching joins, and as soon as you see the query time grow, run EXPLAIN, paste it into the depesz site to see where the query is slow, and add an index or rewrite that part.
It's very hard to read a big query and optimize it all at once, so add the select columns step by step and you will find your solution.
Also check that your statistics are up to date; if not, your plans will be misestimated. You can run VACUUM ANALYZE to refresh them. And use EXPLAIN ANALYZE instead of plain EXPLAIN, because EXPLAIN ANALYZE actually runs the query and returns much more information about the real cost of each part.
Read a tutorial on EXPLAIN ANALYZE because it has many very useful options ;)
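In this particular plan, most of the time goes into SubPlan 5: the correlated SUM over gtab18 is repeated for every gtab17 row (88001 loops over a seq scan of gtab18). Two changes likely to help, as a hedged sketch with names taken from the query (the index name is made up; untested against the real schema):
CREATE INDEX indx_gtab18_drjrdetid ON gtab18 (drjrdetid);

-- pre-aggregate the aged amounts once, then join,
-- instead of re-running the correlated subquery per row
SELECT g17.jrdetid,
       g17.dr - coalesce(ag.aged, 0::money) AS open_amount
FROM gtab17 AS g17
LEFT JOIN (
    SELECT g18.drjrdetid, sum(g18.ageamt) AS aged
    FROM gtab18 AS g18
    JOIN gtab17 AS b ON b.jrdetid = g18.crjrdetid
    JOIN gtab16 AS a ON a.jrmid = b.jrmid
    WHERE a.agedate <= '2014-07-09'
    GROUP BY g18.drjrdetid
) AS ag ON ag.drjrdetid = g17.jrdetid
WHERE g17.dr > 0::money
  AND g17.dr - coalesce(ag.aged, 0::money) > 0::money;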
Best regards

PostgreSQL query speed is variable

Context
I have a table that keeps netflow data (all packets intercepted by the router).
This table features approximately 5.9 million rows at the moment.
Problem
I am trying a simple query to count the number of packets received by day, which should not take long.
The first time I run it the query takes 88 seconds; after a second run, 33 seconds; then 5 seconds for all subsequent runs.
The main problem is not the speed of the query, but rather that after executing the same query three times it runs nearly 20 times faster.
I understand the concept of caching, but the performance of the very first run makes no sense to me.
Tests
The column that I am using to join (datetime) is of type timestamptz, and is indexed:
CREATE INDEX date ON netflows USING btree (datetime);
Looking at the EXPLAIN statements, the difference in execution is in the Nested Loop.
I have already run VACUUM ANALYZE on the table, with the exact same results.
Current environment
Linux Ubuntu 12.04 VM running on VMware ESX 4.1
PostgreSQL 9.1
VM has 2 GB RAM, 2 cores.
the database server is entirely dedicated to this and is doing nothing else
inserts into the table every minute (100 rows per minute)
very low disk, RAM, or CPU activity
Query
with date_list as (
    select
        series as start_date,
        series + '23:59:59' as end_date
    from generate_series(
            (select min(datetime) from netflows)::date,
            (select max(datetime) from netflows)::date,
            '1 day') as series
)
select
    start_date,
    end_date,
    count(*)
from netflows
inner join date_list on (datetime between start_date and end_date)
group by
    start_date,
    end_date;
Explain of first run (88 seconds)
Sort (cost=27007355.59..27007356.09 rows=200 width=8) (actual time=89647.054..89647.055 rows=18 loops=1)
Sort Key: date_list.start_date
Sort Method: quicksort Memory: 25kB
CTE date_list
-> Function Scan on generate_series series (cost=0.13..12.63 rows=1000 width=8) (actual time=92.567..92.667 rows=19 loops=1)
InitPlan 2 (returns $1)
-> Result (cost=0.05..0.06 rows=1 width=0) (actual time=71.270..71.270 rows=1 loops=1)
InitPlan 1 (returns $0)
-> Limit (cost=0.00..0.05 rows=1 width=8) (actual time=71.259..71.261 rows=1 loops=1)
-> Index Scan using date on netflows (cost=0.00..303662.15 rows=5945591 width=8) (actual time=71.252..71.252 rows=1 loops=1)
Index Cond: (datetime IS NOT NULL)
InitPlan 4 (returns $3)
-> Result (cost=0.05..0.06 rows=1 width=0) (actual time=11.786..11.787 rows=1 loops=1)
InitPlan 3 (returns $2)
-> Limit (cost=0.00..0.05 rows=1 width=8) (actual time=11.778..11.779 rows=1 loops=1)
-> Index Scan Backward using date on netflows (cost=0.00..303662.15 rows=5945591 width=8) (actual time=11.776..11.776 rows=1 loops=1)
Index Cond: (datetime IS NOT NULL)
-> HashAggregate (cost=27007333.31..27007335.31 rows=200 width=8) (actual time=89639.167..89639.179 rows=18 loops=1)
-> Nested Loop (cost=0.00..23704227.20 rows=660621222 width=8) (actual time=92.667..88059.576 rows=5945457 loops=1)
-> CTE Scan on date_list (cost=0.00..20.00 rows=1000 width=16) (actual time=92.578..92.785 rows=19 loops=1)
-> Index Scan using date on netflows (cost=0.00..13794.89 rows=660621 width=8) (actual time=2.438..4571.884 rows=312919 loops=19)
Index Cond: ((datetime >= date_list.start_date) AND (datetime <= date_list.end_date))
Total runtime: 89668.047 ms
EXPLAIN of third run (5 seconds)
Sort (cost=27011357.45..27011357.95 rows=200 width=8) (actual time=5645.031..5645.032 rows=18 loops=1)
Sort Key: date_list.start_date
Sort Method: quicksort Memory: 25kB
CTE date_list
-> Function Scan on generate_series series (cost=0.13..12.63 rows=1000 width=8) (actual time=0.108..0.204 rows=19 loops=1)
InitPlan 2 (returns $1)
-> Result (cost=0.05..0.06 rows=1 width=0) (actual time=0.050..0.050 rows=1 loops=1)
InitPlan 1 (returns $0)
-> Limit (cost=0.00..0.05 rows=1 width=8) (actual time=0.046..0.046 rows=1 loops=1)
-> Index Scan using date on netflows (cost=0.00..303705.14 rows=5946469 width=8) (actual time=0.046..0.046 rows=1 loops=1)
Index Cond: (datetime IS NOT NULL)
InitPlan 4 (returns $3)
-> Result (cost=0.05..0.06 rows=1 width=0) (actual time=0.026..0.026 rows=1 loops=1)
InitPlan 3 (returns $2)
-> Limit (cost=0.00..0.05 rows=1 width=8) (actual time=0.026..0.026 rows=1 loops=1)
-> Index Scan Backward using date on netflows (cost=0.00..303705.14 rows=5946469 width=8) (actual time=0.026..0.026 rows=1 loops=1)
Index Cond: (datetime IS NOT NULL)
-> HashAggregate (cost=27011335.17..27011337.17 rows=200 width=8) (actual time=5645.005..5645.009 rows=18 loops=1)
-> Nested Loop (cost=0.00..23707741.28 rows=660718778 width=8) (actual time=0.134..4176.406 rows=5946329 loops=1)
-> CTE Scan on date_list (cost=0.00..20.00 rows=1000 width=16) (actual time=0.110..0.343 rows=19 loops=1)
-> Index Scan using date on netflows (cost=0.00..13796.94 rows=660719 width=8) (actual time=0.026..164.117 rows=312965 loops=19)
Index Cond: ((datetime >= date_list.start_date) AND (datetime <= date_list.end_date))
Total runtime: 5645.189 ms
If you are doing an INNER JOIN I don't think you need the CTE at all. You can simply write:
select
    datetime::date,
    count(*)
from netflows
group by datetime::date /* or GROUP BY 1 as a Postgres extension */
I don't see why you need the dates table unless you want a LEFT JOIN to get zeroes where appropriate. This will mean one pass through the data.
BTW, I discourage you from using keywords like date and datetime for entities and columns; even when it's legal, it's not worth it.
WITH date_list as (
SELECT t AS start_date
,(t + interval '1d') AS end_date
FROM (
SELECT generate_series((min(datetime))::date
,(max(datetime))::date
,'1d') AS t
FROM netflows
) x
)
SELECT d.start_date
,count(*) AS ct
FROM date_list d
LEFT JOIN netflows n ON n.datetime >= d.start_date
AND n.datetime < d.end_date
GROUP BY d.start_date;
And use a proper name for your index (already hinted at by @Andrew):
CREATE INDEX netflows_date_idx ON netflows (datetime);
Major points
Assuming you want a row for every day of the calendar, as @Andrew already mentioned in his answer, I replaced the JOIN with a LEFT JOIN.
It's much more efficient to grab min() and max() from netflows in one query.
Simplified type casting.
Fixed the date ranges. Your code would fail for timestamps like '2012-12-06 23:59:59.123'.
Tested this on a large table and performance was nice.
As to your original question: undoubtedly caching effects, which are to be expected, especially with limited RAM.
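To confirm the caching explanation, compare buffer activity between a cold and a warm run; a quick sketch against the netflows table (any date range will do):
-- "shared read" blocks dominate a cold run; repeats show mostly "shared hit"
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM netflows
WHERE datetime >= '2012-12-01'
  AND datetime <  '2012-12-02';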
