Postgres Hash Join speed [closed] - performance

I have three tables that I wish to join using inner joins in Postgres 9.1: reads, devices, and patient_devices. Below is an abbreviated schema for each of the tables involved.
reads -- ~250,000 rows
CREATE TABLE reads
(
id serial NOT NULL,
device_id integer NOT NULL,
value bigint NOT NULL,
read_datetime timestamp without time zone NOT NULL,
created_at timestamp without time zone NOT NULL,
updated_at timestamp without time zone NOT NULL,
CONSTRAINT reads_pkey PRIMARY KEY (id )
)
WITH (
OIDS=FALSE
);
ALTER TABLE reads
OWNER TO postgres;
CREATE INDEX index_reads_on_device_id
ON reads
USING btree
(device_id );
CREATE INDEX index_reads_on_read_datetime
ON reads
USING btree
(read_datetime );
devices -- ~500 rows
CREATE TABLE devices
(
id serial NOT NULL,
serial_number character varying(20) NOT NULL,
created_at timestamp without time zone NOT NULL,
updated_at timestamp without time zone NOT NULL,
CONSTRAINT devices_pkey PRIMARY KEY (id )
)
WITH (
OIDS=FALSE
);
ALTER TABLE devices
OWNER TO postgres;
CREATE UNIQUE INDEX index_devices_on_serial_number
ON devices
USING btree
(serial_number COLLATE pg_catalog."default" );
patient_devices -- ~25,000 rows
CREATE TABLE patient_devices
(
id serial NOT NULL,
patient_id integer NOT NULL,
device_id integer NOT NULL,
issuance_datetime timestamp without time zone NOT NULL,
unassignment_datetime timestamp without time zone,
created_at timestamp without time zone NOT NULL,
updated_at timestamp without time zone NOT NULL,
CONSTRAINT patient_devices_pkey PRIMARY KEY (id )
)
WITH (
OIDS=FALSE
);
ALTER TABLE patient_devices
OWNER TO postgres;
CREATE INDEX index_patient_devices_on_device_id
ON patient_devices
USING btree
(device_id );
CREATE INDEX index_patient_devices_on_issuance_datetime
ON patient_devices
USING btree
(issuance_datetime );
CREATE INDEX index_patient_devices_on_patient_id
ON patient_devices
USING btree
(patient_id );
CREATE INDEX index_patient_devices_on_unassignment_datetime
ON patient_devices
USING btree
(unassignment_datetime );
patients -- ~1,000 rows
CREATE TABLE patients
(
id serial NOT NULL,
first_name character varying(50) NOT NULL,
middle_name character varying(50),
last_name character varying(50) NOT NULL,
created_at timestamp without time zone NOT NULL,
updated_at timestamp without time zone NOT NULL,
CONSTRAINT participants_pkey PRIMARY KEY (id )
)
WITH (
OIDS=FALSE
);
ALTER TABLE patients
OWNER TO postgres;
Here is my abbreviated query.
SELECT patient_devices.patient_id, serial_number FROM reads
INNER JOIN devices ON devices.id = reads.device_id
INNER JOIN patient_devices ON patient_devices.device_id = devices.id
WHERE (reads.read_datetime BETWEEN '2012-01-01 10:30:01.000000' AND '2013-05-18 03:03:42')
AND (read_datetime > issuance_datetime) AND ((unassignment_datetime IS NOT NULL AND read_datetime < unassignment_datetime) OR
(unassignment_datetime IS NULL))
GROUP BY serial_number, patient_devices.patient_id LIMIT 10
Ultimately this will be a small part of a larger query (which has no LIMIT; I only added the LIMIT here to prove to myself that the long runtime was not due to returning a lot of rows). I've done a lot of experimenting and determined that this is the slow part of the larger query. When I run EXPLAIN ANALYZE on this query I get the following output (also viewable here)
Limit (cost=156442.31..156442.41 rows=10 width=13) (actual time=2815.435..2815.441 rows=10 loops=1)
-> HashAggregate (cost=156442.31..159114.89 rows=267258 width=13) (actual time=2815.432..2815.437 rows=10 loops=1)
-> Hash Join (cost=1157.78..151455.79 rows=997304 width=13) (actual time=30.930..2739.164 rows=250150 loops=1)
Hash Cond: (reads.device_id = devices.id)
Join Filter: ((reads.read_datetime > patient_devices.issuance_datetime) AND (((patient_devices.unassignment_datetime IS NOT NULL) AND (reads.read_datetime < patient_devices.unassignment_datetime)) OR (patient_devices.unassignment_datetime IS NULL)))
-> Seq Scan on reads (cost=0.00..7236.94 rows=255396 width=12) (actual time=0.035..64.433 rows=255450 loops=1)
Filter: ((read_datetime >= '2012-01-01 10:30:01'::timestamp without time zone) AND (read_datetime <= '2013-05-18 03:03:42'::timestamp without time zone))
-> Hash (cost=900.78..900.78 rows=20560 width=37) (actual time=30.830..30.830 rows=25015 loops=1)
Buckets: 4096 Batches: 1 Memory Usage: 1755kB
-> Hash Join (cost=19.90..900.78 rows=20560 width=37) (actual time=0.776..20.551 rows=25015 loops=1)
Hash Cond: (patient_devices.device_id = devices.id)
-> Seq Scan on patient_devices (cost=0.00..581.93 rows=24893 width=24) (actual time=0.014..7.867 rows=25545 loops=1)
Filter: ((unassignment_datetime IS NOT NULL) OR (unassignment_datetime IS NULL))
-> Hash (cost=13.61..13.61 rows=503 width=13) (actual time=0.737..0.737 rows=503 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 24kB
-> Seq Scan on devices (cost=0.00..13.61 rows=503 width=13) (actual time=0.016..0.466 rows=503 loops=1)
Filter: (entity_id = 2)
Total runtime: 2820.392 ms
My question is: how do I speed this up? Right now I'm running this on my Windows machine for testing, but ultimately it will be deployed on Ubuntu; will that make a difference? Any insight into why this takes nearly 3 seconds would be greatly appreciated.
Thanks
It has been suggested that the LIMIT might be altering the query plan. Here is the same query without the LIMIT. The slow part still appears to be the Hash Join.
Also, here are the relevant tuning parameters. Again, I'm only testing this on Windows for now, and I don't know what effect this would have on a Linux machine.
shared_buffers = 2GB
effective_cache_size = 4GB
work_mem = 256MB
random_page_cost = 2.0
Here are the statistics for the reads table
Statistic Value
Sequential Scans 130
Sequential Tuples Read 28865850
Index Scans 283630
Index Tuples Fetched 141421907
Tuples Inserted 255450
Tuples Updated 0
Tuples Deleted 0
Tuples HOT Updated 0
Live Tuples 255450
Dead Tuples 0
Heap Blocks Read 20441
Heap Blocks Hit 3493033
Index Blocks Read 8824
Index Blocks Hit 4840210
Toast Blocks Read
Toast Blocks Hit
Toast Index Blocks Read
Toast Index Blocks Hit
Last Vacuum 2013-05-20 09:23:03.782-07
Last Autovacuum
Last Analyze 2013-05-20 09:23:03.91-07
Last Autoanalyze 2013-05-17 19:01:44.075-07
Vacuum counter 1
Autovacuum counter 0
Analyze counter 1
Autoanalyze counter 6
Table Size 27 MB
Toast Table Size none
Indexes Size 34 MB
Here are the statistics for the devices table
Statistic Value
Sequential Scans 119
Sequential Tuples Read 63336
Index Scans 1053935
Index Tuples Fetched 1053693
Tuples Inserted 609
Tuples Updated 0
Tuples Deleted 0
Tuples HOT Updated 0
Live Tuples 609
Dead Tuples 0
Heap Blocks Read 32
Heap Blocks Hit 1054553
Index Blocks Read 32
Index Blocks Hit 2114305
Toast Blocks Read
Toast Blocks Hit
Toast Index Blocks Read
Toast Index Blocks Hit
Last Vacuum
Last Autovacuum
Last Analyze
Last Autoanalyze 2013-05-17 19:02:49.692-07
Vacuum counter 0
Autovacuum counter 0
Analyze counter 0
Autoanalyze counter 2
Table Size 48 kB
Toast Table Size none
Indexes Size 128 kB
Here are the statistics for the patient_devices table
Statistic Value
Sequential Scans 137
Sequential Tuples Read 3065400
Index Scans 853990
Index Tuples Fetched 46143763
Tuples Inserted 25545
Tuples Updated 24936
Tuples Deleted 0
Tuples HOT Updated 0
Live Tuples 25547
Dead Tuples 929
Heap Blocks Read 1959
Heap Blocks Hit 6099617
Index Blocks Read 1077
Index Blocks Hit 2462681
Toast Blocks Read
Toast Blocks Hit
Toast Index Blocks Read
Toast Index Blocks Hit
Last Vacuum
Last Autovacuum 2013-05-17 19:01:44.576-07
Last Analyze
Last Autoanalyze 2013-05-17 19:01:44.697-07
Vacuum counter 0
Autovacuum counter 6
Analyze counter 0
Autoanalyze counter 6
Table Size 2624 kB
Toast Table Size none
Indexes Size 5312 kB
As suggested, I added new unique indexes on (device_id, issuance_datetime) and (patient_id, unassignment_datetime); a sketch of those definitions follows, and below that is the full query that I'm trying to speed up. The smaller query is indeed faster, but I was unable to make the full query faster.
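A hedged sketch of those index definitions (the index names here are my own, not from the original post):
CREATE UNIQUE INDEX index_patient_devices_on_device_id_issuance
    ON patient_devices (device_id, issuance_datetime);
CREATE UNIQUE INDEX index_patient_devices_on_patient_id_unassignment
    ON patient_devices (patient_id, unassignment_datetime);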
SELECT
first_name
, last_name
, MAX(max_read) AS read_datetime
, SUM(value) AS value
, serial_number
FROM (
SELECT
pa.first_name
, pa.last_name
, value
, first_value(de.serial_number) OVER(PARTITION BY pa.id ORDER BY re.read_datetime DESC) AS serial_number -- I'm not sure if this is a good way to do this, but I don't know of another way
, re.read_datetime
, MAX(re.read_datetime) OVER (PARTITION BY pd.id) AS max_read
FROM reads re
INNER JOIN devices de ON de.id = re.device_id
INNER JOIN patient_devices pd ON pd.device_id = de.id
AND re.read_datetime >= pd.issuance_datetime
AND re.read_datetime < COALESCE(pd.unassignment_datetime , 'infinity'::timestamp)
INNER JOIN patients pa ON pa.id = pd.patient_id
WHERE re.read_datetime BETWEEN '2012-01-01 10:30:01' AND '2013-05-18 03:03:42'
) AS foo WHERE read_datetime = max_read
GROUP BY first_name, last_name, serial_number ORDER BY value desc
LIMIT 10
Sorry for not posting this earlier, but I thought this query would be too complicated and was trying to simplify the problem; apparently I still can't figure it out. It seems like it would be a LOT quicker if I could limit the results returned by the nested select using the max_read value, but according to numerous sources that isn't allowed in Postgres.
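One possible workaround (a hedged sketch based on the schema above, not something taken from the answers below): DISTINCT ON can pick the latest read per assignment directly, replacing the window function plus the outer read_datetime = max_read filter; the per-patient SUM and ORDER BY would then be done in an outer query around this.
SELECT DISTINCT ON (pd.id)
       pd.id AS patient_device_id
     , pa.first_name
     , pa.last_name
     , de.serial_number
     , re.read_datetime
     , re.value
FROM reads re
INNER JOIN devices de ON de.id = re.device_id
INNER JOIN patient_devices pd ON pd.device_id = de.id
    AND re.read_datetime >= pd.issuance_datetime
    AND re.read_datetime < COALESCE(pd.unassignment_datetime, 'infinity'::timestamp)
INNER JOIN patients pa ON pa.id = pd.patient_id
WHERE re.read_datetime BETWEEN '2012-01-01 10:30:01' AND '2013-05-18 03:03:42'
ORDER BY pd.id, re.read_datetime DESC;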

FYI: sanitised query:
SELECT pd.patient_id
, de.serial_number
FROM reads re
INNER JOIN devices de ON de.id = re.device_id
INNER JOIN patient_devices pd ON pd.device_id = de.id
AND re.read_datetime >= pd.issuance_datetime -- changed this from '>' to '>='
AND (re.read_datetime < pd.unissuance_datetime OR pd.unissuance_datetime IS NULL)
WHERE re.read_datetime BETWEEN '2012-01-01 10:30:01.000000' AND '2013-05-18 03:03:42'
GROUP BY de.serial_number, pd.patient_id
LIMIT 10
;
UPDATE: without the original typos:
EXPLAIN ANALYZE
SELECT pd.patient_id
, de.serial_number
FROM reads re
INNER JOIN devices de ON de.id = re.device_id
INNER JOIN patient_devices pd ON pd.device_id = de.id
AND re.read_datetime >= pd.issuance_datetime
AND (re.read_datetime < pd.unassignment_datetime OR pd.unassignment_datetime IS NULL)
WHERE re.read_datetime BETWEEN '2012-01-01 10:30:01.000000' AND '2013-05-18 03:03:42'
GROUP BY de.serial_number, pd.patient_id
LIMIT 10
;
UPDATE: this is about 6 times as fast here (on synthetic data, and with a slightly altered data model)
-- Modified data model + synthetic data:
CREATE TABLE devices
( id serial NOT NULL
, serial_number character varying(20) NOT NULL
-- , created_at timestamp without time zone NOT NULL
-- , updated_at timestamp without time zone NOT NULL
, CONSTRAINT devices_pkey PRIMARY KEY (id )
, UNIQUE (serial_number)
) ;
CREATE TABLE reads
-- ( id serial NOT NULL PRIMARY KEY -- You don't need this surrogate key
( device_id integer NOT NULL REFERENCES devices (id)
, value bigint NOT NULL
, read_datetime timestamp without time zone NOT NULL
-- , created_at timestamp without time zone NOT NULL
-- , updated_at timestamp without time zone NOT NULL
, PRIMARY KEY ( device_id, read_datetime)
) ;
CREATE TABLE patient_devices
-- ( id serial NOT NULL PRIMARY KEY -- You don't need this surrogate key
( patient_id integer NOT NULL -- REFERENCES patients (id)
, device_id integer NOT NULL REFERENCES devices(id)
, issuance_datetime timestamp without time zone NOT NULL
, unassignment_datetime timestamp without time zone
-- , created_at timestamp without time zone NOT NULL
-- , updated_at timestamp without time zone NOT NULL
, PRIMARY KEY (device_id, issuance_datetime)
, UNIQUE (device_id, unassignment_datetime)
) ;
-- CREATE INDEX index_patient_devices_on_issuance_datetime ON patient_devices (device_id, unassignment_datetime );
-- may need some additional indices later
-- devices -- ~500 rows
INSERT INTO devices(serial_number) SELECT 'No_' || gs::text FROM generate_series(1,500) gs;
-- reads -- ~100K rows
INSERT INTO reads(device_id, read_datetime, value)
SELECT de.id, gs
, (random()*1000000)::bigint
FROM devices de
JOIN generate_series('2012-01-01', '2013-05-01' , '1 hour' ::interval) gs
ON random() < 0.02;
-- patient_devices -- ~25,000 rows
INSERT INTO patient_devices(device_id, issuance_datetime, patient_id)
SELECT DISTINCT ON (re.device_id, read_datetime)
re.device_id, read_datetime, pa
FROM generate_series(1,100) pa
JOIN reads re
ON random() < 0.01;
-- close the open intervals
UPDATE patient_devices dst
SET unassignment_datetime = src.issuance_datetime
FROM patient_devices src
WHERE src.device_id = dst.device_id
AND src.issuance_datetime > dst.issuance_datetime
AND NOT EXISTS ( SELECT *
FROM patient_devices nx
WHERE nx.device_id = src.device_id
AND nx.issuance_datetime > dst.issuance_datetime
AND nx.issuance_datetime < src.issuance_datetime
)
;
VACUUM ANALYZE patient_devices;
VACUUM ANALYZE devices;
VACUUM ANALYZE reads;
-- EXPLAIN ANALYZE
SELECT pd.patient_id
, de.serial_number
--, COUNT (*) AS zcount
FROM reads re
INNER JOIN devices de ON de.id = re.device_id
INNER JOIN patient_devices pd ON pd.device_id = de.id
AND re.read_datetime >= pd.issuance_datetime
AND re.read_datetime < COALESCE(pd.unassignment_datetime , 'infinity'::timestamp)
WHERE re.read_datetime BETWEEN '2012-01-01 10:30:01' AND '2013-05-18 03:03:42'
GROUP BY de.serial_number, pd.patient_id
LIMIT 10
;

Look at the parts of the EXPLAIN ANALYZE report where you see Seq Scan.
For example, these parts could use some indexes:
Seq Scan on patient_devices -> unassignment_datetime
Seq Scan on devices -> entity_id
Seq Scan on reads -> read_datetime
About read_datetime: it is possible to create an index tailored to range comparisons like > and <=, which would come in handy. I don't know the syntax for it off-hand, though.
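A hedged sketch of what such indexes could look like (note that reads already has a plain btree on read_datetime, and entity_id only appears in the plan's filter, not in the abbreviated schema above, so treat these as illustrations only):
-- column referenced by the plan's "Filter: (entity_id = 2)" on devices
CREATE INDEX index_devices_on_entity_id ON devices (entity_id);
-- for range predicates such as > and <= a plain btree already applies;
-- a partial index can shrink it when most queries share the same cutoff
CREATE INDEX index_reads_on_recent_read_datetime
    ON reads (read_datetime)
    WHERE read_datetime >= '2012-01-01';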

Related

How can I improve performance of SQLite with a large table?

SQLite DB with a single table and 60,000,000 records. Time to run a simple query is more than 100 seconds.
I've tried to switch to PostgreSQL but its performance was even worse.
I haven't tested it on MySQL or MSSQL.
Should I split the table (let's say a different table for each pointID, as there are a few hundred of them? Or a different table for each month, so I'd have a maximum of 10,000,000 records per table)?
SQL schema:
CREATE TABLE `collectedData` (
`id` INTEGER,
`timeStamp` double,
`timeDateStr` nvarchar,
`pointID` nvarchar,
`pointIDindex` double,
`trendNumber` integer,
`status` nvarchar,
`value` double,
PRIMARY KEY(`id`)
);
CREATE INDEX `idx_pointID` ON `collectedData` (
`pointID`
);
CREATE INDEX `idx_pointIDindex` ON `collectedData` (
`pointIDindex`
);
CREATE INDEX `idx_timeStamp` ON `collectedData` (
`timeStamp`
);
CREATE INDEX `idx_trendNumber` ON `collectedData` (
`trendNumber`
);
The next query took 107 seconds:
select * from collectedData
where
trendNumber =1
and status <> '' and
timestamp <=1556793244
and pointid in ('point1','point2','pont3','point4','point5','point6','point7','point8','point9','pointa')
and pointIDindex % 1 = 0
order by timestamp desc, id desc limit 5000
The next query took 150 seconds (fewer conditions):
select * from collectedData
where
trendNumber =1
and status <> '' and
timestamp <=1556793244
and pointIDindex % 1 = 0
order by timestamp desc, id desc limit 5000
Edit:
An answer from elsewhere suggested adding the following index:
CREATE INDEX idx_All ON collectedData (trendNumber, pointid, pointIDindex, status, timestamp desc, id desc, timeDateStr, value)
This improved performance by a factor of 3.
Edit #2: following @Raymond Nijland's suggestion, here is the execution plan:
SEARCH TABLE collectedData USING COVERING INDEX idx_All (trendNumber=? AND pointID=?)"
"0" "0" "0" "EXECUTE LIST SUBQUERY 1"
"0" "0" "0" "USE TEMP B-TREE FOR ORDER BY"
Thanks to him, and using this information, I changed the query to the following:
select * from (
select * from collectedData
where
trendNumber =1
and status <> '' and
timestamp <=1556793244
and pointid in ('point1','point2','pont3','point4','point5','point6','point7','point8','point9','pointa')
and pointIDindex % 1 = 0
order by id desc limit 5000
) order by timestamp desc
This made a big improvement (for me, it's solved).
After @RaymondNijland suggested checking the execution plan, I changed the query to:
select * from (
select * from collectedData
where
trendNumber =1
and status <> '' and
timestamp <=1556793244
and pointid in ('point1','point2','pont3','point4','point5','point6','point7','point8','point9','pointa')
and pointIDindex % 1 = 0
order by id desc limit 5000
) order by timestamp desc
This query gives the same results as the other one, but it's 120 times faster (it reduces the number of records before sorting).

PostgreSQL HOT updates not working on a table

I have a table:
CREATE TABLE my_table
(
id bigint NOT NULL,
data1 character varying(255),
data2 character varying(100000),
double1 double precision,
double2 double precision,
id2 bigint
);
With an index on id2 (id2 is a foreign key).
And I have a query:
update my_table set double2 = :param where id2 = :id2;
This query uses the index on id2, but it runs very slowly.
I expected my query to use HOT updates, but that does not seem to be the case.
I checked HOT updates by query:
SELECT pg_stat_get_xact_tuples_hot_updated('my_table'::regclass::oid);
and it always returns zero.
What am I doing wrong? How can I speed up my update query?
The Postgres version is 9.4.11.
Update: execution plan for the update:
Update on my_table (cost=0.56..97681.01 rows=34633 width=90) (actual time=42082.915..42082.915 rows=0 loops=1)
-> Index Scan using my_index on my_table (cost=0.56..97681.01 rows=34633 width=90) (actual time=0.110..330.563 rows=97128 loops=1)
Output: id, data1, data2, 0.5::double precision, double1, id2, ctid
Index Cond: (my_table.id2 = 379262689897216::bigint)
Planning time: 1.246 ms
Execution time: 42082.986 ms
The requirements for HOT updates are:
that you're updating only fields that aren't used in any indexes
that the page that contains the row you're updating has extra space in it (fillfactor should be less than 100; see the sketch after this list)
which, based on your comments, you seem to be doing.
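A hedged sketch of how the fillfactor part could be set (the value 90 is an arbitrary illustration, not a recommendation from this answer):
ALTER TABLE my_table SET (fillfactor = 90);
-- the new fillfactor only applies to pages written afterwards;
-- a table rewrite (e.g. VACUUM FULL) is needed for existing pages to gain free space
VACUUM FULL my_table;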
But one thing I noticed is that you said you're using pg_stat_get_xact_tuples_hot_updated to check if HOT updates are happening; be aware that this function returns only the number of HOT-updated rows in the current transaction, not from all time. My guess is HOT updates are happening, but you used the wrong function to detect them. If instead you use
SELECT pg_stat_get_tuples_hot_updated('my_table'::regclass::oid);
you can get the total number of HOT-updated rows for all time.
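As a hedged alternative (not part of the original answer), the same cumulative counter is also exposed in the pg_stat_user_tables view next to the total update count, which makes the HOT ratio easy to see:
SELECT relname
     , n_tup_upd     AS total_updates
     , n_tup_hot_upd AS hot_updates
FROM pg_stat_user_tables
WHERE relname = 'my_table';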

How to define if table is a good candidate for a clustered columnstore index?

I have read (here, here and here) about the clustered columnstore indexes introduced in SQL Server 2014. Basically, now:
Column store indexes can be updatable
Table schema can be modified (without dropping the column store indexes)
Structure of the base table can be columnar
Space is saved by compression (with a column store index, you can save between 40 and 50 percent of the initial space used for the table)
In addition, they support:
Row mode and Batch mode processing
BULK INSERT statement
More data types
As I understand it, there are some restrictions, such as:
Unsupported data types
Other indexes cannot be created
But it is said that:
With a clustered column store index, all filter possibilities are already covered; the Query Processor, using Segment Elimination, will be able to consider only the segments required by the query clauses. On the columns where it cannot apply Segment Elimination, all scans will be faster than B-Tree index scans because the data are compressed, so fewer I/O operations will be required.
I am interested in the following:
Does the statement above say that a clustered column store index is always better for extracting data than a B-Tree index when a lot of duplicated values exist?
What about the performance of a clustered column store index versus a non-clustered covering B-Tree index when, for example, the table has many columns?
Can I have a combination of clustered and non-clustered columnstore indexes on one table?
And most importantly, can anyone tell me how to determine whether a table is a good candidate for a columnstore index?
It is said that the best candidates are tables for which update/delete/insert operations are not performed often. For example, I have a table with a storage size above 17 GB (about 70 million rows) where new records are inserted and deleted constantly, and a lot of queries use its columns. Or I have a table with a storage size of about 40 GB (about 60 million rows) with many inserts performed each day; it is not queried often, but I want to reduce its size.
I know the answer is mostly in running production tests but before that I need to pick the better candidates.
One of the most important restrictions for Clustered Columnstore indexes is their locking; you can find some details here: http://www.nikoport.com/2013/07/07/clustered-columnstore-indexes-part-8-locking/
Regarding your questions:
1) Does the statement above say that a clustered column store index is always better for extracting data than a B-Tree index when a lot of duplicated values exist?
Not only are duplicates scanned faster in Batch Mode, but the reading mechanisms for Columnstore Indexes are also more effective when reading all data out of a Segment.
2) What about the performance of a clustered column store index versus a non-clustered covering B-Tree index when the table has many columns?
A Columnstore Index has significantly better compression than the Page or Row compression available for the row store; Batch Mode makes the biggest difference on the processing side; and, as already mentioned, even reading equally-sized pages and extents should be faster for Columnstore Indexes.
3) Can I have a combination of clustered and non-clustered columnstore indexes on one table?
No, at the moment this is impossible.
4) ... can anyone tell how to determine whether a table is a good candidate for a columnstore index?
Any table that you scan and process in large amounts (over 1 million rows), or maybe even a table with over 100K rows that is scanned entirely, might be a candidate to consider.
There are some restrictions on the technologies used with a table where you want to build a Clustered Columnstore index; here is a query that I am using:
select object_schema_name( t.object_id ) as 'Schema'
, object_name (t.object_id) as 'Table'
, sum(p.rows) as 'Row Count'
, cast( sum(a.total_pages) * 8.0 / 1024. / 1024
as decimal(16,3)) as 'size in GB'
, (select count(*) from sys.columns as col
where t.object_id = col.object_id ) as 'Cols Count'
, (select count(*)
from sys.columns as col
join sys.types as tp
on col.system_type_id = tp.system_type_id
where t.object_id = col.object_id and
UPPER(tp.name) in ('VARCHAR','NVARCHAR')
) as 'String Columns'
, (select sum(col.max_length)
from sys.columns as col
join sys.types as tp
on col.system_type_id = tp.system_type_id
where t.object_id = col.object_id
) as 'Cols Max Length'
, (select count(*)
from sys.columns as col
join sys.types as tp
on col.system_type_id = tp.system_type_id
where t.object_id = col.object_id and
(UPPER(tp.name) in ('TEXT','NTEXT','TIMESTAMP','HIERARCHYID','SQL_VARIANT','XML','GEOGRAPHY','GEOMETRY') OR
(UPPER(tp.name) in ('VARCHAR','NVARCHAR') and (col.max_length = 8000 or col.max_length = -1))
)
) as 'Unsupported Columns'
, (select count(*)
from sys.objects
where type = 'PK' AND parent_object_id = t.object_id ) as 'Primary Key'
, (select count(*)
from sys.objects
where type = 'F' AND parent_object_id = t.object_id ) as 'Foreign Keys'
, (select count(*)
from sys.objects
where type in ('UQ','D','C') AND parent_object_id = t.object_id ) as 'Constraints'
, (select count(*)
from sys.objects
where type in ('TA','TR') AND parent_object_id = t.object_id ) as 'Triggers'
, t.is_tracked_by_cdc as 'CDC'
, t.is_memory_optimized as 'Hekaton'
, t.is_replicated as 'Replication'
, coalesce(t.filestream_data_space_id,0,1) as 'FileStream'
, t.is_filetable as 'FileTable'
from sys.tables t
inner join sys.partitions as p
ON t.object_id = p.object_id
INNER JOIN sys.allocation_units as a
ON p.partition_id = a.container_id
where p.data_compression in (0,1,2) -- None, Row, Page
group by t.object_id, t.is_tracked_by_cdc, t.is_memory_optimized, t.is_filetable, t.is_replicated, t.filestream_data_space_id
having sum(p.rows) > 1000000
order by sum(p.rows) desc
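If the query above flags a table as a candidate, creating the index itself is a single statement (a hedged T-SQL illustration; dbo.FactSales is a hypothetical table name, not one from the question):
CREATE CLUSTERED COLUMNSTORE INDEX cci_FactSales ON dbo.FactSales;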

Slow query time in postgresql when UTC milliseconds is stored as bigint

We are migrating from a time series database (ECHO historian) to an open source database, basically due to the price factor. Our choice was PostgreSQL, as there is no open source time series database. What we used to store in ECHO was just time and value pairs.
Now here is the problem. The table that I created in Postgres consists of 2 columns. The first is of "bigint" type to store the time in UTC milliseconds (a 13 digit number) and the second is the value, whose data type is set to "real". I have filled up around 3.6 million rows of data (spread across a time range of 30 days), and when I query for a small time range (say 1 day) the query takes 4 seconds, but for the same time range in ECHO the response time is 150 milliseconds!
This is a huge difference. Having a bigint for time seems to be the reason for the slowness, but I'm not sure. Could you please suggest how the query time can be improved?
I have also read about the data types "timestamp" and "timestamptz", and it looks like we would need to store the date and time in a regular format rather than UTC milliseconds. Can this help to speed up my query time?
Here is my table definition :
Table "public. MFC2 Flow_LCL "
Column | Type | Modifiers | Storage | Stats target | Description
----------+--------+-----------+---------+--------------+-------------
the_time | bigint | | plain | |
value | real | | plain | |
Indexes:
"MFC2 Flow_LCL _time_idx" btree (the_time)
Has OIDs: no
Currently I am storing the time in UTC milliseconds (using bigint). The challenge here is that there could be duplicate time-value pairs.
This is the query I am using (called through a simple API which is passed a table name, start time and end time):
PGresult *res;
int rec_count;
std::string sSQL;
sSQL.append("SELECT * FROM ");
sSQL.append(" \" ");
sSQL.append(table);
sSQL.append(" \" ");
sSQL.append(" WHERE");
sSQL.append(" time >= ");
CString sTime;
sTime.Format("%I64d",startTime);
sSQL.append(sTime);
sSQL.append(" AND time <= ");
CString eTime;
eTime.Format("%I64d",endTime);
sSQL.append(eTime);
sSQL.append(" ORDER BY time ");
res = PQexec(conn, sSQL.c_str());
Your time series database, if it works like a competitor I examined once, stores data in the order of the "time" column automatically in a heap-like structure. Postgres does not. As a result, you are doing an O(n) search [n=number of rows in table]: the entire table must be read to look for rows matching your time filter. A Primary Key on the timestamp (which creates a unique index) or, if timestamps are not unique, a regular index will give you binary O(log n) searches for single records and improved performance for all queries retrieving less than about 5% of the table. Postgres will estimate the crossover point between where an index scan or a full table scan is better.
You probably also want to CLUSTER (PG Docs) the table on that index.
Also, follow the advice above not to use time or other SQL reserved words as column names. Even when it is legal, it's asking for trouble.
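A minimal sketch of that advice, assuming a simplified table named readings in place of the question's quoted table name (the names here are hypothetical):
CREATE TABLE readings
( the_time timestamp NOT NULL
, value real NOT NULL
);
CREATE INDEX readings_time_idx ON readings (the_time);
-- physically reorder the table by time so a narrow time-range scan touches few pages,
-- then refresh planner statistics
CLUSTER readings USING readings_time_idx;
ANALYZE readings;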
[This would be better as a comment, but it is too long for that.]
Are you really planning for the year 2038 problem already? Why not just use an int for time as in standard UNIX?
SET search_path=tmp;
-- -------------------------------------------
-- create table and populate it with 10M rows
-- -------------------------------------------
DROP SCHEMA tmp CASCADE;
CREATE SCHEMA tmp ;
SET search_path=tmp;
CREATE TABLE old_echo
( the_time timestamp NOT NULL PRIMARY KEY
, payload DOUBLE PRECISION NOT NULL
);
INSERT INTO old_echo (the_time, payload)
SELECT now() - (gs * interval '1 msec')
, random()
FROM generate_series(1,10000000) gs
;
-- DELETE FROM old_echo WHERE random() < 0.8;
VACUUM ANALYZE old_echo;
SELECT MIN(the_time) AS first
, MAX(the_time) AS last
, (MAX(the_time) - MIN(the_time))::interval AS width
FROM old_echo
;
EXPLAIN ANALYZE
SELECT *
FROM old_echo oe
JOIN (
SELECT MIN(the_time) AS first
, MAX(the_time) AS last
, (MAX(the_time) - MIN(the_time))::interval AS width
, ((MAX(the_time) - MIN(the_time))/2)::interval AS half
FROM old_echo
) mima ON 1=1
WHERE oe.the_time >= mima.first + mima.half
AND oe.the_time < mima.first + mima.half + '1 sec':: interval
;
RESULT:
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.06..59433.67 rows=1111124 width=64) (actual time=0.101..1.307 rows=1000 loops=1)
-> Result (cost=0.06..0.07 rows=1 width=0) (actual time=0.049..0.050 rows=1 loops=1)
InitPlan 1 (returns $0)
-> Limit (cost=0.00..0.03 rows=1 width=8) (actual time=0.022..0.022 rows=1 loops=1)
-> Index Scan using old_echo_pkey on old_echo (cost=0.00..284873.62 rows=10000115 width=8) (actual time=0.021..0.021 rows=1 loops=1)
Index Cond: (the_time IS NOT NULL)
InitPlan 2 (returns $1)
-> Limit (cost=0.00..0.03 rows=1 width=8) (actual time=0.009..0.010 rows=1 loops=1)
-> Index Scan Backward using old_echo_pkey on old_echo (cost=0.00..284873.62 rows=10000115 width=8) (actual time=0.009..0.009 rows=1 loops=1)
Index Cond: (the_time IS NOT NULL)
-> Index Scan using old_echo_pkey on old_echo oe (cost=0.01..34433.30 rows=1111124 width=16) (actual time=0.042..0.764 rows=1000 loops=1)
Index Cond: ((the_time >= (($0) + ((($1 - $0) / 2::double precision)))) AND (the_time < ((($0) + ((($1 - $0) / 2::double precision))) + '00:00:01'::interval)))
Total runtime: 1.504 ms
(13 rows)
UPDATE: since the timestamp appears to be non-unique (by the way: what do duplicates mean in that case?) I added an extra key column. An ugly hack, but it works here. Query time: 11 ms for 10M minus 80% of the rows (number of rows hit: 210 of 222067):
CREATE TABLE old_echo
( the_time timestamp NOT NULL
, the_seq SERIAL NOT NULL -- to catch the duplicate keys
, payload DOUBLE PRECISION NOT NULL
, PRIMARY KEY(the_time, the_seq)
);
-- Adding the random will cause some timestamps to be non-unique.
-- (and others to be non-existent)
INSERT INTO old_echo (the_time, payload)
SELECT now() - ((gs+random()*1000::integer) * interval '1 msec')
, random()
FROM generate_series(1,10000000) gs
;
DELETE FROM old_echo WHERE random() < 0.8;

PostgreSQL Simple JOIN very slow

I have a simple query, and two tables:
drilldown
CREATE SEQUENCE drilldown_id_seq;
CREATE TABLE drilldown (
transactionid bigint NOT NULL DEFAULT nextval('drilldown_id_seq'),
userid bigint NOT NULL default 0 REFERENCES users(id),
pathid bigint NOT NULL default 0,
reqms bigint NOT NULL default 0,
quems bigint NOT NULL default 0,
clicktime timestamp default current_timestamp,
PRIMARY KEY(transactionid)
);
ALTER SEQUENCE drilldown_id_seq OWNED BY drilldown.transactionid;
CREATE INDEX drilldown_idx1 ON drilldown (clicktime);
querystats
CREATE SEQUENCE querystats_id_seq;
CREATE TABLE querystats (
id bigint NOT NULL DEFAULT nextval('querystats_id_seq'),
transactionid bigint NOT NULL default 0 REFERENCES drilldown(transactionid),
querynameid bigint NOT NULL default 0 REFERENCES queryname(id),
queryms bigint NOT NULL default 0,
PRIMARY KEY(id)
);
ALTER SEQUENCE querystats_id_seq OWNED BY querystats.id;
CREATE INDEX querystats_idx1 ON querystats (transactionid);
CREATE INDEX querystats_idx2 ON querystats (querynameid);
drilldown has 1.5 million records, and querystats has 10 million records; the problem happens when I do a join between the two.
QUERY
explain analyse
select avg(qs.queryms)
from querystats qs
join drilldown d on (qs.transactionid=d.transactionid)
where querynameid=1;
QUERY PLAN
Aggregate (cost=528596.96..528596.97 rows=1 width=8) (actual time=5213.154..5213.154 rows=1 loops=1)
-> Hash Join (cost=274072.53..518367.59 rows=4091746 width=8) (actual time=844.087..3528.788 rows=4117717 loops=1)
Hash Cond: (qs.transactionid = d.transactionid)
-> Bitmap Heap Scan on querystats qs (cost=88732.62..210990.44 rows=4091746 width=16) (actual time=309.502..1321.029 rows=4117717 loops=1)
Recheck Cond: (querynameid = 1)
-> Bitmap Index Scan on querystats_idx2 (cost=0.00..87709.68 rows=4091746 width=0) (actual time=307.916..307.916 rows=4117718 loops=1)
Index Cond: (querynameid = 1)
-> Hash (cost=162842.29..162842.29 rows=1371250 width=8) (actual time=534.065..534.065 rows=1372574 loops=1)
Buckets: 4096 Batches: 64 Memory Usage: 850kB
-> Index Scan using drilldown_pkey on drilldown d (cost=0.00..162842.29 rows=1371250 width=8) (actual time=0.015..364.657 rows=1372574 loops=1)
Total runtime: 5213.205 ms
(11 rows)
I know there are some tuning parameters I can adjust in PostgreSQL, but what I want to know is: is my query the most optimal way of joining the two tables?
Or maybe some sort of INNER JOIN? I'm just not sure.
Any pointers are appreciated!
EDIT
database#\d drilldown
Table "public.drilldown"
Column | Type | Modifiers
---------------+-----------------------------+--------------------------------------------------------
transactionid | bigint | not null default nextval('drilldown_id_seq'::regclass)
userid | bigint | not null default 0
pathid | bigint | not null default 0
reqms | bigint | not null default 0
quems | bigint | not null default 0
clicktime | timestamp without time zone | default now()
Indexes:
"drilldown_pkey" PRIMARY KEY, btree (transactionid)
"drilldown_idx1" btree (clicktime)
Foreign-key constraints:
"drilldown_userid_fkey" FOREIGN KEY (userid) REFERENCES users(id)
Referenced by:
TABLE "querystats" CONSTRAINT "querystats_transactionid_fkey" FOREIGN KEY (transactionid) REFERENCES drilldown(transactionid)
database=# \d querystats
Table "public.querystats"
Column | Type | Modifiers
---------------+--------+---------------------------------------------------------
id | bigint | not null default nextval('querystats_id_seq'::regclass)
transactionid | bigint | not null default 0
querynameid | bigint | not null default 0
queryms | bigint | not null default 0
Indexes:
"querystats_pkey" PRIMARY KEY, btree (id)
"querystats_idx1" btree (transactionid)
"querystats_idx2" btree (querynameid)
Foreign-key constraints:
"querystats_querynameid_fkey" FOREIGN KEY (querynameid) REFERENCES queryname(id)
"querystats_transactionid_fkey" FOREIGN KEY (transactionid) REFERENCES drilldown(transactionid)
So here are the two tables requested and version
PostgreSQL 9.1.7 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit
So what this query is doing is getting the average of the queryms values for each query type (querynameid).
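For every query type at once, that could be expressed without the join at all (a hedged sketch; the posted query filters to a single querynameid):
SELECT querynameid
     , avg(queryms) AS avg_queryms
FROM querystats
GROUP BY querynameid;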
name | current_setting | source
----------------------------+----------------------------------+----------------------
application_name | psql | client
client_encoding | UTF8 | client
DateStyle | ISO, MDY | configuration file
default_text_search_config | pg_catalog.english | configuration file
enable_seqscan | off | session
external_pid_file | /var/run/postgresql/9.1-main.pid | configuration file
lc_messages | en_US.UTF-8 | configuration file
lc_monetary | en_US.UTF-8 | configuration file
lc_numeric | en_US.UTF-8 | configuration file
lc_time | en_US.UTF-8 | configuration file
log_line_prefix | %t | configuration file
log_timezone | localtime | environment variable
max_connections | 100 | configuration file
max_stack_depth | 2MB | environment variable
port | 5432 | configuration file
shared_buffers | 24MB | configuration file
ssl | on | configuration file
TimeZone | localtime | environment variable
unix_socket_directory | /var/run/postgresql | configuration file
(19 rows)
I see that enable_seqscan = off; I have not touched any settings, and this is a completely default install.
UPDATE
I made some changes from the below comments and here is the results.
explain analyse SELECT (SELECT avg(queryms) AS total FROM querystats WHERE querynameid=3) as total FROM querystats qs JOIN drilldown d ON (qs.transactionid=d.transactionid) WHERE qs.querynameid=3 limit 1;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=196775.99..196776.37 rows=1 width=0) (actual time=2320.876..2320.876 rows=1 loops=1)
InitPlan 1 (returns $0)
-> Aggregate (cost=196775.94..196775.99 rows=1 width=8) (actual time=2320.815..2320.815 rows=1 loops=1)
-> Bitmap Heap Scan on querystats (cost=24354.25..189291.69 rows=2993698 width=8) (actual time=226.516..1144.690 rows=2999798 loops=1)
Recheck Cond: (querynameid = 3)
-> Bitmap Index Scan on querystats_idx (cost=0.00..23605.83 rows=2993698 width=0) (actual time=225.119..225.119 rows=2999798 loops=1)
Index Cond: (querynameid = 3)
-> Nested Loop (cost=0.00..1127817.12 rows=2993698 width=0) (actual time=2320.876..2320.876 rows=1 loops=1)
-> Seq Scan on drilldown d (cost=0.00..76745.10 rows=1498798 width=8) (actual time=0.009..0.009 rows=1 loops=1)
-> Index Scan using querystats_idx on querystats qs (cost=0.00..0.60 rows=2 width=8) (actual time=0.045..0.045 rows=1 loops=1)
Index Cond: ((querynameid = 3) AND (transactionid = d.transactionid))
Total runtime: 2320.940 ms
(12 rows)
It's behaving as though you have set enable_seqscan = off, because it is using an index scan to populate a hash table. Never set any of the planner options off except as a diagnostic step, and if you are showing a plan, please show any options used. This can be run to show a lot of the useful information:
SELECT version();
SELECT name, current_setting(name), source
FROM pg_settings
WHERE source NOT IN ('default', 'override');
It also helps if you tell us about the runtime environment, especially the amount of RAM on the machine, what your storage system looks like, and the size of the database (or even better, the active data set of frequently referenced data in the database).
As a rough breakdown, the 5.2 seconds breaks down to:
1.3 seconds to find the 4,117,717 querystats rows that match your selection criterion.
2.3 seconds to randomly match those against drilldown records.
1.6 seconds to pass the 4,117,717 rows and calculate an average.
So, even though you seem to have crippled its ability to use the fastest plan, it is taking only 1.26 microseconds (millionths of a second) to locate each row, join it to another, and work it into a calculation of an average. That's not too bad on an absolute basis, but you can almost certainly get a slightly faster plan.
First off, if you are using 9.2.x where x is less than 3, upgrade to 9.2.3 immediately. There was a performance regression for some types of plans which was fixed in the recent release which might affect this query. In general, try to stay up-to-date on minor releases (where version number changes past the second dot).
You can test different plans in a single session by setting planning factors on just that connection and running your query (or an EXPLAIN on it). Try something like this:
SET seq_page_cost = 0.1;
SET random_page_cost = 0.1;
SET cpu_tuple_cost = 0.05;
SET effective_cache_size = '3GB'; -- actually use shared_buffers plus OS cache
Make sure that all enable_ settings are on.
You claim in your question:
I see that enable_seqscan=off, I have not touched any settings, this is a completely default install.
In contrast, the output from pg_settings tells us:
enable_seqscan | off | session
Meaning that you set enable_seqscan = off in your session. Something is not adding up here.
Run
SET enable_seqscan = on;
or
RESET enable_seqscan;
Verify:
SHOW enable_seqscan;
Also, your setting for shared_buffers is way too low for a database with millions of records. 24MB seems to be the conservative out-of-the-box setting on Ubuntu. You need to edit your configuration file for serious use! I quote the manual:
If you have a dedicated database server with 1GB or more of RAM, a
reasonable starting value for shared_buffers is 25% of the memory in your system.
So edit your postgresql.conf file to increase the value and reload.
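For illustration only (a hypothetical dedicated server with 8 GB of RAM; your numbers will differ):
# postgresql.conf -- reload/restart after editing
shared_buffers = 2GB             # roughly 25% of RAM, per the quoted guidance
effective_cache_size = 6GB       # rough estimate of shared_buffers plus OS cache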
Then try your query again and find out how enable_seqscan was turned off.
In this query
select avg(qs.queryms)
from querystats qs
join drilldown d
on (qs.transactionid=d.transactionid)
where querynameid=1;
you're not using any of the columns from the table "drilldown". Since the foreign key constraint guarantees there's a row in "drilldown" for every "transactionid" in "querystats", I don't think the join will do anything useful. Unless I've missed something, your query is equivalent to
select avg(qs.queryms)
from querystats qs
where querynameid=1;
No join at all. As long as there's an index on "querynameid" you should get decent performance.
When you don't join, avg(qs.queryms) executes once.
When you do the join, you are executing avg(qs.queryms) as many times as there are rows generated by the join.
If you're always interested in a single querynameid, try putting avg(qs.queryms) in a subselect:
SELECT
(SELECT avg(queryms) FROM querystats WHERE querynameid=1)
FROM querystats qs
JOIN drilldown d ON (qs.transactionid=d.transactionid)
WHERE qs.querynameid=1;
The querystats table looks like a fat junction table to me. In that case: omit the surrogate key, live on the natural (composite) key (both components are already NOT NULLable), and add a reversed composite index. (The separate single-column indexes then become redundant: the composite primary key covers lookups on transactionid, and the reversed composite index covers lookups on querynameid.)
-- CREATE SEQUENCE querystats_id_seq;
CREATE TABLE querystats (
-- id bigint NOT NULL DEFAULT nextval('querystats_id_seq'),
transactionid bigint NOT NULL default 0 REFERENCES drilldown(transactionid),
querynameid bigint NOT NULL default 0 REFERENCES queryname(id),
queryms bigint NOT NULL default 0,
PRIMARY KEY(transactionid,querynameid )
);
-- ALTER SEQUENCE querystats_id_seq OWNED BY querystats.id;
--CREATE INDEX querystats_idx1 ON querystats (transactionid);
-- CREATE INDEX querystats_idx2 ON querystats (querynameid);
CREATE UNIQUE INDEX querystats_alt ON querystats (querynameid, transactionid);
