PostgreSQL: improving pg_dump, pg_restore performance - performance

When I began, I used pg_dump with the default plain format. I was unenlightened.
Research revealed to me time and file size improvements with pg_dump -Fc | gzip -9 -c > dumpfile.gz. I was enlightened.
When it came time to create the database anew,
# create tablespace dbname location '/SAN/dbname';
# create database dbname tablespace dbname;
# alter database dbname set temp_tablespaces = dbname;
% gunzip dumpfile.gz # to evaluate restore time without a piped uncompression
% pg_restore -d dbname dumpfile # into a new, empty database defined above
I felt unenlightened: the restore took 12 hours to create the database that's only a fraction of what it will become:
# select pg_size_pretty(pg_database_size('dbname'));
47 GB
Because there are predictions this database will be a few terabytes, I need to look at improving performance now.
Please, enlighten me.

First check that you are getting reasonable IO performance from your disk setup. Then check that you PostgreSQL installation is appropriately tuned. In particular shared_buffers should be set correctly, maintenance_work_mem should be increased during the restore, full_page_writes should be off during the restore, wal_buffers should be increased to 16MB during the restore, checkpoint_segments should be increased to something like 16 during the restore, you shouldn't have any unreasonable logging on (like logging every statement executed), auto_vacuum should be disabled during the restore.
If you are on 8.4 also experiment with parallel restore, the --jobs option for pg_restore.

Improve pg dump&restore
PG_DUMP | always use format-directory and -j options
time pg_dump -j 8 -Fd -f /tmp/newout.dir fsdcm_external
PG_RESTORE | always use tuning for postgres.conf and format-directory and -j options
work_mem = 32MB
shared_buffers = 4GB
maintenance_work_mem = 2GB
full_page_writes = off
autovacuum = off
wal_buffers = -1
time pg_restore -j 8 --format=d -C -d postgres /tmp/newout.dir/

Two issues/ideas:
By specifying -Fc, the pg_dump output is already compressed. The compression is not maximal, so you may find some space savings by using "gzip -9", but I would wager it's not enough to warrant the extra time (and I/O) used compressing and uncompressing the -Fc version of the backup.
If you are using PostgreSQL 8.4.x you can potentially speed up the restore from a -Fc backup with the new pg_restore command-line option "-j n" where n=number of parallel connections to use for the restore. This will allow pg_restore to load more than one table's data or generate more than one index at the same time.

I assume you need backup, not a major upgrade of database.
For backup of large databases you should setup continuous archiving instead of pg_dump.
Set up WAL archiving.
Make your base backups for example every day by using
psql template1 -c "select pg_start_backup('`date +%F-%T``')"
rsync -a --delete /var/lib/pgsql/data/ /var/backups/pgsql/base/
psql template1 -c "select pg_stop_backup()"`
A restore would be as simple as restoring database and WAL logs not older than pg_start_backup time from backup location and starting Postgres. And it will be much faster.

zcat dumpfile.gz | pg_restore -d db_name
Removes the full write of the uncompressed data to disk, which is currently your bottleneck.

As you may have guessed simply by the fact that compressing the backup results in faster performance, your backup is I/O bound. This should come as no surprise as backup is pretty much always going to be I/O bound. Compressing the data trades I/O load for CPU load, and since most CPUs are idle during monster data transfers, compression comes out as a net win.
So, to speed up backup/restore times, you need faster I/O. Beyond reorganizing the database to not be one huge single instance, that's pretty much all you can do.

If you're facing issues with the speed of pg_restore check whether you dumped your data using INSERT or COPY statement.
If you use INSERT (pg_dump is called with --column-inserts parameter) the restore of data would be significantly slower.
Using INSERT is good for making dumps that are loaded into non-Postgres databases. But if you do a restore to Postgres omit using --column-inserts parameter when using pg_dump.

Related

Unable to download data using Aspera

I am trying to download data from the European Nucleotide Archive (ENA) using Aspera CLI however my downloads are getting stalled. I have downloaded several files earlier using the same tool but this is happening since last one month. I usually use the following command:
ascp -QT -P33001 -k 1 -i ~/.aspera/connect/etc/asperaweb_id_dsa.openssh era-fasp#fasp.sra.ebi.ac.uk:/vol1/fastq/ERR192/009/ERR1924229/ERR1924229.fastq.gz .
From a post on Beta Science, I learnt that this might be due to not limiting the download speed and hence tried usng the -l argument but was of no help.
ascp -QT -l 300m -P33001 -k 1 -i ~/.aspera/connect/etc/asperaweb_id_dsa.openssh era-fasp#fasp.sra.ebi.ac.uk:/vol1/fastq/ERR192/009/ERR1924229/ERR1924229.fastq.gz .
Your command works.
you might be overdriving your local network ?
how much bandwidth do you have ?
here "-l 300m" sets a target rate to 300 Mbps, if you have less than 30, this can cause such problems.
try to reduce the target rate to what you actually have.
(using wired ? Wifi ?)

Show runtime of a query on monetdb

I am testing monetdb for a colunmnar storage.
I already installed and run the server
but, when I connect to the client and run a query, the response does not show the time to execute the query.
I am connecting as:
mclient -u monetdb -d voc
I already tried to connect with interactive like:
mclient -u monetdb -d voc -i
Output example:
sql>select count(*) from voc.regions;
+---------+
| L3 |
+=========+
| 5570699 |
+---------+
1 tuple
As mkersten mentioned, I would read through the options of the mclient utility first.
To get server and client timing measurements I used --timer=performance option when starting mclient.
Inside mclient I would then disable the result output by setting \f trash to ignore the results when measuring only.
Prepend trace to your query and you get your results like this:
sql>\f trash
sql>trace select count(*) from categories;
sql:0.000 opt:0.266 run:1.713 clk:5.244 ms
sql:0.000 opt:0.266 run:2.002 clk:5.309 ms
The first of the two lines shows you the server timings, the second one the overall timing including passing the results back to the client.
If you use the latest version MonetDB-Mar18 you have good control over the performance timers, which includes parsing, optimization, and runtime at server. See mclient --help.

How to zgrep the last line of a gz file without tail

Here is my problem, I have a set of big gz log files, the very first info in the line is a datetime text, e.g.: 2014-03-20 05:32:00.
I need to check what set of log files holds a specific data.
For the init I simply do a:
'-query-data-'
zgrep -m 1 '^20140320-04' 20140320-0{3,4}*gz
BUT HOW to do the same with the last line without process the whole file as would be done with zcat (too heavy):
zcat foo.gz | tail -1
Additional info, those logs are created with the data time of it's initial record, so if I want to query logs at 14:00:00 I have to search, also, in files created BEFORE 14:00:00, as a file would be created at 13:50:00 and closed at 14:10:00.
The easiest solution would be to alter your log rotation to create smaller files.
The second easiest solution would be to use a compression tool that supports random access.
Projects like dictzip, BGZF, and csio each add sync flush points at various intervals within gzip-compressed data that allow you to seek to in a program aware of that extra information. While it exists in the standard, the vanilla gzip does not add such markers either by default or by option.
Files compressed by these random-access-friendly utilities are slightly larger (by perhaps 2-20%) due to the markers themselves, but fully support decompression with gzip or another utility that is unaware of these markers.
You can learn more at this question about random access in various compression formats.
There's also a "Blasted Bioinformatics" blog by Peter Cock with several posts on this topic, including:
BGZF - Blocked, Bigger & Better GZIP! – gzip with random access (like dictzip)
Random access to BZIP2? – An investigation (result: can't be done, though I do it below)
Random access to blocked XZ format (BXZF) – xz with improved random access support
Experiments with xz
xz (an LZMA compression format) actually has random access support on a per-block level, but you will only get a single block with the defaults.
File creation
xz can concatenate multiple archives together, in which case each archive would have its own block. The GNU split can do this easily:
split -b 50M --filter 'xz -c' big.log > big.log.sp.xz
This tells split to break big.log into 50MB chunks (before compression) and run each one through xz -c, which outputs the compressed chunk to standard output. We then collect that standard output into a single file named big.log.sp.xz.
To do this without GNU, you'd need a loop:
split -b 50M big.log big.log-part
for p in big.log-part*; do xz -c $p; done > big.log.sp.xz
rm big.log-part*
Parsing
You can get the list of block offsets with xz --verbose --list FILE.xz. If you want the last block, you need its compressed size (column 5) plus 36 bytes for overhead (found by comparing the size to hd big.log.sp0.xz |grep 7zXZ). Fetch that block using tail -c and pipe that through xz. Since the above question wants the last line of the file, I then pipe that through tail -n1:
SIZE=$(xz --verbose --list big.log.sp.xz |awk 'END { print $5 + 36 }')
tail -c $SIZE big.log.sp.xz |unxz -c |tail -n1
Side note
Version 5.1.1 introduced support for the --block-size flag:
xz --block-size=50M big.log
However, I have not been able to extract a specific block since it doesn't include full headers between blocks. I suspect this is nontrivial to do from the command line.
Experiments with gzip
gzip also supports concatenation. I (briefly) tried mimicking this process for gzip without any luck. gzip --verbose --list doesn't give enough information and it appears the headers are too variable to find.
This would require adding sync flush points, and since their size varies on the size of the last buffer in the previous compression, that's too hard to do on the command line (use dictzip or another of the previously discussed tools).
I did apt-get install dictzip and played with dictzip, but just a little. It doesn't work without arguments, creating a (massive!) .dz archive that neither dictunzip nor gunzip could understand.
Experiments with bzip2
bzip2 has headers we can find. This is still a bit messy, but it works.
Creation
This is just like the xz procedure above:
split -b 50M --filter 'bzip2 -c' big.log > big.log.sp.bz2
I should note that this is considerably slower than xz (48 min for bzip2 vs 17 min for xz vs 1 min for xz -0) as well as considerably larger (97M for bzip2 vs 25M for xz -0 vs 15M for xz), at least for my test log file.
Parsing
This is a little harder because we don't have the nice index. We have to guess at where to go, and we have to err on the side of scanning too much, but with a massive file, we'd still save I/O.
My guess for this test was 50000000 (out of the original 52428800, a pessimistic guess that isn't pessimistic enough for e.g. an H.264 movie.)
GUESS=50000000
LAST=$(tail -c$GUESS big.log.sp.bz2 \
|grep -abo 'BZh91AY&SY' |awk -F: 'END { print '$GUESS'-$1 }')
tail -c $LAST big.log.sp.bz2 |bunzip2 -c |tail -n1
This takes just the last 50 million bytes, finds the binary offset of the last BZIP2 header, subtracts that from the guess size, and pulls that many bytes off of the end of the file. Just that part is decompressed and thrown into tail.
Because this has to query the compressed file twice and has an extra scan (the grep call seeking the header, which examines the whole guessed space), this is a suboptimal solution. See also the below section on how slow bzip2 really is.
Perspective
Given how fast xz is, it's easily the best bet; using its fastest option (xz -0) is quite fast to compress or decompress and creates a smaller file than gzip or bzip2 on the log file I was testing with. Other tests (as well as various sources online) suggest that xz -0 is preferable to bzip2 in all scenarios.
————— No Random Access —————— ——————— Random Access ———————
FORMAT SIZE RATIO WRITE READ SIZE RATIO WRITE SEEK
————————— ————————————————————————————— —————————————————————————————
(original) 7211M 1.0000 - 0:06 7211M 1.0000 - 0:00
bzip2 96M 0.0133 48:31 3:15 97M 0.0134 47:39 0:00
gzip 79M 0.0109 0:59 0:22
dictzip 605M 0.0839 1:36 (fail)
xz -0 25M 0.0034 1:14 0:12 25M 0.0035 1:08 0:00
xz 14M 0.0019 16:32 0:11 14M 0.0020 16:44 0:00
Timing tests were not comprehensive, I did not average anything and disk caching was in use. Still, they look correct; there is a very small amount of overhead from split plus launching 145 compression instances rather than just one (this may even be a net gain if it allows an otherwise non-multithreaded utility to consume multiple threads).
Well, you can access randomly a gzipped file if you previously create an index for each file ...
I've developed a command line tool which creates indexes for gzip files which allow for very quick random access inside them:
https://github.com/circulosmeos/gztool
The tool has two options that may be of interest for you:
-S option supervise a still-growing file and creates an index for it as it is growing - this can be useful for gzipped rsyslog files as reduces to zero in the practice the time of index creation.
-t tails a gzip file: this way you can do: $ gztool -t foo.gz | tail -1
Please, note that if the index doesn't exists, this will consume the same time as a complete decompression: but as the index is reusable, next searches will be greatly reduced in time!
This tool is based on zran.c demonstration code from original zlib, so there's no out-of-the-rules magic!

openldap read/write performance

I am using OpenLDAP slapd version 2.4.33 with the slapd configuration below. I am trying to run some tests with ldapadd and ldapsearch tools to measure its write and read performance but I am getting really poor performance results.
System: sun4v sparc SUNW,SPARC-Enterprise-T2000 with 64 GB of memory.
Tried to add 1500 entried and it took 1 minute 19 seconds.
time ldapadd -x -D "cn=Manager,dc=my-company,dc=com" -w mypassword -f /tmp/perf1500.ldif > /dev/null 2>&1
ldapadd -x -D "cn=Manager,dc=my-company,dc=com" -w mypassword -f /tmp/perf200.ldif > 1.27s user 0.14s system 1% cpu 1:19.95 total
There is a similar configuration test results are give on http://wiki.zimbra.com/wiki/OpenLDAP_MDB_vs_HDB_performance#MDB_configuration . But I am not even close to that.
MDB configuration:
database mdb
directory /home/myuser/var/openldap-data
suffix "dc=my-company,dc=com"
rootdn "cn=Manager,dc=my-company,dc=com"
rootpw mypassword
index objectClass eq
index myMSISDN eq
maxsize 10737418240
envflags writemap,nometasync
The problem is system is not even being busy but the openldap performance is very poor. How a configuration may provide a better performance? Is there something that is so obvious that I am missing?
Thanks

What parameters to tweak for a text-based PG restore?

Every night we dump and restore a 200 GB database using:
# Production, PG 9:
pg_dump DATNAME | some-irrelevant-pipe
# QA, PG 8.3:
some-irrelevant-pipe | psql -d DATNAME
I had to go for text-based backups in order to restore a dump from 9 on 8.3.
The restore is painfully and unreasonably slow. I noticed my log is full of these:
2011-05-22 08:02:47 CDT LOG: checkpoints are occurring too frequently (9 seconds apart)
2011-05-22 08:02:47 CDT HINT: Consider increasing the configuration parameter "checkpoint_segments".
2011-05-22 08:02:54 CDT LOG: checkpoints are occurring too frequently (7 seconds apart)
2011-05-22 08:02:54 CDT HINT: Consider increasing the configuration parameter "checkpoint_segments".
My question is: Is it possible that the setting of checkpoint_segments is the bottleneck? What other parameters can I tweak to speed up the process?
That machine has 4 GB RAM. Other possibly relevant settings in postgresql.conf are:
shared_buffers = 1000MB
work_mem = 200MB
maintenance_work_mem = 200MB
effective_cache_size = 2000MB
# fsync and checkpoint settings are default
Did you read this ? See specially sec 14.4.9
For the purposes of restoring a database, change:
# I don't think PostgreSQL 8.3 supports synchronous_commit
synchronous_commit = off
# only change fsync = off if your version of PG is too old to support synchronous_commit. If you do support synchronous_commit, don't ever change fsync to anything but on. Ever.
#fsync = off
checkpoint_segments = 25
Regarding checkpoint_segments, set that value to the size of your disk controller's write buffer. 25 = 400MB
Also, make sure your psql is loading everything in a single transaction:
some-irrelevant-pipe | psql -1 -d DATNAME

Resources