utl_file.FCLOSE() is slow with large files - oracle

We are using utl_file in Oracle 10g to copy a blob from a table row to a file on the file system and when we call utl_file.fclose() it takes a long time. It's a 10mb file, not very big, and it takes just over a minute to complete. Anyone know why this would be so slow?
Thanks
EDIT
Looks like this is related to our file system. When we write to a local drive it works fine.

We have determined that it is our network file system mount causing the issue. When we remove that from the problem and store the file to a local drive it works fine. We were able to test this on another environment with the same configuration and it's fast and works as expected.
Now we need to get our network guys involved and see why transfering data on the NFS is so slow in this environment.
EDIT
It was the network speed between the oracle server and the UNIX server. It was set to half duplex 10Mb per second. So we bumped it up to full duplex 100Mb and it works great now!

Are you doing a fflush prior to that? If not, then the fclose is performing the fflush for you and that may be where the time is. Check it by issuing a fflush prior to close.

Related

Very long time needed to access and parse file on a web server

I'm currently having an issue and I don't understand what causes it.
I could narrow it down to a very simple php script :
$start = microtime(true);
yaml_parse_file("someYamlFile.yaml");
print(microtime(true) - $start);
It's a 63 mb file I'm trying to parse.
When I try that script with Wamp server, on my own computer, it takes only a few seconds to complete.
However, when I run the same script on the remote server hosting my website, it takes about 170 seconds to run.
If I run it a second time, it only takes a few seconds though.
After several minutes, without any visitor to the website, if I run that script again, it takes 170 seconds!
I'm no expert when it comes to servers. Any idea on what could cause this?
Thanks.
Edit :
I tried fopen() and fread() instead and the issue remained.
It took more than 2 minutes to open and read the file the first time, and only a few seconds the second time.
It turns out the problem comes from accessing the file and not from yaml_parser.
I changed the title but still have no clue about what's causing the issue.
Everything you said is reasonable on a cheap server, on which the drive may be very inefficient.
Files are normally opened using sendfile, which uses the Linux Cache.
The loads being slow again after several minutes makes sense --> the cache is invalidated.
None of these issues are related to PHP :-)
I would recommend you put that file on a Ramdisk, so that it sits in RAM instead of on the disk.
/dev/shm is the default ramdisk on modern linux distribution ==> create a folder under it and place the file there.
As for putting the data in Memcache(where it should be): requires the Memcached pecl extension
To add(run once):
$m = new Memcached;
$m->addServer('localhost',11211);
$m->set('app_config',yaml_parse_file("someYamlFile.yaml"));
To load(run in your script):
$m = new Memcached;
$m->addServer('localhost',11211);
$data = $m->get('app_config');

Can I delete .dmp and .phd files of Liberty Profile server?

In the folder <WAS Liberty Profile root>\<profile>\usr\servers\defaultServer there are many files named core.*.dmp and heapdump.*.phd. The size of these files is between 130 MB and 1.3 GB when my deployed app uses 4 MB.
Can I delete these files *.dmp and *.phd?
What are these files for?
Short answer: yes, it's safe to delete them, but you should find out why they're appearing, as it could indicate that your application is not running correctly.
If your dump files were created a long time ago, or you know you were debugging an OutOfMemoryException or have been running server javadump --include=heap,system then go ahead and delete the files. If, however, you keep getting new dump files and don't know why then read on.
The core and heapdump files contain a snapshot of the memory of the application from a specific point in time. Usually you do this to capture the state of your application at the point where something goes wrong so that you can examine it with analysis tools and try to work out what went wrong.
For example, by default the IBM JVM will perform a dump when an OutOfMemoryException is thrown. This allows you to look at the dump file and see what's using up all the memory.
If you have a corresponding javacore file, the fourth line or so should say why the memory dump was made.
e.g. 1TISIGINFO Dump Requested By User (00100000) Through com.ibm.jvm.Dump.javaDumpToFile (caused by running server javadump)
or 1TISIGINFO Dump Event "user" (00004000) received (caused by running kill -3)
If it's a "user" event, then something's asking the JVM to create a dump. If not, and you're still not sure what's causing it, check your jvm.options file for any -Xdump options which can be used to cause the JVM to create a dump in response to certain events. More information on that in the Knowledge Center.

PostgreSQL statistics issue - could not rename temporary statistics file

I am running PotgreSQL 9.4 on Windows, and constantly get the error,
2015-06-15 09:35:36 EDT LOG could not rename temporary statistics file "pg_stat_tmp/global.tmp" to "pg_stat_tmp/global.stat": Permission denied
I also see constant 200-800k writes to global.stat and global.tmp. I have seen other users with the same issue, but no solution.
It is a big database server, with 300g of data, and 6,000 databases.
I tried setting,
track_activities=off
In the config file, but it did not seem to have any affect.
Any help for the error, or reducing the write?
After my initial answer, I decided to research the operation of the stats collector and in particular what it is doing with the files in pg_stat_tmp. I've substantially re-written the answer as a result.
What are the global.stat / global.tmp files used for?
Postgresql contains functionality to collect statistics and status information about its operation. The function is described in Section 27.2 of the manual.
This information is collated by the stats collector process. It is made available to the other postgresql processes via the global.stat file. The first time you run a query that accesses this data within a transaction, the backend which you are connected to will read the global.stat file and cache the result, using it until the end of the transaction.
To keep this file up to date, the stats collector process periodically re-writes it with updated information. It typically does this several times a second. The process is as follows:
Create a new file global.tmp
Write data to this file
Rename global.tmp as global.stat, overwriting the previous global.stat
The global.tmp and global.stats files are written into the directory configured by the stats_temp_directory configuration parameter. Normally this is set to $PGDATA/pg_stat_tmp.
On shutdown, the stats file is written into the file $PGDATA/global/pgstat.stat, and the files in the tmp dir above are removed. This file is then read and removed when the database is started up again.
Why is the stats collector processor creating so much I/O load?
Normally, the amount of data written to the global.stats is relatively modest and writing it does not generate that much I/O traffic. However under some circumstances it does seem to get very bloated. When this happens the amount of load generated can start to get excessive as the entire file is rewritten more than once a second.
I have had one experience where it grew by a factor or 10 or more, compared to other similar servers. This machine did have an unusually large number of databases (for our application at least - 30-40 databases - but nothing like the 6000 you say you have). It is possible that having a large number of databases exacerbates this.
Some of the references below talk about a pattern of creating / dropping lots of tables causing bloat in these files, and that perhaps autovacuum is not running aggressively enough to remove the associated bloat. You may wish to consider your autovac settings.
Why do I get 'Permission Denied' errors on Windows?
After examining the postgresql source code I think there may be a race condition in accessing the global.stats file which could happen at any time, but is exacerbated by the size of the file.
The default mode of operation in Windows is that it is not possible to rename or remove a file while another process has it open. This is different to Linux (or Unix) where a file can be renamed or removed while other processes are accessing it.
In the sequence above you can see that if one of the backend processes is reading the file at the same time as the stats collector is rewriting it, then the backend process may still have the file open at the time the rename is attempted. That leads to the 'Permission Denied' error you are seeing.
Naturally when the file becomes very large, then the amount of time taken to read it becomes more significant, therefore the probability of the stats collector process attempting a rename while a backend still has it open increases.
However, since the file is frequently being rewritten, the impact of these errors is relatively mild. It just means that this particular update fails, leading the the backends getting slightly out of date statistics. The next update will probably succeed.
Note that Windows does offer a file opening mode which does allow files to be deleted or renamed while they are opened by another process, however as far as I could tell, this mode is not used by Postgresql. I could not find any bug report on this - seems like it should be reported.
In summary, these errors are a side effect of the main problem, which is the excessive size of the global.stat file.
I've turned track_activities off but the file is still being written - Why?
From what I can see, track_activites affects only one of the sets of information that the stats collector is collecting.
In addition, it looks as though the stats collector process is started regardless of these settings, and will continue to re-write the file. The settings appear to control only the collection of fresh data.
My conclusion is that once the file has become bloated, it will remain so and continue to be re-written, even once all of the stats collection options are turned off.
What can I do to avoid this problem?
Once the file has become bloated, it seems that the easiest way to get the database back into a good working state is to remove the file, using the following steps:
Stop the database
When the DB is stopped, the pg_stat_tmp directory is empty and a file $PGDATA/global/pgstat.stat is written. We renamed this file to pgstat.stat.old.
Start the database. It creates a fresh set of pgstat files. After confirming the server was operating correctly you can remove the old file you have renamed.
This is the process we used when one of our servers suffered from this problem.
Needless to say be very careful when manually manipulating any files under the Postgresql Data directory.
After this you may want to monitor the server to see if it the file becomes bloated again. If it does then here are some additional ideas to consider:
As mentioned above I have seen some references to this file becoming bloated if autovacuum is not running aggressively enough. You may wish to tune the autovacuum settings
Disabling any of the track_xxx options described in the Section 18.9.1 of the manual which are not required may help
It is possible to place the pg_stats_tmp directory in a tmpfs filesystem (or whatever equivalent RAM based filesystem is available in windows). Doing so should eliminate I/O as a concern for these files.
References:
Postgres stats collector showing high disk I/O
Too much I/O generated by postgres stats collector process
stats collector suddenly causing lots of IO
Here might be a solution for your problem. https://wiki.postgresql.org/wiki/May_2015_Fsync_Permissions_Bug
Another possibility could be antivirus settings. Try to turn it off temporarily.
It happened to me few days ago. I rebooted the machine, but the error did not disappeared.
Don't know why, but performing a vacuum analyze verbose did the trick, and the error has stoped to show up.

MySQL database backup: performance issues

Folks,
I'm trying to set up a regular backup of a rather large production database (half a gig) that has both InnoDB and MyISAM tables. I've been using mysqldump so far, but I find that it's taking increasingly longer periods of time, and the server is completely unresponsive while mysqldump is running.
I wanted to ask for your advice: how do I either
Make mysqldump backup non-blocking - assign low priority to the process or something like that, OR
Find another backup mechanism that will be better/faster/non-blocking.
I know of the existence of MySQL Enterprise Backup product (http://www.mysql.com/products/enterprise/backup.html) - it's expensive and this is not an option for this project.
I've read about setting up a second server as a "replication slave", but that's not an option for me either (this requires hardware, which costs $$).
Thank you!
UPDATE: more info on my environment: Ubuntu, latest LAMPP, Amazon EC2.
If replication to a slave isn't an option, you could leverage the filesystem, depending on the OS you're using,
Consistent backup with Linux Logical Volume Manager (LVM) snapshots.
MySQL backups using ZFS snapshots.
The joys of backing up MySQL with ZFS...
I've used ZFS snapshots on a quite large MySQL database (30GB+) as a backup method and it completes very quickly (never more than a few minutes) and doesn't block. You can then mount the snapshot somewhere else and back it up to tape, etc.
Edit: (previous answer was suggestion a slave db to back up from, then I noticed Alex ruled that out in his question.)
There's no reason your replication slave can't run on the same hardware, assuming the hardware can keep up. Grab a source tarball, ./configure --prefix=/dbslave; make; make install; and you'll have a second mysql server living completely under /dbslave.
EDIT2: Replication has a bunch of other benefits, as well. For instance, with replication running, you'll may be able to recover the binlog and replay it on top your last backup to recover the extra data after certain kinds of catastrophes.
EDIT3: You mention you're running on EC2. Another, somewhat contrived idea to keep costs down is to try setting up another instance with an EBS volume. Then use the AWS api to spin this instance up long enough for it to catch up with writes from the binary log, dump/compress/send the snapshot, and then spin it down. Not free, and labor-intensive to set up, but considerably cheaper than running the instance 24x7.
Try mk-parallel-dump utility from maatkit (http://www.maatkit.org/)
regards,
Something you might consider is using binary logs here though a method called 'log shipping'. Just before every backup, issue out a command to flush the binary logs and then you can copy all except the current binary log out via your regular file system operations.
The advantage with this method is your not locking up the database at all, since when it opens up the next binary log in sequence, it releases all the file locks on the prior logs so processing shouldn't be affected then. Tar'em, zip'em in place, do as you please, then copy it out as one file to your backup system.
An another advantage with using binary logs is you can restore up to X point in time if the logs are available. I.e. You have last year's full backup, and every log from then to now. But you want to see what the database was on Jan 1st, 2011. You can issue a restore 'until 2011-01-01' and when it stops, your at Jan 1st, 2011 as far as the database is concerned.
I've had to use this once to reverse the damage a hacker caused.
It is definately worth checking out.
Please note... binary logs are USUALLY used for replication. Nothing says you HAVE to.
Adding to what Rich Adams and timdev have already suggested, write a cron job which gets triggered on low usage period to perform the slaving task as suggested to avoid high CPU utilization.
Check mysql-parallel-dump also.

How do I verify the integrity of a Sybase dump file, without trying to load it?

Here's the scenario - a client uploads a Sybase dump file to (gzipped) to our local FTP server. We have an automated process which picks these up and then moves them to different server within the network where the database server resides. Unfortunately, this transfer is over a WAN, which for large files takes a long time, and sometimes our clients forget to FTP in binary mode, which results in 10GB of transfer over our WAN all for nothing as the dump file can't be loaded at the other end. What I'd like to do, is verify the integrity of the dump file on the local server before sending it out over the WAN, but I can't just try and "load" the dump file, as we don't have Sybase installed (and can't install it). Are there any tools or bits of code that I can use to do this?
There are a few things you can do from the command line. The first, on the sending side, is to generate md5sum's of the files.
$ md5sum *.dmp
2bddf3cd8b04010183dd3295ce7594ff pubs_1.dmp
7510e0250c8d68bae3e0e794c211e60b pubs_2.dmp
091fe54fa5fd81d8c109cc7835d37f4a pubs_3.dmp
On the client side, they can run the same. Secondly, usually Sybase dumps are done with the compress option. If this option is used, you can also test the file integrity by uncompressing the files via the command line. This isn't as complete, but it will verify the 8 byte CRC-32 checksum which is part of the compress algorithm.
$ gunzip --test *.dmp
gunzip: pubs_3.dmp: unexpected end of file
Neither of these methods validate that Sybase will be able to load the file, but it does help ensure the file isn't corrupt.
There is no way to really verify the integrity of the dump file without loading it in some way by a backup server. The client should know whether the dump is successful or not via the backup log or output during the dump.
But to solve your problem you should use to SFTP or SCP, all transfers are done in binary, alleviating your problem.
Ensure that they are also using compression in the dump a value of 1-3 is more than enough, this should reduce your network traffic also.

Resources