PostgreSQL 9.6 server process terminated by 0xc0000374 - windows

I have a pretty large project in which I am executing something about 500 queries one after the other and exporting the data and loading it later when process is finished, in another table, all through psql and a bash script on Windows.
PostgreSQL is crashing with this in the logfile:
LOG: server process (PID 5776) was terminated by exception 0xc0000374
DETAIL: Failed process was running: COPY ( SELECT ...)
HINT: See C include file "ntstatus.h" for a description of the hexadecimal value.
Unfortunately I don't have much details on the frequency of this. Seems like sometimes happens and sometimes not and it's not clear when it happens. Looks random to me... It's driving me crazy.
Now 0xc0000374 seems to mean Heap Corruption Exception which is still not so helpful applying it to my context.
How can I understand what's going on better or prevent it? I would be grateful for any feedback!

Related

Errors : ora-00604&ora-1578 &ora-01110

I get ora-00604 & ora-1578 & ora-01110, any one have solution ????
That doesn't sound good.
ORA-01578: ORACLE data block corrupted (file # string, block # string)
Cause : The data block indicated was corrupted, mostly due to software errors.
Action: Try to restore the segment containing the block indicated.
This may involve dropping the segment and recreating it.
If there is a trace file, report the errors in it to your ORACLE representative.
system02 database file is corrupted; there's a possibility that hard disk crashed and you should check it for errors. As it is the database server, I presume that it would be safer if you replaced it (instead of just fixed it, because - when one data block gets corrupted, there's a good chance that it'll happen again (according to Murphy's law, at least)).
Furthermore, it means that you'll have to restore the database. I hope you have backup (the one you got BEFORE corruption happened).

Oracle 12c startup error: ORA-00093: _shared_pool_reserved_min_alloc must be between 4000 and 0

We have a number of databases at our company. Among them an oracle 12c (12.2.0.1.0 to be precise), but we have no (qualified) oracle DBAs. The performance has deteriorated drastically in the last 6 months or so and I now have the task of finding out why. My research suggested that I should up some memory parameters in the initDBN.ora file. Here's what the original looked like:
DBN.__data_transfer_cache_size=0
DBN.__db_cache_size=50331648
DBN.__inmemory_ext_roarea=0
DBN.__inmemory_ext_rwarea=0
DBN.__java_pool_size=79691776
DBN.__large_pool_size=8388608
DBN.__oracle_base='/orabin/app/oracle'#ORACLE_BASE set from environment
DBN.__pga_aggregate_target=197132288
DBN.__sga_target=734003200
DBN.__shared_io_pool_size=12582912
DBN.__shared_pool_size=536870912
DBN.__streams_pool_size=4194304
*.audit_file_dest='/orabin/app/oracle/admin/tmf/adump'
*.audit_trail='db'
*.compatible='12.2.0'
*.control_files='/orabin/app/oracle/oradata/tmf/control01.ctl','/orabin/app/oracle/fast_recovery_area/tmf/control02.ctl'
*.db_16k_cache_size=8388608
*.db_32k_cache_size=8388608
*.db_4k_cache_size=8388608
*.db_block_size=8192
*.db_domain='ubs-hainer.com'
*.db_name='tmf'
*.db_recovery_file_dest='/orabin/app/oracle/fast_recovery_area/tmf'
*.db_recovery_file_dest_size=4096m
*.diagnostic_dest='/orabin/app/oracle'
*.dispatchers='(PROTOCOL=TCP) (SERVICE=TMFXDB)'
*.local_listener='LISTENER_TMF'
*.memory_max_target=0
*.nls_language='GERMAN'
*.nls_territory='GERMANY'
*.open_cursors=300
*.pga_aggregate_target=188m
*.processes=300
*.remote_login_passwordfile='EXCLUSIVE'
*.sga_target=700m
*.shared_pool_size=536870912
*.streams_pool_size=4194304
*.undo_tablespace='UNDOTBS1'
Please don't blame me for this, because I did not write it. It certainly doesn't look like the sample init.ora and I am not at all certain where the syntax came from. The values I changed were:
DBN.__sga_target=1024m
*.sga_target=1024m
*.memory_max_target=1408m
DBN.__pga_aggregate_target=384m and *.pga_aggregate_target=384m
That's the order in which I made the changes. After each change I used sqlplus to firstly recreate the spffile with:
create spfile='spfileDBN.ora' from pfile='initDBN.ora';
This was followed by an attempt to startup the database with startup nomount. In each case I got an error message which lead me to make the next change.
Finally I got the error which is in the title of this post. When I tried to search for information on this, the findings were grim. Mostly the information dealt with other parameters and did not explain what this error actually meant. The only thing that gave any real background was this link from Burleson Consulting. It didn't really help me solve the problem, so I decided to revert the initDBN.ora file and do some more research. A slow database is generally better than no database.
But Hey! I still get that same error, even after reerting to the original init file. I'm getting desperate now. I have no idea how to fix this. From what I've read to date, setting "underscore variables" in your init file is a "NO NO".
Can anybody provide me with some helpful tips as to how to get rid of this error?
We don't know if the apps running on this database need specific block sizes, but if the priority is getting the database open, you can shrink the init.ora down the smallest, simplest set of parameters that gets you moving forward.
*.audit_file_dest='/orabin/app/oracle/admin/tmf/adump'
*.audit_trail='db'
*.compatible='12.2.0'
*.control_files='/orabin/app/oracle/oradata/tmf/control01.ctl','/orabin/app/oracle/fast_recovery_area/tmf/control02.ctl'
*.db_block_size=8192
*.db_domain='ubs-hainer.com'
*.db_name='tmf'
*.db_recovery_file_dest='/orabin/app/oracle/fast_recovery_area/tmf'
*.db_recovery_file_dest_size=4096m
*.diagnostic_dest='/orabin/app/oracle'
*.dispatchers='(PROTOCOL=TCP) (SERVICE=TMFXDB)'
*.local_listener='LISTENER_TMF'
*.nls_language='GERMAN'
*.nls_territory='GERMANY'
*.open_cursors=300
*.pga_aggregate_target=188m
*.processes=300
*.remote_login_passwordfile='EXCLUSIVE'
*.sga_target=1000m
*.undo_tablespace='UNDOTBS1'
should get you an open database. Notice I have bumped up the sga_target to 1000m which is about the minimum you need to get a database started. The true values for sga_target and pga_aggregate_target really need to be set based on your expected usage, and the server configuration. But the init.ora above should get your database running.
I am not sure that this really qualifies as a "solution", but it does fix the initial problem. As mentioned in my reply to Connor McDonald, I set the parameter _shared_pool_reserved_min_alloc to 3000 in the initDBN.ora file, which I copied from Connor's example (thanks for that). After recreating the spfile and trying to restart the database, I got the following error:
ORA-00093: _shared_pool_reserved_min_alloc must be between 4000 and 11953766
That got me thinking that the value 0 in the original error was probably a standin value which really means "the maximum allowed". By actually setting the parameter, I have apparently managed to generate an error which is more meaningful.
The value of _shared_pool_reserved_min_alloc is now set to 4200, which is a value I recall reading in one of the less helpful posts. (No, that post did not say that this is a value that should be used, just that it could be used.) This time, after re-creating the spfile I was able to start the database.
Before I do any more fiddling with parameters, I will do a bit more research... or maybe a lot more.

How to rebuild an elasticsearch index for lots of data without it getting "Killed" after 15 hours or so

I have about 130 million articles in my Postgres database on AWS. I am trying to index them with elasticsearch. In a screen, I entered:
python manage.py search_index --rebuild -f --parallel --model [APP NAME].[MODEL NAME]
Everything began correctly. The output was
Deleting index '[MODEL NAME]'
Creating index '[MODEL NAME]'
Indexing 129413202 'MODEL NAME' objects (parallel)
But after about 15 hours, the output was "Killed". I was running this on a t2.xlarge EC2 instance, which has 16 GBs of memory. Interestingly, the "Killed" message happened after I saw that the connection to the AWS server was broken, but that shouldn't matter if the process was run in a screen. Any idea what the issue is? Do I just need to get an even larger EC2 instance?
A process unexpectedly exiting with message Killed often means it received a SIGKILL; if so then the exit code would be 137. Hard to be certain here, a process can obviously print Killed and exit with code 137 anyway, but assuming you're not doing that in your code then this is what I'd check next.
An unexpected SIGKILL often comes from the kernel's OOM killer which takes action when the system runs out of memory and typically kills the process with the largest memory footprint. If so it will have logged details in the kernel logs that you can read with dmesg.
If it was the OOM killer then this sounds like a bug in this indexing code. Indexing a large body of documents into Elasticsearch should require pretty limited working memory, nowhere near 16GB, but it's easy to accidentally keep too much data in memory for too long which would lead to excessive memory usage.
python manage.py search_index suggests you're using the Django Elasticsearch DSL which fixed a performance issue relatively recently. Make sure you're using a version that contains this fix.

What can lead to failures in appending data to a file?

I maintain a program that is responsible for collecting data from a data acquisition system and appending that data to a very large (size > 4GB) binary file. Before appending data, the program must validate the header of this file in order to ensure that the meta-data in the file matches that which has been collected. In order to do this, I open the file as follows:
data_file = fopen(file_name, "rb+");
I then seek to the beginning of the file in order to validate the header. When this is done, I seek to the end of the file as follows:
_fseeki64(data_file, _filelengthi64(data_file), SEEK_SET);
At this point, I write the data that has been collected using fwrite(). I am careful to check the return values from all I/O functions.
One of the computers (windows 7 64 bit) on which we have been testing this program intermittently shows a condition where the data appears to have been written to the file yet neither the file's last changed time nor its size changes. If any of the calls to fopen(), fseek(), or fwrite() fail, my program will throw an exception which will result in aborting the data collection process and logging the error. On this machine, none of these failures seem to be occurring. Something that makes the matter even more mysterious is that, if a restore point is set on the host file system, the problem goes away only to re-appear intermittently appear at some future time.
We have tried to reproduce this problem on other machines (a vista 32 bit operating system) but have had no success in replicating the issue (this doesn't necessarily mean anything since the problem is so intermittent in the first place.
Has anyone else encountered anything similar to this? Is there a potential remedy?
Further Information
I have now found that the failure occurs when fflush() is called on the file and that the win32 error that is being returned by GetLastError() is 665 (ERROR_FILE_SYSTEM_LIMITATION). Searching google for this error leads to a bunch of reports related to "extents" for SQL server files. I suspect that there is some sort of journaling resource that the file system is reporting and this because we are growing a large file by opening it, appending a chunk of data, and closing it. I am now looking for understanding regarding this particular error with the hope for coming up with a valid remedy.
The file append is failing because of a file system fragmentation limit. The question was answered in What factors can lead to Win32 error 665 (file system limitation)?

The bizarre case of the file that both is and isn’t there

In .Net 3.5, I have the following code.
If File.Exists(sFilePath & IndexFileName & ".NX") Then
Kill(sFilePath & IndexFileName & ".NX")
End If
At runtime, on one client's machine, I get the following exception, over and over, when this code executes
Source: Microsoft.VisualBasic
TargetSite: Microsoft.VisualBasic.FileSystem.Kill
Message: No files found matching 'I:\RPG\HGIAPVXD.NX'.
StackTrace:
at Microsoft.VisualBasic.FileSystem.Kill(String PathName)
(More trace that identifies the exact line of code.)
There are two people on different machines running this code, but only one of them is getting the exception. The exception does not happen every time, but it is happening regularly. (Multiple times every hour.) The code is not in a loop, nor does it run continuously, more like once every couple of minutes or so.
On the surface, this looks like a race condition, but given how infrequently this code is run and how often the error is happening I think there must be something else going on.
I would appreciate any suggestions on how I can track down what is really going on here. A solution to keep the error from happening would be even better.
I guess the first question to ask is "IS the file really there or not?" and if so, does it have any specical attributes (Is it Read-only or Hidden, or System --- or a Directory)?
Note the Microsoft.VisualBasic.FileSystem.Kill specifically looks for, and silently skips, any file marked "System" or "Hidden". For pretty much any other problem you would have gotten a different exception.
as James pointed out the Kill functions checks if the file in case is a system or hidden, you better use System.IO.File.Delete() instead
Try
System.IO.File.Delete(sFilePath & IndexFileName & ".NX")
Catch ex As System.Exception
...
End Try
using File.Exits is not neccasary because File.Delete() checks this by itself.
Is there any chance that the I: drive is a network drive? it could be some network issue... or then maybe a race condition

Resources