Errors: ORA-00604 & ORA-01578 & ORA-01110 - Oracle

I get ORA-00604, ORA-01578 and ORA-01110. Does anyone have a solution?

That doesn't sound good.
ORA-01578: ORACLE data block corrupted (file # string, block # string)
Cause: The data block indicated was corrupted, mostly due to software errors.
Action: Try to restore the segment containing the block indicated.
This may involve dropping the segment and recreating it.
If there is a trace file, report the errors in it to your ORACLE representative.
The system02 database file is corrupted; there's a possibility that the hard disk crashed, and you should check it for errors. As this is the database server, I presume it would be safer to replace the disk instead of just fixing it, because once one data block gets corrupted, there's a good chance it will happen again (according to Murphy's law, at least).
Furthermore, it means that you'll have to restore the database. I hope you have a backup (one taken BEFORE the corruption happened).

Related

ClickHouse log shows hash of uncompressed files doesn't match

ClickHouse logs printed the error messages as below frequently:
2021.01.07 00:55:24.112567 [ 6418 ] {} <Error> vms.analysis_data (7056dab3-3677-455b-a07a-4d16904479b4):
Code: 40, e.displayText() = DB::Exception: Checksums of parts don't match:
hash of uncompressed files doesn't match (version 20.11.4.13 (official build)).
Data after merge is not byte-identical to data on another replicas. There could be several reasons:
1. Using newer version of compression library after server update.
2. Using another compression method.
3. Non-deterministic compression algorithm (highly unlikely).
4. Non-deterministic merge algorithm due to logical error in code.
5. Data corruption in memory due to bug in code.
6. Data corruption in memory due to hardware issue.
7. Manual modification of source data after server startup.
8. Manual modification of checksums stored in ZooKeeper.
9. Part format related settings like 'enable_mixed_granularity_parts' are different on different replicas.
We will download merged part from replica to force byte-identical result.
We use the same version (20.11.4.13) and the same compression method (LZ4) for all data nodes in the production environment, and we have not modified the data files or the values stored in ZooKeeper.
So my questions are:
How was the error caused? More specifically, in which cases will the ClickHouse server throw these exceptions?
Is there a checksum-checking mechanism among the replicas while merging parts?
I also found that one of our data nodes has many folders named like "ignored_20201208_23116_23116_0" in the detached folder. Are these the corrupted data caused by the problem described above?
Thanks.
You need to upgrade all nodes to 20.11.6.6 ASAP.
The cause of these errors is a serious bug related to AIO.
The ignored_ parts are not related; you can remove them.
Inactive parts are not deleted immediately because fsync is not called when writing a new part, i.e. for some time the new part exists only in the server's RAM (OS cache). So if the server reboots spontaneously, a new (merged) part can be lost or damaged. During startup, ClickHouse checks the integrity of the parts; if it detects a problem with the merged part, it returns the inactive parts to the active list and merges them again later, and the broken part is renamed (the prefix broken_ is added) and moved to the detached folder. If the integrity check finds no problems with the merged part, the original inactive parts are renamed (the prefix ignored_ is added) and moved to the detached folder.
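As a side note, the durability point above is generic: data written through the normal write path lands only in the OS page cache and survives a sudden reboot only after an explicit fsync. A minimal POSIX C illustration of that idea (not ClickHouse code, names are illustrative):

#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Write a buffer to a file and force it to stable storage. Without the
   fsync() call, the data may sit only in the OS cache and vanish if the
   machine loses power or reboots spontaneously. */
int write_durably(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    ssize_t n = write(fd, buf, len);        /* lands in the page cache only */
    if (n < 0 || (size_t)n != len) {
        close(fd);
        return -1;
    }

    if (fsync(fd) != 0) {                   /* now it is on stable storage */
        close(fd);
        return -1;
    }
    return close(fd);
}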

Oracle 12c startup error: ORA-00093: _shared_pool_reserved_min_alloc must be between 4000 and 0

We have a number of databases at our company, among them an Oracle 12c (12.2.0.1.0 to be precise), but we have no (qualified) Oracle DBAs. The performance has deteriorated drastically in the last 6 months or so, and I now have the task of finding out why. My research suggested that I should increase some memory parameters in the initDBN.ora file. Here's what the original looked like:
DBN.__data_transfer_cache_size=0
DBN.__db_cache_size=50331648
DBN.__inmemory_ext_roarea=0
DBN.__inmemory_ext_rwarea=0
DBN.__java_pool_size=79691776
DBN.__large_pool_size=8388608
DBN.__oracle_base='/orabin/app/oracle'#ORACLE_BASE set from environment
DBN.__pga_aggregate_target=197132288
DBN.__sga_target=734003200
DBN.__shared_io_pool_size=12582912
DBN.__shared_pool_size=536870912
DBN.__streams_pool_size=4194304
*.audit_file_dest='/orabin/app/oracle/admin/tmf/adump'
*.audit_trail='db'
*.compatible='12.2.0'
*.control_files='/orabin/app/oracle/oradata/tmf/control01.ctl','/orabin/app/oracle/fast_recovery_area/tmf/control02.ctl'
*.db_16k_cache_size=8388608
*.db_32k_cache_size=8388608
*.db_4k_cache_size=8388608
*.db_block_size=8192
*.db_domain='ubs-hainer.com'
*.db_name='tmf'
*.db_recovery_file_dest='/orabin/app/oracle/fast_recovery_area/tmf'
*.db_recovery_file_dest_size=4096m
*.diagnostic_dest='/orabin/app/oracle'
*.dispatchers='(PROTOCOL=TCP) (SERVICE=TMFXDB)'
*.local_listener='LISTENER_TMF'
*.memory_max_target=0
*.nls_language='GERMAN'
*.nls_territory='GERMANY'
*.open_cursors=300
*.pga_aggregate_target=188m
*.processes=300
*.remote_login_passwordfile='EXCLUSIVE'
*.sga_target=700m
*.shared_pool_size=536870912
*.streams_pool_size=4194304
*.undo_tablespace='UNDOTBS1'
Please don't blame me for this, because I did not write it. It certainly doesn't look like the sample init.ora and I am not at all certain where the syntax came from. The values I changed were:
DBN.__sga_target=1024m
*.sga_target=1024m
*.memory_max_target=1408m
DBN.__pga_aggregate_target=384m and *.pga_aggregate_target=384m
That's the order in which I made the changes. After each change I used sqlplus to first recreate the spfile with:
create spfile='spfileDBN.ora' from pfile='initDBN.ora';
This was followed by an attempt to start the database with startup nomount. In each case I got an error message which led me to make the next change.
Finally I got the error which is in the title of this post. When I tried to search for information on this, the findings were grim. Mostly the information dealt with other parameters and did not explain what this error actually meant. The only thing that gave any real background was this link from Burleson Consulting. It didn't really help me solve the problem, so I decided to revert the initDBN.ora file and do some more research. A slow database is generally better than no database.
But hey! I still get that same error, even after reverting to the original init file. I'm getting desperate now. I have no idea how to fix this. From what I've read to date, setting "underscore variables" in your init file is a "NO NO".
Can anybody provide me with some helpful tips as to how to get rid of this error?
We don't know if the apps running on this database need specific block sizes, but if the priority is getting the database open, you can shrink the init.ora down to the smallest, simplest set of parameters that gets you moving forward.
*.audit_file_dest='/orabin/app/oracle/admin/tmf/adump'
*.audit_trail='db'
*.compatible='12.2.0'
*.control_files='/orabin/app/oracle/oradata/tmf/control01.ctl','/orabin/app/oracle/fast_recovery_area/tmf/control02.ctl'
*.db_block_size=8192
*.db_domain='ubs-hainer.com'
*.db_name='tmf'
*.db_recovery_file_dest='/orabin/app/oracle/fast_recovery_area/tmf'
*.db_recovery_file_dest_size=4096m
*.diagnostic_dest='/orabin/app/oracle'
*.dispatchers='(PROTOCOL=TCP) (SERVICE=TMFXDB)'
*.local_listener='LISTENER_TMF'
*.nls_language='GERMAN'
*.nls_territory='GERMANY'
*.open_cursors=300
*.pga_aggregate_target=188m
*.processes=300
*.remote_login_passwordfile='EXCLUSIVE'
*.sga_target=1000m
*.undo_tablespace='UNDOTBS1'
should get you an open database. Notice I have bumped up the sga_target to 1000m, which is about the minimum you need to get a database started. The true values for sga_target and pga_aggregate_target really need to be set based on your expected usage and the server configuration. But the init.ora above should get your database running.
I am not sure that this really qualifies as a "solution", but it does fix the initial problem. As mentioned in my reply to Connor McDonald, I set the parameter _shared_pool_reserved_min_alloc to 3000 in the initDBN.ora file, which I copied from Connor's example (thanks for that). After recreating the spfile and trying to restart the database, I got the following error:
ORA-00093: _shared_pool_reserved_min_alloc must be between 4000 and 11953766
That got me thinking that the value 0 in the original error was probably a stand-in value that really means "the maximum allowed". By actually setting the parameter, I have apparently managed to generate an error which is more meaningful.
The value of _shared_pool_reserved_min_alloc is now set to 4200, which is a value I recall reading in one of the less helpful posts. (No, that post did not say that this is a value that should be used, just that it could be used.) This time, after re-creating the spfile I was able to start the database.
Before I do any more fiddling with parameters, I will do a bit more research... or maybe a lot more.

Appropriate way to cancel saving file via file stream?

A tool I'm writing is responsible for downloading thousands of image files over a matter of many hours. Originally, using TIdHTTP, I would Get the file(s) into a TMemoryStream, and then save that to a file, so long as there were no exceptions. In order to improve speed, I changed the TMemoryStream to a TFileStream.
However, now if the resource is not found, or any other exception occurs that results in no actual file, it still saves an empty file.
Completely understandable, since I simply create a file stream just prior to the download...
FileStream := TFileStream.Create(FileName, fmCreate);
try
  Web.Get(AURL, FileStream);
finally
  FileStream.Free;
end;
I know I could simply delete the file if there was an exception. But it seems far too sloppy. I'm sure there's a more appropriate method of aborting such a situation.
How should I make this to not save a file if there was an exception, while not altering the performance (if at all possible)?
How should I make this to not save a file if there was an exception, while not altering the performance (if at all possible)?
This isn't possible in general. Errors and failures can happen at any step of the way, including partway through the download. Once this point is understood, you must accept that the file can be partially downloaded and then abandoned. At which point, where do you store it?
The obvious choices are memory and file. You don't want to store it in memory, which leaves a file.
This takes you back to your current solution.
I know I could simply delete the file if there was an exception.
This is the correct approach. There are a few variants on this. For instance you might download to a temporary file that is created with flags to arrange its deletion when closed. Only if the download completes do you then copy to the true destination. This is the approach that a browser takes. But the basic idea is to download to file and deal with any failure by tidying up.
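For illustration, here is a minimal sketch in plain C of the "write to a temporary file, promote it only on success" idea (the download() function is a hypothetical placeholder for whatever HTTP client fetches the resource, and this variant uses rename-on-success/delete-on-failure rather than OS delete-on-close flags):

#include <stdio.h>

/* download() stands in for whatever HTTP client actually fetches the
   resource into the already-open file; it returns 0 on success. */
extern int download(const char *url, FILE *out);

int fetch_to_file(const char *url, const char *final_path)
{
    char tmp_path[1024];
    snprintf(tmp_path, sizeof tmp_path, "%s.part", final_path);

    FILE *tmp = fopen(tmp_path, "wb");
    if (tmp == NULL)
        return -1;

    int ok = (download(url, tmp) == 0);

    if (fclose(tmp) != 0)      /* fclose can fail too, e.g. disk full on flush */
        ok = 0;

    if (!ok) {
        remove(tmp_path);      /* tidy up the partial file */
        return -1;
    }

    remove(final_path);        /* rename() fails on Windows if the target exists */
    return rename(tmp_path, final_path) == 0 ? 0 : -1;
}

The key point is the same as deleting on exception: the destination path never holds a partial file.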
Instead of downloading the entire image in one go, you could consider using HTTP range requests if the server supports them. Then you could split the file into smaller parts, requesting the next part after the first finishes (or even requesting multiple parts at the same time to increase performance). If there is an exception, you can abort the future requests, so they never start in the first place (a rough sketch in C follows at the end of this answer).
YouTube and a number of streaming media sites started doing this a while ago. It used to be if you started playing a video, then paused it, then it would eventually cache the entire video. Now it only caches a little ahead of the current position. This saves a ton of bandwidth because of the abandon rate for videos.
You could write the partial file to disk or keep it in memory.
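The question uses Delphi/Indy, but purely as an illustration of the range-request idea, a single ranged fetch looks roughly like this with libcurl in C (chunk boundaries are arbitrary; a real tool would loop over ranges and stop scheduling further requests once one fails):

#include <curl/curl.h>
#include <stdio.h>

/* Fetch a single byte range of a resource into an open file.
   libcurl's default write callback fwrite()s into CURLOPT_WRITEDATA.
   curl_global_init() is omitted for brevity. */
int fetch_range(const char *url, FILE *out, long start, long end)
{
    CURL *curl = curl_easy_init();
    if (curl == NULL)
        return -1;

    char range[64];
    snprintf(range, sizeof range, "%ld-%ld", start, end);  /* e.g. "0-65535" */

    curl_easy_setopt(curl, CURLOPT_URL, url);
    curl_easy_setopt(curl, CURLOPT_RANGE, range);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, out);

    CURLcode res = curl_easy_perform(curl);
    curl_easy_cleanup(curl);
    return res == CURLE_OK ? 0 : -1;
}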

What can lead to failures in appending data to a file?

I maintain a program that is responsible for collecting data from a data acquisition system and appending that data to a very large (size > 4GB) binary file. Before appending data, the program must validate the header of this file in order to ensure that the meta-data in the file matches that which has been collected. In order to do this, I open the file as follows:
data_file = fopen(file_name, "rb+");
I then seek to the beginning of the file in order to validate the header. When this is done, I seek to the end of the file as follows:
_fseeki64(data_file, _filelengthi64(_fileno(data_file)), SEEK_SET);
At this point, I write the data that has been collected using fwrite(). I am careful to check the return values from all I/O functions.
One of the computers (Windows 7, 64-bit) on which we have been testing this program intermittently shows a condition where the data appears to have been written to the file, yet neither the file's last-changed time nor its size changes. If any of the calls to fopen(), fseek(), or fwrite() fail, my program will throw an exception, which will result in aborting the data collection process and logging the error. On this machine, none of these failures seem to be occurring. Something that makes the matter even more mysterious is that, if a restore point is set on the host file system, the problem goes away, only to re-appear intermittently at some future time.
We have tried to reproduce this problem on other machines (a Vista 32-bit operating system) but have had no success in replicating the issue (this doesn't necessarily mean anything, since the problem is so intermittent in the first place).
Has anyone else encountered anything similar to this? Is there a potential remedy?
Further Information
I have now found that the failure occurs when fflush() is called on the file, and that the Win32 error being returned by GetLastError() is 665 (ERROR_FILE_SYSTEM_LIMITATION). Searching Google for this error leads to a bunch of reports related to "extents" for SQL Server files. I suspect that the file system is running into some sort of journaling resource limit because we are growing a large file by repeatedly opening it, appending a chunk of data, and closing it. I am now looking for an understanding of this particular error in the hope of coming up with a valid remedy.
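For reference, a condensed sketch of the append path described above with every step checked, including capturing GetLastError() when fflush() fails; the header validation is only indicated by a comment, and names are illustrative:

#include <stdio.h>
#include <io.h>        /* _filelengthi64, _fileno */
#include <windows.h>   /* GetLastError */

int append_block(const char *file_name, const void *buf, size_t len)
{
    FILE *data_file = fopen(file_name, "rb+");
    if (data_file == NULL)
        return -1;

    /* ... read and validate the header here ... */

    if (_fseeki64(data_file, _filelengthi64(_fileno(data_file)), SEEK_SET) != 0)
        goto fail;

    if (fwrite(buf, 1, len, data_file) != len)
        goto fail;

    if (fflush(data_file) != 0) {
        /* This is where ERROR_FILE_SYSTEM_LIMITATION (665) showed up. */
        fprintf(stderr, "fflush failed, Win32 error %lu\n",
                (unsigned long)GetLastError());
        goto fail;
    }

    fclose(data_file);
    return 0;

fail:
    fclose(data_file);
    return -1;
}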
The file append is failing because of a file system fragmentation limit. The question was answered in What factors can lead to Win32 error 665 (file system limitation)?

Transaction implementation for a simple file

I'm part of a team writing an application for embedded systems. The application often suffers from data corruption caused by power loss. I thought that implementing some kind of transactions would stop this from happening. One scenario would involve copying the area of a file to some additional storage (a transaction log) before writing to it. What are the other possibilities?
Databases use a variety of techniques to assure that the state is properly persisted.
The DBMS often retains a replicated control file -- several synchronized copies on several devices. Two is enough, more if you're paranoid. The control file provides a few key parameters used to locate the other files and their expected states. The control file can include a "database version number".
Each file has a "version number" in several forms. Often it is stored in plain form plus as an XOR complement, so that the two version numbers can be trivially checked for the correct relationship and matched against the control file version number.
All transactions are written to a transaction journal. The transaction journal is then written to the database files.
Before writing to database files, the original data block is copied to a "before image journal", or rollback segment, or some such.
When the block is written to the file, the sequence numbers are updated, and the block is removed from the transaction journal.
You can read up on RDBMS techniques for reliability.
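To make the before-image idea above concrete, here is a minimal sketch in C. The block size, layout and names are illustrative; a real implementation would fsync rather than just fflush, and would record enough in the journal to replay or discard incomplete entries at startup:

#include <stdio.h>

#define BLOCK_SIZE 4096   /* illustrative block size */

/* Overwrite one block of the data file, journaling its old contents first
   so the update can be undone after a power failure. */
int write_block_journaled(FILE *data, FILE *journal, long block_no,
                          const unsigned char new_block[BLOCK_SIZE])
{
    unsigned char old_block[BLOCK_SIZE];

    /* 1. Read the block's current contents. */
    if (fseek(data, block_no * BLOCK_SIZE, SEEK_SET) != 0) return -1;
    if (fread(old_block, 1, BLOCK_SIZE, data) != BLOCK_SIZE) return -1;

    /* 2. Record block number + before-image in the journal and flush it
          before touching the data file. */
    if (fwrite(&block_no, sizeof block_no, 1, journal) != 1) return -1;
    if (fwrite(old_block, 1, BLOCK_SIZE, journal) != BLOCK_SIZE) return -1;
    if (fflush(journal) != 0) return -1;

    /* 3. Now overwrite the block in place. */
    if (fseek(data, block_no * BLOCK_SIZE, SEEK_SET) != 0) return -1;
    if (fwrite(new_block, 1, BLOCK_SIZE, data) != BLOCK_SIZE) return -1;
    if (fflush(data) != 0) return -1;

    /* 4. Once the data write is durable, the journal entry can be dropped. */
    return 0;
}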
There are a number of ways to do this; generally the only assumption required is that small writes (<4k) are atomic. For example, here's how CouchDB does it:
A 4k header contains, amongst other things, the file offset of the root of the BTree containing all the data.
The file is append-only. When updates are required, write the update to the end of the file, followed by any modified BTree nodes, up to and including the root. Then, flush the data, and write the new address of the root node to the header.
If the program dies while writing an update but before writing the header, the extra data at the end of the file is discarded. If it fails after writing the header, the write is complete and all is well. Because the file is append-only, these are the only failure scenarios. This also has the advantage of providing multi-version concurrency control with no read locks.
When the file grows too long, simply read out all the 'live' data and write it to a new file, then delete the original.
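A bare-bones sketch of that append-only scheme in C might look like this. It is purely illustrative: a real implementation would checksum the header, use fsync between the data append and the header write, and handle files larger than ftell can address:

#include <stdint.h>
#include <stdio.h>

/* The fixed-size header at the start of the file, trimmed here to the
   single field that matters for the sketch: the offset of the root node. */
struct header {
    uint64_t root_offset;
};

int commit_update(FILE *f, const void *node, size_t node_len)
{
    /* 1. Append the new data and tree nodes at the end of the file. */
    if (fseek(f, 0, SEEK_END) != 0) return -1;
    uint64_t new_root = (uint64_t)ftell(f);
    if (fwrite(node, 1, node_len, f) != node_len) return -1;
    if (fflush(f) != 0) return -1;   /* real code: fsync before the header write */

    /* 2. Only now rewrite the header to point at the new root. Readers that
          still see the old header see the old, still-valid tree. */
    struct header h = { new_root };
    if (fseek(f, 0, SEEK_SET) != 0) return -1;
    if (fwrite(&h, sizeof h, 1, f) != 1) return -1;
    return fflush(f) == 0 ? 0 : -1;
}

If the process dies before the header write completes, the appended bytes are simply dead space at the end of the file, which matches the failure scenarios described above.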
You can avoid implementing such transaction logs yourself by using existing transaction managers around file-systems, e.g. XADisk.
The old link is no longer available; a GitHub repo is here.
