In a SQL Server AlwaysOn configuration, will a transaction log backup to NUL break the AlwaysOn configuration?

Imagine we have two nodes participating in SQL 2012 AO. This is a test instance. During one of the index rebuild operations, the log grew really big (250 GB). We are unable to back it up due to space constraints. What if we back up the T-log to NUL (just to shrink it down) – will that break AlwaysOn?

AlwaysOn is a (marketing) umbrella term that covers both Availability Groups (AGs) and Failover Cluster Instances (FCIs). From context, I assume you are asking about AGs?
For both FCIs and AGs, the short answer is the same: performing transaction log backups (regardless of the destination) will not "break" your HA capabilities. However, I would urge you to NEVER EVER back up to NUL:, unless you don't care about the data in your database. Taking a log backup to NUL: (regardless of whether you are using an AG, FCI, or neither) will break your log backup chain and prevent point-in-time recovery.
If you are using an Availability Group, SQL Server does not use transaction log backups to synchronize between nodes. It uses the transaction log itself, and therefore will not clear the transaction log if there is log data that needs to be synchronized to another node. That is to say: if your AG synchronization is behind, your transaction log will continue to fill/grow until synchronization catches up, regardless of the number of transaction log backups performed.
There are multiple reasons your transaction log might continue to grow, and AG synchronization is just one of those reasons. If SQL Server cannot reuse the transaction log because of unsynchronized transactions in the AG, the log_reuse_wait_desc column in sys.databases will show the value "AVAILABILITY_REPLICA".
Getting back to your root problem: Rebuilding an index made your transaction log get really, really big.
When you perform an ALTER INDEX...REBUILD, SQL Server creates the entire new index (a size-of-data operation), and must be able to roll back the index creation if it errors or is killed prior to completion. Therefore, you may see the log_reuse_wait_desc column in sys.databases showing as "ACTIVE_TRANSACTION" during a very large, long-running index rebuild. The rebuild itself would prevent you from reusing the log, and would cause the log to grow.
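To see which of these is currently preventing log truncation, you can query the column mentioned above (a minimal sketch; the database name is a placeholder, and on an AG you would run it on the primary replica):

```sql
-- Shows why the transaction log cannot currently be truncated/reused.
-- AVAILABILITY_REPLICA -> AG synchronization is holding the log.
-- ACTIVE_TRANSACTION   -> an open transaction (e.g. a long index rebuild) is holding it.
-- LOG_BACKUP           -> the log is simply waiting for a log backup.
SELECT name,
       log_reuse_wait_desc
FROM sys.databases
WHERE name = N'YourDatabase';
```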

Related

Can an Oracle query done after a commit return values prior to the commit when that commit is done with COMMIT_WRITE = NOWAIT?

I have a third-party Java library that, at some point, gets a JDBC connection, starts a transaction, does several batch updates with PreparedStatement.addBatch(), executes the batch, commits the transaction and closes the connection. Almost immediately afterwards (in the span of <10 milliseconds), the library gets another connection and queries one of the records affected by the update.
For the proper functioning of the library, that query should return the updated record. However, in some rare cases, I can see (using P6Spy) that the query returns the record with its values from before the update (and the library fails at some point afterwards due to unexpected data).
I'm trying to understand why this would happen, and I found that in my database (Oracle 19c) there is a parameter COMMIT_WAIT that basically allows a call to commit not to block until the commit is finished, giving an asynchronous commit. So I used SHOW PARAMETERS to see the value of that parameter and found out that COMMIT_WAIT is set to NOWAIT (also, COMMIT_LOGGING was set to BATCH).
I began to speculate whether what was happening was that the call to commit() just started the operation (without waiting for it to finish), and perhaps the next query occurred while the operation was still in progress, returning the values of the record from before the transaction. (The isolation level for all connections is Connection.TRANSACTION_READ_COMMITTED.)
Can COMMIT_WAIT set to NOWAIT cause that kind of scenario? I read that the use of NOWAIT has a lot of risks associated with it, but mostly they refer to things like loss of durability if the database crashes.
Changing the commit behavior should not affect database consistency and should not cause wrong results to be returned.
A little background - Oracle uses REDO for durability (recovering data after an error) and uses UNDO for consistency (making sure the correct results are always returned for any point in time). To improve performance, there are many tricks to reduce REDO and UNDO. But changing the commit behavior doesn't reduce the amount of logical REDO and UNDO; it only delays and optimizes the physical REDO writes.
Before a commit happens, and even before your statements return, the UNDO data used for consistency has been written to memory. Changing the commit behavior won't stop the changes from making their way to the UNDO tablespace.
Per the Database Reference for COMMIT_WAIT, "Also, [the parameter] can violate the durability of ACID (Atomicity, Consistency, Isolation, Durability) transactions if the database shuts down unexpectedly." Since the manual is already talking about the "D" in ACID, I assume it would also explicitly mention if the parameter affects the "C".
On the other hand, the above statements are all just theory. It's possible that there's some UNDO optimization bug that's causing the parameter to break something. But I think that would be extremely unlikely. Oracle goes out of its way to make sure that data is never lost or incorrect. (I know because even when I don't want REDO or UNDO it's hard to turn them off.)
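If you want to rule the parameter out empirically, you could check the effective settings and force synchronous commits for the sessions in question (a minimal sketch; overriding at the session level is just one option, the same could be done with ALTER SYSTEM):

```sql
-- Check the instance-level settings discussed above.
SELECT name, value
FROM   v$parameter
WHERE  name IN ('commit_wait', 'commit_logging');

-- Force synchronous, immediately-logged commits for the current session only,
-- to test whether the observed behavior changes.
ALTER SESSION SET COMMIT_WAIT = WAIT;
ALTER SESSION SET COMMIT_LOGGING = IMMEDIATE;
```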

Oracle v19: can ongoing transactions block concurrent deletes on involved tables for extended periods?

We have a severe issue with threads hanging in operations against an Oracle DB (v19, accessed via JDBC connections).
The situation frequently happens while our application runs a big transaction within which it does a lot of major (i.e. quite complicated, lots of joins, etc.) queries and then updates a bunch of rows. These transactions can take several minutes.
As far as we have been able to analyze, the transaction processing blocks other concurrent tasks when they try to delete individual entries from tables that are involved in said transaction. Concurrent selects and also updates to these same tables work fine! It's only deletes that have issues! And, as far as we were able to "prove", this happens even for deletes of individual entries that for sure do not interfere with or touch any entry involved in the ongoing transaction.
While we first suspected Hibernate of interfering and doing funny things for deletions, we had to learn that even deletes executed via SQL Developer (i.e. triggered "manually" from a completely unrelated DB session and client) hang during such periods.
To us it almost seems as if an ongoing transaction does not only lock specific rows against manipulation but locks entire tables.
But can it really be that a transaction blocks entire tables against concurrent delete operations for extended periods?
We think that would be absurd, but - as we had to learn and can easily reproduce - deleting entries from tables touched by our long-running transaction invariably hangs. Several times we also witnessed that - as soon as the transaction finishes - those deletes that haven't timed out yet continue and run to completion.
We are not aware of doing anything weird or unusual in our Hibernate-based application. We certainly don't fiddle with any locking mechanisms or such. Any idea or hint as to what could cause these hangs and/or in which direction to investigate further to resolve this?
Later addition:
We are currently considering the following work-around: we add a column to these tables in which we mark entries as "to-be-deleted" (instead of actually deleting them as we do now). We then run a regular job at certain times (e.g. nightly) which actually deletes these entries. We "only" need to make sure that no transaction is ever executed on these tables while that delete job runs.
I really hate that approach, especially since it will require adding another condition to many queries to exclude those "virtually deleted" entries, but we have no better idea so far.
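One direction for further investigation (a sketch, assuming access to the v$ views) is to check, while a delete is hanging, what it is actually waiting on and which session is blocking it:

```sql
-- Run while a delete is hanging: shows the wait event of each blocked session
-- and the session that is (ultimately) blocking it.
SELECT sid,
       serial#,
       event,
       blocking_session,
       final_blocking_session,
       seconds_in_wait
FROM   v$session
WHERE  blocking_session IS NOT NULL;
```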

Do redo logs store all the changes applied to the database buffer cache?

I know that redo log entries are created when an insert/update/delete/create/drop/alter occurs. What information gets stored in the redo log? In the case of an instance failure, the redo log file is used to recover the database - does it contain information on the changes applied to the database buffer cache?
If one does redo log mining (to view exactly what is in the redo logs), there is a view that tells you what they store: V$LOGMNR_CONTENTS.
This typically shows:
- The operation: INSERT, UPDATE, DELETE, or DDL
- The SCN (system change number) - very important for recovery
- The transaction to which a change belongs
- The table and schema name of the modified object
- The name of the user who issued the DDL or DML
- The SQL needed to redo and/or undo your changes
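If you want to see those entries for yourself, a minimal LogMiner session looks roughly like this (a sketch; the redo log file name and the schema filter are hypothetical, and you need the appropriate LogMiner privileges):

```sql
-- Register a redo log file and start LogMiner using the online dictionary.
BEGIN
  DBMS_LOGMNR.ADD_LOGFILE(
    logfilename => '/u01/app/oracle/oradata/ORCL/redo01.log',
    options     => DBMS_LOGMNR.NEW);
  DBMS_LOGMNR.START_LOGMNR(
    options => DBMS_LOGMNR.DICT_FROM_ONLINE_CATALOG);
END;
/

-- The columns listed above.
SELECT scn, operation, username, seg_owner, table_name, sql_redo, sql_undo
FROM   v$logmnr_contents
WHERE  seg_owner = 'HR';   -- hypothetical schema filter

-- Close the LogMiner session when done.
EXECUTE DBMS_LOGMNR.END_LOGMNR;
```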
So yes, the redo logs contain the changes that were applied to buffers in the buffer cache, and they are used to reconstruct the database in the case of a failure. They also protect rollback data, as the SQL for both redo and undo is stored and played back during recovery.
V$LOG and V$LOGFILE show how your redo log files are allocated. You would generally want them in pairs, so you have a backup in case one member is lost. You also want at least 3 groups (of pairs), as some are active, some are current, and some are being written to the archive logs, which are also critical for recovery.

Kafka Streams with lookup data on HDFS

I'm writing an application with Kafka Streams (v0.10.0.1) and would like to enrich the records I'm processing with lookup data. This data (a timestamped file) is written into an HDFS directory on a daily basis (or 2-3 times a day).
How can I load this in the Kafka Streams application and join to the actual KStream?
What would be the best practice to reread the data from HDFS when a new file arrives there?
Or would it be better to switch to Kafka Connect and write the RDBMS table content to a Kafka topic which can be consumed by all the Kafka Streams application instances?
Update:
As suggested, Kafka Connect would be the way to go. Because the lookup data is updated in the RDBMS on a daily basis, I was thinking about running Kafka Connect as a scheduled one-off job instead of keeping the connection always open. Yes, because of the semantics and the overhead of keeping a connection always open and making sure that it won't be interrupted, etc. For me, having a scheduled fetch looks safer in this case.
The lookup data is not big, and records may be deleted/added/modified. I also don't know how I could always push a full dump into a Kafka topic and truncate the previous records. Enabling log compaction and sending null values for the keys that have been deleted probably won't work, as I don't know what has been deleted in the source system. Additionally, AFAIK I have no control over when compaction happens.
The recommended approach is indeed to ingest the lookup data into Kafka, too -- for example via Kafka Connect -- as you suggested above yourself.
But in this case, how can I schedule the Connect job to run on a daily basis rather than continuously fetching from the source table, which is not necessary in my case?
Perhaps you can update your question to explain why you do not want to have a continuous Kafka Connect job running? Are you concerned about resource consumption (load on the DB), about the semantics of the processing if it's not "daily updates", or...?
Update (responding to the question update above about preferring a scheduled fetch):
Kafka Connect is safe, and the JDBC connector has been built for exactly the purpose of feeding DB tables into Kafka in a robust, fault-tolerant, and performant way (there are many production deployments already). So I would suggest not falling back to a "batch update" pattern just because it "looks safer"; personally, I think triggering daily ingestions is operationally less convenient than just keeping the connector running for continuous (and real-time!) ingestion, and it also leads to several downsides for your actual use case (see next paragraph).
But of course, your mileage may vary -- so if you are set on updating just once a day, go for it. But you lose a) the ability to enrich your incoming records with the very latest DB data at the point in time when the enrichment happens, and, conversely, b) you might actually enrich the incoming records with stale/old data until the next daily update completes, which most probably will lead to incorrect data being sent downstream / made available to other applications for consumption. If, for example, a customer updates her shipping address (in the DB) but you only make this information available to your stream processing app (and potentially many other apps) once per day, then an order-processing app will ship packages to the wrong address until the next daily ingest completes.
Regarding the concern about full dumps, deletes, and log compaction:
The JDBC connector for Kafka Connect already handles this automatically for you: 1. it ensures that DB inserts/updates/deletes are properly reflected in a Kafka topic, and 2. Kafka's log compaction ensures that the target topic doesn't grow out of bounds. You may want to read up on the JDBC connector in the docs to learn which functionality you get for free: http://docs.confluent.io/current/connect/connect-jdbc/docs/
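For a rough idea of how that works in timestamp+incrementing mode, the connector detects new and changed rows with a polling query along these lines (a simplified sketch, not the connector's exact SQL; lookup_table, updated_at, and id are placeholder names):

```sql
-- Each poll fetches only rows changed since the last recorded offset
-- (last_seen_ts / last_seen_id are tracked by the connector).
SELECT *
FROM   lookup_table
WHERE  updated_at > :last_seen_ts
   OR (updated_at = :last_seen_ts AND id > :last_seen_id)
ORDER BY updated_at, id;
```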

Does using NOLOGGING in Oracle break ACID, specifically during a power outage?

When using NOLOGGING in Oracle, say for inserting new records, will my database be able to gracefully recover from a power outage if it randomly went down during the insert?
Am I correct in stating that the UNDO logs will be used for such recoveries, as opposed to the REDO logs, which would be used for recovery if the main datafiles were physically corrupted?
It seems to me, you're muddling some concepts together here.
First, let's talk about instance recovery. Instance recovery is what happens following a database crash, whether it is killed, the server goes down, etc. On instance startup, Oracle will read data from the redo logs and roll forward, writing all pending changes to the datafiles. Next, it will read undo, determine which transactions were not committed, and use the data in undo to roll back any changes that had not been committed up to the time of the crash. In this way, Oracle guarantees to have recovered up to the last committed transaction.
Now, as to direct loads and NOLOGGING. It's important to note that NOLOGGING is only valid for direct loads. This means that updates and deletes are never NOLOGGING, and that an INSERT is only NOLOGGING if you specify the APPEND hint.
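To make that concrete, here is what a direct-path NOLOGGING load typically looks like (a sketch; my_table and staging_table are hypothetical names):

```sql
-- Mark the table as NOLOGGING so a direct load can skip redo for the data blocks.
ALTER TABLE my_table NOLOGGING;

-- The APPEND hint requests a direct-path insert: new blocks are formatted and
-- written above the high water mark instead of going through conventional inserts.
INSERT /*+ APPEND */ INTO my_table
SELECT * FROM staging_table;

COMMIT;

-- A conventional INSERT (no APPEND hint), UPDATE, or DELETE is always fully
-- logged, regardless of the table's NOLOGGING attribute.
```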
It's important to understand that when you do a direct load, you are literally "directly loading" data into the datafiles. So, no need to worry about issues around instance recovery, etc. When you do a NOLOGGING direct load, data is still written directly to the datafiles.
It goes something like this. You do a direct load (for now, let's set aside the issue of NOLOGGING), and data is loaded directly into the datafiles. The way that happens is that Oracle will allocate storage from above the high water mark (HWM), and format and load those brand new blocks directly. When that block allocation is made, the data dictionary updates that describe the space allocation are written to and protected by redo. Then, when your transaction commits, the changes become permanent.
Now, in the event of an instance crash, either the transaction was committed (in which case the data is in the datafiles and the data dictionary reflects those new extents have been allocated), or it was not committed, and the table looks exactly like it did before the direct load began. So, again, data up to and including the last committed transaction is recovered.
Now, NOLOGGING. Whether a direct load is logged or not is irrelevant for the purposes of instance recovery. It will only come into play in the event of media failure and media recovery.
If you have a media failure, you'll need to recover from backup. So, you'll restore the corrupted datafile and then apply redo, from archived redo logs, to "play back" the transactions that occurred from the time of the backup to the current point in time. As long as all the changes were logged, this is not a problem, as all the data is there in the redo logs. However, what will happen in the event of a media failure subsequent to a NOLOGGING direct load?
Well, when the redo is applied to the segments that were loaded with NOLOGGING, the required data is not in the redo. So, those data dictionary transactions that I mentioned, the ones that created the new extents where data was loaded, are in the redo, but there is nothing to populate those blocks. So, the extents are allocated to the segment, but are also marked as invalid. So, if/when you attempt to select from the table and hit those invalid blocks, you'll get ORA-26040 "data was loaded using the NOLOGGING option". This is Oracle letting you know you have data corruption caused by recovery through a NOLOGGING operation.
So, what to do? Well, first off, any time you load data with NOLOGGING, make sure you can re-run the load if necessary. That way, if you suffer an instance failure during the load, you can restart the load, or if you suffer a media failure between the time of the NOLOGGING load and the next backup, you can re-run the load.
Note that, in the event of a NOLOGGING direct load, you're only exposed to data loss until your next backup of the datafiles/tablespaces containing the segments that had the direct load. Once it's protected by backup, you're safe.
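One way to check whether you are still in that exposure window (a sketch, assuming access to V$DATAFILE) is to look for datafiles with unrecoverable changes newer than their last backup:

```sql
-- Datafiles that have had NOLOGGING (unrecoverable) changes, and when;
-- compare UNRECOVERABLE_TIME against the time of your last backup of each file.
SELECT file#,
       unrecoverable_change#,
       unrecoverable_time
FROM   v$datafile
WHERE  unrecoverable_time IS NOT NULL;
```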
Hope this helps clarify the ideas around direct loads, NOLOGGING, instance recovery, and media recovery.
If you use NOLOGGING, you don't care about the data. NOLOGGING operations should be recoverable with procedures other than the regular database recovery procedures. Many times the recovery will happen without problems. The problem is when you have a power failure on the storage. In that case you might end up corrupting the online redo log that was active, and because of that also have problems with corrupt undo segments.
So, specifically in your case: I would not bet on it.
Yes, much of the recovery would be done by reading undo, but that might get stuck because of exactly the situation you described. That is one of the nastiest problems to recover from.
As for being 100% ACID compliant: a DBMS needs to be serializable, and this is very rare even amongst major vendors. To be serializable, read, write, and range locks need to be held until the end of a transaction. There are no read locks in Oracle, so Oracle is not 100% ACID compliant.
