Vertica: Moveout error while moving partitions

In my ETL processes, I have a few lines like:
select move_partitions_to_table ('stg.A', 0, 24, 'tmp_table.A')
erroring out with
ERROR: A Moveout operation is already in progress on projection stg.B_all_nodes_b0 [txnid 45035996338570033 session fqdn-25466:0xd053]
In short, a moveout on table A fails because a moveout on table B is currently in progress. As a side note, the error is raised instantly, not after a timeout expires.
Does that make sense? One moveout per table I can understand, but only one moveout at a time for the whole cluster surely cannot be right?
I did look at the tm resource pool; its planned and max concurrency are 4 and 6 respectively. I raised the max concurrency to 20, but this made no difference at all.
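For reference, the concurrency change was made with a statement along these lines (tm is the built-in Tuple Mover resource pool):
-- Raise MAXCONCURRENCY on the built-in Tuple Mover pool
-- (sketch of the change described above; it did not help here).
ALTER RESOURCE POOL tm MAXCONCURRENCY 20;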
This error happens only on Vertica 7.1.0-1. I upgraded yesterday from 6.3, and this is the only issue I have had since the upgrade.
Would anybody have any idea how I could fix this short of downgrading?
Thanks,

It turns out that move_partitions_to_table() does trigger a moveout. This is expected, as Vertica first tries to flush the WOS to disk before moving the partition to another table.
The issue is a bug in Vertica 7.1.0-1: move_partitions_to_table() triggers a moveout for all tables instead of only the table in question. As a result, you cannot run two move_partitions_to_table() calls in parallel, even on two unrelated tables.
Vertica support is now aware of this, and a fix is expected in an upcoming release.
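Until the fix is released, one possible workaround is to serialize the calls yourself, for example by waiting until no moveout is running before issuing the next move_partitions_to_table(). A minimal sketch, with view and column names assumed from v_monitor.tuple_mover_operations:
-- Workaround sketch: before the next move_partitions_to_table() call,
-- check whether any moveout is still executing anywhere on the cluster.
SELECT node_name, table_schema, table_name, operation_status
FROM v_monitor.tuple_mover_operations
WHERE operation_name ILIKE 'Moveout'
  AND is_executing;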

Related

Pentaho Dimension Lookup/Update deadlock error

I have a Dimension Lookup/update step, and I am trying to update a table with data from JSON files, but it is failing with the following error:
2021/08/03 12:51:58 - dlu-insrting_in_table.0 - ERROR (version 9.1.0.0-324, build 9.1.0.0-324 from 2020-09-07 05.09.05 by buildguy) : Because of an error this step can't continue:
2021/08/03 12:51:58 - dlu-insrting_in_table.0 - Couldn't get row from result set
2021/08/03 12:51:58 - dlu-insrting_in_table.0 - Transaction (Process ID 78) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
(The original post includes screenshots of the Dimension Lookup/update step configuration and of part of the transformation.)
If I use only one copy to start the step, everything works fine, but if I use more than one copy I get the error above. The odd thing is that the error seems random: sometimes it crashes after inserting two rows, other times it inserts everything without any error.
Searching the documentation and the internet didn't help much; I was not able to fix it. I read that it could be an insertion-order problem or a primary-key problem, but the data is fine (the keys are unique) and the configuration of the step seems ok. What I did notice is that the technical key is not inserted in order; I think that is because it depends on which process finishes first, but I can't find a way to force the order (assuming this is the problem).
Does anyone know what the problem is here and how I could fix it? Thank you.
Don't run multiple copies of the Lookup/Update step. It has a commit size of 100, so if you have two copies of the step you have two threads concurrently trying to update the same table. Most likely one of them locks the table (or a block of rows) that the other is trying to write, and that conflict produces the deadlock.
Why does it sometimes crash and sometimes work? It's effectively random: each copy receives a batch of rows to act upon, and the outcome depends on which rows are sent to each copy and how many updates they require.
So I finally managed to solve the problem. It was not strictly related to Pentaho but to SQL Server: I had to redefine the index on the table the insertion was made into. You will find more details in this answer: Insert/Update deadlock with SQL Server
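As an illustration only of the general idea (the table, index, and column names below are hypothetical; the exact definition comes from the linked answer), the fix amounts to indexing the lookup key so that concurrent lookups and updates take narrow key locks instead of scanning:
-- Illustrative sketch only: index the natural/lookup key columns so the
-- lookup phase does not scan the table (scans under concurrent writers
-- are a common deadlock trigger in SQL Server).
CREATE UNIQUE NONCLUSTERED INDEX ix_dim_customer_lookup
    ON dbo.dim_customer (customer_natural_key, version);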

What will happen when inserting a row during a long running query

I am writing some data loading code that pulls data from a large, slow table in an oracle database. I have read-only access to the data, and do not have the ability to change indexes or affect the speed of the query in any way.
My select statement takes 5 minutes to execute and returns around 300,000 rows. The system is inserting large batches of new records constantly, and I need to make sure I get every last one, so I need to save a timestamp for the last time I downloaded the data.
My question is: If my select statement is running for 5 minutes, and new rows get inserted while the select is running, will I receive the new rows or not in the query result?
My gut tells me that the answer is 'no', especially since a large portion of those 5 minutes is just the time spent on the data transfer from the database to the local environment, but I can't find any direct documentation on the scenario.
"If my select statement is running for 5 minutes, and new rows get inserted while the select is running, will I receive the new rows or not in the query result?"
No. Oracle enforces strict isolation levels and does not permit dirty reads.
The default isolation level is Read Committed. This means the result set you get after five minutes will be identical to the one you would have got if Oracle could have delivered all the records in 0.0000001 seconds. Anything committed after your query started running will not be included in the results. That includes updates to existing records as well as inserts.
Oracle does this by tracking changes to the table in the UNDO tablespace. Provided it can reconstruct the original image of the data from that undo, your query will run to completion; if for any reason the undo information has been overwritten, your query will fail with the dreaded ORA-01555: snapshot too old. That's right: Oracle would rather hurl an exception than provide us with an inconsistent result set.
Note that this consistency applies at the statement level. If we run the same query twice within the one transaction we may see two different result sets. If that is a problem (I think not in your case) we need to switch from Read Committed to Serializable isolation.
The Concepts Manual covers Concurrency and Consistency in great depth. Find out more.
So to answer your question, take the timestamp from the time you start the select. Specifically, take the max(created_ts) from the table before you kick off the query. This should protect you from the gap Alex mentions (if records are not committed the moment they are inserted, there is the potential to lose records if you base the select on a comparison with the system timestamp). Although doing this means you're issuing two queries in the same transaction, which means you do need Serializable isolation after all!
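A sketch of that watermark approach, assuming hypothetical names (big_table, created_ts, and the bind variable :last_run_watermark saved from the previous run):
-- Both statements below run inside one serializable transaction, so they
-- see the same consistent snapshot of the table.
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

-- Capture the new high-water mark first ...
SELECT MAX(created_ts) FROM big_table;

-- ... then pull everything newer than the watermark from the previous run.
SELECT *
FROM   big_table
WHERE  created_ts > :last_run_watermark;

COMMIT;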

How to cause an "ORA-01555: snapshot too old" error without updates

I am running into ORA-01555: snapshot too old errors with Oracle 9i but am not running any updates with this application at all.
The error occurs after the application has been connected for some hours without any queries, then every query (which would otherwise be subsecond queries) comes back with a ORA-01555: snapshot too old: rollback segment number 6 with name "_SYSSMU6$" too small.
Could this be caused by the transaction isolation level being set to TRANSACTION_SERIALIZABLE? Or by some other bug in the JDBC code? It could be caused by a bug in the jdbc-go driver, but everything I've read about that bug leads me to believe it should not occur in scenarios where no DML statements are issued.
Below is a very good insight into this error from Tom Kyte. The problem in your case may come from what is called 'delayed block cleanout', a case where a SELECT generates redo. However, the root cause is almost certainly improperly sized rollback segments (though Tom adds correlated causes: committing too frequently, a very large read after many updates, etc.).
snapshot too old error (Tom Kyte)
When you run a query on an Oracle database the result will be what Oracle calls a "Read consistent snapshot".
What it means is that all the data items in the result will be represented with their value as of the time the query was started.
To achieve this the DBMS looks into the rollback segments to get the original value of items which have been updated since the start of the query.
The DBMS uses the rollback segment in a circular way and will eventually wrap around - overwriting the old data.
If your query needs data that is no longer available in the rollback segment you will get "snapshot too old".
This can happen if your query is running for a long time on data being concurrently updated.
You can prevent it by either extending your rollback segments or avoid running the query concurrently with heavy updaters.
I also believe newer versions of Oracle provide better dynamic management of rollback segments than Oracle 9i does.
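If delayed block cleanout is indeed the trigger, one commonly cited mitigation (assuming you or the loading job can run it, and the table name below is hypothetical) is to force a full scan of the freshly loaded table right after the bulk load, so the cleanout happens once up front rather than inside the long-idle read-only session:
-- Sketch: force delayed block cleanout immediately after a bulk load so
-- later read-only sessions do not have to do it (and risk ORA-01555).
SELECT /*+ FULL(t) */ COUNT(*) FROM freshly_loaded_table t;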

Why does Oracle get stuck after a few deletes?

I have an application that does something like:
delete from tableA where columnA='somevalue' and rownum<1000
in a loop like:
while (deletedRows > 0) {
    begin tran
    deletedRows = session.execute(
        "delete from tableA where columnA='somevalue' and rownum < 1000")
    commit tran
}
It runs a few times (each delete takes about 20 seconds) and then hangs for a long time.
Why? Is it possible to fix?
Thanks.
The reason why the deletes are run in a loop rather than as a single SQL statement is lack of rollback space. See this question for more info.
Each time through the loop the query scans the table from the beginning, so it re-scans the zones where there are no longer any rows to delete (columnA='somevalue'), and those zones stretch further and further from the first block of the table.
If the table is big and no rows match columnA='somevalue', the query still spends the time checking every row against your condition.
What you can do is create an index on columnA. The engine can then locate the matching rows directly instead of scanning the whole table (an index lookup is far faster than a full scan).
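A minimal sketch of that suggestion (the index name is arbitrary):
-- Index the filter column so each delete iteration can locate the
-- remaining matching rows without a full table scan.
CREATE INDEX idx_tablea_columna ON tableA (columnA);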
Another possibility, if you are in a concurrent system, is that someone has updated a row you are trying to delete but has not committed the transaction, so the row is locked.
You are probably running into several different issues. Since you say the database hangs, the most likely reason is that it is hitting the ORA-00257 archiver error.
Every delete produces redo, and all redo is eventually written out to the archive logs. When the archive log space is exhausted, your session hangs and remains stuck until someone frees the space.
Usually your DBA has a job that runs an archive log backup every hour (it might be every couple of hours, or every 5 minutes, depending on the database workload, etc.), and once the backup has completed all sessions move ahead correctly.
Depending on the database configuration, from the client's point of view you might not even see the error, just the behaviour described above where your session waits until the space is freed.
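If you (or your DBA) have access, a quick way to check whether this is the case is to look at archive destination usage. A hedged sketch, assuming archived logs go to the fast recovery area:
-- Check how full the fast recovery area is; usage near 100% with little
-- reclaimable space typically goes hand in hand with ORA-00257.
SELECT space_limit, space_used, space_reclaimable
FROM   v$recovery_file_dest;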
In terms of design, I agree with other users that a DELETE in a loop is not a good idea. It would be interesting to know why you are doing this in a loop rather than with a single DELETE statement.

Snapshot too old error

I am frequently getting a 'snapshot too old' error when my workflow runs for more than 5 hours. My source is Oracle and the target is Teradata. Please help me solve this issue. Thanks in advance.
The best explanation of the ORA-01555 snapshot too old error that I've read is found in this AskTom thread.
Regards.
The snapshot too old error is more or less directly related to the running time of your queries (often a cursor of a FOR loop). So the best solution is to optimize your queries so they run faster.
As a short term solution you can try to increase the size of the UNDO log.
Update:
The UNDO log stores the previous version of a record before it is updated. It is used to roll back transactions and to retrieve older versions of records for consistent data snapshots in long-running queries.
You'll probably need to dive into Oracle DB administration if you want to solve it via increasing the UNDO log. Basically you do (as SYSDBA):
ALTER SYSTEM SET UNDO_RETENTION = 21600;
21600 is 6 hours in seconds.
However, Oracle will only keep 6 hours of old data if the UNDO log files are big enough, which depends on the size of the rollback segments and the amount of updates executed on the database.
So in addition to changing the undo retention time, you should also make sure that few concurrent updates are executed while your job is running. In particular, updates of the data your job is reading should be minimized.
If everything else fails, increase the size of the UNDO logs.
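A hedged sketch of what growing the undo space might look like; the file paths, tablespace name, and sizes below are placeholders for your environment:
-- Give the undo tablespace more room so UNDO_RETENTION can actually be
-- honoured. Paths, tablespace name, and sizes are placeholders.
ALTER DATABASE DATAFILE '/u01/oradata/ORCL/undotbs01.dbf' RESIZE 8G;

-- or add another datafile to the undo tablespace:
ALTER TABLESPACE undotbs1
    ADD DATAFILE '/u01/oradata/ORCL/undotbs02.dbf' SIZE 4G AUTOEXTEND ON;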
