Pentaho Dimension Lookup/Update deadlock error - etl

I have a Dimension Lookup/update step, and I am trying to update a table with data from JSON files, but it is failing with the following error:
2021/08/03 12:51:58 - dlu-insrting_in_table.0 - ERROR (version 9.1.0.0-324, build 9.1.0.0-324 from 2020-09-07 05.09.05 by buildguy) : Because of an error this step can't continue:
2021/08/03 12:51:58 - dlu-insrting_in_table.0 - Couldn't get row from result set
2021/08/03 12:51:58 - dlu-insrting_in_table.0 - Transaction (Process ID 78) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
This is the configuration of the Dimension Lookup/Update step:
And this is part of the transformation:
If I start the step with only one copy, everything works fine, but with more than one copy I get the error above. The odd thing is that the error seems random: sometimes it crashes after inserting two rows, other times it inserts everything without any error.
Searching the documentation and the internet didn't help much; I was not able to fix it. I read that it could be an insertion-order problem or a primary-key problem, but the data is fine (the keys are unique) and the step configuration looks correct. What I did notice is that the technical key is not inserted in order; I think that depends on which process finishes first, but I can't find a way to force the order (assuming this is the problem).
Does anyone know what the problem is here, and how I could fix it? Thank you.

Don't run multiple copies of the Dimension Lookup/Update step. It has a commit size of 100, so with 2 copies of the step you have two threads concurrently trying to update the same table. Most likely each of them locks rows (or index ranges) that the other one needs to write, and SQL Server resolves the resulting deadlock by killing one of the transactions.
Why does it sometimes crash and sometimes work? It is essentially random: each copy receives a batch of rows to act upon, and the outcome depends on which rows are sent to each copy and how many updates they require.

So I finally managed to solve the problem. It was not strictly a Pentaho problem but a SQL Server one: I had to redefine the index on the table into which the insertion was made. You will find more details in this answer: Insert/Update deadlock with SQL Server
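For anyone hitting the same thing: the change on the SQL Server side was essentially giving the dimension table a narrow index on the natural key, so that concurrent lookups/updates don't scan the table and take overlapping range locks. A minimal sketch of that kind of index follows; the table and column names (dim_customer, customer_code, version, date_from, date_to) are made up, not the real ones from my transformation.

-- Hypothetical dimension table and columns; replace with your own.
-- Without an index on the lookup (natural) key, every Dimension Lookup/Update
-- copy scans the table under update locks, which makes overlapping lock
-- ranges, and therefore deadlocks, far more likely.
CREATE NONCLUSTERED INDEX ix_dim_customer_lookup
    ON dbo.dim_customer (customer_code, version)
    INCLUDE (date_from, date_to);  -- cover the columns the lookup step reads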

Related

How to manually corrupt the Oracle CLOB data

I'm wondering if there's any way to manually corrupt CLOB data for testing purposes.
I can find the steps for intentional block corruption, but can't find anything for individual data in a table. Can anyone help me with this?
Below is what I'm trying to do and I need help for step 1:
1. Prepare the corrupted CLOB data
2. Run expdp and get the ORA-01555 error
3. Test whether my troubleshooting procedure works ok
Some background:
DB: Oracle 12.2.0.1 SE2
OS: Windows Server 2016
The app we're using (from a third party) seems to occasionally corrupt the CLOB data when a certain type of data gets inserted into a table. We don't know what triggers it. The corruption doesn't affect the app's functionality, but leaving it unfixed produces the following error when running expdp for the daily backup:
ORA-01555: snapshot too old: rollback segment number
The CLOB consists of a mix of alphanumeric characters and line breaks. It is inserted by the app; no manual inserts take place.
Fixing or replacing the app isn't an option, so we have a repair procedure in place.
I took over this from another engineer (who has already left), but since then the app has been working happily and the problem hasn't occurred. I want to do a test run of the repair procedure in the DEV environment, but the app won't reproduce the problem for me.
So I thought I would manually prepare a "broken" CLOB for testing purposes.
So this looks like it is caused by a known bug:
https://support.oracle.com/epmos/faces/DocumentDisplay?_afrLoop=364607910994084&parent=DOCUMENT&sourceId=833635.1&id=787004.1&_afrWindowMode=0&_adf.ctrl-state=3xr6s00uo_200
https://support.oracle.com/epmos/faces/DocumentDisplay?_afrLoop=364470181721910&id=846079.1&_afrWindowMode=0&_adf.ctrl-state=3xr6s00uo_53
https://support.oracle.com/epmos/faces/DocumentDisplay?_afrLoop=364481844925661&id=833635.1&_afrWindowMode=0&_adf.ctrl-state=3xr6s00uo_102
The main point here is that the corruption isn't caused by anything inherent in the data; it is more likely caused by something like concurrent access to the LOB by multiple updates (application or end-user behavior), or simply by apparently random chance. As such, I doubt that there is any easy way for you to force this condition in order to validate your test for it.
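What you can do instead is scan for rows whose CLOB can no longer be read, so that you at least know when the repair procedure is needed and have genuinely broken rows to test it against. This is only a sketch with placeholder names (my_table, id, clob_col); reading each LOB and trapping the exception is a common way to locate the bad rows.

-- Placeholder table/column names: my_table, id, clob_col.
SET SERVEROUTPUT ON
DECLARE
  l_chunk VARCHAR2(32767);
BEGIN
  FOR r IN (SELECT id, clob_col FROM my_table WHERE clob_col IS NOT NULL) LOOP
    BEGIN
      -- Touching the LOB is enough; a corrupted LOB raises an error here.
      -- (Only the first chunk is read; loop over the whole LOB if needed.)
      l_chunk := DBMS_LOB.SUBSTR(r.clob_col, 4000, 1);
    EXCEPTION
      WHEN OTHERS THEN
        DBMS_OUTPUT.PUT_LINE('Unreadable CLOB at id=' || r.id || ': ' || SQLERRM);
    END;
  END LOOP;
END;
/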

Correct SSIS executing tasks in random order

This is a repost of a question from 8 years ago, since the solutions provided there didn't work for me; maybe now there are more alternatives for me and for the other people who had this problem and couldn't solve it either.
I have six Data Flow Tasks as shown in the following screenshot:
They execute in a different order every time I run the package, and the first one even executes twice. I've recreated the tasks, hoping that SSIS was executing them in creation order.
They run in a random order each time I execute the package despite the precedence constraints, so I decided to recreate the WHOLE package. That failed as well.
It simply feels like Microsoft is messing with me, since I can't find another explanation.
Any help provided will be a relief for me, if my post is not voted as redundant.
Edited in order to add info:
My real problem is that SSIS is not inserting the data in a defined order; it just executes the inserts as it pleases, and I need the data to be stored natively in a specific order. I've done it before and don't get why this time is different. I could run an ORDER BY to get the data in the order I want, except that I'm not the one who will be accessing the data, and I can only hope that whoever extracts and prints the data notices that.
The biggest issue, however, is SSIS executing a random task twice, as I cannot have a duplicate of the data for any reason, since it will later be used for summarizing as well. (I suspect this is connected to the random execution order, since the person who posted the original question had exactly the same issue as me.)
The real way to notice these issues is not by looking at the SSIS processes but by looking at the data stored in the DB. Sorry if I was unclear about my problem.
The SSIS log doesn't show you the tasks in the order in which they ran. In your screenshot above it looks like it put them in alphabetical order, in fact.
Just because Abril is above Enero in your execution log doesn't mean that Abril ran first and Enero ran second.
Addendum based on comments below:
You are under the misconception that if you INSERT data into a database in a certain order, that when you SELECT that data without specifying an ORDER BY, you will get the data in the order it was inserted. This turns out not to be the case. The ONLY guaranteed way to get data from a database in a certain order is to use an ORDER BY clause when you SELECT it.
Let me be perfectly clear about this. When you say "I get my data from March being listed first than my data from January, meaning it was inserted first", you are wrong.
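To make that concrete, here is a trivial example against a hypothetical monthly_data table: only the second query has a defined, repeatable order, no matter in which order SSIS inserted the rows.

-- No ORDER BY: the engine may return rows in any order it likes,
-- regardless of the order in which they were inserted.
SELECT month_name, amount
FROM dbo.monthly_data;

-- With ORDER BY: the only way to get a guaranteed order back.
SELECT month_name, amount
FROM dbo.monthly_data
ORDER BY month_number;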
As for why your January data seems to be getting inserted twice, we would need to see the details of all the working parts: the original source data, the destination data before the insert, the destination data after the insert, and the SSIS package that does the insert. Without enough information to reproduce the issue ourselves, there is no way we can help you understand why it is happening in your package.

How to cause a "ORA-01555: snapshot too old error" without updates

I am running into ORA-01555: snapshot too old errors with Oracle 9i, but I am not running any updates with this application at all.
The error occurs after the application has been connected for some hours without issuing any queries; then every query (queries that would otherwise take under a second) comes back with ORA-01555: snapshot too old: rollback segment number 6 with name "_SYSSMU6$" too small.
Could this be caused by the transaction isolation level being set to TRANSACTION_SERIALIZABLE? Or some other bug in the JDBC code? It could be caused by a bug in the jdbc-go driver, but everything I've read about that bug has led me to believe it would not occur in scenarios where no DML statements are issued.
Read below for a very good insight into this error by Tom Kyte. The problem in your case may come from what is called 'delayed block cleanout', a case where selects create redo. However, the root cause is almost surely improperly sized rollback segments (though Tom adds correlated causes: committing too frequently, a very large read after many updates, etc.).
Snapshot too old error (Tom Kyte)
When you run a query on an Oracle database, the result will be what Oracle calls a "read-consistent snapshot".
What it means is that all the data items in the result are represented with their values as of the time the query was started.
To achieve this, the DBMS looks into the rollback segments to get the original value of items which have been updated since the start of the query.
The DBMS uses the rollback segments in a circular way and will eventually wrap around, overwriting the old data.
If your query needs data that is no longer available in the rollback segment you will get "snapshot too old".
This can happen if your query is running for a long time on data being concurrently updated.
You can prevent it either by extending your rollback segments or by avoiding running the query concurrently with heavy updaters.
I also believe newer versions of Oracle provide better dynamic management of rollback segments than Oracle 9i does.
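Since the error names an _SYSSMU segment, the instance is almost certainly using automatic undo management, so "extending the rollback segments" in practice means checking undo usage and increasing the retention and the undo tablespace. A sketch, with illustrative values and a placeholder datafile path (changing these requires the appropriate privileges):

-- Check the current undo configuration and recent undo usage.
SHOW PARAMETER undo

SELECT begin_time, undoblks, maxquerylen
FROM   v$undostat
ORDER  BY begin_time;

-- Keep undo available at least as long as the longest-running query (seconds).
ALTER SYSTEM SET UNDO_RETENTION = 10800;

-- Make sure the undo tablespace can actually hold that much undo;
-- the datafile path and size are placeholders.
ALTER DATABASE DATAFILE '/u01/oradata/ORCL/undotbs01.dbf' RESIZE 2G;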

Upgrade to SonarQube 4.5.1 fails at UpdateMeasuresDebtToMinutes

I am trying to upgrade from 4.0 to 4.5.1, but the process always fails at UpdateMeasuresDebtToMinutes. I am using MySQL 5.5.27 as the database with InnoDB as the table engine.
Basically the problem looks like this problem
After the writeTimeout (600 seconds) is exceeded, there is an exception in the log:
Caused by: java.io.EOFException: Can not read response from server. Expected to read 81 bytes, read 15 bytes before connection was unexpectedly lost.
at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:3166) ~[mysql-connector-java-5.1.27.jar:na]
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3676) ~[mysql-connector-java-5.1.27.jar:na]
Adding the indexes as proposed in the linked issue did not help.
Investigating further I noticed several things:
the migration step reads data from a table and wants to write back to the same table (project_measures)
project_measures contains more than 770000 rows
the process always hangs after 249 rows
the hanging happens in org.sonar.server.migrations.MassUpdate when calling update.addBatch(), which after BatchSession.MAX_BATCH_SIZE (250) rows forces an execute and a commit
Is there a way to configure the DB connection to allow this to proceed?
First of all, could you try to revert your DB to 4.0 and try again?
Then, could you please give us the JDBC URL (sonar.jdbc.url) you're using?
Thanks
As I need that Sonar server to run, I finally implemented a workaround.
It seems I cannot write to the database at all as long as a big result set is still open (I tried with a second table, but hit the same issue as before).
Therefore I changed all migrations that need to read and write the project_measures table (org.sonar.server.db.migrations.v43.TechnicalDebtMeasuresMigration, org.sonar.server.db.migrations.v43.RequirementMeasuresMigration, org.sonar.server.db.migrations.v44.MeasureDataMigration) to load the changed data into an in-memory structure and, after closing the read result set, write it back.
This is as hacky as it sounds and will not work for larger datasets, where you would need to do this by paging through the data or by storing everything in a secondary datastore.
Furthermore, I found that later on (in 546_inverse_rule_key_index.rb) an index needs to be created on the rules table which is larger than the maximum key length on MySQL (two VARCHAR(255) columns in UTF-8 add up to more than 1000 bytes), so I had to limit the key length there too.
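For the key-length part, the usual workaround on MySQL is a prefix index, so that the two UTF-8 VARCHAR(255) columns stay under the limit. Roughly like this; the index and column names are from memory and may differ in your schema:

-- Prefix lengths of 96 characters keep the key under MySQL's limit
-- (96 characters * 3 bytes * 2 columns = 576 bytes with 3-byte UTF-8).
CREATE INDEX rules_plugin_key_and_name
    ON rules (plugin_rule_key(96), plugin_name(96));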
As I said, it is a workaround and therefore I will not accept it as an answer.

Dropping a table partition avoiding the error ORA-00054

I need your opinion on this situation. I'll try to explain the scenario. I have a Windows service that stores data in an Oracle database periodically. The table where this data is stored is partitioned by date (interval date-range partitioning). The database also has a dbms_scheduler job that, among other operations, truncates and drops older partitions.
This approach had been working for some time, but recently I got an ORA-00054 error. After some investigation, the error was reproduced with the following steps:
Open one sqlplus session, disable auto-commit, and insert data into the partitioned table, without committing the changes;
Open another sqlplus session and truncate/drop an old partition (DDL operations are automatically committed, if I'm not mistaken). We will then get the ORA-00054 error (sketched below).
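In SQL terms, the reproduction looks roughly like this (table and partition names are made up):

-- Session 1: uncommitted insert, holds a DML (TM) lock on the table/partition.
INSERT INTO my_partitioned_table (event_date, payload)
VALUES (DATE '2014-01-15', 'some data');
-- no COMMIT

-- Session 2: the DDL cannot acquire its exclusive lock and fails immediately.
ALTER TABLE my_partitioned_table DROP PARTITION p_2014_01;
-- ORA-00054: resource busy and acquire with NOWAIT specified or timeout expired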
There are some constraints worth mentioning:
I don't have DBA access to the database;
This is a legacy application and a complete refactoring isn't feasible.
So, in your opinion, is there any way of dropping these old partitions without the risk of running into an ORA-00054 error and without the intervention of the DBA? I could just delete the data, but then the number of empty partitions would grow every day.
Many thanks in advance.
This error means somebody (or something) is working with the data in the partition you are trying to drop; that is, the lock is granted at the partition level. If nobody were using the partition, your job could drop it.
Now, you say this is a legacy app and you don't want to, or can't, refactor it. Fair enough. But there is clearly something not right if you have a process which is zapping data that some other process is using. I don't agree with #tbone's suggestion of just looping until the lock is released: you can't just get rid of data which somebody is using without establishing why they are still working with data that they apparently should not be using.
So the first step is to find out what the locking session is doing. Why is it still amending data that your background job wants to retire? Here's the kind of query that will help you establish which session holds the lock.
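Something along these lines; the dictionary and V$ view names are standard, but the table name is a placeholder and you need enough privileges to read these views:

-- Which sessions hold locks on the partitioned table?
SELECT s.sid,
       s.serial#,
       s.username,
       s.osuser,
       s.program,
       o.object_name,
       o.subobject_name AS partition_name
FROM   v$locked_object l
       JOIN dba_objects o ON o.object_id = l.object_id
       JOIN v$session   s ON s.sid       = l.session_id
WHERE  o.object_name = 'MY_PARTITIONED_TABLE';  -- placeholder table name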
Except that you "don't have DBA access to the database". Hmmm, that's a curly one. Basically this is not a problem which can be resolved without DBA access.
It seems like you have several issues to deal with. Unfortunately for you, they are political and architectural rather than technical, and there's not much we can do to help you further.
How about wrapping the truncate or drop in a PL/SQL procedure that tries the operation in a loop, waiting x seconds between tries, up to a maximum number of tries? Then use dbms_scheduler to call that procedure/function.
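A sketch of that idea; the table name, partition parameter, wait time and retry count are all placeholders, and DBMS_LOCK.SLEEP may need an explicit grant:

CREATE OR REPLACE PROCEDURE drop_old_partition(p_partition IN VARCHAR2) IS
  e_resource_busy EXCEPTION;
  PRAGMA EXCEPTION_INIT(e_resource_busy, -54);  -- ORA-00054
BEGIN
  FOR i IN 1 .. 10 LOOP                         -- max 10 attempts
    BEGIN
      -- p_partition should be validated before being concatenated into DDL.
      EXECUTE IMMEDIATE
        'ALTER TABLE my_partitioned_table DROP PARTITION ' || p_partition;
      EXIT;                                     -- success, stop retrying
    EXCEPTION
      WHEN e_resource_busy THEN
        DBMS_LOCK.SLEEP(30);                    -- wait 30 seconds, then retry
    END;
  END LOOP;
END;
/

Then have dbms_scheduler call drop_old_partition instead of issuing the raw DDL.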
Maybe this can help. It seems to be the same issue as the one that you describe.
(ignore the comic sans, if you can) :)
