Upgrade to SonarQube 4.5.1 fails at UpdateMeasuresDebtToMinutes - sonarqube

I am trying to upgrade from 4.0 to 4.5.1, but the process always fails at UpdateMeasuresDebtToMinutes. I am using MySQL 5.5.27 as the database with InnoDB as the table engine.
Basically the problem looks like the one described in the linked issue.
After the writeTimeout (600 seconds) is exceeded, there is an exception in the log:
Caused by: java.io.EOFException: Can not read response from server. Expected to read 81 bytes, read 15 bytes before connection was unexpectedly lost.
at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:3166) ~[mysql-connector-java-5.1.27.jar:na]
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3676) ~[mysql-connector-java-5.1.27.jar:na]
Adding the indexes as proposed in the linked issue did not help.
Investigating further I noticed several things:
- the migration step reads data from a table and wants to write back to the same table (project_measures)
- project_measures contains more than 770,000 rows
- the process always hangs after 249 rows
- the hang happens in org.sonar.server.migrations.MassUpdate when calling update.addBatch(), which forces an execute and a commit after BatchSession.MAX_BATCH_SIZE (250) rows
Is there a way to configure the DB connection to allow this to proceed?

First of all, could you try to revert your db to 4.0 and try again?
Then, could you please give us the JDBC url (sonar.jdbc.url) you're using?
Thanks
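For reference, a MySQL sonar.jdbc.url usually looks something like the line below. These are the parameters commonly shown in the sonar.properties template, not necessarily what this particular installation uses:

    sonar.jdbc.url=jdbc:mysql://localhost:3306/sonar?useUnicode=true&characterEncoding=utf8&rewriteBatchedStatements=true&useConfigs=maxPerformance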

As I need that Sonar server to run, I finally implemented a workaround.
It seems I cannot write to the database at all as long as a big result set is still open (I tried writing to a second table instead, but hit the same issue as before).
Therefore I changed all migrations that need to read and write the project_measures table (org.sonar.server.db.migrations.v43.TechnicalDebtMeasuresMigration, org.sonar.server.db.migrations.v43.RequirementMeasuresMigration, org.sonar.server.db.migrations.v44.MeasureDataMigration) to load the changed data into a memory structure and, after closing the read result set, write it back.
This is as hacky as it sounds and will not work for larger datasets, where you would need to do this with paging through the data or by storing everything in a secondary datastore.
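A minimal sketch of that read-everything-then-write-back pattern, assuming two plain JDBC connections and a simplified column list (the real migrations touch more columns, and convertToMinutes() below only stands in for the actual debt conversion):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;
    import java.util.ArrayList;
    import java.util.List;

    public class InMemoryMeasureMigration {

        // Hypothetical stand-in for the real debt-to-minutes conversion.
        static long convertToMinutes(long oldValue) {
            return oldValue;
        }

        static void migrate(Connection readConn, Connection writeConn) throws SQLException {
            List<long[]> buffered = new ArrayList<>();

            // 1) Read everything first and close the result set before any write happens.
            try (Statement read = readConn.createStatement();
                 ResultSet rs = read.executeQuery(
                         "SELECT id, value FROM project_measures WHERE value IS NOT NULL")) {
                while (rs.next()) {
                    buffered.add(new long[] { rs.getLong(1), convertToMinutes(rs.getLong(2)) });
                }
            }

            // 2) Only then write back, in the same 250-row batches the migration uses.
            writeConn.setAutoCommit(false);
            try (PreparedStatement write = writeConn.prepareStatement(
                    "UPDATE project_measures SET value = ? WHERE id = ?")) {
                int n = 0;
                for (long[] row : buffered) {
                    write.setLong(1, row[1]);
                    write.setLong(2, row[0]);
                    write.addBatch();
                    if (++n % 250 == 0) {
                        write.executeBatch();
                        writeConn.commit();
                    }
                }
                write.executeBatch();
                writeConn.commit();
            }
        }
    }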
Furthermore, I found that later on (in 546_inverse_rule_key_index.rb) an index needs to be created on the rules table that is larger than the maximum key length on MySQL (two varchar(255) columns in UTF-8 take more than 1000 bytes), so I had to limit the key length there too, as sketched below.
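The index workaround amounts to creating the index with column prefixes so the combined key stays under MySQL's limit. This is only a sketch; the index name, column names, and prefix lengths are assumptions rather than the exact migration:

    import java.sql.Connection;
    import java.sql.SQLException;
    import java.sql.Statement;

    class RulesIndexWorkaround {
        // Hedged sketch: in utf8, varchar(255) can take 255 * 3 bytes per column, so two such
        // columns exceed MySQL's key-length limit. Indexing only a prefix of each column keeps
        // the key at 2 * 100 * 3 = 600 bytes. Index and column names are assumptions.
        static void createLimitedIndex(Connection connection) throws SQLException {
            try (Statement stmt = connection.createStatement()) {
                stmt.execute("CREATE INDEX rules_key_index ON rules (plugin_name(100), plugin_rule_key(100))");
            }
        }
    }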
As I said, it is a workaround, and therefore I will not accept it as an answer.

Related

Pentaho Dimension Lookup/Update deadlock error

I have a Dimension Lookup/update step, and I am trying to update a table with data from JSON files, but it is failing with the following error:
2021/08/03 12:51:58 - dlu-insrting_in_table.0 - ERROR (version 9.1.0.0-324, build 9.1.0.0-324 from 2020-09-07 05.09.05 by buildguy) : Because of an error this step can't continue:
2021/08/03 12:51:58 - dlu-insrting_in_table.0 - Couldn't get row from result set
2021/08/03 12:51:58 - dlu-insrting_in_table.0 - Transaction (Process ID 78) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
(The question included screenshots of the Dimension Lookup/Update step configuration and of part of the transformation.)
If I use only one copy to start the step, everything works fine, but if I use more than one it gives me the error above. The odd thing is that the error seems random: sometimes it crashes after inserting two rows, other times it inserts everything without any error.
Searching the documentation and the internet didn't help much; I was not able to fix it. I read that it could be an insertion-order problem or a primary-key problem, but the data is fine (the keys are unique) and the configuration of the step seems OK. What I did notice is that the technical key is not inserted in order, I think because it depends on which process finishes first, but I can't find a way to force it (assuming this is the problem).
Does anyone know what the problem is here, and how I could fix it? Thank you.
Don't run multiple copies of the Lookup/Update step. It has a commit size of 100, so if you have two copies of the step you have two threads concurrently trying to update the same table. Most likely one of them is locking the table (or a block of rows) that the other tries to write, and that lock conflict causes the deadlock.
Why does it sometimes crash and sometimes work? It's actually a bit random: each copy receives a batch of rows to act upon, and it depends on which rows are sent to each copy and how many updates are required.
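To make that concrete, here is a rough sketch (not Pentaho code; the table, columns, and key lists are made up) of what two step copies effectively do: each copy holds the row locks from its current batch until its own commit, so when the copies touch overlapping rows in different orders, the database picks one as the deadlock victim.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.util.List;

    class TwoStepCopies {
        static void runCopy(String jdbcUrl, String user, String pwd, List<String> keys) {
            try (Connection c = DriverManager.getConnection(jdbcUrl, user, pwd)) {
                c.setAutoCommit(false);
                try (PreparedStatement ps = c.prepareStatement(
                        "UPDATE dim_customer SET version = version + 1 WHERE natural_key = ?")) {
                    int n = 0;
                    for (String key : keys) {              // each copy gets an arbitrary share of the rows
                        ps.setString(1, key);
                        ps.addBatch();
                        if (++n % 100 == 0) {              // commit size 100, as in the step
                            ps.executeBatch();
                            c.commit();                    // row locks from this batch are held until here
                        }
                    }
                    ps.executeBatch();
                    c.commit();
                }
            } catch (SQLException e) {
                // When the copies collide, one of them typically fails here as the chosen deadlock victim.
                e.printStackTrace();
            }
        }

        static void runBothCopies(String jdbcUrl, String user, String pwd,
                                  List<String> batchA, List<String> batchB) {
            new Thread(() -> runCopy(jdbcUrl, user, pwd, batchA)).start();
            new Thread(() -> runCopy(jdbcUrl, user, pwd, batchB)).start();
        }
    }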
So I finally managed to solve the problem. The problem was not strictly related to Pentaho but to SQL Server. In particular, I had to redefine the index on the table the insertion was made into. You will find more details in this answer: Insert/Update deadlock with SQL Server

How to manually corrupt the Oracle CLOB data

I'm wondering if there's any way to manually corrupt CLOB data for testing purposes.
I can find the steps for intentional block corruption, but can't find anything for corrupting individual data in a table. Can anyone help me with this?
Below is what I'm trying to do, and I need help with step 1:
1. Prepare the corrupted CLOB data
2. Run expdp and get the ORA-01555 error
3. Test whether my troubleshooting procedure works OK
Some background:
DB: Oracle 12.2.0.1 SE2
OS: Windows Server 2016
The app we're using (from a third party) seems to occasionally corrupt the CLOB data when a certain type of data gets inserted into a table. We don't know what triggers it. The corruption doesn't affect the app's function, but leaving it unfixed gives the following error when running expdp for the daily backup:
ORA-01555: snapshot too old: rollback segment number
The CLOB consists of a mix of alphanumeric characters and line breaks. It gets inserted by the app; no manual insert takes place.
Fixing or replacing the app isn't an option, so we have a fixing procedure in place.
I took this over from another engineer (who has since left), but since then the app has been working happily and no problem has occurred. I want to test-run the fixing procedure in the DEV environment, but the app doesn't reproduce the problem for me.
So I thought I would see if I can manually prepare a "broken" CLOB for testing purposes.
So this looks like it is caused by a known bug:
https://support.oracle.com/epmos/faces/DocumentDisplay?_afrLoop=364607910994084&parent=DOCUMENT&sourceId=833635.1&id=787004.1&_afrWindowMode=0&_adf.ctrl-state=3xr6s00uo_200
https://support.oracle.com/epmos/faces/DocumentDisplay?_afrLoop=364470181721910&id=846079.1&_afrWindowMode=0&_adf.ctrl-state=3xr6s00uo_53
https://support.oracle.com/epmos/faces/DocumentDisplay?_afrLoop=364481844925661&id=833635.1&_afrWindowMode=0&_adf.ctrl-state=3xr6s00uo_102
The main point here is that the corruption isn't caused by anything inherent in the data, but is more likely caused by something like concurrent access to the LOB by multiple updates (application or end-user behavior), or just by apparently random chance. As such, I doubt that there's any way for you to easily force this condition in order to validate your test for it.

How to cause a "ORA-01555: snapshot too old error" without updates

I am running into ORA-01555: snapshot too old errors with Oracle 9i but am not running any updates with this application at all.
The error occurs after the application has been connected for some hours without running any queries; then every query (which would otherwise be a sub-second query) comes back with ORA-01555: snapshot too old: rollback segment number 6 with name "_SYSSMU6$" too small.
Could this be caused by the transaction isolation being set to TRANSACTION_SERIALIZABLE? Or some other bug in the JDBC code? This could be caused by a bug in the jdbc-go driver, but everything I've read about that bug has led me to believe it would not occur in scenarios where no DML statements are issued.
Below is a very good insight into this error from Tom Kyte. The problem in your case may come from what is called 'delayed block cleanout', a case where selects create redo. However, the root cause is almost surely improperly sized rollback segments (though Tom adds correlated causes: committing too frequently, a very large read after many updates, etc.).
Snapshot too old error (Tom Kyte)
When you run a query on an Oracle database the result will be what Oracle calls a "Read consistent snapshot".
What it means is that all the data items in the result will be represented with the value as of the time the query was started.
To achieve this the DBMS looks into the rollback segments to get the original value of items which have been updated since the start of the query.
The DBMS uses the rollback segment in a circular way and will eventually wrap around - overwriting the old data.
If your query needs data that is no longer available in the rollback segment you will get "snapshot too old".
This can happen if your query is running for a long time on data being concurrently updated.
You can prevent it by either extending your rollback segments or avoid running the query concurrently with heavy updaters.
I also believe newer versions of Oracle provide better dynamic management of rollback segments than Oracle 9i does.
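A schematic way to see the mechanism Tom describes (a long-running query needing undo that has since been overwritten) is the classic fetch-across-commits pattern sketched below. The table and columns are made up, the error only actually appears when the undo/rollback configuration is small relative to the update churn, and it relies on updates, so it illustrates the general mechanism rather than the no-DML case from the question.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;

    class Ora01555Demo {
        static void run(String url, String user, String pwd) throws SQLException {
            try (Connection reader = DriverManager.getConnection(url, user, pwd);
                 Connection writer = DriverManager.getConnection(url, user, pwd)) {
                writer.setAutoCommit(false);

                try (Statement readStmt = reader.createStatement();
                     PreparedStatement upd = writer.prepareStatement(
                             "UPDATE big_table SET payload = ? WHERE id = ?")) {
                    readStmt.setFetchSize(1);          // fetch slowly, one row per round trip
                    try (ResultSet rs = readStmt.executeQuery("SELECT id, payload FROM big_table")) {
                        int i = 0;
                        while (rs.next()) {
                            // Churn undo between fetches: many small committed updates on the same table.
                            upd.setString(1, "v" + i++);
                            upd.setLong(2, rs.getLong("id"));
                            upd.executeUpdate();
                            writer.commit();           // each commit lets the old row version be overwritten
                        }
                    }
                }
                // With small rollback segments, the long-running SELECT eventually needs a row
                // version that is no longer available and fails with ORA-01555.
            }
        }
    }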

Server issue when searching Oracle database

I have a JEE application that searches a large Oracle database for data. The application uses JDBC to query the database.
The issue I am having is that the results page cannot be displayed. I get the following error:
The connection to the server was reset while the page was loading.
This happens after 60 seconds. When I run the SQL query manually using a SQL client, the results return in 3 seconds.
I have checked the logs and there are no exceptions that I can see.
Do any of you know the best way to find out what is causing the connection to be reset? If I break my search date range in two and search both ranges individually, both return results, so it seems the larger result set is causing the issue.
Any help is welcome.
You are probably right about the larger result set. Often when running a query from a SQL client, you'll get the first set of records right away; if you page down to force a pull of all records, then it bogs down. Perhaps you're hitting the same issue with the JDBC client, where it takes more than 60 seconds to get all the rows. I've not done JDBC in a while, but can you get it to stream the result set?
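On the JDBC side, "streaming" mostly comes down to raising the fetch size and writing rows out as they arrive instead of materializing the whole list first. A rough sketch, with a placeholder query, placeholder column names, and a writeRow callback standing in for whatever renders the page:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Timestamp;
    import java.util.function.Consumer;

    class StreamingSearch {
        static void search(Connection conn, Timestamp from, Timestamp to,
                           Consumer<String> writeRow) throws SQLException {
            try (PreparedStatement ps = conn.prepareStatement(
                    "SELECT id, payload FROM search_results WHERE created_at BETWEEN ? AND ?")) {
                ps.setFetchSize(500);              // Oracle's driver defaults to 10 rows per round trip
                ps.setTimestamp(1, from);
                ps.setTimestamp(2, to);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        // Write (and flush) each row to the response as it arrives,
                        // instead of building the full result list in memory first.
                        writeRow.accept(rs.getLong("id") + "," + rs.getString("payload"));
                    }
                }
            }
        }
    }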

Why am I receiving a 255 Record insertion limit on Oracle 10g using DBExpress?

I am hitting a brick wall during normal SQL processing.
When connected to Oracle 10g from a remote client with DBExpress (using the standard dbxpora.dll + oci.dll) and while in a transaction, after exactly 255 record insertions the connection hangs for 30 seconds and returns the error:
ORA-03114 (as if it lost the connection...)
This happens when inserting 255 records into any table while in a transaction. (When run locally on the DB box everything works fine.)
Is there something I am missing?
Well, I don't know Oracle databases specifically, but I do know that 255 is a magic number. It's the maximum value you can express in a single byte. There's probably something declared as a Byte that's counting your records, and you're overflowing it. Try rebuilding your entire project with range checking and overflow checking turned on, and see if it raises an exception somewhere when you try to do this. That should help track it down, if it's actually in code you're compiling. If it's in one of the libraries, that won't help.
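A quick illustration of the "magic number" suspicion: a counter kept in a single unsigned byte holds 0..255 and silently wraps on the next increment. This only demonstrates the arithmetic, not where such a counter would actually live in the driver or library:

    public class ByteCounterWrap {
        public static void main(String[] args) {
            int counter = 0;                        // emulate an unsigned 8-bit counter
            for (int record = 1; record <= 300; record++) {
                counter = (counter + 1) & 0xFF;     // masking reproduces the one-byte overflow
                if (counter == 0) {
                    // Prints for record 256: everything after the 255th record misbehaves.
                    System.out.println("Counter wrapped to 0 while handling record " + record);
                }
            }
        }
    }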
