This may have been asked numerous times, but none of the existing answers have helped me so far.
Here's some history:
QueryTimeOut: 120 secs
Database: DB2
App Server: JBoss
Framework: Struts 2
I have one query which fetches around a million records. Yes, we need to fetch them all at once for caching purposes; sadly, we can't change the design.
Now, we have two servers, Primary and DR. On the DR server the query executes within 30 seconds, so there is no timeout issue there. But on the Primary server it times out for some unknown reason. Sometimes it times out in rs.next() and sometimes in pstmt.executeQuery().
All DB indexes, the connection pool, etc. are in place. The explain plan also shows there are no full table scans.
My Analysis:
Since the query itself is not the issue here, could the problem be network delay?
How can I get to the root cause of this timeout? And how can I make sure there is no connection leakage (as far as I can tell, all connections are closed properly)?
Is there any way to recover from the timeout and execute the query again with an increased timeout value, e.g. pstmt.setQueryTimeout(600)? Note that setting this has had no effect whatsoever, and I don't know why.
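For reference, this is roughly the pattern I'm trying to get to, sketched from memory rather than taken from our actual code (the class, the row mapping, and the retry-once logic are made up for illustration):

import java.sql.*;
import java.util.ArrayList;
import java.util.List;
import javax.sql.DataSource;

// Sketch only: run the query with the normal timeout and, if it times out,
// retry once with a larger timeout. try-with-resources guarantees the
// connection, statement and result set are always closed, so nothing leaks.
public class CachedLoader {

    List<String[]> loadAll(DataSource ds, String sql) throws SQLException {
        try {
            return runQuery(ds, sql, 120);   // normal 120-second timeout
        } catch (SQLTimeoutException e) {    // some drivers throw a plain SQLException instead
            return runQuery(ds, sql, 600);   // retry once with 600 seconds
        }
    }

    private List<String[]> runQuery(DataSource ds, String sql, int timeoutSeconds) throws SQLException {
        List<String[]> rows = new ArrayList<>();
        try (Connection con = ds.getConnection();
             PreparedStatement pstmt = con.prepareStatement(sql)) {
            pstmt.setQueryTimeout(timeoutSeconds);
            try (ResultSet rs = pstmt.executeQuery()) {
                int cols = rs.getMetaData().getColumnCount();
                while (rs.next()) {
                    String[] row = new String[cols];
                    for (int i = 1; i <= cols; i++) {
                        row[i - 1] = rs.getString(i);
                    }
                    rows.add(row);
                }
            }
        }
        return rows;
    }
}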
Appreciate any inputs.
Thank You!
I set up a CockroachDB cluster for a school project. The only thing I have done is create one database with one table containing 6 rows, but when I look at the dashboard I have already used 500K RUs. This seems like a huge amount to me, but I'm new to cloud databases, so I don't know whether this is normal behaviour or not. I'm just worried I will run out of RUs without doing anything on the database. The linked image shows the graph of RU usage over a period when there were no connections and the hub wasn't open. Can anyone clarify this for me?
I think this explanation is more likely to be the reason:
https://www.cockroachlabs.com/docs/cockroachcloud/serverless-faqs.html#my-cluster-doesnt-have-any-current-co[…]ing-rus-when-there-are-no-connections
To summarize, the monitoring console uses up some RUs. So if you have a browser tab open with the console, it will use RUs even if you don't have any connections open.
As that FAQ says, this can use ~8 RUs per second; over 19 hours that works out to about 8 × 3,600 × 19 ≈ 547,000 RUs total. The solution is to not leave the console open.
On the stats point, note that auto-stats collection is only triggered when data in the table changes.
I believe what you're seeing is the Automatic Metric collection. You can read more about it on this FAQ.
While periodically sending the same GraphQL query to the Hasura server, I have observed significantly different execution times.
In one of these cases the query executed in under a second, whereas in another case the same query took more than 150 seconds. The execution times were captured from the Hasura "http-log" statements.
An additional observation from the corresponding "query-log" statements is that the SQL is generated within similar times in both cases.
Is there any reason why the generated SQL would be executed with such a significant delay in one case compared to the other?
Is there any specific reason for this inconsistent behaviour, and are there any specific configurations that can be made to overcome this issue?
I don't know if this counts as an answer; it's certainly not a general-case answer, as it reflects only our own experience.
We encountered a similar problem: inconsistent latencies for the same queries.
Here is where we looked and what we found.
1. Hasura
Hasura itself is a very thin and predictable layer above PostgreSQL (and now other DBs too).
I'm not a Haskell expert, but my impression is that SQL generation comes from here: https://github.com/hasura/graphql-engine/blob/b2461c5899a881183ad2d269ebe8a2c6f55e46af/server/src-lib/Hasura/GraphQL/Execute/LiveQuery.hs
(I could be wrong, and I would be grateful if somebody corrected me.)
So:
Hasura always generates the same SQL for the same query
this process is predictable
it has a low cost
Conclusion: Hasura itself could not be the source of the different latencies. We need to look at the DB level.
2. What we encountered at the DB level
We built a simple test: running the same query directly against the DB.
We discovered that the same query ran in 100 ms, 100 ms, 2 seconds, 150 ms, 3 seconds, 90 ms.
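Conceptually the test was just a timing loop, something like this (sketched here in Java over JDBC; the URL, credentials and query are placeholders, not our real ones):

import java.sql.*;

// Sketch: run the exact same query repeatedly against the database and print
// the elapsed time of each run, to see whether the jitter comes from the DB
// itself rather than from Hasura.
public class LatencyProbe {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:postgresql://db-host:5432/mydb";  // placeholder
        String sql = "SELECT 1";  // replace with the SQL Hasura generates (visible in query-log)
        try (Connection con = DriverManager.getConnection(url, "user", "password")) {
            for (int i = 0; i < 20; i++) {
                long start = System.nanoTime();
                try (Statement st = con.createStatement();
                     ResultSet rs = st.executeQuery(sql)) {
                    while (rs.next()) {
                        // drain the result set so the full execution is measured
                    }
                }
                long elapsedMs = (System.nanoTime() - start) / 1_000_000;
                System.out.println("run " + i + ": " + elapsedMs + " ms");
                Thread.sleep(1000);
            }
        }
    }
}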
We searched for locks and did not find any.
We looked at buffering and discovered that almost the entire DB is cached in memory.
Finally, our suspicion was that Azure Database (we used the managed PostgreSQL from MS) was misbehaving.
We contacted support (we had other questions for them as well) and eventually discovered that we had simply hit the IOPS limit.
This hypothesis was supported by a simple fact: if we ran VACUUM, REINDEX, REFRESH MATERIALIZED VIEW, or other heavy procedures, the DB became much less responsive for a while.
We considered upgrading the Azure Database tier, but we had other problems and also wanted to upgrade the PostgreSQL version, so in the end we decided to migrate to Amazon RDS.
(That's not bashing Azure or promoting Amazon; personally, I think running on-premises would be best.)
After that, all the strange execution times disappeared.
Consider for yourself how this reflects your case.
In general, I recommend looking at the DB level only.
I have an Azure website handling about 100K requests/hour, and it connects to an Azure SQL S2 database with about 8 GB of throughput/day. I've spent a lot of time optimizing the database indexes, queries, etc. Normally the Data IO, CPU, and Log IO percentages are well behaved, in the 20% range.
A recent portion of the data throughput is retained for supporting our customers. I have a nightly maintenance procedure that removes obsolete data to manage database size. This mostly works well with the exception of removing image blobs in a varbinary(max) field.
The nightly procedure has a loop that sets the varbinary(max) field to null on 10 records at a time, waits a couple of seconds, then sets the next 10. The nightly total for this loop is about 2,000 records.
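To illustrate what that loop does (the real thing is a T-SQL procedure run from SQL Agent; the table, column names, and retention cutoff below are made up):

import java.sql.*;

// Illustration only: conceptually the nightly job nulls the varbinary(max)
// column on 10 rows at a time and pauses between batches to limit IO and log
// pressure. Names and the cutoff are invented.
public class BlobCleanup {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:sqlserver://myserver.database.windows.net;databaseName=mydb"; // placeholder
        String sql = "UPDATE TOP (10) dbo.CustomerRequest "
                   + "SET ImageBlob = NULL "
                   + "WHERE ImageBlob IS NOT NULL "
                   + "AND CreatedUtc < DATEADD(day, -30, SYSUTCDATETIME())";
        try (Connection con = DriverManager.getConnection(url, "user", "password");
             PreparedStatement ps = con.prepareStatement(sql)) {
            int updated;
            do {
                updated = ps.executeUpdate();  // clear the blob on up to 10 rows
                Thread.sleep(2000);            // wait a couple of seconds before the next batch
            } while (updated > 0);
        }
    }
}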
This loop will run for about 45-60 minutes and then stop, with no return to my remote SQL Agent job and no error reported. A second, and sometimes a third, run of the procedure is needed to finish setting the desired blobs to null.
In an attempt to alleviate the load on the nightly procedure, I started running a job once every 30 seconds throughout the day - it sets one blob to null each time.
Normally this trickle job is fine and runs in 1-6 seconds. However, once or twice a day something goes wrong and I can find no explanation for it. The Data I/O percentage peaks at 100% and stays there for 30-60 minutes or longer. This causes the database responsiveness to suffer, and the website performance goes with it. The trickle job also reports running for this extended period of time. If I stop the SQL Agent job, it can take a few minutes to stop, but the Data I/O continues at 100% for the 30-60 minute period.
The web service requests and database demands are relatively steady throughout the business day - no volatile demands that would explain this. No database deadlocks or other errors are reported. It's as if the database hits some kind of backlog limit where its ability to keep up suddenly drops and then it can't catch up until something that is jammed finally clears. Then the performance will suddenly return to normal.
Do you have any ideas what might be causing this intermittent and unpredictable issue? Any ideas what I could look at when one of these events is happening to determine why the Data I/O is 100% for an extended period of time? Thank you.
If you are on SQL DB V12, you may also consider using the Query Store feature to root cause this performance problem. It's now in public preview.
In order to turn on Query Store just run the following statement:
ALTER DATABASE your_db SET QUERY_STORE = ON;
We have a daily batch job executing an Oracle PL/SQL function. The Quartz scheduler invokes a Java program, which makes a call to the PL/SQL function. This PL/SQL function deletes data older than 6 months from 4 tables and then commits the transaction.
This batch job was running successfully in the test environment but started failing two weeks ago, when new data was loaded into the tables (the code is supposed to go into production this week). Earlier, the number of rows in each table was not more than 0.1 million; now it is 1 million in 3 of the tables and 2.4 million in the other.
After running for 3 hours, we get an error in Java (written to the log file): "...Connection reset; nested exception is java.sql.SQLException: Io exception: Connection reset....". When the row counts on the tables were checked, it was clear that no records had been deleted from any of the tables.
Is it possible in an Oracle database for the PL/SQL procedure/function to be automatically terminated/killed when the connection times out and the invoking session is no longer active?
Thanks in advance,
Pradeep.
The PL/SQL won't terminate because it is inactive, since by definition it isn't - it is still doing something. It won't be generating any network traffic back to your client though.
It appears something at the network level is causing the connection to be terminated. This could be a listener timeout, a firewall timeout, or something else. If it's consistently after three hours then it will almost certainly be a timeout configured somewhere rather than a network glitch, which would be more random (and possibly recoverable).
When the network connection is interrupted, Oracle will notice at some point and terminate the session. That will cause the PL/SQL call to be terminated, and that will cause any work it has done to be rolled back, which may take a while.
Three hours seems a long time for your deletes, though, even for a few million records. Perhaps you're deleting inefficiently, with individual row-by-row deletes inside your procedure. Which doesn't really help you, of course. It might be worth pointing out that your production environment might not have whatever setting is killing your connection, or might have a shorter timeout, so even reducing the runtime might not make it bullet-proof in live. You probably need to find the source of the timeout and check the equivalent in the live environment to try to pre-empt similar problems there.
I have a JEE application searching a large Oracle database for data. The application uses JDBC to query the database.
The issue I am having is that the results page is unable to be displayed. I get the following error:
The connection to the server was reset while the page was loading.
This happens after 60 seconds. When I run the SQL query manually using a SQL client, the results return in 3 seconds.
I have checked the logs and there are no exceptions that I can see.
Do any of you know the best way to find what is causing the connection to be reset? If I break my search date range into 2, and search both ranges individually, both return results. So it seems that it's the larger result set causing the issue.
Any help is welcome.
You are probably right about the larger result set. Often, when running a query from a SQL client, you'll get the first set of records right away; if you page down to force a pull of all the records, then it bogs down. Perhaps you're hitting the same issue with the JDBC client, where it takes more than 60 seconds to get all the rows. I've not done JDBC in a while, but can you get it to stream the result set?
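For example, with the Oracle JDBC driver you can raise the fetch size so rows come back in larger batches (the default is only 10 rows per round trip) and write each row out as it is read instead of buffering the whole set first. A rough sketch only; the query, names, and fetch size here are invented:

import java.sql.*;
import javax.sql.DataSource;

// Sketch: stream a large result set by fetching bigger batches per round trip
// and handing each row off as soon as it arrives.
public class SearchDao {

    private final DataSource dataSource;

    public SearchDao(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    public void streamResults(Timestamp from, Timestamp to, RowHandler handler) throws SQLException {
        String sql = "SELECT id, created, payload FROM search_results WHERE created BETWEEN ? AND ?";
        try (Connection con = dataSource.getConnection();
             PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setFetchSize(500);        // 500 rows per round trip instead of Oracle's default of 10
            ps.setTimestamp(1, from);
            ps.setTimestamp(2, to);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    handler.handle(rs);  // e.g. write the row to the response right away
                }
            }
        }
    }

    public interface RowHandler {
        void handle(ResultSet row) throws SQLException;
    }
}

That way the page starts receiving data before the full result set has been pulled, so it is less likely to sit silent long enough for whatever is resetting the connection at the 60-second mark to kick in.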
Regards,
Roger
All views are mine ...