LogStash JDBC input scheduling issue - is it possible to run query a minute after the previous query finished - jdbc

I'm using the Jdbc input in LogStash to retrieve data from MS SQL database once in a minute.
Usually it works fine. But we know database performance is not very reliable thing and sometime it's takes longer than one minute to a query to return. Sometime event 5 minutes.
But the Jdbc scheduler still run a query once a minute so there situations when multiple queries run at the same time. This creates additional pressure on a database and after some time there are 20 almost same queries run at the same time.
I assume I'm not the first person which encounter this problem. I'm sure there is some way to make Jdbc to run next query a minute after the previous once is finished. Am I right?

Related

Fetching rows with Snowflake JDBC while the query is running in the server

I have a complex query that runs a long time (e.g 30 minutes) in Snowflake when I run it in the Snowflake console. I am making the same query from a JVM application using JDBC driver. What appears to happen is this:
Snowflake processes the query from start to finish, taking 30 minutes.
JVM application receives the rows. The first receive happens 30 minutes after the query started.
What I'd like to happen is that Snowflake starts to send rows to my application while it is still executing the query, as soon as data is ready. This way my application could start processing the rows in the first 30 minutes.
Is this possible with Snowflake and JDBC?
First of all, I would request to check the Snowflake warehouse size and do the tuning. It's not worth waiting for 30 mins when by resizing of the warehouse, the query time can be reduced one fourth or less than that. By doing any of the below, your cost will be almost the same or low. The query execution time will be reduced linearly as you increase the warehouse size. Refer the link
Scale up by resizing a warehouse.
Scale out by adding clusters to a warehouse (requires Snowflake
Enterprise Edition or higher).
Now coming to JDBC, I believe it behaves the same way as for other databases as well

How to recover from network error when running a large Teradata query?

I have a java job that runs a query on Teradata and pushes the results to a local database. It's a large query (>80M records) and can take hours to finish (The slowness is not due to Teradata, but the local DB). Because it takes so long, there is a chance that it gets interrupted by a network error or something. When that happens I get this exception:
org.skife.jdbi.v2.exceptions.ResultSetException: Unable to advance result set
If the failure occurs a few hours into the query then it cannot rerun the query because the job needs to deliver the result before a specific time every day. Is there a way to resume the query after such failure? I'm not sure if pagination is a good option because the query involves joining a few tables and the tables are updated frequently.

Azure SQL Data IO 100% for extended periods for no apparent reason

I have an Azure website running about 100K requests/hour and it connects to Azure SQL S2 database with about 8GB throughput/day. I've spent a lot of time optimizing the database indexes, queries, etc. Normally the Data IO, CPU and Log IO percentages are well behaved in the 20% range.
A recent portion of the data throughput is retained for supporting our customers. I have a nightly maintenance procedure that removes obsolete data to manage database size. This mostly works well with the exception of removing image blobs in a varbinary(max) field.
The nightly procedure has a loop that sets 10 records varbinary(max) field to null at a time, waits a couple seconds, then sets the next 10. Nightly total for this loop is about 2000.
This loop will run for about 45 - 60 minutes and then stop running with no return to my remote Sql Agent job and no error reported. A second and sometimes third running of the procedure is necessary to finish setting the desired blobs to null.
In an attempt to alleviate the load on the nightly procedure, I started running a job once every 30 seconds throughout the day - it sets one blob to null each time.
Normally this trickle job is fine and runs in 1 - 6 seconds. However, once or twice a day something goes wrong and I can find no explanation for it. The Data I/O percentage peaks at 100% and stays there for 30 - 60 minutes or longer. This causes the database responsiveness to suffer and the website performance goes with it. The trickle job also reports running for this extended period of time. If I stop the Sql Agent job, it can take a few minutes to stop but the Data I/O continues at 100% for the 30 - 60 minute period.
The web service requests and database demands are relatively steady throughout the business day - no volatile demands that would explain this. No database deadlocks or other errors are reported. It's as if the database hits some kind of backlog limit where its ability to keep up suddenly drops and then it can't catch up until something that is jammed finally clears. Then the performance will suddenly return to normal.
Do you have any ideas what might be causing this intermittent and unpredictable issue? Any ideas what I could look at when one of these events is happening to determine why the Data I/O is 100% for an extended period of time? Thank you.
If you are on SQL DB V12, you may also consider using the Query Store feature to root cause this performance problem. It's now in public preview.
In order to turn on Query Store just run the following statement:
ALTER DATABASE your_db SET QUERY_STORE = ON;

Impala query stuck in status Executing

I have a query CREATE TABLE foobar AS SELECT ... that runs successfully in Hue (the returned status is Inserted 986571 row(s)) and takes a couple seconds to complete. However, in Cloudera Manager its status - after more than 10 minutes - still says Executing.
Is it a bug in Cloudera Manager or is this query actually still running?
When Hue executes a query, it leaves the query open so that users can page through results at their own pace. (Of course, this behavior isn't very useful for DDL statements.) That means that Impala still considers the query to be executing, even if it is not actively using CPU cycles (keep in mind it is still holding memory!). Hue will close the query if explicitly told to, or when the page/session is closed, e.g. using the hue command:
> build/env/bin/hue close_queries --help
Note that Impala has a query option to automatically 'timeout' queries after a period of time, see query_timeout_s. Hue sets this to 10 minutes by default, but you can override it in the hue.ini settings.
One thing to note is that when queries 'time out', they are cancelled but not closed, i.e. the query will remain "in flight" with a CANCELLED status. The reason for this is so that users (or tools) can continue to observe the query metadata (e.g. query profile, status, etc.), which would not be available if the query is fully closed and thus deregistered from the impalad. Unfortunately these cancelled queries may still hold some non-negligible resources, but this will be fixed with IMPALA-1575.
More information: Hive and Impala queries life cycle

Oracle query running slower second time

I'm trying to run a query against an Oracle database. The first time I run it, it takes about 7 seconds. The second time I try to run it, it doesn't seem to ever actually complete. This is in both Oracle SQL developer and in the application I am developing (using JDBC Oracle Thin).
If I add a space to the query somewhere where I haven't done before, the query takes 7 seconds again.
I'm assuming this is because Oracle thinks it is a new query. Is there any way I can force Oracle to treat a query as one it hasn't seen before, even though it has?

Resources