Fetching rows with Snowflake JDBC while the query is running on the server

I have a complex query that runs a long time (e.g. 30 minutes) in Snowflake when I run it in the Snowflake console. I am making the same query from a JVM application using the JDBC driver. What appears to happen is this:
Snowflake processes the query from start to finish, taking 30 minutes.
The JVM application receives the rows. The first rows arrive 30 minutes after the query started.
What I'd like to happen is that Snowflake starts to send rows to my application while it is still executing the query, as soon as data is ready. That way my application could start processing the rows during the first 30 minutes.
Is this possible with Snowflake and JDBC?

First of all, I would suggest checking the Snowflake warehouse size and tuning it. It's not worth waiting 30 minutes when resizing the warehouse can cut the query time to a quarter or less. Because a larger warehouse bills more credits per hour but runs for a shorter time, either of the options below keeps your cost roughly the same or lower, and for queries that parallelize well the execution time drops roughly in proportion to the warehouse size. Refer to the Snowflake documentation on scaling:
Scale up by resizing a warehouse.
Scale out by adding clusters to a warehouse (requires Snowflake Enterprise Edition or higher).
Now coming to JDBC: I believe it behaves the same way as drivers for other databases do - executeQuery does not return until the statement has finished executing on the server, which matches the 30-minute wait you are seeing.
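What the client can still control is how the finished result is consumed: with a modest fetch size the driver hands rows over in chunks, so the application starts processing as soon as the first chunk arrives instead of buffering the whole result in memory. A minimal sketch, assuming the Snowflake JDBC driver is on the classpath; the URL, credentials, table name and process method are placeholders:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;
    import java.util.Properties;

    public class SnowflakeStreamingFetch {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("user", "MY_USER");         // placeholder credentials
            props.put("password", "MY_PASSWORD");
            props.put("warehouse", "MY_WH");      // placeholder warehouse
            // Placeholder account URL.
            String url = "jdbc:snowflake://myaccount.snowflakecomputing.com/";
            try (Connection conn = DriverManager.getConnection(url, props);
                 Statement stmt = conn.createStatement()) {
                // Hint the driver to hand rows over in batches rather than
                // materializing the whole result in client memory.
                stmt.setFetchSize(1000);
                try (ResultSet rs = stmt.executeQuery("SELECT * FROM my_big_table")) {
                    while (rs.next()) {
                        // Processing starts as soon as the first result
                        // chunk is available, not after all rows arrive.
                        process(rs.getString(1));
                    }
                }
            }
        }

        private static void process(String value) {
            // Row-by-row work goes here.
        }
    }

setFetchSize is only a hint to the driver, but it keeps client memory bounded while the rows stream in.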

Related

LogStash JDBC input scheduling issue - is it possible to run a query a minute after the previous query finished

I'm using the Jdbc input in LogStash to retrieve data from an MS SQL database once a minute.
Usually it works fine. But we know database performance is not a very reliable thing, and sometimes a query takes longer than one minute to return. Sometimes even 5 minutes.
The Jdbc scheduler still runs a query once a minute, though, so there are situations when multiple queries run at the same time. This creates additional pressure on the database, and after some time there are 20 nearly identical queries running at once.
I assume I'm not the first person to encounter this problem. I'm sure there is some way to make the Jdbc input run the next query a minute after the previous one has finished. Am I right?
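What is being asked for here is fixed-delay rather than fixed-rate scheduling. For comparison, plain Java's ScheduledExecutorService makes exactly this distinction; in the sketch below, with a placeholder connection string and query, the next poll fires only one minute after the previous one completes. Whether the LogStash jdbc input exposes an equivalent option is something to check in its documentation.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class FixedDelayPoller {
        public static void main(String[] args) {
            ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();
            // Unlike a cron-style fixed-rate schedule, scheduleWithFixedDelay
            // waits one minute AFTER each run completes, so a slow query can
            // never overlap with the next one.
            scheduler.scheduleWithFixedDelay(FixedDelayPoller::poll, 0, 1, TimeUnit.MINUTES);
        }

        private static void poll() {
            // Placeholder connection string and query.
            String url = "jdbc:sqlserver://myhost:1433;databaseName=mydb;"
                + "user=me;password=secret";
            try (Connection conn = DriverManager.getConnection(url);
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("SELECT * FROM events")) {
                while (rs.next()) {
                    // Handle each row here.
                }
            } catch (Exception e) {
                // Swallow failures so one bad run does not cancel the schedule.
                e.printStackTrace();
            }
        }
    }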

Sybase connection is idle for a long time

I'm reading data from a table in Sybase using a Table Input step. The query is really simple:
SELECT person_ref, displayname FROM person
That table has about 2 million rows. I'm connecting to Sybase ASE 12. My user has read-only rights. PDI is using the jconnect driver with the following options:
IMPLICIT_CURSOR_FETCH_SIZE=5000
SELECT_OPENS_CURSOR=True
I've also tried using the noholdlock option on that query to change the isolation level.
The problem is that the query seems to remain idle for a long time, nearly a minute. PDI indicates that the step is in idle state for that time and then changes to Running. This makes it hard to measure the time the process takes, because PDI won't start measuring time until the steps change state from idle.
I can't seem to find anything in the manuals, or any option that will speed up the read by decreasing or eliminating this idle time. Is there any option I'm missing? Does the idle status mean that PDI is just waiting for a response from Sybase?
Maybe your query just takes a long time to retrieve the data.
The latency is inherent in the JDBC architecture: the query is sent to the database, which stores the results in a buffer. Only when this buffer is full is the data transferred back to PDI. Until PDI receives some data, the Table Input step stays in idle mode.
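The jConnect options quoted in the question can also be set programmatically, which makes this buffering behaviour easier to experiment with outside PDI. A minimal sketch, assuming the jConnect driver jar is on the classpath; host, port, database and credentials are placeholders:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.util.Properties;

    public class SybaseCursorFetch {
        public static void main(String[] args) throws Exception {
            // Driver class name varies by jConnect version (jdbc3/jdbc4).
            Class.forName("com.sybase.jdbc4.jdbc.SybDriver");
            Properties props = new Properties();
            props.put("user", "readonly_user");   // placeholder credentials
            props.put("password", "secret");
            // The same options PDI passes: open a server-side cursor and
            // pull rows 5000 at a time instead of waiting for the server
            // to fill one huge buffer with the whole result.
            props.put("SELECT_OPENS_CURSOR", "true");
            props.put("IMPLICIT_CURSOR_FETCH_SIZE", "5000");
            String url = "jdbc:sybase:Tds:myhost:5000/mydb"; // placeholder host/db
            try (Connection conn = DriverManager.getConnection(url, props)) {
                // Run: SELECT person_ref, displayname FROM person
                // and process rows as each 5000-row batch arrives.
            }
        }
    }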
If you want to measure the time including the idle time, add a step that fires without any latency, for example a Generate Rows step (1 row is enough). You do not need to connect this step to anything, as PDI starts all the steps in parallel as soon as possible.
You won't see the total on the Table Input row of the Step Metrics bottom tab, but you will have the result in the Metrics tab.
You can also use a Block this step until steps finish step. There is an example in the samples directory shipped with your distribution: open yourKettleInstallDir/sample/transformation/Block this step until steps finish.ktr, replace the top row with your flow, and then watch the statistics of the blocking step.
In my opinion, you have another step in your transformation locking the person table. There is an overwhelming probability that you have a Table Output step trying to truncate the person table.
I don't know if this is what I would call an answer, but I definitely found a way to get the Sybase connection to respond quickly. There's a querying tool called Sybase Anywhere that you can use to query the DB directly. What I did was look at an installation on a separate machine that had a good connection.
That machine had an ODBC connection defined for the Sybase DB, and the install of the client tool had its own version of the Sybase drivers, along with some DLL files. I took the jars and DLLs and put them on the machine that had PDI installed. I made sure they were all on the classpath, and created a generic JDBC connection that pointed to the system ODBC one. It's going at the speed you would expect now.

Long Running Query on MSSQL

In my team, we need to connect to Oracle, Sybase and MSSQL very frequently. We use Oracle's SQLDeveloper 3.3.2 to connect to all 3 (using third-party libs). This tool often has a problem where select queries never end: even after we get the results, the query keeps on running, and because of this we receive database alerts for long-running queries.
E.g.
Select * from products
If products has a million records, SQLDeveloper will show the top records, but in the background the query will keep on running.
How can this problem be solved? Or is there a better product that can fulfill our need?
Your query - select * from products - is asking the database engine to send millions of records to your client application (SQLDeveloper in this case).
While SQLDeveloper (and many other GUIs of a similar design) will show you the first 30 (or 50, or 100, etc) rows, as far as the database engine is concerned you're still asking for millions of rows, hence your query continues to 'run' in the database engine.
For example, in Sybase ASE the query will show up with a status of 'send sleep' meaning the database engine is waiting for the client application to request the next batch of records to send down the connection.
To 'solve' this issue you have a few options:
- using SQLDeveloper: scroll through (ie, display on your monitor) the rest of the multi-million row result set [likely not what you want to do; likely you don't have the time/desire to hit the 'Next' button hundreds of thousands of times]
- kill off your query after you've received/viewed the first set of records [not recommended, as there will likely be times when you 'forget' to kill off your query, thus earning the wrath of your DBA]
- write your query to pull back only the records you REALLY want/need to see (eg, add a WHERE clause to limit the set of rows, or cap the result from the client side as in the sketch after this list)
- see if SQLDeveloper has any sort of configuration option to auto-kill any 'long running' queries [I have no idea if this is even doable in a client application]
- see if the DBA can configure your login with a resource limit (eg, auto-kill queries if they run for more than XX seconds)
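If the client is your own JDBC code rather than SQLDeveloper, a sketch of the 'cap the result' option looks like the following - Statement.setMaxRows limits the result on the statement; the connection string and table name are placeholders:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class TopRowsOnly {
        public static void main(String[] args) throws Exception {
            // Placeholder connection string.
            String url = "jdbc:sqlserver://myhost:1433;databaseName=sales;"
                + "user=me;password=secret";
            try (Connection conn = DriverManager.getConnection(url);
                 Statement stmt = conn.createStatement()) {
                // Per the JDBC spec, rows beyond this limit are silently
                // dropped, so the client never sits on millions of
                // unfetched rows.
                stmt.setMaxRows(100);
                try (ResultSet rs = stmt.executeQuery("SELECT * FROM products")) {
                    while (rs.next()) {
                        System.out.println(rs.getString(1));
                    }
                }
            }
        }
    }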

SegmentCacheManager - warmup segments strategy/SQL in parallel?

We run Mondrian (version "3.7.preview.506") on a Tomcat Webserver.
We have some long-running MDX queries.
For example: the first calculation takes 276,764 ms and sends 84 SQL requests to the database (30 to 700 ms for each SQL statement).
We see that the SQL statements are not executed in parallel - only two "mondrian.rolap.agg.SegmentCacheManager$sqlExecutor" threads are running at the same time.
Is there a way to force Mondrian/olap4j to execute the SQL statements more in parallel?
What about the property "mondrian.rolap.maxSqlThreads", which is set to 100 by default?
Afterwards we execute the same MDX query and the calculation finishes in 4,904 ms.
Conclusion - if the "internal cache" (mondrian.rolap.agg.SegmentCacheManager) has the segments loaded, the calculation runs without any database requests - but ...
How can we "warm up" the internal cache?
One way we tried was to rewrite the MDX queries - we load several months into the cache at once (MDX-B):
MDX-A: SELECT ... ON ROWS FROM cube01 WHERE {[Time].[Default].[2017].[4]}
becomes
MDX-B: SELECT ... ON COLUMNS, CrossJoin( ... ,{[Time].[Default].[2017].[2]:[Time].[Default].[2017].[4]})" + " ON ROWS FROM cube01
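For context, a warm-up query like MDX-B can be fired through olap4j as in the following minimal sketch. It assumes the Mondrian olap4j driver is on the classpath; the connection URL and catalog path are placeholders, and the measure and first CrossJoin set stand in for the parts elided as "..." above:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import org.olap4j.CellSet;
    import org.olap4j.OlapConnection;
    import org.olap4j.OlapStatement;

    public class MondrianCacheWarmup {
        public static void main(String[] args) throws Exception {
            Class.forName("mondrian.olap4j.MondrianOlap4jDriver");
            // Placeholder olap4j URL: the underlying JDBC source plus the schema.
            Connection connection = DriverManager.getConnection(
                "jdbc:mondrian:Jdbc=jdbc:oracle:thin:@//dbhost:1521/orcl;"
                + "Catalog=file:/path/to/schema.xml;");
            OlapConnection olapConnection = connection.unwrap(OlapConnection.class);
            OlapStatement statement = olapConnection.createStatement();
            // The broad three-month query pulls the Feb-Apr segments into the
            // SegmentCacheManager so later single-month queries hit memory.
            // [Measures].[Sales] and [Product].Members are invented stand-ins.
            CellSet cellSet = statement.executeOlapQuery(
                "SELECT {[Measures].[Sales]} ON COLUMNS, "
                + "CrossJoin({[Product].Members}, "
                + "{[Time].[Default].[2017].[2]:[Time].[Default].[2017].[4]}) ON ROWS "
                + "FROM cube01");
            cellSet.close();
            connection.close();
        }
    }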
The rewritten MDX query takes 1,235,128 ms (244 SQL requests) - afterwards we execute our original MDX query (MDX-A) and the calculation takes 6,987 ms.
The interesting part for us was that this calculation took longer than 5 sec. (compared with the second execution of the same query), even though there were no SQL requests anymore.
The warm-up of the cache does not work as expected (in our opinion) - MDX-B takes much longer to collect the data in one statement than running the monthly execution in three steps (February to April) would, and the calculation in memory also takes more time. Why? How does segment loading really work?
What is the best practice for loading segments to speed up calculation in memory?
Is there a way to feed the "Mondrian cube" with simple SQL statements?
Thanks in advance.
Fact table with 3,026,236 rows - growing daily.
6 dimension tables.
Date dimension table: 21,183 rows.
We have monitored our test classes with the JVM's VisualVM.
Mondrian 3.7.preview.506 - olap4j-1.1.0
Database: Oracle Database 11g Release 11.2.0.4.0 - 64bit
(we also tried a memSQL database, but it was only 50% faster ...)

Azure SQL Data IO 100% for extended periods for no apparent reason

I have an Azure website running about 100K requests/hour and it connects to Azure SQL S2 database with about 8GB throughput/day. I've spent a lot of time optimizing the database indexes, queries, etc. Normally the Data IO, CPU and Log IO percentages are well behaved in the 20% range.
A portion of the recent data is retained for a period to support our customers. I have a nightly maintenance procedure that removes obsolete data to manage database size. This mostly works well, with the exception of removing image blobs in a varbinary(max) field.
The nightly procedure has a loop that sets the varbinary(max) field of 10 records to null at a time, waits a couple of seconds, then sets the next 10. The nightly total for this loop is about 2000 records.
This loop will run for about 45-60 minutes and then stop, with nothing returned to my remote SQL Agent job and no error reported. A second and sometimes third run of the procedure is necessary to finish setting the desired blobs to null.
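For illustration only, the shape of that loop is roughly the following JDBC sketch - the table and column names are invented, and the real procedure runs as T-SQL under SQL Agent rather than from Java:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    public class NightlyBlobPurge {
        public static void main(String[] args) throws Exception {
            // Placeholder Azure SQL connection string.
            String url = "jdbc:sqlserver://myserver.database.windows.net:1433;"
                + "databaseName=mydb;user=me;password=secret;encrypt=true";
            try (Connection conn = DriverManager.getConnection(url)) {
                // Invented table/column names: null out expired image blobs
                // 10 rows at a time, pausing between batches to spread the
                // Data IO load.
                String sql = "UPDATE TOP (10) dbo.CustomerImages "
                    + "SET ImageBlob = NULL "
                    + "WHERE ImageBlob IS NOT NULL AND RetainUntil < GETUTCDATE()";
                try (PreparedStatement ps = conn.prepareStatement(sql)) {
                    int updated;
                    do {
                        updated = ps.executeUpdate();
                        Thread.sleep(2000); // wait a couple of seconds
                    } while (updated > 0); // ~2000 rows total per night
                }
            }
        }
    }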
In an attempt to alleviate the load on the nightly procedure, I started running a job once every 30 seconds throughout the day - it sets one blob to null each time.
Normally this trickle job is fine and runs in 1 - 6 seconds. However, once or twice a day something goes wrong and I can find no explanation for it. The Data I/O percentage peaks at 100% and stays there for 30 - 60 minutes or longer. This causes the database responsiveness to suffer and the website performance goes with it. The trickle job also reports running for this extended period of time. If I stop the Sql Agent job, it can take a few minutes to stop but the Data I/O continues at 100% for the 30 - 60 minute period.
The web service requests and database demands are relatively steady throughout the business day - no volatile demands that would explain this. No database deadlocks or other errors are reported. It's as if the database hits some kind of backlog limit where its ability to keep up suddenly drops and then it can't catch up until something that is jammed finally clears. Then the performance will suddenly return to normal.
Do you have any ideas what might be causing this intermittent and unpredictable issue? Any ideas what I could look at when one of these events is happening to determine why the Data I/O is 100% for an extended period of time? Thank you.
If you are on SQL DB V12, you may also consider using the Query Store feature to find the root cause of this performance problem. It's now in public preview.
In order to turn on Query Store just run the following statement:
ALTER DATABASE your_db SET QUERY_STORE = ON;
