How to recover from network error when running a large Teradata query? - jdbc

I have a Java job that runs a query on Teradata and pushes the results to a local database. It's a large query (>80M records) and can take hours to finish (the slowness is due to the local DB, not Teradata). Because it takes so long, there is a chance that it gets interrupted by a network error or something similar. When that happens I get this exception:
org.skife.jdbi.v2.exceptions.ResultSetException: Unable to advance result set
If the failure occurs a few hours into the query, I cannot rerun it, because the job needs to deliver the result before a specific time every day. Is there a way to resume the query after such a failure? I'm not sure pagination is a good option, because the query joins several tables that are updated frequently.
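To make the "resume after failure" idea concrete, here is a minimal sketch of checkpointed copying. It is not a Teradata or JDBI feature. It assumes the rows can be read in ascending order of a unique key from a stable snapshot (for example a staging table filled once by the join, which is itself an assumption) and that each batch write either commits fully or not at all. The names ResumableCopy, Source and Sink are hypothetical, and rows are reduced to their keys to keep the example short.

```java
import java.util.List;

final class ResumableCopy {
    /** Hypothetical source: returns up to `limit` keys greater than `afterKey`, ascending. */
    interface Source {
        List<Long> fetchAfter(long afterKey, int limit) throws Exception;
    }

    /** Hypothetical sink: stores one batch and returns only once it is committed. */
    interface Sink {
        void write(List<Long> batch) throws Exception;
    }

    /**
     * Copies everything after `startAfter` in batches. On a failure it retries
     * from the last committed key instead of starting over.
     * `maxRetries` bounds the total number of failures tolerated for the whole copy.
     * Returns the last key that was committed.
     */
    static long copy(Source source, Sink sink, long startAfter,
                     int batchSize, int maxRetries) throws Exception {
        long checkpoint = startAfter;
        int failures = 0;
        while (true) {
            List<Long> batch;
            try {
                batch = source.fetchAfter(checkpoint, batchSize);
                if (batch.isEmpty()) {
                    return checkpoint; // nothing left to copy
                }
                sink.write(batch);
            } catch (Exception e) {
                if (++failures > maxRetries) {
                    throw e; // give up and surface the last error
                }
                continue; // retry the same batch from the last committed key
            }
            // Advance only after the batch is committed.
            checkpoint = batch.get(batch.size() - 1);
        }
    }
}
```

In a real job the checkpoint would also have to be persisted (for example in the local database, in the same transaction as the batch), so that a restarted process can pick up where the previous one stopped.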

Related

Impala query with LIMIT 0

As a member of a production support team, I investigate issues with various Impala queries. While researching an issue, I saw that a team submits an Impala query with LIMIT 0, which obviously does not return any rows, and then submits it again without LIMIT 0, which gives them the result. I guess they submit these queries from IBM DataStage. Before I ask them why they do so, I wanted to check what reason someone could have to run a query with LIMIT 0. Is it just to check the syntax or the connection to Impala? I see a similar question discussed here in the context of SQL, but thought I would ask anyway from an Impala perspective. Thanks, Neel
I think you are partially correct.
Please note that the LIMIT clause is applied only after all the data has been processed.
LIMIT 0 is mostly used to:
Check whether the syntax of the SQL is correct. Impala does fetch all the records before applying the limit, so the SQL is completely validated. Some systems may use this to check SQL they generated automatically before actually running it on the server.
Avoid fetching lots of rows from a huge table or data set every time you run the SQL.
Create an empty table using the structure of other tables without copying the storage format, configuration, etc.
Avoid burdening Hue or any other interface that interacts with Impala. All the data will be processed but not returned.
Performance test: this gives you a rough idea of the run time of the SQL. I say rough because it is not the actual time to complete but an estimate.

Fetching rows with Snowflake JDBC while the query is running in the server

I have a complex query that runs for a long time (e.g. 30 minutes) in Snowflake when I run it in the Snowflake console. I am making the same query from a JVM application using the JDBC driver. What appears to happen is this:
Snowflake processes the query from start to finish, taking 30 minutes.
JVM application receives the rows. The first receive happens 30 minutes after the query started.
What I'd like to happen is that Snowflake starts to send rows to my application while it is still executing the query, as soon as data is ready. This way my application could start processing the rows in the first 30 minutes.
Is this possible with Snowflake and JDBC?
First of all, I would suggest checking the Snowflake warehouse size and tuning it. It's not worth waiting 30 minutes when resizing the warehouse could cut the query time to a quarter or less. With either of the options below, your cost will be almost the same or lower. The query execution time will be reduced linearly as you increase the warehouse size. Refer to the link:
Scale up by resizing a warehouse.
Scale out by adding clusters to a warehouse (requires Snowflake Enterprise Edition or higher).
Now, coming to JDBC, I believe it behaves the same way as it does for other databases.

LogStash JDBC input scheduling issue - is it possible to run query a minute after the previous query finished

I'm using the JDBC input in Logstash to retrieve data from an MS SQL database once a minute.
Usually it works fine. But database performance is not very reliable, and sometimes a query takes longer than one minute to return, sometimes even 5 minutes.
But the JDBC scheduler still runs a query once a minute, so there are situations where multiple queries run at the same time. This puts additional pressure on the database, and after some time there are 20 almost identical queries running at the same time.
I assume I'm not the first person to encounter this problem. I'm sure there is some way to make the JDBC input run the next query a minute after the previous one has finished. Am I right?
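The behaviour asked for here is usually called fixed-delay scheduling, as opposed to fixed-rate scheduling. Purely as an illustration of that difference, and not as a Logstash option, here is a small sketch using the standard java.util.concurrent scheduler. The class name FixedDelayPoller and the `query` parameter are hypothetical.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

final class FixedDelayPoller {
    /**
     * Runs `query` immediately, then again `delay` after each run has finished.
     * A slow run pushes the next one back, so runs do not pile up behind it.
     */
    static ScheduledFuture<?> start(ScheduledExecutorService scheduler,
                                    Runnable query, long delay, TimeUnit unit) {
        Runnable safe = () -> {
            try {
                query.run();
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all later runs, so log and carry on.
                System.err.println("poll failed: " + e);
            }
        };
        return scheduler.scheduleWithFixedDelay(safe, 0, delay, unit);
    }
}
```

A caller would pass something like Executors.newSingleThreadScheduledExecutor() as the scheduler, with a delay of 1 and TimeUnit.MINUTES.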

Server issue when searching Oracle database

I have a JEE application searching a large Oracle database for data. The application uses JDBC to query the database.
The issue I am having is that the results page cannot be displayed. I get the following error:
The connection to the server was reset while the page was loading.
This happens after 60 seconds. When I run the SQL query manually using a SQL client, the results return in 3 seconds.
I have checked the logs and there are no exceptions that I can see.
Do any of you know the best way to find what is causing the connection to be reset? If I break my search date range into 2, and search both ranges individually, both return results. So it seems that it's the larger result set causing the issue.
Any help is welcome.
You are probably right about the larger result set. Often when running a query from a SQL client, you'll get the first set of records right away. If you page down to force a pull of all records, then it bogs down. Perhaps you're hitting the same issue with the JDBC client, where it takes more than 60 seconds to get all the rows. I've not done JDBC in a while, but can you get it to stream the result set?
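As a rough sketch of what streaming a result set can look like in plain JDBC: the code below asks for a forward-only, read-only result set and sets a fetch size, then handles rows one at a time instead of collecting them all first. The names StreamingQuery and RowHandler are hypothetical, and the fetch size is only a hint that a driver may treat differently or ignore, so whether this helps with the 60-second reset is an assumption to verify.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

final class StreamingQuery {
    /** Hypothetical callback, invoked once per row while the cursor is positioned on it. */
    interface RowHandler {
        void handle(ResultSet row) throws SQLException;
    }

    /**
     * Runs `sql` and passes each row to `handler` as it is read.
     * Returns the number of rows handled.
     */
    static long stream(Connection conn, String sql, int fetchSize,
                       RowHandler handler) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                sql, ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
            // Hint to the driver: fetch this many rows per round trip.
            ps.setFetchSize(fetchSize);
            long count = 0;
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    handler.handle(rs); // process the row now rather than buffering it
                    count++;
                }
            }
            return count;
        }
    }
}
```

For a web page, this only helps if the handler also writes output incrementally (or the result is paged), since the response still cannot complete until the last row is handled.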

SQL Server 2000 query performance

We have migrated a few of our servers to named instances, and a stored procedure is now taking more time to execute.
The stored procedure has some business logic that queries a table. I have indexes on the table. My question is how the same stored procedure, used on the same table with the same indexes, can take different times in two different production databases. I understand that database performance depends on the load on the database, but I am executing it outside business hours and I think the load is almost the same. It takes 10 seconds to execute on the new named-instance server and 3 seconds on the old server. Do I need to defragment the table on the new server? Will that solve the problem? Any idea how I should check where the problem is on the new server?
Edit: when I checked the execution plan, it showed 38% of the execution time in an eager spool (to create a temporary index). Can you please explain how I can avoid this part of the execution?
I am not getting this while executing on the non-named-instance server (where it takes 3 seconds to execute).
Edit: will rebuilding the indexes improve performance?
First check: compare the execution plans for the queries on both servers. Do they match?
Edit: the plans do not match, so my next thought is that the schemas (indexes) are not identical, or the statistics on the newer instance are not up to date. Try running sp_updatestats on the newer instance and see if the plan changes.
