A little bit of background. This is a script designed to narrow down a large data set (3+GB files). What I have is a series of SQL queries to create temporary tables for inserting/deleting from other tables.
Here is what the first few queries look like:
Query #1
create table clash as
select *
from StallConnected
group by Store, Stall, StartTime
having count(*) > 1;
Query #2
create table OverlappingStarts as
select A.*
from StallConnected as A
join clash as B
  on A.Store = B.Store
  and A.Stall = B.Stall
  and A.StartTime = B.StartTime
order by A.Store, A.Stall, A.StartTime;
Now on to the meat of the issue. I'm executing these queries in sequence using a db connection in python's sqlite3 module on a single thread. Here's the code:
for i, val in enumerate(queries):
    print "Step " + str(i + 1) + " of " + steps
    db.executescript(val)
    db.commit()
I know that executescript() issues a COMMIT before running the script, but what happens is that it performs the first query just fine, while the second query simply hangs. No exceptions, nothing.
I know it can't possibly be a lock timeout, since this is running on a single thread, and it doesn't throw an exception either (obviously, it just hangs). I can tell it has stalled because the db-journal file stays at only 2 KB.
What I've tried:
Committing after every statement
Closing/reopening the connection
Using execute() over executescript()
Using a cursor object over directly calling execute() on the db connection
Any thoughts? Am I doing anything inherently wrong? Windows file locking issue that I don't know about?
EDIT 1: After running the script for the past hour, I have found that part of the table has actually been populated. What is the deal here? Running my entire SQL script inside DB Browser takes only about 30 seconds, yet in Python it takes upwards of an hour to populate one part of a single table?
Side note:
>>> sqlite3.version
'2.6.0'
>>> sqlite3.sqlite_version
'3.6.21'
>>>
Mystery solved! Evidently my sqlite.dll library was horribly out of date and couldn't perform joins efficiently (something that didn't immediately jump out at me).
@CL thanks for the heads up!
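For reference, the join in Query #2 can also be helped on an old SQLite build by indexing the join columns explicitly (newer versions can build an automatic index for such a join on their own). A minimal sketch, assuming no such indexes exist yet; the index names are made up:
-- StallConnected is the large table, so indexing it matters most;
-- indexing clash as well is cheap since it only holds the duplicates.
create index if not exists idx_stallconnected_key
on StallConnected (Store, Stall, StartTime);
create index if not exists idx_clash_key
on clash (Store, Stall, StartTime);
With these in place, even an old engine can look up matching rows instead of re-scanning the whole table for every row of clash.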
I have come across a very interesting problem (at least for me).
When I run the following SQL:
SELECT count(*) AS [count]
FROM [dbo].[contract_v] AS [contract_v]
WHERE 1 = 0;
SELECT *
FROM [dbo].[contract] AS [contract]
LEFT JOIN ([dbo].[contract_accepted_garbage_type] AS [garbageTypes->contract_accepted_garbage_type]
INNER JOIN [dbo].[garbage_type] AS [garbageTypes] ON [garbageTypes].[id] = [garbageTypes->contract_accepted_garbage_type].[garbage_type_id])
ON [contract].[id] = [garbageTypes->contract_accepted_garbage_type].[contract_id]
WHERE [contract].[id] IN (125018);
Execution takes 21 s.
However, when I add a GO statement, as follows:
SELECT count(*) AS [count]
FROM [dbo].[contract_v] AS [contract_v]
WHERE 1 = 0;
GO
SELECT *
FROM [dbo].[contract] AS [contract]
LEFT JOIN ([dbo].[contract_accepted_garbage_type] AS [garbageTypes->contract_accepted_garbage_type]
INNER JOIN [dbo].[garbage_type] AS [garbageTypes] ON [garbageTypes].[id] = [garbageTypes->contract_accepted_garbage_type].[garbage_type_id])
ON [contract].[id] = [garbageTypes->contract_accepted_garbage_type].[contract_id]
WHERE [contract].[id] IN (125018);
It takes only 2 s.
The view used in the first statement is based on the table queried in the second statement.
Could you please explain this behaviour to me? I know that the GO statement makes the database create a separate execution plan for every batch. I have checked the execution plans, and the actual steps are identical.
Thank you!
The GO keyword separates execution batches. If the underlying tables are the same in both queries and the queries are executed in the same batch, both have to be executed in the same transaction context. This ensures that the underlying data in both tables is the same during both executions.
If you use separate batches (a GO statement in between), you cannot guarantee that the data will be consistent, since rows could theoretically be modified between the executions.
If you don't care about the chance of the data changing between the queries, then by all means use GO for the performance gain. If you do care, consider it a dangerous move.
SQL Server applications can send multiple Transact-SQL statements to an instance of SQL Server for execution as a batch. The statements in the batch are then compiled into a single execution plan. Programmers executing ad hoc statements in the SQL Server utilities, or building scripts of Transact-SQL statements to run through the SQL Server utilities, use GO to signal the end of a batch.
https://learn.microsoft.com/en-us/sql/t-sql/language-elements/sql-server-utilities-statements-go?view=sql-server-ver15
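If consistency matters but you still want the separate batches (and their separate plans), one option is an explicit transaction spanning both batches; a transaction opened in one batch stays open across GO within the same session. A rough sketch, assuming snapshot isolation has been enabled on the database (ALTER DATABASE ... SET ALLOW_SNAPSHOT_ISOLATION ON); SERIALIZABLE would give a similar guarantee via locking:
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;
-- first batch: the count(*) against [contract_v] from the question
GO
-- second batch: the join against [contract] from the question
GO
COMMIT TRANSACTION;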
I've gotten to one of those places where I've been toying with something for a while, trying to figure out why it's not working, and figured I would ask here. I am currently in the middle of making adjustments to a batch process that involves creating an external table A used for staging and then transferring the data from that table over to table B for further processing.
There's a step in the batch, which was there before, that loads all that data, and it goes like this:
INSERT INTO TABLE B SELECT * FROM TABLE A
Upon running this statement, both in the batch and outside of it in Oracle Developer, I get the following error:
Run query ORA-00932: inconsistent datatypes: expected DATE got NUMBER
I went through my adjustments line by line and made sure I had the right data types. I also went over the data itself as best I could, and from what I can tell it seems normal as well. In an effort to find which individual field could be causing the error, I attempted to load the data from table A to table B one column at a time... Doing this I received no errors, which shocked me somewhat. If I use the SQL below, with all the fields listed out individually, the load of all the data works flawlessly. Can someone explain why this might be? Does the statement below do some internal Oracle work that the previous one does not?
insert into TABLE B (
COLUMN_ONE,
COLUMN_TWO,
COLUMN_THREE
.
.
.)
select
COLUMN_ONE,
COLUMN_TWO,
COLUMN_THREE
.
.
.
from TABLE A;
Well, if you had posted the descriptions of tables A and B, we could see it for ourselves. As it is now, we have to trust what you're saying, i.e. that everything matches (but Oracle disagrees), so I don't know what to say.
On the other hand, I've learnt that using
INSERT INTO TABLE B SELECT * FROM TABLE A
is a poor way of handling things (unless it's quick & dirty testing). I try to always name all the columns I'm working with, no matter how many of them are involved in that particular operation. As you noticed, that seems to be working well for you too, so I'd suggest you keep doing it.
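To illustrate why the explicit column list behaves differently, here is a hedged, self-contained sketch with made-up tables (not the actual A and B): SELECT * pairs columns strictly by position, so if the two tables declare their columns in a different order the datatypes clash, while explicit lists let you pin each source column to the intended target column.
-- Hypothetical staging and target tables whose columns are declared in a different order.
create table stage_a (amount number, load_date date);
create table target_b (load_date date, amount number);
-- Fails with ORA-00932 (expected DATE got NUMBER): by position,
-- stage_a.amount lands on target_b.load_date.
insert into target_b select * from stage_a;
-- Works: the explicit lists control the pairing regardless of declaration order.
insert into target_b (load_date, amount)
select load_date, amount from stage_a;
So if the adjustments changed the column order of the external staging table (or of the target), SELECT * would start failing even though every column still has a matching counterpart by name.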
Right now I am doing a migration from Firebird 2.5 to Postgres 9.4 for my company, and I have also converted the Firebird stored procedures into Postgres functions...
Now I have found that the performance is quite slow, but only if there are loops in which I execute further SQL statements with changing parameters.
So, for example, it looks like this (I have simplified it to the necessary things):
CREATE OR REPLACE FUNCTION TEST (TEST_ID BigInt) returns TABLE(NAME VARCHAR)
AS $$
declare
    _tmp bigint;
begin
    for _tmp in select id from test
    loop
        -- Shouldn't the following SQL work as a Prepared Statement?
        for name in select label from test2 where id = _tmp
        loop
            return next;
        end loop;
    end loop;
end; $$
LANGUAGE plpgsql;
So if I compare the time it takes to execute just the select inside the loop, Postgres is usually a bit faster than Firebird. But if the loop runs 100 or 1000 or 10000 times, the Firebird stored procedure is much faster. When I compare the times in Postgres, it seems that if the loop runs 10 times it takes 10 times longer than 1 row, and if it runs 1000 times it takes 1000 times longer... That should not happen if it's really a prepared statement, right?
I have also checked other things, like raising the memory settings and leaving out the "return next" statement, because I read that it can cause a performance problem as well...
It also has nothing to do with the "returns table" expression. If I leave that out, it takes the same time...
Nothing has worked so far...
Of course this simple example could also be solved with a single SQL statement, but the functions I migrated are much more complicated and I don't want to rewrite them as something entirely new (if possible)...
Am I missing something?
PL/pgSQL reuses prepared queries across function invocations; you only incur preparation overhead once per session. So unless you've been reconnecting between each test, the linear execution times are expected.
But it may also reuse execution plans, and sometimes this does not work to your advantage. Running your query in an EXECUTE statement can give better performance, despite the overhead of repreparing it each time.
See the PL/pgSQL documentation for more detail.
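As a rough sketch of that EXECUTE approach, using the same simplified test/test2 tables from the question (the function name test_dyn is made up to avoid clashing with the original, and test2.label is assumed to be a varchar), the inner statement is re-planned for each parameter value instead of reusing a possibly unsuitable cached plan:
CREATE OR REPLACE FUNCTION test_dyn(test_id bigint) returns TABLE(name varchar)
AS $$
declare
    _tmp bigint;
begin
    for _tmp in select id from test
    loop
        -- Dynamic SQL: planned anew on every execution with the current value of _tmp.
        return query execute 'select label from test2 where id = $1' using _tmp;
    end loop;
end; $$
LANGUAGE plpgsql;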
Finally got it... It was an index problem, though it doesn't completely make sense to me...
Because when I executed the SQL statements outside the function, they were already faster than Firebird with its indexes. Now, outside the functions, they are even twice as fast as before in Postgres, and the functions are now really fast as well. Also faster than in Firebird...
The reason I also wasn't considering this is that in Firebird the foreign keys work as indexes too. I expected it would be the same in Postgres, but it's not...
I really should have considered that earlier, also because of the comments from Frank and Pavel.
Thanks to all anyway...
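For anyone hitting the same thing: Postgres only creates indexes automatically for primary keys and unique constraints, not for the referencing columns of a foreign key, so those have to be added by hand. A sketch against the simplified example above, assuming test2.id is a plain foreign-key column referencing test (and not already a primary key):
-- Hypothetical index name; index whatever column the inner query filters on.
create index test2_id_idx on test2 (id);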
What am I missing here? I am trying to test identifying long running queries.
I have a test table with about 400 million rows called mytest.
I ran select * from mytest in sqlplus
In another window, I ran the script below to see my long running query
select s.username, s.sid, s.serial#, s.schemaname,
s.program, s.osuser, s.status, s.last_call_et
from v$session s
where last_call_et >= 1 -- this is just for testing
My long-running query does not show up in the result of the query above. If I change the criterion to >= 0, then I see my query with a status of INACTIVE and a last_call_et of 0, despite the fact that the query is still running. What can I do to see long-running queries like the select * from... above so that I can kill them?
Thanks
First, you need to understand what a query like select * from mytest is really doing under the covers because that's generally not going to be a long-running query. Oracle doesn't ever need to materialize that result set and isn't going to read all the data as the result of a single call. Instead, what goes on is a series of calls each of which cause Oracle to do a little bit of work. The conversation goes something like this.
Client: Hey Oracle, run the query for me: select * from mytest
Oracle: Sure thing (last_call_et resets to 0 to reflect that a new call started). I've generated a query plan and opened a cursor; here's a handle. (Note that no work has been done yet to actually execute the query.)
Client: Cool, thanks. Using this cursor handle, fetch me the next 50 rows. (The fetch size is a client-side setting.)
Oracle: Will do (last_call_et resets to 0 to reflect that a new call started). I started full scanning the table, read a couple of blocks, and got 50 rows. Here you go.
Client: OK, I've processed those. Using this cursor handle, fetch the next 50 rows.
Repeat until all the data is fetched.
At no point in this process is Oracle ever really being asked to do more than read a handful of blocks to get the 50 rows (or whatever the fetch size the client is requesting). At any point, the client could simply not request the next batch of data so Oracle doesn't need to do anything long-running. Oracle doesn't track the application think time between requests for more data-- it has no idea whether the client is a GUI that is in a tight loop fetching data or whether it is displaying a result to a human and waiting for a human to hit the "next" button. The vast majority of the time, the session is going to be INACTIVE because it's mostly waiting for the client to request the next batch of data (which it generally won't do until it had formatted the last batch of data for display and done the work to display it).
When most people talk about a long-running query, they're talking about a query that Oracle is actively processing for a relatively long time with no waits on a client to fetch the data.
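If the goal is to catch statements Oracle is actively working on, rather than sessions sitting idle between fetches, a hedged variant of the original query is to restrict it to active user sessions; the 60-second threshold here is arbitrary and just for illustration:
select s.username, s.sid, s.serial#, s.sql_id, s.status, s.last_call_et
from v$session s
where s.type = 'USER'
and s.status = 'ACTIVE'
and s.last_call_et >= 60; -- seconds spent in the current (still running) call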
You can use the script below to find long-running operations:
select * from
(
select
opname,
start_time,
target,
sofar,
totalwork,
units,
elapsed_seconds,
message
from
v$session_longops
order by start_time desc
)
where rownum <=1;
I have a weird problem right now: if a ref cursor returned from a stored procedure has only 1 record in it, the fetch operation hangs and freezes. The stored procedure execution is really fast; just the fetching process hangs. If the ref cursor has more than 1 record, then everything is fine. Has anyone had similar issues before?
The Oracle server is 11g running on Linux. The client is Windows Server 2003. I'm testing this using the generic Oracle sqlplus tool on the Windows server.
Any help and comments would be greatly appreciated. Thanks.
When you say hangs, what do you mean?
If the session is still active in the database (status in V$SESSION), then it is probably waiting on some event (e.g. "SQL*Net message from client" means it is waiting for the client to do something).
It may be that the query is taking a long time to find that there aren't any more rows. Consider a table of 10,000,000 rows with no indexes. The query may full scan the table and find that the first row matches the criteria. It still has to scan the next 9,999,999 rows to find that they don't. That can take a while.
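A minimal sketch for that first check, looking up what the session is currently waiting on (replace <sid> with the SID of the session doing the fetch):
select sid, status, event, state, seconds_in_wait
from v$session
where sid = <sid>;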
Since you are saying that the process hangs, is there a chance that your cursor does a "select for update" instead of a plain "select"? Since you say that fetching multiple records does not cause this error, that might not be the case.
Can you show us the code (or a small reproducible test/sample) for your select and the fetch?
Also, you can check v$locked_objects using the following query, filling in your table name(s), to see if the object in question is being locked. Again, unless your current query has "for update", this fetch should not hang.
select do.*
from v$locked_objects vo,
     dba_objects do
where vo.object_id = do.object_id
-- object_name lives in dba_objects, not v$locked_objects
and do.object_name = '<your_table_name>';