Okay, I appreciate that the question is a tad vague, but after a day of googling I'm getting nowhere. Any help will be appreciated, and I'm willing to try anything.
The issue is that we have a PostgreSQL database that has around 10-15 million rows in a particular table.
We're doing a select for all the columns, based on a DateTime field in the table. No joins, just a standard select with a where clause (time >= x AND time <= y). There is an index on the field as well...
When I perform the SQL using psql on the local machine, it runs in around 15-20 seconds and brings back 0.5 million rows; one of the columns is a text field holding a large amount of data per row (a program stack trace). When we use the same SQL and run it through Npgsql, or pgAdmin III on Windows, it takes around 2 minutes.
This is leading me to think that it's a network issue. I've checked on the machine when the query is running and it's not using a huge amount of memory or CPU, and the network utilisation is negligible.
I've gone through the recommendations on the Postgres site for the memory settings as well, including updating shmmax and shmall.
It's Ubuntu 10.04, PostgreSQL 8.4, 4GB RAM, 2.8GHz quad Xeon (virtual but dedicated resources). The machine has its Windows counterpart (Server 2008 R2, SQL Server 2008) on there as well, but turned off. The query returns in around 10-15 seconds using SQL Server with the same schema and data. I know this wouldn't be a direct comparison, but I wanted to show that it wasn't a disk performance issue.
So the question is... any suggestions? Are there any network settings I should be changing? Anything that I've missed? I can't give too much information about the database, but here is an EXPLAIN ANALYZE that's obfuscated...
Index Scan using "IDX_column1" on "table1" (cost=0.00..45416.20 rows=475130 width=148) (actual time=0.025..170.812 rows=482266 loops=1)
Index Cond: (("column1" >= '2011-03-14 00:00:00'::timestamp without time zone) AND ("column1" <= '2011-03-14 23:59:59'::timestamp without time zone))
Total runtime: 196.898 ms
Try setting cursor_tuple_fraction to 1 in psql and see if it changes the results. If so, then it is likely that the optimiser is picking a better plan based on only getting the top 10% or so of results compared to getting the whole lot. I seem to recall psql uses a cursor to fetch results piece by piece rather than the "firehose" executequery method.
If this is the case, it doesn't point directly to a solution, but you will need to tweak your planner settings, and at least if you can reproduce the behaviour in psql then it may be easier to see the differences and test changes.
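For example, a quick test along these lines (run it both from psql and from the session that is slow, e.g. through Npgsql or pgAdmin, and compare); the table and column names are taken from the obfuscated plan above, so substitute your real ones:

SET cursor_tuple_fraction = 1.0;  -- default is 0.1; 1.0 asks the planner to optimise for retrieving all rows
EXPLAIN ANALYZE
SELECT *
FROM "table1"
WHERE "column1" >= '2011-03-14 00:00:00'
  AND "column1" <= '2011-03-14 23:59:59';

If the plan or the timing changes noticeably with this setting, that supports the cursor theory above.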
I have a problem in Cassandra. Please help me.
I am executing a SELECT statement against a 500K-row table at 1 millisecond intervals. After some time I get the message "All host(s) tried for query failed. First host tried, 10.1.60.12:9042: Host considered as DOWN. See innerErrors".
I run the following select statement:
select * from demo.users
This returns 5K rows to me. There are 500K rows in the users table.
I don't know what is wrong. I have not changed the cassandra.yaml file.
Do I need to change any settings for the memory cache? There is too much disk I/O when I run the select statement.
Please help me
A range query (select * with no partition key or token range restriction) can be a very expensive query that has to hit at least one node of every replica set (depending on the size of the dataset). If you're trying to read the entire dataset or doing batch processing, it would be best to use the Spark connector, or to behave like it and query individual token ranges to avoid putting too much load on the coordinators.
If you are going to be using inefficient queries (which is fine, just don't expect the same throughput as normal reads) you will probably need more resources or some specialized tuning. You could add more nodes, or look into what's causing the node to be marked DOWN. Most likely it's GC pauses from heap pressure, so check the GC log. If you have the memory available you can increase the heap; it would be a good idea to max out the heap size, since when you're reading everything the system caches are not going to be as meaningful. Use G1 once the heap is over 16 GB (which it should be), configured in jvm.options.
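A rough sketch of what querying by token range looks like; user_id as the partition key of demo.users is an assumption here, so substitute the real partition key:

-- Each statement only touches the replicas owning that slice of the ring,
-- instead of making one coordinator gather the whole table.
SELECT * FROM demo.users
WHERE token(user_id) > -9223372036854775808
  AND token(user_id) <= -9000000000000000000;

-- ...then move the window forward and repeat until the whole Murmur3 range
-- (-2^63 .. 2^63 - 1) has been covered.

This is essentially how the Spark connector splits its reads under the hood.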
Not sure if this question has already been asked. I face this problem where the first hit from the website to an Oracle SP takes a lot of time. Subsequent accesses work just fine.
The SP I'm talking about here is a dynamic SP used for search functionality (with different search criteria selection options available).
1st access time ~200 seconds
subsequent access time ~20 to 30 seconds.
Stored procedure logic at a high level:
Conditional JOINs are appended based on some logic.
Dynamic SQL and a cursor are used to retrieve data.
Any help on how to start tackling these kinds of issues is very welcome.
Thanks,
Adarsh
The reason why it takes only a few seconds to execute the query after the first run is that Oracle caches the results. If you change the SQL then Oracle considers it a different query and won't serve the results from the cache but executes the new query (even reformatting the code or adding a space in between makes it a different query).
How to speed up the first execution is the hard question. You'll need to post your query and explain plan, and you'll probably have to answer further questions if you want to get help on that.
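If you are not sure how to capture the plan, something along these lines works from SQL Developer or SQL*Plus; the statement below is only a placeholder for the SQL your procedure actually builds:

EXPLAIN PLAN FOR
SELECT *
FROM some_table            -- hypothetical: use the search SQL the SP generates
WHERE some_column = 'some value';

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);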
Initial situation
I have a SQL Server Express 2008 R2 instance running. There are ten users who read/write constantly to the same tables using stored procedures. They do this day and night.
Problem
The performance of the Stored Procedures is getting lower and lower with increasing database size.
A stored procedure call takes 10 ms on average when the database size is about 200 MB.
The same call takes 200 ms on average when the database size is about 3 GB.
So we have to clean up the database once a month.
We already did index optimization for some tables with positive effects but the problem still exists.
Finally, I'm not a SQL Server expert. Could you give me some hints on where to start getting rid of this performance problem?
Download and read Waits and Queues
Download and follow the Troubleshooting SQL Server 2005/2008 Performance and Scalability Flowchart
Read Troubleshooting Performance Problems in SQL Server 2005
The SQL Server Express Edition limitations (1GB memory buffer pool, only one socket CPU used, 10GB database size) are unlikely to be the issue. Application design, bad queries, excessive locking concurrency and poor indexing are more likely to be the problem. The linked articles (especially the first one) include methodology on how to identify the bottleneck(s).
This is MOST likely simply a programmer mistake; it sounds like you have one of the following:
Improper indexing on some tables. This is NOT optimization; bad indices are like broken HTML for web people. If you have no index then basically you are not using SQL as it is supposed to be used; you should always have proper indexes.
Not enough hardware, such as RAM. Yes, it can manage a 10 GB database, but if your hot set (the stuff accessed all the time) is 2 GB and you only have 1 GB, it WILL hit the disk more often than it needs to.
Slow disks, particularly an Express problem because most people do not bother to get a proper disk layout. Then they run a SQL database against a slow 200 IOPS end-user disk where, depending on need, a SQL database wants MANY spindles or an SSD (a typical SSD these days does 40,000 IOPS).
That is it in the end, plus possibly really bad SQL. A typical filter error: someformula(field) LIKE value, which means "forget your index, please do a table scan and calculate someformula(field) for every row before checking" (see the sketch below).
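A minimal T-SQL sketch of that non-sargable filter problem; the Orders table, OrderDate column and its index are hypothetical stand-ins:

-- Non-sargable: wrapping the column in a function hides it from the index,
-- so SQL Server scans every row and evaluates YEAR() each time.
SELECT OrderId
FROM Orders
WHERE YEAR(OrderDate) = 2011;

-- Sargable rewrite: the same filter expressed as a range lets an index on
-- OrderDate be used for a seek instead of a scan.
SELECT OrderId
FROM Orders
WHERE OrderDate >= '20110101'
  AND OrderDate < '20120101';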
First, SQL Server Express is not the best edition for your requirements. Get a Developer Edition to test with. It's exactly like Enterprise but free, as long as you don't use it in production.
As for the performance, there are many things involved here, and you can improve it using anything from indexes to partitioning. We need more info to provide help.
Before optimizing your SQL queries, you need to find the hotspots among them. Usually you can use SQL Profiler to do this on SQL Server. The Express edition doesn't ship with such a tool, but you can work around it by using a few queries:
Return all recent queries:
SELECT *
FROM sys.dm_exec_query_stats order by total_worker_time DESC;
Return only top time consuming queries:
SELECT total_worker_time, execution_count, last_worker_time, dest.TEXT
FROM sys.dm_exec_query_stats AS deqs
CROSS APPLY sys.dm_exec_sql_text(deqs.sql_handle) AS dest
ORDER BY total_worker_time DESC;
Now you should know which query needs to be optimized.
It may be poor indexes, poor database design, a lack of normalization, unwanted column indexes, or poor queries that take a long time to execute.
SQL Express is built for testing purposes and its performance is directly limited by Microsoft. If you use it in a production environment you may want to get a license for SQL Server.
Have a look here: SQL Express for production?
I have a quite complex multi-join T-SQL SELECT query that runs for about 8 seconds and returns about 300K records, which is currently acceptable. But I need to reuse the results of that query several times later, so I am inserting the results into a temp table. The table is created in advance with columns that match the output of the SELECT query. But as soon as I do INSERT INTO ... SELECT, execution time more than doubles to over 20 seconds! The execution plan shows that 46% of the query cost goes to "Table Insert" and 38% to the Table Spool (Eager Spool).
Any idea why this is happening and how to speed it up?
Thanks!
The "Why" of it hard to say, we'd need a lot more information. (though my SWAG would be that it has to do with logging...)
However, the solution, 9 times out of 10 is to use SELECT INTO to make your temp table.
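A minimal sketch of the difference, using a hypothetical #results temp table and dbo.Orders as a stand-in for the real multi-join query:

-- Pre-created temp table plus INSERT ... SELECT (the slower pattern described above):
CREATE TABLE #results (id INT, amount MONEY);
INSERT INTO #results (id, amount)
SELECT o.id, o.amount
FROM dbo.Orders AS o;        -- stand-in for the 8-second multi-join query

-- SELECT INTO creates and populates the temp table in one step and can be
-- minimally logged, which is usually where the time is saved:
SELECT o.id, o.amount
INTO #results2
FROM dbo.Orders AS o;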
I would start by looking at standard tuning items. Is the disk performing well? Are there sufficient resources (IO, RAM, CPU, etc.)? Is there a bottleneck in the RDBMS? It does sound like the issue, but what is happening with locking? Does other code give similar results? Is other code performant?
Here are a few things I can suggest based on the information you have provided. If you don't care about dirty reads, you could always change the transaction isolation level (if you're using MS T-SQL):
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
select ...
This may speed things up on your initial query as locks will not need to be done on the data you are querying from. If you're not using SQL server, do a google search for how to do the same thing with the technology you are using.
For the insert portion, you said you are inserting into a temp table. Does your database support adding primary keys or indexes on your temp table? If it does, have a dummy column in there that is an indexed column. Also, have you tried to use a regular database table for this? Depending on your setup, it is possible that using that will speed up your insert times.
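A small T-SQL sketch of that suggestion; the column names are hypothetical and the IDENTITY column plays the role of the "dummy" indexed column mentioned above:

-- Temp table with an indexed dummy column (the primary key):
CREATE TABLE #results
(
    row_id INT IDENTITY(1,1) PRIMARY KEY,   -- dummy indexed column
    id     INT,
    amount MONEY
);

INSERT INTO #results (id, amount)
SELECT o.id, o.amount
FROM dbo.Orders AS o;        -- stand-in for the real SELECT

Whether this helps depends heavily on the workload, so treat it as something to measure rather than a guaranteed win.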
I am trying to understand the performance of a query that I've written in Oracle. At this time I only have access to SQLDeveloper and its execution timer. I can run SHOW PLAN but cannot use the auto trace function.
The query that I've written runs in about 1.8 seconds when I press "execute query" (F9) in SQLDeveloper. I know that this is only fetching the first fifty rows by default, but can I at least be certain that the 1.8 seconds encompasses the total execution time plus the time to deliver the first 50 rows to my client?
When I wrap this query in a stored procedure (returning the results via an OUT REF CURSOR) and try to use it from an external application (SQL Server Reporting Services), the query takes over one minute to run. I get similar performance when I press "run script" (F5) in SQLDeveloper. It seems that the difference here is that in these two scenarios, Oracle has to transmit all of the rows back rather than the first 50. This leads me to believe that there are some network connectivity issues between the client PC and the Oracle instance.
My query only returns about 8000 rows so this performance is surprising. To try to prove my theory above about the latency, I ran some code like this in SQLDeveloper:
declare
tmp sys_refcursor;
begin
my_proc(null, null, null, tmp);
end;
...And this runs in about two seconds. Again, does SQLDeveloper's execution clock accurately indicate the execution time of the query? Or am I missing something and is it possible that it is in fact my query which needs tuning?
Can anybody please offer me any insight on this based on the limited tools I have available? Or should I try to involve the DBA to do some further analysis?
"I know that this is only fetching the
first fifty rows by default, but can I
at least be certain that the 1.8
seconds encompasses the total
execution time plus the time to
deliver the first 50 rows to my
client?"
No, it is the time to return the first 50 rows. It doesn't necessarily require that the database has determined the entire result set.
Think about the table as an encyclopedia. If you want a list of animals with names beginning with 'A' or 'Z', you'll probably get Aardvarks and Alligators pretty quickly. It will take much longer to get Zebras as you'd have to read the entire book. If your query is doing a full table scan, it won't complete until it has read the entire table (or book), even if there is nothing to be picked up in anything after the first chapter (because it doesn't know there isn't anything important in there until it has read it).
declare
tmp sys_refcursor;
begin
my_proc(null, null, null, tmp);
end;
This piece of code does nothing. More specifically, it will parse the query to determine that the necessary tables, columns and privileges are in place. It will not actually execute the query or determine whether any rows meet the filter criteria.
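To time the real work from a PL/SQL block, you have to fetch all the rows from the cursor, not just open it. A hedged sketch, assuming the cursor's row shape matches a hypothetical my_table%ROWTYPE (substitute a record that matches the actual column list):

DECLARE
  tmp  SYS_REFCURSOR;
  rec  my_table%ROWTYPE;     -- hypothetical: must match the cursor's columns
  n    PLS_INTEGER := 0;
BEGIN
  my_proc(NULL, NULL, NULL, tmp);
  LOOP
    FETCH tmp INTO rec;      -- this is where the query actually does its work
    EXIT WHEN tmp%NOTFOUND;
    n := n + 1;
  END LOOP;
  CLOSE tmp;
  DBMS_OUTPUT.PUT_LINE('rows fetched: ' || n);
END;
/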
If the query only returns 8000 rows it is unlikely that the network is a significant problem (unless they are very big rows).
Ask your DBA for a quick tutorial in performance tuning.