How to speed up a simple SQL Server 2000 query - performance

I have a product table which as many columns. Primary key is productid. There 50,000 rows but when I issue a select statement like select * from products then it is taking 10 minutes to get the full data. So advise me what to do as a result I can run my query faster.

Is your primary key also the clustering key on that table?
If you do a SELECT * .... you'll basically always get a full table scan. There's really nothing that can speed that query up - you want all rows, all columns - so you get it all and it takes the time it takes.
If you do more "focused" queries like
SELECT col1, col2 FROM dbo.Products WHERE SomeColumn = 42
then you have a chance of speeding this up by using the appropriate indices.

Buy a better computer.
Seriously.
SQL Server 2000 has been retired years ago, so this is an OLD install. 50.000 products is a joke - any table below 1 million is nothing.
But when i issue a select statement like select * from products then it is taking 10 minute
to get the full data.
Assuming this is over LAN, not over a slow internet connection, there can be 2 reasons for that:
System is TERRIBLY OVERLOADED. Like SERIOUS overloaded. Not like I have not seen that on old setups. Been there, seen that - hard discs so overloaded (hey, they are SCSI, they are fast) that they took more than 2 seconds to answer to a request.
System is programmed by incompetents. Could be bad transaction level handling leading to terrible locks for long duration which block you. This is possible, but then you are in for a LOT of rework to get the ridiculous code out of the programming.
A select * from table should not take more than a couple of seconds to transfer all the data over LAN. Point. Unless the bale has tons of binary data (i.e. HUGH amounts of data in some fields).
As your local database specialist to make an analysis. Start with hardware load then move to locking behavior. Consider upgrading to a technology that is more modern, By now you are a LOT of generations behind.

Because there's no criterium (WHERE), the time your query takes is not due to the selection (determining which rows to select) but most likely due to the sheer size of the data.
The only solution is:
Do not use SELECT *, but select only the columns you need.

Related

Speed up SQLite 'SELECT EXISTS' query

I've got a 3GB SQLite database file with a single table with 40 million rows and 14 fields (mostly integers and very short strings and one longer string), no indexes or keys or other constraints -- so really nothing fancy. I want to check if there are entries where a specific integer field has a specific value. So of course I'm using
SELECT EXISTS(SELECT 1 FROM FooTable WHERE barField=?)
I haven't got much experience with SQLite and databases in general and on my first test query, I was shocked that this simple query took about 30 seconds. Subsequent tests showed that it is much faster if a matching row occurs at the beginning, which of course makes sense.
Now I'm thinking of doing an initial SELECT DISTINCT barField FROM FooTable at application startup, and caching the results in software. But I'm sure there must be a cleaner SQLite way to do this, I mean, that should be part of what a DBMS's job right?
But so far, I've only created primary keys for speeding up queries, which doesn't work here because the field values are non-unique. So how can I speed up this query so that it works at constant time? (It doesn't have to be lightning fast, I'd be completely fine if it was under one second.)
Thanks in advance for answering!
P.S. Oh, and there will be about 500K new rows every month for an indefinite period of time, and it would be great if that doesn't significantly increase query time.
Adding an index on barField should speed up the subquery inside the EXISTS clause:
CREATE INDEX barIdx ON FooTable (barField);
To satisfy the query, SQLite would only have to seek the index once and detect that there is at least one matching value.

SQL Server Full Table Scan and Load

For the purpose of this question, let's pretend I have the following table:
Transaction:
Id
ProductId
ProductName
City
State
Country
UnitCost
SellAmount
NumberOfTimesPurchased
Profit (NumberOfTimesPurchased * (SellAmount - UnitCost))
Basically, a single de-normalized table with a million plus rows in it. It is important to note that only two columns will ever by updated: Profit and NumberOfTimesPurchased. When a sale is made, the NumberOfTimesPurchased will be updated and the new profit amount will be re-calculated.
Now, I need to do some minimal reporting on this table, which consists of queries that aggregate and group. As an example:
SELECT
City, AVG(UnitCost), AVG(SellAmount),
SUM(NumberOfTimesPurchased), AVG(Profit)
FROM
Transaction
GROUP BY
City
SELECT
State, AVG(UnitCost), AVG(SellAmount), SUM(NumberOfTimesPurchased),
AVG(Profit)
FROM
Transaction
GROUP BY
State
SELECT
Country, AVG(UnitCost), AVG(SellAmount), SUM(NumberOfTimesPurchased),
AVG(Profit)
FROM
Transaction
GROUP BY
Country
SELECT
ProductId, ProductName, AVG(UnitCost), AVG(SellAmount),
SUM(NumberOfTimesPurchased), AVG(Profit)
FROM
Transaction
GROUP BY
ProductId, ProductName
These queries are quick: ~1 second. However, I've noticed that under load, performance significantly drops (from 1 second up to a minute when there are 20+ concurrent requests), and I'm guessing the reason is that each query performs a full table scan.
I've attempted to use indexed views for each query, however my update statement performance takes a beating since each view needs to be rebuilt. On the same note, I've attempted to create covering indexes for each query, but again my update statement performance is not acceptable.
Assuming full table scans are the culprit, do I have any realistic options to get the query time down while keeping update performance at acceptable levels?
Note that I cannot use column store indexes (I'm using the cheaper version of Azure SQL Database). I'd also like to stay away from any sort of roll-up implementation, as I need the data available immediately.
Finally - the example above is not a completely accurate representation of my table. I have 20 or so different columns that can be 'grouped', and 6 columns that can be updated. No inserts or deletes.
Because there are no WHERE clauses on your queries, the database engine can nothing but a table scan (or clustered index scan which is really the same thing). If there were covering indexes with containing all the columns from your query, then the engine would prefer those. If your real queries have WHERE clauses, then appropriate indexing with those columns as the leading columns of the index might help.
But I think your problem lies elsewhere. As far as concurrency goes you haven't put enough money in the meter. According to the main service tiers doc, the Basic tier for Azure SQL Database is for:
... supporting typically one single
active operation at a given time. Examples include databases used for
development or testing, or small-scale infrequently used applications.
Therefore you might want to think about splashing out for Premium edition to support both your concurrency requirement and columnstore indexes, which are perfectly suited to this type of query. Just for fun, I created a test-rig based on AdventureWorksDW2012 to try and recreate your problem which is here. Query performance was atrocious (> 20 secs). I'd be surprised if you weren't getting DTU warnings on your portal:
An upgrade to Standard (S0-S2) did boost performance so you should experiment. You could look at scaling up for busy query times and down when not required.
This table also looks a bit like a fact table, so you might want to consider refactoring this as a fact / dimensional model then use Azure Analysis Services on top to bring that sub-second performance.
Coincidentally there is a feedback item you can vote for to bring columnstore to Standard tier:
https://feedback.azure.com/forums/217321-sql-database/suggestions/6878001-make-sql-column-store-feature-available-for-standa
Recent comments suggest it is "in the work queue" as at May 2017;

Holistic SQL queries (inside Oracle PLSQL) and UX

I have a question about how to handle errors with holistic SQL queries. We are using Oracle PL/SQL. Most of our codebase is row-by-row processing which ends up in extremely poor performance. As far as I understand the biggest problem with that is the context switches between the PL/SQL and SQL engine.
Problem with that is that the user don't know what went wrong. Old style would be like:
Cursor above some data
Fire SELECT (count) in another table if data exists, if not show errormsg
SELECT that data
Fire second SELECT (count) in another table if data exists, if not show errormsg
SELECT that data
modify some other table
And that could go on for 10-20 tables. It's basically pretty much like a C program. It's possible to remodel that to something like:
UPDATE (
SELECT TAB1.Status,
10 AS New_Status
FROM TAB1
INNER JOIN TAB2 ON TAB1.FieldX = TAB2.FieldX
INNER ..
INNER ..
INNER ..
INNER ..
LEFT ..
LEFT ..
WHERE TAB1.FieldY = 2
AND TAB3.FieldA = 'ABC'
AND ..
AND ..
AND ..
AND ..
) TAB
SET TAB.Status = New_Status
WHERE TAB.Status = 5;
A holistic SELECT like that would speed up a lot of things extremely. I changed some queries like that and that stuff went down from 5 hours to 3 minutes but that was kinda easy because it was a service without human interaction.
Question is how would you handle stuff like that were someone fills some form and waits for a response. So if something went wrong they need an errormsg. Only solution that came to my mind was checking if rows were updated and if not jump into another code section that still does all the single selects to determinate that error. But after every change we would have to update the holistic select and all single selects. Guess after some time they would differ and lead to more problems.
Another solution would be a generic errormsg which would lead to hundred calls a day and us replacing 50 variables into the query, kill some of the where conditions/joins to find out what condition filtered away the needed rows.
So what is the right approach here to get performance and still be kinda user friendly. At the moment our system feels unusable slow. If you press a button you often have to wait a long time (typically 3-10 seconds, on some more complex tasks 5 minutes).
Set-based operations are faster than row-based operations for large amounts of data. But set-based operations mostly apply to batch tasks. UI tasks usually deal with small amounts of data in a row by row fashion.
So it seems your real aim should be understanding why your individual statements take so long.
" If you press a button you often have to wait a long time (typically 3-10 seconds on some complexer tasks 5 minutes"
That's clearly unacceptable. Equally clearly it's not possible for us to explain it: we don't have the access or the domain knowledge to diagnose systemic performance issues. Probably you need to persuade your boss to spring for a couple of days of on-site consultancy.
But here is one avenue to explore: locking.
"many other people working with the same data, so state is important"
Maybe your problems aren't due to slow queries, but to update statements waiting on shared resources? If so, a better (i.e. pessimistic) locking strategy could help.
"That's why I say people don't need to know more"
Data structures determine algorithms. The particular nature of your business domain and the way its data is stored is key to writing performative code. Why are there twenty tables involved in a search? Why does it take so long to run queries on these tables? Is STORAGE_BIN_ID not a primary key on all those tables?
Alternatively, why are users scanning barcodes on individual bins until they find one they want? It seems like it would be more efficient for them to specify criteria for a bin, then a set-based query could allocate the match nearest to their location.
Or perhaps you are trying to write one query to solve multiple use cases?

(TSQL) INSERT doubling time of the query

I have a quite complex multi-join TSQL SELECT query that runs for about 8 seconds and returns about 300K records. Which is currently acceptable. But I need to reuse results of that query several times later, so I am inserting results of the query into a temp table. Table is created in advance with columns that match output of SELECT query. But as soon as I do INSERT INTO ... SELECT - execution time more than doubles to over 20 seconds! Execution plans shows that 46% of the query cost goes to "Table Insert" and 38% to Table Spool (Eager Spool).
Any idea why this is happening and how to speed it up?
Thanks!
The "Why" of it hard to say, we'd need a lot more information. (though my SWAG would be that it has to do with logging...)
However, the solution, 9 times out of 10 is to use SELECT INTO to make your temp table.
I would start by looking at standard tuning itmes. Is disk performing? Are there sufficient resources (IOs, RAM, CPU, etc)? Is there a bottleneck in the RDBMS? Does sound like the issue but what is happening with locking? Does other code give similar results? Is other code performant?
A few things I can suggest based on the information you have provided. If you don't care about dirty reads, you could always change the transaction isolation level (if you're using MS T-SQL)
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
select ...
This may speed things up on your initial query as locks will not need to be done on the data you are querying from. If you're not using SQL server, do a google search for how to do the same thing with the technology you are using.
For the insert portion, you said you are inserting into a temp table. Does your database support adding primary keys or indexes on your temp table? If it does, have a dummy column in there that is an indexed column. Also, have you tried to use a regular database table with this? Depending on your set up, it is possible that using that will speed up your insert times.

Oracle Transaction - Count table

I have a table where I need to constrain by category and then find all overlapping dates against some date range. This takes about 2 seconds, which is unacceptable to do on every transaction which occurs at roughly 50/s. The alternative is to create some tally table -- then again, I don't know how great of an idea this is because things can get out of sync.
Date on Rent # Rented Category
9/5/2011 5 CATEGORY1
In Oracle (PL/SQL, if it matters), how can I maintain this performance, but ensure that concurrent transactions don't screw up the increment / decrement by making it one less or one more than it really is?
I have two types of transactions, kind of like a search and a rent. Only rents will be updating this tally table (and searches just reading from it). I don't mind if rents slow down, but do not want search performance impacted. Rents can occur as frequently as 5-10 / second.
Oracle uses locks to ensure data consistency during a query. The details of how they work can get complex, but the effect is that it guarantees that an update/insert to your tally table will only use the data in the main table as it was when the query began. If there is another update or insert to your main table while you are doing an update/insert to the tally table, it won't affect it.
You'll have to experiment with your data to see if keeping a summary/tally table helps or hurts you. It really depends on how quickly the main table is getting updated, how much time you spend updating your tally table vs how much time you save by being able to select on it, and how up-to-date you need selects to be.

Resources