SQL Server 2008 large table performance - performance

I have a relatively large table in a separate filegroup (2 GB; not THAT large, but large enough, I think, to start thinking about performance, as it's a heavy-duty table).
This is the only table in this filegroup.
Right now the filegroup contains only one datafile.
Assuming the table is well-indexed and that index fragmentation is almost zero, would it increase performance (for select and insert statements) if I split the filegroup into two datafiles, BUT with those two datafiles residing on the same physical disk (as I don't have an array of disks at my disposal)?
Or is a split into multiple files only an improvement when you can spread those files over separate physical disks?
Thanks for any replies.
ps: must add that we're using standard edition so table partitioning is a no-go
Mathieu

You really need to have separate spindles/LUNs if you're going to split index/data.
For busting the "one thread per file" myth, read these from Paul Randal.

For the situation you have described, I doubt you could measure the difference accurately, since it would be insignificant. You would need a high-end database with specific heavy workloads to entertain the thought that you are suffering from SGAM/GAM contention.
GBN is right in indicating that you need the files on separate spindles to see a meaningful difference.

Related

Will Shrinking and lowering the high water mark cause issues in OLTP systems

Newbie here. We have an old Oracle 10g instance that has to be kept alive until it is replaced. The nightly jobs have been very slow, causing some issues. Every other week there is a large process that does large amounts of DML (deletes, inserts, updates). Some of these tables have 2+ million rows. I noticed that on some of the tables the HWM is higher than expected, and a database advisor check I ran in Toad recommended shrinking some tables. I am concerned that the tables may need the space for DML operations. Will shrinking them make the process faster or slower?
We cannot add CPU due to licensing costs.
If you are accessing the tables with full scans and have a lot of empty space below the HWM, then yes, definitely reorg those (alter table move). There is no downside, only benefit. But if your slow jobs are using indexes, then the benefit will be minimal.
Don't assume that your slow jobs are due to space fragmentation. Use ASH (v$active_session_history) and SQL monitor (v$sql_plan_monitor) data or a graphical tool that utilizes this data to explore exactly what your queries are doing. Understand how to read execution plans and determine whether the correct plan is being used for your data. Tuning is unfortunately not a simple thing that can be addressed with a question on this forum.
In general, shrinking tables or rebuilding indexes should speed up reads of the table, or anything that does full table scans. It should not affect other DML operations.
When selecting or searching data, all of the empty blocks in the table and any indexes used by the query must still be read, so rebuilding them to reduce empty space and lower the high water mark will generally improve performance. This is especially true in indexes, where space lost to deleted rows is not recovered for reuse.
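To make the full-scan argument concrete, here is a toy model in Python. It is only a sketch: the row counts and the rows-per-block figure are made-up assumptions, and it does not model Oracle's real storage engine. It only captures the idea that a full scan reads every block below the high water mark, whether or not the block still holds rows.

```python
import math


def full_scan_blocks(hwm_blocks: int) -> int:
    """A full scan reads every block below the high water mark, empty or not."""
    return hwm_blocks


def blocks_after_reorg(live_rows: int, rows_per_block: int) -> int:
    """After a reorg, the segment needs only enough blocks for the live rows."""
    return math.ceil(live_rows / rows_per_block)


# Hypothetical numbers: 2,000,000 rows loaded at 100 rows per block,
# then a batch job deleted 1,500,000 of them.
rows_per_block = 100
hwm_before = math.ceil(2_000_000 / rows_per_block)  # HWM set by the peak row count
live_rows = 2_000_000 - 1_500_000

before = full_scan_blocks(hwm_before)
after = full_scan_blocks(blocks_after_reorg(live_rows, rows_per_block))
print(f"full scan before reorg: {before} blocks")
print(f"full scan after reorg:  {after} blocks")
```

An index lookup touches only a handful of blocks either way, which is why the reorg mostly helps jobs that do full scans.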

How expensive is a query in terms of TEMP tablespace?

I have a few sprocs that execute some number of more complex queries and liberally use collections.
My DBA is complaining that they occasionally consume a S#$%ton of in-memory TEMP tablespace.
I can perform optimizations on the queries, but I also wish to be as noninvasive as possible, and to do this I need to see the effects my changes have on the TEMP tablespace.
QUESTION:
How can I see what cost my query has on the TEMP tablespace?
One thing to consider is that I don't have DBA access.
Thanks in advance.
Depends what you mean by the cost your query has on temp.
If you can select from v$tempseg_usage, you can see how much space you are consuming in temp - on a DEV database there is no reason your DBA cannot give you access to that view.
As was mentioned by gpeche - autotrace will give you a good idea about how many IOs you are doing from temp - so that combined with the space usage will give you a good idea about what is going on.
Large collections are generally a bad idea - they consume a lot of memory in the PGA (which is very different from TEMP) which is shared by all the other sessions - this will be what your DBA is concerned about. How large is large depends on your system - low thousands of small records probably isn't too bad, but 100's of thousands or millions of records in a collection and I would be getting worried.
Before doing all kinds of interesting queries and tricks, estimate the data volume that should be sorted, after filtering. If this is larger than what fits in the sort area, the sort will move blocks from memory to temp and read them back later. Add a little overhead to the raw data size; use 30% overhead. This should give a reasonable estimation for the needed total sort size.
Use the same strategy for collections. There has to be room for the data somewhere; there is no magic or compression that makes your data volume smaller. If you have memory for at most 1,000 rows and try to use it for 1,000,000 rows, it won't fit. In that case talk to your DBA and try to find a solution. It could be that you end up partitioning your workload.
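The back-of-the-envelope estimate described above (raw data size plus 30% overhead, compared against the available sort area) can be sketched in Python. The row count, average row size and 64 MiB sort area are hypothetical figures chosen for illustration; the real limits depend on how your instance is configured.

```python
def estimate_sort_bytes(rows: int, avg_row_bytes: int, overhead_pct: int = 30) -> int:
    """Raw data size plus a percentage overhead, rounded up to whole bytes."""
    raw = rows * avg_row_bytes
    return (raw * (100 + overhead_pct) + 99) // 100


def batches_needed(rows: int, avg_row_bytes: int, sort_area_bytes: int) -> int:
    """How many equal slices the workload needs so each slice fits in memory."""
    total = estimate_sort_bytes(rows, avg_row_bytes)
    return max(1, -(-total // sort_area_bytes))  # ceiling division


# Hypothetical workload: 1,000,000 rows of about 200 bytes, 64 MiB sort area.
sort_area = 64 * 1024 * 1024
total = estimate_sort_bytes(1_000_000, 200)
print(f"estimated sort size: {total} bytes")
print(f"spills to temp: {total > sort_area}")
print(f"batches needed to stay in memory: {batches_needed(1_000_000, 200, sort_area)}")
```

If the estimate is several times larger than the memory you can reasonably get, that is the point at which splitting the workload into batches becomes the conversation to have with the DBA.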
Without having DBA access, I would try with AUTOTRACE. It will not give you TEMP tablespace consumption, but you can get a lot of useful information for tuning your queries (logical reads, number of sorts to disk, recursive SQL, redo consumption, network roundtrips). Note that you need some privileges granted to use AUTOTRACE, but not full DBA rights.
While your query is running you can query v$sql_workarea_active, or after it has run you can query v$sql_workarea.
These will show you the temp tablespace usage in terms of memory used, disk space used, and (most importantly) the number of passes (space usage is only part of the issue -- multipass sorts are very expensive), and correlate the usage to steps in the explain plan.
You can then consider whether modifying memory management would help you reduce temp tablespace usage both in terms of absolute space used and in the pass count.

Storing arrays of integers in database

I am creating a database that will store 100,000 (and probably more in the future) users. While this obviously happens in a table with 1 row per user, every user can (and will) store hundreds of items. In programming terms this would mean the user has 2 arrays (or one 2-dimensional array) of integers: one for the item IDs and one for the amounts.
My instincts tell me to create a table to hold all these items, with rows like (userid, itemid, amount). However this would result in a huge table. 200,000 users with 250 items each... that's 50 million entries in one table. This, plus the fact that the table will undergo continuous and rapid change, frightens me. (How rapid? I estimate up to 100 modifications per second.)
Typically there will be anywhere between 100 and 2000 users, all adding and removing items, and modifying amounts. These actions can and will happen in programming code. It would go as follows:
1) User starts session, program loads all the user's items from the database
2) User modifies the item list
3) Every few minutes, the changes are saved into the database
4) When the user ends the session, it is also saved into the database
It is worth noting that there is a maximum to the number of items a user can store.
Are there any alternatives to using a separate table? Perhaps save the values in a formatted text string? Or is this one of the instances where using a MySQL database is actually a Bad Idea™?
Thank you for your time and insights.
"My instincts tell me to create a table to hold all these items"
Your instincts are right.
1) avoid premature optimisation
2) don't break the rules of normalization unless you've got a very good and real reason to do so
3) why do you suspect that the multi-table approach will be faster?
"that's 50 million entries in one table"
So what? Even if you only have an index on userid, a single table will not be noticeably slower than a separate table per user. In practice, with 200,000 users, it will be much, much faster, since with a single items table the DBMS can comfortably keep an open file handle for each table.
"I estimate up to 100 modifications per second"
Should be possible using MySQL and fairly basic hardware, but if it were me, and I wanted a bit of headroom, I'd go with a pair of mirrored SATA disks, tables on one mirror, indexes on the other.
The only issue I'd be concerned about (which applies regardless of which of the 2 models you choose) is supporting 2000 concurrent connections. Do the connections have to be concurrent? Or can each user download a working set (optionally using an optimistic locking strategy) and close off the connection, then push back the changes on a new connection? If not, then you'll probably want a good whack of memory and CPU.
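One common way to do the optimistic locking mentioned above is a version column that every update checks and increments. The sketch below uses Python's built-in sqlite3 module purely so it can run anywhere; the table layout, the `version` column and the function names are assumptions for illustration, and in a real deployment the load and the save would happen on separate short-lived MySQL connections.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE user_items (
           userid  INTEGER NOT NULL,
           itemid  INTEGER NOT NULL,
           amount  INTEGER NOT NULL,
           version INTEGER NOT NULL DEFAULT 0,
           PRIMARY KEY (userid, itemid))"""
)
conn.execute("INSERT INTO user_items (userid, itemid, amount) VALUES (1, 42, 5)")
conn.commit()


def load_items(conn, userid):
    """Download the working set: itemid -> (amount, version seen at load time)."""
    rows = conn.execute(
        "SELECT itemid, amount, version FROM user_items WHERE userid = ?", (userid,)
    ).fetchall()
    return {itemid: (amount, version) for itemid, amount, version in rows}


def save_amount(conn, userid, itemid, new_amount, seen_version):
    """Push a change back; it succeeds only if nobody changed the row meanwhile."""
    cur = conn.execute(
        "UPDATE user_items SET amount = ?, version = version + 1 "
        "WHERE userid = ? AND itemid = ? AND version = ?",
        (new_amount, userid, itemid, seen_version),
    )
    conn.commit()
    return cur.rowcount == 1


snapshot = load_items(conn, 1)          # the connection could be closed after this
_, seen = snapshot[42]
first = save_amount(conn, 1, 42, 7, seen)   # version still matches: applied
stale = save_amount(conn, 1, 42, 9, seen)   # version has moved on: rejected
print(first, stale, load_items(conn, 1)[42])
```

When a save is rejected, the application reloads the row and decides whether to retry or report a conflict; no connection has to stay open in between.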
But leaving aside whether to use one big table or lots of little ones, if this is the only use for the data, and access is not concurrent to particular data items, then why bother with a relational database at all? NoSQL or a shared filesystem might work just as well.
Putting data into one field as an array is almost always a mistake. It makes querying the data much harder and much more time-consuming, and makes it much less likely that indexes will be used. It is OK if the values are just text in which you would never need to find one or more elements of the array, but in my experience this situation is rarely encountered. Modern databases can handle 50 million records without even breaking a sweat. That's a small table in database terms.
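As a rough illustration of why one row per (userid, itemid) is easier to work with than a packed string, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The table, index and function names are hypothetical. It covers the load/modify/save cycle from the question and a query ("which users hold item X?") that would be painful against a formatted text field.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE user_items (
           userid INTEGER NOT NULL,
           itemid INTEGER NOT NULL,
           amount INTEGER NOT NULL,
           PRIMARY KEY (userid, itemid))"""
)
# Secondary index so lookups by item do not have to scan every user's rows.
conn.execute("CREATE INDEX idx_user_items_item ON user_items (itemid)")


def save_items(conn, userid, items):
    """Write a batch of changed items (itemid -> amount) in one transaction."""
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO user_items (userid, itemid, amount) VALUES (?, ?, ?)",
            [(userid, itemid, amount) for itemid, amount in items.items()],
        )


def load_items(conn, userid):
    """Load one user's items at session start as itemid -> amount."""
    return dict(
        conn.execute("SELECT itemid, amount FROM user_items WHERE userid = ?", (userid,))
    )


def users_holding(conn, itemid):
    """Find every user who holds a given item."""
    cur = conn.execute(
        "SELECT userid FROM user_items WHERE itemid = ? ORDER BY userid", (itemid,)
    )
    return [row[0] for row in cur]


save_items(conn, 1, {10: 3, 20: 1})
save_items(conn, 2, {20: 5})
save_items(conn, 1, {10: 4})        # periodic save of a modified amount
print(load_items(conn, 1))
print(users_holding(conn, 20))
```

Only the changed rows need to be written on each periodic save, which keeps the write volume well below the total item count.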
It should be OK to do it as you described using two tables. The database should be able to handle millions of records.
The important points to look at:
1- Optimize your queries as much as possible.
2- Create the appropriate index(es) to speed up your queries.
3- Use InnoDB if you have concurrent read/update operations as it supports row-level locking as opposed to MyISAM.
4- Provide good hardware to support the database server.
5- Run the database server on a dedicated server if affordable.

Is there any logical reason of having different tablespace for indexes?

Hi, can someone let me know why we create different tablespaces for indexes and data?
It is a widespread belief that keeping indexes and tables in separate tablespaces improves performance. This is now considered a myth by many respectable experts (see this Ask Tom thread - search for "myth"), but is still a common practice because old habits die hard!
Third party edit
Extract from Ask Tom, "Index Tablespace" (2001, Oracle version 8.1.6). The question:
Is it still a good idea to keep indexes in their own tablespace?
Does this inhance performance or is it more of a recovery issue?
Does the answer differ from one platform to another?
First part of the reply:
Yes, no, maybe.
The idea, born in the 1980s when systems were tiny and user counts were in the single
digits, was that you separated indexes from data into separate tablespaces on different
disks.
In that fashion, you positioned the head of the disk in the index tablespace and the head
of the disk in the data tablespace and that would be better then seeking 2 times on the
same disk.
Drives back then were really slow at seeking and typically measured in the 10's to 100's
of megabytes (if you were lucky)
Today, with logical volumes, raid, NN gigabyte (nn is rapidly becoming NNN gigabytes)
drives, hundreds/thousands of concurrent users, thousands of tables, 10's of thousands of
indexes - this sort of "optimization" is sort of impossible.
What you strive for today is to be able to manage things, to spread IO out evenly
avoiding hot spots.
Since I believe all things should be in locally managed tablespaces with UNIFORM extent
sizes, I would say that yes, indexes would be in a different tablespace from the data but
only because they are a different SIZE then the data. My table with 50 columns and an
average row size of 4k might belong in a tablespace that has 5meg extents whereas the
index on a single number column might belong in a tablespace with 512k or 1m extents.
I tend to keep my indexes separate from the data but for the above sizing reason. The
tablespaces frequently end up on the same exact mount points. You strive for even io
across your disks and you may end up with indexes and data on the same devices.
It made sense in the 80s, when there were not too many users and database sizes were not too big. At that time it was useful to store indexes and tables on different physical volumes.
Now there are logical volumes, RAID and so on, and it is not necessary to store the indexes and tables in different tablespaces.
But all tablespaces should be locally managed with a uniform extent size. From this point of view the indexes should be stored in a different tablespace from the table: a table with 50 columns could be stored in a tablespace with a 5 MB extent size, whereas a 512 KB extent size would be enough for the index tablespace.
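One way to read the sizing argument is in terms of allocated space: with uniform extents, every segment occupies a whole number of extents, so a small index placed in a tablespace with large extents reserves far more space than it uses. The Python sketch below is only arithmetic under that assumption; the 300 KB index and 12 MB table sizes are hypothetical.

```python
KB = 1024
MB = 1024 * KB


def allocated_bytes(segment_bytes: int, extent_bytes: int) -> int:
    """Space reserved for a segment when every extent has the same size."""
    extents = max(1, -(-segment_bytes // extent_bytes))  # ceiling division
    return extents * extent_bytes


small_index = 300 * KB   # hypothetical index on a single number column
big_table = 12 * MB      # hypothetical wide table

for name, size in (("index", small_index), ("table", big_table)):
    for extent in (512 * KB, 5 * MB):
        used = allocated_bytes(size, extent)
        print(f"{name} in {extent // KB} KB extents: "
              f"{used // KB} KB allocated, {(used - size) // KB} KB unused")
```

The opposite mismatch costs something too: a large table in small extents needs many more extents to manage, which is the other half of the reason for matching extent size to segment size.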
Performance: it should be analyzed case by case. I think that keeping everything together in one tablespace is becoming another myth too! There should be enough spindles and enough LUNs, and you should take care of queuing in the operating system. If someone thinks that a single tablespace is enough and is the same as many tablespaces, without taking all the other factors into consideration, that is just another myth. It depends!
High availability: using separate tablespaces can improve the availability of the system in case of file corruption, file system corruption or block corruption. If the problem occurs only in the index tablespace, there is a chance to do the recovery online while the application remains available to the customer. See also: http://richardfoote.wordpress.com/2008/05/02/indexes-in-their-own-tablespace-recoverability-advantages-get-back/
Manageability and cost: using separate tablespaces for indexes, data, BLOBs, CLOBs, and possibly some individual tables can be important for manageability and costs. We can use our storage system to store our BLOBs and CLOBs, and possibly archives, on a different storage tier with a different quality of service.

Database speed optimization: few tables with many rows, or many tables with few rows?

I have a big doubt.
Let's take as an example a database for some company's orders.
Let's say that this company makes around 2000 orders per month, so around 24K orders per year, and they don't want to delete any orders, even if they are 5 years old (hey, this is an example, the numbers don't mean anything).
For good query speed, is it better to have just one table, or would it be faster to have a table for every year?
My idea was to create a new table for the orders each year, called orders_2008, orders_2009, etc.
Would that be a good idea to speed up DB queries?
Usually the data in use are those of the current year, so the fewer rows there are, the better.
Obviously, this would cause problems when I search all the order tables simultaneously, because I would have to run some complex UNION, but in normal activity this happens very rarely.
I think it is better to have an application that is fast for 95% of the queries and somewhat slow for the rest, rather than an application that is always slow.
My current database has 130 tables; the new version of my application should have about 200-220 tables, of which about 40% would be replicated annually.
Any suggestions?
EDIT: the RDBMS will probably be PostgreSQL, maybe (hope not) MySQL.
Smaller tables are faster. Period.
If you have history that is rarely used, then getting the history into other tables will be faster.
This is what a data warehouse is about -- separate operational data from historical data.
You can run a periodic extract from operational and a load to historical. All the data is kept, it's just segregated.
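The periodic extract-and-load described above can be as small as one transaction that copies old rows to a history table and then deletes them from the operational one. This sketch uses Python's sqlite3 module as a stand-in for the real RDBMS; the table names, columns and cutoff date are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for table in ("orders", "orders_history"):
    conn.execute(
        f"CREATE TABLE {table} ("
        "id INTEGER PRIMARY KEY, order_date TEXT NOT NULL, total REAL NOT NULL)"
    )
conn.executemany(
    "INSERT INTO orders (order_date, total) VALUES (?, ?)",
    [("2008-03-01", 10.0), ("2008-11-15", 20.0), ("2009-02-01", 30.0)],
)
conn.commit()


def archive_orders(conn, cutoff_date):
    """Move every order older than cutoff_date into orders_history, atomically."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute(
            "INSERT INTO orders_history SELECT * FROM orders WHERE order_date < ?",
            (cutoff_date,),
        )
        cur = conn.execute("DELETE FROM orders WHERE order_date < ?", (cutoff_date,))
    return cur.rowcount


moved = archive_orders(conn, "2009-01-01")  # ISO dates compare correctly as text
live = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
kept = conn.execute("SELECT COUNT(*) FROM orders_history").fetchone()[0]
print(moved, live, kept)
```

Because the history table has the same columns, reports that need everything can still query it directly, while day-to-day queries only touch the small operational table.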
Before you worry about query speed, consider the costs.
If you split the data into separate tables, you will have to have code that handles it. Every bit of code you write has the chance to be wrong. You are asking for your code to be buggy in exchange for some unmeasured and imagined performance win.
Also consider the cost of machine time vs. programmer time.
If you use indexes properly, you probably need not split it into multiple tables. Most modern DBs will optimize access.
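A quick way to check the indexing claim on your own data is to put an index on the date column and ask the database for its query plan. The sketch below does this with Python's sqlite3 module and `EXPLAIN QUERY PLAN`; the schema and index name are hypothetical, and PostgreSQL or MySQL would use their own `EXPLAIN` with different output, so treat it as a demonstration of the idea only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, order_date TEXT NOT NULL, total REAL NOT NULL)"
)
conn.execute("CREATE INDEX idx_orders_date ON orders (order_date)")

# Five years of sample data, one order per month.
rows = [
    (f"{year}-{month:02d}-15", 100.0)
    for year in range(2005, 2010)
    for month in range(1, 13)
]
conn.executemany("INSERT INTO orders (order_date, total) VALUES (?, ?)", rows)
conn.commit()

query = "SELECT COUNT(*) FROM orders WHERE order_date >= ? AND order_date < ?"
params = ("2009-01-01", "2010-01-01")

count_2009 = conn.execute(query, params).fetchone()[0]
# The last column of each EXPLAIN QUERY PLAN row is a human-readable step.
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query, params)]
print(count_2009)
print(plan)
```

If the plan names the date index, a query for the current year only walks that year's slice of the index, so the older rows in the same table cost it very little.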
Another option you might consider is to have a table for the current year, and at the end of the year append its data to another table which holds the data for all previous years.
I would not split tables by year.
Instead I would archive data to a reporting database every year, and use that when needed.
Alternatively you could partition the data among drives, thus maintaining performance, although I'm unsure whether this is possible in PostgreSQL.
For the volume of data you're looking at, splitting the data seems like a lot of trouble for little gain. Postgres can do partitioning, but the fine manual [1] says that as a rule of thumb you should probably only consider it for tables that exceed the physical memory of the server. In my experience, that's at least a million rows.
[1] http://www.postgresql.org/docs/current/static/ddl-partitioning.html
I agree that smaller tables are faster. But it depends on your business logic whether it makes sense to split a single entity over multiple tables. If you need a lot of code to manage all the tables, then it might not be a good idea.
It also depends on the database what logic you're able to use to tackle this problem. In Oracle a table can be partitioned (by year, for example). Data is stored physically in different tablespaces, which should make it faster to address (as I would assume that all data of a single year is stored together).
An index will speed things up, but if the data is scattered across the disk then a load of block reads are required, which can make it slow.
Look into partitioning your tables in time slices. Partitioning is good for the log-like table case where no foreign keys point to the tables.
