In our application (Oracle based) we are handling high volume of data. For few major tables, we are using separate tablespace but for the remaining tables default tablespace is being used.
My query is,
a. Is it good (In terms of performance) to have separate tablespace for every table (where the number of records are more than million)
b. Or we can define a separate tablespace instead of default table space for the remaining tables.
c. Does it affect the performance if default tablespace is used for the high volume tables?
Any suggestion would be appreciated.
Usually nowadays you don't have dedicated disc for your tablespaces, data are stored in a storage network (SAN) and you don't have any fixed relation to a physical file system. Thus the distribution of tablespaces is less sensitive or critical as in earlier day - as long as you don't have very special or very big data.
For example I have an application where I get every day about 1 billion records, i.e app. 150GB. There I use one tablespace (i.e. 150 cycling tablespaces) for each daily partition. The main reason is easier maintenance, for example truncating old data.
SANs seem to have killed this debate, however, when you consider things like locally managed tablespaces, there is a trend to use these to get tables to inherit storage properties. For example, it is very common these days to see tablepsaces like LM_SMALL_TABLE, LM_MEDIUM_TABLE, LM_LARGE_TABLE (and similar for index), or LM_16k_TABLE, LM_1M_TABLE, LM_10M_TABLE, LM_100M_TABLE and similar for indexes. These will have initial and next extents set to 16k, 1m, etc. Tables are then placed in the tablespace that is appropriate to expected volume. You sometimes see database where there is archived/read-only data moved to cheaper disk by moving the table/partition to such tablespaces. The only time I've seen 1 tablespace per table was on 8i, where the client wanted had this so that a particular table could be restored to a backup, by restoring just the tablespace/datafiles.
Related
Could you please help me with the following issue?
I have installed vertica cluster. I can't understand how I can restrict database size in time or in size. For example, data in a database must be deleted older than 30 days or when a database size of 100 GB is reached (what comes first)
There is no automated way of doing this, and no logical way of "restricting database size". You can't just trim "data" from a "database".
What are you talking about (in terms of limiting data outside of 30 days old) needs to be done on the table level. You would need some kind of date field and delete anything older than 30 days. However, I would advice against deleting rows in this way. It is non-performant and can cause queries against the table to be slow: see DELETE and UPDATE Performance Considerations. The best way of doing this would be to partition the table by day, and create an automated script (bash, python, etc) to each day drop the partition that corresponds with the date 30 days ago: see Dropping Partitions.
As for deleting data—if the size of the "database" goes above 100GB—this requirement is extremely vague and would be impossible to enforce. Let's say you have 50 tables, and the size of several of those tables grows so that the total size of the database is over 100GB, how would you decide which table to prune? This also must be done on a table by table level (or in this case—technically—on a projection level, since that is where the data is actually stored).
To see the compressed size (size on disk) of the database you can use this query:
SELECT SUM(used_bytes) / ( 1024^3 ) AS database_size_gb
FROM projection_storage;
However, since data can only be deleted with a DELETE or DROP PARTITION statement on a table, it would also be helpful to see the size of each table. You can do this by using this query:
SELECT projection_schema, anchor_table_name, SUM(used_bytes) / ( 1024^3 ) AS table_size_gb
FROM projection_storage
GROUP BY 1, 2
ORDER BY 3 DESC;
From the results you can decide which tables you want to prune.
A couple of notes (as a Vertica DBA):
Data is stored in projections. Having too many projections on a single table can not only cause queries to be slow but will also increase the overall data footprint. Avoid using too many projections (especially too many superprojections, don't have more than two per table, and most tables will only need one). Use the database designer or follow the guidelines in the documentation for creating custom projections: Design Fundamentals.
Also, another trick to keep database size down is to use the DESIGNER_DESIGN_PROJECTION_ENCODINGS function. Unless your projections are created with the database designer, they will likely only contain the auto encoding. Using the DESIGNER_DESIGN_PROJECTION_ENCODINGS function will help you to pick the most optimal encoding for each column. I have seen properly encoded projections take up a mere 2% disk size compared to the previously un-optimized projection. That is rare, but in my experience you will still see at least a 20-40% reduction in size. Do not be afraid to use this function liberally. It is one of my favorite tools as a Vertica DBA.
Also
Overview:
We have a table in a database (Oracle) which is occupying pretty much 80% of the total space allocated to the database. This particular table stores data in a JSON blob format and contains trace information, mostly related to how the central engine works in the application.
Problem:
For certain clients, this trace information has grown at a rapid space to a point that this table now holds almost a terabyte of trace information. This has caused concerns related to high cost of storage as well as degraded query performance.
Question:
What are the best choices to tackle this scenario:
Use NoSQL to store the JSON blob data ? Not sure if addresses the cost though.
Store the JSON blob data in a cheaper object store and use Lucene/Elastic search etc for index and search? We do have an in house EMC object store appliance (ECS) that can communicate over S3 API. ECS is cheaper than SAN storage, however I am not sure about the performance though.
Use Oracle partitioning to store the JSOB Blob in a separate tablespace ? It was a suggested by a DBA sometimes ago, but I am not sure if it is going to address either the cost or performance aspect of the problem.
Any other suggestion ?
Database compression may be the simplest way to quickly decrease the storage:
LOB compression. If you have the Advanced Compression license, you can enable compression for the column. This might be somewhat related to suggestion #3, except that changing the tablespace doesn't matter. (Maybe the DBA was referring to a tablespace that was setup for compression. But I don't think tablespace compression automatically compresses LOBs.)
--These commands may take a long time to complete.
alter table table_name modify column_name(a) (compress);
alter table table_name move;
UTL_COMPRESS.LZ_COMPRESS. If you don't have the Advanced Compression license, you could use the package UTL_COMPRESS to compress and decompress the data. Although that might significantly change the way the table is used.
The nature of my application involves daily deleting and bulk inserting of large datasets into an Oracle 12c database. My tables are interval-partitioned by a date field and partitioned-indexed. I use a stored procedure to gather statistics for the affected partitions after each run. Lately, I found that the runs have been slowing down considerably and was wondering if this was due to the increasing size of the database.
I have searched for how to calculate the total disk space that my tables use and usually arrive at this:
select sum(bytes)/1024/1024/1024
from dba_segments
where owner='SCHEMA' and segment_name in ('TABLE_A', 'TABLE_B');
However, the numbers were huge and do not reflect the actual data volume used. When we exported the tables for restoration to another database, the file was much smaller than that query suggests. I dug deeper and arrived at this query instead:
select partition_name,
blocks*8/1024 size_m,
num_rows*avg_row_len/1024/1024 occ_m,
blocks*8/1024 - num_rows*avg_row_len/1024/1024 wast_m
from dba_tab_partitions
where table_name='TABLE_A';
This query suggests that there is a "wasted" space concept where after performing bulk inserts and deleting the data before it is replaced again, the space used is not reclaimed.
Thus I have the following questions:
Does the "wasted" space contribute to performance degradation when I
perform delete from table where ..?
Is there a difference between
performing a delete from table where .. as compared to dropping
the partitions with regard to "wasted" space?
Is performing table reorganization / defragmentation on a regular basis to reclaim table space a recommended practice?
Does the "wasted" space contribute to performance degradation when I perform delete from table where ..?
Yes, you are deleting from table Oracle has to to perform Full Tabl Scan/Index Range Scan(Index leaf node may lead to empty blocks) on the underlying table up to High Water Mark, which makes your delete slow.
Is there a difference between performing a delete from table where .. as compared to dropping the partitions with regard to "wasted" space?
Deleting is a slow process. It has to create before images(undo), update indexes, write redo logs and remove the data. Since DDL(Drop) doesnt generate redo/undo(Generate tiny bit of undo/redo for meta data) it would be faster than DML(delete).
Is performing table reorganization/defragmentation on a regular basis to reclaim table space a recommended practice?
Objects with fragmented free space can result in much wasted space, and can impact database performance. The preferred way to defragment and reclaim this space is to perform an online segment shrink.
For details:Reclaiming Unused Space
The following blog post demostrate the performance impact during DML becuase of wasted space and how to get rid of it.
Defragmentation Can Degrade Query Performance
If you're doing deletes or updates your space is getting fragmented. You can read about it in documentation.
To improve your process you can either perform some cleaning operations like shrink or just recreate tables on some big inserts. I mean instead of doing delete and insert do create table as select from old where rows not to delete and then insert new set into new table. After that just swap names and drop old table.
With your second question I think answer is here. Dropping partition will reduce HWM and delete will not.
This query suggests that there is a "wasted" space concept where after performing bulk inserts and deleting the data before it is replaced again, the space used is not reclaimed.
This is correct.
A direct path insert uses space above the high water mark for the segment. Subsequent deletes remove rows, but do not reset the high water mark.
It would be best to be able to truncate the segment prior to performing another direct path insert, as this resets the high water mark as well as removing all the rows.
I recently needed to import a .dmp into a new user I created. I also created a new tablespace for the the user with the following command:
create tablespace my_tablespace
datafile 'C:\My\Oracle\Install\DataFile01.dbf' size 10M
autoextend on
next 512K
maxsize unlimited;
While the import was running I got an error:
ORA-01652 Unable to extend my_tablespace segment by in tablespace
When I examined the data files in the dba_data_files table, I observed the maxsize was around 34gb. Because I knew the general size of the database, I was able to import the .dmp without any issues after adding multiple datafiles to the tablespace.
Why did I need to add multiple datafiles to the tablespace when the first one I added was set to automatically grow to an unlimited size? Why was the maximum size 34gb and not unlimited? Is there a hard cap of 34gb?
As you've discovered, and as Alex Poole pointed out, there are limits to an individual data file size. Smallfiles are limited to 128GB and bigfiles are limited to 128TB, depending on your block size. (But you do not want to change your block size just to increase those limits.) The size limit in the create tablespace command is only there if you want to further limit the size.
This can be a bit confusing. You probably don't care about managing files and want it to "just work". Managing database storage is always gonna be annoying, but here are some things you can do:
Keep your tablespaces to a minimum. There are some rare cases where it's helpful to partition data into lots of small tablespaces. But those rare benefits are usually outnumbered by the pain you will experience managing all those objects.
Get in the habit of always adding more than one data file. If you're using ASM (which I wouldn't recommend if this is a local instance), then there is almost no reason not to go "crazy" when adding datafiles. Even if you're not using ASM you should still go a little crazy. As long as you set the original size to low, you're not close to the MAX_FILES limit, and you're not dealing with one of the special tablespaces like UNDO and TEMP, there is no penalty for adding more files. Don't worry too much about allocating more potential space than your hard-drive contains. This drives some DBAs crazy, but you have to weigh the chance of running out of OS space versus the chance of running out of space in a hundred files. (In either case, your application will crash.)
Set the RESUMABLE_TIMEOUT parameter. Then SQL statements will be suspended, may generate an alert, will be listed in DBA_RESUMABLE, and will wait patiently for more space. This is very useful in data warehouses.
Why is it called "UNLIMITED"?
I would guess the keyword UNLIMITED is a historical mistake. Oracle has had the same file size limitation since at least version 7, and perhaps earlier. Oracle 7 was released in 1992, when a 1GB hard drive cost $1995. Maybe every operating system at the time had a file size limitation lower than that. Perhaps it was reasonable back then to think of 128GB as "unlimited".
Unlimited maxsize is not enough for this operation, also your resumable timeout must be enough, you can set the value as milisecond, if you want unlimited;
alter system set resumable_timeout=0;
Hi Can some let me know why we created different table space for Index and data.
It is a widespread belief that keeping indexes and tables in separate tablespaces improves performance. This is now considered a myth by many respectable experts (see this Ask Tom thread - search for "myth"), but is still a common practice because old habits die hard!
Third party edit
Extract from asktom: "Index Tablespace" from 2001 for Oracle version 8.1.6 the question
Is it still a good idea to keep indexes in their own tablespace?
Does this inhance performance or is it more of a recovery issue?
Does the answer differ from one platform to another?
First part of the Reply
Yes, no, maybe.
The idea, born in the 1980s when systems were tiny and user counts were in the single
digits, was that you separated indexes from data into separate tablespaces on different
disks.
In that fashion, you positioned the head of the disk in the index tablespace and the head
of the disk in the data tablespace and that would be better then seeking 2 times on the
same disk.
Drives back then were really slow at seeking and typically measured in the 10's to 100's
of megabytes (if you were lucky)
Today, with logical volumes, raid, NN gigabyte (nn is rapidly becoming NNN gigabytes)
drives, hundreds/thousands of concurrent users, thousands of tables, 10's of thousands of
indexes - this sort of "optimization" is sort of impossible.
What you strive for today is to be able to manage things, to spread IO out evenly
avoiding hot spots.
Since I believe all things should be in locally managed tablespaces with UNIFORM extent
sizes, I would say that yes, indexes would be in a different tablespace from the data but
only because they are a different SIZE then the data. My table with 50 columns and an
average row size of 4k might belong in a tablespace that has 5meg extents whereas the
index on a single number column might belong in a tablespace with 512k or 1m extents.
I tend to keep my indexes separate from the data but for the above sizing reason. The
tablespaces frequently end up on the same exact mount points. You strive for even io
across your disks and you may end up with indexes and data on the same devices.
It makes a sense in 80s, when there were not to many users and the databases size was not too big. At that time it was usefull to store indexes and tables in the different physical volumes.
Now there are the logical volumes, raid and so on and it is not necessary to store the indexes and tables in different tablespaces.
But all tablespaces must be locally managed with uniform extends size. From this point of view the indexes must be stored in different tablespace as the table with the 50 columns could be stored in the tablespace with 5Mb exteds size, when the tablespace for indexes will be enought 512Kb extended size.
Performance. It should be analyzed from case to case. I think that keeping all toghether in one tablespace becomes another myth too! It should be enough spindles, enough luns and take care of queuing in operating system. if someone thinks that making one tablespace is enough and is the same like many tablespaces without taking in consideration all other factors, means again another myth. It depends!
High Avalilability. using separate tablespaces can improve high availability of the system in case that some file corrution, files system corruption, block corruption. If the problem occures only at index tablespace there is achance to do the recovery online and our application still beeing available to the customer. see also: http://richardfoote.wordpress.com/2008/05/02/indexes-in-their-own-tablespace-recoverability-advantages-get-back/
using separate tablespaces for indexes, data, blobs, clobs, eventually some individual tables can be important for the manageability and costs. We can use our storage system to store our blobs, clobs, eventually archive to a different layer of storage with different quality of service