For insert performance, should a clustered index on a timestamp be ascending or descending?

I just realized I have a clustered index on a Timestamp in descending order. I'm thinking about switching it to ascending, so that as new, ever-increasing timestamps are inserted, they are added to the end of the table. As it stands now, I suspect it has to add rows to the beginning of the table, and I wonder how SQL Server handles that.
Can it efficiently allocate new pages at the beginning of the table and efficiently insert new rows into those pages, or would it be better to fill up pages in timestamp order and allocate new pages at the end with an ascending clustered index?

It's actually the same whether you add at the start or the end.
A page fills up, the page splits, a new page is allocated...
The new page may or may not be contiguous, whether it's at the start or the end, which is why you run ALTER INDEX etc. regularly.
The ASC/DESC order of the clustered index will matter more for SELECT/ORDER BY in practice, although I've noticed this less in SQL Server 2005 and above.

Related

How to know when was the last row of a table inserted

First of all, I want to apologize because I do not have the vocabulary to talk about Hive properly. I'm not sure if what goes into a row is called data and so on, but I'm trying to be as correct as possible.
I want to know if it's possible, without adding an extra column to a Hive table (where you would put the date or some metadata), to find out which rows were newly added.
The case is as follows: a very large amount of data is going to be processed, and the selected data ends up in another Hive table. If some new data is added to the original tables, I want to process only that new data rather than re-run the whole process, because that seems wasteful (we're talking several million entries).
I would normally add a new column with dates, or just metadata that tells me whether or not a row has already been "computed".
Edit: I have been updated with more info. It turns out there are actually two problems, in my opinion.
First, new data may arrive, and it would be far better to insert just the new rows into the destination table.
Second, data might be updated. I've been told that Hive does not allow updates in the normal sense, since for example insert overwrite would just rewrite the whole set (it turns out it's Hive 0.12.0, and in 0.14 SOME functionality has been added, but updating is not a possibility).

Deletes Slow on a Oracle BIG Table

I have a table which has around 180 million records and 40 indexes. A nightly program loads data into this table, but due to certain business conditions we can only delete and load data into this table. The nightly program brings new records, or updates to existing records, from the source system. We have a limited window of about 6 hours to complete the extract from the source system, perform the business transformations, and finally load the data into this target table so that it is ready for users in the morning. The issue we are facing is that the delete from this table takes a lot of time, mainly due to the 40 indexes on the table (an average of 70000 deletes per hour). I did some digging on the internet and found the options below.
a) Drop or disable indexes before the delete and then rebuild them: After the delete and load, the program that loads data into the target table needs to perform quite a few updates, for which the indexes are critical. Rebuilding one index takes almost 1.5 hours due to the enormous amount of data in the table. So this approach is not feasible, given the time it takes to rebuild indexes and the limited time we have to get the data ready for the users.
b) Use bulk delete: Currently the program deletes based on rowid and deletes records one by one as below
DELETE
FROM <table>
WHERE rowid = g_wpk_tab(ln_i);
g_wpk_tab is the collection which holds the rowids to be deleted. It is read by looping via FOR ALL, and I do an intermediate commit every 50000 row deletes.
Tom of AskTom says in this discussion that the bulk delete and the row-by-row delete will take almost the same amount of time:
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:5033906925164
So this won't be a feasible option either.
c) Regular delete: Tom of AskTom suggests using the regular delete, and even that takes a long time, probably due to the number of indexes on this table.
d) CTAS: This approach is out of the question because the program would need to recreate the table, create the 40 indexes and then proceed with the updates, and as I mentioned above an index takes at least 1.5 hours to create.
If you could provide me any other suggestions I would really appreciate it.
UPDATE: As of now we have decided to go with the approach suggested by https://stackoverflow.com/users/409172/jonearles to archive instead of delete. Approach is to add a flag to the table to mark the records to be deleted as DELETE and then have a post delete program run during the day to delete off the records. This will ensure that the data is available for users at the right time. Since users consume via OBIEE we are planning to set content level filter on the table to not look at the archival column so that users needn't know about what to select and what to ignore.
Parallel DML:
alter session enable parallel dml;
delete /*+ parallel */ ...;
commit;
Sometimes it's that easy.
Parallel DDL:
alter index your_index rebuild nologging compress parallel;
NOLOGGING to reduce the amount of redo generated during the index rebuild. COMPRESS can significantly reduce the size of a non-unique index, which significantly reduces the rebuild time. PARALLEL can also make a huge difference in rebuild time if you have more than one CPU or more than one disk. If you're not already using these options, I wouldn't be surprised if using all of them together improves index rebuilds by an order of magnitude. And then 1.5 * 40 / 10 = 6 hours.
Re-evaluate your indexes: Do you really need 40 indexes? It's entirely possible, but many indexes are only created because "indexes are magic". Make sure there's a legitimate reason behind each index. This can be very difficult to do, very few people document the reason for an index. Before you ask around, you may want to gather some information. Turn on index monitoring to see which indexes are really being used. And even if the index is used, see how it is used, perhaps through v$sql_plan. It's possible that an index is used for a specific statement but another index would have worked just as well.
Archive instead of delete: Instead of deleting, just set a flag to mark a row as archived, invalid, deleted, etc. This will avoid the immediate overhead of index maintenance. Ignore the rows temporarily and let some other job delete them later. The large downside to this is that it affects any query on the table.
Upgrading is probably out of the question, but 12c has an interesting new feature called in-database archiving. It's a more transparent way of accomplishing the same thing.
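To make the "archive instead of delete" idea concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module purely as a runnable stand-in for Oracle, and the table name big_table, the is_deleted column and the three helper functions are all hypothetical names invented for the illustration. It assumes the flag column itself is not indexed, since otherwise flagging a row would still incur index maintenance.

```python
import sqlite3

# SQLite is only a stand-in for Oracle; all names below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE big_table ("
    " id INTEGER PRIMARY KEY,"
    " payload TEXT,"
    " is_deleted INTEGER NOT NULL DEFAULT 0)"  # the archive flag
)
conn.executemany(
    "INSERT INTO big_table (id, payload) VALUES (?, ?)",
    [(i, f"row {i}") for i in range(1, 11)],
)

def mark_deleted(conn, ids):
    """Nightly load: flag rows instead of physically deleting them."""
    conn.executemany(
        "UPDATE big_table SET is_deleted = 1 WHERE id = ?",
        [(i,) for i in ids],
    )
    conn.commit()

def visible_rows(conn):
    """What users should see: every query has to filter on the flag."""
    cur = conn.execute(
        "SELECT id FROM big_table WHERE is_deleted = 0 ORDER BY id"
    )
    return [row[0] for row in cur]

def purge_deleted(conn):
    """Daytime job: physically remove flagged rows; returns the row count."""
    cur = conn.execute("DELETE FROM big_table WHERE is_deleted = 1")
    conn.commit()
    return cur.rowcount

mark_deleted(conn, [2, 4, 6])
visible = visible_rows(conn)
purged = purge_deleted(conn)
remaining = conn.execute("SELECT COUNT(*) FROM big_table").fetchone()[0]
```

The filter in visible_rows is the downside mentioned above: every consumer of the table has to apply it (or have it applied for them, for example through a view or a reporting-layer filter) until the purge job has run.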

Does postgresql index update on inserting new row?

Sorry if this is a dumb question, but do I need to reindex my table every time I insert rows, or does the new row get indexed when it is added?
From the manual
Once an index is created, no further intervention is required: the system will update the index when the table is modified
http://postgresguide.com/performance/indexes.html
I think when you insert rows, the index does get updated. It maintains the sort order of the index as you insert data. Hence there can be performance issues or downtime on a table if you try adding a large number of rows at once.
On top of the other answers: PostgreSQL is a top notch Relational Database. I'm not aware of any Relational Database system where indices are not updated automatically.
It seems to depend on the type of index. For example, according to https://www.postgresql.org/docs/9.5/brin-intro.html, for BRIN indexes:
When a new page is created that does not fall within the last summarized range, that range does not automatically acquire a summary tuple; those tuples remain unsummarized until a summarization run is invoked later, creating initial summaries. This process can be invoked manually using the brin_summarize_new_values(regclass) function, or automatically when VACUUM processes the table.
Although this seems to have changed in version 10.
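If you want to see the general behaviour for yourself, a small experiment like the following works. This is only a sketch using Python's built-in sqlite3 module as a stand-in, not PostgreSQL, and the users table and idx_users_email index are hypothetical names. It shows a row inserted after the index was created being found through that index with no reindex step. It does not reproduce the BRIN summarization behaviour quoted above, which is specific to PostgreSQL.

```python
import sqlite3

# SQLite stand-in for illustration; table and index names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("INSERT INTO users (email) VALUES ('a@example.com')")
conn.execute("CREATE INDEX idx_users_email ON users (email)")

# This row is inserted AFTER the index exists; no reindex is issued.
conn.execute("INSERT INTO users (email) VALUES ('b@example.com')")

# INDEXED BY forces SQLite to answer the query through the named index,
# so a hit proves the index already contains the new row.
query = "SELECT id FROM users INDEXED BY idx_users_email WHERE email = ?"
found = conn.execute(query, ("b@example.com",)).fetchall()

# The last column of each EXPLAIN QUERY PLAN row is the plan description.
plan = " ".join(
    str(row[3])
    for row in conn.execute("EXPLAIN QUERY PLAN " + query, ("b@example.com",))
)
```

In PostgreSQL the equivalent check would be to run EXPLAIN on a lookup for a freshly inserted value and confirm that an index scan is chosen and returns the row.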

Using Partitioning and Indexing on Same Column in Oracle is there any benefit out there

We have a database design with a table that uses 1-day interval partitioning on a column named "5mintime", and we have also created an index on the same column.
The "5mintime" column holds values such as 1-Mar-2011 and 2-Mar-2011. In short, there is no time component in it, and in the UI the smallest period the user can select is one day.
My question is: when running select queries, is there any advantage gained from the index, given that the partitioning is already there? On the flip side, if I remove the index, inserts will become faster. Any help on this would be greatly appreciated.
If I understand you right, then I think there's no need for the index:
A local index is built per partition, and in your case every row in a partition has the same value (i.e. 1-Mar-2011 in the 1-Mar-2011 partition, 2-Mar-2011 in the 2-Mar-2011 partition and so on).
A global index will actually index the whole table, but a lookup will match a whole partition, which is also not useful since you already have partitions...
But, why not check it?
If each day's data goes into its own partition and you can never search within days, but only for entire days worth of data, then, no, I don't see this index adding any value.
You can confirm whether or not SQL queries are using this index by enabling monitoring:
alter index myindex monitoring usage;
And then check to see if it's been used by querying v$object_usage for it some time later.

Oracle text index (Context) keep growing

In my application I needed to search through many varchar columns from different tables.
So I created a materialized view in which I concatenate those columns. Since they exceed 4000 characters, I merged them by concatenating the columns as TO_CLOB(column1) || TO_CLOB(column)... || TO_CLOB(columnN).
The query is complex, so the refresh is complete on demand for the view. We refresh it every 2 minutes.
The CONTEXT index is created with the sync on commit parameter.
The index then is synchronized every two minutes.
But when we run the index optimization, it does not defragment the index, so it keeps growing.
In ctx_user_indexes I can see that optimize reduces the docid count, but the number of tokens doesn't shrink. When I use the REBUILD parameter in the index optimization, it works correctly (the number of rows in DR$TEXT_INDEX_IDX$I drops).
Any ideas?
Thanks, and sorry for my poor English.
Adding a job to decrease the number of rows works.
