Fastest way to SELECT a row from a table in a database (Microsoft SQL Server) - performance

I have a huge table with one int PRIMARY KEY IDENTITY column.
I guess making the SELECT query using that primary key is the fastest way for the database to find the row in the table, isn't it?
If that is true, I still have a question.
Is that query as fast as a dictionary lookup by key, or does the database still have to read all the rows from the beginning (of the primary key column) until it finds the row itself?
Thanks in advance ^^

Using the primary key is obviously the fastest way to access a particular row.
If you want to understand how it works, you have to understand how an index works.
In general it works like this:
Let's say you have a table t1(col1, col2, ..., col10) and an index on col1.
The index on col1 is a data structure that contains pairs (col1, rec_id),
where rec_id allows direct access to the row with the matching col1 value.
The data structure is ordered by col1 and therefore allows efficient searching by col1, rather than a scan of every row.
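A minimal T-SQL sketch of the idea (the table and the looked-up value here are hypothetical):

-- the PRIMARY KEY creates a clustered B-tree index on id by default
CREATE TABLE t1 (
    id      INT IDENTITY PRIMARY KEY,
    payload NVARCHAR(100)
);

-- a lookup by primary key is an index seek: the engine walks the B-tree
-- in a handful of page reads instead of scanning rows from the beginning
SELECT payload
FROM t1
WHERE id = 12345;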

Searching a dictionary by key uses a hash (or comparable) lookup, so it is close to constant time.
When you declare a column as the primary key of a table, that column is indexed (in SQL Server, with a clustered B-tree index by default), so a lookup by primary key is a logarithmic index seek; the search is definitely NOT row by row as you feared.
Finally, yes, it is the common and fast way, but you should be selective about the number of columns and rows you request in your SQL query. Avoid fetching a large number of rows per SELECT call.


Deletion is slow in Oracle DB

We have a table that doesn't have much data. The table has 3 partitions and we are deleting data in one partition only:
delete from AB partition(A) where id = value;
Here id has an index, but the delete is still slow.
The datatype of id is VARCHAR2 and the value is a NUMBER.
Please help me understand why the delete statement is slow.
I don't think the index has much use in this case, and one likely reason is in the question itself: id is a VARCHAR2 column while the value is a NUMBER. Oracle resolves that mismatch by implicitly converting the column, effectively rewriting the predicate as TO_NUMBER(id) = value, and a plain index on id cannot be used once the column is wrapped in a function. Oracle then has to evaluate every single row in the partition to see if it matches id = value, which typically means a full scan, and how long that takes depends entirely on the number of rows in the partition.
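A hedged illustration of that diagnosis (the literal 123 is hypothetical); passing the value with the matching datatype lets the existing index be used:

delete from AB partition(A) where id = 123;    -- implicit TO_NUMBER(id): index on id unusable, partition scanned
delete from AB partition(A) where id = '123';  -- datatypes match: an index range scan on id is possible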

How to create an index so that the following statement uses an index

I have a table A(id, name, code).
I have this SQL statement:
Select * from A where upper(code || name) like upper('%<search text>%');
How do I create an index so that this statement uses it?
The question covers two cases: the table partitioned, and the table not partitioned.
Thanks & BR
Do you have a performance issue or is this just a hypothetical question?
An index is unlikely to help with this example: a full table scan will probably be the quickest solution. Why? Your table has 3 columns. The best index would be one that avoided looking in the table at all, e.g.
create index ai on a (code, name, id);
But that index needs to contain all the same data as the table plus a ROWID for each table row - so it is going to be bigger than the table and take longer to scan. You could try putting the least selective columns first in the index and using compression:
create index ai on a (code, name, id) compress;
Now the index may be smaller than the table - it depends on how selective the code and name columns are. If it is small enough, the optimizer might decide to use it instead of the table. It still contains all the IDs and ROWIDs, so the reduction in size probably won't be dramatic. In the test case I set up, the compressed index is about half the size of the table, yet Explain Plan shows the query has a higher cost if I use a hint to force it to use the index - maybe due to the overheads of compression, I don't know.
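To reproduce that comparison, a hypothetical hinted plan check looks like this (the index name ai comes from the statements above):

explain plan for
select /*+ index(a ai) */ * from a
where upper(code || name) like upper('%abc%');

select * from table(dbms_xplan.display);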
You could look into Oracle Text and the CONTAINS expression - but then you would be writing a different query, not using LIKE.
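For completeness, a hypothetical sketch of that alternative, indexing only the code column (covering code || name would need an Oracle Text multi-column datastore, which is beyond this sketch):

-- an Oracle Text context index (assumes the CTXSYS-owned Oracle Text option is installed)
create index a_code_txt_idx on a (code) indextype is ctxsys.context;

-- CONTAINS does word-level matching without the leading-wildcard penalty of LIKE
select * from a where contains(code, 'search text') > 0;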

Oracle 12c - refreshing the data in my tables based on the data from warehouse tables

I need to update some tables in my application from warehouse tables that are refreshed weekly or biweekly, and I should update my tables based on those. My tables are referenced by foreign keys in other tables, so I cannot just truncate them and reinsert the whole data every time. I have to take the delta and update accordingly, based on a few primary key columns which don't change. I need some input on how to implement this approach.
My approach:
Check the last updated time of those tables/views.
If it is more recent, compare each row based on the primary key in my table and the warehouse table.
Update each column if it is different.
Do nothing if there is no change in the columns.
Insert if there is a new record.
My Question:
How do I implement this? Is writing PL/SQL code a good and efficient way, given that the expected number of records is around 800K?
Please provide any sample code or links.
I would go for PL/SQL with the BULK COLLECT / FORALL method. You can use MINUS in your cursor to reduce the data size and calculate the difference.
You can check this site for more information about BULK COLLECT, FORALL and the engines involved: http://www.oracle.com/technetwork/issue-archive/2012/12-sep/o52plsql-1709862.html
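A minimal sketch of that approach, assuming hypothetical table names src_t (the warehouse table) and tgt_t (the application table), each with columns id and val:

DECLARE
  -- MINUS keeps only rows that are new or changed relative to the target
  CURSOR c_delta IS
    SELECT id, val FROM src_t
    MINUS
    SELECT id, val FROM tgt_t;
  TYPE t_ids  IS TABLE OF src_t.id%TYPE;
  TYPE t_vals IS TABLE OF src_t.val%TYPE;
  l_ids  t_ids;
  l_vals t_vals;
BEGIN
  OPEN c_delta;
  LOOP
    -- fetch the delta in chunks instead of all ~800K rows at once
    FETCH c_delta BULK COLLECT INTO l_ids, l_vals LIMIT 10000;
    EXIT WHEN l_ids.COUNT = 0;
    -- update rows that already exist in the target
    FORALL i IN 1 .. l_ids.COUNT
      UPDATE tgt_t SET val = l_vals(i) WHERE id = l_ids(i);
    -- insert rows that are genuinely new
    FORALL i IN 1 .. l_ids.COUNT
      INSERT INTO tgt_t (id, val)
      SELECT l_ids(i), l_vals(i) FROM dual
      WHERE NOT EXISTS (SELECT 1 FROM tgt_t t WHERE t.id = l_ids(i));
  END LOOP;
  CLOSE c_delta;
  COMMIT;
END;
/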
There are many parts to your question above and I will answer as best I can:
While it is possible to disable the referencing foreign keys, truncate the table, repopulate it with the updated data, then re-enable the foreign keys, given your requirements described above I don't believe truncating the table each time is optimal.
Yes, in principle PL/SQL is a good way to achieve this, as it is too complex to deal with in native SQL and PL/SQL is an efficient alternative.
Conceptually, the approach I would take is as follows (a code sketch follows the steps):
Initial set-up:
Create a sequence called activity_seq.
Add an activity_id column of type NUMBER, with a unique constraint, to your source table/s.
Add a trigger to the source table/s setting activity_id = activity_seq.nextval for each insert/update of a table row.
Create some kind of master table to hold the "last processed activity id" value.
Then bi/weekly:
1. Retrieve the value of "last processed activity id" from the master table.
2. Select all rows in the source table/s having an activity_id value greater than the "last processed activity id" value.
3. Iterate through the selected source rows and update the target if a match is found based on whatever your match criterion is, or insert a new row into the target if no match is found (I assume there is no delete, as you do not mention it).
4. On completion, update the master table's "last processed activity id" to the greatest activity_id among the source rows processed in step 3.
(Please note that, depending on your environment and the number of rows processed, the above process may need to be split and repeated over a number of transactions.)
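A hypothetical sketch of the initial set-up (src_table and etl_control are assumed names):

-- sequence that stamps every insert/update with an increasing id
CREATE SEQUENCE activity_seq;

ALTER TABLE src_table ADD (activity_id NUMBER UNIQUE);

CREATE OR REPLACE TRIGGER src_table_activity_trg
BEFORE INSERT OR UPDATE ON src_table
FOR EACH ROW
BEGIN
  :NEW.activity_id := activity_seq.NEXTVAL;
END;
/

-- master table holding the "last processed activity id"
CREATE TABLE etl_control (last_processed_activity_id NUMBER);
INSERT INTO etl_control VALUES (0);

-- then, bi/weekly, pick up only the rows changed since the last run:
SELECT *
FROM src_table
WHERE activity_id > (SELECT last_processed_activity_id FROM etl_control);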
I hope this proves helpful

Recommended way to index a date field in Postgres?

I have a few tables with about 17M rows each, all of which have a date column I would like to use frequently for searches. I am considering either just adding an index on the column and seeing how things go, or sorting the rows by date as a one-time operation and inserting everything into a new table so that the primary key ascends as the date ascends.
Since both of these are pretty time-consuming, I thought it might be worth asking here first for input.
The end goal is for me to load SQL queries into pandas for some analysis, if that is relevant here.
An index on the date column makes sense when you are going to search the table for a given date or date range, e.g.:
select * from test
where the_date = '2016-01-01';
-- or
select * from test
where the_date between '2016-01-01' and '2016-01-31';
-- etc
In these queries it does not matter whether the sort order of the primary key and the date column is the same. Hence rewriting the data to a new table would be useless; just create an index.
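For example, using the table and column names from the queries above:

create index test_the_date_idx on test (the_date);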
However, if you are going to use the index only in ORDER BY:
select * from test
order by the_date;
then a primary key integer index may be significantly (2-4 times) faster than an index on a date column.
Postgres supports clustered indexes to some extent, which is what you are suggesting by removing and reinserting the data.
In fact, removing and reinserting the data in the order you want will not change the time the query takes, because Postgres does not know about the order of the data.
If you know that the table's data does not change, then cluster the data based on the index you create.
This operation reorders the table based on the order in the index. It is very effective until you update the table. The syntax is:
CLUSTER tableName USING IndexName;
See the manual for details.
I also recommend that you use
explain <query>;
to compare the two queries, before and after adding an index, or before and after clustering.
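For instance (reusing the hypothetical test table from above; explain analyze also executes the query and reports actual timings):

explain analyze
select * from test
where the_date between '2016-01-01' and '2016-01-31';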

Best way to identify a handful of records expected to have a flag set to TRUE

I have a table that is expected to get 7 million records a month, and it is a pretty wide table. A small portion of these records are expected to be flagged as "problem" records.
What is the best way to implement the table so that these records can be located efficiently?
I'm new to Oracle, but is a materialized view a valid option? Does Oracle have such a thing as indexed views, or is that potentially really the same thing?
Most of the reporting is by month, so partitioning by month seems like an option, but a "problem" record may theoretically linger for several months. Otherwise, the reporting should be mostly for the current month. Would you expect querying across all month partitions to locate any problem record to cause significant performance issues compared to using a single table?
Your general thoughts on where to start would be appreciated. I realize I need to read up, and I'll do that, but I wanted to get the community's thoughts first to make sure I read the right stuff.
One more thought: the primary key is a GUID VARCHAR2(36). In order of magnitude, how much of a performance hit would you expect this to be relative to using a NUMBER data type PK? This worries me, but it is out of my control.
It depends what you mean by "flagged", but it sounds to me like you would benefit from a simple index, a function-based index, or an indexed virtual column.
In all cases you should be careful to ensure that all the index columns are NULL for rows that do not need to be flagged. This way your index will contain only the rows that are flagged (Oracle does not - by default - index rows in B-Tree indexes where all index column values are NULL).
Your primary key being a VARCHAR2 GUID should make no difference, at least with regard to the specific flagging of rows in this question; indexes will point to rows via Oracle's internal ROWIDs.
Indexes support partitioning, so if your data is already partitioned, your index could be set to match.
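For example, a hypothetical local index that inherits the table's month partitioning:

CREATE INDEX my_table_problems_loc_idx ON my_table (problem_flag) LOCAL
/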
Simple column index method
If you can dictate how the flagging works, or the column already exists, then I would simply add an index to it like so:
CREATE INDEX my_table_problems_idx ON my_table (problem_flag)
/
Function-based index method
If the data model is fixed / there is no flag column, then you can create a function-based index assuming that you have all the information you need in the target table. For example:
CREATE INDEX my_table_problems_fnidx ON my_table (
CASE
WHEN amount > 100 THEN 'Y'
ELSE NULL
END
)
/
Now if you use the same logic in your SELECT statement, you should find that it uses the index to efficiently match rows.
SELECT *
FROM my_table
WHERE CASE
WHEN amount > 100 THEN 'Y'
ELSE NULL
END IS NOT NULL
/
This is a bit clunky though, and it requires you to use the same logic in queries as in the index definition. Not great. You could use a view to mask this, but you're still duplicating logic in at least two places.
Indexed virtual column
In my opinion, this is the best way to do it if you are computing the value dynamically (available from 11g onwards):
ALTER TABLE my_table
ADD virtual_problem_flag VARCHAR2(1) AS (
CASE
WHEN amount > 100 THEN 'Y'
ELSE NULL
END
)
/
CREATE INDEX my_table_problems_idx ON my_table (virtual_problem_flag)
/
Now you can just query the virtual column as if it were a real column, i.e.
SELECT *
FROM my_table
WHERE virtual_problem_flag = 'Y'
/
This will use the index and puts the function-based logic into a single place.
Create a new table with just the PKs of the problem rows.
