Adding new column and index to a table with a billion records - oracle

I want to add a new column to a table with a billion records. To speed up the select statement, I need to add a new index containing this column and the PK column.
How long will it take to add a new index to a billion-record table?
The new column, call it Field, will hold the values 0, 1, 2, or 9.
Most records will be 9. The select conditions
Field=0, Field=1, and Field=2 will be used, but Field=9 will not.
For example, in a billion-record table:
Field with value 0: 100,000 records;
Field with value 1: 100,000 records;
Field with value 2: 100,000 records;
Field with value 9: a billion minus 300,000 records.
Should I create an index on the column?
If not, will a select containing the condition
Field=0 be too slow to return results?

If most of the values are 9's, then you can avoid including them in the index with:
create index my_index on my_table (case column_name when 9 then null else column_name end);
Then query on ...
select ...
from ...
where case column_name when 9 then null else column_name end = 2
... for example.
The time taken will be the time required to scan the entire table, plus the time to sort the 300,000 rows that will go into the index. Faster with a parallel index build, of course.
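As a concrete sketch of that suggestion (my_table, field, and the index name are placeholders, not the asker's real names), the function-based index and a matching query might look like:

```sql
-- Rows where the index expression evaluates to NULL are not stored in a
-- single-column B-tree index, so the 9's never enter the index at all.
CREATE INDEX my_table_field_ix
  ON my_table (CASE field WHEN 9 THEN NULL ELSE field END)
  PARALLEL 8;                          -- build the index in parallel

-- Optionally revert to serial execution after the build.
ALTER INDEX my_table_field_ix NOPARALLEL;

-- The query predicate must match the index expression exactly
-- for the optimizer to use the index.
SELECT *
FROM   my_table
WHERE  CASE field WHEN 9 THEN NULL ELSE field END = 2;
```

With only about 300,000 entries, the resulting index stays tiny relative to the billion-row table.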

Related

How to retrieve workflow attribute values from workflow table?

I have a situation where I need to take the values from a table column whose data depends on another column in the same table.
There are two such column values that need to be compared with another table.
Scenario:
Column 1 query:
SELECT text_value
FROM WF_ITEM_ATTRIBUTE_VALUES
WHERE name LIKE 'ORDER_ID' --AND number_value IS NOT NULL
AND Item_type LIKE 'ABC'
this query returns 14 unique records
Column 2 query:
SELECT number_value
FROM WF_ITEM_ATTRIBUTE_VALUES
WHERE name LIKE 'Source_ID' --AND number_value IS NOT NULL
AND Item_type LIKE 'ABC'
this also returns 14 records
The order_id from the column 1 query is associated with the source_id from the column 2 query. Using these two values, I want to compare the 14 combined (order_id, source_id) records with the columns sal_order_id and sal_source_id of another table, Sales_tbl.
Sample Data from WF_ITEM_ATTRIBUTE_VALUES:
Note: the sales_tbl table contains the same data, but order_id is called sal_order_id and source_id is called sal_source_id.
Order_id
204994 205000 205348 198517 198176 196856 204225 205348 203510 206528 196886 198971 194076 197940
Source_id
92262138 92261783 92262005 92262615 92374992 92375051 92374948 92375000 92375011 92336793 92374960 92691360 92695445 92695880
Desired O/p based on comparison:
Please help me write the query.
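No answer was posted in the thread, but one common approach is a self-join on WF_ITEM_ATTRIBUTE_VALUES. This is only a sketch: it assumes the ORDER_ID and Source_ID attributes of one workflow item share the same item_type and item_key, and that Sales_tbl joins on both values.

```sql
SELECT o.text_value   AS order_id,
       s.number_value AS source_id,
       t.*
FROM   wf_item_attribute_values o
JOIN   wf_item_attribute_values s
       ON  s.item_type = o.item_type
       AND s.item_key  = o.item_key        -- pair the two attributes of one workflow item
       AND s.name      = 'Source_ID'
JOIN   sales_tbl t
       ON  t.sal_order_id  = o.text_value  -- may need TO_NUMBER/TO_CHAR depending on types
       AND t.sal_source_id = s.number_value
WHERE  o.item_type = 'ABC'
AND    o.name      = 'ORDER_ID';
```

To find the 14 pairs that have no match in Sales_tbl, change the second join to a LEFT JOIN and filter on t.sal_order_id IS NULL.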

Split the table into equal chunks based on varchar column Oracle

I have a huge table with 20 million records and I want to split it into 10 equal chunks.
The problem is that the table only has varchar columns. I can use ROWNUM to split the table into equal chunks, but I can't seem to get the start and end values of the varchar column into the query result set. Below is the query.
with bkt as (
select ROWNUM, width_bucket(ROWNUM, 1, 100100, 10) as id_bucket from "BOOKER"."test"
)
select id_bucket
, min(ROWNUM) as bkt_start
, max(ROWNUM) as bkt_end
, count(*)
from bkt
group by id_bucket
order by 1;
Please advise how I can add the varchar column to this query so that it returns the start and end varchar values of each chunk.
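One way to carry the varchar values through the aggregation is Oracle's FIRST/LAST (KEEP DENSE_RANK) aggregates. A sketch only: varchar_col is a placeholder for the real column name, and the WIDTH_BUCKET bounds are taken unchanged from the question.

```sql
WITH bkt AS (
  SELECT t.varchar_col,                                    -- placeholder column name
         ROWNUM AS rn,
         WIDTH_BUCKET(ROWNUM, 1, 100100, 10) AS id_bucket
  FROM   "BOOKER"."test" t
)
SELECT id_bucket,
       MIN(rn) AS bkt_start,
       MAX(rn) AS bkt_end,
       -- value of varchar_col on the first and last row of each bucket
       MIN(varchar_col) KEEP (DENSE_RANK FIRST ORDER BY rn) AS start_val,
       MAX(varchar_col) KEEP (DENSE_RANK LAST  ORDER BY rn) AS end_val,
       COUNT(*)
FROM   bkt
GROUP BY id_bucket
ORDER BY id_bucket;
```

Note that ROWNUM is assigned in whatever order the rows happen to be read, so without an ORDER BY inside the CTE the chunk boundaries are not deterministic between runs.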

Why oracle uses index skip scan for this query?

The SQL queries only one table, which has 100 million rows.
The SQL has three columns in the where clause: col_date, col_char1, and col_char2. col_date is of date type, but it has only a day part, no time part (like '2016-02-25 00:00:00'); it has about 1,000 unique values, spread evenly among the records in the table. col_char1 is of varchar2 type, with about 30 unique values, also spread evenly. col_char2 is also varchar2, with about 20 unique values, spread evenly. The where clause is like col_date >= to_date('2016-02-24 00:00:00') and col_char1 = 'VAL1' and col_char2 = 'VAL2'. The query returns about 3,000 rows.
I created an index INDEX1 with col_date, col_char1 and col_char2, in the order col_date, col_char1 and col_char2.
The execution plan shows an index skip scan using INDEX1. I don't know why it uses a skip scan instead of a range scan. I would expect a skip scan to make this query very slow, because the first column in the index (col_date) has so many distinct values.
The best index for the conditions you have in your question is a composite index on (col_char1, col_char2, col_date) (or the first two keys can be reversed).
If you don't have this index, but have a similar index, then I think a skip-scan will be used.
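A sketch of that suggestion (the table and index names are placeholders):

```sql
-- Equality columns first, the range column (col_date) last: the optimizer
-- can then use an ordinary index range scan instead of a skip scan.
CREATE INDEX ix_char1_char2_date
  ON big_table (col_char1, col_char2, col_date);

SELECT *
FROM   big_table
WHERE  col_date  >= TO_DATE('2016-02-24', 'YYYY-MM-DD')
AND    col_char1 = 'VAL1'
AND    col_char2 = 'VAL2';
```

A skip scan on the original (col_date, col_char1, col_char2) index would have to probe one subtree per distinct leading value, which is why it only pays off when the leading column has few distinct values.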

optimize query with minus oracle

I want to optimize a query using MINUS that takes too much time; any help would be appreciated.
I have two tables A and B,
Table A: ID, value
Table B: ID
I want all of Table A's records that are not in Table B, showing the value.
For that I wrote something like:
Select ID, value
FROM A
WHERE value > 70
MINUS
Select ID
FROM B;
This query is just taking too long ... any tips on how to improve this simple query?
Thanks for your attention.
Are ID and Value indexed?
The performance of Minus and Not Exists depend:
It really depends on a bunch of factors.
A MINUS will do a full table scan on both tables unless there is some
criteria in the where clause of both queries that allows an index
range scan. A MINUS also requires that both queries have the same
number of columns, and that each column has the same data type as the
corresponding column in the other query (or one convertible to the
same type). A MINUS will return all rows from the first query where
there is not an exact match column for column with the second query. A
MINUS also requires an implicit sort of both queries.
NOT EXISTS will read the sub-query once for each row in the outer
query. If the correlation field (you are running a correlated
sub-query?) is an indexed field, then only an index scan is done.
The choice of which construct to use depends on the type of data you
want to return, and also the relative sizes of the two tables/queries.
If the outer table is small relative to the inner one, and the inner
table is indexed (preferably a unique index, but not required) on the
correlation field, then NOT EXISTS will probably be faster, since the
index lookup will be pretty fast and only executed relatively few
times. If both tables are roughly the same size, then MINUS might be
faster, particularly if you can live with only seeing the fields you
are comparing on.
Minus operator versus 'not exists' for faster SQL query - Oracle Community Forums
You could use NOT EXISTS like so:
SELECT a.ID, a.Value
From a
where a.value > 70
and not exists(
Select b.ID
From B
Where b.ID = a.ID)
EDIT: I've produced some dummy data and two datasets for testing to prove the performance increases of indexing. Note: I did this in MySQL since I don't have Oracle on my Macbook.
Table A has 2600 records with 2 columns: ID, val.
ID is an autoincrement integer
Val varchar(255)
Table b has one column but more records than Table A: an autoincrement ID in steps of 3.
You can reproduce this if you wish: Pastebin - SQL Dummy Data
Here is the query I will be using:
select a.id, a.val from tablea a
where length(a.val) > 3
and not exists(
select b.id from tableb b where b.id = a.id
);
Without Indexes, the runtime is 986ms with 1685 rows.
Now we add the indexes:
ALTER TABLE `tablea` ADD INDEX `id` (`id`);
ALTER TABLE `tableb` ADD INDEX `id` (`id`);
With Indexes, the runtime is 14ms with 1685 rows. That's 1.42% the time it took without indexes!

hive : select row with column having maximum value without join

I am writing a Hive query over a table to pick the row with the maximum value in a column.
There is a table with the following data, for example:
key value updated_at
1 "a" 1
1 "b" 2
1 "c" 3
The row which was updated last needs to be selected.
Currently I am using the following logic:
select tab1.*
from table_name tab1
join (select key, max(updated_at) as max_updated
      from table_name
      group by key) tab2
on tab1.key = tab2.key and tab1.updated_at = tab2.max_updated;
Is there any better way to do this?
If it is true that updated_at is unique for that table, then the following is perhaps a simpler way of getting you what you are looking for:
-- I'm using Hive 0.13.0
SELECT * FROM table_name ORDER BY updated_at DESC LIMIT 1;
If it is possible for updated_at to be non-unique for some reason, you may need to adjust the ORDER BY logic to break any ties in the fashion you wish.
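If what you actually need is the latest row per key (the join in the question groups by key, even though the sample data has only one key), a window function avoids the self-join entirely. A sketch, assuming Hive 0.11 or later for ROW_NUMBER:

```sql
SELECT key, value, updated_at
FROM (
  SELECT key, value, updated_at,
         -- number rows within each key, newest first
         ROW_NUMBER() OVER (PARTITION BY key ORDER BY updated_at DESC) AS rn
  FROM table_name
) t
WHERE rn = 1;
```

As with the ORDER BY ... LIMIT 1 answer, ties on updated_at are broken arbitrarily unless you add further ORDER BY columns.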
