fast comparison a list with itself - performance

I have a list giant list (100k entries) in my database. Each entry contains a id, text and a date.
I created a function to compare two text as possible. How it looks like is not necessary right now.
Is there a "good" way to remove "duplicates" (as possible) from the list by text?
Currently I'm looping through the list twice and compare each entry with each entry, except itself by id.

If your question is when you insert a row in the table... you can include the unique constraint.
Postgresql
CREATE TABLE table1 (
id serial PRIMARY KEY,
txt VARCHAR (50),
dt timestamp,
UNIQUE(txt)
);
Oracle
CREATE TABLE table1
( id numeric(10) NOT NULL,
txt varchar2(50) NOT NULL,
date timestamp,
CONSTRAINT txt_unique UNIQUE (txt)
);

Related

Expensive subquery tuning with SQLite

I'm working on a small media/file management utility using sqlite for it's persistent storage needs. I have a table of files:
CREATE TABLE file
( file_id INTEGER PRIMARY KEY AUTOINCREMENT
, file_sha1 BINARY(20)
, file_name TEXT NOT NULL UNIQUE
, file_size INTEGER NOT NULL
, file_mime TEXT NOT NULL
, file_add_time TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL
);
And also a table of albums
CREATE TABLE album
( album_id INTEGER PRIMARY KEY AUTOINCREMENT
, album_name TEXT
, album_poster INTEGER
, album_created TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL
, FOREIGN KEY (album_poster) REFERENCES file(file_id)
);
to which files can be assigned
CREATE TABLE album_file
( album_id INTEGER NOT NULL
, file_id INTEGER NOT NULL
, PRIMARY KEY (album_id, file_id)
, FOREIGN KEY (album_id) REFERENCES album(album_id)
, FOREIGN KEY (file_id) REFERENCES file(file_id)
);
CREATE INDEX file_to_album ON album_file(file_id, album_id);
Part of the functionality is to list albums, exposing
the album id,
the album's name,
an poster image for that album and
the number of files in the album
which currently uses this query:
SELECT a.album_id, a.album_name,
COALESCE(
a.album_poster,
(SELECT file_id FROM file
NATURAL JOIN album_file af
WHERE af.album_id = a.album_id
ORDER BY file.file_name LIMIT 1)),
(SELECT COUNT(file_id) AS file_count
FROM album_file WHERE album_id = a.album_id)
FROM album a
ORDER BY album_name ASC
The only "tricky" part of that query is that the album_poster column may be null, in which case COALESCE statement is used to just return the first file in the album as the "default poster".
With currently ~260000 files, ~2600 albums and ~250000 entries in the album_file table, this query takes over 10 seconds which makes for a not-so-great user experience. Here's the query plan:
0|0|0|SCAN TABLE album AS a
0|0|0|EXECUTE CORRELATED SCALAR SUBQUERY 1
1|0|1|SEARCH TABLE album_file AS af USING COVERING INDEX album_to_file (album_id=?)
1|1|0|SEARCH TABLE file USING INTEGER PRIMARY KEY (rowid=?)
1|0|0|USE TEMP B-TREE FOR ORDER BY
0|0|0|EXECUTE CORRELATED SCALAR SUBQUERY 2
2|0|0|SEARCH TABLE album_file USING COVERING INDEX album_to_file (album_id=?)
Replacing the COALESCE statement with just a.album_poster, sacrificing the auto-poster functionality, brings the query time down to a few milliseconds:
0|0|0|SCAN TABLE album AS a
0|0|0|EXECUTE CORRELATED SCALAR SUBQUERY 1
1|0|0|SEARCH TABLE album_file USING COVERING INDEX album_to_file (album_id=?)
0|0|0|USE TEMP B-TREE FOR ORDER BY
What I don't understand is that limiting the album listing to 1 or 1000 rows makes no difference. It seems SQLite is doing the expensive sub-query for the "default" poster on all albums, only to throw away most of the results when finally cutting down the result set to the LIMITs specified with the query.
Is there something I can do to make the original query substantially faster, especially given that I'm usually only querying a small subset (using LIMIT) of all rows for display?

How can i create double sided composite primary key in oracle?

I have a table and this table has relationship with itself as many-to-many. So i am create another table (second table) that stores two id for composite primary key which comes from original (first) table. But if in second table there is id1=1 and id2=2, then second table shouldn't have id1=2 and id2=1.
So how can i do that, should i write a trigger for that or is there a simple way for oracle.
I use Oracle11g and pl/sql developer.
You can try defining a unique function-based index that automatically defines the index in numerical order. This would ensure that only one of the 2 combinations is ever allowed. Something like:
create unique index your_index on your_table(
least(id1, id2),
greatest(id1, id2)
);
If it matters, there is a slight difference between this approach and MT0's answer that uses the check constraint.
With the check constraint approach, only (id1=1, id2=2) is valid.
With the function-based index approach, both (id1=1, id2=2) and (id1=2, id2=1) are valid, but they can't both be present in the table at any given time.
CREATE TABLE table_name (
id INT PRIMARY KEY
);
CREATE TABLE table_name_many_to_many(
id1 INT REFERENCES table_name( id ),
id2 INT REFERENCES table_name( id ),
CONSTRAINT tnmtm__id1__id2__pk PRIMARY KEY ( id1, id2 ),
CONSTRAINT tnmtm__id1__id2__chk CHECK ( id1 <= id2 )
);

What is the best way of creating a unique identifier name in Oracle?

I want to create a check constraint on several thousand columns in a database, but all constraints needs a name that is unique in the database. I wanted to use a guid, but because of limitations in Oracle, the name can't be longer than 30 characters.
Here is an example of the syntax:
CREATE TABLE mytable
(
col1 DATE
DEFAULT to_date('19000101', 'yyyymmdd')
CONSTRAINT unique_name_needed CHECK(col1 = TRUNC(col1))
NOT NULL
)
We use a 3 letter short name for every table. These three letters we use to name our constraints. So the short name for mytable could be mtb.
Constraint names are then:
Primary: mtb_pk
Unique: mtb_uk
Foreign: mtb_otb_fk where otb is the short name of the other table.
The trick of course is to come up with unique short names for every table.

Contraint to set one column as the sum of two others automatically

I'm wondering is it possible to use a constraint to set the value of one column to be sum of two others. For example given the following tables:
CREATE TABLE Room (
Room_Num NUMBER(3),
Room_Band_ID NUMBER(2),
Room_Type_ID NUMBER(2),
Room_Price NUMBER(4),
PRIMARY KEY (Room_Num),
FOREIGN KEY(Room_Band_ID)
REFERENCES Room_Band(Room_Band_ID),
FOREIGN KEY(Room_Type_ID)
REFERENCES Room_Type(Room_Type_ID)
);
CREATE TABLE Booking (
Booking_ID NUMBER(10) NOT NULL,
GuestID NUMBER(4) NOT NULL,
StaffID NUMBER(2) NOT NULL,
Payment_ID NUMBER(4) NOT NULL,
Room_Num NUMBER(3) NOT NULL,
CheckInDate DATE NOT NULL,
CheckOutDate DATE NOT NULL,
Booking NUMBER(2) NOT NULL,
Price NUMBER(4),
PRIMARY KEY (Booking_ID),
FOREIGN KEY(GuestID)
REFERENCES Guest(GuestID),
FOREIGN KEY(StaffID)
REFERENCES Staff(StaffID),
FOREIGN KEY(Payment_ID)
REFERENCES Payment(Payment_ID),
FOREIGN KEY(Room_Num)
REFERENCES Room(Room_Num)
);
I know it is possible to do something like:
Constraint PriceIs CHECK (Booking.Price=(Room.Room_Price*
(Booking.CheckOutDate - Booking.CheckInDate)));
Is it also possible to set up a constraint that doesn't just ensure that the price is correct, but to calculate the price automatically into the price field for the relevant tuple?
Update,
So I've tried to set up a trigger as follows:
CREATE OR REPLACE trigger PriceCompute
AFTER INSERT ON Booking
FOR each row
BEGIN
UPDATE Booking
SET
SELECT (Room.Room_Price*(Booking.CheckOutDate - Booking.CheckInDate))
INTO
Booking.Price
FROM Booking
JOIN ROOM ON Booking.Room_Num = Room.Room_Num
END;
/
But I'm getting the following errors back:
Can anyone see where I'm going astray here, as its beyond me.
Yes, you can. Here are your options. Listed in order of my personal preference:
You can have a table without this column. And create a view that will be calculating this column on a fly.
You may use oracle virtual columns
create table Room (
...
price NUMBER GENERATED ALWAYS AS (room_price*(checkOut-checkIn)) VIRTUAL,
...)
You may use actual column (same as 2, per Dave Costa):
create table Room (
...
price AS (room_price*(checkOut-checkIn)),
...)
You can write trigger to populate it (like Mat M suggested)
You can write stored procedure, but it will be an overkill in this situation
I think you would have to put a trigger on both tables for whenever the price value of the room is changed or the checkout/in dates are changed, it will update the PriceIs field from your calculation.
If you don't need the calculated portion stored in an actual field, you can always create a view that calculates it whenever you look at the view.
I think the better solution is to use a view that calculates the value on the fly. But regarding your attempt to create a trigger, you should use :new.<column_name> to refer to the values being inserted into the Booking table. You don't need to perform updates and queries on that table to get or modify the values in the row that is being inserted*. You just refer to them as variables. So you would want to do something like:
SELECT (Room.Room_Price*(:new.CheckOutDate - :new.CheckInDate))
INTO
:new.Price
FROM ROOM WHERE :new.Room_Num = Room.Room_Num
*In fact, you can't perform queries or updates on the table whose modification invoked the trigger in the first place. You would get the infamous "mutating table" error if your trigger actually compiled and ran.

Sequence with variable

In SQL we will be having a sequence. But it should be appended to a variable like this
M1,M2,M3,M4....
Any way of doing this ?
Consider having the prefix stored in a separate column in the table, e.g.:
CREATE TABLE mytable (
idprefix VARCHAR2(1) NOT NULL,
id NUMBER NOT NULL,
CONSTRAINT mypk PRIMARY KEY (idprefix, id)
);
In the application, or in a view, you can concatenate the values together. Or, in 11g you can create a virtual column that concatenates them.
I give it 99% odds that someone will say "we want to search for ID 12345 regardless of the prefix" and this design means you can have a nice index lookup instead of a "LIKE '%12345'".
select 'M' || my_sequence.nextval from dual;

Resources