SQLite index performance

I have a table with the following layout:
email, item_id, json, where
email is a string,
item_id is a unix timestamp in ms, and
json is the item data, to be used with the JSON1 extension.
I also have a multi-column index on that table over (email, item_id).
I perform a lot of queries in the style of WHERE email = 'asd' AND item_id > ... AND item_id < ...
I've been working with MongoDB for too many years now, so I'm used to not dealing with database normalization, and I just went with the easiest SQL table layout.
On a phone, a query of the above style can take up to a second when it returns around 35,000 items. The index does get used.
Will I get a noticeable performance boost if I normalize the database by creating a new table with (email, email_id), changing the original to (email_id, item_id, json), and querying via JOINs? In that case (email, email_id) would contain about 2-5 rows and (email_id, item_id, json) many thousands.
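For reference, a minimal sketch of the layout I have in mind (the names below are just placeholders, not an existing schema):
-- Hypothetical normalized layout; names are illustrative only.
CREATE TABLE email (
  email_id INTEGER PRIMARY KEY,
  email    TEXT UNIQUE
);
CREATE TABLE item (
  email_id INTEGER REFERENCES email(email_id),
  item_id  INTEGER,   -- unix timestamp in ms
  json     TEXT,      -- item data for the JSON1 extension
  PRIMARY KEY (email_id, item_id)
);
-- The composite primary key provides an index over (email_id, item_id),
-- analogous to the original (email, item_id) index, so the range predicate
-- on item_id can still be answered via an index search.
SELECT item.json
FROM item
JOIN email ON email.email_id = item.email_id
WHERE email.email = 'asd'
  AND item.item_id > 0
  AND item.item_id < 1700000000000;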

I ran 3 tests: the original layout, a query using a JOIN, and an additional option that uses a subquery rather than a join to get the email id based upon the email address and compare it to email_id. The subquery came out on top; the original fared worst.
Results were :-
SELECT * FROM original WHERE email = 'email3#ouremail.com' AND item_id > 7800 AND item_id < 2404327029516376406
OK
Time: 0.199s
SELECT * FROM item WHERE email_id = (SELECT email.email_id FROM email WHERE email.email = 'email3#ouremail.com') AND item_id > 7800 AND item_id < 2404327029516376406
OK
Time: 0.082s
SELECT * FROM item JOIN email ON item.email_id = email.email_id WHERE email.email = 'email3#ouremail.com' AND item_id > 7800 AND item_id < 2404327029516376406
OK
Time: 0.109s
The following was used to create and test :-
DROP TABLE IF EXISTS original;
CREATE TABLE IF NOT EXISTS original (email TEXT, item_id INTEGER, json BLOB, PRIMARY KEY(email,item_id));
WITH RECURSIVE cnt(x,y,z)
AS (
SELECT 'email'||(1 + ABS(random() / (9223372036854775807 / 5)))||'#ouremail.com',
ABS(random()),
randomblob(ABS(random() / (9223372036854775807 / 40) ))
UNION ALL SELECT
'email'||(1 + ABS(random() / (9223372036854775807 / 5)))||'#ouremail.com',
ABS(random()),
randomblob(ABS(random() / (9223372036854775807 / 40)))
FROM cnt LIMIT 350000
)
INSERT INTO original SELECT * FROM cnt;
DROP TABLE IF EXISTS email;
CREATE TABLE IF NOT EXISTS email (email_id INTEGER PRIMARY KEY, email TEXT);
INSERT INTO email SELECT DISTINCT null,email FROM original;
DROP TABLE IF EXISTS item;
CREATE TABLE IF NOT EXISTS item (email_id, item_id, json);
INSERT INTO item SELECT
(SELECT email_id FROM email WHERE original.email = email.email),
item_id,
json FROM original;
SELECT * FROM original WHERE email = 'email3#ouremail.com' AND item_id > 7800 AND item_id < 2404327029516376406;
SELECT * FROM item WHERE email_id = (SELECT email.email_id FROM email WHERE email.email = 'email3#ouremail.com') AND item_id > 7800 AND item_id < 2404327029516376406;
SELECT * FROM item JOIN email ON item.email_id = email.email_id WHERE email.email = 'email3#ouremail.com' AND item_id > 7800 AND item_id < 2404327029516376406;
You could do worse than to run the following and look at the output.
EXPLAIN QUERY PLAN SELECT * FROM original WHERE email = 'email3#ouremail.com' AND item_id > 7800 AND item_id < 2404327029516376406;
EXPLAIN QUERY PLAN SELECT * FROM item WHERE email_id = (SELECT email.email_id FROM email WHERE email.email = 'email3#ouremail.com') AND item_id > 7800 AND item_id < 2404327029516376406;
EXPLAIN QUERY PLAN SELECT * FROM item JOIN email ON item.email_id = email.email_id WHERE email.email = 'email3#ouremail.com' AND item_id > 7800 AND item_id < 2404327029516376406;
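One caveat when comparing those plans: the item table in the test above was created without any index, so the new-layout queries scan it. If you want to see how the plans change with an index analogous to the original table's (email, item_id) primary key, something like the following should do (the index name is arbitrary):
-- Hypothetical index mirroring the original primary key on the normalized table.
CREATE INDEX IF NOT EXISTS idx_item_email_item ON item (email_id, item_id);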

Related

How to insert comma delimited values in one column in oracle?

I have a table by the name of personal_info and it contains id, name and phone_number as columns. The following is the table structure in which I want to store data:
id | name | phone_number
1 | ali | 03434444, 03454544, 0234334
So how do I store data in the phone_number column in comma-delimited format, and how do I filter that column in a WHERE clause? For example:
Select * from personal_info where phone_number = 03454544 ;
And which datatype is suitable for the phone_number column?
Well, good practice would rather be to have another table, PHONE, with a 1:N association (for example a PHONE_ID primary key, and ID and PHONE columns).
You may then get the result you want with a view based on your two tables using the LISTAGG function: https://fr.wikibooks.org/wiki/Oracle_Database/Utilisation_de_fonctions/fonction_LISTAGG. This will be much more efficient to work with, especially if you want WHERE clauses based on your phone numbers.
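A minimal sketch of that approach, assuming a PHONE table shaped as described above (all the names below are illustrative, not taken from your schema):
-- Hypothetical PHONE table; names are illustrative only.
CREATE TABLE phone (
  phone_id NUMBER PRIMARY KEY,
  id       NUMBER REFERENCES personal_info (id),
  phone    VARCHAR2(20) NOT NULL
);
-- View that rebuilds the comma-delimited presentation on demand.
CREATE OR REPLACE VIEW personal_info_v AS
SELECT p.id,
       p.name,
       LISTAGG(ph.phone, ', ') WITHIN GROUP (ORDER BY ph.phone) AS phone_number
FROM   personal_info p
LEFT JOIN phone ph ON ph.id = p.id
GROUP BY p.id, p.name;
-- Filtering is done against the normalized table, not the aggregated string:
SELECT *
FROM   personal_info_v v
WHERE  v.id IN (SELECT id FROM phone WHERE phone = '03454544');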
Use LIKE with the delimiters:
Select *
from personal_info
where ', ' || phone_number || ', ' LIKE '%, ' || '03454544' || ', %';
However
You should consider changing your data structure to store the phone numbers in a separate table:
CREATE TABLE phone_numbers (
person_id NUMBER REFERENCES personal_info (id),
phone_number VARCHAR2(12)
);
And then you can get the data using a JOIN
SELECT pi.*,
pn.phone_number
FROM personal_info pi
INNER JOIN phone_numbers pn
ON (pi.id = pn.person_id)
WHERE pn.phone_number = '03434444'
or, if you want all the phone numbers:
SELECT pi.*,
pn.phone_numbers
FROM personal_info pi
INNER JOIN (
SELECT person_id,
LISTAGG(phone_number, ', ') WITHIN GROUP (ORDER BY phone_number)
AS phone_numbers
FROM phone_numbers
GROUP BY person_id
HAVING COUNT(CASE WHEN phone_number = '03434444' THEN 1 END) > 0
) pn
ON (pi.id = pn.person_id)
VARCHAR2 is suitable for phone numbers.
You can get the values this way:
WITH personal_info AS
(
SELECT 1 AS ID, 'Ali' AS NAME, '03434444, 03454544, 0234334' AS phone_number FROM dual
)
SELECT *
FROM (SELECT id, name, TRIM(regexp_substr(phone_number, '[^,]+', 1, LEVEL)) AS phone_number
FROM personal_info
CONNECT BY LEVEL <= LENGTH (phone_number) - LENGTH(REPLACE(phone_number, ',' )) + 1)
WHERE phone_number = '03454544';
Wrong data model, it isn't normalized. You should create a new table:
create table phones
(id_phone number constraint pk_phone primary key,
id_person number constraint fk_pho_per references person (id_person),
phone_number varchar2(30) not null
);
Then you'd store as many numbers as you want, one-by-one (row-by-row, that is).
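For illustration, a sketch of those row-by-row inserts for your sample person (the id_phone values are made up):
-- One row per phone number for person 1 ('ali'); id_phone values are illustrative.
insert into phones (id_phone, id_person, phone_number) values (1, 1, '03434444');
insert into phones (id_phone, id_person, phone_number) values (2, 1, '03454544');
insert into phones (id_phone, id_person, phone_number) values (3, 1, '0234334');
-- Filtering then becomes a plain equality match:
select p.*
from person p
join phones ph on ph.id_person = p.id_person
where ph.phone_number = '03454544';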
If you want to do it your way, store it just like that:
insert into person (id, name, phone_number)
values (1, 'ali', '03434444, 03454544, 0234334');
One option for querying such data is to use the instr function:
select * from person
where instr(phone_number, '03434444') > 0;
or like:
select * from person
where phone_number like '%' || '03434444' || '%';
or split it into rows:
select * from person a
where '03434444' in (select regexp_substr(b.phone_number, '[^,]+', 1, level)
from person b
where b.id_person = a.id_person
connect by level <= regexp_count(b.phone_number, ',') + 1
)
I'd do it my way, i.e. with a new table that contains only phone numbers.

Oracle Delete/Update in one query

I want to delete the duplicates from the table, update the unique identifier, and merge it with the already existing record.
I have a table which can contain the following records -
ID Name Req_qty
1001 ABC-02/01+Time 10
1001 ABC-03/01+Time 20
1001 ABC 30
1002 XYZ 40
1003 DEF-02/01+Time 10
1003 DEF-02/01+Time 20
And I am expecting the records after the operation as follows:
ID Name Req_Qty
1001 ABC 60
1002 XYZ 40
1003 DEF 30
Any assistance would be really helpful. Thanks!
It is possible to do this in a single SQL statement:
merge into (select rowid as rid, x.* from test_table x ) o
using ( select id
, regexp_substr(name, '^[[:alpha:]]+') as name
, sum(req_qty) as req_qty
, min(rowid) as rid
from test_table
group by id
, regexp_substr(name, '^[[:alpha:]]+')
) n
on (o.id = n.id)
when matched then
update
set o.name = n.name
, o.req_qty = n.req_qty
delete where o.rid > n.rid;
This uses a couple of tricks:
the delete clause of a merge statement only operates on rows that have been updated, which is why the update itself is left unrestricted.
you can't select rowid from a "view", so it is aliased as rid before updating.
by selecting the minimum rowid per ID we make an arbitrary choice about which row we're going to keep. We can then delete all the rows that have a "greater" rowid. If you have a primary key or any other column you'd prefer to use as a discriminator, just substitute that column for rowid (and ensure it's indexed if your table has any volume!), as sketched below.
Note that the regular expression differs from the other answer; it uses a caret (^) to anchor the search to the beginning of the string before looking for all alpha characters thereafter. This isn't required, as the default start position for REGEXP_SUBSTR() is the first character (it is 1-indexed), but it makes the intention clearer.
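For instance, a sketch of the same merge using a hypothetical primary key column pk as the discriminator instead of rowid (pk is an assumed column name; everything else mirrors the statement above):
-- Same technique, keeping the row with the smallest pk value (pk is hypothetical).
merge into test_table o
using ( select id
             , regexp_substr(name, '^[[:alpha:]]+') as name
             , sum(req_qty) as req_qty
             , min(pk) as pk
        from test_table
        group by id
               , regexp_substr(name, '^[[:alpha:]]+')
      ) n
on (o.id = n.id)
when matched then
update
set o.name = n.name
  , o.req_qty = n.req_qty
delete where o.pk > n.pk;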
In your case, you will need to update the records first and then delete the records that are not required, as follows:
UPDATE TABLE1 T
SET T.REQ_QTY = (
SELECT
SUM(TIN.REQ_QTY) AS REQ_QTY
FROM
TABLE1 TIN
WHERE TIN.ID = T.ID
)
WHERE (T.ROWID,1) IN
(SELECT TIN1.ROWID, ROW_NUMBER() OVER (PARTITION BY TIN1.ID ORDER BY TIN1.ROWID)
FROM TABLE1 TIN1); --TAKING RANDOM RECORD FOR EACH ID
DELETE FROM TABLE1 T
WHERE EXISTS (SELECT 1 FROM TABLE1 TIN
WHERE TIN.ID = T.ID AND TIN.REQ_QTY > T.REQ_QTY);
UPDATE TABLE1 SET NAME = regexp_substr(NAME,'[[:alpha:]]+');
--Update--
The following merge should work for you
MERGE INTO
(select rowid as rid, T.* from MY_TABLE1 T ) MT
USING
(
SELECT * FROM
(SELECT ID,
regexp_substr(NAME,'^[[:alpha:]]+') AS NAME_UPDATED,
SUM(Req_qty) OVER (PARTITION BY ID) AS Req_qty_SUM,
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY ROWID) AS RN,
ROWID AS RID
FROM MY_TABLE1) MT1
WHERE RN = 1
) mt1
ON (MT.ID = MT1.ID)
WHEN MATCHED THEN
UPDATE SET MT.NAME = MT1.NAME_UPDATED, MT.Req_qty = MT1.Req_qty_SUM
delete where (MT.RID <> MT1.RID);
Cheers!!

ORACLE Query to find value in other table based on dates

I have two tables: Table A has an ID and an Event Date, and Table B has an ID, a Description and an Event Date.
Not all IDs in Table A appear in Table B and some IDs appear multiple times in Table B with different Descriptions for each event.
The Description in Table B is an attribute that can change over time, the Event date in Table B is the date that a given ID's Description changes from its default value (kept in another table) to the new value.
I want to find the Description in Table B that matches the Event Date in Table A. So, for example, with my sample data, A1234 would return Green and A4567 would return Null.
I can't create tables here, so I need to be able to do this with a query.
This query will select the last description from before the event:
SELECT * FROM (
SELECT tabA.id, tabA.event_date, tabB.description,
ROW_NUMBER() OVER(PARTITION BY tabB.id ORDER BY tabB.event_date DESC) rn
FROM Table_A tabA
LEFT JOIN Table_B tabB ON tabA.id = tabB.id AND tabB.event_date <= tabA.event_date
) WHERE rn = 1
If I understand your need correctly, this could be a way:
select a.id, description
from tableA A
left join
(select id,
description,
event_date from_date,
lead(event_date) over (partition by id order by event_date) -1 as to_date
from tableB
) B
on (A.id = B.id and a.event_date between b.from_date and b.to_date)
The idea here is to evaluate, for each row in tableB, the range of dates for which that row, and its description, is valid; given this, a simple join should do the job.
You can left join tables like:
select a.ID , b1.DESCRIPTION
from TABLE_A a
left join TABLE_B b1 on a.ID = b1.id and a.EVENT_DATE > b1.EVENT_DATE
left join TABLE_B b2 on a.ID = b2.id and b1.EVENT_DATE < b2.EVENT_DATE and a.EVENT_DATE > b2.EVENT_DATE
where b1.id is null or b2.EVENT_DATE is null;

Oracle - Update rows with a min value in the group of a column from another table

I am facing this scenario: table Employee has a joining_date column. Table Booking has a booking_date column and a foreign key (employee_id) to Employee. Employee has some NULL values in its joining_date column. I want to fill them with the FIRST booking_date value of those employees. How can I do that?
FYI:
I can query with complex join statements to extract the first booking_date of employees whose joining_date is NULL as below:
SELECT emp.employee_id, emp.joining_date, temp2.booking_date FROM employee emp
LEFT JOIN (SELECT bo.employee_id, bo.booking_date FROM booking bo
INNER JOIN (SELECT employee_id, MIN(booking_date) mindate FROM booking GROUP BY employee_id) temp1
ON bo.employee_id = temp1.employee_id AND bo.booking_date = temp1.mindate) temp2
ON emp.employee_id = temp2.employee_id
WHERE emp.joining_date IS NULL;
But I'm struggling with putting this complex select into the update statement:
UPDATE employee emp
SET emp.joining_date = (SELECT ...)
WHERE emp.joining_date IS NULL;
Your select statement is more complex than it needs to be; you will get the same set this way:
SELECT emp.employee_id,min(bo.booking_date) booking_date
FROM employee emp
LEFT JOIN booking bo
ON bo.employee_id = emp.employee_id
WHERE emp.joining_date is NULL
GROUP BY emp.employee_id;
Your update can be done like this. Note that the "and exists" section is optional, but I tend to include it to make the intent of the query clearer.
UPDATE employee emp
SET emp.joining_date =
(SELECT min(booking_date) from booking bo where bo.employee_id = emp.employee_id)
WHERE emp.joining_date IS NULL
and exists(select * from booking bo where bo.employee_id = emp.employee_id);

identify duplicates as well as the matched unique record in oracle

Hi, I am running the following query to identify the duplicate records.
SELECT *
FROM unique2 P WHERE EXISTS(SELECT 1 FROM unique2 C
WHERE ( (C.surname) = (P.surname))
AND ( (C.postcode) = (P.postcode))
AND ((( (C.forename) IS NULL OR (P.forename) IS NULL)
AND (C.initials) = (P.initials))
OR (C.forename) = (P.forename))
AND ( (C.sex) = (P.sex)
OR (C.title) = (P.title))
AND (( (C.address1))=( (P.address1))
OR ( (C.address1))=( (P.address2))
OR ( (C.address2))=( (P.address1))
OR instr(C.address1_notrim, P.address1_notrim) > 0
OR instr(P.address1_notrim, C.address1_notrim) > 0)
AND C.rowid < P.rowid);
But with this query I can't identify the unique record id which is matched to the duplicate records. Is there a way to identify the duplicates as well as the unique record id (my table has a unique key) to which those duplicates are matched?
select id
from promolog
where (surname, postcode, dob) in (
select surname, postcode,dob
from (
select surname, postcode, dob, count(1)
from promolog
group by surname,postcode,dob
having count(1) > 1
)
)
You can also do this with analytic functions:
select id, num_of_ids, first_id, surname, postcode, dob
from (
select id,
count(*) over (partition by surname, postcode, dob) as num_of_ids,
first_value(id)
over (partition by surname, postcode, dob order by id) as first_id,
surname,
postcode,
dob
from promolog
)
where num_of_ids > 1;
Based on your update, I think you can just do a self-join, which you can make as complicated as you like:
select dup.*, master.id as duplicate_of
from promolog dup
join promolog master
on master.surname = dup.surname
and master.postcode = dup.postcode
and master.dob = dup.dob
... and <address checks etc. > ...
and master.rowid < dup.rowid;
But maybe I'm still missing something. As the name suggests, exists is for testing the existence of a matching record; if you want to retrieve any of the data from the matched record then you'll need to join to it at some point.
