Accelerate SQLite Query - performance

I'm currently learning SQLite (called from Python).
Following up on my previous question (Reorganising Data in SQLite), I want to store multiple time series (training data) in my database.
I have defined the following tables:
CREATE TABLE VARLIST
(
VarID INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT UNIQUE NOT NULL
);
CREATE TABLE DATAPOINTS
(
DataID INTEGER PRIMARY KEY,
timeID INTEGER,
VarID INTEGER,
value REAL
);
CREATE TABLE TIMESTAMPS
(
timeID INTEGER PRIMARY KEY AUTOINCREMENT,
TRAININGS_ID INT,
TRAINING_TIME_SECONDS FLOAT
);
VARLIST has 8 entries, TIMESTAMPS 1e5 entries and DATAPOINTS around 5e6.
To extract the data for a given TRAININGS_ID and VarID, I currently use:
SELECT
(SELECT TIMESTAMPS.TRAINING_TIME_SECONDS
FROM TIMESTAMPS
WHERE t.timeID = timeID) AS TRAINING_TIME_SECONDS,
(SELECT value
FROM DATAPOINTS
WHERE DATAPOINTS.timeID = t.timeID and DATAPOINTS.VarID = 2) as value
FROM
(SELECT timeID
FROM TIMESTAMPS
WHERE TRAININGS_ID = 96) as t;
The command EXPLAIN QUERY PLAN delivers:
0|0|0|SCAN TABLE TIMESTAMPS
0|0|0|EXECUTE CORRELATED SCALAR SUBQUERY 1
1|0|0|SEARCH TABLE TIMESTAMPS USING INTEGER PRIMARY KEY (rowid=?)
0|0|0|EXECUTE CORRELATED SCALAR SUBQUERY 2
2|0|0|SCAN TABLE DATAPOINTS
This basically works, but there are two problems:
Minor problem: if there is a timeID for which no data for the requested VarID is available, I get a row whose value is None.
I would prefer this row to be skipped.
Big problem: the search is incredibly slow (approx. 5 minutes using http://sqlitebrowser.org/).
How do I best improve the performance?
Are there better ways to formulate the SELECT command, or should I modify the database structure itself?

OK, based on the hints I received, I was able to speed up the search enormously by adding these indexes:
CREATE INDEX IF NOT EXISTS DP_Index on DATAPOINTS (VarID,timeID,DataID);
CREATE INDEX IF NOT EXISTS TS_Index on TIMESTAMPS(TRAININGS_ID,timeID);
The EXPLAIN QUERY PLAN output now reads as:
0|0|0|SEARCH TABLE TIMESTAMPS USING COVERING INDEX TS_Index (TRAININGS_ID=?)
0|0|0|EXECUTE CORRELATED SCALAR SUBQUERY 1
1|0|0|SEARCH TABLE TIMESTAMPS USING INTEGER PRIMARY KEY (rowid=?)
0|0|0|EXECUTE CORRELATED SCALAR SUBQUERY 2
2|0|0|SEARCH TABLE DATAPOINTS USING INDEX DP_Index (VarID=? AND timeID=?)
Thanks for your comments.
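The minor problem (rows with a None value) can also be addressed by rewriting the correlated subqueries as a join. The following is only a minimal sketch, assuming the schema and indexes shown above; the sample rows are made up for illustration, and whether the join is faster on the real data should be checked with EXPLAIN QUERY PLAN. An inner join returns a row only when a matching DATAPOINTS entry exists, so timestamps without a value for the requested VarID are skipped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TIMESTAMPS (
    timeID INTEGER PRIMARY KEY AUTOINCREMENT,
    TRAININGS_ID INT,
    TRAINING_TIME_SECONDS FLOAT
);
CREATE TABLE DATAPOINTS (
    DataID INTEGER PRIMARY KEY,
    timeID INTEGER,
    VarID INTEGER,
    value REAL
);
CREATE INDEX DP_Index ON DATAPOINTS (VarID, timeID, DataID);
CREATE INDEX TS_Index ON TIMESTAMPS (TRAININGS_ID, timeID);
""")

# Made-up sample rows: training 96 has three timestamps,
# but VarID 2 has no value at the second one (timeID 2).
conn.executemany(
    "INSERT INTO TIMESTAMPS (timeID, TRAININGS_ID, TRAINING_TIME_SECONDS) "
    "VALUES (?, ?, ?)",
    [(1, 96, 0.0), (2, 96, 1.0), (3, 96, 2.0), (4, 97, 0.0)],
)
conn.executemany(
    "INSERT INTO DATAPOINTS (timeID, VarID, value) VALUES (?, ?, ?)",
    [(1, 2, 10.5), (1, 3, 99.0), (3, 2, 11.5), (4, 2, 50.0)],
)

# Inner join: only timestamps that have a value for the requested VarID appear.
query = """
SELECT t.TRAINING_TIME_SECONDS, d.value
FROM TIMESTAMPS AS t
JOIN DATAPOINTS AS d ON d.timeID = t.timeID
WHERE t.TRAININGS_ID = ? AND d.VarID = ?
ORDER BY t.timeID
"""
rows = conn.execute(query, (96, 2)).fetchall()
print(rows)
```

The timestamp with timeID 2 does not appear in the result because it has no DATAPOINTS row for VarID 2.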

Related

Find the best way to traverse an Oracle table

I have an Oracle table. Its DDL is as follows (there is no primary key):
create table CLIENT_ACCOUNT
(
CLIENT_ID VARCHAR2(18) default ' ' not null,
ACCOUNT_ID VARCHAR2(18) default ' ' not null,
......
)
create unique index UK_ACCOUNT
on CLIENT_ACCOUNT (CLIENT_ID, ACCOUNT_ID)
The table is very large, maybe 100M records. I want to traverse the whole table in batches.
Currently I am trying to use the table's index for the batch traversal, but I have run into a problem with Oracle syntax.
-- I want to use this SQL, but it raises a syntax error.
-- It tries to use the B-tree index to locate the start position, but it does not work.
select * from CLIENT_ACCOUNT
WHERE (CLIENT_ID, ACCOUNT_ID) > (1,2)
AND ROWNUM < 1000
ORDER BY CLIENT_ID, ACCOUNT_ID
What is the fastest way to traverse the table data in batches?
Wild guess:
select * from CLIENT_ACCOUNT
WHERE CLIENT_ID > '1'
and ACCOUNT_ID > '2'
AND ROWNUM < 1000;
It would at least compile, although whether it correctly implements your business logic is a different matter. Note that I have cast your filter criteria to strings. This is because your columns have a string datatype and you are defaulting them to spaces, so there's a high probability those columns contain non-numeric values.
If this doesn't solve your problem, please edit your question with more details; sample input data and expected output is always helpful in these situations.
Your data model seems odd.
Your columns are defined as varchar2. So why are your criteria numeric?
Also, why do you default the key columns to space? It would be better to leave unpopulated values as null. (To be clear, NULL is not a good thing in an indexed column, it's just better than a space.)
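The row comparison (CLIENT_ID, ACCOUNT_ID) > (x, y) that the question is reaching for can be expanded into plain predicates: CLIENT_ID > x OR (CLIENT_ID = x AND ACCOUNT_ID > y). Note that this is not the same as CLIENT_ID > x AND ACCOUNT_ID > y, which skips rows. The following is a rough sketch of batch traversal built on that expansion. It uses SQLite through Python as a stand-in, so the table contents, the helper names fetch_batch and traverse, and the LIMIT clause are assumptions for illustration. In Oracle the row limit has to be written differently, for example with ROWNUM applied outside an ordered subquery, because ROWNUM is assigned before ORDER BY is evaluated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE CLIENT_ACCOUNT (
    CLIENT_ID TEXT NOT NULL,
    ACCOUNT_ID TEXT NOT NULL,
    UNIQUE (CLIENT_ID, ACCOUNT_ID)
)""")
# Made-up sample keys: every combination of '1'..'3' for both columns.
conn.executemany(
    "INSERT INTO CLIENT_ACCOUNT VALUES (?, ?)",
    [(c, a) for c in ("1", "2", "3") for a in ("1", "2", "3")],
)


def fetch_batch(conn, last_client, last_account, batch_size):
    """Return the next keys strictly after (last_client, last_account)."""
    return conn.execute(
        """
        SELECT CLIENT_ID, ACCOUNT_ID
        FROM CLIENT_ACCOUNT
        WHERE CLIENT_ID > ?
           OR (CLIENT_ID = ? AND ACCOUNT_ID > ?)
        ORDER BY CLIENT_ID, ACCOUNT_ID
        LIMIT ?
        """,
        (last_client, last_client, last_account, batch_size),
    ).fetchall()


def traverse(conn, batch_size):
    """Walk the whole table in key order, one batch at a time."""
    # Assumption: no real key is the pair of empty strings, so ("", "")
    # sorts before every row and can serve as the start position.
    last = ("", "")
    batches = []
    while True:
        batch = fetch_batch(conn, last[0], last[1], batch_size)
        if not batch:
            break
        batches.append(batch)
        last = batch[-1]  # resume after the last key seen
    return batches


batches = traverse(conn, 4)
print([len(b) for b in batches])
```

Each batch resumes after the last key of the previous one, so an index on (CLIENT_ID, ACCOUNT_ID) can be used to locate the start position instead of re-reading earlier rows.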

Create a generic DB table

I have multiple products, and each of them has its own Product table and Value table. Now I have to create a generic screen to validate those products, and I don't want to create a validation table for each product. I want to create one generic table that holds the details of all products plus one extra column called ProductIdentifier. The problem is that this generic table may end up holding millions of records, so fetching the data will take time.
Is there a better solution?
"Millions of records" sounds like a VLDB problem. I'd put the data into a partitioned table:
CREATE TABLE myproducts (
productIdentifier NUMBER,
value1 VARCHAR2(30),
value2 DATE
) PARTITION BY LIST (productIdentifier)
( PARTITION p1 VALUES (1),
PARTITION p2 VALUES (2),
PARTITION p5to9 VALUES (5,6,7,8,9)
);
For queries that are dealing with only one product, specify the partition:
SELECT * FROM myproducts PARTITION FOR (9);
For your general report, just omit the partition and you get all numbers:
SELECT * FROM myproducts;
Documentation is here:
https://docs.oracle.com/en/database/oracle/oracle-database/12.2/vldbg/toc.htm

WITH Clause performance issue in Oracle 11g

Table myfirst3 has 4 columns and 1.2 million records.
Table mtl_object_genealogy has over 10 million records.
Running the code below takes a very long time. How can I tune this code that uses the WITH clause?
WITH level1 as (
SELECT mln_parent.lot_number,
mln_parent.inventory_item_id,
gen.lot_num ,--fg_lot,
gen.segment1,
gen.rcv_date
FROM mtl_lot_numbers mln_parent,
(SELECT MOG1.parent_object_id,
p.segment1,
p.lot_num,
p.rcv_date
FROM mtl_object_genealogy MOG1 ,
myfirst3 p
START WITH MOG1.object_id = p.gen_object_id
AND (MOG1.end_date_active IS NULL OR MOG1.end_date_active > SYSDATE)
CONNECT BY nocycle PRIOR MOG1.parent_object_id = MOG1.object_id
AND (MOG1.end_date_active IS NULL OR MOG1.end_date_active > SYSDATE)
UNION all
SELECT p1.gen_object_id,
p1.segment1,
p1.lot_num,
p1.rcv_date
FROM myfirst3 p1 ) gen
WHERE mln_parent.gen_object_id = gen.parent_object_id )
select /*+ NO_CPU_COSTING */ *
from level1;
execution plan
CREATE TABLE APPS.MYFIRST3
(
TO_ORGANIZATION_ID NUMBER,
LOT_NUM VARCHAR2(80 BYTE),
ITEM_ID NUMBER,
FROM_ORGANIZATION_ID NUMBER,
GEN_OBJECT_ID NUMBER,
SEGMENT1 VARCHAR2(40 BYTE),
RCV_DATE DATE
);
CREATE TABLE INV.MTL_OBJECT_GENEALOGY
(
OBJECT_ID NUMBER NOT NULL,
OBJECT_TYPE NUMBER NOT NULL,
PARENT_OBJECT_ID NUMBER NOT NULL,
START_DATE_ACTIVE DATE NOT NULL,
END_DATE_ACTIVE DATE,
GENEALOGY_ORIGIN NUMBER,
ORIGIN_TXN_ID NUMBER,
GENEALOGY_TYPE NUMBER,
);
CREATE INDEX INV.MTL_OBJECT_GENEALOGY_N1 ON INV.MTL_OBJECT_GENEALOGY(OBJECT_ID);
CREATE INDEX INV.MTL_OBJECT_GENEALOGY_N2 ON INV.MTL_OBJECT_GENEALOGY(PARENT_OBJECT_ID);
Your explain plan shows some very big numbers. The optimizer reckons the final result set will be about 3,227,000,000,000 rows. Just returning that many rows will take some time.
All table accesses are full table scans. As you have big tables, that will eat time too.
As for improvements, it's pretty hard for us to understand the logic of your query. This is your data model, your business rules, your data. You haven't explained anything, so all we can do is guess.
Why are you using the WITH clause? You only use the level1 result set once, so just have a regular FROM clause.
Why are you using UNION ALL? That operation just duplicates the records retrieved from myfirst3 (all those values are already included as rows where MOG1.object_id = p.gen_object_id).
The MERGE JOIN CARTESIAN operation is interesting. Oracle uses it to implement transitive closure. It is an expensive operation, but that's because tree-walking a hierarchy is an expensive thing to do. It is unfortunate for you that you are generating all the parent-child relationships for a table with 27 million records. That's bad.
The full table scans aren't the problem. There are no filters on myfirst3, so obviously the database has to get all the records. If there is one parent for each myfirst3 record, that's 10% of the contents of mtl_object_genealogy, so a full table scan would be efficient; but you're rolling up the entire hierarchy, so you're actually looking at a much greater chunk of the table.
Your indexes are irrelevant in the face of such numbers. What might help is a composite index on mtl_object_genealogy(OBJECT_ID, PARENT_OBJECT_ID, END_DATE_ACTIVE).
You want all the levels of PARENT_OBJECT_ID for the records in myfirst3. If you run this query often and mtl_object_genealogy is a slowly changing table you should consider materializing the transitive closure into a table which just has records for all the permutations of leaf records and parents.
To sum up:
Ditch the WITH clause
Drop the UNION ALL
Tune the tree-walk with a composite index (or materializing it)
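To illustrate what "materializing the transitive closure" could look like, here is a minimal sketch. It uses SQLite's recursive CTE through Python as a stand-in for Oracle's CONNECT BY; the table names genealogy, leaves and leaf_ancestors, the sample hierarchy, and the omission of the end_date_active filter are all simplifying assumptions, not the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE genealogy (
    object_id INTEGER NOT NULL,
    parent_object_id INTEGER NOT NULL
);
CREATE TABLE leaves (gen_object_id INTEGER NOT NULL);
""")

# Made-up hierarchy (child, parent): 1 is a child of 2, 2 of 3, and 4 of 3.
conn.executemany("INSERT INTO genealogy VALUES (?, ?)", [(1, 2), (2, 3), (4, 3)])
conn.executemany("INSERT INTO leaves VALUES (?)", [(1,), (4,)])

# Walk upwards from every leaf once and store each (leaf, ancestor) pair.
# UNION (rather than UNION ALL) drops pairs that were already produced,
# which also keeps the recursion finite if the data contains a cycle.
conn.executescript("""
CREATE TABLE leaf_ancestors AS
WITH RECURSIVE walk(leaf_id, ancestor_id) AS (
    SELECT l.gen_object_id, g.parent_object_id
    FROM leaves AS l
    JOIN genealogy AS g ON g.object_id = l.gen_object_id
    UNION
    SELECT w.leaf_id, g.parent_object_id
    FROM walk AS w
    JOIN genealogy AS g ON g.object_id = w.ancestor_id
)
SELECT leaf_id, ancestor_id FROM walk;

CREATE INDEX leaf_ancestors_idx ON leaf_ancestors (leaf_id, ancestor_id);
""")

closure = conn.execute(
    "SELECT leaf_id, ancestor_id FROM leaf_ancestors "
    "ORDER BY leaf_id, ancestor_id"
).fetchall()
print(closure)
```

Later queries can then join against the stored pairs instead of repeating the tree walk. The stored table has to be refreshed whenever the genealogy changes, which is why this only pays off for a slowly changing table.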

Expensive subquery tuning with SQLite

I'm working on a small media/file management utility using SQLite for its persistent storage needs. I have a table of files:
CREATE TABLE file
( file_id INTEGER PRIMARY KEY AUTOINCREMENT
, file_sha1 BINARY(20)
, file_name TEXT NOT NULL UNIQUE
, file_size INTEGER NOT NULL
, file_mime TEXT NOT NULL
, file_add_time TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL
);
And also a table of albums
CREATE TABLE album
( album_id INTEGER PRIMARY KEY AUTOINCREMENT
, album_name TEXT
, album_poster INTEGER
, album_created TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL
, FOREIGN KEY (album_poster) REFERENCES file(file_id)
);
to which files can be assigned
CREATE TABLE album_file
( album_id INTEGER NOT NULL
, file_id INTEGER NOT NULL
, PRIMARY KEY (album_id, file_id)
, FOREIGN KEY (album_id) REFERENCES album(album_id)
, FOREIGN KEY (file_id) REFERENCES file(file_id)
);
CREATE INDEX file_to_album ON album_file(file_id, album_id);
Part of the functionality is to list albums, exposing
the album id,
the album's name,
a poster image for that album and
the number of files in the album
which currently uses this query:
SELECT a.album_id, a.album_name,
COALESCE(
a.album_poster,
(SELECT file_id FROM file
NATURAL JOIN album_file af
WHERE af.album_id = a.album_id
ORDER BY file.file_name LIMIT 1)),
(SELECT COUNT(file_id) AS file_count
FROM album_file WHERE album_id = a.album_id)
FROM album a
ORDER BY album_name ASC
The only "tricky" part of that query is that the album_poster column may be null, in which case the COALESCE expression falls back to the first file in the album as the "default poster".
With currently ~260000 files, ~2600 albums and ~250000 entries in the album_file table, this query takes over 10 seconds, which makes for a not-so-great user experience. Here's the query plan:
0|0|0|SCAN TABLE album AS a
0|0|0|EXECUTE CORRELATED SCALAR SUBQUERY 1
1|0|1|SEARCH TABLE album_file AS af USING COVERING INDEX album_to_file (album_id=?)
1|1|0|SEARCH TABLE file USING INTEGER PRIMARY KEY (rowid=?)
1|0|0|USE TEMP B-TREE FOR ORDER BY
0|0|0|EXECUTE CORRELATED SCALAR SUBQUERY 2
2|0|0|SEARCH TABLE album_file USING COVERING INDEX album_to_file (album_id=?)
Replacing the COALESCE statement with just a.album_poster, sacrificing the auto-poster functionality, brings the query time down to a few milliseconds:
0|0|0|SCAN TABLE album AS a
0|0|0|EXECUTE CORRELATED SCALAR SUBQUERY 1
1|0|0|SEARCH TABLE album_file USING COVERING INDEX album_to_file (album_id=?)
0|0|0|USE TEMP B-TREE FOR ORDER BY
What I don't understand is that limiting the album listing to 1 or 1000 rows makes no difference. It seems SQLite is doing the expensive sub-query for the "default" poster on all albums, only to throw away most of the results when finally cutting down the result set to the LIMITs specified with the query.
Is there something I can do to make the original query substantially faster, especially given that I'm usually only querying a small subset (using LIMIT) of all rows for display?
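One thing that might be worth trying is to apply the ORDER BY and LIMIT to the album table in a subquery first, so that the correlated subqueries only need to run for the albums on the requested page. The sketch below is an untested idea on the real data, not a confirmed fix: it uses a trimmed-down version of the schema and made-up rows, and whether it actually helps depends on the SQLite version and its query planner, so it should be checked with EXPLAIN QUERY PLAN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE file (file_id INTEGER PRIMARY KEY, file_name TEXT NOT NULL UNIQUE);
CREATE TABLE album (album_id INTEGER PRIMARY KEY, album_name TEXT,
                    album_poster INTEGER);
CREATE TABLE album_file (album_id INTEGER NOT NULL, file_id INTEGER NOT NULL,
                         PRIMARY KEY (album_id, file_id));
""")
# Made-up sample data: "Zoo" has two files and no poster, "Alps" has an
# explicit poster, "Empty" has neither files nor a poster.
conn.executemany("INSERT INTO file VALUES (?, ?)",
                 [(1, "b.jpg"), (2, "a.jpg"), (3, "c.jpg")])
conn.executemany("INSERT INTO album VALUES (?, ?, ?)",
                 [(1, "Zoo", None), (2, "Alps", 3), (3, "Empty", None)])
conn.executemany("INSERT INTO album_file VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 3)])

# The inner query picks the page of albums; the expensive correlated
# subqueries in the outer SELECT are written against that page only.
query = """
SELECT a.album_id, a.album_name,
       COALESCE(a.album_poster,
                (SELECT f.file_id
                 FROM album_file AS af
                 JOIN file AS f ON f.file_id = af.file_id
                 WHERE af.album_id = a.album_id
                 ORDER BY f.file_name LIMIT 1)) AS poster,
       (SELECT COUNT(*) FROM album_file AS af2
        WHERE af2.album_id = a.album_id) AS file_count
FROM (SELECT album_id, album_name, album_poster
      FROM album
      ORDER BY album_name ASC
      LIMIT ? OFFSET ?) AS a
ORDER BY a.album_name ASC
"""
page = conn.execute(query, (2, 0)).fetchall()
everything = conn.execute(query, (100, 0)).fetchall()
print(page)
```

The result columns are the same as in the original query; only the place where LIMIT is applied changes.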

Define index for sparse column

I have a table with columns 'A' and 'B'.
In 'A', 90% of the values are null and 10% are various other values; most of the time I query for records with one or two of those other values.
In 'B', 90% of the values are '1' and 10% are various other values; most of the time I query for records with one or two of those other values.
This table is subject to DML transactions most of the time.
I don't know whether defining indexes on these columns is a good idea. If it is, which type of index should I use?
In principle, a bitmap index would be the best fit for such a situation. However, bitmap indexes are not suitable for a multi-user environment with frequent DML: you would slow down your application significantly through locking and might even get deadlocks.
Maybe you can optimize your application by smart partitioning and the use of partial indexes (a new feature in Oracle 12c).
The two CREATE TABLE statements below should be equivalent.
CREATE TABLE YOUR_TABLE (a INTEGER, b INTEGER, ... more COLUMNS)
PARTITION BY LIST (a) SUBPARTITION BY LIST (b) (
PARTITION part_a_NULL VALUES (NULL) (
SUBPARTITION part_a_NULL_b_1 VALUES (1) INDEXING OFF,
SUBPARTITION part_a_NULL_b_other VALUES (DEFAULT) INDEXING ON
),
PARTITION part_a_others VALUES (DEFAULT) (
SUBPARTITION part_a_others_b_1 VALUES (1) INDEXING OFF,
SUBPARTITION part_a_others_b_other VALUES (DEFAULT) INDEXING ON
)
);
CREATE TABLE YOUR_TABLE (a INTEGER, b INTEGER, ... more COLUMNS)
PARTITION BY LIST (a) SUBPARTITION BY LIST (b)
SUBPARTITION TEMPLATE (
SUBPARTITION b_1 VALUES (1) INDEXING OFF,
SUBPARTITION b_other VALUES (DEFAULT) INDEXING ON
)
(
PARTITION part_a_NULL VALUES (NULL),
PARTITION part_a_others VALUES (DEFAULT)
);
CREATE INDEX IND_A ON YOUR_TABLE (A) LOCAL INDEXING PARTIAL;
CREATE INDEX IND_B ON YOUR_TABLE (B) LOCAL INDEXING PARTIAL;
This way, the indexes cover only about 10% of the table's rows. If your WHERE condition is WHERE A IS NULL or WHERE B = 1, the Oracle optimizer would skip such indexes anyway.
Use this query to verify that INDEXING is set on the desired subpartitions:
SELECT table_name, partition_name, subpartition_name, indexing
FROM USER_TAB_SUBPARTITIONS
WHERE table_name = 'YOUR_TABLE';
Update
I just realized that this is overkill, because NULL values in column A do not create any index entries anyway. So it can be simplified to
CREATE TABLE YOUR_TABLE (a INTEGER, b INTEGER, ... more COLUMNS)
PARTITION BY LIST (b) (
PARTITION part_b_1 VALUES (1) INDEXING OFF,
PARTITION part_b_other VALUES (DEFAULT) INDEXING ON
);
For example, if you have index a_b_idx on A, B (in that order):
a) select ... from ... where A = ... will use index
b) select ... from ... where B = ... will not use index
On the other side, if you have index b_a_idx on B, A:
a) select ... from ... where A = ... will not use index
b) select ... from ... where B = ... will use index
Oracle generally can't use the second column of an index if the query doesn't filter on the first column, since in the regular case an index is a tree-like structure: column1->column2->column3->etc.
You need index on column A only or on columns A, B if you do queries like a).
You need index on column B only or on columns B, A if you do queries like b).
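The effect of column order can be observed with a small experiment. The sketch below uses SQLite through Python as a stand-in, so the table t, its extra column c, and the helper plan are assumptions for illustration. Oracle's optimizer is a different piece of software and has options of its own (such as an index skip scan), so treat this as a demonstration of the general principle rather than of Oracle's exact behavior.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (a INTEGER, b INTEGER, c TEXT);
CREATE INDEX a_b_idx ON t (a, b);
""")


def plan(sql):
    """Return the detail text of EXPLAIN QUERY PLAN for one statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


# Filter on the leading index column: the index can be searched.
plan_a = plan("SELECT * FROM t WHERE a = 5")
# Filter on the second index column only: the index is of no use here.
plan_b = plan("SELECT * FROM t WHERE b = 5")

print(plan_a)
print(plan_b)
```

The first plan mentions a_b_idx, while the second one falls back to scanning the table.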
Oracle doesn't store entries whose indexed columns are all null, but it can store a null value for A if B contains a non-null value.
Sometimes it's more efficient to read the whole table and ignore the index. The optimizer can choose this when the expected result set is big and covers most of the records, since going from an index entry to its record costs more than simply reading the records.
This also sometimes happens erroneously for tables without statistics, so you either need jobs that run analyze table ... compute statistics (or DBMS_STATS), or Oracle 11+, which can gather such statistics without your own jobs.
Most of the time, another index is a good thing for queries but a bad thing for updates and disk usage. Each index takes disk space, and each update of a record causes updates to every affected index. So for heavily updated tables it's not good to have many indexes, but for frequently queried tables it's better to have indexes covering all common cases.
For most flat queries (without joins/subqueries/hierarchy) only one index is used, so having an index on each column is generally just a waste of disk space. You need a multicolumn index to optimize where A=... and B=...
As for the index type, you probably need simple non-unique indexes.
Column A
Let's assume that you create an index named _columnA_index_. In Oracle, a B-tree index does not include entries whose indexed columns are all NULL, which means there are no index entries in _columnA_index_ pointing to records having NULL values. Thus, the following query
Q1: select * from MyTable where A is null;
will result in a table scan instead (or the DBMS opts to use an index on another column, if there is one).
However, since 10% of the records have 'different values', the _columnA_index_ will of course help for queries such as
Q2: select * from MyTable where A = '123';
In the above example, if the query returns < 1% of the records, the _columnA_index_ is helpful. Depending on how selective the query is, the index greatly improves the performance. You can create an index that is suitable for datatype of column A.
Column B
Similarly, an index on B will not help with the following query, because about 90% of the rows match:
Q3: select * from MyTable where B = 1;
but it will help with the other values:
Q4: select * from MyTable where B = '456';
NULL values
So far, I have said that an index does not help with NULL values. Therefore, if Q1 is the query you run most of the time, I suggest the following ideas:
Check whether your version of the DBMS supports including NULL values in indexes. For example, Oracle 11g does, but versions before that do not.
Consider creating a function-based index, again with Oracle. On another DBMS you can at least borrow the idea.
Redesign your application logic so that it does not need to query on NULL values. I prefer this approach.

Resources