SQLite: Best practices for using AUTOINCREMENT - performance

According to the official manual:
"The AUTOINCREMENT keyword imposes extra CPU, memory, disk space, and disk I/O overhead and should be avoided if not strictly needed. It is usually not needed."
So is it better not to use it? Do you have any benchmarks comparing the implicit rowid with AUTOINCREMENT?

As recommended in the documentation, it is better not to use AUTOINCREMENT unless you need to guarantee that the alias of the rowid (a.k.a. the id) is greater than any that has ever been allocated. However, in normal use this is a moot point: even without AUTOINCREMENT, a higher rowid/id will result until the rowid reaches 9223372036854775807.
If you do reach an id/rowid of 9223372036854775807 and AUTOINCREMENT is coded, then that's it: an SQLITE_FULL error is raised. Without AUTOINCREMENT, attempts are instead made to find an unused id/rowid.
AUTOINCREMENT adds a row (and the table itself, if required) to sqlite_sequence recording the highest allocated id. The difference is that with AUTOINCREMENT the sqlite_sequence table is consulted, whilst without it it isn't. So if the row with the highest id is deleted, AUTOINCREMENT takes the highest ever allocated id from sqlite_sequence (and uses the greater of that value and max(rowid)), whereas without AUTOINCREMENT the highest id currently in the table being inserted into (equivalent to max(rowid)) is used.
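A quick way to see that difference in practice is the following sketch, using Python's sqlite3 module (the schema is invented for illustration): delete the highest row, insert again, and compare the ids that each table assigns.

import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE plain (id INTEGER PRIMARY KEY, v TEXT)")
con.execute("CREATE TABLE auto  (id INTEGER PRIMARY KEY AUTOINCREMENT, v TEXT)")

for table in ("plain", "auto"):
    con.executemany(f"INSERT INTO {table} (v) VALUES (?)", [("a",), ("b",), ("c",)])
    con.execute(f"DELETE FROM {table} WHERE id = 3")       # remove the highest row
    con.execute(f"INSERT INTO {table} (v) VALUES ('d')")
    print(table, con.execute(f"SELECT max(id) FROM {table}").fetchone()[0])

# plain -> 3 (max(rowid) + 1, so the deleted id is reused)
# auto  -> 4 (sqlite_sequence remembers that 3 was already handed out)
print(con.execute("SELECT * FROM sqlite_sequence").fetchall())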
With limited testing, an overhead of 8-12% was found, as per What are the overheads of using AUTOINCREMENT for SQLite on Android?
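For a rough idea of the cost on your own setup, a minimal timing sketch along these lines (the table layout, row count, and in-memory database are arbitrary choices, not a rigorous benchmark) can be run with Python's sqlite3 module:

import sqlite3, time

def time_inserts(ddl, n=100_000):
    con = sqlite3.connect(":memory:")
    con.execute(ddl)
    start = time.perf_counter()
    with con:  # one transaction, so we measure the keyword rather than commit overhead
        con.executemany("INSERT INTO t (val) VALUES (?)", ((i,) for i in range(n)))
    elapsed = time.perf_counter() - start
    con.close()
    return elapsed

plain = time_inserts("CREATE TABLE t (id INTEGER PRIMARY KEY, val INT)")
auto = time_inserts("CREATE TABLE t (id INTEGER PRIMARY KEY AUTOINCREMENT, val INT)")
print(f"rowid alias: {plain:.3f}s  AUTOINCREMENT: {auto:.3f}s  "
      f"overhead: {100 * (auto - plain) / plain:.1f}%")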

I have tried SQLite autoincrement with Python 3 and SQLAlchemy 1.4.
Before enabling autoincrement on the Integer primary key id column, a single insert took less than 0.1 seconds. After enabling this feature, a single insert took more than 1.5 seconds.
The performance gap is big.
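For reference, this is roughly how the feature is switched on through SQLAlchemy 1.4's SQLite dialect, via the sqlite_autoincrement table option; the model and column names below are made up for illustration, and the keyword alone would not normally explain a 1.5-second insert, so per-row commit overhead is worth checking too.

from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Job(Base):
    __tablename__ = "job"
    # Without this table option the id is a plain INTEGER PRIMARY KEY (rowid alias).
    __table_args__ = {"sqlite_autoincrement": True}
    id = Column(Integer, primary_key=True)
    name = Column(String)

engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Job(name="backup"))
    session.commit()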

Related

How to Increment the ID value in Cassandra Table Automatically?

I have a challenge when inserting values into a Cassandra table. I have one column named "ID", and I want its values to increase automatically, like a MySQL auto_increment column. I think the Counter data type is not suitable in this scenario. Please can anyone help me design the schema? I don't want to use UUIDs to replace the ID column either.
In short, I don't believe it is possible. The nature of Cassandra is that it does not do a read before a write. There is only one exception, lightweight transactions, but all they do is a "compare and swap"; there is no way auto-increment can be implemented on the server side.
Even with counters you won't be able to achieve the desired result: if you increase the counter every time you add a record to the table, you will not know whether the current value (even if it is totally consistent) is the result of an increment from your process or from a concurrent process.
The only way is to implement this mechanism on the application side.
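Purely as an illustration of what such an application-side mechanism might look like (the keyspace, table, and column names here are invented), the sketch below uses the Python driver and a lightweight-transaction compare-and-swap on a single counter row; it serializes every allocation through that row and retries on contention, so it is slow compared with simply using uuids/timeuuids.

from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("my_keyspace")

# Assumed schema: CREATE TABLE id_allocator (name text PRIMARY KEY, next_id bigint)
session.execute(
    "INSERT INTO id_allocator (name, next_id) VALUES (%s, 0) IF NOT EXISTS",
    ("my_table",),
)

def next_id(name="my_table"):
    while True:
        current = session.execute(
            "SELECT next_id FROM id_allocator WHERE name = %s", (name,)
        ).one().next_id
        result = session.execute(
            "UPDATE id_allocator SET next_id = %s WHERE name = %s IF next_id = %s",
            (current + 1, name, current),
        )
        if result.was_applied:   # our compare-and-swap won, so current + 1 is ours
            return current + 1
        # another client updated the counter first; re-read and try again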

Can Oracle JDBC driver cache sequence values

Oracle sequence values can be cached on the database side through use of the 'Cache' option when creating sequences. For example
CREATE SEQUENCE sequence_name
CACHE 1000;
will cache up to 1000 values for performance.
My question is whether these values can also be cached in the Oracle JDBC driver.
In my application I want to pull back a range of sequence values but don't want to have to go back to the database for each new value. I know Hibernate has similar functionality but I've been unable to find out exactly how it's accomplished.
Any advice would be greatly appreciated.
Thanks,
Matt
No, you cannot reserve a batch of numbers for one session (if I understood correctly). Setting a suitable CACHE value would very likely make this acceptable from a performance perspective.
If you still insist, you can create similar functionality yourself so that you can reserve a whole range of numbers at once.
As mentioned by igr, it seems that the Oracle drivers cannot cache sequence values in the Java layer.
However, I have got round this by setting a large increment on the sequence and generating key values myself. The increment for a sequence can be set as follows:
CREATE SEQUENCE sequence_name
INCREMENT BY $increment;
In my application, every time sequence.nextval is executed I assume that the previous $increment values are reserved, and can be used as unique key values. This means that the database is hit once for every $increment key values that are generated.
So let's say, for example, that $increment=5000 and we have a starting sequence value of 1. When sequence.nextval is run for the first time, the sequence value is incremented to 5001, and I then assume that the values 2..5001 are reserved. Those 5000 values are used in the application (in my use case, as table primary keys); as soon as they are all used up, sequence.nextval is run again to reserve another 5000 values, and the process repeats.
The only real downside I can see to this approach is the tiny risk of someone running DDL to modify the increment between the time nextval is run and the time the $increment value is used to generate keys. Given that this is very unlikely, and in my case the DDL will not be changed at runtime, the solution was acceptable.
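A rough sketch of this scheme in Python (using python-oracledb; the sequence name, connection details, and block size are placeholders) might look like the following, with one NEXTVAL round trip reserving a block of $increment keys that are then handed out locally.

import oracledb

class BlockKeyGenerator:
    """Hands out keys from a block reserved by a single sequence NEXTVAL call."""

    def __init__(self, connection, sequence="sequence_name", increment=5000):
        self.conn = connection
        self.sequence = sequence
        self.increment = increment        # must match the sequence's INCREMENT BY
        self.next_key = None
        self.block_end = None

    def _reserve_block(self):
        with self.conn.cursor() as cur:
            cur.execute(f"SELECT {self.sequence}.NEXTVAL FROM dual")
            upper = cur.fetchone()[0]
        # The sequence was created with INCREMENT BY <increment>, so the values
        # (upper - increment, upper] are treated as reserved for this process.
        self.next_key = upper - self.increment + 1
        self.block_end = upper

    def next(self):
        if self.next_key is None or self.next_key > self.block_end:
            self._reserve_block()
        key = self.next_key
        self.next_key += 1
        return key

conn = oracledb.connect(user="scott", password="tiger", dsn="localhost/XEPDB1")
gen = BlockKeyGenerator(conn, sequence="sequence_name", increment=5000)
print(gen.next(), gen.next())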
I realise this doesn't directly answer the question I posed but hopefully it'll be useful to someone else.
Thanks,
Matt.

Why no primary key

I have inherited a database with tables that lack primary keys. It's an OLTP database. One of the tables in question has ~300k records and has no primary key implemented, even though examining the rest of the schema tells me one column is used as a primary key, i.e. it is replicated in another table with an identical name, etc. In other words, this is not an 'end of line' table.
This database also does not implement FKs.
My question is - is there ANY valid reason for a table (in Oracle for that matter) NOT to have a primary key?
I think a PK is mandatory in almost all cases. There are lots of reasons, but I'll cover some of them.
preventing the insertion of duplicate rows
rows will be referenced elsewhere, so the table must have a key for that
I have seen very few cases of tables without a PK (e.g. tables for logs).
Not specific to Oracle, but I recall reading about one such use case where MySQL was highly customized for a dam (electricity generation) project, I think. The input data from sensors arrived at on the order of 100-1000 records per second. They were using timestamps for each record, so they didn't need a primary key (like with the logs/logging mentioned in another answer here).
So good reasons would be:
Overhead, in the case of high frequency transactions
Necessity, or the lack of it, in that particular case
"Uniqueness" maintained or inferred by application, not by db
In a normalized table where every record needs to be unique and every field is referenced by other tables, a PK additionally adds index overhead, and (imho I disagree with this, but it's possible) the PK might never actually be used in any SQL query. Even then the table should still have a unique index encompassing all the fields.
Bad reasons are infinite :-)
The most frequent bad reason which is actually responsible for the lack of a primary key is when DBs are designed by application/code-developers with little or no DB experience, who want to (or think they should) handle all data constraints in the application.
Any valid reason? I'd say "No"--I'm a database guy--but there are places that insist on using the database as a dumb data store. They usually implement all integrity "constraints" in application code.
Putting integrity constraints into application code isn't usually done to improve performance. In fact, if you built one database that enforces all the known constraints, and you built another with functionally identical constraints only in application code, the first one would almost certainly run rings around the second one.
Instead, application-level constraints usually hope to increase flexibility. (And, in the process, some of the known constraints are usually dropped, which appears to improve performance.) If it becomes inconvenient to enforce certain constraints in order to bulk load some scruffy data, an application programmer can just side-step the application-level constraints for a little while, then clean up the data when it's more convenient.
I'm not a DB expert, but I remember a conversation with a friend who worked in the Oracle apps department, who told me that this was done to handle emergencies. If there was a problem in some report being generated that you could fix by putting in a row, DB-level constraints often stand in your way. They generally implemented things like unique primary keys in the application rather than in the database. It was inefficient but good enough for them, and much more manageable in a disaster recovery scenario.
You need a primary key to enforce uniqueness for a subset of its columns (useful if you need to refer to individual rows). It also speeds up certain queries because of the index associated to it.
If you do not need that index, or that uniqueness constraint, then you may not need a primary key (the index does not come free).
An example that comes to mind are logging tables, that just record some data (that is never updated or queried for individual records).
There is a small overhead when inserting into a table with an index, and you need an index if you have a primary key. The downside of not having one, of course, is that finding an individual row becomes very costly.

Primary Key Effect on Performance in SQLite

I have an sqlite database used to store information about backup jobs. Each run, it increases approximately 25mb as a result of adding around 32,000 entries to a particular table.
This table is a "map table" used to link certain info to records in another table... and it has a primary key (autoincrement int) that I don't use.
SQLite will store an INT column in 1, 2, 3, 4, 6, or 8 bytes depending on its value. This table only has 3 additional columns, also of INT type.
I've added indexes to the database on the columns that I use as filters (WHERE) in my queries.
In the presence of indexes, etc. and in the situation described, do primary keys have any useful benefit in terms of performance?
Note: Performance is very, very important to this project - but not if 10ms saved on a 32,000 entry job means an additional 10MB of data!
A primary key index is used to look up a row for a given primary key. It is also used to ensure that the primary key values are unique.
If you search your data using other columns, the primary key index will not be used, and as such will yield no performance benefit. Its mere existence should not have a negative performance impact either, though.
An unnecessary index wastes disk space, and makes INSERT and UPDATE statements execute slower. It should have no negative impact on query performance.
If you really don't use this id, why don't you drop the column and its primary key? The only reason to keep an unused primary key id column alive is to make it possible to create a master-detail relation with another table.
Another possibility is to keep the column but drop the primary key. That means the application has to take care of providing a unique id with every insert statement, and before and after each batch operation you have to check whether the column is still unique. This doesn't work in, for instance, MySQL and Oracle because of concurrency issues, but it does work in SQLite.
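A minimal sketch of that approach with Python's sqlite3 module (the file, table, and column names are invented to match the map-table description above): the application supplies the ids itself and verifies uniqueness after each batch.

import sqlite3

con = sqlite3.connect("backup_jobs.db")
con.execute("CREATE TABLE IF NOT EXISTS map (id INTEGER, job INT, file INT, flags INT)")

def insert_batch(rows):
    # rows: iterable of (job, file, flags) tuples
    start = con.execute("SELECT COALESCE(MAX(id), 0) FROM map").fetchone()[0] + 1
    with con:
        con.executemany(
            "INSERT INTO map (id, job, file, flags) VALUES (?, ?, ?, ?)",
            ((start + i, *row) for i, row in enumerate(rows)),
        )
    # post-batch sanity check: the id column must still be unique
    dupe = con.execute(
        "SELECT id FROM map GROUP BY id HAVING COUNT(*) > 1 LIMIT 1"
    ).fetchone()
    if dupe is not None:
        raise RuntimeError(f"duplicate id detected: {dupe[0]}")

insert_batch([(1, 101, 0), (1, 102, 0)])

Note that without an index on id, both the MAX(id) lookup and the duplicate check are full table scans, which is the trade-off this approach accepts.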

How does SCN_TO_TIMESTAMP work?

Does the SCN itself encode a timestamp, or is it a lookup from some table?
An AskTom post explains that the timestamp, accurate to +/-3 seconds, is stored in a raw field in smon_scn_time. Is that where the function gets the value?
If so, when is that table purged, if ever, and what triggers that purge?
If it is, does that make it impossible to translate old SCNs to timestamps?
If it's impossible, then that rules out any long-term uses of that field (read: auditing).
If I put that function in a query, would joining to that table be faster?
If so, does anyone know how to convert that RAW column?
The SCN does not encode a time value. I believe it is an autoincrementing number.
I would guess that SMON is inserting a row into SMON_SCN_TIME (or whatever table underlies it) every time it increments the SCN, including the current timestamp.
I queried for the minimum recorded timestamp in several databases and they all go back about 5 days and have a little under 1500 rows in the table. So it is less than the instance lifetime.
I imagine the lower bound on how long the data is kept might be determined by the DB_FLASHBACK_RETENTION_TARGET parameter, which defaults to 1 day.
I would recommend using the function, they've probably provided it so they can change the internals at will.
No idea what the RAW column TIM_SCN_MAP contains, but the TIME_DP and SCN columns would appear to give you the mapping.
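To make the comparison concrete, here is a rough python-oracledb sketch (connection details are placeholders, and SELECT access to V$DATABASE and SYS.SMON_SCN_TIME is assumed) that translates the current SCN both with the documented function and with a direct lookup of the mapping table.

import oracledb

conn = oracledb.connect(user="system", password="oracle", dsn="localhost/XEPDB1")
with conn.cursor() as cur:
    cur.execute("SELECT current_scn FROM v$database")
    scn = cur.fetchone()[0]

    # The supported route: let Oracle do the SCN -> timestamp translation.
    cur.execute("SELECT SCN_TO_TIMESTAMP(:scn) FROM dual", scn=scn)
    print("SCN_TO_TIMESTAMP:", cur.fetchone()[0])

    # Peek at the underlying mapping table; TIME_DP is the timestamp column.
    cur.execute(
        "SELECT scn, time_dp FROM sys.smon_scn_time "
        "WHERE scn <= :scn ORDER BY scn DESC",
        scn=scn,
    )
    print("smon_scn_time:", cur.fetchone())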
