Which is better: single global sequence vs. sequence per table? - oracle

In a JPA/Hibernate application using native Oracle sequences, is it better to use a single global sequence like hibernate_sequence, or to define a separate sequence per table?
It seems like a single sequence is easier to maintain, but it could make troubleshooting and ad hoc queries harder because the IDs grow longer.
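For concreteness, this is roughly how the two options look as JPA mappings (the entity, generator, and sequence names below are just placeholders):
import javax.persistence.*;

// Option 1: every entity shares one global sequence.
@Entity
class Customer {
    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "global_gen")
    @SequenceGenerator(name = "global_gen", sequenceName = "hibernate_sequence")
    private Long id;
}

// Option 2: each entity has its own dedicated sequence.
@Entity
class Invoice {
    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "invoice_gen")
    @SequenceGenerator(name = "invoice_gen", sequenceName = "invoice_seq")
    private Long id;
}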

Although caching alleviates it, a sequence can cause contention when multiple sessions request nextval at the same time.
If you have one sequence serving all tables then all inserts on all tables will contend for the same sequence. If you are after performance, one sequence for each table will give less contention.
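As a minimal sketch (the sequence name, cache size, and entity are made up), a per-table sequence created with a generous CACHE spreads the load and keeps inserts on unrelated tables from fighting over one counter:
import javax.persistence.*;

// Assumed DDL for the dedicated sequence; the CACHE clause is what
// alleviates the nextval contention described above:
//   CREATE SEQUENCE orders_seq CACHE 100;
@Entity
class Orders {
    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "orders_gen")
    @SequenceGenerator(name = "orders_gen", sequenceName = "orders_seq",
                       allocationSize = 1) // matches a plain INCREMENT BY 1 sequence
    private Long id;
}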

A single sequence means no matching IDs in two tables. In practice I like this because you never get a hit when you accidentally query the wrong table. Particularly useful with deletes. It is a little thing, but I find it useful.

I would recommend you use a sequence per table. It is just a little cleaner in my book. The standard at my current place of employment is sequence per table.

Related

Does it matter if I don't stress test Oracle 11g with random data

We need to stress test our Oracle database with about 5 million row inserts. According to our DBA, the only columns that need to be different are the Primary or foreign key...all other columns can be the same. He said if we do that, then Oracle will not do any sort of caching when inserting the data.
I just want to make sure that he is right and that by doing this, the stress testing results would be nearly as accurate as using random data. Thank you for your help.
In a very narrow set of circumstances, the DBA is correct. If ALL your queries are lookups based upon primary and foreign keys, then he may be right. In the past, when the rule-based optimizer was king, the data didn't matter so much. Record counts, yes, but not really the data.
In the real world, though, this is not the case. Do you have any other indexes? Then the data matters. Do you join against things other than primary/foreign keys? Then the data matters. Are your strings all 1 byte or null? I doubt it, and the size of these variable-length fields may affect the amount of IO. Basically, for any non-trivial schema in a non-trivial application, having "realistic" data can be significant. The Oracle optimizer takes into account a large variety of statistics when determining how to perform a query.
Are you REALLY only doing inserts in this load test? That's kinda silly. 5 million records is chump change by modern standards. Desktops do that in seconds, typically. Even simple applications will perform some select to do a lookup, or get a set of records based upon a non-key value.
You seem to be smart enough to evaluate the DBA's statement. If you can get him to put that in writing, sign off on it, and have the responsibility fall on him when his idea of a load test doesn't work as expected, then that's great. It sounds like you're the one responsible for this test, though.
If I were in your shoes, I would want to load test with the most accurate data possible. Copying from a production system or a known test set of data is a much better option than "random", and light-years better than the "nulls except for the primary key" approach.

How to design database to store and retrieve large item/skill lists in ruby

I plan a role playing game where characters are supposed to carry/use items and train skills. When it comes to store (possibly numerous) items/skills possessed by characters, I can't think of a better way than putting a row for every possible item and skill to each character instantiated. However this seems to be an overkill to me.
To be clear, if this would be an exercise or a small game where total number of items/skills is ~30, I would add an items and a skills hash to the character class and methods to add and remove them like:
def initialize
  @inventory = Hash.new(0) # item => count; missing keys default to 0
  @skills = Hash.new(0)    # skill => level
end

def add_item(item, number)
  @inventory[item] += number
end
Given that I would like to store the number of each item and the level of each skill, what else can I try in order to handle ~1000 possible items (with ~150 of them in an inventory) and possibly 100 skills?
Plan for Data Retrieval
Generally, it's a good idea to design your database around how you plan to look up and retrieve your data, rather than how you want to store it. A bad design makes your data very expensive to collect from the database.
In your example, having a separate model for each inventory item or skill would be hugely expensive in terms of lookups whenever you want to load a character. Do you really want to do 1,000 lookups every time you load someone's inventory? Probably not.
Denormalize for Speed
You typically want to normalize data that needs to be consistent, and denormalize data that needs to be retrieved/updated quickly. One option might be to serialize your character attributes.
For example, it should be faster to store a serialized Character#inventory_items field than to update 100 separate records through a has_many :through or has_and_belongs_to_many relationship. There are certainly trade-offs involved with denormalization in general and serialization in particular, but it might be a good fit for your specific use case.
Consider a Document Database
Character sheets are documents. Unless you need the relational power of a SQL database, a document-oriented database might be a better fit for the data you want to manage. CouchDB seems particularly well-suited for this example, but you should certainly evaluate all your NoSQL options to see if any offer the features you need. Your mileage will definitely vary.
Always Benchmark
Don't take my word for what's optimal. Try a design. Benchmark it. See what the design does with your data. In the end, that's the only thing that matters.
I can't think of a better way than putting a row for every possible item and skill to each character instantiated.
Do characters evolve independently?
Assuming yes, there is no other choice but to have each and every relevant combination physically represented in the database.
If not, then you can "reuse" the same set of items/skills for multiple characters, but this is probably not what is going on here.
In any case, relational databases are very good at managing huge amounts of data and the numbers you mentioned don't even qualify as "huge". By correctly utilizing techniques such as clustering, you can ensure that a lookup of all items/skills for a given character is done in a minimal number of I/O operations, i.e. very fast.

Algorithm to organize a table into many tables to have fewer cells?

I'm not really trying to compress a database. This is more of a logical problem. Is there any algorithm that will take a data table with lots of columns and repeated data and find a way to organize it into many tables with IDs, in such a way that in total there are as few cells as possible, and that these tables can then be joined with a query to replicate the original one?
I don't care about any particular database engine or language. I just want to see if there is a logical way of doing it. If you will post code, I like C# and SQL but you can use any.
I don't know of any automated algorithms, but what you really need to do is heavily normalize your database. This means looking at your actual functional dependencies and breaking them off wherever it makes sense.
The problem with trying to do this in a computer program is that it isn't always clear whether your current set of stored data represents all possible cases. You can't just look at the number of distinct values either. It makes little sense to break booleans off into their own table because they only have two values, for example, and this is only the tip of the iceberg.
I think that at this point, nothing is going to beat good ol' patient, hand-crafted normalization. This is something to do by hand. Any possible computer algorithm will either make a total mess of things or make you define the relationships such that you might as well do it all yourself.

Schemas and Indexes & Primary Keys: Differences in lookup performance?

I have a database (running on Postgres, to be precise) with the following structure:
user1 (schema)
|
- cars (table)
- airplanes (table, again)
...
user2
|
- cars
- airplanes
...
It's clearly not structured the way classic relational databases should be, but it "just works" as it is now. As you can see, schemas are used like primary keys to identify entries.
In terms of performance (and nothing else), is it worth rebuilding it so it has traditional primary keys (of type varchar) and clustered indexes instead of schemas?
From a performance perspective, actually from any perspective, surely this is a NIGHTMARE. REBUILD!
Without knowing any more about your situation, I guess the answer would be YES, this would affect performance. Ordinarily simple queries would not only be much more complicated to write and maintain, but the db would also produce query plans that are significantly more costly to execute.
Edit: I've worked with, and designed, DBs to handle a lot of data in high-workload environments (banking and medical) and I have never seen anything like it; well, not in the modern world!
So it looks like each user just has their own schema? Often large, large data sets are split up close to this (more often by customer in a lot of business scenarios). It's often a premature optimization because it introduces additional complexity to your application and a single table with a user column would scale to a reasonable number of rows.
However, whether or not you'll gain any performance from combining into a single schema really depends on whether you do many cross-user queries (in other words, queries that have to cross schemas/tables) and whether the data in each set of tables is exclusive to that user. If you're replicating data from one user's tables into another's, then you need to at least redesign those tables into a common schema.
I personally try to avoid a per-schema approach under normal circumstances (due to additional maintenance overhead and app complexity), but it has its place. And I'd hardly call this a "nightmare" unless I'm not understanding something correctly.

Asking for opinions : One sequence for all tables

Here's another one I've been thinking about lately.
We have concluded in earlier discussions: 'natural primary keys are bad, artificial primary keys are good.'
Working with Hibernate earlier, I have seen that Hibernate by default creates one sequence for all tables. At first I was puzzled by this: why would you do that? But later I saw the advantage: it makes linking parents and children foolproof. Because no two tables ever share a primary key value, accidentally linking a parent to a table that is not its child returns no results.
Does anyone see any downsides to this approach? I only see one: you cannot have more than 999999999999999999999999999 records in your database.
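(To make the default concrete: with a bare @GeneratedValue and no named generator, Hibernate3 on Oracle falls back to a single shared hibernate_sequence for every entity, so ids never collide across tables. The classes below are only an illustration.)
import javax.persistence.*;

@Entity
class Parent {
    @Id
    @GeneratedValue // no generator named: Hibernate3 on Oracle uses the shared hibernate_sequence
    private Long id;
}

@Entity
class Child {
    @Id
    @GeneratedValue // same shared sequence, so a Parent id never equals a Child id
    private Long id;
}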
There could be performance issues with all code getting values from a single sequence - see this Ask Tom thread.
Depending on how sequences are implemented in the database, always hitting the same sequence can be better or worse. When only a few or only one thread request new values, there will be no locking issues. But a bad implementation could cause congestion.
Another problem is rolling back transactions: Sequences don't get rolled back (because someone else might have requested a higher value already), so you can have large gaps which will eat your number space much more quickly than you might expect. OTOH, it will take some time to eat 2 or 4 billion IDs (if you "only" use 32 bit (signed) ints), so it's rarely an issue in practice.
Lastly, you can't easily reset the sequence if you have to. But if you need to have a restarting sequence (say, number of records since midnight), you can tell Hibernate to create/use a second sequence.
A major advantage is that you can uniquely identify objects anywhere in the DB just by the ID. That means you can severely cut down the log information you write in the production system and still find something if you only have the ID.
I prefer having one sequence per table. This comes from one general observation: Some tables ("master tables") have a relatively small row count and have to be kept "forever". For example, the customer table in an ERP.
In other tables ("transaction tables"), many rows are generated perpetually, but after some time, those rows can be archived (or simply deleted). The most extreme example is a tracing table used for debugging purposes; it might grow by hundreds of rows per second, but each row is obsolete after a few days.
Small IDs in the master tables make it easier when working directly on the database, e.g. for debugging purposes.
select * from orders where customerid=415
vs
select * from orders where customerid=89461836571
But this is only a minor issue. The bigger issue is cycling. If you use one sequence for all tables, you simply cannot let it restart. With one sequence per table, you can restart the sequences for the transaction tables when you have archived or deleted the old data. Master tables hardly ever have that problem, since they grow much slower.
I see little value in having only one sequence for all tables. The arguments told so far do not convince me.
There are a couple of disadvantages to using a single sequence:
- Reduced concurrency. Handing out the next sequence value involves synchronisation. In practice, I do not think this is likely to be a big problem.
- Oracle has special code when maintaining b-tree indexes to detect monotonically increasing values and balance the tree appropriately.
- The CBO might have an easier time estimating range queries on the index (if you ever did this) if most values were filled in.
An advantage might be that you can determine the order of inserts amongst different tables.
Certainly there are pros and cons to the one-sequence versus one-sequence-per-table approach. Personally, I find the ability to assign a truly unique identifier to a row (making each id column, in effect, unique across the whole database) to be enough of a benefit to outweigh any disadvantages. As Aaron D. succinctly writes:
you can uniquely identify objects anywhere in the DB just by the ID
And, for most applications, due to the way Hibernate3 batches INSERT statements, this will not be a performance bottleneck unless massive amounts of records are vying for the same db resource (SELECT hibernate_sequence.nextval FROM dual).
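If those nextval round-trips ever do become a problem, one common mitigation is a larger allocationSize, so that Hibernate reserves a block of ids per database call. A rough sketch (the entity and generator names are made up, and the exact behaviour depends on which id generator/optimizer is in use, so the sequence's INCREMENT BY may need to match):
import javax.persistence.*;

@Entity
class AuditEvent {
    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "audit_gen")
    @SequenceGenerator(name = "audit_gen", sequenceName = "hibernate_sequence",
                       allocationSize = 50) // roughly one nextval round-trip per 50 inserts
    private Long id;
}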
Also, this sequence mapping is not supported in the latest release (1.2) of Grails, though it was supported in Grails 1.1 (!). It now requires subclassing one of the Hibernate dialect classes as a workaround.
For those using Grails/GORM, have a look at this JIRA entry:
Oracle Sequence mappings ignored
