Joomla 2.5: How many possible categories?

I googled but could only find answers about the maximum number of articles.
Question, short version:
How many categories (with subcategories) can Joomla 2.5 handle on a shared host, and which problems should I expect?
Question, long version:
I'm building a website for architects. The content structure looks like this:
HOUSES
    Architect A
        Project 1
        Project 2
        ...
    Architect B
        Project 1
        Project 2
        ...
PLACES
    Architect C
        Project 1
        Project 2
        ...
    Architect D
        Project 1
        Project 2
        ...
And so on. The most obvious approach would be to have HOUSES and PLACES as categories, Architect A, Architect B, ... as subcategories, and the projects as articles. On the one hand this keeps the ability to use Joomla's blog views etc. without relying on third-party CCK extensions; on the other hand it would probably result in 300 or more categories.
Thanks for your always great input,
Tony

You're looking at 2147483647 possible categories in terms of the #__categories table.
In Joomla! 2.5's definition for the Categories table you will find:
CREATE TABLE `#__categories` (
`id` int(11) NOT NULL auto_increment,
`asset_id`
<snip ... >
`language` char(7) NOT NULL,
PRIMARY KEY (`id`),
KEY `cat_idx` (`extension`,`published`,`access`),
KEY `idx_access` (`access`),
KEY `idx_checkout` (`checked_out`),
KEY `idx_path` (`path`),
KEY `idx_left_right` (`lft`,`rgt`),
KEY `idx_alias` (`alias`),
INDEX `idx_language` (`language`)
) DEFAULT CHARSET=utf8;
As you can see, the primary key is defined as an int, so the maximum value of a signed INT on MySQL (2147483647) and the default starting point of 1 give you just over 2.1 billion categories. See this note about AUTO_INCREMENT in the MySQL documentation:
Note:
There can be only one AUTO_INCREMENT column per table, it must be indexed, and it cannot have a DEFAULT value. An AUTO_INCREMENT column works properly only if it contains only positive values. Inserting a negative number is regarded as inserting a very large positive number. This is done to avoid precision problems when numbers “wrap” over from positive to negative and also to ensure that you do not accidentally get an AUTO_INCREMENT column that contains 0.
On a shared host you will run out of database space long before you reach this limit on the #__categories primary key: the rest of a category record takes up roughly 1,300 times more space than the primary key itself.
So you're more likely to hit your hosting limits with ordinary content than with the categories table alone.
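If you want to sanity-check this on your own install, here is a minimal sketch (MySQL; the jos_ table prefix is an assumption, substitute your site's prefix):
-- Count the categories and see how much space the categories table occupies.
SELECT COUNT(*) AS category_count FROM jos_categories;
SELECT ROUND((data_length + index_length) / 1024 / 1024, 2) AS size_mb
FROM information_schema.tables
WHERE table_name = 'jos_categories';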

The only limits are theoretical and largely depend on your hosting environment. Just know that the more categories you add, the more memory is required to load category data wherever it is used, and the user interface can become unwieldy depending on how many categories are visible at once.

Here are some measurements from a test site of mine (J2.5.x; I did not test J3.x, but it should be the same):
A common PHP memory limit per script execution is 40 MB (a typical server default).
You can expect to hit it at around 800 categories and get a memory-exhausted fatal error.
(Permission checking alone takes about 26 MB, leaving the rest of the memory for everything else.)
This happens wherever the category tree is built and permissions are checked, e.g. the Joomla article form.
Note: if you are a super admin, category permission checking will not take up any extra memory.
Note: with a memory limit of 128 MB you can get to roughly 3,500 categories.
Also, with 5,000+ or 10,000+ categories you start to see small or medium performance problems, but that is really too many categories anyway.

Related

Magento reindexing: which database tables are involved?

Which tables are involved in Magento's reindexing process?
Please share any documentation available on this.
I can't take credit for this, as it is taken from the original post at: Can someone explain Magento's indexing feature in detail?
Magento's indexing is only similar to database-level indexing in spirit. As Anton states, it is a process of denormalization to allow faster operation of a site. Let me try to explain some of the thoughts behind the Magento database structure and why it makes indexing necessary to operate at speed.
In a more "typical" MySQL database, a table for storing catalog products would be structured something like this:
PRODUCT:
product_id INT
sku VARCHAR
name VARCHAR
size VARCHAR
longdesc VARCHAR
shortdesc VARCHAR
... etc ...
This is fast for retrieval, but it leaves a fundamental problem for a piece of eCommerce software: what do you do when you want to add more attributes? What if you sell toys, and rather than a size column, you need age_range? Well, you could add another column, but it should be clear that in a large store (think Walmart, for instance) this would result in rows that are 90% empty, and maintaining new attributes becomes nigh impossible.
To combat this problem, Magento splits tables into smaller units. I don't want to recreate the entire EAV system in this answer, so please accept this simplified model:
PRODUCT:
product_id INT
sku VARCHAR
PRODUCT_ATTRIBUTE_VALUES
product_id INT
attribute_id INT
value MISC
PRODUCT_ATTRIBUTES
attribute_id
name
Now it's possible to add attributes at will by entering new rows into product_attributes and then putting the corresponding records into product_attribute_values. This is basically what Magento does (with a little more respect for datatypes than I've displayed here). In fact, now there's no reason for two products to have identical fields at all, so we can create entire product types with different sets of attributes!
However, this flexibility comes at a cost. If I want to find the color of a shirt in my system (a trivial example), I need three lookups, sketched as a query after this list:
The product_id of the item (in the product table)
The attribute_id for color (in the attribute table)
Finally, the actual value (in the attribute_values table)
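Putting those three lookups together, a hedged sketch of the query against the simplified tables above (the SKU and attribute name are illustrative values):
-- Find the color of the product with SKU 'SHIRT-001' via the EAV tables.
SELECT v.value
FROM product p
JOIN product_attribute_values v ON v.product_id = p.product_id
JOIN product_attributes a ON a.attribute_id = v.attribute_id
WHERE p.sku = 'SHIRT-001'
AND a.name = 'color';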
Magento used to work like this, but it was dead slow. So, to allow better performance, they made a compromise: once the shop owner has defined the attributes they want, go ahead and generate the big table from the beginning. When something changes, nuke it from space and generate it over again. That way, data is stored primarily in our nice flexible format, but queried from a single table.
These resulting lookup tables are the Magento "indexes". When you re-index, you are blowing up the old table and generating it again.
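For illustration only, the kind of "flat" lookup table a re-index rebuilds might look like this (these are not Magento's actual index tables; the names are made up):
-- Denormalized flat table: one row per product, one column per attribute.
CREATE TABLE catalog_product_flat (
product_id INT PRIMARY KEY,
sku VARCHAR(64),
name VARCHAR(255),
color VARCHAR(32),
size VARCHAR(32)
-- ... one column for every attribute the shop owner has defined ...
);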

SQL Server heavily queried table: should I store secondary info (HTML text) in another table?

The Overview:
I have a table "category" that is for the most part used to categorise products and currently looks like this:
CREATE TABLE [dbo].[Category]
(
CategoryId int IDENTITY(1,1) NOT NULL,
CategoryNode hierarchyid NOT NULL UNIQUE,
CategoryString AS CategoryNode.ToString() PERSISTED,
CategoryLevel AS CategoryNode.GetLevel() PERSISTED,
CategoryTitle varchar(50) NOT NULL,
IsActive bit NOT NULL DEFAULT 1
)
This table is heavily queried to display the category hierarchy on a shopping website (typically every page view) and can have a substantial number of items.
I'm using the Entity Framework in my data layer.
The Question:
I need to add what could potentially be a fairly large "description", which could amount to the entire contents of a web page. I'm wondering whether I should store this in a related table rather than adding it to the existing category table, given that Entity Framework will drag the "description" column out of the database 100% of the time when 99.5% of the time I only want CategoryTitle and CategoryId.
Typically I wouldn't worry about the overhead of the Entity Framework, but in this case I think it might be important to take it into consideration. I could work around it with a view or a complex type from a stored proc, but that means a lot of refactoring that I'd prefer to avoid.
I'm just interested to know if anyone has any thoughts, suggestions or a desire to slap my wrists in relation to this scenario...
EDIT:
I should add that the reason I'm hesitating to set up a secondary table is because I don't like the idea of adding an additional table that has a 1 to 1 relationship with the Category table - it seems somewhat pointless. But I'm also not a DBA so I'm not sure whether this is an acceptable practice or not.
You could put your column in the table and then create an index covering all the other columns. That way the index will be used for all the lookups you currently do against this schema.
The key word for this construction is Covering Index: http://en.m.wikipedia.org/wiki/Database_index#Covering_index
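A minimal sketch of that covering index, assuming the new description column lives in dbo.Category and the hierarchy queries key off CategoryNode (the index name is illustrative):
-- Covering index: hierarchy lookups are answered from the index alone,
-- so the wide description column is never read for those queries.
CREATE NONCLUSTERED INDEX IX_Category_Hierarchy
ON [dbo].[Category] (CategoryNode)
INCLUDE (CategoryId, CategoryString, CategoryLevel, CategoryTitle, IsActive);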
I would store it in a different table, for the simple reason of not increasing the size of a record in the Category table. An increase in record size due to such a VARCHAR column reduces the number of records that fit on a given disk page (typically 4 KB in size), thereby increasing the number of pages that must be fetched into main memory for a search, increasing the number of disk accesses and hurting query execution times.
I would store this in a different table (i.e. vertically partition the Category table into most frequently accessed columns and not-so-frequently used columns), define a one-to-one relationship at the application layer with the entity containing the not-so-frequently used column as a member of the main Category entity, and set the fetch type to LAZY.
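A minimal sketch of that vertical partition (the table and column names are illustrative; add a foreign key back to Category once CategoryId has a primary key or unique constraint):
-- Hot columns stay in Category; the large description moves to a 1:1 side table.
CREATE TABLE [dbo].[CategoryDescription]
(
CategoryId int NOT NULL PRIMARY KEY,
Description nvarchar(max) NOT NULL
);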

SQL Azure and Membership Provider Tenant ID

What might be a good way to introduce BIGINT into the ASP.NET Membership functionality to reference users uniquely and to use that BIGINT field as a tenant_id? It would be ideal to keep the existing functionality that generates UserIds as GUIDs and not implement a membership provider from scratch. Since the application will be running on multiple servers, the BIGINT tenant_id must be unique and must not depend on some central authority generating these IDs. It would then be easy to use these tenant_ids with a SPLIT AT command down the road, which will allow bucketing users into new federation members. Any thoughts on this?
Thanks
You can use bigint, but you may have to modify all stored procedures that rely on the user ID. Making the ID globally unique is usually not a problem: as long as the ID is the primary key, the database will force it to be unique; otherwise you will get errors when inserting new data (in which case you can modify the ID and retry).
So the most important difference is that you may need to modify stored procedures. You have a choice here: if you use GUIDs, you don't need to change anything, but it may be difficult to predict how to split the federation to balance queries. As pointed out in another thread (http://stackoverflow.com/questions/10885768/sql-azure-split-on-uniqueidentifier-guid/10890552#comment14211028_10890552), you can split the existing data at the mid point, but you don't know which federation future data will be inserted into. There's a potential risk that the federations will become unbalanced, and you may need to merge and split them at regular intervals to keep them in shape.
By using bigint, you have better control over the keys. For example, say you have two federations: the first has IDs from 1 to 10000 and the second has IDs from 10001 to 20000. When creating a new user, you first check how many records are in each federation. Suppose federation 1 has 500 records and federation 2 has 1000 records; to balance the load you insert into federation 1, so you choose an ID between 1 and 10000 (a sketch of this check follows). The trade-off is that with bigint you may need to do more work to modify the stored procedures.
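A hedged sketch of that balancing step (dbo.Users and the range boundaries are illustrative, and the federation routing commands are omitted):
-- Count rows in each ID range and allocate the new ID from the emptier range.
DECLARE @count1 int = (SELECT COUNT(*) FROM dbo.Users WHERE UserId BETWEEN 1 AND 10000);
DECLARE @count2 int = (SELECT COUNT(*) FROM dbo.Users WHERE UserId BETWEEN 10001 AND 20000);
SELECT CASE WHEN @count1 <= @count2
THEN 'allocate from 1-10000'
ELSE 'allocate from 10001-20000' END AS target_range;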

Implementing User Defined Fields

I am creating a laboratory database which analyzes a variety of samples from a variety of locations. Some locations want their own reference number (or other attributes) kept with the sample.
How should I represent the columns which only apply to a subset of my samples?
Option 1:
Create a separate table for each unique set of attributes?
SAMPLE_BOILER: sample_id (FK), tank_number, boiler_temp, lot_number
SAMPLE_ACID: sample_id (FK), vial_number
This option seems too tedious, especially as the system grows.
Option 1a: Class table inheritance: a tree with the common fields in an internal node/table
Option 1b: Concrete table inheritance: a tree with the common fields in the leaf node/table
Option 2: Put every attribute which applies to any sample into the SAMPLE table.
Most columns of each entry would most likely be NULL, however all of the fields are stored together.
Option 3: Create _VALUE_ tables for each Oracle data type used.
This option is far more complex. Getting all of the attributes for a sample requires accessing all of the tables below (a query sketch follows the listing). However, the system can expand dynamically without a separate table for each new sample type.
SAMPLE:
sample_id*
sample_template_id (FK)
SAMPLE_TEMPLATE:
sample_template_id*
version*
status
date_created
name
SAMPLE_ATTR_OF:
sample_template_id* (FK)
sample_attribute_id* (FK)
SAMPLE_ATTRIBUTE:
sample_attribute_id*
name
description
SAMPLE_NUMBER:
sample_id* (FK)
sample_attribute_id (FK)
value
SAMPLE_DATE:
sample_id* (FK)
sample_attribute_id (FK)
value
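To make that concrete, here is a hedged sketch of what "getting all attributes for a sample" looks like under this model, showing only the number and date value tables (Oracle syntax; :id is a bind variable):
-- One branch per value table, UNIONed together and rendered as text.
SELECT a.name, TO_CHAR(n.value) AS value
FROM sample_number n
JOIN sample_attribute a ON a.sample_attribute_id = n.sample_attribute_id
WHERE n.sample_id = :id
UNION ALL
SELECT a.name, TO_CHAR(d.value, 'YYYY-MM-DD')
FROM sample_date d
JOIN sample_attribute a ON a.sample_attribute_id = d.sample_attribute_id
WHERE d.sample_id = :id;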
Option 4: (Add your own option)
To help with Googling, your third option looks a little like the Entity-Attribute-Value pattern, which has been discussed on Stack Overflow before, although often critically.
As others have suggested, if at all possible (e.g. once the system is up and running, few new attributes will appear), you should use your relational database in a conventional manner, with tables as types and columns as attributes - your option 1. The initial setup pain will be worth it later as your database gets to work the way it was designed to.
Another thing to consider: are you tied to Oracle? If not, there are non-relational databases out there like CouchDB that aren't constrained by up-front schemas in the same way as relational databases are.
Edit: you've asked about handling new attributes under option 1 (now 1a and 1b in the question)...
If option 1 is a suitable solution, then there are sufficiently few new attributes that the overhead of altering the database schema to accommodate them is acceptable.
You'll be writing database scripts to alter tables and add columns, so providing a default value can be handled easily in those scripts.
Of the two option 1 variants (1a, 1b), my personal preference would be concrete table inheritance (1b):
It's the simplest thing that works;
It requires fewer joins for any given query;
Updates are simpler as you only write to one table (no FK relationship to maintain).
Either of these first options is a better solution than the others, though there's nothing wrong with the class table inheritance method if that's what you'd prefer.
It all comes down to how often genuinely new attributes will appear.
If the answer is "rarely" then the occasional schema update can cope.
If the answer is "a lot" then the relational DB model (which has fixed schemas baked-in) isn't the best tool for the job, so solutions that incorporate it (entity-attribute-value, XML columns and so on) will always seem a little laboured.
Good luck, and let us know how you solve this problem - it's a common issue that people run into.
Option 1, except that it's not a separate table for each set of attributes: create a separate table for each sample source.
i.e. from your examples: samples from a boiler will have tank number, boiler temp, lot number; acid samples have vial number.
You say this is tedious, but I suggest that the more work you put into gathering and encoding the meaning of the data now, the bigger the dividends it will pay later - you'll save in the long term because your reports will be easier to write, understand and maintain. The guys from the boiler room will ask "we need the total of X per tank, grouped by this set of boiler temperature ranges" and you'll say "no prob, give me half an hour", because you've done the hard yards already.
Option 2 would be my fall-back option if Option 1 turns out to be overkill. You'll still want to analyse what fields are needed, what their datatypes and constraints are.
Option 4 is to use a combination of options 1 and 2. You may find some attributes are shared among a lot of sample types, and it might make sense for these attributes to live in the main sample table; whereas other attributes will be very specific to certain sample types.
You should really go with Option 1. Although it is more tedious to create, Options 2 and 3 will bite you back when you try to query your data: the queries will become more complex.
In fact, the most important part of storing the data, is querying it. You haven't mentioned how you are planning to use the data, and this is a big factor in the database design.
As far as I can see, the first option will be most easy to query. If you plan on using reporting tools or an ORM, they will prefer it as well, so you are keeping your options open.
In fact, if you find building the tables tedious, try using an ORM from the start. Good ORMs will help you with creating the tables from the get-go.
I would base your decision on how you usually see the data. For instance, if you get 5-6 new attributes per day, you're never going to be able to keep up adding new columns. In that case you should create columns for the 'standard' attributes and add a key/value layout similar to your 'Option 3'.
If you don't expect to see this, I'd go with Option 1 for now, and modify your design to 'Option 3' only if you get to the point that it is turning into too much work. It could end up that you have 25 attributes added in the first few weeks and then nothing for several months. In which case you'll be glad you didn't do the extra work.
As for Option 2, I generally advise against this as Null in a relational database means the value is 'Unknown', not that it 'doesn't apply' to a specific record. Though I have disagreed on this in the past with people I generally respect, so I wouldn't start any wars over it.
Whatever you do, option 3 is horrible: every query will have to join the data just to reconstruct a SAMPLE.
It sounds like you have some generic SAMPLE fields that need to be joined with more specific data for each type of sample. Have you considered some user-defined fields?
Example:
SAMPLE_BASE: sample_id(PK), version, status, date_create, name, userdata1, userdata2, userdata3
SAMPLE_BOILER: sample_id (FK), tank_number, boiler_temp, lot_number
This might be a dumb question but what do you need to do with the attribute values? If you only need to display the data then just store them in one field, perhaps in XML or some serialised format.
You could always use a template table to define a sample 'type' and the available fields you display for the purposes of a data entry form.
If you need to filter on them, the only efficient model is option 2. As everyone else is saying the entity-attribute-value style of option 3 is somewhat mental and no real fun to work with. I've tried it myself in the past and once implemented I wished I hadn't bothered.
Try to design your database around how your users need to interact with it (and thus how you need to query it), rather than just modelling the data.
If the set of sample attributes was relatively static then the pragmatic solution that would make your life easier in the long run would be option #2 - these are all attributes of a SAMPLE so they should all be in the same table.
Ok - you could put together a nice object hierarchy of base attributes with various extensions but it would be more trouble than it's worth. Keep it simple. You could always put together a few views of subsets of sample attributes.
I would only go for a variant of your option #3 if the list of sample attributes was very dynamic and you needed your users to be able to create their own fields.
In terms of implementing dynamic user-defined fields, you might first like to read through Tom Kyte's comments on this question. Now, Tom can be pretty insistent in his views, but I take from his comments that you have to be very sure you really need the flexibility for your users to add fields on the fly before you go about doing it. If you really need to do it, then don't create a table for each data type - that's going too far - just store everything in a varchar2 in a standard way and flag each attribute with an appropriate data type.
create table sample (
sample_id integer,
name varchar2(120 char),
constraint pk_sample primary key (sample_id)
);
create table attribute (
attribute_id integer,
name varchar2(120 char) not null,
data_type varchar2(30 char) not null,
constraint pk_attribute primary key (attribute_id)
);
create table sample_attribute (
sample_id integer,
attribute_id integer,
value varchar2(4000 char),
constraint pk_sample_attribute primary key (sample_id, attribute_id)
);
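And a hedged sketch of reading a sample back with that design; everything comes out as varchar2, and numeric filters mean casting on the fly (the attribute name is illustrative):
-- List one sample's attributes, with the declared data type alongside the raw text value.
select s.name as sample_name, a.name as attribute_name, a.data_type, sa.value
from sample s
join sample_attribute sa on sa.sample_id = s.sample_id
join attribute a on a.attribute_id = sa.attribute_id
where s.sample_id = :id;
-- Filtering on a numeric attribute requires casting the varchar2 value.
select sa.sample_id
from sample_attribute sa
join attribute a on a.attribute_id = sa.attribute_id
where a.name = 'boiler_temp'
and to_number(sa.value) > 100;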
Now... that just looks evil doesn't it? Do you really want to go there?
I work on both a commercial and a home-made system where users have the ability to create their own fields/controls dynamically. This is a simplified version of how it works.
Tables:
Pages
Controls
Values
A page is just a container for one or more controls. It can be given a name.
Controls are linked to pages and represent user input controls.
A control contains what datatype it is (int, string etc) and how it should be represented to the user (textbox, dropdown, checkboxes etc).
Values are the actual data that the users have typed into the controls, a value contains one column for every datatype that it can represent (int, string, etc) and depending on the control type, the relevant column is set with the user input.
There is an additional column in Values which specifies which group the value belong to.
Each time a user fills in a form of controls and clicks save, the values typed into the controls are saved into the same group so that we know that they belong together (incremental counter).
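A minimal sketch of that layout (names and types are illustrative, not the poster's actual schema):
-- Pages contain controls; controls declare a data type and a UI representation;
-- control_values holds one typed column per datatype, with value_group tying one saved form together.
CREATE TABLE pages (
page_id INT PRIMARY KEY,
name VARCHAR(120)
);
CREATE TABLE controls (
control_id INT PRIMARY KEY,
page_id INT NOT NULL REFERENCES pages (page_id),
data_type VARCHAR(30) NOT NULL, -- int, string, date, ...
control_type VARCHAR(30) NOT NULL -- textbox, dropdown, checkbox, ...
);
CREATE TABLE control_values (
value_id INT PRIMARY KEY,
control_id INT NOT NULL REFERENCES controls (control_id),
value_group INT NOT NULL, -- incremented per saved form
int_value INT NULL,
string_value VARCHAR(4000) NULL,
date_value DATE NULL -- only the column matching data_type is populated
);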
CodeSpeaker,
I like your answer; it's pointing me in the right direction for a similar problem.
But how would you handle drop-down list values?
I am thinking of a lookup table of values, so that many lookup rows link to one user-defined field.
But I also have another problem to add to the mix: each field must support multiple languages, so each value must link to the equivalent value for each of those languages.
Maybe I'm thinking too hard about this, as I've got about 6 tables so far.

ASP.NET Membership Provider, User ID GUID, and disk space

I'm currently using the SQL Membership provider for ASP.NET, which uses GUIDs for the User ID. My application has several custom tables that have foreign key relations back to the User table and I'm concerned about the disk space and performance implications of the standard provider's use of GUIDs for user ID.
Has anyone run into space / performance issues related to this and if so are there custom approaches that people have implemented to address this?
Any insight or suggestions would be most appreciated.
Thanks
I doubt you'll have any space issues as a result of using GUIDs rather than INT types for example. One thing I will warn you about is that you might be tempted to create clustered indexes on the GUID columns in the database. DO NOT DO THIS. By default, GUIDs are random, and inserting random data into a column that has a clustered index causes a few issues. Clustered, as you might know, means IN PHYSICAL STORAGE SEQUENCE. So when you insert a new random value (GUID) that row usually has to be inserted into the middle of the table. This can lead to massively fragmented indexes.
My advice would be to create a table that links the GUIDs to INT values (BIGINT if you expect that many users) and then use the INT everywhere else. Like Fermin just said.
Could you not have a custom table that maps the GUID to an integer value, so that you can then use the integer in your custom tables?
UserId guid
FriendlyUserId int //use this as FK in other tables?
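A minimal sketch of that mapping table (names are illustrative):
-- FriendlyUserId becomes the FK used by custom tables; UserId stays the provider's GUID.
CREATE TABLE [dbo].[UserMap]
(
FriendlyUserId int IDENTITY(1,1) NOT NULL PRIMARY KEY,
UserId uniqueidentifier NOT NULL UNIQUE
);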
If you are using SQL Server 2005, you may want to look at the NewSequentialId() method. Eric Swann provides a good overview of its use with the Membership provider. There is also a nice article on benefits of using sequential GUIDs over the default random ones. Here is a performance comparison excerpt from the article...
                    Reads   Writes   Leaf Pages   Avg Page Used   Avg Fragmentation   Record Count
IDENTITY(,)             0    1,683        1,667           98.9%                0.7%         50,000
NEWID()                 0    5,386        2,486           69.3%               99.2%         50,000
NEWSEQUENTIALID()       0    1,746        1,725           99.9%                1.0%         50,000
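For reference, a minimal sketch of wiring NEWSEQUENTIALID() in as a column default (table and constraint names are illustrative):
-- Sequential GUIDs arrive roughly in order, so a clustered key on the column
-- fragments far less than with NEWID().
CREATE TABLE [dbo].[UserExample]
(
UserId uniqueidentifier NOT NULL
CONSTRAINT DF_UserExample_UserId DEFAULT NEWSEQUENTIALID(),
CONSTRAINT PK_UserExample PRIMARY KEY CLUSTERED (UserId)
);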
