(ASP.NET) How would you go about creating a real-time counter which tracks database changes? - performance

Here is the issue.
On a site I've recently taken over, it tracks "miles" you ran in a day. So a user can log into the site and add that they ran 5 miles. This is then added to the database.
At the end of the day, around 1am, a service runs which calculates all the miles all the users ran that day and outputs a text file to App_Data. That text file is then displayed in Flash on the home page.
I think this is kind of ridiculous. I was told they had to do this due to massive performance issues. They won't tell me exactly how they were doing it before or what the major performance issue was.
So what approach would you guys take? The first thing that popped into my mind was a web service which gets the data via an AJAX call. Perhaps every time a new "mile" entry is added, a trigger is fired and updates the "GlobalMiles" table.
I'd appreciate any info or tips on this.
Thanks so much!

Answering this question is a bit difficult since we don't know all of your requirements or what didn't work before. So here are some different ideas.
First, revisit your assumptions. Generating a static report once a day is a perfectly valid solution if all you need is daily reports. Why hit the database multiple times throughout the day if all that's needed is a snapshot? (For instance, lots of blog software used to write HTML files when a blog was posted rather than serving up the entry from the database each time -- many still do as an optimization.) Is the "real-time" feature something you are adding?
I wouldn't jump to AJAX right away. Use the same input method, just move the report from static to dynamic. Doing too much at once is a good way to get yourself buried. When changing existing code I try to find areas that I can change in isolation with the least amount of impact to the rest of the application. Then once you have the dynamic report you can add AJAX (and please use progressive enhancement).
As for the dynamic report itself you have a few options.
Of course you can just SELECT SUM(), but it sounds like that would cause performance problems if each user has a large number of entries.
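That is, something along these lines (assuming a Miles table with one row per logged entry):

SELECT SUM([Count]) AS TotalMiles FROM Miles -- site-wide total
SELECT SUM([Count]) AS TotalMiles FROM Miles WHERE UserId = @id -- per user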
If your database supports it, I would look at using an indexed view (sometimes called a materialized view). It allows fast reads of the real-time sums because the database maintains them as rows are inserted:
CREATE VIEW vw_Miles WITH SCHEMABINDING AS
SELECT SUM([Count]) AS TotalMiles,
       COUNT_BIG(*) AS [EntryCount],
       UserId
FROM dbo.Miles -- schemabinding requires two-part names
GROUP BY UserId
GO
CREATE UNIQUE CLUSTERED INDEX ix_Miles ON vw_Miles (UserId)
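One SQL Server detail worth knowing: outside Enterprise Edition the optimizer won't automatically match queries against the indexed view, so you query the view directly with the NOEXPAND hint:

SELECT TotalMiles
FROM dbo.vw_Miles WITH (NOEXPAND)
WHERE UserId = @id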
If the overhead of that is too much, @jn29098's solution is a good one. Roll it up using a scheduled task. If there are a lot of entries for each user, you could add only the delta from the last time the task was run.
UPDATE GlobalMiles SET [TotalMiles] = [TotalMiles] +
  (SELECT SUM([Count])
   FROM Miles
   WHERE UserId = @id
     AND EntryDate > @lastTaskRun
   GROUP BY UserId)
WHERE UserId = @id
If you don't care about storing the individual entries but only the total you can update the count on the fly:
UPDATE Miles SET [Count] = [Count] + @newCount WHERE UserId = @id
You could use this method in conjunction with the sproc that adds the entry and have the best of both worlds.
Finally, your trigger method would work as well. It's an alternative to the indexed view where you do the update yourself on a table instead of SQL Server doing it automatically. It's also similar to the previous option, where you move the global update out of the sproc and into a trigger.
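A minimal sketch of such a trigger, assuming a GlobalMiles table keyed by UserId (the names are illustrative, not from the original schema):

CREATE TRIGGER trg_Miles_Insert ON Miles AFTER INSERT AS
BEGIN
    SET NOCOUNT ON;
    -- roll each newly inserted entry into that user's running total
    UPDATE g
    SET g.TotalMiles = g.TotalMiles + i.NewMiles
    FROM GlobalMiles g
    JOIN (SELECT UserId, SUM([Count]) AS NewMiles
          FROM inserted
          GROUP BY UserId) i ON i.UserId = g.UserId;
END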
The last three options make it more difficult to handle the situation when an entry is removed, although if that's not a feature of your application then you may not need to worry about that.
Now that you've got materialized, real-time data in your database, you can generate your report dynamically. Then you can add the AJAX fanciness.

If they are truly having performance issues due to too many hits on the database, then I suggest that you take all the input and cram it into a message queue (MSMQ). Then you can have a service on the other end that picks up the messages and does a bulk insert of the data. This way you have fewer DB hits. Then you can output to the text file on the update too.

I would create a summary table that's rolled up once/hour or nightly which calculates total miles run. For individual requests you could pull from the nightly summary table plus any additional logged miles for the period between the last rollup calculation and when the user views the page to get the total for that user.
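In SQL terms, the per-user total becomes something like this (a sketch; the summary table name and rollup timestamp are assumed):

SELECT s.TotalMiles + ISNULL(d.Delta, 0) AS TotalMiles
FROM MilesSummary s
LEFT JOIN (SELECT UserId, SUM([Count]) AS Delta
           FROM Miles
           WHERE EntryDate > @lastRollup
           GROUP BY UserId) d ON d.UserId = s.UserId
WHERE s.UserId = @id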
How many users are you talking about and how many log records per day?


Simulating server-side group and sort in Azure table storage

I have a table to which I add records whenever the user views a particular resource. The key fields are
Username
Resource
Date Viewed
On a history page of my app, I want to present a set number (e.g., top 5) of the user's most recently viewed Resources, but I want to group by Resource, so that if some were viewed several times, only the most recent of each one is shown.
To be clear, if the raw data looked like this:
UserA | ResourceA | Jan 1
UserA | ResourceA | Jan 2
UserA | ResourceB | Jan 3
UserA | ResourceA | Jan 4
...
...only the bottom two records would appear in the history page.
I know you can get server-side chronological sorting by using a string derived from the date in the PartitionKey or RowKey fields.
I also see that you could enable a crude grouping mechanism by using Username and Resource as your PartitionKey and RowKey fields, and then using Insert-or-update, to maintain a table in which you kept pointers for the most recent value for each combination. However, those records wouldn't be sorted chronologically.
Is there any way to design a set of tables so that I can get the data I need without retrieving tons of extra entities and sorting on the client? I'm willing to get elaborate with the design if that's what it takes. Thanks in advance!
First, I would strongly recommend that you read the excellent Azure Storage Table Design Guide: Designing Scalable and Performant Tables document from the Storage team.
Yes, I would agree that it is somewhat tricky with Azure Table Storage but it is doable :).
What you have to do is keep multiple copies of the same data. Each copy will serve a different purpose.
Considering the scenario where you want to fetch most recent lines for Resource A and B, here's what your entity structure would look like:
PartitionKey: Date/Time (in ticks), reversed, i.e. DateTime.MaxValue.Ticks - LastAccessedDateTime.Ticks. Reversed ticks are required so that the most recent entries show up at the top of the table.
RowKey: Resource name.
AccessDate: Indicates the last access date/time.
User: Name of the user who accessed that resource.
So when you are interested in just finding out most recently used resources, you could start fetching records from the top.
In short, your data storage approach should be primarily governed by how you want to fetch the data. It would even mean you will have to save the same data multiple times.
UPDATE
As discussed in the comments below, Table Service doesn't directly support Server Side Grouping. This is something that you would need to do on your own. What you could do is create a separate table to store the access counts. As and when the resources are accessed, you basically either insert a new record in that table or update the count for that resource in that table.
Assuming you're always interested in finding out resource access count within a date/time range, here's what your entity structure would look like:
PartitionKey: Date/Time (in Ticks). The precision would depend on your reporting requirement. For example, if you want to maintain access counts by day then your precision would be a day.
RowKey: Resource name.
AccessCount: This field will constantly update as and when a resource is accessed.
LastAccessDateTime: This field will denote when a resource was last accessed.
For updating access counts, I would recommend that you make use of a background process. Basically in this approach, as a resource is accessed you add a message in a queue. This message will have resource name and date/time resource was last accessed. Then have a background process poll this queue and fetch messages. As the messages are received, you first get the current count and last access date/time for that resource. If no records are found, you simply insert a record in this table with count as 1. If a record is found then you compare the date/time from the table with the date/time sent in the message. If the date/time from the table is smaller than the date/time sent in the message, you update both count (increase that by 1) and last access date/time. If the date/time from the table is more than the date/time sent in the message, you only update the count.
Now to find the most accessed resources in a time span, you simply query this table. Assuming there are a limited number of resources (say in the 100s), you can get this information from the table in a single request. Since you're dealing with a small amount of data, you can simply download it on the client side and order it any way you see fit. However, to see the access details for a particular resource, you would have to fetch the detailed data (1000 entities at a time).
Part of your brain might still be unconsciously trapped in relational-table design paradigms, I'm still getting to grips with that issue myself.
Rather than think of table storage as a database table (with the "query-ability" that goes with it) try visualizing it in more simple (dumb) terms.
A design problem I'm working on now is storing financial transaction data, and I want to know what the total $ amount of these transactions is. Because Azure table storage doesn't (yet?) offer aggregate functions I can't simply go .Sum(). To get around that I'm going to:
Sum the values of the transactions in my app before I pass them to Azure.
I'll then pass the result of the sum into Azure as a separate piece of information, called RunningTotal.
Later on I can just return RunningTotal rather than pulling down all the transactions, and I can repeat the process by incrementing the value of RunningTotal each time I get new transactions.
Of course there are risks to this but the app is a personal one so the risk level is low and manageable, at least as a proof-of-concept.
Perhaps you can use a similar approach for the design of your system: compute useful values in advance. I'm almost using table storage as a long-term cache rather than a database.

Caching expensive SQL query in memory or in the database?

Let me start by describing the scenario. I have an MVC 3 application with SQL Server 2008. In one of the pages we display a list of Products that is returned from the database and is UNIQUE per logged in user.
The SQL query (actually a VIEW) used to return the list of products is VERY expensive.
It is based on very complex business requirements which cannot be changed at this stage.
The database schema cannot be changed or redesigned as it is used by other applications.
There are 50k products and 5k users (each user may have access to 1 up to 50k products).
In order to display the Products page for the logged in user we use:
SELECT TOP X * FROM [VIEW] WHERE UserID = @UserId -- where 'X' is the size of the page
The query above returns a maximum of 50 rows (maximum page size). The WHERE clause restricts the number of rows to a maximum of 50k (products that the user has access to).
The page is taking about 5 to 7 seconds to load and that is exactly the time the SQL query above takes to run in SQL.
Problem:
The user goes to the Products page and very likely uses paging, re-sorts the results, goes to the details page, etc., and then goes back to the list. And every time it takes 5-7s to display the results.
That is unacceptable, but at the same time the business team has accepted that the first time the Products page is loaded it can take 5-7s. Therefore, we thought about CACHING.
We now have two options to choose from, the most "obvious" one, at least to me, is using .Net Caching (in memory / in proc). (Please note that Distributed Cache is not allowed at the moment for technical constraints with our provider / hosting partner).
But I'm not very comfortable with this. We could end up with lots of products in memory (when there are 50 or 100 users logged in simultaneously) which could cause other issues on the server, like .Net constantly removing cache items to free up space while our code inserts new items.
The SECOND option:
The main problem here is that it is very EXPENSIVE to generate the User x Product x Access view, so we thought we could create a flat table (or in other words a CACHE of all products x users in the database). This table would be exactly the result of the view.
However the results can change at any time if new products are added, user permissions are changed, etc. So we would need to constantly refresh the table (which could take a few seconds) and this started to get a little bit complex.
Similarly, we thought we could implement some sort of cache provider: upon request from a user, we would run the original SQL query and select the products from the view (5-7s, acceptable only once) and save that result in a flat table called ProductUserAccessCache in SQL. On the next request, we would get the values from this cached table (since we could easily identify that the results were cached for that particular user) with a fast query without calculations in SQL.
Any time a product was added or a permission changed, we would truncate the cached table, and upon a new request the table would be repopulated for the requested user.
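In rough T-SQL the lifecycle would look like this (a sketch; the cache table's columns are assumed):

-- first request for a user: pay the 5-7s once and materialize the rows
INSERT INTO ProductUserAccessCache (UserId, ProductId)
SELECT UserId, ProductId FROM [VIEW] WHERE UserId = @UserId;

-- subsequent requests: cheap reads from the flat table
SELECT TOP (@PageSize) * FROM ProductUserAccessCache WHERE UserId = @UserId;

-- product added or permission changed: invalidate everything
TRUNCATE TABLE ProductUserAccessCache;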
It doesn't seem too complex to me, but what we are doing here basically is creating a NEW cache "provider".
Does any one have any experience with this kind of issue?
Would it be better to use .Net Caching (in proc)?
Any suggestions?
We were facing a similar issue some time ago, and we were thinking of using EF caching in order to avoid the delay in retrieving the information. Our problem was a 1-2 second delay. Here is some info that might help on how to cache a table by extending EF. One of the drawbacks of caching is how fresh the information needs to be, so you set your cache expiration accordingly. Depending on that expiration, users might have to wait longer than they'd like for fresh info, but if your users can accept seeing slightly outdated info in order to avoid the delay, then the tradeoff would be worth it.
In our scenario, we decided it was better to have fresh info than quick info, but as I said before, our waiting period wasn't that long.
Hope it helps

nhibernate doesn't get a chance to update?

I am building a small web application, where the user is granted the ability to rate items.
In my application I am using nhibernate and asp.net mvc.
All the rating requests are sent by jquery (ajax/post).
When the user votes on an item, I check whether the item has been previously voted on. If so, I update the last vote value to the new one received. If not, I just add a new rating to my table.
I have noticed something very strange. This works well, but when I click several times really fast something gets screwed up. I get multiple ratings; it seems as if NHibernate doesn't bother checking whether the user has previously voted and just returns a false value.
Is this possible? How can I see what's going under the hood?
thank you
You probably have a concurrency problem. I assume that you get a thread and transaction per click. Clicking very fast results in parallel transactions which can't see what others are doing.
You have a typical problem that items which aren't in the database (the new votes) can't be locked.
The solutions are:
Use a lock to avoid multiple votes from the same user being stored at the same time. This doesn't work when you have multiple servers (or AppDomains) on the same database, because the lock is restricted to the AppDomain.
Use table locks in the database to lock out the whole votes table, so that only one transaction can add votes at a time (see the sketch below).
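A minimal T-SQL sketch of the table-lock approach (the Ratings table and its columns are assumed names):

BEGIN TRAN;
-- exclusive table lock, held until commit, serializes concurrent voters
SELECT TOP 1 1 FROM Ratings WITH (TABLOCKX, HOLDLOCK)
WHERE UserId = @UserId AND ItemId = @ItemId;
IF @@ROWCOUNT = 0
    INSERT INTO Ratings (UserId, ItemId, Value) VALUES (@UserId, @ItemId, @Value);
ELSE
    UPDATE Ratings SET Value = @Value WHERE UserId = @UserId AND ItemId = @ItemId;
COMMIT;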
Have you turned on NHibernate logging?
Add the following to the hibernate.cfg.xml file:
<property name="show_sql">true</property>
The SQL generated can be seen in the console or test-runner output if you are running unit tests. You can also configure log4net to write NHibernate logging information to file (see https://web.archive.org/web/20110514164829/http://blogs.hibernatingrhinos.com/nhibernate/archive/2008/07/01/how-to-configure-log4net-for-use-with-nhibernate.aspx).
Lastly, how are you using NHibernate? Are you using a repository pattern? It's hard to determine what is wrong with your application without some idea of the code.

Separating Demo data in Live system

If we put aside the rights and wrongs of putting demo data into a live system for a minute (that's a whole separate discussion!), we are being asked to store some demo data in our live system so that it can be credibly demonstrated without the appearance of smoke + mirrors (we want to use the same login page, for example).
Since I'm sure this is a challenge many other people must have faced, I'd be interested to know what approaches people have devised for separating this data so that it doesn't get in the way of day-to-day operations on their systems.
As I alluded to above, I'm aware that this probably isn't best practice. :-)
Can you instead segregate the data into a new database and just redirect your connection strings (they're not hard-coded, right? right?) to point to the demo database? This way, live data isn't tainted, and your code looks identical. We actually do a three-tier deployment this way: we do local development, deploy to QC environments that have snapshots of the live data every few months, and then deploy to live when testing is complete.
FWIW, we're looking at using Oracle's row level security / virtual private database feature to separate the demo data from the rest.
I've often seen it on certain types of live systems.
For example, point of sale systems in a supermarket: cashiers are trained on the production point of sale terminals.
The key is to carefully identify the test or training data. I wouldn't say that there's any explicit best practice for how to model this in a database - it's going to be application specific.
You really have to carefully define the scope of what is covered by the test/training scenarios. For example, you don't want the training/test transactions to appear in production reports (but you may want to be able to create reports with this data for training/test purposes).
Completely disagree with Joe. Oracle has a tool to do this regardless of implementation. Before I read your answer I was going to say VPD... But that could have an impact on Production.
Remember, every table in a query changes from
SELECT * FROM tableA
to
SELECT * FROM (SELECT * FROM tableA WHERE Data_quality = 'PROD' <or however you do it>)
Every table with a policy that is...
So assuming your test data has to span EVERY table, every table will have to have a policy, and every table will be filtered before any SQL can begin working.
You can even hide that column from the users. You'll need to write the policy with some deftness if you do. You'll have to create that value based on how the data is inserted and expose the column to certain admin accounts for maintenance.
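For the record, attaching a policy looks roughly like this (an Oracle sketch; the schema, table, and predicate-function names are made up):

BEGIN
  DBMS_RLS.ADD_POLICY(
    object_schema   => 'APP',
    object_name     => 'TABLEA',
    policy_name     => 'HIDE_DEMO_ROWS',
    function_schema => 'APP',
    policy_function => 'PROD_ONLY_PREDICATE', -- returns "data_quality = 'PROD'" for normal users
    statement_types => 'SELECT,INSERT,UPDATE,DELETE');
END;
/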

Designing a Calendar system like Google Calendar [closed]

I have to create something similar to Google Calendar, so I created an events table that contains all the events for a user.
The hard part is handling recurring events: the row in the events table has an event_type field that tells you what kind of event it is, since an event can be for a single date only, OR a recurring event every x days.
The main design challenge is handling recurring events.
When a user views the calendar, using the month view, how can I display all the events for the given month? The query is going to be tricky, so I thought it would be easier to create another table and create a row for each and every event, including the recurring ones.
What do you guys think?
I'm tackling exactly this problem, and I had completely spaced iCalendar (RFC 2445) up until reading this thread, so I have no idea how well this will or won't integrate with that. Anyway the design I've come up with so far looks something like this:
You can't possibly store all the instances of a recurring event, at least not before they occur, so I simply have one table that stores the first instance of the event as an actual date, an optional expiration, and nullable repeat_unit and repeat_increment fields to describe the repetition. For single instances the repetition fields are null; otherwise the unit will be 'day', 'week', 'month', or 'year', and the increment is simply the multiple of units to add to the start date for the next occurrence.
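Sketched as DDL, that's roughly this (the column names here are mine, not a spec):

CREATE TABLE events (
    id               INT PRIMARY KEY,
    occursOn         DATETIME NOT NULL,  -- first occurrence
    expiresOn        DATETIME NULL,      -- optional expiration
    repeat_unit      VARCHAR(5) NULL,    -- 'day', 'week', 'month', 'year'; NULL for one-offs
    repeat_increment INT NULL            -- how many units between occurrences
)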
Storing past events only seems advantageous if you need to establish relationships with other entities in your model, and even then it's not necessary to have an explicit "event instance" table in every case. If the other entities already have date/time "instance" data then a foreign key to the event (or join table for a many-to-many) would most likely be sufficient.
To do "change this instance"/"change all future instances", I was planning on just duplicating the events and expiring the stale ones. So to change a single instances, you'd expire the old one at it's last occurrence, make a copy for the new, unique occurrence with the changes and without any repetition, and another copy of the original at the following occurrence that repeats into the future. Changing all future instances is similar, you would just expire the original and make a new copy with the changes and repition details.
The two problems I see with this design so far are:
It makes MWF-type events hard to represent. It's possible, but forces the user to create three separate events that repeat weekly on M, W, and F individually, and any changes they want to make will have to be done on each one separately as well. These kinds of events aren't particularly useful in my app, but it does leave a wart on the model that makes it less universal than I'd like.
By copying the events to make changes, you break the association between them, which could be useful in some scenarios (or, maybe it would just be occasionally problematic.) The event table could theoretically contain a "copied_from" id field to track where an event originated, but I haven't fully thought through how useful something like that would be. For one thing, parent/child hierarchical relationships are a pain to query from SQL, so the benefits would need to be pretty heavy to outweigh the cost for querying that data. You could use a nested-set instead, I suppose.
Lastly I think it's possible to compute events for a given timespan using straight SQL, but I haven't worked out the exact details and I think the queries usually end up being too cumbersome to be worthwhile. However, for the sake of argument, you can use the following expression to compute the difference in months between a given month/year and an event:
(:month + (:year * 12)) - (MONTH(occursOn) + (YEAR(occursOn) * 12))
Building on the last example, you could use MOD to determine whether the difference in months is the correct multiple:
MOD((:month + (:year * 12)) - (MONTH(occursOn) + (YEAR(occursOn) * 12)), repeat_increment) = 0
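Putting those pieces together for 'month'-unit events gives something like this (a sketch only, using the column names from the table above):

SELECT *
FROM events
WHERE repeat_unit = 'month'
  AND MOD((:month + (:year * 12)) - (MONTH(occursOn) + (YEAR(occursOn) * 12)),
          repeat_increment) = 0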
Anyway this isn't perfect (it doesn't ignore expired events, doesn't factor in start / end times for the event, etc), so it's only meant as a motivating example. Generally speaking though I think most queries will end up being too complicated. You're probably better off querying for events that occur during a given range, or don't expire before the range, and computing the instances themselves in code rather than SQL. If you really want the database to do the processing then a stored procedure would probably make your life a lot easier.
As previously stated, don't reinvent the wheel, just enhance it.
Check out VCalendar; it is open source, and comes in PHP, ASP, and ASP.NET (C#)!
Also you could check out Day Pilot, which offers a calendar written in ASP.NET 2.0. They offer a lite version that you could check out, and if it works for you, you could purchase a license.
Update (9/30/09):
Unless of course the wheel is broken! Also, you can put on a shiny new coat of paint if you like (i.e. make a better UI). But at least try to find some foundation to build off of, since the calendar system can be tricky (with repeating events), and it's been done thousands of times.
Attempting to store each instance of every event seems like it would be really problematic and, well, impossible. If someone creates an event that occurs "every thursday, forever", you clearly cannot store all the future events.
You could try to generate the future events on demand, and populate the future events only when necessary to display them or to send notification about them. However, if you are going to build the "on-demand" generation code anyway, why not use it all the time? Instead of pulling from the event table, and then having to use on-demand event generation to fill in any new events that haven't been added to the table yet, just use the on-demand event generation exclusively. The end result will be the same. With this scheme, you only need to store the start and end dates and the event frequency.
I don't see any way that you can avoid having on-demand event generation, so I can't see the utility in the event table. If you want it for the sake of caching, then I think you're taking the wrong approach. First, it's a poor cache because you can't avoid on-demand event generation anyway. Second, you should probably be caching at a higher level anyway. If you want to cache, then cache generated pages, not events.
As far as making your polling more efficient, if you are only polling every 15 minutes, and your database and/or server can't handle the load, then you are already doomed. There's no way your database will be able to handle users if it can't handle much, much more frequent polling without breaking a sweat.
I would say start with the iCal standard. If you use it as your model, then you'll be able to do everything that Google Calendar, Outlook, and Mac iCal (the program) do, and get virtually instant integration with them.
From there, it's time to bone up on your AJAX and JavaScript, cuz you can't have a flashy web UI with drag and drop and multiple calendars without a ton of AJAX and JavaScript.
You should have a start date, end date, and expiration date. Single-day events would have the same start date and end date, and this allows you to do partial-day events as well. As for recurring events, the start and end date would be for the same day but have different times; then you have an enumeration or table that specifies the repeat frequency (daily, weekly, monthly, etc.).
This allows you to say "this event appears every day" for daily, "this event appears on the 2nd day of every week" for weekly, "this event appears on the 5th day of every month" for monthly, "this event appears on the 215th day of every year" for yearly as long as the date is less than the expiration date.
Darren,
That is how I have designed my events table actually, but when thinking about it, say I have 100K users who have created events; this table will be hit pretty hard, especially if I am going to be sending out emails to remind people of their events (events can be for specific times of the day also!), so I could be polling this table every 15 minutes potentially!
This is why I wanted to create another table that would expand out all the events/recurring events. This way I can simply hit that table and get the user's month view of events without doing any complicated querying and business logic, AND it will make polling much more efficient also.
The question is, should this secondary table be for the next day or month? What makes more sense? Since the maximum a user can view is a month's view, I am leaning towards a table that writes out all the events for a given month.
(Of course I will have to maintain this secondary table for any edits the user might make to the original events table.)
ChanChan,
I have designed it with the same sort of functionality actually, but I am just referring to how I will go about storing events, specifically how to handle recurring events.
The brute-force-ish but still reasonable way would be to create a new row in your single events table for every instance of the recurring event, all pointing not to the event preceding it in the series but to the first event in the series. This simplifies selecting and/or deleting all elements in a particular series, since you can select based on parent id. It also allows users to delete individual items from a series without affecting the rest of them.
This query gets you the series that starts on element 3:
SELECT * FROM events WHERE id = 3 OR parentid = 3
To get all items for this month, assuming you'd have a start date and an end date in your events table, you just need an overlap test (the event starts before the month ends and ends after it starts):
SELECT * FROM events WHERE startdate <= '2008-08-31' AND enddate >= '2008-08-01'
Handling the creation/modification of series programmatically wouldn't be very difficult, but it really would depend on the feature set you want to provide and how you think it'll be used. If you want to differentiate between series and events, you could have a separate series table and a nullable series_id on your events, allowing you the freedom to muck about with individual events while still retaining control over your series.
From past experience, I would create a new record for each occurrence of the event and then have a column which references the previous event so you can track all events in a series.
This has two advantages:
No complicated routines to work out the next event date
Individual occurrences can be edited without affecting the rest
Hope this gives you some food for thought :)
I have to agree with @ChanChan on reading the iCal spec for how to store these things. There is no easy way to handle recurrences, especially ones that have exceptions. I've built and rebuilt and rebuilt a database to handle this, and I keep coming back to iCal.
It's not a bad idea to create a subordinate table, depending on use cases. The algorithm for calculating exactly when occurrences... um, occur... can indeed be quite complex. There's no getting away from running it, but it's worth considering caching the results.
@GateKiller
I hadn't thought of the case where you edit individual occurrences. It makes sense you would store the occurrences separately in that case.
When you do that, though, how far in the future do you store events? Do you pick an arbitrary date? Do you auto-generate the new occurrences the first time a user browses out into future months/years?
Also, how do you handle the case where the user wants to edit the whole series. "We've had a meeting every Tuesday morning at 10:30 but we're going to start meeting on Wednesday at 8"
I think I understand your second paragraph to mean you are considering a second events table that has a row for each occurrence of an event. I would avoid that.
Recurring events should have a start date and a stop date (which could be NULL for events that continue every X days "forever"). You'll have to decide what kinds of frequency you want to allow -- every X days, the Nth day of each month, every given weekday, etc.
I'd probably tend toward two tables - one for one time events and a second for recurring events. You'll have to query and display the two separately.
If I were going to tackle this (and I'd try as hard as I can to avoid reinventing this wheel) I'd look for open-source libraries or, at the very least, open source projects with Calendars that you can look at. Any recommendations guys?
undefined wrote:
…this table will be hit pretty hard, especially if I am going to be sending out emails to remind people of their events (events can be for specific times of the day also!), so I could be polling this table every 15 minutes potentially!
Create a table for the notifications. Poll only it.
Update the notification table when events (recurring or otherwise) are updated.
EDIT: A database View might not violate normal forms, if that's a concern. But, you'll probably want to track which notifications were sent and which have not yet been sent somewhere anyway.
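A minimal sketch of that table and the poll (the names are illustrative):

CREATE TABLE Notifications (
    EventId INT      NOT NULL,
    SendAt  DATETIME NOT NULL,
    SentAt  DATETIME NULL      -- track what has already gone out
)

-- the 15-minute poll touches only this narrow table
SELECT EventId, SendAt
FROM Notifications
WHERE SentAt IS NULL AND SendAt <= GETDATE()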
Derek Park,
I would be creating each and every instance of an event in a table, and this table would be regenerated every month (so any event that was set to recur "forever" would be regenerated one month in advance using a Windows service, or maybe at the SQL Server level).
The polling won't only be done every 15 minutes; that might only be for polls related to email notifications. When someone wants to view their events for a month, I will have to fetch all their events and recurring events and figure out which events to display (since a recurring event might have been created 6 months ago but relates to the month the user is viewing).
Zack, I'm not too concerned with having a perfectly normalized database; the fact that I'm thinking of creating a secondary table is already breaking one of the rules, hehe. My core database tables follow "the rules", but I don't mind creating secondary tables/columns at times when it benefits things performance-wise.
That is how I have designed my events table actually, but when thinking about it, say I have 100K users who have created events; this table will be hit pretty hard, especially if I am going to be sending out emails to remind people of their events (events can be for specific times of the day also!), so I could be polling this table every 15 minutes potentially!
Databases do exceptional jobs of handling sets of data, so I wouldn't be too worried about that. What you should do is use this as your primary table, and then as events expire move them into another table (like an archive).
The next thing you want to try is to query the DB as little as possible, so move the information into a caching tier (like Velocity) and just persist data to the database.
Then you can partition the information across multiple databases for scaling purposes, i.e. users 1-10000's calendars live on server 1, 10001-20000 on server 2, etc.
That's how I would scale a solution like this, but I still think the original solution I proposed is the way to go; it's just how you scale it that becomes the question.
The Ra-Ajax Calendar starter-kit features a sample of handling the RenderDate event, which lets you modify specific dates. Though "recurring events" is more of an algorithmic thing, and here I suspect very few calendars will help you much...
If anyone is doing Ruby there's a great library Runt that does this kind of thing. Worth checking out. http://runt.rubyforge.org/
